[pgpool-hackers: 3401] Re: Failover consensus on even number of nodes

Muhammad Usama m.usama at gmail.com
Wed Aug 28 22:42:08 JST 2019


Hi Ishii-San,

On Wed, Aug 28, 2019 at 5:55 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Hi Usama,
>
> > Hi Ishii-San,
> >
> > The patch looks good overall, but I have a few observations.
> >
> > First, I don't think we need the changes in the
> > get_mimimum_nodes_required_for_quorum() function. Since the function
> > returns an int, the changes are a no-op, I believe.
> >
> > Also, I think we need similar changes in compute_failover_consensus(),
> > where we check whether we have enough votes for failover, as we did in
> > the update_quorum_status() function.
> >
> > So I have updated your patch a little bit. Can you check whether the
> > changes I made look good to you?
>
> Thanks. I will look into this.
>
> > Secondly, I think we may want a different name for the configuration
> > parameter than allow_a_half_consensus. I am not 100 percent convinced
> > which name we should go with, but I have a few suggestions:
> >
> > 1. consensus_require_half_of_total_votes
> > 2. resolve_consensus_on_half_of_total_votes
> > 3. half_of_total_votes_satisfy_majority
> > 4. half_of_total_votes_are_enough_for_majority
> > 5. half_of_total_votes_are_enough_for_consensus
> >
> > Thoughts and suggestions?
>
> They are too long. What about:
>
> consensus_with_half_of_the_votes
>

Yes, this one looks better :-)
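
For reference, in pgpool.conf the new switch would look something like
this (assuming the name above is adopted; the default value shown is
only a guess, not part of the patch):

    # Allow building consensus with exactly half of the votes when the
    # watchdog cluster has an even number of nodes.
    # off = require a strict majority, i.e. n/2 + 1 votes.
    consensus_with_half_of_the_votes = off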

Thanks
Best regards
Muhammad Usama


> > Best Regards
> > Muhammad Usama
> >
> > On Wed, Aug 28, 2019 at 1:22 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >
> >> From: Tatsuo Ishii <ishii at sraoss.co.jp>
> >> Subject: [pgpool-hackers: 3396] Re: Failover consensus on even number of
> >> nodes
> >> Date: Tue, 27 Aug 2019 11:11:51 +0900 (JST)
> >> Message-ID: <20190827.111151.2130894466144469209.t-ishii at sraoss.co.jp>
> >>
> >> >>>>> Hi Ishii-San,
> >> >>>>>
> >> >>>>> On Sat, Aug 17, 2019 at 1:00 PM Tatsuo Ishii <ishii at sraoss.co.jp>
> >> >>>>> wrote:
> >> >>>>>
> >> >>>>>> > Hi Ishii-San
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>> > On Thu, Aug 15, 2019 at 11:42 AM Tatsuo Ishii <ishii at sraoss.co.jp>
> >> >>>>>> > wrote:
> >> >>>>>> >
> >> >>>>>> >> Hi Usama,
> >> >>>>>> >>
> >> >>>>>> >> When the number of Pgpool-II nodes is even, it seems consensus-based
> >> >>>>>> >> failover occurs if n/2 Pgpool-II nodes agree on the failure. For
> >> >>>>>> >> example, if there are 4 Pgpool-II nodes and 2 nodes agree on the
> >> >>>>>> >> failure, failover occurs. Is there any reason behind this? I am
> >> >>>>>> >> asking because it could easily lead to split brain: 2 nodes could
> >> >>>>>> >> agree on the failover while the other 2 nodes disagree. Actually,
> >> >>>>>> >> other HA software, for example etcd, requires n/2+1 votes to gain
> >> >>>>>> >> consensus.
> >> >>>>>> >>
> >> >>>>>> >> https://github.com/etcd-io/etcd/blob/master/Documentation/faq.md#what-is-failure-tolerance
> >> >>>>>> >>
> >> >>>>>> >> With the n/2+1 vote requirement, there's no possibility of split
> >> >>>>>> >> brain.
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>> > Yes, your observation is spot on. The original motivation to
> >> >>>>>> > consider exactly n/2 votes for consensus rather than (n/2 + 1)
> >> >>>>>> > was to ensure that 2-node Pgpool-II clusters keep working. My
> >> >>>>>> > understanding was that most users run 2 Pgpool-II nodes in their
> >> >>>>>> > setup, so I wanted to make sure that when one of the nodes in a
> >> >>>>>> > 2-node cluster goes down, consensus is still possible. But your
> >> >>>>>> > point is also valid: it makes the system prone to split brain. So
> >> >>>>>> > what are your suggestions on that? I think we can introduce a new
> >> >>>>>> > configuration parameter to enable/disable n/2 consensus.
> >> >>>>>>
> >> >>>>>> If my understanding is correct, with the current behavior for
> >> >>>>>> 2-node Pgpool-II clusters there's no difference whether
> >> >>>>>> failover_when_quorum_exists is on or off. That means for 2-node
> >> >>>>>> Pgpool-II clusters, even if we change n/2 consensus to n/2+1
> >> >>>>>> consensus, 2-node users could keep the existing behavior by
> >> >>>>>> turning off failover_when_quorum_exists. If this is correct, we
> >> >>>>>> don't need to introduce the new switch for 4.1, just change n/2
> >> >>>>>> consensus to n/2+1 consensus. What do you think?
> >> >>>>>>
> >> >>>>>
> >> >>>>> Yes, that's true; turning off failover_when_quorum_exists will
> >> >>>>> effectively give us the same behaviour for a 2-node cluster.
> >> >>>>>
> >> >>>>>
> >> >>>>>> The only concern is 4-node Pgpool-II clusters. I doubt there are
> >> >>>>>> any 4-node users in the field, though.
> >> >>>>>>
> >> >>>>>
> >> >>>>> Yes, you are right; there wouldn't be many users who would deploy
> >> >>>>> a 4-node cluster. But somehow we need to keep the behaviour and
> >> >>>>> configuration consistent for all possible scenarios.
> >> >>>>>
> >> >>>>> Also, the decision to consider either n/2 or (n/2 + 1) votes as a
> >> >>>>> valid consensus is not limited to backend node failover. Pgpool-II
> >> >>>>> also considers n/2 votes a valid consensus when electing the
> >> >>>>> watchdog master, and currently the behaviour of watchdog master
> >> >>>>> elections and backend node failover consensus building is
> >> >>>>> consistent. So if we want to revisit this, we might need to
> >> >>>>> consider the behaviour in both cases.
> >> >>>>
> >> >>>> OK, it seems creating a new parameter for switching between n/2
> >> >>>> and n/2+1 would be safer, I agree. Usama, would you like to
> >> >>>> implement this for 4.1?
> >> >>>
> >> >>> Attached is a proof-of-concept patch. GUC and doc changes are not
> >> >>> included. With the patch, a 2-watchdog-node cluster will go into
> >> >>> the "quorum absent" state if one of the nodes goes down.
> >> >>
> >> >> Attached is a ready-for-review patch. GUC and English manual included.
> >> >
> >> > In addition, attached is a patch against the 004.watchdog test.
> >> > Without this, the test fails.
> >>
> >> If there's no objection, I will commit/push tomorrow.
> >>
> >> Best regards,
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese:http://www.sraoss.co.jp
> >>
>
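
To make the arithmetic concrete, here is a minimal sketch in C of the
two voting rules discussed above (the function and parameter names are
illustrative only, not the actual pgpool-II code):

    #include <stdbool.h>

    /*
     * Minimum number of votes needed to reach consensus. With
     * half_votes_enabled, an even-sized cluster reaches consensus
     * with exactly n/2 votes; otherwise a strict majority (n/2 + 1)
     * is required.
     */
    static int
    minimum_votes_required(int total_nodes, bool half_votes_enabled)
    {
        if (half_votes_enabled && total_nodes % 2 == 0)
            return total_nodes / 2;     /* e.g. 1 of 2, 2 of 4 */
        return total_nodes / 2 + 1;     /* strict majority: 2 of 2, 3 of 4 */
    }

The thresholds then compare as follows; only even-sized clusters
differ, which is why a 2-node setup needs either the new parameter or
failover_when_quorum_exists = off to keep failing over after losing
one node:

    nodes (n) | votes needed (n/2 rule) | votes needed (n/2+1 rule)
    ----------+-------------------------+--------------------------
        2     |           1             |            2
        3     |           2             |            2
        4     |           2             |            3
        5     |           3             |            3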