[pgpool-hackers: 3400] Re: Failover consensus on even number of nodes

Tatsuo Ishii ishii at sraoss.co.jp
Wed Aug 28 21:55:40 JST 2019


Hi Usama,

> Hi Ishii-San,
> 
> The patch looks good overall, but I have a few observations.
> 
> First, I don't think we need the changes in the
> get_mimimum_nodes_required_for_quorum() function, since the function
> returns an int, so the changes are a no-op, I believe.
> 
> Also, I think we need similar changes in compute_failover_consensus(),
> where we check whether we have got enough votes for failover, as we
> did in the update_quorum_status() function.
> 
> So I have updated your patch a little bit. Can you see if the changes
> I made look good to you?

Thanks. I will look into this.
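
You are probably right about the no-op: since
get_mimimum_nodes_required_for_quorum() returns an int, a change that
only affects the fractional part is truncated away anyway. Roughly
like this (a sketch to illustrate the point, not the exact pgpool-II
source; the node_count argument is only for illustration):

    /* Sketch only, not the exact pgpool-II source.  Because the
     * return type is int, integer division truncates, so a change
     * that only affects the fractional part is a no-op.  For
     * node_count = 4, 4 / 2 == 2 and (int) (4 / 2.0) == 2 as well. */
    static int
    get_mimimum_nodes_required_for_quorum(int node_count)
    {
        return node_count / 2;
    }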

> Secondly, I think we may go for a different configuration parameter
> name instead of allow_a_half_consensus. I am not 100 percent convinced
> which name we should go with, but I have a few suggestions:
> 
> 1-- consensus_require_half_of_total_votes
> 2-- resolve_consensus_on_half_of_total_votes
> 3-- half_of_total_votes_satisfy_majority
> 4-- half_of_total_votes_are_enough_for_majority
> 5-- half_of_total_votes_are_enough_for_consensus
> 
> Thoughts and suggestions?

They are too long. What about:

consensus_with_half_of_the_votes
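
Whichever name we settle on, it will be a simple boolean in
pgpool.conf. For illustration only (the name is not final):

    # Illustration only; the parameter name is still under discussion.
    # on:  exactly half of the votes (n/2) is enough for consensus,
    #      which keeps a 2-node cluster working when one node is down
    # off: a strict majority (n/2+1) is required, which prevents
    #      split brain on clusters with an even number of nodes
    consensus_with_half_of_the_votes = off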

> Best Regards
> Muhammad Usama
> 
> On Wed, Aug 28, 2019 at 1:22 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> From: Tatsuo Ishii <ishii at sraoss.co.jp>
>> Subject: [pgpool-hackers: 3396] Re: Failover consensus on even number of
>> nodes
>> Date: Tue, 27 Aug 2019 11:11:51 +0900 (JST)
>> Message-ID: <20190827.111151.2130894466144469209.t-ishii at sraoss.co.jp>
>>
>> >>>>> Hi Ishii-San,
>> >>>>>
>> >>>>> On Sat, Aug 17, 2019 at 1:00 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >>>>>
>> >>>>>> > Hi Ishii-San
>> >>>>>> >
>> >>>>>> >
>> >>>>>> > On Thu, Aug 15, 2019 at 11:42 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >>>>>> >
>> >>>>>> >> Hi Usama,
>> >>>>>> >>
>> >>>>>> >> When the number of Pgpool-II nodes is even, it seems
>> >>>>>> >> consensus-based failover occurs if n/2 Pgpool-II nodes agree on
>> >>>>>> >> the failure. For example, if there are 4 Pgpool-II nodes and 2
>> >>>>>> >> nodes agree on the failure, failover occurs. Is there any reason
>> >>>>>> >> behind this? I am asking because it could easily lead to split
>> >>>>>> >> brain: 2 nodes could agree on the failover while the other 2
>> >>>>>> >> nodes disagree. Actually, other HA software, for example etcd,
>> >>>>>> >> requires n/2+1 votes to reach consensus.
>> >>>>>> >>
>> >>>>>> >> https://github.com/etcd-io/etcd/blob/master/Documentation/faq.md#what-is-failure-tolerance
>> >>>>>> >>
>> >>>>>> >> With the n/2+1 vote requirement, there's no possibility of split
>> >>>>>> >> brain.
>> >>>>>> >>
>> >>>>>> >>
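To spell out the arithmetic behind this (a sketch, not actual
pgpool-II code):

    /* Sketch, not actual pgpool-II code: minimum votes required
     * for consensus under the two rules discussed in this thread. */
    #include <stdio.h>

    static int half_rule(int n)     { return n / 2; }     /* current rule */
    static int majority_rule(int n) { return n / 2 + 1; } /* etcd-style rule */

    int main(void)
    {
        /* For n = 4: half_rule() returns 2, so a 2-2 network partition
         * lets both halves reach "consensus" -- split brain.
         * majority_rule() returns 3, and at most one side of any
         * partition of 4 nodes can hold 3 votes. */
        for (int n = 2; n <= 5; n++)
            printf("n=%d  half=%d  majority=%d\n",
                   n, half_rule(n), majority_rule(n));
        return 0;
    }
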
>> >>>>>> > Yes, your observation is spot on. The original motivation to
>> >>>>>> > consider exactly n/2 votes for consensus rather than n/2+1 was to
>> >>>>>> > ensure the working of 2-node Pgpool-II clusters.
>> >>>>>> > My understanding was that most users run 2 Pgpool-II nodes in
>> >>>>>> > their setup, so I wanted to make sure that when one of the
>> >>>>>> > Pgpool-II nodes goes down in a 2-node cluster, consensus is still
>> >>>>>> > possible.
>> >>>>>> > But your point is also valid: that makes the system prone to
>> >>>>>> > split brain. So what are your suggestions on that?
>> >>>>>> > I think we can introduce a new configuration parameter to
>> >>>>>> > enable/disable the n/2 node consensus.
>> >>>>>>
>> >>>>>> If my understanding is correct, in the current behavior for 2-node
>> >>>>>> Pgpool-II clusters there's no difference whether
>> >>>>>> failover_when_quorum_exists is on or off. That means for 2-node
>> >>>>>> Pgpool-II clusters, even if we change the n/2 node consensus to an
>> >>>>>> n/2+1 consensus, 2-node users could keep the existing behavior by
>> >>>>>> turning off failover_when_quorum_exists. If this is correct, we
>> >>>>>> don't need to introduce the new switch for 4.1, just change the n/2
>> >>>>>> node consensus to an n/2+1 consensus. What do you think?
>> >>>>>
>> >>>>> Yes, that's true; turning off failover_when_quorum_exists will
>> >>>>> effectively give us the same behaviour for 2-node clusters.
>> >>>>>
>> >>>>>
>> >>>>>> The only concern is 4-node Pgpool-II clusters. I doubt there are
>> >>>>>> 4-node users in the field, though.
>> >>>>>>
>> >>>>>
>> >>>>> Yes, you are right, there wouldn't be many users who would deploy a
>> >>>>> 4-node cluster. But somehow we need to keep the behaviour and
>> >>>>> configuration consistent for all possible scenarios.
>> >>>>>
>> >>>>> Also, the decision to consider either n/2 or n/2+1 as a valid
>> >>>>> consensus is not limited to backend node failover. Pgpool-II also
>> >>>>> considers n/2 votes a valid consensus when deciding the watchdog
>> >>>>> master, and currently the behaviour of watchdog master elections and
>> >>>>> backend node failover consensus building is consistent. So if we
>> >>>>> want to revisit this, we might need to consider the behaviour in
>> >>>>> both cases.
>> >>>>
>> >>>> Ok, it seems creating a new parameter for switching between n/2 and
>> >>>> n/2+1 would be safer; I agree. Usama, would you like to implement
>> >>>> this for 4.1?
>> >>>
>> >>> Attached is a proof-of-concept patch. GUC and doc changes are not
>> >>> included. With the patch, a 2-node watchdog cluster will go into the
>> >>> "quorum absent" state if one of the nodes goes down.
>> >>
>> >> Attached is a ready-for-review patch. GUC and English manual included.
>> >
>> > In addition, attached is a patch against the 004.watchdog test.
>> > Without this, the test fails.
>>
>> If there's no objection, I will commit/push tomorrow.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>

