<div dir="ltr"><div>Hi Ishii-San,</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Aug 28, 2019 at 5:55 PM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp">ishii@sraoss.co.jp</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Usama,<br>
<br>
> Hi Ishii-San,<br>
> <br>
> The patch looks good overall, but I have a few observations.<br>
> <br>
> First, I don't think we need the changes in the<br>
> get_mimimum_nodes_required_for_quorum() function. Since the function<br>
> returns an int, the changes are a no-op, I believe.<br>
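> <br>
> To make that concrete, here is a hypothetical illustration (not the<br>
> actual patch): whatever fractional adjustment is computed inside the<br>
> function, the int return type truncates it right back.<br>
> <br>
>     static int<br>
>     get_minimum_nodes_sketch(int node_count)<br>
>     {<br>
>         /* e.g. for node_count = 3 this may be 1.5 here ... */<br>
>         double adjusted = (double) node_count / 2.0;<br>
>         /* ... but the cast truncates it back to 1, same as 3 / 2 */<br>
>         return (int) adjusted;<br>
>     }<br>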
> <br>
> Also, I think we need similar changes in compute_failover_consensus(),<br>
> where we check whether we have got enough votes for failover, as we<br>
> have done in the update_quorum_status() function.<br>
> <br>
> So I have updated your patch a little bit. Can you see if the changes I<br>
> made look good to you?<br>
<br>
Thanks. I will look into this.<br>
<br>
> Secondly, I think we may go for a different configuration parameter name<br>
> for *allow_a_half_consensus*.<br>
> I am not 100 percent sure which name we should go with, but I have a<br>
> few suggestions:<br>
> <br>
> <br>
> 1. consensus_require_half_of_total_votes<br>
> 2. resolve_consensus_on_half_of_total_votes<br>
> 3. half_of_total_votes_satisfy_majority<br>
> 4. half_of_total_votes_are_enough_for_majority<br>
> 5. half_of_total_votes_are_enough_for_consensus<br>
> Thoughts and suggestions?<br>
<br>
They are too long. What about:<br>
<br>
consensus_with_half_of_the_votes<br></blockquote><div><br></div><div>Yes, this one looks better :-)</div><div><br></div><div>Thanks</div><div>Best regards</div><div>Muhammad Usama</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
> Best Regards<br>
> Muhammad Usama<br>
> <br>
> On Wed, Aug 28, 2019 at 1:22 PM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>> wrote:<br>
> <br>
>> From: Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>><br>
>> Subject: [pgpool-hackers: 3396] Re: Failover consensus on even number of<br>
>> nodes<br>
>> Date: Tue, 27 Aug 2019 11:11:51 +0900 (JST)<br>
>> Message-ID: <<a href="mailto:20190827.111151.2130894466144469209.t-ishii@sraoss.co.jp" target="_blank">20190827.111151.2130894466144469209.t-ishii@sraoss.co.jp</a>><br>
>><br>
>> >>>>> Hi Ishii-San,<br>
>> >>>>><br>
>> >>>>> On Sat, Aug 17, 2019 at 1:00 PM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>> wrote:<br>
>> >>>>><br>
>> >>>>>> > Hi Ishii-San<br>
>> >>>>>> ><br>
>> >>>>>> ><br>
>> >>>>>> > On Thu, Aug 15, 2019 at 11:42 AM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>> wrote:<br>
>> >>>>>> ><br>
>> >>>>>> >> Hi Usama,<br>
>> >>>>>> >><br>
>> >>>>>> >> When the number of Pgpool-II nodes is even, it seems consensus-based<br>
>> >>>>>> >> failover occurs if n/2 Pgpool-II nodes agree on the failure. For<br>
>> >>>>>> >> example, if there are 4 Pgpool-II nodes and 2 of them agree on the<br>
>> >>>>>> >> failure, failover occurs. Is there any reason behind this? I am asking<br>
>> >>>>>> >> because it could easily lead to split brain: 2 nodes could agree on<br>
>> >>>>>> >> the failover while the other 2 nodes disagree. Actually, other HA<br>
>> >>>>>> >> software, for example etcd, requires n/2+1 votes to gain consensus.<br>
>> >>>>>> >><br>
>> >>>>>> >> <a href="https://github.com/etcd-io/etcd/blob/master/Documentation/faq.md#what-is-failure-tolerance" rel="noreferrer" target="_blank">https://github.com/etcd-io/etcd/blob/master/Documentation/faq.md#what-is-failure-tolerance</a><br>
>> >>>>>> >><br>
>> >>>>>> >> With n/2+1 vote requirements, there's no possibility of split brain.<br>
>> >>>>>> >><br>
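>> >>>>>> >> A minimal sketch of the two rules (hypothetical helper, not the<br>
>> >>>>>> >> actual pgpool-II code): with n = 4, the n/2 rule needs only 2 votes,<br>
>> >>>>>> >> so two disjoint halves can each reach "consensus", while n/2+1 needs<br>
>> >>>>>> >> 3 votes, making two disjoint majorities impossible.<br>
>> >>>>>> >><br>
>> >>>>>> >> #include <stdbool.h><br>
>> >>>>>> >><br>
>> >>>>>> >> static int<br>
>> >>>>>> >> min_votes_for_consensus(int n, bool strict_majority)<br>
>> >>>>>> >> {<br>
>> >>>>>> >>     /* integer division truncates; +1 turns half into a strict majority */<br>
>> >>>>>> >>     return strict_majority ? n / 2 + 1 : n / 2;<br>
>> >>>>>> >> }<br>
>> >>>>>> >><br>
>> >>>>>> >> /* min_votes_for_consensus(4, false) == 2, min_votes_for_consensus(4, true) == 3 */<br>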
>> >>>>>> >><br>
>> >>>>>> > Yes, your observation is spot on. The original motivation to consider<br>
>> >>>>>> > exactly n/2 votes for consensus, rather than (n/2 + 1), was to ensure<br>
>> >>>>>> > that 2-node Pgpool-II clusters keep working. My understanding was that<br>
>> >>>>>> > most users use 2 Pgpool-II nodes in their setup, so I wanted to make<br>
>> >>>>>> > sure that when one of the Pgpool-II nodes goes down in a 2-node<br>
>> >>>>>> > cluster, consensus is still possible.<br>
>> >>>>>> > But your point is also valid: that makes the system prone to<br>
>> >>>>>> > split-brain. So what are your suggestions on that?<br>
>> >>>>>> > I think we can introduce a new configuration parameter to<br>
>> >>>>>> > enable/disable the n/2 consensus.<br>
>> >>>>>><br>
>> >>>>>> If my understanding is correct, with the current behavior there is no<br>
>> >>>>>> difference for 2-node Pgpool-II clusters whether<br>
>> >>>>>> failover_when_quorum_exists is on or off. That means that for 2-node<br>
>> >>>>>> Pgpool-II clusters, even if we change the n/2 consensus to n/2+1<br>
>> >>>>>> consensus, 2-node users could keep the existing behavior by turning off<br>
>> >>>>>> failover_when_quorum_exists. If this is correct, we don't need to<br>
>> >>>>>> introduce the new switch for 4.1, just change the n/2 consensus to<br>
>> >>>>>> n/2+1 consensus. What do you think?<br>
>> >>>>>><br>
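>> >>>>>> A sketch of that decision flow (hypothetical, not the actual<br>
>> >>>>>> compute_failover_consensus() logic): when failover_when_quorum_exists<br>
>> >>>>>> is off, failover does not wait for quorum at all, so the choice of<br>
>> >>>>>> consensus rule never comes into play:<br>
>> >>>>>><br>
>> >>>>>> #include <stdbool.h><br>
>> >>>>>><br>
>> >>>>>> static bool<br>
>> >>>>>> may_execute_failover(bool failover_when_quorum_exists,<br>
>> >>>>>>                      int votes, int node_count, bool strict_majority)<br>
>> >>>>>> {<br>
>> >>>>>>     if (!failover_when_quorum_exists)<br>
>> >>>>>>         return true;    /* failover proceeds without a quorum check */<br>
>> >>>>>>     return votes >= (strict_majority ? node_count / 2 + 1 : node_count / 2);<br>
>> >>>>>> }<br>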
>> >>>>><br>
>> >>>>> Yes, that's true: turning off failover_when_quorum_exists will<br>
>> >>>>> effectively give us the same behaviour for a 2-node cluster.<br>
>> >>>>><br>
>> >>>>><br>
>> >>>>>> The only concern is 4-node Pgpool-II clusters. I doubt there are<br>
>> >>>>>> 4-node users in the field, though.<br>
>> >>>>>><br>
>> >>>>><br>
>> >>>>> Yes, you are right, there wouldn't be many users who would deploy a<br>
>> >>>>> 4-node cluster. But somehow we need to keep the behaviour and<br>
>> >>>>> configuration consistent for all possible scenarios.<br>
>> >>>>><br>
>> >>>>> Also, the decision to consider either n/2 or (n/2 + 1) as a valid<br>
>> >>>>> consensus for voting is not limited to backend node failover.<br>
>> >>>>> Pgpool-II also considers n/2 votes a valid consensus when deciding the<br>
>> >>>>> watchdog master, and currently the behaviour of watchdog master<br>
>> >>>>> elections and backend node failover consensus building is consistent.<br>
>> >>>>> So if we want to revisit this, we might need to consider the behaviour<br>
>> >>>>> in both cases.<br>
>> >>>><br>
>> >>>> Ok, it seems creating a new parameter for switching between n/2 and<br>
>> >>>> n/2+1 could be safer, I agree. Usama, would you like to implement this<br>
>> >>>> for 4.1?<br>
>> >>><br>
>> >>> Attached is a proof-of-concept patch. GUC and doc changes are not<br>
>> >>> included. With the patch, a 2-watchdog-node cluster will go into the<br>
>> >>> "quorum absent" state if one of the nodes goes down.<br>
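>> >>><br>
>> >>> (Assuming the minimum is computed as n/2+1 over the total node count,<br>
>> >>> that follows directly: with n = 2 the quorum needs 2/2 + 1 = 2 live<br>
>> >>> nodes, and after one goes down only 1 remains, so the quorum is absent.)<br>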
>> >><br>
>> >> Attached is a ready-for-review patch. GUC and English manual included.<br>
>> ><br>
>> > In addition, attached is a patch against the 004.watchdog test. Without<br>
>> > this, the test fails.<br>
>><br>
>> If there's no objection, I will commit/push tomorrow.<br>
>><br>
>> Best regards,<br>
>> --<br>
>> Tatsuo Ishii<br>
>> SRA OSS, Inc. Japan<br>
>> English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>
>> Japanese: <a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>
>><br>
</blockquote></div></div>