<div dir="ltr"><div>Hi Ishii-San,</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Aug 28, 2019 at 5:55 PM Tatsuo Ishii &lt;<a href="mailto:ishii@sraoss.co.jp">ishii@sraoss.co.jp</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Usama,<br>

<br>

&gt; Hi Ishii-San,<br>

&gt; <br>

&gt; The patch looks good overall, but I have a few observations.<br>

&gt; <br>

&gt; First, I don&#39;t think we needed the changes in the<br>

&gt; get_mimimum_nodes_required_for_quorum()<br>

&gt; function. Since the function returns an int, the changes are a no-op, I<br>

&gt; believe.<br>

&gt; <br>

&gt; Also, I think we need similar changes in<br>

&gt; compute_failover_consensus() where we are checking<br>

&gt; if we have got enough votes for failover as we have done in<br>

&gt; update_quorum_status() function.<br>

&gt; <br>

&gt; So I have updated your patch a little bit. Can you see if the changes I<br>

&gt; made look good to you?<br>

<br>

Thanks. I will look into this.<br>

<br>

&gt; Secondly, I think we may go for a different configuration parameter name<br>

&gt; for *allow_a_half_consensus*.<br>

&gt; I am not 100 percent convinced on which name we should go with but I have a<br>

&gt; few suggestions for that.<br>

&gt; <br>

&gt; <br>

&gt; <br>

&gt; *1-- consensus_require_half_of_total_votes<br>

&gt; 2-- resolve_consensus_on_half_of_total_votes<br>

&gt; 3-- half_of_total_votes_satisfy_majority*<br>

&gt; *4-- half_of_total_votes_are_enough_for_majority*<br>

&gt; *5-- half_of_total_votes_are_enough_for_consensus*<br>

&gt; Thoughts and suggestions?<br>

<br>

They are too long. What about:<br>

<br>

consensus_with_half_of_the_votes<br></blockquote><div><br></div><div>Yes, this one  looks better :-)</div><div><br></div><div>Thanks</div><div>Best regards</div><div>Muhammad Usama</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

&gt; Best Regards<br>

&gt; Muhammad Usama<br>

&gt; <br>

&gt; On Wed, Aug 28, 2019 at 1:22 PM Tatsuo Ishii &lt;<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>&gt; wrote:<br>

&gt; <br>

&gt;&gt; From: Tatsuo Ishii &lt;<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>&gt;<br>

&gt;&gt; Subject: [pgpool-hackers: 3396] Re: Failover consensus on even number of<br>

&gt;&gt; nodes<br>

&gt;&gt; Date: Tue, 27 Aug 2019 11:11:51 +0900 (JST)<br>

&gt;&gt; Message-ID: &lt;<a href="mailto:20190827.111151.2130894466144469209.t-ishii@sraoss.co.jp" target="_blank">20190827.111151.2130894466144469209.t-ishii@sraoss.co.jp</a>&gt;<br>

&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt; Hi Ishii-San,<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt; On Sat, Aug 17, 2019 at 1:00 PM Tatsuo Ishii &lt;<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>&gt;<br>

&gt;&gt; wrote:<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt; Hi Ishii-San<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt; On Thu, Aug 15, 2019 at 11:42 AM Tatsuo Ishii &lt;<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a><br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; wrote:<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;&gt; Hi Usama,<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;&gt; When the number of Pgpool-II nodes is even, it seems consensus-based<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;&gt; failover occurs if n/2 Pgpool-II nodes agree on the failure. For example,<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;&gt; if there are 4 Pgpool-II nodes and 2 nodes agree on the failure,<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;&gt; failover occurs. Is there any reason behind this? I am asking because<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;&gt; it could easily lead to split brain: 2 nodes could agree on<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;&gt; the failover while the other 2 nodes disagree. Actually, other HA software,<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;&gt; for example etcd, requires n/2+1 votes to gain consensus.<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;&gt; <a href="https://github.com/etcd-io/etcd/blob/master/Documentation/faq.md#what-is-failure-tolerance" rel="noreferrer" target="_blank">https://github.com/etcd-io/etcd/blob/master/Documentation/faq.md#what-is-failure-tolerance</a><br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;&gt; With an n/2+1 vote requirement, there&#39;s no possibility of split brain.<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt; Yes, your observation is spot on. The original motivation to consider<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt; exactly n/2 votes sufficient for consensus, rather than (n/2 + 1),<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt; was to ensure that 2-node Pgpool-II clusters keep working.<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt; My understanding was that most users run 2 Pgpool-II nodes in their<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt; setup, so I wanted to make sure that when one of the Pgpool-II nodes<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt; goes down in a 2-node cluster, consensus is still possible.<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt; But your point that this makes the system prone to split-brain is also valid.<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt; So what are your suggestions on that?<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt; I think we can introduce a new configuration parameter to enable/disable<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; &gt; n/2 node consensus.<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; If my understanding is correct, with the current behavior for 2-node Pgpool-II<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; clusters there&#39;s no difference whether failover_when_quorum_exists is<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; on or off. That means for 2-node Pgpool-II clusters, even if we change<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; n/2 node consensus to n/2+1 consensus, 2-node users could keep the<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; existing behavior by turning off failover_when_quorum_exists. If this<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; is correct, we don&#39;t need to introduce the new switch for 4.1, just<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; change n/2 node consensus to n/2+1 consensus. What do you think?<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt; Yes, that&#39;s true, turning off failover_when_quorum_exists will<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt; effectively give us the<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt; same behaviour for a 2-node cluster.<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; The only concern is 4-node Pgpool-II clusters. I doubt there are 4-node<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt; users in the field, though.<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt; Yes, you are right, there wouldn&#39;t be many users who would deploy a 4-node<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt; cluster. But somehow we need<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt; to keep the behaviour and configurations consistent for all possible<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt; scenarios.<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt; Also, the decision to consider either n/2 or (n/2 + 1) votes as a valid<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt; consensus is not limited to<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt; backend node failover. Pgpool-II also treats n/2 votes as a valid consensus<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt; when deciding the<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt; watchdog master. Currently, the behaviour of watchdog master elections<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt; and backend node failover consensus<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt; building is consistent. So if we want to revisit this, we might need to<br>

&gt;&gt; &gt;&gt;&gt;&gt;&gt; consider the behaviour in both cases.<br>

&gt;&gt; &gt;&gt;&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt; Ok, it seems creating a new parameter for switching between n/2 and n/2+1 could<br>

&gt;&gt; &gt;&gt;&gt;&gt; be safer, I agree. Usama, would like to implement this for 4.1?<br>

&gt;&gt; &gt;&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt; Attached is a proof-of-concept patch. GUC and doc changes are not<br>

&gt;&gt; &gt;&gt;&gt; included. With the patch, a 2-node watchdog cluster will go into the &quot;quorum<br>

&gt;&gt; &gt;&gt;&gt; absent&quot; state if one of the nodes goes down.<br>

&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; Attached is a ready-for-review patch. GUC and English manual are included.<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; In addition, attached is a patch against the 004.watchdog test. Without<br>

&gt;&gt; &gt; this, the test fails.<br>

&gt;&gt;<br>

&gt;&gt; If there&#39;s no objection, I will commit/push tomorrow.<br>

&gt;&gt;<br>

&gt;&gt; Best regards,<br>

&gt;&gt; --<br>

&gt;&gt; Tatsuo Ishii<br>

&gt;&gt; SRA OSS, Inc. Japan<br>

&gt;&gt; English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>

&gt;&gt; Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>

&gt;&gt;<br>

</blockquote></div></div>