<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Dec 11, 2015 at 12:55 PM, Yugo Nagata <span dir="ltr"><<a href="mailto:nagata@sraoss.co.jp" target="_blank">nagata@sraoss.co.jp</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Usama,<br>
<br>
What do you think about adding some description about this into the document?<br>
For example, the current document contains the following description:<br>
<br>
A master watchdog node can resign from being a master node when the master node<br>
pgpool-II shuts down, detects a network blackout, or detects the loss of quorum.<br>
<br>
It's not quite correct.<br></blockquote><div><br></div><div>Yes, thank you for pointing this out. I forgot about the documentation while working on that patch; I will update it accordingly.</div><div> </div><div>Regards</div><div>Muhammad Usama</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<span class="im"><br>
<br>
On Fri, 4 Dec 2015 12:54:30 +0500<br>
Muhammad Usama <<a href="mailto:m.usama@gmail.com">m.usama@gmail.com</a>> wrote:<br>
<br>
</span><div class=""><div class="h5">> Hi Yugo<br>
><br>
> Many thanks for looking into the code. Please find my responses inline.<br>
><br>
> On Fri, Dec 4, 2015 at 7:40 AM, Yugo Nagata <<a href="mailto:nagata@sraoss.co.jp">nagata@sraoss.co.jp</a>> wrote:<br>
><br>
> > Usama,<br>
> ><br>
> > I haven't tested this specification yet, but I suspect it affects<br>
> > interlocking.<br>
> > Certainly the VIP conflict is avoided, but because there can be two<br>
> > coordinators, is it possible that there are also two lock holders? If so,<br>
> > the failover command will be executed twice on the same backend node.<br>
> ><br>
> > Am I missing something?<br>
> ><br>
><br>
> Yes, you are right. In a scenario like network partitioning, although it is<br>
> a very rare case, it is possible after this commit that the watchdog cluster<br>
> gets more than one coordinator node and, hence, the same number of lock<br>
> holders. But before this commit, the old behavior was that all nodes would<br>
> have gone into the waiting-for-quorum state and the cluster would be left<br>
> with no coordinator.<br>
><br>
> Now, if we analyze both of the above situations, the new behavior, where the<br>
> cluster has more than one coordinator node (but none of the coordinators<br>
> holds the VIP), is the lesser evil. If all the nodes were in the<br>
> waiting-for-quorum state and the cluster had no coordinator node, all<br>
> interlocking commands would fail and we would end up in one of two situations:<br>
><br>
> a-) All pgpool-II nodes in the cluster go on to execute the failover commands<br>
> (if we implement logic like "wait for lock or timeout"; see the sketch below).<br>
> b-) None of the pgpool-II nodes executes the failover (if we do not execute<br>
> the failover command when the lock command fails).<br>
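><br>
> Just to illustrate why option a-) ends with every node running the failover,<br>
> here is a rough sketch of what a "wait for lock or timeout" fallback could<br>
> look like. This is only pseudo-logic with made-up names<br>
> (wd_try_acquire_failover_lock(), do_failover()), not the actual pgpool-II code:<br>
><br>
>     /* Sketch only: hypothetical helpers, declared here just so it compiles */<br>
>     #include <stdbool.h><br>
>     #include <unistd.h><br>
>     extern bool wd_try_acquire_failover_lock(int backend_node_id);<br>
>     extern void do_failover(int backend_node_id);<br>
><br>
>     #define LOCK_WAIT_TIMEOUT_SEC 30<br>
><br>
>     static void failover_with_lock_or_timeout(int backend_node_id)<br>
>     {<br>
>         int waited = 0;<br>
><br>
>         /* try to become the lock holder; with no coordinator in the cluster<br>
>          * the lock request is never granted, so every node hits the timeout */<br>
>         while (!wd_try_acquire_failover_lock(backend_node_id))<br>
>         {<br>
>             if (waited++ >= LOCK_WAIT_TIMEOUT_SEC)<br>
>                 break;<br>
>             sleep(1);<br>
>         }<br>
><br>
>         /* ...and therefore every pgpool-II node ends up running the failover */<br>
>         do_failover(backend_node_id);<br>
>     }<br>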
><br>
> But remember that a node could go into the "waiting for quorum" state in many<br>
> other situations, for example:<br>
><br>
> 1-) Only a few pgpool-II nodes are started by the user, not enough to<br>
> complete the quorum.<br>
> 2-) A hardware failure, or some other failure, shuts down pgpool-II nodes and<br>
> the quorum is lost in the cluster.<br>
><br>
> In both of these cases, with the new design the cluster will have only one<br>
> master and can successfully coordinate the locking and unlocking, so we will<br>
> achieve the desired behavior with the new implementation. With the previous<br>
> design, in both of these cases the nodes would have gone into the<br>
> waiting-for-quorum state and there would be no node to coordinate the locking<br>
> and unlocking commands.<br>
><br>
> I can only think of one scenario where a cluster could have multiple<br>
> coordinator nodes, and that is network partitioning. If a backend failover<br>
> happens in that situation, both designs fall short of being perfect.<br>
><br>
> As far as avoiding the VIP conflict is concerned, I think both designs (the<br>
> current one without the waiting-for-quorum state and the old one) will perform<br>
> just the same. In the new design, whenever the quorum is lost or not present,<br>
> a node that becomes the coordinator will not acquire the VIP and will wait<br>
> until the quorum is complete. So even when in some case we have more than one<br>
> coordinator node in the cluster, every coordinator node will have the flag set<br>
> indicating that the quorum is not present, and none of them will acquire the<br>
> VIP or execute the wd_escalation command. Only when the cluster recovers from<br>
> the situation, and one coordinator node gets connected to the minimum number<br>
> of nodes required to complete the quorum, will the VIP be brought up. So I<br>
> think the new design serves well to make sure the IP conflict cannot happen.<br>
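><br>
> Roughly, the idea on the coordinator side is something like the sketch below.<br>
> It is only an illustration with invented names (have_quorum(),<br>
> acquire_vip_and_escalate()), not the actual watchdog.c code:<br>
><br>
>     /* Sketch only: invented helpers, not pgpool-II APIs */<br>
>     #include <stdbool.h><br>
>     extern bool have_quorum(void);              /* enough live nodes for quorum? */<br>
>     extern void acquire_vip_and_escalate(void); /* bring up VIP, run wd_escalation */<br>
><br>
>     static bool escalated = false;<br>
><br>
>     /* called when this node wins the coordinator election, and again<br>
>      * whenever the set of reachable nodes changes */<br>
>     static void coordinator_check_escalation(void)<br>
>     {<br>
>         if (!escalated && have_quorum())<br>
>         {<br>
>             /* quorum is complete: only now is it safe to bring up the VIP */<br>
>             acquire_vip_and_escalate();<br>
>             escalated = true;<br>
>         }<br>
>         /* without quorum the node stays coordinator but never touches the VIP,<br>
>          * so multiple coordinators in a partition cannot cause an IP conflict */<br>
>     }<br>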
><br>
><br>
> > Also, could you explain the difference between a coordinator node and an<br>
> > escalated node in your code?<br>
> ><br>
><br>
> The coordinator node is just another name for the master or leader watchdog<br>
> node, while the escalated node is the master/coordinator node which holds the<br>
> VIP and/or has executed the wd_escalation command. So only the<br>
> master/coordinator node can become an escalated node, and only when the<br>
> quorum is complete.<br>
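><br>
> In other words (again just an illustration with an invented struct, not the<br>
> real watchdog data structures), the two are separate properties of a node:<br>
><br>
>     #include <stdbool.h><br>
><br>
>     typedef struct<br>
>     {<br>
>         bool is_coordinator;  /* won the master/leader election */<br>
>         bool quorum_present;  /* cluster currently has enough nodes for quorum */<br>
>         bool escalated;       /* holds the VIP / has run wd_escalation */<br>
>     } wd_node_role;<br>
><br>
>     /* only a coordinator that sees a complete quorum may become escalated */<br>
>     static bool may_escalate(const wd_node_role *n)<br>
>     {<br>
>         return n->is_coordinator && n->quorum_present && !n->escalated;<br>
>     }<br>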
><br>
><br>
><br>
> Thanks<br>
> Best regards<br>
> Muhammad Usama<br>
><br>
><br>
> > Begin forwarded message:<br>
> ><br>
> > Date: Thu, 03 Dec 2015 15:14:29 +0000<br>
> > From: Muhammad Usama <<a href="mailto:m.usama@gmail.com">m.usama@gmail.com</a>><br>
> > To: <a href="mailto:pgpool-committers@pgpool.net">pgpool-committers@pgpool.net</a><br>
> > Subject: [pgpool-committers: 2844] pgpool: Watchdog node goes into the<br>
> > WD_WAITING_FOR_QUORUM state wheneve<br>
> ><br>
> ><br>
> > The watchdog node goes into the WD_WAITING_FOR_QUORUM state whenever the<br>
> > quorum is not present or lost. Although this is a good guard against<br>
> > split-brain syndrome, there is a problem with the technique: pgpool-II<br>
> > commands which require cluster-wide synchronization, like the interlocking<br>
> > commands, start failing while the node is waiting for quorum, because these<br>
> > commands require a central coordinator node for processing.<br>
> ><br>
> > The fix is to remove the WD_WAITING_FOR_QUORUM state and make sure that the<br>
> > cluster always elects a master node, even when the quorum is not present.<br>
> > The trick is not to execute the escalation commands on the master node while<br>
> > the quorum is missing, and to wait until the quorum is complete. This new<br>
> > design ensures that even when, because of network partitioning or some other<br>
> > issue, the cluster gets multiple master nodes (split-brain syndrome), the<br>
> > VIP conflict still cannot happen and the multiple master nodes are harmless.<br>
> ><br>
> > Branch<br>
> > ------<br>
> > master<br>
> ><br>
> > Details<br>
> > -------<br>
> ><br>
> > <a href="http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=6238ba9d2aaa82db415b11aca90d54981a854e09" rel="noreferrer" target="_blank">http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=6238ba9d2aaa82db415b11aca90d54981a854e09</a><br>
> ><br>
> > Modified Files<br>
> > --------------<br>
> > src/include/watchdog/watchdog.h | 1 -<br>
> > src/watchdog/watchdog.c | 455 +++++++++++++++++++++++++--------------<br>
> > src/watchdog/wd_escalation.c | 2 +-<br>
> > src/watchdog/wd_heartbeat.c | 6 +-<br>
> > src/watchdog/wd_lifecheck.c | 2 +-<br>
> > 5 files changed, 304 insertions(+), 162 deletions(-)<br>
> ><br>
> > _______________________________________________<br>
> > pgpool-committers mailing list<br>
> > <a href="mailto:pgpool-committers@pgpool.net">pgpool-committers@pgpool.net</a><br>
> > <a href="http://www.pgpool.net/mailman/listinfo/pgpool-committers" rel="noreferrer" target="_blank">http://www.pgpool.net/mailman/listinfo/pgpool-committers</a><br>
> ><br>
> ><br>
> > --<br>
> > Yugo Nagata <<a href="mailto:nagata@sraoss.co.jp">nagata@sraoss.co.jp</a>><br>
> > _______________________________________________<br>
> > pgpool-hackers mailing list<br>
> > <a href="mailto:pgpool-hackers@pgpool.net">pgpool-hackers@pgpool.net</a><br>
> > <a href="http://www.pgpool.net/mailman/listinfo/pgpool-hackers" rel="noreferrer" target="_blank">http://www.pgpool.net/mailman/listinfo/pgpool-hackers</a><br>
> ><br>
<br>
<br>
</div></div><span class=""><font color="#888888">--<br>
Yugo Nagata <<a href="mailto:nagata@sraoss.co.jp">nagata@sraoss.co.jp</a>><br>
</font></span></blockquote></div><br></div></div>