<div dir="ltr">Hi Yugo,<div><br></div><div>Many thanks for looking into the code. Please find my responses inline.<br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Dec 4, 2015 at 7:40 AM, Yugo Nagata <span dir="ltr">&lt;<a href="mailto:nagata@sraoss.co.jp" target="_blank">nagata@sraoss.co.jp</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Usama,<br>

<br>

I haven&#39;t tested this specification yet, but I suspect this affects interlocking.<br>

Certainly VIP conflicts are avoided, but because there can be two coordinators,<br>

is it possible that there are also two lock-holders? If so, the failover command<br>

will be executed twice on the same backend node.<br>

<br>

Am I missing something?<br></blockquote><div><br></div><div>Yes, you are right. In a scenario like network partitioning, although it is very rare, the watchdog cluster could end up with more than one coordinator node after this commit, and hence the same number of lock holders. Before this commit, however, all nodes would have gone into the waiting-for-quorum state and the cluster would have been left with no coordinator at all.</div><div> </div><div>If we compare the two situations, the new behavior, where the cluster has more than one coordinator node (but none of the coordinators holds the VIP), is the lesser evil. If all the nodes were in the waiting-for-quorum state and the cluster had no coordinator node, all interlocking commands would fail and we would end up in one of two situations:</div><div><br></div><div>a) All pgpool-II nodes in the cluster go on to execute the failover commands (if we implement logic like &quot;wait for lock or timeout&quot;).</div><div>b) None of the pgpool-II nodes executes the failover (if we do not execute the failover command when the lock command fails).</div><div><br></div><div>But remember that a node could go into the &quot;waiting for quorum&quot; state in many other situations, such as:</div><div><br></div><div>1) Only a few pgpool-II nodes are started by the user, which is not enough to complete the quorum.</div><div>2) A hardware failure or any other failure shuts down some pgpool-II nodes and the quorum is lost in the cluster.</div><div><br></div><div>In both of these cases, with the new design the cluster will have only one master, which can successfully coordinate the locking and unlocking. So we achieve the desired behavior with the new implementation. 
But with the previous design, in both of these cases the nodes would have gone into the waiting-for-quorum state and there would be no node to coordinate the locking and unlocking commands.</div><div><br></div><div>I can think of only one scenario where a cluster could have multiple coordinator nodes, namely network partitioning. If a backend failover happens in that situation, both designs fall short of being perfect.</div><div><br></div><div>As far as avoiding the VIP conflict is concerned, I think both designs (the current one without the waiting-for-quorum state, and the old one) perform just the same. In the new design, whenever the quorum is lost or not present, a node that becomes the coordinator does not acquire the VIP but waits until the quorum is complete. So even if we end up with more than one coordinator node in the cluster, every coordinator node will have the flag set that says the quorum is not present, and none of them will acquire the VIP or execute the wd_escalation command. Only when the cluster recovers from the situation and one coordinator node is connected to the minimum number of nodes required to complete the quorum will the VIP be brought up. So I think the new design serves well to make sure that an IP conflict does not happen.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

Also, could you explain the difference between the coordinator and the escalated node<br>

in your code?<br></blockquote><div> </div><div>The coordinator node is just another name for the master or leader watchdog node, while the escalated node is the master/coordinator node that holds the VIP and/or has executed the wd_escalation command. So only the master/coordinator node can become an escalated node, and only when the quorum is complete. </div><div><br></div><div><br></div><div><br></div><div>Thanks</div><div>Best regards</div><div>Muhammad Usama</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Begin forwarded message:<br>

<br>

Date: Thu, 03 Dec 2015 15:14:29 +0000<br>

From: Muhammad Usama &lt;<a href="mailto:m.usama@gmail.com">m.usama@gmail.com</a>&gt;<br>

To: <a href="mailto:pgpool-committers@pgpool.net">pgpool-committers@pgpool.net</a><br>

Subject: [pgpool-committers: 2844] pgpool: Watchdog node goes into the WD_WAITING_FOR_QUORUM state wheneve<br>

<br>

<br>

The watchdog node goes into the WD_WAITING_FOR_QUORUM state whenever the quorum is<br>

not present or lost. Although it is a good guard against the split-brain syndrome,<br>

there is a problem with this technique. pgpool-II commands which require<br>

cluster-wide synchronization, like interlocking commands, start failing<br>

when the node is waiting for quorum, as these commands require a central<br>

coordinator node for processing.<br>

<br>

The fix for this is to remove the WD_WAITING_FOR_QUORUM state and make sure that<br>

the cluster always elects the master node even when the quorum is not present.<br>

But the trick is not to execute the escalation commands on the master node when the<br>

quorum is missing, and to wait until the quorum is complete. This new design<br>

ensures that even when, because of network partitioning or some other issue, the<br>

cluster gets multiple master nodes (split-brain syndrome), the VIP conflict will<br>

still not happen and the multiple master nodes will be harmless.<br>

<br>

Branch<br>

------<br>

master<br>

<br>

Details<br>

-------<br>

<a href="http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=6238ba9d2aaa82db415b11aca90d54981a854e09" rel="noreferrer" target="_blank">http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=6238ba9d2aaa82db415b11aca90d54981a854e09</a><br>

<br>

Modified Files<br>

--------------<br>

src/include/watchdog/watchdog.h |    1 -<br>

src/watchdog/watchdog.c         |  455 +++++++++++++++++++++++++--------------<br>

src/watchdog/wd_escalation.c    |    2 +-<br>

src/watchdog/wd_heartbeat.c     |    6 +-<br>

src/watchdog/wd_lifecheck.c     |    2 +-<br>

5 files changed, 304 insertions(+), 162 deletions(-)<br>

<br>


<span class="HOEnZb"><font color="#888888"><br>

<br>

--<br>

Yugo Nagata &lt;<a href="mailto:nagata@sraoss.co.jp">nagata@sraoss.co.jp</a>&gt;<br>

_______________________________________________<br>

pgpool-hackers mailing list<br>

<a href="mailto:pgpool-hackers@pgpool.net">pgpool-hackers@pgpool.net</a><br>

<a href="http://www.pgpool.net/mailman/listinfo/pgpool-hackers" rel="noreferrer" target="_blank">http://www.pgpool.net/mailman/listinfo/pgpool-hackers</a><br>

</font></span></blockquote></div><br></div></div></div>
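<div><br></div><div>To make the rule discussed in this thread concrete (a coordinator is always elected, but escalation waits for the quorum), here is a minimal Python sketch. It is not taken from the pgpool-II sources: the class WatchdogNode, its fields, and the strict-majority quorum rule are assumptions made purely for illustration.</div>
<pre>
```python
from dataclasses import dataclass


@dataclass
class WatchdogNode:
    """Hypothetical model of one watchdog node's view of the cluster."""
    name: str
    total_nodes: int        # configured cluster size (assumed known to every node)
    reachable_nodes: int    # nodes this one can currently reach, itself included
    is_coordinator: bool = False

    def has_quorum(self) -> bool:
        # Assumed quorum rule: a strict majority of the configured nodes.
        return self.reachable_nodes * 2 > self.total_nodes

    def may_escalate(self) -> bool:
        # Only a coordinator that also sees a quorum may acquire the VIP
        # or run the escalation command.
        return self.is_coordinator and self.has_quorum()

    def can_serve_locks(self) -> bool:
        # Interlocking only needs a coordinator, with or without a quorum.
        return self.is_coordinator


if __name__ == "__main__":
    # A 4-node cluster split 2/2 by a network partition: each side elects
    # its own coordinator, but neither side sees a majority.
    for node in (WatchdogNode("a", 4, 2, True), WatchdogNode("b", 4, 2, True)):
        print(node.name, node.can_serve_locks(), node.may_escalate())
```
</pre>
<div>In this 2/2 split both coordinators can hand out locks, which is exactly the double lock-holder case raised above, while neither of them escalates, so the VIP cannot be brought up twice.</div>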