<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Dec 11, 2015 at 12:55 PM, Yugo Nagata <span dir="ltr"><<a href="mailto:nagata@sraoss.co.jp" target="_blank">nagata@sraoss.co.jp</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Usama,<br>
<br>
What do you think about adding some description about this into the document?<br>
For example, the current document contains the following description:<br>
<br>
A master watchdog node can resign from being a master node when the master node<br>
pgpool-II shuts down, detects a network blackout, or detects the loss of quorum.<br>
<br>
It's not quite correct.<br></blockquote><div><br></div><div>Yes, thank you for pointing this out. I forgot about the documentation while working on that patch; I will update it accordingly.</div><div> </div><div>Regards</div><div>Muhammad Usama</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<span class="im"><br>
<br>
On Fri, 4 Dec 2015 12:54:30 +0500<br>
Muhammad Usama <<a href="mailto:m.usama@gmail.com">m.usama@gmail.com</a>> wrote:<br>
<br>
</span><div class=""><div class="h5">> Hi Yugo<br>
><br>
> Many thanks for looking into the code. Please find my responses inline.<br>
><br>
> On Fri, Dec 4, 2015 at 7:40 AM, Yugo Nagata <<a href="mailto:nagata@sraoss.co.jp">nagata@sraoss.co.jp</a>> wrote:<br>
><br>
> > Usama,<br>
> ><br>
> > I haven't tested this specification yet, but I suspect it affects<br>
> > interlocking.<br>
> > Certainly the VIP conflict is avoided, but because there can be two<br>
> > coordinators, is it possible that there are also two lock holders? If so,<br>
> > the failover command will be executed twice on the same backend node.<br>
> ><br>
> > Am I missing something?<br>
> ><br>
><br>
> Yes, you are right. In a scenario like network partitioning, although it is<br>
> a very rare case, it is possible after this commit that the watchdog cluster<br>
> gets more than one coordinator node and, hence, the same number of lock<br>
> holders. But before this commit, the old behavior was that all nodes would<br>
> have gone into the waiting-for-quorum state and the cluster would be left<br>
> with no coordinator.<br>
><br>
> Now, if we analyze both of the above situations, the new behavior, where the<br>
> cluster has more than one coordinator node (but none of the coordinators<br>
> holds the VIP), is the lesser evil. If all the nodes were in the<br>
> waiting-for-quorum state and the cluster had no coordinator node, all<br>
> interlocking commands would fail and we would end up in one of two situations:<br>
><br>
> a-) All pgpool-II nodes in the cluster go on to execute the failover commands<br>
> (if we implement logic like "wait for lock or timeout"; see the sketch below).<br>
> b-) None of the pgpool-II nodes executes the failover (if we do not execute<br>
> the failover command when the lock command fails).<br>
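><br>
> Just to illustrate why option a-) ends with every node running the failover,<br>
> here is a rough sketch of what a "wait for lock or timeout" fallback could<br>
> look like. This is only pseudo-logic with made-up names<br>
> (wd_try_acquire_failover_lock(), do_failover()), not the actual pgpool-II code:<br>
><br>
>     /* Sketch only: hypothetical helpers, declared here just so it compiles */<br>
>     #include <stdbool.h><br>
>     #include <unistd.h><br>
>     extern bool wd_try_acquire_failover_lock(int backend_node_id);<br>
>     extern void do_failover(int backend_node_id);<br>
><br>
>     #define LOCK_WAIT_TIMEOUT_SEC 30<br>
><br>
>     static void failover_with_lock_or_timeout(int backend_node_id)<br>
>     {<br>
>         int waited = 0;<br>
><br>
>         /* try to become the lock holder; with no coordinator in the cluster<br>
>          * the lock request is never granted, so every node hits the timeout */<br>
>         while (!wd_try_acquire_failover_lock(backend_node_id))<br>
>         {<br>
>             if (waited++ >= LOCK_WAIT_TIMEOUT_SEC)<br>
>                 break;<br>
>             sleep(1);<br>
>         }<br>
><br>
>         /* ...and therefore every pgpool-II node ends up running the failover */<br>
>         do_failover(backend_node_id);<br>
>     }<br>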
><br>
> But remember that a node could go into the "waiting for quorum" state in many<br>
> other situations, for example:<br>
><br>
> 1-) Only a few pgpool-II nodes are started by the user, not enough to<br>
> complete the quorum.<br>
> 2-) A hardware failure, or some other failure, shuts down pgpool-II nodes and<br>
> the quorum is lost in the cluster.<br>
><br>
> In both of these cases, with the new design the cluster will have only one<br>
> master and can successfully coordinate the locking and unlocking, so we will<br>
> achieve the desired behavior with the new implementation. With the previous<br>
> design, in both of these cases the nodes would have gone into the<br>
> waiting-for-quorum state and there would be no node to coordinate the locking<br>
> and unlocking commands.<br>
><br>
> I can only think of one scenario where a cluster could have multiple<br>
> coordinator nodes, and that is network partitioning. If a backend failover<br>
> happens in that situation, both designs fall short of being perfect.<br>
><br>
> As far as avoiding the VIP conflict is concerned, I think both designs (the<br>
> current one without the waiting-for-quorum state and the old one) will perform<br>
> just the same. In the new design, whenever the quorum is lost or not present,<br>
> a node that becomes the coordinator will not acquire the VIP and will wait<br>
> until the quorum is complete. So even when in some case we have more than one<br>
> coordinator node in the cluster, every coordinator node will have the flag set<br>
> indicating that the quorum is not present, and none of them will acquire the<br>
> VIP or execute the wd_escalation command. Only when the cluster recovers from<br>
> the situation, and one coordinator node gets connected to the minimum number<br>
> of nodes required to complete the quorum, will the VIP be brought up. So I<br>
> think the new design serves well to make sure the IP conflict cannot happen.<br>
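><br>
> Roughly, the idea on the coordinator side is something like the sketch below.<br>
> It is only an illustration with invented names (have_quorum(),<br>
> acquire_vip_and_escalate()), not the actual watchdog.c code:<br>
><br>
>     /* Sketch only: invented helpers, not pgpool-II APIs */<br>
>     #include <stdbool.h><br>
>     extern bool have_quorum(void);              /* enough live nodes for quorum? */<br>
>     extern void acquire_vip_and_escalate(void); /* bring up VIP, run wd_escalation */<br>
><br>
>     static bool escalated = false;<br>
><br>
>     /* called when this node wins the coordinator election, and again<br>
>      * whenever the set of reachable nodes changes */<br>
>     static void coordinator_check_escalation(void)<br>
>     {<br>
>         if (!escalated && have_quorum())<br>
>         {<br>
>             /* quorum is complete: only now is it safe to bring up the VIP */<br>
>             acquire_vip_and_escalate();<br>
>             escalated = true;<br>
>         }<br>
>         /* without quorum the node stays coordinator but never touches the VIP,<br>
>          * so multiple coordinators in a partition cannot cause an IP conflict */<br>
>     }<br>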
><br>
><br>
> > Also, could you explain the difference between a coordinator node and an<br>
> > escalated node in your code?<br>
> ><br>
><br>
> The coordinator node is just another name for the master or leader watchdog<br>
> node, while the escalated node is the master/coordinator node which holds the<br>
> VIP and/or has executed the wd_escalation command. So only the<br>
> master/coordinator node can become an escalated node, and only when the<br>
> quorum is complete.<br>
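><br>
> In other words (again just an illustration with an invented struct, not the<br>
> real watchdog data structures), the two are separate properties of a node:<br>
><br>
>     #include <stdbool.h><br>
><br>
>     typedef struct<br>
>     {<br>
>         bool is_coordinator;  /* won the master/leader election */<br>
>         bool quorum_present;  /* cluster currently has enough nodes for quorum */<br>
>         bool escalated;       /* holds the VIP / has run wd_escalation */<br>
>     } wd_node_role;<br>
><br>
>     /* only a coordinator that sees a complete quorum may become escalated */<br>
>     static bool may_escalate(const wd_node_role *n)<br>
>     {<br>
>         return n->is_coordinator && n->quorum_present && !n->escalated;<br>
>     }<br>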
><br>
><br>
><br>
> Thanks<br>
> Best regards<br>
> Muhammad Usama<br>
><br>
><br>
> > Begin forwarded message:<br>
> ><br>
> > Date: Thu, 03 Dec 2015 15:14:29 +0000<br>
> > From: Muhammad Usama <<a href="mailto:m.usama@gmail.com">m.usama@gmail.com</a>><br>
> > To: <a href="mailto:pgpool-committers@pgpool.net">pgpool-committers@pgpool.net</a><br>
> > Subject: [pgpool-committers: 2844] pgpool: Watchdog node goes into the<br>
> > WD_WAITING_FOR_QUORUM state wheneve<br>
> ><br>
> ><br>
> > The watchdog node goes into the WD_WAITING_FOR_QUORUM state whenever the<br>
> > quorum is not present or lost. Although this is a good guard against<br>
> > split-brain syndrome, there is a problem with the technique: pgpool-II<br>
> > commands which require cluster-wide synchronization, like the interlocking<br>
> > commands, start failing while the node is waiting for quorum, because these<br>
> > commands require a central coordinator node for processing.<br>
> ><br>
> > The fix is to remove the WD_WAITING_FOR_QUORUM state and make sure that the<br>
> > cluster always elects a master node, even when the quorum is not present.<br>
> > The trick is not to execute the escalation commands on the master node while<br>
> > the quorum is missing, and to wait until the quorum is complete. This new<br>
> > design ensures that even when, because of network partitioning or some other<br>
> > issue, the cluster gets multiple master nodes (split-brain syndrome), the<br>
> > VIP conflict still cannot happen and the multiple master nodes are harmless.<br>
> ><br>
> > Branch<br>
> > ------<br>
> > master<br>
> ><br>
> > Details<br>
> > -------<br>
> ><br>
> > <a href="http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=6238ba9d2aaa82db415b11aca90d54981a854e09" rel="noreferrer" target="_blank">http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=6238ba9d2aaa82db415b11aca90d54981a854e09</a><br>
> ><br>
> > Modified Files<br>
> > --------------<br>
> > src/include/watchdog/watchdog.h | 1 -<br>
> > src/watchdog/watchdog.c | 455 +++++++++++++++++++++++++--------------<br>
> > src/watchdog/wd_escalation.c | 2 +-<br>
> > src/watchdog/wd_heartbeat.c | 6 +-<br>
> > src/watchdog/wd_lifecheck.c | 2 +-<br>
> > 5 files changed, 304 insertions(+), 162 deletions(-)<br>
> ><br>
> > _______________________________________________<br>
> > pgpool-committers mailing list<br>
> > <a href="mailto:pgpool-committers@pgpool.net">pgpool-committers@pgpool.net</a><br>
> > <a href="http://www.pgpool.net/mailman/listinfo/pgpool-committers" rel="noreferrer" target="_blank">http://www.pgpool.net/mailman/listinfo/pgpool-committers</a><br>
> ><br>
> ><br>
> > --<br>
> > Yugo Nagata <<a href="mailto:nagata@sraoss.co.jp">nagata@sraoss.co.jp</a>><br>
> > _______________________________________________<br>
> > pgpool-hackers mailing list<br>
> > <a href="mailto:pgpool-hackers@pgpool.net">pgpool-hackers@pgpool.net</a><br>
> > <a href="http://www.pgpool.net/mailman/listinfo/pgpool-hackers" rel="noreferrer" target="_blank">http://www.pgpool.net/mailman/listinfo/pgpool-hackers</a><br>
> ><br>
<br>
<br>
</div></div><span class=""><font color="#888888">--<br>
Yugo Nagata <<a href="mailto:nagata@sraoss.co.jp">nagata@sraoss.co.jp</a>><br>
</font></span></blockquote></div><br></div></div>