<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Aug 24, 2017 at 4:15 AM, Tatsuo Ishii <span dir="ltr"><<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Usama,<br>
<br>
Thanks for the patch. I am going to review it.<br></blockquote><div><br></div><div>Thanks :-) </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
In the meantime, when I applied your patch, I got some trailing<br>
whitespace errors. Can you please fix them?<br>
<br>
/home/t-ishii/quorum_aware_<wbr>failover.diff:470: trailing whitespace.<br>
<br>
/home/t-ishii/quorum_aware_<wbr>failover.diff:485: trailing whitespace.<br>
<br>
/home/t-ishii/quorum_aware_<wbr>failover.diff:564: trailing whitespace.<br>
<br>
/home/t-ishii/quorum_aware_<wbr>failover.diff:1428: trailing whitespace.<br>
<br>
/home/t-ishii/quorum_aware_<wbr>failover.diff:1450: trailing whitespace.<br>
<br>
warning: squelched 3 whitespace errors<br>
warning: 8 lines add whitespace errors.<br></blockquote><div><br></div><div>Yes, there are these, and some more code cleanup is still remaining. I wanted to share the first version</div><div>as soon as possible because I want to get the community's consent on the changed behaviour.</div><div><br></div><div>The final version will have all these issues addressed.</div><div><br></div><div>Thanks</div><div>Best Regards</div><div>Muhammad Usama</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Best regards,<br>
--<br>
Tatsuo Ishii<br>
SRA OSS, Inc. Japan<br>
English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_<wbr>en.php</a><br>
Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.<wbr>jp</a><br>
<span class=""><br>
> Hi<br>
><br>
> I was working on the new feature to make the backend node failover quorum<br>
> aware, and halfway through the implementation I also added the<br>
> majority consensus feature for the same.<br>
><br>
> So please find the first version of the patch for review that makes the<br>
> backend node failover consider the watchdog cluster quorum status and seek<br>
> the majority consensus before performing failover.<br>
><br>
</span>> *Changes in the Failover mechanism with watchdog.*<br>
<span class="">> For this new feature I have modified the Pgpool-II's existing failover<br>
> mechanism with watchdog.<br>
> Previously, as you know, when Pgpool-II needed to perform a node<br>
> operation (failover, failback, promote-node) with the watchdog, the<br>
> watchdog used to propagate the failover request to all the Pgpool-II nodes<br>
> in the watchdog cluster, and as soon as the request was received by a<br>
> node, it used to initiate the local failover, and that failover was<br>
> synchronised on all nodes using the distributed locks.<br>
><br>
</span>> *Now Only the Master node performs the failover.*<br>
<span class="">> The attached patch changes the mechanism of synchronised failover, and now<br>
> only the Pgpool-II of master watchdog node performs the failover, and all<br>
> other standby nodes sync the backend statuses after the master Pgpool-II is<br>
> finished with the failover.<br>
><br>
</span>> *Overview of new failover mechanism.*<br>
<span class="">> -- If the failover request is received by the standby watchdog node (from<br>
> the local Pgpool-II), that request is forwarded to the master watchdog and the<br>
> Pgpool-II main process is returned the FAILOVER_RES_WILL_BE_DONE<br>
> return code. Upon receiving FAILOVER_RES_WILL_BE_DONE from the<br>
> watchdog for the failover request, the requesting Pgpool-II moves forward<br>
> without doing anything further for the particular failover command.<br>
><br>
> -- Now when the failover request from a standby node is received by the<br>
> master watchdog, after performing the validation and applying the consensus<br>
> rules, the failover request is triggered on the local Pgpool-II.<br>
><br>
> -- When the failover request is received by the master watchdog node from<br>
> the local Pgpool-II (on the IPC channel), the watchdog process informs the<br>
> requesting Pgpool-II process to proceed with failover (provided all<br>
> failover rules are satisfied).<br>
><br>
> -- After the failover is finished on the master Pgpool-II, the failover<br>
</span>> function calls the *wd_failover_end*() which sends the backend sync<br>
<span class="">> required message to all standby watchdogs.<br>
><br>
> -- Upon receiving the sync required message from master watchdog node all<br>
> Pgpool-II sync the new statuses of each backend node from the master<br>
> watchdog.<br>
><br>
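</span></blockquote><div>A rough Python sketch of the request routing described in the overview above. It is only an illustration, not the C code in the patch: apart from FAILOVER_RES_WILL_BE_DONE, the names used here (FailoverRes, PROCEED, REJECTED, route_failover_request) are assumptions made up for the example, and the result returned by the master when the failover rules are not satisfied is assumed as well.</div>

<pre>
```python
from enum import Enum, auto


class FailoverRes(Enum):
    WILL_BE_DONE = auto()  # FAILOVER_RES_WILL_BE_DONE from the overview
    PROCEED = auto()       # hypothetical: requester runs the failover itself
    REJECTED = auto()      # hypothetical: failover rules were not satisfied


def route_failover_request(is_master, rules_satisfied, forwarded):
    """Return the code handed back to the requesting Pgpool-II process.

    forwarded collects the requests a standby passes on to the master watchdog.
    """
    if not is_master:
        # Standby watchdog: forward to the master, the requester just moves on.
        forwarded.append("failover request")
        return FailoverRes.WILL_BE_DONE
    # Master watchdog: the local Pgpool-II proceeds only if the rules pass.
    if rules_satisfied:
        return FailoverRes.PROCEED
    return FailoverRes.REJECTED
```
</pre>

<div>On a standby the request is only queued for the master and the requester gets WILL_BE_DONE; on the master the same call either lets the local Pgpool-II proceed or turns the request down.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">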
</span>> *No More Failover locks*<br>
<span class="">> Since with this new failover mechanism we do not require any<br>
> synchronisation or guards against the execution of failover_commands by<br>
> multiple Pgpool-II nodes, the patch removes all the distributed locks<br>
> from the failover function. This makes the failover simpler and faster.<br>
><br>
</span>> *New kind of Failover operation NODE_QUARANTINE_REQUEST*<br>
<span class="">> The patch adds a new kind of backend node operation, NODE_QUARANTINE, which<br>
> is effectively the same as NODE_DOWN, but with node_quarantine the<br>
> failover_command is not triggered.<br>
> The NODE_DOWN_REQUEST is automatically converted to the<br>
> NODE_QUARANTINE_REQUEST when the failover is requested on the backend node<br>
> but the watchdog cluster does not hold the quorum.<br>
> This means that in the absence of quorum the failed backend nodes are<br>
> quarantined, and when the quorum becomes available again Pgpool-II<br>
> performs the failback operation on all quarantined nodes.<br>
> And again, when the failback is performed on a quarantined backend node the<br>
> failover function does not trigger the failback_command.<br>
><br>
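</span></blockquote><div>The conversion described above can be sketched roughly as follows. This is a simplified Python model and not the patch code: the request names come from the description, while the function names (effective_request, runs_failover_command) and the use of plain strings for requests are assumptions for illustration. The failover_when_quorum_exists setting is the configuration parameter described further down.</div>

<pre>
```python
def effective_request(request, quorum_present, failover_when_quorum_exists):
    """Map a requested node operation to the one that is actually executed."""
    if (request == "NODE_DOWN_REQUEST"
            and failover_when_quorum_exists
            and not quorum_present):
        # No quorum: the node is quarantined instead of failed over.
        return "NODE_QUARANTINE_REQUEST"
    return request


def runs_failover_command(request):
    """Quarantine takes the node out of service without failover_command."""
    return request == "NODE_DOWN_REQUEST"
```
</pre>

<div>So a NODE_DOWN_REQUEST arriving without quorum becomes a NODE_QUARANTINE_REQUEST, and failover_command is skipped for it.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">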
</span>> *Controlling the Failover behaviour.*<br>
<span class="">> The patch adds three new configuration parameters to configure the failover<br>
> behaviour from user side.<br>
><br>
</span>> *failover_when_quorum_exists*<br>
<span class="">> When enabled, the failover command will only be executed when the watchdog<br>
> cluster holds the quorum. When the quorum is absent and<br>
> failover_when_quorum_exists is enabled, the failed backend nodes will be<br>
> quarantined until the quorum becomes available again.<br>
> Disabling it will enable the old behaviour of failover commands.<br>
><br>
><br>
</span>> *failover_require_consensus*<br>
> This new configuration parameter can be used to<br>
<span class="">> make sure we get the majority vote before performing the failover on the<br>
</span>> node. When *failover_require_consensus* is enabled then the failover is<br>
<span class="">> only performed after receiving the failover request from the majority of<br>
> Pgpool-II nodes.<br>
> For example, in a three-node cluster the failover will not be performed until<br>
> at least two nodes ask for the failover to be performed on the particular<br>
> backend node.<br>
><br>
</span>> It is also worthwhile to mention here that *failover_require_consensus*<br>
<span class="">> only works when failover_when_quorum_exists is enabled.<br>
><br>
><br>
</span>> *enable_multiple_failover_requests_from_node*<br>
> This parameter works in conjunction with the *failover_require_consensus*<br>
<span class="">> config. When enabled, a single Pgpool-II node can vote for failover multiple<br>
> times.<br>
> For example, in a three-node cluster, if one Pgpool-II node sends the<br>
> failover request for a particular node twice, that would be counted as two<br>
> votes in favour of failover, and the failover will be performed even if we<br>
> do not get a vote from the other two nodes.<br>
><br>
</span>> And when *enable_multiple_failover_requests_from_node* is disabled, only<br>
<span class="">> the first vote from each Pgpool-II will be accepted and all<br>
> subsequent votes will be marked duplicate and rejected.<br>
> So in that case we will require a majority of votes from distinct nodes to<br>
> execute the failover.<br>
</span>> Again, *enable_multiple_failover_requests_from_node* only becomes<br>
> effective when both *failover_when_quorum_exists* and<br>
> *failover_require_consensus* are enabled.<br>
><br>
><br>
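</blockquote><div>To make the vote counting above concrete, here is a small Python sketch. It is a model of the described behaviour only, not the patch code: consensus_reached and its arguments are hypothetical names, and the majority is assumed to be a strict majority of the cluster size (two out of three nodes, as in the example above).</div>

<pre>
```python
def consensus_reached(votes, cluster_size, allow_multiple_from_node):
    """votes lists the Pgpool-II node ids that asked to fail over one backend.

    allow_multiple_from_node models enable_multiple_failover_requests_from_node.
    """
    if allow_multiple_from_node:
        count = len(votes)       # every request counts as a vote
    else:
        count = len(set(votes))  # repeated votes from one node are rejected
    majority = cluster_size // 2 + 1  # assumption: strict majority
    return count >= majority
```
</pre>

<div>With three nodes, two requests from the same node are enough only when multiple requests per node are allowed; otherwise two distinct nodes have to ask.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">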
> *Controlling the failover: The Coding perspective.*<br>
<span class="">> Although the failover functions are made quorum and consensus aware,<br>
> there is still a way to bypass the quorum conditions and the requirement of<br>
> consensus.<br>
><br>
> For this the patch uses the existing request_details flags in<br>
> POOL_REQUEST_NODE to control the behaviour of failover.<br>
><br>
> Here are the newly added flag values.<br>
><br>
</span>> *REQ_DETAIL_WATCHDOG*:<br>
<span class="">> Setting this flag while issuing the failover command will not send the<br>
> failover request to the watchdog. But this flag may not be useful in any<br>
> other place than where it is already used.<br>
> Mostly this flag can be used to keep a failover command that already<br>
> originated from the watchdog from going back to the watchdog. Otherwise we can end up<br>
> in an infinite loop.<br>
><br>
</span>> *REQ_DETAIL_CONFIRMED*:<br>
> Setting this flag will bypass the *failover_require_consensus*<br>
<span class="">> configuration and immediately perform the failover if quorum is present.<br>
> This flag can be used to issue the failover request originated from PCP<br>
> command.<br>
><br>
</span>> *REQ_DETAIL_UPDATE*:<br>
<span class="">> This flag is used for the command where we are failing back the quarantined<br>
> nodes. Setting this flag will not trigger the failback_command.<br>
><br>
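</span></blockquote><div>A sketch of how the three flags described above affect a request, written in Python for brevity. The flag names follow the description, but the bit values and the helper functions (forwards_to_watchdog, needs_consensus, runs_failback_command) are assumptions for illustration and are not taken from the patch.</div>

<pre>
```python
from enum import IntFlag


class ReqDetail(IntFlag):
    # Bit values are placeholders, not the ones used in the patch.
    WATCHDOG = 1   # REQ_DETAIL_WATCHDOG
    CONFIRMED = 2  # REQ_DETAIL_CONFIRMED
    UPDATE = 4     # REQ_DETAIL_UPDATE


def forwards_to_watchdog(flags):
    """Requests flagged WATCHDOG are not sent to the watchdog again."""
    return ReqDetail.WATCHDOG not in flags


def needs_consensus(flags, failover_require_consensus):
    """CONFIRMED bypasses failover_require_consensus (quorum still applies)."""
    return bool(failover_require_consensus) and ReqDetail.CONFIRMED not in flags


def runs_failback_command(flags):
    """UPDATE marks a failback of quarantined nodes: no failback_command."""
    return ReqDetail.UPDATE not in flags
```
</pre>

<div>For example, a request coming from a PCP command would carry ReqDetail.CONFIRMED, so needs_consensus returns False for it even when failover_require_consensus is enabled.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">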
</span>> *Some conditional flags used:*<br>
<span class="">> I was not sure about the configuration of each type of failover operation.<br>
> We have three main failover operations: NODE_UP_REQUEST,<br>
> NODE_DOWN_REQUEST, and PROMOTE_NODE_REQUEST.<br>
> So I was wondering whether we need to give users a configuration option<br>
> to enable/disable quorum checking and consensus for each individual<br>
> failover operation type.<br>
> For example: is it a practical configuration where a user would want to<br>
> ensure quorum while performing the NODE_DOWN operation but does not want it<br>
> for NODE_UP?<br>
> So in this patch I use three compile-time defines to enable or disable the feature for<br>
> individual failover operations, while we decide on the best solution.<br>
><br>
> NODE_UP_REQUIRE_CONSENSUS: defining it will enable quorum checking feature<br>
> for NODE_UP_REQUESTs<br>
><br>
> NODE_DOWN_REQUIRE_CONSENSUS: defining it will enable quorum checking<br>
> feature for NODE_DOWN_REQUESTs<br>
><br>
> NODE_PROMOTE_REQUIRE_CONSENSUS: defining it will enable quorum checking<br>
> feature for PROMOTE_NODE_REQUESTs<br>
><br>
</span>> *Some Points for Discussion:*<br>
><br>
> *Do we really need to check the Req_info->switching flag before enqueuing<br>
> a failover request?*<br>
<span class="">> While working on the patch I was wondering why we disallow enqueuing the<br>
> failover command when a failover is already in progress. For example, in the<br>
</span>> *pcp_process_command*() function, if we see the *Req_info->switching* flag<br>
<span class="">> set, we bail out with an error instead of enqueuing the command. Is it<br>
> really necessary?<br>
><br>
</span>> *Do we need more granular control over each failover operation:*<br>
<span class="">> As described in the section "Some conditional flags used", I would like opinions on<br>
> whether we need configuration parameters in pgpool.conf to enable or disable quorum<br>
> and consensus checking for individual failover types.<br>
><br>
</span>> *Which failovers should be marked as Confirmed:*<br>
<span class="">> As defined in the above section on REQ_DETAIL_CONFIRMED, we can mark a<br>
> failover request as not needing consensus. Currently the requests from the PCP<br>
> commands are fired with this flag, but I was wondering whether there may be more<br>
> places where we may need to use the flag.<br>
> For example, I currently use the same confirmed flag when failover is<br>
</span>> triggered because of *replication_stop_on_mismatch*.<br>
<span class="">><br>
> I think we should consider this flag for each place of failover, like when the<br>
> failover is triggered<br>
> because of health_check failure,<br>
> because of replication mismatch,<br>
> because of backend_error,<br>
> etc.<br>
><br>
</span>> *Node Quarantine behaviour.*<br>
<span class="">> What do you think about the node quarantine used by this patch. Can you<br>
> think of some problem which can be caused by this?<br>
><br>
</span>> *What should be the default values for each of the newly added config parameters?*<br>
><br>
><br>
><br>
> *TODOs*<br>
<div class="HOEnZb"><div class="h5">><br>
> -- Updating the documentation is still to do. I will do that once every aspect<br>
> of the feature is finalised.<br>
> -- Some code warnings and cleanups are still not done.<br>
> -- I am still a little short on testing.<br>
> -- Regression test cases for the feature<br>
><br>
><br>
> Thoughts and suggestions are most welcome.<br>
><br>
> Thanks<br>
> Best regards<br>
> Muhammad Usama<br>
</div></div></blockquote></div><br></div></div>