<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Aug 25, 2017 at 5:05 PM, Tatsuo Ishii <span dir="ltr"><<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">> On Fri, Aug 25, 2017 at 12:53 PM, Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp">ishii@sraoss.co.jp</a>> wrote:<br>
> >
> >> Usama,
> >>
> >> With the new patch, the regression tests all passed.
> >>
> >
> > Glad to hear that :-)
> > Did you have a chance to look at the node quarantine state I added? What
> > are your thoughts on that?
>
> I'm going to look into the patch this weekend.

Many thanks

Best Regards
Muhammad Usama

<div class="HOEnZb"><div class="h5"><br>
Best regards,<br>
--<br>
Tatsuo Ishii<br>
SRA OSS, Inc. Japan<br>
English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_<wbr>en.php</a><br>
Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.<wbr>jp</a><br>
<br>
> >> > Hi Ishii-San
> >> >
> >> > Please find the updated patch. It fixes the regression issue you were
> >> > facing and also another bug which I encountered during my testing.
> >> >
> >> > -- Adding Yugo to the thread.
> >> > Hi Yugo,
> >> >
> >> > Since you are an expert on the watchdog feature, I thought you might
> >> > have something to say, especially regarding the discussion points
> >> > mentioned in the initial mail.
> >> >
> >> >
> >> > Thanks
> >> > Best Regards
> >> > Muhammad Usama
> >> >
> >> >
> >> > On Thu, Aug 24, 2017 at 11:25 AM, Muhammad Usama <m.usama@gmail.com> wrote:
>> ><br>
>> >><br>
>> >><br>
>> >> On Thu, Aug 24, 2017 at 4:34 AM, Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp">ishii@sraoss.co.jp</a>><br>
>> wrote:<br>
>> >><br>
> >> >>> After applying the patch, many of the regression tests fail. It seems
> >> >>> pgpool.conf.sample has a bogus comment which causes the pgpool.conf
> >> >>> parser to complain with a parse error.
> >> >>>
> >> >>> 2017-08-24 08:22:36: pid 6017: FATAL: syntex error in configuration file
> >> >>> "/home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/standby/etc/pgpool.conf"
> >> >>> 2017-08-24 08:22:36: pid 6017: DETAIL: parse error at line 568 '*' token = 8
> >> >>>
> >> >>
> >> >> Really sorry, somehow I overlooked the sample config file changes I
> >> >> made at the last minute.
> >> >> Will send you the updated version.
> >> >>
> >> >> Thanks
> >> >> Best Regards
> >> >> Muhammad Usama
> >> >>
> >> >>>
> >> >>> Best regards,
> >> >>> --
> >> >>> Tatsuo Ishii
> >> >>> SRA OSS, Inc. Japan
> >> >>> English: http://www.sraoss.co.jp/index_en.php
> >> >>> Japanese: http://www.sraoss.co.jp
> >> >>>
> >> >>> > Usama,
> >> >>> >
> >> >>> > Thanks for the patch. I am going to review it.
> >> >>> >
> >> >>> > In the meantime, when I applied your patch, I got some trailing
> >> >>> > whitespace errors. Can you please fix them?
> >> >>> >
> >> >>> > /home/t-ishii/quorum_aware_failover.diff:470: trailing whitespace.
> >> >>> >
> >> >>> > /home/t-ishii/quorum_aware_failover.diff:485: trailing whitespace.
> >> >>> >
> >> >>> > /home/t-ishii/quorum_aware_failover.diff:564: trailing whitespace.
> >> >>> >
> >> >>> > /home/t-ishii/quorum_aware_failover.diff:1428: trailing whitespace.
> >> >>> >
> >> >>> > /home/t-ishii/quorum_aware_failover.diff:1450: trailing whitespace.
> >> >>> >
> >> >>> > warning: squelched 3 whitespace errors
> >> >>> > warning: 8 lines add whitespace errors.
> >> >>> >
> >> >>> > Best regards,
> >> >>> > --
> >> >>> > Tatsuo Ishii
> >> >>> > SRA OSS, Inc. Japan
> >> >>> > English: http://www.sraoss.co.jp/index_en.php
> >> >>> > Japanese: http://www.sraoss.co.jp
> >> >>> >
> >> >>> >> Hi
> >> >>> >>
> >> >>> >> I was working on the new feature to make the backend node failover
> >> >>> >> quorum aware, and halfway through the implementation I also added
> >> >>> >> the majority consensus feature for the same.
> >> >>> >>
> >> >>> >> So please find the first version of the patch for review. It makes
> >> >>> >> the backend node failover consider the watchdog cluster quorum
> >> >>> >> status and seek majority consensus before performing the failover.
> >> >>> >>
> >> >>> >> *Changes in the failover mechanism with watchdog.*
> >> >>> >> For this new feature I have modified Pgpool-II's existing failover
> >> >>> >> mechanism with watchdog.
> >> >>> >> Previously, as you know, when Pgpool-II needed to perform a node
> >> >>> >> operation (failover, failback, promote-node) with the watchdog, the
> >> >>> >> watchdog propagated the failover request to all the Pgpool-II nodes
> >> >>> >> in the watchdog cluster, and as soon as the request was received by
> >> >>> >> a node, it initiated the local failover, and that failover was
> >> >>> >> synchronised on all nodes using the distributed locks.
> >> >>> >>
> >> >>> >> *Now only the master node performs the failover.*
> >> >>> >> The attached patch changes the mechanism of synchronised failover:
> >> >>> >> now only the Pgpool-II of the master watchdog node performs the
> >> >>> >> failover, and all other standby nodes sync the backend statuses
> >> >>> >> after the master Pgpool-II has finished the failover.
> >> >>> >>
> >> >>> >> *Overview of the new failover mechanism.*
> >> >>> >> -- If the failover request is received by a standby watchdog node
> >> >>> >> (from the local Pgpool-II), that request is forwarded to the master
> >> >>> >> watchdog, and the Pgpool-II main process gets the
> >> >>> >> FAILOVER_RES_WILL_BE_DONE return code. Upon receiving
> >> >>> >> FAILOVER_RES_WILL_BE_DONE from the watchdog for the failover
> >> >>> >> request, the requesting Pgpool-II moves forward without doing
> >> >>> >> anything further for the particular failover command.
> >> >>> >>
> >> >>> >> -- When the failover request from a standby node is received by the
> >> >>> >> master watchdog, after performing the validation and applying the
> >> >>> >> consensus rules, the failover request is triggered on the local
> >> >>> >> Pgpool-II.
> >> >>> >>
> >> >>> >> -- When the failover request is received by the master watchdog
> >> >>> >> node from the local Pgpool-II (on the IPC channel), the watchdog
> >> >>> >> process informs the requesting Pgpool-II process to proceed with
> >> >>> >> the failover (provided all failover rules are satisfied).
> >> >>> >>
> >> >>> >> -- After the failover is finished on the master Pgpool-II, the
> >> >>> >> failover function calls *wd_failover_end*(), which sends the
> >> >>> >> backend-sync-required message to all standby watchdogs.
> >> >>> >>
> >> >>> >> -- Upon receiving the sync-required message from the master
> >> >>> >> watchdog node, all Pgpool-II nodes sync the new statuses of each
> >> >>> >> backend node from the master watchdog.
> >> >>> >>
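> >> >>> >> To make the flow above concrete, here is a minimal, compilable C
> >> >>> >> sketch of the routing logic. It is not the patch's actual code:
> >> >>> >> apart from FAILOVER_RES_WILL_BE_DONE and wd_failover_end(), every
> >> >>> >> name in it is invented for illustration.
> >> >>> >>
> >> >>> >>     #include <stdbool.h>
> >> >>> >>     #include <stdio.h>
> >> >>> >>
> >> >>> >>     typedef enum { FAILOVER_RES_PROCEED, FAILOVER_RES_WILL_BE_DONE } FailoverRes;
> >> >>> >>
> >> >>> >>     /* Stubs standing in for the real watchdog machinery. */
> >> >>> >>     static void forward_to_master(int node) { printf("forward failover of node %d to master\n", node); }
> >> >>> >>     static bool consensus_reached(int node) { (void) node; return true; }
> >> >>> >>     static void trigger_local_failover(int node) { printf("failing over node %d\n", node); }
> >> >>> >>     static void wd_failover_end(void) { printf("tell standbys to sync backend statuses\n"); }
> >> >>> >>
> >> >>> >>     static FailoverRes handle_failover_request(bool i_am_master, int node)
> >> >>> >>     {
> >> >>> >>         if (!i_am_master)
> >> >>> >>         {
> >> >>> >>             /* Standby: hand the request to the master and tell the
> >> >>> >>              * requesting process the failover will be done there. */
> >> >>> >>             forward_to_master(node);
> >> >>> >>             return FAILOVER_RES_WILL_BE_DONE;
> >> >>> >>         }
> >> >>> >>         /* Master: validate, apply the consensus rules, fail over
> >> >>> >>          * locally, then ask the standbys to sync the statuses. */
> >> >>> >>         if (consensus_reached(node))
> >> >>> >>         {
> >> >>> >>             trigger_local_failover(node);
> >> >>> >>             wd_failover_end();
> >> >>> >>         }
> >> >>> >>         return FAILOVER_RES_PROCEED;
> >> >>> >>     }
> >> >>> >>
> >> >>> >>     int main(void)
> >> >>> >>     {
> >> >>> >>         handle_failover_request(false, 1);  /* standby path */
> >> >>> >>         handle_failover_request(true, 1);   /* master path */
> >> >>> >>         return 0;
> >> >>> >>     }
> >> >>> >>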
> >> >>> >> *No more failover locks*
> >> >>> >> Since with this new failover mechanism we do not require any
> >> >>> >> synchronisation or guards against the execution of
> >> >>> >> failover_commands by multiple Pgpool-II nodes, the patch removes
> >> >>> >> all the distributed locks from the failover function. This makes
> >> >>> >> the failover simpler and faster.
> >> >>> >>
> >> >>> >> *New kind of failover operation: NODE_QUARANTINE_REQUEST*
> >> >>> >> The patch adds a new kind of backend node operation,
> >> >>> >> NODE_QUARANTINE, which is effectively the same as NODE_DOWN, but
> >> >>> >> with node_quarantine the failover_command is not triggered.
> >> >>> >> A NODE_DOWN_REQUEST is automatically converted to a
> >> >>> >> NODE_QUARANTINE_REQUEST when the failover is requested on a backend
> >> >>> >> node but the watchdog cluster does not hold the quorum.
> >> >>> >> This means that in the absence of quorum the failed backend nodes
> >> >>> >> are quarantined, and when the quorum becomes available again
> >> >>> >> Pgpool-II performs the failback operation on all quarantined nodes.
> >> >>> >> And again, when the failback is performed on a quarantined backend
> >> >>> >> node, the failover function does not trigger the failback_command.
> >> >>> >>
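> >> >>> >> A minimal sketch of that downgrade rule; the helper name is
> >> >>> >> invented here, only the request kinds come from the description
> >> >>> >> above:
> >> >>> >>
> >> >>> >>     #include <stdbool.h>
> >> >>> >>
> >> >>> >>     typedef enum { NODE_UP_REQUEST, NODE_DOWN_REQUEST,
> >> >>> >>                    NODE_QUARANTINE_REQUEST, PROMOTE_NODE_REQUEST } NodeRequest;
> >> >>> >>
> >> >>> >>     /* Without quorum a NODE_DOWN_REQUEST becomes a quarantine: the
> >> >>> >>      * node is detached exactly like NODE_DOWN, but failover_command
> >> >>> >>      * is never executed. */
> >> >>> >>     static NodeRequest effective_request(NodeRequest req, bool quorum_exists)
> >> >>> >>     {
> >> >>> >>         if (req == NODE_DOWN_REQUEST && !quorum_exists)
> >> >>> >>             return NODE_QUARANTINE_REQUEST;
> >> >>> >>         return req;
> >> >>> >>     }
> >> >>> >>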
> >> >>> >> *Controlling the failover behaviour.*
> >> >>> >> The patch adds three new configuration parameters to configure the
> >> >>> >> failover behaviour from the user side (a sample pgpool.conf snippet
> >> >>> >> showing all three together follows after their descriptions).
> >> >>> >>
> >> >>> >> *failover_when_quorum_exists*
> >> >>> >> When enabled, the failover command will only be executed when the
> >> >>> >> watchdog cluster holds the quorum. When the quorum is absent and
> >> >>> >> failover_when_quorum_exists is enabled, the failed backend nodes
> >> >>> >> get quarantined until the quorum becomes available again.
> >> >>> >> Disabling it restores the old behaviour of failover commands.
> >> >>> >>
> >> >>> >>
> >> >>> >> *failover_require_consensus*
> >> >>> >> This new configuration parameter can be used to make sure we get a
> >> >>> >> majority vote before performing the failover on a node. When
> >> >>> >> *failover_require_consensus* is enabled, the failover is only
> >> >>> >> performed after receiving failover requests from a majority of
> >> >>> >> Pgpool-II nodes.
> >> >>> >> For example, in a three-node cluster the failover will not be
> >> >>> >> performed until at least two nodes ask for performing the failover
> >> >>> >> on the particular backend node.
> >> >>> >>
> >> >>> >> It is also worthwhile to mention here that
> >> >>> >> *failover_require_consensus* only works when
> >> >>> >> failover_when_quorum_exists is enabled.
> >> >>> >>
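> >> >>> >> The majority rule itself is just integer arithmetic; a sketch with
> >> >>> >> an invented helper name (not the patch's code):
> >> >>> >>
> >> >>> >>     /* Strictly more than half of the Pgpool-II nodes must vote for
> >> >>> >>      * the failover, e.g. 2 of 3, 3 of 4, 3 of 5. */
> >> >>> >>     static int votes_needed_for_consensus(int total_pgpool_nodes)
> >> >>> >>     {
> >> >>> >>         return total_pgpool_nodes / 2 + 1;
> >> >>> >>     }
> >> >>> >>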
> >> >>> >>
> >> >>> >> *enable_multiple_failover_requests_from_node*
> >> >>> >> This parameter works in connection with the
> >> >>> >> *failover_require_consensus* config. When enabled, a single
> >> >>> >> Pgpool-II node can vote for failover multiple times.
> >> >>> >> For example, in the three-node cluster, if one Pgpool-II node sends
> >> >>> >> the failover request for a particular node twice, that would be
> >> >>> >> counted as two votes in favour of the failover, and the failover
> >> >>> >> will be performed even if we do not get a vote from the other two
> >> >>> >> nodes.
> >> >>> >>
> >> >>> >> And when *enable_multiple_failover_requests_from_node* is disabled,
> >> >>> >> only the first vote from each Pgpool-II will be accepted and all
> >> >>> >> subsequent votes will be marked duplicate and rejected.
> >> >>> >> So in that case we will require majority votes from distinct nodes
> >> >>> >> to execute the failover.
> >> >>> >> Again, this *enable_multiple_failover_requests_from_node* only
> >> >>> >> becomes effective when both *failover_when_quorum_exists* and
> >> >>> >> *failover_require_consensus* are enabled.
> >> >>> >>
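> >> >>> >> As promised above, a sample pgpool.conf snippet with the three
> >> >>> >> parameters together. The values are only an illustration; the
> >> >>> >> defaults are still an open question (see the end of this mail):
> >> >>> >>
> >> >>> >>     # Execute failover commands only while the watchdog cluster
> >> >>> >>     # holds the quorum; otherwise quarantine the failed backends.
> >> >>> >>     failover_when_quorum_exists = on
> >> >>> >>
> >> >>> >>     # Require failover requests from a majority of Pgpool-II nodes
> >> >>> >>     # before executing (only effective when the above is on).
> >> >>> >>     failover_require_consensus = on
> >> >>> >>
> >> >>> >>     # When off, only the first vote from each Pgpool-II node
> >> >>> >>     # counts; duplicate votes are rejected.
> >> >>> >>     enable_multiple_failover_requests_from_node = off
> >> >>> >>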
> >> >>> >>
> >> >>> >> *Controlling the failover: the coding perspective.*
> >> >>> >> Although the failover functions are made quorum and consensus
> >> >>> >> aware, there is still a way to bypass the quorum conditions and the
> >> >>> >> requirement of consensus.
> >> >>> >>
> >> >>> >> For this, the patch uses the existing request_details flags in
> >> >>> >> POOL_REQUEST_NODE to control the behaviour of the failover.
> >> >>> >>
> >> >>> >> Here are the newly added flag values.
> >> >>> >>
> >> >>> >> *REQ_DETAIL_WATCHDOG*:
> >> >>> >> Setting this flag while issuing the failover command will not send
> >> >>> >> the failover request to the watchdog. This flag may not be useful
> >> >>> >> anywhere other than where it is already used.
> >> >>> >> Mostly this flag is there to keep a failover command that already
> >> >>> >> originated from the watchdog from going back to the watchdog;
> >> >>> >> otherwise we could end up in an infinite loop.
> >> >>> >>
> >> >>> >> *REQ_DETAIL_CONFIRMED*:
> >> >>> >> Setting this flag will bypass the *failover_require_consensus*
> >> >>> >> configuration and immediately perform the failover if quorum is
> >> >>> >> present. This flag can be used for failover requests originating
> >> >>> >> from a PCP command.
> >> >>> >>
> >> >>> >> *REQ_DETAIL_UPDATE*:
> >> >>> >> This flag is used for the commands where we are failing back the
> >> >>> >> quarantined nodes. Setting this flag will not trigger the
> >> >>> >> failback_command.
> >> >>> >>
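> >> >>> >> A sketch of how such flags are typically declared and tested. The
> >> >>> >> flag names come from the description above, but the bit values and
> >> >>> >> the helper are invented here:
> >> >>> >>
> >> >>> >>     #define REQ_DETAIL_WATCHDOG  0x01  /* don't forward to watchdog */
> >> >>> >>     #define REQ_DETAIL_CONFIRMED 0x02  /* skip consensus voting */
> >> >>> >>     #define REQ_DETAIL_UPDATE    0x04  /* quarantine failback: don't
> >> >>> >>                                         * run failback_command */
> >> >>> >>
> >> >>> >>     /* A request that already came from the watchdog carries
> >> >>> >>      * REQ_DETAIL_WATCHDOG and must not be sent back, avoiding the
> >> >>> >>      * infinite loop mentioned above. */
> >> >>> >>     static int should_forward_to_watchdog(unsigned int request_details)
> >> >>> >>     {
> >> >>> >>         return (request_details & REQ_DETAIL_WATCHDOG) == 0;
> >> >>> >>     }
> >> >>> >>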
> >> >>> >> *Some conditional flags used:*
> >> >>> >> I was not sure about the configuration of each type of failover
> >> >>> >> operation. We have three main failover operations: NODE_UP_REQUEST,
> >> >>> >> NODE_DOWN_REQUEST, and PROMOTE_NODE_REQUEST.
> >> >>> >> So I was thinking: do we need to give users a configuration option
> >> >>> >> to enable/disable quorum checking and consensus for each individual
> >> >>> >> failover operation type?
> >> >>> >> For example, is it a practical configuration where a user would
> >> >>> >> want to ensure quorum while performing a NODE_DOWN operation, but
> >> >>> >> does not want it for NODE_UP?
> >> >>> >> So in this patch I use three compile-time defines to enable/disable
> >> >>> >> the individual failover operations, until we decide on the best
> >> >>> >> solution:
> >> >>> >>
> >> >>> >> NODE_UP_REQUIRE_CONSENSUS: defining it will enable the quorum
> >> >>> >> checking feature for NODE_UP_REQUESTs
> >> >>> >>
> >> >>> >> NODE_DOWN_REQUIRE_CONSENSUS: defining it will enable the quorum
> >> >>> >> checking feature for NODE_DOWN_REQUESTs
> >> >>> >>
> >> >>> >> NODE_PROMOTE_REQUIRE_CONSENSUS: defining it will enable the quorum
> >> >>> >> checking feature for PROMOTE_NODE_REQUESTs
> >> >>> >>
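> >> >>> >> One way those compile-time switches could gate the consensus
> >> >>> >> check; a sketch only, with the surrounding function invented and
> >> >>> >> the request enum repeated from the quarantine sketch above:
> >> >>> >>
> >> >>> >>     #include <stdbool.h>
> >> >>> >>
> >> >>> >>     typedef enum { NODE_UP_REQUEST, NODE_DOWN_REQUEST,
> >> >>> >>                    NODE_QUARANTINE_REQUEST, PROMOTE_NODE_REQUEST } NodeRequest;
> >> >>> >>
> >> >>> >>     static bool request_needs_consensus(NodeRequest req)
> >> >>> >>     {
> >> >>> >>         switch (req)
> >> >>> >>         {
> >> >>> >>     #ifdef NODE_UP_REQUIRE_CONSENSUS
> >> >>> >>             case NODE_UP_REQUEST:      return true;
> >> >>> >>     #endif
> >> >>> >>     #ifdef NODE_DOWN_REQUIRE_CONSENSUS
> >> >>> >>             case NODE_DOWN_REQUEST:    return true;
> >> >>> >>     #endif
> >> >>> >>     #ifdef NODE_PROMOTE_REQUIRE_CONSENSUS
> >> >>> >>             case PROMOTE_NODE_REQUEST: return true;
> >> >>> >>     #endif
> >> >>> >>             default:                   return false;
> >> >>> >>         }
> >> >>> >>     }
> >> >>> >>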
> >> >>> >> *Some points for discussion:*
> >> >>> >>
> >> >>> >> *Do we really need to check the Req_info->switching flag before
> >> >>> >> enqueuing a failover request?*
> >> >>> >> While working on the patch I was wondering why we disallow
> >> >>> >> enqueuing the failover command when a failover is already in
> >> >>> >> progress. For example, in the *pcp_process_command*() function, if
> >> >>> >> we see the *Req_info->switching* flag set, we bail out with an
> >> >>> >> error instead of enqueuing the command. Is it really necessary?
> >> >>> >>
> >> >>> >> *Do we need more granular control over each failover operation?*
> >> >>> >> As described in the section "Some conditional flags used", I want
> >> >>> >> opinions on whether we need configuration parameters in pgpool.conf
> >> >>> >> to enable/disable quorum and consensus checking for individual
> >> >>> >> failover types.
> >> >>> >>
> >> >>> >> *Which failover requests should be marked as confirmed?*
> >> >>> >> As explained in the section on REQ_DETAIL_CONFIRMED above, we can
> >> >>> >> mark a failover request as not needing consensus; currently the
> >> >>> >> requests from PCP commands are fired with this flag. But I was
> >> >>> >> wondering if there may be more places where we need to use the
> >> >>> >> flag.
> >> >>> >> For example, I currently use the same confirmed flag when failover
> >> >>> >> is triggered because of *replication_stop_on_mismatch*.
> >> >>> >>
> >> >>> >> I think we should consider this flag for each source of failover,
> >> >>> >> i.e. when the failover is triggered
> >> >>> >> because of a health_check failure,
> >> >>> >> because of a replication mismatch,
> >> >>> >> because of a backend error,
> >> >>> >> etc.
> >> >>> >>
> >> >>> >> *Node quarantine behaviour.*
> >> >>> >> What do you think about the node quarantine state used by this
> >> >>> >> patch? Can you think of any problem which could be caused by it?
> >> >>> >>
> >> >>> >> *What should be the default values for each newly added config
> >> >>> >> parameter?*
> >> >>> >>
> >> >>> >>
> >> >>> >> *TODOs*
> >> >>> >>
> >> >>> >> -- Updating the documentation is still a todo. Will do that once
> >> >>> >> every aspect of the feature is finalised.
> >> >>> >> -- Some code warnings and cleanups are still to be done.
> >> >>> >> -- I am still a little short on testing.
> >> >>> >> -- Regression test cases for the feature.
> >> >>> >>
> >> >>> >>
> >> >>> >> Thoughts and suggestions are most welcome.
> >> >>> >>
> >> >>> >> Thanks
> >> >>> >> Best regards
> >> >>> >> Muhammad Usama
> >> >>> > _______________________________________________
> >> >>> > pgpool-hackers mailing list
> >> >>> > pgpool-hackers@pgpool.net
> >> >>> > http://www.pgpool.net/mailman/listinfo/pgpool-hackers