[pgpool-hackers: 2491] Re: New Feature with patch: Quorum and Consensus for backend failover

Tatsuo Ishii ishii at sraoss.co.jp
Thu Aug 24 08:34:49 JST 2017


After applying the patch, many of the regression tests fail. It seems
pgpool.conf.sample has a bogus comment which causes the pgpool.conf
parser to report a parse error.

2017-08-24 08:22:36: pid 6017: FATAL:  syntex error in configuration file "/home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/standby/etc/pgpool.conf"
2017-08-24 08:22:36: pid 6017: DETAIL:  parse error at line 568 '*' token = 8

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Usama,
> 
> Thanks for the patch. I am going to review it.
> 
> In the meantime, when applying your patch I got some trailing
> whitespace errors. Can you please fix them?
> 
> /home/t-ishii/quorum_aware_failover.diff:470: trailing whitespace.
> 		
> /home/t-ishii/quorum_aware_failover.diff:485: trailing whitespace.
> 		
> /home/t-ishii/quorum_aware_failover.diff:564: trailing whitespace.
> 				
> /home/t-ishii/quorum_aware_failover.diff:1428: trailing whitespace.
> 	
> /home/t-ishii/quorum_aware_failover.diff:1450: trailing whitespace.
> 	
> warning: squelched 3 whitespace errors
> warning: 8 lines add whitespace errors.
> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
> 
>> Hi
>> 
>> I have been working on a new feature to make backend node failover
>> quorum aware, and halfway through the implementation I also added a
>> majority-consensus feature on top of it.
>> 
>> So please find the first version of the patch for review; it makes
>> backend node failover consider the watchdog cluster's quorum status
>> and seek a majority consensus before performing the failover.
>> 
>> *Changes in the failover mechanism with watchdog.*
>> For this new feature I have modified Pgpool-II's existing failover
>> mechanism with watchdog.
>> Previously, as you know, when Pgpool-II needed to perform a node
>> operation (failover, failback, promote-node) with the watchdog, the
>> watchdog propagated the failover request to all the Pgpool-II nodes
>> in the watchdog cluster. As soon as a node received the request, it
>> initiated a local failover, and that failover was synchronised on all
>> nodes using distributed locks.
>> 
>> *Now only the master node performs the failover.*
>> The attached patch changes the synchronised failover mechanism: now
>> only the Pgpool-II on the master watchdog node performs the failover,
>> and all other standby nodes sync the backend statuses after the
>> master Pgpool-II has finished the failover.
>> 
>> *Overview of the new failover mechanism* (a short sketch follows this
>> list).
>> -- If a failover request is received by a standby watchdog node (from
>> the local Pgpool-II), that request is forwarded to the master
>> watchdog, and the Pgpool-II main process gets back the
>> FAILOVER_RES_WILL_BE_DONE return code. Upon receiving
>> FAILOVER_RES_WILL_BE_DONE from the watchdog for the failover request,
>> the requesting Pgpool-II moves on without doing anything further for
>> that particular failover command.
>> 
>> -- When the master watchdog receives a failover request from a
>> standby node, it validates the request, applies the consensus rules,
>> and then triggers the failover on its local Pgpool-II.
>> 
>> -- When the master watchdog node receives the failover request from
>> its local Pgpool-II (on the IPC channel), the watchdog process
>> informs the requesting Pgpool-II process to proceed with the failover
>> (provided all failover rules are satisfied).
>> 
>> -- After the failover is finished on the master Pgpool-II, the
>> failover function calls *wd_failover_end*(), which sends a "backend
>> sync required" message to all standby watchdogs.
>> 
>> -- Upon receiving the sync-required message from the master watchdog
>> node, all standby Pgpool-II nodes sync the new statuses of the
>> backend nodes from the master watchdog.
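>> 
>> As a minimal sketch, the routing looks roughly like this (C-style
>> sketch; all helper names except *wd_failover_end*() and the
>> FAILOVER_RES_WILL_BE_DONE result are illustrative, not the patch's
>> actual API):
>> 
>>     /* Hypothetical sketch of the new failover request routing. */
>>     static void wd_process_failover_request(FailoverRequest *req)
>>     {
>>         if (!wd_i_am_master())
>>         {
>>             /* Standby: forward to master, tell the local Pgpool-II
>>              * to move on without doing anything further. */
>>             wd_forward_to_master(req);
>>             wd_reply_to_local_pgpool(FAILOVER_RES_WILL_BE_DONE);
>>             return;
>>         }
>>         /* Master: validate, apply quorum/consensus rules, execute. */
>>         if (wd_failover_rules_satisfied(req))
>>         {
>>             wd_trigger_local_failover(req);
>>             wd_failover_end();  /* ask standbys to sync statuses */
>>         }
>>     }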
>> 
>> *No more failover locks*
>> Since with this new failover mechanism we no longer require any
>> synchronisation or guards against the execution of failover_commands
>> by multiple Pgpool-II nodes, the patch removes all the distributed
>> locks from the failover function. This makes the failover simpler and
>> faster.
>> 
>> *A new kind of failover operation: NODE_QUARANTINE_REQUEST*
>> The patch adds a new kind of backend node operation, NODE_QUARANTINE,
>> which is effectively the same as NODE_DOWN, except that with
>> node_quarantine the failover_command is not triggered.
>> A NODE_DOWN_REQUEST is automatically converted to a
>> NODE_QUARANTINE_REQUEST when failover is requested on a backend node
>> but the watchdog cluster does not hold the quorum.
>> This means that in the absence of quorum, failed backend nodes are
>> quarantined, and when the quorum becomes available again, Pgpool-II
>> performs a failback operation on all quarantined nodes.
>> And again, when that failback is performed on a quarantined backend
>> node, the failover function does not trigger the failback_command.
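>> 
>> A minimal sketch of that conversion, assuming a hypothetical
>> wd_cluster_has_quorum() helper:
>> 
>>     /* Illustrative only: without quorum, demote a DOWN request to
>>      * QUARANTINE so the node is detached like a down node but
>>      * failover_command is never executed. */
>>     if (req->kind == NODE_DOWN_REQUEST && !wd_cluster_has_quorum())
>>         req->kind = NODE_QUARANTINE_REQUEST;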
>> 
>> *Controlling the failover behaviour.*
>> The patch adds three new configuration parameters for controlling the
>> failover behaviour from the user side (a sample pgpool.conf snippet
>> follows the parameter descriptions).
>> 
>> *failover_when_quorum_exists*
>> When enabled, the failover command will only be executed when the
>> watchdog cluster holds the quorum. When the quorum is absent and
>> failover_when_quorum_exists is enabled, failed backend nodes are
>> quarantined until the quorum becomes available again.
>> Disabling it restores the old behaviour of failover commands.
>> 
>> 
>> *failover_require_consensus*
>> This new configuration parameter can be used to make sure we get a
>> majority vote before performing a failover on a node. When
>> *failover_require_consensus* is enabled, the failover is only
>> performed after receiving failover requests from a majority of
>> Pgpool-II nodes.
>> For example, in a three-node cluster the failover will not be
>> performed until at least two nodes ask for a failover on the
>> particular backend node.
>> 
>> It is also worth mentioning here that *failover_require_consensus*
>> only works when failover_when_quorum_exists is enabled.
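>> 
>> The majority threshold itself is simple integer arithmetic; as a
>> sketch (variable names are illustrative):
>> 
>>     /* Majority of N watchdog nodes: strictly more than half must
>>      * vote for the failover (integer division rounds down). */
>>     int required_votes = (total_wd_nodes / 2) + 1;  /* N = 3 -> 2 */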
>> 
>> 
>> *enable_multiple_failover_requests_from_node*
>> This parameter works in connection with the
>> *failover_require_consensus* config. When enabled, a single Pgpool-II
>> node can vote for a failover multiple times.
>> For example, in a three-node cluster, if one Pgpool-II node sends the
>> failover request for a particular backend node twice, that is counted
>> as two votes in favour of the failover, and the failover will be
>> performed even if we do not get a vote from the other two nodes.
>> 
>> When *enable_multiple_failover_requests_from_node* is disabled, only
>> the first vote from each Pgpool-II will be accepted; all subsequent
>> votes will be marked as duplicates and rejected.
>> So in that case a majority of votes from distinct nodes is required
>> to execute the failover.
>> Again, *enable_multiple_failover_requests_from_node* only becomes
>> effective when both *failover_when_quorum_exists* and
>> *failover_require_consensus* are enabled.
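>> 
>> Put together, the three parameters might look like this in
>> pgpool.conf (the on/off values here are only illustrative; the
>> defaults are an open question, see below):
>> 
>>     failover_when_quorum_exists = on
>>     failover_require_consensus = on
>>     enable_multiple_failover_requests_from_node = off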
>> 
>> 
>> *Controlling the failover: the coding perspective.*
>> Although the failover functions are made quorum and consensus aware,
>> there is still a way to bypass the quorum conditions and the
>> requirement of consensus.
>> 
>> For this, the patch uses the existing request_details flags in
>> POOL_REQUEST_NODE to control the failover behaviour.
>> 
>> Here are the newly added flag values (a usage sketch follows their
>> descriptions).
>> 
>> *REQ_DETAIL_WATCHDOG*:
>> Setting this flag while issuing the failover command keeps the
>> failover request from being sent to the watchdog. This flag may not
>> be useful anywhere other than where it is already used: it mainly
>> prevents a failover command that already originated from the watchdog
>> from being sent back to the watchdog, which could otherwise end up in
>> an infinite loop.
>> 
>> *REQ_DETAIL_CONFIRMED*:
>> Setting this flag bypasses the *failover_require_consensus*
>> configuration and immediately performs the failover if quorum is
>> present. This flag can be used for failover requests originating from
>> a PCP command.
>> 
>> *REQ_DETAIL_UPDATE*:
>> This flag is used for the command that fails back the quarantined
>> nodes. Setting this flag keeps the failback_command from being
>> triggered.
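>> 
>> As a sketch, issuing a request with one of these flags might look
>> like this (request_details and the flag names are from the patch; the
>> surrounding code is illustrative):
>> 
>>     /* Illustrative only: a PCP-originated request marks itself as
>>      * confirmed, bypassing failover_require_consensus (quorum is
>>      * still required when failover_when_quorum_exists is on). */
>>     req->request_details |= REQ_DETAIL_CONFIRMED;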
>> 
>> *Some conditional flags used:*
>> I was not sure about the configuration of each type of failover
>> operation. We have three main failover operations: NODE_UP_REQUEST,
>> NODE_DOWN_REQUEST, and PROMOTE_NODE_REQUEST.
>> So I was wondering whether we need to give users a configuration
>> option to enable/disable quorum checking and consensus for each
>> individual failover operation type.
>> For example: is it a practical configuration where a user would want
>> to ensure quorum while performing a NODE_DOWN operation but not want
>> it for NODE_UP?
>> So in this patch I use three compile-time defines to enable/disable
>> the individual failover operations (see the sketch after this list),
>> until we can decide on the best solution.
>> 
>> NODE_UP_REQUIRE_CONSENSUS: defining it enables the quorum checking
>> feature for NODE_UP_REQUESTs.
>> 
>> NODE_DOWN_REQUIRE_CONSENSUS: defining it enables the quorum checking
>> feature for NODE_DOWN_REQUESTs.
>> 
>> NODE_PROMOTE_REQUIRE_CONSENSUS: defining it enables the quorum
>> checking feature for PROMOTE_NODE_REQUESTs.
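>> 
>> A minimal sketch of how such a define could gate the check (the
>> define names are from the patch; the surrounding code is
>> illustrative):
>> 
>>     #ifdef NODE_DOWN_REQUIRE_CONSENSUS
>>         if (req->kind == NODE_DOWN_REQUEST &&
>>             !quorum_and_consensus_satisfied(req))
>>             return;  /* hold the request until consensus is reached */
>>     #endif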
>> 
>> *Some points for discussion:*
>> 
>> *Do we really need to check the Req_info->switching flag before
>> enqueuing a failover request?*
>> While working on the patch I was wondering why we disallow enqueuing
>> a failover command when a failover is already in progress. For
>> example, in the *pcp_process_command*() function, if we see the
>> *Req_info->switching* flag set, we bail out with an error instead of
>> enqueuing the command. Is this really necessary?
>> 
>> *Do we need more granular control over each failover operation?*
>> As described in the section "Some conditional flags used", I would
>> like opinions on whether we need configuration parameters in
>> pgpool.conf to enable/disable quorum and consensus checking for
>> individual failover types.
>> 
>> *Which failover requests should be marked as confirmed?*
>> As described in the REQ_DETAIL_CONFIRMED section above, we can mark a
>> failover request as not needing consensus; currently the requests
>> from the PCP commands are fired with this flag. But I was wondering
>> whether there may be more places where we need to use the flag.
>> For example, I currently use the same confirmed flag when a failover
>> is triggered because of *replication_stop_on_mismatch*.
>> 
>> I think we should consider this flag for each place a failover is
>> triggered, e.g. when the failover is triggered
>> because of a health_check failure,
>> because of a replication mismatch,
>> because of a backend_error,
>> etc.
>> 
>> *Node quarantine behaviour.*
>> What do you think about the node quarantine mechanism used by this
>> patch? Can you think of any problem it could cause?
>> 
>> *What should be the default value for each newly added config
>> parameter?*
>> 
>> 
>> 
>> *TODOs*
>> 
>> -- Updating the documentation is still a todo. I will do that once
>> every aspect of the feature is finalised.
>> -- Some code warnings and cleanups are still pending.
>> -- I am still a little short on testing.
>> -- Regression test cases for the feature are still needed.
>> 
>> 
>> Thoughts and suggestions are most welcome.
>> 
>> Thanks
>> Best regards
>> Muhammad Usama
> _______________________________________________
> pgpool-hackers mailing list
> pgpool-hackers at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-hackers

