[pgpool-hackers: 2616] Re: New Feature with patch: Quorum and Consensus for backend failover

Tatsuo Ishii ishii at sraoss.co.jp
Tue Nov 28 09:55:02 JST 2017


Hi Usama,

While writing presentation material for Pgpool-II 3.7, I realized I am
not sure I understand the quorum consensus behavior.

> *enable_multiple_failover_requests_from_node*
> This parameter works in connection with *failover_require_consensus*
> config. When enabled a single Pgpool-II node can vote for failover multiple
> times.

In what situation could a Pgpool-II node send multiple failover
requests? My guess is the following scenario:

1) A Pgpool-II watchdog standby health check process detects the failure
   of backend A and sends a failover request to the master Pgpool-II.

2) Since the vote does not satisfy the quorum consensus, failover does
   not occur. Only backend_info->quarantine is set and
   backend_info->backend_status is set to CON_DOWN.

3) The Pgpool-II watchdog standby health check process detects the
   failure of backend A again, then sends a failover request to the
   master Pgpool-II again. If enable_multiple_failover_requests_from_node
   is set, failover will happen.

But after thinking about it more, I realized that in step 3, since
backend_status is already set to CON_DOWN, the health check will not be
performed against backend A. So the watchdog standby will not send
multiple votes.

Apparently I am missing something here.

Can you please tell me in what scenario a watchdog sends multiple votes
for failover?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

From: Muhammad Usama <m.usama at gmail.com>
Subject: New Feature with patch: Quorum and Consensus for backend failover
Date: Tue, 22 Aug 2017 00:18:27 +0500
Message-ID: <CAEJvTzUbz-d8dfsJdLt=XNYWdOMxKf06sp+p=uAbxyjvG=vS3A at mail.gmail.com>

> Hi
> 
> I was working on a new feature to make backend node failover quorum
> aware, and halfway through the implementation I also added a majority
> consensus feature on top of it.
> 
> So please find the first version of the patch for review. It makes
> backend node failover consider the watchdog cluster quorum status and
> seek majority consensus before performing a failover.
> 
> *Changes in the Failover mechanism with watchdog.*
> For this new feature I have modified Pgpool-II's existing failover
> mechanism with the watchdog.
> Previously, as you know, when Pgpool-II needed to perform a node
> operation (failover, failback, promote-node) with the watchdog, the
> watchdog propagated the failover request to all Pgpool-II nodes in the
> watchdog cluster. As soon as a node received the request, it initiated
> a local failover, and that failover was synchronised on all nodes using
> distributed locks.
> 
> *Now only the master node performs the failover.*
> The attached patch changes the synchronised failover mechanism: now
> only the Pgpool-II of the master watchdog node performs the failover,
> and all other standby nodes sync the backend statuses after the master
> Pgpool-II has finished the failover.
> 
> *Overview of new failover mechanism.*
> -- If a failover request is received by a standby watchdog node (from
> the local Pgpool-II), the request is forwarded to the master watchdog,
> and the Pgpool-II main process is returned the FAILOVER_RES_WILL_BE_DONE
> return code. Upon receiving FAILOVER_RES_WILL_BE_DONE from the watchdog
> for the failover request, the requesting Pgpool-II moves forward without
> doing anything further for that particular failover command.
> 
> -- When the failover request from a standby node is received by the
> master watchdog, after performing validation and applying the consensus
> rules, the failover request is triggered on the local Pgpool-II.
> 
> -- When a failover request is received by the master watchdog node from
> the local Pgpool-II (on the IPC channel), the watchdog process informs
> the requesting Pgpool-II process to proceed with the failover (provided
> all failover rules are satisfied).
> 
> -- After the failover is finished on the master Pgpool-II, the failover
> function calls *wd_failover_end*(), which sends a backend-sync-required
> message to all standby watchdogs.
> 
> -- Upon receiving the sync-required message from the master watchdog
> node, all standby Pgpool-II nodes sync the new status of each backend
> node from the master watchdog.
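The message flow above can be condensed into a small sketch. The type and function names below are illustrative, not the identifiers used in the actual patch; only FAILOVER_RES_WILL_BE_DONE and wd_failover_end() come from this mail.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { WD_STANDBY, WD_MASTER } wd_role;

typedef enum
{
	FAILOVER_RES_WILL_BE_DONE,	/* request forwarded/recorded; caller does nothing more */
	FAILOVER_RES_PROCEED		/* master tells its local Pgpool-II to run the failover */
} failover_res;

/* Hypothetical dispatch: a standby forwards the request to the master;
 * the master validates, applies the consensus rules, and either triggers
 * the failover locally or just records the vote. After a real failover
 * the master would call wd_failover_end() to make standbys sync. */
static failover_res
handle_failover_request(wd_role role, bool failover_rules_satisfied)
{
	if (role == WD_STANDBY)
		return FAILOVER_RES_WILL_BE_DONE;	/* forwarded to master watchdog */

	if (failover_rules_satisfied)
		return FAILOVER_RES_PROCEED;

	return FAILOVER_RES_WILL_BE_DONE;	/* vote recorded; waiting for consensus */
}
```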
> 
> *No more failover locks*
> Since with this new failover mechanism we no longer require any
> synchronisation or guards against the execution of failover_commands by
> multiple Pgpool-II nodes, the patch removes all distributed locks from
> the failover function. This makes the failover simpler and faster.
> 
> *New kind of failover operation: NODE_QUARANTINE_REQUEST*
> The patch adds a new kind of backend node operation, NODE_QUARANTINE,
> which is effectively the same as NODE_DOWN, but with node quarantine
> the failover_command is not triggered.
> A NODE_DOWN_REQUEST is automatically converted to a
> NODE_QUARANTINE_REQUEST when failover is requested on a backend node
> but the watchdog cluster does not hold the quorum.
> This means that in the absence of quorum, failed backend nodes are
> quarantined, and when the quorum becomes available again Pgpool-II
> performs a failback operation on all quarantined nodes.
> Likewise, when failback is performed on a quarantined backend node, the
> failover function does not trigger the failback_command.
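The request conversion described above can be sketched as follows. The request names come from this mail; the enum values and the helper function are hypothetical, not the patch's actual code.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
	NODE_UP_REQUEST,
	NODE_DOWN_REQUEST,
	NODE_QUARANTINE_REQUEST,	/* same as NODE_DOWN, but failover_command is not run */
	PROMOTE_NODE_REQUEST
} node_request_kind;

/* Without quorum, a down request only quarantines the backend;
 * all other requests pass through unchanged. */
static node_request_kind
effective_request(node_request_kind req, bool quorum_exists)
{
	if (req == NODE_DOWN_REQUEST && !quorum_exists)
		return NODE_QUARANTINE_REQUEST;
	return req;
}
```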
> 
> *Controlling the failover behaviour.*
> The patch adds three new configuration parameters that let the user
> configure the failover behaviour.
> 
> *failover_when_quorum_exists*
> When enabled, the failover command will only be executed when the
> watchdog cluster holds the quorum. When the quorum is absent and
> failover_when_quorum_exists is enabled, failed backend nodes are
> quarantined until the quorum becomes available again.
> Disabling it restores the old behaviour of failover commands.
> 
> 
> *failover_require_consensus*
> This new configuration parameter can be used to make sure we get a
> majority vote before performing failover on a node. When
> *failover_require_consensus* is enabled, the failover is only performed
> after receiving failover requests from a majority of Pgpool-II nodes.
> For example, in a three-node cluster the failover will not be performed
> until at least two nodes ask for failover of the particular backend
> node.
> 
> It is also worthwhile to mention here that *failover_require_consensus*
> only works when failover_when_quorum_exists is enabled.
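The majority rule in the three-node example above is just "more than half of the cluster". A minimal sketch (illustrative, not pgpool-II source):

```c
#include <assert.h>

/* Consensus needs a simple majority of the watchdog cluster:
 * e.g. 2 of 3 nodes, or 3 of 5 nodes. */
static int
required_consensus_votes(int cluster_size)
{
	return cluster_size / 2 + 1;
}
```

For the three-node cluster in the example, `required_consensus_votes(3)` yields 2, matching "at least two nodes".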
> 
> 
> *enable_multiple_failover_requests_from_node*
> This parameter works in connection with *failover_require_consensus*.
> When enabled, a single Pgpool-II node can vote for failover multiple
> times.
> For example, in a three-node cluster, if one Pgpool-II node sends the
> failover request for a particular node twice, that is counted as two
> votes in favour of failover, and the failover will be performed even if
> we do not get a vote from the other two nodes.
> 
> When *enable_multiple_failover_requests_from_node* is disabled, only
> the first vote from each Pgpool-II node is accepted; all subsequent
> votes are marked duplicate and rejected.
> In that case a majority of votes from distinct nodes is required to
> execute the failover.
> Again, *enable_multiple_failover_requests_from_node* only becomes
> effective when both *failover_when_quorum_exists* and
> *failover_require_consensus* are enabled.
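Taken together, the three parameters might appear in pgpool.conf like this. The parameter names come from this mail; the values shown are only an example, not the proposed defaults (which are still an open question below):

```
failover_when_quorum_exists = on
failover_require_consensus = on
enable_multiple_failover_requests_from_node = off
```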
> 
> 
> *Controlling the failover: the coding perspective.*
> Although the failover functions are made quorum and consensus aware,
> there is still a way to bypass the quorum conditions and the
> requirement of consensus.
> 
> For this, the patch uses the existing request_details flags in
> POOL_REQUEST_NODE to control the behaviour of failover.
> 
> Here are the newly added flag values.
> 
> *REQ_DETAIL_WATCHDOG*:
> Setting this flag while issuing the failover command will not send the
> failover request to the watchdog. This flag may not be useful anywhere
> other than where it is already used.
> Mostly this flag is used to keep a failover command that already
> originated from the watchdog from being sent back to the watchdog;
> otherwise we could end up in an infinite loop.
> 
> *REQ_DETAIL_CONFIRMED*:
> Setting this flag will bypass the *failover_require_consensus*
> configuration and immediately perform the failover if quorum is
> present. This flag can be used for failover requests originating from a
> PCP command.
> 
> *REQ_DETAIL_UPDATE*:
> This flag is used for commands that fail back quarantined nodes.
> Setting this flag will not trigger the failback_command.
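The three flags above are bits in request_details, so they can be combined. The flag names come from this mail, but the bit values and the helper below are hypothetical, for illustration only:

```c
#include <assert.h>

/* Illustrative bit assignments; the real values live in the patch. */
#define REQ_DETAIL_WATCHDOG   0x01	/* do not forward the request to the watchdog */
#define REQ_DETAIL_CONFIRMED  0x02	/* bypass failover_require_consensus */
#define REQ_DETAIL_UPDATE     0x04	/* quarantine failback; skip failback_command */

/* Per the REQ_DETAIL_UPDATE description: a quarantine failback
 * must not trigger the user's failback_command. */
static int
should_run_failback_command(unsigned int request_details)
{
	return (request_details & REQ_DETAIL_UPDATE) == 0;
}
```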
> 
> *Some conditional flags used:*
> I was not sure about the configuration of each type of failover
> operation. We have three main failover operations: NODE_UP_REQUEST,
> NODE_DOWN_REQUEST, and PROMOTE_NODE_REQUEST.
> So I was thinking: do we need to give users a configuration option to
> enable/disable quorum checking and consensus for each individual
> failover operation type?
> For example, is it a practical configuration for a user to want to
> ensure quorum while performing the NODE_DOWN operation but not for
> NODE_UP?
> So in this patch I use three compile-time defines to enable/disable
> each individual failover operation, until we decide on the best
> solution.
> 
> NODE_UP_REQUIRE_CONSENSUS: defining it enables the quorum checking
> feature for NODE_UP_REQUESTs.
> 
> NODE_DOWN_REQUIRE_CONSENSUS: defining it enables the quorum checking
> feature for NODE_DOWN_REQUESTs.
> 
> NODE_PROMOTE_REQUIRE_CONSENSUS: defining it enables the quorum checking
> feature for PROMOTE_NODE_REQUESTs.
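A sketch of how such a compile-time switch could gate consensus checking per request type. The define names come from this mail; the wiring below is illustrative, not the patch's actual code:

```c
#include <assert.h>

/* Defining the symbol turns consensus checking on for that request
 * type; leaving it undefined restores the unconditional behaviour. */
#define NODE_DOWN_REQUIRE_CONSENSUS

static int
node_down_needs_consensus(void)
{
#ifdef NODE_DOWN_REQUIRE_CONSENSUS
	return 1;
#else
	return 0;
#endif
}
```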
> 
> *Some Point for Discussion:*
> 
> *Do we really need to check the Req_info->switching flag before
> enqueuing a failover request?*
> While working on the patch I was wondering why we disallow enqueuing a
> failover command when a failover is already in progress. For example,
> in *pcp_process_command*(), if we see the *Req_info->switching* flag
> set, we bail out with an error instead of enqueuing the command. Is
> that really necessary?
> 
> *Do we need more granular control over each failover operation?*
> As described in the section "Some conditional flags used", I want
> opinions on whether we need configuration parameters in pgpool.conf to
> enable/disable quorum and consensus checking for individual failover
> types.
> 
> *Which failover requests should be marked as confirmed?*
> As described in the REQ_DETAIL_CONFIRMED section above, we can mark a
> failover request as not needing consensus; currently the requests from
> PCP commands are fired with this flag. But I was wondering whether
> there may be more places where we need to use the flag.
> For example, I currently use the same confirmed flag when failover is
> triggered because of *replication_stop_on_mismatch*.
> 
> I think we should consider this flag for each place failover is
> triggered:
> because of a health_check failure,
> because of a replication mismatch,
> because of a backend_error,
> etc.
> 
> *Node quarantine behaviour.*
> What do you think about the node quarantine used by this patch? Can you
> think of any problem it could cause?
> 
> *What should be the default value of each newly added config parameter?*
> 
> 
> 
> *TODOs*
> 
> -- Updating the documentation is still a todo. Will do that once every
> aspect of the feature is finalised.
> -- Some code warnings and cleanups are still not done.
> -- I am still a little short on testing.
> -- Regression test cases for the feature.
> 
> 
> Thoughts and suggestions are most welcome.
> 
> Thanks
> Best regards
> Muhammad Usama