[pgpool-hackers: 2619] Re: New Feature with patch: Quorum and Consensus for backend failover

Muhammad Usama m.usama at gmail.com
Wed Nov 29 01:44:03 JST 2017


Hi Ishii-San,

On Tue, Nov 28, 2017 at 5:55 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Hi Usama,
>
> While writing presentation material for Pgpool-II 3.7, I realized I am
> not sure I understand the quorum consensus behavior.
>
> > *enable_multiple_failover_requests_from_node*
> > This parameter works in connection with the *failover_require_consensus*
> > config. When enabled, a single Pgpool-II node can vote for failover
> > multiple times.
>
> In what situation could a Pgpool-II node send multiple failover
> requests? My guess is the following scenario:
>
> 1) The Pgpool-II watchdog standby health check process detects the failure
>    of backend A and sends a failover request to the master Pgpool-II.
>
> 2) Since the vote does not satisfy the quorum consensus, failover does
>    not occur. Just backend_info->quarantine is set and
>    backend_info->backend_status is set to CON_DOWN.
>
> 3) The Pgpool-II watchdog standby health check process detects the failure
>    of backend A again, then sends a failover request to the master
>    Pgpool-II again. If enable_multiple_failover_requests_from_node is
>    set, failover will happen.
>
> But after thinking more, I realized that in step 3, since
> backend_status is already set to CON_DOWN, the health check will not be
> performed against backend A. So the watchdog standby will not send
> multiple votes.
>
> Apparently I am missing something here.
>
> Can you please tell me in what scenario a watchdog sends
> multiple votes for failover?
>
>
Basically, when allow_multiple_failover_requests_from_node is set, the
watchdog does not perform the quarantine operation and the node status is
not changed to DOWN, so it is possible for the node to send multiple votes
for the failover of a node.

Also, even when allow_multiple_failover_requests_from_node is not set,
Pgpool-II does not quarantine the node straight away after the first
failover request while the watchdog is waiting for consensus. What happens
is: when the watchdog receives a failover request that requires a
consensus, it returns FAILOVER_RES_CONSENSUS_MAY_FAIL, and when the main
Pgpool-II process receives this return code for the failover request, it
simply ignores the request without changing the backend node status to
DOWN and relies on the watchdog to handle that failover request.
Meanwhile, Pgpool-II continues with its normal duties.

Now, when the same Pgpool-II node sends a failover request for the same
backend node a second time, the behaviour depends on the setting of the
allow_multiple_failover_requests_from_node configuration:

1- When allow_multiple_failover_requests_from_node = off
    The watchdog returns FAILOVER_RES_CONSENSUS_MAY_FAIL, and the
    Pgpool-II main process quarantines the backend node and sets its
    status to DOWN when it receives this code from the watchdog.

2- When allow_multiple_failover_requests_from_node = on
    The watchdog returns FAILOVER_RES_BUILDING_CONSENSUS, and the
    Pgpool-II main process does not quarantine the backend node; its
    status remains unchanged, so effectively the health check keeps
    executing on that backend node.
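
To illustrate, here is a minimal, self-contained sketch of how the main
Pgpool-II process could act on these two return codes (the helper names
are hypothetical and simplified; this is not the actual Pgpool-II code):

    #include <stdio.h>

    typedef enum
    {
        FAILOVER_RES_CONSENSUS_MAY_FAIL,   /* quarantine the node */
        FAILOVER_RES_BUILDING_CONSENSUS    /* leave the node untouched */
    } WdFailoverResult;

    static void quarantine_backend(int node_id)
    {
        /* mark the node DOWN without running failover_command */
        printf("backend %d quarantined\n", node_id);
    }

    static void handle_watchdog_reply(WdFailoverResult res, int node_id)
    {
        if (res == FAILOVER_RES_CONSENSUS_MAY_FAIL)
            quarantine_backend(node_id);
        /* on FAILOVER_RES_BUILDING_CONSENSUS do nothing: the node stays
         * up, so the health check keeps running and can vote again */
    }

    int main(void)
    {
        handle_watchdog_reply(FAILOVER_RES_CONSENSUS_MAY_FAIL, 0);
        return 0;
    }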

Thanks
Best Regards
Muhammad Usama

> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> From: Muhammad Usama <m.usama at gmail.com>
> Subject: New Feature with patch: Quorum and Consensus for backend failover
> Date: Tue, 22 Aug 2017 00:18:27 +0500
> Message-ID: <CAEJvTzUbz-d8dfsJdLt=XNYWdOMxKf06sp+p=uAbxyjvG=vS3A at mail.gmail.com>
>
> > Hi
> >
> > I was working on a new feature to make the backend node failover quorum
> > aware, and halfway through the implementation I also added a majority
> > consensus feature for the same.
> >
> > So please find the first version of the patch for review, which makes
> > the backend node failover consider the watchdog cluster quorum status
> > and seek a majority consensus before performing failover.
> >
> > *Changes in the failover mechanism with watchdog.*
> > For this new feature I have modified Pgpool-II's existing failover
> > mechanism with watchdog.
> > Previously, as you know, when Pgpool-II needed to perform a node
> > operation (failover, failback, promote-node) with the watchdog, the
> > watchdog propagated the failover request to all the Pgpool-II nodes in
> > the watchdog cluster, and as soon as a node received the request it
> > initiated a local failover; that failover was synchronised on all nodes
> > using distributed locks.
> >
> > *Now only the master node performs the failover.*
> > The attached patch changes the mechanism of synchronised failover: now
> > only the Pgpool-II on the master watchdog node performs the failover,
> > and all other standby nodes sync the backend statuses after the master
> > Pgpool-II has finished the failover.
> >
> > *Overview of the new failover mechanism.*
> > -- If a failover request is received by a standby watchdog node (from
> > the local Pgpool-II), that request is forwarded to the master watchdog,
> > and the requesting Pgpool-II main process gets the
> > FAILOVER_RES_WILL_BE_DONE return code. Upon receiving
> > FAILOVER_RES_WILL_BE_DONE from the watchdog for the failover request,
> > the requesting Pgpool-II moves forward without doing anything further
> > for the particular failover command.
> >
> > -- When the master watchdog receives a failover request from a standby
> > node, it validates the request, applies the consensus rules, and then
> > triggers the failover on its local Pgpool-II.
> >
> > -- When the master watchdog node receives a failover request from the
> > local Pgpool-II (on the IPC channel), the watchdog process informs the
> > requesting Pgpool-II process to proceed with the failover (provided all
> > failover rules are satisfied).
> >
> > -- After the failover has finished on the master Pgpool-II, the failover
> > function calls *wd_failover_end*(), which sends the backend sync
> > required message to all standby watchdogs.
> >
> > -- Upon receiving the sync required message from the master watchdog
> > node, all standby Pgpool-II nodes sync the new status of each backend
> > node from the master watchdog.
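> >
> > As a rough illustration, here is a simplified, self-contained sketch of
> > this routing (the helper names and FAILOVER_RES_PROCEED are
> > hypothetical; only FAILOVER_RES_WILL_BE_DONE comes from the patch, and
> > this is not the actual patch code):
> >
> >     #include <stdbool.h>
> >
> >     typedef enum { FAILOVER_RES_WILL_BE_DONE, FAILOVER_RES_PROCEED } WdFailoverRes;
> >
> >     static bool i_am_master_watchdog(void)         { return true; }  /* stub */
> >     static void forward_to_master(int node)        { (void) node; }  /* stub */
> >     static bool failover_rules_satisfied(int node) { (void) node; return true; }
> >     static void trigger_local_failover(int node)   { (void) node; }
> >
> >     static WdFailoverRes handle_failover_request(int node)
> >     {
> >         if (!i_am_master_watchdog())
> >         {
> >             /* standby: hand the request over and tell the local
> >              * Pgpool-II that the master will take care of it */
> >             forward_to_master(node);
> >             return FAILOVER_RES_WILL_BE_DONE;
> >         }
> >
> >         /* master: validate and apply the consensus rules, then fail
> >          * over locally; wd_failover_end() later tells the standbys
> >          * to sync the backend statuses */
> >         if (failover_rules_satisfied(node))
> >             trigger_local_failover(node);
> >         return FAILOVER_RES_PROCEED;
> >     }
> >
> >     int main(void)
> >     {
> >         return (int) handle_failover_request(0);
> >     }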
> >
> > *No more failover locks*
> > Since with this new failover mechanism we no longer require
> > synchronisation or guards against the execution of failover_commands by
> > multiple Pgpool-II nodes, the patch removes all the distributed locks
> > from the failover function. This makes the failover simpler and faster.
> >
> > *New kind of failover operation: NODE_QUARANTINE_REQUEST*
> > The patch adds a new kind of backend node operation, NODE_QUARANTINE,
> > which is effectively the same as NODE_DOWN except that with node
> > quarantine the failover_command is not triggered.
> > A NODE_DOWN_REQUEST is automatically converted to a
> > NODE_QUARANTINE_REQUEST when failover is requested on a backend node
> > but the watchdog cluster does not hold the quorum.
> > This means that in the absence of quorum the failed backend nodes are
> > quarantined, and when the quorum becomes available again Pgpool-II
> > performs the failback operation on all quarantined nodes.
> > Again, when failback is performed on a quarantined backend node, the
> > failover function does not trigger the failback_command.
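> >
> > For illustration, a minimal sketch of this automatic downgrade (the
> > request kind names are from the patch, everything else is hypothetical;
> > this is not the actual patch code):
> >
> >     #include <stdbool.h>
> >
> >     typedef enum
> >     {
> >         NODE_UP_REQUEST,
> >         NODE_DOWN_REQUEST,
> >         PROMOTE_NODE_REQUEST,
> >         NODE_QUARANTINE_REQUEST
> >     } PoolRequestKind;
> >
> >     static bool watchdog_has_quorum(void) { return false; }  /* stub */
> >
> >     static PoolRequestKind adjust_request_kind(PoolRequestKind kind)
> >     {
> >         /* same status change as NODE_DOWN, but failover_command is
> >          * not triggered; the node is failed back once quorum returns */
> >         if (kind == NODE_DOWN_REQUEST && !watchdog_has_quorum())
> >             return NODE_QUARANTINE_REQUEST;
> >         return kind;
> >     }
> >
> >     int main(void)
> >     {
> >         return (int) adjust_request_kind(NODE_DOWN_REQUEST);
> >     }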
> >
> > *Controlling the failover behaviour.*
> > The patch adds three new configuration parameters to configure the
> > failover behaviour from the user side.
> >
> > *failover_when_quorum_exists*
> > When enabled, the failover command will only be executed when the
> > watchdog cluster holds the quorum. When the quorum is absent and
> > failover_when_quorum_exists is enabled, the failed backend nodes are
> > quarantined until the quorum becomes available again.
> > Disabling it restores the old behaviour of failover commands.
> >
> >
> > *failover_require_consensus*
> > This new configuration parameter can be used to make sure we get a
> > majority vote before performing the failover on a node. When
> > *failover_require_consensus* is enabled, the failover is only performed
> > after receiving failover requests from a majority of Pgpool-II nodes.
> > For example, in a three-node cluster the failover will not be performed
> > until at least two nodes ask for the failover of the particular backend
> > node.
> >
> > It is also worthwhile to mention here that *failover_require_consensus*
> > only works when failover_when_quorum_exists is enabled.
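> >
> > As a small worked sketch of the majority arithmetic (a hypothetical
> > helper, not the actual patch code):
> >
> >     /* Majority of the watchdog cluster: 3 nodes -> 2 votes,
> >      * 4 -> 3, 5 -> 3. */
> >     static int votes_needed_for_consensus(int cluster_size)
> >     {
> >         return cluster_size / 2 + 1;
> >     }
> >
> >     int main(void)
> >     {
> >         return votes_needed_for_consensus(3);  /* 2 */
> >     }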
> >
> >
> > *enable_multiple_failover_requests_from_node*
> > This parameter works in connection with the *failover_require_consensus*
> > config. When enabled, a single Pgpool-II node can vote for failover
> > multiple times.
> > For example, in a three-node cluster, if one Pgpool-II node sends the
> > failover request for a particular backend node twice, that is counted
> > as two votes in favour of the failover, and the failover will be
> > performed even if we do not get a vote from the other two nodes.
> >
> > When *enable_multiple_failover_requests_from_node* is disabled, only
> > the first vote from each Pgpool-II node is accepted and all subsequent
> > votes are marked duplicate and rejected. So in that case we require
> > majority votes from distinct nodes to execute the failover.
> > Again, *enable_multiple_failover_requests_from_node* only becomes
> > effective when both *failover_when_quorum_exists* and
> > *failover_require_consensus* are enabled.
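> >
> > A rough sketch of how the vote counting could look under the two
> > settings (all names are hypothetical; this is not the actual patch
> > code):
> >
> >     #include <stdbool.h>
> >     #include <string.h>
> >
> >     #define MAX_WD_NODES 16
> >
> >     typedef struct
> >     {
> >         bool allow_multiple_votes;      /* the config parameter above */
> >         bool has_voted[MAX_WD_NODES];   /* one slot per watchdog node */
> >         int  votes;
> >     } FailoverBallot;
> >
> >     /* Returns true when the vote is accepted, false when it is
> >      * rejected as a duplicate. */
> >     static bool register_failover_vote(FailoverBallot *b, int wd_node_id)
> >     {
> >         if (!b->allow_multiple_votes && b->has_voted[wd_node_id])
> >             return false;               /* duplicate: rejected */
> >         b->has_voted[wd_node_id] = true;
> >         b->votes++;                     /* the same node may count twice
> >                                          * when multiple votes are allowed */
> >         return true;
> >     }
> >
> >     int main(void)
> >     {
> >         FailoverBallot b;
> >         memset(&b, 0, sizeof(b));
> >         register_failover_vote(&b, 1);  /* accepted */
> >         register_failover_vote(&b, 1);  /* rejected as duplicate */
> >         return b.votes;                 /* 1 */
> >     }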
> >
> >
> > *Controlling the failover: the coding perspective.*
> > Although the failover functions are made quorum and consensus aware,
> > there is still a way to bypass the quorum conditions and the
> > requirement of consensus.
> >
> > For this, the patch uses the existing request_details flags in
> > POOL_REQUEST_NODE to control the behaviour of the failover.
> >
> > Here are the newly added flag values.
> >
> > *REQ_DETAIL_WATCHDOG*:
> > Setting this flag while issuing the failover command will not send the
> > failover request to the watchdog. This flag may not be useful anywhere
> > other than where it is already used: mostly it keeps a failover command
> > that already originated from the watchdog from going back to the
> > watchdog, otherwise we could end up in an infinite loop.
> >
> > *REQ_DETAIL_CONFIRMED*:
> > Setting this flag will bypass the *failover_require_consensus*
> > configuration and immediately perform the failover if quorum is
> > present. This flag can be used to issue failover requests originating
> > from a PCP command.
> >
> > *REQ_DETAIL_UPDATE*:
> > This flag is used for the command that fails back the quarantined
> > nodes. Setting this flag will not trigger the failback_command.
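> >
> > As a small sketch of how these request_details bits might be set when
> > issuing a request (the flag values here are made up for illustration;
> > the real ones are defined by the patch):
> >
> >     /* Hypothetical flag values for illustration only. */
> >     #define REQ_DETAIL_WATCHDOG   0x01  /* do not forward to the watchdog */
> >     #define REQ_DETAIL_CONFIRMED  0x02  /* bypass failover_require_consensus */
> >     #define REQ_DETAIL_UPDATE     0x04  /* quarantine failback; do not run
> >                                          * failback_command */
> >
> >     int main(void)
> >     {
> >         /* e.g. a PCP-originated request that must not wait for votes */
> >         unsigned int pcp_details = REQ_DETAIL_CONFIRMED;
> >
> >         /* e.g. a request that itself came from the watchdog, flagged
> >          * so it is not sent back (avoiding the infinite loop) */
> >         unsigned int wd_details = REQ_DETAIL_WATCHDOG;
> >
> >         return (int) (pcp_details | wd_details);
> >     }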
> >
> > *Some conditional flags used:*
> > I was not sure about the configuration of each type of failover
> > operation. We have three main failover operations: NODE_UP_REQUEST,
> > NODE_DOWN_REQUEST, and PROMOTE_NODE_REQUEST. So I was thinking: do we
> > need to give users a configuration option to enable/disable quorum
> > checking and consensus for each individual failover operation type?
> > For example, is it a practical configuration where a user would want to
> > ensure quorum while performing a NODE_DOWN operation but does not want
> > it for NODE_UP?
> > So in this patch I use three compile-time defines to enable/disable it
> > for the individual failover operations, until we decide on the best
> > solution (see the sketch after this list):
> >
> > NODE_UP_REQUIRE_CONSENSUS: defining it enables the quorum checking
> > feature for NODE_UP_REQUESTs
> >
> > NODE_DOWN_REQUIRE_CONSENSUS: defining it enables the quorum checking
> > feature for NODE_DOWN_REQUESTs
> >
> > NODE_PROMOTE_REQUIRE_CONSENSUS: defining it enables the quorum checking
> > feature for PROMOTE_NODE_REQUESTs
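> >
> > A minimal sketch of how this compile-time gating could look
> > (simplified; not the actual patch code):
> >
> >     #include <stdbool.h>
> >
> >     #define NODE_DOWN_REQUIRE_CONSENSUS       /* compile-time switch */
> >
> >     typedef enum { NODE_UP_REQUEST, NODE_DOWN_REQUEST,
> >                    PROMOTE_NODE_REQUEST } PoolRequestKind;
> >
> >     static bool request_requires_consensus(PoolRequestKind kind)
> >     {
> >         switch (kind)
> >         {
> >     #ifdef NODE_UP_REQUIRE_CONSENSUS
> >             case NODE_UP_REQUEST:      return true;
> >     #endif
> >     #ifdef NODE_DOWN_REQUIRE_CONSENSUS
> >             case NODE_DOWN_REQUEST:    return true;
> >     #endif
> >     #ifdef NODE_PROMOTE_REQUIRE_CONSENSUS
> >             case PROMOTE_NODE_REQUEST: return true;
> >     #endif
> >             default:                   return false;
> >         }
> >     }
> >
> >     int main(void)
> >     {
> >         return request_requires_consensus(NODE_DOWN_REQUEST);  /* 1 */
> >     }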
> >
> > *Some points for discussion:*
> >
> > *Do we really need to check the Req_info->switching flag before
> > enqueuing a failover request?*
> > While working on the patch I was wondering why we disallow enqueuing
> > the failover command when a failover is already in progress. For
> > example, in the *pcp_process_command*() function, if we see the
> > *Req_info->switching* flag set we bail out with an error instead of
> > enqueuing the command. Is this really necessary?
> >
> > *Do we need more granular control over each failover operation?*
> > As described in the section "Some conditional flags used", I want
> > opinions on whether we need configuration parameters in pgpool.conf to
> > enable/disable quorum and consensus checking on the individual failover
> > types.
> >
> > *Which failovers should be marked as confirmed?*
> > As described in the REQ_DETAIL_CONFIRMED section above, we can mark a
> > failover request as not needing consensus; currently the requests from
> > the PCP commands are fired with this flag. But I was wondering whether
> > there may be more places where we need to use the flag.
> > For example, I currently use the same confirmed flag when failover is
> > triggered because of *replication_stop_on_mismatch*.
> >
> > I think we should consider this flag for each place failover is
> > triggered, e.g. when the failover is triggered:
> > because of health_check failure,
> > because of a replication mismatch,
> > because of backend_error,
> > etc.
> >
> > *Node quarantine behaviour.*
> > What do you think about the node quarantine behaviour used by this
> > patch? Can you think of any problem it could cause?
> >
> > *What should be the default value for each newly added config
> > parameter?*
> >
> >
> >
> > *TODOs*
> >
> > -- Updating the documentation is still a todo. Will do that once every
> > aspect of the feature is finalised.
> > -- Some code warnings and cleanups are still pending.
> > -- I am still a little short on testing.
> > -- Regression test cases for the feature still need to be written.
> >
> >
> > Thoughts and suggestions are most welcome.
> >
> > Thanks
> > Best regards
> > Muhammad Usama
>