[pgpool-hackers: 2495] Re: New Feature with patch: Quorum and Consensus for backend failover

Muhammad Usama m.usama at gmail.com
Thu Aug 24 15:21:47 JST 2017


On Thu, Aug 24, 2017 at 4:15 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Usama,
>
> Thanks for the patch. I am going to review it.
>

Thanks :-)

>
> In the mean time when I apply your patch, I got some trailing
> whitespace errors. Can you please fix them?
>
> /home/t-ishii/quorum_aware_failover.diff:470: trailing whitespace.
>
> /home/t-ishii/quorum_aware_failover.diff:485: trailing whitespace.
>
> /home/t-ishii/quorum_aware_failover.diff:564: trailing whitespace.
>
> /home/t-ishii/quorum_aware_failover.diff:1428: trailing whitespace.
>
> /home/t-ishii/quorum_aware_failover.diff:1450: trailing whitespace.
>
> warning: squelched 3 whitespace errors
> warning: 8 lines add whitespace errors.
>

Yes, there are these and also some more code cleanup items still remaining. I
wanted to share the first version as soon as possible because I want to get
the community's consent on the changed behaviour.

The final version will have all these issues addressed.

Thanks
Best Regards
Muhammad Usama


> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
> > Hi
> >
> > I was working on a new feature to make the backend node failover quorum
> > aware, and halfway through the implementation I also added a majority
> > consensus feature for the same.
> >
> > So please find the first version of the patch for review; it makes the
> > backend node failover consider the watchdog cluster quorum status and
> > seek majority consensus before performing the failover.
> >
> > *Changes in the failover mechanism with watchdog.*
> > For this new feature I have modified Pgpool-II's existing failover
> > mechanism with watchdog.
> > Previously, as you know, when Pgpool-II needed to perform a node
> > operation (failover, failback, promote-node) with the watchdog, the
> > watchdog propagated the failover request to all the Pgpool-II nodes
> > in the watchdog cluster, and as soon as the request was received by a
> > node, it initiated the local failover; that failover was then
> > synchronised on all nodes using the distributed locks.
> >
> > *Now only the master node performs the failover.*
> > The attached patch changes the mechanism of synchronised failover:
> > now only the Pgpool-II on the master watchdog node performs the
> > failover, and all other standby nodes sync the backend statuses after
> > the master Pgpool-II is finished with the failover.
> >
> > *Overview of the new failover mechanism.*
> > -- If a failover request is received by a standby watchdog node (from
> > the local Pgpool-II), that request is forwarded to the master
> > watchdog, and the Pgpool-II main process gets the
> > FAILOVER_RES_WILL_BE_DONE return code. Upon receiving
> > FAILOVER_RES_WILL_BE_DONE from the watchdog for the failover request,
> > the requesting Pgpool-II moves forward without doing anything further
> > for that particular failover command.
> >
> > -- When the failover request from a standby node is received by the
> > master watchdog, it validates the request, applies the consensus
> > rules, and then triggers the failover on its local Pgpool-II.
> >
> > -- When the failover request is received by the master watchdog node
> > from its local Pgpool-II (on the IPC channel), the watchdog process
> > informs the requesting Pgpool-II process to proceed with the failover
> > (provided all failover rules are satisfied).
> >
> > -- After the failover is finished on the master Pgpool-II, the
> > failover function calls *wd_failover_end*(), which sends the
> > backend-sync-required message to all standby watchdogs.
> >
> > -- Upon receiving the sync-required message from the master watchdog
> > node, all Pgpool-II nodes sync the new statuses of each backend node
> > from the master watchdog.
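> >
> > To make the flow concrete, here is a condensed sketch of the routing
> > decision in C (a toy model: every identifier except
> > FAILOVER_RES_WILL_BE_DONE and wd_failover_end is invented for this
> > illustration and is not the patch's actual code):
> >
> >     typedef enum { WD_MASTER, WD_STANDBY } wd_role;
> >     typedef enum
> >     {
> >         FAILOVER_RES_WILL_BE_DONE, /* master will take care of it */
> >         FAILOVER_RES_PROCEED       /* run the failover locally */
> >     } wd_failover_res;
> >
> >     static wd_failover_res
> >     route_failover_request(wd_role local_role)
> >     {
> >         if (local_role == WD_STANDBY)
> >         {
> >             /* the request is forwarded to the master watchdog; the
> >              * local Pgpool-II does nothing further for this command */
> >             return FAILOVER_RES_WILL_BE_DONE;
> >         }
> >         /* master: validate, apply the consensus rules, perform the
> >          * failover locally; wd_failover_end() then tells all the
> >          * standbys to sync the new backend statuses */
> >         return FAILOVER_RES_PROCEED;
> >     }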
> >
> > *No more failover locks*
> > Since with this new failover mechanism we no longer require any
> > synchronisation or guards against the execution of failover_commands
> > by multiple Pgpool-II nodes, the patch removes all the distributed
> > locks from the failover function. This makes the failover simpler and
> > faster.
> >
> > *New kind of failover operation: NODE_QUARANTINE_REQUEST*
> > The patch adds a new kind of backend node operation, NODE_QUARANTINE,
> > which is effectively the same as NODE_DOWN, except that with node
> > quarantine the failover_command is not triggered.
> > A NODE_DOWN_REQUEST is automatically converted to a
> > NODE_QUARANTINE_REQUEST when failover is requested on a backend node
> > but the watchdog cluster does not hold the quorum.
> > This means that in the absence of quorum the failed backend nodes are
> > quarantined, and when the quorum becomes available again Pgpool-II
> > performs the failback operation on all quarantined nodes.
> > And when the failback is performed on a quarantined backend node, the
> > failover function again does not trigger the failback_command.
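> >
> > Conceptually the conversion is just this (a minimal sketch; the enum
> > re-declaration and helper are illustrative, not the patch's code):
> >
> >     #include <stdbool.h>
> >
> >     typedef enum { NODE_UP_REQUEST, NODE_DOWN_REQUEST,
> >                    NODE_QUARANTINE_REQUEST } request_kind;
> >
> >     /* degrade NODE_DOWN to quarantine when quorum is absent, so the
> >      * failover_command never fires for it */
> >     static request_kind
> >     apply_quorum_rule(request_kind kind, bool cluster_has_quorum)
> >     {
> >         if (kind == NODE_DOWN_REQUEST && !cluster_has_quorum)
> >             return NODE_QUARANTINE_REQUEST;
> >         return kind;
> >     }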
> >
> > *Controlling the failover behaviour.*
> > The patch adds three new configuration parameters to let the user
> > configure the failover behaviour.
> >
> > *failover_when_quorum_exists*
> > When enabled, the failover command will only be executed when the
> > watchdog cluster holds the quorum. When the quorum is absent and
> > failover_when_quorum_exists is enabled, the failed backend nodes get
> > quarantined until the quorum becomes available again.
> > Disabling it restores the old behaviour of failover commands.
> >
> >
> > *failover_require_consensus*
> > This new configuration parameter can be used to make sure we get a
> > majority vote before performing the failover on a node. When
> > *failover_require_consensus* is enabled, the failover is only
> > performed after receiving failover requests from the majority of
> > Pgpool-II nodes.
> > For example, in a three-node cluster the failover will not be
> > performed until at least two nodes ask for the failover of the
> > particular backend node.
> >
> > It is also worthwhile to mention here that
> > *failover_require_consensus* only works when
> > failover_when_quorum_exists is enabled.
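> >
> > For reference, the majority needed for this check works out as below
> > (a toy sketch, not the patch's actual vote-counting code):
> >
> >     #include <stdbool.h>
> >
> >     /* true once enough votes are in for a majority,
> >      * e.g. 3 watchdog nodes -> at least 2 votes required */
> >     static bool
> >     have_majority(int votes, int total_wd_nodes)
> >     {
> >         return votes >= total_wd_nodes / 2 + 1;
> >     }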
> >
> >
> > *enable_multiple_failover_requests_from_node*
> > This parameter works in conjunction with the
> > *failover_require_consensus* config. When enabled, a single Pgpool-II
> > node can vote for failover multiple times.
> > For example, in a three-node cluster, if one Pgpool-II node sends the
> > failover request for a particular node twice, that is counted as two
> > votes in favour of the failover, and the failover will be performed
> > even if we do not get a vote from the other two nodes.
> >
> > When *enable_multiple_failover_requests_from_node* is disabled, only
> > the first vote from each Pgpool-II is accepted and all subsequent
> > votes are marked as duplicates and rejected.
> > In that case we require majority votes from distinct nodes to execute
> > the failover.
> > Again, *enable_multiple_failover_requests_from_node* only becomes
> > effective when both *failover_when_quorum_exists* and
> > *failover_require_consensus* are enabled.
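> >
> > Putting the three parameters together, a pgpool.conf for strict,
> > distinct-majority failover could look like this (illustrative values;
> > the defaults are still an open question, see below):
> >
> >     failover_when_quorum_exists = on
> >     failover_require_consensus = on
> >     enable_multiple_failover_requests_from_node = off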
> >
> >
> > *Controlling the failover: the coding perspective.*
> > Although the failover functions are made quorum and consensus aware,
> > there is still a way to bypass the quorum conditions and the
> > requirement of consensus.
> >
> > For this, the patch uses the existing request_details flags in
> > POOL_REQUEST_NODE to control the behaviour of the failover.
> >
> > Here are the newly added flag values.
> >
> > *REQ_DETAIL_WATCHDOG*:
> > Setting this flag while issuing the failover command will not send
> > the failover request to the watchdog. This flag may not be useful
> > anywhere other than where it is already used: mostly it keeps a
> > failover command that already originated from the watchdog from going
> > back to the watchdog, otherwise we could end up in an infinite loop.
> >
> > *REQ_DETAIL_CONFIRMED*:
> > Setting this flag will bypass the *failover_require_consensus*
> > configuration and immediately perform the failover if quorum is
> > present. This flag can be used for failover requests originating from
> > a PCP command.
> >
> > *REQ_DETAIL_UPDATE*:
> > This flag is used for the command where we are failing back the
> > quarantined nodes. Setting this flag will not trigger the
> > failback_command.
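> >
> > As a usage illustration (the bit values, struct, and function below
> > are invented for the sketch; only the flag names come from the
> > patch):
> >
> >     #define REQ_DETAIL_WATCHDOG  0x01 /* don't forward to watchdog  */
> >     #define REQ_DETAIL_CONFIRMED 0x02 /* skip consensus requirement */
> >     #define REQ_DETAIL_UPDATE    0x04 /* quarantine failback, no
> >                                        * failback_command */
> >
> >     typedef struct { unsigned int request_details; } request_node;
> >
> >     /* e.g. how a PCP-originated request could be marked */
> >     static void
> >     mark_pcp_request(request_node *req)
> >     {
> >         req->request_details |= REQ_DETAIL_CONFIRMED;
> >     }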
> >
> > *Some conditional flags used:*
> > I was not sure about the configuration of each type of failover
> > operation. We have three main failover operations: NODE_UP_REQUEST,
> > NODE_DOWN_REQUEST, and PROMOTE_NODE_REQUEST.
> > So I was thinking: do we need to give users a configuration option to
> > enable/disable quorum checking and consensus for each individual
> > failover operation type?
> > For example, is it a practical configuration where a user would want
> > to ensure quorum while performing a NODE_DOWN operation but does not
> > want it for NODE_UP?
> > So in this patch I use three compile-time defines to enable/disable
> > the individual failover operations (see the snippet after this list),
> > while we decide on the best solution.
> >
> > NODE_UP_REQUIRE_CONSENSUS: defining it enables the quorum checking
> > feature for NODE_UP_REQUESTs
> >
> > NODE_DOWN_REQUIRE_CONSENSUS: defining it enables the quorum checking
> > feature for NODE_DOWN_REQUESTs
> >
> > NODE_PROMOTE_REQUIRE_CONSENSUS: defining it enables the quorum
> > checking feature for PROMOTE_NODE_REQUESTs
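> >
> > That is, to try the current patch with quorum/consensus checking on
> > all three operation types, one would build with all three defined
> > (the define names are from the patch; where exactly they are set in
> > the source tree is in the patch itself):
> >
> >     #define NODE_UP_REQUIRE_CONSENSUS
> >     #define NODE_DOWN_REQUIRE_CONSENSUS
> >     #define NODE_PROMOTE_REQUIRE_CONSENSUS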
> >
> > *Some Point for Discussion:*
> >
> > *Do we really need to check the Req_info->switching flag before
> > enqueuing a failover request?*
> > While working on the patch I was wondering why we disallow enqueuing
> > a failover command when a failover is already in progress. For
> > example, in *pcp_process_command*(), if we see the
> > *Req_info->switching* flag set, we bail out with an error instead of
> > enqueuing the command. Is that really necessary?
> >
> > *Do we need more granular control over each failover operation?*
> > As described in the section "Some conditional flags used", I want
> > opinions on whether we need configuration parameters in pgpool.conf
> > to enable/disable quorum and consensus checking for individual
> > failover types.
> >
> > *Which failover requests should be marked as confirmed?*
> > As explained in the REQ_DETAIL_CONFIRMED section above, we can mark a
> > failover request as not needing consensus; currently the requests
> > from the PCP commands are fired with this flag. But I was wondering
> > whether there may be more places where we need to use the flag.
> > For example, I currently use the same confirmed flag when failover is
> > triggered because of *replication_stop_on_mismatch*.
> >
> > I think we should consider this flag for each source of failover,
> > i.e. when the failover is triggered
> > because of a health_check failure,
> > because of a replication mismatch,
> > because of a backend_error,
> > etc.
> >
> > *Node quarantine behaviour.*
> > What do you think about the node quarantine used by this patch? Can
> > you think of some problem it could cause?
> >
> > *What should the default values be for each newly added config
> > parameter?*
> >
> >
> >
> > *TODOs*
> >
> > -- Updating the documentation is still a todo. Will do that once
> > every aspect of the feature is finalised.
> > -- Some code warnings and cleanups are still pending.
> > -- I am still a little short on testing.
> > -- Regression test cases for the feature.
> >
> >
> > Thoughts and suggestions are most welcome.
> >
> > Thanks
> > Best regards
> > Muhammad Usama
>