[pgpool-hackers: 2620] Re: New Feature with patch: Quorum and Consensus for backend failover

Muhammad Usama m.usama at gmail.com
Wed Nov 29 01:48:11 JST 2017


Hi Ishii-San

On Tue, Nov 28, 2017 at 8:09 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Hi Usama,
>
> I have a question regarding the condition for "quorum exists".
> I think "g_cluster.quorum_status >= 0" is the condition for "quorum
> exists".
> Am I correct?
>
> BTW, I observed the following in watchdog.c:
>
> When there are only two watchdog nodes participating and they are
> alive, then g_cluster.quorum_status == 1, which means "quorum exists",
> because the following condition is satisfied
> (get_mimimum_nodes_required_for_quorum() returns
> (g_cluster.remoteNodeCount - 1) / 2, which is 0):
>
>         if ( g_cluster.clusterMasterInfo.standby_nodes_count >
> get_mimimum_nodes_required_for_quorum())
>         {
>                 g_cluster.quorum_status = 1;
>         }
>
> In this case, if pool_config->failover_when_quorum_exists is on, and
> pool_config->failover_require_consensus is on, then failover will always
> be performed, because the condition:
>
>                 if (failoverObj->request_count <=
>                     get_mimimum_nodes_required_for_quorum())
>
> is always false (get_mimimum_nodes_required_for_quorum() returns 0).
>
> So, it seems that for a two-watchdog cluster, regardless of the settings of
> failover_when_quorum_exists and failover_require_consensus, Pgpool-II
> behaves as before. Is my understanding correct?
>

Yes, that is correct. This is because the watchdog considers the quorum
to exist when at least 50% of the nodes are alive, so for a two-node
cluster a single node is enough to complete the quorum.
That's why we recommend having a minimum of three nodes, and an odd
number of them.
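
For illustration, here is a minimal standalone sketch (not the actual
watchdog code; the variables are just stand-ins for the g_cluster fields
you quoted) showing why a two-node cluster always passes both checks:

    #include <stdio.h>

    int main(void)
    {
        /*
         * Two-node watchdog cluster: one remote node, and that
         * standby is alive.
         */
        int remote_node_count   = 1;    /* g_cluster.remoteNodeCount */
        int standby_nodes_count = 1;    /* alive standby nodes       */
        int request_count       = 1;    /* a single failover vote    */

        /* the expression you quoted for the two-node case */
        int min_nodes = (remote_node_count - 1) / 2;    /* = 0 */

        /* quorum check: 1 > 0, so quorum "exists" */
        printf("quorum exists: %s\n",
               standby_nodes_count > min_nodes ? "yes" : "no");

        /* consensus check: 1 <= 0 is false, so we never wait for votes */
        printf("needs more votes: %s\n",
               request_count <= min_nodes ? "yes" : "no");

        return 0;
    }

With a single remote node both comparisons are decided by the constant 0,
which is why the two new parameters have no visible effect in a two-node
cluster.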

Do you have any reservations about this, or would you like the behaviour changed?

Thanks
Best Regards
Muhammad Usama


> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> > Hi Usama,
> >
> > While writing presentation material for Pgpool-II 3.7, I realized I am
> > not sure I understand the quorum consensus behavior.
> >
> >> *enable_multiple_failover_requests_from_node*
> >> This parameter works in connection with *failover_require_consensus*
> >> config. When enabled, a single Pgpool-II node can vote for failover
> >> multiple times.
> >
> > In what situation could a Pgpool-II node send multiple failover
> > requests? My guess is in the following scenario:
> >
> > 1) Pgpool-II watchdog standby health check process detects the failure
> >    of backend A and sends a failover request to the master Pgpool-II.
> >
> > 2) Since the vote does not satisfy the quorum consensus, failover does
> >    not occur. Only backend_info->quarantine is set and
> >    backend_info->backend_status is set to CON_DOWN.
> >
> > 3) Pgpool-II watchdog standby health check process detects the failure
> >    of backend A again, then sends a failover request to the master
> >    Pgpool-II again. If enable_multiple_failover_requests_from_node is
> >    set, failover will happen.
> >
> > But after thinking more, I realized that in step 3, since
> > backend_status is already set to CON_DOWN, health check will not be
> > performed against backend A. So the watchdog standby will not send
> > multiple votes.
> >
> > Apparently I am missing something here.
> >
> > Can you please tell me in what scenario a watchdog sends
> > multiple votes for failover?
> >
> > Best regards,
> > --
> > Tatsuo Ishii
> > SRA OSS, Inc. Japan
> > English: http://www.sraoss.co.jp/index_en.php
> > Japanese:http://www.sraoss.co.jp
> >
> > From: Muhammad Usama <m.usama at gmail.com>
> > Subject: New Feature with patch: Quorum and Consensus for backend failover
> > Date: Tue, 22 Aug 2017 00:18:27 +0500
> > Message-ID: <CAEJvTzUbz-d8dfsJdLt=XNYWdOMxKf06sp+p=uAbxyjvG=vS3A@mail.gmail.com>
> >
> >> Hi
> >>
> >> I have been working on a new feature to make backend node failover
> >> quorum aware, and halfway through the implementation I also added a
> >> majority consensus feature for it.
> >>
> >> So please find attached the first version of the patch for review. It
> >> makes the backend node failover consider the watchdog cluster quorum
> >> status and seek a majority consensus before performing the failover.
> >>
> >> *Changes in the Failover mechanism with watchdog.*
> >> For this new feature I have modified Pgpool-II's existing failover
> >> mechanism with watchdog.
> >> Previously, as you know, when Pgpool-II needed to perform a node
> >> operation (failover, failback, promote-node) with the watchdog, the
> >> watchdog propagated the failover request to all the Pgpool-II nodes
> >> in the watchdog cluster, and as soon as the request was received by a
> >> node it initiated the local failover; that failover was then
> >> synchronised on all nodes using distributed locks.
> >>
> >> *Now Only the Master node performs the failover.*
> >> The attached patch changes the mechanism of synchronised failover: now
> >> only the Pgpool-II on the master watchdog node performs the failover,
> >> and all other standby nodes sync the backend statuses after the master
> >> Pgpool-II has finished the failover.
> >>
> >> *Overview of new failover mechanism.*
> >> -- If the failover request is received by a standby watchdog node (from
> >> the local Pgpool-II), that request is forwarded to the master watchdog
> >> and the Pgpool-II main process gets the FAILOVER_RES_WILL_BE_DONE
> >> return code. Upon receiving FAILOVER_RES_WILL_BE_DONE from the
> >> watchdog for the failover request, the requesting Pgpool-II moves on
> >> without doing anything further for that particular failover command.
> >>
> >> -- When the failover request from a standby node is received by the
> >> master watchdog, then after performing the validation and applying the
> >> consensus rules, the failover is triggered on its local Pgpool-II.
> >>
> >> -- When the failover request is received by the master watchdog node
> >> from its local Pgpool-II (on the IPC channel), the watchdog process
> >> informs the requesting Pgpool-II process to proceed with the failover
> >> (provided all failover rules are satisfied).
> >>
> >> -- After the failover is finished on the master Pgpool-II, the failover
> >> function calls *wd_failover_end*(), which sends the backend sync
> >> required message to all standby watchdogs.
> >>
> >> -- Upon receiving the sync required message from the master watchdog
> >> node, all Pgpool-II nodes sync the new statuses of each backend node
> >> from the master watchdog.
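> >>
> >> In pseudo-C the new flow looks roughly like the sketch below. This is
> >> only an illustration: FAILOVER_RES_WILL_BE_DONE and wd_failover_end()
> >> are names used by the patch, everything else (including the numeric
> >> value of the return code) is made up for the example.
> >>
> >>     #include <stdio.h>
> >>     #include <stdbool.h>
> >>
> >>     #define FAILOVER_RES_WILL_BE_DONE 1    /* value invented here */
> >>
> >>     /* toy stand-ins so the sketch compiles and runs */
> >>     static bool i_am_master_watchdog(void)       { return false; }
> >>     static bool failover_rules_satisfied(void)   { return true;  }
> >>     static void forward_to_master_watchdog(void) { puts("forwarded to master"); }
> >>     static void perform_local_failover(void)     { puts("failover executed");   }
> >>     static void wd_failover_end(void)            { puts("sync message sent");   }
> >>
> >>     /* standbys only forward the request; the master alone acts on it */
> >>     static int handle_failover_request(void)
> >>     {
> >>         if (!i_am_master_watchdog())
> >>         {
> >>             forward_to_master_watchdog();
> >>             return FAILOVER_RES_WILL_BE_DONE;   /* requester moves on */
> >>         }
> >>         if (failover_rules_satisfied())
> >>         {
> >>             perform_local_failover();
> >>             wd_failover_end();    /* standbys then sync backend status */
> >>         }
> >>         return 0;
> >>     }
> >>
> >>     int main(void)
> >>     {
> >>         handle_failover_request();
> >>         return 0;
> >>     }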
> >>
> >> *No More Failover locks*
> >> Since with this new failover mechanism we no longer require any
> >> synchronisation or guards against the execution of failover_commands by
> >> multiple Pgpool-II nodes, the patch removes all the distributed locks
> >> from the failover function. This makes the failover simpler and faster.
> >>
> >> *New kind of Failover operation NODE_QUARANTINE_REQUEST*
> >> The patch adds a new kind of backend node operation, NODE_QUARANTINE,
> >> which is effectively the same as NODE_DOWN, but with node quarantine
> >> the failover_command is not triggered.
> >> A NODE_DOWN_REQUEST is automatically converted to a
> >> NODE_QUARANTINE_REQUEST when the failover is requested on a backend
> >> node but the watchdog cluster does not hold the quorum.
> >> This means that in the absence of quorum the failed backend nodes are
> >> quarantined, and when the quorum becomes available again Pgpool-II
> >> performs the failback operation on all quarantined nodes.
> >> Again, when the failback is performed on a quarantined backend node,
> >> the failover function does not trigger the failback_command.
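> >>
> >> A rough standalone illustration of that conversion rule (the enum and
> >> the helper function here are made up for the example; only the request
> >> names come from the patch):
> >>
> >>     #include <stdio.h>
> >>     #include <stdbool.h>
> >>
> >>     typedef enum { NODE_DOWN_REQUEST, NODE_QUARANTINE_REQUEST } NodeRequest;
> >>
> >>     /* without quorum a DOWN request degrades to a quarantine request,
> >>      * which marks the node down but never runs failover_command */
> >>     static NodeRequest classify_request(NodeRequest req, bool quorum_exists)
> >>     {
> >>         if (req == NODE_DOWN_REQUEST && !quorum_exists)
> >>             return NODE_QUARANTINE_REQUEST;
> >>         return req;
> >>     }
> >>
> >>     int main(void)
> >>     {
> >>         NodeRequest r = classify_request(NODE_DOWN_REQUEST, false);
> >>         puts(r == NODE_QUARANTINE_REQUEST ? "quarantined" : "failover");
> >>         return 0;
> >>     }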
> >>
> >> *Controlling the Failover behaviour.*
> >> The patch adds three new configuration parameters to configure the
> >> failover behaviour from the user's side.
> >>
> >> *failover_when_quorum_exists*
> >> When enabled, the failover command will only be executed when the
> >> watchdog cluster holds the quorum. When the quorum is absent and
> >> failover_when_quorum_exists is enabled, the failed backend nodes are
> >> quarantined until the quorum becomes available again.
> >> Disabling it restores the old behaviour of failover commands.
> >>
> >>
> >> *failover_require_consensus*
> >> This new configuration parameter can be used to make sure we get a
> >> majority vote before performing the failover on a node. When
> >> *failover_require_consensus* is enabled, the failover is only
> >> performed after receiving failover requests from a majority of
> >> Pgpool-II nodes.
> >> For example, in a three-node cluster the failover will not be performed
> >> until at least two nodes ask for the failover of the particular
> >> backend node.
> >>
> >> It is also worthwhile to mention here that *failover_require_consensus*
> >> only works when failover_when_quorum_exists is enabled.
> >>
> >>
> >> *enable_multiple_failover_requests_from_node*
> >> This parameter works in connection with *failover_require_consensus*
> >> config. When enabled, a single Pgpool-II node can vote for failover
> >> multiple times.
> >> For example, in a three-node cluster, if one Pgpool-II node sends the
> >> failover request for a particular node twice, that is counted as two
> >> votes in favour of failover and the failover will be performed even if
> >> we do not get a vote from the other two nodes.
> >>
> >> When *enable_multiple_failover_requests_from_node* is disabled, only
> >> the first vote from each Pgpool-II node will be accepted; all
> >> subsequent votes will be marked as duplicates and rejected.
> >> So in that case we will require majority votes from distinct nodes to
> >> execute the failover.
> >> Again, *enable_multiple_failover_requests_from_node* only becomes
> >> effective when both *failover_when_quorum_exists* and
> >> *failover_require_consensus* are enabled.
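> >>
> >> For illustration, the three parameters would sit together in
> >> pgpool.conf roughly like this (the values shown are just one example
> >> combination, not suggested defaults):
> >>
> >>     failover_when_quorum_exists = on
> >>     failover_require_consensus = on
> >>     enable_multiple_failover_requests_from_node = off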
> >>
> >>
> >> *Controlling the failover: The Coding perspective.*
> >> Although the failover functions are made quorum and consensus aware,
> >> there is still a way to bypass the quorum conditions and the
> >> requirement of consensus.
> >>
> >> For this the patch uses the existing request_details flags in
> >> POOL_REQUEST_NODE to control the behaviour of failover.
> >>
> >> Here are the newly added flag values.
> >>
> >> *REQ_DETAIL_WATCHDOG*:
> >> Setting this flag while issuing the failover command will not send the
> >> failover request to the watchdog. This flag may not be useful anywhere
> >> other than where it is already used.
> >> Mostly this flag is used to keep a failover command that already
> >> originated from the watchdog from being sent back to the watchdog;
> >> otherwise we could end up in an infinite loop.
> >>
> >> *REQ_DETAIL_CONFIRMED*:
> >> Setting this flag will bypass the *failover_require_consensus*
> >> configuration and immediately perform the failover if quorum is present.
> >> This flag can be used for failover requests originating from PCP
> >> commands.
> >>
> >> *REQ_DETAIL_UPDATE*:
> >> This flag is used for the command that fails back the quarantined
> >> nodes. Setting this flag will not trigger the failback_command.
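> >>
> >> As a toy example of how a caller might combine these bits (the numeric
> >> flag values and the variable are invented here; only the flag names
> >> come from the patch):
> >>
> >>     #include <stdio.h>
> >>
> >>     #define REQ_DETAIL_WATCHDOG  0x01  /* don't forward to the watchdog   */
> >>     #define REQ_DETAIL_CONFIRMED 0x02  /* skip failover_require_consensus */
> >>     #define REQ_DETAIL_UPDATE    0x04  /* failback of a quarantined node  */
> >>
> >>     int main(void)
> >>     {
> >>         /* e.g. a PCP-initiated detach that should not wait for consensus */
> >>         unsigned int request_details = REQ_DETAIL_CONFIRMED;
> >>
> >>         if (request_details & REQ_DETAIL_CONFIRMED)
> >>             puts("consensus requirement bypassed for this request");
> >>         return 0;
> >>     }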
> >>
> >> *Some conditional flags used:*
> >> I was not sure about the configuration of each type of failover
> >> operation. We have three main failover operations: NODE_UP_REQUEST,
> >> NODE_DOWN_REQUEST, and PROMOTE_NODE_REQUEST.
> >> So I was wondering whether we need to give users a configuration option
> >> to enable/disable quorum checking and consensus for each individual
> >> failover operation type.
> >> For example, is it a practical configuration where a user wants to
> >> ensure quorum when performing a NODE_DOWN operation but does not want
> >> it for NODE_UP?
> >> So in this patch I use three compile-time defines to enable/disable the
> >> quorum checks for each individual failover operation, until we decide
> >> on the best solution (a usage sketch follows the list below).
> >>
> >> NODE_UP_REQUIRE_CONSENSUS: defining it enables the quorum checking
> >> feature for NODE_UP_REQUESTs
> >>
> >> NODE_DOWN_REQUIRE_CONSENSUS: defining it enables the quorum checking
> >> feature for NODE_DOWN_REQUESTs
> >>
> >> NODE_PROMOTE_REQUIRE_CONSENSUS: defining it enables the quorum checking
> >> feature for PROMOTE_NODE_REQUESTs
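> >>
> >> The intended usage is plain conditional compilation, roughly like the
> >> sketch below (the function and the request codes are invented for the
> >> example; only the define names come from the patch):
> >>
> >>     #include <stdbool.h>
> >>     #include <stdio.h>
> >>
> >>     #define NODE_DOWN_REQUIRE_CONSENSUS   /* comment out for old behaviour */
> >>
> >>     typedef enum { NODE_UP, NODE_DOWN, NODE_PROMOTE } RequestKind;
> >>
> >>     static bool needs_quorum_check(RequestKind kind)
> >>     {
> >>     #ifdef NODE_UP_REQUIRE_CONSENSUS
> >>         if (kind == NODE_UP)      return true;
> >>     #endif
> >>     #ifdef NODE_DOWN_REQUIRE_CONSENSUS
> >>         if (kind == NODE_DOWN)    return true;
> >>     #endif
> >>     #ifdef NODE_PROMOTE_REQUIRE_CONSENSUS
> >>         if (kind == NODE_PROMOTE) return true;
> >>     #endif
> >>         return false;
> >>     }
> >>
> >>     int main(void)
> >>     {
> >>         printf("NODE_DOWN needs quorum check: %d\n",
> >>                needs_quorum_check(NODE_DOWN));
> >>         return 0;
> >>     }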
> >>
> >> *Some Points for Discussion:*
> >>
> >> *Do we really need to check the Req_info->switching flag before
> >> enqueuing a failover request?*
> >> While working on the patch I was wondering why we disallow enqueuing
> >> the failover command when a failover is already in progress. For
> >> example, in the *pcp_process_command*() function, if we see the
> >> *Req_info->switching* flag set we bail out with an error instead of
> >> enqueuing the command. Is that really necessary?
> >>
> >> *Do we need more granular control over each failover operation:*
> >> As described in the section "Some conditional flags used", I want
> >> opinions on whether we need configuration parameters in pgpool.conf to
> >> enable/disable quorum and consensus checking for individual failover
> >> types.
> >>
> >> *Which failovers should be marked as Confirmed:*
> >> As described in the REQ_DETAIL_CONFIRMED section above, we can mark a
> >> failover request as not needing consensus; currently the requests from
> >> the PCP commands are fired with this flag. But I was wondering whether
> >> there may be more places where we need to use this flag.
> >> For example, I currently use the same confirmed flag when failover is
> >> triggered because of *replication_stop_on_mismatch*.
> >>
> >> I think we should consider this flag for each place a failover is
> >> triggered, e.g. when it is triggered
> >> because of a health_check failure,
> >> because of a replication mismatch,
> >> because of a backend error,
> >> etc.
> >>
> >> *Node Quarantine behaviour.*
> >> What do you think about the node quarantine behaviour used by this
> >> patch? Can you think of any problems it could cause?
> >>
> >> *What should be the default values for each newly added config
> >> parameter?*
> >>
> >>
> >>
> >> *TODOs*
> >>
> >> -- Updating the documentation is still to do. I will do that once every
> >> aspect of the feature is finalised.
> >> -- Some code warnings and cleanups are still not addressed.
> >> -- I am still a little short on testing.
> >> -- Regression test cases for the feature are still needed.
> >>
> >>
> >> Thoughts and suggestions are most welcome.
> >>
> >> Thanks
> >> Best regards
> >> Muhammad Usama
>