[pgpool-hackers: 2506] Re: New Feature with patch: Quorum and Consensus for backend failover

Tatsuo Ishii ishii at sraoss.co.jp
Fri Aug 25 16:53:13 JST 2017


Usama,

With the new patch, the regression tests all passed.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Hi Ishii-San
> 
> Please find the updated patch. It fixes the regression issue you were
> facing, and also another bug which I encountered during my testing.
> 
> -- Adding Yugo to the thread,
> Hi Yugo,
> 
> Since you are an expert on the watchdog feature, I thought you might have
> something to say, especially regarding the discussion points mentioned in
> the initial mail.
> 
> 
> Thanks
> Best Regards
> Muhammad Usama
> 
> 
> On Thu, Aug 24, 2017 at 11:25 AM, Muhammad Usama <m.usama at gmail.com> wrote:
> 
>>
>>
>> On Thu, Aug 24, 2017 at 4:34 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>
>>> After applying the patch, many of the regression tests fail. It seems
>>> pgpool.conf.sample has a bogus comment which causes the pgpool.conf
>>> parser to report a parse error.
>>>
>>> 2017-08-24 08:22:36: pid 6017: FATAL:  syntex error in configuration file "/home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/standby/etc/pgpool.conf"
>>> 2017-08-24 08:22:36: pid 6017: DETAIL:  parse error at line 568 '*' token = 8
>>>
>>
>> Really sorry, somehow I overlooked the sample config file changes I made
>> at the last minute.
>> I will send you the updated version.
>>
>> Thanks
>> Best Regards
>> Muhammad Usama
>>
>>>
>>> Best regards,
>>> --
>>> Tatsuo Ishii
>>> SRA OSS, Inc. Japan
>>> English: http://www.sraoss.co.jp/index_en.php
>>> Japanese:http://www.sraoss.co.jp
>>>
>>> > Usama,
>>> >
>>> > Thanks for the patch. I am going to review it.
>>> >
>>> > In the meantime, when I applied your patch, I got some trailing
>>> > whitespace errors. Can you please fix them?
>>> >
>>> > /home/t-ishii/quorum_aware_failover.diff:470: trailing whitespace.
>>> >
>>> > /home/t-ishii/quorum_aware_failover.diff:485: trailing whitespace.
>>> >
>>> > /home/t-ishii/quorum_aware_failover.diff:564: trailing whitespace.
>>> >
>>> > /home/t-ishii/quorum_aware_failover.diff:1428: trailing whitespace.
>>> >
>>> > /home/t-ishii/quorum_aware_failover.diff:1450: trailing whitespace.
>>> >
>>> > warning: squelched 3 whitespace errors
>>> > warning: 8 lines add whitespace errors.
>>> >
>>> > Best regards,
>>> > --
>>> > Tatsuo Ishii
>>> > SRA OSS, Inc. Japan
>>> > English: http://www.sraoss.co.jp/index_en.php
>>> > Japanese:http://www.sraoss.co.jp
>>> >
>>> >> Hi
>>> >>
>>> >> I was working on a new feature to make the backend node failover
>>> >> quorum aware, and halfway through the implementation I also added a
>>> >> majority consensus feature.
>>> >>
>>> >> So please find the first version of the patch for review. It makes the
>>> >> backend node failover consider the watchdog cluster quorum status and
>>> >> seek majority consensus before performing a failover.
>>> >>
>>> >> *Changes in the failover mechanism with watchdog.*
>>> >> For this new feature I have modified Pgpool-II's existing failover
>>> >> mechanism with watchdog.
>>> >> Previously, as you know, when Pgpool-II needed to perform a node
>>> >> operation (failover, failback, promote-node) with the watchdog, the
>>> >> watchdog propagated the failover request to all the Pgpool-II nodes
>>> >> in the watchdog cluster. As soon as the request was received by a
>>> >> node, it initiated the local failover, and that failover was
>>> >> synchronised on all nodes using distributed locks.
>>> >>
>>> >> *Now only the master node performs the failover.*
>>> >> The attached patch changes the mechanism of synchronised failover:
>>> >> now only the Pgpool-II of the master watchdog node performs the
>>> >> failover, and all other standby nodes sync the backend statuses after
>>> >> the master Pgpool-II has finished the failover.
>>> >>
>>> >> *Overview of the new failover mechanism.*
>>> >> -- If a failover request is received by a standby watchdog node (from
>>> >> the local Pgpool-II), that request is forwarded to the master watchdog,
>>> >> and the Pgpool-II main process gets the FAILOVER_RES_WILL_BE_DONE
>>> >> return code. Upon receiving FAILOVER_RES_WILL_BE_DONE from the
>>> >> watchdog for the failover request, the requesting Pgpool-II moves
>>> >> forward without doing anything further for that particular failover
>>> >> command.
>>> >>
>>> >> -- When the failover request from a standby node is received by the
>>> >> master watchdog, it performs the validation, applies the consensus
>>> >> rules, and then triggers the failover on its local Pgpool-II.
>>> >>
>>> >> -- When a failover request is received by the master watchdog node from
>>> >> its local Pgpool-II (on the IPC channel), the watchdog process informs
>>> >> the requesting Pgpool-II process to proceed with the failover (provided
>>> >> all failover rules are satisfied).
>>> >>
>>> >> -- After the failover is finished on the master Pgpool-II, the failover
>>> >> function calls *wd_failover_end*(), which sends a backend-sync-required
>>> >> message to all standby watchdogs.
>>> >>
>>> >> -- Upon receiving the sync-required message from the master watchdog
>>> >> node, all Pgpool-II nodes sync the new status of each backend node from
>>> >> the master watchdog. A sketch of this dispatch logic follows below.
>>> >>
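>>> >> As a rough illustration of the dispatch described above, here is a
>>> >> minimal C-style sketch. It is not the patch's actual code: only
>>> >> FAILOVER_RES_WILL_BE_DONE appears in the patch as described above; all
>>> >> other names (dispatch_failover_request, is_master_watchdog_node,
>>> >> forward_request_to_master, quorum_exists, consensus_reached and the
>>> >> other result codes) are hypothetical.
>>> >>
>>> >>     /* Sketch only: standby nodes forward, the master node decides. */
>>> >>     static int
>>> >>     dispatch_failover_request(FailoverRequest *req)
>>> >>     {
>>> >>         if (!is_master_watchdog_node())
>>> >>         {
>>> >>             /* Standby: forward to the master and tell the local
>>> >>              * Pgpool-II main process the master will handle it. */
>>> >>             forward_request_to_master(req);
>>> >>             return FAILOVER_RES_WILL_BE_DONE;
>>> >>         }
>>> >>
>>> >>         /* Master: validate and apply the quorum/consensus rules,
>>> >>          * then perform the failover on the local Pgpool-II. */
>>> >>         if (quorum_exists() && consensus_reached(req))
>>> >>             return FAILOVER_RES_PROCEED;     /* hypothetical code */
>>> >>
>>> >>         return FAILOVER_RES_QUARANTINE;      /* hypothetical code */
>>> >>     }
>>> >>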
>>> >> *No more failover locks*
>>> >> Since with this new failover mechanism we no longer require any
>>> >> synchronisation or guards against the execution of failover_commands
>>> >> by multiple Pgpool-II nodes, the patch removes all the distributed
>>> >> locks from the failover function. This makes the failover simpler and
>>> >> faster.
>>> >>
>>> >> *A new kind of failover operation: NODE_QUARANTINE_REQUEST*
>>> >> The patch adds a new kind of backend node operation, NODE_QUARANTINE,
>>> >> which is effectively the same as NODE_DOWN, except that with node
>>> >> quarantine the failover_command is not triggered.
>>> >> A NODE_DOWN_REQUEST is automatically converted to a
>>> >> NODE_QUARANTINE_REQUEST when failover is requested on a backend node
>>> >> but the watchdog cluster does not hold the quorum.
>>> >> This means that in the absence of quorum the failed backend nodes are
>>> >> quarantined, and when the quorum becomes available again Pgpool-II
>>> >> performs the failback operation on all quarantined nodes.
>>> >> Likewise, when the failback is performed on a quarantined backend node,
>>> >> the failover function does not trigger the failback_command.
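>>> >>
>>> >> As a rough illustration (not the patch's actual code; the
>>> >> watchdog_cluster_has_quorum() helper and the pool_config field name
>>> >> are assumptions), the conversion could look like this:
>>> >>
>>> >>     /* Sketch only: degrade NODE_DOWN to NODE_QUARANTINE when the
>>> >>      * watchdog cluster holds no quorum, so failover_command is
>>> >>      * never triggered for the node. */
>>> >>     if (req->kind == NODE_DOWN_REQUEST &&
>>> >>         pool_config->failover_when_quorum_exists &&
>>> >>         !watchdog_cluster_has_quorum())
>>> >>         req->kind = NODE_QUARANTINE_REQUEST;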
>>> >>
>>> >> *Controlling the failover behaviour.*
>>> >> The patch adds three new configuration parameters to configure the
>>> >> failover behaviour from the user side.
>>> >>
>>> >> *failover_when_quorum_exists*
>>> >> When enabled, the failover command will only be executed when the
>>> >> watchdog cluster holds the quorum. When the quorum is absent and
>>> >> failover_when_quorum_exists is enabled, the failed backend nodes get
>>> >> quarantined until the quorum becomes available again.
>>> >> Disabling it restores the old behaviour of failover commands.
>>> >>
>>> >>
>>> >> *failover_require_consensus*
>>> >> This new configuration parameter can be used to make sure we get a
>>> >> majority vote before performing the failover on a node. When
>>> >> *failover_require_consensus* is enabled, the failover is only performed
>>> >> after receiving failover requests from the majority of Pgpool-II nodes.
>>> >> For example, in a three-node cluster the failover will not be performed
>>> >> until at least two nodes ask for the failover of the particular backend
>>> >> node.
>>> >>
>>> >> It is also worthwhile to mention here that *failover_require_consensus*
>>> >> only works when failover_when_quorum_exists is enabled.
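>>> >>
>>> >> The majority threshold here is simply more than half of the watchdog
>>> >> cluster. A minimal sketch (illustrative only, not the patch's code):
>>> >>
>>> >>     /* Sketch only: votes needed for consensus in an N-node cluster. */
>>> >>     static int
>>> >>     votes_needed_for_consensus(int cluster_size)
>>> >>     {
>>> >>         return (cluster_size / 2) + 1;  /* e.g. 3 nodes -> 2 votes */
>>> >>     }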
>>> >>
>>> >>
>>> >> *enable_multiple_failover_requests_from_node*
>>> >> This parameter works in connection with *failover_require_consensus*.
>>> >> When enabled, a single Pgpool-II node can vote for failover multiple
>>> >> times.
>>> >> For example, in a three-node cluster, if one Pgpool-II node sends the
>>> >> failover request for a particular node twice, that is counted as two
>>> >> votes in favour of the failover, and the failover will be performed
>>> >> even if we do not get a vote from the other two nodes.
>>> >>
>>> >> When *enable_multiple_failover_requests_from_node* is disabled, only
>>> >> the first vote from each Pgpool-II node is accepted and all subsequent
>>> >> votes are marked duplicate and rejected.
>>> >> So in that case we require majority votes from distinct nodes to
>>> >> execute the failover.
>>> >> Again, *enable_multiple_failover_requests_from_node* only becomes
>>> >> effective when both *failover_when_quorum_exists* and
>>> >> *failover_require_consensus* are enabled.
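>>> >>
>>> >> For instance, a setup requiring both quorum and consensus from
>>> >> distinct nodes could be configured in pgpool.conf like this (the
>>> >> values are only an example; the defaults are an open question below):
>>> >>
>>> >>     failover_when_quorum_exists = on
>>> >>     failover_require_consensus = on
>>> >>     enable_multiple_failover_requests_from_node = off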
>>> >>
>>> >>
>>> >> *Controlling the failover: the coding perspective.*
>>> >> Although the failover functions are made quorum and consensus aware,
>>> >> there is still a way to bypass the quorum conditions and the
>>> >> requirement of consensus.
>>> >>
>>> >> For this, the patch uses the existing request_details flags in
>>> >> POOL_REQUEST_NODE to control the behaviour of the failover.
>>> >>
>>> >> Here are the newly added flag values.
>>> >>
>>> >> *REQ_DETAIL_WATCHDOG*:
>>> >> Setting this flag while issuing the failover command will not send the
>>> >> failover request to the watchdog. This flag may not be useful in any
>>> >> place other than where it is already used.
>>> >> Mostly this flag can be used to keep a failover command that already
>>> >> originated from the watchdog from going back to the watchdog; otherwise
>>> >> we could end up in an infinite loop.
>>> >>
>>> >> *REQ_DETAIL_CONFIRMED*:
>>> >> Setting this flag will bypass the *failover_require_consensus*
>>> >> configuration and immediately perform the failover if quorum is
>>> >> present.
>>> >> This flag can be used to issue failover requests originating from a
>>> >> PCP command.
>>> >>
>>> >> *REQ_DETAIL_UPDATE*:
>>> >> This flag is used for the command where we are failing back the
>>> >> quarantined nodes. Setting this flag will not trigger the
>>> >> failback_command.
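>>> >>
>>> >> As a hypothetical usage sketch (this assumes the patch lets the
>>> >> existing degenerate_backend_set() call carry the request-details
>>> >> flags; the exact signature is an assumption, not a verified API):
>>> >>
>>> >>     /* Sketch only: a PCP-originated detach that skips the consensus
>>> >>      * requirement but still honours the quorum check. */
>>> >>     int node_id_set[1] = {node_id};
>>> >>     degenerate_backend_set(node_id_set, 1, REQ_DETAIL_CONFIRMED);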
>>> >>
>>> >> *Some conditional flags used:*
>>> >> I was not sure about the configuration of each type of failover
>>> >> operation.
>>> >> We have three main failover operations: NODE_UP_REQUEST,
>>> >> NODE_DOWN_REQUEST, and PROMOTE_NODE_REQUEST.
>>> >> So I was thinking: do we need to give users a configuration option to
>>> >> enable/disable quorum checking and consensus for each individual
>>> >> failover operation type?
>>> >> For example: is it a practical configuration where a user would want
>>> >> to ensure quorum while performing a NODE_DOWN operation, but does not
>>> >> want it for NODE_UP?
>>> >> So in this patch I use three compile-time defines to enable/disable
>>> >> the individual failover operations, until we can decide on the best
>>> >> solution.
>>> >>
>>> >> NODE_UP_REQUIRE_CONSENSUS: defining it enables the quorum checking
>>> >> feature for NODE_UP_REQUESTs.
>>> >>
>>> >> NODE_DOWN_REQUIRE_CONSENSUS: defining it enables the quorum checking
>>> >> feature for NODE_DOWN_REQUESTs.
>>> >>
>>> >> NODE_PROMOTE_REQUIRE_CONSENSUS: defining it enables the quorum checking
>>> >> feature for PROMOTE_NODE_REQUESTs.
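>>> >>
>>> >> In code form, the idea is simply (a sketch; whether these defines are
>>> >> set by default is part of the open questions below):
>>> >>
>>> >>     #ifdef NODE_DOWN_REQUIRE_CONSENSUS
>>> >>         /* quorum/consensus rules are applied to NODE_DOWN_REQUESTs */
>>> >>     #endif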
>>> >>
>>> >> *Some points for discussion:*
>>> >>
>>> >> *Do we really need to check the Req_info->switching flag before
>>> >> enqueuing a failover request?*
>>> >> While working on the patch I was wondering why we disallow enqueuing a
>>> >> failover command when a failover is already in progress. For example,
>>> >> in the *pcp_process_command*() function, if we see the
>>> >> *Req_info->switching* flag set, we bail out with an error instead of
>>> >> enqueuing the command. Is it really necessary?
>>> >>
>>> >> *Do we need more granular control over each failover operation?*
>>> >> As described in the section "Some conditional flags used", I would like
>>> >> opinions on whether we need configuration parameters in pgpool.conf to
>>> >> enable/disable quorum and consensus checking on individual failover
>>> >> types.
>>> >>
>>> >> *Which failovers should be marked as confirmed?*
>>> >> As described in the section on REQ_DETAIL_CONFIRMED above, we can mark
>>> >> a failover request as not needing consensus; currently the requests
>>> >> from the PCP commands are fired with this flag. But I was wondering if
>>> >> there may be more places where we need to use the flag.
>>> >> For example, I currently use the same confirmed flag when failover is
>>> >> triggered because of *replication_stop_on_mismatch*.
>>> >>
>>> >> I think we should consider this flag for each place failover is
>>> >> triggered, e.g. when it is triggered
>>> >> because of a health_check failure,
>>> >> because of a replication mismatch,
>>> >> because of a backend error,
>>> >> etc.
>>> >>
>>> >> *Node quarantine behaviour.*
>>> >> What do you think about the node quarantine behaviour used by this
>>> >> patch? Can you think of some problem which could be caused by it?
>>> >>
>>> >> *What should be the default values for each newly added config
>>> >> parameter?*
>>> >>
>>> >>
>>> >>
>>> >> *TODOs*
>>> >>
>>> >> -- Updating the documentation is still to do. I will do that once
>>> >> every aspect of the feature is finalised.
>>> >> -- Some code warnings and cleanups are still not done.
>>> >> -- I am still a little short on testing.
>>> >> -- Regression test cases for the feature are needed.
>>> >>
>>> >>
>>> >> Thoughts and suggestions are most welcome.
>>> >>
>>> >> Thanks
>>> >> Best regards
>>> >> Muhammad Usama
>>> > _______________________________________________
>>> > pgpool-hackers mailing list
>>> > pgpool-hackers at pgpool.net
>>> > http://www.pgpool.net/mailman/listinfo/pgpool-hackers
>>>
>>
>>

