[pgpool-hackers: 2509] Re: New Feature with patch: Quorum and Consensus for backend failover

Tatsuo Ishii ishii at sraoss.co.jp
Fri Aug 25 21:05:28 JST 2017


> On Fri, Aug 25, 2017 at 12:53 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> Usama,
>>
>> With the new patch, the regression tests all passed.
>>
> 
> Glad to hear that :-)
> Did you have a chance to look at the node quarantine state I added? What are
> your thoughts on that?

I'm going to look into the patch this weekend.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

>> > Hi Ishii-San
>> >
>> > Please find the updated patch. It fixes the regression issue you were
>> > facing and also another bug which I encountered during my testing.
>> >
>> > -- Adding Yugo to the thread,
>> > Hi Yugo,
>> >
>> > Since you are an expert on the watchdog feature, I thought you might have
>> > something to say, especially regarding the discussion points mentioned in
>> > the initial mail.
>> >
>> >
>> > Thanks
>> > Best Regards
>> > Muhammad Usama
>> >
>> >
>> > On Thu, Aug 24, 2017 at 11:25 AM, Muhammad Usama <m.usama at gmail.com> wrote:
>> >
>> >>
>> >>
>> >> On Thu, Aug 24, 2017 at 4:34 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >>
>> >>> After applying the patch, many of the regression tests fail. It seems
>> >>> pgpool.conf.sample has a bogus comment which causes the pgpool.conf
>> >>> parser to report a parse error.
>> >>>
>> >>> 2017-08-24 08:22:36: pid 6017: FATAL:  syntex error in configuration file "/home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/standby/etc/pgpool.conf"
>> >>> 2017-08-24 08:22:36: pid 6017: DETAIL:  parse error at line 568 '*' token = 8
>> >>>
>> >>
>> >> Really sorry. Somehow I overlooked the sample config file changes I made
>> >> at the last minute.
>> >> I will send you the updated version.
>> >>
>> >> Thanks
>> >> Best Regards
>> >> Muhammad Usama
>> >>
>> >>>
>> >>> Best regards,
>> >>> --
>> >>> Tatsuo Ishii
>> >>> SRA OSS, Inc. Japan
>> >>> English: http://www.sraoss.co.jp/index_en.php
>> >>> Japanese:http://www.sraoss.co.jp
>> >>>
>> >>> > Usama,
>> >>> >
>> >>> > Thanks for the patch. I am going to review it.
>> >>> >
>> >>> > In the meantime, when I applied your patch I got some trailing
>> >>> > whitespace errors. Can you please fix them?
>> >>> >
>> >>> > /home/t-ishii/quorum_aware_failover.diff:470: trailing whitespace.
>> >>> >
>> >>> > /home/t-ishii/quorum_aware_failover.diff:485: trailing whitespace.
>> >>> >
>> >>> > /home/t-ishii/quorum_aware_failover.diff:564: trailing whitespace.
>> >>> >
>> >>> > /home/t-ishii/quorum_aware_failover.diff:1428: trailing whitespace.
>> >>> >
>> >>> > /home/t-ishii/quorum_aware_failover.diff:1450: trailing whitespace.
>> >>> >
>> >>> > warning: squelched 3 whitespace errors
>> >>> > warning: 8 lines add whitespace errors.
>> >>> >
>> >>> > Best regards,
>> >>> > --
>> >>> > Tatsuo Ishii
>> >>> > SRA OSS, Inc. Japan
>> >>> > English: http://www.sraoss.co.jp/index_en.php
>> >>> > Japanese:http://www.sraoss.co.jp
>> >>> >
>> >>> >> Hi
>> >>> >>
>> >>> >> I was working on a new feature to make the backend node failover quorum
>> >>> >> aware, and halfway through the implementation I also added a majority
>> >>> >> consensus feature for it.
>> >>> >>
>> >>> >> So please find the first version of the patch for review. It makes the
>> >>> >> backend node failover consider the watchdog cluster quorum status and
>> >>> >> seek majority consensus before performing a failover.
>> >>> >>
>> >>> >> *Changes in the failover mechanism with watchdog.*
>> >>> >> For this new feature I have modified Pgpool-II's existing failover
>> >>> >> mechanism with watchdog.
>> >>> >> Previously, as you know, when Pgpool-II needed to perform a node
>> >>> >> operation (failover, failback, promote-node) with the watchdog, the
>> >>> >> watchdog propagated the failover request to all the Pgpool-II nodes in
>> >>> >> the watchdog cluster, and as soon as a node received the request it
>> >>> >> initiated the local failover; that failover was then synchronised on
>> >>> >> all nodes using distributed locks.
>> >>> >>
>> >>> >> *Now only the master node performs the failover.*
>> >>> >> The attached patch changes the mechanism of synchronised failover: now
>> >>> >> only the Pgpool-II on the master watchdog node performs the failover,
>> >>> >> and all other standby nodes sync the backend statuses after the master
>> >>> >> Pgpool-II has finished the failover.
>> >>> >>
>> >>> >> *Overview of the new failover mechanism.*
>> >>> >> -- If a failover request is received by a standby watchdog node (from
>> >>> >> the local Pgpool-II), that request is forwarded to the master watchdog
>> >>> >> and the Pgpool-II main process gets the FAILOVER_RES_WILL_BE_DONE
>> >>> >> return code. Upon receiving FAILOVER_RES_WILL_BE_DONE from the watchdog
>> >>> >> for the failover request, the requesting Pgpool-II moves forward
>> >>> >> without doing anything further for that particular failover command.
>> >>> >>
>> >>> >> -- When the master watchdog receives the failover request from a
>> >>> >> standby node, it performs the validation, applies the consensus rules,
>> >>> >> and then triggers the failover on its local Pgpool-II.
>> >>> >>
>> >>> >> -- When the master watchdog node receives a failover request from its
>> >>> >> local Pgpool-II (on the IPC channel), the watchdog process tells the
>> >>> >> requesting Pgpool-II process to proceed with the failover (provided all
>> >>> >> failover rules are satisfied).
>> >>> >>
>> >>> >> -- After the failover is finished on the master Pgpool-II, the failover
>> >>> >> function calls *wd_failover_end*(), which sends the "backend sync
>> >>> >> required" message to all standby watchdogs.
>> >>> >>
>> >>> >> -- Upon receiving the sync required message from the master watchdog
>> >>> >> node, all Pgpool-II nodes sync the new statuses of each backend node
>> >>> >> from the master watchdog.
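>> >>> >>
>> >>> >> To make the flow concrete, here is a tiny self-contained C sketch of
>> >>> >> the idea (illustrative only; apart from FAILOVER_RES_WILL_BE_DONE and
>> >>> >> wd_failover_end() mentioned above, the names are made up and not the
>> >>> >> patch's actual API):
>> >>> >>
>> >>> >>     /* Toy model of the new flow: a standby just forwards the request
>> >>> >>      * and returns; only the master actually performs the failover. */
>> >>> >>     typedef enum { FAILOVER_RES_WILL_BE_DONE, FAILOVER_RES_PROCEED } failover_res;
>> >>> >>     extern void forward_request_to_master(void);   /* hypothetical helper */
>> >>> >>
>> >>> >>     static failover_res
>> >>> >>     handle_local_failover_request(int i_am_master_watchdog)
>> >>> >>     {
>> >>> >>         if (!i_am_master_watchdog)
>> >>> >>         {
>> >>> >>             forward_request_to_master();
>> >>> >>             return FAILOVER_RES_WILL_BE_DONE;   /* caller does nothing more */
>> >>> >>         }
>> >>> >>         /* master: perform the failover locally; afterwards
>> >>> >>          * wd_failover_end() tells the standbys to sync their
>> >>> >>          * backend statuses */
>> >>> >>         return FAILOVER_RES_PROCEED;
>> >>> >>     }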
>> >>> >>
>> >>> >> *No more failover locks*
>> >>> >> Since with this new failover mechanism we no longer need to synchronise
>> >>> >> or guard against the execution of failover_commands by multiple
>> >>> >> Pgpool-II nodes, the patch removes all the distributed locks from the
>> >>> >> failover function. This makes the failover simpler and faster.
>> >>> >>
>> >>> >> *New kind of failover operation: NODE_QUARANTINE_REQUEST*
>> >>> >> The patch adds a new kind of backend node operation, NODE_QUARANTINE,
>> >>> >> which is effectively the same as NODE_DOWN, except that with node
>> >>> >> quarantine the failover_command is not triggered.
>> >>> >> A NODE_DOWN_REQUEST is automatically converted to a
>> >>> >> NODE_QUARANTINE_REQUEST when failover is requested on a backend node
>> >>> >> but the watchdog cluster does not hold the quorum.
>> >>> >> This means that in the absence of quorum the failed backend nodes are
>> >>> >> quarantined, and when the quorum becomes available again Pgpool-II
>> >>> >> performs the failback operation on all quarantined nodes. And again,
>> >>> >> when the failback is performed on a quarantined backend node, the
>> >>> >> failover function does not trigger the failback_command.
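>> >>> >>
>> >>> >> In code terms the conversion boils down to something like this (a
>> >>> >> sketch of the behaviour described above, not the patch's exact code;
>> >>> >> wd_cluster_has_quorum() is a made-up placeholder):
>> >>> >>
>> >>> >>     /* Without quorum a DOWN request is downgraded to QUARANTINE: the
>> >>> >>      * backend status changes as for DOWN, but failover_command is
>> >>> >>      * skipped and the node can be failed back once quorum returns. */
>> >>> >>     if (request_kind == NODE_DOWN_REQUEST && !wd_cluster_has_quorum())
>> >>> >>         request_kind = NODE_QUARANTINE_REQUEST;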
>> >>> >>
>> >>> >> *Controlling the failover behaviour.*
>> >>> >> The patch adds three new configuration parameters to configure the
>> >>> >> failover behaviour from the user side.
>> >>> >>
>> >>> >> *failover_when_quorum_exists*
>> >>> >> When enabled, the failover command will only be executed when the
>> >>> >> watchdog cluster holds the quorum. When the quorum is absent and
>> >>> >> failover_when_quorum_exists is enabled, the failed backend nodes will
>> >>> >> be quarantined until the quorum becomes available again.
>> >>> >> Disabling it restores the old behaviour of failover commands.
>> >>> >>
>> >>> >>
>> >>> >> *failover_require_consensus*
>> >>> >> This new configuration parameter can be used to make sure we get a
>> >>> >> majority vote before performing the failover on a node. When
>> >>> >> *failover_require_consensus* is enabled, the failover is only performed
>> >>> >> after receiving failover requests from a majority of Pgpool-II nodes.
>> >>> >> For example, in a three-node cluster the failover will not be performed
>> >>> >> until at least two nodes ask for the failover of the particular backend
>> >>> >> node.
>> >>> >>
>> >>> >> It is also worthwhile to mention here that *failover_require_consensus*
>> >>> >> only works when failover_when_quorum_exists is enabled.
>> >>> >>
>> >>> >>
>> >>> >> *enable_multiple_failover_requests_from_node*
>> >>> >> This parameter works in conjunction with *failover_require_consensus*.
>> >>> >> When enabled, a single Pgpool-II node can vote for failover multiple
>> >>> >> times.
>> >>> >> For example, in a three-node cluster, if one Pgpool-II node sends the
>> >>> >> failover request for a particular node twice, that is counted as two
>> >>> >> votes in favour of failover, and the failover will be performed even if
>> >>> >> we do not get a vote from the other two nodes.
>> >>> >>
>> >>> >> When *enable_multiple_failover_requests_from_node* is disabled, only
>> >>> >> the first vote from each Pgpool-II is accepted and all subsequent votes
>> >>> >> are marked duplicate and rejected.
>> >>> >> So in that case we require majority votes from distinct nodes to
>> >>> >> execute the failover.
>> >>> >> Again, *enable_multiple_failover_requests_from_node* only becomes
>> >>> >> effective when both *failover_when_quorum_exists* and
>> >>> >> *failover_require_consensus* are enabled.
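>> >>> >>
>> >>> >> To summarise how I intend the three parameters to interact, here is a
>> >>> >> small self-contained C model of the decision (this is my reading of the
>> >>> >> behaviour described above, not the patch's actual code; all names are
>> >>> >> illustrative):
>> >>> >>
>> >>> >>     #include <stdbool.h>
>> >>> >>
>> >>> >>     /* "votes" are failover requests received for one backend node */
>> >>> >>     static bool
>> >>> >>     failover_allowed(bool when_quorum_exists,      /* failover_when_quorum_exists */
>> >>> >>                      bool require_consensus,       /* failover_require_consensus */
>> >>> >>                      bool multiple_votes_per_node, /* enable_multiple_failover_requests_from_node */
>> >>> >>                      bool quorum_exists,
>> >>> >>                      int  votes_from_distinct_nodes,
>> >>> >>                      int  total_votes,
>> >>> >>                      int  watchdog_cluster_size)
>> >>> >>     {
>> >>> >>         int majority = watchdog_cluster_size / 2 + 1;
>> >>> >>
>> >>> >>         if (!when_quorum_exists)
>> >>> >>             return true;                    /* old behaviour: always fail over */
>> >>> >>         if (!quorum_exists)
>> >>> >>             return false;                   /* node is quarantined instead */
>> >>> >>         if (!require_consensus)
>> >>> >>             return true;                    /* quorum alone is enough */
>> >>> >>         if (multiple_votes_per_node)
>> >>> >>             return total_votes >= majority; /* repeated votes from one node count */
>> >>> >>         return votes_from_distinct_nodes >= majority;  /* one vote per node */
>> >>> >>     }
>> >>> >>
>> >>> >> For a three-node watchdog cluster, majority works out to 2, which
>> >>> >> matches the example above.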
>> >>> >>
>> >>> >>
>> >>> >> *Controlling the failover: the coding perspective.*
>> >>> >> Although the failover functions are made quorum and consensus aware,
>> >>> >> there is still a way to bypass the quorum conditions and the
>> >>> >> requirement of consensus.
>> >>> >>
>> >>> >> For this the patch uses the existing request_details flags in
>> >>> >> POOL_REQUEST_NODE to control the behaviour of failover.
>> >>> >>
>> >>> >> Here are the newly added flag values.
>> >>> >>
>> >>> >> *REQ_DETAIL_WATCHDOG*:
>> >>> >> Setting this flag while issuing the failover command will not send the
>> >>> >> failover request to the watchdog. This flag may not be useful anywhere
>> >>> >> other than where it is already used.
>> >>> >> Mostly this flag is there to keep a failover command that already
>> >>> >> originated from the watchdog from being sent back to the watchdog;
>> >>> >> otherwise we can end up in an infinite loop.
>> >>> >>
>> >>> >> *REQ_DETAIL_CONFIRMED*:
>> >>> >> Setting this flag will bypass the *failover_require_consensus*
>> >>> >> configuration and immediately perform the failover if quorum is
>> >>> >> present. This flag can be used for failover requests originating from
>> >>> >> PCP commands.
>> >>> >>
>> >>> >> *REQ_DETAIL_UPDATE*:
>> >>> >> This flag is used for the command where we are failing back the
>> >>> >> quarantined nodes. Setting this flag will not trigger the
>> >>> >> failback_command.
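>> >>> >>
>> >>> >> For illustration, the flags are meant to be OR-ed into request_details
>> >>> >> roughly like this (the bit values and variable names here are made up
>> >>> >> for the example; the patch defines the real ones):
>> >>> >>
>> >>> >>     #define REQ_DETAIL_WATCHDOG   0x01  /* do not forward to watchdog */
>> >>> >>     #define REQ_DETAIL_CONFIRMED  0x02  /* skip failover_require_consensus */
>> >>> >>     #define REQ_DETAIL_UPDATE     0x04  /* quarantine failback: no failback_command */
>> >>> >>
>> >>> >>     unsigned int pcp_request_details = REQ_DETAIL_CONFIRMED;  /* PCP command */
>> >>> >>     unsigned int wd_request_details  = REQ_DETAIL_WATCHDOG;   /* request that came
>> >>> >>                                                                * from the watchdog
>> >>> >>                                                                * itself */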
>> >>> >>
>> >>> >> *Some conditional flags used:*
>> >>> >> I was not sure about the configuration of each type of failover
>> >>> >> operation. We have three main failover operations: NODE_UP_REQUEST,
>> >>> >> NODE_DOWN_REQUEST, and PROMOTE_NODE_REQUEST.
>> >>> >> So I was thinking, do we need to give users a configuration option to
>> >>> >> enable/disable quorum checking and consensus for each individual
>> >>> >> failover operation type?
>> >>> >> For example: is it a practical configuration where a user would want to
>> >>> >> ensure quorum while performing a NODE_DOWN operation but does not want
>> >>> >> it for NODE_UP?
>> >>> >> So in this patch I use three compile-time defines to enable or disable
>> >>> >> the individual failover operations, until we decide on the best
>> >>> >> solution.
>> >>> >>
>> >>> >> NODE_UP_REQUIRE_CONSENSUS: defining it enables the quorum checking
>> >>> >> feature for NODE_UP_REQUESTs
>> >>> >>
>> >>> >> NODE_DOWN_REQUIRE_CONSENSUS: defining it enables the quorum checking
>> >>> >> feature for NODE_DOWN_REQUESTs
>> >>> >>
>> >>> >> NODE_PROMOTE_REQUIRE_CONSENSUS: defining it enables the quorum checking
>> >>> >> feature for PROMOTE_NODE_REQUESTs
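>> >>> >>
>> >>> >> In other words, the gating looks conceptually like this (a sketch, not
>> >>> >> the patch's exact code; it assumes pgpool's request-kind constants
>> >>> >> named above):
>> >>> >>
>> >>> >>     /* Which request kinds go through quorum/consensus checking */
>> >>> >>     static bool
>> >>> >>     request_is_quorum_checked(int request_kind)
>> >>> >>     {
>> >>> >>         switch (request_kind)
>> >>> >>         {
>> >>> >>     #ifdef NODE_UP_REQUIRE_CONSENSUS
>> >>> >>             case NODE_UP_REQUEST:      return true;
>> >>> >>     #endif
>> >>> >>     #ifdef NODE_DOWN_REQUIRE_CONSENSUS
>> >>> >>             case NODE_DOWN_REQUEST:    return true;
>> >>> >>     #endif
>> >>> >>     #ifdef NODE_PROMOTE_REQUIRE_CONSENSUS
>> >>> >>             case PROMOTE_NODE_REQUEST: return true;
>> >>> >>     #endif
>> >>> >>             default:                   return false;
>> >>> >>         }
>> >>> >>     }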
>> >>> >>
>> >>> >> *Some points for discussion:*
>> >>> >>
>> >>> >> *Do we really need to check the Req_info->switching flag before
>> >>> >> enqueuing a failover request?*
>> >>> >> While working on the patch I was wondering why we disallow enqueuing a
>> >>> >> failover command when a failover is already in progress. For example,
>> >>> >> in *pcp_process_command*(), if we see the *Req_info->switching* flag
>> >>> >> set we bail out with an error instead of enqueuing the command. Is that
>> >>> >> really necessary?
>> >>> >>
>> >>> >> *Do we need more granular control over each failover operation?*
>> >>> >> As described in the section "Some conditional flags used", I would like
>> >>> >> opinions on whether we need configuration parameters in pgpool.conf to
>> >>> >> enable or disable quorum and consensus checking for individual failover
>> >>> >> types.
>> >>> >>
>> >>> >> *Which failovers should be marked as confirmed?*
>> >>> >> As described in the REQ_DETAIL_CONFIRMED section above, we can mark a
>> >>> >> failover request as not requiring consensus; currently the requests
>> >>> >> from the PCP commands are fired with this flag. But I was wondering
>> >>> >> whether there are more places where we need to use the flag.
>> >>> >> For example, I currently use the same confirmed flag when failover is
>> >>> >> triggered because of *replication_stop_on_mismatch*.
>> >>> >>
>> >>> >> I think we should consider this flag for each place a failover is
>> >>> >> triggered, e.g. when the failover is triggered
>> >>> >> because of a health_check failure,
>> >>> >> because of a replication mismatch,
>> >>> >> because of a backend error,
>> >>> >> etc.
>> >>> >>
>> >>> >> *Node quarantine behaviour.*
>> >>> >> What do you think about the node quarantine used by this patch? Can you
>> >>> >> think of any problem it could cause?
>> >>> >>
>> >>> >> *What should the default values be for each newly added config
>> >>> >> parameter?*
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> *TODOs*
>> >>> >>
>> >>> >> -- Updating the documentation is still to do. I will do that once every
>> >>> >> aspect of the feature is finalised.
>> >>> >> -- Some code warnings and cleanups are still not done.
>> >>> >> -- I am still a little short on testing.
>> >>> >> -- Regression test cases for the feature.
>> >>> >>
>> >>> >>
>> >>> >> Thoughts and suggestions are most welcome.
>> >>> >>
>> >>> >> Thanks
>> >>> >> Best regards
>> >>> >> Muhammad Usama
>> >>> > _______________________________________________
>> >>> > pgpool-hackers mailing list
>> >>> > pgpool-hackers at pgpool.net
>> >>> > http://www.pgpool.net/mailman/listinfo/pgpool-hackers
>> >>>
>> >>
>> >>
>>

