[pgpool-hackers: 2510] Re: New Feature with patch: Quorum and Consensus for backend failover

Muhammad Usama m.usama at gmail.com
Fri Aug 25 21:49:08 JST 2017


On Fri, Aug 25, 2017 at 5:05 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> > On Fri, Aug 25, 2017 at 12:53 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >
> >> Usama,
> >>
> >> With the new patch, the regression tests all passed.
> >>
> >
> > Glad to hear that :-)
> > Did you have a chance to look at the node quarantine state I added?
> > What are your thoughts on that?
>
> I'm going to look into the patch this weekend.
>

Many thanks

Best Regards
Muhammad Usama

>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
> >> > Hi Ishii-San
> >> >
> >> > Please find the updated patch. It fixes the regression issue you were
> >> > facing and also another bug I encountered during my testing.
> >> >
> >> > -- Adding Yugo to the thread,
> >> > Hi Yugo,
> >> >
> >> > Since you are an expert on the watchdog feature, I thought you might
> >> > have something to say, especially regarding the discussion points
> >> > mentioned in the initial mail.
> >> >
> >> >
> >> > Thanks
> >> > Best Regards
> >> > Muhammad Usama
> >> >
> >> >
> >> > On Thu, Aug 24, 2017 at 11:25 AM, Muhammad Usama <m.usama at gmail.com> wrote:
> >> >
> >> >>
> >> >>
> >> >> On Thu, Aug 24, 2017 at 4:34 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >> >>
> >> >>> After applying the patch, many of the regression tests fail. It seems
> >> >>> pgpool.conf.sample has a bogus comment which causes the pgpool.conf
> >> >>> parser to raise a parse error.
> >> >>>
> >> >>> 2017-08-24 08:22:36: pid 6017: FATAL:  syntex error in configuration file
> >> >>> "/home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/standby/etc/pgpool.conf"
> >> >>> 2017-08-24 08:22:36: pid 6017: DETAIL:  parse error at line 568 '*' token = 8
> >> >>>
> >> >>
> >> >> Really sorry, somehow I overlooked the sample config file changes I
> >> >> made at the last minute.
> >> >> I will send you the updated version.
> >> >>
> >> >> Thanks
> >> >> Best Regards
> >> >> Muhammad Usama
> >> >>
> >> >>>
> >> >>> Best regards,
> >> >>> --
> >> >>> Tatsuo Ishii
> >> >>> SRA OSS, Inc. Japan
> >> >>> English: http://www.sraoss.co.jp/index_en.php
> >> >>> Japanese: http://www.sraoss.co.jp
> >> >>>
> >> >>> > Usama,
> >> >>> >
> >> >>> > Thanks for the patch. I am going to review it.
> >> >>> >
> >> >>> > In the meantime, when I applied your patch, I got some trailing
> >> >>> > whitespace errors. Can you please fix them?
> >> >>> >
> >> >>> > /home/t-ishii/quorum_aware_failover.diff:470: trailing whitespace.
> >> >>> >
> >> >>> > /home/t-ishii/quorum_aware_failover.diff:485: trailing whitespace.
> >> >>> >
> >> >>> > /home/t-ishii/quorum_aware_failover.diff:564: trailing whitespace.
> >> >>> >
> >> >>> > /home/t-ishii/quorum_aware_failover.diff:1428: trailing whitespace.
> >> >>> >
> >> >>> > /home/t-ishii/quorum_aware_failover.diff:1450: trailing whitespace.
> >> >>> >
> >> >>> > warning: squelched 3 whitespace errors
> >> >>> > warning: 8 lines add whitespace errors.
> >> >>> >
> >> >>> > Best regards,
> >> >>> > --
> >> >>> > Tatsuo Ishii
> >> >>> > SRA OSS, Inc. Japan
> >> >>> > English: http://www.sraoss.co.jp/index_en.php
> >> >>> > Japanese: http://www.sraoss.co.jp
> >> >>> >
> >> >>> >> Hi
> >> >>> >>
> >> >>> >> I have been working on a new feature to make the backend node
> >> >>> >> failover quorum aware, and halfway through the implementation I
> >> >>> >> also added a majority consensus feature for it.
> >> >>> >>
> >> >>> >> So please find the first version of the patch for review. It makes
> >> >>> >> the backend node failover consider the watchdog cluster quorum
> >> >>> >> status and seek a majority consensus before performing the failover.
> >> >>> >>
> >> >>> >> *Changes in the Failover mechanism with watchdog.*
> >> >>> >> For this new feature I have modified Pgpool-II's existing failover
> >> >>> >> mechanism with watchdog.
> >> >>> >> Previously, as you know, when Pgpool-II needed to perform a node
> >> >>> >> operation (failover, failback, promote-node) with the watchdog, the
> >> >>> >> watchdog propagated the failover request to all the Pgpool-II nodes
> >> >>> >> in the watchdog cluster. As soon as the request was received by a
> >> >>> >> node, it initiated the local failover, and that failover was
> >> >>> >> synchronised on all nodes using distributed locks.
> >> >>> >>
> >> >>> >> *Now Only the Master node performs the failover.*
> >> >>> >> The attached patch changes the mechanism of synchronised failover:
> >> >>> >> now only the Pgpool-II of the master watchdog node performs the
> >> >>> >> failover, and all other standby nodes sync the backend statuses
> >> >>> >> after the master Pgpool-II has finished the failover.
> >> >>> >>
> >> >>> >> *Overview of the new failover mechanism.*
> >> >>> >> -- If a failover request is received by a standby watchdog node
> >> >>> >> (from the local Pgpool-II), the request is forwarded to the master
> >> >>> >> watchdog and the Pgpool-II main process gets the
> >> >>> >> FAILOVER_RES_WILL_BE_DONE return code. Upon receiving
> >> >>> >> FAILOVER_RES_WILL_BE_DONE from the watchdog for the failover
> >> >>> >> request, the requesting Pgpool-II moves forward without doing
> >> >>> >> anything further for that particular failover command.
> >> >>> >>
> >> >>> >> -- When the failover request from a standby node is received by the
> >> >>> >> master watchdog, after performing the validation and applying the
> >> >>> >> consensus rules, the failover request is triggered on the local
> >> >>> >> Pgpool-II.
> >> >>> >>
> >> >>> >> -- When the failover request is received by the master watchdog
> >> >>> >> node from the local Pgpool-II (on the IPC channel), the watchdog
> >> >>> >> process informs the requesting Pgpool-II process to proceed with
> >> >>> >> the failover (provided all failover rules are satisfied).
> >> >>> >>
> >> >>> >> -- After the failover is finished on the master Pgpool-II, the
> >> >>> >> failover function calls *wd_failover_end*(), which sends a "backend
> >> >>> >> sync required" message to all standby watchdogs.
> >> >>> >>
> >> >>> >> -- Upon receiving the sync required message from the master
> >> >>> >> watchdog node, all Pgpool-II nodes sync the new statuses of each
> >> >>> >> backend node from the master watchdog. (A rough sketch of this flow
> >> >>> >> follows below.)
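> >> >>> >>
> >> >>> >> To make the flow above concrete, here is a minimal sketch in C.
> >> >>> >> Apart from FAILOVER_RES_WILL_BE_DONE and wd_failover_end(), which
> >> >>> >> are mentioned above, every name below is made up for illustration;
> >> >>> >> the actual code in the patch is structured differently.
> >> >>> >>
> >> >>> >>     #include <stdbool.h>
> >> >>> >>
> >> >>> >>     typedef struct FailoverRequest FailoverRequest;
> >> >>> >>
> >> >>> >>     /* from the patch; the numeric value here is a placeholder */
> >> >>> >>     enum { FAILOVER_RES_WILL_BE_DONE = 1 };
> >> >>> >>
> >> >>> >>     /* hypothetical helpers, declared only to keep the sketch short */
> >> >>> >>     extern bool is_master_watchdog_node(void);
> >> >>> >>     extern void forward_request_to_master(FailoverRequest *req);
> >> >>> >>     extern bool failover_rules_satisfied(FailoverRequest *req);
> >> >>> >>     extern void trigger_local_failover(FailoverRequest *req);
> >> >>> >>     extern void wd_failover_end(void);
> >> >>> >>
> >> >>> >>     int process_failover_request(FailoverRequest *req)
> >> >>> >>     {
> >> >>> >>         if (!is_master_watchdog_node())
> >> >>> >>         {
> >> >>> >>             /* standby: hand over to the master; the local Pgpool-II
> >> >>> >>              * does nothing further for this failover command */
> >> >>> >>             forward_request_to_master(req);
> >> >>> >>             return FAILOVER_RES_WILL_BE_DONE;
> >> >>> >>         }
> >> >>> >>
> >> >>> >>         /* master: validate and apply the quorum/consensus rules,
> >> >>> >>          * then fail over on the local Pgpool-II only */
> >> >>> >>         if (failover_rules_satisfied(req))
> >> >>> >>         {
> >> >>> >>             trigger_local_failover(req);
> >> >>> >>             wd_failover_end();  /* standbys then sync backend statuses */
> >> >>> >>         }
> >> >>> >>         return 0;
> >> >>> >>     }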
> >> >>> >>
> >> >>> >> *No More Failover locks*
> >> >>> >> Since with this new failover mechanism we no longer require any
> >> >>> >> synchronisation or guards against the execution of failover_commands
> >> >>> >> by multiple Pgpool-II nodes, the patch removes all the distributed
> >> >>> >> locks from the failover function. This makes the failover simpler
> >> >>> >> and faster.
> >> >>> >>
> >> >>> >> *New kind of Failover operation NODE_QUARANTINE_REQUEST*
> >> >>> >> The patch adds a new kind of backend node operation,
> >> >>> >> NODE_QUARANTINE, which is effectively the same as NODE_DOWN, but
> >> >>> >> with node quarantine the failover_command is not triggered.
> >> >>> >> A NODE_DOWN_REQUEST is automatically converted to a
> >> >>> >> NODE_QUARANTINE_REQUEST when the failover is requested on a backend
> >> >>> >> node but the watchdog cluster does not hold the quorum.
> >> >>> >> This means that in the absence of quorum the failed backend nodes
> >> >>> >> are quarantined, and when the quorum becomes available again
> >> >>> >> Pgpool-II performs the failback operation on all quarantined nodes.
> >> >>> >> Again, when the failback is performed on a quarantined backend
> >> >>> >> node, the failover function does not trigger the failback_command.
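> >> >>> >>
> >> >>> >> Conceptually the conversion is just the following (a sketch; the
> >> >>> >> request kinds are from the patch, the type and helper names are as
> >> >>> >> I remember them and may differ):
> >> >>> >>
> >> >>> >>     /* Sketch: downgrade a node-down request to quarantine when
> >> >>> >>      * the watchdog cluster has lost quorum. */
> >> >>> >>     POOL_REQUEST_KIND adjust_for_quorum(POOL_REQUEST_KIND kind)
> >> >>> >>     {
> >> >>> >>         if (kind == NODE_DOWN_REQUEST && !watchdog_cluster_has_quorum())
> >> >>> >>             return NODE_QUARANTINE_REQUEST; /* mark the node down
> >> >>> >>                                              * locally, but skip
> >> >>> >>                                              * failover_command */
> >> >>> >>         return kind;
> >> >>> >>     }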
> >> >>> >>
> >> >>> >> *Controlling the Failover behaviour.*
> >> >>> >> The patch adds three new configuration parameters to configure the
> >> >>> >> failover behaviour from the user side.
> >> >>> >>
> >> >>> >> *failover_when_quorum_exists*
> >> >>> >> When enabled, the failover command will only be executed when the
> >> >>> >> watchdog cluster holds the quorum. When the quorum is absent and
> >> >>> >> failover_when_quorum_exists is enabled, the failed backend nodes
> >> >>> >> will be quarantined until the quorum becomes available again.
> >> >>> >> Disabling it restores the old behaviour of failover commands.
> >> >>> >>
> >> >>> >>
> >> >>> >> *failover_require_consensus*
> >> >>> >> This new configuration parameter can be used to make sure we get a
> >> >>> >> majority vote before performing the failover on a node. When
> >> >>> >> *failover_require_consensus* is enabled, the failover is only
> >> >>> >> performed after receiving the failover request from the majority of
> >> >>> >> Pgpool-II nodes.
> >> >>> >> For example, in a three node cluster the failover will not be
> >> >>> >> performed until at least two nodes ask for performing the failover
> >> >>> >> on the particular backend node.
> >> >>> >>
> >> >>> >> It is also worthwhile to mention here that
> >> >>> >> *failover_require_consensus* only works when
> >> >>> >> failover_when_quorum_exists is enabled.
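> >> >>> >>
> >> >>> >> My working definition of majority here is a simple integer
> >> >>> >> majority, consistent with the two-out-of-three example above (the
> >> >>> >> function below only illustrates the rule, it is not code from the
> >> >>> >> patch):
> >> >>> >>
> >> >>> >>     #include <stdbool.h>
> >> >>> >>
> >> >>> >>     /* true when 'votes' failover requests form a majority of
> >> >>> >>      * 'nodes' Pgpool-II nodes, e.g. 2 of 3 or 3 of 5 */
> >> >>> >>     static bool failover_consensus_reached(int votes, int nodes)
> >> >>> >>     {
> >> >>> >>         return votes > nodes / 2;
> >> >>> >>     }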
> >> >>> >>
> >> >>> >>
> >> >>> >> *enable_multiple_failover_requests_from_node*
> >> >>> >> This parameter works in connection with the
> >> >>> >> *failover_require_consensus* config. When enabled, a single
> >> >>> >> Pgpool-II node can vote for failover multiple times.
> >> >>> >> For example, in the three node cluster, if one Pgpool-II node sends
> >> >>> >> the failover request for a particular node twice, that would be
> >> >>> >> counted as two votes in favour of the failover, and the failover
> >> >>> >> will be performed even if we do not get a vote from the other two
> >> >>> >> nodes.
> >> >>> >>
> >> >>> >> When *enable_multiple_failover_requests_from_node* is disabled,
> >> >>> >> only the first vote from each Pgpool-II will be accepted and all
> >> >>> >> subsequent votes will be marked duplicate and rejected.
> >> >>> >> So in that case we will require majority votes from distinct nodes
> >> >>> >> to execute the failover.
> >> >>> >> Again, *enable_multiple_failover_requests_from_node* only becomes
> >> >>> >> effective when both *failover_when_quorum_exists* and
> >> >>> >> *failover_require_consensus* are enabled.
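> >> >>> >>
> >> >>> >> Putting the three together, the strictest setup would look like
> >> >>> >> this in pgpool.conf (the values shown are only an example; the
> >> >>> >> default values are still an open question, see below):
> >> >>> >>
> >> >>> >>     failover_when_quorum_exists = on
> >> >>> >>                # fail over only while the watchdog cluster holds
> >> >>> >>                # the quorum; otherwise quarantine the node
> >> >>> >>     failover_require_consensus = on
> >> >>> >>                # additionally require a majority of nodes to ask
> >> >>> >>                # for the failover
> >> >>> >>     enable_multiple_failover_requests_from_node = off
> >> >>> >>                # count only the first vote from each node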
> >> >>> >>
> >> >>> >>
> >> >>> >> *Controlling the failover: The Coding perspective.*
> >> >>> >> Although the failover functions are made quorum and consensus
> >> >>> >> aware, there is still a way to bypass the quorum conditions and the
> >> >>> >> requirement of consensus.
> >> >>> >>
> >> >>> >> For this the patch uses the existing request_details flags in
> >> >>> >> POOL_REQUEST_NODE to control the behaviour of the failover.
> >> >>> >>
> >> >>> >> Here are the newly added flag values.
> >> >>> >>
> >> >>> >> *REQ_DETAIL_WATCHDOG*:
> >> >>> >> Setting this flag while issuing the failover command will not send
> >> >>> >> the failover request to the watchdog. This flag may not be useful
> >> >>> >> in any place other than where it is already used: it keeps a
> >> >>> >> failover command that already originated from the watchdog from
> >> >>> >> being sent back to the watchdog, otherwise we could end up in an
> >> >>> >> infinite loop.
> >> >>> >>
> >> >>> >> *REQ_DETAIL_CONFIRMED*:
> >> >>> >> Setting this flag will bypass the *failover_require_consensus*
> >> >>> >> configuration and immediately perform the failover if quorum is
> >> >>> >> present. This flag can be used for failover requests originating
> >> >>> >> from a PCP command.
> >> >>> >>
> >> >>> >> *REQ_DETAIL_UPDATE*:
> >> >>> >> This flag is used for the command where we are failing back the
> >> >>> >> quarantined nodes. Setting this flag will not trigger the
> >> >>> >> failback_command.
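> >> >>> >>
> >> >>> >> For example, this is roughly how I expect callers to set these
> >> >>> >> flags (a sketch only; the 'kind' field name is from memory and may
> >> >>> >> differ, the flag and request kind names are as described above):
> >> >>> >>
> >> >>> >>     /* a detach request coming from a PCP command: the admin asked
> >> >>> >>      * for it explicitly, so bypass consensus voting (quorum is
> >> >>> >>      * still checked) */
> >> >>> >>     void detach_node_from_pcp(POOL_REQUEST_NODE *req)
> >> >>> >>     {
> >> >>> >>         req->kind = NODE_DOWN_REQUEST;
> >> >>> >>         req->request_details = REQ_DETAIL_CONFIRMED;
> >> >>> >>     }
> >> >>> >>
> >> >>> >>     /* bring a quarantined node back without running
> >> >>> >>      * failback_command */
> >> >>> >>     void failback_quarantined_node(POOL_REQUEST_NODE *req)
> >> >>> >>     {
> >> >>> >>         req->kind = NODE_UP_REQUEST;
> >> >>> >>         req->request_details = REQ_DETAIL_UPDATE;
> >> >>> >>     }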
> >> >>> >>
> >> >>> >> *Some conditional flags used:*
> >> >>> >> I was not sure about the configuration of each type of failover
> >> >>> >> operation. We have three main failover operations: NODE_UP_REQUEST,
> >> >>> >> NODE_DOWN_REQUEST, and PROMOTE_NODE_REQUEST.
> >> >>> >> So I was thinking, do we need to give a configuration option to the
> >> >>> >> users if they want to enable/disable quorum checking and consensus
> >> >>> >> for an individual failover operation type?
> >> >>> >> For example: is it a practical configuration where a user would
> >> >>> >> want to ensure quorum while performing a NODE_DOWN operation but
> >> >>> >> does not want it for NODE_UP?
> >> >>> >> So in this patch I use three compile time defines to enable/disable
> >> >>> >> the individual failover operations, until we can decide on the best
> >> >>> >> solution (see the sketch after this list).
> >> >>> >>
> >> >>> >> NODE_UP_REQUIRE_CONSENSUS: defining it will enable the quorum
> >> >>> >> checking feature for NODE_UP_REQUESTs.
> >> >>> >>
> >> >>> >> NODE_DOWN_REQUIRE_CONSENSUS: defining it will enable the quorum
> >> >>> >> checking feature for NODE_DOWN_REQUESTs.
> >> >>> >>
> >> >>> >> NODE_PROMOTE_REQUIRE_CONSENSUS: defining it will enable the quorum
> >> >>> >> checking feature for PROMOTE_NODE_REQUESTs.
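> >> >>> >>
> >> >>> >> In code the gating looks conceptually like this (a simplified
> >> >>> >> sketch of the pattern, not the literal patch hunk):
> >> >>> >>
> >> >>> >>     #include <stdbool.h>
> >> >>> >>
> >> >>> >>     #define NODE_DOWN_REQUIRE_CONSENSUS
> >> >>> >>
> >> >>> >>     static bool node_down_needs_consensus(void)
> >> >>> >>     {
> >> >>> >>     #ifdef NODE_DOWN_REQUIRE_CONSENSUS
> >> >>> >>         return true;   /* route NODE_DOWN_REQUESTs through the
> >> >>> >>                         * quorum/consensus checking */
> >> >>> >>     #else
> >> >>> >>         return false;  /* old behaviour: fail over immediately */
> >> >>> >>     #endif
> >> >>> >>     }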
> >> >>> >>
> >> >>> >> *Some points for discussion:*
> >> >>> >>
> >> >>> >> *Do we really need to check the Req_info->switching flag before
> >> >>> >> enqueuing a failover request?*
> >> >>> >> While working on the patch I was wondering why we disallow
> >> >>> >> enqueuing a failover command when a failover is already in
> >> >>> >> progress. For example, in the *pcp_process_command*() function, if
> >> >>> >> we see the *Req_info->switching* flag set, we bail out with an
> >> >>> >> error instead of enqueuing the command. Is it really necessary?
> >> >>> >>
> >> >>> >> *Do we need more granular control over each failover operation?*
> >> >>> >> As described in the section "Some conditional flags used", I want
> >> >>> >> opinions on whether we need configuration parameters in pgpool.conf
> >> >>> >> to enable/disable quorum and consensus checking on individual
> >> >>> >> failover types.
> >> >>> >>
> >> >>> >> *Which failover requests should be marked as Confirmed:*
> >> >>> >> As described in the REQ_DETAIL_CONFIRMED section above, we can mark
> >> >>> >> a failover request as not needing consensus; currently the requests
> >> >>> >> from the PCP commands are fired with this flag. But I was wondering
> >> >>> >> whether there may be more places where we may need to use the flag.
> >> >>> >> For example, I currently use the same confirmed flag when failover
> >> >>> >> is triggered because of *replication_stop_on_mismatch*.
> >> >>> >>
> >> >>> >> I think we should consider this flag for each place failover is
> >> >>> >> triggered:
> >> >>> >> because of a health_check failure,
> >> >>> >> because of a replication mismatch,
> >> >>> >> because of a backend_error,
> >> >>> >> etc.
> >> >>> >>
> >> >>> >> *Node Quarantine behaviour.*
> >> >>> >> What do you think about the node quarantine state used by this
> >> >>> >> patch? Can you think of some problem which could be caused by it?
> >> >>> >>
> >> >>> >> *What should be the default values for each newly added config
> >> >>> >> parameter?*
> >> >>> >>
> >> >>> >>
> >> >>> >>
> >> >>> >> *TODOs*
> >> >>> >>
> >> >>> >> -- Updating the documentation is still to do. I will do that once
> >> >>> >> every aspect of the feature is finalised.
> >> >>> >> -- Some code warnings and cleanups are still not done.
> >> >>> >> -- I am still a little short on testing.
> >> >>> >> -- Regression test cases for the feature.
> >> >>> >>
> >> >>> >>
> >> >>> >> Thoughts and suggestions are most welcome.
> >> >>> >>
> >> >>> >> Thanks
> >> >>> >> Best regards
> >> >>> >> Muhammad Usama
> >> >>> > _______________________________________________
> >> >>> > pgpool-hackers mailing list
> >> >>> > pgpool-hackers at pgpool.net
> >> >>> > http://www.pgpool.net/mailman/listinfo/pgpool-hackers
> >> >>>
> >> >>
> >> >>
> >>
>