[pgpool-hackers: 1210] Re: Fw: [pgpool-committers: 2844] pgpool: Watchdog node goes into the WD_WAITING_FOR_QUORUM state wheneve

Muhammad Usama m.usama at gmail.com
Fri Dec 11 17:11:45 JST 2015


On Fri, Dec 11, 2015 at 12:55 PM, Yugo Nagata <nagata at sraoss.co.jp> wrote:

> Usama,
>
> What do you think about adding some description of this to the
> documentation?
> For example, the current document contains the following description:
>
>   A master watchdog node can resign from being a master node, when the
>   master node pgpool-II shuts down, detects a network blackout or detects
>   the lost of quorum.
>
> It is not quite correct.
>

Yes, thank you for pointing this out. I forgot about the documentation part
while working on that patch. I will update the documentation accordingly.

Regards
Muhammad Usama

>
>
> On Fri, 4 Dec 2015 12:54:30 +0500
> Muhammad Usama <m.usama at gmail.com> wrote:
>
> > Hi Yugo
> >
> > Many thanks for looking into the code. Please find my response inline.
> >
> > On Fri, Dec 4, 2015 at 7:40 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
> >
> > > Usama,
> > >
> > > I haven't tested this specification yet, but I suspect this affects
> > > interlocking. Certainly VIP conflicts are avoided, but, because there
> > > can be two coordinators, is it possible that there are also two
> > > lock-holders? If so, the failover command will be executed twice on
> > > the same backend node.
> > >
> > > Am I missing something?
> > >
> >
> > Yes, you are right. In a scenario like network partitioning, although it
> > is a very rare scenario, it is possible after this commit that the
> > watchdog cluster gets more than one coordinator node and, hence, the same
> > number of lock holders. But before this commit the old behavior was that
> > all nodes would have gone into the waiting-for-quorum state and the
> > cluster would be left with no coordinator.
> >
> > Now, if we analyze both of the above situations, the new behavior, where
> > the network has more than one coordinator node (but none of the
> > coordinators holds the VIP), is the lesser evil. If all the nodes were in
> > the waiting-for-quorum state and the cluster had no coordinator node, all
> > interlocking commands would fail and we would end up with one of the
> > following two situations (a rough sketch of both follows the list):
> >
> > a-) All pgpool-II nodes in the cluster go on to execute the failover
> > commands (if we implement logic like "wait for the lock or time out").
> > b-) None of the pgpool-II nodes executes the failover (if we do not
> > execute the failover command when the lock command fails).
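> >
> > To make the two fallback policies concrete, here is a minimal sketch in C.
> > It is purely illustrative: the type and function names below are invented
> > for this example and are not taken from the pgpool-II sources.
> >
> >     #include <stdbool.h>
> >
> >     typedef enum
> >     {
> >         FALLBACK_RUN_AFTER_TIMEOUT, /* policy a-): every node eventually runs failover */
> >         FALLBACK_SKIP_FAILOVER      /* policy b-): no node runs failover at all        */
> >     } LockFallbackPolicy;
> >
> >     /* Decide whether this node should run the failover command when the
> >      * interlocking (failover lock) request could not be granted because
> >      * no coordinator node is available. */
> >     static bool
> >     should_run_failover(bool lock_acquired, LockFallbackPolicy policy,
> >                         int waited_secs, int timeout_secs)
> >     {
> >         if (lock_acquired)
> >             return true;                        /* normal case: we hold the lock */
> >
> >         if (policy == FALLBACK_RUN_AFTER_TIMEOUT)
> >             return waited_secs >= timeout_secs; /* a-): everyone runs it after the wait expires */
> >
> >         return false;                           /* b-): nobody runs it when the lock fails */
> >     }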
> >
> > But remember that a node could go into the "waiting for quorum" state in
> > many other situations as well, for example:
> >
> > 1-) Only a few pgpool-II nodes are started by the user, not enough to
> > complete the quorum.
> > 2-) A hardware failure or some other failure shuts down pgpool-II nodes
> > and the quorum is lost in the cluster.
> >
> > In both of these cases, with the new design the cluster will have only
> > one master and can successfully coordinate the locking and unlocking, so
> > we achieve the desired behavior with the new implementation. But with the
> > previous design, in both of these cases the nodes would have gone into
> > the waiting-for-quorum state and there would be no node to coordinate the
> > locking and unlocking commands.
> >
> > I can only think of one scenario where a cluster could have multiple
> > coordinator nodes, and that is network partitioning. If a backend
> > failover happens in that situation, both designs fall short of being
> > perfect.
> >
> > And as far as avoiding the VIP conflict is concerned, I think both
> > designs (the current one without the waiting-for-quorum state and the old
> > one) will perform just the same. In the new design, whenever the quorum
> > is lost or not present, even if a node becomes the coordinator node it
> > will not acquire the VIP and will wait until the quorum is complete. So
> > even when, in some case, we have more than one coordinator node in the
> > cluster, all the coordinator nodes will have the flag set that the quorum
> > is not present, and they will not acquire the VIP or execute the
> > wd_escalation command. Only when the cluster recovers from the situation
> > and one coordinator node gets connected with the minimum number of nodes
> > required to complete the quorum will the VIP be brought up. So I think
> > the new design will serve well to make sure that an IP conflict does not
> > happen.
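> >
> > To illustrate the split between "being the coordinator" and "bringing up
> > the VIP", here is a simplified sketch. It is only a mental model; the
> > function name is invented and does not correspond to the real code in
> > src/watchdog/watchdog.c.
> >
> >     #include <stdbool.h>
> >
> >     /* Becoming coordinator and escalating are two separate steps: the
> >      * escalation (VIP / wd_escalation command) is gated on the quorum. */
> >     static bool
> >     should_bring_up_vip(bool i_am_coordinator, bool quorum_present)
> >     {
> >         /* A node may be coordinator without quorum, but it never acquires
> >          * the VIP or runs wd_escalation while the quorum is missing. */
> >         return i_am_coordinator && quorum_present;
> >     }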
> >
> >
> > > And, could you explain the difference between a coordinator node and
> > > an escalated node in your code?
> > >
> >
> > The coordinator node is just another name for the master or leader
> > watchdog node, while the escalated node is the master/coordinator node
> > that holds the VIP and/or has executed the wd_escalation command. So only
> > the master/coordinator node can become an escalated node, and only when
> > the quorum is complete.
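> >
> > As a rough mental model of those two terms (again only a sketch, not the
> > actual data structures from src/include/watchdog/watchdog.h):
> >
> >     #include <stdbool.h>
> >
> >     typedef enum
> >     {
> >         NODE_STANDBY,       /* ordinary watchdog node                        */
> >         NODE_COORDINATOR    /* elected master/leader, with or without quorum */
> >     } WatchdogRole;
> >
> >     typedef struct
> >     {
> >         WatchdogRole role;
> >         bool         escalated; /* true only for a coordinator that holds the
> >                                  * VIP and/or has run wd_escalation          */
> >     } WatchdogNodeView;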
> >
> >
> >
> > Thanks
> > Best regards
> > Muhammad Usama
> >
> >
> > > Begin forwarded message:
> > >
> > > Date: Thu, 03 Dec 2015 15:14:29 +0000
> > > From: Muhammad Usama <m.usama at gmail.com>
> > > To: pgpool-committers at pgpool.net
> > > Subject: [pgpool-committers: 2844] pgpool: Watchdog node goes into the
> > > WD_WAITING_FOR_QUORUM state wheneve
> > >
> > >
> > > The watchdog node goes into the WD_WAITING_FOR_QUORUM state whenever
> > > the quorum is not present or is lost. Although this is a good guard
> > > against split-brain syndrome, there is a problem with this technique:
> > > pgpool-II commands that require cluster-wide synchronization, like the
> > > interlocking commands, start failing when the node is waiting for
> > > quorum, as these commands require a central coordinator node for
> > > processing.
> > >
> > > The fix for this is to remove the WD_WAITING_FOR_QUORUM state and make
> > > sure that the cluster always elects a master node even when the quorum
> > > is not present. The trick is not to execute the escalation commands on
> > > the master node when the quorum is missing, and to wait until the
> > > quorum is complete. This new design ensures that even when, because of
> > > network partitioning or some other issue, the cluster gets multiple
> > > master nodes (split-brain syndrome), the VIP conflict will still not
> > > happen and the multiple master nodes will be harmless.
> > >
> > > Branch
> > > ------
> > > master
> > >
> > > Details
> > > -------
> > >
> > >
> > > http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=6238ba9d2aaa82db415b11aca90d54981a854e09
> > >
> > > Modified Files
> > > --------------
> > > src/include/watchdog/watchdog.h |    1 -
> > > src/watchdog/watchdog.c         |  455
> > > +++++++++++++++++++++++++--------------
> > > src/watchdog/wd_escalation.c    |    2 +-
> > > src/watchdog/wd_heartbeat.c     |    6 +-
> > > src/watchdog/wd_lifecheck.c     |    2 +-
> > > 5 files changed, 304 insertions(+), 162 deletions(-)
> > >
> > > _______________________________________________
> > > pgpool-committers mailing list
> > > pgpool-committers at pgpool.net
> > > http://www.pgpool.net/mailman/listinfo/pgpool-committers
> > >
> > >
> > > --
> > > Yugo Nagata <nagata at sraoss.co.jp>
> > > _______________________________________________
> > > pgpool-hackers mailing list
> > > pgpool-hackers at pgpool.net
> > > http://www.pgpool.net/mailman/listinfo/pgpool-hackers
> > >
>
>
> --
> Yugo Nagata <nagata at sraoss.co.jp>
>