[pgpool-hackers: 1209] Re: Fw: [pgpool-committers: 2844] pgpool: Watchdog node goes into the WD_WAITING_FOR_QUORUM state wheneve

Yugo Nagata nagata at sraoss.co.jp
Fri Dec 11 16:55:35 JST 2015


Usama,

What do you think about adding a description of this to the document?
For example, the current document contains the description below:

  A master watchdog node can resign from being a master node, when the master node 
  pgpool-II shuts down, detects a network blackout or detects the lost of quorum.

This is no longer quite accurate.


On Fri, 4 Dec 2015 12:54:30 +0500
Muhammad Usama <m.usama at gmail.com> wrote:

> Hi Yugo
> 
> Many thanks for looking into the code. please find my response inline.
> 
> On Fri, Dec 4, 2015 at 7:40 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
> 
> > Usama,
> >
> > I haven't tested this specification yet, but I suspect this affects
> > interlocking.
> > Certainly a VIP conflict is avoided, but because there can be two
> > coordinators, is it possible that there are also two lock holders?
> > If so, the failover command will be executed twice on the same
> > backend node.
> >
> > Am I missing something?
> >
> 
> Yes, you are right. In a scenario like network partitioning, although it
> is a very rare scenario, it is possible after this commit that the
> watchdog cluster gets more than one coordinator node and, hence, the same
> number of lock holders. But before this commit the old behavior was that
> all nodes would have gone into the waiting-for-quorum state and the
> cluster would be left with no coordinator.
> 
> Now if we analyze both of the above situations, the new behavior, where
> the network has more than one coordinator node (but none of the
> coordinators holds the VIP), is the lesser evil. If all the nodes were in
> the waiting-for-quorum state and the cluster had no coordinator node, all
> interlocking commands would fail and we would end up in one of the two
> situations below (a rough sketch of both follows the list):
> 
> a-) All pgpool-II nodes in the cluster go on to execute the failover
> commands (if we implement logic like "wait for lock or timeout").
> b-) None of the pgpool-II nodes executes the failover (if we do not
> execute the failover command when the lock command fails).
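> 
> To make the difference between (a) and (b) concrete, here is a rough
> sketch; the function names below are only illustrative, not the actual
> pgpool-II symbols:
> 
> #include <stdbool.h>
> 
> typedef enum
> {
>     LOCK_ACQUIRED,
>     LOCK_FAILED          /* no coordinator reachable or request denied */
> } lock_result_t;
> 
> /* hypothetical helpers, for illustration only */
> extern lock_result_t request_failover_lock(int timeout_sec);
> extern void          execute_failover_command(void);
> 
> static void
> handle_backend_failover(bool proceed_without_lock)
> {
>     lock_result_t res = request_failover_lock(30 /* seconds */);
> 
>     if (res == LOCK_ACQUIRED)
>     {
>         execute_failover_command();
>         return;
>     }
> 
>     /* No coordinator could grant the lock. */
>     if (proceed_without_lock)
>         execute_failover_command(); /* policy (a): every node fails over */
>     /* else: policy (b): no node performs the failover at all */
> }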
> 
> But remember that a node could go into the "waiting for quorum" state in
> many other situations, such as:
> 
> 1-) Only a few pgpool-II nodes are started by the user, not enough to
> complete the quorum.
> 2-) A hardware failure or some other failure shuts down pgpool-II nodes
> and the quorum is lost in the cluster.
> 
> In both of these cases, with the new design the cluster will have only
> one master, which can successfully coordinate the locking and unlocking,
> so we achieve the desired behavior with the new implementation. With the
> previous design, in both of these cases the nodes would have gone into
> the waiting-for-quorum state and there would have been no node to
> coordinate the locking and unlocking commands.
> 
> I can only think of one scenario where a cluster could have multiple
> coordinator nodes, and that is network partitioning. If a backend
> failover happens in that situation, both designs fall short of being
> perfect.
> 
> As far as avoiding the VIP conflict is concerned, I think both designs
> (the current one, without the waiting-for-quorum state, and the old one)
> will perform just the same. In the new design, whenever the quorum is
> lost or not present, a node will not acquire the VIP even if it becomes
> the coordinator node; it waits until the quorum is complete. So even
> when, in some case, we have more than one coordinator node in the
> cluster, every coordinator node will have the flag set that the quorum is
> not present, and none of them will acquire the VIP or execute the
> wd_escalation command. Only when the cluster recovers from the situation
> and one coordinator node gets connected to the minimum number of nodes
> required to complete the quorum will the VIP be brought up. So I think
> the new design will serve well to make sure the IP conflict does not
> happen.
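> 
> In other words, the check on the coordinator side is roughly like the
> sketch below (illustrative names only, not the actual watchdog code):
> 
> #include <stdbool.h>
> 
> /* hypothetical helpers, for illustration only */
> extern bool i_am_coordinator(void);
> extern bool quorum_is_present(void);
> extern void acquire_vip_and_run_escalation(void);
> 
> static bool escalated = false;
> 
> static void
> on_cluster_state_change(void)
> {
>     if (!i_am_coordinator())
>         return;
> 
>     /* Stay coordinator, but leave the VIP alone until quorum returns. */
>     if (!quorum_is_present())
>         return;
> 
>     if (!escalated)
>     {
>         acquire_vip_and_run_escalation();
>         escalated = true;
>     }
> }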
> 
> 
> > Also, could you explain the difference between a coordinator node and
> > an escalated node in your code?
> >
> 
> The coordinator node is just another name for the master or leader
> watchdog node, while the escalated node is the master/coordinator node
> which holds the VIP and/or has executed the wd_escalation command. So
> only the master/coordinator node can become an escalated node, and only
> when the quorum is complete.
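> 
> Conceptually the two roles relate like this (again just a sketch with
> made-up names, not the real structures):
> 
> #include <stdbool.h>
> 
> typedef struct watchdog_node_role
> {
>     bool is_coordinator; /* won the master/leader election */
>     bool is_escalated;   /* holds the VIP / ran the wd_escalation command */
> } watchdog_node_role;
> 
> static bool
> may_escalate(const watchdog_node_role *node, bool quorum_present)
> {
>     /* Only the coordinator may escalate, and only when quorum exists. */
>     return node->is_coordinator && quorum_present && !node->is_escalated;
> }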
> 
> 
> 
> Thanks
> Best regards
> Muhammad Usama
> 
> 
> > Begin forwarded message:
> >
> > Date: Thu, 03 Dec 2015 15:14:29 +0000
> > From: Muhammad Usama <m.usama at gmail.com>
> > To: pgpool-committers at pgpool.net
> > Subject: [pgpool-committers: 2844] pgpool: Watchdog node goes into the
> > WD_WAITING_FOR_QUORUM state wheneve
> >
> >
> > A watchdog node goes into the WD_WAITING_FOR_QUORUM state whenever the
> > quorum is not present or is lost. Although this is a good guard against
> > split-brain syndrome, there is a problem with the technique: pgpool-II
> > commands which require cluster-wide synchronization, like the
> > interlocking commands, start failing when the node is waiting for
> > quorum, as these commands require a central coordinator node for
> > processing.
> >
> > The fix for this is to remove the WD_WAITING_FOR_QUORUM state and make
> > sure that the cluster always elects a master node even when the quorum
> > is not present. The trick is not to execute the escalation commands on
> > the master node while the quorum is missing, but to wait until the
> > quorum is complete. This new design ensures that even when, because of
> > network partitioning or some other issue, the cluster gets multiple
> > master nodes (split-brain syndrome), a VIP conflict still will not
> > happen and the multiple master nodes are harmless.
> >
> > Branch
> > ------
> > master
> >
> > Details
> > -------
> >
> > http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=6238ba9d2aaa82db415b11aca90d54981a854e09
> >
> > Modified Files
> > --------------
> > src/include/watchdog/watchdog.h |    1 -
> > src/watchdog/watchdog.c         |  455 +++++++++++++++++++++++++--------------
> > src/watchdog/wd_escalation.c    |    2 +-
> > src/watchdog/wd_heartbeat.c     |    6 +-
> > src/watchdog/wd_lifecheck.c     |    2 +-
> > 5 files changed, 304 insertions(+), 162 deletions(-)
> >
> > _______________________________________________
> > pgpool-committers mailing list
> > pgpool-committers at pgpool.net
> > http://www.pgpool.net/mailman/listinfo/pgpool-committers
> >
> >
> > --
> > Yugo Nagata <nagata at sraoss.co.jp>
> > _______________________________________________
> > pgpool-hackers mailing list
> > pgpool-hackers at pgpool.net
> > http://www.pgpool.net/mailman/listinfo/pgpool-hackers
> >


-- 
Yugo Nagata <nagata at sraoss.co.jp>

