[pgpool-hackers: 1180] Re: Fw: [pgpool-committers: 2844] pgpool: Watchdog node goes into the WD_WAITING_FOR_QUORUM state wheneve

Muhammad Usama m.usama at gmail.com
Fri Dec 4 16:54:30 JST 2015


Hi Yugo

Many thanks for looking into the code. Please find my response inline.

On Fri, Dec 4, 2015 at 7:40 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:

> Usama,
>
> I haven't tested this specification yet, but I suspect this affects
> interlocking.
> Certainly VIP conflicts are avoided, but, because there can be two
> coordinators, is it possible that there are also two lock-holders? If so,
> the failover command will be executed twice on the same backend node.
>
> Am I missing something?
>

Yes, you are right. In a scenario like network partitioning, although it is
a very rare case, it is possible that the watchdog cluster gets more than
one coordinator node and, hence, the same number of lock holders after this
commit. But before this commit the old behavior was that all nodes would
have gone into the waiting-for-quorum state and the cluster would have been
left with no coordinator.

Now, if we analyze both of the above situations, the new behavior, where
the network has more than one coordinator node (but none of the
coordinators holds the VIP), is the lesser evil. Because if all the nodes
were in the waiting-for-quorum state and the cluster had no coordinator
node, all interlocking commands would fail and we would end up in one of
the two situations below (a sketch of both policies follows the list):

a-) All pgpool-II nodes in the cluster go on to execute the failover
commands (if we implement logic like "wait for the lock or time out").
b-) None of the pgpool-II nodes executes the failover (if we do not execute
the failover command when the lock command fails).
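To make these two outcomes concrete, here is a minimal C sketch of the two
policies. The names (wd_try_acquire_failover_lock, run_failover_command)
and the timeout constant are invented for illustration and are not the
actual pgpool-II APIs:

/* Minimal sketch of the two policies above. All names are
 * illustrative, not real pgpool-II functions. */
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

#define FAILOVER_LOCK_TIMEOUT_SECS 30

extern bool wd_try_acquire_failover_lock(void);  /* asks the coordinator */
extern void run_failover_command(int backend_id);

/* Policy (a): wait for the lock or time out, then fail over anyway.
 * With no coordinator in the cluster, every node times out and every
 * node ends up running the failover command. */
static void failover_wait_or_timeout(int backend_id)
{
    time_t start = time(NULL);

    while (!wd_try_acquire_failover_lock())
    {
        if (time(NULL) - start > FAILOVER_LOCK_TIMEOUT_SECS)
            break;              /* give up waiting for the lock */
        sleep(1);
    }
    run_failover_command(backend_id);
}

/* Policy (b): skip the failover when the lock cannot be acquired.
 * With no coordinator, no node ever runs the failover command. */
static void failover_skip_on_lock_failure(int backend_id)
{
    if (wd_try_acquire_failover_lock())
        run_failover_command(backend_id);
}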

But remember that a node could go into the "waiting for quorum" state in
many other situations, for example:

1-) Only a few pgpool-II nodes are started by the user, not enough to
complete the quorum.
2-) A hardware failure or some other failure shuts down pgpool-II nodes
and the quorum is lost in the cluster.

In both of these cases, with the new design the cluster will have only one
master, which can successfully coordinate the locking and unlocking, so we
achieve the desired behavior with the new implementation. With the previous
design, in both of these cases the nodes would have gone into the
waiting-for-quorum state and there would be no node to coordinate the
locking and unlocking commands.
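For reference, the quorum condition assumed in the two cases above can be
sketched as follows. This is illustrative only, not the exact check in
src/watchdog/watchdog.c, and the "more than half" rule is my assumption:

/* Illustrative quorum check, not the actual pgpool-II implementation.
 * Assumption: a quorum exists when more than half of the configured
 * watchdog nodes (including this one) are alive and connected. */
#include <stdbool.h>

static bool quorum_is_present(int total_configured_nodes, int alive_nodes)
{
    return alive_nodes > total_configured_nodes / 2;
}

/*
 * Case 1-): only 1 of 3 configured nodes is started:
 *           quorum_is_present(3, 1) == false
 * Case 2-): 2 of 5 nodes survive a hardware failure:
 *           quorum_is_present(5, 2) == false
 * In both cases the surviving nodes still elect a single master that
 * coordinates locking/unlocking; only escalation (the VIP) is deferred.
 */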

I can only think of one scenario where a cluster could have multiple
coordinator nodes, and that is network partitioning. If a backend failover
happens in that situation, both designs fall short of being perfect.

And as far as avoiding the VIP conflict is concerned, I think both designs
(the current one without the waiting-for-quorum state and the old one)
perform just the same. In the new design, whenever the quorum is lost or
not present, even if a node becomes the coordinator it will not acquire the
VIP and will wait until the quorum is complete. So even when, in some case,
we have more than one coordinator node in the cluster, every coordinator
node will have the flag set indicating that the quorum is not present, and
none of them will acquire the VIP or execute the wd_escalation command.
Only when the cluster recovers from that situation and one coordinator node
gets connected with the minimum number of nodes required to complete the
quorum will the VIP be brought up. So I think the new design serves well to
make sure the IP conflict does not happen.
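A rough sketch of that gating, with hypothetical names (the real logic
lives in src/watchdog/watchdog.c and src/watchdog/wd_escalation.c;
wd_acquire_vip and wd_execute_escalation_command are stand-ins, not actual
pgpool-II functions):

/* Sketch only: a coordinator defers escalation until the quorum flag
 * is set. Function and struct names are invented for illustration. */
#include <stdbool.h>

struct wd_local_state
{
    bool is_coordinator;   /* won the master/leader election            */
    bool quorum_present;   /* enough nodes connected to form the quorum */
    bool escalated;        /* VIP acquired / wd_escalation executed     */
};

extern void wd_acquire_vip(void);
extern void wd_execute_escalation_command(void);

static void maybe_escalate(struct wd_local_state *node)
{
    if (!node->is_coordinator || node->escalated)
        return;
    if (!node->quorum_present)
        return;    /* coordinator without quorum: do not touch the VIP */

    wd_acquire_vip();
    wd_execute_escalation_command();
    node->escalated = true;
}

Under the majority-quorum assumption above, at most one partition can hold
the quorum at any time, so even with several coordinators at most one of
them ever brings up the VIP.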


> And, could you explain the difference between the coordinator node and the
> escalated node in your code?
>

The coordinator node is just another name for the master or leader watchdog
node, while the escalated node is the master/coordinator node which holds
the VIP and/or has executed the wd_escalation command. So only the
master/coordinator node can become an escalated node, and only when the
quorum is complete.
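Put in code-like terms, a sketch under my own naming (this is not the
actual watchdog state machine; only the wd_escalation command mentioned
above is real):

/* Sketch of the distinction: "coordinator" is a role won by election,
 * while "escalated" is an extra property the coordinator gains only once
 * the quorum is complete. Note there is no waiting-for-quorum role any
 * more. All names here are illustrative. */
#include <stdbool.h>

typedef enum
{
    WD_ROLE_STANDBY,       /* follows the coordinator                */
    WD_ROLE_COORDINATOR    /* master/leader of the watchdog cluster  */
} wd_node_role;

struct watchdog_node
{
    wd_node_role role;
    bool escalated;        /* holds the VIP / has run wd_escalation */
};

/* Only a coordinator can be the escalated node. */
static bool is_escalated_node(const struct watchdog_node *n)
{
    return n->role == WD_ROLE_COORDINATOR && n->escalated;
}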



Thanks
Best regards
Muhammad Usama


> Begin forwarded message:
>
> Date: Thu, 03 Dec 2015 15:14:29 +0000
> From: Muhammad Usama <m.usama at gmail.com>
> To: pgpool-committers at pgpool.net
> Subject: [pgpool-committers: 2844] pgpool: Watchdog node goes into the
> WD_WAITING_FOR_QUORUM state wheneve
>
>
> The watchdog node goes into the WD_WAITING_FOR_QUORUM state whenever the
> quorum is not present or is lost. Although it is a good guard against the
> split-brain syndrome, there is a problem with this technique: pgpool-II
> commands which require cluster-wide synchronization, like the interlocking
> commands, start failing when the node is waiting for quorum, as these
> commands require a central coordinator node for processing.
>
> The fix for this is to remove the WD_WAITING_FOR_QUORUM state and make
> sure that the cluster always elects the master node even when the quorum
> is not present. The trick is not to execute the escalation commands on the
> master node when the quorum is missing, and to wait until the quorum is
> complete. This new design ensures that even when, because of network
> partitioning or some other issue, the cluster gets multiple master nodes
> (split-brain syndrome), the VIP conflict will still not happen and the
> multiple master nodes will be harmless.
>
> Branch
> ------
> master
>
> Details
> -------
>
> http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=6238ba9d2aaa82db415b11aca90d54981a854e09
>
> Modified Files
> --------------
> src/include/watchdog/watchdog.h |    1 -
> src/watchdog/watchdog.c         |  455 +++++++++++++++++++++++++--------------
> src/watchdog/wd_escalation.c    |    2 +-
> src/watchdog/wd_heartbeat.c     |    6 +-
> src/watchdog/wd_lifecheck.c     |    2 +-
> 5 files changed, 304 insertions(+), 162 deletions(-)
>
> _______________________________________________
> pgpool-committers mailing list
> pgpool-committers at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-committers
>
>
> --
> Yugo Nagata <nagata at sraoss.co.jp>
> _______________________________________________
> pgpool-hackers mailing list
> pgpool-hackers at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
>