[pgpool-hackers: 1797] Best possible solution when watchdog node and PostgreSQL backend nodes fails at the same time

Muhammad Usama m.usama at gmail.com
Fri Sep 9 04:34:45 JST 2016


Hi All

I am looking into the following pgpool-II failover-with-watchdog issues:

   - Issue 234: pgpool fails to run failover script if the server running
   the primary pgpool node and the PostgreSQL server dies
   - Issue 227: failover not performed by standby node


Currently I am a little stuck on deciding the best possible solution to the
above problems, so I am sharing the simplified scenario, the problems, and the
possible solutions with a wider audience for thoughts and suggestions.

Scenario
=========
Consider two machines, each with a PostgreSQL and a pgpool-II installation, as
given below:

Machine (M1)
================
PostgreSQL (PG1)
pgPool-II (PGP1)

Machine (M2)
================
PostgreSQL (PG2)
pgPool-II (PGP2)


So the cluster may be configured as:
================================
PGP1 [backend_0 = PG1(local), backend_1 = PG2(remote)]
PGP2 [backend_0 = PG1(remote), backend_1 = PG2(local)]


Each pgpool-II has two backend nodes configured: one on the local machine
and another on the remote machine. Both pgpool-II nodes are also connected
through the watchdog.
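
For concreteness, the relevant portion of pgpool.conf on M1 could look
roughly like the following (hostnames and ports are placeholders, and on M2
the local/remote roles of backend 0 and 1 are simply swapped):

    # pgpool.conf on M1 (PGP1) -- hostnames/ports are placeholders
    backend_hostname0 = 'm1.example.com'        # PG1 (local)
    backend_port0 = 5432
    backend_hostname1 = 'm2.example.com'        # PG2 (remote)
    backend_port1 = 5432

    use_watchdog = on
    wd_hostname = 'm1.example.com'
    wd_port = 9000
    other_pgpool_hostname0 = 'm2.example.com'   # PGP2
    other_pgpool_port0 = 9999
    other_wd_port0 = 9000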

When this cluster boots up and the network link is fine, one of the pgpool-II
nodes will become the master watchdog node and the other will join in as a
standby watchdog, while the status of both backend PG nodes on each pgpool-II
is [UP].

Failure scenario:
================
Now consider what happens if we disconnect the network link between M1 and M2:

   - Watchdog communication between PGP1 and PGP2 will be disconnected, but
   each pgpool-II node will NOT consider the other node lost until the
   watchdog lifecheck process (depending on the configuration) marks the
   other node as dead. (The timing parameters involved are sketched after
   this list.)
   - At the same instant, PGP1 will lose the link to backend_1 (PG2) and
   PGP2 will lose the link to backend_0 (PG1), and both pgpool-II nodes
   will start failover of the backend PostgreSQL node they consider failed.
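
How long it takes before the peer pgpool-II is declared dead, and before a
failed backend is degenerated, is governed by settings along the following
lines (the values below are purely illustrative, not recommendations):

    # watchdog lifecheck -- detection of the peer pgpool-II node
    wd_lifecheck_method = 'heartbeat'
    wd_heartbeat_deadtime = 30     # seconds without heartbeat before the peer is declared dead

    # health check -- detection of a failed PostgreSQL backend
    health_check_period = 10       # seconds between health checks
    health_check_timeout = 20      # timeout for a single health check
    health_check_max_retries = 0   # retries before the backend is degenerated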

Because of this configuration and sequence of events, each pgpool-II performs
failover on a different PG backend node: one pgpool-II is degenerating PG1
while the other is degenerating PG2 at the same time, and at that moment no
communication link exists between the pgpool-II nodes.

Problems
=========
1 -- Since the communication link between the pgpool-II nodes is broken,
neither pgpool-II node will be able to communicate its degenerate request to
the other pgpool-II node.

2 -- As all the locks and interlocking are handled by the master watchdog
node, and the standby watchdog pgpool-II node will not be able to communicate
with the master watchdog pgpool-II node, the standby pgpool-II node will not
be able to acquire or coordinate the locks.

3 -- If both pgpool-II nodes finish failover of their respective backend
nodes and the communication link is then restored, each pgpool-II will be
connected to a different backend node, and both backend nodes might have been
promoted to master PostgreSQL node by that time.


Possible Solutions
==============

For Problem #1
============
As the communication link between the pgpool-II nodes is broken, sending the
degenerate request to the other pgpool-II node will fail with a time-out. When
the time-out occurs, we have two options (a rough sketch of both follows the
list):


   1. Consider it a temporary failure and enqueue the command until the
   watchdog cluster elects a new master node or becomes a single-node
   cluster, and process the command at that time.
   2. Consider it a success and proceed with the node failover even when
   the time-out on the watchdog failover command occurs.
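
A minimal C-style sketch of the two options, assuming a hypothetical callback
invoked on the time-out (none of the names below come from the actual
pgpool-II source):

    /* Hypothetical handler invoked when sending the degenerate request to
     * the other pgpool-II node times out.  All names are illustrative. */
    #include <stdio.h>

    /* Option 1: remember the request and replay it once this node becomes
     * the new master or a single-node watchdog cluster. */
    static void enqueue_pending_failover(int backend_node_id)
    {
        printf("queueing degenerate request for backend %d\n", backend_node_id);
    }

    /* Option 2: treat the time-out as success and degenerate the backend
     * locally right away, accepting the risk that both nodes fail over
     * independently. */
    static void degenerate_backend_locally(int backend_node_id)
    {
        printf("degenerating backend %d locally\n", backend_node_id);
    }

    void handle_degenerate_timeout(int backend_node_id, int option)
    {
        if (option == 1)
            enqueue_pending_failover(backend_node_id);
        else
            degenerate_backend_locally(backend_node_id);
    }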

For Problem #2
============
As the node will be isolated (especially the standby pgpool-II node), it will
not be able to coordinate, acquire, or release the locks, and all lock
requests will fail with a time-out. Again, we have multiple options here
(a small sketch follows the list):


   1. A time-out on the locking command can be considered a lock-acquire
   failure, and consequently neither pgpool-II node will execute the
   user-provided failover command.
   2. Consider a time-out on the locking command as the lock having been
   acquired; as a result, both pgpool-II nodes will execute the
   user-provided failover commands.
   3. Finally, we can consider it a temporary failure, enqueue the first
   lock command issued to the watchdog, and keep it queued until the
   watchdog elects a new master or becomes a single-node cluster. Until
   that time, pgpool-II will keep waiting for the lock command to complete.
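
To spell out the three options, here is a tiny hypothetical sketch of the
decision made when a lock request to the (unreachable) master watchdog times
out; the names are illustrative only and not from the pgpool-II source:

    /* Hypothetical outcome of a lock request that timed out because the
     * master watchdog node is unreachable. */
    typedef enum
    {
        LOCK_FAILED,    /* option 1: neither node runs the failover command   */
        LOCK_ACQUIRED,  /* option 2: both nodes run the failover command      */
        LOCK_QUEUED     /* option 3: wait until a new master has been elected */
    } lock_outcome;

    lock_outcome handle_lock_timeout(int option)
    {
        switch (option)
        {
            case 1:  return LOCK_FAILED;
            case 2:  return LOCK_ACQUIRED;
            default: return LOCK_QUEUED;
        }
    }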

For Problem #3
=============
If communication is restored and the two pgpool-II nodes have a different
backend status from each other, both pgpool-II nodes should shut themselves
down and let the user restart them after verifying the data consistency among
the PostgreSQL servers. A short sketch of such a check follows.
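
A minimal sketch of the idea, assuming each node can obtain the peer's
backend status once the link is back (the function and constants below are
hypothetical, not part of the existing code):

    /* Hypothetical check run after the watchdog link is re-established.
     * If the two nodes disagree about the backend status, stop this node
     * and let the user verify data consistency before restarting. */
    #include <stdlib.h>
    #include <string.h>

    #define NUM_BACKENDS 2

    void check_backend_status_after_reconnect(const int local_status[NUM_BACKENDS],
                                              const int remote_status[NUM_BACKENDS])
    {
        if (memcmp(local_status, remote_status, sizeof(int) * NUM_BACKENDS) != 0)
            exit(EXIT_FAILURE);   /* divergent cluster view: shut down */
    }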


Please give your valuable feedback and suggestions so that we can reach the
best possible solution for the above.


Thanks in anticipation
Best regards
Muhammad Usama