[pgpool-hackers: 812] Proposal to make watchdog more robust.

Muhammad Usama m.usama at gmail.com
Mon Mar 2 20:08:12 JST 2015


Hi pgpool-II hackers,

pgpool-II's watchdog is used to eliminate the single point of failure
and provide HA. Although the current watchdog serves this purpose, I
think there is a need to enhance the feature and make it more robust
and adaptable, so that it can work seamlessly in a variety of scenarios
and with different system and cloud flavours.

Below are a few points on which I think enhancements can be made to
make pgpool-II more robust for high availability scenarios.

1-) Provide multiple options for heartbeat to check the availability
of other pgpool-II servers.
     a-) UDP uni-cast (Already present)
     b-) UDP multicast, which would help reduce network traffic.
     c-) TCP heartbeat.
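
To make option (b) a little more concrete, here is a minimal sketch of
how a multicast heartbeat sender and receiver could be set up. It is
written in Python only for brevity and is not pgpool-II code. The group
address, the port and all function names are assumptions made up for
this example.

```python
import socket

# Hypothetical values, chosen only for illustration.
HEARTBEAT_GROUP = "239.0.0.1"   # an administratively scoped multicast address
HEARTBEAT_PORT = 9694


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq structure used by IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_receiver(group: str = HEARTBEAT_GROUP,
                  port: int = HEARTBEAT_PORT) -> socket.socket:
    """Open a UDP socket that receives datagrams sent to the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def send_heartbeat(payload: bytes,
                   group: str = HEARTBEAT_GROUP,
                   port: int = HEARTBEAT_PORT) -> None:
    """Send one datagram that every member of the group can receive."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # TTL 1 keeps the heartbeat on the local network segment.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

With unicast each node sends one packet per peer, whereas here a single
send_heartbeat() call per interval is enough, which is where the
traffic reduction would come from.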

2-)  pgpool-II servers running in one group should also sync their
configurations.

I think it would be good if multiple pgpool-II servers running in one
group (connected to each other by watchdog) had the same configuration
parameter values and a consistent view of the backend nodes. Doing this
will also help in cases where an external IP-based load balancer is
used to load-balance between two or more pgpool-II servers.
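
One cheap way to detect configuration drift between nodes would be to
exchange a digest of the parameter values. The sketch below (Python,
illustrative only) assumes the configuration is available as a simple
key/value mapping; the function names are hypothetical and nothing like
this exists in the watchdog today.

```python
import hashlib


def config_fingerprint(params: dict) -> str:
    """Return a digest that is identical for identical parameter sets."""
    # Sort the keys so that the order of parameters does not matter.
    canonical = "\n".join(f"{key}={params[key]}" for key in sorted(params))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def differing_keys(local: dict, remote: dict) -> list:
    """List the parameters whose values differ between two nodes."""
    keys = set(local) | set(remote)
    return sorted(k for k in keys if local.get(k) != remote.get(k))
```

Nodes would only need to compare the short fingerprint in normal
operation and fall back to differing_keys() when the fingerprints do
not match.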


3-)  It may be good to offload the burden of PG backend node health
checking from the secondary pgpool-II servers and delegate it solely to
the master pgpool-II, so that only the master performs the backend node
health checks. This could help improve performance a little.

4-)  If a split-brain scenario somehow happens because of network
partitioning or a temporary network outage, pgpool-II should be able
to recover by itself after detecting the scenario.
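
The recovery rule itself is open for discussion. Purely as an
assumption, one deterministic possibility is sketched below in Python:
when several nodes claim to be master after the network heals, keep the
one that could reach the most peers, then the one that has been master
the longest, then the lowest node id. The MasterClaim type and its
fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterClaim:
    node_id: str
    connected_nodes: int   # peers this node could reach while it was master
    escalated_at: float    # time it became master, seconds since the epoch


def resolve_split_brain(claims):
    """Pick the single node that keeps the master role.

    Every node evaluates the same rule on the same data, so all of them
    reach the same answer without further negotiation.
    """
    if not claims:
        raise ValueError("at least one master claim is required")
    return min(
        claims,
        key=lambda c: (-c.connected_nodes, c.escalated_at, c.node_id),
    )
```

The nodes that lose would then de-escalate and rejoin as secondaries.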


5-)  Add some way in pgpool-II to allow configurable quorum settings
that decide how and when a pgpool-II node can be escalated to master
pgpool-II.
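
As a minimal sketch of what a configurable quorum check could look
like (Python, illustrative only): by default a strict majority of the
configured nodes is required, and a hypothetical min_votes setting lets
the administrator override that. Neither the function nor the setting
exists in pgpool-II; both are assumptions for this example.

```python
def has_quorum(alive_nodes: int, total_nodes: int, min_votes=None) -> bool:
    """Decide whether enough nodes are visible to allow escalation.

    alive_nodes counts the local node plus every peer it can reach.
    min_votes is a hypothetical override of the default strict majority.
    """
    if total_nodes <= 0 or not 0 <= alive_nodes <= total_nodes:
        raise ValueError("invalid node counts")
    if min_votes is None:
        min_votes = total_nodes // 2 + 1   # strict majority
    return alive_nodes >= min_votes
```

For example, has_quorum(2, 3) is true, while has_quorum(1, 2) is false
unless min_votes is lowered to 1, which is the trade-off a two-node
setup would have to make explicitly.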

6-) pgpool-II should have a configuration parameter to wait for a
configured amount of time before starting to elect a new master node in
case of master pgpool-II node failure. This could help guard against
unnecessary failover in case of temporary network glitches.
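
The check for point 6 could be as simple as the following sketch
(Python, illustrative only). The parameter names heartbeat_timeout and
election_delay are hypothetical and all times are in seconds.

```python
def should_start_election(last_heartbeat: float,
                          now: float,
                          heartbeat_timeout: float,
                          election_delay: float) -> bool:
    """Return True once the master has been silent long enough.

    The master is suspected after heartbeat_timeout seconds of silence,
    but an election only starts if it stays silent for a further
    election_delay seconds.
    """
    silent_for = now - last_heartbeat
    return silent_for >= heartbeat_timeout + election_delay
```

If a heartbeat arrives during the delay, last_heartbeat moves forward
and the election is never started, so a short glitch does not cause a
failover.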

7-) Allow the watchdog to be used in a configuration where the watchdog
master and secondary cannot share the same virtual IP address (for
example, different regions in AWS).

Thoughts, comments and suggestions are most welcome.

Thanks and regards!
Muhammad Usama

