<div dir="ltr">Hi <div><br></div><div>I have created a TODO item and a proposal page in the pgpool wiki. </div><div><br></div><div><a href="http://pgpool.net/mediawiki/index.php/watchdog_feature_enhancement">http://pgpool.net/mediawiki/index.php/watchdog_feature_enhancement</a><br></div><div><br></div><div><br></div><div>Thanks</div><div>Kind regards</div><div>Muhammad Usama</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jun 12, 2015 at 12:34 AM, Muhammad Usama <span dir="ltr">&lt;<a href="mailto:m.usama@gmail.com" target="_blank">m.usama@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi<div><br></div><div>I have been working further on the earlier proposal for enhancing the pgpool-II watchdog. Please read below for a detailed design overview of the watchdog enhancements.</div><div><br></div><div>Terminology used below<br>---------------------------------------------------<br>Cluster:<br>            A cluster is the logical entity that contains all the pgpool-II server nodes connected by the pgpool-II watchdog.<br><br><br>What is required by the watchdog?<br>---------------------------------------------------<br>The main purpose of the watchdog in pgpool-II is to provide high availability. For this purpose the watchdog is required to ensure the following.<br><br>-- Ensure only healthy nodes are part of the cluster<br>-- Ensure only authorized nodes can become members of the cluster<br>-- Ensure only one pgpool-II node is the designated master node at any time<br>-- Provide an automatic recovery mechanism, where possible, when a problem occurs<br><br><br>The watchdog should guard against the following types of failures<div>-----------------------------------------------------------------<br>-- pgpool-II service failure<br>-- Complete or partial network failures<br><br><br>High level responsibilities of the 
watchdog<br>--------------------------------------------------------<br>-- Health checking of all participating pgpool-II nodes in the cluster, including the local pgpool-II server.<br>-- Ensure the delegate-IP is available on exactly one node at all times.<br>-- Provide a mechanism to add and remove pgpool-II nodes from the cluster.<br>-- Perform the leader election to select the master node when the cluster is initialized or in case of master node failure.<br>-- Perform automatic recovery if the cluster state is broken for some reason or a split-brain scenario occurs.<br>-- Generate alarms for failures where administrator intervention is required to rectify the problem.<br>-- Manage the pgpool-II configurations to make sure all the nodes in the cluster have similar configurations.<br>-- Provide an effective way of health checking other nodes (heartbeat) and of messaging between participating nodes.<br>-- Ensure security so that only intended nodes can become cluster members.<br>-- Provide a mechanism so that the administrator can check the status of the cluster and the alarms generated by the cluster.<br>-- Be able to remove a node from the cluster (node fencing) if a problematic node is detected or when requested by an administrator command.<br><br>Watchdog on Amazon Cloud and other cloud flavours<br>-------------------------------------------------------------------------<br>A much-requested feature is for the pgpool-II watchdog to work seamlessly on AWS. The enhanced watchdog will therefore work on Amazon cloud, where a simple virtual IP cannot be used by the pgpool-II watchdog. 
For this, the enhanced watchdog will implement two new features.<br><br>1 -- Active-Active watchdog configuration: </div><div>                          This will be a big improvement to the pgpool-II watchdog. It would effectively mean that multiple pgpool-II servers can be installed and an external load balancer and HA system can be used with pgpool-II.<br><br>2 -- The new watchdog will be flexible enough to allow utilities other than ifconfig (e.g. ec2-assign-private-ip-addresses for an AWS virtual IP) to be used to bring up the virtual IP.<br><br><br>Logical Components of watchdog<br>---------------------------------------------<br>The pgpool-II watchdog system will consist of the following discrete logical components<br><br>-- Heartbeat, to monitor the health and availability of cluster member nodes.<br>-- Messaging system, to share status and configurations between cluster member nodes.<br>               ---- All messaging will use an XML or text-based extensible protocol to ensure easy debugging and future extensions<br>               ---- Will provide a communication mechanism for unicast as well as broadcast messaging<br>-- Local resource manager, which will be responsible for monitoring the health of local resources. It will consist of two sub-components<br>               ---- delegate-IP monitoring and control<br>               ---- Local pgpool-II server monitoring<br>-- Information database, which will store and manage all the cluster-wide runtime information and pgpool-II configurations<br>-- IPC listener, to enable administrator control by PCP commands.<br><br><br>Working overview of watchdog system</div><div>---------------------------------------------------<br>The new watchdog system will be a finite state machine which will transition between different states. 
Some prominent system states will be<br><br>IDLE                              -- nothing is happening<br><br>STARTING                    -- starting up<br><br>STOPPING                    -- stopping<br><br>ELECTION                    -- taking part in the election<br><br>JOINING CLUSTER     -- we are initialised and joining the cluster<br><br>ELECTED MASTER     -- the node has just been elected as the master node<br><br>NORMAL NODE           -- we are not master and have joined the cluster as a slave node<br><br>RECOVERY                  -- some event occurred and we are recovering from it<br><br><br>The basic working of the watchdog will be as follows:<br>========================================<br>At startup, do basic sanity checks and go into the normal member node state, then wait for instructions from the master node or start the election algorithm.<br>If the election algorithm is started, participate in the election and become either the master node or a normal node, depending on the election results.<br>Once the election is complete, if we are the master node, move to the master waiting state and construct the complete view of member nodes and cluster state.<br>Construct the information database and propagate it to all member nodes.<br>Start health checking of local resources and remote nodes and stay in this state until some failure occurs. Depending on the type of failure or event, take appropriate action. 
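<br><br>As a rough illustration only, the states listed above could be modelled as a small state machine. The identifiers and the transition table are my own assumptions for the sake of the sketch; the design does not fix which transitions are legal.<br>

```python
from enum import Enum, auto


class WdState(Enum):
    """Watchdog states named after the list above (identifiers are illustrative)."""
    IDLE = auto()
    STARTING = auto()
    STOPPING = auto()
    ELECTION = auto()
    JOINING_CLUSTER = auto()
    ELECTED_MASTER = auto()
    NORMAL_NODE = auto()
    RECOVERY = auto()


# Assumed transition table. The proposal does not spell these out, so this is
# only one plausible reading of the startup flow described above.
ALLOWED = {
    WdState.IDLE: {WdState.STARTING},
    WdState.STARTING: {WdState.JOINING_CLUSTER, WdState.STOPPING},
    WdState.JOINING_CLUSTER: {WdState.NORMAL_NODE, WdState.ELECTION, WdState.STOPPING},
    WdState.ELECTION: {WdState.ELECTED_MASTER, WdState.NORMAL_NODE, WdState.STOPPING},
    WdState.ELECTED_MASTER: {WdState.RECOVERY, WdState.ELECTION, WdState.STOPPING},
    WdState.NORMAL_NODE: {WdState.RECOVERY, WdState.ELECTION, WdState.STOPPING},
    WdState.RECOVERY: {WdState.NORMAL_NODE, WdState.ELECTED_MASTER,
                       WdState.ELECTION, WdState.STOPPING},
    WdState.STOPPING: {WdState.IDLE},
}


class Watchdog:
    """Holds the current state and rejects transitions outside ALLOWED."""

    def __init__(self):
        self.state = WdState.IDLE

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} to {new_state.name}")
        self.state = new_state
```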
</div><div>The action could be one of the following<br>     -- Kill itself.<br>     -- Start leader election<br>     -- Restart a local resource (pgpool-II server or delegate-IP)<br>     -- Inform the master node about some event or failure (if this node is not the master node)<br>     -- Replicate the configuration or information to the member nodes (master node only)<br>     -- Perform fencing of a member node (master node only)</div><div><br>Responsibilities of master watchdog node.<br>=================================<br>-- Maintaining the up-to-date configuration of pgpool-II and replicating it to all participating nodes in the cluster<br>-- Health checking of backend pgpool-II nodes. If the configuration requires all members to do backend health checking, or if a backend error is detected by some other member of the cluster, ensure that failover of the backend node is executed by only a single node.<br>-- Managing the fencing, joining and leaving of members of the cluster<br>-- Keeping hold of the delegate-IP and making sure that it is recovered if for some reason it is dropped.<br>-- Handing over the responsibility to some other cluster member if, for some reason, it is not able to continue as master node or is instructed to do so by an administrator command.<br><br><br>Leader election algorithm<br>====================<br>Selecting the best algorithm for electing the master pgpool-II node, at start-up or in case of master node failure, is still a TODO. One suggestion is to use the algorithm from Leader Election in Asynchronous Distributed Systems  <a href="http://www.cs.indiana.edu/pub/techreports/TR521.pdf" target="_blank">http://www.cs.indiana.edu/pub/techreports/TR521.pdf</a> (also used by Pacemaker). 
</div><div>Other leader election algorithm suggestions are most welcome.</div><div><br></div><div><br></div><div>Thoughts, suggestions, comments?</div><div><br></div><div>Thanks</div><div>Best regards</div><span class="HOEnZb"><font color="#888888"><div>Muhammad Usama</div><div><br></div><div><br></div><div><br></div></font></span></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Mar 2, 2015 at 4:08 PM, Muhammad Usama <span dir="ltr">&lt;<a href="mailto:m.usama@gmail.com" target="_blank">m.usama@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi pgpool-II hackers,<br>

<br>

pgpool-II&#39;s watchdog is used to eliminate a single point of failure and<br>

provide HA. Although the current watchdog serves this purpose, I<br>

think there is a need to enhance this feature and make it more robust<br>

and adaptable, so that it can work seamlessly in a variety of scenarios<br>

and with different system and cloud flavours.<br>

<br>

Below are a few points where I think enhancements can be made<br>

to make pgpool-II more robust for high availability scenarios.<br>

<br>

1-) Provide multiple options for heartbeat to check the availability<br>

of other pgpool-II servers.<br>

     a-) UDP uni-cast (Already present)<br>

     b-) UDP multicast, which would help reduce network traffic.<br>

     c-) TCP heartbeat.<br>

<br>
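As a rough sketch of option (a), assuming heartbeats are plain UDP datagrams: the payload, the peer names and the deadtime value below are made up for illustration and are not existing pgpool-II settings. The tracker itself does not care which transport delivered the heartbeat, so the same bookkeeping could sit behind multicast or TCP.<br>

```python
import socket
import time


def send_heartbeat(sock, peers, payload=b"pgpool-wd-heartbeat"):
    """Send one UDP unicast heartbeat datagram to every (host, port) peer."""
    for addr in peers:
        sock.sendto(payload, addr)


class PeerTracker:
    """Tracks when each peer was last heard from, whatever the transport."""

    def __init__(self, deadtime, clock=time.monotonic):
        self.deadtime = deadtime   # hypothetical: seconds of silence tolerated
        self.clock = clock         # injectable so the logic can be tested
        self.last_seen = {}

    def record(self, peer):
        """Note that a heartbeat from `peer` has just arrived."""
        self.last_seen[peer] = self.clock()

    def dead_peers(self):
        """Return the peers that have been silent for longer than deadtime."""
        now = self.clock()
        return sorted(p for p, t in self.last_seen.items()
                      if now - t > self.deadtime)
```

A sender would create the socket with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) and call send_heartbeat() on a timer, while the receiver calls record() for each datagram it gets.<br>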

2-)  pgpool-II servers running in one group should also sync their configurations.<br>

<br>

I think it would be good if multiple pgpool-II servers running in one<br>

group (connected to each other by watchdog) had the same<br>

configuration parameter values and a consistent view of backend nodes.<br>

Doing this will also help in cases where some external IP based<br>

load-balancer is used to load-balance between two or more pgpool-II<br>

servers.<br>

<br>

<br>

3-)  It may be good to offload the burden of PG backend node health<br>

checking from secondary pgpool-II servers and delegate it solely to<br>

the master pgpool-II, so that only the master performs backend node health<br>

checking. This could help improve performance a little.<br>

<br>

4-)  If a split-brain scenario somehow happens because of network<br>

partitioning or a temporary network outage, pgpool-II should be able<br>

to recover by itself after detecting the scenario.<br>

<br>

<br>

5-)  Add some way in pgpool-II to allow configurable quorum settings<br>

that decide how and when a pgpool-II node can be escalated to master<br>

pgpool-II.<br>

<br>
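To make point 5 a bit more concrete, one possible shape for such a check is sketched below. The min_votes setting is a made-up name, not an existing pgpool-II parameter, and falling back to a strict majority is my assumption rather than a decided design.<br>

```python
def has_quorum(alive_nodes, total_nodes, min_votes=None):
    """Return True if this partition may escalate a node to master.

    min_votes is a hypothetical configurable setting. When it is None,
    a strict majority of the configured cluster size is required.
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    required = min_votes if min_votes is not None else total_nodes // 2 + 1
    return alive_nodes >= required
```

With the majority default, a two-node cluster cannot escalate when the nodes lose sight of each other, which is exactly the case where an explicit override such as min_votes=1 would be a deliberate, administrator-chosen trade-off.<br>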

6-) pgpool-II should have a configuration parameter to wait for a<br>

configured amount of time before starting to elect a new master node in<br>

case of master pgpool-II node failure. This could help guard against<br>

unnecessary failover in case of temporary network glitches.<br>

<br>
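A minimal sketch of the waiting behaviour in point 6, assuming the node calls master_missing() each time a check of the master fails and master_seen() whenever the master responds. The wait parameter and the class name are hypothetical.<br>

```python
import time


class ElectionDelay:
    """Allow an election only after the master has been silent for `wait` seconds."""

    def __init__(self, wait, clock=time.monotonic):
        self.wait = wait       # hypothetical configuration parameter, in seconds
        self.clock = clock     # injectable so the logic can be tested
        self.lost_at = None    # time the master was first seen as unreachable

    def master_seen(self):
        """The master answered, so any pending grace period is cancelled."""
        self.lost_at = None

    def master_missing(self):
        """Report a failed check; return True once the grace period has elapsed."""
        now = self.clock()
        if self.lost_at is None:
            self.lost_at = now
        return now - self.lost_at >= self.wait
```

A short glitch that ends before the wait expires resets the timer through master_seen(), so no election is started.<br>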

7) Allow the watchdog to be used in a configuration where the watchdog master and<br>

   secondary cannot share the same virtual IP address (for example,<br>

   different regions in AWS).<br>

<br>

Thoughts, comments and suggestions are most welcome.<br>

<br>

Thanks and regards!<br>

<span><font color="#888888">Muhammad Usama<br>

</font></span></blockquote></div><br></div>

</div></div></blockquote></div><br></div>