[pgpool-hackers: 937] Re: Proposal to make watchdog more robust.

Tue Jun 16 04:28:01 JST 2015

Hi

I have created a TODO item and a proposal page in the pgpool wiki.

http://pgpool.net/mediawiki/index.php/watchdog_feature_enhancement

Thanks
Kind regards
Muhammad Usama

On Fri, Jun 12, 2015 at 12:34 AM, Muhammad Usama <m.usama at gmail.com> wrote:

> Hi
>
> I have been working further on the above to enhance the pgpool-II watchdog.
> Please read below for the detailed design overview of the watchdog
> enhancements.
>
> Terminology used below
> ---------------------------------------------------
> Cluster:
>             The cluster is the logical entity that contains all the pgpool-II
> server nodes connected by the pgpool-II watchdog.
>
>
> What is required by the watchdog?
> ---------------------------------------------------
> The main purpose of the watchdog in pgpool-II is to provide high
> availability. For this purpose the watchdog is required to ensure the
> following.
>
> -- Ensure only healthy nodes are part of the cluster
> -- Ensure only authorized nodes can become members of the cluster
> -- Ensure only one pgpool-II node is the designated master node at any time
> -- Provide an automatic recovery mechanism, when possible, when some
> problem occurs
>
>
> The watchdog should provide a guard against the following types of failures
> -----------------------------------------------------------------
> -- pgpool-II service failure
> -- complete or partial network failures.
>
>
> High level responsibilities of the watchdog
> --------------------------------------------------------
> -- Health checking of all participating pgpool-II nodes in the cluster,
> including health checking of the local pgpool-II server.
> -- Ensure the delegate-IP is available on a single node at all times.
> -- Mechanism to add and remove pgpool-II nodes from the cluster.
> -- Perform the leader election to select the master node when the cluster
> is initialized or in case of master node failure.
> -- Perform an automatic recovery if, due to some issue, the cluster
> state is broken or a split-brain scenario happens.
> -- Generate alarms for failures where administrator intervention is
> required to rectify the problem.
> -- Manage the pgpool-II configurations to make sure all the nodes in the
> cluster have similar configurations.
> -- Provide an effective way of health checking other nodes (heartbeat)
> and messaging between participating nodes.
> -- Ensure security so that only intended nodes can become cluster
> members.
> -- Provide a mechanism so that the administrator can check the status of the
> cluster and the alarms generated by the cluster.
> -- Be able to remove a node's membership from the cluster (node fencing) if a
> problematic node is detected or when requested by administrator command.
>
> Watchdog on Amazon Cloud and other cloud flavours
> -------------------------------------------------------------------------
> This is a much requested feature: the pgpool-II watchdog should work
> seamlessly on AWS. The enhanced watchdog will work on the Amazon cloud, where
> a simple virtual IP cannot be used by the pgpool-II watchdog. For this the
> enhanced watchdog will implement two new features.
>
> 1 -- Active-Active watchdog configuration:
>                           This will be a big improvement to the pgpool-II
> watchdog. It would effectively mean that multiple pgpool-II servers
> can be installed and an external load-balancer and HA system can be used with
> pgpool-II.
>
> 2 -- The new watchdog will be flexible enough to allow utilities other than
> ifconfig (e.g. ec2-assign-private-ip-addresses for an AWS virtual IP) to be
> used to bring up the virtual IP.
>
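To illustrate point 2, here is a minimal sketch of how a configurable
bring-up command could be expanded. The template syntax, the `{ip}`
placeholder and the function name `build_vip_command` are assumptions made
for this example only; they are not existing pgpool-II parameters.

```python
import shlex
from typing import List


def build_vip_command(template: str, ip: str) -> List[str]:
    """Expand a hypothetical command template into an argv list.

    The template is split into arguments first and the placeholder is
    substituted afterwards, so the IP value cannot add extra arguments.
    """
    return [arg.replace("{ip}", ip) for arg in shlex.split(template)]


# Example with ifconfig; a cloud-specific utility could be configured
# through the same template mechanism.
argv = build_vip_command("ifconfig eth0:0 inet {ip} up", "192.168.1.10")
```

The resulting list could be passed to something like subprocess.run()
without going through a shell.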
>
> Logical Components of watchdog
> ---------------------------------------------
> The pgpool-II watchdog system will consist of the following discrete logical
> components:
>
> -- Heartbeat to monitor health and availability of cluster member nodes.
> -- Messaging system, to share status and configurations between cluster
> member nodes.
>                ---- All the messaging will use an XML or text based
> extensible protocol to ensure easy debugging and future extensions
>                ---- Will provide a communication mechanism for unicast as
> well as broadcast messaging
> -- Local resource manager, which will be responsible for monitoring the
> health of local resources. It will consist of two sub-components
>                ---- delegate-IP monitoring and control
>                ---- Local pgpool-II server monitoring
> -- Information database, which will store and manage all the cluster-wide
> runtime information and pgpool-II configurations
> -- IPC listener to enable administrator control by PCP commands.
>
>
> Working overview of watchdog system
> ---------------------------------------------------
> The new watchdog system will be a finite state machine which will transition
> between different states. Some prominent system states will be
>
> IDLE                              -- nothing is happening
>
> STARTING                    -- starting up
>
> STOPPING                     -- stopping
>
> ELECTION                    -- Take part in the election
>
> JOINING CLUSTER     -- we are initialised and joining the cluster
>
> ELECTED MASTER     -- If the node has been just elected as the master node
>
> NORMAL NODE           -- If we are not master and have joined the cluster
> as a slave node
>
> RECOVERY                  -- some event occurred and we are recovering
> from it
>
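The state list above can be sketched as a small state machine. The state
names come from the list; the transition table is purely an assumption for
illustration, since the proposal names the states but does not yet define
which transitions are legal.

```python
from enum import Enum, auto


class WdState(Enum):
    IDLE = auto()
    STARTING = auto()
    STOPPING = auto()
    ELECTION = auto()
    JOINING_CLUSTER = auto()
    ELECTED_MASTER = auto()
    NORMAL_NODE = auto()
    RECOVERY = auto()


# Hypothetical transition table (not part of the proposal).
ALLOWED = {
    WdState.IDLE: {WdState.STARTING},
    WdState.STARTING: {WdState.JOINING_CLUSTER, WdState.STOPPING},
    WdState.JOINING_CLUSTER: {WdState.ELECTION, WdState.NORMAL_NODE,
                              WdState.STOPPING},
    WdState.ELECTION: {WdState.ELECTED_MASTER, WdState.NORMAL_NODE,
                       WdState.STOPPING},
    WdState.ELECTED_MASTER: {WdState.RECOVERY, WdState.STOPPING},
    WdState.NORMAL_NODE: {WdState.ELECTION, WdState.RECOVERY,
                          WdState.STOPPING},
    WdState.RECOVERY: {WdState.ELECTION, WdState.NORMAL_NODE,
                       WdState.ELECTED_MASTER, WdState.STOPPING},
    WdState.STOPPING: {WdState.IDLE},
}


class Watchdog:
    """Tracks the current state and rejects transitions not in ALLOWED."""

    def __init__(self) -> None:
        self.state = WdState.IDLE

    def transit(self, new_state: WdState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
```

Keeping the legal transitions in one table makes it easy to review and
change them as the design settles.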
>
> The basic working of the watchdog will be as follows:
> ========================================
> At startup, do basic sanity checks and go into the normal member node
> state, then wait for instructions from the master node or start the election
> algorithm.
> If the election algorithm is started, participate in the election and
> become either the master node or a normal node, depending on the election
> results.
> Once the election is complete, if we are the master node, move to the
> master waiting state and construct the complete view of member nodes and
> cluster state.
> Construct the information database and propagate it to all member nodes.
> Start the health checking of local resources and remote nodes and stay in
> this state until some failure occurs. Depending upon the type of failure or
> event, take the appropriate action.
> The action could be one of the following:
>      -- Kill itself.
>      -- Start leader election
>      -- Restart a local resource (pgpool-II server or delegate-IP)
>      -- Inform the master node about some event or failure (if it is not
> the master node)
>      -- Replicate the configuration or information to the member nodes
> (master node only)
>      -- Perform fencing of a member node (master node only)
>
> Responsibilities of master watchdog node.
> =================================
> -- Maintaining the up-to-date configuration of pgpool-II and replicating
> it to all participating nodes in the cluster
> -- Health checking of backend pgpool-II nodes. If the configuration
> requires all members to do backend health checking,
> or if a backend error is detected by some other member of the cluster, then
> ensure that failover of the backend node is executed by only a single node.
> -- Managing the fencing, joining and leaving of members from the cluster
> -- Keeping hold of the delegate-IP and making sure that it is recovered
> if for some reason it is dropped.
> -- Handing over the responsibility to some other cluster member if, due to
> some issue, it is not able to continue as master node or is instructed to
> do so by administrator command.
>
>
> Leader election algorithm
> ====================
> Selecting the best algorithm for electing the master pgpool-II node in
> case of master node failure or at start-up is still a TODO. One
> suggestion is to use the algorithm from "Leader Election in Asynchronous
> Distributed Systems" http://www.cs.indiana.edu/pub/techreports/TR521.pdf
> (also used by pacemaker).
> Other leader election algorithm suggestions are most welcome.
>
>
> Thoughts, suggestions, comments?
>
> Thanks
> Best regards
> Muhammad Usama
>
>
>
>
> On Mon, Mar 2, 2015 at 4:08 PM, Muhammad Usama <m.usama at gmail.com> wrote:
>
>> Hi pgpool-II hackers,
>>
>> pgpool-II's watchdog is used to eliminate the single point of failure and
>> provide HA. Although the current watchdog serves the purpose, I
>> think there is a need to enhance this feature and make it more robust
>> and adaptable, so that it can work seamlessly in a variety of scenarios
>> and with different system and cloud flavours.
>>
>> Below are a few points on which I think enhancements can be made
>> to make pgpool-II more robust for high availability scenarios.
>>
>> 1-) Provide multiple options for heartbeat to check the availability
>> of other pgpool-II servers.
>>      a-) UDP unicast (already present).
>>      b-) UDP multicast, which will be helpful in reducing network traffic.
>>      c-) TCP heartbeat.
>>
>> 2-)  pgpool-II servers running in one group should also sync their
>> configurations.
>>
>> I think it would be good if multiple pgpool-II servers running in one
>> group (connected to each other by watchdog) had the same
>> configuration parameter values and a consistent view of backend nodes.
>> Doing this will also help in cases where some external IP based
>> load-balancer is used to load-balance between two or more pgpool-II
>> servers.
>>
>>
>> 3-)  It may be good to offload the burden of PG backend node health
>> checking from secondary pgpool-II servers and delegate it solely to
>> the master pgpool-II, which would perform the backend node health
>> checking. This could help in improving the performance a little.
>>
>> 4-)  If somehow a split-brain scenario happens because of network
>> partitioning or a temporary network outage, pgpool-II should be able
>> to recover by itself after detecting the scenario.
>>
>>
>> 5-)  Add some way in pgpool-II to allow configurable quorum settings
>> to decide how and when a pgpool-II node can be escalated to master
>> pgpool-II.
>>
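As a rough illustration of point 5, a configurable quorum check might look
like the sketch below. The function name `has_quorum` and the
`min_fraction` setting are hypothetical; the proposal does not define the
actual quorum parameters.

```python
def has_quorum(alive_nodes: int, total_nodes: int,
               min_fraction: float = 0.5) -> bool:
    """Return True if strictly more than min_fraction of nodes are alive.

    With the default of 0.5 this is a strict majority, so an even split
    (for example 2 of 4) does not count as quorum.
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= alive_nodes <= total_nodes:
        raise ValueError("alive_nodes must be between 0 and total_nodes")
    return alive_nodes > total_nodes * min_fraction
```

A node would only be escalated to master while has_quorum() holds for the
part of the cluster it can reach, which is one way to avoid two masters
after a network partition.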
>> 6-) pgpool-II should have a configuration parameter to wait for a
>> configured amount of time before starting to elect a new master node in
>> case of master pgpool-II node failure. This could help to guard against
>> unnecessary failover in case of temporary network glitches.
>>
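Point 6 could be sketched as a simple grace-period check. The names
`should_start_election` and `wait_seconds` are placeholders for this
example, not existing pgpool-II settings.

```python
def should_start_election(last_master_heartbeat: float, now: float,
                          wait_seconds: float) -> bool:
    """Return True once the master has been silent for wait_seconds.

    Both timestamps are in seconds on the same clock, for example values
    taken from time.monotonic().
    """
    if wait_seconds < 0:
        raise ValueError("wait_seconds must not be negative")
    return (now - last_master_heartbeat) >= wait_seconds
```

A short glitch that ends before wait_seconds has elapsed would then never
trigger an election.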
>> 7-) Allow the watchdog to be used in a configuration where the watchdog
>>    master and secondary cannot share the same virtual IP address (for
>>    example, different regions in AWS).
>>
>> Thoughts, comments and suggestions are most welcome.
>>
>> Thanks and regards!
>> Muhammad Usama
>>
>
>