[pgpool-hackers: 987] Re: Proposal to make watchdog more robust.

Ahsan Hadi ahsan.hadi at enterprisedb.com
Fri Jul 31 04:39:50 JST 2015


Usama,

What is the ETA for your first patch? It is not clear from the email
whether the entire watchdog will be overhauled by your patches. Can you
elaborate?

When you submit your patches, please provide details on how they affect
the current watchdog feature.

-- Ahsan


On Mon, Jul 27, 2015 at 1:23 AM, Muhammad Usama <m.usama at gmail.com> wrote:

> Hi
>
> I have been working on this watchdog enhancement for a while, so I thought
> it's time to share an update on the work.
> It is a very big task and a little more complex than I originally
> anticipated. One of the biggest hurdles in the development is building a
> setup for development testing: even a simple change takes a long time to
> test, and replicating a required test scenario is slow because the feature
> involves multiple pgpool-II instances. So I have changed strategy: instead
> of writing the code directly in pgpool-II, I have created a separate binary
> that creates a watchdog cluster, and I am implementing all the logic,
> required features and functions in that. Doing this substantially reduced
> the development and testing time, but it also has a downside: I am not
> able to share incremental patches for the work so far, which was the
> original plan.
> So now, instead of dividing the work into ten or twelve smaller patches, I
> will be sharing only three or four slightly bigger patches. On the
> timeline front, I am hopeful that the feature will be ready to be
> checked in by the end of next month, and it will be divided into three
> parts.
> The first part will contain the new design of the watchdog system, including
> a thread-less design and a dedicated process to handle all watchdog-related
> state management and communication.
> The second part will add multiple options for the heartbeat, while the
> third part will consist of a new virtual-IP and network status monitoring
> system for the watchdog.
>
> Thanks
> Best regards
> Muhammad Usama
>
>
> On Tue, Jun 16, 2015 at 12:28 AM, Muhammad Usama <m.usama at gmail.com>
> wrote:
>
>> Hi
>>
>> I have created a TODO item and proposal page in pgpool wiki.
>>
>> http://pgpool.net/mediawiki/index.php/watchdog_feature_enhancement
>>
>>
>> Thanks
>> Kind regards
>> Muhammad Usama
>>
>>
>> On Fri, Jun 12, 2015 at 12:34 AM, Muhammad Usama <m.usama at gmail.com>
>> wrote:
>>
>>> Hi
>>>
>>> I have been working further on the above to enhance the pgpool-II
>>> watchdog. Please read below for a detailed design overview of the
>>> watchdog enhancements.
>>>
>>> Terminologies used below
>>> ---------------------------------------------------
>>> Cluster:
>>>             Cluster is the logical entity which contains all the
>>> pgpool-II server nodes connected by pgpool-II watchdog.
>>>
>>>
>>> What is required by the watchdog?
>>> ---------------------------------------------------
>>> The main purpose of the watchdog in pgpool-II is to provide high
>>> availability. For this purpose the watchdog is required to ensure the
>>> following.
>>>
>>> -- Ensure only healthy nodes are part of the cluster
>>> -- Ensure only authorized nodes can become members of the cluster
>>> -- Ensure only one pgpool-II node is the designated master node at any time
>>> -- Provide an automatic recovery mechanism, where possible, when a
>>> problem occurs
>>>
>>>
>>> The watchdog should guard against the following types of failures
>>> -----------------------------------------------------------------
>>> -- pgpool-II service failure
>>> -- complete or partial network failures.
>>>
>>>
>>> High level responsibilities of the watchdog
>>> --------------------------------------------------------
>>> -- Health checking of all participating pgpool-II nodes in the cluster,
>>> including health checking of the local pgpool-II server.
>>> -- Ensure that the delegate-IP is always available on a single node at
>>> all times.
>>> -- Mechanism to add and remove pgpool-II nodes from the cluster.
>>> -- Perform the leader election to select the master node when the
>>> cluster is initialized or in case of master node failure.
>>> -- Perform automatic recovery if, due to some issue, the cluster
>>> state is broken or a split-brain scenario happens.
>>> -- Generate alarms for failures where administrator intervention is
>>> required to rectify the problem.
>>> -- Manage the pgpool-II configurations to make sure all the nodes in the
>>> cluster have similar configurations.
>>> -- Provide an effective way of health checking other nodes
>>> (heartbeat) and messaging between participating nodes.
>>> -- Ensure security so that only intended nodes can become cluster
>>> members.
>>> -- Provide a mechanism so that the administrator can check the status of
>>> the cluster and the alarms generated by the cluster.
>>> -- Ability to remove a node's membership from the cluster (node fencing)
>>> if a problematic node is detected or when requested by an administrator
>>> command.
>>>
>>> Watchdog on Amazon Cloud and other cloud flavours
>>> -------------------------------------------------------------------------
>>> A much-requested feature is that the pgpool-II watchdog should work
>>> seamlessly on AWS. The enhanced watchdog will therefore work on the Amazon
>>> cloud, where a simple virtual IP cannot be used by the pgpool-II watchdog.
>>> For this the enhanced watchdog will implement two new features.
>>>
>>> 1 -- Active-Active watchdog configuration:
>>>      This will be a big improvement to the pgpool-II watchdog. It would
>>> effectively mean that multiple pgpool-II servers can be installed and
>>> that an external load-balancer and HA system can be used with pgpool-II.
>>>
>>> 2 -- The new watchdog will be flexible enough to allow utilities other
>>> than ifconfig (e.g. ec2-assign-private-ip-addresses for an AWS virtual IP)
>>> to be used to bring up the virtual-IP.
>>>
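
A minimal sketch of point 2, assuming the command that brings up the virtual-IP comes from a configurable template instead of being hard-coded. The `$_IP_$` placeholder, the `build_vip_command` helper and the wrapper script path are illustrative assumptions, not part of the proposal; the actual utility and its arguments would be whatever the administrator configures.

```python
import shlex
from typing import List

# Hypothetical placeholder that is replaced with the delegate IP.
IP_PLACEHOLDER = "$_IP_$"


def build_vip_command(template: str, delegate_ip: str) -> List[str]:
    """Expand a configured command template into an argv list.

    The template is split into arguments first and substituted afterwards,
    so the IP value always stays inside a single argument and is never
    interpreted by a shell.
    """
    if IP_PLACEHOLDER not in template:
        raise ValueError("template must contain " + IP_PLACEHOLDER)
    return [arg.replace(IP_PLACEHOLDER, delegate_ip)
            for arg in shlex.split(template)]


# On-premise setup: alias the delegate IP onto a network interface.
ifconfig_cmd = build_vip_command(
    "ifconfig eth0:0 inet $_IP_$ netmask 255.255.255.0", "192.168.1.100"
)

# Cloud setup: any other utility can be plugged in the same way.
# The wrapper script path below is made up for illustration.
cloud_cmd = build_vip_command("/usr/local/bin/assign-vip.sh $_IP_$", "10.0.0.50")
```

The resulting argv list could then be handed to something like `subprocess.run` without a shell, which keeps the choice of utility a pure configuration matter.
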
>>>
>>> Logical Components of watchdog
>>> ---------------------------------------------
>>> The pgpool-II watchdog system will consist of the following discrete
>>> logical components
>>>
>>> -- Heartbeat to monitor health and availability of cluster member nodes.
>>> -- Messaging system, to share status and configurations between cluster
>>> member nodes.
>>>                ---- All the messaging will be in xml or text based
>>> extensible protocol to ensure easy debugging and future extensions
>>>                ---- Will provide a communication mechanism for unicast
>>> as well as broadcast messaging
>>> -- Local resource manager, which will have a responsibility to monitor
>>> the health of local resources. It will consist of two sub components
>>>                ---- delegate-IP monitoring and control
>>>                ---- Local pgpool-II server monitoring
>>> -- Information database, which will store and manage all the cluster-wide
>>> runtime information and pgpool-II configurations
>>> -- IPC listener to enable administrator control by PCP commands.
>>>
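
The heartbeat component listed above can be pictured as a table of last-seen timestamps, independent of the transport that delivers the beats. This is only a sketch: the `HeartbeatMonitor` class, its method names and the `dead_after` timeout are assumptions made for illustration, and the current time is passed in explicitly so the logic can be exercised without real sockets.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Tracks when each known cluster member was last heard from."""

    def __init__(self, nodes: List[str], dead_after: float, now: float) -> None:
        # Seconds of silence after which a node is considered unavailable.
        self.dead_after = dead_after
        # Every configured node starts with a fresh timestamp.
        self.last_seen: Dict[str, float] = {node: now for node in nodes}

    def record_beat(self, node: str, now: float) -> bool:
        """Register a heartbeat; beats from unknown nodes are ignored."""
        if node not in self.last_seen:
            return False
        self.last_seen[node] = now
        return True

    def dead_nodes(self, now: float) -> List[str]:
        """Return the nodes that have been silent longer than dead_after."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.dead_after
        )
```

Whether the beats arrive over UDP or TCP would only change the code that calls `record_beat`, not this bookkeeping.
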
>>>
>>> Working overview of watchdog system
>>> ---------------------------------------------------
>>> The new watchdog system will be a finite state machine which will
>>> transition between different states. Some prominent system states will be:
>>>
>>> IDLE             -- nothing is happening
>>> STARTING         -- starting up
>>> STOPPING         -- stopping
>>> ELECTION         -- taking part in the election
>>> JOINING CLUSTER  -- we are initialised and joining the cluster
>>> ELECTED MASTER   -- the node has just been elected as the master node
>>> NORMAL NODE      -- we are not master and have joined the cluster as a
>>>                     slave node
>>> RECOVERY         -- some event occurred and we are recovering from it
>>>
>>>
>>> The basic working of the watchdog will be as follows:
>>> ========================================
>>> At startup, do basic sanity checks and go into the normal member node
>>> state, then wait for instructions from the master node or start the
>>> election algorithm.
>>> If the election algorithm is started, participate in the election and
>>> become either the master node or a normal node, depending on the results.
>>> Once the election is complete, if we are the master node, move to the
>>> master waiting state and construct the complete view of member nodes and
>>> cluster state.
>>> Construct the information database and propagate it to all member nodes.
>>> Start the health-checking of local resources and remote nodes and stay
>>> in this state until some failure occurs. Depending upon the type of
>>> failure or event, take appropriate action.
>>> The action could be one of the following
>>>      -- Kill itself.
>>>      -- Start leader election
>>>      -- Restart a local resource (pgpool-II server or delegate-IP)
>>>      -- Inform the master node about some event or failure (if this
>>> node is not the master node)
>>>      -- Replicate the configuration or information to the member nodes
>>> (master node only)
>>>      -- Perform fencing of member node (master node only)
>>>
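
The states and the flow described above can be sketched as a small table-driven state machine. The transition table is an assumption inferred from the description: the proposal does not enumerate the legal transitions, so the edges below are one plausible reading, and the `Watchdog` class name is hypothetical.

```python
from enum import Enum, auto


class WdState(Enum):
    IDLE = auto()
    STARTING = auto()
    STOPPING = auto()
    ELECTION = auto()
    JOINING_CLUSTER = auto()
    ELECTED_MASTER = auto()
    NORMAL_NODE = auto()
    RECOVERY = auto()


# Assumed edges: for each state, the set of states it may move to next.
ALLOWED = {
    WdState.IDLE: {WdState.STARTING},
    WdState.STARTING: {WdState.JOINING_CLUSTER, WdState.STOPPING},
    WdState.JOINING_CLUSTER: {WdState.NORMAL_NODE, WdState.ELECTION,
                              WdState.STOPPING},
    WdState.ELECTION: {WdState.ELECTED_MASTER, WdState.NORMAL_NODE,
                       WdState.STOPPING},
    WdState.ELECTED_MASTER: {WdState.RECOVERY, WdState.ELECTION,
                             WdState.STOPPING},
    WdState.NORMAL_NODE: {WdState.RECOVERY, WdState.ELECTION,
                          WdState.STOPPING},
    WdState.RECOVERY: {WdState.ELECTION, WdState.JOINING_CLUSTER,
                       WdState.NORMAL_NODE, WdState.ELECTED_MASTER,
                       WdState.STOPPING},
    WdState.STOPPING: {WdState.IDLE},
}


class Watchdog:
    """Holds the current state and rejects transitions not in the table."""

    def __init__(self) -> None:
        self.state = WdState.IDLE

    def transition(self, new_state: WdState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
```

Keeping the edges in one table makes it easy to review which events may move a node from one state to another, and to reject anything else early.
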
>>> Responsibilities of master watchdog node.
>>> =================================
>>> -- Maintaining up-to-date pgpool-II configurations and
>>> replicating them to all participating nodes in the cluster.
>>> -- Health checking of backend pgpool-II nodes. If the configuration
>>> requires all members to do backend health checking, or if a backend
>>> error is detected by some other member of the cluster, ensure that
>>> failover of the backend node is executed by only a single node.
>>> -- Managing the fencing, joining and leaving of members of the cluster.
>>> -- Keeping hold of the delegate-IP and making sure that it is recovered
>>> if for some reason it is dropped.
>>> -- Handing over the responsibility to some other cluster member if, due
>>> to some issue, it is not able to continue as master node or is instructed
>>> to do so by an administrator command.
>>>
>>>
>>> Leader election algorithm
>>> ====================
>>> Selecting the best algorithm for electing the master pgpool-II node in
>>> case of master node failure or at start-up is still a TODO. One
>>> suggestion is to use the algorithm from Leader Election in Asynchronous
>>> Distributed Systems, http://www.cs.indiana.edu/pub/techreports/TR521.pdf
>>> (also used by pacemaker).
>>> Other leader election algorithm suggestions are most welcome.
>>>
>>>
>>> Thoughts, suggestions, comments?
>>>
>>> Thanks
>>> Best regards
>>> Muhammad Usama
>>>
>>>
>>>
>>>
>>> On Mon, Mar 2, 2015 at 4:08 PM, Muhammad Usama <m.usama at gmail.com>
>>> wrote:
>>>
>>>> Hi pgpool-II hackers,
>>>>
>>>> pgpool-II's watchdog is used to eliminate the single point of failure and
>>>> provide HA. Although the current watchdog serves the purpose, I
>>>> think there is a need to enhance this feature and make it more robust
>>>> and adaptable, so that it can work seamlessly in a variety of scenarios
>>>> and with different system and cloud flavours.
>>>>
>>>> Below are a few points on which I think enhancements can be made
>>>> to make pgpool-II more robust for high availability scenarios.
>>>>
>>>> 1-) Provide multiple options for heartbeat to check the availability
>>>> of other pgpool-II servers.
>>>>      a-) UDP unicast (already present).
>>>>      b-) UDP multicast, which will be helpful in reducing network traffic.
>>>>      c-) TCP heartbeat.
>>>>
>>>> 2-)  pgpool-II servers running in one group should also sync their
>>>> configurations.
>>>>
>>>> I think it would be good if multiple pgpool-II servers running in one
>>>> group (connected to each other by watchdog) had the same
>>>> configuration parameter values and a consistent view of backend nodes.
>>>> Doing this will also help in cases where some external IP-based
>>>> load-balancer is used to load-balance between two or more pgpool-II
>>>> servers.
>>>>
>>>>
>>>> 3-)  It may be good to offload the burden of PG backend node health
>>>> checking from the secondary pgpool-II servers and delegate it solely to
>>>> the master pgpool-II. This could help in improving the performance a
>>>> little.
>>>>
>>>> 4-)  If a split-brain scenario happens because of network
>>>> partitioning or a temporary network outage, pgpool-II should be able
>>>> to recover by itself after detecting the scenario.
>>>>
>>>>
>>>> 5-)  Add some way in pgpool-II to allow configurable quorum settings
>>>> to decide how and when a pgpool-II can be escalated to master
>>>> pgpool-II.
>>>>
>>>> 6-) pgpool-II should have a configuration parameter to wait for a
>>>> configured amount of time before starting to elect a new master node in
>>>> case of master pgpool-II node failure. This could help to guard against
>>>> unnecessary failover in case of temporary network glitches.
>>>>
>>>> 7-) Allow the watchdog to be used in a configuration where the watchdog
>>>>    master and secondary cannot share the same virtual IP address (for
>>>>    example, different regions in AWS).
>>>>
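
Points 5-) and 6-) can be illustrated together. The sketch below assumes a strict-majority quorum and a fixed grace period before an election starts; the function names and the `election_delay` parameter are hypothetical, and the proposal itself leaves the actual quorum rule configurable.

```python
from typing import Optional


def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """True when a strict majority of the configured cluster is reachable.

    reachable_nodes counts the local node as well as the remote nodes
    that are currently answering heartbeats.
    """
    return reachable_nodes > cluster_size // 2


def should_start_election(master_lost_at: Optional[float],
                          now: float,
                          election_delay: float,
                          reachable_nodes: int,
                          cluster_size: int) -> bool:
    """Decide whether this node may start electing a new master."""
    if master_lost_at is None:
        # The master is still reachable, nothing to do.
        return False
    if now - master_lost_at < election_delay:
        # Still inside the grace period: the outage may be a short glitch.
        return False
    # Only a partition holding the majority may elect a new master,
    # which keeps a minority partition from causing split-brain.
    return has_quorum(reachable_nodes, cluster_size)
```

With three nodes and a 10 second delay, a node that lost the master 4 seconds ago would keep waiting, and a node that can only see itself would never start an election.
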
>>>> Thoughts, comments and suggestions are most welcome.
>>>>
>>>> Thanks and regards!
>>>> Muhammad Usama
>>>>
>>>
>>>
>>
>


-- 
Ahsan Hadi
Snr Director Product Development
EnterpriseDB Corporation
The Enterprise Postgres Company
