[pgpool-hackers: 1289] Re: changing the pcp_watchdog_info

Muhammad Usama m.usama at gmail.com
Sun Jan 3 03:41:53 JST 2016


Hi Yugo

Thanks for your valuable input on the patch. Sorry, I was mostly away from
the office last week, so I couldn't reply to this any earlier.

Please find my response inline.

On Mon, Dec 21, 2015 at 5:10 PM, Yugo Nagata <nagata at sraoss.co.jp> wrote:

> Hi Usama,
>
> On Thu, 10 Dec 2015 20:29:50 +0500
> Muhammad Usama <m.usama at gmail.com> wrote:
>
> > Hi Ishii San
> >
> > pcp_watchdog_info only gives the information of a single watchdog
> > node, which might not be enough in certain situations. Since we are
> > currently working on watchdog enhancements, I thought it would be good
> > to also enhance the pcp_watchdog_info utility. I have created a patch
> > to add a little more information about the watchdog cluster state and
> > nodes in the output of pcp_watchdog_info.
> >
> > Can you please have a look at the attached patch, especially regarding:
> >
> > 1-) Are you happy with all the new information shown by the
> > pcp_watchdog_info utility, or would you like to add/remove something?
>
> What is "In Network Error"?  Looking into the code, this stands for the
> value of g_cluster.network_error. However, this is not used in the
> watchdog code, and I feel it is meaningless. In addition,
> network_error_time is also not used.
>
> I think it would be good to add delegate_IP and QuorumStatus to
> "Watchdog Cluster Information" section in verbose mode.
>
> In get_node_list_json(), g_cluster.quorum_status is put into jNode;
> however, update_quorum_status() isn't called before that. I guess
> update_connected_node_count() is called for updating
> g_cluster.aliveNodeCount.
> I think update_quorum_status() also needs to be called similarly.
>
> I think users can't understand what "Local Node Escalated" stands for,
> that is, how it differs from the COORDINATOR status on the local node.
> It might be better to use "VIP is up on Local Node", "Local Node has VIP",
> and so on.  Alternatively, each Node Information section could indicate
> whether that node holds the VIP. This would let users know whether the VIP
> exists anywhere in the cluster instead of just on the local node, and
> users could notice a problem if the VIP doesn't exist on any node or if
> multiple VIPs exist.
>
> In non-verbose mode, "Master Node Name" is shown by its host name.
> However, the host name is not shown in each Node Information section.
> Although users might know "COORDINATOR is the master", it would be
> good to add the host name to each node information section.
>

All of the above observations are correct and reasonable. I will take care
of these.

>
>  $ ~/pgpool/bin/pcp_watchdog_info -p 11001
>
>  2 NO YES Linux_yugo-n-ubuntu_11000
>
>  [0] localhost 11000 9000 4 COORDINATOR
>  [1] localhost 12000 9001 7 STANDBY
>
>
> If there is no master (coordinator), what should be shown as "Master Node
> Name"?
> Or can this situation never occur?
>

Yes, with the current implementation of the watchdog, the cluster must
always have a master node, so this situation cannot happen.
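As an aside, the compact (non-verbose) output quoted above appears to map field-for-field onto the verbose output later in the thread: total nodes, network-error flag, escalation flag, master node name, then one line per node. Here is a small illustrative parser written purely to document that inferred mapping; it is not part of pgpool, and the field meanings are an assumption based on the samples in this thread:

```python
def parse_watchdog_info(text):
    """Parse the compact pcp_watchdog_info output shown in this thread.

    First line : total_nodes network_error escalated master_node_name
    Node lines : [id] hostname pgpool_port wd_port status_code status_name

    Field meanings are inferred from the verbose output sample; this is
    an illustration only, not pgpool code.
    """
    lines = [l for l in text.strip().splitlines() if l.strip()]
    total, net_err, escalated, master = lines[0].split()
    nodes = []
    for line in lines[1:]:
        idx, host, pg_port, wd_port, status, status_name = line.split()
        nodes.append({
            "id": int(idx.strip("[]")),
            "host": host,
            "pgpool_port": int(pg_port),
            "wd_port": int(wd_port),
            "status": int(status),
            "status_name": status_name,
        })
    return {
        "total_nodes": int(total),
        "network_error": net_err == "YES",
        "escalated": escalated == "YES",
        "master": master,
        "nodes": nodes,
    }

# The exact sample quoted earlier in the thread:
sample = """\
2 NO YES Linux_yugo-n-ubuntu_11000

[0] localhost 11000 9000 4 COORDINATOR
[1] localhost 12000 9001 7 STANDBY
"""
info = parse_watchdog_info(sample)
```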


>
> >
> > 2-) Informing watchdog info in pcp_worker departs from the data
> > serialization technique used by the PCP server for other functions and
> > adopts a JSON-formatted payload to transmit the watchdog information to
> > the client side. I am of the view that someday we should shift all the
> > other functions to JSON or some other serialization technique that is
> > more adaptable than the current proprietary format, but for the time
> > being the watchdog-informing part of PCP is different from all the
> > others.
> >
> > With the new pcp_watchdog_info, when a node ID is given, the utility
> > shows the information of that specific node, with ID = 0 meaning the
> > local watchdog node. When no node ID is provided by the user, the
> > information of all nodes is shown.
>
> To be frank, I'm not sure it is a good idea to use ID=0 to stand for the
> local watchdog, because the ID of the Nth remote pgpool
> (other_pgpool_hostnameN in pgpool.conf) is now N+1, and I think this is
> slightly misleading. However, it is not a bad idea to allow
> pcp_watchdog_info to show all nodes' information.
>
>
Well, this is correct; the new watchdog node ID scheme might confuse some
users because ID=0 is now reserved for the local watchdog node and the
remote watchdog nodes start from ID > 0. But the problem here is that we
need some ID for the local watchdog node, not only for the
pcp_watchdog_info utility but also for the IPC commands (GET NODES LIST,
NODE STATUS CHANGE and NODES LIST DATA) that deal with the watchdog node
information (used by external/3rd-party health-checking systems
integrating with the watchdog). So I tried to use the same node ID scheme
across the new watchdog (in both the IPC commands and the
pcp_watchdog_info utility) to keep things consistent.
I think we can document this particular change to make sure users are not
confused by it, or do you have some other idea to get around this that we
could also use in the watchdog IPC commands?
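To make the numbering scheme under discussion concrete, here is a tiny sketch (not pgpool code; node names are hypothetical) of how the IDs line up: the local node is always ID 0, and the remote node configured as other_pgpool_hostnameN gets ID N + 1:

```python
def assign_watchdog_ids(local_name, remote_names):
    """Illustrate the discussed ID scheme: the local watchdog node is
    always ID 0, and the remote node configured as other_pgpool_hostnameN
    (N = 0, 1, ...) gets ID N + 1.  Sketch only, not pgpool code."""
    ids = {local_name: 0}
    for n, name in enumerate(remote_names):
        ids[name] = n + 1
    return ids

# e.g. other_pgpool_hostname0 = "nodeB", other_pgpool_hostname1 = "nodeC"
mapping = assign_watchdog_ids("nodeA", ["nodeB", "nodeC"])
```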


>
> BTW, when I try to get information of Node 1 (the 1st remote pgpool),
> it shows that the Node Number is 0. Is "Node Number" different
> from "Node ID"? I find this confusing. Does Node Number need
> to be shown rather than Node ID?
>

I think Node ID would be more appropriate. I will make the change.

Many thanks
Best regards
Muhammad Usama

>
> $ pcp_watchdog_info -p 11001 -n 1 -v
>
> Watchdog Cluster Information
> Total Nodes         : 2
> Remote Nodes        : 1
> Alive Remote Nodes  : 1
> In Network Error    : NO
> Local Node Escalated: YES
> Master Node Name    : Linux_yugo-n-ubuntu_11000
>
> Watchdog Node Information
> Node Number    : 0
> Node Name      : Linux_yugo-n-ubuntu_12000
> Host Name      : localhost
> Pgpool port    : 12000
> Watchdog port  : 9001
> Node priority  : 1
> status         : 7
> status Name    : STANDBY
>
>
>
>
> >
> > --example--
> >
> > [usama at localhost pgpool]$ bin/pcp_watchdog_info -h localhost -p 9893 -U
> > postgres -v
> > Password:
> > Watchdog Cluster Information
> > Total Nodes         : 3
> > Remote Nodes        : 2
> > Alive Remote Nodes  : 2
> > In Network Error    : NO
> > Local Node Escalated: NO
> > Master Node Name    : Linux_localhost.localdomain_9992
> >
> > Watchdog Node Information
> > Node Number    : 0
> > Node Name      : Linux_localhost.localdomain_9993
> > Host Name      : localhost
> > Pgpool port    : 9993
> > Watchdog port  : 9003
> > Node priority  : 1
> > status         : 7
> > status Name    : STANDBY
> >
> > Node Number    : 1
> > Node Name      : Linux_localhost.localdomain_9992
> > Host Name      : localhost
> > Pgpool port    : 9992
> > Watchdog port  : 9002
> > Node priority  : 1
> > status         : 4
> > status Name    : COORDINATOR
> >
> > Node Number    : 2
> > Node Name      : Linux_localhost.localdomain_9991
> > Host Name      : localhost
> > Pgpool port    : 9991
> > Watchdog port  : 9001
> > Node priority  : 1
> > status         : 7
> > status Name    : STANDBY
> >
> >
> >
> > Thanks
> > Best regards
> > Muhammad Usama
>
>
> --
> Yugo Nagata <nagata at sraoss.co.jp>
>