[pgpool-hackers: 1327] Re: [pgpool-committers: 2927] pgpool: Fixing watchdog handling of local network failure events

Tatsuo Ishii ishii at postgresql.org
Tue Jan 12 15:53:53 JST 2016


Usama,

After the commit, I got regression failure with 004.watchdog on my
Ubuntu laptop. log/004.watchdog and
tests/004.watchdog/testdir/{master,standby}/log/pgpool.log are
attached.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

From: Muhammad Usama <m.usama at gmail.com>
Subject: [pgpool-committers: 2927] pgpool: Fixing watchdog handling of local network failure events
Date: Mon, 11 Jan 2016 14:28:28 +0000
Message-ID: <E1aIdSO-0001pH-SL at gothos.postgresql.org>

> Fixing watchdog handling of local network failure events
> 
> WD_EVENT_NW_IP_IS_REMOVED and WD_EVENT_LOCAL_NODE_LOST must be handled in
> the main state machine function to make sure all watchdog states should act
> properly on these events.
> 
> Branch
> ------
> master
> 
> Details
> -------
> http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=6ac8844288d21dc598d90abef3324c7cd9f9ac53
> 
> Modified Files
> --------------
> src/watchdog/watchdog.c | 63 ++++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 49 insertions(+), 14 deletions(-)
> 
> _______________________________________________
> pgpool-committers mailing list
> pgpool-committers at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-committers
-------------- next part --------------
creating master pgpool...Satrting set up in raw mode
creating startall and shutdownall
creating failover script
creating database cluster /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/data0...done.
update postgreql.conf
creating pgpool_remote_start
creating basebackup.sh
start all
waiting for pgpool-II coming up...done.
create extensions
Pager usage is off.
 node_id | hostname | port  | status | lb_weight |  role  | select_cnt 
---------+----------+-------+--------+-----------+--------+------------
 0       | /tmp     | 11002 | 2      | 1.000000  | master | 0
(1 row)

shutdown all

pgpool-II setting for raw mode is done.
To start the whole system, use /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/startall.
To shutdown the whole system, use /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/shutdownall.
pcp command user name is "t-ishii", password is "t-ishii".
Each PostgreSQL, pgpool-II and pcp port is as follows:
#1 port is 11002
pgpool port is 11000
pcp port is 11001
The info above is in README.port.
master setup done.
waiting for server to start....10019 2016-01-12 15:28:35 JST LOG:  redirecting log output to logging collector process
10019 2016-01-12 15:28:35 JST HINT:  Future log output will appear in directory "pg_log".
 done
server started
creating standby pgpool...Waiting for the pgpool master...
Master brought up successfully.
Waiting for the standby to join cluster...
[check] 1 times
[check] 2 times
[check] 3 times
[check] 4 times
[check] 5 times
[check] 6 times
[check] 7 times
[check] 8 times
[check] 9 times
[check] 10 times
2016-01-12 15:29:11: pid 10190: LOG:  stop request sent to pgpool. waiting for termination...
.done.
Checking if the Standby pgpool-II detected the master shutdown...
Master shutdown detected.
Checking if the Standby pgpool-II takes over the master responsibility...
[check] 1 times
[check] 2 times
[check] 3 times
[check] 4 times
[check] 5 times
Standby successfully became the new master.
2016-01-12 15:29:22: pid 10214: LOG:  stop request sent to pgpool. waiting for termination...
.done.
2016-01-12 15:29:23: pid 10218: FATAL:  could not read pid file
waiting for server to shut down.... done
server stopped
3 out of 4 successfull
-------------- next part --------------
2016-01-12 15:28:36: pid 10028: LOG:  Backend status file /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/log/pgpool_status discarded
2016-01-12 15:28:36: pid 10028: LOG:  waiting for watchdog to initialize
2016-01-12 15:28:36: pid 10032: LOG:  setting the local watchdog node name to "Linux_tishii-CF-SX3HE4BP_11000"
2016-01-12 15:28:36: pid 10032: LOG:  watchdog cluster configured with 1 remote nodes
2016-01-12 15:28:36: pid 10032: LOG:  watchdog remote node:0 on localhost:21104
2016-01-12 15:28:36: pid 10032: LOG:  IPC socket path: "/tmp/.s.PGPOOLWD_CMD.21004"
2016-01-12 15:28:40: pid 10032: LOG:  watchdog node state changed from [LOADING] to [JOINING]
2016-01-12 15:28:45: pid 10032: LOG:  watchdog node state changed from [JOINING] to [INITIALIZING]
2016-01-12 15:28:46: pid 10032: LOG:  I am the only alive node in the watchdog cluster
2016-01-12 15:28:46: pid 10032: HINT:  skiping stand for coordinator state
2016-01-12 15:28:46: pid 10032: LOG:  watchdog node state changed from [INITIALIZING] to [MASTER]
2016-01-12 15:28:46: pid 10032: LOG:  I am announcing my self as master/coordinator watchdog node
2016-01-12 15:28:51: pid 10032: LOG:  I am the cluster leader node
2016-01-12 15:28:51: pid 10032: DETAIL:  our declare coordinator message is accepted by all nodes
2016-01-12 15:28:51: pid 10032: LOG:  I am the cluster leader node. Starting escalation process
2016-01-12 15:28:51: pid 10028: LOG:  watchdog process is initialized
2016-01-12 15:28:51: pid 10032: LOG:  escalation process started with PID:10076
2016-01-12 15:28:51: pid 10032: LOG:  new IPC connection is received 
2016-01-12 15:28:51: pid 10028: LOG:  Setting up socket for 127.0.0.1:11000
2016-01-12 15:28:51: pid 10077: LOG:  2 watchdog nodes are configured for lifecheck
2016-01-12 15:28:51: pid 10077: LOG:  watchdog nodes ID:0 Name:"Linux_tishii-CF-SX3HE4BP_11000"
2016-01-12 15:28:51: pid 10077: DETAIL:  Host:"localhost" WD Port:21004 pgpool-II port:11000
2016-01-12 15:28:51: pid 10077: LOG:  watchdog nodes ID:1 Name:""
2016-01-12 15:28:51: pid 10077: DETAIL:  Host:"localhost" WD Port:21104 pgpool-II port:11100
2016-01-12 15:28:51: pid 10076: LOG:  watchdog: escalation started
2016-01-12 15:28:51: pid 10032: LOG:  watchdog escalation process with pid: 10076 exit with SUCCESS.
2016-01-12 15:28:51: pid 10032: LOG:  watchdog escalation process with pid: 10076 exits with status 0
2016-01-12 15:28:51: pid 10028: LOG:  pgpool-II successfully started. version 3.5beta1 (ekieboshi)
2016-01-12 15:28:51: pid 10109: LOG:  DB node id: 0 backend pid: 10116 statement:  DISCARD ALL
2016-01-12 15:28:51: pid 10032: LOG:  new watchdog node connection is received from "127.0.0.1:17819"
2016-01-12 15:28:52: pid 10081: LOG:  set SO_REUSEPORT option to the socket
2016-01-12 15:28:52: pid 10081: LOG:  creating socket for sending heartbeat
2016-01-12 15:28:52: pid 10081: DETAIL:  set SO_REUSEPORT
2016-01-12 15:28:52: pid 10079: LOG:  set SO_REUSEPORT option to the socket
2016-01-12 15:28:52: pid 10079: LOG:  creating watchdog heartbeat receive socket.
2016-01-12 15:28:52: pid 10079: DETAIL:  set SO_REUSEPORT
2016-01-12 15:28:58: pid 10032: LOG:  new outbond connection to localhost:21104 
2016-01-12 15:29:01: pid 10077: LOG:  watchdog: lifecheck started
2016-01-12 15:29:11: pid 10028: LOG:  received fast shutdown request
2016-01-12 15:29:11: pid 10028: LOG:  shutdown request. closing listen socket
2016-01-12 15:29:11: pid 10078: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10085: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10109: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10080: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10082: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10083: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10084: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10086: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10087: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10088: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10089: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10090: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10091: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10092: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10093: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10094: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10095: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10111: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10102: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10103: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10101: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10104: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10100: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10032: LOG:  Watchdog is shutting down
2016-01-12 15:29:11: pid 10105: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10099: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10106: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10191: LOG:  watchdog: de-escalation started
2016-01-12 15:29:11: pid 10098: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10110: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10097: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10107: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10108: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10096: LOG:  child process received shutdown request signal 2
-------------- next part --------------
2016-01-12 15:28:51: pid 10119: LOG:  Backend status file /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/log/pgpool_status discarded
2016-01-12 15:28:51: pid 10119: LOG:  waiting for watchdog to initialize
2016-01-12 15:28:51: pid 10123: LOG:  setting the local watchdog node name to "Linux_tishii-CF-SX3HE4BP_11100"
2016-01-12 15:28:51: pid 10123: LOG:  watchdog cluster configured with 1 remote nodes
2016-01-12 15:28:51: pid 10123: LOG:  watchdog remote node:0 on localhost:21004
2016-01-12 15:28:51: pid 10123: LOG:  IPC socket path: "/tmp/.s.PGPOOLWD_CMD.21104"
2016-01-12 15:28:51: pid 10123: LOG:  new outbond connection to localhost:21004 
2016-01-12 15:28:51: pid 10123: LOG:  watchdog node state changed from [LOADING] to [INITIALIZING]
2016-01-12 15:28:52: pid 10123: LOG:  watchdog node state changed from [INITIALIZING] to [STANDBY]
2016-01-12 15:28:52: pid 10119: LOG:  watchdog process is initialized
2016-01-12 15:28:52: pid 10123: LOG:  new IPC connection is received 
2016-01-12 15:28:52: pid 10119: LOG:  Setting up socket for 127.0.0.1:11100
2016-01-12 15:28:52: pid 10129: LOG:  2 watchdog nodes are configured for lifecheck
2016-01-12 15:28:52: pid 10129: LOG:  watchdog nodes ID:0 Name:"Linux_tishii-CF-SX3HE4BP_11100"
2016-01-12 15:28:52: pid 10129: DETAIL:  Host:"localhost" WD Port:21104 pgpool-II port:11100
2016-01-12 15:28:52: pid 10129: LOG:  watchdog nodes ID:1 Name:"Linux_tishii-CF-SX3HE4BP_11000"
2016-01-12 15:28:52: pid 10129: DETAIL:  Host:"localhost" WD Port:21004 pgpool-II port:11000
2016-01-12 15:28:52: pid 10119: LOG:  pgpool-II successfully started. version 3.5beta1 (ekieboshi)
2016-01-12 15:28:53: pid 10134: LOG:  set SO_REUSEPORT option to the socket
2016-01-12 15:28:53: pid 10134: LOG:  creating socket for sending heartbeat
2016-01-12 15:28:53: pid 10134: DETAIL:  set SO_REUSEPORT
2016-01-12 15:28:53: pid 10132: LOG:  set SO_REUSEPORT option to the socket
2016-01-12 15:28:53: pid 10132: LOG:  creating watchdog heartbeat receive socket.
2016-01-12 15:28:53: pid 10132: DETAIL:  set SO_REUSEPORT
2016-01-12 15:28:58: pid 10123: LOG:  new watchdog node connection is received from "127.0.0.1:28554"
2016-01-12 15:29:02: pid 10129: LOG:  watchdog: lifecheck started
2016-01-12 15:29:11: pid 10123: LOG:  remote node "Linux_tishii-CF-SX3HE4BP_11000" is shutting down
2016-01-12 15:29:11: pid 10123: LOG:  watchdog cluster has lost the coordinator node
2016-01-12 15:29:11: pid 10123: LOG:  watchdog node state changed from [STANDBY] to [JOINING]
2016-01-12 15:29:11: pid 10123: LOG:  read from socket failed with error :"Connection reset by peer"
2016-01-12 15:29:11: pid 10123: LOG:  read from socket failed, remote end closed the connection
2016-01-12 15:29:16: pid 10123: LOG:  watchdog node state changed from [JOINING] to [INITIALIZING]
2016-01-12 15:29:17: pid 10123: LOG:  I am the only alive node in the watchdog cluster
2016-01-12 15:29:17: pid 10123: HINT:  skiping stand for coordinator state
2016-01-12 15:29:17: pid 10123: LOG:  watchdog node state changed from [INITIALIZING] to [MASTER]
2016-01-12 15:29:17: pid 10123: LOG:  I am announcing my self as master/coordinator watchdog node
2016-01-12 15:29:22: pid 10123: LOG:  I am the cluster leader node
2016-01-12 15:29:22: pid 10123: DETAIL:  our declare coordinator message is accepted by all nodes
2016-01-12 15:29:22: pid 10123: LOG:  I am the cluster leader node. Starting escalation process
2016-01-12 15:29:22: pid 10123: LOG:  escalation process started with PID:10212
2016-01-12 15:29:22: pid 10212: LOG:  watchdog: escalation started
2016-01-12 15:29:22: pid 10123: LOG:  watchdog escalation process with pid: 10212 exit with SUCCESS.
2016-01-12 15:29:22: pid 10123: LOG:  watchdog escalation process with pid: 10212 exits with status 0
2016-01-12 15:29:22: pid 10119: LOG:  received fast shutdown request
2016-01-12 15:29:22: pid 10119: LOG:  shutdown request. closing listen socket
2016-01-12 15:29:22: pid 10130: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10141: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10144: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10133: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10142: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10136: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10135: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10137: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10147: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10143: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10138: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10149: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10150: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10145: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10139: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10153: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10151: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10140: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10162: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10160: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10163: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10148: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10161: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10152: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10159: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10154: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10158: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10155: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10123: LOG:  Watchdog is shutting down
2016-01-12 15:29:22: pid 10157: LOG:  child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10215: LOG:  watchdog: de-escalation started
2016-01-12 15:29:22: pid 10156: LOG:  child process received shutdown request signal 2


More information about the pgpool-hackers mailing list