[pgpool-hackers: 1327] Re: [pgpool-committers: 2927] pgpool: Fixing watchdog handling of local network failure events
Tatsuo Ishii
ishii at postgresql.org
Tue Jan 12 15:53:53 JST 2016
Usama,
After the commit, I got regression failure with 004.watchdog on my
Ubuntu laptop. log/004.watchdog and
tests/004.watchdog/testdir/{master,standby}/log/pgpool.log are
attached.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
From: Muhammad Usama <m.usama at gmail.com>
Subject: [pgpool-committers: 2927] pgpool: Fixing watchdog handling of local network failure events
Date: Mon, 11 Jan 2016 14:28:28 +0000
Message-ID: <E1aIdSO-0001pH-SL at gothos.postgresql.org>
> Fixing watchdog handling of local network failure events
>
> WD_EVENT_NW_IP_IS_REMOVED and WD_EVENT_LOCAL_NODE_LOST must be handled in
> the main state machine function to make sure all watchdog states should act
> properly on these events.
>
> Branch
> ------
> master
>
> Details
> -------
> http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=6ac8844288d21dc598d90abef3324c7cd9f9ac53
>
> Modified Files
> --------------
> src/watchdog/watchdog.c | 63 ++++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 49 insertions(+), 14 deletions(-)
>
> _______________________________________________
> pgpool-committers mailing list
> pgpool-committers at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-committers
-------------- next part --------------
creating master pgpool...Satrting set up in raw mode
creating startall and shutdownall
creating failover script
creating database cluster /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/data0...done.
update postgreql.conf
creating pgpool_remote_start
creating basebackup.sh
start all
waiting for pgpool-II coming up...done.
create extensions
Pager usage is off.
node_id | hostname | port | status | lb_weight | role | select_cnt
---------+----------+-------+--------+-----------+--------+------------
0 | /tmp | 11002 | 2 | 1.000000 | master | 0
(1 row)
shutdown all
pgpool-II setting for raw mode is done.
To start the whole system, use /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/startall.
To shutdown the whole system, use /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/shutdownall.
pcp command user name is "t-ishii", password is "t-ishii".
Each PostgreSQL, pgpool-II and pcp port is as follows:
#1 port is 11002
pgpool port is 11000
pcp port is 11001
The info above is in README.port.
master setup done.
waiting for server to start....10019 2016-01-12 15:28:35 JST LOG: redirecting log output to logging collector process
10019 2016-01-12 15:28:35 JST HINT: Future log output will appear in directory "pg_log".
done
server started
creating standby pgpool...Waiting for the pgpool master...
Master brought up successfully.
Waiting for the standby to join cluster...
[check] 1 times
[check] 2 times
[check] 3 times
[check] 4 times
[check] 5 times
[check] 6 times
[check] 7 times
[check] 8 times
[check] 9 times
[check] 10 times
2016-01-12 15:29:11: pid 10190: LOG: stop request sent to pgpool. waiting for termination...
.done.
Checking if the Standby pgpool-II detected the master shutdown...
Master shutdown detected.
Checking if the Standby pgpool-II takes over the master responsibility...
[check] 1 times
[check] 2 times
[check] 3 times
[check] 4 times
[check] 5 times
Standby successfully became the new master.
2016-01-12 15:29:22: pid 10214: LOG: stop request sent to pgpool. waiting for termination...
.done.
2016-01-12 15:29:23: pid 10218: FATAL: could not read pid file
waiting for server to shut down.... done
server stopped
3 out of 4 successfull
-------------- next part --------------
2016-01-12 15:28:36: pid 10028: LOG: Backend status file /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/log/pgpool_status discarded
2016-01-12 15:28:36: pid 10028: LOG: waiting for watchdog to initialize
2016-01-12 15:28:36: pid 10032: LOG: setting the local watchdog node name to "Linux_tishii-CF-SX3HE4BP_11000"
2016-01-12 15:28:36: pid 10032: LOG: watchdog cluster configured with 1 remote nodes
2016-01-12 15:28:36: pid 10032: LOG: watchdog remote node:0 on localhost:21104
2016-01-12 15:28:36: pid 10032: LOG: IPC socket path: "/tmp/.s.PGPOOLWD_CMD.21004"
2016-01-12 15:28:40: pid 10032: LOG: watchdog node state changed from [LOADING] to [JOINING]
2016-01-12 15:28:45: pid 10032: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
2016-01-12 15:28:46: pid 10032: LOG: I am the only alive node in the watchdog cluster
2016-01-12 15:28:46: pid 10032: HINT: skiping stand for coordinator state
2016-01-12 15:28:46: pid 10032: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2016-01-12 15:28:46: pid 10032: LOG: I am announcing my self as master/coordinator watchdog node
2016-01-12 15:28:51: pid 10032: LOG: I am the cluster leader node
2016-01-12 15:28:51: pid 10032: DETAIL: our declare coordinator message is accepted by all nodes
2016-01-12 15:28:51: pid 10032: LOG: I am the cluster leader node. Starting escalation process
2016-01-12 15:28:51: pid 10028: LOG: watchdog process is initialized
2016-01-12 15:28:51: pid 10032: LOG: escalation process started with PID:10076
2016-01-12 15:28:51: pid 10032: LOG: new IPC connection is received
2016-01-12 15:28:51: pid 10028: LOG: Setting up socket for 127.0.0.1:11000
2016-01-12 15:28:51: pid 10077: LOG: 2 watchdog nodes are configured for lifecheck
2016-01-12 15:28:51: pid 10077: LOG: watchdog nodes ID:0 Name:"Linux_tishii-CF-SX3HE4BP_11000"
2016-01-12 15:28:51: pid 10077: DETAIL: Host:"localhost" WD Port:21004 pgpool-II port:11000
2016-01-12 15:28:51: pid 10077: LOG: watchdog nodes ID:1 Name:""
2016-01-12 15:28:51: pid 10077: DETAIL: Host:"localhost" WD Port:21104 pgpool-II port:11100
2016-01-12 15:28:51: pid 10076: LOG: watchdog: escalation started
2016-01-12 15:28:51: pid 10032: LOG: watchdog escalation process with pid: 10076 exit with SUCCESS.
2016-01-12 15:28:51: pid 10032: LOG: watchdog escalation process with pid: 10076 exits with status 0
2016-01-12 15:28:51: pid 10028: LOG: pgpool-II successfully started. version 3.5beta1 (ekieboshi)
2016-01-12 15:28:51: pid 10109: LOG: DB node id: 0 backend pid: 10116 statement: DISCARD ALL
2016-01-12 15:28:51: pid 10032: LOG: new watchdog node connection is received from "127.0.0.1:17819"
2016-01-12 15:28:52: pid 10081: LOG: set SO_REUSEPORT option to the socket
2016-01-12 15:28:52: pid 10081: LOG: creating socket for sending heartbeat
2016-01-12 15:28:52: pid 10081: DETAIL: set SO_REUSEPORT
2016-01-12 15:28:52: pid 10079: LOG: set SO_REUSEPORT option to the socket
2016-01-12 15:28:52: pid 10079: LOG: creating watchdog heartbeat receive socket.
2016-01-12 15:28:52: pid 10079: DETAIL: set SO_REUSEPORT
2016-01-12 15:28:58: pid 10032: LOG: new outbond connection to localhost:21104
2016-01-12 15:29:01: pid 10077: LOG: watchdog: lifecheck started
2016-01-12 15:29:11: pid 10028: LOG: received fast shutdown request
2016-01-12 15:29:11: pid 10028: LOG: shutdown request. closing listen socket
2016-01-12 15:29:11: pid 10078: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10085: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10109: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10080: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10082: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10083: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10084: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10086: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10087: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10088: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10089: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10090: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10091: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10092: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10093: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10094: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10095: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10111: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10102: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10103: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10101: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10104: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10100: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10032: LOG: Watchdog is shutting down
2016-01-12 15:29:11: pid 10105: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10099: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10106: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10191: LOG: watchdog: de-escalation started
2016-01-12 15:29:11: pid 10098: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10110: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10097: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10107: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10108: LOG: child process received shutdown request signal 2
2016-01-12 15:29:11: pid 10096: LOG: child process received shutdown request signal 2
-------------- next part --------------
2016-01-12 15:28:51: pid 10119: LOG: Backend status file /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/log/pgpool_status discarded
2016-01-12 15:28:51: pid 10119: LOG: waiting for watchdog to initialize
2016-01-12 15:28:51: pid 10123: LOG: setting the local watchdog node name to "Linux_tishii-CF-SX3HE4BP_11100"
2016-01-12 15:28:51: pid 10123: LOG: watchdog cluster configured with 1 remote nodes
2016-01-12 15:28:51: pid 10123: LOG: watchdog remote node:0 on localhost:21004
2016-01-12 15:28:51: pid 10123: LOG: IPC socket path: "/tmp/.s.PGPOOLWD_CMD.21104"
2016-01-12 15:28:51: pid 10123: LOG: new outbond connection to localhost:21004
2016-01-12 15:28:51: pid 10123: LOG: watchdog node state changed from [LOADING] to [INITIALIZING]
2016-01-12 15:28:52: pid 10123: LOG: watchdog node state changed from [INITIALIZING] to [STANDBY]
2016-01-12 15:28:52: pid 10119: LOG: watchdog process is initialized
2016-01-12 15:28:52: pid 10123: LOG: new IPC connection is received
2016-01-12 15:28:52: pid 10119: LOG: Setting up socket for 127.0.0.1:11100
2016-01-12 15:28:52: pid 10129: LOG: 2 watchdog nodes are configured for lifecheck
2016-01-12 15:28:52: pid 10129: LOG: watchdog nodes ID:0 Name:"Linux_tishii-CF-SX3HE4BP_11100"
2016-01-12 15:28:52: pid 10129: DETAIL: Host:"localhost" WD Port:21104 pgpool-II port:11100
2016-01-12 15:28:52: pid 10129: LOG: watchdog nodes ID:1 Name:"Linux_tishii-CF-SX3HE4BP_11000"
2016-01-12 15:28:52: pid 10129: DETAIL: Host:"localhost" WD Port:21004 pgpool-II port:11000
2016-01-12 15:28:52: pid 10119: LOG: pgpool-II successfully started. version 3.5beta1 (ekieboshi)
2016-01-12 15:28:53: pid 10134: LOG: set SO_REUSEPORT option to the socket
2016-01-12 15:28:53: pid 10134: LOG: creating socket for sending heartbeat
2016-01-12 15:28:53: pid 10134: DETAIL: set SO_REUSEPORT
2016-01-12 15:28:53: pid 10132: LOG: set SO_REUSEPORT option to the socket
2016-01-12 15:28:53: pid 10132: LOG: creating watchdog heartbeat receive socket.
2016-01-12 15:28:53: pid 10132: DETAIL: set SO_REUSEPORT
2016-01-12 15:28:58: pid 10123: LOG: new watchdog node connection is received from "127.0.0.1:28554"
2016-01-12 15:29:02: pid 10129: LOG: watchdog: lifecheck started
2016-01-12 15:29:11: pid 10123: LOG: remote node "Linux_tishii-CF-SX3HE4BP_11000" is shutting down
2016-01-12 15:29:11: pid 10123: LOG: watchdog cluster has lost the coordinator node
2016-01-12 15:29:11: pid 10123: LOG: watchdog node state changed from [STANDBY] to [JOINING]
2016-01-12 15:29:11: pid 10123: LOG: read from socket failed with error :"Connection reset by peer"
2016-01-12 15:29:11: pid 10123: LOG: read from socket failed, remote end closed the connection
2016-01-12 15:29:16: pid 10123: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
2016-01-12 15:29:17: pid 10123: LOG: I am the only alive node in the watchdog cluster
2016-01-12 15:29:17: pid 10123: HINT: skiping stand for coordinator state
2016-01-12 15:29:17: pid 10123: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2016-01-12 15:29:17: pid 10123: LOG: I am announcing my self as master/coordinator watchdog node
2016-01-12 15:29:22: pid 10123: LOG: I am the cluster leader node
2016-01-12 15:29:22: pid 10123: DETAIL: our declare coordinator message is accepted by all nodes
2016-01-12 15:29:22: pid 10123: LOG: I am the cluster leader node. Starting escalation process
2016-01-12 15:29:22: pid 10123: LOG: escalation process started with PID:10212
2016-01-12 15:29:22: pid 10212: LOG: watchdog: escalation started
2016-01-12 15:29:22: pid 10123: LOG: watchdog escalation process with pid: 10212 exit with SUCCESS.
2016-01-12 15:29:22: pid 10123: LOG: watchdog escalation process with pid: 10212 exits with status 0
2016-01-12 15:29:22: pid 10119: LOG: received fast shutdown request
2016-01-12 15:29:22: pid 10119: LOG: shutdown request. closing listen socket
2016-01-12 15:29:22: pid 10130: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10141: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10144: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10133: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10142: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10136: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10135: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10137: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10147: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10143: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10138: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10149: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10150: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10145: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10139: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10153: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10151: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10140: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10162: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10160: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10163: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10148: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10161: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10152: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10159: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10154: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10158: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10155: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10123: LOG: Watchdog is shutting down
2016-01-12 15:29:22: pid 10157: LOG: child process received shutdown request signal 2
2016-01-12 15:29:22: pid 10215: LOG: watchdog: de-escalation started
2016-01-12 15:29:22: pid 10156: LOG: child process received shutdown request signal 2
More information about the pgpool-hackers
mailing list