Hi team,

We reproduced this problem on our side again. Please look at the following
operations and their logs, in particular the line we marked with
"<-- *** This should be slave2 ***".

-bash-3.2$ LANG=C date ; psql -h 10.1.1.187 -p 9999 -c "show pool_nodes"
Thu Apr 25 21:03:54 JST 2013
 node_id | hostname | port | status | lb_weight |  role
---------+----------+------+--------+-----------+---------
 0       | slave1   | 5444 | 2      | 0.333333  | primary
 1       | slave2   | 5444 | 2      | 0.333333  | standby
 2       | slave3   | 5444 | 2      | 0.333333  | standby
(3 rows)

Then we executed the following command on slave1.

-bash-3.2$ pg_ctl -D /opt/PostgresPlus/9.2AS/data -m immediate stop

We added "bash -x" to SRHS_follow_master.sh to trace its execution, and got
the log files below.

/tmp/pgpool.log
---
+ PG_CTL=/opt/PostgresPlus/9.2AS/bin/pg_ctl
+ FAILED_NODE_ID=0
+ FAILED_NODE_NAME=slave1
+ FAILED_NODE_PORT=5444
+ FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
+ NEW_MASTER_ID=1
+ OLD_MASTER_ID=0
+ NEW_MASTER_NAME=slave2
+ OLD_PRIMARY_ID=0
+ NEW_MASTER_DATA=/opt/PostgresPlus/9.2AS/data
+ TRIGGER=/tmp/trigger_file1
+ IDENTITY_FILE=/root/.ssh/id_rsa
+ '[' 0 = 0 ']'
++ date
+ echo $'2013\345\271\264' $'4\346\234\210' $'25\346\227\245' $'\346\234\250\346\233\234\346\227\245' 21:06:25 JST
+ echo FAILED_NODE_ID =0
+ echo FAILED_NODE_NAME=slave1
+ echo FAILED_NODE_PORT=5444
+ echo FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
+ echo NEW_MASTER_ID =1
+ echo OLD_MASTER_ID =0
+ echo NEW_MASTER_NAME =slave2
+ echo OLD_PRIMARY_ID =0
+ echo NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
+ ssh -T enterprisedb@slave2 /opt/PostgresPlus/9.2AS/bin/pg_ctl promote -D /opt/PostgresPlus/9.2AS/data
server promoting
+ echo 0
+ exit 0
+ LOGFILE=/var/log/pgpool/follow_master.log
+ PCP_ATTACH_NODE=/opt/PostgresPlus/9.2AS/bin/pcp_attach_node
+ FOLLOW_CMD=/opt/PostgresPlus/9.2AS/data/SRHS_follow_sub.sh
+ PG_CTL=/opt/PostgresPlus/9.2AS/bin/pg_ctl
+ IDENTITY_FILE=/root/.ssh/id_rsa
+ TIMEOUT=10
+ HOST=localhost
+ PORT=9898
+ USERID=enterprisedb
+ PASSWD=redhat
+ FAILED_NODE_ID=0
+ FAILED_NODE_NAME=slave1
+ FAILED_NODE_PORT=5444
+ FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
+ NEW_MASTER_ID=1
+ OLD_MASTER_ID=0
+ NEW_MASTER_NAME=slave2
+ OLD_PRIMARY_ID=0
+ NEW_MASTER_DATA=/opt/PostgresPlus/9.2AS/data
++ date
+ echo $'2013\345\271\264' $'4\346\234\210' $'25\346\227\245' $'\346\234\250\346\233\234\346\227\245' 21:06:26 JST
+ echo FAILED_NODE_ID =0
+ echo FAILED_NODE_NAME=slave1
+ echo FAILED_NODE_PORT=5444
+ echo FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
+ echo NEW_MASTER_ID =1
+ echo OLD_MASTER_ID =0
+ echo NEW_MASTER_NAME =slave2
+ echo OLD_PRIMARY_ID =0
+ echo NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
+ NEW_MASTER_HOST=slave2
+ SLAVE_HOST=slave1
+ SLAVE_BASEDIR=/opt/PostgresPlus/9.2AS/data
+ NODE_ID=0
++ date '+%Y/%m/%d %H:%M:%S'
+ echo 2013/04/25 21:06:26 START '(/opt/PostgresPlus/9.2AS/data/SRHS_follow_master.sh' 0 slave1 5444 /opt/PostgresPlus/9.2AS/data 1 0 slave2 0 '/opt/PostgresPlus/9.2AS/data)'
+ ssh -T enterprisedb@slave2 /opt/PostgresPlus/9.2AS/bin/pg_ctl promote -D /opt/PostgresPlus/9.2AS/data
pg_ctl: cannot promote server; server is not in standby mode
++ ssh -i /root/.ssh/id_rsa enterprisedb@slave1 /opt/PostgresPlus/9.2AS/data/SRHS_follow_sub.sh slave2 slave1 /opt/PostgresPlus/9.2AS/data
+ MSG='/opt/PostgresPlus/9.2AS/data/SRHS_follow_sub.sh: line 22: /var/log/pgpool/follow_master.log: No such file or directory
touch: cannot touch `SRHS_follow_sub.sh.lock'\'': Permission denied
/opt/PostgresPlus/9.2AS/data/SRHS_follow_sub.sh: line 65: /var/log/pgpool/follow_master.log: No such file or directory
waiting for server to start.... done
server started
/opt/PostgresPlus/9.2AS/data/SRHS_follow_sub.sh: line 75: /var/log/pgpool/follow_master.log: No such file or directory'
+ RC=0
+ '[' 0 '!=' 0 ']'
++ date '+%Y/%m/%d %H:%M:%S'
+ echo 2013/04/25 21:06:28 PostgreSQL 'Restarted(slave1)'
+ /opt/PostgresPlus/9.2AS/bin/pcp_attach_node 10 localhost 9898 enterprisedb redhat 0
+ RC=0
+ '[' 0 '!=' 0 ']'
++ date '+%Y/%m/%d %H:%M:%S'
+ echo 2013/04/25 21:06:28 END '(/opt/PostgresPlus/9.2AS/data/SRHS_follow_master.sh)'
+ exit 0
+ LOGFILE=/var/log/pgpool/follow_master.log
+ PCP_ATTACH_NODE=/opt/PostgresPlus/9.2AS/bin/pcp_attach_node
+ FOLLOW_CMD=/opt/PostgresPlus/9.2AS/data/SRHS_follow_sub.sh
+ PG_CTL=/opt/PostgresPlus/9.2AS/bin/pg_ctl
+ IDENTITY_FILE=/root/.ssh/id_rsa
+ TIMEOUT=10
+ HOST=localhost
+ PORT=9898
+ USERID=enterprisedb
+ PASSWD=redhat
+ FAILED_NODE_ID=2
+ FAILED_NODE_NAME=slave3
+ FAILED_NODE_PORT=5444
+ FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
+ NEW_MASTER_ID=1
+ OLD_MASTER_ID=0
+ NEW_MASTER_NAME=slave1
+ OLD_PRIMARY_ID=0
+ NEW_MASTER_DATA=/opt/PostgresPlus/9.2AS/data
++ date
+ echo $'2013\345\271\264' $'4\346\234\210' $'25\346\227\245' $'\346\234\250\346\233\234\346\227\245' 21:06:28 JST
+ echo FAILED_NODE_ID =2
+ echo FAILED_NODE_NAME=slave3
+ echo FAILED_NODE_PORT=5444
+ echo FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
+ echo NEW_MASTER_ID =1
+ echo OLD_MASTER_ID =0
+ echo NEW_MASTER_NAME =slave1
+ echo OLD_PRIMARY_ID =0
+ echo NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
+ NEW_MASTER_HOST=slave1
+ SLAVE_HOST=slave3
+ SLAVE_BASEDIR=/opt/PostgresPlus/9.2AS/data
+ NODE_ID=2
++ date '+%Y/%m/%d %H:%M:%S'
+ echo 2013/04/25 21:06:28 START '(/opt/PostgresPlus/9.2AS/data/SRHS_follow_master.sh' 2 slave3 5444 /opt/PostgresPlus/9.2AS/data 1 0 slave1 0 '/opt/PostgresPlus/9.2AS/data)'
+ ssh -T enterprisedb@slave1 /opt/PostgresPlus/9.2AS/bin/pg_ctl promote -D /opt/PostgresPlus/9.2AS/data
server promoting
++ ssh -i /root/.ssh/id_rsa enterprisedb@slave3 /opt/PostgresPlus/9.2AS/data/SRHS_follow_sub.sh slave1 slave3 /opt/PostgresPlus/9.2AS/data
+ MSG='/opt/PostgresPlus/9.2AS/data/SRHS_follow_sub.sh: line 22: /var/log/pgpool/follow_master.log: No such file or directory
touch: cannot touch `SRHS_follow_sub.sh.lock'\'': Permission denied
/opt/PostgresPlus/9.2AS/data/SRHS_follow_sub.sh: line 65: /var/log/pgpool/follow_master.log: No such file or directory
waiting for server to start.... done
server started
/opt/PostgresPlus/9.2AS/data/SRHS_follow_sub.sh: line 75: /var/log/pgpool/follow_master.log: No such file or directory'
+ RC=0
+ '[' 0 '!=' 0 ']'
++ date '+%Y/%m/%d %H:%M:%S'
+ echo 2013/04/25 21:06:30 PostgreSQL 'Restarted(slave3)'
+ /opt/PostgresPlus/9.2AS/bin/pcp_attach_node 10 localhost 9898 enterprisedb redhat 2
+ RC=0
+ '[' 0 '!=' 0 ']'
++ date '+%Y/%m/%d %H:%M:%S'
+ echo 2013/04/25 21:06:30 END '(/opt/PostgresPlus/9.2AS/data/SRHS_follow_master.sh)'
+ exit 0
---

/var/log/messages
---
Apr 25 21:06:25 master1 pgpool[6647]: connect_inet_domain_socket: connect() failed: Connection refused
Apr 25 21:06:25 master1 pgpool[6647]: make_persistent_db_connection: connection to slave1(5444) failed
Apr 25 21:06:25 master1 pgpool[6647]: connect_inet_domain_socket: connect() failed: Connection refused
Apr 25 21:06:25 master1 pgpool[6647]: make_persistent_db_connection: connection to slave1(5444) failed
Apr 25 21:06:25 master1 pgpool[6647]: health check failed. 0 th host slave1 at port 5444 is down
Apr 25 21:06:25 master1 pgpool[6647]: set 0 th backend down status
Apr 25 21:06:25 master1 pgpool[6647]: starting degeneration. shutdown host slave1(5444)
Apr 25 21:06:25 master1 pgpool[6647]: Restart all children
Apr 25 21:06:25 master1 pgpool[6647]: execute command: /opt/PostgresPlus/9.2AS/data/SRHS_failover.sh 0 "slave1" 5444 /opt/PostgresPlus/9.2AS/data 1 0 "slave2" 0 /opt/PostgresPlus/9.2AS/data
Apr 25 21:06:25 master1 pgpool[6647]: find_primary_node_repeatedly: waiting for finding a primary node
Apr 25 21:06:26 master1 pgpool[6647]: find_primary_node: primary node id is 1
Apr 25 21:06:26 master1 pgpool[6647]: starting follow degeneration. shutdown host slave1(5444)
Apr 25 21:06:26 master1 pgpool[6647]: starting follow degeneration. shutdown host slave3(5444)
Apr 25 21:06:26 master1 pgpool[6647]: failover: 2 follow backends have been degenerated
Apr 25 21:06:26 master1 pgpool[6745]: start triggering follow command.
Apr 25 21:06:26 master1 pgpool[6745]: execute command: /opt/PostgresPlus/9.2AS/data/SRHS_follow_master.sh 0 "slave1" 5444 /opt/PostgresPlus/9.2AS/data 1 0 "slave2" 0 /opt/PostgresPlus/9.2AS/data
Apr 25 21:06:26 master1 pgpool[6647]: failover: set new primary node: 1
Apr 25 21:06:26 master1 pgpool[6647]: failover: set new master node: 1
Apr 25 21:06:26 master1 pgpool[6681]: worker process received restart request
Apr 25 21:06:26 master1 pgpool[6647]: failover done. shutdown host slave1(5444)
Apr 25 21:06:27 master1 pgpool[6647]: worker child 6681 exits with status 256
Apr 25 21:06:27 master1 pgpool[6647]: fork a new worker child pid 6785
Apr 25 21:06:27 master1 pgpool[6680]: pcp child process received restart request
Apr 25 21:06:27 master1 pgpool[6647]: PCP child 6680 exits with status 256
Apr 25 21:06:27 master1 pgpool[6647]: fork a new PCP child pid 6786
Apr 25 21:06:28 master1 pgpool[6786]: send_failback_request: fail back 0 th node request from pid 6786
Apr 25 21:06:28 master1 pgpool[6647]: starting fail back. reconnect host slave1(5444)
Apr 25 21:06:28 master1 pgpool[6647]: Do not restart children because we are failbacking node id 0 hostslave1 port:5444 and we are in streaming replication mode
Apr 25 21:06:28 master1 pgpool[6647]: find_primary_node_repeatedly: waiting for finding a primary node
Apr 25 21:06:28 master1 pgpool[6745]: execute command: /opt/PostgresPlus/9.2AS/data/SRHS_follow_master.sh 2 "slave3" 5444 /opt/PostgresPlus/9.2AS/data 1 0 "slave1" 0 /opt/PostgresPlus/9.2AS/data
Apr 25 21:06:28 master1 pgpool[6647]: find_primary_node: primary node id is 1
Apr 25 21:06:28 master1 pgpool[6647]: failover: set new primary node: 1
Apr 25 21:06:28 master1 pgpool[6647]: failover: set new master node: 0
Apr 25 21:06:28 master1 pgpool[6647]: failback done. reconnect host slave1(5444)
Apr 25 21:06:28 master1 pgpool[6785]: worker process received restart request
Apr 25 21:06:29 master1 pgpool[6647]: worker child 6785 exits with status 256
Apr 25 21:06:29 master1 pgpool[6647]: fork a new worker child pid 6796
Apr 25 21:06:29 master1 pgpool[6786]: pcp child process received restart request
Apr 25 21:06:29 master1 pgpool[6647]: PCP child 6786 exits with status 256
Apr 25 21:06:29 master1 pgpool[6647]: fork a new PCP child pid 6797
Apr 25 21:06:30 master1 pgpool[6797]: send_failback_request: fail back 2 th node request from pid 6797
Apr 25 21:06:30 master1 pgpool[6647]: starting fail back. reconnect host slave3(5444)
Apr 25 21:06:30 master1 pgpool[6647]: Do not restart children because we are failbacking node id 2 hostslave3 port:5444 and we are in streaming replication mode
Apr 25 21:06:30 master1 pgpool[6647]: find_primary_node_repeatedly: waiting for finding a primary node
Apr 25 21:06:30 master1 pgpool[6647]: find_primary_node: primary node id is 0
Apr 25 21:06:30 master1 pgpool[6647]: failover: set new primary node: 0
Apr 25 21:06:30 master1 pgpool[6647]: failover: set new master node: 0
Apr 25 21:06:30 master1 pgpool[6647]: failback done. reconnect host slave3(5444)
Apr 25 21:06:30 master1 pgpool[6796]: worker process received restart request
Apr 25 21:06:31 master1 pgpool[6647]: worker child 6796 exits with status 256
Apr 25 21:06:31 master1 pgpool[6647]: fork a new worker child pid 6801
Apr 25 21:06:31 master1 pgpool[6797]: pcp child process received restart request
Apr 25 21:06:31 master1 pgpool[6647]: PCP child 6797 exits with status 256
Apr 25 21:06:31 master1 pgpool[6647]: fork a new PCP child pid 6802
---

-bash-3.2$ cat /tmp/hoge.hogege
Thu Apr 25 21:06:25 JST 2013
FAILED_NODE_ID =0
FAILED_NODE_NAME=slave1
FAILED_NODE_PORT=5444
FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
NEW_MASTER_ID =1
OLD_MASTER_ID =0
NEW_MASTER_NAME =slave2
OLD_PRIMARY_ID =0
NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
0
-bash-3.2$ cat /tmp/fuga.fugaga
Thu Apr 25 21:06:26 JST 2013
FAILED_NODE_ID =0
FAILED_NODE_NAME=slave1
FAILED_NODE_PORT=5444
FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
NEW_MASTER_ID =1
OLD_MASTER_ID =0
NEW_MASTER_NAME =slave2
OLD_PRIMARY_ID =0
NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
Thu Apr 25 21:06:28 JST 2013
FAILED_NODE_ID =2
FAILED_NODE_NAME=slave3
FAILED_NODE_PORT=5444
FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
NEW_MASTER_ID =1
OLD_MASTER_ID =0
NEW_MASTER_NAME =slave1 <-- *** This should be slave2 ***
OLD_PRIMARY_ID =0
NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
-bash-3.2$

Hope this helps.
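
P.S. In case it is useful for checking other runs, here is a minimal sketch of how the mismatch could be spotted automatically in a debug file such as /tmp/fuga.fugaga. It assumes the file has one "NEW_MASTER_NAME =<host>" line per invocation of the follow-master script, as in the output above, and it treats the first invocation's value as the expected one. The function name check_new_master is hypothetical and not part of pgpool-II.

```shell
#!/bin/bash
# Hypothetical helper (not part of pgpool-II): print the NEW_MASTER_NAME value
# recorded for each invocation and flag any value that differs from the first.
# Reads the named files, or standard input when no file is given.
check_new_master() {
    awk '
        BEGIN { bad = 0 }
        /^NEW_MASTER_NAME/ {
            # Lines look like "NEW_MASTER_NAME =slave2"; keep the text after "=".
            name = $0
            sub(/^[^=]*=[ \t]*/, "", name)
            # Drop anything after the host name (for example a trailing note).
            sub(/[ \t].*$/, "", name)
            n++
            if (n == 1) first = name
            status = (name == first) ? "ok" : "MISMATCH (expected " first ")"
            printf "invocation %d: NEW_MASTER_NAME=%s %s\n", n, name, status
            if (name != first) bad = 1
        }
        # Exit status 1 signals that at least one mismatch was seen.
        END { exit bad }
    ' "$@"
}
```

Usage would be "check_new_master /tmp/fuga.fugaga". With the contents shown above, this should report the second invocation (the one for slave3) as a mismatch, because it recorded slave1 where the first invocation recorded slave2.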