<div dir="ltr"><span style="font-size:12.8px">Hi,</span><div style="font-size:12.8px">I have two postgres servers (postgres 9.3.5), each with pgpool (3.6.6) installed on the same node.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">server1 192.168.15.55</div><div style="font-size:12.8px">server2 192.168.15.56</div><div style="font-size:12.8px">pgpool VIP 192.168.15.59</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><img src="cid:ii_j5440tpw2_15d420772c4ababb" width="544" height="408" class="gmail-CToWUd gmail-a6T" tabindex="0"><br></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><div style="font-size:12.8px"># psql -U postgres -h 192.168.15.59 -p 9999 -c &#39;show pool_nodes&#39;</div><div style="font-size:12.8px"> node_id |   hostname    | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay</div><div style="font-size:12.8px">---------+---------------+------+--------+-----------+---------+------------+-------------------+-------------------</div><div style="font-size:12.8px"> 0       | 192.168.15.55 | 5432 | up     | 0.500000  | standby | 0          | false             | 0</div><div style="font-size:12.8px"> 1       | 192.168.15.56 | 5432 | up     | 0.500000  | primary | 0          | true              | 0</div><div style="font-size:12.8px">(2 rows)</div></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">My question is about a disaster recovery scenario:</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">1. <b>The &quot;primary postgres&quot; and the &quot;master pgpool&quot; are on server1</b>. 
The pgpool VIP (192.168.15.59) is established on this node.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">2. server1 goes down due to a power failure (server1 is dead!).</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">3. pgpool on server2 is promoted (the VIP 192.168.15.59 is established there).</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><font color="#000000">4. pgpool performs the postgres failover: postgres on server2 is promoted to primary.</font></div><div style="font-size:12.8px"><font color="#000000"><br></font></div><div><font color="#000000" style="font-size:12.8px">5. When the faulty node is started up again, pgpool starts as STANDBY and postgres is recovered via the </font><font color="#000000"><span style="font-size:12.8px"><b>pcp_recovery_node</b> tool.</span></font></div><div style="font-size:12.8px"><font color="#000000"><br></font></div><div style="font-size:12.8px"><font color="#ff0000" style="background-color:rgb(255,255,0)"><b>The problem appears when performing this action multiple times (crashing the MASTER pgpool &amp; primary postgres).</b></font></div><div style="font-size:12.8px"><b><font color="#ff0000" style="background-color:rgb(255,255,0)"><br></font></b></div><div style="font-size:12.8px"><font color="#ff0000"><b style="background-color:rgb(255,255,0)">Example:</b></font></div><div style="font-size:12.8px"><font color="#ff0000"><b style="background-color:rgb(255,255,0)">server1 (STANDBY pgpool and secondary postgres)</b></font></div><div style="font-size:12.8px"><font color="#ff0000"><b style="background-color:rgb(255,255,0)">server2 (MASTER pgpool and primary postgres)</b></font></div><div style="font-size:12.8px"><font color="#ff0000"><b style="background-color:rgb(255,255,0)"><br></b></font></div><div style="font-size:12.8px"><b><font color="#ff0000" style="background-color:rgb(255,255,0)">The STANDBY pgpool node (server1) is not elected as MASTER, and there is no failback to the 
postgres.</font></b></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><b><font color="#ff0000">Only when the faulty node (server2) is started up again is the STANDBY pgpool (server1) elected as MASTER, with server2 becoming STANDBY, but no primary postgres is available due to the lack of a failback command!</font></b></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><b><br></b></div><div style="font-size:12.8px"><b>Logs are attached, as well as the pgpool.conf files of both servers.</b></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">Log of the STANDBY pgpool that was not elected as MASTER:</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><div style="font-size:12.8px"><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2014: WARNING:  checking setuid bit of if_up_cmd</div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2014: DETAIL:  ifup[/sbin/ifconfig] doesn&#39;t have setuid bit</div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2014: WARNING:  checking setuid bit of if_down_cmd</div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2014: DETAIL:  ifdown[/sbin/ifconfig] doesn&#39;t have setuid bit</div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2014: WARNING:  checking setuid bit of arping command</div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2014: DETAIL:  arping[/usr/sbin/arping] doesn&#39;t have setuid bit</div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2014: LOG:  waiting for watchdog to initialize</div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2016: LOG:  setting the local watchdog node name to &quot;<a href="http://1.1.1.55:9999">1.1.1.55:9999</a> Linux mgrdb-55&quot;</div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2016: LOG:  watchdog cluster is configured with 1 remote 
nodes</div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2016: LOG:  watchdog remote node:0 on <a href="http://1.1.1.56:9000">1.1.1.56:9000</a></div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2016: LOG:  interface monitoring is disabled in watchdog</div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2016: FATAL:  failed to create watchdog command server socket</div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2016: DETAIL:  bind on &quot;/tmp/.s.PGPOOLWD_CMD.9000&quot; failed with reason: &quot;Address already in use&quot;</div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2014: LOG:  watchdog child process with pid: 2016 exits with status 768</div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2014: FATAL:  watchdog child process exit with fatal error. exiting pgpool-II</div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2018: LOG:  setting the local watchdog node name to &quot;<a href="http://1.1.1.55:9999">1.1.1.55:9999</a> Linux mgrdb-55&quot;</div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2018: LOG:  watchdog cluster is configured with 1 remote nodes</div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2018: LOG:  watchdog remote node:0 on <a href="http://1.1.1.56:9000">1.1.1.56:9000</a></div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2018: LOG:  interface monitoring is disabled in watchdog</div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2018: LOG:  watchdog node state changed from [DEAD] to [LOADING]</div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2018: LOG:  new outbond connection to <a href="http://1.1.1.56:9000">1.1.1.56:9000</a></div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2018: LOG:  setting the remote node &quot;<a href="http://1.1.1.56:9999">1.1.1.56:9999</a> Linux mgrdb-56&quot; as watchdog cluster master</div><div style="font-size:12.8px">2017-09-21 09:53:08: pid 2018: LOG:  watchdog node state changed from [LOADING] 
to [INITIALIZING]</div><div style="font-size:12.8px">2017-09-21 09:53:09: pid 2018: LOG:  watchdog node state changed from [INITIALIZING] to [STANDBY]</div><div style="font-size:12.8px">2017-09-21 09:53:09: pid 2018: LOG:  successfully joined the watchdog cluster as standby node</div><div style="font-size:12.8px">2017-09-21 09:53:09: pid 2018: DETAIL:  our join coordinator request is accepted by cluster leader node &quot;<a href="http://1.1.1.56:9999">1.1.1.56:9999</a> Linux mgrdb-56&quot;</div><div style="font-size:12.8px">2017-09-21 09:53:10: pid 2018: LOG:  new watchdog node connection is received from &quot;<a href="http://1.1.1.56:40168">1.1.1.56:40168</a>&quot;</div><div style="font-size:12.8px">2017-09-21 09:53:10: pid 2018: LOG:  new node joined the cluster hostname:&quot;1.1.1.56&quot; port:9000 pgpool_port:9999</div><div style="font-size:12.8px">2017-09-21 09:55:01: pid 2018: LOG:  watchdog received the failover command from &quot;<a href="http://1.1.1.56:9999">1.1.1.56:9999</a> Linux mgrdb-56&quot;</div><div style="font-size:12.8px">2017-09-21 09:55:01: pid 2018: LOG:  received failback request for node_id: 0 from pid [2018] wd_failover_id [42]</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><b><font color="#ff0000"><br></font></b></div><div style="font-size:12.8px"><b><font color="#ff0000">When the faulty node (server2) is started again, 
this node (server1) is elected as MASTER:</font></b></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">2017-09-21 10:49:11: pid 2018: LOG:  new watchdog node connection is received from &quot;<a href="http://1.1.1.56:30363">1.1.1.56:30363</a>&quot;</div><div style="font-size:12.8px">2017-09-21 10:49:11: pid 2018: LOG:  new node joined the cluster hostname:&quot;1.1.1.56&quot; port:9000 pgpool_port:9999</div><div style="font-size:12.8px">2017-09-21 10:49:11: pid 2018: LOG:  failed to write watchdog packet to socket</div><div style="font-size:12.8px">2017-09-21 10:49:11: pid 2018: DETAIL:  Connection reset by peer</div><div style="font-size:12.8px">2017-09-21 10:49:11: pid 2018: WARNING:  the coordinator as per our record is not coordinator anymore</div><div style="font-size:12.8px">2017-09-21 10:49:11: pid 2018: DETAIL:  re-initializing the cluster</div><div style="font-size:12.8px">2017-09-21 10:49:11: pid 2018: LOG:  watchdog node state changed from [STANDBY] to [JOINING]</div><div style="font-size:12.8px">2017-09-21 10:49:11: pid 2018: LOG:  unassigning the remote node &quot;<a href="http://1.1.1.56:9999">1.1.1.56:9999</a> Linux mgrdb-56&quot; from watchdog cluster master</div><div style="font-size:12.8px">2017-09-21 10:49:11: pid 2018: LOG:  new outbond connection to <a href="http://1.1.1.56:9000">1.1.1.56:9000</a></div><div style="font-size:12.8px">2017-09-21 10:49:11: pid 2018: LOG:  watchdog node state changed from [JOINING] to [INITIALIZING]</div><div style="font-size:12.8px">2017-09-21 10:49:11: pid 2018: LOG:  remote node &quot;<a href="http://1.1.1.56:9999">1.1.1.56:9999</a> Linux mgrdb-56&quot; is shutting down</div><div style="font-size:12.8px">2017-09-21 10:49:12: pid 2018: LOG:  I am the only alive node in the watchdog cluster</div><div style="font-size:12.8px">2017-09-21 10:49:12: pid 2018: HINT:  skiping stand for coordinator state</div><div style="font-size:12.8px">2017-09-21 10:49:12: pid 2018: LOG:  watchdog node state changed from 
[INITIALIZING] to [MASTER]</div><div style="font-size:12.8px">2017-09-21 10:49:12: pid 2018: LOG:  I am announcing my self as master/coordinator watchdog node</div><div style="font-size:12.8px">2017-09-21 10:49:16: pid 2018: LOG:  I am the cluster leader node</div><div style="font-size:12.8px">2017-09-21 10:49:16: pid 2018: DETAIL:  our declare coordinator message is accepted by all nodes</div><div style="font-size:12.8px">2017-09-21 10:49:16: pid 2018: LOG:  setting the local node &quot;<a href="http://1.1.1.55:9999">1.1.1.55:9999</a> Linux mgrdb-55&quot; as watchdog cluster master</div><div style="font-size:12.8px">2017-09-21 10:49:16: pid 2018: LOG:  I am the cluster leader node. Starting escalation process</div><div style="font-size:12.8px">2017-09-21 10:49:16: pid 2018: LOG:  escalation process started with PID:10980</div><div style="font-size:12.8px">2017-09-21 10:49:16: pid 10980: LOG:  watchdog: escalation started</div><div style="font-size:12.8px">2017-09-21 10:49:20: pid 10980: LOG:  successfully acquired the delegate IP:&quot;192.168.15.59&quot;</div></div></div><div style="font-size:12.8px"><div style="font-size:12.8px">2017-09-21 10:49:20: pid 10980: DETAIL:  &#39;if_up_cmd&#39; returned with success</div><div style="font-size:12.8px">2017-09-21 10:49:20: pid 2018: LOG:  watchdog escalation process with pid: 10980 exit with SUCCESS.</div><div style="font-size:12.8px">2017-09-21 10:49:25: pid 2018: LOG:  new watchdog node connection is received from &quot;<a href="http://1.1.1.56:32923">1.1.1.56:32923</a>&quot;</div><div style="font-size:12.8px">2017-09-21 10:49:25: pid 2018: LOG:  new node joined the cluster hostname:&quot;1.1.1.56&quot; port:9000 pgpool_port:9999</div><div style="font-size:12.8px">2017-09-21 10:49:25: pid 2018: LOG:  new outbond connection to <a href="http://1.1.1.56:9000">1.1.1.56:9000</a></div><div style="font-size:12.8px">2017-09-21 10:49:26: pid 2018: LOG:  adding watchdog node &quot;<a 
href="http://1.1.1.56:9999">1.1.1.56:9999</a> Linux mgrdb-56&quot; to the standby list</div><div style="font-size:12.8px">2017-09-21 10:50:15: pid 2018: LOG:  remote node &quot;<a href="http://1.1.1.56:9999">1.1.1.56:9999</a> Linux mgrdb-56&quot; is shutting down</div><div style="font-size:12.8px">2017-09-21 10:50:15: pid 2018: LOG:  removing watchdog node &quot;<a href="http://1.1.1.56:9999">1.1.1.56:9999</a> Linux mgrdb-56&quot; from the standby list</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><br></div></div><div style="font-size:12.8px">Thanks,</div><div style="font-size:12.8px">cohavisi </div></div>