<div dir="ltr">Hi,<div>Even after setting the setuid bit on the &#39;ifconfig&#39; and &#39;arping&#39; commands, when I reboot the primary node the 2nd node gets promoted (takes the VIP), and the faulty node still boots up as primary:</div><div><br></div><div><br></div><div><div>tail: /var/log/pgpool.log: file truncated</div><div>2016-08-11 20:45:40: pid 1789: LOG:  reading status file: 0 th backend is set to down status</div><div>2016-08-11 20:45:40: pid 1789: LOG:  waiting for watchdog to initialize</div><div>2016-08-11 20:45:40: pid 1795: LOG:  setting the local watchdog node name to &quot;Linux_mgrdb84_9999&quot;</div><div>2016-08-11 20:45:40: pid 1795: LOG:  watchdog cluster configured with 1 remote nodes</div><div>2016-08-11 20:45:40: pid 1795: LOG:  watchdog remote node:0 on <a href="http://1.1.1.85:9000">1.1.1.85:9000</a></div><div>2016-08-11 20:45:40: pid 1795: LOG:  interface monitoring is disabled in watchdog</div><div><b>2016-08-11 20:45:40: pid 1795: LOG:  IPC socket path: &quot;/tmp/.s.PGPOOLWD_CMD.9000&quot;</b></div><div><b>2016-08-11 20:45:45: pid 1795: LOG:  watchdog node state changed from [LOADING] to [JOINING]</b></div><div><b>2016-08-11 20:45:50: pid 1795: LOG:  watchdog node state changed from [JOINING] to [INITIALIZING]</b></div><div><b>2016-08-11 20:45:51: pid 1795: LOG:  I am the only alive node in the watchdog cluster</b></div><div><b>2016-08-11 20:45:51: pid 1795: HINT:  skiping stand for coordinator state</b></div><div><b>2016-08-11 20:45:51: pid 1795: LOG:  watchdog node state changed from [INITIALIZING] to [MASTER]</b></div><div><b>2016-08-11 20:45:51: pid 1795: LOG:  I am announcing my self as master/coordinator watchdog node</b></div><div><b>2016-08-11 20:45:54: pid 1795: LOG:  new outbond connection to <a href="http://1.1.1.85:9000">1.1.1.85:9000</a></b></div><div>2016-08-11 20:45:56: pid 1795: LOG:  I am the cluster leader node</div><div>2016-08-11 20:45:56: pid 1795: DETAIL:  our declare coordinator message is accepted by all 
nodes</div><div>2016-08-11 20:45:56: pid 1795: LOG:  I am the cluster leader node. Starting escalation process</div><div>2016-08-11 20:45:56: pid 1789: LOG:  watchdog process is initialized</div><div>2016-08-11 20:45:56: pid 1795: LOG:  escalation process started with PID:2130</div><div>2016-08-11 20:45:56: pid 2130: LOG:  watchdog: escalation started</div><div>2016-08-11 20:45:56: pid 1789: LOG:  Setting up socket for <a href="http://0.0.0.0:9999">0.0.0.0:9999</a></div><div>2016-08-11 20:45:56: pid 1789: LOG:  Setting up socket for :::9999</div><div>2016-08-11 20:45:56: pid 2131: LOG:  2 watchdog nodes are configured for lifecheck</div><div>2016-08-11 20:45:56: pid 2131: LOG:  watchdog nodes ID:0 Name:&quot;Linux_mgrdb84_9999&quot;</div><div>2016-08-11 20:45:56: pid 2131: DETAIL:  Host:&quot;1.1.1.84&quot; WD Port:9000 pgpool-II port:9999</div><div>2016-08-11 20:45:56: pid 2131: LOG:  watchdog nodes ID:1 Name:&quot;Not_Set&quot;</div><div>2016-08-11 20:45:56: pid 2131: DETAIL:  Host:&quot;1.1.1.85&quot; WD Port:9000 pgpool-II port:9999</div><div>2016-08-11 20:45:56: pid 1789: LOG:  pgpool-II successfully started. 
version 3.5.3 (ekieboshi)</div><div>2016-08-11 20:45:56: pid 1789: LOG:  find_primary_node: checking backend no 0</div><div>2016-08-11 20:45:56: pid 1789: LOG:  find_primary_node: checking backend no 1</div><div>2016-08-11 20:45:56: pid 1789: LOG:  find_primary_node: primary node id is 1</div><div>2016-08-11 20:45:57: pid 2135: LOG:  createing watchdog heartbeat receive socket.</div><div>2016-08-11 20:45:57: pid 2135: DETAIL:  bind receive socket to device: &quot;eth1&quot;</div><div>2016-08-11 20:45:57: pid 2135: LOG:  set SO_REUSEPORT option to the socket</div><div>2016-08-11 20:45:57: pid 2135: LOG:  creating watchdog heartbeat receive socket.</div><div>2016-08-11 20:45:57: pid 2135: DETAIL:  set SO_REUSEPORT</div><div>2016-08-11 20:45:57: pid 2137: LOG:  creating socket for sending heartbeat</div><div>2016-08-11 20:45:57: pid 2137: DETAIL:  bind send socket to device: eth1</div><div>2016-08-11 20:45:57: pid 2137: LOG:  set SO_REUSEPORT option to the socket</div><div>2016-08-11 20:45:57: pid 2137: LOG:  creating socket for sending heartbeat</div><div>2016-08-11 20:45:57: pid 2137: DETAIL:  set SO_REUSEPORT</div><div>2016-08-11 20:45:58: pid 2130: WARNING:  watchdog failed to bring up delegate IP, &#39;if_up_cmd&#39; failed</div><div>2016-08-11 20:45:58: pid 2130: WARNING:  watchdog de-escalation failed to bring down delegate IP</div><div>2016-08-11 20:45:58: pid 1795: LOG:  watchdog escalation process with pid: 2130 exit with SUCCESS.</div><div>2016-08-11 20:47:24: pid 1795: LOG:  new watchdog node connection is received from &quot;<a href="http://1.1.1.85:17053">1.1.1.85:17053</a>&quot;</div><div>2016-08-11 20:47:36: pid 2131: LOG:  watchdog: lifecheck started</div></div><div><br></div><div><br></div><div>please advise...</div><div>cohavisi</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Aug 11, 2016 at 6:31 PM, Daniel Huhardeaux <span dir="ltr">&lt;<a href="mailto:tech@tootai.net" target="_blank">tech@tootai.net</a>&gt;</span> 
wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello<span class=""><br>

<br>

On 11/08/2016 at 16:46, Shay Cohavi wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi,<br>

<br>

When I restart the primary pgpool node, the VIP transfers to<br>

the 2nd node. But when the faulty primary boots up, it declares itself<br>

as the only node in the cluster and brings up the VIP (duplicate IP)!!<br>

<br>

<br>

the 1st node (startup):<br>

2016-08-11 17:31:36: pid 1761: WARNING:  checking setuid bit of if_up_cmd<br>

2016-08-11 17:31:36: pid 1761: DETAIL:  ifup[/sbin/ifconfig] doesn&#39;t<br>

have setuid bit<br>

2016-08-11 17:31:36: pid 1761: WARNING:  checking setuid bit of if_down_cmd<br>

2016-08-11 17:31:36: pid 1761: DETAIL:  ifdown[/sbin/ifconfig] doesn&#39;t<br>

have setuid bit<br>

2016-08-11 17:31:36: pid 1761: WARNING:  checking setuid bit of arping<br>

command<br>

2016-08-11 17:31:36: pid 1761: DETAIL:  arping[/sbin/arping] doesn&#39;t<br>

have setuid bit<br>

</blockquote>

<br></span>

The answer to your problem is here: set the setuid bit.<br>
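<br>
A minimal sketch for checking this, assuming the command paths shown in the warnings above (/sbin/ifconfig and /sbin/arping); the helper name has_setuid is hypothetical and not part of pgpool:<br>
<pre>
```python
import os
import stat


def has_setuid(path):
    """Return True if the file at path has the setuid bit set."""
    # stat.filemode renders the mode like "-rwsr-xr-x"; index 3 is the
    # owner-execute slot, shown as "s" or "S" when setuid is set.
    return stat.filemode(os.stat(path).st_mode)[3] in "sS"


if __name__ == "__main__":
    # Paths taken from the pgpool warnings; adjust if your commands live elsewhere.
    for cmd in ("/sbin/ifconfig", "/sbin/arping"):
        if os.path.exists(cmd):
            print(cmd, "setuid" if has_setuid(cmd) else "no setuid")
        else:
            print(cmd, "not found")
```
</pre>
If either command reports "no setuid", the bit is typically set as root with chmod u+s on that file.<br>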

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

the 2nd node:<br>

<br>

*2016-08-11 17:17:34: pid 16256: WARNING:  checking setuid bit of if_up_cmd*<span class=""><br>

2016-08-11 17:17:34: pid 16256: DETAIL:  ifup[/sbin/ifconfig] doesn&#39;t<br>

have setuid bit<br></span>

*2016-08-11 17:17:34: pid 16256: WARNING:  checking setuid bit of<br>

if_down_cmd*<span class=""><br>

2016-08-11 17:17:34: pid 16256: DETAIL:  ifdown[/sbin/ifconfig] doesn&#39;t<br>

have setuid bit<br></span>

*2016-08-11 17:17:34: pid 16256: WARNING:  checking setuid bit of arping<br>

command*<span class=""><br>

2016-08-11 17:17:34: pid 16256: DETAIL:  arping[/sbin/arping] doesn&#39;t<br>

have setuid bit<br>

</span></blockquote>

<br>

Same here<br>

<br>

BTW, I hope that both servers are not connected to the Internet: the IP range you are using is not a private range; it belongs to APNIC.<br>
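<br>
As a quick way to verify this, here is a small sketch using Python's standard ipaddress module. The 1.1.1.x addresses are the watchdog hosts from the logs in this thread; 192.168.1.84 is only a hypothetical private address added for comparison, and the helper name is_private is made up for this example:<br>
<pre>
```python
import ipaddress


def is_private(addr):
    """Return True if addr falls in a private (non-public) range."""
    return ipaddress.ip_address(addr).is_private


if __name__ == "__main__":
    # 1.1.1.84 and 1.1.1.85 come from the watchdog log in this thread;
    # 192.168.1.84 is a hypothetical private address for comparison.
    for addr in ("1.1.1.84", "1.1.1.85", "192.168.1.84"):
        print(addr, "private" if is_private(addr) else "public")
```
</pre>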

<br>

Regards<span class="HOEnZb"><font color="#888888"><br>

<br>

Daniel<br>

-- <br>

TOOTAi Networks<br>


</font></span></blockquote></div><br></div>