<div dir="ltr">Hello Bo<div class="gmail_extra"><br><div class="gmail_quote">On Tue, Oct 24, 2017 at 2:59 AM, Bo Peng <span dir="ltr">&lt;<a href="mailto:pengbo@sraoss.co.jp" target="_blank">pengbo@sraoss.co.jp</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello,<br>

<br>

&gt; &gt; &gt; Then, Postgresql service is started in node 0.<br>

&gt; &gt; &gt; node 0 is standby, with status 3<br>

&gt; &gt; &gt; node 1 is primary, with status 2<br>

<br>

How did you start node0 in this step?<br>

I think you missed the &quot;recovery&quot; step.<br>

<br></blockquote><div><br></div><div>Sorry, I think my explanation was incomplete.</div><div>In this step I started node0 with the command &quot;systemctl start postgresql-9.6&quot;.</div><div>I did not recover node0 beforehand, and I had not set up replication between the two nodes. I know that a production environment needs replication between the nodes, but I skipped it deliberately: I wanted to check that pcp_attach_node would fail and that node0 would stay at status 3, because there is no replication between the nodes.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

&gt; I have checked these steps but attaching node1 (after failover without<br>

&gt; recovery) instead of node 0, and I can&#39;t reproduce this situation with node<br>

&gt; 1. Do you know if this behaviour is by design in Pgpool? Why is it<br>

&gt; necessary to use pcp_recovery_node  instead of pcp_attach_node?<br>

<br>

If you recover the node as a standby and then attach it to pgpool,<br>

&quot;pcp_recovery_node&quot; is not necessary.<br>

<br>

<br>

Let&#39;s confirm the scenario: a failover, followed by recovering the downed backend node as a standby.<br>

<br>

1. Start node0 and node1<br>

<br>

   node0 : primary<br>

   node1 : standby<br>

<br>

2. Stop node0, and failover occurs<br>

<br>

   node0 : down<br>

   node1 : primary  &lt;= failover<br>

<br>

3. Recover node0 as standby<br>

<br>

   node0 : standby<br>

   node1 : primary<br>

<br>

   There are two ways to recover the downed node.<br>

<br>

    (1) Recover node0 as standby by using &quot;pcp_recovery_node&quot;.<br>

<br>

        &quot;pcp_recovery_node&quot; will recover the downed node and attach it to pgpool.<br>

        But to use the command, you need to configure the &#39;recovery_1st_stage_command&#39; parameter.<br>

<br>

        Please see the following document for more details about configuring Pgpool-II online recovery.<br>

<br>

        <a href="http://www.pgpool.net/docs/latest/en/html/example-cluster.html" rel="noreferrer" target="_blank">http://www.pgpool.net/docs/<wbr>latest/en/html/example-<wbr>cluster.html</a><br>

<br>

    (2) Recover node0 as a standby by using a command such as &quot;pg_basebackup&quot;,<br>

        then attach the node to pgpool. Because pgpool has already detached the<br>

        node, you need to attach it to pgpool again so that pgpool knows about it.<br>

        Without attaching the node, its status will remain &quot;down&quot;, even if it is running as a standby.<br>

<br>

   If you just start the downed PostgreSQL node by using &quot;pg_ctl start&quot; without recovery,<br>

   the node will be started as a primary.<br>

<br></blockquote><div><br></div><div>Hello Bo.</div><div>Thank you for your explanation. </div><div>I normally use option (2), with a shell script that runs the PostgreSQL pg_basebackup command.</div><div>For this test, I did not use any recovery method: I only started the PostgreSQL service and ran pcp_attach_node on node0. I know this is incorrect. I only wanted to check that, if I run pcp_attach_node on a node without first setting up replication between the two nodes, pgpool fails during the attach process. I don&#39;t understand why pgpool makes node0 the primary and node1 the standby when node0 was previously the standby, node1 was the primary, and there is no replication between the nodes.</div><div>Sorry that my first email was not clear enough.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

On Mon, 23 Oct 2017 22:13:44 +0200<br>

Lucas Luengas &lt;<a href="mailto:lucasluengas@gmail.com">lucasluengas@gmail.com</a>&gt; wrote:<br>

<br>

&gt; Hello Bo.<br>

&gt; Thank you for your answer.<br>

&gt;<br>

&gt; I have checked these steps but attaching node1 (after failover without<br>

&gt; recovery) instead of node 0, and I can&#39;t reproduce this situation with node<br>

&gt; 1. Do you know if this behaviour is by design in Pgpool? Why is it<br>

&gt; necessary to use pcp_recovery_node  instead of pcp_attach_node?<br>

&gt;<br>

&gt; Kind regards.<br>

&gt;<br>

&gt; On Sat, Oct 21, 2017 at 1:30 AM, Bo Peng &lt;<a href="mailto:pengbo@sraoss.co.jp">pengbo@sraoss.co.jp</a>&gt; wrote:<br>

&gt;<br>

&gt; &gt; Hello,<br>

&gt; &gt;<br>

&gt; &gt; If you want to start node0 (old primary) as standby,<br>

&gt; &gt; you should use pcp_recovery_node to recover node0 as a standby.<br>

&gt; &gt;<br>

&gt; &gt; If you just restart node0 after failover without recovery,<br>

&gt; &gt; it will run as primary.<br>

&gt; &gt;<br>

&gt; &gt; On Fri, 20 Oct 2017 18:41:36 +0200<br>

&gt; &gt; Lucas Luengas &lt;<a href="mailto:lucasluengas@gmail.com">lucasluengas@gmail.com</a>&gt; wrote:<br>

&gt; &gt;<br>

&gt; &gt; &gt; Hello<br>

&gt; &gt; &gt; I am testing Pgpool 3.4.13 with PostgreSQL 9.6, with streaming replication<br>

&gt; &gt; &gt; and watchdog, on CentOS 7. I have two servers. Each server has Pgpool and<br>

&gt; &gt; &gt; PostgreSQL installed. I installed pgpool from the yum repository.<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Node 0 is primary, with status 2<br>

&gt; &gt; &gt; Node 1 is standby, with status 2<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; If Postgresql service is stopped in node 0, then:<br>

&gt; &gt; &gt; node 0 is standby, with status 3<br>

&gt; &gt; &gt; node 1 is primary, with status 2. (failover)<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Then, Postgresql service is started in node 0.<br>

&gt; &gt; &gt; node 0 is standby, with status 3<br>

&gt; &gt; &gt; node 1 is primary, with status 2<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Then, I attach node 0 using pcp_attach_node command.<br>

&gt; &gt; &gt; node 0 is primary, with status 2.<br>

&gt; &gt; &gt; node 1 is standby, with status 2.<br>

&gt; &gt; &gt; Node 0 was changed to primary and node 1 was changed to standby. Why?<br>

&gt; &gt; &gt; Do I have an error in my setup?<br>

&gt; &gt; &gt; I think the correct result should be:<br>

&gt; &gt; &gt; node 0 is standby, with status 2<br>

&gt; &gt; &gt; node 1 is primary, with status 2<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; I have repeated the previous steps with pgpool 3.4.12, 3.4.11, 3.4.10 and 3.4.9<br>

&gt; &gt; &gt; with the same configuration and the same servers. I get the same results.<br>

&gt; &gt; &gt; Also, I have repeated the steps with pgpool 3.6.6 and I get the same results.<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Some log lines during failback<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Oct 20 13:41:03 localhost pgpool[9687]: [128-1] 2017-10-20 13:41:03: pid<br>

&gt; &gt; &gt; 9687: LOG:  received failback request for node_id: 0 from pid [9687]<br>

&gt; &gt; &gt; Oct 20 13:41:03 localhost pgpool[4913]: [255-1] 2017-10-20 13:41:03: pid<br>

&gt; &gt; &gt; 4913: LOG:  watchdog notifying to start interlocking<br>

&gt; &gt; &gt; Oct 20 13:41:03 localhost pgpool[4913]: [256-1] 2017-10-20 13:41:03: pid<br>

&gt; &gt; &gt; 4913: LOG:  starting fail back. reconnect host 192.168.0.136(5432)<br>

&gt; &gt; &gt; Oct 20 13:41:03 localhost pgpool[4913]: [257-1] 2017-10-20 13:41:03: pid<br>

&gt; &gt; &gt; 4913: LOG:  Node 1 is not down (status: 2)<br>

&gt; &gt; &gt; Oct 20 13:41:04 localhost pgpool[4913]: [258-1] 2017-10-20 13:41:04: pid<br>

&gt; &gt; &gt; 4913: LOG:  Do not restart children because we are failbacking node id 0<br>

&gt; &gt; &gt; host: 192.168.0.136 port: 5432 and we are in streaming replication mode<br>

&gt; &gt; and<br>

&gt; &gt; &gt; not all backends were down<br>

&gt; &gt; &gt; Oct 20 13:41:04 localhost pgpool[4913]: [259-1] 2017-10-20 13:41:04: pid<br>

&gt; &gt; &gt; 4913: LOG:  find_primary_node_repeatedly: waiting for finding a primary<br>

&gt; &gt; node<br>

&gt; &gt; &gt; Oct 20 13:41:04 localhost pgpool[4913]: [260-1] 2017-10-20 13:41:04: pid<br>

&gt; &gt; &gt; 4913: LOG:  find_primary_node: checking backend no 0<br>

&gt; &gt; &gt; Oct 20 13:41:04 localhost pgpool[4913]: [260-2]<br>

&gt; &gt; &gt; Oct 20 13:41:04 localhost pgpool[4913]: [261-1] 2017-10-20 13:41:04: pid<br>

&gt; &gt; &gt; 4913: LOG:  find_primary_node: primary node id is 0<br>

&gt; &gt; &gt; Oct 20 13:41:04 localhost pgpool[4913]: [262-1] 2017-10-20 13:41:04: pid<br>

&gt; &gt; &gt; 4913: LOG:  watchdog notifying to end interlocking<br>

&gt; &gt; &gt; Oct 20 13:41:04 localhost pgpool[4913]: [263-1] 2017-10-20 13:41:04: pid<br>

&gt; &gt; &gt; 4913: LOG:  failover: set new primary node: 0<br>

&gt; &gt; &gt; Oct 20 13:41:04 localhost pgpool[4913]: [264-1] 2017-10-20 13:41:04: pid<br>

&gt; &gt; &gt; 4913: LOG:  failover: set new master node: 0<br>

&gt; &gt; &gt; Oct 20 13:41:04 localhost pgpool[4913]: [265-1] 2017-10-20 13:41:04: pid<br>

&gt; &gt; &gt; 4913: LOG:  failback done. reconnect host 192.168.0.136(5432)<br>

&gt; &gt; &gt; Oct 20 13:41:04 localhost pgpool[9688]: [194-1] 2017-10-20 13:41:04: pid<br>

&gt; &gt; &gt; 9688: LOG:  worker process received restart request<br>

&gt; &gt; &gt; Oct 20 13:41:05 localhost pgpool[9687]: [129-1] 2017-10-20 13:41:05: pid<br>

&gt; &gt; &gt; 9687: LOG:  restart request received in pcp child process<br>

&gt; &gt; &gt; Oct 20 13:41:05 localhost pgpool[4913]: [266-1] 2017-10-20 13:41:05: pid<br>

&gt; &gt; &gt; 4913: LOG:  PCP child 9687 exits with status 256 in failover()<br>

&gt; &gt; &gt; Oct 20 13:41:05 localhost pgpool[4913]: [267-1] 2017-10-20 13:41:05: pid<br>

&gt; &gt; &gt; 4913: LOG:  fork a new PCP child pid 10410 in failover()<br>

&gt; &gt; &gt; Oct 20 13:41:05 localhost pgpool[4913]: [268-1] 2017-10-20 13:41:05: pid<br>

&gt; &gt; &gt; 4913: LOG:  worker child process with pid: 9688 exits with status 256<br>

&gt; &gt; &gt; Oct 20 13:41:05 localhost pgpool[4913]: [269-1] 2017-10-20 13:41:05: pid<br>

&gt; &gt; &gt; 4913: LOG:  fork a new worker child process with pid: 10411<br>

&gt; &gt; &gt; Oct 20 13:41:10 localhost pgpool[9692]: [202-1] 2017-10-20 13:41:10: pid<br>

&gt; &gt; &gt; 9692: LOG:  selecting backend connection<br>

&gt; &gt; &gt; Oct 20 13:41:10 localhost pgpool[9692]: [202-2] 2017-10-20 13:41:10: pid<br>

&gt; &gt; &gt; 9692: DETAIL:  failback event detected, discarding existing connections<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Kind regards<br>

&gt; &gt;<br>

&gt; &gt;<br>

&gt; &gt; --<br>

&gt; &gt; Bo Peng &lt;<a href="mailto:pengbo@sraoss.co.jp">pengbo@sraoss.co.jp</a>&gt;<br>

&gt; &gt; SRA OSS, Inc. Japan<br>

&gt; &gt;<br>

&gt; &gt;<br>

<span class="HOEnZb"><font color="#888888"><br>

<br>

--<br>

Bo Peng &lt;<a href="mailto:pengbo@sraoss.co.jp">pengbo@sraoss.co.jp</a>&gt;<br>

SRA OSS, Inc. Japan<br>

<br>

</font></span></blockquote></div><br></div></div>
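<div>A possible reading of the role swap discussed in this thread, as a minimal sketch rather than pgpool source code. It assumes that pgpool checks attached backends in ascending node id order and treats the first one that is not in recovery as the primary, which is what the quoted log lines &quot;find_primary_node: checking backend no 0&quot; and &quot;primary node id is 0&quot; suggest. The Backend class and its field names are hypothetical and exist only for this illustration.</div>
<pre>
```python
from dataclasses import dataclass


@dataclass
class Backend:
    node_id: int
    attached: bool     # True when pgpool treats the node as up (status 2)
    in_recovery: bool  # stand-in for the result of SELECT pg_is_in_recovery()


def find_primary_node(backends):
    """Return the id of the first attached backend that is not in recovery.

    Assumption: backends are checked in ascending node id order.
    """
    for backend in sorted(backends, key=lambda b: b.node_id):
        if backend.attached and not backend.in_recovery:
            return backend.node_id
    return None


# After failover: node0 is detached, node1 has been promoted.
node0 = Backend(node_id=0, attached=False, in_recovery=False)
node1 = Backend(node_id=1, attached=True, in_recovery=False)
primary_before_attach = find_primary_node([node0, node1])

# node0 restarted without recovery, then attached: it still runs as a primary.
node0.attached = True
primary_after_attach = find_primary_node([node0, node1])

# node0 rebuilt as a standby (for example with pg_basebackup), then attached.
node0.in_recovery = True
primary_after_recovery = find_primary_node([node0, node1])

print(primary_before_attach, primary_after_attach, primary_after_recovery)
# prints: 1 0 1
```
</pre>
<div>Under these assumptions, attaching a node that was restarted without recovery puts two non-recovering servers in front of pgpool, and the lower node id wins. Rebuilding node0 as a standby before attaching it leaves node1 as the only candidate.</div>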