<div dir="ltr"><div>I agree. This shouldn&#39;t be so complicated.</div><div><br></div><div dir="ltr">Since I&#39;m using sed to repoint slave in follow_master script by updating recovery.conf if the command fails I&#39;m not re-starting and re-attaching the node. Kill two birds with one stone :-)<div><br></div><div>Here&#39;w what I&#39;m testing now:</div><div>ssh -o StrictHostKeyChecking=no -i /var/lib/pgsql/.ssh/id_rsa postgres@{detached_node_host} -T &quot;sed -i &#39;s/host=.*sslmode=/host=${new_master_node_host} port=5432 sslmode=/g&#39; /var/lib/pgsql/10/data/recovery.conf&quot; &gt;&gt; $LOGFILE<br></div><div>repoint_status=$?</div><div><pre style="color:rgb(0,0,0);font-family:Menlo;font-size:9pt">if [ ${repoint_status} -eq 0 ]; then</pre><pre style="color:rgb(0,0,0);font-family:Menlo;font-size:9pt">      //restart</pre><pre style="color:rgb(0,0,0);font-family:Menlo;font-size:9pt">      //reattach</pre><pre style="color:rgb(0,0,0);font-family:Menlo;font-size:9pt">else</pre><pre style="color:rgb(0,0,0);font-family:Menlo;font-size:9pt">     // WARNING: this could be restarted master so there&#39;s no recovery.conf</pre><pre style="color:rgb(0,0,0);font-family:Menlo;font-size:9pt">     // CONSIDERATION: Should I shut it down since I don&#39;t want to have two masters running even though Pgpool load balances one???</pre><pre style="color:rgb(0,0,0);font-family:Menlo;font-size:9pt">fi</pre></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Mar 1, 2019 at 9:44 AM Pierre Timmermans &lt;<a href="mailto:ptim007@yahoo.com">ptim007@yahoo.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div class="gmail-m_7288807699801252573ydp133919a9yahoo-style-wrap" style="font-family:&quot;Helvetica Neue&quot;,Helvetica,Arial,sans-serif;font-size:16px"><div><div>Thank you, it makes sense indeed and I also like to have a relatively long &quot;grace&quot; delay via the health check interval so that If the primary restarts quickly enough there is no failover</div><div><br></div><div>For the case where there is a degenerated master, I have added this code in the follow_master script, it seems to work fine in my tests:</div><div><br></div><div><span>







<p class="gmail-m_7288807699801252573ydp1bcff9b8p1"><span class="gmail-m_7288807699801252573ydp1bcff9b8s1">ssh_options=&quot;ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no&quot;<br></span></p><div>in_reco=$( $ssh_options postgres@${HOSTNAME} &#39;psql -t -c &quot;select pg_is_in_recovery();&quot;&#39; | head -1 | awk &#39;{print $1}&#39; )</div><div>if [ &quot;a${in_reco}&quot; != &quot;a&quot; ] ; then<br>  echo &quot;Node $HOSTNAME is not in recovery, probably a degenerated master, skip it&quot; | tee -a $LOGFILE<br><span class="gmail-m_7288807699801252573ydp3d1efb43Apple-converted-space">  </span>exit 0<br><span class="gmail-m_7288807699801252573ydp3d1efb43Apple-converted-space"> </span>fi</div><p></p></span></div><div>At the end I believe that pgpool algorithm to choose a primary node (always the node with the lowest id) is the root cause of the problem: pgpool should select the most adequate node (the node that is in recovery and with the lowest gap). Unfortunately I cannot code in &quot;C&quot;, otherwise I would contribute.</div><div><br></div><div class="gmail-m_7288807699801252573ydp133919a9signature">Pierre</div></div>

Pierre

On Friday, March 1, 2019, 5:07:06 PM GMT+1, Andre Piwoni <apiwoni@webmd.net> wrote:

                <div><div id="gmail-m_7288807699801252573ydp477a48d5yiv7700399587"><div><div dir="ltr"><div dir="ltr"><div dir="ltr">FYI,<div><br clear="none"></div><div>One of the things that I have done to minimize impact of restarting the primary is using health check where max_retries x retry_delay_interval allows enough time for the primary to be restarted without triggering failover which may take more time time than restart itself. This is with disabled fail_over_on_backend_error</div><div><br clear="none"></div><div>Andre</div></div></div><br clear="none"><div class="gmail-m_7288807699801252573ydp477a48d5yiv7700399587gmail_quote"><div class="gmail-m_7288807699801252573ydp477a48d5yiv7700399587yqt6802907090" id="gmail-m_7288807699801252573ydp477a48d5yiv7700399587yqtfd35264"><div class="gmail-m_7288807699801252573ydp477a48d5yiv7700399587gmail_attr" dir="ltr">On Fri, Mar 1, 2019 at 7:58 AM Andre Piwoni &lt;<a shape="rect" href="mailto:apiwoni@webmd.net" rel="nofollow" target="_blank">apiwoni@webmd.net</a>&gt; wrote:<br clear="none"></div><blockquote class="gmail-m_7288807699801252573ydp477a48d5yiv7700399587gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">Hi Pierre,<div><br clear="none"></div><div>Hmmm? I have not covered the case you described which is restart of the primary on node 0, resulting failover ans subsequent restart of new primary on node 1 which results in calling follow_master on node 0. In my case I was shutting down node 0 which resulted in follow_master being called on it after second failover since I was not checking if node 0 was running. In your case, node 0 is running since it has been restarted.</div><div><br clear="none"></div><div>Here&#39;s part of my script that I have to improve given your case:</div><div><br clear="none"></div><div><div>ssh -o StrictHostKeyChecking=no -i /var/lib/pgsql/.ssh/id_rsa postgres@${detached_node_host} -T &quot;/usr/pgsql-10/bin/pgctl -D /var/lib/pgsql/10/data status&quot; | grep &quot;is running&quot;</div><div>running_status=$?</div><div><br clear="none"></div><div>if [ ${running_status} -eq 0 ]; then</div><div>        // TODO: Check if recovery.conf exists or pg_is_in_recovery() on ${detached_node_host} and exit if this is not a slave node</div><div><span style="white-space:pre-wrap">        </span>// repoint to new master ${new_master_node_host}</div><div><span style="white-space:pre-wrap">        </span>// restart ${detached_node_host} </div><div><span style="white-space:pre-wrap">        </span>// reattach restarted node with pcp_attach_node</div><div>else</div><div><span style="white-space:pre-wrap">        </span>// do nothing since this could be old slave or primary that needs to be recovered or node in maintenance mode etc.</div><div>fi</div></div><div><br clear="none"></div><div><br clear="none"></div></div></div><br clear="none"><div class="gmail-m_7288807699801252573ydp477a48d5yiv7700399587gmail_quote"><div class="gmail-m_7288807699801252573ydp477a48d5yiv7700399587gmail_attr" dir="ltr">On Fri, Mar 1, 2019 at 3:28 AM Pierre Timmermans &lt;<a shape="rect" href="mailto:ptim007@yahoo.com" rel="nofollow" target="_blank">ptim007@yahoo.com</a>&gt; wrote:<br clear="none"></div><blockquote class="gmail-m_7288807699801252573ydp477a48d5yiv7700399587gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div 
class="gmail-m_7288807699801252573ydp477a48d5yiv7700399587gmail-m_2770113853894995366gmail-m_6102677147890786540ydp5a0a2fe9yahoo-style-wrap"><div><div>Hi</div><div><br clear="none"></div><div>Same issue for me but I am not sure how to fix it. Andre can you tell exactly how you check ?</div><div><br clear="none"></div><div>I cannot add a test using pcp_node_info to check that the status is up, because then follow_master is never doing something. Indeed, in my case, when the follow_master is executed the status of the target node is always down, so my script does the standby follow command and then a pcp_attach_node.</div><div><br clear="none"></div><div>To solve the issue now I added a check that the command <span>select pg_is_in_recovery(); returns &quot;t&quot; on the node, if it returns &quot;f&quot; then I can assume it is a degenerated master and I don&#39;t execute the follow_master command.</span></div><div><br clear="none"></div><div><br clear="none"></div><div><br clear="none"></div><div>So my use case is this<br clear="none"></div><div><br clear="none"></div><div>1. node 0 is primary, node 1 and node 2 are standby</div><div>2. node 0 is restarted, node 1 becomes primary and node 2 follows the new primary (thanks to folllow_master). In follow_master of node 2 I have to do pcp_attach_node after because the status of the node is down </div><div>3. in the meantime node 0 has rebooted, the db is started on node 0 but it is down in pgpool and its role is standby (it is a degenerated master)</div><div>4. node 1 is restarted, pgpool executes failover on node 2 and follow_master on node 0 =&gt; the follow_master on node 0 breaks everything because after that node 0 becomes a primary again</div><div> <br clear="none"></div><div>Thanks and regards</div><div><br clear="none"></div><div class="gmail-m_7288807699801252573ydp477a48d5yiv7700399587gmail-m_2770113853894995366gmail-m_6102677147890786540ydp5a0a2fe9signature">Pierre</div></div>

On Monday, February 25, 2019, 5:35:11 PM GMT+1, Andre Piwoni <apiwoni@webmd.net> wrote:

                <div><div id="gmail-m_7288807699801252573ydp477a48d5yiv7700399587gmail-m_2770113853894995366gmail-m_6102677147890786540ydp2b963b2ayiv5096222534"><div><div dir="ltr"><div dir="ltr">I have already put that check in place.<div><br clear="none"></div><div>Thank you for confirming.</div></div><br clear="none"><div class="gmail-m_7288807699801252573ydp477a48d5yiv7700399587gmail-m_2770113853894995366gmail-m_6102677147890786540ydp2b963b2ayiv5096222534gmail_quote"><div class="gmail-m_7288807699801252573ydp477a48d5yiv7700399587gmail-m_2770113853894995366gmail-m_6102677147890786540ydp2b963b2ayiv5096222534yqt4592847929" id="gmail-m_7288807699801252573ydp477a48d5yiv7700399587gmail-m_2770113853894995366gmail-m_6102677147890786540ydp2b963b2ayiv5096222534yqtfd08537"><div class="gmail-m_7288807699801252573ydp477a48d5yiv7700399587gmail-m_2770113853894995366gmail-m_6102677147890786540ydp2b963b2ayiv5096222534gmail_attr" dir="ltr">On Sat, Feb 23, 2019 at 11:56 PM Tatsuo Ishii &lt;<a shape="rect" href="mailto:ishii@sraoss.co.jp" rel="nofollow" target="_blank">ishii@sraoss.co.jp</a>&gt; wrote:<br clear="none"></div><blockquote class="gmail-m_7288807699801252573ydp477a48d5yiv7700399587gmail-m_2770113853894995366gmail-m_6102677147890786540ydp2b963b2ayiv5096222534gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Sorry, I was wrong. A follow_master_command will be executed against<br clear="none">
the down node as well. So you need to check whether target PostgreSQL<br clear="none">
node is running in the follow_master_commdn. If it&#39;s not, you can skip<br clear="none">
the node.<br clear="none">
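
A minimal sketch of such a guard at the top of follow_master.sh (assuming the script receives the detached node's host and data directory, as in the arguments logged below; the ssh details will vary per setup):

# Skip the follow_master work when PostgreSQL is not running on the
# detached node; pg_ctl status exits non-zero when no server is running.
ssh -o StrictHostKeyChecking=no postgres@${detached_node_host} \
  "/usr/pgsql-10/bin/pg_ctl -D ${detached_node_pgdata} status" > /dev/null 2>&1
if [ $? -ne 0 ]; then
  echo "Node ${detached_node_host} is down, skipping follow_master" >> $LOGFILE
  exit 0
fi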
<br clear="none">
Best regards,<br clear="none">
--<br clear="none">
Tatsuo Ishii<br clear="none">
SRA OSS, Inc. Japan<br clear="none">
English: <a shape="rect" href="http://www.sraoss.co.jp/index_en.php" rel="nofollow" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br clear="none">
Japanese:<a shape="rect" href="http://www.sraoss.co.jp" rel="nofollow" target="_blank">http://www.sraoss.co.jp</a><br clear="none">
<br clear="none">
> I have added a pg_ctl status check to ensure no action is taken when the node
> is down, but I'll check the 3.7.8 version.
>
> Here's the Pgpool log from the time node2 is shut down to the time node1 (the
> already dead old primary) received the follow master command.
> Sorry for the double date logging. I'm also including the self-explanatory
> failover.log that my failover and follow_master scripts generated.
>
> Arguments passed to the scripts, for your reference:
> failover.sh %d %h %p %D %M %P %m %H %r %R
> follow_master.sh %d %h %p %D %M %P %m %H %r %R
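>
> For context, these presumably correspond to pgpool.conf entries along these
> lines (the script paths are taken from the "execute command" log lines below):
>
> failover_command = '/etc/pgpool-II/failover.sh %d %h %p %D %M %P %m %H %r %R'
> follow_master_command = '/etc/pgpool-II/follow_master.sh %d %h %p %D %M %P %m %H %r %R'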
>
> Pool status before shutdown of node 2:
> postgres=> show pool_nodes;
>  node_id |          hostname          | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
> ---------+----------------------------+------+--------+-----------+---------+------------+-------------------+-------------------
>  0       | pg-hdp-node1.kitchen.local | 5432 | down   | 0.333333  | standby | 0          | false             | 0
>  1       | pg-hdp-node2.kitchen.local | 5432 | up     | 0.333333  | primary | 0          | false             | 0
>  2       | pg-hdp-node3.kitchen.local | 5432 | up     | 0.333333  | standby | 0          | true              | 0
> (3 rows)
>
> Pgpool log
> Feb 22 10:43:27 pg-hdp-node3 pgpool[12437]: [126-1] 2019-02-22 10:43:27:
> pid 12437: LOG:  failed to connect to PostgreSQL server on
> "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error "Connection
> refused"
> Feb 22 10:43:27 pg-hdp-node3 pgpool[12437]: [127-1] 2019-02-22 10:43:27:
> pid 12437: ERROR:  failed to make persistent db connection
> Feb 22 10:43:27 pg-hdp-node3 pgpool[12437]: [127-2] 2019-02-22 10:43:27:
> pid 12437: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
> failed
> Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [128-1] 2019-02-22 10:43:37:
> pid 12437: ERROR:  Failed to check replication time lag
> Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [128-2] 2019-02-22 10:43:37:
> pid 12437: DETAIL:  No persistent db connection for the node 1
> Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [128-3] 2019-02-22 10:43:37:
> pid 12437: HINT:  check sr_check_user and sr_check_password
> Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [128-4] 2019-02-22 10:43:37:
> pid 12437: CONTEXT:  while checking replication time lag
> Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [129-1] 2019-02-22 10:43:37:
> pid 12437: LOG:  failed to connect to PostgreSQL server on
> "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error "Connection
> refused"
> Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [130-1] 2019-02-22 10:43:37:
> pid 12437: ERROR:  failed to make persistent db connection
> Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [130-2] 2019-02-22 10:43:37:
> pid 12437: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
> failed
> Feb 22 10:43:45 pg-hdp-node3 pgpool[7786]: [6-1] 2019-02-22 10:43:45: pid
> 7786: LOG:  failed to connect to PostgreSQL server on
> "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error "Connection
> refused"
> Feb 22 10:43:45 pg-hdp-node3 pgpool[7786]: [7-1] 2019-02-22 10:43:45: pid
> 7786: ERROR:  failed to make persistent db connection
> Feb 22 10:43:45 pg-hdp-node3 pgpool[7786]: [7-2] 2019-02-22 10:43:45: pid
> 7786: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432" failed
> Feb 22 10:43:45 pg-hdp-node3 pgpool[7786]: [8-1] 2019-02-22 10:43:45: pid
> 7786: LOG:  health check retrying on DB node: 1 (round:1)
> Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [131-1] 2019-02-22 10:43:47:
> pid 12437: ERROR:  Failed to check replication time lag
> Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [131-2] 2019-02-22 10:43:47:
> pid 12437: DETAIL:  No persistent db connection for the node 1
> Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [131-3] 2019-02-22 10:43:47:
> pid 12437: HINT:  check sr_check_user and sr_check_password
> Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [131-4] 2019-02-22 10:43:47:
> pid 12437: CONTEXT:  while checking replication time lag
> Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [132-1] 2019-02-22 10:43:47:
> pid 12437: LOG:  failed to connect to PostgreSQL server on
> "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error "Connection
> refused"
> Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [133-1] 2019-02-22 10:43:47:
> pid 12437: ERROR:  failed to make persistent db connection
> Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [133-2] 2019-02-22 10:43:47:
> pid 12437: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
> failed
> Feb 22 10:43:48 pg-hdp-node3 pgpool[7786]: [9-1] 2019-02-22 10:43:48: pid
> 7786: LOG:  failed to connect to PostgreSQL server on
> "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error "Connection
> refused"
> Feb 22 10:43:48 pg-hdp-node3 pgpool[7786]: [10-1] 2019-02-22 10:43:48: pid
> 7786: ERROR:  failed to make persistent db connection
> Feb 22 10:43:48 pg-hdp-node3 pgpool[7786]: [10-2] 2019-02-22 10:43:48: pid
> 7786: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432" failed
> Feb 22 10:43:48 pg-hdp-node3 pgpool[7786]: [11-1] 2019-02-22 10:43:48: pid
> 7786: LOG:  health check retrying on DB node: 1 (round:2)
> Feb 22 10:43:51 pg-hdp-node3 pgpool[7786]: [12-1] 2019-02-22 10:43:51: pid
> 7786: LOG:  failed to connect to PostgreSQL server on
> "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error "Connection
> refused"
> Feb 22 10:43:51 pg-hdp-node3 pgpool[7786]: [13-1] 2019-02-22 10:43:51: pid
> 7786: ERROR:  failed to make persistent db connection
> Feb 22 10:43:51 pg-hdp-node3 pgpool[7786]: [13-2] 2019-02-22 10:43:51: pid
> 7786: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432" failed
> Feb 22 10:43:51 pg-hdp-node3 pgpool[7786]: [14-1] 2019-02-22 10:43:51: pid
> 7786: LOG:  health check retrying on DB node: 1 (round:3)
> Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [15-1] 2019-02-22 10:43:54: pid
> 7786: LOG:  failed to connect to PostgreSQL server on
> "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error "Connection
> refused"
> Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [16-1] 2019-02-22 10:43:54: pid
> 7786: ERROR:  failed to make persistent db connection
> Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [16-2] 2019-02-22 10:43:54: pid
> 7786: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432" failed
> Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [17-1] 2019-02-22 10:43:54: pid
> 7786: LOG:  health check failed on node 1 (timeout:0)
> Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [18-1] 2019-02-22 10:43:54: pid
> 7786: LOG:  received degenerate backend request for node_id: 1 from pid
> [7786]
> Feb 22 10:43:54 pg-hdp-node3 pgpool[7746]: [253-1] 2019-02-22 10:43:54: pid
> 7746: LOG:  Pgpool-II parent process has received failover request
> Feb 22 10:43:54 pg-hdp-node3 pgpool[7746]: [254-1] 2019-02-22 10:43:54: pid
> 7746: LOG:  starting degeneration. shutdown host
> pg-hdp-node2.kitchen.local(5432)
> Feb 22 10:43:54 pg-hdp-node3 pgpool[7746]: [255-1] 2019-02-22 10:43:54: pid
> 7746: LOG:  Restart all children
> Feb 22 10:43:54 pg-hdp-node3 pgpool[7746]: [256-1] 2019-02-22 10:43:54: pid
> 7746: LOG:  execute command: /etc/pgpool-II/failover.sh 1
> pg-hdp-node2.kitchen.local 5432 /var/lib/pgsql/10/data 1 1 2
> pg-hdp-node3.kitchen.local 5432 /var/lib/pgsql/10/data
> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [257-1] 2019-02-22 10:43:55: pid
> 7746: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [258-1] 2019-02-22 10:43:55: pid
> 7746: LOG:  find_primary_node: checking backend no 0
> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [259-1] 2019-02-22 10:43:55: pid
> 7746: LOG:  find_primary_node: checking backend no 1
> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [260-1] 2019-02-22 10:43:55: pid
> 7746: LOG:  find_primary_node: checking backend no 2
> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [261-1] 2019-02-22 10:43:55: pid
> 7746: LOG:  find_primary_node: primary node id is 2
> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [262-1] 2019-02-22 10:43:55: pid
> 7746: LOG:  starting follow degeneration. shutdown host
> pg-hdp-node1.kitchen.local(5432)
> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [263-1] 2019-02-22 10:43:55: pid
> 7746: LOG:  starting follow degeneration. shutdown host
> pg-hdp-node2.kitchen.local(5432)
> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [264-1] 2019-02-22 10:43:55: pid
> 7746: LOG:  failover: 2 follow backends have been degenerated
> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [265-1] 2019-02-22 10:43:55: pid
> 7746: LOG:  failover: set new primary node: 2
> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [266-1] 2019-02-22 10:43:55: pid
> 7746: LOG:  failover: set new master node: 2
> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [267-1] 2019-02-22 10:43:55: pid
> 7746: LOG:  failover done. shutdown host pg-hdp-node2.kitchen.local(5432)
> Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [134-1] 2019-02-22 10:43:55:
> pid 12437: ERROR:  Failed to check replication time lag
> Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [134-2] 2019-02-22 10:43:55:
> pid 12437: DETAIL:  No persistent db connection for the node 1
> Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [134-3] 2019-02-22 10:43:55:
> pid 12437: HINT:  check sr_check_user and sr_check_password
> Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [134-4] 2019-02-22 10:43:55:
> pid 12437: CONTEXT:  while checking replication time lag
> Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [135-1] 2019-02-22 10:43:55:
> pid 12437: LOG:  worker process received restart request
> Feb 22 10:43:55 pg-hdp-node3 pgpool[12774]: [267-1] 2019-02-22 10:43:55:
> pid 12774: LOG:  failback event detected
> Feb 22 10:43:55 pg-hdp-node3 pgpool[12774]: [267-2] 2019-02-22 10:43:55:
> pid 12774: DETAIL:  restarting myself
> Feb 22 10:43:55 pg-hdp-node3 pgpool[12742]: [265-1] 2019-02-22 10:43:55:
> pid 12742: LOG:  start triggering follow command.
> Feb 22 10:43:55 pg-hdp-node3 pgpool[12742]: [266-1] 2019-02-22 10:43:55:
> pid 12742: LOG:  execute command: /etc/pgpool-II/follow_master.sh 0
> pg-hdp-node1.kitchen.local 5432 /var/lib/pgsql/10/data 1 1 2
> pg-hdp-node3.kitchen.local 5432 /var/lib/pgsql/10/data
> Feb 22 10:43:55 pg-hdp-node3 pgpool[12742]: [267-1] 2019-02-22 10:43:55:
> pid 12742: LOG:  execute command: /etc/pgpool-II/follow_master.sh 1
> pg-hdp-node2.kitchen.local 5432 /var/lib/pgsql/10/data 1 1 2
> pg-hdp-node3.kitchen.local 5432 /var/lib/pgsql/10/data
> Feb 22 10:43:56 pg-hdp-node3 pgpool[12436]: [60-1] 2019-02-22 10:43:56: pid
> 12436: LOG:  restart request received in pcp child process
> Feb 22 10:43:56 pg-hdp-node3 pgpool[7746]: [268-1] 2019-02-22 10:43:56: pid
> 7746: LOG:  PCP child 12436 exits with status 0 in failover()
>
> Pgpool self-explanatory failover.log
>
> 2019-02-22 10:43:54.893 PST Executing failover script ...
> 2019-02-22 10:43:54.895 PST Script arguments:
> failed_node_id           1
> failed_node_host         pg-hdp-node2.kitchen.local
> failed_node_port         5432
> failed_node_pgdata       /var/lib/pgsql/10/data
> old_primary_node_id      1
> old_master_node_id       1
> new_master_node_id       2
> new_master_node_host     pg-hdp-node3.kitchen.local
> new_master_node_port     5432
> new_master_node_pgdata   /var/lib/pgsql/10/data
> 2019-02-22 10:43:54.897 PST Primary node running on
> pg-hdp-node2.kitchen.local host is unresponsive or has died
> 2019-02-22 10:43:54.898 PST Attempting to stop primary node running on
> pg-hdp-node2.kitchen.local host before promoting slave as the new primary
> 2019-02-22 10:43:54.899 PST ssh -o StrictHostKeyChecking=no -i
> /var/lib/pgsql/.ssh/id_rsa postgres@pg-hdp-node2.kitchen.local -T
> /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data stop -m fast
> 2019-02-22 10:43:55.151 PST Promoting pg-hdp-node3.kitchen.local host as
> the new primary
> 2019-02-22 10:43:55.153 PST ssh -o StrictHostKeyChecking=no -i
> /var/lib/pgsql/.ssh/id_rsa postgres@pg-hdp-node3.kitchen.local -T
> /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data promote
> waiting for server to promote.... done
> server promoted
> 2019-02-22 10:43:55.532 PST Completed executing failover
>
> 2019-02-22 10:43:55.564 PST Executing follow master script ...
> 2019-02-22 10:43:55.566 PST Script arguments
> detached_node_id         0
> detached_node_host       pg-hdp-node1.kitchen.local
> detached_node_port       5432
> detached_node_pgdata     /var/lib/pgsql/10/data
> old_primary_node_id      1
> old_master_node_id       1
> new_master_node_id       2
> new_master_node_host     pg-hdp-node3.kitchen.local
> new_master_node_port     5432
> new_master_node_pgdata   /var/lib/pgsql/10/data
> 2019-02-22 10:43:55.567 PST Checking if server is running on
> pg-hdp-node1.kitchen.local host
> 2019-02-22 10:43:55.569 PST ssh -o StrictHostKeyChecking=no -i
> /var/lib/pgsql/.ssh/id_rsa postgres@pg-hdp-node1.kitchen.local -T
> /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data status
>
> pg_ctl: no server running
> 2019-02-22 10:43:55.823 PST Node on pg-hdp-node1.kitchen.local host is not
> running. It could be an old slave or primary that needs to be recovered.
> 2019-02-22 10:43:55.824 PST Completed executing follow master script
>
> 2019-02-22 10:43:55.829 PST Executing follow master script ...
> 2019-02-22 10:43:55.830 PST Script arguments
> detached_node_id         1
> detached_node_host       pg-hdp-node2.kitchen.local
> detached_node_port       5432
> detached_node_pgdata     /var/lib/pgsql/10/data
> old_primary_node_id      1
> old_master_node_id       1
> new_master_node_id       2
> new_master_node_host     pg-hdp-node3.kitchen.local
> new_master_node_port     5432
> new_master_node_pgdata   /var/lib/pgsql/10/data
> 2019-02-22 10:43:55.831 PST Detached node on pg-hdp-node2.kitchen.local
> host is the old primary node
> 2019-02-22 10:43:55.833 PST Slave can be created from old primary node by
> deleting PG_DATA directory under /var/lib/pgsql/10/data on
> pg-hdp-node2.kitchen.local host and re-running Chef client
> 2019-02-22 10:43:55.834 PST Slave can be recovered from old primary node by
> running /usr/pgsql-10/bin/pg_rewind -D /var/lib/pgsql/10/data
> --source-server="port=5432 host=pg-hdp-node3.kitchen.local" command on
> pg-hdp-node2.kitchen.local host as postgres user
> 2019-02-22 10:43:55.835 PST After successful pg_rewind run cp
> /var/lib/pgsql/10/data/recovery.done /var/lib/pgsql/10/data/recovery.conf,
> ensure host connection string points to pg-hdp-node3.kitchen.local, start
> PostgreSQL and attach it to pgpool
> 2019-02-22 10:43:55.836 PST Completed executing follow master script
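>
> Put together, the recovery path this log describes would run on
> pg-hdp-node2.kitchen.local as the postgres user roughly as follows (a sketch;
> the pcp_attach_node host, user and node id are placeholders):
>
> /usr/pgsql-10/bin/pg_rewind -D /var/lib/pgsql/10/data \
>     --source-server="port=5432 host=pg-hdp-node3.kitchen.local"
> cp /var/lib/pgsql/10/data/recovery.done /var/lib/pgsql/10/data/recovery.conf
> # ensure the host connection string points to pg-hdp-node3.kitchen.local, then:
> /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data start
> pcp_attach_node -h <pgpool_host> -U <pcp_user> -n <node_id>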
>
> On Thu, Feb 21, 2019 at 4:47 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
>
>> > Is this correct behavior?
>> >
>> > In a 3-node setup, node1 (primary) is shut down, failover is executed, and
>> > node2 becomes the new primary and node3 follows the new primary on node2.
>> > Now, node2 (the new primary) is shut down, failover is executed, and node3
>> > becomes the new primary, but follow_master_command is executed on node1
>> > even though it is reported as down.
>>
>> No. follow master command should not be executed on an already-down
>> node (in this case node1).
>>
>> > It happens that my script repoints node1 and restarts it, which breaks
>> > hell because node1 was never recovered after being shut down.
>> >
>> > I'm on PgPool 3.7.4.
>>
>> Can you share the log from when node2 was shut down to when node1 was
>> recovered by your follow master command?
>>
>> In the meantime, 3.7.4 is not the latest one. Can you try with the
>> latest one? (3.7.8)
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>
> --
>
> Andre Piwoni

_______________________________________________
pgpool-general mailing list
pgpool-general@pgpool.net
http://www.pgpool.net/mailman/listinfo/pgpool-general

--

Andre Piwoni

Sr. Software Developer, BI/Database

WebMD Health Services

Mobile: 801.541.4722

www.webmdhealthservices.com