<div dir="ltr"><div class="gmail_default" style="font-family:courier new,monospace"><div class="gmail_default"><br></div><div class="gmail_default">...</div><div class="gmail_default">2019-02-09 05:17:35: pid 19275: LOG:  PCP process with pid: 9642 exit with SUCCESS.</div><div class="gmail_default">2019-02-09 05:17:35: pid 19275: LOG:  PCP process with pid: 9642 exits with status 0</div><div class="gmail_default">2019-02-09 05:17:35: pid 19275: LOG:  forked new pcp worker, pid=9644 socket=7</div><div class="gmail_default">2019-02-09 05:17:35: pid 19275: LOG:  PCP process with pid: 9644 exit with SUCCESS.</div><div class="gmail_default">2019-02-09 05:17:35: pid 19275: LOG:  PCP process with pid: 9644 exits with status 0</div><div class="gmail_default">2019-02-09 05:17:40: pid 19275: LOG:  forked new pcp worker, pid=9647 socket=7</div><div class="gmail_default">2019-02-09 05:17:40: pid 19275: LOG:  PCP process with pid: 9647 exit with SUCCESS.</div><div class="gmail_default">2019-02-09 05:17:40: pid 19275: LOG:  PCP process with pid: 9647 exits with status 0</div><div class="gmail_default">2019-02-09 05:17:40: pid 19275: LOG:  forked new pcp worker, pid=9649 socket=7</div><div class="gmail_default">2019-02-09 05:17:40: pid 19275: LOG:  PCP process with pid: 9649 exit with SUCCESS.</div><div class="gmail_default">2019-02-09 05:17:40: pid 19275: LOG:  PCP process with pid: 9649 exits with status 0</div><div class="gmail_default">2019-02-09 05:17:40: pid 19275: LOG:  forked new pcp worker, pid=9651 socket=7</div><div class="gmail_default">2019-02-09 05:17:40: pid 19275: LOG:  PCP process with pid: 9651 exit with SUCCESS.</div><div class="gmail_default">2019-02-09 05:17:40: pid 19275: LOG:  PCP process with pid: 9651 exits with status 0</div><div class="gmail_default">2019-02-09 05:17:46: pid 19275: LOG:  forked new pcp worker, pid=9655 socket=7</div><div class="gmail_default">2019-02-09 05:17:46: pid 19275: LOG:  PCP process with pid: 9655 exit with SUCCESS.</div><div 
class="gmail_default">2019-02-09 05:17:46: pid 19275: LOG:  PCP process with pid: 9655 exits with status 0</div><div class="gmail_default">2019-02-09 05:17:46: pid 19275: LOG:  forked new pcp worker, pid=9657 socket=7</div><div class="gmail_default">2019-02-09 05:17:46: pid 19275: LOG:  PCP process with pid: 9657 exit with SUCCESS.</div><div class="gmail_default">2019-02-09 05:17:46: pid 19275: LOG:  PCP process with pid: 9657 exits with status 0</div><div class="gmail_default">2019-02-09 05:17:46: pid 19275: LOG:  forked new pcp worker, pid=9659 socket=7</div><div class="gmail_default">2019-02-09 05:17:46: pid 19275: LOG:  PCP process with pid: 9659 exit with SUCCESS.</div><div class="gmail_default">2019-02-09 05:17:46: pid 19275: LOG:  PCP process with pid: 9659 exits with status 0</div><div class="gmail_default">2019-02-09 05:17:47: pid 19276: LOG:  received degenerate backend request for node_id: 1 from pid [19276]</div><div class="gmail_default">2019-02-09 05:17:47: pid 19402: LOG:  new IPC connection received</div><br class="gmail-Apple-interchange-newline"></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Feb 18, 2019 at 12:47 PM Pierre Timmermans &lt;<a href="mailto:ptim007@yahoo.com">ptim007@yahoo.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div class="gmail-m_-3548791530064573299ydp3f63e7a9yahoo-style-wrap" style="font-family:&quot;Helvetica Neue&quot;,Helvetica,Arial,sans-serif;font-size:16px"><div><div>Alexander, you should give more logs from the production set-up, in particular the lines before this one : </div><div><br></div><div><br></div><div><span><span style="font-family:&quot;Helvetica Neue&quot;,Helvetica,Arial,sans-serif">2019-02-09 05:17:47: pid 19402: LOG:  watchdog received the failover</span><br clear="none" style="font-family:&quot;Helvetica Neue&quot;,Helvetica,Arial,sans-serif"><span 
style="font-family:&quot;Helvetica Neue&quot;,Helvetica,Arial,sans-serif">command from local pgpool-II on IPC interface</span><br clear="none" style="font-family:&quot;Helvetica Neue&quot;,Helvetica,Arial,sans-serif"><span style="font-family:&quot;Helvetica Neue&quot;,Helvetica,Arial,sans-serif">2019-02-09 05:17:47: pid 19402: LOG:  watchdog is processing the failover</span></span><br></div><div><br></div><div><span><span style="color:rgb(0,0,0);font-family:&quot;Helvetica Neue&quot;,Helvetica,Arial,sans-serif;font-size:16px">Maybe there is an indication of what caused the failover. </span></span><br></div><div><br></div><div class="gmail-m_-3548791530064573299ydp3f63e7a9signature">Pierre</div></div>
        <div><br></div><div><br></div>
        
        </div><div id="gmail-m_-3548791530064573299ydpe82c5ef8yahoo_quoted_0799630009" class="gmail-m_-3548791530064573299ydpe82c5ef8yahoo_quoted">
            <div style="font-family:&quot;Helvetica Neue&quot;,Helvetica,Arial,sans-serif;font-size:13px;color:rgb(38,40,42)">
                
                <div>
                    On Monday, February 18, 2019, 3:00:22 PM GMT+1, Alexander Dorogensky &lt;<a href="mailto:amazinglifetime@gmail.com" target="_blank">amazinglifetime@gmail.com</a>&gt; wrote:
                </div>
                <div><br></div>
                <div><br></div>
                <div><div id="gmail-m_-3548791530064573299ydpe82c5ef8yiv7666916253"><div><div dir="ltr"><div class="gmail-m_-3548791530064573299ydpe82c5ef8yiv7666916253gmail_default" style="font-family:&quot;courier new&quot;,monospace">Any idea why pgpool doesn&#39;t retry?</div></div><br clear="none"><div class="gmail-m_-3548791530064573299ydpe82c5ef8yiv7666916253gmail_quote"><div class="gmail-m_-3548791530064573299ydpe82c5ef8yiv7666916253yqt6843868097" id="gmail-m_-3548791530064573299ydpe82c5ef8yiv7666916253yqtfd17826"><div class="gmail-m_-3548791530064573299ydpe82c5ef8yiv7666916253gmail_attr" dir="ltr">On Sun, Feb 17, 2019 at 6:19 PM Bo Peng &lt;<a shape="rect" href="mailto:pengbo@sraoss.co.jp" rel="nofollow" target="_blank">pengbo@sraoss.co.jp</a>&gt; wrote:<br clear="none"></div><blockquote class="gmail-m_-3548791530064573299ydpe82c5ef8yiv7666916253gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br clear="none">
<br clear="none">
I checked your pgpool.conf; you set<br clear="none">
<br clear="none">
  health_check_max_retries = 16<br clear="none">
<br clear="none">
So I think the following result is correct. <br clear="none">
<br clear="none">
&gt; &gt;&gt; &gt; psql -c &#39;pgpool show health_check_max_retries&#39;<br clear="none">
&gt; &gt;&gt; &gt; health_check_max_retries<br clear="none">
&gt; &gt;&gt; &gt; --------------------------<br clear="none">
&gt; &gt;&gt; &gt; 16<br clear="none">
&gt; &gt;&gt; &gt; (1 row)<br clear="none">
<br clear="none">
On Fri, 15 Feb 2019 07:12:56 -0500<br clear="none">
Alexander Dorogensky &lt;<a shape="rect" href="mailto:amazinglifetime@gmail.com" rel="nofollow" target="_blank">amazinglifetime@gmail.com</a>&gt; wrote:<br clear="none">
<br clear="none">
&gt; Hi,<br clear="none">
&gt; <br clear="none">
&gt; Do you have any ideas what’s going on?<br clear="none">
&gt; <br clear="none">
&gt; On Mon, Feb 11, 2019 at 8:30 PM Alexander Dorogensky &lt;<br clear="none">
&gt; <a shape="rect" href="mailto:amazinglifetime@gmail.com" rel="nofollow" target="_blank">amazinglifetime@gmail.com</a>&gt; wrote:<br clear="none">
&gt; <br clear="none">
&gt; &gt; Pgpool.conf from one of the app nodes is attached<br clear="none">
&gt; &gt;<br clear="none">
&gt; &gt; Thanks<br clear="none">
&gt; &gt;<br clear="none">
&gt; &gt; On Mon, Feb 11, 2019 at 6:59 PM Bo Peng &lt;<a shape="rect" href="mailto:pengbo@sraoss.co.jp" rel="nofollow" target="_blank">pengbo@sraoss.co.jp</a>&gt; wrote:<br clear="none">
&gt; &gt;<br clear="none">
&gt; &gt;&gt; Hi,<br clear="none">
&gt; &gt;&gt;<br clear="none">
&gt; &gt;&gt; On Mon, 11 Feb 2019 15:32:55 -0600<br clear="none">
&gt; &gt;&gt; Alexander Dorogensky &lt;<a shape="rect" href="mailto:amazinglifetime@gmail.com" rel="nofollow" target="_blank">amazinglifetime@gmail.com</a>&gt; wrote:<br clear="none">
&gt; &gt;&gt;<br clear="none">
&gt; &gt;&gt; &gt; I&#39;m running 4 app (pgpool 3.6.10) nodes and 2 db (postgres 9.6.9) nodes<br clear="none">
&gt; &gt;&gt; &gt; in a primary/standby configuration with streaming replication. All 6 nodes<br clear="none">
&gt; &gt;&gt; &gt; are separate machines.<br clear="none">
&gt; &gt;&gt; &gt;<br clear="none">
&gt; &gt;&gt; &gt; A client has had too many failovers caused by a flaky network, and in an<br clear="none">
&gt; &gt;&gt; &gt; effort to remedy the issue I set the following parameters:<br clear="none">
&gt; &gt;&gt; &gt;<br clear="none">
&gt; &gt;&gt; &gt; health_check_max_retries = 7<br clear="none">
&gt; &gt;&gt; &gt; health_check_retry_delay = 15<br clear="none">
&gt; &gt;&gt; &gt;<br clear="none">
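<div>As a rough sanity check on these two settings, the minimal Python sketch below computes a lower bound on how long the health check keeps retrying before failover is requested. It assumes pgpool-II sleeps health_check_retry_delay seconds before each of up to health_check_max_retries retries, as the pgpool-II documentation describes these parameters, and it ignores the time each connection attempt itself takes. The function name min_failover_delay is hypothetical.</div>
<pre>
```python
# Rough lower bound on how long pgpool-II keeps retrying a failed health
# check before it requests failover. Assumption (per the pgpool-II docs'
# description of these parameters): it sleeps health_check_retry_delay
# seconds before each of up to health_check_max_retries retries. The time
# each connection attempt itself takes is ignored, so real delays are longer.


def min_failover_delay(max_retries, retry_delay):
    """Lower bound, in seconds, from the first failed check to failover."""
    return max_retries * retry_delay


# Values from this thread: 7 is the value reported as set, 16 is what
# "pgpool show health_check_max_retries" returned on the app nodes.
for retries in (7, 16):
    seconds = min_failover_delay(retries, 15)
    print(f"health_check_max_retries = {retries}: at least {seconds} s of retrying")
```
</pre>
<div>With health_check_retry_delay = 15, this gives at least 105 s of retrying for 7 retries and at least 240 s for 16.</div>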
&gt; &gt;&gt; &gt; I have the client&#39;s environment, plus a lab environment where I try to<br clear="none">
&gt; &gt;&gt; &gt; reproduce the issue. The pgpool configuration and version are identical in both.<br clear="none">
&gt; &gt;&gt; &gt;<br clear="none">
&gt; &gt;&gt; &gt; To simulate a flaky network, I use iptables to deny postgres connections to<br clear="none">
&gt; &gt;&gt; &gt; one of the db nodes, and I see that pgpool on all app nodes tries to<br clear="none">
&gt; &gt;&gt; &gt; reconnect according to the configured number of retries and retry delay,<br clear="none">
&gt; &gt;&gt; &gt;<br clear="none">
&gt; &gt;&gt; &gt; &gt; i.e.<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-11 14:22:51: pid 7825: LOG:  failed to connect to PostgreSQL<br clear="none">
&gt; &gt;&gt; &gt; &gt; server on &quot;<a shape="rect" href="http://10.0.10.133:5433" rel="nofollow" target="_blank">10.0.10.133:5433</a>&quot;, getsockopt() detected error &quot;No route to host&quot;<br clear="none">
&gt; &gt;&gt; &gt; &gt; ...<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-11 14:23:23: pid 6458: LOG:  health checking retry count 1<br clear="none">
&gt; &gt;&gt; &gt; &gt; ...<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-11 14:23:38: pid 6458: LOG:  health checking retry count 2<br clear="none">
&gt; &gt;&gt; &gt; &gt; ...<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-11 14:42:45: pid 6458: LOG:  health checking retry count 3<br clear="none">
&gt; &gt;&gt; &gt; &gt; ...<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-11 14:43:00: pid 6458: LOG:  health checking retry count 4<br clear="none">
&gt; &gt;&gt; &gt; &gt; ...<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-11 14:43:15: pid 6458: LOG:  health checking retry count 5<br clear="none">
&gt; &gt;&gt; &gt; &gt; ...<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-11 14:43:30: pid 6458: LOG:  health checking retry count 6<br clear="none">
&gt; &gt;&gt; &gt; &gt; ...<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-11 14:43:30: pid 6460: LOG:  failover request from local pgpool-II<br clear="none">
&gt; &gt;&gt; &gt; &gt; node received on IPC interface is forwarded to master watchdog node<br clear="none">
&gt; &gt;&gt; &gt; &gt; &quot;<a shape="rect" href="http://172.20.20.173:5432" rel="nofollow" target="_blank">172.20.20.173:5432</a>&quot;<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-11 14:43:30: pid 4565: LOG:  watchdog received the failover<br clear="none">
&gt; &gt;&gt; &gt; &gt; command from remote pgpool-II node &quot;<a shape="rect" href="http://172.20.20.172:5432" rel="nofollow" target="_blank">172.20.20.172:5432</a>&quot;<br clear="none">
&gt; &gt;&gt; &gt; &gt; ...<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-11 14:43:30: pid 4563: LOG:  execute command:<br clear="none">
&gt; &gt;&gt; &gt; &gt; /etc/pgpool-II/failover.sh 0 10.0.10.133 5433 /opt/redsky/db/data 1 0<br clear="none">
&gt; &gt;&gt; &gt; &gt; 10.0.10.134 1 5433 /opt/redsky/db/data<br clear="none">
&gt; &gt;&gt; &gt; &gt;<br clear="none">
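<div>The spacing of the "health checking retry count" lines above can be compared with health_check_retry_delay = 15 using a short script. This is a minimal sketch that assumes the log line format quoted in this thread; the regular expression and the retry_gaps function are hypothetical, not part of pgpool-II.</div>
<pre>
```python
import re
from datetime import datetime

# Retry lines copied from the lab log quoted above (other lines elided).
LAB_LOG = """\
2019-02-11 14:23:23: pid 6458: LOG:  health checking retry count 1
2019-02-11 14:23:38: pid 6458: LOG:  health checking retry count 2
2019-02-11 14:42:45: pid 6458: LOG:  health checking retry count 3
2019-02-11 14:43:00: pid 6458: LOG:  health checking retry count 4
2019-02-11 14:43:15: pid 6458: LOG:  health checking retry count 5
2019-02-11 14:43:30: pid 6458: LOG:  health checking retry count 6
"""

# Captures the timestamp and the retry number from each retry line.
RETRY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): pid \d+: LOG:\s+"
    r"health checking retry count (\d+)$",
    re.MULTILINE,
)


def retry_gaps(log_text):
    """Return (retry_count, seconds_since_previous_retry) pairs."""
    stamps = [
        (int(count), datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"))
        for ts, count in RETRY_RE.findall(log_text)
    ]
    # Pair each retry with the one before it and measure the elapsed time.
    return [
        (count, int((ts - prev_ts).total_seconds()))
        for (_, prev_ts), (count, ts) in zip(stamps, stamps[1:])
    ]


for count, gap in retry_gaps(LAB_LOG):
    print(f"retry {count}: {gap} s after the previous retry")
```
</pre>
<div>On this excerpt, four of the five gaps equal the configured 15 s; the 1147 s gap between retries 2 and 3 does not match and would need the full log to explain.</div>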
&gt; &gt;&gt; &gt; However, in the client&#39;s environment, failover is initiated before the<br clear="none">
&gt; &gt;&gt; &gt; configured number of retries has been reached, i.e.<br clear="none">
&gt; &gt;&gt; &gt;<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-09 05:17:47: pid 19402: LOG:  watchdog received the failover<br clear="none">
&gt; &gt;&gt; &gt; &gt; command from local pgpool-II on IPC interface<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-09 05:17:47: pid 19402: LOG:  watchdog is processing the failover<br clear="none">
&gt; &gt;&gt; &gt; &gt; command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC<br clear="none">
&gt; &gt;&gt; &gt; &gt; interface<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-09 05:17:47: pid 19402: LOG:  forwarding the failover request<br clear="none">
&gt; &gt;&gt; &gt; &gt; [DEGENERATE_BACKEND_REQUEST] to all alive nodes<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-09 05:17:47: pid 19402: DETAIL:  watchdog cluster currently has 3<br clear="none">
&gt; &gt;&gt; &gt; &gt; connected remote nodes<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-09 05:17:47: pid 19276: ERROR:  unable to read data from DB node 1<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-09 05:17:47: pid 19276: DETAIL:  socket read failed with an error<br clear="none">
&gt; &gt;&gt; &gt; &gt; &quot;Success&quot;<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-09 05:17:47: pid 19400: LOG:  Pgpool-II parent process has<br clear="none">
&gt; &gt;&gt; &gt; &gt; received failover request<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-09 05:17:47: pid 19402: LOG:  new IPC connection received<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-09 05:17:47: pid 19402: LOG:  received the failover command lock<br clear="none">
&gt; &gt;&gt; &gt; &gt; request from local pgpool-II on IPC interface<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-09 05:17:47: pid 19402: LOG:  local pgpool-II node &quot;<a shape="rect" href="http://10.15.35.35:5432" rel="nofollow" target="_blank">10.15.35.35:5432</a>&quot;<br clear="none">
&gt; &gt;&gt; &gt; &gt; is requesting to become a lock holder for failover ID: 19880<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-09 05:17:47: pid 19402: LOG:  local pgpool-II node &quot;<a shape="rect" href="http://10.15.35.35:5432" rel="nofollow" target="_blank">10.15.35.35:5432</a>&quot;<br clear="none">
&gt; &gt;&gt; &gt; &gt; is the lock holder<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-09 05:17:47: pid 19400: LOG:  starting degeneration. shutdown host<br clear="none">
&gt; &gt;&gt; &gt; &gt; 10.38.135.137(5433)<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-09 05:17:47: pid 19400: LOG:  Restart all children<br clear="none">
&gt; &gt;&gt; &gt; &gt; 2019-02-09 05:17:47: pid 19400: LOG:  execute command:<br clear="none">
&gt; &gt;&gt; &gt; &gt; /etc/pgpool-II/failover.sh 1 10.38.135.137 5433 /opt/redsky/db/data 0 0<br clear="none">
&gt; &gt;&gt; &gt; &gt; 10.15.35.39 1 5433 /opt/redsky/db/data<br clear="none">
&gt; &gt;&gt; &gt; &gt;<br clear="none">
&gt; &gt;&gt; &gt; &gt;<br clear="none">
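<div>One way to follow up on the question of what caused the failover is to check whether any health-check retries were logged before the first failover-related message. The sketch below is hypothetical: the helper retries_before_failover and its marker strings are taken from the log lines quoted in this thread and are not an exhaustive list of pgpool-II failover messages.</div>
<pre>
```python
import re

# Hypothetical helper: given a pgpool-II log excerpt, report the highest
# "health checking retry count" seen before the first failover-related line.
# The marker strings below are taken from the log lines quoted in this
# thread; they are not an exhaustive list of pgpool-II failover messages.
RETRY_RE = re.compile(r"health checking retry count (\d+)")
FAILOVER_MARKERS = (
    "failover request",
    "failover command",
    "degenerate backend request",
    "starting degeneration",
)


def retries_before_failover(log_text):
    """Return (highest retry count before failover, first failover line or None)."""
    highest = 0
    for line in log_text.splitlines():
        # Stop at the first line that looks like a failover being triggered.
        if any(marker in line for marker in FAILOVER_MARKERS):
            return highest, line
        match = RETRY_RE.search(line)
        if match:
            highest = max(highest, int(match.group(1)))
    return highest, None


# Excerpts copied from the lab and client logs in this thread.
LAB_LOG = """\
2019-02-11 14:43:15: pid 6458: LOG:  health checking retry count 5
2019-02-11 14:43:30: pid 6458: LOG:  health checking retry count 6
2019-02-11 14:43:30: pid 6460: LOG:  failover request from local pgpool-II node received on IPC interface is forwarded to master watchdog node "172.20.20.173:5432"
"""
CLIENT_LOG = """\
2019-02-09 05:17:46: pid 19275: LOG:  PCP process with pid: 9659 exits with status 0
2019-02-09 05:17:47: pid 19276: LOG:  received degenerate backend request for node_id: 1 from pid [19276]
2019-02-09 05:17:47: pid 19402: LOG:  new IPC connection received
"""

for name, text in (("lab", LAB_LOG), ("client", CLIENT_LOG)):
    count, line = retries_before_failover(text)
    print(f"{name}: {count} retries logged before: {line}")
```
</pre>
<div>On these excerpts it reports 6 retries before the lab failover request and 0 before the client's degenerate backend request. In the client excerpts, that request comes from pid 19276, the same process that logged "unable to read data from DB node 1".</div>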
&gt; &gt;&gt; &gt; I ran the following command on all app nodes<br clear="none">
&gt; &gt;&gt; &gt;<br clear="none">
&gt; &gt;&gt; &gt; psql -c &#39;pgpool show health_check_max_retries&#39;<br clear="none">
&gt; &gt;&gt; &gt; health_check_max_retries<br clear="none">
&gt; &gt;&gt; &gt; --------------------------<br clear="none">
&gt; &gt;&gt; &gt; 16<br clear="none">
&gt; &gt;&gt; &gt; (1 row)<br clear="none">
&gt; &gt;&gt; &gt;<br clear="none">
&gt; &gt;&gt; &gt; and the number is different from what I have in the configuration file.<br clear="none">
&gt; &gt;&gt; &gt; It&#39;s more than 1, though, and I expect it to be honored.<br clear="none">
&gt; &gt;&gt;<br clear="none">
&gt; &gt;&gt; I could not reproduce this issue by using pgpool_setup.<br clear="none">
&gt; &gt;&gt; Could you share the whole pgpool.conf?<br clear="none">
&gt; &gt;&gt;<br clear="none">
&gt; &gt;&gt; &gt; Can you guys help me out? I&#39;m out of ideas.<br clear="none">
&gt; &gt;&gt; &gt;<br clear="none">
&gt; &gt;&gt; &gt; pgpool-II-pg96-3.6.10-1pgdg.rhel6.x86_64<br clear="none">
&gt; &gt;&gt;<br clear="none">
&gt; &gt;&gt;<br clear="none">
&gt; &gt;&gt; --<br clear="none">
&gt; &gt;&gt; Bo Peng &lt;<a shape="rect" href="mailto:pengbo@sraoss.co.jp" rel="nofollow" target="_blank">pengbo@sraoss.co.jp</a>&gt;<br clear="none">
&gt; &gt;&gt; SRA OSS, Inc. Japan<br clear="none">
&gt; &gt;&gt;<br clear="none">
&gt; &gt;&gt;<br clear="none">
<br clear="none">
<br clear="none">
-- <br clear="none">
Bo Peng &lt;<a shape="rect" href="mailto:pengbo@sraoss.co.jp" rel="nofollow" target="_blank">pengbo@sraoss.co.jp</a>&gt;<br clear="none">
SRA OSS, Inc. Japan<br clear="none">
<br clear="none">
</blockquote></div></div></div></div><div class="gmail-m_-3548791530064573299ydpe82c5ef8yqt6843868097" id="gmail-m_-3548791530064573299ydpe82c5ef8yqtfd69431">_______________________________________________<br clear="none">pgpool-general mailing list<br clear="none"><a shape="rect" href="mailto:pgpool-general@pgpool.net" rel="nofollow" target="_blank">pgpool-general@pgpool.net</a><br clear="none"><a shape="rect" href="http://www.pgpool.net/mailman/listinfo/pgpool-general" rel="nofollow" target="_blank">http://www.pgpool.net/mailman/listinfo/pgpool-general</a><br clear="none"></div></div>
            </div>
        </div></div></blockquote></div>