<div dir="ltr">OK, so the problem showed up again. Here is what&#39;s new:<div><br></div><div>I wrote a small script (run remotely, from my PC) that performs &quot;SELECT 1&quot; on the same db with the same credentials, at a 2-second interval and with a 2-second timeout, using 1 pool, everything just like health_process.</div><div><br></div><div>During the bug, my script didn&#39;t show anything... Everything worked fine on it. Same for network monitoring, and there is nothing more relevant in the pgpool and postgresql logs...</div><div><br></div><div>I changed health_process&#39;s credentials and db; nothing changed.</div><div><br></div><div>Could someone tell me how to make pgpool&#39;s health_process verbose? (even if it requires changing the source)</div><div><br></div><div>The starting point is still this: </div><div><br></div><div><div>2018-04-27 18:32:16: pid 5983:LOG:  failed to connect to PostgreSQL server on &quot;x.x.x.x:xxxx&quot; using INET socket</div><div>2018-04-27 18:32:16: pid 5983:DETAIL:  health check timer expired</div></div><div><br></div><div>And I would like to have more details about this one.</div><div><br></div><div>Thanks.</div></div><div class="gmail_extra"><br><div class="gmail_quote">2018-04-27 16:48 GMT+02:00 Bud Curly <span dir="ltr">&lt;<a href="mailto:psyckow.prod@gmail.com" target="_blank">psyckow.prod@gmail.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">So I have reduced both timeouts to 2 seconds, like this:<div>

<span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><b>health_check_timeout</b></span><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"> = 2</span></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><b>connect_timeout</b></span><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"> = 2000</span>

<br></div><div><br></div><div>The problem seems to appear more frequently (twice since I made the change, 4 hours ago). Same logs on pgpool and postgresql, and same conditions (~5 inserts per second) during the problem.<div><div><br></div><div>On pgpool:</div><div><br></div><div><div>2018-04-27 16:26:27: pid 5983:LOG:  failed to connect to PostgreSQL server on &quot;x.x.x.x:xxxx&quot; using INET socket</div><div>2018-04-27 16:26:27: pid 5983:DETAIL:  health check timer expired</div><div>2018-04-27 16:26:27: pid 5983:ERROR:  failed to make persistent db connection</div><div>2018-04-27 16:26:27: pid 5983:DETAIL:  connection to host:&quot;

<span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">x.x.x.x:xxxx</span>&quot; failed</div><div>2018-04-27 16:26:27: pid 5983:LOG:  health check failed on node 0 (timeout:1)</div><div>2018-04-27 16:26:27: pid 5983:LOG:  received degenerate backend request for node_id: 0 from pid [5983]</div><div>2018-04-27 16:26:27: pid 5949:LOG:  Pgpool-II parent process has received failover request</div><div>2018-04-27 16:26:27: pid 5949:LOG:  starting degeneration. shutdown host x.x.x.x (xxxx)</div><div>2018-04-27 16:26:27: pid 5949:LOG:  Restart all children</div></div><div><br></div><div>On PostgreSQL :</div><div><br></div><div><div>2018-04-27 16:26:32.079 CEST [30525] LOG:  trigger file found: /var/lib/postgresql/9.6/main/<wbr>trigger</div><div>2018-04-27 16:26:32.079 CEST [30527] FATAL:  terminating walreceiver process due to administrator command</div><div>2018-04-27 16:26:32.080 CEST [30525] LOG:  invalid record length at 3/32229D10: wanted 24, got 0</div><div>2018-04-27 16:26:32.080 CEST [30525] LOG:  redo done at 3/32229CE8</div><div>2018-04-27 16:26:32.080 CEST [30525] LOG:  last completed transaction was at log time 2018-04-27 16:26:27.093816+02</div><div>2018-04-27 16:26:32.090 CEST [30525] LOG:  selected new timeline ID: 98</div><div>2018-04-27 16:26:32.215 CEST [30525] LOG:  archive recovery complete</div><div>2018-04-27 16:26:32.230 CEST [30525] LOG:  MultiXact member wraparound protections are now enabled</div><div>2018-04-27 16:26:32.237 CEST [30524] LOG:  database system is ready to accept connections</div><div>2018-04-27 16:26:32.238 CEST [31170] LOG:  autovacuum launcher started</div></div><div><br></div><div>On the 
master PostgreSQL, I set &quot;<b>log_min_error_statement</b> = debug5&quot; so if there were a problem with PostgreSQL, I should have noticed it.</div><div><br></div><div>There was nothing unusual in the TCP packets while I was monitoring.</div><div><br></div><div>I also monitored the network connection with a looped ping to x.x.x.x (public address) from the machine; there is no variation in delay during the problem...</div><div><br></div><div>For a second I thought it could be linked to the number of pool connections allowed on pgpool and on the backend, because of the connection monopolized by the health_check process:</div><div><br></div><div>- On pgpool:</div><div><br></div><div><div>num_init_children = 30</div><div>max_pool = 3</div></div><div><br></div><div>- On the PostgreSQL master:</div><div><br></div><div>max_connections = 100<br></div><div><br></div><div>I tried increasing these settings; this changed nothing...</div><div><br></div><div>I will try to simulate the health_check process with one pool and the same timeout, and check whether I find something.</div><div><br></div><div>But I have run out of ideas right now... 
If someone has any suggestions, I&#39;ll take them.</div><div><br></div><div>Thanks</div></div></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">2018-04-27 11:33 GMT+02:00 Bud Curly <span dir="ltr">&lt;<a href="mailto:psyckow.prod@gmail.com" target="_blank">psyckow.prod@gmail.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><b style="font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;color:rgb(0,0,0);font-family:&quot;Times New Roman&quot;;font-size:medium">&gt; </b><span style="font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;color:rgb(0,0,0)"><font face="arial, helvetica, sans-serif">So if we had health_check_hostname0, would it help you?</font><br><br class="m_3756096171255332015m_6058052673712875173gmail-Apple-interchange-newline">

</span></div>

<b style="font-variant-ligatures:normal;font-variant-caps:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;color:rgb(0,0,0);font-family:&quot;Times New Roman&quot;;font-size:medium"><span style="font-style:normal;color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;font-size:small;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">This could be a workaround in my case, but I believe this is a network issue on my server provider&#39;s side. I don&#39;t know much about their network structure, but the public IP address is set at a higher level through NAT and is not assigned to the network interface of the server itself. </span></b><div>With the tracepath command from the machine to its public IP, I found out that it goes through 8 nodes to get there.</div><div><br></div><div>So in general, using the public IP instead of loopback is not good in terms of performance for local services.</div><div><br></div><div>A setting that could interest me would be: recovery_hostname0, 

recovery_hostname1, etc. as I need the public IP only for standby to perform pgpool_recovery().</div><div><br></div><div>Thanks :)</div><div><div><div><b style="font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;color:rgb(0,0,0);font-family:&quot;Times New Roman&quot;;font-size:medium"><br class="m_3756096171255332015m_6058052673712875173gmail-Apple-interchange-newline">Tatsuo Ishii</b><span style="font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial;color:rgb(0,0,0);font-family:&quot;Times New Roman&quot;;font-size:medium;background-color:rgb(255,255,255);float:none;display:inline"><span> </span></span><a href="mailto:pgpool-general%40pgpool.net?Subject=Re:%20Re%3A%20%5Bpgpool-general%3A%206060%5D%20Re%3A%20pgpool-general%20Digest%2C%20Vol%2078%2C%20Issue%2019&amp;In-Reply-To=%3C20180427.174736.1303932718741225970.t-ishii%40sraoss.co.jp%3E" title="[pgpool-general: 6060] Re: pgpool-general Digest, Vol 78, Issue 19" style="color:rgb(17,85,204);font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);font-family:&quot;Times New Roman&quot;;font-size:medium" target="_blank">ishii at sraoss.co.jp<span> </span></a><br 
style="font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;color:rgb(0,0,0);font-family:&quot;Times New Roman&quot;;font-size:medium"><i style="font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;color:rgb(0,0,0);font-family:&quot;Times New Roman&quot;;font-size:medium">Fri Apr 27 17:47:36 JST 2018</i><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><span> </span></span><br style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><pre style="font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;white-space:pre-wrap;color:rgb(0,0,0)"><span>&gt;<i> Thanks for your support :)
</i></span>
You are welcome:-)

&gt;&gt;<i> Still I don&#39;t understand. Pgpool-II and PostgreSQL master are on the same
</i><span>&gt;<i> machine, that means you could set like &quot;backend_hostname0 = &quot;127.0.0.1&quot;.
</i>&gt;<i> 
</i>&gt;<i> Because I need the public address for the pgpool_recovery() method to permit
</i>&gt;<i> online recovery from remote nodes. And Pgpool-II, like the health_check
</i>&gt;<i> process, uses backend_hostname0
</i>&gt;<i> to do so.
</i></span>
Oh that makes sense.

&gt;<i> The setting health_check_hostname0 doesn&#39;t exist, though, so this is not a
</i>&gt;<i> workaround.
</i>
So if we had health_check_hostname0, would it help you?

&gt;<i> So according to the log, is the timeout error triggered by this
</i><span>&gt;<i> &quot;health_check_timeout = 6&quot; or this &quot;connect_timeout = 10000&quot; ?
</i></span>
I believe it is &quot;health_check_timeout = 6&quot;. The connect system call waits up to
10 seconds, but health_check_timeout expires before that.
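
A minimal Python sketch of the interplay described in this thread, not
Pgpool-II code (which is written in C): a non-blocking connect is awaited
with select() for up to connect_timeout while an alarm timer plays the role
of health_check_timeout. The names wait_writable, probe and
HealthCheckTimerExpired are hypothetical and exist only for this example.
It assumes a Unix system and must run in the main thread.

```python
import errno
import select
import signal
import socket


class HealthCheckTimerExpired(Exception):
    """Raised by the SIGALRM handler (hypothetical name, for illustration)."""


def _on_alarm(signum, frame):
    # Python retries select() after EINTR on its own (PEP 475), so the
    # handler raises to make the interruption visible to the caller.
    raise HealthCheckTimerExpired()


def wait_writable(sock, connect_timeout_s, health_check_timeout_s):
    """Wait until sock is writable, racing select() against an alarm.

    Returns "ready", "connect_timeout" or "health check timer expired".
    """
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, health_check_timeout_s)
    try:
        _, writable, _ = select.select([], [sock], [], connect_timeout_s)
        return "ready" if writable else "connect_timeout"
    except HealthCheckTimerExpired:
        return "health check timer expired"
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel a pending alarm
        signal.signal(signal.SIGALRM, previous)


def probe(host, port, connect_timeout_s, health_check_timeout_s):
    """Non-blocking TCP connect to host:port under both timers."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        rc = sock.connect_ex((host, port))
        if rc not in (0, errno.EINPROGRESS):
            return "connect failed: " + errno.errorcode.get(rc, str(rc))
        result = wait_writable(sock, connect_timeout_s, health_check_timeout_s)
        if result != "ready":
            return result
        # Writable only means the attempt finished; SO_ERROR says how.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err == 0:
            return "connected"
        return "connect failed: " + errno.errorcode.get(err, str(err))
    finally:
        sock.close()
```

For example, probe("127.0.0.1", 5432, 10.0, 6.0) mirrors the settings
discussed above (the port is an assumption, since the real one is masked in
the logs). Whichever timer is shorter decides which message comes back when
the connect attempt hangs.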

&gt;<i> I lowered both timeouts to 2 seconds and am monitoring network packets to find some
</i><span>&gt;<i> details... I&#39;ll keep you posted.
</i></span>
Thanks.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: <a href="http://www.sraoss.co.jp/index_en.php" style="color:rgb(17,85,204)" target="_blank">http://www.sraoss.co.jp/index_<wbr>en.php</a>
Japanese:<a href="http://www.sraoss.co.jp/" style="color:rgb(17,85,204)" target="_blank">http://www.sraoss.co.<wbr>jp</a></pre><br class="m_3756096171255332015m_6058052673712875173gmail-Apple-interchange-newline">

<br></div></div></div></div><div class="m_3756096171255332015HOEnZb"><div class="m_3756096171255332015h5"><div class="gmail_extra"><br><div class="gmail_quote">2018-04-27 10:44 GMT+02:00 Bud Curly <span dir="ltr">&lt;<a href="mailto:psyckow.prod@gmail.com" target="_blank">psyckow.prod@gmail.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">

<span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Thanks for your support :)</span><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><br></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">&gt; <span style="font-size:12.8px">Still I don&#39;t understand. 
Pgpool-II and PostgreSQL master are on the </span><span style="font-size:12.8px">same machine, that means you could set like &quot;backend_hostname0 = </span><span style="font-size:12.8px">&quot;127.0.0.1&quot;.</span></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px"><br></span></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px">Because I need the public address for the pgpool_recovery() method to permit online recovery from remote nodes. 
</span><span style="font-size:12.8px">And Pgpool-II, like the health_check process, uses <span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">backend_hostname0 to do so.</span></span></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px"><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></span></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px">The setting health_check_hostname0 doesn&#39;t exist, though, so this is not a workaround.</span></div><div 
style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px"><br></span></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px">So according to the log, is the timeout error triggered by this &quot;health_check_timeout = 6&quot; or this &quot;connect_timeout = 10000&quot;?</span></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px"><br></span></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px">I lowered both timeouts to 2 seconds and am monitoring network packets to find some details... I&#39;ll keep you posted.</span></div>

<br></div><div class="m_3756096171255332015m_6058052673712875173HOEnZb"><div class="m_3756096171255332015m_6058052673712875173h5"><div class="gmail_extra"><br><div class="gmail_quote">2018-04-27 3:15 GMT+02:00 Tatsuo Ishii <span dir="ltr">&lt;<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>&gt; The Pgpool-II health check process uses a non-blocking socket to connect<br>
&gt; to PostgreSQL. After issuing the connect system call, it waits for its<br>
&gt; completion using the select system call with a timeout: connect_timeout in<br>
&gt; pgpool.conf (in your case 10 seconds). On the other hand, health_check_timeout<br>
&gt; is 6 seconds. So after 6 seconds, an alarm interrupted the<br>
&gt; select system call and it returned with errno == EINTR, then the log<br>
&gt; was emitted. Not sure why the connect system call did not respond for 6<br>
&gt; seconds.<br>
&gt; <br>
&gt; That&#39;s all I know from the log.<br>
<br>
</span>If you want to investigate this, a packet dump is required.<br>
<div class="m_3756096171255332015m_6058052673712875173m_8626708393922286130HOEnZb"><div class="m_3756096171255332015m_6058052673712875173m_8626708393922286130h5"><br>
Best regards,<br>
--<br>
Tatsuo Ishii<br>
SRA OSS, Inc. Japan<br>
English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_<wbr>en.php</a><br>
Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.<wbr>jp</a><br>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>