<div dir="ltr">Hi<div><br></div><div>So on the 28th I set 

<span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">health_check_max_retries = 1 and had no problem for a whole day. So I set it back to <span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">health_check_max_retries = 0 yesterday (the 29th) to confirm the problem, and it hasn&#39;t shown up since then...</span></span></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></span></div><div><span 
style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">So I&#39;d like to think it was a network connection reset in my server&#39;s datacenter... The problem first appeared after a week of running with 

<span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">health_check_max_retries = 0<span> and no problem.</span></span></span></span></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><span><br></span></span></span></span></div><div><span style="font-size:12.8px">I&#39;m sorry I don&#39;t have the log anymore. 
I will wait until tomorrow with <span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">health_check_max_retries = 0<span>, but then I will set <span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">health_check_max_retries = 1 to start pre-prod testing.</span></span></span></span></div><div><span style="font-size:12.8px"><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><span><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></span></span></span></div><div><span style="font-size:12.8px"><span 
style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><span><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Thanks for your help, have a nice day!</span></span></span></span></div></div><div class="gmail_extra"><br><div class="gmail_quote">2018-04-28 2:10 GMT+02:00 Tatsuo Ishii <span dir="ltr">&lt;<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I noticed you set health_check_max_retries = 0. If the error were a<br>
transient one, setting health_check_max_retries to some positive number<br>
might help.<br>
<br>
Also, I am interested in an strace log from when the failover occurs.<br>
<span class="im HOEnZb"><br>
Best regards,<br>
--<br>
Tatsuo Ishii<br>
SRA OSS, Inc. Japan<br>
English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_<wbr>en.php</a><br>
Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.<wbr>jp</a><br>
<br>
</span><div class="HOEnZb"><div class="h5">&gt; Oh, I forgot the configuration. Here it is:<br>
&gt; <br>
&gt; health_check_period = 2<br>
&gt; health_check_timeout = 6<br>
&gt; health_check_max_retries = 0<br>
&gt; health_check_retry_delay = 1<br>
&gt; connect_timeout = 10000<br>
&gt; <br>
&gt; No individual health check settings.<br>
&gt; <br>
&gt; So of course I could increase connect_timeout, but 10 seconds is already a<br>
&gt; long time to wait before triggering the failover process on a production server<br>
&gt; receiving ~10 inserts per second.<br>
&gt; <br>
&gt; 2018-04-26 21:23 GMT+02:00 Bud Curly &lt;<a href="mailto:psyckow.prod@gmail.com">psyckow.prod@gmail.com</a>&gt;:<br>
&gt; <br>
&gt;&gt; Hi and thanks for your work.<br>
&gt;&gt;<br>
&gt;&gt; I use pgpool2 3.7.2 (latest git) with 2 backends in master-slave mode with<br>
&gt;&gt; native streaming replication.<br>
&gt;&gt;<br>
&gt;&gt; I think I have an issue concerning the health check process.<br>
&gt;&gt;<br>
&gt;&gt; In the last two days I have had two &quot;health check timer expired&quot; errors:<br>
&gt;&gt; one yesterday around 9 am and one today around 8 pm.<br>
&gt;&gt;<br>
&gt;&gt; The weird thing is that Pgpool and the backend in question are on the same<br>
&gt;&gt; machine. This backend is the master. Here is the log:<br>
&gt;&gt;<br>
&gt;&gt; 2018-04-26 20:59:29: pid 2153:LOG:  failed to connect to PostgreSQL server<br>
&gt;&gt; on &quot;x.x.x.x:xxx&quot; using INET socket<br>
&gt;&gt; 2018-04-26 20:59:29: pid 2153:DETAIL:  health check timer expired<br>
&gt;&gt; 2018-04-26 20:59:29: pid 2153:ERROR:  failed to make persistent db<br>
&gt;&gt; connection<br>
&gt;&gt; 2018-04-26 20:59:29: pid 2153:DETAIL:  connection to host:&quot; x.x.x.x:xxx&quot;<br>
&gt;&gt; failed<br>
&gt;&gt; 2018-04-26 20:59:29: pid 2153:LOG:  health check failed on node 0<br>
&gt;&gt; (timeout:1)<br>
&gt;&gt; 2018-04-26 20:59:29: pid 2153:LOG:  received degenerate backend request<br>
&gt;&gt; for node_id: 0 from pid [2153]<br>
&gt;&gt; 2018-04-26 20:59:29: pid 2104:LOG:  Pgpool-II parent process has received<br>
&gt;&gt; failover request<br>
&gt;&gt; 2018-04-26 20:59:29: pid 2104:LOG:  starting degeneration. shutdown host<br>
&gt;&gt; x.x.x.x:xxx<br>
&gt;&gt; 2018-04-26 20:59:29: pid 2104:LOG:  Restart all children<br>
&gt;&gt;<br>
&gt;&gt; Even though they are on the same machine, I use the public IP for<br>
&gt;&gt; backend0 and not 127.0.0.1, because the failover process requires<br>
&gt;&gt; this IP.<br>
&gt;&gt;<br>
&gt;&gt; Do you think this could be caused by network conditions on the server<br>
&gt;&gt; itself, or is it an actual issue?<br>
&gt;&gt;<br>
&gt;&gt; Thanks<br>
&gt;&gt;<br>
</div></div></blockquote></div><br></div>
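<div>The trade-off discussed in this thread (retrying the health check versus reacting quickly to a real failure) can be put into rough numbers. The sketch below is only an illustration: the function name is hypothetical and the formula is an assumption (every attempt runs into the full health_check_timeout, with health_check_retry_delay between attempts), not something taken from the Pgpool-II source.</div>

```python
# Rough upper bound on how long a failing health check can take before
# failover is requested. The formula is an assumption for illustration,
# not taken from the Pgpool-II source.

def worst_case_detection_seconds(timeout, max_retries, retry_delay):
    """Assume every attempt runs into the full timeout."""
    attempts = max_retries + 1  # the first try plus the retries
    return attempts * timeout + max_retries * retry_delay

# Values quoted in the thread: health_check_timeout = 6, health_check_retry_delay = 1.
no_retry = worst_case_detection_seconds(timeout=6, max_retries=0, retry_delay=1)
one_retry = worst_case_detection_seconds(timeout=6, max_retries=1, retry_delay=1)
print(no_retry, one_retry)  # prints "6 13"
```

<div>Under these assumptions, one retry roughly doubles the worst-case detection time, in exchange for tolerating a single transient connection failure.</div>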