<div dir="ltr">Hi and thanks<div><br></div><div>Here is the Postgres log from that time:</div><div><br></div><div><div>2018-04-26 20:38:10.225 CEST [23537] [unknown]@[unknown] LOG:  could not accept SSL connection: EOF detected</div><div>2018-04-26 20:59:34.856 CEST [27744] LOG:  trigger file found: /var/lib/postgresql/9.6/main/trigger</div><div>2018-04-26 20:59:34.856 CEST [27746] FATAL:  terminating walreceiver process due to administrator command</div><div>2018-04-26 20:59:34.857 CEST [27744] LOG:  invalid record length at 3/2133FD18: wanted 24, got 0</div><div>2018-04-26 20:59:34.857 CEST [27744] LOG:  redo done at 3/2133FCF0</div><div>2018-04-26 20:59:34.857 CEST [27744] LOG:  last completed transaction was at log time 2018-04-26 20:59:29.852716+02</div><div>2018-04-26 20:59:34.873 CEST [27744] LOG:  selected new timeline ID: 94</div><div>2018-04-26 20:59:34.994 CEST [27744] LOG:  archive recovery complete</div><div>2018-04-26 20:59:35.025 CEST [27744] LOG:  MultiXact member wraparound protections are now enabled</div><div>2018-04-26 20:59:35.034 CEST [25506] LOG:  autovacuum launcher started</div><div>2018-04-26 20:59:35.034 CEST [27743] LOG:  database system is ready to accept connections</div></div><div><br></div><div>

<span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">&gt; 2018-04-26 20:59:34.856 CEST [27744] LOG:  trigger file found: /var/lib/postgresql/9.6/main/trigger</span></div><div>-&gt; I assume this line comes from the standby, because there is no <span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">/var/lib/postgresql/9.6/main</span> directory on the master; its data is mounted somewhere else. The failover process started at 

<span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">20:59:29 on pgpool, and the standby got promoted.</span><br></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><span 
style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">&gt; 2018-04-26 20:38:10.225 CEST [23537] [unknown]@[unknown] LOG:  could not accept SSL connection: EOF detected</span>

</span> </span></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">This line looks odd, but it happened 20 minutes before the bug and has little to do with the health check process.<br></span></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">There is nothing else relevant in the Postgres logs.</span></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div>&gt; <span 
style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">there&#39;s no reason for the </span><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">health check process to not accept 127.0.0.1.</span>

</div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="font-size:12.8px">As I said, the health check process reaches PostgreSQL through the public IP, so it goes through a different interface.</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">At that time PostgreSQL was receiving ~5 inserts / second and that&#39;s all. No errors were detected on the apps.</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">So the only explanation I can find is a problem on the public interface of this server, but that would be really unusual, as it comes from a dedicated server provider.</span></div></div><div class="gmail_extra"><br><div class="gmail_quote">2018-04-27 0:55 GMT+02:00 Tatsuo Ishii <span dir="ltr">&lt;<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">&gt; Hi and thanks for your work.<br>
&gt; <br>
&gt; I use pgpool2 3.7.2 (latest git) with 2 backends in master-slave mode with<br>
&gt; native streaming replication.<br>
&gt; <br>
&gt; I think I have an issue concerning the health check process.<br>
&gt; <br>
&gt; Over the last two days I have had two &quot;health check timer expired&quot; messages, which appeared<br>
&gt; yesterday around 9 am and today around 8 pm.<br>
&gt; <br>
&gt; The weird thing is... Pgpool and the backend in question are on the same<br>
&gt; machine. This backend is the master. Here is the log:<br>
<br>
</span><span class="">&gt; Despite the fact that these are on the same machine, I use public IP for<br>
&gt; backend0 and not 127.0.0.1, because the failover process requires<br>
&gt; this IP.<br>
<br>
</span>Can you elaborate more? As far as I know, there&#39;s no reason for the<br>
health check process to not accept 127.0.0.1.<br>
<span class=""><br>
&gt; Do you think this could be caused by network conditions on the server<br>
&gt; itself, or is it an actual issue?<br>
<br>
</span>Yes. Was PostgreSQL busy at that time?<br>
<br>
Best regards,<br>
--<br>
Tatsuo Ishii<br>
SRA OSS, Inc. Japan<br>
English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_<wbr>en.php</a><br>
Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.<wbr>jp</a><br>
</blockquote></div><br></div>