<div dir="ltr">Thank you very much for the information.<div><br></div><div>I see pg-pool on node 0 detecting that it cannot connect to node 1,</div><div>Interrupted system call message</div><div>failed to make persistent db connection</div><div>and then pg-pool on node 0 being restarted</div><div><br></div><div>A few clarifying questions:</div><div><br></div><div>1) What happens if, after pg-pool restarts, the slave node (1) remains down for an extended time?  Will pg-pool on node 0 stop attempting connections to node 1 postgres until the health check passes, or will it keep trying regardless?   It looks like it continues to try to establish a connection to node 1 postgres.</div><div><br></div><div>2) Possibly related to question 1: we notice that prior developers configured our /etc/sysconfig/pgpool with OPTS=&quot;-n -D&quot;.  Since it looks like pg-pool uses the status file to carry node status across restarts, and that file is discarded with -D, will this cause failover issues because the node 1 down status is lost on restart?</div><div><br></div><div>3) Throughout this troubleshooting, we have found that, without any pgpool.conf changes, if node 1 is disconnected via that hard failure, node 0 will &quot;hang&quot;, blocking all DB connections, and then at about the 16 minute mark it frees up and starts processing DB requests again.   Where is this &quot;16 minute&quot; hang originating from?  It is consistent and repeatable whenever we repeat the failure scenario without pgpool.conf changes.</div><div><br></div><div>Thank you for your help.  It has been challenging to get seamless operation when the standby/slave node fails hard. 
</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jan 17, 2018 at 11:47 PM, Tatsuo Ishii <span dir="ltr">&lt;<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">&gt; We have 2 nodes running.   The stack on each node is:<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; Our app<br>

&gt;<br>

&gt; HikariCP<br>

&gt;<br>

&gt; Jdbc<br>

&gt;<br>

&gt; Pg-pool 3.6.7<br>

&gt;<br>

&gt; Postgres 9.6.6<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; Pgpool is set for Master-Slave with streaming replication; no load<br>

&gt; balancing.<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; We are testing our disaster recovery and failover capabilities.    If we<br>

&gt; gracefully shut down node 1 (2nd node), the 1st node proceeds as if nothing<br>

&gt; happened.  The app continues to run without missing a beat. As you would<br>

&gt; expect.<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; Our problem is when we encounter a “hard” error.  If node 1 becomes<br>

&gt; disconnected (network is removed), node 0 becomes impacted.   The app will<br>

&gt; freeze up as it can no longer get database connections.   We see the<br>

&gt; app/spring talk to Hikari, Hikari talks to jdbc, jdbc cannot get connection<br>

&gt; , eventually Hikari times out (with 30 sec connection wait) and replies to<br>

&gt; app and we get exceptions.  This repeats as the app continues to try to talk<br>

&gt; to the database.    Pgpool is aware that the node1 is gone as it is in<br>

&gt; recovery mode and node 0 pgpool retries to establish connectivity to pgpool<br>

&gt; on node 1 per pgpool.conf intervals.<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; So the thing that really has us stumped is if node 0 is only talking<br>

&gt; through its stack to node 0 postgres, why is this failure on node1 having<br>

&gt; any impact on node 0 and freezing the db connections?    Obviously when a<br>

&gt; graceful shutdown occurs pgpool gracefully handles this and things work as<br>

&gt; you expect.  With a hard failure, it does not.    I have attached our<br>

&gt; pgpool.conf file.   Can someone provide some guidance into the internals of<br>

&gt; pgpool and why this node1 hard failure causes node 0 impacts?<br>

<br>

</div></div>Pgpool-II connects to all PostgreSQL even if load_balance_mode = off.<br>

There have been ongoing discussions about making Pgpool-II connect to only<br>

1 backend, but this is not implemented yet.<br>

<br>

If you want to shorten the &quot;black period&quot; (that is, while Pgpool-II is working<br>

on failover), you can adjust the health check parameters and the failover<br>

related parameters.<br>

<br>

Changing fail_over_on_backend_error from off to on will cause immediate<br>

failover if there&#39;s a problem connecting to a backend or reading from or<br>

writing to its socket.<br>

<br>

With health_check_period = 40, it may take up to 40 seconds before Pgpool-II<br>

notices the error. So you might want to shorten this.<br>

<br>

With health_check_timeout = 10, it may take up to 10 seconds before Pgpool-II<br>

notices the error. So you might want to shorten this.<br>

<br>

With health_check_max_retries = 3, Pgpool-II could keep retrying before it gives up, for up to<br>

health_check_timeout*health_check_max_retries = 30 seconds.<br>
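<br>

As a rough sketch of the adjustments described above, the relevant pgpool.conf<br>

lines might look like the following. The values are illustrative assumptions,<br>

not tested recommendations; only the parameter names come from this thread.<br>

<pre>
# Fail over immediately on connect or socket read/write errors (was off).
fail_over_on_backend_error = on

# Seconds between health checks (was 40).
health_check_period = 10

# Seconds before a single health check gives up (was 10).
health_check_timeout = 5

# Retries before giving up; with the timeout above, retries can add
# up to 5 * 3 = 15 seconds.
health_check_max_retries = 3
</pre>

Shorter values detect a dead node sooner, at the cost of a higher chance of<br>

failing over on a brief network glitch.<br>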

<br>

Best regards,<br>

--<br>

Tatsuo Ishii<br>

SRA OSS, Inc. Japan<br>

English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_<wbr>en.php</a><br>

Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.<wbr>jp</a><br>

<br>

</blockquote></div><br></div>