<div dir="ltr">Hi<br><div class="gmail_extra"><br></div><div class="gmail_extra">Please see my response inline.</div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Apr 20, 2015 at 6:34 AM, Yugo Nagata <span dir="ltr">&lt;<a href="mailto:nagata@sraoss.co.jp" target="_blank">nagata@sraoss.co.jp</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Hi,<br>

<br>

It seems like a good idea, but I have some questions. </blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

I&#39;m not sure how this differs from using health_check_max_retries.<br></blockquote><div><br></div><div>health_check_max_retries only waits a fixed amount of time for the node to come back online, and it does not consider what made the node unavailable. Setting a larger value to cover transient errors also delays failover when a node has actually failed.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

How should NODE_TEMP_DOWN be defined? Should it include<br>

only the max_connections error, or also other cases such as health<br>

check errors within health_check_max_retries?<br></blockquote><div><br></div><div>I am thinking of using NODE_TEMP_DOWN only for temporary errors, where the PostgreSQL node is reachable but the connection is explicitly closed by the PostgreSQL server. At the moment I can only think of the max_connections-reached error, but I am sure there are other cases.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

How are NODE_TEMP_DOWN nodes treated by child processes?<br>

While the status is NODE_TEMP_DOWN, are children allowed to<br>

send queries to these nodes?<br></blockquote><div><br></div><div>I think child processes should treat it the same way as the NODE_DOWN status.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

How long does the NODE_TEMP_DOWN state last? Forever, until the<br>

health check succeeds again? Or should this be controlled<br>

by another parameter?<br></blockquote><div><br></div><div>This one needs a little more thought. Some of the options are: the node always remains NODE_TEMP_DOWN until it either comes back or dies permanently, or we control it with a new configuration parameter that puts a time limit on this status before the node is failed.</div><div><br></div><div><br></div><div>Thanks</div><div>Kind regards</div><div>Muhammad Usama</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<div class=""><div class="h5"><br>

On Fri, 17 Apr 2015 20:07:10 +0500<br>

Muhammad Usama &lt;<a href="mailto:m.usama@gmail.com">m.usama@gmail.com</a>&gt; wrote:<br>

<br>

&gt; Hi<br>

&gt;<br>

&gt; Currently pgpool-II does not discriminate between the types and nature of<br>

&gt; backend failures, especially when performing the backend health check, and<br>

&gt; it triggers the node failover as soon as the health check fails to connect<br>

&gt; to the backend PostgreSQL server (after the retries are exhausted, of course).<br>

&gt; This is a big problem in the case of transient failures. For example, if<br>

&gt; max_connections is reached on the backend node and the health check connection<br>

&gt; is denied, pgpool-II still considers it a backend node failure and goes on<br>

&gt; to trigger a failover, even though the node is actually working fine and<br>

&gt; pgpool-II child processes are successfully connected to it.<br>

&gt;<br>

&gt; So I think the pgpool-II health check should consider the cause and type of<br>

&gt; the error that happened on the backend, and depending on the error type it<br>

&gt; should either register the failover request, ignore the error, or maybe just<br>

&gt; change the backend node status. We could introduce a new node status to<br>

&gt; identify these situations (e.g. NODE_TEMP_DOWN) and a new configuration<br>

&gt; parameter to control the behavior of this state. Instead of initiating the<br>

&gt; failover on a node straight away, the health check would keep probing a node<br>

&gt; in this new NODE_TEMP_DOWN status and automatically make the node available<br>

&gt; again when the health check succeeds on it.<br>

&gt;<br>

&gt; Thoughts, suggestions and design ideas are most welcome<br>

&gt;<br>

&gt; Thanks<br>

&gt; Best regards!<br>

&gt; Muhammad Usama<br>

<br>

<br>

</div></div><span class=""><font color="#888888">--<br>

Yugo Nagata &lt;<a href="mailto:nagata@sraoss.co.jp">nagata@sraoss.co.jp</a>&gt;<br>

</font></span></blockquote></div><br></div></div>
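<div>A minimal sketch in C of the NODE_TEMP_DOWN state handling discussed in this thread. It assumes the health check can read the SQLSTATE code from the server&#39;s ErrorResponse, and every name in it (node_status, hc_result, next_status, is_transient_sqlstate, temp_down_limit) is hypothetical, not existing pgpool-II code. Only SQLSTATE 53300 (too_many_connections, which PostgreSQL reports when max_connections is reached) is treated as transient. A temp_down_limit of 0 models the first option above (the node stays NODE_TEMP_DOWN until it comes back or dies permanently); a positive value models the second (fail the node after a time limit).</div>

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical node states: NODE_TEMP_DOWN is the new status proposed in
 * this thread; NODE_UP and NODE_DOWN stand in for the existing states. */
typedef enum { NODE_UP, NODE_TEMP_DOWN, NODE_DOWN } node_status;

/* Hypothetical outcome of one health check, assumed to be evaluated once
 * the existing health_check_max_retries have been used up. */
typedef enum {
    HC_OK,          /* connection and authentication succeeded */
    HC_UNREACHABLE, /* no answer from the server (socket/network failure) */
    HC_REJECTED     /* server answered, then closed with an ErrorResponse */
} hc_result;

/* PostgreSQL reports SQLSTATE 53300 (too_many_connections) when
 * max_connections is reached; other transient codes could be added here. */
int is_transient_sqlstate(const char *sqlstate)
{
    return sqlstate != NULL && strcmp(sqlstate, "53300") == 0;
}

/* Compute the next node status from the current one.
 * temp_down_limit: hypothetical new parameter in seconds; 0 means the node
 * stays NODE_TEMP_DOWN until a check succeeds or the node is unreachable. */
node_status next_status(node_status cur, hc_result res, const char *sqlstate,
                        time_t now, time_t *temp_down_since,
                        int temp_down_limit)
{
    assert(temp_down_since != NULL);

    if (cur == NODE_DOWN)
        return NODE_DOWN; /* failover already done; reattaching is out of scope */

    switch (res) {
    case HC_OK:
        return NODE_UP; /* node is usable again */
    case HC_REJECTED:
        if (is_transient_sqlstate(sqlstate)) {
            if (cur != NODE_TEMP_DOWN) {
                *temp_down_since = now; /* start of the transient period */
                return NODE_TEMP_DOWN;
            }
            if (temp_down_limit > 0 &&
                difftime(now, *temp_down_since) >= temp_down_limit)
                return NODE_DOWN; /* transient state lasted too long: fail it */
            return NODE_TEMP_DOWN;
        }
        return NODE_DOWN; /* any other rejection: fail over as today */
    case HC_UNREACHABLE:
    default:
        return NODE_DOWN; /* node is really gone: fail over as today */
    }
}
```

<div>For example, with temp_down_limit set to 30, a node rejected with 53300 at t=100 becomes NODE_TEMP_DOWN, is still NODE_TEMP_DOWN at t=120, and is failed at t=130 if it keeps rejecting connections; a successful check before then returns it to NODE_UP without any failover.</div>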