Thank you, Tatsuo. I would still say it "will go in a never-ending loop" if any slave stops responding (until it is alive again), as has been observed earlier, i.e.<div><br></div><div>
<span class="Apple-style-span" style="color:rgb(34,34,34);font-size:13px;font-family:Arial">pgpool.log</span></div><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
....<br>2013-04-04 12:34:41 DEBUG: pid 44263: retrying <b>10867</b> th health checking<br>2013-04-04 12:34:41 DEBUG: pid 44263: health_check: 0 th DB node status: 2<br>2013-04-04 12:34:41 DEBUG: pid 44263: pool_ssl: SSL requested but SSL support is not available<br>
2013-04-04 12:34:41 DEBUG: pid 44263: s_do_auth: auth kind: 0<br>2013-04-04 12:34:41 DEBUG: pid 44263: s_do_auth: backend key data received<br>2013-04-04 12:34:41 DEBUG: pid 44263: s_do_auth: transaction state: I<br>2013-04-04 12:34:41 DEBUG: pid 44263: health_check: 1 th DB node status: 2<br>
2013-04-04 12:34:41 ERROR: pid 44263: connect_inet_domain_socket: getsockopt() detected error: Connection refused<br>2013-04-04 12:34:41 ERROR: pid 44263: make_persistent_db_connection: connection to localhost(7445) failed<br>
2013-04-04 12:34:41 ERROR: pid 44263: health check failed. 1 th host localhost at port 7445 is down<br>2013-04-04 12:34:41 LOG: pid 44263: health_check: 1 failover is canceld because failover is disallowed<br>....<br>....</blockquote>
<div><br></div><div>AFAIU from our discussion, this is a feature, not a bug. In the presented scenario, if any slave goes down or goes missing (maybe because of a network issue), pgpool will be unresponsive to any new connection (with no warning or message) until the slave becomes available again. Do you agree? Thanks.</div>
<div><br><div><div>Best Regards,</div><div>Asif Naeem<br><br><div class="gmail_quote">On Tue, Apr 9, 2013 at 5:05 AM, Tatsuo Ishii <span dir="ltr"><<a href="mailto:ishii@postgresql.org" target="_blank">ishii@postgresql.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Well, "will go in never ending loop" is a slightly incorrect<br>
statement. What happens here is that pgpool tries to fail over every<br>
health_check_period, and the failover is canceled because the DISALLOW_TO_FAILOVER<br>
flag is set. This particular setup has at least two use cases:<br>
<br>
- PostgreSQL is protected by heartbeat/pacemaker or other HA (High<br>
Availability) software. When a PostgreSQL server fails, that software is<br>
responsible for failing the node over to the standby PostgreSQL. Once<br>
the PostgreSQL comes up, pgpool will start to accept connections<br>
from clients.<br>
<br>
- The admin wants to upgrade PostgreSQL immediately because of security<br>
issues with it (like the recent PostgreSQL ones). He stops the PostgreSQL<br>
servers one by one and upgrades them. While the admin stops PostgreSQL,<br>
pgpool refuses to accept connections from clients, and database consistency<br>
among the database nodes is safely kept. This minimizes the<br>
downtime.<br>
<br>
In summary, I see no point in changing the current behavior of pgpool.<br>
--<br>
Tatsuo Ishii<br>
SRA OSS, Inc. Japan<br>
English: <a href="http://www.sraoss.co.jp/index_en.php" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>
Japanese: <a href="http://www.sraoss.co.jp" target="_blank">http://www.sraoss.co.jp</a><br>
<div><div class="h5"><br>
> Hi Tatsuo Ishii,<br>
><br>
> By looking at the source code, it seems that the health check mechanism<br>
> depends on the failover options (fail_over_on_backend_error + backend_flag)<br>
> in non-parallel mode and goes into a never-ending loop if failover is<br>
> disabled (as I mentioned earlier in Issue#3 in my first email), i.e.<br>
><br>
> pgpool2/main.c<br>
><br>
>> /* do we need health checking for PostgreSQL? */<br>
>> if (pool_config->health_check_period > 0)<br>
>> {<br>
>> ...<br>
>> ...<br>
>>     if (POOL_DISALLOW_TO_FAILOVER(BACKEND_INFO(sts).flag))<br>
>>     {<br>
>>         pool_log("health_check: %d failover is canceld because failover is disallowed", sts);<br>
>>     }<br>
>>     else if (retrycnt <= pool_config->health_check_max_retries)<br>
>> ...<br>
>> ...<br>
>> }<br>
><br>
><br>
> It seems failover depends not only on the fail_over_on_backend_error<br>
> configuration option but on backend_flag as well. If<br>
> fail_over_on_backend_error is "on" but backend_flag is<br>
> "DISALLOW_TO_FAILOVER", failover will not be triggered for the related<br>
> slave node. On the other hand, if a child process finds a connection error<br>
> for any related node, it aborts. As you suggested earlier, it seems the<br>
> only appropriate thing to do when a connection error to any related node<br>
> is found is to fail over and restart all child processes.<br>
><br>
> In the example (Issue#3 in my first email) I mentioned earlier, there is a<br>
> dead end: pgpool goes into an endless loop and becomes unresponsive to new<br>
> connections with the following configuration settings, i.e.<br>
><br>
> pgpool.conf<br>
><br>
>> fail_over_on_backend_error = on<br>
>> backend_flag0 = 'DISALLOW_TO_FAILOVER'<br>
>> backend_flag1 = 'DISALLOW_TO_FAILOVER'<br>
>> health_check_period = 5<br>
>> health_check_timeout = 1<br>
>> health_check_retry_delay = 10<br>
><br>
><br>
> On each new connection,<br>
> new_connection()->notice_backend_error()->degenerate_backend_set()<br>
> gives the following warning, i.e.<br>
><br>
>> if (POOL_DISALLOW_TO_FAILOVER(BACKEND_INFO(node_id_set[i]).flag))<br>
>> {<br>
>>     pool_log("degenerate_backend_set: %d failover request from pid %d is canceld because failover is disallowed", node_id_set[i], getpid());<br>
>>     continue;<br>
>> }<br>
><br>
><br>
> As mentioned in the fail_over_on_backend_error documentation, failover can<br>
> happen even when fail_over_on_backend_error=off, when pgpool detects an<br>
> administrative shutdown of the postmaster, i.e.<br>
><br>
> <a href="http://www.pgpool.net/docs/latest/pgpool-en.html" target="_blank">http://www.pgpool.net/docs/latest/pgpool-en.html</a><br>
><br>
>> fail_over_on_backend_error V2.3 -<br>
>> If true, and an error occurs when reading/writing to the backend<br>
>> communication, pgpool-II will trigger the fail over procedure. If set to<br>
>> false, pgpool will report an error and disconnect the session. If you set<br>
>> this parameter to off, it is recommended that you turn on health checking.<br>
>> Please note that even if this parameter is set to off, however, pgpool will<br>
>> also do the fail over when pgpool detects the administrative shutdown of<br>
>> postmaster.<br>
>> You need to reload pgpool.conf if you change this value.<br>
><br>
><br>
> If failover/degeneration is the only way to handle the situation where a<br>
> slave node is unresponsive/crashed, etc., can't the code allow failover on<br>
> a connection error (even when it is disabled)? Thanks.<br>
><br>
> Best Regards,<br>
> Asif Naeem<br>
><br>
> On Wed, Apr 3, 2013 at 11:43 AM, Asif Naeem <<a href="mailto:anaeem.it@gmail.com">anaeem.it@gmail.com</a>> wrote:<br>
><br>
>> Hi,<br>
>><br>
>> We are facing an issue with the pgpool health check failsafe mechanism in a<br>
>> production environment. I have previously posted this issue at<br>
>> <a href="http://www.pgpool.net/mantisbt/view.php?id=50" target="_blank">http://www.pgpool.net/mantisbt/view.php?id=50</a>. I have observed 2 issues<br>
>> with pgpool-II version 3.2.3 (built with the latest source code), i.e.<br>
>><br>
>> Used versions i.e.<br>
>><br>
>>> pgpool-II version 3.2.3<br>
>>> postgresql 9.2.3 (Master + Slave)<br>
>><br>
>><br>
>> 1. In master-slave configuration, if health check and failover are enabled,<br>
>> i.e.<br>
>><br>
>> pgpool.conf<br>
>><br>
>>> backend_flag0 = 'ALLOW_TO_FAILOVER'<br>
>>> backend_flag1 = 'ALLOW_TO_FAILOVER'<br>
>>> health_check_period = 5<br>
>>> health_check_timeout = 1<br>
>>> health_check_max_retries = 2<br>
>>> health_check_retry_delay = 10<br>
>>> load_balance_mode = off<br>
>><br>
>> On Linux64, when the master server is running fine without load balancing,<br>
>> and a network interruption or some other failure suddenly occurs (I<br>
>> mimic the situation by forcefully shutting down the db server via immediate<br>
>> mode, etc.) so that pgpool cannot connect to the slave server, the first<br>
>> connection attempt to pgpool afterwards returns without an error/warning<br>
>> message, and pgpool fails over and kills all child processes. Does it make<br>
>> sense that, when there is no load balancing and the master db server is<br>
>> serving queries well, disconnection of the slave server triggers failover?<br>
>><br>
>> pgpool.log<br>
>><br>
>>> ....<br>
>>> 2013-04-02 17:24:36 DEBUG: pid 65431: I am 65431 accept fd 6<br>
>>> 2013-04-02 17:24:36 DEBUG: pid 65431: read_startup_packet:<br>
>>> application_name: psql<br>
>>> 2013-04-02 17:24:36 DEBUG: pid 65431: Protocol Major: 3 Minor: 0<br>
>>> database: postgres user: asif<br>
>>> 2013-04-02 17:24:36 DEBUG: pid 65431: new_connection: connecting 0 backend<br>
>>> 2013-04-02 17:24:36 DEBUG: pid 65431: new_connection: connecting 1 backend<br>
>>> 2013-04-02 17:24:36 ERROR: pid 65431: connect_inet_domain_socket:<br>
>>> getsockopt() detected error: Connection refused<br>
>>> 2013-04-02 17:24:36 ERROR: pid 65431: connection to localhost(7445) failed<br>
>>> 2013-04-02 17:24:36 ERROR: pid 65431: new_connection: create_cp() failed<br>
>>> 2013-04-02 17:24:36 LOG: pid 65431: degenerate_backend_set: 1 fail over<br>
>>> request from pid 65431<br>
>>> 2013-04-02 17:24:36 DEBUG: pid 65417: failover_handler called<br>
>>> 2013-04-02 17:24:36 DEBUG: pid 65417: failover_handler: starting to<br>
>>> select new master node<br>
>>> 2013-04-02 17:24:36 LOG: pid 65417: starting degeneration. shutdown<br>
>>> host localhost(7445)<br>
>>> 2013-04-02 17:24:36 LOG: pid 65417: Restart all children<br>
>>> 2013-04-02 17:24:36 DEBUG: pid 65417: failover_handler: kill 65418<br>
>>> 2013-04-02 17:24:36 DEBUG: pid 65417: failover_handler: kill 65419<br>
>>> 2013-04-02 17:24:36 DEBUG: pid 65417: failover_handler: kill 65420<br>
>>> 2013-04-02 17:24:36 DEBUG: pid 65417: failover_handler: kill 65421<br>
>>> 2013-04-02 17:24:36 DEBUG: pid 65417: failover_handler: kill 65422<br>
>>> 2013-04-02 17:24:36 DEBUG: pid 65417: failover_handler: kill 65423<br>
>>> 2013-04-02 17:24:36 DEBUG: pid 65417: failover_handler: kill 65424<br>
>>> 2013-04-02 17:24:36 DEBUG: pid 65417: failover_handler: kill 65425<br>
>>> 2013-04-02 17:24:36 DEBUG: pid 65417: failover_handler: kill 65426<br>
>>> 2013-04-02 17:24:36 DEBUG: pid 65417: failover_handler: kill 65427<br>
>>> 2013-04-02 17:24:36 DEBUG: pid 65417: failover_handler: kill 65428<br>
>>> 2013-04-02 17:24:36 DEBUG: pid 65417: failover_handler: kill 65429<br>
>>> ...<br>
>>> ...<br>
>><br>
>><br>
>> 2. In the same configuration as above, if I disable failover, i.e.<br>
>><br>
>> pgpool.conf<br>
>><br>
>>> backend_flag0 = 'DISALLOW_TO_FAILOVER'<br>
>>> backend_flag1 = 'DISALLOW_TO_FAILOVER'<br>
>>> health_check_period = 5<br>
>>> health_check_timeout = 1<br>
>>> health_check_max_retries = 2<br>
>>> health_check_retry_delay = 10<br>
>>> load_balance_mode = off<br>
>><br>
>><br>
>> On Linux64, when the master server is running fine with no load balancing<br>
>> and no failover, and the slave server suddenly appears to be disconnected<br>
>> because of a network interruption or some other reason (I mimic it by<br>
>> forcefully shutting down the db server via immediate mode, etc.), after<br>
>> that no connection attempt to pgpool succeeds until the health check<br>
>> completes, and the master database server log shows the following messages, i.e.<br>
>><br>
>> dbserver.log<br>
>> ...<br>
>> ...<br>
>> LOG: incomplete startup packet<br>
>> LOG: incomplete startup packet<br>
>> LOG: incomplete startup packet<br>
>> LOG: incomplete startup packet<br>
>> LOG: incomplete startup packet<br>
>> ...<br>
>><br>
>> 3. While testing this scenario on my Mac OS X machine (gcc), the health<br>
>> check never seems to complete and runs endlessly with the same pgpool<br>
>> configuration settings as in issue #2 above, and it completely prevents me<br>
>> from connecting to pgpool any more, i.e.<br>
>><br>
>> pgpool.log<br>
>><br>
>>> ...<br>
>>> ...<br>
</div></div>>>> 2013-04-03 11:29:29 DEBUG: pid 44263: retrying *679* th health checking<br>
<div class="im">>>> 2013-04-03 11:29:29 DEBUG: pid 44263: health_check: 0 th DB node status: 2<br>
>>> 2013-04-03 11:29:29 DEBUG: pid 44263: pool_ssl: SSL requested but SSL<br>
>>> support is not available<br>
>>> 2013-04-03 11:29:29 DEBUG: pid 44263: s_do_auth: auth kind: 0<br>
>>> 2013-04-03 11:29:29 DEBUG: pid 44263: s_do_auth: backend key data received<br>
>>> 2013-04-03 11:29:29 DEBUG: pid 44263: s_do_auth: transaction state: I<br>
>>> 2013-04-03 11:29:29 DEBUG: pid 44263: health_check: 1 th DB node status: 2<br>
>>> 2013-04-03 11:29:29 ERROR: pid 44263: connect_inet_domain_socket:<br>
>>> getsockopt() detected error: Connection refused<br>
>>> 2013-04-03 11:29:29 ERROR: pid 44263: make_persistent_db_connection:<br>
>>> connection to localhost(7445) failed<br>
>>> 2013-04-03 11:29:29 ERROR: pid 44263: health check failed. 1 th host<br>
>>> localhost at port 7445 is down<br>
>>> 2013-04-03 11:29:29 LOG: pid 44263: health_check: 1 failover is canceld<br>
>>> because failover is disallowed<br>
</div>>>> 2013-04-03 11:29:34 DEBUG: pid 44263: retrying *680* th health checking<br>
<div class="HOEnZb"><div class="h5">>>> 2013-04-03 11:29:34 DEBUG: pid 44263: health_check: 0 th DB node status: 2<br>
>>> 2013-04-03 11:29:34 DEBUG: pid 44263: pool_ssl: SSL requested but SSL<br>
>>> support is not available<br>
>>> 2013-04-03 11:29:34 DEBUG: pid 44263: s_do_auth: auth kind: 0<br>
>>> 2013-04-03 11:29:34 DEBUG: pid 44263: s_do_auth: backend key data received<br>
>>> 2013-04-03 11:29:34 DEBUG: pid 44263: s_do_auth: transaction state: I<br>
>>> 2013-04-03 11:29:34 DEBUG: pid 44263: health_check: 1 th DB node status: 2<br>
>>> 2013-04-03 11:29:34 ERROR: pid 44263: connect_inet_domain_socket:<br>
>>> getsockopt() detected error: Connection refused<br>
>>> 2013-04-03 11:29:34 ERROR: pid 44263: make_persistent_db_connection:<br>
>>> connection to localhost(7445) failed<br>
>>> 2013-04-03 11:29:34 ERROR: pid 44263: health check failed. 1 th host<br>
>>> localhost at port 7445 is down<br>
>>> 2013-04-03 11:29:34 LOG: pid 44263: health_check: 1 failover is canceld<br>
>>> because failover is disallowed<br>
>>> ...<br>
>>> ...<br>
>><br>
>><br>
>> I will try it on a Linux64 machine too. Thanks.<br>
>><br>
>> Best Regards,<br>
>> Asif Naeem<br>
>><br>
>><br>
</div></div></blockquote></div><br></div></div></div>