[pgpool-hackers: 881] Re: Making Failover more robust.

Mon Apr 20 10:34:34 JST 2015

Hi,

It seems like a good idea, but I have some questions.

I'm not sure how this differs from using health_check_max_retries.

How should NODE_TEMP_DOWN be defined? Should it cover only the
max_connections error, or also other cases such as health
check errors within health_check_max_retries?

How are NODE_TEMP_DOWN nodes treated by child processes?
While the status is NODE_TEMP_DOWN, are children allowed to
send queries to these nodes?

How long does the NODE_TEMP_DOWN state last? Forever, until the
health check succeeds again? Or should this be controlled
by another parameter?
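
To make these questions concrete, here is a rough sketch of one
possible set of answers. Everything in it is hypothetical and not
existing pgpool-II code: the NodeStatus and HealthResult enums, the
Node struct, the function names, the allow_queries_in_temp_down flag
and the max_temp_down_checks limit are all made up for illustration.
The only external fact it relies on is that PostgreSQL reports
too_many_connections with SQLSTATE 53300. It assumes that only this
error is treated as transient, and that a node already marked down
stays down until it is re-attached by some other means.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node states; NODE_TEMP_DOWN is the proposed new status. */
typedef enum { NODE_UP, NODE_TEMP_DOWN, NODE_DOWN } NodeStatus;

/* Hypothetical outcome of a single health check attempt. */
typedef enum { HC_OK, HC_TRANSIENT, HC_FATAL } HealthResult;

typedef struct {
    NodeStatus status;
    int temp_down_checks;   /* consecutive transient failures so far */
} Node;

/*
 * Classify a failed health check connection by the SQLSTATE the backend
 * returned, or NULL if the backend gave no answer at all.
 * Assumption: only too_many_connections (53300) counts as transient.
 */
HealthResult classify_failure(const char *sqlstate)
{
    if (sqlstate != NULL && strcmp(sqlstate, "53300") == 0)
        return HC_TRANSIENT;
    return HC_FATAL;
}

/*
 * Apply one health check result to a node and return its new status.
 * max_temp_down_checks is a hypothetical parameter: after that many
 * consecutive transient failures the node goes down for real
 * (0 means NODE_TEMP_DOWN may last forever).
 */
NodeStatus on_health_check(Node *node, HealthResult result,
                           int max_temp_down_checks)
{
    if (node->status == NODE_DOWN)
        return NODE_DOWN;           /* needs explicit re-attach */

    switch (result) {
    case HC_OK:
        node->status = NODE_UP;
        node->temp_down_checks = 0;
        break;
    case HC_TRANSIENT:
        node->status = NODE_TEMP_DOWN;
        node->temp_down_checks++;
        if (max_temp_down_checks > 0 &&
            node->temp_down_checks >= max_temp_down_checks)
            node->status = NODE_DOWN;   /* failover would be requested here */
        break;
    case HC_FATAL:
        node->status = NODE_DOWN;       /* failover would be requested here */
        break;
    }
    return node->status;
}

/* Whether child processes may send queries to the node. */
bool node_accepts_queries(const Node *node, bool allow_queries_in_temp_down)
{
    if (node->status == NODE_UP)
        return true;
    return node->status == NODE_TEMP_DOWN && allow_queries_in_temp_down;
}
```

With a limit of 2, for example, a node would go NODE_UP ->
NODE_TEMP_DOWN on the first transient failure, back to NODE_UP if the
next check succeeds, and to NODE_DOWN after two transient failures in
a row. Whether that limit should be a new parameter or reuse
health_check_max_retries is exactly the question above.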

On Fri, 17 Apr 2015 20:07:10 +0500
Muhammad Usama <m.usama at gmail.com> wrote:

> Hi
> 
> Currently pgpool-II does not discriminate between the types and nature of
> backend failures, especially when performing the backend health check, and
> it triggers node failover as soon as the health check fails to connect
> to the backend PostgreSQL server (after the retries are exhausted, of
> course). This is a big problem for transient failures. For example, if
> max_connections is reached on the backend node and the health check
> connection is denied, pgpool-II will still consider it a backend node
> failure and go on to trigger a failover, despite the fact that the
> node is actually working fine and pgpool-II child processes are
> successfully connected to it.
> 
> So I think the pgpool-II health check should consider the cause and type of
> error that happened on the backend and, depending on the type of error, it
> should either register the failover request, ignore the error, or maybe just
> change the backend node status. We could introduce a new node status to
> identify these types of situations (e.g. NODE_TEMP_DOWN) and have a new
> configuration parameter to control the behavior of this state. Instead
> of initiating failover on a node straight away, the health check would keep
> probing the node with this new NODE_TEMP_DOWN status and automatically
> make the node available when the health check succeeds on it.
> 
> Thoughts, suggestions and design ideas are most welcome
> 
> Thanks
> Best regards!
> Muhammad Usama

-- 
Yugo Nagata <nagata at sraoss.co.jp>