[pgpool-hackers: 876] Making Failover more robust.

Sat Apr 18 00:07:10 JST 2015

Hi

Currently pgpool-II does not distinguish between the types and nature of
backend failures, especially when performing the backend health check. It
triggers a node failover as soon as the health check fails to connect to
the backend PostgreSQL server (after the retries are exhausted, of course).
This is a big problem for transient failures. For example, if
max_connections is reached on the backend node and the health check
connection is denied, pgpool-II still treats it as a backend node failure
and goes on to trigger a failover, even though the node is actually
working fine and pgpool-II child processes are successfully connected to
it.

So I think the pgpool-II health check should consider the cause and type
of the error that occurred on the backend and, depending on the type of
error, either register the failover request, ignore the error, or maybe
just change the backend node status. We could introduce a new node status
to identify these kinds of situations (e.g. NODE_TEMP_DOWN) and a new
configuration parameter to control the behavior of this state. Then,
instead of initiating the failover on a node straight away, the health
check would keep probing the node while it is in this new NODE_TEMP_DOWN
status and automatically make the node available again once the health
check succeeds on it.
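
To make the idea a bit more concrete, here is a rough sketch of the
decision logic in Python. It is only an illustration, not pgpool-II code:
the names NodeStatus, next_status and temp_down_enabled are made up for
this example, and retry handling is left out. The SQLSTATE values 53300
(too_many_connections) and 57P03 (cannot_connect_now) are real PostgreSQL
error codes, but which codes should count as transient is an open design
question.

```python
from enum import Enum
from typing import Optional


class NodeStatus(Enum):
    UP = "NODE_UP"
    TEMP_DOWN = "NODE_TEMP_DOWN"  # proposed new status
    DOWN = "NODE_DOWN"


# Errors reported by a server that is alive but refusing this connection.
# Assumption: this particular set is only a starting point for discussion.
TRANSIENT_SQLSTATES = {
    "53300",  # too_many_connections (max_connections reached)
    "57P03",  # cannot_connect_now (e.g. server is still starting up)
}


def next_status(
    current: NodeStatus,
    connected: bool,
    sqlstate: Optional[str],
    temp_down_enabled: bool = True,  # hypothetical configuration parameter
) -> NodeStatus:
    """Decide the node status after one health check attempt.

    connected -- True if the health check connection succeeded
    sqlstate  -- SQLSTATE returned by the server on failure, or None if
                 the server gave no answer at all (refused, timed out)
    """
    if current is NodeStatus.DOWN:
        # Assumption: a node that was already failed over is re-attached
        # by the administrator, not by the health check.
        return NodeStatus.DOWN
    if connected:
        # Covers recovery from NODE_TEMP_DOWN without any failover.
        return NodeStatus.UP
    if temp_down_enabled and sqlstate in TRANSIENT_SQLSTATES:
        # Server is alive but rejected us: keep probing, do not fail over.
        return NodeStatus.TEMP_DOWN
    # Anything else is treated as a real failure (failover request).
    return NodeStatus.DOWN
```

With this logic, a health check rejected with 53300 moves the node from
NODE_UP to NODE_TEMP_DOWN, and the next successful check moves it back to
NODE_UP. A refused connection with no SQLSTATE still ends in NODE_DOWN,
which is where the failover request would be registered.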

Thoughts, suggestions and design ideas are most welcome.

Thanks
Best regards!
Muhammad Usama