[pgpool-hackers: 2213] RFC: New health check implementation

Fri Mar 31 15:39:24 JST 2017

Hi,

This is a Request for Comments on a new health check implementation
targeted at Pgpool-II 3.7.

Problems:

The current health check implementation processes all database nodes
serially, one after another. This imposes some limitations on the
health check process: 1) it is not possible to specify different
health check configuration values for each database node, 2) node
failure detection may take longer. For example, if it takes 10 seconds
to detect a failure of node 0, then detecting a failure of node 1 is
delayed by at least 10 seconds, because the check of node 1 starts
only after the check of node 0 has finished.
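
To make the delay concrete, here is a small sketch of the arithmetic.
It is only an illustration: the function names are hypothetical, and
it assumes each node's check takes a fixed number of seconds to report
a failure. For comparison it also shows the delays when each node is
checked independently of the others.

```python
def serial_detection_delays(check_seconds):
    """Seconds until each node's failure is detected when the nodes
    are checked one after another (hypothetical helper)."""
    delays = []
    elapsed = 0
    for seconds in check_seconds:
        elapsed += seconds          # node N waits for nodes 0..N-1
        delays.append(elapsed)
    return delays


def independent_detection_delays(check_seconds):
    """Seconds until each node's failure is detected when every node
    is checked on its own (hypothetical helper)."""
    return list(check_seconds)


# Two nodes, each needing 10 seconds before a failure is detected.
print(serial_detection_delays([10, 10]))       # [10, 20]
print(independent_detection_delays([10, 10]))  # [10, 10]
```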

The solution:

Allow health check parameters to be specified for each node.
pgpool.conf will look like:

health_check_period0 = 10
health_check_timeout0 = 20
:
:

where "0" means database node 0 (the same concept as the "backend_*0"
parameters).

To make the admin's life easier, the current parameters can still be
used as well. Suppose there are 3 nodes, and we have:

health_check_period = 10
health_check_period0 = 5

then health_check_period for nodes 1 and 2 will be 10, while
health_check_period for node 0 will be 5. So a parameter name without
a node id works as a "global variable", that is, as the default for
nodes that have no per-node setting.
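
The lookup rule could be sketched as follows. This is only an
illustration of the fallback, not the actual Pgpool-II configuration
code: the function name is hypothetical, and the configuration is
assumed to be already parsed into a plain dictionary.

```python
def resolve_health_check_param(conf, name, node_id):
    """Return the per-node value if present, else the global one.

    conf    -- parsed settings as a dict (assumption for this sketch)
    name    -- parameter name without node id, e.g. "health_check_period"
    node_id -- database node number
    """
    per_node_key = f"{name}{node_id}"
    if per_node_key in conf:
        return conf[per_node_key]
    return conf[name]               # the "global variable"


conf = {"health_check_period": 10, "health_check_period0": 5}
periods = [resolve_health_check_param(conf, "health_check_period", n)
           for n in range(3)]
print(periods)  # [5, 10, 10]
```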

The implementation:

Create a separate child process of the pgpool main process for each
database node and let it do the health check job for that node. Once
a health check child process detects a node failure, it signals the
main process, and the main process performs failover.
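
A rough sketch of this process layout is shown below. It is not the
proposed C implementation. It assumes a POSIX system (it uses the
"fork" start method) and uses a queue as a stand-in for the signal
sent to the main process. probe(), health_check_worker() and
supervise() are hypothetical names, and the "down" nodes are simulated
rather than detected over a real connection.

```python
import multiprocessing as mp


def probe(node_id, down_nodes):
    """Hypothetical stand-in for a real connection attempt."""
    return node_id not in down_nodes


def health_check_worker(node_id, down_nodes, max_checks, results):
    """Child process: checks exactly one node and reports to the parent."""
    healthy = True
    for _ in range(max_checks):
        if not probe(node_id, down_nodes):
            healthy = False
            break
        # A real worker would sleep for this node's
        # health_check_period here before checking again.
    results.put((node_id, healthy))  # stand-in for signalling main


def supervise(num_nodes, down_nodes, max_checks=3):
    """Main process: one child per node, collects failure reports."""
    ctx = mp.get_context("fork")     # POSIX only (assumption)
    results = ctx.Queue()
    workers = [
        ctx.Process(target=health_check_worker,
                    args=(n, down_nodes, max_checks, results))
        for n in range(num_nodes)
    ]
    for w in workers:
        w.start()
    failed = []
    for _ in range(num_nodes):
        node_id, healthy = results.get(timeout=10)
        if not healthy:
            failed.append(node_id)   # main would perform failover here
    for w in workers:
        w.join()
    return sorted(failed)


if __name__ == "__main__":
    print(supervise(3, {1}))
```

Because each node has its own child, a slow check on one node does not
hold up the checks on the others.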

This architecture makes the pgpool main process simpler and more
robust, while the health check child processes allow earlier
detection of node failure.

Comments and suggestions are welcome.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp