[pgpool-hackers: 3365] Re: Overhaul node failure debugging aid

Tatsuo Ishii ishii at sraoss.co.jp
Tue Aug 6 11:32:41 JST 2019


Done.

> check_backend_down_request() in health_check.c is intended to simulate
> a communication failure between the health check process and a
> PostgreSQL backend node. The failure is requested by creating a file
> containing lines like:
> 
> 1	down
> 
> where the first field is the node id (starting from 0), followed by a
> tab and "down". When the health check process finds the file, it makes
> the health check fail on node 1.
> 
> After the health check brings the node into down status,
> check_backend_down_request() changes "down" to "already_down" to
> prevent the node failure from being repeated.
> 
> However, the question is whether this is necessary at all. I think
> check_backend_down_request() should keep on reporting the down status
> rather than suppressing repeated node failures, and it should be
> called inside establish_persistent_connection(), because the failure
> situation can be simulated better this way. For example, the health
> check retry is currently not simulated, but the new way can do it.
> 
> Moreover, in the current watchdog implementation, bringing a node into
> quarantine state requires detecting a node communication error *two*
> times. Since check_backend_down_request() allows node down to be
> raised only *once* (after that the down state is changed to
> already_down state), it's impossible to test the watchdog quarantine
> using check_backend_down_request(). I changed
> check_backend_down_request() so that it continues to raise the "down"
> event as long as the down request file exists.
> 
> The attached patch tries to enhance check_backend_down_request() as
> described above.
> 
> 1) The caller of check_backend_down_request() is now
>    establish_persistent_connection(), rather than
>    do_health_check_child().
> 
> 2) check_backend_down_request() does not change "down" to
>    "already_down" anymore. This means that the second argument of
>    check_backend_down_request() is not useful anymore. Probably I
>    should remove the argument later on.
> 
> If there's no objection, I will commit/push this by the end of this
> weekend.
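
For readers following the thread, the behavior described above could be
sketched roughly as follows. This is only an illustration under stated
assumptions, not the attached patch and not pgpool-II source:
node_down_requested() and count_failed_attempts() are hypothetical names,
the file path is whatever the caller passes in, and the only parts taken
from the description are the "<node id><tab>down" line format and the
rule that the request stays in effect for as long as the file exists.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not pgpool-II code): return 1 if the down request
 * file at "path" contains a line "<node_id>\tdown", otherwise 0. A missing
 * file means no failure is requested. The file is only read, never
 * rewritten, so the request stays in effect until the file is removed.
 */
int
node_down_requested(const char *path, int node_id)
{
	FILE	   *fp;
	char		line[256];
	int			found = 0;

	assert(path != NULL);

	fp = fopen(path, "r");
	if (fp == NULL)
		return 0;

	while (fgets(line, sizeof(line), fp) != NULL)
	{
		char	   *tab = strchr(line, '\t');
		char	   *status;
		char	   *end;
		long		id;

		if (tab == NULL)
			continue;			/* not in "<id><tab><status>" form */

		*tab = '\0';			/* split the line at the tab */
		status = tab + 1;
		status[strcspn(status, "\r\n")] = '\0'; /* drop the line ending */

		id = strtol(line, &end, 10);
		if (end == line || *end != '\0' || id != node_id)
			continue;			/* bad id field or a different node */

		if (strcmp(status, "down") == 0)
		{
			found = 1;
			break;
		}
	}

	fclose(fp);
	return found;
}

/*
 * Hypothetical stand-in for a connection attempt with retries: each
 * attempt fails while the down request is present. Returns the number of
 * failed attempts out of (max_retries + 1) tries.
 */
int
count_failed_attempts(const char *path, int node_id, int max_retries)
{
	int			failed = 0;
	int			attempt;

	for (attempt = 0; attempt <= max_retries; attempt++)
	{
		if (!node_down_requested(path, node_id))
			break;				/* a real connection would be tried here */
		failed++;
	}
	return failed;
}
```

Because the sketch never rewrites "down" to "already_down", checking the
same file twice reports the failure twice, which is the property needed
to exercise both health check retries and the two-detection watchdog
quarantine case.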

