[pgpool-hackers: 3212] Re: Deal with recovery failure by an abnormally exiting child process

Tatsuo Ishii ishii at sraoss.co.jp
Tue Jan 8 10:57:42 JST 2019


> In bug 431, it was reported that the second stage of recovery fails if
> a child process has exited abnormally (typically because of SIGKILL
> or a segfault). This is because the global connection counter
> (Req_info->conn_counter) is not decremented when a child process exits
> abnormally. In general there is nothing we can do about an abnormally
> exiting process, and we recommend restarting the whole of Pgpool-II in
> this case.
> 
> However, I have found a tricky solution for one particular situation:
> when client_idle_limit_in_recovery is properly set (i.e. it is enabled
> and client_idle_limit_in_recovery < recovery_timeout).
> 
> The logic is shown in the patch:
> 
> 	/*
> 	 * recovery_timeout has expired. Before returning with failure status,
> 	 * let's check whether this is caused by a malformed conn_counter. If a
> 	 * child process exits abnormally (killed by SIGKILL or SEGFAULT, for
> 	 * example), then conn_counter is not decremented at process exit, so it
> 	 * will never return to 0. This can be detected by checking whether
> 	 * client_idle_limit_in_recovery is enabled and set to a value less than
> 	 * recovery_timeout, because all clients must have been kicked out by the
> 	 * time client_idle_limit_in_recovery expires. If so, we should also
> 	 * reset conn_counter to 0.
> 
> Should we employ this? Is it too tricky? Comments are welcome.

Forgot to attach patch.
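
For readers without the attachment, the check described in the quoted
comment can be sketched roughly as follows. This is only an illustration of
the idea, not the contents of recovery.diff: the recovery_config struct and
both function names are hypothetical, the counter is passed in as a plain
int instead of Req_info->conn_counter, and only positive values of
client_idle_limit_in_recovery are treated as "enabled".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two pgpool.conf settings discussed above. */
typedef struct
{
	int		recovery_timeout;				/* seconds */
	int		client_idle_limit_in_recovery;	/* seconds; assumed enabled when > 0 */
} recovery_config;

/*
 * Called conceptually after recovery_timeout has expired. Returns true when
 * a counter that is still non-zero can only be a leftover from a child
 * process that exited abnormally: the idle limit is enabled and shorter than
 * recovery_timeout, so every live client must already have been kicked out.
 */
bool
conn_counter_is_stale(const recovery_config *cfg, int conn_counter)
{
	assert(cfg != NULL);

	if (conn_counter <= 0)
		return false;			/* nothing left over */

	return cfg->client_idle_limit_in_recovery > 0 &&
		cfg->client_idle_limit_in_recovery < cfg->recovery_timeout;
}

/*
 * Resets the counter to 0 when it is judged stale. Returns true if a reset
 * happened, false if the timeout should still be reported as a failure.
 */
bool
reset_stale_conn_counter(const recovery_config *cfg, int *conn_counter)
{
	assert(conn_counter != NULL);

	if (!conn_counter_is_stale(cfg, *conn_counter))
		return false;

	*conn_counter = 0;
	return true;
}
```

With recovery_timeout = 90 and client_idle_limit_in_recovery = 30, a counter
still at 2 after the timeout would be reset; with the idle limit disabled
(0), it is left alone and the recovery failure is reported as before.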
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: recovery.diff
Type: text/x-patch
Size: 1481 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20190108/d9669837/attachment.bin>

