[pgpool-hackers: 3476] Re: Proposal: health check statistics

Tue Dec 10 11:28:06 JST 2019

> On Tue, 10 Dec 2019 09:18:58 +0900 (JST)
> Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> Currently Pgpool-II's health check process logs various information,
>> including backend connection problems, retries to recover from them,
>> and so on. This information is very important for users because it
>> reports health problems of PostgreSQL. For example, observing an
>> increase in the retry count may suggest that the network connection
>> between Pgpool-II and PostgreSQL is having trouble, so that users
>> could replace the switch before an actual failure occurs. The problem
>> is that it is annoying to look for such information in log files
>> afterward, since it may have already disappeared or may not have been
>> logged because of other problems (such as a full disk).
>> 
>> I would like to propose a new feature:
>> 
>> - Accumulate health check statistics on shared memory so that later on
>>   users can look into the stats using PCP commands.
>> 
>> - Such statistics include:
>>   - failure count per backend node
>>   - retry count per backend node
>>   - success count after retries
> 
> How about collecting statistics on response time? For example:
> - average response time per backend node
> - maximum response time of a successful check
> 
> If these are available, it may help users tune timeout values in
> configurations.

That makes sense, although "response time" is divided into multiple
phases: connecting to the backend, sending the startup packet, and
reading the response from the backend.
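
To make the discussion concrete, here is a rough sketch of what the
per-node counters could look like. It is only an illustration, not
existing Pgpool-II code: the names HealthCheckStats, MAX_NODES,
record_health_check and average_duration_us are all hypothetical, the
array is an ordinary static one rather than shared memory, and the
three phases are simply summed into one duration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 128  /* hypothetical limit, not a real Pgpool-II constant */

/* Hypothetical per-node health check statistics. In the proposal this
 * would live in shared memory; a static array is used here for brevity. */
typedef struct {
    uint64_t fail_count;                /* checks that finally failed */
    uint64_t retry_count;               /* total retries performed */
    uint64_t success_after_retry_count; /* successes that needed >= 1 retry */
    uint64_t success_count;             /* all successful checks */
    uint64_t total_duration_us;         /* sum of successful check durations */
    uint64_t max_duration_us;           /* longest successful check */
} HealthCheckStats;

HealthCheckStats stats[MAX_NODES];

/* Record the outcome of one health check on one node. The duration of a
 * successful check is the sum of the three phases: connect, startup
 * packet, and reading the response (all in microseconds). */
void record_health_check(int node_id, int retries, int success,
                         uint64_t connect_us, uint64_t startup_us,
                         uint64_t read_us)
{
    assert(node_id >= 0 && node_id < MAX_NODES);
    HealthCheckStats *s = &stats[node_id];

    s->retry_count += (uint64_t) retries;
    if (!success) {
        s->fail_count++;
        return;
    }

    uint64_t duration = connect_us + startup_us + read_us;
    s->success_count++;
    if (retries > 0)
        s->success_after_retry_count++;
    s->total_duration_us += duration;
    if (duration > s->max_duration_us)
        s->max_duration_us = duration;
}

/* Average duration of successful checks; 0.0 if none succeeded yet. */
double average_duration_us(int node_id)
{
    assert(node_id >= 0 && node_id < MAX_NODES);
    const HealthCheckStats *s = &stats[node_id];

    if (s->success_count == 0)
        return 0.0;
    return (double) s->total_duration_us / (double) s->success_count;
}
```

If the phases turn out to be worth tracking separately, the total and
maximum fields could be split into one pair per phase without changing
the rest of the sketch.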

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp