[pgpool-hackers: 3329] Re: [proposal] New feature: auto failback of standby node

Thu May 23 08:15:18 JST 2019

Great! This should resolve one of our long-standing TODO items:
https://pgpool.net/mediawiki/index.php/TODO#Automatically_reattach_a_node_in_streaming_master.2Fslave_configuration

With this feature enabled, Pgpool-II will automatically bring back a
"healthy" standby node (that is, a standby server that is not only up
and running but also properly connected to the primary server).

One question is whether it should also check the replication delay of
the standby server in question, i.e. if the delay is too large, do not
automatically fail back the server. I think the check is not necessary,
since the delay_threshold parameter already lets us avoid sending
queries to a standby that lags too far behind.

Also note that if the server is in the "catchup" replication state
(which could happen if the server had been stopped for a while and the
primary server had made many modifications to the database in the
meantime), the server will not be automatically failed back, because
the state is not "streaming".
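
To make the rule concrete, here is a minimal sketch in Python of the
decision as I understand it from the proposal. It is not code from the
patch, and the function and argument names are made up for
illustration; only the values 'down', 'streaming' and 'catchup' and
the auto_failback switch come from this thread.

```python
# Illustrative sketch only; not taken from the patch.
# The function name and signature are hypothetical.

def should_auto_failback(auto_failback: bool,
                         backend_status: str,
                         replication_state: str) -> bool:
    """Return True if a node qualifies for automatic failback."""
    # The feature does nothing unless it is switched on.
    if not auto_failback:
        return False
    # Only nodes that Pgpool-II has marked down are candidates.
    if backend_status != "down":
        return False
    # "catchup" (or any other state) is rejected; only a standby that
    # is actively streaming from the primary is re-attached.
    return replication_state == "streaming"


if __name__ == "__main__":
    print(should_auto_failback(True, "down", "streaming"))
    print(should_auto_failback(True, "down", "catchup"))
```

A node in "catchup" state is simply skipped on that pass; once its
state changes to "streaming", a later check would pick it up.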

BTW, the feature will work if the PostgreSQL version is 9.1 or higher
(it will not work with 9.0 because 9.0 has no pg_stat_replication
view, which the feature relies on).
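
As a rough illustration of the version requirement, the check could be
expressed against PostgreSQL's server_version_num setting (90100 for
9.1.0). This is only a sketch in Python; the constant and function
names are hypothetical, and I am not saying the patch performs the
check this way.

```python
# Illustrative sketch only; names are hypothetical.
# server_version_num is the integer form of the PostgreSQL version,
# e.g. 90023 for 9.0.23 and 90100 for 9.1.0.

MIN_VERSION_NUM = 90100  # 9.1.0, first release with pg_stat_replication


def has_pg_stat_replication(server_version_num: int) -> bool:
    """Return True if the server is new enough to have the view."""
    return server_version_num >= MIN_VERSION_NUM


if __name__ == "__main__":
    for num in (90023, 90100, 110003):
        print(num, has_pg_stat_replication(num))
```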

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Hi all,
> 
> I would like to propose a new auto failback feature for Pgpool-II 4.1.
> 
> Currently, pgpool degenerates a backend on a temporary network error, an error response to a query, and so on.
> So a standby node in streaming replication can be degenerated by pgpool even if replication
> between the primary and standby nodes has no problem. In this case, pgpool sets the status to 'down',
> but PostgreSQL's replication continues normally.
> However, the user has to attach the node to pgpool manually before pgpool will load balance to the standby node again.
> 
> I have attached an 'auto failback' patch. This feature uses the "replication_state" added to pool_worker_process in 4.1,
> and it is active when auto_failback is on. If the worker process finds a node whose replication_state is
> 'streaming' and whose backend_status is 'down', it requests a failback, as pcp_attach_node does.
> 
> Comments and suggestions are welcome.
> 
> Best regards,
> -- 
> Takuma Hoshiai <hoshiai at sraoss.co.jp>