[pgpool-hackers: 3350] Re: [proposal] New feature: auto failback of standby node

Thu Jul 11 16:31:46 JST 2019

Great. Thank you for your work. This is one of the long-awaited
features.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Hi all,
> 
> I improved the auto_failback patch and added documentation and a regression test.
> 
> The improvements are:
> 
> * use health check process
> The previous patch used the sr check process only. If the network
> between the primary and standby nodes is normal but the network between
> pgpool and the standby node is in trouble, auto_failback was executed,
> and after that failover was probably executed again.
> With this patch, pgpool additionally performs a health check on the standby node before auto failback.
> 
> * add auto_failback_interval parameter
> This parameter specifies the minimum interval between executions
> of auto failback. This avoids repeated failover and failback
> caused by, for example, a network error.
> 
> Comments and suggestions are welcome.
> 
> On Thu, 23 May 2019 08:15:18 +0900 (JST)
> Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> Great! This should solve one of our long-standing TODO items:
>> https://pgpool.net/mediawiki/index.php/TODO#Automatically_reattach_a_node_in_streaming_master.2Fslave_configuration
>> 
>> With this feature enabled, Pgpool-II will automatically bring back a
>> "healthy" standby node (that means the standby server is not only up
>> and running but properly connected to the primary server).
>> 
>> One question is whether it should check the replication delay of the
>> standby server in question, i.e. if the delay is too large, do not
>> automatically fail back the server. I think the check is not necessary
>> since we can avoid using such a server by setting the delay_threshold parameter.
>> 
>> Also note that if the server is in the "catchup" replication state (that
>> could happen if the server had been stopped for a while and the
>> primary server had performed lots of modifications to the database),
>> the server will not be automatically failed back because the state is
>> not "streaming".
>> 
>> BTW, the feature will work if the PostgreSQL version is 9.1 or higher (it
>> does not work with 9.0 because there's no pg_stat_replication view, which
>> the feature relies on).
>> 
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>> 
>> > Hi all,
>> > 
>> > I propose a new auto failback feature for Pgpool-II 4.1.
>> > 
>> > Currently, pgpool degenerates a backend on a temporary network error, an error response to a query, and so on.
>> > So a standby node of streaming replication is degenerated by pgpool even if replication
>> > between the primary and standby nodes has no problem. In this case, pgpool sets the 'down' status,
>> > but PostgreSQL's replication continues normally.
>> > However, the user needs to attach the node to pgpool manually before pgpool can load balance to the standby node again.
>> > 
>> > I attached a patch for 'auto failback'. This feature uses "replication_state", added to pool_worker_process in 4.1,
>> > and is valid if auto_failback is on. If the worker process finds a node whose replication_state is
>> > 'streaming' and backend_status is 'down', the worker process requests failback like pcp_attach_node.
>> > 
>> > Comments and suggestions are welcome.
>> > 
>> > Best regards,
>> > -- 
>> > Takuma Hoshiai <hoshiai at sraoss.co.jp>
>> 
> 
> Best regards,
> 
> -- 
> Takuma Hoshiai <hoshiai at sraoss.co.jp>
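
The conditions discussed in this thread can be summarized in a small sketch. This is only an illustration of the decision as described above, not the actual patch, which is C code inside Pgpool-II. The names NodeInfo, should_auto_failback, health_check_ok and last_failback_at are hypothetical, and the 60-second default interval is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeInfo:
    """Hypothetical snapshot of what pgpool knows about one standby node."""
    backend_status: str            # "up" or "down", as seen by pgpool
    replication_state: str         # e.g. "streaming" or "catchup"
    health_check_ok: bool          # result of a health check from pgpool to the node
    last_failback_at: Optional[float] = None  # seconds; None if never failed back


def should_auto_failback(node: NodeInfo, now: float,
                         auto_failback: bool = True,
                         auto_failback_interval: float = 60.0) -> bool:
    """Return True if the node qualifies for automatic failback.

    The 60-second default is an assumption for this sketch only.
    """
    if not auto_failback:
        return False
    # Only nodes that pgpool has marked down are candidates.
    if node.backend_status != "down":
        return False
    # A node in "catchup" (or any non-streaming state) is skipped.
    if node.replication_state != "streaming":
        return False
    # The network between pgpool and the standby must also be healthy.
    if not node.health_check_ok:
        return False
    # Enforce the minimum interval between two auto failbacks.
    if (node.last_failback_at is not None
            and now - node.last_failback_at < auto_failback_interval):
        return False
    return True


if __name__ == "__main__":
    node = NodeInfo("down", "streaming", True, last_failback_at=100.0)
    print(should_auto_failback(node, now=130.0))  # False: only 30 s since last failback
    print(should_auto_failback(node, now=200.0))  # True: 100 s since last failback
```

The interval check comes last so that a node which fails the replication or health check never counts as a failback attempt. That matches the stated goal of avoiding repeated failover and failback during a network error.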