[pgpool-general: 1901] Re: Node status and recovery sequence

Tatsuo Ishii ishii at postgresql.org
Wed Jul 17 09:10:30 JST 2013


> Hello,
> 
> I am testing a setup with pgpool with two postgres nodes in replication
> mode.
> 
> I tested the recovery where one postgres node goes down and I try to
> recover from the other remaining node. In my test, I find that the first
> stage recovery script (base recovery : basebackup.sh ) does not run.

Why do you think basebackup.sh did not run? I see:

> 2013-07-16 08:24:20 LOG:   pid 32565: starting recovery command: "SELECT
> pgpool_recovery('basebackup.sh', '192.168.0.103', '/var/lib/pgsql')"
> 2013-07-16 08:24:21 LOG:   pid 32565: 1st stage is done
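
To check a longer log for the same evidence, here is a minimal sketch (not part of pgpool). It assumes each log entry sits on one unwrapped line in the format quoted above; the names parse_log and first_stage_ran are made up for illustration.

```python
import re

# Matches log lines in the format quoted in this thread, e.g.
# "2013-07-16 08:24:21 LOG:   pid 32565: 1st stage is done"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) LOG:\s+pid (\d+): (.*)$"
)


def parse_log(log_text):
    """Return a list of (timestamp, pid, message) for every matching line."""
    entries = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            timestamp, pid, message = match.groups()
            entries.append((timestamp, int(pid), message))
    return entries


def first_stage_ran(log_text):
    """True if the log reports both the 1st stage command and its completion."""
    messages = [message for _, _, message in parse_log(log_text)]
    started = any(
        m.startswith("starting recovery command") and "basebackup.sh" in m
        for m in messages
    )
    done = any(m == "1st stage is done" for m in messages)
    return started and done


# Excerpt of the log from this thread, with wrapped lines rejoined.
SAMPLE_LOG = """\
2013-07-16 08:24:20 LOG:   pid 32565: starting recovering node 1
2013-07-16 08:24:20 LOG:   pid 32565: CHECKPOINT in the 1st stage done
2013-07-16 08:24:20 LOG:   pid 32565: starting recovery command: "SELECT pgpool_recovery('basebackup.sh', '192.168.0.103', '/var/lib/pgsql')"
2013-07-16 08:24:21 LOG:   pid 32565: 1st stage is done
2013-07-16 08:24:21 LOG:   pid 32565: starting 2nd stage
"""

if __name__ == "__main__":
    print(first_stage_ran(SAMPLE_LOG))
```

Note that this only shows pgpool issued the command and considered the stage finished. It says nothing about what the script itself did on the node.
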
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> I tried to drop the database on the second node. Even in this condition, only
> the incremental transactions were applied and the node status was `2`  (up
> and accepting connections).
> 
> What triggers the execution of the base recovery script?
> 
> 
> Here is the pgpool log on node 0 during the recovery:
> 
> 
> 2013-07-16 08:24:20 LOG:   pid 32565: starting recovering node 1
> 2013-07-16 08:24:20 LOG:   pid 32565: CHECKPOINT in the 1st stage done
> 2013-07-16 08:24:20 LOG:   pid 32565: starting recovery command: "SELECT
> pgpool_recovery('basebackup.sh', '192.168.0.103', '/var/lib/pgsql')"
> 2013-07-16 08:24:21 LOG:   pid 32565: 1st stage is done
> 2013-07-16 08:24:21 LOG:   pid 32565: starting 2nd stage
> 2013-07-16 08:24:21 LOG:   pid 32565: all connections from clients have
> been closed
> 2013-07-16 08:24:21 LOG:   pid 32565: CHECKPOINT in the 2nd stage done
> 2013-07-16 08:24:21 LOG:   pid 32565: starting recovery command: "SELECT
> pgpool_recovery('pgpool_recovery_pitr', '192.168.0.103', '/var/lib/pgsql')"
> 2013-07-16 08:24:25 LOG:   pid 32565: check_postmaster_started: try to
> connect to postmaster on hostname:192.168.0.103 database:postgres
> user:postgres (retry 0 times)
> 2013-07-16 08:24:25 LOG:   pid 32565: 1 node restarted
> 2013-07-16 08:24:25 LOG:   pid 32565: send_failback_request: fail back 1 th
> node request from pid 32565
> 2013-07-16 08:24:25 LOG:   pid 24385: starting fail back. reconnect host
> 192.168.0.103(5432)
> 2013-07-16 08:24:25 LOG:   pid 24385: Restart all children
> 2013-07-16 08:24:25 LOG:   pid 24385: failover: set new primary node: -1
> 2013-07-16 08:24:25 LOG:   pid 24385: failover: set new master node: 0
> 2013-07-16 08:24:25 LOG:   pid 24385: failback done. reconnect host
> 192.168.0.103(5432)
> 2013-07-16 08:24:25 LOG:   pid 32565: recovery done
> 2013-07-16 08:24:25 LOG:   pid 32564: worker process received restart
> request
> 2013-07-16 08:24:25 LOG:   pid 32690: connection received:
> host=192.168.0.102 port=35052
> 
> 
> As you can see, the first stage is not done. However, it would be required
> since I dropped the database on node 1.
> 
> Thanks
> 
> Gilbert

