[pgpool-general: 1900] Node status and recovery sequence

Gilbert Soucy gsoucy at 36pix.com
Tue Jul 16 21:35:30 JST 2013


Hello,

I am testing a pgpool setup with two postgres nodes in replication mode.

I tested recovery in the case where one postgres node goes down and I try to
recover it from the remaining node. In my test, the first-stage recovery
script (base recovery: basebackup.sh) does not seem to run. I even tried
dropping the database on the second node. Even then, only the incremental
transactions were applied, and the node status was `2` (up and accepting
connections).

What triggers the execution of the base recovery script?


Here is the pgpool log on node 0 during the recovery:


2013-07-16 08:24:20 LOG:   pid 32565: starting recovering node 1
2013-07-16 08:24:20 LOG:   pid 32565: CHECKPOINT in the 1st stage done
2013-07-16 08:24:20 LOG:   pid 32565: starting recovery command: "SELECT pgpool_recovery('basebackup.sh', '192.168.0.103', '/var/lib/pgsql')"
2013-07-16 08:24:21 LOG:   pid 32565: 1st stage is done
2013-07-16 08:24:21 LOG:   pid 32565: starting 2nd stage
2013-07-16 08:24:21 LOG:   pid 32565: all connections from clients have been closed
2013-07-16 08:24:21 LOG:   pid 32565: CHECKPOINT in the 2nd stage done
2013-07-16 08:24:21 LOG:   pid 32565: starting recovery command: "SELECT pgpool_recovery('pgpool_recovery_pitr', '192.168.0.103', '/var/lib/pgsql')"
2013-07-16 08:24:25 LOG:   pid 32565: check_postmaster_started: try to connect to postmaster on hostname:192.168.0.103 database:postgres user:postgres (retry 0 times)
2013-07-16 08:24:25 LOG:   pid 32565: 1 node restarted
2013-07-16 08:24:25 LOG:   pid 32565: send_failback_request: fail back 1 th node request from pid 32565
2013-07-16 08:24:25 LOG:   pid 24385: starting fail back. reconnect host 192.168.0.103(5432)
2013-07-16 08:24:25 LOG:   pid 24385: Restart all children
2013-07-16 08:24:25 LOG:   pid 24385: failover: set new primary node: -1
2013-07-16 08:24:25 LOG:   pid 24385: failover: set new master node: 0
2013-07-16 08:24:25 LOG:   pid 24385: failback done. reconnect host 192.168.0.103(5432)
2013-07-16 08:24:25 LOG:   pid 32565: recovery done
2013-07-16 08:24:25 LOG:   pid 32564: worker process received restart request
2013-07-16 08:24:25 LOG:   pid 32690: connection received: host=192.168.0.102 port=35052


As you can see, the first-stage base backup does not appear to have been
performed. However, it would be required, since I dropped the database on
node 1.
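
To make the timing in the log explicit, here is a minimal Python sketch that
reads the timestamps of the stage markers and prints how long each stage took.
It is not part of pgpool; the helper names find_time and stage_durations are
made up for illustration, and the sample lines are copied from the log above
with wrapped lines rejoined.

```python
from datetime import datetime

# A few lines copied from the pgpool log above (wrapped lines rejoined).
LOG = """\
2013-07-16 08:24:20 LOG:   pid 32565: starting recovering node 1
2013-07-16 08:24:21 LOG:   pid 32565: 1st stage is done
2013-07-16 08:24:21 LOG:   pid 32565: starting 2nd stage
2013-07-16 08:24:25 LOG:   pid 32565: recovery done
"""


def find_time(log, marker):
    """Return the timestamp of the first log line containing marker."""
    for line in log.splitlines():
        if marker in line:
            # The first 19 characters are "YYYY-MM-DD HH:MM:SS".
            return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
    raise ValueError("marker not found: " + marker)


def stage_durations(log):
    """Return (first_stage_seconds, second_stage_seconds)."""
    start = find_time(log, "starting recovering node")
    first_done = find_time(log, "1st stage is done")
    second_start = find_time(log, "starting 2nd stage")
    done = find_time(log, "recovery done")
    return ((first_done - start).total_seconds(),
            (done - second_start).total_seconds())


if __name__ == "__main__":
    first, second = stage_durations(LOG)
    print("1st stage: %.0f s, 2nd stage: %.0f s" % (first, second))
```

With the lines above this reports 1 second for the first stage and 4 seconds
for the second, which seems short for a full copy of the data directory.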

Thanks

Gilbert