[pgpool-general: 1980] trouble with recovery of a downed node
Sean Hogan
sean at compusult.net
Fri Aug 2 04:11:40 JST 2013
Hi again,
I am having difficulty achieving a consistent database recovery in my
three-PostgreSQL, two-pgpool setup (native replication). In a previous
post I mentioned that I often get inconsistent counts of updated rows
for a table that is probably being updated during the recovery.
The technique I'm using is a daily base backup plus continuous WAL
archiving, as described at
http://zetetic.net/blog/2012/3/9/point-in-time-recovery-from-backup-using-postgresql-continuo.html.
As a result my setup has an empty recovery_1st_stage_command, uses
pgpool_recovery_pitr as the second-stage command, and my
pgpool_remote_start script takes care of the complete recovery and
restart as in that blog post.
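Concretely, the relevant settings look roughly like this (user name and
paths are examples, not my exact values). In pgpool.conf on both pgpool
nodes:

    recovery_user = 'postgres'
    recovery_1st_stage_command = ''                       # no stage 1 script
    recovery_2nd_stage_command = 'pgpool_recovery_pitr'

and in postgresql.conf on all three backends:

    archive_mode = on
    archive_command = 'cp %p /var/lib/pgsql/archive/%f'   # example path
    hot_standby = off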
Rather than pasting my full scripts and saying "here's my setup, now
fix it" :-), I've sketched the two recovery scripts after the list
below. Mostly, though, I'd like to better understand the recovery
phases and how they impact database availability. Here is the order of
things as I understand it:
1) Recovery starts and the stage 1 script runs (empty in my setup,
since the daily base backup already exists). Existing and new
connections, and all queries and updates, are still allowed; updates
are written to the WAL.
2) Stop accepting connections and queries. (Question: Are existing
connections terminated at this point?)
3) Run the stage 2 script. pgpool_recovery_pitr does
pgpool_switch_xlog() as recommended, so all WAL segments are archived
by the time it completes. (Sketched after this list.)
4) Run pgpool_remote_start (also sketched after this list).
a) This copies the base backup and archived WALs, and writes
recovery.conf.
    b) Since I have hot_standby=off in postgresql.conf, connections
are blocked until recovery completes.
    c) My pgpool_remote_start starts PostgreSQL on the recovered node
synchronously, so it does not return until PostgreSQL startup has
finished.
5) Connections are allowed once more.
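For concreteness, my pgpool_recovery_pitr is essentially the one from
the blog post. A simplified sketch (the archive path is an example, and
I'm assuming the pgpool-recovery functions are installed in template1):

    #!/bin/sh
    # 2nd stage: runs on the master while pgpool holds clients off.
    # pgpool_switch_xlog() (from the pgpool-recovery module) forces a
    # WAL switch and returns once the switched segment has been
    # archived, so the archive is complete up to this moment.
    ARCHIVE_DIR=/var/lib/pgsql/archive    # example path
    psql -p 5432 -c "SELECT pgpool_switch_xlog('$ARCHIVE_DIR')" template1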
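And a simplified sketch of my pgpool_remote_start (the base-backup copy
step is elided, and the restore_command path is an example):

    #!/bin/sh
    # pgpool runs this on the master as:
    #   pgpool_remote_start <remote hostname> <remote PGDATA>
    DEST=$1
    DESTDIR=$2

    # ... copy/unpack the base backup into $DESTDIR on $DEST here,
    # as in the blog post ...

    # Write recovery.conf so startup replays from the WAL archive.
    echo "restore_command = 'cp /var/lib/pgsql/archive/%f %p'" |
        ssh $DEST "cat > $DESTDIR/recovery.conf"

    # Start PostgreSQL; -w makes pg_ctl wait until startup (i.e. WAL
    # replay) has finished, which is what makes this synchronous.
    ssh $DEST "pg_ctl -w -D $DESTDIR start"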
Is this correct? With this flow I can't see how the newly recovered
node could be out of sync with the master. But the cluster behaves as
if an update was never recorded in the archived WAL, or as if an update
took place on the master and the other slave somewhere between steps 2
and 5.
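(When the mismatch happens, I can confirm the nodes really have
diverged with a quick loop like the one below, querying each backend
directly and bypassing pgpool. The vm1/vm3 hostnames and the database
name are placeholders; ws_cached_searches is the table from the error
further down.)

    # Compare a row count on each backend directly, bypassing pgpool.
    # Hostnames and the database name here are placeholders.
    for h in psql-vm1 psql-vm2 psql-vm3; do
        printf '%s: ' $h
        psql -h $h -p 5432 -t -A -c \
            "SELECT count(*) FROM ws_cached_searches" mydb
    done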
pgpool-II's logs of the recovery look correct:
2013-08-01 16:16:09 LOG: pid 25265: send_failback_request: fail back 1 th node request from pid 25265
2013-08-01 16:16:09 LOG: pid 25257: wd_start_interlock: start interlocking
2013-08-01 16:16:09 LOG: pid 25265: wd_send_response: WD_STAND_FOR_LOCK_HOLDER received it
2013-08-01 16:16:09 LOG: pid 25257: starting fail back. reconnect host psql-vm2.compusult.net(5432)
2013-08-01 16:16:11 LOG: pid 25257: Restart all children
2013-08-01 16:16:11 LOG: pid 25257: wd_end_interlock: end interlocking
2013-08-01 16:16:12 LOG: pid 25257: failover: set new primary node: -1
2013-08-01 16:16:12 LOG: pid 25257: failover: set new master node: 0
2013-08-01 16:16:12 LOG: pid 6894: worker process received restart request
2013-08-01 16:16:12 LOG: pid 25257: failback done. reconnect host psql-vm2.compusult.net(5432)
2013-08-01 16:16:13 LOG: pid 6893: pcp child process received restart request
2013-08-01 16:16:13 LOG: pid 25257: PCP child 6893 exits with status 256 in failover()
2013-08-01 16:16:13 LOG: pid 25257: fork a new PCP child pid 7374 in failover()
2013-08-01 16:16:13 LOG: pid 25257: worker child 6894 exits with status 256
2013-08-01 16:16:13 LOG: pid 25257: fork a new worker child pid 7375
But no matter how I do it, I get something like this after running my
app for a while:
2013-08-01 17:08:28 ERROR: pid 6597: pgpool detected difference of the number of inserted, updated or deleted tuples. Possible last query was: " UPDATE ws_cached_searches SET ... "
2013-08-01 17:08:28 LOG: pid 6597: CommandComplete: Number of affected tuples are: 1 0 1
2013-08-01 17:08:28 LOG: pid 6597: ReadyForQuery: Degenerate backends: 1
2013-08-01 17:08:28 LOG: pid 6597: ReadyForQuery: Number of affected tuples are: 1 0 1
2013-08-01 17:08:28 LOG: pid 25257: starting degeneration. shutdown host psql-vm2.compusult.net(5432)
Which node the complaint is about varies: sometimes it's the node just
recovered, sometimes the other slave.
I feel like I'm missing something critical in the backup or recovery
process, but it is eluding me. Can anyone help?
Thanks,
Sean