[pgpool-general: 2054] Re: trouble with recovery of a downed node

Sean Hogan sean at compusult.net
Wed Aug 21 04:38:31 JST 2013


On 13-08-05 11:54 PM, Tatsuo Ishii wrote:
>> 3) Run the stage 2 script.  pgpool_recovery_pitr does
>> pgpool_switch_xlog() as recommended, so all WAL are archived by the
>> time it completes.
>>
>> 4)    Run pgpool_remote_start.
>>    a) This copies the base backup and archived WALs, and writes
>>    recovery.conf.
>>    b) Since in postgresql.conf I have hot_standby=off, connections are
>>    blocked until recovery completes.
>>    c) My pgpool_remote_start starts PostgreSQL on the recovered node in a
>>    synchronous fashion, so it does not terminate until the PostgreSQL
>>    startup is finished.
>>
>> 5)    Connections are allowed once more.
>>
>>
>> Is this correct?  With this flow I can't see how the newly recovered
>> node could be out of sync with the master.  But the cluster behaves as
>> if an update has not been recorded in the archived WAL, or an update
>> took place on the master and the other slave somewhere between step 2
>> and step 5.
> Not sure I correctly understand your strategy but...
>
> Can you stop updating at all while online recovery and make sure that
> there's no difference between master DB and recoveried DB? If there's
> no difference, the problem must be in the WAL copy process (or timing).
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>

Tatsuo, thanks very much for your response.  I've tried online recovery 
with the database clients disconnected, but it didn't have any effect.  
The database recovers correctly, works for a while and then when a 
certain update takes place one of the nodes blows up. It's not always 
the same statement but one of them is:

2013-08-20 17:04:02 ERROR: pid 30688: pgpool detected difference of the 
number of inserted, updated or deleted tuples. Possible last query was: 
"UPDATE ws_cached_searches SET search_data = $1, cache_time = 
current_timestamp WHERE cached_search_id = $2"
2013-08-20 17:04:02 LOG:   pid 30688: CommandComplete: Number of 
affected tuples are: 1 0 1


I've posted my configuration to 
https://github.com/clixtec/redundant-pgpool-config - pgpool and 
PostgreSQL config files in the root, and the four script files in a 
subdirectory.  Again it is modelled on the approach described at 
http://zetetic.net/blog/2012/3/9/point-in-time-recovery-from-backup-using-postgresql-continuo.html. 
Am I doing anything fundamentally wrong?

Thanks,
Sean
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20130820/8aa067cc/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sean.vcf
Type: text/x-vcard
Size: 275 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20130820/8aa067cc/attachment.vcf>


More information about the pgpool-general mailing list