[pgpool-general: 2507] Re: Help error code e1012 on pgpool II 3.3.0 while clicking Recovery button

Thu Jan 30 22:27:14 JST 2014

Thanks a million Sir, you have been of great help to me and also a quick suggestion from my team lead which finally solved this long impending issue. Cheers!!!

I just installed fresh latest postgres-9.3.2 from source, latest pgpool-II-3.3.2 and pgpoolAdmin-3.3.0. 

The e1012 was resolved by adding a line just before rsync command in both the "basebackup.sh" files of postgres servers. Thats it and my pgpool with streaming replication worked like a charm. 

this is the extra "basebackup.sh" line added from the suggestion of my team lead

***************************************************

ssh -T $recovery_node_host_name rm -rf $recovery_db_cluster/pg_xlog 

*****************************************************
I will definitely need your expert guidance and support in my future testing and performance tuning phases of postgres and pgpool.

Thanks and Regards,
Syed Irfan.Sr. Developer

On Thursday, 30 January 2014 5:21 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:

It seems the cause of your problem is apparently this:

> /usr/local/pgsql/data/basebackup.sh: line 33: psql: command not found

Please fix it.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Dear Tatsuo,
> 
>       Thanks for your reply, I have followed your recommendations and this time the errorcode e1012 pops up not on the third try but on the second try and within 2 seconds of clicking the recovery button. 
> 
> First Try of Recovery button when primary was down
> Recovery success on 172.16.80.49(when it was down manually) the backup log of 172.16.80.47 is as follows
> 
> ******************************************************************
>  pg_start_backup 
> -----------------
>  1/3C000020
> (1 row)
> 
> mkdir: cannot create directory `/usr/local/pgsql/data/pg_xlog': File exists
> NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup
>  pg_stop_backup 
> ----------------
>  1/3C0000D8
> (1 row)
> ******************************************************************
> 
> Second Try of Recovery button when the new primary was down manually
> Recovery fails on 172.16.80.47(when it was down manually) the backup log of 172.16.80.49 is as follows
> 
> ******************************************************************
> /usr/local/pgsql/data/basebackup.sh: line 12: psql: command not found
> mkdir: cannot create directory `/usr/local/pgsql/data/pg_xlog': File exists
> /usr/local/pgsql/data/basebackup.sh: line 33: psql: command not found
> ******************************************************************
> 
> The basebackup.sh on both servers is as follows with the added script for basebackup.log and uncommented recovery_target_timeline = 'latest'
> 
> *******************************************************************
> #/bin/sh -x
> exec > /tmp/basebackup.log 2>&1
> # XXX We assume master and recovery host uses the same port number
> PORT=5432
> master_node_host_name=`hostname`
> master_db_cluster=$1
> recovery_node_host_name=$2
> recovery_db_cluster=$3
> tmp=/tmp/mytemp$$
> trap "rm -f $tmp" 0 1 2 3 15
> 
> psql -p $PORT -c "SELECT pg_start_backup('Streaming Replication', true)" postgres
> 
> rsync -C -a -c --delete --exclude postgresql.conf --exclude postmaster.pid \
> --exclude postmaster.opts --exclude pg_log \
> --exclude recovery.conf --exclude recovery.done \
> --exclude pg_xlog \
> $master_db_cluster/ $recovery_node_host_name:$recovery_db_cluster
> 
> ssh -T $recovery_node_host_name mkdir $recovery_db_cluster/pg_xlog
> ssh -T $recovery_node_host_name chmod 700 $recovery_db_cluster/pg_xlog
> ssh -T $recovery_node_host_name rm -f $recovery_db_cluster/recovery.done
> 
> cat > $tmp <<EOF
> recovery_target_timeline = 'latest'
> standby_mode          = 'on'
> primary_conninfo      = 'host=$master_node_host_name port=$PORT user=postgres'
> trigger_file = '/var/log/pgpool/trigger/trigger_file1'
> EOF
> 
> scp $tmp $recovery_node_host_name:$recovery_db_cluster/recovery.conf
> 
> psql -p $PORT -c "SELECT pg_stop_backup()" postgres
> *******************************************************************
> 
>        
>      Also, the reason for commenting (recovery_target_timeline = 'latest') was it was not mentioned in your "Simple Streaming replication setting with pgpool-II(multiple servers version)" http://www.pgpool.net/pgpool-web/contrib_docs/simple_sr_setting2_3.0/ page. 
>      But after a long search on the net i found someone adding the line (recovery_target_timeline = 'latest') so for test purpose i have added it and once it did not solve the purpose I had commented it.
> 
>      Request you to help on the issue ASAP.
> 
> Best Regards,
> Syed Irfan
> Sr Developer
> 
>  
> Thanks and Regards,
> Syed Irfan.
> 
> Sr. Developer
> 
> 
> 
> 
> 
> On Tuesday, 28 January 2014 4:52 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>  
>> This is what the recovery.conf looks like.
>> *******************************************
>> #recovery_target_timeline = 'latest'
>> standby_mode          = 'on'
>> primary_conninfo      = 'host=postgres-p.rolta.com port=5432 user=postgres'
>> trigger_file = '/var/log/pgpool/trigger/trigger_file1'
>> ********************************************************
> 
> Why did you remove "recovery_target_timeline = 'latest'"?
> 
> I suggesto to take an execution log of script. You change the very
> begging of the script:
> 
> #/bin/sh -x
> 
> to:
> 
> #/bin/sh -x
> exec > /tmp/basebackup.log 2>&1
> 
> and please show us the content of /tmp/basebackup.log after execution of pcp_recovery_node.
> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
> 
> 
> 
>> The basebackup.sh on both postgres databases is as follows
>> **************************************************
>> 
>> #/bin/sh -x
>> #
>> # XXX We assume master and recovery host uses the same port number
>> PORT=5432
>> master_node_host_name=`hostname`
>> master_db_cluster=$1
>> recovery_node_host_name=$2
>> recovery_db_cluster=$3
>> tmp=/tmp/mytemp$$
>> trap "rm -f $tmp" 0 1 2 3 15
>> 
>> psql -p $PORT -c "SELECT pg_start_backup('Streaming Replication', true)" postgres
>> 
>> rsync -C -a -c --delete --exclude postgresql.conf --exclude postmaster.pid \
>> --exclude postmaster.opts --exclude pg_log \
>> --exclude recovery.conf --exclude recovery.done \
>> --exclude pg_xlog \
>> $master_db_cluster/ $recovery_node_host_name:$recovery_db_cluster
>> 
>> ssh -T $recovery_node_host_name mkdir $recovery_db_cluster/pg_xlog
>> ssh -T $recovery_node_host_name chmod 700 $recovery_db_cluster/pg_xlog
>> ssh -T $recovery_node_host_name rm -f $recovery_db_cluster/recovery.done
>> 
>> cat > $tmp <<EOF
>> #recovery_target_timeline = 'latest'
>> standby_mode          = 'on'
>> primary_conninfo      = 'host=$master_node_host_name port=$PORT user=postgres'
>> trigger_file = '/var/log/pgpool/trigger/trigger_file1'
>> EOF
>> 
>> scp $tmp $recovery_node_host_name:$recovery_db_cluster/recovery.conf
>> 
>> psql -p $PORT -c "SELECT pg_stop_backup()" postgres
>> ***********************************************
>>  
>> Thanks and Regards,
>> Syed Irfan.
>> 
>> Sr. Developer
>> 
>> 
>> 
>> 
>> 
>> On Thursday, 23 January 2014 11:07 PM, Jeff Frost <jeff at pgexperts.com> wrote:
>>  
>> 
>> 
>> On Jan 23, 2014, at 9:32 AM, Syed Irfan <syedirfan_77 at yahoo.com> wrote:
>> 
>> Dear Tatsuo Ishii,
>>>
>>>
>>>       I am still awaiting for your reply on this issue, I have tried your suggestions but still I am unable to successfully run the Recovery process the third time it's surprises me how does it work the first time but same thing fails in the third attempt.?
>>>
>>>
>>>The Postgres log shows as below
>>>
>>>
>>>
>>>
>>>>> 28038 2014-01-09 21:28:33 BDT FATAL:  timeline 35 of the primary does not match recovery target timeline 36
>>>>>> 28039 2014-01-09 21:28:38 BDT FATAL:  timeline 35 of the primary does not match recovery target timeline 36
>>>
>>>
>>>
>>>I urgently request you to help me in this impending issue.
>>>
>> 
>> This is usually caused by postgres trying to replay WAL files from the wrong source.  Did you clean out the pg_xlog directory on the replica before taking the base backup?
>> 
>> What does your recovery.conf look like?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20140130/0015ad48/attachment.htm>