[pgpool-general: 2502] Re: Help error code e1012 on pgpool II 3.3.0 while clicking Recovery button

Thu Jan 30 08:50:59 JST 2014

The cause of your problem is apparently this:

> /usr/local/pgsql/data/basebackup.sh: line 33: psql: command not found

Please fix it.
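
One possible fix, sketched under the assumption that the PostgreSQL client binaries live in /usr/local/pgsql/bin (inferred from the data directory path in the log, not confirmed in this thread): the recovery script probably inherits the environment of the server process, which may not have psql in its PATH, so the script can set PATH itself. The names require_cmd and PGBIN are hypothetical, not part of pgpool-II.

```shell
# Hypothetical helper: succeed only if the named command can be found in PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1 || {
        echo "$1: not found in PATH=$PATH" >&2
        return 1
    }
}

# Assumed install prefix; adjust to the actual location of psql on each node.
PGBIN=${PGBIN:-/usr/local/pgsql/bin}
PATH="$PGBIN:$PATH"
export PATH

# In basebackup.sh, abort before the backup starts if psql is still missing:
#   require_cmd psql || exit 1
```

Calling psql by its absolute path in the script would work as well; either way the change is needed on both servers, since each one runs the script when it is the primary.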

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Dear Tatsuo,
> 
>       Thanks for your reply. I followed your recommendations, and this time error code e1012 appears on the second try instead of the third, within 2 seconds of clicking the Recovery button.
> 
> First try of the Recovery button, when the primary was down:
> Recovery succeeded on 172.16.80.49 (after it was brought down manually). The backup log on 172.16.80.47 is as follows:
> 
> ******************************************************************
>  pg_start_backup 
> -----------------
>  1/3C000020
> (1 row)
> 
> mkdir: cannot create directory `/usr/local/pgsql/data/pg_xlog': File exists
> NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup
>  pg_stop_backup 
> ----------------
>  1/3C0000D8
> (1 row)
> ******************************************************************
> 
> Second try of the Recovery button, when the new primary was brought down manually:
> Recovery failed on 172.16.80.47 (after it was brought down manually). The backup log on 172.16.80.49 is as follows:
> 
> ******************************************************************
> /usr/local/pgsql/data/basebackup.sh: line 12: psql: command not found
> mkdir: cannot create directory `/usr/local/pgsql/data/pg_xlog': File exists
> /usr/local/pgsql/data/basebackup.sh: line 33: psql: command not found
> ******************************************************************
> 
> The basebackup.sh on both servers is as follows, with the added line that writes /tmp/basebackup.log and with recovery_target_timeline = 'latest' uncommented:
> 
> *******************************************************************
> #/bin/sh -x
> exec > /tmp/basebackup.log 2>&1
> # XXX We assume master and recovery host uses the same port number
> PORT=5432
> master_node_host_name=`hostname`
> master_db_cluster=$1
> recovery_node_host_name=$2
> recovery_db_cluster=$3
> tmp=/tmp/mytemp$$
> trap "rm -f $tmp" 0 1 2 3 15
> 
> psql -p $PORT -c "SELECT pg_start_backup('Streaming Replication', true)" postgres
> 
> rsync -C -a -c --delete --exclude postgresql.conf --exclude postmaster.pid \
> --exclude postmaster.opts --exclude pg_log \
> --exclude recovery.conf --exclude recovery.done \
> --exclude pg_xlog \
> $master_db_cluster/ $recovery_node_host_name:$recovery_db_cluster
> 
> ssh -T $recovery_node_host_name mkdir $recovery_db_cluster/pg_xlog
> ssh -T $recovery_node_host_name chmod 700 $recovery_db_cluster/pg_xlog
> ssh -T $recovery_node_host_name rm -f $recovery_db_cluster/recovery.done
> 
> cat > $tmp <<EOF
> recovery_target_timeline = 'latest'
> standby_mode          = 'on'
> primary_conninfo      = 'host=$master_node_host_name port=$PORT user=postgres'
> trigger_file = '/var/log/pgpool/trigger/trigger_file1'
> EOF
> 
> scp $tmp $recovery_node_host_name:$recovery_db_cluster/recovery.conf
> 
> psql -p $PORT -c "SELECT pg_stop_backup()" postgres
> *******************************************************************
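
Note that the log from the second try shows the script carrying on after psql failed, so rsync ran without pg_start_backup having been called. A minimal sketch of reporting and stopping at the first failed step follows; run_step is a hypothetical helper, not part of the pgpool-II sample script.

```shell
# Hypothetical helper: run one step, report which step failed, and
# return its exit status so the caller can abort.
run_step() {
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "basebackup.sh: step failed (exit $status): $*" >&2
    fi
    return "$status"
}

# Example use inside basebackup.sh (same command as in the script above):
#   run_step psql -p "$PORT" -c "SELECT pg_start_backup('Streaming Replication', true)" postgres || exit 1
```

With this in place a missing psql would end the run at the first step, and /tmp/basebackup.log would name the step that failed.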
> 
>        
>      Also, the reason I commented out recovery_target_timeline = 'latest' is that it is not mentioned on your "Simple Streaming replication setting with pgpool-II(multiple servers version)" page (http://www.pgpool.net/pgpool-web/contrib_docs/simple_sr_setting2_3.0/).
>      After a long search on the net I found someone adding that line, so I added it as a test, and when it did not solve the problem I commented it out again.
> 
>      I request your help with this issue as soon as possible.
> 
> Best Regards,
> Syed Irfan
> Sr Developer
> 
>  
> On Tuesday, 28 January 2014 4:52 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>  
>> This is what the recovery.conf looks like.
>> *******************************************
>> #recovery_target_timeline = 'latest'
>> standby_mode          = 'on'
>> primary_conninfo      = 'host=postgres-p.rolta.com port=5432 user=postgres'
>> trigger_file = '/var/log/pgpool/trigger/trigger_file1'
>> ********************************************************
> 
> Why did you remove "recovery_target_timeline = 'latest'"?
> 
> I suggest taking an execution log of the script. Change the very
> beginning of the script from:
> 
> #/bin/sh -x
> 
> to:
> 
> #/bin/sh -x
> exec > /tmp/basebackup.log 2>&1
> 
> and please show us the content of /tmp/basebackup.log after execution of pcp_recovery_node.
> 
> 
>> The basebackup.sh on both postgres databases is as follows
>> **************************************************
>> 
>> #/bin/sh -x
>> #
>> # XXX We assume master and recovery host uses the same port number
>> PORT=5432
>> master_node_host_name=`hostname`
>> master_db_cluster=$1
>> recovery_node_host_name=$2
>> recovery_db_cluster=$3
>> tmp=/tmp/mytemp$$
>> trap "rm -f $tmp" 0 1 2 3 15
>> 
>> psql -p $PORT -c "SELECT pg_start_backup('Streaming Replication', true)" postgres
>> 
>> rsync -C -a -c --delete --exclude postgresql.conf --exclude postmaster.pid \
>> --exclude postmaster.opts --exclude pg_log \
>> --exclude recovery.conf --exclude recovery.done \
>> --exclude pg_xlog \
>> $master_db_cluster/ $recovery_node_host_name:$recovery_db_cluster
>> 
>> ssh -T $recovery_node_host_name mkdir $recovery_db_cluster/pg_xlog
>> ssh -T $recovery_node_host_name chmod 700 $recovery_db_cluster/pg_xlog
>> ssh -T $recovery_node_host_name rm -f $recovery_db_cluster/recovery.done
>> 
>> cat > $tmp <<EOF
>> #recovery_target_timeline = 'latest'
>> standby_mode          = 'on'
>> primary_conninfo      = 'host=$master_node_host_name port=$PORT user=postgres'
>> trigger_file = '/var/log/pgpool/trigger/trigger_file1'
>> EOF
>> 
>> scp $tmp $recovery_node_host_name:$recovery_db_cluster/recovery.conf
>> 
>> psql -p $PORT -c "SELECT pg_stop_backup()" postgres
>> ***********************************************
>>  
>> On Thursday, 23 January 2014 11:07 PM, Jeff Frost <jeff at pgexperts.com> wrote:
>>  
>> 
>> 
>> On Jan 23, 2014, at 9:32 AM, Syed Irfan <syedirfan_77 at yahoo.com> wrote:
>> 
>> Dear Tatsuo Ishii,
>>>
>>>
>>>       I am still awaiting your reply on this issue. I have tried your suggestions, but I am still unable to run the recovery process successfully the third time. It surprises me that it works the first time but the same thing fails on the third attempt.
>>>
>>>
>>> The Postgres log shows the following:
>>>
>>> 28038 2014-01-09 21:28:33 BDT FATAL:  timeline 35 of the primary does not match recovery target timeline 36
>>> 28039 2014-01-09 21:28:38 BDT FATAL:  timeline 35 of the primary does not match recovery target timeline 36
>>>
>>>
>>>
>>> I urgently request your help with this pending issue.
>>>
>> 
>> This is usually caused by postgres trying to replay WAL files from the wrong source.  Did you clean out the pg_xlog directory on the replica before taking the base backup?
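
If the replica's pg_xlog was not cleaned, stale segments from an old timeline may be what gets replayed. A rough sketch of emptying and recreating the directory before the base backup follows; reset_xlog_dir is a hypothetical helper, and it should only be run while the PostgreSQL server on the replica is stopped.

```shell
# Hypothetical helper: empty and recreate a WAL directory with the
# permissions PostgreSQL expects (0700).
reset_xlog_dir() {
    dir=$1
    # Refuse an empty argument so "rm -rf" can never hit an unintended path.
    [ -n "$dir" ] || { echo "reset_xlog_dir: no directory given" >&2; return 1; }
    rm -rf "$dir" && mkdir -p "$dir" && chmod 700 "$dir"
}

# Assumed equivalent when run from basebackup.sh against the recovery node:
#   ssh -T "$recovery_node_host_name" "rm -rf '$recovery_db_cluster/pg_xlog' && mkdir -m 700 '$recovery_db_cluster/pg_xlog'"
```

Recreating the directory this way would also avoid the "mkdir: cannot create directory ... File exists" message, because the old directory is removed first.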
>> 
>> What does your recovery.conf look like?