[pgpool-general: 2501] Re: Help error code e1012 on pgpool II 3.3.0 while clicking Recovery button

Thu Jan 30 00:02:07 JST 2014

Dear Tatsuo,

      Thanks for your reply. I have followed your recommendations, and this time error code e1012 pops up on the second try rather than the third, within 2 seconds of clicking the Recovery button.

First try of the Recovery button, with the primary down:
Recovery succeeded on 172.16.80.49 (after it was brought down manually). The backup log on 172.16.80.47 is as follows:

******************************************************************
 pg_start_backup 
-----------------
 1/3C000020
(1 row)

mkdir: cannot create directory `/usr/local/pgsql/data/pg_xlog': File exists
NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup
 pg_stop_backup 
----------------
 1/3C0000D8
(1 row)
******************************************************************

Second try of the Recovery button, with the new primary brought down manually:
Recovery failed on 172.16.80.47 (after it was brought down manually). The backup log on 172.16.80.49 is as follows:

******************************************************************
/usr/local/pgsql/data/basebackup.sh: line 12: psql: command not found
mkdir: cannot create directory `/usr/local/pgsql/data/pg_xlog': File exists
/usr/local/pgsql/data/basebackup.sh: line 33: psql: command not found
******************************************************************
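
The two `psql: command not found` lines suggest that, when basebackup.sh is launched on 172.16.80.49, PATH does not include the directory holding the PostgreSQL client programs. The following is a minimal sketch of a guard that could sit near the top of the script. The directory /usr/local/pgsql/bin is an assumption, guessed from the data directory /usr/local/pgsql/data, and PGBIN and psql_status are names introduced only for this illustration.

```shell
#!/bin/sh
# Assumption: the PostgreSQL client binaries live in /usr/local/pgsql/bin
# (guessed from the data directory path; adjust for the actual install).
PGBIN=${PGBIN:-/usr/local/pgsql/bin}

# Prepend PGBIN to PATH only if it is not already present.
case ":$PATH:" in
  *":$PGBIN:"*) ;;
  *) PATH="$PGBIN:$PATH" ;;
esac
export PATH

# Report the problem up front instead of failing at the first psql call.
if command -v psql >/dev/null 2>&1; then
  psql_status="found: $(command -v psql)"
else
  psql_status="missing"
  echo "psql not found in PATH=$PATH" >&2
fi
echo "psql $psql_status"
```

Calling psql by its absolute path would work as well. Either way, it may be worth comparing PATH on the two servers for the user that runs the script, since the same script found psql on 172.16.80.47.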

The basebackup.sh on both servers is as follows, with the added line that writes /tmp/basebackup.log and with recovery_target_timeline = 'latest' uncommented:

*******************************************************************
#/bin/sh -x
exec > /tmp/basebackup.log 2>&1
# XXX We assume master and recovery host uses the same port number
PORT=5432
master_node_host_name=`hostname`
master_db_cluster=$1
recovery_node_host_name=$2
recovery_db_cluster=$3
tmp=/tmp/mytemp$$
trap "rm -f $tmp" 0 1 2 3 15

psql -p $PORT -c "SELECT pg_start_backup('Streaming Replication', true)" postgres

rsync -C -a -c --delete --exclude postgresql.conf --exclude postmaster.pid \
--exclude postmaster.opts --exclude pg_log \
--exclude recovery.conf --exclude recovery.done \
--exclude pg_xlog \
$master_db_cluster/ $recovery_node_host_name:$recovery_db_cluster

ssh -T $recovery_node_host_name mkdir $recovery_db_cluster/pg_xlog
ssh -T $recovery_node_host_name chmod 700 $recovery_db_cluster/pg_xlog
ssh -T $recovery_node_host_name rm -f $recovery_db_cluster/recovery.done

cat > $tmp <<EOF
recovery_target_timeline = 'latest'
standby_mode          = 'on'
primary_conninfo      = 'host=$master_node_host_name port=$PORT user=postgres'
trigger_file = '/var/log/pgpool/trigger/trigger_file1'
EOF

scp $tmp $recovery_node_host_name:$recovery_db_cluster/recovery.conf

psql -p $PORT -c "SELECT pg_stop_backup()" postgres
*******************************************************************
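
Both logs show `mkdir: cannot create directory ... File exists`. The rsync step excludes pg_xlog, so the directory already present on the recovery node is left in place, old contents included, and the plain mkdir then fails. The sketch below shows one way to make that step repeatable. It is demonstrated on a throwaway local directory, and reset_xlog_dir is a hypothetical helper, not part of pgpool-II. Whether discarding the old WAL files on the recovery node is appropriate depends on the setup, so treat this as an assumption to verify.

```shell
#!/bin/sh
# reset_xlog_dir is a hypothetical helper, not part of pgpool-II.
# It removes and recreates DIR/pg_xlog so repeated recoveries start clean.
reset_xlog_dir() {
  cluster=$1
  # Refuse to run with an empty argument so rm -rf never targets /pg_xlog.
  [ -n "$cluster" ] || { echo "usage: reset_xlog_dir DIR" >&2; return 1; }
  rm -rf "$cluster/pg_xlog"
  mkdir -p "$cluster/pg_xlog" && chmod 700 "$cluster/pg_xlog"
}

# Local demonstration against a temporary directory.
demo=$(mktemp -d)
mkdir "$demo/pg_xlog"
touch "$demo/pg_xlog/stale_segment"
reset_xlog_dir "$demo"
ls -A "$demo/pg_xlog" | wc -l   # counts the entries left in pg_xlog
```

In basebackup.sh the same three commands would run on the recovery node through ssh -T, in place of the separate mkdir and chmod lines.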

     Also, the reason I had commented out recovery_target_timeline = 'latest' is that it is not mentioned on your "Simple Streaming replication setting with pgpool-II(multiple servers version)" page, http://www.pgpool.net/pgpool-web/contrib_docs/simple_sr_setting2_3.0/.
     After a long search on the net I found someone adding the line recovery_target_timeline = 'latest', so I added it for testing, and when it did not solve the problem I commented it out again.

     I request your help with this issue as soon as possible.

Best Regards,
Syed Irfan
Sr Developer

On Tuesday, 28 January 2014 4:52 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:

> This is what the recovery.conf looks like.
> *******************************************
> #recovery_target_timeline = 'latest'
> standby_mode          = 'on'
> primary_conninfo      = 'host=postgres-p.rolta.com port=5432 user=postgres'
> trigger_file = '/var/log/pgpool/trigger/trigger_file1'
> ********************************************************

Why did you remove "recovery_target_timeline = 'latest'"?

I suggest taking an execution log of the script. Change the very
beginning of the script:

#/bin/sh -x

to:

#/bin/sh -x
exec > /tmp/basebackup.log 2>&1

and please show us the content of /tmp/basebackup.log after execution of pcp_recovery_node.
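
One detail worth checking: a first line written as `#/bin/sh -x`, without the `!`, is an ordinary comment, so the -x flag is not applied and the log will contain no command trace. The sketch below shows the logging technique with an explicit `set -x`, which traces commands regardless of how the script is launched. LOG is a temporary path used only for this demonstration, and the traced commands are placeholders.

```shell
#!/bin/sh
# LOG is a hypothetical path for this demo; the thread uses /tmp/basebackup.log.
LOG=$(mktemp)

# Run the traced commands in a child shell so that this demo's own
# output is not redirected.
sh -c '
  exec > "$1" 2>&1   # send stdout and stderr to the log file
  set -x             # trace each command before it runs
  echo "backup step"
  ls /nonexistent-path-for-demo
  true
' demo "$LOG"

cat "$LOG"
```

The log then holds the trace lines, the normal output, and the error message from the failing command, which shows which step of the script went wrong.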

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> The basebackup.sh on both postgres databases is as follows
> **************************************************
> 
> #/bin/sh -x
> #
> # XXX We assume master and recovery host uses the same port number
> PORT=5432
> master_node_host_name=`hostname`
> master_db_cluster=$1
> recovery_node_host_name=$2
> recovery_db_cluster=$3
> tmp=/tmp/mytemp$$
> trap "rm -f $tmp" 0 1 2 3 15
> 
> psql -p $PORT -c "SELECT pg_start_backup('Streaming Replication', true)" postgres
> 
> rsync -C -a -c --delete --exclude postgresql.conf --exclude postmaster.pid \
> --exclude postmaster.opts --exclude pg_log \
> --exclude recovery.conf --exclude recovery.done \
> --exclude pg_xlog \
> $master_db_cluster/ $recovery_node_host_name:$recovery_db_cluster
> 
> ssh -T $recovery_node_host_name mkdir $recovery_db_cluster/pg_xlog
> ssh -T $recovery_node_host_name chmod 700 $recovery_db_cluster/pg_xlog
> ssh -T $recovery_node_host_name rm -f $recovery_db_cluster/recovery.done
> 
> cat > $tmp <<EOF
> #recovery_target_timeline = 'latest'
> standby_mode          = 'on'
> primary_conninfo      = 'host=$master_node_host_name port=$PORT user=postgres'
> trigger_file = '/var/log/pgpool/trigger/trigger_file1'
> EOF
> 
> scp $tmp $recovery_node_host_name:$recovery_db_cluster/recovery.conf
> 
> psql -p $PORT -c "SELECT pg_stop_backup()" postgres
> ***********************************************
>  
> Thanks and Regards,
> Syed Irfan.
> 
> Sr. Developer
> 
> On Thursday, 23 January 2014 11:07 PM, Jeff Frost <jeff at pgexperts.com> wrote:
>  
> 
> 
> On Jan 23, 2014, at 9:32 AM, Syed Irfan <syedirfan_77 at yahoo.com> wrote:
> 
>> Dear Tatsuo Ishii,
>>
>> I am still awaiting your reply on this issue. I have tried your suggestions, but I am still unable to run the Recovery process successfully the third time. It surprises me that it works the first time but fails on the third attempt.
>>
>> The Postgres log shows the following:
>>
>>>> 28038 2014-01-09 21:28:33 BDT FATAL:  timeline 35 of the primary does not match recovery target timeline 36
>>>> 28039 2014-01-09 21:28:38 BDT FATAL:  timeline 35 of the primary does not match recovery target timeline 36
>>
>> I urgently request your help with this pending issue.
>>
> 
> This is usually caused by postgres trying to replay WAL files from the wrong source.  Did you clean out the pg_xlog directory on the replica before taking the base backup?
> 
> What does your recovery.conf look like?