[pgpool-general: 1923] Re: Online recovery with Streaming Replication confusion.

Tue Jul 23 22:52:25 JST 2013

It looks like the basebackup.sh that comes with the doc is broken.

Could you please try the other one, which comes with the tutorial on the
pgpool wiki? (http://www.pgpool.net/pgpool-web/contrib_docs/simple_sr_setting2/index.html)

http://www.pgpool.net/pgpool-web/contrib_docs/simple_sr_setting2/basebackup.sh
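
Also, the primary log quoted below shows "Host key verification failed.",
so it may be worth checking that ssh works without any prompt for every
host the script connects to (the standby, and localhost if your script has
the "ssh -T localhost" line). A minimal sketch of such a check follows, to
be run on the primary as the OS user that runs PostgreSQL (assumed here to
be postgres). The function name check_ssh_hosts, the SSH override variable
and the host name "standby" are only for illustration; BatchMode is a
standard OpenSSH option.

```shell
#!/bin/sh
# Sketch: verify that non-interactive ssh works for each host the recovery
# script touches. SSH can be overridden (e.g. for a dry run); it defaults
# to the real ssh client.
SSH=${SSH:-ssh}

check_ssh_hosts() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of asking a question, which
        # matches a run where nobody is there to answer a prompt.
        if $SSH -T -o BatchMode=yes "$host" true </dev/null; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done
    return $failed
}

# Example (hypothetical host name "standby"):
#   check_ssh_hosts localhost standby
```

If a host is reported as FAILED because of its host key, connecting to it
once by hand as that same user and accepting the key is usually enough.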
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Hi,
> 
> I'm trying to do online recovery with master/slave replication and I'm
> basically copying
> http://www.pgpool.net/docs/latest/pgpool-en.html#master_slave_mode but I
> can't figure out how the users work and seem to be getting a "host key
> verification failed" issue.
> 
> Following the steps, I set up recovery_user and recovery_password. For
> these I used the PostgreSQL user "postgres", for which I set a password in
> the database, and not the Ubuntu user. This seems to work; is this correct?
> 
> It then says you need to be able to ssh from the primary to the standby,
> which I assume uses the user postgres. I have ssh keys set up so I can
> connect from the primary to the standby: running "ssh standby" as user
> postgres works fine.
> 
> I have created the basebackup.sh file; however, I'm not sure why this line
> uses localhost. Shouldn't it be $desthost?
> ssh -T localhost mv $destdir/recovery.done $destdir/recovery.conf
> 
> I have installed pgpool-recovery and updated pgpool_remote_start to use
> /usr/lib/postgresql/9.1/bin/ instead of /usr/local/pgsql/bin/pg_ctl
> 
> 
> Now, when I run pcp_recovery_node like so, this is my pgpool console debug
> output:
> 
> pcp_recovery_node -d 10 localhost 9898 postgres postgres 1
> DEBUG: send: tos="R", len=46
> DEBUG: recv: tos="r", len=21, data=AuthenticationOK
> DEBUG: send: tos="D", len=6
> DEBUG: recv: tos="e", len=20, data=recovery failed
> DEBUG: command failed. reason=recovery failed
> BackendError
> DEBUG: send: tos="X", len=4
> 
> 
> postgresql primary log
> 
>  pg_start_backup
> -----------------
>  0/12000020
> (1 row)
> 
> Host key verification failed.
> NOTICE:  WAL archiving is not enabled; you must ensure that all required
> WAL segments are copied through other means to complete the backup
>  pg_stop_backup
> ----------------
>  0/120000D8
> 
> 
> pgpool log
> 
> 2013-07-23 10:09:36 LOG:   pid 7547: starting recovering node 1
> 2013-07-23 10:09:36 LOG:   pid 7547: starting recovery command: "SELECT
> pgpool_recovery('basebackup.sh', '10.0.11.150',
> '/var/lib/postgresql/9.1/main/')"
> 2013-07-23 10:09:37 LOG:   pid 7547: 1st stage is done
> 2013-07-23 10:09:37 LOG:   pid 7547: check_postmaster_started: try to
> connect to postmaster on hostname:10.0.11.150 database:postgres
> user:postgres (retry 0 times)
> 2013-07-23 10:09:37 LOG:   pid 7547: check_postmaster_started: failed to
> connect to postmaster on hostname:10.0.11.150 database:postgres
> user:postgres
> 
> 
> The check_postmaster_started attempt just keeps repeating for 90 seconds,
> which is the timeout.
> 
> postgres standby startup log
> 
> 2013-07-23 10:18:17 UTC LOG:  database system was interrupted; last known
> up at 2013-07-23 10:13:35 UTC
> 2013-07-23 10:18:17 UTC LOG:  could not open file
> "pg_xlog/000000010000000000000015" (log file 0, segment 21): No such file
> or directory
> 2013-07-23 10:18:17 UTC LOG:  invalid checkpoint record
> 2013-07-23 10:18:17 UTC FATAL:  could not locate required checkpoint record
> 2013-07-23 10:18:17 UTC HINT:  If you are not restoring from a backup, try
> removing the file "/var/lib/postgresql/9.1/main/backup_label".
> 2013-07-23 10:18:17 UTC LOG:  startup process (PID 7020) exited with exit
> code 1
> 2013-07-23 10:18:17 UTC LOG:  aborting startup due to startup process
> failure
> 
> 
> Basically, it looks like it started to copy the files and then failed?