[pgpool-general: 1923] Re: Online recovery with Streaming Replication confusion.
Tatsuo Ishii
ishii at postgresql.org
Tue Jul 23 22:52:25 JST 2013
It looks like the basebackup.sh that comes with the doc is broken.
Can you please try the one that comes with the tutorial on the
pgpool wiki instead? (http://www.pgpool.net/pgpool-web/contrib_docs/simple_sr_setting2/index.html)
The script itself is here:
http://www.pgpool.net/pgpool-web/contrib_docs/simple_sr_setting2/basebackup.sh
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
> Hi,
>
> I'm trying to do online recovery with master/slave replication and I'm
> basically copying
> http://www.pgpool.net/docs/latest/pgpool-en.html#master_slave_mode but I
> can't figure out how the users work and seem to be getting a "host key
> verification failed" issue.
>
> Following the steps, I set up recovery_user and recovery_password. For
> these I used the PostgreSQL user "postgres", for which I set a password in
> the database, rather than the Ubuntu user. That seems to work; is this correct?
>
> It then says you need to be able to ssh from the primary to the standby,
> which I assume uses the user postgres. I have ssh keys set up so I can
> connect from the primary to the standby: running
> ssh standby as user postgres works fine.
>
> I have created the basebackup.sh file; however, I'm not sure why this line
> uses localhost. Shouldn't it be $desthost?
> ssh -T localhost mv $destdir/recovery.done $destdir/recovery.conf
>
> I have installed pgpool-recovery and updated pgpool_remote_start to use
> /usr/lib/postgresql/9.1/bin/ instead of /usr/local/pgsql/bin/pg_ctl
>
>
> Now, when I run pcp_recovery_node like so, this is the debug output on my
> console:
>
> pcp_recovery_node -d 10 localhost 9898 postgres postgres 1
> DEBUG: send: tos="R", len=46
> DEBUG: recv: tos="r", len=21, data=AuthenticationOK
> DEBUG: send: tos="D", len=6
> DEBUG: recv: tos="e", len=20, data=recovery failed
> DEBUG: command failed. reason=recovery failed
> BackendError
> DEBUG: send: tos="X", len=4
>
>
> postgresql primary log
>
> pg_start_backup
> -----------------
> 0/12000020
> (1 row)
>
> Host key verification failed.
> NOTICE: WAL archiving is not enabled; you must ensure that all required
> WAL segments are copied through other means to complete the backup
> pg_stop_backup
> ----------------
> 0/120000D8
>
>
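One possible reading of the "Host key verification failed." line above, offered as an assumption and not something confirmed in this thread: ssh checks host keys per name or address. The recovery command passes 10.0.11.150, and the quoted script line connects to localhost, so a key accepted for "standby" may not cover either of them. A minimal sketch of a check to run as the postgres user on the primary follows. host_key_known is a hypothetical helper name; it relies only on ssh-keygen -F, which searches a known_hosts file for a host.

```shell
#!/bin/sh
# Sketch only: host_key_known is a hypothetical helper, not part of pgpool.
# Usage: host_key_known HOST [KNOWN_HOSTS_FILE]
# Returns 0 if ssh-keygen finds an entry for HOST in the known_hosts file
# (default: ~/.ssh/known_hosts of the calling user), non-zero otherwise.
host_key_known() {
    file=${2:-$HOME/.ssh/known_hosts}
    # No known_hosts file at all means no host key can be verified.
    [ -f "$file" ] || return 1
    # ssh-keygen -F prints the matching entries; empty output means no match.
    [ -n "$(ssh-keygen -F "$1" -f "$file" 2>/dev/null)" ]
}

# Example (names taken from the logs and the script line quoted above):
#   for h in 10.0.11.150 localhost; do
#       host_key_known "$h" || echo "no host key recorded for $h"
#   done
```

If a name is reported as missing, connecting to it once by hand as the postgres user and accepting the key is the usual way to record it.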
> pgpool log
>
> 2013-07-23 10:09:36 LOG: pid 7547: starting recovering node 1
> 2013-07-23 10:09:36 LOG: pid 7547: starting recovery command: "SELECT
> pgpool_recovery('basebackup.sh', '10.0.11.150',
> '/var/lib/postgresql/9.1/main/')"
> 2013-07-23 10:09:37 LOG: pid 7547: 1st stage is done
> 2013-07-23 10:09:37 LOG: pid 7547: check_postmaster_started: try to
> connect to postmaster on hostname:10.0.11.150 database:postgres
> user:postgres (retry 0 times)
> 2013-07-23 10:09:37 LOG: pid 7547: check_postmaster_started: failed to
> connect to postmaster on hostname:10.0.11.150 database:postgres
> user:postgres
>
>
> The check_postmaster_started message just keeps repeating for 90 seconds,
> which is the timeout.
>
> postgres standby startup log
>
> 2013-07-23 10:18:17 UTC LOG: database system was interrupted; last known
> up at 2013-07-23 10:13:35 UTC
> 2013-07-23 10:18:17 UTC LOG: could not open file
> "pg_xlog/000000010000000000000015" (log file 0, segment 21): No such
> file or directory
> 2013-07-23 10:18:17 UTC LOG: invalid checkpoint record
> 2013-07-23 10:18:17 UTC FATAL: could not locate required checkpoint record
> 2013-07-23 10:18:17 UTC HINT: If you are not restoring from a backup, try
> removing the file "/var/lib/postgresql/9.1/main/backup_label".
> 2013-07-23 10:18:17 UTC LOG: startup process (PID 7020) exited with exit
> code 1
> 2013-07-23 10:18:17 UTC LOG: aborting startup due to startup process
> failure
>
>
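The standby log above stops on a missing pg_xlog segment, and the NOTICE in the primary log says the required WAL segments must be copied by other means when archiving is off. As a rough sketch, assuming backup_label carries a "START WAL LOCATION: ... (file ...)" line in the layout PostgreSQL 9.1 writes, the following checks whether the segment the backup starts from is present in a data directory. Both function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: function names and the backup_label layout are assumptions.

# Print the WAL file named on the "START WAL LOCATION" line of
# DATADIR/backup_label; print nothing if the file or the line is absent.
backup_start_segment() {
    sed -n 's/^START WAL LOCATION: .*(file \([0-9A-F]*\)).*$/\1/p' \
        "$1/backup_label" 2>/dev/null
}

# Return 0 if that segment exists under DATADIR/pg_xlog, non-zero otherwise.
start_segment_present() {
    seg=$(backup_start_segment "$1")
    [ -n "$seg" ] && [ -f "$1/pg_xlog/$seg" ]
}

# Example, run on the standby after the first recovery stage:
#   start_segment_present /var/lib/postgresql/9.1/main \
#       || echo "segment named in backup_label is missing from pg_xlog"
```

This only covers the first segment the standby needs. Later segments have to reach it as well, through streaming replication or whatever copy method is in place.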
> Basically, it looks like it starts to copy the files and then fails?