[pgpool-general: 1922] Online recovery with Streaming Replication confusion.

Nathan Brennan nathan at healthengine.com.au
Tue Jul 23 19:30:33 JST 2013


Hi,

I'm trying to do online recovery with master/slave replication and I'm
basically following
http://www.pgpool.net/docs/latest/pgpool-en.html#master_slave_mode
but I can't figure out how the users work, and I seem to be getting a
"Host key verification failed" error.

Following the steps, I set up recovery_user and recovery_password. For
this I used the PostgreSQL user "postgres", for which I set a password in
the database, rather than the Ubuntu user. That seems to work; is this
correct?

It then says you need to be able to ssh from the primary to the standby,
which I assume uses the user postgres. I have ssh keys set up so I can
connect from the primary to the standby like so:
ssh standby
Run as user postgres, this works fine.
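
One caveat with this check: an interactive login can answer a host key
prompt, while a script started by the database server cannot. Below is a
minimal sketch of a non-interactive version of the same test. The function
name check_ssh and the SSH_BIN override are hypothetical and exist only for
illustration; BatchMode and StrictHostKeyChecking are standard OpenSSH
client options. It assumes the recovery script runs under the OS account
that owns the PostgreSQL server process, so the check would be run from
that account on the primary.

```shell
#!/bin/sh
# Hypothetical helper (not part of pgpool): test whether ssh to a host works
# without any prompt, the way a script started by the server would need it to.
# SSH_BIN is an illustrative override so the command can be stubbed out.
check_ssh() {
    host=$1
    # BatchMode=yes: fail instead of asking for a password or passphrase.
    # StrictHostKeyChecking=yes: fail if the host key is not already in
    # known_hosts, instead of asking whether to accept it.
    "${SSH_BIN:-ssh}" -o BatchMode=yes -o StrictHostKeyChecking=yes -T "$host" true
}

# Example (run as the postgres OS user on the primary):
#   check_ssh standby && echo "non-interactive ssh OK"
```

If this fails for the name or address the script actually uses
(10.0.11.150 in the pgpool log), the host key for that exact name may be
missing from the account's known_hosts file.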

I have created the basebackup.sh file; however, I'm not sure why this line
uses localhost. Shouldn't it be $desthost?
ssh -T localhost mv $destdir/recovery.done $destdir/recovery.conf
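
To make the question concrete, here is a sketch of that step written
against the destination host instead. It assumes the script receives the
primary data directory, the destination host and the destination data
directory as its three arguments (the pgpool_recovery call in the pgpool
log passes the last two), and that the standby runs on a separate machine.
The function name and the SSH_BIN override are hypothetical; this is not
the documentation's script.

```shell
#!/bin/sh
# Sketch only: rename recovery.done to recovery.conf on the node being
# recovered rather than on localhost.
# activate_recovery_conf and SSH_BIN are illustrative names, not pgpool's.
activate_recovery_conf() {
    desthost=$1   # host of the node to recover, e.g. 10.0.11.150
    destdir=$2    # data directory on that node
    # ${destdir%/} drops one trailing slash so the paths stay tidy.
    "${SSH_BIN:-ssh}" -T "$desthost" \
        mv "${destdir%/}/recovery.done" "${destdir%/}/recovery.conf"
}

# Example, with the values from the pgpool log:
#   activate_recovery_conf 10.0.11.150 /var/lib/postgresql/9.1/main/
```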

I have installed pgpool-recovery and updated pgpool_remote_start to use
pg_ctl from /usr/lib/postgresql/9.1/bin/ instead of
/usr/local/pgsql/bin/pg_ctl.


Now, when I run pcp_recovery_node like so, this is the debug output on my
console:

pcp_recovery_node -d 10 localhost 9898 postgres postgres 1
DEBUG: send: tos="R", len=46
DEBUG: recv: tos="r", len=21, data=AuthenticationOK
DEBUG: send: tos="D", len=6
DEBUG: recv: tos="e", len=20, data=recovery failed
DEBUG: command failed. reason=recovery failed
BackendError
DEBUG: send: tos="X", len=4


postgresql primary log

 pg_start_backup
-----------------
 0/12000020
(1 row)

Host key verification failed.
NOTICE:  WAL archiving is not enabled; you must ensure that all required
WAL segments are copied through other means to complete the backup
 pg_stop_backup
----------------
 0/120000D8


pgpool log

2013-07-23 10:09:36 LOG:   pid 7547: starting recovering node 1
2013-07-23 10:09:36 LOG:   pid 7547: starting recovery command: "SELECT
pgpool_recovery('basebackup.sh', '10.0.11.150',
'/var/lib/postgresql/9.1/main/')"
2013-07-23 10:09:37 LOG:   pid 7547: 1st stage is done
2013-07-23 10:09:37 LOG:   pid 7547: check_postmaster_started: try to
connect to postmaster on hostname:10.0.11.150 database:postgres
user:postgres (retry 0 times)
2013-07-23 10:09:37 LOG:   pid 7547: check_postmaster_started: failed to
connect to postmaster on hostname:10.0.11.150 database:postgres
user:postgres


The check_postmaster_started message just keeps repeating for 90 seconds,
which is the timeout.

postgres standby startup log

2013-07-23 10:18:17 UTC LOG:  database system was interrupted; last known
up at 2013-07-23 10:13:35 UTC
2013-07-23 10:18:17 UTC LOG:  could not open file
"pg_xlog/000000010000000000000015" (log file 0, segment 21): No such
file or directory
2013-07-23 10:18:17 UTC LOG:  invalid checkpoint record
2013-07-23 10:18:17 UTC FATAL:  could not locate required checkpoint record
2013-07-23 10:18:17 UTC HINT:  If you are not restoring from a backup, try
removing the file "/var/lib/postgresql/9.1/main/backup_label".
2013-07-23 10:18:17 UTC LOG:  startup process (PID 7020) exited with exit
code 1
2013-07-23 10:18:17 UTC LOG:  aborting startup due to startup process
failure


Basically, it looks like it started to copy the files and then failed?
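
A sketch of how a script could make this kind of failure easier to spot:
each step is wrapped so that a non-zero exit status is reported on stderr
together with the name of the step. run_step is a hypothetical helper, not
something pgpool provides, and the commands in the commented usage are
placeholders for whatever the script actually runs.

```shell
#!/bin/sh
# Hypothetical helper: run one step of a recovery script and report
# which step failed instead of silently continuing.
run_step() {
    label=$1
    shift
    if ! "$@"; then
        echo "basebackup: step '$label' failed" >&2
        return 1
    fi
}

# Possible use inside the script (commands are placeholders):
#   run_step "copy data directory" \
#       rsync -a -e ssh "$datadir/" "$desthost:$destdir/" || failed=1
```

The caller keeps control over what happens next, for example still
running pg_stop_backup before exiting with a non-zero status.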