[pgpool-general: 1930] Re: Online recovery with Streaming Replication confusion.

Wed Jul 24 16:25:55 JST 2013

This one worked, thanks.

Nathan Brennan | Senior Software Engineer


On Tue, Jul 23, 2013 at 9:52 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:

> It looks like the basebackup.sh that comes with the doc is broken.
>
> Can you please try the other one, which comes with the tutorial on the
> pgpool wiki?
> http://www.pgpool.net/pgpool-web/contrib_docs/simple_sr_setting2/index.html
>
> The script is here:
> http://www.pgpool.net/pgpool-web/contrib_docs/simple_sr_setting2/basebackup.sh
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
> > Hi,
> >
> > I'm trying to do online recovery with master/slave replication and I'm
> > basically copying
> > http://www.pgpool.net/docs/latest/pgpool-en.html#master_slave_mode but I
> > can't figure out how the users work and seem to be getting a "host key
> > verification failed" issue.
> >
> > Following the steps, I set up the recovery_user and recovery_password.
> > For this I used the PostgreSQL user "postgres", which I set a password
> > for in the database, and not the Ubuntu user. This seems to work; is
> > this correct?
> >
> > It then says you need to be able to ssh from the primary to the standby,
> > which I assume uses the user postgres. I have ssh keys set up so I can
> > connect from the primary to the standby: running "ssh standby" as user
> > postgres works fine.
> >
> > I have created the basebackup.sh file; however, I'm not sure why this
> > line uses localhost. Shouldn't it be $desthost?
> > ssh -T localhost mv $destdir/recovery.done $destdir/recovery.conf
> >
> > I have installed pgpool-recovery and updated pgpool_remote_start to use
> > /usr/lib/postgresql/9.1/bin/ instead of /usr/local/pgsql/bin/pg_ctl
> >
> >
> > Now when I run pcp_recovery_node as shown below, this is the debug
> > output on my pgpool console:
> >
> > pcp_recovery_node -d 10 localhost 9898 postgres postgres 1
> > DEBUG: send: tos="R", len=46
> > DEBUG: recv: tos="r", len=21, data=AuthenticationOK
> > DEBUG: send: tos="D", len=6
> > DEBUG: recv: tos="e", len=20, data=recovery failed
> > DEBUG: command failed. reason=recovery failed
> > BackendError
> > DEBUG: send: tos="X", len=4
> >
> >
> > postgresql primary log
> >
> >  pg_start_backup
> > -----------------
> >  0/12000020
> > (1 row)
> >
> > Host key verification failed.
> > NOTICE:  WAL archiving is not enabled; you must ensure that all required
> > WAL segments are copied through other means to complete the backup
> >  pg_stop_backup
> > ----------------
> >  0/120000D8
> >
> >
> > pgpool log
> >
> > 2013-07-23 10:09:36 LOG:   pid 7547: starting recovering node 1
> > 2013-07-23 10:09:36 LOG:   pid 7547: starting recovery command: "SELECT
> > pgpool_recovery('basebackup.sh', '10.0.11.150',
> > '/var/lib/postgresql/9.1/main/')"
> > 2013-07-23 10:09:37 LOG:   pid 7547: 1st stage is done
> > 2013-07-23 10:09:37 LOG:   pid 7547: check_postmaster_started: try to
> > connect to postmaster on hostname:10.0.11.150 database:postgres
> > user:postgres (retry 0 times)
> > 2013-07-23 10:09:37 LOG:   pid 7547: check_postmaster_started: failed to
> > connect to postmaster on hostname:10.0.11.150 database:postgres
> > user:postgres
> >
> >
> > The check_postmaster_started attempt just keeps repeating for 90
> > seconds, which is the timeout.
> >
> > postgres standby startup log
> >
> > 2013-07-23 10:18:17 UTC LOG:  database system was interrupted; last known
> > up at 2013-07-23 10:13:35 UTC
> > 2013-07-23 10:18:17 UTC LOG:  could not open file
> > "pg_xlog/000000010000000000000015" (log file 0, segment 21): No such
> > file or directory
> > 2013-07-23 10:18:17 UTC LOG:  invalid checkpoint record
> > 2013-07-23 10:18:17 UTC FATAL:  could not locate required checkpoint
> > record
> > 2013-07-23 10:18:17 UTC HINT:  If you are not restoring from a backup,
> > try removing the file "/var/lib/postgresql/9.1/main/backup_label".
> > 2013-07-23 10:18:17 UTC LOG:  startup process (PID 7020) exited with exit
> > code 1
> > 2013-07-23 10:18:17 UTC LOG:  aborting startup due to startup process
> > failure
> >
> >
> > Basically, it looks like it starts to copy the files and then fails?
>
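
For anyone hitting the same "Host key verification failed" line, here is a
minimal sketch of two checks that can be made before running
pcp_recovery_node again. It is not part of pgpool-II: the function names
rename_recovery_cmd and ssh_noninteractive_ok are hypothetical, and it
assumes the standby is on a different host from the primary, as the
10.0.11.150 address in the pgpool log suggests. The first function only
prints the rename step with the standby host in place of localhost, so it
can be compared with the line in basebackup.sh. The second tries ssh with
prompts disabled, which is closer to how ssh behaves when the recovery
script is started by the PostgreSQL server rather than from a terminal.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of pgpool-II.

# Print (do not run) the rename step with the standby host substituted
# for localhost. $1 and $2 mirror the script's $desthost and $destdir.
rename_recovery_cmd() {
    printf 'ssh -T %s mv %s/recovery.done %s/recovery.conf\n' "$1" "$2" "$2"
}

# Return 0 only if ssh to host $1 succeeds without any prompt.
# BatchMode=yes makes ssh fail instead of asking to confirm an unknown
# host key or asking for a password.
ssh_noninteractive_ok() {
    ssh -T -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

The second check is only meaningful when run as the OS user that owns the
PostgreSQL server process on the primary, against every host name the
script passes to ssh or rsync, for example
"ssh_noninteractive_ok localhost" and "ssh_noninteractive_ok 10.0.11.150".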