[pgpool-general: 1325] Re: hot recovery of a node
Gilbert Soucy
gsoucy at 36pix.com
Thu Jan 17 22:57:54 JST 2013
Thank you for the reply.
In the node 1 log, I dont find a "last completed transaction" line as you
have in your example.
For this to happen, am I right to say that I need to rsync the files in
pg_xlog (node 0) to my archive folder ( /thumbnails/postgres_archives/ ) on
node 1 ? My stage 2 script does not do that right now.
I am not completely comfortable with what the files in pg_xlog contain
exactly, at the various stages. Also, I find it strange that the log show
many "No such file or directory", even in your example.
Thanks
Gilbert
~~~~
LOg file for postgres on node 1 during recovery:
2013-01-16 19:50:20 EST : LOG: database system was interrupted; last known
up at 2013-01-16 19:50:16 EST
2013-01-16 19:50:20 EST : LOG: starting archive recovery
2013-01-16 19:50:20 EST : LOG: restore_command = 'cp
/thumbnails/postgres_archives/pg_log/%f %p'
2013-01-16 19:50:20 EST : DEBUG: executing restore command "cp
/thumbnails/postgres_archives/pg_log/00000002.history
pg_xlog/RECOVERYHISTORY"
cp: cannot stat `/thumbnails/postgres_archives/pg_log/00000002.history': No
such file or directory
2013-01-16 19:50:20 EST : DEBUG: could not restore file "00000002.history"
from archive: return code 256
2013-01-16 19:50:20 EST : DEBUG: history of timeline 2 is (i 2 1)
2013-01-16 19:50:20 EST : DEBUG: executing restore command "cp
/thumbnails/postgres_archives/pg_log/00000002000000040000008D.00000020.backup
pg_xlog/RECOVERY
HISTORY"
cp: cannot stat
`/thumbnails/postgres_archives/pg_log/00000002000000040000008D.00000020.backup':
No such file or directory
2013-01-16 19:50:20 EST : DEBUG: could not restore file
"00000002000000040000008D.00000020.backup" from archive: return code 256
2013-01-16 19:50:20 EST : DEBUG: executing restore command "cp
/thumbnails/postgres_archives/pg_log/00000002000000040000008D
pg_xlog/RECOVERYXLOG"
cp: cannot stat
`/thumbnails/postgres_archives/pg_log/00000002000000040000008D': No such
file or directory
2013-01-16 19:50:20 EST : DEBUG: could not restore file
"00000002000000040000008D" from archive: return code 256
2013-01-16 19:50:20 EST : DEBUG: checkpoint record is at 4/8D000020
2013-01-16 19:50:20 EST : DEBUG: redo record is at 4/8D000020; shutdown
FALSE
2013-01-16 19:50:20 EST : DEBUG: next transaction ID: 0/1625; next OID:
17797
2013-01-16 19:50:20 EST : DEBUG: next MultiXactId: 1; next
MultiXactOffset: 0
2013-01-16 19:50:20 EST : LOG: automatic recovery in progress
2013-01-16 19:50:20 EST : LOG: record with zero length at 4/8D000068
2013-01-16 19:50:20 EST : LOG: redo is not required
2013-01-16 19:50:20 EST : DEBUG: executing restore command "cp
/thumbnails/postgres_archives/pg_log/00000002000000040000008D
pg_xlog/RECOVERYXLOG"
cp: cannot stat
`/thumbnails/postgres_archives/pg_log/00000002000000040000008D': No such
file or directory
2013-01-16 19:50:20 EST : DEBUG: could not restore file
"00000002000000040000008D" from archive: return code 256
2013-01-16 19:50:20 EST : DEBUG: executing restore command "cp
/thumbnails/postgres_archives/pg_log/00000003.history
pg_xlog/RECOVERYHISTORY"
2013-01-16 19:50:20 EST : LOG: restored log file "00000003.history" from
archive
2013-01-16 19:50:20 EST : DEBUG: executing restore command "cp
/thumbnails/postgres_archives/pg_log/00000004.history
pg_xlog/RECOVERYHISTORY"
2013-01-16 19:50:20 EST : LOG: restored log file "00000004.history" from
archive
2013-01-16 19:50:20 EST : DEBUG: executing restore command "cp
/thumbnails/postgres_archives/pg_log/00000005.history
pg_xlog/RECOVERYHISTORY"
2013-01-16 19:50:20 EST : LOG: restored log file "00000005.history" from
archive
2013-01-16 19:50:20 EST : DEBUG: executing restore command "cp
/thumbnails/postgres_archives/pg_log/00000006.history
pg_xlog/RECOVERYHISTORY"
cp: cannot stat `/thumbnails/postgres_archives/pg_log/00000006.history': No
such file or directory
2013-01-16 19:50:20 EST : DEBUG: could not restore file "00000006.history"
from archive: return code 256
2013-01-16 19:50:20 EST : LOG: selected new timeline ID: 6
2013-01-16 19:50:20 EST : DEBUG: executing restore command "cp
/thumbnails/postgres_archives/pg_log/00000002.history
pg_xlog/RECOVERYHISTORY"
cp: cannot stat `/thumbnails/postgres_archives/pg_log/00000002.history': No
such file or directory
2013-01-16 19:50:20 EST : DEBUG: could not restore file "00000002.history"
from archive: return code 256
2013-01-16 19:50:20 EST : LOG: archive recovery complete
2013-01-16 19:50:20 EST : DEBUG: recycled transaction log file
"00000002000000040000008C"
2013-01-16 19:50:20 EST : DEBUG: transaction ID wrap limit is 2147484295,
limited by database "template1"
2013-01-16 19:50:20 EST : DEBUG: shmem_exit(0): 3 callbacks to make
2013-01-16 19:50:20 EST : DEBUG: proc_exit(0): 2 callbacks to make
2013-01-16 19:50:20 EST : DEBUG: exit(0)
2013-01-16 19:50:20 EST : DEBUG: shmem_exit(-1): 0 callbacks to make
2013-01-16 19:50:20 EST : DEBUG: proc_exit(-1): 0 callbacks to make
2013-01-16 19:50:20 EST : DEBUG: reaping dead processes
2013-01-16 19:50:20 EST : LOG: autovacuum launcher started
2013-01-16 19:50:20 EST : LOG: database system is ready to accept
connections
On Wed, Jan 16, 2013 at 10:06 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> > Hello
> >
> > I hope this is the place whee I can ask general questions about pgpool. I
> > could not find a forum. Otheriwse, please let me know where I should go
> to
> > get help
>
> Yes, this is the perfect place to ask general questions about
> pgpool(there's no forum).
>
> > Here is my question:
> >
> > I am trying to setup pgpool II with 2 postgres nodes using replication
> and
> > load balancing. I have setup the watch dog to switch the virtual IP
> > automatically in case of failure. The 2 postgres nodes run on 2 identical
> > computers.
> >
> > Everything works fine except for the recovery which I dont quite
> undestand.
> > I have installed and configured the scripts:
> >
> > basebackup.sh
> > pgpool_remote_start
> > pgpool_recovery_pitr
> >
> >
> > Then I simulate a failure : I shutdown one of the postgres nodes. This
> gets
> > detected fine and pgpool is still functional with the remaining node
> (which
> > can still accept connections).
> >
> > When I try to recover, it does not work as I expected. I thought that we
> > could recover *while pgpool keeps accepting requests*. (at least for
> stage
> > 1 of the recover). So, I simulated this by inserting a new line in a
> table
> > of my database just before the end of the basebackup.sh script (just
> *after
> > * the rsync and *before* the pg_stop_backup() ).
> >
> > So, my basebackup.sh script looks like:
> > ~~~~~~~~~~
> >
> > ....
> >
> > *psql -U postgres -p 5432 -c "insert into users values (3000, 'def1');"
> > myDB # this line was added to basebackup.sh to simulate a
> transaction
> > during a recover, stage 1 *
> >
> > psql -U postgres -p 5432 -c "SELECT pg_stop_backup()" postgres
> > ~~~~~~~~~~~~
> >
> > I then try to recover my failed node with:
> >
> > /usr/local/bin/pcp_recovery_node -d 15 192.168.0.110 9898 pgpool pgpool 1
> >
> > Which succeeds. However, when I look at the content of the databases,
> only
> > the master database has the new line "3000 test" (purposely added in my
> > basebackup.sh script). The recovered database has all other lines but not
> > this one. So the recovery is not complete in my opinion (or I dont
> > understand what it is supposed to do).
> >
> > Am I wrong in expecting that this should work (i.e. after the recover,
> the
> > 2 database should be completely in sync) ?
>
> Works fine for me. I guess you have problem with archive recovery
> setting of PostgreSQL.
>
> Below is my recovered PostgreSQL log. Do you find something like
> "restored log file "000000030000000100000021" from archive" in your
> PostgreSQL log(in your case DB node 1)?
>
> 31055 2013-01-17 11:38:14 JST LOG: database system was interrupted; last
> known up at 2013-01-17 11:38:11 JST
> 31055 2013-01-17 11:38:14 JST LOG: creating missing WAL directory
> "pg_xlog/archive_status"
> cp: cannot stat `/home/t-ishii/work/
> git.postgresql.org/test.repli/archivedir/00000003.history': No such file
> or directory
> 31055 2013-01-17 11:38:14 JST LOG: starting archive recovery
> 31055 2013-01-17 11:38:14 JST LOG: restored log file
> "000000030000000100000020" from archive
> 31055 2013-01-17 11:38:14 JST LOG: redo starts at 1/20000020
> 31055 2013-01-17 11:38:14 JST LOG: consistent recovery state reached at
> 1/21000000
> 31053 2013-01-17 11:38:14 JST LOG: database system is ready to accept
> read only connections
> 31055 2013-01-17 11:38:14 JST LOG: restored log file
> "000000030000000100000021" from archive
> cp: cannot stat `/home/t-ishii/work/
> git.postgresql.org/test.repli/archivedir/000000030000000100000022': No
> such file or directory
> 31055 2013-01-17 11:38:14 JST LOG: could not open file
> "pg_xlog/000000030000000100000022" (log file 1, segment 34): No such file
> or directory
> 31055 2013-01-17 11:38:14 JST LOG: redo done at 1/21000E48
> 31055 2013-01-17 11:38:14 JST LOG: last completed transaction was at log
> time 2013-01-17 11:38:12.376859+09
> 31055 2013-01-17 11:38:14 JST LOG: restored log file
> "000000030000000100000021" from archive
> cp: cannot stat `/home/t-ishii/work/
> git.postgresql.org/test.repli/archivedir/00000004.history': No such file
> or directory
> 31055 2013-01-17 11:38:14 JST LOG: selected new timeline ID: 4
> cp: cannot stat `/home/t-ishii/work/
> git.postgresql.org/test.repli/archivedir/00000003.history': No such file
> or directory
> 31055 2013-01-17 11:38:15 JST LOG: archive recovery complete
> 31053 2013-01-17 11:38:15 JST LOG: database system is ready to accept
> connections
> 31067 2013-01-17 11:38:15 JST LOG: autovacuum launcher started
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20130117/a012b29f/attachment.htm>
More information about the pgpool-general
mailing list