<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">On 13-08-05 11:54 PM, Tatsuo Ishii

      wrote:<br>

    </div>

    <blockquote

      cite="mid:20130806.112452.765978550314514202.t-ishii@sraoss.co.jp"

      type="cite">

      <blockquote type="cite">

        <pre wrap="">3) Run the stage 2 script.  pgpool_recovery_pitr does

pgpool_switch_xlog() as recommended, so all WAL are archived by the

time it completes.

4)    Run pgpool_remote_start.

  a) This copies the base backup and archived WALs, and writes

  recovery.conf.

  b) Since in postgresql.conf I have hot_standby=off, connections are

  blocked until recovery completes.

  c) My pgpool_remote_start starts PostgreSQL on the recovered node in a

  synchronous fashion, so it does not terminate until the PostgreSQL

  startup is finished.

5)    Connections are allowed once more.

Is this correct?  With this flow I can't see how the newly recovered

node could be out of sync with the master.  But the cluster behaves as

if an update has not been recorded in the archived WAL, or an update

took place on the master and the other slave somewhere between step 2

and step 5.

</pre>

      </blockquote>

      <pre wrap="">

Not sure I correctly understand your strategy but...

Can you stop updating at all while online recovery and make sure that

there's no difference between master DB and recoveried DB? If there's

no difference, the problem must be in the WAL copy process (or timing).

--

Tatsuo Ishii

SRA OSS, Inc. Japan

English: <a class="moz-txt-link-freetext" href="http://www.sraoss.co.jp/index_en.php">http://www.sraoss.co.jp/index_en.php</a>

Japanese: <a class="moz-txt-link-freetext" href="http://www.sraoss.co.jp">http://www.sraoss.co.jp</a>

</pre>

    </blockquote>

    <br>

    Tatsuo, thanks very much for your response.&nbsp; I've tried online

    recovery with the database clients disconnected, but it didn't have

    any effect.&nbsp; The database recovers correctly, works for a while and

    then when a certain update takes place one of the nodes blows up.&nbsp;

    It's not always the same statement but one of them is:<br>

    <br>

    <meta http-equiv="content-type" content="text/html;

      charset=ISO-8859-1">

    2013-08-20 17:04:02 ERROR: pid 30688: pgpool detected difference of

    the number of inserted, updated or deleted tuples. Possible last

    query was: "UPDATE ws_cached_searches SET search_data = $1,

    cache_time = current_timestamp WHERE cached_search_id = $2"<br>

    2013-08-20 17:04:02 LOG:&nbsp;&nbsp; pid 30688: CommandComplete: Number of

    affected tuples are: 1 0 1<br>

    <br>

    <br>

    I've posted my configuration to

    <meta http-equiv="content-type" content="text/html;

      charset=ISO-8859-1">

    <a href="https://github.com/clixtec/redundant-pgpool-config">https://github.com/clixtec/redundant-pgpool-config</a>

    - pgpool and PostgreSQL config files in the root, and the four

    script files in a subdirectory.&nbsp; Again it is modelled on the

    approach described at

    <meta http-equiv="content-type" content="text/html;

      charset=ISO-8859-1">

    <a

href="http://zetetic.net/blog/2012/3/9/point-in-time-recovery-from-backup-using-postgresql-continuo.html">http://zetetic.net/blog/2012/3/9/point-in-time-recovery-from-backup-using-postgresql-continuo.html</a>.&nbsp;

    Am I doing anything fundamentally wrong?<br>

    <br>

    Thanks,<br>

    Sean<br>

  </body>

</html>