[pgpool-hackers: 2429] Re: Load balancing with synchronous replication

Wed Jul 5 13:40:59 JST 2017

On Wed, Jul 5, 2017 at 3:47 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> Right.  With the current synchronous_replay patch you could look for
>> pg_stat_replication.sync_replay = 'available'.  The trouble is that it
>> may be out of date, so you could still hit 40P02.  We'd need to decide
>> how to handle that too.
>
> Is the 40P02 error always returned? I thought there's a GUC to turn
> off the error.

With the synchronous_replay patch, the new error 40P02 can only be
raised on standby servers when the new GUC synchronous_replay is set
to on, so users have to activate this behaviour explicitly.  The error
is raised if the primary considers your standby too slow and has
stopped waiting for it to apply transactions at commit time.  In that
case you will see sync_replay = 'unavailable' in the primary server's
pg_stat_replication view for that standby server.  If the standby
server catches up again, then it will switch back to 'available' in
the primary's pg_stat_replication view and the error will stop
happening.  You get the error whenever
pg_stat_replication.replay_lag > synchronous_replay_max_lag (another
new GUC introduced by my patch).
The replicas with sync_replay = 'available' are the set of replicas
that can currently handle queries with synchronous_replay = on without
raising error 40P02 *at that instant*, but of course that set might
change 0.7 milliseconds later.
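
To make the "set of available replicas" idea concrete, here is a
minimal sketch in Python.  It only illustrates the comparison described
above; the function name, the standby names and the treatment of a
missing lag sample as "not available" are my assumptions for the
example, not something the patch defines.

```python
from datetime import timedelta


def available_standbys(replay_lag_by_standby, max_lag):
    """Return the standbys whose replay lag does not exceed max_lag.

    Assumption for this sketch: a standby with no lag sample (None)
    is treated as not available.
    """
    return sorted(
        name
        for name, lag in replay_lag_by_standby.items()
        if lag is not None and lag <= max_lag
    )


# Hypothetical snapshot of replay_lag per standby, as seen on the primary.
snapshot = {
    "standby1": timedelta(milliseconds=3),
    "standby2": timedelta(seconds=5),
    "standby3": None,
}

print(available_standbys(snapshot, timedelta(seconds=1)))  # ['standby1']
```

The result is only a snapshot: a load balancer acting on it can still
pick a standby that has become unavailable in the meantime.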

Note that the GUC synchronous_replay does two different things: in a
committing write transaction it waits for the current set of
'available' standbys to apply the transaction or stop being
'available', and in a read-only standby transaction it checks that
this standby is currently 'available'.  Together those behaviours
create a useful guarantee about data visibility.
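
The interaction of the two behaviours can be shown with a toy model.
This is only a sketch under strong simplifying assumptions: LSNs are
plain integers, replication is instantaneous, and how a real standby
learns its own status is not modelled.  The class and method names are
invented for the example.

```python
class ToyCluster:
    """Toy model: LSNs are plain integers, replication is instantaneous."""

    def __init__(self, standbys):
        self.primary_lsn = 0
        self.applied = {name: 0 for name in standbys}
        self.available = set(standbys)

    def commit(self, can_apply):
        """Commit one write; `can_apply` is the set of standbys keeping up."""
        self.primary_lsn += 1
        for name in self.applied:
            if name in can_apply:
                self.applied[name] = self.primary_lsn
            else:
                # Too slow: stop waiting for it and mark it unavailable.
                self.available.discard(name)
        return self.primary_lsn

    def read(self, standby):
        """Read-only query on a standby: check availability first."""
        if standby not in self.available:
            raise RuntimeError("40P02")
        return self.applied[standby]


cluster = ToyCluster(["a", "b"])
lsn = cluster.commit(can_apply={"a"})
assert cluster.read("a") == lsn      # fresh data
try:
    cluster.read("b")
except RuntimeError as e:
    assert str(e) == "40P02"         # error instead of stale data
```

In this model a read either returns the latest committed LSN or
raises; it never returns an older LSN from a standby that fell behind.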

The idea is: if you run a read-only transaction with
synchronous_replay = on, then you are guaranteed to see a write
transaction that committed and returned with synchronous_replay = on
before your query started OR get the new error.  There is no
possibility of seeing stale data.  The question is: how can we handle
this error automatically so the client application doesn't have to?
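
One possible answer, sketched in Python rather than in pgpool's C: a
load balancer could retry the read-only query on another standby and
fall back to the primary, which always sees its own commits.  This is
an illustration only.  It assumes the error arrives before any result
has been sent to the client, and the exception class and the callables
standing in for connections are hypothetical.

```python
class SyncReplayUnavailable(Exception):
    """Hypothetical stand-in for an error carrying SQLSTATE 40P02."""

    sqlstate = "40P02"


def run_read_only(query, standbys, primary):
    """Try each standby in turn; fall back to the primary on 40P02.

    `standbys` is a list of callables and `primary` is a callable, each
    taking a query string.  They are placeholders for real connections.
    """
    for execute in standbys:
        try:
            return execute(query)
        except SyncReplayUnavailable:
            continue  # this standby is not 'available'; try the next one
    return primary(query)


# Fake nodes for demonstration.
def lagging_standby(query):
    raise SyncReplayUnavailable()


def healthy_standby(query):
    return "standby:" + query


def primary_node(query):
    return "primary:" + query


print(run_read_only("SELECT 1", [lagging_standby, healthy_standby], primary_node))
# standby:SELECT 1
```

Whether a retry like this is acceptable inside an explicit transaction
block, as opposed to a single-statement read, is a separate question.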

-- 
Thomas Munro
http://www.enterprisedb.com