[pgpool-hackers: 2427] Re: Load balancing with synchronous replication

Wed Jul 5 08:15:24 JST 2017

On Wed, Jul 5, 2017 at 12:28 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> Here is my idea for step 1.  Use the regular master/slave load
>> balancing mode, but with a couple of small modifications:
>>
>> 1.  If error 40P02 is raised (the new error introduced by my patch,
>> "synchronous replay is not available"), then remember not to pick that
>> server again for N seconds (configurable).
>>
>> 2.  If error 40P02 is raised, then automatically retry the statement;
>> because of point (1), this will happen on a different server.  There
>> is a small limit on the number of retries (configurable).
>
> What would happen if the query is executed in an explicit transaction?
> If the error raised, the transaction is already in the abort state. So
> the retry needs for whole transaction, which seems very difficult for
> me.

For explicit transactions, the transaction must be rolled back in
order to be able to reuse the connection, but I assume you already
have handling for that sort of thing?  As for automatically retrying
the transaction if error 40P02 is raised, that would only be possible
if the error occurred on the *first* statement of the transaction,
because the user hasn't seen any results yet.  Otherwise there is no
choice but to let the error reach the user.  Fortunately, there are
two ways for users to make sure that error 40P02 only happens on the
first statement of a transaction: (1) use REPEATABLE READ READ ONLY
(error 40P02 is only raised when acquiring a snapshot, which happens
on the first statement only in the higher isolation levels), and (2)
use implicit transactions (the statement is always the first).

>> I think that would be better than trying to check the
>> pg_stat_replication view periodically on the master (the idea I
>> mentioned in the earlier email).  What do you think?
>
> Because using pg_stat_replication view does not have such a problem
> (aborted transaction), things would be lot easier. Fortunately there's
> a convenient infrastructure in Pgpool-II: the relation cache. It's a
> per process cache. We could cache the result for querying
> pg_stat_replication. The cache can have an expiration time. So by
> setting the expiration, it acts like "checking pg_stat_replication
> view periodically".

Right.  With the current synchronous_replay patch you could look for
pg_stat_replication.sync_replay = 'available'.  The trouble is that it
may be out of date, so you could still hit 40P02.  We'd need to decide
how to handle that too.  I was thinking: if we have to handle the
error anyway, then why have two systems?  That's why I thought that
perhaps we should ONLY use the error to guide the routing.

>> Here is my idea for step 2.  Multi-statement transactions that begin
>> with "BEGIN ... READ ONLY" could also be sent to read-only servers.
>
> Good point. Probably we could do an optimaization by checking "BEGIN
> ... READ ONLY" today.

+1

Thanks very much for taking the time to think about my strange questions :-)

-- 
Thomas Munro
http://www.enterprisedb.com