[pgpool-hackers: 2418] Re: Load balancing with synchronous replication

Sat Jul 1 11:32:14 JST 2017

On Thu, May 26, 2016 at 9:42 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>> I think the problem with your idea is the overhead for pgpool-II to ask
>>> PostgreSQL for its status. This inquiry would happen with every single
>>> query sent from clients to pgpool-II, and that could cause visible
>>> performance degradation. What do you think?
>>
>> Yeah, I didn't mean for every query.  I suppose something periodic
>> like sr_check_period.
>
> That's an idea. I think the patch you plan to propose for 10.0 will
> greatly help this functionality.

Hello again pgpool hackers,

I have now renamed that proposed feature to "synchronous replay".  I
am wondering if it would be a good idea to do a proof-of-concept patch
for pgpool.  One of the central ideas of my patch is that it should be
easy to use for end users, which is where pgpool can help.

Here is my idea for step 1.  Use the regular master/slave load
balancing mode, but with a couple of small modifications:

1.  If error 40P02 is raised (the new error introduced by my patch,
"synchronous replay is not available"), then remember not to pick that
server again for N seconds (configurable).

2.  If error 40P02 is raised, then automatically retry the statement;
because of point (1), this will happen on a different server.  There
is a small limit on the number of retries (configurable).

I think that would be better than trying to check the
pg_stat_replication view periodically on the master (the idea I
mentioned in the earlier email).  What do you think?
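
To make step 1 concrete, here is a rough sketch in Python (not pgpool
code).  Everything in it is hypothetical and only for illustration: the
names Balancer, QueryError, penalty_seconds and max_retries, the
round-robin selection, and the choice to give up with 40P02 when every
server is being avoided.  Only the error code and the two rules come
from the description above.

```python
import time

SYNC_REPLAY_UNAVAILABLE = "40P02"  # SQLSTATE introduced by the proposed patch


class QueryError(Exception):
    """Hypothetical error type carrying the SQLSTATE reported by a backend."""

    def __init__(self, sqlstate):
        super().__init__(sqlstate)
        self.sqlstate = sqlstate


class Balancer:
    """Illustrative load balancer that avoids servers which raised 40P02."""

    def __init__(self, servers, penalty_seconds=10.0, max_retries=2,
                 clock=time.monotonic):
        self.servers = list(servers)
        self.penalty_seconds = penalty_seconds  # the "N seconds" of rule 1
        self.max_retries = max_retries          # the retry limit of rule 2
        self.clock = clock
        self.avoid_until = {}                   # server -> time it becomes eligible
        self.next_index = 0

    def pick(self):
        """Round-robin over servers that are not currently being avoided."""
        now = self.clock()
        for _ in range(len(self.servers)):
            server = self.servers[self.next_index % len(self.servers)]
            self.next_index += 1
            if self.avoid_until.get(server, float("-inf")) <= now:
                return server
        return None

    def run(self, execute, sql):
        """Run sql via execute(server, sql), retrying elsewhere on 40P02."""
        retries = 0
        while True:
            server = self.pick()
            if server is None:
                # Assumption: with no eligible server, report 40P02 to the client.
                raise QueryError(SYNC_REPLAY_UNAVAILABLE)
            try:
                return execute(server, sql)
            except QueryError as err:
                if err.sqlstate != SYNC_REPLAY_UNAVAILABLE:
                    raise  # unrelated errors go straight to the client
                # Rule 1: remember not to pick this server for a while.
                self.avoid_until[server] = self.clock() + self.penalty_seconds
                # Rule 2: retry, up to a small limit.
                retries += 1
                if retries > self.max_retries:
                    raise
```

A caller would supply execute(server, sql), which sends the statement
to one backend and raises QueryError with the backend's SQLSTATE on
failure.  Because the failing server is marked before the retry, the
retry naturally lands on a different server.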

Here is my idea for step 2.  Multi-statement transactions that begin
with "BEGIN ... READ ONLY" could also be sent to read-only servers.
If the error 40P02 occurs, then perhaps the transaction could be
automatically retried on another server, but *only* if it happens when
running the *first* statement in the transaction.  (It doesn't make
sense to replay the transaction automatically on another node if the
user has already seen some results, which is why it's OK to do it if
the first statement fails but not if later statements fail.)  In
practice, this means that REPEATABLE READ + READ ONLY transactions
would benefit from automatic load balancing with automatic transparent
retry, because REPEATABLE READ transactions can only fail with error
40P02 on the first statement.  That would provide end users with a
nice way to achieve load balancing of multi-statement read only
transactions, as long as they are prepared to wrap their transactions
in "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ READ ONLY".
Perhaps READ COMMITTED could be supported too, but then we would have
to send the 40P02 error to the user: automatic retry would not be
possible, unless it happens to be on the first statement.  Actually,
it is very likely to occur on the first statement, so maybe that's
still quite usable.
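
The step 2 rule (retry on another server only if 40P02 hits the first
statement) could be sketched like this, again in Python rather than
pgpool code.  ReadOnlySession, QueryError, max_retries and the explicit
ROLLBACK before switching servers are all hypothetical choices made for
the illustration, and the sketch assumes 40P02 is raised by the first
statement rather than by BEGIN itself.

```python
SYNC_REPLAY_UNAVAILABLE = "40P02"  # SQLSTATE introduced by the proposed patch


class QueryError(Exception):
    """Hypothetical error type carrying the SQLSTATE reported by a backend."""

    def __init__(self, sqlstate):
        super().__init__(sqlstate)
        self.sqlstate = sqlstate


class ReadOnlySession:
    """Illustrative handling of one read-only transaction at a time."""

    def __init__(self, servers, execute, max_retries=2):
        self.candidates = list(servers)  # servers not yet tried
        self.execute = execute           # execute(server, sql) -> result
        self.max_retries = max_retries
        self.server = None
        self.begin_sql = None
        self.statements_done = 0         # results already shown to the user
        self.retries = 0

    def begin(self, begin_sql):
        """Start the transaction on the first candidate server."""
        self.begin_sql = begin_sql
        self.server = self.candidates.pop(0)
        self.statements_done = 0
        self.retries = 0
        self.execute(self.server, begin_sql)

    def query(self, sql):
        """Run one statement; move to another server only when that is safe."""
        while True:
            try:
                result = self.execute(self.server, sql)
            except QueryError as err:
                safe_to_retry = (
                    err.sqlstate == SYNC_REPLAY_UNAVAILABLE
                    and self.statements_done == 0   # user has seen nothing yet
                    and self.retries < self.max_retries
                    and len(self.candidates) > 0
                )
                if not safe_to_retry:
                    raise  # pass the error on to the user
                # Abandon the transaction here and replay BEGIN elsewhere.
                self.execute(self.server, "ROLLBACK")
                self.server = self.candidates.pop(0)
                self.retries += 1
                self.execute(self.server, self.begin_sql)
                continue
            self.statements_done += 1
            return result
```

With REPEATABLE READ the only place 40P02 can appear is the first
statement, so every such failure takes the retry path.  With READ
COMMITTED a failure on a later statement reaches the "raise" branch and
is reported to the user.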

Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com