[pgpool-hackers: 900] Re: [pgpool-general: 3672] Re: Load balancing in explicit transaction block?

Tatsuo Ishii ishii at postgresql.org
Fri May 8 10:39:38 JST 2015


Hi,

While discussing with a pgpool user, I noticed that the treatment of
load balancing within a transaction in the streaming replication mode
is not so good or at least is counter intuitive. We claim followings
in the manual:
-----------------------------------------------
Condition for load balancing
:
:
    However, if following conditions are met, load balance is possible
    even if in an explicit transaction

        transaction isolation level is not SERIALIZABLE

        the transaction has not issued a write query yet(until a write
        query issued, load balance is possible)

For a intuitive understanding of the last phrase, "Write query" should
include not only DDL/DML, but SELECTs having write functions specified
in the black or the white function list. But the reality is different
as written in the message below.

> I thought so, but was wrong. The decision is made by calling
> is_select_query(node, query)), which essentially regards a statement
> as SELECT if it looks like a SELECT. i.e. not checking if it calls
> writing functions. This seems a bug and I am going to discuss with

I think we should fix this. But I also am afraid this will change the
pgpool-II's behavior of load balancing. Do we want to add new
directive to control the load balance behavior to keep the backward
compatibility?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

From: Tatsuo Ishii <ishii at postgresql.org>
Subject: [pgpool-general: 3672] Re: Load balancing in explicit transaction block?
Date: Thu, 30 Apr 2015 10:27:31 +0900 (JST)
Message-ID: <20150430.102731.1165897528140188188.t-ishii at sraoss.co.jp>

>> Hi, Tatsuo!
>> Sorry, but I still dont understand. Stored function is not balanced, it
>> executes on master only. Why should we add it to black list?
> 
> The reason why the SELECT (and FETCH) is not load balanced is, it uses
> cursor, not because of the stored function. So if you use the function
> by not using CURSOR it will be load balanced unless you register it in
> the black function list.
> 
>> The problem is with next statement: plain select from table,  no functions
>> used.
>> If we add function to black list, the next select will not be balanced too?
> 
> I thought so, but was wrong. The decision is made by calling
> is_select_query(node, query)), which essentially regards a statement
> as SELECT if it looks like a SELECT. i.e. not checking if it calls
> writing functions. This seems a bug and I am going to discuss with
> other developers.
> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
> 
>> чт, 30 апр. 2015, 9:10, Tatsuo Ishii <ishii at postgresql.org>:
>> 
>>> > Hi, Tatsuo!
>>> > Thank you for answer!
>>> > It is good that function is not balanced - it writes to database. But the
>>> > next statement in transaction (select after function) is balanced. Is it
>>> > correct?
>>>
>>> True.
>>>
>>> > We do not want select to be balanced. And we expected it not to be
>>> balanced
>>> > to slave, because it sits next to writing statement and in same explicit
>>> > transaction.
>>>
>>> You really should use black_function_list (or white_function_list).
>>>
>>> > With best regards, Sergey Melekhin
>>> >
>>> > вт, 28 апр. 2015 г. в 19:03, Tatsuo Ishii <ishii at postgresql.org>:
>>> >
>>> >> Thanks for the test case. The reason why your function is not load
>>> >> balanced is, you are using cursor statement. Currently pgpool-II does
>>> >> not load balance if cursor is used. This is because the cursor
>>> >> statement may use DML, which will raise problem if the cursor
>>> >> statement is sent to standby.
>>> >>
>>> >> Best regards,
>>> >> --
>>> >> Tatsuo Ishii
>>> >> SRA OSS, Inc. Japan
>>> >> English: http://www.sraoss.co.jp/index_en.php
>>> >> Japanese:http://www.sraoss.co.jp
>>> >>
>>> >> > Hi!
>>> >> > Here is test case that fails on slow replication.
>>> >> >
>>> >> > чт, 23 апр. 2015 г. в 17:16, Tatsuo Ishii <ishii at postgresql.org>:
>>> >> >
>>> >> >> > Our stored function is not balanced. But next select in the same
>>> >> >>
>>> >> >> Could you show me a concrete example? I hardly believe that SELECT
>>> >> >> foo() or SELECT * FROM foo() is not load balanced if neither white
>>> and
>>> >> >> black function lists are not specified.
>>> >> >>
>>> >> >> > transaction is balanced. Documentation, if i understand it
>>> correctly,
>>> >> >> > states that it should not be balanced. I think it should be
>>> clarified.
>>> >> >> >
>>> >> >> > http://www.pgpool.net/docs/latest/pgpool-en.html
>>> >> >> >
>>> >> >> > For a query to be load balanced, all the following requirements
>>> must
>>> >> be
>>> >> >> met:
>>> >> >> >
>>> >> >> >    - PostgreSQL version 7.4 or later
>>> >> >> >    - *the query must not be in an explicitly declared transaction
>>> >> (i.e.
>>> >> >> not
>>> >> >> >    in a BEGIN ~ END block)*
>>> >> >>
>>> >> >> This is simply wrong. Sorry for the outdated info. I will fix it.
>>> >> >>
>>> >> >> > So query in explicitly declared transaction will not be balanced.
>>> But
>>> >> >> then
>>> >> >> > in the stame doc:
>>> >> >> >
>>> >> >> > http://www.pgpool.net/docs/latest/pgpool-en.html
>>> >> >> > In an explicit transaction:Transaction starting commands such as
>>> BEGIN
>>> >> >> are
>>> >> >> > sent to the primary node.Following SELECT and some other queries
>>> that
>>> >> can
>>> >> >> > be sent to both primary or standby are executed in the transaction
>>> or
>>> >> on
>>> >> >> > the standby node.Commands which cannot be executed on the standby
>>> >> such as
>>> >> >> > INSERT are sent to the primary. After one of these commands, even
>>> >> SELECTs
>>> >> >> > are sent to the primary node, This is because these SELECTs might
>>> >> want to
>>> >> >> > see the result of an INSERT immediately. This behavior continues
>>> until
>>> >> >> the
>>> >> >> > transaction closes or aborts.
>>> >> >> >
>>> >> >> > This states that after any writing in transaction all statements
>>> >> should
>>> >> >> not
>>> >> >> > be balanced. This contradicts first part, which can be understood
>>> as
>>> >> if
>>> >> >> > transactions are not balanced at all.
>>> >> >>
>>> >> >> Correct.
>>> >> >>
>>> >> >> > ср, 22 апр. 2015 г. в 18:24, Tatsuo Ishii <ishii at postgresql.org>:
>>> >> >> >
>>> >> >> >> pgpool-II has no idea which is a stored function or not regarding
>>> >> load
>>> >> >> >> balancing: all functions are treated as same. So "SELECT
>>> >> upper('foo')"
>>> >> >> >> is load balanced if white and black function list is empty.
>>> >> >> >>
>>> >> >> >> BTW if you do not want to load balance particular SELECT, you can
>>> add
>>> >> >> >> "/*NO LOAD BALANCE*/" to the SELECT statement. See the manual for
>>> >> more
>>> >> >> >> details.
>>> >> >> >>
>>> >> >> >> Best regards,
>>> >> >> >> --
>>> >> >> >> Tatsuo Ishii
>>> >> >> >> SRA OSS, Inc. Japan
>>> >> >> >> English: http://www.sraoss.co.jp/index_en.php
>>> >> >> >> Japanese:http://www.sraoss.co.jp
>>> >> >> >>
>>> >> >> >> > Hi, Lachezar!
>>> >> >> >> > The function itself is not balanced, it would fail on read only
>>> >> node
>>> >> >> >> > (slave) but it does not (by default stored functions are not
>>> >> balanced
>>> >> >> >> even
>>> >> >> >> > if they are not explicitly listed in black list). Problem is
>>> with
>>> >> next
>>> >> >> >> > statement in this transaction, which is select. And we want this
>>> >> >> select
>>> >> >> >> to
>>> >> >> >> > be run on master.
>>> >> >> >> > And looking in documentation I think it should be run on master.
>>> >> >> >> >
>>> >> >> >> > вт, 21 апр. 2015 г. в 18:26, Lachezar Dobrev <
>>> l.dobrev at gmail.com>:
>>> >> >> >> >
>>> >> >> >> >>   Non-authority response:
>>> >> >> >> >>   You might want to use a white-list[1] or black-list[2] of
>>> >> functions
>>> >> >> >> >> to inform the PgPool which functions are suitable for
>>> >> distributing,
>>> >> >> or
>>> >> >> >> >> which are not suitable. Documentation specifies that you can
>>> use
>>> >> >> >> >> either, but not both. I suppose adding your functions to the
>>> >> >> >> >> black_function_list would be easier. Not sure if that will help
>>> >> >> >> >> though. My understanding was, that everything in a transaction
>>> >> >> (BEGIN;
>>> >> >> >> >> /* EVERYTHING; */ END;) would be sent to the master.
>>> >> >> >> >>
>>> >> >> >> >>   [1]
>>> >> >> >>
>>> http://www.pgpool.net/docs/latest/pgpool-en.html#WHITE_FUNCTION_LIST
>>> >> >> >> >>   [2]
>>> >> >> >>
>>> http://www.pgpool.net/docs/latest/pgpool-en.html#BLACK_FUNCTION_LIST
>>> >> >> >> >>
>>> >> >> >> >> 2015-04-21 5:55 GMT+03:00 Сергей Мелехин <cpro29a at gmail.com>:
>>> >> >> >> >> > There is another part in docs:
>>> >> >> >> >> >
>>> >> >> >> >> > http://www.pgpool.net/docs/latest/pgpool-en.html
>>> >> >> >> >> > In an explicit transaction:Transaction starting commands
>>> such as
>>> >> >> BEGIN
>>> >> >> >> >> are
>>> >> >> >> >> > sent to the primary node.Following SELECT and some other
>>> queries
>>> >> >> that
>>> >> >> >> >> can be
>>> >> >> >> >> > sent to both primary or standby are executed in the
>>> transaction
>>> >> or
>>> >> >> on
>>> >> >> >> the
>>> >> >> >> >> > standby node.Commands which cannot be executed on the standby
>>> >> such
>>> >> >> as
>>> >> >> >> >> INSERT
>>> >> >> >> >> > are sent to the primary. After one of these commands, even
>>> >> SELECTs
>>> >> >> are
>>> >> >> >> >> sent
>>> >> >> >> >> > to the primary node, This is because these SELECTs might
>>> want to
>>> >> >> see
>>> >> >> >> the
>>> >> >> >> >> > result of an INSERT immediately. This behavior continues
>>> until
>>> >> the
>>> >> >> >> >> > transaction closes or aborts.
>>> >> >> >> >> >
>>> >> >> >> >> > It looks like pgpool treats all stored functions as non
>>> writing
>>> >> in
>>> >> >> >> this
>>> >> >> >> >> > scenario.
>>> >> >> >> >> >
>>> >> >> >> >> > вт, 21 апр. 2015 г. в 12:05, Сергей Мелехин <
>>> cpro29a at gmail.com
>>> >> >:
>>> >> >> >> >> >
>>> >> >> >> >> >> Hi!
>>> >> >> >> >> >> Our test server is relatively slow, and there are some lags
>>> in
>>> >> >> >> >> replication
>>> >> >> >> >> >> between master and slave sometimes. We are using pgpool
>>> 3.3.4
>>> >> in
>>> >> >> load
>>> >> >> >> >> >> balancing mode to mimic our production environment.
>>> >> >> >> >> >> Some unit tests are making some changes in database (calling
>>> >> >> stored
>>> >> >> >> >> >> functions) and immediately check them issuing select
>>> queries.
>>> >> >> >> Sometimes
>>> >> >> >> >> they
>>> >> >> >> >> >> fail not finding records they have just inserted.
>>> >> >> >> >> >> Judging by this verse in documentation:
>>> >> >> >> >> >>
>>> >> >> >> >> >> For a query to be load balanced, all the following
>>> requirements
>>> >> >> must
>>> >> >> >> be
>>> >> >> >> >> >> met:
>>> >> >> >> >> >>
>>> >> >> >> >> >> PostgreSQL version 7.4 or later
>>> >> >> >> >> >> the query must not be in an explicitly declared transaction
>>> >> (i.e.
>>> >> >> not
>>> >> >> >> >> in a
>>> >> >> >> >> >> BEGIN ~ END block)
>>> >> >> >> >> >>
>>> >> >> >> >> >> we decided that putting whole test in transaction block will
>>> >> avoid
>>> >> >> >> load
>>> >> >> >> >> >> balancing for such test, but it looks like that no matter
>>> >> being in
>>> >> >> >> >> >> transaction, selects are replicated to slave and dont find
>>> >> desired
>>> >> >> >> data
>>> >> >> >> >> >> because of replication lags. This errors are present when we
>>> >> use
>>> >> >> >> stored
>>> >> >> >> >> >> functions, when we use explicit DML, selects inside
>>> transaction
>>> >> >> are
>>> >> >> >> not
>>> >> >> >> >> >> replicated. Function names are not included in white or
>>> black
>>> >> >> list in
>>> >> >> >> >> >> pgpool.conf.
>>> >> >> >> >> >>
>>> >> >> >> >> >> Is it normal behaviour, or is it a bug?
>>> >> >> >> >> >>
>>> >> >> >> >> >> I include simple test, it fails in our slow replication
>>> >> >> environment.
>>> >> >> >> >> >> You'll need python3 and psycopg2 to run it. And there is db
>>> >> >> >> connection
>>> >> >> >> >> >> string constant "DB" in the beginning of script you'll have
>>> to
>>> >> >> >> change.
>>> >> >> >> >> >>
>>> >> >> >> >> >> Thank you for your work!
>>> >> >> >> >> >> Sergey Melekhin
>>> >> >> >> >> >
>>> >> >> >> >> >
>>> >> >> >> >> > _______________________________________________
>>> >> >> >> >> > pgpool-general mailing list
>>> >> >> >> >> > pgpool-general at pgpool.net
>>> >> >> >> >> > http://www.pgpool.net/mailman/listinfo/pgpool-general
>>> >> >> >> >> >
>>> >> >> >> >>
>>> >> >> >>
>>> >> >>
>>> >>
>>>
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general


More information about the pgpool-hackers mailing list