[pgpool-hackers: 97] Re: "COMMIT" sent to read-only slave in streaming replication

alex alex at smalldemons.com
Fri Jul 27 08:49:10 JST 2012


I will look through my configuration again to see if there is a variable 
I have overlooked.


   alex


On 7/24/12 6:37 PM, Tatsuo Ishii wrote:
> I have tried with following test case(primary is node 1 and standby is
> node 0, black_function_list = '.*'):
>
> psql -p 11002 test
> Pager usage is off.
> psql (9.2beta2)
> Type "help" for help.
>
> test=# select 3; -- make sure that load balance node is 0
>   ?column?
> ----------
>          3
> (1 row)
>
> test=# begin;
> BEGIN
> test=# select current_setting('transaction_isolation');
>   current_setting
> -----------------
>   read committed
> (1 row)
>
> test=# end;
> COMMIT
>
> It seems I couldn't reproduce your problem. Here is pgpool log.
>
> 2012-07-25 10:08:45 LOG:   pid 26030: connection received: host=[local]
> 2012-07-25 10:08:49 LOG:   pid 26030: DB node id: 0 backend pid: 26048 statement: select 3;
> 2012-07-25 10:08:57 LOG:   pid 26030: DB node id: 0 backend pid: 26048 statement: begin;
> 2012-07-25 10:08:57 LOG:   pid 26030: DB node id: 1 backend pid: 26049 statement: begin;
> 2012-07-25 10:09:01 LOG:   pid 26030: DB node id: 1 backend pid: 26049 statement: select current_setting('transaction_isolation');
> 2012-07-25 10:09:18 LOG:   pid 26030: DB node id: 1 backend pid: 26049 statement: end;
> 2012-07-25 10:09:18 LOG:   pid 26030: DB node id: 0 backend pid: 26048 statement: end;
>
> select current_setting is as expected sent to node 1, that is primary
> although load balance node is 0(standby). If pgpool thought that
> "select current_setting" could be load balanced, it should have been
> sent to node 0. So it seems working correctly.
>
> Any advice to reproduce your problem?
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
>> Thanks for the report. I will look into this.
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>>> Sorry for the slow reply, but I had to build a test setup in our
>>> development environment --the problem occurred in our production
>>> environment.
>>>
>>> I can reproduce the problem reliably now.  The key seems to be having
>>> the primary as backend1 and slave as backend0 when pgpool starts.
>>> We're using python with sqlalchemy which, for its own reasons, does a
>>> "SELECT current_setting('transaction_isolation')" followed by a
>>> COMMIT.  Really neither of those statements should go to the slave,
>>> the COMMIT because there's no transaction and the SELECT because that
>>> function isn't whitelisted.
>>>
>>> Let me know what other information I can provide to help.
>>>
>>>
>>>    alex
>>>
>>>
>>> On 7/19/12 4:02 PM, Tatsuo Ishii wrote:
>>>> Ok. Do you know the way to reproduce the problem reliably?
>>>> --
>>>> Tatsuo Ishii
>>>> SRA OSS, Inc. Japan
>>>> English: http://www.sraoss.co.jp/index_en.php
>>>> Japanese: http://www.sraoss.co.jp
>>>>
>>>>> I meant I turned off query caching; I'm still using connection
>>>>> caching.  This is pgpool-II version 3.2alpha1 (hatsuiboshi).
>>>>>
>>>>>
>>>>>     alex
>>>>>
>>>>>
>>>>> On 7/19/12 3:00 PM, Tatsuo Ishii wrote:
>>>>>> Thanks for the report. I need more info however. What version is
>>>>>> pgpool-II?
>>>>>>
>>>>>>> load balancing with no caching (yet).  The whitelist is empty and the
>>>>>> You mean you turn off connection caching? or query caching?
>>>>>> --
>>>>>> Tatsuo Ishii
>>>>>> SRA OSS, Inc. Japan
>>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>>> Japanese: http://www.sraoss.co.jp
>>>>>>
>>>>>>> I had an interesting problem yesterday and wanted to report it before
>>>>>>> spending too much time tracking it down.
>>>>>>>
>>>>>>> I have two databases configured with postgres streaming replication
>>>>>>> that I was putting together with a new instance of pgpool.  I'm using
>>>>>>> load balancing with no caching (yet).  The whitelist is empty and the
>>>>>>> blacklist is ".*" (so, everything).  I had the primary configured as
>>>>>>> backend1 and the slave configured as backend0.  I know pgpool prefers
>>>>>>> it the other way around, but also that it's smart enough to figure out
>>>>>>> which is which; backend0 is the normal primary but we'd had a failover
>>>>>>> recently and haven't failed back yet.
>>>>>>>
>>>>>>> What I saw was the queries correctly went to backend1, but the COMMIT
>>>>>>> went to both, causing errors that filtered up to the application.
>>>>>>> When I switched backend0 and backend1 in the configs everything worked
>>>>>>> fine.
>>>>>>>
>>>>>>> (from the postgres logs)
>>>>>>> LOG:  statement: COMMIT
>>>>>>> WARNING:  there is no transaction in progress
>>>>>>>
>>>>>>> (and from pgpool's logs)
>>>>>>> ProcessBackendResponse: kind from backend: Z
>>>>>>> pool_read_message_length: slot: 1 length: 5
>>>>>>> ReadyForQuery: transaction state:T
>>>>>>> pool_unset_query_in_progress: done
>>>>>>> pool_unset_query_in_progress: done
>>>>>>> ProcessBackendResponse: Ready For Query
>>>>>>> ProcessFrontendResponse: kind from frontend Q(51)
>>>>>>> pool_unset_doing_extended_query_message: done
>>>>>>> statement: COMMIT
>>>>>>> pool_set_query_in_progress: done
>>>>>>> send_to_where: 3 query: COMMIT
>>>>>>> DB node id: 1 backend pid: 18487 statement: COMMIT
>>>>>>> wait_for_query_response: waiting for backend 1 completing the query
>>>>>>> DB node id: 0 backend pid: 7692 statement: COMMIT
>>>>>>> wait_for_query_response: waiting for backend 0 completing the query
>>>>>>> pool_send_and_wait: Error or notice message from backend: : DB node
>>>>>>> id: 0 backend pid: 7692 statement: COMMIT message: there is no tr
>>>>>>> ansaction in progress
>>>>>>> read_kind_from_backend: read kind from 0 th backend N NUM_BACKENDS: 2
>>>>>>> read_kind_from_backend: read kind from 1 th backend C NUM_BACKENDS: 2
>>>>>>> read_kind_from_backend: 1 th kind C does not match with master or
>>>>>>> majority connection kind Nkind mismatch among backends. Possible last
>>>>>>> query was: "COMMIT" kind details are: 0[N: there is no transaction in
>>>>>>> progress] 1[C]
>>>>>>> do_child: exits with status 1 due to error
>>>>>>>
>>>>>>>
>>>>>>> It looks like find_primary_node works as intended, but for at least
>>>>>>> one check PRIMARY_NODE_ID isn't being tested for type stream. Before I
>>>>>>> spend too long wandering the code I wanted to see if this was intended
>>>>>>> (configurable) behavior, even though it doesn't seem to be.
>>>>>>> _______________________________________________
>>>>>>> pgpool-hackers mailing list
>>>>>>> pgpool-hackers at pgpool.net
>>>>>>> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
>> _______________________________________________
>> pgpool-hackers mailing list
>> pgpool-hackers at pgpool.net
>> http://www.pgpool.net/mailman/listinfo/pgpool-hackers



More information about the pgpool-hackers mailing list