[pgpool-hackers: 233] Re: Unexpected behavior of pgpool-II

Wed May 8 18:33:02 JST 2013

Hi Ahsan,

> Hi Tatsuo,
> 
> Good to see that you found the problem. We have a customer who has been waiting for a fix for this issue for sometime.
> 
> Which branches are you planning to commit this to and when?

The fix has been already committed to master, 3.2-stable and 3.1-stable.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> -- Ahsan
> 
> -----Original Message-----
> From: Tatsuo Ishii <ishii at postgresql.org>
> Sender: pgpool-hackers-bounces at pgpool.net
> Date: Wed, 08 May 2013 18:02:03 
> To: <anaeem.it at gmail.com>
> Cc: <pgpool-hackers at pgpool.net>
> Subject: [pgpool-hackers: 231] Re: Unexpected behavior of pgpool-II
> 
> Sorry for delay. I think I found a problem regarding %H for
> follow_master_command. Pgpool-II assigns it "new master node", which
> is defined as "the first live node". On the other hand %H value
> expected in follow_master_command is "new primary node", which is not
> neccesary same as "new master node".
> 
> Attached patch should fix the problem(against 3.2-stable).
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
> 
>> Hi,
>> 
>> There is unexpected behavior in pgpool2 that seems to be a bug. Please find
>> the details as following i.e.
>> 
>> Phenomenon
>> 
>>> pgpool2 configuration option "follow_master_command" special character %H
>>> (Hostname of the new master node) is not working as expected
>>> In master slave mode after failover, 2nd node become new master but got
>>> unexpected hostname (via %H). Please find attached pgpool.conf for details.
>> 
>> 
>> Versions
>> 
>>> OS version: RHEL - 64
>>> pgpool version: pgpool-II version 3.2.1 (namameboshi)
>> 
>> 
>> *Details/Observations* :-
>> 
>>> At Start
>> 
>>>  node_id | hostname | port | status | lb_weight |  role
>>> ---------+----------+------+--------+-----------+---------
>>>  0       | dmp02    | 5444 | 2      | 0.333333  | primary
>>>  1       | dmp03    | 5444 | 2      | 0.333333  | standby
>>>  2       | dmp04    | 5444 | 2      | 0.333333  | standby
>>>
>> 
>> 
>>> Failover arguments when the user stopped DMP02.
>> 
>>> 2013îN 2åé 8ì˙ ã‡ójì˙ 12:44:12 JST
>> 
>> FAILED_NODE_ID =0
>>> FAILED_NODE_NAME=dmp02
>>> FAILED_NODE_PORT=5444
>>> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
>>> NEW_MASTER_ID =1
>>> OLD_MASTER_ID =0
>>> NEW_MASTER_NAME =dmp03
>>> OLD_PRIMARY_ID =0
>>> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
>>> 0
>>>
>> 
>> 
>>> Then, these arguments told user that the master node has been changed
>> 
>>> 2013îN 2åé 8ì˙ ã‡ójì˙ 12:44:13 JST
>>> FAILED_NODE_ID =0
>>> FAILED_NODE_NAME=dmp02
>>> FAILED_NODE_PORT=5444
>>> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
>>> NEW_MASTER_ID =1
>>> OLD_MASTER_ID =0
>>> NEW_MASTER_NAME =dmp03
>>> OLD_PRIMARY_ID =0
>>> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
>>> 2013îN 2åé 8ì˙ ã‡ójì˙ 12:44:17 JST
>>> FAILED_NODE_ID =2
>>> FAILED_NODE_NAME=dmp04
>>> FAILED_NODE_PORT=5444
>>> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
>>> NEW_MASTER_ID =1
>>> OLD_MASTER_ID =0
>>> NEW_MASTER_NAME =dmp02  *<--- *** The master name is unexpected ****
>>> OLD_PRIMARY_ID =0
>>> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
>>>
>> 
>> 
>>>  node_id | hostname | port | status | lb_weight |  role
>>> ---------+----------+------+--------+-----------+---------
>>>  0       | dmp02    | 5444 | 2      | 0.333333  | standby
>>>  1       | dmp03    | 5444 | 2      | 0.333333  | primary
>>>  2       | dmp04    | 5444 | 2      | 0.333333  | standby
>>>
>> 
>> 
>>>   pid  | usesysid |   usename    | application_name |  client_addr  |
>>> client_hostname | client_port |    backend_start                 |   state
>>>   | sent_location | write_location | flush_location | replay_location |
>>> sync_priority | sync_state
>>>
>>> -------+----------+--------------+------------------+---------------+-----------------+-------------+----------------------------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
>>>  20641 |       10 | enterprisedb | dmp02            | 172.24.217.23 |
>>>             |       16948 | 08-FEB-13 17:39:44.078173 +09:00 | streaming |
>>> 14/C50000E0   | 14/C50000E0    | 14/C50000E0    | 14/C50000E0     |
>>>     0 | async
>>> (1 çs)
>>>
>> 
>> There is one pg_stat_replication record because dmp04 is connecting to dmp02
>> 
>> This phenomenon is reproduced by user. What we need to do is just make the
>> scripts like follow_master.sh and launch pgpool process with -F [path to
>> pcp.conf].
>> 
>> 1. Before failover
>> 
>>> -bash-3.2$ psql -h 10.1.1.187 -p 9999 -c "show pool_nodes"
>>> node_id | hostname | port | status | lb_weight | role
>>> ---------+----------+------+--------+-----------+---------
>>> 0 | slave1 | 5444 | 2 | 0.333333 | primary
>>> 1 | slave2 | 5444 | 2 | 0.333333 | standby
>>> 2 | slave3 | 5444 | 2 | 0.333333 | standby
>>> (3 行)
>> 
>> 
>> 2. We stopped slave 1 node with "pg_ctl -D /data -m immediate stop".
>> 
>> 3. After failover
>> 
>>> -bash-3.2$ psql -h 10.1.1.187 -p 9999 -c "show pool_nodes"
>>> node_id | hostname | port | status | lb_weight | role
>>> ---------+----------+------+--------+-----------+---------
>>> 0 | slave1 | 5444 | 2 | 0.333333 | standby
>>> 1 | slave2 | 5444 | 2 | 0.333333 | primary
>>> 2 | slave3 | 5444 | 2 | 0.333333 | standby
>>> (3 行)
>> 
>> 
>> 4. Then user got output like below after failover that is not expected that
>> is present at the end of follow_master.sh.log i.e.
>> 
>>> NEW_MASTER_NAME =slave1
>> 
>> 
>> [root at master1 ~]# cat /tmp/failover.sh.log
>> 
>>> 2013年 3月 13日 水曜日 11:36:28 JST
>>> FAILED_NODE_ID =0
>>> FAILED_NODE_NAME=slave1
>>> FAILED_NODE_PORT=5444
>>> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
>>> NEW_MASTER_ID =1
>>> OLD_MASTER_ID =0
>>> NEW_MASTER_NAME =slave2
>>> OLD_PRIMARY_ID =0
>>> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
>>> 0
>>>
>> [root at master1 ~]# cat /tmp/follow_master.sh.log
>> 
>>> 2013年 3月 13日 水曜日 11:36:30 JST
>>> FAILED_NODE_ID =0
>>> FAILED_NODE_NAME=slave1
>>> FAILED_NODE_PORT=5444
>>> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
>>> NEW_MASTER_ID =1
>>> OLD_MASTER_ID =0
>>> NEW_MASTER_NAME =slave2
>>> OLD_PRIMARY_ID =0
>>> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
>>> 2013年 3月 13日 水曜日 11:36:32 JST
>>> FAILED_NODE_ID =2
>>> FAILED_NODE_NAME=slave3
>>> FAILED_NODE_PORT=5444
>>> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
>>> NEW_MASTER_ID =1
>>> OLD_MASTER_ID =0
>>> NEW_MASTER_NAME =*slave1
>>> *OLD_PRIMARY_ID =0
>>> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
>> 
>> 
>> Please guide. Thanks.
>> 
>> Best Regards,
>> Asif Naeem