[pgpool-hackers: 232] Re: Unexpected behavior of pgpool-II

Ahsan Hadi ahsan.hadi at enterprisedb.com
Wed May 8 18:28:42 JST 2013


Hi Tatsuo,

Good to see that you found the problem. We have a customer who has been waiting for a fix for this issue for sometime.

Which branches are you planning to commit this to and when?

-- Ahsan

-----Original Message-----
From: Tatsuo Ishii <ishii at postgresql.org>
Sender: pgpool-hackers-bounces at pgpool.net
Date: Wed, 08 May 2013 18:02:03 
To: <anaeem.it at gmail.com>
Cc: <pgpool-hackers at pgpool.net>
Subject: [pgpool-hackers: 231] Re: Unexpected behavior of pgpool-II

Sorry for delay. I think I found a problem regarding %H for
follow_master_command. Pgpool-II assigns it "new master node", which
is defined as "the first live node". On the other hand %H value
expected in follow_master_command is "new primary node", which is not
neccesary same as "new master node".

Attached patch should fix the problem(against 3.2-stable).
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Hi,
> 
> There is unexpected behavior in pgpool2 that seems to be a bug. Please find
> the details as following i.e.
> 
> Phenomenon
> 
>> pgpool2 configuration option "follow_master_command" special character %H
>> (Hostname of the new master node) is not working as expected
>> In master slave mode after failover, 2nd node become new master but got
>> unexpected hostname (via %H). Please find attached pgpool.conf for details.
> 
> 
> Versions
> 
>> OS version: RHEL - 64
>> pgpool version: pgpool-II version 3.2.1 (namameboshi)
> 
> 
> *Details/Observations* :-
> 
>> At Start
> 
>>  node_id | hostname | port | status | lb_weight |  role
>> ---------+----------+------+--------+-----------+---------
>>  0       | dmp02    | 5444 | 2      | 0.333333  | primary
>>  1       | dmp03    | 5444 | 2      | 0.333333  | standby
>>  2       | dmp04    | 5444 | 2      | 0.333333  | standby
>>
> 
> 
>> Failover arguments when the user stopped DMP02.
> 
>> 2013îN 2åé 8ì˙ ã‡ójì˙ 12:44:12 JST
> 
> FAILED_NODE_ID =0
>> FAILED_NODE_NAME=dmp02
>> FAILED_NODE_PORT=5444
>> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
>> NEW_MASTER_ID =1
>> OLD_MASTER_ID =0
>> NEW_MASTER_NAME =dmp03
>> OLD_PRIMARY_ID =0
>> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
>> 0
>>
> 
> 
>> Then, these arguments told user that the master node has been changed
> 
>> 2013îN 2åé 8ì˙ ã‡ójì˙ 12:44:13 JST
>> FAILED_NODE_ID =0
>> FAILED_NODE_NAME=dmp02
>> FAILED_NODE_PORT=5444
>> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
>> NEW_MASTER_ID =1
>> OLD_MASTER_ID =0
>> NEW_MASTER_NAME =dmp03
>> OLD_PRIMARY_ID =0
>> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
>> 2013îN 2åé 8ì˙ ã‡ójì˙ 12:44:17 JST
>> FAILED_NODE_ID =2
>> FAILED_NODE_NAME=dmp04
>> FAILED_NODE_PORT=5444
>> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
>> NEW_MASTER_ID =1
>> OLD_MASTER_ID =0
>> NEW_MASTER_NAME =dmp02  *<--- *** The master name is unexpected ****
>> OLD_PRIMARY_ID =0
>> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
>>
> 
> 
>>  node_id | hostname | port | status | lb_weight |  role
>> ---------+----------+------+--------+-----------+---------
>>  0       | dmp02    | 5444 | 2      | 0.333333  | standby
>>  1       | dmp03    | 5444 | 2      | 0.333333  | primary
>>  2       | dmp04    | 5444 | 2      | 0.333333  | standby
>>
> 
> 
>>   pid  | usesysid |   usename    | application_name |  client_addr  |
>> client_hostname | client_port |    backend_start                 |   state
>>   | sent_location | write_location | flush_location | replay_location |
>> sync_priority | sync_state
>>
>> -------+----------+--------------+------------------+---------------+-----------------+-------------+----------------------------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
>>  20641 |       10 | enterprisedb | dmp02            | 172.24.217.23 |
>>             |       16948 | 08-FEB-13 17:39:44.078173 +09:00 | streaming |
>> 14/C50000E0   | 14/C50000E0    | 14/C50000E0    | 14/C50000E0     |
>>     0 | async
>> (1 çs)
>>
> 
> There is one pg_stat_replication record because dmp04 is connecting to dmp02
> 
> This phenomenon is reproduced by user. What we need to do is just make the
> scripts like follow_master.sh and launch pgpool process with -F [path to
> pcp.conf].
> 
> 1. Before failover
> 
>> -bash-3.2$ psql -h 10.1.1.187 -p 9999 -c "show pool_nodes"
>> node_id | hostname | port | status | lb_weight | role
>> ---------+----------+------+--------+-----------+---------
>> 0 | slave1 | 5444 | 2 | 0.333333 | primary
>> 1 | slave2 | 5444 | 2 | 0.333333 | standby
>> 2 | slave3 | 5444 | 2 | 0.333333 | standby
>> (3 行)
> 
> 
> 2. We stopped slave 1 node with "pg_ctl -D /data -m immediate stop".
> 
> 3. After failover
> 
>> -bash-3.2$ psql -h 10.1.1.187 -p 9999 -c "show pool_nodes"
>> node_id | hostname | port | status | lb_weight | role
>> ---------+----------+------+--------+-----------+---------
>> 0 | slave1 | 5444 | 2 | 0.333333 | standby
>> 1 | slave2 | 5444 | 2 | 0.333333 | primary
>> 2 | slave3 | 5444 | 2 | 0.333333 | standby
>> (3 行)
> 
> 
> 4. Then user got output like below after failover that is not expected that
> is present at the end of follow_master.sh.log i.e.
> 
>> NEW_MASTER_NAME =slave1
> 
> 
> [root at master1 ~]# cat /tmp/failover.sh.log
> 
>> 2013年 3月 13日 水曜日 11:36:28 JST
>> FAILED_NODE_ID =0
>> FAILED_NODE_NAME=slave1
>> FAILED_NODE_PORT=5444
>> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
>> NEW_MASTER_ID =1
>> OLD_MASTER_ID =0
>> NEW_MASTER_NAME =slave2
>> OLD_PRIMARY_ID =0
>> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
>> 0
>>
> [root at master1 ~]# cat /tmp/follow_master.sh.log
> 
>> 2013年 3月 13日 水曜日 11:36:30 JST
>> FAILED_NODE_ID =0
>> FAILED_NODE_NAME=slave1
>> FAILED_NODE_PORT=5444
>> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
>> NEW_MASTER_ID =1
>> OLD_MASTER_ID =0
>> NEW_MASTER_NAME =slave2
>> OLD_PRIMARY_ID =0
>> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
>> 2013年 3月 13日 水曜日 11:36:32 JST
>> FAILED_NODE_ID =2
>> FAILED_NODE_NAME=slave3
>> FAILED_NODE_PORT=5444
>> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
>> NEW_MASTER_ID =1
>> OLD_MASTER_ID =0
>> NEW_MASTER_NAME =*slave1
>> *OLD_PRIMARY_ID =0
>> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
> 
> 
> Please guide. Thanks.
> 
> Best Regards,
> Asif Naeem


More information about the pgpool-hackers mailing list