[pgpool-hackers: 235] Re: Unexpected behavior of pgpool-II

Tatsuo Ishii ishii at postgresql.org
Wed May 8 19:08:44 JST 2013


> On Wed, May 8, 2013 at 2:32 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> Hi Ahsan,
>>
>> > Hi Tatsuo,
>> >
>> > Good to see that you found the problem. We have a customer who has been
>> waiting for a fix for this issue for sometime.
>> >
>> > Which branches are you planning to commit this to and when?
>>
>> The fix has been already committed to master, 3.2-stable and 3.1-stable.
>>
> 
> Thanks Tatsuo.
> 
> Were you able to reproduce the issue? As i understand it wasn't straight
> forward to reproduce.

I didn't even try to reproduce the problem because it was a
straightforward bug from the source code's point of view.

I don't why you could not reproduce the problem, but I think the
reason why your customer saw the problem is very clear.

On 2013îN 2åé 8ì˙ ã‡ójì˙ 12:44:13 JST, follow_master_command was
executed against dmp02 (you can see >> >>> FAILED_NODE_NAME=dmp02). At
this point node 0 was down, so new master host name was determined as
"dmp03" because it was the first live node.

On 2013îN 2åé 8ì˙ ã‡ójì˙ 12:44:17 JST, follow_master_command was
executed against dmp04, and this time dmp02 was already recovered. So
NEW_MASTER_NAME was assigned to dmp02, because now it's a live node.

I guess your follow_master_command was executed too early and at the
time when follow_master_command was executed against dmp04, dmp02 is
still down status and %H points to dmp03, which the first live node at
the time.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

>> > -- Ahsan
>> >
>> > -----Original Message-----
>> > From: Tatsuo Ishii <ishii at postgresql.org>
>> > Sender: pgpool-hackers-bounces at pgpool.net
>> > Date: Wed, 08 May 2013 18:02:03
>> > To: <anaeem.it at gmail.com>
>> > Cc: <pgpool-hackers at pgpool.net>
>> > Subject: [pgpool-hackers: 231] Re: Unexpected behavior of pgpool-II
>> >
>> > Sorry for delay. I think I found a problem regarding %H for
>> > follow_master_command. Pgpool-II assigns it "new master node", which
>> > is defined as "the first live node". On the other hand %H value
>> > expected in follow_master_command is "new primary node", which is not
>> > neccesary same as "new master node".
>> >
>> > Attached patch should fix the problem(against 3.2-stable).
>> > --
>> > Tatsuo Ishii
>> > SRA OSS, Inc. Japan
>> > English: http://www.sraoss.co.jp/index_en.php
>> > Japanese: http://www.sraoss.co.jp
>> >
>> >> Hi,
>> >>
>> >> There is unexpected behavior in pgpool2 that seems to be a bug. Please
>> find
>> >> the details as following i.e.
>> >>
>> >> Phenomenon
>> >>
>> >>> pgpool2 configuration option "follow_master_command" special character
>> %H
>> >>> (Hostname of the new master node) is not working as expected
>> >>> In master slave mode after failover, 2nd node become new master but got
>> >>> unexpected hostname (via %H). Please find attached pgpool.conf for
>> details.
>> >>
>> >>
>> >> Versions
>> >>
>> >>> OS version: RHEL - 64
>> >>> pgpool version: pgpool-II version 3.2.1 (namameboshi)
>> >>
>> >>
>> >> *Details/Observations* :-
>> >>
>> >>> At Start
>> >>
>> >>>  node_id | hostname | port | status | lb_weight |  role
>> >>> ---------+----------+------+--------+-----------+---------
>> >>>  0       | dmp02    | 5444 | 2      | 0.333333  | primary
>> >>>  1       | dmp03    | 5444 | 2      | 0.333333  | standby
>> >>>  2       | dmp04    | 5444 | 2      | 0.333333  | standby
>> >>>
>> >>
>> >>
>> >>> Failover arguments when the user stopped DMP02.
>> >>
>> >>> 2013îN 2åé 8ì˙ ã‡ójì˙ 12:44:12 JST
>> >>
>> >> FAILED_NODE_ID =0
>> >>> FAILED_NODE_NAME=dmp02
>> >>> FAILED_NODE_PORT=5444
>> >>> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
>> >>> NEW_MASTER_ID =1
>> >>> OLD_MASTER_ID =0
>> >>> NEW_MASTER_NAME =dmp03
>> >>> OLD_PRIMARY_ID =0
>> >>> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
>> >>> 0
>> >>>
>> >>
>> >>
>> >>> Then, these arguments told user that the master node has been changed
>> >>
>> >>> 2013îN 2åé 8ì˙ ã‡ójì˙ 12:44:13 JST
>> >>> FAILED_NODE_ID =0
>> >>> FAILED_NODE_NAME=dmp02
>> >>> FAILED_NODE_PORT=5444
>> >>> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
>> >>> NEW_MASTER_ID =1
>> >>> OLD_MASTER_ID =0
>> >>> NEW_MASTER_NAME =dmp03
>> >>> OLD_PRIMARY_ID =0
>> >>> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
>> >>> 2013îN 2åé 8ì˙ ã‡ójì˙ 12:44:17 JST
>> >>> FAILED_NODE_ID =2
>> >>> FAILED_NODE_NAME=dmp04
>> >>> FAILED_NODE_PORT=5444
>> >>> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
>> >>> NEW_MASTER_ID =1
>> >>> OLD_MASTER_ID =0
>> >>> NEW_MASTER_NAME =dmp02  *<--- *** The master name is unexpected ****
>> >>> OLD_PRIMARY_ID =0
>> >>> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
>> >>>
>> >>
>> >>
>> >>>  node_id | hostname | port | status | lb_weight |  role
>> >>> ---------+----------+------+--------+-----------+---------
>> >>>  0       | dmp02    | 5444 | 2      | 0.333333  | standby
>> >>>  1       | dmp03    | 5444 | 2      | 0.333333  | primary
>> >>>  2       | dmp04    | 5444 | 2      | 0.333333  | standby
>> >>>
>> >>
>> >>
>> >>>   pid  | usesysid |   usename    | application_name |  client_addr  |
>> >>> client_hostname | client_port |    backend_start                 |
>> state
>> >>>   | sent_location | write_location | flush_location | replay_location |
>> >>> sync_priority | sync_state
>> >>>
>> >>>
>> -------+----------+--------------+------------------+---------------+-----------------+-------------+----------------------------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
>> >>>  20641 |       10 | enterprisedb | dmp02            | 172.24.217.23 |
>> >>>             |       16948 | 08-FEB-13 17:39:44.078173 +09:00 |
>> streaming |
>> >>> 14/C50000E0   | 14/C50000E0    | 14/C50000E0    | 14/C50000E0     |
>> >>>     0 | async
>> >>> (1 çs)
>> >>>
>> >>
>> >> There is one pg_stat_replication record because dmp04 is connecting to
>> dmp02
>> >>
>> >> This phenomenon is reproduced by user. What we need to do is just make
>> the
>> >> scripts like follow_master.sh and launch pgpool process with -F [path to
>> >> pcp.conf].
>> >>
>> >> 1. Before failover
>> >>
>> >>> -bash-3.2$ psql -h 10.1.1.187 -p 9999 -c "show pool_nodes"
>> >>> node_id | hostname | port | status | lb_weight | role
>> >>> ---------+----------+------+--------+-----------+---------
>> >>> 0 | slave1 | 5444 | 2 | 0.333333 | primary
>> >>> 1 | slave2 | 5444 | 2 | 0.333333 | standby
>> >>> 2 | slave3 | 5444 | 2 | 0.333333 | standby
>> >>> (3 行)
>> >>
>> >>
>> >> 2. We stopped slave 1 node with "pg_ctl -D /data -m immediate stop".
>> >>
>> >> 3. After failover
>> >>
>> >>> -bash-3.2$ psql -h 10.1.1.187 -p 9999 -c "show pool_nodes"
>> >>> node_id | hostname | port | status | lb_weight | role
>> >>> ---------+----------+------+--------+-----------+---------
>> >>> 0 | slave1 | 5444 | 2 | 0.333333 | standby
>> >>> 1 | slave2 | 5444 | 2 | 0.333333 | primary
>> >>> 2 | slave3 | 5444 | 2 | 0.333333 | standby
>> >>> (3 行)
>> >>
>> >>
>> >> 4. Then user got output like below after failover that is not expected
>> that
>> >> is present at the end of follow_master.sh.log i.e.
>> >>
>> >>> NEW_MASTER_NAME =slave1
>> >>
>> >>
>> >> [root at master1 ~]# cat /tmp/failover.sh.log
>> >>
>> >>> 2013年 3月 13日 水曜日 11:36:28 JST
>> >>> FAILED_NODE_ID =0
>> >>> FAILED_NODE_NAME=slave1
>> >>> FAILED_NODE_PORT=5444
>> >>> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
>> >>> NEW_MASTER_ID =1
>> >>> OLD_MASTER_ID =0
>> >>> NEW_MASTER_NAME =slave2
>> >>> OLD_PRIMARY_ID =0
>> >>> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
>> >>> 0
>> >>>
>> >> [root at master1 ~]# cat /tmp/follow_master.sh.log
>> >>
>> >>> 2013年 3月 13日 水曜日 11:36:30 JST
>> >>> FAILED_NODE_ID =0
>> >>> FAILED_NODE_NAME=slave1
>> >>> FAILED_NODE_PORT=5444
>> >>> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
>> >>> NEW_MASTER_ID =1
>> >>> OLD_MASTER_ID =0
>> >>> NEW_MASTER_NAME =slave2
>> >>> OLD_PRIMARY_ID =0
>> >>> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
>> >>> 2013年 3月 13日 水曜日 11:36:32 JST
>> >>> FAILED_NODE_ID =2
>> >>> FAILED_NODE_NAME=slave3
>> >>> FAILED_NODE_PORT=5444
>> >>> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
>> >>> NEW_MASTER_ID =1
>> >>> OLD_MASTER_ID =0
>> >>> NEW_MASTER_NAME =*slave1
>> >>> *OLD_PRIMARY_ID =0
>> >>> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
>> >>
>> >>
>> >> Please guide. Thanks.
>> >>
>> >> Best Regards,
>> >> Asif Naeem
>>
> 
> 
> 
> -- 
> Ahsan Hadi
> Snr Director Product Development
> EnterpriseDB Corporation
> The Enterprise Postgres Company
> 
> Phone: +92-51-8358874
> Mobile: +92-333-5162114
> 
> Website: www.enterprisedb.com
> EnterpriseDB Blog: http://blogs.enterprisedb.com/
> Follow us on Twitter: http://www.twitter.com/enterprisedb
> 
> This e-mail message (and any attachment) is intended for the use of the
> individual or entity to whom it is addressed. This message contains
> information from EnterpriseDB Corporation that may be privileged,
> confidential, or exempt from disclosure under applicable law. If you are
> not the intended recipient or authorized to receive this for the intended
> recipient, any use, dissemination, distribution, retention, archiving, or
> copying of this communication is strictly prohibited. If you have received
> this e-mail in error, please notify the sender immediately by reply e-mail
> and delete this message.


More information about the pgpool-hackers mailing list