[pgpool-general: 2027] Re: Problem with watchdog...

Wed Aug 14 06:12:49 JST 2013

> No problem...
> If your active server is number 2, then run the command (killall -9
> pgpool) on that server itself.
> You will see that all processes die while the interface with the delegate
> IP remains active. This is the case...

That is only applicable to pgpool-II 3.2.

3.3's watchdog monitors whether the parent pgpool process is alive. If
it's gone, the watchdog releases the IP and tells the standby watchdog
that it is going to "down" status.
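
To illustrate the idea only (this is not pgpool-II's actual code), here
is a minimal Python sketch of "check whether the parent is alive, and
release the IP if it is not". The function names, the returned status
strings and the release command are all assumptions made up for this
example.

```python
import os
import subprocess


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks the PID
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def release_ip_if_parent_gone(parent_pid, release_cmd, run=subprocess.run):
    """Hypothetical helper: run release_cmd when the parent process is gone.

    Returns "active" if the parent is still alive, otherwise runs the
    command once and returns "down".
    """
    if pid_alive(parent_pid):
        return "active"
    run(release_cmd, check=False)
    return "down"
```

A caller could invoke this periodically with the PID of the pgpool parent
process and a command such as ["ifconfig", "eth0:0", "down"], where the
interface name is a placeholder for whatever device holds the delegate IP
on your system.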
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> 2013/8/13 Tatsuo Ishii <ishii at postgresql.org>
> 
>> > You said: "I killed server2 pgpool-II parent process by kill -9"
>> > Try "killall -9 pgpool" on the active server...
>>
>> Sorry for the confusion, but in my case server2 is the active one.
>>
>> > The standby will bring up the delegate IP while it is still up on the old
>> > active server, and the two will conflict.
>> > So I am using a shell script in crontab to check for this and restart
>> > pgpool if necessary.
>> >
>> >
>> > 2013/8/12 Tatsuo Ishii <ishii at postgresql.org>
>> >
>> >> > On 08/05/13 14:07, Fernando Buzon wrote:
>> >> >>
>> >> >> FINAL:
>> >> >> Like I said, everything is working nicely.
>> >> >> And now I have the 2 pgpools up and working again.
>> >> >> The escalated pgpool is pgpool-01.
>> >> >> I stopped it with "killall -9 pgpool" and now wd_lifecheck worked fine
>> >> >> on pgpool-02!
>> >> >> I don't know what the problem was earlier, but now it is working!
>> >> >>
>> >> >
>> >> > Maybe you fixed some small issue in the config during your testing.
>> >> >
>> >> >> log on pgpool-02:
>> >> >> 2013-08-05 17:52:42 LOG:   pid 11524: wd_lifecheck: lifecheck failed 3 times. pgpool 1 (10.0.0.21:5432) seems not to be working
>> >> >> 2013-08-05 17:52:42 LOG:   pid 11524: wd_escalation: escalated to master pgpool
>> >> >> 2013-08-05 17:52:42 LOG:   pid 11524: wd_escalation:  escalated to delegate_IP holder
>> >> >> 2013-08-05 17:52:52 LOG:   pid 11524: wd_lifecheck: lifecheck failed 3 times. pgpool 1 (10.0.0.21:5432) seems not to be working
>> >> >>
>> >> >> So only one problem remains: how do I bring down delegate_ip on
>> >> >> pgpool-01? Both servers are responding to delegate_ip.
>> >> >
>> >> > Well, the reason it doesn't get removed on pgpool-01 is because the
>> >> > killall -9 kills the pgpool processes including the watchdog without
>> >> > any hope of them running the ifconfig down command.
>> >> >
>> >> > That said, you just need to run the ifconfig down command on pgpool-01.
>> >> >
>> >> > I'm sure what you're trying to simulate is a crash, but I'm not sure
>> >> > killing ALL the pgpool processes with -9 is a good simulation, because
>> >> > more likely only one of the backends would crash.
>> >> >
>> >> > Maybe one of the other folks on the list can suggest a better
>> >> > simulation for a crashing pgpool service.
>> >>
>> >> I tested this case with pgpool-II 3.3.0. Initially "server2" is the
>> >> watchdog "active" and "server1" is the watchdog "standby".
>> >>
>> >> I killed server2 pgpool-II parent process by kill -9.
>> >>
>> >> - server2 releases the VIP. server2 watchdog goes to "down" status.
>> >>
>> >> - server1 becomes active and grabs the VIP.
>> >>
>> >> So my guess is that 3.2's watchdog is not capable of handling this
>> >> situation.
>> >>
>> >> Pgpool-II 3.3's watchdog is much improved over 3.2's. I recommend
>> >> using 3.3 if you want to use watchdog seriously.
>> >>
>>