[pgpool-hackers: 3306] Re: duplicate failover request over allow_multiple_failover_requests_from_node=off

Tatsuo Ishii ishii at sraoss.co.jp
Tue Apr 16 18:03:36 JST 2019


>> Thanks. However, this will change existing behavior. Perhaps we should
>> make the change against the master branch only?
>>
> 
> Probably yes, because the fix I currently have in mind involves a
> configurable timeout parameter to make the master pgpool resign. Let me
> come up with the patch, and then we can work out which part of it needs
> to be back-ported.
> And regarding the patch I shared upthread to continue the health check on
> quarantined nodes, do you think we should also back-patch it to older
> versions as well?

I'm not sure we should back-port either of the two patches, since both
change existing behavior (and one of those behaviors is even documented).

What do you think?

> Thanks
> Best Regards
> Muhammad Usama
> 
> 
>> >> >> >> > Can you please try out the attached patch to see whether it solves
>> >> >> >> > the problem?
>> >> >> >> > The patch is generated against the current master branch.
>> >> >> >> >
>> >> >> >> > On Wed, Apr 10, 2019 at 2:04 PM TAKATSUKA Haruka <harukat at sraoss.co.jp>
>> >> >> >> > wrote:
>> >> >> >> >
>> >> >> >> >> Hello, Pgpool developers
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> I found that the Pgpool-II watchdog is too strict about duplicate
>> >> >> >> >> failover requests when allow_multiple_failover_requests_from_node=off
>> >> >> >> >> is set.
>> >> >> >> >>
>> >> >> >> >> For example, consider a watchdog cluster of 3 pgpool instances whose
>> >> >> >> >> backends are PostgreSQL servers using streaming replication.
>> >> >> >> >>
>> >> >> >> >> Suppose the communication between the master/coordinator pgpool and
>> >> >> >> >> the primary PostgreSQL node goes down for a short period (or the
>> >> >> >> >> pgpool makes a false-positive judgement for some other reason).
>> >> >> >> >> The pgpool then tries to fail over but cannot get consensus, so it
>> >> >> >> >> puts the primary node into quarantine status. The quarantine cannot
>> >> >> >> >> be reset automatically, and as a result the service becomes
>> >> >> >> >> unavailable.
>> >> >> >> >>
>> >> >> >> >> This case generates logs like the following:
>> >> >> >> >>
>> >> >> >> >> pid 1234: LOG:  new IPC connection received
>> >> >> >> >> pid 1234: LOG:  watchdog received the failover command from local pgpool-II on IPC interface
>> >> >> >> >> pid 1234: LOG:  watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
>> >> >> >> >> pid 1234: LOG:  Duplicate failover request from "pg1:5432 Linux pg1" node
>> >> >> >> >> pid 1234: DETAIL:  request ignored
>> >> >> >> >> pid 1234: LOG:  failover requires the majority vote, waiting for consensus
>> >> >> >> >> pid 1234: DETAIL:  failover request noted
>> >> >> >> >> pid 4321: LOG:  degenerate backend request for 1 node(s) from pid [4321], is changed to quarantine node request by watchdog
>> >> >> >> >> pid 4321: DETAIL:  watchdog is taking time to build consensus
>> >> >> >> >>
>> >> >> >> >> Note that in this case there is no communication trouble among
>> >> >> >> >> the Pgpool watchdog nodes.
>> >> >> >> >> You can reproduce it by changing one PostgreSQL server's pg_hba.conf
>> >> >> >> >> to reject the health check access from one pgpool node for a short
>> >> >> >> >> period.
>> >> >> >> >>
>> >> >> >> >> The documentation doesn't say that duplicate failover requests put
>> >> >> >> >> the node into quarantine immediately. I think the request should
>> >> >> >> >> simply be ignored.
>> >> >> >> >>
>> >> >> >> >> A patch file for the head of V3_7_STABLE is attached.
>> >> >> >> >> With this patch, Pgpool still blocks a failover driven only by a
>> >> >> >> >> single pgpool's repeated failover requests, but it can recover once
>> >> >> >> >> the connection trouble is gone.
>> >> >> >> >>
>> >> >> >> >> Does this change cause any problems?
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> with best regards,
>> >> >> >> >> TAKATSUKA Haruka <harukat at sraoss.co.jp>
>> >> >> >> >> _______________________________________________
>> >> >> >> >> pgpool-hackers mailing list
>> >> >> >> >> pgpool-hackers at pgpool.net
>> >> >> >> >> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
>> >> >> >> >>
>> >> >> >>
>> >> >>
>> >>
>>

