[pgpool-hackers: 1996] Re: Proposal to make backend node failover mechanism quorum aware

Tatsuo Ishii ishii at sraoss.co.jp
Mon Jan 23 12:15:05 JST 2017


> On Fri, Jan 20, 2017 at 7:37 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> > On Mon, Jan 16, 2017 at 12:10 PM, Tatsuo Ishii <ishii at sraoss.co.jp>
>> > wrote:
>> >
>> >> Hi Usama,
>> >>
>> >> If my understanding is correct, by using the quorum, Pgpool-B and
>> >> Pgpool-C decide that B1 is healthy. What happens when Pgpool-A tries
>> >> to connect to B1 if the network failure between Pgpool-A and B1
>> >> continues? I guess clients connecting to Pgpool-A get an error and
>> >> fail to connect to the database?
>> >>
>> >
>> > Yes, that is correct. I think what we can do in this scenario is: if
>> > Pgpool-A is not allowed to fail over B1 because the other nodes in the
>> > cluster (Pgpool-B and Pgpool-C) do not agree with the failure of B1,
>> > then Pgpool-A will throw an error to its clients if B1 was the
>> > master/primary backend server. Otherwise, if B1 was the standby
>> > server, Pgpool-A would continue serving the clients without using the
>> > unreachable PostgreSQL server B1.
>>
>> Well, that sounds overly complex to me. In this case it is likely that
>> the network devices or switch ports used by Pgpool-A are broken. In
>> this situation, as a member of the watchdog cluster, Pgpool-A cannot be
>> trusted any more, so we can let it retire from the watchdog cluster.
>>
> 
> Basically, the scenario mentioned in the initial proposal is very
> simplistic: all the Pgpool-II and database servers are located inside
> a single network, and as you pointed out, the failure would most
> likely be caused by a network device failure.
> 
> But if we consider a situation where the Pgpool-II servers and
> PostgreSQL servers are distributed across multiple, or even just two,
> availability zones, then network partitioning can happen because of a
> disruption of the link connecting the networks in the different
> availability zones.

Ok. Suppose we have:

AZ1: Pgpool-A, Pgpool-B, B1
AZ2: Pgpool-C, B2

They are configured as shown in
http://www.pgpool.net/docs/latest/en/html/example-aws.html.
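
For concreteness, this is roughly the pgpool.conf each node would
carry in that layout. This is only a sketch: the hostnames are made
up, the values are illustrative, and the elastic IP handling from the
AWS example (the watchdog escalation scripts) is omitted.

    # -- excerpt from a hypothetical pgpool.conf on Pgpool-A (AZ1) --

    # Two PostgreSQL backends, one per availability zone
    backend_hostname0 = 'b1.az1.example.com'    # B1 in AZ1
    backend_port0     = 5432
    backend_flag0     = 'ALLOW_TO_FAILOVER'
    backend_hostname1 = 'b2.az2.example.com'    # B2 in AZ2
    backend_port1     = 5432
    backend_flag1     = 'ALLOW_TO_FAILOVER'

    # Watchdog: this node plus the two other Pgpool-II nodes
    use_watchdog           = on
    wd_hostname            = 'pgpool-a.az1.example.com'
    wd_port                = 9000
    other_pgpool_hostname0 = 'pgpool-b.az1.example.com'
    other_pgpool_port0     = 9999
    other_wd_port0         = 9000
    other_pgpool_hostname1 = 'pgpool-c.az2.example.com'
    other_pgpool_port1     = 9999
    other_wd_port1         = 9000

    # The health check drives the (currently local) failover decision
    health_check_period        = 10
    health_check_max_retries   = 3
    fail_over_on_backend_error = on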

If AZ1 and AZ2 are disconnected, then I expect the following to happen
in Pgpool-II 3.6 (a sketch of the purely local decision behind this
follows the list):

1) Pgpool-A and Pgpool-B detect the failure of B2 because they cannot
   reach B2, and they detach B2. They may promote B1 if B1 is a standby.

2) Pgpool-A and Pgpool-B decide that a new watchdog master should be
   elected from one of them.

3) Pgpool-C decides that it should retire from the watchdog cluster,
   which makes it impossible for users in AZ2 to access B2 through the
   elastic IP.

   Pgpool-C may or may not promote B2 (if it's a standby).
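
All of this follows from the fact that in 3.6 each Pgpool-II detaches
a backend purely on the result of its own health check. A minimal
sketch of that local decision (this is not actual Pgpool-II source;
the helpers are placeholders for the real health check and failover
machinery):

    /*
     * Sketch of the per-node decision in 3.6: a backend that fails
     * the local health check max_retries times is detached, without
     * asking the other watchdog nodes what they see.
     */
    #include <stdbool.h>
    #include <stdio.h>

    /* Placeholder: pretend B2 (id 1) is unreachable from this node
     * (the AZ1 side of the partition). */
    static bool backend_reachable(int backend_id)
    {
        return backend_id != 1;
    }

    /* Placeholder for detaching the node and running failover_command. */
    static void detach_backend(int backend_id)
    {
        printf("detaching backend %d\n", backend_id);
    }

    static void local_health_check(int backend_id, int max_retries)
    {
        int failures = 0;

        while (failures < max_retries)
        {
            if (backend_reachable(backend_id))
                return;                 /* healthy, nothing to do */
            failures++;
        }

        /* No consultation with the other watchdog nodes. */
        detach_backend(backend_id);
    }

    int main(void)
    {
        local_health_check(0, 3);       /* B1: reachable, no action */
        local_health_check(1, 3);       /* B2: detached by this node */
        return 0;
    }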

According to the proposal, the only difference would be in #3 (a
sketch of the underlying quorum check follows below):

3a) Pgpool-C decides that it should retire from the watchdog cluster,
   which makes it impossible for users in AZ2 to access B2 through the
   elastic IP. Users in AZ2 need to access Pgpool-C using Pgpool-C's
   real IP address.

   Pgpool-C does not promote B2 (if it's a standby).

   Pgpool-C refuses access to B2 (if it's a primary).
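
The behavioural difference above comes from the quorum check at the
heart of the proposal (quoted in full further down). A rough sketch of
that check as I read it (again not actual Pgpool-II code;
node_sees_backend_failed() merely stands in for whatever watchdog
messaging the implementation would use):

    /*
     * Sketch of the quorum-aware variant: a local health check
     * failure only leads to failover if a majority of the attached
     * Pgpool-II nodes agree that the backend is unreachable.
     */
    #include <stdbool.h>
    #include <stdio.h>

    #define NUM_PGPOOL_NODES 3  /* Pgpool-A, Pgpool-B, Pgpool-C */

    /* Placeholder: ask node 'node_id' over the watchdog channel
     * whether it also considers 'backend_id' failed. Here only the
     * local node (id 0) sees the failure, i.e. a local glitch. */
    static bool node_sees_backend_failed(int node_id, int backend_id)
    {
        (void) backend_id;
        return node_id == 0;
    }

    static void detach_backend(int backend_id)
    {
        printf("majority agreed: detaching backend %d\n", backend_id);
    }

    static void quorum_aware_failover(int backend_id)
    {
        int votes = 0;
        int node;

        for (node = 0; node < NUM_PGPOOL_NODES; node++)
            if (node_sees_backend_failed(node, backend_id))
                votes++;

        if (votes > NUM_PGPOOL_NODES / 2)
            detach_backend(backend_id);
        else
            printf("backend %d: only %d of %d nodes see a failure, "
                   "treating it as a local glitch\n",
                   backend_id, votes, NUM_PGPOOL_NODES);
    }

    int main(void)
    {
        quorum_aware_failover(0);       /* B1: no quorum, no failover */
        return 0;
    }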

If my understanding is correct, the proposal seems to add little
benefit because:

- Users in AZ2 need to switch from the elastic IP to the real IP to
  access the DB when the link between the two zones goes down.

- Even without the proposal, users in AZ2 could access B2 in this
  case. They need to switch IPs anyway, so switching from the elastic
  IP to the real standby IP is no big deal.

Am I missing something?

> There is also a question ("[pgpool-general: 5179] Architecture Questions
> <http://www.sraoss.jp/pipermail/pgpool-general/2016-December/005237.html>")
> posted by a user on the pgpool-general mailing list who wants a similar
> type of network spanning two AWS availability zones, and Pgpool-II has no
> good answer to avoid split-brain of the backend nodes if the corporate
> link between the two zones suffers a glitch.

That seems like a totally different story to me, because there are two
independent streaming replication primary servers in the east and west
regions.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Thanks
> Best regards
> Muhammad Usama
> 
> 
> 
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>> >> > Hi Hackers,
>> >> >
>> >> > This is a proposal to make the failover of backend PostgreSQL nodes
>> >> > quorum aware, in order to make it more robust and fault tolerant.
>> >> >
>> >> > Currently, Pgpool-II proceeds to fail over a backend node as soon as
>> >> > the health check detects the failure, or when an error occurs on the
>> >> > backend connection (when fail_over_on_backend_error is set). This is
>> >> > good enough for a standalone Pgpool-II server.
>> >> >
>> >> > But consider the scenario where we have more than one Pgpool-II (say
>> >> > Pgpool-A, Pgpool-B and Pgpool-C) in the cluster, connected through
>> >> > the watchdog, and each Pgpool-II node is configured with two
>> >> > PostgreSQL backends (B1 and B2).
>> >> >
>> >> > Now if, due to some network glitch or other issue, Pgpool-A fails or
>> >> > loses its network connection with backend B1, Pgpool-A will detect
>> >> > the failure, detach (fail over) the B1 backend, and also pass this
>> >> > information to the other Pgpool-II nodes (Pgpool-B and Pgpool-C).
>> >> > Although backend B1 was perfectly healthy and was also reachable
>> >> > from the Pgpool-B and Pgpool-C nodes, it still gets detached from
>> >> > the cluster because of a network glitch between Pgpool-A and backend
>> >> > B1. The worst part is that if B1 was the master PostgreSQL node (in
>> >> > a master-standby configuration), the Pgpool-II failover would also
>> >> > promote the B2 PostgreSQL node as the new master, hence opening the
>> >> > way for split-brain and/or data corruption.
>> >> >
>> >> > So my proposal is that, when the watchdog is configured in Pgpool-II,
>> >> > the backend health check of Pgpool-II should consult the other
>> >> > attached Pgpool-II nodes over the watchdog to decide whether the
>> >> > backend node has actually failed or whether it is just a localized
>> >> > glitch/false alarm. The failover of the node should only be
>> >> > performed when the majority of the cluster members agree on the
>> >> > failure of the node.
>> >> >
>> >> > This quorum-aware failover architecture will prevent false failovers
>> >> > and split-brain scenarios in the backend nodes.
>> >> >
>> >> > What are your thoughts and suggestions on this?
>> >> >
>> >> > Thanks
>> >> > Best regards
>> >> > Muhammad Usama
>> >>
>>

