[pgpool-hackers: 1998] Re: Proposal to make backend node failover mechanism quorum aware

Tatsuo Ishii ishii at sraoss.co.jp
Mon Jan 23 14:00:29 JST 2017


Ahsan,

> Actually, what I have understood from Usama's proposal is that providing the
> quorum functionality at the backend level will prevent us from running into
> split-brain scenarios at the backend level, e.g. if Pgpool-II decides that a
> backend is not available and promotes the standby as the new master while the
> backend only disappeared due to a network glitch and immediately came back.
> The quorum setting will help make this decision better and prevent
> split-brain scenarios.
> 
> Is my understanding correct? Do we have any users that are complaining
> about this issue?
> 
> While I agree this is an important feature, I think we need to have a team
> discussion on the features that are on the horizon and then decide which are
> the best ones to implement for 3.7. I have been wanting to have this
> brainstorming session for some time but have not had an opportunity yet.

Pgpool-hackers is the best place to have that kind of discussion, as
this is an open source project. Please feel free to jump into the
discussion on this proposal.
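
For concreteness, the decision rule in the proposal boils down to a simple
majority vote among the watchdog members on whether the backend is really
down. Here is a rough sketch in Python, purely as an illustration; none of
the names below are actual Pgpool-II code:

    # Rough illustration of the proposed quorum-aware failover decision.
    # All names are made up for this sketch; this is not Pgpool-II code.
    def should_failover(local_sees_down, peer_reports, cluster_size):
        """Fail over only if a majority of watchdog members agree
        that the backend is unreachable."""
        if not local_sees_down:
            return False
        # This node's own vote plus every peer that also reports a failure.
        votes_for_failure = 1 + sum(1 for r in peer_reports if r == "down")
        majority = cluster_size // 2 + 1
        return votes_for_failure >= majority

    # Pgpool-A sees B1 down, but Pgpool-B and Pgpool-C still reach it:
    should_failover(True, ["up", "up"], 3)      # False -> no failover
    # All three Pgpool-II nodes agree that B1 is down:
    should_failover(True, ["down", "down"], 3)  # True  -> fail over B1

With three Pgpool-II nodes, a single node's local network glitch can no
longer trigger a detach or promotion on its own.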

> I think the auto failover functionality that we have discussed in another
> email chain is also an interesting feature. This feature has been mentioned
> to me by EDB Pgpool-II users for some time. I know that we don't have a
> design for this yet, but it will be good to give this some thought.

I have already posted a message to that discussion but have got no response
so far. So what "auto failover" means is still vague.

> The other feature is support for more authentication modes that are
> supported by PostgreSQL.
> 
> Another important feature that I mentioned is support for a multi-master
> replication setup.

In my understanding, no one has come up with a proposal on how Pgpool-II
should deal with multi-master replication. Also, since it is likely that
multi-master replication will not appear in PostgreSQL 10.0, we could
implement something in 3.8 or beyond.

> Again, we need a good discussion on which features make the most sense for
> the next major release of pgpool.

Sure. Please post your opinions on pgpool-hackers.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> My 2 cents...
> 
> On Mon, Jan 23, 2017 at 8:15 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> > On Fri, Jan 20, 2017 at 7:37 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >
>> >> > On Mon, Jan 16, 2017 at 12:10 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >> >
>> >> >> Hi Usama,
>> >> >>
>> >> >> If my understanding is correct, by using the quorum, Pgpool-B and
>> >> >> Pgpool-C decide that B1 is healthy. What happens when Pgpool-A tries
>> >> >> to connect to B1 if the network failure between Pgpool-A and B1
>> >> >> continues? I guess clients connecting to Pgpool-A get an error and
>> >> >> fail to connect to the database?
>> >> >>
>> >> >
>> >> > Yes, that is correct. I think what we can do in this scenario is: if
>> >> > Pgpool-A is not allowed to fail over B1 because the other nodes in the
>> >> > cluster (Pgpool-B and Pgpool-C) do not agree with the failure of B1,
>> >> > then Pgpool-A will throw an error to its clients if B1 was the
>> >> > master/primary backend server. Otherwise, if B1 was the standby server,
>> >> > Pgpool-A would continue serving the clients without using the
>> >> > unreachable PostgreSQL server B1.
>> >>
>> >> Well, that sounds overly complex to me. In this case it is likely that
>> >> the network devices or switch ports used by Pgpool-A are broken. In this
>> >> situation, as a member of the watchdog cluster, Pgpool-A cannot be
>> >> trusted any more, thus we can let it retire from the watchdog
>> >> cluster.
>> >>
>> >
>> > Basically, the scenario mentioned in the initial proposal is a very
>> > simplistic one, which has all Pgpool-II and database servers located
>> > inside a single network, and as you pointed out, the failure scenario
>> > would more likely be because of a network device failure.
>> >
>> > But if we consider a situation where the Pgpool-II servers and PostgreSQL
>> > servers are distributed in multiple or even just two availability zones,
>> > then network partitioning can happen because of disruption of the
>> > link connecting the networks in the different availability zones.
>>
>> Ok. Suppose we have:
>>
>> AZ1: Pgpool-A, Pgpool-B, B1
>> AZ2: Pgpool-C, B2
>>
>> They are configured as shown in
>> http://www.pgpool.net/docs/latest/en/html/example-aws.html.
>>
>> If AZ1 and AZ2 are disconnected, then I expect the following to happen in
>> Pgpool-II 3.6:
>>
>> 1) Pgpool-A and Pgpool-B detect the failure of B2 because they cannot
>>    reach B2, and detach B2. They may promote B1 if B1 is a standby.
>>
>> 2) Pgpool-A and Pgpool-B decide that a new watchdog master should be
>>    elected from one of Pgpool-A and Pgpool-B.
>>
>> 3) Pgpool-C decides that it should retire from the watchdog cluster,
>>    which makes it impossible for users in AZ2 to access B2 through the
>>    elastic IP.
>>
>>    Pgpool-C may or may not promote B2 (if it's a standby).
>>
>> According to the proposal, the only difference would be in #3:
>>
>> 3a) Pgpool-C decides that it should retire from the watchdog cluster,
>>    which makes it impossible for users in AZ2 to access B2 through the
>>    elastic IP. Users in AZ2 need to access Pgpool-C using Pgpool-C's
>>    real IP address.
>>
>>    Pgpool-C does not promote B2 (if it's a standby).
>>
>>    Pgpool-C refuses access to B2 (if it's a primary).
>>
>> If my understanding is correct, the proposal seems to add little
>> benefit because:
>>
>> - Users in AZ2 need to switch the IP from the elastic IP to the real IP
>>   to access the DB when the link between the two regions goes down.
>>
>> - Even without the proposal, users in AZ2 could access B2 in this
>>   case. Users need to switch IP anyway, so switching from the elastic
>>   IP to the real standby IP is no big deal.
>>
>> Am I missing something?
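
To put numbers on the scenario above: with three watchdog members, a quorum
needs at least two votes, so after the link goes down the AZ1 side (Pgpool-A
and Pgpool-B) keeps quorum while the AZ2 side (Pgpool-C alone) does not. A
tiny sketch of that arithmetic, purely for illustration:

    # Illustrative only: quorum arithmetic for the AZ1/AZ2 split described above.
    def has_quorum(reachable_members, cluster_size):
        # A majority of the watchdog cluster is floor(n/2) + 1.
        return reachable_members >= cluster_size // 2 + 1

    has_quorum(2, 3)  # AZ1 side (Pgpool-A + Pgpool-B): True, keeps quorum
    has_quorum(1, 3)  # AZ2 side (Pgpool-C alone): False, retires from the cluster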
>>
>> > There is also a question ("[pgpool-general: 5179] Architecture Questions
>> > <http://www.sraoss.jp/pipermail/pgpool-general/2016-December/005237.html>")
>> > posted by a user on the pgpool-general mailing list who wants a similar
>> > type of network that spans two AWS availability zones, and Pgpool-II has
>> > no good answer to avoid split-brain of the backend nodes if the corporate
>> > link between the two zones suffers a glitch.
>>
>> That seems a totally different story to me because there are two independent
>> streaming replication primary servers in the east and west regions.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>> > Thanks
>> > Best regards
>> > Muhammad Usama
>> >
>> >
>> >
>> >>
>> >> Best regards,
>> >> --
>> >> Tatsuo Ishii
>> >> SRA OSS, Inc. Japan
>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> Japanese:http://www.sraoss.co.jp
>> >>
>> >> >> > Hi Hackers,
>> >> >> >
>> >> >> > This is a proposal to make the failover of backend PostgreSQL nodes
>> >> >> > quorum aware, to make it more robust and fault tolerant.
>> >> >> >
>> >> >> > Currently, Pgpool-II proceeds to fail over the backend node as soon
>> >> >> > as the health check detects the failure, or in case an error occurs
>> >> >> > on the backend connection (when fail_over_on_backend_error is set).
>> >> >> > This is good enough for a standalone Pgpool-II server.
>> >> >> >
>> >> >> > But consider the scenario where we have more than one Pgpool-II
>> >> >> > (say Pgpool-A, Pgpool-B and Pgpool-C) in the cluster, connected
>> >> >> > through watchdog, and each Pgpool-II node is configured with two
>> >> >> > PostgreSQL backends (B1 and B2).
>> >> >> >
>> >> >> > Now, if due to some network glitch or other issue Pgpool-A fails or
>> >> >> > loses its network connection with backend B1, Pgpool-A will detect
>> >> >> > the failure, detach (fail over) the B1 backend, and also pass this
>> >> >> > information to the other Pgpool-II nodes (Pgpool-B and Pgpool-C).
>> >> >> > Although backend B1 was perfectly healthy and was also reachable
>> >> >> > from the Pgpool-B and Pgpool-C nodes, because of a network glitch
>> >> >> > between Pgpool-A and backend B1 it will get detached from the
>> >> >> > cluster. The worst part is, if B1 was the master PostgreSQL node
>> >> >> > (in a master-standby configuration), the Pgpool-II failover would
>> >> >> > also promote the B2 PostgreSQL node as a new master, hence making
>> >> >> > way for split-brain and/or data corruption.
>> >> >> >
>> >> >> > So my proposal is that, when the watchdog is configured in
>> >> >> > Pgpool-II, the backend health check of Pgpool-II should consult the
>> >> >> > other attached Pgpool-II nodes over the watchdog to decide if the
>> >> >> > backend node has actually failed or if it is just a localized
>> >> >> > glitch/false alarm. The failover on the node should only be
>> >> >> > performed when the majority of cluster members agree on the failure
>> >> >> > of the node.
>> >> >> >
>> >> >> > This quorum-aware architecture of failover will prevent false
>> >> >> > failovers and split-brain scenarios in the backend nodes.
>> >> >> >
>> >> >> > What are your thoughts and suggestions on this?
>> >> >> >
>> >> >> > Thanks
>> >> >> > Best regards
>> >> >> > Muhammad Usama
>> >> >>
>> >>
>>
> 
> 
> 
> -- 
> Ahsan Hadi
> Snr Director Product Development
> EnterpriseDB Corporation
> The Enterprise Postgres Company
> 
> Phone: +92-51-8358874
> Mobile: +92-333-5162114
> 
> Website: www.enterprisedb.com
> EnterpriseDB Blog: http://blogs.enterprisedb.com/
> Follow us on Twitter: http://www.twitter.com/enterprisedb
> 


More information about the pgpool-hackers mailing list