[pgpool-hackers: 1997] Re: Proposal to make backend node failover mechanism quorum aware

Ahsan Hadi ahsan.hadi at enterprisedb.com
Mon Jan 23 13:40:16 JST 2017


Actually, what I have understood from Usama's proposal is that providing
quorum functionality for backend failover will prevent us from running into
split-brain scenarios at the backend level. Today, Pgpool-II may decide that
a backend is not available and promote the standby as the new master even
though the backend only disappeared due to a network glitch and immediately
came back. The quorum setting will help make this decision better and
prevent split-brain scenarios.
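
To make the idea concrete, here is a rough sketch of what such a setup might
look like in pgpool.conf. The health check and watchdog parameters are
existing ones; the two consensus parameters at the end are purely
illustrative names, since the proposal does not define any yet:

    # existing health check and watchdog settings
    use_watchdog = on
    health_check_period = 10          # probe the backends every 10 seconds
    health_check_timeout = 20
    fail_over_on_backend_error = on

    # hypothetical quorum-aware failover settings (illustrative names only)
    failover_require_consensus = on   # ask the other watchdog nodes before failing over a backend
    failover_when_quorum_exists = on  # only fail over while the watchdog quorum holds

With something like this, if a single Pgpool-II node lost its path to the
primary while the other nodes still saw it as healthy, the failover would be
withheld instead of promoting the standby.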

Is my understanding correct? Do we have any users that are complaining
about this issue?

While I agree this is an important feature, I think we need to have a team
discussion on the features that are on the horizon and then decide which are
the best ones to implement for 3.7. I have been wanting to have this
brainstorming session for some time but have not had an opportunity yet.

I think the auto-failover functionality that we have discussed in another
email chain is also an interesting feature. This feature has been brought up
by EDB Pgpool-II users for some time now. I know that we don't have a design
for this yet, but it will be good to give it some thought.

The other feature is support for more of the authentication modes that
PostgreSQL supports.

Another important feature that I mentioned is support for a multi-master
replication setup.

Again, we need a good discussion on which features make the most sense for
the next major release of Pgpool-II.

My 2 cents...

On Mon, Jan 23, 2017 at 8:15 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> > On Fri, Jan 20, 2017 at 7:37 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >
> >> > On Mon, Jan 16, 2017 at 12:10 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >> >
> >> >> Hi Usama,
> >> >>
> >> >> If my understanding is correct, by using the quorum, Pgpool-B and
> >> >> Pgpool-C decide that B1 is healthy. What happens when Pgpool-A tries
> >> >> to connect to B1 if the network failure between Pgpool-A and B1
> >> >> continues? I guess clients connecting to Pgpool-A get an error and
> >> >> fail to connect to the database?
> >> >>
> >> >
> >> > Yes, that is correct. I think what we can do in this scenario is: if
> >> > Pgpool-A is not allowed to fail over B1 because the other nodes in the
> >> > cluster (Pgpool-B and Pgpool-C) do not agree with the failure of B1,
> >> > then Pgpool-A will throw an error to its clients if B1 was the
> >> > master/primary backend server. Otherwise, if B1 was the standby
> >> > server, Pgpool-A would continue serving the clients without using the
> >> > unreachable PostgreSQL server B1.
> >>
> >> Well, that sounds overly complex to me. In this case it is likely that
> >> the network devices or switch ports used by Pgpool-A are broken. In
> >> this situation, as a member of the watchdog cluster, Pgpool-A cannot be
> >> trusted any more, thus we can let that Pgpool-II instance retire from
> >> the watchdog cluster.
> >>
> >
> > Basically, the scenario mentioned in the initial proposal is a very
> > simplistic one, which has all the Pgpool-II and database servers located
> > inside a single network, and as you pointed out the failure scenario
> > would more likely be caused by a network device failure.
> >
> > But if we consider a situation where the Pgpool-II servers and PostgreSQL
> > servers are distributed across multiple or even just two availability
> > zones, then network partitioning can happen because of a disruption of
> > the link connecting the networks in the different availability zones.
>
> Ok. Suppose we have:
>
> AZ1: Pgpool-A, Pgpool-B, B1
> AZ2: Pgpool-C, B2
>
> They are configured as shown in
> http://www.pgpool.net/docs/latest/en/html/example-aws.html.
>
> If AZ1 and AZ2 are disconnected, then I expect the following to happen
> in Pgpool-II 3.6:
>
> 1) Pgpool-A and Pgpool-B detect the failure of B2 because they cannot
>    reach B2, and detach B2. They may promote B1 if B1 is a standby.
>
> 2) Pgpool-A and Pgpool-B decide that a new watchdog master should be
>    elected from one of Pgpool-A and Pgpool-B.
>
> 3) Pgpool-C decides that it should retire from the watchdog cluster,
>    which makes it impossible for users in AZ2 to access B2 through
>    the elastic IP.
>
>    Pgpool-C may or may not promote B2 (if it's a standby).
>
> According to the proposal, the only difference would be in #3:
>
> 3a) Pgpool-C decides that it should retire from the watchdog cluster,
>    which makes it impossible for users in AZ2 to access B2 through
>    the elastic IP. Users in AZ2 need to access Pgpool-C using
>    Pgpool-C's real IP address.
>
>    Pgpool-C does not promote B2 (if it's a standby).
>
>    Pgpool-C refuses access to B2 (if it's a primary).
>
> If my understanding is correct, the proposal seems to add little
> benefit because:
>
> - Users in AZ2 need to switch from the elastic IP to the real IP to
>   access the DB when the link between the two regions goes down.
>
> - Even without the proposal, users in AZ2 could access B2 in this
>   case. Users need to switch IP anyway, so switching from the elastic
>   IP to the real standby IP is no big deal.
>
> Am I missing something?
>
> > There is also a question ("[pgpool-general: 5179] Architecture Questions
> > <http://www.sraoss.jp/pipermail/pgpool-general/2016-December/005237.html>")
> > posted by a user on the pgpool-general mailing list who wants a similar
> > type of network that spans two AWS availability zones, and Pgpool-II has
> > no good answer for avoiding split-brain of the backend nodes if the
> > corporate link between the two zones suffers a glitch.
>
> That seems a totally different story to me because there are two
> independent streaming replication primary servers in the east and west
> regions.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> > Thanks
> > Best regards
> > Muhammad Usama
> >
> >
> >
> >>
> >> Best regards,
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese:http://www.sraoss.co.jp
> >>
> >> >> > Hi Hackers,
> >> >> >
> >> >> > This is the proposal to make the failover of backend PostgreSQL
> >> >> > nodes quorum aware, to make it more robust and fault tolerant.
> >> >> >
> >> >> > Currently, Pgpool-II proceeds to fail over the backend node as soon
> >> >> > as the health check detects the failure, or when an error occurs on
> >> >> > the backend connection (when fail_over_on_backend_error is set).
> >> >> > This is good enough for a standalone Pgpool-II server.
> >> >> >
> >> >> > But consider the scenario where we have more than one Pgpool-II
> >> >> > (say Pgpool-A, Pgpool-B and Pgpool-C) in the cluster connected
> >> >> > through watchdog, and each Pgpool-II node is configured with two
> >> >> > PostgreSQL backends (B1 and B2).
> >> >> >
> >> >> > Now if, due to some network glitch or other issue, Pgpool-A fails
> >> >> > or loses its network connection with backend B1, Pgpool-A will
> >> >> > detect the failure, detach (fail over) the B1 backend, and also
> >> >> > pass this information to the other Pgpool-II nodes (Pgpool-B and
> >> >> > Pgpool-C). Although backend B1 was perfectly healthy and also
> >> >> > reachable from the Pgpool-B and Pgpool-C nodes, it will still get
> >> >> > detached from the cluster because of a network glitch between
> >> >> > Pgpool-A and backend B1. The worst part is, if B1 was the master
> >> >> > PostgreSQL (in a master-standby configuration), the Pgpool-II
> >> >> > failover would also promote the B2 PostgreSQL node as the new
> >> >> > master, hence making way for split-brain and/or data corruption.
> >> >> >
> >> >> > So my proposal is that when the watchdog is configured in
> >> >> > Pgpool-II, the backend health check of Pgpool-II should consult the
> >> >> > other attached Pgpool-II nodes over the watchdog to decide whether
> >> >> > the backend node has actually failed or whether it is just a
> >> >> > localized glitch/false alarm. The failover of the node should only
> >> >> > be performed when the majority of the cluster members agrees on the
> >> >> > failure of the node.
> >> >> >
> >> >> > This quorum-aware failover architecture will prevent false
> >> >> > failovers and split-brain scenarios on the backend nodes.
> >> >> >
> >> >> > What are your thoughts and suggestions on this?
> >> >> >
> >> >> > Thanks
> >> >> > Best regards
> >> >> > Muhammad Usama
> >> >>
> >>
>
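
For reference, the topology described in the quoted proposal (three Pgpool-II
nodes watching the same two PostgreSQL backends) would roughly correspond to
an abridged pgpool.conf like the one below on Pgpool-A; the host names are
purely illustrative:

    # the two PostgreSQL backends, listed identically on all three Pgpool-II nodes
    backend_hostname0 = 'b1.example.com'
    backend_port0 = 5432
    backend_hostname1 = 'b2.example.com'
    backend_port1 = 5432

    # watchdog peers as seen from Pgpool-A
    use_watchdog = on
    wd_hostname = 'pgpool-a.example.com'
    other_pgpool_hostname0 = 'pgpool-b.example.com'
    other_pgpool_hostname1 = 'pgpool-c.example.com'

With three watchdog members, a majority under the proposal would be two
nodes, so a glitch seen only by Pgpool-A could not detach B1 or promote B2
on its own.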



-- 
Ahsan Hadi
Snr Director Product Development
EnterpriseDB Corporation
The Enterprise Postgres Company

Phone: +92-51-8358874
Mobile: +92-333-5162114

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb
