[pgpool-hackers: 1993] Re: Proposal to make backend node failover mechanism quorum aware

Muhammad Usama m.usama at gmail.com
Fri Jan 20 21:54:54 JST 2017


On Fri, Jan 20, 2017 at 7:37 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> > On Mon, Jan 16, 2017 at 12:10 PM, Tatsuo Ishii <ishii at sraoss.co.jp>
> > wrote:
> >
> >> Hi Usama,
> >>
> >> If my understanding is correct, by using the quorum, Pgpool-B and
> >> Pgpool-C decide that B1 is healthy. What happens when Pgpool-A tries
> >> to connect to B1 if the network failure between Pgpool-A and B1
> >> continues? I guess clients connecting to Pgpool-A get an error and
> >> fail to connect to the database?
> >>
> >
> > Yes, that is correct. I think what we can do in this scenario is: if
> > Pgpool-A is not allowed to fail over B1 because the other nodes in the
> > cluster (Pgpool-B and Pgpool-C) do not agree with the failure of B1,
> > then Pgpool-A will throw an error to its clients if B1 was the
> > master/primary backend server. Otherwise, if B1 was a standby server,
> > Pgpool-A would continue serving the clients without using the
> > unreachable PostgreSQL server B1.
>
> Well, that sounds overly complex to me. In this case it is likely that
> network devices or switch ports used by Pgpool-A are broken. In this
> situation, as a member of the watchdog cluster, Pgpool-A cannot be
> trusted any more, thus we can let Pgpool-II retire from the watchdog
> cluster.
>

Basically the scenario mentioned in the initial proposal is the very
simplistic one, with all Pgpool-II and database servers located inside a
single network, and as you pointed out, in that case the failure would most
likely be due to a broken network device.

But if we consider a situation where the Pgpool-II servers and PostgreSQL
servers are distributed across multiple, or even just two, availability
zones, then a network partition can happen because of a disruption of the
link connecting the networks in the different availability zones.

There is also a question ("[pgpool-general: 5179] Architecture Questions
<http://www.sraoss.jp/pipermail/pgpool-general/2016-December/005237.html>")
posted by a user on the pgpool-general mailing list who wants a similar
setup spanning two AWS availability zones, and currently Pgpool-II has no
good answer for avoiding split-brain of the backend nodes if the link
between the two zones suffers a glitch.
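
To make the intended behaviour a little more concrete, here is a rough,
self-contained C sketch of the decision rule I have in mind. Please note
that all the type and function names below are made up for illustration
only and do not correspond to the actual Pgpool-II or watchdog code; the
sketch just shows the rule that a backend is failed over only when a
majority of the watchdog nodes agree that it is down, and that a locally
unreachable backend is otherwise either skipped (standby) or reported as an
error to the clients (primary):

/*
 * Illustrative sketch only -- these type and function names are
 * hypothetical and do NOT correspond to the actual Pgpool-II sources.
 */
#include <stdbool.h>
#include <stdio.h>

typedef enum { NODE_STANDBY, NODE_PRIMARY } backend_role;

typedef enum
{
    ACTION_FAILOVER,          /* majority agrees: detach the backend        */
    ACTION_IGNORE_BACKEND,    /* local glitch on a standby: just skip it    */
    ACTION_ERROR_TO_CLIENTS   /* local glitch on the primary: report errors */
} failover_action;

/*
 * local_sees_down  : result of our own health check for the backend
 * remote_down_votes: how many other watchdog nodes also report it down
 * cluster_size     : total number of Pgpool-II nodes in the watchdog cluster
 */
static failover_action
decide_backend_failover(bool local_sees_down,
                        int remote_down_votes,
                        int cluster_size,
                        backend_role role)
{
    if (!local_sees_down)
        return ACTION_IGNORE_BACKEND;   /* nothing to do */

    /* simple majority: more than half of the cluster must agree */
    int votes_for_failure = 1 + remote_down_votes;  /* our own vote included */
    int quorum = cluster_size / 2 + 1;

    if (votes_for_failure >= quorum)
        return ACTION_FAILOVER;

    /*
     * The rest of the cluster still sees the backend as healthy, so this is
     * treated as a localized glitch and no failover is triggered.
     */
    return (role == NODE_PRIMARY) ? ACTION_ERROR_TO_CLIENTS
                                  : ACTION_IGNORE_BACKEND;
}

int main(void)
{
    /* Pgpool-A alone loses its link to the primary B1 in a 3-node cluster */
    failover_action action = decide_backend_failover(true, 0, 3, NODE_PRIMARY);

    printf("action = %s\n",
           action == ACTION_FAILOVER ? "failover" :
           action == ACTION_ERROR_TO_CLIENTS ? "error to clients" :
           "ignore backend");
    return 0;
}

In this example Pgpool-A's lone vote (1 out of 3) does not reach the quorum
of 2, so B1 stays attached cluster-wide and only Pgpool-A's own clients see
errors, because B1 happens to be the primary.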

Thanks
Best regards
Muhammad Usama



>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
> >> > Hi Hackers,
> >> >
> >> > This is the proposal to make the failover of backend PostgreSQL nodes
> >> > quorum aware to make it more robust and fault tolerant.
> >> >
> >> > Currently, Pgpool-II proceeds to fail over a backend node as soon as
> >> > the health check detects a failure, or when an error occurs on the
> >> > backend connection (if fail_over_on_backend_error is set). This is
> >> > good enough for a standalone Pgpool-II server.
> >> >
> >> > But consider the scenario where we have more than one Pgpool-II (say
> >> > Pgpool-A, Pgpool-B and Pgpool-C) in the cluster connected through the
> >> > watchdog, and each Pgpool-II node is configured with two PostgreSQL
> >> > backends (B1 and B2).
> >> >
> >> > Now if, due to some network glitch or other issue, Pgpool-A fails or
> >> > loses its network connection to backend B1, Pgpool-A will detect the
> >> > failure, detach (fail over) the B1 backend and also pass this
> >> > information to the other Pgpool-II nodes (Pgpool-B and Pgpool-C).
> >> > Although backend B1 was perfectly healthy and still reachable from
> >> > the Pgpool-B and Pgpool-C nodes, it will nevertheless get detached
> >> > from the cluster just because of a network glitch between Pgpool-A
> >> > and backend B1. The worst part is that if B1 was the master
> >> > PostgreSQL server (in a master-standby configuration), the Pgpool-II
> >> > failover would also promote the B2 PostgreSQL node as the new master,
> >> > hence opening the way for split-brain and/or data corruption.
> >> >
> >> > So my proposal is that when the watchdog is configured in Pgpool-II,
> >> > the backend health check of Pgpool-II should consult the other
> >> > attached Pgpool-II nodes over the watchdog to decide whether the
> >> > backend node has actually failed or whether it is just a localized
> >> > glitch/false alarm. The failover of the node should only be performed
> >> > when the majority of the cluster members agree on the failure of the
> >> > node.
> >> >
> >> > This quorum-aware failover architecture will prevent false failovers
> >> > and split-brain scenarios on the backend nodes.
> >> >
> >> > What are your thoughts and suggestions on this?
> >> >
> >> > Thanks
> >> > Best regards
> >> > Muhammad Usama
> >>
>

