[pgpool-hackers: 1987] Re: Proposal to make backend node failover mechanism quorum aware

Muhammad Usama m.usama at gmail.com
Tue Jan 17 04:05:27 JST 2017


On Mon, Jan 16, 2017 at 12:10 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Hi Usama,
>
> If my understanding is correct, by using the quorum, Pgpool-B and
> Pgpool-C decide that B1 is healthy. What happens when Pgpool-A tries
> to connect to B1 if the network failure between Pgpool-A and B1
> continues? I guess clients connecting to Pgpool-A get an error and
> fail to connect to the database?
>

Yes, that is correct. I think what we can do in this scenario is: if
Pgpool-A is not allowed to fail over B1 because the other nodes in the
cluster (Pgpool-B and Pgpool-C) do not agree with the failure of B1, then
Pgpool-A will throw an error to its clients if B1 was the master/primary
backend server. Otherwise, if B1 was a standby server, Pgpool-A would
continue serving its clients without using the unreachable PostgreSQL
server B1.

Thanks
Best regards
Muhammad Usama


--
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> > Hi Hackers,
> >
> > This is the proposal to make the failover of backend PostgreSQL nodes
> > quorum aware to make it more robust and fault tolerant.
> >
> > Currently, Pgpool-II proceeds to fail over a backend node as soon as the
> > health check detects a failure, or when an error occurs on the backend
> > connection (when fail_over_on_backend_error is set). This is good
> > enough for a standalone Pgpool-II server.
> >
> > But consider the scenario where we have more than one Pgpool-II (say
> > Pgpool-A, Pgpool-B and Pgpool-C) in the cluster connected through
> > watchdog, and each Pgpool-II node is configured with two PostgreSQL
> > backends (B1 and B2).
> >
> > Now if, due to some network glitch or other issue, Pgpool-A fails or
> > loses its network connection with backend B1, Pgpool-A will detect the
> > failure, detach (fail over) the B1 backend, and also pass this
> > information to the other Pgpool-II nodes (Pgpool-B and Pgpool-C).
> > Although backend B1 was perfectly healthy and also reachable from the
> > Pgpool-B and Pgpool-C nodes, it still gets detached from the cluster
> > because of a network glitch between Pgpool-A and backend B1. The worst
> > part is, if B1 was the master PostgreSQL node (in a master-standby
> > configuration), the Pgpool-II failover would also promote the B2
> > PostgreSQL node as the new master, hence making way for split-brain
> > and/or data corruption.
> >
> > So my proposal is that when the watchdog is configured in Pgpool-II,
> > the backend health check of Pgpool-II should consult the other
> > attached Pgpool-II nodes over the watchdog to decide whether the
> > backend node has actually failed or whether it is just a localized
> > glitch/false alarm. The failover of the node should only be performed
> > when a majority of cluster members agrees on the failure of the node.
> >
> > This quorum-aware failover architecture will prevent false failovers
> > and split-brain scenarios in the backend nodes.
> >
> > What are your thoughts and suggestions on this?
> >
> > Thanks
> > Best regards
> > Muhammad Usama
>

