[pgpool-hackers: 2009] Re: Proposal to make backend node failover mechanism quorum aware

Muhammad Usama m.usama at gmail.com
Wed Jan 25 18:04:16 JST 2017


On Wed, Jan 25, 2017 at 9:05 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Usama,
>
> > This is correct. If Pgpool-II is used in master-standby mode (with an
> > elastic or virtual IP, and clients connect to only one Pgpool-II
> > server), then there are not many issues that could be caused by the
> > interruption of the link between AZ1 and AZ2 as you defined above.
> >
> > But the issue arises when Pgpool-II is used in master-master mode
> > (clients connect to all available Pgpool-II nodes); then consider the
> > following scenario.
> >
> > a) The link between AZ1 and AZ2 broke; at that time B1 was the master
> > while B2 was standby.
> >
> > b) Pgpool-C in AZ2 promotes B2 to master, since Pgpool-C is not able
> > to connect to the old master (B1).
>
> I thought Pgpool-C suicides because it cannot get the quorum in this case, no?
>

No, Pgpool-II commits suicide only when it loses all of its network
connections. Otherwise, the master watchdog node is de-escalated when
the quorum is lost.
Committing suicide every time the quorum is lost is very risky and not
feasible, since it would shut down the whole cluster as soon as the
quorum is lost, even because of a small glitch.
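The distinction can be sketched roughly as follows. This is a hypothetical illustration with assumed names, not Pgpool-II's actual code:

```python
def watchdog_action(has_network, alive, total, is_master):
    """Sketch of the watchdog reaction described above (assumed names).

    has_network -- whether this node has any network connectivity at all
    alive       -- watchdog nodes this node can currently see, incl. itself
    total       -- total watchdog nodes configured in the cluster
    is_master   -- whether this node currently holds the master/VIP role
    """
    quorum = total // 2 + 1        # strict majority of all watchdog nodes
    if not has_network:
        return "shutdown"          # lost ALL network connections: suicide
    if alive < quorum and is_master:
        return "de-escalate"       # quorum lost: release master role, stay up
    return "continue"              # quorum held (or non-master): keep serving


# In a 3-node watchdog cluster (quorum = 2):
print(watchdog_action(False, 0, 3, True))   # shutdown
print(watchdog_action(True, 1, 3, True))    # de-escalate
print(watchdog_action(True, 2, 3, True))    # continue
```

De-escalation here means the node gives up the virtual/elastic IP and the master watchdog role but keeps running, so a brief quorum loss does not take the whole cluster down.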


> > c) A client connects to Pgpool-C and issues a write statement. It
> > will land on the B2 PostgreSQL server, which was promoted to master
> > in step b.
> >
> > c-1) Another client connects to Pgpool-A and also issues a write
> > statement that will land on the B1 PostgreSQL server, as it is the
> > master node in AZ1.
> >
> > d) The link between AZ1 and AZ2 is restored, but now the PostgreSQL
> > servers B1 and B2 have different sets of data, with no easy way to
> > get both changes in one place and restore the cluster to its
> > original state.
> >
> > The above scenario becomes more complicated if both availability
> > zones AZ1 and AZ2 have multiple Pgpool-II nodes, since the logic for
> > retiring multiple Pgpool-II nodes becomes more complex when the link
> > between AZ1 and AZ2 is disrupted.
> >
> > So the proposal tries to solve this by making sure that we always
> > have only one master PostgreSQL node in the cluster and never end up
> > in a situation where we have different sets of data on different
> > PostgreSQL nodes.
> >
> >
> >
> >> > There is also a question ("[pgpool-general: 5179] Architecture
> >> > Questions
> >> > <http://www.sraoss.jp/pipermail/pgpool-general/2016-December/005237.html>")
> >> > posted by a user on the pgpool-general mailing list who wants a
> >> > similar type of network spanning two AWS availability zones, and
> >> > Pgpool-II has no good answer to avoid split-brain of the backend
> >> > nodes if the corporate link between the two zones suffers a glitch.
> >>
> >> That seems a totally different story to me, because there are two
> >> independent streaming replication primary servers in the east and
> >> west regions.
> >>
> > I think the original question statement was a little bit confusing.
> > As I understood the user's requirements later in the thread:
> > The user has a couple of PostgreSQL nodes in each of two availability
> > zones (four PG nodes in total), and all four nodes are part of a
> > single streaming replication setup.
> > Both zones have two Pgpool-II nodes each (four Pgpool-II nodes in the
> > cluster in total).
> > Each availability zone has one application server that connects to
> > one of the two Pgpool-II nodes in that availability zone. (That makes
> > it a master-master Pgpool-II setup.) And the user is concerned about
> > split-brain of the PostgreSQL servers when the corporate link between
> > the zones becomes unavailable.
> >
> > Thanks
> > Best regards
> > Muhammad Usama
> >
> >
> >
> >> Best regards,
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese: http://www.sraoss.co.jp
> >>
> >> > Thanks
> >> > Best regards
> >> > Muhammad Usama
> >> >
> >> >
> >> >
> >> >>
> >> >> Best regards,
> >> >> --
> >> >> Tatsuo Ishii
> >> >> SRA OSS, Inc. Japan
> >> >> English: http://www.sraoss.co.jp/index_en.php
> >> >> Japanese: http://www.sraoss.co.jp
> >> >>
> >> >> >> > Hi Hackers,
> >> >> >> >
> >> >> >> > This is a proposal to make the failover of backend PostgreSQL
> >> >> >> > nodes quorum aware, to make it more robust and fault tolerant.
> >> >> >> >
> >> >> >> > Currently, Pgpool-II proceeds to fail over the backend node as
> >> >> >> > soon as the health check detects a failure, or when an error
> >> >> >> > occurs on the backend connection (if fail_over_on_backend_error
> >> >> >> > is set). This is good enough for a standalone Pgpool-II server.
> >> >> >> >
> >> >> >> > But consider the scenario where we have more than one Pgpool-II
> >> >> >> > (say Pgpool-A, Pgpool-B and Pgpool-C) in the cluster, connected
> >> >> >> > through the watchdog, and each Pgpool-II node is configured
> >> >> >> > with two PostgreSQL backends (B1 and B2).
> >> >> >> >
> >> >> >> > Now if, due to some network glitch or other issue, Pgpool-A
> >> >> >> > fails or loses its network connection with backend B1, Pgpool-A
> >> >> >> > will detect the failure, detach (fail over) the B1 backend, and
> >> >> >> > also pass this information to the other Pgpool-II nodes
> >> >> >> > (Pgpool-B and Pgpool-C). Although backend B1 was perfectly
> >> >> >> > healthy and also reachable from the Pgpool-B and Pgpool-C
> >> >> >> > nodes, because of a network glitch between Pgpool-A and backend
> >> >> >> > B1 it will get detached from the cluster. The worst part is, if
> >> >> >> > B1 was the master PostgreSQL node (in a master-standby
> >> >> >> > configuration), the Pgpool-II failover would also promote the
> >> >> >> > B2 PostgreSQL node as the new master, hence making way for
> >> >> >> > split-brain and/or data corruption.
> >> >> >> >
> >> >> >> > So my proposal is that, when the watchdog is configured in
> >> >> >> > Pgpool-II, the backend health check of Pgpool-II should consult
> >> >> >> > the other attached Pgpool-II nodes over the watchdog to decide
> >> >> >> > whether the backend node has actually failed or whether it is
> >> >> >> > just a localized glitch/false alarm. And the failover of the
> >> >> >> > node should only be performed when the majority of cluster
> >> >> >> > members agree on the failure of the node.
> >> >> >> >
> >> >> >> > This quorum-aware failover architecture will prevent false
> >> >> >> > failovers and split-brain scenarios in the backend nodes.
> >> >> >> >
> >> >> >> > What are your thoughts and suggestions on this?
> >> >> >> >
> >> >> >> > Thanks
> >> >> >> > Best regards
> >> >> >> > Muhammad Usama
> >> >> >>
> >> >>
> >>
>
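The quoted proposal above boils down to a simple majority check before any backend failover. A hypothetical sketch (assumed names, not the actual Pgpool-II implementation): each Pgpool-II node reports whether it can reach the backend, and failover proceeds only if a majority of watchdog members agree the backend is down.

```python
def should_failover(votes_down, total_pgpool_nodes):
    """Return True only when a strict majority of watchdog members
    agree that the backend has failed (hypothetical sketch)."""
    quorum = total_pgpool_nodes // 2 + 1
    return votes_down >= quorum


# Scenario from the proposal: Pgpool-A cannot reach B1, but Pgpool-B
# and Pgpool-C still can, so only 1 of 3 nodes votes "down".
assert should_failover(1, 3) is False   # local glitch: no failover
assert should_failover(2, 3) is True    # majority agrees: failover proceeds
```

Note how this also covers the two-AZ setup discussed earlier: with four Pgpool-II nodes split 2+2 by a broken inter-zone link, the quorum is 3, so under this scheme neither side could vote through a failover/promotion, avoiding the divergent-master situation of steps a) through d).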

