[pgpool-hackers: 2094] Re: Proposal to make backend node failover mechanism quorum aware

Muhammad Usama m.usama at gmail.com
Thu Mar 9 04:53:09 JST 2017


Hi Ishii-San

I have tried to create a detailed proposal to explain why and where the
quorum-aware backend failover mechanism would be useful.
Can you please take a look at the attached PDF document and share your
thoughts?

Thanks
Kind Regards
Muhammad Usama


On Wed, Jan 25, 2017 at 2:04 PM, Muhammad Usama <m.usama at gmail.com> wrote:

>
>
> On Wed, Jan 25, 2017 at 9:05 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>
>> Usama,
>>
>> > This is correct. If Pgpool-II is used in master-standby mode (with an
>> > elastic or virtual IP, where clients connect to only one Pgpool-II
>> > server), then there are not many issues that could be caused by the
>> > interruption of the link between AZ1 and AZ2 as you defined above.
>> >
>> > But the issue arises when Pgpool-II is used in master-master mode
>> > (clients connect to all available Pgpool-II nodes). Consider the
>> > following scenario.
>> >
>> > a) The link between AZ1 and AZ2 broke; at that time B1 was the master
>> > while B2 was the standby.
>> >
>> > b) Pgpool-C in AZ2 promotes B2 to master since Pgpool-C is not able to
>> > connect to the old master (B1).
>>
>> I thought Pgpool-C commits suicide because it cannot get quorum in this case, no?
>>
>
> No, Pgpool-II only commits suicide when it loses all network
> connections. Otherwise the master watchdog node is de-escalated when the
> quorum is lost.
> Committing suicide every time the quorum is lost is very risky and not
> feasible, since it would shut down the whole cluster as soon as the
> quorum is lost, even because of a small glitch.
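> To illustrate, here is a minimal sketch in C of the decision described
> above (this is not the actual watchdog code; the type and names are
> placeholders made up purely for illustration):
>
>     #include <stdbool.h>
>
>     typedef enum { KEEP_RUNNING, RESIGN_MASTER, SHUT_DOWN } wd_action;
>
>     /* Decision a watchdog node takes when cluster membership changes. */
>     wd_action on_membership_change(bool all_links_down, bool have_quorum,
>                                    bool is_master)
>     {
>         if (all_links_down)
>             return SHUT_DOWN;       /* completely isolated: only safe option */
>         if (!have_quorum && is_master)
>             return RESIGN_MASTER;   /* de-escalate (release the VIP) but keep running */
>         return KEEP_RUNNING;        /* a short glitch must not stop the cluster */
>     }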
>
>
>> > c) A client connects to Pgpool-C and issues a write statement. It will
>> > land on the B2 PostgreSQL server, which was promoted to master in step b.
>> >
>> > c-1) Another client connects to Pgpool-A and also issues a write
>> > statement that will land on the B1 PostgreSQL server, as it is still the
>> > master node in AZ1.
>> >
>> > d) The link between AZ1 and AZ2 is restored, but now the PostgreSQL
>> > nodes B1 and B2 have different sets of data, with no easy way to get
>> > both sets of changes into one place and restore the cluster to its
>> > original state.
>> >
>> > The above scenario becomes even more complicated if both availability
>> > zones AZ1 and AZ2 have multiple Pgpool-II nodes, since the logic for
>> > retiring multiple Pgpool-II nodes becomes more complex when the link
>> > between AZ1 and AZ2 is disrupted.
>> >
>> > So the proposal tries to solve this by making sure that we always have
>> > only one master PostgreSQL node in the cluster and never end up in the
>> > situation where we have different sets of data on different PostgreSQL
>> > nodes.
>> >
>> >
>> >
>> >> > There is also a question ("[pgpool-general: 5179] Architecture Questions
>> >> > <http://www.sraoss.jp/pipermail/pgpool-general/2016-December/005237.html>")
>> >> > posted by a user on the pgpool-general mailing list who wants a similar
>> >> > type of network that spans two AWS availability zones, and Pgpool-II has
>> >> > no good answer to avoid split-brain of the backend nodes if the corporate
>> >> > link between the two zones suffers a glitch.
>> >>
>> >> That seems a totally different story to me because there are two
>> >> independent streaming replication primary servers in the east and west
>> >> regions.
>> >>
>> >>
>> > I think the original question statement was a little bit confusing. As I
>> > understand the user's requirements from later in the thread:
>> > The user has two PostgreSQL nodes in each of two availability zones
>> > (4 PG nodes in total) and all four nodes are part of a single streaming
>> > replication setup.
>> > Both zones have two Pgpool-II nodes each (4 Pgpool-II nodes in the
>> > cluster in total).
>> > Each availability zone has one application server that connects to one of
>> > the two Pgpool-II nodes in that availability zone (that makes it a
>> > master-master Pgpool-II setup). And the user is concerned about
>> > split-brain of the PostgreSQL servers when the corporate link between the
>> > zones becomes unavailable.
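>> > For reference, a minimal pgpool.conf fragment for that topology could
>> > look roughly like the following on each of the four Pgpool-II nodes
>> > (the hostnames here are invented purely for illustration):
>> >
>> >     # four PostgreSQL backends forming one streaming replication setup
>> >     backend_hostname0 = 'pg-az1-a'
>> >     backend_port0 = 5432
>> >     backend_hostname1 = 'pg-az1-b'
>> >     backend_port1 = 5432
>> >     backend_hostname2 = 'pg-az2-a'
>> >     backend_port2 = 5432
>> >     backend_hostname3 = 'pg-az2-b'
>> >     backend_port3 = 5432
>> >
>> >     # watchdog enabled so the four Pgpool-II nodes coordinate with each other
>> >     use_watchdog = on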
>> >
>> > Thanks
>> > Best regards
>> > Muhammad Usama
>> >
>> >
>> >
>> >> Best regards,
>> >> --
>> >> Tatsuo Ishii
>> >> SRA OSS, Inc. Japan
>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> Japanese:http://www.sraoss.co.jp
>> >>
>> >> > Thanks
>> >> > Best regards
>> >> > Muhammad Usama
>> >> >
>> >> >
>> >> >
>> >> >>
>> >> >> Best regards,
>> >> >> --
>> >> >> Tatsuo Ishii
>> >> >> SRA OSS, Inc. Japan
>> >> >> English: http://www.sraoss.co.jp/index_en.php
>> >> >> Japanese:http://www.sraoss.co.jp
>> >> >>
>> >> >> >> > Hi Hackers,
>> >> >> >> >
>> >> >> >> > This is the proposal to make the failover of backend PostgreSQL nodes
>> >> >> >> > quorum aware, to make it more robust and fault tolerant.
>> >> >> >> >
>> >> >> >> > Currently Pgpool-II proceeds to fail over the backend node as soon as
>> >> >> >> > the health check detects the failure, or when an error occurs on the
>> >> >> >> > backend connection (when fail_over_on_backend_error is set). This is
>> >> >> >> > good enough for a standalone Pgpool-II server.
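>> >> >> >> > For example, with the current behaviour either of these two
>> >> >> >> > mechanisms triggers an immediate failover on its own (the values
>> >> >> >> > below are just illustrative):
>> >> >> >> >
>> >> >> >> >     # periodic health check of the backends
>> >> >> >> >     health_check_period = 10
>> >> >> >> >     # detach a backend as soon as a connection error to it is seen
>> >> >> >> >     fail_over_on_backend_error = on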
>> >> >> >> >
>> >> >> >> > But consider the scenario where we have more than one Pgpool-II node
>> >> >> >> > (say Pgpool-A, Pgpool-B and Pgpool-C) in the cluster, connected
>> >> >> >> > through the watchdog, and each Pgpool-II node is configured with two
>> >> >> >> > PostgreSQL backends (B1 and B2).
>> >> >> >> >
>> >> >> >> > Now if, due to some network glitch or other issue, Pgpool-A fails or
>> >> >> >> > loses its network connection with backend B1, Pgpool-A will detect the
>> >> >> >> > failure, detach (fail over) the B1 backend, and also pass this
>> >> >> >> > information to the other Pgpool-II nodes (Pgpool-B and Pgpool-C).
>> >> >> >> > Although backend B1 was perfectly healthy and was also reachable from
>> >> >> >> > the Pgpool-B and Pgpool-C nodes, it will still get detached from the
>> >> >> >> > cluster just because of a network glitch between Pgpool-A and backend
>> >> >> >> > B1. And the worst part is that if B1 was the master PostgreSQL node
>> >> >> >> > (in a master-standby configuration), the Pgpool-II failover would also
>> >> >> >> > promote the B2 PostgreSQL node as the new master, hence opening the
>> >> >> >> > way for split-brain and/or data corruption.
>> >> >> >> >
>> >> >> >> > So my proposal is that when the watchdog is configured in Pgpool-II,
>> >> >> >> > the backend health check of Pgpool-II should consult the other
>> >> >> >> > attached Pgpool-II nodes over the watchdog to decide whether the
>> >> >> >> > backend node has actually failed or whether it is just a localized
>> >> >> >> > glitch/false alarm. The failover of the node should only be performed
>> >> >> >> > when the majority of the cluster members agrees on the failure of the
>> >> >> >> > node.
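>> >> >> >> > As a rough sketch of that decision in C (this is illustrative only,
>> >> >> >> > not existing Pgpool-II code; the inputs are assumed to have been
>> >> >> >> > collected from the other nodes over the watchdog channel):
>> >> >> >> >
>> >> >> >> >     #include <stdbool.h>
>> >> >> >> >
>> >> >> >> >     /* can_reach[i] tells whether Pgpool-II node i can still reach the
>> >> >> >> >      * backend that the local health check reported as failed. */
>> >> >> >> >     bool should_failover_backend(int n_pgpool_nodes, const bool can_reach[])
>> >> >> >> >     {
>> >> >> >> >         int failed_votes = 0;
>> >> >> >> >         for (int i = 0; i < n_pgpool_nodes; i++)
>> >> >> >> >             if (!can_reach[i])
>> >> >> >> >                 failed_votes++;
>> >> >> >> >
>> >> >> >> >         /* Fail over only when a majority of the Pgpool-II nodes agrees
>> >> >> >> >          * that the backend is unreachable; a glitch seen by a single
>> >> >> >> >          * node is treated as a false alarm. */
>> >> >> >> >         return failed_votes > n_pgpool_nodes / 2;
>> >> >> >> >     }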
>> >> >> >> >
>> >> >> >> > This quorum-aware failover architecture will prevent false failovers
>> >> >> >> > and split-brain scenarios in the backend nodes.
>> >> >> >> >
>> >> >> >> > What are your thoughts and suggestions on this?
>> >> >> >> >
>> >> >> >> > Thanks
>> >> >> >> > Best regards
>> >> >> >> > Muhammad Usama
>> >> >> >>
>> >> >>
>> >>
>>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: quorum aware failover proposal.pdf
Type: application/pdf
Size: 189674 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20170309/bd0df39d/attachment-0001.pdf>

