<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jan 25, 2017 at 9:05 AM, Tatsuo Ishii <span dir="ltr"><<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Usama,<br>
<span class=""><br>
> This is correct. If Pgpool-II is used in master-standby mode (with an<br>
> elastic or virtual IP, and clients connect to only one Pgpool-II server),<br>
> then an interruption of the link between AZ1 and AZ2, as you described<br>
> above, causes few issues.<br>
><br>
> But an issue arises when Pgpool-II is used in master-master mode<br>
> (clients connect to all available Pgpool-II nodes). Consider the<br>
> following scenario.<br>
><br>
> a) The link between AZ1 and AZ2 breaks; at that time B1 was the master<br>
> and B2 was the standby.<br>
><br>
> b) Pgpool-C in AZ2 promotes B2 to master, since Pgpool-C is not able to<br>
> connect to the old master (B1).<br>
<br>
</span>I thought Pgpool-C suicides because it cannot get quorum in this case, no?<br></blockquote><div><br></div><div>No, Pgpool-II commits suicide only when it loses all of its network connections. Otherwise, the master watchdog node is de-escalated when the quorum is lost.</div><div>Committing suicide every time the quorum is lost is very risky and not feasible, since it would shut down the whole cluster as soon as the quorum is lost, even because of a small glitch.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
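To make that distinction concrete, here is a minimal sketch (illustrative Python only, not Pgpool-II source; all names are hypothetical) of the rule described above: suicide only on total loss of network connections, de-escalation of the master watchdog node when merely the quorum is lost:

```python
# Hypothetical sketch of the watchdog reaction described above.
# "reachable_nodes" counts the watchdog nodes the local node can still
# reach, including itself; "total_nodes" is the configured cluster size.

def has_quorum(reachable_nodes: int, total_nodes: int) -> bool:
    """A strict majority of all configured watchdog nodes is reachable."""
    return 2 * reachable_nodes > total_nodes

def react_to_partition(is_master: bool, reachable_nodes: int,
                       total_nodes: int) -> str:
    """Decide how the local Pgpool-II node reacts to a network event."""
    if total_nodes > 1 and reachable_nodes == 1:
        # Lost *all* network connections: only then does the node shut
        # itself down (commit suicide).
        return "shutdown"
    if is_master and not has_quorum(reachable_nodes, total_nodes):
        # Quorum lost but some links remain: de-escalate the master
        # (e.g. release the virtual IP) instead of killing the node, so
        # a small glitch cannot take the whole cluster down.
        return "de-escalate"
    return "keep-running"
```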
<div class="HOEnZb"><div class="h5"><br>
> c) A client connects to Pgpool-C and issues a write statement. It will<br>
> land on the B2 PostgreSQL server, which was promoted to master in step b.<br>
><br>
> c-1) Another client connects to Pgpool-A and also issues a write<br>
> statement that will land on the B1 PostgreSQL server, as it is still the<br>
> master node in AZ1.<br>
><br>
> d) The link between AZ1 and AZ2 is restored, but now the PostgreSQL<br>
> nodes B1 and B2 hold different sets of data, with no easy way to merge<br>
> both sets of changes and restore the cluster to its original state.<br>
><br>
> The above scenario becomes even more complicated if both availability<br>
> zones AZ1 and AZ2 have multiple Pgpool-II nodes, since the logic for<br>
> retiring multiple Pgpool-II nodes becomes more complex when the link<br>
> between AZ1 and AZ2 is disrupted.<br>
><br>
> So the proposal tries to solve this by making sure that we always have<br>
> only one master PostgreSQL node in the cluster and never end up in a<br>
> situation where we have different sets of data on different PostgreSQL<br>
> nodes.<br>
><br>
><br>
><br>
>> > There is also a question ("[pgpool-general: 5179] Architecture Questions<br>
>> > <<a href="http://www.sraoss.jp/pipermail/pgpool-general/2016-December/005237.html" rel="noreferrer" target="_blank">http://www.sraoss.jp/<wbr>pipermail/pgpool-general/2016-<wbr>December/005237.html</a>>")<br>
>> > posted by a user on the pgpool-general mailing list who wants a similar<br>
>> > type of network that spans two AWS availability zones, and Pgpool-II has<br>
>> > no good answer for avoiding split-brain of the backend nodes if the<br>
>> > corporate link between the two zones suffers a glitch.<br>
>><br>
>> That seems a totally different story to me because there are two<br>
>> independent streaming replication primary servers in the east and west<br>
>> regions.<br>
>><br>
>><br>
> I think the original question statement was a little bit confusing. As I<br>
> understood the user's requirements later in the thread:<br>
> The user has a couple of PostgreSQL nodes in each of two availability<br>
> zones (four PG nodes in total), and all four nodes are part of a single<br>
> streaming replication setup.<br>
> Both zones have two Pgpool-II nodes each (four Pgpool-II nodes in total<br>
> in the cluster).<br>
> Each availability zone has one application server that connects to one of<br>
> the two Pgpool-II nodes in that availability zone. (That makes it a<br>
> master-master Pgpool-II setup.) And the user is concerned about<br>
> split-brain of the PostgreSQL servers when the corporate link between the<br>
> zones becomes unavailable.<br>
><br>
> Thanks<br>
> Best regards<br>
> Muhammad Usama<br>
><br>
><br>
><br>
>> Best regards,<br>
>> --<br>
>> Tatsuo Ishii<br>
>> SRA OSS, Inc. Japan<br>
>> English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_<wbr>en.php</a><br>
>> Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.<wbr>jp</a><br>
>><br>
>> > Thanks<br>
>> > Best regards<br>
>> > Muhammad Usama<br>
>> ><br>
>> ><br>
>> ><br>
>> >><br>
>> >> Best regards,<br>
>> >> --<br>
>> >> Tatsuo Ishii<br>
>> >> SRA OSS, Inc. Japan<br>
>> >> English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_<wbr>en.php</a><br>
>> >> Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.<wbr>jp</a><br>
>> >><br>
>> >> >> > Hi Hackers,<br>
>> >> >> ><br>
>> >> >> > This is a proposal to make the failover of backend PostgreSQL<br>
>> >> >> > nodes quorum-aware, making it more robust and fault tolerant.<br>
>> >> >> ><br>
>> >> >> > Currently, Pgpool-II proceeds to fail over the backend node as<br>
>> >> >> > soon as the health check detects the failure, or when an error<br>
>> >> >> > occurs on the backend connection (when fail_over_on_backend_error<br>
>> >> >> > is set). This is good enough for a standalone Pgpool-II server.<br>
>> >> >> ><br>
>> >> >> > But consider the scenario where we have more than one Pgpool-II<br>
>> >> >> > node (say Pgpool-A, Pgpool-B, and Pgpool-C) in the cluster,<br>
>> >> >> > connected through the watchdog, and each Pgpool-II node is<br>
>> >> >> > configured with two PostgreSQL backends (B1 and B2).<br>
>> >> >> ><br>
>> >> >> > Now if, due to some network glitch or other issue, Pgpool-A fails<br>
>> >> >> > or loses its network connection with backend B1, Pgpool-A will<br>
>> >> >> > detect the failure, detach (fail over) the B1 backend, and also<br>
>> >> >> > pass this information to the other Pgpool-II nodes (Pgpool-B and<br>
>> >> >> > Pgpool-C). Although backend B1 was perfectly healthy and still<br>
>> >> >> > reachable from the Pgpool-B and Pgpool-C nodes, it will get<br>
>> >> >> > detached from the cluster merely because of a network glitch<br>
>> >> >> > between Pgpool-A and backend B1. The worst part is, if B1 was the<br>
>> >> >> > master PostgreSQL node (in a master-standby configuration), the<br>
>> >> >> > Pgpool-II failover would also promote the B2 PostgreSQL node as<br>
>> >> >> > the new master, hence opening the way for split-brain and/or data<br>
>> >> >> > corruption.<br>
>> >> >> ><br>
>> >> >> > So my proposal is that, when the watchdog is configured in<br>
>> >> >> > Pgpool-II, the backend health check of Pgpool-II should consult<br>
>> >> >> > the other attached Pgpool-II nodes over the watchdog to decide<br>
>> >> >> > whether the backend node has actually failed or whether it is<br>
>> >> >> > just a localized glitch/false alarm. The failover of the node<br>
>> >> >> > should only be performed when the majority of cluster members<br>
>> >> >> > agrees on the failure of the node.<br>
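The majority check proposed above can be sketched as follows (an illustrative sketch only, not Pgpool-II code; the node names and voting function are hypothetical):

```python
# Hypothetical sketch of the proposed quorum-aware failover check: each
# Pgpool-II node votes on whether it can reach the backend, and failover
# proceeds only when a strict majority considers the backend failed.

def majority_agrees_on_failure(votes):
    """votes maps a Pgpool-II node name to True when that node sees the
    backend as failed; require a strict majority before failing over."""
    failed = sum(1 for considers_failed in votes.values() if considers_failed)
    return 2 * failed > len(votes)

# Only Pgpool-A lost its link to B1; Pgpool-B and Pgpool-C still reach it,
# so the failover (and any promotion of B2) is suppressed.
local_glitch = {"Pgpool-A": True, "Pgpool-B": False, "Pgpool-C": False}
real_failure = {"Pgpool-A": True, "Pgpool-B": True, "Pgpool-C": False}

assert majority_agrees_on_failure(local_glitch) is False
assert majority_agrees_on_failure(real_failure) is True
```

Note that with a strict majority, a tie (e.g. two of four nodes reporting failure) also suppresses the failover, which errs on the safe side.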
>> >> >> ><br>
>> >> >> > This quorum-aware failover architecture will prevent false<br>
>> >> >> > failovers and split-brain scenarios in the backend nodes.<br>
>> >> >> ><br>
>> >> >> > What are your thoughts and suggestions on this?<br>
>> >> >> ><br>
>> >> >> > Thanks<br>
>> >> >> > Best regards<br>
>> >> >> > Muhammad Usama<br>
>> >> >><br>
>> >><br>
>><br>
</div></div></blockquote></div><br></div></div>