<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jan 25, 2017 at 9:05 AM, Tatsuo Ishii <span dir="ltr">&lt;<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Usama,<br>

<span class=""><br>

&gt; This is correct. If Pgpool-II is used in master-standby mode (with an<br>

&gt; elastic or virtual IP, and clients connect to only one Pgpool-II<br>

&gt; server) then there are not many issues that could be caused by the<br>

&gt; interruption of link between AZ1 and AZ2 as you defined above.<br>

&gt;<br>

&gt; But the issue arises when the Pgpool-II is used in the master-master<br>

&gt; mode (clients connect to all available Pgpool-II) then consider the<br>

&gt; following scenario.<br>

&gt;<br>

&gt; a) Link between AZ1 and AZ2 broke, at that time B1 was master while B2 was<br>

&gt; standby.<br>

&gt;<br>

&gt; b) Pgpool-C in AZ2 promotes B2 to master since Pgpool-C is not able to<br>

&gt; connect to the old master (B1)<br>

<br>

</span>I thought Pgpool-C commits suicide because it cannot get quorum in this case, no?<br></blockquote><div><br></div><div>No, Pgpool-II commits suicide only when it loses all network connections. When only the quorum is lost, the master watchdog node is de-escalated instead.</div><div>Committing suicide every time the quorum is lost is very risky and not feasible, since it would shut down the whole cluster as soon as the quorum is lost, even because of a small glitch.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="HOEnZb"><div class="h5"><br>

&gt; c) A client connects to Pgpool-C and issues a write statement. It will land<br>

&gt; on the B2 PostgreSQL server, which was promoted as master in step b.<br>

&gt;<br>

&gt; c-1) Another client connects to Pgpool-A and also issues a write statement<br>

&gt; that will land on the B1 PostgreSQL server as it is the master node in AZ.<br>

&gt;<br>

&gt; d) The link between AZ1 and AZ2 is restored, but now the PostgreSQL B1 and<br>

&gt; B2 both have different sets of data and with no easy way to get both<br>

&gt; changes in one place and restore the cluster to original state.<br>

&gt;<br>

&gt; The above scenario will become more complicated if both availability zones<br>

&gt; AZ1 and AZ2 have multiple Pgpool-II nodes, since the logic for retiring<br>

&gt; multiple Pgpool-II nodes becomes more complex when the link between<br>

&gt; AZ1 and AZ2 is disrupted.<br>

&gt;<br>

&gt; So the proposal tries to solve this by making sure that we should always<br>

&gt; have only one master PostgreSQL node in the cluster and never end up in the<br>

&gt; situation where we have different sets of data in different PostgreSQL<br>

&gt; nodes.<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt;&gt; &gt; There is also a question (&quot;[pgpool-general: 5179] Architecture Questions<br>

&gt;&gt; &gt; &lt;<a href="http://www.sraoss.jp/pipermail/pgpool-general/2016-December/005237.html" rel="noreferrer" target="_blank">http://www.sraoss.jp/<wbr>pipermail/pgpool-general/2016-<wbr>December/005237.html</a><br>

&gt;&gt; &gt;&quot;)<br>

&gt;&gt; &gt; posted by a user in pgpool-general mailing list who wants a similar type<br>

&gt;&gt; of<br>

&gt;&gt; &gt; network that spans over two AWS availability zones and Pgpool-II has no<br>

&gt;&gt; &gt; good answer to avoid split-brain of backend nodes if the corporate link<br>

&gt;&gt; &gt; between two zones suffers a glitch.<br>

&gt;&gt;<br>

&gt;&gt; That seems a totally different story to me because there are two independent<br>

&gt;&gt; streaming replication primary servers in the east and west regions.<br>

&gt;&gt;<br>

&gt;&gt;<br>

&gt; I think the original question statement was a little bit confusing. What I<br>

&gt; understood of the user requirements later in the thread is the following.<br>

&gt; The user has a couple of PostgreSQL nodes in two availability zones (total<br>

&gt; 4 PG nodes) and all four nodes are part of the single streaming replication<br>

&gt; setup.<br>

&gt; Both zones have two Pgpool-II nodes each. (Total 4 Pgpool-II nodes in the<br>

&gt; cluster).<br>

&gt; Each availability zone has one application server that connects to one of<br>

&gt; two Pgpool-II in that availability zone. (That makes it master-master<br>

&gt; Pgpool setup). And the user is concerned about split-brain of PostgreSQL<br>

&gt; servers when the corporate link between zones becomes unavailable.<br>

&gt;<br>

&gt; Thanks<br>

&gt; Best regards<br>

&gt; Muhammad Usama<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt;&gt; Best regards,<br>

&gt;&gt; --<br>

&gt;&gt; Tatsuo Ishii<br>

&gt;&gt; SRA OSS, Inc. Japan<br>

&gt;&gt; English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_<wbr>en.php</a><br>

&gt;&gt; Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.<wbr>jp</a><br>

&gt;&gt;<br>

&gt;&gt; &gt; Thanks<br>

&gt;&gt; &gt; Best regards<br>

&gt;&gt; &gt; Muhammad Usama<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; Best regards,<br>

&gt;&gt; &gt;&gt; --<br>

&gt;&gt; &gt;&gt; Tatsuo Ishii<br>

&gt;&gt; &gt;&gt; SRA OSS, Inc. Japan<br>

&gt;&gt; &gt;&gt; English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_<wbr>en.php</a><br>

&gt;&gt; &gt;&gt; Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.<wbr>jp</a><br>

&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; Hi Hackers,<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; This is the proposal to make the failover of backend PostgreSQL<br>

&gt;&gt; nodes<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; quorum aware to make it more robust and fault tolerant.<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; Currently Pgpool-II proceeds to failover the backend node as soon<br>

&gt;&gt; as<br>

&gt;&gt; &gt;&gt; the<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; health check detects the failure or in case of an error occurred on<br>

&gt;&gt; &gt;&gt; the<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; backend connection (when fail_over_on_backend_error is set). This<br>

&gt;&gt; is<br>

&gt;&gt; &gt;&gt; good<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; enough for the standalone Pgpool-II server.<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; But consider the scenario where we have more than one Pgpool-II<br>

&gt;&gt; (Say<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; Pgpool-A, Pgpool-B and Pgpool-C) in the cluster connected through<br>

&gt;&gt; &gt;&gt; &gt;&gt; watchdog<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; and each Pgpool-II node is configured with two PostgreSQL backends<br>

&gt;&gt; &gt;&gt; (B1<br>

&gt;&gt; &gt;&gt; &gt;&gt; and<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; B2).<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; Now if due to some network glitch or an issue, Pgpool-A fails or<br>

&gt;&gt; loses<br>

&gt;&gt; &gt;&gt; &gt;&gt; its<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; network connection with backend B1, The Pgpool-A will detect the<br>

&gt;&gt; &gt;&gt; failure<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; and detach (failover) the B1 backend and also pass this information<br>

&gt;&gt; &gt;&gt; to<br>

&gt;&gt; &gt;&gt; &gt;&gt; the<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; other Pgpool-II nodes (Pgpool-II B and Pgpool-II C), Although the<br>

&gt;&gt; &gt;&gt; Backend<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; B1 was perfectly healthy and it was also reachable from Pgpool-B<br>

&gt;&gt; and<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; Pgpool-C nodes, But still because of a network glitch between<br>

&gt;&gt; &gt;&gt; Pgpool-A<br>

&gt;&gt; &gt;&gt; &gt;&gt; and<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; Backend B1, it will get detached from the cluster and the worst<br>

&gt;&gt; part<br>

&gt;&gt; &gt;&gt; is,<br>

&gt;&gt; &gt;&gt; &gt;&gt; if<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; the B1 was a master PostgreSQL (in master-standby configuration),<br>

&gt;&gt; the<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; Pgpool-II failover would also promote the B2 PostgreSQL node as a<br>

&gt;&gt; new<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; master, hence making the way for split-brain and/or data<br>

&gt;&gt; corruptions.<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; So my proposal is that when the Watchdog is configured in Pgpool-II<br>

&gt;&gt; &gt;&gt; the<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; backend health check of Pgpool-II should consult with other<br>

&gt;&gt; attached<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; Pgpool-II nodes over the watchdog to decide if the Backend node is<br>

&gt;&gt; &gt;&gt; &gt;&gt; actually<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; failed or if it is just a localized glitch/false alarm. And the<br>

&gt;&gt; &gt;&gt; failover<br>

&gt;&gt; &gt;&gt; &gt;&gt; on<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; the node should only be performed, when the majority of cluster<br>

&gt;&gt; &gt;&gt; members<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; agrees on the failure of nodes.<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; This quorum aware architecture of failover will prevent the false<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; failovers and split-brain scenarios in the Backend nodes.<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; What are your thoughts and suggestions on this?<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; Thanks<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; Best regards<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt; Muhammad Usama<br>

&gt;&gt; &gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt;<br>

&gt;&gt;<br>

</div></div></blockquote></div><br></div></div>
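<div>To make the two rules discussed in this thread concrete, here is a minimal sketch in Python. It is not Pgpool-II source code: the names <code>WatchdogAction</code>, <code>has_quorum</code>, <code>watchdog_action</code> and <code>should_failover</code> are hypothetical, and the strict-majority rule is an assumption about how the quorum would be counted. The first function models the reply above (shut down only when all network connections are lost, de-escalate the master when only the quorum is lost); the second models the proposed majority vote before a backend failover.</div>

```python
from enum import Enum


class WatchdogAction(Enum):
    """Hypothetical outcomes for a watchdog node; not Pgpool-II identifiers."""
    SHUTDOWN = "shutdown"
    DE_ESCALATE = "de-escalate"
    NO_CHANGE = "no-change"


def has_quorum(reachable_nodes: int, total_nodes: int) -> bool:
    """Assumed rule: a strict majority of watchdog nodes, counting the local one."""
    return reachable_nodes * 2 > total_nodes


def watchdog_action(network_up: bool, is_master: bool,
                    reachable_nodes: int, total_nodes: int) -> WatchdogAction:
    # Losing every network connection is the only case that ends in shutdown.
    if not network_up:
        return WatchdogAction.SHUTDOWN
    # Losing only the quorum demotes the master watchdog but keeps the node alive.
    if is_master and not has_quorum(reachable_nodes, total_nodes):
        return WatchdogAction.DE_ESCALATE
    return WatchdogAction.NO_CHANGE


def should_failover(failure_votes: int, total_nodes: int) -> bool:
    """Proposed rule: detach a backend only if a majority of Pgpool-II nodes see it as failed."""
    return failure_votes * 2 > total_nodes


# Pgpool-A alone loses its link to B1 in a three-node cluster: no failover.
print(should_failover(failure_votes=1, total_nodes=3))
# A master watchdog that still has a network but sees only itself: de-escalate.
print(watchdog_action(network_up=True, is_master=True, reachable_nodes=1, total_nodes=3))
```

<div>With an even number of Pgpool-II nodes split evenly across two zones, neither side reaches a strict majority under this assumed rule, so neither side would promote a standby on its own.</div>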