<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jan 16, 2017 at 12:10 PM, Tatsuo Ishii <span dir="ltr"><<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Usama,<br>
<br>
If my understanding is correct, by using the quorum, Pgpool-B and<br>
Pgpool-C decide that B1 is healthy. What happens when Pgpool-A tries<br>
to connect to B1 if the network failure between Pgpool-A and B1<br>
continues? I guess clients connecting to Pgpool-A get an error and fail to<br>
connect to the database?<br></blockquote><div><br></div><div>Yes, that is correct. I think what we can do in this scenario is the following: if Pgpool-A is not allowed to fail over B1 because the other nodes in the cluster (Pgpool-B and Pgpool-C) do not agree that B1 has failed, then Pgpool-A will return an error to its clients if B1 is the master/primary backend server. Otherwise, if B1 is a standby server, Pgpool-A will continue serving clients without using the unreachable PostgreSQL server B1.</div><div><br></div><div>Thanks</div><div>Best regards</div><div>Muhammad Usama</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
--<br>
Tatsuo Ishii<br>
SRA OSS, Inc. Japan<br>
English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_<wbr>en.php</a><br>
Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.<wbr>jp</a><br>
<div class="HOEnZb"><div class="h5"><br>
> Hi Hackers,<br>
><br>
> This is a proposal to make failover of backend PostgreSQL nodes quorum<br>
> aware, so that it is more robust and fault tolerant.<br>
><br>
> Currently, Pgpool-II proceeds to fail over a backend node as soon as the<br>
> health check detects a failure, or when an error occurs on the backend<br>
> connection (when fail_over_on_backend_error is set). This is good enough<br>
> for a standalone Pgpool-II server.<br>
><br>
> But consider the scenario where we have more than one Pgpool-II (say<br>
> Pgpool-A, Pgpool-B and Pgpool-C) in the cluster connected through watchdog<br>
> and each Pgpool-II node is configured with two PostgreSQL backends (B1 and<br>
> B2).<br>
><br>
> Now suppose that, due to a network glitch or some other issue, Pgpool-A<br>
> fails to reach backend B1 or loses its network connection to it. Pgpool-A<br>
> will detect the failure, detach (fail over) the B1 backend, and pass this<br>
> information to the other Pgpool-II nodes (Pgpool-B and Pgpool-C). Backend<br>
> B1 is perfectly healthy and still reachable from Pgpool-B and Pgpool-C,<br>
> yet because of a network glitch between Pgpool-A and B1 it gets detached<br>
> from the cluster. The worst part is that if B1 was the master PostgreSQL<br>
> node (in a master-standby configuration), the Pgpool-II failover would<br>
> also promote the B2 PostgreSQL node as the new master, hence opening the<br>
> way for split-brain and/or data corruption.<br>
><br>
> So my proposal is that, when the watchdog is configured in Pgpool-II, the<br>
> backend health check should consult the other attached Pgpool-II nodes<br>
> over the watchdog to decide whether the backend node has actually failed<br>
> or whether it is just a localized glitch/false alarm. Failover of the<br>
> node should be performed only when the majority of cluster members agree<br>
> that the node has failed.<br>
><br>
> This quorum-aware failover architecture will prevent false failovers and<br>
> split-brain scenarios in the backend nodes.<br>
><br>
> What are your thoughts and suggestions on this?<br>
><br>
> Thanks<br>
> Best regards<br>
> Muhammad Usama<br>
</div></div></blockquote></div><br></div></div>
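<div>The decision rule discussed in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions, not Pgpool-II code: the names <code>Vote</code>, <code>Action</code>, <code>has_quorum</code> and <code>decide</code> are hypothetical, a "majority" is assumed to mean strictly more than half of all watchdog members, and the exchange of votes over the watchdog is left out entirely.</div>

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    FAILOVER = "failover"              # quorum agrees: detach the backend
    REJECT_CLIENTS = "reject_clients"  # no quorum, unreachable backend is the primary
    SERVE_WITHOUT = "serve_without"    # no quorum, unreachable backend is a standby


@dataclass
class Vote:
    pgpool: str           # name of the voting Pgpool-II node (hypothetical field)
    backend_failed: bool  # True if this node cannot reach the backend


def has_quorum(votes, cluster_size):
    """Return True when a strict majority of the cluster reports a failure."""
    failed = sum(1 for v in votes if v.backend_failed)
    return failed * 2 > cluster_size


def decide(votes, cluster_size, backend_is_primary):
    """Decide what a node that lost its link to the backend should do."""
    if has_quorum(votes, cluster_size):
        return Action.FAILOVER
    # The rest of the cluster still sees the backend as healthy, so no
    # failover happens; behaviour depends on the role of the backend.
    if backend_is_primary:
        return Action.REJECT_CLIENTS
    return Action.SERVE_WITHOUT


if __name__ == "__main__":
    # Scenario from the thread: only Pgpool-A has lost B1.
    votes = [
        Vote("Pgpool-A", True),
        Vote("Pgpool-B", False),
        Vote("Pgpool-C", False),
    ]
    print(decide(votes, 3, backend_is_primary=True).value)   # reject_clients
    print(decide(votes, 3, backend_is_primary=False).value)  # serve_without
```

<div>With three Pgpool-II nodes, a single dissenting node never triggers failover in this sketch; at least two of the three must report B1 as failed before it is detached.</div>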