<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Apr 16, 2019 at 2:03 PM Tatsuo Ishii &lt;<a href="mailto:ishii@sraoss.co.jp">ishii@sraoss.co.jp</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">&gt;&gt; Thanks. However this will change existing behavior. Probably we should<br>

&gt;&gt; make the change against master branch only?<br>

&gt;&gt;<br>

&gt; <br>

&gt; Probably yes, because the fix I have in mind for this involves a<br>

&gt; configurable timeout parameter<br>

&gt; to make the master pgpool resign. Let me come up with the patch, and then we<br>

&gt; can work out which part of it<br>

&gt; needs to be back-ported.<br>

&gt; Regarding the patch I shared upthread to continue the health check on<br>

&gt; quarantined nodes, do you think we should<br>

&gt; also back-patch it to older versions?<br>

<br>

I am not sure we should back-port the two patches, since they will<br>

change existing behavior (and one of those behaviors is even documented).<br>

<br>

What do you think?<br></blockquote><div><br></div><div>Totally agreed. I will go ahead and make the change for the master branch only.</div><div>Many thanks for the valuable input.</div><div><br></div><div>Best regards</div><div>Muhammad Usama</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>


&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt; Can you please try out the attached patch to see if the solution works<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt; for the situation?<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt; The patch is generated against current master branch.<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt; Thanks<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt; Best Regards<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt; Muhammad Usama<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt; On Wed, Apr 10, 2019 at 2:04 PM TAKATSUKA Haruka &lt;<a href="mailto:harukat@sraoss.co.jp" target="_blank">harukat@sraoss.co.jp</a>&gt; wrote:<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; Hello, Pgpool developers<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; I found that the Pgpool-II watchdog is too strict about duplicate failover<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; requests when allow_multiple_failover_requests_from_node=off is set.<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; For example, consider a watchdog cluster with 3 pgpool instances.<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; Their backends are PostgreSQL servers using streaming replication.<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; When communication between the master/coordinator pgpool and the<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; primary PostgreSQL node is down for a short period<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; (or pgpool makes a false-positive judgement for any other reason),<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; the pgpool tries to fail over but cannot get consensus,<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; so it puts the primary node into quarantine status. The quarantine cannot<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; be reset automatically. As a result, the service becomes unavailable.<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; This case generates logs like the following:<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; pid 1234: LOG:  new IPC connection received<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; pid 1234: LOG:  watchdog received the failover command from local pgpool-II on IPC interface<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; pid 1234: LOG:  watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; pid 1234: LOG:  Duplicate failover request from &quot;pg1:5432 Linux pg1&quot; node<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; pid 1234: DETAIL:  request ignored<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; pid 1234: LOG:  failover requires the majority vote, waiting for consensus<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; pid 1234: DETAIL:  failover request noted<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; pid 4321: LOG:  degenerate backend request for 1 node(s) from pid [4321], is changed to quarantine node request by watchdog<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; pid 4321: DETAIL:  watchdog is taking time to build consensus<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; Note that this case doesn&#39;t involve any communication trouble among<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; the Pgpool watchdog nodes.<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; You can reproduce it by changing one PostgreSQL&#39;s pg_hba.conf to<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; reject health check access from one pgpool node for a short period.<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; The documentation doesn&#39;t say that duplicate failover requests put the node<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; into quarantine immediately. I think the watchdog should just ignore the request.<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; A patch file for the head of V3_7_STABLE is attached.<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; With this patch, Pgpool still blocks a failover that is requested repeatedly<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; by only a single pgpool, but it can recover once the connection trouble is gone.<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; Does this change cause any problems?<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; with best regards,<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; TAKATSUKA Haruka &lt;<a href="mailto:harukat@sraoss.co.jp" target="_blank">harukat@sraoss.co.jp</a>&gt;<br>


&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt;<br>

&gt;&gt;<br>

</blockquote></div></div>