[pgpool-general: 6083] Re: failover_require_consensus does not work.

Muhammad Usama m.usama at gmail.com
Mon May 14 03:34:39 JST 2018


Hi Ishii-San

I have tried to rephrase your suggestion for clarity. Please have a look at
the attached patch and see if it fits.


Thanks
Best Regards
Muhammad Usama


On Fri, May 11, 2018 at 1:30 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Ok, here is a proposal for addition to the doc.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> > Usama,
> >
> > Do we want to add some notes to the doc regarding this? The behavior
> > described below may not be obvious to users.
> >
> > Best regards,
> > --
> > Tatsuo Ishii
> > SRA OSS, Inc. Japan
> > English: http://www.sraoss.co.jp/index_en.php
> > Japanese:http://www.sraoss.co.jp
> >
> >> Hi
> >>
> >> Thanks for the logs and config files.
> >> As per the logs and pgpool.conf files, this is what is happening.
> >>
> >> You have health check disabled on all Pgpool-II nodes, so the only way to
> >> detect a backend failure is through fail_over_on_backend_error (which
> >> only works when a client connection hits the error). Since the clients
> >> connect only to the master Pgpool-II node, only the master Pgpool-II node
> >> can notice the backend PostgreSQL node failure. Because of the consensus
> >> requirement, it keeps waiting for the other Pgpool-II nodes to report the
> >> same failure, which never happens because the other two Pgpool-II nodes
> >> are sitting idle and never detected the error.
> >>
> >> So you either need to enable health check on all Pgpool-II nodes
> >> (which is the recommended setting for HA) or disable the consensus
> >> requirement (as you did when failover was working fine).
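> >>
> >> As a rough sketch of the first option, something like the following in
> >> pgpool.conf on each of the three Pgpool-II nodes would turn health check
> >> on. The values and the health_check_user name are only illustrative
> >> assumptions, not taken from your config files:
> >>
> >>     # assumed values; 0 (the default) leaves health check disabled
> >>     health_check_period = 10
> >>     health_check_timeout = 20
> >>     # hypothetical PostgreSQL user; use one that exists on your backends
> >>     health_check_user = 'pgpool'
> >>     health_check_max_retries = 3
> >>     health_check_retry_delay = 1
> >>     # consensus can then stay at its default
> >>     failover_require_consensus = on
> >>
> >> With that, every node probes the backends by itself and can cast its own
> >> vote instead of waiting for a client connection to hit the error.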
> >>
> >> Thanks
> >> Best Regards
> >> Muhammad Usama
> >>
> >> On Tue, May 8, 2018 at 7:54 PM, Vlad G <omenvlad at gmail.com> wrote:
> >>
> >>> Hey Guys.
> >>> Thank you for your answer.
> >>> I attached the configuration files of pgpool and logs.
> >>> I hope you can help.
> >>>
> >>> Best regards,
> >>> Vladyslav
> >>>
> >>>
> >>> On May 7, 2018, at 16:05, Muhammad Usama <m.usama at gmail.com> wrote:
> >>>
> >>> Hi
> >>>
> >>> From the log snippet you shared, it seems that the failure was never
> >>> detected by the other Pgpool-II nodes. Can you please share the
> >>> pgpool.conf files and log files for all Pgpool-II nodes?
> >>>
> >>> Thanks
> >>> Best Regards
> >>> Muhammad Usama
> >>>
> >>> On Thu, May 3, 2018 at 5:20 PM, Vlad G <omenvlad at gmail.com> wrote:
> >>>
> >>>> Hey Guys,
> >>>> I have a cluster with Pgpool-II-pg96-3.7.3 and postgresql-9.6
> >>>> (3 x pgpool and 3 x postgresql).
> >>>> It follows the same scheme as:
> >>>> http://www.pgpool.net/docs/latest/en/html/example-cluster.html
> >>>>
> >>>> When the master node of postgresql (pgpoolpsql-1) goes down, the master
> >>>> node of pgpool (pgpool-1) does not get a second vote from either of the
> >>>> standby pgpool nodes (pgpool-2 and pgpool-3).
> >>>>
> >>>> If I set:
> >>>> failover_require_consensus = off
> >>>> everything works fine.
> >>>>
> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237: LOG:  failed to connect to PostgreSQL server on "pgpoolpsql-1:5432", getsockopt() detected error "Connection refused"
> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237: LOG:  received degenerate backend request for node_id: 0 from pid [24237]
> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217: LOG:  new IPC connection received
> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217: LOG:  watchdog received the failover command from local pgpool-II on IPC interface
> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217: LOG:  watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217: LOG:  failover requires the majority vote, waiting for consensus
> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217: DETAIL:  failover request noted
> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217: LOG:  failover command [DEGENERATE_BACKEND_REQUEST] request from pgpool-II node "pgpool-1:9999 Linux pgpool-1" is queued, waiting for the confirmation from other nodes
> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237: LOG:  degenerate backend request for node_id: 0 from pid [24237], will be handled by watchdog, which is building consensus for request
> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237: FATAL:  failed to create a backend connection
> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237: DETAIL:  executing failover on backend
> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24216: LOG:  child process with pid: 24237 exits with status 256
> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24216: LOG:  fork a new child process with pid: 24268
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228: LOG:  failed to connect to PostgreSQL server on "pgpoolpsql-1:5432", getsockopt() detected error "Connection refused"
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228: LOG:  received degenerate backend request for node_id: 0 from pid [24228]
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG:  new IPC connection received
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG:  watchdog received the failover command from local pgpool-II on IPC interface
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG:  watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG:  Duplicate failover request from "pgpool-1:9999 Linux pgpool-1" node
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: DETAIL:  request ignored
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG:  failover requires the majority vote, waiting for consensus
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: DETAIL:  failover request noted
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228: LOG:  degenerate backend request for 1 node(s) from pid [24228], is changed to quarantine node request by watchdog
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228: DETAIL:  watchdog is taking time to build consensus
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228: FATAL:  failed to create a backend connection
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228: DETAIL:  executing failover on backend
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216: LOG:  Pgpool-II parent process has received failover request
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG:  new IPC connection received
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG:  received the failover indication from Pgpool-II on IPC interface
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG:  watchdog is informed of failover end by the main process
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216: LOG:  starting quarantine. shutdown host pgpoolpsql-1(5432)
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216: LOG:  Restart all children
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216: LOG:  failover: set new primary node: -1
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216: LOG:  failover: set new master node: 1
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24252: LOG:  worker process received restart request
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG:  new IPC connection received
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG:  received the failover indication from Pgpool-II on IPC interface
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG:  watchdog is informed of failover start by the main process
> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: quarantine done. shutdown host pgpoolpsql-1(5432)2018-05-03 13:02:46: pid 24216: LOG:  quarantine done. shutdown host pgpoolpsql-1(5432)
> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24251: LOG:  restart request received in pcp child process
> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216: LOG:  PCP child 24251 exits with status 0 in failover()
> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216: LOG:  fork a new PCP child pid 24301 in failover()
> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216: LOG:  child process with pid: 24219 exits with status 0
> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216: LOG:  child process with pid: 24219 exited with success and will not be restarted
> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216: LOG:  child process with pid: 24220 exits with status 0
> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216: LOG:  child process with pid: 24220 exited with success and will not be restarted
> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216: LOG:  child process with pid: 24221 exits with status 0
> >>>>
> >>>> Around a month ago it worked fine (it seems I tested it on
> >>>> pgpool-3.7.2), but now it does not work. Could you tell me which
> >>>> parameters this depends on, or do you have other thoughts?
> >>>>
> >>>> Best regards,
> >>>> Vladyslav
> >>>>
> >>>> _______________________________________________
> >>>> pgpool-general mailing list
> >>>> pgpool-general at pgpool.net
> >>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
> >>>>
> >>>
> >>>
> >>>
>
> diff --git a/doc/src/sgml/watchdog.sgml b/doc/src/sgml/watchdog.sgml
> index 7e7adc9..041686b 100644
> --- a/doc/src/sgml/watchdog.sgml
> +++ b/doc/src/sgml/watchdog.sgml
> @@ -442,6 +442,16 @@
>          <para>
>            Default is on.
>          </para>
> +
> +       <caution>
> +         <para>
> +           To make <varname>failover_require_consensus</varname>
> +           workable, You need to enable health check. For more
> +           details of health check,
> +           see <xref linkend="runtime-config-health-check">.
> +         </para>
> +       </caution>
> +
>          <para>
>          <varname>failover_require_consensus</varname> is not available prior to
>          <productname>Pgpool-II </productname><emphasis>V3.7</emphasis>. and it is only
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: watchdog_2.diff
Type: application/octet-stream
Size: 995 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20180513/2a38943b/attachment-0001.obj>

