[pgpool-general: 5871] Re: Troubleshooting assistance on node failure and connections blocking
Tatsuo Ishii
ishii at sraoss.co.jp
Thu Jan 18 14:47:09 JST 2018
> We have 2 nodes running. The stack on each node is:
>
> Our app
> HikariCP
> JDBC
> Pgpool-II 3.6.7
> PostgreSQL 9.6.6
>
> Pgpool is set for Master-Slave with streaming replication; no load
> balancing.
>
> We are testing our disaster recovery and failover capabilities. If we
> gracefully shut down node 1 (the 2nd node), node 0 (the 1st node) proceeds
> as if nothing happened. The app continues to run without missing a beat,
> as you would expect.
>
> Our problem is when we encounter a “hard” error. If node 1 becomes
> disconnected (network is removed), node 0 becomes impacted. The app will
> freeze up as it can no longer get database connections. We see the
> app/Spring talk to Hikari, Hikari talks to JDBC, JDBC cannot get a
> connection, and eventually Hikari times out (with a 30 sec connection wait)
> and replies to the app, and we get exceptions. This repeats as the app
> continues to try to talk to the database. Pgpool is aware that node 1 is
> gone, as it is in recovery mode, and node 0 pgpool retries to establish
> connectivity to pgpool on node 1 per the pgpool.conf intervals.
>
> So the thing that really has us stumped is: if node 0 is only talking
> through its stack to node 0 postgres, why is this failure on node 1 having
> any impact on node 0 and freezing the db connections? Obviously, when a
> graceful shutdown occurs, pgpool handles this gracefully and things work as
> you expect. With a hard failure, it does not. I have attached our
> pgpool.conf file. Can someone provide some guidance into the internals of
> pgpool and why this node 1 hard failure impacts node 0?
Pgpool-II connects to all PostgreSQL backends even if load_balance_mode = off.
There have been ongoing discussions about making Pgpool-II connect to only
one backend, but this is not implemented yet.
If you want to shorten the "black period" (that is, the time during which
Pgpool-II is working on failover), you can adjust the health check
parameters and the failover-related parameter.
Changing fail_over_on_backend_error from off to on will cause an immediate
failover if there's a problem connecting to a backend or reading from or
writing to its socket.
With health_check_period = 40, it may take up to 40 seconds before Pgpool-II
notices the error, so you might want to shorten this.
With health_check_timeout = 10, it may take up to 10 seconds before Pgpool-II
notices the error, so you might want to shorten this as well.
With health_check_max_retries = 3, Pgpool-II may keep retrying before it
gives up, for up to
health_check_timeout * health_check_max_retries = 30 seconds.
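To get a feeling for how these three settings add up, here is a rough
back-of-the-envelope sketch in Python. It is only an estimate, not how
Pgpool-II computes anything internally. The function name is made up for
illustration, and the retry_delay argument stands in for
health_check_retry_delay, whose value in your pgpool.conf I am assuming to
be 0 here.

```python
def worst_case_detection_seconds(period, timeout, max_retries, retry_delay=0):
    """Rough upper bound on the time before a dead backend is given up on.

    period      -- health_check_period (seconds between health checks)
    timeout     -- health_check_timeout (seconds one check may block)
    max_retries -- health_check_max_retries
    retry_delay -- health_check_retry_delay (assumed 0 in this sketch)
    """
    # The node may die right after a successful check, so we can wait
    # almost a full period before the next check even starts.
    wait_for_next_check = period
    # The first failing check can block for the full timeout.
    first_attempt = timeout
    # Each retry can block for the timeout again, plus the delay before it.
    retries = max_retries * (timeout + retry_delay)
    return wait_for_next_check + first_attempt + retries


# Values quoted in this thread: 40 + 10 + 3 * 10
print(worst_case_detection_seconds(40, 10, 3))  # 80
```

With the values above this comes to about 80 seconds, which is well beyond
the 30 second connection wait in HikariCP, so shortening any of the three
should shrink the window in which the app sees exceptions.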
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp