[pgpool-general: 764] Re: pgpool dropping backends too much

Lonni J Friedman netllama at gmail.com
Thu Jul 19 12:17:10 JST 2012


It's not just you.  I've been seeing this behavior sporadically for a
while now.  I've got 5 slaves, and every so often pgpool freaks out
over one of them and drops it, even though streaming replication keeps
working without a problem the whole time.
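
For anyone else hitting this, putting the node back by hand looks roughly
like this (just a sketch -- the pcp port 9898, user, password and node id
are illustrative and need adjusting for your setup; this is the old
positional pcp syntax):

    # ask pgpool what it thinks of node 1 (status 3 means "down")
    pcp_node_info 10 localhost 9898 pcpuser pcppass 1

    # re-attach the node without restarting pgpool
    pcp_attach_node 10 localhost 9898 pcpuser pcppass 1

If your pgpool is new enough, "SHOW pool_nodes" in a psql session connected
to pgpool reports the same per-node status.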

On Wed, Jul 18, 2012 at 8:11 PM, Karl von Randow
<karl+pgpool at cactuslab.com> wrote:
> We are running pgpool with 3 backend servers (9.0, streaming replication).
> Connections are non-SSL between clients and pgpool, and SSL between pgpool
> and the servers (my previous email was in error; we do not appear to be
> using SSL between clients and pgpool).
>
> I have set the primary server to not support fail-over, and that works: it
> doesn't fail over. However, our slaves fail over once or twice a day, when
> the slave has not in fact failed. I have to reattach the node, and it then
> continues happily.
>
> The syslog always contains this note about E packets:
>
> Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 31871:
> pool_process_query: discard E packet from backend 1
> Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 ERROR: pid 31871: pool_ssl:
> SSL_read: no SSL error reported
> Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 ERROR: pid 31871:
> pool_read: read failed (Success)
> Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 31871:
> degenerate_backend_set: 1 fail over request from pid 31871
> Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 30346: starting
> degeneration. shutdown host db2(5432)
> Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 30346: Restart
> all children
> Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 30346:
> find_primary_node_repeatedly: waiting for finding a primary node
> Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 30346:
> find_primary_node: primary node id is 0
> Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 30346: failover:
> set new primary node: 0
> Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 30346: failover:
> set new master node: 0
> Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 923: worker
> process received restart request
> Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 30346: failover
> done. shutdown host db2(5432)
> Jul 17 00:11:04 app2 pgpool: 2012-07-17 00:11:04 LOG:   pid 30346: worker
> child 923 exits with status 256
> Jul 17 00:11:04 app2 pgpool: 2012-07-17 00:11:04 LOG:   pid 924: pcp child
> process received restart request
> Jul 17 00:11:04 app2 pgpool: 2012-07-17 00:11:04 LOG:   pid 30346: fork a
> new worker child pid 9434
> Jul 17 00:11:04 app2 pgpool: 2012-07-17 00:11:04 LOG:   pid 30346: PCP child
> 924 exits with status 256
> Jul 17 00:11:04 app2 pgpool: 2012-07-17 00:11:04 LOG:   pid 30346: fork a
> new PCP child pid 9435
>
> Sometimes the preceding syslog entry is a LOG notice about a statement that
> failed, e.g.:
> Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 26664:
> pool_send_and_wait: Error or notice message from backend: : DB node id: 1
> backend pid: 15682 statement: <SNIP> message: canceling statement due to
> conflict with recovery
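
That "canceling statement due to conflict with recovery" is an ordinary
hot-standby ERROR from the 9.0 slave, not a crash, and it's presumably
what turns into the discarded E packet / failed read that pgpool then
treats as a dead backend.  If you want fewer of those cancellations in
the first place, the usual 9.0-era knobs are (values here are only
illustrative, not recommendations) on the standby's postgresql.conf:

    # how long replay will wait for a conflicting query to finish
    # before cancelling it
    max_standby_streaming_delay = 30s
    max_standby_archive_delay = 30s

and on the primary:

    # defer vacuum cleanup so standby queries keep the rows they need
    vacuum_defer_cleanup_age = 10000

(hot_standby_feedback only arrives in 9.1, so it isn't an option on 9.0.)
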
>
> I don't want to mark our slaves as no-failover, but it seems that pgpool is
> either hitting a fault of its own and interpreting it as a backend failure,
> or is just being a bit too sensitive. I'm happy to test patches!
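
For reference, the "don't fail over the primary" arrangement you describe
would look something like this in pgpool.conf (a sketch, with host names
guessed from your log; backend_flag needs pgpool-II 3.0 or later):

    backend_hostname0 = 'db1'
    backend_port0     = 5432
    backend_flag0     = 'DISALLOW_TO_FAILOVER'  # never degenerate the primary

    backend_hostname1 = 'db2'
    backend_port1     = 5432
    backend_flag1     = 'ALLOW_TO_FAILOVER'     # slaves may still be dropped

    ssl = on   # as far as I know this enables SSL on the backend side too,
               # while non-SSL clients keep working

Setting the slaves to DISALLOW_TO_FAILOVER as well would stop the spurious
drops, but then pgpool would keep routing queries to a slave that really
had died, so it's a workaround rather than a fix.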


More information about the pgpool-general mailing list