[pgpool-general: 5902] Pgpool-3.7.1 out of memory

Philip Champon philip at adaptly.com
Tue Jan 30 05:04:36 JST 2018


Hello,

I've got a single pgpool server load balancing, in streaming replication
mode, across 1 RDS primary and 1 RDS replica. In general, things work pretty
well. But each night, at the same time, pgpool's network IO spikes because
of some jobs we kick off, and pgpool fails to handle the load gracefully.

As the number of connections and queries increases, we see more and more
child processes receiving SIGKILL. Within a minute, all of them have been
killed. Then we lose the connection to our backends. New children appear to
be spawned, and the cycle repeats. Then, suddenly, the parent process can
no longer fork because pgpool cannot allocate memory. After the memory
error, no children are forked (I waited 6 hours and it never recovered),
and any connection I make to pgpool hangs indefinitely. Once pgpool
reports the memory error, nstat shows TcpExtListenDrops increasing
steadily.

Has anyone else run into issues like this? I tried moving to a larger
instance (from 8 GB of RAM to 15 GB). It got through the 8pm job deluge,
with pgpool climbing to 11.2 GB of RAM. But just now, I ran into the issue
again... Any insight would be appreciated.
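
For a rough sense of scale, the 11.2 GB peak can be divided across the configured children. This is only a back-of-envelope sketch: it assumes all 450 children (num_init_children) are alive and share the memory evenly, ignores shared memory and the parent process, and treats the figure as GiB. The variable names are illustrative.

```python
# Back-of-envelope: average memory per pgpool child at the observed peak.
# Assumptions (not measured): all children alive, memory split evenly,
# and the reported 11.2 GB interpreted as GiB.
peak_gib = 11.2            # peak pgpool memory reported in this post
num_init_children = 450    # from the pgpool.conf in this post

per_child_mib = peak_gib * 1024 / num_init_children
print(f"~{per_child_mib:.1f} MiB per child")
```

Under those assumptions this comes to roughly 25 MiB per child.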

Thanks!

2018-01-28 00:14:23: pid 8874: LOG:  child process with pid: 13059 exits with status 9 by signal 9
...
2018-01-28 00:15:13: pid 2408: WARNING:  failed to connect to PostgreSQL server, getaddrinfo() failed with error "System error"
...
2018-01-28 00:23:33: pid 8874: LOG:  child process with pid: 9155 exits with status 9 by signal 9
...
2018-01-28 00:23:34: pid 3034: WARNING:  failed to connect to PostgreSQL server, getaddrinfo() failed with error "System error"
...

2018-01-28 00:23:48: pid 8874: FATAL:  failed to fork a child
2018-01-28 00:23:48: pid 8874: DETAIL:  system call fork() failed with reason: Cannot allocate memory
2018-01-28 00:23:49: pid 9105: LOG:  pool_ssl: "SSL_write": "bad write retry"
2018-01-28 00:23:49: pid 9105: LOCATION:  pool_ssl.c:314
2018-01-28 00:23:49: pid 9105: WARNING:  write on backend 0 failed with error :"Success"
2018-01-28 00:23:49: pid 9105: DETAIL:  while trying to write data from offset: 0 wlen: 5
2018-01-28 00:23:49: pid 9105: LOCATION:  pool_stream.c:678
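
To see how quickly the SIGKILLs ramp up, the kill messages can be tallied per minute. The sketch below assumes the log format shown above (set by log_line_prefix = '%t: pid %p: ') with each entry on a single line. The function name is illustrative, and the sample consists of entries quoted from the log above.

```python
import re
from collections import Counter

# Matches pgpool parent-process messages for children killed by signal 9.
# Assumes one log entry per line and the '%t: pid %p: ' prefix.
KILL_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}: pid \d+: LOG:\s+"
    r"child process with pid: \d+ exits with status 9 by signal 9"
)

def sigkills_per_minute(lines):
    """Return a Counter mapping 'YYYY-MM-DD HH:MM' to the SIGKILL count."""
    counts = Counter()
    for line in lines:
        m = KILL_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts

sample = [
    "2018-01-28 00:14:23: pid 8874: LOG:  child process with pid: 13059 exits with status 9 by signal 9",
    "2018-01-28 00:15:13: pid 2408: WARNING:  failed to connect to PostgreSQL server, getaddrinfo() failed with error \"System error\"",
    "2018-01-28 00:23:33: pid 8874: LOG:  child process with pid: 9155 exits with status 9 by signal 9",
]
for minute, n in sorted(sigkills_per_minute(sample).items()):
    print(minute, n)
```

On a full log, passing an open file object instead of `sample` gives the per-minute kill rate for the whole night.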


listen_addresses          = '*'
port                      = '9999'
socket_dir                = '/var/run/pgpool'
pcp_listen_addresses      = 'localhost'
pcp_port                  = 9898
pcp_socket_dir            = '/var/run/pgpool'
listen_backlog_multiplier = 2
serialize_accept          = off
backend_hostname0         = 'primary-host'
backend_port0             = 5432
backend_weight0           = 0
backend_flag0             = 'ALWAYS_MASTER'
backend_hostname1         = 'secondary-host'
backend_port1             = 5432
backend_weight1           = 1
backend_flag1             = 'ALLOW_TO_FAILOVER'
enable_pool_hba           = on
pool_passwd               = 'pool_passwd'
authentication_timeout    = 60
ssl                       = on
num_init_children         = 450
max_pool                  = 2
child_life_time           = 300
child_max_connections     = 0
connection_life_time      = 300
client_idle_limit         = 0
log_destination           = 'stderr'
log_line_prefix           = '%t: pid %p: '
log_connections           = off
log_hostname              = off
log_statement             = off
log_per_node_statement    = off
log_standby_delay         = 'if_over_threshold'
log_error_verbosity       = 'verbose'
log_min_messages          = 'warning'
pid_file_name             = '/var/run/pgpool/pgpool.pid'
logdir                    = '/var/log/pgpool'
connection_cache          = on
reset_query_list          = 'ABORT; DISCARD ALL'
replication_mode          = off
replicate_select          = off
insert_lock               = off
replication_stop_on_mismatch = off
failover_if_affected_tuples_mismatch = off
load_balance_mode         = on
ignore_leading_white_space = on
black_function_list       = 'currval,lastval,nextval,setval'
allow_sql_comments        = off
master_slave_mode         = on
master_slave_sub_mode     = 'stream'
sr_check_period           = 0
delay_threshold           = 10000000
health_check_period       = 5
health_check_timeout      = 20
health_check_password     = 'pw'
health_check_user         = 'user'
health_check_database     = 'postgres'
health_check_max_retries  = 20
health_check_retry_delay  = 1
connect_timeout           = 10000
fail_over_on_backend_error = off
use_watchdog              = off
clear_memqcache_on_escalation = on
check_temp_table          = on
check_unlogged_table      = on
memory_cache_enabled      = off
ssl_key                   = '/etc/pgpool/pgpool.key'
ssl_cert                  = '/etc/pgpool/pgpool.pem'
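
Two limits follow from these settings. Per the pgpool-II documentation, each child can cache up to max_pool backend connections, so pgpool may open up to num_init_children * max_pool connections to each backend, and the listen queue length is listen_backlog_multiplier * num_init_children. A minimal sketch of that arithmetic, using the values above (variable names are illustrative):

```python
# Derived limits from the pgpool.conf above.
num_init_children = 450
max_pool = 2
listen_backlog_multiplier = 2

# Upper bound on connections pgpool may hold open to each backend.
max_backend_connections = num_init_children * max_pool
# Backlog requested via listen(); the kernel may cap it (net.core.somaxconn).
listen_backlog = listen_backlog_multiplier * num_init_children

print(max_backend_connections, listen_backlog)
```

Both come to 900 here. The backend figure has to fit within each server's max_connections (less superuser_reserved_connections), and the kernel may cap the requested backlog at net.core.somaxconn.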