[pgpool-general: 3394] v3.4.0.(3)? - memory issue and connection hangs
Pablo Sanchez
pablo at blueoakdb.com
Sat Jan 3 07:10:34 JST 2015
Hi,
I'm going to apologize in advance for this lengthy email message. I'm
hoping there's sufficient valid data presented for some assistance.
I'm on CentOS 7 with the latest updates. I'm using the latest version
of pgpool [1]: v3.4.0.3
We're seeing two issues: one is a memory leak; the other is that PGPool
seems to hang when there's a /severe/ in-rush of connections.
At a high level, we have the following PGPool features enabled:
o num_init_children = 95
o max_pool = 1
o connection_cache = on
o replication_mode = off
o load_balance_mode = on
We have one /slave/ for our /master/
o master_slave_mode = on
o use_watchdog = on
o memory_cache_enabled = on
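As a rough sketch of what these settings imply: assuming pgpool's usual
sizing rule that each child process caches up to max_pool backend
connections, the ceiling on connections opened to each backend node is
the product of the two values. The backend's max_connections isn't shown
in this message, so it is left out of the calculation.

```sh
#!/bin/sh
# Upper bound on connections pgpool may hold open to EACH backend node.
# Assumption: every child process caches at most max_pool connections.
num_init_children=95
max_pool=1

backend_conns=$((num_init_children * max_pool))
echo "max backend connections per node: ${backend_conns}"
```

That figure would need to fit within the backend's max_connections minus
superuser_reserved_connections.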
Any help would be appreciated. If more data is required, please don't
hesitate to ask.
Thank you!
::: Memory Leak :::
The front-end application is using the Quartz scheduler[2], which seems
to occasionally get into an infinite loop. The memory leak is
triggered when we see upwards of 300 UPDATEs per second which
fail[3].
While I understand there's an application issue which needs to be
resolved, IMO PGPool shouldn't die because of the issue. :)
I'm enclosing trimmed /sar/ memory data[4] which shows how quickly we
run out of memory.
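To put a number on "how quickly", a small filter along these lines could
turn the sar rows into per-sample growth. It's only a sketch: the
function name mem_growth is made up for illustration, and it assumes the
row layout of the trimmed output (time, AM/PM, kbmemfree, kbmemused, ...).

```sh
#!/bin/sh
# Print the change in kbmemused (in whole MB) between consecutive sar rows.
# Assumed columns: $1 time, $2 AM/PM, $3 kbmemfree, $4 kbmemused.
mem_growth() {
    awk '
        # Only rows starting with an HH:MM:SS timestamp are samples;
        # headers and blank lines are skipped.
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            if (seen)
                printf "%s %s %+d MB\n", $1, $2, int(($4 - prev) / 1024)
            prev = $4
            seen = 1
        }'
}
```

It could be fed live with something like `sar -r 60 | mem_growth`. Given
the 09:43:27 and 09:44:27 rows, it reports growth of about 242 MB over
that minute.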
::: Memory Leak - Part 2 :::
It appears, though I've yet to confirm it, that we also have a minor
memory leak when /memory_cache_enabled = on/. It looks like it would
take 48h+ to exhaust all the RAM (nearly 4G) on the server; I've yet
to let the server actually run out of memory.
::: Hang on in-rush :::
We've been using PGBench to stress test the environment. One test in
particular is how our environment handles a sudden burst of
connections.
With the -C option, PGBench opens a new connection for each
transaction. When 'connection_cache = off', PGPool can handle any
number of iterations of the PGBench call per minute[5] - sh code:
while [ 1 ] ; do
[5]
sleep 1m
done
When 'connection_cache = on', on the third iteration PGBench hangs.
The benchmark can be set up following these steps[6].
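To pin down which iteration hangs without having to watch the terminal,
the loop could be wrapped in a time limit. This is a sketch only: the
function name run_until_hang and the limits in the usage line are
assumptions, and it relies on timeout(1) from GNU coreutils, which exits
with status 124 when the limit is reached.

```sh
#!/bin/sh
# Run a command repeatedly; stop at the first iteration that exceeds the limit.
# Usage: run_until_hang LIMIT_SECONDS MAX_ITERATIONS PAUSE_SECONDS COMMAND [ARGS...]
run_until_hang() {
    limit=$1
    max_iter=$2
    pause=$3
    shift 3

    i=1
    while [ "$i" -le "$max_iter" ]; do
        # The command's own output is discarded; only its exit status is kept.
        timeout "$limit" "$@" >/dev/null 2>&1
        status=$?
        if [ "$status" -eq 124 ]; then
            echo "iteration $i: timed out after ${limit}s"
            return 1
        fi
        echo "iteration $i: exit status $status"
        i=$((i + 1))
        sleep "$pause"
    done
    return 0
}
```

For example, with a 120-second limit, ten iterations and the one-minute
pause from the loop above:
run_until_hang 120 10 60 /usr/pgsql-9.3/bin/pgbench -h db-cluster -U postgres -T 30 -S -c 10 -C pgbench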
While monitoring network connections[7], I found I had to apply the
following kernel tunes[8] to get 'connection_cache = off' to work, but
they didn't help with 'connection_cache = on'.
We'd like to use the connection pooling feature.
::: References :::
[1] - pgpool-II-pg93-3.4.0-3pgdg.rhel6.x86_64
[2] - http://quartz-scheduler.org
[3] - UPDATE failure
2015-01-01 22:04:02 - [unknown] (pid 20899): the-user-db: LOG:
Parse: Error or notice message from backend: : DB node id: 0 backend
pid: 28304 statement: "UPDATE QRTZ_TRIGGERS SET TRIGGER_STATE = $1
WHERE SCHED_NAME = 'QuartzScheduler' AND TRIGGER_NAME = $2 AND
TRIGGER_GROUP = $3 AND TRIGGER_STATE�<88>" message: "invalid byte
sequence for encoding "UTF8": 0xe8 0x88"
[4] - trimmed output of /sar -r 60 .../
Note: "%memused" starts increasing at roughly 9:43:27 pm
kbmemfree kbmemused %memused %commit
09:20:27 PM 3037436 846340 21.79 23.46
09:21:27 PM 3042244 841532 21.67 23.37
09:22:27 PM 3041180 842596 21.70 23.39
09:23:27 PM 3036952 846824 21.80 23.46
09:24:27 PM 3023848 859928 22.14 23.69
09:25:27 PM 3023632 860144 22.15 23.69
09:26:27 PM 2676032 1207744 31.10 28.91
09:27:27 PM 2404916 1478860 38.08 28.91
09:28:27 PM 2061288 1822488 46.93 28.93
09:29:27 PM 2025796 1857980 47.84 28.95
09:30:27 PM 1876968 2006808 51.67 28.96
09:31:27 PM 1877060 2006716 51.67 28.96
09:32:27 PM 1875104 2008672 51.72 29.02
09:33:27 PM 1876220 2007556 51.69 29.01
09:34:27 PM 2929684 954092 24.57 25.12
09:35:27 PM 2907896 975880 25.13 25.48
09:36:27 PM 3001660 882116 22.71 7.99
09:37:27 PM 3053604 830172 21.38 24.51
09:38:27 PM 2903444 980332 25.24 25.61
09:39:27 PM 2881668 1002108 25.80 25.96
09:40:27 PM 2854664 1029112 26.50 26.44
09:41:27 PM 2847116 1036660 26.69 26.57
09:42:27 PM 2840500 1043276 26.86 26.67
09:43:27 PM 2830284 1053492 27.13 26.82
09:44:27 PM 2581672 1302104 33.53 31.63
09:45:27 PM 2297892 1585884 40.83 37.20
09:46:27 PM 2019400 1864376 48.00 42.38
09:47:27 PM 1717092 2166684 55.79 48.02
09:48:27 PM 1430172 2453604 63.18 53.36
09:49:27 PM 1158128 2725648 70.18 58.43
09:50:27 PM 884424 2999352 77.23 63.56
09:51:27 PM 584972 3298804 84.94 69.20
09:52:27 PM 309340 3574436 92.04 74.66
09:53:27 PM 131388 3752388 96.62 80.59
09:54:27 PM 105308 3778468 97.29 86.00
09:55:27 PM 112748 3771028 97.10 91.64
09:56:27 PM 130988 3752788 96.63 97.19
09:57:27 PM 107760 3776016 97.23 102.37
09:58:27 PM 103328 3780448 97.34 107.24
09:59:27 PM 113372 3770404 97.08 112.71
10:00:27 PM 145368 3738408 96.26 117.67
10:01:27 PM 112524 3771252 97.10 122.87
10:02:27 PM 113772 3770004 97.07 127.92
10:03:27 PM 103928 3779848 97.32 132.97
10:04:27 PM 140552 3743224 96.38 119.54
10:05:27 PM 139540 3744236 96.41 119.54
10:06:27 PM 139740 3744036 96.40 119.54
10:07:27 PM 139464 3744312 96.41 119.54
10:08:27 PM 139592 3744184 96.41 119.54
10:09:27 PM 139600 3744176 96.41 119.54
10:10:27 PM 133392 3750384 96.57 119.54
10:11:27 PM 133488 3750288 96.56 119.54
10:12:27 PM 133552 3750224 96.56 119.54
10:13:27 PM 133632 3750144 96.56 119.54
10:14:27 PM 133624 3750152 96.56 119.54
10:15:27 PM 133640 3750136 96.56 119.54
10:16:27 PM 133768 3750008 96.56 119.54
10:17:27 PM 133800 3749976 96.55 119.54
10:18:27 PM 133832 3749944 96.55 119.54
10:19:27 PM 133800 3749976 96.55 119.54
10:20:27 PM 86816 3796960 97.76 119.54
10:21:27 PM 97432 3786344 97.49 119.54
10:22:27 PM 103120 3780656 97.34 119.54
10:23:27 PM 124552 3759224 96.79 119.54
10:24:27 PM 3619768 264008 6.80 12.24
10:25:27 PM 3614228 269548 6.94 12.24
[5] pgbench call
#
# db-cluster: the VIP to a two-node PGPool cluster.
#
date ; /usr/pgsql-9.3/bin/pgbench -h db-cluster -U postgres -T 30 -S -c 10 \
    -C pgbench ; date
[6] pgbench setup
yum -y install postgresql93-contrib
createdb -h db-cluster -U postgres pgbench
# scale of 100
/usr/pgsql-9.3/bin/pgbench -h db-cluster -U postgres -i -s 100 \
    --foreign-keys --unlogged-tables pgbench
[7] Monitor connections
# Script found on the web
while [ 1 ] ; do
netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}';
sleep 1;
echo '---';
done
[8] kernel tunes on PGPool Server
# Default: 0
net.ipv4.tcp_tw_reuse = 1
# Default: 32768 61000
net.ipv4.ip_local_port_range = 1024 65000
# Default: 128
net.core.somaxconn = 10240
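For trying these out at runtime before persisting them, the key = value
lines can be turned into `sysctl -w` commands for review. A sketch only:
the function name sysctl_cmds is made up, and it just prints the
commands rather than running them (applying them needs root).

```sh
#!/bin/sh
# Convert "key = value" lines on stdin into `sysctl -w` commands on stdout.
# Comment lines and lines without "=" are skipped.
sysctl_cmds() {
    awk -F'=' '
        /^[ \t]*#/ || NF < 2 { next }
        {
            key = $1
            val = $2
            # Trim surrounding whitespace from both sides of the "=".
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            printf "sysctl -w %s=\"%s\"\n", key, val
        }'
}
```

Multi-value settings such as the port range come out quoted, e.g.
sysctl -w net.ipv4.ip_local_port_range="1024 65000"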
--
Pablo Sanchez - Blueoak Database Engineering, Inc
Ph: 819.459.1926 Blog: http://pablo-blog.blueoakdb.com
iNum: 883.5100.0990.1054