[pgpool-general: 3394] v3.4.0.(3)? - memory issue and connection hangs

Pablo Sanchez pablo at blueoakdb.com
Sat Jan 3 07:10:34 JST 2015


Hi,

I'm going to apologize in advance for this lengthy email message.  I'm
hoping there's sufficient valid data presented for some assistance.

I'm on CentOS 7 with the latest updates.  I'm using the latest version
of pgpool [1]:  v3.4.0.3

We're seeing two issues:  a memory leak, and PGPool seems to hang
when there's a /severe/ in-rush of connections.

At a high-level, we have the following PGPool features enabled:

    o num_init_children = 95
    o max_pool = 1
    o connection_cache = on
    o replication_mode = off
    o load_balance_mode = on

      We have one /slave/ for our /master/

    o master_slave_mode = on
    o use_watchdog = on
    o memory_cache_enabled = on

Any help would be appreciated.  If more data is required, please don't
hesitate to ask.

Thank you!

::: Memory Leak :::

The front-end application is using the Quartz scheduler[2], which seems
to occasionally get into an infinite loop.  The memory leak is
triggered when we see 300+ UPDATEs per second, all of which fail[3].

While I understand there's an application issue which needs to be
resolved, IMO PGPool shouldn't die because of the issue.  :)

I'm enclosing trimmed /sar/ memory data[4] which shows how quickly we
run out of memory.

::: Memory Leak - Part 2 :::

It appears (I've yet to confirm it) that when we have
/memory_cache_enabled = on/, we have a minor memory leak.  It seems to
take 48h+ to exhaust all the RAM (nearly 4G) on the server, although
I've yet to let the server actually run out of memory.

::: Hang on in-rush :::

We've been using PGBench to stress test the environment.  One test in
particular is how our environment handles a sudden burst of
connections.

Using the -C option, PGBench has the ability to create a connection
per query.  When 'connection_cache = off', PGPool can handle any
number of iterations of the PGBench call per minute[5] - sh code:

    while [ 1 ] ; do
       [5]
       sleep 1m
    done

When 'connection_cache = on', on the third iteration PGBench hangs.
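
To tell a genuine hang apart from a slow run, the loop above can be
given a time limit per iteration.  This is only a minimal sketch, not
what we actually ran:  it assumes GNU coreutils /timeout/ is installed,
and /burst_loop/ is a hypothetical helper name, not part of PGBench or
PGPool.

```shell
#!/bin/sh
# Hypothetical helper: run a command several times and report any
# iteration that exceeds a time limit (a likely hang).
# usage: burst_loop ITERATIONS LIMIT_SECONDS PAUSE_SECONDS COMMAND [ARGS...]
# e.g.:  burst_loop 5 120 60 /usr/pgsql-9.3/bin/pgbench -h db-cluster \
#            -U postgres -T 30 -S -c 10 -C pgbench
burst_loop() {
    iterations=$1; limit=$2; pause=$3
    shift 3
    i=1
    while [ "$i" -le "$iterations" ]; do
        rc=0
        # timeout(1) exits with status 124 when the limit is reached
        timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
        if [ "$rc" -eq 0 ]; then
            echo "iteration $i: ok"
        elif [ "$rc" -eq 124 ]; then
            echo "iteration $i: timed out after ${limit}s"
        else
            echo "iteration $i: failed with status $rc"
        fi
        i=$((i + 1))
        # pause between iterations, but not after the last one
        if [ "$i" -le "$iterations" ]; then
            sleep "$pause"
        fi
    done
}
```

The limit should be comfortably longer than the 30 second PGBench run
so that only a real stall is reported as a timeout.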

The benchmark can be set up following these steps[6].

While monitoring network connections[7], I found I had to apply some
kernel tunes[8] to get 'connection_cache = off' to work, but they
didn't help with 'connection_cache = on'.

We'd like to use the connection pooling feature.

::: References :::

[1] - pgpool-II-pg93-3.4.0-3pgdg.rhel6.x86_64

[2] - http://quartz-scheduler.org

[3] - UPDATE failure

2015-01-01 22:04:02 - [unknown] (pid 20899): the-user-db: LOG:
Parse: Error or notice message from backend: : DB node id: 0 backend
pid: 28304 statement: "UPDATE QRTZ_TRIGGERS SET TRIGGER_STATE = $1
WHERE SCHED_NAME = 'QuartzScheduler' AND TRIGGER_NAME = $2 AND
TRIGGER_GROUP = $3 AND TRIGGER_STATE�<88>" message: "invalid byte
sequence for encoding "UTF8": 0xe8 0x88"

[4] - trimmed output of /sar -r 60 .../

Note:  "%memused" starts increasing at roughly 9:43:27 pm

             kbmemfree kbmemused  %memused   %commit
09:20:27 PM   3037436    846340     21.79     23.46
09:21:27 PM   3042244    841532     21.67     23.37
09:22:27 PM   3041180    842596     21.70     23.39
09:23:27 PM   3036952    846824     21.80     23.46
09:24:27 PM   3023848    859928     22.14     23.69
09:25:27 PM   3023632    860144     22.15     23.69
09:26:27 PM   2676032   1207744     31.10     28.91
09:27:27 PM   2404916   1478860     38.08     28.91
09:28:27 PM   2061288   1822488     46.93     28.93
09:29:27 PM   2025796   1857980     47.84     28.95
09:30:27 PM   1876968   2006808     51.67     28.96
09:31:27 PM   1877060   2006716     51.67     28.96
09:32:27 PM   1875104   2008672     51.72     29.02
09:33:27 PM   1876220   2007556     51.69     29.01
09:34:27 PM   2929684    954092     24.57     25.12
09:35:27 PM   2907896    975880     25.13     25.48
09:36:27 PM   3001660    882116     22.71      7.99
09:37:27 PM   3053604    830172     21.38     24.51
09:38:27 PM   2903444    980332     25.24     25.61
09:39:27 PM   2881668   1002108     25.80     25.96
09:40:27 PM   2854664   1029112     26.50     26.44
09:41:27 PM   2847116   1036660     26.69     26.57
09:42:27 PM   2840500   1043276     26.86     26.67
09:43:27 PM   2830284   1053492     27.13     26.82
09:44:27 PM   2581672   1302104     33.53     31.63
09:45:27 PM   2297892   1585884     40.83     37.20
09:46:27 PM   2019400   1864376     48.00     42.38
09:47:27 PM   1717092   2166684     55.79     48.02
09:48:27 PM   1430172   2453604     63.18     53.36
09:49:27 PM   1158128   2725648     70.18     58.43
09:50:27 PM    884424   2999352     77.23     63.56
09:51:27 PM    584972   3298804     84.94     69.20
09:52:27 PM    309340   3574436     92.04     74.66
09:53:27 PM    131388   3752388     96.62     80.59
09:54:27 PM    105308   3778468     97.29     86.00
09:55:27 PM    112748   3771028     97.10     91.64
09:56:27 PM    130988   3752788     96.63     97.19
09:57:27 PM    107760   3776016     97.23    102.37
09:58:27 PM    103328   3780448     97.34    107.24
09:59:27 PM    113372   3770404     97.08    112.71
10:00:27 PM    145368   3738408     96.26    117.67
10:01:27 PM    112524   3771252     97.10    122.87
10:02:27 PM    113772   3770004     97.07    127.92
10:03:27 PM    103928   3779848     97.32    132.97
10:04:27 PM    140552   3743224     96.38    119.54
10:05:27 PM    139540   3744236     96.41    119.54
10:06:27 PM    139740   3744036     96.40    119.54
10:07:27 PM    139464   3744312     96.41    119.54
10:08:27 PM    139592   3744184     96.41    119.54
10:09:27 PM    139600   3744176     96.41    119.54
10:10:27 PM    133392   3750384     96.57    119.54
10:11:27 PM    133488   3750288     96.56    119.54
10:12:27 PM    133552   3750224     96.56    119.54
10:13:27 PM    133632   3750144     96.56    119.54
10:14:27 PM    133624   3750152     96.56    119.54
10:15:27 PM    133640   3750136     96.56    119.54
10:16:27 PM    133768   3750008     96.56    119.54
10:17:27 PM    133800   3749976     96.55    119.54
10:18:27 PM    133832   3749944     96.55    119.54
10:19:27 PM    133800   3749976     96.55    119.54
10:20:27 PM     86816   3796960     97.76    119.54
10:21:27 PM     97432   3786344     97.49    119.54
10:22:27 PM    103120   3780656     97.34    119.54
10:23:27 PM    124552   3759224     96.79    119.54
10:24:27 PM   3619768    264008      6.80     12.24
10:25:27 PM   3614228    269548      6.94     12.24
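
For reference, the slope of the second ramp can be read off the table:
kbmemused goes from 1053492 kB at 09:43:27 PM to 3752388 kB at
09:53:27 PM, i.e. 2698896 kB in ten minutes, or roughly 270,000 kB
(about 264 MiB) per minute.  A minimal sketch for computing the same
figure from any slice of the table follows.  /growth_rate/ is a
hypothetical helper name, and it assumes one row per minute in the
12-hour format shown above (time, AM/PM, kbmemfree, kbmemused, ...).

```shell
#!/bin/sh
# Hypothetical helper: read sar -r style rows on stdin and print the
# average increase of kbmemused (4th field) in kB per row interval.
# Rows whose 4th field is not a plain integer (e.g. the header) are skipped.
growth_rate() {
    awk 'NF >= 4 && $4 ~ /^[0-9]+$/ {
             if (n == 0) first = $4   # kbmemused of the first data row
             last = $4                # kbmemused of the most recent row
             n++
         }
         END { if (n > 1) printf "%d\n", (last - first) / (n - 1) }'
}
```

Feeding it only the rows of interest, for example the 09:43:27 PM
through 09:53:27 PM rows, gives the per-minute growth for that window.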

[5] pgbench call

#
# db-cluster:  the VIP to a two-node PGPool cluster.
#
date ; /usr/pgsql-9.3/bin/pgbench -h db-cluster -U postgres -T 30 -S \
    -c 10 -C pgbench ; date

[6] pgbench setup

yum -y install postgresql93-contrib
createdb -h db-cluster -U postgres pgbench

# scale of 100
/usr/pgsql-9.3/bin/pgbench -h db-cluster -U postgres -i -s 100 \
    --foreign-keys --unlogged-tables pgbench

[7] Monitor connections

# Script found on the web

while [ 1 ] ; do
    netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}';
    sleep 1;
    echo '---';
done

[8] kernel tunes on PGPool Server

# Default:  0
net.ipv4.tcp_tw_reuse = 1

# Default:  32768 61000
net.ipv4.ip_local_port_range = 1024 65000

# Default:  128
net.core.somaxconn = 10240
-- 
Pablo Sanchez - Blueoak Database Engineering, Inc
Ph:    819.459.1926         Blog:  http://pablo-blog.blueoakdb.com
iNum:  883.5100.0990.1054

