[pgpool-hackers: 1065] Fixing "Thundering herd problem"

Tatsuo Ishii ishii at postgresql.org
Fri Sep 25 12:54:33 JST 2015


This is a proposal to fix the "thundering herd problem" for
pgpool-II 3.5.

Background: The thundering herd problem is a general problem that
arises when a server has to manage a large number of child processes.
From Wikipedia:

       The thundering herd problem occurs when a large number of
       processes waiting for an event are awoken when that event
       occurs, but only one process is able to proceed at a
       time. After the processes wake up, they all demand the resource
       and a decision must be made as to which process can
       continue. After the decision is made, the remaining processes
       are put back to sleep, only to all wake up again to request
       access to the resource.

       Quote from: https://en.wikipedia.org/wiki/Thundering_herd_problem

The pgpool-II parent process spawns num_init_children child processes,
and all of them issue listen() then sleep. When a client connects to
the port on which those child processes are listening, all of them are
woken up, but only one of them can actually succeed in doing accept().
This is where the thundering herd problem occurs: many child processes
wake up at once, which results in heavy context switching.

Solution:

As suggested by Michael Renner in [pgpool-general: 3934],
serializing listen() is useful to fix the problem. The attached patch
implements this using a semaphore, so that only one child process
listens on the socket at a time.
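
As a rough illustration of the idea (this is not the attached patch),
here is a minimal sketch in which forked children take a System V
semaphore before blocking in accept(), so only the current holder waits
on the socket. The names run_serialized_accept, child_main and
sem_change, the use of SysV semaphores, and the pipe used for reporting
are all assumptions made for this example.

```c
#define _XOPEN_SOURCE 700
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64   /* arbitrary cap for this sketch */

/* Linux expects the caller to define the semctl() argument union. */
union sem_arg { int val; struct semid_ds *buf; unsigned short *array; };

/* delta = -1 acquires (may block); delta = +1 releases. */
static int sem_change(int semid, int delta)
{
    struct sembuf op = { .sem_num = 0, .sem_op = (short) delta, .sem_flg = SEM_UNDO };
    return semop(semid, &op, 1);
}

/* Child loop: only the current semaphore holder waits in accept(). */
static void child_main(int listen_fd, int semid, int report_fd)
{
    for (;;) {
        if (sem_change(semid, -1) < 0)
            _exit(1);
        int conn = accept(listen_fd, NULL, NULL);
        sem_change(semid, +1);      /* let the next child wait for a client */
        char done = 1;
        if (conn < 0 || write(report_fd, &done, 1) != 1)
            _exit(1);
        close(conn);
    }
}

/* Fork nchildren workers, open nconns client connections from the parent,
 * and return how many were accepted (or -1 on setup failure). */
int run_serialized_accept(int nchildren, int nconns)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    union sem_arg one = { .val = 1 };
    pid_t pids[MAX_CHILDREN];
    int report[2], accepted = 0;

    assert(nchildren > 0 && nchildren <= MAX_CHILDREN);

    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel picks one */
    if (listen_fd < 0 ||
        bind(listen_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(listen_fd, 64) < 0 ||
        getsockname(listen_fd, (struct sockaddr *) &addr, &len) < 0 ||
        pipe(report) < 0)
        return -1;

    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0 || semctl(semid, 0, SETVAL, one) < 0)
        return -1;

    for (int i = 0; i < nchildren; i++) {
        pids[i] = fork();
        if (pids[i] == 0) {
            close(report[0]);
            child_main(listen_fd, semid, report[1]);  /* never returns */
        }
    }
    close(report[1]);   /* parent keeps only the read end */

    for (int i = 0; i < nconns; i++) {
        int c = socket(AF_INET, SOCK_STREAM, 0);
        char done;
        /* Count a connection once some child reports having accepted it. */
        if (c >= 0 && connect(c, (struct sockaddr *) &addr, sizeof(addr)) == 0 &&
            read(report[0], &done, 1) == 1)
            accepted++;
        if (c >= 0)
            close(c);
    }

    for (int i = 0; i < nchildren; i++) {
        if (pids[i] > 0) {          /* skip slots where fork() failed */
            kill(pids[i], SIGKILL);
            waitpid(pids[i], NULL, 0);
        }
    }
    semctl(semid, 0, IPC_RMID);
    close(listen_fd);
    close(report[0]);
    return accepted;
}
```

Calling run_serialized_accept(4, 10) forks four children and should
report ten accepted connections. SEM_UNDO is used so that a child killed
while holding the semaphore does not leave it locked.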

Results:

I ran a simple benchmark before and after applying the patch to the
master branch using pgbench -C, which connects to and disconnects from
pgpool-II for each transaction and therefore easily reveals the
problem. The hardware is a notebook (Panasonic's Let's note CF-SX3)
with 2x Core i7, 512GB mem and SSD disk. The OS is Ubuntu 14.04 LTS.
PostgreSQL is 9.4.4 with max_connections = 512. The database was
created by pgpool_setup with 1 DB node. num_init_children = 400.

In short, I got up to a 2.8 times improvement (937.896224 TPS
vs. 2644.281630 TPS; see the exact numbers below).
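
The ratio can be checked directly from the two tps figures. The small
sketch below also recomputes the average latency from the duration,
client count and transaction count in the pgbench output further down.
The function names are made up for this example, and the latency
formula is only an assumption that happens to be consistent with the
reported numbers.

```c
#include <assert.h>

/* Ratio of throughput with the patch to throughput without it. */
double speedup(double tps_before, double tps_after)
{
    assert(tps_before > 0.0);
    return tps_after / tps_before;
}

/* Average latency in milliseconds implied by a run, assuming every
 * client was busy for the whole duration (an assumption of this sketch). */
double avg_latency_ms(double duration_s, int clients, long transactions)
{
    assert(transactions > 0);
    return 1000.0 * duration_s * clients / (double) transactions;
}
```

With the figures in this post, speedup(937.896224, 2644.281630) comes
to roughly 2.82, and avg_latency_ms(300, 32, 295499) and
avg_latency_ms(300, 32, 793289) come to roughly 32.49 ms and 12.10 ms.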

Downside of the patch:

However, the patch has a downside: child_life_time does not work
anymore. I tried to eliminate the downside without success (see the
mailing list thread). I think it is not a crucial one, since we could
use child_max_connections instead. I want to keep the code simple.
Maybe we could come back and eliminate the restriction in the future.
Note that in the patch, the serialization does not occur if
child_life_time is enabled.

Do we need yet another switch?

I see that the improvement is not achieved when num_init_children is
not so large, such as 32. Currently the "32" is hard-coded, and the
serialization is disabled if num_init_children is equal to or lower
than that value. I doubt that 32 is general enough for various
environments. Maybe we need a switch to turn off the serialization?
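
To make the conditions concrete, here is a hypothetical sketch of the
decision described in this post: serialize only when num_init_children
exceeds the hard-coded 32 and child_life_time is not enabled. The
struct, the function name and the serialize_accept field (standing in
for the proposed switch) are assumptions for illustration, not code or
parameters from the patch. It also assumes child_life_time = 0 means
"disabled".

```c
#include <assert.h>
#include <stdbool.h>

#define SERIALIZE_THRESHOLD 32   /* the hard-coded "32" mentioned above */

/* Hypothetical subset of the configuration relevant to the decision. */
struct herd_config {
    int  num_init_children;
    int  child_life_time;    /* seconds; 0 is assumed to mean disabled */
    bool serialize_accept;   /* hypothetical on/off switch */
};

/* Return true if children should take the semaphore before waiting. */
bool should_serialize(const struct herd_config *cfg)
{
    assert(cfg != 0);
    if (!cfg->serialize_accept)
        return false;                   /* switched off explicitly */
    if (cfg->child_life_time > 0)
        return false;                   /* child_life_time is enabled */
    return cfg->num_init_children > SERIALIZE_THRESHOLD;
}
```

With num_init_children = 400 and child_life_time = 0 this returns true.
With num_init_children = 32, or with child_life_time enabled, it returns
false.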

Without the patch:
t-ishii@localhost: pgbench -n -S -p 11000 -c 32 -C -S -T 300 test
transaction type: SELECT only
scaling factor: 1
query mode: simple
number of clients: 32
number of threads: 1
duration: 300 s
number of transactions actually processed: 295499
latency average: 32.487 ms
tps = 937.896224 (including connections establishing)
tps = 34010.790776 (excluding connections establishing)

With the patch:
t-ishii@localhost: pgbench -n -S -p 11000 -c 32 -C -S -T 300 test
transaction type: SELECT only
scaling factor: 1
query mode: simple
number of clients: 32
number of threads: 1
duration: 300 s
number of transactions actually processed: 793289
latency average: 12.102 ms
tps = 2644.281630 (including connections establishing)
tps = 14393.209054 (excluding connections establishing)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

