[pgpool-hackers: 509] Child processes hang during `read_password_packet`

Junegunn Choi junegunn.c at gmail.com
Wed May 7 11:48:10 JST 2014


Hi,
we ran into an issue where every child process hangs while
waiting for the password packet from the frontend, eventually
making the whole pgpool cluster unresponsive. The call stack of
each hung process was as follows:


#0  0x0000003d222cdaf3 in __select_nocancel () from /lib64/libc.so.6
#1  0x0000000000418c61 in pool_check_fd (cp=<value optimized out>)
    at pool_process_query.c:951
#2  0x000000000041d534 in pool_read (cp=0x1da7a210, buf=0x7ffff5092d3f, len=1)
    at pool_stream.c:139
#3  0x000000000040b9f0 in read_password_packet (frontend=0x1da7a210,
    protoMajor=<value optimized out>,
    password=0x70a460 "md55c81acdb03ea852f30d0630528697236", pwdSize=0x70a860)
    at pool_auth.c:1047
#4  0x000000000040c8d2 in do_md5 (backend=0x1da63c70, frontend=0x1da7a210,
    reauth=1, protoMajor=3) at pool_auth.c:867
#5  0x000000000040cceb in pool_do_reauth (frontend=0x1da7a210, cp=0x1da5ec70)
    at pool_auth.c:421
#6  0x000000000040a9c5 in connect_using_existing_connection (unix_fd=4,
    inet_fd=5) at child.c:1043
#7  do_child (unix_fd=4, inet_fd=5) at child.c:330
#8  0x000000000040455f in fork_a_child (unix_fd=4, inet_fd=5, id=0)
    at main.c:1258
#9  0x0000000000404887 in reaper () at main.c:2482
#10 0x0000000000407a47 in main (argc=<value optimized out>, argv=0x0)
    at main.c:714

We looked into the code and found that the send_md5auth_request
function, which is called just before read_password_packet,
always returns 0, so its caller cannot tell whether the request
was actually sent. We therefore suspect that those processes were
waiting for a response to a request they had failed to send.
Strangely, we could not reproduce the problem on our test
cluster, yet it kept happening on our production servers several
times a day. We needed a quick workaround for this recurring
problem, so we commented out the do_md5 part of pool_do_reauth,
effectively disabling the authentication, and the hang has not
occurred since. Our cluster is running smoothly now, but only
because we crippled md5 authentication. I believe this problem
deserves attention and a proper fix. Thanks.
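
To illustrate the pattern we suspect, here is a minimal,
standalone sketch. It is not pgpool code: send_all,
wait_readable and request_then_wait are hypothetical names, and
the timeout is an assumption about one way to bound the wait.
The idea is simply that the send step reports failure and the
caller refuses to block on a reply when the request never went
out.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a pgpool function): write the whole
 * buffer and report failure instead of unconditionally returning 0.
 * Returns 0 on success, -1 on a write error. */
static int send_all(int fd, const char *buf, size_t len)
{
    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry */
            return -1;          /* e.g. EPIPE: peer is gone */
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: wait until fd is readable, but give up
 * after timeout_sec seconds.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_sec)
{
    for (;;) {
        fd_set rfds;
        struct timeval tv;
        int rc;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc < 0 && errno == EINTR)
            continue;           /* interrupted, retry with a fresh timeout */
        return rc > 0 ? 1 : rc;
    }
}

/* Send a request, then wait for the reply only if the send succeeded.
 * Returns 0 if a reply became readable in time, -1 otherwise. */
static int request_then_wait(int fd, const char *req, size_t len,
                             int timeout_sec)
{
    if (send_all(fd, req, len) != 0)
        return -1;              /* never wait for a reply to an unsent request */
    if (wait_readable(fd, timeout_sec) != 1)
        return -1;              /* timeout or select error */
    return 0;
}
```

The sketch assumes SIGPIPE is ignored (signal(SIGPIPE, SIG_IGN)),
so that a write to a closed peer returns -1 with EPIPE instead of
killing the process. With that in place, request_then_wait returns
-1 both when the send fails and when the peer stays silent, and
the child can drop the connection in either case.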


-- 
cheers,
junegunn choi.