[pgpool-hackers: 514] Re: Child processes hang during `read_password_packet`
Yugo Nagata
nagata at sraoss.co.jp
Tue May 13 21:47:16 JST 2014
Hi,
Which version of pgpool are you using?
Is your pgpool.conf the same as the one in the other thread?
[pgpool-general: 2785] Re: Child process hangs in active state
On Wed, 7 May 2014 11:48:10 +0900
Junegunn Choi <junegunn.c at gmail.com> wrote:
> Hi,
> we had the issue where every child process hangs while waiting
> for the password packet from frontend, finally making the whole
> pgpool cluster unresponsive. The callstack of each process was
> given as follows:
>
>
> #0  0x0000003d222cdaf3 in __select_nocancel () from /lib64/libc.so.6
> #1  0x0000000000418c61 in pool_check_fd (cp=<value optimized out>)
>     at pool_process_query.c:951
> #2  0x000000000041d534 in pool_read (cp=0x1da7a210, buf=0x7ffff5092d3f, len=1)
>     at pool_stream.c:139
> #3  0x000000000040b9f0 in read_password_packet (frontend=0x1da7a210,
>     protoMajor=<value optimized out>,
>     password=0x70a460 "md55c81acdb03ea852f30d0630528697236", pwdSize=0x70a860)
>     at pool_auth.c:1047
> #4  0x000000000040c8d2 in do_md5 (backend=0x1da63c70, frontend=0x1da7a210,
>     reauth=1, protoMajor=3) at pool_auth.c:867
> #5  0x000000000040cceb in pool_do_reauth (frontend=0x1da7a210, cp=0x1da5ec70)
>     at pool_auth.c:421
> #6  0x000000000040a9c5 in connect_using_existing_connection (unix_fd=4,
>     inet_fd=5) at child.c:1043
> #7  do_child (unix_fd=4, inet_fd=5) at child.c:330
> #8  0x000000000040455f in fork_a_child (unix_fd=4, inet_fd=5, id=0)
>     at main.c:1258
> #9  0x0000000000404887 in reaper () at main.c:2482
> #10 0x0000000000407a47 in main (argc=<value optimized out>, argv=0x0)
>     at main.c:714
>
> We looked into the code and realized that send_md5auth_request
> function, which is called just before read_password_packet,
> always returns 0. So we suspect that those processes were waiting
> for the response even when they failed to correctly send the
> request. Strangely, we couldn't reproduce the exact problem on
> our test cluster, however it was still happening on our
> production servers several times a day. We had to quickly find
What are the differences between the test cluster and the production environment?
Are pgpool.conf and the network load the same in both?
> a workaround for this recurring problem, so we commented out
> do_md5 part from pool_do_reauth effectively disabling the
> authentication, and it has not occurred ever since. Although
> our cluster is running smooth now, we crippled the md5
> authentication. I believe this problem deserves attention and a
> proper fix. Thanks.
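To make the suspicion concrete: if the result of sending the request is
discarded, a failed send still leads into a read that blocks in select(),
which matches the backtrace above. The following is only a minimal sketch in
C of the general pattern, not pgpool's actual code. The names send_all,
wait_readable and request_then_wait are hypothetical, and the timeout value
is an assumption chosen for illustration. MSG_NOSIGNAL is used so that a
closed peer yields an error instead of SIGPIPE.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send the whole buffer, or return -1 on failure. */
static int send_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(fd, buf, len, MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;          /* real failure: report it to the caller */
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: wait until fd is readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_sec)
{
    fd_set readfds;
    struct timeval tv;
    int rc;

    for (;;) {
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = timeout_sec;    /* bounded wait instead of blocking forever */
        tv.tv_usec = 0;
        rc = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (rc < 0 && errno == EINTR)
            continue;               /* interrupted by a signal, retry */
        if (rc < 0)
            return -1;
        return rc > 0 ? 1 : 0;
    }
}

/* Only wait for the reply when the request was actually sent. */
static int request_then_wait(int fd, const char *req, size_t len,
                             int timeout_sec)
{
    if (send_all(fd, req, len) < 0)
        return -1;                  /* do not wait for a reply that cannot come */
    return wait_readable(fd, timeout_sec);
}
```

With this shape, a failed send returns -1 immediately, and a frontend that
never answers results in a timeout (0) rather than a child stuck in select().
Whether this matches what happens on the production servers is still to be
confirmed.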
>
>
> --
> cheers,
> junegunn choi.
--
Yugo Nagata <nagata at sraoss.co.jp>