[pgpool-hackers: 514] Re: Child processes hang during `read_password_packet`

Yugo Nagata nagata at sraoss.co.jp
Tue May 13 21:47:16 JST 2014


Hi,

Which version of pgpool are you using?
And is your pgpool.conf the same as the one in the other thread?
[pgpool-general: 2785] Re: Child process hangs in active state

On Wed, 7 May 2014 11:48:10 +0900
Junegunn Choi <junegunn.c at gmail.com> wrote:

> Hi,
> we had the issue where every child process hangs while waiting
> for the password packet from frontend, finally making the whole
> pgpool cluster unresponsive. The callstack of each process was
> given as follows:
> 
> 
> #0  0x0000003d222cdaf3 in __select_nocancel () from /lib64/libc.so.6
> #1  0x0000000000418c61 in pool_check_fd (cp=<value optimized out>)
>     at pool_process_query.c:951
> #2  0x000000000041d534 in pool_read (cp=0x1da7a210, buf=0x7ffff5092d3f, len=1)
>     at pool_stream.c:139
> #3  0x000000000040b9f0 in read_password_packet (frontend=0x1da7a210,
>     protoMajor=<value optimized out>,
>     password=0x70a460 "md55c81acdb03ea852f30d0630528697236", pwdSize=0x70a860)
>     at pool_auth.c:1047
> #4  0x000000000040c8d2 in do_md5 (backend=0x1da63c70, frontend=0x1da7a210,
>     reauth=1, protoMajor=3) at pool_auth.c:867
> #5  0x000000000040cceb in pool_do_reauth (frontend=0x1da7a210, cp=0x1da5ec70)
>     at pool_auth.c:421
> #6  0x000000000040a9c5 in connect_using_existing_connection (unix_fd=4,
>     inet_fd=5) at child.c:1043
> #7  do_child (unix_fd=4, inet_fd=5) at child.c:330
> #8  0x000000000040455f in fork_a_child (unix_fd=4, inet_fd=5, id=0)
>     at main.c:1258
> #9  0x0000000000404887 in reaper () at main.c:2482
> #10 0x0000000000407a47 in main (argc=<value optimized out>, argv=0x0)
>     at main.c:714
> 
> We looked into the code and realized that the send_md5auth_request
> function, which is called just before read_password_packet,
> always returns 0. So we suspect that those processes were waiting
> for the response even when they failed to correctly send the
> request. Strangely, we couldn't reproduce the exact problem on
> our test cluster, but it was still happening on our
> production servers several times a day. We had to quickly find

What is the difference between the test cluster and the production environment?
Are pgpool.conf and the network load the same in both?

> a workaround for this recurring problem, so we commented out
> the do_md5 part of pool_do_reauth, effectively disabling the
> authentication, and it has not occurred since. Although
> our cluster is running smoothly now, we crippled the md5
> authentication. I believe this problem deserves attention and a
> proper fix. Thanks.
> 
> 
> -- 
> cheers,
> junegunn choi.
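
To make the suspicion in the report concrete, here is a minimal sketch in C
of the two things that would keep a child from blocking forever in this
situation: a send routine that reports a failed write instead of always
returning 0, and a read that gives up after a timeout. This is not pgpool's
code. The names send_all and read_byte_timeout are hypothetical, and whether
this matches what actually happens inside send_md5auth_request and
pool_check_fd still needs to be checked against the source.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a pgpool function): write the whole buffer
 * and report failure to the caller. Returns 0 on success, -1 on error. */
static int send_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* real error: let the caller see it */
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper (not a pgpool function): wait at most timeout_sec
 * seconds for one byte from fd.
 * Returns 1 if a byte was read, 0 on timeout, -1 on error or EOF. */
static int read_byte_timeout(int fd, char *out, int timeout_sec)
{
    for (;;) {
        fd_set rfds;
        struct timeval tv;
        int rc;
        ssize_t n;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc < 0) {
            if (errno == EINTR)
                continue;       /* interrupted: wait again */
            return -1;
        }
        if (rc == 0)
            return 0;           /* nothing arrived within the timeout */

        n = read(fd, out, 1);
        if (n == 1)
            return 1;
        if (n < 0 && errno == EINTR)
            continue;
        return -1;              /* read error, or peer closed (n == 0) */
    }
}
```

With helpers like these, the caller would skip the password read when the
send returns -1, and would close the frontend connection when the read
returns 0 or -1 instead of waiting indefinitely.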


-- 
Yugo Nagata <nagata at sraoss.co.jp>

