[pgpool-hackers: 515] Re: Child processes hang during `read_password_packet`

Junegunn Choi junegunn.c at gmail.com
Wed May 14 14:36:40 JST 2014


> What version of pgpool do you use?
> And is pgpool.conf the same as in the other thread?

Yes, the same version and the same configuration.

> [pgpool-general: 2785] Re: Child process hangs in active state

Actually we mitigated the issue by setting client_idle_limit.
The dev team is investigating why some connections are leaking.

This "read_password_packet" call stack was also observed back then,
though rarely. After we set client_idle_limit, it became the most
common one.

> What is the difference between the test cluster and the production environment?
> Are pgpool.conf and network load under the same conditions?

I really can't say they're identical. For various reasons, we currently
can't afford to have two identical sets of clusters. (FYI, the production
cluster consists of over 20 postgres shards and 9 servers,
each of which runs multiple instances of pgpool backed by L7.)

After we commented out the "do_md5" part of "pool_do_reauth",
we now see the same problem in pool_do_auth (note that we
have child_max_connections set to 2000):

#0  0x0000003d222cdaf3 in __select_nocancel () from /lib64/libc.so.6
#1  0x0000000000418a61 in pool_check_fd (cp=<value optimized out>) at pool_process_query.c:951
#2  0x000000000041d334 in pool_read (cp=0x12c80c00, buf=0x7fff9d1e82df, len=1) at pool_stream.c:139
#3  0x000000000040b9f0 in read_password_packet (frontend=0x12c80c00, protoMajor=<value optimized out>, password=0x70a460 "", pwdSize=0x70a860) at pool_auth.c:1047
#4  0x000000000040c5f2 in do_md5 (backend=0x12c83800, frontend=0x12c80c00, reauth=0, protoMajor=3) at pool_auth.c:867
#5  0x000000000040c8f9 in pool_do_auth (frontend=0x12c80c00, cp=0x12c7ea40) at pool_auth.c:222
#6  0x000000000040b079 in connect_backend (unix_fd=5, inet_fd=6) at child.c:1241
#7  do_child (unix_fd=5, inet_fd=6) at child.c:320
#8  0x000000000040455f in fork_a_child (unix_fd=5, inet_fd=6, id=2) at main.c:1258
#9  0x0000000000404887 in reaper () at main.c:2482
#10 0x0000000000404c15 in pool_sleep (second=<value optimized out>) at main.c:2679
#11 0x00000000004079fa in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:856
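
The trace shows the child sitting in select() beneath pool_check_fd while
it waits for the password packet. As a generic illustration only (this is
not pgpool's code, and wait_readable is a hypothetical helper name), a
bounded wait on a descriptor could be sketched like this:

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of pgpool: wait until fd becomes
 * readable or timeout_sec seconds pass.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
static int wait_readable(int fd, int timeout_sec)
{
    assert(fd >= 0 && fd < FD_SETSIZE); /* select() cannot watch larger fds */

    for (;;)
    {
        fd_set readfds;
        struct timeval tv;
        int rc;

        /* select() may modify both arguments, so rebuild them each time */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        rc = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (rc > 0)
            return 1;       /* data (or EOF) is available */
        if (rc == 0)
            return 0;       /* timed out: the peer never sent anything */
        if (errno != EINTR)
            return -1;      /* real error */
        /* interrupted by a signal: retry with a fresh timeout */
    }
}
```

With a helper like this, a caller can give up on a silent client rather
than wait on it forever; whether that suits pgpool's authentication path
is a separate question.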

We plan to patch the do_md5 code path so that it actually checks the
result codes of the functions it calls, as in send_md5auth_request below:

diff --git a/pool_auth.c b/pool_auth.c
index f70383b..cb185ab 100644
--- a/pool_auth.c
+++ b/pool_auth.c
@@ -1008,19 +1008,21 @@ static int do_md5(POOL_CONNECTION *backend, POOL_CONNECTION *frontend, int reaut
  */
 static int send_md5auth_request(POOL_CONNECTION *frontend, int protoMajor, char *salt)
 {
+  #define CHECK_RET(expr) if (expr) return -1
  int len;
  int kind;

- pool_write(frontend, "R", 1); /* authentication */
+ CHECK_RET(pool_write(frontend, "R", 1)); /* authentication */
  if (protoMajor == PROTO_MAJOR_V3)
  {
  len = htonl(12);
- pool_write(frontend, &len, sizeof(len));
+ CHECK_RET(pool_write(frontend, &len, sizeof(len)));
  }
  kind = htonl(5);
- pool_write(frontend, &kind, sizeof(kind)); /* indicating MD5 */
- pool_write_and_flush(frontend, salt, 4); /* salt */
+ CHECK_RET(pool_write(frontend, &kind, sizeof(kind))); /* indicating MD5 */
+ CHECK_RET(pool_write_and_flush(frontend, salt, 4)); /* salt */

+  #undef CHECK_RET
  return 0;
 }
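
One note on the macro itself: as written it expands to a bare "if", which
is fine in the patch above but can misbehave next to an unbraced if/else.
A common C idiom is to wrap it in do { } while (0). The sketch below shows
that variant in isolation. fake_write and send_request are hypothetical
stand-ins, not pgpool functions, and it assumes, as the patch does, that a
nonzero return means failure:

```c
#include <assert.h>
#include <stddef.h>

/* Wrapping the check in do { } while (0) makes the macro behave as a
 * single statement, even inside an unbraced if/else. */
#define CHECK_RET(expr) do { if ((expr) != 0) return -1; } while (0)

/* Hypothetical stand-in for pool_write: succeeds *budget times,
 * then fails with -1. */
static int fake_write(int *budget, const void *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (*budget <= 0)
        return -1;
    (*budget)--;
    return 0;
}

/* Same shape as the patched function: stop at the first failed write.
 * with_length plays the role of the protoMajor check. */
static int send_request(int *budget, int with_length)
{
    CHECK_RET(fake_write(budget, "R", 1));
    if (with_length)
        CHECK_RET(fake_write(budget, "LEN4", 4)); /* safe without braces */
    CHECK_RET(fake_write(budget, "salt", 4));
    return 0;
}
```

The first failing write makes send_request return -1 at once, so the
caller never goes on to wait for a reply to a request that was not sent.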

I'll keep you updated on our progress.


Regards,
Junegunn Choi

