[pgpool-hackers: 1966] Re: Problem with log_line_prefix()

Tatsuo Ishii ishii at sraoss.co.jp
Fri Jan 6 15:22:21 JST 2017


After some investigations, I found PostgreSQL does not block all
signals: SIGTERM etc. still can interrupt.

Probably we should take out elog/ereport call from child.c:die().

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> I have asked this how PostgreSQL solves similar problems like this.
> It turned out that PostgreSQL mostly blocks all signals:
> 
> https://www.postgresql.org/message-id/18405.1482794285%40sss.pgh.pa.us
> 
> Probably we should do the same way...
> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
> 
>> Usama,
>> 
>> If I hit CTRL-C key while executing the regression test, sometimes a
>> orphan pgpool child remains. Here's a stack trace of the process:
>> 
>> t-ishii at localhost: gdb temp/installed/bin/pgpool 3794
>> GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
>> [snip]
>> Loaded symbols for /lib/x86_64-linux-gnu/libnss_files.so.2
>> __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
>> 95	../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: そのようなファイルやディレクトリはありません.
>> (gdb) bt
>> #0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
>> #1  0x00007f67fe20ccba in _L_lock_12808 () from /lib/x86_64-linux-gnu/libc.so.6
>> #2  0x00007f67fe20a6b5 in __GI___libc_malloc (bytes=15) at malloc.c:2887
>> #3  0x00007f67fe21072a in __GI___strdup (s=0x7f67fe305dd8 "/etc/localtime") at strdup.c:42
>> #4  0x00007f67fe239f51 in tzset_internal (always=<optimized out>, explicit=explicit at entry=1)
>>     at tzset.c:444
>> #5  0x00007f67fe23a913 in __tz_convert (timer=timer at entry=0x7ffce1c1b7f8, 
>>     use_localtime=use_localtime at entry=1, tp=tp at entry=0x7f67fe54bde0 <_tmbuf>) at tzset.c:632
>> #6  0x00007f67fe2387d1 in __GI_localtime (t=t at entry=0x7ffce1c1b7f8) at localtime.c:42
>> #7  0x000000000045627b in log_line_prefix (buf=buf at entry=0x7ffce1c1b8d0, line_prefix=<optimized out>, 
>>     edata=<optimized out>) at ../../src/utils/error/elog.c:2059
>> #8  0x000000000045894d in send_message_to_server_log (edata=0x753320 <errordata>)
>>     at ../../src/utils/error/elog.c:2084
>> #9  EmitErrorReport () at ../../src/utils/error/elog.c:1129
>> #10 0x0000000000456d8e in errfinish (dummy=<optimized out>) at ../../src/utils/error/elog.c:434
>> #11 0x0000000000421f57 in die (sig=2) at protocol/child.c:925
>> #12 <signal handler called>
>> #13 _int_malloc (av=0x7f67fe546760 <main_arena>, bytes=4176) at malloc.c:3302
>> #14 0x00007f67fe20a6c0 in __GI___libc_malloc (bytes=4176) at malloc.c:2891
>> #15 0x000000000045561d in AllocSetAlloc (context=0xf90000, size=4128) at ../../src/utils/mmgr/aset.c:674
>> #16 0x000000000042cd1e in pool_init_cp () at protocol/pool_connection_pool.c:70
>> #17 0x0000000000423024 in do_child (fds=fds at entry=0xfa35c0) at protocol/child.c:194
>> #18 0x00000000004086f5 in fork_a_child (fds=0xfa35c0, id=11) at main/pgpool_main.c:723
>> #19 0x00000000004090cc in reaper () at main/pgpool_main.c:2427
>> #20 0x000000000040d869 in PgpoolMain (discard_status=discard_status at entry=1 '\001', 
>>     clear_memcache_oidmaps=clear_memcache_oidmaps at entry=0 '\000') at main/pgpool_main.c:457
>> #21 0x0000000000406eb3 in main (argc=<optimized out>, argv=<optimized out>) at main/main.c:305
>> 
>> As you can see, the process was stuck while calling malloc().
>> 
>> pool_init_cp() was interrupted by SIGINT (because I hit the key), and
>> the signal handler die() was called. In die(), elog() was
>> called. elog() calls log_line_prefix(), which calls
>> localtime(). localtime() calls malloc(), which seems to try to obtain
>> a lock, then it causes a dead lock since pool_init_cp() was
>> interrupted while calling malloc().
>> 
>> log_line_prefix() was copied from PostgreSQL but I see subtle
>> differences from the PostgreSQL's original log_line_prefix().:
>> 
>> pg_strftime(strfbuf, sizeof(strfbuf),
>> 	"%Y-%m-%d %H:%M:%S %Z",
>> 	pg_localtime(&stamp_time, log_timezone));
>> 
>> Ours:
>> 	strftime(strbuf, 128, "%Y-%m-%d %H:%M:%S", localtime(&now));
>> 
>> Apparently the call stuck shows that the hang happened inside
>> localtime() because it calls malloc(). On the other hand,
>> pg_localtime() seems to avoid calling localtime() by using their home
>> made version of localtime(), which does *not* call malloc(). I'm not
>> sure if it's a intention but it seems it'd better to follow
>> PostgreSQL's way to avoid the hang problem.
>> 
>> What do you think, Usama?
>> 
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>> _______________________________________________
>> pgpool-hackers mailing list
>> pgpool-hackers at pgpool.net
>> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
> _______________________________________________
> pgpool-hackers mailing list
> pgpool-hackers at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-hackers


More information about the pgpool-hackers mailing list