[pgpool-hackers: 1975] Re: Problem with log_line_prefix()

Muhammad Usama m.usama at gmail.com
Wed Jan 11 00:37:02 JST 2017


On Tue, Jan 10, 2017 at 8:08 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Hi Usama,
>
> Problem is, there are some places where standard library functions
> that calls malloc() internally.
> (For example, asprintf(), getaddrinfo()).
>
> You are right, this problem makes this solution not feasible.



> Also blocking signals/unblocking signals every time malloc() called
> might hurt performance (I have not done an actual measurement. So this
> just a guess).
>

Yes the benchmark can revel the actual overhead, but I think the impact of
this change should be
very small, since Memory manager allocates the memory in chunks and
malloc is not issued for every palloc call,
But again its hard to guess the actual impact before doing the measurements.

So I think we can proceed with the simple solution for now and remove the
ereports from the signal handlers.

Kind regards
Muhammad Usama



> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> > Hi Ishii-San
> >
> > I have been thinking about this and I think there is one solution we can
> > adopt.
> >
> > We can block signals before doing malloc() in, AllocSetAlloc() and
> > AllocSetContextCreate() function, And since pgpool-II only uses palloc
> and
> > friends to allocate memory and there are only few places where the
> malloc()
> > are issued by MemoryManager API, So if we block signals before calling
> the
> > malloc(), we can safely use elog/ereport from signal handler routines.
> > What are your thoughts on this. I am attaching the sample patch for the
> > reference.
> >
> > Kind regards
> > Muhammad Usama
> >
> >
> > On Fri, Jan 6, 2017 at 12:03 PM, Muhammad Usama <m.usama at gmail.com>
> wrote:
> >
> >> Hi Ishii-San
> >>
> >>
> >> On Fri, Jan 6, 2017 at 11:22 AM, Tatsuo Ishii <ishii at sraoss.co.jp>
> wrote:
> >>
> >>> After some investigations, I found PostgreSQL does not block all
> >>> signals: SIGTERM etc. still can interrupt.
> >>>
> >>> Probably we should take out elog/ereport call from child.c:die().
> >>>
> >>
> >> Yes, this seems to be a good fix for now, But I am looking for some more
> >> future proof fix. Give me a couple of more days to get to the solution,
> >> otherwise we will go with removing the ereport from child.c:die.
> >>
> >> Thanks
> >> Best regards
> >> Muhammad Usama
> >>
> >>
> >>> Best regards,
> >>> --
> >>> Tatsuo Ishii
> >>> SRA OSS, Inc. Japan
> >>> English: http://www.sraoss.co.jp/index_en.php
> >>> Japanese:http://www.sraoss.co.jp
> >>>
> >>> > I have asked this how PostgreSQL solves similar problems like this.
> >>> > It turned out that PostgreSQL mostly blocks all signals:
> >>> >
> >>> > https://www.postgresql.org/message-id/18405.1482794285%
> 40sss.pgh.pa.us
> >>> >
> >>> > Probably we should do the same way...
> >>> >
> >>> > Best regards,
> >>> > --
> >>> > Tatsuo Ishii
> >>> > SRA OSS, Inc. Japan
> >>> > English: http://www.sraoss.co.jp/index_en.php
> >>> > Japanese:http://www.sraoss.co.jp
> >>> >
> >>> >> Usama,
> >>> >>
> >>> >> If I hit CTRL-C key while executing the regression test, sometimes a
> >>> >> orphan pgpool child remains. Here's a stack trace of the process:
> >>> >>
> >>> >> t-ishii at localhost: gdb temp/installed/bin/pgpool 3794
> >>> >> GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
> >>> >> [snip]
> >>> >> Loaded symbols for /lib/x86_64-linux-gnu/libnss_files.so.2
> >>> >> __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linu
> >>> x/x86_64/lowlevellock.S:95
> >>> >> 95   ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:
> >>> そのようなファイルやディレクトリはありません.
> >>> >> (gdb) bt
> >>> >> #0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linu
> >>> x/x86_64/lowlevellock.S:95
> >>> >> #1  0x00007f67fe20ccba in _L_lock_12808 () from
> >>> /lib/x86_64-linux-gnu/libc.so.6
> >>> >> #2  0x00007f67fe20a6b5 in __GI___libc_malloc (bytes=15) at
> >>> malloc.c:2887
> >>> >> #3  0x00007f67fe21072a in __GI___strdup (s=0x7f67fe305dd8
> >>> "/etc/localtime") at strdup.c:42
> >>> >> #4  0x00007f67fe239f51 in tzset_internal (always=<optimized out>,
> >>> explicit=explicit at entry=1)
> >>> >>     at tzset.c:444
> >>> >> #5  0x00007f67fe23a913 in __tz_convert (timer=timer at entry
> =0x7ffce1c1b
> >>> 7f8,
> >>> >>     use_localtime=use_localtime at entry=1, tp=tp at entry=0x7f67fe54bde0
> >>> <_tmbuf>) at tzset.c:632
> >>> >> #6  0x00007f67fe2387d1 in __GI_localtime (t=t at entry=0x7ffce1c1b7f8)
> >>> at localtime.c:42
> >>> >> #7  0x000000000045627b in log_line_prefix (buf=buf at entry
> =0x7ffce1c1b8d0,
> >>> line_prefix=<optimized out>,
> >>> >>     edata=<optimized out>) at ../../src/utils/error/elog.c:2059
> >>> >> #8  0x000000000045894d in send_message_to_server_log (edata=0x753320
> >>> <errordata>)
> >>> >>     at ../../src/utils/error/elog.c:2084
> >>> >> #9  EmitErrorReport () at ../../src/utils/error/elog.c:1129
> >>> >> #10 0x0000000000456d8e in errfinish (dummy=<optimized out>) at
> >>> ../../src/utils/error/elog.c:434
> >>> >> #11 0x0000000000421f57 in die (sig=2) at protocol/child.c:925
> >>> >> #12 <signal handler called>
> >>> >> #13 _int_malloc (av=0x7f67fe546760 <main_arena>, bytes=4176) at
> >>> malloc.c:3302
> >>> >> #14 0x00007f67fe20a6c0 in __GI___libc_malloc (bytes=4176) at
> >>> malloc.c:2891
> >>> >> #15 0x000000000045561d in AllocSetAlloc (context=0xf90000,
> size=4128)
> >>> at ../../src/utils/mmgr/aset.c:674
> >>> >> #16 0x000000000042cd1e in pool_init_cp () at
> >>> protocol/pool_connection_pool.c:70
> >>> >> #17 0x0000000000423024 in do_child (fds=fds at entry=0xfa35c0) at
> >>> protocol/child.c:194
> >>> >> #18 0x00000000004086f5 in fork_a_child (fds=0xfa35c0, id=11) at
> >>> main/pgpool_main.c:723
> >>> >> #19 0x00000000004090cc in reaper () at main/pgpool_main.c:2427
> >>> >> #20 0x000000000040d869 in PgpoolMain (discard_status=discard_
> status at entry=1
> >>> '\001',
> >>> >>     clear_memcache_oidmaps=clear_memcache_oidmaps at entry=0 '\000')
> at
> >>> main/pgpool_main.c:457
> >>> >> #21 0x0000000000406eb3 in main (argc=<optimized out>,
> argv=<optimized
> >>> out>) at main/main.c:305
> >>> >>
> >>> >> As you can see, the process was stuck while calling malloc().
> >>> >>
> >>> >> pool_init_cp() was interrupted by SIGINT (because I hit the key),
> and
> >>> >> the signal handler die() was called. In die(), elog() was
> >>> >> called. elog() calls log_line_prefix(), which calls
> >>> >> localtime(). localtime() calls malloc(), which seems to try to
> obtain
> >>> >> a lock, then it causes a dead lock since pool_init_cp() was
> >>> >> interrupted while calling malloc().
> >>> >>
> >>> >> log_line_prefix() was copied from PostgreSQL but I see subtle
> >>> >> differences from the PostgreSQL's original log_line_prefix().:
> >>> >>
> >>> >> pg_strftime(strfbuf, sizeof(strfbuf),
> >>> >>      "%Y-%m-%d %H:%M:%S %Z",
> >>> >>      pg_localtime(&stamp_time, log_timezone));
> >>> >>
> >>> >> Ours:
> >>> >>      strftime(strbuf, 128, "%Y-%m-%d %H:%M:%S", localtime(&now));
> >>> >>
> >>> >> Apparently the call stuck shows that the hang happened inside
> >>> >> localtime() because it calls malloc(). On the other hand,
> >>> >> pg_localtime() seems to avoid calling localtime() by using their
> home
> >>> >> made version of localtime(), which does *not* call malloc(). I'm not
> >>> >> sure if it's a intention but it seems it'd better to follow
> >>> >> PostgreSQL's way to avoid the hang problem.
> >>> >>
> >>> >> What do you think, Usama?
> >>> >>
> >>> >> Best regards,
> >>> >> --
> >>> >> Tatsuo Ishii
> >>> >> SRA OSS, Inc. Japan
> >>> >> English: http://www.sraoss.co.jp/index_en.php
> >>> >> Japanese:http://www.sraoss.co.jp
> >>> >> _______________________________________________
> >>> >> pgpool-hackers mailing list
> >>> >> pgpool-hackers at pgpool.net
> >>> >> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
> >>> > _______________________________________________
> >>> > pgpool-hackers mailing list
> >>> > pgpool-hackers at pgpool.net
> >>> > http://www.pgpool.net/mailman/listinfo/pgpool-hackers
> >>> _______________________________________________
> >>> pgpool-hackers mailing list
> >>> pgpool-hackers at pgpool.net
> >>> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
> >>>
> >>
> >>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20170110/47f9870e/attachment.html>


More information about the pgpool-hackers mailing list