[pgpool-hackers: 1978] Re: Problem with log_line_prefix()

Muhammad Usama m.usama at gmail.com
Wed Jan 11 19:37:02 JST 2017


On Wed, Jan 11, 2017 at 5:44 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> > On Tue, Jan 10, 2017 at 8:08 AM, Tatsuo Ishii <ishii at sraoss.co.jp>
> wrote:
> >
> >> Hi Usama,
> >>
> >> Problem is, there are some places where standard library functions
> >> that calls malloc() internally.
> >> (For example, asprintf(), getaddrinfo()).
> >>
> >> You are right, this problem makes this solution not feasible.
> >
> >
> >
> >> Also blocking signals/unblocking signals every time malloc() called
> >> might hurt performance (I have not done an actual measurement. So this
> >> just a guess).
> >>
> >
> > Yes the benchmark can revel the actual overhead, but I think the impact
> of
> > this change should be
> > very small, since Memory manager allocates the memory in chunks and
> > malloc is not issued for every palloc call,
> > But again its hard to guess the actual impact before doing the
> measurements.
> >
> > So I think we can proceed with the simple solution for now and remove the
> > ereports from the signal handlers.
>
> I have pushed changes in the direction.
>

Thanks
Best regards
Muhammad usama


>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> > Kind regards
> > Muhammad Usama
> >
> >
> >
> >> Best regards,
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese:http://www.sraoss.co.jp
> >>
> >> > Hi Ishii-San
> >> >
> >> > I have been thinking about this and I think there is one solution we
> can
> >> > adopt.
> >> >
> >> > We can block signals before doing malloc() in, AllocSetAlloc() and
> >> > AllocSetContextCreate() function, And since pgpool-II only uses palloc
> >> and
> >> > friends to allocate memory and there are only few places where the
> >> malloc()
> >> > are issued by MemoryManager API, So if we block signals before calling
> >> the
> >> > malloc(), we can safely use elog/ereport from signal handler routines.
> >> > What are your thoughts on this. I am attaching the sample patch for
> the
> >> > reference.
> >> >
> >> > Kind regards
> >> > Muhammad Usama
> >> >
> >> >
> >> > On Fri, Jan 6, 2017 at 12:03 PM, Muhammad Usama <m.usama at gmail.com>
> >> wrote:
> >> >
> >> >> Hi Ishii-San
> >> >>
> >> >>
> >> >> On Fri, Jan 6, 2017 at 11:22 AM, Tatsuo Ishii <ishii at sraoss.co.jp>
> >> wrote:
> >> >>
> >> >>> After some investigations, I found PostgreSQL does not block all
> >> >>> signals: SIGTERM etc. still can interrupt.
> >> >>>
> >> >>> Probably we should take out elog/ereport call from child.c:die().
> >> >>>
> >> >>
> >> >> Yes, this seems to be a good fix for now, But I am looking for some
> more
> >> >> future proof fix. Give me a couple of more days to get to the
> solution,
> >> >> otherwise we will go with removing the ereport from child.c:die.
> >> >>
> >> >> Thanks
> >> >> Best regards
> >> >> Muhammad Usama
> >> >>
> >> >>
> >> >>> Best regards,
> >> >>> --
> >> >>> Tatsuo Ishii
> >> >>> SRA OSS, Inc. Japan
> >> >>> English: http://www.sraoss.co.jp/index_en.php
> >> >>> Japanese:http://www.sraoss.co.jp
> >> >>>
> >> >>> > I have asked this how PostgreSQL solves similar problems like
> this.
> >> >>> > It turned out that PostgreSQL mostly blocks all signals:
> >> >>> >
> >> >>> > https://www.postgresql.org/message-id/18405.1482794285%
> >> 40sss.pgh.pa.us
> >> >>> >
> >> >>> > Probably we should do the same way...
> >> >>> >
> >> >>> > Best regards,
> >> >>> > --
> >> >>> > Tatsuo Ishii
> >> >>> > SRA OSS, Inc. Japan
> >> >>> > English: http://www.sraoss.co.jp/index_en.php
> >> >>> > Japanese:http://www.sraoss.co.jp
> >> >>> >
> >> >>> >> Usama,
> >> >>> >>
> >> >>> >> If I hit CTRL-C key while executing the regression test,
> sometimes a
> >> >>> >> orphan pgpool child remains. Here's a stack trace of the process:
> >> >>> >>
> >> >>> >> t-ishii at localhost: gdb temp/installed/bin/pgpool 3794
> >> >>> >> GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
> >> >>> >> [snip]
> >> >>> >> Loaded symbols for /lib/x86_64-linux-gnu/libnss_files.so.2
> >> >>> >> __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linu
> >> >>> x/x86_64/lowlevellock.S:95
> >> >>> >> 95   ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:
> >> >>> そのようなファイルやディレクトリはありません.
> >> >>> >> (gdb) bt
> >> >>> >> #0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linu
> >> >>> x/x86_64/lowlevellock.S:95
> >> >>> >> #1  0x00007f67fe20ccba in _L_lock_12808 () from
> >> >>> /lib/x86_64-linux-gnu/libc.so.6
> >> >>> >> #2  0x00007f67fe20a6b5 in __GI___libc_malloc (bytes=15) at
> >> >>> malloc.c:2887
> >> >>> >> #3  0x00007f67fe21072a in __GI___strdup (s=0x7f67fe305dd8
> >> >>> "/etc/localtime") at strdup.c:42
> >> >>> >> #4  0x00007f67fe239f51 in tzset_internal (always=<optimized out>,
> >> >>> explicit=explicit at entry=1)
> >> >>> >>     at tzset.c:444
> >> >>> >> #5  0x00007f67fe23a913 in __tz_convert (timer=timer at entry
> >> =0x7ffce1c1b
> >> >>> 7f8,
> >> >>> >>     use_localtime=use_localtime at entry=1, tp=tp at entry
> =0x7f67fe54bde0
> >> >>> <_tmbuf>) at tzset.c:632
> >> >>> >> #6  0x00007f67fe2387d1 in __GI_localtime (t=t at entry
> =0x7ffce1c1b7f8)
> >> >>> at localtime.c:42
> >> >>> >> #7  0x000000000045627b in log_line_prefix (buf=buf at entry
> >> =0x7ffce1c1b8d0,
> >> >>> line_prefix=<optimized out>,
> >> >>> >>     edata=<optimized out>) at ../../src/utils/error/elog.c:2059
> >> >>> >> #8  0x000000000045894d in send_message_to_server_log
> (edata=0x753320
> >> >>> <errordata>)
> >> >>> >>     at ../../src/utils/error/elog.c:2084
> >> >>> >> #9  EmitErrorReport () at ../../src/utils/error/elog.c:1129
> >> >>> >> #10 0x0000000000456d8e in errfinish (dummy=<optimized out>) at
> >> >>> ../../src/utils/error/elog.c:434
> >> >>> >> #11 0x0000000000421f57 in die (sig=2) at protocol/child.c:925
> >> >>> >> #12 <signal handler called>
> >> >>> >> #13 _int_malloc (av=0x7f67fe546760 <main_arena>, bytes=4176) at
> >> >>> malloc.c:3302
> >> >>> >> #14 0x00007f67fe20a6c0 in __GI___libc_malloc (bytes=4176) at
> >> >>> malloc.c:2891
> >> >>> >> #15 0x000000000045561d in AllocSetAlloc (context=0xf90000,
> >> size=4128)
> >> >>> at ../../src/utils/mmgr/aset.c:674
> >> >>> >> #16 0x000000000042cd1e in pool_init_cp () at
> >> >>> protocol/pool_connection_pool.c:70
> >> >>> >> #17 0x0000000000423024 in do_child (fds=fds at entry=0xfa35c0) at
> >> >>> protocol/child.c:194
> >> >>> >> #18 0x00000000004086f5 in fork_a_child (fds=0xfa35c0, id=11) at
> >> >>> main/pgpool_main.c:723
> >> >>> >> #19 0x00000000004090cc in reaper () at main/pgpool_main.c:2427
> >> >>> >> #20 0x000000000040d869 in PgpoolMain (discard_status=discard_
> >> status at entry=1
> >> >>> '\001',
> >> >>> >>     clear_memcache_oidmaps=clear_memcache_oidmaps at entry=0
> '\000')
> >> at
> >> >>> main/pgpool_main.c:457
> >> >>> >> #21 0x0000000000406eb3 in main (argc=<optimized out>,
> >> argv=<optimized
> >> >>> out>) at main/main.c:305
> >> >>> >>
> >> >>> >> As you can see, the process was stuck while calling malloc().
> >> >>> >>
> >> >>> >> pool_init_cp() was interrupted by SIGINT (because I hit the key),
> >> and
> >> >>> >> the signal handler die() was called. In die(), elog() was
> >> >>> >> called. elog() calls log_line_prefix(), which calls
> >> >>> >> localtime(). localtime() calls malloc(), which seems to try to
> >> obtain
> >> >>> >> a lock, then it causes a dead lock since pool_init_cp() was
> >> >>> >> interrupted while calling malloc().
> >> >>> >>
> >> >>> >> log_line_prefix() was copied from PostgreSQL but I see subtle
> >> >>> >> differences from the PostgreSQL's original log_line_prefix().:
> >> >>> >>
> >> >>> >> pg_strftime(strfbuf, sizeof(strfbuf),
> >> >>> >>      "%Y-%m-%d %H:%M:%S %Z",
> >> >>> >>      pg_localtime(&stamp_time, log_timezone));
> >> >>> >>
> >> >>> >> Ours:
> >> >>> >>      strftime(strbuf, 128, "%Y-%m-%d %H:%M:%S", localtime(&now));
> >> >>> >>
> >> >>> >> Apparently the call stuck shows that the hang happened inside
> >> >>> >> localtime() because it calls malloc(). On the other hand,
> >> >>> >> pg_localtime() seems to avoid calling localtime() by using their
> >> home
> >> >>> >> made version of localtime(), which does *not* call malloc(). I'm
> not
> >> >>> >> sure if it's a intention but it seems it'd better to follow
> >> >>> >> PostgreSQL's way to avoid the hang problem.
> >> >>> >>
> >> >>> >> What do you think, Usama?
> >> >>> >>
> >> >>> >> Best regards,
> >> >>> >> --
> >> >>> >> Tatsuo Ishii
> >> >>> >> SRA OSS, Inc. Japan
> >> >>> >> English: http://www.sraoss.co.jp/index_en.php
> >> >>> >> Japanese:http://www.sraoss.co.jp
> >> >>> >> _______________________________________________
> >> >>> >> pgpool-hackers mailing list
> >> >>> >> pgpool-hackers at pgpool.net
> >> >>> >> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
> >> >>> > _______________________________________________
> >> >>> > pgpool-hackers mailing list
> >> >>> > pgpool-hackers at pgpool.net
> >> >>> > http://www.pgpool.net/mailman/listinfo/pgpool-hackers
> >> >>> _______________________________________________
> >> >>> pgpool-hackers mailing list
> >> >>> pgpool-hackers at pgpool.net
> >> >>> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
> >> >>>
> >> >>
> >> >>
> >>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20170111/66199ad8/attachment-0001.html>


More information about the pgpool-hackers mailing list