<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jan 4, 2017 at 11:39 AM, Tatsuo Ishii <span dir="ltr"><<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">> Hi ishii-San<br>
><br>
> I am looking into the issue<br>
> <a href="http://www.pgpool.net/mantisbt/view.php?id=249" rel="noreferrer" target="_blank">http://www.pgpool.net/<wbr>mantisbt/view.php?id=249</a>, where<br>
> pgpool-II sometimes does not perform de-escalation while shutting down. As per<br>
> the bug report, the issue started to appear after this commit.<br>
><br>
> Although I am not able to replicate the exact reported issue, it seems<br>
> that the changes made by this commit can leave zombie processes.<br>
><br>
> The commit replaces wait(NULL) with waitpid(..., WNOHANG):<br>
><br>
> @@ -1365,8 +1367,10 @@ static RETSIGTYPE exit_handler(int sig)<br>
> POOL_SETMASK(&UnBlockSig);<br>
> do<br>
> {<br>
> - wpid = wait(NULL);<br>
> - }while (wpid > 0 || (wpid == -1 && errno == EINTR));<br>
> + int ret_pid;<br>
> + wpid = waitpid(-1, &ret_pid, WNOHANG);<br>
> + } while (wpid > 0 || (wpid == -1 && errno == EINTR));<br>
><br>
> The problem with this logic is that after replacing wait(NULL) with<br>
> waitpid(..., WNOHANG) we can move forward without waiting for all child<br>
> processes to finish, especially if some child process takes a little longer<br>
> to finish. waitpid() with WNOHANG returns 0 when no child has exited yet,<br>
> even though child processes still exist.<br>
> For example,<br>
> at the time of system shutdown, the watchdog process sometimes takes a few<br>
> seconds to execute the de-escalation process before exiting. Meanwhile,<br>
> in the main process, waitpid(..., WNOHANG) returns 0 right away and the<br>
> pgpool-II main process exits, leaving the watchdog process as a<br>
> zombie.<br>
<br>
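For anyone following the thread, here is a minimal sketch of the difference. It is not the pgpool-II code: the helper names spawn_slow_children() and reap_children() are made up for illustration, and only the do/while loop mirrors exit_handler(). With WNOHANG the loop stops as soon as waitpid() returns 0 while the children are still running; with flags set to 0 it blocks until every child has been reaped.<br>
<pre>
```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork n children that sleep delay_s seconds, then exit. */
void spawn_slow_children(int n, unsigned int delay_s)
{
    for (int i = 0; i < n; i++)
    {
        pid_t pid = fork();

        assert(pid >= 0);       /* fork() is not expected to fail in this sketch */
        if (pid == 0)
        {
            sleep(delay_s);     /* stands in for slow shutdown work in a child */
            _exit(0);
        }
    }
}

/*
 * Same loop shape as in exit_handler(). flags is WNOHANG or 0.
 * Returns the number of children actually reaped.
 */
int reap_children(int flags)
{
    int   reaped = 0;
    pid_t wpid;

    do
    {
        int status;

        wpid = waitpid(-1, &status, flags);
        if (wpid > 0)
            reaped++;
        /* With WNOHANG, wpid == 0 means "children exist, none has exited yet". */
    } while (wpid > 0 || (wpid == -1 && errno == EINTR));

    return reaped;
}
```
</pre>
Calling spawn_slow_children(3, 1), then reap_children(WNOHANG), then reap_children(0) should reap nothing in the first call and all three children in the second, assuming the first call runs before the one-second sleep ends.<br>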
</div></div>You are right. I should not have used WNOHANG here. The line should<br>
have been:<br>
<br>
wpid = waitpid(-1, &ret_pid, 0);<br></blockquote><div><br></div><div>Thanks for the confirmation. I have committed this change. </div><div><br></div><div>Regards</div><div>Muhammad Usama</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<span class=""><br>
> Also, could you share the scenario where you ran into the infinite wait?<br>
> There may be some other issue in the code: as per the wait() system call<br>
> documentation, it returns -1 when there are no child processes, so in<br>
> theory the wait() call should not block forever.<br>
<br>
</span>I do not remember clearly, but it may be the case when a child receives a<br>
stop signal (SIGSTOP).<br>
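A small sketch of why a stopped child could produce this, assuming that is indeed what happened (the thread does not confirm it). The function name stopped_child_hidden_from_plain_wait() is hypothetical. A child stopped by SIGSTOP has not terminated, so wait(NULL) and waitpid(..., 0) do not report it and keep blocking until it is continued or killed; only WUNTRACED makes the stop visible.<br>
<pre>
```c
#define _POSIX_C_SOURCE 200809L   /* for kill() when compiling with strict -std=c11 */

#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns 1 if a stopped child is NOT reported by
 * waitpid() without WUNTRACED, 0 otherwise.
 */
int stopped_child_hidden_from_plain_wait(void)
{
    int   status;
    int   rc;
    pid_t seen;
    pid_t pid = fork();

    assert(pid >= 0);
    if (pid == 0)
    {
        for (;;)
            pause();            /* child idles until it is signalled */
    }

    rc = kill(pid, SIGSTOP);
    assert(rc == 0);

    /* Block until the stop has taken effect; this needs WUNTRACED. */
    seen = waitpid(pid, &status, WUNTRACED);
    assert(seen == pid && WIFSTOPPED(status));

    /*
     * Ask the same question wait(NULL) asks, but without blocking:
     * 0 means the child exists and has nothing to report, so a
     * blocking wait would sit here until the child is continued or killed.
     */
    seen = waitpid(pid, &status, WNOHANG);

    /* Clean up: SIGKILL terminates even a stopped process, then reap it. */
    rc = kill(pid, SIGKILL);
    assert(rc == 0);
    (void) waitpid(pid, &status, 0);

    return seen == 0;
}
```
</pre>
If this is the cause, the infinite wait comes from the child never terminating, not from wait(2) itself misbehaving.<br>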
<div class="HOEnZb"><div class="h5"><br>
> On Thu, Jul 7, 2016 at 11:55 AM, Tatsuo Ishii <<a href="mailto:ishii@postgresql.org">ishii@postgresql.org</a>> wrote:<br>
><br>
>> Fix usage of wait(2) in pgpool main process<br>
>><br>
>> Per [pgpool-hackers: 1444]. Here is the copy of the message:<br>
>><br>
>> Hi Usama,<br>
>><br>
>> I have noticed that the usage of wait(2) in pgpool main could cause<br>
>> infinite wait in the system call.<br>
>><br>
>> /* wait for all children to exit */<br>
>> do<br>
>> {<br>
>> wpid = wait(NULL);<br>
>> }while (wpid > 0 || (wpid == -1 && errno == EINTR));<br>
>><br>
>> When a child process dies, a SIGCHLD signal is raised and wait(2) sees<br>
>> the event. However, multiple child deaths do not necessarily create<br>
>> the same number of SIGCHLD signals as the number of dead children, and<br>
>> wait(2) could wait for an event which never happens in this case. I<br>
>> actually encountered this situation while testing pgpool-II. The solution<br>
>> is to use waitpid(2) instead of wait(2).<br>
>><br>
>> Branch<br>
>> ------<br>
>> master<br>
>><br>
>> Details<br>
>> -------<br>
>> <a href="http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=0d1cdf96feb77de6f1dfc2d46ecd7467325d1f79" rel="noreferrer" target="_blank">http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=0d1cdf96feb77de6f1dfc2d46ecd7467325d1f79</a><br>
>><br>
>> Modified Files<br>
>> --------------<br>
>> src/main/pgpool_main.c | 12 ++++++++----<br>
>> 1 file changed, 8 insertions(+), 4 deletions(-)<br>
>><br>
</div></div></blockquote></div><br></div></div>