[pgpool-general: 1142] Re: Possible race condition in pool_get_passwd

Tatsuo Ishii ishii at postgresql.org
Sun Oct 28 17:57:43 JST 2012


> Hello all,
> 
> I've been digging into an issue we've been having when many threads
> connect to the pgpool backend at once.  Some of the threads are
> failing md5 authentication without ever sending an auth packet.
> 
> We're currently running pgpool 3.1.3 on FreeBSD.
> 
> After several hours of digging, I determined that pool_get_passwd was
> returning NULL.  I littered the thing with pool_debugs and started
> digging in.  The interesting thing was the more debugging I added, the
> more often the issue occurred.  This got me looking more.
> 
> It appears there is a race condition based on the fact that the
> underlying fd's are shared between the child processes.  Note this
> from the fork(2) man page on FreeBSD:
> 
> The child process has its own copy of the parent's descriptors.
> These descriptors reference the same underlying objects, so
> that, for instance, file pointers in file objects are shared
> between the child and the parent, so that an lseek(2) on a
> descriptor in the child process can affect a subsequent read(2)
> or write(2) by the parent.  This descriptor copying is also
> used by the shell to establish standard input and output for
> newly created processes as well as to set up pipes.
> 
> The Linux man page has a similar note about them sharing offsets.
> 
> In all cases where I see the error, the first call to fgetc in
> pool_get_passwd returns EOF, even though the file pointer in the FILE
> struct is at 0 right before the call.
> 
> In this snippet, QQQ reports the position returned by ftell(), while
> NNN reports the position obtained by calling
> lseek(fileno(passwd_fd),0,SEEK_CUR)
> 
> Oct 28 01:10:09 apu pgpool[78129]: QQQ Pos 0
> Oct 28 01:10:09 apu pgpool[78129]: NNN Pos 48
> 
> As you can see, the FILE struct thinks we're at 0, while the
> underlying kernel fd shows us to be at 48.  This causes the subsequent
> fgetc to fail.
> 
> Is the process of sharing the FILE struct, and consequently the
> underlying fd, safe?  Would a better approach be to cache the contents
> of the passwd file in memory, and provide a mechanism to reload them
> on reload?
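
The shared-offset behaviour described in the quoted report can be
reproduced outside pgpool. The following is a minimal sketch, not pgpool
code: shared_offset_demo() is a hypothetical name, and the file it reads
is any small test file. The parent opens the file before fork(), the
child reads it to EOF, and the parent then finds the kernel offset moved
and gets EOF from its very first fgetc(). What ftell() reports at that
point depends on the stdio implementation, so the sketch checks only
lseek().

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo (not pgpool code).
 * Opens `path` before fork(), lets the child read it to EOF, and returns
 * the kernel file offset the parent sees afterwards (-1 on error).
 * *first_char receives the result of the parent's first fgetc().
 */
long shared_offset_demo(const char *path, int *first_char)
{
    FILE *fp = fopen(path, "r");    /* opened once, before fork() */
    if (fp == NULL)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        fclose(fp);
        return -1;
    }
    if (pid == 0) {
        /* Child: read through its copy of the FILE. The descriptor
         * underneath refers to the same open file as the parent's. */
        while (fgetc(fp) != EOF)
            ;
        _exit(0);                   /* exit without running stdio cleanup */
    }

    /* Parent: wait, so the child's reads have definitely happened. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        fclose(fp);
        return -1;
    }

    /* The parent has read nothing, yet the kernel offset has advanced. */
    long kernel_pos = (long) lseek(fileno(fp), 0, SEEK_CUR);

    /* The parent's stdio buffer is empty, so fgetc() issues a read(2)
     * at the shared offset, which is already at end of file. */
    *first_char = fgetc(fp);

    fclose(fp);
    return kernel_pos;
}
```

With a 48-byte file, the function returns 48 and stores EOF in
*first_char, even though the parent never read from the stream.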

Thanks for the analysis. I should have realized this earlier. I think
a less invasive approach would be to reopen the fd when each child
process starts up. Could you try the attached patch?
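
The attached patch is not reproduced in the archive, so the following is
only an illustrative sketch of the general idea and not the patch itself.
The names reopen_private() and private_offset_demo() are hypothetical.
The point it shows: a stream opened with fopen() after fork() gets its
own open file description, so reads in the child no longer move the
offset seen by the parent or by other children.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: replace an inherited stream with a fresh one. */
FILE *reopen_private(FILE *inherited, const char *path)
{
    if (inherited != NULL)
        fclose(inherited);      /* closes only this process's descriptor */
    return fopen(path, "r");    /* new open file description, offset 0 */
}

/*
 * Hypothetical demo (not pgpool code).
 * The child reopens the file before reading it to EOF. Returns the
 * kernel offset the parent sees afterwards (-1 on error); *first_char
 * receives the parent's first fgetc() result.
 */
long private_offset_demo(const char *path, int *first_char)
{
    FILE *fp = fopen(path, "r");    /* opened before fork(), as before */
    if (fp == NULL)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        fclose(fp);
        return -1;
    }
    if (pid == 0) {
        /* Child start-up: stop using the inherited stream. */
        FILE *mine = reopen_private(fp, path);
        if (mine == NULL)
            _exit(1);
        while (fgetc(mine) != EOF)
            ;
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        fclose(fp);
        return -1;
    }

    /* The child read from its own description, so this offset is intact. */
    long kernel_pos = (long) lseek(fileno(fp), 0, SEEK_CUR);
    *first_char = fgetc(fp);

    fclose(fp);
    return kernel_pos;
}
```

With the same 48-byte test file, the parent's offset stays at 0 and its
first fgetc() returns the first byte of the file instead of EOF.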
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pool_md5.patch
Type: text/x-patch
Size: 868 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20121028/aefa8e21/attachment.bin>

