[pgpool-general: 1973] Re: tight lockup of main pgpool process

Yugo Nagata nagata at sraoss.co.jp
Thu Aug 1 16:14:17 JST 2013


Hi Sean,

Sorry for the delayed reply.
This lockup is a bug in 3.3.0-RC1, and it is fixed in 3.3.0.

This bug involves the interlocking of failover commands, which is a new
feature in 3.3. When a backend failure is detected and failover starts, the
standby pgpools confirm that the active pgpool, which coordinates the
interlocking, exists. In 3.3.0-RC1, the standby pgpools also confirm that the
active pgpool accepts a lifecheck query (SELECT 1). However, that query times
out because the backend is stopped, and this caused the lockup. In 3.3.0, the
standby pgpools no longer send the query.
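To make the difference concrete, here is a toy Python model of the standby's confirmation step. It is not pgpool-II code (pgpool-II is written in C), and `ActivePgpool`, `confirm_active_rc1` and `confirm_active_fixed` are hypothetical names; the sleep only stands in for the lifecheck query timing out against a stopped backend.

```python
import time
from dataclasses import dataclass


@dataclass
class ActivePgpool:
    """Hypothetical model of the active pgpool, as seen from a standby."""
    alive: bool       # the active pgpool process exists and answers the standby
    backend_up: bool  # the backend a lifecheck query would reach is up

    def lifecheck_query(self, timeout: float) -> bool:
        """Model of sending "SELECT 1" through the active pgpool.

        If the backend is down the query cannot complete, so the caller
        blocks for the full timeout and then gets a failure.
        """
        if not self.backend_up:
            time.sleep(timeout)
            return False
        return True


def confirm_active_rc1(active: ActivePgpool, timeout: float) -> bool:
    # 3.3.0-RC1 behaviour (as modelled here): existence check plus lifecheck query.
    return active.alive and active.lifecheck_query(timeout)


def confirm_active_fixed(active: ActivePgpool) -> bool:
    # 3.3.0 behaviour (as modelled here): existence check only.
    return active.alive


if __name__ == "__main__":
    failed = ActivePgpool(alive=True, backend_up=False)
    start = time.monotonic()
    print("RC1:  ", confirm_active_rc1(failed, timeout=0.5),
          f"after {time.monotonic() - start:.1f}s")
    print("3.3.0:", confirm_active_fixed(failed))
```

In this model the RC1-style check can only succeed if the backend is reachable, which is exactly what is not true during failover; the fixed check depends only on the active pgpool itself.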

On Fri, 26 Jul 2013 15:36:18 -0230
Sean Hogan <sean at compusult.net> wrote:

> Hi Tatsuo,
> 
> I have been experimenting with a pgpool-II setup having three machines: 
> two of them have one pgpool-II (port 5430) and one PostgreSQL each, and 
> the third has just PostgreSQL.  This is git master code (3.3.1-RC1) with 
> the exception that it also has the small patch for continuing correctly 
> after a SQL parse error.
> 
> I have been having a devil of a time getting this configuration to 
> function properly, in that queries return inconsistent numbers of 
> results among the backends.  That is a serious problem, but not my main 
> concern right now.
> 
> If I manually down the backend that disagrees with the other two (I 
> actually thought pgpool did this automatically, but that doesn't seem to 
> happen) then pgpool on one of the nodes gets into a bad state:
> 
> 2013-07-26 14:54:07 ERROR: pid 27863: connect_inet_domain_socket: getsockopt() detected error: Connection refused
> 2013-07-26 14:54:07 ERROR: pid 27863: connection to psql-vm2.compusult.net(5432) failed
> 2013-07-26 14:54:07 ERROR: pid 27863: new_connection: create_cp() failed
> 2013-07-26 14:54:56 LOG:   pid 27735: wd_create_send_socket: connect() reports failure (Cannot assign requested address). You can safely ignore this while starting up.
> 2013-07-26 14:54:56 LOG:   pid 27735: send_packet_4_nodes: packet for psql-vm2.compusult.net:9000 is canceled
> 
> and then the last two lines repeat indefinitely.  That process 27735 
> (the main pgpool process) is unresponsive to ordinary kills; -9 is 
> required to stop it.  Of course if I do that, then all its children have 
> to be killed individually which is tremendously tedious.  This command:
>      psql -U postgres -p 5430 -c "show pool_nodes"
> also locks up and has to be killed with Control-\.
> 
> This happens to be the standby pgpool instance, but I believe I have 
> seen it happen with the active one as well.
> 
> Any ideas what might be happening here?
> 
> Thanks,
> Sean


-- 
Yugo Nagata <nagata at sraoss.co.jp>

