<font size=2 face="sans-serif">Hi All, </font>

<br>

<br><font size=2 face="sans-serif">@ Tom</font>

<br><font size=2 face="sans-serif">Thank you for your response. While working

on your suggestions, we seem to have found the cause of our problems.</font>

<br>

<br><font size=2 face="sans-serif">@ Yugo</font>

<br><font size=2 face="sans-serif">Thank you for your response. We are

running pgpool in replication mode with load balancing enabled. If you

have further questions to aid in debugging the situation, please let me

know. </font>

<br>

<br>

<br><font size=2 face="sans-serif">The root cause seems to be that pgpool
acquires the locks in an inconsistent order across the nodes. If the resource
is called A, pgpool appears to allow child X to acquire A on node1 while, at
the same time, child Y acquires A on node2. X then wants A on node2 and Y
wants A on node1, so both children hang indefinitely. Neither PostgreSQL
server can see the whole cycle, because each one sees only a single session
waiting behind a single lock holder, so the deadlock escapes PostgreSQL's
deadlock detection.</font>
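<br>
<br><font size=2 face="sans-serif">As a rough illustration of why neither
server notices the problem, here is a minimal Python sketch. It is a simplified
model, not pgpool or PostgreSQL code: the dictionaries node1 and node2 and the
helpers wait_edges and has_cycle are hypothetical names, and each pgpool child
is treated as a single participant across both nodes. Under those assumptions,
each node's own wait-for graph has no cycle, while the combined graph does.</font>

```python
# Hypothetical model of the lock state on each backend: for resource "A",
# which pgpool child holds the lock and which children are queued behind it.
node1 = {"A": {"holder": "X", "waiting": ["Y"]}}
node2 = {"A": {"holder": "Y", "waiting": ["X"]}}


def wait_edges(node_state):
    """Return the set of (waiter, holder) pairs visible on one backend."""
    edges = set()
    for lock in node_state.values():
        for waiter in lock["waiting"]:
            edges.add((waiter, lock["holder"]))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the wait-for graph."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)

    visiting, done = set(), set()

    def visit(child):
        if child in done:
            return False
        if child in visiting:
            return True  # reached a child that is already on the current path
        visiting.add(child)
        found = any(visit(nxt) for nxt in graph.get(child, ()))
        visiting.remove(child)
        done.add(child)
        return found

    return any(visit(child) for child in list(graph))


# Each backend on its own sees one waiter behind one holder: no cycle.
local_1 = has_cycle(wait_edges(node1))
local_2 = has_cycle(wait_edges(node2))

# Only the combined view, which neither backend has, contains the cycle.
combined = has_cycle(wait_edges(node1) | wait_edges(node2))

print(local_1, local_2, combined)
```

<br><font size=2 face="sans-serif">In this model the usual remedy would be for
every child to take the lock on the same node first, so that the second child
queues behind the first instead of acquiring the lock on the other node.</font>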

<br>

<br><font size=2 face="sans-serif">We have included a summary of the system

state here:</font>

<br><a href=http://pastebin.com/9f6gjxLA><font size=2 face="sans-serif">http://pastebin.com/9f6gjxLA</font></a>

<br>

<br><font size=2 face="sans-serif">We have used netstat to trace the connections

between the pgpool children and the PostgreSQL servers. Pgpool child 7606 has

acquired a lock on the .204 server but waits for the same lock on the .202

server. At the same time pgpool child 7681 has the lock on the .202 server

and waits for it on the .204 server. Pgpool is running on the .204 server.

</font>

<br>

<br><font size=2 face="sans-serif">If anyone is interested, we have included

the full outputs in the following pastebins:</font>

<br>

<br><font size=2 face="sans-serif">pg_locks on 10.216.73.202: </font><a href=http://pastebin.com/uRQh5Env><font size=2 face="sans-serif">http://pastebin.com/uRQh5Env</font></a>

<br><font size=2 face="sans-serif">pg_locks on 10.216.73.204: </font><a href=http://pastebin.com/BXpirVQ2><font size=2 face="sans-serif">http://pastebin.com/BXpirVQ2</font></a>

<br><font size=2 face="sans-serif">netstat -p on 10.216.73.202: </font><a href=http://pastebin.com/b9kV7Wz4><font size=2 face="sans-serif">http://pastebin.com/b9kV7Wz4</font></a><font size=2 face="sans-serif"><br>

netstat -p on 10.216.73.204: </font><a href=http://pastebin.com/tPz8gwRG><font size=2 face="sans-serif">http://pastebin.com/tPz8gwRG</font></a>

<br>

<br><font size=2 face="sans-serif">Kind regards,</font>

<br><font size=2 face="sans-serif">Fredrik &amp; friends</font>

<br>

<br>

<br>

<br>

<br>

<table width=100%>

<tr valign=top>

<td width=40%><font size=1 face="sans-serif"><b>Tom Lane &lt;tgl@sss.pgh.pa.us&gt;</b>

</font>

<p><font size=1 face="sans-serif">2013/01/10 05:30</font>

<td width=59%>

<table width=100%>

<tr valign=top>

<td>

<div align=right><font size=1 face="sans-serif">To</font></div>

<td><font size=1 face="sans-serif">Fredrik.HuitfeldtMadsen@schneider-electric.com</font>

<tr valign=top>

<td>

<div align=right><font size=1 face="sans-serif">cc</font></div>

<td><font size=1 face="sans-serif">pgsql-general@postgresql.org, pgpool-general@pgpool.net</font>

<tr valign=top>

<td>

<div align=right><font size=1 face="sans-serif">Subject</font></div>

<td><font size=1 face="sans-serif">Re: [GENERAL] Database connections seemingly

hanging</font></table>

<br>


<br></table>

<br>

<br>

<br><tt><font size=2>Fredrik.HuitfeldtMadsen@schneider-electric.com writes:<br>

&gt; We have a setup where 2 JBoss (5.1) servers communicate with 1 instance

of <br>

&gt; PgPool (3.04), which again communicates with 2 Postgresql (8.4) servers.

<br>

&gt; The JBoss servers host some Java code for us and as part of that they

run <br>

&gt; some quartz jobs. <br>

<br>

&gt; These jobs are triggered right after startup and as part of that we

get <br>

&gt; what seems to get stuck. At least when we can see in the database

that <br>

&gt; when inspecting pg_locks, there exists a virtual transaction that

has all <br>

&gt; desired locks granted but seems to be stuck. When we inspect <br>

&gt; pg_stat_activity, it seems that the process is still waiting for the

query <br>

&gt; (SELECT ... FOR UPDATE) to finish.<br>

<br>

&gt; The locking transaction is described here: </font></tt><a href=http://pastebin.com/3pEn6vPe><tt><font size=2>http://pastebin.com/3pEn6vPe</font></tt></a><tt><font size=2><br>

<br>

What that shows is several sessions running SELECT FOR UPDATE, but none<br>

of them seem to be waiting. &nbsp;What else is going on? &nbsp;In particular,

are<br>

there any idle-in-transaction sessions? &nbsp;Also, would any of these<br>

SELECTs return enough rows that the sessions might be blocked trying to<br>

send data back to their clients? &nbsp;That wouldn't show as waiting =

true,<br>

though I think you could detect it by strace'ing the backends to see if<br>

they are stopped in a send() kernel call.<br>

<br>

&gt; We suspect that a connection to the database acquires its locks but

<br>

&gt; somehow does not return to the application. If this is true, it would

<br>

&gt; either be a postgresql or a pgpool problem. We would appreciate any

help <br>

&gt; in further debugging or resolving the situation. <br>

<br>

It seems like a good guess would be that you have a deadlock situation<br>

that cannot be detected by the database because part of the blockage is<br>

on the client side --- that is, client thread A is waiting on its<br>

database query, that query is waiting on some lock held by client thread<br>

B's database session, and thread B is somehow waiting for A on the<br>

client side. &nbsp;It's not too hard to get into this type of situation

when<br>

B is sitting on an open idle-in-transaction session: B isn't waiting for<br>

the database to do anything, but is doing something itself, and so it's<br>

not obvious that there's any risk. &nbsp;Thus my question about what idle<br>

sessions there might be. &nbsp;This does usually lead to a visibly waiting<br>

database session for client A, though, so it's probably too simple as an<br>

explanation for your issue. &nbsp;We have seen some harder-to-debug cases<br>

where the database sessions weren't visibly &quot;waiting&quot; because

they were<br>

blocked on client I/O, so maybe you've got something like that.<br>

<br>

Another line of thought to pursue is possible misuse of pgpool. &nbsp;If<br>

pgpool doesn't realize you're inside a transaction and swaps the<br>

connection to some other client thread, all kinds of confusion ensues.<br>

<br>

Also, I hope you're running a reasonably recent 8.4.x minor release.<br>

A quick look through the commit logs didn't show anything about deadlock<br>

fixes in the 8.4 branch, but I might have missed something that was<br>

fixed a long time ago.<br>

<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;

&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;

&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;

regards, tom lane<br>

<br>


</font></tt>

<br>