<div dir="ltr">Oh, it&#39;s entirely our application framework&#39;s fault!  <div><br></div><div>It shouldn&#39;t be waiting for a session to become free if it is also possible for sessions to be running for 5 minutes...</div>

<div><br></div><div>Our fix is to make the &quot;pinger&quot; not require a session, since it is only looking for a lockfile and doesn&#39;t need session data to do that.</div><div><br></div><div><br></div><div>Thanks again for your helpful suggestions, Tatsuo!</div>
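<div>A minimal sketch of that session-free check, written in Python for illustration (the application itself is PHP). The function name <code>ping</code> and the lockfile path are hypothetical, not the real code; the point is only that the handler touches the filesystem and never opens a session, so it cannot queue behind a long-running request.</div>
<pre>
```python
import os
import tempfile


def ping(lockfile_path):
    """Report whether the long-running job's lockfile is present.

    No session is opened here, so this call cannot block waiting
    for a session that another request is still holding.
    """
    return {"job_running": os.path.exists(lockfile_path)}


# Usage with a throwaway directory; the lockfile name is made up.
with tempfile.TemporaryDirectory() as tmp:
    lock = os.path.join(tmp, "job.lock")
    before = ping(lock)          # lockfile does not exist yet
    open(lock, "w").close()      # simulate the job creating its lockfile
    after = ping(lock)           # lockfile is now present
```
</pre>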

<div><br></div><div>Justin</div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Jan 22, 2014 at 7:21 PM, Tatsuo Ishii <span dir="ltr">&lt;<a href="mailto:ishii@postgresql.org" target="_blank">ishii@postgresql.org</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Justin,<br>

<br>

Thank you for the follow up.<br>

<br>

It would be nice if we could prevent something like a &quot;ping&quot; client from<br>

filling up all available sockets of pgpool-II. Any ideas, anyone?<br>

<div class="HOEnZb"><div class="h5"><br>

Best regards,<br>

--<br>

Tatsuo Ishii<br>

SRA OSS, Inc. Japan<br>

English: <a href="http://www.sraoss.co.jp/index_en.php" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>

Japanese: <a href="http://www.sraoss.co.jp" target="_blank">http://www.sraoss.co.jp</a><br>

<br>

&gt; Just to follow up on this, it turns out it was a problem in our<br>

&gt; application...<br>

&gt;<br>

&gt; We had a long running job that was tying up the user&#39;s session, paired with<br>

&gt; a second browser window that was making a &quot;ping&quot; type call every 10<br>

&gt; seconds.  These pings were stacking up and after 300s, they would tie up<br>

&gt; all of PGPool&#39;s available sockets to apache.<br>

&gt;<br>

&gt;<br>

&gt; Thanks for the help.  I did a lot of testing with PGPool and thought it was<br>

&gt; the source of the problem, but the more I tested the more I became<br>

&gt; convinced that PGPool is actually working great!<br>

&gt;<br>

&gt;<br>

&gt; Cheers,<br>

&gt; Justin<br>

&gt;<br>


&gt; On Thu, Jan 16, 2014 at 10:24 PM, Tatsuo Ishii &lt;<a href="mailto:ishii@postgresql.org">ishii@postgresql.org</a>&gt; wrote:<br>

&gt;<br>

&gt;&gt; When the lock up happens, what do &quot;select * from pg_stat_activity&quot;<br>

&gt;&gt; and &quot;select * from pg_locks&quot; show?<br>

&gt;&gt;<br>

&gt;&gt; Best regards,<br>

&gt;&gt; --<br>

&gt;&gt; Tatsuo Ishii<br>

&gt;&gt; SRA OSS, Inc. Japan<br>

&gt;&gt; English: <a href="http://www.sraoss.co.jp/index_en.php" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>

&gt;&gt; Japanese: <a href="http://www.sraoss.co.jp" target="_blank">http://www.sraoss.co.jp</a><br>

&gt;&gt;<br>

&gt;&gt; &gt; Thank you, Tatsuo.<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; We are still experiencing the problem once or twice per day.  I am making<br>

&gt;&gt; &gt; incremental changes on our live cluster after testing them on the test<br>

&gt;&gt; &gt; cluster.  So far we have done the following:<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; -Comment out unused 2nd backend in pgpool.conf<br>

&gt;&gt; &gt; -Add a connect_timeout of 10 seconds to the pg_connect() connection string<br>

&gt;&gt; &gt; in the PHP application<br>

&gt;&gt; &gt; -set sysctl net.core.somaxconn = 1024<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; We just did the last step today so we will see if there is any impact.<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; When the fault happens, there is work being done in the database, yet<br>

&gt;&gt; &gt; &quot;select * from pg_stat_activity;&quot; shows only a few running queries at the<br>

&gt;&gt; &gt; time.  To me, this says that Apache+PHP still has the connection open to<br>

&gt;&gt; &gt; pgpool.<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; I&#39;ll be sure to post back if we figure it out!<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; Justin<br>

&gt;&gt; &gt;<br>


&gt;&gt; &gt; On Mon, Jan 13, 2014 at 7:55 PM, Tatsuo Ishii &lt;<a href="mailto:ishii@postgresql.org">ishii@postgresql.org</a>&gt; wrote:<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; Thanks for posting the detailed analysis. It looks really interesting.<br>

&gt;&gt; &gt;&gt; I need more time to understand the full details.<br>

&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; In the meantime, I wonder if you have looked at the listen queue<br>

&gt;&gt; &gt;&gt; setting. Currently pgpool requests a listen queue of num_init_children*2 (which is 64<br>

&gt;&gt; &gt;&gt; in your case). However, Apache can open up to 256 connections to pgpool, so 64 is<br>

&gt;&gt; &gt;&gt; way too low compared with 256. Also, Linux caps the listen queue<br>

&gt;&gt; &gt;&gt; at 128 by default on most systems. You can check it by looking at:<br>

&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; $ sysctl net.core.somaxconn<br>

&gt;&gt; &gt;&gt; net.core.somaxconn = 128<br>

&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; 128 is too low compared with 256, of course.<br>

&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; If the allowed listen queue length (backlog) is too low, lots of retries<br>

&gt;&gt; &gt;&gt; happen in the kernel&#39;s TCP layer.<br>
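<div>The backlog arithmetic described here can be sketched as follows. This is an illustration in Python, not pgpool code: <code>effective_backlog</code> is a hypothetical helper, and it assumes only what is stated in this message (pgpool asks for num_init_children*2) plus the Linux behaviour that a requested backlog larger than net.core.somaxconn is silently reduced to that value.</div>
<pre>
```python
def effective_backlog(num_init_children, somaxconn):
    """Listen queue length the kernel would grant under the stated assumptions."""
    requested = num_init_children * 2    # what pgpool asks for, per this message
    return min(requested, somaxconn)     # Linux caps the request at somaxconn


apache_max_clients = 256                          # MaxClients from the original report
with_default = effective_backlog(32, 128)         # default somaxconn
with_raised = effective_backlog(32, 1024)         # somaxconn raised via sysctl
```
</pre>
<div>With these numbers both results come out to 64, well below the 256 possible Apache clients. Under these assumptions, raising net.core.somaxconn alone would not lengthen the queue, because the cap only matters once the requested backlog exceeds it.</div>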

&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; Best regards,<br>

&gt;&gt; &gt;&gt; --<br>

&gt;&gt; &gt;&gt; Tatsuo Ishii<br>

&gt;&gt; &gt;&gt; SRA OSS, Inc. Japan<br>

&gt;&gt; &gt;&gt; English: <a href="http://www.sraoss.co.jp/index_en.php" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>

&gt;&gt; &gt;&gt; Japanese: <a href="http://www.sraoss.co.jp" target="_blank">http://www.sraoss.co.jp</a><br>

&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; &gt; Greetings!<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; We are having an issue with PGPool and I wanted to post my analysis to this<br>

&gt;&gt; &gt;&gt; &gt; list to see if: A). My analysis seems correct to you all and B). To see if<br>

&gt;&gt; &gt;&gt; &gt; you folks might have any advice on tuning.<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; For the last month plus, we have been experiencing an intermittent<br>

&gt;&gt; fault<br>

&gt;&gt; &gt;&gt; &gt; state on our production cluster.  When the fault occurs, any request<br>

&gt;&gt; to<br>

&gt;&gt; &gt;&gt; the<br>

&gt;&gt; &gt;&gt; &gt; Apache+PHP web server will either time out connecting, or will connect<br>

&gt;&gt; &gt;&gt; but<br>

&gt;&gt; &gt;&gt; &gt; return with a &quot;Could not connect to DB&quot; message from PHP.  I&#39;ve done<br>

&gt;&gt; some<br>

&gt;&gt; &gt;&gt; &gt; analysis on the problem and this is what I&#39;ve found.<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; First let me describe the cluster as it is configured today.  We have<br>

&gt;&gt; one<br>

&gt;&gt; &gt;&gt; &gt; web front end running Apache+PHP, which has a MaxClients setting of<br>

&gt;&gt; 256,<br>

&gt;&gt; &gt;&gt; &gt; meaning that it&#39;s possible to have 256 concurrently running processes.<br>

&gt;&gt; &gt;&gt;  The<br>

&gt;&gt; &gt;&gt; &gt; PHP application is configured to connect to PGPool 3.2.1 for its<br>

&gt;&gt; database<br>

&gt;&gt; &gt;&gt; &gt; connection.  PGPool is configured with num_init_children of 32 and<br>

&gt;&gt; &gt;&gt; max_pool<br>

&gt;&gt; &gt;&gt; &gt; of 8.  The application runs on 10-12 different databases, all with the<br>

&gt;&gt; &gt;&gt; same<br>

&gt;&gt; &gt;&gt; &gt; Postgres username+password.<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; When the fault occurs, it looks like this: Apache has 256 running<br>

&gt;&gt; &gt;&gt; processes<br>

&gt;&gt; &gt;&gt; &gt; and load on the web front end drops to near 0.  PGPool has all 32<br>

&gt;&gt; sockets<br>

&gt;&gt; &gt;&gt; &gt; that face Apache filled, and all 256 sockets that face Postgres<br>

&gt;&gt; filled.<br>

&gt;&gt; &gt;&gt; &gt;  Postgres has 256 connections and its load goes to near 0.  If you<br>

&gt;&gt; try to<br>

&gt;&gt; &gt;&gt; &gt; connect to PGPool from the command line, it will time out in<br>

&gt;&gt; connecting,<br>

&gt;&gt; &gt;&gt; or<br>

&gt;&gt; &gt;&gt; &gt; sometimes partially connect and then receive a connection closed<br>

&gt;&gt; message.<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; Using our test cluster, I ran some tests that give me high confidence<br>

&gt;&gt; &gt;&gt; that<br>

&gt;&gt; &gt;&gt; &gt; PGPool is actually working correctly, as are Apache and Postgres, and<br>

&gt;&gt; &gt;&gt; that<br>

&gt;&gt; &gt;&gt; &gt; the fundamental problem is just a badly tuned configuration.  This is<br>

&gt;&gt; the<br>

&gt;&gt; &gt;&gt; &gt; test that shows that best:<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;    1. Stop Apache, restart PGPool<br>

&gt;&gt; &gt;&gt; &gt;    2. Start up 100 psql command line clients to connect to PGPool<br>

&gt;&gt; with a<br>

&gt;&gt; &gt;&gt; &gt;    single database<br>

&gt;&gt; &gt;&gt; &gt;    3. The first 32 psql clients connect and work fine<br>

&gt;&gt; &gt;&gt; &gt;    4. The 33rd psql client blocks waiting to connect (it will time out<br>

&gt;&gt; &gt;&gt; &gt;    after 30 seconds, but in this test we don&#39;t wait that long)<br>

&gt;&gt; &gt;&gt; &gt;    5. fg the psql client #1, then exit the client, freeing up one of<br>

&gt;&gt; &gt;&gt; &gt;    PGPool&#39;s connections<br>

&gt;&gt; &gt;&gt; &gt;    6. One of the 68 blocking psql clients now gets through and can run<br>

&gt;&gt; &gt;&gt; &gt;    queries<br>

&gt;&gt; &gt;&gt; &gt;    7. Any of the 32 connected psql clients can get through as well<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; This shows that PGPool is working as expected.<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; Now we try a test that is more like the real world:<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;    1. Restart PGPool<br>

&gt;&gt; &gt;&gt; &gt;    2. Start up 10-20 psql command line clients.  These are simulating<br>

&gt;&gt; &gt;&gt; long<br>

&gt;&gt; &gt;&gt; &gt;    running php processes.<br>

&gt;&gt; &gt;&gt; &gt;    3. Start siege web testing tool with 100-200 concurrent requests to<br>

&gt;&gt; &gt;&gt; &gt;    Apache.<br>

&gt;&gt; &gt;&gt; &gt;    4. At 100 clients, the response time from Apache slows down and the<br>

&gt;&gt; &gt;&gt; time<br>

&gt;&gt; &gt;&gt; &gt;    taken to service each request goes up to around 15s (from &lt; 1s).<br>

&gt;&gt;  Psql<br>

&gt;&gt; &gt;&gt; &gt;    command line client can get through most of the time, but it takes<br>

&gt;&gt; &gt;&gt; some<br>

&gt;&gt; &gt;&gt; &gt;    time to connect as it is contending for one of the 32 slots to<br>

&gt;&gt; PGPool<br>

&gt;&gt; &gt;&gt; with<br>

&gt;&gt; &gt;&gt; &gt;    all of the Apache processes.<br>

&gt;&gt; &gt;&gt; &gt;    5. At 200 clients, response time goes up more and we start to see<br>

&gt;&gt; &gt;&gt; &gt;    failures in Apache, as well as &quot;Could not connect to DB&quot; responses.<br>

&gt;&gt; &gt;&gt;  Psql<br>

&gt;&gt; &gt;&gt; &gt;    command line client often will timeout before it gets a connection<br>

&gt;&gt; to<br>

&gt;&gt; &gt;&gt; &gt;    PGPool.<br>

&gt;&gt; &gt;&gt; &gt;    6. Once lots of failures are happening at the 200 clients level,<br>

&gt;&gt; load<br>

&gt;&gt; &gt;&gt; on<br>

&gt;&gt; &gt;&gt; &gt;    Postgres goes to near 0 as well as load on Apache.<br>

&gt;&gt; &gt;&gt; &gt;    7. Failure will also happen with 250 siege clients and no psql<br>

&gt;&gt; command<br>

&gt;&gt; &gt;&gt; &gt;    line clients running.<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; In step 4, I believe the response time from Apache goes up due to<br>

&gt;&gt; PGPool<br>

&gt;&gt; &gt;&gt; &gt; having to spend so much time managing incoming connections from<br>

&gt;&gt; Apache as<br>

&gt;&gt; &gt;&gt; &gt; well as managing connections to Postgres.  Database load is not high<br>

&gt;&gt; in<br>

&gt;&gt; &gt;&gt; &gt; this case, so the slowness is not due to Postgres being overloaded.<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; I believe that on the live cluster the load is even more severe as<br>

&gt;&gt; there<br>

&gt;&gt; &gt;&gt; &gt; are more databases being used, and occasionally high load, long<br>

&gt;&gt; running<br>

&gt;&gt; &gt;&gt; &gt; queries.<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; It&#39;s also notable that restarting Apache has been our fix to get<br>

&gt;&gt; &gt;&gt; everything<br>

&gt;&gt; &gt;&gt; &gt; running again.  I believe that this is because PGPool gets a chance to<br>

&gt;&gt; &gt;&gt; &gt; catch up, which it does fairly quickly, and resumes with 32 available<br>

&gt;&gt; &gt;&gt; &gt; sockets for Apache.  If we do nothing, PGPool reaches a 10 minute<br>

&gt;&gt; timeout<br>

&gt;&gt; &gt;&gt; &gt; specified in its config, and closes all 32 sockets, which causes<br>

&gt;&gt; &gt;&gt; everything<br>

&gt;&gt; &gt;&gt; &gt; to resume working again.<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; In the end, I believe the problem is that Apache is just sending too<br>

&gt;&gt; many<br>

&gt;&gt; &gt;&gt; &gt; requests to PGPool, and PGPool spends all of its time managing<br>

&gt;&gt; &gt;&gt; connections,<br>

&gt;&gt; &gt;&gt; &gt; causing it to be slow at everything.  That slowness and contention<br>

&gt;&gt; for 32<br>

&gt;&gt; &gt;&gt; &gt; slots among up to 256 Apache processes leads to connection timeouts<br>

&gt;&gt; (it<br>

&gt;&gt; &gt;&gt; &gt; should be noted that Apache seems to have no connect timeout defined<br>

&gt;&gt; and<br>

&gt;&gt; &gt;&gt; &gt; will wait for a connection until the PHP max execution time is<br>

&gt;&gt; reached).<br>

&gt;&gt; &gt;&gt; &gt;  Once a threshold is reached, we enter a state where no Apache<br>

&gt;&gt; process is<br>

&gt;&gt; &gt;&gt; &gt; able to connect to PGPool in enough time and we see the browser<br>

&gt;&gt; requests<br>

&gt;&gt; &gt;&gt; &gt; either timing out entirely or returning the &quot;Could not connect to DB&quot;<br>

&gt;&gt; &gt;&gt; &gt; message.<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; The proposed solution to all of this is to adjust the configuration of<br>

&gt;&gt; &gt;&gt; &gt; PGPool and Apache to ensure that we can never reach this overwhelmed<br>

&gt;&gt; &gt;&gt; state.<br>

&gt;&gt; &gt;&gt; &gt;  Specifically, we need to increase the number of PGPool processes and<br>

&gt;&gt; &gt;&gt; &gt; decrease the maximum number of Apache processes.  We need to be<br>

&gt;&gt; careful<br>

&gt;&gt; &gt;&gt; as<br>

&gt;&gt; &gt;&gt; &gt; we do this, as there is surely an upper limit to how many PGPool<br>

&gt;&gt; &gt;&gt; processes<br>

&gt;&gt; &gt;&gt; &gt; can be sustained and increasing that increases overhead on Postgres<br>

&gt;&gt; since<br>

&gt;&gt; &gt;&gt; &gt; it increases the number of persistent open connections between it and<br>

&gt;&gt; &gt;&gt; &gt; PGPool.  The same for Apache, we need to lower MaxClients but not so<br>

&gt;&gt; low<br>

&gt;&gt; &gt;&gt; &gt; that it turns away requests that could have been handled.<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; There are a few other adjustments that I believe will help that I&#39;ll<br>

&gt;&gt; &gt;&gt; &gt; describe below.<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; Apache MaxClients:<br>

&gt;&gt; &gt;&gt; &gt; This is how many concurrent Apache processes can run at once.  The<br>

&gt;&gt; &gt;&gt; current<br>

&gt;&gt; &gt;&gt; &gt; setting of 256 is clearly more than the system can handle.  I suggest<br>

&gt;&gt; we<br>

&gt;&gt; &gt;&gt; &gt; drop it down to 128 to begin with and monitor the results.  I&#39;d like<br>

&gt;&gt; to<br>

&gt;&gt; &gt;&gt; &gt; make this change before the others.<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; Apache PHP DB connection timeout:<br>

&gt;&gt; &gt;&gt; &gt; I can see that it&#39;s waiting as long as 150s before returning with<br>

&gt;&gt; &#39;Could<br>

&gt;&gt; &gt;&gt; &gt; not connect to DB&#39; at times, which indicates that no timeout is being<br>

&gt;&gt; &gt;&gt; &gt; specified.  This must be sent as part of the connection string, like:<br>

&gt;&gt; &gt;&gt; &gt; &quot;pgsql:host=127.0.0.1;port=5432;dbname=vw_bepensa;timeout=10&quot;.  I&#39;m<br>

&gt;&gt; not<br>

&gt;&gt; &gt;&gt; &gt; sure at this point what a reasonable value would be, but I&#39;m thinking<br>

&gt;&gt; 10<br>

&gt;&gt; &gt;&gt; &gt; seconds is a good start.<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; PGPool backends:<br>

&gt;&gt; &gt;&gt; &gt; We currently have 2 backends specified in the config.  One has<br>

&gt;&gt; &gt;&gt; &gt; backend_weight of 1 and the other, that is not used, has<br>

&gt;&gt; backend_weight<br>

&gt;&gt; &gt;&gt; of<br>

&gt;&gt; &gt;&gt; &gt; 0.  I have confirmed that whenever a client connects to PGPool and<br>

&gt;&gt; &gt;&gt; requests<br>

&gt;&gt; &gt;&gt; &gt; a connection to a database, for example, PGPool opens a persistent<br>

&gt;&gt; &gt;&gt; &gt; connection to both backends.  We will comment out the backend that<br>

&gt;&gt; &gt;&gt; &gt; specifies the backup server, which should help PGPool a lot.<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; PGPool num_init_children:<br>

&gt;&gt; &gt;&gt; &gt; This is the config parameter that specifies how many PGPool processes<br>

&gt;&gt; can<br>

&gt;&gt; &gt;&gt; &gt; run, and therefore how many sockets are available to Apache.<br>

&gt;&gt;  Increasing<br>

&gt;&gt; &gt;&gt; &gt; this number by one increases the number of persistent connections to<br>

&gt;&gt; the<br>

&gt;&gt; &gt;&gt; DB<br>

&gt;&gt; &gt;&gt; &gt; by max_pool, currently 8.  Postgres is currently configured to only<br>

&gt;&gt; allow<br>

&gt;&gt; &gt;&gt; &gt; 300 connections maximum, so that would need to be changed as well.<br>

&gt;&gt;  More<br>

&gt;&gt; &gt;&gt; &gt; research and testing is needed to find the sweet spot.<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; PGPool max_pool:<br>

&gt;&gt; &gt;&gt; &gt; This parameter specifies how many different DBs each PGPool process<br>

&gt;&gt; keeps<br>

&gt;&gt; &gt;&gt; &gt; in its cache of persistent connections to Postgres.  It is currently<br>

&gt;&gt; set<br>

&gt;&gt; &gt;&gt; to<br>

&gt;&gt; &gt;&gt; &gt; 8, yet we have more than 8 different databases in production (I see 12<br>

&gt;&gt; &gt;&gt; &gt; connected right now).  If a connection to a database is requested of<br>

&gt;&gt; &gt;&gt; PGPool<br>

&gt;&gt; &gt;&gt; &gt; by Apache, and the PGPool process servicing Apache&#39;s request does not<br>

&gt;&gt; &gt;&gt; have<br>

&gt;&gt; &gt;&gt; &gt; a connection to that database, it will drop one and use the slot to<br>

&gt;&gt; make<br>

&gt;&gt; &gt;&gt; a<br>

&gt;&gt; &gt;&gt; &gt; new connection to the requested DB on Postgres.  If max_pool was set<br>

&gt;&gt; to<br>

&gt;&gt; &gt;&gt; 12,<br>

&gt;&gt; &gt;&gt; &gt; this would stop happening and there would always be a persistent<br>

&gt;&gt; &gt;&gt; connection<br>

&gt;&gt; &gt;&gt; &gt; to the db requested ready to go when requested by apache.  Postgres<br>

&gt;&gt; would<br>

&gt;&gt; &gt;&gt; &gt; ideally get no new db connections.  Increasing from 8 to 12 would mean<br>

&gt;&gt; &gt;&gt; that<br>

&gt;&gt; &gt;&gt; &gt; total connections to Postgres would be 32*12 = 384, which is above<br>

&gt;&gt; &gt;&gt; &gt; Postgres&#39;s connection limit.  So this parameter, num_init_children,<br>

&gt;&gt; and<br>

&gt;&gt; &gt;&gt; &gt; Postgres&#39;s connection limit must all be tuned to each other, and kept<br>

&gt;&gt; low<br>

&gt;&gt; &gt;&gt; &gt; enough to not overwhelm Postgres.<br>
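<div>The connection arithmetic in this paragraph can be checked with a short sketch. It is written in Python for illustration; <code>backend_connections</code> is a hypothetical helper, and the figures (32 pgpool children, max_pool of 8 or 12, a Postgres limit of 300) are the ones quoted in this message. Reserved superuser slots are ignored here, which is a simplifying assumption.</div>
<pre>
```python
def backend_connections(num_init_children, max_pool):
    """Upper bound on connections pgpool may hold open to one Postgres backend."""
    return num_init_children * max_pool


max_connections = 300                        # Postgres limit quoted in the message

current = backend_connections(32, 8)         # today's settings
proposed = backend_connections(32, 12)       # max_pool raised to cover 12 databases

headroom_current = max_connections - current     # slots left over today
headroom_proposed = max_connections - proposed   # negative means over the limit
```
</pre>
<div>The current settings give 256 connections with 44 to spare, while raising max_pool to 12 gives 384, which is 84 over the limit, so either the number of pgpool children has to come down or the Postgres limit has to go up.</div>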

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; I suggest that we begin by commenting out the second backend in<br>

&gt;&gt; &gt;&gt; &gt; pgpool.conf, and lowering MaxClients on Apache to 128.  This should<br>

&gt;&gt; &gt;&gt; prevent<br>

&gt;&gt; &gt;&gt; &gt; PGPool being hammered past the point that it can handle.  If PGPool<br>

&gt;&gt; does<br>

&gt;&gt; &gt;&gt; &gt; fall behind, only 128 Apache connections will be hitting PGPool and it<br>

&gt;&gt; &gt;&gt; &gt; seems to be able to handle that many in an orderly fashion.<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; I also think adding a PHP connection timeout will help keep the system<br>

&gt;&gt; &gt;&gt; from<br>

&gt;&gt; &gt;&gt; &gt; grinding to a stop.<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; Thank you for reading and any help or insight you can provide!<br>

&gt;&gt; &gt;&gt; &gt;<br>

&gt;&gt; &gt;&gt; &gt; Justin Cooper<br>

&gt;&gt; &gt;&gt;<br>

&gt;&gt;<br>

</div></div></blockquote></div><br></div>