[pgpool-general: 394] Re: Trouble with connection during long/large data queries

Tatsuo Ishii ishii at postgresql.org
Fri Apr 27 00:56:01 JST 2012


Yes, the patch is in 3.1.3. The strange thing is that the patch only
touches the ssl=on code path; nothing was changed for ssl=off. So my
guess is that fixing the ssl=on case has unfortunately exposed a
problem in the ssl=off case. I will continue to chase the ssl=off
problem.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> My apologies for not getting back sooner: the success in my initial
> testing of the patch (which, as far as I can tell, made it into 3.1.3)
> was short-lived.  What I have found is that about thirty percent of
> the tests are successful, but that the rest now fail in the same
> manner as when I was running with "ssl = off" in pgpool.conf (with
> 3.1.2).  To summarize:
> 
> System: (test) postgresql 9.1.3 and pgpool-ii 3.1.3 (this has also
> been reproduced with postgres 9.1.2 and pgpool-ii 3.1.2 with the
> pool_ssl.patch applied)
> 
> With "ssl = on" in pgpool.conf
> ~30% success
> ~70% failure.  The failure in my case is that the .dbf files created
> as part of the shapefile creation process are empty.  This is the same
> as described below when "ssl = off" was set in pgpool.conf using stock
> 3.1.2.  Also, as with the earlier behavior, there is nothing obviously
> wrong in the postgres or pgpool logs (no signs of errors or abnormal
> exits, etc.).  All processes finish as though successful.
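>
> (For reference, the SSL-related portion of my pgpool.conf looks roughly
> like the following; the key and certificate paths shown are placeholders
> rather than our actual values:)
>
>     ssl = on
>     # key/certificate used for the client-to-pgpool side of SSL
>     ssl_key = '/path/to/server.key'
>     ssl_cert = '/path/to/server.cert'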
> 
> I do not believe this to be an issue with Ruby's shapelib or the other
> pieces involved, as the shapefiles build successfully every time, with
> or without SSL, when connecting directly to the database.
> 
> In some ways this is worse than before: previously, the process failed
> noticeably, whereas now it completes with no obvious sign of error
> (other than the empty .dbf files in ~70% of cases).
> 
> I am at a loss as to what I can send to the list/developers on this.
> I would be happy to send any logs and such that I can; please advise
> as to the options I might use in pgpool.conf or when invoking pgpool
> (again, at present, I see nothing in the pgpool or postgres logs that
> looks useful).
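>
> (One thing I could do, if it would help, is re-run a failing case with
> pgpool in non-daemon debug mode and capture everything it prints, along
> the lines of:)
>
>     pgpool -n -d -f /path/to/pgpool.conf > pgpool_debug.log 2>&1
>
> Just let me know if there is anything more targeted to look for in that
> output.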
> 
> As for our use of pgpool, it is still beneficial for just about
> everything we do, and we do have workarounds for this problem.
> 
> Thanks,
> 
> Joe
> 
> On 2012-04-03 17:09, Joe Barnett wrote:
>> Thank you for this patch.  It works in my initial testing (including
>> "ssl = on" in pgpool.conf).  I will do some additional testing, but so
>> far this looks great!
>>
>> Again, thanks,
>>
>> Joe Barnett
>>
>> On 2012-04-03 04:26, Tatsuo Ishii wrote:
>>> I was able to reproduce your problem while retrieving large data from
>>> PostgreSQL, with SSL enabled between pgpool and PostgreSQL. Included
>>> is a patch which should fix the problem (it solved the reproduced
>>> problem in our environment). Please try it if you like.
>>> --
>>> Tatsuo Ishii
>>> SRA OSS, Inc. Japan
>>> English: http://www.sraoss.co.jp/index_en.php
>>> Japanese: http://www.sraoss.co.jp
>>>
>>>> Thank you for the responses.
>>>>
>>>> We are running pgpool-ii 3.1.2 against postgres 9.1.2.  While we
>>>> employ (postgres') streaming replication, we point pgpool only to the
>>>> master server at this time (running a pgpool daemon local to each
>>>> client machine).  In test, however, we do point to both the master and
>>>> the (hot) standby server.  The pgpool behavior we are seeing is
>>>> identical on both our test and production systems.
>>>>
>>>> I followed the suggestion of disabling SSL at the pgpool level ("ssl =
>>>> off" in pgpool.conf; previously I had just set PGSSLMODE=disable in
>>>> the shell before running my Ruby script).  This produced some
>>>> interesting results (which I will try to reproduce): the script ran
>>>> without error, and errors were logged in neither the postgres log nor
>>>> the pgpool log, but the resulting output contained no data.  The
>>>> output, in this case, consists of shapefiles (ESRI-formatted geographic
>>>> data), each of which comprises several pieces, including a .dbf file.
>>>> The .dbf files were empty, though everything else looked as expected.
>>>> As the logs give no clue, I cannot really identify what happened.
>>>> Again, I will try to reproduce this behavior.
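>>>>
>>>> (To be explicit about the two different ways SSL has been disabled in
>>>> my tests, since it may matter: earlier I only disabled it on the
>>>> client side via libpq's environment variable, whereas this test
>>>> turned it off in pgpool itself:)
>>>>
>>>>     # earlier runs: client side only, before invoking the script
>>>>     export PGSSLMODE=disable
>>>>     ruby ./bin/shapes.rb
>>>>
>>>>     # this run: in pgpool.conf
>>>>     ssl = off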
>>>>
>>>> In a more normal setting (SSL enabled both from the client to pgpool
>>>> and from pgpool to the postgres server) we get the expected amount of
>>>> data for the queries that complete before the connection goes away.
>>>> When run through pgpool, we get about 1/3 of the way through the job
>>>> before the connection goes away; when we access the database directly,
>>>> we get all of the expected data.  As I think about this, I might be
>>>> able to modify the script to request a new connection for each
>>>> iteration, and that might prove to be another workaround--though even
>>>> if it works, it will still only buy us some time rather than fix the
>>>> root problem.
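>>>>
>>>> (If I do try the per-iteration connections, the change would be
>>>> roughly along the following lines; the connection parameters, the
>>>> query list, and the shapefile-writing step are placeholders for what
>>>> the script actually does:)
>>>>
>>>>     require 'pg'
>>>>
>>>>     # QUERIES is a placeholder for the existing list of
>>>>     # (mission_phase_name, original_product_id pattern) pairs
>>>>     QUERIES.each do |phase, pattern|
>>>>       # open a fresh connection through pgpool for each iteration
>>>>       conn = PG.connect(host: 'localhost', port: 9999, dbname: 'ourdb')
>>>>       begin
>>>>         res = conn.exec_params(
>>>>           "SELECT * FROM temp_shapes_daily " \
>>>>           "WHERE mission_phase_name = $1 AND original_product_id LIKE $2",
>>>>           [phase, pattern])
>>>>         write_shapefile(res)  # placeholder for the existing shapefile step
>>>>       ensure
>>>>         conn.close            # drop the connection before the next query
>>>>       end
>>>>     end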
>>>>
>>>> More on the issue of (nearly) empty return sets if I can reproduce
>>>> them.  Otherwise, I hope this info helps shed some light on this.
>>>>
>>>> Thanks,
>>>>
>>>> Joe
>>>>
>>>> On 2012-03-26 18:03, Matt Wise wrote:
>>>>> I believe we may be seeing this problem as well, but with much
>>>>> shorter query times. We have a few jobs that run database transactions
>>>>> for up to 15 minutes or so. They will do 500-1000 individual database
>>>>> queries that range from 1-30 seconds in length. We continue to see
>>>>> failures when running through PGPool and end up having to run these
>>>>> through our legacy PGBouncer service directly on the database hosts.
>>>>>
>>>>> We DO use SSL between PGPool and the backend PostgreSQL databases. We
>>>>> also use SSL between PGPool and the clients (each client runs a local
>>>>> copy of PGBouncer), but we do that with Stunnel so it's transparent
>>>>> to the applications.
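>>>>>
>>>>> (For what it's worth, a setup like ours can be wired roughly as
>>>>> follows, with stunnel at both ends; the ports, hostnames, and paths
>>>>> here are placeholders rather than our real values:)
>>>>>
>>>>>     ; on each client host: wrap the local plaintext connection in TLS
>>>>>     client = yes
>>>>>     [pgpool-tls]
>>>>>     accept  = 127.0.0.1:6543
>>>>>     connect = pgpool-host:6544
>>>>>
>>>>>     ; on the pgpool host: terminate TLS and hand off to pgpool locally
>>>>>     cert = /path/to/stunnel.pem
>>>>>     [pgpool-tls]
>>>>>     accept  = 6544
>>>>>     connect = 127.0.0.1:9999
>>>>>
>>>>> PGBouncer on each client just points at the local stunnel port, so the
>>>>> applications never have to know about SSL.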
>>>>>
>>>>> ―Matt
>>>>>
>>>>>
>>>>>
>>>>> On Mar 26, 2012, at 5:07 PM, Tatsuo Ishii wrote:
>>>>>
>>>>>> What pgpool version are you using?
>>>>>>
>>>>>> Also, if you disable SSL between pgpool and PostgreSQL, do you see any
>>>>>> difference?
>>>>>> --
>>>>>> Tatsuo Ishii
>>>>>> SRA OSS, Inc. Japan
>>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>>> Japanese: http://www.sraoss.co.jp
>>>>>>
>>>>>>> We are using pgpool in front of a postgresql 9.1 database and,
>>>>>>> generally, we are experiencing tremendous improvement in performance
>>>>>>> (and even stability) over accessing the database directly.  We are
>>>>>>> seeing one troubling behavior, however: long queries which produce a
>>>>>>> large amount of result data seem to kill (or are killed by) the pgpool
>>>>>>> process handling the connection.  By long/large, I mean connections
>>>>>>> lasting ninety minutes or more (often comprising just a handful of
>>>>>>> long-running queries) and returning >1GB of data.  Specifically, this
>>>>>>> is happening when querying (and returning) large, global sets of
>>>>>>> geographic data (which are not conducive to being run as smaller,
>>>>>>> independent queries), though we can reproduce this problem with any
>>>>>>> query taking sufficient time and returning sufficient data.
>>>>>>>
>>>>>>> The workaround is to run these queries against the database directly,
>>>>>>> bypassing pgpool.  We would, however, like to require nearly all
>>>>>>> access to the database to go through pgpool, restricting direct access
>>>>>>> to things of an administrative nature.
>>>>>>>
>>>>>>> Ideas, advice, and criticism welcome, thanks,
>>>>>>>
>>>>>>> Joe Barnett
>>>>>>>
>>>>>>> --- Errors and logs follow ---
>>>>>>>
>>>>>>> ### Error returned when connection goes away (from Ruby script):
>>>>>>>
>>>>>>> (with SSL enabled)
>>>>>>> ./bin/shapes.rb:150:in `exec': SSL SYSCALL error: EOF detected
>>>>>>> (PG::Error)
>>>>>>>
>>>>>>> (with SSL disabled)
>>>>>>> ./bin/shapes.rb:150:in `exec': PG::Error
>>>>>>>
>>>>>>> ### Excerpts from pgpool.log:
>>>>>>>
>>>>>>> 2012-03-26 13:16:35 LOG: pid 14860: statement: SELECT
>>>>>>> max(octet_length(hist_file_path::text)) from temp_shapes_daily
>>>>>>> 2012-03-26 13:16:35 LOG: pid 14860: DB node id: 0 backend pid: 16005
>>>>>>> statement: SELECT max(octet_length(hist_file_path::text)) from
>>>>>>> temp_shapes_daily
>>>>>>> 2012-03-26 13:16:36 LOG: pid 14860: statement: SELECT
>>>>>>> max(octet_length(hist_file_name::text)) from temp_shapes_daily
>>>>>>> 2012-03-26 13:16:36 LOG: pid 14860: DB node id: 0 backend pid: 16005
>>>>>>> statement: SELECT max(octet_length(hist_file_name::text)) from
>>>>>>> temp_shapes_daily
>>>>>>> 2012-03-26 13:16:37 LOG: pid 14860: statement: SELECT *
>>>>>>>                        FROM temp_shapes_daily
>>>>>>>                        WHERE mission_phase_name = 'COMMISSIONING'
>>>>>>>                        AND original_product_id LIKE 'nac%'
>>>>>>>
>>>>>>> 2012-03-26 13:16:37 LOG: pid 14860: DB node id: 0 backend pid: 16005
>>>>>>> statement: SELECT *
>>>>>>>                        FROM temp_shapes_daily
>>>>>>>                        WHERE mission_phase_name = 'COMMISSIONING'
>>>>>>>                        AND original_product_id LIKE 'nac%'
>>>>>>>
>>>>>>> 2012-03-26 13:17:28 LOG: pid 14860: statement: SELECT *
>>>>>>>                        FROM temp_shapes_daily
>>>>>>>                        WHERE mission_phase_name = 'COMMISSIONING'
>>>>>>>                        AND original_product_id LIKE 'wac%'
>>>>>>>
>>>>>>> 2012-03-26 13:17:28 LOG: pid 14860: DB node id: 0 backend pid: 16005
>>>>>>> statement: SELECT *
>>>>>>>                        FROM temp_shapes_daily
>>>>>>>                        WHERE mission_phase_name = 'COMMISSIONING'
>>>>>>>                        AND original_product_id LIKE 'wac%'
>>>>>>>
>>>>>>> 2012-03-26 13:17:33 LOG: pid 14846: connection received:
>>>>>>> host=localhost port=39636
>>>>>>> 2012-03-26 13:18:08 LOG: pid 14860: statement: SELECT *
>>>>>>>                        FROM temp_shapes_daily
>>>>>>>                        WHERE mission_phase_name = 'NOMINAL MISSION'
>>>>>>>                        AND original_product_id LIKE 'nac%'
>>>>>>>
>>>>>>> 2012-03-26 13:18:08 LOG: pid 14860: DB node id: 0 backend pid: 16005
>>>>>>> statement: SELECT *
>>>>>>>                        FROM temp_shapes_daily
>>>>>>>                        WHERE mission_phase_name = 'NOMINAL MISSION'
>>>>>>>                        AND original_product_id LIKE 'nac%'
>>>>>>>
>>>>>>> 2012-03-26 13:22:38 LOG: pid 14860: statement: SELECT *
>>>>>>>                        FROM temp_shapes_daily
>>>>>>>                        WHERE mission_phase_name = 'NOMINAL MISSION'
>>>>>>>                        AND original_product_id LIKE 'wac%'
>>>>>>>
>>>>>>> 2012-03-26 13:22:38 LOG: pid 14860: DB node id: 0 backend pid: 16005
>>>>>>> statement: SELECT *
>>>>>>>                        FROM temp_shapes_daily
>>>>>>>                        WHERE mission_phase_name = 'NOMINAL MISSION'
>>>>>>>                        AND original_product_id LIKE 'wac%'
>>>>>>>
>>>>>>> 2012-03-26 13:22:43 ERROR: pid 14860: pool_read2: EOF encountered with
>>>>>>> backend
>>>>>>> 2012-03-26 13:22:43 LOG: pid 14860: do_child: exits with status 1 due
>>>>>>> to error
>>>>>>>
>>>>>>> (I have pgpool logs with DEBUG info as well, but they are very large,
>>>>>>> and I am not certain how much useful information they contain.  I
>>>>>>> can certainly provide excerpts from those logs if anyone thinks they
>>>>>>> would be useful--perhaps suggest things I might look for.)
>>>>>>>
>>>>>>> ### Excerpts from postgresql log:
>>>>>>>
>>>>>>> LOG:  could not send data to client: Connection reset by peer
>>>>>>> STATEMENT:                      SELECT *
>>>>>>>                                FROM temp_shapes_daily
>>>>>>>                                WHERE mission_phase_name = 'NOMINAL MISSION'
>>>>>>>                                AND original_product_id LIKE 'wac%'
>>>>>>>
>>>>>>> LOG:  could not send data to client: Broken pipe
>>>>>>> STATEMENT:                      SELECT *
>>>>>>>                                FROM temp_shapes_daily
>>>>>>>                                WHERE mission_phase_name = 'NOMINAL MISSION'
>>>>>>>                                AND original_product_id LIKE 'wac%'
>>>>>>>
>>>>>>> LOG:  SSL error: bad write retry
>>>>>>> LOG:  could not send data to client: Connection reset by peer
>>>>>>>
> 
> -- 
> Joe Barnett
> 623.670.1326
> 
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general

