[pgpool-general: 391] Re: Trouble with connection during long/large data queries

Joe Barnett joe.barnett at mr72.com
Thu Apr 26 04:12:31 JST 2012


My apologies for not getting back sooner: the success in my initial 
testing of the patch (which, far as I can tell, made it into 3.1.3) was 
short lived.  What I have found is that about thirty percent of the 
tests are successful, but that the rest now fail in the same manner as 
when I was running with "ssl = off" in pgpool.conf (with 3.1.2).  To 
summarize:

System: (test) postgresql 9.1.3 and pgpool-ii 3.1.3  (this has also 
been reproduced with postgres 9.1.2 and pgpool-ii 3.1.2 with the 
pool_ssl.patch applied)

With "ssl = on" in pgpool.conf
~30% success
~70% failure.  The failure in my case is that the dbf files created as 
part of the shapefile creation process are empty.  This is the same as 
described below when "ssl = off" was set in pgpool.conf using stock 
3.1.2.  Also similar to earlier behavior is that there is nothing 
seemingly obvious in postgres or pgpool logs (no signs of errors nor 
abnormal exits, etc.).  All processes finish as though successful.

I do not believe this to be an issue with Ruby's shapelib or other 
items as, with or without SSL, these build successfully every time when 
connecting directly to the database.

In some ways this is worse than before in that, before, the process 
noticeably failed.  Now it completes with no obvious sign of error 
(other than the empty dbf files in ~70% of cases).

I am at a loss on this as far what I can send to the list/developers.  
I would be happy to send any logs and such that I can, please advise as 
to the options I might use in pgpool.conf or when invoking pgpool 
(again, at present, I see nothing in the pgpool or postgres logs which 
look useful).

As for our use of pgpool, it is still beneficial for just about 
everything we do, and we do have workarounds for this problem.

Thanks,

Joe

On 2012-04-03 17:09, Joe Barnett wrote:
> Thank you for this patch.  It works in my initial testing (including
> "ssl = on" in pgpool.conf).  I will do some additional testing, but 
> so
> far this looks great!
>
> Again, thanks,
>
> Joe Barnett
>
> On 2012-04-03 04:26, Tatsuo Ishii wrote:
>> I was able to reproduce your problem while retrieving large data 
>> from
>> PostgreSQL, with SSL enabled between pgpool and PostgreSQL. Included
>> is the patch which should fix the problem(this patch solved the
>> reporduced problem in our environment). Please try if you like.
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>>> Thank you for the responses.
>>>
>>> We are running pgpool-ii 3.1.2 against postgres 9.1.2.  While we
>>> employ (postgres') streaming replication, we point pgpool only to 
>>> the
>>> master server at this time (running a pgpool daemon local to each
>>> client machine).  In test, however, we do point to both the master 
>>> and
>>> the (hot) standby server.  The pgpool behavior we are seeing is
>>> identical on both our test and production systems.
>>>
>>> I followed the suggestion of disabling SSL at the pgpool level 
>>> ("ssl =
>>> off" in pgpool.conf; prior I had just set PGSSLMODE=disable in the
>>> shell prior to running my Ruby script).  This produced some
>>> interesting results (which I will try to reproduce): the script ran
>>> without error, and errors were logged in neither the postgres log 
>>> nor
>>> the pgpool log, but the resulting output contained no data.  The
>>> output, in this case, are shapefiles (ESRI formatted geographic 
>>> data)
>>> each of which contain several pieces including a .dbf file.  The 
>>> .dbf
>>> files were empty, though everything else looked as would be 
>>> expected.
>>> As the logs give no clue, I can not really identify what happened.
>>> Again, I will try to reproduce this behavior.
>>>
>>> In a more normal setting (SSL enabled both from the client to 
>>> pgpool
>>> and from pgpool to the postgres server) we get an expected amount 
>>> of
>>> data for the queries that run before the connection goes away.  In
>>> cases where this is run through pgpool we get about 1/3 of the way
>>> through the job before the connection goes away.  When we access 
>>> the
>>> database directly, we get all of the data expected.  As I think 
>>> about
>>> this, I might be able to modify the script to request a new 
>>> connection
>>> for each iteration and that might prove to be another
>>> workaround--though even if it works, it will still only buy us some
>>> time rather than fix the root problem.
>>>
>>> More on the issue of (nearly) empty return sets if I can reproduce
>>> them.  Otherwise, I hope this info helps shed some light on this.
>>>
>>> Thanks,
>>>
>>> Joe
>>>
>>> On 2012-03-26 18:03, Matt Wise wrote:
>>>> I believe we may be seeing this problem as well, but with much
>>>> shorter querytimes. We have a few jobs that run database 
>>>> transactions
>>>> for up to 15minutes or so. They will do 500-1000 individual 
>>>> database
>>>> queries, that range from 1-30 seconds in length. We continue to 
>>>> see
>>>> failures when running through PGPool and end up having to run 
>>>> these
>>>> through our legacy PGBouncer service directly on the database 
>>>> hosts.
>>>>
>>>> We DO use SSL between PGPool and the backend PostgreSQL databases. 
>>>> We
>>>> also use SSL in between PGPool and the clients (which is a local 
>>>> copy
>>>> of PGBouncer on each client), but we do that with Stunnel so its
>>>> transparent to the applications.
>>>>
>>>> ―Matt
>>>>
>>>>
>>>>
>>>> On Mar 26, 2012, at 5:07 PM, Tatsuo Ishii wrote:
>>>>
>>>>> What pgpoo version are you using?
>>>>>
>>>>> Also if you disable SSL between pgpool and PostgreSQL, do you see 
>>>>> any
>>>>> difference?
>>>>> --
>>>>> Tatsuo Ishii
>>>>> SRA OSS, Inc. Japan
>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>> Japanese: http://www.sraoss.co.jp
>>>>>
>>>>>> We are using pgpool in front of a postrgesql 9.1 database and,
>>>>>> generally, we are experiencing tremendous improvement in 
>>>>>> performance
>>>>>> (and even stability) over accessing the database directly.  We 
>>>>>> are
>>>>>> seeing one behavior, however, which is troubling: long queries 
>>>>>> which
>>>>>> produce much resulting data seem to kill (or are killed by) the 
>>>>>> pgpool
>>>>>> process handling the connection.  By long/large, I mean 
>>>>>> connections
>>>>>> lasting ninety minutes or more (often comprising just a handful 
>>>>>> of
>>>>>> long running queries) and returning >1GB of data.  Specifically, 
>>>>>> this
>>>>>> is happening when querying (and returning) large, global sets of
>>>>>> geographic data (which are not conducive to being run as 
>>>>>> smaller,
>>>>>> independent queries), though we can reproduce this problem with 
>>>>>> any
>>>>>> query taking sufficient time and returning sufficient data.
>>>>>>
>>>>>> The workaround is to run these queries against the database 
>>>>>> directly,
>>>>>> bypassing pgpool.  We would, however, like to require nearly all
>>>>>> access to the database to go through pgpool, restricting direct 
>>>>>> access
>>>>>> to things of an administrative nature.
>>>>>>
>>>>>> Ideas, advice, and criticism welcome, thanks,
>>>>>>
>>>>>> Joe Barnett
>>>>>>
>>>>>> --- Errors and logs follow ---
>>>>>>
>>>>>> ### Error returned when connection goes away (from Ruby script):
>>>>>>
>>>>>> (with SSL enabled)
>>>>>> ./bin/shapes.rb:150:in `exec': SSL SYSCALL error: EOF detected
>>>>>> (PG::Error)
>>>>>>
>>>>>> (with SSL disabled)
>>>>>> ./bin/shapes.rb:150:in `exec': PG::Error
>>>>>>
>>>>>> ### Excerpts from pgpool.log:
>>>>>>
>>>>>> 2012-03-26 13:16:35 LOG: pid 14860: statement: SELECT
>>>>>> max(octet_length(hist_file_path::text)) from temp_shapes_daily
>>>>>> 2012-03-26 13:16:35 LOG: pid 14860: DB node id: 0 backend pid: 
>>>>>> 16005
>>>>>> statement: SELECT max(octet_length(hist_file_path::text)) from
>>>>>> temp_shapes_daily
>>>>>> 2012-03-26 13:16:36 LOG: pid 14860: statement: SELECT
>>>>>> max(octet_length(hist_file_name::text)) from temp_shapes_daily
>>>>>> 2012-03-26 13:16:36 LOG: pid 14860: DB node id: 0 backend pid: 
>>>>>> 16005
>>>>>> statement: SELECT max(octet_length(hist_file_name::text)) from
>>>>>> temp_shapes_daily
>>>>>> 2012-03-26 13:16:37 LOG: pid 14860: statement: SELECT *
>>>>>>                        FROM temp_shapes_daily
>>>>>>                        WHERE mission_phase_name = 
>>>>>> 'COMMISSIONING'
>>>>>>                        AND original_product_id LIKE 'nac%'
>>>>>>
>>>>>> 2012-03-26 13:16:37 LOG: pid 14860: DB node id: 0 backend pid: 
>>>>>> 16005
>>>>>> statement: SELECT *
>>>>>>                        FROM temp_shapes_daily
>>>>>>                        WHERE mission_phase_name = 
>>>>>> 'COMMISSIONING'
>>>>>>                        AND original_product_id LIKE 'nac%'
>>>>>>
>>>>>> 2012-03-26 13:17:28 LOG: pid 14860: statement: SELECT *
>>>>>>                        FROM temp_shapes_daily
>>>>>>                        WHERE mission_phase_name = 
>>>>>> 'COMMISSIONING'
>>>>>>                        AND original_product_id LIKE 'wac%'
>>>>>>
>>>>>> 2012-03-26 13:17:28 LOG: pid 14860: DB node id: 0 backend pid: 
>>>>>> 16005
>>>>>> statement: SELECT *
>>>>>>                        FROM temp_shapes_daily
>>>>>>                        WHERE mission_phase_name = 
>>>>>> 'COMMISSIONING'
>>>>>>                        AND original_product_id LIKE 'wac%'
>>>>>>
>>>>>> 2012-03-26 13:17:33 LOG: pid 14846: connection received:
>>>>>> host=localhost port=39636
>>>>>> 2012-03-26 13:18:08 LOG: pid 14860: statement: SELECT *
>>>>>>                        FROM temp_shapes_daily
>>>>>>                        WHERE mission_phase_name = 'NOMINAL 
>>>>>> MISSION'
>>>>>>                        AND original_product_id LIKE 'nac%'
>>>>>>
>>>>>> 2012-03-26 13:18:08 LOG: pid 14860: DB node id: 0 backend pid: 
>>>>>> 16005
>>>>>> statement: SELECT *
>>>>>>                        FROM temp_shapes_daily
>>>>>>                        WHERE mission_phase_name = 'NOMINAL 
>>>>>> MISSION'
>>>>>>                        AND original_product_id LIKE 'nac%'
>>>>>>
>>>>>> 2012-03-26 13:22:38 LOG: pid 14860: statement: SELECT *
>>>>>>                        FROM temp_shapes_daily
>>>>>>                        WHERE mission_phase_name = 'NOMINAL 
>>>>>> MISSION'
>>>>>>                        AND original_product_id LIKE 'wac%'
>>>>>>
>>>>>> 2012-03-26 13:22:38 LOG: pid 14860: DB node id: 0 backend pid: 
>>>>>> 16005
>>>>>> statement: SELECT *
>>>>>>                        FROM temp_shapes_daily
>>>>>>                        WHERE mission_phase_name = 'NOMINAL 
>>>>>> MISSION'
>>>>>>                        AND original_product_id LIKE 'wac%'
>>>>>>
>>>>>> 2012-03-26 13:22:43 ERROR: pid 14860: pool_read2: EOF 
>>>>>> encountered with
>>>>>> backend
>>>>>> 2012-03-26 13:22:43 LOG: pid 14860: do_child: exits with status 
>>>>>> 1 due
>>>>>> to error
>>>>>>
>>>>>> (I have pgpool logs with DEBUG info as well, but they are very 
>>>>>> large
>>>>>> (and I am not certain how much useful information they contain). 
>>>>>> I
>>>>>> can certainly provide excerpts from those logs if anyone thinks 
>>>>>> they
>>>>>> are useful--perhaps suggest things for which I might look.)
>>>>>>
>>>>>> ### Excerpts from postgresql log:
>>>>>>
>>>>>> LOG:  could not send data to client: Connection reset by peer
>>>>>> STATEMENT:                      SELECT *
>>>>>>                                FROM temp_shapes_daily
>>>>>>                                WHERE mission_phase_name = 
>>>>>> 'NOMINAL MISSION'
>>>>>>                                AND original_product_id LIKE 
>>>>>> 'wac%'
>>>>>>
>>>>>> LOG:  could not send data to client: Broken pipe
>>>>>> STATEMENT:                      SELECT *
>>>>>>                                FROM temp_shapes_daily
>>>>>>                                WHERE mission_phase_name = 
>>>>>> 'NOMINAL MISSION'
>>>>>>                                AND original_product_id LIKE 
>>>>>> 'wac%'
>>>>>>
>>>>>> LOG:  SSL error: bad write retry
>>>>>> LOG:  could not send data to client: Connection reset by peer
>>>>>>
>>>>>> _______________________________________________
>>>>>> pgpool-general mailing list
>>>>>> pgpool-general at pgpool.net
>>>>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>>>> _______________________________________________
>>>>> pgpool-general mailing list
>>>>> pgpool-general at pgpool.net
>>>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general

-- 
Joe Barnett
623.670.1326



More information about the pgpool-general mailing list