[pgpool-general: 812] Re: load balancing seems to be bottlenecked by performance of master

Tatsuo Ishii ishii at postgresql.org
Thu Jul 26 07:35:55 JST 2012


From: Lonni J Friedman <netllama at gmail.com>
Subject: Re: [pgpool-general: 597] Re: load balancing seems to be bottlenecked by performance of master
Date: Wed, 25 Jul 2012 12:05:29 -0700
Message-ID: <CAP=oouG8DRV+FQw58xz=3YZCJwp2RiKZjPmhdZXsq+xkbndbPg at mail.gmail.com>

> On Mon, Jun 11, 2012 at 5:36 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>> On Mon, Jun 11, 2012 at 4:13 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>>> On Thu, Jun 7, 2012 at 11:01 AM, Lonni J Friedman <netllama at gmail.com> wrote:
>>>>>> On Tue, Jun 5, 2012 at 6:06 PM, Lonni J Friedman <netllama at gmail.com> wrote:
>>>>>>> On Tue, Jun 5, 2012 at 5:26 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>>>>>>> On Tue, Jun 5, 2012 at 4:13 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>>>>>>>>>> On Mon, Jun 4, 2012 at 10:21 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>>>>>>>>>>> On Mon, Jun 4, 2012 at 3:40 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>>>>>>>>>>>>> You're correct, I only needed to perform a reload.  I kept this change
>>>>>>>>>>>>>>> enabled for all of 36 seconds, and in that time there were 1597 times
>>>>>>>>>>>>>>> that the following query was logged originating from the pgpool server
>>>>>>>>>>>>>>> IP address:
>>>>>>>>>>>>>>> SELECT count(*) FROM pg_catalog.pg_class AS c WHERE c.oid ...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 3204 times that the following query was logged originating from the
>>>>>>>>>>>>>>> pgpool server IP address:
>>>>>>>>>>>>>>> SELECT count(*) FROM pg_class AS c, pg_namespace AS n WHERE c.relname ...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Those numbers seem excessive to me, but perhaps this is
>>>>>>>>>>>>>>> expected/normal?  It definitely seems like it would explain why perf
>>>>>>>>>>>>>>> always degrades whenever the master is very busy, if that volume of
>>>>>>>>>>>>>>> queries needs to be sustained.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That depends on your use case. If client's query involves many tables,
>>>>>>>>>>>>>> it requires many catalog lookups. Also the catalog cache of pgpool's
>>>>>>>>>>>>>> life time is same as pgpool child process lifetime.  If you would show
>>>>>>>>>>>>>> me the complete log, I could make more precise analysis.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sure, attached as pg.log.gz.
>>>>>>>>>>>>
>>>>>>>>>>>> Analyzing the log I noticed several things:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) I see many sessions which frequently connects/disconnects(over 1800
>>>>>>>>>>>>   in 36 seconds). That is, each session only lasts 20 ms in average.
>>>>>>>>>>>>
>>>>>>>>>>>> 2) A process issues exactly same query more than once. This suggests
>>>>>>>>>>>>   that relcache (catalog cache) size might be too small for your
>>>>>>>>>>>>   environment. The size of the cache is vary from cache to cache, but
>>>>>>>>>>>>   typical size is 128. So if you access more than 128 tables via
>>>>>>>>>>>>   pgpool, the cache replacement will happen. How many tables do you
>>>>>>>>>>>>   have?
>>>>>>>>>>>
>>>>>>>>>>> hundreds.  I don't see any option for setting the relcache size.  How
>>>>>>>>>>> do I make this change?
>>>>>>>>>>
>>>>>>>>>> Currently the only way is changing the source. pgpool-II 3.2 will has
>>>>>>>>>> new directive(relcache_size) to control this.
>>>>>>>>>
>>>>>>>>> When do you anticipate that 3.2 will be released?
>>>>>>>>
>>>>>>>> Officially not decided yet. I personally expect 3.2 is going to be out
>>>>>>>> by the end of this month.
>>>>>>>>
>>>>>>>>> Where in the source of 3.1.x would I make this change, and what would
>>>>>>>>> you suggest I change it to?
>>>>>>>>
>>>>>>>> grep "pool_create_relcache(" *.c will show places where relcache is
>>>>>>>> created. pool_relcache's first argument is the number of cache
>>>>>>>> entries, which you would want to increase.
>>>>>>
>>>>>> I don't have a good understanding of what the number represents, or
>>>>>> how much I'd ideally need.   what would you suggest increasing the
>>>>>> value to, or how can I determine a good value?
>>>>>
>>>>> I can't tell if the lack of a reply is because you don't have time to
>>>>> answer, or for some other reason.  I didn't think my questions were
>>>>> unreasonable.  I'd really appreciate some guidance on how to address
>>>>> this, as its a serious issue in my environment.
>>>>
>>>> You should increase them until cache replacement does not
>>>> happen. Problem is, how to know cache replacement? (Pgpool-II 3.2 will
>>>> emit log in this case)
>>>
>>> OK, if that's the case, then I guess I'm stuck until 3.2 is available.
>>>  Please make sure that whatever message appears in the log is well
>>> documented, along with clear guidance on how to select a larger cache
>>> size, so that I know what to look for.
>>
>> You will see something like:
>>
>> LOG: pool_search_relcache: cache replacement happend
>>
>> in this case.
>>
>>> Or alternatively, what would
>>> be the impact of selecting a value that was too large?  Performance,
>>> stability, something else?
>>
>> Memory size. There are 14 relcaches in the source code. Each relcache
>> entry size is 1064 bytes on my 64bit linux. So if you increase the
>> relcache size by 10 for example, the memory for a pgpool child will
>> increase 10*14*1064 = 148,960 bytes.
>>
>>>> So my only suggestion is, increase to number of tables you have in
>>>> your database.
>>>
>>> Sorry, I'm not following this, unless you really meant "decrease"
>>> instead of "increase".  How would increasing the number of tables in a
>>> database help?  I thought the problem was a result of there being too
>>> many tables to fit into the hardcoded cache  size limit?  Also, am I
>>> supposed to just randomly create tables that serve no purpose?
>>
>> Oops. What I meant was:
>>
>>> So my only suggestion is, increase the number of cache to be larger
>>> than the number of tables you have in your database.
> 
> I'm looking at the official 3.2-RC1 documentation as it applies to
> this (new) functionality, and I think I spotted a bug.  The doc says
> "relcache_size V3.2 -    Size of relation cache in seconds. Default is
> 256. If you see following message frequently, increase the number. ".
> Is this really measured in seconds?   256 seems like an odd number of
> seconds, plus, I don't usually think of the size of something in
> seconds.  Is this an error?

Yes, it's a bug. That should have been "Number of relcache entries. Default is 256."
Will fix.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


More information about the pgpool-general mailing list