[pgpool-general: 677] Re: strange load balancing issue in Solaris

Kuczynski, Rafal (LNG-POL) Rafal.Kuczynski at lexisnexis.pl
Fri Jun 29 18:04:42 JST 2012


> 2012-06-07 08:31:17 ERROR: pid 927: fork() failed. reason: Not enough space

This means you have run out of RAM/swap space. Either your server doesn't have enough RAM for 300 pgpool processes, or you have a memory leak problem.
You have to monitor RAM/swap usage while pgpool is running.

Regards,
Rafal



________________________________
From: pgpool-general-bounces at pgpool.net [mailto:pgpool-general-bounces at pgpool.net] On Behalf Of Aravinth
Sent: Friday, June 29, 2012 10:51 AM
To: Tatsuo Ishii
Cc: pgpool-general at pgpool.net
Subject: [pgpool-general: 674] Re: strange load balancing issue in Solaris

Guys,

I am facing another issue on the same Solaris machine.

I have initialized 300 pre-forked child processes using num_init_children in streaming replication mode. Everything works perfectly for a few hours.

After a few hours the connections drop with the error below. Also, pgpool doesn't accept any new connections.

Any ideas guys.....




2012-06-07 08:31:15 DEBUG: pid 927: fork a new child pid 1645
2012-06-07 08:31:15 DEBUG: pid 927: child 1413 exits with status 1 by signal 1
2012-06-07 08:31:15 DEBUG: pid 1645: I am 1645
2012-06-07 08:31:15 DEBUG: pid 1645: pool_initialize_private_backend_status: initialize backend status
2012-06-07 08:31:15 DEBUG: pid 927: fork a new child pid 1646
2012-06-07 08:31:15 DEBUG: pid 927: child 1410 exits with status 1 by signal 1
2012-06-07 08:31:15 DEBUG: pid 1646: I am 1646
2012-06-07 08:31:15 DEBUG: pid 1646: pool_initialize_private_backend_status: initialize backend status
2012-06-07 08:31:17 ERROR: pid 927: fork() failed. reason: Not enough space
[The remaining log output is interleaved: many child processes (PIDs between 1610 and 1646) wrote to the log at the same moment. The recoverable message, repeated once per child, is "DEBUG: pid <PID>: child received shutdown request signal 15".]



Regards,
Aravinth

On Thu, May 10, 2012 at 8:15 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
Good. Fix committed in master/V3_1_STABLE/V3_0_STABLE.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> It's working.
>
> Regards,
> Aravinth
>
>
> On Wed, May 9, 2012 at 5:26 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>
>> Thanks for the hint. Attached is a patch trying to fix the
>> problem. Can you please try it?
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>> > Yes the issue is with random() function.
>> >
>> > Looks like I have solved the problem by using rand.
>> >
>> > Regards,
>> > Aravinth
>> >
>> >
>> > On Wed, May 9, 2012 at 4:02 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> >
>> >> Thanks. Apparently random() on Solaris can return a value beyond
>> >> RAND_MAX! It's easy to fix the problem, but I would like to do it with
>> >> respect to portability. Any ideas?
>> >> --
>> >> Tatsuo Ishii
>> >> SRA OSS, Inc. Japan
>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> Japanese: http://www.sraoss.co.jp
>> >>
>> >> > From the Solaris 10 (x86) man page:
>> >> >
>> >> >
>> >> > SYNOPSIS
>> >> >      #include <stdlib.h>
>> >> >
>> >> >      long random(void);
>> >> >
>> >> >      void srandom(unsigned int seed);
>> >> >
>> >> >      char  *initstate(unsigned  int  seed,  char  *state,  size_t
>> >> >      size);
>> >> >
>> >> >      char *setstate(const char *state);
>> >> >
>> >> > DESCRIPTION
>> >> >      The random() function uses  a  nonlinear  additive  feedback
>> >> >      random-number generator employing a default state array size
>> >> >      of 31  long  integers  to  return  successive  pseudo-random
>> >> >      numbers  in the range from 0 to 2**31 -1. The period of this
>> >> >      random-number generator is approximately 16 x (2 **31   -1).
>> >> >      The  size  of  the  state array determines the period of the
>> >> >      random-number generator. Increasing  the  state  array  size
>> >> >      increases the period.
>> >> >
>> >> >      The srandom() function initializes the current  state  array
>> >> >      using the value of seed.
>> >> >
>> >> >
>> >> > (...)
>> >> >
>> >> >
>> >> >
>> >> > Regards,
>> >> > Rafal
>> >> >
>> >> >
>> >> >
>> >> > -----Original Message-----
>> >> > From: pgpool-general-bounces at pgpool.net [mailto:pgpool-general-bounces at pgpool.net] On Behalf Of Tatsuo Ishii
>> >> > Sent: Wednesday, May 09, 2012 11:44 AM
>> >> > To: caravinth at gmail.com
>> >> > Cc: pgpool-general at pgpool.net
>> >> > Subject: [pgpool-general: 431] Re: strange load balancing issue in Solaris
>> >> >
>> >> > Thanks.
>> >> >
>> >> > 2012-05-09 14:31:48 LOG:   pid 22459: r: 268356063.000000 total_weight: 32767.000000
>> >> >
>> >> > This is really weird. Here pgpool calculates this:
>> >> >
>> >> >       r = (((double)random())/RAND_MAX) * total_weight;
>> >> >
>> >> > Total weight is the same as RAND_MAX. It seems your random() returns a
>> >> > value bigger than RAND_MAX, which does not make sense because the man
>> >> > page of random(3) on my Linux says:
>> >> >
>> >> >        The random() function uses a non-linear additive feedback random
>> >> >        number generator employing a default table of size 31 long
>> >> >        integers to return successive pseudo-random numbers in the range
>> >> >        from 0 to RAND_MAX. The period of this random number generator
>> >> >        is very large, approximately 16 * ((2^31) - 1).
>> >> >
>> >> > What does your man page for random() say on your system?
>> >> > --
>> >> > Tatsuo Ishii
>> >> > SRA OSS, Inc. Japan
>> >> > English: http://www.sraoss.co.jp/index_en.php
>> >> > Japanese: http://www.sraoss.co.jp
>> >> >
>> >> >> Sorry. I missed it.
>> >> >>
>> >> >> Here is the log file.
>> >> >>
>> >> >> --Aravinth
>> >> >>
>> >> >>
>> >> >> On Wed, May 9, 2012 at 2:07 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> >> >>
>> >> >>> > The code you have sent is same in child.c.
>> >> >>>
>> >> >>> No.
>> >> >>>
>> >> >>>        pool_log("r: %f total_weight: %f", r, total_weight);
>> >> >>>
>> >> >>> You need to add the line above to get useful information.
>> >> >>> --
>> >> >>> Tatsuo Ishii
>> >> >>> SRA OSS, Inc. Japan
>> >> >>> English: http://www.sraoss.co.jp/index_en.php
>> >> >>> Japanese: http://www.sraoss.co.jp
>> >> >>>
>> >> >>>
>> >> >>> > I have attached the log file. Please check
>> >> >>> >
>> >> >>> >
>> >> >>> > --Aravinth
>> >> >>> >
>> >> >>> >
>> >> >>> > On Tue, May 8, 2012 at 6:20 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> >> >>> >
>> >> >>> >> I suspect there's some portability issue with the load balance code.
>> >> >>> >> The actual source code is in select_load_balancing_node() (child.c).
>> >> >>> >> Please modify the source code as shown below and connect to pgpool
>> >> >>> >> using psql. Please send the log output.
>> >> >>> >> --
>> >> >>> >> Tatsuo Ishii
>> >> >>> >> SRA OSS, Inc. Japan
>> >> >>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> >>> >> Japanese: http://www.sraoss.co.jp
>> >> >>> >>
>> >> >>> >> int select_load_balancing_node(void)
>> >> >>> >> {
>> >> >>> >>        int selected_slot;
>> >> >>> >>        double total_weight,r;
>> >> >>> >>        int i;
>> >> >>> >>
>> >> >>> >>        /* choose a backend in random manner with weight */
>> >> >>> >>        selected_slot = MASTER_NODE_ID;
>> >> >>> >>        total_weight = 0.0;
>> >> >>> >>
>> >> >>> >>        for (i=0;i<NUM_BACKENDS;i++)
>> >> >>> >>        {
>> >> >>> >>                if (VALID_BACKEND(i))
>> >> >>> >>                {
>> >> >>> >>                        total_weight += BACKEND_INFO(i).backend_weight;
>> >> >>> >>                }
>> >> >>> >>        }
>> >> >>> >>        r = (((double)random())/RAND_MAX) * total_weight;
>> >> >>> >>        pool_log("r: %f total_weight: %f", r, total_weight);  /* <-- add this */
>> >> >>> >>
>> >> >>> >>        total_weight = 0.0;
>> >> >>> >>        for (i=0;i<NUM_BACKENDS;i++)
>> >> >>> >>        {
>> >> >>> >>                if (VALID_BACKEND(i) && BACKEND_INFO(i).backend_weight > 0.0)
>> >> >>> >>                {
>> >> >>> >>                        if(r >= total_weight)
>> >> >>> >>                                selected_slot = i;
>> >> >>> >>                        else
>> >> >>> >>                                break;
>> >> >>> >>                        total_weight += BACKEND_INFO(i).backend_weight;
>> >> >>> >>                }
>> >> >>> >>        }
>> >> >>> >>
>> >> >>> >>        pool_debug("select_load_balancing_node: selected backend id is %d", selected_slot);
>> >> >>> >>        return selected_slot;
>> >> >>> >> }
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> > Hi Tatsuo, Thanks for the reply.
>> >> >>> >> >
>> >> >>> >> > The normalized weights are 0.5 for both nodes and the selected node is
>> >> >>> >> > always the same node. So I suppose it's srandom() then.
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > Any idea how to solve this srandom() issue?
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > Thanks and Regards,
>> >> >>> >> > Aravinth
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > ________________________________
>> >> >>> >> >  From: Tatsuo Ishii <ishii at postgresql.org>
>> >> >>> >> > To: aravinth at mafiree.com
>> >> >>> >> > Cc: pgpool-general at pgpool.net
>> >> >>> >> > Sent: Tuesday, May 1, 2012 4:41 AM
>> >> >>> >> > Subject: Re: [pgpool-general: 396] strange load balancing issue in Solaris
>> >> >>> >> >
>> >> >>> >> > First of all, please check that the "normalized" weights are as you
>> >> >>> >> > expected. Run "show pool_status;" and see the "backend_weight0" and
>> >> >>> >> > "backend_weight1" sections. You will see floating point numbers,
>> >> >>> >> > which are the normalized weights between 0.0 and 1.0. If both are
>> >> >>> >> > 0.5, the primary and standby are given the same weight.
>> >> >>> >> >
>> >> >>> >> > If they are ok, I suspect the srandom() function behaves differently
>> >> >>> >> > from other platforms. Pgpool-II chooses the load balance node by
>> >> >>> >> > using srandom(). select_load_balancing_node() is the function
>> >> >>> >> > responsible for selecting the load balance node. If you run
>> >> >>> >> > pgpool-II with the -d (debug) option, you will see the following in
>> >> >>> >> > the log:
>> >> >>> >> >
>> >> >>> >> >     pool_debug("select_load_balancing_node: selected backend id is %d", selected_slot);
>> >> >>> >> >
>> >> >>> >> > If backend_weight in show pool_status is fine but the line above
>> >> >>> >> > always shows the same number, it is a sign that we have a problem
>> >> >>> >> > with srandom().
>> >> >>> >> > --
>> >> >>> >> > Tatsuo Ishii
>> >> >>> >> > SRA OSS, Inc. Japan
>> >> >>> >> > English: http://www.sraoss.co.jp/index_en.php
>> >> >>> >> > Japanese: http://www.sraoss.co.jp
>> >> >>> >> >
>> >> >>> >> >> Hi All,
>> >> >>> >> >>
>> >> >>> >> >> I am facing a strange issue in load balancing with replication mode
>> >> >>> >> >> set to true in Solaris. The load balancing algorithm always selects
>> >> >>> >> >> the same node, whatever the backend weight may be.
>> >> >>> >> >>
>> >> >>> >> >> Here is the scenario.
>> >> >>> >> >>
>> >> >>> >> >> I have pgpool installed on 1 server
>> >> >>> >> >> 2 postgres nodes on 2 other servers
>> >> >>> >> >> replication mode set to true and load balancing set to true
>> >> >>> >> >> backend weight of the 2 nodes is 1.
>> >> >>> >> >>
>> >> >>> >> >> When I fire the queries manually using different connections or
>> >> >>> >> >> using pgbench, all the queries hit the same node. The load balancing
>> >> >>> >> >> algorithm always selects the same node.
>> >> >>> >> >> Changing the backend weight has no effect. Only when I set the
>> >> >>> >> >> backend weight to 0 do hits go to the other server.
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >> I face this issue only in Solaris. The same setup on other servers
>> >> >>> >> >> (CentOS, RHEL, Ubuntu etc.) does the load balancing perfectly.
>> >> >>> >> >>
>> >> >>> >> >> I also tried various postgres versions and pgpool versions with the
>> >> >>> >> >> same result. But every version runs fine on other servers.
>> >> >>> >> >>
>> >> >>> >> >> Has anyone faced this issue?
>> >> >>> >> >>
>> >> >>> >> >> Any information would be highly helpful.
>> >> >>> >> >>
>> >> >>> >> >> Regards,
>> >> >>> >> >> Aravinth
>> >> >>> >> _______________________________________________
>> >> >>> >> pgpool-general mailing list
>> >> >>> >> pgpool-general at pgpool.net<mailto:pgpool-general at pgpool.net>
>> >> >>> >> http://www.pgpool.net/mailman/listinfo/pgpool-general
>> >> >>> >>
>> >> >>>
>> >>
>>
>>
>>
