[pgpool-general: 679] Re: strange load balancing issue in Solaris

Aravinth aravinth at mafiree.com
Fri Jun 29 18:14:12 JST 2012


Hi Rafal,

The total RAM size is 16GB. Also, pgpool was able to handle 300 connections
for a few hours.

I have the same setup on a CentOS machine with less RAM, and it runs without
any issues. The issue only occurs on Solaris.

Any thoughts?


Regards,
Aravinth


On Fri, Jun 29, 2012 at 2:34 PM, Kuczynski, Rafal (LNG-POL) <
Rafal.Kuczynski at lexisnexis.pl> wrote:

>
> > 2012-06-07 08:31:17 ERROR: pid 927: fork() failed. reason: Not enough space
>
> This means you have run out of RAM/swap space. Either your server doesn't
> have enough RAM for 300 pgpool processes, or you have a memory leak.
>
> You should monitor RAM/swap usage while pgpool is running.
>
> Regards,
>
> Rafal
>
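On Solaris, "Not enough space" is the strerror() text for ENOMEM, which the Solaris fork(2) man page attributes to insufficient swap space. A minimal sketch of reporting that cause at the fork() call site, so a shortage of memory/swap can be told apart from a process-count limit; spawn_child(), reap_child() and noop_child() are hypothetical names, not pgpool code:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not pgpool code): fork a child that runs
 * child_main() and exits; on failure, log why fork() failed. */
static pid_t spawn_child(void (*child_main)(void))
{
    pid_t pid;

    assert(child_main != NULL);
    pid = fork();
    if (pid < 0) {
        int err = errno;    /* save errno before calling anything else */

        fprintf(stderr, "fork() failed: %s (errno %d)\n", strerror(err), err);
        if (err == ENOMEM)
            fprintf(stderr, "  likely cause: not enough memory/swap for another process\n");
        else if (err == EAGAIN)
            fprintf(stderr, "  likely cause: process limit or temporary resource shortage\n");
        return -1;
    }
    if (pid == 0) {         /* child */
        child_main();
        _exit(0);           /* skip atexit handlers and the parent's stdio buffers */
    }
    return pid;             /* parent: the child's pid */
}

/* Wait for a child and return its exit status, or -1 on error. */
static int reap_child(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Trivial child body for trying the helper out. */
static void noop_child(void)
{
}
```

Logging errno this way separates ENOMEM (memory/swap) from EAGAIN (process limits), which is the distinction Rafal's diagnosis rests on.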
>   ------------------------------
>
> *From:* pgpool-general-bounces at pgpool.net [mailto:
> pgpool-general-bounces at pgpool.net] *On Behalf Of *Aravinth
> *Sent:* Friday, June 29, 2012 10:51 AM
> *To:* Tatsuo Ishii
> *Cc:* pgpool-general at pgpool.net
> *Subject:* [pgpool-general: 674] Re: strange load balancing issue in
> Solaris
>
>
> Guys,
>
> I am facing another issue on the same Solaris setup.
>
> I have initialized 300 pre-forked connections using num_init_children in
> streaming replication mode. Everything works perfectly for a few hours.
>
> After a few hours the connections drop with the error below, and pgpool
> doesn't accept any new connections.
>
> Any ideas?
>
>
>
>
> 2012-06-07 08:31:15 DEBUG: pid 927: fork a new child pid 1645
> 2012-06-07 08:31:15 DEBUG: pid 927: child 1413 exits with status 1 by signal 1
> 2012-06-07 08:31:15 DEBUG: pid 1645: I am 1645
> 2012-06-07 08:31:15 DEBUG: pid 1645: pool_initialize_private_backend_status: initialize backend status
> 2012-06-07 08:31:15 DEBUG: pid 927: fork a new child pid 1646
> 2012-06-07 08:31:15 DEBUG: pid 927: child 1410 exits with status 1 by signal 1
> 2012-06-07 08:31:15 DEBUG: pid 1646: I am 1646
> 2012-06-07 08:31:15 DEBUG: pid 1646: pool_initialize_private_backend_status: initialize backend status
> 2012-06-07 08:31:17 ERROR: pid 927: fork() failed. reason: Not enough space
> [The rest of the log is output from many child processes written
> concurrently and interleaved on the same lines, all stamped 2012-06-07
> 08:31:17. The readable parts are DEBUG lines from pids in the 1610-1646
> range, each reporting "child received shutdown request signal 15".]
>
>
>
> Regards,
> Aravinth
>
>
> On Thu, May 10, 2012 at 8:15 AM, Tatsuo Ishii <ishii at postgresql.org>
> wrote:
>
> Good. Fix committed in master/V3_1_STABLE/V3_0_STABLE.
>
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
> > It's working.
> >
> > Regards,
> > Aravinth
> >
> >
> > On Wed, May 9, 2012 at 5:26 PM, Tatsuo Ishii <ishii at postgresql.org>
> wrote:
> >
> >> Thanks for the hint. Attached is a patch trying to fix the
> >> problem. Can you please try it?
> >> --
> >> Tatsuo Ishii
> >>
> >> > Yes, the issue is with the random() function.
> >> >
> >> > It looks like I have solved the problem by using rand() instead.
> >> >
> >> > Regards,
> >> > Aravinth
> >> >
> >> >
> >> > On Wed, May 9, 2012 at 4:02 PM, Tatsuo Ishii <ishii at postgresql.org>
> >> wrote:
> >> >
> >> >> Thanks. Apparently Solaris's random() can return values beyond
> >> >> RAND_MAX! The problem is easy to fix, but I would like to fix it in
> >> >> a portable way. Any idea?
> >> >> --
> >> >> Tatsuo Ishii
> >> >>
> >> >> > From Solaris 10 (x86) man page:
> >> >> >
> >> >> >
> >> >> > SYNOPSIS
> >> >> >      #include <stdlib.h>
> >> >> >
> >> >> >      long random(void);
> >> >> >
> >> >> >      void srandom(unsigned int seed);
> >> >> >
> >> >> >      char  *initstate(unsigned  int  seed,  char  *state,  size_t
> >> >> >      size);
> >> >> >
> >> >> >      char *setstate(const char *state);
> >> >> >
> >> >> > DESCRIPTION
> >> >> >      The random() function uses  a  nonlinear  additive  feedback
> >> >> >      random-number generator employing a default state array size
> >> >> >      of 31  long  integers  to  return  successive  pseudo-random
> >> >> >      numbers  in the range from 0 to 2**31 -1. The period of this
> >> >> >      random-number generator is approximately 16 x (2 **31   -1).
> >> >> >      The  size  of  the  state array determines the period of the
> >> >> >      random-number generator. Increasing  the  state  array  size
> >> >> >      increases the period.
> >> >> >
> >> >> >      The srandom() function initializes the current  state  array
> >> >> >      using the value of seed.
> >> >> >
> >> >> >
> >> >> > (...)
> >> >> >
> >> >> >
> >> >> >
> >> >> > Regards,
> >> >> > Rafal
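The Solaris man page above gives random()'s range as 0 to 2^31-1, which POSIX also specifies independently of RAND_MAX (RAND_MAX only describes rand(), and it is 32767 on Solaris). One portable option, sketched here under that assumption, is to divide by 2^31-1 instead of RAND_MAX. RANDOM_RANGE_MAX, random_fraction() and weighted_draw() are hypothetical names, and this is not necessarily the fix that was committed:

```c
#define _XOPEN_SOURCE 700   /* expose random()/srandom() under strict C modes */
#include <assert.h>
#include <stdlib.h>

/* POSIX specifies that random() returns values in [0, 2^31 - 1]
 * regardless of RAND_MAX, which describes rand() only.
 * RANDOM_RANGE_MAX is a hypothetical name for that bound. */
#define RANDOM_RANGE_MAX 2147483647.0

/* Map one random() result onto [0.0, 1.0]. */
static double random_fraction(long v)
{
    assert(v >= 0 && (double)v <= RANDOM_RANGE_MAX);
    return (double)v / RANDOM_RANGE_MAX;
}

/* Draw r in [0, total_weight]: the value that the selection loop
 * compares against the running sum of backend weights. */
static double weighted_draw(double total_weight)
{
    return random_fraction(random()) * total_weight;
}
```

Aravinth's workaround of switching to rand() is the other consistent pairing, since rand() is specified to return values in [0, RAND_MAX]; what breaks on Solaris is dividing a random() result by RAND_MAX. On glibc RAND_MAX happens to equal 2^31-1, which would explain why the original expression behaves on Linux.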
> >> >> >
> >> >> >
> >> >> >
> >> >> > -----Original Message-----
> >> >> > From: pgpool-general-bounces at pgpool.net [mailto:
> >> >> pgpool-general-bounces at pgpool.net] On Behalf Of Tatsuo Ishii
> >> >> > Sent: Wednesday, May 09, 2012 11:44 AM
> >> >> > To: caravinth at gmail.com
> >> >> > Cc: pgpool-general at pgpool.net
> >> >> > Subject: [pgpool-general: 431] Re: strange load balancing issue in
> >> >> Solaris
> >> >> >
> >> >> > Thanks.
> >> >> >
> >> >> > 2012-05-09 14:31:48 LOG:   pid 22459: r: 268356063.000000 total_weight: 32767.000000
> >> >> >
> >> >> > This is really weird. Here pgpool calculates this:
> >> >> >
> >> >> >       r = (((double)random())/RAND_MAX) * total_weight;
> >> >> >
> >> >> > The total weight is the same as RAND_MAX. It seems your random()
> >> >> > returns values bigger than RAND_MAX, which does not make sense,
> >> >> > because the random(3) man page on my Linux system says:
> >> >> >
> >> >> >        The random() function uses a non-linear additive feedback random number
> >> >> >        generator employing a default table of size 31 long integers to return
> >> >> >        successive pseudo-random numbers in the range from 0 to RAND_MAX. The
> >> >> >        period of this random number generator is very large, approximately
> >> >> >        16 * ((2^31) - 1).
> >> >> >
> >> >> > What does your man page for random() say on your system?
> >> >> > --
> >> >> > Tatsuo Ishii
> >> >> >
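Tatsuo's numbers explain the symptom. With total_weight equal to RAND_MAX, r is simply the raw random() value, so on Solaris it is almost always far above the total weight and the selection walk never breaks early. A sketch of that walk on plain arrays: pick_node() is a hypothetical stand-in for the second loop of select_load_balancing_node(), and the weights 16383.5 + 16383.5 are an assumption chosen to add up to the logged total_weight of 32767:

```c
#include <assert.h>

/* Hypothetical stand-in for the second loop of select_load_balancing_node():
 * keep advancing while r is at or past the running total of weights.
 * weights[] plays the role of BACKEND_INFO(i).backend_weight of valid nodes. */
static int pick_node(double r, const double *weights, int n)
{
    int selected = 0;            /* stands in for MASTER_NODE_ID */
    double total = 0.0;

    assert(n > 0);
    for (int i = 0; i < n; i++) {
        if (weights[i] > 0.0) {  /* weight-0 nodes are skipped entirely */
            if (r >= total)
                selected = i;
            else
                break;
            total += weights[i];
        }
    }
    return selected;
}
```

Both reports fall out of this walk: with r = 268356063 every node passes the `r >= total` test, so the last weighted node always wins, and setting that node's weight to 0 makes the loop skip it, leaving the other node as the only one it can select.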
> >> >> >> Sorry, I missed it.
> >> >> >>
> >> >> >> Here is the log file.
> >> >> >>
> >> >> >> --Aravinth
> >> >> >>
> >> >> >>
> >> >> >> On Wed, May 9, 2012 at 2:07 PM, Tatsuo Ishii <
> ishii at postgresql.org>
> >> >> wrote:
> >> >> >>
> >> >> >>> > The code you sent is the same as in child.c.
> >> >> >>>
> >> >> >>> No.
> >> >> >>>
> >> >> >>>        pool_log("r: %f total_weight: %f", r, total_weight);
> >> >> >>>
> >> >> >>> You need to add the line above to get useful information.
> >> >> >>> --
> >> >> >>> Tatsuo Ishii
> >> >> >>>
> >> >> >>>
> >> >> >>> > I have attached the log file. Please check it.
> >> >> >>> >
> >> >> >>> >
> >> >> >>> > --Aravinth
> >> >> >>> >
> >> >> >>> >
> >> >> >>> > On Tue, May 8, 2012 at 6:20 AM, Tatsuo Ishii <
> >> ishii at postgresql.org>
> >> >> >>> wrote:
> >> >> >>> >
> >> >> >>> >> I suspect there's a portability issue with the load balance code.
> >> >> >>> >> The actual source code is in select_load_balancing_node() (child.c).
> >> >> >>> >> Please modify the source code as below, connect to pgpool using
> >> >> >>> >> psql, and send the log output.
> >> >> >>> >> --
> >> >> >>> >> Tatsuo Ishii
> >> >> >>> >>
> >> >> >>> >> int select_load_balancing_node(void)
> >> >> >>> >> {
> >> >> >>> >>        int selected_slot;
> >> >> >>> >>        double total_weight,r;
> >> >> >>> >>        int i;
> >> >> >>> >>
> >> >> >>> >>        /* choose a backend in random manner with weight */
> >> >> >>> >>        selected_slot = MASTER_NODE_ID;
> >> >> >>> >>        total_weight = 0.0;
> >> >> >>> >>
> >> >> >>> >>        for (i=0;i<NUM_BACKENDS;i++)
> >> >> >>> >>        {
> >> >> >>> >>                if (VALID_BACKEND(i))
> >> >> >>> >>                {
> >> >> >>> >>                        total_weight += BACKEND_INFO(i).backend_weight;
> >> >> >>> >>                }
> >> >> >>> >>        }
> >> >> >>> >>        r = (((double)random())/RAND_MAX) * total_weight;
> >> >> >>> >>        pool_log("r: %f total_weight: %f", r, total_weight);  /* <-- add this */
> >> >> >>> >>
> >> >> >>> >>        total_weight = 0.0;
> >> >> >>> >>        for (i=0;i<NUM_BACKENDS;i++)
> >> >> >>> >>        {
> >> >> >>> >>                if (VALID_BACKEND(i) && BACKEND_INFO(i).backend_weight > 0.0)
> >> >> >>> >>                {
> >> >> >>> >>                        if(r >= total_weight)
> >> >> >>> >>                                selected_slot = i;
> >> >> >>> >>                        else
> >> >> >>> >>                                break;
> >> >> >>> >>                        total_weight += BACKEND_INFO(i).backend_weight;
> >> >> >>> >>                }
> >> >> >>> >>        }
> >> >> >>> >>
> >> >> >>> >>        pool_debug("select_load_balancing_node: selected backend id is %d", selected_slot);
> >> >> >>> >>        return selected_slot;
> >> >> >>> >> }
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> > Hi Tatsuo, thanks for the reply.
> >> >> >>> >> >
> >> >> >>> >> > The normalized weights are 0.5 for both nodes, and the selected
> >> >> >>> >> > node is always the same one. So I suppose it's srandom().
> >> >> >>> >> >
> >> >> >>> >> >
> >> >> >>> >> > Any idea how to solve this srandom() issue?
> >> >> >>> >> >
> >> >> >>> >> >
> >> >> >>> >> > Thanks and Regards,
> >> >> >>> >> > Aravinth
> >> >> >>> >> >
> >> >> >>> >> >
> >> >> >>> >> > ________________________________
> >> >> >>> >> >  From: Tatsuo Ishii <ishii at postgresql.org>
> >> >> >>> >> > To: aravinth at mafiree.com
> >> >> >>> >> > Cc: pgpool-general at pgpool.net
> >> >> >>> >> > Sent: Tuesday, May 1, 2012 4:41 AM
> >> >> >>> >> > Subject: Re: [pgpool-general: 396] strange load balancing
> >> issue in
> >> >> >>> >> Solaris
> >> >> >>> >> >
> >> >> >>> >> > First of all, please check that the "normalized" weights are as you
> >> >> >>> >> > expected. Run "show pool_status;" and look at the "backend_weight0"
> >> >> >>> >> > and "backend_weight1" sections. You will see floating point numbers,
> >> >> >>> >> > which are the normalized weights between 0.0 and 1.0. If both are
> >> >> >>> >> > 0.5, the primary and standby are given the same weight.
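The normalization Tatsuo describes amounts to dividing each weight by the sum of all weights, so two nodes with backend_weight 1 each show up as 0.5. A minimal sketch of that arithmetic, assuming every configured node counts toward the sum; normalize_weights() is a hypothetical name, not the pgpool-II function:

```c
#include <assert.h>

/* Hypothetical helper: turn configured backend_weight values into the
 * fractions between 0.0 and 1.0 that "show pool_status" reports. */
static void normalize_weights(const double *raw, double *out, int n)
{
    double sum = 0.0;

    assert(n > 0);
    for (int i = 0; i < n; i++)
        sum += raw[i];
    assert(sum > 0.0);           /* at least one node must carry weight */
    for (int i = 0; i < n; i++)
        out[i] = raw[i] / sum;
}
```

With Aravinth's configuration (both weights 1) this gives 0.5 and 0.5, matching what he reported, which is why the weights were ruled out and suspicion moved to the random-number code.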
> >> >> >>> >> >
> >> >> >>> >> > If they are ok, I suspect the srandom() function behaves differently
> >> >> >>> >> > on Solaris than on other platforms. Pgpool-II chooses the load
> >> >> >>> >> > balance node by using srandom(). select_load_balancing_node() is the
> >> >> >>> >> > function responsible for selecting the load balance node. If you run
> >> >> >>> >> > pgpool-II with the -d (debug) option, you will see the output of the
> >> >> >>> >> > following call in the log:
> >> >> >>> >> >
> >> >> >>> >> >     pool_debug("select_load_balancing_node: selected backend id is %d", selected_slot);
> >> >> >>> >> >
> >> >> >>> >> > If backend_weight in show pool_status is fine but the line above
> >> >> >>> >> > always shows the same number, it is a sign that we have a problem
> >> >> >>> >> > with srandom().
> >> >> >>> >> > --
> >> >> >>> >> > Tatsuo Ishii
> >> >> >>> >> >
> >> >> >>> >> >> Hi All,
> >> >> >>> >> >>
> >> >> >>> >> >> I am facing a strange load balancing issue on Solaris with
> >> >> >>> >> >> replication mode set to true. The load balancing algorithm always
> >> >> >>> >> >> selects the same node, whatever the backend weight is.
> >> >> >>> >> >>
> >> >> >>> >> >> Here is the scenario:
> >> >> >>> >> >>
> >> >> >>> >> >> pgpool installed on one server
> >> >> >>> >> >> 2 PostgreSQL nodes on two other servers
> >> >> >>> >> >> replication mode and load balancing both set to true
> >> >> >>> >> >> backend weight of both nodes set to 1
> >> >> >>> >> >>
> >> >> >>> >> >> When I run queries manually over different connections, or with
> >> >> >>> >> >> pgbench, all the queries hit the same node. Changing the backend
> >> >> >>> >> >> weight has no effect. Only when I set the backend weight to 0 do
> >> >> >>> >> >> the hits go to the other server.
> >> >> >>> >> >>
> >> >> >>> >> >>
> >> >> >>> >> >> I face this issue only on Solaris. The same setup on other servers
> >> >> >>> >> >> (CentOS, RHEL, Ubuntu, etc.) does the load balancing perfectly.
> >> >> >>> >> >>
> >> >> >>> >> >> I also tried various PostgreSQL and pgpool versions with the same
> >> >> >>> >> >> result, but every version runs fine on the other servers.
> >> >> >>> >> >>
> >> >> >>> >> >> Has anyone faced this issue?
> >> >> >>> >> >>
> >> >> >>> >> >> Any information would be very helpful.
> >> >> >>> >> >>
> >> >> >>> >> >> Regards,
> >> >> >>> >> >> Aravinth
> >> >> >>> >> _______________________________________________
> >> >> >>> >> pgpool-general mailing list
> >> >> >>> >> pgpool-general at pgpool.net
> >> >> >>> >> http://www.pgpool.net/mailman/listinfo/pgpool-general
> >> >> >>> >>
> >> >> >>>