[pgpool-general: 691] Re: strange load balancing issue in Solaris

Kuczynski, Rafal (LNG-POL) Rafal.Kuczynski at lexisnexis.pl
Mon Jul 2 15:19:47 JST 2012


Yes, my config is streaming replication with two postgres 9.0.x nodes. Load balancing is off.

Regards,
Rafal



________________________________
From: caravinth at gmail.com [mailto:caravinth at gmail.com] On Behalf Of Aravinth
Sent: Saturday, June 30, 2012 6:48 AM
To: Kuczynski, Rafal (LNG-POL)
Cc: pgpool-general at pgpool.net
Subject: Re: [pgpool-general: 682] Re: strange load balancing issue in Solaris

Hi Rafal,

I shall try this.

Was your configuration in streaming replication mode?
Regards,
Aravinth

On Fri, Jun 29, 2012 at 4:06 PM, Kuczynski, Rafal (LNG-POL) <Rafal.Kuczynski at lexisnexis.pl> wrote:
The size of my pgpool processes (after many days of work without a restart) is about 12M (with configuration num_init_children = 100, max_pool = 5, child_life_time = 300).
I have another pgpool (num_init_children = 30, max_pool = 4, child_life_time = 300) and the size of its processes is about 4M.
100*5 creates 500 connections to the backend; 30*4 creates 120 connections to the backend.
In your case 300*1 creates 300 connections to the backend. What is the average size of your pgpool processes?
Check it right after pgpool starts, after a few hours of work, and when you get the error.

Regards,
Rafal
P.S. I have Solaris 10 8/11 s10x_u10wos_17b X86

________________________________
From: Aravinth [mailto:caravinth at gmail.com]
Sent: Friday, June 29, 2012 11:45 AM
To: Kuczynski, Rafal (LNG-POL)
Cc: pgpool-general at pgpool.net
Subject: Re: [pgpool-general: 680] Re: strange load balancing issue in Solaris

Hi Rafal,

Thanks for the reply.
It's Solaris 10 64-bit (Oracle Solaris 10 9/10 s10s_u9wos_14a SPARC).

In my configuration I have
num_init_children = 300
max_pool = 1
child_life_time = 300
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 1000


--Aravinth
On Fri, Jun 29, 2012 at 3:06 PM, Kuczynski, Rafal (LNG-POL) <Rafal.Kuczynski at lexisnexis.pl> wrote:
Check for memory leaks; they are possible on Solaris. Maybe after a few hours the amount of RAM used by the pgpool processes is significantly greater?
My suggestion is to observe RAM/swap usage and the size of the pgpool processes while it's running; maybe the size of the processes grows over time.

For comparison, my configuration is:
num_init_children = 100
max_pool = 5
with pgpool 3.1.3 on Solaris 10 (x86), and I have not encountered any problems with memory leaks.

Do you have any other software running on the server with pgpool? Is this a physical server or a virtual machine? What is your Solaris version?

Regards,
Rafal


________________________________
From: caravinth at gmail.com [mailto:caravinth at gmail.com] On Behalf Of Aravinth
Sent: Friday, June 29, 2012 11:14 AM
To: Kuczynski, Rafal (LNG-POL)
Cc: pgpool-general at pgpool.net
Subject: Re: [pgpool-general: 677] Re: strange load balancing issue in Solaris

Hi Rafal,

The total RAM size is 16GB. Also, pgpool was able to handle 300 connections for a few hours.

I have the same setup on a CentOS machine with less RAM that runs without any issues. The issue occurs only on Solaris.

Any thoughts?


Regards,
Aravinth
On Fri, Jun 29, 2012 at 2:34 PM, Kuczynski, Rafal (LNG-POL) <Rafal.Kuczynski at lexisnexis.pl> wrote:
> 2012-06-07 08:31:17 ERROR: pid 927: fork() failed. reason: Not enough space

This means you have run out of RAM/swap space. Either your server doesn't have enough RAM for 300 pgpool processes, or you have a problem with memory leaks.
You have to monitor ram/swap usage while pgpool is working.

Regards,
Rafal



________________________________
From: pgpool-general-bounces at pgpool.net [mailto:pgpool-general-bounces at pgpool.net] On Behalf Of Aravinth
Sent: Friday, June 29, 2012 10:51 AM
To: Tatsuo Ishii
Cc: pgpool-general at pgpool.net
Subject: [pgpool-general: 674] Re: strange load balancing issue in Solaris

Guys,

I am facing another issue on the same Solaris machine.

I have initialized 300 pre-forked children (num_init_children = 300) in streaming replication mode. Everything works perfectly for a few hours.

After a few hours the connections drop with the error below. Also, pgpool doesn't allow any new connections.

Any ideas, guys?




2012-06-07 08:31:15 DEBUG: pid 927: fork a new child pid 1645
2012-06-07 08:31:15 DEBUG: pid 927: child 1413 exits with status 1 by signal 1
2012-06-07 08:31:15 DEBUG: pid 1645: I am 1645
2012-06-07 08:31:15 DEBUG: pid 1645: pool_initialize_private_backend_status: initialize backend status
2012-06-07 08:31:15 DEBUG: pid 927: fork a new child pid 1646
2012-06-07 08:31:15 DEBUG: pid 927: child 1410 exits with status 1 by signal 1
2012-06-07 08:31:15 DEBUG: pid 1646: I am 1646
2012-06-07 08:31:15 DEBUG: pid 1646: pool_initialize_private_backend_status: initialize backend status
2012-06-07 08:31:17 ERROR: pid 927: fork() failed. reason: Not enough space
2012-06-07 08:31:17 DEBUG: pid <child pid>: child received shutdown request signal 15
[The line above was logged at the same second by many child processes (pids 1610 to 1646); their writes interleaved in the log file and the original output is garbled.]



Regards,
Aravinth
On Thu, May 10, 2012 at 8:15 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
Good. Fix committed in master/V3_1_STABLE/V3_0_STABLE.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> It's working.
>
> Regards,
> Aravinth
>
>
> On Wed, May 9, 2012 at 5:26 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>
>> Thanks for the hint. Attached is a patch trying to fix the
>> problem. Can you please try it?
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>> > Yes, the issue is with the random() function.
>> >
>> > It looks like I have solved the problem by using rand().
>> >
>> > Regards,
>> > Aravinth
>> >
>> >
>> > On Wed, May 9, 2012 at 4:02 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> >
>> >> Thanks. Apparently random() on Solaris can return values beyond
>> >> RAND_MAX! It's easy to fix the problem, but I would like to do it with
>> >> respect to portability. Any idea?
>> >> --
>> >> Tatsuo Ishii
>> >> SRA OSS, Inc. Japan
>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> Japanese: http://www.sraoss.co.jp
>> >>
>> >> > From Solaris 10 (x86) man page:
>> >> >
>> >> >
>> >> > SYNOPSIS
>> >> >      #include <stdlib.h>
>> >> >
>> >> >      long random(void);
>> >> >
>> >> >      void srandom(unsigned int seed);
>> >> >
>> >> >      char  *initstate(unsigned  int  seed,  char  *state,  size_t
>> >> >      size);
>> >> >
>> >> >      char *setstate(const char *state);
>> >> >
>> >> > DESCRIPTION
>> >> >      The random() function uses  a  nonlinear  additive  feedback
>> >> >      random-number generator employing a default state array size
>> >> >      of 31  long  integers  to  return  successive  pseudo-random
>> >> >      numbers  in the range from 0 to 2**31 -1. The period of this
>> >> >      random-number generator is approximately 16 x (2 **31   -1).
>> >> >      The  size  of  the  state array determines the period of the
>> >> >      random-number generator. Increasing  the  state  array  size
>> >> >      increases the period.
>> >> >
>> >> >      The srandom() function initializes the current  state  array
>> >> >      using the value of seed.
>> >> >
>> >> >
>> >> > (...)
>> >> >
>> >> >
>> >> >
>> >> > Regards,
>> >> > Rafal
>> >> >
>> >> >
>> >> >
>> >> > -----Original Message-----
>> >> > From: pgpool-general-bounces at pgpool.net [mailto:pgpool-general-bounces at pgpool.net] On Behalf Of Tatsuo Ishii
>> >> > Sent: Wednesday, May 09, 2012 11:44 AM
>> >> > To: caravinth at gmail.com
>> >> > Cc: pgpool-general at pgpool.net
>> >> > Subject: [pgpool-general: 431] Re: strange load balancing issue in Solaris
>> >> >
>> >> > Thanks.
>> >> >
>> >> > 2012-05-09 14:31:48 LOG:   pid 22459: r: 268356063.000000 total_weight: 32767.000000
>> >> >
>> >> > This is really weird. Here pgpool calculates this:
>> >> >
>> >> >       r = (((double)random())/RAND_MAX) * total_weight;
>> >> >
>> >> > Total weight is the same as RAND_MAX. It seems your random() returns
>> >> > values bigger than RAND_MAX, which does not make sense because the man
>> >> > page of random(3) on my Linux says:
>> >> >
>> >> >        The random() function uses a non-linear additive feedback random number
>> >> >        generator employing a default table of size 31 long integers to return
>> >> >        successive pseudo-random numbers in the range from 0 to RAND_MAX. The
>> >> >        period of this random number generator is very large, approximately
>> >> >        16 * ((2^31) - 1).
>> >> >
>> >> > What does your man page for random() say on your system?
>> >> > --
>> >> > Tatsuo Ishii
>> >> > SRA OSS, Inc. Japan
>> >> > English: http://www.sraoss.co.jp/index_en.php
>> >> > Japanese: http://www.sraoss.co.jp
>> >> >
>> >> >> Sorry, I missed it.
>> >> >>
>> >> >> Here is the log file.
>> >> >>
>> >> >> --Aravinth
>> >> >>
>> >> >>
>> >> >> On Wed, May 9, 2012 at 2:07 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> >> >>
>> >> >>> > The code you sent is the same as in child.c.
>> >> >>>
>> >> >>> No.
>> >> >>>
>> >> >>>        pool_log("r: %f total_weight: %f", r, total_weight);
>> >> >>>
>> >> >>> You need to add the line above to get useful information.
>> >> >>> --
>> >> >>> Tatsuo Ishii
>> >> >>> SRA OSS, Inc. Japan
>> >> >>> English: http://www.sraoss.co.jp/index_en.php
>> >> >>> Japanese: http://www.sraoss.co.jp
>> >> >>>
>> >> >>>
>> >> >>> > I have attached the log file. Please check
>> >> >>> >
>> >> >>> >
>> >> >>> > --Aravinth
>> >> >>> >
>> >> >>> >
>> >> >>> > On Tue, May 8, 2012 at 6:20 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> >> >>> >
>> >> >>> >> I suspect there's some portability issue with the load balance code. The
>> >> >>> >> actual source code is in select_load_balancing_node() (child.c).
>> >> >>> >> Please modify the source code as shown below and connect to pgpool using psql.
>> >> >>> >> Please send the log output.
>> >> >>> >> --
>> >> >>> >> Tatsuo Ishii
>> >> >>> >> SRA OSS, Inc. Japan
>> >> >>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> >>> >> Japanese: http://www.sraoss.co.jp
>> >> >>> >>
>> >> >>> >> int select_load_balancing_node(void)
>> >> >>> >> {
>> >> >>> >>        int selected_slot;
>> >> >>> >>        double total_weight,r;
>> >> >>> >>        int i;
>> >> >>> >>
>> >> >>> >>        /* choose a backend in random manner with weight */
>> >> >>> >>        selected_slot = MASTER_NODE_ID;
>> >> >>> >>        total_weight = 0.0;
>> >> >>> >>
>> >> >>> >>        for (i=0;i<NUM_BACKENDS;i++)
>> >> >>> >>        {
>> >> >>> >>                if (VALID_BACKEND(i))
>> >> >>> >>                {
>> >> >>> >>                        total_weight += BACKEND_INFO(i).backend_weight;
>> >> >>> >>                }
>> >> >>> >>        }
>> >> >>> >>        r = (((double)random())/RAND_MAX) * total_weight;
>> >> >>> >>        pool_log("r: %f total_weight: %f", r, total_weight);   /* <-- add this */
>> >> >>> >>
>> >> >>> >>        total_weight = 0.0;
>> >> >>> >>        for (i=0;i<NUM_BACKENDS;i++)
>> >> >>> >>        {
>> >> >>> >>                if (VALID_BACKEND(i) && BACKEND_INFO(i).backend_weight > 0.0)
>> >> >>> >>                {
>> >> >>> >>                        if(r >= total_weight)
>> >> >>> >>                                selected_slot = i;
>> >> >>> >>                        else
>> >> >>> >>                                break;
>> >> >>> >>                        total_weight += BACKEND_INFO(i).backend_weight;
>> >> >>> >>                }
>> >> >>> >>        }
>> >> >>> >>
>> >> >>> >>        pool_debug("select_load_balancing_node: selected backend id is %d", selected_slot);
>> >> >>> >>        return selected_slot;
>> >> >>> >> }
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> > Hi Tatsuo, thanks for the reply.
>> >> >>> >> >
>> >> >>> >> > The normalized weights are 0.5 for both nodes, and the selected node is
>> >> >>> >> > always the same node. So I guess it's srandom().
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > Any idea how to solve this srandom() issue?
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > Thanks and Regards,
>> >> >>> >> > Aravinth
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > ________________________________
>> >> >>> >> > From: Tatsuo Ishii <ishii at postgresql.org>
>> >> >>> >> > To: aravinth at mafiree.com
>> >> >>> >> > Cc: pgpool-general at pgpool.net
>> >> >>> >> > Sent: Tuesday, May 1, 2012 4:41 AM
>> >> >>> >> > Subject: Re: [pgpool-general: 396] strange load balancing issue in Solaris
>> >> >>> >> >
>> >> >>> >> > First of all, please check that the "normalized" weights are as you expected.
>> >> >>> >> > Run "show pool_status;" and look at the "backend_weight0" and "backend_weight1"
>> >> >>> >> > entries. You will see floating point numbers, which are the normalized
>> >> >>> >> > weights between 0.0 and 1.0. If both are 0.5, the primary and
>> >> >>> >> > standby are given the same weight.
>> >> >>> >> >
>> >> >>> >> > If they are OK, I suspect the srandom() function behaves differently
>> >> >>> >> > from other platforms. Pgpool-II chooses the load balance node by using
>> >> >>> >> > random() (seeded with srandom()). select_load_balancing_node() is the function
>> >> >>> >> > responsible for selecting the load balance node. If you run pgpool-II
>> >> >>> >> > with the -d (debug) option, you will see the output of this line in the log:
>> >> >>> >> >
>> >> >>> >> >     pool_debug("select_load_balancing_node: selected backend id is %d", selected_slot);
>> >> >>> >> >
>> >> >>> >> > If backend_weight in show pool_status is fine but the line above
>> >> >>> >> > always shows the same number, it is a sign that we have a problem with
>> >> >>> >> > srandom().
>> >> >>> >> > --
>> >> >>> >> > Tatsuo Ishii
>> >> >>> >> > SRA OSS, Inc. Japan
>> >> >>> >> > English: http://www.sraoss.co.jp/index_en.php
>> >> >>> >> > Japanese: http://www.sraoss.co.jp
>> >> >>> >> >
>> >> >>> >> >> Hi All,
>> >> >>> >> >>
>> >> >>> >> >> I am facing a strange issue with load balancing with replication mode set
>> >> >>> >> >> to true on Solaris. The load balancing algorithm always selects the same node,
>> >> >>> >> >> whatever the backend weight is.
>> >> >>> >> >>
>> >> >>> >> >> Here is the scenario.
>> >> >>> >> >>
>> >> >>> >> >> I have pgpool installed on one server,
>> >> >>> >> >> 2 postgres nodes on 2 other servers,
>> >> >>> >> >> replication mode set to true and load balancing set to true,
>> >> >>> >> >> and the backend weight of both nodes is 1.
>> >> >>> >> >>
>> >> >>> >> >> When I fire queries manually using different connections, or using
>> >> >>> >> >> pgbench, all the queries hit the same node. The load balancing algorithm
>> >> >>> >> >> always selects the same node.
>> >> >>> >> >> Changing the backend weight has no effect. Only when I set the backend
>> >> >>> >> >> weight to 0 do hits go to the other server.
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >> I face this issue only on Solaris. The same setup on other servers (CentOS,
>> >> >>> >> >> RHEL, Ubuntu, etc.) does the load balancing perfectly.
>> >> >>> >> >>
>> >> >>> >> >> I also tried various postgres and pgpool versions with the same result,
>> >> >>> >> >> but every version runs fine on other servers.
>> >> >>> >> >>
>> >> >>> >> >> Has anyone faced this issue?
>> >> >>> >> >>
>> >> >>> >> >> Any information would be highly helpful.
>> >> >>> >> >>
>> >> >>> >> >> Regards,
>> >> >>> >> >> Aravinth
>> >> >>> >> _______________________________________________
>> >> >>> >> pgpool-general mailing list
>> >> >>> >> pgpool-general at pgpool.net
>> >> >>> >> http://www.pgpool.net/mailman/listinfo/pgpool-general
>> >> >>> >>
>> >> >>>
>> >>
>>
>>
>>





