[pgpool-general: 1899] Re: Pgpool is unable to connect backend PostgreSQL

Tatsuo Ishii ishii at postgresql.org
Tue Jul 16 16:52:44 JST 2013


> Hi
> Thanks for your update.
> I have installed the changes and compiled.
> I will follow the behavior now and update you.

Thanks. Looking forward to hearing from you.

> Do you recommend upgrading to 3.2.4?

We would rather recommend upgrading to 3.2.5 (the latest stable version).

> Will this fix be included there?

No, it is not even included in 3.2.5; it's brand new.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Thanks
> Larisa.
> 
> 
> On Tue, Jul 16, 2013 at 6:08 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> 
>> Oops. Small correction to the patch.
>>
>> +               if (health_check_timer_expired && getpid() != mypid)    /* has health check timer expired */
>>
>> should be:
>>
>> +               if (health_check_timer_expired && getpid() == mypid)    /* has health check timer expired */
>>
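>> To make the intent explicit: the flag is only meaningful in the process that
>> armed the timer, so the check must also confirm it is running in the main
>> process (mypid). A minimal sketch of the guarded check, with the surrounding
>> code abbreviated:
>>
>>         /* honor the timer flag only in the main process that armed it */
>>         if (health_check_timer_expired && getpid() == mypid)
>>         {
>>                 pool_log("connect_inet_domain_socket_by_port: health check timer expired");
>>                 close(fd);
>>                 return -1;
>>         }
>>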
>> Included is the new patch.
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>> > Ok. I think I finally understand what's going on here.
>> >
>> > Pgpool main process (14317) started health checking at Jul 12 09:17:04.
>> >
>> > Jul 12 09:17:04 purple1-node1-ps pgpool[14317]: starting health checking
>> >
>> > The pgpool main process set a timer for 09:17:14 because you set
>> > health_check_timeout to 10.  This time the health check completed
>> > successfully. The timer for 09:17:14 was blocked by calling
>> > signal(SIGALRM, SIG_IGN).
>> >
>> > Unfortunately the child life time expired at 09:17:14 and the pgpool main
>> > process was busy at that time because of this.
>> >
>> > Jul 12 09:17:14 purple1-node1-ps pgpool[16789]: child life 300 seconds
>> expired
>> > Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: reap_handler called
>> >
>> > Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: starting health checking
>> >
>> > Pgpool main re-enabled the timer and reset the timer variable
>> > (health_check_timer_expired = 0). But when the timer was re-enabled, the
>> > signal handler for the timer set health_check_timer_expired to 1.  As
>> > a result pgpool thought that the health check timer had expired.
>> >
>> > Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: health_check: health
>> check timer has been already expired before attempting to connect to 0 th
>> backend
>> >
>> > Thus failover happened even though the backend was running fine.
>> >
>> > Besides this,
>> >
>> >>> This seems very strange. The error comes here:
>> >
>> > I can now think of an explanation. When the child life time expires,
>> > pgpool forks off a new process. If the global variable
>> > health_check_timer_expired is set to 1 for the reason above, it is possible
>> > that the problem you saw could happen, because the child process inherits it.
>> >
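>> > To illustrate the sequence, here is a rough sketch (not the actual pgpool
>> > source; do_health_check() is just a stand-in) of how the leftover alarm can
>> > set the flag right after it has been reset:
>> >
>> >     #include <signal.h>
>> >     #include <unistd.h>
>> >
>> >     static volatile sig_atomic_t health_check_timer_expired = 0;
>> >     static unsigned int health_check_timeout = 10;
>> >
>> >     static void health_check_timer_handler(int sig)
>> >     {
>> >             health_check_timer_expired = 1;
>> >     }
>> >
>> >     static void do_health_check(void) { /* stands in for the real check */ }
>> >
>> >     int main(void)
>> >     {
>> >             /* cycle 1 (09:17:04): arm the timer, check succeeds */
>> >             signal(SIGALRM, health_check_timer_handler);
>> >             alarm(health_check_timeout);   /* alarm due at 09:17:14 */
>> >             do_health_check();             /* finishes before the deadline */
>> >             signal(SIGALRM, SIG_IGN);      /* "block" the timer after success */
>> >
>> >             /* cycle 2 (09:17:14): reset the flag and re-enable the timer */
>> >             health_check_timer_expired = 0;
>> >             signal(SIGALRM, health_check_timer_handler);
>> >             /* in pgpool, cycle 2 started exactly when the cycle-1 alarm came
>> >                due, so re-installing the handler let that alarm set the flag
>> >                back to 1; a child forked while the flag is 1 inherits the
>> >                value, and connect_inet_domain_socket_by_port() then fails in
>> >                the child as well */
>> >             return 0;
>> >     }
>> >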
>> > The included patch addresses the problems above. Could you try it out?
>> > --
>> > Tatsuo Ishii
>> > SRA OSS, Inc. Japan
>> > English: http://www.sraoss.co.jp/index_en.php
>> > Japanese: http://www.sraoss.co.jp
>> >
>> >> Hi, thanks for your reply.
>> >>
>> >>>> What kind of pgpool installation are you using?
>> >>  I compiled pgpool from source code, pgpool-II-3.2.3.tar.gz
>> >>>>  What kind of platform are you using?
>> >> We use Ubuntu 12.04 on HP Cloud server.
>> >>
>> >>>> What does your pgpool.conf look like?
>> >> Attaching pgpool.conf
>> >>
>> >> Also attached is the syslog file from the time of the problem. You can look
>> >> for the line
>> >> "Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: I am 11465 accept fd 7"
>> >>
>> >> By the way, postgres was up and running at that time; there is nothing in
>> >> its logs, and its processes' uptime shows they started a week ago...
>> >>
>> >> Thanks in advance for your help
>> >> Larisa.
>> >>
>> >>
>> >>
>> >> On Mon, Jul 15, 2013 at 5:00 AM, Tatsuo Ishii <ishii at postgresql.org>
>> wrote:
>> >>
>> >>> > Hi
>> >>> > I am hitting the same issue as described in the mail [pgpool-general:
>> >>> 1815]
>> >>> > Pgpool is unable to connect backend PostgreSQL.
>> >>>
>> >>> I guess [pgpool-general: 1815] is different from yours (my guess is that
>> >>> case is somewhat related to an Amazon EC2 environment problem). Moreover,
>> >>> your case seems very unique and strange.
>> >>>
>> >>> > While connected to a single postgres node, after a while pgpool loses
>> >>> > the connection to a running postgres db, restarts all child processes,
>> >>> > and stays in a running state, unable to connect to the db.
>> >>> >
>> >>> > Pgpool version 3.2.3
>> >>> > Postgres version 9.2.4
>> >>> >
>> >>> > Part of the log:
>> >>> > --------------------
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: connection received:
>> >>> > host=10.4.225.120 port=41090
>> >>>
>> >>> Process 11465 is a pgpool child process and is responsible for actual
>> >>> pgpool functions.
>> >>>
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: Protocol Major: 3
>> Minor:
>> >>> 0
>> >>> > database: hpadb user: hpauser
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: new_connection:
>> >>> connecting
>> >>> > 0 backend
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]:
>> >>> > connect_inet_domain_socket_by_port: health check timer expired
>> >>>
>> >>> This seems very strange. The error comes here:
>> >>>
>> >>>                 if (health_check_timer_expired)         /* has health check timer expired */
>> >>>                 {
>> >>>                         pool_log("connect_inet_domain_socket_by_port: health check timer expired");
>> >>>                         close(fd);
>> >>>                         return -1;
>> >>>                 }
>> >>>
>> >>> "health_check_timer_expired" is a global variable used in pgpool main
>> >>> process, which is responsible for management of pgpool, including:
>> >>> health check, failover etc. The variable is only meaningful in the
>> >>> main process and should not be set to non 0 in pgpool child. Moreover,
>> >>> the only place set the variable to non 0 is the signal handler which
>> >>> is set by main process.
>> >>>
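>> >>> As a side note, a global like this is copied into the child's address space
>> >>> by fork(), so whatever value it holds in the parent at fork time is what the
>> >>> child starts with. A minimal standalone illustration (not pgpool code):
>> >>>
>> >>>     #include <signal.h>
>> >>>     #include <stdio.h>
>> >>>     #include <unistd.h>
>> >>>
>> >>>     static volatile sig_atomic_t health_check_timer_expired = 0;
>> >>>
>> >>>     int main(void)
>> >>>     {
>> >>>             health_check_timer_expired = 1;   /* set in the parent */
>> >>>             if (fork() == 0)
>> >>>             {
>> >>>                     /* the child gets a copy of the parent's memory */
>> >>>                     printf("child sees %d\n", (int) health_check_timer_expired);
>> >>>             }
>> >>>             return 0;
>> >>>     }
>> >>>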
>> >>> Once the error occurs, pgpool starts failover as you see.
>> >>>
>> >>> I've never seen this kind of report before. What kind of pgpool
>> >>> installation are you using? (compiled from source code or from
>> >>> packages?) What kind of platform are you using? What does your
>> >>> pgpool.conf look like?
>> >>> --
>> >>> Tatsuo Ishii
>> >>> SRA OSS, Inc. Japan
>> >>> English: http://www.sraoss.co.jp/index_en.php
>> >>> Japanese: http://www.sraoss.co.jp
>> >>>
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: connection to
>> >>> > purple1_node1_ps(5432) failed
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: new_connection:
>> >>> create_cp()
>> >>> > failed
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]:
>> degenerate_backend_set: 0
>> >>> > fail over request from pid 11465
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler
>> called
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
>> >>> starting
>> >>> > to select new master node
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: starting
>> degeneration.
>> >>> > shutdown host purple1_node1_ps(5432)
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: no
>> >>> valid
>> >>> > DB node found
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: Restart all children
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
>> kill
>> >>> 4388
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
>> kill
>> >>> 9597
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[18648]: child received
>> shutdown
>> >>> > request signal 3
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[4388]: child received
>> shutdown
>> >>> > request signal 3
>> >>> > Jul 12 09:32:14 purple1-node1-ps rsyslogd-2177: imuxsock lost 85
>> messages
>> >>> > from pid 9597 due to rate-limiting
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[9597]: child received
>> shutdown
>> >>> > request signal 3
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
>> kill
>> >>> 18648
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[29409]: child received
>> shutdown
>> >>> > request signal 3
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
>> kill
>> >>> 29409
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
>> kill
>> >>> 11454
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14323]: child received
>> shutdown
>> >>> > request signal 3
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11454]: child received
>> shutdown
>> >>> > request signal 3
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
>> kill
>> >>> 14323
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
>> kill
>> >>> 22349
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[22349]: child received
>> shutdown
>> >>> > request signal 3
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
>> kill
>> >>> 23617
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
>> kill
>> >>> 29410
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[31511]: child received
>> shutdown
>> >>> > request signal 3
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[29410]: child received
>> shutdown
>> >>> > request signal 3
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
>> kill
>> >>> 31511
>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[4385]: child received
>> shutdown
>> >>> > request signal 3
>> >>> > Jul 12 09:32:14 purple1-node1-ps rsyslogd-2177: imuxsock lost 757
>> >>> messages
>> >>> > from pid 23617 due to rate-limiting
>> >>> >
>> >>> > Could you please explain?
>> >>> > Thanks
>> >>> > Larisa.
>> >>>
>>

