[pgpool-hackers: 27] Re: [pgpool-general: 131] Healthcheck timeout not always respected

Tatsuo Ishii ishii at postgresql.org
Sun Feb 19 22:01:58 JST 2012


Stevo,

Thanks for the patches. I have committed the changes except the part where
you ignore DISALLOW_TO_FAILOVER. Instead I modified the low-level socket
reading functions not to unconditionally fail over when reading from
backend sockets fails (they only fail over when fail_over_on_backend_error
is on). So if you want to trigger failover only when health checking
fails, turn off fail_over_on_backend_error and do not set
DISALLOW_TO_FAILOVER.
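
For example, such a setup could look like this in pgpool.conf (a sketch only;
the retry values match the configuration discussed later in this thread):

fail_over_on_backend_error = off      # socket errors disconnect the session
                                      # instead of triggering failover
backend_flag0 = 'ALLOW_TO_FAILOVER'   # i.e. DISALLOW_TO_FAILOVER not set
health_check_period = 30              # seconds between health checks
health_check_timeout = 5              # give up a single check after 5 seconds
health_check_max_retries = 2          # retries before degenerating the backend
health_check_retry_delay = 10         # seconds between retries
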
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Hello Tatsuo,
> 
> Attached is a cumulative patch, rebased onto the current master branch head, which:
> - Fixes health check timeout not always respected (includes unsetting
> non-blocking mode after connection has been successfully established);
> - Fixes failover on health check only support.
> 
> Kind regards,
> Stevo.
> 
> 2012/2/5 Stevo Slavić <sslavic at gmail.com>
> 
>> Tatsuo,
>>
>> Thank you very much for your time and effort put into analyzing the
>> submitted patch.
>>
>>
>> Obviously I'm missing something regarding the healthcheck feature, so please
>> clarify:
>>
>>    - what is the purpose of the health check when the backend flag is set to
>>    DISALLOW_TO_FAILOVER? To log that health checks ran on time but will not
>>    actually do anything?
>>    - what is the purpose of the health check (especially with retries
>>    configured) when the backend flag is set to ALLOW_TO_FAILOVER? When answering,
>>    please consider the case of a non-hello-world application that connects to the
>>    db via pgpool - will the health check even be given a chance to fail once?
>>    - since there is no backend flag value other than the two mentioned,
>>    what is the purpose of the health check (especially with retries configured) if
>>    it's not to be the sole process controlling when to fail over?
>>
>> I disagree that changing pgpool to give the healthcheck feature a meaning
>> disrupts the meaning of DISALLOW_TO_FAILOVER; it extends it only for the case
>> when a health check is configured - if one doesn't want health checks, just keep
>> not using them, they are disabled by default. Health checks and retries have only
>> recently been introduced, so I doubt there are many users of health checks, if
>> any, who configured DISALLOW_TO_FAILOVER with the expectation of getting health
>> check logging that never actually does anything. Of all the pgpool healthcheck
>> users who also have backends set to DISALLOW_TO_FAILOVER, I believe most expect
>> failover on failed health checks and do not know that it will never happen; the
>> checks will just make the log bigger.
>> The changes included in the patch do not affect users who have health checks
>> configured and backends set to ALLOW_TO_FAILOVER.
>>
>>
>> About the non-blocking connection to backend change:
>>
>>    - with pgpool in raw mode and extensive testing (endurance tests,
>>    failover and failback tests), I didn't notice any unwanted change in
>>    behaviour, apart from the wanted non-blocking, timeout-aware health checks;
>>    - do you see or know about anything in pgpool that depends on the connection
>>    to the backend being a blocking one? I will have a look myself, just asking in
>>    case you've found something already. I will also look into means of setting
>>    the connection back to blocking after it has been successfully established -
>>    maybe just clearing that flag will do, as in the sketch below.
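
A minimal sketch of what clearing that flag could look like with plain fcntl()
(an illustration only, not the submitted patch):

#include <fcntl.h>

/* Hypothetical helper: put an already-connected socket back into blocking
 * mode by clearing O_NONBLOCK. */
static int socket_set_blocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
}
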
>>
>>
>> Kind regards,
>>
>> Stevo.
>>
>>
>> On Feb 5, 2012 6:50 AM, "Tatsuo Ishii" <ishii at postgresql.org> wrote:
>>
>>> Finally I have time to check your patches. Here is the result of my review.
>>>
>>> > Hello Tatsuo,
>>> >
>>> > Here is cumulative patch to be applied on pgpool master branch with
>>> > following fixes included:
>>> >
>>> >    1. fix for health check bug
>>> >       1. it was not possible to allow backend failover only on failed
>>> >       health check(s);
>>> >       2. to achieve this one just configures backend to
>>> >       DISALLOW_TO_FAILOVER, sets fail_over_on_backend_error to off, and
>>> >       configures health checks;
>>> >       3. for this fix, an unwanted check was removed in main.c: after a
>>> >       health check failed, if DISALLOW_TO_FAILOVER was set for the backend,
>>> >       failover would always have been prevented, even when one configures a
>>> >       health check whose sole purpose is to control failover
>>>
>>> This is not acceptable, at least for stable
>>> releases. DISALLOW_TO_FAILOVER and fail_over_on_backend_error are
>>> for different purposes. The former prevents any failover,
>>> including failover triggered by health checks. The latter concerns errors
>>> when writing to the backend communication socket.
>>>
>>> fail_over_on_backend_error = on
>>>                                   # Initiates failover when writing to the
>>>                                   # backend communication socket fails
>>>                                   # This is the same behaviour of
>>>                                   # pgpool-II 2.2.x and previous releases
>>>                                   # If set to off, pgpool will report an
>>>                                   # error and disconnect the session.
>>>
>>> Your patch changes the existing semantics. Another point: DISALLOW_TO_FAILOVER
>>> allows controlling the behaviour per backend. Your
>>> patch breaks that.
>>>
>>> >       2. fix for health check bug
>>> >       1. health check timeout was not being respected in all conditions
>>> >       (ICMP host-unreachable messages dropped for security reasons, or no
>>> >       active network component to send those messages)
>>> >       2. for this fix in code (main.c, pool.h, pool_connection_pool.c) inet
>>> >       connections have been made non-blocking, and during connection
>>> >       retries the status of the now-global health_check_timer_expired
>>> >       variable is checked
>>>
>>> This seems good, but I need to investigate more. For example, your
>>> patch sets sockets to non-blocking but never reverts them back to blocking.
>>>
>>> >       3. fix for failback bug
>>> >       1. in raw mode, after failback (through pcp_attach_node) standby
>>> >       node/backend would remain in invalid state
>>>
>>> It turned out that even failover was buggy. The status was not set to
>>> CON_DOWN, which left the status as CON_CONNECT_WAIT and prevented
>>> failback from returning to the normal state. I fixed this on the master branch.
>>>
>>> > (it would be in CON_UP, so on
>>> >       failover after failback pgpool would not be able to connect to the
>>> >       standby, as get_next_master_node expects standby nodes/backends in raw
>>> >       mode to be in the CON_CONNECT_WAIT state when finding the next master node)
>>> >       2. for this fix, when in raw mode, on failback the status of all
>>> >       nodes/backends in the CON_UP state is set to CON_CONNECT_WAIT -
>>> >       all children are restarted anyway
>>>
>>>
>>> > Neither of these fixes changes expected behaviour of related features so
>>> > there are no changes to the documentation.
>>> >
>>> >
>>> > Kind regards,
>>> >
>>> > Stevo.
>>> >
>>> >
>>> > 2012/1/24 Tatsuo Ishii <ishii at postgresql.org>
>>> >
>>> >> > Additional testing confirmed that this fix ensures health check timer
>>> >> gets
>>> >> > respected (should I create a ticket on some issue tracker? send
>>> >> cumulative
>>> >> > patch with all changes to have it accepted?).
>>> >>
>>> >> We have a problem with the Mantis bug tracker and decided to stop using
>>> >> it (unless someone volunteers to fix it). Please send a cumulative patch
>>> >> against the master head to this list so that we will be able to look
>>> >> into it (be sure to include English doc changes).
>>> >> --
>>> >> Tatsuo Ishii
>>> >> SRA OSS, Inc. Japan
>>> >> English: http://www.sraoss.co.jp/index_en.php
>>> >> Japanese: http://www.sraoss.co.jp
>>> >>
>>> >> > Problem is that with all the testing another issue has been
>>> encountered,
>>> >> > now with pcp_attach_node.
>>> >> >
>>> >> > With pgpool in raw mode and two backends in postgres 9 streaming
>>> >> > replication, when backend0 fails, after health checks retries pgpool
>>> >> calls
>>> >> > failover command and degenerates backend0, backend1 gets promoted to
>>> new
>>> >> > master, pgpool can connect to that master, and two backends are in
>>> pgpool
>>> >> > state 3/2. And this is ok and expected.
>>> >> >
>>> >> > Once backend0 is recovered, it's attached back to pgpool using
>>> >> > pcp_attach_node, and pgpool will show two backends in state 2/2 (in
>>> logs
>>> >> > and in show pool_nodes; query) with backend0 taking all the load (raw
>>> >> > mode). If after that recovery and attachment of backend0 pgpool is
>>> not
>>> >> > restarted, and after some time backend0 fails again, after health
>>> check
>>> >> > retries backend0 will get degenerated, failover command will get
>>> called
>>> >> > (promotes standby to master), but pgpool will not be able to connect
>>> to
>>> >> > backend1 (regardless if unix or inet sockets are used for backend1).
>>> Only
>>> >> > if pgpool is restarted before second (complete) failure of backend0,
>>> will
>>> >> > pgpool be able to connect to backend1.
>>> >> >
>>> >> > Following the code, pcp_attach_node (failback of backend0) will actually
>>> >> > execute the same code as for failover. Not sure what, but that failover
>>> >> > does something to backend1's state or in-memory settings, so that pgpool
>>> >> > can no longer connect to backend1. Is this a known issue?
>>> >> >
>>> >> > Kind regards,
>>> >> > Stevo.
>>> >> >
>>> >> > 2012/1/20 Stevo Slavić <sslavic at gmail.com>
>>> >> >
>>> >> >> Key file was missing from that commit/change - pool.h where
>>> >> >> health_check_timer_expired was made global. Included now attached
>>> patch.
>>> >> >>
>>> >> >> Kind regards,
>>> >> >> Stevo.
>>> >> >>
>>> >> >>
>>> >> >> 2012/1/20 Stevo Slavić <sslavic at gmail.com>
>>> >> >>
>>> >> >>> Using exit_request was wrong and caused a bug. A 4th patch is needed -
>>> >> >>> health_check_timer_expired is global now, so it can be checked whether it
>>> >> >>> was set to 1 outside of main.c
>>> >> >>>
>>> >> >>>
>>> >> >>> Kind regards,
>>> >> >>> Stevo.
>>> >> >>>
>>> >> >>> 2012/1/19 Stevo Slavić <sslavic at gmail.com>
>>> >> >>>
>>> >> >>>> Using exit_code was not wise. Tested and encountered a case where this
>>> >> >>>> results in a bug. Have to work on it more. The main issue is how the
>>> >> >>>> connect_inet_domain_socket_by_port function in pool_connection_pool.c
>>> >> >>>> can know that the health check timer has expired (set to 1). Any ideas?
>>> >> >>>>
>>> >> >>>> Kind regards,
>>> >> >>>> Stevo.
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> 2012/1/19 Stevo Slavić <sslavic at gmail.com>
>>> >> >>>>
>>> >> >>>>> Tatsuo,
>>> >> >>>>>
>>> >> >>>>> Here are the patches which should be applied to current pgpool
>>> head
>>> >> for
>>> >> >>>>> fixing this issue:
>>> >> >>>>>
>>> >> >>>>> Fixes-health-check-timeout.patch
>>> >> >>>>> Fixes-health-check-retrying-after-failover.patch
>>> >> >>>>> Fixes-clearing-exitrequest-flag.patch
>>> >> >>>>>
>>> >> >>>>> The quirk I noticed in the logs was resolved as well - after failover
>>> >> >>>>> pgpool would perform a health check and report it was doing the
>>> >> >>>>> (max retries + 1)th health check, which was confusing. Instead I've
>>> >> >>>>> adjusted it so that it performs and reports a new health check cycle
>>> >> >>>>> after failover.
>>> >> >>>>>
>>> >> >>>>> I've tested and it works well - when in raw mode, backends set to
>>> >> >>>>> disallow failover, failover on backend failure disabled, and
>>> health
>>> >> checks
>>> >> >>>>> configured with retries (30sec interval, 5sec timeout, 2 retries,
>>> >> 10sec
>>> >> >>>>> delay between retries).
>>> >> >>>>>
>>> >> >>>>> Please test, and if confirmed ok include in next release.
>>> >> >>>>>
>>> >> >>>>> Kind regards,
>>> >> >>>>>
>>> >> >>>>> Stevo.
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>> 2012/1/16 Stevo Slavić <sslavic at gmail.com>
>>> >> >>>>>
>>> >> >>>>>> Here is pgpool.log, strace.out, and pgpool.conf when I tested
>>> with
>>> >> my
>>> >> >>>>>> latest patch for health check timeout applied. It works well, except
>>> >> >>>>>> for a single quirk: after failover completed, the log reported that a
>>> >> >>>>>> 3rd health check retry was done (even though just 2 are configured,
>>> >> >>>>>> see pgpool.conf) and that the backend had returned to a healthy state.
>>> >> >>>>>> The interesting part of the log file follows:
>>> >> >>>>>>
>>> >> >>>>>> Jan 16 01:31:45 sslavic pgpool[1163]: 2012-01-16 01:31:45
>>> DEBUG: pid
>>> >> >>>>>> 1163: retrying 3 th health checking
>>> >> >>>>>> Jan 16 01:31:45 sslavic pgpool[1163]: 2012-01-16 01:31:45
>>> DEBUG: pid
>>> >> >>>>>> 1163: health_check: 0 th DB node status: 3
>>> >> >>>>>> Jan 16 01:31:45 sslavic pgpool[1163]: 2012-01-16 01:31:45 LOG:
>>>   pid
>>> >> >>>>>> 1163: after some retrying backend returned to healthy state
>>> >> >>>>>> Jan 16 01:32:15 sslavic pgpool[1163]: 2012-01-16 01:32:15
>>> DEBUG: pid
>>> >> >>>>>> 1163: starting health checking
>>> >> >>>>>> Jan 16 01:32:15 sslavic pgpool[1163]: 2012-01-16 01:32:15
>>> DEBUG: pid
>>> >> >>>>>> 1163: health_check: 0 th DB node status: 3
>>> >> >>>>>>
>>> >> >>>>>>
>>> >> >>>>>> As can be seen in pgpool.conf, there is only one backend
>>> configured.
>>> >> >>>>>> pgpool did failover well after health check max retries has been
>>> >> reached
>>> >> >>>>>> (pgpool just degraded that single backend to 3, and restarted
>>> child
>>> >> >>>>>> processes).
>>> >> >>>>>>
>>> >> >>>>>> After this quirk has been logged, next health check logs were as
>>> >> >>>>>> expected. Except those couple weird log entries, everything
>>> seems
>>> >> to be ok.
>>> >> >>>>>> Maybe that quirk was caused by single backend only configuration
>>> >> corner
>>> >> >>>>>> case. Will try tomorrow if it occurs on dual backend
>>> configuration.
>>> >> >>>>>>
>>> >> >>>>>> Regards,
>>> >> >>>>>> Stevo.
>>> >> >>>>>>
>>> >> >>>>>>
>>> >> >>>>>> 2012/1/16 Stevo Slavić <sslavic at gmail.com>
>>> >> >>>>>>
>>> >> >>>>>>> Hello Tatsuo,
>>> >> >>>>>>>
>>> >> >>>>>>> Unfortunately, with your patch when A is on
>>> >> >>>>>>> (pool_config->health_check_period > 0) and B is on, when retry
>>> >> count is
>>> >> >>>>>>> over, failover will be disallowed because of B being on.
>>> >> >>>>>>>
>>> >> >>>>>>> Nenad's patch allows failover to be triggered only by health
>>> check.
>>> >> >>>>>>> Here is the patch which includes Nenad's fix but also fixes
>>> issue
>>> >> with
>>> >> >>>>>>> health check timeout not being respected.
>>> >> >>>>>>>
>>> >> >>>>>>> Key points in fix for health check timeout being respected are:
>>> >> >>>>>>> - in pool_connection_pool.c connect_inet_domain_socket_by_port
>>> >> >>>>>>> function, before trying to connect, file descriptor is set to
>>> >> non-blocking
>>> >> >>>>>>> mode, and also non-blocking mode error codes are handled,
>>> >> EINPROGRESS and
>>> >> >>>>>>> EALREADY (please verify changes here, especially regarding
>>> closing
>>> >> fd)
>>> >> >>>>>>> - in main.c health_check_timer_handler has been changed to
>>> signal
>>> >> >>>>>>> exit_request to health check initiated
>>> >> connect_inet_domain_socket_by_port
>>> >> >>>>>>> function call (please verify this, maybe there is a better way
>>> to
>>> >> check
>>> >> >>>>>>> from connect_inet_domain_socket_by_port if in
>>> >> health_check_timer_expired
>>> >> >>>>>>> has been set to 1)
>>> >> >>>>>>>
>>> >> >>>>>>> These changes effectively make the connect attempt
>>> >> >>>>>>> non-blocking and repeated until:
>>> >> >>>>>>> - connection is made, or
>>> >> >>>>>>> - unhandled connection error condition is reached, or
>>> >> >>>>>>> - health check timer alarm has been raised, or
>>> >> >>>>>>> - some other exit request (shutdown) has been issued.
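
A rough sketch of such a non-blocking connect loop (an illustration of the
approach described above, not the submitted patch; health_check_timer_expired
here stands in for the global flag set by the SIGALRM handler in main.c):

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

static volatile sig_atomic_t health_check_timer_expired = 0; /* set on SIGALRM */

static int connect_with_timeout(int fd, const struct sockaddr *addr, socklen_t len)
{
    int flags = fcntl(fd, F_GETFL, 0);

    fcntl(fd, F_SETFL, flags | O_NONBLOCK);      /* switch to non-blocking mode */

    for (;;)
    {
        fd_set wfds;
        struct timeval tv = {0, 100000};         /* poll every 100 ms */

        if (health_check_timer_expired)          /* health check timer alarm */
            return -1;

        if (connect(fd, addr, len) == 0)
            break;                               /* connection is made */

        if (errno == EISCONN)
            break;                               /* already connected */

        if (errno != EINPROGRESS && errno != EALREADY && errno != EINTR)
            return -1;                           /* unhandled connection error */

        /* connection still in progress: wait briefly for writability, retry */
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        select(fd + 1, NULL, &wfds, NULL, &tv);
    }

    fcntl(fd, F_SETFL, flags);                   /* revert to blocking mode */
    return 0;
}
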
>>> >> >>>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>> Kind regards,
>>> >> >>>>>>> Stevo.
>>> >> >>>>>>>
>>> >> >>>>>>> 2012/1/15 Tatsuo Ishii <ishii at postgresql.org>
>>> >> >>>>>>>
>>> >> >>>>>>>> Ok, let me clarify use cases regarding failover.
>>> >> >>>>>>>>
>>> >> >>>>>>>> Currently there are three parameters:
>>> >> >>>>>>>> a) health_check
>>> >> >>>>>>>> b) DISALLOW_TO_FAILOVER
>>> >> >>>>>>>> c) fail_over_on_backend_error
>>> >> >>>>>>>>
>>> >> >>>>>>>> The sources of errors which can trigger failover are 1) health check,
>>> >> >>>>>>>> 2) write to backend socket, 3) read from backend socket. I represent
>>> >> >>>>>>>> 1) as A, 2) as B, and 3) as C.
>>> >> >>>>>>>>
>>> >> >>>>>>>> 1) trigger failover if A or B or C is error
>>> >> >>>>>>>> a = on, b = off, c = on
>>> >> >>>>>>>>
>>> >> >>>>>>>> 2) trigger failover only when B or C is error
>>> >> >>>>>>>> a = off, b = off, c = on
>>> >> >>>>>>>>
>>> >> >>>>>>>> 3) trigger failover only when B is error
>>> >> >>>>>>>> Impossible. Because C error always triggers failover.
>>> >> >>>>>>>>
>>> >> >>>>>>>> 4) trigger failover only when C is error
>>> >> >>>>>>>> a = off, b = off, c = off
>>> >> >>>>>>>>
>>> >> >>>>>>>> 5) trigger failover only when A is error(Stevo wants this)
>>> >> >>>>>>>> Impossible. Because C error always triggers failover.
>>> >> >>>>>>>>
>>> >> >>>>>>>> 6) never trigger failover
>>> >> >>>>>>>> Impossible. Because C error always triggers failover.
>>> >> >>>>>>>>
>>> >> >>>>>>>> As you can see, C is the problem here (look at #3, #5 and #6)
>>> >> >>>>>>>>
>>> >> >>>>>>>> If we implemented this:
>>> >> >>>>>>>> >> However I think we should disable failover if
>>> >> >>>>>>>> DISALLOW_TO_FAILOVER set
>>> >> >>>>>>>> >> in case of reading data from backend. This should have been
>>> >> done
>>> >> >>>>>>>> when
>>> >> >>>>>>>> >> DISALLOW_TO_FAILOVER was introduced because this is exactly
>>> >> what
>>> >> >>>>>>>> >> DISALLOW_TO_FAILOVER tries to accomplish. What do you
>>> think?
>>> >> >>>>>>>>
>>> >> >>>>>>>> 1) trigger failover if A or B or C is error
>>> >> >>>>>>>> a = on, b = off, c = on
>>> >> >>>>>>>>
>>> >> >>>>>>>> 2) trigger failover only when B or C is error
>>> >> >>>>>>>> a = off, b = off, c = on
>>> >> >>>>>>>>
>>> >> >>>>>>>> 3) trigger failover only when B is error
>>> >> >>>>>>>> a = off, b = on, c = on
>>> >> >>>>>>>>
>>> >> >>>>>>>> 4) trigger failover only when C is error
>>> >> >>>>>>>> a = off, b = off, c = off
>>> >> >>>>>>>>
>>> >> >>>>>>>> 5) trigger failover only when A is error(Stevo wants this)
>>> >> >>>>>>>> a = on, b = on, c = off
>>> >> >>>>>>>>
>>> >> >>>>>>>> 6) never trigger failover
>>> >> >>>>>>>> a = off, b = on, c = off
>>> >> >>>>>>>>
>>> >> >>>>>>>> So it seems my patch will solve all the problems including
>>> yours.
>>> >> >>>>>>>> (timeout while retrying is another issue of course).
>>> >> >>>>>>>> --
>>> >> >>>>>>>> Tatsuo Ishii
>>> >> >>>>>>>> SRA OSS, Inc. Japan
>>> >> >>>>>>>> English: http://www.sraoss.co.jp/index_en.php
>>> >> >>>>>>>> Japanese: http://www.sraoss.co.jp
>>> >> >>>>>>>>
>>> >> >>>>>>>> > I agree, fail_over_on_backend_error isn't useful, just adds
>>> >> >>>>>>>> confusion by
>>> >> >>>>>>>> > overlapping with DISALLOW_TO_FAILOVER.
>>> >> >>>>>>>> >
>>> >> >>>>>>>> > With your patch or without it, it is not possible to
>>> failover
>>> >> only
>>> >> >>>>>>>> on
>>> >> >>>>>>>> > health check (max retries) failure. With Nenad's patch, that
>>> >> part
>>> >> >>>>>>>> works ok
>>> >> >>>>>>>> > and I think that patch is semantically ok - failover occurs
>>> even
>>> >> >>>>>>>> though
>>> >> >>>>>>>> > DISALLOW_TO_FAILOVER is set for backend but only when health
>>> >> check
>>> >> >>>>>>>> is
>>> >> >>>>>>>> > configured too. Configuring health check without failover on
>>> >> >>>>>>>> failed health
>>> >> >>>>>>>> > check has no purpose. Also health check configured with
>>> allowed
>>> >> >>>>>>>> failover on
>>> >> >>>>>>>> > any condition other than health check (max retries) failure
>>> has
>>> >> no
>>> >> >>>>>>>> purpose.
>>> >> >>>>>>>> >
>>> >> >>>>>>>> > Kind regards,
>>> >> >>>>>>>> > Stevo.
>>> >> >>>>>>>> >
>>> >> >>>>>>>> > 2012/1/15 Tatsuo Ishii <ishii at postgresql.org>
>>> >> >>>>>>>> >
>>> >> >>>>>>>> >> fail_over_on_backend_error has different meaning from
>>> >> >>>>>>>> >> DISALLOW_TO_FAILOVER. From the doc:
>>> >> >>>>>>>> >>
>>> >> >>>>>>>> >>  If true, and an error occurs when writing to the backend
>>> >> >>>>>>>> >>  communication, pgpool-II will trigger the fail over
>>> procedure
>>> >> .
>>> >> >>>>>>>> This
>>> >> >>>>>>>> >>  is the same behavior as of pgpool-II 2.2.x or earlier. If
>>> set
>>> >> to
>>> >> >>>>>>>> >>  false, pgpool will report an error and disconnect the
>>> session.
>>> >> >>>>>>>> >>
>>> >> >>>>>>>> >> This means that if pgpool fails to read from a backend, it will
>>> >> >>>>>>>> >> trigger failover even if fail_over_on_backend_error is off. So
>>> >> >>>>>>>> >> unconditionally disabling failover would lead to a backward
>>> >> >>>>>>>> >> incompatibility.
>>> >> >>>>>>>> >>
>>> >> >>>>>>>> >> However I think we should disable failover if
>>> >> >>>>>>>> DISALLOW_TO_FAILOVER set
>>> >> >>>>>>>> >> in case of reading data from backend. This should have been
>>> >> done
>>> >> >>>>>>>> when
>>> >> >>>>>>>> >> DISALLOW_TO_FAILOVER was introduced because this is exactly
>>> >> what
>>> >> >>>>>>>> >> DISALLOW_TO_FAILOVER tries to accomplish. What do you
>>> think?
>>> >> >>>>>>>> >> --
>>> >> >>>>>>>> >> Tatsuo Ishii
>>> >> >>>>>>>> >> SRA OSS, Inc. Japan
>>> >> >>>>>>>> >> English: http://www.sraoss.co.jp/index_en.php
>>> >> >>>>>>>> >> Japanese: http://www.sraoss.co.jp
>>> >> >>>>>>>> >>
>>> >> >>>>>>>> >> > For a moment I thought we could have set
>>> >> >>>>>>>> fail_over_on_backend_error to
>>> >> >>>>>>>> >> off,
>>> >> >>>>>>>> >> > and have backends set with ALLOW_TO_FAILOVER flag. But
>>> then I
>>> >> >>>>>>>> looked in
>>> >> >>>>>>>> >> > code.
>>> >> >>>>>>>> >> >
>>> >> >>>>>>>> >> > In child.c there is a loop the child process goes through in its
>>> >> >>>>>>>> >> > lifetime. When a fatal error condition occurs before the child
>>> >> >>>>>>>> >> > process exits, it will call notice_backend_error, which calls
>>> >> >>>>>>>> >> > degenerate_backend_set, which does not take into account that
>>> >> >>>>>>>> >> > fail_over_on_backend_error is set to off, causing the backend to
>>> >> >>>>>>>> >> > be degenerated and failover to occur. That's why we have backends
>>> >> >>>>>>>> >> > set with DISALLOW_TO_FAILOVER but, with our patch applied, health
>>> >> >>>>>>>> >> > checks can still cause failover to occur as expected.
>>> >> >>>>>>>> >> >
>>> >> >>>>>>>> >> > Maybe it would be enough just to modify
>>> >> degenerate_backend_set,
>>> >> >>>>>>>> to take
>>> >> >>>>>>>> >> > fail_over_on_backend_error into account just like it
>>> already
>>> >> >>>>>>>> takes
>>> >> >>>>>>>> >> > DISALLOW_TO_FAILOVER into account.
>>> >> >>>>>>>> >> >
>>> >> >>>>>>>> >> > Kind regards,
>>> >> >>>>>>>> >> > Stevo.
>>> >> >>>>>>>> >> >
>>> >> >>>>>>>> >> > 2012/1/15 Stevo Slavić <sslavic at gmail.com>
>>> >> >>>>>>>> >> >
>>> >> >>>>>>>> >> >> Yes and that behaviour which you describe as expected,
>>> is
>>> >> not
>>> >> >>>>>>>> what we
>>> >> >>>>>>>> >> >> want. We want pgpool to degrade backend0 and failover
>>> when
>>> >> >>>>>>>> configured
>>> >> >>>>>>>> >> max
>>> >> >>>>>>>> >> >> health check retries have failed, and to failover only
>>> in
>>> >> that
>>> >> >>>>>>>> case, so
>>> >> >>>>>>>> >> not
>>> >> >>>>>>>> >> >> sooner e.g. connection/child error condition, but as
>>> soon as
>>> >> >>>>>>>> max health
>>> >> >>>>>>>> >> >> check retries have been attempted.
>>> >> >>>>>>>> >> >>
>>> >> >>>>>>>> >> >> Maybe examples will be more clear.
>>> >> >>>>>>>> >> >>
>>> >> >>>>>>>> >> >> Imagine two nodes (node 1 and node 2). On each node a
>>> single
>>> >> >>>>>>>> pgpool and
>>> >> >>>>>>>> >> a
>>> >> >>>>>>>> >> >> single backend. Apps/clients access db through pgpool on
>>> >> their
>>> >> >>>>>>>> own node.
>>> >> >>>>>>>> >> >> Two backends are configured in postgres native streaming
>>> >> >>>>>>>> replication.
>>> >> >>>>>>>> >> >> pgpools are used in raw mode. Both pgpools have same
>>> >> backend as
>>> >> >>>>>>>> >> backend0,
>>> >> >>>>>>>> >> >> and same backend as backend1.
>>> >> >>>>>>>> >> >> initial state: both backends are up and pgpool can
>>> access
>>> >> >>>>>>>> them, clients
>>> >> >>>>>>>> >> >> connect to their pgpool and do their work on master
>>> backend,
>>> >> >>>>>>>> backend0.
>>> >> >>>>>>>> >> >>
>>> >> >>>>>>>> >> >> 1st case: unmodified/non-patched pgpool 3.1.1 is used,
>>> >> >>>>>>>> backends are
>>> >> >>>>>>>> >> >> configured with ALLOW_TO_FAILOVER flag
>>> >> >>>>>>>> >> >> - temporary network outage happens between pgpool on
>>> node 2
>>> >> >>>>>>>> and backend0
>>> >> >>>>>>>> >> >> - error condition is reported by child process, and
>>> since
>>> >> >>>>>>>> >> >> ALLOW_TO_FAILOVER is set, pgpool performs failover
>>> without
>>> >> >>>>>>>> giving
>>> >> >>>>>>>> >> chance to
>>> >> >>>>>>>> >> >> pgpool health check retries to control whether backend
>>> is
>>> >> just
>>> >> >>>>>>>> >> temporarily
>>> >> >>>>>>>> >> >> inaccessible
>>> >> >>>>>>>> >> >> - failover command on node 2 promotes standby backend
>>> to a
>>> >> new
>>> >> >>>>>>>> master -
>>> >> >>>>>>>> >> >> split brain occurs, with two masters
>>> >> >>>>>>>> >> >>
>>> >> >>>>>>>> >> >>
>>> >> >>>>>>>> >> >> 2nd case: unmodified/non-patched pgpool 3.1.1 is used,
>>> >> >>>>>>>> backends are
>>> >> >>>>>>>> >> >> configured with DISALLOW_TO_FAILOVER
>>> >> >>>>>>>> >> >> - temporary network outage happens between pgpool on
>>> node 2
>>> >> >>>>>>>> and backend0
>>> >> >>>>>>>> >> >> - error condition is reported by child process, and
>>> since
>>> >> >>>>>>>> >> >> DISALLOW_TO_FAILOVER is set, pgpool does not perform
>>> >> failover
>>> >> >>>>>>>> >> >> - health check gets a chance to check backend0
>>> condition,
>>> >> >>>>>>>> determines
>>> >> >>>>>>>> >> that
>>> >> >>>>>>>> >> >> it's not accessible, there will be no health check
>>> retries
>>> >> >>>>>>>> because
>>> >> >>>>>>>> >> >> DISALLOW_TO_FAILOVER is set, no failover occurs ever
>>> >> >>>>>>>> >> >>
>>> >> >>>>>>>> >> >>
>>> >> >>>>>>>> >> >> 3rd case, pgpool 3.1.1 + patch you've sent applied, and
>>> >> >>>>>>>> backends
>>> >> >>>>>>>> >> >> configured with DISALLOW_TO_FAILOVER
>>> >> >>>>>>>> >> >> - temporary network outage happens between pgpool on
>>> node 2
>>> >> >>>>>>>> and backend0
>>> >> >>>>>>>> >> >> - error condition is reported by child process, and
>>> since
>>> >> >>>>>>>> >> >> DISALLOW_TO_FAILOVER is set, pgpool does not perform
>>> >> failover
>>> >> >>>>>>>> >> >> - health check gets a chance to check backend0
>>> condition,
>>> >> >>>>>>>> determines
>>> >> >>>>>>>> >> that
>>> >> >>>>>>>> >> >> it's not accessible, health check retries happen, and
>>> even
>>> >> >>>>>>>> after max
>>> >> >>>>>>>> >> >> retries, no failover happens since failover is
>>> disallowed
>>> >> >>>>>>>> >> >>
>>> >> >>>>>>>> >> >>
>>> >> >>>>>>>> >> >> 4th expected behaviour, pgpool 3.1.1 + patch we sent,
>>> and
>>> >> >>>>>>>> backends
>>> >> >>>>>>>> >> >> configured with DISALLOW_TO_FAILOVER
>>> >> >>>>>>>> >> >> - temporary network outage happens between pgpool on
>>> node 2
>>> >> >>>>>>>> and backend0
>>> >> >>>>>>>> >> >> - error condition is reported by child process, and
>>> since
>>> >> >>>>>>>> >> >> DISALLOW_TO_FAILOVER is set, pgpool does not perform
>>> >> failover
>>> >> >>>>>>>> >> >> - health check gets a chance to check backend0
>>> condition,
>>> >> >>>>>>>> determines
>>> >> >>>>>>>> >> that
>>> >> >>>>>>>> >> >> it's not accessible, health check retries happen,
>>> before a
>>> >> max
>>> >> >>>>>>>> retry
>>> >> >>>>>>>> >> >> network condition is cleared, retry happens, and
>>> backend0
>>> >> >>>>>>>> remains to be
>>> >> >>>>>>>> >> >> master, no failover occurs, temporary network issue did
>>> not
>>> >> >>>>>>>> cause split
>>> >> >>>>>>>> >> >> brain
>>> >> >>>>>>>> >> >> - after some time, temporary network outage happens
>>> again
>>> >> >>>>>>>> between pgpool
>>> >> >>>>>>>> >> >> on node 2 and backend0
>>> >> >>>>>>>> >> >> - error condition is reported by child process, and
>>> since
>>> >> >>>>>>>> >> >> DISALLOW_TO_FAILOVER is set, pgpool does not perform
>>> >> failover
>>> >> >>>>>>>> >> >> - health check gets a chance to check backend0
>>> condition,
>>> >> >>>>>>>> determines
>>> >> >>>>>>>> >> that
>>> >> >>>>>>>> >> >> it's not accessible, health check retries happen, after
>>> max
>>> >> >>>>>>>> retries
>>> >> >>>>>>>> >> >> backend0 is still not accessible, failover happens,
>>> standby
>>> >> is
>>> >> >>>>>>>> new
>>> >> >>>>>>>> >> master
>>> >> >>>>>>>> >> >> and backend0 is degraded
>>> >> >>>>>>>> >> >>
>>> >> >>>>>>>> >> >> Kind regards,
>>> >> >>>>>>>> >> >> Stevo.
>>> >> >>>>>>>> >> >>
>>> >> >>>>>>>> >> >>
>>> >> >>>>>>>> >> >> 2012/1/15 Tatsuo Ishii <ishii at postgresql.org>
>>> >> >>>>>>>> >> >>
>>> >> >>>>>>>> >> >>> In my test environment, the patch works as expected. I have two
>>> >> >>>>>>>> >> >>> backends. The health check retry conf is as follows:
>>> >> >>>>>>>> >> >>>
>>> >> >>>>>>>> >> >>> health_check_max_retries = 3
>>> >> >>>>>>>> >> >>> health_check_retry_delay = 1
>>> >> >>>>>>>> >> >>>
>>> >> >>>>>>>> >> >>> 5 09:17:20 LOG:   pid 21411: Backend status file
>>> >> >>>>>>>> /home/t-ishii/work/
>>> >> >>>>>>>> >> >>> git.postgresql.org/test/log/pgpool_status discarded
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:20 LOG:   pid 21411: pgpool-II
>>> >> successfully
>>> >> >>>>>>>> started.
>>> >> >>>>>>>> >> >>> version 3.2alpha1 (hatsuiboshi)
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:20 LOG:   pid 21411:
>>> find_primary_node:
>>> >> >>>>>>>> primary node
>>> >> >>>>>>>> >> id
>>> >> >>>>>>>> >> >>> is 0
>>> >> >>>>>>>> >> >>> -- backend1 was shutdown
>>> >> >>>>>>>> >> >>>
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21445:
>>> >> >>>>>>>> >> connect_unix_domain_socket_by_port:
>>> >> >>>>>>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such file
>>> or
>>> >> >>>>>>>> directory
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21445:
>>> >> >>>>>>>> make_persistent_db_connection:
>>> >> >>>>>>>> >> >>> connection to /tmp(11001) failed
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21445:
>>> >> >>>>>>>> check_replication_time_lag: could
>>> >> >>>>>>>> >> >>> not connect to DB node 1, check sr_check_user and
>>> >> >>>>>>>> sr_check_password
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21411:
>>> >> >>>>>>>> >> connect_unix_domain_socket_by_port:
>>> >> >>>>>>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such file
>>> or
>>> >> >>>>>>>> directory
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21411:
>>> >> >>>>>>>> make_persistent_db_connection:
>>> >> >>>>>>>> >> >>> connection to /tmp(11001) failed
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21411:
>>> >> >>>>>>>> >> connect_unix_domain_socket_by_port:
>>> >> >>>>>>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such file
>>> or
>>> >> >>>>>>>> directory
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21411:
>>> >> >>>>>>>> make_persistent_db_connection:
>>> >> >>>>>>>> >> >>> connection to /tmp(11001) failed
>>> >> >>>>>>>> >> >>> -- health check failed
>>> >> >>>>>>>> >> >>>
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21411: health check
>>> failed.
>>> >> 1
>>> >> >>>>>>>> th host
>>> >> >>>>>>>> >> /tmp
>>> >> >>>>>>>> >> >>> at port 11001 is down
>>> >> >>>>>>>> >> >>> -- start retrying
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:50 LOG:   pid 21411: health check
>>> retry
>>> >> >>>>>>>> sleep time: 1
>>> >> >>>>>>>> >> >>> second(s)
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:51 ERROR: pid 21411:
>>> >> >>>>>>>> >> connect_unix_domain_socket_by_port:
>>> >> >>>>>>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such file
>>> or
>>> >> >>>>>>>> directory
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:51 ERROR: pid 21411:
>>> >> >>>>>>>> make_persistent_db_connection:
>>> >> >>>>>>>> >> >>> connection to /tmp(11001) failed
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:51 ERROR: pid 21411: health check
>>> failed.
>>> >> 1
>>> >> >>>>>>>> th host
>>> >> >>>>>>>> >> /tmp
>>> >> >>>>>>>> >> >>> at port 11001 is down
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:51 LOG:   pid 21411: health check
>>> retry
>>> >> >>>>>>>> sleep time: 1
>>> >> >>>>>>>> >> >>> second(s)
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:52 ERROR: pid 21411:
>>> >> >>>>>>>> >> connect_unix_domain_socket_by_port:
>>> >> >>>>>>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such file
>>> or
>>> >> >>>>>>>> directory
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:52 ERROR: pid 21411:
>>> >> >>>>>>>> make_persistent_db_connection:
>>> >> >>>>>>>> >> >>> connection to /tmp(11001) failed
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:52 ERROR: pid 21411: health check
>>> failed.
>>> >> 1
>>> >> >>>>>>>> th host
>>> >> >>>>>>>> >> /tmp
>>> >> >>>>>>>> >> >>> at port 11001 is down
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:52 LOG:   pid 21411: health check
>>> retry
>>> >> >>>>>>>> sleep time: 1
>>> >> >>>>>>>> >> >>> second(s)
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:53 ERROR: pid 21411:
>>> >> >>>>>>>> >> connect_unix_domain_socket_by_port:
>>> >> >>>>>>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such file
>>> or
>>> >> >>>>>>>> directory
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:53 ERROR: pid 21411:
>>> >> >>>>>>>> make_persistent_db_connection:
>>> >> >>>>>>>> >> >>> connection to /tmp(11001) failed
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:53 ERROR: pid 21411: health check
>>> failed.
>>> >> 1
>>> >> >>>>>>>> th host
>>> >> >>>>>>>> >> /tmp
>>> >> >>>>>>>> >> >>> at port 11001 is down
>>> >> >>>>>>>> >> >>> 2012-01-15 09:17:53 LOG:   pid 21411: health_check: 1
>>> >> >>>>>>>> failover is
>>> >> >>>>>>>> >> canceld
>>> >> >>>>>>>> >> >>> because failover is disallowed
>>> >> >>>>>>>> >> >>> -- after 3 retries, pgpool wanted to failover, but
>>> gave up
>>> >> >>>>>>>> because
>>> >> >>>>>>>> >> >>> DISALLOW_TO_FAILOVER is set for backend1
>>> >> >>>>>>>> >> >>>
>>> >> >>>>>>>> >> >>> 2012-01-15 09:18:00 ERROR: pid 21445:
>>> >> >>>>>>>> >> connect_unix_domain_socket_by_port:
>>> >> >>>>>>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such file
>>> or
>>> >> >>>>>>>> directory
>>> >> >>>>>>>> >> >>> 2012-01-15 09:18:00 ERROR: pid 21445:
>>> >> >>>>>>>> make_persistent_db_connection:
>>> >> >>>>>>>> >> >>> connection to /tmp(11001) failed
>>> >> >>>>>>>> >> >>> 2012-01-15 09:18:00 ERROR: pid 21445:
>>> >> >>>>>>>> check_replication_time_lag: could
>>> >> >>>>>>>> >> >>> not connect to DB node 1, check sr_check_user and
>>> >> >>>>>>>> sr_check_password
>>> >> >>>>>>>> >> >>> 2012-01-15 09:18:03 ERROR: pid 21411:
>>> >> >>>>>>>> >> connect_unix_domain_socket_by_port:
>>> >> >>>>>>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such file
>>> or
>>> >> >>>>>>>> directory
>>> >> >>>>>>>> >> >>> 2012-01-15 09:18:03 ERROR: pid 21411:
>>> >> >>>>>>>> make_persistent_db_connection:
>>> >> >>>>>>>> >> >>> connection to /tmp(11001) failed
>>> >> >>>>>>>> >> >>> 2012-01-15 09:18:03 ERROR: pid 21411: health check
>>> failed.
>>> >> 1
>>> >> >>>>>>>> th host
>>> >> >>>>>>>> >> /tmp
>>> >> >>>>>>>> >> >>> at port 11001 is down
>>> >> >>>>>>>> >> >>> 2012-01-15 09:18:03 LOG:   pid 21411: health check
>>> retry
>>> >> >>>>>>>> sleep time: 1
>>> >> >>>>>>>> >> >>> second(s)
>>> >> >>>>>>>> >> >>> 2012-01-15 09:18:04 ERROR: pid 21411:
>>> >> >>>>>>>> >> connect_unix_domain_socket_by_port:
>>> >> >>>>>>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such file
>>> or
>>> >> >>>>>>>> directory
>>> >> >>>>>>>> >> >>> 2012-01-15 09:18:04 ERROR: pid 21411:
>>> >> >>>>>>>> make_persistent_db_connection:
>>> >> >>>>>>>> >> >>> connection to /tmp(11001) failed
>>> >> >>>>>>>> >> >>> 2012-01-15 09:18:04 ERROR: pid 21411: health check
>>> failed.
>>> >> 1
>>> >> >>>>>>>> th host
>>> >> >>>>>>>> >> /tmp
>>> >> >>>>>>>> >> >>> at port 11001 is down
>>> >> >>>>>>>> >> >>> 2012-01-15 09:18:04 LOG:   pid 21411: health check
>>> retry
>>> >> >>>>>>>> sleep time: 1
>>> >> >>>>>>>> >> >>> second(s)
>>> >> >>>>>>>> >> >>> 2012-01-15 09:18:05 LOG:   pid 21411: after some
>>> retrying
>>> >> >>>>>>>> backend
>>> >> >>>>>>>> >> >>> returned to healthy state
>>> >> >>>>>>>> >> >>> -- started backend1 and pgpool succeeded in health
>>> >> checking.
>>> >> >>>>>>>> Resumed
>>> >> >>>>>>>> >> >>> using backend1
>>> >> >>>>>>>> >> >>> --
>>> >> >>>>>>>> >> >>> Tatsuo Ishii
>>> >> >>>>>>>> >> >>> SRA OSS, Inc. Japan
>>> >> >>>>>>>> >> >>> English: http://www.sraoss.co.jp/index_en.php
>>> >> >>>>>>>> >> >>> Japanese: http://www.sraoss.co.jp
>>> >> >>>>>>>> >> >>>
>>> >> >>>>>>>> >> >>> > Hello Tatsuo,
>>> >> >>>>>>>> >> >>> >
>>> >> >>>>>>>> >> >>> > Thank you for the patch and effort, but unfortunately
>>> >> this
>>> >> >>>>>>>> change
>>> >> >>>>>>>> >> won't
>>> >> >>>>>>>> >> >>> > work for us. We need to set disallow failover to
>>> prevent
>>> >> >>>>>>>> failover on
>>> >> >>>>>>>> >> >>> child
>>> >> >>>>>>>> >> >>> > reported connection errors (it's ok if few clients
>>> lose
>>> >> >>>>>>>> their
>>> >> >>>>>>>> >> >>> connection or
>>> >> >>>>>>>> >> >>> > can not connect), and still have pgpool perform
>>> failover
>>> >> >>>>>>>> but only on
>>> >> >>>>>>>> >> >>> failed
>>> >> >>>>>>>> >> >>> > health check (if configured, after max retries
>>> threshold
>>> >> >>>>>>>> has been
>>> >> >>>>>>>> >> >>> reached).
>>> >> >>>>>>>> >> >>> >
>>> >> >>>>>>>> >> >>> > Maybe it would be best to add an extra value for
>>> >> >>>>>>>> backend_flag -
>>> >> >>>>>>>> >> >>> > ALLOW_TO_FAILOVER_ON_HEALTH_CHECK or
>>> >> >>>>>>>> >> >>> DISALLOW_TO_FAILOVER_ON_CHILD_ERROR.
>>> >> >>>>>>>> >> >>> > It should behave same as DISALLOW_TO_FAILOVER is set,
>>> >> with
>>> >> >>>>>>>> only
>>> >> >>>>>>>> >> >>> difference
>>> >> >>>>>>>> >> >>> > in behaviour when health check (if set, max retries)
>>> has
>>> >> >>>>>>>> failed -
>>> >> >>>>>>>> >> unlike
>>> >> >>>>>>>> >> >>> > DISALLOW_TO_FAILOVER, this new flag should allow
>>> failover
>>> >> >>>>>>>> in this
>>> >> >>>>>>>> >> case
>>> >> >>>>>>>> >> >>> only.
>>> >> >>>>>>>> >> >>> >
>>> >> >>>>>>>> >> >>> > Without this change health check (especially health
>>> check
>>> >> >>>>>>>> retries)
>>> >> >>>>>>>> >> >>> doesn't
>>> >> >>>>>>>> >> >>> > make much sense - child error is more likely to
>>> occur on
>>> >> >>>>>>>> (temporary)
>>> >> >>>>>>>> >> >>> > backend failure than health check and will or will
>>> not
>>> >> cause
>>> >> >>>>>>>> >> failover to
>>> >> >>>>>>>> >> >>> > occur depending on backend flag, without giving
>>> health
>>> >> >>>>>>>> check retries
>>> >> >>>>>>>> >> a
>>> >> >>>>>>>> >> >>> > chance to determine if failure was temporary or not,
>>> >> >>>>>>>> risking split
>>> >> >>>>>>>> >> brain
>>> >> >>>>>>>> >> >>> > situation with two masters just because of temporary
>>> >> >>>>>>>> network link
>>> >> >>>>>>>> >> >>> hiccup.
>>> >> >>>>>>>> >> >>> >
>>> >> >>>>>>>> >> >>> > Our main problem remains though with the health check
>>> >> >>>>>>>> timeout not
>>> >> >>>>>>>> >> being
>>> >> >>>>>>>> >> >>> > respected in these special conditions we have. Maybe
>>> >> Nenad
>>> >> >>>>>>>> can help
>>> >> >>>>>>>> >> you
>>> >> >>>>>>>> >> >>> > more to reproduce the issue on your environment.
>>> >> >>>>>>>> >> >>> >
>>> >> >>>>>>>> >> >>> > Kind regards,
>>> >> >>>>>>>> >> >>> > Stevo.
>>> >> >>>>>>>> >> >>> >
>>> >> >>>>>>>> >> >>> > 2012/1/13 Tatsuo Ishii <ishii at postgresql.org>
>>> >> >>>>>>>> >> >>> >
>>> >> >>>>>>>> >> >>> >> Thanks for pointing it out.
>>> >> >>>>>>>> >> >>> >> Yes, checking DISALLOW_TO_FAILOVER before retrying
>>> is
>>> >> >>>>>>>> wrong.
>>> >> >>>>>>>> >> >>> >> However, after the retry count is over, we should check
>>> >> >>>>>>>> >> >>> >> DISALLOW_TO_FAILOVER, I think.
>>> >> >>>>>>>> >> >>> >> Attached is a patch attempting to fix it. Please try.
>>> >> >>>>>>>> >> >>> >> --
>>> >> >>>>>>>> >> >>> >> Tatsuo Ishii
>>> >> >>>>>>>> >> >>> >> SRA OSS, Inc. Japan
>>> >> >>>>>>>> >> >>> >> English: http://www.sraoss.co.jp/index_en.php
>>> >> >>>>>>>> >> >>> >> Japanese: http://www.sraoss.co.jp
>>> >> >>>>>>>> >> >>> >>
>>> >> >>>>>>>> >> >>> >> > pgpool is being used in raw mode - just for
>>> (health
>>> >> >>>>>>>> check based)
>>> >> >>>>>>>> >> >>> failover
>>> >> >>>>>>>> >> >>> >> > part, so applications are not required to restart
>>> when
>>> >> >>>>>>>> standby
>>> >> >>>>>>>> >> gets
>>> >> >>>>>>>> >> >>> >> > promoted to new master. Here is pgpool.conf file
>>> and a
>>> >> >>>>>>>> very small
>>> >> >>>>>>>> >> >>> patch
>>> >> >>>>>>>> >> >>> >> > we're using applied to pgpool 3.1.1 release.
>>> >> >>>>>>>> >> >>> >> >
>>> >> >>>>>>>> >> >>> >> > We have to have DISALLOW_TO_FAILOVER set for the
>>> >> backend
>>> >> >>>>>>>> since any
>>> >> >>>>>>>> >> >>> child
>>> >> >>>>>>>> >> >>> >> > process that detects condition that
>>> master/backend0 is
>>> >> >>>>>>>> not
>>> >> >>>>>>>> >> >>> available, if
>>> >> >>>>>>>> >> >>> >> > DISALLOW_TO_FAILOVER was not set, will degenerate
>>> >> >>>>>>>> backend without
>>> >> >>>>>>>> >> >>> giving
>>> >> >>>>>>>> >> >>> >> > health check a chance to retry. We need health
>>> check
>>> >> >>>>>>>> with retries
>>> >> >>>>>>>> >> >>> because
>>> >> >>>>>>>> >> >>> >> > condition that backend0 is not available could be
>>> >> >>>>>>>> temporary
>>> >> >>>>>>>> >> (network
>>> >> >>>>>>>> >> >>> >> > glitches to the remote site where master is, or
>>> >> >>>>>>>> deliberate
>>> >> >>>>>>>> >> failover
>>> >> >>>>>>>> >> >>> of
>>> >> >>>>>>>> >> >>> >> > master postgres service from one node to the
>>> other on
>>> >> >>>>>>>> remote site
>>> >> >>>>>>>> >> -
>>> >> >>>>>>>> >> >>> in
>>> >> >>>>>>>> >> >>> >> both
>>> >> >>>>>>>> >> >>> >> > cases remote means remote to the pgpool that is
>>> going
>>> >> to
>>> >> >>>>>>>> perform
>>> >> >>>>>>>> >> >>> health
>>> >> >>>>>>>> >> >>> >> > checks and ultimately the failover) and we don't
>>> want
>>> >> >>>>>>>> standby to
>>> >> >>>>>>>> >> be
>>> >> >>>>>>>> >> >>> >> > promoted as easily to a new master, to prevent
>>> >> temporary
>>> >> >>>>>>>> network
>>> >> >>>>>>>> >> >>> >> conditions
>>> >> >>>>>>>> >> >>> >> > which could occur frequently to frequently cause
>>> split
>>> >> >>>>>>>> brain with
>>> >> >>>>>>>> >> two
>>> >> >>>>>>>> >> >>> >> > masters.
>>> >> >>>>>>>> >> >>> >> >
>>> >> >>>>>>>> >> >>> >> > But then, with DISALLOW_TO_FAILOVER set, without
>>> the
>>> >> >>>>>>>> patch health
>>> >> >>>>>>>> >> >>> check
>>> >> >>>>>>>> >> >>> >> > will not retry and will thus give only one chance
>>> to
>>> >> >>>>>>>> backend (if
>>> >> >>>>>>>> >> >>> health
>>> >> >>>>>>>> >> >>> >> > check ever occurs before child process failure to
>>> >> >>>>>>>> connect to the
>>> >> >>>>>>>> >> >>> >> backend),
>>> >> >>>>>>>> >> >>> >> > rendering retry settings effectively to be
>>> ignored.
>>> >> >>>>>>>> That's where
>>> >> >>>>>>>> >> this
>>> >> >>>>>>>> >> >>> >> patch
>>> >> >>>>>>>> >> >>> >> > comes into action - enables health check retries
>>> while
>>> >> >>>>>>>> child
>>> >> >>>>>>>> >> >>> processes
>>> >> >>>>>>>> >> >>> >> are
>>> >> >>>>>>>> >> >>> >> > prevented to degenerate backend.
>>> >> >>>>>>>> >> >>> >> >
>>> >> >>>>>>>> >> >>> >> > I don't think, but I could be wrong, that this
>>> patch
>>> >> >>>>>>>> influences
>>> >> >>>>>>>> >> the
>>> >> >>>>>>>> >> >>> >> > behavior we're seeing with unwanted health check
>>> >> attempt
>>> >> >>>>>>>> delays.
>>> >> >>>>>>>> >> >>> Also,
>>> >> >>>>>>>> >> >>> >> > knowing this, maybe pgpool could be patched or
>>> some
>>> >> >>>>>>>> other support
>>> >> >>>>>>>> >> be
>>> >> >>>>>>>> >> >>> >> built
>>> >> >>>>>>>> >> >>> >> > into it to cover this use case.
>>> >> >>>>>>>> >> >>> >> >
>>> >> >>>>>>>> >> >>> >> > Regards,
>>> >> >>>>>>>> >> >>> >> > Stevo.
>>> >> >>>>>>>> >> >>> >> >
>>> >> >>>>>>>> >> >>> >> >
>>> >> >>>>>>>> >> >>> >> > 2012/1/12 Tatsuo Ishii <ishii at postgresql.org>
>>> >> >>>>>>>> >> >>> >> >
>>> >> >>>>>>>> >> >>> >> >> I have accepted the moderation request. Your post
>>> >> >>>>>>>> should be sent
>>> >> >>>>>>>> >> >>> >> shortly.
>>> >> >>>>>>>> >> >>> >> >> Also I have raised the post size limit to 1MB.
>>> >> >>>>>>>> >> >>> >> >> I will look into this...
>>> >> >>>>>>>> >> >>> >> >> --
>>> >> >>>>>>>> >> >>> >> >> Tatsuo Ishii
>>> >> >>>>>>>> >> >>> >> >> SRA OSS, Inc. Japan
>>> >> >>>>>>>> >> >>> >> >> English: http://www.sraoss.co.jp/index_en.php
>>> >> >>>>>>>> >> >>> >> >> Japanese: http://www.sraoss.co.jp
>>> >> >>>>>>>> >> >>> >> >>
>>> >> >>>>>>>> >> >>> >> >> > Here is the log file and strace output file
>>> (this
>>> >> >>>>>>>> time in an
>>> >> >>>>>>>> >> >>> archive,
>>> >> >>>>>>>> >> >>> >> >> > didn't know about 200KB constraint on post size
>>> >> which
>>> >> >>>>>>>> requires
>>> >> >>>>>>>> >> >>> >> moderator
>>> >> >>>>>>>> >> >>> >> >> > approval). Timings configured are 30sec health
>>> >> check
>>> >> >>>>>>>> interval,
>>> >> >>>>>>>> >> >>> 5sec
>>> >> >>>>>>>> >> >>> >> >> > timeout, and 2 retries with 10sec retry delay.
>>> >> >>>>>>>> >> >>> >> >> >
>>> >> >>>>>>>> >> >>> >> >> > It takes a lot more than 5sec from started
>>> health
>>> >> >>>>>>>> check to
>>> >> >>>>>>>> >> >>> sleeping
>>> >> >>>>>>>> >> >>> >> 10sec
>>> >> >>>>>>>> >> >>> >> >> > for first retry.
>>> >> >>>>>>>> >> >>> >> >> >
>>> >> >>>>>>>> >> >>> >> >> > Seen in code (main.c, health_check() function),
>>> >> >>>>>>>> within (retry)
>>> >> >>>>>>>> >> >>> attempt
>>> >> >>>>>>>> >> >>> >> >> > there is inner retry (first with postgres
>>> database
>>> >> >>>>>>>> then with
>>> >> >>>>>>>> >> >>> >> template1)
>>> >> >>>>>>>> >> >>> >> >> and
>>> >> >>>>>>>> >> >>> >> >> > that part doesn't seem to be interrupted by
>>> alarm.
>>> >> >>>>>>>> >> >>> >> >> >
>>> >> >>>>>>>> >> >>> >> >> > Regards,
>>> >> >>>>>>>> >> >>> >> >> > Stevo.
>>> >> >>>>>>>> >> >>> >> >> >
>>> >> >>>>>>>> >> >>> >> >> > 2012/1/12 Stevo Slavić <sslavic at gmail.com>
>>> >> >>>>>>>> >> >>> >> >> >
>>> >> >>>>>>>> >> >>> >> >> >> Here is the log file and strace output file.
>>> >> Timings
>>> >> >>>>>>>> >> configured
>>> >> >>>>>>>> >> >>> are
>>> >> >>>>>>>> >> >>> >> >> 30sec
>>> >> >>>>>>>> >> >>> >> >> >> health check interval, 5sec timeout, and 2
>>> retries
>>> >> >>>>>>>> with 10sec
>>> >> >>>>>>>> >> >>> retry
>>> >> >>>>>>>> >> >>> >> >> delay.
>>> >> >>>>>>>> >> >>> >> >> >>
>>> >> >>>>>>>> >> >>> >> >> >> It takes a lot more than 5sec from started
>>> health
>>> >> >>>>>>>> check to
>>> >> >>>>>>>> >> >>> sleeping
>>> >> >>>>>>>> >> >>> >> >> 10sec
>>> >> >>>>>>>> >> >>> >> >> >> for first retry.
>>> >> >>>>>>>> >> >>> >> >> >>
>>> >> >>>>>>>> >> >>> >> >> >> Seen in code (main.c, health_check()
>>> function),
>>> >> >>>>>>>> within (retry)
>>> >> >>>>>>>> >> >>> >> attempt
>>> >> >>>>>>>> >> >>> >> >> >> there is inner retry (first with postgres
>>> database
>>> >> >>>>>>>> then with
>>> >> >>>>>>>> >> >>> >> template1)
>>> >> >>>>>>>> >> >>> >> >> and
>>> >> >>>>>>>> >> >>> >> >> >> that part doesn't seem to be interrupted by
>>> alarm.
>>> >> >>>>>>>> >> >>> >> >> >>
>>> >> >>>>>>>> >> >>> >> >> >> Regards,
>>> >> >>>>>>>> >> >>> >> >> >> Stevo.
>>> >> >>>>>>>> >> >>> >> >> >>
>>> >> >>>>>>>> >> >>> >> >> >>
>>> >> >>>>>>>> >> >>> >> >> >> 2012/1/11 Tatsuo Ishii <ishii at postgresql.org>
>>> >> >>>>>>>> >> >>> >> >> >>
>>> >> >>>>>>>> >> >>> >> >> >>> Ok, I will do it. In the mean time you could
>>> use
>>> >> >>>>>>>> "strace -tt
>>> >> >>>>>>>> >> -p
>>> >> >>>>>>>> >> >>> PID"
>>> >> >>>>>>>> >> >>> >> >> >>> to see which system call is blocked.
>>> >> >>>>>>>> >> >>> >> >> >>> --
>>> >> >>>>>>>> >> >>> >> >> >>> Tatsuo Ishii
>>> >> >>>>>>>> >> >>> >> >> >>> SRA OSS, Inc. Japan
>>> >> >>>>>>>> >> >>> >> >> >>> English:
>>> http://www.sraoss.co.jp/index_en.php
>>> >> >>>>>>>> >> >>> >> >> >>> Japanese: http://www.sraoss.co.jp
>>> >> >>>>>>>> >> >>> >> >> >>>
>>> >> >>>>>>>> >> >>> >> >> >>> > OK, got the info - key point is that ip
>>> >> >>>>>>>> forwarding is
>>> >> >>>>>>>> >> >>> disabled for
>>> >> >>>>>>>> >> >>> >> >> >>> security
>>> >> >>>>>>>> >> >>> >> >> >>> > reasons. Rules in iptables are not
>>> important,
>>> >> >>>>>>>> iptables can
>>> >> >>>>>>>> >> be
>>> >> >>>>>>>> >> >>> >> >> stopped,
>>> >> >>>>>>>> >> >>> >> >> >>> or
>>> >> >>>>>>>> >> >>> >> >> >>> > previously added rules removed.
>>> >> >>>>>>>> >> >>> >> >> >>> >
>>> >> >>>>>>>> >> >>> >> >> >>> > Here are the steps to reproduce (kudos to
>>> my
>>> >> >>>>>>>> colleague
>>> >> >>>>>>>> >> Nenad
>>> >> >>>>>>>> >> >>> >> >> Bulatovic
>>> >> >>>>>>>> >> >>> >> >> >>> for
>>> >> >>>>>>>> >> >>> >> >> >>> > providing this):
>>> >> >>>>>>>> >> >>> >> >> >>> >
>>> >> >>>>>>>> >> >>> >> >> >>> > 1.) make sure that ip forwarding is off:
>>> >> >>>>>>>> >> >>> >> >> >>> >     echo 0 > /proc/sys/net/ipv4/ip_forward
>>> >> >>>>>>>> >> >>> >> >> >>> > 2.) create IP alias on some interface (and
>>> have
>>> >> >>>>>>>> postgres
>>> >> >>>>>>>> >> >>> listen on
>>> >> >>>>>>>> >> >>> >> >> it):
>>> >> >>>>>>>> >> >>> >> >> >>> >     ip addr add x.x.x.x/yy dev ethz
>>> >> >>>>>>>> >> >>> >> >> >>> > 3.) set backend_hostname0 to
>>> aforementioned IP
>>> >> >>>>>>>> >> >>> >> >> >>> > 4.) start pgpool and monitor health checks
>>> >> >>>>>>>> >> >>> >> >> >>> > 5.) remove IP alias:
>>> >> >>>>>>>> >> >>> >> >> >>> >     ip addr del x.x.x.x/yy dev ethz
>>> >> >>>>>>>> >> >>> >> >> >>> >
>>> >> >>>>>>>> >> >>> >> >> >>> >
>>> >> >>>>>>>> >> >>> >> >> >>> > Here is the interesting part in pgpool log
>>> >> after
>>> >> >>>>>>>> this:
>>> >> >>>>>>>> >> >>> >> >> >>> > 2012-01-11 17:38:04 DEBUG: pid 24358:
>>> starting
>>> >> >>>>>>>> health
>>> >> >>>>>>>> >> checking
>>> >> >>>>>>>> >> >>> >> >> >>> > 2012-01-11 17:38:04 DEBUG: pid 24358:
>>> >> >>>>>>>> health_check: 0 th DB
>>> >> >>>>>>>> >> >>> node
>>> >> >>>>>>>> >> >>> >> >> >>> status: 2
>>> >> >>>>>>>> >> >>> >> >> >>> > 2012-01-11 17:38:04 DEBUG: pid 24358:
>>> >> >>>>>>>> health_check: 1 th DB
>>> >> >>>>>>>> >> >>> node
>>> >> >>>>>>>> >> >>> >> >> >>> status: 1
>>> >> >>>>>>>> >> >>> >> >> >>> > 2012-01-11 17:38:34 DEBUG: pid 24358:
>>> starting
>>> >> >>>>>>>> health
>>> >> >>>>>>>> >> checking
>>> >> >>>>>>>> >> >>> >> >> >>> > 2012-01-11 17:38:34 DEBUG: pid 24358:
>>> >> >>>>>>>> health_check: 0 th DB
>>> >> >>>>>>>> >> >>> node
>>> >> >>>>>>>> >> >>> >> >> >>> status: 2
>>> >> >>>>>>>> >> >>> >> >> >>> > 2012-01-11 17:41:43 DEBUG: pid 24358:
>>> >> >>>>>>>> health_check: 0 th DB
>>> >> >>>>>>>> >> >>> node
>>> >> >>>>>>>> >> >>> >> >> >>> status: 2
>>> >> >>>>>>>> >> >>> >> >> >>> > 2012-01-11 17:41:46 ERROR: pid 24358:
>>> health
>>> >> >>>>>>>> check failed.
>>> >> >>>>>>>> >> 0
>>> >> >>>>>>>> >> >>> th
>>> >> >>>>>>>> >> >>> >> host
>>> >> >>>>>>>> >> >>> >> >> >>> > 192.168.2.27 at port 5432 is down
>>> >> >>>>>>>> >> >>> >> >> >>> > 2012-01-11 17:41:46 LOG:   pid 24358:
>>> health
>>> >> >>>>>>>> check retry
>>> >> >>>>>>>> >> sleep
>>> >> >>>>>>>> >> >>> >> time:
>>> >> >>>>>>>> >> >>> >> >> 10
>>> >> >>>>>>>> >> >>> >> >> >>> > second(s)
>>> >> >>>>>>>> >> >>> >> >> >>> >
>>> >> >>>>>>>> >> >>> >> >> >>> > That pgpool was configured with health
>>> check
>>> >> >>>>>>>> interval of
>>> >> >>>>>>>> >> >>> 30sec,
>>> >> >>>>>>>> >> >>> >> 5sec
>>> >> >>>>>>>> >> >>> >> >> >>> > timeout, and 10sec retry delay with 2 max
>>> >> retries.
>>> >> >>>>>>>> >> >>> >> >> >>> >
>>> >> >>>>>>>> >> >>> >> >> >>> > Making use of libpq instead for connecting
>>> to
>>> >> db
>>> >> >>>>>>>> in health
>>> >> >>>>>>>> >> >>> checks
>>> >> >>>>>>>> >> >>> >> IMO
>>> >> >>>>>>>> >> >>> >> >> >>> > should resolve it, but you'll best
>>> determine
>>> >> >>>>>>>> which call
>>> >> >>>>>>>> >> >>> exactly
>>> >> >>>>>>>> >> >>> >> gets
>>> >> >>>>>>>> >> >>> >> >> >>> > blocked waiting. Btw, psql with
>>> >> PGCONNECT_TIMEOUT
>>> >> >>>>>>>> env var
>>> >> >>>>>>>> >> >>> >> configured
>>> >> >>>>>>>> >> >>> >> >> >>> > respects that env var timeout.
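
For illustration, a libpq-based probe with a connection timeout could look
roughly like this (a sketch only; pgpool's health check does not use libpq,
and backend_reachable is a hypothetical helper):

#include <stdio.h>
#include <libpq-fe.h>

/* connect_timeout in the conninfo string is honoured by libpq, just like
 * the PGCONNECT_TIMEOUT environment variable is for psql. */
static int backend_reachable(const char *host, const char *port)
{
    char conninfo[256];
    PGconn *conn;
    int ok;

    snprintf(conninfo, sizeof(conninfo),
             "host=%s port=%s dbname=postgres connect_timeout=5", host, port);
    conn = PQconnectdb(conninfo);
    ok = (PQstatus(conn) == CONNECTION_OK);
    PQfinish(conn);
    return ok;
}
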
>>> >> >>>>>>>> >> >>> >> >> >>> >
>>> >> >>>>>>>> >> >>> >> >> >>> > Regards,
>>> >> >>>>>>>> >> >>> >> >> >>> > Stevo.
>>> >> >>>>>>>> >> >>> >> >> >>> >
>>> >> >>>>>>>> >> >>> >> >> >>> > On Wed, Jan 11, 2012 at 11:15 AM, Stevo
>>> Slavić
>>> >> <
>>> >> >>>>>>>> >> >>> sslavic at gmail.com
>>> >> >>>>>>>> >> >>> >> >
>>> >> >>>>>>>> >> >>> >> >> >>> wrote:
>>> >> >>>>>>>> >> >>> >> >> >>> >
>>> >> >>>>>>>> >> >>> >> >> >>> >> Tatsuo,
>>> >> >>>>>>>> >> >>> >> >> >>> >>
>>> >> >>>>>>>> >> >>> >> >> >>> >> Did you restart iptables after adding
>>> rule?
>>> >> >>>>>>>> >> >>> >> >> >>> >>
>>> >> >>>>>>>> >> >>> >> >> >>> >> Regards,
>>> >> >>>>>>>> >> >>> >> >> >>> >> Stevo.
>>> >> >>>>>>>> >> >>> >> >> >>> >>
>>> >> >>>>>>>> >> >>> >> >> >>> >>
>>> >> >>>>>>>> >> >>> >> >> >>> >> On Wed, Jan 11, 2012 at 11:12 AM, Stevo
>>> >> Slavić <
>>> >> >>>>>>>> >> >>> >> sslavic at gmail.com>
>>> >> >>>>>>>> >> >>> >> >> >>> wrote:
>>> >> >>>>>>>> >> >>> >> >> >>> >>
>>> >> >>>>>>>> >> >>> >> >> >>> >>> Looking into this to verify if these are
>>> all
>>> >> >>>>>>>> necessary
>>> >> >>>>>>>> >> >>> changes
>>> >> >>>>>>>> >> >>> >> to
>>> >> >>>>>>>> >> >>> >> >> have
>>> >> >>>>>>>> >> >>> >> >> >>> >>> port unreachable message silently
>>> rejected
>>> >> >>>>>>>> (suspecting
>>> >> >>>>>>>> >> some
>>> >> >>>>>>>> >> >>> >> kernel
>>> >> >>>>>>>> >> >>> >> >> >>> >>> parameter tuning is needed).
>>> >> >>>>>>>> >> >>> >> >> >>> >>>
>>> >> >>>>>>>> >> >>> >> >> >>> >>> Just to clarify it's not a problem that
>>> host
>>> >> is
>>> >> >>>>>>>> being
>>> >> >>>>>>>> >> >>> detected
>>> >> >>>>>>>> >> >>> >> by
>>> >> >>>>>>>> >> >>> >> >> >>> pgpool
>>> >> >>>>>>>> >> >>> >> >> >>> >>> to be down, but the timing when that
>>> >> happens. On
>>> >> >>>>>>>> >> environment
>>> >> >>>>>>>> >> >>> >> where
>>> >> >>>>>>>> >> >>> >> >> >>> issue is
>>> >> >>>>>>>> >> >>> >> >> >>> >>> reproduced pgpool as part of health check
>>> >> >>>>>>>> attempt tries
>>> >> >>>>>>>> >> to
>>> >> >>>>>>>> >> >>> >> connect
>>> >> >>>>>>>> >> >>> >> >> to
>>> >> >>>>>>>> >> >>> >> >> >>> >>> backend and hangs for tcp timeout
>>> instead of
>>> >> >>>>>>>> being
>>> >> >>>>>>>> >> >>> interrupted
>>> >> >>>>>>>> >> >>> >> by
>>> >> >>>>>>>> >> >>> >> >> >>> timeout
>>> >> >>>>>>>> >> >>> >> >> >>> >>> alarm. Can you verify/confirm please the
>>> >> health
>>> >> >>>>>>>> check
>>> >> >>>>>>>> >> retry
>>> >> >>>>>>>> >> >>> >> timings
>>> >> >>>>>>>> >> >>> >> >> >>> are not
>>> >> >>>>>>>> >> >>> >> >> >>> >>> delayed?
>>> >> >>>>>>>> >> >>> >> >> >>> >>>
>>> >> >>>>>>>> >> >>> >> >> >>> >>> Regards,
>>> >> >>>>>>>> >> >>> >> >> >>> >>> Stevo.
>>> >> >>>>>>>> >> >>> >> >> >>> >>>
>>> >> >>>>>>>> >> >>> >> >> >>> >>>
>>> >> >>>>>>>> >> >>> >> >> >>> >>> On Wed, Jan 11, 2012 at 10:50 AM, Tatsuo
>>> >> Ishii <
>>> >> >>>>>>>> >> >>> >> >> ishii at postgresql.org
>>> >> >>>>>>>> >> >>> >> >> >>> >wrote:
>>> >> >>>>>>>> >> >>> >> >> >>> >>>
>>> >> >>>>>>>> >> >>> >> >> >>> >>>> Ok, I did:
>>> >> >>>>>>>> >> >>> >> >> >>> >>>>
>>> >> >>>>>>>> >> >>> >> >> >>> >>>> # iptables -A FORWARD -j REJECT
>>> >> --reject-with
>>> >> >>>>>>>> >> >>> >> >> icmp-port-unreachable
>>> >> >>>>>>>> >> >>> >> >> >>> >>>>
>>> >> >>>>>>>> >> >>> >> >> >>> >>>> on the host where pgpoo is running. And
>>> pull
>>> >> >>>>>>>> network
>>> >> >>>>>>>> >> cable
>>> >> >>>>>>>> >> >>> from
>>> >> >>>>>>>> >> >>> >> >> >>> >>>> backend0 host network interface. Pgpool
>>> >> >>>>>>>> detected the
>>> >> >>>>>>>> >> host
>>> >> >>>>>>>> >> >>> being
>>> >> >>>>>>>> >> >>> >> >> down
>>> >> >>>>>>>> >> >>> >> >> >>> >>>> as expected...
>>> >> >>>>>>>> >> >>> >> >> >>> >>>> --
>>> >> >>>>>>>> >> >>> >> >> >>> >>>> Tatsuo Ishii
>>> >> >>>>>>>> >> >>> >> >> >>> >>>> SRA OSS, Inc. Japan
>>> >> >>>>>>>> >> >>> >> >> >>> >>>> English:
>>> >> http://www.sraoss.co.jp/index_en.php
>>> >> >>>>>>>> >> >>> >> >> >>> >>>> Japanese: http://www.sraoss.co.jp
>>> >> >>>>>>>> >> >>> >> >> >>> >>>>
>>>>> Backend is not the destination of this message, the pgpool host is,
>>>>> and we don't want it to ever get it. With the command I've sent you,
>>>>> the rule will be created for any source and destination.
>>>>>
>>>>> Regards,
>>>>> Stevo.
>>>>>
>>>>> On Wed, Jan 11, 2012 at 10:38 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>>>
>>>>>> I did the following, on the host where pgpool is running on:
>>>>>>
>>>>>> # iptables -A FORWARD -j REJECT --reject-with icmp-port-unreachable -d 133.137.177.124
>>>>>>
>>>>>> (133.137.177.124 is the host where the backend is running on.)
>>>>>>
>>>>>> Then I pulled the network cable from the backend0 host's network
>>>>>> interface. Pgpool detected the host being down as expected. Am I
>>>>>> missing something?
>>>>>> --
>>>>>> Tatsuo Ishii
>>>>>> SRA OSS, Inc. Japan
>>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>>> Japanese: http://www.sraoss.co.jp
>>>>>>
>>>>>>> Hello Tatsuo,
>>>>>>>
>>>>>>> With backend0 on one host, just configure the following rule on the
>>>>>>> other host, where pgpool is:
>>>>>>>
>>>>>>> iptables -A FORWARD -j REJECT --reject-with icmp-port-unreachable
>>>>>>>
>>>>>>> and then have pgpool start up with health checking and retrying
>>>>>>> configured, and then pull the network cable from the backend0 host's
>>>>>>> network interface.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Stevo.
>>>>>>>
>>>>>>> On Wed, Jan 11, 2012 at 6:27 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>>>>>
>>>>>>>> I want to try to test the situation you described:
>>>>>>>>
>>>>>>>>>>> When system is configured for security reasons not to return
>>>>>>>>>>> destination host unreachable messages, even though
>>>>>>>>>>> health_check_timeout is
>>>>>>>>
>>>>>>>> But I don't know how to do it. I pulled out the network cable and
>>>>>>>> pgpool detected it as expected. Also I configured the server which
>>>>>>>> PostgreSQL is running on to disable the 5432 port. In this case
>>>>>>>> connect(2) returned EHOSTUNREACH (No route to host), so pgpool
>>>>>>>> detected the error as expected.
>>>>>>>>
>>>>>>>> Could you please instruct me?
>>>>>>>> --
>>>>>>>> Tatsuo Ishii
>>>>>>>> SRA OSS, Inc. Japan
>>>>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>>>>> Japanese: http://www.sraoss.co.jp
>>>>>>>>
>>>>>>>>> Hello Tatsuo,
>>>>>>>>>
>>>>>>>>> Thank you for replying!
>>>>>>>>>
>>>>>>>>> I'm not sure what exactly is blocking; just from pgpool code
>>>>>>>>> analysis I suspect it is the part where a connection is made to the
>>>>>>>>> db, and it doesn't seem to get interrupted by the alarm. I tested
>>>>>>>>> health check behaviour thoroughly: it works really well when the
>>>>>>>>> host/ip is there and just the backend/postgres is down, but not when
>>>>>>>>> the backend host/ip is down. I could see in the log that the initial
>>>>>>>>> health check and each retry got delayed when the host/ip is not
>>>>>>>>> reachable, while when just the backend is not listening (is down) on
>>>>>>>>> a reachable host/ip, the initial health check and all retries match
>>>>>>>>> the settings in pgpool.conf exactly.
>>>>>>>>>
>>>>>>>>> PGCONNECT_TIMEOUT is listed as one of the libpq environment
>>>>>>>>> variables in the docs (see
>>>>>>>>> http://www.postgresql.org/docs/9.1/static/libpq-envars.html).
>>>>>>>>> There is an equivalent connect_timeout parameter for libpq's
>>>>>>>>> PQconnectdbParams (see
>>>>>>>>> http://www.postgresql.org/docs/9.1/static/libpq-connect.html#LIBPQ-CONNECT-CONNECT-TIMEOUT).
>>>>>>>>> At the beginning of that same page there is some important
>>>>>>>>> information on using these functions.
>>>>>>>>>
>>>>>>>>> psql respects PGCONNECT_TIMEOUT.
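>>>>>>>>>
>>>>>>>>> For illustration, a minimal sketch of a libpq-based probe (the helper
>>>>>>>>> name and the example values are made up, this is not pgpool code),
>>>>>>>>> where the connect_timeout parameter plays the role PGCONNECT_TIMEOUT
>>>>>>>>> plays for psql:
>>>>>>>>>
>>>>>>>>>   /* Hypothetical libpq-based health check probe. */
>>>>>>>>>   #include <libpq-fe.h>
>>>>>>>>>
>>>>>>>>>   static int backend_is_up(const char *host, const char *port)
>>>>>>>>>   {
>>>>>>>>>       const char *keywords[] = { "host", "port", "connect_timeout",
>>>>>>>>>                                  "dbname", NULL };
>>>>>>>>>       const char *values[]   = { host, port, "10", "postgres", NULL };
>>>>>>>>>
>>>>>>>>>       /* connect_timeout = 10 bounds the connection attempt, similar
>>>>>>>>>        * to PGCONNECT_TIMEOUT=10 in the environment. */
>>>>>>>>>       PGconn *conn = PQconnectdbParams(keywords, values, 0);
>>>>>>>>>       int ok = (PQstatus(conn) == CONNECTION_OK);
>>>>>>>>>       PQfinish(conn);
>>>>>>>>>       return ok;
>>>>>>>>>   }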
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Stevo.
>>>>>>>>>
>>>>>>>>> On Wed, Jan 11, 2012 at 12:13 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello pgpool community,
>>>>>>>>>>>
>>>>>>>>>>> When system is configured for security reasons not to return
>>>>>>>>>>> destination host unreachable messages, even though
>>>>>>>>>>> health_check_timeout is configured, the socket call will block and
>>>>>>>>>>> the alarm will not get raised until the TCP timeout occurs.
>>>>>>>>>>
>>>>>>>>>> Interesting. So are you saying that read(2) cannot be interrupted
>>>>>>>>>> by the alarm signal if the system is configured not to return
>>>>>>>>>> destination host unreachable messages? Could you please guide me to
>>>>>>>>>> where I can get such info? (I'm not a network expert.)
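>>>>>>>>>>
>>>>>>>>>> For illustration, roughly what the alarm-based timeout pattern looks
>>>>>>>>>> like (hypothetical names, not the actual pgpool health check code).
>>>>>>>>>> One detail worth noting: for a blocking call such as read(2), the
>>>>>>>>>> EINTR only reaches the caller if the handler is installed without
>>>>>>>>>> SA_RESTART; otherwise the call may simply be restarted after the
>>>>>>>>>> signal handler returns.
>>>>>>>>>>
>>>>>>>>>>   /* Hypothetical sketch of an alarm-bounded blocking connect.  If
>>>>>>>>>>    * the kernel never sees an ICMP unreachable, connect() keeps
>>>>>>>>>>    * waiting, and the SIGALRM is what is supposed to cut it short. */
>>>>>>>>>>   #include <signal.h>
>>>>>>>>>>   #include <string.h>
>>>>>>>>>>   #include <unistd.h>
>>>>>>>>>>   #include <errno.h>
>>>>>>>>>>   #include <sys/socket.h>
>>>>>>>>>>
>>>>>>>>>>   static void health_check_timer(int sig) { (void) sig; }
>>>>>>>>>>
>>>>>>>>>>   int checked_connect(int fd, const struct sockaddr *addr,
>>>>>>>>>>                       socklen_t len, unsigned timeout_sec)
>>>>>>>>>>   {
>>>>>>>>>>       struct sigaction sa;
>>>>>>>>>>       memset(&sa, 0, sizeof(sa));
>>>>>>>>>>       sa.sa_handler = health_check_timer;
>>>>>>>>>>       sigemptyset(&sa.sa_mask);
>>>>>>>>>>       sa.sa_flags = 0;                  /* deliberately no SA_RESTART */
>>>>>>>>>>       sigaction(SIGALRM, &sa, NULL);
>>>>>>>>>>
>>>>>>>>>>       alarm(timeout_sec);
>>>>>>>>>>       int rc = connect(fd, addr, len);  /* blocking; may hang here */
>>>>>>>>>>       int saved = errno;
>>>>>>>>>>       alarm(0);                         /* cancel the timer */
>>>>>>>>>>
>>>>>>>>>>       if (rc < 0 && saved == EINTR)
>>>>>>>>>>           return -2;                    /* interrupted: treat as timeout */
>>>>>>>>>>       return rc;
>>>>>>>>>>   }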
>>>>>>>>>>
>>>>>>>>>>> Not a C programmer; I found some info that the socket call could
>>>>>>>>>>> be replaced with select/pselect calls. Maybe it would be best if
>>>>>>>>>>> the PGCONNECT_TIMEOUT value could be used here for the connection
>>>>>>>>>>> timeout. pgpool has libpq as a dependency, so why isn't it using
>>>>>>>>>>> libpq for the healthcheck db connect calls? Then PGCONNECT_TIMEOUT
>>>>>>>>>>> would be applied.
>>>>>>>>>>
>>>>>>>>>> I don't think libpq uses select/pselect for establishing a
>>>>>>>>>> connection, but using libpq instead of homebrew code seems to be an
>>>>>>>>>> idea. Let me think about it.
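>>>>>>>>>>
>>>>>>>>>> For illustration, a sketch of what such a libpq-based probe could
>>>>>>>>>> look like (hypothetical helper and example values): PQpingParams,
>>>>>>>>>> available since 9.1, checks whether the server is accepting
>>>>>>>>>> connections and takes the same connection parameters as
>>>>>>>>>> PQconnectdbParams, including connect_timeout.
>>>>>>>>>>
>>>>>>>>>>   /* Hypothetical probe using PQpingParams instead of a hand-rolled
>>>>>>>>>>    * socket connect.  Example values only. */
>>>>>>>>>>   #include <libpq-fe.h>
>>>>>>>>>>
>>>>>>>>>>   static int backend_accepting(const char *host, const char *port)
>>>>>>>>>>   {
>>>>>>>>>>       const char *keywords[] = { "host", "port", "connect_timeout", NULL };
>>>>>>>>>>       const char *values[]   = { host, port, "10", NULL };
>>>>>>>>>>
>>>>>>>>>>       /* PQPING_OK: accepting connections; PQPING_REJECT: alive but
>>>>>>>>>>        * rejecting; PQPING_NO_RESPONSE: could not be contacted. */
>>>>>>>>>>       return PQpingParams(keywords, values, 0) == PQPING_OK;
>>>>>>>>>>   }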
>>>>>>>>>>
>>>>>>>>>> One question. Are you sure that libpq can deal with the case (not
>>>>>>>>>> returning destination host unreachable messages) by using
>>>>>>>>>> PGCONNECT_TIMEOUT?
>>>>>>>>>> --
>>>>>>>>>> Tatsuo Ishii
>>>>>>>>>> SRA OSS, Inc. Japan
>>>>>>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>>>>>>> Japanese: http://www.sraoss.co.jp


More information about the pgpool-hackers mailing list