[pgpool-hackers: 31] Re: [pgpool-general: 131] Healthcheck timeout not always respected

Tatsuo Ishii ishii at postgresql.org
Mon Feb 27 21:45:52 JST 2012


Your use case seems not very common. I think a better solution would be
to add a flag, something like "FAILOVER_ON_ADMIN_SHUTDOWN". If the flag
is set, pgpool would not trigger failover on a postmaster shutdown.
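
For illustration only, roughly how that could look in pgpool.conf (the
flag value below is hypothetical; only ALLOW_TO_FAILOVER and
DISALLOW_TO_FAILOVER exist today):

backend_hostname0 = 'cluster-service-ip'
backend_port0 = 5432
# hypothetical value: do not degenerate the node on an administrative
# postmaster shutdown; let only the health check decide
backend_flag0 = 'FAILOVER_ON_ADMIN_SHUTDOWN'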
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Imagine two geo-redundant data centers, with a couple of nodes in each
> data center dedicated to running postgres as a clustered service, and
> the two clustered postgres services (one in each data center) configured
> in streaming replication (so one is master, the other standby; the
> standby initiates replication).
> 
> Then imagine pgpool performing failover immediately upon relocation of
> the currently active postgres master service from one node to another
> within the same data center/cluster, e.g. for maintenance of the node
> where the master service was active. Failover will promote the standby
> postgres to a new master, while the original master relocates after a
> couple of seconds and starts up as a master as well - a split-brain
> condition - not good.
> 
> This can also happen on abrupt failure of the node running the active
> postgres master service - the cluster manager will detect the node
> failure and recover the postgres service on another node in the same
> cluster. With pgpool not letting the health check control failover, when
> such a condition occurs the split-brain condition would last for some
> time, and recovery would be hard and painful.
> 
> If pgpool did not fail over immediately, and instead let the health
> check decide if and when to fail over, and if the pgpool health check
> were tuned with retries/delays giving the postgres cluster service
> enough time to relocate/recover, the split-brain condition would not
> occur.
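>
> As a rough sketch (the values are only an example), the health check
> would be tuned so that the total retry window exceeds the relocation
> time of the clustered postgres service:
>
> health_check_period = 30
> health_check_timeout = 5
> health_check_max_retries = 6
> health_check_retry_delay = 10
> # worst case before failover is triggered is roughly
> # (max_retries + 1) * timeout + max_retries * retry_delay = 95 seconds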
> 
> Kind regards,
> Stevo.
> 
> 2012/2/24 Tatsuo Ishii <ishii at postgresql.org>
> 
>> > Hello Tatsuo,
>> >
>> > Thank you for accepting and improving the majority of the changes.
>> > Unfortunately, the part that was not accepted is a show stopper, so I
>> > still have to use a patched/customized pgpool version in production,
>> > since it still seems impossible to configure pgpool so that the health
>> > check alone controls if and when failover should be triggered. With the
>> > latest sources, and pgpool configured as you suggested (backend flag set
>> > to ALLOW_TO_FAILOVER, and fail_over_on_backend_error set to off) with
>> > two backends in raw mode, after the initial stable state pgpool
>> > triggered failover of the primary backend as soon as that backend became
>> > inaccessible to pgpool, without giving the health check a chance. The
>> > primary backend was shut down to simulate failure/relocation, but the
>> > same would happen if connecting to the backend failed merely because of
>> > a temporary network issue.
>> >
>> > This behaviour is in line with the documentation of the
>> > fail_over_on_backend_error configuration parameter, which states:
>> > "Please note that even if this parameter is set to off, however, pgpool
>> > will also do the fail over when connecting to a backend fails or pgpool
>> > detects the administrative shutdown of postmaster."
>> >
>> > But it is a perfectly valid requirement to want to prevent failover
>> > from occurring immediately when connecting to a backend fails - that
>> > condition could be temporary, e.g. a transient network problem. Health
>> > check retries were designed to cover this situation, so one can even
>> > configure the health check to fail connecting several times, and all is
>> > fine and no failover should occur as long as the backend is accessible
>> > again after the configured number of retries.
>>
>> I understand the point. It would be nice if the pgpool child retried
>> connecting to the backend the way the health check does.
>>
>> > Also, it is a perfectly valid requirement to prevent failover from
>> > occurring immediately when an administrative shutdown of the backend is
>> > performed. For example, for high availability and easy maintenance a
>> > single backend can be configured as a cluster service with e.g. two or
>> > more nodes where it can run, while it actually runs on only one node at
>> > a given point in time. So e.g. when an admin wants to upgrade the
>> > postgres installation on each of the nodes within the cluster, to
>> > upgrade the node where the postgres service is currently active the
>> > admin relocates the service to some other node in the cluster.
>> > Relocation causes a stop (administrative shutdown) of the postgres
>> > service on the currently active node, and starts it on another node.
>> > pgpool, which is configured to use such a clustered postgres service as
>> > a single backend (bound to the cluster service IP), should not perform
>> > failover on the detected administrative shutdown - relocation takes
>> > time, the health check is configured to give relocation enough time,
>> > and the health check should be the only thing that triggers failover,
>> > if the backend is still not accessible after the configured number of
>> > retries and delays between them.
>>
>> This I don't understand. Why don't you use pcp_attach_node in this
>> case after failover?
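>>
>> For reference, re-attaching a recovered node is a single pcp call; a
>> minimal sketch, assuming the 3.1-style positional arguments and example
>> host/port/credentials:
>>
>> # pcp_attach_node timeout host pcp_port username password node_id
>> pcp_attach_node 10 localhost 9898 pcpadmin secret 0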
>>
>> > Given these two examples, I hope you'll agree that it is a valid
>> > requirement to want to let the health check alone control when failover
>> > should be triggered.
>> >
>> > Unfortunately this is not possible at the moment. Configuring the
>> > backend flag to DISALLOW_TO_FAILOVER will prevent the health check from
>> > triggering failover. And even with fail_over_on_backend_error set to
>> > off, failover will still be triggered immediately on temporary
>> > conditions that the health check with retries should handle.
>> >
>> > Did I miss something? How does one configure pgpool so that the health
>> > check is the only process in pgpool that triggers failover?
>> >
>> > Kind regards,
>> > Stevo.
>> >
>> > 2012/2/19 Tatsuo Ishii <ishii at postgresql.org>
>> >
>> >> Stevo,
>> >>
>> >> Thanks for the patches. I have committed the changes except the part in
>> >> which you ignore DISALLOW_TO_FAILOVER. Instead I modified the low-level
>> >> socket reading functions not to unconditionally fail over when reading
>> >> from backend sockets fails (they only fail over when
>> >> fail_over_on_backend_error is on). So if you want to trigger failover
>> >> only when health checking fails, you want to turn off
>> >> fail_over_on_backend_error and not set DISALLOW_TO_FAILOVER.
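>> >>
>> >> In pgpool.conf terms that combination is roughly (a sketch, shown for
>> >> node 0 only):
>> >>
>> >> health_check_period = 30               # health check enabled
>> >> fail_over_on_backend_error = off
>> >> backend_flag0 = 'ALLOW_TO_FAILOVER'    # i.e. DISALLOW_TO_FAILOVER not set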
>> >> --
>> >> Tatsuo Ishii
>> >> SRA OSS, Inc. Japan
>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> Japanese: http://www.sraoss.co.jp
>> >>
>> >> > Hello Tatsuo,
>> >> >
>> >> > Attached is a cumulative patch rebased to the current master branch
>> >> > head, which:
>> >> > - Fixes the health check timeout not always being respected (includes
>> >> > unsetting non-blocking mode after the connection has been successfully
>> >> > established);
>> >> > - Fixes failover-on-health-check-only support.
>> >> >
>> >> > Kind regards,
>> >> > Stevo.
>> >> >
>> >> > 2012/2/5 Stevo Slavić <sslavic at gmail.com>
>> >> >
>> >> >> Tatsuo,
>> >> >>
>> >> >> Thank you very much for your time and effort put into analysis of the
>> >> >> submitted patch,
>> >> >>
>> >> >>
>> >> >> Obviously I'm missing something regarding the health check feature,
>> >> >> so please clarify:
>> >> >>
>> >> >>    - what is the purpose of the health check when the backend flag is
>> >> >>    set to DISALLOW_TO_FAILOVER? To log that health checks run on time
>> >> >>    but will not actually do anything?
>> >> >>    - what is the purpose of the health check (especially with retries
>> >> >>    configured) when the backend flag is set to ALLOW_TO_FAILOVER? When
>> >> >>    answering, please consider the case of a non-helloworld application
>> >> >>    that connects to the db via pgpool - will the health check be given
>> >> >>    a chance to fail even once?
>> >> >>    - since there is no backend flag value other than the two
>> >> >>    mentioned, what is the purpose of the health check (especially with
>> >> >>    retries configured) if it's not to be the sole process controlling
>> >> >>    when to fail over?
>> >> >>
>> >> >> I disagree that changing pgpool to give the health check feature a
>> >> >> meaning disrupts the meaning of DISALLOW_TO_FAILOVER; it extends it
>> >> >> only for the case when a health check is configured - if one doesn't
>> >> >> want the health check, just keep on not using it, it's disabled by
>> >> >> default. Health checks and retries have only recently been introduced,
>> >> >> so I doubt there are many (if any) health check users who have
>> >> >> configured DISALLOW_TO_FAILOVER with the expectation of just getting
>> >> >> health check logging that does not actually do anything. Of all pgpool
>> >> >> health check users who also have backends set to DISALLOW_TO_FAILOVER,
>> >> >> I believe most of them expect, but do not know, that this will not
>> >> >> allow failover on a failed health check; it will just make the log
>> >> >> bigger. The changes included in the patch do not affect users who have
>> >> >> the health check configured and the backend set to ALLOW_TO_FAILOVER.
>> >> >>
>> >> >>
>> >> >> About the non-blocking connection to backend change:
>> >> >>
>> >> >>    - with pgpool in raw mode and extensive testing (endurance tests,
>> >> >>    failover and failback tests), I didn't notice any unwanted change
>> >> >>    in behaviour, apart from the wanted non-blocking, timeout-aware
>> >> >>    health checks;
>> >> >>    - do you see or know about anything in pgpool depending on the
>> >> >>    connection to the backend being a blocking one? I will have a look
>> >> >>    myself, just asking in case you've found something already. I will
>> >> >>    look into means to set the connection back to blocking after it's
>> >> >>    successfully established - maybe just changing that flag will do.
>> >> >>
>> >> >>
>> >> >> Kind regards,
>> >> >>
>> >> >> Stevo.
>> >> >>
>> >> >>
>> >> >> On Feb 5, 2012 6:50 AM, "Tatsuo Ishii" <ishii at postgresql.org> wrote:
>> >> >>
>> >> >>> Finally I have time to check your patches. Here is the result of my
>> >> >>> review.
>> >> >>>
>> >> >>> > Hello Tatsuo,
>> >> >>> >
>> >> >>> > Here is a cumulative patch to be applied on the pgpool master
>> >> >>> > branch with the following fixes included:
>> >> >>> >
>> >> >>> >    1. fix for health check bug
>> >> >>> >       1. it was not possible to allow backend failover only on
>> >> >>> >       failed health check(s);
>> >> >>> >       2. to achieve this one just configures the backend with
>> >> >>> >       DISALLOW_TO_FAILOVER, sets fail_over_on_backend_error to off,
>> >> >>> >       and configures health checks;
>> >> >>> >       3. for this fix an unwanted check was removed in main.c: after
>> >> >>> >       a failed health check, if DISALLOW_TO_FAILOVER was set for the
>> >> >>> >       backend, failover would always have been prevented, even when
>> >> >>> >       one configures a health check whose sole purpose is to control
>> >> >>> >       failover
>> >> >>>
>> >> >>> This is not acceptable, at least for stable releases.
>> >> >>> DISALLOW_TO_FAILOVER and fail_over_on_backend_error are for different
>> >> >>> purposes. The former is for preventing any failover, including
>> >> >>> failover from the health check. The latter is for writes to the
>> >> >>> communication socket.
>> >> >>>
>> >> >>> fail_over_on_backend_error = on
>> >> >>>                                   # Initiates failover when writing to
>> >> >>>                                   # the backend communication socket
>> >> >>>                                   # fails.
>> >> >>>                                   # This is the same behaviour of
>> >> >>>                                   # pgpool-II 2.2.x and previous
>> >> >>>                                   # releases.
>> >> >>>                                   # If set to off, pgpool will report
>> >> >>>                                   # an error and disconnect the
>> >> >>>                                   # session.
>> >> >>>
>> >> >>> Your patch changes the existing semantics. Another point is that
>> >> >>> DISALLOW_TO_FAILOVER allows controlling the behaviour per backend.
>> >> >>> Your patch breaks that.
>> >> >>>
>> >> >>> >       2. fix for health check bug
>> >> >>> >       1. the health check timeout was not being respected in all
>> >> >>> >       conditions (icmp host-unreachable messages dropped for
>> >> >>> >       security reasons, or no active network component to send those
>> >> >>> >       messages)
>> >> >>> >       2. for this fix (main.c, pool.h, pool_connection_pool.c) inet
>> >> >>> >       connections have been made non-blocking, and during connection
>> >> >>> >       retries the status of the now-global health_check_timer_expired
>> >> >>> >       variable is checked
>> >> >>>
>> >> >>> This seems good, but I need to investigate more. For example, your
>> >> >>> patch sets the sockets to non-blocking but never reverts them back to
>> >> >>> blocking.
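>> >> >>>
>> >> >>> (Something along these lines is what I mean - just a sketch, not the
>> >> >>> actual pgpool code:)
>> >> >>>
>> >> >>> #include <fcntl.h>
>> >> >>>
>> >> >>> /* put an fd back into blocking mode once connect() has succeeded */
>> >> >>> static int set_blocking(int fd)
>> >> >>> {
>> >> >>>     int flags = fcntl(fd, F_GETFL, 0);
>> >> >>>     if (flags < 0)
>> >> >>>         return -1;
>> >> >>>     return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
>> >> >>> }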
>> >> >>>
>> >> >>> >       3. fix for failback bug
>> >> >>> >       1. in raw mode, after failback (through pcp_attach_node) the
>> >> >>> >       standby node/backend would remain in an invalid state
>> >> >>>
>> >> >>> It turned out that even failover was buggy. The status was not set to
>> >> >>> CON_DOWN. This left the status at CON_CONNECT_WAIT, which prevented
>> >> >>> failback from returning to the normal state. I fixed this on the
>> >> >>> master branch.
>> >> >>>
>> >> >>> > (it would be in CON_UP, so on failover after failback pgpool would
>> >> >>> >       not be able to connect to the standby, as get_next_master_node
>> >> >>> >       expects standby nodes/backends in raw mode to be in the
>> >> >>> >       CON_CONNECT_WAIT state when finding the next master node)
>> >> >>> >       2. for this fix, when in raw mode, on failback the status of
>> >> >>> >       all nodes/backends in the CON_UP state is set to
>> >> >>> >       CON_CONNECT_WAIT - all children are restarted anyway
>> >> >>>
>> >> >>>
>> >> >>> > Neither of these fixes changes the expected behaviour of the related
>> >> >>> > features, so there are no changes to the documentation.
>> >> >>> >
>> >> >>> >
>> >> >>> > Kind regards,
>> >> >>> >
>> >> >>> > Stevo.
>> >> >>> >
>> >> >>> >
>> >> >>> > 2012/1/24 Tatsuo Ishii <ishii at postgresql.org>
>> >> >>> >
>> >> >>> >> > Additional testing confirmed that this fix ensures the health
>> >> >>> >> > check timer gets respected (should I create a ticket on some
>> >> >>> >> > issue tracker? send a cumulative patch with all changes to have
>> >> >>> >> > it accepted?).
>> >> >>> >>
>> >> >>> >> We have a problem with the Mantis bug tracker and decided to stop
>> >> >>> >> using it (unless someone volunteers to fix it). Please send a
>> >> >>> >> cumulative patch against the master head to this list so that we
>> >> >>> >> will be able to look into it (be sure to include English doc
>> >> >>> >> changes).
>> >> >>> >> --
>> >> >>> >> Tatsuo Ishii
>> >> >>> >> SRA OSS, Inc. Japan
>> >> >>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> >>> >> Japanese: http://www.sraoss.co.jp
>> >> >>> >>
>> >> >>> >> > The problem is that with all the testing another issue has been
>> >> >>> >> > encountered, now with pcp_attach_node.
>> >> >>> >> >
>> >> >>> >> > With pgpool in raw mode and two backends in postgres 9 streaming
>> >> >>> >> > replication, when backend0 fails, after the health check retries
>> >> >>> >> > pgpool calls the failover command and degenerates backend0,
>> >> >>> >> > backend1 gets promoted to the new master, pgpool can connect to
>> >> >>> >> > that master, and the two backends are in pgpool states 3/2. This
>> >> >>> >> > is ok and expected.
>> >> >>> >> >
>> >> >>> >> > Once backend0 is recovered, it's attached back to pgpool using
>> >> >>> >> > pcp_attach_node, and pgpool will show two backends in state 2/2
>> >> >>> >> > (in the logs and in the "show pool_nodes;" query) with backend0
>> >> >>> >> > taking all the load (raw mode). If pgpool is not restarted after
>> >> >>> >> > that recovery and attachment of backend0, and after some time
>> >> >>> >> > backend0 fails again, then after the health check retries
>> >> >>> >> > backend0 will get degenerated and the failover command will get
>> >> >>> >> > called (promoting the standby to master), but pgpool will not be
>> >> >>> >> > able to connect to backend1 (regardless of whether unix or inet
>> >> >>> >> > sockets are used for backend1). Only if pgpool is restarted
>> >> >>> >> > before the second (complete) failure of backend0 will pgpool be
>> >> >>> >> > able to connect to backend1.
>> >> >>> >> >
>> >> >>> >> > Following the code, pcp_attach_node (failback of backend0) will
>> >> >>> >> > actually execute the same code as for failover. I'm not sure
>> >> >>> >> > what, but that failover does something with backend1's state or
>> >> >>> >> > in-memory settings, so that pgpool can no longer connect to
>> >> >>> >> > backend1. Is this a known issue?
>> >> >>> >> >
>> >> >>> >> > Kind regards,
>> >> >>> >> > Stevo.
>> >> >>> >> >
>> >> >>> >> > 2012/1/20 Stevo Slavić <sslavic at gmail.com>
>> >> >>> >> >
>> >> >>> >> >> A key file was missing from that commit/change - pool.h, where
>> >> >>> >> >> health_check_timer_expired was made global. It is now included
>> >> >>> >> >> in the attached patch.
>> >> >>> >> >>
>> >> >>> >> >> Kind regards,
>> >> >>> >> >> Stevo.
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >> 2012/1/20 Stevo Slavić <sslavic at gmail.com>
>> >> >>> >> >>
>> >> >>> >> >>> Using exit_request was wrong and caused a bug. A 4th patch is
>> >> >>> >> >>> needed - health_check_timer_expired is global now so it can be
>> >> >>> >> >>> checked outside of main.c whether it was set to 1.
>> >> >>> >> >>>
>> >> >>> >> >>>
>> >> >>> >> >>> Kind regards,
>> >> >>> >> >>> Stevo.
>> >> >>> >> >>>
>> >> >>> >> >>> 2012/1/19 Stevo Slavić <sslavic at gmail.com>
>> >> >>> >> >>>
>> >> >>> >> >>>> Using exit_code was not wise. I tested and encountered a case
>> >> >>> >> >>>> where this results in a bug. I have to work on it more. The
>> >> >>> >> >>>> main issue is how the connect_inet_domain_socket_by_port
>> >> >>> >> >>>> function in pool_connection_pool.c can know that the health
>> >> >>> >> >>>> check timer has expired (been set to 1). Any ideas?
>> >> >>> >> >>>>
>> >> >>> >> >>>> Kind regards,
>> >> >>> >> >>>> Stevo.
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>> 2012/1/19 Stevo Slavić <sslavic at gmail.com>
>> >> >>> >> >>>>
>> >> >>> >> >>>>> Tatsuo,
>> >> >>> >> >>>>>
>> >> >>> >> >>>>> Here are the patches which should be applied to the current
>> >> >>> >> >>>>> pgpool head to fix this issue:
>> >> >>> >> >>>>>
>> >> >>> >> >>>>> Fixes-health-check-timeout.patch
>> >> >>> >> >>>>> Fixes-health-check-retrying-after-failover.patch
>> >> >>> >> >>>>> Fixes-clearing-exitrequest-flag.patch
>> >> >>> >> >>>>>
>> >> >>> >> >>>>> The quirk I noticed in the logs was resolved as well - after
>> >> >>> >> >>>>> failover, pgpool would perform a health check and report it
>> >> >>> >> >>>>> was doing the (max retries + 1)th health check, which was
>> >> >>> >> >>>>> confusing. Instead, I've adjusted it so that after failover
>> >> >>> >> >>>>> it does, and reports, a new health check cycle.
>> >> >>> >> >>>>>
>> >> >>> >> >>>>> I've tested it and it works well - in raw mode, with backends
>> >> >>> >> >>>>> set to disallow failover, failover on backend error disabled,
>> >> >>> >> >>>>> and health checks configured with retries (30sec interval,
>> >> >>> >> >>>>> 5sec timeout, 2 retries, 10sec delay between retries).
>> >> >>> >> >>>>>
>> >> >>> >> >>>>> Please test, and if confirmed ok include in next release.
>> >> >>> >> >>>>>
>> >> >>> >> >>>>> Kind regards,
>> >> >>> >> >>>>>
>> >> >>> >> >>>>> Stevo.
>> >> >>> >> >>>>>
>> >> >>> >> >>>>>
>> >> >>> >> >>>>> 2012/1/16 Stevo Slavić <sslavic at gmail.com>
>> >> >>> >> >>>>>
>> >> >>> >> >>>>>> Here are pgpool.log, strace.out, and pgpool.conf from my test
>> >> >>> >> >>>>>> with the latest patch for the health check timeout applied.
>> >> >>> >> >>>>>> It works well, except for a single quirk: after failover
>> >> >>> >> >>>>>> completed, the log reported that a 3rd health check retry was
>> >> >>> >> >>>>>> done (even though just 2 are configured, see pgpool.conf) and
>> >> >>> >> >>>>>> that the backend had returned to a healthy state. The
>> >> >>> >> >>>>>> interesting part of the log file follows:
>> >> >>> >> >>>>>>
>> >> >>> >> >>>>>> Jan 16 01:31:45 sslavic pgpool[1163]: 2012-01-16 01:31:45 DEBUG: pid 1163: retrying 3 th health checking
>> >> >>> >> >>>>>> Jan 16 01:31:45 sslavic pgpool[1163]: 2012-01-16 01:31:45 DEBUG: pid 1163: health_check: 0 th DB node status: 3
>> >> >>> >> >>>>>> Jan 16 01:31:45 sslavic pgpool[1163]: 2012-01-16 01:31:45 LOG:   pid 1163: after some retrying backend returned to healthy state
>> >> >>> >> >>>>>> Jan 16 01:32:15 sslavic pgpool[1163]: 2012-01-16 01:32:15 DEBUG: pid 1163: starting health checking
>> >> >>> >> >>>>>> Jan 16 01:32:15 sslavic pgpool[1163]: 2012-01-16 01:32:15 DEBUG: pid 1163: health_check: 0 th DB node status: 3
>> >> >>> >> >>>>>>
>> >> >>> >> >>>>>> As can be seen in pgpool.conf, there is only one backend
>> >> >>> >> >>>>>> configured. pgpool did fail over correctly after the health
>> >> >>> >> >>>>>> check max retries had been reached (pgpool just degraded that
>> >> >>> >> >>>>>> single backend to status 3 and restarted the child
>> >> >>> >> >>>>>> processes).
>> >> >>> >> >>>>>>
>> >> >>> >> >>>>>> After this quirk was logged, the next health check logs were
>> >> >>> >> >>>>>> as expected. Apart from those couple of weird log entries,
>> >> >>> >> >>>>>> everything seems to be ok. Maybe the quirk was caused by the
>> >> >>> >> >>>>>> single-backend-only configuration corner case. I will try
>> >> >>> >> >>>>>> tomorrow whether it occurs with a dual backend configuration.
>> >> >>> >> >>>>>>
>> >> >>> >> >>>>>> Regards,
>> >> >>> >> >>>>>> Stevo.
>> >> >>> >> >>>>>>
>> >> >>> >> >>>>>>
>> >> >>> >> >>>>>> 2012/1/16 Stevo Slavić <sslavic at gmail.com>
>> >> >>> >> >>>>>>
>> >> >>> >> >>>>>>> Hello Tatsuo,
>> >> >>> >> >>>>>>>
>> >> >>> >> >>>>>>> Unfortunately, with your patch, when A is on
>> >> >>> >> >>>>>>> (pool_config->health_check_period > 0) and B is on, once the
>> >> >>> >> >>>>>>> retry count is exhausted failover will still be disallowed
>> >> >>> >> >>>>>>> because B is on.
>> >> >>> >> >>>>>>>
>> >> >>> >> >>>>>>> Nenad's patch allows failover to be triggered only by the
>> >> >>> >> >>>>>>> health check. Here is a patch which includes Nenad's fix but
>> >> >>> >> >>>>>>> also fixes the issue with the health check timeout not being
>> >> >>> >> >>>>>>> respected.
>> >> >>> >> >>>>>>>
>> >> >>> >> >>>>>>> Key points in the fix for the health check timeout being
>> >> >>> >> >>>>>>> respected are:
>> >> >>> >> >>>>>>> - in pool_connection_pool.c, in the
>> >> >>> >> >>>>>>> connect_inet_domain_socket_by_port function, before trying to
>> >> >>> >> >>>>>>> connect, the file descriptor is set to non-blocking mode, and
>> >> >>> >> >>>>>>> the non-blocking error codes EINPROGRESS and EALREADY are
>> >> >>> >> >>>>>>> handled (please verify the changes here, especially regarding
>> >> >>> >> >>>>>>> closing the fd)
>> >> >>> >> >>>>>>> - in main.c, health_check_timer_handler has been changed to
>> >> >>> >> >>>>>>> signal exit_request to the health-check-initiated
>> >> >>> >> >>>>>>> connect_inet_domain_socket_by_port call (please verify this;
>> >> >>> >> >>>>>>> maybe there is a better way for
>> >> >>> >> >>>>>>> connect_inet_domain_socket_by_port to check whether
>> >> >>> >> >>>>>>> health_check_timer_expired has been set to 1)
>> >> >>> >> >>>>>>>
>> >> >>> >> >>>>>>> These changes effectively make the connect attempt
>> >> >>> >> >>>>>>> non-blocking and repeated (see the sketch after this list)
>> >> >>> >> >>>>>>> until:
>> >> >>> >> >>>>>>> - connection is made, or
>> >> >>> >> >>>>>>> - unhandled connection error condition is reached, or
>> >> >>> >> >>>>>>> - health check timer alarm has been raised, or
>> >> >>> >> >>>>>>> - some other exit request (shutdown) has been issued.
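>> >> >>> >> >>>>>>>
>> >> >>> >> >>>>>>> (A condensed sketch of that loop; fd and addr are assumed to
>> >> >>> >> >>>>>>> be set up already, error reporting is trimmed, and only the
>> >> >>> >> >>>>>>> flag and function names follow the patch - the rest is
>> >> >>> >> >>>>>>> illustrative:)
>> >> >>> >> >>>>>>>
>> >> >>> >> >>>>>>> fcntl(fd, F_SETFL, O_NONBLOCK);   /* non-blocking connect */
>> >> >>> >> >>>>>>>
>> >> >>> >> >>>>>>> for (;;)
>> >> >>> >> >>>>>>> {
>> >> >>> >> >>>>>>>     if (exit_request || health_check_timer_expired)
>> >> >>> >> >>>>>>>     {
>> >> >>> >> >>>>>>>         /* shutdown requested or health check alarm fired */
>> >> >>> >> >>>>>>>         close(fd);
>> >> >>> >> >>>>>>>         return -1;
>> >> >>> >> >>>>>>>     }
>> >> >>> >> >>>>>>>     if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
>> >> >>> >> >>>>>>>         break;                    /* connected */
>> >> >>> >> >>>>>>>     if (errno == EISCONN)
>> >> >>> >> >>>>>>>         break;                    /* already connected */
>> >> >>> >> >>>>>>>     if (errno == EINPROGRESS || errno == EALREADY || errno == EINTR)
>> >> >>> >> >>>>>>>         continue;                 /* still in progress */
>> >> >>> >> >>>>>>>     close(fd);                    /* unhandled error */
>> >> >>> >> >>>>>>>     return -1;
>> >> >>> >> >>>>>>> }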
>> >> >>> >> >>>>>>>
>> >> >>> >> >>>>>>>
>> >> >>> >> >>>>>>> Kind regards,
>> >> >>> >> >>>>>>> Stevo.
>> >> >>> >> >>>>>>>
>> >> >>> >> >>>>>>> 2012/1/15 Tatsuo Ishii <ishii at postgresql.org>
>> >> >>> >> >>>>>>>
>> >> >>> >> >>>>>>>> Ok, let me clarify use cases regarding failover.
>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> Currently there are three parameters:
>> >> >>> >> >>>>>>>> a) health_check
>> >> >>> >> >>>>>>>> b) DISALLOW_TO_FAILOVER
>> >> >>> >> >>>>>>>> c) fail_over_on_backend_error
>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> Sources of errors which can trigger failover are 1) health
>> >> >>> >> >>>>>>>> check, 2) write to backend socket, 3) read from backend
>> >> >>> >> >>>>>>>> socket. I represent 1) as A, 2) as B, 3) as C.
>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> 1) trigger failover if A or B or C is error
>> >> >>> >> >>>>>>>> a = on, b = off, c = on
>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> 2) trigger failover only when B or C is error
>> >> >>> >> >>>>>>>> a = off, b = off, c = on
>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> 3) trigger failover only when B is error
>> >> >>> >> >>>>>>>> Impossible. Because C error always triggers failover.
>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> 4) trigger failover only when C is error
>> >> >>> >> >>>>>>>> a = off, b = off, c = off
>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> 5) trigger failover only when A is error (Stevo wants this)
>> >> >>> >> >>>>>>>> Impossible. Because C error always triggers failover.
>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> 6) never trigger failover
>> >> >>> >> >>>>>>>> Impossible. Because C error always triggers failover.
>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> As you can see, C is the problem here (look at #3, #5 and #6).
>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> If we implemented this:
>> >> >>> >> >>>>>>>> >> However I think we should disable failover if
>> >> >>> >> >>>>>>>> >> DISALLOW_TO_FAILOVER set in case of reading data from
>> >> >>> >> >>>>>>>> >> backend. This should have been done when
>> >> >>> >> >>>>>>>> >> DISALLOW_TO_FAILOVER was introduced because this is
>> >> >>> >> >>>>>>>> >> exactly what DISALLOW_TO_FAILOVER tries to accomplish.
>> >> >>> >> >>>>>>>> >> What do you think?
>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> 1) trigger failover if A or B or C is error
>> >> >>> >> >>>>>>>> a = on, b = off, c = on
>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> 2) trigger failover only when B or C is error
>> >> >>> >> >>>>>>>> a = off, b = off, c = on
>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> 3) trigger failover only when B is error
>> >> >>> >> >>>>>>>> a = off, b = on, c = on
>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> 4) trigger failover only when C is error
>> >> >>> >> >>>>>>>> a = off, b = off, c = off
>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> 5) trigger failover only when A is error (Stevo wants this)
>> >> >>> >> >>>>>>>> a = on, b = on, c = off
>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> 6) never trigger failover
>> >> >>> >> >>>>>>>> a = off, b = on, c = off
>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> So it seems my patch will solve all the problems including
>> >> >>> >> >>>>>>>> yours (the timeout while retrying is another issue, of
>> >> >>> >> >>>>>>>> course).
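>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> (So case 5, failover driven only by the health check, would
>> >> >>> >> >>>>>>>> roughly be configured as follows - a sketch for node 0:)
>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> health_check_period = 30                  # a: on
>> >> >>> >> >>>>>>>> backend_flag0 = 'DISALLOW_TO_FAILOVER'    # b: on
>> >> >>> >> >>>>>>>> fail_over_on_backend_error = off          # c: off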
>> >> >>> >> >>>>>>>> --
>> >> >>> >> >>>>>>>> Tatsuo Ishii
>> >> >>> >> >>>>>>>> SRA OSS, Inc. Japan
>> >> >>> >> >>>>>>>> English: http://www.sraoss.co.jp/index_en.php
>> >> >>> >> >>>>>>>> Japanese: http://www.sraoss.co.jp
>> >> >>> >> >>>>>>>>
>> >> >>> >> >>>>>>>> > I agree, fail_over_on_backend_error isn't useful; it just
>> >> >>> >> >>>>>>>> > adds confusion by overlapping with DISALLOW_TO_FAILOVER.
>> >> >>> >> >>>>>>>> >
>> >> >>> >> >>>>>>>> > With your patch or without it, it is not possible to fail
>> >> >>> >> >>>>>>>> > over only on health check (max retries) failure. With
>> >> >>> >> >>>>>>>> > Nenad's patch, that part works ok and I think that patch
>> >> >>> >> >>>>>>>> > is semantically ok - failover occurs even though
>> >> >>> >> >>>>>>>> > DISALLOW_TO_FAILOVER is set for the backend, but only when
>> >> >>> >> >>>>>>>> > a health check is configured too. Configuring a health
>> >> >>> >> >>>>>>>> > check without failover on a failed health check has no
>> >> >>> >> >>>>>>>> > purpose. Likewise, a health check configured with failover
>> >> >>> >> >>>>>>>> > allowed on any condition other than health check (max
>> >> >>> >> >>>>>>>> > retries) failure has no purpose.
>> >> >>> >> >>>>>>>> >
>> >> >>> >> >>>>>>>> > Kind regards,
>> >> >>> >> >>>>>>>> > Stevo.
>> >> >>> >> >>>>>>>> >
>> >> >>> >> >>>>>>>> > 2012/1/15 Tatsuo Ishii <ishii at postgresql.org>
>> >> >>> >> >>>>>>>> >
>> >> >>> >> >>>>>>>> >> fail_over_on_backend_error has different meaning from
>> >> >>> >> >>>>>>>> >> DISALLOW_TO_FAILOVER. From the doc:
>> >> >>> >> >>>>>>>> >>
>> >> >>> >> >>>>>>>> >>  If true, and an error occurs when writing to the backend
>> >> >>> >> >>>>>>>> >>  communication, pgpool-II will trigger the fail over
>> >> >>> >> >>>>>>>> >>  procedure. This is the same behavior as of pgpool-II
>> >> >>> >> >>>>>>>> >>  2.2.x or earlier. If set to false, pgpool will report an
>> >> >>> >> >>>>>>>> >>  error and disconnect the session.
>> >> >>> >> >>>>>>>> >>
>> >> >>> >> >>>>>>>> >> This means that if pgpool fails to read from the backend,
>> >> >>> >> >>>>>>>> >> it will trigger failover even if
>> >> >>> >> >>>>>>>> >> fail_over_on_backend_error is off. So unconditionally
>> >> >>> >> >>>>>>>> >> disabling failover would lead to backward incompatibility.
>> >> >>> >> >>>>>>>> >>
>> >> >>> >> >>>>>>>> >> However I think we should disable failover if
>> >> >>> >> >>>>>>>> >> DISALLOW_TO_FAILOVER is set in the case of reading data
>> >> >>> >> >>>>>>>> >> from the backend. This should have been done when
>> >> >>> >> >>>>>>>> >> DISALLOW_TO_FAILOVER was introduced, because this is
>> >> >>> >> >>>>>>>> >> exactly what DISALLOW_TO_FAILOVER tries to accomplish.
>> >> >>> >> >>>>>>>> >> What do you think?
>> >> >>> >> >>>>>>>> >> --
>> >> >>> >> >>>>>>>> >> Tatsuo Ishii
>> >> >>> >> >>>>>>>> >> SRA OSS, Inc. Japan
>> >> >>> >> >>>>>>>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> >>> >> >>>>>>>> >> Japanese: http://www.sraoss.co.jp
>> >> >>> >> >>>>>>>> >>
>> >> >>> >> >>>>>>>> >> > For a moment I thought we could have set
>> >> >>> >> >>>>>>>> >> > fail_over_on_backend_error to off, and have the backends
>> >> >>> >> >>>>>>>> >> > set with the ALLOW_TO_FAILOVER flag. But then I looked
>> >> >>> >> >>>>>>>> >> > in the code.
>> >> >>> >> >>>>>>>> >> >
>> >> >>> >> >>>>>>>> >> > In child.c there is a loop the child process goes
>> >> >>> >> >>>>>>>> >> > through in its lifetime. When a fatal error condition
>> >> >>> >> >>>>>>>> >> > occurs, before the child process exits it will call
>> >> >>> >> >>>>>>>> >> > notice_backend_error, which calls
>> >> >>> >> >>>>>>>> >> > degenerate_backend_set, which does not take into account
>> >> >>> >> >>>>>>>> >> > that fail_over_on_backend_error is set to off, causing
>> >> >>> >> >>>>>>>> >> > the backend to be degenerated and failover to occur.
>> >> >>> >> >>>>>>>> >> > That's why we have the backends set with
>> >> >>> >> >>>>>>>> >> > DISALLOW_TO_FAILOVER; with our patch applied, the health
>> >> >>> >> >>>>>>>> >> > check can still cause failover to occur as expected.
>> >> >>> >> >>>>>>>> >> >
>> >> >>> >> >>>>>>>> >> > Maybe it would be enough just to modify
>> >> >>> >> >>>>>>>> >> > degenerate_backend_set to take
>> >> >>> >> >>>>>>>> >> > fail_over_on_backend_error into account, just like it
>> >> >>> >> >>>>>>>> >> > already takes DISALLOW_TO_FAILOVER into account.
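>> >> >>> >> >>>>>>>> >> >
>> >> >>> >> >>>>>>>> >> > (Roughly along these lines - purely a sketch, the exact
>> >> >>> >> >>>>>>>> >> > function signature and config field names are assumed
>> >> >>> >> >>>>>>>> >> > from the 3.1 sources:)
>> >> >>> >> >>>>>>>> >> >
>> >> >>> >> >>>>>>>> >> > /* in degenerate_backend_set(), before marking any node down */
>> >> >>> >> >>>>>>>> >> > if (!pool_config->fail_over_on_backend_error)
>> >> >>> >> >>>>>>>> >> > {
>> >> >>> >> >>>>>>>> >> >     pool_log("degenerate_backend_set: request ignored because "
>> >> >>> >> >>>>>>>> >> >              "fail_over_on_backend_error is off");
>> >> >>> >> >>>>>>>> >> >     return;
>> >> >>> >> >>>>>>>> >> > }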
>> >> >>> >> >>>>>>>> >> >
>> >> >>> >> >>>>>>>> >> > Kind regards,
>> >> >>> >> >>>>>>>> >> > Stevo.
>> >> >>> >> >>>>>>>> >> >
>> >> >>> >> >>>>>>>> >> > 2012/1/15 Stevo Slavić <sslavic at gmail.com>
>> >> >>> >> >>>>>>>> >> >
>> >> >>> >> >>>>>>>> >> >> Yes and that behaviour which you describe as
>> >> expected,
>> >> >>> is
>> >> >>> >> not
>> >> >>> >> >>>>>>>> what we
>> >> >>> >> >>>>>>>> >> >> want. We want pgpool to degrade backend0 and
>> failover
>> >> >>> when
>> >> >>> >> >>>>>>>> configured
>> >> >>> >> >>>>>>>> >> max
>> >> >>> >> >>>>>>>> >> >> health check retries have failed, and to failover
>> >> only
>> >> >>> in
>> >> >>> >> that
>> >> >>> >> >>>>>>>> case, so
>> >> >>> >> >>>>>>>> >> not
>> >> >>> >> >>>>>>>> >> >> sooner e.g. connection/child error condition, but
>> as
>> >> >>> soon as
>> >> >>> >> >>>>>>>> max health
>> >> >>> >> >>>>>>>> >> >> check retries have been attempted.
>> >> >>> >> >>>>>>>> >> >>
>> >> >>> >> >>>>>>>> >> >> Maybe examples will be more clear.
>> >> >>> >> >>>>>>>> >> >>
>> >> >>> >> >>>>>>>> >> >> Imagine two nodes (node 1 and node 2). On each
>> node a
>> >> >>> single
>> >> >>> >> >>>>>>>> pgpool and
>> >> >>> >> >>>>>>>> >> a
>> >> >>> >> >>>>>>>> >> >> single backend. Apps/clients access db through
>> >> pgpool on
>> >> >>> >> their
>> >> >>> >> >>>>>>>> own node.
>> >> >>> >> >>>>>>>> >> >> Two backends are configured in postgres native
>> >> streaming
>> >> >>> >> >>>>>>>> replication.
>> >> >>> >> >>>>>>>> >> >> pgpools are used in raw mode. Both pgpools have
>> same
>> >> >>> >> backend as
>> >> >>> >> >>>>>>>> >> backend0,
>> >> >>> >> >>>>>>>> >> >> and same backend as backend1.
>> >> >>> >> >>>>>>>> >> >> initial state: both backends are up and pgpool can
>> >> >>> access
>> >> >>> >> >>>>>>>> them, clients
>> >> >>> >> >>>>>>>> >> >> connect to their pgpool and do their work on
>> master
>> >> >>> backend,
>> >> >>> >> >>>>>>>> backend0.
>> >> >>> >> >>>>>>>> >> >>
>> >> >>> >> >>>>>>>> >> >> 1st case: unmodified/non-patched pgpool 3.1.1 is
>> >> used,
>> >> >>> >> >>>>>>>> backends are
>> >> >>> >> >>>>>>>> >> >> configured with ALLOW_TO_FAILOVER flag
>> >> >>> >> >>>>>>>> >> >> - temporary network outage happens between pgpool
>> on
>> >> >>> node 2
>> >> >>> >> >>>>>>>> and backend0
>> >> >>> >> >>>>>>>> >> >> - error condition is reported by child process,
>> and
>> >> >>> since
>> >> >>> >> >>>>>>>> >> >> ALLOW_TO_FAILOVER is set, pgpool performs failover
>> >> >>> without
>> >> >>> >> >>>>>>>> giving
>> >> >>> >> >>>>>>>> >> chance to
>> >> >>> >> >>>>>>>> >> >> pgpool health check retries to control whether
>> >> backend
>> >> >>> is
>> >> >>> >> just
>> >> >>> >> >>>>>>>> >> temporarily
>> >> >>> >> >>>>>>>> >> >> inaccessible
>> >> >>> >> >>>>>>>> >> >> - failover command on node 2 promotes standby
>> backend
>> >> >>> to a
>> >> >>> >> new
>> >> >>> >> >>>>>>>> master -
>> >> >>> >> >>>>>>>> >> >> split brain occurs, with two masters
>> >> >>> >> >>>>>>>> >> >>
>> >> >>> >> >>>>>>>> >> >>
>> >> >>> >> >>>>>>>> >> >> 2nd case: unmodified/non-patched pgpool 3.1.1 is
>> >> used,
>> >> >>> >> >>>>>>>> backends are
>> >> >>> >> >>>>>>>> >> >> configured with DISALLOW_TO_FAILOVER
>> >> >>> >> >>>>>>>> >> >> - temporary network outage happens between pgpool
>> on
>> >> >>> node 2
>> >> >>> >> >>>>>>>> and backend0
>> >> >>> >> >>>>>>>> >> >> - error condition is reported by child process,
>> and
>> >> >>> since
>> >> >>> >> >>>>>>>> >> >> DISALLOW_TO_FAILOVER is set, pgpool does not
>> perform
>> >> >>> >> failover
>> >> >>> >> >>>>>>>> >> >> - health check gets a chance to check backend0
>> >> >>> condition,
>> >> >>> >> >>>>>>>> determines
>> >> >>> >> >>>>>>>> >> that
>> >> >>> >> >>>>>>>> >> >> it's not accessible, there will be no health check
>> >> >>> retries
>> >> >>> >> >>>>>>>> because
>> >> >>> >> >>>>>>>> >> >> DISALLOW_TO_FAILOVER is set, no failover occurs
>> ever
>> >> >>> >> >>>>>>>> >> >>
>> >> >>> >> >>>>>>>> >> >>
>> >> >>> >> >>>>>>>> >> >> 3rd case, pgpool 3.1.1 + patch you've sent
>> applied,
>> >> and
>> >> >>> >> >>>>>>>> backends
>> >> >>> >> >>>>>>>> >> >> configured with DISALLOW_TO_FAILOVER
>> >> >>> >> >>>>>>>> >> >> - temporary network outage happens between pgpool
>> on
>> >> >>> node 2
>> >> >>> >> >>>>>>>> and backend0
>> >> >>> >> >>>>>>>> >> >> - error condition is reported by child process,
>> and
>> >> >>> since
>> >> >>> >> >>>>>>>> >> >> DISALLOW_TO_FAILOVER is set, pgpool does not
>> perform
>> >> >>> >> failover
>> >> >>> >> >>>>>>>> >> >> - health check gets a chance to check backend0
>> >> >>> condition,
>> >> >>> >> >>>>>>>> determines
>> >> >>> >> >>>>>>>> >> that
>> >> >>> >> >>>>>>>> >> >> it's not accessible, health check retries happen,
>> and
>> >> >>> even
>> >> >>> >> >>>>>>>> after max
>> >> >>> >> >>>>>>>> >> >> retries, no failover happens since failover is
>> >> >>> disallowed
>> >> >>> >> >>>>>>>> >> >>
>> >> >>> >> >>>>>>>> >> >>
>> >> >>> >> >>>>>>>> >> >> 4th expected behaviour, pgpool 3.1.1 + patch we
>> sent,
>> >> >>> and
>> >> >>> >> >>>>>>>> backends
>> >> >>> >> >>>>>>>> >> >> configured with DISALLOW_TO_FAILOVER
>> >> >>> >> >>>>>>>> >> >> - temporary network outage happens between pgpool
>> on
>> >> >>> node 2
>> >> >>> >> >>>>>>>> and backend0
>> >> >>> >> >>>>>>>> >> >> - error condition is reported by child process,
>> and
>> >> >>> since
>> >> >>> >> >>>>>>>> >> >> DISALLOW_TO_FAILOVER is set, pgpool does not
>> perform
>> >> >>> >> failover
>> >> >>> >> >>>>>>>> >> >> - health check gets a chance to check backend0
>> >> >>> condition,
>> >> >>> >> >>>>>>>> determines
>> >> >>> >> >>>>>>>> >> that
>> >> >>> >> >>>>>>>> >> >> it's not accessible, health check retries happen,
>> >> >>> before a
>> >> >>> >> max
>> >> >>> >> >>>>>>>> retry
>> >> >>> >> >>>>>>>> >> >> network condition is cleared, retry happens, and
>> >> >>> backend0
>> >> >>> >> >>>>>>>> remains to be
>> >> >>> >> >>>>>>>> >> >> master, no failover occurs, temporary network
>> issue
>> >> did
>> >> >>> not
>> >> >>> >> >>>>>>>> cause split
>> >> >>> >> >>>>>>>> >> >> brain
>> >> >>> >> >>>>>>>> >> >> - after some time, temporary network outage
>> happens
>> >> >>> again
>> >> >>> >> >>>>>>>> between pgpool
>> >> >>> >> >>>>>>>> >> >> on node 2 and backend0
>> >> >>> >> >>>>>>>> >> >> - error condition is reported by child process,
>> and
>> >> >>> since
>> >> >>> >> >>>>>>>> >> >> DISALLOW_TO_FAILOVER is set, pgpool does not
>> perform
>> >> >>> >> failover
>> >> >>> >> >>>>>>>> >> >> - health check gets a chance to check backend0
>> >> >>> condition,
>> >> >>> >> >>>>>>>> determines
>> >> >>> >> >>>>>>>> >> that
>> >> >>> >> >>>>>>>> >> >> it's not accessible, health check retries happen,
>> >> after
>> >> >>> max
>> >> >>> >> >>>>>>>> retries
>> >> >>> >> >>>>>>>> >> >> backend0 is still not accessible, failover
>> happens,
>> >> >>> standby
>> >> >>> >> is
>> >> >>> >> >>>>>>>> new
>> >> >>> >> >>>>>>>> >> master
>> >> >>> >> >>>>>>>> >> >> and backend0 is degraded
>> >> >>> >> >>>>>>>> >> >>
>> >> >>> >> >>>>>>>> >> >> Kind regards,
>> >> >>> >> >>>>>>>> >> >> Stevo.
>> >> >>> >> >>>>>>>> >> >>
>> >> >>> >> >>>>>>>> >> >>
>> >> >>> >> >>>>>>>> >> >> 2012/1/15 Tatsuo Ishii <ishii at postgresql.org>
>> >> >>> >> >>>>>>>> >> >>
>> >> >>> >> >>>>>>>> >> >>> In my test evironment, the patch works as
>> expected.
>> >> I
>> >> >>> have
>> >> >>> >> two
>> >> >>> >> >>>>>>>> >> >>> backends. Health check retry conf is as follows:
>> >> >>> >> >>>>>>>> >> >>>
>> >> >>> >> >>>>>>>> >> >>> health_check_max_retries = 3
>> >> >>> >> >>>>>>>> >> >>> health_check_retry_delay = 1
>> >> >>> >> >>>>>>>> >> >>>
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:20 LOG:   pid 21411: Backend status file /home/t-ishii/work/git.postgresql.org/test/log/pgpool_status discarded
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:20 LOG:   pid 21411: pgpool-II
>> >> >>> >> successfully
>> >> >>> >> >>>>>>>> started.
>> >> >>> >> >>>>>>>> >> >>> version 3.2alpha1 (hatsuiboshi)
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:20 LOG:   pid 21411:
>> >> >>> find_primary_node:
>> >> >>> >> >>>>>>>> primary node
>> >> >>> >> >>>>>>>> >> id
>> >> >>> >> >>>>>>>> >> >>> is 0
>> >> >>> >> >>>>>>>> >> >>> -- backend1 was shutdown
>> >> >>> >> >>>>>>>> >> >>>
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21445:
>> >> >>> >> >>>>>>>> >> connect_unix_domain_socket_by_port:
>> >> >>> >> >>>>>>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such
>> >> file
>> >> >>> or
>> >> >>> >> >>>>>>>> directory
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21445:
>> >> >>> >> >>>>>>>> make_persistent_db_connection:
>> >> >>> >> >>>>>>>> >> >>> connection to /tmp(11001) failed
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21445:
>> >> >>> >> >>>>>>>> check_replication_time_lag: could
>> >> >>> >> >>>>>>>> >> >>> not connect to DB node 1, check sr_check_user and
>> >> >>> >> >>>>>>>> sr_check_password
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21411:
>> >> >>> >> >>>>>>>> >> connect_unix_domain_socket_by_port:
>> >> >>> >> >>>>>>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such
>> >> file
>> >> >>> or
>> >> >>> >> >>>>>>>> directory
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21411:
>> >> >>> >> >>>>>>>> make_persistent_db_connection:
>> >> >>> >> >>>>>>>> >> >>> connection to /tmp(11001) failed
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21411:
>> >> >>> >> >>>>>>>> >> connect_unix_domain_socket_by_port:
>> >> >>> >> >>>>>>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such
>> >> file
>> >> >>> or
>> >> >>> >> >>>>>>>> directory
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21411:
>> >> >>> >> >>>>>>>> make_persistent_db_connection:
>> >> >>> >> >>>>>>>> >> >>> connection to /tmp(11001) failed
>> >> >>> >> >>>>>>>> >> >>> -- health check failed
>> >> >>> >> >>>>>>>> >> >>>
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21411: health
>> check
>> >> >>> failed.
>> >> >>> >> 1
>> >> >>> >> >>>>>>>> th host
>> >> >>> >> >>>>>>>> >> /tmp
>> >> >>> >> >>>>>>>> >> >>> at port 11001 is down
>> >> >>> >> >>>>>>>> >> >>> -- start retrying
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:50 LOG:   pid 21411: health
>> check
>> >> >>> retry
>> >> >>> >> >>>>>>>> sleep time: 1
>> >> >>> >> >>>>>>>> >> >>> second(s)
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:51 ERROR: pid 21411:
>> >> >>> >> >>>>>>>> >> connect_unix_domain_socket_by_port:
>> >> >>> >> >>>>>>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such
>> >> file
>> >> >>> or
>> >> >>> >> >>>>>>>> directory
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:51 ERROR: pid 21411:
>> >> >>> >> >>>>>>>> make_persistent_db_connection:
>> >> >>> >> >>>>>>>> >> >>> connection to /tmp(11001) failed
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:51 ERROR: pid 21411: health
>> check
>> >> >>> failed.
>> >> >>> >> 1
>> >> >>> >> >>>>>>>> th host
>> >> >>> >> >>>>>>>> >> /tmp
>> >> >>> >> >>>>>>>> >> >>> at port 11001 is down
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:51 LOG:   pid 21411: health
>> check
>> >> >>> retry
>> >> >>> >> >>>>>>>> sleep time: 1
>> >> >>> >> >>>>>>>> >> >>> second(s)
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:52 ERROR: pid 21411:
>> >> >>> >> >>>>>>>> >> connect_unix_domain_socket_by_port:
>> >> >>> >> >>>>>>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such
>> >> file
>> >> >>> or
>> >> >>> >> >>>>>>>> directory
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:52 ERROR: pid 21411:
>> >> >>> >> >>>>>>>> make_persistent_db_connection:
>> >> >>> >> >>>>>>>> >> >>> connection to /tmp(11001) failed
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:52 ERROR: pid 21411: health
>> check
>> >> >>> failed.
>> >> >>> >> 1
>> >> >>> >> >>>>>>>> th host
>> >> >>> >> >>>>>>>> >> /tmp
>> >> >>> >> >>>>>>>> >> >>> at port 11001 is down
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:52 LOG:   pid 21411: health
>> check
>> >> >>> retry
>> >> >>> >> >>>>>>>> sleep time: 1
>> >> >>> >> >>>>>>>> >> >>> second(s)
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:53 ERROR: pid 21411:
>> >> >>> >> >>>>>>>> >> connect_unix_domain_socket_by_port:
>> >> >>> >> >>>>>>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such
>> >> file
>> >> >>> or
>> >> >>> >> >>>>>>>> directory
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:53 ERROR: pid 21411:
>> >> >>> >> >>>>>>>> make_persistent_db_connection:
>> >> >>> >> >>>>>>>> >> >>> connection to /tmp(11001) failed
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:53 ERROR: pid 21411: health
>> check
>> >> >>> failed.
>> >> >>> >> 1
>> >> >>> >> >>>>>>>> th host
>> >> >>> >> >>>>>>>> >> /tmp
>> >> >>> >> >>>>>>>> >> >>> at port 11001 is down
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:17:53 LOG:   pid 21411:
>> health_check:
>> >> 1
>> >> >>> >> >>>>>>>> failover is
>> >> >>> >> >>>>>>>> >> canceld
>> >> >>> >> >>>>>>>> >> >>> because failover is disallowed
>> >> >>> >> >>>>>>>> >> >>> -- after 3 retries, pgpool wanted to failover,
>> but
>> >> >>> gave up
>> >> >>> >> >>>>>>>> because
>> >> >>> >> >>>>>>>> >> >>> DISALLOW_TO_FAILOVER is set for backend1
>> >> >>> >> >>>>>>>> >> >>>
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:18:00 ERROR: pid 21445:
>> >> >>> >> >>>>>>>> >> connect_unix_domain_socket_by_port:
>> >> >>> >> >>>>>>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such
>> >> file
>> >> >>> or
>> >> >>> >> >>>>>>>> directory
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:18:00 ERROR: pid 21445:
>> >> >>> >> >>>>>>>> make_persistent_db_connection:
>> >> >>> >> >>>>>>>> >> >>> connection to /tmp(11001) failed
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:18:00 ERROR: pid 21445:
>> >> >>> >> >>>>>>>> check_replication_time_lag: could
>> >> >>> >> >>>>>>>> >> >>> not connect to DB node 1, check sr_check_user and
>> >> >>> >> >>>>>>>> sr_check_password
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:18:03 ERROR: pid 21411:
>> >> >>> >> >>>>>>>> >> connect_unix_domain_socket_by_port:
>> >> >>> >> >>>>>>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such
>> >> file
>> >> >>> or
>> >> >>> >> >>>>>>>> directory
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:18:03 ERROR: pid 21411:
>> >> >>> >> >>>>>>>> make_persistent_db_connection:
>> >> >>> >> >>>>>>>> >> >>> connection to /tmp(11001) failed
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:18:03 ERROR: pid 21411: health
>> check
>> >> >>> failed.
>> >> >>> >> 1
>> >> >>> >> >>>>>>>> th host
>> >> >>> >> >>>>>>>> >> /tmp
>> >> >>> >> >>>>>>>> >> >>> at port 11001 is down
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:18:03 LOG:   pid 21411: health
>> check
>> >> >>> retry
>> >> >>> >> >>>>>>>> sleep time: 1
>> >> >>> >> >>>>>>>> >> >>> second(s)
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:18:04 ERROR: pid 21411:
>> >> >>> >> >>>>>>>> >> connect_unix_domain_socket_by_port:
>> >> >>> >> >>>>>>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such
>> >> file
>> >> >>> or
>> >> >>> >> >>>>>>>> directory
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:18:04 ERROR: pid 21411:
>> >> >>> >> >>>>>>>> make_persistent_db_connection:
>> >> >>> >> >>>>>>>> >> >>> connection to /tmp(11001) failed
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:18:04 ERROR: pid 21411: health
>> check
>> >> >>> failed.
>> >> >>> >> 1
>> >> >>> >> >>>>>>>> th host
>> >> >>> >> >>>>>>>> >> /tmp
>> >> >>> >> >>>>>>>> >> >>> at port 11001 is down
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:18:04 LOG:   pid 21411: health
>> check
>> >> >>> retry
>> >> >>> >> >>>>>>>> sleep time: 1
>> >> >>> >> >>>>>>>> >> >>> second(s)
>> >> >>> >> >>>>>>>> >> >>> 2012-01-15 09:18:05 LOG:   pid 21411: after some
>> >> >>> retrying
>> >> >>> >> >>>>>>>> backend
>> >> >>> >> >>>>>>>> >> >>> returned to healthy state
>> >> >>> >> >>>>>>>> >> >>> -- started backend1 and pgpool succeeded in
>> health
>> >> >>> >> checking.
>> >> >>> >> >>>>>>>> Resumed
>> >> >>> >> >>>>>>>> >> >>> using backend1
>> >> >>> >> >>>>>>>> >> >>> --
>> >> >>> >> >>>>>>>> >> >>> Tatsuo Ishii
>> >> >>> >> >>>>>>>> >> >>> SRA OSS, Inc. Japan
>> >> >>> >> >>>>>>>> >> >>> English: http://www.sraoss.co.jp/index_en.php
>> >> >>> >> >>>>>>>> >> >>> Japanese: http://www.sraoss.co.jp
>> >> >>> >> >>>>>>>> >> >>>
>> >> >>> >> >>>>>>>> >> >>> > Hello Tatsuo,
>> >> >>> >> >>>>>>>> >> >>> >
>> >> >>> >> >>>>>>>> >> >>> > Thank you for the patch and effort, but
>> >> unfortunately
>> >> >>> >> this
>> >> >>> >> >>>>>>>> change
>> >> >>> >> >>>>>>>> >> won't
>> >> >>> >> >>>>>>>> >> >>> > work for us. We need to set disallow failover
>> to
>> >> >>> prevent
>> >> >>> >> >>>>>>>> failover on
>> >> >>> >> >>>>>>>> >> >>> child
>> >> >>> >> >>>>>>>> >> >>> > reported connection errors (it's ok if few
>> clients
>> >> >>> lose
>> >> >>> >> >>>>>>>> their
>> >> >>> >> >>>>>>>> >> >>> connection or
>> >> >>> >> >>>>>>>> >> >>> > can not connect), and still have pgpool perform
>> >> >>> failover
>> >> >>> >> >>>>>>>> but only on
>> >> >>> >> >>>>>>>> >> >>> failed
>> >> >>> >> >>>>>>>> >> >>> > health check (if configured, after max retries
>> >> >>> threshold
>> >> >>> >> >>>>>>>> has been
>> >> >>> >> >>>>>>>> >> >>> reached).
>> >> >>> >> >>>>>>>> >> >>> >
>> >> >>> >> >>>>>>>> >> >>> > Maybe it would be best to add an extra value
>> for
>> >> >>> >> >>>>>>>> backend_flag -
>> >> >>> >> >>>>>>>> >> >>> > ALLOW_TO_FAILOVER_ON_HEALTH_CHECK or
>> >> >>> >> >>>>>>>> >> >>> DISALLOW_TO_FAILOVER_ON_CHILD_ERROR.
>> >> >>> >> >>>>>>>> >> >>> > It should behave same as DISALLOW_TO_FAILOVER
>> is
>> >> set,
>> >> >>> >> with
>> >> >>> >> >>>>>>>> only
>> >> >>> >> >>>>>>>> >> >>> difference
>> >> >>> >> >>>>>>>> >> >>> > in behaviour when health check (if set, max
>> >> retries)
>> >> >>> has
>> >> >>> >> >>>>>>>> failed -
>> >> >>> >> >>>>>>>> >> unlike
>> >> >>> >> >>>>>>>> >> >>> > DISALLOW_TO_FAILOVER, this new flag should
>> allow
>> >> >>> failover
>> >> >>> >> >>>>>>>> in this
>> >> >>> >> >>>>>>>> >> case
>> >> >>> >> >>>>>>>> >> >>> only.
>> >> >>> >> >>>>>>>> >> >>> >
>> >> >>> >> >>>>>>>> >> >>> > Without this change health check (especially
>> >> health
>> >> >>> check
>> >> >>> >> >>>>>>>> retries)
>> >> >>> >> >>>>>>>> >> >>> doesn't
>> >> >>> >> >>>>>>>> >> >>> > make much sense - child error is more likely to
>> >> >>> occur on
>> >> >>> >> >>>>>>>> (temporary)
>> >> >>> >> >>>>>>>> >> >>> > backend failure then health check and will or
>> will
>> >> >>> not
>> >> >>> >> cause
>> >> >>> >> >>>>>>>> >> failover to
>> >> >>> >> >>>>>>>> >> >>> > occur depending on backend flag, without giving
>> >> >>> health
>> >> >>> >> >>>>>>>> check retries
>> >> >>> >> >>>>>>>> >> a
>> >> >>> >> >>>>>>>> >> >>> > chance to determine if failure was temporary or
>> >> not,
>> >> >>> >> >>>>>>>> risking split
>> >> >>> >> >>>>>>>> >> brain
>> >> >>> >> >>>>>>>> >> >>> > situation with two masters just because of
>> >> temporary
>> >> >>> >> >>>>>>>> network link
>> >> >>> >> >>>>>>>> >> >>> hiccup.
>> >> >>> >> >>>>>>>> >> >>> >
>> >> >>> >> >>>>>>>> >> >>> > Our main problem remains though with the health
>> >> check
>> >> >>> >> >>>>>>>> timeout not
>> >> >>> >> >>>>>>>> >> being
>> >> >>> >> >>>>>>>> >> >>> > respected in these special conditions we have.
>> >> Maybe
>> >> >>> >> Nenad
>> >> >>> >> >>>>>>>> can help
>> >> >>> >> >>>>>>>> >> you
>> >> >>> >> >>>>>>>> >> >>> > more to reproduce the issue on your
>> environment.
>> >> >>> >> >>>>>>>> >> >>> >
>> >> >>> >> >>>>>>>> >> >>> > Kind regards,
>> >> >>> >> >>>>>>>> >> >>> > Stevo.
>> >> >>> >> >>>>>>>> >> >>> >
>> >> >>> >> >>>>>>>> >> >>> > 2012/1/13 Tatsuo Ishii <ishii at postgresql.org>
>> >> >>> >> >>>>>>>> >> >>> >
>> >> >>> >> >>>>>>>> >> >>> >> Thanks for pointing it out.
>> >> >>> >> >>>>>>>> >> >>> >> Yes, checking DISALLOW_TO_FAILOVER before
>> >> retrying
>> >> >>> is
>> >> >>> >> >>>>>>>> wrong.
>> >> >>> >> >>>>>>>> >> >>> >> However, after retry count over, we should
>> check
>> >> >>> >> >>>>>>>> >> DISALLOW_TO_FAILOVER I
>> >> >>> >> >>>>>>>> >> >>> >> think.
>> >> >>> >> >>>>>>>> >> >>> >> Attached is the patch attempt to fix it.
>> Please
>> >> try.
>> >> >>> >> >>>>>>>> >> >>> >> --
>> >> >>> >> >>>>>>>> >> >>> >> Tatsuo Ishii
>> >> >>> >> >>>>>>>> >> >>> >> SRA OSS, Inc. Japan
>> >> >>> >> >>>>>>>> >> >>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> >>> >> >>>>>>>> >> >>> >> Japanese: http://www.sraoss.co.jp
>> >> >>> >> >>>>>>>> >> >>> >>
>> pgpool is being used in raw mode - just for the (health check based)
>> failover part, so applications are not required to restart when the
>> standby gets promoted to new master. Here is the pgpool.conf file and a
>> very small patch we're using, applied to the pgpool 3.1.1 release.
>>
>> We have to have DISALLOW_TO_FAILOVER set for the backend, since any child
>> process that detects that master/backend0 is not available would, if
>> DISALLOW_TO_FAILOVER were not set, degenerate the backend without giving
>> the health check a chance to retry. We need health check with retries
>> because the condition that backend0 is not available could be temporary
>> (network glitches to the remote site where the master is, or a deliberate
>> failover of the master postgres service from one node to the other on the
>> remote site - in both cases remote means remote to the pgpool that is
>> going to perform the health checks and ultimately the failover), and we
>> don't want the standby to be promoted that easily to a new master, to
>> prevent temporary network conditions, which could occur frequently, from
>> causing split brain with two masters.
>>
>> But then, with DISALLOW_TO_FAILOVER set and without the patch, health
>> check will not retry and will thus give only one chance to the backend
>> (if a health check ever occurs before a child process fails to connect to
>> the backend), rendering the retry settings effectively ignored. That's
>> where this patch comes into action - it enables health check retries
>> while child processes are prevented from degenerating the backend.
>>
>> I don't think, but I could be wrong, that this patch influences the
>> behavior we're seeing with unwanted health check attempt delays. Also,
>> knowing this, maybe pgpool could be patched or some other support built
>> into it to cover this use case.
>>
>> Regards,
>> Stevo.
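>>
>> P.S. For illustration, the relevant settings would look roughly like the
>> sketch below (a sketch only - parameter names as in pgpool-II 3.1, and the
>> values are simply the ones discussed in this thread):
>>
>>     health_check_period      = 30   # seconds between health checks
>>     health_check_timeout     = 5    # give up on a single check attempt after this
>>     health_check_max_retries = 2    # retries before the node is degenerated
>>     health_check_retry_delay = 10   # seconds to sleep between retries
>>
>>     backend_hostname0 = '192.168.2.27'
>>     backend_port0     = 5432
>>     backend_flag0     = 'DISALLOW_TO_FAILOVER'  # children must not degenerate the node
>>     fail_over_on_backend_error = off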
>>
>> 2012/1/12 Tatsuo Ishii <ishii at postgresql.org>
>>
>>> I have accepted the moderation request. Your post should be sent shortly.
>>> Also I have raised the post size limit to 1MB.
>>> I will look into this...
>>> --
>>> Tatsuo Ishii
>>> SRA OSS, Inc. Japan
>>> English: http://www.sraoss.co.jp/index_en.php
>>> Japanese: http://www.sraoss.co.jp
>>>
>>>> Here is the log file and strace output file (this time in an archive,
>>>> didn't know about the 200KB constraint on post size which requires
>>>> moderator approval). Timings configured are 30sec health check interval,
>>>> 5sec timeout, and 2 retries with 10sec retry delay.
>>>>
>>>> It takes a lot more than 5sec from starting a health check to sleeping
>>>> 10sec for the first retry.
>>>>
>>>> As seen in the code (main.c, health_check() function), within a (retry)
>>>> attempt there is an inner retry (first with the postgres database, then
>>>> with template1), and that part doesn't seem to be interrupted by the
>>>> alarm.
>>>>
>>>> Regards,
>>>> Stevo.
>>>>
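>>>> P.S. A minimal sketch of the pattern I would expect the timeout to
>>>> follow (just an assumption about how an alarm-driven timeout around
>>>> connect() behaves, not pgpool's actual code; host/port taken from my
>>>> test setup): if the SIGALRM handler is installed without SA_RESTART,
>>>> connect() returns -1 with errno == EINTR when the timer fires; otherwise
>>>> the call can keep blocking until the TCP connection timeout.
>>>>
>>>>     #include <stdio.h>
>>>>     #include <string.h>
>>>>     #include <errno.h>
>>>>     #include <signal.h>
>>>>     #include <unistd.h>
>>>>     #include <netinet/in.h>
>>>>     #include <arpa/inet.h>
>>>>     #include <sys/socket.h>
>>>>
>>>>     static void on_alarm(int sig)
>>>>     {
>>>>         (void) sig;    /* only purpose: interrupt the blocking call */
>>>>     }
>>>>
>>>>     int main(void)
>>>>     {
>>>>         struct sigaction sa;
>>>>         struct sockaddr_in addr;
>>>>         int fd;
>>>>
>>>>         memset(&sa, 0, sizeof(sa));
>>>>         sa.sa_handler = on_alarm;   /* sa_flags == 0, i.e. no SA_RESTART */
>>>>         sigaction(SIGALRM, &sa, NULL);
>>>>
>>>>         fd = socket(AF_INET, SOCK_STREAM, 0);
>>>>         memset(&addr, 0, sizeof(addr));
>>>>         addr.sin_family = AF_INET;
>>>>         addr.sin_port = htons(5432);
>>>>         inet_pton(AF_INET, "192.168.2.27", &addr.sin_addr);
>>>>
>>>>         alarm(5);    /* the health_check_timeout equivalent */
>>>>         if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0)
>>>>         {
>>>>             if (errno == EINTR)
>>>>                 fprintf(stderr, "connect interrupted by alarm (timeout)\n");
>>>>             else
>>>>                 fprintf(stderr, "connect failed: %s\n", strerror(errno));
>>>>         }
>>>>         alarm(0);
>>>>         close(fd);
>>>>         return 0;
>>>>     }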
>>>>
>>>> 2012/1/12 Stevo Slavić <sslavic at gmail.com>
>>>>
>>>>> Here is the log file and strace output file. Timings configured are
>>>>> 30sec health check interval, 5sec timeout, and 2 retries with 10sec
>>>>> retry delay.
>>>>>
>>>>> It takes a lot more than 5sec from starting a health check to sleeping
>>>>> 10sec for the first retry.
>>>>>
>>>>> As seen in the code (main.c, health_check() function), within a (retry)
>>>>> attempt there is an inner retry (first with the postgres database, then
>>>>> with template1), and that part doesn't seem to be interrupted by the
>>>>> alarm.
>>>>>
>>>>> Regards,
>>>>> Stevo.
>>>>>
>>>>> 2012/1/11 Tatsuo Ishii <ishii at postgresql.org>
>>>>>
>>>>>> Ok, I will do it. In the mean time you could use "strace -tt -p PID"
>>>>>> to see which system call is blocked.
>>>>>> --
>>>>>> Tatsuo Ishii
>>>>>> SRA OSS, Inc. Japan
>>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>>> Japanese: http://www.sraoss.co.jp
>>>>>>
>>>>>>> OK, got the info - the key point is that ip forwarding is disabled for
>>>>>>> security reasons. Rules in iptables are not important; iptables can be
>>>>>>> stopped, or previously added rules removed.
>>>>>>>
>>>>>>> Here are the steps to reproduce (kudos to my colleague Nenad Bulatovic
>>>>>>> for providing this):
>>>>>>>
>>>>>>> 1.) make sure that ip forwarding is off:
>>>>>>>     echo 0 > /proc/sys/net/ipv4/ip_forward
>>>>>>> 2.) create an IP alias on some interface (and have postgres listen on it):
>>>>>>>     ip addr add x.x.x.x/yy dev ethz
>>>>>>> 3.) set backend_hostname0 to the aforementioned IP
>>>>>>> 4.) start pgpool and monitor health checks
>>>>>>> 5.) remove the IP alias:
>>>>>>>     ip addr del x.x.x.x/yy dev ethz
>>>>>>>
>>>>>>> Here is the interesting part in the pgpool log after this:
>>>>>>> 2012-01-11 17:38:04 DEBUG: pid 24358: starting health checking
>>>>>>> 2012-01-11 17:38:04 DEBUG: pid 24358: health_check: 0 th DB node status: 2
>>>>>>> 2012-01-11 17:38:04 DEBUG: pid 24358: health_check: 1 th DB node status: 1
>>>>>>> 2012-01-11 17:38:34 DEBUG: pid 24358: starting health checking
>>>>>>> 2012-01-11 17:38:34 DEBUG: pid 24358: health_check: 0 th DB node status: 2
>>>>>>> 2012-01-11 17:41:43 DEBUG: pid 24358: health_check: 0 th DB node status: 2
>>>>>>> 2012-01-11 17:41:46 ERROR: pid 24358: health check failed. 0 th host 192.168.2.27 at port 5432 is down
>>>>>>> 2012-01-11 17:41:46 LOG:   pid 24358: health check retry sleep time: 10 second(s)
>>>>>>>
>>>>>>> That pgpool was configured with a health check interval of 30sec, 5sec
>>>>>>> timeout, and 10sec retry delay with 2 max retries.
>>>>>>>
>>>>>>> Making use of libpq instead for connecting to the db in health checks
>>>>>>> should IMO resolve it, but you can best determine which call exactly
>>>>>>> gets blocked waiting. Btw, psql with the PGCONNECT_TIMEOUT env var
>>>>>>> configured respects that timeout.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Stevo.
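>>>>>>>
>>>>>>> P.S. Something along the lines of the sketch below is what I have in
>>>>>>> mind (only a sketch; the user/dbname values are placeholders, and a
>>>>>>> real health check would use pgpool's own settings). As far as I
>>>>>>> understand, libpq enforces connect_timeout itself while waiting on the
>>>>>>> socket, so it does not depend on the network returning any unreachable
>>>>>>> message. Build with -lpq.
>>>>>>>
>>>>>>>     #include <stdio.h>
>>>>>>>     #include <libpq-fe.h>
>>>>>>>
>>>>>>>     /* Return 1 if the backend answers within timeout_sec, 0 otherwise. */
>>>>>>>     static int backend_is_alive(const char *host, int port, int timeout_sec)
>>>>>>>     {
>>>>>>>         char conninfo[256];
>>>>>>>         PGconn *conn;
>>>>>>>         int alive;
>>>>>>>
>>>>>>>         snprintf(conninfo, sizeof(conninfo),
>>>>>>>                  "host=%s port=%d dbname=postgres user=pgpool connect_timeout=%d",
>>>>>>>                  host, port, timeout_sec);
>>>>>>>
>>>>>>>         conn = PQconnectdb(conninfo);
>>>>>>>         alive = (PQstatus(conn) == CONNECTION_OK);
>>>>>>>         if (!alive)
>>>>>>>             fprintf(stderr, "health check failed: %s", PQerrorMessage(conn));
>>>>>>>         PQfinish(conn);
>>>>>>>         return alive;
>>>>>>>     }
>>>>>>>
>>>>>>>     int main(void)
>>>>>>>     {
>>>>>>>         return backend_is_alive("192.168.2.27", 5432, 5) ? 0 : 1;
>>>>>>>     }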
>>>>>>>
>>>>>>> On Wed, Jan 11, 2012 at 11:15 AM, Stevo Slavić <sslavic at gmail.com> wrote:
>>>>>>>
>>>>>>>> Tatsuo,
>>>>>>>>
>>>>>>>> Did you restart iptables after adding the rule?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Stevo.
>>>>>>>>
>>>>>>>> On Wed, Jan 11, 2012 at 11:12 AM, Stevo Slavić <sslavic at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Looking into this to verify whether these are all the changes needed
>>>>>>>>> to have the port unreachable message silently rejected (suspecting
>>>>>>>>> some kernel parameter tuning is needed).
>>>>>>>>>
>>>>>>>>> Just to clarify: it's not a problem that the host is detected by
>>>>>>>>> pgpool to be down, but the timing of when that happens. On the
>>>>>>>>> environment where the issue is reproduced, pgpool as part of a health
>>>>>>>>> check attempt tries to connect to the backend and hangs for the TCP
>>>>>>>>> timeout instead of being interrupted by the timeout alarm. Can you
>>>>>>>>> please verify/confirm that the health check retry timings are not
>>>>>>>>> delayed?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Stevo.
>>>>>>>>>
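>>>>>>>>> P.S. If the goal is only to make the backend silently unreachable
>>>>>>>>> from the pgpool host (no RST, no ICMP), one way that should work -
>>>>>>>>> my assumption, not something from the pgpool docs - is to drop the
>>>>>>>>> outgoing packets on the pgpool host itself, since the FORWARD chain
>>>>>>>>> only sees routed traffic, not locally originated connections:
>>>>>>>>>
>>>>>>>>>     iptables -A OUTPUT -p tcp -d <backend_ip> --dport 5432 -j DROP
>>>>>>>>>
>>>>>>>>> With such a rule in place connect() gets no answer at all and blocks
>>>>>>>>> until the TCP timeout, which is exactly the situation the health
>>>>>>>>> check timeout should cover.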
>>>>>>>>>
>>>>>>>>> On Wed, Jan 11, 2012 at 10:50 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>>>>>>>
>>>>>>>>>> Ok, I did:
>>>>>>>>>>
>>>>>>>>>> # iptables -A FORWARD -j REJECT --reject-with icmp-port-unreachable
>>>>>>>>>>
>>>>>>>>>> on the host where pgpool is running, and pulled the network cable
>>>>>>>>>> from the backend0 host network interface. Pgpool detected the host
>>>>>>>>>> being down as expected...
>>>>>>>>>> --
>>>>>>>>>> Tatsuo Ishii
>>>>>>>>>> SRA OSS, Inc. Japan
>>>>>>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>>>>>>> Japanese: http://www.sraoss.co.jp
>>>>>>>>>>
>>>>>>>>>>> The backend is not the destination of this message, the pgpool host
>>>>>>>>>>> is, and we don't want it to ever get it. With the command I've sent
>>>>>>>>>>> you, the rule will be created for any source and destination.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Stevo.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jan 11, 2012 at 10:38 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I did the following, on the host where pgpool is running:
>>>>>>>>>>>>
>>>>>>>>>>>> # iptables -A FORWARD -j REJECT --reject-with icmp-port-unreachable -d 133.137.177.124
>>>>>>>>>>>> (133.137.177.124 is the host where the backend is running)
>>>>>>>>>>>>
>>>>>>>>>>>> Then I pulled the network cable from the backend0 host network
>>>>>>>>>>>> interface. Pgpool detected the host being down as expected. Am I
>>>>>>>>>>>> missing something?
>>>>>>>>>>>> --
>>>>>>>>>>>> Tatsuo Ishii
>>>>>>>>>>>> SRA OSS, Inc. Japan
>>>>>>>>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>>>>>>>>> Japanese: http://www.sraoss.co.jp
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello Tatsuo,
>>>>>>>>>>>>>
>>>>>>>>>>>>> With backend0 on one host, just configure the following rule on the
>>>>>>>>>>>>> other host, where pgpool is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> iptables -A FORWARD -j REJECT --reject-with icmp-port-unreachable
>>>>>>>>>>>>>
>>>>>>>>>>>>> then have pgpool start up with health checking and retrying
>>>>>>>>>>>>> configured, and then pull the network cable from the backend0 host
>>>>>>>>>>>>> network interface.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Stevo.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jan 11, 2012 at 6:27 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I want to try to test the situation you described:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > When system is configured for security reasons not to return destination
>>>>>>>>>>>>>> > host unreachable messages, even though health_check_timeout is
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But I don't know how to do it. I pulled out the network cable and
>>>>>>>>>>>>>> pgpool detected it as expected. Also, I configured the server which
>>>>>>>>>>>>>> PostgreSQL is running on to disable the 5432 port. In this case
>>>>>>>>>>>>>> connect(2) returned EHOSTUNREACH (No route to host), so pgpool
>>>>>>>>>>>>>> detected the error as expected.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Could you please instruct me?
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Tatsuo Ishii
>>>>>>>>>>>>>> SRA OSS, Inc. Japan
>>>>>>>>>>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>>>>>>>>>>> Japanese: http://www.sraoss.co.jp
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello Tatsuo,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you for replying!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm not sure what exactly is blocking; just from pgpool code
>>>>>>>>>>>>>>> analysis I suspect it is the part where a connection is made to
>>>>>>>>>>>>>>> the db, and it doesn't seem to get interrupted by the alarm. I
>>>>>>>>>>>>>>> tested the health check behaviour thoroughly: it works really well
>>>>>>>>>>>>>>> when the host/ip is there and just the backend/postgres is down,
>>>>>>>>>>>>>>> but not when the backend host/ip is down. I could see in the log
>>>>>>>>>>>>>>> that the initial health check and each retry got delayed when the
>>>>>>>>>>>>>>> host/ip is not reachable, while when just the backend is not
>>>>>>>>>>>>>>> listening (is down) on a reachable host/ip, the initial health
>>>>>>>>>>>>>>> check and all retries are exact to the settings in pgpool.conf.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> PGCONNECT_TIMEOUT is listed as one of the libpq environment
>>>>>>>>>>>>>>> variables (see
>>>>>>>>>>>>>>> http://www.postgresql.org/docs/9.1/static/libpq-envars.html).
>>>>>>>>>>>>>>> There is an equivalent parameter for libpq's PQconnectdbParams (see
>>>>>>>>>>>>>>> http://www.postgresql.org/docs/9.1/static/libpq-connect.html#LIBPQ-CONNECT-CONNECT-TIMEOUT).
>>>>>>>>>>>>>>> At the beginning of that same page there is some important info on
>>>>>>>>>>>>>>> using these functions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> psql respects PGCONNECT_TIMEOUT.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Stevo.
>>>>>>>>>>>>>>>
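>>>>>>>>>>>>>>> P.S. A quick way to see the timeout being honoured (host and port
>>>>>>>>>>>>>>> here are placeholders for the backend being checked):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     PGCONNECT_TIMEOUT=5 psql -h <backend_host> -p 5432 -d postgres -c 'SELECT 1'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When the host is unreachable, psql gives up after roughly 5 seconds
>>>>>>>>>>>>>>> instead of hanging until the TCP timeout.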
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Jan 11, 2012 at 12:13 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > Hello pgpool community,
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > When system is configured for security reasons not to return
>>>>>>>>>>>>>>>> > destination host unreachable messages, even though
>>>>>>>>>>>>>>>> > health_check_timeout is configured, socket call will block and
>>>>>>>>>>>>>>>> > alarm will not get raised until TCP timeout occurs.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Interesting. So are you saying that read(2) cannot be interrupted
>>>>>>>>>>>>>>>> by the alarm signal if the system is configured not to return
>>>>>>>>>>>>>>>> destination host unreachable messages? Could you please guide me to
>>>>>>>>>>>>>>>> where I can get such info? (I'm not a network expert.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > Not a C programmer, found some info that the select call could be
>>>>>>>>>>>>>>>> > replaced with select/pselect calls. Maybe it would be best if the
>>>>>>>>>>>>>>>> > PGCONNECT_TIMEOUT value could be used here for connection timeout.
>>>>>>>>>>>>>>>> > pgpool has libpq as dependency, why isn't it using libpq for the
>>>>>>>>>>>>>>>> > healthcheck db connect calls, then PGCONNECT_TIMEOUT would be
>>>>>>>>>>>>>>>> > applied?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don't think libpq uses select/pselect for establishing a
>>>>>>>>>>>>>>>> connection, but using libpq instead of homebrew code seems to be an
>>>>>>>>>>>>>>>> idea. Let me think about it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One question. Are you sure that libpq can deal with the case (not
>>>>>>>>>>>>>>>> returning destination host unreachable messages) by using
>>>>>>>>>>>>>>>> PGCONNECT_TIMEOUT?
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Tatsuo Ishii
>>>>>>>>>>>>>>>> SRA OSS, Inc. Japan
>>>>>>>>>>>>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>>>>>>>>>>>>> Japanese: http://www.sraoss.co.jp


More information about the pgpool-hackers mailing list