[pgpool-hackers: 628] Re: pgpool-II ignores heath check retry
Muhammad Usama
m.usama at gmail.com
Wed Oct 29 19:53:08 JST 2014
Hi Tatsuo,
I have looked into this. The problem was only with the log message and
setting health_check_max_retries = 0 does not
Proceeds to retry, but does prints an invalid LOG message.
Unfortunately the problem still persists with your fix, since
"retrycnt" is incremented before calling
process_backend_health_check_failure() function. And with first health
check failure the retrycnt becomes 1 and if(retrycnt <=
health_check_max_retries) is enough to detect if retries are
exhausted or not for health_check_max_retries = 0.
I have tried to fix the problem by resetting the retrycnt if
process_backend_health_check_failure() informs retry count is
exhausted.
Can you please have a look at the attached patch.
Thanks and regards
Usama
On Wed, Oct 29, 2014 at 6:32 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> Unfortunately with the fix even health_check_max_retries = 0,
> pgpool-II tries to retry at least once. Will fix.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
>> Hi Usama,
>>
>> Great! Thanks. Tomorrow I'm going to release RC1.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>>> Hi Tatsuo,
>>> Thanks for pointing this out. I have committed the fix in master branch.
>>>
>>> The reason was a typo in process_backend_health_check_failure() function
>>> where the function tries to figure out the max health check retry count for
>>> current pgpool-II configuration. Instead of using
>>> pool_config->health_check_max_retries configuration for max retry count
>>> the function was mistakenly considering
>>> pool_config->health_check_retry_delay for the purpose.
>>>
>>> Kind regards,
>>> Muhammad Usama
>>>
>>>
>>> On Tue, Oct 28, 2014 at 5:31 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>
>>>> Hi Usama,
>>>>
>>>> It seems master branch pgpool-II ignores health check retry
>>>> setting. At least 3.3 stable pgpool-II behaves as expected. I am
>>>> afraid this will be the show stopper for 3.5.0.
>>>>
>>>> Step to reproduce;
>>>>
>>>> 1) pgpool.conf
>>>> fail_over_on_backend_error = off
>>>> health_check_period = 5
>>>> health_check_timeout = 2
>>>> health_check_max_retries = 20
>>>> health_check_retry_delay = 1
>>>> sr_check_period = 0
>>>>
>>>> 2) start pgpool-II.
>>>>
>>>> 3) kill one of postmaster using kill -9. (I know this is bad but
>>>> normal shutdown command using pg_ctl does not trigger health check
>>>> retry since pgpool-II immediately detaches the node)
>>>>
>>>> 4) check pgpool log. You will see it reried only once or twice.
>>>>
>>>> 2014-10-28 09:19:19: pid 9519: LOG: health checking retry count 1
>>>> 2014-10-28 09:19:19: pid 9519: LOG: failed to connect to PostgreSQL
>>>> server by unix domain socket
>>>> 2014-10-28 09:19:19: pid 9519: DETAIL: connect to "/tmp/.s.PGSQL.11001"
>>>> failed with error "Connection refused"
>>>> 2014-10-28 09:19:19: pid 9519: ERROR: failed to make persistent db
>>>> connection
>>>> 2014-10-28 09:19:19: pid 9519: DETAIL: connection to host:"/tmp:11001"
>>>> failed
>>>> 2014-10-28 09:19:19: pid 9519: LOG: setting backend node 1 status to NODE
>>>> DOWN
>>>> 2014-10-28 09:19:19: pid 9519: LOG: starting degeneration. shutdown host
>>>> /tmp(11001)
>>>> 2014-10-28 09:19:19: pid 9519: LOG: Restart all children
>>>> :
>>>> :
>>>> :
>>>>
>>>> I attached full debug log for your convenience.
>>>>
>>>> Best regards,
>>>> --
>>>> Tatsuo Ishii
>>>> SRA OSS, Inc. Japan
>>>> English: http://www.sraoss.co.jp/index_en.php
>>>> Japanese:http://www.sraoss.co.jp
>>>>
>>>> 2014-10-28 09:29:39: pid 9955: LOG: Backend status file
>>>> /home/t-ishii/work/pgpool-II/current/aaa/log/pgpool_status discarded
>>>> 2014-10-28 09:29:39: pid 9955: LOG: Setting up socket for 127.0.0.1:11002
>>>> 2014-10-28 09:29:39: pid 9956: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9957: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9958: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9961: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9962: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9960: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9965: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9966: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9959: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9969: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9971: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9973: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9963: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9976: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9977: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9980: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9981: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9984: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9985: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9955: LOG: pgpool-II successfully started.
>>>> version 3.4beta2 (tataraboshi)
>>>> 2014-10-28 09:29:39: pid 9955: LOG: find_primary_node: checking
>>>> bac2014-10-28 09:29:39: pid 9988: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9988: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: I am 9989
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:39: pid 9987: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9967: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9986: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9968: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9970: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9964: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9972: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: do_query: extended:0 query:"SELECT
>>>> pg_is_in_recovery()"
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:39: pid 9974: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9975: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: do_query: kind: 'T'
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: do_query: received ROW DESCRIPTION
>>>> ('T')
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: do_query: row description:
>>>> num_fileds: 1
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: do_query: kind: 'D'
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: do_query: received DATA ROW ('D')
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: do_query: kind: 'C'
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: do_query: received COMMAND COMPLETE
>>>> ('C')
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: do_query: kind: 'Z'
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: do_query: received READY FOR QUERY
>>>> ('Z')
>>>> 2014-10-28 09:29:39: pid 9978: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9955: LOG: find_primary_node: primary node id is
>>>> 0
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: starting health check
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: Backend DB node 0 status is 1
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:39: pid 9979: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9982: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9983: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: Backend DB node 1 status is 1
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: Trying to make persistent DB
>>>> connection to backend node 1 having status 1
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: extended:0 query:"SELECT
>>>> pg_current_xlog_location()"
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: kind: 'T'
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: received ROW DESCRIPTION
>>>> ('T')
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: row description:
>>>> num_fileds: 1
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: kind: 'D'
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: received DATA ROW ('D')
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: kind: 'C'
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: received COMMAND COMPLETE
>>>> ('C')
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: kind: 'Z'
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: received READY FOR QUERY
>>>> ('Z')
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: extended:0 query:"SELECT
>>>> pg_last_xlog_replay_location()"
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: persistent DB connection to backend
>>>> node 1 having status 1 is successful
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: kind: 'T'
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: received ROW DESCRIPTION
>>>> ('T')
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: row description:
>>>> num_fileds: 1
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: kind: 'D'
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: received DATA ROW ('D')
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: kind: 'C'
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: received COMMAND COMPLETE
>>>> ('C')
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: kind: 'Z'
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG: do_query: received READY FOR QUERY
>>>> ('Z')
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG: starting health check
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG: Backend DB node 0 status is 1
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG: Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG: persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG: Backend DB node 1 status is 1
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG: Trying to make persistent DB
>>>> connection to backend node 1 having status 1
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG: persistent DB connection to backend
>>>> node 1 having status 1 is successful
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: extended:0 query:"SELECT
>>>> pg_current_xlog_location()"
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: kind: 'T'
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: received ROW DESCRIPTION
>>>> ('T')
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: row description:
>>>> num_fileds: 1
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: kind: 'D'
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: received DATA ROW ('D')
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: kind: 'C'
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: received COMMAND COMPLETE
>>>> ('C')
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: kind: 'Z'
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: received READY FOR QUERY
>>>> ('Z')
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: extended:0 query:"SELECT
>>>> pg_last_xlog_replay_location()"
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: kind: 'T'
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: received ROW DESCRIPTION
>>>> ('T')
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: row description:
>>>> num_fileds: 1
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: kind: 'D'
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: received DATA ROW ('D')
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: kind: 'C'
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: received COMMAND COMPLETE
>>>> ('C')
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: kind: 'Z'
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG: do_query: received READY FOR QUERY
>>>> ('Z')
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT: while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG: starting health check
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG: Backend DB node 0 status is 1
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG: Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG: persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG: Backend DB node 1 status is 1
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG: Trying to make persistent DB
>>>> connection to backend node 1 having status 1
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG: persistent DB connection to backend
>>>> node 1 having status 1 is successful
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG: starting health check
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG: Backend DB node 0 status is 1
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG: Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG: persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG: Backend DB node 1 status is 1
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG: Trying to make persistent DB
>>>> connection to backend node 1 having status 1
>>>> 2014-10-28 09:29:54: pid 9955: LOG: failed to connect to PostgreSQL
>>>> server by unix domain socket
>>>> 2014-10-28 09:29:54: pid 9955: DETAIL: connect to "/tmp/.s.PGSQL.11001"
>>>> failed with error "Connection refused"
>>>> 2014-10-28 09:29:54: pid 9955: ERROR: failed to make persistent db
>>>> connection
>>>> 2014-10-28 09:29:54: pid 9955: DETAIL: connection to host:"/tmp:11001"
>>>> failed
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG: Backend DB node 1 status is 1
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG: Trying to make persistent DB
>>>> connection to backend node 1 having status 1
>>>> 2014-10-28 09:29:54: pid 9955: LOG: failed to connect to PostgreSQL
>>>> server by unix domain socket
>>>> 2014-10-28 09:29:54: pid 9955: DETAIL: connect to "/tmp/.s.PGSQL.11001"
>>>> failed with error "Connection refused"
>>>> 2014-10-28 09:29:54: pid 9955: ERROR: failed to make persistent db
>>>> connection
>>>> 2014-10-28 09:29:54: pid 9955: DETAIL: connection to host:"/tmp:11001"
>>>> failed
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG: Sleeping for 1 seconds from process
>>>> backend health check failure
>>>> 2014-10-28 09:29:54: pid 9955: DETAIL: health check failed retry no is 1
>>>> while max retries are 1
>>>> 2014-10-28 09:29:55: pid 9955: LOG: health checking retry count 1
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: Backend DB node 1 status is 1
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: Trying to make persistent DB
>>>> connection to backend node 1 having status 1
>>>> 2014-10-28 09:29:55: pid 9955: LOG: failed to connect to PostgreSQL
>>>> server by unix domain socket
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: connect to "/tmp/.s.PGSQL.11001"
>>>> failed with error "Connection refused"
>>>> 2014-10-28 09:29:55: pid 9955: ERROR: failed to make persistent db
>>>> connection
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: connection to host:"/tmp:11001"
>>>> failed
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:55: pid 9955: LOG: setting backend node 1 status to NODE
>>>> DOWN
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler called
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: starting to select new master node
>>>> 2014-10-28 09:29:55: pid 9955: LOG: starting degeneration. shutdown host
>>>> /tmp(11001)
>>>> 2014-10-28 09:29:55: pid 9955: LOG: Restart all children
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9956
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9957
>>>> 2014-10-28 09:29:55: pid 9957: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9959
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9960
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9961
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9962
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9963
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9964
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9965
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9966
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9967
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9968
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9969
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9970
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9971
>>>> 2014-10-28 09:29:55: pid 9960: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9972
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9973
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9974
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9975
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9976
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9977
>>>> 2014-10-28 09:29:55: pid 9961: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9979
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9980
>>>> 2014-10-28 09:29:55: pid 9965: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9963: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9964: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9966: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9969: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9968: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9971: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9973: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9981
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9982
>>>> 2014-10-28 09:29:55: pid 9982: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9976: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9985
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9986
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG: failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL: kill process with PID:9987
>>>> 2014-10-28 09:29:55: pid 9955: LOG: execute command:
>>>> /home/t-ishii/work/pgpool-II/current/aaa/etc/failover.sh 1 /tmp 11001
>>>> /home/t-ishii/work/pgpool-II/current/aaa/data1 0 0 /tmp 0 11000
>>>> /home/t-ishii/work/pgpool-II/current/aaa/data0
>>>> 2014-10-28 09:29:55: pid 9977: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9980: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9986: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9983: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9981: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9979: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9987: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9985: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9978: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9984: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9967: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9970: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9972: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9975: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9974: LOG: child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9955: LOG: failover: set new primary node: 0
>>>> 2014-10-28 09:29:55: pid 9955: LOG: failover: set new master node: 0
>>>> 2014-10-28 09:29:55: pid 10008: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10009: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10012: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10011: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10013: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10010: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10015: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10014: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10016: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10019: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10017: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10029: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10018: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10023: DEBUG: initializing backend status
>>>> failover done. shutdown host /tmp(11001)2014-10-28 09:29:55: pid 9989:
>>>> LOG: worker process received restart request
>>>> 2014-10-28 09:29:55: pid 10021: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 9955: LOG: failover done. shutdown host
>>>> /tmp(11001)
>>>> 2014-10-28 09:29:55: pid 10024: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 9988: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10022: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10020: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10026: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10027: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10034: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10035: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10028: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10036: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10030: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10037: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10031: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10038: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10032: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10039: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:55: pid 10033: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:56: pid 9988: LOG: restart request received in pcp child
>>>> process
>>>> 2014-10-28 09:29:56: pid 9955: LOG: PCP child 9988 exits with status 256
>>>> in failover()
>>>> 2014-10-28 09:29:56: pid 9955: LOG: fork a new PCP child pid 10040 in
>>>> failover()
>>>> 2014-10-28 09:29:56: pid 9955: LOG: health checking retry count 2
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: Backend DB node 0 status is 1
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:56: pid 10040: DEBUG: I am PCP child with pid:10040
>>>> 2014-10-28 09:29:56: pid 10040: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:56: pid 10040: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: Backend DB node 1 status is 3
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9956
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9957
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9958
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9959
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9960
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9961
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9962
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9963
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9964
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9965
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9966
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9967
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9968
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9969
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9970
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9971
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9972
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9973
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9974
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9975
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9976
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9977
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9978
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9979
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9980
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9981
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9982
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9983
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9984
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9985
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9986
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: child with PID:9987
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: LOG: worker child 9989 exits with status
>>>> 256
>>>> 2014-10-28 09:29:56: pid 9955: LOG: fork a new worker child pid 10042
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG: reaper handler: exiting normally
>>>> 2014-10-28 09:29:56: pid 10042: DEBUG: I am 10042
>>>> 2014-10-28 09:29:56: pid 10042: DEBUG: initializing backend status
>>>> 2014-10-28 09:29:56: pid 10042: DEBUG: SSL is requested but SSL support
>>>> is not available
>>>> 2014-10-28 09:29:56: pid 10042: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:29:56: pid 10042: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:56: pid 10042: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG: starting health check
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG: Backend DB node 0 status is 1
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG: Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG: persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG: Backend DB node 1 status is 3
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:06: pid 10042: DEBUG: SSL is requested but SSL support
>>>> is not available
>>>> 2014-10-28 09:30:06: pid 10042: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:30:06: pid 10042: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:06: pid 10042: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG: starting health check
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG: Backend DB node 0 status is 1
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG: Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG: persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG: Backend DB node 1 status is 3
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG: starting health check
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG: Backend DB node 0 status is 1
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG: Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG: persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG: Backend DB node 1 status is 3
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:16: pid 10042: DEBUG: SSL is requested but SSL support
>>>> is not available
>>>> 2014-10-28 09:30:16: pid 10042: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:30:16: pid 10042: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:16: pid 10042: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG: starting health check
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG: Backend DB node 0 status is 1
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG: Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG: persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG: Backend DB node 1 status is 3
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG: starting health check
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG: Backend DB node 0 status is 1
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG: Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG: persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG: Backend DB node 1 status is 3
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:26: pid 10042: DEBUG: SSL is requested but SSL support
>>>> is not available
>>>> 2014-10-28 09:30:26: pid 10042: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:30:26: pid 10042: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:26: pid 10042: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG: starting health check
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG: Backend DB node 0 status is 1
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG: Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG: persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG: Backend DB node 1 status is 3
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG: starting health check
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG: Backend DB node 0 status is 1
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG: Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG: persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG: Backend DB node 1 status is 3
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:36: pid 10042: DEBUG: SSL is requested but SSL support
>>>> is not available
>>>> 2014-10-28 09:30:36: pid 10042: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:30:36: pid 10042: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:36: pid 10042: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG: starting health check
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG: Backend DB node 0 status is 1
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG: Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG: SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG: authenticate kind = 0
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG: authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG: authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG: persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG: Backend DB node 1 status is 3
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG: health check: clearing alarm
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG: health check: clearing alarm
>>>>
>>>> _______________________________________________
>>>> pgpool-hackers mailing list
>>>> pgpool-hackers at pgpool.net
>>>> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
>>>>
>>>>
>> _______________________________________________
>> pgpool-hackers mailing list
>> pgpool-hackers at pgpool.net
>> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
-------------- next part --------------
diff --git a/src/main/pgpool_main.c b/src/main/pgpool_main.c
index 449529f..d88e202 100644
--- a/src/main/pgpool_main.c
+++ b/src/main/pgpool_main.c
@@ -353,11 +353,18 @@ int PgpoolMain(bool discard_status, bool clear_memcache_oidmaps)
}
else
{
+ int ret;
+
retrycnt++;
- if(process_backend_health_check_failure(health_check_node_id, retrycnt))
+ ret = process_backend_health_check_failure(health_check_node_id, retrycnt);
+ if(ret > 0) /* Retries are exhausted, reset the counter */
{
- health_check_node_id = 0;
- use_template_db = false;
+ retrycnt = 0;
+ if(ret == 2)
+ {
+ health_check_node_id = 0;
+ use_template_db = false;
+ }
}
}
}
@@ -481,7 +488,8 @@ int PgpoolMain(bool discard_status, bool clear_memcache_oidmaps)
* Function process the backend node failure captured by the health check
* since this function is called from the exception handler so ereport(ERROR)
* is not allowed from this function
- * Function returns non zero if failover is performed and 0 otherwise.
+ * Function returns non zero if no retries are exhausted.
+ * ( 1 if failover is not performed on node and 2 in case of failover)
*/
static int
process_backend_health_check_failure(int health_check_node_id, int retrycnt)
@@ -515,6 +523,7 @@ process_backend_health_check_failure(int health_check_node_id, int retrycnt)
{
ereport(LOG,
(errmsg("health check failed on node %d but failover is disallowed for the node", health_check_node_id)));
+ return 1;
}
else
{
@@ -522,7 +531,7 @@ process_backend_health_check_failure(int health_check_node_id, int retrycnt)
(errmsg("setting backend node %d status to NODE DOWN", health_check_node_id)));
health_check_timer_expired = 0;
register_node_operation_request(NODE_DOWN_REQUEST,&health_check_node_id,1);
- return 1;
+ return 2;
/* need to distribute this info to children ??*/
}
}
More information about the pgpool-hackers
mailing list