[pgpool-hackers: 628] Re: pgpool-II ignores heath check retry

Muhammad Usama m.usama at gmail.com
Wed Oct 29 19:53:08 JST 2014


Hi Tatsuo,

I have looked into this. The problem was only with the log message and
setting health_check_max_retries = 0 does not
Proceeds to retry, but does prints an invalid LOG message.
Unfortunately the problem still persists with your fix, since
"retrycnt" is incremented before calling
process_backend_health_check_failure() function. And with first health
check failure the retrycnt becomes 1 and  if(retrycnt <=
health_check_max_retries)  is enough to detect if retries are
exhausted or not for health_check_max_retries = 0.

I have tried to fix the problem by resetting the retrycnt if
process_backend_health_check_failure() informs retry count is
exhausted.
Can you please have a look at the attached patch.

Thanks and regards
Usama

On Wed, Oct 29, 2014 at 6:32 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> Unfortunately with the fix even health_check_max_retries = 0,
> pgpool-II tries to retry at least once. Will fix.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
>> Hi Usama,
>>
>> Great! Thanks. Tomorrow I'm going to release RC1.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>>> Hi Tatsuo,
>>> Thanks for pointing this out. I have committed the fix in master branch.
>>>
>>> The reason was a typo in process_backend_health_check_failure() function
>>> where the function tries to figure out the max health check retry count for
>>> current pgpool-II configuration. Instead of using
>>> pool_config->health_check_max_retries  configuration for max retry count
>>> the function was mistakenly considering
>>> pool_config->health_check_retry_delay for the purpose.
>>>
>>> Kind regards,
>>> Muhammad Usama
>>>
>>>
>>> On Tue, Oct 28, 2014 at 5:31 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>
>>>> Hi Usama,
>>>>
>>>> It seems master branch pgpool-II ignores health check retry
>>>> setting. At least 3.3 stable pgpool-II behaves as expected. I am
>>>> afraid this will be the show stopper for 3.5.0.
>>>>
>>>> Step to reproduce;
>>>>
>>>> 1) pgpool.conf
>>>> fail_over_on_backend_error = off
>>>> health_check_period = 5
>>>> health_check_timeout = 2
>>>> health_check_max_retries = 20
>>>> health_check_retry_delay = 1
>>>> sr_check_period = 0
>>>>
>>>> 2) start pgpool-II.
>>>>
>>>> 3) kill one of postmaster using kill -9. (I know this is bad but
>>>> normal shutdown command using pg_ctl does not trigger health check
>>>> retry since pgpool-II immediately detaches the node)
>>>>
>>>> 4) check pgpool log. You will see it reried only once or twice.
>>>>
>>>> 2014-10-28 09:19:19: pid 9519: LOG:  health checking retry count 1
>>>> 2014-10-28 09:19:19: pid 9519: LOG:  failed to connect to PostgreSQL
>>>> server by unix domain socket
>>>> 2014-10-28 09:19:19: pid 9519: DETAIL:  connect to "/tmp/.s.PGSQL.11001"
>>>> failed with error "Connection refused"
>>>> 2014-10-28 09:19:19: pid 9519: ERROR:  failed to make persistent db
>>>> connection
>>>> 2014-10-28 09:19:19: pid 9519: DETAIL:  connection to host:"/tmp:11001"
>>>> failed
>>>> 2014-10-28 09:19:19: pid 9519: LOG:  setting backend node 1 status to NODE
>>>> DOWN
>>>> 2014-10-28 09:19:19: pid 9519: LOG:  starting degeneration. shutdown host
>>>> /tmp(11001)
>>>> 2014-10-28 09:19:19: pid 9519: LOG:  Restart all children
>>>> :
>>>> :
>>>> :
>>>>
>>>> I attached full debug log for your convenience.
>>>>
>>>> Best regards,
>>>> --
>>>> Tatsuo Ishii
>>>> SRA OSS, Inc. Japan
>>>> English: http://www.sraoss.co.jp/index_en.php
>>>> Japanese:http://www.sraoss.co.jp
>>>>
>>>> 2014-10-28 09:29:39: pid 9955: LOG:  Backend status file
>>>> /home/t-ishii/work/pgpool-II/current/aaa/log/pgpool_status discarded
>>>> 2014-10-28 09:29:39: pid 9955: LOG:  Setting up socket for 127.0.0.1:11002
>>>> 2014-10-28 09:29:39: pid 9956: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9957: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9958: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9961: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9962: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9960: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9965: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9966: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9959: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9969: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9971: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9973: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9963: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9976: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9977: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9980: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9981: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9984: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9985: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9955: LOG:  pgpool-II successfully started.
>>>> version 3.4beta2 (tataraboshi)
>>>> 2014-10-28 09:29:39: pid 9955: LOG:  find_primary_node: checking
>>>> bac2014-10-28 09:29:39: pid 9988: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9988: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  I am 9989
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:39: pid 9987: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9967: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9986: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9968: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9970: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9964: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9972: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  do_query: extended:0 query:"SELECT
>>>> pg_is_in_recovery()"
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:39: pid 9974: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9975: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  do_query: kind: 'T'
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  do_query: received ROW DESCRIPTION
>>>> ('T')
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  do_query: row description:
>>>> num_fileds: 1
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  do_query: kind: 'D'
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  do_query: received DATA ROW ('D')
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  do_query: kind: 'C'
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  do_query: received COMMAND COMPLETE
>>>> ('C')
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  do_query: kind: 'Z'
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  do_query: received READY FOR QUERY
>>>> ('Z')
>>>> 2014-10-28 09:29:39: pid 9978: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9955: LOG:  find_primary_node: primary node id is
>>>> 0
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  starting health check
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  Backend DB node 0 status is 1
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:39: pid 9979: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9982: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9983: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  Backend DB node 1 status is 1
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  Trying to make persistent DB
>>>> connection to backend node 1 having status 1
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: extended:0 query:"SELECT
>>>> pg_current_xlog_location()"
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: kind: 'T'
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: received ROW DESCRIPTION
>>>> ('T')
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: row description:
>>>> num_fileds: 1
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: kind: 'D'
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: received DATA ROW ('D')
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: kind: 'C'
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: received COMMAND COMPLETE
>>>> ('C')
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: kind: 'Z'
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: received READY FOR QUERY
>>>> ('Z')
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: extended:0 query:"SELECT
>>>> pg_last_xlog_replay_location()"
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  persistent DB connection to backend
>>>> node 1 having status 1 is successful
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:39: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: kind: 'T'
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: received ROW DESCRIPTION
>>>> ('T')
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: row description:
>>>> num_fileds: 1
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: kind: 'D'
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: received DATA ROW ('D')
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: kind: 'C'
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: received COMMAND COMPLETE
>>>> ('C')
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: kind: 'Z'
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:39: pid 9989: DEBUG:  do_query: received READY FOR QUERY
>>>> ('Z')
>>>> 2014-10-28 09:29:39: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG:  starting health check
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG:  Backend DB node 0 status is 1
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG:  Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG:  persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG:  Backend DB node 1 status is 1
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG:  Trying to make persistent DB
>>>> connection to backend node 1 having status 1
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG:  persistent DB connection to backend
>>>> node 1 having status 1 is successful
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:44: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: extended:0 query:"SELECT
>>>> pg_current_xlog_location()"
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: kind: 'T'
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: received ROW DESCRIPTION
>>>> ('T')
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: row description:
>>>> num_fileds: 1
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: kind: 'D'
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: received DATA ROW ('D')
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: kind: 'C'
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: received COMMAND COMPLETE
>>>> ('C')
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: kind: 'Z'
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: received READY FOR QUERY
>>>> ('Z')
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: extended:0 query:"SELECT
>>>> pg_last_xlog_replay_location()"
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: kind: 'T'
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: received ROW DESCRIPTION
>>>> ('T')
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: row description:
>>>> num_fileds: 1
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: kind: 'D'
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: received DATA ROW ('D')
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: kind: 'C'
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: received COMMAND COMPLETE
>>>> ('C')
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: kind: 'Z'
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9989: DEBUG:  do_query: received READY FOR QUERY
>>>> ('Z')
>>>> 2014-10-28 09:29:49: pid 9989: CONTEXT:  while checking replication time
>>>> lag
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG:  starting health check
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG:  Backend DB node 0 status is 1
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG:  Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG:  persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG:  Backend DB node 1 status is 1
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG:  Trying to make persistent DB
>>>> connection to backend node 1 having status 1
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG:  persistent DB connection to backend
>>>> node 1 having status 1 is successful
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:49: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG:  starting health check
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG:  Backend DB node 0 status is 1
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG:  Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG:  persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG:  Backend DB node 1 status is 1
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG:  Trying to make persistent DB
>>>> connection to backend node 1 having status 1
>>>> 2014-10-28 09:29:54: pid 9955: LOG:  failed to connect to PostgreSQL
>>>> server by unix domain socket
>>>> 2014-10-28 09:29:54: pid 9955: DETAIL:  connect to "/tmp/.s.PGSQL.11001"
>>>> failed with error "Connection refused"
>>>> 2014-10-28 09:29:54: pid 9955: ERROR:  failed to make persistent db
>>>> connection
>>>> 2014-10-28 09:29:54: pid 9955: DETAIL:  connection to host:"/tmp:11001"
>>>> failed
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG:  Backend DB node 1 status is 1
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG:  Trying to make persistent DB
>>>> connection to backend node 1 having status 1
>>>> 2014-10-28 09:29:54: pid 9955: LOG:  failed to connect to PostgreSQL
>>>> server by unix domain socket
>>>> 2014-10-28 09:29:54: pid 9955: DETAIL:  connect to "/tmp/.s.PGSQL.11001"
>>>> failed with error "Connection refused"
>>>> 2014-10-28 09:29:54: pid 9955: ERROR:  failed to make persistent db
>>>> connection
>>>> 2014-10-28 09:29:54: pid 9955: DETAIL:  connection to host:"/tmp:11001"
>>>> failed
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:54: pid 9955: DEBUG:  Sleeping for 1 seconds from process
>>>> backend health check failure
>>>> 2014-10-28 09:29:54: pid 9955: DETAIL:  health check failed retry no is 1
>>>> while max retries are 1
>>>> 2014-10-28 09:29:55: pid 9955: LOG:  health checking retry count 1
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  Backend DB node 1 status is 1
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  Trying to make persistent DB
>>>> connection to backend node 1 having status 1
>>>> 2014-10-28 09:29:55: pid 9955: LOG:  failed to connect to PostgreSQL
>>>> server by unix domain socket
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  connect to "/tmp/.s.PGSQL.11001"
>>>> failed with error "Connection refused"
>>>> 2014-10-28 09:29:55: pid 9955: ERROR:  failed to make persistent db
>>>> connection
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  connection to host:"/tmp:11001"
>>>> failed
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:55: pid 9955: LOG:  setting backend node 1 status to NODE
>>>> DOWN
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler called
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  starting to select new master node
>>>> 2014-10-28 09:29:55: pid 9955: LOG:  starting degeneration. shutdown host
>>>> /tmp(11001)
>>>> 2014-10-28 09:29:55: pid 9955: LOG:  Restart all children
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9956
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9957
>>>> 2014-10-28 09:29:55: pid 9957: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9959
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9960
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9961
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9962
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9963
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9964
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9965
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9966
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9967
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9968
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9969
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9970
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9971
>>>> 2014-10-28 09:29:55: pid 9960: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9972
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9973
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9974
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9975
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9976
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9977
>>>> 2014-10-28 09:29:55: pid 9961: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9979
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9980
>>>> 2014-10-28 09:29:55: pid 9965: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9963: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9964: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9966: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9969: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9968: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9971: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9973: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9981
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9982
>>>> 2014-10-28 09:29:55: pid 9982: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9976: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9985
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9986
>>>> 2014-10-28 09:29:55: pid 9955: DEBUG:  failover handler
>>>> 2014-10-28 09:29:55: pid 9955: DETAIL:  kill process with PID:9987
>>>> 2014-10-28 09:29:55: pid 9955: LOG:  execute command:
>>>> /home/t-ishii/work/pgpool-II/current/aaa/etc/failover.sh 1 /tmp 11001
>>>> /home/t-ishii/work/pgpool-II/current/aaa/data1 0 0 /tmp 0 11000
>>>> /home/t-ishii/work/pgpool-II/current/aaa/data0
>>>> 2014-10-28 09:29:55: pid 9977: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9980: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9986: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9983: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9981: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9979: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9987: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9985: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9978: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9984: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9967: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9970: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9972: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9975: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9974: LOG:  child process received shutdown
>>>> request signal 3
>>>> 2014-10-28 09:29:55: pid 9955: LOG:  failover: set new primary node: 0
>>>> 2014-10-28 09:29:55: pid 9955: LOG:  failover: set new master node: 0
>>>> 2014-10-28 09:29:55: pid 10008: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10009: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10012: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10011: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10013: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10010: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10015: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10014: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10016: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10019: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10017: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10029: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10018: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10023: DEBUG:  initializing backend status
>>>> failover done. shutdown host /tmp(11001)2014-10-28 09:29:55: pid 9989:
>>>> LOG:  worker process received restart request
>>>> 2014-10-28 09:29:55: pid 10021: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 9955: LOG:  failover done. shutdown host
>>>> /tmp(11001)
>>>> 2014-10-28 09:29:55: pid 10024: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 9988: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10022: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10020: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10026: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10027: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10034: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10035: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10028: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10036: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10030: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10037: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10031: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10038: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10032: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10039: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:55: pid 10033: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:56: pid 9988: LOG:  restart request received in pcp child
>>>> process
>>>> 2014-10-28 09:29:56: pid 9955: LOG:  PCP child 9988 exits with status 256
>>>> in failover()
>>>> 2014-10-28 09:29:56: pid 9955: LOG:  fork a new PCP child pid 10040 in
>>>> failover()
>>>> 2014-10-28 09:29:56: pid 9955: LOG:  health checking retry count 2
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  Backend DB node 0 status is 1
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:29:56: pid 10040: DEBUG:  I am PCP child with pid:10040
>>>> 2014-10-28 09:29:56: pid 10040: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:56: pid 10040: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  Backend DB node 1 status is 3
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9956
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9957
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9958
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9959
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9960
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9961
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9962
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9963
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9964
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9965
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9966
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9967
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9968
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9969
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9970
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9971
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9972
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9973
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9974
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9975
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9976
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9977
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9978
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9979
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9980
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9981
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9982
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9983
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9984
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9985
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9986
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: child with PID:9987
>>>> exits with status 0
>>>> 2014-10-28 09:29:56: pid 9955: LOG:  worker child 9989 exits with status
>>>> 256
>>>> 2014-10-28 09:29:56: pid 9955: LOG:  fork a new worker child pid 10042
>>>> 2014-10-28 09:29:56: pid 9955: DEBUG:  reaper handler: exiting normally
>>>> 2014-10-28 09:29:56: pid 10042: DEBUG:  I am 10042
>>>> 2014-10-28 09:29:56: pid 10042: DEBUG:  initializing backend status
>>>> 2014-10-28 09:29:56: pid 10042: DEBUG:  SSL is requested but SSL support
>>>> is not available
>>>> 2014-10-28 09:29:56: pid 10042: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:29:56: pid 10042: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:29:56: pid 10042: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG:  starting health check
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG:  Backend DB node 0 status is 1
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG:  Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG:  persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG:  Backend DB node 1 status is 3
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:01: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:06: pid 10042: DEBUG:  SSL is requested but SSL support
>>>> is not available
>>>> 2014-10-28 09:30:06: pid 10042: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:30:06: pid 10042: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:06: pid 10042: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG:  starting health check
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG:  Backend DB node 0 status is 1
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG:  Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG:  persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG:  Backend DB node 1 status is 3
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:06: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG:  starting health check
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG:  Backend DB node 0 status is 1
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG:  Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG:  persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG:  Backend DB node 1 status is 3
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:11: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:16: pid 10042: DEBUG:  SSL is requested but SSL support
>>>> is not available
>>>> 2014-10-28 09:30:16: pid 10042: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:30:16: pid 10042: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:16: pid 10042: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG:  starting health check
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG:  Backend DB node 0 status is 1
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG:  Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG:  persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG:  Backend DB node 1 status is 3
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:16: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG:  starting health check
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG:  Backend DB node 0 status is 1
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG:  Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG:  persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG:  Backend DB node 1 status is 3
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:21: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:26: pid 10042: DEBUG:  SSL is requested but SSL support
>>>> is not available
>>>> 2014-10-28 09:30:26: pid 10042: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:30:26: pid 10042: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:26: pid 10042: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG:  starting health check
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG:  Backend DB node 0 status is 1
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG:  Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG:  persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG:  Backend DB node 1 status is 3
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:26: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG:  starting health check
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG:  Backend DB node 0 status is 1
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG:  Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG:  persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG:  Backend DB node 1 status is 3
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:31: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:36: pid 10042: DEBUG:  SSL is requested but SSL support
>>>> is not available
>>>> 2014-10-28 09:30:36: pid 10042: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:30:36: pid 10042: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:36: pid 10042: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG:  starting health check
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG:  Backend DB node 0 status is 1
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG:  Trying to make persistent DB
>>>> connection to backend node 0 having status 1
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG:  SSL is requested but SSL support is
>>>> not available
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG:  authenticate kind = 0
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG:  authenticate backend: key data
>>>> received
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG:  authenticate backend: transaction
>>>> state: I
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG:  persistent DB connection to backend
>>>> node 0 having status 1 is successful
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG:  Backend DB node 1 status is 3
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG:  health check: clearing alarm
>>>> 2014-10-28 09:30:36: pid 9955: DEBUG:  health check: clearing alarm
>>>>
>>>> _______________________________________________
>>>> pgpool-hackers mailing list
>>>> pgpool-hackers at pgpool.net
>>>> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
>>>>
>>>>
>> _______________________________________________
>> pgpool-hackers mailing list
>> pgpool-hackers at pgpool.net
>> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
-------------- next part --------------
diff --git a/src/main/pgpool_main.c b/src/main/pgpool_main.c
index 449529f..d88e202 100644
--- a/src/main/pgpool_main.c
+++ b/src/main/pgpool_main.c
@@ -353,11 +353,18 @@ int PgpoolMain(bool discard_status, bool clear_memcache_oidmaps)
 				}
 				else
 				{
+					int ret;
+
 					retrycnt++;
-					if(process_backend_health_check_failure(health_check_node_id, retrycnt))
+					ret = process_backend_health_check_failure(health_check_node_id, retrycnt);
+					if(ret > 0) /* Retries are exhausted, reset the counter */
 					{
-						health_check_node_id = 0;
-						use_template_db = false;
+						retrycnt = 0;
+						if(ret == 2)
+						{
+							health_check_node_id = 0;
+							use_template_db = false;
+						}
 					}
 				}
 			}
@@ -481,7 +488,8 @@ int PgpoolMain(bool discard_status, bool clear_memcache_oidmaps)
  * Function process the backend node failure captured by the health check
  * since this function is called from the exception handler so ereport(ERROR)
  * is not allowed from this function
- * Function returns non zero if failover is performed and 0 otherwise.
+ * Function returns non zero if no retries are exhausted.
+ * ( 1 if failover is not performed on node and 2 in case of failover)
  */
 static int
 process_backend_health_check_failure(int health_check_node_id, int retrycnt)
@@ -515,6 +523,7 @@ process_backend_health_check_failure(int health_check_node_id, int retrycnt)
 		{
 			ereport(LOG,
 					(errmsg("health check failed on node %d but failover is disallowed for the node", health_check_node_id)));
+			return 1;
 		}
 		else
 		{
@@ -522,7 +531,7 @@ process_backend_health_check_failure(int health_check_node_id, int retrycnt)
 					(errmsg("setting backend node %d status to NODE DOWN", health_check_node_id)));
 			health_check_timer_expired = 0;
 			register_node_operation_request(NODE_DOWN_REQUEST,&health_check_node_id,1);
-			return 1;
+			return 2;
 			/* need to distribute this info to children ??*/
 		}
 	}


More information about the pgpool-hackers mailing list