[pgpool-general: 6435] Re: follow_master_command executed on node shown as down (one of unrecovered masters from previous failover)

Andre Piwoni apiwoni at webmd.net
Tue Feb 26 01:34:58 JST 2019


I have already put that check in place.

Thank you for confirming.

On Sat, Feb 23, 2019 at 11:56 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Sorry, I was wrong. A follow_master_command will be executed against
> the down node as well. So you need to check whether target PostgreSQL
> node is running in the follow_master_commdn. If it's not, you can skip
> the node.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> > I have added pg_ctl status check to ensure no action is taken when node
> is
> > down but I'll check 3.7.8 version.
> >
> > Here's the Pgpool log from the time node2 is shutdown to time
> node1(already
> > dead old primary) received follow master command.
> > Sorry for double date logging. I'm also including self-explanatory
> > failover.log that I my failover and follow_master scripts generated.
> >
> > Arguments passed to scripts for your reference.
> > failover.sh %d %h %p %D %M %P %m %H %r %R
> > follow_master.sh %d %h %p %D %M %P %m %H %r %R
> >
> > Pool status before shutdown of node 2:
> > postgres=> show pool_nodes;
> >  node_id |          hostname          | port | status | lb_weight |  role
> >  | select_cnt | load_balance_node | replication_delay
> >
> ---------+----------------------------+------+--------+-----------+---------+------------+-------------------+-------------------
> >  0       | pg-hdp-node1.kitchen.local | 5432 | down   | 0.333333  |
> standby
> > | 0          | false             | 0
> >  1       | pg-hdp-node2.kitchen.local | 5432 | up     | 0.333333  |
> primary
> > | 0          | false             | 0
> >  2       | pg-hdp-node3.kitchen.local | 5432 | up     | 0.333333  |
> standby
> > | 0          | true              | 0
> > (3 rows)
> >
> > Pgpool log
> > Feb 22 10:43:27 pg-hdp-node3 pgpool[12437]: [126-1] 2019-02-22 10:43:27:
> > pid 12437: LOG:  failed to connect to PostgreSQL server on
> > "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error
> "Connection
> > refused"
> > Feb 22 10:43:27 pg-hdp-node3 pgpool[12437]: [127-1] 2019-02-22 10:43:27:
> > pid 12437: ERROR:  failed to make persistent db connection
> > Feb 22 10:43:27 pg-hdp-node3 pgpool[12437]: [127-2] 2019-02-22 10:43:27:
> > pid 12437: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
> > failed
> > Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [128-1] 2019-02-22 10:43:37:
> > pid 12437: ERROR:  Failed to check replication time lag
> > Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [128-2] 2019-02-22 10:43:37:
> > pid 12437: DETAIL:  No persistent db connection for the node 1
> > Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [128-3] 2019-02-22 10:43:37:
> > pid 12437: HINT:  check sr_check_user and sr_check_password
> > Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [128-4] 2019-02-22 10:43:37:
> > pid 12437: CONTEXT:  while checking replication time lag
> > Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [129-1] 2019-02-22 10:43:37:
> > pid 12437: LOG:  failed to connect to PostgreSQL server on
> > "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error
> "Connection
> > refused"
> > Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [130-1] 2019-02-22 10:43:37:
> > pid 12437: ERROR:  failed to make persistent db connection
> > Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [130-2] 2019-02-22 10:43:37:
> > pid 12437: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
> > failed
> > Feb 22 10:43:45 pg-hdp-node3 pgpool[7786]: [6-1] 2019-02-22 10:43:45: pid
> > 7786: LOG:  failed to connect to PostgreSQL server on
> > "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error
> "Connection
> > refused"
> > Feb 22 10:43:45 pg-hdp-node3 pgpool[7786]: [7-1] 2019-02-22 10:43:45: pid
> > 7786: ERROR:  failed to make persistent db connection
> > Feb 22 10:43:45 pg-hdp-node3 pgpool[7786]: [7-2] 2019-02-22 10:43:45: pid
> > 7786: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
> failed
> > Feb 22 10:43:45 pg-hdp-node3 pgpool[7786]: [8-1] 2019-02-22 10:43:45: pid
> > 7786: LOG:  health check retrying on DB node: 1 (round:1)
> > Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [131-1] 2019-02-22 10:43:47:
> > pid 12437: ERROR:  Failed to check replication time lag
> > Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [131-2] 2019-02-22 10:43:47:
> > pid 12437: DETAIL:  No persistent db connection for the node 1
> > Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [131-3] 2019-02-22 10:43:47:
> > pid 12437: HINT:  check sr_check_user and sr_check_password
> > Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [131-4] 2019-02-22 10:43:47:
> > pid 12437: CONTEXT:  while checking replication time lag
> > Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [132-1] 2019-02-22 10:43:47:
> > pid 12437: LOG:  failed to connect to PostgreSQL server on
> > "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error
> "Connection
> > refused"
> > Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [133-1] 2019-02-22 10:43:47:
> > pid 12437: ERROR:  failed to make persistent db connection
> > Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [133-2] 2019-02-22 10:43:47:
> > pid 12437: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
> > failed
> > Feb 22 10:43:48 pg-hdp-node3 pgpool[7786]: [9-1] 2019-02-22 10:43:48: pid
> > 7786: LOG:  failed to connect to PostgreSQL server on
> > "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error
> "Connection
> > refused"
> > Feb 22 10:43:48 pg-hdp-node3 pgpool[7786]: [10-1] 2019-02-22 10:43:48:
> pid
> > 7786: ERROR:  failed to make persistent db connection
> > Feb 22 10:43:48 pg-hdp-node3 pgpool[7786]: [10-2] 2019-02-22 10:43:48:
> pid
> > 7786: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
> failed
> > Feb 22 10:43:48 pg-hdp-node3 pgpool[7786]: [11-1] 2019-02-22 10:43:48:
> pid
> > 7786: LOG:  health check retrying on DB node: 1 (round:2)
> > Feb 22 10:43:51 pg-hdp-node3 pgpool[7786]: [12-1] 2019-02-22 10:43:51:
> pid
> > 7786: LOG:  failed to connect to PostgreSQL server on
> > "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error
> "Connection
> > refused"
> > Feb 22 10:43:51 pg-hdp-node3 pgpool[7786]: [13-1] 2019-02-22 10:43:51:
> pid
> > 7786: ERROR:  failed to make persistent db connection
> > Feb 22 10:43:51 pg-hdp-node3 pgpool[7786]: [13-2] 2019-02-22 10:43:51:
> pid
> > 7786: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
> failed
> > Feb 22 10:43:51 pg-hdp-node3 pgpool[7786]: [14-1] 2019-02-22 10:43:51:
> pid
> > 7786: LOG:  health check retrying on DB node: 1 (round:3)
> > Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [15-1] 2019-02-22 10:43:54:
> pid
> > 7786: LOG:  failed to connect to PostgreSQL server on
> > "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error
> "Connection
> > refused"
> > Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [16-1] 2019-02-22 10:43:54:
> pid
> > 7786: ERROR:  failed to make persistent db connection
> > Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [16-2] 2019-02-22 10:43:54:
> pid
> > 7786: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
> failed
> > Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [17-1] 2019-02-22 10:43:54:
> pid
> > 7786: LOG:  health check failed on node 1 (timeout:0)
> > Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [18-1] 2019-02-22 10:43:54:
> pid
> > 7786: LOG:  received degenerate backend request for node_id: 1 from pid
> > [7786]
> > Feb 22 10:43:54 pg-hdp-node3 pgpool[7746]: [253-1] 2019-02-22 10:43:54:
> pid
> > 7746: LOG:  Pgpool-II parent process has received failover request
> > Feb 22 10:43:54 pg-hdp-node3 pgpool[7746]: [254-1] 2019-02-22 10:43:54:
> pid
> > 7746: LOG:  starting degeneration. shutdown host
> > pg-hdp-node2.kitchen.local(5432)
> > Feb 22 10:43:54 pg-hdp-node3 pgpool[7746]: [255-1] 2019-02-22 10:43:54:
> pid
> > 7746: LOG:  Restart all children
> > Feb 22 10:43:54 pg-hdp-node3 pgpool[7746]: [256-1] 2019-02-22 10:43:54:
> pid
> > 7746: LOG:  execute command: /etc/pgpool-II/failover.sh 1
> > pg-hdp-node2.kitchen.local 5432 /var/lib/pgsql/10/data 1 1 2
> > pg-hdp-node3.kitchen.local 5432 /var/lib/pgsql/10/data
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [257-1] 2019-02-22 10:43:55:
> pid
> > 7746: LOG:  find_primary_node_repeatedly: waiting for finding a primary
> node
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [258-1] 2019-02-22 10:43:55:
> pid
> > 7746: LOG:  find_primary_node: checking backend no 0
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [259-1] 2019-02-22 10:43:55:
> pid
> > 7746: LOG:  find_primary_node: checking backend no 1
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [260-1] 2019-02-22 10:43:55:
> pid
> > 7746: LOG:  find_primary_node: checking backend no 2
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [261-1] 2019-02-22 10:43:55:
> pid
> > 7746: LOG:  find_primary_node: primary node id is 2
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [262-1] 2019-02-22 10:43:55:
> pid
> > 7746: LOG:  starting follow degeneration. shutdown host
> > pg-hdp-node1.kitchen.local(5432)
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [263-1] 2019-02-22 10:43:55:
> pid
> > 7746: LOG:  starting follow degeneration. shutdown host
> > pg-hdp-node2.kitchen.local(5432)
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [264-1] 2019-02-22 10:43:55:
> pid
> > 7746: LOG:  failover: 2 follow backends have been degenerated
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [265-1] 2019-02-22 10:43:55:
> pid
> > 7746: LOG:  failover: set new primary node: 2
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [266-1] 2019-02-22 10:43:55:
> pid
> > 7746: LOG:  failover: set new master node: 2
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [267-1] 2019-02-22 10:43:55:
> pid
> > 7746: LOG:  failover done. shutdown host pg-hdp-node2.kitchen.local(5432)
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [134-1] 2019-02-22 10:43:55:
> > pid 12437: ERROR:  Failed to check replication time lag
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [134-2] 2019-02-22 10:43:55:
> > pid 12437: DETAIL:  No persistent db connection for the node 1
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [134-3] 2019-02-22 10:43:55:
> > pid 12437: HINT:  check sr_check_user and sr_check_password
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [134-4] 2019-02-22 10:43:55:
> > pid 12437: CONTEXT:  while checking replication time lag
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [135-1] 2019-02-22 10:43:55:
> > pid 12437: LOG:  worker process received restart request
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12774]: [267-1] 2019-02-22 10:43:55:
> > pid 12774: LOG:  failback event detected
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12774]: [267-2] 2019-02-22 10:43:55:
> > pid 12774: DETAIL:  restarting myself
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12742]: [265-1] 2019-02-22 10:43:55:
> > pid 12742: LOG:  start triggering follow command.
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12742]: [266-1] 2019-02-22 10:43:55:
> > pid 12742: LOG:  execute command: /etc/pgpool-II/follow_master.sh 0
> > pg-hdp-node1.kitchen.local 5432 /var/lib/pgsql/10/data 1 1 2
> > pg-hdp-node3.kitchen.local 5432 /var/lib/pgsql/10/data
> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12742]: [267-1] 2019-02-22 10:43:55:
> > pid 12742: LOG:  execute command: /etc/pgpool-II/follow_master.sh 1
> > pg-hdp-node2.kitchen.local 5432 /var/lib/pgsql/10/data 1 1 2
> > pg-hdp-node3.kitchen.local 5432 /var/lib/pgsql/10/data
> > Feb 22 10:43:56 pg-hdp-node3 pgpool[12436]: [60-1] 2019-02-22 10:43:56:
> pid
> > 12436: LOG:  restart request received in pcp child process
> > Feb 22 10:43:56 pg-hdp-node3 pgpool[7746]: [268-1] 2019-02-22 10:43:56:
> pid
> > 7746: LOG:  PCP child 12436 exits with status 0 in failover()
> >
> > Pgpool self-explanatory failover.log
> >
> > 2019-02-22 10:43:54.893 PST Executing failover script ...
> > 2019-02-22 10:43:54.895 PST Script arguments:
> > failed_node_id           1
> > failed_node_host         pg-hdp-node2.kitchen.local
> > failed_node_port         5432
> > failed_node_pgdata       /var/lib/pgsql/10/data
> > old_primary_node_id      1
> > old_master_node_id       1
> > new_master_node_id       2
> > new_master_node_host     pg-hdp-node3.kitchen.local
> > new_master_node_port     5432
> > new_master_node_pgdata   /var/lib/pgsql/10/data
> > 2019-02-22 10:43:54.897 PST Primary node running on
> > pg-hdp-node2.kitchen.local host is unresponsive or have died
> > 2019-02-22 10:43:54.898 PST Attempting to stop primary node running on
> > pg-hdp-node2.kitchen.local host before promoting slave as the new primary
> > 2019-02-22 10:43:54.899 PST ssh -o StrictHostKeyChecking=no -i
> > /var/lib/pgsql/.ssh/id_rsa postgres at pg-hdp-node2.kitchen.local -T
> > /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data stop -m fast
> > 2019-02-22 10:43:55.151 PST Promoting pg-hdp-node3.kitchen.local host as
> > the new primary
> > 2019-02-22 10:43:55.153 PST ssh -o StrictHostKeyChecking=no -i
> > /var/lib/pgsql/.ssh/id_rsa postgres at pg-hdp-node3.kitchen.local -T
> > /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data promote
> > waiting for server to promote.... done
> > server promoted
> > 2019-02-22 10:43:55.532 PST Completed executing failover
> >
> > 2019-02-22 10:43:55.564 PST Executing follow master script ...
> > 2019-02-22 10:43:55.566 PST Script arguments
> > detached_node_id         0
> > detached_node_host       pg-hdp-node1.kitchen.local
> > detached_node_port       5432
> > detached_node_pgdata     /var/lib/pgsql/10/data
> > old_primary_node_id      1
> > old_master_node_id       1
> > new_master_node_id       2
> > new_master_node_host     pg-hdp-node3.kitchen.local
> > new_master_node_port     5432
> > new_master_node_pgdata   /var/lib/pgsql/10/data
> > 2019-02-22 10:43:55.567 PST Checking if server is running on
> > pg-hdp-node1.kitchen.local host
> > 2019-02-22 10:43:55.569 PST ssh -o StrictHostKeyChecking=no -i
> > /var/lib/pgsql/.ssh/id_rsa postgres at pg-hdp-node1.kitchen.local -T
> > /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data status
> >
> >
> > pg_ctl: no server running
> > 2019-02-22 10:43:55.823 PST Node on pg-hdp-node1.kitchen.local host is
> not
> > running. It could be old slave or primary that needs to be recovered.
> > 2019-02-22 10:43:55.824 PST Completed executing follow master script
> >
> > 2019-02-22 10:43:55.829 PST Executing follow master script ...
> > 2019-02-22 10:43:55.830 PST Script arguments
> > detached_node_id         1
> > detached_node_host       pg-hdp-node2.kitchen.local
> > detached_node_port       5432
> > detached_node_pgdata     /var/lib/pgsql/10/data
> > old_primary_node_id      1
> > old_master_node_id       1
> > new_master_node_id       2
> > new_master_node_host     pg-hdp-node3.kitchen.local
> > new_master_node_port     5432
> > new_master_node_pgdata   /var/lib/pgsql/10/data
> > 2019-02-22 10:43:55.831 PST Detached node on pg-hdp-node2.kitchen.local
> > host is the the old primary node
> > 2019-02-22 10:43:55.833 PST Slave can be created from old primary node by
> > deleting PG_DATA directory under /var/lib/pgsql/10/data on
> > pg-hdp-node2.kitchen.local host and re-running Chef client
> > 2019-02-22 10:43:55.834 PST Slave can be recovered from old primary node
> by
> > running /usr/pgsql-10/bin/pg_rewind -D /var/lib/pgsql/10/data
> > --source-server="port=5432 host=pg-hdp-node3.kitchen.local" command on
> > pg-hdp-node2.kitchen.local host as postgres user
> > 2019-02-22 10:43:55.835 PST After successful pg_rewind run cp
> > /var/lib/pgsql/10/data/recovery.done
> /var/lib/pgsql/10/data/recovery.conf,
> > ensure host connection string points to pg-hdp-node3.kitchen.local, start
> > PostgreSQL and attach it to pgpool
> > 2019-02-22 10:43:55.836 PST Completed executing follow master script
> >
> > On Thu, Feb 21, 2019 at 4:47 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >
> >> > Is this correct behavior?
> >> >
> >> > In 3-node setup, node1(primary) is shutdown, failover is executed and
> >> node2
> >> > becomes new primary and node3 follows new primary on node2.
> >> > Now, node2(new primary) is shutdown, failover is executed and node3
> >> becomes
> >> > new primary but fallow_master_command is executed on node1 even
> though it
> >> > is reported as down.
> >>
> >> No. follow master command should not be executed on an already-down
> >> node (in this case node1).
> >>
> >> > It happens that my script repoints node1 and restarts it which breaks
> >> hell
> >> > because node1 was never recovered after being shutdown.
> >> >
> >> > I'm on PgPool 3.7.4.
> >>
> >> Can you share the log from when node2 was shutdown to when node1 was
> >> recovered by your follow master command?
> >>
> >> In the mean time 3.7.4 is not the latest one. Can you try with the
> >> latest one? (3.7.8).
> >>
> >> Best regards,
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese:http://www.sraoss.co.jp
> >>
> >
> >
> > --
> >
> > *Andre Piwoni*
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20190225/97df0887/attachment-0001.html>


More information about the pgpool-general mailing list