[pgpool-hackers: 2596] Manual failover with pgpool and repmgr

Wed Nov 15 19:35:12 JST 2017

Hi guys

After executing a manual failover, I recovered the repmgr replication between s1 (master, read/write) and s2 (standby, read-only):

repmgr cluster show
Role | Name | Upstream | Connection String
----------+------|----------|----------------------------------------------
* master | s1 | | host=192.168.0.1 dbname=repmgr user=repmgr
  standby | s2 | s1 | host=192.168.0.2 dbname=repmgr user=repmgr

The problem is that after swapping the active nodes using repmgr (1. stop postgres on standby, 2. promote the master, 3. clone the standby), pgpool does not recognize the nodes correctly and shows the master node as down:

show pool_nodes;
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+----------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0 | 192.168.0.1 | 5432 | down | 0.500000 | standby | 0 | false | 0
 1 | 192.168.0.2 | 5432 | up | 0.500000 | standby | 0 | true | 0
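
The inconsistency in this output can also be checked mechanically. Below is a minimal Python sketch, not part of pgpool or repmgr, that parses the pasted `show pool_nodes` text and lists which nodes are marked as primary and which are down. The function names `parse_pool_nodes` and `summarize` are hypothetical, and the sketch assumes that pgpool labels the writable node's role as "primary" in this column.

```python
def parse_pool_nodes(text):
    """Parse psql-style `show pool_nodes` output into a list of dicts."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = [h.strip() for h in lines[0].split("|")]
    rows = []
    for ln in lines[1:]:
        stripped = ln.strip()
        # Skip the dashed separator line and the "(N rows)" footer.
        if set(stripped) <= set("-+") or stripped.startswith("("):
            continue
        cells = [c.strip() for c in ln.split("|")]
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def summarize(rows):
    """Return (ids of nodes with role 'primary', ids of nodes with status 'down')."""
    # Assumption: the writable node is reported with role "primary".
    primaries = [r["node_id"] for r in rows if r["role"] == "primary"]
    down = [r["node_id"] for r in rows if r["status"] == "down"]
    return primaries, down


# The output pasted above, used as sample input.
SAMPLE = """
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+----------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0 | 192.168.0.1 | 5432 | down | 0.500000 | standby | 0 | false | 0
 1 | 192.168.0.2 | 5432 | up | 0.500000 | standby | 0 | true | 0
"""

primaries, down = summarize(parse_pool_nodes(SAMPLE))
print("primary nodes:", primaries)
print("down nodes:", down)
```

For the output above, the list of primary nodes is empty and node 0 is reported as down, which is the mismatch with what repmgr shows.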

Replication is working fine, and repmgr reports that everything is correct:
repmgr cluster show
Role | Name | Upstream | Connection String
----------+------|----------|----------------------------------------------
* master | s1 | | host=192.168.0.1 dbname=repmgr user=repmgr
  standby | s2 | s1 | host=192.168.0.2 dbname=repmgr user=repmgr

I tried to fix pgpool using pcp commands, and also restarted the pgpool service, without success:

The detach command is not accepted:
pcp_detach_node 0 -h localhost -U postgres
ERROR: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover

I can promote node 0 (down), but nothing changes:
pcp_promote_node 0 -U postgres -h localhost
pcp_promote_node -- Command Successful

show pool_nodes
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+----------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0 | 192.168.0.1 | 5432 | down | 0.500000 | standby | 0 | false | 0
 1 | 192.168.0.2 | 5432 | up | 0.500000 | standby | 3 | true | 0
(2 rows)

And I can't recover node 1 (standby):
pcp_recovery_node 1 -U postgres -h localhost
ERROR: process recovery request failed
DETAIL: primary server cannot be recovered by online recovery.

Here is the main configuration in pgpool.conf:
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_flag1 = 'ALLOW_TO_FAILOVER'

load_balance_mode = on

master_slave_mode = on
master_slave_sub_mode = 'stream'

failover_command = ''
recovery_1st_stage_command = ''

Please help me. I don't know what I am doing wrong.