[pgpool-hackers: 2597] Re: Manual failover with pgpool and repmgr

Wed Nov 15 20:11:04 JST 2017

The Pgpool-II development team does not guarantee that Pgpool-II works with repmgr.
You would probably do better to ask someone who is familiar with repmgr.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Hi guys
> 
> After executing a manual failover, I recovered the repmgr replication between s1 (master, read/write) and s2 (standby, read-only):
> 
> repmgr cluster show
> Role | Name | Upstream | Connection String
> ----------+------|----------|----------------------------------------------
> * master | s1 | | host=192.168.0.1 dbname=repmgr user=repmgr
>   standby | s2 | s1 | host=192.168.0.2 dbname=repmgr user=repmgr
> 
> The problem is that after swapping the active nodes using repmgr (1. stop postgres on standby, 2. promote the master, 3. clone the standby), pgpool does not recognize the nodes correctly and shows the master node as down:
> 
> show pool_nodes;
>  node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
> ---------+----------------+------+--------+-----------+---------+------------+-------------------+-------------------
>  0 | 192.168.0.1 | 5432 | down | 0.500000 | standby | 0 | false | 0
>  1 | 192.168.0.2 | 5432 | up | 0.500000 | standby | 0 | true | 0
> 
> Replication is working fine, and repmgr shows that everything is correct:
> repmgr cluster show
> Role | Name | Upstream | Connection String
> ----------+------|----------|----------------------------------------------
> * master | s1 | | host=192.168.0.1 dbname=repmgr user=repmgr
>   standby | s2 | s1 | host=192.168.0.2 dbname=repmgr user=repmgr
> 
> I have tried to fix pgpool using pcp commands and by restarting the pgpool service, without success:
> 
> The detach command is not accepted:
> pcp_detach_node 0 -h localhost -U postgres
> ERROR: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
> 
> I can promote node 0 (down), but nothing happens:
> pcp_promote_node 0 -U postgres -h localhost
> pcp_promote_node -- Command Successful
> 
> show pool_nodes
>  node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
> ---------+----------------+------+--------+-----------+---------+------------+-------------------+-------------------
>  0 | 192.168.0.1 | 5432 | down | 0.500000 | standby | 0 | false | 0
>  1 | 192.168.0.2 | 5432 | up | 0.500000 | standby | 3 | true | 0
> (2 rows)
> 
> And I can't recover node 1 (standby):
> pcp_recovery_node 1 -U postgres -h localhost
> ERROR: process recovery request failed
> DETAIL: primary server cannot be recovered by online recovery.
> 
> Here is the main configuration in pgpool.conf:
> backend_flag0 = 'ALLOW_TO_FAILOVER'
> backend_flag1 = 'ALLOW_TO_FAILOVER'
> 
> load_balance_mode = on
> 
> master_slave_mode = on
> master_slave_sub_mode = 'stream'
> 
> failover_command = ''
> recovery_1st_stage_command = ''
> 
> Please help me. I don't know what I am doing wrong.
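
For readers who hit the same state: the quoted output shows node 0 marked "down" in Pgpool-II even though repmgr reports it as the running master. Pgpool-II ships a PCP command, pcp_attach_node, for re-attaching a backend by hand. Nothing in this thread confirms that re-attaching is the right fix for this setup, so the following is only a sketch. It reads `show pool_nodes` output and prints, without running it, one pcp_attach_node command per backend marked down. The function name print_attach_cmds is hypothetical, and the PCP host and user are assumed to match the commands quoted above.

```shell
#!/bin/sh
# Sketch: read `show pool_nodes` output on stdin and print (not execute)
# a pcp_attach_node command for every backend whose status is "down".
# Assumptions: columns are separated by "|", node_id is the first column
# and status the fourth, as in the output quoted above; the PCP host and
# user (localhost, postgres) are taken from the quoted pcp commands.
print_attach_cmds() {
    awk -F'|' '
        {
            id = $1
            status = $4
            gsub(/[^0-9]/, "", id)      # drop spaces and any "> " quote prefix
            gsub(/[ \t]/, "", status)   # trim padding around the status value
            if (id != "" && status == "down")
                printf "pcp_attach_node -h localhost -U postgres -n %s\n", id
        }'
}

# Example use (assumes Pgpool-II listens on port 9999, which is not stated
# in the thread):
#   psql -h localhost -p 9999 -U postgres -c 'show pool_nodes' | print_attach_cmds
```

Printing the commands instead of running them leaves the decision to attach a node with the operator, who can first check against repmgr that the node really is healthy.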