[pgpool-hackers: 2604] Re: Manual failover with pgpool and repmgr

Fri Nov 17 07:49:45 JST 2017

Sorry for the wrong information.

You needed to run pcp_attach_node before using pcp_promote_node.
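
The order of the two commands could be sketched roughly as follows. This is only an illustration, not a tested procedure: the node id (0), the PCP user (postgres) and the host (localhost) are taken from the commands quoted later in this thread, and the RUN variable and the attach_and_promote function are hypothetical names used only for this sketch. By default the script prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: re-attach a node that pgpool reports as down, then promote it.
# Assumptions: PCP is reachable on localhost and the PCP user is "postgres",
# as in the commands quoted in this thread.
# RUN is a hypothetical switch: unset -> commands are only printed (dry run);
# RUN= (empty) -> commands are really executed.
RUN=${RUN-echo}

attach_and_promote() {
  node_id=$1
  # 1. Ask pgpool to attach the backend so its status is no longer "down".
  $RUN pcp_attach_node "$node_id" -U postgres -h localhost || return 1
  # 2. Tell pgpool to treat the attached node as the primary.
  $RUN pcp_promote_node "$node_id" -U postgres -h localhost
}

attach_and_promote 0
```

Note that pcp_promote_node only changes the role as seen by Pgpool-II; it does not promote the PostgreSQL server itself, so node 0 must already be running as the real primary.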

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Unfortunately, node 0 could not be promoted because pgpool still considered it down. I have since fixed this, and pgpool now recognizes the new replication configuration. The solution is below:
> 
> 1) First, I stopped the pgpool service (Debian variant):
> sudo service pgpool2 stop
> 
> 2) Then I started the pgpool process manually with the --discard-status option ("-D, Discard pgpool_status file and do not restore previous status"):
> pgpool -n -D &
> Note: The -n (--dont-detach) option was used only to show the output on the tty.
> 
> 3) Finally, I restarted pgpool so that it runs as a service again:
> service pgpool2 restart
> 
> Result:
> show pool_nodes;
> -[ RECORD 1 ]-----+---------------
> node_id           | 0
> hostname          | 192.168.0.1
> port              | 5432
> status            | up
> lb_weight         | 0.500000
> role              | primary
> select_cnt        | 1
> load_balance_node | false
> replication_delay | 0
> -[ RECORD 2 ]-----+---------------
> node_id           | 1
> hostname          | 192.168.0.2
> port              | 5432
> status            | up
> lb_weight         | 0.500000
> role              | standby
> select_cnt        | 0
> load_balance_node | true
> replication_delay | 0
> 
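
To check this result from a script rather than by eye, the expanded show pool_nodes output above could be reduced to "node_id status" pairs. The following is a minimal sketch under stated assumptions: pool_node_status is a hypothetical helper name, and the psql connection settings in the usage line (port 9999, user postgres) are assumptions that must be adapted to the local setup.

```shell
#!/bin/sh
# Sketch: print one "node_id status" line per backend, reading the expanded
# (psql -x) output of "show pool_nodes" from stdin.
# pool_node_status is a hypothetical helper name, not a pgpool tool.
pool_node_status() {
  # Drop any leading "> " quote markers and spaces, then split on "|".
  sed 's/^[> ]*//' | awk -F'|' '
    {
      key = $1; val = $2
      gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim the column name
      gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim the value
    }
    key == "node_id" { id = val }        # remember the current node
    key == "status"  { print id, val }   # emit "node_id status"
  '
}

# Usage (assumed connection settings):
#   psql -h localhost -p 9999 -U postgres -x -c 'show pool_nodes' | pool_node_status
```

A node that still shows "down" in this list would need to be attached again before it can be promoted.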
> Regards
> Juliano
> 
>> -------- Original Message --------
>> Subject: Re: [pgpool-hackers: 2596] Manual failover with pgpool and repmgr
>> Local Time: November 15, 2017 9:03 PM
>> UTC Time: November 15, 2017 9:03 PM
>> From: ishii at sraoss.co.jp
>> To: jplinux at protonmail.com
>> ishii at sraoss.co.jp, pgpool-hackers at pgpool.net
>>
>> Assuming that node 0 is already up and running, and node 1 is a
>> standby node connecting to node 0, then you can use pcp_promote_node
>> to make node 0 a primary again.
>>
>> Best regards,
>>
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>>> Thanks for your suggestion. I have raised this question on the repmgr group as well.
>>> So, a different question could be: how can I change the status of node 0 from DOWN to UP in pool_nodes?
>>> show pool_nodes;
>>> -[ RECORD 1 ]-----+---------------
>>> node_id           | 0
>>> hostname          | 192.168.0.1
>>> port              | 5432
>>> status            | down
>>> lb_weight         | 0.500000
>>> role              | standby
>>> select_cnt        | 0
>>> load_balance_node | false
>>> replication_delay | 0
>>> Then I might be able to promote this node to primary again:
>>> pcp_promote_node 0 -U postgres -h localhost
>>>
>>>> -------- Original Message --------
>>>> Subject: Re: [pgpool-hackers: 2596] Manual failover with pgpool and repmgr
>>>> Local Time: November 15, 2017 11:11 AM
>>>> UTC Time: November 15, 2017 11:11 AM
>>>> From: ishii at sraoss.co.jp
>>>> To: jplinux at protonmail.com
>>>> pgpool-hackers at pgpool.net
>>>> The Pgpool-II development team does not guarantee that Pgpool-II works with repmgr.
>>>> You would probably do better to ask someone who is familiar with repmgr.
>>>> Best regards,
>>>> Tatsuo Ishii
>>>> SRA OSS, Inc. Japan
>>>> English: http://www.sraoss.co.jp/index_en.php
>>>> Japanese:http://www.sraoss.co.jp
>>>>
>>>>> Hi guys
>>>>> After executing a manual failover, I recovered the repmgr replication between s1 (master - read/write) and s2 (standby - read only):
>>>>> repmgr cluster show
>>>>> Role      | Name | Upstream | Connection String
>>>>> ----------+------|----------|----------------------------------------------
>>>>> * master  | s1   |          | host=192.168.0.1 dbname=repmgr user=repmgr
>>>>>   standby | s2   | s1       | host=192.168.0.2 dbname=repmgr user=repmgr
>>>>>
>>>>> The problem is that after swapping the active nodes using repmgr (1. stop postgres on the standby, 2. promote the master, 3. clone the standby), pgpool does not recognize the nodes correctly and shows the master node as down:
>>>>> show pool_nodes;
>>>>>  node_id | hostname    | port | status | lb_weight | role    | select_cnt | load_balance_node | replication_delay
>>>>> ---------+-------------+------+--------+-----------+---------+------------+-------------------+-------------------
>>>>>  0       | 192.168.0.1 | 5432 | down   | 0.500000  | standby | 0          | false             | 0
>>>>>  1       | 192.168.0.2 | 5432 | up     | 0.500000  | standby | 0          | true              | 0
>>>>> Replication is working fine, and repmgr shows that everything is correct:
>>>>> repmgr cluster show
>>>>> Role      | Name | Upstream | Connection String
>>>>> ----------+------|----------|----------------------------------------------
>>>>> * master  | s1   |          | host=192.168.0.1 dbname=repmgr user=repmgr
>>>>>   standby | s2   | s1       | host=192.168.0.2 dbname=repmgr user=repmgr
>>>>>
>>>>> So I tried to fix pgpool using the pcp commands, without success, and also restarted the pgpool service.
>>>>> The detach command is not accepted:
>>>>> pcp_detach_node 0 -h localhost -U postgres
>>>>> ERROR: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
>>>>> I can promote node 0 (down), but nothing happens:
>>>>> pcp_promote_node 0 -U postgres -h localhost
>>>>> pcp_promote_node -- Command Successful
>>>>> show pool_nodes
>>>>>  node_id | hostname    | port | status | lb_weight | role    | select_cnt | load_balance_node | replication_delay
>>>>> ---------+-------------+------+--------+-----------+---------+------------+-------------------+-------------------
>>>>>  0       | 192.168.0.1 | 5432 | down   | 0.500000  | standby | 0          | false             | 0
>>>>>  1       | 192.168.0.2 | 5432 | up     | 0.500000  | standby | 3          | true              | 0
>>>>> (2 rows)
>>>>> And I can't recover node 1 (standby):
>>>>> pcp_recovery_node 1 -U postgres -h localhost
>>>>> ERROR: process recovery request failed
>>>>> DETAIL: primary server cannot be recovered by online recovery.
>>>>> Here is the main configuration in pgpool.conf:
>>>>> backend_flag0 = 'ALLOW_TO_FAILOVER'
>>>>> backend_flag1 = 'ALLOW_TO_FAILOVER'
>>>>> load_balance_mode = on
>>>>> master_slave_mode = on
>>>>> master_slave_sub_mode = 'stream'
>>>>> failover_command = ''
>>>>> recovery_1st_stage_command = ''
>>>>> Please help me. I don't know what I am doing wrong.