diff --git a/doc.ja/src/sgml/advanced.sgml b/doc.ja/src/sgml/advanced.sgml index 86c36df..6f1ef10 100644 --- a/doc.ja/src/sgml/advanced.sgml +++ b/doc.ja/src/sgml/advanced.sgml @@ -50,8 +50,8 @@ @@ -891,10 +891,10 @@ The heart of a watchdog process is a state machine that starts from its initial state (WD_LOADING) and transit towards either standby (WD_STANDBY) or - master/coordinator (WD_COORDINATOR) state. - Both standby and master/coordinator states are stable states of the + leader/coordinator (WD_COORDINATOR) state. + Both standby and leader/coordinator states are stable states of the watchdog state machine and the node stays in standby or - master/coordinator state until some problem in local + leader/coordinator state until some problem in local Pgpool-II node is detected or a remote Pgpool-II disconnects from the cluster. --> @@ -947,7 +947,7 @@ 参加しているすべてのPgpool-IIノードと通信し、マスター/コーディネーターノードの選択を調停し、クラスタのクォーラムを確実にする diff --git a/doc.ja/src/sgml/example-AWS.sgml b/doc.ja/src/sgml/example-AWS.sgml index 214b189..244c2cd 100644 --- a/doc.ja/src/sgml/example-AWS.sgml +++ b/doc.ja/src/sgml/example-AWS.sgml @@ -104,9 +104,9 @@ which we will not set in this example instead we will use and to switch the - Elastic IP address to the master/Active Pgpool-II node. + Elastic IP address to the leader/Active Pgpool-II node. --> - この例の設定はとほとんど同じになりますが、を設定せず、代わりにを使ってmaster/Active Pgpool-IIノードのElastic IPアドレスを切り替えるのが異なります。 + この例の設定はとほとんど同じになりますが、を設定せず、代わりにを使ってleader/Active Pgpool-IIノードのElastic IPアドレスを切り替えるのが異なります。 @@ -187,10 +187,10 @@ - このスクリプトは、watchdogがactive/masterノードになったときに、Elastic IPをアサインするためにwatchdogが実行します。 + このスクリプトは、watchdogがactive/leaderノードになったときに、Elastic IPをアサインするためにwatchdogが実行します。 aws-escalation.sh: @@ -222,9 +222,9 @@ - このスクリプトは、watchdogがactive/masterノードを退任するときに、Elastic IPのアサインを解除するためにwatchdogが実行します。 + このスクリプトは、watchdogがactive/leaderノードを退任するときに、Elastic IPのアサインを解除するためにwatchdogが実行します。 aws-de-escalation.sh: @@ -288,11 +288,11 @@ それぞれのサーバ上でPgpool-IIを"-n"スイッチ付きで起動し、pgpool.logにログメッセージをリダイレクトします。 - master/active Pgpool-IIノードは、Elastic IPのアサインメッセージを表示します。 + leader/active Pgpool-IIノードは、Elastic IPのアサインメッセージを表示します。 LOG: I am the cluster leader node. Starting escalation process LOG: escalation process started with PID:23543 @@ -357,8 +357,8 @@ LOG: watchdog node state changed from [JOINING] to [INITIALIZING] LOG: I am the only alive node in the watchdog cluster HINT: skipping stand for coordinator state - LOG: watchdog node state changed from [INITIALIZING] to [MASTER] - LOG: I am announcing my self as master/coordinator watchdog node + LOG: watchdog node state changed from [INITIALIZING] to [LEADER] + LOG: I am announcing my self as leader/coordinator watchdog node LOG: I am the cluster leader node DETAIL: our declare coordinator message is accepted by all nodes LOG: I am the cluster leader node. 
Starting escalation process diff --git a/doc.ja/src/sgml/example-cluster.sgml b/doc.ja/src/sgml/example-cluster.sgml index 40a0831..3bcd850 100644 --- a/doc.ja/src/sgml/example-cluster.sgml +++ b/doc.ja/src/sgml/example-cluster.sgml @@ -985,14 +985,14 @@ arping_path = '/usr/sbin' watchdogアクティブ/スタンバイの切り替え - pcp_watchdog_infoPgpool-IIwatchdogの情報を確認します。最初に起動したPgpool-IIが「MASTER」になります。 + pcp_watchdog_infoPgpool-IIwatchdogの情報を確認します。最初に起動したPgpool-IIが「LEADER」になります。 # pcp_watchdog_info -h 192.168.137.150 -p 9898 -U pgpool Password: 3 YES server1:9999 Linux server1 server1 - server1:9999 Linux server1 server1 9999 9000 4 MASTER #最初に起動されたサーバがMASTERになる + server1:9999 Linux server1 server1 9999 9000 4 LEADER #最初に起動されたサーバがLEADERになる server2:9999 Linux server2 server2 9999 9000 7 STANDBY #スタンバイとして稼働 server3:9999 Linux server3 server3 9999 9000 7 STANDBY #スタンバイとして稼働 @@ -1006,7 +1006,7 @@ arping_path = '/usr/sbin' Password: 3 YES server2:9999 Linux server2 server2 - server2:9999 Linux server2 server2 9999 9000 4 MASTER #server2がアクティブに昇格 + server2:9999 Linux server2 server2 9999 9000 4 LEADER #server2がアクティブに昇格 server1:9999 Linux server1 server1 9999 9000 10 SHUTDOWN #server1が停止された server3:9999 Linux server3 server3 9999 9000 7 STANDBY #スタンバイとして稼働 @@ -1020,7 +1020,7 @@ arping_path = '/usr/sbin' Password: 3 YES server2:9999 Linux server2 server2 - server2:9999 Linux server2 server2 9999 9000 4 MASTER + server2:9999 Linux server2 server2 9999 9000 4 LEADER server1:9999 Linux server1 server1 9999 9000 7 STANDBY server3:9999 Linux server3 server3 9999 9000 7 STANDBY diff --git a/doc.ja/src/sgml/example-watchdog.sgml b/doc.ja/src/sgml/example-watchdog.sgml index 2fd3ed1..ebca9b5 100644 --- a/doc.ja/src/sgml/example-watchdog.sgml +++ b/doc.ja/src/sgml/example-watchdog.sgml @@ -209,14 +209,14 @@ --> ログから、仮想IP アドレスを使用し、またwatchdogプロセス起動したことが確認できます。 - LOG: I am announcing my self as master/coordinator watchdog node + LOG: I am announcing my self as leader/coordinator watchdog node LOG: I am the cluster leader node DETAIL: our declare coordinator message is accepted by all nodes LOG: I am the cluster leader node. Starting escalation process LOG: escalation process started with PID:59449 LOG: watchdog process is initialized LOG: watchdog: escalation started - LOG: I am the master watchdog node + LOG: I am the leader watchdog node DETAIL: using the local backend node status @@ -340,8 +340,8 @@ LOG: watchdog node state changed from [JOINING] to [INITIALIZING] LOG: I am the only alive node in the watchdog cluster HINT: skipping stand for coordinator state - LOG: watchdog node state changed from [INITIALIZING] to [MASTER] - LOG: I am announcing my self as master/coordinator watchdog node + LOG: watchdog node state changed from [INITIALIZING] to [LEADER] + LOG: I am announcing my self as leader/coordinator watchdog node LOG: I am the cluster leader node DETAIL: our declare coordinator message is accepted by all nodes diff --git a/doc.ja/src/sgml/ref/pcp_watchdog_info.sgml b/doc.ja/src/sgml/ref/pcp_watchdog_info.sgml index d3051a3..dd24be4 100644 --- a/doc.ja/src/sgml/ref/pcp_watchdog_info.sgml +++ b/doc.ja/src/sgml/ref/pcp_watchdog_info.sgml @@ -134,7 +134,7 @@ Pgpool-II documentation 3 NO Linux_host1.localdomain_9991 host1 Linux_host1.localdomain_9991 host1 9991 9001 7 STANDBY - Linux_host2.localdomain_9992 host2 9992 9002 4 MASTER + Linux_host2.localdomain_9992 host2 9992 9002 4 LEADER Linux_host3.localdomain_9993 host3 9993 9003 7 STANDBY @@ -151,8 +151,8 @@ Pgpool-II documentation 1. 
クラスタ内の全 watchdog ノード数 @@ -196,8 +196,8 @@ Pgpool-II documentation Quorum state : QUORUM EXIST Alive Remote Nodes : 2 VIP up on local node : NO - Master Node Name : Linux_host2.localdomain_9992 - Master Host Name : localhost + Leader Node Name : Linux_host2.localdomain_9992 + Leader Host Name : localhost Watchdog Node Information Node Name : Linux_host1.localdomain_9991 @@ -216,7 +216,7 @@ Pgpool-II documentation Watchdog port : 9002 Node priority : 1 Status : 4 - Status Name : MASTER + Status Name : LEADER Node Name : Linux_host3.localdomain_9993 Host Name : host3 diff --git a/doc.ja/src/sgml/ref/watchdog_setup.sgml b/doc.ja/src/sgml/ref/watchdog_setup.sgml index 68e2f78..f14dcb1 100644 --- a/doc.ja/src/sgml/ref/watchdog_setup.sgml +++ b/doc.ja/src/sgml/ref/watchdog_setup.sgml @@ -484,8 +484,8 @@ Pgpool-II documentation Quorum state : QUORUM EXIST Alive Remote Nodes : 2 VIP up on local node : NO - Master Node Name : Linux_tishii-CF-SX3HE4BP_50004 - Master Host Name : localhost + Leader Node Name : Linux_tishii-CF-SX3HE4BP_50004 + Leader Host Name : localhost Watchdog Node Information Node Name : Linux_tishii-CF-SX3HE4BP_50000 @@ -504,7 +504,7 @@ Pgpool-II documentation Watchdog port : 50006 Node priority : 1 Status : 4 - Status Name : MASTER + Status Name : LEADER Node Name : Linux_tishii-CF-SX3HE4BP_50008 Host Name : localhost diff --git a/doc.ja/src/sgml/watchdog.sgml b/doc.ja/src/sgml/watchdog.sgml index abb6396..298fcd9 100644 --- a/doc.ja/src/sgml/watchdog.sgml +++ b/doc.ja/src/sgml/watchdog.sgml @@ -610,7 +610,7 @@ マスターwatchdogに昇格した時に、ここで指定したコマンドがwatchdogによって実行されます。 @@ -642,10 +642,10 @@ Pgpool-IIのマスターwatchdogが責務を辞退し降格するときに、ここで指定したコマンドが実行されます。 @@ -1183,15 +1183,15 @@ このパラメータによってローカルのwatchdogノードがマスターに選ばれる優先度を上げることができます。 古いマスターノードが故障した状況でクラスタがマスターノードの選択を行う際に、wd_priorityが高いノードがマスターwatchdogノードに選ばれます。 diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml index ebd8c5f..2f915e6 100644 --- a/doc/src/sgml/advanced.sgml +++ b/doc/src/sgml/advanced.sgml @@ -39,8 +39,8 @@ At the startup, if the watchdog is enabled, Pgpool-II node - sync the status of all configured backend nodes from the master watchdog node. - And if the node goes on to become a master node itself it initializes the backend + sync the status of all configured backend nodes from the leader watchdog node. + And if the node goes on to become a leader node itself it initializes the backend status locally. When a backend node status changes by failover etc.., watchdog notifies the information to other Pgpool-II nodes and synchronizes them. When online recovery occurs, watchdog restricts @@ -122,7 +122,7 @@ At startup watchdog verifies the Pgpool-II configuration of the local node for the consistency with the configurations - on the master watchdog node and warns the user of any differences. + on the leader watchdog node and warns the user of any differences. This eliminates the likelihood of undesired behavior that can happen because of different configuration on different Pgpool-II nodes. @@ -596,10 +596,10 @@ The heart of a watchdog process is a state machine that starts from its initial state (WD_LOADING) and transit towards either standby (WD_STANDBY) or - master/coordinator (WD_COORDINATOR) state. - Both standby and master/coordinator states are stable states of the + leader/coordinator (WD_COORDINATOR) state. 
+ Both standby and leader/coordinator states are stable states of the watchdog state machine and the node stays in standby or - master/coordinator state until some problem in local + leader/coordinator state until some problem in local Pgpool-II node is detected or a remote Pgpool-II disconnects from the cluster. @@ -634,7 +634,7 @@ Communicates with all the participating Pgpool-II nodes to coordinate the selection of - master/coordinator node and to ensure the quorum in the cluster. + leader/coordinator node and to ensure the quorum in the cluster. diff --git a/doc/src/sgml/example-AWS.sgml b/doc/src/sgml/example-AWS.sgml index daa63c6..35dcd0d 100644 --- a/doc/src/sgml/example-AWS.sgml +++ b/doc/src/sgml/example-AWS.sgml @@ -67,7 +67,7 @@ which we will not set in this example instead we will use and to switch the - Elastic IP address to the master/Active Pgpool-II node. + Elastic IP address to the leader/Active Pgpool-II node. @@ -128,7 +128,7 @@ This script will be executed by the watchdog - to assign the Elastic IP on the instance when the watchdog becomes the active/master node. + to assign the Elastic IP on the instance when the watchdog becomes the active/leader node. Change the INSTANCE_ID and ELASTIC_IP values as per your AWS setup values. @@ -158,7 +158,7 @@ This script will be executed by watchdog - to remove the Elastic IP from the instance when the watchdog resign from the active/master node. + to remove the Elastic IP from the instance when the watchdog resigns from the active/leader node. aws-de-escalation.sh: @@ -215,7 +215,7 @@ Start Pgpool-II on each server with "-n" switch and redirect log messages to the pgpool.log file. - The log message of master/active Pgpool-II node + The log message of leader/active Pgpool-II node will show the message of Elastic IP assignment. LOG: I am the cluster leader node. Starting escalation process @@ -268,8 +268,8 @@ LOG: watchdog node state changed from [JOINING] to [INITIALIZING] LOG: I am the only alive node in the watchdog cluster HINT: skipping stand for coordinator state - LOG: watchdog node state changed from [INITIALIZING] to [MASTER] - LOG: I am announcing my self as master/coordinator watchdog node + LOG: watchdog node state changed from [INITIALIZING] to [LEADER] + LOG: I am announcing my self as leader/coordinator watchdog node LOG: I am the cluster leader node DETAIL: our declare coordinator message is accepted by all nodes LOG: I am the cluster leader node. Starting escalation process diff --git a/doc/src/sgml/example-cluster.sgml b/doc/src/sgml/example-cluster.sgml index e615864..0fa401a 100644 --- a/doc/src/sgml/example-cluster.sgml +++ b/doc/src/sgml/example-cluster.sgml @@ -1059,14 +1059,14 @@ arping_path = '/usr/sbin' Switching active/standby watchdog - Confirm the watchdog status by using pcp_watchdog_info. The Pgpool-II server which is started first run as MASTER. + Confirm the watchdog status by using pcp_watchdog_info. The Pgpool-II server which is started first runs as LEADER. # pcp_watchdog_info -h 192.168.137.150 -p 9898 -U pgpool Password: 3 YES server1:9999 Linux server1 server1 - server1:9999 Linux server1 server1 9999 9000 4 MASTER #The Pgpool-II server started first becames "MASTER". + server1:9999 Linux server1 server1 9999 9000 4 LEADER #The Pgpool-II server started first becomes "LEADER".
server2:9999 Linux server2 server2 9999 9000 7 STANDBY #run as standby server3:9999 Linux server3 server3 9999 9000 7 STANDBY #run as standby @@ -1083,7 +1083,7 @@ arping_path = '/usr/sbin' Password: 3 YES server2:9999 Linux server2 server2 - server2:9999 Linux server2 server2 9999 9000 4 MASTER #server2 is promoted to MASTER + server2:9999 Linux server2 server2 9999 9000 4 LEADER #server2 is promoted to LEADER server1:9999 Linux server1 server1 9999 9000 10 SHUTDOWN #server1 is stopped server3:9999 Linux server3 server3 9999 9000 7 STANDBY #server3 runs as STANDBY @@ -1098,7 +1098,7 @@ arping_path = '/usr/sbin' Password: 3 YES server2:9999 Linux server2 server2 - server2:9999 Linux server2 server2 9999 9000 4 MASTER + server2:9999 Linux server2 server2 9999 9000 4 LEADER server1:9999 Linux server1 server1 9999 9000 7 STANDBY server3:9999 Linux server3 server3 9999 9000 7 STANDBY diff --git a/doc/src/sgml/example-watchdog.sgml b/doc/src/sgml/example-watchdog.sgml index 3019354..c5fcdc9 100644 --- a/doc/src/sgml/example-watchdog.sgml +++ b/doc/src/sgml/example-watchdog.sgml @@ -132,14 +132,14 @@ Log messages will show that Pgpool-II has the virtual IP address and starts watchdog process. - LOG: I am announcing my self as master/coordinator watchdog node + LOG: I am announcing my self as leader/coordinator watchdog node LOG: I am the cluster leader node DETAIL: our declare coordinator message is accepted by all nodes LOG: I am the cluster leader node. Starting escalation process LOG: escalation process started with PID:59449 LOG: watchdog process is initialized LOG: watchdog: escalation started - LOG: I am the master watchdog node + LOG: I am the leader watchdog node DETAIL: using the local backend node status @@ -229,8 +229,8 @@ LOG: watchdog node state changed from [JOINING] to [INITIALIZING] LOG: I am the only alive node in the watchdog cluster HINT: skipping stand for coordinator state - LOG: watchdog node state changed from [INITIALIZING] to [MASTER] - LOG: I am announcing my self as master/coordinator watchdog node + LOG: watchdog node state changed from [INITIALIZING] to [LEADER] + LOG: I am announcing my self as leader/coordinator watchdog node LOG: I am the cluster leader node DETAIL: our declare coordinator message is accepted by all nodes diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml index c0afa95..6614e54 100644 --- a/doc/src/sgml/installation.sgml +++ b/doc/src/sgml/installation.sgml @@ -66,19 +66,19 @@ database unavailability due to the Pgpool-II being down. Multiple Pgpool-II work together and monitor - each other. One of them is called "master" and it has a virtual + each other. One of them is called "leader" and it has a virtual IP. Clients do not need to aware that there are multiple Pgpool-II because they always access the same VIP. (See for watchdog). If one of Pgpool-II goes down, other Pgpool-II takes over the - master role. + leader role. - Since it is not allowed to have multiple master, watchdog votes to - decide a new master. If there are even number of + Since it is not allowed to have multiple leader, watchdog votes to + decide a new leader. If there are even number of Pgpool-II, it is impossible to decide - the new master by voting. Thus we recommend to deploy + the new leader by voting. Thus we recommend to deploy Pgpool-II in more than 3 odd numbers. 
diff --git a/doc/src/sgml/ref/pcp_watchdog_info.sgml b/doc/src/sgml/ref/pcp_watchdog_info.sgml index 088a9df..d9a29d3 100644 --- a/doc/src/sgml/ref/pcp_watchdog_info.sgml +++ b/doc/src/sgml/ref/pcp_watchdog_info.sgml @@ -79,7 +79,7 @@ Pgpool-II documentation 3 NO Linux_host1.localdomain_9991 host1 Linux_host1.localdomain_9991 host1 9991 9001 7 STANDBY - Linux_host2.localdomain_9992 host2 9992 9002 4 MASTER + Linux_host2.localdomain_9992 host2 9992 9002 4 LEADER Linux_host3.localdomain_9993 host3 9993 9003 7 STANDBY @@ -90,8 +90,8 @@ Pgpool-II documentation 1. Total watchdog nodes in the cluster 2. Is VIP is up on current node? - 3. Master node name - 4. Master node host + 3. Leader node name + 4. Leader node host Next is the list of watchdog nodes: @@ -116,8 +116,8 @@ Pgpool-II documentation Quorum state : QUORUM EXIST Alive Remote Nodes : 2 VIP up on local node : NO - Master Node Name : Linux_host2.localdomain_9992 - Master Host Name : localhost + Leader Node Name : Linux_host2.localdomain_9992 + Leader Host Name : localhost Watchdog Node Information Node Name : Linux_host1.localdomain_9991 @@ -136,7 +136,7 @@ Pgpool-II documentation Watchdog port : 9002 Node priority : 1 Status : 4 - Status Name : MASTER + Status Name : LEADER Node Name : Linux_host3.localdomain_9993 Host Name : host3 diff --git a/doc/src/sgml/ref/watchdog_setup.sgml b/doc/src/sgml/ref/watchdog_setup.sgml index 69f3147..18fb13d 100644 --- a/doc/src/sgml/ref/watchdog_setup.sgml +++ b/doc/src/sgml/ref/watchdog_setup.sgml @@ -389,8 +389,8 @@ Pgpool-II documentation Quorum state : QUORUM EXIST Alive Remote Nodes : 2 VIP up on local node : NO - Master Node Name : Linux_tishii-CF-SX3HE4BP_50004 - Master Host Name : localhost + Leader Node Name : Linux_tishii-CF-SX3HE4BP_50004 + Leader Host Name : localhost Watchdog Node Information Node Name : Linux_tishii-CF-SX3HE4BP_50000 @@ -409,7 +409,7 @@ Pgpool-II documentation Watchdog port : 50006 Node priority : 1 Status : 4 - Status Name : MASTER + Status Name : LEADER Node Name : Linux_tishii-CF-SX3HE4BP_50008 Host Name : localhost diff --git a/doc/src/sgml/watchdog.sgml b/doc/src/sgml/watchdog.sgml index b62020f..a2e205f 100644 --- a/doc/src/sgml/watchdog.sgml +++ b/doc/src/sgml/watchdog.sgml @@ -397,7 +397,7 @@ Watchdog executes this command on the node that is escalated - to the master watchdog. + to the leader watchdog. This command is executed just before bringing up the @@ -417,10 +417,10 @@ - Watchdog executes this command on the master Pgpool-II - watchdog node when that node resigns from the master node responsibilities. - A master watchdog node can resign from being a master node, - when the master node Pgpool-II shuts down, detects a network + Watchdog executes this command on the leader Pgpool-II + watchdog node when that node resigns from the leader node responsibilities. + A leader watchdog node can resign from being a leader node, + when the leader node Pgpool-II shuts down, detects a network blackout or detects the lost of quorumquorum. @@ -538,14 +538,14 @@ solved. 
- From Pgpool-II V4.1 onward, if the watchdog-master node + From Pgpool-II V4.1 onward, if the watchdog-leader node fails to build the consensus for primary backend node failover and the primary backend node gets into a - quarantine state, then it resigns from its master/coordinator responsibilities and lowers its wd_priority + quarantine state, then it resigns from its leader/coordinator responsibilities and lowers its wd_priority for next leader election and let the cluster elect some different new leader. - When the master node fails to build the consensus for standby backend node failure, it takes no action - and similarly quarantined standby backend nodes on watchdog-master do not trigger a new leader election. + When the leader node fails to build the consensus for standby backend node failure, it takes no action + and similarly quarantined standby backend nodes on watchdog-leader do not trigger a new leader election. @@ -710,7 +710,7 @@ cluster goes into two separated networks (A, B) and (C, D). For (A, B) and (C, D) the quorum still exist since for both groups there are two live nodes out of 4. The two groups choose their - own master watchdog, which is a split-brain. + own leader watchdog, which is a split-brain. Default is off. @@ -844,15 +844,15 @@ This parameter can be used to elevate the local watchdog node priority in the elections - to select master watchdog node. + to select leader watchdog node. The node with the higher wd_priority value will get selected - as master watchdog node when cluster will be electing its new master node - in the event of old master watchdog node failure. + as leader watchdog node when cluster will be electing its new leader node + in the event of old leader watchdog node failure. wd_priority is also valid at the time of cluster startup. When some watchdog nodes start up at same time,a node with the higher wd_priority - value is selected as a master node. + value is selected as a leader node. So we should start watchdog nodes in order of wd_priority priority to prevent - unintended nodes from being selected as masters. + unintended nodes from being selected as leader. 
wd_priority is not available in versions prior to diff --git a/src/config/pool_config_variables.c b/src/config/pool_config_variables.c index 88d0a87..8de5378 100644 --- a/src/config/pool_config_variables.c +++ b/src/config/pool_config_variables.c @@ -622,7 +622,7 @@ static struct config_bool ConfigureNamesBool[] = { {"clear_memqcache_on_escalation", CFGCXT_RELOAD, WATCHDOG_CONFIG, - "Clears the query cache in the shared memory when pgpool-II escaltes to master watchdog node.", + "Clears the query cache in the shared memory when pgpool-II escalates to leader watchdog node.", CONFIG_VAR_TYPE_BOOL, false, 0 }, &g_pool_config.clear_memqcache_on_escalation, @@ -1005,7 +1005,7 @@ static struct config_string ConfigureNamesString[] = { {"wd_escalation_command", CFGCXT_RELOAD, WATCHDOG_CONFIG, - "Command to execute when watchdog node becomes cluster master/leader node.", + "Command to execute when watchdog node becomes cluster leader node.", CONFIG_VAR_TYPE_STRING, false, 0 }, &g_pool_config.wd_escalation_command, @@ -1015,7 +1015,7 @@ static struct config_string ConfigureNamesString[] = { {"wd_de_escalation_command", CFGCXT_RELOAD, WATCHDOG_CONFIG, - "Command to execute when watchdog node resigns from the cluster master/leader node.", + "Command to execute when watchdog node resigns from the cluster leader node.", CONFIG_VAR_TYPE_STRING, false, 0 }, &g_pool_config.wd_de_escalation_command, @@ -1035,7 +1035,7 @@ static struct config_string ConfigureNamesString[] = { {"delegate_IP", CFGCXT_INIT, WATCHDOG_CONFIG, - "Delegate IP address to be used when pgpool node become a watchdog cluster master/leader.", + "Delegate IP address to be used when pgpool node becomes a watchdog cluster leader.", CONFIG_VAR_TYPE_STRING, false, 0 }, &g_pool_config.delegate_IP, diff --git a/src/include/pool_config.h b/src/include/pool_config.h index 930b60e..43d8e7e 100644 --- a/src/include/pool_config.h +++ b/src/include/pool_config.h @@ -543,7 +543,7 @@ typedef struct char *wd_escalation_command; /* Executes this command at escalation * on new active pgpool. */ char *wd_de_escalation_command; /* Executes this command when - * master pgpool goes down.
*/ int wd_priority; /* watchdog node priority, during leader * election */ int pgpool_node_id; /* pgpool (watchdog) node id */ diff --git a/src/include/watchdog/watchdog.h b/src/include/watchdog/watchdog.h index 073920e..8c7247c 100644 --- a/src/include/watchdog/watchdog.h +++ b/src/include/watchdog/watchdog.h @@ -176,7 +176,7 @@ typedef struct WatchdogNode int standby_nodes_count; /* number of standby nodes joined the * cluster only applicable when this * WatchdogNode is the - * master/coordinator node */ + * leader/coordinator node */ int quorum_status; /* quorum status on the node */ bool escalated; /* true if the Watchdog node has performed * escalation */ diff --git a/src/include/watchdog/wd_internal_commands.h b/src/include/watchdog/wd_internal_commands.h index d8d7014..afe17df 100644 --- a/src/include/watchdog/wd_internal_commands.h +++ b/src/include/watchdog/wd_internal_commands.h @@ -40,7 +40,7 @@ extern WDFailoverCMDResults wd_promote_backend(int node_id, unsigned char flags) extern WdCommandResult wd_execute_cluster_command(char* clusterCommand, int nArgs, WDExecCommandArg *wdExecCommandArg); -extern WDPGBackendStatus * get_pg_backend_status_from_master_wd_node(void); +extern WDPGBackendStatus * get_pg_backend_status_from_leader_wd_node(void); extern WD_STATES wd_internal_get_watchdog_local_node_state(void); extern int wd_internal_get_watchdog_quorum_state(void); diff --git a/src/include/watchdog/wd_ipc_defines.h b/src/include/watchdog/wd_ipc_defines.h index b7ccb78..e66c455 100644 --- a/src/include/watchdog/wd_ipc_defines.h +++ b/src/include/watchdog/wd_ipc_defines.h @@ -38,7 +38,7 @@ typedef enum WDFailoverCMDResults FAILOVER_RES_WILL_BE_DONE, FAILOVER_RES_NOT_ALLOWED, FAILOVER_RES_INVALID_FUNCTION, - FAILOVER_RES_MASTER_REJECTED, + FAILOVER_RES_LEADER_REJECTED, FAILOVER_RES_BUILDING_CONSENSUS, FAILOVER_RES_CONSENSUS_MAY_FAIL, FAILOVER_RES_TIMEOUT @@ -66,13 +66,13 @@ typedef enum WDValueDataType #define WD_IPC_FAILOVER_COMMAND 'f' #define WD_IPC_ONLINE_RECOVERY_COMMAND 'r' #define WD_FAILOVER_LOCKING_REQUEST 's' -#define WD_GET_MASTER_DATA_REQUEST 'd' +#define WD_GET_LEADER_DATA_REQUEST 'd' #define WD_GET_RUNTIME_VARIABLE_VALUE 'v' #define WD_FAILOVER_INDICATION 'i' #define WD_COMMAND_RESTART_CLUSTER "RESTART_CLUSTER" -#define WD_COMMAND_REELECT_MASTER "REELECT_MASTER" +#define WD_COMMAND_REELECT_LEADER "REELECT_LEADER" #define WD_COMMAND_SHUTDOWN_CLUSTER "SHUTDOWN_CLUSTER" #define WD_COMMAND_RELOAD_CONFIG_CLUSTER "RELOAD_CONFIG_CLUSTER" diff --git a/src/include/watchdog/wd_json_data.h b/src/include/watchdog/wd_json_data.h index 6bd4606..4a45c4e 100644 --- a/src/include/watchdog/wd_json_data.h +++ b/src/include/watchdog/wd_json_data.h @@ -30,7 +30,7 @@ /* * The structure to hold the parsed PG backend node status data fetched - * from the master watchdog node + * from the leader watchdog node */ typedef struct WDPGBackendStatus { diff --git a/src/main/pgpool_main.c b/src/main/pgpool_main.c index 1e085ae..0831ad0 100644 --- a/src/main/pgpool_main.c +++ b/src/main/pgpool_main.c @@ -1249,8 +1249,8 @@ sigusr1_interupt_processor(void) if (wd_internal_get_watchdog_local_node_state() == WD_STANDBY) { ereport(LOG, - (errmsg("master watchdog has performed failover"), - errdetail("syncing the backend states from the MASTER watchdog node"))); + (errmsg("leader watchdog has performed failover"), + errdetail("syncing the backend states from the LEADER watchdog node"))); sync_backend_from_watchdog(); } } @@ -1265,7 +1265,7 @@ sigusr1_interupt_processor(void) { ereport(LOG, (errmsg("we have 
joined the watchdog cluster as STANDBY node"), - errdetail("syncing the backend states from the MASTER watchdog node"))); + errdetail("syncing the backend states from the LEADER watchdog node"))); sync_backend_from_watchdog(); } } @@ -3785,7 +3785,7 @@ update_backend_quarantine_status(void) /* * The function fetch the current status of all configured backend - * nodes from the MASTER/COORDINATOR watchdog Pgpool-II and synchronize the + * nodes from the LEADER/COORDINATOR watchdog Pgpool-II and synchronize the * local backend states with the cluster wide status of each node. * * Latter in the funcrtion after syncing the backend node status the function @@ -3809,14 +3809,14 @@ sync_backend_from_watchdog(void) /* * Ask the watchdog to get all the backend states from the - * Master/Coordinator Pgpool-II node + * Leader/Coordinator Pgpool-II node */ - WDPGBackendStatus *backendStatus = get_pg_backend_status_from_master_wd_node(); + WDPGBackendStatus *backendStatus = get_pg_backend_status_from_leader_wd_node(); if (!backendStatus) { ereport(WARNING, - (errmsg("failed to get the backend status from the master watchdog node"), + (errmsg("failed to get the backend status from the leader watchdog node"), errdetail("using the local backend node status"))); return; } @@ -3824,21 +3824,21 @@ sync_backend_from_watchdog(void) { /* * -ve node count is returned by watchdog when the node itself is a - * master and in that case we need to use the loacl backend node + * leader and in that case we need to use the loacl backend node * status */ ereport(LOG, - (errmsg("I am the master watchdog node"), + (errmsg("I am the leader watchdog node"), errdetail("using the local backend node status"))); pfree(backendStatus); return; } ereport(LOG, - (errmsg("master watchdog node \"%s\" returned status for %d backend nodes", backendStatus->nodeName, backendStatus->node_count))); + (errmsg("leader watchdog node \"%s\" returned status for %d backend nodes", backendStatus->nodeName, backendStatus->node_count))); ereport(DEBUG1, - (errmsg("primary node on master watchdog node \"%s\" is %d", backendStatus->nodeName, backendStatus->primary_node_id))); + (errmsg("primary node on leader watchdog node \"%s\" is %d", backendStatus->nodeName, backendStatus->primary_node_id))); /* @@ -3858,7 +3858,7 @@ sync_backend_from_watchdog(void) node_status_was_changed_to_down = true; ereport(LOG, (errmsg("backend:%d is set to down status", i), - errdetail("backend:%d is DOWN on cluster master \"%s\"", i, backendStatus->nodeName))); + errdetail("backend:%d is DOWN on cluster leader \"%s\"", i, backendStatus->nodeName))); down_node_ids[down_node_ids_index++] = i; } } @@ -3877,7 +3877,7 @@ sync_backend_from_watchdog(void) ereport(LOG, (errmsg("backend:%d is set to UP status", i), - errdetail("backend:%d is UP on cluster master \"%s\"", i, backendStatus->nodeName))); + errdetail("backend:%d is UP on cluster leader \"%s\"", i, backendStatus->nodeName))); } } @@ -3885,7 +3885,7 @@ sync_backend_from_watchdog(void) /* * Update primary node id info on the shared memory area if it's different - * from the one on master watchdog node. This should be done only in streaming + * from the one on leader watchdog node. This should be done only in streaming * or logical replication mode. 
*/ if (SL_MODE && Req_info->primary_node_id != backendStatus->primary_node_id) @@ -3893,13 +3893,13 @@ sync_backend_from_watchdog(void) /* Do not produce this log message if we are starting up the Pgpool-II */ if (processState != INITIALIZING) ereport(LOG, - (errmsg("primary node:%d on master watchdog node \"%s\" is different from local primary node:%d", + (errmsg("primary node:%d on leader watchdog node \"%s\" is different from local primary node:%d", backendStatus->primary_node_id, backendStatus->nodeName, Req_info->primary_node_id))); /* - * master node returns primary_node_id = -1 when the primary node is - * in quarantine state on the master. So we will not update our + * leader node returns primary_node_id = -1 when the primary node is + * in quarantine state on the leader. So we will not update our * primary node id when the status of current primary node is not - * CON_DOWN while primary_node_id sent by master watchdong node is -1 + * CON_DOWN while primary_node_id sent by leader watchdong node is -1 * * Note that Req_info->primary_node_id could be -2, which is the * initial value. So we need to avoid crash by checking the value is @@ -3910,7 +3910,7 @@ sync_backend_from_watchdog(void) backendStatus->primary_node_id == -1 && BACKEND_INFO(Req_info->primary_node_id).backend_status != CON_DOWN) { ereport(LOG, - (errmsg("primary node:%d on master watchdog node \"%s\" seems to be quarantined", + (errmsg("primary node:%d on leader watchdog node \"%s\" seems to be quarantined", Req_info->primary_node_id, backendStatus->nodeName), errdetail("keeping the current primary"))); } diff --git a/src/main/pool_internal_comms.c b/src/main/pool_internal_comms.c index 89a5329..e8b3310 100644 --- a/src/main/pool_internal_comms.c +++ b/src/main/pool_internal_comms.c @@ -209,7 +209,7 @@ degenerate_backend_set_ex(int *node_id_set, int count, unsigned char flags, bool } else if (res == FAILOVER_RES_WILL_BE_DONE) { - /* we will receive a sync request from master watchdog node */ + /* we will receive a sync request from leader watchdog node */ ereport(LOG, (errmsg("degenerate backend request for %d node(s) from pid [%d], will be handled by watchdog" ,node_count, getpid()))); diff --git a/src/sample/pgpool.conf.sample-logical b/src/sample/pgpool.conf.sample-logical index ac8af71..dae80cc 100644 --- a/src/sample/pgpool.conf.sample-logical +++ b/src/sample/pgpool.conf.sample-logical @@ -665,7 +665,7 @@ wd_escalation_command = '' # Executes this command at escalation on new active pgpool. # (change requires restart) wd_de_escalation_command = '' - # Executes this command when master pgpool resigns from being master. + # Executes this command when leader pgpool resigns from being leader. # (change requires restart) # - Watchdog consensus settings for failover - diff --git a/src/sample/pgpool.conf.sample-raw b/src/sample/pgpool.conf.sample-raw index 40452c6..62b0f2b 100644 --- a/src/sample/pgpool.conf.sample-raw +++ b/src/sample/pgpool.conf.sample-raw @@ -703,7 +703,7 @@ wd_escalation_command = '' # Executes this command at escalation on new active pgpool. # (change requires restart) wd_de_escalation_command = '' - # Executes this command when master pgpool resigns from being master. + # Executes this command when leader pgpool resigns from being leader. 
# (change requires restart) # - Watchdog consensus settings for failover - diff --git a/src/sample/pgpool.conf.sample-replication b/src/sample/pgpool.conf.sample-replication index 0743970..3db2e08 100644 --- a/src/sample/pgpool.conf.sample-replication +++ b/src/sample/pgpool.conf.sample-replication @@ -701,7 +701,7 @@ wd_escalation_command = '' # Executes this command at escalation on new active pgpool. # (change requires restart) wd_de_escalation_command = '' - # Executes this command when master pgpool resigns from being master. + # Executes this command when leader pgpool resigns from being leader. # (change requires restart) # - Watchdog consensus settings for failover - diff --git a/src/sample/pgpool.conf.sample-slony b/src/sample/pgpool.conf.sample-slony index d8892a0..9baf492 100644 --- a/src/sample/pgpool.conf.sample-slony +++ b/src/sample/pgpool.conf.sample-slony @@ -701,7 +701,7 @@ wd_escalation_command = '' # Executes this command at escalation on new active pgpool. # (change requires restart) wd_de_escalation_command = '' - # Executes this command when master pgpool resigns from being master. + # Executes this command when leader pgpool resigns from being leader. # (change requires restart) # - Watchdog consensus settings for failover - diff --git a/src/sample/pgpool.conf.sample-snapshot b/src/sample/pgpool.conf.sample-snapshot index 64ed840..5462fc7 100644 --- a/src/sample/pgpool.conf.sample-snapshot +++ b/src/sample/pgpool.conf.sample-snapshot @@ -699,7 +699,7 @@ wd_escalation_command = '' # Executes this command at escalation on new active pgpool. # (change requires restart) wd_de_escalation_command = '' - # Executes this command when master pgpool resigns from being master. + # Executes this command when leader pgpool resigns from being leader. # (change requires restart) # - Watchdog consensus settings for failover - diff --git a/src/sample/pgpool.conf.sample-stream b/src/sample/pgpool.conf.sample-stream index 579c43f..52c6c0c 100644 --- a/src/sample/pgpool.conf.sample-stream +++ b/src/sample/pgpool.conf.sample-stream @@ -703,7 +703,7 @@ wd_escalation_command = '' # Executes this command at escalation on new active pgpool. # (change requires restart) wd_de_escalation_command = '' - # Executes this command when master pgpool resigns from being master. + # Executes this command when leader pgpool resigns from being leader. # (change requires restart) # - Watchdog consensus settings for failover - diff --git a/src/test/regression/clean.sh b/src/test/regression/clean.sh index cfbc7b8..108058e 100644 --- a/src/test/regression/clean.sh +++ b/src/test/regression/clean.sh @@ -18,19 +18,19 @@ do cd .. 
done -rm -fr $dir/tests/004.watchdog/master +rm -fr $dir/tests/004.watchdog/leader rm -fr $dir/tests/004.watchdog/standby cd $dir/tests/010.rewrite_timestamp/timestamp/; make clean >/dev/null 2>&1; cd $dir -rm -fr $dir/tests/011.watchdoc_quorum_failover/master/ +rm -fr $dir/tests/011.watchdoc_quorum_failover/leader/ rm -fr $dir/tests/011.watchdoc_quorum_failover/standby/ rm -fr $dir/tests/011.watchdoc_quorum_failover/standby2/ -rm -fr $dir/tests/012.watchdog_failover_when_quorum_exists/master/ +rm -fr $dir/tests/012.watchdog_failover_when_quorum_exists/leader/ rm -fr $dir/tests/012.watchdog_failover_when_quorum_exists/standby/ rm -fr $dir/tests/012.watchdog_failover_when_quorum_exists/standby2/ -rm -fr $dir/tests/013.watchdoc_test_failover_require_consensus/master/ +rm -fr $dir/tests/013.watchdoc_test_failover_require_consensus/leader/ rm -fr $dir/tests/013.watchdoc_test_failover_require_consensus/standby/ rm -fr $dir/tests/013.watchdoc_test_failover_require_consensus/standby2/ -rm -fr $dir/tests/014.watchdoc_test_quorum_bypass/master/ -rm -fr $dir/tests/015.watchdoc_test_master_and_backend_fail/master/ +rm -fr $dir/tests/014.watchdoc_test_quorum_bypass/leader/ +rm -fr $dir/tests/015.watchdoc_test_master_and_backend_fail/leader/ rm -fr $dir/tests/015.watchdoc_test_master_and_backend_fail/standby/ rm -fr $dir/tests/015.watchdoc_test_master_and_backend_fail/standby2/ diff --git a/src/test/regression/tests/004.watchdog/.gitignore b/src/test/regression/tests/004.watchdog/.gitignore index c30a6fd..c1185f3 100644 --- a/src/test/regression/tests/004.watchdog/.gitignore +++ b/src/test/regression/tests/004.watchdog/.gitignore @@ -1,2 +1,2 @@ -master/ +leader/ standby/ diff --git a/src/test/regression/tests/004.watchdog/leader.conf b/src/test/regression/tests/004.watchdog/leader.conf new file mode 100644 index 0000000..4c95a80 --- /dev/null +++ b/src/test/regression/tests/004.watchdog/leader.conf @@ -0,0 +1,18 @@ +# leader watchdog +use_watchdog = on +wd_interval = 1 +wd_priority = 2 + +hostname0 = 'localhost' +wd_port0 = 21004 +pgpool_port0 = 11000 +hostname1 = 'localhost' +wd_port1 = 21104 +pgpool_port0 = 11100 + +heartbeat_hostname0 = 'localhost' +heartbeat_port0 = 21005 +heartbeat_hostname1 = 'localhost' +heartbeat_port01 = 21105 + +enable_consensus_with_half_votes = on diff --git a/src/test/regression/tests/004.watchdog/test.sh b/src/test/regression/tests/004.watchdog/test.sh index 599775d..a9748d0 100755 --- a/src/test/regression/tests/004.watchdog/test.sh +++ b/src/test/regression/tests/004.watchdog/test.sh @@ -2,31 +2,31 @@ #------------------------------------------------------------------- # test script for watchdog source $TESTLIBS -MASTER_DIR=master +LEADER_DIR=leader STANDBY_DIR=standby success_count=0 -rm -fr $MASTER_DIR +rm -fr $LEADER_DIR rm -fr $STANDBY_DIR -mkdir $MASTER_DIR +mkdir $LEADER_DIR mkdir $STANDBY_DIR -# dir in master directory -cd $MASTER_DIR +# dir in leader directory +cd $LEADER_DIR -# create master environment -echo -n "creating master pgpool..." +# create leader environment +echo -n "creating leader pgpool..." $PGPOOL_SETUP -m n -n 1 -p 11000|| exit 1 -echo "master setup done." +echo "leader setup done." # copy the configurations from to standby cp -r etc ../$STANDBY_DIR/ source ./bashrc.ports -cat ../master.conf >> etc/pgpool.conf +cat ../leader.conf >> etc/pgpool.conf echo 0 > etc/pgpool_node_id ./startall @@ -42,27 +42,27 @@ cd .. mkdir $STANDBY_DIR/log echo -n "creating standby pgpool..." 
cat standby.conf >> $STANDBY_DIR/etc/pgpool.conf -# since we are using the same pgpool-II conf as of master. so change the pid file path in standby pgpool conf +# since we are using the same pgpool-II conf as of leader. so change the pid file path in standby pgpool conf echo "pid_file_name = '$PWD/pgpool2.pid'" >> $STANDBY_DIR/etc/pgpool.conf echo 1 > $STANDBY_DIR/etc/pgpool_node_id # start the stnadby pgpool-II by hand $PGPOOL_INSTALL_DIR/bin/pgpool -D -n -f $STANDBY_DIR/etc/pgpool.conf -F $STANDBY_DIR/etc/pcp.conf -a $STANDBY_DIR/etc/pool_hba.conf > $STANDBY_DIR/log/pgpool.log 2>&1 & # First test check if both pgpool-II have found their correct place in watchdog cluster. -echo "Waiting for the pgpool master..." +echo "Waiting for the pgpool leader..." for i in 1 2 3 4 5 6 7 8 9 10 do - grep "I am the cluster leader node. Starting escalation process" $MASTER_DIR/log/pgpool.log > /dev/null 2>&1 + grep "I am the cluster leader node. Starting escalation process" $LEADER_DIR/log/pgpool.log > /dev/null 2>&1 if [ $? = 0 ];then success_count=$(( success_count + 1 )) - echo "Master brought up successfully." + echo "Leader brought up successfully." break; fi echo "[check] $i times" sleep 2 done -# now check if standby has successfully joined connected to the master. +# now check if standby has successfully joined connected to the leader. echo "Waiting for the standby to join cluster..." for i in 1 2 3 4 5 6 7 8 9 10 do @@ -76,16 +76,16 @@ do sleep 2 done -# step 2 stop master pgpool and see if standby take over -$PGPOOL_INSTALL_DIR/bin/pgpool -f $MASTER_DIR/etc/pgpool.conf -m f stop +# step 2 stop leader pgpool and see if standby take over +$PGPOOL_INSTALL_DIR/bin/pgpool -f $LEADER_DIR/etc/pgpool.conf -m f stop -echo "Checking if the Standby pgpool-II detected the master shutdown..." +echo "Checking if the Standby pgpool-II detected the leader shutdown..." for i in 1 2 3 4 5 6 7 8 9 10 do grep " is shutting down" $STANDBY_DIR/log/pgpool.log > /dev/null 2>&1 if [ $? = 0 ];then success_count=$(( success_count + 1 )) - echo "Master shutdown detected." + echo "Leader shutdown detected." break; fi echo "[check] $i times" @@ -94,13 +94,13 @@ done # Finally see if standby take over -echo "Checking if the Standby pgpool-II takes over the master responsibility..." +echo "Checking if the Standby pgpool-II takes over the leader responsibility..." for i in 1 2 3 4 5 6 7 8 9 10 do grep "I am the cluster leader node. Starting escalation process" $STANDBY_DIR/log/pgpool.log > /dev/null 2>&1 if [ $? = 0 ];then success_count=$(( success_count + 1 )) - echo "Standby successfully became the new master." + echo "Standby successfully became the new leader." break; fi echo "[check] $i times" @@ -109,7 +109,7 @@ done # we are done. 
Just stop the standby pgpool-II $PGPOOL_INSTALL_DIR/bin/pgpool -f $STANDBY_DIR/etc/pgpool.conf -m f stop -cd master +cd leader ./shutdownall echo "$success_count out of 4 successfull"; diff --git a/src/test/regression/tests/011.watchdog_quorum_failover/.gitignore b/src/test/regression/tests/011.watchdog_quorum_failover/.gitignore index 76d02bb..e31af86 100644 --- a/src/test/regression/tests/011.watchdog_quorum_failover/.gitignore +++ b/src/test/regression/tests/011.watchdog_quorum_failover/.gitignore @@ -1,3 +1,3 @@ -master/ +leader/ standby/ standby2/ diff --git a/src/test/regression/tests/011.watchdog_quorum_failover/leader.conf b/src/test/regression/tests/011.watchdog_quorum_failover/leader.conf new file mode 100644 index 0000000..b35906c --- /dev/null +++ b/src/test/regression/tests/011.watchdog_quorum_failover/leader.conf @@ -0,0 +1,21 @@ +# leader watchdog +use_watchdog = on +wd_interval = 1 +wd_priority = 5 + +hostname0 = 'localhost' +wd_port0 = 21004 +pgpool_port0 = 11000 +hostname1 = 'localhost' +wd_port1 = 21104 +pgpool_port1 = 11100 +hostname2 = 'localhost' +wd_port2 = 21204 +pgpool_port2 = 11200 + +heartbeat_hostname0 = 'localhost' +heartbeat_port0 = 21005 +heartbeat_hostname1 = 'localhost' +heartbeat_port1 = 21105 +heartbeat_hostname2 = 'localhost' +heartbeat_port2 = 21205 diff --git a/src/test/regression/tests/011.watchdog_quorum_failover/test.sh b/src/test/regression/tests/011.watchdog_quorum_failover/test.sh index 74843aa..7ca98d3 100755 --- a/src/test/regression/tests/011.watchdog_quorum_failover/test.sh +++ b/src/test/regression/tests/011.watchdog_quorum_failover/test.sh @@ -6,7 +6,7 @@ # must be defined before compiling main/health_check.c. source $TESTLIBS -MASTER_DIR=master +LEADER_DIR=leader STANDBY_DIR=standby STANDBY2_DIR=standby2 num_tests=9 @@ -14,32 +14,32 @@ success_count=0 PSQL=$PGBIN/psql PG_CTL=$PGBIN/pg_ctl -rm -fr $MASTER_DIR +rm -fr $LEADER_DIR rm -fr $STANDBY_DIR rm -fr $STANDBY2_DIR -mkdir $MASTER_DIR +mkdir $LEADER_DIR mkdir $STANDBY_DIR mkdir $STANDBY2_DIR -# dir in master directory -cd $MASTER_DIR +# dir in leader directory +cd $LEADER_DIR -# create master environment -echo -n "creating master pgpool and PostgreSQL clusters..." +# create leader environment +echo -n "creating leader pgpool and PostgreSQL clusters..." $PGPOOL_SETUP -m s -n 2 -p 11000|| exit 1 -echo "master setup done." +echo "leader setup done." -# copy the configurations from master to standby +# copy the configurations from leader to standby cp -r etc ../$STANDBY_DIR/ -# copy the configurations from master to standby2 +# copy the configurations from leader to standby2 cp -r etc ../$STANDBY2_DIR/ source ./bashrc.ports -cat ../master.conf >> etc/pgpool.conf +cat ../leader.conf >> etc/pgpool.conf echo 0 > etc/pgpool_node_id ./startall @@ -55,7 +55,7 @@ cd .. mkdir $STANDBY_DIR/log echo -n "creating standby pgpool..." cat standby.conf >> $STANDBY_DIR/etc/pgpool.conf -# since we are using the same pgpool-II conf as of master. so change the pid file path in standby pgpool conf +# since we are using the same pgpool-II conf as of leader. so change the pid file path in standby pgpool conf echo "pid_file_name = '$PWD/pgpool2.pid'" >> $STANDBY_DIR/etc/pgpool.conf echo "logdir = $STANDBY_DIR/log" >> $STANDBY_DIR/etc/pgpool.conf echo 1 > $STANDBY_DIR/etc/pgpool_node_id @@ -67,7 +67,7 @@ $PGPOOL_INSTALL_DIR/bin/pgpool -D -n -f $STANDBY_DIR/etc/pgpool.conf -F $STANDBY mkdir $STANDBY2_DIR/log echo -n "creating standby2 pgpool..." 
cat standby2.conf >> $STANDBY2_DIR/etc/pgpool.conf -# since we are using the same pgpool-II conf as of master. so change the pid file path in standby pgpool conf +# since we are using the same pgpool-II conf as of leader. so change the pid file path in standby pgpool conf echo "pid_file_name = '$PWD/pgpool3.pid'" >> $STANDBY2_DIR/etc/pgpool.conf echo "logdir = $STANDBY2_DIR/log" >> $STANDBY2_DIR/etc/pgpool.conf echo 2 > $STANDBY2_DIR/etc/pgpool_node_id @@ -75,20 +75,20 @@ echo 2 > $STANDBY2_DIR/etc/pgpool_node_id $PGPOOL_INSTALL_DIR/bin/pgpool -D -n -f $STANDBY2_DIR/etc/pgpool.conf -F $STANDBY2_DIR/etc/pcp.conf -a $STANDBY2_DIR/etc/pool_hba.conf > $STANDBY2_DIR/log/pgpool.log 2>&1 & # First test check if both pgpool-II have found their correct place in watchdog cluster. -echo "Waiting for the pgpool master..." +echo "Waiting for the pgpool leader..." for i in 1 2 3 4 5 6 7 8 9 10 do - grep "I am the cluster leader node" $MASTER_DIR/log/pgpool.log > /dev/null 2>&1 + grep "I am the cluster leader node" $LEADER_DIR/log/pgpool.log > /dev/null 2>&1 if [ $? = 0 ];then success_count=$(( success_count + 1 )) - echo "Master brought up successfully." + echo "Leader brought up successfully." break; fi echo "[check] $i times" sleep 2 done -# now check if standby has successfully joined connected to the master. +# now check if standby has successfully joined connected to the leader. echo "Waiting for the standby to join cluster..." for i in 1 2 3 4 5 6 7 8 9 10 do @@ -102,7 +102,7 @@ do sleep 2 done -# now check if standby2 has successfully joined connected to the master. +# now check if standby2 has successfully joined connected to the leader. echo "Waiting for the standby2 to join cluster..." for i in 1 2 3 4 5 6 7 8 9 10 do @@ -148,11 +148,11 @@ if [ $n -eq 3 ];then fi # raise an real DB node 1 error -$PG_CTL -D master/data1 -m f stop -echo "Checking if master detects the shutdown error" +$PG_CTL -D leader/data1 -m f stop +echo "Checking if leader detects the shutdown error" for i in 1 2 3 4 5 6 7 8 9 10 do - grep -i "failover" $MASTER_DIR/log/pgpool.log + grep -i "failover" $LEADER_DIR/log/pgpool.log if [ $? = 0 ];then success_count=$(( success_count + 1 )) echo "DB error detected." @@ -184,16 +184,16 @@ do sleep 2 done -# stop master pgpool and see if standby takes over the roll -$PGPOOL_INSTALL_DIR/bin/pgpool -f $MASTER_DIR/etc/pgpool.conf -m f stop +# stop leader pgpool and see if standby takes over the roll +$PGPOOL_INSTALL_DIR/bin/pgpool -f $LEADER_DIR/etc/pgpool.conf -m f stop -echo "Checking if the Standby pgpool-II detected the master shutdown..." +echo "Checking if the Standby pgpool-II detected the leader shutdown..." for i in 1 2 3 4 5 6 7 8 9 10 do grep " is shutting down" $STANDBY_DIR/log/pgpool.log > /dev/null 2>&1 if [ $? = 0 ];then success_count=$(( success_count + 1 )) - echo "Master shutdown detected." + echo "Leader shutdown detected." break; fi echo "[check] $i times" @@ -202,13 +202,13 @@ done # Finally see if standby take over -echo "Checking if the Standby pgpool-II takes over the master responsibility..." +echo "Checking if the Standby pgpool-II takes over the leader responsibility..." for i in 1 2 3 4 5 6 7 8 9 10 do grep "I am the cluster leader node" $STANDBY_DIR/log/pgpool.log > /dev/null 2>&1 if [ $? = 0 ];then success_count=$(( success_count + 1 )) - echo "Standby successfully became the new master." + echo "Standby successfully became the new leader." break; fi echo "[check] $i times" @@ -218,7 +218,7 @@ done # we are done. 
Just stop the standby pgpool-II $PGPOOL_INSTALL_DIR/bin/pgpool -f $STANDBY_DIR/etc/pgpool.conf -m f stop $PGPOOL_INSTALL_DIR/bin/pgpool -f $STANDBY2_DIR/etc/pgpool.conf -m f stop -cd master +cd leader ./shutdownall echo "$success_count out of $num_tests successfull"; diff --git a/src/test/regression/tests/012.watchdog_failover_when_quorum_exists/leader.conf b/src/test/regression/tests/012.watchdog_failover_when_quorum_exists/leader.conf new file mode 100644 index 0000000..18d412b --- /dev/null +++ b/src/test/regression/tests/012.watchdog_failover_when_quorum_exists/leader.conf @@ -0,0 +1,25 @@ +# leader watchdog +num_init_children = 4 +use_watchdog = on +failover_when_quorum_exists = true +failover_require_consensus = false +allow_multiple_failover_requests_from_node = false +wd_interval = 1 +wd_priority = 5 + +hostname0 = 'localhost' +wd_port0 = 21004 +pgpool_port0 = 11000 +hostname1 = 'localhost' +wd_port1 = 21104 +pgpool_port1 = 11100 +hostname2 = 'localhost' +wd_port2 = 21204 +pgpool_port2 = 11200 + +heartbeat_hostname0 = 'localhost' +heartbeat_port0 = 21005 +heartbeat_hostname1 = 'localhost' +heartbeat_port1 = 21105 +heartbeat_hostname2 = 'localhost' +heartbeat_port2 = 21205 diff --git a/src/test/regression/tests/012.watchdog_failover_when_quorum_exists/test.sh b/src/test/regression/tests/012.watchdog_failover_when_quorum_exists/test.sh index 819f5a2..5ad89f6 100755 --- a/src/test/regression/tests/012.watchdog_failover_when_quorum_exists/test.sh +++ b/src/test/regression/tests/012.watchdog_failover_when_quorum_exists/test.sh @@ -8,7 +8,7 @@ # test failover_when_quorum_exists # source $TESTLIBS -MASTER_DIR=master +LEADER_DIR=leader STANDBY_DIR=standby STANDBY2_DIR=standby2 num_tests=5 @@ -16,32 +16,32 @@ success_count=0 PSQL=$PGBIN/psql PG_CTL=$PGBIN/pg_ctl -rm -fr $MASTER_DIR +rm -fr $LEADER_DIR rm -fr $STANDBY_DIR rm -fr $STANDBY2_DIR -mkdir $MASTER_DIR +mkdir $LEADER_DIR mkdir $STANDBY_DIR mkdir $STANDBY2_DIR -# dir in master directory -cd $MASTER_DIR +# dir in leader directory +cd $LEADER_DIR -# create master environment -echo -n "creating master pgpool and PostgreSQL clusters..." +# create leader environment +echo -n "creating leader pgpool and PostgreSQL clusters..." $PGPOOL_SETUP -m s -n 2 -p 11000|| exit 1 -echo "master setup done." +echo "leader setup done." -# copy the configurations from master to standby +# copy the configurations from leader to standby cp -r etc ../$STANDBY_DIR/ -# copy the configurations from master to standby2 +# copy the configurations from leader to standby2 cp -r etc ../$STANDBY2_DIR/ source ./bashrc.ports -cat ../master.conf >> etc/pgpool.conf +cat ../leader.conf >> etc/pgpool.conf echo 0 > etc/pgpool_node_id ./startall @@ -57,7 +57,7 @@ cd .. mkdir $STANDBY_DIR/log echo -n "creating standby pgpool..." cat standby.conf >> $STANDBY_DIR/etc/pgpool.conf -# since we are using the same pgpool-II conf as of master. so change the pid file path in standby pgpool conf +# since we are using the same pgpool-II conf as of leader. so change the pid file path in standby pgpool conf echo "pid_file_name = '$PWD/pgpool2.pid'" >> $STANDBY_DIR/etc/pgpool.conf echo "logdir = $STANDBY_DIR/log" >> $STANDBY_DIR/etc/pgpool.conf echo 1 > $STANDBY_DIR/etc/pgpool_node_id @@ -69,7 +69,7 @@ echo 1 > $STANDBY_DIR/etc/pgpool_node_id mkdir $STANDBY2_DIR/log echo -n "creating standby2 pgpool..." cat standby2.conf >> $STANDBY2_DIR/etc/pgpool.conf -# since we are using the same pgpool-II conf as of master. 
so change the pid file path in standby pgpool conf +# since we are using the same pgpool-II conf as of leader. so change the pid file path in standby pgpool conf echo "pid_file_name = '$PWD/pgpool3.pid'" >> $STANDBY2_DIR/etc/pgpool.conf echo "logdir = $STANDBY2_DIR/log" >> $STANDBY2_DIR/etc/pgpool.conf echo 2 > $STANDBY2_DIR/etc/pgpool_node_id @@ -77,13 +77,13 @@ echo 2 > $STANDBY2_DIR/etc/pgpool_node_id #$PGPOOL_INSTALL_DIR/bin/pgpool -D -n -f $STANDBY2_DIR/etc/pgpool.conf -F $STANDBY2_DIR/etc/pcp.conf -a $STANDBY2_DIR/etc/pool_hba.conf > $STANDBY2_DIR/log/pgpool.log 2>&1 & # First test check if both pgpool-II have found their correct place in watchdog cluster. -echo "Waiting for the pgpool master..." +echo "Waiting for the pgpool leader..." for i in 1 2 3 4 5 6 7 8 9 10 do - grep "I am the cluster leader node" $MASTER_DIR/log/pgpool.log > /dev/null 2>&1 + grep "I am the cluster leader node" $LEADER_DIR/log/pgpool.log > /dev/null 2>&1 if [ $? = 0 ];then success_count=$(( success_count + 1 )) - echo "Master brought up successfully." + echo "Leader brought up successfully." break; fi echo "[check] $i times" @@ -91,12 +91,12 @@ do done -# raise an artificial communication error on master for DB node 1 -echo "1 down" > $MASTER_DIR/log/backend_down_request -echo "Checking if the Master rejects the failover because quorum is not present..." +# raise an artificial communication error on leader for DB node 1 +echo "1 down" > $LEADER_DIR/log/backend_down_request +echo "Checking if the Leader rejects the failover because quorum is not present..." for i in 1 2 3 4 5 6 7 8 9 10 do - grep -i "Rejecting the failover request" $MASTER_DIR/log/pgpool.log + grep -i "Rejecting the failover request" $LEADER_DIR/log/pgpool.log if [ $? = 0 ];then success_count=$(( success_count + 1 )) echo "Fake DB error detected. and Failover rejected because of absence of quorum" @@ -113,7 +113,7 @@ $PGPOOL_INSTALL_DIR/bin/pgpool -D -n -f $STANDBY_DIR/etc/pgpool.conf -F $STANDBY # start the second stnadby pgpool-II by hand $PGPOOL_INSTALL_DIR/bin/pgpool -D -n -f $STANDBY2_DIR/etc/pgpool.conf -F $STANDBY2_DIR/etc/pcp.conf -a $STANDBY2_DIR/etc/pool_hba.conf > $STANDBY2_DIR/log/pgpool.log 2>&1 & -# now check if standby1 has successfully joined connected to the master. +# now check if standby1 has successfully joined connected to the leader. echo "Waiting for the standby1 to join cluster..." for i in 1 2 3 4 5 6 7 8 9 10 do @@ -127,7 +127,7 @@ do sleep 2 done -# now check if standby2 has successfully joined connected to the master. +# now check if standby2 has successfully joined connected to the leader. echo "Waiting for the standby2 to join cluster..." for i in 1 2 3 4 5 6 7 8 9 10 do @@ -142,7 +142,7 @@ do done # raise an artificial communication again to check if failover is executed this time -echo "1 down" > $MASTER_DIR/log/backend_down_request +echo "1 down" > $LEADER_DIR/log/backend_down_request #give some time to pgpool-II to execute failover sleep 5 # check to see if all Pgpool-II agrees that the failover request is @@ -171,7 +171,7 @@ done # we are done. 
Just stop the standby pgpool-II $PGPOOL_INSTALL_DIR/bin/pgpool -f $STANDBY_DIR/etc/pgpool.conf -m f stop $PGPOOL_INSTALL_DIR/bin/pgpool -f $STANDBY2_DIR/etc/pgpool.conf -m f stop -cd master +cd leader ./shutdownall echo "$success_count out of $num_tests successfull"; diff --git a/src/test/regression/tests/013.watchdog_failover_require_consensus/.gitignore b/src/test/regression/tests/013.watchdog_failover_require_consensus/.gitignore index 76d02bb..e31af86 100644 --- a/src/test/regression/tests/013.watchdog_failover_require_consensus/.gitignore +++ b/src/test/regression/tests/013.watchdog_failover_require_consensus/.gitignore @@ -1,3 +1,3 @@ -master/ +leader/ standby/ standby2/ diff --git a/src/test/regression/tests/013.watchdog_failover_require_consensus/leader.conf b/src/test/regression/tests/013.watchdog_failover_require_consensus/leader.conf new file mode 100644 index 0000000..fc59360 --- /dev/null +++ b/src/test/regression/tests/013.watchdog_failover_require_consensus/leader.conf @@ -0,0 +1,25 @@ +# leader watchdog +num_init_children = 4 +use_watchdog = on +failover_when_quorum_exists = true +failover_require_consensus = true +allow_multiple_failover_requests_from_node = false +wd_interval = 1 +wd_priority = 5 + +hostname0 = 'localhost' +wd_port0 = 21004 +pgpool_port0 = 11000 +hostname1 = 'localhost' +wd_port1 = 21104 +pgpool_port1 = 11100 +hostname2 = 'localhost' +wd_port2 = 21204 +pgpool_port2 = 11200 + +heartbeat_hostname0 = 'localhost' +heartbeat_port0 = 21005 +heartbeat_hostname1 = 'localhost' +heartbeat_port1 = 21105 +heartbeat_hostname2 = 'localhost' +heartbeat_port2 = 21205 diff --git a/src/test/regression/tests/013.watchdog_failover_require_consensus/test.sh b/src/test/regression/tests/013.watchdog_failover_require_consensus/test.sh index f0c77b5..d71fc7e 100755 --- a/src/test/regression/tests/013.watchdog_failover_require_consensus/test.sh +++ b/src/test/regression/tests/013.watchdog_failover_require_consensus/test.sh @@ -8,7 +8,7 @@ # test failover_require_consensus # source $TESTLIBS -MASTER_DIR=master +LEADER_DIR=leader STANDBY_DIR=standby STANDBY2_DIR=standby2 num_tests=7 @@ -16,32 +16,32 @@ success_count=0 PSQL=$PGBIN/psql PG_CTL=$PGBIN/pg_ctl -rm -fr $MASTER_DIR +rm -fr $LEADER_DIR rm -fr $STANDBY_DIR rm -fr $STANDBY2_DIR -mkdir $MASTER_DIR +mkdir $LEADER_DIR mkdir $STANDBY_DIR mkdir $STANDBY2_DIR -# dir in master directory -cd $MASTER_DIR +# dir in leader directory +cd $LEADER_DIR -# create master environment -echo -n "creating master pgpool and PostgreSQL clusters..." +# create leader environment +echo -n "creating leader pgpool and PostgreSQL clusters..." $PGPOOL_SETUP -m s -n 2 -p 11000|| exit 1 -echo "master setup done." +echo "leader setup done." -# copy the configurations from master to standby +# copy the configurations from leader to standby cp -r etc ../$STANDBY_DIR/ -# copy the configurations from master to standby2 +# copy the configurations from leader to standby2 cp -r etc ../$STANDBY2_DIR/ source ./bashrc.ports -cat ../master.conf >> etc/pgpool.conf +cat ../leader.conf >> etc/pgpool.conf echo 0 > etc/pgpool_node_id ./startall @@ -57,7 +57,7 @@ cd .. mkdir $STANDBY_DIR/log echo -n "creating standby pgpool..." cat standby.conf >> $STANDBY_DIR/etc/pgpool.conf -# since we are using the same pgpool-II conf as of master. so change the pid file path in standby pgpool conf +# since we are using the same pgpool-II conf as of leader. 
so change the pid file path in standby pgpool conf echo "pid_file_name = '$PWD/pgpool2.pid'" >> $STANDBY_DIR/etc/pgpool.conf echo "logdir = $STANDBY_DIR/log" >> $STANDBY_DIR/etc/pgpool.conf echo 1 > $STANDBY_DIR/etc/pgpool_node_id @@ -68,7 +68,7 @@ $PGPOOL_INSTALL_DIR/bin/pgpool -D -n -f $STANDBY_DIR/etc/pgpool.conf -F $STANDBY mkdir $STANDBY2_DIR/log echo -n "creating standby2 pgpool..." cat standby2.conf >> $STANDBY2_DIR/etc/pgpool.conf -# since we are using the same pgpool-II conf as of master. so change the pid file path in standby pgpool conf +# since we are using the same pgpool-II conf as of leader. so change the pid file path in standby pgpool conf echo "pid_file_name = '$PWD/pgpool3.pid'" >> $STANDBY2_DIR/etc/pgpool.conf echo "logdir = $STANDBY2_DIR/log" >> $STANDBY2_DIR/etc/pgpool.conf echo 2 > $STANDBY2_DIR/etc/pgpool_node_id @@ -76,20 +76,20 @@ echo 2 > $STANDBY2_DIR/etc/pgpool_node_id $PGPOOL_INSTALL_DIR/bin/pgpool -D -n -f $STANDBY2_DIR/etc/pgpool.conf -F $STANDBY2_DIR/etc/pcp.conf -a $STANDBY2_DIR/etc/pool_hba.conf > $STANDBY2_DIR/log/pgpool.log 2>&1 & # First test check if both pgpool-II have found their correct place in watchdog cluster. -echo "Waiting for the pgpool master..." +echo "Waiting for the pgpool leader..." for i in 1 2 3 4 5 6 7 8 9 10 do - grep "I am the cluster leader node" $MASTER_DIR/log/pgpool.log > /dev/null 2>&1 + grep "I am the cluster leader node" $LEADER_DIR/log/pgpool.log > /dev/null 2>&1 if [ $? = 0 ];then success_count=$(( success_count + 1 )) - echo "Master brought up successfully." + echo "Leader brought up successfully." break; fi echo "[check] $i times" sleep 2 done -# now check if standby1 has successfully joined connected to the master. +# now check if standby1 has successfully joined connected to the leader. echo "Waiting for the standby1 to join cluster..." for i in 1 2 3 4 5 6 7 8 9 10 do @@ -103,7 +103,7 @@ do sleep 2 done -# now check if standby2 has successfully joined connected to the master. +# now check if standby2 has successfully joined connected to the leader. echo "Waiting for the standby2 to join cluster..." for i in 1 2 3 4 5 6 7 8 9 10 do @@ -125,19 +125,19 @@ do grep -i "building consensus for request" $STANDBY_DIR/log/pgpool.log if [ $? = 0 ];then success_count=$(( success_count + 1 )) - echo "Fake DB error generated and master is waiting for consensus" + echo "Fake DB error generated and leader is waiting for consensus" break; fi echo "[check] $i times" sleep 2 done -echo "Checking if the Master receives the failover request and waiting for consensus..." +echo "Checking if the Leader receives the failover request and waiting for consensus..." for i in 1 2 3 4 5 6 7 8 9 10 do - grep -i "failover request noted" $MASTER_DIR/log/pgpool.log + grep -i "failover request noted" $LEADER_DIR/log/pgpool.log if [ $? = 0 ];then success_count=$(( success_count + 1 )) - echo "Fake DB error delivered to master. and master is waiting for consensus" + echo "Fake DB error delivered to leader. and leader is waiting for consensus" break; fi echo "[check] $i times" @@ -192,7 +192,7 @@ done # we are done. 
Just stop the standby pgpool-II $PGPOOL_INSTALL_DIR/bin/pgpool -f $STANDBY_DIR/etc/pgpool.conf -m f stop $PGPOOL_INSTALL_DIR/bin/pgpool -f $STANDBY2_DIR/etc/pgpool.conf -m f stop -cd master +cd leader ./shutdownall echo "$success_count out of $num_tests successfull"; diff --git a/src/test/regression/tests/014.watchdog_test_quorum_bypass/.gitignore b/src/test/regression/tests/014.watchdog_test_quorum_bypass/.gitignore index b9f1372..b0a8a94 100644 --- a/src/test/regression/tests/014.watchdog_test_quorum_bypass/.gitignore +++ b/src/test/regression/tests/014.watchdog_test_quorum_bypass/.gitignore @@ -1 +1 @@ -master/ +leader/ diff --git a/src/test/regression/tests/014.watchdog_test_quorum_bypass/leader.conf b/src/test/regression/tests/014.watchdog_test_quorum_bypass/leader.conf new file mode 100644 index 0000000..fc59360 --- /dev/null +++ b/src/test/regression/tests/014.watchdog_test_quorum_bypass/leader.conf @@ -0,0 +1,25 @@ +# leader watchdog +num_init_children = 4 +use_watchdog = on +failover_when_quorum_exists = true +failover_require_consensus = true +allow_multiple_failover_requests_from_node = false +wd_interval = 1 +wd_priority = 5 + +hostname0 = 'localhost' +wd_port0 = 21004 +pgpool_port0 = 11000 +hostname1 = 'localhost' +wd_port1 = 21104 +pgpool_port1 = 11100 +hostname2 = 'localhost' +wd_port2 = 21204 +pgpool_port2 = 11200 + +heartbeat_hostname0 = 'localhost' +heartbeat_port0 = 21005 +heartbeat_hostname1 = 'localhost' +heartbeat_port1 = 21105 +heartbeat_hostname2 = 'localhost' +heartbeat_port2 = 21205 diff --git a/src/test/regression/tests/014.watchdog_test_quorum_bypass/test.sh b/src/test/regression/tests/014.watchdog_test_quorum_bypass/test.sh index 1ead154..e6a47a6 100755 --- a/src/test/regression/tests/014.watchdog_test_quorum_bypass/test.sh +++ b/src/test/regression/tests/014.watchdog_test_quorum_bypass/test.sh @@ -8,32 +8,32 @@ # test pcp_detach bypass failover_when_quorum_exists and failover_require_consensus # source $TESTLIBS -MASTER_DIR=master +LEADER_DIR=leader num_tests=2 success_count=0 PSQL=$PGBIN/psql PG_CTL=$PGBIN/pg_ctl -rm -fr $MASTER_DIR +rm -fr $LEADER_DIR rm -fr $STANDBY_DIR rm -fr $STANDBY2_DIR -mkdir $MASTER_DIR +mkdir $LEADER_DIR mkdir $STANDBY_DIR mkdir $STANDBY2_DIR -# dir in master directory -cd $MASTER_DIR +# dir in leader directory +cd $LEADER_DIR -# create master environment -echo -n "creating master pgpool and PostgreSQL clusters..." +# create leader environment +echo -n "creating leader pgpool and PostgreSQL clusters..." $PGPOOL_SETUP -m s -n 2 -p 11000|| exit 1 -echo "master setup done." +echo "leader setup done." source ./bashrc.ports -cat ../master.conf >> etc/pgpool.conf +cat ../leader.conf >> etc/pgpool.conf echo 0 > etc/pgpool_node_id ./startall @@ -44,14 +44,14 @@ wait_for_pgpool_startup cd .. -# First test check if pgpool-II became a master. -echo "Waiting for the pgpool master..." +# First test check if pgpool-II became a leader. +echo "Waiting for the pgpool leader..." for i in 1 2 3 4 5 6 7 8 9 10 do - grep "I am the cluster leader node" $MASTER_DIR/log/pgpool.log > /dev/null 2>&1 + grep "I am the cluster leader node" $LEADER_DIR/log/pgpool.log > /dev/null 2>&1 if [ $? = 0 ];then success_count=$(( success_count + 1 )) - echo "Master brought up successfully." + echo "Leader brought up successfully." break; fi echo "[check] $i times" @@ -69,7 +69,7 @@ if [ $? 
= 0 ];then success_count=$(( success_count + 1 )) fi -cd master +cd leader ./shutdownall echo "$success_count out of $num_tests successfull"; diff --git a/src/test/regression/tests/015.watchdog_master_and_backend_fail/.gitignore b/src/test/regression/tests/015.watchdog_master_and_backend_fail/.gitignore index 76d02bb..e31af86 100644 --- a/src/test/regression/tests/015.watchdog_master_and_backend_fail/.gitignore +++ b/src/test/regression/tests/015.watchdog_master_and_backend_fail/.gitignore @@ -1,3 +1,3 @@ -master/ +leader/ standby/ standby2/ diff --git a/src/test/regression/tests/015.watchdog_master_and_backend_fail/leader.conf b/src/test/regression/tests/015.watchdog_master_and_backend_fail/leader.conf new file mode 100644 index 0000000..fc59360 --- /dev/null +++ b/src/test/regression/tests/015.watchdog_master_and_backend_fail/leader.conf @@ -0,0 +1,25 @@ +# leader watchdog +num_init_children = 4 +use_watchdog = on +failover_when_quorum_exists = true +failover_require_consensus = true +allow_multiple_failover_requests_from_node = false +wd_interval = 1 +wd_priority = 5 + +hostname0 = 'localhost' +wd_port0 = 21004 +pgpool_port0 = 11000 +hostname1 = 'localhost' +wd_port1 = 21104 +pgpool_port1 = 11100 +hostname2 = 'localhost' +wd_port2 = 21204 +pgpool_port2 = 11200 + +heartbeat_hostname0 = 'localhost' +heartbeat_port0 = 21005 +heartbeat_hostname1 = 'localhost' +heartbeat_port1 = 21105 +heartbeat_hostname2 = 'localhost' +heartbeat_port2 = 21205 diff --git a/src/test/regression/tests/015.watchdog_master_and_backend_fail/test.sh b/src/test/regression/tests/015.watchdog_master_and_backend_fail/test.sh index f253d33..d00b741 100755 --- a/src/test/regression/tests/015.watchdog_master_and_backend_fail/test.sh +++ b/src/test/regression/tests/015.watchdog_master_and_backend_fail/test.sh @@ -5,10 +5,10 @@ # Please note that to successfully run the test, "HEALTHCHECK_DEBUG" # must be defined before compiling main/health_check.c. # -# test if master and backend goes down at same time Pgpool-II behaves as expected +# test if leader and backend goes down at same time Pgpool-II behaves as expected # source $TESTLIBS -MASTER_DIR=master +LEADER_DIR=leader STANDBY_DIR=standby STANDBY2_DIR=standby2 num_tests=6 @@ -16,32 +16,32 @@ success_count=0 PSQL=$PGBIN/psql PG_CTL=$PGBIN/pg_ctl -rm -fr $MASTER_DIR +rm -fr $LEADER_DIR rm -fr $STANDBY_DIR rm -fr $STANDBY2_DIR -mkdir $MASTER_DIR +mkdir $LEADER_DIR mkdir $STANDBY_DIR mkdir $STANDBY2_DIR -# dir in master directory -cd $MASTER_DIR +# dir in leader directory +cd $LEADER_DIR -# create master environment -echo -n "creating master pgpool and PostgreSQL clusters..." +# create leader environment +echo -n "creating leader pgpool and PostgreSQL clusters..." $PGPOOL_SETUP -m s -n 2 -p 11000|| exit 1 -echo "master setup done." +echo "leader setup done." -# copy the configurations from master to standby +# copy the configurations from leader to standby cp -r etc ../$STANDBY_DIR/ -# copy the configurations from master to standby2 +# copy the configurations from leader to standby2 cp -r etc ../$STANDBY2_DIR/ source ./bashrc.ports -cat ../master.conf >> etc/pgpool.conf +cat ../leader.conf >> etc/pgpool.conf echo 0 > etc/pgpool_node_id ./startall @@ -57,7 +57,7 @@ cd .. mkdir $STANDBY_DIR/log echo -n "creating standby pgpool..." cat standby.conf >> $STANDBY_DIR/etc/pgpool.conf -# since we are using the same pgpool-II conf as of master. so change the pid file path in standby pgpool conf +# since we are using the same pgpool-II conf as of leader. 
so change the pid file path in standby pgpool conf echo "pid_file_name = '$PWD/pgpool2.pid'" >> $STANDBY_DIR/etc/pgpool.conf echo "logdir = $STANDBY_DIR/log" >> $STANDBY_DIR/etc/pgpool.conf echo 1 > $STANDBY_DIR/etc/pgpool_node_id @@ -68,7 +68,7 @@ $PGPOOL_INSTALL_DIR/bin/pgpool -D -n -f $STANDBY_DIR/etc/pgpool.conf -F $STANDBY mkdir $STANDBY2_DIR/log echo -n "creating standby2 pgpool..." cat standby2.conf >> $STANDBY2_DIR/etc/pgpool.conf -# since we are using the same pgpool-II conf as of master. so change the pid file path in standby pgpool conf +# since we are using the same pgpool-II conf as of leader. so change the pid file path in standby pgpool conf echo "pid_file_name = '$PWD/pgpool3.pid'" >> $STANDBY2_DIR/etc/pgpool.conf echo "logdir = $STANDBY2_DIR/log" >> $STANDBY2_DIR/etc/pgpool.conf echo 2 > $STANDBY2_DIR/etc/pgpool_node_id @@ -76,20 +76,20 @@ echo 2 > $STANDBY2_DIR/etc/pgpool_node_id $PGPOOL_INSTALL_DIR/bin/pgpool -D -n -f $STANDBY2_DIR/etc/pgpool.conf -F $STANDBY2_DIR/etc/pcp.conf -a $STANDBY2_DIR/etc/pool_hba.conf > $STANDBY2_DIR/log/pgpool.log 2>&1 & # First test check if both pgpool-II have found their correct place in watchdog cluster. -echo "Waiting for the pgpool master..." +echo "Waiting for the pgpool leader..." for i in 1 2 3 4 5 6 7 8 9 10 do - grep "I am the cluster leader node" $MASTER_DIR/log/pgpool.log > /dev/null 2>&1 + grep "I am the cluster leader node" $LEADER_DIR/log/pgpool.log > /dev/null 2>&1 if [ $? = 0 ];then success_count=$(( success_count + 1 )) - echo "Master brought up successfully." + echo "Leader brought up successfully." break; fi echo "[check] $i times" sleep 2 done -# now check if standby1 has successfully joined connected to the master. +# now check if standby1 has successfully joined connected to the leader. echo "Waiting for the standby1 to join cluster..." for i in 1 2 3 4 5 6 7 8 9 10 do @@ -103,7 +103,7 @@ do sleep 2 done -# now check if standby2 has successfully joined connected to the master. +# now check if standby2 has successfully joined connected to the leader. echo "Waiting for the standby2 to join cluster..." for i in 1 2 3 4 5 6 7 8 9 10 do @@ -117,18 +117,18 @@ do sleep 2 done -#shutdown master and one PG server by hand -$PGPOOL_INSTALL_DIR/bin/pgpool -D -n -f $MASTER_DIR/etc/pgpool.conf -m f stop -$PG_CTL -D $MASTER_DIR/data1 -m f stop +#shutdown leader and one PG server by hand +$PGPOOL_INSTALL_DIR/bin/pgpool -D -n -f $LEADER_DIR/etc/pgpool.conf -m f stop +$PG_CTL -D $LEADER_DIR/data1 -m f stop # First test check if both pgpool-II have found their correct place in watchdog cluster. -echo "Waiting for the standby to become new master..." +echo "Waiting for the standby to become new leader..." for i in 1 2 3 4 5 6 7 8 9 10 do grep "I am the cluster leader node" $STANDBY_DIR/log/pgpool.log > /dev/null 2>&1 if [ $? = 0 ];then success_count=$(( success_count + 1 )) - echo "Standby became new master successfully." + echo "Standby became new leader successfully." break; fi echo "[check] $i times" @@ -144,7 +144,7 @@ do grep " failover done" $STANDBY_DIR/log/pgpool.log > /dev/null 2>&1 if [ $? = 0 ];then success_count=$(( success_count + 1 )) - echo "Standby became new master successfully." + echo "Standby became new leader successfully." break; fi echo "[check] $i times" @@ -176,7 +176,7 @@ done # we are done. 
Just stop the standby pgpool-II $PGPOOL_INSTALL_DIR/bin/pgpool -f $STANDBY_DIR/etc/pgpool.conf -m f stop $PGPOOL_INSTALL_DIR/bin/pgpool -f $STANDBY2_DIR/etc/pgpool.conf -m f stop -cd master +cd leader ./shutdownall echo "$success_count out of $num_tests successfull"; diff --git a/src/watchdog/watchdog.c b/src/watchdog/watchdog.c index fb4a341..a23c6be 100644 --- a/src/watchdog/watchdog.c +++ b/src/watchdog/watchdog.c @@ -145,16 +145,16 @@ typedef enum IPC_CMD_PREOCESS_RES #define CLUSTER_QUORUM_FOUND 'F' #define CLUSTER_IN_SPLIT_BRAIN 'B' #define CLUSTER_NEEDS_ELECTION 'E' -#define CLUSTER_IAM_TRUE_MASTER 'M' -#define CLUSTER_IAM_NOT_TRUE_MASTER 'X' -#define CLUSTER_IAM_RESIGNING_FROM_MASTER 'R' +#define CLUSTER_IAM_TRUE_LEADER 'M' +#define CLUSTER_IAM_NOT_TRUE_LEADER 'X' +#define CLUSTER_IAM_RESIGNING_FROM_LEADER 'R' #define CLUSTER_NODE_INVALID_VERSION 'V' #define CLUSTER_NODE_REQUIRE_TO_RELOAD 'I' #define CLUSTER_NODE_APPEARING_LOST 'Y' #define CLUSTER_NODE_APPEARING_FOUND 'Z' -#define WD_MASTER_NODE getMasterWatchdogNode() +#define WD_LEADER_NODE getLeaderWatchdogNode() typedef struct packet_types { @@ -183,7 +183,7 @@ packet_types all_packet_types[] = { {WD_INFORM_I_AM_GOING_DOWN, "INFORM I AM GOING DOWN"}, {WD_ASK_FOR_POOL_CONFIG, "ASK FOR POOL CONFIG"}, {WD_POOL_CONFIG_DATA, "CONFIG DATA"}, - {WD_GET_MASTER_DATA_REQUEST, "DATA REQUEST FOR MASTER"}, + {WD_GET_LEADER_DATA_REQUEST, "DATA REQUEST FOR LEADER"}, {WD_GET_RUNTIME_VARIABLE_VALUE, "GET WD RUNTIME VARIABLE VALUE"}, {WD_CMD_REPLY_IN_DATA, "COMMAND REPLY IN DATA"}, {WD_FAILOVER_LOCKING_REQUEST, "FAILOVER LOCKING REQUEST"}, @@ -227,9 +227,9 @@ char *wd_state_names[] = { "LOADING", "JOINING", "INITIALIZING", - "MASTER", + "LEADER", "PARTICIPATING IN ELECTION", - "STANDING FOR MASTER", + "STANDING FOR LEADER", "STANDBY", "LOST", "IN NETWORK TROUBLE", @@ -337,19 +337,19 @@ typedef struct WDInterfaceStatus bool if_up; } WDInterfaceStatus; -typedef struct WDClusterMaster +typedef struct WDClusterLeader { - WatchdogNode *masterNode; + WatchdogNode *leaderNode; WatchdogNode **standbyNodes; int standby_nodes_count; bool holding_vip; -} WDClusterMasterInfo; +} WDClusterLeaderInfo; typedef struct wd_cluster { WatchdogNode *localNode; WatchdogNode *remoteNodes; - WDClusterMasterInfo clusterMasterInfo; + WDClusterLeaderInfo clusterLeaderInfo; int remoteNodeCount; int quorum_status; unsigned int nextCommandID; @@ -500,7 +500,7 @@ static void cluster_service_message_processor(WatchdogNode * wdNode, WDPacketDat static int get_cluster_node_count(void); static void clear_command_node_result(WDCommandNodeResult * nodeResult); -static inline bool is_local_node_true_master(void); +static inline bool is_local_node_true_leader(void); static inline WD_STATES get_local_node_state(void); static int set_state(WD_STATES newState); @@ -515,8 +515,8 @@ static int watchdog_state_machine(WD_EVENTS event, WatchdogNode * wdNode, WDPack static int watchdog_state_machine_nw_error(WD_EVENTS event, WatchdogNode * wdNode, WDPacketData * pkt, WDCommandData * clusterCommand); static int watchdog_state_machine_nw_isolation(WD_EVENTS event, WatchdogNode * wdNode, WDPacketData * pkt, WDCommandData * clusterCommand); -static int I_am_master_and_cluser_in_split_brain(WatchdogNode * otherMasterNode); -static void handle_split_brain(WatchdogNode * otherMasterNode, WDPacketData * pkt); +static int I_am_leader_and_cluser_in_split_brain(WatchdogNode * otherLeaderNode); +static void handle_split_brain(WatchdogNode * otherLeaderNode, WDPacketData * pkt); static bool 
beacon_message_received_from_node(WatchdogNode * wdNode, WDPacketData * pkt); static void cleanUpIPCCommand(WDCommandData * ipcCommand); @@ -542,7 +542,7 @@ static IPC_CMD_PREOCESS_RES process_IPC_nodeList_command(WDCommandData * ipcComm static IPC_CMD_PREOCESS_RES process_IPC_get_runtime_variable_value_request(WDCommandData * ipcCommand); static IPC_CMD_PREOCESS_RES process_IPC_online_recovery(WDCommandData * ipcCommand); static IPC_CMD_PREOCESS_RES process_IPC_failover_indication(WDCommandData * ipcCommand); -static IPC_CMD_PREOCESS_RES process_IPC_data_request_from_master(WDCommandData * ipcCommand); +static IPC_CMD_PREOCESS_RES process_IPC_data_request_from_leader(WDCommandData * ipcCommand); static IPC_CMD_PREOCESS_RES process_IPC_failover_command(WDCommandData * ipcCommand); static IPC_CMD_PREOCESS_RES process_failover_command_on_coordinator(WDCommandData * ipcCommand); static IPC_CMD_PREOCESS_RES process_IPC_execute_cluster_command(WDCommandData * ipcCommand); @@ -578,8 +578,8 @@ static void update_interface_status(void); static bool any_interface_available(void); static WDPacketData * process_data_request(WatchdogNode * wdNode, WDPacketData * pkt); -static WatchdogNode * getMasterWatchdogNode(void); -static void set_cluster_master_node(WatchdogNode * wdNode); +static WatchdogNode * getLeaderWatchdogNode(void); +static void set_cluster_leader_node(WatchdogNode * wdNode); static void clear_standby_nodes_list(void); static int standby_node_left_cluster(WatchdogNode * wdNode); static int standby_node_join_cluster(WatchdogNode * wdNode); @@ -777,10 +777,10 @@ wd_cluster_initialize(void) idx++; } - g_cluster.clusterMasterInfo.masterNode = NULL; - g_cluster.clusterMasterInfo.standbyNodes = palloc0(sizeof(WatchdogNode *) * g_cluster.remoteNodeCount); - g_cluster.clusterMasterInfo.standby_nodes_count = 0; - g_cluster.clusterMasterInfo.holding_vip = false; + g_cluster.clusterLeaderInfo.leaderNode = NULL; + g_cluster.clusterLeaderInfo.standbyNodes = palloc0(sizeof(WatchdogNode *) * g_cluster.remoteNodeCount); + g_cluster.clusterLeaderInfo.standby_nodes_count = 0; + g_cluster.clusterLeaderInfo.holding_vip = false; g_cluster.quorum_status = -1; g_cluster.nextCommandID = 1; g_cluster.clusterInitialized = false; @@ -2027,8 +2027,8 @@ static IPC_CMD_PREOCESS_RES process_IPC_command(WDCommandData * ipcCommand) return process_IPC_failover_indication(ipcCommand); break; - case WD_GET_MASTER_DATA_REQUEST: - return process_IPC_data_request_from_master(ipcCommand); + case WD_GET_LEADER_DATA_REQUEST: + return process_IPC_data_request_from_leader(ipcCommand); break; case WD_GET_RUNTIME_VARIABLE_VALUE: @@ -2143,7 +2143,7 @@ static IPC_CMD_PREOCESS_RES process_IPC_get_runtime_variable_value_request(WDCom else if (strcasecmp(WD_RUNTIME_VAR_QUORUM_STATE, requestVarName) == 0) { jw_put_int(jNode, WD_JSON_KEY_VALUE_DATA_TYPE, VALUE_DATA_TYPE_INT); - jw_put_int(jNode, WD_JSON_KEY_VALUE_DATA, WD_MASTER_NODE ? WD_MASTER_NODE->quorum_status : -2); + jw_put_int(jNode, WD_JSON_KEY_VALUE_DATA, WD_LEADER_NODE ? 
WD_LEADER_NODE->quorum_status : -2); } else if (strcasecmp(WD_RUNTIME_VAR_ESCALATION_STATE, requestVarName) == 0) { @@ -2403,7 +2403,7 @@ service_expired_failovers(void) { /* lower my wd_priority for moment */ g_cluster.localNode->wd_priority = -1; - send_cluster_service_message(NULL, NULL, CLUSTER_IAM_RESIGNING_FROM_MASTER); + send_cluster_service_message(NULL, NULL, CLUSTER_IAM_RESIGNING_FROM_LEADER); set_state(WD_JOINING); } } @@ -2688,7 +2688,7 @@ static WDFailoverObject * add_failover(POOL_REQUEST_KIND reqKind, int *node_id_l } /* - * The function processes all failover commands on master node + * The function processes all failover commands on leader node */ static IPC_CMD_PREOCESS_RES process_failover_command_on_coordinator(WDCommandData * ipcCommand) { @@ -2806,7 +2806,7 @@ static IPC_CMD_PREOCESS_RES process_failover_command_on_coordinator(WDCommandDat static IPC_CMD_PREOCESS_RES process_IPC_failover_command(WDCommandData * ipcCommand) { - if (is_local_node_true_master()) + if (is_local_node_true_leader()) { ereport(LOG, (errmsg("watchdog received the failover command from local pgpool-II on IPC interface"))); @@ -2819,13 +2819,13 @@ static IPC_CMD_PREOCESS_RES process_IPC_failover_command(WDCommandData * ipcComm wd_packet_shallow_copy(&ipcCommand->sourcePacket, &ipcCommand->commandPacket); set_next_commandID_in_message(&ipcCommand->commandPacket); - ipcCommand->sendToNode = WD_MASTER_NODE; /* send the command to - * master node */ + ipcCommand->sendToNode = WD_LEADER_NODE; /* send the command to + * leader node */ if (send_command_packet_to_remote_nodes(ipcCommand, true) <= 0) { ereport(LOG, (errmsg("unable to process the failover command request received on IPC interface"), - errdetail("failed to forward the request to the master watchdog node \"%s\"", WD_MASTER_NODE->nodeName))); + errdetail("failed to forward the request to the leader watchdog node \"%s\"", WD_LEADER_NODE->nodeName))); return IPC_CMD_ERROR; } else @@ -2834,8 +2834,8 @@ static IPC_CMD_PREOCESS_RES process_IPC_failover_command(WDCommandData * ipcComm * we need to wait for the result */ ereport(LOG, - (errmsg("failover request from local pgpool-II node received on IPC interface is forwarded to master watchdog node \"%s\"", - WD_MASTER_NODE->nodeName), + (errmsg("failover request from local pgpool-II node received on IPC interface is forwarded to leader watchdog node \"%s\"", + WD_LEADER_NODE->nodeName), errdetail("waiting for the reply..."))); return IPC_CMD_PROCESSING; } @@ -2869,12 +2869,12 @@ static IPC_CMD_PREOCESS_RES process_IPC_online_recovery(WDCommandData * ipcComma { ereport(LOG, (errmsg("unable to process the online recovery request received on IPC interface"), - errdetail("failed to forward the request to the master watchdog node \"%s\"", WD_MASTER_NODE->nodeName))); + errdetail("failed to forward the request to the leader watchdog node \"%s\"", WD_LEADER_NODE->nodeName))); return IPC_CMD_ERROR; } ereport(LOG, - (errmsg("online recovery request from local pgpool-II node received on IPC interface is forwarded to master watchdog node \"%s\"", - WD_MASTER_NODE->nodeName), + (errmsg("online recovery request from local pgpool-II node received on IPC interface is forwarded to leader watchdog node \"%s\"", + WD_LEADER_NODE->nodeName), errdetail("waiting for the reply..."))); return IPC_CMD_PROCESSING; @@ -2889,7 +2889,7 @@ static IPC_CMD_PREOCESS_RES process_IPC_online_recovery(WDCommandData * ipcComma return IPC_CMD_TRY_AGAIN; } -static IPC_CMD_PREOCESS_RES process_IPC_data_request_from_master(WDCommandData 
* ipcCommand) +static IPC_CMD_PREOCESS_RES process_IPC_data_request_from_leader(WDCommandData * ipcCommand) { /* * if cluster or myself is not in stable state just return cluster in @@ -2907,12 +2907,12 @@ static IPC_CMD_PREOCESS_RES process_IPC_data_request_from_master(WDCommandData * wd_packet_shallow_copy(&ipcCommand->sourcePacket, &ipcCommand->commandPacket); set_next_commandID_in_message(&ipcCommand->commandPacket); - ipcCommand->sendToNode = WD_MASTER_NODE; + ipcCommand->sendToNode = WD_LEADER_NODE; if (send_command_packet_to_remote_nodes(ipcCommand, true) <= 0) { ereport(LOG, (errmsg("unable to process the get data request received on IPC interface"), - errdetail("failed to forward the request to the master watchdog node \"%s\"", WD_MASTER_NODE->nodeName))); + errdetail("failed to forward the request to the leader watchdog node \"%s\"", WD_LEADER_NODE->nodeName))); return IPC_CMD_ERROR; } else @@ -2921,17 +2921,17 @@ static IPC_CMD_PREOCESS_RES process_IPC_data_request_from_master(WDCommandData * * we need to wait for the result */ ereport(LOG, - (errmsg("get data request from local pgpool-II node received on IPC interface is forwarded to master watchdog node \"%s\"", - WD_MASTER_NODE->nodeName), + (errmsg("get data request from local pgpool-II node received on IPC interface is forwarded to leader watchdog node \"%s\"", + WD_LEADER_NODE->nodeName), errdetail("waiting for the reply..."))); return IPC_CMD_PROCESSING; } } - else if (is_local_node_true_master()) + else if (is_local_node_true_leader()) { /* - * This node is itself a master node, So send the empty result with OK + * This node is itself a leader node, So send the empty result with OK * tag */ return IPC_CMD_OK; @@ -3013,7 +3013,7 @@ static IPC_CMD_PREOCESS_RES process_IPC_failover_indication(WDCommandData * ipcC else { ereport(LOG, - (errmsg("received the failover indication from Pgpool-II on IPC interface, but only master can do failover"))); + (errmsg("received the failover indication from Pgpool-II on IPC interface, but only leader can do failover"))); } reply_to_failove_command(ipcCommand, res, 0); @@ -3024,7 +3024,7 @@ static IPC_CMD_PREOCESS_RES process_IPC_failover_indication(WDCommandData * ipcC /* Failover start basically does nothing fency, It just sets the failover_in_progress * flag and inform all nodes that the failover is in progress. * - * only the local node that is a master can start the failover. + * only the local node that is a leader can start the failover. 
*/ static WDFailoverCMDResults failover_start_indication(WDCommandData * ipcCommand) @@ -3032,7 +3032,7 @@ failover_start_indication(WDCommandData * ipcCommand) ereport(LOG, (errmsg("watchdog is informed of failover start by the main process"))); - /* only coordinator(master) node is allowed to process failover */ + /* only coordinator(leader) node is allowed to process failover */ if (get_local_node_state() == WD_COORDINATOR) { /* inform to all nodes about failover start */ @@ -3060,7 +3060,7 @@ failover_end_indication(WDCommandData * ipcCommand) ereport(LOG, (errmsg("watchdog is informed of failover end by the main process"))); - /* only coordinator(master) node is allowed to process failover */ + /* only coordinator(leader) node is allowed to process failover */ if (get_local_node_state() == WD_COORDINATOR) { send_message_of_type(NULL, WD_FAILOVER_END, NULL); @@ -3613,11 +3613,11 @@ static JsonNode * get_node_list_json(int id) JsonNode *jNode = jw_create_with_object(true); jw_put_int(jNode, "RemoteNodeCount", g_cluster.remoteNodeCount); - jw_put_int(jNode, "QuorumStatus", WD_MASTER_NODE ? WD_MASTER_NODE->quorum_status : -2); - jw_put_int(jNode, "AliveNodeCount", WD_MASTER_NODE ? WD_MASTER_NODE->standby_nodes_count : 0); + jw_put_int(jNode, "QuorumStatus", WD_LEADER_NODE ? WD_LEADER_NODE->quorum_status : -2); + jw_put_int(jNode, "AliveNodeCount", WD_LEADER_NODE ? WD_LEADER_NODE->standby_nodes_count : 0); jw_put_int(jNode, "Escalated", g_cluster.localNode->escalated); - jw_put_string(jNode, "MasterNodeName", WD_MASTER_NODE ? WD_MASTER_NODE->nodeName : "Not Set"); - jw_put_string(jNode, "MasterHostName", WD_MASTER_NODE ? WD_MASTER_NODE->hostname : "Not Set"); + jw_put_string(jNode, "LeaderNodeName", WD_LEADER_NODE ? WD_LEADER_NODE->nodeName : "Not Set"); + jw_put_string(jNode, "LeaderHostName", WD_LEADER_NODE ? 
WD_LEADER_NODE->hostname : "Not Set"); if (id < 0) { jw_put_int(jNode, "NodeCount", g_cluster.remoteNodeCount + 1); @@ -3868,28 +3868,28 @@ cluster_service_message_processor(WatchdogNode * wdNode, WDPacketData * pkt) switch (pkt->data[0]) { - case CLUSTER_IAM_TRUE_MASTER: + case CLUSTER_IAM_TRUE_LEADER: { /* * The cluster was in split-brain and remote node thiks it is - * the worthy master + * the worthy leader */ if (get_local_node_state() == WD_COORDINATOR) { ereport(LOG, - (errmsg("remote node \"%s\" decided it is the true master", wdNode->nodeName), + (errmsg("remote node \"%s\" decided it is the true leader", wdNode->nodeName), errdetail("re-initializing the local watchdog cluster state because of split-brain"))); - send_cluster_service_message(NULL, pkt, CLUSTER_IAM_RESIGNING_FROM_MASTER); + send_cluster_service_message(NULL, pkt, CLUSTER_IAM_RESIGNING_FROM_LEADER); set_state(WD_JOINING); } - else if (WD_MASTER_NODE != NULL && WD_MASTER_NODE != wdNode) + else if (WD_LEADER_NODE != NULL && WD_LEADER_NODE != wdNode) { ereport(LOG, - (errmsg("remote node \"%s\" thinks it is a master/coordinator and I am causing the split-brain," \ - " but as per our record \"%s\" is the cluster master/coordinator", + (errmsg("remote node \"%s\" thinks it is a leader/coordinator and I am causing the split-brain," \ + " but as per our record \"%s\" is the cluster leader/coordinator", wdNode->nodeName, - WD_MASTER_NODE->nodeName), + WD_LEADER_NODE->nodeName), errdetail("restarting the cluster"))); send_cluster_service_message(NULL, pkt, CLUSTER_NEEDS_ELECTION); set_state(WD_JOINING); @@ -3897,12 +3897,12 @@ cluster_service_message_processor(WatchdogNode * wdNode, WDPacketData * pkt) } break; - case CLUSTER_IAM_RESIGNING_FROM_MASTER: + case CLUSTER_IAM_RESIGNING_FROM_LEADER: { - if (WD_MASTER_NODE == wdNode) + if (WD_LEADER_NODE == wdNode) { ereport(LOG, - (errmsg("master/coordinator node \"%s\" decided to resigning from master, probably because of split-brain", + (errmsg("leader/coordinator node \"%s\" decided to resigning from leader, probably because of split-brain", wdNode->nodeName), errdetail("re-initializing the local watchdog cluster state"))); @@ -3911,9 +3911,9 @@ cluster_service_message_processor(WatchdogNode * wdNode, WDPacketData * pkt) else { ereport(LOG, - (errmsg("master/coordinator node \"%s\" decided to resign from master, probably because of split-brain", + (errmsg("leader/coordinator node \"%s\" decided to resign from leader, probably because of split-brain", wdNode->nodeName), - errdetail("It was not our coordinator/master anyway. ignoring the message"))); + errdetail("It was not our coordinator/leader anyway. 
ignoring the message"))); } } break; @@ -3940,12 +3940,12 @@ cluster_service_message_processor(WatchdogNode * wdNode, WDPacketData * pkt) } break; - case CLUSTER_IAM_NOT_TRUE_MASTER: + case CLUSTER_IAM_NOT_TRUE_LEADER: { - if (WD_MASTER_NODE == wdNode) + if (WD_LEADER_NODE == wdNode) { ereport(LOG, - (errmsg("master/coordinator node \"%s\" decided it was not true master, probably because of split-brain", wdNode->nodeName), + (errmsg("leader/coordinator node \"%s\" decided it was not true leader, probably because of split-brain", wdNode->nodeName), errdetail("re-initializing the local watchdog cluster state"))); set_state(WD_JOINING); @@ -3953,15 +3953,15 @@ cluster_service_message_processor(WatchdogNode * wdNode, WDPacketData * pkt) else if (get_local_node_state() == WD_COORDINATOR) { ereport(LOG, - (errmsg("node \"%s\" was also thinking it was a master/coordinator and decided to resign", wdNode->nodeName), + (errmsg("node \"%s\" was also thinking it was a leader/coordinator and decided to resign", wdNode->nodeName), errdetail("cluster is recovering from split-brain"))); } else { ereport(LOG, - (errmsg("master/coordinator node \"%s\" decided to resign from master, probably because of split-brain", + (errmsg("leader/coordinator node \"%s\" decided to resign from leader, probably because of split-brain", wdNode->nodeName), - errdetail("but it was not our coordinator/master anyway. ignoring the message"))); + errdetail("but it was not our coordinator/leader anyway. ignoring the message"))); } } break; @@ -4095,7 +4095,7 @@ standard_packet_processor(WatchdogNode * wdNode, WDPacketData * pkt) cluster_service_message_processor(wdNode, pkt); break; - case WD_GET_MASTER_DATA_REQUEST: + case WD_GET_LEADER_DATA_REQUEST: replyPkt = process_data_request(wdNode, pkt); break; @@ -4171,15 +4171,15 @@ standard_packet_processor(WatchdogNode * wdNode, WDPacketData * pkt) if (wdNode->state == WD_COORDINATOR) { - if (WD_MASTER_NODE == NULL) + if (WD_LEADER_NODE == NULL) { - set_cluster_master_node(wdNode); + set_cluster_leader_node(wdNode); } - else if (WD_MASTER_NODE != wdNode) + else if (WD_LEADER_NODE != wdNode) { ereport(LOG, (errmsg("\"%s\" is the coordinator as per our record but \"%s\" is also announcing as a coordinator", - WD_MASTER_NODE->nodeName, wdNode->nodeName), + WD_LEADER_NODE->nodeName, wdNode->nodeName), errdetail("cluster is in the split-brain"))); if (get_local_node_state() != WD_COORDINATOR) @@ -4196,16 +4196,16 @@ standard_packet_processor(WatchdogNode * wdNode, WDPacketData * pkt) /* * okay the contention is between me and the other * node try to figure out which node is the worthy - * master + * leader */ ereport(LOG, (errmsg("I am the coordinator but \"%s\" is also announcing as a coordinator", wdNode->nodeName), - errdetail("trying to figure out the best contender for the master/coordinator node"))); + errdetail("trying to figure out the best contender for the leader/coordinator node"))); handle_split_brain(wdNode, pkt); } } - else if (WD_MASTER_NODE == wdNode && oldQuorumStatus != wdNode->quorum_status) + else if (WD_LEADER_NODE == wdNode && oldQuorumStatus != wdNode->quorum_status) { /* inform Pgpool main about quorum status changes */ register_watchdog_quorum_change_interupt(); @@ -4213,10 +4213,10 @@ standard_packet_processor(WatchdogNode * wdNode, WDPacketData * pkt) } /* - * if the info message is from master node. Make sure we are - * in sync with the master node state + * if the info message is from leader node. 
Make sure we are + * in sync with the leader node state */ - else if (WD_MASTER_NODE == wdNode) + else if (WD_LEADER_NODE == wdNode) { if (wdNode->state != WD_COORDINATOR) { @@ -4267,7 +4267,7 @@ standard_packet_processor(WatchdogNode * wdNode, WDPacketData * pkt) /* * if I am coordinator reply with accept, otherwise reject */ - if (g_cluster.localNode == WD_MASTER_NODE) + if (g_cluster.localNode == WD_LEADER_NODE) { replyPkt = get_minimum_message(WD_ACCEPT_MESSAGE, pkt); } @@ -4284,24 +4284,24 @@ standard_packet_processor(WatchdogNode * wdNode, WDPacketData * pkt) * if the message is received from coordinator reply with * info, otherwise reject */ - if (WD_MASTER_NODE != NULL && wdNode != WD_MASTER_NODE) + if (WD_LEADER_NODE != NULL && wdNode != WD_LEADER_NODE) { ereport(LOG, (errmsg("\"%s\" is our coordinator node, but \"%s\" is also announcing as a coordinator", - WD_MASTER_NODE->nodeName, wdNode->nodeName), + WD_LEADER_NODE->nodeName, wdNode->nodeName), errdetail("broadcasting the cluster in split-brain message"))); send_cluster_service_message(NULL, pkt, CLUSTER_IN_SPLIT_BRAIN); } - else if (WD_MASTER_NODE != NULL) + else if (WD_LEADER_NODE != NULL) { replyPkt = get_mynode_info_message(pkt); beacon_message_received_from_node(wdNode, pkt); } /* - * if (WD_MASTER_NODE == NULL) + * if (WD_LEADER_NODE == NULL) * do not reply to beacon if we are not connected to - * any master node + * any leader node */ } break; @@ -5114,9 +5114,9 @@ static inline WD_STATES get_local_node_state(void) } static inline bool -is_local_node_true_master(void) +is_local_node_true_leader(void) { - return (get_local_node_state() == WD_COORDINATOR && WD_MASTER_NODE == g_cluster.localNode); + return (get_local_node_state() == WD_COORDINATOR && WD_LEADER_NODE == g_cluster.localNode); } /* @@ -5202,7 +5202,7 @@ wd_commands_packet_processor(WD_EVENTS event, WatchdogNode * wdNode, WDPacketDat if (pkt->type == WD_ACCEPT_MESSAGE) reply_to_failove_command(ipcCommand, FAILOVER_RES_PROCEED, 0); else - reply_to_failove_command(ipcCommand, FAILOVER_RES_MASTER_REJECTED, 0); + reply_to_failove_command(ipcCommand, FAILOVER_RES_LEADER_REJECTED, 0); return true; } @@ -5299,11 +5299,11 @@ watchdog_state_machine(WD_EVENTS event, WatchdogNode * wdNode, WDPacketData * pk /* Inform the node, that it is lost for us */ send_cluster_service_message(wdNode, pkt, CLUSTER_NODE_APPEARING_LOST); } - if (wdNode == WD_MASTER_NODE) + if (wdNode == WD_LEADER_NODE) { ereport(LOG, (errmsg("watchdog cluster has lost the coordinator node"))); - set_cluster_master_node(NULL); + set_cluster_leader_node(NULL); } /* close all socket connections to the node */ @@ -5579,7 +5579,7 @@ watchdog_state_machine_joining(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD switch (event) { case WD_EVENT_WD_STATE_CHANGED: - set_cluster_master_node(NULL); + set_cluster_leader_node(NULL); try_connecting_with_all_unreachable_nodes(); send_cluster_command(NULL, WD_REQ_INFO_MESSAGE, 4); set_timeout(MAX_SECS_WAIT_FOR_REPLY_FROM_NODE); @@ -5663,10 +5663,10 @@ watchdog_state_machine_initializing(WD_EVENTS event, WatchdogNode * wdNode, WDPa case WD_EVENT_TIMEOUT: { /* - * If master node exists in cluser, Join it otherwise try - * becoming a master + * If leader node exists in cluser, Join it otherwise try + * becoming a leader */ - if (WD_MASTER_NODE) + if (WD_LEADER_NODE) { /* * we found the coordinator node in network. 
Just join the @@ -5761,7 +5761,7 @@ watchdog_state_machine_standForCord(WD_EVENTS event, WatchdogNode * wdNode, WDPa { ereport(LOG, (errmsg("our stand for coordinator request is rejected by node \"%s\"",wdNode->nodeName), - errdetail("we might be in partial network isolation and cluster already have a valid master"), + errdetail("we might be in partial network isolation and cluster already have a valid leader"), errhint("please verify the watchdog life-check and network is working properly"))); set_state(WD_NETWORK_ISOLATION); } @@ -5853,9 +5853,9 @@ watchdog_state_machine_standForCord(WD_EVENTS event, WatchdogNode * wdNode, WDPa } /* - * Event handler for the coordinator/master state. + * Event handler for the coordinator/leader state. * The function handels all the event received when the local - * node is the master/coordinator node. + * node is the leader/coordinator node. */ static int watchdog_state_machine_coordinator(WD_EVENTS event, WatchdogNode * wdNode, WDPacketData * pkt, WDCommandData * clusterCommand) @@ -5870,7 +5870,7 @@ watchdog_state_machine_coordinator(WD_EVENTS event, WatchdogNode * wdNode, WDPac set_timeout(MAX_SECS_WAIT_FOR_REPLY_FROM_NODE); update_missed_beacon_count(NULL,true); ereport(LOG, - (errmsg("I am announcing my self as master/coordinator watchdog node"))); + (errmsg("I am announcing my self as leader/coordinator watchdog node"))); for (i = 0; i < g_cluster.remoteNodeCount; i++) { @@ -5908,7 +5908,7 @@ watchdog_state_machine_coordinator(WD_EVENTS event, WatchdogNode * wdNode, WDPac (errmsg("I am the cluster leader node"), errdetail("our declare coordinator message is accepted by all nodes"))); - set_cluster_master_node(g_cluster.localNode); + set_cluster_leader_node(g_cluster.localNode); register_watchdog_state_change_interupt(); /* @@ -5979,8 +5979,8 @@ watchdog_state_machine_coordinator(WD_EVENTS event, WatchdogNode * wdNode, WDPac case WD_EVENT_CLUSTER_QUORUM_CHANGED: { - /* make sure we are accepted as master */ - if (WD_MASTER_NODE == g_cluster.localNode) + /* make sure we are accepted as leader */ + if (WD_LEADER_NODE == g_cluster.localNode) { if (g_cluster.quorum_status == -1) { @@ -5988,7 +5988,7 @@ watchdog_state_machine_coordinator(WD_EVENTS event, WatchdogNode * wdNode, WDPac (errmsg("We have lost the quorum"))); /* - * We have lost the quorum, stay as a master node but + * We have lost the quorum, stay as a leader node but * perform de-escalation. 
As keeping the VIP may * result in split-brain */ @@ -6030,7 +6030,7 @@ watchdog_state_machine_coordinator(WD_EVENTS event, WatchdogNode * wdNode, WDPac * We do have some IP addresses assigned so its not a * total black-out check if we still have the VIP assigned */ - if (g_cluster.clusterMasterInfo.holding_vip == true) + if (g_cluster.clusterLeaderInfo.holding_vip == true) { ListCell *lc; bool vip_exists = false; @@ -6087,10 +6087,10 @@ watchdog_state_machine_coordinator(WD_EVENTS event, WatchdogNode * wdNode, WDPac { /* * Since data version 1.1 we support CLUSTER_NODE_REQUIRE_TO_RELOAD - * which makes the standby nodes to re-send the join master node + * which makes the standby nodes to re-send the join leader node */ ereport(DEBUG1, - (errmsg("asking remote node \"%s\" to rejoin master", wdNode->nodeName), + (errmsg("asking remote node \"%s\" to rejoin leader", wdNode->nodeName), errdetail("watchdog data version %s",WD_MESSAGE_DATA_VERSION))); send_cluster_service_message(wdNode, pkt, CLUSTER_NODE_REQUIRE_TO_RELOAD); @@ -6121,16 +6121,16 @@ watchdog_state_machine_coordinator(WD_EVENTS event, WatchdogNode * wdNode, WDPac (errmsg("remote node \"%s\" is reachable again", wdNode->nodeName), errdetail("trying to add it back as a standby"))); wdNode->node_lost_reason = NODE_LOST_UNKNOWN_REASON; - /* If I am the cluster master. Ask for the node info and to re-send the join message */ + /* If I am the cluster leader. Ask for the node info and to re-send the join message */ send_message_of_type(wdNode, WD_REQ_INFO_MESSAGE, NULL); if (wdNode->wd_data_major_version >= 1 && wdNode->wd_data_minor_version >= 1) { /* * Since data version 1.1 we support CLUSTER_NODE_REQUIRE_TO_RELOAD - * which makes the standby nodes to re-send the join master node + * which makes the standby nodes to re-send the join leader node */ ereport(DEBUG1, - (errmsg("asking remote node \"%s\" to rejoin master", wdNode->nodeName), + (errmsg("asking remote node \"%s\" to rejoin leader", wdNode->nodeName), errdetail("watchdog data version %s",WD_MESSAGE_DATA_VERSION))); send_cluster_service_message(wdNode, pkt, CLUSTER_NODE_REQUIRE_TO_RELOAD); @@ -6185,13 +6185,13 @@ watchdog_state_machine_coordinator(WD_EVENTS event, WatchdogNode * wdNode, WDPac /* * we are not able to decide which should be * the best candidate to stay as - * master/coordinator node This could also + * leader/coordinator node This could also * happen if the remote node is using the * older version of Pgpool-II which send the * empty beacon messages. */ ereport(LOG, - (errmsg("We are in split brain, and not able to decide the best candidate for master/coordinator"), + (errmsg("We are in split brain, and not able to decide the best candidate for leader/coordinator"), errdetail("re-initializing the local watchdog cluster state"))); send_cluster_service_message(wdNode, pkt, CLUSTER_NEEDS_ELECTION); @@ -6260,7 +6260,7 @@ watchdog_state_machine_coordinator(WD_EVENTS event, WatchdogNode * wdNode, WDPac * and incorrect from the other pgpool-II nodes in the cluster. So the ideal solution * for the situation is to make the pgpool-II main process aware of the network black out * and when the network recovers the pgpool-II asks the watchdog to sync again the state of - * all configured backend nodes from the master pgpool-II node. But to implement this lot + * all configured backend nodes from the leader pgpool-II node. 
But to implement this lot * of time is required, So until that time we are just opting for the easiest solution here * which is to commit a suicide as soon an the network becomes unreachable */ @@ -6329,7 +6329,7 @@ watchdog_state_machine_nw_error(WD_EVENTS event, WatchdogNode * wdNode, WDPacket /* * we could end up in tis state if we were connected to the - * master node as standby and got lost on the master. + * leader node as standby and got lost on the leader. * Here we just wait for BEACON_MESSAGE_INTERVAL_SECONDS * and retry to join the cluster. */ @@ -6401,107 +6401,107 @@ beacon_message_received_from_node(WatchdogNode * wdNode, WDPacketData * pkt) } /* - * This function decides the best contender for a coordinator/master node + * This function decides the best contender for a coordinator/leader node * when the remote node info states it is a coordinator while - * the local node is also in the master/coordinator state. + * the local node is also in the leader/coordinator state. * * return: - * -1 : remote node is the best candidate to remain as master - * 0 : both local and remote nodes are not worthy master or error - * 1 : local node should remain as the master/coordinator + * -1 : remote node is the best candidate to remain as leader + * 0 : both local and remote nodes are not worthy leader or error + * 1 : local node should remain as the leader/coordinator */ static int -I_am_master_and_cluser_in_split_brain(WatchdogNode * otherMasterNode) +I_am_leader_and_cluser_in_split_brain(WatchdogNode * otherLeaderNode) { if (get_local_node_state() != WD_COORDINATOR) return 0; - if (otherMasterNode->state != WD_COORDINATOR) + if (otherLeaderNode->state != WD_COORDINATOR) return 0; - if (otherMasterNode->current_state_time.tv_sec == 0) + if (otherLeaderNode->current_state_time.tv_sec == 0) { ereport(LOG, - (errmsg("not enough data to decide the master node"), - errdetail("the watchdog node:\"%s\" is using the older version of Pgpool-II", otherMasterNode->nodeName))); + (errmsg("not enough data to decide the leader node"), + errdetail("the watchdog node:\"%s\" is using the older version of Pgpool-II", otherLeaderNode->nodeName))); return 0; } - /* Decide which node should stay as master */ - if (otherMasterNode->escalated != g_cluster.localNode->escalated) + /* Decide which node should stay as leader */ + if (otherLeaderNode->escalated != g_cluster.localNode->escalated) { - if (otherMasterNode->escalated == true && g_cluster.localNode->escalated == false) + if (otherLeaderNode->escalated == true && g_cluster.localNode->escalated == false) { - /* remote node stays as the master */ + /* remote node stays as the leader */ ereport(LOG, - (errmsg("remote node:\"%s\" is best suitable to stay as master because it is escalated and I am not", - otherMasterNode->nodeName))); + (errmsg("remote node:\"%s\" is best suitable to stay as leader because it is escalated and I am not", + otherLeaderNode->nodeName))); return -1; } else { - /* local node stays as master */ + /* local node stays as leader */ ereport(LOG, - (errmsg("remote node:\"%s\" should step down from master because it is not escalated", - otherMasterNode->nodeName))); + (errmsg("remote node:\"%s\" should step down from leader because it is not escalated", + otherLeaderNode->nodeName))); return 1; } } - else if (otherMasterNode->quorum_status != g_cluster.quorum_status) + else if (otherLeaderNode->quorum_status != g_cluster.quorum_status) { - if (otherMasterNode->quorum_status > g_cluster.quorum_status) + if (otherLeaderNode->quorum_status > 
g_cluster.quorum_status) { /* quorum of remote node is in better state */ ereport(LOG, - (errmsg("remote node:\"%s\" is best suitable to stay as master because it holds the quorum" - ,otherMasterNode->nodeName))); + (errmsg("remote node:\"%s\" is best suitable to stay as leader because it holds the quorum" + ,otherLeaderNode->nodeName))); return -1; } else { - /* local node stays as master */ + /* local node stays as leader */ ereport(LOG, - (errmsg("remote node:\"%s\" should step down from master because it does not hold the quorum" - ,otherMasterNode->nodeName))); + (errmsg("remote node:\"%s\" should step down from leader because it does not hold the quorum" + ,otherLeaderNode->nodeName))); return 1; } } - else if (otherMasterNode->standby_nodes_count != g_cluster.clusterMasterInfo.standby_nodes_count) + else if (otherLeaderNode->standby_nodes_count != g_cluster.clusterLeaderInfo.standby_nodes_count) { - if (otherMasterNode->standby_nodes_count > g_cluster.clusterMasterInfo.standby_nodes_count) + if (otherLeaderNode->standby_nodes_count > g_cluster.clusterLeaderInfo.standby_nodes_count) { /* remote node has more alive nodes */ ereport(LOG, - (errmsg("remote node:\"%s\" is best suitable to stay as master because it has more connected standby nodes" - ,otherMasterNode->nodeName))); + (errmsg("remote node:\"%s\" is best suitable to stay as leader because it has more connected standby nodes" + ,otherLeaderNode->nodeName))); return -1; } else { - /* local node stays as master */ + /* local node stays as leader */ ereport(LOG, - (errmsg("remote node:\"%s\" should step down from master because we have more connected standby nodes" - ,otherMasterNode->nodeName))); + (errmsg("remote node:\"%s\" should step down from leader because we have more connected standby nodes" + ,otherLeaderNode->nodeName))); return 1; } } else /* decide on which node is the older mater */ { - if (otherMasterNode->current_state_time.tv_sec < g_cluster.localNode->current_state_time.tv_sec) + if (otherLeaderNode->current_state_time.tv_sec < g_cluster.localNode->current_state_time.tv_sec) { /* remote node has more alive nodes */ ereport(LOG, - (errmsg("remote node:\"%s\" is best suitable to stay as master because it is the older master" - ,otherMasterNode->nodeName))); + (errmsg("remote node:\"%s\" is best suitable to stay as leader because it is the older leader" + ,otherLeaderNode->nodeName))); return -1; } else { - /* local node should keep the master status */ + /* local node should keep the leader status */ ereport(LOG, - (errmsg("remote node:\"%s\" should step down from master because we are the older master" - ,otherMasterNode->nodeName))); + (errmsg("remote node:\"%s\" should step down from leader because we are the older leader" + ,otherLeaderNode->nodeName))); return 1; } @@ -6510,42 +6510,42 @@ I_am_master_and_cluser_in_split_brain(WatchdogNode * otherMasterNode) } static void -handle_split_brain(WatchdogNode * otherMasterNode, WDPacketData * pkt) +handle_split_brain(WatchdogNode * otherLeaderNode, WDPacketData * pkt) { - int decide_master = I_am_master_and_cluser_in_split_brain(otherMasterNode); + int decide_leader = I_am_leader_and_cluser_in_split_brain(otherLeaderNode); - if (decide_master == 0) + if (decide_leader == 0) { /* * we are not able to decide which should be the best candidate to - * stay as master/coordinator node This could also happen if the + * stay as leader/coordinator node This could also happen if the * remote node is using the older version of Pgpool-II which send the * empty beacon 
messages. */ ereport(LOG, - (errmsg("We are in split brain, and not able to decide the best candidate for master/coordinator"), + (errmsg("We are in split brain, and not able to decide the best candidate for leader/coordinator"), errdetail("re-initializing the local watchdog cluster state"))); - send_cluster_service_message(otherMasterNode, pkt, CLUSTER_NEEDS_ELECTION); + send_cluster_service_message(otherLeaderNode, pkt, CLUSTER_NEEDS_ELECTION); set_state(WD_JOINING); } - else if (decide_master == -1) + else if (decide_leader == -1) { - /* Remote node is the best candidate for the master node */ + /* Remote node is the best candidate for the leader node */ ereport(LOG, - (errmsg("We are in split brain, and \"%s\" node is the best candidate for master/coordinator" - ,otherMasterNode->nodeName), + (errmsg("We are in split brain, and \"%s\" node is the best candidate for leader/coordinator" + ,otherLeaderNode->nodeName), errdetail("re-initializing the local watchdog cluster state"))); - /* broadcast the message about I am not the true master node */ - send_cluster_service_message(NULL, pkt, CLUSTER_IAM_NOT_TRUE_MASTER); + /* broadcast the message about I am not the true leader node */ + send_cluster_service_message(NULL, pkt, CLUSTER_IAM_NOT_TRUE_LEADER); set_state(WD_JOINING); } else { - /* I am the best candidate for the master node */ + /* I am the best candidate for the leader node */ ereport(LOG, - (errmsg("We are in split brain, and I am the best candidate for master/coordinator"), - errdetail("asking the remote node \"%s\" to step down", otherMasterNode->nodeName))); - send_cluster_service_message(otherMasterNode, pkt, CLUSTER_IAM_TRUE_MASTER); + (errmsg("We are in split brain, and I am the best candidate for leader/coordinator"), + errdetail("asking the remote node \"%s\" to step down", otherLeaderNode->nodeName))); + send_cluster_service_message(otherLeaderNode, pkt, CLUSTER_IAM_TRUE_LEADER); } } @@ -6583,7 +6583,7 @@ start_escalated_node(void) ereport(LOG, (errmsg("escalation process started with PID:%d", g_cluster.escalation_pid))); if (strlen(g_cluster.localNode->delegate_ip) > 0) - g_cluster.clusterMasterInfo.holding_vip = true; + g_cluster.clusterLeaderInfo.holding_vip = true; } else { @@ -6617,7 +6617,7 @@ resign_from_escalated_node(void) (errmsg("escalation process does not exited in time"), errdetail("starting the de-escalation anyway"))); g_cluster.de_escalation_pid = fork_plunging_process(); - g_cluster.clusterMasterInfo.holding_vip = false; + g_cluster.clusterLeaderInfo.holding_vip = false; g_cluster.localNode->escalated = false; reset_watchdog_node_escalated(); } @@ -6697,7 +6697,7 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD switch (event) { case WD_EVENT_WD_STATE_CHANGED: - send_cluster_command(WD_MASTER_NODE, WD_JOIN_COORDINATOR_MESSAGE, 5); + send_cluster_command(WD_LEADER_NODE, WD_JOIN_COORDINATOR_MESSAGE, 5); /* Also reset my priority as per the original configuration */ g_cluster.localNode->wd_priority = pool_config->wd_priority; set_timeout(BEACON_MESSAGE_INTERVAL_SECONDS); @@ -6710,9 +6710,9 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD case WD_EVENT_WD_STATE_REQUIRE_RELOAD: ereport(LOG, - (errmsg("re-sending join coordinator message to master node: \"%s\"", WD_MASTER_NODE->nodeName))); + (errmsg("re-sending join coordinator message to leader node: \"%s\"", WD_LEADER_NODE->nodeName))); - send_cluster_command(WD_MASTER_NODE, WD_JOIN_COORDINATOR_MESSAGE, 5); + 
send_cluster_command(WD_LEADER_NODE, WD_JOIN_COORDINATOR_MESSAGE, 5); break; case WD_EVENT_COMMAND_FINISHED: @@ -6726,7 +6726,7 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD ereport(LOG, (errmsg("successfully joined the watchdog cluster as standby node"), - errdetail("our join coordinator request is accepted by cluster leader node \"%s\"", WD_MASTER_NODE->nodeName))); + errdetail("our join coordinator request is accepted by cluster leader node \"%s\"", WD_LEADER_NODE->nodeName))); } else { @@ -6734,10 +6734,10 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD (errmsg("our join coordinator is rejected by node \"%s\"", wdNode->nodeName), errhint("rejoining the cluster."))); - if (WD_MASTER_NODE->has_lost_us) + if (WD_LEADER_NODE->has_lost_us) { ereport(LOG, - (errmsg("master node \"%s\" thinks we are lost, and \"%s\" is not letting us join",WD_MASTER_NODE->nodeName,wdNode->nodeName), + (errmsg("leader node \"%s\" thinks we are lost, and \"%s\" is not letting us join",WD_LEADER_NODE->nodeName,wdNode->nodeName), errhint("please verify the watchdog life-check and network is working properly"))); set_state(WD_NETWORK_ISOLATION); } @@ -6757,10 +6757,10 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD * removed from it's standby list * So re-Join the cluster */ - if (WD_MASTER_NODE == wdNode) + if (WD_LEADER_NODE == wdNode) { ereport(LOG, - (errmsg("we are lost on the master node \"%s\"",wdNode->nodeName))); + (errmsg("we are lost on the leader node \"%s\"",wdNode->nodeName))); set_state(WD_JOINING); } } @@ -6772,10 +6772,10 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD * we have lost one remote connected node check if the node * was coordinator */ - if (WD_MASTER_NODE == NULL) + if (WD_LEADER_NODE == NULL) { ereport(LOG, - (errmsg("We have lost the cluster master node \"%s\"", wdNode->nodeName))); + (errmsg("We have lost the cluster leader node \"%s\"", wdNode->nodeName))); set_state(WD_JOINING); } } @@ -6790,10 +6790,10 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD /* In case we received the ADD node message from * our coordinator. 
Reset the cluster state */ - if (wdNode == WD_MASTER_NODE) + if (wdNode == WD_LEADER_NODE) { ereport(LOG, - (errmsg("received ADD NODE message from the master node \"%s\"", wdNode->nodeName), + (errmsg("received ADD NODE message from the leader node \"%s\"", wdNode->nodeName), errdetail("re-joining the cluster"))); set_state(WD_JOINING); } @@ -6809,7 +6809,7 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD case WD_STAND_FOR_COORDINATOR_MESSAGE: { - if (WD_MASTER_NODE == NULL) + if (WD_LEADER_NODE == NULL) { reply_with_minimal_message(wdNode, WD_ACCEPT_MESSAGE, pkt); set_state(WD_PARTICIPATE_IN_ELECTION); @@ -6817,27 +6817,27 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD else { ereport(LOG, - (errmsg("We are connected to master node \"%s\" and another node \"%s\" is trying to become a master",WD_MASTER_NODE->nodeName, wdNode->nodeName))); + (errmsg("We are connected to leader node \"%s\" and another node \"%s\" is trying to become a leader",WD_LEADER_NODE->nodeName, wdNode->nodeName))); reply_with_minimal_message(wdNode, WD_ERROR_MESSAGE, pkt); - /* Ask master to re-send its node info */ - send_message_of_type(WD_MASTER_NODE, WD_REQ_INFO_MESSAGE, NULL); + /* Ask leader to re-send its node info */ + send_message_of_type(WD_LEADER_NODE, WD_REQ_INFO_MESSAGE, NULL); } } break; case WD_DECLARE_COORDINATOR_MESSAGE: { - if (wdNode != WD_MASTER_NODE) + if (wdNode != WD_LEADER_NODE) { /* - * we already have a master node and we got a - * new node trying to be master + * we already have a leader node and we got a + * new node trying to be leader */ ereport(LOG, - (errmsg("We are connected to master node \"%s\" and another node \"%s\" is trying to declare itself as a master",WD_MASTER_NODE->nodeName, wdNode->nodeName))); + (errmsg("We are connected to leader node \"%s\" and another node \"%s\" is trying to declare itself as a leader",WD_LEADER_NODE->nodeName, wdNode->nodeName))); reply_with_minimal_message(wdNode, WD_ERROR_MESSAGE, pkt); - /* Ask master to re-send its node info */ - send_message_of_type(WD_MASTER_NODE, WD_REQ_INFO_MESSAGE, NULL); + /* Ask leader to re-send its node info */ + send_message_of_type(WD_LEADER_NODE, WD_REQ_INFO_MESSAGE, NULL); } } @@ -6849,11 +6849,11 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD * if the message is received from coordinator * reply with info, otherwise reject */ - if (wdNode != WD_MASTER_NODE) + if (wdNode != WD_LEADER_NODE) { ereport(LOG, (errmsg("\"%s\" is our coordinator node, but \"%s\" is also announcing as a coordinator", - WD_MASTER_NODE->nodeName, wdNode->nodeName), + WD_LEADER_NODE->nodeName, wdNode->nodeName), errdetail("broadcasting the cluster in split-brain message"))); send_cluster_service_message(NULL, pkt, CLUSTER_IN_SPLIT_BRAIN); @@ -6879,35 +6879,35 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD /* * before returning from the function make sure that we are connected with - * the master node + * the leader node */ - if (WD_MASTER_NODE) + if (WD_LEADER_NODE) { struct timeval currTime; gettimeofday(&currTime, NULL); - int last_rcv_sec = WD_TIME_DIFF_SEC(currTime, WD_MASTER_NODE->last_rcv_time); + int last_rcv_sec = WD_TIME_DIFF_SEC(currTime, WD_LEADER_NODE->last_rcv_time); if (last_rcv_sec >= (3 * BEACON_MESSAGE_INTERVAL_SECONDS)) { - /* we have missed atleast two beacons from master node */ + /* we have missed atleast two beacons from leader node */ ereport(WARNING, - (errmsg("we have not 
@@ -6879,35 +6879,35 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD
/*
* before returning from the function make sure that we are connected with
- * the master node
+ * the leader node
*/
- if (WD_MASTER_NODE)
{
struct timeval currTime;
gettimeofday(&currTime, NULL);
- int last_rcv_sec = WD_TIME_DIFF_SEC(currTime, WD_MASTER_NODE->last_rcv_time);
+ int last_rcv_sec = WD_TIME_DIFF_SEC(currTime, WD_LEADER_NODE->last_rcv_time);
if (last_rcv_sec >= (3 * BEACON_MESSAGE_INTERVAL_SECONDS))
{
- /* we have missed atleast two beacons from master node */
ereport(WARNING,
- (errmsg("we have not received a beacon message from master node \"%s\" and it has not replied to our info request",
- WD_MASTER_NODE->nodeName),
+ (errmsg("we have not received a beacon message from leader node \"%s\" and it has not replied to our info request",
+ WD_LEADER_NODE->nodeName),
errdetail("re-initializing the cluster")));
set_state(WD_JOINING);
}
else if (last_rcv_sec >= (2 * BEACON_MESSAGE_INTERVAL_SECONDS))
{
/*
- * We have not received a last becacon from master ask for the
- * node info from master node
+ * We have not received a last becacon from leader ask for the
+ * node info from leader node
*/
ereport(WARNING,
- (errmsg("we have not received a beacon message from master node \"%s\"",
- WD_MASTER_NODE->nodeName),
- errdetail("requesting info message from master node")));
- send_message_of_type(WD_MASTER_NODE, WD_REQ_INFO_MESSAGE, NULL);
+ (errmsg("we have not received a beacon message from leader node \"%s\"",
+ WD_LEADER_NODE->nodeName),
+ errdetail("requesting info message from leader node")));
+ send_message_of_type(WD_LEADER_NODE, WD_REQ_INFO_MESSAGE, NULL);
}
}
return 0;
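The hunk above escalates in two steps based on how long it has been since the leader's last beacon. A small sketch of just that timing rule (the interval value and helper name are assumptions for illustration, not the Pgpool-II code):

#include <stdio.h>

#define BEACON_MESSAGE_INTERVAL_SECONDS 10   /* assumed value, for illustration only */

typedef enum { BEACON_OK, BEACON_ASK_INFO, BEACON_REJOIN } BeaconAction;

/*
 * Decide what a standby should do based on how long ago it last heard from
 * the leader: at two intervals of silence ask the leader for its node info,
 * at three intervals (at least two missed beacons) give up and re-join.
 */
static BeaconAction
check_leader_beacon(int seconds_since_last_message)
{
    if (seconds_since_last_message >= 3 * BEACON_MESSAGE_INTERVAL_SECONDS)
        return BEACON_REJOIN;
    if (seconds_since_last_message >= 2 * BEACON_MESSAGE_INTERVAL_SECONDS)
        return BEACON_ASK_INFO;
    return BEACON_OK;
}

int main(void)
{
    int samples[] = {5, 25, 35};
    for (int i = 0; i < 3; i++)
        printf("%2ds since last beacon -> action %d\n",
               samples[i], check_leader_beacon(samples[i]));
    return 0;
}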
"local" : "remote", wdNode->nodeName))); - g_cluster.clusterMasterInfo.masterNode = wdNode; + g_cluster.clusterLeaderInfo.leaderNode = wdNode; } } -static WatchdogNode * getMasterWatchdogNode(void) +static WatchdogNode * getLeaderWatchdogNode(void) { - return g_cluster.clusterMasterInfo.masterNode; + return g_cluster.clusterLeaderInfo.leaderNode; } static int @@ -7809,24 +7809,24 @@ standby_node_join_cluster(WatchdogNode * wdNode) int i; /* First check if the node is already in the List */ - for (i = 0; i < g_cluster.clusterMasterInfo.standby_nodes_count; i++) + for (i = 0; i < g_cluster.clusterLeaderInfo.standby_nodes_count; i++) { - WatchdogNode *node = g_cluster.clusterMasterInfo.standbyNodes[i]; + WatchdogNode *node = g_cluster.clusterLeaderInfo.standbyNodes[i]; if (node && node == wdNode) { /* The node is already in the standby list */ - return g_cluster.clusterMasterInfo.standby_nodes_count; + return g_cluster.clusterLeaderInfo.standby_nodes_count; } } /* okay the node is not in the list */ ereport(LOG, (errmsg("adding watchdog node \"%s\" to the standby list", wdNode->nodeName))); - g_cluster.clusterMasterInfo.standbyNodes[g_cluster.clusterMasterInfo.standby_nodes_count] = wdNode; - g_cluster.clusterMasterInfo.standby_nodes_count++; + g_cluster.clusterLeaderInfo.standbyNodes[g_cluster.clusterLeaderInfo.standby_nodes_count] = wdNode; + g_cluster.clusterLeaderInfo.standby_nodes_count++; } - g_cluster.localNode->standby_nodes_count = g_cluster.clusterMasterInfo.standby_nodes_count; - return g_cluster.clusterMasterInfo.standby_nodes_count; + g_cluster.localNode->standby_nodes_count = g_cluster.clusterLeaderInfo.standby_nodes_count; + return g_cluster.clusterLeaderInfo.standby_nodes_count; } static int @@ -7836,19 +7836,19 @@ standby_node_left_cluster(WatchdogNode * wdNode) { int i; bool removed = false; - int standby_nodes_count = g_cluster.clusterMasterInfo.standby_nodes_count; + int standby_nodes_count = g_cluster.clusterLeaderInfo.standby_nodes_count; for (i = 0; i < standby_nodes_count; i++) { - WatchdogNode *node = g_cluster.clusterMasterInfo.standbyNodes[i]; + WatchdogNode *node = g_cluster.clusterLeaderInfo.standbyNodes[i]; if (node) { if (removed) { /* move this to previous index */ - g_cluster.clusterMasterInfo.standbyNodes[i - 1] = node; - g_cluster.clusterMasterInfo.standbyNodes[i] = NULL; + g_cluster.clusterLeaderInfo.standbyNodes[i - 1] = node; + g_cluster.clusterLeaderInfo.standbyNodes[i] = NULL; } else if (node == wdNode) { @@ -7858,15 +7858,15 @@ standby_node_left_cluster(WatchdogNode * wdNode) ereport(LOG, (errmsg("removing watchdog node \"%s\" from the standby list", wdNode->nodeName))); - g_cluster.clusterMasterInfo.standbyNodes[i] = NULL; - g_cluster.clusterMasterInfo.standby_nodes_count--; + g_cluster.clusterLeaderInfo.standbyNodes[i] = NULL; + g_cluster.clusterLeaderInfo.standby_nodes_count--; removed = true; } } } } - g_cluster.localNode->standby_nodes_count = g_cluster.clusterMasterInfo.standby_nodes_count; - return g_cluster.clusterMasterInfo.standby_nodes_count; + g_cluster.localNode->standby_nodes_count = g_cluster.clusterLeaderInfo.standby_nodes_count; + return g_cluster.clusterLeaderInfo.standby_nodes_count; } static void @@ -7876,12 +7876,12 @@ clear_standby_nodes_list(void) ereport(DEBUG1, (errmsg("removing all watchdog nodes from the standby list"), - errdetail("standby list contains %d nodes", g_cluster.clusterMasterInfo.standby_nodes_count))); + errdetail("standby list contains %d nodes", g_cluster.clusterLeaderInfo.standby_nodes_count))); for (i = 0; i 
< g_cluster.remoteNodeCount; i++) { - g_cluster.clusterMasterInfo.standbyNodes[i] = NULL; + g_cluster.clusterLeaderInfo.standbyNodes[i] = NULL; } - g_cluster.clusterMasterInfo.standby_nodes_count = 0; + g_cluster.clusterLeaderInfo.standby_nodes_count = 0; g_cluster.localNode->standby_nodes_count = 0; } @@ -7931,7 +7931,7 @@ static void update_missed_beacon_count(WDCommandData* ipcCommand, bool clear) * Node down request file. In the file, each line consists of watchdog * debug command. The possible commands are same as the defines below * for example to stop Pgpool-II from sending the reply to beacon messages - * from the master node write DO_NOT_REPLY_TO_BEACON in watchdog_debug_requests + * from the leader node write DO_NOT_REPLY_TO_BEACON in watchdog_debug_requests * * * echo "DO_NOT_REPLY_TO_BEACON" > pgpool_logdir/watchdog_debug_requests diff --git a/src/watchdog/wd_internal_commands.c b/src/watchdog/wd_internal_commands.c index e85b49c..f5dacaa 100644 --- a/src/watchdog/wd_internal_commands.c +++ b/src/watchdog/wd_internal_commands.c @@ -104,16 +104,16 @@ wd_ipc_initialize_data(void) /* * function gets the PG backend status of all attached nodes from - * the master watchdog node. + * the leader watchdog node. */ WDPGBackendStatus * -get_pg_backend_status_from_master_wd_node(void) +get_pg_backend_status_from_leader_wd_node(void) { unsigned int *shared_key = get_ipc_shared_key(); char *data = get_data_request_json(WD_DATE_REQ_PG_BACKEND_DATA, shared_key ? *shared_key : 0, pool_config->wd_authkey); - WDIPCCmdResult *result = issue_command_to_watchdog(WD_GET_MASTER_DATA_REQUEST, + WDIPCCmdResult *result = issue_command_to_watchdog(WD_GET_LEADER_DATA_REQUEST, WD_DEFAULT_IPC_COMMAND_TIMEOUT, data, strlen(data), true); @@ -122,14 +122,14 @@ get_pg_backend_status_from_master_wd_node(void) if (result == NULL) { ereport(WARNING, - (errmsg("get backend node status from master watchdog failed"), + (errmsg("get backend node status from leader watchdog failed"), errdetail("issue command to watchdog returned NULL"))); return NULL; } if (result->type == WD_IPC_CMD_CLUSTER_IN_TRAN) { ereport(WARNING, - (errmsg("get backend node status from master watchdog failed"), + (errmsg("get backend node status from leader watchdog failed"), errdetail("watchdog cluster is not in stable state"), errhint("try again when the cluster is fully initialized"))); FreeCmdResult(result); @@ -138,7 +138,7 @@ get_pg_backend_status_from_master_wd_node(void) else if (result->type == WD_IPC_CMD_TIMEOUT) { ereport(WARNING, - (errmsg("get backend node status from master watchdog failed"), + (errmsg("get backend node status from leader watchdog failed"), errdetail("ipc command timeout"))); FreeCmdResult(result); return NULL; @@ -149,7 +149,7 @@ get_pg_backend_status_from_master_wd_node(void) /* * Watchdog returns the zero length data when the node itself is a - * master watchdog node + * leader watchdog node */ if (result->length <= 0) { @@ -165,7 +165,7 @@ get_pg_backend_status_from_master_wd_node(void) } ereport(WARNING, - (errmsg("get backend node status from master watchdog failed"))); + (errmsg("get backend node status from leader watchdog failed"))); FreeCmdResult(result); return NULL; } @@ -432,7 +432,7 @@ wd_issue_failover_command(char *func_name, int *node_id_set, int count, unsigned * now watchdog can respond to the request in following ways. * * 1 - It can tell the caller to procees with failover. This - * happens when the current node is the master watchdog node. 
diff --git a/src/watchdog/wd_internal_commands.c b/src/watchdog/wd_internal_commands.c
index e85b49c..f5dacaa 100644
--- a/src/watchdog/wd_internal_commands.c
+++ b/src/watchdog/wd_internal_commands.c
@@ -104,16 +104,16 @@ wd_ipc_initialize_data(void)
/*
* function gets the PG backend status of all attached nodes from
- * the master watchdog node.
+ * the leader watchdog node.
*/
WDPGBackendStatus *
-get_pg_backend_status_from_master_wd_node(void)
+get_pg_backend_status_from_leader_wd_node(void)
{
unsigned int *shared_key = get_ipc_shared_key();
char *data = get_data_request_json(WD_DATE_REQ_PG_BACKEND_DATA,
shared_key ? *shared_key : 0, pool_config->wd_authkey);
- WDIPCCmdResult *result = issue_command_to_watchdog(WD_GET_MASTER_DATA_REQUEST,
+ WDIPCCmdResult *result = issue_command_to_watchdog(WD_GET_LEADER_DATA_REQUEST,
WD_DEFAULT_IPC_COMMAND_TIMEOUT,
data, strlen(data), true);
@@ -122,14 +122,14 @@ get_pg_backend_status_from_master_wd_node(void)
if (result == NULL)
{
ereport(WARNING,
- (errmsg("get backend node status from master watchdog failed"),
+ (errmsg("get backend node status from leader watchdog failed"),
errdetail("issue command to watchdog returned NULL")));
return NULL;
}
if (result->type == WD_IPC_CMD_CLUSTER_IN_TRAN)
{
ereport(WARNING,
- (errmsg("get backend node status from master watchdog failed"),
+ (errmsg("get backend node status from leader watchdog failed"),
errdetail("watchdog cluster is not in stable state"),
errhint("try again when the cluster is fully initialized")));
FreeCmdResult(result);
@@ -138,7 +138,7 @@ get_pg_backend_status_from_master_wd_node(void)
else if (result->type == WD_IPC_CMD_TIMEOUT)
{
ereport(WARNING,
- (errmsg("get backend node status from master watchdog failed"),
+ (errmsg("get backend node status from leader watchdog failed"),
errdetail("ipc command timeout")));
FreeCmdResult(result);
return NULL;
@@ -149,7 +149,7 @@ get_pg_backend_status_from_master_wd_node(void)
/*
* Watchdog returns the zero length data when the node itself is a
- * master watchdog node
+ * leader watchdog node
*/
if (result->length <= 0)
{
@@ -165,7 +165,7 @@ get_pg_backend_status_from_master_wd_node(void)
}
ereport(WARNING,
- (errmsg("get backend node status from master watchdog failed")));
+ (errmsg("get backend node status from leader watchdog failed")));
FreeCmdResult(result);
return NULL;
}
@@ -432,7 +432,7 @@ wd_issue_failover_command(char *func_name, int *node_id_set, int count, unsigned
* now watchdog can respond to the request in following ways.
*
* 1 - It can tell the caller to procees with failover. This
- * happens when the current node is the master watchdog node.
+ * happens when the current node is the leader watchdog node.
*
* 2 - It can tell the caller to failover not allowed
* this happens when either cluster does not have the quorum