[pgpool-general: 6004] pgpool identify nodes bug after stopping all nodes [without detach]

Mariel Cherkassky mariel.cherkassky at gmail.com
Mon Mar 19 02:09:50 JST 2018


Hi,
I'm using pgpool version 3.7.2.
I configured 3 nodes in pgpool (all of them have the DISALLOW_TO_FAILOVER
flag). I set up replication between those 3 nodes with repmgr. I synced all
the nodes and started pgpool (but when I stopped the cluster I forgot to
detach the nodes from the pool via pcp).
After starting the pgpool process many times, it took the pool about 10
minutes to identify the nodes, and during those 10 minutes the log showed
the same output over and over:
[[No Connection]]([No Connection]) - 2018-03-18 18:56:02 - [No Connection]
[31193]LOG:  find_primary_node: checking backend no 0
[[No Connection]]([No Connection]) - 2018-03-18 18:56:02 - [No Connection]
[31193]LOG:  find_primary_node: checking backend no 1
[[No Connection]]([No Connection]) - 2018-03-18 18:56:02 - [No Connection]
[31193]LOG:  find_primary_node: checking backend no 2
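
One rough way to measure how long pgpool kept repeating these checks is to
scan the log for the find_primary_node lines and compare the first and last
timestamps. The following is only a sketch: it assumes the log prefix format
shown above (a "YYYY-MM-DD HH:MM:SS" timestamp, possibly on the line before
the message), and the function name primary_search_span is made up for
illustration.

```python
import re
from datetime import datetime

# Timestamp as it appears in the log prefix above, e.g. "2018-03-18 18:56:02".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def primary_search_span(lines):
    """Return the seconds between the first and last find_primary_node check.

    Returns None when no such check is found. The timestamp may sit on the
    same line as the message or on the line before it, so the most recently
    seen timestamp is remembered while scanning.
    """
    stamps = []
    last_seen = None
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            last_seen = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")
        if "find_primary_node: checking backend" in line and last_seen is not None:
            stamps.append(last_seen)
    if not stamps:
        return None
    return (max(stamps) - min(stamps)).total_seconds()
```

Passing an open log file (or any list of lines) to primary_search_span gives
the length of the search in seconds, which makes it easier to compare the
delay against any timeout values set in pgpool.conf.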

Moreover, I couldn't use the pcp commands; I got the following error:
[postgres@pgpool1 log]$ pcp_node_info -h localhost -U postgres -p 9898 0
Password:
ERROR: connection to host "localhost" failed with error "Connection refused"

Suddenly, after 10 minutes, I saw in the log that the nodes were identified:

[[No Connection]]([No Connection]) - 2018-03-18 18:56:04 - [No Connection]
[31193]LOG:  find_primary_node: checking backend no 0
[[No Connection]]([No Connection]) - 2018-03-18 18:56:04 - [No Connection]
[31193]LOG:  find_primary_node: checking backend no 1
[[No Connection]]([No Connection]) - 2018-03-18 18:56:04 - [No Connection]
[31193]LOG:  find_primary_node: checking backend no 2
[[No Connection]]([No Connection]) - 2018-03-18 18:56:05 - [No Connection]
[31193]LOG:  pgpool-II successfully started. version 3.7.2 (amefuriboshi)
[[No Connection]]([No Connection]) - 2018-03-18 18:57:23 - [No Connection]
[1132]LOG:  forked new pcp worker, pid=1942 socket=8
[[No Connection]]([No Connection]) - 2018-03-18 18:57:23 - [No Connection]
[1132]LOG:  PCP process with pid: 1942 exit with SUCCESS.
[[No Connection]]([No Connection]) - 2018-03-18 18:57:23 - [No Connection]
[1132]LOG:  PCP process with pid: 1942 exits with status 0
[[No Connection]]([No Connection]) - 2018-03-18 18:57:26 - [No Connection]
[1132]LOG:  forked new pcp worker, pid=1957 socket=8
[[No Connection]]([No Connection]) - 2018-03-18 18:57:26 - [No Connection]
[1132]LOG:  PCP process with pid: 1957 exit with SUCCESS.
[[No Connection]]([No Connection]) - 2018-03-18 18:57:26 - [No Connection]
[1132]LOG:  PCP process with pid: 1957 exits with status 0
[[No Connection]]([No Connection]) - 2018-03-18 18:57:29 - [No Connection]
[1132]LOG:  forked new pcp worker, pid=1971 socket=8
[[No Connection]]([No Connection]) - 2018-03-18 18:57:29 - [No Connection]
[1132]LOG:  PCP process with pid: 1971 exit with SUCCESS.
[[No Connection]]([No Connection]) - 2018-03-18 18:57:29 - [No Connection]
[1132]LOG:  PCP process with pid: 1971 exits with status 0
[[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
[1132]LOG:  forked new pcp worker, pid=2074 socket=8
[[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
[2074]LOG:  received failback request for node_id: 1 from pid [2074]
[[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
[31193]LOG:  Pgpool-II parent process has received failover request
[[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
[31193]LOG:  starting fail back. reconnect host pgserver2(5432)
[[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
[31193]LOG:  Node 0 is not down (status: 2)
[[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
[1132]LOG:  PCP process with pid: 2074 exit with SUCCESS.
[[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
[1132]LOG:  PCP process with pid: 2074 exits with status 0
[[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
[31193]LOG:  Do not restart children because we are failing back node id 1
host: pgserver2 port: 5432 and we are in streaming replication mode and not
all backends were down
[[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
[31193]LOG:  find_primary_node_repeatedly: waiting for finding a primary
node
[[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
[31193]LOG:  find_primary_node: checking backend no 0
[[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
[31193]LOG:  find_primary_node: checking backend no 1
[[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
[31193]LOG:  find_primary_node: primary node id is 1
[[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
[31193]LOG:  failover: set new primary node: 1
[[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
[31193]LOG:  failover: set new master node: 0
[[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
[31193]LOG:  failback done. reconnect host pgserver2(5432)
[[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
[1135]LOG:  worker process received restart request
[[No Connection]]([No Connection]) - 2018-03-18 18:57:51 - [No Connection]
[1132]LOG:  restart request received in pcp child process
[[No Connection]]([No Connection]) - 2018-03-18 18:57:51 - [No Connection]
[31193]LOG:  PCP child 1132 exits with status 0 in failover()
[[No Connection]]([No Connection]) - 2018-03-18 18:57:51 - [No Connection]
[31193]LOG:  fork a new PCP child pid 2078 in failover()
[[No Connection]]([No Connection]) - 2018-03-18 18:57:51 - [No Connection]
[31193]LOG:  worker child process with pid: 1135 exits with status 256
[[No Connection]]([No Connection]) - 2018-03-18 18:57:51 - [No Connection]
[31193]LOG:  fork a new worker child process with pid: 2079
[[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
[31193]WARNING:  child process with pid: 31214 was terminated by
segmentation fault
[[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
[31193]LOG:  fork a new child process with pid: 2086
[[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
[2086]LOG:  failback event detected
[[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
[2086]DETAIL:  restarting myself
[[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
[31193]LOG:  child process with pid: 2086 exits with status 256
[[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
[31193]LOG:  fork a new child process with pid: 2087
[[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
[31193]WARNING:  child process with pid: 31215 was terminated by
segmentation fault
[[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
[31193]LOG:  fork a new child process with pid: 2090
[[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
[2090]LOG:  failback event detected
[[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
[2090]DETAIL:  restarting myself
[[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
[31193]LOG:  child process with pid: 2090 exits with status 256
[[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
[31193]LOG:  fork a new child process with pid: 2091




It seems that the pool was stuck and restarting it didn't resolve it. Is
this supposed to happen, or is it a bug? Can you explain the reason behind
it?