[pgpool-general: 5744] Re: pgpool says its degenerating a backend, but it does not...
Tatsuo Ishii
ishii at sraoss.co.jp
Fri Sep 22 14:58:26 JST 2017
Ok, I found a bug and committed/pushed the fix:
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=be3712a31628a4378f0844c9181463c42fac5dd3
Pgpool-II performs the failover triggered by
failover_if_affected_tuples_mismatch correctly. But later on, when a
client connects to Pgpool-II, a Pgpool-II child process thoughtlessly
tries to connect to the backend that was just detached from Pgpool-II
and changes its state back to "up".
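With the fix, a node degenerated by the tuple-mismatch check should stay in
"down" status until it is explicitly re-attached. A minimal way to verify
this (a sketch only; the pgpool/pcp ports, user names and node id below are
assumptions, adjust them to your setup):

    $ psql -h <pgpool-host> -p 9999 test -c "show pool_nodes;"      # node 1 should now stay "down"
    $ pcp_attach_node -h <pgpool-host> -p 9898 -U <pcp-user> -n 1   # re-attach it once the data is back in sync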
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
> Hi Benjamin,
>
> I think you hit a bug in Pgpool-II. I will look into this.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
>> Hi,
>>
>> we are using pgpool 3.6.5 with replication and load balancing. The
>> option "failover_if_affected_tuples_mismatch" is set to on.
>> The option worked well for some time: if a tuple mismatch was detected
>> on a DELETE statement, the node was degenerated. But for some reason it
>> now only works occasionally; in most cases the node stays up, even
>> though the pgpool log says it was shut down.
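>>
>> For reference, the relevant pgpool.conf settings for this setup are
>> presumably along these lines (the hostnames are taken from the
>> pool_nodes output below; the weights and the rest are just placeholders):
>>
>>     replication_mode = on                      # statement-level replication to all backends
>>     load_balance_mode = on                     # distribute SELECTs across backends
>>     failover_if_affected_tuples_mismatch = on  # degenerate a node on affected-tuple mismatch
>>     backend_hostname0 = '10.0.1.9'
>>     backend_port0 = 5432
>>     backend_weight0 = 0.5
>>     backend_hostname1 = '10.0.1.11'
>>     backend_port1 = 5432
>>     backend_weight1 = 0.5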
>>
>> Example with 2 database nodes:
>>
>> test=# show pool_nodes;
>>  node_id | hostname  | port | status | lb_weight |  role  | select_cnt | load_balance_node | replication_delay
>> ---------+-----------+------+--------+-----------+--------+------------+-------------------+-------------------
>>  0       | 10.0.1.9  | 5432 | up     | 0.500000  | master | 32         | false             | 0
>>  1       | 10.0.1.11 | 5432 | up     | 0.500000  | slave  | 12         | true              | 0
>> (2 rows)
>>
>>
>> I manually add a table entry into one of the nodes to get them out of
>> sync and send a DELETE to pgpool:
>> test=# DELETE FROM test;
>> ERROR:  pgpool detected difference of the number of inserted, updated or deleted tuples. Possible last query was: "DELETE FROM test;"
>> HINT:  check data consistency between master and other db node
>> server closed the connection unexpectedly
>>         This probably means the server terminated abnormally
>>         before or while processing the request.
>> The connection to the server was lost. Attempting reset: Succeeded.
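>>
>> (To get the nodes out of sync I connect directly to one backend,
>> bypassing pgpool, and add a row there; roughly the following -- the
>> user name and the DEFAULT VALUES insert are just examples, the real
>> table may need explicit column values:)
>>
>>     $ psql -h 10.0.1.11 -p 5432 -U postgres -d test \
>>         -c "INSERT INTO test DEFAULT VALUES;"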
>>
>>
>> The log file states that the node was shut down:
>>
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 3967: LOG: pgpool detected difference of the number of inserted, updated or deleted tuples. Possible last query was: "DELETE FROM test;"
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 3967: LOG: processing command complete
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 3967: DETAIL: CommandComplete: Number of affected tuples are: 1 0
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 3967: LOG: processing ready for query message
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 3967: DETAIL: ReadyForQuery: Degenerate backends: 1
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 3967: LOG: processing ready for query message
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 3967: DETAIL: ReadyForQuery: Number of affected tuples are: 1 0
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 3967: LOG: received degenerate backend request for node_id: 1 from pid [3967]
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 798: LOG: Pgpool-II parent process has received failover request
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 798: LOG: starting degeneration. shutdown host 10.0.1.11(5432)
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 798: LOG: Restart all children
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 798: LOG: failover: set new primary node: -1
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 798: LOG: failover: set new master node: 0
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 3969: LOG: worker process received restart request
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: failover done. shutdown host 10.0.1.11(5432)2017-09-19 19:10:09: pid 798: LOG: failover done. shutdown host 10.0.1.11(5432)
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 3968: LOG: restart request received in pcp child process
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 798: LOG: PCP child 3968 exits with status 0 in failover()
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 798: LOG: fork a new PCP child pid 5163 in failover()
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 798: LOG: child process with pid: 3451 exits with status 0
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 798: LOG: child process with pid: 3451 exited with success and will not be restarted
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 798: LOG: child process with pid: 3452 exits with status 0
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 798: LOG: child process with pid: 3452 exited with success and will not be restarted
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 798: LOG: child process with pid: 3453 exits with status 0
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 798: LOG: child process with pid: 3453 exited with success and will not be restarted
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 798: LOG: child process with pid: 3454 exits with status 0
>>
>>
>> but the node is still up:
>>
>> test=# show pool_nodes;
>>  node_id | hostname  | port | status | lb_weight |  role  | select_cnt | load_balance_node | replication_delay
>> ---------+-----------+------+--------+-----------+--------+------------+-------------------+-------------------
>>  0       | 10.0.1.9  | 5432 | up     | 0.500000  | master | 44         | true              | 0
>>  1       | 10.0.1.11 | 5432 | up     | 0.500000  | slave  | 54         | false             | 0
>> (2 rows)
>>
>>
>> If I send another DELETE, the process repeats, but the node stays up.
>>
>>
>> Is this a bug or is my understanding wrong somewhere?
>>
>>
>> Thanks in advance
>> Benjamin Firl
>>
>>
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general