[pgpool-general: 5744] Re: pgpool says its degenerating a backend, but it does not...
Tatsuo Ishii
ishii at sraoss.co.jp
Fri Sep 22 14:58:26 JST 2017
Ok, I found a bug and committed/pushed the fix:
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=be3712a31628a4378f0844c9181463c42fac5dd3
Pgpool-II performs the failover triggered by
failover_if_affected_tuples_mismatch correctly. But later on, when a
client connects to Pgpool-II, a Pgpool-II child process thoughtlessly
tries to connect to the backend that was just detached from Pgpool-II
and changes its state back to "up".
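With the fix, a node degenerated by the tuple-mismatch check should stay in
"down" status until it is explicitly re-attached. A minimal way to verify
this (a sketch only; the pgpool/pcp ports, user names and node id below are
assumptions, adjust them to your setup):

    $ psql -h <pgpool-host> -p 9999 test -c "show pool_nodes;"      # node 1 should now stay "down"
    $ pcp_attach_node -h <pgpool-host> -p 9898 -U <pcp-user> -n 1   # re-attach it once the data is back in sync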
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
> Hi Benjamin,
>
> I think you hit a bug in Pgpool-II. I will look into this.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
>> Hi,
>>
>> we are using pgpool 3.6.5 with replication and load balancing. The
>> option "failover_if_affected_tuples_mismatch" is set to on.
>> The option worked well for some time: if a tuple mismatch was detected
>> on a DELETE statement, the node was degenerated. But for some reason it
>> now only works occasionally; in most cases the node stays up, even
>> though the pgpool log says it was shut down.
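>>
>> For reference, the relevant pgpool.conf settings for this setup are
>> presumably along these lines (the hostnames are taken from the
>> pool_nodes output below; the weights and the rest are just placeholders):
>>
>>     replication_mode = on                      # statement-level replication to all backends
>>     load_balance_mode = on                     # distribute SELECTs across backends
>>     failover_if_affected_tuples_mismatch = on  # degenerate a node on affected-tuple mismatch
>>     backend_hostname0 = '10.0.1.9'
>>     backend_port0 = 5432
>>     backend_weight0 = 0.5
>>     backend_hostname1 = '10.0.1.11'
>>     backend_port1 = 5432
>>     backend_weight1 = 0.5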
>>
>> Example with 2 database nodes:
>>
>> test=# show pool_nodes;
>>  node_id | hostname  | port | status | lb_weight |  role  | select_cnt | load_balance_node | replication_delay
>> ---------+-----------+------+--------+-----------+--------+------------+-------------------+-------------------
>>  0       | 10.0.1.9  | 5432 | up     | 0.500000  | master | 32         | false             | 0
>>  1       | 10.0.1.11 | 5432 | up     | 0.500000  | slave  | 12         | true              | 0
>> (2 rows)
>>
>>
>> I manually add a table entry into one of the nodes to get them out of
>> sync and send a DELETE to pgpool:
>> test=# DELETE FROM test;
>> ERROR:  pgpool detected difference of the number of inserted, updated or deleted tuples. Possible last query was: "DELETE FROM test;"
>> HINT:  check data consistency between master and other db node
>> server closed the connection unexpectedly
>>         This probably means the server terminated abnormally
>>         before or while processing the request.
>> The connection to the server was lost. Attempting reset: Succeeded.
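>>
>> (To get the nodes out of sync I connect directly to one backend,
>> bypassing pgpool, and add a row there; roughly the following -- the
>> user name and the DEFAULT VALUES insert are just examples, the real
>> table may need explicit column values:)
>>
>>     $ psql -h 10.0.1.11 -p 5432 -U postgres -d test \
>>         -c "INSERT INTO test DEFAULT VALUES;"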
>>
>>
>> The log file states that the node was shut down:
>>
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 3967: LOG: pgpool detected difference of the number of inserted, updated or deleted tuples. Possible last query was: "DELETE FROM test;"
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 3967: LOG: processing command complete
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 3967: DETAIL: CommandComplete: Number of affected tuples are: 1 0
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 3967: LOG: processing ready for query message
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 3967: DETAIL: ReadyForQuery: Degenerate backends: 1
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 3967: LOG: processing ready for query message
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 3967: DETAIL: ReadyForQuery: Number of affected tuples are: 1 0
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 3967: LOG: received degenerate backend request for node_id: 1 from pid [3967]
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 798: LOG: Pgpool-II parent process has received failover request
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 798: LOG: starting degeneration. shutdown host 10.0.1.11(5432)
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 798: LOG: Restart all children
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 798: LOG: failover: set new primary node: -1
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 798: LOG: failover: set new master node: 0
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:09: pid 3969: LOG: worker process received restart request
>> Sep 19 19:10:09 sumak-test-pgpool pgpool[798]: failover done. shutdown host 10.0.1.11(5432)2017-09-19 19:10:09: pid 798: LOG: failover done. shutdown host 10.0.1.11(5432)
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 3968: LOG: restart request received in pcp child process
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 798: LOG: PCP child 3968 exits with status 0 in failover()
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 798: LOG: fork a new PCP child pid 5163 in failover()
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 798: LOG: child process with pid: 3451 exits with status 0
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 798: LOG: child process with pid: 3451 exited with success and will not be restarted
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 798: LOG: child process with pid: 3452 exits with status 0
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 798: LOG: child process with pid: 3452 exited with success and will not be restarted
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 798: LOG: child process with pid: 3453 exits with status 0
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 798: LOG: child process with pid: 3453 exited with success and will not be restarted
>> Sep 19 19:10:10 sumak-test-pgpool pgpool[798]: 2017-09-19 19:10:10: pid 798: LOG: child process with pid: 3454 exits with status 0
>>
>>
>> but the node is still up:
>>
>> test=# show pool_nodes;
>>  node_id | hostname  | port | status | lb_weight |  role  | select_cnt | load_balance_node | replication_delay
>> ---------+-----------+------+--------+-----------+--------+------------+-------------------+-------------------
>>  0       | 10.0.1.9  | 5432 | up     | 0.500000  | master | 44         | true              | 0
>>  1       | 10.0.1.11 | 5432 | up     | 0.500000  | slave  | 54         | false             | 0
>> (2 rows)
>>
>>
>> If I send another DELETE, the process repeats, but the node stays up.
>>
>>
>> Is this a bug or is my understanding wrong somewhere?
>>
>>
>> Thanks in advance
>> Benjamin Firl
>>
>>
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general