[pgpool-hackers: 1981] Re: pgpool-hackers Digest, Vol 63, Issue 14

Sergey Kim skim at odin.com
Thu Jan 12 17:10:08 JST 2017


Hi The Pgpool Community,

I'm not sure my understanding of the question is clear enough, but to me it seems Pgpool-II faces two different cases which I want to highlight:

1) It starts and detects that all backends (say we have 2 backends) are not in recovery mode.
   At this moment, if pgpool continues starting then it will damage data, which is not acceptable.

2) The backends eventually both become masters (split brain).
   In this case there should be some logic that detects the split-brain case on the fly; probably the lifecheck task can do that by querying pg_is_in_recovery().

Here is what we see and what could be extended:
Would it be possible to trigger some script in both cases and pass parameters into it, making the script responsible for choosing the incorrect "master"? It seems pgpool would have to ship with a default script, but it should still be possible to replace it. Thus customers would be free to implement their own scenarios for how to resolve the split brain.
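To make the idea concrete, such a user-replaceable hook might look roughly like the sketch below. This is purely illustrative: the function name, its parameters, and the default tie-breaking rule are my assumptions, not an existing Pgpool-II interface.

```python
def resolve_split_brain(masters):
    """Hypothetical default split-brain hook: given the nodes that all
    report pg_is_in_recovery() = false, pick the one to keep as primary.

    masters is a list of dicts like {"node_id": 0, "wal_lsn": 123456},
    where wal_lsn stands in for the node's current WAL position (an
    assumption; a real hook could use any site-specific criterion).
    Returns the id of the node to keep; all others should be detached
    or demoted by the caller.
    """
    # Default policy: keep the node that is furthest ahead in WAL,
    # breaking ties by preferring the lowest node id.
    keep = max(masters, key=lambda m: (m["wal_lsn"], -m["node_id"]))
    return keep["node_id"]
```

A customer-supplied replacement could instead prefer a fixed node, consult an external arbiter, or simply refuse to choose and shut pgpool down.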



Best Regards,
Sergey
HA Architect at Odin
http://www.odin.com

________________________________
From: pgpool-hackers-bounces at pgpool.net <pgpool-hackers-bounces at pgpool.net> on behalf of pgpool-hackers-request at pgpool.net <pgpool-hackers-request at pgpool.net>
Sent: Thursday, January 12, 2017 6:00 AM
To: pgpool-hackers at pgpool.net
Subject: pgpool-hackers Digest, Vol 63, Issue 14


Today's Topics:

   1. [pgpool-hackers: 1979] New feature candidate: verify standby
      node while finding primary node (Tatsuo Ishii)


----------------------------------------------------------------------

Message: 1
Date: Thu, 12 Jan 2017 11:34:59 +0900 (JST)
From: Tatsuo Ishii <ishii at sraoss.co.jp>
To: pgpool-hackers at pgpool.net
Subject: [pgpool-hackers: 1979] New feature candidate: verify standby
        node while finding primary node
Message-ID: <20170112.113459.213399609554012313.t-ishii at sraoss.co.jp>
Content-Type: Text/Plain; charset=us-ascii

This is a proposal for a new feature toward Pgpool-II 3.7.

Currently Pgpool-II finds the primary node and standby nodes like this
(this happens while Pgpool-II is starting up or during failover):

1) Issue "SELECT pg_is_in_recovery()" to a node in question.

2) If it returns "t", then decide the node is standby. Go to next node
   (go back to step 1).

3) If it returns other than that, then decide the node is
   the primary. Other nodes are regarded as standby.
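The steps above can be sketched as a simplified Python function (hypothetical, for illustration only), where `in_recovery` maps each node id to the boolean result of "SELECT pg_is_in_recovery()":

```python
def find_primary(in_recovery):
    """Simplified sketch of the current primary-finding logic.

    in_recovery maps node id -> result of "SELECT pg_is_in_recovery()"
    (True means the node answered "t", i.e. it is a standby).
    Returns the id of the first node not in recovery, or None.
    """
    for node_id in sorted(in_recovery):
        if in_recovery[node_id]:
            # "t": the node is a standby; go to the next node.
            continue
        # Anything else: decide this node is the primary.
        # All remaining nodes are simply regarded as standbys.
        return node_id
    return None
```

Note that with two non-recovering nodes, e.g. `{0: False, 1: False}`, this simply returns node 0 and never notices that node 1 also claims to be a primary.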

This logic works mostly well except in an unusual scenario like this:

i) We have two nodes: node 0 is primary, node 1 is standby.

ii) A careless admin issues "pg_ctl promote" on the standby, and node 1 becomes
  a standalone PostgreSQL.

In this case, node 1 will eventually fall behind node 0, because no
replication happens. If the replication delay check is enabled,
Pgpool-II avoids sending queries to node 1 because of the replication
delay. However, if the replication delay check is not enabled or the
replication delay threshold is large, the user will not notice the
situation.

This scenario is also known as "split brain", which users want to
avoid. I think we need to do something here.

Here is the modified procedure to avoid it.

1) Issue "SELECT pg_is_in_recovery()" to a node in question.

2) If it returns "t", then decide the node is standby. Go to next node
   (go back to step 1).

3) If it returns other than that, then decide the node is the
   primary. Check whether the remaining nodes are actually standbys
   by issuing "SELECT pg_is_in_recovery()". Additionally, if the
   PostgreSQL version is 9.6 or higher, we could use the
   pg_stat_wal_receiver view to check whether each standby is
   actually connected to the primary node.
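A minimal sketch of this modified procedure, again as a hypothetical pure function over the per-node results of "SELECT pg_is_in_recovery()" (the pg_stat_wal_receiver cross-check is only noted in a comment, since it needs a live 9.6+ server):

```python
def find_primary_verified(in_recovery):
    """Sketch of the proposed procedure: after picking a primary
    candidate, re-check that every remaining node really is a standby.

    in_recovery maps node id -> result of "SELECT pg_is_in_recovery()".
    Returns (primary_id, suspicious_ids), where suspicious_ids lists
    nodes that also answered "f" (a possible split brain).
    """
    primary = None
    suspicious = []
    for node_id in sorted(in_recovery):
        if in_recovery[node_id]:
            continue  # "t": a proper standby, move on
        if primary is None:
            primary = node_id  # first non-recovering node is the primary
        else:
            # Another node also claims not to be in recovery: it is
            # not a proper standby.  On PostgreSQL 9.6+ one could
            # additionally consult pg_stat_wal_receiver on each standby
            # to confirm it is really streaming from the primary.
            suspicious.append(node_id)
    return primary, suspicious
```

In the two-node promote scenario above, `find_primary_verified({0: False, 1: False})` would flag node 1 as suspicious instead of silently treating it as a standby.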

The question is: what should we do if the check in #3 reveals that the
node in question is not a "proper" standby?

- Do we want to add a new status code, besides "up", "down", "not
  connected" and "unused"?

- Do we want to automatically detach the node so that Pgpool-II does
  not use the node?

- Do we want to do the check more frequently, say with a timing
  similar to health checking?

Comments, suggestions are welcome.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


------------------------------

_______________________________________________
pgpool-hackers mailing list
pgpool-hackers at pgpool.net
http://www.pgpool.net/mailman/listinfo/pgpool-hackers


End of pgpool-hackers Digest, Vol 63, Issue 14
**********************************************