[pgpool-general: 5953] Re: Split-brain remedy

Alexander Dorogensky amazinglifetime at gmail.com
Thu Mar 1 01:37:47 JST 2018


With 'trusted_servers' configured, when I unplug 10.0.0.1, pgpool on that
node dies, i.e. 'service pgpool status' reports 'pgpool dead but subsys
locked'. Is that the expected behavior?

Plug/unplug = ifconfig eth0 up/down
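For reference, trusted_servers and its companion setting live in pgpool.conf; a minimal sketch (the host addresses here are placeholders, not taken from the thread):

```
# pgpool.conf -- watchdog section (addresses are placeholders)
trusted_servers = '10.0.0.254,10.0.0.253'  # e.g. the gateway plus another always-up host
ping_path = '/bin'                         # directory that contains the ping command
```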



On Tue, Feb 27, 2018 at 1:49 PM, Pierre Timmermans <ptim007 at yahoo.com>
wrote:

> To prevent this split-brain scenario (caused by a network partition) you
> can use the trusted_servers configuration. This setting is a list of
> servers that pgpool can use to determine whether a node is suffering a
> network partition. If a node cannot reach any of the servers in the list,
> it will assume it is isolated (by a network partition) and will not
> promote itself to master.
>
> In general, I believe it is not safe to do automatic failover when you
> have only two nodes, unless you have some kind of fencing mechanism
> (meaning: you can shut down a failed node and prevent it from coming back
> after a failure).
>
> Pierre
>
>
> On Tuesday, February 27, 2018, 7:58:55 PM GMT+1, Alexander Dorogensky <
> amazinglifetime at gmail.com> wrote:
>
>
> Hi All,
>
> I have a 10.0.0.1/10.0.0.2 master/hot standby configuration with
> streaming replication, where each node runs pgpool with watchdog enabled
> and postgres.
>
> I shut down the network interface on 10.0.0.1 and wait until 10.0.0.2
> triggers failover and promotes itself to master through my failover script.
>
> Now the watchdogs on 10.0.0.1 and 10.0.0.2 are out of sync: they have
> conflicting views of which node failed, and both think they are master.
>
> When I bring back the network interface on 10.0.0.1, 'show pool_nodes'
> says that 10.0.0.1 is master/up and 10.0.0.2 is standby/down.
>
> I want 10.0.0.1 to be standby and 10.0.0.2 to be master.
>
> I've been playing with the failover script, e.g.
>
> if (default network gateway is NOT pingable) {
>     shut down pgpool and postgres    # this node is isolated
> } else if (this node is standby) {
>     promote this node to master
>     create a job that runs every minute and tries to recover the failed
> node (base backup)
>     cancel the job upon successful recovery
> }
>
> Can you please help me with this? Any ideas would be highly appreciated.
>
> Regards, Alex
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general
>
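The failover pseudocode in the quoted message could be sketched in shell along these lines, assuming the intent is to step down when this node is isolated and to promote when it is a healthy standby. The gateway address, NODE_ROLE variable, decide_action helper, and service commands are all illustrative assumptions, not pgpool APIs:

```shell
#!/bin/sh
# Sketch of a failover hook: step down when isolated from the network,
# promote when this node is a healthy standby.
# GATEWAY, NODE_ROLE and the echoed commands are placeholders.

GATEWAY=${GATEWAY:-10.0.0.254}    # assumed default-gateway address
NODE_ROLE=${NODE_ROLE:-standby}   # 'master' or 'standby', supplied by the caller
PING=${PING:-"ping -c 1 -W 2"}    # overridable so the logic can be tested offline

decide_action() {
    if $PING "$GATEWAY" >/dev/null 2>&1; then
        # Gateway reachable: the peer failed, not us; a standby may promote.
        if [ "$NODE_ROLE" = standby ]; then
            echo promote
        else
            echo stay-master
        fi
    else
        # Gateway unreachable: assume we are on the partitioned side and
        # step down rather than risk split brain.
        echo shutdown
    fi
}

# Simulate both outcomes without touching the network:
PING=true NODE_ROLE=standby
echo "gateway reachable, standby -> $(decide_action)"
PING=false
echo "gateway unreachable        -> $(decide_action)"
```

In a real hook, the 'promote' branch would touch the streaming-replication trigger file (or call pcp_promote_node) and the 'shutdown' branch would stop pgpool and postgres on the local node.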

