[pgpool-general: 8795] Re: How does pgpool handle the due-failure problem?

Tatsuo Ishii ishii at sraoss.co.jp
Fri Jun 2 09:28:48 JST 2023


> Hi Tatsuo!
> 
> I have been dwelling on this recently.
> 
>  I believe that once a server has been running *solo*, it has to retain
> primary status until a standby comes up and starts replicating from it.
> So, in the situation that no other server is up, the awakening server
> should check whether its last state was solo. If so, it can serve solo
> again; if not, it has to wait until the solo server wakes up and becomes
> a standby again. Is there a way to mark the *solo* status and clear it
> whenever a transition (failover or recovery) takes place?

Pgpool-II is not designed like that. But you may achieve somewhat
similar behavior by using the "ALWAYS_PRIMARY" flag. See
https://www.pgpool.net/docs/44/en/html/runtime-config-backend-settings.html#RUNTIME-CONFIG-BACKEND-DATA.
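For illustration, here is a minimal pgpool.conf sketch. The hostnames,
ports, and weights are taken from the test setup earlier in this thread
and are only assumptions; adjust them to your environment:

```
# Sketch only: backend_flag0 marks node 0 so that pgpool always treats it
# as the primary, skipping the usual primary detection after restarts.
backend_hostname0 = 'localhost'
backend_port0 = 11002
backend_weight0 = 0.5
backend_flag0 = 'ALLOW_TO_FAILOVER|ALWAYS_PRIMARY'

backend_hostname1 = 'localhost'
backend_port1 = 11003
backend_weight1 = 0.5
backend_flag1 = 'ALLOW_TO_FAILOVER'
```

Note the caveat: with ALWAYS_PRIMARY set, pgpool regards the flagged node
as the primary regardless of its actual role, so this only fits setups
where you can guarantee that node really is the primary.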

> Regards,
>   Zhaoxun
> 
> On Fri, Apr 7, 2023 at 9:55 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> Hi Zhaoxun,
>>
>> > Hi Tatsuo!
>> >
>> > Thank you for testing.
>> >
>> > In your example, I mean: what if localhost 11002 - the old primary
>> > PostgreSQL - now recovers, notices the standby is down, and hence
>> > starts to serve as the primary with data0?
>>
>> My answer is: don't do that, because the 11002 primary does not have
>> the most recent data. You should work on recovering the 11003
>> PostgreSQL, as this is the only server having the latest data.
>>
>> For this reason I recommend having more than one standby server, so
>> that there's a good chance at least one standby server stays alive.
>>
>> > Later, as the old standby recovers, it must follow the old primary
>> > as a standby, and therefore loses all the data it wrote to data1
>> > while the old primary was down.
>> >
>> > Best Regards,
>> >   Zhaoxun
>> >
>> > On Thu, Apr 6, 2023 at 1:55 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >
>> >> > Suppose we have two servers; under extreme circumstances both may
>> >> > fail. Then we face 4 possibilities:
>> >> >
>> >> > 1) Master fails -> Standby self-promotes -> Standby fails -> old
>> >> > Master recovers?
>> >> > 2) Master fails -> Standby self-promotes -> Standby fails -> Standby
>> >> > and new Master recover?
>> >> > 3) Standby fails -> Master fails -> Standby recovers?
>> >> > 4) Standby fails -> Master fails -> Master recovers?
>> >> >
>> >> > 1 and 3 are especially hazardous because the only recovered server
>> >> > may view itself as the current master and hence lose the data
>> >> > written during its downtime. I believe that when only one server
>> >> > wakes up, it should wait for the other server to recover before
>> >> > negotiating who should be the new master.
>> >> >
>> >> > Does pgpool have such a mechanism?
>> >>
>> >> For #1 yes.
>> >>
>> >> # initial state: primary and standby are up.
>> >> $ pcp_node_info -w -p 11001
>> >> localhost 11002 1 0.500000 waiting up primary primary 0 none none
>> >> 2023-04-06 14:37:42
>> >> localhost 11003 1 0.500000 waiting up standby standby 0 streaming async
>> >> 2023-04-06 14:37:42
>> >>
>> >> # the master fails: stop the primary.
>> >> $ pg_ctl -D data0 stop
>> >> waiting for server to shut down.... done
>> >> server stopped
>> >>
>> >> # the primary is down and the standby self-promotes.
>> >> $ pcp_node_info -w -p 11001
>> >> localhost 11002 3 0.500000 down down standby unknown 0 none none
>> >> 2023-04-06 14:38:27
>> >> localhost 11003 1 0.500000 waiting up primary primary 0 none none
>> >> 2023-04-06 14:38:27
>> >>
>> >> # the (old) standby fails.
>> >> $ pg_ctl -D data1 stop
>> >> waiting for server to shut down.... done
>> >> server stopped
>> >> $ pcp_node_info -w -p 11001
>> >> localhost 11002 3 0.500000 down down standby unknown 0 none none
>> >> 2023-04-06 14:38:27
>> >> localhost 11003 3 0.500000 down down standby unknown 0 none none
>> >> 2023-04-06 14:38:55
>> >>
>> >> # now pgpool does not accept any connections from clients.
>> >> $ psql -p 11000 test
>> >> psql: error: connection to server on socket "/tmp/.s.PGSQL.11000" failed:
>> >> ERROR:  pgpool is not accepting any new connections
>> >> DETAIL:  all backend nodes are down, pgpool requires at least one valid
>> >> node
>> >> HINT:  repair the backend nodes and restart pgpool
>> >>
>> >> #2 is basically the same, because after both the primary and the
>> >> standby go down, pgpool won't accept connections from clients.
>> >>
>> >> For #3 and #4, I am not sure what you mean. Maybe you mean the case
>> >> where no failover command is configured (thus no self-promotion)? If
>> >> so, the result is the same as #1 and #2.
>> >>
>> >> Best regards,
>> >> --
>> >> Tatsuo Ishii
>> >> SRA OSS LLC
>> >> English: http://www.sraoss.co.jp/index_en/
>> >> Japanese: http://www.sraoss.co.jp
>> >>
>>


More information about the pgpool-general mailing list