[pgpool-general: 8761] Re: Clients disconnection when slave node is off

Tatsuo Ishii ishii at sraoss.co.jp
Sat May 13 16:24:11 JST 2023


Ok. The errors were generated while clients tried to connect to
pgpool.  My patch covers the case where failover happens while
connections from clients to pgpool are *kept*.  However, the patch does
not cover the case where clients try to establish a connection to pgpool
during failover.

I tested my patch using pgbench. If pgbench is given "-C" (create a new
connection for each transaction), I get the same errors you mentioned.
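
For anyone who wants to reproduce this, a minimal sketch of such a run
follows. Only "-C" and "-P 1" are options discussed in this thread; the
host, port, database name, client count and duration are placeholder
assumptions, and the script only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only. Host, port, database name, client count and duration are
# placeholder assumptions, not values taken from this thread.
PGPOOL_HOST="${PGPOOL_HOST:-localhost}"  # pgpool host or virtual IP (assumed)
PGPOOL_PORT="${PGPOOL_PORT:-9999}"       # pgpool listen port (assumed)
PGPOOL_DB="${PGPOOL_DB:-test}"           # database prepared with "pgbench -i" (assumed)

repro() {
    # -C    open a new connection for each transaction
    # -P 1  print a progress report every second
    # -c 4  four concurrent clients
    # -T 60 run for 60 seconds
    set -- -C -P 1 -c 4 -T 60 -h "$PGPOOL_HOST" -p "$PGPOOL_PORT" "$PGPOOL_DB"
    if [ "${RUN:-0}" = "1" ]; then
        pgbench "$@"          # real run: stop a backend while this is active
    else
        echo "pgbench $*"     # default: dry run, only print the command line
    fi
}

repro
```

Dropping "-C" from the argument list gives the kept-connection case for
comparison.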

I have to admit my patch does not cover all the cases. I need more
time to deal with these problems.
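
A probe in the spirit of the check script quoted below (a select query
every 0.1 seconds) can be sketched as follows. This is not the dbcheck.sh
from the thread: the connection settings, the probe count and the function
names are assumptions. It starts a fresh psql for every check, so each
check is a new connection to pgpool, which is the case the patch does not
cover.

```shell
#!/bin/sh
# Sketch only. Connection settings and the probe count are placeholder
# assumptions; this is not the dbcheck.sh script from the thread.
PGPOOL_HOST="${PGPOOL_HOST:-localhost}"  # pgpool host or virtual IP (assumed)
PGPOOL_PORT="${PGPOOL_PORT:-9999}"       # pgpool listen port (assumed)
PGPOOL_DB="${PGPOOL_DB:-postgres}"       # database to query (assumed)
INTERVAL="${INTERVAL:-0.1}"              # fractional sleep needs GNU/BSD sleep
PROBE="${PROBE:-probe_once}"             # command used for one check

# One check: a fresh psql process, hence a fresh connection to pgpool.
probe_once() {
    psql -X -q -t -A -h "$PGPOOL_HOST" -p "$PGPOOL_PORT" -d "$PGPOOL_DB" \
         -c 'SELECT 1' >/dev/null 2>&1
}

# Run $1 checks, log each failure with a timestamp, print a summary.
run_probe() {
    total=$1
    fails=0
    i=0
    while [ "$i" -lt "$total" ]; do
        if ! "$PROBE"; then
            fails=$((fails + 1))
            echo "$(date '+%H:%M:%S') probe failed"
        fi
        sleep "$INTERVAL"
        i=$((i + 1))
    done
    echo "failed probes: $fails of $total"
}

# Only start probing when asked to, e.g. RUN=1 sh probe.sh
if [ "${RUN:-0}" = "1" ]; then
    run_probe 600   # about one minute at 0.1 s intervals, plus query time
fi
```

Multiplying the failure count by the interval gives only a rough estimate
of the outage, since each psql call also takes time.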

> Hi,
> 
> I think we cannot connect to pgpool. I will show you the output of my
> dbCheck script.
> 
> 
>    - *Pgpool without patch and backend1 as slave:*
> 
> [root@pg_client1 services]# ./dbcheck.sh $VIP_PGPOOL
> 
> psql: ERROR:  do command failed
> 
> DETAIL:  backend error: "SFATAL"
> 
> psql: ERROR:  unable to read data from DB node 1
> 
> DETAIL:  socket read failed with error "Connection reset by peer"
> 
> psql: server closed the connection unexpectedly
> 
>         This probably means the server terminated abnormally
> 
>         before or while processing the request.
> 
> psql: server closed the connection unexpectedly
> 
>         This probably means the server terminated abnormally
> 
>         before or while processing the request.
> 
>         This probably means the server terminated abnormally
> 
>         before or while processing the request.
> 
> 
> 
>    - *Pgpool with patch and backend1 as slave:*
> 
> psql: ERROR:  unable to read message kind
> 
> DETAIL:  kind does not match between main(52)
> 
> 
> 
>    - *Pgpool with patch and backend1 as master:*
> 
> psql: ERROR:  unable to read data from DB node
> 
> DETAIL:  socket read failed with error "Connection reset by peer"
> 
> server closed the connection unexpectedly
> 
>         This probably means the server terminated abnormally
> 
>         before or while processing the request.
> 
> connection to server was lost
> 
> server closed the connection unexpectedly
> 
>         This probably means the server terminated abnormally
> 
>         before or while processing the request.
> 
> connection to server was lost
> 
> 
> Anyway, if a client that uses ODBC tries to access the database
> during failover (from slave node), the following error is displayed: "Driver
> Unable to Establish Connection with Data Source".
> 
> On Fri, 12 May 2023 at 9:40, Tatsuo Ishii (<ishii at sraoss.co.jp>)
> wrote:
> 
>> What do you mean by "database is not available"?
>>
>> 1. You can connect to pgpool but pgpool does not reply back.
>>
>> 2. You can connect to pgpool but pgpool immediately disconnects.
>>
>> > Hi Tatsuo,
>> >
>> > I'm working with your patch, but I still face a problem: the
>> > database is unavailable for approximately 1 second (I have a script that
>> > runs a select query every 0.1 seconds to measure how long the database
>> > is unavailable).
>> >
>> > I will explain two different cases:
>> >
>> > 1. The slave node (backend1 in pgpool.conf) is turned off. With your patch
>> > the database is always available. Without your patch the database is
>> > unavailable for 1 second.
>> > 2. The master node (backend0) is turned off. Failover is done to promote
>> > backend1. After that, I turn backend0 on again, and it is now the slave
>> > node. If I turn off this slave node (backend0), the database is
>> > unavailable for 1 second (with or without your patch).
>> >
>> > Do you have any idea why this happens?
>> >
>> > Thanks in advance.
>> >
>> > Best,
>> > Jesús
>> >
>> > On Fri, 14 Apr 2023 3:41, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >
>> >> Hi Jesús,
>> >>
>> >> > Hi Tatsuo,
>> >> >
>> >> > First of all, thank you so much for taking the time to investigate this issue.
>> >>
>> >> No problem.
>> >>
>> >> > I have compiled pgpool 4.3.2 with your patch and the problem with
>> >> > pgbench is solved.
>> >> > I still need to test it in my environment.
>> >> >
>> >> > Anyway, I had a look at your code and I saw that the session is
>> >> > closed only if failover is not completed within 30 seconds.
>> >> > I have the following question about this change. Is the session
>> >> > operational during the failover? I mean, if failover takes 20 seconds,
>> >> > is the session blocked during this time, or can it accept
>> >> > transactions?
>> >>
>> >> It is likely the session is blocked. The reason for "likely" is that the
>> >> function containing the logic is called frequently during a
>> >> session, but not always. It is possible that a pgpool process
>> >> has already called the function by the time failover starts, and then
>> >> proceeds to send a query to the backend.
>> >>
>> >> > Let me ask another question. Should we report this issue as a bug?
>> >>
>> >> No, you don't need to. Developers already recognize this as a bug report.
>> >>
>> >> > Thanks in advance.
>> >> >
>> >> > Best,
>> >> > Jesús
>> >> >
>> >> >
>> >> > On Wed, 12 Apr 2023 3:33, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >> >
>> >> >> > However, a downside of this is that during failover clients cannot
>> >> >> > process queries, or at least process them more slowly. Below is the
>> >> >> > log from pgbench using the "-P 1" option to show progress. As you can
>> >> >> > see, from 170 s pgbench starts to slow down and recovers at 194 s.
>> >> >> > That is, the slowdown continued for 24 seconds.
>> >> >> >
>> >> >>
>> >> >> After more research, I suspect the slowdown is due to the effect of
>> >> >> checkpointing. If I add the "-S" option to change the transaction type, I
>> >> >> don't see the slowdown anymore.
>> >> >>
>> >> >> Best regards,
>> >> >> --
>> >> >> Tatsuo Ishii
>> >> >> SRA OSS LLC
>> >> >> English: http://www.sraoss.co.jp/index_en/
>> >> >> Japanese:http://www.sraoss.co.jp
>> >> >>
>> >>
>>

