[pgpool-general: 8764] Re: Clients disconnection when slave node is off

Mon May 15 06:49:54 JST 2023

Ok, thank you for your great work.

In this case the failover is triggered by the slave node.
Since I do not use load balancing, I think clients should be able to
connect to pgpool during this failover, because the master node is
unchanged and still alive.
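
Until new connections are accepted during failover, a client could mask
the short outage by retrying the connection attempt. The following is only
a minimal sketch of that idea, not something pgpool provides: the name
connect_with_retry, its parameters, and the default timeout and interval
are all assumptions made for illustration. The connect argument stands for
whatever call the client uses to open a session through pgpool (for
example a wrapper around psycopg2.connect).

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    timeout: float = 5.0,
    interval: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[T, int]:
    """Call connect() until it succeeds or `timeout` seconds have passed.

    Returns (connection, number_of_failed_attempts).
    Re-raises the last error once the deadline has been reached.
    """
    deadline = clock() + timeout
    failures = 0
    while True:
        try:
            return connect(), failures
        except Exception:
            # Hypothetical policy: treat every error as "failover in progress".
            # A real client would likely catch only its driver's connection error.
            failures += 1
            if clock() >= deadline:
                raise
            sleep(interval)
```

With a 0.1 second interval, the number of failed attempts gives a rough
estimate of how long pgpool refused connections, which is similar to what
the dbcheck script measures.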

Is there a plan to solve this limitation in future releases?

Thanks in advance.

Best,
Jesús

On Sun, 14 May 2023 at 4:06, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Hi Jesús,
>
> > Hi Tatsuo. Thank you very much.
> > Should we report it as a bug in the Pgpool-II Bug Tracker?
>
> No, you don't need to, as it is a known limitation that pgpool does not
> accept new connections during failover.
>
> > On Sat, 13 May 2023 at 9:24, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >
> >> Ok. The errors were generated while clients tried to connect to
> >> pgpool.  My patch covers the case when failover happens while
> >> connections from clients to pgpool are *kept*.  However, the patch does
> >> not cover the case when clients try to establish a connection to pgpool
> >> during failover.
> >>
> >> I tested my patch using pgbench. If pgbench is given "-C" (create a
> >> connection for each transaction), I get the same errors you mentioned.
> >>
> >> I have to admit my patch does not cover all the cases. I need more
> >> time to deal with these problems.
> >>
> >> > Hi,
> >> >
> >> > I think we cannot connect to pgpool. I will show you the output of my
> >> > dbCheck script.
> >> >
> >> >
> >> >    - *Pgpool without patch and backend1 as slave:*
> >> >
> >> > [root@pg_client1 services]# ./dbcheck.sh $VIP_PGPOOL
> >> > psql: ERROR:  do command failed
> >> > DETAIL:  backend error: "SFATAL"
> >> > psql: ERROR:  unable to read data from DB node 1
> >> > DETAIL:  socket read failed with error "Connection reset by peer"
> >> > psql: server closed the connection unexpectedly
> >> >         This probably means the server terminated abnormally
> >> >         before or while processing the request.
> >> > psql: server closed the connection unexpectedly
> >> >         This probably means the server terminated abnormally
> >> >         before or while processing the request.
> >> >         This probably means the server terminated abnormally
> >> >         before or while processing the request.
> >> >
> >> >
> >> >
> >> >    - *Pgpool with patch and backend1 as slave:*
> >> >
> >> > psql: ERROR:  unable to read message kind
> >> > DETAIL:  kind does not match between main(52)
> >> >
> >> >
> >> >
> >> >    - *Pgpool with patch and backend1 as master:*
> >> >
> >> > psql: ERROR:  unable to read data from DB node
> >> > DETAIL:  socket read failed with error "Connection reset by peer"
> >> > server closed the connection unexpectedly
> >> >         This probably means the server terminated abnormally
> >> >         before or while processing the request.
> >> > connection to server was lost
> >> > server closed the connection unexpectedly
> >> >         This probably means the server terminated abnormally
> >> >         before or while processing the request.
> >> > connection to server was lost
> >> >
> >> >
> >> > Anyway, with a client that uses ODBC, if it tries to access the database
> >> > during failover (triggered by the slave node), the following error is
> >> > displayed: "Driver Unable to Establish Connection with Data Source".
> >> >
> >> > On Fri, 12 May 2023 at 9:40, Tatsuo Ishii (<ishii at sraoss.co.jp>) wrote:
> >> >
> >> >> What do you mean by "database is not available"?
> >> >>
> >> >> 1. You can connect to pgpool but pgpool does not reply back.
> >> >>
> >> >> 2. You can connect to pgpool but pgpool immediately disconnects.
> >> >>
> >> >> > Hi Tatsuo,
> >> >> >
> >> >> > I'm working with your patch but I still face a problem: the
> >> >> > database is unavailable for approximately 1 second (I have a script
> >> >> > that runs a SELECT query every 0.1 seconds to measure how long the
> >> >> > database is unavailable).
> >> >> >
> >> >> > I will explain two different cases:
> >> >> >
> >> >> > 1. The slave node (backend1 in pgpool.conf) is turned off. With your
> >> >> > patch the database is always available. Without your patch the
> >> >> > database is unavailable for 1 second.
> >> >> > 2. The master node (backend0) is turned off. Failover promotes
> >> >> > backend1. After that, I turn backend0 back on, and it is now the
> >> >> > slave node. If I turn off this slave node (backend0), the database
> >> >> > is unavailable for 1 second (with or without your patch).
> >> >> >
> >> >> > Do you have any idea what causes this behaviour?
> >> >> >
> >> >> > Thanks in advance.
> >> >> >
> >> >> > Best,
> >> >> > Jesús
> >> >> >
> >> >> > On Fri, 14 Apr 2023 at 3:41, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >> >> >
> >> >> >> Hi Jesús,
> >> >> >>
> >> >> >> > Hi Tatsuo,
> >> >> >> >
> >> >> >> > First of all, thank you so much for taking the time to investigate
> >> >> >> > this issue.
> >> >> >>
> >> >> >> No problem.
> >> >> >>
> >> >> >> > I have compiled pgpool 4.3.2 with your patch and the problem with
> >> >> >> > pgbench is solved.
> >> >> >> > I still need to test it in my environment.
> >> >> >> >
> >> >> >> > Anyway, I had a look at your code and I saw that the session is
> >> >> >> > closed only if failover is not completed within 30 seconds.
> >> >> >> > I have the following question about this change. Is the session
> >> >> >> > usable during the failover? I mean, if failover takes 20 seconds,
> >> >> >> > is the session blocked during this time, or can it accept
> >> >> >> > transactions?
> >> >> >>
> >> >> >> It is likely the session is blocked. The reason for "likely" is that
> >> >> >> the function containing the logic can be called frequently during a
> >> >> >> session, but not always. It is possible that a pgpool process has
> >> >> >> already called the function by the time failover starts, and then
> >> >> >> proceeds to send a query to the backend.
> >> >> >>
> >> >> >> > Let me ask another question. Should we report this issue as a bug?
> >> >> >>
> >> >> >> No, you don't need to. Developers already recognize this as a bug report.
> >> >> >>
> >> >> >> > Thanks in advance.
> >> >> >> >
> >> >> >> > Best,
> >> >> >> > Jesús
> >> >> >> >
> >> >> >> >
> >> >> >> > On Wed, 12 Apr 2023 at 3:33, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >> >> >> >
> >> >> >> >> > However, a downside of this is that, during failover, clients cannot
> >> >> >> >> > process queries, or at least processing slows down. Below is the log
> >> >> >> >> > from pgbench using the "-P 1" option to show progress. As you can see,
> >> >> >> >> > pgbench starts to slow down at 170 s and recovers at 194 s. That is,
> >> >> >> >> > the slowdown continued for 24 seconds.
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >> After more research, I suspect the slowdown is due to the effect of
> >> >> >> >> checkpointing. If I add the "-S" (select-only) option to change the
> >> >> >> >> transaction type, I don't see the slowdown anymore.
> >> >> >> >>
> >> >> >> >> Best regards,
> >> >> >> >> --
> >> >> >> >> Tatsuo Ishii
> >> >> >> >> SRA OSS LLC
> >> >> >> >> English: http://www.sraoss.co.jp/index_en/
> >> >> >> >> Japanese:http://www.sraoss.co.jp
> >> >> >> >>
> >> >> >>
> >> >>
> >>
>