[pgpool-general: 8980] Re: Clients disconnection when slave node is off

Tatsuo Ishii ishii at sraoss.co.jp
Fri Dec 8 06:21:44 JST 2023


Hi Jesus,

> Hi Tatsuo,
> 
> I just downloaded the 4.5RC1 version to check whether the client
> disconnections are solved.
> The problem is that I cannot compile this version on RHEL5. We had no
> problems compiling version 4.3.2.
> Is the new version compatible with the RHEL5 operating system?

No. As a matter of policy, we do not test on EOL'ed OSes such as RHEL5.

> Please find attached info from my OS:
> 
> Red Hat Enterprise Linux Client release 5.8 (Tikanga)
> gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-52)
> ldd (GNU libc) 2.5
> Postgres 12.5
> 
> The compilation error is:
> 
> gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../../src/include  -D_GNU_SOURCE -I
> ../../src/include/parser -I /opt/postgresql/include   -g -O2 -Wall
> -Wmissing-prototypes -Wmissing-declarations -fno-strict-aliasing -c -o
> copyfuncs.o copyfuncs.c
> In file included from ../../src/include/parser/parsenodes.h:28,
>                  from copyfuncs.c:30:
> ../../src/include/parser/primnodes.h:27: error: redefinition of type ‘TransactionId’
> ../../src/include/parser/pg_list.h:50: error: previous declaration of ‘TransactionId’ was here

I have looked into this and found that there are two places where
TransactionId is defined, which is not good. Attached is a one-line
patch to fix that. If you like, please try it out.
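
For readers hitting the same error, here is a minimal sketch of the
problem (it is not the attached patch itself). C99, which is roughly what
gcc 4.1.2 implements with -std=gnu99, rejects a typedef name declared
twice in the same scope, even when the type is identical. C11 allows it
and newer gcc releases accept it, which is probably why the duplicate
went unnoticed. The guard macro TRANSACTION_ID_DEFINED and the helper
transaction_id_width() are hypothetical names used only for
illustration, and uint32_t is assumed as the underlying type
(PostgreSQL defines TransactionId as a 32-bit unsigned integer).

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the first header that declares the type.
 * TRANSACTION_ID_DEFINED is a hypothetical guard macro. */
#ifndef TRANSACTION_ID_DEFINED
#define TRANSACTION_ID_DEFINED
typedef uint32_t TransactionId;	/* assumed 32-bit, as in PostgreSQL */
#endif

/* Stand-in for the second header. Without the guard, this second
 * typedef is what an old compiler reports as a redefinition; with the
 * guard it is skipped by the preprocessor. */
#ifndef TRANSACTION_ID_DEFINED
#define TRANSACTION_ID_DEFINED
typedef uint32_t TransactionId;
#endif

/* Hypothetical helper, only to show the type is usable after both
 * "headers" have been seen. */
size_t transaction_id_width(void);

size_t
transaction_id_width(void)
{
	return sizeof(TransactionId);
}
```

Compiling this with "gcc -std=gnu99 -c" should succeed. Removing the two
#ifndef/#endif pairs leaves two plain typedefs, which an old gcc should
reject with an error like the one quoted above.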

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp


> Thank you for your support.
> 
> Best,
> Jesús
> 
> 
> On Mon, 15 May 2023 at 4:14, Tatsuo Ishii (<ishii at sraoss.co.jp>) wrote:
> 
>> > Ok, thank you for your great work.
>> >
>> > In this case the failover is due to the slave node.
>> > Given that I have no load balancing, I think clients should be able to
>> > connect to pgpool during this failover because the master node is still
>> > the same and it is alive.
>>
>> There are some code paths to access the standby node, which triggers
>> session disconnection:
>>
>> 1. When client connects to pgpool, pgpool tries to connect to standby
>>    PostgreSQL.
>>
>> 2. When client connects to pgpool, pgpool sends SET application_name
>>    command.
>>
>> 3. Pgpool detects the PostgreSQL shutdown event even if it's from
>>    standby, which results in session disconnection.
>>
>> Although I am not sure if I can eliminate all the code paths above, I
>> will try for upcoming Pgpool 4.5.
>>
>> > Is there a plan to solve this limitation in future releases?
>> >
>> > Thanks in advance.
>> >
>> > Best,
>> > Jesús
>> >
>> > On Sun, 14 May 2023 4:06, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >
>> >> Hi Jesús,
>> >>
>> >> > Hi Tatsuo. Thank you very much.
>> >> > Should we add it as a bug in the Pgpool-II Bug Tracker?
>> >>
>> >> No, you don't need to, as it's a known limitation that pgpool does not
>> >> accept new connections during failover.
>> >>
>> >> > On Sat, 13 May 2023 9:24, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >> >
>> >> >> Ok. The errors were generated while clients tried to connect to
>> >> >> pgpool.  My patch covers the case when failover happens while
>> >> >> connections from clients to pgpool are *kept*.  However the patch
>> >> >> does not cover the case when clients try to establish a connection
>> >> >> to pgpool during failover.
>> >> >>
>> >> >> I tested my patch using pgbench. If pgbench is given "-C" (create
>> >> >> connection for each transaction), I get the same errors you mentioned.
>> >> >>
>> >> >> I have to admit my patch does not cover all the cases. I need more
>> >> >> time to deal with these problems.
>> >> >>
>> >> >> > Hi,
>> >> >> >
>> >> >> > I think we cannot connect to pgpool. I will show you the output of
>> >> >> > my dbCheck script.
>> >> >> >
>> >> >> >
>> >> >> >    - *Pgpool without patch and backend1 as slave**:*
>> >> >> >
>> >> >> > [root at pg_client1 services]# ./dbcheck.sh $VIP_PGPOOL
>> >> >> >
>> >> >> > psql: ERROR:  do command failed
>> >> >> >
>> >> >> > DETAIL:  backend error: "SFATAL"
>> >> >> >
>> >> >> > psql: ERROR:  unable to read data from DB node 1
>> >> >> >
>> >> >> > DETAIL:  socket read failed with error "Connection reset by peer"
>> >> >> >
>> >> >> > psql: server closed the connection unexpectedly
>> >> >> >
>> >> >> >         This probably means the server terminated abnormally
>> >> >> >
>> >> >> >         before or while processing the request.
>> >> >> >
>> >> >> > psql: server closed the connection unexpectedly
>> >> >> >
>> >> >> >         This probably means the server terminated abnormally
>> >> >> >
>> >> >> >         before or while processing the request.
>> >> >> >
>> >> >> >         This probably means the server terminated abnormally
>> >> >> >
>> >> >> >         before or while processing the request.
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >    - *Pgpool with patch and backend1 as slave:*
>> >> >> >
>> >> >> > psql: ERROR:  unable to read message kind
>> >> >> >
>> >> >> > DETAIL:  kind does not match between main(52)
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >    - *Pgpool with patch and backend1 as master**:*
>> >> >> >
>> >> >> > psql: ERROR:  unable to read data from DB node
>> >> >> >
>> >> >> > DETAIL:  socket read failed with error "Connection reset by peer"
>> >> >> >
>> >> >> > server closed the connection unexpectedly
>> >> >> >
>> >> >> >         This probably means the server terminated abnormally
>> >> >> >
>> >> >> >         before or while processing the request.
>> >> >> >
>> >> >> > connection to server was lost
>> >> >> >
>> >> >> > server closed the connection unexpectedly
>> >> >> >
>> >> >> >         This probably means the server terminated abnormally
>> >> >> >
>> >> >> >         before or while processing the request.
>> >> >> >
>> >> >> > connection to server was lost
>> >> >> >
>> >> >> >
>> >> >> > Anyway, with a client which uses ODBC, if it tries to access the
>> >> >> > database during a failover caused by the slave node, the following
>> >> >> > error is displayed: "Driver Unable to Establish Connection with Data
>> >> >> > Source".
>> >> >> >
>> >> >> > On Fri, 12 May 2023 at 9:40, Tatsuo Ishii (<ishii at sraoss.co.jp>)
>> >> >> > wrote:
>> >> >> >
>> >> >> >> What do you mean by "database is not available"?
>> >> >> >>
>> >> >> >> 1. You can connect to pgpool but pgpool does not reply back.
>> >> >> >>
>> >> >> >> 2. You can connect to pgpool but pgpool immediately disconnects.
>> >> >> >>
>> >> >> >> > Hi Tatsuo,
>> >> >> >> >
>> >> >> >> > I'm working with your patch but I am still facing a problem: the
>> >> >> >> > database is unavailable for approximately 1 second (I have a
>> >> >> >> > script that runs a select query every 0.1 seconds to measure how
>> >> >> >> > long the database is unavailable).
>> >> >> >> >
>> >> >> >> > I will explain two different cases:
>> >> >> >> >
>> >> >> >> > 1. The slave node (backend1 in pgpool.conf) is turned off. With
>> >> >> >> > your patch the database is always available. Without your patch
>> >> >> >> > the database is unavailable for 1 second.
>> >> >> >> > 2. The master node (backend0) is turned off. Failover is done to
>> >> >> >> > promote backend1. After that, I turn backend0 on again, and it is
>> >> >> >> > now the slave node. If I turn off this slave node (backend0), the
>> >> >> >> > database is unavailable for 1 second (with or without your patch).
>> >> >> >> >
>> >> >> >> > Do you have any idea why this happens?
>> >> >> >> >
>> >> >> >> > Thanks in advance.
>> >> >> >> >
>> >> >> >> > Best,
>> >> >> >> > Jesús
>> >> >> >> >
>> >> >> >> > On Fri, 14 Apr 2023 3:41, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >> >> >> >
>> >> >> >> >> Hi Jesús,
>> >> >> >> >>
>> >> >> >> >> > Hi Tatsuo,
>> >> >> >> >> >
>> >> >> >> >> > First of all, thank you so much for taking the time to
>> >> >> >> >> > investigate this issue.
>> >> >> >> >>
>> >> >> >> >> No problem.
>> >> >> >> >>
>> >> >> >> >> > I have compiled pgpool 4.3.2 with your patch and the problem
>> >> >> >> >> > with pgbench is solved.
>> >> >> >> >> > I still need to test it in my environment.
>> >> >> >> >> >
>> >> >> >> >> > Anyway, I had a look at your code and I have seen that the
>> >> >> >> >> > session is closed only if failover is not completed in 30
>> >> >> >> >> > seconds.
>> >> >> >> >> > I have the following question about this change: is the session
>> >> >> >> >> > usable during the failover? I mean, if failover takes 20
>> >> >> >> >> > seconds, is the session blocked during this time, or can it
>> >> >> >> >> > accept transactions?
>> >> >> >> >>
>> >> >> >> >> It is likely that the session is blocked. I say "likely" because
>> >> >> >> >> the function containing this logic is called frequently during
>> >> >> >> >> a session, but not always. It is possible that a pgpool process
>> >> >> >> >> has already called the function by the time failover starts, and
>> >> >> >> >> then proceeds and sends a query to the backend.
>> >> >> >> >>
>> >> >> >> >> > Let me ask another question. Should we add this issue as a bug?
>> >> >> >> >>
>> >> >> >> >> No, you don't need to. Developers already recognize this as a bug
>> >> >> >> >> report.
>> >> >> >> >>
>> >> >> >> >> > Thanks in advance.
>> >> >> >> >> >
>> >> >> >> >> > Best,
>> >> >> >> >> > Jesús
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > On Wed, 12 Apr 2023 3:33, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >> >> >> >> >
>> >> >> >> >> >> > However, a downside of this is that during failover clients
>> >> >> >> >> >> > cannot process queries, or at least slow down. Below is the
>> >> >> >> >> >> > log from pgbench using the "-P 1" option to show progress. As
>> >> >> >> >> >> > you can see, pgbench starts to slow down at 170 s and recovers
>> >> >> >> >> >> > at 194 s. That is, the slowdown continued for 24 seconds.
>> >> >> >> >> >> >
>> >> >> >> >> >>
>> >> >> >> >> >> After more research, I suspect the slowdown is due to the
>> >> >> >> >> >> effect of checkpointing. If I add the "-S" option to change
>> >> >> >> >> >> the transaction type, I don't see the slowdown anymore.
>> >> >> >> >> >>
>> >> >> >> >> >> Best regards,
>> >> >> >> >> >> --
>> >> >> >> >> >> Tatsuo Ishii
>> >> >> >> >> >> SRA OSS LLC
>> >> >> >> >> >> English: http://www.sraoss.co.jp/index_en/
>> >> >> >> >> >> Japanese:http://www.sraoss.co.jp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix_compile_error.patch
Type: text/x-patch
Size: 418 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20231208/76a2aff5/attachment.bin>

