[pgpool-general: 9060] Re: Segmentation fault after switchover

Tatsuo Ishii ishii at sraoss.co.jp
Sun Mar 31 14:52:12 JST 2024


> Hi,
> 
> We've been observing some strange status reports after a controlled
> switchover (needed for a reboot) in our automated tests. In this scenario,
> we have 3 nodes, all running pgpool 4.5.0 and PostgreSQL 12.16. Node 1 is
> the watchdog leader and runs the primary database. Nodes 2 and 3 reboot
> first; no special care is required because both are standbys. Finally, node
> 1 needs to reboot. We first force pgpool to drop its leader status by
> restarting it, then stop and detach the database to force a failover.
> Next, we bring the database back up and configure it to follow the new
> primary. At this point, we reboot node 1, and this is where something goes
> wrong and pgpool processes segfault throughout the cluster.
> 
> I've collected several core dumps and identified two different backtraces.
> The first is the most common and occurs on node 2 (the new primary, standby
> watchdog):
> #0  0x0000559e25313126 in get_query_result (slots=0x7fff0ebcff50,
> backend_id=0, query=0x559e253f63b0 "SELECT
> pg_catalog.current_setting('server_version_num')", res=0x7fff0ebcf778) at
> streaming_replication/pool_worker_child.c:682
> #1  0x0000559e252a6ca3 in get_server_version (slots=0x7fff0ebcff50,
> node_id=0) at main/pgpool_main.c:3878
> #2  0x0000559e252a2bd6 in verify_backend_node_status (slots=0x7fff0ebcff50)
> at main/pgpool_main.c:2574
> #3  0x0000559e252a3904 in find_primary_node () at main/pgpool_main.c:2791
> #4  0x0000559e252a3e04 in find_primary_node_repeatedly () at
> main/pgpool_main.c:2892
> #5  0x0000559e252a8a71 in determine_new_primary_node
> (failover_context=0x7fff0ebd0420, node_id=0) at main/pgpool_main.c:4510
> #6  0x0000559e252a0eaf in failover () at main/pgpool_main.c:1719
> #7  0x0000559e252a04f1 in sigusr1_interrupt_processor () at
> main/pgpool_main.c:1507
> #8  0x0000559e252a9df0 in check_requests () at main/pgpool_main.c:4930
> #9  0x0000559e2529e18e in PgpoolMain (discard_status=0 '\000',
> clear_memcache_oidmaps=0 '\000') at main/pgpool_main.c:649
> #10 0x0000559e2529b9b2 in main (argc=2, argv=0x7fff0ebdd5c8) at
> main/main.c:365
> 
> The second is a bit less common, occurs on node 3 (standby database, new
> watchdog leader), and always appears in tandem about 20 seconds after the
> first:
> #0  pfree (pointer=0x55b7fe6ef4f0) at ../../src/utils/mmgr/mcxt.c:956
> #1  0x000055b7fe09944b in free_persistent_db_connection_memory
> (cp=0x55b7fe6ef4c0) at protocol/pool_pg_utils.c:231
> #2  0x000055b7fe0992b6 in make_persistent_db_connection (db_node_id=0,
> hostname=0x7fb6f6346008 "172.29.30.1", port=5432, dbname=0x55b7fe1b408f
> "postgres", user=0x55b7fe724268 "keyhub", password=0x55b7fe6ef408
> "jCyuMFIENk1DU8-VWf2vFMx_DeFv25zTCG_0ZroG", retry=0 '\000')
>     at protocol/pool_pg_utils.c:164
> #3  0x000055b7fe0993a1 in make_persistent_db_connection_noerror
> (db_node_id=0, hostname=0x7fb6f6346008 "172.29.30.1", port=5432,
> dbname=0x55b7fe1b408f "postgres", user=0x55b7fe724268 "keyhub",
> password=0x55b7fe6ef408 "jCyuMFIENk1DU8-VWf2vFMx_DeFv25zTCG_0ZroG",
>     retry=0 '\000') at protocol/pool_pg_utils.c:185
> #4  0x000055b7fe068397 in establish_persistent_connection (node=0) at
> main/health_check.c:365
> #5  0x000055b7fe067b79 in do_health_check_child (node_id=0x7ffc6ef09158) at
> main/health_check.c:199
> #6  0x000055b7fe05ba04 in worker_fork_a_child (type=PT_HEALTH_CHECK,
> func=0x55b7fe067798 <do_health_check_child>, params=0x7ffc6ef09158) at
> main/pgpool_main.c:912
> #7  0x000055b7fe05b0e2 in PgpoolMain (discard_status=0 '\000',
> clear_memcache_oidmaps=0 '\000') at main/pgpool_main.c:625
> #8  0x000055b7fe0589b2 in main (argc=2, argv=0x7ffc6ef153a8) at
> main/main.c:365
> 
> After this segfault, the status of node 1 (backend 0) appears to be corrupt:
> it is reported as 'Status Name=down' but 'Backend Status Name=up', and it
> does not recover from this state. I've attached core dumps for both crashes,
> the reported status, and the pgpool logs for nodes 1, 2 and 3. Note that the
> pgpool logs for nodes 1 and 3 report a segfault, while the core dumps are
> from nodes 2 and 3. I have no core dump for node 1 and also no kernel
> logging about a protection fault on that node.
> 
> PS. These logs and core dumps are all from tests using randomly generated
> passwords and secrets, so they have no value to us.

In the second case, make_persistent_db_connection() uses longjmp()
through PG_TRY/PG_CATCH. In this environment, any variable on the stack
that is later used in the PG_CATCH block needs to be declared with the
"volatile" modifier so that it is not clobbered when an exception
occurs. The function missed this. Attached is the patch to fix it.
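
For illustration only, here is a minimal standalone sketch of the
underlying setjmp()/longjmp() hazard. This is not pgpool code and not the
attached patch; PG_TRY/PG_CATCH are built on the same mechanism, and the
names used here are made up for the example:

/*
 * Minimal illustration (not pgpool code): a local variable that is
 * modified after setjmp() and read again after longjmp() must be
 * volatile, otherwise the compiler may keep it in a register and the
 * register can be rolled back to its pre-setjmp() value by longjmp().
 */
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf env;

static void might_fail(void)
{
    longjmp(env, 1);            /* stands in for ereport(ERROR, ...) */
}

int main(void)
{
    /* "volatile" forces cp to be re-read from memory after longjmp();
     * without it, the catch-style branch below could see a stale NULL
     * (leaking the allocation) or an invalid pointer value. */
    char *volatile cp = NULL;

    if (setjmp(env) == 0)       /* corresponds to the PG_TRY block */
    {
        cp = malloc(64);
        might_fail();           /* jumps back to setjmp(), returning 1 */
    }
    else                        /* corresponds to the PG_CATCH block */
    {
        free(cp);               /* safe only because cp is volatile */
        cp = NULL;
    }
    return 0;
}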

I have not found the cause of the first case yet, but I suspect it is
somewhat related to the error above because in the first case:
> #0  0x0000559e25313126 in get_query_result (slots=0x7fff0ebcff50,

"slot" was previously allocated by make_persistent_db_connection().

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make_persistent_db_connection.patch
Type: text/x-patch
Size: 487 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20240331/91ee13d6/attachment.bin>

