<div dir="ltr"><div dir="ltr"><div dir="ltr">Hi Ishii-San</div><div dir="ltr"><div><br></div><div>Please find the attached the debug log and pgpool.conf </div><div><br></div><div>The output of the stack trace, session context and query context is pasted below</div><div><br></div><div><b>--STACK-TRACE--</b></div><div><div><font face="monospace, monospace">#0 0x00007f3d5d218c53 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81<br></font></div><div><font face="monospace, monospace">#1 0x000000000042e086 in pool_check_fd (cp=cp@entry=0x7f3d5efb43a0) at protocol/pool_process_query.c:696</font></div><div><font face="monospace, monospace">#2 0x000000000042e356 in pool_check_fd (cp=cp@entry=0x7f3d5efb43a0) at protocol/pool_process_query.c:728</font></div><div><font face="monospace, monospace">#3 0x000000000045fafc in pool_read (cp=0x7f3d5efb43a0, buf=buf@entry=0x7fffbc4830b5, len=len@entry=1) at utils/pool_stream.c:166</font></div><div><font face="monospace, monospace">#4 0x000000000043303b in read_kind_from_backend (frontend=frontend@entry=0x2314e98, backend=backend@entry=0x7f3d5efaf750, decided_kind=decided_kind@entry=0x7fffbc4834ba "D")</font></div><div><font face="monospace, monospace"> at protocol/pool_process_query.c:3379</font></div><div><font face="monospace, monospace">#5 0x000000000043f402 in ProcessBackendResponse (frontend=frontend@entry=0x2314e98, backend=backend@entry=0x7f3d5efaf750, state=state@entry=0x7fffbc483524, </font></div><div><font face="monospace, monospace"> num_fields=num_fields@entry=0x7fffbc483522) at protocol/pool_proto_modules.c:2663</font></div><div><font face="monospace, monospace">#6 0x0000000000431ea7 in pool_process_query (frontend=0x2314e98, backend=0x7f3d5efaf750, reset_request=reset_request@entry=0) at protocol/pool_process_query.c:320</font></div><div><font face="monospace, monospace">#7 0x000000000042cd49 in do_child (fds=fds@entry=0x23134b0) at protocol/child.c:388</font></div><div><font face="monospace, monospace">#8 0x00000000004090a5 in fork_a_child (fds=0x23134b0, id=0) at main/pgpool_main.c:659</font></div><div><font face="monospace, monospace">#9 0x000000000040fb73 in PgpoolMain (discard_status=discard_status@entry=0 '\000', clear_memcache_oidmaps=clear_memcache_oidmaps@entry=0 '\000') at main/pgpool_main.c:378</font></div><div><font face="monospace, monospace">#10 0x0000000000407777 in main (argc=<optimized out>, argv=<optimized out>) at main/main.c:349</font></div><div><br></div></div><div><b>--SESSION CONTEXT--</b><br></div><div><div><font face="monospace, monospace">(gdb) print *session_context</font></div><div><font face="monospace, monospace">$4 = {process_context = 0x77c260 <process_context_d>, frontend = 0x2314e98, backend = 0x7f3d5efaf750, in_progress = 0 '\000', doing_extended_query_message = 0 '\000', </font></div><div><font face="monospace, monospace"> need_to_restore_where_to_send = 0 '\000', where_to_send_save = "\001", '\000' <repeats 126 times>, command_success = 1 '\001', writing_transaction = 0 '\000', </font></div><div><font face="monospace, monospace"> failed_transaction = 0 '\000', skip_reading_from_backends = 0 '\000', ignore_till_sync = 0 '\000', transaction_isolation = POOL_UNKNOWN, query_context = 0x248d418, </font></div><div><font face="monospace, monospace"> memory_context = 0x22f2558, uncompleted_message = 0x231aeb8, message_list = {capacity = 64, size = 63, sent_messages = 0x2351f70}, load_balance_node_id = 0, </font></div><div><font face="monospace, monospace"> mismatch_ntuples = 0 '\000', ntuples = {0, -1, 0 <repeats 126 times>}, reset_context = 0 '\000', query_cache_array = 0x0, num_selects = 0, pending_messages = 0x231b020, </font></div><div><font face="monospace, monospace"> previous_message_exists = 0 '\000', previous_message = {type = POOL_EXECUTE, contents = 0x24704f0 "SS_13", contents_len = 5, query = "COMMIT", '\000' <repeats 1017 times>, </font></div><div><font face="monospace, monospace"> statement = "\000_46", '\000' <repeats 123 times>, portal = '\000' <repeats 127 times>, is_rows_returned = 0 '\000', not_forward_to_frontend = 0 '\000', node_ids = {0, -1}, </font></div><div><font face="monospace, monospace"> query_context = 0x242ff68}, major = 3, minor = 0}</font></div><div><font face="monospace, monospace"><br></font></div><div><b>--QEURY-TRACE--</b><font face="monospace, monospace"><br></font></div><div><span style="font-family:monospace,monospace">(gdb) print *session_context->query_context</span><br></div><div><font face="monospace, monospace">$5 = {original_query = 0x248e0e8 "BEGIN", rewritten_query = 0x0, original_length = 6, rewritten_length = -1, parse_tree = 0x248e038, rewritten_parse_tree = 0x0, </font></div><div><font face="monospace, monospace"> where_to_send = "\001", '\000' <repeats 126 times>, virtual_master_node_id = 0, query_state = {POOL_UNPARSED <repeats 128 times>}, is_cache_safe = 0 '\000', temp_cache = 0x0, </font></div><div><font face="monospace, monospace"> is_multi_statement = 0 '\000', dboid = 0, query_w_hex = 0x0, is_parse_error = 0 '\000', num_original_params = -1, pg_terminate_backend_conn = 0x0, skip_cache_commit = 0 '\000', </font></div><div><font face="monospace, monospace"> memory_context = 0x7f3d5efbff20}</font></div><div><br></div></div><div>Thanks<font face="monospace, monospace"><br></font></div><div>Best Regards</div><div>Muhammad Usama</div><div><br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Fri, Dec 7, 2018 at 3:49 AM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp">ishii@sraoss.co.jp</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Usama,<br>
<br>
I need:<br>
<br>
- exact Pgpool-II version<br>
- pgpool.conf<br>
- gdb print command result of *pool_session_context when Pgpool-II hangs<br>
<br>
Also a full debug log will be helpful.<br>
<br>
Best regards,<br>
--<br>
Tatsuo Ishii<br>
SRA OSS, Inc. Japan<br>
English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>
Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>
<br>
> Hi Ishii San<br>
> <br>
> Thanks for looking into this, As discussed over the call please find the<br>
> attached pgpool log and stack trace for the stuck issue.<br>
> I am currently not able to trim down the test case so sorry for throwing a<br>
> huge log file.<br>
> There is no special configuration on PostgreSQL server side and this issue<br>
> only happens when the load balance node and the primary node are same<br>
> and as you can see from the stack trace the stuck happens in the<br>
> read_kind_from_backend() function and the reason for that is<br>
> the function is trying to read response from the backend node 1(standby)<br>
> while it has only sent the query to the backend node 0 (primary)<br>
> <br>
> --stack-trace--<br>
> #0 0x00007f1775216c53 in __select_nocancel () at<br>
> ../sysdeps/unix/syscall-template.S:81<br>
> #1 0x000000000042def6 in pool_check_fd (cp=cp@entry=0x7f1776fb2328) at<br>
> protocol/pool_process_query.c:696<br>
> #2 0x000000000042e1c6 in pool_check_fd (cp=cp@entry=0x7f1776fb2328) at<br>
> protocol/pool_process_query.c:728<br>
> #3 0x000000000045f92c in pool_read (cp=0x7f1776fb2328,<br>
> buf=buf@entry=0x7ffebb5f2345,<br>
> len=len@entry=1) at utils/pool_stream.c:166<br>
> #4 0x0000000000432eab in read_kind_from_backend<br>
> (frontend=frontend@entry=0x2292e88,<br>
> backend=backend@entry=0x7f1776fad6d8,<br>
> decided_kind=decided_kind@entry=0x7ffebb5f274a<br>
> "D")<br>
> at protocol/pool_process_query.c:3379<br>
> #5 0x000000000043f232 in ProcessBackendResponse<br>
> (frontend=frontend@entry=0x2292e88,<br>
> backend=backend@entry=0x7f1776fad6d8, state=state@entry=0x7ffebb5f27b4,<br>
> num_fields=num_fields@entry=0x7ffebb5f27b2) at<br>
> protocol/pool_proto_modules.c:2653<br>
> #6 0x0000000000431d17 in pool_process_query (frontend=0x2292e88,<br>
> backend=0x7f1776fad6d8, reset_request=reset_request@entry=0) at<br>
> protocol/pool_process_query.c:320<br>
> #7 0x000000000042cbb9 in do_child (fds=fds@entry=0x2291370) at<br>
> protocol/child.c:387<br>
> #8 0x00000000004090a5 in fork_a_child (fds=0x2291370, id=2) at<br>
> main/pgpool_main.c:659<br>
> #9 0x000000000040fb73 in PgpoolMain (discard_status=discard_status@entry=0<br>
> '\000', clear_memcache_oidmaps=clear_memcache_oidmaps@entry=0 '\000') at<br>
> main/pgpool_main.c:378<br>
> #10 0x0000000000407777 in main (argc=<optimized out>, argv=<optimized out>)<br>
> at main/main.c:361<br>
> <br>
> Thanks<br>
> Best Regards<br>
> Muhammad Usama<br>
> <br>
> <br>
> <br>
> On Thu, Dec 6, 2018 at 3:52 AM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>> wrote:<br>
> <br>
>> > Hi Ishii San<br>
>> ><br>
>> > I was investigating the issue faced by one of Pgpool-II user who reported<br>
>> > that after executing a certain transaction (large amount of data DELETE /<br>
>> > INSERT / UPDATE, COMMIT), the next execution of the BEGIN statement<br>
>> > hangs(using the extended query protocol).<br>
>> ><br>
>> > I think the problem is in pool_is_node_to_be_sent_in_current_query()<br>
>> > function. The issue is in streaming replication mode, when<br>
>> > pool_is_query_in_progress is not set, the function always returns true.<br>
>> Now<br>
>> > consider that we have two backend servers 1 primary and one standby. And<br>
>> > for a particular session the primary node gets selected as a load balance<br>
>> > node. So effectively that session should only be sending queries and<br>
>> > reading responses from primary server alone. But with the current<br>
>> > implementation of pool_is_node_to_be_sent_in_current_query() the<br>
>> > VALID_BACKEND macro will return true for standby node as well when the<br>
>> > query in progress flag is not set(which is wrong when standby node is not<br>
>> > selected for load balancing). and if this happens in let say<br>
>> > read_kind_from_backend() function we will get a stuck, since the<br>
>> > read_kind_from_backend() will be trying to read the response from standby<br>
>> > node while we have never sent any query to that node.<br>
>> ><br>
>> ><br>
>> > Can you please take a look at the attached patch, I am not sure if it<br>
>> will<br>
>> > have some side effects or not.<br>
>><br>
>> Unfortunately the patch would not help and probably would break many<br>
>> cases which are working today.<br>
>><br>
>> For the scenario you mentioned is already handled by<br>
>> pool_pending_message_* functions defined in<br>
>> context/query_context.c. When a message, for example parse, is sent<br>
>> from frontend and forwarded to backends, the info is recorded in an<br>
>> FIFO queue by pool_pending_message_add() along with info which of<br>
>> backend the message is forwarded to. When the parse message will be<br>
>> replied back with "parse complete message" from the backend,<br>
>> read_kind_message_from_backend() can know from which of backend it<br>
>> should read a message by looking at the head of the queue. Once the<br>
>> message is retrieved (in this example parse complete message), an<br>
>> entry in the queue (in this example parse message) is pulled out and<br>
>> removed.<br>
>><br>
>> This mechanism usually works well unless backend sends unpredicted<br>
>> message. Recently one of such a case was reported:<br>
>> <a href="https://www.pgpool.net/mantisbt/view.php?id=448" rel="noreferrer" target="_blank">https://www.pgpool.net/mantisbt/view.php?id=448</a><br>
>><br>
>> In the report PostgreSQL sends an error message caused by<br>
>> idle_in_transaction_timeout setting which Pgpool-II does not expect.<br>
>><br>
>> Does the user has any uncommon setting in postgresql.conf?<br>
>><br>
>> Best regards,<br>
>> --<br>
>> Tatsuo Ishii<br>
>> SRA OSS, Inc. Japan<br>
>> English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>
>> Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>
>><br>
</blockquote></div>