[pgpool-hackers: 1151] Re: pgpool-II 3.4 problems

Muhammad Usama m.usama at gmail.com
Wed Nov 11 01:03:26 JST 2015


Hi Ishii San

Can you please have a look at the attached patch for 3.4 branch.
Basically it introduces a new error level FRONTEND_ERROR for ereport(), to
inform the error handler not to cache the current backend connection. Other
than that, this FRONTEND_ERROR error level behaves and works similar to the
ereport(ERROR..)

Thanks
Best regards
Muhammad Usama


On Thu, Oct 15, 2015 at 4:45 PM, Muhammad Usama <m.usama at gmail.com> wrote:

>
>
> On Thu, Oct 15, 2015 at 2:23 AM, Tatsuo Ishii <ishii at postgresql.org>
> wrote:
>
>> Usama,
>>
>> Can you please propose a patch to implement similar thing with 3.4?  I
>> think it will make 3.4 more robust. I have no idea how to implement
>> it with 3.4's exception mechanism.
>>
>
> sure I will work on this
>
> Thanks
> best regards
> Muhammad Usama
>
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>> > Hi Ishii San
>> >
>> > I think this difference of behavior between 3.3. And 3.4 is caused by
>> the
>> > commit "cb735f22441001b6afdbb5bac72808db66094ca9" that was not ported to
>> > the 3.4 branch.
>> >
>> >
>> http://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=cb735f22441001b6afdbb5bac72808db66094ca9
>> >
>> > Thanks
>> > Best regards
>> > Muhammad Usama
>> >
>> >
>> >
>> > On Tue, Oct 13, 2015 at 2:45 PM, Tatsuo Ishii <ishii at postgresql.org>
>> wrote:
>> >
>> >> > Usama,
>> >> >
>> >> > While I have been working on:
>> >> > http://www.pgpool.net/mantisbt/view.php?id=147
>> >>
>> >> Oops. This is wrong. Here is the correct one.
>> >>
>> >> http://www.pgpool.net/mantisbt/view.php?id=145
>> >>
>> >> > I found there was an interesting behavior change between pgpool-II
>> 3.3
>> >> > and 3.4.
>> >> >
>> >> > It turns out that the bug was related to the action when
>> >> > client_idle_limit is reached. In 3.3, when client_idle_limit is
>> >> > reached, following statement is executed (pool_process_query.c):
>> >> >
>> >> > if (idle_count > pool_config->client_idle_limit)
>> >> > {
>> >> >       pool_log("pool_process_query: child connection forced to
>> terminate
>> >> due to client_idle_limit (%d) reached,
>> >> >                pool_config->client_idle_limit);
>> >> >               pool_send_error_message(frontend, MAJOR(backend),
>> >> >               "57000", "connection terminated due to client idle
>> limit
>> >> reached",
>> >> >                                       "","",  __FILE__, __LINE__);
>> >> >       return POOL_END_WITH_FRONTEND_ERROR;
>> >> > }
>> >> >
>> >> > then goes to this (child.c)
>> >> >
>> >> > if (status == POOL_END_WITH_FRONTEND_ERROR ||
>> >> >    pool_config->connection_cache == 0 ||
>> >> >    !strcmp(sp->database, "template0") ||
>> >> >    !strcmp(sp->database, "template1") ||
>> >> >    !strcmp(sp->database, "postgres") ||
>> >> >    !strcmp(sp->database, "regression"))
>> >> > {
>> >> >       reset_connection();
>> >> >       pool_close(frontend);
>> >> >       pool_send_frontend_exits(backend);
>> >> >       pool_discard_cp(sp->user, sp->database, sp->major);
>> >> > }
>> >> >
>> >> > This results in closing connection to backend.
>> >> >
>> >> > In 3.4,
>> >> >
>> >> > if (idle_count > pool_config->client_idle_limit)
>> >> > {
>> >> >       ereport(ERROR,
>> >> >            (pool_error_code("57000"),
>> >> >                errmsg("unable to read data"),
>> >> >                  errdetail("child connection forced to terminate due
>> to
>> >> client_idle_limit:%d is reached",
>> >> >                   pool_config->client_idle_limit)));
>> >> > }
>> >> >
>> >> > Then jump to backend_cleanup() and call pool_process_query() to
>> >> > execute queries defined in reset_query_list. This results in sending
>> >> > typically "DISCARD ALL" to backend. The problem reported in the bug
>> id
>> >> > 147 happens in this route (actually the reporter claims that
>> pgpool-II
>> >> > hangs in sending "DISCARD").
>> >> >
>> >> > In summary, when client idle limit reaches, pgpool-II 3.3 disconnects
>> >> > connections to backend, while 3.4 does not close the connections to
>> >> > backend and sends DISCARD to backend.
>> >> >
>> >> > We have a few reports that pgpool-II 3.4 hangs while sending DISCARD
>> >> > and I wonder those problems somewhat related to the behavior change
>> >> > described above because the code path in 3.4 is executed even if
>> >> > client_idle_limit is not reached (elog(ERROR...)).
>> >> >
>> >> > Usama,
>> >> >
>> >> > Was this behavior change intentional?
>> >> >
>> >> > Best regards,
>> >> > --
>> >> > Tatsuo Ishii
>> >> > SRA OSS, Inc. Japan
>> >> > English: http://www.sraoss.co.jp/index_en.php
>> >> > Japanese:http://www.sraoss.co.jp
>> >> > _______________________________________________
>> >> > pgpool-hackers mailing list
>> >> > pgpool-hackers at pgpool.net
>> >> > http://www.pgpool.net/mailman/listinfo/pgpool-hackers
>> >> _______________________________________________
>> >> pgpool-hackers mailing list
>> >> pgpool-hackers at pgpool.net
>> >> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
>> >>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20151110/59a1ef6c/attachment-0001.html>
-------------- next part --------------
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index fd7ab1d..ecaf801 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -118,6 +118,10 @@ typedef enum
 #endif
 #define FATAL		21			/* fatal error - abort process */
 #define PANIC		22			/* take down the other backends with me */
+/* pgpool-II extension. This is same as ERROR but sets the
+ * do not cache connection flag before transforming to ERROR.
+ */
+#define FRONTEND_ERROR	23			/* transformed to ERROR at errstart */
 
  /* #define DEBUG DEBUG1 */	/* Backward compatibility with pre-7.3 */
 #define POOL_EXIT_SUCCESS	0	/* failure exit and child gets restarted*/
@@ -280,6 +284,7 @@ extern int	errposition(int cursorpos);
 
 #define pg_unreachable() exit(0)
 
+extern int  getfrontendinvalid(void);
 extern int	geterrcode(void);
 extern int	geterrposition(void);
 extern int	getinternalerrposition(void);
@@ -431,6 +436,7 @@ typedef struct ErrorData
 	const char *domain;			/* message domain */
 	const char *context_domain; /* message domain for context message */
 	int			sqlerrcode;		/* encoded ERRSTATE */
+	int			frontend_invalid;/* non zero means frontend connection is invalid */
 	char	   *pgpool_errcode;	/* error code to be sent to client */
 	char	   *message;		/* primary error message */
 	char	   *detail;			/* detail error message */
diff --git a/src/protocol/child.c b/src/protocol/child.c
index c0cc066..7ca1079 100644
--- a/src/protocol/child.c
+++ b/src/protocol/child.c
@@ -87,7 +87,7 @@ static POOL_CONNECTION *get_connection(int front_end_fd, SockAddr *saddr);
 static POOL_CONNECTION_POOL *get_backend_connection(POOL_CONNECTION *frontend);
 static StartupPacket *StartupPacketCopy(StartupPacket *sp);
 static void print_process_status(char *remote_host,char* remote_port);
-static bool backend_cleanup(POOL_CONNECTION* volatile *frontend, POOL_CONNECTION_POOL* volatile backend);
+static bool backend_cleanup(POOL_CONNECTION* volatile *frontend, POOL_CONNECTION_POOL* volatile backend, int frontend_invalid);
 static void free_persisten_db_connection_memory(POOL_CONNECTION_POOL_SLOT *cp);
 static int choose_db_node_id(char *str);
 static void child_will_go_down(int code, Datum arg);
@@ -211,6 +211,8 @@ void do_child(int *fds)
 	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
 	{
 		POOL_PROCESS_CONTEXT *session;
+		int frontend_invalid = getfrontendinvalid();
+
 		disable_authentication_timeout();
 		/* Since not using PG_TRY, must reset error stack by hand */
 		error_context_stack = NULL;
@@ -233,7 +235,7 @@ void do_child(int *fds)
 		if (accepted)
 			connection_count_down();
 
-		backend_cleanup(&child_frontend, backend);
+		backend_cleanup(&child_frontend, backend, frontend_invalid);
 
 		session = pool_get_process_context();
 
@@ -372,7 +374,7 @@ void do_child(int *fds)
 			status = pool_process_query(child_frontend, backend, 0);
             if(status != POOL_CONTINUE)
             {
-                backend_cleanup(&child_frontend, backend);
+                backend_cleanup(&child_frontend, backend, 0);
                 break;
             }
 		}
@@ -415,7 +417,7 @@ void do_child(int *fds)
  * return true if backend connection is cached
  */
 static bool
-backend_cleanup(POOL_CONNECTION* volatile *frontend, POOL_CONNECTION_POOL* volatile backend)
+backend_cleanup(POOL_CONNECTION* volatile *frontend, POOL_CONNECTION_POOL* volatile backend, int frontend_invalid)
 {
     StartupPacket *sp;
     bool cache_connection = false;
@@ -426,11 +428,11 @@ backend_cleanup(POOL_CONNECTION* volatile *frontend, POOL_CONNECTION_POOL* volat
     sp = MASTER_CONNECTION(backend)->sp;
 
     /*
-     * cach connection if connection is not for
+     * cach connection if frontend is not invalid, connection is not for
      * system db and connection cache configuration
      * parameter is enabled
      */
-    if (sp && pool_config->connection_cache != 0 && sp->system_db == false)
+    if (sp && pool_config->connection_cache != 0 && sp->system_db == false  && frontend_invalid == 0 )
     {
         if (*frontend)
         {
@@ -453,20 +455,22 @@ backend_cleanup(POOL_CONNECTION* volatile *frontend, POOL_CONNECTION_POOL* volat
         }
     }
 
-	/*
-	 * For those special databases, and when frontend client exits abnormally,
-	 * we don't cache connection to backend.
-	 */
-    if ((sp &&
-		(!strcmp(sp->database, "template0") ||
-		 !strcmp(sp->database, "template1") ||
-		 !strcmp(sp->database, "postgres") ||
-		 !strcmp(sp->database, "regression"))) ||
-		 (*frontend != NULL &&
-		  ((*frontend)->socket_state == POOL_SOCKET_EOF ||
-		  (*frontend)->socket_state == POOL_SOCKET_ERROR)))
-        cache_connection = false;
-
+	if (cache_connection)
+	{
+		/*
+		 * For those special databases, and when frontend client exits abnormally,
+		 * we don't cache connection to backend.
+		 */
+		if ((sp &&
+			(!strcmp(sp->database, "template0") ||
+			 !strcmp(sp->database, "template1") ||
+			 !strcmp(sp->database, "postgres") ||
+			 !strcmp(sp->database, "regression"))) ||
+			 (*frontend != NULL &&
+			  ((*frontend)->socket_state == POOL_SOCKET_EOF ||
+			  (*frontend)->socket_state == POOL_SOCKET_ERROR)))
+			cache_connection = false;
+	}
 	/*
 	 * Close frontend connection
 	 */
diff --git a/src/protocol/pool_process_query.c b/src/protocol/pool_process_query.c
index ff97bf1..271f19f 100644
--- a/src/protocol/pool_process_query.c
+++ b/src/protocol/pool_process_query.c
@@ -4695,7 +4695,7 @@ SELECT_RETRY:
 
 			if (idle_count > pool_config->client_idle_limit)
 			{
-                ereport(ERROR,
+                ereport(FRONTEND_ERROR,
                         (pool_error_code("57000"),
                         errmsg("unable to read data"),
                          errdetail("child connection forced to terminate due to client_idle_limit:%d is reached",
@@ -4708,7 +4708,7 @@ SELECT_RETRY:
 
 			if (idle_count_in_recovery > pool_config->client_idle_limit_in_recovery)
 			{
-                ereport(ERROR,
+                ereport(FRONTEND_ERROR,
                     (pool_error_code("57000"),
                     errmsg("unable to read data"),
                          errdetail("child connection forced to terminate due to client_idle_limit_in_recovery:%d is reached",pool_config->client_idle_limit_in_recovery)));
@@ -4720,7 +4720,7 @@ SELECT_RETRY:
 			 * If we are in recovery and client_idle_limit_in_recovery is -1, then
 			 * exit immediately.
 			 */
-            ereport(ERROR,
+            ereport(FRONTEND_ERROR,
                 (pool_error_code("57000"),
                 errmsg("connection terminated due to online recovery"),
                      errdetail("child connection forced to terminate due to client_idle_limitis:-1")));
diff --git a/src/utils/error/elog.c b/src/utils/error/elog.c
index 18ab006..dd8f44a 100644
--- a/src/utils/error/elog.c
+++ b/src/utils/error/elog.c
@@ -217,11 +217,17 @@ errstart(int elevel, const char *filename, int lineno,
 	bool		output_to_server;
 	bool		output_to_client = false;
 	int			i;
-
+	int			frontend_invalid = 0;
 	/*
 	 * Check some cases in which we want to promote an error into a more
 	 * severe error.  None of this logic applies for non-error messages.
 	 */
+	if (elevel == FRONTEND_ERROR)
+	{
+		frontend_invalid = true;
+		elevel = ERROR;
+	}
+
 	if (elevel >= ERROR)
 	{
 		/*
@@ -321,6 +327,7 @@ errstart(int elevel, const char *filename, int lineno,
 	edata = &errordata[errordata_stack_depth];
 	MemSet(edata, 0, sizeof(ErrorData));
 	edata->elevel = elevel;
+	edata->frontend_invalid = frontend_invalid;
 	edata->output_to_server = output_to_server;
 	edata->output_to_client = output_to_client;
 	if(elevel == FATAL && PG_exception_stack == NULL) /* This is startup failure. Take down main process with it */
@@ -969,6 +976,21 @@ geterrcode(void)
 }
 
 /*
+ * getfrontendinvalid --- return the currently frontend_invalid value
+ *
+ * This is only intended for use in error callback subroutines, since there
+ * is no other place outside elog.c where the concept is meaningful.
+ */
+int getfrontendinvalid(void)
+{
+	ErrorData  *edata = &errordata[errordata_stack_depth];
+
+	/* we don't bother incrementing recursion_depth */
+	CHECK_STACK_DEPTH();
+
+	return edata->frontend_invalid;
+}
+/*
  * geterrposition --- return the currently set error position (0 if none)
  *
  * This is only intended for use in error callback subroutines, since there
@@ -1580,7 +1602,7 @@ write_eventlog(int level, const char *line, int len)
 
 	if (evtHandle == INVALID_HANDLE_VALUE)
 	{
-		evtHandle = RegisterEventSource(NULL, event_source ? event_source : "PostgreSQL");
+		evtHandle = RegisterEventSource(NULL, event_source ? event_source : "pgpool");
 		if (evtHandle == NULL)
 		{
 			evtHandle = INVALID_HANDLE_VALUE;


More information about the pgpool-hackers mailing list