[pgpool-hackers: 1161] Re: pgpool-II 3.4 problems

Muhammad Usama m.usama at gmail.com
Tue Nov 17 21:50:37 JST 2015


Hi Ishii-San

Thanks for reviewing the patch.

On Tue, Nov 17, 2015 at 7:31 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:

> Hi Usama,
>
> I have taken a look at your patch.
>
> 1) Below diff seems to be unnecessary because even without the "+ if
> (cache_connection)", the result would be same.
>

Yes this is correct the result will be same after this new if statement.
The reason I made this change was to bypass if statement with four
strcmp()s when we already knew the cache_connection will remain false.
Although might not make much difference but done this to improve
performance of this condition statement.


> -       /*
> -        * For those special databases, and when frontend client exits
> abnormally,
> -        * we don't cache connection to backend.
> -        */
> -    if ((sp &&
> -               (!strcmp(sp->database, "template0") ||
> -                !strcmp(sp->database, "template1") ||
> -                !strcmp(sp->database, "postgres") ||
> -                !strcmp(sp->database, "regression"))) ||
> -                (*frontend != NULL &&
> -                 ((*frontend)->socket_state == POOL_SOCKET_EOF ||
> -                 (*frontend)->socket_state == POOL_SOCKET_ERROR)))
> -        cache_connection = false;
> -
> +       if (cache_connection)
> +       {
> +               /*
> +                * For those special databases, and when frontend client
> exits abnormally,
> +                * we don't cache connection to backend.
> +                */
> +               if ((sp &&
> +                       (!strcmp(sp->database, "template0") ||
> +                        !strcmp(sp->database, "template1") ||
> +                        !strcmp(sp->database, "postgres") ||
> +                        !strcmp(sp->database, "regression"))) ||
> +                        (*frontend != NULL &&
> +                         ((*frontend)->socket_state == POOL_SOCKET_EOF ||
> +                         (*frontend)->socket_state == POOL_SOCKET_ERROR)))
> +                       cache_connection = false;
> +       }
>
> 2) Shouldn't frontend_invalid variable be bool because it takes only 0
>    or 1. If so, geterrposition()'s return type should be declared as
>    bool.
>

Yes, you are right. frontend_invalid should've been bool. I have updated
this in attached version of the patch.


Thanks
Best regards
Muhammad Usama

>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> > Hi Ishii San
> >
> > Can you please have a look at the attached patch for 3.4 branch.
> > Basically it introduces a new error level FRONTEND_ERROR for ereport(),
> to
> > inform the error handler not to cache the current backend connection.
> Other
> > than that, this FRONTEND_ERROR error level behaves and works similar to
> the
> > ereport(ERROR..)
> >
> > Thanks
> > Best regards
> > Muhammad Usama
> >
> >
> > On Thu, Oct 15, 2015 at 4:45 PM, Muhammad Usama <m.usama at gmail.com>
> wrote:
> >
> >>
> >>
> >> On Thu, Oct 15, 2015 at 2:23 AM, Tatsuo Ishii <ishii at postgresql.org>
> >> wrote:
> >>
> >>> Usama,
> >>>
> >>> Can you please propose a patch to implement similar thing with 3.4?  I
> >>> think it will make 3.4 more robust. I have no idea how to implement
> >>> it with 3.4's exception mechanism.
> >>>
> >>
> >> sure I will work on this
> >>
> >> Thanks
> >> best regards
> >> Muhammad Usama
> >>
> >>>
> >>> Best regards,
> >>> --
> >>> Tatsuo Ishii
> >>> SRA OSS, Inc. Japan
> >>> English: http://www.sraoss.co.jp/index_en.php
> >>> Japanese:http://www.sraoss.co.jp
> >>>
> >>> > Hi Ishii San
> >>> >
> >>> > I think this difference of behavior between 3.3. And 3.4 is caused by
> >>> the
> >>> > commit "cb735f22441001b6afdbb5bac72808db66094ca9" that was not
> ported to
> >>> > the 3.4 branch.
> >>> >
> >>> >
> >>>
> http://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=cb735f22441001b6afdbb5bac72808db66094ca9
> >>> >
> >>> > Thanks
> >>> > Best regards
> >>> > Muhammad Usama
> >>> >
> >>> >
> >>> >
> >>> > On Tue, Oct 13, 2015 at 2:45 PM, Tatsuo Ishii <ishii at postgresql.org>
> >>> wrote:
> >>> >
> >>> >> > Usama,
> >>> >> >
> >>> >> > While I have been working on:
> >>> >> > http://www.pgpool.net/mantisbt/view.php?id=147
> >>> >>
> >>> >> Oops. This is wrong. Here is the correct one.
> >>> >>
> >>> >> http://www.pgpool.net/mantisbt/view.php?id=145
> >>> >>
> >>> >> > I found there was an interesting behavior change between pgpool-II
> >>> 3.3
> >>> >> > and 3.4.
> >>> >> >
> >>> >> > It turns out that the bug was related to the action when
> >>> >> > client_idle_limit is reached. In 3.3, when client_idle_limit is
> >>> >> > reached, following statement is executed (pool_process_query.c):
> >>> >> >
> >>> >> > if (idle_count > pool_config->client_idle_limit)
> >>> >> > {
> >>> >> >       pool_log("pool_process_query: child connection forced to
> >>> terminate
> >>> >> due to client_idle_limit (%d) reached,
> >>> >> >                pool_config->client_idle_limit);
> >>> >> >               pool_send_error_message(frontend, MAJOR(backend),
> >>> >> >               "57000", "connection terminated due to client idle
> >>> limit
> >>> >> reached",
> >>> >> >                                       "","",  __FILE__, __LINE__);
> >>> >> >       return POOL_END_WITH_FRONTEND_ERROR;
> >>> >> > }
> >>> >> >
> >>> >> > then goes to this (child.c)
> >>> >> >
> >>> >> > if (status == POOL_END_WITH_FRONTEND_ERROR ||
> >>> >> >    pool_config->connection_cache == 0 ||
> >>> >> >    !strcmp(sp->database, "template0") ||
> >>> >> >    !strcmp(sp->database, "template1") ||
> >>> >> >    !strcmp(sp->database, "postgres") ||
> >>> >> >    !strcmp(sp->database, "regression"))
> >>> >> > {
> >>> >> >       reset_connection();
> >>> >> >       pool_close(frontend);
> >>> >> >       pool_send_frontend_exits(backend);
> >>> >> >       pool_discard_cp(sp->user, sp->database, sp->major);
> >>> >> > }
> >>> >> >
> >>> >> > This results in closing connection to backend.
> >>> >> >
> >>> >> > In 3.4,
> >>> >> >
> >>> >> > if (idle_count > pool_config->client_idle_limit)
> >>> >> > {
> >>> >> >       ereport(ERROR,
> >>> >> >            (pool_error_code("57000"),
> >>> >> >                errmsg("unable to read data"),
> >>> >> >                  errdetail("child connection forced to terminate
> due
> >>> to
> >>> >> client_idle_limit:%d is reached",
> >>> >> >                   pool_config->client_idle_limit)));
> >>> >> > }
> >>> >> >
> >>> >> > Then jump to backend_cleanup() and call pool_process_query() to
> >>> >> > execute queries defined in reset_query_list. This results in
> sending
> >>> >> > typically "DISCARD ALL" to backend. The problem reported in the
> bug
> >>> id
> >>> >> > 147 happens in this route (actually the reporter claims that
> >>> pgpool-II
> >>> >> > hangs in sending "DISCARD").
> >>> >> >
> >>> >> > In summary, when client idle limit reaches, pgpool-II 3.3
> disconnects
> >>> >> > connections to backend, while 3.4 does not close the connections
> to
> >>> >> > backend and sends DISCARD to backend.
> >>> >> >
> >>> >> > We have a few reports that pgpool-II 3.4 hangs while sending
> DISCARD
> >>> >> > and I wonder those problems somewhat related to the behavior
> change
> >>> >> > described above because the code path in 3.4 is executed even if
> >>> >> > client_idle_limit is not reached (elog(ERROR...)).
> >>> >> >
> >>> >> > Usama,
> >>> >> >
> >>> >> > Was this behavior change intentional?
> >>> >> >
> >>> >> > Best regards,
> >>> >> > --
> >>> >> > Tatsuo Ishii
> >>> >> > SRA OSS, Inc. Japan
> >>> >> > English: http://www.sraoss.co.jp/index_en.php
> >>> >> > Japanese:http://www.sraoss.co.jp
> >>> >> > _______________________________________________
> >>> >> > pgpool-hackers mailing list
> >>> >> > pgpool-hackers at pgpool.net
> >>> >> > http://www.pgpool.net/mailman/listinfo/pgpool-hackers
> >>> >> _______________________________________________
> >>> >> pgpool-hackers mailing list
> >>> >> pgpool-hackers at pgpool.net
> >>> >> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
> >>> >>
> >>>
> >>
> >>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20151117/115a6c85/attachment-0001.html>
-------------- next part --------------
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index fd7ab1d..0947337 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -118,6 +118,10 @@ typedef enum
 #endif
 #define FATAL		21			/* fatal error - abort process */
 #define PANIC		22			/* take down the other backends with me */
+/* pgpool-II extension. This is same as ERROR but sets the
+ * do not cache connection flag before transforming to ERROR.
+ */
+#define FRONTEND_ERROR	23			/* transformed to ERROR at errstart */
 
  /* #define DEBUG DEBUG1 */	/* Backward compatibility with pre-7.3 */
 #define POOL_EXIT_SUCCESS	0	/* failure exit and child gets restarted*/
@@ -280,6 +284,7 @@ extern int	errposition(int cursorpos);
 
 #define pg_unreachable() exit(0)
 
+extern bool getfrontendinvalid(void);
 extern int	geterrcode(void);
 extern int	geterrposition(void);
 extern int	getinternalerrposition(void);
@@ -431,6 +436,7 @@ typedef struct ErrorData
 	const char *domain;			/* message domain */
 	const char *context_domain; /* message domain for context message */
 	int			sqlerrcode;		/* encoded ERRSTATE */
+	bool		frontend_invalid;/* non zero means frontend connection is invalid */
 	char	   *pgpool_errcode;	/* error code to be sent to client */
 	char	   *message;		/* primary error message */
 	char	   *detail;			/* detail error message */
diff --git a/src/protocol/child.c b/src/protocol/child.c
index c0cc066..6f5f7f0 100644
--- a/src/protocol/child.c
+++ b/src/protocol/child.c
@@ -87,7 +87,7 @@ static POOL_CONNECTION *get_connection(int front_end_fd, SockAddr *saddr);
 static POOL_CONNECTION_POOL *get_backend_connection(POOL_CONNECTION *frontend);
 static StartupPacket *StartupPacketCopy(StartupPacket *sp);
 static void print_process_status(char *remote_host,char* remote_port);
-static bool backend_cleanup(POOL_CONNECTION* volatile *frontend, POOL_CONNECTION_POOL* volatile backend);
+static bool backend_cleanup(POOL_CONNECTION* volatile *frontend, POOL_CONNECTION_POOL* volatile backend, bool frontend_invalid);
 static void free_persisten_db_connection_memory(POOL_CONNECTION_POOL_SLOT *cp);
 static int choose_db_node_id(char *str);
 static void child_will_go_down(int code, Datum arg);
@@ -211,6 +211,8 @@ void do_child(int *fds)
 	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
 	{
 		POOL_PROCESS_CONTEXT *session;
+		bool frontend_invalid = getfrontendinvalid();
+
 		disable_authentication_timeout();
 		/* Since not using PG_TRY, must reset error stack by hand */
 		error_context_stack = NULL;
@@ -233,7 +235,7 @@ void do_child(int *fds)
 		if (accepted)
 			connection_count_down();
 
-		backend_cleanup(&child_frontend, backend);
+		backend_cleanup(&child_frontend, backend, frontend_invalid);
 
 		session = pool_get_process_context();
 
@@ -372,7 +374,7 @@ void do_child(int *fds)
 			status = pool_process_query(child_frontend, backend, 0);
             if(status != POOL_CONTINUE)
             {
-                backend_cleanup(&child_frontend, backend);
+                backend_cleanup(&child_frontend, backend, 0);
                 break;
             }
 		}
@@ -415,7 +417,7 @@ void do_child(int *fds)
  * return true if backend connection is cached
  */
 static bool
-backend_cleanup(POOL_CONNECTION* volatile *frontend, POOL_CONNECTION_POOL* volatile backend)
+backend_cleanup(POOL_CONNECTION* volatile *frontend, POOL_CONNECTION_POOL* volatile backend, bool frontend_invalid)
 {
     StartupPacket *sp;
     bool cache_connection = false;
@@ -426,11 +428,11 @@ backend_cleanup(POOL_CONNECTION* volatile *frontend, POOL_CONNECTION_POOL* volat
     sp = MASTER_CONNECTION(backend)->sp;
 
     /*
-     * cach connection if connection is not for
+     * cach connection if frontend is not invalid, connection is not for
      * system db and connection cache configuration
      * parameter is enabled
      */
-    if (sp && pool_config->connection_cache != 0 && sp->system_db == false)
+    if (sp && pool_config->connection_cache != 0 && sp->system_db == false  && frontend_invalid == false )
     {
         if (*frontend)
         {
@@ -453,20 +455,22 @@ backend_cleanup(POOL_CONNECTION* volatile *frontend, POOL_CONNECTION_POOL* volat
         }
     }
 
-	/*
-	 * For those special databases, and when frontend client exits abnormally,
-	 * we don't cache connection to backend.
-	 */
-    if ((sp &&
-		(!strcmp(sp->database, "template0") ||
-		 !strcmp(sp->database, "template1") ||
-		 !strcmp(sp->database, "postgres") ||
-		 !strcmp(sp->database, "regression"))) ||
-		 (*frontend != NULL &&
-		  ((*frontend)->socket_state == POOL_SOCKET_EOF ||
-		  (*frontend)->socket_state == POOL_SOCKET_ERROR)))
-        cache_connection = false;
-
+	if (cache_connection)
+	{
+		/*
+		 * For those special databases, and when frontend client exits abnormally,
+		 * we don't cache connection to backend.
+		 */
+		if ((sp &&
+			(!strcmp(sp->database, "template0") ||
+			 !strcmp(sp->database, "template1") ||
+			 !strcmp(sp->database, "postgres") ||
+			 !strcmp(sp->database, "regression"))) ||
+			 (*frontend != NULL &&
+			  ((*frontend)->socket_state == POOL_SOCKET_EOF ||
+			  (*frontend)->socket_state == POOL_SOCKET_ERROR)))
+			cache_connection = false;
+	}
 	/*
 	 * Close frontend connection
 	 */
diff --git a/src/protocol/pool_process_query.c b/src/protocol/pool_process_query.c
index ff97bf1..271f19f 100644
--- a/src/protocol/pool_process_query.c
+++ b/src/protocol/pool_process_query.c
@@ -4695,7 +4695,7 @@ SELECT_RETRY:
 
 			if (idle_count > pool_config->client_idle_limit)
 			{
-                ereport(ERROR,
+                ereport(FRONTEND_ERROR,
                         (pool_error_code("57000"),
                         errmsg("unable to read data"),
                          errdetail("child connection forced to terminate due to client_idle_limit:%d is reached",
@@ -4708,7 +4708,7 @@ SELECT_RETRY:
 
 			if (idle_count_in_recovery > pool_config->client_idle_limit_in_recovery)
 			{
-                ereport(ERROR,
+                ereport(FRONTEND_ERROR,
                     (pool_error_code("57000"),
                     errmsg("unable to read data"),
                          errdetail("child connection forced to terminate due to client_idle_limit_in_recovery:%d is reached",pool_config->client_idle_limit_in_recovery)));
@@ -4720,7 +4720,7 @@ SELECT_RETRY:
 			 * If we are in recovery and client_idle_limit_in_recovery is -1, then
 			 * exit immediately.
 			 */
-            ereport(ERROR,
+            ereport(FRONTEND_ERROR,
                 (pool_error_code("57000"),
                 errmsg("connection terminated due to online recovery"),
                      errdetail("child connection forced to terminate due to client_idle_limitis:-1")));
diff --git a/src/utils/error/elog.c b/src/utils/error/elog.c
index 18ab006..aab24ed 100644
--- a/src/utils/error/elog.c
+++ b/src/utils/error/elog.c
@@ -217,11 +217,17 @@ errstart(int elevel, const char *filename, int lineno,
 	bool		output_to_server;
 	bool		output_to_client = false;
 	int			i;
-
+	int			frontend_invalid = 0;
 	/*
 	 * Check some cases in which we want to promote an error into a more
 	 * severe error.  None of this logic applies for non-error messages.
 	 */
+	if (elevel == FRONTEND_ERROR)
+	{
+		frontend_invalid = true;
+		elevel = ERROR;
+	}
+
 	if (elevel >= ERROR)
 	{
 		/*
@@ -321,6 +327,7 @@ errstart(int elevel, const char *filename, int lineno,
 	edata = &errordata[errordata_stack_depth];
 	MemSet(edata, 0, sizeof(ErrorData));
 	edata->elevel = elevel;
+	edata->frontend_invalid = frontend_invalid;
 	edata->output_to_server = output_to_server;
 	edata->output_to_client = output_to_client;
 	if(elevel == FATAL && PG_exception_stack == NULL) /* This is startup failure. Take down main process with it */
@@ -969,6 +976,21 @@ geterrcode(void)
 }
 
 /*
+ * getfrontendinvalid --- return the currently frontend_invalid value
+ *
+ * This is only intended for use in error callback subroutines, since there
+ * is no other place outside elog.c where the concept is meaningful.
+ */
+bool getfrontendinvalid(void)
+{
+	ErrorData  *edata = &errordata[errordata_stack_depth];
+
+	/* we don't bother incrementing recursion_depth */
+	CHECK_STACK_DEPTH();
+
+	return edata->frontend_invalid;
+}
+/*
  * geterrposition --- return the currently set error position (0 if none)
  *
  * This is only intended for use in error callback subroutines, since there
@@ -1580,7 +1602,7 @@ write_eventlog(int level, const char *line, int len)
 
 	if (evtHandle == INVALID_HANDLE_VALUE)
 	{
-		evtHandle = RegisterEventSource(NULL, event_source ? event_source : "PostgreSQL");
+		evtHandle = RegisterEventSource(NULL, event_source ? event_source : "pgpool");
 		if (evtHandle == NULL)
 		{
 			evtHandle = INVALID_HANDLE_VALUE;


More information about the pgpool-hackers mailing list