[pgpool-hackers: 1720] WIP handling pg_terminate_backend function in pgpool-II

Muhammad Usama m.usama at gmail.com
Mon Jul 25 22:57:58 JST 2016


Hi

Please find the attached work in progress patch for the handling of
pg_terminate_backend() in pgpool-II. There is still some work need to be
done on the patch, but I am sharing it to get the thoughts of a wider group
of audience on the idea and the design.

Problems needed to be addressed for pg_terminated_backend()
======
1- Since the pg_terminate_backend just go on and kills the child connection
it is treated as the backend failure by pgpool-II and depending on the
pgpool-II configuration pgpool-II failover that backend node.

2- The current logic in pgpool-II have no idea of which backend node the
pg_terminate_backend call is intended to, and it is routed on the basis of
logic in pool_where_to_send() function that can send the query to the node
that does not have the connection with the pid.


How is this patch trying to solve this?
===========
After parsing the query in SimpleQuery() function the patch runs the
function walker on the parsed query tree to find the
pg_terminate_backend("const integer") call. And if the function is found in
the query the second step is to locate the pgpool-II child process that
contains the backend connection with PID specified in
pg_terminate_backend() function call. For that
the patch loops on all pgpool-II child process and try to locate
the backend connection with the specific PID by comparing the PID stored in
the shared memory ConnectionInfos. If the search is successful and the
child process with the backend connection having the same PID is found, the
shared memory "ConnectionInfo->backend_terminated"(added by this patch)
flag is set to true for that connection.
Once, after the backend node is identified and the flag is set, the next
step is to rout the query to the exact backend node that contains the
connection. For that the patch refrains from calling the
pool_where_to_send() for this query and explicitly sets the query
destination to the backend node that has the connection so that the query
should lands on the proper backend node.

Now when the query is successfully executed on the backend server and
consequently the backend node kills the said connection to the pgpool, It
results in the communication error in pgpool-II child process. So to make
sure that the child process does not failover that backend node and the
child process checks the "ConnectionInfo->backend_terminated" flag before
performing the failover, and if the flag is set, it just does not proceed
with failover and bails out by FATAL error that kills that
particular pgpool-II child process which eventually gets re-spawned by
pgpool main.


Issues with this approach
====
1- What to do if somehow two or more connections to different backend
PostgreSQL servers have the same PID.

2- We can only support pg_terminate_backend(constant number) function
calls. If the expression or sub query is used in the argument of
pg_terminate_backend then it would not be handled
 e.g
pgpool=# select pg_terminate_backend((select 1));
pgpool=# select pg_terminate_backend( 2 +1);


3- One very remote scenario where after setting the flag
ConnectionInfo->backend_terminated to true and before the execution of
pg_terminate_backend the backend actually gets fail, will still be treated
as pg_terminate_backend case. But I believe it would be harmless since that
backend node failure will be caught by the child process sending the
pg_terminate_backend to that same backend.

TODOs
=====

If the pg_terminate_backend() fails on the backend because of permission or
any other issue ,we need to reset the ConnectionInfo->backend_terminated
flag.


What are your thoughts and suggestions on this approach?

Many thanks in anticipation.
Best regards
Muhammad Usama
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20160725/9fa3e654/attachment-0001.html>
-------------- next part --------------
diff --git a/src/auth/pool_auth.c b/src/auth/pool_auth.c
index 2f259df..14b4926 100644
--- a/src/auth/pool_auth.c
+++ b/src/auth/pool_auth.c
@@ -406,6 +406,8 @@ int pool_do_auth(POOL_CONNECTION *frontend, POOL_CONNECTION_POOL *cp)
 			strlcpy(cp->info[i].database, sp->database, sizeof(cp->info[i].database));
 			strlcpy(cp->info[i].user, sp->user, sizeof(cp->info[i].user));
 			cp->info[i].counter = 1;
+			CONNECTION(cp, i)->info = &cp->info[i];
+			cp->info[i].backend_terminated = 0;
 		}
 	}
 
diff --git a/src/context/pool_query_context.c b/src/context/pool_query_context.c
index 49b0056..5f1cfc5 100644
--- a/src/context/pool_query_context.c
+++ b/src/context/pool_query_context.c
@@ -311,6 +311,24 @@ int pool_virtual_master_db_node_id(void)
 	return my_master_node_id;
 }
 
+void pool_force_query_node_to_backend(POOL_QUERY_CONTEXT *query_context, int backend_id)
+{
+	int i;
+	CHECK_QUERY_CONTEXT_IS_VALID;
+	ereport(LOG,
+		(errmsg("forcing query destination node to backend node:%d",backend_id)));
+
+	pool_set_node_to_be_sent(query_context,backend_id);
+	for (i=0;i<NUM_BACKENDS;i++)
+	{
+		if (query_context->where_to_send[i])
+		{
+			query_context->virtual_master_node_id = i;
+			break;
+		}
+	}
+}
+
 /*
  * Decide where to send queries(thus expecting response)
  */
diff --git a/src/include/context/pool_query_context.h b/src/include/context/pool_query_context.h
index 207e239..34de352 100644
--- a/src/include/context/pool_query_context.h
+++ b/src/include/context/pool_query_context.h
@@ -103,5 +103,5 @@ extern bool pool_is_cache_exceeded(void);
 extern void pool_set_cache_exceeded(void);
 extern void pool_unset_cache_exceeded(void);
 extern bool pool_is_transaction_read_only(Node *node);
-
+extern void pool_force_query_node_to_backend(POOL_QUERY_CONTEXT *query_context, int backend_id);
 #endif /* POOL_QUERY_CONTEXT_H */
diff --git a/src/include/pcp/libpcp_ext.h b/src/include/pcp/libpcp_ext.h
index 42e83f2..7b727ea 100644
--- a/src/include/pcp/libpcp_ext.h
+++ b/src/include/pcp/libpcp_ext.h
@@ -95,6 +95,7 @@ typedef struct {
 							 * might be out of control of
 							 * pgpool-II. So we use "char" here.
 							 */
+	char		backend_terminated;
 } ConnectionInfo;
 
 /*
diff --git a/src/include/pool.h b/src/include/pool.h
index 1f43efd..af58357 100644
--- a/src/include/pool.h
+++ b/src/include/pool.h
@@ -235,6 +235,7 @@ typedef struct {
 	char *auth_arg;
 	char *database;
 	char *username;
+	ConnectionInfo *info; /* shared memory coninfo used for pg_terminate_backend*/
 } POOL_CONNECTION;
 
 /*
diff --git a/src/include/utils/pool_select_walker.h b/src/include/utils/pool_select_walker.h
index d3e6fa0..8bdd774 100644
--- a/src/include/utils/pool_select_walker.h
+++ b/src/include/utils/pool_select_walker.h
@@ -40,6 +40,7 @@ typedef struct {
 	bool    has_unlogged_table; /* True if unlogged table is used */
 	bool    has_view; /* True if view is used */
 	bool    has_function_call;  /* True if write function call is used */
+	int		pg_terminate_backend_pid; /* pid argument of pg_terminate_backedn_call(if used) */
 	bool    has_non_immutable_function_call;    /* True if non immutable functions are used */
 	bool    has_insertinto_or_locking_clause;   /* True if it has SELECT INTO or FOR SHARE/UPDATE */
 	int     num_oids;   /* number of oids */
@@ -47,6 +48,7 @@ typedef struct {
 	char    table_names[POOL_MAX_SELECT_OIDS][POOL_NAMEDATALEN];  /* table names */
 } SelectContext;
 
+extern int pool_get_terminate_backend_pid(Node *node);
 extern bool pool_has_function_call(Node *node);
 extern bool pool_has_non_immutable_function_call(Node *node);
 extern bool pool_has_system_catalog(Node *node);
diff --git a/src/protocol/pool_connection_pool.c b/src/protocol/pool_connection_pool.c
index 23c6665..a941ffc 100644
--- a/src/protocol/pool_connection_pool.c
+++ b/src/protocol/pool_connection_pool.c
@@ -167,6 +167,7 @@ POOL_CONNECTION_POOL *pool_get_cp(char *user, char *database, int protoMajor, in
 					info = connection_pool->info;
 					memset(connection_pool, 0, sizeof(POOL_CONNECTION_POOL));
 					connection_pool->info = info;
+					info->backend_terminated = 0;
 					memset(connection_pool->info, 0, sizeof(ConnectionInfo) * MAX_NUM_BACKENDS);
 					POOL_SETMASK(&oldmask);
 					return NULL;
diff --git a/src/protocol/pool_process_query.c b/src/protocol/pool_process_query.c
index 2f12132..38381dd 100644
--- a/src/protocol/pool_process_query.c
+++ b/src/protocol/pool_process_query.c
@@ -4719,8 +4719,18 @@ SELECT_RETRY:
 						was_error = 1;
 						if (!VALID_BACKEND(i))
 							break;
-						notice_backend_error(i, true);
-						sleep(5);
+						/* check if the pg_terminate_backend was issued on this connection */
+						if (CONNECTION_SLOT(backend, i)->info->backend_terminated == 1)
+						{
+							ereport(FATAL,
+								(errmsg("postmaster on DB node %d connection was lost due to pg_terminate_backend", i),
+									 errdetail("pg_terminate_backend was called on the backend")));
+						}
+						else
+						{
+							notice_backend_error(i, true);
+							sleep(5);
+						}
 						break;
 					}
 
@@ -4760,7 +4770,7 @@ SELECT_RETRY:
             ereport(ERROR,
                 (errmsg("unable to read from frontend socket"),
                      errdetail("exception occured on frontend socket")));
-    
+
 		else if (FD_ISSET(frontend->fd, &readmask))
 		{
 			status = ProcessFrontendResponse(frontend, backend);
diff --git a/src/protocol/pool_proto_modules.c b/src/protocol/pool_proto_modules.c
index 6277b61..073136f 100644
--- a/src/protocol/pool_proto_modules.c
+++ b/src/protocol/pool_proto_modules.c
@@ -409,11 +409,62 @@ POOL_STATUS SimpleQuery(POOL_CONNECTION *frontend,
 		}
 
 		/*
-		 * Decide where to send query
+		 * pg_terminate function needs special handling.
 		 */
-		pool_where_to_send(query_context, query_context->original_query,
-						   query_context->parse_tree);
+		int terminate_backend_id = -1;
+		int terminate_pid = pool_get_terminate_backend_pid(query_context->parse_tree);
+		if (terminate_pid > 0)
+		{
+			int child;
+			/*
+			 * pg_terminate backene found in the query
+			 * look for the child process that has the backend
+			 * with the pid
+			 */
+
+			ereport(LOG,
+				(errmsg("found pg_terminate_backend request for backend pid %d",terminate_pid)));
+
+			for (child = 0; child < pool_config->num_init_children; child++)
+			{
+				int pool;
+				ProcessInfo *pi = pool_get_process_info(process_info[child].pid);
 
+				for (pool = 0; pool < pool_config->max_pool; pool++)
+				{
+					int backend_id;
+					for (backend_id = 0; backend_id < NUM_BACKENDS; backend_id++)
+					{
+						int poolBE = pool*MAX_NUM_BACKENDS+backend_id;
+						if (ntohl(pi->connection_info[poolBE].pid) == terminate_pid)
+						{
+							ereport(LOG,
+								(errmsg("processing pg_terminate_backend request"),
+									 errdetail("found pid:%d child pid:%d",ntohl(pi->connection_info[poolBE].pid), pi->pid)));
+							/* set flag for in the shared memory for the connection info */
+							pi->connection_info[poolBE].backend_terminated = 1;
+							terminate_backend_id = backend_id;
+							/* Todo Reset the flag somehow in case the function was not successfull on the
+							 * backend
+							 */
+						}
+					}
+				}
+			}
+		}
+		if (terminate_backend_id >=0 )
+		{
+			/* It was the pg_terminate_backend call so send the query to appropriate backend */
+			pool_force_query_node_to_backend(query_context, terminate_backend_id);
+		}
+		else
+		{
+			/*
+			 * Decide where to send query
+			 */
+			pool_where_to_send(query_context, query_context->original_query,
+							   query_context->parse_tree);
+		}
 		/*
 		 * if this is DROP DATABASE command, send USR1 signal to parent and
 		 * ask it to close all idle connections.
diff --git a/src/utils/pool_select_walker.c b/src/utils/pool_select_walker.c
index be5c38a..7c063dd 100644
--- a/src/utils/pool_select_walker.c
+++ b/src/utils/pool_select_walker.c
@@ -58,6 +58,7 @@ bool pool_has_function_call(Node *node)
 		return false;
 
 	ctx.has_function_call = false;
+	ctx.pg_terminate_backend_pid = -1;
 
 	raw_expression_tree_walker(node, function_call_walker, &ctx);
 
@@ -65,6 +66,25 @@ bool pool_has_function_call(Node *node)
 }
 
 /*
+ * Search the pg_terminate_backend() call in the query
+ */
+int pool_get_terminate_backend_pid(Node *node)
+{
+	SelectContext	ctx;
+
+	if (!IsA(node, SelectStmt))
+		return false;
+
+	ctx.has_function_call = false;
+	ctx.pg_terminate_backend_pid = 0;
+
+	raw_expression_tree_walker(node, function_call_walker, &ctx);
+
+	return ctx.pg_terminate_backend_pid;
+
+}
+
+/*
  * Return true if this SELECT has system catalog table.
  */
 bool pool_has_system_catalog(Node *node)
@@ -286,6 +306,20 @@ static bool function_call_walker(Node *node, void *context)
 			ereport(DEBUG1,
 				(errmsg("function call walker, function name: \"%s\"", fname)));
 
+			if (ctx->pg_terminate_backend_pid == 0 && strcmp("pg_terminate_backend", fname) == 0)
+			{
+				if (list_length(fcall->args) == 1)
+				{
+					Node *arg = linitial(fcall->args);
+					if (IsA(arg, A_Const) &&
+					   ((A_Const *)arg)->val.type == T_Integer)
+					{
+						ctx->pg_terminate_backend_pid = ((A_Const *)arg)->val.val.ival;
+						ereport(DEBUG1,
+								(errmsg("pg_terminate_backend pid = %d",ctx->pg_terminate_backend_pid)));
+					}
+				}
+			}
 			/*
 			 * Check white list if any.
 			 */
diff --git a/src/utils/pool_stream.c b/src/utils/pool_stream.c
index 39903bf..892c07d 100644
--- a/src/utils/pool_stream.c
+++ b/src/utils/pool_stream.c
@@ -207,8 +207,16 @@ int pool_read(POOL_CONNECTION *cp, void *buf, int len)
 			cp->socket_state = POOL_SOCKET_ERROR;
 			if (cp->isbackend)
 			{
-				ereport(ERROR,
+				if (cp->info->backend_terminated == 1)
+				{
+					cp->info->backend_terminated = 0;
+					ereport(FATAL,
 						(errmsg("unable to read data from DB node %d",cp->db_node_id),
+							 errdetail("pg_terminate_backend was called on the backend")));
+				}
+
+				ereport(ERROR,
+					(errmsg("unable to read data from DB node %d",cp->db_node_id),
 						 errdetail("socket read failed with an error \"%s\"", strerror(errno))));
 
 				/* if fail_over_on_backend_error is true, then trigger failover */
@@ -347,6 +355,14 @@ char *pool_read2(POOL_CONNECTION *cp, int len)
 			cp->socket_state = POOL_SOCKET_ERROR;
 			if (cp->isbackend)
 			{
+				if (cp->info->backend_terminated == 1)
+				{
+					cp->info->backend_terminated = 0;
+					ereport(FATAL,
+						(errmsg("unable to read data from DB node %d",cp->db_node_id),
+							 errdetail("pg_terminate_backend was called on the backend")));
+				}
+
 				/* if fail_over_on_backend_error is true, then trigger failover */
 				if (pool_config->fail_over_on_backend_error)
 				{
@@ -603,6 +619,14 @@ int pool_flush(POOL_CONNECTION *cp)
 	{
 		if (cp->isbackend)
 		{
+			if (cp->info->backend_terminated == 1)
+			{
+				cp->info->backend_terminated = 0;
+				ereport(FATAL,
+						(errmsg("unable to read data from DB node %d",cp->db_node_id),
+						 errdetail("pg_terminate_backend was called on the backend")));
+			}
+
 			/* if fail_over_on_backend_error is true, then trigger failover */
 			if (pool_config->fail_over_on_backend_error)
 			{
@@ -650,6 +674,14 @@ int pool_flush_noerror(POOL_CONNECTION *cp)
     {
         if (cp->isbackend)
         {
+			if (cp->info->backend_terminated == 1)
+			{
+				cp->info->backend_terminated = 0;
+				ereport(FATAL,
+						(errmsg("unable to read data from DB node %d",cp->db_node_id),
+						 errdetail("pg_terminate_backend was called on the backend")));
+			}
+
             /* if fail_over_on_backend_erro is true, then trigger failover */
             if (pool_config->fail_over_on_backend_error)
             {
@@ -810,6 +842,14 @@ char *pool_read_string(POOL_CONNECTION *cp, int *len, int line)
 			cp->socket_state = POOL_SOCKET_ERROR;
 			if (cp->isbackend)
 			{
+				if (cp->info->backend_terminated == 1)
+				{
+					cp->info->backend_terminated = 0;
+					ereport(FATAL,
+							(errmsg("unable to read data from DB node %d",cp->db_node_id),
+							 errdetail("pg_terminate_backend was called on the backend")));
+				}
+
 				notice_backend_error(cp->db_node_id, true);
 				child_exit(POOL_EXIT_AND_RESTART);
                 ereport(ERROR,


More information about the pgpool-hackers mailing list