[pgpool-hackers: 1721] Re: WIP handling pg_terminate_backend function in pgpool-II
Muhammad Usama
m.usama at gmail.com
Tue Jul 26 05:38:36 JST 2016
please use this v2 patch, there was a little mistake in the previous
version
Regards
Muhammad Usama
On Mon, Jul 25, 2016 at 6:57 PM, Muhammad Usama <m.usama at gmail.com> wrote:
> Hi
>
> Please find the attached work in progress patch for the handling of
> pg_terminate_backend() in pgpool-II. There is still some work need to be
> done on the patch, but I am sharing it to get the thoughts of a wider group
> of audience on the idea and the design.
>
> Problems needed to be addressed for pg_terminated_backend()
> ======
> 1- Since the pg_terminate_backend just go on and kills the child
> connection it is treated as the backend failure by pgpool-II and depending
> on the pgpool-II configuration pgpool-II failover that backend node.
>
> 2- The current logic in pgpool-II have no idea of which backend node the
> pg_terminate_backend call is intended to, and it is routed on the basis of
> logic in pool_where_to_send() function that can send the query to the node
> that does not have the connection with the pid.
>
>
> How is this patch trying to solve this?
> ===========
> After parsing the query in SimpleQuery() function the patch runs the
> function walker on the parsed query tree to find the
> pg_terminate_backend("const integer") call. And if the function is found in
> the query the second step is to locate the pgpool-II child process that
> contains the backend connection with PID specified in
> pg_terminate_backend() function call. For that
> the patch loops on all pgpool-II child process and try to locate
> the backend connection with the specific PID by comparing the PID stored in
> the shared memory ConnectionInfos. If the search is successful and the
> child process with the backend connection having the same PID is found, the
> shared memory "ConnectionInfo->backend_terminated"(added by this patch)
> flag is set to true for that connection.
> Once, after the backend node is identified and the flag is set, the next
> step is to rout the query to the exact backend node that contains the
> connection. For that the patch refrains from calling the
> pool_where_to_send() for this query and explicitly sets the query
> destination to the backend node that has the connection so that the query
> should lands on the proper backend node.
>
> Now when the query is successfully executed on the backend server and
> consequently the backend node kills the said connection to the pgpool, It
> results in the communication error in pgpool-II child process. So to make
> sure that the child process does not failover that backend node and the
> child process checks the "ConnectionInfo->backend_terminated" flag before
> performing the failover, and if the flag is set, it just does not proceed
> with failover and bails out by FATAL error that kills that
> particular pgpool-II child process which eventually gets re-spawned by
> pgpool main.
>
>
> Issues with this approach
> ====
> 1- What to do if somehow two or more connections to different backend
> PostgreSQL servers have the same PID.
>
> 2- We can only support pg_terminate_backend(constant number) function
> calls. If the expression or sub query is used in the argument of
> pg_terminate_backend then it would not be handled
> e.g
> pgpool=# select pg_terminate_backend((select 1));
> pgpool=# select pg_terminate_backend( 2 +1);
>
>
> 3- One very remote scenario where after setting the flag
> ConnectionInfo->backend_terminated to true and before the execution of
> pg_terminate_backend the backend actually gets fail, will still be treated
> as pg_terminate_backend case. But I believe it would be harmless since that
> backend node failure will be caught by the child process sending the
> pg_terminate_backend to that same backend.
>
> TODOs
> =====
>
> If the pg_terminate_backend() fails on the backend because of permission
> or any other issue ,we need to reset the ConnectionInfo->backend_terminated
> flag.
>
>
> What are your thoughts and suggestions on this approach?
>
> Many thanks in anticipation.
> Best regards
> Muhammad Usama
>
>
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20160726/691124b7/attachment-0001.html>
-------------- next part --------------
diff --git a/src/auth/pool_auth.c b/src/auth/pool_auth.c
index 2f259df..14b4926 100644
--- a/src/auth/pool_auth.c
+++ b/src/auth/pool_auth.c
@@ -406,6 +406,8 @@ int pool_do_auth(POOL_CONNECTION *frontend, POOL_CONNECTION_POOL *cp)
strlcpy(cp->info[i].database, sp->database, sizeof(cp->info[i].database));
strlcpy(cp->info[i].user, sp->user, sizeof(cp->info[i].user));
cp->info[i].counter = 1;
+ CONNECTION(cp, i)->info = &cp->info[i];
+ cp->info[i].backend_terminated = 0;
}
}
diff --git a/src/context/pool_query_context.c b/src/context/pool_query_context.c
index 49b0056..5f1cfc5 100644
--- a/src/context/pool_query_context.c
+++ b/src/context/pool_query_context.c
@@ -311,6 +311,24 @@ int pool_virtual_master_db_node_id(void)
return my_master_node_id;
}
+void pool_force_query_node_to_backend(POOL_QUERY_CONTEXT *query_context, int backend_id)
+{
+ int i;
+ CHECK_QUERY_CONTEXT_IS_VALID;
+ ereport(LOG,
+ (errmsg("forcing query destination node to backend node:%d",backend_id)));
+
+ pool_set_node_to_be_sent(query_context,backend_id);
+ for (i=0;i<NUM_BACKENDS;i++)
+ {
+ if (query_context->where_to_send[i])
+ {
+ query_context->virtual_master_node_id = i;
+ break;
+ }
+ }
+}
+
/*
* Decide where to send queries(thus expecting response)
*/
diff --git a/src/include/context/pool_query_context.h b/src/include/context/pool_query_context.h
index 207e239..34de352 100644
--- a/src/include/context/pool_query_context.h
+++ b/src/include/context/pool_query_context.h
@@ -103,5 +103,5 @@ extern bool pool_is_cache_exceeded(void);
extern void pool_set_cache_exceeded(void);
extern void pool_unset_cache_exceeded(void);
extern bool pool_is_transaction_read_only(Node *node);
-
+extern void pool_force_query_node_to_backend(POOL_QUERY_CONTEXT *query_context, int backend_id);
#endif /* POOL_QUERY_CONTEXT_H */
diff --git a/src/include/pcp/libpcp_ext.h b/src/include/pcp/libpcp_ext.h
index 42e83f2..7b727ea 100644
--- a/src/include/pcp/libpcp_ext.h
+++ b/src/include/pcp/libpcp_ext.h
@@ -95,6 +95,7 @@ typedef struct {
* might be out of control of
* pgpool-II. So we use "char" here.
*/
+ char backend_terminated;
} ConnectionInfo;
/*
diff --git a/src/include/pool.h b/src/include/pool.h
index 1f43efd..af58357 100644
--- a/src/include/pool.h
+++ b/src/include/pool.h
@@ -235,6 +235,7 @@ typedef struct {
char *auth_arg;
char *database;
char *username;
+ ConnectionInfo *info; /* shared memory coninfo used for pg_terminate_backend*/
} POOL_CONNECTION;
/*
diff --git a/src/include/utils/pool_select_walker.h b/src/include/utils/pool_select_walker.h
index d3e6fa0..8bdd774 100644
--- a/src/include/utils/pool_select_walker.h
+++ b/src/include/utils/pool_select_walker.h
@@ -40,6 +40,7 @@ typedef struct {
bool has_unlogged_table; /* True if unlogged table is used */
bool has_view; /* True if view is used */
bool has_function_call; /* True if write function call is used */
+ int pg_terminate_backend_pid; /* pid argument of pg_terminate_backedn_call(if used) */
bool has_non_immutable_function_call; /* True if non immutable functions are used */
bool has_insertinto_or_locking_clause; /* True if it has SELECT INTO or FOR SHARE/UPDATE */
int num_oids; /* number of oids */
@@ -47,6 +48,7 @@ typedef struct {
char table_names[POOL_MAX_SELECT_OIDS][POOL_NAMEDATALEN]; /* table names */
} SelectContext;
+extern int pool_get_terminate_backend_pid(Node *node);
extern bool pool_has_function_call(Node *node);
extern bool pool_has_non_immutable_function_call(Node *node);
extern bool pool_has_system_catalog(Node *node);
diff --git a/src/protocol/pool_connection_pool.c b/src/protocol/pool_connection_pool.c
index 23c6665..a941ffc 100644
--- a/src/protocol/pool_connection_pool.c
+++ b/src/protocol/pool_connection_pool.c
@@ -167,6 +167,7 @@ POOL_CONNECTION_POOL *pool_get_cp(char *user, char *database, int protoMajor, in
info = connection_pool->info;
memset(connection_pool, 0, sizeof(POOL_CONNECTION_POOL));
connection_pool->info = info;
+ info->backend_terminated = 0;
memset(connection_pool->info, 0, sizeof(ConnectionInfo) * MAX_NUM_BACKENDS);
POOL_SETMASK(&oldmask);
return NULL;
diff --git a/src/protocol/pool_process_query.c b/src/protocol/pool_process_query.c
index 2f12132..38381dd 100644
--- a/src/protocol/pool_process_query.c
+++ b/src/protocol/pool_process_query.c
@@ -4719,8 +4719,18 @@ SELECT_RETRY:
was_error = 1;
if (!VALID_BACKEND(i))
break;
- notice_backend_error(i, true);
- sleep(5);
+ /* check if the pg_terminate_backend was issued on this connection */
+ if (CONNECTION(backend, i)->info->backend_terminated == 1)
+ {
+ ereport(FATAL,
+ (errmsg("postmaster on DB node %d connection was lost due to pg_terminate_backend", i),
+ errdetail("pg_terminate_backend was called on the backend")));
+ }
+ else
+ {
+ notice_backend_error(i, true);
+ sleep(5);
+ }
break;
}
@@ -4760,7 +4770,7 @@ SELECT_RETRY:
ereport(ERROR,
(errmsg("unable to read from frontend socket"),
errdetail("exception occured on frontend socket")));
-
+
else if (FD_ISSET(frontend->fd, &readmask))
{
status = ProcessFrontendResponse(frontend, backend);
diff --git a/src/protocol/pool_proto_modules.c b/src/protocol/pool_proto_modules.c
index 6277b61..073136f 100644
--- a/src/protocol/pool_proto_modules.c
+++ b/src/protocol/pool_proto_modules.c
@@ -409,11 +409,62 @@ POOL_STATUS SimpleQuery(POOL_CONNECTION *frontend,
}
/*
- * Decide where to send query
+ * pg_terminate function needs special handling.
*/
- pool_where_to_send(query_context, query_context->original_query,
- query_context->parse_tree);
+ int terminate_backend_id = -1;
+ int terminate_pid = pool_get_terminate_backend_pid(query_context->parse_tree);
+ if (terminate_pid > 0)
+ {
+ int child;
+ /*
+ * pg_terminate backene found in the query
+ * look for the child process that has the backend
+ * with the pid
+ */
+
+ ereport(LOG,
+ (errmsg("found pg_terminate_backend request for backend pid %d",terminate_pid)));
+
+ for (child = 0; child < pool_config->num_init_children; child++)
+ {
+ int pool;
+ ProcessInfo *pi = pool_get_process_info(process_info[child].pid);
+ for (pool = 0; pool < pool_config->max_pool; pool++)
+ {
+ int backend_id;
+ for (backend_id = 0; backend_id < NUM_BACKENDS; backend_id++)
+ {
+ int poolBE = pool*MAX_NUM_BACKENDS+backend_id;
+ if (ntohl(pi->connection_info[poolBE].pid) == terminate_pid)
+ {
+ ereport(LOG,
+ (errmsg("processing pg_terminate_backend request"),
+ errdetail("found pid:%d child pid:%d",ntohl(pi->connection_info[poolBE].pid), pi->pid)));
+ /* set flag for in the shared memory for the connection info */
+ pi->connection_info[poolBE].backend_terminated = 1;
+ terminate_backend_id = backend_id;
+ /* Todo Reset the flag somehow in case the function was not successfull on the
+ * backend
+ */
+ }
+ }
+ }
+ }
+ }
+ if (terminate_backend_id >=0 )
+ {
+ /* It was the pg_terminate_backend call so send the query to appropriate backend */
+ pool_force_query_node_to_backend(query_context, terminate_backend_id);
+ }
+ else
+ {
+ /*
+ * Decide where to send query
+ */
+ pool_where_to_send(query_context, query_context->original_query,
+ query_context->parse_tree);
+ }
/*
* if this is DROP DATABASE command, send USR1 signal to parent and
* ask it to close all idle connections.
diff --git a/src/utils/pool_select_walker.c b/src/utils/pool_select_walker.c
index be5c38a..7c063dd 100644
--- a/src/utils/pool_select_walker.c
+++ b/src/utils/pool_select_walker.c
@@ -58,6 +58,7 @@ bool pool_has_function_call(Node *node)
return false;
ctx.has_function_call = false;
+ ctx.pg_terminate_backend_pid = -1;
raw_expression_tree_walker(node, function_call_walker, &ctx);
@@ -65,6 +66,25 @@ bool pool_has_function_call(Node *node)
}
/*
+ * Search the pg_terminate_backend() call in the query
+ */
+int pool_get_terminate_backend_pid(Node *node)
+{
+ SelectContext ctx;
+
+ if (!IsA(node, SelectStmt))
+ return false;
+
+ ctx.has_function_call = false;
+ ctx.pg_terminate_backend_pid = 0;
+
+ raw_expression_tree_walker(node, function_call_walker, &ctx);
+
+ return ctx.pg_terminate_backend_pid;
+
+}
+
+/*
* Return true if this SELECT has system catalog table.
*/
bool pool_has_system_catalog(Node *node)
@@ -286,6 +306,20 @@ static bool function_call_walker(Node *node, void *context)
ereport(DEBUG1,
(errmsg("function call walker, function name: \"%s\"", fname)));
+ if (ctx->pg_terminate_backend_pid == 0 && strcmp("pg_terminate_backend", fname) == 0)
+ {
+ if (list_length(fcall->args) == 1)
+ {
+ Node *arg = linitial(fcall->args);
+ if (IsA(arg, A_Const) &&
+ ((A_Const *)arg)->val.type == T_Integer)
+ {
+ ctx->pg_terminate_backend_pid = ((A_Const *)arg)->val.val.ival;
+ ereport(DEBUG1,
+ (errmsg("pg_terminate_backend pid = %d",ctx->pg_terminate_backend_pid)));
+ }
+ }
+ }
/*
* Check white list if any.
*/
diff --git a/src/utils/pool_stream.c b/src/utils/pool_stream.c
index 39903bf..892c07d 100644
--- a/src/utils/pool_stream.c
+++ b/src/utils/pool_stream.c
@@ -207,8 +207,16 @@ int pool_read(POOL_CONNECTION *cp, void *buf, int len)
cp->socket_state = POOL_SOCKET_ERROR;
if (cp->isbackend)
{
- ereport(ERROR,
+ if (cp->info->backend_terminated == 1)
+ {
+ cp->info->backend_terminated = 0;
+ ereport(FATAL,
(errmsg("unable to read data from DB node %d",cp->db_node_id),
+ errdetail("pg_terminate_backend was called on the backend")));
+ }
+
+ ereport(ERROR,
+ (errmsg("unable to read data from DB node %d",cp->db_node_id),
errdetail("socket read failed with an error \"%s\"", strerror(errno))));
/* if fail_over_on_backend_error is true, then trigger failover */
@@ -347,6 +355,14 @@ char *pool_read2(POOL_CONNECTION *cp, int len)
cp->socket_state = POOL_SOCKET_ERROR;
if (cp->isbackend)
{
+ if (cp->info->backend_terminated == 1)
+ {
+ cp->info->backend_terminated = 0;
+ ereport(FATAL,
+ (errmsg("unable to read data from DB node %d",cp->db_node_id),
+ errdetail("pg_terminate_backend was called on the backend")));
+ }
+
/* if fail_over_on_backend_error is true, then trigger failover */
if (pool_config->fail_over_on_backend_error)
{
@@ -603,6 +619,14 @@ int pool_flush(POOL_CONNECTION *cp)
{
if (cp->isbackend)
{
+ if (cp->info->backend_terminated == 1)
+ {
+ cp->info->backend_terminated = 0;
+ ereport(FATAL,
+ (errmsg("unable to read data from DB node %d",cp->db_node_id),
+ errdetail("pg_terminate_backend was called on the backend")));
+ }
+
/* if fail_over_on_backend_error is true, then trigger failover */
if (pool_config->fail_over_on_backend_error)
{
@@ -650,6 +674,14 @@ int pool_flush_noerror(POOL_CONNECTION *cp)
{
if (cp->isbackend)
{
+ if (cp->info->backend_terminated == 1)
+ {
+ cp->info->backend_terminated = 0;
+ ereport(FATAL,
+ (errmsg("unable to read data from DB node %d",cp->db_node_id),
+ errdetail("pg_terminate_backend was called on the backend")));
+ }
+
/* if fail_over_on_backend_erro is true, then trigger failover */
if (pool_config->fail_over_on_backend_error)
{
@@ -810,6 +842,14 @@ char *pool_read_string(POOL_CONNECTION *cp, int *len, int line)
cp->socket_state = POOL_SOCKET_ERROR;
if (cp->isbackend)
{
+ if (cp->info->backend_terminated == 1)
+ {
+ cp->info->backend_terminated = 0;
+ ereport(FATAL,
+ (errmsg("unable to read data from DB node %d",cp->db_node_id),
+ errdetail("pg_terminate_backend was called on the backend")));
+ }
+
notice_backend_error(cp->db_node_id, true);
child_exit(POOL_EXIT_AND_RESTART);
ereport(ERROR,
More information about the pgpool-hackers
mailing list