[pgpool-general: 8001] Re: pcp_node_info does not return when host is lost on 4.3.0

Emond Papegaaij emond.papegaaij at gmail.com
Wed Jan 26 16:23:25 JST 2022


On Wed, Jan 26, 2022 at 2:37 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> >> It seems it's the call to PQpingParams in db_node_status that's causing
> >> the problems:
> >> ret = PQpingParams(keywords, values, 1);
> >>
> >> The worker never gets past that point for node 2. I hope this helps you
> >> address the issue.
>
> Thank you for the analysis. I never expected PQpingParams() to get stuck.
>

That's the advantage of being ignorant. I had no expectations at all :)

> > If that's not possible, the number of timeouts should be reduced to an
> > absolute minimum.
>
> Ok, I will add a 1 second timeout (that's the minimum) to the call to
> PQpingParams.
>

I think the PQpingParams call should use the same connect_timeout that is
configured in the pgpool config. Otherwise this call might hit a timeout
while other parts of the code don't, which would only cause confusion.
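
A rough sketch of the idea, not the actual pgpool code: connect_timeout is
an ordinary libpq connection keyword, so it can be passed in the same
keyword/value arrays that PQpingParams already receives. The helper names
(build_ping_params, ping_with_timeout), the USE_LIBPQ switch and the way the
timeout value is obtained are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not pgpool's db_node_status(): fill the
 * keyword/value arrays for PQpingParams() and include connect_timeout.
 * timeout_buf must outlive the arrays, since values[] points into it.
 * Returns the number of parameters written (NULL terminator excluded).
 */
static int
build_ping_params(const char *host, const char *port, int timeout_sec,
                  char *timeout_buf, size_t timeout_buf_len,
                  const char **keywords, const char **values, int capacity)
{
    int n = 0;

    assert(capacity >= 4);      /* three parameters plus NULL terminator */

    /* libpq expects connect_timeout as a decimal string, in seconds */
    snprintf(timeout_buf, timeout_buf_len, "%d", timeout_sec);

    keywords[n] = "host";            values[n] = host;        n++;
    keywords[n] = "port";            values[n] = port;        n++;
    keywords[n] = "connect_timeout"; values[n] = timeout_buf; n++;
    keywords[n] = NULL;              values[n] = NULL;

    return n;
}

#ifdef USE_LIBPQ                /* build with -DUSE_LIBPQ -lpq to ping for real */
#include <libpq-fe.h>

static PGPing
ping_with_timeout(const char *host, const char *port, int timeout_sec)
{
    char        buf[16];
    const char *keywords[4];
    const char *values[4];

    build_ping_params(host, port, timeout_sec, buf, sizeof(buf),
                      keywords, values, 4);
    /* expand_dbname = 0: no dbname keyword is passed in this sketch */
    return PQpingParams(keywords, values, 0);
}
#endif
```

With the timeout taken from the same setting the rest of pgpool uses, a lost
host should make the ping give up after the same interval as every other
connection attempt.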

> > Why is pcp_node_info checking the status of every
> > backend, while I only request the status of a single backend?
>
> I didn't realize that. If so, it must be a bug. Let me check.
>

The loop is in get_nodes. It always iterates over all
nodes. inform_node_info (pcp_worker.c) calls this function and only prints
the selected node.
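
To illustrate the difference, here is a simplified sketch of a collection
loop that accepts an optional node id. It is not the real get_nodes; the
NodeInfo struct, probe_node, collect_nodes and the ALL_NODES sentinel are
all made up for this example.

```c
#define ALL_NODES (-1)          /* hypothetical sentinel: no node id given */

/* Hypothetical, much simplified stand-in for the per-node information */
typedef struct
{
    int node_id;
    int checked;                /* 1 once the backend has been probed */
} NodeInfo;

/* Stand-in for the expensive part (ping and status query of one backend) */
static void
probe_node(NodeInfo *info, int node_id)
{
    info->node_id = node_id;
    info->checked = 1;
}

/*
 * Probe only the wanted node, or every node when wanted == ALL_NODES.
 * Returns the number of backends that were actually probed.
 */
static int
collect_nodes(NodeInfo *out, int num_nodes, int wanted)
{
    int probed = 0;

    for (int i = 0; i < num_nodes; i++)
    {
        if (wanted != ALL_NODES && i != wanted)
            continue;           /* skip backends nobody asked about */
        probe_node(&out[i], i);
        probed++;
    }
    return probed;
}
```

Calling collect_nodes(nodes, 3, 2) touches only node 2, so an unreachable
node 0 or 1 can no longer delay a request for node 2.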

> > Also, it should not try to connect twice. If the first attempt fails, the
> > second should be skipped.
>

Looking at the previous patches, I think both patches are needed. The last
patch (pcp_hang.patch) prevents a second timeout for the same backend when
the ping fails. The first (pcp_node_info_hang.patch) is also still needed,
because there's a small chance the database will get lost in between the
calls. pcp_node_info should not perform retries. So the two things
remaining are: setting a connect_timeout on PQpingParams, and refactoring
get_nodes to only collect the information for one node if the pcp worker
has a node id.
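
The combined effect of the two patches can be sketched roughly as follows.
This is only an illustration of the control flow, not the patched pgpool
code; check_node_once, the probe_fn callbacks and the stub probes are
hypothetical names.

```c
typedef enum { NODE_UP, NODE_DOWN } NodeState;

/* A probe returns 1 on success and 0 on failure or timeout. */
typedef int (*probe_fn)(int node_id);

/*
 * At most one timeout per backend: a failed ping skips the connection
 * attempt, and a failed connection attempt is not retried.
 */
static NodeState
check_node_once(int node_id, probe_fn ping, probe_fn connect_and_query)
{
    if (!ping(node_id))
        return NODE_DOWN;       /* ping failed: skip the second attempt */
    if (!connect_and_query(node_id))
        return NODE_DOWN;       /* lost between the two calls: no retry */
    return NODE_UP;
}

/* Stub probes, for illustration only */
static int connect_calls = 0;

static int ping_fails(int node_id) { (void) node_id; return 0; }
static int ping_ok(int node_id)    { (void) node_id; return 1; }

static int
connect_fails(int node_id)
{
    (void) node_id;
    connect_calls++;            /* count how often a connection is tried */
    return 0;
}
```

With ping_fails the connection stub is never called, and with ping_ok it is
called exactly once, which is the behaviour both patches together aim for.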

Best regards,
Emond