[pgpool-hackers: 2335] Re: Corner case bug in primary failover

Tatsuo Ishii ishii at sraoss.co.jp
Tue May 16 10:10:46 JST 2017


>> On Tue, May 9, 2017 at 5:53 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> 
>>> > On Tue, May 9, 2017 at 6:16 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>> >
>>> >> Usama,
>>> >>
>>> >> While adding a new test case to 003.failover regression test, I found
>>> >> a corner case bug in primary failover.
>>> >>
>>> >> Suppose Pgpool-II starts but is yet finding primary node. If primary
>>> >> failover happens, it skips finding primary node and let the initial
>>> >> value of it (Req_info->primary_node_id == -1) to be used as the new
>>> >> primary node id. As a result, no primary node id exists until next
>>> >> failover happens.
>>> >>
>>> >> Initialy I thought The problem is in the code of
>>> >> pgpool_main.c:failover() which tries to optimize finding primary node
>>> >> process.
>>> >>
>>> >>                 /*
>>> >>                  * If the down node was a standby node in streaming
>>> >> replication
>>> >>                  * mode, we can avoid calling
>>> >> find_primary_node_repeatedly() and
>>> >>                  * recognize the former primary as the new primary node,
>>> >> which
>>> >>                  * will reduce the time to process standby down.
>>> >>                  */
>>> >>                 else if (MASTER_SLAVE && pool_config->master_slave_sub_
>>> mode
>>> >> == STREAM_MODE &&
>>> >>                                  reqkind == NODE_DOWN_REQUEST)
>>> >>                 {
>>> >>                         if (Req_info->primary_node_id != node_id)
>>> >>                                 new_primary = Req_info->primary_node_id;
>>> >>                         else
>>> >>                                 new_primary =
>>> >> find_primary_node_repeatedly();
>>> >>
>>> >> I was attempting to fix it by checking Req_info->primary_node_id to
>>> >> see if it's initial value (-1) or not. If it's -1,
>>> >> find_primary_node_repeatedly() need to be called.
>>> >>
>>> >> But looking into pgpool_main() closely, I suspect there's a
>>> >> fundamental problem:
>>> >>
>>> >> 1) It processes failover in CHECK_REQUEST *before* setting
>>> >>    Req_info->primary_node_id.
>>> >>
>>> >>         /*
>>> >>          * check for child signals to ensure child startup before
>>> >> reporting successfull start
>>> >>          */
>>> >>         CHECK_REQUEST;
>>> >>
>>> >>         ereport(LOG,
>>> >>                         (errmsg("%s successfully started. version %s
>>> >> (%s)", PACKAGE, VERSION, PGPOOLVERSION)));
>>> >>
>>> >>         /*
>>> >>          * if the primary node id is not loaded by watchdog, search for
>>> it
>>> >>          */
>>> >>         if (Req_info->primary_node_id < 0)
>>> >>         {
>>> >>                 /* Save primary node id */
>>> >>                 Req_info->primary_node_id = find_primary_node();
>>> >>         }
>>> >>
>>> >> 2) It uses find_primary_node(), rather than
>>> >>    find_primary_node_repeatedly(). So if by some reasons (for example
>>> >>    the backend does not come up yet), find_primary_node() will fail
>>> >>    and Req_info->primary_node_id is set to -1.
>>> >>
>>> >> I think proper fix will be moving the CHECK_REQUEST call above inside
>>> >> main loop, and change the find_primary_node() call to
>>> >> find_primary_node_repeatedly().
>>> >>
>>> >> Attached is the patch to do that (plus change the
>>> >> search_primary_node_timeout to smaller value in 055.backend_all_down
>>> >> test. Otherwise, regression timeout is triggered) against master
>>> >> branch.
>>> >>
>>> >> What do you think?
>>> >>
>>> >
>>> >
>>> > Waoo thanks for catching this, it is a really annoying issue, I think
>>> your
>>> > patch does solve the problem and is the right approach,
>>> > But I was thinking what if we move search for the primary node before
>>> > starting the child processes. So that we spawn the child processes after
>>> > finishing all the
>>> > startup rituals?
>>>
>>> Oh I think that will make even things safer.
>>>
>>> > Do you think it will cause some issues?
>>>
>>> So far there's nothing I can think of. Let me try it. I will report
>>> back tomorrow.
>>>
>> 
>> Many thanks
> 
> It seems your proposed change works well. So I pushed it to the master
> branch. However, I have not back patched for the moment because
> tomorrow Pen is going to release minor versions and I do not want to
> risk to make a regression in the release.

I have back patched this to 3.6, 3.5 and 3.4 stable branches. 3.3 or
older branches are hard to incorporate the fix and I just left them.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


More information about the pgpool-hackers mailing list