[pgpool-general: 2513] Re: wd_escalation_command exit code

Fri Jan 31 12:16:25 JST 2014

On Fri, 31 Jan 2014 00:50:37 +0400
Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:

> 
> On Jan 30, 2014, at 8:40 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
> 
> > Hi,
> > 
> > On Wed, 29 Jan 2014 10:26:00 +0400
> > Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
> > 
> >> Hi!
> >> 
> >> I'm testing this patch on a vagrant/virtualbox based VM. 
> >> 
> >> # uname -a
> >> Linux lb-node1 3.2.0-55-generic #85-Ubuntu SMP Wed Oct 2 12:29:27 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
> >> 
> >> # cat /etc/issue
> >> Ubuntu 12.04.3 LTS \n \l
> >> 
> >> This is the output of ifconfig before starting pgpool:
> >> 
> >> eth0      Link encap:Ethernet  HWaddr 08:00:27:03:2b:89
> >>          inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
> >>          inet6 addr: fe80::a00:27ff:fe03:2b89/64 Scope:Link
> >>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >>          RX packets:7198 errors:0 dropped:0 overruns:0 frame:0
> >>          TX packets:4853 errors:0 dropped:0 overruns:0 carrier:0
> >>          collisions:0 txqueuelen:1000
> >>          RX bytes:553607 (553.6 KB)  TX bytes:722721 (722.7 KB)
> >> 
> >> eth1      Link encap:Ethernet  HWaddr 08:00:27:70:46:a0
> >>          inet addr:192.168.33.11  Bcast:192.168.33.255  Mask:255.255.255.0
> >>          inet6 addr: fe80::a00:27ff:fe70:46a0/64 Scope:Link
> >>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >>          RX packets:23682 errors:0 dropped:0 overruns:0 frame:0
> >>          TX packets:4876 errors:0 dropped:0 overruns:0 carrier:0
> >>          collisions:0 txqueuelen:1000
> >>          RX bytes:2551344 (2.5 MB)  TX bytes:646217 (646.2 KB)
> >> 
> >> lo        Link encap:Local Loopback
> >>          inet addr:127.0.0.1  Mask:255.0.0.0
> >>          inet6 addr: ::1/128 Scope:Host
> >>          UP LOOPBACK RUNNING  MTU:16436  Metric:1
> >>          RX packets:1636 errors:0 dropped:0 overruns:0 frame:0
> >>          TX packets:1636 errors:0 dropped:0 overruns:0 carrier:0
> >>          collisions:0 txqueuelen:0
> >>          RX bytes:109868 (109.8 KB)  TX bytes:109868 (109.8 KB)
> >> 
> >> 
> >> /etc/pgpool2/pgpool.conf:
> >> ...
> >> debug_level                   = 9
> >> ...
> >> delegate_IP                   = '192.168.33.200'
> >> ...
> >> ifconfig_path                 = '/sbin'
> >> if_up_cmd                     = 'ifconfig eth1:0 $_IP_$ netmask 255.255.255.0'
> >> if_down_cmd                   = 'ifconfig eth1:0 down'
> >> ...
> >> 
> >> 
> >> Once I start pgpool I get the following ifconfig output
> >> 
> >> 
> >> eth0      Link encap:Ethernet  HWaddr 08:00:27:03:2b:89
> >>          inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
> >>          inet6 addr: fe80::a00:27ff:fe03:2b89/64 Scope:Link
> >>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >>          RX packets:7939 errors:0 dropped:0 overruns:0 frame:0
> >>          TX packets:5404 errors:0 dropped:0 overruns:0 carrier:0
> >>          collisions:0 txqueuelen:1000
> >>          RX bytes:606232 (606.2 KB)  TX bytes:816924 (816.9 KB)
> >> 
> >> eth1      Link encap:Ethernet  HWaddr 08:00:27:70:46:a0
> >>          inet addr:192.168.33.11  Bcast:192.168.33.255  Mask:255.255.255.0
> >>          inet6 addr: fe80::a00:27ff:fe70:46a0/64 Scope:Link
> >>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >>          RX packets:25179 errors:0 dropped:0 overruns:0 frame:0
> >>          TX packets:5204 errors:0 dropped:0 overruns:0 carrier:0
> >>          collisions:0 txqueuelen:1000
> >>          RX bytes:2704567 (2.7 MB)  TX bytes:690834 (690.8 KB)
> >> 
> >> eth1:0    Link encap:Ethernet  HWaddr 08:00:27:70:46:a0
> >>          inet addr:192.168.33.200  Bcast:192.168.33.255  Mask:255.255.255.0
> >>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >> 
> >> lo        Link encap:Local Loopback
> >>          inet addr:127.0.0.1  Mask:255.0.0.0
> >>          inet6 addr: ::1/128 Scope:Host
> >>          UP LOOPBACK RUNNING  MTU:16436  Metric:1
> >>          RX packets:1745 errors:0 dropped:0 overruns:0 frame:0
> >>          TX packets:1745 errors:0 dropped:0 overruns:0 carrier:0
> >>          collisions:0 txqueuelen:0
> >>          RX bytes:117264 (117.2 KB)  TX bytes:117264 (117.2 KB)
> >> 
> >> 
> >> 
> >> # ping 192.168.33.200
> >> PING 192.168.33.200 (192.168.33.200) 56(84) bytes of data.
> >> 64 bytes from 192.168.33.200: icmp_req=1 ttl=64 time=0.060 ms
> >> ^C
> >> --- 192.168.33.200 ping statistics ---
> >> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> >> rtt min/avg/max/mdev = 0.060/0.060/0.060/0.000 ms
> >> 
> >> 
> >> And these are some messages from pgpool.log:
> >> 
> >> pgpool[4152]: wd_chk_setuid all commands have setuid bit
> >> pgpool[4152]: watchdog might call network commands which using setuid bit.
> >> pgpool[4152]: exec_ping: failed to ping 192.168.33.200
> >> pgpool[4152]: wd_escalation: escalating to master pgpool
> >> pgpool[4152]: wd_IP_up: ifconfig up failed
> >> pgpool[4152]: wd_declare: send the packet to declare the new master
> >> pgpool[4152]: wd_escalation: escalated to master pgpool with some errors
> > 
> > That's funny. The log says "failed to ping", but the VIP has in fact been
> > brought up. There may be a delay between ifconfig and ping. However, pgpool
> > should retry the ping up to three times until it succeeds, yet in this case
> > it is tried only once.
> > 
> > For analysis, I would appreciate it if you would apply the attached patch and
> > send the log output messages.
> 
> start:
> 
> pgpool[2493]: num_backends: 2 total_weight: 2.000000
> pgpool[2493]: backend 0 weight: 1073741823.500000
> pgpool[2493]: backend 0 flag: 0000
> pgpool[2493]: backend 1 weight: 1073741823.500000
> pgpool[2493]: backend 1 flag: 0000
> pgpool[2493]: loading "/etc/pgpool2/pool_hba.conf" for client authentication configuration file
> pgpool[2493]: wd_chk_setuid all commands have setuid bit
> pgpool[2493]: watchdog might call network commands which using setuid bit.
> pgpool[2493]: Backend status file /var/log/postgresql/pgpool_status discarded
> pgpool[2493]: wd_create_send_socket: connect() reports failure (Connection refused). You can safely ignore this while starting up.
> pgpool[2493]: send_packet_4_nodes: packet for lb-node2.site:9000 is canceled
> pgpool[2493]: exec_ping: failed to ping 192.168.33.200: exit code 1
> pgpool[2493]: wd_escalation: escalating to master pgpool
> pgpool[2493]: wd_IP_up: ifconfig up failed
> pgpool[2493]: wd_declare: send the packet to declare the new master
> pgpool[2493]: wd_escalation: escalated to master pgpool with some errors
> pgpool[2493]: wd_init: start watchdog
> 
> 
> 
> 
> stop:
> 
> pgpool[2504]: wd_IP_down: not delegate IP holder
> pgpool[2502]: hb_receiver child receives shutdown request signal 2
> pgpool[2503]: hb_sender child receives shutdown request signal 2
> pgpool[2589]: child received shutdown request signal 2
> pgpool[2493]: shmem_exit(0)
> 
> 
> 
> BTW, when I start/stop unpatched 3.3.2 version I see the same messages about ping failure. But everything works well in this case. 

I had wrongly assumed that the ping failure occurred while bringing up the VIP,
but this ping only checks whether the VIP is already in use by another host.
So this failure message is not a problem.
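
To illustrate how that first check should be read, here is a minimal sketch. It
is not the pgpool source: command_exit_code() and vip_looks_free() are
hypothetical names, and the sketch assumes a ping that exits with 1 when no
reply arrives, as the "exit code 1" in your log suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Run a shell command and return its exit code, or -1 if it could not be
 * started or did not exit normally (for example, killed by a signal). */
int command_exit_code(const char *cmd)
{
	int status;

	assert(cmd != NULL);
	status = system(cmd);
	if (status == -1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/* Interpret the exit code of the pre-escalation ping (assumption: 1 means
 * "no reply"). No reply means no other host holds the VIP, which is the
 * expected result before escalation. Exit code 0 means another host
 * answered; any other value is not treated as "free" in this sketch. */
bool vip_looks_free(int ping_exit_code)
{
	return ping_exit_code == 1;
}
```

With this reading, "failed to ping ... exit code 1" before "escalating to master
pgpool" simply means that nobody else answered on the VIP.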

The real problem is that there is no ping message after "escalating to master
pgpool". "ifconfig up failed" may be caused by a failure of the arping command,
since ping is executed only after arping succeeds.
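
The ordering described above can be sketched as follows. This is only an
illustration under my own assumptions, not the code in wd_if.c: struct vip_step
and run_vip_steps() are hypothetical, and the command strings are whatever the
caller passes in. Each step runs only if the previous one succeeded, and the
first failing step is logged by name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

struct vip_step {
	const char *name;    /* label used in the log message */
	const char *command; /* shell command line to run */
};

/* Run the steps in order and stop at the first one that fails.
 * Returns the index of the failed step, or -1 if every step succeeded. */
int run_vip_steps(const struct vip_step *steps, size_t count)
{
	assert(steps != NULL || count == 0);

	for (size_t i = 0; i < count; i++)
	{
		int status = system(steps[i].command);

		if (status == -1 || !WIFEXITED(status))
		{
			fprintf(stderr, "%s: command exited abnormally\n", steps[i].name);
			return (int) i;
		}
		if (WEXITSTATUS(status) != EXIT_SUCCESS)
		{
			fprintf(stderr, "%s: command failed. exit status: %d\n",
					steps[i].name, WEXITSTATUS(status));
			return (int) i;
		}
	}
	return -1;
}
```

If the steps are given as if_up, arping, ping and the arping step fails, the
ping step never runs, which would match a log with no ping message after the
escalation starts.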

Could you please try the attached patch for analysis? It outputs a log
message when the arping command fails.

> 
> unpatched start:
> 
> pgpool[7189]: exec_ping: failed to ping 192.168.33.200
> pgpool[7189]: wd_escalation: escalating to master pgpool
> pgpool[7189]: wd_declare: send the packet to declare the new master
> pgpool[7189]: wd_escalation: escalated to master pgpool successfully
> 
> unpatched stop:
> 
> pgpool[7198]: hb_receiver child receives shutdown request signal 2
> pgpool[7199]: hb_sender child receives shutdown request signal 2
> pgpool[7200]: exec_ping: failed to ping 192.168.33.200
> pgpool[7200]: wd_IP_down: ifconfig down succeeded
> pgpool[7189]: shmem_exit(0)
> 
> 
> 
> > 
> >> 
> >> 
> >> When I stop pgpool I get the following messages in pgpool.log:
> >> 
> >> pgpool[4163]: wd_IP_down: not delegate IP holder
> >> pgpool[4161]: hb_receiver child receives shutdown request signal 2
> >> pgpool[4162]: hb_sender child receives shutdown request signal 2
> >> pgpool[4152]: shmem_exit(0)
> >> 
> >> 
> >> 
> >> 
> >> 
> >> On Jan 29, 2014, at 6:42 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
> >> 
> >>> On Tue, 28 Jan 2014 23:03:20 +0400
> >>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
> >>> 
> >>>> Hi!
> >>>> This patch applied successfully, but now there is a new problem. When I start the pgpool service I get a new interface eth0:0 with the failover IP address assigned, as expected. But when I stop the pgpool service, eth0:0 does not go down. It remains even after pgpool has shut down completely.
> >>> 
> >>> Odd, I can't reproduce this. Are there any error messages?
> >>> What ifconfig command do you use?
> >>> 
> >>>> 
> >>>> I tried 3.3.2 without this patch and everything worked well. 
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> On Jan 27, 2014, at 5:18 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
> >>>> 
> >>>>> On Sat, 25 Jan 2014 15:31:44 +0400
> >>>>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
> >>>>> 
> >>>>>> 
> >>>>>> On Jan 24, 2014, at 1:25 PM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
> >>>>>> 
> >>>>>>> On Tue, 21 Jan 2014 15:24:02 +0400
> >>>>>>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
> >>>>>>> 
> >>>>>>>> Great! Now it is working!
> >>>>>>>> 
> >>>>>>>> pgpool[31903]: wd_escalation: escalation command failed. exit status: 1
> >>>>>>>> 
> >>>>>>>> Thank you!
> >>>>>>>> 
> >>>>>>>> Will this patch be included in 3.3.3 ?
> >>>>>>>> 
> >>>>>>>> Also, what about failed if_up_cmd and further pgpool behaviour (my second message in the thread.) ?
> >>>>>>> 
> >>>>>>> I attached the patch. Could you try this? With this fix, pgpool outputs an error
> >>>>>>> message when if_up_cmd fails. This patch should be applied after the previous
> >>>>>>> patch. This fix will be included in 3.3.3.
> >>>>>> 
> >>>>>> 
> >>>>>> Hi!
> >>>>>> 
> >>>>>> I tried to apply the patch against both 3.3.1 and 3.3.2
> >>>>>> 
> >>>>>> this is what I got:
> >>>>> 
> >>>>> Hmm.. Could you try applying the attached patch to 3.3.2? It includes all the
> >>>>> fixes for the escalation command and ifconfig errors.
> >>>>> 
> >>>>>> 
> >>>>>> node1:~/pgpool-orig# patch -p1 < /root/op/esc.patch
> >>>>>> 
> >>>>>> patching file src/watchdog/wd_packet.c
> >>>>>> Hunk #1 succeeded at 954 (offset 23 lines).
> >>>>>> 
> >>>>>> node1:~/pgpool-orig# patch -p1 < /root/op/ifup.patch
> >>>>>> 
> >>>>>> patching file src/watchdog/wd_if.c
> >>>>>> Hunk #1 succeeded at 42 with fuzz 1 (offset 3 lines).
> >>>>>> Hunk #2 succeeded at 62 (offset 3 lines).
> >>>>>> Hunk #3 succeeded at 117 (offset 3 lines).
> >>>>>> patching file src/watchdog/wd_packet.c
> >>>>>> Hunk #1 succeeded at 654 (offset 23 lines).
> >>>>>> Hunk #2 succeeded at 939 (offset 23 lines).
> >>>>>> Hunk #3 FAILED at 932.
> >>>>>> Hunk #4 succeeded at 976 (offset 18 lines).
> >>>>>> 1 out of 4 hunks FAILED -- saving rejects to file src/watchdog/wd_packet.c.rej
> >>>>>> 
> >>>>>> 
> >>>>>> src/watchdog/wd_packet.c.rej:
> >>>>>> 
> >>>>>> 
> >>>>>> --- src/watchdog/wd_packet.c
> >>>>>> +++ src/watchdog/wd_packet.c
> >>>>>> @@ -932,22 +933,31 @@
> >>>>>> 	/* execute escalation command */
> >>>>>> 	if (strlen(pool_config->wd_escalation_command))
> >>>>>> 	{
> >>>>>> -		int r;
> >>>>>> 		r = system(pool_config->wd_escalation_command);
> >>>>>> 		if (WIFEXITED(r))
> >>>>>> 		{
> >>>>>> 			if (WEXITSTATUS(r) == EXIT_SUCCESS)
> >>>>>> 				pool_log("wd_escalation: escalation command succeeded");
> >>>>>> 			else
> >>>>>> +			{
> >>>>>> 				pool_error("wd_escalation: escalation command failed. exit status: %d", WEXITSTATUS(r));
> >>>>>> +				has_error = true;
> >>>>>> +			}
> >>>>>> 		}
> >>>>>> 		else
> >>>>>> +		{
> >>>>>> 			pool_error("wd_escalation: escalation command exit abnormally");
> >>>>>> +			has_error = true;
> >>>>>> +		}
> >>>>>> 	}
> >>>>>> 
> >>>>>> 	/* interface up as delegate IP */
> >>>>>> 	if (strlen(pool_config->delegate_IP) != 0)
> >>>>>> -		wd_IP_up();
> >>>>>> +	{
> >>>>>> +		r = wd_IP_up();
> >>>>>> +		if (r == WD_NG)
> >>>>>> +			has_error = true;
> >>>>>> +	}
> >>>>>> 
> >>>>>> 	/* set master status to the wd list */
> >>>>>> 	wd_set_wd_list(pool_config->wd_hostname, pool_config->port,
> >>>>>> 
> >>>>>> 
> >>>>>>> 
> >>>>>>> In addition, I think pgpool should go to down status when if_up_cmd fails,
> >>>>>>> since it is then useless as a member of the watchdog cluster. I'll make this fix
> >>>>>>> for either 3.3.3 or 3.4.0.
> >>>>>>> 
> >>>>>>> 
> >>>>>> 
> >>>>>> Sounds reasonable. 
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>> 
> >>>>> 
> >>>>> -- 
> >>>>> Yugo Nagata <nagata at sraoss.co.jp>
> >>>>> <escalation_error_all.patch>
> >>>> 
> >>> 
> >>> 
> >>> -- 
> >>> Yugo Nagata <nagata at sraoss.co.jp>
> >> 
> > 
> > 
> > -- 
> > Yugo Nagata <nagata at sraoss.co.jp>
> > <escalation_error_all_for_analysis.patch>
> 

-- 
Yugo Nagata <nagata at sraoss.co.jp>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: escalation_error_all_for_analysis2.patch
Type: text/x-diff
Size: 4959 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20140131/229361d7/attachment.bin>