kernel 3.13.0-32 ipvs "IPv6 header not found" related to UDP socket sendto() EPERM errors
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) |
Fix Released
|
Medium
|
Chris J Arges | ||
Trusty |
Fix Released
|
Medium
|
Chris J Arges | ||
Utopic |
Fix Released
|
Medium
|
Chris J Arges |
Bug Description
SRU Justificaton:
[Impact]
Users of ipvs may encounter dropped packets and message such as:
[70325.516724] IPv6 header not found
[Fix]
Commit eb90b0c734ad793
[Regression Potential]
This changes NFPROTO_IPV4 to NFPROTO_IPV6 as we'd expect this nf_hook_ops struct to be in the code.
--
I have an Ubuntu 14.04 host that I am using as both a keepalived/ipvs loadbalancer and dnsmasq server for pxebooting servers.
After updating linux-image 3.13.0-29.53 -> 3.13.0-32.57 I noticed that dnsmasq-tftp stopped working. pxeboot clients would hang on the "Loading ..../linux" TFTP transfer, with the transfer stalling roughly ~1000 blocks into the transfer:
10:30:51.011728 IP 10.1.1.2.43540 > 10.1.12.1.49165: UDP, length 1412
10:30:51.011924 IP 10.1.12.1.49165 > 10.1.1.2.43540: UDP, length 4
10:30:51.012012 IP 10.1.1.2.43540 > 10.1.12.1.49165: UDP, length 1412
10:30:51.012183 IP 10.1.12.1.49165 > 10.1.1.2.43540: UDP, length 4
stracing dnsmasq I noticed something very odd: sendto() on the socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) would suddenly start persistently returning EPERM in mid-transfer, even when dnsmasq continued to periodically retry:
select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 1 (in [17], left {0, 249834})
recvfrom(17, "\0\4\3\352", 4096, 0, NULL, NULL) = 4
lseek(16, 1410816, SEEK_SET) = 1410816
read(16, "\25\306\
sendto(17, "\0\3\3\
select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 1 (in [17], left {0, 249839})
recvfrom(17, "\0\4\3\353", 4096, 0, NULL, NULL) = 4
lseek(16, 1412224, SEEK_SET) = 1412224
read(16, "*\360 <C\363l\
sendto(17, "\0\3\3\354*\360 <C\363l\
select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 0 (Timeout)
select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 0 (Timeout)
select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 0 (Timeout)
select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 0 (Timeout)
select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 0 (Timeout)
select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 0 (Timeout)
select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 0 (Timeout)
select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 0 (Timeout)
lseek(16, 1412224, SEEK_SET) = 1412224
read(16, "*\360 <C\363l\
sendto(17, "\0\3\3\354*\360 <C\363l\
This was with all iptables rules unloaded (so no OUTPUT -j DENY) and apparmor profiles torn down.
I also noticed the following dmesgs appearing at roughly similar times to the tftp transfers getting stuck (although not coinciding exactly with the stall):
[70325.516724] IPv6 header not found
The error pointed to ipvs (which I am using on the same host as an IPv4 NAT loadbalancer):
http://
http://
I then tore down the ipvs rules (service keepalived stop) and unloaded the modules (rmmod ip_vs_rr ip_vs), and the issue resolved itself - the stalled dnsmasq-tftp transfer resumed!
This seems to be reproducible, i.e. modprobing ip_vs and starting keepalived will cause dnsmasq-tftp to stall again, and stopping/unloading will resume.
This seems to happen reproducibly on boot with -32 and -30. This does NOT seem to happen with 3.13.0-29 which I was using up until now.
---
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Jul 29 13:43 seq
crw-rw---- 1 root audio 116, 33 Jul 29 13:43 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.14.1-0ubuntu3.2
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 14.04
HibernationDevice: RESUME=
InstallationDate: Installed on 2014-06-03 (56 days ago)
InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Release amd64 (20140416.2)
MachineType: Dell Inc. PowerEdge R410
Package: linux-image-
PackageArchitec
PciMultimedia:
ProcEnviron:
TERM=xterm
PATH=(custom, no user)
LANG=C.UTF-8
SHELL=/bin/bash
ProcFB:
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware 1.127.5
RfKill: Error: [Errno 2] No such file or directory
Tags: trusty
Uname: Linux 3.13.0-32-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
_MarkForUpload: True
dmi.bios.date: 07/30/2013
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.12.0
dmi.board.name: 01V648
dmi.board.vendor: Dell Inc.
dmi.board.version: A03
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.
dmi.product.name: PowerEdge R410
dmi.sys.vendor: Dell Inc.
Changed in linux (Ubuntu Trusty): | |
assignee: | nobody → Chris J Arges (arges) |
status: | New → In Progress |
Changed in linux (Ubuntu Utopic): | |
status: | Confirmed → Fix Released |
Changed in linux (Ubuntu Trusty): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Utopic): | |
importance: | Undecided → Medium |
tags: | added: bug-exists-upstream |
Changed in linux (Ubuntu Utopic): | |
assignee: | nobody → Chris J Arges (arges) |
status: | Confirmed → In Progress |
description: | updated |
Changed in linux (Ubuntu Utopic): | |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Trusty): | |
status: | In Progress → Fix Committed |
Apologies, misfiled initially on biosdevname, I meant to file on the linux-image package source.