IPv6 fragmentation and MTU issue

Bug #1463911 reported by Gyula Halmos
This bug affects 5 people
Affects          Status         Importance   Assigned to    Milestone
linux (Ubuntu)   Confirmed      Medium       Jay Vosburgh
Trusty           Fix Released   Undecided    Unassigned
Vivid            Fix Released   Undecided    Unassigned

Bug Description

Fragmented IPv6 packets are REJECTED by ip6tables on compute nodes. The traffic is going through an intra-VM network and the packet loss is hurting the system.

There is a patch for this issue: http://patchwork.ozlabs.org/patch/434957/

Is there a bug report or an official release date for a fix for this issue?

This is pretty critical for my deployment.

Thanks in advance,

BR,

Gyula

Changed in nova:
status: New → Confirmed
Changed in neutron:
status: New → Confirmed
Revision history for this message
SecurityFun23 (securityfun23) wrote :

This issue is documented in more detail in the following old question: https://ask.openstack.org/en/question/43063/ipv6-fragmentationmtu-issue-on-icehouseubuntu-1404/

We have also seen this issue in our lab using Ubuntu 14.04 and RHEL 6. As far as we can tell, the proposed kernel patch has not been implemented in any of the current Linux kernel load lines (it's possible that a different patch than the one referenced in the bug report has been applied, but if that's the case the fix has not made it into the latest Ubuntu 14.04 or RHEL 6 kernels).

The underlying issue is that IPv6 fragmented packets are being re-assembled as part of the ip6tables inspection performed by the "neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver" driver. This inspection occurs on the linux bridge layer, and it appears that once the packets have been assembled they are too big to be sent out of the bridge to the next interface. A better behavior would be to re-fragment the IPv6 packet, or to store and then send the original fragments.
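To make the size mismatch concrete, here is a rough sketch of the arithmetic for an ICMPv6 echo request such as one sent by ping6. It assumes the fixed IPv6 header sizes, no extension headers other than the Fragment header, and an MTU passed in by the caller; the function names are made up for illustration.

```shell
# Sizes in bytes.
IPV6_HDR=40     # fixed IPv6 header
FRAG_HDR=8      # IPv6 Fragment extension header
ICMPV6_HDR=8    # ICMPv6 echo request header

# Total size of the reassembled echo request for a given "ping6 -s" payload.
reassembled_size() {
    echo $(( IPV6_HDR + ICMPV6_HDR + $1 ))
}

# Number of packets the sender emits for that payload at a given MTU.
fragment_count() {
    payload=$1
    mtu=$2
    data=$(( ICMPV6_HDR + payload ))
    # No fragmentation is needed if the whole datagram fits in one packet.
    if [ $(( IPV6_HDR + data )) -le "$mtu" ]; then
        echo 1
        return
    fi
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_frag=$(( (mtu - IPV6_HDR - FRAG_HDR) / 8 * 8 ))
    echo $(( (data + per_frag - 1) / per_frag ))
}
```

Under these assumptions, `fragment_count 4000 1500` gives 3 and `reassembled_size 4000` gives 4048, so the datagram that conntrack reassembles on the bridge is far larger than a 1500-byte MTU on the next interface.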

This issue does not impact TCP in IPv6, since IPv6 fragments packets only at the endpoints, not in the network, and TCP will never create IP fragments. However, UDP and ICMP are both impacted by this issue. This means that IPv6 is essentially broken when the standard "neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver" driver is used. If the NOOP driver is used, or if the "net.bridge.bridge-nf-call-ip6tables = 0" option is set in /etc/sysctl.conf to disable ip6tables on bridges, then IPv6 will operate properly. However, in that case Neutron Security Groups and default Neutron security rules will have no impact on IPv6 packets.
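For anyone wanting to check which way a node is configured, the following is a small sketch. The helper name `sysctl_conf_value` is hypothetical; it only reads a sysctl.conf-style file and does not look at the running kernel. The commented commands at the end are the usual runtime equivalents and need root plus the bridge netfilter code loaded.

```shell
# Print the last value assigned to a sysctl key in a sysctl.conf-style file,
# or "unset" if the key never appears. Comment lines (# or ;) are ignored.
sysctl_conf_value() {
    key="$1"
    conf_file="$2"
    awk -F= -v key="$key" '
        /^[ \t]*[#;]/ { next }
        {
            k = $1; gsub(/[ \t]/, "", k)
            if (k == key) { v = $2; gsub(/[ \t]/, "", v); found = v }
        }
        END { print (found == "" ? "unset" : found) }
    ' "$conf_file"
}

# Example:
#   sysctl_conf_value net.bridge.bridge-nf-call-ip6tables /etc/sysctl.conf
#
# Runtime check and change:
#   sysctl net.bridge.bridge-nf-call-ip6tables
#   sudo sysctl -w net.bridge.bridge-nf-call-ip6tables=0
```

Keep in mind that a value of 0 means bridged IPv6 traffic bypasses ip6tables entirely, so security group rules stop applying to it.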

Possible solutions are to get a fix for this put into the Linux Kernel, or to modify the "OVSHybridIptablesFirewallDriver" so that it does not trigger re-assembly (if this is even possible).

Revision history for this message
Gyula Halmos (gyula-halmos) wrote :

Hi there,

We are rebuilding our compute nodes' kernels with the patch to test the solution. Other than that, we are waiting for a solution, as this is a real showstopper for some of our customers: their security policies don't allow bypassing ip6tables.

BR,

Gyula

Revision history for this message
Kevin Benton (kevinbenton) wrote :

Thanks for the report. I've been looking at the netfilter docs and it doesn't look like we can stop the re-assembly and still have the first packet processed by conntrack. Do you know if this is possible?

If so, I can submit a patch to install a rule that would allow the subsequent fragments to go by as a temporary workaround. The downside would be that arbitrary fragments could get through.

Revision history for this message
Sean M. Collins (scollins) wrote :

I'd like to see the fix get merged into the linux kernel, inside netfilter. I don't think this is a Nova/Neutron specific thing that we can fix independently, since even the proposed fix has side-effects that can be undesirable.

Revision history for this message
SecurityFun23 (securityfun23) wrote :

I agree that the best solution would be to have this merged into the Linux kernel. Also, I am unaware of a method to prevent reassembly while still using conntrack. Probably the only way to prevent the re-assembly would be to disable conntrack, but that would break stateful firewalling, and that wouldn't be consistent with OpenStack security groups. So not only is the kernel update the "best" solution, I think it's probably the only complete / consistent one.

As far as kernel patches go, I did want to point out that the current behavior for the IPv4 iptables re-assembly / re-fragmentation is to create entirely new fragments. For Linux bridges, in both IPv4 and IPv6, I think that the more desirable behavior is to transmit the original fragments instead of re-fragmenting. The reason this behavior would be preferred is that re-fragmentation introduces the MTU of the bridge into the picture. If the MTU of the bridge is larger or smaller than the MTU of the VM or gateway, this can cause problems. Note that with OpenStack Juno the MTU of the bridges used by the OVSHybridIptablesFirewallDriver is a global value, so if you are using different MTUs for provider and internal networks, you can NOT use OVSHybridIptablesFirewallDriver for IPv4 either. In OpenStack Kilo, per-network MTU is a supported feature, so I assume this is not a problem for Kilo (we are using Juno, so I cannot confirm this).

However, I should mention that an advantage of re-fragmentation is that you are guaranteed that the packet passed by iptables really matches what iptables thinks it does. I know there are some security attacks that take advantage of the fact that IP fragment overlaps are handled differently by different operating systems. So iptables might interpret the packet as having one meaning and the end system might interpret it as something else. Doing re-fragmentation solves this issue. However, I would suggest that IP fragment overlaps are really bad behavior, and the packet should probably just be dropped rather than trying to fix this by re-fragmenting.

This MTU issue is somewhat secondary to the IPv6 fragmentation patch, but if someone is looking here to figure out what patch to submit, I would like them to have this extra information in case they want to improve the patch.

Revision history for this message
Sean M. Collins (scollins) wrote :

A fix to Netfilter was merged into Linux Kernel version 4.2:

https://github.com/torvalds/linux/commit/efb6de9b4ba0092b2c55f6a52d16294a8a698edd

So, in my mind this isn't really an OpenStack bug. It's a Linux bug that just so happens to be hurting OpenStack deployers that want to do IPv6 related things. I think this bug can be closed, since the real fix is "Wait until distros start deploying the fix" - which yes is a long time away - unless someone backports. Regardless, more of a kernel issue than Nova or OpenStack.

Revision history for this message
Kevin Benton (kevinbenton) wrote :

Setting the MTU on the tap, bridge and veth pair to something high seems to fix it in my rudimentary testing.

for int in $(ifconfig | grep 1234abcd | grep -v qbr | awk '{ print $1 }'); do sudo ifconfig $int mtu 9000; done

Where "1234abcd" is the first part of the VM port's UUID.
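A sketch of the same workaround using iproute2 instead of ifconfig, for systems where ifconfig is not installed. The helper name `port_interfaces` is hypothetical; it assumes the one-line-per-interface format printed by `ip -o link show` and keeps the original behavior of skipping the qbr bridge.

```shell
# Read "ip -o link show" output on stdin and print the names of interfaces
# whose name contains the given port UUID prefix, skipping qbr bridges.
port_interfaces() {
    prefix="$1"
    # Field 2 is the interface name; veth ends appear as "name@peer".
    awk -F': ' '{ split($2, name, "@"); print name[1] }' |
        grep -- "$prefix" | grep -v '^qbr'
}

# Usage (needs root; 9000 assumes the network carries jumbo frames):
#   ip -o link show | port_interfaces 1234abcd | while read -r dev; do
#       sudo ip link set dev "$dev" mtu 9000
#   done
```

Splitting the name lookup from the `ip link set` call makes it easy to print the list first and confirm that only the intended tap and veth devices are touched.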

Revision history for this message
SecurityFun23 (securityfun23) wrote :

Note that increasing the MTU is only possible if your network allows large Ethernet frames, which is often not the case. Also, it is only a partial fix since you can't normally increase the MTU much above 9000 (as the jumbo frame limit is usually near there).

tags: added: kernel-key
Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1463911

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Incomplete → Confirmed
tags: added: bot-stop-nagging
Dave Chiluk (chiluk)
tags: added: sts
Revision history for this message
Dave Chiluk (chiluk) wrote :

I have confirmed that this exists with the latest trusty kernels. I have also attempted a cherry-pick+requisite patches, and that quickly ballooned into ridiculousness. I'm going to need to do a back-port/re-implementation of this.

Changed in linux (Ubuntu):
assignee: nobody → Dave Chiluk (chiluk)
Revision history for this message
Jay Vosburgh (jvosburgh) wrote :

I have done a backport of

commit efb6de9b4ba0092b2c55f6a52d16294a8a698edd
Author: Bernhard Thaler <email address hidden>
Date: Sat May 30 15:30:16 2015 +0200

    netfilter: bridge: forward IPv6 fragmented packets

to the trusty 3.13 kernel. This necessitated pulling in some bits from other patches as well. I am currently testing for regressions and will submit it for SRU if all goes well.

Dave Chiluk (chiluk)
Changed in linux (Ubuntu):
assignee: Dave Chiluk (chiluk) → Jay Vosburgh (jvosburgh)
Revision history for this message
SecurityFun23 (securityfun23) wrote :

Jay Vosburgh,

Have you finished your testing of the patch for Ubuntu Trusty? I would be interested in getting hold of a "beta" version of that patch even if it isn't an official part of Ubuntu.

Thanks!

Revision history for this message
Gyula Halmos (gyula-halmos) wrote :

Would be interested as well.

Thanks!

Revision history for this message
Jay Vosburgh (jvosburgh) wrote :

The original patch had an error in it; I believe I've found it, and once I verify that and clean it up a bit I'll attach it to the bug.

Revision history for this message
SecurityFun23 (securityfun23) wrote :

Jay Vosburgh,

Just wondering if you found some time to clean up that patch.

Thanks!

tags: added: kernel-da-key
removed: kernel-key
Revision history for this message
Jay Vosburgh (jvosburgh) wrote :

SRU Justification:

Impact:

 This bug causes issues when ip6tables modules are loaded with IPv6
fragmented packets traversing a bridge. The extant conntrack processing
will reassemble the IPv6 fragments for netfilter processing, but is
incapable of re-fragmenting these datagrams for subsequent forwarding.
This causes the fragmented IPv6 datagrams to be dropped.

Fix:

 This is resolved by backporting functionality from mainline that
re-fragments the IPv6 datagrams upon bridge egress.

Testcase:

 The patch commit log includes a test case; to summarize:

 A bridge is configured with two ports and interfaces are attached
to these ports. A traffic source beyond one port generates fragmented
IPv6 datagrams, e.g., ping6 -s 2000, destined for a host beyond the
bridge.

 With ip6tables modules unloaded, the IPv6 fragments will traverse
the bridge. Loading ip6tables, e.g., "ip6tables -t nat -L", will cause
IPv6 fragmented datagrams to be dropped on the unpatched kernel.

 These datagrams are correctly forwarded with the patch applied.

Revision history for this message
Jay Vosburgh (jvosburgh) wrote :
Revision history for this message
Jay Vosburgh (jvosburgh) wrote :
Revision history for this message
Jay Vosburgh (jvosburgh) wrote :
Revision history for this message
Jay Vosburgh (jvosburgh) wrote :

Test methodology performed on 3.19 kernel with patch applied:

Host A: fd01:2222::1/64 direct connect to host C

ip addr add fd01:2222::1/64 dev eth0

Host B: fd01:2222::2/64 direct connect to host C

ip addr add fd01:2222::2/64 dev eth0

host C: direct connect interfaces for Hosts A & B bridged together:

brctl addbr testbr0
brctl addif testbr0 eth1
brctl addif testbr0 eth5
ip link set dev eth1 up
ip link set dev eth5 up
ip link set dev testbr0 up
ip addr add fd01:2222::99/64 dev testbr0

host A:

continuous ping6 to host B's address beyond the bridge, using a size large
enough to generate fragmented IPv6 datagrams at an MTU setting of 1500:

ping6 -s 4000 fd01:2222::2

host C:

load ip6tables_nat:

ip6tables -t nat -L -n

Observe on host A that ping continues uninterrupted

Inspect eth1 and eth5 interfaces on host C with tcpdump to confirm traffic passes
through the bridge
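To turn "ping continues uninterrupted" into something scriptable, a small sketch follows. The helper name `ping_loss_percent` is made up, and it assumes the usual iputils summary line ("N packets transmitted, M received, X% packet loss").

```shell
# Read ping/ping6 output on stdin and print the packet-loss percentage
# taken from the summary line.
ping_loss_percent() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Usage on host A, before and after loading ip6tables on host C:
#   ping6 -c 20 -s 4000 fd01:2222::2 | ping_loss_percent
```

On a kernel that still has the problem, the second run would be expected to report loss once the ip6tables modules are loaded on host C; on a patched kernel both runs should report none.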

Revision history for this message
Jay Vosburgh (jvosburgh) wrote :

Testing equivalent to comment #20 was also performed on the 3.13 and 3.16 kernels. Additionally, a customer separately validated the 3.13 and 3.16 patches in their environment.

Brad Figg (brad-figg)
Changed in linux (Ubuntu Trusty):
status: New → Fix Committed
Brad Figg (brad-figg)
Changed in linux (Ubuntu Vivid):
status: New → Fix Committed
Revision history for this message
BALAJI SRINIVASAN (balaji-vasan) wrote :

I understand a patch is going to come for Ubuntu.

We are using the latest CentOS release.

[root@sienna net]# uname -r
3.10.0-327.10.1.el7.x86_64
[root@sienna net]# cat /etc/centos-release
CentOS Linux release 7.2.1511 (Core)

Is there any information on a patch for the CentOS distro?

Is there a patch for Neutron as well?

Revision history for this message
Jay Vosburgh (jvosburgh) wrote :

Yes, the patch has been committed for the next Ubuntu kernel releases.

I have no information on a CentOS patch; you would need to file a bug against CentOS or RHEL.

No patch to Neutron is required.

Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
tags: added: verification-needed-vivid
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-vivid' to 'verification-done-vivid'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Revision history for this message
Travis Parchman (travis-parchman) wrote :

So, is this problem already resolved in Wily or is Xenial the first formal release that will not exhibit the problem?

Revision history for this message
Jay Vosburgh (jvosburgh) wrote :

The Wily kernel (4.2) already contains the fixes for this bug.
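A rough way to check a given host against the versions mentioned in this report is sketched below. The function names are hypothetical and the result is only meaningful for Ubuntu-style `uname -r` strings. The thresholds are the package versions Launchpad reports as fixed (3.13.0-83.127 for Trusty, 3.19.0-56.62 for Vivid) plus mainline 4.2; an earlier build in the same series may already carry the backport, and other distributions are reported as unknown.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on "sort -V" from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Classify a "uname -r" string such as "3.13.0-83-generic".
# Prints one of: fixed, predates-fix-release, unknown.
ipv6_frag_fix_status() {
    release="$1"
    base="${release%%-*}"      # upstream version, e.g. 3.13.0
    rest="${release#*-}"
    abi="${rest%%-*}"          # Ubuntu ABI number, e.g. 83

    case "$base" in
        3.13.*) min_abi=83 ;;  # Trusty: 3.13.0-83.127
        3.19.*) min_abi=56 ;;  # Vivid: 3.19.0-56.62
        *)
            if version_ge "$base" 4.2.0; then echo fixed; else echo unknown; fi
            return
            ;;
    esac

    # Anything without a plain numeric ABI is not an Ubuntu-style release.
    case "$abi" in
        ''|*[!0-9]*) echo unknown; return ;;
    esac
    if [ "$abi" -ge "$min_abi" ]; then
        echo fixed
    else
        echo predates-fix-release
    fi
}

# Usage:
#   ipv6_frag_fix_status "$(uname -r)"
```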

Revision history for this message
Travis Parchman (travis-parchman) wrote :

Awesome. Thank you most kindly for the info.

Jay Vosburgh (jvosburgh)
tags: added: verification-done-trusty verification-done-vivid
removed: verification-needed-trusty verification-needed-vivid
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.19.0-56.62

---------------
linux (3.19.0-56.62) vivid; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1555832

  [ Florian Westphal ]

  * SAUCE: [nf,v2] netfilter: x_tables: don't rely on well-behaving
    userspace
    - LP: #1555338

linux (3.19.0-55.61) vivid; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1554708

  [ Upstream Kernel Changes ]

  * Revert "drm/radeon: call hpd_irq_event on resume"
    - LP: #1554608

linux (3.19.0-54.60) vivid; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1552337

  [ Upstream Kernel Changes ]

  * Revert "firmware: dmi_scan: Fix UUID endianness for SMBIOS >= 2.6"
    - LP: #1551419

linux (3.19.0-53.59) vivid; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1550576

  [ Kamal Mostafa ]

  * Merged back 3.19.0-52.58

linux (3.19.0-52.58) vivid; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1548548

  [ Dan Streetman ]

  * SAUCE: nbd: ratelimit error msgs after socket close
    - LP: #1505564

  [ Upstream Kernel Changes ]

  * Revert "ACPI / LPSS: allow to use specific PM domain during ->probe()"
    - LP: #1542457
  * Revert "workqueue: make sure delayed work run in local cpu"
    - LP: #1546320
  * net: ipmr: fix static mfc/dev leaks on table destruction
    - LP: #1542457
  * drm/nouveau/nv46: Change mc subdev oclass from nv44 to nv4c
    - LP: #1542457
  * ovl: allow zero size xattr
    - LP: #1542457
  * ovl: use a minimal buffer in ovl_copy_xattr
    - LP: #1542457
  * [media] vb2: fix a regression in poll() behavior for output,streams
    - LP: #1542457
  * [media] gspca: ov534/topro: prevent a division by 0
    - LP: #1542457
  * [media] media: dvb-core: Don't force CAN_INVERSION_AUTO in oneshot mode
    - LP: #1542457
  * tools lib traceevent: Fix output of %llu for 64 bit values read on 32
    bit machines
    - LP: #1542457
  * KVM: x86: expose MSR_TSC_AUX to userspace
    - LP: #1542457
  * KVM: x86: correctly print #AC in traces
    - LP: #1542457
  * drm/radeon: call hpd_irq_event on resume
    - LP: #1542457
  * xhci: refuse loading if nousb is used
    - LP: #1542457
  * arm64: Clear out any singlestep state on a ptrace detach operation
    - LP: #1542457
  * time: Avoid signed overflow in timekeeping_get_ns()
    - LP: #1542457
  * ovl: root: copy attr
    - LP: #1542457
  * Bluetooth: Add support of Toshiba Broadcom based devices
    - LP: #1522949, #1542457
  * rtlwifi: fix memory leak for USB device
    - LP: #1542457
  * wlcore/wl12xx: spi: fix oops on firmware load
    - LP: #1542457
  * ovl: check dentry positiveness in ovl_cleanup_whiteouts()
    - LP: #1542457
  * EDAC, mc_sysfs: Fix freeing bus' name
    - LP: #1542457
  * EDAC: Robustify workqueues destruction
    - LP: #1542457
  * arm64: mm: ensure that the zero page is visible to the page table
    walker
    - LP: #1542457
  * powerpc: Make value-returning atomics fully ordered
    - LP: #1542457
  * powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered
    - LP: #1542457
  * dm space map metadata: remove unused variable in brb_pop()
    - LP: #1542457
  * dm thi...

Changed in linux (Ubuntu Vivid):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-83.127

---------------
linux (3.13.0-83.127) trusty; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1555839

  [ Florian Westphal ]

  * SAUCE: [nf,v2] netfilter: x_tables: don't rely on well-behaving
    userspace
    - LP: #1555338

linux (3.13.0-82.126) trusty; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1554732

  [ Upstream Kernel Changes ]

  * Revert "drm/radeon: call hpd_irq_event on resume"
    - LP: #1554608
  * net: generic dev_disable_lro() stacked device handling
    - LP: #1547680

linux (3.13.0-81.125) trusty; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1552316

  [ Upstream Kernel Changes ]

  * Revert "firmware: dmi_scan: Fix UUID endianness for SMBIOS >= 2.6"
    - LP: #1551419
  * bcache: Fix a lockdep splat in an error path
    - LP: #1551327

linux (3.13.0-80.124) trusty; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1548519

  [ Andy Whitcroft ]

  * [Debian] hv: hv_set_ifconfig -- convert to python3
    - LP: #1506521
  * [Debian] hv: hv_set_ifconfig -- switch to approved indentation
    - LP: #1540586
  * [Debian] hv: hv_set_ifconfig -- fix numerous parameter handling issues
    - LP: #1540586

  [ Dan Streetman ]

  * SAUCE: nbd: ratelimit error msgs after socket close
    - LP: #1505564

  [ Upstream Kernel Changes ]

  * Revert "workqueue: make sure delayed work run in local cpu"
    - LP: #1546320
  * [media] gspca: ov534/topro: prevent a division by 0
    - LP: #1542497
  * [media] media: dvb-core: Don't force CAN_INVERSION_AUTO in oneshot mode
    - LP: #1542497
  * tools lib traceevent: Fix output of %llu for 64 bit values read on 32
    bit machines
    - LP: #1542497
  * KVM: x86: correctly print #AC in traces
    - LP: #1542497
  * drm/radeon: call hpd_irq_event on resume
    - LP: #1542497
  * xhci: refuse loading if nousb is used
    - LP: #1542497
  * arm64: Clear out any singlestep state on a ptrace detach operation
    - LP: #1542497
  * time: Avoid signed overflow in timekeeping_get_ns()
    - LP: #1542497
  * rtlwifi: fix memory leak for USB device
    - LP: #1542497
  * wlcore/wl12xx: spi: fix oops on firmware load
    - LP: #1542497
  * EDAC, mc_sysfs: Fix freeing bus' name
    - LP: #1542497
  * EDAC: Don't try to cancel workqueue when it's never setup
    - LP: #1542497
  * EDAC: Robustify workqueues destruction
    - LP: #1542497
  * powerpc: Make value-returning atomics fully ordered
    - LP: #1542497
  * powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered
    - LP: #1542497
  * dm space map metadata: remove unused variable in brb_pop()
    - LP: #1542497
  * dm thin: fix race condition when destroying thin pool workqueue
    - LP: #1542497
  * futex: Drop refcount if requeue_pi() acquired the rtmutex
    - LP: #1542497
  * drm/radeon: clean up fujitsu quirks
    - LP: #1542497
  * mmc: sdio: Fix invalid vdd in voltage switch power cycle
    - LP: #1542497
  * mmc: sdhci: Fix sdhci_runtime_pm_bus_on/off()
    - LP: #1542497
  * udf: limit the maximum number of indirect extents in a row
    - LP: #1542497
  * nfs: Fix race in __update_open_stateid...

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
no longer affects: nova
Changed in neutron:
importance: Undecided → Medium
no longer affects: neutron