ICMPv6 Neighbor Advertisement packets from VM's link-local address dropped by OVS
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Committed | Medium | Brian Haley |
Bug Description
When a VM transmits an ICMPv6 Neighbor Advertisement (NA) packet from its link-local (fe80::/64) address, the NA packet is dropped by OVS and is not forwarded to the external provider network. This causes connectivity issues, as the external router is unable to resolve the link-layer MAC address for the VM's link-local IPv6 address. NA packets from the VM's global IPv6 address are forwarded correctly.
Adding security group rule such as "Egress,
We are running OpenStack Antelope, neutron 22.0.2 and OVN 23.03. Platform is AlmaLinux 9.2, RDO packages.
We believe, but are not 100% sure, that this problem may have started after upgrading from OVN 22.12. Reverting the upgrade to confirm is unfortunately a complicated task, so we would like to avoid that if possible.
Tcpdump can be used to confirm that the packets vanish inside OVS. First, on the tap interface connected to the VM. We can here see the external router (fe80::
$ sudo tcpdump -i tapb7c872a4-a5 host fe80::669d:
08:41:24.201970 IP6 fe80::669d:
08:41:24.202004 IP6 fe80::18:
08:41:25.366752 IP6 fe80::669d:
08:41:25.366775 IP6 fe80::18:
08:41:26.374637 IP6 fe80::669d:
08:41:26.374693 IP6 fe80::18:
However, when capturing the same traffic on the external interface (bond0), filtered on the provider VLAN tag the network is using, the NA packets are no longer there:
$ sudo tcpdump -i bond0 vlan 882 and host fe80::669d:
08:41:24.201964 IP6 fe80::669d:
08:41:25.366747 IP6 fe80::669d:
08:41:26.374625 IP6 fe80::669d:
This explains why there are so many NS packets - the router keeps retrying forever.
Compare this with NA packets from the VM's global address, which work as expected:
$ sudo tcpdump -ni tapb7c872a4-a5 ether host 64:9d:99:3a:3d:58 and icmp6 and net not fe80::/10
08:56:03.015378 IP6 2a02:c0:
08:56:03.015408 IP6 2a02:c0:
$ sudo tcpdump -ni bond0 vlan 882 and ether host 64:9d:99:3a:3d:58 and icmp6 and net not fe80::/10
08:56:03.015292 IP6 2a02:c0:
08:56:03.015539 IP6 2a02:c0:
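The `net not fe80::/10` filter used above separates link-local sources from global ones. When post-processing a capture, the same split can be sketched in Python with the standard `ipaddress` module. The helper name and the sample addresses (taken from the 2001:db8::/32 documentation prefix) are illustrative only and do not come from this deployment.

```python
import ipaddress

# Same prefix as the tcpdump filter "net fe80::/10"
LINK_LOCAL = ipaddress.ip_network("fe80::/10")


def is_link_local_source(addr):
    """Return True if addr (a string) falls inside fe80::/10."""
    return ipaddress.ip_address(addr) in LINK_LOCAL


# Illustrative addresses, not taken from the captures above
for sample in ("fe80::1", "2001:db8::1"):
    kind = "link-local" if is_link_local_source(sample) else "not link-local"
    print(sample, kind)
```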
We can further confirm it by finding an explicit drop rule within OVS:
$ sudo ovs-appctl dpif/dump-flows br-int | grep drop
recirc_
We see that there are a ton of built-in default rules pertaining to NA packets:
$ sudo ovs-ofctl dump-flows br-int | grep -c icmp_type=136
178
This is not unexpected, as the ICMPv6 ND messages (NS/NA/RS/RA/etc.) are essential parts of the IPv6 protocol (like ARP in IPv4) and should not be dropped even if the VM is using a "block everything" security group. Our assumption is that the logic in these rules is flawed somehow, so that they inadvertently end up blocking the NA packets from the VM's link-local address.
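To narrow down which of those NA flows might be involved, it may help to break the count down per OpenFlow table rather than looking at a single total. The following is a minimal Python sketch, assuming the text output of `ovs-ofctl dump-flows br-int` has been captured into a string. The function name is hypothetical and the sample flow lines are made up to resemble the dump format; they are not real flows from this system.

```python
import re
from collections import Counter

TABLE_RE = re.compile(r"\btable=(\d+)")
NA_RE = re.compile(r"\bicmp_type=136\b")  # 136 = Neighbor Advertisement


def count_na_flows_by_table(dump_text):
    """Count flows that match icmp_type=136, grouped by table number."""
    counts = Counter()
    for line in dump_text.splitlines():
        if not NA_RE.search(line):
            continue
        table = TABLE_RE.search(line)
        if table:
            counts[int(table.group(1))] += 1
    return counts


# Made-up sample lines in the general shape of "ovs-ofctl dump-flows" output
SAMPLE = """\
cookie=0x1, duration=1.0s, table=10, n_packets=0, n_bytes=0, priority=90,icmp6,metadata=0x1,icmp_type=136,icmp_code=0 actions=resubmit(,11)
cookie=0x2, duration=1.0s, table=10, n_packets=0, n_bytes=0, priority=80,icmp6,metadata=0x1,icmp_type=136,icmp_code=0 actions=drop
cookie=0x3, duration=1.0s, table=44, n_packets=0, n_bytes=0, priority=65532,icmp6,metadata=0x1,icmp_type=136,icmp_code=0 actions=resubmit(,45)
cookie=0x4, duration=1.0s, table=44, n_packets=0, n_bytes=0, priority=65532,icmp6,metadata=0x1,icmp_type=135,icmp_code=0 actions=resubmit(,45)
"""

for table, n in sorted(count_na_flows_by_table(SAMPLE).items()):
    print(f"table {table}: {n} NA flow(s)")
```

On a real hypervisor the input would come from the `ovs-ofctl dump-flows br-int` command shown above instead of `SAMPLE`.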
We have been unable to reproduce the problem using ofproto/trace, probably because it does not allow setting the icmp_type attribute for some reason. If we add ",icmp_type=136" to the command line below, it fails with "prerequisites not met for setting icmp_type". We have no idea what that missing prerequisite could possibly be - any suggestions would be greatly appreciated.
$ sudo ovs-appctl ofproto/trace br-int in_port=
Flow: icmp6,in_
bridge("br-int")
----------------
0. in_port=161, priority 100, cookie 0x2f9439aa
set_
set_
set_
set_
set_
resubmit(,8)
8. metadata=0x9, priority 50, cookie 0x59f248ee
set_
resubmit(,73)
73. ipv6,reg14=
74. No match.
drop
move:
-> NXM_NX_XXREG0[111] is now 0
resubmit(,9)
9. metadata=0x9, priority 0, cookie 0xcc8526d3
resubmit(,10)
10. metadata=0x9, priority 0, cookie 0xc47fdc5d
resubmit(,11)
11. metadata=0x9, priority 0, cookie 0xddf6f6b9
resubmit(,12)
12. ipv6,metadata=0x9, priority 100, cookie 0x26ff06cc
set_
resubmit(,13)
13. metadata=0x9, priority 0, cookie 0xda44fc0c
resubmit(,14)
14. ipv6,reg0=
ct(
drop
-> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 15.
-> Sets the packet to an untracked state, and clears all the conntrack fields.
Final flow: icmp6,reg0=
Megaflow: recirc_
Datapath actions: ct(zone=
=======
recirc(0x23b8) - resume conntrack with default ct_state=trk|new (use --ct-next to customize)
=======
Flow: recirc_
bridge("br-int")
----------------
thaw
Resuming from table 15
15. ct_state=
set_
set_
resubmit(,16)
16. ipv6,reg0=
set_
resubmit(,17)
17. metadata=0x9, priority 0, cookie 0x77e302aa
resubmit(,18)
18. metadata=0x9, priority 0, cookie 0x97ee4db3
resubmit(,19)
19. metadata=0x9, priority 0, cookie 0x6b46ef3d
resubmit(,20)
20. metadata=0x9, priority 0, cookie 0x238074d5
resubmit(,21)
21. metadata=0x9, priority 0, cookie 0x4b2f00cb
resubmit(,22)
22. metadata=0x9, priority 0, cookie 0x1de1893e
resubmit(,23)
23. metadata=0x9, priority 0, cookie 0x1b7c54a9
resubmit(,24)
24. metadata=0x9, priority 0, cookie 0x91b808bf
resubmit(,25)
25. metadata=0x9, priority 0, cookie 0x827a7c62
resubmit(,26)
26. ipv6,reg0=
ct(
nat(src)
set_
-> Sets the packet to an untracked state, and clears all the conntrack fields.
resubmit(,27)
27. metadata=0x9, priority 0, cookie 0xe9561f7f
resubmit(,28)
28. metadata=0x9, priority 0, cookie 0x426dc5bb
resubmit(,29)
29. metadata=0x9, priority 0, cookie 0xeab289c
resubmit(,30)
30. metadata=0x9, priority 0, cookie 0x620602c5
resubmit(,31)
31. metadata=0x9, priority 0, cookie 0x5504e379
resubmit(,32)
32. metadata=0x9, priority 0, cookie 0x5e1c22f5
resubmit(,33)
33. metadata=0x9, priority 0, cookie 0x8233a381
set_
resubmit(,71)
71. No match.
drop
resubmit(,34)
34. reg15=0,
set_
resubmit(,37)
37. priority 0
resubmit(,39)
39. priority 0
resubmit(,40)
40. reg15=0x8001,
set_
set_
resubmit(,41)
41. priority 0
42. ipv6,reg15=
43. ipv6,reg15=
44. metadata=0x9, priority 0, cookie 0xcbd84a69
45. ct_state=
46. metadata=0x9, priority 0, cookie 0x9ae00a32
47. metadata=0x9, priority 0, cookie 0x98ca16da
48. metadata=0x9, priority 0, cookie 0x7eb5b6c5
49. metadata=0x9, priority 0, cookie 0x149995b7
50. metadata=0x9, priority 0, cookie 0x9158534f
75. No match.
-> NXM_NX_XXREG0[111] is now 0
51. metadata=0x9, priority 0, cookie 0xb046f48c
64. priority 0
65. reg15=0x1,
0. priority 0
set_
Final flow: recirc_
Megaflow: recirc_
Datapath actions: ct(commit,
tags: added: ipv6 ovn
Changed in neutron: importance: Undecided → Medium
Changed in neutron: assignee: nobody → Brian Haley (brian-haley)
So I wasn't able to reproduce this yet locally, although I've only been able to test on a private network which I assumed would show the same issue. But I figured I'd add a note with at least what I saw.
First, the OVS firewall will add a flow for the link-local address for IPv6 NA traffic. It was added in https://review.opendev.org/c/openstack/neutron/+/783743, which shows it was backported to basically all releases.
I first tested with ML2/OVS and the OVS firewall driver, using devstack on the master branch (Bobcat). After booting a VM I saw the following flows for NA:
$ sudo ovs-appctl dpif/dump-flows br-int | grep 136 | grep fe80
recirc_id(0),in_port(9),ct_state(-trk),eth(src=fa:16:3e:aa:fc:06,dst=fa:16:3e:ae:2e:fe),eth_type(0x86dd),ipv6(src=fe80::f816:3eff:feaa:fc06,proto=58,frag=no),key32(00 00/00 00),icmpv6(type=136), packets:0, bytes:0, used:never, actions:6
recirc_id(0),in_port(9),ct_state(-trk),eth(src=fa:16:3e:aa:fc:06,dst=33:33:00:00:00:01),eth_type(0x86dd),ipv6(src=fe80::f816:3eff:feaa:fc06,proto=58,frag=no),key32(00 00/00 00),icmpv6(type=136), packets:2, bytes:172, used:9.348s, actions:push_vlan(vid=1,pcp=0),1,pop_vlan,4,5,6
So no actions=drop rules. Pings from the qrouter namespace to the LL address worked fine.
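The link-local source address in these flows looks like the modified EUI-64 address derived from the port's MAC (fa:16:3e:aa:fc:06 gives fe80::f816:3eff:feaa:fc06). When working out which fe80:: address to grep for on a given port, that derivation can be sketched in Python as below. The function name is hypothetical, and the sketch assumes the guest really uses EUI-64 interface identifiers rather than privacy or stable-opaque ones.

```python
import ipaddress


def mac_to_link_local(mac):
    """Derive the modified EUI-64 link-local IPv6 address for a MAC."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets in MAC address, got {mac!r}")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe in the middle of the MAC to form the 64-bit interface ID
    interface_id = octets[:3] + [0xFF, 0xFE] + octets[3:]
    # fe80::/64 prefix followed by the interface ID
    packed = bytes([0xFE, 0x80, 0, 0, 0, 0, 0, 0] + interface_id)
    return ipaddress.IPv6Address(packed)


# MAC taken from the flows above
print(mac_to_link_local("fa:16:3e:aa:fc:06"))
```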
Then I tried with ML2/OVN, same setup. After booting a VM I saw the following flows for NA:
$ sudo ovs-ofctl dump-flows br-int | grep 136 | grep 35d9
cookie=0x2467f8c7, duration=248.112s, table=9, n_packets=9, n_bytes=774, idle_age=136, priority=90,ipv6,reg14=0x4,metadata=0x1,dl_src=fa:16:3e:6a:35:d9,ipv6_src=fe80::f816:3eff:fe6a:35d9 actions=resubmit(,10)
cookie=0x0, duration=248.110s, table=10, n_packets=0, n_bytes=0, idle_age=248, priority=90,icmp6,reg14=0x4,metadata=0x1,dl_src=fa:16:3e:6a:35:d9,nw_ttl=255,icmp_type=136,icmp_code=0,nd_target=fe80::f816:3eff:fe6a:35d9 actions=conjunction(1093703813,1/2)
cookie=0x0, duration=248.110s, table=10, n_packets=0, n_bytes=0, idle_age=248, priority=90,icmp6,reg14=0x4,metadata=0x1,dl_src=fa:16:3e:6a:35:d9,nw_ttl=255,icmp_type=136,icmp_code=0,nd_target=fd2c:af9f:6196:0:f816:3eff:fe6a:35d9 actions=conjunction(1093703813,1/2)
cookie=0x9e65c598, duration=248.113s, table=48, n_packets=2, n_bytes=204, idle_age=136, priority=90,ipv6,reg15=0x4,metadata=0x1,dl_dst=fa:16:3e:6a:35:d9,ipv6_dst=fe80::f816:3eff:fe6a:35d9 actions=resubmit(,49)
Again, no actions=drop rules, and pings worked fine.
Things could be different on a provider network. When I booted a similar VM using a shared network it had other issues; I just don't have time at the moment to debug that.
This was all running with OVN 22.03.2 from the Ubuntu cloud archives.