[ovn] metadata route missing on the guest
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | New | Undecided | Unassigned |
neutron (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
* High level description
The metadata server (169.254.169.254) is unreachable from VMs attached to one particular network; all other networks in the cluster are unaffected. DHCP is enabled on that subnet and VMs get their IP addresses on boot, but the route to the metadata IP is missing:
$ ip r
default via 10.134.253.1 dev eth0
10.134.253.0/24 dev eth0 scope link src 10.134.253.181
Because of that, cloud-init metadata requests are sent to the router rather than to the ovnmeta netns.
On guests in an unaffected network, the routing table after booting or sending a DHCP request looks like this, and the metadata endpoint is reachable:
$ ip r
default via 172.16.2.1 dev eth0
169.254.169.254 via 172.16.2.10 dev eth0
172.16.2.0/24 dev eth0 scope link src 172.16.2.248
I managed to work around this by manually adding a route to the metadata IP via the DHCP port on the router attached to that network. However, I believe this should not be needed, and no such configuration is present on any of the "good" networks in this cluster.
Please let me know what logs and other information would be useful here.
* Step-by-step reproduction steps
1) Create a VM attached to the affected network.
2) The metadata server is unreachable and cloud-init fails, because the DHCP server does not provide the metadata route.
* Expected output
I'd expect the metadata route to be present on the guest:
$ ip r
default via 10.134.253.1 dev eth0
169.254.169.254 via 10.134.253.2 dev eth0
10.134.253.0/24 dev eth0 scope link src 10.134.253.181
* Actual output:
$ ip r
default via 10.134.253.1 dev eth0
10.134.253.0/24 dev eth0 scope link src 10.134.253.181
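The difference between the expected and actual tables can also be checked mechanically. The following is a minimal Python sketch, not part of the original report: the function name `next_hop_for` is hypothetical, and it assumes the simple `ip r` line format shown above (destination first, optional `via <next hop>`). It picks the most specific matching route, which is how the kernel decides where metadata traffic goes.

```python
import ipaddress

METADATA_IP = ipaddress.ip_address("169.254.169.254")

# Routing tables copied from the outputs in this report.
AFFECTED = """\
default via 10.134.253.1 dev eth0
10.134.253.0/24 dev eth0 scope link src 10.134.253.181
"""

UNAFFECTED = """\
default via 172.16.2.1 dev eth0
169.254.169.254 via 172.16.2.10 dev eth0
172.16.2.0/24 dev eth0 scope link src 172.16.2.248
"""


def next_hop_for(ip_route_output, target=METADATA_IP):
    """Return the next hop for `target` using longest-prefix match.

    Returns None if the best route is on-link (no `via`) or if no
    route matches at all. Hypothetical helper for illustration only.
    """
    best = None  # (prefix length, next hop) of the best match so far
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        dest = "0.0.0.0/0" if fields[0] == "default" else fields[0]
        try:
            network = ipaddress.ip_network(dest, strict=False)
        except ValueError:
            continue  # not a route line (e.g. a shell prompt)
        if target not in network:
            continue
        hop = fields[fields.index("via") + 1] if "via" in fields else None
        if best is None or network.prefixlen > best[0]:
            best = (network.prefixlen, hop)
    return best[1] if best else None


if __name__ == "__main__":
    print("affected guest sends metadata traffic via:", next_hop_for(AFFECTED))
    print("unaffected guest sends metadata traffic via:", next_hop_for(UNAFFECTED))
```

On the affected guest the only match is the default route, so the request goes to the router; on the unaffected guest the host route wins and the request goes to the address advertised for metadata.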
* Versions
neutron-common 2:16.4.1-0ubuntu2
neutron-
python3-neutron 2:16.4.1-0ubuntu2
python3-neutron-lib 2.3.0-0ubuntu1
python3-
ovn-common 20.03.2-
ovn-host 20.03.2-
openvswitch-common 2.13.3-
openvswitch-switch 2.13.3-
python3-openvswitch 2.13.3-
python3-ovsdbapp 1.1.0-0ubuntu2
Host OS: Ubuntu 20.04.3 LTS
Kernel: 5.8.0-48-generic #54~20.04.1-Ubuntu
Deployment: Juju charms
Guest OS: cirros 0.5.2 and Ubuntu 20.04, so most likely all distros are affected
* Environment
42 compute nodes, nova-compute 21.2.2-0ubuntu1 + libvirt 6.0.0-0ubuntu8.14 + KVM.
Deployed with Juju charms.
* Perceived severity
Not a blocker since there is a workaround.
Hi Przemysław Lal,
What do you mean by "affected" network? Do you mean there are multiple networks in your setup, and only one of them is misbehaving in terms of routes?
If so, has the "affected" network been misbehaving since it was created, or did it work earlier and stop working later? What other differences are there between the affected and unaffected networks/subnets?
The following information would be good to collect for the affected network, after dropping the workarounds:
- openstack network show <network id>
- openstack subnet show <subnet id>
- openstack port list --device-owner network:distributed --network <affected network>
- ovn-nbctl find DHCP_Options external_ids:subnet_id=<subnet-id>
Please also collect the same info from an unaffected network.
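When comparing the `ovn-nbctl find DHCP_Options` results for the two networks, the field of interest is most likely the `classless_static_route` option (DHCP option 121), since that is how a host route such as the one for 169.254.169.254 reaches the guest. Below is a minimal Python sketch for checking that value. It assumes the `{destination/len,next_hop, ...}` format used in the OVN northbound documentation; the helper names and the sample strings are hypothetical, built from the addresses in this report rather than taken from the cluster.

```python
def parse_classless_static_route(value):
    """Parse '{dest/len,hop, dest/len,hop}' into (destination, next_hop) pairs.

    Assumes the brace-wrapped, comma-separated format documented for the
    OVN DHCP_Options `classless_static_route` option.
    """
    items = [item.strip() for item in value.strip().strip("{}").split(",")]
    items = [item for item in items if item]
    if len(items) % 2:
        raise ValueError("expected destination/next-hop pairs: %r" % value)
    return list(zip(items[0::2], items[1::2]))


def has_metadata_route(value, metadata_cidr="169.254.169.254/32"):
    """Return True if the option value carries a route for the metadata IP."""
    return any(dest == metadata_cidr
               for dest, _hop in parse_classless_static_route(value))


if __name__ == "__main__":
    # Hypothetical values for illustration; not real output from the cluster.
    with_route = "{169.254.169.254/32,10.134.253.2, 0.0.0.0/0,10.134.253.1}"
    without_route = "{0.0.0.0/0,10.134.253.1}"
    print(has_metadata_route(with_route))
    print(has_metadata_route(without_route))
```

If the affected subnet's DHCP_Options row lacks the metadata entry while the unaffected one has it, that would point at how the row was generated rather than at the guest's DHCP client.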