[segments] dnsmasq can't delete lease for instance due to mismatch between client ip and local addr
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
neutron | Confirmed | Medium | Unassigned | | |
Bug Description
Issue:
The Neutron DHCP agent bootstraps the DHCP leases file for a network using all associated subnets[1]. In a multi-segment environment, however, a DHCP agent can only service a single segment/subnet of a given network.
The DHCP namespace, then, is configured with an interface containing a single IP address for the respective segment/subnet it's servicing. When a VM from the same network but different segment/subnet is deleted, the DHCP release packet that would be issued by dhcp_release isn't sent due to a mismatch between client IP and local addr.
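To make the mismatch concrete, here is a rough Python model of the same-subnet check. It assumes, as described above, that dhcp_release only sends the release when some address on the namespace interface shares a subnet with the client IP. The function name is hypothetical, and this is not the dhcp_release.c source.

```python
import ipaddress


def has_local_addr_on_same_net(client_ip, local_addrs):
    """Return True if client_ip lies in the network of any local interface address.

    local_addrs are CIDR strings such as "10.206.0.2/24", as printed by `ip addr`.
    Hypothetical approximation of the check described in this report.
    """
    client = ipaddress.ip_address(client_ip)
    for addr in local_addrs:
        # ip_interface keeps the prefix length, so .network gives the on-link subnet
        if client in ipaddress.ip_interface(addr).network:
            return True
    return False


# Addresses configured on ns-0c51acd3-60 in the compute02 DHCP namespace.
compute02_addrs = ["10.206.0.2/24", "169.254.169.254/16"]

print(has_local_addr_on_same_net("10.206.0.53", compute02_addrs))  # True
print(has_local_addr_on_same_net("10.106.0.98", compute02_addrs))  # False
```

A lease on segment 2 passes the check on compute02, while the segment 1 lease has no matching local address, so no release is sent for it.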
Brian Haley patched dhcp_release.c recently to fix a similar issue here:
We can probably update dnsmasq-utils in the short term, but maybe making the DHCP agent segment aware is a better long-term solution?
Here are the steps to reproduce:
-=-=-=-=-
Network: rpn_multisegment
Segment 1:
VLAN 106 10.106.0.0/24
Provider Mapping: physnet1:bond1
Segment 2:
VLAN 206 10.206.0.0/24
Provider Mapping: physnet2:bond1
Two VMs:
🌕OpenStack Lab % openstack server list
+------
| ID | Name | Status | Networks | Image | Flavor |
+------
| 40f94b68-
| 34f8ff53-
+------
On compute01, we can see the hosts file populated with entries for each subnet associated with the network:
root@lab-
fa:16:3e:
fa:16:3e:
fa:16:3e:
fa:16:3e:
Same on compute02:
root@lab-
fa:16:3e:
fa:16:3e:
fa:16:3e:
fa:16:3e:
The leases file, however, contains only those hosts that have obtained leases (expected):
root@lab-
1606916842 fa:16:3e:46:7b:d1 10.106.0.98 host-10-106-0-98 ff:b5:5e:
1606916738 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606916738 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *
root@lab-
1606916917 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 ff:b5:5e:
1606916626 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *
Everything looks OK so far.
When restarting the neutron-dhcp-agent, however, the leases file is bootstrapped and contains entries for all subnets associated with the network:
root@lab-
1606917246 fa:16:3e:46:7b:d1 10.106.0.98 host-10-106-0-98 *
1606917246 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606917246 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 *
1606917246 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *
root@lab-
1606917254 fa:16:3e:46:7b:d1 10.106.0.98 host-10-106-0-98 *
1606917254 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606917254 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 *
1606917254 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *
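One possible shape for the segment-aware approach suggested above is to bootstrap the leases file only with ports whose IPs fall in the subnets the local agent actually services. The sketch below is hypothetical (the function and its parameters are not Neutron code) and simply reuses the MAC/IP/hostname data and the `*` client-id column from the files above.

```python
import ipaddress


def bootstrap_leases_for_segment(entries, served_cidrs, timestamp):
    """Build dnsmasq-style lease lines only for IPs in the locally served subnets.

    entries: iterable of (mac, ip, hostname) tuples for every port on the network.
    served_cidrs: subnets of the segment this agent services (hypothetical input).
    The client-id column is written as "*", matching the bootstrapped files above.
    """
    networks = [ipaddress.ip_network(c) for c in served_cidrs]
    lines = []
    for mac, ip, hostname in entries:
        if any(ipaddress.ip_address(ip) in net for net in networks):
            lines.append(f"{timestamp} {mac} {ip} {hostname} *")
    return lines


# Ports on rpn_multisegment, taken from the leases files above.
ports = [
    ("fa:16:3e:46:7b:d1", "10.106.0.98", "host-10-106-0-98"),
    ("fa:16:3e:2c:da:6d", "10.106.0.2", "host-10-106-0-2"),
    ("fa:16:3e:ce:b1:b5", "10.206.0.53", "host-10-206-0-53"),
    ("fa:16:3e:07:f7:af", "10.206.0.2", "host-10-206-0-2"),
]

# compute02 services segment 2 (VLAN 206, 10.206.0.0/24) only.
for line in bootstrap_leases_for_segment(ports, ["10.206.0.0/24"], 1606917254):
    print(line)
```

With this filtering, compute02 would never hold a 10.106.0.0/24 lease that it cannot later release.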
This configuration becomes a problem when a VM is deleted and dhcp_release is executed, as the namespaces on each host only have an IP from their respective segment and will not be able to delete a lease for what is essentially a non-connected subnet:
root@lab-
1: lo: <LOOPBACK,
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ns-5ccc6426-
link/ether fa:16:3e:2c:da:6d brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 169.254.169.254/16 brd 169.254.255.255 scope global ns-5ccc6426-59
valid_lft forever preferred_lft forever
inet 10.106.0.2/24 brd 10.106.0.255 scope global ns-5ccc6426-59
valid_lft forever preferred_lft forever
inet6 fe80::f816:
valid_lft forever preferred_lft forever
root@lab-
1: lo: <LOOPBACK,
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ns-0c51acd3-
link/ether fa:16:3e:07:f7:af brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.206.0.2/24 brd 10.206.0.255 scope global ns-0c51acd3-60
valid_lft forever preferred_lft forever
inet 169.254.169.254/16 brd 169.254.255.255 scope global ns-0c51acd3-60
valid_lft forever preferred_lft forever
inet6 fe80::f816:
valid_lft forever preferred_lft forever
Example:
🌕OpenStack Lab % openstack server delete vm-seg1
lab-compute01:
Dec 01 13:58:12 lab-compute01 dnsmasq-
Dec 01 13:58:13 lab-compute01 dnsmasq[56028]: read /var/lib/
Dec 01 13:58:13 lab-compute01 dnsmasq-
Dec 01 13:58:13 lab-compute01 dnsmasq-
root@lab-
1606917246 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606917246 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 *
1606917246 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *
lab-compute02:
Dec 01 13:58:13 lab-compute02 neutron-
Dec 01 13:58:14 lab-compute02 dnsmasq[589]: read /var/lib/
Dec 01 13:58:14 lab-compute02 dnsmasq-dhcp[589]: read /var/lib/
Dec 01 13:58:14 lab-compute02 dnsmasq-dhcp[589]: read /var/lib/
root@lab-
1606917254 fa:16:3e:46:7b:d1 10.106.0.98 host-10-106-0-98 *
1606917254 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606917254 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 *
1606917254 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *
As you can see, the lease for 10.106.0.98 was not deleted on compute02, because that segment's subnet is not configured on ns-0c51acd3-60 in the DHCP namespace, as it would be in an ordinary provider network.
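To spot entries like this, one could scan a leases file and flag every lease whose IP is not on any subnet of the namespace interface. This is a hypothetical helper, assuming the same-subnet rule described in this report and the dnsmasq lease line layout shown above (`<expiry> <mac> <ip> <hostname> <client-id>`).

```python
import ipaddress


def unreleasable_leases(leases_text, local_addrs):
    """Return (ip, mac) pairs whose IP is not on any subnet of the local interface.

    leases_text: contents of a dnsmasq leases file, one lease per line.
    local_addrs: CIDR addresses configured in the DHCP namespace.
    """
    local_nets = [ipaddress.ip_interface(a).network for a in local_addrs]
    stale = []
    for line in leases_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mac, ip = fields[1], fields[2]
        if not any(ipaddress.ip_address(ip) in net for net in local_nets):
            stale.append((ip, mac))
    return stale


# compute02 leases file after deleting vm-seg1, as shown above.
compute02_leases = """\
1606917254 fa:16:3e:46:7b:d1 10.106.0.98 host-10-106-0-98 *
1606917254 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606917254 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 *
1606917254 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *
"""

for ip, mac in unreleasable_leases(compute02_leases, ["10.206.0.2/24", "169.254.169.254/16"]):
    print(ip, mac)
```

Both 10.106.0.0/24 entries are reported, matching the lease that survived the deletion above.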
tags: removed: rfe
That sounds interesting indeed; maybe RFE-level, as this would be fixed by making the DHCP agent segment-aware.