intermittent dhcp failures
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
neutron |
Incomplete
|
Undecided
|
Unassigned | ||
neutron (Ubuntu) |
Confirmed
|
Undecided
|
Unassigned | ||
Xenial |
Confirmed
|
Undecided
|
Unassigned |
Bug Description
Hi,
We have observed instances failing to get a DHCP reply, either when booting for the first time, or after a reboot.
By tcpdumping the traffic on the tap, then qvb and qvo interfaces, we can see the DHCP request leaving, but it doesn't reach the neutron-gateway node where the dhcp agent for the network runs.
If we remove and then re-add the network to the dhcp agent, it starts working fine again.
I have dumped the output of `ovs-ofctl dump-flows br-tun` on the neutron-gateway side, before and after doing the remove/add.
One difference I noted, was a rule appearing, with the mac of the VM which had the issue:
cookie=
However, this is a rule for packets from the dhcp agent to the VM, so its absence doesn't explain why DHCP requests don't arrive to destination.
The cloud is running neutron 8.4.0 (Mitaka), on ubuntu (16.04) machines.
Most of the time it works, but this problem comes back often enough that it impacts users.
Thanks.
Can you see if this happens on master? Mitaka is no longer supported upstream.