neutron

intermittent dhcp failures

Bug #1748894 reported by Laurent Sesquès on 2018-02-12

This bug affects 3 people

Affects	Status	Importance	Assigned to
neutron	Incomplete	Undecided	Unassigned
neutron (Ubuntu)	Confirmed	Undecided	Unassigned
Xenial	Confirmed	Undecided	Unassigned

Bug Description

Hi,

We have observed instances failing to get a DHCP reply, either when booting for the first time, or after a reboot.
Capturing with tcpdump on the tap, qvb, and qvo interfaces in turn, we can see the DHCP request leaving, but it never reaches the neutron-gateway node where the DHCP agent for the network runs.
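
For anyone trying to reproduce the capture, here is a minimal Python sketch that prints one tcpdump command per interface. It assumes the usual hybrid OVS naming, where each device name is a prefix (tap, qvb, qvo) followed by the first 11 characters of the Neutron port UUID; that naming and the helper name `dhcp_capture_commands` are assumptions, not something stated in this report.

```python
import shlex

# Assumption: device names are a prefix plus the first 11 characters
# of the Neutron port UUID (hybrid OVS plugging).
PREFIXES = ("tap", "qvb", "qvo")

# BPF filter matching DHCP client/server traffic.
DHCP_FILTER = "udp and (port 67 or port 68)"


def dhcp_capture_commands(port_id):
    """Return one tcpdump command line per interface on the VM's path."""
    suffix = port_id[:11]
    return [
        # -n: no name resolution, -e: print link-level (MAC) headers
        "tcpdump -n -e -i {} {}".format(prefix + suffix, shlex.quote(DHCP_FILTER))
        for prefix in PREFIXES
    ]


if __name__ == "__main__":
    # Hypothetical port UUID, for illustration only.
    for cmd in dhcp_capture_commands("0a1b2c3d-4e5f-6789-abcd-ef0123456789"):
        print(cmd)
```

Running the commands in tap, qvb, qvo order shows how far the request gets before it disappears.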

If we remove the network from the DHCP agent and then re-add it, DHCP starts working again.
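
The report does not say which commands were used for this workaround. One way to do it, assuming the legacy `neutron` CLI with its `dhcp-agent-network-remove` and `dhcp-agent-network-add` subcommands, is sketched below; the helper name and the IDs are hypothetical.

```python
def reschedule_commands(agent_id, network_id):
    """Return the remove and re-add commands, in the order to run them."""
    return [
        ["neutron", "dhcp-agent-network-remove", agent_id, network_id],
        ["neutron", "dhcp-agent-network-add", agent_id, network_id],
    ]


if __name__ == "__main__":
    # Placeholder IDs; substitute the real DHCP agent and network UUIDs.
    for argv in reschedule_commands("AGENT_ID", "NETWORK_ID"):
        print(" ".join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` to execute it instead of printing it.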

I have dumped the output of `ovs-ofctl dump-flows br-tun` on the neutron-gateway side, before and after doing the remove/add.
One difference I noticed was a new rule matching the MAC address of the affected VM:
cookie=0xb42e8d404b989e72, duration=13.344s, table=20, n_packets=0, n_bytes=0, hard_timeout=300, idle_age=13, priority=1,vlan_tci=0x0001/0x0fff,dl_dst=fa:16:3e:db:34:a3 actions=load:0->NXM_OF_VLAN_TCI[],load:0x20->NXM_NX_TUN_ID[],output:5201

However, this rule handles packets going from the DHCP agent to the VM, so its absence doesn't explain why DHCP requests from the VM don't reach their destination.
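
Comparing two `ovs-ofctl dump-flows` outputs by eye is error-prone because counters and timers change on every dump. The following is a rough Python sketch, not the method actually used here, that strips those volatile fields and reports which flows were added or removed. The function names are hypothetical, and the cookie is kept as part of the comparison.

```python
import re

# Counters and timers that differ between any two dumps; removing them
# leaves only the match fields and actions to compare.
VOLATILE = re.compile(
    r"\b(?:duration|n_packets|n_bytes|idle_age|hard_age)=[^,\s]+,?\s*"
)


def normalize(flow_line):
    """Return a flow line with volatile fields removed."""
    return VOLATILE.sub("", flow_line).strip()


def flow_set(dump_text):
    """Parse dump-flows output into a set of normalized flow strings."""
    flows = set()
    for line in dump_text.splitlines():
        line = line.strip()
        # Skip blank lines and the reply header, which has no cookie field.
        if not line or "cookie=" not in line:
            continue
        flows.add(normalize(line))
    return flows


def diff_flows(before, after):
    """Return (added, removed) flows between two dumps, each sorted."""
    old, new = flow_set(before), flow_set(after)
    return sorted(new - old), sorted(old - new)
```

Feeding it the two dumps taken before and after the remove/add would list only the flows that really changed, such as the table=20 rule quoted above.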

The cloud is running Neutron 8.4.0 (Mitaka) on Ubuntu 16.04 machines.

Most of the time it works, but this problem comes back often enough that it impacts users.

Thanks.