Very first VM launched won't respond to ARP requests
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
networking-cisco | New | Undecided | Unassigned |
Bug Description
I'm seeing this issue more consistently with Nexus VXLAN offload, and I think I have found the culprit:
it appears to be a timing issue.
In order to pass traffic, the two N9K VXLAN gateways must first form NVE peers.
When the very first VM is launched, the cisco_nexus driver configures the N9Ks with the VLAN,
VNI, and multicast address information.
After the 9Ks are configured, it takes some time for the NVE peers to form.
In the meantime, Neutron spawns the DHCP server, which in turn hands out a fixed
IP address to the VM.
However, since the data path between the 9Ks is not yet established, the VM never receives
the IP address from the DHCP server. This can be confirmed from the VM console log:
Starting network...
udhcpc (v1.20.1) started
Sending discover...
Sending discover...
Sending discover...
No lease, failing
WARN: /etc/rc3.
.
.
=== pinging gateway failed, debugging connection ===
############ debug start ##############
### /etc/init.d/sshd start
Starting dropbear sshd: OK
route: fscanf
### ifconfig -a
eth0 Link encap:Ethernet HWaddr FA:16:3E:2E:C7:1C
inet6 addr: fe80::f816:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:23 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:1768 (1.7 KiB) TX bytes:1114 (1.0 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:12 errors:0 dropped:0 overruns:0 frame:0
TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:1020 (1020.0 B) TX bytes:1020 (1020.0 B)
### route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
route: fscanf
### cat /etc/resolv.conf
cat: can't open '/etc/resolv.conf': No such file or directory
### gateway not found
/sbin/cirros-
Now, if I reboot the VM, it does receive the IP address from the DHCP server, because the data path
between the 9Ks is already established (i.e., the NVE peers have formed):
Starting network...
udhcpc (v1.20.1) started
Sending discover...
Sending select for 10.0.0.2...
Lease of 10.0.0.2 obtained, lease time 86400
This also explains why subsequent VMs do not have the same problem.
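The race described above can be illustrated with a small simulation. This is a simplified sketch, not the actual behavior of the cisco_nexus driver or udhcpc: the names `DhcpClient` and `obtains_lease`, the 3-second retry interval, and the 60-second NVE peer formation time are all hypothetical. The only detail taken from the log is that the client sends three DISCOVERs before giving up with "No lease, failing". The model also assumes a DISCOVER succeeds if, and only if, it is sent after the NVE peers are up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DhcpClient:
    """Simplified udhcpc-like client (hypothetical model, not real udhcpc timing)."""
    discover_count: int = 3         # three "Sending discover..." lines in the log
    retry_interval_s: float = 3.0   # assumed spacing between DISCOVERs

    def discover_times(self, start_s: float) -> list[float]:
        """Times at which each DISCOVER is sent, starting at start_s."""
        return [start_s + i * self.retry_interval_s for i in range(self.discover_count)]


def obtains_lease(client: DhcpClient, boot_s: float, nve_ready_s: float) -> bool:
    """A lease is obtained only if at least one DISCOVER goes out once the NVE peers have formed."""
    return any(t >= nve_ready_s for t in client.discover_times(boot_s))


if __name__ == "__main__":
    client = DhcpClient()
    nve_ready = 60.0  # hypothetical: NVE peers form 60 s after the driver configures the N9Ks
    # First boot: DISCOVERs at 20, 23, 26 s all precede peer formation.
    print("first boot:", obtains_lease(client, boot_s=20.0, nve_ready_s=nve_ready))  # False
    # Reboot (or any later VM): the data path already exists.
    print("reboot:", obtains_lease(client, boot_s=120.0, nve_ready_s=nve_ready))     # True
```

Under these assumptions, the first VM fails whenever its entire DISCOVER window ends before the NVE peers form. A reboot, or any VM launched later, starts DHCP after the data path exists and succeeds.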