Thanks Marek for this data. I checked the flows and I think some flows were missing from the br-tun bridge before you restarted neutron-ovs-agent.
For sure these flows are not there:
table=0, priority=1,in_port="vxlan-ac100e47" actions=resubmit(,4)
table=0, priority=1,in_port="vxlan-ac100e46" actions=resubmit(,4)
table=0, priority=1,in_port="vxlan-ac100e49" actions=resubmit(,4)
table=0, priority=1,in_port="vxlan-ac100e39" actions=resubmit(,4)
table=0, priority=1,in_port="vxlan-ac100e2e" actions=resubmit(,4)
table=0, priority=1,in_port="vxlan-ac100e2b" actions=resubmit(,4)
table=0, priority=1,in_port="vxlan-ac100e42" actions=resubmit(,4)
table=0, priority=1,in_port="vxlan-ac100e29" actions=resubmit(,4)
table=0, priority=1,in_port="vxlan-ac100e2a" actions=resubmit(,4)
table=0, priority=1,in_port="vxlan-ac100e32" actions=resubmit(,4)
These flows handle packets coming into the host from the vxlan tunnels. Without them, such packets fall through to the next flow:
table=0, priority=0 actions=drop
which has some packets in its counter.
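If it helps, the table=0 check above can be scripted. This is only a rough sketch: it assumes the text comes from `ovs-ofctl dump-flows br-tun` with port names shown (as in the dump in this thread), the function names are made up for illustration, and the list of expected tunnel ports has to be supplied by you.

```python
import re
from typing import List, Optional, Set


def table0_tunnel_ports(dump: str) -> Set[str]:
    """Return vxlan ports that have a table=0 flow resubmitting to table 4."""
    ports = set()
    for line in dump.splitlines():
        if not re.search(r"\btable=0\b", line):
            continue
        m = re.search(r'in_port="?(vxlan-[0-9a-f]+)"?\s+actions=resubmit\(,4\)', line)
        if m:
            ports.add(m.group(1))
    return ports


def missing_tunnel_flows(dump: str, expected_ports: List[str]) -> List[str]:
    """Return the expected tunnel ports that have no ingress flow in table 0."""
    return sorted(set(expected_ports) - table0_tunnel_ports(dump))


def table0_drop_packets(dump: str) -> Optional[int]:
    """Return n_packets of the table=0 priority=0 drop flow, or None if absent."""
    for line in dump.splitlines():
        if re.search(r"\btable=0\b", line) and re.search(r"priority=0\s+actions=drop", line):
            m = re.search(r"n_packets=(\d+)", line)
            return int(m.group(1)) if m else None
    return None
```

A non-empty result from `missing_tunnel_flows` together with a growing counter from `table0_drop_packets` would match what I see in your dump.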
Flows like these are also missing:
table=20, priority=2,dl_vlan=1,dl_dst=fa:16:3e:51:f4:84 actions=strip_vlan,load:0x79->NXM_NX_TUN_ID[],output:"vxlan-ac100e2b"
table=20, priority=2,dl_vlan=1,dl_dst=fa:16:3e:4b:f8:2d actions=strip_vlan,load:0x79->NXM_NX_TUN_ID[],output:"vxlan-ac100e29"
table=21, priority=1,arp,dl_vlan=1,arp_tpa=10.13.37.11 actions=load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163e51f484->NXM_NX_ARP_SHA[],load:0xa0d250b->NXM_OF_ARP_SPA[],move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],mod_dl_src:fa:16:3e:51:f4:84,IN_PORT
table=21, priority=1,arp,dl_vlan=1,arp_tpa=10.13.37.10 actions=load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163e4bf82d->NXM_NX_ARP_SHA[],load:0xa0d250a->NXM_OF_ARP_SPA[],move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],mod_dl_src:fa:16:3e:4b:f8:2d,IN_PORT
table=22, priority=1,dl_vlan=1 actions=strip_vlan,load:0x79->NXM_NX_TUN_ID[],output:"vxlan-ac100e47",output:"vxlan-ac100e46",output:"vxlan-ac100e49",output:"vxlan-ac100e39",output:"vxlan-ac100e2e",output:"vxlan-ac100e2b",output:"vxlan-ac100e42",output:"vxlan-ac100e29",output:"vxlan-ac100e2a",output:"vxlan-ac100e32"
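To get a list like the one above without comparing by eye, the two dumps (before and after the agent restart) can be diffed. The sketch below is just one way to do it and assumes both dumps are saved `ovs-ofctl dump-flows br-tun` output; it strips the cookie and the counters first, since those change between dumps and say nothing about whether a flow exists. The function names are hypothetical.

```python
import re

# Fields that differ between dumps even when the flow itself is the same.
_VOLATILE = re.compile(
    r"\b(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^,\s]+,?\s*"
)


def normalize(dump: str) -> set:
    """Return the set of flows in a dump with volatile fields removed."""
    flows = set()
    for line in dump.splitlines():
        line = line.strip()
        if "actions=" not in line:
            continue  # skip the reply header and blank lines
        flows.add(_VOLATILE.sub("", line))
    return flows


def flows_added_by_restart(before: str, after: str) -> list:
    """Return flows present after the restart but absent before it."""
    return sorted(normalize(after) - normalize(before))
```

Calling `flows_added_by_restart(before, after)` with the two saved dumps should print only the flows that the restart brought back.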
So what I suspect here is some issue with the l2population mechanism driver, but I don't know exactly what the issue is.
As a next step, I think you should enable debug everywhere (on neutron-server and the ovs-agents), then try to reproduce the issue and check what is missing or wrong there.
You can also check whether the DHCP requests fail to leave the compute node through the vxlan tunnel, or whether the requests are sent properly and the replies are dropped somewhere. That may help us understand exactly which missing flow is causing this problem.