I'm hitting this bug in a client installation, Bionic / Queens. Just spent a few hours debugging and in the end came to exactly the same tests and conclusion.
Using DPDK, with a bond for DPDK, and isolated metadata (as these are provider-only networks). I *can* send packets of up to 9000 bytes to the netns and they get there intact. Only packets going out are truncated to a little more than 1500 bytes (IIRC about 1504; ping works up to -s 1476).
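A quick sanity check on those sizes, assuming plain IPv4 without options (20-byte header) and an ICMP echo request (8-byte header). The function names are just for illustration:

```python
# Header sizes assumed here: IPv4 without options, ICMP echo request.
IPV4_HEADER = 20
ICMP_HEADER = 8


def ip_packet_size(ping_payload):
    """Size in bytes of the IPv4 packet produced by `ping -s <ping_payload>`."""
    return ping_payload + ICMP_HEADER + IPV4_HEADER


def max_ping_payload(ip_size):
    """Largest `ping -s` value that fits in an IPv4 packet of ip_size bytes."""
    return ip_size - ICMP_HEADER - IPV4_HEADER


print(ip_packet_size(1476))    # 1504
print(max_ping_payload(9000))  # 8972
```

So a ping with -s 1476 corresponds to a 1504-byte IP packet, which matches the truncation size I remember seeing.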
Checked every MTU on every port in the path from the netns to OVS to DPDK to the external switch, and back to the other node's DPDK/OVS/virtual machine, and none of them seems wrong. Tcpdump on both the netns and the target VM shows packets leaving the netns OK but arriving truncated at the destination.
I manually set all MTUs in the qdhcp namespaces to 1500 and the problem is gone. Not sure about any consequences of this, though.
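A rough sketch of that workaround, in case it helps someone reproduce it. It only prints the commands (dry run) and does not change anything. The sample namespace names and the device name "tap-example" are placeholders, not values from this deployment; the real device names have to be looked up inside each namespace:

```python
import shlex


def qdhcp_namespaces(netns_list_output):
    """Pick the qdhcp-* namespace names out of `ip netns list` output."""
    names = []
    for line in netns_list_output.splitlines():
        parts = line.split()
        # The namespace name is the first field; an "(id: N)" suffix may follow.
        if parts and parts[0].startswith("qdhcp-"):
            names.append(parts[0])
    return names


def set_mtu_command(namespace, device, mtu=1500):
    """Build the argv that sets the MTU of one device inside a namespace."""
    return ["ip", "netns", "exec", namespace,
            "ip", "link", "set", "dev", device, "mtu", str(mtu)]


# Placeholder input standing in for real `ip netns list` output.
sample = "qdhcp-aaaa (id: 1)\nqrouter-bbbb (id: 0)\n"
for ns in qdhcp_namespaces(sample):
    print(shlex.join(set_mtu_command(ns, "tap-example")))
```

Note that a change made this way is probably not persistent: I would expect the agent to set the MTU again when it recreates the port, but I have not verified that.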
The funny thing is that this problem did not appear with charm neutron-openvswitch-next-359 but appeared after an upgrade to neutron-openvswitch-next-367 *and* a compute host reboot. We are using these charms because of other bugs that were fixed there, and the upgrade was to finally fix one last bug about TCP checksum corruption inside the netns (all this is explained in https://bugs.launchpad.net/neutron/+bug/1832021/ ).
Well, at least the client deployed more than 70 VMs without problems. I upgraded the charm about two weeks ago (things apparently stayed OK), and a few days ago I rebooted some compute nodes because of an unrelated problem and this behavior appeared.