neutron-agent-sriov fails to create port
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu Cloud Archive | Fix Released | Undecided | Unassigned |
Queens | Fix Released | Undecided | Unassigned |
Stein | Fix Committed | Undecided | Unassigned |
Train | Fix Committed | Undecided | Unassigned |
Ussuri | Fix Released | Undecided | Unassigned |
Victoria | Fix Released | Undecided | Unassigned |
Wallaby | Fix Released | Undecided | Unassigned |
Xena | Fix Released | Undecided | Unassigned |
pyroute2 (Ubuntu) | Fix Released | High | Billy Olsen |
Bionic | Fix Released | High | Unassigned |
Focal | Fix Released | High | Unassigned |
Hirsute | Fix Released | High | Unassigned |
Impish | Fix Released | High | Billy Olsen |
Bug Description
[Impact]
Netlink responses from the kernel can exceed 16k bytes (newer kernels can return up to 32k). The pyroute2 library uses a default receive buffer of 16k and fails to read the response when the kernel returns more data than fits in that buffer.
One example of where users encounter this is booting OpenStack instances with SRIOV when there are more than 32 VFs, as seen in the original problem description (included below).
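The buffer overflow described above can be illustrated with a minimal sketch. It does not use pyroute2 or netlink; it assumes a Linux host and uses a Unix datagram socket pair as a stand-in for the netlink socket, and the helper name read_datagram is hypothetical. Like a netlink message, a datagram that is larger than the receive buffer is cut off at the buffer size and the kernel flags it with MSG_TRUNC.

```python
import socket
from typing import Tuple


def read_datagram(payload_len: int, bufsize: int) -> Tuple[int, bool]:
    """Send one datagram of payload_len bytes and read it with a bufsize buffer.

    Returns (bytes_received, truncated). This is a stand-in for a netlink
    read, not pyroute2's actual receive path.
    """
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        tx.send(b"\x00" * payload_len)
        # recvmsg exposes the message flags, so truncation can be detected.
        data, _ancdata, flags, _addr = rx.recvmsg(bufsize)
        return len(data), bool(flags & socket.MSG_TRUNC)
    finally:
        tx.close()
        rx.close()


if __name__ == "__main__":
    # A 32k response read through a 16k buffer, then through a 32k buffer.
    print(read_datagram(32 * 1024, 16 * 1024))
    print(read_datagram(32 * 1024, 32 * 1024))
```

With the smaller buffer the reader only sees the first part of the message, which is why a larger default buffer resolves the failure at the cost of more memory per read.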
[Test Case]
Use an SRIOV capable card and enable more than 32 VFs on a modern kernel. Attempt to launch an instance using OpenStack as follows:
1. Create example network:
$ juju switch openstack
$ source ~/deploy/novarc
$ openstack network create \
--provider-
--provider-segment 300 \
--provider-
test-sriov
$ openstack subnet create --network test-sriov \
--no-dhcp \
--gateway none \
--subnet-range 192.168.1.0/24 test-sriov
2. Create ports over virtual function:
$ juju switch openstack
$ source ~/deploy/novarc
$ openstack port create \
--network test-sriov \
--vnic-type direct \
sriov-vf1
$ openstack server create \
--image bionic-kvm \
--flavor m1.small \
--network ext-net-300 \
--port sriov-vf1 \
--key-name ubuntu-keypair \
--availability-zone nova:cmp4az1cz2 \
sriov-vf1
3. The instance stalls in the build state (virsh list shows a paused VM) and then drops to ERROR.
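Before running the test case above, it may help to confirm that the host really has more than 32 VFs enabled. The following is a small sketch, assuming the standard Linux sysfs attribute sriov_numvfs under the interface's device directory; the helper names and the interface name used in the usage note are hypothetical.

```python
from pathlib import Path

# Threshold taken from the test case: "more than 32 VFs".
VF_THRESHOLD = 32


def enabled_vfs(iface: str, sysfs_root: str = "/sys/class/net") -> int:
    """Return the number of VFs currently enabled on a physical function."""
    path = Path(sysfs_root) / iface / "device" / "sriov_numvfs"
    return int(path.read_text().strip())


def exceeds_threshold(iface: str, sysfs_root: str = "/sys/class/net") -> bool:
    """True when the interface has enough VFs to be able to hit this bug."""
    return enabled_vfs(iface, sysfs_root) > VF_THRESHOLD
```

For example, exceeds_threshold("enp59s0f0") would be called on the compute host with the name of the SRIOV physical function; that interface name is only a placeholder.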
[Where problems could occur]
Problems may occur for existing users who already use OpenStack to schedule SRIOV instances, and may show up as failures to build instances. Additional problems could include increased memory usage in the nova processes, caused by the larger default buffer size. On tightly spec'd systems with little memory allocated to the host, this could eat into any remaining margin and push memory usage over the edge.
summary: |
- neutron-agent-stiov fails to create port
+ neutron-agent-sriov fails to create port |
description: | updated |
information type: | Public → Private |
Changed in charm-ovn-chassis: | |
assignee: | nobody → Andrew McLeod (admcleod) |
Changed in charm-ovn-chassis: | |
status: | New → Incomplete |
Changed in charm-ovn-chassis: | |
status: | Incomplete → Invalid |
description: | updated |
Changed in pyroute2 (Ubuntu): | |
importance: | Undecided → High |
Changed in pyroute2 (Ubuntu Focal): | |
status: | New → Triaged |
Changed in pyroute2 (Ubuntu Hirsute): | |
status: | New → Triaged |
Changed in pyroute2 (Ubuntu Focal): | |
importance: | Undecided → High |
Changed in pyroute2 (Ubuntu Hirsute): | |
importance: | Undecided → High |
description: | updated |
information type: | Private → Public |
tags: | removed: verification-needed-victoria |
Would you be able to collect logs with debug enabled?