VM using Mellanox VF fails to reboot

Bug #1915081 reported by dann frazier
This bug affects 1 person
Affects            Status         Importance   Assigned to   Milestone
kunpeng920         Fix Released   Undecided    Unassigned
Ubuntu-18.04-hwe   Fix Released   Undecided    Unassigned
Upstream-kernel    Fix Released   Undecided    Unassigned

Bug Description

linux-hwe 5.0.0-2.3~18.04.1

1) Instantiate a Mellanox VF, e.g.:
   echo 3 | sudo tee /sys/class/net/enp132s0f1/device/sriov_numvfs
2) Pass the newly instantiated VF into a virtual machine
3) Bring up the interface of the VF in the guest

The guest will then be unable to reboot. It hangs after the following console output:

[ OK ] Stopped Monitoring of LVM2 mirrors,…sing dmeventd or progress polling.
         Stopping LVM2 metadata daemon...
[ OK ] Stopped LVM2 metadata daemon.
[ OK ] Deactivated swap /swapfile.
[ OK ] Reached target Unmount All Filesystems.
[ OK ] Stopped Remount Root and Kernel File Systems.
[ OK ] Reached target Shutdown.
[ OK ] Reached target Final Step.
         Starting Reboot...
[ 88.003995] mlx5_core 0000:04:00.0: mlx5_enter_error_state:103:(pid 1): start
[ 88.004629] mlx5_core 0000:04:00.0: mlx5_enter_error_state:110:(pid 1): end
[ 88.012056] reboot: Restarting system
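
Step 1 of the reproducer can be wrapped in a small helper. This is only a sketch: `set_numvfs` is a hypothetical name, it must run as root against the real sysfs tree, and it assumes the PF exposes the standard `sriov_totalvfs` and `sriov_numvfs` attributes.

```shell
# Hypothetical helper: set the VF count for one SR-IOV physical function.
# $1 = PF device directory, e.g. /sys/class/net/enp132s0f1/device
# $2 = desired number of VFs
set_numvfs() {
    local dev_dir="$1" want="$2" total
    # sriov_totalvfs reports the maximum number of VFs the device supports.
    total=$(cat "$dev_dir/sriov_totalvfs") || return 1
    if [ "$want" -gt "$total" ]; then
        echo "requested $want VFs, device supports at most $total" >&2
        return 1
    fi
    # A nonzero VF count cannot be changed directly to another nonzero
    # value, so reset to 0 first.
    echo 0 > "$dev_dir/sriov_numvfs" || return 1
    echo "$want" > "$dev_dir/sriov_numvfs"
}
```

For the interface in this report, the call would be `set_numvfs /sys/class/net/enp132s0f1/device 3`.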

dann frazier (dannf) wrote :

This issue no longer exists in Ubuntu. Kernel bisection shows that it impacted upstream kernels from v4.20 up to, but not including, v5.3.

Bisection was a little complicated because there are two overlapping issues. There's the reboot hang, but there's also an issue that causes the host Mellanox driver to crash when you pass through a VF. So I bisected the Mellanox driver crash first, then manually applied that fix while bisecting the reboot hang.

Here's a chronological set of the relevant commits (annotation will
require a fixed-width font):

v4.19
975bb8b4dc93 PCI/IOV: Use VF0 cached config space size for other VFs ------------------+
v4.20-rc1                                                                              |
b61d271e59d7 iommu/dma: Move domain lookup into __iommu_dma_{map,unmap} --------+      +---- Reboot hangs
76bf6a8634a1 Revert "PCI/IOV: Use VF0 cached config space size for other VFs" --|------+
v5.3-rc1                                                                        +--- Passthrough crashes
8af23fad6261 iommu/dma: Handle MSI mappings separately -------------------------+
v5.3-rc5

As you can see, the reboot hang problem started in upstream v4.20-rc1,
and was fixed in v5.3-rc1. So 4.15 was not impacted, and all 5.4
kernels already have the fix.
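
As a rough aid, the affected window can be checked against a kernel release string. This is a sketch only: `reboot_hang_affected` is a hypothetical helper, it looks at the major.minor version alone, and it reflects the upstream range described above. Distro kernels may carry backported fixes, so the result is a hint rather than a verdict.

```shell
# Hypothetical helper: succeed (exit 0) if the kernel release string falls in
# the upstream window v4.20 <= version < v5.3, where the reboot hang was present.
# $1 = kernel release, e.g. the output of `uname -r`
reboot_hang_affected() {
    local release="$1" major minor v
    major=${release%%.*}        # text before the first dot
    minor=${release#*.}         # text after the first dot
    minor=${minor%%[!0-9]*}     # keep only the leading digits
    # Encode major.minor as a single integer for comparison.
    v=$(( major * 1000 + minor ))
    [ "$v" -ge 4020 ] && [ "$v" -lt 5003 ]
}
```

For example, `reboot_hang_affected "$(uname -r)" && echo "kernel is in the affected upstream range"`.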

Changed in kunpeng920:
status: New → Fix Released