NVMe driver regression for non-smp/1-cpu systems
Bug #1651602 reported by Chris Gregan
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Critical | Unassigned |
Xenial | Fix Released | Critical | Dan Streetman |
Bug Description
MAAS Version 2.1.1+bzr5544-
Deploying Xenial Nodes
1) Deploy MAAS 2.1.1 on Yakkety
2) Associate Juju 2.1 beta3
3) Juju deploy Kubernetes Core
Nodes begin to deploy but fail
Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'block-meta', 'custom']
Exit code: 3
Reason: -
Stdout: b"no disk with serial 'CVMD434500BN40
Related bugs:
* bug 1647485: NVMe symlinks broken by devices with spaces in model or serial strings
* bug 1642903: introduce disk/by-id (model_serial) symlinks for NVMe drives
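For anyone triaging a similar "no disk with serial" failure, a quick check is whether any /dev/disk/by-id symlink carries the expected serial at all. The following is only a minimal sketch and not curtin's actual lookup logic: the function name `find_disks_by_serial` and the simple substring match are assumptions made for illustration.

```python
import os


def find_disks_by_serial(serial, by_id_dir="/dev/disk/by-id"):
    """Return {link_name: resolved_device} for by-id links containing `serial`.

    Hypothetical helper for illustration; it only does a substring match on
    the symlink names that udev creates.
    """
    matches = {}
    try:
        names = sorted(os.listdir(by_id_dir))
    except FileNotFoundError:
        # The directory is missing entirely, e.g. no disk was probed.
        return matches
    for name in names:
        if serial in name:
            # Resolve the symlink to the underlying device node.
            matches[name] = os.path.realpath(os.path.join(by_id_dir, name))
    return matches
```

An empty result for a serial that is physically present suggests the device never appeared to udev, which would be consistent with the driver failing to probe it rather than with a naming problem.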
Changed in linux (Ubuntu Xenial): | |
status: | New → Confirmed |
importance: | Undecided → High |
tags: | added: cdo-qa-blocker |
Changed in linux (Ubuntu Xenial): | |
assignee: | nobody → Dan Streetman (ddstreet) |
Changed in linux (Ubuntu Xenial): | |
status: | Confirmed → Fix Committed |
summary: |
- Intel NVMe driver does not expose consistent links in /dev/disk/by-id
+ NVMe driver regression for non-smp/1-cpu systems
description: | updated |
description: | updated |
Changed in maas: | |
status: | Won't Fix → Invalid |
Changed in linux (Ubuntu): | |
status: | Invalid → Fix Committed |
importance: | Undecided → Critical |
affects: | curtin → ubuntu-translations |
no longer affects: | ubuntu-translations |
affects: | maas → ubuntu-translations |
no longer affects: | ubuntu-translations |
Changed in linux (Ubuntu): | |
status: | Fix Committed → Invalid |
Saw a similar failure in Curtin's vmtests [1]. Here is output from the Xenial boot log:
[ 1.370713] nvme nvme0: Failed to get enough MSI/MSIX interrupts
[ 1.371798] nvme 0000:00:07.0: Removing after probe failure
[ 1.380426] FDC 0 is a S82078B
[ 1.398396] nvme nvme1: Failed to get enough MSI/MSIX interrupts
[ 1.399396] nvme 0000:00:08.0: Removing after probe failure
Looks like a kernel regression at this point. This failure was on Linux version 4.4.0-57-generic; the last test to pass was on Linux version 4.4.0-53-generic. From [2] it looks like there was an attempt to fix this by allowing the kernel to fall back to legacy interrupts in the event that MSI-X and even MSI interrupts could not be allocated.
[1] https://jenkins.ubuntu.com/server/job/curtin-vmtest/649/artifact/output/XenialTestNvme/logs/
[2] http://lists.infradead.org/pipermail/linux-nvme/2016-May/004653.html
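To check other boot logs for the same signature, something like the sketch below could be used. It is only an illustration: the function name `find_nvme_probe_failures` is made up here, and the two regular expressions are written against the exact message wording quoted in this comment, which may differ on other kernel versions.

```python
import re

# Patterns modelled on the two messages quoted in the boot log above.
MSI_FAIL = re.compile(
    r"nvme (nvme\d+): Failed to get enough MSI/MSIX interrupts"
)
PROBE_FAIL = re.compile(
    r"nvme ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Removing after probe failure"
)


def find_nvme_probe_failures(log_text):
    """Return (controllers, pci_addresses) that hit the probe failure.

    controllers: names such as 'nvme0' that could not get interrupts.
    pci_addresses: PCI addresses of devices removed after the failed probe.
    """
    controllers = MSI_FAIL.findall(log_text)
    pci_addresses = PROBE_FAIL.findall(log_text)
    return controllers, pci_addresses
```

It can be fed saved `dmesg` output or a console log as a single string; two non-empty lists indicate that the controllers were dropped during probe, as in the log above.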