random oopses on s390 systems using NVMe devices
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu on IBM z Systems | Fix Released | High | Canonical Kernel Team |
linux (Ubuntu) | Fix Released | Medium | Seth Forshee |
Xenial | Fix Released | High | Kleber Sacilotto de Souza |
Bionic | Fix Released | High | Kleber Sacilotto de Souza |
Bug Description
== SRU Justification ==
IBM is requesting a fix for the following issue found with NVMe devices on s390x:
The trigger is a PCI function whose driver requests more interrupts than the architectural maximum. Currently this is only possible on a machine that supports 64 or more CPUs with an NVMe function attached. Note that the LPAR does not have to use 64 or more CPUs, because the NVMe driver uses num_possible_cpus(), which resolves to the machine maximum on s390 (all CPUs are hot-pluggable). The oops happens after the driver calls pci_alloc_
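Because the vector request is sized from num_possible_cpus(), the count that matters is the possible-CPU mask, not the number of CPUs the LPAR is actually using. From user space, a rough check is to count the CPUs listed in /sys/devices/system/cpu/possible, which uses the kernel's "cpulist" format (for example `0-63`). The following is a minimal sketch under that assumption; `count_cpulist` and `possible_cpus_from_sysfs` are hypothetical helpers, not kernel or libc APIs, and comparing the result against 64 is only a heuristic derived from the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Count the CPUs named in a cpulist string such as "0-63" or "0-3,8,10-11"
 * (assumed to match the format of /sys/devices/system/cpu/possible).
 * Returns -1 if the string is malformed.
 */
int count_cpulist(const char *list)
{
    const char *p = list;
    int total = 0;

    assert(list != NULL);
    while (*p != '\0' && *p != '\n') {
        char *end;
        long first = strtol(p, &end, 10);
        long last = first;

        if (end == p || first < 0)
            return -1;                 /* expected a CPU number */
        p = end;
        if (*p == '-') {               /* a range "first-last" */
            const char *q = p + 1;
            last = strtol(q, &end, 10);
            if (end == q || last < first)
                return -1;
            p = end;
        }
        total += (int)(last - first + 1);
        if (*p == ',')
            p++;                       /* move on to the next element */
        else if (*p != '\0' && *p != '\n')
            return -1;                 /* unexpected character */
    }
    return total;
}

/* Read the possible-CPU mask from sysfs and count it; -1 on any failure. */
int possible_cpus_from_sysfs(void)
{
    char buf[256];
    FILE *f = fopen("/sys/devices/system/cpu/possible", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return count_cpulist(buf);
}
```

For example, a possible mask of `0-63` yields 64, which per the description above would already be enough for the NVMe driver to request more vectors than the s390 limit, even if only a few CPUs are online in the LPAR.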
The fix has been cc'ed to stable@, but hasn't been picked up for Bionic yet.
== Fix ==
866f3576a72b s390/pci: fix out of bounds access during irq setup
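The commit title describes an out-of-bounds access during IRQ setup: the number of vectors is limited to what the hardware supports, but the setup loop keeps walking every MSI descriptor the driver requested. The sketch below illustrates that general failure mode and the guard that avoids it. It is not the code of commit 866f3576a72b; `struct fake_zdev`, `max_msi`, `vec_table` and `setup_msi_vectors` are hypothetical stand-ins, and the MSI descriptor list is modeled simply as a count `nvec`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-function state: a table sized to the hardware limit. */
struct fake_zdev {
    unsigned int max_msi;   /* vectors the hardware supports (illustrative) */
    int *vec_table;         /* exactly max_msi slots */
};

/*
 * Set up one slot per requested MSI descriptor, but never more than max_msi.
 * Returns the number of vectors actually configured.
 * The buggy pattern clamps the count but keeps looping over all nvec
 * descriptors, writing past vec_table[max_msi - 1].
 */
unsigned int setup_msi_vectors(struct fake_zdev *zdev, unsigned int nvec)
{
    unsigned int msi_vecs = nvec < zdev->max_msi ? nvec : zdev->max_msi;
    unsigned int hwirq;

    for (hwirq = 0; hwirq < nvec; hwirq++) {  /* walk every descriptor */
        if (hwirq >= msi_vecs)
            break;                            /* guard: stop at the limit */
        assert(hwirq < zdev->max_msi);        /* index is always in bounds */
        zdev->vec_table[hwirq] = (int)hwirq;  /* stand-in for real irq setup */
    }
    return hwirq;
}
```

Requesting 65 vectors from a function whose table holds only 4 then configures 4 vectors and leaves the rest of the descriptors untouched, instead of writing past the end of the table.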
== Regression Potential ==
Low. Affects only s390x systems on machines supporting 64 or more CPUs with an NVMe function enabled.
== Test case ==
Boot the kernel in an affected environment.
=== Original bug description ===
Random oopses on s390 systems using NVMe and running the Ubuntu 18.04.1 kernel have been reported.
Bisect of the upstream kernel points to:
16ccfff28976 nvme: pci: pass max vectors as num_possible_cpus() to pci_alloc_
This commit is correct but reveals a bug in the s390 IRQ setup routine. A fix is available via:
Commit-ID: 866f3576a72b223
The fix also needs to be applied to Ubuntu 18.10.
tags: | added: architecture-s39064 bugnameltc-170595 severity-high targetmilestone-inin1804 |
Changed in ubuntu: | |
assignee: | nobody → Skipper Bug Screeners (skipper-screen-team) |
affects: | ubuntu → linux (Ubuntu) |
Changed in ubuntu-z-systems: | |
status: | New → Triaged |
importance: | Undecided → High |
assignee: | nobody → Canonical Kernel Team (canonical-kernel-team) |
Changed in linux (Ubuntu): | |
assignee: | Skipper Bug Screeners (skipper-screen-team) → Seth Forshee (sforshee) |
importance: | Undecided → Medium |
status: | New → Fix Committed |
Changed in linux (Ubuntu Bionic): | |
assignee: | nobody → Kleber Sacilotto de Souza (kleber-souza) |
status: | New → Triaged |
importance: | Undecided → Medium |
importance: | Medium → High |
description: | updated |
Changed in ubuntu-z-systems: | |
status: | Triaged → In Progress |
Changed in linux (Ubuntu Xenial): | |
status: | New → In Progress |
Changed in linux (Ubuntu Xenial): | |
importance: | Undecided → High |
assignee: | nobody → Kleber Sacilotto de Souza (kleber-souza) |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Bionic): | |
status: | In Progress → Fix Committed |
Changed in ubuntu-z-systems: | |
status: | In Progress → Fix Committed |
Changed in ubuntu-z-systems: | |
status: | Fix Committed → Fix Released |
tags: | added: cscc |
@IBM: Even though we do not have NVMe devices in our Z machine (so we cannot test this on s390x ourselves), it would be helpful if you could share a description or some steps for a potential test case.
This would help in judging the regression risk of an SRU to 18.04 (and is needed for an SRU anyway).