intel_iommu: Fix enable intel_iommu, Ubuntu 22.04 installation crashes

Bug #1982104 reported by koba
This bug affects 1 person
Affects                    Status        Importance  Assigned to
HWE Next                   Fix Released  Critical    Unassigned
linux (Ubuntu)             Fix Released  Undecided   Unassigned
  Focal                    Invalid       Undecided   Unassigned
  Jammy                    Fix Released  Medium      koba
linux-oem-5.14 (Ubuntu)    Invalid       Undecided   Unassigned
  Focal                    Fix Released  Undecided   koba
  Jammy                    Invalid       Undecided   Unassigned
  Kinetic                  Invalid       Undecided   Unassigned
linux-oem-5.17 (Ubuntu)    Invalid       Undecided   Unassigned
  Focal                    Invalid       Undecided   Unassigned
  Jammy                    Fix Released  Undecided   koba
  Kinetic                  Invalid       Undecided   Unassigned

Bug Description

[Impact]
Ubuntu 22.04 installation crashes on our Intel Sapphire Rapids prototype server.
The console logs are attached.

Disabling the VT-d option in the BIOS settings currently appears to mitigate the issue.
The console logs indicate a problem in the iommu/dmar subsystem.

[Fix]
The IOMMU driver shares the PASID table among PCI alias devices. Once the
RID2PASID entry of the shared PASID table has been filled by the first
device, every subsequent device hits the "DMAR: Setup RID2PASID failed"
error, because the PASID entry is already marked as present.
As a result, the IOMMU probing process is aborted.

Conversely, when any alias device is hot-removed from the system
(for example, by writing to /sys/bus/pci/devices/.../remove), the shared
RID2PASID entry is cleared without notifying the other devices.
As a result, DMA from the remaining devices is blocked.

Sharing the PASID table among PCI alias devices could save two memory pages
for devices behind PCIe-to-PCI bridges. However, such devices are rare on
modern platforms that support VT-d in scalable mode, and the saved memory
is negligible, so it is reasonable to remove this immature code to make
the driver workable and stable.

[Test Case]
1. On the target machine (Intel Sapphire Rapids), install the kernel with the fix.
2. Boot the target machine.
3. Check dmesg for the following error message; with the fix applied, it should not appear:
[ 8.120527] pci 0000:03:01.0: DMAR: Setup RID2PASID failed

[Where problems could occur]
With intel_iommu enabled, other errors may still occur.
Any related errors triggered in the future will need to be investigated one by one.


koba (kobako)
summary: - intel_iommu: Fixes enable intel_iommu, Ubuntu 22.04 installation crashes
+ Fixes intel_iommu: enable intel_iommu, Ubuntu 22.04 installation crashes
no longer affects: linux-oem-5.10 (Ubuntu)
no longer affects: linux-oem-5.10 (Ubuntu Focal)
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote: Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1982104

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Jammy):
status: New → Incomplete
koba (kobako)
no longer affects: linux-oem-5.10 (Ubuntu Jammy)
no longer affects: linux-oem-5.10 (Ubuntu Kinetic)
Changed in linux (Ubuntu Jammy):
assignee: nobody → koba (kobako)
status: Incomplete → In Progress
Changed in linux (Ubuntu Kinetic):
assignee: nobody → koba (kobako)
status: Incomplete → In Progress
Changed in linux-oem-5.14 (Ubuntu Focal):
assignee: nobody → koba (kobako)
status: New → In Progress
Changed in linux-oem-5.17 (Ubuntu Jammy):
assignee: nobody → koba (kobako)
status: New → In Progress
koba (kobako)
summary: - Fixes intel_iommu: enable intel_iommu, Ubuntu 22.04 installation crashes
+ Fix intel_iommu: enable intel_iommu, Ubuntu 22.04 installation crashes
summary: - Fix intel_iommu: enable intel_iommu, Ubuntu 22.04 installation crashes
+ intel_iommu: Fix enable intel_iommu, Ubuntu 22.04 installation crashes
koba (kobako)
tags: added: dellserver oem-priority originate-from-1973127
no longer affects: linux (Ubuntu Kinetic)
koba (kobako)
Changed in linux (Ubuntu):
assignee: koba (kobako) → nobody
status: In Progress → Invalid
AceLan Kao (acelankao)
Changed in linux-oem-5.17 (Ubuntu Jammy):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Focal):
status: New → Invalid
Changed in linux-oem-5.14 (Ubuntu Jammy):
status: New → Invalid
Changed in linux-oem-5.14 (Ubuntu Kinetic):
status: New → Invalid
Changed in linux-oem-5.17 (Ubuntu Focal):
status: New → Invalid
Changed in linux-oem-5.17 (Ubuntu Kinetic):
status: New → Invalid
Changed in linux-oem-5.14 (Ubuntu Focal):
status: In Progress → Fix Committed
Stefan Bader (smb)
Changed in linux (Ubuntu Jammy):
importance: Undecided → Medium
description: updated
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote:

This bug is awaiting verification that the linux-oem-5.17/5.17.0-1014.15 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-jammy
koba (kobako)
tags: added: verification-done-jammy
removed: verification-needed-jammy
Launchpad Janitor (janitor) wrote:

This bug was fixed in the package linux-oem-5.17 - 5.17.0-1014.15

---------------
linux-oem-5.17 (5.17.0-1014.15) jammy; urgency=medium

  * jammy/linux-oem-5.17: 5.17.0-1014.15 -proposed tracker (LP: #1981244)

  * Clear PCI errors left from BIOS (LP: #1981173)
    - PCI: Clear PCI_STATUS when setting up device

  * intel_iommu: Fix enable intel_iommu, Ubuntu 22.04 installation crashes
    (LP: #1982104)
    - iommu/vt-d: Fix RID2PASID setup/teardown failure

  * Failed to resume from S3 blocked by atlantic driver[1d6a:94c0]
    (LP: #1981950)
    - net: atlantic: remove deep parameter on suspend/resume functions
    - net: atlantic: remove aq_nic_deinit() when resume

  * Make cm32181 sensor work after system suspend (LP: #1981773)
    - iio: light: cm32181: Add PM support

  * alsa: asoc: amd: the internal mic can't be detected on yellow carp machines
    (LP: #1980700)
    - ASoC: amd: Add support for enabling DMIC on acp6x via _DSD

  * CVE-2022-34918
    - netfilter: nf_tables: stricter validation of element data

  * System freeze after resuming from suspend due to PCI ASPM settings
    (LP: #1980829)
    - PCI/ASPM: Save/restore L1SS Capability for suspend/resume
    - PCI:ASPM: Remove pcie_aspm_pm_state_change()

 -- Chia-Lin Kao (AceLan) <email address hidden> Tue, 19 Jul 2022 22:07:45 +0800

Changed in linux-oem-5.17 (Ubuntu Jammy):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote:

This bug is awaiting verification that the linux-oem-5.14/5.14.0-1047.54 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
koba (kobako)
tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote:

This bug was fixed in the package linux-oem-5.14 - 5.14.0-1047.54

---------------
linux-oem-5.14 (5.14.0-1047.54) focal; urgency=medium

  * focal/linux-oem-5.14: 5.14.0-1047.54 -proposed tracker (LP: #1981285)

  * intel_iommu: Fix enable intel_iommu, Ubuntu 22.04 installation crashes
    (LP: #1982104)
    - iommu/vt-d: Fix RID2PASID setup/teardown failure

  * Failed to resume from S3 blocked by atlantic driver[1d6a:94c0]
    (LP: #1981950)
    - net: atlantic: remove deep parameter on suspend/resume functions
    - net: atlantic: remove aq_nic_deinit() when resume

  * Make cm32181 sensor work after system suspend (LP: #1981773)
    - iio: light: cm32181: Add PM support

  * Clear PCI errors left from BIOS (LP: #1981173)
    - PCI: Clear PCI_STATUS when setting up device

  * Miscellaneous Ubuntu changes
    - [Config] Drop CONFIG_PAHOLE_HAS_SPLIT_BTF again

 -- Timo Aaltonen <email address hidden> Tue, 26 Jul 2022 14:10:52 +0300

Changed in linux-oem-5.14 (Ubuntu Focal):
status: Fix Committed → Fix Released
Stefan Bader (smb)
Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Committed
Timo Aaltonen (tjaalton)
Changed in hwe-next:
importance: Undecided → Critical
status: New → Fix Committed
Launchpad Janitor (janitor) wrote:

This bug was fixed in the package linux - 5.15.0-47.51

---------------
linux (5.15.0-47.51) jammy; urgency=medium

  * jammy/linux: 5.15.0-47.51 -proposed tracker (LP: #1983903)

  * Jammy update: v5.15.46 upstream stable release (LP: #1981864)
    - UBUNTU: [Packaging] Move python3-dev to build-depends

  * touchpad and touchscreen doesn't work at all on ACER Spin 5 (SP513-54N)
    (LP: #1884232)
    - x86/PCI: Eliminate remove_e820_regions() common subexpressions
    - x86: Log resource clipping for E820 regions
    - x86/PCI: Clip only host bridge windows for E820 regions
    - x86/PCI: Add kernel cmdline options to use/ignore E820 reserved regions
    - x86/PCI: Disable E820 reserved region clipping via quirks
    - x86/PCI: Revert "x86/PCI: Clip only host bridge windows for E820 regions"

  * [SRU][H/OEM-5.13/OEM-5.14/U][J/OEM-5.17/U] Fix invalid MAC address after
    hotplug tbt dock (LP: #1942999)
    - SAUCE: igc: wait for the MAC copy when enabled MAC passthrough

  * Mass Storage Gadget driver truncates device >2TB (LP: #1981390)
    - usb: gadget: storage: add support for media larger than 2T

  * AMD Rembrandt: DP tunneling fails with Thunderbolt monitors (LP: #1983143)
    - SAUCE: drm/amd: Fix DP Tunneling with Thunderbolt monitors
    - drm/amd/display: Fix for dmub outbox notification enable
    - Revert "drm/amd/display: Fix DPIA outbox timeout after S3/S4/reset"
    - drm/amd/display: Reset link encoder assignments for GPU reset
    - drm/amd/display: Fix DPIA outbox timeout after S3/S4/reset
    - drm/amd/display: Fix new dmub notification enabling in DM
    - SAUCE: thunderbolt: Add DP out resource when DP tunnel is discovered.

  * Fix sub-optimal I210 network speed (LP: #1976438)
    - igb: Make DMA faster when CPU is active on the PCIe link

  * e1000e report hardware hang (LP: #1973104)
    - e1000e: Enable GPT clock before sending message to CSME
    - Revert "e1000e: Fix possible HW unit hang after an s0ix exit"

  * ioam6.sh in net from ubuntu_kernel_selftests fails with 5.15 kernels in
    Focal (LP: #1982930)
    - selftests: net: fix IOAM test skip return code

  * Additional fix for TGL + AUO panel flickering (LP: #1983297)
    - Revert "UBUNTU: SAUCE: drm/i915/display/psr: Fix flicker on TGL + AUO panel"
    - drm/i915/display: Fix sel fetch plane offset calculation
    - drm/i915: Nuke ORIGIN_GTT
    - drm/i915/display: Drop PSR support from HSW and BDW
    - drm/i915/display/psr: Handle plane and pipe restrictions at every page flip
    - drm/i915/display/psr: Do full fetch when handling multi-planar formats
    - drm/i915/display: Drop unnecessary frontbuffer flushes
    - drm/i915/display: Handle frontbuffer rendering when PSR2 selective fetch is
      enabled
    - drm/i915/display: Fix glitches when moving cursor with PSR2 selective fetch
      enabled
    - SAUCE: drm/i915/display/psr: Reinstate fix for TGL + AUO panel flicker

  * AMD Yellow Carp DMCUB fw update for s0i3 B0 fixes (LP: #1957026)
    - drm/amd/display: Optimize bandwidth on following fast update
    - drm/amd/display: Fix surface optimization regression on Carrizo
    - drm/amd/display: Reset DMCUB before HW init

  * GPIO character devi...

Changed in linux (Ubuntu Jammy):
status: Fix Committed → Fix Released
Timo Aaltonen (tjaalton)
Changed in hwe-next:
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: Invalid → Fix Released
tags: added: fixed-kinetic