System freeze after resuming from suspend due to PCI ASPM settings

Bug #1980829 reported by AceLan Kao
This bug affects 1 person
Affects                    Status        Importance  Assigned to  Milestone
HWE Next                   Fix Released  Critical    Unassigned
linux (Ubuntu)             Fix Released  Undecided   AceLan Kao
  Focal                    Invalid       Undecided   Unassigned
  Jammy                    Fix Released  Medium      AceLan Kao
  Kinetic                  Fix Released  Undecided   AceLan Kao
linux-oem-5.14 (Ubuntu)    Invalid       Undecided   Unassigned
  Focal                    Fix Released  Undecided   AceLan Kao
  Jammy                    Invalid       Undecided   Unassigned
  Kinetic                  Invalid       Undecided   Unassigned
linux-oem-5.17 (Ubuntu)    Invalid       Undecided   Unassigned
  Focal                    Invalid       Undecided   Unassigned
  Jammy                    Fix Released  Undecided   AceLan Kao
  Kinetic                  Invalid       Undecided   Unassigned
linux-oem-6.0 (Ubuntu)     Invalid       Undecided   Unassigned
  Focal                    Invalid       Undecided   Unassigned
  Jammy                    Fix Released  Undecided   Unassigned
  Kinetic                  Invalid       Undecided   Unassigned
linux-oem-6.1 (Ubuntu)     Invalid       Undecided   Unassigned
  Focal                    Invalid       Undecided   Unassigned
  Jammy                    Fix Released  Undecided   AceLan Kao
  Kinetic                  Invalid       Undecided   Unassigned

Bug Description

For OEM-6.1
[Impact]
The system hangs during tests such as suspend/resume or CPU stress tests.

[Fix]
The commit below fixes the issue, but it is not going to be merged into mainline.
The patch is still under discussion upstream and has other variants. We have already carried the original patch in oem-6.0 and 5.15/5.19 for a year, so it can be considered the safer choice for us.
https://patchwork.<email address hidden>/

I also created a DMI quirk so that the patches only affect the listed platforms.

[Test]
With the fix applied, the affected machines suspend and resume without hanging.

[Where problems could occur]
The patches only affect the listed platforms and won't affect other platforms.
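
To illustrate how a DMI quirk limits a change to the listed platforms, here is a minimal user-space sketch in Python. It is not the kernel patch: the allowlist entries are hypothetical placeholders (the real platform list is in the SAUCE commit and is not reproduced in this report), and the substring matching is only meant to be similar in spirit to the kernel's DMI_MATCH. It assumes the DMI strings are exposed under /sys/class/dmi/id.

```python
from pathlib import Path

# Hypothetical allowlist for illustration only; the real platform list
# lives in the SAUCE patch and is not reproduced in this bug report.
ALLOWLIST = [
    {"sys_vendor": "Example Vendor", "product_name": "Example Model 1234"},
]


def read_dmi(dmi_dir="/sys/class/dmi/id"):
    """Return the DMI fields used for matching, or {} if they are unavailable."""
    fields = {}
    for name in ("sys_vendor", "product_name"):
        try:
            fields[name] = (Path(dmi_dir) / name).read_text().strip()
        except OSError:
            # Field missing or unreadable: leave it out of the result.
            pass
    return fields


def platform_is_listed(dmi, allowlist=ALLOWLIST):
    """True if every field of some allowlist entry is a substring of the DMI value."""
    return any(
        all(want in dmi.get(key, "") for key, want in entry.items())
        for entry in allowlist
    )


if __name__ == "__main__":
    # On a machine that is not listed, the quirk would leave behaviour unchanged.
    print("apply L1SS save/restore:", platform_is_listed(read_dmi()))
```

Because the check fails closed (an unreadable or non-matching DMI field means "not listed"), machines outside the list keep their existing behaviour, which is the property the [Where problems could occur] section relies on.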

======================================================================
For Jammy/Kinetic SRU

[Impact]
The system hangs during tests such as suspend/resume or CPU stress tests.

[Fix]
The following 2 commits fix the issue, but they have not been accepted upstream yet.
https://patchwork.<email address hidden>/
https://patchwork.ozlabs.org<email address hidden>/

So I created a DMI quirk so that the patches only affect the listed platforms.

[Test]
Verified on the failing machines; the ODM also verified on their side.

[Where problems could occur]
The patches only affect the listed platforms and won't affect other platforms.

======================================================================
For OEM-6.0

[Impact]
The system hangs during tests such as suspend/resume or CPU stress tests.

[Fix]
The following 2 commits fix the issue, but they have not been accepted upstream yet.
https://patchwork.<email address hidden>/
https://patchwork.ozlabs.org<email address hidden>/

[Test]
Verified on the failing machines; the ODM also verified on their side.

[Where problems could occur]
The 2 patches look pretty safe to me; they try to preserve the ASPM state of devices.
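
One way to check whether the ASPM state of devices is preserved is to compare `lspci -vv` output captured before suspend and after resume. The Python sketch below does that comparison. It assumes the usual pciutils line shapes (`LnkCtl: ASPM ...;` and `L1SubCtl1: ...`), and the sample text is made up for illustration, not taken from an affected machine.

```python
import re


def aspm_snapshot(lspci_text):
    """Map each device address to its LnkCtl ASPM setting and L1SubCtl1 flags."""
    snapshot = {}
    device = None
    for line in lspci_text.splitlines():
        # Device header lines start in column 0; capability lines are indented.
        if line and not line[0].isspace():
            device = line.split()[0]
            snapshot[device] = {}
            continue
        if device is None:
            continue
        m = re.search(r"LnkCtl:\s*ASPM ([^;]+);", line)
        if m:
            snapshot[device]["aspm"] = m.group(1).strip()
        m = re.search(r"L1SubCtl1:\s*(.+)", line)
        if m:
            snapshot[device]["l1ss"] = m.group(1).strip()
    return snapshot


def changed_devices(before, after):
    """Return devices whose recorded ASPM/L1SS state differs between snapshots."""
    return {
        dev: (before[dev], after.get(dev))
        for dev in before
        if before[dev] != after.get(dev)
    }


# Made-up sample output: L1 substates enabled before suspend, lost after resume.
BEFORE = (
    "01:00.0 Non-Volatile memory controller: Example NVMe device\n"
    "\t\tLnkCtl:\tASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+\n"
    "\t\tL1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+\n"
)
AFTER = BEFORE.replace(
    "PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+",
    "PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-",
)

if __name__ == "__main__":
    for dev, (old, new) in changed_devices(
        aspm_snapshot(BEFORE), aspm_snapshot(AFTER)
    ).items():
        print(dev, "before:", old, "after:", new)
```

On a real machine, the two inputs would be the text of `lspci -vv` saved before suspend and after resume (root is typically needed for the capability lines to be shown). An empty result from changed_devices() means the recorded link state survived the suspend cycle.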

AceLan Kao (acelankao)
Changed in linux-oem-5.14 (Ubuntu Jammy):
status: New → Invalid
Changed in linux-oem-5.14 (Ubuntu Kinetic):
status: New → Invalid
Changed in linux-oem-5.17 (Ubuntu Focal):
status: New → Invalid
Changed in linux-oem-5.17 (Ubuntu Kinetic):
status: New → Invalid
Changed in linux-oem-5.14 (Ubuntu Focal):
assignee: nobody → AceLan Kao (acelankao)
status: New → In Progress
Changed in linux-oem-5.17 (Ubuntu Jammy):
assignee: nobody → AceLan Kao (acelankao)
status: New → In Progress
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1980829

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Focal):
status: New → Incomplete
Changed in linux (Ubuntu Jammy):
status: New → Incomplete
AceLan Kao (acelankao)
description: updated
Changed in linux (Ubuntu Focal):
status: Incomplete → Invalid
Changed in linux (Ubuntu Jammy):
assignee: nobody → AceLan Kao (acelankao)
status: Incomplete → In Progress
Changed in linux (Ubuntu Kinetic):
assignee: nobody → AceLan Kao (acelankao)
status: Incomplete → In Progress
AceLan Kao (acelankao)
tags: added: oem-priority originate-from-1978453 somerville
tags: added: originate-from-1978472
Timo Aaltonen (tjaalton)
Changed in hwe-next:
importance: Undecided → Critical
status: New → In Progress
Stefan Bader (smb)
Changed in linux (Ubuntu Jammy):
importance: Undecided → Medium
Timo Aaltonen (tjaalton)
Changed in linux-oem-5.14 (Ubuntu Focal):
status: In Progress → Fix Committed
Changed in linux-oem-5.17 (Ubuntu Jammy):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-oem-5.14/5.14.0-1046.53 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-oem-5.17/5.17.0-1014.15 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-jammy
AceLan Kao (acelankao)
tags: added: verification-done-focal verification-done-jammy
removed: verification-needed-focal verification-needed-jammy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-5.14 - 5.14.0-1046.53

---------------
linux-oem-5.14 (5.14.0-1046.53) focal; urgency=medium

  * focal/linux-oem-5.14: 5.14.0-1046.53 -proposed tracker (LP: #1980928)

  * alsa: asoc: amd: the internal mic can't be dedected on yellow carp machines
    (LP: #1980700)
    - ASoC: amd: Add driver data to acp6x machine driver
    - ASoC: amd: Add support for enabling DMIC on acp6x via _DSD

  * CVE-2022-34918
    - netfilter: nf_tables: stricter validation of element data

  * System freeze after resuming from suspend due to PCI ASPM settings
    (LP: #1980829)
    - PCI/ASPM: Save/restore L1SS Capability for suspend/resume
    - PCI:ASPM: Remove pcie_aspm_pm_state_change()

 -- Chia-Lin Kao (AceLan) <email address hidden> Wed, 13 Jul 2022 21:02:35 +0800

Changed in linux-oem-5.14 (Ubuntu Focal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-5.17 - 5.17.0-1014.15

---------------
linux-oem-5.17 (5.17.0-1014.15) jammy; urgency=medium

  * jammy/linux-oem-5.17: 5.17.0-1014.15 -proposed tracker (LP: #1981244)

  * Clear PCI errors left from BIOS (LP: #1981173)
    - PCI: Clear PCI_STATUS when setting up device

  * intel_iommu: Fix enable intel_iommu, Ubuntu 22.04 installation crashes
    (LP: #1982104)
    - iommu/vt-d: Fix RID2PASID setup/teardown failure

  * Failed to resume from S3 blocked by atlantic driver[1d6a:94c0]
    (LP: #1981950)
    - net: atlantic: remove deep parameter on suspend/resume functions
    - net: atlantic: remove aq_nic_deinit() when resume

  * Make cm32181 sensor work after system suspend (LP: #1981773)
    - iio: light: cm32181: Add PM support

  * alsa: asoc: amd: the internal mic can't be dedected on yellow carp machines
    (LP: #1980700)
    - ASoC: amd: Add support for enabling DMIC on acp6x via _DSD

  * CVE-2022-34918
    - netfilter: nf_tables: stricter validation of element data

  * System freeze after resuming from suspend due to PCI ASPM settings
    (LP: #1980829)
    - PCI/ASPM: Save/restore L1SS Capability for suspend/resume
    - PCI:ASPM: Remove pcie_aspm_pm_state_change()

 -- Chia-Lin Kao (AceLan) <email address hidden> Tue, 19 Jul 2022 22:07:45 +0800

Changed in linux-oem-5.17 (Ubuntu Jammy):
status: Fix Committed → Fix Released
AceLan Kao (acelankao)
description: updated
description: updated
Stefan Bader (smb)
Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Committed
AceLan Kao (acelankao)
tags: added: originate-from-1985043
Timo Aaltonen (tjaalton)
Changed in hwe-next:
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.15.0-48.54

---------------
linux (5.15.0-48.54) jammy; urgency=medium

  * jammy/linux: 5.15.0-48.54 -proposed tracker (LP: #1987775)

  * System freeze after resuming from suspend due to PCI ASPM settings
    (LP: #1980829)
    - SAUCE: PCI/ASPM: Save/restore L1SS Capability for suspend/resume
    - SAUCE: whitelist platforms that needs save/restore ASPM L1SS for
      suspend/resume

  * [SRU][J/OEM-5.17][PATCH 0/1] Fix oled brightness set above frame-average
    luminance (LP: #1978986)
    - SAUCE: drm: New function to get luminance range based on static hdr metadata
    - SAUCE: drm/amdgpu_dm: Rely on split out luminance calculation function
    - SAUCE: drm/i915: Use luminance range calculated during edid parsing

  * Jammy: Add OVS Internal Port HW Offload to mlx5 driver (LP: #1983498)
    - net/mlx5e: Refactor rx handler of represetor device
    - net/mlx5e: Use generic name for the forwarding dev pointer
    - net/mlx5: E-Switch, Add ovs internal port mapping to metadata support
    - net/mlx5e: Support accept action
    - net/mlx5e: Accept action skbedit in the tc actions list
    - net/mlx5e: Offload tc rules that redirect to ovs internal port
    - net/mlx5e: Offload internal port as encap route device
    - net/mlx5e: Enable TC offload for ingress MACVLAN
    - net/mlx5e: Add indirect tc offload of ovs internal port
    - net/mlx5e: Term table handling of internal port rules
    - net/mlx5: Support internal port as decap route device
    - net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'
    - net/mlx5e: TC, Fix memory leak with rules with internal port
    - net/mlx5e: Fix skb memory leak when TC classifier action offloads are
      disabled
    - net/mlx5e: Fix nullptr on deleting mirroring rule
    - net/mlx5e: Avoid implicit modify hdr for decap drop rule
    - net/mlx5e: Fix wrong source vport matching on tunnel rule
    - net/mlx5e: TC, fix decap fallback to uplink when int port not supported

  * Remove unused variable from i915 psr (LP: #1986798)
    - SAUCE: drm/i915/display/psr: Remove unused variable

  * refactoring of overlayfs fix to properly support shiftfs (LP: #1983640)
    - SAUCE: overlayfs: remove CONFIG_AUFS_FS dependency

  * Jammy update: v5.15.53 upstream stable release (LP: #1986728)
    - Revert "drm/amdgpu/display: set vblank_disable_immediate for DC"
    - drm/amdgpu: To flush tlb for MMHUB of RAVEN series
    - ksmbd: set the range of bytes to zero without extending file size in
      FSCTL_ZERO_DATA
    - ksmbd: check invalid FileOffset and BeyondFinalZero in FSCTL_ZERO_DATA
    - ksmbd: use vfs_llseek instead of dereferencing NULL
    - ipv6: take care of disable_policy when restoring routes
    - net: phy: Don't trigger state machine while in suspend
    - nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG SX6000LNP (AKA SPECTRIX
      S40G)
    - nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA IM2P33F8ABR1
    - nvdimm: Fix badblocks clear off-by-one error
    - powerpc/prom_init: Fix kernel config grep
    - powerpc/book3e: Fix PUD allocation size in map_kernel_page()
    - powerpc/bpf: Fix use of user_pt_regs in uapi
    - dm raid: fix ...

Changed in linux (Ubuntu Jammy):
status: Fix Committed → Fix Released
Timo Aaltonen (tjaalton)
Changed in hwe-next:
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.19.0-18.18

---------------
linux (5.19.0-18.18) kinetic; urgency=medium

  * kinetic/linux: 5.19.0-18.18 -proposed tracker (LP: #1990366)

  * 5.19.0-17.17: kernel NULL pointer dereference, address: 0000000000000084
    (LP: #1990236)
    - Revert "UBUNTU: SAUCE: apparmor: Fix regression in stacking due to label
      flags"
    - Revert "UBUNTU: [Config] disable SECURITY_APPARMOR_RESTRICT_USERNS"
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - add an internal buffer""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - don't wait on cleanup""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - don't waste entropy""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - always add a pending
      request""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - unregister device before
      reset""
    - Revert "UBUNTU: SAUCE: Revert "virtio-rng: make device ready before making
      request""
    - Revert "UBUNTU: [Config] update configs after apply new apparmor patch set"
    - Revert "UBUNTU: SAUCE: apparmor: add user namespace creation mediation"
    - Revert "UBUNTU: SAUCE: selinux: Implement userns_create hook"
    - Revert "UBUNTU: SAUCE: bpf-lsm: Make bpf_lsm_userns_create() sleepable"
    - Revert "UBUNTU: SAUCE: security, lsm: Introduce security_create_user_ns()"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: AppArmor: Remove the exclusive
      flag"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Add /proc attr entry for full
      LSM context"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Removed scaffolding function
      lsmcontext_init"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: netlabel: Use a struct lsmblob in
      audit data"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Add record for multiple
      object contexts"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: audit: multiple subject lsm values
      for netlabel"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Add record for multiple task
      security contexts"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Allow multiple records in an
      audit_buffer"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Add a function to report
      multiple LSMs"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Create audit_stamp
      structure"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Keep multiple LSM data in
      audit_names"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: security_secid_to_secctx
      module selection"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: binder: Pass LSM identifier for
      confirmation"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: NET: Store LSM netlabel data in a
      lsmblob"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: security_secid_to_secctx in
      netlink netfilter"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Use lsmcontext in
      security_dentry_init_security"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Use lsmcontext in
      security_inode_getsecctx"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Use lsmcontext in
      security_secid_to_secctx"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM:...

Changed in linux (Ubuntu Kinetic):
status: In Progress → Fix Released
Timo Aaltonen (tjaalton)
Changed in linux-oem-6.0 (Ubuntu Focal):
status: New → Invalid
Changed in linux-oem-6.0 (Ubuntu Kinetic):
status: New → Invalid
Changed in linux-oem-6.0 (Ubuntu Jammy):
status: New → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-bluefield/5.15.0-1010.12 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-bluefield verification-needed-jammy
removed: verification-done-jammy
AceLan Kao (acelankao)
description: updated
AceLan Kao (acelankao)
Changed in linux-oem-6.1 (Ubuntu Focal):
assignee: nobody → AceLan Kao (acelankao)
status: New → In Progress
Changed in linux-oem-6.1 (Ubuntu Jammy):
assignee: nobody → AceLan Kao (acelankao)
status: New → In Progress
Changed in linux-oem-6.1 (Ubuntu Focal):
assignee: AceLan Kao (acelankao) → nobody
status: In Progress → Invalid
Changed in linux-oem-6.1 (Ubuntu Kinetic):
status: New → Invalid
Timo Aaltonen (tjaalton)
Changed in linux-oem-6.1 (Ubuntu):
status: New → Invalid
Timo Aaltonen (tjaalton) wrote :

please verify oem-6.1 1020.20

Changed in linux-oem-6.1 (Ubuntu Jammy):
status: In Progress → Fix Committed
AceLan Kao (acelankao) wrote :

No machine is available to verify the issue, so I only checked that the code has been merged.

tags: added: verification-done-jammy
removed: verification-needed-jammy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-6.1 - 6.1.0-1020.20

---------------
linux-oem-6.1 (6.1.0-1020.20) jammy; urgency=medium

  * jammy/linux-oem-6.1: 6.1.0-1020.20 -proposed tracker (LP: #2030594)

  * CVE-2022-40982
    - init: Provide arch_cpu_finalize_init()
    - x86/cpu: Switch to arch_cpu_finalize_init()
    - ARM: cpu: Switch to arch_cpu_finalize_init()
    - ia64/cpu: Switch to arch_cpu_finalize_init()
    - loongarch/cpu: Switch to arch_cpu_finalize_init()
    - m68k/cpu: Switch to arch_cpu_finalize_init()
    - mips/cpu: Switch to arch_cpu_finalize_init()
    - sh/cpu: Switch to arch_cpu_finalize_init()
    - sparc/cpu: Switch to arch_cpu_finalize_init()
    - um/cpu: Switch to arch_cpu_finalize_init()
    - init: Remove check_bugs() leftovers
    - init: Invoke arch_cpu_finalize_init() earlier
    - init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init()
    - x86/init: Initialize signal frame size late
    - x86/fpu: Remove cpuinfo argument from init functions
    - x86/fpu: Mark init functions __init
    - x86/fpu: Move FPU initialization into arch_cpu_finalize_init()
    - x86/speculation: Add Gather Data Sampling mitigation
    - x86/speculation: Add force option to GDS mitigation
    - x86/speculation: Add Kconfig option for GDS
    - KVM: Add GDS_NO support to KVM
    - x86/mem_encrypt: Unbreak the AMD_MEM_ENCRYPT=n build
    - x86/xen: Fix secondary processors' FPU initialization
    - x86/mm: fix poking_init() for Xen PV guests
    - x86/mm: Use mm_alloc() in poking_init()
    - mm: Move mm_cachep initialization to mm_init()
    - x86/mm: Initialize text poking earlier
    - Documentation/x86: Fix backwards on/off logic about YMM support
    - [Config]: Enable CONFIG_ARCH_HAS_CPU_FINALIZE_INIT

  * System freeze after resuming from suspend due to PCI ASPM settings
    (LP: #1980829)
    - SAUCE: PCI/ASPM: Save/restore L1SS Capability for suspend/resume
    - SAUCE: whitelist platforms that needs save/restore ASPM L1SS for
      suspend/resume

  * CVE-2023-20593
    - x86/cpu/amd: Move the errata checking functionality up
    - x86/cpu/amd: Add a Zenbleed fix

  * Fix repeated errors of blacklisting during bootup (LP: #2029363)
    - certs: make blacklisted hash available in klog
    - KEYS: Add new function key_create()
    - certs: don't try to update blacklist keys

  * Fix AMD gpu hang when screen off/on (LP: #2028740)
    - drm/amd/display: Keep PHY active for dp config

  * CVE-2023-4015
    - netfilter: nf_tables: skip immediate deactivate in _PREPARE_ERROR

  * CVE-2023-3995
    - netfilter: nf_tables: disallow rule addition to bound chain via
      NFTA_RULE_CHAIN_ID

  * CVE-2023-3777
    - netfilter: nf_tables: skip bound chain on rule flush

  * CVE-2023-4004
    - netfilter: nft_set_pipapo: fix improper element removal

 -- Timo Aaltonen <email address hidden> Wed, 16 Aug 2023 15:20:53 +0300

Changed in linux-oem-6.1 (Ubuntu Jammy):
status: Fix Committed → Fix Released