A deadlock issue in scsi rescan task while resuming from S3

Bug #2018566 reported by AceLan Kao
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
HWE Next
New
Undecided
Unassigned
linux (Ubuntu)
Status tracked in Mantic
Jammy
Fix Released
Medium
AceLan Kao
Lunar
Fix Released
Medium
AceLan Kao
Mantic
Fix Released
Medium
AceLan Kao
linux-oem-6.0 (Ubuntu)
Status tracked in Mantic
Jammy
Won't Fix
Undecided
AceLan Kao
Lunar
Invalid
Undecided
Unassigned
Mantic
Invalid
Undecided
Unassigned
linux-oem-6.1 (Ubuntu)
Status tracked in Mantic
Jammy
Fix Released
Undecided
AceLan Kao
Lunar
Invalid
Undecided
Unassigned
Mantic
Invalid
Undecided
Unassigned

Bug Description

[Impact]
During the S3 stress test, the system sometimes hangs when resuming. This is due to the SCSI rescan task being unable to acquire the mutex lock during the resumption from S3. The mutex lock has already been acquired by EH and is waiting for the device to be ready for a rescan. Unfortunately, the mutex lock is never released by either party, leading to a deadlock.

[Fix]
Kaiheng submitted a patch to fix this issue which defers the rescan if the disk is still suspended so the resume process of the disk device can proceed.
https://patchwork.ozlabs.org<email address hidden>/

Since the patch has not been accepted by the upstream yet, so submit it to the OEM kernel for now.

The similiar patch has been included in v6.4-rc7
6aa0365a3c85 ata: libata-scsi: Avoid deadlock on rescan after device resume

[Test]
Verified on the machines by me and ODM.

[Where problems could occur]
It only defers the rescan task, and should not have any impact to current systems.

AceLan Kao (acelankao)
Changed in linux-oem-6.1 (Ubuntu Lunar):
status: New → Invalid
Changed in linux-oem-6.1 (Ubuntu Mantic):
status: New → Invalid
Changed in linux-oem-6.1 (Ubuntu Jammy):
status: New → In Progress
assignee: nobody → AceLan Kao (acelankao)
Changed in linux (Ubuntu Mantic):
assignee: nobody → AceLan Kao (acelankao)
Changed in linux (Ubuntu Lunar):
assignee: nobody → AceLan Kao (acelankao)
Changed in linux (Ubuntu Jammy):
assignee: nobody → AceLan Kao (acelankao)
Changed in linux (Ubuntu Mantic):
status: New → In Progress
Changed in linux (Ubuntu Lunar):
status: New → In Progress
Changed in linux (Ubuntu Jammy):
status: New → In Progress
tags: added: oem-priority originate-from-1999593 somerville
summary: - A race condiction issue in scsi rescan task while resuming from S3
+ A deadlock issue in scsi rescan task while resuming from S3
AceLan Kao (acelankao)
Changed in linux-oem-6.0 (Ubuntu Jammy):
assignee: nobody → AceLan Kao (acelankao)
status: New → In Progress
Timo Aaltonen (tjaalton)
tags: added: verification-needed-jammy
Changed in linux-oem-6.1 (Ubuntu Jammy):
status: In Progress → Fix Committed
AceLan Kao (acelankao)
Changed in linux-oem-6.0 (Ubuntu Lunar):
status: New → Invalid
Changed in linux-oem-6.0 (Ubuntu Mantic):
status: New → Invalid
Revision history for this message
AceLan Kao (acelankao) wrote :

Verified with 6.1.0-1012-oem

tags: added: verification-done-jammy
removed: verification-needed-jammy
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-6.1 - 6.1.0-1012.12

---------------
linux-oem-6.1 (6.1.0-1012.12) jammy; urgency=medium

  * jammy/linux-oem-6.1: 6.1.0-1012.12 -proposed tracker (LP: #2018993)

  * A deadlock issue in scsi rescan task while resuming from S3 (LP: #2018566)
    - SAUCE: PM: suspend: Define pm_suspend_target_state
    - SAUCE: ata: libata: Defer rescan on suspended device

  * Miscellaneous Ubuntu changes
    - [Packaging] actually enforce set -e in dkms-build--nvidia-N
    - [Packaging] Preserve the correct log file variable value

 -- Timo Aaltonen <email address hidden> Tue, 09 May 2023 15:46:57 +0300

Changed in linux-oem-6.1 (Ubuntu Jammy):
status: Fix Committed → Fix Released
Revision history for this message
Timo Aaltonen (tjaalton) wrote :

the platform that needs this already moved to 6.1, so closing for 6.0

Changed in linux-oem-6.0 (Ubuntu Jammy):
status: In Progress → Won't Fix
AceLan Kao (acelankao)
description: updated
Stefan Bader (smb)
Changed in linux (Ubuntu Jammy):
importance: Undecided → Medium
Changed in linux (Ubuntu Lunar):
importance: Undecided → Medium
Changed in linux (Ubuntu Mantic):
importance: Undecided → Medium
Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Lunar):
status: In Progress → Fix Committed
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.15.0-79.86 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux verification-needed-jammy
removed: verification-done-jammy
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/6.2.0-27.28 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-lunar' to 'verification-done-lunar'. If the problem still exists, change the tag 'verification-needed-lunar' to 'verification-failed-lunar'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-lunar-linux verification-needed-lunar
Revision history for this message
AceLan Kao (acelankao) wrote :

Don't have machine to verify the issue now, but confirmed the commit exists in jammy and lunar kernels.

tags: added: verification-done-jammy verification-done-lunar
removed: verification-needed-jammy verification-needed-lunar
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-tegra-igx/5.15.0-1002.2 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-nvidia-tegra-igx verification-needed-jammy
removed: verification-done-jammy
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-tegra-5.15/5.15.0-1016.16~20.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-focal-linux-nvidia-tegra-5.15 verification-needed-focal
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-tegra/5.15.0-1016.16 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-nvidia-tegra
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (44.9 KiB)

This bug was fixed in the package linux - 6.2.0-27.28

---------------
linux (6.2.0-27.28) lunar; urgency=medium

  * lunar/linux: 6.2.0-27.28 -proposed tracker (LP: #2026488)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync update-dkms-versions helper
    - [Packaging] update annotations scripts

  * CVE-2023-2640 // CVE-2023-32629
    - Revert "UBUNTU: SAUCE: overlayfs: handle idmapped mounts in
      ovl_do_(set|remove)xattr"
    - Revert "UBUNTU: SAUCE: overlayfs: Skip permission checking for
      trusted.overlayfs.* xattrs"
    - SAUCE: overlayfs: default to userxattr when mounted from non initial user
      namespace

  * UNII-4 5.9G Band support request on 8852BE (LP: #2023952)
    - wifi: rtw89: 8851b: add 8851B basic chip_info
    - wifi: rtw89: introduce realtek ACPI DSM method
    - wifi: rtw89: regd: judge UNII-4 according to BIOS and chip
    - wifi: rtw89: support U-NII-4 channels on 5GHz band

  * Disable hv-kvp-daemon if /dev/vmbus/hv_kvp is not present (LP: #2024900)
    - [Packaging] disable hv-kvp-daemon if needed

  * A deadlock issue in scsi rescan task while resuming from S3 (LP: #2018566)
    - ata: libata-scsi: Avoid deadlock on rescan after device resume

  * [SRU] Intel Sapphire Rapids HBM support needs CONFIG_NUMA_EMU (LP: #2008745)
    - [Config] Intel Sapphire Rapids HBM support needs CONFIG_NUMA_EMU

  * Lunar update: v6.2.15 upstream stable release (LP: #2025067)
    - ASOC: Intel: sof_sdw: add quirk for Intel 'Rooks County' NUC M15
    - ASoC: Intel: soc-acpi: add table for Intel 'Rooks County' NUC M15
    - ASoC: soc-pcm: fix hw->formats cleared by soc_pcm_hw_init() for dpcm
    - x86/hyperv: Block root partition functionality in a Confidential VM
    - ASoC: amd: yc: Add DMI entries to support Victus by HP Laptop 16-e1xxx
      (8A22)
    - iio: adc: palmas_gpadc: fix NULL dereference on rmmod
    - ASoC: Intel: bytcr_rt5640: Add quirk for the Acer Iconia One 7 B1-750
    - ASoC: da7213.c: add missing pm_runtime_disable()
    - net: wwan: t7xx: do not compile with -Werror
    - wifi: mt76: mt7921: Fix use-after-free in fw features query.
    - selftests mount: Fix mount_setattr_test builds failed
    - scsi: mpi3mr: Handle soft reset in progress fault code (0xF002)
    - net: sfp: add quirk enabling 2500Base-x for HG MXPD-483II
    - platform/x86: thinkpad_acpi: Add missing T14s Gen1 type to s2idle quirk list
    - wifi: ath11k: reduce the MHI timeout to 20s
    - tracing: Error if a trace event has an array for a __field()
    - asm-generic/io.h: suppress endianness warnings for readq() and writeq()
    - asm-generic/io.h: suppress endianness warnings for relaxed accessors
    - x86/cpu: Add model number for Intel Arrow Lake processor
    - wifi: mt76: mt7921e: Set memory space enable in PCI_COMMAND if unset
    - ASoC: amd: ps: update the acp clock source.
    - arm64: Always load shadow stack pointer directly from the task struct
    - arm64: Stash shadow stack pointer in the task struct on interrupt
    - powerpc/boot: Fix boot wrapper code generation with CONFIG_POWER10_CPU
    - PCI: kirin: Select REGMAP_MMIO
    - PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock
    - PC...

Changed in linux (Ubuntu Lunar):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (83.7 KiB)

This bug was fixed in the package linux - 5.15.0-79.86

---------------
linux (5.15.0-79.86) jammy; urgency=medium

  * jammy/linux: 5.15.0-79.86 -proposed tracker (LP: #2026531)

  * Jammy update: v5.15.111 upstream stable release (LP: #2025095)
    - ASOC: Intel: sof_sdw: add quirk for Intel 'Rooks County' NUC M15
    - ASoC: soc-pcm: fix hw->formats cleared by soc_pcm_hw_init() for dpcm
    - x86/hyperv: Block root partition functionality in a Confidential VM
    - iio: adc: palmas_gpadc: fix NULL dereference on rmmod
    - ASoC: Intel: bytcr_rt5640: Add quirk for the Acer Iconia One 7 B1-750
    - selftests mount: Fix mount_setattr_test builds failed
    - asm-generic/io.h: suppress endianness warnings for readq() and writeq()
    - x86/cpu: Add model number for Intel Arrow Lake processor
    - wireguard: timers: cast enum limits members to int in prints
    - wifi: mt76: mt7921e: Set memory space enable in PCI_COMMAND if unset
    - arm64: Always load shadow stack pointer directly from the task struct
    - arm64: Stash shadow stack pointer in the task struct on interrupt
    - PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock
    - PCI: qcom: Fix the incorrect register usage in v2.7.0 config
    - IMA: allow/fix UML builds
    - USB: dwc3: fix runtime pm imbalance on probe errors
    - USB: dwc3: fix runtime pm imbalance on unbind
    - hwmon: (k10temp) Check range scale when CUR_TEMP register is read-write
    - hwmon: (adt7475) Use device_property APIs when configuring polarity
    - posix-cpu-timers: Implement the missing timer_wait_running callback
    - blk-mq: release crypto keyslot before reporting I/O complete
    - blk-crypto: make blk_crypto_evict_key() return void
    - blk-crypto: make blk_crypto_evict_key() more robust
    - ext4: use ext4_journal_start/stop for fast commit transactions
    - staging: iio: resolver: ads1210: fix config mode
    - tty: Prevent writing chars during tcsetattr TCSADRAIN/FLUSH
    - xhci: fix debugfs register accesses while suspended
    - tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem
    - MIPS: fw: Allow firmware to pass a empty env
    - ipmi:ssif: Add send_retries increment
    - ipmi: fix SSIF not responding under certain cond.
    - kheaders: Use array declaration instead of char
    - wifi: mt76: add missing locking to protect against concurrent rx/status
      calls
    - pwm: meson: Fix axg ao mux parents
    - pwm: meson: Fix g12a ao clk81 name
    - soundwire: qcom: correct setting ignore bit on v1.5.1
    - pinctrl: qcom: lpass-lpi: set output value before enabling output
    - ring-buffer: Sync IRQ works before buffer destruction
    - crypto: api - Demote BUG_ON() in crypto_unregister_alg() to a WARN_ON()
    - crypto: safexcel - Cleanup ring IRQ workqueues on load failure
    - rcu: Avoid stack overflow due to __rcu_irq_enter_check_tick() being kprobe-
      ed
    - reiserfs: Add security prefix to xattr name in reiserfs_security_write()
    - KVM: nVMX: Emulate NOPs in L2, and PAUSE if it's not intercepted
    - relayfs: fix out-of-bounds access in relay_file_read
    - writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs
 ...

Changed in linux (Ubuntu Jammy):
status: Fix Committed → Fix Released
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws/5.15.0-1044.49 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-aws' to 'verification-done-jammy-linux-aws'. If the problem still exists, change the tag 'verification-needed-jammy-linux-aws' to 'verification-failed-jammy-linux-aws'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-aws-v2 verification-needed-jammy-linux-aws
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws-6.2/6.2.0-1011.11~22.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-aws-6.2' to 'verification-done-jammy-linux-aws-6.2'. If the problem still exists, change the tag 'verification-needed-jammy-linux-aws-6.2' to 'verification-failed-jammy-linux-aws-6.2'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-aws-6.2-v2 verification-needed-jammy-linux-aws-6.2
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/6.2.0-1011.11 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-lunar-linux-azure' to 'verification-done-lunar-linux-azure'. If the problem still exists, change the tag 'verification-needed-lunar-linux-azure' to 'verification-failed-lunar-linux-azure'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-lunar-linux-azure-v2 verification-needed-lunar-linux-azure
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-riscv/6.2.0-31.31.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-lunar-linux-riscv' to 'verification-done-lunar-linux-riscv'. If the problem still exists, change the tag 'verification-needed-lunar-linux-riscv' to 'verification-failed-lunar-linux-riscv'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-lunar-linux-riscv-v2 verification-needed-lunar-linux-riscv
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-starfive/6.2.0-1003.3 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-lunar-linux-starfive' to 'verification-done-lunar-linux-starfive'. If the problem still exists, change the tag 'verification-needed-lunar-linux-starfive' to 'verification-failed-lunar-linux-starfive'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-lunar-linux-starfive-v2 verification-needed-lunar-linux-starfive
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/5.15.0-1046.53 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-azure' to 'verification-done-jammy-linux-azure'. If the problem still exists, change the tag 'verification-needed-jammy-linux-azure' to 'verification-failed-jammy-linux-azure'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-azure-v2 verification-needed-jammy-linux-azure
Revision history for this message
AceLan Kao (acelankao) wrote :

The similiar patch has been included in v6.4-rc7, so 6.5 kernel should include the fix.

6aa0365a3c85 ata: libata-scsi: Avoid deadlock on rescan after device resume

Changed in linux (Ubuntu Mantic):
status: In Progress → Fix Released
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws-5.15/5.15.0-1046.51~20.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal-linux-aws-5.15' to 'verification-done-focal-linux-aws-5.15'. If the problem still exists, change the tag 'verification-needed-focal-linux-aws-5.15' to 'verification-failed-focal-linux-aws-5.15'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-focal-linux-aws-5.15-v2 verification-needed-focal-linux-aws-5.15
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-xilinx-zynqmp/5.15.0-1024.28 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-xilinx-zynqmp' to 'verification-done-jammy-linux-xilinx-zynqmp'. If the problem still exists, change the tag 'verification-needed-jammy-linux-xilinx-zynqmp' to 'verification-failed-jammy-linux-xilinx-zynqmp'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-xilinx-zynqmp-v2 verification-needed-jammy-linux-xilinx-zynqmp
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.