CPU hard lockup with rigorous writes to NVMe drive
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Invalid | Medium | Mauricio Faria de Oliveira |
Bionic | Fix Released | Medium | Unassigned |
Cosmic | Fix Released | Medium | Unassigned |
Bug Description
[Impact]
* Users may experience CPU hard lockups when performing
intensive writes to NVMe drives.
* The fix addresses a scheduling issue in the original
implementation of wbt (writeback throttling).
* The fix is commit 2887e41b910b ("blk-wbt: Avoid lock
contention and thundering herd issue in wbt_wait"),
plus its fix commit 38cfb5a45ee0 ("blk-wbt: improve
waking of tasks").
* Plus a few dependency commits for each fix.
* Backports are trivial: mainly replace rq_wait_inc_below()
with the equivalent atomic_inc_below(), and keep the existing
__wbt_done() signature. Both changes are needed because these
kernels lack commit a79050434b45 ("blk-rq-qos: refactor out
common elements of blk-wbt"), which also changes a lot of
other, unrelated code.
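The backport note swaps rq_wait_inc_below() for atomic_inc_below(), a helper that increments a counter only while it is below a limit; in wbt it guards the in-flight request count. A minimal user-space sketch of that semantics, assuming C11 `<stdatomic.h>`; `inc_below()` is a hypothetical name and this is not the kernel's code:

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical user-space analogue of atomic_inc_below() (not kernel code):
 * atomically increment *v only if its current value is below `below`.
 * Returns true if this caller took a slot, false if the counter was
 * already at or above the limit.
 */
static bool inc_below(atomic_int *v, int below)
{
    int cur = atomic_load(v);

    for (;;) {
        if (cur >= below)
            return false;   /* limit reached: leave the counter alone */
        /*
         * Try to move cur -> cur + 1. On failure (another thread changed
         * the counter, or a spurious weak-CAS failure), cur is updated to
         * the latest value and the limit is re-checked.
         */
        if (atomic_compare_exchange_weak(v, &cur, cur + 1))
            return true;
    }
}
```

Because the compare-and-swap only succeeds if no other thread changed the counter in between, two CPUs can never both take the last slot below the limit.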
[Test Case]
* This command has been reported to reproduce the problem:
$ sudo iozone -R -s 5G -r 1m -S 2048 -i 0 -G -c -o -l 128 -u 128 -t 128
* It generates stack traces like the ones below on the original
kernel, and does not generate them on the patched kernel.
* The user/reporter verified that the test kernel with these
patches resolved the problem.
* The developer checked for regressions on 2 systems (4-core
and 24-core, neither with NVMe), and no error messages were
logged to dmesg.
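The test case judges success by whether the lockup messages appear. One way to make that check repeatable is to scan the captured kernel log for the two symptom strings that head the traces in this report. A minimal sketch; `count_lockup_lines()` and `line_contains()` are hypothetical helpers, and the caller is assumed to have already read the log text (for example, saved dmesg output) into a string:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `pat` occurs within the first `len` bytes of `line`. */
static bool line_contains(const char *line, size_t len, const char *pat)
{
    size_t plen = strlen(pat);

    for (size_t i = 0; i + plen <= len; i++)
        if (memcmp(line + i, pat, plen) == 0)
            return true;
    return false;
}

/*
 * Hypothetical helper: count log lines containing either symptom string
 * seen in this report. Each matching line is counted once.
 */
static size_t count_lockup_lines(const char *log)
{
    static const char *const patterns[] = {
        "Watchdog detected hard LOCKUP",
        "rcu_sched detected stalls",
    };
    size_t hits = 0;
    const char *line = log;

    while (line != NULL && *line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; i++) {
            if (line_contains(line, len, patterns[i])) {
                hits++;
                break;      /* count the line once */
            }
        }
        line = end ? end + 1 : NULL;   /* advance to the next line */
    }
    return hits;
}
```

A nonzero count on the original kernel and zero on the patched kernel matches the expected outcome; other messages, such as the NVMe I/O timeout line, are deliberately not counted.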
[Regression Potential]
* The regression potential is contained within the writeback
throttling mechanism (block/blk-wbt.*).
* The commits have been checked against linux-next as of
2019-01-08 for later fix commits, and all known fixes are in.
[Other Info]
* The problem was introduced with the blk-wbt mechanism in
v4.10-rc1, and the fix commits landed in v4.19-rc1 and -rc2,
so only Bionic and Cosmic need this.
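The release reasoning reduces to a version-range check: affected if the kernel is at or after v4.10 and before v4.19. A hedged sketch; `wbt_fix_needed()` is a hypothetical name, it compares only the upstream major.minor base version (ignoring -rc releases and stable or distro backports), and it assumes Bionic's kernel is 4.15-based (as in the traces, 4.15.0-20-generic) and Cosmic's is 4.18-based:

```c
#include <stdbool.h>

/*
 * Hypothetical helper: does an upstream base version (major.minor) fall
 * in the affected window [v4.10, v4.19)? Encodes e.g. 4.15 as 415.
 */
static bool wbt_fix_needed(int major, int minor)
{
    int v = major * 100 + minor;

    return v >= 410 && v < 419;
}
```

For example, 4.15 and 4.18 fall inside the window, while 4.4 predates blk-wbt and 4.19 already contains the fix.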
[Stack Traces]
[ 393.628647] NMI watchdog: Watchdog detected hard LOCKUP on cpu 30
...
[ 393.628704] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu
...
[ 393.628720] Call Trace:
[ 393.628721] <IRQ>
[ 393.628724] enqueue_
[ 393.628726] ? __update_
[ 393.628728] ? __update_
[ 393.628731] activate_
[ 393.628735] ? sched_clock+
[ 393.628736] ? sched_clock+
[ 393.628738] ttwu_do_
[ 393.628739] try_to_
[ 393.628741] default_
[ 393.628743] autoremove_
[ 393.628744] __wake_
[ 393.628745] __wake_
[ 393.628746] __wake_up+0x13/0x20
[ 393.628749] __wbt_done.
[ 393.628749] wbt_done+0x72/0xa0
[ 393.628753] blk_mq_
[ 393.628755] blk_mq_
[ 393.628760] nvme_complete_
[ 393.628763] nvme_pci_
[ 393.628764] __blk_mq_
[ 393.628766] blk_mq_
[ 393.628767] nvme_process_
[ 393.628768] nvme_irq+0x23/0x50 [nvme]
[ 393.628772] __handle_
[ 393.628773] handle_
[ 393.628774] handle_
[ 393.628778] handle_
[ 393.628779] handle_
[ 393.628783] do_IRQ+0x46/0xd0
[ 393.628784] common_
[ 393.628785] </IRQ>
...
[ 393.628794] ? cpuidle_
[ 393.628796] cpuidle_
[ 393.628797] call_cpuidle+
[ 393.628798] do_idle+0x18c/0x1f0
[ 393.628799] cpu_startup_
[ 393.628802] start_secondary
[ 393.628804] secondary_
[ 393.628805] Code: ...
[ 405.981597] nvme nvme1: I/O 393 QID 6 timeout, completion polled
[ 435.597209] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 435.602858] 30-...0: (1 GPs behind) idle=e26/1/0 softirq=6834/6834 fqs=4485
[ 435.610203] (detected by 8, t=15005 jiffies, g=6396, c=6395, q=146818)
[ 435.617025] Sending NMI from CPU 8 to CPUs 30:
[ 435.617029] NMI backtrace for cpu 30
[ 435.617031] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu
...
[ 435.617047] Call Trace:
[ 435.617048] <IRQ>
[ 435.617051] enqueue_
[ 435.617053] enqueue_
[ 435.617056] activate_
[ 435.617059] ? sched_clock+
[ 435.617060] ? sched_clock+
[ 435.617061] ttwu_do_
[ 435.617063] try_to_
[ 435.617065] default_
[ 435.617067] autoremove_
[ 435.617068] __wake_
[ 435.617069] __wake_
[ 435.617070] __wake_up+0x13/0x20
[ 435.617073] __wbt_done.
[ 435.617074] wbt_done+0x72/0xa0
[ 435.617077] blk_mq_
[ 435.617079] blk_mq_
[ 435.617084] nvme_complete_
[ 435.617087] nvme_pci_
[ 435.617088] __blk_mq_
[ 435.617090] blk_mq_
[ 435.617091] nvme_process_
[ 435.617093] nvme_irq+0x23/0x50 [nvme]
[ 435.617096] __handle_
[ 435.617097] handle_
[ 435.617098] handle_
[ 435.617101] handle_
[ 435.617102] handle_
[ 435.617106] do_IRQ+0x46/0xd0
[ 435.617107] common_
[ 435.617108] </IRQ>
...
[ 435.617117] ? cpuidle_
[ 435.617118] cpuidle_
[ 435.617119] call_cpuidle+
[ 435.617121] do_idle+0x18c/0x1f0
[ 435.617122] cpu_startup_
[ 435.617125] start_secondary
[ 435.617127] secondary_
[ 435.617128] Code: ...
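In both traces, CPU 30 is caught in the task-wakeup path called from wbt_done() while still inside the NVMe interrupt handler (between the <IRQ> and </IRQ> markers). A rough way to see why a thundering herd hurts there is to count wakeups per completion. The sketch below is a hypothetical, single-threaded model, not kernel code: `struct model`, `take_slot()`, `complete_wake_all()` and `complete_handoff()` are illustrative names. It assumes the unpatched path wakes every sleeper on a completion, while the patched behavior wakes only waiters that can actually get an in-flight slot:

```c
#include <stdbool.h>

/* Hypothetical model of one wbt wait queue (not kernel code). */
struct model {
    int inflight;   /* requests currently in flight */
    int limit;      /* maximum allowed in flight */
    int waiters;    /* tasks sleeping in the wait queue (FIFO) */
};

/* Plain check-and-increment; a single-threaded stand-in for atomic_inc_below(). */
static bool take_slot(struct model *m)
{
    if (m->inflight >= m->limit)
        return false;
    m->inflight++;
    return true;
}

/*
 * Wake-all policy: one request completes, every sleeper is woken and
 * retries for a slot; those that fail go back to sleep.
 * Returns the number of tasks woken (wakeup work done in IRQ context).
 */
static int complete_wake_all(struct model *m)
{
    int woken = m->waiters;
    int queued = 0;

    m->inflight--;
    for (int i = 0; i < woken; i++)
        if (take_slot(m))
            queued++;
    m->waiters -= queued;   /* the rest sleep again */
    return woken;
}

/*
 * Hand-off policy: one request completes, then walk the queue in FIFO
 * order taking a slot on each waiter's behalf, stopping at the first
 * waiter that cannot get one. Only waiters that own a slot are woken.
 */
static int complete_handoff(struct model *m)
{
    int woken = 0;

    m->inflight--;
    while (m->waiters > 0 && take_slot(m)) {
        m->waiters--;
        woken++;
    }
    return woken;
}
```

With, say, 128 sleepers and the queue at its limit, one completion makes the wake-all policy wake all 128 tasks to admit a single one, while the hand-off policy wakes exactly one. Repeated on every NVMe completion, the former multiplies scheduler work inside the interrupt handler.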
Changed in linux (Ubuntu):
assignee: nobody → Mauricio Faria de Oliveira (mfo)
description: updated
description: updated
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
description: updated
description: updated
description: updated
description: updated
Changed in linux (Ubuntu Cosmic):
importance: Undecided → Medium
Changed in linux (Ubuntu Bionic):
importance: Undecided → Medium
Changed in linux (Ubuntu):
importance: Undecided → Medium
description: updated
description: updated
Changed in linux (Ubuntu Bionic):
status: New → Fix Committed
Changed in linux (Ubuntu Cosmic):
status: New → Fix Committed
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
status: Fix Released → Invalid
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel), please enter the following command in a terminal window:
apport-collect 1810998
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.