blk-mq: possible deadlock on CPU hot(un)plug
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu on IBM z Systems | Fix Released | High | Canonical Kernel Team |
linux (Ubuntu) | Fix Released | High | Juerg Haefliger |
Xenial | Fix Released | Undecided | Unassigned |
Bug Description
== Comment: #0 - Carsten Jacobi <email address hidden> - 2017-03-07 03:35:31 ==
I'm evaluating Ubuntu Xenial on z for development purposes. The test system is installed in an LPAR with one FCP LUN that is accessible via 4 paths (all paths are configured).
The system hangs regularly when I build packages with "pdebuild" from the pbuilder packaging suite.
The local Linux development team helped me out with a pre-analysis that I can post here (thanks a lot for that):
With the default settings and under a certain workload,
blk_mq seems to get into a presumed "deadlock".
Possibly this happens on CPU hot(un)plug.
After the I/O stalled, a dump was pulled manually.
The following information is from the crash dump pre-analysis.
$ zgetdump -i dump.0
General dump info:
Dump format........: elf
Version.
UTS node name......: mclint
UTS kernel release.: 4.4.0-65-generic
UTS kernel version.: #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017
System arch........: s390x (64 bit)
CPU count (online).: 2
Dump memory range..: 8192 MB
Memory map:
0000000000000000 - 00000001b831afff (7043 MB)
00000001b831b000 - 00000001ffffffff (1149 MB)
Things look similar with the HWE kernel on Ubuntu 16.04.
KERNEL: vmlinux.full
DUMPFILE: dump.0
CPUS: 2
DATE: Fri Mar 3 14:31:07 2017
UPTIME: 02:11:20
LOAD AVERAGE: 13.00, 12.92, 11.37
TASKS: 411
NODENAME: mclint
RELEASE: 4.4.0-65-generic
VERSION: #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017
MACHINE: s390x (unknown Mhz)
MEMORY: 7.8 GB
PANIC: ""
PID: 0
COMMAND: "swapper/0"
TASK: bad528 (1 of 2) [THREAD_INFO: b78000]
CPU: 0
STATE: TASK_RUNNING (ACTIVE)
INFO: no panic task found
crash> dev -d
MAJOR GENDISK NAME REQUEST_QUEUE TOTAL ASYNC SYNC DRV
...
8 1e1d6d800 sda 1e1d51210 0 23151 4294944145 N/A(MQ)
8 1e4e06800 sdc 2081b18 0 23148 4294944148 N/A(MQ)
8 1f07800 sdb 20c7568 0 23195 4294944101 N/A(MQ)
8 1e4e06000 sdd 1e4e31210 0 23099 4294944197 N/A(MQ)
252 1e1d6c800 dm-0 1e1d51b18 9 1 8 N/A(MQ)
...
So both dm-mpath and sd have requests pending in their block multiqueue.
The large numbers in the SYNC column of the sd rows look strange; they appear to be the unsigned (32-bit) representation of the ASYNC values multiplied by -1.
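That arithmetic can be checked directly; a quick sketch (values taken from the `dev -d` output above):

```python
# SYNC values from the sda and sdc rows, interpreted as 32-bit
# two's-complement numbers, come out to exactly -ASYNC:
for async_count, sync_shown in [(23151, 4294944145),   # sda
                                (23148, 4294944148)]:  # sdc
    assert sync_shown - 2**32 == -async_count
```

This supports reading the sd rows as "ASYNC requests pending, SYNC counter underflowed below zero", rather than as billions of genuinely pending sync requests.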
[ 0.798256] Linux version 4.4.0-65-generic (buildd@z13-011) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~
[ 0.798262] setup: Linux is running natively in 64-bit mode
[ 0.798290] setup: Max memory size: 8192MB
[ 0.798298] setup: Reserving 196MB of memory at 7996MB for crashkernel (System RAM: 7996MB)
[ 0.836923] Kernel command line: root=/dev/
[ 5281.179428] INFO: task xfsaild/dm-11:1604 blocked for more than 120 seconds.
[ 5281.179437] Not tainted 4.4.0-65-generic #86-Ubuntu
[ 5281.179438] "echo 0 > /proc/sys/
[ 5281.179440] xfsaild/dm-11 D 00000000007bcf52 0 1604 2 0x00000000
[ 5281.179444] 00000001e931c230 00000000001a6964 00000001e6f9b958 00000001e6f9b9d8
[ 5281.179454] Call Trace:
[ 5281.179461] ([<00000000007b
[ 5281.179462] [<00000000007bc
[ 5281.179465] [<00000000007c0
[ 5281.179466] [<00000000007bd
[ 5281.179472] [<000000000017b
[ 5281.179564] [<000003ff805e1
[ 5281.179589] [<000003ff805de
[ 5281.179615] [<000003ff805df
[ 5281.179640] [<000003ff805ec
[ 5281.179643] [<0000000000183
[ 5281.179645] [<00000000007c0
[ 5281.179646] [<00000000007c0
see below
[ 5281.179664] INFO: task cpuplugd:2260 blocked for more than 120 seconds.
[ 5281.179665] Not tainted 4.4.0-65-generic #86-Ubuntu
[ 5281.179666] "echo 0 > /proc/sys/
[ 5281.179668] cpuplugd D 00000000007bcf52 0 2260 1 0x00000000
[ 5281.179670] 00000001e782e318 00000000001a6964 000000007bc4ba58 000000007bc4bad8
[ 5281.179678] Call Trace:
[ 5281.179680] ([<00000000007b
[ 5281.179681] [<00000000007bc
[ 5281.179685] [<0000000000516
[ 5281.179687] [<0000000000519
[ 5281.179690] [<0000000000184
[ 5281.179692] [<0000000000184
[ 5281.179696] [<0000000000160
[ 5281.179698] [<0000000000160
[ 5281.179700] [<00000000005d0
[ 5281.179703] [<00000000005ca
[ 5281.179704] [<00000000005ca
[ 5281.179710] [<00000000003a4
[ 5281.179712] [<0000000000312
[ 5281.179714] [<0000000000312
[ 5281.179715] [<00000000007c0
[ 5281.179718] [<000003ff803df
Cpuplugd performs CPU hot(un)plug based on configurable rules.
https:/
https:/
https:/
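For reference, cpuplugd is driven by rules in /etc/cpuplugd.conf. A minimal illustrative fragment follows; the keyword names come from s390-tools, but the threshold values here are made-up examples, not the configuration of the affected system:

```ini
# /etc/cpuplugd.conf (illustrative fragment; values are examples only)
UPDATE="1"      # rule evaluation interval in seconds
CPU_MIN="1"     # never unplug below this many CPUs
CPU_MAX="0"     # 0 = no upper limit
# Plug a CPU when load is high, unplug when the system is mostly idle:
HOTPLUG="(loadavg > onumcpus + 0.75) & (runnable_proc > onumcpus)"
HOTUNPLUG="(loadavg < onumcpus - 0.25) | (idle > 90)"
```

Under the build workload described above, rules like these make cpuplugd hot(un)plug CPUs continuously, which is what exercises the suspected blk-mq hotplug path.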
[ 5281.179769] INFO: task kworker/0:2:23669 blocked for more than 120 seconds.
[ 5281.179770] Not tainted 4.4.0-65-generic #86-Ubuntu
[ 5281.179771] "echo 0 > /proc/sys/
[ 5281.179773] kworker/0:2 D 00000000007bcf52 0 23669 2 0x00000000
[ 5281.179801] Workqueue: xfs-cil/dm-11 xlog_cil_push_work [xfs]
[ 5281.179802] 00000001c925e318 00000000001a6964 0000000068c5f9f8 0000000068c5fa78
[ 5281.179810] Call Trace:
[ 5281.179812] ([<00000000007b
[ 5281.179813] [<00000000007bc
[ 5281.179839] [<000003ff805de
[ 5281.179864] [<000003ff805de
[ 5281.179890] [<000003ff805e0
[ 5281.179891] [<000000000017c
[ 5281.179893] [<000000000017c
[ 5281.179894] [<0000000000183
[ 5281.179896] [<00000000007c0
[ 5281.179898] [<00000000007c0
While the above kworker executes a work item for a long duration, other processes in turn block in flush_work for a long duration.
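The blocking pattern here, flush waiting until a long-running work item completes, can be sketched with a userspace analogy. This is purely illustrative Python threading, not kernel code; the name `flush_workqueue` mirrors the kernel API but the implementation is my own:

```python
import queue
import threading
import time

work_q = queue.Queue()

def worker():
    """Dedicated worker thread, loosely analogous to a kworker."""
    while True:
        item = work_q.get()
        if item is None:
            break
        item()                      # run the work item (may take a long time)
        work_q.task_done()

def flush_workqueue():
    """Block until all queued work items have completed,
    analogous to the kernel's flush_work()/flush_workqueue()."""
    work_q.join()

threading.Thread(target=worker, daemon=True).start()

completed = []
# A "long-running" work item, standing in for xlog_cil_push_work above:
work_q.put(lambda: (time.sleep(0.2), completed.append("cil push")))

flush_workqueue()                   # blocks here until the item is done
assert completed == ["cil push"]
```

If the work item itself can never finish (here: because it waits on something held by a task that is in turn stuck in the CPU hotplug notifier chain), every caller of `flush_workqueue()` blocks forever, matching the hung tasks in the traces above.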
[ 5281.179728] INFO: task kworker/0:1:4454 blocked for more than 120 seconds.
[ 5281.179730] Not tainted 4.4.0-65-generic #86-Ubuntu
[ 5281.179731] "echo 0 > /proc/sys/
[ 5281.179732] kworker/0:1 D 00000000007bcf52 0 4454 2 0x00000000
[ 5281.179738] Workqueue: events vmstat_shepherd
[ 5281.179739] 00000001c925ac40 00000000001a6964 000000007eb8bb38 000000007eb8bbb8
[ 5281.179747] Call Trace:
[ 5281.179749] ([<00000000007b
[ 5281.179750] [<00000000007bc
[ 5281.179752] [<00000000007bd
[ 5281.179753] [<00000000007be
[ 5281.179755] [<00000000007be
[ 5281.179756] [<000000000015f
[ 5281.179757] [<00000000002a3
[ 5281.179759] [<000000000017c
[ 5281.179761] [<000000000017c
[ 5281.179762] [<0000000000183
[ 5281.179764] [<00000000007c0
[ 5281.179765] [<00000000007c0
This work item cannot make progress, possibly because cpuplugd:2260 above "hangs" in the CPU hotplug notifier chain.
The low-level device driver (here zfcp) is completely idle after the lockup occurred, without any pending I/O; all of its paths are in good state and could service I/O, but it simply does not receive any new I/O requests from the upper layers (SCSI / block). zfcp does not implement blk-mq itself, so dm or SCSI translate; this works in general but fails with the above workload. There were no other undesired events, i.e. no path interruptions nor any recovery actions in zfcp.
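One way to see whether blk-mq is in use for the SCSI and dm layers on this kernel is via module parameters; disabling it at boot is, as an assumption on my part and not something verified in this report, a possible way to sidestep the hang until the fix lands:

```
# Current state on a 4.4 kernel (Y means blk-mq is in use):
#   /sys/module/scsi_mod/parameters/use_blk_mq
#   /sys/module/dm_mod/parameters/use_blk_mq
# Booting with blk-mq disabled for scsi and dm, e.g. in /etc/default/grub:
GRUB_CMDLINE_LINUX="scsi_mod.use_blk_mq=0 dm_mod.use_blk_mq=0"
```

This only avoids the multiqueue code path; it does not address the underlying hotplug issue.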
tags: added: s390x; removed: bot-comment
affects: ubuntu → linux (Ubuntu)
Changed in ubuntu-z-systems:
  assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
  importance: Undecided → High
Changed in linux (Ubuntu):
  importance: Undecided → High
tags: added: kernel-key
Changed in linux (Ubuntu):
  status: New → Triaged
tags: added: kernel-da-key; removed: kernel-key
Changed in ubuntu-z-systems:
  status: New → Triaged
tags: added: kernel-key; removed: kernel-da-key
tags: added: kernel-da-key; removed: kernel-key
Changed in linux (Ubuntu Xenial):
  status: New → Fix Committed
Changed in linux (Ubuntu):
  status: Triaged → Fix Committed
Changed in ubuntu-z-systems:
  status: Triaged → Fix Committed
Changed in linux (Ubuntu):
  status: Fix Committed → Fix Released
Changed in ubuntu-z-systems:
  status: Fix Committed → Fix Released