Xenial: ZFS deadlock in shrinker path with xattrs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
zfs-linux (Ubuntu) | Fix Released | Medium | Unassigned |
Xenial | Fix Released | Medium | Mauricio Faria de Oliveira |
Bug Description
[Impact]
* Xenial's ZFS can deadlock in the memory shrinker path
after removing files with extended attributes (xattr).
* Extended attributes are enabled by default, but are
_not_ used by default, which reduces the likelihood.
* It's very difficult/rare to reproduce this problem,
due to the file/xattr/
circumstances required (it took weeks for a reporting user),
but a synthetic test-case has been found for testing.
[Test Case]
* A synthetic reproducer is available for this LP,
with a few steps to touch/setfattr/
plus a kernel module to massage the disposal list.
(comment #8)
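For illustration only, the userspace half of those steps can be mimicked on any xattr-capable filesystem. This is not the actual reproducer from comment #8 (which also needs the helper kernel module and a ZFS dataset); the file name and xattr name below are invented:

```python
import errno
import os
import tempfile

# Hypothetical sketch of the touch/setfattr/remove sequence only;
# reproducing the actual deadlock additionally requires ZFS and
# the disposal-list kernel module from comment #8.
path = os.path.join(tempfile.mkdtemp(), "victim")
with open(path, "w") as f:
    f.write("data")                    # like `touch victim`

if hasattr(os, "setxattr"):            # Linux-only API
    try:
        # like `setfattr -n user.test -v 1 victim`
        os.setxattr(path, "user.test", b"1")
        print(os.getxattr(path, "user.test"))
    except OSError as e:
        if e.errno != errno.ENOTSUP:   # fs without user xattrs
            raise

os.remove(path)                        # on ZFS, removal leaves the xattr
print(os.path.exists(path))            # dir inode behind for later purging
```
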
* In the original ZFS module:
the xattr dir inode is not purged immediately on
file removal, but possibly only _two_ shrinker
invocations later. This allows another thread,
started before the file removal, to call zfs_zget()
on the xattr child inode and iput() it, so that it
ends up on the same disposal list as the xattr dir
inode.
(comment #3)
* In the modified ZFS module:
the xattr dir inode is purged immediately on file
removal, not possibly later on a shrinker invocation,
so the problem window above no longer exists.
(comment #12)
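As a rough mental model only (plain Python with invented names, not ZFS code), the difference between the two behaviours can be sketched as:

```python
# Toy model of the window described above: in the original module
# the xattr dir inode is only purged by a later shrinker pass, so a
# concurrent zfs_zget()/iput() can land the xattr child inode on the
# same disposal list; purging the dir then needs a hold on a child
# owned by the very list pass doing the purge.
from dataclasses import dataclass, field

@dataclass
class Inode:
    name: str
    children: list = field(default_factory=list)

def dispose(disposal_list):
    # Returns the inodes that would wedge: a dir whose child sits
    # on the same list cannot be purged while that list is drained.
    on_list = {i.name for i in disposal_list}
    return [i.name for i in disposal_list
            if any(c in on_list for c in i.children)]

child = Inode("xattr_child")
xattr_dir = Inode("xattr_dir", children=["xattr_child"])

# Original behaviour: purge deferred, both inodes meet on one list.
assert dispose([xattr_dir, child]) == ["xattr_dir"]

# Fixed behaviour: the xattr dir is purged immediately on file
# removal, so it never shares the shrinker's list with its child.
assert dispose([child]) == []
```
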
[Regression Potential]
* Low. The patches are confined to extended attributes
in ZFS, specifically node removal/purge, plus a
change to how an xattr child inode tracks its xattr
dir (parent) inode, so that it can be purged
immediately on removal.
* The ZFS test-suite has been run on original/modified
zfs-dkms package/kernel modules, with no regressions.
(comment #11)
description: updated
Changed in linux (Ubuntu Eoan):
  status: New → Invalid
Changed in linux (Ubuntu Disco):
  status: New → Invalid
Changed in linux (Ubuntu Bionic):
  status: New → Invalid
Changed in linux (Ubuntu Xenial):
  status: New → In Progress
  assignee: nobody → Mauricio Faria de Oliveira (mfo)
Changed in linux (Ubuntu Xenial):
  status: In Progress → Fix Committed
no longer affects: linux (Ubuntu)
no longer affects: linux (Ubuntu Xenial)
no longer affects: linux (Ubuntu Bionic)
no longer affects: linux (Ubuntu Disco)
no longer affects: linux (Ubuntu Eoan)
no longer affects: zfs-linux (Ubuntu Bionic)
no longer affects: zfs-linux (Ubuntu Disco)
no longer affects: zfs-linux (Ubuntu Eoan)
Changed in zfs-linux (Ubuntu):
  status: Invalid → Fix Released
  importance: Undecided → Medium
Changed in zfs-linux (Ubuntu Xenial):
  importance: Undecided → Medium
[Original Description]
One LXC user reported lots of processes stuck in D state:
several threads were waiting on the memory shrinker semaphore
(a symptom previously thought to be fixed via LP bug 1817628).
After some time, a provided crashdump revealed the issue is
in ZFS's evict node path, running in the memory shrinker path
(thus holding the semaphore, as observed previously/above).
The stack trace shows the inode memory shrinker entered
ZFS and is looping in zfs_zget().
PID: 42105 TASK: ffff881169f3d400 CPU: 36 COMMAND: "lxcfs"
#0 [ffff88103ea88e38] crash_nmi_callback at ffffffff810518a7
#1 [ffff88103ea88e48] nmi_handle at ffffffff810323ae
#2 [ffff88103ea88ea0] default_do_nmi at ffffffff810328f4
#3 [ffff88103ea88ec0] do_nmi at ffffffff81032aa2
#4 [ffff88103ea88ee8] end_repeat_nmi at ffffffff8185a587
[exception RIP: _raw_spin_lock+20]
RIP: ffffffff81857464 RSP: ffff881a23bab138 RFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff8810a11afb78 RCX: ffff881e7ad76858
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8810a11afb78
RBP: ffff881a23bab138 R8: 000000000001a6a0 R9: ffffffffc05e384a
R10: ffffea0070071400 R11: ffff88014e96d340 R12: 0000000000000000
R13: ffff8810a11afb50 R14: ffff88014e96d340 R15: ffff8810a11afaf8
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#5 [ffff881a23bab138] _raw_spin_lock at ffffffff81857464
#6 [ffff881a23bab140] dbuf_read at ffffffffc08c141a [zfs]
#7 [ffff881a23bab1e8] dnode_hold_impl at ffffffffc08db218 [zfs]
#8 [ffff881a23bab250] dnode_hold at ffffffffc08db659 [zfs]
#9 [ffff881a23bab260] dmu_bonus_hold at ffffffffc08ca2b6 [zfs]
#10 [ffff881a23bab2a0] sa_buf_hold at ffffffffc09023fe [zfs]
#11 [ffff881a23bab2b0] zfs_zget at ffffffffc095cb47 [zfs]
#12 [ffff881a23bab350] zfs_purgedir at ffffffffc093be54 [zfs]
#13 [ffff881a23bab558] zfs_rmnode at ffffffffc093c212 [zfs]
#14 [ffff881a23bab5a0] zfs_zinactive at ffffffffc095d2f8 [zfs]
#15 [ffff881a23bab5d8] zfs_inactive at ffffffffc0956671 [zfs]
#16 [ffff881a23bab628] zpl_evict_inode at ffffffffc096dc03 [zfs]
#17 [ffff881a23bab650] evict at ffffffff81233d81
#18 [ffff881a23bab678] dispose_list at ffffffff81233e86
#19 [ffff881a23bab690] prune_icache_sb at ffffffff81234fea
#20 [ffff881a23bab6c8] super_cache_scan at ffffffff8121b862
#21 [ffff881a23bab720] shrink_slab at ffffffff811a8e0d
#22 [ffff881a23bab800] shrink_zone at ffffffff811ad488
#23 [ffff881a23bab880] do_try_to_free_pages at ffffffff811ad5fb
#24 [ffff881a23bab900] try_to_free_pages at ffffffff811ad91e
#25 [ffff881a23bab980] __alloc_pages_slowpath.constprop.88 at ffffffff8119ee92
#26 [ffff881a23baba60] __alloc_pages_nodemask at ffffffff8119f908
#27 [ffff881a23babb00] alloc_pages_current at ffffffff811ea47c
#28 [ffff881a23babb48] alloc_kmem_pages at ffffffff8119d4d9
#29 [ffff881a23babb70] kmalloc_order_trace at ffffffff811bb04e
#30 [ffff881a23babbb0] __kmalloc at ffffffff811f6e90
#31 [ffff881a23babbf8] seq_buf_alloc at ffffffff8123ca00
#32 [ffff881a23babc10] single_open_size at ffffffff8123dc1a
#33 [ffff881a23babc50] stat_open at ffffffff8128fc76
#34 [ffff881a23babc68] proc_reg_open at ffffffff81286011
#35 [ffff881a23babca0] do_dentry_open ...