PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Native ZFS for Linux | Fix Released | Unknown | | | |
linux (Ubuntu) | Invalid | Undecided | Unassigned | | |
Impish | Fix Released | Critical | Stefan Bader | | |
linux-raspi (Ubuntu) | Fix Released | Undecided | Unassigned | | |
Impish | Fix Released | Undecided | Unassigned | | |
ubuntu-release-upgrader (Ubuntu) | Confirmed | Undecided | Unassigned | | |
Impish | Won't Fix | Undecided | Unassigned | | |
zfs-linux (Ubuntu) | Fix Released | Critical | Unassigned | | |
Impish | Fix Released | Critical | Unassigned | | |
Bug Description
Since today, while running Ubuntu 21.04 Hirsute, I have been getting a ZFS panic in the kernel log, which also hangs disk I/O for all Chrome/Electron apps.
I have narrowed it down to a few important observations:
- It does not happen with module version 0.8.4-1ubuntu11 built and included with 5.8.0-29-generic
- It does happen with zfs-dkms 0.8.4-1ubuntu16 built with DKMS on the same kernel, and also on 5.8.18-acso (a custom kernel).
- For whatever reason, multiple Chrome/Electron apps were affected, specifically Discord, Chrome and Mattermost. In all cases they seem to hang while trying to open files in their 'Cache' directory, e.g. ~/.cache/ (I was unable to strace the processes, so I could not confirm this 100%, but it follows by deduction from /proc/PID/fd and from ls hanging).
- Once I removed zfs-dkms to revert to the kernel's built-in module version, everything immediately worked again without changing anything else, removing files, etc.
- It happened every time, across multiple reboots and kernels: none of my Chrome apps worked, but for whatever reason nothing else seemed affected.
- It would log a series of spl_panic dumps into kern.log that look like this:
Dec 2 12:36:42 optane kernel: [ 72.857033] VERIFY(0 == sa_handle_
Dec 2 12:36:42 optane kernel: [ 72.857036] PANIC at zfs_znode.
I could only find one other Google reference to this issue, with two other users reporting the same error, but on 20.04, here:
https:/
- I was not experiencing the issue on 0.8.4-1ubuntu14, and I am fairly sure it was working on 0.8.4-1ubuntu15 but broke after the upgrade to 0.8.4-1ubuntu16. I will reinstall those zfs-dkms versions to verify that.
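For anyone who wants to check whether they are hitting the same thing, the spl_panic entries described above can be pulled out of a kernel log with a few lines of Python. This is only a sketch: it assumes syslog-style lines shaped like the two quoted above, and the function name `find_spl_panics` is made up for illustration.

```python
import re

# Matches syslog-style kernel lines such as:
#   Dec 2 12:36:42 optane kernel: [ 72.857033] PANIC at zfs_znode.
KERNEL_LINE = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\]\s+(?P<message>.*)$"
)


def find_spl_panics(lines):
    """Return (uptime_seconds, message) for each VERIFY/PANIC kernel line.

    `lines` can be any iterable of strings, e.g. an open kern.log file.
    """
    hits = []
    for line in lines:
        match = KERNEL_LINE.search(line)
        if not match:
            continue  # not a kernel line in the assumed format
        message = match.group("message")
        # SPL assertion failures log a VERIFY(...) line and a "PANIC at" line.
        if message.startswith(("VERIFY(", "PANIC at ")):
            hits.append((float(match.group("uptime")), message))
    return hits


# Demo on the (truncated) lines quoted in this report.
sample = [
    "Dec 2 12:36:42 optane kernel: [ 72.857033] VERIFY(0 == sa_handle_",
    "Dec 2 12:36:42 optane kernel: [ 72.857036] PANIC at zfs_znode.",
]
for uptime, message in find_spl_panics(sample):
    print(uptime, message)
```

To scan a real log, pass an open file instead of the sample list, for example `find_spl_panics(open("/var/log/kern.log", errors="replace"))`; the log path is an assumption and may differ on your system.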
There were a few originating call stacks, but the first one I hit was:
Call Trace:
dump_stack+
spl_dumpstack+
spl_panic+
? sa_cache_
? _cond_resched+
? mutex_lock+
? dmu_buf_
zfs_znode_
zfs_znode_
? arc_buf_
? __cv_init+0x42/0x60 [spl]
? dnode_cons+
? _cond_resched+
? _cond_resched+
? mutex_lock+
? aggsum_
? spl_kmem_
? arc_space_
? dbuf_read+
? _cond_resched+
? mutex_lock+
? dnode_rele_
? _cond_resched+
? mutex_lock+
? dmu_object_
zfs_zget+
? dmu_buf_
zfs_dirent_
zfs_dirlook+
? zfs_zaccess+
zfs_lookup+
zpl_lookup+
path_openat+
do_filp_
? __check_
? __alloc_
do_sys_
? do_sys_
do_sys_
__x64_
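When reading a trace like this, note that entries prefixed with "?" are addresses the kernel found on the stack but could not confirm as part of the active call chain, so the unprefixed entries are the ones to follow. A small sketch for separating the two follows; the helper name `reliable_frames` is hypothetical, and since the symbols in the excerpt above are truncated, the output would be truncated in the same way.

```python
def reliable_frames(trace_lines):
    """Split call-trace lines into (reliable, uncertain) symbol lists.

    Entries starting with "?" are treated as uncertain; everything else
    is treated as part of the confirmed call chain.
    """
    reliable, uncertain = [], []
    for raw in trace_lines:
        entry = raw.strip()
        if not entry or entry == "Call Trace:":
            continue
        is_uncertain = entry.startswith("?")
        # Keep only the symbol name: drop the "? " marker, the
        # "+offset/size" suffix and any trailing "[module]" tag.
        symbol = entry.lstrip("? ").split("+", 1)[0].split(" ", 1)[0]
        (uncertain if is_uncertain else reliable).append(symbol)
    return reliable, uncertain


# Demo on a few entries taken from the trace above.
confirmed, guesses = reliable_frames([
    "Call Trace:",
    "spl_panic+",
    "? mutex_lock+",
    "? __cv_init+0x42/0x60 [spl]",
    "zfs_zget+",
    "zpl_lookup+",
])
print(confirmed)
print(guesses)
```

Applied to the full trace, the confirmed entries run from the lookup path (zpl_lookup, zfs_lookup, zfs_dirlook) through zfs_zget into the znode code where the VERIFY fails.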
tags: | added: seg |
Changed in zfs: | |
status: | Unknown → New |
Changed in zfs-linux (Ubuntu): | |
status: | Fix Released → In Progress |
Changed in zfs-linux (Ubuntu Impish): | |
status: | In Progress → Fix Committed |
no longer affects: | zfs-linux (Ubuntu Impish) |
Changed in zfs-linux (Ubuntu Impish): | |
assignee: | nobody → Colin Ian King (colin-king) |
importance: | Undecided → Critical |
status: | New → Fix Released |
Changed in linux (Ubuntu Impish): | |
importance: | Undecided → Critical |
status: | New → In Progress |
Changed in linux (Ubuntu): | |
status: | New → Invalid |
Changed in linux (Ubuntu Impish): | |
assignee: | nobody → Stefan Bader (smb) |
Changed in linux (Ubuntu Impish): | |
status: | In Progress → Fix Committed |
Changed in zfs-linux (Ubuntu): | |
assignee: | Colin Ian King (colin-king) → nobody |
Changed in zfs-linux (Ubuntu Impish): | |
assignee: | Colin Ian King (colin-king) → nobody |
Changed in zfs: | |
status: | New → Fix Released |
I should mention that Chrome itself always showed "waiting for cache", which backs up the story about the cache files.
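The /proc/PID/fd deduction mentioned in the description can be sketched roughly as below. This is an illustration, not what was actually run: the function name `open_files_under` and its `proc_root` parameter are made up here, and it only reports files a process already holds open, so an open() call that is itself stuck will not show up.

```python
import os


def open_files_under(directory, proc_root="/proc"):
    """Map PID -> sorted list of open file paths located under `directory`.

    Only the fd symlinks are read; the target files themselves are never
    opened or stat'ed by this function.
    """
    prefix = os.path.join(os.path.abspath(directory), "")
    result = {}
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as "self"
        fd_dir = os.path.join(proc_root, name, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        paths = set()
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed in the meantime
            if target.startswith(prefix):
                paths.add(target)
        if paths:
            result[int(name)] = sorted(paths)
    return result
```

A call such as `open_files_under(os.path.expanduser("~/.cache"))` would then list, per PID, which files under the cache directory are held open. Run it as root to see processes owned by other users.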