I have created a 100% reliable reproducer test case and also determined that the Ubuntu-specific patch 4701-enable-ARC-FILL-LOCKED-flag.patch, added to fix Bug #1900889, is likely the cause.
[Test Case]
The important parts are:
- Use encryption
- rsync the zfs git tree
- Use parallel I/O from silversearcher-ag to access it after a reboot. A simple "find ." or "find . -exec cat {} > /dev/null \;" does not reproduce the issue.
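To illustrate the difference between the sequential "find" variants above and concurrent readers, here is a minimal sketch that reads every file under a directory with several "cat" processes at once. It is not part of the original test case and has not been confirmed to trigger the hang; the helper name parallel_read, the batch size and the default job count are assumptions chosen for illustration.

```shell
# Hypothetical helper (not from the original report): read all regular files
# under a directory using several concurrent readers, discarding the data.
# Relies on GNU find/xargs as shipped on Ubuntu.
parallel_read() {
    dir="$1"
    jobs="${2:-8}"   # number of concurrent cat processes (assumed default)
    # -print0 / -0 keep unusual file names intact; -r skips running cat when
    # no files are found; -n 16 hands each cat a small batch of files;
    # -P runs up to $jobs batches at the same time.
    find "$dir" -type f -print0 | xargs -0 -r -n 16 -P "$jobs" cat > /dev/null
}

# Example (assumed usage on the test dataset):
#   parallel_read /test/test/zfs 8
```

Whether this access pattern is close enough to what ag does is an open question; ag remains the confirmed trigger.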
Reproduction done using a libvirt VM installed from the Ubuntu Impish daily livecd using a normal ext4 root but with a second 4GB /dev/vdb disk for zfs later
= Preparation
apt install silversearcher-ag git zfs-dkms zfsutils-linux
echo -n testkey2 > /root/testkey
git clone https://github.com/openzfs/zfs /root/zfs
= Test Execution
zpool create test /dev/vdb
zfs create test/test -o encryption=on -o keyformat=passphrase -o keylocation=file:///root/testkey
rsync -va --progress -HAX /root/zfs/ /test/test/zfs/
# If you access the data now it works fine.
reboot
zfs load-key test/test
zfs mount -a
cd /test/test/zfs/
ag DISKS=
= Test Result
ag hangs, "sudo dmesg" shows an exception
[Analysis]
I rebuilt the zfs-linux 2.0.6-1ubuntu1 package from ppa:colin-king/zfs-impish without the Ubuntu-specific patch ubuntu/4701-enable-ARC-FILL-LOCKED-flag.patch which fixed Bug #1900889. With this patch disabled the issue does not reproduce. Re-enabling the patch it reproduces reliably every time again.
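For anyone wanting to repeat the rebuild, one way to disable a single patch in a quilt-format source package is to comment out its entry in debian/patches/series before building. The following is a minimal sketch under that assumption, not necessarily the exact procedure used here; the helper name disable_patch is hypothetical.

```shell
# Hypothetical helper: comment out one entry in a quilt "series" file so the
# patch is skipped when the source package is built.
disable_patch() {
    series="$1"   # path to debian/patches/series
    patch="$2"    # exact entry name, e.g. ubuntu/....patch
    tmp="$(mktemp)" || return 1
    # Prefix the matching entry with "# "; all other lines pass through as-is.
    awk -v p="$patch" '$1 == p { print "# " $0; next } { print }' \
        "$series" > "$tmp" && cat "$tmp" > "$series"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example (assumed workflow, run inside the unpacked zfs-linux source tree):
#   disable_patch debian/patches/series ubuntu/4701-enable-ARC-FILL-LOCKED-flag.patch
#   dpkg-buildpackage -us -uc -b
```

Removing the leading "# " from the series entry and rebuilding restores the patch, which gives the with/without comparison described above.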
It seems this bug was never sent upstream. As far as I can see, no upstream code changes setting the ARC_FILL_IN_PLACE flag have been added since. Interestingly, however, the code for this ARC_FILL_IN_PLACE handling was added to fix a similar-sounding issue, "Raw receive fix and encrypted objset security fix", in https://github.com/openzfs/zfs/commit/69830602de2d836013a91bd42cc8d36bbebb3aae . This first shipped in zfs 0.8.0 and the original bug was filed against 0.8.3.
I have also found the same issue as the original Launchpad bug reported upstream, without any fixes and with a lot of discussion (and quite a few duplicates linking back to 11679):
https://github.com/openzfs/zfs/issues/11679
https://github.com/openzfs/zfs/issues/12014
Without fully understanding the ZFS code in relation to this flag, the code at https://github.com/openzfs/zfs/blob/ce2bdcedf549b2d83ae9df23a3fa0188b33327b7/module/zfs/arc.c#L2026 talks about how this flag is to do with decrypting blocks in the ARC and doing so 'inplace'. It makes some sense thus that I need encryption to reproduce it and it works best after a reboot (thus flushing the ARC) and why I can still read the data in the test case before doing a reboot when it then fails.
This patch was added in 0.8.4-1ubuntu15 and I first experienced the issue somewhere between 0.8.4-1ubuntu11 and 0.8.4-1ubuntu16.
So it all adds up and I suggest that this patch should be reverted.