ZFS module cannot be loaded after Kernel 5.4.0-67-generic release AND if the system has zfs-dkms and spl-dkms packages installed

Bug #1920956 reported by Willian Braga da Silva
This bug affects 1 person
Affects: zfs-linux (Ubuntu)
Status: Triaged
Importance: Medium
Assigned to: Dimitri John Ledkov

Bug Description

WARNING: This bug only occurs under the following conditions:

1- The system runs the Bionic release (18.04). Other releases seem to be unaffected for now;
2- The system has the 'spl-dkms' and 'zfs-dkms' packages installed. This causes both SPL and ZFS to be recompiled from source whenever a new kernel image is installed;
3- The system has not yet been upgraded to either "linux-modules-5.4.0-67-generic" or "linux-modules-5.4.0-1039-aws". It runs an earlier kernel, such as 5.3.0-XXX or 5.4.0-60, and is then upgraded to the latest 'kernel-image' release.
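To check a host against conditions 1 and 2 above, something like the following bash sketch could help. It assumes the codename is read from /etc/os-release and that dpkg-query reports an installed package with the status "install ok installed"; the function names and the OS_RELEASE override are hypothetical, and condition 3 (the kernel version) is not checked.

```shell
#!/usr/bin/env bash
# Sketch: does this host match conditions 1 and 2 of this report?
# Assumption: OS_RELEASE is a hypothetical override of /etc/os-release (for testing).
# Condition 3 (the running kernel version) is deliberately not checked here.

# Print the release codename, e.g. "bionic" on 18.04.
release_codename() {
  local file="${OS_RELEASE:-/etc/os-release}"
  grep -E '^VERSION_CODENAME=' "$file" | cut -d= -f2 | tr -d '"'
}

# Succeed if dpkg reports the package as installed.
pkg_installed() {
  dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q 'install ok installed'
}

# Succeed when the host is Bionic and both DKMS packages are installed.
matches_bug_conditions() {
  [ "$(release_codename)" = "bionic" ] &&
    pkg_installed spl-dkms &&
    pkg_installed zfs-dkms
}
```

For example: `matches_bug_conditions && echo "this host may be affected"`.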

ALTERNATIVELY:
1- You are creating a new OS image based on Bionic that already runs kernel 5.4.0-67 or 5.4.0-1030-aws, and you install 'spl-dkms' and 'zfs-dkms'.

SCENARIO:
We have systems running the Xenial, Bionic, and Focal Fossa releases. We also have 'Unattended Upgrades' enabled, and we use ZFS.

Every time Ubuntu releases a kernel update, it is installed on our systems, and both the new kernel and its modules become active on the next reboot. If you run your workload on AWS using Canonical images, the same happens on fresh installs from the image tagged 'ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20210224' in the AWS Ireland (eu-west-1) Region (AMI ID = ami-0d75330b9efa7072d).

A few days ago, Ubuntu released a new Linux kernel update for the 5.4.0 series, and the problem occurs with both "linux-modules-5.4.0-67-generic" and "linux-modules-5.4.0-1039-aws" (this may also apply to the GKE, Azure, and other public cloud kernel releases, but I have not tested them).

I noticed that, both on EC2 instances rebooted after the kernel upgrade and when baking a new OS image from the latest Canonical images, the ZFS module could not be loaded. The error message is shown below:

$ sudo modprobe zfs
modprobe: ERROR: could not insert 'zfs': Invalid argument

Checking 'dmesg' shows the errors below (they also appear at boot if 'zfs-load-module.service' is active on your system):

[ 51.358828] icp: disagrees about version of symbol __kstat_create
[ 51.358830] icp: Unknown symbol __kstat_create (err -22)
[ 51.358925] icp: disagrees about version of symbol __kstat_install
[ 51.358927] icp: Unknown symbol __kstat_install (err -22)
[ 79.196638] icp: disagrees about version of symbol __kstat_delete
[ 79.196643] icp: Unknown symbol __kstat_delete (err -22)
[ 79.196687] icp: disagrees about version of symbol __kstat_create
[ 79.196689] icp: Unknown symbol __kstat_create (err -22)
[ 79.196765] icp: disagrees about version of symbol __kstat_install
[ 79.196766] icp: Unknown symbol __kstat_install (err -22)
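The `err -22` in these lines is -EINVAL, which is what modprobe reports as "Invalid argument". When the log is long, a small bash sketch like this one (the function name is hypothetical) reduces it to one line per module and symbol:

```shell
#!/usr/bin/env bash
# Sketch: reduce kernel log lines read on stdin to unique "module symbol" pairs
# for the "disagrees about version of symbol" errors.
# symbol_version_errors is a hypothetical helper name.
symbol_version_errors() {
  # Strip the "[ timestamp] " prefix and keep the module and symbol names.
  sed -n 's/.*] \([A-Za-z0-9_]*\): disagrees about version of symbol \(.*\)$/\1 \2/p' |
    LC_ALL=C sort -u
}
```

For example: `sudo dmesg | symbol_version_errors`.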

On the Bionic release, both the ZFS and SPL packages are at version '0.7.5':

$ dpkg -l |grep 0.7.5-1ubuntu16.9
ii libnvpair1linux 0.7.5-1ubuntu16.9 amd64 Solaris name-value library for Linux
ii libuutil1linux 0.7.5-1ubuntu16.9 amd64 Solaris userland utility library for Linux
ii libzfs2linux 0.7.5-1ubuntu16.9 amd64 OpenZFS filesystem library for Linux
ii libzpool2linux 0.7.5-1ubuntu16.9 amd64 OpenZFS pool library for Linux
ii zfs-dkms 0.7.5-1ubuntu16.9 all OpenZFS filesystem kernel modules for Linux
ii zfs-initramfs 0.7.5-1ubuntu16.9 all OpenZFS root filesystem capabilities for Linux - initramfs
ii zfs-zed 0.7.5-1ubuntu16.9 amd64 OpenZFS Event Daemon
ii zfsutils-linux 0.7.5-1ubuntu16.9 amd64 command-line tools to manage OpenZFS filesystems

Systems running the newer kernels already ship the modules prebuilt, at a 0.8.x version that depends on the active kernel (0.8.1 on the 5.3.0-1030-aws kernel below):

(test) wbraga@bastion-host-i-07c42966be34bd44e:~$ modinfo zfs |head -2
filename: /lib/modules/5.3.0-1030-aws/kernel/zfs/zfs.ko
version: 0.8.1-1ubuntu14.4
(test) wbraga@bastion-host-i-07c42966be34bd44e:~$ modinfo spl |head -2
filename: /lib/modules/5.3.0-1030-aws/kernel/zfs/spl.ko
version: 0.8.1-1ubuntu14.4
(test) wbraga@bastion-host-i-07c42966be34bd44e:~$ uname -sr
Linux 5.3.0-1030-aws
(test) wbraga@bastion-host-i-07c42966be34bd44e:~$

As I understand it, the SPL and ZFS DKMS scripts do not replace the modules already shipped with 'linux-modules-5.3.0-XXXX' (and later) when those are equal to or newer than the DKMS-built ones, as shown below:

(...)
zavl.ko:
Running module version sanity check.
Error! Module version 0.7.5-1ubuntu16.11 for zavl.ko
is not newer than what is already found in kernel 5.4.0-64-generic (0.8.3-1ubuntu12.5).
You may override by specifying --force.
(...)
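The decision in this log can be approximated as follows. This is a sketch of the rule as DKMS reports it, not DKMS's own code; it uses GNU `sort -V` as a stand-in for Debian version ordering (`dpkg --compare-versions` is the authoritative comparison), and it treats a missing in-kernel module as "install", matching the "No original module exists" messages in the next log. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the DKMS "module version sanity check" as reported in the log:
# a DKMS-built module is installed only if no in-kernel copy exists or its
# version is strictly newer. Not DKMS's own code; function names are hypothetical.
# Assumption: GNU `sort -V` approximates Debian version ordering.

# Succeed if version $1 sorts strictly after version $2.
is_newer() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Print "install" or "skip" for a DKMS version vs. the in-kernel version
# (an empty in-kernel version means no original module exists).
would_dkms_install() {
  local dkms_version="$1" kernel_version="$2"
  if [ -z "$kernel_version" ] || is_newer "$dkms_version" "$kernel_version"; then
    echo "install"
  else
    echo "skip"
  fi
}
```

With the versions from the log, `would_dkms_install 0.7.5-1ubuntu16.11 0.8.3-1ubuntu12.5` prints "skip". On a live system the second argument could come from `modinfo -F version zavl`.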

However, some of the modules are still installed, into a separate path (the output below shows the modules that were installed):

(...)
splat.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.4.0-67-generic/updates/dkms/
(...)
zpios.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.4.0-67-generic/updates/dkms/
(...)
icp.ko:
Running module version sanity check.
 - Original module
 - Installation
   - Installing to /lib/modules/5.4.0-67-generic/updates/dkms/

Once these three modules are installed, I cannot load ZFS. I assume the issue lies in the duplicate 'icp.ko' module under the '/lib/modules/5.4.0-67-generic/updates/dkms/' directory.

$ ls -l /lib/modules/5.4.0-67-generic/updates/dkms/
total 632
-rw-r--r-- 1 root root 317744 Mar 23 14:55 icp.ko
-rw-r--r-- 1 root root 289680 Mar 23 14:52 splat.ko
-rw-r--r-- 1 root root 36848 Mar 23 14:55 zpios.ko
$
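To spot this kind of duplicate, the bash sketch below lists modules under updates/dkms that also exist in the same kernel's own kernel/ tree. MODULES_ROOT and the function name are hypothetical. On Ubuntu, /etc/depmod.d typically gives `updates` priority over the kernel's own modules, so such a copy would be the one modprobe picks; `modinfo -n icp` and `modprobe --show-depends zfs` show which files are actually resolved.

```shell
#!/usr/bin/env bash
# Sketch: list modules in updates/dkms that also exist in the kernel's own
# kernel/ tree for the same kernel version, i.e. DKMS copies that may shadow it.
# Assumption: MODULES_ROOT is a hypothetical override of /lib/modules (for testing).
shadowing_dkms_modules() {
  local kver="${1:-$(uname -r)}"
  local root="${MODULES_ROOT:-/lib/modules}/$kver"
  local ko name
  for ko in "$root"/updates/dkms/*.ko; do
    [ -e "$ko" ] || continue   # glob did not match: no DKMS modules
    name=$(basename "$ko")
    # Print the name if a module file with the same name exists under kernel/.
    if [ -n "$(find "$root/kernel" -name "$name" -print -quit 2>/dev/null)" ]; then
      echo "$name"
    fi
  done
}
```

For example: `shadowing_dkms_modules 5.4.0-67-generic`, followed by `modinfo -n icp` to see which icp.ko modprobe resolves.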

It appears that newer kernels (such as '5.4.0-67-generic') ship ZFS version '0.8.3-1ubuntu12.6', while the previous kernel release (such as 5.4.0-64-generic) ships '0.8.3-1ubuntu12.5'. Something may have changed that now prevents modprobe from loading the module properly.
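To compare the ZFS version shipped with each installed kernel side by side, a sketch along these lines could be used. It only looks at the zfs.ko under each kernel's kernel/ directory, so DKMS builds under updates/dkms are ignored, and reads the version with `modinfo -F version`; MODULES_ROOT and the function name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print "<kernel> <zfs version>" for each installed kernel, using only
# the zfs.ko under that kernel's kernel/ tree (DKMS builds in updates/ are ignored).
# Assumption: MODULES_ROOT is a hypothetical override of /lib/modules (for testing).
zfs_version_per_kernel() {
  local root="${MODULES_ROOT:-/lib/modules}" dir kver ko ver
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    kver=$(basename "$dir")
    ko=$(find "${dir%/}/kernel" -name zfs.ko -print -quit 2>/dev/null)
    if [ -n "$ko" ]; then
      ver=$(modinfo -F version "$ko")
    else
      ver="(not found)"
    fi
    printf '%s %s\n' "$kver" "$ver"
  done
}
```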

A workaround for this issue is to remove both the 'spl-dkms' and 'zfs-dkms' packages. apt/dpkg then removes the compiled modules, and the module can be loaded again.
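The workaround could be scripted roughly as below. The DRY_RUN switch and the function names are hypothetical; with DRY_RUN=1 the commands are only printed. The final `modinfo -n zfs` is there to confirm that the module now resolves to the kernel's own copy rather than one under updates/dkms.

```shell
#!/usr/bin/env bash
# Sketch of the workaround: remove the DKMS packages so modprobe uses the
# ZFS/SPL modules shipped with the kernel again.
# Assumption: DRY_RUN=1 (a hypothetical switch) prints commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

remove_zfs_dkms() {
  # 1) removing the packages also removes the DKMS-built modules (per this report);
  # 2) load ZFS again; 3) the printed path should be the kernel's own copy.
  run sudo apt-get remove --yes spl-dkms zfs-dkms &&
    run sudo modprobe zfs &&
    run modinfo -n zfs
}
```

For example, `DRY_RUN=1 remove_zfs_dkms` shows the three commands without running them.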

STEPS TO REPRODUCE THE ISSUE:

1- Start a Bionic machine (you can test it on Vagrant, for example). Make sure it runs an older kernel, one or two releases behind.
2- Upgrade your system to kernel 5.4.0-60 (for example) by running:

sudo apt install linux-image-5.4.0-60-generic linux-headers-5.4.0-60-generic linux-modules-5.4.0-60-generic

3- Reboot your system
4- Install the ZFS packages, including both DKMS packages.

sudo apt install libnvpair1linux libuutil1linux libzfs2linux libzpool2linux spl spl-dkms zfs-dkms zfs-initramfs zfs-zed zfsutils-linux

5- Install a newer kernel release, such as 5.4.0-64, and check whether spl-dkms and zfs-dkms are triggered.

sudo apt install linux-image-5.4.0-64-generic linux-headers-5.4.0-64-generic linux-modules-5.4.0-64-generic

6- Reboot (you will now be running kernel 5.4.0-64).

7- Check whether you can load zfs with 'sudo modprobe zfs'

8- Now install kernel 5.4.0-67.

sudo apt install linux-image-5.4.0-67-generic linux-headers-5.4.0-67-generic linux-modules-5.4.0-67-generic

9- Reboot your system. You will now be running kernel '5.4.0-67', and the ZFS module will not load (check 'dmesg').
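The package steps of this recipe could be wrapped in a few helpers such as the following. This is a sketch: the function names and the DRY_RUN switch are hypothetical, `apt-get` is used instead of `apt` because it is meant for scripts, and the reboots are still done by hand.

```shell
#!/usr/bin/env bash
# Sketch of the package steps in the reproduction recipe; reboots stay manual.
# Assumption: DRY_RUN=1 (a hypothetical switch) prints commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Steps 2, 5 and 8: install one kernel ABI for the -generic flavour, e.g. 5.4.0-64.
install_kernel() {
  local abi="$1"
  run sudo apt-get install --yes "linux-image-${abi}-generic" \
    "linux-headers-${abi}-generic" "linux-modules-${abi}-generic"
}

# Step 4: ZFS userland plus both DKMS packages.
install_zfs_with_dkms() {
  run sudo apt-get install --yes libnvpair1linux libuutil1linux libzfs2linux \
    libzpool2linux spl spl-dkms zfs-dkms zfs-initramfs zfs-zed zfsutils-linux
}

# Steps 7 and 9: try to load ZFS on the running kernel and report the result.
check_zfs_loads() {
  if run sudo modprobe zfs; then
    echo "zfs loaded on $(uname -r)"
  else
    echo "zfs failed to load on $(uname -r); check dmesg" >&2
    return 1
  fi
}
```

A run would then look like `install_kernel 5.4.0-64`, a reboot, `check_zfs_loads`, and the same again with 5.4.0-67.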

Colin Ian King (colin-king) wrote:

More recent Ubuntu kernel packages provide ZFS + SPL as built-in modules, so they do not have to be rebuilt with DKMS. This saves time and disk space, with the added advantage that the modules will always be installable and will work across kernel updates.

The recommended practice (as you noted in the workaround) is to avoid the DKMS zfs + spl modules by uninstalling them. The DKMS modules are now provided only for users who run their own custom kernels and want to build ZFS against them.

I therefore suggest removing the ZFS + SPL dkms modules as the standard recommended solution.

Changed in zfs-linux (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
Changed in zfs-linux (Ubuntu):
assignee: Colin Ian King (colin-king) → Dimitri John Ledkov (xnox)