NULL pointer dereference when using z3fold and zswap
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Linux | Fix Released | High | |
linux (Ubuntu) | Fix Released | Undecided | Po-Hsu Lin |
Bionic | Fix Released | Undecided | Unassigned |
Cosmic | Fix Released | Undecided | Unassigned |
Bug Description
== Justification ==
When using z3fold and zswap on a VM under overcommitted memory stress,
z3fold complains about an "unknown buddy id 0" and fails to get a
pointer to the mapped allocation in z3fold_map().
z3fold: unknown buddy id 0
WARNING: CPU: 2 PID: 1584 at mm/z3fold.c:971 z3fold_
This leads to a NULL pointer dereference in zswap:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 2 PID: 1584 Comm: stress Tainted: G W 4.18.0-17-generic #18-Ubuntu
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1ubuntu1 04/01/2014
RIP: 0010:zswap_
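The two log signatures above can be checked for mechanically in a saved kernel log. The following is a minimal sketch and not part of the original report: the helper name `has_z3fold_crash` is hypothetical, and it only tests that both message strings quoted above appear somewhere in the file, without checking their order.

```shell
#!/bin/sh
# Sketch: succeed if a saved kernel log contains both the z3fold warning
# and the NULL pointer dereference message quoted in this report.
# has_z3fold_crash is a hypothetical helper name, not an existing tool.
has_z3fold_crash() {
    log_file=$1
    grep -q 'z3fold: unknown buddy id' "$log_file" &&
        grep -q 'BUG: unable to handle kernel NULL pointer dereference' "$log_file"
}
```

For example, after saving the output of `dmesg` to a file, `has_z3fold_crash /path/to/saved.log && echo "matches this bug"` would flag a log that shows both messages.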
== Fix ==
ca0246bb (z3fold: fix possible reclaim races)
This patch is already in Disco and can be cherry-picked into B/C.
Not needed for Xenial and older kernels, as z3fold is not supported there.
== Test ==
Test kernels for Bionic / Cosmic can be found here:
http://
http://
This issue can be reproduced easily in a KVM guest with the following setup:
* 8G disk, 4G RAM, 4 CPUs
* 1G swap
* "zswap.enabled=1 zswap.zpool=z3fold zswap.max_
* "z3fold" module added into /etc/initramfs-
Stress it with two child processes running:
* stress --vm-bytes 512M --vm 4 --vm-hang 3
* stress --vm-bytes 512M --vm 4 --vm-hang 7
The VM is expected to crash within 5 minutes.
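The reproduction steps above can be sketched as a script. This is an illustrative sketch under stated assumptions, not the test harness actually used for this bug: the helper names and the `ZSWAP_PARAMS_DIR` override are hypothetical, it assumes zswap exposes its `enabled` and `zpool` parameters under /sys/module/zswap/parameters (with `enabled` reading `Y` when on), and it assumes the two stress commands are meant to run concurrently. The stress invocations themselves are copied from the list above.

```shell
#!/bin/sh
# Directory holding zswap's module parameters; overridable for testing.
# (ZSWAP_PARAMS_DIR is a hypothetical variable introduced for this sketch.)
ZSWAP_PARAMS_DIR=${ZSWAP_PARAMS_DIR:-/sys/module/zswap/parameters}

# Succeed only if zswap is enabled and backed by the z3fold pool.
zswap_uses_z3fold() {
    [ "$(cat "$ZSWAP_PARAMS_DIR/enabled" 2>/dev/null)" = "Y" ] &&
        [ "$(cat "$ZSWAP_PARAMS_DIR/zpool" 2>/dev/null)" = "z3fold" ]
}

# Start both stress workloads from the report in the background and wait.
run_stress() {
    stress --vm-bytes 512M --vm 4 --vm-hang 3 &
    stress --vm-bytes 512M --vm 4 --vm-hang 7 &
    wait
}

# Only act when explicitly asked to, so sourcing the file has no side effects.
if [ "${1:-}" = "run" ]; then
    if zswap_uses_z3fold; then
        run_stress
    else
        echo "zswap is not enabled with zpool=z3fold; not reproducing" >&2
        exit 1
    fi
fi
```

Invoking the script with the single argument `run` inside the test VM starts both workloads; the configuration check is there so that a run on a guest without z3fold-backed zswap is not mistaken for a passing test.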
With the patched kernel, the VM can withstand this stress for over an
hour without crashing with this issue.
== Regression potential ==
Small.
The fix is limited to z3fold, a feature that users must enable
explicitly.
== Original Bug Report ==
Under memory pressure, my VM locks up. This has been reported upstream though I don't know how far any solution has progressed.
https:/
Feb 6 07:15:42 vps632258 kernel: [151336.450064] z3fold: unknown buddy id 0
Feb 6 07:15:42 vps632258 kernel: [151336.454450] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
The little bit of log I managed to salvage is attached.
This has happened to two identical VMs. Unusually it has not occurred on a third VM which is configured the same but has less RAM (fingers crossed it won't).
Irrelevant information:
I thought the lock-ups were due to me using a BTRFS filesystem, however I swapped over to NILFS2 and this still occurs. The only difference seems to be that I am now able to grab some of the kernel output.
ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-
ProcVersionSign
Uname: Linux 4.18.0-14-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.5
Architecture: amd64
Date: Wed Feb 6 10:55:05 2019
ProcEnviron:
TERM=xterm
PATH=(custom, no user)
XDG_RUNTIME_
LANG=en_GB.UTF-8
SHELL=/bin/bash
SourcePackage: linux-signed-hwe
UpgradeStatus: No upgrade log present (probably fresh install)
Changed in linux: | |
importance: | Unknown → High |
status: | Unknown → Confirmed |
Changed in linux: | |
status: | Confirmed → Fix Released |
Changed in linux (Ubuntu Bionic): | |
status: | New → In Progress |
Changed in linux (Ubuntu Cosmic): | |
status: | New → In Progress |
Changed in linux (Ubuntu): | |
status: | In Progress → Fix Released |
tags: | added: cosmic |
description: | updated |
Changed in linux (Ubuntu Cosmic): | |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Bionic): | |
status: | In Progress → Fix Committed |
Created attachment 279297
dmesg log of crash
This happens mostly during memory pressure but I am not sure how to trigger it reliably. I am attaching the full log.
This is the kernel commandline
>BOOT_IMAGE=../vmlinuz-linux root=UUID=57274b3a-92ab-468e-b03a-06026675c1af rw name=92b4aeb2-fb97-45c1-8a60-2816efe5d57e=home resume=/dev/mapper/home offset=42772480 acpi_backlight=video zswap.enabled=1 zswap.zpool=z3fold max_pool_percent=5 transparent_hugepage=madvise scsi_mod.use_blk_mq=1 ../intel-ucode.img, ../initramfs-linux.img
>rd.luks.
>resume_
>zswap.
>vga=current initrd=
I found this bug, https://bugzilla.kernel.org/show_bug.cgi?id=198585, to be very similar, but the proposed fix has not been merged, so I can't be sure whether it will fix the issue I am having.