Silent data corruption in Linux kernel 4.15
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Undecided | Unassigned |
Bionic | Fix Released | High | Colin Ian King |
Bug Description
== SRU Justification [BIONIC] ==
A silent data corruption was introduced in v4.10-rc1 with commit
72ecad22d9f198a and fixed upstream with commit 17d51b10d7773e4. It affects
users of O_DIRECT, in our case a KVM virtual machine with drives
which use qemu's "cache=none" option.
== Fix ==
Upstream commits:
0aa69fd32a5f766
block: add a lower-level bio_add_page interface
b403ea2404889e1
block: bio_iov_iter_get_pages: fix size of last iovec
9362dd1109f87a9
blkdev: __blkdev_direct_IO_simple: fix leak in error case
17d51b10d7773e4
block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs
The first three patches are required for a clean application of the
final patch, which actually fixes the issue.
== Regression Potential ==
These changes touch the block layer, so the main regression risk is
further data corruption. The fixes have been in the upstream kernel for
several weeks, and so far no subsequent fixes have been required.
== Test Case ==
Build the program listed below [1]
kudos to Jan Kara, and run with:
dd if=/dev/zero if=loop.img bs=1M count=2048
sudo losetup /dev/loop0 loop.img
./blkdev-dio-test /dev/loop0 0 &
./blkdev-dio-test /dev/loop0 2048 &
Without the fix, ones lost writes fairly soon. Without the fix, this
runs without any losy write messages.
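The report does not give a build command; a plain invocation along these
lines is assumed to suffice (the source already defines _GNU_SOURCE):
gcc -O2 -o blkdev-dio-test blkdev-dio-test.c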
blkdev-dio-test.c:
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <stdlib.h>
#include <sys/uio.h>

#define PAGE_SIZE 4096
#define SECT_SIZE 512
#define BUF_OFF (2*SECT_SIZE)

int main(int argc, char **argv)
{
    int fd = open(argv[1], O_RDWR | O_DIRECT);
    int ret;
    char *buf;
    loff_t off;
    struct iovec iov[2];
    unsigned int seq;

    if (fd < 0) {
        perror("open");
        return 1;
    }
    off = strtol(argv[2], NULL, 10);
    /* O_DIRECT requires an aligned buffer; one page is enough here. */
    buf = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
    if (!buf) {
        perror("aligned_alloc");
        return 1;
    }
    /* Two 512-byte segments with a gap between them, so the vectored
     * write spans more than one iovec segment. */
    iov[0].iov_base = buf;
    iov[0].iov_len = SECT_SIZE;
    iov[1].iov_base = buf + BUF_OFF;
    iov[1].iov_len = SECT_SIZE;
    seq = 0;
    memset(buf, 0, PAGE_SIZE);
    while (1) {
        /* Stamp both segments with the current sequence number... */
        *(unsigned int *)buf = seq;
        *(unsigned int *)(buf + BUF_OFF) = seq;
        ret = pwritev(fd, iov, 2, off);
        if (ret < 0) {
            perror("pwritev");
            return 1;
        }
        if (ret != 2*SECT_SIZE) {
            fprintf(stderr, "Short pwritev: %d\n", ret);
            return 1;
        }
        /* ...then read both sectors back and verify the stamps. On
         * disk the segments are adjacent, at off and off+SECT_SIZE. */
        ret = pread(fd, buf, PAGE_SIZE, off);
        if (ret < 0) {
            perror("pread");
            return 1;
        }
        if (ret != PAGE_SIZE) {
            fprintf(stderr, "Short read: %d\n", ret);
            return 1;
        }
        if (*(unsigned int *)buf != seq ||
            *(unsigned int *)(buf + SECT_SIZE) != seq) {
            printf("Lost write %u: %u %u\n", seq,
                   *(unsigned int *)buf,
                   *(unsigned int *)(buf + SECT_SIZE));
            return 1;
        }
        seq++;
    }
    return 0;
}
References:
[1] https:/
=======
TLDR: A silent data corruption was introduced in v4.10-rc1 with commit
72ecad22d9f198a. This is the commit which fixes the issue:
-------
commit 17d51b10d7773e4
Author: Martin Wilck <email address hidden>
Date: Wed Jul 25 23:15:09 2018 +0200
block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs

bio_iov_iter_get_pages() currently only adds pages for the next non-empty
segment from the iov_iter to the bio. That's suboptimal for callers,
which typically try to pin as many pages as fit into the bio. This patch
converts the current bio_iov_iter_get_pages() into a static helper, and
introduces a new helper that allocates as many pages as

1) fit into the bio,
2) are present in the iov_iter,
3) and can be pinned by MM.

Error is returned only if zero pages could be pinned. Because of 3), a
zero return value doesn't necessarily mean all pages have been pinned.
Callers that have to pin every page in the iov_iter must still call this
function in a loop (this is currently the case).

This change matters most for __blkdev_direct_IO_simple(), which calls
bio_iov_iter_get_pages() only once. If it obtains less pages than
requested, it returns a "short write" or "short read", and
__generic_file_write_iter() falls back to buffered writes, which may
lead to data corruption.
Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev direct-io")
Reviewed-by: Christoph Hellwig <email address hidden>
Signed-off-by: Martin Wilck <email address hidden>
Signed-off-by: Jens Axboe <email address hidden>
-------
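The last paragraph of that message is the heart of the bug: a short
return from a single bio_iov_iter_get_pages() call was treated as a
short direct write, and the generic path silently completed the rest
through the page cache. As an illustration of the "call it in a loop"
contract the fix formalizes, here is a minimal userspace sketch, not
taken from the kernel or the bug report (pwritev_all is a made-up name),
of how a caller handles a partially completed vectored write:

#define _GNU_SOURCE
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/uio.h>

/* Hypothetical helper: keep calling pwritev() until every byte
 * described by the iovec array has been written, advancing past the
 * bytes already accepted -- the userspace analogue of the looping
 * contract placed on bio_iov_iter_get_pages() callers. */
static ssize_t pwritev_all(int fd, struct iovec *iov, int iovcnt, off_t off)
{
    ssize_t total = 0;

    while (iovcnt > 0) {
        ssize_t n = pwritev(fd, iov, iovcnt, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data: retry */
            return -1;
        }
        if (n == 0)
            break;          /* no forward progress; give up */
        total += n;
        off += n;
        /* Drop the iovecs that were written completely... */
        while (iovcnt > 0 && (size_t)n >= iov->iov_len) {
            n -= iov->iov_len;
            iov++;
            iovcnt--;
        }
        /* ...and advance partway into the first unfinished one. */
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + n;
            iov->iov_len -= (size_t)n;
        }
    }
    return total;
}

int main(void)
{
    char a[] = "hello ";
    char b[] = "world\n";
    struct iovec iov[2] = {
        { .iov_base = a, .iov_len = sizeof(a) - 1 },
        { .iov_base = b, .iov_len = sizeof(b) - 1 },
    };
    int fd = open("pwritev-all-demo.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (pwritev_all(fd, iov, 2, 0) < 0) {
        perror("pwritev_all");
        return 1;
    }
    close(fd);
    return 0;
}

Note that with O_DIRECT the remainder of a short write may no longer be
sector-aligned, so a userspace retry like this is not always possible;
that is presumably why the kernel's fast path fell back to buffered
writes instead, with the consequences described above.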
Since there were a lot of components involved in the initial report to us (xfs, guest kernel, guest virtio drivers, qemu, host kernel, storage system), we had to isolate the failure. This is the commit which fixes the data corruption bug. We created a reliable reproduction and tested both with and without the patch. We also created a version of the kernel which prints a message when the data-corrupting path is triggered.
> 1) The release of Ubuntu you are using, via 'lsb_release -rd' or System -> About Ubuntu
# lsb_release -rd
Description: Ubuntu 18.04.1 LTS
Release: 18.04
> 2) The version of the package you are using, via 'apt-cache policy pkgname' or by checking in Software Center
# apt-cache policy linux-image-
linux-image-
  Installed: 4.15.0-36.39
  Candidate: 4.15.0-36.39
  Version table:
 *** 4.15.0-36.39 500
        500 http://
        500 http://
        100 /var/lib/
> 3) What you expected to happen
We ran a fio random write workload over 8x 512MB files on XFS in the guest OS, on qemu/kvm, on host kernel 4.15.0-
qemu-system was configured with cache=none, which means Direct IO (O_DIRECT). This is a very common configuration.
qemu-system ran with aio=threads, the default.
We were expecting no data corruption.
> 4) What happened instead
The guest filesystem was corrupted.
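For context, qemu's cache=none opens the backing image with O_DIRECT,
bypassing the host page cache. The following is a minimal sketch, not
taken from the report, of what O_DIRECT demands of an application:
aligned buffer, file offset, and transfer size (one page is used here,
which satisfies typical 512-byte or 4K sector alignment):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <block device or image>\n", argv[0]);
        return 1;
    }
    /* O_DIRECT transfers bypass the page cache entirely. */
    int fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open(O_DIRECT)");
        return 1;
    }
    /* The buffer itself must be aligned; aligned_alloc is C11. */
    char *buf = aligned_alloc(4096, 4096);
    if (!buf) {
        perror("aligned_alloc");
        return 1;
    }
    /* Offset 0 and length 4096 are both aligned, so this read is valid. */
    ssize_t n = pread(fd, buf, 4096, 0);
    if (n < 0)
        perror("pread");
    else
        printf("read %zd bytes directly from %s\n", n, argv[1]);
    free(buf);
    close(fd);
    return 0;
}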
affects: linux-signed (Ubuntu) → linux (Ubuntu)
summary: Silent corruption in Linux kernel 4.15 → Silent data corruption in Linux kernel 4.15
Changed in linux (Ubuntu Bionic): status: Triaged → Fix Released
Changed in linux (Ubuntu Bionic): status: Triaged → In Progress
Changed in linux (Ubuntu Bionic): status: In Progress → Fix Committed
tags: added: cscc
Status changed to 'Confirmed' because the bug affects multiple users.