btrfs discard issue after power event
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
btrfs-tools (Ubuntu) | New | Undecided | Unassigned |
Bug Description
----Overview----
Automation scripts test SSD firmware across power transitions as part of interoperability testing, using the following procedure:
1) Create four partitions of 25% each (varying file systems) and mount them as a secondary data drive
2) BTRFS partition mounted with discard flag via /etc/fstab
3) Create 10G unique data pattern file on root fs
4) Copy to each target
5) Verify each target
6) Perform power transition (restart, shutdown, sleep, or hibernate)
7) Verify each target
8) Remove target file
9) Copy the file from the internal drive (root fs) to each target again
10) Verify targets
11) Perform power transition
..etc
BTRFS fails at step 10. The machine has come up from the power event, verified the target files, deleted the target files, copied from the internal again, and fails verifying the freshly copied file.
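The copy and verify steps of the procedure above can be sketched as follows. This is a minimal illustration, not the actual automation: the real scripts verify with fio headers, whereas this sketch swaps in a SHA-256 comparison, and the names `sha256_of` and `copy_and_verify` are hypothetical. It also does not sync or drop caches, so a read immediately after a copy may be served from the page cache rather than from the drive.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, target_dirs, name):
    """Copy source into each target directory, then compare digests.

    Returns the list of target directories whose copy did not verify.
    """
    expected = sha256_of(source)
    failed = []
    for target_dir in target_dirs:
        destination = Path(target_dir) / name
        shutil.copyfile(source, destination)
        try:
            if sha256_of(destination) != expected:
                failed.append(str(target_dir))
        except OSError:
            # A read error (for example EIO) also counts as a failed verify.
            failed.append(str(target_dir))
    return failed
```

With the layout in this report, the targets would be the four mount points, and a failure like the one described would show up as the btrfs mount point appearing in the returned list.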
----Failure----
On failure, the fio verify threads fail with an invalid header (the data is ALWAYS "101" where fio's ACCA header is expected, which I assume is a quirk of fio), and dmesg shows csum failed messages:
csum failed ino 262 off 9985851392 csum 1474905414 expected csum 210901362
The file is readable up to a certain point, after which dd returns an I/O error.
root@xxxxx:$ dd if=/mnt/
1+0 records in
1+0 records out
512 bytes copied, 0.000311177 s, 1.6 MB/s
root@xxxxx:$ dd if=/mnt/
dd: error reading '/mnt/g/
0+0 records in
0+0 records out
0 bytes copied, 0.000773759 s, 0.0 kB/s
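For locating the point at which a file stops being readable without stepping through it with dd by hand, a small helper along these lines may be useful. It is only a sketch: the function name `first_read_error` and the 4096-byte block size are assumptions, not part of the test automation.

```python
import os


def first_read_error(path, block_size=4096):
    """Return (offset, errno) of the first failing read, or None if the
    whole file reads cleanly."""
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            try:
                data = os.pread(fd, block_size, offset)
            except OSError as err:
                # On a checksum failure the kernel typically returns EIO here.
                return offset, err.errno
            if not data:
                # End of file reached without any read error.
                return None
            offset += len(data)
    finally:
        os.close(fd)
```

The reported offset is accurate only to the block size used, so it should land close to, but not necessarily exactly on, the first `off` value reported in dmesg.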
Here we see that both files claim to be the right size but restart-3.bin is unreadable after the offset above.
-rw-r--r-- 1 root root 10737418240 Oct 17 17:34 restart-1.bin
-rw-r--r-- 1 root root 10737418240 Oct 17 17:44 restart-3.bin
This fails on Ubuntu Server 16.04 with btrfs-progs 4.4 and 4.8, and now on Ubuntu Server 16.10. With the discard flag removed from the btrfs entry in fstab, the failure does not reproduce; with the power event removed, it does not reproduce either.
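Since the discard flag is the deciding factor, it can help to confirm which btrfs mounts actually have it active before a run. The sketch below parses text in the `/proc/mounts` format; the function name and the sample option strings are assumptions made for illustration (only the device names and mount points come from the DUT layout in this report).

```python
def btrfs_mounts_with_discard(mounts_text):
    """Return mount points of btrfs filesystems mounted with a discard option."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        has_discard = any(
            opt == "discard" or opt.startswith("discard=")
            for opt in options.split(",")
        )
        if fs_type == "btrfs" and has_discard:
            found.append(mount_point)
    return found


# Hypothetical sample in /proc/mounts format; the option lists are assumed.
SAMPLE = """\
/dev/sdb1 /mnt/f ext4 rw,relatime 0 0
/dev/sdb2 /mnt/g btrfs rw,relatime,ssd,discard,space_cache 0 0
/dev/sdb3 /mnt/h xfs rw,relatime 0 0
"""
print(btrfs_mounts_with_discard(SAMPLE))  # ['/mnt/g']
```

On a live system the same function could be fed the contents of `/proc/mounts`.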
----Reproducibility----
Ubuntu Server 16.04 / BTRFS-PROGS 4.4 : 100% within 10 restarts, 25-30 reproductions
Ubuntu Server 16.04 / BTRFS-PROGS 4.8 : 100% within 10 restarts, 5 reproductions
Ubuntu Server 16.10 / BTRFS-PROGS 4.7 : 100% within 10 restarts, 3 reproductions
----System Information----
Distro : ubuntu 16.10
Kernel : Linux 4.8.0-22-generic #24-Ubuntu SMP Sat Oct 8 09:15:00 UTC 2016 x86_64 x86_64 x86_64
CPU : Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz (1261.444)
CPUCores: 4
Model : Gigabyte Technology Co., Ltd. Z170M-D3H-CF
BIOS : American Megatrends Inc. F2
---DUT Controller Info---
PCI Bus ID : 0000:00:17.0
Device Path: /sys/bus/
Module Name: ahci
Module Vers: 3.0
---DUT Controller Bus---
00:17.0 SATA controller [0106]: Intel Corporation Sunrise Point-H SATA controller [AHCI mode] [8086:a102] (rev 31) (prog-if 01 [AHCI 1.0])
---DUT Layout---
/dev/sdb4 ext4 110G 23G 82G 22% /mnt/i
/dev/sdb1 ext4 110G 39G 66G 37% /mnt/f
/dev/sdb2 btrfs 112G 33G 79G 30% /mnt/g
/dev/sdb3 xfs 112G 25G 88G 22% /mnt/h
btrfs-tools:
Installed: 4.7.3-1
Candidate: 4.7.3-1
Version table:
*** 4.7.3-1 500
500 http://
100 /var/lib/
---- Logs ----
17-10 17:43:41 | -------
17-10 17:43:41 | loopy : restart 3 - pre-power copy
17-10 17:43:41 | -------
17-10 17:43:41 | Copying from /systemtest/
17-10 17:43:41 | Started tag cp-_mnt_
17-10 17:43:41 | Copying from /systemtest/
17-10 17:43:41 | Started tag cp-_mnt_
17-10 17:43:41 | Copying from /systemtest/
17-10 17:43:41 | Started tag cp-_mnt_
17-10 17:43:41 | Copying from /systemtest/
17-10 17:43:41 | Started tag cp-_mnt_
17-10 17:43:41 | -------
17-10 17:43:41 | Monitoring 4 pids for 999 minutes
17-10 17:44:15 | PID 2933 - cp-_mnt_
17-10 17:45:07 | PID 2944 - cp-_mnt_
17-10 17:45:12 | PID 2922 - cp-_mnt_
17-10 17:45:13 | PID 2955 - cp-_mnt_
17-10 17:45:14 | All tags exhausted
17-10 17:45:14 | -------
17-10 17:45:14 |
17-10 17:45:25 |
17-10 17:45:25 | -------
17-10 17:45:25 | loopy : restart 3 - pre-power verification
17-10 17:45:25 | -------
17-10 17:45:25 | Verifying /mnt/f/
17-10 17:45:25 | Started tag restart-_mnt_f_-pre [5031]
17-10 17:45:25 | Verifying /mnt/g/
17-10 17:45:25 | Started tag restart-_mnt_g_-pre [5045]
17-10 17:45:25 | Verifying /mnt/h/
17-10 17:45:25 | Started tag restart-_mnt_h_-pre [5059]
17-10 17:45:25 | Verifying /mnt/i/
17-10 17:45:25 | Started tag restart-_mnt_i_-pre [5073]
17-10 17:45:25 | -------
17-10 17:45:25 | Monitoring 4 pids for 999 minutes
17-10 17:46:40 | PID 5045 - restart-_mnt_g_-pre - FAILED. Exit: 1
17-10 17:46:40 | FAILED: 5045 has failed.
17-10 17:46:40 | -------
17-10 17:46:40 | ERROR: Failed during restart 3 pre-power event verification [Line:499]
BTRFS warning (device sdb2): csum failed ino 262 off 9985851392 csum 1474905414 expected csum 210901362
BTRFS warning (device sdb2): csum failed ino 262 off 9985982464 csum 1218422395 expected csum 1497608406
BTRFS warning (device sdb2): csum failed ino 262 off 9986113536 csum 3058027576 expected csum 25891403
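The three warnings above can be parsed to see how the failing offsets relate to each other. This is a small sketch; the regular expression matches only the message format shown in this report (other kernel versions may format the line differently), and the names `CSUM_RE` and `parse_csum_failures` are hypothetical.

```python
import re

# Matches the "csum failed" message format shown in this report.
CSUM_RE = re.compile(
    r"csum failed ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<found>\d+) expected csum (?P<expected>\d+)"
)


def parse_csum_failures(log_text):
    """Return one dict of integer fields per 'csum failed' message in the text."""
    return [
        {key: int(value) for key, value in match.groupdict().items()}
        for match in CSUM_RE.finditer(log_text)
    ]


LOG = """\
BTRFS warning (device sdb2): csum failed ino 262 off 9985851392 csum 1474905414 expected csum 210901362
BTRFS warning (device sdb2): csum failed ino 262 off 9985982464 csum 1218422395 expected csum 1497608406
BTRFS warning (device sdb2): csum failed ino 262 off 9986113536 csum 3058027576 expected csum 25891403
"""

failures = parse_csum_failures(LOG)
offsets = [entry["off"] for entry in failures]
# Distance in bytes between consecutive failing offsets.
gaps = [later - earlier for earlier, later in zip(offsets, offsets[1:])]
print(gaps)  # [131072, 131072]
```

All three failures are on the same inode, spaced 128 KiB apart, starting roughly 9.3 GiB into the 10 GiB file.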
This also fails on Antergos (kernel 4.8.2-1-ARCH) with btrfs-progs 4.8.1.
Will update this bug if/when patches go in from the btrfs mailing list.