bcache makes the whole io system hang after long run time

Bug #1724173 reported by Peter Maloney
This bug affects 6 people
Affects: linux-lts-xenial (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

I am using Ubuntu 14.04 (trusty) with the 4.4.x xenial kernel (with the trusty kernel it is far easier to make bcache crash). I have mdadm raid1 on /boot and /, backed by 2 SSDs.

I have XFS on 12 ceph directories (/var/lib/ceph/osd/ceph-*), which is backed by bcache, which is backed by one separate disk per osd directory, plus a bcache cache device on an NVMe PCIe device. The bcache cache is shared by all 12 of the osd bcache devices.
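
For reference, that layering can be built roughly like this (a minimal sketch; the device names below are placeholders, not the actual devices on these machines):

    # one backing device per OSD disk, and the shared cache set on the NVMe (placeholder names)
    make-bcache -B /dev/sdc
    make-bcache -C /dev/nvme0n1p1
    # attach every bcache backing device to the one cache set by its cset uuid
    cset=$(bcache-super-show /dev/nvme0n1p1 | awk '$1=="cset.uuid"{print $2}')
    for d in /sys/block/bcache*/bcache; do echo "$cset" > "$d/attach"; done
    # then XFS on top, mounted under /var/lib/ceph/osd/ceph-*
    mkfs.xfs /dev/bcache0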

I also have 2 unused bcache cache devices on the SSDs, without mdadm raid. This hang problem was much more frequent with the cache there, and I suspected mdadm+bcache together, so I moved it to the NVMe.

The problem happens on all these devices used as bcache: Micron S630DC-400 (firmware M013 and M017), SAMSUNG MZ7KM480HMHQ-00005 (SM863a, firmware GXM5004Q), Intel DC P3700 800GB.

If I let the machines run for a few days and then detach and attach cache devices, it was very easy to hang them with the cache on the SSDs; with the cache on the NVMe I haven't seen that yet. The uptime on the hung machine was 33-34 days, and the other ones with the same setup are now at 69 days 22 h (that is when I changed the cache to NVMe).
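
The detach/attach here is the normal bcache sysfs cycle, roughly as follows (bcache0 and the cset uuid are placeholders):

    echo 1 > /sys/block/bcache0/bcache/detach              # detach the cache from one bcache device
    echo <cset.uuid> > /sys/block/bcache0/bcache/attach    # re-attach it by cache-set uuid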

For the latest hang, the text tty at the local terminal has the login prompt, but no stack trace or anything, and typing into it has no effect, not even echoing what is typed. Terminals connected previously with ssh are just hung, not responding to anything. New ssh connections fail. ping to the hung machine still gets replies. Soft shutdown via IPMI doesn't appear to do anything.

I will attach 2 files collected like: `ssh machine cat /dev/kmsg > cephX.kmsg` (since dmesg -w isn't supported here, and the logs do not get saved on the machine's disk since the IO system is hung).

----- Ubuntu bug reporting guidelines stuff -----

# lsb_release -rd
Description: Ubuntu 14.04.5 LTS
Release: 14.04

I'm not including uname -a or apt-cache policy output, since the kernel running now is different. It was linux-image-4.4.0-93-generic when it crashed (and the previous crash was with 4.4.0-78-generic).

Also, you should likely discard similar information from the apport-collected data, which is from this boot, not the previously hung one.

----- debugging procedures stuff -----

https://help.ubuntu.com/community/DebuggingSystemCrash

The guide asks for a memtest, but these machines were memtested in the past, and the problem affects more than 2 machines, so that is not likely to be useful.

I'll try to remember to try Alt+SysRq+1,t next time.
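
(For those SysRq keystrokes to do anything on these boxes, the feature needs to be enabled beforehand, for example:)

    echo 1 > /proc/sys/kernel/sysrq    # enable all SysRq functions (or set kernel.sysrq=1 via sysctl)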

I think the other sections are about getting dmesg output, which I have already, so I'll skip that.

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-4.4.0-97-generic 4.4.0-97.120~14.04.1
ProcVersionSignature: Ubuntu 4.4.0-97.120~14.04.1-generic 4.4.87
Uname: Linux 4.4.0-97-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.25
Architecture: amd64
Date: Tue Oct 17 10:50:03 2017
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-lts-xenial
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Peter Maloney (peter-maloney) wrote :

The problem still exists on 4.4.0-97-generic. And in my attempt to reproduce it in a VM, I got a corrupted kernel stack (panic) instead, which sounds worse (it could cause persistent damage, not just a hang).

Attached are the 3 scripts to reproduce the kernel corruption, plus an extra one to attach the cache in case the first script didn't.

- Make a VM with at least 2GB RAM.
- Run crash-setup first (it uses a loop device backed by something in /dev/shm so that it doesn't wear out an SSD); a rough sketch of what the scripts do follows this list.
- Then verify the cache is attached, or attach it if not.
    Output of these 2 should match:
        bcache-super-show /dev/loop0 | awk '$1=="cset.uuid"{print $2}'
        basename $(readlink /sys/block/bcache0/bcache/cache)
    This should say [writeback]:
        cat /sys/block/bcache0/bcache/cache_mode
    e.g. writethrough [writeback] writearound none
- Then run the crash-grep and the crash-fio at the same time, e.g. in screen or in 2 terminals. The grep is not there just for informational output; I strongly believe that reading the files in sysfs is what causes the crashing, as I have caused hangs and other problems many times this way in the past.
- Wait a while... it could be as soon as 20 minutes, or take some hours.
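
Since the scripts themselves are only in the attached tarball, here is a rough sketch of what the steps boil down to (the file names, sizes and fio parameters below are assumptions, not the exact script contents):

    # crash-setup (sketch): loop devices backed by files in /dev/shm, one cache + one backing
    truncate -s 800M /dev/shm/cache.img /dev/shm/backing.img
    losetup /dev/loop0 /dev/shm/cache.img
    losetup /dev/loop1 /dev/shm/backing.img
    make-bcache -C /dev/loop0
    make-bcache -B /dev/loop1
    echo /dev/loop0 > /sys/fs/bcache/register     # udev may already have registered these
    echo /dev/loop1 > /sys/fs/bcache/register
    echo writeback > /sys/block/bcache0/bcache/cache_mode

    # crash-attach (sketch): attach the backing device to the cache set, as in the check above
    cset=$(bcache-super-show /dev/loop0 | awk '$1=="cset.uuid"{print $2}')
    echo "$cset" > /sys/block/bcache0/bcache/attach

    # crash-grep (sketch): hammer the bcache sysfs files, which is what I believe triggers it
    while true; do grep -r . /sys/block/bcache0/bcache/ > /dev/null 2>&1; done

    # crash-fio (sketch): keep write load on the bcache device at the same time
    fio --name=load --filename=/dev/bcache0 --rw=randwrite --bs=4k --iodepth=16 \
        --ioengine=libaio --direct=1 --time_based --runtime=3600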

So far I have made this VM die 3 times... the first 2 times it wasn't possible to get the console output or dmesg as I wanted, and I didn't save a screenshot either... and the 3rd time also failed to capture it, even with a serial console set up, so attached is a screenshot.

Revision history for this message
Peter Maloney (peter-maloney) wrote :

Tested again with a better serial console setup; here's the text version of what that screenshot shows:

[ 744.779536] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffc00d3ccc
[ 744.779536]
[ 744.780013] CPU: 0 PID: 5087 Comm: grep Not tainted 4.4.0-97-generic #120~14.04.1-Ubuntu
[ 744.780013] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.3-20171021_125229-anatol 04/01/2014
[ 744.780013] 0000000000000000 ffff8800791afc48 ffffffff813df51c ffffffff81cbae98
[ 744.780013] 000000ad5f71c8e7 ffff8800791afcc0 ffffffff8118356c 000000ad00000010
[ 744.780013] ffff8800791afcd0 ffff8800791afc70 ffff8800367a6b40 ffffffffc00d3ccc
[ 744.780013] Call Trace:
[ 744.780013] [<ffffffff813df51c>] dump_stack+0x63/0x87
[ 744.780013] [<ffffffff8118356c>] panic+0xc8/0x20f
[ 744.786378] [<ffffffffc00d3ccc>] ? __bch_cached_dev_show+0x4ac/0x4b0 [bcache]
[ 744.786877] [<ffffffff8107e82b>] __stack_chk_fail+0x1b/0x20
[ 744.786877] [<ffffffffc00d3ccc>] __bch_cached_dev_show+0x4ac/0x4b0 [bcache]
[ 744.786877] [<ffffffff8120fe69>] ? path_openat+0x2e9/0x12d0
[ 744.786877] [<ffffffffc00d3d01>] bch_cached_dev_show+0x31/0x50 [bcache]
[ 744.786877] [<ffffffff8127f822>] sysfs_kf_seq_show+0xc2/0x1a0
[ 744.786877] [<ffffffff8127e043>] kernfs_seq_show+0x23/0x30
[ 744.786877] [<ffffffff812255bb>] seq_read+0xeb/0x360
[ 744.786877] [<ffffffff8127e7dd>] kernfs_fop_read+0x10d/0x170
[ 744.786877] [<ffffffff81201b18>] __vfs_read+0x18/0x40
[ 744.786877] [<ffffffff812020cf>] vfs_read+0x7f/0x130
[ 744.793893] [<ffffffff81202ea6>] SyS_read+0x46/0xa0
[ 744.794753] [<ffffffff8180f4f6>] entry_SYSCALL_64_fastpath+0x16/0x75
[ 744.795617] Kernel Offset: disabled
[ 744.795617] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffc00d3ccc
[ 744.795617]

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi, there was a slight issue in the repro relying on sda5; I changed that to be on shm as well.
2x800M is easy to spare. Hopefully it triggers with that as well.

# Get a 3G guest with full console logging
$ uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily arch=amd64 label=daily release=xenial
$ uvt-kvm create --password=ubuntu --log-console-output --memory 3096 xenial-bcache-test release=xenial arch=amd64 label=daily

# set up the guest and reproduce the crash in the guest
$ sudo apt update
$ sudo apt install fio
$ wget https://bugs.launchpad.net/ubuntu/+source/linux-lts-xenial/+bug/1724173/+attachment/5008126/+files/bcachecrash.tgz
$ tar xvf bcachecrash.tgz
# Add in the fixed version of crash-setup from http://paste.ubuntu.com/26063669/
$ sudo ./bcachecrash/crash-setup
$ sudo ./bcachecrash/crash-attach

# now on two consoles
$ sudo ./bcachecrash/crash-grep
$ sudo ./bcachecrash/crash-fio

Not sure if it triggers the bug, but with those fixes it runs...
I'll let you know if it crashes the kernel before I need to recycle the machine.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

This one sets up a slower device, which might be important according to IRC discussions:
http://paste.ubuntu.com/26063765/
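
The paste itself isn't reproduced here, but a deliberately slow backing device can be simulated along these lines (a sketch; the device name and delay value are assumptions), e.g. by putting dm-delay in front of the shm-backed loop device:

    # add ~50 ms of delay to every I/O on the backing loop device before handing it to make-bcache
    size=$(blockdev --getsz /dev/loop1)
    echo "0 $size delay /dev/loop1 0 50" | dmsetup create slow-backing
    make-bcache -B /dev/mapper/slow-backing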

Xav Paice (xavpaice)
tags: added: canonical-bootstack
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-lts-xenial (Ubuntu):
status: New → Confirmed
Revision history for this message
Xav Paice (xavpaice) wrote :

We're also seeing this with 4.4.0-111-generic (on Trusty), and a very similar hardware profile. The boxes in question are running Swift with a large number (millions) of objects, all approximately 32 KB in size.

I'm currently fio'ing in a test environment to try to reproduce this away from production.

Revision history for this message
Peter Maloney (peter-maloney) wrote :

@xavpaice Is anyone else with the problem also using isdct (Intel® SSD Data Center Tool for NVMe)? I have not had the problem since I stopped it from running regularly. I was using it to check the wear level at regular intervals before.

The reason I thought of that is that one time I was finally able to run a few shell commands shortly after a hang, and the only things that hung were the ones directly related to the NVMe (I definitely tested isdct, and probably parted, dd, etc.). Then eventually everything was hung as usual.
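
(If a shell still responds right after a hang, something like this can help confirm which tasks are stuck on the NVMe; the exact commands are just a suggestion:)

    # list tasks in uninterruptible (D) sleep and the kernel function they are waiting in
    ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'
    # or dump all blocked-task stacks to the kernel log (needs SysRq enabled)
    echo w > /proc/sysrq-trigger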
