kernel BUG at /build/linux-hwe-IJy1zi/linux-hwe-4.10.0/fs/cachefiles/namei.c:164!

Bug #1727861 reported by Ken Sharp
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Linux
Confirmed
Medium
cachefilesd (Debian)
Fix Released
Unknown
linux-meta-hwe (Ubuntu)
Fix Committed
Undecided
Unassigned

Bug Description

I have just installed cachefilesd and rebooted. The only change I made was the location of the cache - everything else is set to default. The cache location exists and is writable.

I thought I'd test so first attempt:

$ cat /mnt/server/share/file > /dev/null

No problem.

Second attempt:

$ cat /mnt/server/share/file > /dev/null

[ 471.283546] ------------[ cut here ]------------
[ 471.288354] kernel BUG at /build/linux-hwe-IJy1zi/linux-hwe-4.10.0/fs/cachefiles/namei.c:164!
[ 471.296977] invalid opcode: 0000 [#1] SMP
[ 471.301172] Modules linked in: arc4 md4 cachefiles cifs ccm fscache ip6table_filter ip6_tables iptable_filter ip_tables x_tables ppdev pvpanic input_leds parport_pc serio_raw parport autofs4 btrfs xor raid6_pq crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd psmouse virtio_net virtio_scsi
[ 471.332032] CPU: 0 PID: 5 Comm: kworker/u2:0 Not tainted 4.10.0-37-generic #41~16.04.1-Ubuntu
[ 471.340656] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 471.350158] Workqueue: fscache_object fscache_object_work_func [fscache]
[ 471.356972] task: ffff8ddee426ad00 task.stack: ffffbb3f401b0000
[ 471.363083] RIP: 0010:cachefiles_walk_to_object+0x718/0x890 [cachefiles]
[ 471.370055] RSP: 0018:ffffbb3f401b3d18 EFLAGS: 00010202
[ 471.375385] RAX: ffff8ddee18a6201 RBX: ffff8ddee0b6bd28 RCX: 0000000000019523
[ 471.382897] RDX: 0000000000019522 RSI: ffff8ddee621c720 RDI: ffff8ddee4001280
[ 471.390234] RBP: ffffbb3f401b3dc8 R08: 000000000001c720 R09: ffffffffc0459068
[ 471.397468] R10: ffffe66040862980 R11: ffffffff97c487ed R12: ffff8ddee11750c0
[ 471.404704] R13: ffff8ddee0b6bd80 R14: ffff8ddee0b6bd28 R15: ffff8ddee1175780
[ 471.411991] FS: 0000000000000000(0000) GS:ffff8ddee6200000(0000) knlGS:0000000000000000
[ 471.420180] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 471.426037] CR2: 00000000025c4008 CR3: 0000000020b6a000 CR4: 00000000001406f0
[ 471.433359] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 471.440678] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 471.447921] Call Trace:
[ 471.450478] ? wake_atomic_t_function+0x60/0x60
[ 471.455109] cachefiles_lookup_object+0x51/0xf0 [cachefiles]
[ 471.460872] fscache_look_up_object+0xa6/0x1c0 [fscache]
[ 471.466378] fscache_object_work_func+0x9d/0x200 [fscache]
[ 471.472012] process_one_work+0x16b/0x4a0
[ 471.476118] worker_thread+0x4b/0x500
[ 471.479880] kthread+0x109/0x140
[ 471.483294] ? process_one_work+0x4a0/0x4a0
[ 471.487577] ? kthread_create_on_node+0x60/0x60
[ 471.492208] ret_from_fork+0x2c/0x40
[ 471.495879] Code: 78 ff ff ff 4c 89 f7 49 89 de 48 8b 00 ff 50 58 e9 b5 fc ff ff 48 c7 c7 68 b3 45 c0 e8 b5 75 75 d6 31 f6 4c 89 ef e8 bd 2c 00 00 <0f> 0b 41 bc ff ff ff ff 48 8b 90 98 00 00 00 48 8b bd 78 ff ff
[ 471.514854] RIP: cachefiles_walk_to_object+0x718/0x890 [cachefiles] RSP: ffffbb3f401b3d18
[ 471.523171] ---[ end trace d57820f35931610a ]---

I tried with the following builds but the result is the same:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10.17/linux-image-4.10.17-041017-generic_4.10.17-041017.201705201051_amd64.deb
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14-rc6/linux-image-4.14.0-041400rc6-generic_4.14.0-041400rc6.201710230731_amd64.deb

I tried setting the cache back to the default of /var/cache/fscache and it seemed to start working. I ran the test three or four times. Then it failed in the same way.

I'm running Ubuntu 16.04 on Google Cloud Compute. In case it's a filesystem issue:

$ sudo dumpe2fs -h /dev/sda1
dumpe2fs 1.42.13 (17-May-2015)
Filesystem volume name: cloudimg-rootfs
Last mounted on: /
Filesystem UUID: 67e1c0a6-cdc9-4247-b2e5-dd20ef042025
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 1280000
Block count: 2621179
Reserved block count: 0
Free blocks: 1076375
Free inodes: 1066233
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 127
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 16000
Inode blocks per group: 1000
Flex block group size: 16
Filesystem created: Tue May 16 15:25:31 2017
Last mount time: Thu Oct 26 21:58:51 2017
Last write time: Thu Oct 26 21:58:44 2017
Mount count: 40
Maximum mount count: -1
Last checked: Tue May 16 21:30:38 2017
Check interval: 0 (<none>)
Lifetime writes: 337 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
First orphan inode: 264319
Default directory hash: half_md4
Directory Hash Seed: d8fb3bde-69b0-4404-9797-471e5fffc1ba
Journal backup: inode blocks
Journal features: journal_incompat_revoke
Journal size: 64M
Journal length: 16384
Journal sequence: 0x00016c30
Journal start: 4274

$ sudo dumpe2fs -h /dev/sdb
dumpe2fs 1.42.13 (17-May-2015)
Filesystem volume name: <none>
Last mounted on: /mnt/upload
Filesystem UUID: 6eea1b82-bb0a-44fa-b091-a8ee534dfa08
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: not clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 13107200
Block count: 52428800
Reserved block count: 2621440
Free blocks: 50885632
Free inodes: 13107162
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 1011
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
Filesystem created: Sun Oct 22 14:34:40 2017
Last mount time: Thu Oct 26 21:58:54 2017
Last write time: Thu Oct 26 21:58:54 2017
Mount count: 3
Maximum mount count: -1
Last checked: Thu Oct 26 21:28:19 2017
Check interval: 0 (<none>)
Lifetime writes: 552 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Default directory hash: half_md4
Directory Hash Seed: 4d9800bc-b23f-4f2e-86e4-b42f37fc3288

$ mount | fgrep sd
/dev/sda1 on / type ext4 (rw,relatime,discard,data=ordered)
/dev/sdb on /mnt/upload type ext4 (rw,nosuid,nodev,noexec,relatime,block_validity,discard,delalloc,barrier,user_xattr,acl)

I don't know what else it could be unless it is a genuine kernel bug.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-virtual-hwe-16.04 4.10.0.37.39
ProcVersionSignature: Ubuntu 4.10.0-37.41~16.04.1-generic 4.10.17
Uname: Linux 4.10.0-37-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.10
Architecture: amd64
Date: Thu Oct 26 21:05:56 2017
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-meta-hwe
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
In , fhimpe (fhimpe-linux-kernel-bugs) wrote :
Download full text (11.9 KiB)

I encountered this bug on a Debian Stretch system with an NFS 4 mount
and cachefilesd 0.10.9. NFS 4 file systems are mounted like this from
fstab:

master.cluster.ai:/home /home nfs4 _netdev,fsc,noatime,vers=4.2,nodev,nosuid 0 0
master.cluster.ai:/cluster /srv/cluster nfs4 _netdev,fsc,noatime,vers=4.2,nodev,nosuid 0 0

cachefilesd.conf contains this:

dir /var/cache/fscache
tag mycache
brun 10%
bcull 7%
bstop 3%
frun 10%
fcull 7%
fstop 3%
secctx system_u:system_r:cachefiles_kernel_t:s0

[2967555.161617] CacheFiles:
[2967555.161652] CacheFiles: Error: Unexpected object collision
[2967555.161712] CacheFiles: object: OBJ35e9f
[2967555.161744] CacheFiles: objstate=LOOK_UP_OBJECT fl=8 wbusy=2 ev=0[0]
[2967555.161790] CacheFiles: ops=0 inp=0 exc=0
[2967555.161821] CacheFiles: parent=ffff93a412926900
[2967555.161856] CacheFiles: cookie=ffff93aa080e53c0 [pr=ffff93a5327ce870 nd=ffff93acaa1be810 fl=27]
[2967555.161918] CacheFiles: key=[32] '010006812fb038644d2346408da129441408b285fb15aa4600000000676141aa'
[2967555.162076] CacheFiles: xobject: OBJ35e39
[2967555.162108] CacheFiles: xobjstate=WAIT_FOR_CMD fl=38 wbusy=1 ev=40[6d]
[2967555.162156] CacheFiles: xops=0 inp=0 exc=0
[2967555.162188] CacheFiles: xparent=ffff93a412926900
[2967555.162225] CacheFiles: xcookie=ffff939ae8a6cbe0 [pr=ffff93a5327ce870 nd= (null) fl=18]
[2967555.162325] ------------[ cut here ]------------
[2967555.162361] kernel BUG at /build/linux-zDY19G/linux-4.8.15/fs/cachefiles/namei.c:197!
[2967555.162416] invalid opcode: 0000 [#1] SMP
[2967555.162446] Modules linked in: bluetooth(E) rfkill(E) crc16(E) msr(E) rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) grace(E) sunrpc(E) 8021q(E) garp(E) mrp(E) stp(E) llc(E) cachefiles(E) fscache(E) xt_multiport(E) iptable_filter(E) dm_mod(E) binfmt_misc cpuid cpufreq_userspace cpufreq_powersave cpufreq_conservative intel_rapl sb_edac edac_core x86_pkg_temp_thermal coretemp mgag200 kvm_intel kvm ipmi_devintf irqbypass ttm drm_kms_helper drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt i2c_algo_bit iTCO_vendor_support pcspkr evdev dcdbas sg ipmi_si ipmi_msghandler tpm_tis ioatdma wmi mei_me mei lpc_ich mfd_core tpm_tis_core tpm shpchp acpi_power_meter dca acpi_pad button psmouse ip_tables x_tables autofs4 xfs libcrc32c btrfs crc32c_generic xor raid6_pq hid_generic usbhid hid sd_mod crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd ahci xhci_pci libahci xhci_hcd ehci_pci libata megaraid_sas ehci_hcd tg3 usbcor
 e ptp scsi_mod pps_core libphy usb_common fjes
[2967555.163363] CPU: 29 PID: 206756 Comm: kworker/u578:0 Tainted: G E 4.8.0-2-amd64 #1 Debian 4.8.11-1
[2967555.163432] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.2.6 06/08/2015
[2967555.163496] Workqueue: fscache_object fscache_object_work_func [fscache]
[2967555.163546] task: ffff93aa2dec9000 task.stack: ffff93b18f1bc000
[2967555.163590] RIP: 0010:[<ffffffffc06d62a3>] [<ffffffffc06d62a3>] cachefiles_walk_to_object+0x7a3/0x800 [cachefiles]
[2967555.163668] RSP: 0018:ffff93b18f1bfd18 EFLAGS: 00010286
[2967555.163708] RAX: 0000000000000000 RBX: fff...

Revision history for this message
In , fhimpe (fhimpe-linux-kernel-bugs) wrote :

Created attachment 254839
dmesg

Revision history for this message
Ken Sharp (kennybobs) wrote :
Revision history for this message
In , imwellcushtymelike (imwellcushtymelike-linux-kernel-bugs) wrote :

Same trace in Ubuntu Xenial on a Google Cloud image, but this time with CIFS.

https://bugs.launchpad.net/ubuntu/+source/linux-meta-hwe/+bug/1727861

I cannot get fscache to work at all. It looks fairly broken.

Revision history for this message
In , imwellcushtymelike (imwellcushtymelike-linux-kernel-bugs) wrote :

And Debian:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=751933

Is the fault in cachefilesd?

Revision history for this message
Ken Sharp (kennybobs) wrote :

A Debian user has reported this upstream.
https://bugzilla.kernel.org/show_bug.cgi?id=194639

Changed in cachefilesd (Debian):
status: Unknown → New
Revision history for this message
In , imwellcushtymelike (imwellcushtymelike-linux-kernel-bugs) wrote :

Does this need to be e-mailed somewhere?

Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed
Ken Sharp (kennybobs)
no longer affects: cachefilesd (Ubuntu)
Revision history for this message
Ken Sharp (kennybobs) wrote :

The HWE kernel has been pushed to 4.15 and the problem has disappeared (same machine). Tested again on Xenial and Bionic (which has the newer kernel) and they work fine.

I haven't yet tested Trusty but I suspect that it isn't yet fixed. I will update asap.

Changed in linux-meta-hwe (Ubuntu):
status: New → Fix Committed
Revision history for this message
In , imwellcushtymelike (imwellcushtymelike-linux-kernel-bugs) wrote :

This is fixed somewhere between kernel 4.13 and 4.15. Vague, I know, but it's a starting point. Longterm support kernels may still be affected but testing would take me an age. If I get time I'll test further.

Changed in cachefilesd (Debian):
status: New → Confirmed
Changed in cachefilesd (Debian):
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.