SRU: walker list corruption while being intensively stressed

Bug #1526811 reported by Colin Ian King
This bug affects 1 person
Affects          Status         Importance   Assigned to      Milestone
Linux            Unknown        Unknown
linux (Ubuntu)   Fix Released   High         Colin Ian King
Wily             Fix Released   Undecided    Colin Ian King
Xenial           Fix Released   High         Colin Ian King

Bug Description

[SRU Justification][Wily] + [Xenial]

While stress testing with the stress-ng procfs stressor I hit an rhashtable walker list corruption bug. Herbert Xu recently fixed this upstream; his commit message explains:

The commit ba7c95ea3870fe7b847466d39a049ab6f156aa2c ("rhashtable: Fix sleeping inside RCU critical section in walk_stop") introduced a new spinlock for the walker list. However, it did not convert all existing users of the list over to the new spin lock. Some continued to use the old mutex for this purpose. This obviously led to corruption of the list.
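The rule the commit message describes (one list, one lock) can be sketched in userspace C. This is only an illustrative model, not the kernel code: the names struct table, walker_add, walker_del, check_list_lock and the lock_id enum are all hypothetical, and the "locks" are plain tags rather than real mutexes or spinlocks. Every list update checks that the caller holds the one lock that owns the walker list, and an update made under the old mutex is counted as a violation.

```c
#include <stddef.h>

/* Hypothetical userspace model of the locking rule; not kernel code. */
enum lock_id { LOCK_NONE, LOCK_MUTEX, LOCK_SPIN };

struct walker {
    struct walker *prev, *next;
};

struct table {
    struct walker walkers;    /* head of a circular doubly linked list */
    enum lock_id held;        /* lock the current caller "holds" */
    unsigned int violations;  /* list updates made without LOCK_SPIN */
};

static void table_init(struct table *t)
{
    t->walkers.prev = t->walkers.next = &t->walkers;
    t->held = LOCK_NONE;
    t->violations = 0;
}

/* The walker list has exactly one owner lock in this model: LOCK_SPIN. */
static void check_list_lock(struct table *t)
{
    if (t->held != LOCK_SPIN)
        t->violations++;
}

/* Link a walker in while "holding" whichever lock the caller chose. */
static void walker_add(struct table *t, struct walker *w, enum lock_id lock)
{
    t->held = lock;
    check_list_lock(t);
    w->next = t->walkers.next;
    w->prev = &t->walkers;
    t->walkers.next->prev = w;
    t->walkers.next = w;
    t->held = LOCK_NONE;
}

/* Unlink a walker under the same rule. */
static void walker_del(struct table *t, struct walker *w, enum lock_id lock)
{
    t->held = lock;
    check_list_lock(t);
    w->prev->next = w->next;
    w->next->prev = w->prev;
    w->prev = w->next = w;
    t->held = LOCK_NONE;
}

/* Count the walkers currently linked into the table. */
static size_t walker_count(const struct table *t)
{
    size_t n = 0;
    for (const struct walker *w = t->walkers.next; w != &t->walkers; w = w->next)
        n++;
    return n;
}
```

In this model a caller that passes LOCK_MUTEX to walker_add still links its walker, but the violation counter records that it bypassed the list's owner lock. In a real concurrent setting that is the window in which two updaters can modify the list at the same time.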

[Fix]
Clean upstream cherry pick, commit c6ff5268293ef98e48a99597e765ffc417e39fa5
Will land in Xenial automatically (4.4)

[Testcase]
Run multiple instances of the attached code on a multi-core system. Alternatively, run stress-ng --procfs 0 on a multi-core system.

With the fix applied, the above code no longer corrupts the list or crashes the kernel.
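The attached reproducer is not included in this report. As a rough idea of the shape such a test might take, here is a minimal sketch that repeatedly opens, reads to the end, and closes a file. The function name hammer_file and its parameters are hypothetical, and which procfs file to point it at is left to the tester; this is not the attached code.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open, fully read, and close `path` `iterations`
 * times. Returns how many passes reached end of file without an error.
 */
static int hammer_file(const char *path, int iterations)
{
    char buf[4096];
    int ok = 0;

    for (int i = 0; i < iterations; i++) {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            continue;               /* file missing or unreadable */

        ssize_t n;
        while ((n = read(fd, buf, sizeof(buf))) > 0)
            ;                       /* discard the data */
        if (n == 0)
            ok++;                   /* reached end of file cleanly */
        close(fd);
    }
    return ok;
}
```

To put pressure on the kernel, several processes would each call something like hammer_file() in a loop at the same time, which is broadly what running multiple instances on a multi-core system achieves.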

Colin Ian King (colin-king) wrote :
information type: Private Security → Public
Colin Ian King (colin-king) wrote :

Fix will land in Linux 4.4 Ubuntu Xenial automatically, commit c6ff5268293ef98e48a99597e765ffc417e39fa5

Wily SRU submitted to kernel team mailing list, https://lists.ubuntu.com/archives/kernel-team/2015-December/067287.html

description: updated
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Wily):
assignee: nobody → Colin Ian King (colin-king)
status: New → In Progress
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Colin Ian King (colin-king) wrote :

Looks like the fix triggered another issue, so we need to see how upstream fixes this one.

Kernel test robot picked up a bug from this fix:

FYI, we noticed the below changes on

https://github.com/0day-ci/linux Herbert-Xu/rhashtable-Fix-walker-list-corruption/20151216-164833
commit f9f51b8070be3e829100614a7372b219723b864f ("rhashtable: Fix walker list corruption")

[ 8.933376] ===============================
[ 8.934629] [ INFO: suspicious RCU usage. ]
[ 8.935941] 4.4.0-rc3-00995-gf9f51b8 #2 Not tainted
[ 8.937494] -------------------------------
[ 8.938818] lib/rhashtable.c:504 suspicious rcu_dereference_protected() usage!
[ 8.941705]
[ 8.941705] other info that might help us debug this:
[ 8.941705]
[ 8.944161]
[ 8.944161] rcu_scheduler_active = 1, debug_locks = 0
[ 8.946244] 1 lock held by swapper/0/1:
[ 8.947463] #0: (&(&ht->lock)->rlock){+.+...}, at: [<ffffffff814b8900>] rhashtable_walk_init+0x70/0x150
[ 8.950428]
[ 8.950428] stack backtrace:
[ 8.951770] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.0-rc3-00995-gf9f51b8 #2
[ 8.954245] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[ 8.956973] 0000000000000001 ffff880078393d30 ffffffff81493238 ffff88007838c040
[ 8.959333] ffff880078393d60 ffffffff8112cb9f ffff880078393da0 ffffffff83e9d6c0
[ 8.961684] ffffffff83e9d7f0 ffff880061720e00 ffff880078393d90 ffffffff814b89c8
[ 8.964148] Call Trace:
[ 8.964955] [<ffffffff81493238>] dump_stack+0x7c/0xb4
[ 8.966728] [<ffffffff8112cb9f>] lockdep_rcu_suspicious+0x14f/0x1c0
[ 8.968753] [<ffffffff814b89c8>] rhashtable_walk_init+0x138/0x150
[ 8.970567] [<ffffffff815021d8>] test_bucket_stats+0x22/0x17d
[ 8.972682] [<ffffffff82dda0fa>] test_rhashtable+0xe0/0x12ac
[ 8.972682] [<ffffffff...


Colin Ian King (colin-king) wrote :

It seems that upstream commit 179ccc0a73641ffd24e44ff10a7bd494efe98d8d ("rhashtable: Kill harmless RCU warning in rhashtable_walk_init") is also required to stop the RCU warning in rhashtable_walk_init.

Brad Figg (brad-figg)
Changed in linux (Ubuntu Wily):
status: In Progress → Fix Committed
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-wily' to 'verification-done-wily'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-wily
Colin Ian King (colin-king) wrote :

A better reproducer is running stress-ng --procfs 0 on a multi-core machine. Without the fix, it oopses in less than a second. With the fix, it works perfectly, no oopsing.

Tested on 4.2.0-24-generic #29-Ubuntu, ran soak test for 600 seconds on an 8 proc Xeon box:

stress-ng: info: [3044] successful run completed in 600.23s (10 mins, 0.23 secs)
stress-ng: info: [3044] stressor      bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: info: [3044]                          (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: info: [3044] procfs               8    600.00    151.83   4646.48         0.01           0.00
stress-ng: info: [3044] procfs:

tags: added: verification-done-wily
removed: verification-needed-wily
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.3.0-6.17

---------------
linux (4.3.0-6.17) xenial; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1532958

  [ Eric Dumazet ]

  * SAUCE: (noup) net: fix IP early demux races
    - LP: #1526946

  [ Guilherme G. Piccoli ]

  * SAUCE: powerpc/eeh: Validate arch in eeh_add_device_early()
    - LP: #1486180

  [ Hui Wang ]

  * [Config] CONFIG_I2C_DESIGNWARE_BAYTRAIL=y, CONFIG_IOSF_MBI=y
    - LP: #1527096

  [ Jann Horn ]

  * ptrace: being capable wrt a process requires mapped uids/gids
    - LP: #1527374

  [ Serge Hallyn ]

  * SAUCE: add a sysctl to disable unprivileged user namespace unsharing

  [ Tim Gardner ]

  * [Config] CONFIG_ZONE_DEVICE=y for amd64
  * [Config] CONFIG_VIRTIO_BLK=y, CONFIG_VIRTIO_NET=y for s390
    - LP: #1532886

  [ Upstream Kernel Changes ]

  * rhashtable: Fix walker list corruption
    - LP: #1526811
  * rhashtable: Kill harmless RCU warning in rhashtable_walk_init
    - LP: #1526811
  * ovl: fix permission checking for setattr
    - LP: #1528904
    - CVE-2015-8660

 -- Tim Gardner <email address hidden> Thu, 17 Dec 2015 05:34:47 -0700

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Andy Whitcroft (apw) wrote :

Fix released in 4.2.0-27.32

Changed in linux (Ubuntu Wily):
status: Fix Committed → Fix Released