fanotify07 in LTP syscall test generates kernel trace with T/X kernel
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
ubuntu-kernel-tests |
Confirmed
|
Undecided
|
Unassigned | ||
linux (Ubuntu) |
Fix Released
|
Medium
|
Unassigned | ||
Trusty |
Won't Fix
|
Medium
|
Unassigned | ||
Xenial |
Won't Fix
|
Medium
|
Matthew Ruffell |
Bug Description
BugLink: https:/
[Impact]
When userspace tasks which are processing fanotify permission events act
incorrectly, the fsnotify_mark_srcu SRCU is held indefinitely which causes
the whole notification subsystem to hang.
This has been seen in production, and it can also be seen when running the
Linux Test Project testsuite, specifically fanotify07.
[Fix]
Instead of holding the SRCU lock while waiting for userspace to respond,
which may never happen, or not in the order we are expecting, we drop the
fsnotify_mark_srcu SRCU lock before waiting for userspace response, and then
reacquire the lock again when userspace responds.
The fixes are from a series of upstream commits:
05f0e38724e8449
9385a84d7e1f658
abc77577a669f42
The following are upstream commits necessary for the fixes to function:
35e481761cdc688
0918f1c309b8630
[Testcase]
You can reproduce the problem pretty quickly with the Linux Test Project:
Steps (with root):
1. sudo apt-get install git xfsprogs -y
2. git clone --depth=1 https:/
3. cd ltp
4. make autotools
5. ./configure
6. make; make install
7. cd /opt/ltp
8. echo -e "fanotify07 fanotify07 \nfanotify08 fanotify08" > /tmp/jobs
9. ./runltp -f /tmp/jobs
On a stock Xenial kernel, the system will hang, and the testcase will look like:
<<<test_start>>>
tag=fanotify07 stime=1554326200
cmdline="fanotify07 "
contacts=""
analysis=exit
<<<test_output>>>
tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Cannot kill test processes!
Congratulation, likely test hit a kernel bug.
Exitting uncleanly...
<<<execution_
initiation_
duration=350 termination_
cutime=0 cstime=0
<<<test_end>>>
Looking at dmesg, we see the following call stack
[ 790.772792] LTP: starting fanotify07 (fanotify07 )
[ 960.140455] INFO: task fsnotify_mark:36 blocked for more than 120 seconds.
[ 960.140867] Not tainted 4.4.0-142-generic #168-Ubuntu
[ 960.141185] "echo 0 > /proc/sys/
[ 960.141498] fsnotify_mark D ffff8800b6703c98 0 36 2 0x00000000
[ 960.141516] ffff8800b6703c98 ffff88013a558a00 ffff8800b7797000 ffff8800b66f8000
[ 960.141524] ffff8800b6704000 7fffffffffffffff ffff8800b6703de0 ffff8800b66f8000
[ 960.141528] 0000000000000000 ffff8800b6703cb0 ffffffff8185cb45 ffff8800b6703de8
[ 960.141532] Call Trace:
[ 960.141580] [<ffffffff8185c
[ 960.141588] [<ffffffff81860
[ 960.141617] [<ffffffff810f5
[ 960.141621] [<ffffffff8185c
[ 960.141625] [<ffffffff8185d
[ 960.141636] [<ffffffff810b1
[ 960.141641] [<ffffffff810eb
[ 960.141645] [<ffffffff810ea
[ 960.141664] [<ffffffff81260
[ 960.141669] [<ffffffff810eb
[ 960.141672] [<ffffffff81260
[ 960.141680] [<ffffffff810ca
[ 960.141691] [<ffffffff810a6
[ 960.141694] [<ffffffff8185c
[ 960.141699] [<ffffffff810a6
[ 960.141703] [<ffffffff81861
[ 960.141706] [<ffffffff810a6
The vanilla 4.4 kernel also shows the same call stack.
On a patched kernel, the test will pass successfully, and there will be no
messages in dmesg.
[Regression Potential]
This makes modifications to how locking is performed in fsnotify / fanotify and
there may be some cause for regression. Running all fanotify Linux Test Project
tests shows that there are no extra failures caused by the patches, and instead
fewer failures are seen due to the bugfix.
Running the entire Linux Test Project testsuite actually works and runs to
completion, somewhich doesn't happen in a unpatched kernel since it will hang
on the fanotify07 test.
The patches are taken from upstream, and all necessary commits have been taken
into account, so I am happy with the potential risks and that testing has been
completed.
Changed in linux (Ubuntu): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Trusty): | |
status: | New → Triaged |
Changed in linux (Ubuntu): | |
status: | Confirmed → Triaged |
Changed in linux (Ubuntu Trusty): | |
importance: | Undecided → Medium |
summary: |
- fanotify07 in LTP syscall test generates kernel trace with T kernel + fanotify07/fanotify08 in LTP syscall test generates kernel trace with T + kernel |
description: | updated |
tags: | added: ppc64le |
Changed in linux (Ubuntu Xenial): | |
importance: | Undecided → Medium |
status: | New → In Progress |
assignee: | nobody → Matthew Ruffell (mruffell) |
tags: | added: sts |
description: | updated |
tags: | added: xenial |
tags: |
added: ppc64el removed: ppc64le |
tags: | added: cscc |
tags: | added: linux-oracle oracle sru-20190722 |
summary: |
- fanotify07/fanotify08 in LTP syscall test generates kernel trace with - T/X/X-AWS kernel + fanotify07 in LTP syscall test generates kernel trace with T/X/X-AWS + kernel |
summary: |
- fanotify07 in LTP syscall test generates kernel trace with T/X/X-AWS - kernel + fanotify07 in LTP syscall test generates kernel trace with T/X kernel |
This change was made by a bot.