dev test from ubuntu_stress_smoke_tests hangs with T-3.13 on some AWS instances

Bug #1978082 reported by Po-Hsu Lin
This bug affects 1 person
Affects              Status  Importance  Assigned to  Milestone
ubuntu-kernel-tests  New     Undecided   Unassigned

Bug Description

Issue found on 3.13.0-190 with stress-ng commit 8a8add4: the dev test hangs on the following AWS cloud instance types:
  * c5n.large
  * i3.metal
  * i3en.24xlarge
  * m5a.large
  * r5.large
  * r5.metal
  * t3.medium
  * t3a.2xlarge

Note that the test was skipped on the following instance types because they are too old:
  * c3.xlarge
  * c4.large
  * t2.small
  * x1e.xlarge

This looks like a test-case issue: the test passes with stress-ng V0.13.07.

Test output:
    $ time sudo ./stress-ng -v -t 5 --dev 4 --dev-ops 3000 --ignite-cpu --syslog --verbose --verify --oomable
    stress-ng: debug: [2615] stress-ng 0.13.11 gde14c6695830
    stress-ng: debug: [2615] system: Linux ip-172-31-14-180 3.13.0-190-generic #241-Ubuntu SMP Tue May 31 12:06:16 UTC 2022 x86_64
    stress-ng: debug: [2615] RAM total: 15.4G, RAM free: 15.1G, swap free: 0.0
    stress-ng: debug: [2615] 2 processors online, 2 processors configured
    stress-ng: info: [2615] setting to a 5 second run per stressor
    stress-ng: info: [2615] dispatching hogs: 4 dev
    stress-ng: debug: [2615] cache allocate: shared cache buffer size: 33792K
    stress-ng: debug: [2615] starting stressors
    stress-ng: debug: [2616] stress-ng-dev: started [2616] (instance 0)
    stress-ng: debug: [2615] 4 stressors started
    stress-ng: debug: [2617] stress-ng-dev: started [2617] (instance 1)
    stress-ng: debug: [2619] stress-ng-dev: started [2619] (instance 3)
    stress-ng: debug: [2618] stress-ng-dev: started [2618] (instance 2)
    stress-ng: debug: [2617] stress-ng-dev: exited [2617] (instance 1)
    stress-ng: debug: [2619] stress-ng-dev: exited [2619] (instance 3)
    stress-ng: debug: [2618] stress-ng-dev: exited [2618] (instance 2)
    (test hangs here, system is still responsive)

syslog:
    stress-ng: info: [1379] stress-ng-dev: 15 of 64 devices opened and exercised
    kernel: [ 1324.186794] INFO: task stress-ng:1392 blocked for more than 120 seconds.
    kernel: [ 1324.189937] Not tainted 3.13.0-190-generic #241-Ubuntu
    kernel: [ 1324.192595] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    kernel: [ 1324.196298] stress-ng D ffff88042d013b80 0 1392 1380 0x00000004
    kernel: [ 1324.196303] ffff880414063cf0 0000000000000086 ffff88041719c800 0000000000013b80
    kernel: [ 1324.196305] ffff880414063fd8 0000000000013b80 ffff88041719c800 ffff8804119e8c28
    kernel: [ 1324.196307] ffffffff00000002 ffff8804119e8c30 ffff88041719c800 7fffffffffffffff
    kernel: [ 1324.196309] Call Trace:
    kernel: [ 1324.196316] [<ffffffff81740c09>] schedule+0x29/0x70
    kernel: [ 1324.196318] [<ffffffff8173fec9>] schedule_timeout+0x279/0x310
    kernel: [ 1324.196323] [<ffffffff81323a1d>] ? apparmor_capable+0x1d/0x130
    kernel: [ 1324.196325] [<ffffffff81744038>] ldsem_down_read+0x108/0x280
    kernel: [ 1324.196330] [<ffffffff814653f0>] tty_ldisc_ref_wait+0x20/0x50
    kernel: [ 1324.196333] [<ffffffff8145e2ef>] tty_ioctl+0x6df/0xcd0
    kernel: [ 1324.196337] [<ffffffff811dc803>] do_vfs_ioctl+0x2e3/0x4d0
    kernel: [ 1324.196339] [<ffffffff811dca71>] SyS_ioctl+0x81/0xa0
    kernel: [ 1324.196342] [<ffffffff8174da89>] system_call_fastpath+0x26/0x2b
    kernel: [ 1324.196344] INFO: task stress-ng:1393 blocked for more than 120 seconds.
    kernel: [ 1324.199457] Not tainted 3.13.0-190-generic #241-Ubuntu
    kernel: [ 1324.202194] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    kernel: [ 1324.205856] stress-ng D ffff880411c7e000 0 1393 1380 0x00000004
    kernel: [ 1324.205859] ffff880412bd9cf0 0000000000000086 ffff880411c7e000 0000000000013b80
    kernel: [ 1324.205862] ffff880412bd9fd8 0000000000013b80 ffff880411c7e000 ffff8804119e8c28
    kernel: [ 1324.205866] ffffffff00000002 ffff8804119e8c30 ffff880411c7e000 7fffffffffffffff
    kernel: [ 1324.205868] Call Trace:
    kernel: [ 1324.205874] [<ffffffff81740c09>] schedule+0x29/0x70
    kernel: [ 1324.205877] [<ffffffff8173fec9>] schedule_timeout+0x279/0x310
    kernel: [ 1324.205881] [<ffffffff81323a1d>] ? apparmor_capable+0x1d/0x130
    kernel: [ 1324.205885] [<ffffffff81744038>] ldsem_down_read+0x108/0x280
    kernel: [ 1324.205895] [<ffffffff814653f0>] tty_ldisc_ref_wait+0x20/0x50
    kernel: [ 1324.205899] [<ffffffff8145e2ef>] tty_ioctl+0x6df/0xcd0
    kernel: [ 1324.205904] [<ffffffff811ce32d>] ? cp_new_stat+0x13d/0x160
    kernel: [ 1324.205908] [<ffffffff811dc803>] do_vfs_ioctl+0x2e3/0x4d0
    kernel: [ 1324.205911] [<ffffffff811ce425>] ? SYSC_newfstat+0x25/0x30
    kernel: [ 1324.205914] [<ffffffff811dca71>] SyS_ioctl+0x81/0xa0
    kernel: [ 1324.205918] [<ffffffff8174da89>] system_call_fastpath+0x26/0x2b
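
When comparing hangs across several instance types, it can help to pull the hung-task reports out of syslog automatically. The following is a minimal sketch, not part of the test suite: the function name parse_hung_tasks and the regular expressions are illustrative assumptions, written against the message format shown in the syslog excerpt above.

```python
import re

# Header of a hung-task report, e.g.
# "INFO: task stress-ng:1392 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# One call-trace frame, e.g. "[<ffffffff81744038>] ldsem_down_read+0x108/0x280".
# A leading "? " marks a frame the kernel considers unreliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\? )?(?P<func>[\w.]+)\+0x"
)


def parse_hung_tasks(lines):
    """Return one dict per hung-task report: comm, pid, secs, frames.

    "frames" lists the reliable call-trace functions in the order logged
    (innermost first); frames marked "?" are skipped.
    """
    reports = []
    current = None
    for line in lines:
        header = HUNG_TASK_RE.search(line)
        if header:
            current = {
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "secs": int(header.group("secs")),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None and not frame.group("unreliable"):
            current["frames"].append(frame.group("func"))
    return reports
```

Fed the excerpt above, this would report two tasks (PIDs 1392 and 1393), both blocked in the same schedule / schedule_timeout / ldsem_down_read / tty_ldisc_ref_wait / tty_ioctl path.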

Po-Hsu Lin (cypressyew)
tags: added: 3.13 amd64 sru-20220509 trusty ubuntu-stress-smoke-test
Po-Hsu Lin (cypressyew) wrote :

Comment copied from Colin's comment in the upstream stress-ng project [1]:

This appears to be a kernel issue triggered by the stress-ng dev stressor on a tty device. One can corner this by working through the devices by specifying the dev file, for example: stress-ng --dev 0 --dev-file /dev/tty0
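
The per-device approach described above could be automated along these lines. This is only a sketch under stated assumptions: the helper names (dev_command, run_with_deadline, probe_devices) are hypothetical, a single stressor instance (--dev 1) is used instead of the --dev 0 from the example, and a run that exceeds the deadline is treated as a hang.

```python
import subprocess


def dev_command(dev_path, stress_ng="stress-ng", run_secs=5):
    """Build a stress-ng command line that exercises one device node only."""
    return [stress_ng, "--dev", "1", "--dev-file", dev_path,
            "-t", str(run_secs)]


def run_with_deadline(cmd, deadline_secs):
    """Run cmd and classify the result as "ok", "failed" or "hung".

    "hung" means the command was still running after deadline_secs.
    subprocess.run() kills the direct child on timeout; stressor children
    stuck in uninterruptible sleep may still be left behind.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=deadline_secs,
        )
    except subprocess.TimeoutExpired:
        return "hung"
    return "ok" if proc.returncode == 0 else "failed"


def probe_devices(dev_paths, deadline_secs=60):
    """Return {device path: result}, testing the devices one at a time."""
    return {
        path: run_with_deadline(dev_command(path), deadline_secs)
        for path in dev_paths
    }
```

For example, probe_devices(sorted(glob.glob("/dev/tty*"))) (with glob imported) would walk the tty nodes in turn. This needs root to match the original test, and on an affected kernel each hung device leaves blocked processes behind, so it is best run on a disposable instance.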

By the look of the commit, it could be due to the enablement of the TIOCGETD ioctl() call. The commit has the following change:

 /*
  * On some older 3.13 kernels this can lock up, need to add
  * a method to detect and skip this somehow. For the moment
  * disable this stress test.
  */
-#if defined(TIOCGETD) && 0
+#if defined(TIOCGETD)
        {
        {
                int ldis;

@@ -705,9 +779,13 @@ static void stress_dev_tty(
                if (ret == 0) {
                        ret = ioctl(fd, TIOCSETD, &ldis);
                }
+#else
+ UNEXPECTED
 #endif

This seems to match up with the ldisc issue shown in the kernel trace. So I suspect there is a kernel fix for this that needs to be backported as this does not occur with newer kernels.
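
For reference, the ioctl pair enabled by that change can be reproduced outside stress-ng. The sketch below is an illustration rather than the stressor's code: the function names are hypothetical, and it assumes Linux, where Python's termios module exposes TIOCGETD and TIOCSETD. It reads the line discipline of a tty and writes the same value back. On an affected 3.13 kernel a call like this may block, which would be consistent with the tty_ioctl and tty_ldisc_ref_wait frames in the trace.

```python
import fcntl
import struct
import termios


def get_line_discipline(fd):
    """Return the line discipline number of the tty behind fd (TIOCGETD)."""
    # The kernel writes a C int into the buffer; ioctl() returns the result.
    buf = fcntl.ioctl(fd, termios.TIOCGETD, struct.pack("i", 0))
    return struct.unpack("i", buf)[0]


def reapply_line_discipline(fd):
    """Read the line discipline and set it back unchanged.

    This is the same TIOCGETD followed by TIOCSETD sequence as in the diff
    above. Returns the line discipline number that was read.
    """
    ldisc = get_line_discipline(fd)
    fcntl.ioctl(fd, termios.TIOCSETD, struct.pack("i", ldisc))
    return ldisc
```

Opening a specific device with os.open(path, os.O_RDONLY | os.O_NOCTTY) and passing the descriptor to reapply_line_discipline() gives a small reproducer that could be used to check whether a candidate kernel backport resolves the hang.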

[1] https://github.com/ColinIanKing/stress-ng/issues/202
