rm -r dir on USB disk locks up hdparm on different disk
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux-meta-hwe-5.15 (Ubuntu) |
New
|
Undecided
|
Unassigned |
Bug Description
Description: Ubuntu 20.04.6 LTS
Linux 5.15.0-72-generic #79~20.04.1-Ubuntu SMP Thu Apr 20 22:12:07 UTC 2023 x86_64 GNU/Linux
Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz
Running a "rm -r dir" on a directory with millions of files that resides
on a disk in an external USB-3 hard drive dock locks up an unrelated
hdparm processes running on an internal disk such that the kernel says:
May 7 04:24:02 kernel: [163606.041862] INFO: task hdparm:1391162 blocked for more than 120 seconds.
[...]
May 7 04:26:03 kernel: [163726.876357] INFO: task hdparm:1391162 blocked for more than 241 seconds.
[...]
May 7 04:28:04 kernel: [163847.702980] INFO: task hdparm:1391162 blocked for more than 362 seconds.
First a normal run of "hdparm -t /dev/sda" with the offending "rm -r" SIGSTOPped so that
it doesn't affect anything:
# \time hdparm -t /dev/sda
/dev/sda:
Timing buffered disk reads: 1128 MB in 3.00 seconds = 375.50 MB/sec
0.01user 0.67system 0:06.21elapsed 11%CPU (0avgtext+0avgdata 4584maxresident)k
2312704inpu
Elapsed time is about six seconds, as expected. /dev/sda is an internal SSD drive.
I now run this loop to show the timings and process states below:
# while sleep 1 ; do date ; ps laxww | grep '[ ]D' | grep -v refrig ; done
(I have some processes stopped in a freezer cgroup ("refrig") that I
don't want to see in the grep output.) I SIGCONT the offending "rm -r"
running on the drive in the USB3 drive dock and you see the rm appear
in uninterruptible sleep along with a couple of kernel processes:
Sun May 7 05:01:07 EDT 2023
Sun May 7 05:01:08 EDT 2023
Sun May 7 05:01:09 EDT 2023
Sun May 7 05:01:10 EDT 2023
Sun May 7 05:01:11 EDT 2023
1 0 447 2 20 0 0 0 usb_sg D ? 3:17 [usb-storage]
4 0 1423283 11939 20 0 10648 580 wait_o D+ pts/28 0:00 rm -rf 15tb3
Sun May 7 05:01:12 EDT 2023
1 0 447 2 20 0 0 0 usb_sg D ? 3:17 [usb-storage]
1 0 4016 1 20 0 161136 1900 usbhid Dsl ? 1:39 /sbin/apcupsd
4 0 1423283 11939 20 0 10648 580 wait_o D+ pts/28 0:00 rm -rf 15tb3
Sun May 7 05:01:13 EDT 2023
The above lines showing those processes in uninterruptible sleep repeat
over and over each second as the "rm -r" continues. I then start up
"hdparm -t /dev/sda" on the internal SSD disk, and it also appears in
uninterruptible sleep and doesn't finish even after minutes of waiting:
Sun May 7 05:01:25 EDT 2023
1 0 368 2 20 0 0 0 md_sup D ? 2:57 [md0_raid5]
1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/
4 0 1423283 11939 20 0 11260 2544 wait_o D+ pts/28 0:00 rm -rf 15tb3
4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
Sun May 7 05:01:26 EDT 2023
1 0 447 2 20 0 0 0 usb_sg D ? 3:17 [usb-storage]
1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/
4 0 1423283 11939 20 0 11260 2544 wait_o D+ pts/28 0:00 rm -rf 15tb3
4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
Sun May 7 05:01:27 EDT 2023
[...]
4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
Sun May 7 05:01:35 EDT 2023
1 0 447 2 20 0 0 0 usb_sg D ? 3:17 [usb-storage]
4 0 1423283 11939 20 0 11260 2544 blk_mq D+ pts/28 0:00 rm -rf 15tb3
4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
Sun May 7 05:01:36 EDT 2023
1 0 447 2 20 0 0 0 usb_sg D ? 3:17 [usb-storage]
1 0 4985 2 20 0 0 0 rq_qos D ? 0:24 [jbd2/sdj1-8]
1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/
4 0 1423283 11939 20 0 11260 2544 wait_o D+ pts/28 0:00 rm -rf 15tb3
4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
Sun May 7 05:01:37 EDT 2023
I keep waiting. The above lines repeat over and over and the hdparm is
blocked and doesn't finish.
Sun May 7 05:03:32 EDT 2023
1 0 447 2 20 0 0 0 usb_sg D ? 3:18 [usb-storage]
1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/
4 0 1423283 11939 20 0 11260 2544 wait_o D+ pts/28 0:03 rm -rf 15tb3
4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
Sun May 7 05:03:34 EDT 2023
At this point, I SIGSTOP the offending "rm -r" and it disappears from
uninterruptible sleep, but the other processes are still in it for
several seconds until the lockup clears and the hdparm finally finishes:
1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/
4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
Sun May 7 05:03:35 EDT 2023
1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/
4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
Sun May 7 05:03:36 EDT 2023
1 0 4016 1 20 0 161136 1900 usbhid Dsl ? 1:39 /sbin/apcupsd
1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/
4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
Sun May 7 05:03:37 EDT 2023
1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/
4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
Sun May 7 05:03:38 EDT 2023
1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/
4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
Sun May 7 05:03:39 EDT 2023
1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/
4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
Sun May 7 05:03:40 EDT 2023
Sun May 7 05:03:41 EDT 2023
Sun May 7 05:03:42 EDT 2023
Sun May 7 05:03:43 EDT 2023
4 0 1423501 9975 20 0 4680 4584 wait_o DL+ pts/4 0:00 hdparm -t /dev/sda
Sun May 7 05:03:44 EDT 2023
4 0 1423501 9975 20 0 4680 4584 wait_o DL+ pts/4 0:00 hdparm -t /dev/sda
Sun May 7 05:03:45 EDT 2023
Sun May 7 05:03:46 EDT 2023
And the hdparm is finally done, several minutes after I started it. If I keep
the offending "rm -r" running on that USB3 disk, the "rm -r" locks up an hdparm
(running on a different disk) long enough to cause the above kernel INFO messages.
If I try hdparm on a different disk, e.g. one of the WD Red drives in
my internal RAID5 array, it also hangs until I SIGSTOP or kill the "rm -r":
1 0 447 2 20 0 0 0 usb_sg D ? 3:51 [usb-storage]
1 0 4985 2 20 0 0 0 blk_mq D ? 0:31 [jbd2/sdj1-8]
1 0 1466181 2 20 0 0 0 rq_qos D ? 0:08 [kworker/
4 0 1469821 11939 20 0 27524 18888 wait_o D+ pts/28 0:44 rm -rf 15tb3
4 0 1501584 11933 20 0 4680 4584 rq_qos DL+ pts/26 0:00 hdparm -t /dev/sdb
While hung, ^C (interrupt) doesn't affect the hdparm until I SIGSTOP
the offending "rm -r" and then everything eventually unlocks after a few seconds.
Is this the way things are supposed to work? I did not expect this behaviour. Here's the syslog trace:
[163968.537842] INFO: task hdparm:1391162 blocked for more than 483 seconds.
[163968.537853] Tainted: G O 5.15.0-72-generic #79~20.04.1-Ubuntu
[163968.537858] "echo 0 > /proc/sys/
[163968.537861] task:hdparm state:D stack: 0 pid:1391162 ppid:1391130 flags:0x00004004
[163968.537869] Call Trace:
[163968.537872] <TASK>
[163968.537876] __schedule+
[163968.537889] schedule+0x69/0x110
[163968.537896] wb_wait_
[163968.537907] ? wait_woken+
[163968.537928] sync_inodes_
[163968.537932] sync_inodes_
[163968.537933] iterate_
[163968.537937] ? __x64_sys_
[163968.537939] ksys_sync+0x42/0xa0
[163968.537941] __do_sys_
[163968.537942] do_syscall_
[163968.537946] ? do_syscall_
[163968.537948] ? do_syscall_
[163968.537950] ? do_syscall_
[163968.537953] entry_SYSCALL_
[163968.537956] RIP: 0033:0x7f1d374a822b
[163968.537958] RSP: 002b:00007ffc97
[163968.537961] RAX: ffffffffffffffda RBX: 0000000000000d64 RCX: 00007f1d374a822b
[163968.537962] RDX: 0000000000000000 RSI: 0000000000200000 RDI: 0000000000000003
[163968.537963] RBP: 0000000000000003 R08: 0000000000000007 R09: 000000000000002a
[163968.537964] R10: 0000564568bdd017 R11: 0000000000000202 R12: 0000000000000000
[163968.537965] R13: 00007ffc9735a295 R14: 00007ffc97359370 R15: 00007f1d37193000
[163968.537967] </TASK>
ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-
ProcVersionSign
Uname: Linux 5.15.0-72-generic x86_64
ApportVersion: 2.20.11-
Architecture: amd64
CasperMD5CheckR
Date: Sun May 7 08:38:10 2023
EcryptfsInUse: Yes
InstallationDate: Installed on 2020-10-07 (941 days ago)
InstallationMedia: Lubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
SourcePackage: linux-meta-hwe-5.15
UpgradeStatus: Upgraded to focal on 2022-08-15 (265 days ago)
I now have a Perl script running that is removing duplicate files by doing thousands of hard links on a different external USB3 disk and it is locking up or timing out many disk-related things on all my other disks. Both this USB3 external drive and the one above are plugged directly into the motherboard (HP Z440 Workstation).
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
4 0 2210112 2210111 20 0 53680 42588 blk_mq D+ pts/24 3:00 /usr/bin/perl [...]
May 14 13:37:49 kernel: [259424.745462] INFO: task smartd:2719 blocked for more than 120 seconds.
May 15 00:08:09 kernel: [297244.761855] INFO: task smartd:2719 blocked for more than 120 seconds.
May 15 00:10:10 kernel: [297365.592485] INFO: task smartd:2719 blocked for more than 241 seconds.
May 15 01:08:34 kernel: [300869.682961] INFO: task smartd:2719 blocked for more than 120 seconds.
May 15 01:28:43 kernel: [302077.989582] INFO: task hdparm:2052842 blocked for more than 120 seconds.
May 15 01:30:43 kernel: [302198.820278] INFO: task hdparm:2052842 blocked for more than 241 seconds.
May 15 01:32:44 kernel: [302319.654907] INFO: task hdparm:2052842 blocked for more than 362 seconds.
May 15 01:34:45 kernel: [302440.481601] INFO: task hdparm:2052842 blocked for more than 483 seconds.
May 15 01:36:46 kernel: [302561.316237] INFO: task hdparm:2052842 blocked for more than 604 seconds.
May 15 02:06:58 kernel: [304373.770194] INFO: task smartd:2719 blocked for more than 120 seconds.
From one of the logged events:
May 15 02:06:58 kernel: [304373.770194] INFO: task smartd:2719 blocked for more than 120 seconds. kernel/ hung_task_ timeout_ secs" disables this message. 0x2cd/0x890 preempt_ disabled+ 0xe/0x20 lock.isra. 0+0x20c/ 0x470 0+0x61/ 0x1e0 lock_slowpath+ 0x13/0x20 0x36/0x40 get_by_ dev+0x11d/ 0x2d0 close+0x30/ 0x30 open+0x50/ 0x90 open+0x169/ 0x3e0 isra.0+ 0x20d/0x480 0x18e/0xe50 0x13/0x20 put+0x31/ 0x40
May 15 02:06:58 kernel: [304373.770209] Tainted: G O 5.15.0-72-generic #79~20.04.1-Ubuntu
May 15 02:06:58 kernel: [304373.770215] "echo 0 > /proc/sys/
May 15 02:06:58 kernel: [304373.770218] task:smartd state:D stack: 0 pid: 2719 ppid: 1 flags:0x00000000
May 15 02:06:58 kernel: [304373.770226] Call Trace:
May 15 02:06:58 kernel: [304373.770230] <TASK>
May 15 02:06:58 kernel: [304373.770236] __schedule+
May 15 02:06:58 kernel: [304373.770251] schedule+0x69/0x110
May 15 02:06:58 kernel: [304373.770260] schedule_
May 15 02:06:58 kernel: [304373.770269] __mutex_
May 15 02:06:58 kernel: [304373.770276] ? iput.part.
May 15 02:06:58 kernel: [304373.770287] __mutex_
May 15 02:06:58 kernel: [304373.770294] mutex_lock+
May 15 02:06:58 kernel: [304373.770299] blkdev_
May 15 02:06:58 kernel: [304373.770309] ? blkdev_
May 15 02:06:58 kernel: [304373.770318] blkdev_
May 15 02:06:58 kernel: [304373.770325] do_dentry_
May 15 02:06:58 kernel: [304373.770336] vfs_open+0x2d/0x40
May 15 02:06:58 kernel: [304373.770342] do_open.
May 15 02:06:58 kernel: [304373.770351] path_openat+
May 15 02:06:58 kernel: [304373.770361] ? put_device+
May 15 02:06:58 kernel: [304373.770371] ? scsi_device_
May 15 02:06:58 kernel: [304373.770380] ? sd_release...