5.19 not reporting cgroups v1 blkio.throttle.io_serviced
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Status tracked in Mantic | | |
Kinetic | Fix Released | Undecided | Unassigned |
Lunar | Fix Released | Undecided | Unassigned |
Mantic | Incomplete | Undecided | Unassigned |
Bug Description
[Impact]
Commit f382fb0bcef4 ("block: remove legacy IO schedulers") introduced a behavior change in the blkio throttle cgroup subsystem: IO statistics are not reported anymore unless a throttling rule is explicitly defined, because the current code only counts bios that are actually throttled.
This behavior change potentially breaks user-space applications that rely on the old behavior (see the original bug report below).
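On an affected kernel, the counters populate again as soon as any throttle rule is set on the cgroup. A minimal sketch of that workaround (the cgroup name 'test', the device number 259:0, and the 100 MiB/s limit are made-up example values, not taken from this report):

    # setting any throttle rule makes the throttle counters start reporting again
    echo "259:0 104857600" > /sys/fs/cgroup/blkio/test/blkio.throttle.read_bps_device
    cat /sys/fs/cgroup/blkio/test/blkio.throttle.io_serviced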
[Test case]
- mount cgroup v1
- create a blkio cgroup
- move a task into the blkio cgroup
- perform some I/O (e.g., dd)
- read the IO stats for the cgroup (blkio.throttle.io_serviced / blkio.throttle.io_service_bytes)
- IO stats are all 0, unless a throttle rule is defined
The previous behavior (kernel 5.15) was to show I/O statistics even without any throttling rules defined.
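A rough command-level version of these steps, assuming cgroup v1 with the blkio controller mounted at /sys/fs/cgroup/blkio (the cgroup name and the dd target are illustrative):

    # create a blkio cgroup and move the current shell into it
    mkdir /sys/fs/cgroup/blkio/test
    echo $$ > /sys/fs/cgroup/blkio/test/cgroup.procs
    # generate some I/O, bypassing the page cache so it reaches the disk
    dd if=/dev/zero of=/tmp/testfile bs=1M count=100 oflag=direct
    # on affected kernels these stay at 0 unless a throttle rule is defined
    cat /sys/fs/cgroup/blkio/test/blkio.throttle.io_serviced
    cat /sys/fs/cgroup/blkio/test/blkio.throttle.io_service_bytes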
[Fix]
Apply / backport this fix:
https://<email address hidden>/t/
[Regression potential]
The fix affects the block IO cgroup subsystem; potential regressions with this fix applied would most likely appear in that particular subsystem.
[Original bug report]
Hi,
I'm still investigating but am a bit stuck. Here's what I've found so far.
Today I've upgraded some nodes in AWS EC2 from the previous v5.15 linux-aws package to the recently published v5.19 package and rebooted. It seems that even when there's disk activity, the files:
/sys/fs/cgroup/blkio/blkio.throttle.io_serviced
/sys/fs/cgroup/blkio/blkio.throttle.io_service_bytes
are only ever populated with 0's. Prior on v5.15 these would reflect the actual disk usage. No other system configuration changes were applied, just the kernel upgrade and reboot. I've also verified that simply rebooting a v5.15 node where this does work doesn't break the reporting. These EC2 instances are running with cgroups v1 due to other compatibility issues and I suspect that might be the issue. So far, I cannot find any differences: mtab shows the same v1 mount setup, and the kernel options match between v5.15 and v5.19.
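A quick way to confirm the mount setup and read the counters on either kernel (paths are the standard cgroup v1 locations and may differ per setup):

    mount | grep blkio    # confirm the cgroup v1 blkio mount is present
    cat /proc/cmdline     # confirm the cgroup-related boot options match
    cat /sys/fs/cgroup/blkio/blkio.throttle.io_serviced   # root-level counters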
I'm more than happy to fetch whatever info would help out here. I'd love to get 5.19 working for us, but we really need the data from these files.
Info:
Prior version that works: Linux ip-10-128-168-154 5.15.0-1031-aws #35-Ubuntu SMP Fri Feb 10 02:07:18 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Upgraded version that's broken: Linux ip-10-128-166-219 5.19.0-1022-aws #23~22.04.1-Ubuntu SMP Fri Mar 17 15:38:24 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
EC2 instances built off of the published 22.04 LTS AMI in us-east-1.
description: updated
no longer affects: linux-aws (Ubuntu)
no longer affects: linux-azure (Ubuntu)
no longer affects: linux-gcp (Ubuntu)
Changed in linux (Ubuntu Kinetic):
  status: Incomplete → Fix Committed
Changed in linux (Ubuntu Lunar):
  status: Incomplete → Fix Committed
A few clarifications from IRC:
1. We run all of our Ubuntu 22.04 LTS nodes with the kernel args 'systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller=true' to force cgroups v1 as, unfortunately, we cannot safely turn on cgroups v2 yet (that's another pile of work I want to do!).
2. If you install 'linux-modules-extra-aws', 'modprobe bfq', and then 'echo bfq > /sys/block/nvme0n1/queue/scheduler' you will see stats in the '/sys/fs/cgroup/blkio/blkio.bfq.io_service*' files.
3. However, we continue to only see 0's in the '/sys/fs/cgroup/blkio/blkio.throttle.io_service*' files.
Potentially an upstream change, but definitely something that breaks with the '5.19.0.1022.23~22.04.6' Jammy package update. For me, this likely means I need to pin everything to the older 5.15 package pending cgroups v2 working or a fix to this. Obviously I'd prefer having this fixed so that we can get to 5.19 and stick w/ cgroups v1. I'd also offer a note that pushing 5.19 to Jammy without this support feels like a breaking change. I'm more worried that _other_ cgroups v1 controllers aren't working in a way I haven't noticed yet. Anyway, thanks so much for the help so far and gimme a holler if I can test/confirm anything else!
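Points 1 and 2 of the IRC clarifications above correspond roughly to the following sketch (the nvme0n1 device name and package name come from the report itself; the exact steps may differ per instance):

    # boot with cgroups v1 forced via the kernel command line:
    #   systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller=true
    apt-get install linux-modules-extra-aws
    modprobe bfq
    echo bfq > /sys/block/nvme0n1/queue/scheduler
    # bfq stats show up even while the throttle counters stay at 0
    cat /sys/fs/cgroup/blkio/blkio.bfq.io_serviced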