[SRU][jammy] Backport "parse_proc_interrupts: fix parsing interrupt counts"

Bug #2038300 reported by Nicolas Dechesne
This bug affects 1 person
Affects              Status        Importance  Assigned to  Milestone
irqbalance (Ubuntu)  Fix Released  Medium      Unassigned
Jammy                In Progress   Medium      Loïc Minier

Bug Description

[ Impact ]

On the Tegra/Orin platform, running an Ubuntu 22.04 image with the linux-nvidia-tegra-igx kernel, I see the following when I run the 'reboot' command:

[ *** ] A stop job is running for irqbalance daemon (1min 17s / 1min 30s)

After the 1min 30s delay, the reboot carries on.

This appears to be happening because the version of irqbalance in jammy gets stuck repeatedly attempting to rebalance due to a bug in its parsing of /proc/interrupts.

The GPIO irqchip is named 2200000.gpio, which starts with a number. irqbalance reads this as an interrupt count for one more CPU, so it parses the number of CPUs as 13. That doesn't match the number of CPUs from num_online_cpus() (12), and thus it keeps rescanning.
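To illustrate the miscount, here is a minimal Python sketch. It is not irqbalance's actual C code: the sample row, the helper name count_columns, and both patterns are made up for illustration. It only shows how a parser that accepts any leading run of digits counts 13 columns, while one that also requires whitespace after the digits counts 12.

```python
import re

# Made-up /proc/interrupts row for illustration: 12 per-CPU counts, then an
# irqchip name that begins with digits (as 2200000.gpio does in this report).
SAMPLE_ROW = "100: " + " ".join(["0"] * 12) + "  2200000.gpio  5 Level  example"


def count_columns(row, pattern):
    """Count consecutive fields after the 'IRQ:' label that match pattern."""
    rest = row.split(":", 1)[1]
    columns = 0
    while True:
        match = pattern.match(rest)
        if match is None:
            return columns
        columns += 1
        rest = rest[match.end():]


# Accepts any leading run of digits, whatever character follows it.
NAIVE = re.compile(r"\s*\d+")
# Accepts a run of digits only when whitespace or end of line follows.
STRICT = re.compile(r"\s*\d+(?=\s|$)")

naive_cpus = count_columns(SAMPLE_ROW, NAIVE)
strict_cpus = count_columns(SAMPLE_ROW, STRICT)
print(naive_cpus, strict_cpus)  # prints "13 12"
```

With the naive pattern the digits of 2200000.gpio are consumed as a thirteenth count; the strict pattern stops at the irqchip name.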

The bug was fixed by this commit for irqbalance: https://github.com/Irqbalance/irqbalance/commit/0a82dddbaf5702caded0d0d83a6eafaca743254d, which is not present in the current jammy version.

The bug is already fixed in mantic, which ships a newer version of irqbalance (1.9.2) that includes this fix.

I have made a local package with this backport and tested it against jammy, and I can confirm the problem is fixed. This bug is to get the fix backported properly into jammy. For now my backport is available in my PPA at https://launchpad.net/~ndec/+archive/ubuntu/ppa-ndec.

[ Test Plan ]

The bug is 100% reproducible on any Jetson hardware running Ubuntu Jammy. The most obvious way to observe it is to stop irqbalance, for example by rebooting.

Once the bug is fixed, the reboot proceeds without the 1min 30s stop-job delay.

Additionally, running "irqbalance --debug" shows it continuously printing "Rescanning cpu topology". After applying the fix, "irqbalance --debug" works as expected.
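As a convenience for this check, the debug output could be captured for a fixed time and the rescan messages counted. The sketch below is only an assumption about how one might script it: the helper names and the 30-second window are made up, and no particular count is claimed to be normal. On an affected system the count would be expected to keep growing the longer the capture runs.

```python
import subprocess

RESCAN_MARKER = "Rescanning cpu topology"


def count_rescans(output):
    """Return how many times the rescan message appears in captured output."""
    return output.count(RESCAN_MARKER)


def capture_debug_output(seconds=30):
    """Run 'irqbalance --debug' for a limited time and return what it printed.

    Assumes irqbalance is on PATH and stays in the foreground in debug mode,
    so the timeout is the normal way for this call to end.
    """
    try:
        done = subprocess.run(
            ["irqbalance", "--debug"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=seconds,
        )
        raw = done.stdout
    except subprocess.TimeoutExpired as exc:
        # Partial output collected before the timeout; may be None.
        raw = exc.stdout or b""
    return raw.decode(errors="replace")
```

Usage would be along the lines of count_rescans(capture_debug_output()), run once with the jammy package and once with the patched one.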

[ Where problems could occur ]

irqbalance is included widely in Ubuntu. I have tested the change on x86 (reboot, restart irqbalance and irqbalance --debug) and I am not seeing any particular side effect.

Revision history for this message
Loïc Minier (lool) wrote :

It really feels like there should be a unit test upstream for this kind of thing. Is there a way to pass a working vs. non-working /proc/interrupts file to irqbalance, to test with known-good and known-bad data before and after the change?

summary: - [jammy] Backport "parse_proc_interrupts: fix parsing interrupt counts"
+ [SRU][jammy] Backport "parse_proc_interrupts: fix parsing interrupt
+ counts"
Changed in irqbalance (Ubuntu):
status: New → Fix Released
Revision history for this message
Loïc Minier (lool) wrote :

"irqbalance --debug" as non-root might go through that codepath, albeit /proc/interrupts is a hardcoded path.

Revision history for this message
Nicolas Dechesne (ndec) wrote :

Yes, 'irqbalance --debug' as non-root goes through it, and can indeed show the problem. On Tegra running the jammy irqbalance, it continuously loops and restarts parsing /proc/interrupts; with the fix it is OK. On x86 both work fine.

Revision history for this message
Nicolas Dechesne (ndec) wrote :

debdiff attached.

Revision history for this message
Nicolas Dechesne (ndec) wrote :

@lool, I looked at how we could add a test case for this specific issue, and unfortunately it is far from straightforward. First, there is no test infrastructure upstream at all. Second, while I admit that the parsing of /proc/interrupts is fragile, it's done in such a way that it cannot be tested in isolation, and it's intermixed with parsing of other data structures in /proc. Is it possible to move forward with the SRU as it is?

Revision history for this message
Loïc Minier (lool) wrote :

Yeah, I gave it a couple of tries (https://github.com/lool/irqbalance/tree/proc-interrupts-env-override https://github.com/lool/irqbalance/tree/test-proc-interrupts-parsing), but a bunch of files in /proc and /sys actually participate in the overall state, and it would be quite a bit more work to patch all of this, unfortunately.

Loïc Minier (lool)
Changed in irqbalance (Ubuntu Jammy):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Loïc Minier (lool)
Changed in irqbalance (Ubuntu):
importance: Undecided → Medium
Revision history for this message
Nicolas Dechesne (ndec) wrote :

There is another SRU in progress for Jammy for this package (see https://bugs.launchpad.net/ubuntu/+source/irqbalance/+bug/2038573). We will wait until this SRU is finalized before moving forward.

Revision history for this message
Nicolas Dechesne (ndec) wrote :

Rebased on top of recent SRU which made it to -proposed.

Revision history for this message
Lucas Kanashiro (lucaskanashiro) wrote :

Hi Nicolas,

I think your proposed patch has not been uploaded to the archive yet, right? May I suggest adding some DEP-3 headers [1] to your patch? For instance, the Origin, Reviewed-by and Applied-Upstream fields might be helpful.

[1] https://dep-team.pages.debian.net/deps/dep3/

Revision history for this message
Nicolas Dechesne (ndec) wrote :

Thanks Lucas for your review. I have updated the debdiff with a couple of DEP-3 headers.
It has not been uploaded yet; I think Loïc will do that soon.

Revision history for this message
Loïc Minier (lool) wrote :

Was off yesterday, uploaded today!
