Activity log for bug #1950666

Date Who What changed Old value New value Message
2021-11-11 16:37:18 Ioanna Alifieraki bug added bug
2021-11-11 16:37:43 Ioanna Alifieraki nominated for series Ubuntu Focal
2021-11-11 16:37:43 Ioanna Alifieraki bug task added linux (Ubuntu Focal)
2021-11-11 16:37:43 Ioanna Alifieraki nominated for series Ubuntu Hirsute
2021-11-11 16:37:43 Ioanna Alifieraki bug task added linux (Ubuntu Hirsute)
2021-11-11 16:37:43 Ioanna Alifieraki nominated for series Ubuntu Impish
2021-11-11 16:37:43 Ioanna Alifieraki bug task added linux (Ubuntu Impish)
2021-11-11 16:37:43 Ioanna Alifieraki nominated for series Ubuntu Jammy
2021-11-11 16:37:43 Ioanna Alifieraki bug task added linux (Ubuntu Jammy)
2021-11-11 16:37:57 Ioanna Alifieraki linux (Ubuntu Focal): importance Undecided Medium
2021-11-11 16:38:00 Ioanna Alifieraki linux (Ubuntu Hirsute): importance Undecided Medium
2021-11-11 16:38:03 Ioanna Alifieraki linux (Ubuntu Impish): importance Undecided Medium
2021-11-11 16:38:05 Ioanna Alifieraki linux (Ubuntu Jammy): importance Undecided Medium
2021-11-11 16:38:10 Ioanna Alifieraki linux (Ubuntu Focal): status New Confirmed
2021-11-11 16:38:16 Ioanna Alifieraki linux (Ubuntu Hirsute): status New Confirmed
2021-11-11 16:38:19 Ioanna Alifieraki linux (Ubuntu Impish): status New Confirmed
2021-11-11 16:38:38 Ioanna Alifieraki linux (Ubuntu Jammy): status New In Progress
2021-11-11 16:38:48 Ioanna Alifieraki linux (Ubuntu Jammy): assignee Ioanna Alifieraki (joalif)
2021-11-11 16:38:50 Ioanna Alifieraki linux (Ubuntu Impish): assignee Ioanna Alifieraki (joalif)
2021-11-11 16:38:53 Ioanna Alifieraki linux (Ubuntu Hirsute): assignee Ioanna Alifieraki (joalif)
2021-11-11 16:38:56 Ioanna Alifieraki linux (Ubuntu Focal): assignee Ioanna Alifieraki (joalif)
2021-11-11 16:43:52 Ioanna Alifieraki description
Old value:
[IMPACT]
Commit 3b9a907223d7 (ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier) pushes the removal of an ipmi_user into the system's workqueue.
Whenever an ipmi_user struct is about to be removed it is scheduled as a work on the system's workqueue to guarantee the free operation won't be executed in atomic context. When the work is executed the free_user_work() function is invoked which frees the ipmi_user.
When ipmi_msghandler module is removed in cleanup_ipmi() function, there is no check if there are any pending works to be executed.
Therefore, there is a potential race condition : An ipmi_user is scheduled for removal and shortly to remove ipmi_msghandler module. If the scheduled work delays execution for any reason and the module is removed first then when the work is executed the pages of free_user_work() are gone and the system crashes with the following :
BUG: unable to handle page fault for address: ffffffffc05c3450
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 635420e067 P4D 635420e067 PUD 6354210067 PMD 4711e51067 PTE 0
Oops: 0010 [#1] SMP PTI
CPU: 19 PID: 29646 Comm: kworker/19:1 Kdump: loaded Not tainted 5.4.0-77-generic #86~18.04.1-Ubuntu
Hardware name: Ciara Technologies ORION RS610-G4-DTH4S/MR91-FS1-Y9, BIOS F29 05/23/2019
Workqueue: events 0xffffffffc05c3450
RIP: 0010:0xffffffffc05c3450
Code: Bad RIP value.
RSP: 0018:ffffb721333c3e88 EFLAGS: 00010286
RAX: ffffffffc05c3450 RBX: ffff92a95f56a740 RCX: ffffb7221cfd14e8
RDX: 0000000000000001 RSI: ffff92616040d4b0 RDI: ffffb7221cf404e0
RBP: ffffb721333c3ec0 R08: 000073746e657665 R09: 8080808080808080
R10: ffffb721333c3de0 R11: fefefefefefefeff R12: ffff92a95f570700
R13: ffff92a0a40ece40 R14: ffffb7221cf404e0 R15: 0ffff92a95f57070
FS: 0000000000000000(0000) GS:ffff92a95f540000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc05c3426 CR3: 00000081e9bfc005 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
? process_one_work+0x20f/0x400
worker_thread+0x34/0x410
kthread+0x121/0x140
? process_one_work+0x400/0x400
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: xt_REDIRECT xt_owner ipt_rpfilter xt_CT xt_multiport xt_set ip_set_hash_ip veth xt_statistic ipt_REJECT ... megaraid_sas ahci libahci wmi [last unloaded: ipmi_msghandler]
CR2: ffffffffc05c3450
[TEST CASE]
[WHERE PROBLEMS COULD OCCUR]
[OTHER]
Upstream is affected too, working on a patch to address this.
New value:
[IMPACT]
Commit 3b9a907223d7 (ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier) pushes the removal of an ipmi_user into the system's workqueue.
Whenever an ipmi_user struct is about to be removed it is scheduled as a work on the system's workqueue to guarantee the free operation won't be executed in atomic context. When the work is executed the free_user_work() function is invoked which frees the ipmi_user.
When ipmi_msghandler module is removed in cleanup_ipmi() function, there is no check if there are any pending works to be executed.
Therefore, there is a potential race condition : An ipmi_user is scheduled for removal and shortly after to remove the ipmi_msghandler module. If the scheduled work delays execution for any reason and the module is removed first, then when the work is executed the pages of free_user_work() are gone and the system crashes with the following :
BUG: unable to handle page fault for address: ffffffffc05c3450
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 635420e067 P4D 635420e067 PUD 6354210067 PMD 4711e51067 PTE 0
Oops: 0010 [#1] SMP PTI
CPU: 19 PID: 29646 Comm: kworker/19:1 Kdump: loaded Not tainted 5.4.0-77-generic #86~18.04.1-Ubuntu
Hardware name: Ciara Technologies ORION RS610-G4-DTH4S/MR91-FS1-Y9, BIOS F29 05/23/2019
Workqueue: events 0xffffffffc05c3450
RIP: 0010:0xffffffffc05c3450
Code: Bad RIP value.
RSP: 0018:ffffb721333c3e88 EFLAGS: 00010286
RAX: ffffffffc05c3450 RBX: ffff92a95f56a740 RCX: ffffb7221cfd14e8
RDX: 0000000000000001 RSI: ffff92616040d4b0 RDI: ffffb7221cf404e0
RBP: ffffb721333c3ec0 R08: 000073746e657665 R09: 8080808080808080
R10: ffffb721333c3de0 R11: fefefefefefefeff R12: ffff92a95f570700
R13: ffff92a0a40ece40 R14: ffffb7221cf404e0 R15: 0ffff92a95f57070
FS: 0000000000000000(0000) GS:ffff92a95f540000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc05c3426 CR3: 00000081e9bfc005 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
? process_one_work+0x20f/0x400
worker_thread+0x34/0x410
kthread+0x121/0x140
? process_one_work+0x400/0x400
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: xt_REDIRECT xt_owner ipt_rpfilter xt_CT xt_multiport xt_set ip_set_hash_ip veth xt_statistic ipt_REJECT ... megaraid_sas ahci libahci wmi [last unloaded: ipmi_msghandler]
CR2: ffffffffc05c3450
[TEST CASE]
The user who reported the issue can reproduce reliably by stopping the ipmi related services and then removing the ipmi modules.
I could reproduce the issue only when turning the normal 'work' to delayed work.
[WHERE PROBLEMS COULD OCCUR]
TBD
[OTHER]
Upstream is affected too, working on a patch to address this.
2021-12-04 13:01:10 Ioanna Alifieraki description
Old value:
[IMPACT]
Commit 3b9a907223d7 (ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier) pushes the removal of an ipmi_user into the system's workqueue.
Whenever an ipmi_user struct is about to be removed it is scheduled as a work on the system's workqueue to guarantee the free operation won't be executed in atomic context. When the work is executed the free_user_work() function is invoked which frees the ipmi_user.
When ipmi_msghandler module is removed in cleanup_ipmi() function, there is no check if there are any pending works to be executed.
Therefore, there is a potential race condition : An ipmi_user is scheduled for removal and shortly after to remove the ipmi_msghandler module. If the scheduled work delays execution for any reason and the module is removed first, then when the work is executed the pages of free_user_work() are gone and the system crashes with the following :
BUG: unable to handle page fault for address: ffffffffc05c3450
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 635420e067 P4D 635420e067 PUD 6354210067 PMD 4711e51067 PTE 0
Oops: 0010 [#1] SMP PTI
CPU: 19 PID: 29646 Comm: kworker/19:1 Kdump: loaded Not tainted 5.4.0-77-generic #86~18.04.1-Ubuntu
Hardware name: Ciara Technologies ORION RS610-G4-DTH4S/MR91-FS1-Y9, BIOS F29 05/23/2019
Workqueue: events 0xffffffffc05c3450
RIP: 0010:0xffffffffc05c3450
Code: Bad RIP value.
RSP: 0018:ffffb721333c3e88 EFLAGS: 00010286
RAX: ffffffffc05c3450 RBX: ffff92a95f56a740 RCX: ffffb7221cfd14e8
RDX: 0000000000000001 RSI: ffff92616040d4b0 RDI: ffffb7221cf404e0
RBP: ffffb721333c3ec0 R08: 000073746e657665 R09: 8080808080808080
R10: ffffb721333c3de0 R11: fefefefefefefeff R12: ffff92a95f570700
R13: ffff92a0a40ece40 R14: ffffb7221cf404e0 R15: 0ffff92a95f57070
FS: 0000000000000000(0000) GS:ffff92a95f540000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc05c3426 CR3: 00000081e9bfc005 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
? process_one_work+0x20f/0x400
worker_thread+0x34/0x410
kthread+0x121/0x140
? process_one_work+0x400/0x400
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: xt_REDIRECT xt_owner ipt_rpfilter xt_CT xt_multiport xt_set ip_set_hash_ip veth xt_statistic ipt_REJECT ... megaraid_sas ahci libahci wmi [last unloaded: ipmi_msghandler]
CR2: ffffffffc05c3450
[TEST CASE]
The user who reported the issue can reproduce reliably by stopping the ipmi related services and then removing the ipmi modules.
I could reproduce the issue only when turning the normal 'work' to delayed work.
[WHERE PROBLEMS COULD OCCUR]
TBD
[OTHER]
Upstream is affected too, working on a patch to address this.
New value:
[IMPACT]
Commit 3b9a907223d7 (ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier) pushes the removal of an ipmi_user into the system's workqueue.
Whenever an ipmi_user struct is about to be removed it is scheduled as a work on the system's workqueue to guarantee the free operation won't be executed in atomic context. When the work is executed the free_user_work() function is invoked which frees the ipmi_user.
When ipmi_msghandler module is removed in cleanup_ipmi() function, there is no check if there are any pending works to be executed.
Therefore, there is a potential race condition : An ipmi_user is scheduled for removal and shortly after to remove the ipmi_msghandler module. If the scheduled work delays execution for any reason and the module is removed first, then when the work is executed the pages of free_user_work() are gone and the system crashes with the following :
BUG: unable to handle page fault for address: ffffffffc05c3450
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 635420e067 P4D 635420e067 PUD 6354210067 PMD 4711e51067 PTE 0
Oops: 0010 [#1] SMP PTI
CPU: 19 PID: 29646 Comm: kworker/19:1 Kdump: loaded Not tainted 5.4.0-77-generic #86~18.04.1-Ubuntu
Hardware name: Ciara Technologies ORION RS610-G4-DTH4S/MR91-FS1-Y9, BIOS F29 05/23/2019
Workqueue: events 0xffffffffc05c3450
RIP: 0010:0xffffffffc05c3450
Code: Bad RIP value.
RSP: 0018:ffffb721333c3e88 EFLAGS: 00010286
RAX: ffffffffc05c3450 RBX: ffff92a95f56a740 RCX: ffffb7221cfd14e8
RDX: 0000000000000001 RSI: ffff92616040d4b0 RDI: ffffb7221cf404e0
RBP: ffffb721333c3ec0 R08: 000073746e657665 R09: 8080808080808080
R10: ffffb721333c3de0 R11: fefefefefefefeff R12: ffff92a95f570700
R13: ffff92a0a40ece40 R14: ffffb7221cf404e0 R15: 0ffff92a95f57070
FS: 0000000000000000(0000) GS:ffff92a95f540000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc05c3426 CR3: 00000081e9bfc005 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
? process_one_work+0x20f/0x400
worker_thread+0x34/0x410
kthread+0x121/0x140
? process_one_work+0x400/0x400
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: xt_REDIRECT xt_owner ipt_rpfilter xt_CT xt_multiport xt_set ip_set_hash_ip veth xt_statistic ipt_REJECT ... megaraid_sas ahci libahci wmi [last unloaded: ipmi_msghandler]
CR2: ffffffffc05c3450
[TEST CASE]
The user who reported the issue can reproduce reliably by stopping the ipmi related services and then removing the ipmi modules.
I could reproduce the issue only when turning the normal 'work' to delayed work.
[WHERE PROBLEMS COULD OCCUR]
The fixing patch creates a dedicated workqueue for the remove_work struct of ipmi_user when loading the ipmi_msghandler modules and destroys the workqueue when removing the module. Therefore any potential problems would occur during these two operations or when scheduling works on the dedicated workqueue.
[OTHER]
Upstream patches :
1d49eb91e86e (ipmi: Move remove_work to dedicated workqueue)
5a3ba99b62d8 (ipmi: msghandler: Make symbol 'remove_work_wq' static)
2021-12-17 19:41:59 Kelsey Steele linux (Ubuntu Focal): status Confirmed Fix Committed
2021-12-17 19:42:01 Kelsey Steele linux (Ubuntu Hirsute): status Confirmed Fix Committed
2021-12-17 19:42:03 Kelsey Steele linux (Ubuntu Impish): status Confirmed Fix Committed
2022-01-06 17:56:36 Ubuntu Kernel Bot tags verification-needed-impish
2022-01-12 13:36:43 Ubuntu Kernel Bot tags verification-needed-impish verification-needed-hirsute verification-needed-impish
2022-01-19 13:18:30 Ubuntu Kernel Bot tags verification-needed-hirsute verification-needed-impish verification-needed-focal verification-needed-hirsute verification-needed-impish
2022-01-19 16:48:40 Ioanna Alifieraki tags verification-needed-focal verification-needed-hirsute verification-needed-impish verification-done-focal verification-done-hirsute verification-done-impish
2022-01-26 21:58:07 Brian Murray linux (Ubuntu Hirsute): status Fix Committed Won't Fix
2022-01-31 12:27:53 Launchpad Janitor linux (Ubuntu Focal): status Fix Committed Fix Released
2022-01-31 12:32:51 Launchpad Janitor linux (Ubuntu Impish): status Fix Committed Fix Released
2022-01-31 12:32:51 Launchpad Janitor cve linked 2021-4090
2022-01-31 12:32:51 Launchpad Janitor cve linked 2021-42327
2022-02-07 15:05:26 Ubuntu Kernel Bot tags verification-done-focal verification-done-hirsute verification-done-impish verification-done-focal verification-done-hirsute verification-done-impish verification-needed-bionic
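
Illustration (not part of the activity log): the fix described in the 2021-12-04 description moves the removal work onto a workqueue that the module itself creates and destroys, so pending work is drained before the module's code goes away. The following is only a user-space Python model of that ordering, not the kernel patch. The names free_user_work and remove_work_wq come from the description above; DedicatedWorkqueue, MsgHandlerModel, release_user and the 10 ms delay are hypothetical stand-ins chosen for this sketch.

```python
import queue
import threading
import time


class DedicatedWorkqueue:
    """Hypothetical stand-in for a workqueue owned by a single module."""

    def __init__(self):
        self._items = queue.Queue()
        self._worker = threading.Thread(target=self._run)
        self._worker.start()

    def _run(self):
        while True:
            work = self._items.get()
            if work is None:  # sentinel queued by destroy()
                return
            work()

    def queue_work(self, work):
        self._items.put(work)

    def destroy(self):
        # The sentinel sits behind all pending work in the FIFO, so join()
        # returns only after every queued item has run.
        self._items.put(None)
        self._worker.join()


class MsgHandlerModel:
    """Hypothetical model of the module's load/unload lifecycle."""

    def __init__(self):
        self.loaded = True
        self.freed_users = []
        self.ran_after_unload = False
        self.remove_work_wq = DedicatedWorkqueue()

    def free_user_work(self, user):
        time.sleep(0.01)  # assumed delay, to mimic work that runs late
        if not self.loaded:
            self.ran_after_unload = True  # the crash scenario in the report
        self.freed_users.append(user)

    def release_user(self, user):
        self.remove_work_wq.queue_work(lambda: self.free_user_work(user))

    def cleanup(self):
        # Drain and destroy the dedicated queue first, then "unload".
        self.remove_work_wq.destroy()
        self.loaded = False


model = MsgHandlerModel()
for user in ("user0", "user1", "user2"):
    model.release_user(user)
model.cleanup()
print(model.freed_users, model.ran_after_unload)
```

In this model, marking the module unloaded before draining the queue would set ran_after_unload, which corresponds to the page fault in the report: the work function runs after its code is gone.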