Ubuntu 18.04: call trace in kernel buffer when unloading ib_ipoib module
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Invalid | Undecided | Unassigned |
Bionic | Fix Released | Medium | Ian May |
Bug Description
[Impact]
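Unloading ib_ipoib causes a call trace to be logged in the kernel buffer.
Bisecting the Bionic kernel shows that the warning was first surfaced by
616e695435e3 workqueue: Try to catch flush_work() without INIT_WORK()
in version 4.15.0-59.66. As its title indicates, that commit adds detection for this kind of misuse; it exposed the problem rather than causing it.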
[Test Case]
# modprobe ib_ipoib
# modprobe ib_ipoib -r
# dmesg
[ 306.277717] ------------[ cut here ]------------
[ 306.277738] WARNING: CPU: 10 PID: 2148 at /build/
[ 306.277739] Modules linked in: nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_
[ 306.277790] serio_raw acpi_power_meter lpc_ich mac_hid ipmi_si ipmi_devintf ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel ip_tables x_tables autofs4 mlx5_ib mlx4_ib mlx4_en ib_core hid_generic psmouse mlx5_core usbhid hid pata_acpi hpsa tg3 mlxfw mlx4_core scsi_transport_sas ptp pps_core devlink
[ 306.277817] CPU: 10 PID: 2148 Comm: modprobe Not tainted 4.15.0-124-generic #127-Ubuntu
[ 306.277818] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
[ 306.277823] RIP: 0010:__
[ 306.277825] RSP: 0018:ffffbdeb47
[ 306.277827] RAX: 0000000000000024 RBX: ffff993a5c3d8ec8 RCX: 0000000000000006
[ 306.277829] RDX: 0000000000000000 RSI: ffff99429ef16498 RDI: ffff99429ef16490
[ 306.277830] RBP: ffffbdeb47ecfd48 R08: 000000000000050d R09: 0000000000000004
[ 306.277832] R10: ffffe263a058c1c0 R11: 0000000000000001 R12: ffff993a5c3d8ec8
[ 306.277833] R13: 0000000000000001 R14: ffffbdeb47ecfd78 R15: ffffffffb00a9800
[ 306.277835] FS: 00007fa1124a954
[ 306.277837] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 306.277839] CR2: 000055b1c5007bb0 CR3: 0000000fcf36c002 CR4: 00000000001606e0
[ 306.277840] Call Trace:
[ 306.277850] __cancel_
[ 306.277881] ? mlx5_core_
[ 306.277886] cancel_
[ 306.277909] mlx5e_detach_
[ 306.277931] mlx5_rdma_
[ 306.277941] mlx5_ib_
[ 306.277948] ipoib_remove_
[ 306.277965] ib_unregister_
[ 306.277972] ipoib_cleanup_
[ 306.277978] SyS_delete_
[ 306.277983] do_syscall_
[ 306.277989] entry_SYSCALL_
[ 306.277992] RIP: 0033:0x7fa111fc1047
[ 306.277993] RSP: 002b:00007ffc0d
[ 306.277996] RAX: ffffffffffffffda RBX: 00005614be46cca0 RCX: 00007fa111fc1047
[ 306.277997] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005614be46cd08
[ 306.277999] RBP: 00005614be46cca0 R08: 00007ffc0db31241 R09: 0000000000000000
[ 306.278000] R10: 00007fa11203dc40 R11: 0000000000000206 R12: 00005614be46cd08
[ 306.278002] R13: 0000000000000001 R14: 00005614be46cd08 R15: 00007ffc0db33680
[ 306.278004] Code: 24 03 80 c9 f0 e9 5b ff ff ff 48 c7 c7 18 50 0b b1 e8 ed 66 04 00 0f 0b 31 c0 e9 75 ff ff ff 48 c7 c7 18 50 0b b1 e8 d8 66 04 00 <0f> 0b 31 c0 e9 60 ff ff ff e8 5a 35 fe ff 66 2e 0f 1f 84 00 00
[ 306.278035] ---[ end trace 652f7759937172a2 ]---
[ 306.646061] ------------[ cut here ]------------
[ 306.646077] WARNING: CPU: 6 PID: 2148 at /build/
[ 306.646078] Modules linked in: nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_
[ 306.646123] serio_raw acpi_power_meter lpc_ich mac_hid ipmi_si ipmi_devintf ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel ip_tables x_tables autofs4 mlx5_ib mlx4_ib mlx4_en ib_core hid_generic psmouse mlx5_core usbhid hid pata_acpi hpsa tg3 mlxfw mlx4_core scsi_transport_sas ptp pps_core devlink
[ 306.646146] CPU: 6 PID: 2148 Comm: modprobe Tainted: G W 4.15.0-124-generic #127-Ubuntu
[ 306.646148] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
[ 306.646152] RIP: 0010:__
[ 306.646154] RSP: 0018:ffffbdeb47
[ 306.646156] RAX: 0000000000000024 RBX: ffff9942970b8ec8 RCX: 0000000000000006
[ 306.646158] RDX: 0000000000000000 RSI: ffff99429ee16498 RDI: ffff99429ee16490
[ 306.646159] RBP: ffffbdeb47ecfd48 R08: 0000000000000533 R09: 0000000000000004
[ 306.646161] R10: ffffe2639fa66740 R11: 0000000000000001 R12: ffff9942970b8ec8
[ 306.646162] R13: 0000000000000001 R14: ffffbdeb47ecfd78 R15: ffffffffb00a9800
[ 306.646164] FS: 00007fa1124a954
[ 306.646166] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 306.646167] CR2: 000055dd889e4a30 CR3: 0000000fcf36c006 CR4: 00000000001606e0
[ 306.646169] Call Trace:
[ 306.646177] __cancel_
[ 306.646205] ? mlx5_core_
[ 306.646210] cancel_
[ 306.646233] mlx5e_detach_
[ 306.646255] mlx5_rdma_
[ 306.646264] mlx5_ib_
[ 306.646271] ipoib_remove_
[ 306.646287] ib_unregister_
[ 306.646295] ipoib_cleanup_
[ 306.646300] SyS_delete_
[ 306.646305] do_syscall_
[ 306.646310] entry_SYSCALL_
[ 306.646313] RIP: 0033:0x7fa111fc1047
[ 306.646314] RSP: 002b:00007ffc0d
[ 306.646317] RAX: ffffffffffffffda RBX: 00005614be46cca0 RCX: 00007fa111fc1047
[ 306.646318] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005614be46cd08
[ 306.646319] RBP: 00005614be46cca0 R08: 00007ffc0db31241 R09: 0000000000000000
[ 306.646321] R10: 00007fa11203dc40 R11: 0000000000000206 R12: 00005614be46cd08
[ 306.646322] R13: 0000000000000001 R14: 00005614be46cd08 R15: 00007ffc0db33680
[ 306.646325] Code: 24 03 80 c9 f0 e9 5b ff ff ff 48 c7 c7 18 50 0b b1 e8 ed 66 04 00 0f 0b 31 c0 e9 75 ff ff ff 48 c7 c7 18 50 0b b1 e8 d8 66 04 00 <0f> 0b 31 c0 e9 60 ff ff ff e8 5a 35 fe ff 66 2e 0f 1f 84 00 00
[ 306.646355] ---[ end trace 652f7759937172a3 ]---
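To check for this warning from a script rather than by reading dmesg by eye, one option is to count the "cut here" markers in the kernel log. The following is a minimal sketch, not part of any Ubuntu tooling: the function name count_warning_blocks is hypothetical, and it assumes the log text is supplied on standard input.

```shell
#!/bin/sh
# count_warning_blocks: hypothetical helper (name chosen for illustration).
# Reads kernel log text on stdin and prints the number of lines that
# contain the "[ cut here ]" marker that opens each warning block.
count_warning_blocks() {
    # -F treats the pattern as a fixed string, so the brackets are literal.
    # grep -c exits non-zero when the count is 0; "|| true" keeps the
    # function usable under "set -e" while still printing "0".
    grep -c -F -e '[ cut here ]' || true
}
```

After running the two modprobe commands from the test case, `dmesg | count_warning_blocks` prints the number of warning blocks currently in the buffer. Fed only the excerpt above, it would print 2. A result of 0 on a freshly booted system after the load/unload cycle suggests the warning is no longer triggered.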
[Fix]
The root cause of this error is that a delayed work item belonging to the IPoIB net devices is canceled without ever having been initialized. The solution is to make sure it is always initialized.
The attached patch, which is very small (one line), implements this solution.
Please note that this patch is not upstream. It is based on the following upstream commits, which introduced similar functionality in upstream v4.20-rc1:
303211b44ce3 net/mlx5e: Always initialize update stats delayed work
182570b26223 net/mlx5e: Gather common netdev init/cleanup functionality in one place
Applying these two commits cleanly to the Bionic tree requires additional patches that could introduce a large change, so I think it is better (if possible) to use the attached patch.
[Regression Potential]
Regression risk is low, since the patch introduces a small fix whose upstream equivalent was accepted in v4.20.
description: updated
tags: added: patch
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu Bionic):
assignee: nobody → Kamal Mostafa (kamalmostafa)
importance: Undecided → Medium
status: New → In Progress
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Changed in linux (Ubuntu Bionic):
assignee: Kamal Mostafa (kamalmostafa) → Ian (ian-may)
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1904848
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.