net:veth.sh in ubuntu_kernel_selftests hang with J-intel-iotg (BUG: unable to handle page fault)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
HWE Next | Invalid | Undecided | Unassigned |
ubuntu-kernel-tests | New | Undecided | Unassigned |
linux-intel-iotg (Ubuntu) | Invalid | Undecided | Unassigned |
Jammy | Fix Released | Medium | Unassigned |
Bug Description
Issue found on node "onibi" with J-intel-iotg 5.15.0-1026.31 this cycle.
The veth.sh test in the net category hangs and times out, leaving the test report incomplete.
When running the test manually, a kernel trace appears in dmesg.
ubuntu@
default - gro flag ok
- peer gro flag ok
- tso flag ok
- peer tso flag ok
- aggregation ok
- aggregation with TSO off ok
with gro on - gro flag ok
- peer gro flag ok
- tso flag ok
- peer tso flag ok
- aggregation with TSO off ok
default channels ok
with gro enabled on link down - gro flag ok
- peer gro flag ok
- tso flag ok
- peer tso flag ok
- aggregation with TSO off ok
setting tx channels ok
setting both rx and tx channels ok
bad setting: combined channels ok
setting invalid channels nr fail rx:3:3 tx:3:5 combined:n/a:n/a
bad setting: XDP with RX nr less than TX ok
(hangs here)
dmesg output:
[ 547.520923] BUG: unable to handle page fault for address: ffffb73800000001
[ 547.520999] #PF: supervisor write access in kernel mode
[ 547.521045] #PF: error_code(0x0002) - not-present page
[ 547.521089] PGD 100000067 P4D 100000067 PUD 0
[ 547.521133] Oops: 0002 [#1] SMP PTI
[ 547.521168] CPU: 1 PID: 1559 Comm: ip Not tainted 5.15.0-
[ 547.521233] Hardware name: Dell Inc. PowerEdge R310/05XKKK, BIOS 1.8.2 08/17/2011
[ 547.521293] RIP: 0010:veth_
[ 547.521342] Code: ff 41 89 9d 1c 01 00 00 49 21 85 e8 00 00 00 e9 74 ff ff ff 48 c7 c7 80 e3 b0 c0 e8 2b 3b 06 c1 b8 e4 ff ff ff 4d 85 ff 74 85 <49> c7 07 80 e3 b0 c0 e9 79 ff ff ff 48 c7 c7 20 e4 b0 c0 e8 09 3b
[ 547.521488] RSP: 0018:ffffb738c2
[ 547.521535] RAX: 00000000ffffffe4 RBX: 0000000000000db2 RCX: ffffb738c254fb20
[ 547.521594] RDX: ffffffffc0b0bf90 RSI: ffffb738c254f468 RDI: ffffffffc0b0e380
[ 547.521653] RBP: ffffb738c254f450 R08: 0000000000000001 R09: ffffb738c0081000
[ 547.521711] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c65ced90000
[ 547.521769] R13: ffff8c65c12f6000 R14: 0000000000000000 R15: ffffb73800000001
[ 547.521828] FS: 00007faa028b3b8
[ 547.521895] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 547.521943] CR2: ffffb73800000001 CR3: 000000010f068000 CR4: 00000000000006e0
[ 547.522004] Call Trace:
[ 547.522029] <TASK>
[ 547.522052] ? veth_open+0x90/0x90 [veth]
[ 547.522094] dev_xdp_
[ 547.522135] dev_xdp_
[ 547.522171] ? __bpf_prog_
[ 547.522212] dev_change_
[ 547.522252] do_setlink+
[ 547.522288] ? dev_get_
[ 547.522326] __rtnl_
[ 547.522363] ? security_
[ 547.522406] ? skb_queue_
[ 547.522444] ? sock_def_
[ 547.522485] ? __netlink_
[ 547.522528] ? netlink_
[ 547.522566] ? rtnl_getlink+
[ 547.522611] ? kmem_cache_
[ 547.522657] rtnl_newlink+
[ 547.522692] rtnetlink_
[ 547.522731] ? rtnl_calcit.
[ 547.524524] netlink_
[ 547.526314] rtnetlink_
[ 547.528102] netlink_
[ 547.529837] netlink_
[ 547.531505] sock_sendmsg+
[ 547.533114] ____sys_
[ 547.534667] ? import_
[ 547.536164] ? sendmsg_
[ 547.537609] ___sys_
[ 547.539024] ? rseq_ip_
[ 547.540420] ? __rseq_
[ 547.541824] ? exit_to_
[ 547.543227] ? exit_to_
[ 547.544623] ? syscall_
[ 547.545993] ? __x64_sys_
[ 547.547334] __sys_sendmsg+
[ 547.548650] __x64_sys_
[ 547.549918] do_syscall_
[ 547.551134] ? exc_page_
[ 547.552322] entry_SYSCALL_
[ 547.553511] RIP: 0033:0x7faa02a07b17
[ 547.554680] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 547.557206] RSP: 002b:00007ffdbb
[ 547.558517] RAX: ffffffffffffffda RBX: 0000000063f5ffbc RCX: 00007faa02a07b17
[ 547.559818] RDX: 0000000000000000 RSI: 00007ffdbbca36e0 RDI: 0000000000000003
[ 547.561110] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000555ea5465830
[ 547.562386] R10: 00007faa02afa340 R11: 0000000000000246 R12: 0000000000000001
[ 547.563647] R13: 00007ffdbbca3790 R14: 0000000000000000 R15: 0000555ea4edb040
[ 547.564924] </TASK>
[ 547.566184] Modules linked in: algif_hash af_alg veth intel_powerclamp ipmi_ssif coretemp joydev input_leds binfmt_misc kvm_intel ipmi_si kvm dcdbas ipmi_devintf ipmi_msghandler intel_cstate mac_hid acpi_power_meter i7core_edac sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ramoops pstore_blk reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mgag200 i2c_algo_bit hid_generic gpio_ich drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mpt3sas cec rc_core usbhid raid_class drm pata_acpi hid lpc_ich bnx2 scsi_transport_sas
[ 547.575407] CR2: ffffb73800000001
[ 547.577070] ---[ end trace 3ebb9a2cada35096 ]---
[ 547.586349] RIP: 0010:veth_
[ 547.588039] Code: ff 41 89 9d 1c 01 00 00 49 21 85 e8 00 00 00 e9 74 ff ff ff 48 c7 c7 80 e3 b0 c0 e8 2b 3b 06 c1 b8 e4 ff ff ff 4d 85 ff 74 85 <49> c7 07 80 e3 b0 c0 e9 79 ff ff ff 48 c7 c7 20 e4 b0 c0 e8 09 3b
[ 547.591612] RSP: 0018:ffffb738c2
[ 547.593432] RAX: 00000000ffffffe4 RBX: 0000000000000db2 RCX: ffffb738c254fb20
[ 547.595282] RDX: ffffffffc0b0bf90 RSI: ffffb738c254f468 RDI: ffffffffc0b0e380
[ 547.597143] RBP: ffffb738c254f450 R08: 0000000000000001 R09: ffffb738c0081000
[ 547.599012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c65ced90000
[ 547.600888] R13: ffff8c65c12f6000 R14: 0000000000000000 R15: ffffb73800000001
[ 547.602764] FS: 00007faa028b3b8
[ 547.604675] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 547.606593] CR2: ffffb73800000001 CR3: 000000010f068000 CR4: 00000000000006e0
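A few things can be cross-checked directly from the register dump above. The sketch below is only a reading aid: it uses values copied from the oops, and the helper names (`parse_registers`, `as_signed32`) are made up for illustration. It checks that the faulting address in CR2 equals R15, reads the low 32 bits of RAX as a signed value, and decodes the immediate of the faulting instruction (the bytes starting at the `<49>` marker, which decode as a 64-bit store of a sign-extended 32-bit immediate through R15).

```python
import re

# Register values copied from the first dump in the oops above.
OOPS = """\
RAX: 00000000ffffffe4 RBX: 0000000000000db2 RCX: ffffb738c254fb20
RDX: ffffffffc0b0bf90 RSI: ffffb738c254f468 RDI: ffffffffc0b0e380
R13: ffff8c65c12f6000 R14: 0000000000000000 R15: ffffb73800000001
CR2: ffffb73800000001 CR3: 000000010f068000 CR4: 00000000000006e0
"""


def parse_registers(text):
    """Return {name: int} for every 'NAME: <16 hex digits>' pair (hypothetical helper)."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def as_signed32(value):
    """Interpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


regs = parse_registers(OOPS)

# Faulting instruction bytes, starting at the <49> marker in the Code: line.
insn = bytes.fromhex("49c70780e3b0c0")
imm32 = int.from_bytes(insn[3:7], "little")
# The 32-bit immediate is sign-extended to 64 bits before being stored.
imm64 = imm32 | 0xFFFFFFFF00000000 if imm32 & 0x80000000 else imm32

print("CR2 == R15:", regs["CR2"] == regs["R15"])
print("EAX as signed int:", as_signed32(regs["RAX"]))
print("stored immediate == RDI:", imm64 == regs["RDI"])
```

So the write goes through the pointer in R15, which is not a valid kernel address, and the value being stored is the same address that is in RDI. EAX holds -28, which is -ENOSPC on Linux. Which variable R15 holds cannot be told from the truncated trace alone.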
This node did not run this test in the previous cycle, so it is not yet clear whether this is a regression.
The next step is to test the kernel in -updates on this node.
Changed in linux-intel-iotg (Ubuntu):
status: New → Invalid
Changed in linux-intel-iotg (Ubuntu Jammy):
importance: Undecided → Medium
status: New → In Progress
tags: added: lookout-canyon oem-priority originate-from-2011522
Changed in hwe-next:
status: New → Invalid
tags: added: originate-from-1943687
tags: added: verification-done-jammy removed: verification-needed-jammy
This issue can be reproduced with -1025 as well, so it's not a regression.
Looking back at the test history, onibi was tested with 5.15.0-1015.20, but veth.sh could not run at that time because the xdp_dummy helper was missing:
Running 'make run_tests -C net TEST_PROGS=veth.sh TEST_GEN_PROGS='' TEST_CUSTOM_PROGS='''
make: Entering directory '/home/ubuntu/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/net'
make --no-builtin-rules ARCH=x86 -C ../../../.. headers_install
make[1]: Entering directory '/home/ubuntu/autotest/client/tmp/ubuntu_kernel_selftests/src/linux'
INSTALL ./usr/include
make[1]: Leaving directory '/home/ubuntu/autotest/client/tmp/ubuntu_kernel_selftests/src/linux'
TAP version 13
1..1
# selftests: net: veth.sh
# Missing xdp_dummy helper. Build bpf selftest first
not ok 1 selftests: net: veth.sh # exit=1
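To tell this kind of early exit apart from the hang when going through older logs, the TAP output can be checked mechanically. This is a minimal sketch: `summarize_tap` is a hypothetical helper, not part of the selftests or of autotest, and it only handles the simple `ok` / `not ok` lines and `#` diagnostics seen in the log above.

```python
import re

# TAP output copied from the 5.15.0-1015.20 run above.
TAP_LOG = """\
TAP version 13
1..1
# selftests: net: veth.sh
# Missing xdp_dummy helper. Build bpf selftest first
not ok 1 selftests: net: veth.sh # exit=1
"""


def summarize_tap(text):
    """Collect result lines and the '#' diagnostic lines that precede each one."""
    results, diagnostics = [], []
    for line in text.splitlines():
        m = re.match(r"(ok|not ok) (\d+) (.*?)(?: # (.*))?$", line)
        if m:
            status, num, name, directive = m.groups()
            results.append({
                "ok": status == "ok",
                "num": int(num),
                "name": name,
                "directive": directive,
                "diagnostics": diagnostics,
            })
            diagnostics = []
        elif line.startswith("# "):
            diagnostics.append(line[2:])
    return results


for result in summarize_tap(TAP_LOG):
    print(result["name"], "ok" if result["ok"] else "FAILED", result["directive"])
    for diag in result["diagnostics"]:
        print("   ", diag)
```

A run that ends with a `not ok` line and the "Missing xdp_dummy helper" diagnostic never reached the XDP cases, so it says nothing about whether the page fault was already present in that kernel.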