refcount underflow / kernel NULL dereference after attempting to add basic tc filter
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
linux (Ubuntu) | Fix Released | Medium | Fabian Grünbichler | |
Zesty | Fix Released | Medium | Fabian Grünbichler | |
Bug Description
== SRU Justification ==
Impact: adding a tc filter sometimes fails, potentially followed by kernel hangs and kernel NULL pointer dereference
Fix: proposed upstream by Wolfgang Bumiller [1,2]
Regression Potential: Since nobody else noticed this issue in 4.11 >= rc1 or Ubuntu 4.10 >= 15.17, and the fix only touches the broken code, the regression potential should be minimal ;)
1: http://
2: http://
---
Commit 1045ba77a which was backported for #1674087 in fc0cef7a8ec1e63
The full cover letter of the proposed fix by my colleague Wolfgang Bumiller follows:
Commit 1045ba77a ("net sched actions: Add support for user cookies")
added code to net/sched/
nlattr array unconditionally, while it was otherwise used as well as
initialized only when `name == NULL`:
if (name == NULL) {
err = nla_parse_
In the other case `nla` is instead passed over to ->init to be parsed
there (using a different set of TCA_ enum values, iow. TCA_ACT_COOKIE
then "clashes" with some other value). This led to the following three
example commands resulting in errors (sometimes followed by more traces
and hangups some time later (although the hangups happened seconds or
sometimes minutes later, sometimes not at all - results differed between
different kernel versions (linux git-master vs ubuntu's mainline 4.11
rc6 vs. pve 4.10.5 (based off ubuntu's zesty kernel where the commit is
cherry-
# ip link add ve0 type veth peer name ve0b
# tc qdisc add dev ve0 handle ffff: ingress
# tc filter add dev ve0 parent ffff: prio 50 basic police rate 1000bps burst 1000b drop
The 3rd command would sometimes succeed, sometimes error with:
RTNETLINK answers: Invalid argument
We have an error talking to the kernel
and sometimes error with:
RTNETLINK answers: Cannot allocate memory
We have an error talking to the kernel
In the latter case I assume `cklen` became negative, which passes the
TC_COOKIE_MAX_SIZE check since it is signed, but is then converted to a
huge unsigned size later in kmemdup() (see the crash dump below).
When the `tc filter add` command fails a backtrace shows up in dmesg,
added below.
I'm not sure why the TCA_ACT_COOKIE code was added to tcf_action_init_1
where it is now. It makes me think that it's supposed to be available
universally, but the `name == NULL` check for how nla is used or passed
to ->init() shows that there are various different TCA_* enums in
use at this point. Hence the 'RFC' part of the patches; I'm not that
familiar with the code yet.
Backtrace when running `tc filter add`:
Apr 12 11:31:38 testmachine kernel: ------------[ cut here ]------------
Apr 12 11:31:38 testmachine kernel: WARNING: CPU: 7 PID: 16596 at mm/page_
Apr 12 11:31:38 testmachine kernel: Modules linked in: act_police cls_basic sch_ingress veth nfsv3 nfs_acl nfs lockd grace ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_mac ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_multiport xt_conntrack nf_conntrack ip_set_hash_net ip_set arc4 md4 nls_utf8 cifs ccm fscache ipta
Apr 12 11:31:38 testmachine kernel: snd_hda_
Apr 12 11:31:38 testmachine kernel: CPU: 7 PID: 16596 Comm: tc Tainted: P O 4.10.5-1-pve #1
Apr 12 11:31:38 testmachine kernel: Hardware name: ASUS All Series/Z97-A, BIOS 2801 11/11/2015
Apr 12 11:31:38 testmachine kernel: Call Trace:
Apr 12 11:31:38 testmachine kernel: dump_stack+
Apr 12 11:31:38 testmachine kernel: __warn+0xcb/0xf0
Apr 12 11:31:38 testmachine kernel: warn_slowpath_
Apr 12 11:31:38 testmachine kernel: __alloc_
Apr 12 11:31:38 testmachine kernel: ? get_page_
Apr 12 11:31:38 testmachine kernel: ? schedule+0x36/0x80
Apr 12 11:31:38 testmachine kernel: ? schedule_
Apr 12 11:31:38 testmachine kernel: __alloc_
Apr 12 11:31:38 testmachine kernel: alloc_pages_
Apr 12 11:31:38 testmachine kernel: kmalloc_
Apr 12 11:31:38 testmachine kernel: kmalloc_
Apr 12 11:31:38 testmachine kernel: __kmalloc_
Apr 12 11:31:38 testmachine kernel: kmemdup+0x20/0x50
Apr 12 11:31:38 testmachine kernel: nla_memdup_
Apr 12 11:31:38 testmachine kernel: tcf_action_
Apr 12 11:31:38 testmachine kernel: tcf_exts_
Apr 12 11:31:38 testmachine kernel: basic_change+
Apr 12 11:31:38 testmachine kernel: tc_ctl_
Apr 12 11:31:38 testmachine kernel: rtnetlink_
Apr 12 11:31:38 testmachine kernel: ? __kmalloc_
Apr 12 11:31:38 testmachine kernel: ? __alloc_
Apr 12 11:31:38 testmachine kernel: ? rtnl_newlink+
Apr 12 11:31:38 testmachine kernel: netlink_
Apr 12 11:31:38 testmachine kernel: rtnetlink_
Apr 12 11:31:38 testmachine kernel: netlink_
Apr 12 11:31:38 testmachine kernel: netlink_
Apr 12 11:31:38 testmachine kernel: ? aa_sock_
Apr 12 11:31:38 testmachine kernel: sock_sendmsg+
Apr 12 11:31:38 testmachine kernel: ___sys_
Apr 12 11:31:38 testmachine kernel: ? schedule+0x36/0x80
Apr 12 11:31:38 testmachine kernel: ? ptrace_
Apr 12 11:31:38 testmachine kernel: ? ptrace_
Apr 12 11:31:38 testmachine kernel: __sys_sendmsg+
Apr 12 11:31:38 testmachine kernel: SyS_sendmsg+
Apr 12 11:31:38 testmachine kernel: do_syscall_
Apr 12 11:31:38 testmachine kernel: entry_SYSCALL64
Apr 12 11:31:38 testmachine kernel: RIP: 0033:0x7f0aef7d0a77
Apr 12 11:31:38 testmachine kernel: RSP: 002b:00007ffe88
Apr 12 11:31:38 testmachine kernel: RAX: ffffffffffffffda RBX: 0000000058edf3fc RCX: 00007f0aef7d0a77
Apr 12 11:31:38 testmachine kernel: RDX: 0000000000000000 RSI: 00007ffe886275b0 RDI: 0000000000000003
Apr 12 11:31:38 testmachine kernel: RBP: 00007ffe886275b0 R08: 0000000000000001 R09: 0000000000000050
Apr 12 11:31:38 testmachine kernel: R10: 00000000000005e9 R11: 0000000000000246 R12: 00007ffe886275f0
Apr 12 11:31:38 testmachine kernel: R13: 00005619ea31ee00 R14: 00007ffe8862f690 R15: 0000000000000000
Apr 12 11:31:38 testmachine kernel: ---[ end trace be009b606808485e ]---
Which would later on be followed by different kinds of hangups,
sometimes with more seemingly unrelated crash dumps such as:
Apr 12 11:38:50 testmachine kernel: general protection fault: 0000 [#1] SMP
Apr 12 11:38:50 testmachine kernel: Modules linked in: act_police cls_basic sch_ingress veth nfsv3 nfs_acl nfs lockd grace ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_mac ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_multiport xt_conntrack nf_conntrack ip_set_hash_net ip_set arc4 md4 nls_utf8 cifs ccm fscache ipta
Apr 12 11:38:50 testmachine kernel: snd_hda_
Apr 12 11:38:50 testmachine kernel: CPU: 7 PID: 4829 Comm: chromium Tainted: P W O 4.10.5-1-pve #1
Apr 12 11:38:50 testmachine kernel: Hardware name: ASUS All Series/Z97-A, BIOS 2801 11/11/2015
Apr 12 11:38:50 testmachine kernel: task: ffff93679b132d00 task.stack: ffffa479a0e00000
Apr 12 11:38:50 testmachine kernel: RIP: 0010:kmem_
Apr 12 11:38:50 testmachine kernel: RSP: 0018:ffffa479a0
Apr 12 11:38:50 testmachine kernel: RAX: 0000000000000000 RBX: 00000000014000c0 RCX: 0000000000005291
Apr 12 11:38:50 testmachine kernel: RDX: 0000000000005290 RSI: 00000000014000c0 RDI: 000000000001c5c0
Apr 12 11:38:50 testmachine kernel: RBP: ffffa479a0e03b00 R08: ffff9367bfbdc5c0 R09: ffff936724698580
Apr 12 11:38:50 testmachine kernel: R10: 0017ffffc0040038 R11: 0000000000000007 R12: 00000000014000c0
Apr 12 11:38:50 testmachine kernel: R13: ffff93679f003b80 R14: ffffffffc0b9090f R15: ffff93679f003b80
Apr 12 11:38:50 testmachine kernel: FS: 00007f5a069c404
Apr 12 11:38:50 testmachine kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 12 11:38:50 testmachine kernel: CR2: 00007f5a068de000 CR3: 00000007ccb8b000 CR4: 00000000001426e0
Apr 12 11:38:50 testmachine kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 12 11:38:50 testmachine kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Apr 12 11:38:50 testmachine kernel: Call Trace:
Apr 12 11:38:50 testmachine kernel: i915_gem_
Apr 12 11:38:50 testmachine kernel: ? kmem_cache_
Apr 12 11:38:50 testmachine kernel: ____i915_
Apr 12 11:38:50 testmachine kernel: __i915_
Apr 12 11:38:50 testmachine kernel: i915_gem_
Apr 12 11:38:50 testmachine kernel: i915_gem_
Apr 12 11:38:50 testmachine kernel: ? shmem_getpage_
Apr 12 11:38:50 testmachine kernel: i915_gem_
Apr 12 11:38:50 testmachine kernel: drm_ioctl+
Apr 12 11:38:50 testmachine kernel: ? i915_gem_
Apr 12 11:38:50 testmachine kernel: ? __seccomp_
Apr 12 11:38:50 testmachine kernel: do_vfs_
Apr 12 11:38:50 testmachine kernel: ? __secure_
Apr 12 11:38:50 testmachine kernel: ? syscall_
Apr 12 11:38:50 testmachine kernel: SyS_ioctl+0x79/0x90
Apr 12 11:38:50 testmachine kernel: do_syscall_
Apr 12 11:38:50 testmachine kernel: entry_SYSCALL64
Apr 12 11:38:50 testmachine kernel: RIP: 0033:0x7f59fba67ca7
Apr 12 11:38:50 testmachine kernel: RSP: 002b:00007ffd39
Apr 12 11:38:50 testmachine kernel: RAX: ffffffffffffffda RBX: 000024e398f52800 RCX: 00007f59fba67ca7
Apr 12 11:38:50 testmachine kernel: RDX: 00007ffd397788b0 RSI: 0000000040406469 RDI: 00000000000000a4
Apr 12 11:38:50 testmachine kernel: RBP: 00007ffd397788b0 R08: 0000000000000000 R09: 0000000000000000
Apr 12 11:38:50 testmachine kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000040406469
Apr 12 11:38:50 testmachine kernel: R13: 00000000000000a4 R14: 000024e399dd82c0 R15: 0000000000000070
Apr 12 11:38:50 testmachine kernel: Code: 08 65 4c 03 05 e7 de 9e 68 49 83 78 10 00 4d 8b 10 0f 84 e0 00 00 00 4d 85 d2 0f 84 d7 00 00 00 49 63 47 20 49 8b 3f 48 8d 4a 01 <49> 8b 1c 02 4c 89 d0 65 48 0f c7 0f 0f 94 c0 84 c0 74 bb 49 63
Apr 12 11:38:50 testmachine kernel: RIP: kmem_cache_
Apr 12 11:38:50 testmachine kernel: general protection fault: 0000 [#2] SMP
Apr 12 11:38:50 testmachine kernel: general protection fault: 0000 [#3] SMP
or:
Apr 12 09:19:35 testmachine kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000019c
Apr 12 09:19:35 testmachine kernel: IP: __free_
Apr 12 09:19:35 testmachine kernel: PGD 0
Apr 12 09:19:35 testmachine kernel:
Apr 12 09:19:35 testmachine kernel: Oops: 0002 [#1] SMP
Apr 12 09:19:35 testmachine kernel: Modules linked in: act_police cls_basic sch_ingress veth nfsv3 nfs_acl nfs lockd grace ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_mac ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_multiport xt_conntrack nf_conntrack ip_set_hash_net ip_set arc4 md4 nls_utf8 cifs ccm fscache ipta
Apr 12 09:19:35 testmachine kernel: aes_x86_64 crypto_simd glue_helper cryptd intel_cstate snd_hda_
Apr 12 09:19:35 testmachine kernel: CPU: 2 PID: 69 Comm: kworker/2:1 Tainted: P W O 4.10.5-1-pve #1
Apr 12 09:19:35 testmachine kernel: Hardware name: ASUS All Series/Z97-A, BIOS 2801 11/11/2015
Apr 12 09:19:35 testmachine kernel: Workqueue: events __i915_
Apr 12 09:19:35 testmachine kernel: task: ffff88885b134380 task.stack: ffffa7e243410000
Apr 12 09:19:35 testmachine kernel: RIP: 0010:__
Apr 12 09:19:35 testmachine kernel: RSP: 0018:ffffa7e243
Apr 12 09:19:35 testmachine kernel: RAX: 00000000000ffff8 RBX: ffff888762473460 RCX: ffff888762473470
Apr 12 09:19:35 testmachine kernel: RDX: ffff888762473460 RSI: 0000000000000014 RDI: 0000000000000180
Apr 12 09:19:35 testmachine kernel: RBP: ffffa7e243413d38 R08: 0000000000000000 R09: 0000000000000000
Apr 12 09:19:35 testmachine kernel: R10: ffff8887dd8c1080 R11: 0000000000000000 R12: ffff8887624738f0
Apr 12 09:19:35 testmachine kernel: R13: 00000000ffffffff R14: ffff8887dd8c0440 R15: 0000000000000000
Apr 12 09:19:35 testmachine kernel: FS: 000000000000000
Apr 12 09:19:35 testmachine kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 12 09:19:35 testmachine kernel: CR2: 000000000000019c CR3: 0000000476e09000 CR4: 00000000001426e0
Apr 12 09:19:35 testmachine kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 12 09:19:35 testmachine kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Apr 12 09:19:35 testmachine kernel: Call Trace:
Apr 12 09:19:35 testmachine kernel: ? internal_
Apr 12 09:19:35 testmachine kernel: i915_gem_
Apr 12 09:19:35 testmachine kernel: __i915_
Apr 12 09:19:35 testmachine kernel: ? dma_fence_
Apr 12 09:19:35 testmachine kernel: __i915_
Apr 12 09:19:35 testmachine kernel: __i915_
Apr 12 09:19:35 testmachine kernel: process_
Apr 12 09:19:35 testmachine kernel: worker_
Apr 12 09:19:35 testmachine kernel: kthread+0x101/0x140
Apr 12 09:19:35 testmachine kernel: ? process_
Apr 12 09:19:35 testmachine kernel: ? kthread_
Apr 12 09:19:35 testmachine kernel: ret_from_
Apr 12 09:19:35 testmachine kernel: Code: ff 41 b8 05 00 00 00 31 c9 4c 89 ea 4c 89 fe e8 a2 e0 ff ff e9 1e ff ff ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <f0> ff 4f 1c 75 0e 55 85 f6 48 89 e5 74 08 e8 48 e4 ff ff 5d f3
Apr 12 09:19:35 testmachine kernel: RIP: __free_
Apr 12 09:19:35 testmachine kernel: CR2: 000000000000019c
Apr 12 09:19:35 testmachine kernel: ---[ end trace 89cb022ec57f7bd1 ]---
Changed in linux (Ubuntu): | |
status: | Incomplete → Confirmed |
Changed in linux (Ubuntu): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Zesty): | |
status: | Confirmed → In Progress |
assignee: | nobody → Fabian Grünbichler (f-gruenbichler) |
tags: | added: kernel-da-key zesty |
tags: | removed: kernel-da-key |
Changed in linux (Ubuntu): | |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Zesty): | |
status: | In Progress → Fix Committed |
tags: | added: verification-done-zesty removed: verification-needed-zesty |
Changed in linux (Ubuntu Zesty): | |
status: | Fix Committed → Fix Released |
SRU request sent to kernel-team list.