Kernel panics on Xenial when using cgroups and strict CFS limits
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | High | Daniel Axtens |
Xenial | Fix Released | High | Unassigned |
Bug Description
SRU Justification
-----------------
[Impact]
Apache Mesos and Kubernetes workloads on Xenial cause a panic
(NULL pointer dereference) in the Completely Fair Scheduler (CFS).
These panics are in pick_next_entity and include pick_next_task_fair
in the call stack.
[Fix]
Cherry-picking both
754bd598be9bbc9
(http://
and
094f469172e00d6
(http://
fixes the crash.
They appear to be intended as a series - they were posted to LKML at
the same time.
[Testcase]
The fix has been validated by the user who reported the bug.
Bug description
---------------
We see a number of kernel panics on servers running Apache Mesos using cgroups with small (0.1-0.2) CPU limits.
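For context, a fractional CPU limit like these is typically enforced through CFS bandwidth control, where the cgroup receives a runtime quota per enforcement period (cpu.cfs_quota_us and cpu.cfs_period_us in cgroup v1). The sketch below shows only the arithmetic. The 100 ms period is the kernel default and is an assumption here, since the report does not say which period these hosts used; the function name is hypothetical.

```python
# Assumption: the kernel's default CFS period of 100 ms (100000 us).
DEFAULT_CFS_PERIOD_US = 100_000


def cfs_quota_us(cpu_limit: float, period_us: int = DEFAULT_CFS_PERIOD_US) -> int:
    """Runtime (in microseconds) a cgroup may use per period for a CPU limit."""
    if cpu_limit <= 0:
        raise ValueError("cpu_limit must be positive")
    # round() avoids off-by-one results from binary floating point.
    return int(round(cpu_limit * period_us))


# The limits mentioned in this report.
for limit in (0.1, 0.2):
    print(f"{limit} CPU -> {cfs_quota_us(limit)} us per {DEFAULT_CFS_PERIOD_US} us period")
```

Under that assumption, a 0.1 CPU limit leaves a task group 10 ms of runtime per 100 ms, so groups with limits this small would be throttled and unthrottled very frequently.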
These all appear as NULL pointer dereferences in and around pick_next_entity and pick_next_
[24334.493331] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[24334.501611] IP: [<ffffffff810b2
[24334.507868] PGD 3eacfa067 PUD 3eacfb067 PMD 0
[24334.512806] Oops: 0000 [#1] SMP
[24334.516420] Modules linked in: ipvlan xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_
[24334.576359] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu
[24334.584748] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[24334.594188] task: ffff8803ee671c00 ti: ffff8803ee67c000 task.ti: ffff8803ee67c000
[24334.601799] RIP: 0010:[<
[24334.610490] RSP: 0018:ffff8803ee
[24334.615924] RAX: ffff8803ebed4c00 RBX: ffff880036529800 RCX: 0000000000000000
[24334.623190] RDX: 000000000225341f RSI: 0000000000000000 RDI: 0000000000000000
[24334.630479] RBP: ffff8803ee67fe00 R08: 0000000000000004 R09: 0000000000000000
[24334.637758] R10: ffff8803e7ed7600 R11: 0000000000000001 R12: 0000000000000000
[24334.645153] R13: 0000000000000000 R14: 00000009067729c4 R15: ffff8803ee672178
[24334.652512] FS: 000000000000000
[24334.660721] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[24334.666587] CR2: 0000000000000050 CR3: 00000003eacf9000 CR4: 00000000001406e0
[24334.673851] Stack:
[24334.675980] ffff8803ffd16e00 ffff8803ffd16e00 ffff8803e855a200 ffff880036529800
[24334.683995] 0000000000000002 ffff8803ee67fe68 ffffffff810b98a6 ffff8803ffd16e70
[24334.692024] 0000000000016e00 ffff8803e7ed7600 ffff8803ee671c00 0000000000000000
[24334.700172] Call Trace:
[24334.702750] [<ffffffff810b9
[24334.708886] [<ffffffff81804
[24334.714349] [<ffffffff81804
[24334.719445] [<ffffffff81804
[24334.725962] [<ffffffff810bf
[24334.732012] [<ffffffff8104f
[24334.737895] Code: 8b 70 50 4d 2b 74 24 50 4d 85 f6 7e 59 4c 89 e7 e8 67 ff ff ff 49 39 c6 7f 04 4c 8b 6b 48 48 8b 43 40 48 85 c0 74 1f 4c 8b 70 50 <4d> 2b 74 24 50 4d 85 f6 7e 2c 4c 89 e7 e8 3f ff ff ff 49 39 c6
[24334.765124] RIP [<ffffffff810b2
[24334.771473] RSP <ffff8803ee67fdd8>
[24334.775077] CR2: 0000000000000050
[24334.779121] ---[ end trace 05d941efb97b7bae ]---
and
[155852.028575] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[155852.036931] IP: [<ffffffff810b2
[155852.043491] PGD 3ebae8067 PUD 3ebae9067 PMD 0
[155852.048550] Oops: 0000 [#1] SMP
[155852.052437] Modules linked in: ipvlan veth xt_nat xt_tcpudp ipt_MASQUERADE nf_nat_
[155852.109847] CPU: 1 PID: 2215 Comm: ruby Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu
[155852.118233] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[155852.127661] task: ffff8803ed29aa00 ti: ffff8800bbb10000 task.ti: ffff8800bbb10000
[155852.135347] RIP: 0010:[<
[155852.144120] RSP: 0018:ffff8800bb
[155852.149631] RAX: ffff8801725b5c00 RBX: ffff8800bb777600 RCX: ffff8800bb777400
[155852.156970] RDX: ffff8803ffc96e70 RSI: 0000000000000000 RDI: 0000000000000000
[155852.164384] RBP: ffff8800bbb13d08 R08: ffff8803eb92e800 R09: ffff8803ed29aa00
[155852.171718] R10: 0000000000000001 R11: 00000000000003cb R12: 0000000000000000
[155852.179052] R13: 0000000000000000 R14: 000009ad6846ff10 R15: 0000000000000001
[155852.186387] FS: 00007f387d1c970
[155852.194677] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[155852.200626] CR2: 0000000000000050 CR3: 00000003eb706000 CR4: 00000000001406e0
[155852.207967] Stack:
[155852.210180] ffffffff810369c9 ffff8803ffc96e00 ffff8800bb777600 0000000000000000
[155852.218278] 00000000000012a4 ffff8800bbb13d70 ffffffff810b9b65 ffff8803ffc96e70
[155852.226402] 0000000000016e00 00008dbf20ccb260 ffff8803ed29aa00 0000000000000001
[155852.234506] Call Trace:
[155852.237156] [<ffffffff81036
[155852.242673] [<ffffffff810b9
[155852.248968] [<ffffffff81803
[155852.254491] [<ffffffff81804
[155852.259667] [<ffffffff81807
[155852.266838] [<ffffffff810e9
[155852.272712] [<ffffffff81807
[155852.280052] [<ffffffff81807
[155852.288558] [<ffffffff81247
[155852.293817] [<ffffffff810a8
[155852.299271] [<ffffffff81248
[155852.304967] [<ffffffff81807
[155852.311618] Code: 8b 70 50 4d 2b 74 24 50 4d 85 f6 7e 59 4c 89 e7 e8 67 ff ff ff 49 39 c6 7f 04 4c 8b 6b 48 48 8b 43 40 48 85 c0 74 1f 4c 8b 70 50 <4d> 2b 74 24 50 4d 85 f6 7e 2c 4c 89 e7 e8 3f ff ff ff 49 39 c6
[155852.338852] RIP [<ffffffff810b2
[155852.345270] RSP <ffff8800bbb13ce0>
[155852.348958] CR2: 0000000000000050
[155852.353086] ---[ end trace 8ce693b2314611c4 ]---
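Both traces fault at the same address. A minimal sketch for checking this across collected logs follows; the sample lines are copied from the traces above, while the regular expressions and the function name are illustrative assumptions rather than part of any existing tool.

```python
import re

# Patterns for the two places an oops reports the faulting address.
BUG_RE = re.compile(r"NULL pointer dereference at ([0-9a-f]+)")
CR2_RE = re.compile(r"\bCR2: ([0-9a-f]+)")


def fault_addresses(log_text):
    """Return (bug_addrs, cr2_addrs) as lists of integers."""
    bug = [int(m, 16) for m in BUG_RE.findall(log_text)]
    cr2 = [int(m, 16) for m in CR2_RE.findall(log_text)]
    return bug, cr2


SAMPLE = """\
[24334.493331] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[24334.666587] CR2: 0000000000000050 CR3: 00000003eacf9000 CR4: 00000000001406e0
[155852.028575] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[155852.200626] CR2: 0000000000000050 CR3: 00000003eb706000 CR4: 00000000001406e0
"""

bug_addrs, cr2_addrs = fault_addresses(SAMPLE)
print([hex(a) for a in bug_addrs], [hex(a) for a in cr2_addrs])
```

An identical small offset in both crashes is consistent with a structure field being read through a NULL pointer, rather than with random memory corruption.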
Similar issues have been reported in the community for kernels based on 4.4: https:/
These panics occur in the CFS code when a next buddy is set on an entity that is not on a run-queue. This causes pick_next_entity to end up with curr == left == NULL, which means it will call into wakeup_
This was confirmed by placing a WARN_ON_ONCE in set_next_buddy to catch when a sched_entity in the hierarchy was not on_rq, as per https:/
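The failure mode described above can be illustrated with a toy model. This is a simplified sketch and not the kernel code: the classes, the granularity constant, and the function names are hypothetical, a list stands in for the run-queue's tree, and Python's AttributeError on None stands in for the NULL pointer dereference. The checked setter models the on_rq test used for the diagnostic; it is not the upstream fix.

```python
from dataclasses import dataclass, field
from typing import List, Optional

GRANULARITY = 1_000_000  # hypothetical wakeup granularity, in vruntime units


@dataclass
class Entity:
    name: str
    vruntime: int
    on_rq: bool = False


@dataclass
class RunQueue:
    queued: List[Entity] = field(default_factory=list)
    curr: Optional[Entity] = None
    next_buddy: Optional[Entity] = None


def leftmost(rq: RunQueue) -> Optional[Entity]:
    """Queued entity with the smallest vruntime, or None if nothing is queued."""
    return min(rq.queued, key=lambda e: e.vruntime, default=None)


def pick_next(rq: RunQueue) -> Optional[Entity]:
    left = leftmost(rq)
    if left is None or (rq.curr is not None and rq.curr.vruntime < left.vruntime):
        left = rq.curr
    chosen = left
    if rq.next_buddy is not None:
        # With nothing queued and no current entity, left is None here, so
        # reading left.vruntime fails: the analogue of the NULL dereference.
        if rq.next_buddy.vruntime - left.vruntime <= GRANULARITY:
            chosen = rq.next_buddy
    return chosen


def set_next_buddy_checked(rq: RunQueue, se: Entity) -> bool:
    """Refuse to record a buddy that is not on a run-queue."""
    if not se.on_rq:
        return False
    rq.next_buddy = se
    return True


stale = Entity("stale", vruntime=100, on_rq=False)

# A stale buddy on an otherwise empty run-queue reproduces the failure.
broken = RunQueue(next_buddy=stale)
try:
    pick_next(broken)
    crashed = False
except AttributeError:
    crashed = True

# With the check in place the stale buddy is never recorded.
guarded = RunQueue()
accepted = set_next_buddy_checked(guarded, stale)
print(crashed, accepted, pick_next(guarded))
```

In the model, the crash needs two conditions together: a buddy pointer left behind, and a run-queue with neither a current nor a queued entity.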
The stack-trace for the WARN is quite involved:
Apr 25 14:14:48 (none) kernel: [ 5339.764597] ------------[ cut here ]------------
Apr 25 14:14:48 (none) kernel: [ 5339.764606] WARNING: CPU: 1 PID: 13121 at /build/
Apr 25 14:14:48 (none) kernel: [ 5339.764608] Modules linked in: xt_nat xt_tcpudp ipvlan ipt_MASQUERADE nf_nat_
Apr 25 14:14:48 (none) kernel: [ 5339.764644] CPU: 1 PID: 13121 Comm: executor Not tainted 4.4.0-72-generic #93+hf135461v20
Apr 25 14:14:48 (none) kernel: [ 5339.764646] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Apr 25 14:14:48 (none) kernel: [ 5339.764647] 0000000000000086 00000000d5fbe9e0 ffff8803ed947608 ffffffff813f83c3
Apr 25 14:14:48 (none) kernel: [ 5339.764650] 0000000000000000 ffffffff81cbae20 ffff8803ed947640 ffffffff81081302
Apr 25 14:14:48 (none) kernel: [ 5339.764652] ffff8800bb5fc800 ffff8803e7c9f000 0000000000000008 ffff8800ba1bd400
Apr 25 14:14:48 (none) kernel: [ 5339.764655] Call Trace:
Apr 25 14:14:48 (none) kernel: [ 5339.764665] [<ffffffff813f8
Apr 25 14:14:48 (none) kernel: [ 5339.764669] [<ffffffff81081
Apr 25 14:14:48 (none) kernel: [ 5339.764672] [<ffffffff81081
Apr 25 14:14:48 (none) kernel: [ 5339.764674] [<ffffffff810b5
Apr 25 14:14:48 (none) kernel: [ 5339.764676] [<ffffffff810b5
Apr 25 14:14:48 (none) kernel: [ 5339.764679] [<ffffffff810ab
Apr 25 14:14:48 (none) kernel: [ 5339.764682] [<ffffffff810b4
Apr 25 14:14:48 (none) kernel: [ 5339.764685] [<ffffffff810be
Apr 25 14:14:48 (none) kernel: [ 5339.764688] [<ffffffff810be
Apr 25 14:14:48 (none) kernel: [ 5339.764692] [<ffffffff81837
Apr 25 14:14:48 (none) kernel: [ 5339.764694] [<ffffffff81838
Apr 25 14:14:48 (none) kernel: [ 5339.764697] [<ffffffff8183b
Apr 25 14:14:48 (none) kernel: [ 5339.764700] [<ffffffff810ef
Apr 25 14:14:48 (none) kernel: [ 5339.764703] [<ffffffff8183b
Apr 25 14:14:48 (none) kernel: [ 5339.764705] [<ffffffff8183b
Apr 25 14:14:48 (none) kernel: [ 5339.764709] [<ffffffff81223
Apr 25 14:14:48 (none) kernel: [ 5339.764711] [<ffffffff81224
Apr 25 14:14:48 (none) kernel: [ 5339.764715] [<ffffffff811fb
Apr 25 14:14:48 (none) kernel: [ 5339.764718] [<ffffffff811fd
Apr 25 14:14:48 (none) kernel: [ 5339.764720] [<ffffffff810b5
Apr 25 14:14:48 (none) kernel: [ 5339.764722] [<ffffffff810b5
Apr 25 14:14:48 (none) kernel: [ 5339.764724] [<ffffffff810b7
Apr 25 14:14:48 (none) kernel: [ 5339.764729] [<ffffffff81071
Apr 25 14:14:48 (none) kernel: [ 5339.764731] [<ffffffff810ba
Apr 25 14:14:48 (none) kernel: [ 5339.764736] [<ffffffff8102d
Apr 25 14:14:48 (none) kernel: [ 5339.764740] [<ffffffff81401
Apr 25 14:14:48 (none) kernel: [ 5339.764742] [<ffffffff810ef
Apr 25 14:14:48 (none) kernel: [ 5339.764744] [<ffffffff810ef
Apr 25 14:14:48 (none) kernel: [ 5339.764746] [<ffffffff810ef
Apr 25 14:14:48 (none) kernel: [ 5339.764751] [<ffffffff81101
Apr 25 14:14:48 (none) kernel: [ 5339.764753] [<ffffffff810ab
Apr 25 14:14:48 (none) kernel: [ 5339.764756] [<ffffffff81224
Apr 25 14:14:48 (none) kernel: [ 5339.764758] [<ffffffff810ef
Apr 25 14:14:48 (none) kernel: [ 5339.764762] [<ffffffff81128
Apr 25 14:14:48 (none) kernel: [ 5339.764764] [<ffffffff81103
Apr 25 14:14:48 (none) kernel: [ 5339.764768] [<ffffffff81064
Apr 25 14:14:48 (none) kernel: [ 5339.764772] [<ffffffff810f5
Apr 25 14:14:48 (none) kernel: [ 5339.764774] [<ffffffff81224
Apr 25 14:14:48 (none) kernel: [ 5339.764777] [<ffffffff8183c
Apr 25 14:14:48 (none) kernel: [ 5339.764779] ---[ end trace ace97b626b47e1f9 ]---
tags: | added: kernel-da-key |
Changed in linux (Ubuntu): | |
importance: | Undecided → High |
Changed in linux (Ubuntu Xenial): | |
status: | New → Triaged |
importance: | Undecided → High |
Changed in linux (Ubuntu): | |
status: | Confirmed → Triaged |
description: | updated |
Changed in linux (Ubuntu Xenial): | |
status: | Triaged → Fix Committed |
Changed in linux (Ubuntu): | |
status: | Triaged → Fix Released |
Hi,
I didn't realise this had hit -proposed; I am in the process of verifying the fix and will let you know ASAP.
Regards,
Daniel