kernel BUG - handle_mm_fault - Ubuntu 14.04 kernel 3.13.0-29-generic
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-lts-trusty (Ubuntu) | New | Undecided | Unassigned |
Bug Description
Here's the log:
Jun 12 15:42:42 node73 kernel: [17196.908781] ------------[ cut here ]------------
Jun 12 15:42:42 node73 kernel: [17196.909789] kernel BUG at /build/
Jun 12 15:42:42 node73 kernel: [17196.911210] invalid opcode: 0000 [#1] SMP
Jun 12 15:42:42 node73 kernel: [17196.912130] Modules linked in: nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache gpio_ich intel_rapl x86_pkg_
Jun 12 15:42:42 node73 kernel: [17196.924647] CPU: 5 PID: 25935 Comm: java Not tainted 3.13.0-29-generic #53-Ubuntu
Jun 12 15:42:42 node73 kernel: [17196.926280] Hardware name: Supermicro X9DRF
Jun 12 15:42:42 node73 kernel: [17196.928566] task: ffff880c4a795fc0 ti: ffff880ce7d96000 task.ti: ffff880ce7d96000
Jun 12 15:42:42 node73 kernel: [17196.930200] RIP: 0010:
Jun 12 15:42:42 node73 kernel: [17196.932066] RSP: 0018:
Jun 12 15:42:42 node73 kernel: [17196.933217] RAX: 0000000000000100 RBX: 000000078dd
Jun 12 15:42:42 node73 kernel: [17196.934773] RDX: ffff880c4a795fc0 RSI: 00000000000
Jun 12 15:42:42 node73 kernel: [17196.936328] RBP: ffff880ce7d97e20 R08: 00000000000
Jun 12 15:42:42 node73 kernel: [17196.937884] R10: 0000000000000001 R11: 00000000000
Jun 12 15:42:42 node73 kernel: [17196.939440] R13: ffff881e0c4d3d40 R14: ffff8810251
Jun 12 15:42:42 node73 kernel: [17196.940996] FS: 00007f252934070
Jun 12 15:42:42 node73 kernel: [17196.979078] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 12 15:42:42 node73 kernel: [17197.017222] CR2: 0000000718184000 CR3: 0000001021a
Jun 12 15:42:42 node73 kernel: [17197.056416] Stack:
Jun 12 15:42:42 node73 kernel: [17197.094614] 000000000000000
Jun 12 15:42:42 node73 kernel: [17197.171848] ffffffff810d7b5
Jun 12 15:42:42 node73 kernel: [17197.249793] ffffffff810d996
Jun 12 15:42:42 node73 kernel: [17197.327660] Call Trace:
Jun 12 15:42:42 node73 kernel: [17197.365233] [<ffffffff8109a
Jun 12 15:42:42 node73 kernel: [17197.403036] [<ffffffff810d7
Jun 12 15:42:42 node73 kernel: [17197.439822] [<ffffffff810d9
Jun 12 15:42:42 node73 kernel: [17197.475937] [<ffffffff81726
Jun 12 15:42:42 node73 kernel: [17197.511226] [<ffffffff81111
Jun 12 15:42:42 node73 kernel: [17197.546109] [<ffffffff8109d
Jun 12 15:42:42 node73 kernel: [17197.580167] [<ffffffff8109d
Jun 12 15:42:42 node73 kernel: [17197.613381] [<ffffffff81726
Jun 12 15:42:42 node73 kernel: [17197.645771] [<ffffffff81722
Jun 12 15:42:42 node73 kernel: [17197.677251] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 c0 3c a6 81 44 89 4d c8 e8 48 e2
Jun 12 15:42:42 node73 kernel: [17197.772738] RIP [<ffffffff81179
Jun 12 15:42:42 node73 kernel: [17197.804166] RSP <ffff880ce7d97d
Jun 12 17:15:21 node73 kernel: [22748.792239] ------------[ cut here ]------------
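The "invalid opcode" line is expected for a kernel BUG on x86: BUG() is implemented with the UD2 instruction (bytes 0f 0b), and the oops "Code:" line marks the byte at RIP with angle brackets, here <0f> 0b. The sketch below (the helper names faulting_bytes and is_ud2 are hypothetical, not part of any kernel tool) pulls the marked bytes out of a "Code:" line and checks for UD2; for a full disassembly, the kernel tree's scripts/decodecode is the usual tool.

```python
import re
from typing import List

# Matches the byte the kernel wraps in angle brackets, e.g. "<0f>".
_MARKED = re.compile(r"<([0-9a-fA-F]{2})>")


def faulting_bytes(code_line: str, count: int = 2) -> List[int]:
    """Return `count` bytes starting at the <..>-marked byte of an oops "Code:" line."""
    tokens = code_line.split("Code:", 1)[1].split()
    for i, tok in enumerate(tokens):
        m = _MARKED.fullmatch(tok)
        if m:
            # The marked byte, followed by the next count-1 plain hex bytes.
            rest = [int(t, 16) for t in tokens[i + 1:i + count]]
            return [int(m.group(1), 16)] + rest
    raise ValueError("no <..>-marked byte found in Code: line")


def is_ud2(code_line: str) -> bool:
    """True if the instruction at RIP is UD2 (0F 0B), which x86 BUG() emits."""
    return faulting_bytes(code_line) == [0x0F, 0x0B]


if __name__ == "__main__":
    CODE_LINE = (
        "Jun 12 15:42:42 node73 kernel: [17197.677251] Code: ff 48 89 d9 4c 89 e2 "
        "4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 "
        "8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 c0 "
        "3c a6 81 44 89 4d c8 e8 48 e2"
    )
    print(is_ud2(CODE_LINE))  # prints True
```

This only confirms that the oops is a deliberate BUG() assertion rather than corrupted code; it does not identify which check in handle_mm_fault fired, which needs the (truncated) "kernel BUG at" path and line.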
Please see my mail here:
https:/
And the response here (cc included @canonical.com):
https:/
That response linked to the following (which has a patch that is said to fix this):
https:/
I applied that patch and built a kernel; it is now in testing on 2 of the 3 machines that have this problem. We run Ubuntu 14.04 on 73 single-socket machines, one of which has this problem, and on 3 dual-socket machines, two of which have this problem.
Problem machines:
- single-socket Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz, Supermicro X9DR3-F
- dual-socket Intel(R) Xeon(R) CPU E5520 @ 2.27GHz, Dell PowerEdge R710
- dual-socket Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, Supermicro X9DRFF-
(Strangely enough, the dual-socket machine without the problem is also a PowerEdge R710. Perhaps it is simply not as heavily loaded as the other one; running prime95 on it for a few hours does not trigger the bug either.)