We are seeing frequent kernel backtraces like the one below with Contrail running on 2.01-41 and kernel version 3.13. The only way to recover is to reboot the hypervisor. We need to find the root cause, as this is seriously affecting our uptime.
Following is the backtrace:
2015-04-25T12:28:05.944331+00:00 b0c010ash2018 kernel: [535885.190684] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
2015-04-25T12:28:05.944348+00:00 b0c010ash2018 kernel: [535885.215308] IP: [<ffffffff811b0657>] kmem_cache_alloc+0x77/0x1f0
2015-04-25T12:28:05.944349+00:00 b0c010ash2018 kernel: [535885.230779] PGD 12df28d067 PUD 12d69d0067 PMD 0
2015-04-25T12:28:05.944350+00:00 b0c010ash2018 kernel: [535885.247356] Oops: 0000 [#1] SMP
2015-04-25T12:28:05.944351+00:00 b0c010ash2018 kernel: [535885.264374] Modules linked in: veth vhost_net macvtap macvlan vhost xt_conntrack iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 8021q mrp garp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables nbd vrouter(OX) ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm dm_multipath crct10dif_pclmul crc32_pclmul dcdbas ghash_clmulni_intel scsi_dh xfs gpio_ich joydev ipmi_devintf aesni_intel msr ablk_helper lp mac_hid cryptd sb_edac edac_core lrw gf128mul parport mei_me wmi ioatdma lpc_ich mei ipmi_si glue_helper libcrc32c shpchp acpi_power_meter aes_x86_64 hid_generic usbhid hid ixgbe igb megaraid_sas dca i2c_algo_bit ptp pps_core mdio
2015-04-25T12:28:05.944354+00:00 b0c010ash2018 kernel: [535885.575624] CPU: 28 PID: 45796 Comm: openstack-statu Tainted: G OX 3.13.0-49-generic #81~precise1-Ubuntu
2015-04-25T12:28:05.944355+00:00 b0c010ash2018 kernel: [535885.654294] Hardware name: Dell Inc. PowerEdge R720xd/0X3D66, BIOS 2.4.3 07/09/2014
2015-04-25T12:28:05.944356+00:00 b0c010ash2018 kernel: [535885.737038] task: ffff8812f2ce0000 ti: ffff8812c70fe000 task.ti: ffff8812c70fe000
2015-04-25T12:28:05.944356+00:00 b0c010ash2018 kernel: [535885.826571] RIP: 0010:[<ffffffff811b0657>] [<ffffffff811b0657>] kmem_cache_alloc+0x77/0x1f0
2015-04-25T12:28:05.944357+00:00 b0c010ash2018 kernel: [535885.923753] RSP: 0018:ffff8812c70ffd90 EFLAGS: 00010282
2015-04-25T12:28:05.944358+00:00 b0c010ash2018 kernel: [535885.974387] RAX: 0000000000000000 RBX: 0000000001200011 RCX: 000000000003e8bc
2015-04-25T12:28:05.944359+00:00 b0c010ash2018 kernel: [535886.078298] RDX: 000000000003e8bb RSI: 00000000000000d0 RDI: 00000000000162a0
2015-04-25T12:28:05.944360+00:00 b0c010ash2018 kernel: [535886.188858] RBP: ffff8812c70ffde0 R08: ffff88181fbd62a0 R09: ffffffff8108be94
2015-04-25T12:28:05.944363+00:00 b0c010ash2018 kernel: [535886.302853] R10: ffff88187fffbf00 R11: 00007fff8841b000 R12: 0000000000000001
2015-04-25T12:28:05.944365+00:00 b0c010ash2018 kernel: [535886.418492] R13: ffff88181f403900 R14: ffff88181f403900 R15: 00000000000000d0
2015-04-25T12:28:05.944387+00:00 b0c010ash2018 kernel: [535886.536915] FS: 00007fbc18506700(0000) GS:ffff88181fbc0000(0000) knlGS:0000000000000000
2015-04-25T12:28:05.944388+00:00 b0c010ash2018 kernel: [535886.656231] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2015-04-25T12:28:05.944393+00:00 b0c010ash2018 kernel: [535886.716245] CR2: 0000000000000001 CR3: 00000012d6b18000 CR4: 00000000001427e0
2015-04-25T12:28:05.944394+00:00 b0c010ash2018 kernel: [535886.833304] Stack:
2015-04-25T12:28:05.944395+00:00 b0c010ash2018 kernel: [535886.889899] ffffffff8108be94 ffff8817ef6a0aa8 ffff8817f7189880 0000000000000000
2015-04-25T12:28:05.944395+00:00 b0c010ash2018 kernel: [535887.004037] ffff8812c70ffde0 0000000001200011 0000000000000000 ffffffff81c452a0
2015-04-25T12:28:05.944396+00:00 b0c010ash2018 kernel: [535887.117912] 0000000000000000 0000000000000000 ffff8812c70ffe10 ffffffff8108be94
2015-04-25T12:28:05.944397+00:00 b0c010ash2018 kernel: [535887.232613] Call Trace:
2015-04-25T12:28:05.944397+00:00 b0c010ash2018 kernel: [535887.288242] [<ffffffff8108be94>] ? alloc_pid+0x24/0x2e0
2015-04-25T12:28:05.944400+00:00 b0c010ash2018 kernel: [535887.344143] [<ffffffff8108be94>] alloc_pid+0x24/0x2e0
2015-04-25T12:28:05.944401+00:00 b0c010ash2018 kernel: [535887.398737] [<ffffffff8106998a>] copy_process.part.27+0x8ca/0xf50
2015-04-25T12:28:05.944401+00:00 b0c010ash2018 kernel: [535887.452662] [<ffffffff8110331b>] ? audit_filter_rules.isra.7+0x55b/0xad0
2015-04-25T12:28:05.944402+00:00 b0c010ash2018 kernel: [535887.506223] [<ffffffff8106a090>] copy_process+0x80/0x90
2015-04-25T12:28:05.944402+00:00 b0c010ash2018 kernel: [535887.558678] [<ffffffff8106a1d2>] do_fork+0x62/0x280
2015-04-25T12:28:05.944403+00:00 b0c010ash2018 kernel: [535887.610125] [<ffffffff81103924>] ? audit_filter_syscall+0x94/0xe0
2015-04-25T12:28:05.944412+00:00 b0c010ash2018 kernel: [535887.661481] [<ffffffff8106a476>] SyS_clone+0x16/0x20
2015-04-25T12:28:05.944413+00:00 b0c010ash2018 kernel: [535887.711599] [<ffffffff8176ead9>] stub_clone+0x69/0x90
2015-04-25T12:28:05.944414+00:00 b0c010ash2018 kernel: [535887.760582] [<ffffffff8176e77d>] ? system_call_fastpath+0x1a/0x1f
2015-04-25T12:28:05.944414+00:00 b0c010ash2018 kernel: [535887.808776] Code: 00 00 49 8b 50 08 4d 8b 20 49 8b 40 10 4d 85 e4 0f 84 5e 01 00 00 48 85 c0 0f 84 55 01 00 00 49 63 45 20 49 8b 7d 00 48 8d 4a 01 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 74 b7 49 63
2015-04-25T12:28:05.944415+00:00 b0c010ash2018 kernel: [535887.954161] RIP [<ffffffff811b0657>] kmem_cache_alloc+0x77/0x1f0
2015-04-25T12:28:05.944416+00:00 b0c010ash2018 kernel: [535888.001499] RSP <ffff8812c70ffd90>
2015-04-25T12:28:05.944416+00:00 b0c010ash2018 kernel: [535888.047379] CR2: 0000000000000001
2015-04-25T12:28:05.944419+00:00 b0c010ash2018 kernel: [535888.157161] ---[ end trace 8e74a782f5824da3 ]---
2015-04-25T12:28:06.040139+00:00 b0c010ash2018 kernel: [535888.207652] [sched_delayed] sched: RT throttling activated
Hi,
Can you please check whether there was a steady memory leak before this crash happened? It is difficult to tell what went wrong without a crash dump.
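One way to check for a steady leak is to sample /proc/meminfo at intervals on an affected hypervisor and see whether a field keeps climbing. The following is only a minimal sketch, not a definitive diagnostic. The helper names (parse_meminfo, is_steadily_growing, sample_field) are hypothetical, and watching SUnreclaim is an assumption based on the fault being inside kmem_cache_alloc; other fields may turn out to be more relevant.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:  <number> [kB]" lines.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def is_steadily_growing(samples, min_growth=0):
    """Return True if the samples never decrease and grow by more than min_growth overall."""
    if len(samples) < 2:
        return False
    non_decreasing = all(b >= a for a, b in zip(samples, samples[1:]))
    return non_decreasing and (samples[-1] - samples[0]) > min_growth


def sample_field(field="SUnreclaim", count=5, interval_s=60, path="/proc/meminfo"):
    """Read one meminfo field `count` times, sleeping `interval_s` seconds between reads."""
    samples = []
    for i in range(count):
        with open(path) as f:
            samples.append(parse_meminfo(f.read()).get(field, 0))
        if i < count - 1:
            time.sleep(interval_s)
    return samples
```

Calling is_steadily_growing(sample_field()) takes five readings one minute apart. A real leak usually shows up over hours, so a longer interval and more samples (or logging the values to a file) would give a more trustworthy picture.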
Thanks,