kernel BUG at skbuff.h:1486 Insufficient linear data in skb __skb_pull.part.7+0x4/0x6 [openvswitch]
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | High | Unassigned |
Trusty | Fix Released | High | Andrew Crawford |
Bug Description
Since 2016-12-30 EST we have been experiencing repeated crashes of our OpenStack Icehouse / Trusty Neutron node with a kernel BUG at skbuff.h line 1486:
1471 /**
1472 * skb_peek - peek at the head of an &sk_buff_head
1473 * @list_: list to peek at
1474 *
1475 * Peek an &sk_buff. Unlike most other operations you _MUST_
1476 * be careful with this one. A peek leaves the buffer on the
1477 * list and someone else may run off with it. You must hold
1478 * the appropriate locks or have a private queue to do this.
1479 *
1480 * Returns %NULL for an empty list or a pointer to the head element.
1481 * The reference count is not incremented and the reference is therefore
1482 * volatile. Use with caution.
1483 */
1484 static inline struct sk_buff *skb_peek(const struct sk_buff_head *list_)
1485 {
1486 struct sk_buff *skb = list_->next;
1487
1488 if (skb == (struct sk_buff *)list_)
1489 skb = NULL;
1490 return skb;
1491 }
This generally results in a full panic of the Neutron node and a loss of connectivity for VMs within the cloud. However, after using crash-dumptools to collect information on the crashes over the past three days, we found that in about 2 out of 3 crash instances the kernel loaded by kexec during the crashdump appears to continue running. In those cases we see a flap of the Neutron services instead of a full panic that brings the Neutron server down and necessitates a hard reboot.
I believe that this is a manifestation of the openvswitch issue described on 2017-01-08 as:
"OVS can only process L2 packets. But OVS GRE receive handler
can accept IP-GRE packets. When such packet is processed by
OVS datapath it can trigger following assert failure due
to insufficient linear data in skb."
https:/
I have not tested the patch provided above yet.
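To illustrate the failure mode in the quoted description, here is a minimal userspace sketch in C. It assumes the assertion being hit is a check that a header pull never consumes more bytes than the skb holds in its linear area. The struct `model_skb` and the functions `model_headlen` and `model_try_pull` are hypothetical names invented for this illustration; they are not kernel or Open vSwitch APIs, and the real `struct sk_buff` has many more fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two sk_buff length fields involved. */
struct model_skb {
    unsigned int len;      /* total bytes: linear area plus paged fragments */
    unsigned int data_len; /* bytes held in paged fragments only */
};

/* Bytes available in the linear (directly addressable) area. */
static unsigned int model_headlen(const struct model_skb *skb)
{
    assert(skb->data_len <= skb->len); /* invariant of the model */
    return skb->len - skb->data_len;
}

/*
 * Model of pulling n header bytes off the front of the packet.
 * Returns true if the pull fits in the linear area and was applied.
 * Returns false, leaving the skb untouched, in the situation where
 * (under the assumption above) an unchecked pull would trip the
 * "insufficient linear data" assertion.
 */
static bool model_try_pull(struct model_skb *skb, unsigned int n)
{
    if (n > model_headlen(skb))
        return false;
    skb->len -= n;
    return true;
}
```

For example, with `len = 100` and `data_len = 90` only 10 bytes are linear, so `model_try_pull(&skb, 14)` (the size of an Ethernet header) returns false, while pulling 8 bytes succeeds and leaves 2 linear bytes. A datapath that assumes a full L2 header is present in the linear area, and pulls without such a check, would fail in the first case.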
Other information and a few sample dmesg outputs from the crash: (multiple dumps available)
# lsb_release -rd
Description: Ubuntu 14.04.5 LTS
Release: 14.04
# apt-cache policy openvswitch
N: Unable to locate package openvswitch
root@neutron01:
openvswitch-common:
Installed: 2.0.2-0ubuntu0.
Candidate: 2.0.2-0ubuntu0.
Version table:
*** 2.0.2-0ubuntu0.
500 http://
100 /var/lib/
2.
500 http://
# apt-cache policy openvswitch-switch
openvswitch-switch:
Installed: 2.0.2-0ubuntu0.
Candidate: 2.0.2-0ubuntu0.
Version table:
*** 2.0.2-0ubuntu0.
500 http://
100 /var/lib/
2.
500 http://
# apt-cache policy neutron-
neutron-
Installed: 1:2014.1.5-0ubuntu7
Candidate: 1:2014.1.5-0ubuntu7
Version table:
*** 1:2014.1.5-0ubuntu7 0
500 http://
100 /var/lib/
1:
500 http://
1:
500 http://
example dmesg:
############## dmesg.201701060019
> [33100.131019] ------------[ cut here ]------------
> [33100.131176] kernel BUG at /build/
> [33100.131424] invalid opcode: 0000 [#1] SMP
> [33100.131560] Modules linked in: xt_nat xt_conntrack ip6table_filter ip6_tables iptable_filter xt_REDIRECT xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables veth openvswitch gre vxlan ip_tunnel libcrc32c ipmi_devintf gpio_ich cdc_ether x86_pkg_
> [33100.133560] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.13.0-106-generic #153-Ubuntu
> [33100.133800] Hardware name: IBM System x3650 M4 : -[7915AC1]
> [33100.134096] task: ffff880469da4800 ti: ffff880469dae000 task.ti: ffff880469dae000
> [33100.134325] RIP: 0010:[<
> [33100.134628] RSP: 0018:ffff88046f
> [33100.134792] RAX: ffff880035d73866 RBX: ffff880461efb600 RCX: ffff880035d73800
> [33100.135011] RDX: 0000000000000210 RSI: 0000000000000214 RDI: ffff88046fd03c98
> [33100.135231] RBP: ffff88046fd03bb0 R08: 0000000000000000 R09: ffff880035d73800
> [33100.135451] R10: ffff880461efb600 R11: 0000000000000000 R12: ffff88046fd03c18
> [33100.135671] R13: ffff880866a88a80 R14: ffff88046fd03c18 R15: ffff880461e49480
> [33100.141118] FS: 000000000000000
> [33100.152198] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [33100.157796] CR2: 00007fc30157d090 CR3: 0000000001c0e000 CR4: 00000000001407e0
> [33100.163382] Stack:
> [33100.168800] ffff88046fd03be0 ffffffffa022bbc5 ffffffff81cdaf00 ffff880461efb600
> [33100.179942] ffffe8fbefd04890 ffff880866a88a80 ffff88046fd03cc8 ffffffffa022a8c5
> [33100.191068] ffffffff81cdaf00 0000000000000001 ffff880866cb70c4 ffff8804541b6180
> [33100.202184] Call Trace:
> [33100.207553] <IRQ>
> [33100.207617]
> [33100.212849] [<ffffffffa022b
> [33100.218139] [<ffffffffa022a
> [33100.228464] [<ffffffffa0230
> [33100.233727] [<ffffffffa0231
> [33100.238898] [<ffffffffa0222
> [33100.243974] [<ffffffffa0222
> [33100.248938] [<ffffffff81666
> [33100.253823] [<ffffffff81666
> [33100.258547] [<ffffffff81665
> [33100.263138] [<ffffffff81666
> [33100.267636] [<ffffffff8162f
> [33100.272134] [<ffffffff8162f
> [33100.276544] [<ffffffff81630
> [33100.280999] [<ffffffff8162f
> [33100.285447] [<ffffffff8106f
> [33100.289886] [<ffffffff81070
> [33100.294224] [<ffffffff81740
> [33100.298433] [<ffffffff81735
> [33100.302613] <EOI>
> [33100.302676]
> [33100.306717] [<ffffffff815dc
> [33100.310816] [<ffffffff815dc
> [33100.314828] [<ffffffff815dc
> [33100.318732] [<ffffffff8101e
> [33100.322479] [<ffffffff810c2
> [33100.326138] [<ffffffff81042
> [33100.329686] Code: a0 e8 8c 86 e3 e0 c6 05 5d 31 00 00 01 eb 11 48 89 d0 8b 16 31 f6 48 8b 38 e8 a4 70 42 e1 eb 05 b8 ea ff ff ff 5d c3 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 00 00
> [33100.340962] RIP [<ffffffffa0232
> [33100.344857] RSP <ffff88046fd03bb0>
############## dmesg.201701080127
[ 911.714512] ------------[ cut here ]------------
[ 911.714670] kernel BUG at /build/
[ 911.714917] invalid opcode: 0000 [#1] SMP
[ 911.715053] Modules linked in: xt_nat xt_conntrack xt_REDIRECT xt_tcpudp ip6table_filter ip6_tables iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables veth openvswitch gre vxlan ip_tunnel libcrc32c x86_pkg_
[ 911.717060] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-106-generic #153-Ubuntu
[ 911.717301] Hardware name: IBM System x3650 M4 : -[7915AC1]
[ 911.717597] task: ffffffff81c15480 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[ 911.717827] RIP: 0010:[<
[ 911.718128] RSP: 0018:ffff88046f
[ 911.718291] RAX: ffff880079de52e6 RBX: ffff880463335000 RCX: ffff880079de5280
[ 911.718511] RDX: 0000000000000210 RSI: 0000000000000214 RDI: ffff88046fc03c98
[ 911.718731] RBP: ffff88046fc03bb0 R08: 0000000000000000 R09: ffff880079de5280
[ 911.718951] R10: ffff880463335000 R11: 0000000000000000 R12: ffff88046fc03c18
[ 911.719171] R13: ffff880468b60c00 R14: ffff88046fc03c18 R15: ffff8804631a0b40
[ 911.724614] FS: 000000000000000
[ 911.735614] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 911.741214] CR2: 00007f1898042d70 CR3: 0000000001c0e000 CR4: 00000000001407f0
[ 911.746800] Stack:
[ 911.752201] ffff88046fc03be0 ffffffffa01bfbc5 ffffffff81cdaf00 ffff880463335000
[ 911.763305] ffffe8fbefc04890 ffff880468b60c00 ffff88046fc03cc8 ffffffffa01be8c5
[ 911.774433] ffffffff81cdaf00 0000000000000001 ffff8804675cf9c4 ffff88045941d380
[ 911.785550] Call Trace:
[ 911.790915] <IRQ>
[ 911.790979]
[ 911.796163] [<ffffffffa01bf
[ 911.801437] [<ffffffffa01be
[ 911.811769] [<ffffffffa01c4
[ 911.817038] [<ffffffffa01c5
[ 911.822211] [<ffffffffa01b6
[ 911.827280] [<ffffffffa01b6
[ 911.832213] [<ffffffff81666
[ 911.837094] [<ffffffff81666
[ 911.841810] [<ffffffff81665
[ 911.846397] [<ffffffff81666
[ 911.850889] [<ffffffff8162f
[ 911.855384] [<ffffffff8162f
[ 911.859796] [<ffffffff81630
[ 911.864208] [<ffffffff8162f
[ 911.868654] [<ffffffff8106f
[ 911.873093] [<ffffffff81070
[ 911.877442] [<ffffffff81740
[ 911.881654] [<ffffffff81735
[ 911.885832] <EOI>
[ 911.885896]
[ 911.889937] [<ffffffff815dc
[ 911.894036] [<ffffffff815dc
[ 911.898017] [<ffffffff815dc
[ 911.901888] [<ffffffff8101e
[ 911.905643] [<ffffffff810c2
[ 911.909308] [<ffffffff8171b
[ 911.912842] [<ffffffff81d34
[ 911.916281] [<ffffffff81d34
[ 911.919767] [<ffffffff81d34
[ 911.923347] [<ffffffff81d34
[ 911.926859] [<ffffffff81d34
[ 911.930305] Code: a0 e8 8c 46 ea e0 c6 05 5d 31 00 00 01 eb 11 48 89 d0 8b 16 31 f6 48 8b 38 e8 a4 30 49 e1 eb 05 b8 ea ff ff ff 5d c3 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 00 00
[ 911.940880] RIP [<ffffffffa01c6
[ 911.944483] RSP <ffff88046fc03bb0>
############## dmesg.201701071542
[23738.192626] ------------[ cut here ]------------
[23738.192782] kernel BUG at /build/
[23738.193031] invalid opcode: 0000 [#1] SMP
[23738.193167] Modules linked in: xt_nat xt_conntrack ip6table_filter ip6_tables iptable_filter xt_REDIRECT xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables veth openvswitch gre vxlan ip_tunnel libcrc32c ipmi_devintf gpio_ich x86_pkg_
[23738.195169] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.13.0-106-generic #153-Ubuntu
[23738.195410] Hardware name: IBM System x3650 M4 : -[7915AC1]
[23738.195706] task: ffff880869959800 ti: ffff880469da4000 task.ti: ffff880469da4000
[23738.195936] RIP: 0010:[<
[23738.196238] RSP: 0018:ffff88046f
[23738.196402] RAX: ffff880453cad7e6 RBX: ffff88045d1e7200 RCX: ffff880453cad780
[23738.196622] RDX: 0000000000000210 RSI: 0000000000000214 RDI: ffff88046fd03c98
[23738.196842] RBP: ffff88046fd03bb0 R08: 0000000000000000 R09: ffff880453cad780
[23738.197062] R10: ffff88045d1e7200 R11: 0000000000000000 R12: ffff88046fd03c18
[23738.197283] R13: ffff880466dbc0c0 R14: ffff88046fd03c18 R15: ffff880462a32f00
[23738.202738] FS: 000000000000000
[23738.213771] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[23738.219381] CR2: 00007efcd7eee090 CR3: 0000000001c0e000 CR4: 00000000001407e0
[23738.224978] Stack:
[23738.230390] ffff88046fd03be0 ffffffffa023dbc5 ffffffff81cdaf00 ffff88045d1e7200
[23738.241516] ffffe8fbefd04770 ffff880466dbc0c0 ffff88046fd03cc8 ffffffffa023c8c5
[23738.252668] ffffffff81cdaf00 0000000000000001 ffff880462a54244 ffff88045d1c4100
[23738.263818] Call Trace:
[23738.269200] <IRQ>
[23738.269264]
[23738.274454] [<ffffffffa023d
[23738.279737] [<ffffffffa023c
[23738.290071] [<ffffffffa0242
[23738.295339] [<ffffffffa0243
[23738.300513] [<ffffffffa0206
[23738.305587] [<ffffffffa0206
[23738.310531] [<ffffffff81666
[23738.315420] [<ffffffff81666
[23738.320146] [<ffffffff81665
[23738.324743] [<ffffffff81666
[23738.329244] [<ffffffff8162f
[23738.333744] [<ffffffff8162f
[23738.338158] [<ffffffff81630
[23738.342576] [<ffffffff8162f
[23738.347025] [<ffffffff8106f
[23738.351463] [<ffffffff81070
[23738.355804] [<ffffffff81740
[23738.360010] [<ffffffff81735
[23738.364183] <EOI>
[23738.364246]
[23738.368280] [<ffffffff815dc
[23738.372372] [<ffffffff815dc
[23738.376347] [<ffffffff815dc
[23738.380212] [<ffffffff8101e
[23738.383958] [<ffffffff810c2
[23738.387612] [<ffffffff81042
[23738.391156] Code: a0 e8 8c 66 e2 e0 c6 05 5d 31 00 00 01 eb 11 48 89 d0 8b 16 31 f6 48 8b 38 e8 a4 50 41 e1 eb 05 b8 ea ff ff ff 5d c3 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 00 00
[23738.402433] RIP [<ffffffffa0244
[23738.406297] RSP <ffff88046fd03bb0>
#######
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Trusty):
status: New → Incomplete
importance: Undecided → High
status: Incomplete → Confirmed
This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:
apport-collect 1655683
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.