Intel i40e PF reset due to incorrect MDD detection
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Medium | Dan Streetman |
Xenial | Fix Released | Undecided | Unassigned |
Bug Description
[Impact]
When an Intel i40e network device is under heavy traffic load with
TSO enabled, the device will spontaneously reset itself and log errors
similar to the following:
Jun 14 14:09:51 hostname kernel: [4253913.851053] i40e 0000:05:00.1: TX driver issue detected, PF reset issued
Jun 14 14:09:53 hostname kernel: [4253915.476283] i40e 0000:05:00.1: TX driver issue detected, PF reset issued
Jun 14 14:09:54 hostname kernel: [4253917.411264] i40e 0000:05:00.1: TX driver issue detected, PF reset issued
This triggers a full reset of the PF (physical function), which
interrupts traffic flow.
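To check how often a host is hitting this problem, the reset messages can be counted per PCI device. The following is a minimal sketch that assumes the kernel log lines have exactly the format shown above; the function name `count_pf_resets` and the regular expression are illustrative, not part of any Ubuntu or i40e tooling.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed message format, based only on the sample log lines above:
#   "... i40e <PCI address>: TX driver issue detected, PF reset issued"
PF_RESET_RE = re.compile(
    r"i40e ([0-9a-fA-F:.]+): TX driver issue detected, PF reset issued"
)


def count_pf_resets(lines: Iterable[str]) -> Counter:
    """Count i40e PF reset messages per PCI address (hypothetical helper)."""
    counts: Counter = Counter()
    for line in lines:
        match = PF_RESET_RE.search(line)
        if match:
            # group(1) is the PCI address, e.g. "0000:05:00.1"
            counts[match.group(1)] += 1
    return counts
```

For example, feeding it the lines of `/var/log/kern.log` (or the output of `journalctl -k`) containing the three messages above would yield `Counter({'0000:05:00.1': 3})`.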
This was partially fixed by Xenial commit 12f8cc59d5886b8. The remaining fix is the following commit:
commit 841493a3f64395b
Author: Alexander Duyck <email address hidden>
Date: Tue Sep 6 18:05:04 2016 -0700
i40e: Limit TX descriptor count in cases where frag size is greater than 16K
This fix was never backported into the Xenial 4.4 kernel series, but is already present in the Xenial HWE (and Zesty) 4.10 kernel.
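The commit title hints at why counting one descriptor per fragment is not enough: a fragment larger than what a single descriptor can carry must be split across several descriptors. The sketch below illustrates that arithmetic under stated assumptions. The limits of 16K - 1 bytes per data descriptor and 8 descriptors per packet are assumed values for this model, and the helper names are hypothetical; this is not the driver's actual code.

```python
from typing import List

# Assumed limits for this simplified model (not copied from the driver source):
MAX_DATA_PER_DESC = 16 * 1024 - 1  # bytes a single data descriptor can carry
MAX_DESC_PER_PACKET = 8            # descriptors one packet may consume


def descriptors_for_fragment(size: int) -> int:
    """Descriptors needed for one fragment of `size` bytes (size > 0)."""
    return (size + MAX_DATA_PER_DESC - 1) // MAX_DATA_PER_DESC  # ceiling division


def naive_descriptor_count(frag_sizes: List[int]) -> int:
    """Undercounts: assumes every fragment fits in exactly one descriptor."""
    return len(frag_sizes)


def descriptor_count(frag_sizes: List[int]) -> int:
    """Includes the extra descriptors used by fragments above MAX_DATA_PER_DESC."""
    return sum(descriptors_for_fragment(size) for size in frag_sizes)


frags = [32 * 1024] * 3  # three 32 KiB fragments
print(naive_descriptor_count(frags), descriptor_count(frags))  # 3 9
```

In this model, the naive count of 3 stays well under the 8-descriptor budget, so nothing looks wrong, while the frame actually needs 9 descriptors.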
[Testcase]
The issue occurs at a customer site using i40e-based Intel network
cards with SR-IOV enabled. Under heavy load, the card resets itself as
described above.
[Regression Potential]
As with any change to a network card driver, this may cause regressions in network I/O through i40e cards. However, this specific change only increases the likelihood that a given large TSO transmit will need to be linearized, which avoids the PF reset. Linearizing a TSO transmit that did not need it will not cause any failures; at most it may slightly decrease performance. The patch should only cause linearization when it is required to avoid MDD (Malicious Driver Detection) and the resulting PF reset.
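Linearizing means copying a transmit's scattered buffers into one contiguous buffer so that it needs fewer descriptors. The sketch below models the decision described above as a conservative sliding-window check over per-descriptor buffer sizes. It is a simplified illustration, not the i40e driver's algorithm: the 8-descriptor budget, the assumption that the header and one partial buffer consume 2 of those descriptors, and the name `needs_linearize` are all assumptions made for this example.

```python
from typing import List, Optional

MAX_DESC_PER_PACKET = 8  # assumed per-packet descriptor budget for this model


def needs_linearize(desc_sizes: List[int], gso_size: Optional[int]) -> bool:
    """Decide whether a transmit should be linearized (simplified model).

    desc_sizes: byte count of each data descriptor, after splitting large fragments.
    gso_size:   TSO segment payload size (MSS), or None for a non-TSO transmit.
    """
    if len(desc_sizes) < MAX_DESC_PER_PACKET:
        return False  # fits within the budget regardless of segmentation

    if gso_size is None:
        # Non-TSO: the whole transmit is one packet and must fit in the budget.
        return len(desc_sizes) > MAX_DESC_PER_PACKET

    # TSO: conservatively assume a segment may begin in the last byte of a
    # buffer and that its header uses one more descriptor, leaving `window`
    # descriptors to carry the rest of the segment.
    window = MAX_DESC_PER_PACKET - 2
    for start in range(len(desc_sizes) - window + 1):
        if sum(desc_sizes[start:start + window]) < gso_size:
            return True  # some segment could need more descriptors than allowed
    return False
```

For example, ten 100-byte buffers with a 1448-byte MSS would be linearized in this model, since no six consecutive buffers hold a full segment, whereas ten 4096-byte buffers would not. The conservative bias mirrors the regression analysis above: a false positive only costs a copy, while a missed case risks the PF reset.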
[Other Info]
The previous bug for this issue is bug 1700834.
Changed in linux (Ubuntu):
  status: Incomplete → In Progress
  importance: Undecided → Medium
  assignee: nobody → Dan Streetman (ddstreet)
Changed in linux (Ubuntu Xenial):
  status: New → Fix Committed
Changed in linux (Ubuntu):
  status: In Progress → Fix Released
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1713553
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.