thunder nic: avoid link delays due to RX_PACKET_DIS

Bug #1630038 reported by dann frazier
This bug affects 2 people
Affects          Status         Importance   Assigned to    Milestone
linux (Ubuntu)   Fix Released   Medium       dann frazier
  Xenial         Fix Released   Medium       dann frazier
  Yakkety        Fix Released   Medium       dann frazier

Bug Description

[Impact]
Link establishment is delayed during initialization, possibly resulting in remote fault conditions that may cause the interface to fail to come up.

[Test Case]
Put the system in a reboot loop and watch for a remote fault condition, or a failure to bring up the link that can only be resolved by reloading the module.
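
One way to automate the "failure to bring up the link" half of this check is sketched below. It is only an illustration, not part of the original test procedure: it polls the standard sysfs attributes `operstate` and `carrier` under `/sys/class/net/<iface>` after boot and reports whether the link came up within a timeout. The function names, the timeout values, and the idea of running it from a boot-time service are assumptions. It does not detect a remote fault condition as such, only that the link never came up.

```python
import time
from pathlib import Path


def link_is_up(iface, sysfs_root="/sys/class/net"):
    """Return True if sysfs reports the interface as up with carrier present."""
    base = Path(sysfs_root) / iface
    try:
        operstate = (base / "operstate").read_text().strip()
        carrier = (base / "carrier").read_text().strip()
    except OSError:
        # The interface does not exist, or carrier cannot be read
        # (reading it fails while the interface is administratively down).
        return False
    return operstate == "up" and carrier == "1"


def wait_for_link(iface, timeout_s=60.0, poll_s=1.0, sysfs_root="/sys/class/net"):
    """Poll until the link is up or timeout_s seconds have elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if link_is_up(iface, sysfs_root):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

In a reboot loop, a boot-time job could call `wait_for_link("<iface>")` with the name of the ThunderX interface, log the result, and reboot only when it returns True, leaving a failing boot available for inspection.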

[Regression Risk]
The patch is to a specific driver that is only used on Cavium ThunderX systems. The patch is upstream, so regressions will have upstream support.


dann frazier (dannf)
Changed in linux (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → dann frazier (dannf)
Seth Forshee (sforshee)
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Seth Forshee (sforshee) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
tags: added: verification-needed-yakkety
Seth Forshee (sforshee) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

dann frazier (dannf)
tags: added: verification-done-yakkety
removed: verification-needed-yakkety
dann frazier (dannf)
tags: added: verification-done-xenial
removed: verification-needed-xenial
dann frazier (dannf)
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Seth Forshee (sforshee)
Changed in linux (Ubuntu Yakkety):
assignee: nobody → dann frazier (dannf)
importance: Undecided → Medium
status: New → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-47.68

---------------
linux (4.4.0-47.68) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1636941

  * Add a driver for Amazon Elastic Network Adapters (ENA) (LP: #1635721)
    - lib/bitmap.c: conversion routines to/from u32 array
    - net: ethtool: add new ETHTOOL_xLINKSETTINGS API
    - net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
    - [config] enable CONFIG_ENA_ETHERNET=m (Amazon ENA driver)

  * unexpectedly large memory usage of mounted snaps (LP: #1636847)
    - [Config] switch squashfs to single threaded decode

 -- Kamal Mostafa <email address hidden> Wed, 26 Oct 2016 10:47:55 -0700

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-27.29

---------------
linux (4.8.0-27.29) yakkety; urgency=low

  [ Seth Forshee ]

  * Release Tracking Bug
    - LP: #1635377

  * proc_keys_show crash when reading /proc/keys (LP: #1634496)
    - SAUCE: KEYS: ensure xbuf is large enough to fix buffer overflow in
      proc_keys_show (LP: #1634496)

  * Revert "If zone is so small that watermarks are the same, stop zone balance"
    in yakkety (LP: #1632894)
    - Revert "UBUNTU: SAUCE: (no-up) If zone is so small that watermarks are the
      same, stop zone balance."

  * lts-yakkety 4.8 cannot mount lvm raid1 (LP: #1631298)
    - SAUCE: (no-up) dm raid: fix compat_features validation

  * kswapd0 100% CPU usage (LP: #1518457)
    - SAUCE: (no-up) If zone is so small that watermarks are the same, stop zone
      balance.

  * [Trusty->Yakkety] powerpc/64: Fix incorrect return value from
    __copy_tofrom_user (LP: #1632462)
    - SAUCE: (no-up) powerpc/64: Fix incorrect return value from
      __copy_tofrom_user

  * Ubuntu 16.10: Oops panic in move_page_tables/page_remove_rmap after running
    memory_stress_ng. (LP: #1628976)
    - SAUCE: (no-up) powerpc/pseries: Fix stack corruption in htpe code

  * Paths not failed properly when unmapping virtual FC ports in VIOS (using
    ibmvfc) (LP: #1632116)
    - scsi: ibmvfc: Fix I/O hang when port is not mapped

  * [Ubuntu16.10]KV4.8: kernel livepatch config options are not set
    (LP: #1626983)
    - [Config] Enable live patching on powerpc/ppc64el

  * CONFIG_AUFS_XATTR is not set (LP: #1557776)
    - [Config] CONFIG_AUFS_XATTR=y

  * Yakkety update to 4.8.1 stable release (LP: #1632445)
    - arm64: debug: avoid resetting stepping state machine when TIF_SINGLESTEP
    - Using BUG_ON() as an assert() is _never_ acceptable
    - usb: misc: legousbtower: Fix NULL pointer deference
    - Staging: fbtft: Fix bug in fbtft-core
    - usb: usbip: vudc: fix left shift overflow
    - USB: serial: cp210x: Add ID for a Juniper console
    - Revert "usbtmc: convert to devm_kzalloc"
    - ALSA: hda - Adding one more ALC255 pin definition for headset problem
    - ALSA: hda - Fix headset mic detection problem for several Dell laptops
    - ALSA: hda - Add the top speaker pin config for HP Spectre x360
    - Linux 4.8.1

  * PSL data cache should be flushed before resetting CAPI adapter
    (LP: #1632049)
    - cxl: Flush PSL cache before resetting the adapter

  * thunder nic: avoid link delays due to RX_PACKET_DIS (LP: #1630038)
    - net: thunderx: Don't set RX_PACKET_DIS while initializing

  * crypto/vmx/p8_ghash memory corruption (LP: #1630970)
    - crypto: ghash-generic - move common definitions to a new header file
    - crypto: vmx - Fix memory corruption caused by p8_ghash
    - crypto: vmx - Ensure ghash-generic is enabled

  * arm64: SPCR console not autodetected (LP: #1630311)
    - of/serial: move earlycon early_param handling to serial
    - [Config] CONFIG_ACPI_SPCR_TABLE=y
    - ACPI: parse SPCR and enable matching console
    - ARM64: ACPI: enable ACPI_SPCR_TABLE
    - serial: pl011: add console matching function

  * include/linux/security.h header syntax error with !CONFIG_SECURITYFS
...


Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-30.32

---------------
linux (4.8.0-30.32) yakkety; urgency=low

  * CVE-2016-8655 (LP: #1646318)
    - packet: fix race condition in packet_set_ring

 -- Brad Figg <email address hidden> Thu, 01 Dec 2016 08:02:53 -0800

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Alexandru Avadanii (alexandru-avadanii) wrote :

Hi,
This fix introduced a regression with ThunderX nodes (CRB-1S, CRB-2S) and our 10G switch (Extreme Networks x670 10GE L3).
We have opened a downstream bug report [1], where we temporarily bypassed this by pinning the kernel to 4.4.0-45.
I also tested 4.8 (multiple builds), 4.10 and 4.11-rc1 (vanilla); all are still affected by link training issues with our switch, with 4.11-rc1 not working at all and reporting more issues (logs attached in a different LP comment [2]).

BR,
Alex

[1] https://jira.opnfv.org/browse/ARMBAND-168
[2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672521/comments/17

Raghuram Kota (rkota) wrote :

Hi,

Regarding comment #6, could you please provide:

1) The model of the specific server in use?
2) A console log that helps determine the UEFI firmware version running on that model?

Thanks,
Raghu

Alexandru Avadanii (alexandru-avadanii) wrote :

Hi,

1) We tested different models (CRB-1S, CRB-2S) - all behave the same.
2) Please check the logs "ThunderX 4.11-rc1 console log" in [2] linked above. I don't think firmware version makes a difference for this issue (we saw the same bug with firmwares: T22, T27, T31).

All in all, this issue seems pretty tied to the switch we use, and all firmware/board model combinations behaved the same ...

Thanks,
Alex

dann frazier (dannf) wrote : Re: [Bug 1630038] Re: thunder nic: avoid link delays due to RX_PACKET_DIS

On Tue, Mar 21, 2017 at 3:41 PM, Alexandru Avadanii
<email address hidden> wrote:
> Hi,
>
> 1) We tested different models (CRB-1S, CRB-2S) - all behave the same.
> 2) Please check the logs "ThunderX 4.11-rc1 console log" in [2] linked above. I don't think firmware version makes a difference for this issue (we saw the same bug with firmwares: T22, T27, T31).
>
> All in all, this issue seems pretty tied to the switch we use, and all
> firmware/board model combinations behaved the same ...

Hi Alex,

  Would you mind opening a new bug to track this regression and linking it here?

Alexandru Avadanii (alexandru-avadanii) wrote :

Hi, Dann,
I created a new bug and pasted the same info as above at [1].
As far as I can tell, there is no useful information in the logs when link training fails.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1674837
