Firmware Hang detected on HP netxen NIC

Bug #1750176 reported by Ernie Martinez
This bug affects 1 person
Affects: linux-firmware (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

HP DL580G7
MAAS version: 2.3.0 (6434-gd354690-0ubuntu1~16.04.1)
cloud-init.log and syslog attached.
The system normally deploys fine as part of a MAAS cluster. The issue only occurs when trying to juju deploy a Kubernetes worker to this node. According to syslog, the hang occurs while docker.io is being loaded. Once the system hangs, I can't ssh into it, and the system console shows the error:

netenp4s0f0 Firmware Hang Detected
.
.
.
netxen_nic enp... Device Initialization error

uname -a
Linux gpu-server 4.13.0-32-generic #35~16.04.1-Ubuntu SMP Thu Jan 25 10:13:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

ethtool -i enp4s0f0

driver: netxen_nic
version: 4.0.82
firmware-version: 4.0.596
expansion-rom-version:
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

dmesg | grep netxen

[ 3.284815] netxen_nic 0000:04:00.0: 2MB memory map
[ 3.537593] netxen_nic 0000:04:00.0: Gen2 strapping detected
[ 3.537682] netxen_nic 0000:04:00.0: using 64-bit dma mask
[ 3.828914] netxen_nic: NX3031 Gigabit Ethernet Board S/N \xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff Chip rev 0x42
[ 3.828918] netxen_nic 0000:04:00.0: Driver v4.0.82, firmware v4.0.596 [legacy]
[ 3.829102] netxen_nic 0000:04:00.0: using msi-x interrupts
[ 3.829105] netxen_nic 0000:04:00.0: non ULA adapter
[ 3.829372] netxen_nic 0000:04:00.0: eth0: GbE port initialized
[ 3.850279] netxen_nic 0000:04:00.1: 2MB memory map
[ 3.850489] netxen_nic 0000:04:00.1: using 64-bit dma mask
[ 4.080327] netxen_nic 0000:04:00.1: Driver v4.0.82, firmware v4.0.596 [legacy]
[ 4.080588] netxen_nic 0000:04:00.1: using msi-x interrupts
[ 4.080874] netxen_nic 0000:04:00.1: eth1: GbE port initialized
[ 4.132499] netxen_nic 0000:04:00.2: 2MB memory map
[ 4.132707] netxen_nic 0000:04:00.2: using 64-bit dma mask
[ 4.188596] netxen_nic 0000:04:00.2: Driver v4.0.82, firmware v4.0.596 [legacy]
[ 4.192717] netxen_nic 0000:04:00.2: using msi-x interrupts
[ 4.196837] netxen_nic 0000:04:00.2: eth2: GbE port initialized
[ 4.317290] netxen_nic 0000:04:00.3: 2MB memory map
[ 4.321319] netxen_nic 0000:04:00.3: using 64-bit dma mask
[ 4.388142] netxen_nic 0000:04:00.3: Driver v4.0.82, firmware v4.0.596 [legacy]
[ 4.392254] netxen_nic 0000:04:00.3: using msi-x interrupts
[ 4.396445] netxen_nic 0000:04:00.3: eth3: GbE port initialized
[ 4.403938] netxen_nic 0000:04:00.2 enp4s0f2: renamed from eth2
[ 4.524474] netxen_nic 0000:04:00.0 enp4s0f0: renamed from eth0
[ 4.564541] netxen_nic 0000:04:00.1 enp4s0f1: renamed from eth1
[ 4.600619] netxen_nic 0000:04:00.3 enp4s0f3: renamed from eth3
[ 19.290490] netxen_nic: enp4s0f0 NIC Link is up
[ 21.606695] netxen_nic: enp4s0f1 NIC Link is up
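
All four PCI functions above report the same driver and firmware versions. A small filter along these lines could confirm that at a glance on a similar system. This is only a sketch: `netxen_versions` is a hypothetical helper name that is not part of the original report, and it assumes the "Driver v..., firmware v..." message format shown in the output above.

```shell
# Hypothetical helper (not from the original report): reads dmesg-style text
# on stdin and prints "<pci-address> <driver-version> <firmware-version>" for
# each netxen_nic line of the form "Driver v..., firmware v...".
netxen_versions() {
    awk '/netxen_nic/ && /Driver v/ && /firmware v/ {
        addr = ""; drv = ""; fw = ""
        for (i = 1; i <= NF; i++) {
            # The PCI address follows the driver name and ends with a colon.
            if ($i == "netxen_nic") { addr = $(i + 1); sub(/:$/, "", addr) }
            # The driver version is followed by a comma.
            if ($i == "Driver")     { drv = $(i + 1); sub(/,$/, "", drv) }
            if ($i == "firmware")   { fw = $(i + 1) }
        }
        print addr, drv, fw
    }'
}
```

It would be used as `dmesg | netxen_versions`. Lines that do not carry both version fields, such as the "using msi-x interrupts" messages, are skipped.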

Last entries in syslog before the NICs crash:

Feb 17 19:08:57 gpu-server systemd[1]: Started ACPI event daemon.
Feb 17 19:08:57 gpu-server systemd[1]: Starting Docker Socket for the API.
Feb 17 19:08:57 gpu-server systemd[1]: Listening on Docker Socket for the API.
Feb 17 19:08:57 gpu-server systemd[1]: Starting Docker Application Container Engine...
Feb 17 19:08:58 gpu-server dockerd[18889]: time="2018-02-17T19:08:58.032297862Z" level=info msg="libcontainerd: new containerd process, pid: 18913"
Feb 17 19:08:59 gpu-server kernel: [ 327.615084] audit: type=1400 audit(1518894539.111:43): apparmor="STATUS" operation="profile_load" profile="unconfined" name="docker-default" pid=18928 comm="apparmor_parser"
Feb 17 19:08:59 gpu-server kernel: [ 327.662287] aufs 4.13-20170911
Feb 17 19:08:59 gpu-server dockerd[18889]: time="2018-02-17T19:08:59.358149475Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Feb 17 19:08:59 gpu-server dockerd[18889]: time="2018-02-17T19:08:59.358893722Z" level=warning msg="Your kernel does not support swap memory limit"
Feb 17 19:08:59 gpu-server dockerd[18889]: time="2018-02-17T19:08:59.359058327Z" level=warning msg="Your kernel does not support cgroup rt period"
Feb 17 19:08:59 gpu-server dockerd[18889]: time="2018-02-17T19:08:59.359106384Z" level=warning msg="Your kernel does not support cgroup rt runtime"
Feb 17 19:08:59 gpu-server dockerd[18889]: time="2018-02-17T19:08:59.360798923Z" level=info msg="Loading containers: start."
Feb 17 19:08:59 gpu-server kernel: [ 327.889472] Bridge firewalling registered
Feb 17 19:08:59 gpu-server kernel: [ 327.924904] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
Feb 17 19:08:59 gpu-server dockerd[18889]: time="2018-02-17T19:08:59.456778799Z" level=info msg="Firewalld running: false"
