Controller lockup detected on ProLiant DL380 Gen9 with P440 Controller
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Medium | Unassigned |
Xenial | Fix Released | Medium | Eric Desrochers |
Zesty | Fix Released | Medium | Eric Desrochers |
Artful | Fix Released | Medium | Unassigned |
Bug Description
Deploying a Ceph OSD on Trusty/14.04 (LTS) with the Ubuntu 4.4-series kernel on an HP hardware system [1] triggers a "Controller lockup" [2].
[1] - HW
System Information
Manufacturer: HP
Product Name: ProLiant DL380 Gen9
BIOS
Vendor: HP
Version: P89
Release Date: 02/17/2017
Smart Array Controller
Smart Array P440 Controller
[2] - /var/log/kern.log
...
ceph-osd: 2017-09-26 15:34:42.259205 7f99a3d38700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f99ad69d700' had timed out after 60
ceph-osd: 2017-09-26 15:34:42.259215 7f99a553b700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f99ace9c700' had timed out after 60
ceph-osd: 2017-09-26 15:34:42.259219 7f99a553b700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f99ad69d700' had timed out after 60
hpsa 0000:08:00.0: Controller lockup detected: 0x00130001 after 30
hpsa 0000:08:00.0: hpsa_send_
....
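To check whether a given kern.log contains the same signature, a small filter along these lines may help. This is only a sketch: the regular expression is derived from the single hpsa line quoted above, and `find_lockups` and `LOCKUP_RE` are hypothetical names, not part of any existing tool.

```python
import re

# Pattern modelled on the one hpsa lockup line quoted in the excerpt above.
# Other kernel versions may word the message differently (assumption).
LOCKUP_RE = re.compile(
    r"hpsa (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Controller lockup detected: (?P<code>0x[0-9a-fA-F]+)"
)


def find_lockups(lines):
    """Return a (pci_address, lockup_code) tuple for each matching line."""
    hits = []
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            hits.append((match.group("pci"), match.group("code")))
    return hits


# Two lines taken from the excerpt; only the first one is a lockup report.
sample = [
    "hpsa 0000:08:00.0: Controller lockup detected: 0x00130001 after 30",
    "hpsa 0000:08:00.0: hpsa_send_",
]
print(find_lockups(sample))
```

For a real log, the same function can be fed an open file object, since it only iterates over lines.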
tags: | added: sts |
tags: | added: kernel-da-key |
Changed in linux (Ubuntu): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu): | |
status: | Incomplete → Confirmed |
status: | Confirmed → In Progress |
assignee: | nobody → Eric Desrochers (slashd) |
Changed in linux (Ubuntu Artful): | |
status: | In Progress → Fix Released |
Changed in linux (Ubuntu Zesty): | |
assignee: | nobody → Eric Desrochers (slashd) |
Changed in linux (Ubuntu Xenial): | |
assignee: | nobody → Eric Desrochers (slashd) |
Changed in linux (Ubuntu Trusty): | |
assignee: | nobody → Eric Desrochers (slashd) |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Xenial): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Zesty): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Trusty): | |
status: | New → In Progress |
Changed in linux (Ubuntu Xenial): | |
status: | New → In Progress |
Changed in linux (Ubuntu Zesty): | |
status: | New → In Progress |
Changed in linux (Ubuntu Xenial): | |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Zesty): | |
status: | In Progress → Fix Committed |
tags: | added: verification-done-xenial verification-done-zesty; removed: verification-needed-xenial verification-needed-zesty |
The above behavior is reproducible with the upstream mainline kernel, so it is not Ubuntu-specific.
A bisection against the upstream kernel revealed:
# first bad commit: [a736e9b6a03283a2e0fc8190b748b3a672f289c1] hpsa: correct ioaccel2 sg chain len
Note that the behavior also seems to be fixed in a recent upstream kernel (tested with v4.14-rc2).
The upstream v4.14-rc2 kernel is using:
HPSA_DRIVER_VERSION: "3.4.20-0"
while Ubuntu-4.4.0-97.120 is using:
HPSA_DRIVER_VERSION: "3.4.14-0"
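When comparing hpsa driver versions across several kernels, a numeric comparison is safer than comparing the strings directly. A minimal sketch, assuming the version always has the dotted form followed by an optional "-N" suffix as in the two values above (`parse_hpsa_version` is a hypothetical helper):

```python
def parse_hpsa_version(version):
    """Turn a string such as "3.4.14-0" into a tuple of ints, e.g. (3, 4, 14, 0)."""
    main, _, build = version.partition("-")
    return tuple(int(part) for part in main.split(".")) + (int(build or 0),)


upstream = parse_hpsa_version("3.4.20-0")  # v4.14-rc2
ubuntu = parse_hpsa_version("3.4.14-0")    # Ubuntu-4.4.0-97.120

# True when the Ubuntu driver is older than the upstream one.
print(ubuntu < upstream)
```

Tuples of integers compare component by component, so a value like "3.4.9-0" would correctly sort before "3.4.14-0", which a plain string comparison would get wrong.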
Another bisection is in progress to find the first good commit between a736e9b6a03283a2e0fc8190b748b3a672f289c1 and the v4.14-rc2 HEAD.
- Eric
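For readers unfamiliar with searching for a first good commit rather than a first bad one, the idea is the usual binary search with the roles swapped. The sketch below is only an illustration: it assumes a linear history with exactly one bad-to-good transition, the commit names are placeholders, and `is_good` stands in for building and testing a kernel by hand.

```python
def first_good(commits, is_good):
    """Return the index of the first commit for which is_good() is True.

    Assumes commits[0] is known bad, commits[-1] is known good, and that
    there is exactly one bad-to-good transition in between.
    """
    lo, hi = 0, len(commits) - 1  # lo: known bad, hi: known good
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            hi = mid  # the fix landed at mid or earlier
        else:
            lo = mid  # still broken at mid, the fix is later
    return hi


# Placeholder history: "c0" plays the first bad commit, "c7" the good HEAD.
commits = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
fixed_from = 5  # hypothetical position of the fixing commit
tested = []


def is_good(commit):
    # Stand-in for "boot this kernel and check whether the lockup reproduces".
    tested.append(commit)
    return commits.index(commit) >= fixed_from


index = first_good(commits, is_good)
print(commits[index], "found after", len(tested), "tests")
```

The number of kernels to build and test grows with the logarithm of the range, which is what makes a bisection over many upstream commits practical.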