Precise crashes hard when HP array rebuilds
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Confirmed | High | Unassigned |
Bug Description
We've long suspected that on certain hardware, Precise will crash entirely when certain events (such as "the array has finished rebuilding") come from the HP storage array.
On a ProLiant SL335s G7 running kernel 3.2.0-33-generic (amd64) on Ubuntu 12.04.1 LTS, we now have more conclusive evidence. The system in question is an OpenStack compute node, and we pulled the following out of its logs:
{'class': 'POST Message',
'count': 1,
'description': 'POST Error: 1716-Slot X Drive Array - Unregenerable Media Errors Detected on Drives during previous Rebuild or Auto-Reliability Monitoring (ARM) scan. Problem will be fixed automatically when the sector(s) are overwritten.',
'initial_update': '01/10/2013 17:09',
'last_update': '01/10/2013 17:09',
'severity': 'Caution'}]
This corresponded with the system crash.
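For anyone trying to confirm the same correlation on their own hardware, here is a minimal sketch of how a log entry's timestamp could be compared against a known crash time. It assumes the timestamps are month/day/year, which the entry above does not confirm, and the crash time in the usage lines is a hypothetical placeholder rather than a value from this report. The names `within_window` and `IML_FORMAT` are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: the log timestamps are month/day/year with a 24-hour clock.
IML_FORMAT = "%m/%d/%Y %H:%M"


def within_window(log_stamp, crash_time, window_minutes=5):
    """Return True if the log entry falls within window_minutes of crash_time."""
    logged = datetime.strptime(log_stamp, IML_FORMAT)
    return abs(crash_time - logged) <= timedelta(minutes=window_minutes)


# Timestamps copied from the log entry above.
entry = {"initial_update": "01/10/2013 17:09", "last_update": "01/10/2013 17:09"}

# Hypothetical crash time, for illustration only; substitute the real one
# from your own records.
hypothetical_crash = datetime(2013, 1, 10, 17, 10)

print(within_window(entry["last_update"], hypothetical_crash))
```

The window size is arbitrary; widen it if the two clocks being compared are not known to be in sync.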
I'm not sure which mechanism triggered it (reboot-on-panic, a host watchdog, or iLO), but at least one crash resulted in an automatic reboot. More may have followed.
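One way to narrow down the reboot-on-panic question is to check the `kernel.panic` sysctl on the affected node. The sketch below reads `/proc/sys/kernel/panic` and describes the value: a positive number is the delay in seconds before the kernel reboots after a panic, and 0 means it does not reboot by itself. The function names are illustrative, and this says nothing about a hardware watchdog or iLO, which would have to be checked separately.

```python
from pathlib import Path


def describe_panic_timeout(raw):
    """Describe the contents of /proc/sys/kernel/panic."""
    value = int(raw.strip())
    if value > 0:
        return f"kernel reboots {value}s after a panic"
    if value == 0:
        return "kernel does not reboot automatically after a panic"
    # Negative values are not interpreted here; their handling depends on
    # the kernel version.
    return "negative value; check the documentation for this kernel version"


def read_panic_timeout(path="/proc/sys/kernel/panic"):
    """Return the raw sysctl value, or None if the file is not present."""
    p = Path(path)
    return p.read_text() if p.exists() else None


raw = read_panic_timeout()
if raw is None:
    print("kernel.panic is not available on this system")
else:
    print(describe_panic_timeout(raw))
```

If this reports that the kernel does not reboot automatically, the reboot most likely came from a watchdog or from iLO rather than from the panic handler.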