hisi_sas_v3_hw: internal task abort: timeout and not done.
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
linux (Ubuntu) | Fix Released | Undecided | dann frazier | |
Bionic | Fix Released | Undecided | dann frazier | |
Bug Description
[Impact]
On deployments with lots of disks, timeouts can occur that escalate into nexus resets. This can cause disk devices to disappear from the system, possibly requiring a reboot to recover:
[18324.951189] cq: iptt:892, task:ffff8026fb
[18324.951190] sb dw0:0x8001,
[18324.951191] cmd table: 0x0,0x0,0x0,0x0,0x0
[18324.951192] itct: 0x12fa0345,
[18324.951334] hisi_sas_v3_hw 0000:74:02.0: slot complete: task(ffff8026fb
[18325.039774] sb dw0:0x8001,
[18325.044467] cmd table: 0x0,0x0,0x0,0x0,0x0
[18325.048553] itct: 0x12fa0345,
[18325.057058] hisi_sas_v3_hw 0000:74:02.0: slot complete: task(ffff8027dc
[18326.951312] cq: iptt:1705, task:ffff802782
[18326.968247] sb dw0:0x8001,
[18326.972938] cmd table: 0x0,0x0,0x0,0x0,0x0
[18326.977023] itct: 0x12fa0345,
[18326.985496] hisi_sas_v3_hw 0000:74:02.0: slot complete: task(ffff802782
[18329.384695] hisi_sas_v3_hw 0000:74:02.0: internal task abort: timeout and not done.
[18329.392344] hisi_sas_v3_hw 0000:74:02.0: start dump all regs,reason:abort timeout!
[18329.399904] ***************DUMP IS DISABLED*
[18329.405467] dump reg fail.
[18329.408162] hisi_sas_v3_hw 0000:74:02.0: I_T nexus reset: internal abort (-5)
[18329.936017] cq: iptt:649, task:ffff802798
[18329.936154] cq: iptt:1091, task:ffff8026ff
[18329.936155] sb dw0:0x8001,
[18329.936156] cmd table: 0x0,0x0,0x0,0x0,0x0
[18329.936158] itct: 0x12fa0345,
[18329.936301] hisi_sas_v3_hw 0000:74:02.0: slot complete: task(ffff8026ff
[Test Case]
This was seen on a system with hundreds of disks. I don't have access to such a system, so verification testing will be regression-only.
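For anyone who does have a large-disk system to test on, a minimal sketch like the following could help spot the failure signature in a captured kernel log. It only looks for the two message fragments that appear in the log excerpt in this report; the function name `find_abort_events` and the idea of feeding it saved `dmesg` output are assumptions for illustration, not part of any Ubuntu or kernel tooling.

```python
import re

# Message fragments copied from the kernel log in this report.
SIGNATURES = (
    "internal task abort: timeout and not done",
    "I_T nexus reset: internal abort",
)

# Matches a leading dmesg timestamp such as "[18329.384695]".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*")


def find_abort_events(lines):
    """Return (timestamp, message) pairs for hisi_sas_v3_hw abort/reset lines.

    `lines` is any iterable of log lines, e.g. an open file of saved dmesg
    output. The timestamp is None when a line has no dmesg-style prefix.
    """
    events = []
    for line in lines:
        # Only consider messages emitted by the driver named in this bug.
        if "hisi_sas_v3_hw" not in line:
            continue
        if not any(sig in line for sig in SIGNATURES):
            continue
        match = TIMESTAMP.match(line)
        stamp = float(match.group(1)) if match else None
        message = line[match.end():] if match else line
        events.append((stamp, message.strip()))
    return events
```

Running `find_abort_events(open("dmesg.txt"))` on a log saved after a soak test should return an empty list on a fixed kernel; any entries indicate that the abort timeout or the nexus reset was still hit.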
[Fix]
A fix queued in the SCSI maintainer's tree adjusts some magic registers in the controller, and that somehow fixes the problem (I don't have programming docs for this controller, so I can only hand-wave here).
[Regression Risk]
The fix is localized to the hisi_sas_v3_hw driver, which is only used in Ubuntu for the D06 platform.
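To judge whether a given machine is exposed to this change at all, one option is to check whether any PCI device is bound to the driver. The sketch below assumes the standard sysfs layout (`/sys/bus/pci/drivers/<driver>/`) and that the driver directory carries the same name as the `hisi_sas_v3_hw` prefix seen in the log; the helper name `bound_devices` and its `sysfs_root` parameter are hypothetical.

```python
import re
from pathlib import Path

# Assumed to match the driver name shown as the prefix in the kernel log.
DRIVER = "hisi_sas_v3_hw"

# PCI addresses look like "0000:74:02.0" (domain:bus:device.function).
PCI_ADDRESS = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def bound_devices(sysfs_root="/sys"):
    """Return the sorted PCI addresses bound to the driver.

    Returns an empty list when the driver directory does not exist,
    i.e. the driver is not loaded on this machine.
    """
    driver_dir = Path(sysfs_root) / "bus" / "pci" / "drivers" / DRIVER
    if not driver_dir.is_dir():
        return []
    # The directory also holds entries such as "bind", "unbind" and "module";
    # keep only the names that look like PCI addresses.
    return sorted(p.name for p in driver_dir.iterdir() if PCI_ADDRESS.match(p.name))
```

An empty result from `bound_devices()` suggests the machine does not use this driver, so the change should not affect it.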
summary: - hisi_sas:
+ hisi_sas_v3_hw: internal task abort: timeout and not done.
Changed in linux (Ubuntu):
status: New → In Progress
Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in linux (Ubuntu):
assignee: nobody → dann frazier (dannf)
Changed in linux (Ubuntu Bionic):
assignee: nobody → dann frazier (dannf)
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.
If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!