OpenPower: Some multipaths temporarily have only a single path
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
The Ubuntu-power-systems project | Fix Released | Undecided | Canonical Kernel Team | |
linux (Ubuntu) | Fix Released | Undecided | Ubuntu on IBM Power Systems Bug Triage | |
Xenial | Fix Released | Medium | Unassigned | |
Yakkety | Fix Released | Medium | Unassigned | |
Zesty | Fix Released | Medium | Unassigned | |
Bug Description
[Impact]
* The SES driver causes a long delay in disk discovery when
a large number of disks is present in the disk enclosure,
which increases with the number of disks attached.
* This delays the addition and visibility of the disk devices
to userspace, which among other things leaves multipath devices
without multiple paths until disk discovery eventually
completes.
* The fix significantly shortens the time taken by the SES
driver to handle disk discovery, causing no extra delays,
by removing a superfluous SCSI command sent to the enclosure.
[Test Case]
* Load the module to access the enclosure and its disks; e.g.,
$ sudo modprobe mpt3sas
* Notice the interval between the discovery of each disk; e.g., dmesg
$ dmesg -T | grep 'Attached SCSI disk' | tail -n2
[Thu Jun 1 14:18:30 2017] sd 17:0:100:0: [sdcr] Attached SCSI disk
[Thu Jun 1 14:18:35 2017] sd 17:0:101:0: [sdcs] Attached SCSI disk
* With the fix, consecutive disks should be discovered within about a second of each other.
$ dmesg -T | grep 'Attached SCSI disk' | tail -n2
[Wed Jun 7 13:11:59 2017] sd 18:0:176:0: [sdly] Attached SCSI disk
[Wed Jun 7 13:11:59 2017] sd 18:0:175:0: [sdlx] Attached SCSI disk
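The interval check above can also be scripted rather than read off by eye. The following is only a sketch: the function name disk_attach_gaps is hypothetical, and it assumes the `dmesg -T` timestamp layout shown in the logs above (time of day in the fourth whitespace-separated field) and at most one midnight rollover between consecutive messages.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original report).
# Reads `dmesg -T` style lines on stdin and prints, for each
# "Attached SCSI disk" message after the first, the gap in seconds
# since the previous one, followed by the disk name.
# Usage: dmesg -T | disk_attach_gaps
disk_attach_gaps() {
    awk '
        /Attached SCSI disk/ {
            # With `dmesg -T`, field 4 is the HH:MM:SS part of the timestamp.
            split($4, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen) {
                gap = now - prev
                if (gap < 0) gap += 86400   # assume a single midnight rollover
                print gap, $(NF - 3)        # e.g. "5 [sdcs]"
            }
            prev = now
            seen = 1
        }'
}
```

Fed the two pre-fix lines above, this prints a gap of 5 seconds for [sdcs]; with the fix applied, the gaps should be 0 or 1.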
[Regression Potential]
* The power status of the disks in the enclosure is no longer
checked at probe time. However, the patch demonstrates that the
initial value was never used in any way, so there is little
regression potential.
* Nonetheless, users of SES enclosures which verify the power status
of disks in the enclosure might _theoretically_ see a problem, iff
the fix has a problem (which has not been found yet).
[Other Info]
* None at this time.
Problem Description:
=======
This week, I went ahead and scaled up my test configuration to max configuration 2x5U84_
Checkpoint #1:
==============
- system reboot around 2pm (14:00)
Checkpoint #2:
==============
- It took several minutes for the first disk to be detected.
root@smb1p1:~# multipath -ll|grep dm |wc -l
103
root@smb1p1:~# dmesg -T | grep 'sd 1[78]:' | grep 'Attached SCSI disk' | tail
[Thu Jun 1 14:18:30 2017] sd 17:0:100:0: [sdcr] Attached SCSI disk
[Thu Jun 1 14:18:35 2017] sd 17:0:101:0: [sdcs] Attached SCSI disk
[Thu Jun 1 14:18:40 2017] sd 17:0:102:0: [sdct] Attached SCSI disk
[Thu Jun 1 14:18:44 2017] sd 17:0:103:0: [sdcu] Attached SCSI disk
[Thu Jun 1 14:18:54 2017] sd 17:0:105:0: [sdcv] Attached SCSI disk
[Thu Jun 1 14:18:59 2017] sd 17:0:106:0: [sdcw] Attached SCSI disk
[Thu Jun 1 14:19:04 2017] sd 17:0:107:0: [sdcx] Attached SCSI disk
[Thu Jun 1 14:19:09 2017] sd 17:0:108:0: [sdcy] Attached SCSI disk
[Thu Jun 1 14:19:14 2017] sd 17:0:109:0: [sdcz] Attached SCSI disk
[Thu Jun 1 14:19:19 2017] sd 17:0:110:0: [sdda] Attached SCSI disk
root@smb1p1:~#
...
root@smb1p1:~# multipath -ll|grep dm |wc -l
142
root@smb1p1:~# dmesg -T | grep 'sd 1[78]:' | grep 'Attached SCSI disk' | tail
[Thu Jun 1 14:21:54 2017] sd 17:0:141:0: [sdee] Attached SCSI disk
[Thu Jun 1 14:21:58 2017] sd 17:0:142:0: [sdef] Attached SCSI disk
[Thu Jun 1 14:22:04 2017] sd 17:0:143:0: [sdeg] Attached SCSI disk
[Thu Jun 1 14:22:08 2017] sd 17:0:144:0: [sdeh] Attached SCSI disk
[Thu Jun 1 14:22:14 2017] sd 17:0:145:0: [sdei] Attached SCSI disk
[Thu Jun 1 14:22:18 2017] sd 17:0:146:0: [sdej] Attached SCSI disk
[Thu Jun 1 14:22:24 2017] sd 17:0:147:0: [sdek] Attached SCSI disk
[Thu Jun 1 14:22:29 2017] sd 17:0:148:0: [sdel] Attached SCSI disk
[Thu Jun 1 14:22:34 2017] sd 17:0:149:0: [sdem] Attached SCSI disk
[Thu Jun 1 14:22:39 2017] sd 17:0:150:0: [sden] Attached SCSI disk
root@smb1p1:~#
...
- After 43 minutes, the multipath -ll command shows some multipath devices with only a single path (no redundancy) and others with multiple paths (redundancy).
root@smb1p1:~# date
Thu Jun 1 14:43:00 CDT 2017
root@smb1p1:~# multipath -ll | grep -c 'sd[a-z]\+'
252
root@smb1p1:~#
...
- After 47 minutes, the multipath -ll command still shows some multipath devices with only a single path (no redundancy).
root@smb1p1:~# multipath -ll | grep -c 'sd[a-z]\+'
288
root@smb1p1:~#
- 51 minutes after the system reboot, all disks appear to have been discovered and the multipath devices are correctly built.
root@smb1p1:~# multipath -ll | grep -c 'sd[a-z]\+'
336
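The grep counts above give the total number of paths, but not which multipath devices are still missing redundancy. A rough way to list those is sketched below. The function name single_path_maps is hypothetical, and the sketch assumes the usual `multipath -ll` layout: a header line per map containing its dm-N node, followed by one line per path with an H:C:T:L address and an sdX name.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original report).
# Reads `multipath -ll` output on stdin and prints the name and path
# count of every map that has fewer than two paths.
# Usage: multipath -ll | single_path_maps
single_path_maps() {
    awk '
        # A map header line names the device-mapper node, e.g. "... dm-3 ...".
        / dm-[0-9]+ / {
            if (name != "" && paths < 2) print name, paths
            name = $1
            paths = 0
            next
        }
        # A path line carries a SCSI H:C:T:L address followed by an sdX name.
        /[0-9]+:[0-9]+:[0-9]+:[0-9]+ +sd[a-z]+ / { paths++ }
        END { if (name != "" && paths < 2) print name, paths }
    '
}
```

Empty output would mean every map has at least two paths, which matches the final state reported above.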
== Comment: #24 - Mauricio Faria De Oliveira - 2017-06-06 11:42:59 ==
Hi Paul,
Per your logs, yes, it's the slowness with the SES driver.
I'll ask Canonical to pick it up for 16.10 and 17.04 so it makes it into 16.04.2 and 16.04.3.
Thanks,
Mauricio
== Comment: #26 - Mauricio Faria De Oliveira <email address hidden> - 2017-06-06 12:06:32 ==
The patch applies cleanly in the master-next branch of ubuntu-zesty.git and ubuntu-yakkety.git.
Mirroring to Canonical to get a LP bug number, required in the submission process.
== Comment: #27 - Mauricio Faria De Oliveira <email address hidden> - 2017-06-06 12:07:58 ==
The commit is [1].
commit 75106523f397513
Author: Mauricio Faria de Oliveira <email address hidden>
Date: Wed Apr 5 12:18:19 2017 -0300
scsi: ses: don't get power status of SES device slot on probe
description: | updated |
description: | updated |
tags: | added: kernel-da-key |
Changed in ubuntu-power-systems: | |
assignee: | nobody → Canonical Kernel Team (canonical-kernel-team) |
Changed in linux (Ubuntu Zesty): | |
status: | New → Fix Committed |
Changed in linux (Ubuntu Yakkety): | |
status: | New → Fix Committed |
Changed in linux (Ubuntu Zesty): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Yakkety): | |
importance: | Undecided → Medium |
Changed in ubuntu-power-systems: | |
status: | New → Fix Committed |
Changed in ubuntu-power-systems: | |
status: | Fix Committed → Fix Released |
Changed in linux (Ubuntu Xenial): | |
importance: | Undecided → Medium |
status: | New → Triaged |
Changed in linux (Ubuntu Xenial): | |
status: | Triaged → Fix Committed |