Multipath performance issues: issues with connect_volume and similar functions

Bug #1443977 reported by Loren Erwin
This bug affects 1 person
Affects             Status         Importance  Assigned to             Milestone
Mirantis OpenStack  Fix Released   High        Andrey Kurilin
  6.0.x             Fix Released   High        Alexander Nevenchannyy
  6.1.x             Fix Committed  High        Andrey Kurilin
  7.0.x             Fix Released   High        Andrey Kurilin

Bug Description

Upstream bugs: https://bugs.launchpad.net/nova/+bug/1277316 and https://bugs.launchpad.net/nova/+bug/1382440

Customer: When multipath is enabled with Nova, a different codepath is
executed for most functions, such as connect_volume and disconnect_volume.

A series of loops exists within connect_volume when multipath is enabled.
An iscsiadm discovery is run against the portal associated with the instance, in our case a portal on the Nimble storage array.
For every IQN discovered, _connect_to_iscsi_portal runs and logs in to the portal.

In our test scenario, with 183 instances, this results in 183*2 IQN targets being discovered and logged in to every single time an instance is launched on that node. When multiple instances are created, this compounds.

Each of these calls can take upwards of 1 second, multiplied by the number of targets.
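
To put a rough number on this, here is a back-of-the-envelope estimate. It assumes each login takes about one second and that logins run one after another; both are simplifications, and the function name is purely illustrative.

```python
def estimated_login_seconds(instances, targets_per_instance=2,
                            seconds_per_login=1.0):
    """Rough serial cost of logging in to every discovered target once."""
    return instances * targets_per_instance * seconds_per_login


# 183 instances, 2 discovered targets each, roughly 1 s per login
per_launch = estimated_login_seconds(183)
print(per_launch)  # 366.0 seconds, about six minutes per instance launch
```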

Additionally, there seem to be other code paths with similar loops in the iSCSI handling.

We may have just addressed the connect_volume loop as well. The IQN of the iSCSI device is always the same, although the portal may be different. iscsi_properties['target_iqn'] contains the target IQN. Instead of looping through every single IQN and logging in to each, the code now only logs in to the IQNs it needs.

For example:

for ip, iqn in self._get_target_portals_from_iscsiadm_output(out):
    if iqn in iscsi_properties['target_iqn']:
        props = iscsi_properties.copy()
        props['target_portal'] = ip
        props['target_iqn'] = iqn
        self._connect_to_iscsi_portal(props)
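
The same filtering idea can be tried outside Nova. The following is a minimal standalone sketch, assuming discovery output lines of the form `<ip>:<port>,<tpgt> <iqn>`; the function names and the sample addresses and IQNs are hypothetical. One deliberate difference from the snippet above: it compares with `==`, because `in` applied to two strings is a substring test and could also match a shorter IQN that happens to be a prefix of the target.

```python
def parse_discovery_output(output):
    """Return (portal, iqn) pairs from discovery-style output.

    Each line is expected to look like '<ip>:<port>,<tpgt> <iqn>'.
    """
    targets = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        portal = parts[0].split(",")[0]  # drop the target portal group tag
        targets.append((portal, parts[1]))
    return targets


def targets_to_login(output, target_iqn):
    """Keep only the portals that export the IQN the volume actually uses."""
    return [(portal, iqn) for portal, iqn in parse_discovery_output(output)
            if iqn == target_iqn]


sample = """\
10.0.0.1:3260,1 iqn.2007-11.com.example:vol-a
10.0.0.2:3260,1 iqn.2007-11.com.example:vol-a
10.0.0.1:3260,1 iqn.2007-11.com.example:vol-b
"""
print(targets_to_login(sample, "iqn.2007-11.com.example:vol-a"))
```

With the sample input, only the two portals exporting `vol-a` remain, so two logins are attempted instead of three.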

Loren Erwin (loren-erwin) wrote :
Mike Scherbakov (mihgen)
Changed in fuel:
milestone: none → 6.1
tags: added: customer-found
no longer affects: fuel
Changed in mos:
milestone: none → 6.1
Roman Podoliaka (rpodolyaka) wrote :
description: updated
tags: added: nova
OSCI Robot (oscirobot) wrote : Fix merged to openstack/nova (openstack-ci/fuel-6.0-updates/2014.2)

Reviewed: https://review.fuel-infra.org/6189
Committed: https://review.fuel-infra.org/gitweb?p=openstack/nova.git;a=commitdiff;h=4495e8d8e7a3052cf04a5ceaf8d01d85c0c6c128
Submitter: mos-infra-ci
Branch: openstack-ci/fuel-6.0-updates/2014.2

commit 4495e8d8e7a3052cf04a5ceaf8d01d85c0c6c128
Author: Alexandr Nevenchannyy <email address hidden>

Fix connecting unnecessary iSCSI sessions issue

In Icehouse with "iscsi_use_multipath=true", attaching a multipath
iSCSI volume may create unnecessary iSCSI sessions.

The iscsiadm discovery command in connect_volume() returns all of the
targets in the Cinder node, not just the ones related to the multipath
volume which is specified by iqn. If the storage has many targets,
connecting to all these volumes will also result in many unnecessary
connections.

There are two types of iSCSI multipath devices: one that shares
the same IQN between multiple portals, and another that uses
different IQNs on different portals. connect_volume() needs to
identify the type by checking the iscsiadm output to see whether
the IQN is used by multiple portals.

This patch changes the behavior of attaching volume:

   1. Identify the type by checking the iscsiadm output.
   2. Connect to the correct targets by connect_to_iscsi_portal().

(cherry picked from commit fb0de106f2f15604750bafc318ba06c41070cc35)

Conflicts:
        nova/tests/unit/virt/libvirt/test_volume.py

Change-Id: I488ad0c09bf26a609e27d67b9ef60b65bb45e0ad

Change-Id: I926f50eaf4ea9384de376f2a4b07cac98f82a3b0
Closes-bug: #1443974
Closes-bug: #1443977
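
The two-step behavior described in the commit message above (identify the device type, then connect only to the right targets) could look roughly like this. This is a sketch of one possible reading of the description, not the patch itself; `select_targets` and the sample portals and IQNs are hypothetical.

```python
def select_targets(discovered, target_iqn):
    """Pick the (portal, iqn) pairs that make up one multipath volume.

    discovered: list of (portal, iqn) pairs taken from the discovery output.
    """
    all_portals = {portal for portal, _ in discovered}
    matching = {portal for portal, iqn in discovered if iqn == target_iqn}
    if matching == all_portals:
        # Type 1: every portal exports the requested IQN, so log in to
        # that IQN only, once per portal.
        return [(portal, target_iqn) for portal in sorted(all_portals)]
    # Type 2: portals use different IQNs, so the requested IQN alone
    # cannot identify all paths; fall back to every discovered pair.
    return list(discovered)


shared = [("10.0.0.1:3260", "iqn.a"), ("10.0.0.2:3260", "iqn.a"),
          ("10.0.0.1:3260", "iqn.b"), ("10.0.0.2:3260", "iqn.b")]
print(select_targets(shared, "iqn.a"))
```

In the shared-IQN case this cuts the logins from four to two; in the per-portal-IQN case nothing is filtered, which is the conservative choice when the paths cannot be told apart by IQN.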

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-6.0.1/2014.2)

Fix proposed to branch: openstack-ci/fuel-6.0.1/2014.2
Change author: Keiichi KII <email address hidden>
Review: https://review.fuel-infra.org/6510

Fuel Devops McRobotson (fuel-devops-robot) wrote :

Fix proposed to branch: openstack-ci/fuel-6.0.1/2014.2
Change author: Nikolas Hermanns <email address hidden>
Review: https://review.fuel-infra.org/6511

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (openstack-ci/fuel-6.1/2014.2)

Reviewed: https://review.fuel-infra.org/6476
Submitter: mos-infra-ci <>
Branch: openstack-ci/fuel-6.1/2014.2

Commit: eb5cbb7fdbf203783fc8c1c463e804910d061d46
Author: Keiichi KII <email address hidden>
Date: Fri May 8 11:41:07 2015

libvirt: optimize multipath call to identify IQN

When detaching a multipath volume in an environment where there are
many attached volumes, excessive multipath calls are generated and
it takes too much time. This issue is due to the fact that a multipath -ll
call against many multipath devices takes a few seconds.

When detaching a volume, the current _disconnect_volume_multipath_iscsi()
calls 'multipath -ll <each iscsi device>' against every iSCSI device.
This is done to identify the IQNs used by the multipath device; however,
the IQNs can be extracted from a single 'multipath -ll' call, without the
many 'multipath -ll <each iscsi device>' calls.

This patch changes the behavior of identifying IQNs used by multipath device:

  1. add a utility to identify IQNs by using multipath device map
     (/dev/sdX => /dev/mapper/XXX) generated by parsing 'multipath -ll'.
  2. replace the current nested for loop to identify the IQNs with
     the utility.

Conflicts:
 nova/tests/unit/virt/libvirt/test_volume.py
 nova/virt/libvirt/volume.py

Partial-Bug: #1443977

Change-Id: I77e6eda950726d7ee9a0d92882d4501e70a0d8f8
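
The device map mentioned in the commit message above (/dev/sdX => /dev/mapper/XXX) can be built from a single `multipath -ll` run. The following is a minimal sketch, not the code from the patch; the regular expressions are assumptions based on the output layout shown elsewhere in this report, and other multipath-tools versions may format lines differently.

```python
import re

# A path line such as "  `- 3:0:0:1 sda 8:0 active ready running"
PATH_RE = re.compile(r"\d+:\d+:\d+:\d+\s+(\S+)\s+\d+:\d+\s")
# A map header such as "33000000100000001 dm-3 IET,VIRTUAL-DISK"
# (an optional "(wwid)" field is allowed after an alias)
MAP_RE = re.compile(r"^(\S+)\s+(?:\(\S+\)\s+)?dm-\d+\s")


def build_device_map(multipath_ll_output):
    """Map each path device (/dev/sdX) to its multipath device."""
    device_map = {}
    current = None
    for line in multipath_ll_output.splitlines():
        header = MAP_RE.match(line)
        if header:
            current = "/dev/mapper/" + header.group(1)
            continue
        path = PATH_RE.search(line)
        if path and current:
            device_map["/dev/" + path.group(1)] = current
    return device_map


sample = """\
33000000100000001 dm-3 IET,VIRTUAL-DISK
size=2.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 3:0:0:1 sda 8:0 active ready running
33000000200000001 dm-4 IET,VIRTUAL-DISK
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 2:0:0:1 sdb 8:16 active ready running
"""
print(build_device_map(sample))
```

A lookup in the resulting dictionary then replaces one `multipath -ll <device>` invocation per iSCSI device.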

Fuel Devops McRobotson (fuel-devops-robot) wrote :

Reviewed: https://review.fuel-infra.org/6477
Submitter: mos-infra-ci <>
Branch: openstack-ci/fuel-6.1/2014.2

Commit: beb72248f444decebddd6017d50321bbf3199926
Author: Nikolas Hermanns <email address hidden>
Date: Fri May 8 11:41:29 2015

LibvirtDriver: Add post_connection_terminated

When multiple iSCSI connections are terminated,
it is very likely that not all
iSCSI devices and multipath devices are deleted
on the host (where nova-compute is running).
This bugfix adds post_connection_terminated
to the libvirt driver, which can be called
after terminate_connection is called.
In the case of iSCSI, it removes the iSCSI
and multipath devices a second time.

Conflicts:
 nova/tests/unit/compute/test_compute.py
 nova/tests/unit/compute/test_compute_mgr.py

Closes-Bug: #1443977
Change-Id: I817214749fdb0da8d577ebd89ebdde4f670006ed

summary: Mutipath performance issues: issues with connect_volume and similar
- fucntions
+ functions
tags: added: on-verification
Kyrylo Romanenko (kromanenko) wrote :

I performed the following operations from the description of the upstream bug.

  1. configure "iscsi_use_multipath=True" in nova.conf on the compute node.
  2. configure "volume_driver=cinder.volume.drivers.lvm.LVMISCSIDriver"
     in cinder.conf on the cinder node.
  3. create an instance.
  4. create 3 volumes and attach them to the instance.
  5. detach one of these volumes.
  6. check "multipath -ll" and "iscsiadm --mode session".
I had to install the iscsiadm and multipath tools before this step.

~# iscsiadm --mode session
iscsiadm: No active sessions.

# multipath -ll
(gave no output)

MOS 6.1 Build 462
http://jenkins-product.srt.mirantis.net:8080/job/6.1.all/462/

Is this kind of behavior expected?

Kyrylo Romanenko (kromanenko) wrote :

Update: reconfiguring the services simply required a reboot of the nodes.

So on the compute node, with one instance running, two volumes attached and one detached, we have:

# multipath -ll
33000000100000001 dm-3 IET,VIRTUAL-DISK
size=2.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 3:0:0:1 sda 8:0 active ready running
33000000200000001 dm-4 IET,VIRTUAL-DISK
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 2:0:0:1 sdb 8:16 active ready running

# iscsiadm --mode session
tcp: [1] 192.168.1.4:3260,1 iqn.2010-10.org.openstack:volume-8e65e22a-0b99-4e28-bca0-e15204fdc648
tcp: [2] 192.168.1.4:3260,1 iqn.2010-10.org.openstack:volume-8a5a8fe8-6a18-4cf1-b510-38854ed84dc4

Kyrylo Romanenko (kromanenko) wrote :

Attached three volumes

 iscsiadm --mode session ; multipath -ll
tcp: [1] 192.168.1.4:3260,1 iqn.2010-10.org.openstack:volume-9289512a-227a-4085-87e6-e69af6574ef8
tcp: [2] 192.168.1.4:3260,1 iqn.2010-10.org.openstack:volume-68502556-ea4f-4947-bd01-83de129640b6
tcp: [3] 192.168.1.4:3260,1 iqn.2010-10.org.openstack:volume-30681ec0-6584-4146-b86f-2f3a577ae123
33000000300000001 dm-5 IET,VIRTUAL-DISK
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 4:0:0:1 sdc 8:32 active ready running
33000000100000001 dm-3 IET,VIRTUAL-DISK
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 2:0:0:1 sda 8:0 active ready running
33000000200000001 dm-4 IET,VIRTUAL-DISK
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 3:0:0:1 sdb 8:16 active ready running

Detached one of them

root@node-5:~# iscsiadm --mode session ; multipath -ll
tcp: [2] 192.168.1.4:3260,1 iqn.2010-10.org.openstack:volume-68502556-ea4f-4947-bd01-83de129640b6
tcp: [3] 192.168.1.4:3260,1 iqn.2010-10.org.openstack:volume-30681ec0-6584-4146-b86f-2f3a577ae123
33000000300000001 dm-5 IET,VIRTUAL-DISK
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 4:0:0:1 sdc 8:32 active ready running
33000000100000001 dm-3 ,
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  `- #:#:#:# - #:# failed faulty running
33000000200000001 dm-4 IET,VIRTUAL-DISK
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 3:0:0:1 sdb 8:16 active ready running

Kyrylo Romanenko (kromanenko) wrote :

On another env:

3 volumes attached to instance:

root@node-7:~# iscsiadm --mode session ; multipath -ll
tcp: [2] 192.168.1.4:3260,1 iqn.2010-10.org.openstack:volume-140028ac-ded0-4508-8843-e6b7033ae5a8
tcp: [3] 192.168.1.4:3260,1 iqn.2010-10.org.openstack:volume-44fd8a87-0548-4bea-9edc-791194476d02
tcp: [4] 192.168.1.4:3260,1 iqn.2010-10.org.openstack:volume-d87fef38-ebdc-47ba-b2af-93dcb556f8cd
33000000300000001 dm-5 IET,VIRTUAL-DISK
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 5:0:0:1 sdc 8:32 active ready running
33000000100000001 dm-3 IET,VIRTUAL-DISK
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 3:0:0:1 sda 8:0 active ready running
33000000200000001 dm-4 IET,VIRTUAL-DISK
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 4:0:0:1 sdb 8:16 active ready running

Detached one of them:

root@node-7:~# iscsiadm --mode session ; multipath -ll
tcp: [3] 192.168.1.4:3260,1 iqn.2010-10.org.openstack:volume-44fd8a87-0548-4bea-9edc-791194476d02
tcp: [4] 192.168.1.4:3260,1 iqn.2010-10.org.openstack:volume-d87fef38-ebdc-47ba-b2af-93dcb556f8cd
33000000300000001 dm-5 IET,VIRTUAL-DISK
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 5:0:0:1 sdc 8:32 active ready running
33000000100000001 dm-3 ,
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  `- #:#:#:# - #:# active faulty running
33000000200000001 dm-4 IET,VIRTUAL-DISK
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 4:0:0:1 sdb 8:16 active ready running

Detached all volumes

root@node-7:~# iscsiadm --mode session ; multipath -ll
iscsiadm: No active sessions.
33000000300000001 dm-5 ,
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  `- #:#:#:# - #:# failed faulty running
33000000100000001 dm-3 ,
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  `- #:#:#:# - #:# failed faulty running
33000000200000001 dm-4 ,
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  `- #:#:#:# - #:# failed faulty running

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  openstack_version: "2014.2.2-6.1"
  api: "1.0"
  build_number: "462"
  build_id: "2015-05-24_15-51-50"
  nailgun_sha: "76441596e4fe6420cc7819427662fa244e150177"
  python-fuelclient_sha: "e19f1b65792f84c4a18b5a9473f85ef3ba172fce"
  astute_sha: "0bd72c72369e743376864e8e8dabfe873d40450a"
  fuel-library_sha: "889c2534ceadf8afd5d1540c1cadbd913c0c8c14"
  fuel-ostf_sha: "9a5f55602c260d6c840c8333d8f32ec8cfa65c1f"
  fuelmain_sha: "5c8ebddf64ea93000af2de3ccdb4aa8bb766ce93"

Ubuntu deployment, 1 controller, 1 compute, 1 cinder lvm
Neutron with VLAN segmentation.

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-7.0/2015.1.0)

Fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Keiichi KII <email address hidden>
Review: https://review.fuel-infra.org/8263

Fuel Devops McRobotson (fuel-devops-robot) wrote :

Fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Nikolas Hermanns <email address hidden>
Review: https://review.fuel-infra.org/8264

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (openstack-ci/fuel-7.0/2015.1.0)

Reviewed: https://review.fuel-infra.org/8263
Submitter: mos-infra-ci <>
Branch: openstack-ci/fuel-7.0/2015.1.0

Commit: c593c1feb590d8e17e50ebd0c3e421770a6aa8dc
Author: Keiichi KII <email address hidden>
Date: Thu Jul 16 08:22:44 2015

libvirt: optimize multipath call to identify IQN

When detaching a multipath volume in an environment where there are
many attached volumes, excessive multipath calls are generated and
it takes too much time. This issue is due to the fact that a multipath -ll
call against many multipath devices takes a few seconds.

When detaching a volume, the current _disconnect_volume_multipath_iscsi()
calls 'multipath -ll <each iscsi device>' against every iSCSI device.
This is done to identify the IQNs used by the multipath device; however,
the IQNs can be extracted from a single 'multipath -ll' call, without the
many 'multipath -ll <each iscsi device>' calls.

This patch changes the behavior of identifying IQNs used by multipath device:

  1. add a utility to identify IQNs by using multipath device map
     (/dev/sdX => /dev/mapper/XXX) generated by parsing 'multipath -ll'.
  2. replace the current nested for loop to identify the IQNs with
     the utility.

Co-Authored-By: Pavel Kholkin <email address hidden>

Change-Id: I77e6eda950726d7ee9a0d92882d4501e70a0d8f8
Closes-Bug: #1443977

Fuel Devops McRobotson (fuel-devops-robot) wrote :

Reviewed: https://review.fuel-infra.org/8264
Submitter: mos-infra-ci <>
Branch: openstack-ci/fuel-7.0/2015.1.0

Commit: 21bd974374f1212c25642f361476a6feb05eed6b
Author: Nikolas Hermanns <email address hidden>
Date: Thu Jul 16 08:22:47 2015

libvirt: Add post_connection_terminated

When multiple iSCSI connections are terminated,
it is very likely that not all
iSCSI devices and multipath devices are deleted
on the host (where nova-compute is running).
This bugfix adds post_connection_terminated
to the libvirt driver, which can be called
after terminate_connection is called.
In the case of iSCSI, it removes the iSCSI
and multipath devices a second time.

Co-Authored-By: Pavel Kholkin <email address hidden>
Co-Authored-By: Sergey Nikitin <email address hidden>

DocImpact
Closes-Bug: #1443977

Change-Id: I817214749fdb0da8d577ebd89ebdde4f670006ed

tags: removed: on-verification
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-8.0/liberty)

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Keiichi KII <email address hidden>
Review: https://review.fuel-infra.org/13281

Fuel Devops McRobotson (fuel-devops-robot) wrote :

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Nikolas Hermanns <email address hidden>
Review: https://review.fuel-infra.org/13291

tags: added: on-verification
Kyrylo Romanenko (kromanenko) wrote :

Verification of bug blocked by new issue:
Can not attach volume to instance when iscsi multipath enabled
https://bugs.launchpad.net/mos/+bug/1513145

Kyrylo Romanenko (kromanenko) wrote :

Steps:
1. Deploy environment of released MOS 7.0 (MirantisOpenStack-7.0.iso) and install MU1 updates repos.
Settings:
Kilo on Ubuntu 14.04
1 controller, 1 compute, 1 cinder node.
Cinder LVM over iSCSI for volumes, Neutron VLAN, QEMU

2. SSH to compute node.
3. Configure "iscsi_use_multipath=True" in [libvirt] section of /etc/nova/nova.conf on compute node.
4. Then restart nova-compute service.
5. Launch an instance.
6. Create 3 volumes and attach them to instance.
7. Check iscsiadm and multipath:
root@node-5:/var/log/nova# iscsiadm --mode session ; multipath -ll
tcp: [1] 192.168.1.1:3260,1 iqn.2010-10.org.openstack:volume-022fbe38-6605-4033-9214-f5fe0ae99821
tcp: [2] 192.168.1.1:3260,1 iqn.2010-10.org.openstack:volume-0b632367-4a2d-4252-9fa7-9a3fcd73efda
tcp: [3] 192.168.1.1:3260,1 iqn.2010-10.org.openstack:volume-241b85e5-f87a-4902-bc2e-7efb67d7e93e
33000000300000001 dm-5 IET,VIRTUAL-DISK
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 4:0:0:1 sdc 8:32 active ready running
33000000100000001 dm-3 IET,VIRTUAL-DISK
size=3.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 2:0:0:1 sda 8:0 active ready running
33000000200000001 dm-4 IET,VIRTUAL-DISK
size=2.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 3:0:0:1 sdb 8:16 active ready running

8. Detach volumes from instance one by one.
9. Recheck iscsiadm and multipath:
root@node-5:/var/log/nova# iscsiadm --mode session ; multipath -ll
iscsiadm: No active sessions.
33000000300000001 dm-5 ,
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  `- #:#:#:# - #:# failed faulty running
33000000100000001 dm-3 ,
size=3.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  `- #:#:#:# - #:# failed faulty running
33000000200000001 dm-4 ,
size=2.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  `- #:#:#:# - #:# failed faulty running

As we can see, the iSCSI sessions were closed, but the volumes remain stuck in the multipath list in a faulty state.
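
The check done by eye above could be scripted. Below is a minimal sketch, assuming the `multipath -ll` layout shown in this report; the function name and regular expressions are illustrative and are not part of Nova or os-brick. It reports maps whose path lines no longer carry a SCSI address and device name, which is how the leftover maps appear in the output above.

```python
import re

# Map header, e.g. "33000000100000001 dm-3 ," (vendor/product may be empty)
MAP_RE = re.compile(r"^(\S+)\s+(?:\(\S+\)\s+)?(dm-\d+)\s")
# Path line that still has an address, e.g. "`- 5:0:0:1 sdc 8:32 active ..."
LIVE_PATH_RE = re.compile(r"\d+:\d+:\d+:\d+\s+\S+\s+\d+:\d+\s")


def stale_maps(multipath_ll_output):
    """Return names of multipath maps that list no addressable path."""
    live_paths = {}
    current = None
    for line in multipath_ll_output.splitlines():
        header = MAP_RE.match(line)
        if header:
            current = header.group(1)
            live_paths[current] = 0
        elif current and LIVE_PATH_RE.search(line):
            live_paths[current] += 1
    return [name for name, count in live_paths.items() if count == 0]


sample = """\
33000000300000001 dm-5 IET,VIRTUAL-DISK
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 5:0:0:1 sdc 8:32 active ready running
33000000100000001 dm-3 ,
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  `- #:#:#:# - #:# failed faulty running
"""
print(stale_maps(sample))
```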

Kyrylo Romanenko (kromanenko) wrote :

Note that open-iscsi and multipath-tools should be installed on the compute node.

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/nova (openstack-ci/fuel-8.0/liberty)

Change abandoned by Roman Podoliaka <email address hidden> on branch: openstack-ci/fuel-8.0/liberty
Review: https://review.fuel-infra.org/13281
Reason: Already in os-brick in stable/liberty - https://review.openstack.org/#/c/190864/1

Alexander Gubanov (ogubanov) wrote :

I've verified it on MOS 7.0 with MU1; the bug did not reproduce.
ENV: neutron vlan / 3 controllers, 2 compute / ceph for all
Proof: http://pastebin.com/i1ZtZBfj

Changed in mos:
status: Fix Committed → Fix Released
Fuel Devops McRobotson (fuel-devops-robot) wrote :

Change abandoned by Roman Podoliaka <email address hidden> on branch: openstack-ci/fuel-8.0/liberty
Review: https://review.fuel-infra.org/13291
Reason: This never made it to upstream or was tested in any real environments. It potentially breaks more things than it fixes. Looks like os-brick currently handles this better and we don't really need this patch.
