first iscsi device attach on host fails with multipath enabled

Bug #1854159 reported by Lars
This bug affects 2 people
Affects: os-brick
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

On the first iSCSI attachment on a host (e.g. a newly deployed VM), the multipath device gets attached with only one path[1]. All subsequent attachments work fine; it even works if I delete the first VM with that iSCSI device and recreate it.

It looks like there is a race condition in the _connect_to_iscsi_portal[2] function, which gets called from the _connect_vol[3] function; that function is in turn called for each path (4 paths) in a different thread[4].

I currently do not understand why this behaviour only occurs the first time, but I was able to work around the problem by adding a random millisecond delay in front of that line[5] (a rough sketch of this workaround follows below). It looks like the first "iscsiadm -m node -T <target> -p <ip>:<port> --interface default --op new" call must not run at the same time as a second call.

And because this thread never finishes, it is stuck in the waiting loop forever: https://github.com/openstack/os-brick/blob/81f26f822d66c71c29ea25fd4158ac41fc162964/os_brick/initiator/connectors/iscsi.py#L733

[1]
[root@node01:~] 1 # multipath -ll
3624a9370d6c5f98b3e94412345678905 dm-3 PURE,FlashArray
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  `- 1:0:0:1 sdc 8:32 active ready running

[2]
https://github.com/openstack/os-brick/blob/81f26f822d66c71c29ea25fd4158ac41fc162964/os_brick/initiator/connectors/iscsi.py#L1021

[3]
https://github.com/openstack/os-brick/blob/81f26f822d66c71c29ea25fd4158ac41fc162964/os_brick/initiator/connectors/iscsi.py#L592

[4]
https://github.com/openstack/os-brick/blob/81f26f822d66c71c29ea25fd4158ac41fc162964/os_brick/initiator/connectors/iscsi.py#L721

[5]
https://github.com/openstack/os-brick/blob/81f26f822d66c71c29ea25fd4158ac41fc162964/os_brick/initiator/connectors/iscsi.py#L1037
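
For illustration only, here is a minimal standalone sketch of the workaround described above. The function name connect_single_path and its parameters are hypothetical stand-ins for the per-path worker in os-brick (_connect_vol / _connect_to_iscsi_portal); this is not the actual os-brick code.

# Illustrative sketch only -- not the actual os-brick code.
import random
import subprocess
import time


def connect_single_path(portal, target_iqn):
    """Per-path worker, run in one thread per path, mirroring _connect_vol()."""
    # Workaround described above: stagger the threads by a few random
    # milliseconds so the first "--op new" calls do not run at the same
    # time on a freshly deployed host.
    time.sleep(random.uniform(0.001, 0.1))

    # Create the node record -- the call that appears to race on the
    # very first attachment.
    subprocess.run(
        ['iscsiadm', '-m', 'node', '-T', target_iqn,
         '-p', portal, '--interface', 'default', '--op', 'new'],
        check=True)

    # ... login and wait for the device, as os-brick then does.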

Revision history for this message
Lars (l4rs) wrote :

We have an Ubuntu 18.04 environment using the Ubuntu Cloud Archive repository for Stein, running everything under Python 3. The exact versions we have are:

os-brick: 2.8.1-0ubuntu1~cloud0
cinder: 1:4.1.0-0ubuntu1~cloud0

We even tried the os-brick package version 2.10.x and the open-iscsi package from the Ubuntu Eoan repository, unfortunately without any luck.

We are using a Pure Storage array for our iSCSI devices.

Revision history for this message
do3meli (d-info-e) wrote :

The same behaviour is observed with the Ubuntu Cloud Archive repositories for the Train release, with the following packages:

os-brick: 2.10.0-0ubuntu1~cloud0
cinder: 1:5.0.0-0ubuntu2~cloud0

Revision history for this message
Gorka Eguileor (gorka) wrote :

This is most likely yet another race condition within the Open-iSCSI persistent configuration database code.

The reason this only happens on the first connection is probably that the Pure driver is sharing the same target portal, so once the first login succeeds the rest are just a simple LUN scan.
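
If the race really is in the persistent configuration database, one way to picture a fix is to serialize the node-record creation across the per-path threads. The following is only an illustrative sketch (os-brick has its own synchronization helpers; create_node_record and _NODE_DB_LOCK are hypothetical names):

# Illustrative sketch only -- not os-brick code.
import subprocess
import threading

# One process-wide lock guarding every node database update.
_NODE_DB_LOCK = threading.Lock()


def create_node_record(portal, target_iqn):
    """Create the node record for one portal, one caller at a time."""
    with _NODE_DB_LOCK:
        subprocess.run(
            ['iscsiadm', '-m', 'node', '-T', target_iqn,
             '-p', portal, '--interface', 'default', '--op', 'new'],
            check=True)

With something like this, the per-path threads would still log in concurrently; only the writes to the node database would be serialized.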

Revision history for this message
Glenn Satchell (7-glenn) wrote :

We are seeing this too, on Ubuntu Focal with OpenStack Ussuri. These are new servers and storage installed in the past 2 months.

os-brick-common 3.0.1-0ubuntu1.3
cinder-volume 2:16.2.1.7.g32855b7f9+focal-1

open-iscsi 2.0.874-7.1ubuntu6.2

Storage is IBM FS7200 using volume_driver=cinder.volume.drivers.ibm.storwize_svc.storwize_svc_iscsi.StorwizeSVCISCSIDriver

Revision history for this message
Glenn Satchell (7-glenn) wrote :

On each compute node we have 2 nodes with 4 paths each, giving a total of 8 paths.

Revision history for this message
do3meli (d-info-e) wrote :

Meanwhile we have upgraded our environment to Victoria on Ubuntu 20.04 with the Cloud Archive repositories, and we still see this behaviour.

os-brick: 4.0.1-0ubuntu1~cloud0
cinder: 2:17.1.0-0ubuntu1~cloud0
