Activity log for bug #1807723

Date Who What changed Old value New value Message
2018-12-10 15:43:48 Matt Riedemann bug added bug
2018-12-10 15:58:11 Matt Riedemann description This was found while debugging the tempest.api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap failures in the (multinode) tempest-slow job here: https://review.openstack.org/#/c/606981/ The failure ultimately happens during teardown of the test class, when deleting the "old" multiattach volume fails: http://logs.openstack.org/81/606981/4/check/tempest-slow/fafde23/controller/logs/screen-c-vol.txt.gz?level=TRACE#_Dec_08_01_45_41_380161

Dec 08 01:45:41.380161 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: ERROR cinder.volume.targets.tgt [req-4aa23d64-309a-44a4-a7c9-381a6fbef330 req-f47e0ea9-7e80-4481-bce1-7a1060c6728b tempest-TestMultiAttachVolumeSwap-1803326020 None] Failed to remove iscsi target for Volume ID: 2a48ae08-2833-488c-8771-eea86f41c7b8: Unexpected error while running command.
Dec 08 01:45:41.380475 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgt-admin --force --delete iqn.2010-10.org.openstack:volume-2a48ae08-2833-488c-8771-eea86f41c7b8
Dec 08 01:45:41.380783 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: Exit code: 22
Dec 08 01:45:41.381094 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: Stdout: u'Command:\n\ttgtadm -C 0 --mode target --op delete --tid=1\nexited with code: 22.\n'
Dec 08 01:45:41.381386 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: Stderr: u'tgtadm: this target is still active\n': ProcessExecutionError: Unexpected error while running command.
Dec 08 01:45:49.587081 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: ERROR cinder.volume.targets.tgt [None req-9dddc5b9-7e06-4ab9-83a6-ef211e0d4c48 tempest-TestMultiAttachVolumeSwap-1803326020 None] Failed to remove iscsi target for Volume ID: 2a48ae08-2833-488c-8771-eea86f41c7b8: Unexpected error while running command.
Dec 08 01:45:49.587355 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgt-admin --force --delete iqn.2010-10.org.openstack:volume-2a48ae08-2833-488c-8771-eea86f41c7b8

After digging through the logs, I can see at least one set of issues in the ComputeManager.swap_volume code here: https://github.com/openstack/nova/blob/ae3064b7a820ea02f7fc8a1aa4a41f35a06534f1/nova/compute/manager.py#L5798-L5806 That code updates the "old" BDM with values from the new volume, including changing bdm.volume_id from the old volume ID to the new one, which likely orphans the old record or breaks the cleanup.
My detailed notes follow: server1: {"server": {"OS-EXT-STS:task_state": null, "addresses": {}, "links": [{"href": "https://149.202.185.218/compute/v2.1/servers/d46fba31-9469-4799-b2bf-1fbad4369a9a", "rel": "self"}, {"href": "https://149.202.185.218/compute/servers/d46fba31-9469-4799-b2bf-1fbad4369a9a", "rel": "bookmark"}], "image": {"id": "f55b4d12-f159-453f-86cb-8965684121c8", "links": [{"href": "https://149.202.185.218/compute/images/f55b4d12-f159-453f-86cb-8965684121c8", "rel": "bookmark"}]}, "OS-EXT-STS:vm_state": "active", "OS-SRV-USG:launched_at": "2018-12-08T01:43:11.000000", "flavor": {"ephemeral": 0, "ram": 64, "original_name": "m1.nano", "vcpus": 1, "extra_specs": {}, "swap": 0, "disk": 0}, "id": "d46fba31-9469-4799-b2bf-1fbad4369a9a", "description": null, "user_id": "9bc61597b6794ca49c79ce599c26b223", "OS-DCF:diskConfig": "MANUAL", "accessIPv4": "", "accessIPv6": "", "progress": 0, "OS-EXT-STS:power_state": 1, "OS-EXT-AZ:availability_zone": "nova", "config_drive": "", "status": "ACTIVE", "updated": "2018-12-08T01:43:11Z", "hostId": "f4600b0fc614ce95d5be30fc1fc82bdb150c47ff8c84b22262be6d19", "OS-SRV-USG:terminated_at": null, "tags": [], "key_name": null, "locked": false, "name": "tempest-TestMultiAttachVolumeSwap-server-226416218-1", "created": "2018-12-08T01:43:04Z", "tenant_id": "5db98765c5d94a73bae8aba2f13c7762", "os-extended-volumes:volumes_attached": [], "metadata": {}}} server1.hostId = f4600b0fc614ce95d5be30fc1fc82bdb150c47ff8c84b22262be6d19 server1 builds on the subnode (ubuntu-xenial-ovh-gra1-0001066278) server2: {"server": {"OS-EXT-STS:task_state": null, "addresses": {}, "links": [{"href": "https://149.202.185.218/compute/v2.1/servers/e0b74695-f9a0-4f4c-8a08-a0de3420cc6c", "rel": "self"}, {"href": "https://149.202.185.218/compute/servers/e0b74695-f9a0-4f4c-8a08-a0de3420cc6c", "rel": "bookmark"}], "image": {"id": "f55b4d12-f159-453f-86cb-8965684121c8", "links": [{"href": "https://149.202.185.218/compute/images/f55b4d12-f159-453f-86cb-8965684121c8", "rel": "bookmark"}]}, "OS-EXT-STS:vm_state": "active", "OS-SRV-USG:launched_at": "2018-12-08T01:43:10.000000", "flavor": {"ephemeral": 0, "ram": 64, "original_name": "m1.nano", "vcpus": 1, "extra_specs": {}, "swap": 0, "disk": 0}, "id": "e0b74695-f9a0-4f4c-8a08-a0de3420cc6c", "description": null, "user_id": "9bc61597b6794ca49c79ce599c26b223", "OS-DCF:diskConfig": "MANUAL", "accessIPv4": "", "accessIPv6": "", "progress": 0, "OS-EXT-STS:power_state": 1, "OS-EXT-AZ:availability_zone": "nova", "config_drive": "", "status": "ACTIVE", "updated": "2018-12-08T01:43:11Z", "hostId": "5f1a8abdf427d6cbda8003bd178ef40c605f24f626ec88c3a02816df", "OS-SRV-USG:terminated_at": null, "tags": [], "key_name": null, "locked": false, "name": "tempest-TestMultiAttachVolumeSwap-server-226416218-2", "created": "2018-12-08T01:43:04Z", "tenant_id": "5db98765c5d94a73bae8aba2f13c7762", "os-extended-volumes:volumes_attached": [], "metadata": {}}} server2.hostId = 5f1a8abdf427d6cbda8003bd178ef40c605f24f626ec88c3a02816df server2 builds on the primary node (ubuntu-xenial-ovh-gra1-0001066161) So clearly the two servers are on different hosts. Also, volume1 is on the subnode (same as server1). 
attach volume1 to server1: Body: {"volumeAttachment": {"device": "/dev/vdb", "serverId": "d46fba31-9469-4799-b2bf-1fbad4369a9a", "id": "2a48ae08-2833-488c-8771-eea86f41c7b8", "volumeId": "2a48ae08-2833-488c-8771-eea86f41c7b8"}} _log_request_full tempest/lib/common/rest_client.py:437 attach volume1 to server2: Body: {"volumeAttachment": {"device": "/dev/vdb", "serverId": "e0b74695-f9a0-4f4c-8a08-a0de3420cc6c", "id": "2a48ae08-2833-488c-8771-eea86f41c7b8", "volumeId": "2a48ae08-2833-488c-8771-eea86f41c7b8"}} _log_request_full tempest/lib/common/rest_client.py:437 swap volume1 (2a48ae08-2833-488c-8771-eea86f41c7b8) to volume2 (fa56e173-3bd4-4701-bc71-cb4810c53c51) on server1 (e0b74695-f9a0-4f4c-8a08-a0de3420cc6c): 2018-12-08 01:43:28.336 8815 INFO tempest.lib.common.rest_client [req-b6cb2f89-ebf6-47b5-9757-8692dd91c812 ] Request (TestMultiAttachVolumeSwap:test_volume_swap_with_multiattach): 202 PUT https://149.202.185.218/compute/v2.1/servers/d46fba31-9469-4799-b2bf-1fbad4369a9a/os-volume_attachments/2a48ae08-2833-488c-8771-eea86f41c7b8 0.618s 2018-12-08 01:43:28.337 8815 DEBUG tempest.lib.common.rest_client [req-b6cb2f89-ebf6-47b5-9757-8692dd91c812 ] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>', 'X-OpenStack-Nova-API-Version': '2.60'} Body: {"volumeAttachment": {"volumeId": "fa56e173-3bd4-4701-bc71-cb4810c53c51"}} Response - Headers: {u'x-openstack-request-id': 'req-b6cb2f89-ebf6-47b5-9757-8692dd91c812', u'content-length': '0', u'vary': 'OpenStack-API-Version,X-OpenStack-Nova-API-Version', u'server': 'Apache/2.4.18 (Ubuntu)', u'x-compute-request-id': 'req-b6cb2f89-ebf6-47b5-9757-8692dd91c812', u'content-type': 'application/json', 'content-location': 'https://149.202.185.218/compute/v2.1/servers/d46fba31-9469-4799-b2bf-1fbad4369a9a/os-volume_attachments/2a48ae08-2833-488c-8771-eea86f41c7b8', u'date': 'Sat, 08 Dec 2018 01:43:27 GMT', 'status': '202', u'openstack-api-version': 'compute 2.60', u'connection': 'close', u'x-openstack-nova-api-version': '2.60'} now the test waits for the swap to complete and the attachments are all recorded correctly; once the swap is done, volume1 should be attached to server2 and volume2 should be attached to server1 This is the connection_info for the old/new volume during the swap: Dec 08 01:43:31.321263 ubuntu-xenial-ovh-gra1-0001066278 nova-compute[20206]: DEBUG nova.compute.manager [None req-b6cb2f89-ebf6-47b5-9757-8692dd91c812 tempest-TestMultiAttachVolumeSwap-477435682 tempest-TestMultiAttachVolumeSwap-477435682] [instance: d46fba31-9469-4799-b2bf-1fbad4369a9a] swap_volume: Calling driver volume swap with connection infos: new: {'status': u'reserved', 'multiattach': True, 'detached_at': u'', u'volume_id': u'fa56e173-3bd4-4701-bc71-cb4810c53c51', 'attach_mode': u'null', 'driver_volume_type': u'iscsi', 'instance': u'd46fba31-9469-4799-b2bf-1fbad4369a9a', 'attached_at': u'', 'serial': u'fa56e173-3bd4-4701-bc71-cb4810c53c51', 'data': {u'access_mode': u'rw', u'target_discovered': False, u'encrypted': False, u'qos_specs': None, u'target_iqn': u'iqn.2010-10.org.openstack:volume-fa56e173-3bd4-4701-bc71-cb4810c53c51', u'target_portal': u'213.32.74.238:3260', u'volume_id': u'fa56e173-3bd4-4701-bc71-cb4810c53c51', u'target_lun': 1, u'auth_password': u'***', u'auth_username': u'oAJ38A88Rom7xrwQ5A4Y', u'auth_method': u'CHAP'}}; old: {u'status': u'reserved', u'instance': u'd46fba31-9469-4799-b2bf-1fbad4369a9a', u'multiattach': True, u'attached_at': u'', u'attach_mode': u'null', u'driver_volume_type': 
u'iscsi', u'detached_at': u'', u'volume_id': u'2a48ae08-2833-488c-8771-eea86f41c7b8', u'serial': u'2a48ae08-2833-488c-8771-eea86f41c7b8', u'data': {u'device_path': u'/dev/sda', u'target_discovered': False, u'encrypted': False, u'qos_specs': None, u'target_iqn': u'iqn.2010-10.org.openstack:volume-2a48ae08-2833-488c-8771-eea86f41c7b8', u'target_portal': u'149.202.185.218:3260', u'volume_id': u'2a48ae08-2833-488c-8771-eea86f41c7b8', u'auth_password': u'***', u'target_lun': 1, u'access_mode': u'rw', u'auth_username': u'HMHytc239biDCT6NTgMa', u'auth_method': u'CHAP'}} {{(pid=20206) _swap_volume /opt/stack/nova/nova/compute/manager.py:5646}} When the test cleans up and tries to delete volume1, it fails trying to cleanup iqn.2010-10.org.openstack:volume-2a48ae08-2833-488c-8771-eea86f41c7b8 from the subnode (which is the old connection to server1). volume1 connection is supposed to be detached from server1 here: http://logs.openstack.org/81/606981/4/check/tempest-slow/fafde23/compute1/logs/screen-n-cpu.txt.gz#_Dec_08_01_45_36_699753 Dec 08 01:45:36.699753 ubuntu-xenial-ovh-gra1-0001066278 nova-compute[20206]: DEBUG os_brick.initiator.connectors.iscsi [None req-b6cb2f89-ebf6-47b5-9757-8692dd91c812 tempest-TestMultiAttachVolumeSwap-477435682 tempest-TestMultiAttachVolumeSwap-477435682] ==> disconnect_volume: call u"{'args': (<os_brick.initiator.connectors.iscsi.ISCSIConnector object at 0x7fe1aa096d50>, {u'device_path': u'/dev/sda', u'target_discovered': False, u'encrypted': False, u'qos_specs': None, u'target_iqn': u'iqn.2010-10.org.openstack:volume-2a48ae08-2833-488c-8771-eea86f41c7b8', u'target_portal': u'149.202.185.218:3260', u'volume_id': u'2a48ae08-2833-488c-8771-eea86f41c7b8', u'auth_password': u'***', u'target_lun': 1, u'access_mode': u'rw', u'auth_username': u'HMHytc239biDCT6NTgMa', u'auth_method': u'CHAP'}, None), 'kwargs': {}}" {{(pid=20206) trace_logging_wrapper /usr/local/lib/python2.7/dist-packages/os_brick/utils.py:146}} It says it logged out: Dec 08 01:45:37.336156 ubuntu-xenial-ovh-gra1-0001066278 nova-compute[20206]: Logout of [sid: 1, target: iqn.2010-10.org.openstack:volume-2a48ae08-2833-488c-8771-eea86f41c7b8, portal: 149.202.185.218,3260] successful. 
This definitely looks wrong: after the swap, the old BDM record for volume1 (2a48ae08-2833-488c-8771-eea86f41c7b8) is updated with the volume2 (fa56e173-3bd4-4701-bc71-cb4810c53c51) connection_info. For the new volume attach flow the connection_info might not matter, since we likely don't use it (we just work with the attachment records), but if the attachment_id (769b75cc-4283-4b46-a975-243faae6e263) is for the wrong volume, that could be a problem:

Dec 08 01:45:37.699221 ubuntu-xenial-ovh-gra1-0001066278 nova-compute[20206]: DEBUG nova.compute.manager [None req-b6cb2f89-ebf6-47b5-9757-8692dd91c812 tempest-TestMultiAttachVolumeSwap-477435682 tempest-TestMultiAttachVolumeSwap-477435682] [instance: d46fba31-9469-4799-b2bf-1fbad4369a9a] swap_volume: Updating volume 2a48ae08-2833-488c-8771-eea86f41c7b8 BDM record with {'attachment_id': u'769b75cc-4283-4b46-a975-243faae6e263', 'no_device': None, 'connection_info': '{"status": "reserved", "multiattach": true, "detached_at": "", "volume_id": "fa56e173-3bd4-4701-bc71-cb4810c53c51", "attach_mode": "null", "driver_volume_type": "iscsi", "instance": "d46fba31-9469-4799-b2bf-1fbad4369a9a", "attached_at": "", "serial": "fa56e173-3bd4-4701-bc71-cb4810c53c51", "data": {"access_mode": "rw", "target_discovered": false, "encrypted": false, "qos_specs": null, "target_iqn": "iqn.2010-10.org.openstack:volume-fa56e173-3bd4-4701-bc71-cb4810c53c51", "target_portal": "213.32.74.238:3260", "volume_id": "fa56e173-3bd4-4701-bc71-cb4810c53c51", "target_lun": 1, "device_path": "/dev/sdb", "auth_password": "Ko6DLxWNfrbqWQ7E", "auth_username": "oAJ38A88Rom7xrwQ5A4Y", "auth_method": "CHAP"}}', 'volume_id': u'fa56e173-3bd4-4701-bc71-cb4810c53c51', 'source_type': 'volume', 'snapshot_id': None, 'destination_type': 'volume'} {{(pid=20206) swap_volume /opt/stack/nova/nova/compute/manager.py:5819}}

Probably the bigger problem is that the BDM for volume1 is updated to have volume_id fa56e173-3bd4-4701-bc71-cb4810c53c51. Changing the volume_id on volume1's BDM may be what orphans volume1's connections when we later try to delete it, since deleting server2 (which still has volume1 attached) likely doesn't clean up volume1 properly.

The other things I'm wondering about are:
1. Is nova-compute actually deleting the old volume1/server1 attachment record (https://github.com/openstack/nova/blob/ae3064b7a820ea02f7fc8a1aa4a41f35a06534f1/nova/compute/manager.py#L5691)? I didn't see anything about an attachment delete in the logs.
2. Is there a periodic task in the cinder-volume service that recreates the target connection for volume1 and server1 between the time we swap the volume and nova-compute deletes the connection, but before nova-compute deletes the old attachment record?

The first and most obvious thing to do is fix that BDM update code at the end of the ComputeManager.swap_volume method.
I'm not exactly sure yet how the multi-host nature of this comes into play, except possibly in the libvirt driver volume disconnect code here: https://github.com/openstack/nova/blob/ae3064b7a820ea02f7fc8a1aa4a41f35a06534f1/nova/virt/libvirt/driver.py#L1221

This was found while debugging the tempest.api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap failures in the (multinode) tempest-slow job here: https://review.openstack.org/#/c/606981/ The failure ultimately happens during teardown of the test class, when deleting the "old" multiattach volume fails: http://logs.openstack.org/81/606981/4/check/tempest-slow/fafde23/controller/logs/screen-c-vol.txt.gz?level=TRACE#_Dec_08_01_45_41_380161

Dec 08 01:45:41.380161 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: ERROR cinder.volume.targets.tgt [req-4aa23d64-309a-44a4-a7c9-381a6fbef330 req-f47e0ea9-7e80-4481-bce1-7a1060c6728b tempest-TestMultiAttachVolumeSwap-1803326020 None] Failed to remove iscsi target for Volume ID: 2a48ae08-2833-488c-8771-eea86f41c7b8: Unexpected error while running command.
Dec 08 01:45:41.380475 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgt-admin --force --delete iqn.2010-10.org.openstack:volume-2a48ae08-2833-488c-8771-eea86f41c7b8
Dec 08 01:45:41.380783 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: Exit code: 22
Dec 08 01:45:41.381094 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: Stdout: u'Command:\n\ttgtadm -C 0 --mode target --op delete --tid=1\nexited with code: 22.\n'
Dec 08 01:45:41.381386 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: Stderr: u'tgtadm: this target is still active\n': ProcessExecutionError: Unexpected error while running command.
Dec 08 01:45:49.587081 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: ERROR cinder.volume.targets.tgt [None req-9dddc5b9-7e06-4ab9-83a6-ef211e0d4c48 tempest-TestMultiAttachVolumeSwap-1803326020 None] Failed to remove iscsi target for Volume ID: 2a48ae08-2833-488c-8771-eea86f41c7b8: Unexpected error while running command.
Dec 08 01:45:49.587355 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgt-admin --force --delete iqn.2010-10.org.openstack:volume-2a48ae08-2833-488c-8771-eea86f41c7b8

After digging through the logs, I can see at least one set of issues in the ComputeManager.swap_volume code here: https://github.com/openstack/nova/blob/ae3064b7a820ea02f7fc8a1aa4a41f35a06534f1/nova/compute/manager.py#L5798-L5806 That code updates the "old" BDM with values from the new volume, including changing bdm.volume_id from the old volume ID to the new one, which likely orphans the old record or breaks the cleanup.
My detailed notes follow: server1: {"server": {"OS-EXT-STS:task_state": null, "addresses": {}, "links": [{"href": "https://149.202.185.218/compute/v2.1/servers/d46fba31-9469-4799-b2bf-1fbad4369a9a", "rel": "self"}, {"href": "https://149.202.185.218/compute/servers/d46fba31-9469-4799-b2bf-1fbad4369a9a", "rel": "bookmark"}], "image": {"id": "f55b4d12-f159-453f-86cb-8965684121c8", "links": [{"href": "https://149.202.185.218/compute/images/f55b4d12-f159-453f-86cb-8965684121c8", "rel": "bookmark"}]}, "OS-EXT-STS:vm_state": "active", "OS-SRV-USG:launched_at": "2018-12-08T01:43:11.000000", "flavor": {"ephemeral": 0, "ram": 64, "original_name": "m1.nano", "vcpus": 1, "extra_specs": {}, "swap": 0, "disk": 0}, "id": "d46fba31-9469-4799-b2bf-1fbad4369a9a", "description": null, "user_id": "9bc61597b6794ca49c79ce599c26b223", "OS-DCF:diskConfig": "MANUAL", "accessIPv4": "", "accessIPv6": "", "progress": 0, "OS-EXT-STS:power_state": 1, "OS-EXT-AZ:availability_zone": "nova", "config_drive": "", "status": "ACTIVE", "updated": "2018-12-08T01:43:11Z", "hostId": "f4600b0fc614ce95d5be30fc1fc82bdb150c47ff8c84b22262be6d19", "OS-SRV-USG:terminated_at": null, "tags": [], "key_name": null, "locked": false, "name": "tempest-TestMultiAttachVolumeSwap-server-226416218-1", "created": "2018-12-08T01:43:04Z", "tenant_id": "5db98765c5d94a73bae8aba2f13c7762", "os-extended-volumes:volumes_attached": [], "metadata": {}}} server1.hostId = f4600b0fc614ce95d5be30fc1fc82bdb150c47ff8c84b22262be6d19 server1 builds on the subnode (ubuntu-xenial-ovh-gra1-0001066278) server2: {"server": {"OS-EXT-STS:task_state": null, "addresses": {}, "links": [{"href": "https://149.202.185.218/compute/v2.1/servers/e0b74695-f9a0-4f4c-8a08-a0de3420cc6c", "rel": "self"}, {"href": "https://149.202.185.218/compute/servers/e0b74695-f9a0-4f4c-8a08-a0de3420cc6c", "rel": "bookmark"}], "image": {"id": "f55b4d12-f159-453f-86cb-8965684121c8", "links": [{"href": "https://149.202.185.218/compute/images/f55b4d12-f159-453f-86cb-8965684121c8", "rel": "bookmark"}]}, "OS-EXT-STS:vm_state": "active", "OS-SRV-USG:launched_at": "2018-12-08T01:43:10.000000", "flavor": {"ephemeral": 0, "ram": 64, "original_name": "m1.nano", "vcpus": 1, "extra_specs": {}, "swap": 0, "disk": 0}, "id": "e0b74695-f9a0-4f4c-8a08-a0de3420cc6c", "description": null, "user_id": "9bc61597b6794ca49c79ce599c26b223", "OS-DCF:diskConfig": "MANUAL", "accessIPv4": "", "accessIPv6": "", "progress": 0, "OS-EXT-STS:power_state": 1, "OS-EXT-AZ:availability_zone": "nova", "config_drive": "", "status": "ACTIVE", "updated": "2018-12-08T01:43:11Z", "hostId": "5f1a8abdf427d6cbda8003bd178ef40c605f24f626ec88c3a02816df", "OS-SRV-USG:terminated_at": null, "tags": [], "key_name": null, "locked": false, "name": "tempest-TestMultiAttachVolumeSwap-server-226416218-2", "created": "2018-12-08T01:43:04Z", "tenant_id": "5db98765c5d94a73bae8aba2f13c7762", "os-extended-volumes:volumes_attached": [], "metadata": {}}} server2.hostId = 5f1a8abdf427d6cbda8003bd178ef40c605f24f626ec88c3a02816df server2 builds on the primary node (ubuntu-xenial-ovh-gra1-0001066161) So clearly the two servers are on different hosts. Also, volume1 is on the primary node (same as server2). 
attach volume1 to server1: Body: {"volumeAttachment": {"device": "/dev/vdb", "serverId": "d46fba31-9469-4799-b2bf-1fbad4369a9a", "id": "2a48ae08-2833-488c-8771-eea86f41c7b8", "volumeId": "2a48ae08-2833-488c-8771-eea86f41c7b8"}} _log_request_full tempest/lib/common/rest_client.py:437 attach volume1 to server2: Body: {"volumeAttachment": {"device": "/dev/vdb", "serverId": "e0b74695-f9a0-4f4c-8a08-a0de3420cc6c", "id": "2a48ae08-2833-488c-8771-eea86f41c7b8", "volumeId": "2a48ae08-2833-488c-8771-eea86f41c7b8"}} _log_request_full tempest/lib/common/rest_client.py:437 swap volume1 (2a48ae08-2833-488c-8771-eea86f41c7b8) to volume2 (fa56e173-3bd4-4701-bc71-cb4810c53c51) on server1 (e0b74695-f9a0-4f4c-8a08-a0de3420cc6c): 2018-12-08 01:43:28.336 8815 INFO tempest.lib.common.rest_client [req-b6cb2f89-ebf6-47b5-9757-8692dd91c812 ] Request (TestMultiAttachVolumeSwap:test_volume_swap_with_multiattach): 202 PUT https://149.202.185.218/compute/v2.1/servers/d46fba31-9469-4799-b2bf-1fbad4369a9a/os-volume_attachments/2a48ae08-2833-488c-8771-eea86f41c7b8 0.618s 2018-12-08 01:43:28.337 8815 DEBUG tempest.lib.common.rest_client [req-b6cb2f89-ebf6-47b5-9757-8692dd91c812 ] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>', 'X-OpenStack-Nova-API-Version': '2.60'}         Body: {"volumeAttachment": {"volumeId": "fa56e173-3bd4-4701-bc71-cb4810c53c51"}}     Response - Headers: {u'x-openstack-request-id': 'req-b6cb2f89-ebf6-47b5-9757-8692dd91c812', u'content-length': '0', u'vary': 'OpenStack-API-Version,X-OpenStack-Nova-API-Version', u'server': 'Apache/2.4.18 (Ubuntu)', u'x-compute-request-id': 'req-b6cb2f89-ebf6-47b5-9757-8692dd91c812', u'content-type': 'application/json', 'content-location': 'https://149.202.185.218/compute/v2.1/servers/d46fba31-9469-4799-b2bf-1fbad4369a9a/os-volume_attachments/2a48ae08-2833-488c-8771-eea86f41c7b8', u'date': 'Sat, 08 Dec 2018 01:43:27 GMT', 'status': '202', u'openstack-api-version': 'compute 2.60', u'connection': 'close', u'x-openstack-nova-api-version': '2.60'} now the test waits for the swap to complete and the attachments are all recorded correctly; once the swap is done, volume1 should be attached to server2 and volume2 should be attached to server1 This is the connection_info for the old/new volume during the swap: Dec 08 01:43:31.321263 ubuntu-xenial-ovh-gra1-0001066278 nova-compute[20206]: DEBUG nova.compute.manager [None req-b6cb2f89-ebf6-47b5-9757-8692dd91c812 tempest-TestMultiAttachVolumeSwap-477435682 tempest-TestMultiAttachVolumeSwap-477435682] [instance: d46fba31-9469-4799-b2bf-1fbad4369a9a] swap_volume: Calling driver volume swap with connection infos: new: {'status': u'reserved', 'multiattach': True, 'detached_at': u'', u'volume_id': u'fa56e173-3bd4-4701-bc71-cb4810c53c51', 'attach_mode': u'null', 'driver_volume_type': u'iscsi', 'instance': u'd46fba31-9469-4799-b2bf-1fbad4369a9a', 'attached_at': u'', 'serial': u'fa56e173-3bd4-4701-bc71-cb4810c53c51', 'data': {u'access_mode': u'rw', u'target_discovered': False, u'encrypted': False, u'qos_specs': None, u'target_iqn': u'iqn.2010-10.org.openstack:volume-fa56e173-3bd4-4701-bc71-cb4810c53c51', u'target_portal': u'213.32.74.238:3260', u'volume_id': u'fa56e173-3bd4-4701-bc71-cb4810c53c51', u'target_lun': 1, u'auth_password': u'***', u'auth_username': u'oAJ38A88Rom7xrwQ5A4Y', u'auth_method': u'CHAP'}}; old: {u'status': u'reserved', u'instance': u'd46fba31-9469-4799-b2bf-1fbad4369a9a', u'multiattach': True, u'attached_at': u'', u'attach_mode': u'null', 
u'driver_volume_type': u'iscsi', u'detached_at': u'', u'volume_id': u'2a48ae08-2833-488c-8771-eea86f41c7b8', u'serial': u'2a48ae08-2833-488c-8771-eea86f41c7b8', u'data': {u'device_path': u'/dev/sda', u'target_discovered': False, u'encrypted': False, u'qos_specs': None, u'target_iqn': u'iqn.2010-10.org.openstack:volume-2a48ae08-2833-488c-8771-eea86f41c7b8', u'target_portal': u'149.202.185.218:3260', u'volume_id': u'2a48ae08-2833-488c-8771-eea86f41c7b8', u'auth_password': u'***', u'target_lun': 1, u'access_mode': u'rw', u'auth_username': u'HMHytc239biDCT6NTgMa', u'auth_method': u'CHAP'}} {{(pid=20206) _swap_volume /opt/stack/nova/nova/compute/manager.py:5646}} When the test cleans up and tries to delete volume1, it fails trying to cleanup iqn.2010-10.org.openstack:volume-2a48ae08-2833-488c-8771-eea86f41c7b8 from the subnode (which is the old connection to server1). volume1 connection is supposed to be detached from server1 here: http://logs.openstack.org/81/606981/4/check/tempest-slow/fafde23/compute1/logs/screen-n-cpu.txt.gz#_Dec_08_01_45_36_699753 Dec 08 01:45:36.699753 ubuntu-xenial-ovh-gra1-0001066278 nova-compute[20206]: DEBUG os_brick.initiator.connectors.iscsi [None req-b6cb2f89-ebf6-47b5-9757-8692dd91c812 tempest-TestMultiAttachVolumeSwap-477435682 tempest-TestMultiAttachVolumeSwap-477435682] ==> disconnect_volume: call u"{'args': (<os_brick.initiator.connectors.iscsi.ISCSIConnector object at 0x7fe1aa096d50>, {u'device_path': u'/dev/sda', u'target_discovered': False, u'encrypted': False, u'qos_specs': None, u'target_iqn': u'iqn.2010-10.org.openstack:volume-2a48ae08-2833-488c-8771-eea86f41c7b8', u'target_portal': u'149.202.185.218:3260', u'volume_id': u'2a48ae08-2833-488c-8771-eea86f41c7b8', u'auth_password': u'***', u'target_lun': 1, u'access_mode': u'rw', u'auth_username': u'HMHytc239biDCT6NTgMa', u'auth_method': u'CHAP'}, None), 'kwargs': {}}" {{(pid=20206) trace_logging_wrapper /usr/local/lib/python2.7/dist-packages/os_brick/utils.py:146}} It says it logged out: Dec 08 01:45:37.336156 ubuntu-xenial-ovh-gra1-0001066278 nova-compute[20206]: Logout of [sid: 1, target: iqn.2010-10.org.openstack:volume-2a48ae08-2833-488c-8771-eea86f41c7b8, portal: 149.202.185.218,3260] successful. 
This definitely looks wrong: after the swap, the old BDM record for volume1 (2a48ae08-2833-488c-8771-eea86f41c7b8) is updated with the volume2 (fa56e173-3bd4-4701-bc71-cb4810c53c51) connection_info. For the new volume attach flow the connection_info might not matter, since we likely don't use it (we just work with the attachment records), but if the attachment_id (769b75cc-4283-4b46-a975-243faae6e263) is for the wrong volume, that could be a problem:

Dec 08 01:45:37.699221 ubuntu-xenial-ovh-gra1-0001066278 nova-compute[20206]: DEBUG nova.compute.manager [None req-b6cb2f89-ebf6-47b5-9757-8692dd91c812 tempest-TestMultiAttachVolumeSwap-477435682 tempest-TestMultiAttachVolumeSwap-477435682] [instance: d46fba31-9469-4799-b2bf-1fbad4369a9a] swap_volume: Updating volume 2a48ae08-2833-488c-8771-eea86f41c7b8 BDM record with {'attachment_id': u'769b75cc-4283-4b46-a975-243faae6e263', 'no_device': None, 'connection_info': '{"status": "reserved", "multiattach": true, "detached_at": "", "volume_id": "fa56e173-3bd4-4701-bc71-cb4810c53c51", "attach_mode": "null", "driver_volume_type": "iscsi", "instance": "d46fba31-9469-4799-b2bf-1fbad4369a9a", "attached_at": "", "serial": "fa56e173-3bd4-4701-bc71-cb4810c53c51", "data": {"access_mode": "rw", "target_discovered": false, "encrypted": false, "qos_specs": null, "target_iqn": "iqn.2010-10.org.openstack:volume-fa56e173-3bd4-4701-bc71-cb4810c53c51", "target_portal": "213.32.74.238:3260", "volume_id": "fa56e173-3bd4-4701-bc71-cb4810c53c51", "target_lun": 1, "device_path": "/dev/sdb", "auth_password": "Ko6DLxWNfrbqWQ7E", "auth_username": "oAJ38A88Rom7xrwQ5A4Y", "auth_method": "CHAP"}}', 'volume_id': u'fa56e173-3bd4-4701-bc71-cb4810c53c51', 'source_type': 'volume', 'snapshot_id': None, 'destination_type': 'volume'} {{(pid=20206) swap_volume /opt/stack/nova/nova/compute/manager.py:5819}}

Probably the bigger problem is that the BDM for volume1 is updated to have volume_id fa56e173-3bd4-4701-bc71-cb4810c53c51. Changing the volume_id on volume1's BDM may be what orphans volume1's connections when we later try to delete it, since deleting server2 (which still has volume1 attached) likely doesn't clean up volume1 properly.

The other things I'm wondering about are:
1. Is nova-compute actually deleting the old volume1/server1 attachment record (https://github.com/openstack/nova/blob/ae3064b7a820ea02f7fc8a1aa4a41f35a06534f1/nova/compute/manager.py#L5691)? I didn't see anything about an attachment delete in the logs.
2. Is there a periodic task in the cinder-volume service that recreates the target connection for volume1 and server1 between the time we swap the volume and nova-compute deletes the connection, but before nova-compute deletes the old attachment record?

The first and most obvious thing to do is fix that BDM update code at the end of the ComputeManager.swap_volume method.

I'm not exactly sure yet how the multi-host nature of this comes into play, except possibly in the libvirt driver volume disconnect code here: https://github.com/openstack/nova/blob/ae3064b7a820ea02f7fc8a1aa4a41f35a06534f1/nova/virt/libvirt/driver.py#L1221
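To make the suspected failure mode concrete, the following is a minimal, self-contained sketch in plain Python. It uses simple dicts rather than Nova's real BlockDeviceMapping objects; the helper name swap_volume_update, the simplified field set, and the placeholder attachment IDs are illustrative assumptions, not Nova code. It only demonstrates the effect described above: once the swap rewrites volume_id on volume1's BDM, later cleanup that looks for "the BDM for volume1 on server1" finds nothing.

# Hypothetical, simplified model of the swap_volume BDM update described in
# this bug; not Nova's actual objects or API.
import json

def swap_volume_update(bdms, instance_uuid, old_vol, new_vol,
                       new_conn_info, new_attachment_id):
    # Mimic the update quoted in the debug log: the BDM row that currently
    # points at old_vol is rewritten in place to point at new_vol.
    for bdm in bdms:
        if bdm['instance_uuid'] == instance_uuid and bdm['volume_id'] == old_vol:
            bdm.update({
                'volume_id': new_vol,  # the old volume ID is overwritten here
                'connection_info': json.dumps(new_conn_info),
                'attachment_id': new_attachment_id,
            })
            return bdm
    raise LookupError('no BDM for volume %s on instance %s'
                      % (old_vol, instance_uuid))

# Two servers share multiattach volume1; server1 then swaps to volume2.
VOL1 = '2a48ae08-2833-488c-8771-eea86f41c7b8'
VOL2 = 'fa56e173-3bd4-4701-bc71-cb4810c53c51'
SERVER1 = 'd46fba31-9469-4799-b2bf-1fbad4369a9a'
SERVER2 = 'e0b74695-f9a0-4f4c-8a08-a0de3420cc6c'

bdms = [
    {'instance_uuid': SERVER1, 'volume_id': VOL1,
     'connection_info': json.dumps({'target_iqn': 'iqn...volume-2a48ae08...'}),
     'attachment_id': 'placeholder-attachment-vol1-server1'},
    {'instance_uuid': SERVER2, 'volume_id': VOL1,
     'connection_info': json.dumps({'target_iqn': 'iqn...volume-2a48ae08...'}),
     'attachment_id': 'placeholder-attachment-vol1-server2'},
]

swap_volume_update(bdms, SERVER1, VOL1, VOL2,
                   {'target_iqn': 'iqn...volume-fa56e173...'},
                   '769b75cc-4283-4b46-a975-243faae6e263')

# After the swap, nothing in this BDM table ties volume1 to server1 any more,
# even though the old attachment/target may still exist on the backend, so a
# cleanup path keyed on (volume1, server1) has nothing left to work from.
leftover = [b for b in bdms
            if b['volume_id'] == VOL1 and b['instance_uuid'] == SERVER1]
print(leftover)  # [] -> the volume1/server1 side has effectively been orphaned

If, in addition, the old volume1/server1 attachment record in Cinder is never deleted (question 1 above), the backend would still consider the volume1 target in use, which is consistent with the tgtadm "this target is still active" errors in the c-vol trace.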
2018-12-10 16:41:45 Matt Riedemann bug task added cinder
2018-12-10 16:41:56 Matt Riedemann cinder: status New Confirmed
2018-12-10 16:41:59 Matt Riedemann cinder: importance Undecided Medium
2018-12-10 19:44:59 Matt Riedemann bug task deleted cinder
2018-12-11 20:58:08 Matt Riedemann bug task added tempest
2018-12-11 20:58:44 Matt Riedemann tempest: status New In Progress
2018-12-11 20:58:46 Matt Riedemann tempest: importance Undecided High
2018-12-11 20:58:48 Matt Riedemann tempest: assignee Matt Riedemann (mriedem)
2018-12-11 20:58:52 Matt Riedemann bug task deleted nova
2018-12-11 20:59:16 Matt Riedemann summary swap multiattach volume intermittently fails when servers are on different hosts test_volume_swap_with_multiattach intermittently fails during cleanup
2019-02-05 13:40:11 Matt Riedemann tempest: importance High Medium
2019-02-05 13:42:06 Matt Riedemann summary test_volume_swap_with_multiattach intermittently fails during cleanup test_volume_swap_with_multiattach intermittently fails during cleanup when run in a multi-node job
2020-09-23 22:53:13 Lee Yarwood tempest: status In Progress New
2020-09-23 22:53:15 Lee Yarwood tempest: assignee Matt Riedemann (mriedem)
2020-09-23 22:53:46 Lee Yarwood tempest: assignee Lee Yarwood (lyarwood)