2018-12-10 15:58:11
Matt Riedemann
description
This was found while debugging the tempest.api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap failures in the (multinode) tempest-slow job here:
https://review.openstack.org/#/c/606981/
The failure ultimately happens during teardown of the test class, which fails to delete the "old" multiattach volume:
http://logs.openstack.org/81/606981/4/check/tempest-slow/fafde23/controller/logs/screen-c-vol.txt.gz?level=TRACE#_Dec_08_01_45_41_380161
Dec 08 01:45:41.380161 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: ERROR cinder.volume.targets.tgt [req-4aa23d64-309a-44a4-a7c9-381a6fbef330 req-f47e0ea9-7e80-4481-bce1-7a1060c6728b tempest-TestMultiAttachVolumeSwap-1803326020 None] Failed to remove iscsi target for Volume ID: 2a48ae08-2833-488c-8771-eea86f41c7b8: Unexpected error while running command.
Dec 08 01:45:41.380475 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgt-admin --force --delete iqn.2010-10.org.openstack:volume-2a48ae08-2833-488c-8771-eea86f41c7b8
Dec 08 01:45:41.380783 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: Exit code: 22
Dec 08 01:45:41.381094 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: Stdout: u'Command:\n\ttgtadm -C 0 --mode target --op delete --tid=1\nexited with code: 22.\n'
Dec 08 01:45:41.381386 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: Stderr: u'tgtadm: this target is still active\n': ProcessExecutionError: Unexpected error while running command.
Dec 08 01:45:49.587081 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: ERROR cinder.volume.targets.tgt [None req-9dddc5b9-7e06-4ab9-83a6-ef211e0d4c48 tempest-TestMultiAttachVolumeSwap-1803326020 None] Failed to remove iscsi target for Volume ID: 2a48ae08-2833-488c-8771-eea86f41c7b8: Unexpected error while running command.
Dec 08 01:45:49.587355 ubuntu-xenial-ovh-gra1-0001066161 cinder-volume[1776]: Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgt-admin --force --delete iqn.2010-10.org.openstack:volume-2a48ae08-2833-488c-8771-eea86f41c7b8
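For context, exit code 22 with "tgtadm: this target is still active" means tgtd still has an initiator session logged in to that target, so it refuses the delete. If anyone wants to poke at a held node, here's a small illustrative helper (my own sketch, not something from these logs; assumes it runs on the node hosting the tgt target, i.e. the primary node here) to dump the targets and the initiator sessions still connected to them:

# Illustrative helper (not from the logs): list tgt targets and the
# initiator (I_T nexus) sessions still connected to them, which is what
# makes "tgt-admin --force --delete" fail with exit code 22.
import subprocess


def show_tgt_targets():
    # 'tgtadm --lld iscsi --mode target --op show' prints each target along
    # with its LUNs and the initiators still logged in.
    return subprocess.check_output(
        ['sudo', 'tgtadm', '--lld', 'iscsi', '--mode', 'target', '--op', 'show'],
        universal_newlines=True)


if __name__ == '__main__':
    print(show_tgt_targets())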
After digging through the logs, I can see at least one set of issues in the ComputeManager.swap_volume code here:
https://github.com/openstack/nova/blob/ae3064b7a820ea02f7fc8a1aa4a41f35a06534f1/nova/compute/manager.py#L5798-L5806
That code updates the "old" BDM with values from the new volume, including changing bdm.volume_id from the old volume to the new one, which likely orphans the old record or otherwise messes up the cleanup.
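To make the suspected problem concrete, here is a minimal sketch (mine, not the actual nova code) of the update that the manager.py:5819 log line further down shows being applied to volume1's BDM; the field names and IDs are taken from that log entry, the surrounding structure is purely illustrative:

# Illustrative reconstruction (NOT the real nova code) of the BDM update
# logged at nova/compute/manager.py:5819 during swap_volume.
import json

# The BDM row that was tracking volume1 on server1 (ids from the logs).
old_bdm = {
    'instance_uuid': 'd46fba31-9469-4799-b2bf-1fbad4369a9a',  # server1
    'volume_id': '2a48ae08-2833-488c-8771-eea86f41c7b8',      # volume1
    'attachment_id': None,
    'connection_info': None,
}

# Values applied to that same row at the end of the swap, per the log line.
values = {
    'attachment_id': '769b75cc-4283-4b46-a975-243faae6e263',
    'no_device': None,
    # connection_info now describes volume2, not volume1 (trimmed here).
    'connection_info': json.dumps({
        'serial': 'fa56e173-3bd4-4701-bc71-cb4810c53c51',
        'driver_volume_type': 'iscsi',
        'data': {
            'target_iqn': 'iqn.2010-10.org.openstack:'
                          'volume-fa56e173-3bd4-4701-bc71-cb4810c53c51',
            'target_portal': '213.32.74.238:3260',
            'volume_id': 'fa56e173-3bd4-4701-bc71-cb4810c53c51',
        },
    }),
    'volume_id': 'fa56e173-3bd4-4701-bc71-cb4810c53c51',       # volume2!
    'source_type': 'volume',
    'snapshot_id': None,
    'destination_type': 'volume',
}

old_bdm.update(values)
# After this update no BDM row references volume1 for server1 any more,
# which is the suspected reason volume1's old connection/attachment never
# gets cleaned up properly later.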
My detailed notes follow:
server1:
{"server": {"OS-EXT-STS:task_state": null, "addresses": {}, "links": [{"href": "https://149.202.185.218/compute/v2.1/servers/d46fba31-9469-4799-b2bf-1fbad4369a9a", "rel": "self"}, {"href": "https://149.202.185.218/compute/servers/d46fba31-9469-4799-b2bf-1fbad4369a9a", "rel": "bookmark"}], "image": {"id": "f55b4d12-f159-453f-86cb-8965684121c8", "links": [{"href": "https://149.202.185.218/compute/images/f55b4d12-f159-453f-86cb-8965684121c8", "rel": "bookmark"}]}, "OS-EXT-STS:vm_state": "active", "OS-SRV-USG:launched_at": "2018-12-08T01:43:11.000000", "flavor": {"ephemeral": 0, "ram": 64, "original_name": "m1.nano", "vcpus": 1, "extra_specs": {}, "swap": 0, "disk": 0}, "id": "d46fba31-9469-4799-b2bf-1fbad4369a9a", "description": null, "user_id": "9bc61597b6794ca49c79ce599c26b223", "OS-DCF:diskConfig": "MANUAL", "accessIPv4": "", "accessIPv6": "", "progress": 0, "OS-EXT-STS:power_state": 1, "OS-EXT-AZ:availability_zone": "nova", "config_drive": "", "status": "ACTIVE", "updated": "2018-12-08T01:43:11Z", "hostId": "f4600b0fc614ce95d5be30fc1fc82bdb150c47ff8c84b22262be6d19", "OS-SRV-USG:terminated_at": null, "tags": [], "key_name": null, "locked": false, "name": "tempest-TestMultiAttachVolumeSwap-server-226416218-1", "created": "2018-12-08T01:43:04Z", "tenant_id": "5db98765c5d94a73bae8aba2f13c7762", "os-extended-volumes:volumes_attached": [], "metadata": {}}}
server1.hostId = f4600b0fc614ce95d5be30fc1fc82bdb150c47ff8c84b22262be6d19
server1 builds on the subnode (ubuntu-xenial-ovh-gra1-0001066278)
server2:
{"server": {"OS-EXT-STS:task_state": null, "addresses": {}, "links": [{"href": "https://149.202.185.218/compute/v2.1/servers/e0b74695-f9a0-4f4c-8a08-a0de3420cc6c", "rel": "self"}, {"href": "https://149.202.185.218/compute/servers/e0b74695-f9a0-4f4c-8a08-a0de3420cc6c", "rel": "bookmark"}], "image": {"id": "f55b4d12-f159-453f-86cb-8965684121c8", "links": [{"href": "https://149.202.185.218/compute/images/f55b4d12-f159-453f-86cb-8965684121c8", "rel": "bookmark"}]}, "OS-EXT-STS:vm_state": "active", "OS-SRV-USG:launched_at": "2018-12-08T01:43:10.000000", "flavor": {"ephemeral": 0, "ram": 64, "original_name": "m1.nano", "vcpus": 1, "extra_specs": {}, "swap": 0, "disk": 0}, "id": "e0b74695-f9a0-4f4c-8a08-a0de3420cc6c", "description": null, "user_id": "9bc61597b6794ca49c79ce599c26b223", "OS-DCF:diskConfig": "MANUAL", "accessIPv4": "", "accessIPv6": "", "progress": 0, "OS-EXT-STS:power_state": 1, "OS-EXT-AZ:availability_zone": "nova", "config_drive": "", "status": "ACTIVE", "updated": "2018-12-08T01:43:11Z", "hostId": "5f1a8abdf427d6cbda8003bd178ef40c605f24f626ec88c3a02816df", "OS-SRV-USG:terminated_at": null, "tags": [], "key_name": null, "locked": false, "name": "tempest-TestMultiAttachVolumeSwap-server-226416218-2", "created": "2018-12-08T01:43:04Z", "tenant_id": "5db98765c5d94a73bae8aba2f13c7762", "os-extended-volumes:volumes_attached": [], "metadata": {}}}
server2.hostId = 5f1a8abdf427d6cbda8003bd178ef40c605f24f626ec88c3a02816df
server2 builds on the primary node (ubuntu-xenial-ovh-gra1-0001066161)
So clearly the two servers are on different hosts. Also, volume1 is on the primary node (same as server2).
attach volume1 to server1:
Body: {"volumeAttachment": {"device": "/dev/vdb", "serverId": "d46fba31-9469-4799-b2bf-1fbad4369a9a", "id": "2a48ae08-2833-488c-8771-eea86f41c7b8", "volumeId": "2a48ae08-2833-488c-8771-eea86f41c7b8"}} _log_request_full tempest/lib/common/rest_client.py:437
attach volume1 to server2:
Body: {"volumeAttachment": {"device": "/dev/vdb", "serverId": "e0b74695-f9a0-4f4c-8a08-a0de3420cc6c", "id": "2a48ae08-2833-488c-8771-eea86f41c7b8", "volumeId": "2a48ae08-2833-488c-8771-eea86f41c7b8"}} _log_request_full tempest/lib/common/rest_client.py:437
swap volume1 (2a48ae08-2833-488c-8771-eea86f41c7b8) to volume2 (fa56e173-3bd4-4701-bc71-cb4810c53c51) on server1 (d46fba31-9469-4799-b2bf-1fbad4369a9a):
2018-12-08 01:43:28.336 8815 INFO tempest.lib.common.rest_client [req-b6cb2f89-ebf6-47b5-9757-8692dd91c812 ] Request (TestMultiAttachVolumeSwap:test_volume_swap_with_multiattach): 202 PUT https://149.202.185.218/compute/v2.1/servers/d46fba31-9469-4799-b2bf-1fbad4369a9a/os-volume_attachments/2a48ae08-2833-488c-8771-eea86f41c7b8 0.618s
2018-12-08 01:43:28.337 8815 DEBUG tempest.lib.common.rest_client [req-b6cb2f89-ebf6-47b5-9757-8692dd91c812 ] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>', 'X-OpenStack-Nova-API-Version': '2.60'}
Body: {"volumeAttachment": {"volumeId": "fa56e173-3bd4-4701-bc71-cb4810c53c51"}}
Response - Headers: {u'x-openstack-request-id': 'req-b6cb2f89-ebf6-47b5-9757-8692dd91c812', u'content-length': '0', u'vary': 'OpenStack-API-Version,X-OpenStack-Nova-API-Version', u'server': 'Apache/2.4.18 (Ubuntu)', u'x-compute-request-id': 'req-b6cb2f89-ebf6-47b5-9757-8692dd91c812', u'content-type': 'application/json', 'content-location': 'https://149.202.185.218/compute/v2.1/servers/d46fba31-9469-4799-b2bf-1fbad4369a9a/os-volume_attachments/2a48ae08-2833-488c-8771-eea86f41c7b8', u'date': 'Sat, 08 Dec 2018 01:43:27 GMT', 'status': '202', u'openstack-api-version': 'compute 2.60', u'connection': 'close', u'x-openstack-nova-api-version': '2.60'}
Now the test waits for the swap to complete and for the attachments to all be recorded correctly; once the swap is done, volume1 should be attached to server2 and volume2 should be attached to server1.
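A quick illustrative way (not part of the test) to double-check that expectation from the API side is to list each server's volume attachments once the swap has finished:

# Illustrative post-swap check: confirm volume1 -> server2, volume2 -> server1.
import requests

COMPUTE = 'https://149.202.185.218/compute/v2.1'
HEADERS = {'X-Auth-Token': '<keystone-token>',          # placeholder
           'X-OpenStack-Nova-API-Version': '2.60'}

SERVERS = {
    'server1': 'd46fba31-9469-4799-b2bf-1fbad4369a9a',
    'server2': 'e0b74695-f9a0-4f4c-8a08-a0de3420cc6c',
}

for name, uuid in SERVERS.items():
    resp = requests.get(
        '%s/servers/%s/os-volume_attachments' % (COMPUTE, uuid),
        headers=HEADERS, verify=False)
    resp.raise_for_status()
    attached = [a['volumeId'] for a in resp.json()['volumeAttachments']]
    print(name, attached)
# Expected: server1 -> [fa56e173-... (volume2)], server2 -> [2a48ae08-... (volume1)]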
This is the connection_info for the old/new volume during the swap:
Dec 08 01:43:31.321263 ubuntu-xenial-ovh-gra1-0001066278 nova-compute[20206]: DEBUG nova.compute.manager [None req-b6cb2f89-ebf6-47b5-9757-8692dd91c812 tempest-TestMultiAttachVolumeSwap-477435682 tempest-TestMultiAttachVolumeSwap-477435682] [instance: d46fba31-9469-4799-b2bf-1fbad4369a9a] swap_volume: Calling driver volume swap with connection infos:
new: {'status': u'reserved', 'multiattach': True, 'detached_at': u'', u'volume_id': u'fa56e173-3bd4-4701-bc71-cb4810c53c51', 'attach_mode': u'null', 'driver_volume_type': u'iscsi', 'instance': u'd46fba31-9469-4799-b2bf-1fbad4369a9a', 'attached_at': u'', 'serial': u'fa56e173-3bd4-4701-bc71-cb4810c53c51', 'data': {u'access_mode': u'rw', u'target_discovered': False, u'encrypted': False, u'qos_specs': None, u'target_iqn': u'iqn.2010-10.org.openstack:volume-fa56e173-3bd4-4701-bc71-cb4810c53c51', u'target_portal': u'213.32.74.238:3260', u'volume_id': u'fa56e173-3bd4-4701-bc71-cb4810c53c51', u'target_lun': 1, u'auth_password': u'***', u'auth_username': u'oAJ38A88Rom7xrwQ5A4Y', u'auth_method': u'CHAP'}};
old: {u'status': u'reserved', u'instance': u'd46fba31-9469-4799-b2bf-1fbad4369a9a', u'multiattach': True, u'attached_at': u'', u'attach_mode': u'null', u'driver_volume_type': u'iscsi', u'detached_at': u'', u'volume_id': u'2a48ae08-2833-488c-8771-eea86f41c7b8', u'serial': u'2a48ae08-2833-488c-8771-eea86f41c7b8', u'data': {u'device_path': u'/dev/sda', u'target_discovered': False, u'encrypted': False, u'qos_specs': None, u'target_iqn': u'iqn.2010-10.org.openstack:volume-2a48ae08-2833-488c-8771-eea86f41c7b8', u'target_portal': u'149.202.185.218:3260', u'volume_id': u'2a48ae08-2833-488c-8771-eea86f41c7b8', u'auth_password': u'***', u'target_lun': 1, u'access_mode': u'rw', u'auth_username': u'HMHytc239biDCT6NTgMa', u'auth_method': u'CHAP'}} {{(pid=20206) _swap_volume /opt/stack/nova/nova/compute/manager.py:5646}}
When the test cleans up and tries to delete volume1, it fails while trying to clean up iqn.2010-10.org.openstack:volume-2a48ae08-2833-488c-8771-eea86f41c7b8 from the subnode (which is the old connection to server1).
volume1's connection is supposed to be detached from server1 here:
http://logs.openstack.org/81/606981/4/check/tempest-slow/fafde23/compute1/logs/screen-n-cpu.txt.gz#_Dec_08_01_45_36_699753
Dec 08 01:45:36.699753 ubuntu-xenial-ovh-gra1-0001066278 nova-compute[20206]: DEBUG os_brick.initiator.connectors.iscsi [None req-b6cb2f89-ebf6-47b5-9757-8692dd91c812 tempest-TestMultiAttachVolumeSwap-477435682 tempest-TestMultiAttachVolumeSwap-477435682] ==> disconnect_volume: call u"{'args': (<os_brick.initiator.connectors.iscsi.ISCSIConnector object at 0x7fe1aa096d50>, {u'device_path': u'/dev/sda', u'target_discovered': False, u'encrypted': False, u'qos_specs': None, u'target_iqn': u'iqn.2010-10.org.openstack:volume-2a48ae08-2833-488c-8771-eea86f41c7b8', u'target_portal': u'149.202.185.218:3260', u'volume_id': u'2a48ae08-2833-488c-8771-eea86f41c7b8', u'auth_password': u'***', u'target_lun': 1, u'access_mode': u'rw', u'auth_username': u'HMHytc239biDCT6NTgMa', u'auth_method': u'CHAP'}, None), 'kwargs': {}}" {{(pid=20206) trace_logging_wrapper /usr/local/lib/python2.7/dist-packages/os_brick/utils.py:146}}
It says it logged out:
Dec 08 01:45:37.336156 ubuntu-xenial-ovh-gra1-0001066278 nova-compute[20206]: Logout of [sid: 1, target: iqn.2010-10.org.openstack:volume-2a48ae08-2833-488c-8771-eea86f41c7b8, portal: 149.202.185.218,3260] successful.
This definitely looks wrong (post-swap, updating the old BDM record): note it's updating the BDM for volume1 (2a48ae08-2833-488c-8771-eea86f41c7b8) but with the volume2 (fa56e173-3bd4-4701-bc71-cb4810c53c51) connection_info. For the new-style volume attach flow the connection_info might not matter, since we likely don't use it (we just work with the attachment records), but if the attachment_id (769b75cc-4283-4b46-a975-243faae6e263) is for the wrong volume, that could be a problem:
Dec 08 01:45:37.699221 ubuntu-xenial-ovh-gra1-0001066278 nova-compute[20206]: DEBUG nova.compute.manager [None req-b6cb2f89-ebf6-47b5-9757-8692dd91c812 tempest-TestMultiAttachVolumeSwap-477435682 tempest-TestMultiAttachVolumeSwap-477435682] [instance: d46fba31-9469-4799-b2bf-1fbad4369a9a] swap_volume: Updating volume 2a48ae08-2833-488c-8771-eea86f41c7b8 BDM record with {'attachment_id': u'769b75cc-4283-4b46-a975-243faae6e263', 'no_device': None, 'connection_info': '{"status": "reserved", "multiattach": true, "detached_at": "", "volume_id": "fa56e173-3bd4-4701-bc71-cb4810c53c51", "attach_mode": "null", "driver_volume_type": "iscsi", "instance": "d46fba31-9469-4799-b2bf-1fbad4369a9a", "attached_at": "", "serial": "fa56e173-3bd4-4701-bc71-cb4810c53c51", "data": {"access_mode": "rw", "target_discovered": false, "encrypted": false, "qos_specs": null, "target_iqn": "iqn.2010-10.org.openstack:volume-fa56e173-3bd4-4701-bc71-cb4810c53c51", "target_portal": "213.32.74.238:3260", "volume_id": "fa56e173-3bd4-4701-bc71-cb4810c53c51", "target_lun": 1, "device_path": "/dev/sdb", "auth_password": "Ko6DLxWNfrbqWQ7E", "auth_username": "oAJ38A88Rom7xrwQ5A4Y", "auth_method": "CHAP"}}', 'volume_id': u'fa56e173-3bd4-4701-bc71-cb4810c53c51', 'source_type': 'volume', 'snapshot_id': None, 'destination_type': 'volume'} {{(pid=20206) swap_volume /opt/stack/nova/nova/compute/manager.py:5819}}
Probably the bigger problem is that the BDM for volume1 is updated to have volume_id fa56e173-3bd4-4701-bc71-cb4810c53c51, i.e. we changed the volume_id on volume1's BDM. That might have something to do with orphaning volume1's connections when we go to delete, since deleting server2 (which still has volume1 attached) likely doesn't clean up volume1 properly.
The other things I'm wondering about are:
1. Is nova-compute deleting the old volume1/server1 attachment record here:
https://github.com/openstack/nova/blob/ae3064b7a820ea02f7fc8a1aa4a41f35a06534f1/nova/compute/manager.py#L5691
because I didn't see anything about an attachment delete in the logs (a way to check which attachment records remain for volume1 on the cinder side is sketched after this list).
2. Is there a periodic task in the cinder-volume service that recreates the target connection for volume1 and server1 in the window after we swap the volume and nova-compute deletes the connection, but before nova-compute deletes the old attachment record?
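To help answer question 1, here is an illustrative check (endpoint, project id and token are placeholders; not from the logs) that lists the cinder attachment records for volume1 via the attachments API (needs volume microversion >= 3.27). An attachment still pointing at server1 after the swap would mean nova never deleted the old record:

# Illustrative check for question 1: what attachment records does cinder
# still have for volume1? Placeholders for endpoint/project/token.
import requests

VOLUME_API = 'https://149.202.185.218/volume/v3/<project_id>'   # placeholder
HEADERS = {'X-Auth-Token': '<keystone-token>',                  # placeholder
           'OpenStack-API-Version': 'volume 3.27'}  # attachments API >= 3.27

VOLUME1 = '2a48ae08-2833-488c-8771-eea86f41c7b8'

resp = requests.get('%s/attachments' % VOLUME_API, headers=HEADERS, verify=False)
resp.raise_for_status()
for att in resp.json()['attachments']:
    if att['volume_id'] == VOLUME1:
        # An attachment here whose instance is server1 would mean the old
        # volume1/server1 attachment record was never deleted during the swap.
        print(att['id'], att['instance'], att['status'])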
The first and most obvious thing to do is fix that bdm update code at the end of the ComputeManager.swap_volume method. I'm not exactly sure yet how the multi-host nature of this comes into play, except possibly in the libvirt driver volume disconnect code here:
https://github.com/openstack/nova/blob/ae3064b7a820ea02f7fc8a1aa4a41f35a06534f1/nova/virt/libvirt/driver.py#L1221
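For the multi-host angle, the decision that disconnect path has to make for a multiattach volume is roughly "only tear down the host-local target connection if no other instance on this host is still using the volume". A purely schematic illustration of that kind of check (my own sketch, not the driver code linked above):

# Purely schematic sketch (not the libvirt driver code) of the decision the
# disconnect path has to make for a multiattach volume: only tear down the
# host-local connection if no other instance on this host still uses it.
def should_disconnect_target(volume_attachments, local_instance_uuids,
                             disconnecting_instance_uuid):
    """volume_attachments: list of {'instance_uuid': ...} for the volume.
    local_instance_uuids: uuids of instances running on this compute host.
    """
    still_used_locally = [
        a for a in volume_attachments
        if a['instance_uuid'] in local_instance_uuids
        and a['instance_uuid'] != disconnecting_instance_uuid
    ]
    return not still_used_locally


# Example: after the swap, volume1 is only attached to server2 (other host),
# so server1's host should be free to disconnect volume1's target.
attachments = [{'instance_uuid': 'e0b74695-f9a0-4f4c-8a08-a0de3420cc6c'}]  # server2
print(should_disconnect_target(
    attachments,
    local_instance_uuids={'d46fba31-9469-4799-b2bf-1fbad4369a9a'},   # server1's host
    disconnecting_instance_uuid='d46fba31-9469-4799-b2bf-1fbad4369a9a'))
# -> True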