Bug #2019190 “[RBD] Retyping of in-use boot volumes renders inst...” : Series wallaby : Bugs : Cinder

Revision history for this message

Sofia Enriquez (lsofia-enriquez) wrote on 2023-05-17:

#1

Hello Alexander Käb,

To clarify:
- (double check) Are instances created from volumes, or are volumes attached to an instance? Can you share the command you are using to do this (steps).
- Is the data on the volumes encrypted?
- Have you encountered any errors in the cinder c-vol logs? Could you share the c-vol log?

Thanks!

tags:	added: drivers live-migration nova rbd retype
Changed in cinder:
importance:	Undecided → Medium
summary:	- Retyping of in-use boot volumes renders instances unusable (possible - data corruption) + [RBD] Retyping of in-use boot volumes renders instances unusable + (possible data corruption)

Revision history for this message

Sofia Enriquez (lsofia-enriquez) wrote on 2023-05-17:

#2

Adding Nova because the report indicates that the volume is migrated to a different ceph pool but the instance points to the old location.

Revision history for this message

Alexander Käb (alexander-kaeb) wrote on 2023-05-22:

#3

cinder-volume.log Edit (359.4 KiB, text/plain)

Hi Sofia,

all the tested instances were created from an image, with the option `Create New Volume`
checked, when creating an instance via the dashboard. The steps performed to retype the
volumes are as follows:

- Either via the Dashboard or the CLI (`cinder retype --migration-policy on-demand [...]`) retype the volume from either slow to fast or fast to slow
- rebooting the instance using i.e. a soft reboot

Just these two steps are enough to bring the instance to an error state, as libvirt will
try to load the instance's volume from the pre-retype location which will fail.
Sometimes live-migrating the instance after the retype can lead to the instance working again, but
if the instance performs some IO-Operations, there is a great chance, that the FS is broken
after an reboot:

```
[[0;32m OK [0m] Stopped target [0;1;39mBasic System[0m.
[[0;32m OK [0m] Reached target [0;1;39mInitrd File Systems[0m.
[[0;32m OK [0m] Stopped target [0;1;39mSystem Initialization[0m.
[[0;32m OK [0m] Stopped [0;1;39mdracut pre-mount hook[0m.
[[0;32m OK [0m] Stopped [0;1;39mdracut initqueue hook[0m.
[[0;32m OK [0m] Stopped [0;1;39mdracut pre-trigger hook[0m.
[[0;32m OK [0m] Stopped [0;1;39mdracut pre-udev hook[0m.
[[0;32m OK [0m] Stopped [0;1;39mdracut cmdline hook[0m.
[[0;32m OK [0m] Started [0;1;39mEmergency Shell[0m.
[[0;32m OK [0m] Reached target [0;1;39mEmergency Mode[0m.

Generating "/run/initramfs/rdsosreport.txt"

Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.

[?2004h:/#
```

Attached you will find the cinder-volume log and the nova-compute log during an earlier
test. (debug log enabled)

Revision history for this message

Alexander Käb (alexander-kaeb) wrote on 2023-05-22:

#4

nova-compute.log Edit (2.2 MiB, text/html)

nova log

Revision history for this message

melanie witt (melwitt) wrote on 2023-07-18:

#5

Generally, nova gets the volume locations from cinder as a field called 'connection_info' which belongs to a volume attachment.

The way retype usually works is cinder creates a new empty volume with the destination volume type and then calls the nova swap_volume API [1] to swap the volume from the original source volume to the new destination volume. Nova will call the cinder API to create a new attachment for the destination volume. Then, nova gathers the nova-compute host connector and calls the cinder API to update the attachment with the host connector. Cinder API returns the new connection_info from this call. Nova calls down into the libvirt driver to connect the new volume and copy the volume data from the old volume to the new volume, using the new connection_info for the destination libvirt XML. Finally, Nova disconnects the old volume.

However from what I can tell reading the code, in the case of the RBD driver on the cinder side, I don't see that nova is called at all as part of the retyping process, so it doesn't know about the new volume location when it goes to generate the guest XML.

I found mention about this issue on the ceph-users mailing list recently as well:

https://<email address hidden>/thread/TJO6YBJFHCY743UPQDY4D4PENZDQFAHH

which pointed to these posts on the openstack-discuss mailing list:

https://lists.openstack.org/pipermail/openstack-discuss/2023-June/034160.html

https://lists.openstack.org/pipermail/openstack-discuss/2023-June/034165.html

According to the second post, the retype of attached RBD volumes was working in Victoria as long as the [nova] section of the cinder.conf was configured and then it stopped working in Wallaby. The second post noted https://bugs.launchpad.net/cinder/+bug/1886543 as the only change around retype for Wallaby, so is it possible that is related?

I think this bug is Critical given it's a regression and has potential for data loss. Please let me know if I’ve got anything wrong here and/or if anything is needed on the nova side.

[1] https://github.com/openstack/cinder/blob/5728d3899f13140203d44259ca8dfb7ae132e192/cinder/volume/manager.py#L2429

Generally, nova gets the volume locations from cinder as a field called 'connection_info' which belongs to a volume attachment.

The way retype usually works is cinder creates a new empty volume with the destination volume type and then calls the nova swap_volume API [1] to swap the volume from the original source volume to the new destination volume. Nova will call the cinder API to create a new attachment for the destination volume. Then, nova gathers the nova-compute host connector and calls the cinder API to update the attachment with the host connector. Cinder API returns the new connection_info from this call. Nova calls down into the libvirt driver to connect the new volume and copy the volume data from the old volume to the new volume, using the new connection_info for the destination libvirt XML. Finally, Nova disconnects the old volume.

However from what I can tell reading the code, in the case of the RBD driver on the cinder side, I don't see that nova is called at all as part of the retyping process, so it doesn't know about the new volume location when it goes to generate the guest XML.

I found mention about this issue on the ceph-users mailing list recently as well:

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/TJO6YBJFHCY743UPQDY4D4PENZDQFAHH

which pointed to these posts on the openstack-discuss mailing list:

https://lists.openstack.org/pipermail/openstack-discuss/2023-June/034160.html

https://lists.openstack.org/pipermail/openstack-discuss/2023-June/034165.html

According to the second post, the retype of attached RBD volumes was working in Victoria as long as the [nova] section of the cinder.conf was configured and then it stopped working in Wallaby. The second post noted https://bugs.launchpad.net/cinder/+bug/1886543 as the only change around retype for Wallaby, so is it possible that is related?

I think this bug is Critical given it's a regression and has potential for data loss. Please let me know if I’ve got anything wrong here and/or if anything is needed on the nova side.
 
[1] https://github.com/openstack/cinder/blob/5728d3899f13140203d44259ca8dfb7ae132e192/cinder/volume/manager.py#L2429

Sofia Enriquez (lsofia-enriquez) on 2023-07-19

Changed in cinder:
importance:	Medium → Critical

Eric Harney (eharney) on 2023-07-20

Changed in cinder:
assignee:	nobody → Eric Harney (eharney)

Revision history for this message

melanie witt (melwitt) wrote on 2023-07-21:

#6

I spent some time on this and I was able to reproduce the bug.

I am not sure exactly how RBD assisted volume migration is supposed to work but there is no call to Nova happening, so Nova doesn't know anything has changed. That point kind of doesn't matter though because AFAICT there is no existing API call that could be used to tell Nova, "point at the new volume location without copying any volume data to it". The only API we have at present is the swap volume API and there's no way to tell it not to copy volume data.

The other issue I see is that the volume attachment connection_info on the Cinder side does not itself get updated with the new volume location. So even if Nova was able to pull new connection_info from Cinder [1], it would still fail to boot because the new volume location isn't there.

Based on the fact that we don't have an API to tell Nova about the new volume location without copying data, I'm not sure what we can do to immediately fix this other than revert the patch that changed the mechanism for RBD volume retype.

For a future fix, I "think" it would not be difficult to add a "do not copy" type of flag to the PUT /servers/{server_id}/os-volume_attachments/{volume_id} API in Nova [2]. Then after the retype Cinder could call Nova to say "this volume moved but don't copy any data there".

Here are the steps I used to reproduce the issue:

https://paste.openstack.org/show/bNpzkjbeXrmTCwNHfDGs

No volumes are encrypted and the [nova] section is configured in cinder.conf.

[1] https://docs.openstack.org/nova/latest/cli/nova-manage.html#volume-attachment-refresh
[2] https://docs.openstack.org/api-ref/compute/?expanded=update-a-volume-attachment-detail#update-a-volume-attachment

Revision history for this message

melanie witt (melwitt) wrote on 2023-08-04:

#7

Download full text (9.4 KiB)

I uploaded a DNM tempest patch to run modified TestVolumeMigrateRetypeAttached tests in tempest/scenario/test_volume_migrate_attached.py with the master, stable/wallaby, and stable/victoria branches [1]:

https://review.opendev.org/c/openstack/tempest/+/890360

The tests in ^ are modified to add a hard reboot of the instance at the end.

The migrate volume test passes in all branches while the retype volume test fails in master and stable/wallaby but passes in stable/victoria [2].

The unmodified tests will pass because they aren't hard rebooting the server to cause regeneration of guest XML.

In the test logs on the DNM patch [2], I think I might have also found why migrate works while retype fails.

The RBD driver [3] makes a decision about which path to take based on the volume status. In the test logs, it's showing that for migrate, the volume is 'in-use' and the RBD driver (correctly) considers this case to be a move across different pools and falls back to a generic migrate which calls the Nova swap volume API. For retype however, the volume status is 'retyping' so it doesn't refuse the assisted migration and it goes ahead.

Excerpts from the c-vol log:

migrate volume:

Aug 03 22:24:16.833416 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.manager [None req-1c151856-e8fb-41e3-ad42-36810f4fcec8 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Issue driver.migrate_volume. {{(pid=116332) migrate_volume /opt/stack/cinder/cinder/volume/manager.py:2609}}
Aug 03 22:24:16.834270 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-1c151856-e8fb-41e3-ad42-36810f4fcec8 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Attempting RBD assisted volume migration. volume: 9a27b9cd-e6e5-4f29-a127-a030e94c5356, host: {'host': 'np0034853654@ceph2#ceph2', 'cluster_name': None, 'capabilities': {'vendor_name': 'Open Source', 'driver_version': '1.2.0', 'storage_protocol': 'ceph', 'total_capacity_gb': 24.56, 'free_capacity_gb': 24.56, 'reserved_percentage': 0, 'multiattach': True, 'thin_provisioning_support': True, 'max_over_subscription_ratio': '20.0', 'location_info': 'ceph:/etc/ceph/ceph.conf:018eb22d-04d2-464f-8294-675d033013df:cinder:othervolumes', 'backend_state': 'up', 'volume_backend_name': 'ceph2', 'replication_enabled': False, 'allocated_capacity_gb': 0, 'filter_function': None, 'goodness_function': None, 'timestamp': '2023-08-03T22:23:59.050934'}}, status=in-use. {{(pid=116332) migrate_volume /opt/stack/cinder/cinder/volume/drivers/rbd.py:1924}}
Aug 03 22:24:16.834270 np0034853654 cinder-volume[116332]: DEBUG os_brick.initiator.linuxrbd [None req-1c151856-e8fb-41e3-ad42-36810f4fcec8 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] opening connection to ceph cluster (timeout=-1). {{(pid=116332) connect /opt/stack/os-brick/os_brick/initiator/linuxrbd.py:70}}
Aug 03 22:24:16.861112 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-1c151856-e8fb-41e3-ad42-36810f4fcec8 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] connecting to cinder@ceph (conf=/etc/ceph/ceph.conf, timeout=-1). {{(pid=116332) _do_conn /opt/stack/cinder/cinder/volume/drivers/rbd.py:480}}
Au...

I uploaded a DNM tempest patch to run modified TestVolumeMigrateRetypeAttached tests in tempest/scenario/test_volume_migrate_attached.py with the master, stable/wallaby, and stable/victoria branches [1]:

https://review.opendev.org/c/openstack/tempest/+/890360

The tests in ^ are modified to add a hard reboot of the instance at the end.

The migrate volume test passes in all branches while the retype volume test fails in master and stable/wallaby but passes in stable/victoria [2].

The unmodified tests will pass because they aren't hard rebooting the server to cause regeneration of guest XML.

In the test logs on the DNM patch [2], I think I might have also found why migrate works while retype fails.

The RBD driver [3] makes a decision about which path to take based on the volume status. In the test logs, it's showing that for migrate, the volume is 'in-use' and the RBD driver (correctly) considers this case to be a move across different pools and falls back to a generic migrate which calls the Nova swap volume API. For retype however, the volume status is 'retyping' so it doesn't refuse the assisted migration and it goes ahead.

Excerpts from the c-vol log:

migrate volume:

Aug 03 22:24:16.833416 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.manager [None req-1c151856-e8fb-41e3-ad42-36810f4fcec8 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Issue driver.migrate_volume. {{(pid=116332) migrate_volume /opt/stack/cinder/cinder/volume/manager.py:2609}}
Aug 03 22:24:16.834270 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-1c151856-e8fb-41e3-ad42-36810f4fcec8 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Attempting RBD assisted volume migration. volume: 9a27b9cd-e6e5-4f29-a127-a030e94c5356, host: {'host': 'np0034853654@ceph2#ceph2', 'cluster_name': None, 'capabilities': {'vendor_name': 'Open Source', 'driver_version': '1.2.0', 'storage_protocol': 'ceph', 'total_capacity_gb': 24.56, 'free_capacity_gb': 24.56, 'reserved_percentage': 0, 'multiattach': True, 'thin_provisioning_support': True, 'max_over_subscription_ratio': '20.0', 'location_info': 'ceph:/etc/ceph/ceph.conf:018eb22d-04d2-464f-8294-675d033013df:cinder:othervolumes', 'backend_state': 'up', 'volume_backend_name': 'ceph2', 'replication_enabled': False, 'allocated_capacity_gb': 0, 'filter_function': None, 'goodness_function': None, 'timestamp': '2023-08-03T22:23:59.050934'}}, status=in-use. {{(pid=116332) migrate_volume /opt/stack/cinder/cinder/volume/drivers/rbd.py:1924}}
Aug 03 22:24:16.834270 np0034853654 cinder-volume[116332]: DEBUG os_brick.initiator.linuxrbd [None req-1c151856-e8fb-41e3-ad42-36810f4fcec8 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] opening connection to ceph cluster (timeout=-1). {{(pid=116332) connect /opt/stack/os-brick/os_brick/initiator/linuxrbd.py:70}}
Aug 03 22:24:16.861112 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-1c151856-e8fb-41e3-ad42-36810f4fcec8 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] connecting to cinder@ceph (conf=/etc/ceph/ceph.conf, timeout=-1). {{(pid=116332) _do_conn /opt/stack/cinder/cinder/volume/drivers/rbd.py:480}}
Aug 03 22:24:16.889427 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-1c151856-e8fb-41e3-ad42-36810f4fcec8 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Migration in-use volume between different pools. Falling back to generic migration. {{(pid=116332) migrate_volume /opt/stack/cinder/cinder/volume/drivers/rbd.py:1958}}

retype volume:

Aug 03 22:25:38.269904 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.manager [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Issue driver.migrate_volume. {{(pid=116332) migrate_volume /opt/stack/cinder/cinder/volume/manager.py:2609}}
Aug 03 22:25:38.271201 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Attempting RBD assisted volume migration. volume: 59681499-b1b4-4fcb-af7a-24a64ded93df, host: {'host': 'np0034853654@ceph2#ceph2', 'cluster_name': None, 'capabilities': {'vendor_name': 'Open Source', 'driver_version': '1.2.0', 'storage_protocol': 'ceph', 'total_capacity_gb': 24.61, 'free_capacity_gb': 24.61, 'reserved_percentage': 0, 'multiattach': True, 'thin_provisioning_support': True, 'max_over_subscription_ratio': '20.0', 'location_info': 'ceph:/etc/ceph/ceph.conf:018eb22d-04d2-464f-8294-675d033013df:cinder:othervolumes', 'backend_state': 'up', 'volume_backend_name': 'ceph2', 'replication_enabled': False, 'allocated_capacity_gb': 0, 'filter_function': None, 'goodness_function': None, 'timestamp': '2023-08-03T22:24:59.055464'}}, status=retyping. {{(pid=116332) migrate_volume /opt/stack/cinder/cinder/volume/drivers/rbd.py:1924}}
Aug 03 22:25:38.271569 np0034853654 cinder-volume[116332]: DEBUG os_brick.initiator.linuxrbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] opening connection to ceph cluster (timeout=-1). {{(pid=116332) connect /opt/stack/os-brick/os_brick/initiator/linuxrbd.py:70}}
Aug 03 22:25:38.289482 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] connecting to cinder@ceph (conf=/etc/ceph/ceph.conf, timeout=-1). {{(pid=116332) _do_conn /opt/stack/cinder/cinder/volume/drivers/rbd.py:480}}
Aug 03 22:25:38.309237 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] connecting to cinder@ceph (conf=/etc/ceph/ceph.conf, timeout=-1). {{(pid=116332) _do_conn /opt/stack/cinder/cinder/volume/drivers/rbd.py:480}}
Aug 03 22:25:39.129720 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] connecting to cinder@ceph (conf=/etc/ceph/ceph.conf, timeout=-1). {{(pid=116332) _do_conn /opt/stack/cinder/cinder/volume/drivers/rbd.py:480}}
Aug 03 22:25:39.170401 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] volume has no backup snaps {{(pid=116332) _delete_backup_snaps /opt/stack/cinder/cinder/volume/drivers/rbd.py:1104}}
Aug 03 22:25:39.170857 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Volume volume-59681499-b1b4-4fcb-af7a-24a64ded93df is not a clone. {{(pid=116332) _get_clone_info /opt/stack/cinder/cinder/volume/drivers/rbd.py:1127}}
Aug 03 22:25:39.177397 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] deleting rbd volume volume-59681499-b1b4-4fcb-af7a-24a64ded93df {{(pid=116332) delete_volume /opt/stack/cinder/cinder/volume/drivers/rbd.py:1247}}
Aug 03 22:25:39.242308 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] moving volume volume-59681499-b1b4-4fcb-af7a-24a64ded93df to trash {{(pid=116332) _try_remove_volume /opt/stack/cinder/cinder/volume/drivers/rbd.py:1235}}
Aug 03 22:25:39.318993 np0034853654 cinder-volume[116332]: INFO cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Successful RBD assisted volume migration.
Aug 03 22:25:39.347198 np0034853654 cinder-volume[116332]: INFO cinder.volume.manager [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Migrate volume completed successfully.
Aug 03 22:25:39.383312 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] connecting to cinder@ceph (conf=/etc/ceph/ceph.conf, timeout=-1). {{(pid=116332) _do_conn /opt/stack/cinder/cinder/volume/drivers/rbd.py:480}}
Aug 03 22:25:39.410340 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] connecting to cinder@ceph (conf=/etc/ceph/ceph.conf, timeout=-1). {{(pid=116332) _do_conn /opt/stack/cinder/cinder/volume/drivers/rbd.py:480}}
Aug 03 22:25:39.435421 np0034853654 cinder-volume[116332]: DEBUG cinder.manager [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Notifying Schedulers of capabilities ... {{(pid=116332) _publish_service_capabilities /opt/stack/cinder/cinder/manager.py:197}}
Aug 03 22:25:39.442048 np0034853654 cinder-volume[116332]: INFO cinder.volume.manager [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Retype volume completed successfully.

[1] (Note: I tried to include stable/train in the test runs but the job was unable to complete devstack setup due to failure to install the dependencies needed to run with the tempest master branch)
[2] https://review.opendev.org/c/openstack/tempest/+/890360
[3] https://github.com/openstack/cinder/blob/ff4b1c910e65274efcbc0fd052f1f9bc5a643603/cinder/volume/drivers/rbd.py#L2214

Revision history for this message

Luigi Toscano (ltoscano) wrote on 2023-09-06:

#8

Can the tempest patch be resurrect and pushed as a proper patch? I didn't notice this comment (sorry) and ended up writing a simpler version, which I'm going to abandon: https://review.opendev.org/c/openstack/tempest/+/893863

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2023-09-22: Related fix proposed to cinder (master)

#9

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/896172

Revision history for this message

melanie witt (melwitt) wrote on 2023-09-27:

#10

Thank you Luigi for pointing that out!

I have pushed a proper patch and proposed two more patches as well to enable us to configure Ceph in devstack to use a separate Ceph pool per backend:

* tempest patch to test regression: https://review.opendev.org/c/openstack/tempest/+/890360

* devstack-plugin-ceph patch to enable config of separate Ceph pools: https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/895533

* cinder patch to add a cinder-tempest-ceph-multibackend job: https://review.opendev.org/c/openstack/cinder/+/896172

Revision history for this message

Yusuf Güngör (yusuf2) wrote on 2023-10-02:

#11

Hi everyone, on our test we have a workaround. After volume retype, cold migrating the instance updates the pool name on guest xml and creates a new volume attachment which contains the new pool name on the attachment properties.

Cinder

[RBD] Retyping of in-use boot volumes renders instances unusable (possible data corruption)

Bug Description

Other bug subscribers

Bug attachments

Remote bug watches

	Status	Importance	Assigned to
Cinder	New	Critical	Eric Harney
Wallaby	New	Critical	Unassigned
OpenStack Compute (nova)	New	Undecided	Unassigned