Support service tokens to prevent failures of long-running (1-3+ hours) retype/migration jobs

Bug #1986886 reported by Trent Lloyd
This bug affects 1 person
Affects: OpenStack Cinder Charm
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

When a Cinder volume is retyped to a type on another storage provider, it is moved live from one provider to the other. This process can take hours or even days, in which case the original Keystone token expires before the move finishes and the migration cannot be completed.

The migration then gets stuck in the "migrating" status: it neither errors out, reverts nor completes, and there is no automated or easy way to finish it. For a Ceph migration this leaves the VM in a dangerous state: if the VM is stopped and started, it reverts from the new storage to the old storage, rolling back days, weeks or months of data. Although both volumes still exist, reconciling them can be very difficult, depending on the application.
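Because nothing fails loudly, affected volumes have to be found by looking for ones left in the "migrating" state. The following is a minimal Python sketch of such a check. It assumes simplified, hypothetical volume records with `id`, `migration_status` and `updated_at` keys (in the real Cinder API the migration status may be admin-only or exposed under a different field name depending on the API version), and it uses `updated_at` only as a rough proxy for when the migration started. The function name, threshold and sample records are made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Mapping


def find_long_running_migrations(volumes: Iterable[Mapping[str, str]],
                                 now: datetime,
                                 max_age: timedelta) -> List[str]:
    """Return the IDs of volumes still 'migrating' after more than max_age.

    Hypothetical helper: 'updated_at' is only a rough proxy for when the
    migration started, so results are candidates for manual inspection.
    """
    flagged = []
    for vol in volumes:
        if vol.get("migration_status") != "migrating":
            continue
        updated = datetime.fromisoformat(vol["updated_at"])
        if updated.tzinfo is None:
            # Assumption: naive API timestamps are in UTC.
            updated = updated.replace(tzinfo=timezone.utc)
        if now - updated > max_age:
            flagged.append(vol["id"])
    return flagged


# Made-up sample records for illustration only.
sample = [
    {"id": "vol-a", "migration_status": "migrating", "updated_at": "2022-08-10T09:00:00.000000"},
    {"id": "vol-b", "migration_status": "success", "updated_at": "2022-08-10T09:00:00"},
    {"id": "vol-c", "migration_status": "migrating", "updated_at": "2022-08-17T11:30:00"},
]
now = datetime(2022, 8, 17, 12, 0, tzinfo=timezone.utc)
print(find_long_running_migrations(sample, now, timedelta(hours=3)))  # ['vol-a']
```

A long-running migration is not necessarily a stuck one, so a flag like this is a prompt to check the token situation described here, not proof of failure.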

From the upstream bug here:
https://bugs.launchpad.net/cinder/+bug/1969408

This can sometimes be prevented using a service token:
https://docs.openstack.org/cinder/latest/configuration/block-storage/service-token.html
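As a rough illustration, Cinder's service-token settings live in a `[service_user]` section of `cinder.conf`. The Python sketch below renders such a section with `configparser`. The option names reflect our understanding of the linked documentation (`send_service_user_token` plus the usual keystoneauth password options); the URL, user name, password and the `services` project name are placeholders and assumptions about a charm-style deployment. Check the linked page for the role the service user needs and whether other services (for example Nova) need matching settings.

```python
import configparser
import io


def render_service_user_section(auth_url: str, password: str,
                                username: str = "cinder",
                                project_name: str = "services") -> str:
    """Render a [service_user] section for cinder.conf (placeholder values)."""
    # interpolation=None so passwords containing '%' are written verbatim.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["service_user"] = {
        # Ask Cinder to send its own service token along with the user token.
        "send_service_user_token": "true",
        # keystoneauth password-plugin options for the service user.
        "auth_type": "password",
        "auth_url": auth_url,
        "username": username,
        "password": password,
        "user_domain_name": "Default",
        "project_name": project_name,
        "project_domain_name": "Default",
    }
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


print(render_service_user_section("https://keystone.example:5000/v3", "CHANGE_ME"))
```

In a charmed deployment this file is generated by the charm, so a sketch like this only shows what the charm would need to emit; it is not a substitute for charm support.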

This may still not fully solve the issue: the default Fernet key rotation and token expiration together only cover 3 × 1 hour = 3 hours, so further changes to the default Keystone configuration may be needed, as described in "Troubleshooting #3" of the service-token documentation linked above.
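To see why the Fernet key repository matters, here is a Python sketch of a sizing rule of thumb that, to our understanding, appears in Keystone's Fernet FAQ: `max_active_keys` should be at least `(token_expiration + allow_expired_window) / rotation_frequency + 2`, so that the key that signed a token is still present for as long as Keystone may need to validate it. The helper name and sample values (including the hourly rotation) are illustrative; verify the formula against the Keystone documentation for your release.

```python
import math
from datetime import timedelta


def min_max_active_keys(token_expiration: timedelta,
                        allow_expired_window: timedelta,
                        rotation_frequency: timedelta) -> int:
    """Hypothetical helper: lower bound for [fernet_tokens] max_active_keys.

    Assumed rule of thumb (see lead-in):
        (token_expiration + allow_expired_window) / rotation_frequency + 2
    The +2 is usually explained as room for the staged and primary keys.
    """
    covered = token_expiration + allow_expired_window
    # timedelta / timedelta yields a float; round up to cover the whole window.
    rotations = math.ceil(covered / rotation_frequency)
    return rotations + 2


# 1-hour tokens, a 2-day allow_expired_window, keys rotated hourly.
print(min_max_active_keys(timedelta(hours=1), timedelta(days=2), timedelta(hours=1)))  # 51
```

The point of the sketch is the proportionality: lengthening the window a token must survive either requires more active keys or less frequent rotation.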

The default Keystone allow_expired_window may also be insufficient: in this specific case the retype took 7 days, whereas the default allow_expired_window is 2 days.
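The arithmetic for this case can be sketched in Python as follows. It assumes the user token was issued when the retype started and that, when a valid service token accompanies the request, Keystone accepts the expired user token for up to `allow_expired_window` after it expires. The defaults used are Keystone's `[token] expiration` of 3600 seconds and `[token] allow_expired_window` of 172800 seconds (2 days); the helper names are hypothetical.

```python
from datetime import timedelta

# Keystone defaults: [token] expiration = 3600 s, [token] allow_expired_window = 172800 s.
DEFAULT_TOKEN_EXPIRATION = timedelta(seconds=3600)
DEFAULT_ALLOW_EXPIRED_WINDOW = timedelta(seconds=172800)


def job_fits(job_duration: timedelta,
             token_expiration: timedelta = DEFAULT_TOKEN_EXPIRATION,
             allow_expired_window: timedelta = DEFAULT_ALLOW_EXPIRED_WINDOW) -> bool:
    """True if a token issued at job start is still accepted at job end
    (assuming a service token is sent with the final request)."""
    return job_duration <= token_expiration + allow_expired_window


def required_allow_expired_window(job_duration: timedelta,
                                  token_expiration: timedelta = DEFAULT_TOKEN_EXPIRATION,
                                  margin: timedelta = timedelta(0)) -> timedelta:
    """Smallest allow_expired_window covering the job, plus an optional safety margin."""
    return max(job_duration + margin - token_expiration, timedelta(0))


retype = timedelta(days=7)  # duration reported in this bug
print(job_fits(retype))                       # False
print(required_allow_expired_window(retype))  # 6 days, 23:00:00
```

Even with service tokens configured, a 7-day retype would therefore need the window raised well beyond the default, ideally with a safety margin since retype duration is hard to predict.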

Tags: sts
Trent Lloyd (lathiat)
Changed in charm-cinder:
status: New → Confirmed
tags: added: sts