Cinder fails to automatically map availability zones to volume types

Bug #1999706 reported by Alan Baghumian
This bug affects 4 people
Affects: Cinder
Status: In Progress
Importance: High
Assigned to: Unassigned

Bug Description

Scenario:

- Multi-AZ OpenStack deployment

$ openstack availability zone list --compute
+-----------+-------------+
| Zone Name | Zone Status |
+-----------+-------------+
| az1 | available |
| az2 | available |
| internal | available |
+-----------+-------------+

- Multiple Cinder-Ceph storage backends each available in their respective AZ

$ openstack availability zone list --volume
+-----------+---------------+
| Zone Name | Zone Status |
+-----------+---------------+
| az1 | available |
| nova | not available |
| az2 | available |
+-----------+---------------+

- Each Cinder storage backend is tied to a unique volume type

$ openstack volume type list --long
+--------------------------------------+-------------+-----------+---------------------+------------------------------------------------------------------------------+
| ID | Name | Is Public | Description | Properties |
+--------------------------------------+-------------+-----------+---------------------+------------------------------------------------------------------------------+
| 84acc1d5-4a83-45b4-b4bd-beb9bd655c0c | ssd | True | None | RESKEY:availability_zones='nova,az2', volume_backend_name='cinder-ceph-ssd' |
| 332299a5-4b68-4663-b316-1df9735646a8 | nvme | True | None | RESKEY:availability_zones='nova,az1', volume_backend_name='cinder-ceph-nvme' |
| 07fc3552-b546-4eeb-ba4a-08d0b03c5c1b | __DEFAULT__ | True | Default Volume Type | |
+--------------------------------------+-------------+-----------+---------------------+------------------------------------------------------------------------------+
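As a reading aid: the RESKEY:availability_zones values above are comma-separated AZ lists, so the listing implies an AZ-to-volume-type index. The Python sketch below builds that index from the properties shown. VOLUME_TYPES and index_types_by_az are illustrative names chosen for this example, not Cinder code.

```python
from collections import defaultdict

# Extra specs copied from the volume type listing above.
VOLUME_TYPES = {
    "ssd": {
        "RESKEY:availability_zones": "nova,az2",
        "volume_backend_name": "cinder-ceph-ssd",
    },
    "nvme": {
        "RESKEY:availability_zones": "nova,az1",
        "volume_backend_name": "cinder-ceph-nvme",
    },
    "__DEFAULT__": {},
}


def index_types_by_az(volume_types):
    """Return {az: [type names]} for types that declare availability zones.

    Hypothetical helper for illustration only.
    """
    index = defaultdict(list)
    for name, specs in volume_types.items():
        raw = specs.get("RESKEY:availability_zones", "")
        for az in (part.strip() for part in raw.split(",")):
            if az:
                index[az].append(name)
    return dict(index)


AZ_INDEX = index_types_by_az(VOLUME_TYPES)
print(AZ_INDEX)
```

In this index, az1 and az2 each map to exactly one type, while nova is listed by both ssd and nvme, and __DEFAULT__ declares no AZ at all.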

$ openstack volume service list --long | grep -v disabled
+------------------+--------------------------------------+------+----------+-------+----------------------------+-----------------+
| Binary | Host | Zone | Status | State | Updated At | Disabled Reason |
+------------------+--------------------------------------+------+----------+-------+----------------------------+-----------------+
| cinder-backup | cinder | nova | enabled | up | 2022-12-15T02:15:19.000000 | None |
| cinder-volume | cinder@cinder-ceph-nvme | az1 | enabled | up | 2022-12-15T02:15:19.000000 | None |
| cinder-scheduler | cinder | nova | enabled | up | 2022-12-15T02:15:23.000000 | None |
| cinder-volume | cinder@cinder-ceph-ssd | az2 | enabled | up | 2022-12-15T02:15:24.000000 | None |
+------------------+--------------------------------------+------+----------+-------+----------------------------+-----------------+

Nova Cloud Controller has been configured to disallow cross-AZ volume attachment; otherwise, volumes could be created in an availability zone that the instance cannot access

$ juju config nova-cloud-controller cross-az-attach=false

Cinder should be able to determine the matching volume type when an AZ is specified; however, it fails to create the volume:

$ openstack volume create --availability-zone az2 --size 10 az2-volume-test
Availability zone 'az2' is invalid. (HTTP 400) (Request-ID: req-4a4ce41d-cf57-4157-8c4b-2cad8edd28aa)

Setting these juju configurations does not make a difference:

$ juju config cinder-ceph-ssd backend-availability-zone=az2
$ juju config cinder-ceph-nvme backend-availability-zone=az1

Tracing the process shows that the issue is related to the _get_volume_type() call in the _extract_availability_zones() function (trace log attached).

Looking into _get_volume_type(), it appears to be missing the logic to select a volume type that matches the availability zones defined with RESKEY:availability_zones.
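To make the missing behavior concrete, here is a minimal Python sketch of AZ-based type selection. It is not the attached patch and not Cinder's actual code: pick_volume_type and parse_type_azs are hypothetical helpers, volume types are modeled as plain dicts of extra specs, and the tie-break (first matching type wins) is an assumption made only for this example.

```python
from typing import Dict, List, Optional

AZ_KEY = "RESKEY:availability_zones"


def parse_type_azs(extra_specs: Dict[str, str]) -> Optional[List[str]]:
    """Return the AZs a type is restricted to, or None if it declares none."""
    raw = extra_specs.get(AZ_KEY)
    if not raw:
        return None
    return [az.strip() for az in raw.split(",") if az.strip()]


def pick_volume_type(
    requested_az: Optional[str],
    volume_types: Dict[str, Dict[str, str]],
    default_name: str = "__DEFAULT__",
) -> str:
    """Pick a type whose declared AZs include the requested AZ.

    Hypothetical selection logic: falls back to the default type when no
    AZ is requested or no type declares the requested AZ.
    """
    if requested_az:
        for name, specs in volume_types.items():
            azs = parse_type_azs(specs)
            if azs and requested_az in azs:
                return name  # assumption: first match wins
    return default_name


# Extra specs taken from the volume type listing in this report.
TYPES = {
    "ssd": {AZ_KEY: "nova,az2", "volume_backend_name": "cinder-ceph-ssd"},
    "nvme": {AZ_KEY: "nova,az1", "volume_backend_name": "cinder-ceph-nvme"},
    "__DEFAULT__": {},
}

print(pick_volume_type("az2", TYPES))
print(pick_volume_type("az1", TYPES))
print(pick_volume_type(None, TYPES))
```

Under these assumptions a request for az2 resolves to ssd and a request for az1 resolves to nvme. An AZ listed by more than one type (nova here) is ambiguous, and a real implementation would need an explicit rule for that case.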

I made a small patch that addresses that issue (attached).

Once the patch is applied, volumes are created with the correct volume type:

$ openstack volume create --availability-zone az2 --size 10 az2-volume-test
+---------------------+--------------------------------------+
| Field | Value |
+---------------------+--------------------------------------+
| attachments | [] |
| availability_zone | az2 |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2022-12-15T02:24:24.516545 |
| description | None |
| encrypted | False |
| id | 0d78adb5-aa4f-42a1-ab75-902603ff8d51 |
| migration_status | None |
| multiattach | False |
| name | az2-volume-test |
| properties | |
| replication_status | None |
| size | 10 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| type | ssd |
| updated_at | None |
| user_id | e40ba17bc77c462baa57b1f578b9b719 |
+---------------------+--------------------------------------+

Tags: az sts
Alan Baghumian (alanbach) wrote :

Adding the trace log. This was tested on cinder-* 2:19.1.1-0ubuntu1~cloud0

Alan Baghumian (alanbach) wrote :

This becomes a more serious issue when launching volume-backed instances, since the lack of this functionality causes all instance launches in az2 to go into ERROR because of the volume mapping issue described above.

tags: added: sts
tags: added: az
Changed in cinder:
importance: Undecided → High
Sofia Enriquez (lsofia-enriquez) wrote :

Hi Alan,
Would you mind proposing your patch upstream (on Gerrit)?
https://docs.openstack.org/contributors/code-and-documentation/using-gerrit.html
Thanks in advance
Sofia

Alan Baghumian (alanbach) wrote :

Hi Sofia,

Done.

This is the link: https://review.opendev.org/c/openstack/cinder/+/868539

Please let me know if you need me to do anything else.

Best,
Alan

Jay Jahns (jjahns) wrote :

Hi folks,

We are seeing this issue in a multi-AZ environment, where backends are specific to the AZ. There are volume types that need to be used specifically for key/values.

Right now, it will use __DEFAULT__, and while that works, it makes moving VMs with bootable volumes across AZs more complex because the volume type is the same.

Is there any prioritization to getting this done and backported?

Alan Baghumian (alanbach) wrote :

Hello,

I updated the patch set to include unit tests, and uploaded a fresh review:

https://review.opendev.org/c/openstack/cinder/+/893593

This builds and passes all the tests in a clean jammy chroot with all the required build dependencies installed.

Best,
Alan

Changed in cinder:
status: New → In Progress