cinder backup tries to restore to wrong host

Bug #1949313 reported by Sam Morrison
This bug affects 2 people
Affects: Cinder
Status: In Progress
Importance: Medium
Assigned to: Sam Morrison
Milestone: none listed

Bug Description

I have discovered a bug that occurs with multiple AZs when the cinder-volume and cinder-backup hosts in one AZ can't talk to the backend storage in a different AZ.

Environment:
2 AZs, each with a c-vol and a c-bak
The cinder servers in each AZ have their own ceph, and that ceph is only accessible from within the AZ
cinder backup uses the swift driver, and swift is globally accessible

Steps to reproduce:
Create a volume in az1
Back up the volume
Restore the volume by creating a new volume and passing in the backup ID and az1 (or az2, it doesn't matter). This goes to error_restoring 50% of the time

What happens is:
The volume is created at az1 and then logs:

Backend does not support creating volume from backup 967f369f-c269-4393-aebb-14d570cc3baa. It will directly create the raw volume at the backend and then schedule the request to the backup service to restore the volume with backup.

The code then has some logic to decide which backup host should do the restore.
This logic randomly picks a host. If it picks a host in the other AZ, the restore fails because that host can't talk to the ceph where the new volume lives.

What it should do is choose a host that is in the same AZ as the newly created volume, not one based on the AZ of the backup (which is None).
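To make the 50% figure concrete, here is a minimal sketch of the environment described above. It is not cinder code: the `BackupHost` class, the host names, and the `restore_succeeds` rule are hypothetical stand-ins for "one c-bak per AZ, each able to reach only its own ceph".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupHost:
    name: str
    az: str


# Hypothetical topology taken from the report: one cinder-backup host per AZ,
# each able to reach only the ceph cluster in its own AZ.
BACKUP_HOSTS = [BackupHost("c-bak-az1", "az1"), BackupHost("c-bak-az2", "az2")]


def restore_succeeds(host: BackupHost, volume_az: str) -> bool:
    """Assumed rule: a restore works only if the host shares the volume's AZ."""
    return host.az == volume_az


def failure_rate(volume_az: str) -> float:
    """Fraction of equally likely host choices that cannot reach the volume."""
    failures = [h for h in BACKUP_HOSTS if not restore_succeeds(h, volume_az)]
    return len(failures) / len(BACKUP_HOSTS)


if __name__ == "__main__":
    for az in ("az1", "az2"):
        print(az, failure_rate(az))
```

With two hosts and a uniformly random pick, half of the possible choices land in the wrong AZ regardless of which AZ the new volume was created in, which matches the observed behaviour.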

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/816104

Changed in cinder:
status: New → In Progress
Sam Morrison (sorrison) wrote :

I should add the CLI steps to be clear:

cinder create --availability-zone az1 1
cinder backup-create <vol-id>
cinder create --backup <backup-id> --availability-zone az1 1

Sam Morrison (sorrison)
description: updated
Sofia Enriquez (lsofia-enriquez) wrote :

Greetings Sam Morrison,
Are you using RBD for c-vol or iscsi?
Regards,
Sofia

tags: added: az backup-service ceph restore swift
Changed in cinder:
importance: Undecided → Medium
assignee: nobody → Sam Morrison (sorrison)
Sam Morrison (sorrison) wrote :

Using RBD

Denis Nurislamov (spumm) wrote :

Hello,

The "backup_use_same_host" option partially solves the problem of scheduling restore tasks for backups. As described in the sample configuration linked below, this option allows backup services to use the same backend, so all backup tasks are scheduled on the same host:
https://docs.openstack.org/cinder/pike/sample_config.html#:~:text=%23backup_use_same_host%20%3D%20false

However, this solution has limitations. It only allows restoring backups to the same availability zone (AZ) where the backup was created.
Additionally, managing backups is difficult for users because the "host" field is not visible to them. Moreover, when creating a volume from a backup or restoring it to a new volume, the AZ must be set explicitly.
Otherwise, the volume will be created in another AZ whose storage cinder-backup cannot access.

As far as I understand, the AZ field is no longer filled for created backups, as seen in this commit:
https://github.com/openstack/cinder/commit/f0211b53b82cde96eb39e9914c3e5fae232db07c

The availability zone is always set to None.

Therefore, when hosts are filtered for the task, the AZ is not specified, as shown in this part of the code:
https://github.com/openstack/cinder/blob/b75c29c7d8e0e6ac212b59f9ad8d140874e55251/cinder/backup/api.py#L443-L444
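The effect of passing an AZ of None can be illustrated with a small sketch. This is an assumption about how such a filter behaves, modelled on the description above, and not the actual cinder implementation: the `SERVICES` list and the `candidate_backup_hosts` function are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical backup service records: (host, availability_zone).
SERVICES: List[Tuple[str, str]] = [("c-bak-az1", "az1"), ("c-bak-az2", "az2")]


def candidate_backup_hosts(az: Optional[str]) -> List[str]:
    """Return backup hosts in `az`.

    Assumed behaviour: when `az` is None, no AZ filtering is applied and
    every backup host is a candidate.
    """
    return [host for host, host_az in SERVICES if az is None or host_az == az]


if __name__ == "__main__":
    # backup.availability_zone is None, so every host is eligible.
    print(candidate_backup_hosts(None))
    # Passing the AZ of the newly created volume narrows the choice.
    print(candidate_backup_hosts("az1"))
```

Under this assumption, using the backup's AZ leaves hosts from every AZ in the candidate set, while using the volume's AZ restricts the set to hosts that can reach the volume's storage.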

This small patch resolved the problem with restoring backups.

```
Index: cinder/volume/flows/manager/create_volume.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/cinder/volume/flows/manager/create_volume.py b/cinder/volume/flows/manager/create_volume.py
--- a/cinder/volume/flows/manager/create_volume.py (revision f9941d2fb3064d2b9de397fbafe27af5b5247bac)
+++ b/cinder/volume/flows/manager/create_volume.py (date 1679903102534)
@@ -1091,7 +1091,7 @@
             volume.save()

             backup_host = self.backup_api.get_available_backup_service_host(
-                backup.host, backup.availability_zone)
+                backup.host, volume.availability_zone)
             updates = {'status': fields.BackupStatus.RESTORING,
                        'restore_volume_id': volume.id,
                        'host': backup_host}
Index: cinder/backup/api.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/cinder/backup/api.py b/cinder/backup/api.py
--- a/cinder/backup/api.py (revision f9941d2fb3064d2b9de397fbafe27af5b5247bac)
+++ b/cinder/backup/api.py (date 1679908441677)
@@ -394,7 +394,7 @@
         # Setting the status here rather than setting at start and unrolling
         # for each error condition, it should be a very small window
         backup.host = self._get_available_backup_service_host(
-            backup.host, backup.availability_zone)
+            backup.host, volume.availability_zone)
         backup.status = fields.BackupStatus.RESTORING
         backup.restore_volume_id = volume.id
         backup.save()
```

However, it should be noted that this patch does not cover cases where cinder backup service in each availability zone has ...


OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/879531

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/879534

OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master)

Change abandoned by "Denis Nurislamov <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/879531
Reason: invalid

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Denis Nurislamov <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/879534
Reason: invalid
