I found a race condition in the Cinder backup manager:
- when the Cinder backup service starts
- it loads the configured backup driver
- and it calls the setup_backup_backend function two times
1. from init_host()
2. and from _report_driver_status()
at the same time, and this causes the __init__ method from the configured driver to run again.
And this (for example for NFS backup driver) in turn, leads in an attempt to re-mount the NFS resource to the same mount point.
Steps to reproduce:
1. Configure NFS backup driver and enable debug:
cinder.conf
[DEFAULT]
debug = True
backup_driver = cinder.backup.drivers.nfs.NFSBackupDriver
backup_share = 10.3.35.43:/volumes/pool1/backup/ns5_nfs
backup_mount_options = vers=3
2. Restart cinder-backup service and check the service log file:
Jul 16 20:37:54 openstack-master-lustre7 cinder-backup[61505]: INFO oslo_service.service [None req-b6b67a1d-ac40-46a9-8cfb-5927888c9f08 None None] Starting 1 workers
Jul 16 20:37:54 openstack-master-lustre7 cinder-backup[61518]: DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): mkdir -p /opt/stack/data/cinder/backup_mount/92b96a49dff33a167398d7d3876ac2ee {{execute /usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py:377}}
Jul 16 20:37:54 openstack-master-lustre7 cinder-backup[61518]: DEBUG oslo_concurrency.processutils [-] CMD "mkdir -p /opt/stack/data/cinder/backup_mount/92b96a49dff33a167398d7d3876ac2ee" returned: 0 in 0.007s {{execute /usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py:416}}
Jul 16 20:37:54 openstack-master-lustre7 cinder-backup[61518]: DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): mkdir -p /opt/stack/data/cinder/backup_mount/92b96a49dff33a167398d7d3876ac2ee {{execute /usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py:377}}
Jul 16 20:37:54 openstack-master-lustre7 cinder-backup[61518]: DEBUG oslo_concurrency.processutils [-] CMD "mkdir -p /opt/stack/data/cinder/backup_mount/92b96a49dff33a167398d7d3876ac2ee" returned: 0 in 0.019s {{execute /usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py:416}}
Jul 16 20:37:55 openstack-master-lustre7 cinder-backup[61518]: DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): mount -t nfs -o vers=3,minorversion=0,timeo=100,nolock 10.3.35.43:/volumes/pool1/backup/ns5_nfs /opt/stack/data/cinder/backup_mount/92b96a49dff33a167398d7d3876ac2ee {{execute /usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py:377}}
Jul 16 20:37:55 openstack-master-lustre7 cinder-backup[61518]: DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): mount -t nfs -o vers=3,minorversion=0,timeo=100,nolock 10.3.35.43:/volumes/pool1/backup/ns5_nfs /opt/stack/data/cinder/backup_mount/92b96a49dff33a167398d7d3876ac2ee {{execute /usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py:377}}
Jul 16 20:37:55 openstack-master-lustre7 cinder-backup[61518]: DEBUG oslo_concurrency.processutils [-] CMD "mount -t nfs -o vers=3,minorversion=0,timeo=100,nolock 10.3.35.43:/volumes/pool1/backup/ns5_nfs /opt/stack/data/cinder/backup_mount/92b96a49dff33a167398d7d3876ac2ee" returned: 0 in 0.034s {{execute /usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py:416}}
Jul 16 20:37:55 openstack-master-lustre7 cinder-backup[61518]: DEBUG os_brick.remotefs.remotefs [-] Mounted 10.3.35.43:/volumes/pool1/backup/ns5_nfs using nfs. {{_mount_nfs /opt/stack/os-brick/os_brick/remotefs/remotefs.py:152}}
Jul 16 20:37:55 openstack-master-lustre7 cinder-backup[61518]: DEBUG oslo_concurrency.processutils [-] CMD "mount -t nfs -o vers=3,minorversion=0,timeo=100,nolock 10.3.35.43:/volumes/pool1/backup/ns5_nfs /opt/stack/data/cinder/backup_mount/92b96a49dff33a167398d7d3876ac2ee" returned: 0 in 0.057s {{execute /usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py:416}}
Jul 16 20:37:55 openstack-master-lustre7 cinder-backup[61518]: DEBUG os_brick.remotefs.remotefs [-] Mounted 10.3.35.43:/volumes/pool1/backup/ns5_nfs using nfs. {{_mount_nfs /opt/stack/os-brick/os_brick/remotefs/remotefs.py:152}}
As you can see, the Cinder backup manager ran the __init__ function from the NFS driver two times at the same time.
As a workaround I added coordination lock to the NFS driver:
$ git diff
diff --git a/cinder/backup/drivers/nfs.py b/cinder/backup/drivers/nfs.py
index 886c2cb..0434d50 100644
--- a/cinder/backup/drivers/nfs.py
+++ b/cinder/backup/drivers/nfs.py
@@ -26,6 +26,7 @@ from oslo_config import cfg
from oslo_log import log as logging
from cinder.backup.drivers import posix
+from cinder import coordination
from cinder import exception
from cinder import interface
from cinder import utils
@@ -78,6 +79,7 @@ class NFSBackupDriver(posix.PosixBackupDriver):
raise exception.InvalidConfigurationValue(option=flag,
value=val)
+ @coordination.synchronized('check-for-race')
def _init_backup_repo_path(self):
if self.backup_share is None:
LOG.info("_init_backup_repo_path: "
But it would be better to remove second call of setup_backup_backend function from _report_driver_status function:
diff --git a/cinder/backup/manager.py b/cinder/backup/manager.py
index 203ce12..0026f21 100644
--- a/cinder/backup/manager.py
+++ b/cinder/backup/manager.py
@@ -1089,8 +1089,6 @@ class BackupManager(manager.SchedulerDependentManager):
self._publish_service_capabilities(context)
def _report_driver_status(self, context):
- if not self.is_working():
- self.setup_backup_backend(context)
backup_stats = {
'backend_state': self.is_working(),
'driver_name': self.driver_name,
Fix proposed to branch: master /review. opendev. org/741543
Review: https:/