shared_targets_online_data_migration fails when cinder-volume service not running
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Cinder | Confirmed | Undecided | Unassigned |
Bug Description
The shared_targets online data migration [1] requires an RPC response from each cinder-volume service in order to query backend capabilities, so it fails if those services are not running when `cinder-manage db online_data_migrations` is invoked.
The original/approved spec for shared_targets [2] described the migration being implemented as follows:
"During init of the volume service, we’ll query the backend capabilities for the setting “shared_targets”, if the value from the stats structure is False we’ll check any volumes that are set as True and update them accordingly. This way we migrated everything and set it to true, then on service init we verify that the setting is actually correct."
but apparently it was decided not to implement it that way. Had the spec been implemented as written, we wouldn't have this problem.
[1] https:/
[2] https:/
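For illustration, a minimal sketch (in Python, not actual Cinder code) of the init-time approach the spec describes; get_volume_stats() is a real driver call, but the other names here are hypothetical:

def init_host_shared_targets_check(driver, db_api, ctxt, host):
    # Runs inside the cinder-volume service at startup, so it can ask its own
    # driver directly and never needs an RPC round-trip.
    stats = driver.get_volume_stats(refresh=True)
    if not stats.get('shared_targets', True):
        # Backend reports it does not use shared targets, so clear the flag on
        # any of this host's volumes that still have it set.
        db_api.volumes_clear_shared_targets(ctxt, host)  # hypothetical helper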

iain MacDonnell (imacdonn) wrote : | #1 |
My upgrade scenario (using my bespoke ansible-based deployment automation) involves shutting down all services, installing new code, then invoking the upgrade procedures for each component - e.g. apply any needed config changes, then "db sync", then start the service, then run online_data_migrations. cinder-volume is handled separately from the core cinder services (api, scheduler), because I run multiple instances of cinder-volume for different storage backends (I think of it almost more like an agent than a service), on separate control VMs. I do not update and start the cinder-volume services until later, so they are not running when I would normally run the online_data_migrations for cinder, so the migration fails. I'm working around this for now by just not doing the online_data_migrations during the cinder upgrade - fortunately it seems that there have not been any added since Queens, so I don't actually need any ... for now.
I don't know how other distributions handle this, so maybe this issue only affects me right now.
I suppose I could rework my playbook to delay online_data_migrations until after the cinder-volume services have been updated and restarted... but it seems I really shouldn't have to.
I haven't looked into FFU in any detail yet, but I keep hearing mention of it, and suspect it will become more prevalent in the future. I believe that it requires migrations to be capable of completing when services are not online.

Dmitriy Rabotyagov (noonedeadpunk) wrote : | #2 |
I'm catching the same problem even when all cinder-volume services are running: http://
So I'm not sure whether it's related to non-running cinder-volumes.

Dmitriy Rabotyagov (noonedeadpunk) wrote : | #3 |
Forgot to mention trace from logfile:
Dec 12 20:06:42 uacloud-
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage [req-3a8a32be-
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage Traceback (most recent call last):
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage File "/openstack/
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage found, done = migration_
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage File "/openstack/
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage non_shared_hosts, total_vols_
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage File "/openstack/
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage capabilities = rpcapi.
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage File "/openstack/
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage return cctxt.call(ctxt, 'get_capabilities', discover=discover)
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage File "/openstack/
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage retry=self.retry)
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage File "/openstack/
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage retry=retry)
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage File "/openstack/
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage call_monitor_
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage File "/openstack/
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage call_monitor_
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage File "/openstack/
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage message = self.waiters.
2018-12-12 20:06:42.981 23905 ERROR cinder.cmd.manage File "/openstack/
2018-12-12 20:06:42.981 23905 E...

iain MacDonnell (imacdonn) wrote : | #4 |
The migration requires an RPC response from every host that has a cinder-volume service. If it times out waiting for that response for any reason, the migration fails. The cinder-volume service being down is certainly one cause; there may be others. The point of this bug is that the migration should not assume that it can get a response from the service.
Changed in cinder:
status: New → Confirmed
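To make the failure mode concrete, here is a simplified sketch of the code path named in the tracebacks in this report (function names taken from the logs; the body is paraphrased, not the exact Cinder source):

def _get_non_shared_target_hosts(ctxt, rpcapi, services):
    # Collect hosts whose backend reports shared_targets=False.
    non_shared_hosts = []
    total_vols_to_update = 0
    for service in services:
        # Synchronous RPC call to each cinder-volume service; if that service
        # is down or never answers, oslo.messaging raises MessagingTimeout and
        # the whole shared_targets migration is aborted.
        capabilities = rpcapi.get_capabilities(ctxt, service.host, True)
        if capabilities and not capabilities.get('shared_targets', True):
            non_shared_hosts.append(service.host)
    return non_shared_hosts, total_vols_to_update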

Chris Martin (6-chris-z) wrote : | #5 |
Also hitting this problem, using OpenStack-Ansible to upgrade from Queens to Rocky. Has anyone identified a workaround yet?

iain MacDonnell (imacdonn) wrote : | #6 |
I don't know how OpenStack-Ansible sequences the upgrade, so I can't confirm whether it's susceptible to this bug, or how much you can manipulate it, but it appears there were no new online migrations added in Rocky, so (if you can find a way) you can probably skip the step entirely for a Queens->Rocky upgrade. You could run it manually afterwards if you want to be sure (once all of the services are back up).

Thiago Martins (martinx) wrote : | #7 |
Hey guys,
I believe that I'm facing this very same problem while trying to deploy a fresh OpenStack Rocky using "openstack-ansible" project.
The "os-cinder-
http://
Basically, the following command fails:
`/openstack/
"Error attempting to run shared_
I checked and the cinder-volume is running inside of its LXC container as expected (I believe).
This isn't an upgrade from Queens! It's a fresh Rocky install, in top of fresh installed Ubuntu 18.04 (everything deployed and redeployed via MaaS).
Cheers!

iain MacDonnell (imacdonn) wrote : | #8 |
Did you check cinder-manage.log ?

Thiago Martins (martinx) wrote : | #9 |
Looks like after this systemd thing the logs were moved to God knows where...
There are no cinder-*.log files inside the cinder-api containers, nor in /openstack/logs/* on the container's host, nor on the logs_hosts.
So... no, I didn't. :-/
I can manually run `cinder-manage db online_data_migrations` ...

Thiago Martins (martinx) wrote : | #10 |
Believe it or not, I just executed that command and it worked! LOL
I'm running the "os-cinder-
Next step, try to create a Cinder volume and see if it hits Ceph.
Best!

iain MacDonnell (imacdonn) wrote : | #11 |
Can't really help with the logging issue, although maybe you could try this and see if you can spot anything: journalctl --since '2019-01-28 23:12'
Looking at your ansible output again, it seems it took only 3 seconds to fail (if I'm reading it right), so it seems unlikely to be an RPC timeout.
Running it manually will at least tell you that your database is OK... it doesn't exactly tell you why it failed during the deployment, though (since the circumstances are different when you're doing it manually).

Thiago Martins (martinx) wrote : | #12 |
I just reinstalled the whole thing from scratch, after going to Ubuntu MAAS and "Release -> Deploy", and then running `openstack-ansible setup-everything.yml`.
I know that my fresh Ceph cluster doesn't come up HEALTHY after `openstack-ansible ceph-install.yml`, but I decided to proceed anyway, just to test it again as it was.
It fails at the same spot!
I noticed that when Ceph is in bad shape and I try to create a Cinder volume on it, the "cinder-volume" agent turns itself "DOWN", while the cinder-volume process is still running inside its container!
So this means that cinder-volume isn't in an "up" state, and I believe that's triggering this bug as well!
What am I going to do next? Same thing I did before!
Which was to bring Ceph to a HEALTHY state and try `openstack-ansible os-cinder-

Cody (codylab) wrote : | #13 |
I also hit the same error while running OSA 18.1.3 on freshly built bare metal running Ubuntu 18.04 LTS.
I also attempted to run '/openstack/
Unlike @martinx's case, my Ceph cluster was HEALTHY when the deployment failed.
Has anyone got a fix or workaround for this?

Ian Kumlien (pomac) wrote : | #14 |
We also just hit this, and I'm wondering how this is supposed to work when cinder-manage service list returns an rbd:<poolname> host.
Or does it all go through RabbitMQ and those hosts will just accept it?

Ian Kumlien (pomac) wrote : | #15 |
Actually, we have all cinder services running but are still hitting this, just FYI.

Ian Kumlien (pomac) wrote : | #16 |
Looking at it a bit more yields the following:
1. It can't handle rbd: hosts
2. The volumes in Ceph have shared_targets set
3. It's the "capabilities = rpcapi.get_capabilities(...)" call that times out
A quick hack gets information about the storage backend just fine.
It looks like this should have something like:
if service.host[:4] == "rbd:":
    next
---

iain MacDonnell (imacdonn) wrote : | #17 |
If possible, please post what you see in cinder-manage.log when this happens....

Cody (codylab) wrote : | #18 |
The error is gone after I set the cinder service to be containerized:
# /etc/openstack_
container_skel:
  cinder_
    properties:
      is_metal: false
Tested on OSA 18.1.3 with Ceph integration (stable-3.1).

Dmitriy Rabotyagov (noonedeadpunk) wrote : | #19 |
I still catch this problem even with containerized cinder-volume on a brand-new setup of OSA 18.1.3.

Dmitriy Rabotyagov (noonedeadpunk) wrote : | #20 |
@Cody: Probably you've got need_online_
@imacdonn: I've posted my log in one of the first messages.

Dmitriy Rabotyagov (noonedeadpunk) wrote : | #21 |
Btw, I started facing this issue after the following patch: https:/

iain MacDonnell (imacdonn) wrote : | #22 |
Yeah, well, prior to that fix, the migration could have been failing silently, and you would have been unaware.

Eric Smith (stephen-e-smith) wrote : | #23 |
Did anyone identify a workaround for this? I verified all Cinder services are up and RabbitMQ is working fine; I'm wondering if the migration is maybe trying to do something that I can skip or don't care about. I see the comment about line 111 (I assume this is from /usr/lib/
information type: | Public → Public Security |

Jeremy Stanley (fungi) wrote : | #24 |
I've now read through this entire report and it's still unclear to me why you suspect this indicates an exploitable security vulnerability in Cinder. I'm switching the bug type back to a normal "Public" state, but if you do have some reason for the OpenStack VMT to triage it as report of a suspected security vulnerability please provide a clear explanation in a comment so we can better classify this. Thanks!
information type: | Public Security → Public |

Eric Smith (stephen-e-smith) wrote : | #25 |
It was accidental.

Ian Kumlien (pomac) wrote : | #26 |
Sorry, my code should have said continue, I think - too many programming languages ;)
In cinder/
def _get_non_
Original:
services = objects.
for service in services:
---
Modified:
services = objects.
for service in services:
    if service.host[:4] == "rbd:":
        continue
---
Basically, skip all "unknown hosts with rbd: prefix" since they don't actually exist ;)
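A hedged sketch of the skip being described (with "continue", as corrected above), written as a standalone filter rather than an actual patch to Cinder:

def skip_rbd_pseudo_hosts(services):
    # Yield only services that a running cinder-volume process could answer for;
    # rbd:<poolname> entries are pseudo-hosts with no service behind them, so a
    # get_capabilities RPC aimed at them can only time out.
    for service in services:
        if service.host[:4] == "rbd:":
            continue
        yield service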

Ian Kumlien (pomac) wrote : | #27 |
Someone asked for the error log, so here it is:
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage [req-073ba3bc-
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage Traceback (most recent call last):
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage File "/var/lib/
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage found, done = migration_
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage File "/var/lib/
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage non_shared_hosts, total_vols_
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage File "/var/lib/
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage capabilities = rpcapi.
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage File "/var/lib/
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage return cctxt.call(ctxt, 'get_capabilities', discover=discover)
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage File "/var/lib/
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage retry=self.retry)
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage File "/var/lib/
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage retry=retry)
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage File "/var/lib/
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage call_monitor_
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage File "/var/lib/
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage call_monitor_
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage File "/var/lib/
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage message = self.waiters.
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage File "/var/lib/
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage 'to message ID %s' % msg_id)
2019-04-13 14:39:10.986 116 ERROR cinder.cmd.manage MessagingTimeout: Timed out waiting for a reply to message ID 61ccbe4e84f743d

Ian Kumlien (pomac) wrote : | #28 |
Gnn... formatting... here is a link:
http://

Jie Li (ramboman) wrote : | #29 |
I also hit this problem. We configured backend_host (not host) for cinder-volume HA; when we execute the CLI "cinder-manage db online_data_migrations" ...

ilian dimov (iliandimov80) wrote : | #30 |
We are facing the same problem
-------
Error attempting to run shared_
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage Traceback (most recent call last):
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage File "/openstack/
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage found, done = migration_
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage File "/openstack/
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage non_shared_hosts, total_vols_
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage File "/openstack/
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage capabilities = rpcapi.
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage File "/openstack/
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage return cctxt.call(ctxt, 'get_capabilities', discover=discover)
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage File "/openstack/
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage retry=self.retry)
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage File "/openstack/
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage retry=retry)
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage File "/openstack/
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage call_monitor_
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage File "/openstack/
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage call_monitor_
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage File "/openstack/
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage message = self.waiters.
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage File "/openstack/
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage 'to message ID %s' % msg_id)
2019-04-17 17:22:21.607 285 ERROR cinder.cmd.manage MessagingTimeout: Timed out waiting for a reply to message ID ...

Vladislav Naydenov (ssap) wrote : | #31 |
We have the same problem. Please provide some solution

Henry Bonath (hbonath) wrote : | #32 |
I am also getting hit by this bug, using OSA 18.1.4 Rocky.
With my deployment, I am simply re-running playbooks on a fully working system.
Taking what @Dmitriy Rabotyagov said about need_online_
Unless there is a workaround, we definitely seem to be stuck.

Chenjun Shen (cshen) wrote : | #33 |
We're also hit by this bug, using OSA 18.1.5 Rocky. While running 'openstack-ansible setup-openstack

f-ender (f-ender) wrote : | #34 |
We've hit the same problem with 18.1.6 on Ubuntu 18.04.2 on a fresh install.
Is there any known workaround?

Chenjun Shen (cshen) wrote : | #35 |
Whoever faces the same problem, please have a look at https:/
It seems it was the ERROR state volumes which caused the issue.

Dmitriy Rabotyagov (noonedeadpunk) wrote : | #36 |
No, the problem exists even when all cinder services are up and running (see my previous paste from 2018-12-12).

iain MacDonnell (imacdonn) wrote : | #37 |
If the migration fails when the cinder-volume services are online, it's failing for some other reason, so not (directly) related to this bug.

Dmitriy Rabotyagov (noonedeadpunk) wrote : | #38 |
So should I submit another bug report? I'm almost 100% sure that all the people who said they hit this bug with OSA have running cinder-volumes, like I do. And this looks like something that might be reproducible...

iain MacDonnell (imacdonn) wrote : | #39 |
Looking at your paste and log from 2018-12-12, the services were last updated at 19:57 but the failure occurred at 20:06, so it's not clear whether the services were actually operational at the time of the failure.

Dmitriy Rabotyagov (noonedeadpunk) wrote : | #40 |
Ok, I placed a fresh paste with clearer timings: http://

Erik McCormick (emccormickva) wrote : | #41 |
Just came here to say I hit this as well in kolla-ansible during an upgrade from Queens to Rocky. All services were online and seemed to be working properly, but I hit the same RPC messaging timeout. I've commented out the migration in the playbook just to get past it, but I'm sure it'll rear its ugly head again come Stein upgrade time. I'm happy to share any logs if needed, but they mainly look like the above.

Chenjun Shen (cshen) wrote : | #42 |
Today we did another Queens -> Rocky upgrade, and we hit the problem again.
This time, there were no ERROR state volumes in the db.

Chenjun Shen (cshen) wrote : | #43 |
When we hit this problem, I also observed an RPC message timeout in the logfile.
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage [req-ed246967-
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage Traceback (most recent call last):
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage File "/openstack/
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage found, done = migration_
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage File "/openstack/
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage non_shared_hosts, total_vols_
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage File "/openstack/
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage capabilities = rpcapi.
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage File "/openstack/
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage return cctxt.call(ctxt, 'get_capabilities', discover=discover)
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage File "/openstack/
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage retry=self.retry)
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage File "/openstack/
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage retry=retry)
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage File "/openstack/
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage call_monitor_
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage File "/openstack/
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage call_monitor_
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage File "/openstack/
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage message = self.waiters.
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage File "/openstack/
2019-07-25 13:19:20.607 24665 ERROR cinder.cmd.manage 'to message ID ...

iain MacDonnell (imacdonn) wrote : | #44 |
I'm not sure about all of the scenarios where the symptom is being observed, but the bottom line here is that migrations are supposed to be capable of completing when services are not running, and there will never be an RPC response if services are not running, so the way the migration is currently implemented is fundamentally flawed.
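One way to read "should not assume a response" in code: treat an unreachable cinder-volume service as unknown instead of failing the whole run. This is only a sketch of that idea, not an actual Cinder patch; MessagingTimeout is the real oslo.messaging exception seen in the logs above:

import oslo_messaging

def get_capabilities_or_none(rpcapi, ctxt, host):
    try:
        return rpcapi.get_capabilities(ctxt, host, True)
    except oslo_messaging.MessagingTimeout:
        # Service is down or unreachable: leave its volumes for a later run of
        # online_data_migrations instead of aborting every migration.
        return None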

Dmitriy Rabotyagov (noonedeadpunk) wrote : | #45 |
TBH it would be great to prioritize this bug somehow, since a lot of people are facing it and there's no good workaround.

Georgina Shippey (gshippey) wrote : | #46 |
Ran into this issue as well on Stein 19.0.1dev14, Bionic 18.04, while performing a minor version upgrade.
ERROR cinder.cmd.manage [req-6ec0adcc-
ERROR cinder.cmd.manage Traceback (most recent call last):
ERROR cinder.cmd.manage File "/openstack/
...

Andreas Krebs (wintamute) wrote : | #47 |
Also ran into this issue when upgrading Rocky to Stein using kolla-ansible; all cinder-volume containers were up and running.
I then checked the database via
openstack volume service list
and found some cinder-volume entries with state 'down' for no-longer-existing hostnames.
After removing all entries with state 'down' via
cinder-manage service remove cinder-volume $old_hostname
from inside the cinder-api container, the task completed successfully.

We upgraded openstack-ansible from Rocky to Stein and cinder-manage also fails (see below for the fix/workaround).
~# cinder-manage --debug db online_data_migrations; echo $?
Error attempting to run volume_
+------
| Migration | Total Needed | Completed |
+------
| attachment_
| backup_
| service_
| shared_
| volume_
+------
Some migrations failed unexpectedly. Check log for details.
2
~# journalctl --since "1 hour ago" | grep cinder.cmd.manage | grep Key
Aug 07 13:59:26 ik01-cinder-
In our case the database was inconsistent, e.g. the 'service_uuid' field was sometimes NULL:
MariaDB [cinder]> select id,volume_
Our workaround/fix was:
MariaDB [cinder]> update volumes set service_
~# /openstack/
Running batches of 50 until complete.
49 rows matched query volume_
+------
| Migration | Total Needed | Completed |
+------
| attachment_
| backup_
| service_
| shared_
| volume_
+------
0
Hope it helps someone save a few hours.

Chenjun Shen (cshen) wrote : | #49 |
Yesterday we ran a FRESH install of openstack-ansible 18.1.8 (Rocky), but it unfortunately failed at the same place, in the cinder online_data_migrations.
It seems that it always fails in the _get_non_shared_target_hosts part:
http://
Although, according to the explanation, the online data migration shouldn't need any RPC messages. So I would suggest having a look at the code (_get_non_shared_target_hosts).

Chenjun Shen (cshen) wrote : | #50 |
I agree with what iain MacDonnell (imacdonn) explained in the ticket description.
The online data migration seems to be ambiguous: on one side, it is supposed to complete when the cinder service is not running; on the other side, it needs an RPC response to learn the capabilities of the cinder hosts. So, as Iain said, the migration is fundamentally flawed by design.
The workaround could be this patch https:/
There is NO need to run the online data migrations in Rocky, Stein and Train.

Eric Miller (erickmiller) wrote : | #51 |
I ran into a similar situation as comment 48:
https:/
This was caused by some failed volume backups, which created "backup-vol-<uuid>" volumes that were detached.
Once these volumes were deleted, all rows in the volumes table that had NULL service_uuid values were marked as deleted.
The problem created the following entries in cinder-manage.log file:
2019-10-06 19:10:48.277 12 ERROR cinder.cmd.manage [req-380963c9-
2019-10-06 19:10:48.277 12 ERROR cinder.cmd.manage Traceback (most recent call last):
2019-10-06 19:10:48.277 12 ERROR cinder.cmd.manage File "/var/lib/
2019-10-06 19:10:48.277 12 ERROR cinder.cmd.manage found, done = migration_
2019-10-06 19:10:48.277 12 ERROR cinder.cmd.manage File "/var/lib/
2019-10-06 19:10:48.277 12 ERROR cinder.cmd.manage return IMPL.volume_
2019-10-06 19:10:48.277 12 ERROR cinder.cmd.manage File "/var/lib/
2019-10-06 19:10:48.277 12 ERROR cinder.cmd.manage return fn(*args, **kwargs)
2019-10-06 19:10:48.277 12 ERROR cinder.cmd.manage File "/var/lib/
2019-10-06 19:10:48.277 12 ERROR cinder.cmd.manage v['service_uuid'] = svc_map[host[0]]
2019-10-06 19:10:48.277 12 ERROR cinder.cmd.manage KeyError: u'compute004@
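For anyone else hitting this variant: the KeyError above comes from the migration that fills in service_uuid on volumes, not from the shared_targets RPC path. The mechanism, paraphrased from the traceback (not the exact Cinder source), is roughly the following: the migration builds a host-to-service map and indexes it with each volume's host, so a volume whose backend service no longer exists has no entry and the lookup blows up.

def assign_service_uuids(volumes, services):
    # Map each known service host to its uuid.
    svc_map = {svc['host']: svc['uuid'] for svc in services}
    for vol in volumes:
        host = vol['host'].split('#')[0]       # strip any "#pool" suffix
        vol['service_uuid'] = svc_map[host]    # KeyError if the service is gone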

Dmitriy Rabotyagov (noonedeadpunk) wrote : | #52 |
Wondering if that has been fixed with https:/

iain MacDonnell (imacdonn) wrote : | #53 |
I doubt it; that change doesn't look related.

yule sun (syle87) wrote : | #54 |
Hello everyone
I have the same problem when using kolla-ansible to upgrade from Queens to Rocky. I deleted the volumes from the database where the volume uuid='NULL', and I also checked the cinder services in the cluster. Everything seems to be working fine, but when I do the upgrade I still get this error.
Does anybody know how to fix it? I need your help.
2022-09-29 09:19:33.719 326 INFO cinder.rpc [req-e70a1e11-
2022-09-29 09:19:33.728 326 INFO cinder.rpc [req-e70a1e11-
2022-09-29 09:20:35.486 326 ERROR cinder.cmd.manage [req-e70a1e11-
2022-09-29 09:20:35.486 326 ERROR cinder.cmd.manage Traceback (most recent call last):
2022-09-29 09:20:35.486 326 ERROR cinder.cmd.manage File "/var/lib/
2022-09-29 09:20:35.486 326 ERROR cinder.cmd.manage found, done = migration_
2022-09-29 09:20:35.486 326 ERROR cinder.cmd.manage File "/var/lib/
2022-09-29 09:20:35.486 326 ERROR cinder.cmd.manage non_shared_hosts, total_vols_
2022-09-29 09:20:35.486 326 ERROR cinder.cmd.manage File "/var/lib/
2022-09-29 09:20:35.486 326 ERROR cinder.cmd.manage capabilities = rpcapi.
2022-09-29 09:20:35.486 326 ERROR cinder.cmd.manage File "/var/lib/
2022-09-29 09:20:35.486 326 ERROR cinder.cmd.manage return cctxt.call(ctxt, 'get_capabilities', discover=discover)
2022-09-29 09:20:35.486 326 ERROR cinder.cmd.manage File "/var/lib/
2022-09-29 09:20:35.486 326 ERROR cinder.cmd.manage retry=self.retry)
2022-09-29 09:20:35.486 326 ERROR cinder.cmd.manage File "/var/lib/
2022-09-29 09:20:35.486 326 ERROR cinder.cmd.manage retry=retry)
2022-09-29 09:20:35.486 326 ERROR cinder.cmd.manage File "/var/lib/
2022-09-29 09:20:35.486 326 ERROR cinder.cmd.manage call_monitor_
2022-09-29 09:20:35.486 326 ERROR cinder.cmd.manage File "/var/lib/
2022-09-29 09:20:35.486 326 ERROR cinder.cmd.manage call_monitor_
2022-09-29 09:20:35.486 326 E...

yule sun (syle87) wrote : | #55 |
Great, I solved this problem. The steps to resolve it are:
1. Find the volume IDs with service_uuid=NULL that were deleted from volumes at that time.
2. Clear the volume information from volume_
3. Clean up these volumes from Ceph's volumes pool.
Then execute docker exec -it cinder_api cinder-manage --debug db online_data_migrations