Cannot retry model migration to new controller after previous attempt fails
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Canonical Juju | Incomplete | High | Joseph Phillips | | |
Bug Description
My apologies - I'm filing this after the fact.
On a customer cloud, we created a new bionic controller to replace the existing xenial controller, then tried migrating models to it. One model migrated successfully, but the main openstack model failed. The failure was reportedly due to references to 2 stale apps which were no longer in use but could not be cleanly removed for some reason.
Afterwards, a way to remove those "stale" apps was found, and the migration was retried. Upon retry, the error we encountered was:
migrating: aborted, removing model from target controller: model data transfer failed, failed to import model into target controller: model <REDACTED-UUID> already exists (already exists)
At the time of writing this, I see all controllers are running 2.9.18, and the openstack model in question is also on 2.9.18.
As this is an issue with a customer cloud, I will share a database dump via internal channels.
tags: | added: model-migration |
Changed in juju: | |
status: | New → Triaged |
importance: | Undecided → High |
assignee: | nobody → Joseph Phillips (manadart) |
importance: | High → Medium |
Looking at the target db dump, the model UUID that is being migrated is used as a key in these collections:
annotations.json ts.json ons.json
applications.json
constraints.json
deviceConstrain
meterStatus.json
operations.json
remoteApplicati
resources.json
spaces.json
txns.log.json
unitstates.json
For example, the applications collection has apps such as "ceph-mon", "ceph-radosgw" and "ceilometer" which are tagged as belonging to the source model.
Were there any errors (or do the logs have any errors) related to aborting the failed migration and cleaning up afterwards? That will help us understand how stuff got left behind in the first place.
The 3 named models in the target controller are:
- controller
- default
- maas-infra
None of these have the model UUID in question.
Those records in the above collections are orphaned. I think we should do a regexp delete of records in those collections where the doc "_id" field starts with "<modeluuid>:". That will hopefully unblock another migration attempt.
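As a rough illustration of the prefix match suggested above, here is a minimal Python sketch that builds a MongoDB-style `$regex` filter for `_id` values starting with "<modeluuid>:" and evaluates it locally, so it can be dry-run against the JSON dump before anything is deleted. The function names (`orphan_filter`, `matches`, `find_orphans`) are hypothetical and are not part of Juju; the sketch also assumes the dumped documents carry a string `_id`. It does not talk to a database.

```python
import re


def orphan_filter(model_uuid: str) -> dict:
    """Build a MongoDB-style filter for docs whose _id starts with '<model_uuid>:'.

    The UUID is escaped and the pattern anchored, so only exact-prefix
    matches are selected (hypothetical helper, not a Juju API).
    """
    return {"_id": {"$regex": "^" + re.escape(model_uuid) + ":"}}


def matches(doc: dict, query: dict) -> bool:
    """Evaluate the filter locally against a single dumped document."""
    pattern = query["_id"]["$regex"]
    return re.search(pattern, str(doc.get("_id", ""))) is not None


def find_orphans(docs, model_uuid: str) -> list:
    """Return the docs that the prefix filter would select for deletion."""
    query = orphan_filter(model_uuid)
    return [doc for doc in docs if matches(doc, query)]
```

Running `find_orphans` over each affected collection's dump first gives a list of what would be removed; the same filter document could then be handed to a MongoDB driver's delete call once the list has been reviewed.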
One thing that is not clear to me right now, though: when we import a model, we check the models collection and, if that model UUID is there, return an "already exists" error as we see here. I would expect to see a slightly different error if the model record does not exist but other artifacts such as applications do.
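To make the distinction in that last point concrete, here is a hypothetical Python sketch of a pre-import check that reports the two situations differently. It is not Juju's actual import code. It assumes the dump is loaded as a mapping of collection name to a list of documents, that the models collection uses the model UUID as `_id`, and that other collections use "<modeluuid>:..." as `_id`.

```python
def import_conflict(collections: dict, model_uuid: str):
    """Return a description of what blocks an import, or None if nothing does.

    Hypothetical check: a model record and leftover per-model documents
    are reported as two different conditions.
    """
    # Case 1: the model record itself is present.
    models = collections.get("models", [])
    if any(doc.get("_id") == model_uuid for doc in models):
        return "model already exists"

    # Case 2: no model record, but other collections still hold its docs.
    prefix = model_uuid + ":"
    leftovers = sorted(
        name
        for name, docs in collections.items()
        if name != "models"
        and any(str(doc.get("_id", "")).startswith(prefix) for doc in docs)
    )
    if leftovers:
        return "orphaned documents in: " + ", ".join(leftovers)

    return None
```

With the dump described in this report, a check like this would be expected to land in the second case, since none of the models on the target controller has the UUID in question.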