Cannot retry model migration to new controller after previous attempt fails

Bug #1952811 reported by Paul Goins
Affects: Canonical Juju
Status: Incomplete
Importance: High
Assigned to: Joseph Phillips
Milestone: (none listed)

Bug Description

My apologies - I'm filing this after the fact.

On a customer cloud, we created a new bionic controller to replace the existing xenial controller. We then tried migrating models. One model succeeded, but the main openstack model failed to migrate. It reportedly failed due to references to two stale apps that were no longer in use but could not be cleanly removed for some reason.

Afterwards, a way to remove those "stale" apps was found, and the migration was retried. Upon retry, the error we encountered was:

  migrating: aborted, removing model from target controller: model data transfer failed, failed to import model into target controller: model <REDACTED-UUID> already exists (already exists)

At the time of writing this, I see all controllers are running 2.9.18, and the openstack model in question is also on 2.9.18.

As this is an issue with a customer cloud, I will share a database dump via internal channels.

Ian Booth (wallyworld) wrote:

Looking at the target db dump, the model UUID that is being migrated is used as a key in these collections:

annotations.json
applications.json
constraints.json
deviceConstraints.json
meterStatus.json
operations.json
remoteApplications.json
resources.json
spaces.json
txns.log.json
unitstates.json

E.g. the applications collection has apps such as "ceph-mon", "ceph-radosgw", and "ceilometer" that are tagged as belonging to the source model.

Do the logs show any errors related to aborting the failed migration and cleaning up afterwards? That would help us understand how these records got left behind in the first place.

The 3 named models in the target controller are:
- controller
- default
- maas-infra

None of these have the model UUID in question.

Those records in the above collections are orphaned. I think we should do a regexp delete of records in those collections where the doc "_id" field starts with "<modeluuid>:". That will hopefully unblock another migration attempt.
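As a rough illustration of the prefix match described above, here is a minimal Python sketch. It only builds a MongoDB-style filter and evaluates it locally against exported documents as a dry run; it does not connect to a database. The names `AFFECTED_COLLECTIONS`, `orphan_filter`, and `matches` are hypothetical, and the collection names are assumed to be the dump file names listed above with ".json" removed.

```python
import re

# Assumption: collection names are the dump file names above minus ".json".
AFFECTED_COLLECTIONS = [
    "annotations", "applications", "constraints", "deviceConstraints",
    "meterStatus", "operations", "remoteApplications", "resources",
    "spaces", "txns.log", "unitstates",
]


def orphan_filter(model_uuid: str) -> dict:
    """Build a MongoDB-style filter for _id values starting with '<uuid>:'."""
    # re.escape guards against any regex metacharacters in the UUID string.
    return {"_id": {"$regex": "^" + re.escape(model_uuid) + ":"}}


def matches(doc: dict, flt: dict) -> bool:
    """Evaluate the filter locally, for a dry run over exported documents."""
    pattern = flt["_id"]["$regex"]
    return re.search(pattern, str(doc.get("_id", ""))) is not None


def dry_run(docs_by_collection: dict, model_uuid: str) -> dict:
    """Count, per collection, how many documents the filter would select."""
    flt = orphan_filter(model_uuid)
    return {
        name: sum(1 for doc in docs if matches(doc, flt))
        for name, docs in docs_by_collection.items()
    }
```

Once the dry-run counts look right against a backup, the same filter could be handed to a driver's delete call (for example pymongo's `delete_many`), one collection at a time. Whether every listed collection, the transaction log in particular, is safe to prune this way is not something this sketch establishes.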

One thing that is not clear to me right now, though: when we import a model, we check the models collection and, if that model UUID is there, return an "already exists" error, as we see here. I would expect a slightly different error if the model record does not exist but other artifacts, such as applications, do.
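To make the distinction between those two states concrete, here is a small Python sketch. It is not Juju's import code; `ImportPrecheck` and `precheck` are hypothetical names, and the inputs are plain sets of IDs such as one might extract from a database dump.

```python
from enum import Enum


class ImportPrecheck(Enum):
    CLEAR = "clear"
    MODEL_EXISTS = "model already exists"
    ORPHANED_ARTIFACTS = "orphaned artifacts for model"


def precheck(model_uuid, model_ids, artifact_ids_by_collection):
    """Classify the target state for a model UUID before an import.

    model_ids: iterable of UUIDs present in the models collection.
    artifact_ids_by_collection: mapping of collection name to "_id" strings.
    Returns (state, sorted list of collections holding leftover records).
    """
    if model_uuid in set(model_ids):
        return ImportPrecheck.MODEL_EXISTS, []
    prefix = model_uuid + ":"
    dirty = sorted(
        name
        for name, ids in artifact_ids_by_collection.items()
        if any(str(i).startswith(prefix) for i in ids)
    )
    if dirty:
        return ImportPrecheck.ORPHANED_ARTIFACTS, dirty
    return ImportPrecheck.CLEAR, []
```

A check along these lines would report the situation described in this comment as orphaned artifacts rather than as an existing model.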

Joseph Phillips (manadart) wrote:

This message is only returned if we detect the model UUID in the models collection.

I have a suspicion about redirects here.
Is it possible that the target controller has models with CMRs (cross-model relations) to the *migrating* model?

See this patch:
https://github.com/juju/juju/pull/11025

I wonder if there is a possibility that we followed a redirect to the *source* controller, which told us that the model is present.

Giuseppe Petralia (peppepetra) wrote:

Yes, this is the case. The target controller has a model, "maas-infra", which contains a CMR to the migrating "openstack" model.

Joseph Phillips (manadart) wrote:

A simple migration of an offering model to the same controller as a consuming model worked OK for me, but the SAAS was terminated.

Joseph Phillips (manadart) wrote:

Migrating the consuming model into the offering controller works fine.

tags: added: model-migration
Changed in juju:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Joseph Phillips (manadart)
importance: High → Medium

Joseph Phillips (manadart) wrote:

Based on the particular scenario here, I believe we may have addressed it in some patches targeting other issues. Specifically:
- https://github.com/juju/juju/pull/15154
- https://github.com/juju/juju/pull/15132
- https://github.com/juju/juju/pull/15145

I'll mark this as Incomplete in order to start the expiry countdown; feel free to re-open if the issue persists.

Changed in juju:
importance: Medium → High
status: Triaged → Incomplete