Model migration fails

Bug #1668646 reported by Sandor Zeestraten
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Tim Penhey

Bug Description

# Issue
MAAS environment with 2 separate controllers. Old controller A has 4GB RAM and a model with about 40 machines. Controller B is an empty controller with 8GB RAM. Migrating the model from A to B stops after a few seconds, and "juju show-model" then reports "migration: aborted, removing model from target controller".

# Logs machine-0.log from controller A
2017-02-28 14:25:40 ERROR juju.worker.migrationmaster.40fd7e worker.go:284 model data transfer failed, model export failed: failed to read status history collection: Executor error during find command: OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.
2017-02-28 14:25:40 WARNING juju.worker.migrationmaster.40fd7e worker.go:592 failed to remove model from target controller, model not found (not found)
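
For anyone sifting through a large machine-0.log for lines like these, a small parser can help filter by level or worker. The sketch below is only an illustration: the field layout (timestamp, level, module, file:line, message) is inferred from the two lines quoted above rather than from any Juju logging specification, and `parse_log_line` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional

# Layout inferred from the quoted lines (an assumption, not a Juju spec):
#   <date> <time> <LEVEL> <module> <file>:<line> <message>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<module>\S+) "
    r"(?P<location>\S+:\d+) "
    r"(?P<message>.*)$"
)


def parse_log_line(line: str) -> Optional[Dict[str, str]]:
    """Split one log line into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def errors_from(lines, module_prefix: str = "juju.worker.migrationmaster"):
    """Yield parsed ERROR/WARNING entries emitted by modules with the given prefix."""
    for line in lines:
        entry = parse_log_line(line)
        if entry is None:
            continue
        if entry["level"] in ("ERROR", "WARNING") and entry["module"].startswith(module_prefix):
            yield entry
```

Lines that lack the leading timestamp are skipped rather than guessed at.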

# Versions
Juju 2.1.0
MAAS 2.1.3

Any tips for further troubleshooting?

Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.2-rc1
importance: Undecided → High
status: New → Triaged
Christian Muirhead (2-xtian) wrote :

This error means that when we query the status history to export it, we're sorting by fields that don't match any of the available indexes in Mongo, so the sort has to be done in memory and hits the limit.

Looking in the code I can see that the index on the statuseshistory collection is {"model-uuid", "globalkey", "updated"}, but in the export we're sorting by {"-updated", "-_id"} (after filtering by model-uuid).

I think one way to fix it would be to add an index on {"model-uuid", "-updated", "-_id"}. (Maybe we could fix it by querying slightly differently instead.)
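
To make the mismatch concrete, here is a rough Python model of the rule involved: an index can return documents already in sort order when, after skipping leading index keys pinned by equality filters, the sort keys line up with the next index keys, either all in the same direction or all reversed. This is a deliberately simplified sketch, not MongoDB's query planner (it ignores range filters, multikey indexes and similar cases), and `index_can_serve_sort` is a hypothetical name. The existing index is assumed to be ascending on every key.

```python
from typing import List, Set, Tuple

Key = Tuple[str, int]  # (field name, direction), direction is 1 or -1


def index_can_serve_sort(index: List[Key], equality_fields: Set[str], sort: List[Key]) -> bool:
    """Simplified check: can this index deliver results already in sort order?"""
    # Leading index keys fixed by an equality filter do not affect ordering.
    remaining = list(index)
    while remaining and remaining[0][0] in equality_fields:
        remaining.pop(0)

    # Sort keys fixed by an equality filter are constant, so they can be ignored.
    effective_sort = [k for k in sort if k[0] not in equality_fields]
    if len(effective_sort) > len(remaining):
        return False

    candidate = remaining[: len(effective_sort)]
    same = all(s == c for s, c in zip(effective_sort, candidate))
    inverted = all(s[0] == c[0] and s[1] == -c[1] for s, c in zip(effective_sort, candidate))
    return same or inverted


# Index currently on statuseshistory (directions assumed ascending).
EXISTING_INDEX = [("model-uuid", 1), ("globalkey", 1), ("updated", 1)]
# Index suggested in this comment.
PROPOSED_INDEX = [("model-uuid", 1), ("updated", -1), ("_id", -1)]
# Sort used by the export, after filtering on model-uuid.
EXPORT_SORT = [("updated", -1), ("_id", -1)]

if __name__ == "__main__":
    print(index_can_serve_sort(EXISTING_INDEX, {"model-uuid"}, EXPORT_SORT))
    print(index_can_serve_sort(PROPOSED_INDEX, {"model-uuid"}, EXPORT_SORT))
```

Under this model the existing index fails because "globalkey" sits between the filtered field and "updated", while the proposed index matches the export sort directly.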

The statuseshistory collection should be trimmed regularly by the status history pruner. Can you see any messages about that in the logs, or any indication of how big the collection is?

Sandor Zeestraten (szeestraten) wrote :

I'm seeing multiple entries of the line below in machine-0.log on controller A.

DEBUG juju.worker.dependency engine.go:500 "status-history-pruner" manifold worker stopped: "migration-inactive-flag" not running: dependency not available

# machine-0.log after a reboot of controller A
http://pastebin.com/uePi1QA4

Christian Muirhead (2-xtian) wrote :

Thanks for the paste of the logs. It definitely looks like something odd is happening. In particular, near the end of the log the migration-fortress starts, but then the status-history-pruner complains that the migration-fortress isn't running. Would you be able to attach the whole log file to the bug?

Sandor Zeestraten (szeestraten) wrote :

No problem. When you say the whole log, do you mean the machine log from the beginning/bootstrap?

Sandor Zeestraten (szeestraten) wrote :

See attachment for whole machine-0.log

Tim Penhey (thumper) wrote :

The problem is in the export of the model from the source controller. The status history collection is too large for the export query to sort without an index. I'm looking to test the index code and get it into 2.1.1.
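
As a back-of-the-envelope illustration of "too large", the limit quoted in the error (33554432 bytes) is 32 MiB, and the number of documents that fit in it depends on document size. The sketch below assumes, as a simplification, that the in-memory sort cost is roughly the sum of the document sizes. The 256-byte average is a made-up figure for illustration, not a measurement from this environment.

```python
# Limit taken from the error message in machine-0.log.
SORT_LIMIT_BYTES = 33554432


def max_sortable_docs(avg_doc_bytes: int, limit: int = SORT_LIMIT_BYTES) -> int:
    """Rough upper bound on documents an in-memory sort can hold under the limit."""
    if avg_doc_bytes <= 0:
        raise ValueError("avg_doc_bytes must be positive")
    return limit // avg_doc_bytes


if __name__ == "__main__":
    print(SORT_LIMIT_BYTES // (1024 * 1024))  # limit expressed in MiB
    # Hypothetical average size of one status history document.
    print(max_sortable_docs(256))
```

With an index that matches the sort, documents are read in order and this budget does not apply.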

Changed in juju:
milestone: 2.2-beta1 → 2.1.1
assignee: nobody → Tim Penhey (thumper)
status: Triaged → In Progress
Tim Penhey (thumper) wrote :
Changed in juju:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju:
status: Fix Committed → Fix Released