Error in finish_migration results in image deletion on source with no copy
Bug #1686703 reported by Matthew Booth
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Compute (nova) | In Progress | Undecided | Alexey Stupnikov | | |
Bug Description
ML post describing the issue here:
http://
A user was resizing an instance whose glance image had been deleted. An ssh failure occurred in finish_migration, which runs on the destination, while it was attempting to copy the image out of the image cache on the source. This left the instance and migration in an error state on the destination, with no copy of the image there. The image cache manager later ran on the source and expired the image from the image cache, leaving no remaining copies. At this point the user's instance was unrecoverable.
tags: added: resize
Changed in nova:
status: In Progress → Won't Fix
assignee: Matthew Booth (mbooth-9) → nobody
As mentioned in the above ML post, I don't think the image cache manager should expire the image of an instance while a migration is active. However, as also described in the post, I'm not convinced it's currently possible to reliably identify whether a migration is ongoing.
My current thought is that we could send the image from source to dest during migrate_disk_and_power_off. This way, all data transfer would happen in the same place, and any failure involving user data would happen before the switch, not after.
However, while this would resolve this failure mode, I still think it would be better for the image cache manager to consider instances with active migrations.
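To make the second suggestion concrete, here is a rough sketch of an expiry check that also counts instances migrating away from a host as users of its cached images. This is not nova code: the `Instance` and `Migration` records, the `PROTECTED_STATUSES` set, and both function names are hypothetical, and it assumes that migration records can be trusted to identify unfinished migrations, which is exactly the part I'm unsure about above.

```python
from dataclasses import dataclass

# Hypothetical status names for illustration; "error" is included because
# in this bug the migration was left in an error state on the destination.
PROTECTED_STATUSES = {"migrating", "post-migrating", "finished", "error"}


@dataclass(frozen=True)
class Instance:
    uuid: str
    host: str       # host the instance record currently points at
    image_id: str


@dataclass(frozen=True)
class Migration:
    instance_uuid: str
    source_host: str
    status: str


def images_in_use(host, instances, migrations):
    """Image IDs that should not be expired from the cache on `host`."""
    # Instances that left (or are leaving) this host but whose migration
    # has not been confirmed may still need the source's cached image.
    migrating_out = {
        m.instance_uuid
        for m in migrations
        if m.source_host == host and m.status in PROTECTED_STATUSES
    }
    return {
        i.image_id
        for i in instances
        if i.host == host or i.uuid in migrating_out
    }


def expirable_images(host, cached_image_ids, instances, migrations):
    """Cached image IDs on `host` that no local or migrating instance uses."""
    return set(cached_image_ids) - images_in_use(host, instances, migrations)
```

With a check along these lines, the image in the scenario above would stay in the source cache for as long as the failed migration record exists, because the instance is still counted against the source host.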