Port detach fails when compute host is unreachable
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Confirmed | Medium | Unassigned |
octavia | Invalid | Undecided | Unassigned |
Bug Description
When a compute host is unreachable, a port detach for a VM on that host will not complete until the host is reachable again. In some cases, this may last for an extended period or even indefinitely (for example, when a host is powered down for hardware maintenance and possibly needs to be removed from the fleet entirely). This is problematic for multiple reasons:
1) The port should not be deleted in this state (it can be, but for reasons outside the scope of this bug, that is not recommended). Thus, the quota cannot be reclaimed by the project.
2) The port cannot be reassigned to another VM. This means that for projects that rely heavily on maintaining a published IP (or possibly even a published port ID), there is no way to proceed. For example, if Octavia wanted to fail over from one VM to another in a VM-down event (as would happen if the host was powered off) without using AAP (allowed address pairs), it would be unable to do so, leading to extended downtime.
Nova will supposedly clean up such resources after the host has been powered up, but that could take hours or possibly never happen. So there should be a way to force the port to detach regardless of whether the compute host is reachable: release the port immediately for deletion or rebinding, and let the cleanup happen on that host later (if possible).
If nova allowed an admin to `force` an unbind while still queuing all the standard cleanup in nova, would that solve this? Is that unreasonable?
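
To make the proposal concrete, here is a minimal, self-contained Python sketch of the behavior being asked for. It is not nova code and does not use any nova or neutron API. Every name in it (`DetachService`, `Port`, `HostUnreachable`, the `force` argument, `pending_cleanup`) is hypothetical and exists only to illustrate the idea: a normal detach fails while the host is down, a forced detach releases the port immediately, and the host-side cleanup is queued until the host comes back.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional


class HostUnreachable(Exception):
    """Raised when a normal (non-forced) detach cannot reach the host."""


@dataclass
class Port:
    port_id: str
    device_id: Optional[str] = None  # VM the port is currently bound to
    host: Optional[str] = None       # compute host of that binding


class DetachService:
    """Toy model of the proposed force-detach flow (hypothetical, not nova)."""

    def __init__(self):
        self.reachable = {}                        # host name -> bool
        self.pending_cleanup = defaultdict(deque)  # host name -> port IDs
        self.cleaned = []                          # (host, port_id) log

    def set_reachable(self, host, up):
        self.reachable[host] = up
        if up:
            # Host is back: run any cleanup queued by forced detaches.
            self._drain(host)

    def detach(self, port, force=False):
        host = port.host
        if host is None:
            return  # nothing is bound
        if self.reachable.get(host, False):
            # Normal path: clean up on the host right away.
            self._cleanup_on_host(host, port.port_id)
        elif force:
            # Proposed path: defer host-side cleanup, release the port now.
            self.pending_cleanup[host].append(port.port_id)
        else:
            # Current behavior described in this bug: the detach cannot
            # complete, and the port stays bound.
            raise HostUnreachable(host)
        port.device_id = None
        port.host = None

    def _cleanup_on_host(self, host, port_id):
        # Stand-in for whatever the compute host would actually do.
        self.cleaned.append((host, port_id))

    def _drain(self, host):
        queue = self.pending_cleanup[host]
        while queue:
            self._cleanup_on_host(host, queue.popleft())
```

In this model, calling `detach(port)` against a down host raises `HostUnreachable` and leaves the binding intact, while `detach(port, force=True)` clears `device_id` and `host` immediately so the port could be deleted or rebound. The queued cleanup runs only when `set_reachable(host, True)` is called, and if the host never returns the queue entry simply stays pending.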