Because the user wants to delete a thing in our supposed "elastic infrastructure". They want their quota back, they want to stop being billed for it, they want the IP for use somewhere else, or whatever. They don't care that we can't delete it because of some backend failure - that's not their problem. That's why we have the ability to queue the delete even if the compute is down - that's how important it is.
It's also not at all about deleting the VM, it's about the instance going away from the perspective of the user (i.e. marking the instance record as deleted). The instance record is what determines if they're billed for it, if their quota is used, etc. We "charge" the user the same whether the VM is running or not. Further, even if we have stopped the VM, we cannot re-assign the resources committed to that VM until the deletion completes in the backend. Another scenario that infuriates operators is "I've deleted a thing, the compute node should be clear, but the scheduler tells me I can't boot something else there."
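To make the distinction concrete, here is a minimal sketch (not nova's actual code; all names are hypothetical) of the pattern described above: the user-visible record is marked deleted immediately, which is what billing, quota, and scheduling consult, while backend teardown is queued and can complete later, even if the compute host is down at the time.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Instance:
    uuid: str
    deleted: bool = False  # the flag billing/quota/scheduling consult


class Cloud:
    def __init__(self):
        self.instances = {}
        self.cleanup_queue = deque()  # work for the (possibly down) compute host

    def delete(self, uuid):
        inst = self.instances[uuid]
        inst.deleted = True              # user sees the instance gone at once
        self.cleanup_queue.append(uuid)  # backend teardown happens when it can

    def is_billed(self, uuid):
        return not self.instances[uuid].deleted
```

The point of the sketch is that `delete()` never blocks on the backend: the user's view (`deleted`, and therefore billing and quota) changes synchronously, and only the physical cleanup is deferred.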
Your example workflow is exactly why I feel like the solution to this problem can't (entirely) be one of preventing a delete if we fail to detach: the admins will just force-delete/detach/reset-state/whatever until things free up (as I would expect to do myself), especially if the user is demanding that they get their quota back, stop being billed, and/or attach the volume somewhere else.
It seems to me that there *must* be some way to ensure that we never attach a volume to the wrong place. Regardless of how we get there, there must be some positive affirmation that we're handing precious volume data to the right person.
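One way to get that positive affirmation — a sketch only, with hypothetical names, not cinder's actual API — is to hand out a secret token when an attachment is reserved and refuse any attach whose token doesn't match. However confused the state machine gets on the nova side, a stale or misdirected attach then fails closed instead of handing volume data to the wrong instance.

```python
import secrets


class VolumeService:
    """Illustrative attach service that demands proof of reservation."""

    def __init__(self):
        # volume_id -> (instance_uuid, token) recorded at reservation time
        self._reservations = {}

    def reserve(self, volume_id, instance_uuid):
        # The token is the "positive affirmation" the attacher must present.
        token = secrets.token_hex(16)
        self._reservations[volume_id] = (instance_uuid, token)
        return token

    def attach(self, volume_id, instance_uuid, token):
        # Fail closed: no matching reservation means no attach, ever.
        if self._reservations.get(volume_id) != (instance_uuid, token):
            raise PermissionError("attachment affirmation failed")
        return f"{volume_id} attached to {instance_uuid}"
```

The design choice here is that the check lives on the volume side, so force-deletes and state resets on the compute side can't accidentally manufacture a valid attach.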