Stopping an instance that goes into error state waits too long to try again
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
akanda | Triaged | Medium | Mark McClain |
Bug Description
When Nova times out talking to Neutron during delete, the Rug will get stuck in a long wait loop that does not properly detect the current conditions.
Steps to reproduce:
1) Boot router instance
2) If the instance transitions into error state, skip to 4.
3) Disable the REST service on the appliance.
4) Intentionally make Neutron's update_
5) Issue poll command to router via rug_ctl.
At this point the Rug will attempt to stop the instance and Nova will call update_port on Neutron. The call will fail and the instance will go into Error state. The Rug will continue to periodically check that the instance is gone until the timeout is reached. The stop will not be reattempted until the boot_timeout is reached (this config value is also used to determine the maximum amount of time to wait for the delete to occur).
When the delete fails because the instance goes into Error state, the VM state in the Rug's instance data does not change, so the Rug will not reattempt the stop until the long boot interval has elapsed.
Expected:
When the VM delete command causes the VM state to move to Error, we should exit the stop wait loop and try the stop command again earlier than the boot_timeout.
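The expected behaviour could be sketched roughly as follows. This is only an illustration, not the Rug's actual code: the function names `wait_for_delete` and `stop_instance`, the `get_status` and `delete` callables, the `"ERROR"` status string, and the `max_attempts` parameter are all hypothetical stand-ins. The idea is that the wait loop returns as soon as the instance is seen in an error state, so the caller can reissue the delete instead of waiting out the full timeout.

```python
import time


def wait_for_delete(get_status, timeout, poll_interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the instance is gone, errors out, or the timeout expires.

    get_status is a hypothetical callable returning the instance status
    string, or None once the instance no longer exists.
    Returns "gone", "error", or "timeout".
    """
    deadline = clock() + timeout
    while clock() < deadline:
        status = get_status()
        if status is None:
            return "gone"
        if status == "ERROR":
            # Leave the wait loop early instead of sleeping until timeout.
            return "error"
        sleep(poll_interval)
    return "timeout"


def stop_instance(delete, get_status, timeout, max_attempts=3, **wait_kwargs):
    """Issue the delete and retry promptly if the instance goes into error.

    delete is a hypothetical callable that requests the instance delete.
    Returns True once the instance is gone, False otherwise.
    """
    for _ in range(max_attempts):
        delete()
        outcome = wait_for_delete(get_status, timeout, **wait_kwargs)
        if outcome == "gone":
            return True
        if outcome == "timeout":
            # Leave timeouts to the caller's existing handling.
            return False
        # outcome == "error": loop around and reissue the delete right away.
    return False
```

Bounding the retries with `max_attempts` is a design choice of this sketch; if Neutron keeps failing, an unbounded retry would just replace one long wait with a tight loop.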
description: | updated |
affects: | akanda-rug → akanda |
Changed in akanda: | |
importance: | Undecided → Medium |
status: | New → Triaged |
Changed in akanda: | |
assignee: | nobody → Mark McClain (markmcclain) |
tags: | added: akanda-rug |
Changed in akanda: | |
milestone: | none → kilo-rc1 |
milestone: | kilo-rc1 → liberty-1 |
Changed in akanda: | |
milestone: | liberty-1 → liberty-2 |
Changed in akanda: | |
milestone: | liberty-2 → none |