nova compute service does not reset instance with task_state in rebooting_hard
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | In Progress | Undecided | Unassigned |
Bug Description
Description
===========
When a user requests a hard reboot of a running instance while nova-compute is unavailable (service stopped or host down), the instance can, under certain conditions, remain in the rebooting_hard task_state after nova-compute starts again.
The issue occurs when the RabbitMQ message-ttl for queued messages is shorter than the time needed to bring nova-compute back up: the reboot request expires in the queue before the service can consume it.
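The condition above can be expressed as a small timing model. This is only an illustrative sketch: the function and parameter names are hypothetical, and it treats a wait exactly equal to the TTL as expired, which is an assumption chosen to match the reproduction steps below rather than a statement about RabbitMQ internals.

```python
def reboot_message_lost(message_ttl_s: float, wait_before_restart_s: float) -> bool:
    """Return True if the queued reboot request has expired before nova-compute is back.

    message_ttl_s: RabbitMQ message-ttl, in seconds.
    wait_before_restart_s: time the request sits in the queue while
        nova-compute is down, in seconds.
    Assumption of this model: a wait equal to the TTL counts as expired.
    """
    return wait_before_restart_s >= message_ttl_s


# With a 60 second TTL, a 60 second outage loses the request,
# while a 30 second outage does not.
lost_long_outage = reboot_message_lost(60, 60)
lost_short_outage = reboot_message_lost(60, 30)
```

When the function returns True, nova-compute never sees the reboot request, which is the situation this report describes.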
Steps to reproduce
==================
Prerequisites:
* Set a low message-ttl (for example, 60 seconds) in RabbitMQ
* Have a running instance on a host
First case: failure of the nova-compute service
1/ stop the nova-compute service on the host
2/ request a hard reboot: openstack server reboot --hard <instance_id>
3/ wait 60 seconds
4/ start the nova-compute service
5/ check the instance task_state and status
Second case: failure of the host
1/ hard shut down the host (for example, a power supply issue)
2/ request a hard reboot: openstack server reboot --hard <instance_id>
3/ wait 60 seconds
4/ restart the host
5/ check the instance task_state and status
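Step 5 in both cases can be scripted. The following is a minimal sketch, assuming the output of `openstack server show <instance_id> -f json` is available as a string. The helper names are hypothetical, and the JSON key used for the task state is an assumption: the sketch accepts either `OS-EXT-STS:task_state` or `task_state`, since the exact column name may differ between client versions.

```python
import json

# Task state reported by this bug when the reboot request is lost.
STUCK_TASK_STATE = "rebooting_hard"


def show_command(instance_id: str) -> list[str]:
    """Build the CLI call whose JSON output is inspected below."""
    return ["openstack", "server", "show", instance_id, "-f", "json"]


def is_stuck_in_hard_reboot(show_output: str) -> bool:
    """Return True if the server JSON reports task_state rebooting_hard.

    Assumption: the task state appears under one of the two keys below.
    """
    server = json.loads(show_output)
    task_state = server.get("OS-EXT-STS:task_state", server.get("task_state"))
    return task_state == STUCK_TASK_STATE
```

The command list can be passed to `subprocess.run(..., capture_output=True, text=True)` and the resulting `stdout` handed to `is_stuck_in_hard_reboot`.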
Expected result
===============
We expect nova-compute to reset the instance state to active once it is running again, since the reboot message was lost, so that the user can take other actions on the instance.
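The expected behaviour can be illustrated with a standalone sketch. It is not Nova's code and not the proposed fix: the `Instance` class and `reset_stale_hard_reboot` function are hypothetical stand-ins that only show the idea of clearing a leftover task_state when the compute service starts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    """Hypothetical stand-in for an instance record; not Nova's object model."""
    uuid: str
    vm_state: str
    task_state: Optional[str]


def reset_stale_hard_reboot(instance: Instance) -> bool:
    """Clear a leftover rebooting_hard task_state at service startup.

    Returns True if the instance was reset, False if nothing changed.
    Assumption: the reboot request was lost, so no reboot is in progress.
    """
    if instance.task_state == "rebooting_hard":
        instance.task_state = None  # instance becomes actionable again
        return True
    return False


inst = Instance(uuid="example-uuid", vm_state="active", task_state="rebooting_hard")
was_reset = reset_stale_hard_reboot(inst)
```

With the task_state cleared and the vm_state still active, the user would be able to issue further actions on the instance.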
Actual result
=============
The instance is stuck in the rebooting_hard task_state and the user is blocked.
Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/867807