To improve performance, a cache was added to the ironic driver to store nodes (and hence their power states) in https://github.com/openstack/nova/commit/9d5fb1b58e908ccacbbbf29341918d0b0588a36f#diff-1e4547e2c3b36b8f836d8f851f85fde7. Later, https://github.com/openstack/nova/commit/19cb8280232fd3b0ba0000a475d061ea9fb10e1a#diff-1e4547e2c3b36b8f836d8f851f85fde7 added a "use_cache" option to remove the inconsistencies this cache caused during the power_sync periodic task. However, when we run nova start/stop on an instance, the power_state of the instance is still obtained from the cache (https://github.com/openstack/nova/blob/f298973520420710a617e4d79e853f2416b29786/nova/compute/manager.py#L1284). As a result, the vm_state and power_state shown by the CLI list/show commands disagree for a considerable amount of time before the cache is refreshed to reflect the correct states (presumably until the next periodic power sync between nova and ironic, which depends on the sync_power_state_interval config option):
+--------------------------------------+---------+---------+------------+-------------+-------------------------------------------------------+
| ID                                   | Name    | Status  | Task State | Power State | Networks                                              |
+--------------------------------------+---------+---------+------------+-------------+-------------------------------------------------------+
| cd38b5c1-80dc-425d-8b8e-f523dc60e6ba | test000 | ACTIVE  | -          | Shutdown    | private=fde8:a67c:e94e:0:5054:ff:fe28:5da1, 10.0.0.31 |
| cd38b5c1-80dc-425d-8b8e-f523dc60e6ba | test000 | SHUTOFF | -          | Running     | private=fde8:a67c:e94e:0:5054:ff:fe28:5da1, 10.0.0.31 |
+--------------------------------------+---------+---------+------------+-------------+-------------------------------------------------------+
The code comment specifies that the cache should be refreshed during every RT (resource tracker) periodic update, which should happen every 60 seconds (https://github.com/openstack/nova/blob/61558f274842b149044a14bbe7537b9f278035fd/nova/virt/ironic/driver.py#L989), but the inconsistency seems to last for more than a minute, which is confusing for the user. "use_cache" should be set to False for these actions so that the vm and power states do not disagree.
Okay, so I think the bug occurs because the RT no longer calls the "get_available_nodes" function (https://github.com/openstack/nova/blob/9f28727eb75e05e07bad51b6eecce667d09dfb65/nova/virt/ironic/driver.py#L725). I think the refresh should either be done in "get_available_resources" or be requested separately per function call during start/stop etc.
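
To make the suspected behaviour concrete, here is a minimal, self-contained sketch of a node cache with a "use_cache" switch. It is not the nova or ironic code: FakeIronicClient, NodeCache, refresh and get_power_state are hypothetical names invented for illustration, and the power state strings are placeholders. It only shows how a cached read can return a stale power state after a start/stop, and how bypassing the cache for that one call avoids it.

```python
class FakeIronicClient:
    """Hypothetical stand-in for the Ironic API (not the real client)."""

    def __init__(self, power_states):
        # Maps node id -> actual power state as the backend would report it.
        self.power_states = dict(power_states)

    def get_power_state(self, node_id):
        return self.power_states[node_id]


class NodeCache:
    """Hypothetical driver-side cache of node power states."""

    def __init__(self, client):
        self._client = client
        self._cache = {}

    def refresh(self):
        # Models the periodic refresh; if nothing calls this, the cache
        # keeps serving whatever it saw last.
        self._cache = dict(self._client.power_states)

    def get_power_state(self, node_id, use_cache=True):
        if use_cache and node_id in self._cache:
            return self._cache[node_id]
        # Bypass the cache, ask the backend, and update the cached entry.
        state = self._client.get_power_state(node_id)
        self._cache[node_id] = state
        return state


def demo():
    client = FakeIronicClient({"node-1": "power on"})
    cache = NodeCache(client)
    cache.refresh()
    # The node is powered off (e.g. by a stop request) before the next refresh.
    client.power_states["node-1"] = "power off"
    stale = cache.get_power_state("node-1")
    fresh = cache.get_power_state("node-1", use_cache=False)
    return stale, fresh


if __name__ == "__main__":
    print(demo())
```

In this sketch the cached read still reports the old state until either refresh() runs or a caller passes use_cache=False, which corresponds to the two options suggested above: refresh from the periodic update, or skip the cache per call during start/stop.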