Instances end up with no cell assigned in instance_mappings
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Confirmed | Medium | Unassigned |
Pike | Confirmed | Medium | Unassigned |
Queens | Confirmed | Medium | Unassigned |
Bug Description
There have been situations where, due to an unrelated issue such as an RPC or DB problem, the nova_api instance_mappings table can end up with instances whose cell_id is set to NULL, which causes annoying and weird behaviour such as undeletable instances, etc.
This seems to be an issue only during times when these external infrastructure components had problems. I have come up with the following script, which loops over all cells, checks where each unmapped instance actually lives, and prints a MySQL query to run to fix it.
This would be nice to have as a nova-manage cell_v2 command to help any other users who run into this; unfortunately I'm a bit short on time so I don't have time to nova-ify it, but here it is:
=======
#!/usr/bin/env python
import urlparse

import pymysql

# Connect to the nova_api database.
# NOTE: connection details here are deployment-specific placeholders.
api_conn = pymysql.connect(host='localhost', user='nova_api',
                           password='secret', database='nova_api')
api_cur = api_conn.cursor()


def _get_conn(db):
    # database_connection looks like mysql+pymysql://user:pass@host/dbname
    parsed_url = urlparse.urlparse(db)
    conn = pymysql.connect(host=parsed_url.hostname,
                           user=parsed_url.username,
                           password=parsed_url.password,
                           database=parsed_url.path.lstrip('/'))
    return conn.cursor()

# Get list of all cells
api_cur.execute("SELECT uuid, name, database_connection FROM cell_mappings")
CELLS = [{'uuid': uuid, 'name': name, 'db': _get_conn(db)}
         for uuid, name, db in api_cur.fetchall()]

# Get list of all unmapped instances
api_cur.execute("SELECT instance_uuid FROM instance_mappings "
                "WHERE cell_id IS NULL")
print "Number of unmapped instances: %s" % api_cur.rowcount

# Go over all unmapped instances
for (instance_uuid,) in api_cur.fetchall():
    instance_cell = None

    # Check which cell contains this instance
    for cell in CELLS:
        cell['db'].execute("SELECT uuid FROM instances WHERE uuid = %s",
                           (instance_uuid,))
        if cell['db'].rowcount != 0:
            instance_cell = cell
            break

    # Print the fix. cell_id is a numeric foreign key to cell_mappings.id,
    # so resolve it from the cell uuid with a subquery.
    if instance_cell:
        print ("UPDATE instance_mappings SET cell_id = "
               "(SELECT id FROM cell_mappings WHERE uuid = '%s') "
               "WHERE instance_uuid = '%s';"
               % (instance_cell['uuid'], instance_uuid))
        continue

    # If we reach this point, it's not in any cell?!
    print "%s: not found in any cell" % (instance_uuid)
=======
Changed in nova:
status: In Progress → Confirmed
assignee: Matt Riedemann (mriedem) → nobody
I think we might be hitting this:
https://github.com/openstack/nova/blob/6be7f7248fb1c2bbb890a0a48a424e205e173c9c/nova/conductor/manager.py#L1243
Where the build request was deleted by the user (the user deletes the instance) before we created the instance in a cell. In that case they shouldn't be able to list the instance later either, which is why we don't bother updating the instance mapping: the instance no longer exists as a build request, nor was it ever created in a cell.
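A runnable toy sketch of that first scenario (illustrative names and data, not actual nova code): the build request disappears before the conductor creates the instance in a cell, so the mapping's cell_id is never filled in.

```python
def schedule_and_build(build_requests, instance_mappings, cell_db,
                       deleted_by_user):
    """Toy model of the conductor path around manager.py#L1243."""
    for uuid in sorted(build_requests):
        if uuid in deleted_by_user:
            # The user deleted the build request mid-flight: the conductor
            # skips this instance entirely, so instance_mappings keeps
            # cell_id = NULL (None here) forever.
            continue
        cell_db.add(uuid)                  # instance created in the cell
        instance_mappings[uuid] = 'cell1'  # mapping pointed at the cell

mappings = {'inst-a': None, 'inst-b': None}
cell = set()
schedule_and_build({'inst-a', 'inst-b'}, mappings, cell,
                   deleted_by_user={'inst-b'})
print(mappings)  # {'inst-a': 'cell1', 'inst-b': None}
```

In this toy run, 'inst-b' keeps a NULL cell_id and never appears in the cell database, matching the orphaned rows described in the bug.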
I'm not sure why we don't just update the instance mapping as soon as we create the instance in a cell:
https://github.com/openstack/nova/blob/6be7f7248fb1c2bbb890a0a48a424e205e173c9c/nova/conductor/manager.py#L1257
Because in the normal flow, we don't update the instance mapping until much later:
https://github.com/openstack/nova/blob/6be7f7248fb1c2bbb890a0a48a424e205e173c9c/nova/conductor/manager.py#L1322
And if anything fails between those two points, the instance will exist in a cell but the instance mapping won't point at it, so you can't do things on the instance, although you can still list it (the list routine doesn't go through instance_mappings, it just iterates the cells). Furthermore, the user could delete the instance in this case, but what they'd really be deleting is the build request; since the instance mapping doesn't point at the cell, we won't know which cell to look in to find and delete the instance.
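A matching sketch of that second window (again illustrative, not nova code): the instance lands in the cell at the #L1257 step, but the mapping only catches up at the #L1322 step, so a failure in between yields an instance that is listable but not manageable.

```python
def build_in_cell(cell_db, instance_mappings, fail_in_window=False):
    """Toy model of the window between manager.py#L1257 and #L1322."""
    cell_db.add('inst-x')  # instance record now exists in the cell
    if fail_in_window:
        # Anything failing here (RPC, DB, etc.) leaves the mapping stale.
        raise RuntimeError('failure before the mapping update')
    instance_mappings['inst-x'] = 'cell1'  # mapping updated much later

cell, mappings = set(), {'inst-x': None}
try:
    build_in_cell(cell, mappings, fail_in_window=True)
except RuntimeError:
    pass

# The instance shows up when iterating cells (so it lists)...
print('inst-x' in cell)   # True
# ...but the mapping still has no cell, so lookups and deletes can't find it.
print(mappings['inst-x'])  # None
```

Updating the mapping immediately after creating the instance in the cell, as suggested above, would shrink this window to nearly nothing.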