Ironic hypervisor disappears once the hash ring is rebuilt

Bug #1825876 reported by Nikolay Fedotov
This bug affects 1 person
Affects                  | Status      | Importance | Assigned to     | Milestone
OpenStack Compute (nova) | In Progress | Undecided  | Nikolay Fedotov |

Bug Description

Steps to reproduce
==================
Precondition: a fresh OpenStack deployment is required. The database tables nova.compute_nodes and nova_api.host_mappings must be empty; in other words, no baremetal nodes have been added to the Ironic database yet.
It must be an HA deployment with at least two ironic-conductors running on different servers.

Steps:
1. Create a baremetal node: "openstack baremetal node create ..."
2. Change the node's state to manageable.
3. After some time, "nova hypervisor-list" should list a hypervisor with the same UUID as the baremetal node.
3.1 The database should look like this:
MariaDB [(none)]> select uuid, host, mapped from nova.compute_nodes;
+--------------------------------------+-------------+--------+
| uuid | host | mapped |
+--------------------------------------+-------------+--------+
| d394aa91-3544-417c-acab-916a22e5a5b5 | ironic.aio1 | 1 |
+--------------------------------------+-------------+--------+
MariaDB [(none)]> select * from nova_api.host_mappings;
+---------------------+------------+----+---------+-------------+
| created_at | updated_at | id | cell_id | host |
+---------------------+------------+----+---------+-------------+
| 2019-04-22 09:14:23 | NULL | 22 | 7 | ironic.aio1 |
+---------------------+------------+----+---------+-------------+

4. Call "nova hypervisor-show <hypervisor UUID>" to find out which server the ironic-conductor is running on. Log into that server and stop ironic-conductor; this forces the hash ring to rebuild its state. Wait about five minutes.
5. Check the output of "nova hypervisor-list". The hypervisor is absent.
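
To illustrate why stopping a service in step 4 moves the node to another host, here is a minimal toy consistent hash ring. It is a sketch only, not the tooz implementation that nova uses. The HashRing class, the replica count, and the third host name "ironic.aio2" are assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent hash ring (hypothetical, for illustration only)."""

    def __init__(self, hosts, replicas=64):
        # Each host is placed on the ring several times to spread the load.
        self._ring = sorted(
            (_hash(f"{host}-{i}"), host)
            for host in hosts
            for i in range(replicas)
        )
        self._keys = [point for point, _ in self._ring]

    def owner(self, node_uuid: str) -> str:
        # The first ring point at or after the node's hash owns the node.
        idx = bisect.bisect(self._keys, _hash(node_uuid)) % len(self._ring)
        return self._ring[idx][1]


node = "d394aa91-3544-417c-acab-916a22e5a5b5"
hosts = ["ironic.aio1", "ironic.aio2", "ironic.aio3"]  # aio2 is assumed

before = HashRing(hosts).owner(node)
# Simulate stopping the service that currently owns the node.
after = HashRing([h for h in hosts if h != before]).owner(node)
print(f"{before} -> {after}")
```

Whichever host owned the node before, rebuilding the ring without that host always hands the node to one of the remaining hosts, which is the move described in the Result section.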

Result
==================
Look inside the database (see below). ironic.aio3 took over the baremetal node, so nova changed the 'host' field of the compute node (d394aa91-3544-417c-acab-916a22e5a5b5) to 'ironic.aio3'.
Because mapped = 1, 'nova-manage cell_v2 discover_hosts' (run periodically, see https://bugs.launchpad.net/nova/+bug/1715646) does not try to create a host mapping for the new host.

MariaDB [(none)]> select uuid, host, mapped from nova.compute_nodes;
+--------------------------------------+-------------+--------+
| uuid | host | mapped |
+--------------------------------------+-------------+--------+
| d394aa91-3544-417c-acab-916a22e5a5b5 | ironic.aio3 | 1 |
+--------------------------------------+-------------+--------+
MariaDB [(none)]> select * from nova_api.host_mappings;
+---------------------+------------+----+---------+-------------+
| created_at | updated_at | id | cell_id | host |
+---------------------+------------+----+---------+-------------+
| 2019-04-22 09:14:23 | NULL | 22 | 7 | ironic.aio1 |
+---------------------+------------+----+---------+-------------+
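
The interaction between the 'mapped' flag and host discovery can be modelled in a few lines. This is a simplified sketch, assuming that discovery only examines compute nodes whose mapped flag is 0. The ComputeNode class and the discover_hosts function below are hypothetical stand-ins, not nova code.

```python
from dataclasses import dataclass


@dataclass
class ComputeNode:
    uuid: str
    host: str
    mapped: int


def discover_hosts(compute_nodes, host_mappings):
    """Simplified model: only nodes with mapped == 0 are examined."""
    added = []
    for node in compute_nodes:
        if node.mapped:
            continue  # already flagged as mapped, so never re-checked
        if node.host not in host_mappings:
            host_mappings.add(node.host)
            added.append(node.host)
        node.mapped = 1
    return added


node = ComputeNode("d394aa91-3544-417c-acab-916a22e5a5b5", "ironic.aio1", 0)
host_mappings = set()

# First periodic run: the new node is unmapped, so ironic.aio1 gets a mapping.
first_run = discover_hosts([node], host_mappings)

# Hash ring rebuild: the host field changes, the mapped flag does not.
node.host = "ironic.aio3"
second_run = discover_hosts([node], host_mappings)

print(first_run, second_run, node.host in host_mappings)
# prints: ['ironic.aio1'] [] False
```

Under this assumption the second run has nothing to do, which matches the tables above: the compute node points at ironic.aio3 while the only host mapping is still ironic.aio1.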

2019-04-22 19:54:00.813 8 WARNING nova.compute.resource_tracker [req-1ded2c35-d0e4-4719-a15d-3a83594bab1c - - - - -] No compute node record for ironic.aio3:5f9c2619-30bb-40d2-8b62-8923f04d90f2: ComputeHostNotFound_Remote: Compute host ironic.aio3 could not be found.
2019-04-22 19:54:00.831 8 INFO nova.compute.resource_tracker [req-1ded2c35-d0e4-4719-a15d-3a83594bab1c - - - - -] ComputeNode 5f9c2619-30bb-40d2-8b62-8923f04d90f2 moving from ironic.aio1 to ironic.aio3
2019-04-22 19:54:00.891 8 DEBUG nova.virt.ironic.driver [req-1ded2c35-d0e4-4719-a15d-3a83594bab1c - - - - -] Using cache for node 5f9c2619-30bb-40d2-8b62-8923f04d90f2, age: 0.0979330539703 _node_from_cache /usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py:860
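
The resulting inconsistency between the two tables can be found with a LEFT JOIN. The sketch below reproduces the rows shown above in an in-memory SQLite database so that it is self-contained. In a real deployment the tables live in separate MariaDB databases (nova and nova_api), and the reduced column set used here is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE compute_nodes (uuid TEXT, host TEXT, mapped INTEGER);
    CREATE TABLE host_mappings (id INTEGER, cell_id INTEGER, host TEXT);
    INSERT INTO compute_nodes VALUES
        ('d394aa91-3544-417c-acab-916a22e5a5b5', 'ironic.aio3', 1);
    INSERT INTO host_mappings VALUES (22, 7, 'ironic.aio1');
""")

# Compute nodes flagged as mapped whose host has no host mapping row.
orphans = conn.execute("""
    SELECT cn.uuid, cn.host
    FROM compute_nodes AS cn
    LEFT JOIN host_mappings AS hm ON hm.host = cn.host
    WHERE cn.mapped = 1 AND hm.host IS NULL
""").fetchall()

print(orphans)
# prints: [('d394aa91-3544-417c-acab-916a22e5a5b5', 'ironic.aio3')]
```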

The missing record in the host_mappings table causes nova to print the "Unable to find service" DEBUG message (see below). The compute node becomes 'invisible'.
See the source code in nova/api/openstack/compute/hypervisors.py:HypervisorsController._get_hypervisors:

    def _get_hypervisors(self, req, detail=False, limit=None, marker=None,
                         links=False):
        """Get hypervisors for the given request.

        :param req: nova.api.openstack.wsgi.Request for the GET request
...
        hypervisors_list = []
        for hyp in compute_nodes:
            try:
                instances = None
                if with_servers:
                    instances = self.host_api.instance_get_all_by_host(
                        context, hyp.host)
                service = self.host_api.service_get_by_compute_host(
                    context, hyp.host)
                hypervisors_list.append(
                    self._view_hypervisor(
                        hyp, service, detail, req, servers=instances))
            except (exception.ComputeHostNotFound,
                    exception.HostMappingNotFound):
                # The compute service could be deleted which doesn't delete
                # the compute node record, that has to be manually removed
                # from the database so we just ignore it when listing nodes.
                LOG.debug('Unable to find service for compute node %s. The '
                          'service may be deleted and compute nodes need to '
                          'be manually cleaned up.', hyp.host)
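
The effect of that except clause can be reproduced in isolation. The sketch below is a reduced model of the loop above, not the nova implementation. The exception class is a stand-in for nova.exception.HostMappingNotFound, and HOST_MAPPINGS, service_get_by_compute_host and list_hypervisors are hypothetical helpers.

```python
class HostMappingNotFound(Exception):
    """Stand-in for nova.exception.HostMappingNotFound."""


# Only the original host has a mapping, as in the host_mappings table.
HOST_MAPPINGS = {"ironic.aio1"}


def service_get_by_compute_host(host):
    """Hypothetical lookup that fails when the host has no mapping."""
    if host not in HOST_MAPPINGS:
        raise HostMappingNotFound(host)
    return {"host": host}


def list_hypervisors(compute_nodes):
    """Reduced model of the listing loop: unmapped hosts are skipped."""
    result = []
    for uuid, host in compute_nodes:
        try:
            service = service_get_by_compute_host(host)
        except HostMappingNotFound:
            # Silently ignored, so the node never reaches the API response.
            continue
        result.append((uuid, service["host"]))
    return result


node = "d394aa91-3544-417c-acab-916a22e5a5b5"
visible_before = list_hypervisors([(node, "ironic.aio1")])
visible_after = list_hypervisors([(node, "ironic.aio3")])

print(visible_before)  # prints: [('d394aa91-3544-417c-acab-916a22e5a5b5', 'ironic.aio1')]
print(visible_after)   # prints: []
```

The compute node record still exists in both cases; only the failed lookup for the new host removes it from the list.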

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/654584

Changed in nova:
assignee: nobody → Nikolay Fedotov (nfedotov)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Nikolay Fedotov (<email address hidden>) on branch: master
Review: https://review.opendev.org/654584

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/817467

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by "Julia Kreger <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/nova/+/817467

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Julia Kreger <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/nova/+/813897
Reason: The consensus from the PTG was to refactor the driver to utilize a static mapping. Specifically the introduction of Shard Keys https://review.opendev.org/c/openstack/ironic-specs/+/861803 and their use in the nova.virt.ironic driver https://review.opendev.org/c/openstack/nova-specs/+/862833
