Originally reported at https://bugzilla.redhat.com/show_bug.cgi?id=1590297#c14 and tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1592427
@owalsh: Looks like a race in the service startup.
(undercloud) [stack@undercloud ~]$ openstack server list
+--------------------------------------+-------------------------+--------+------------------------+------------------------+--------------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+-------------------------+--------+------------------------+------------------------+--------------+
| fcda23f8-7b98-41bf-9e45-7a4579164874 | overcloud-controller-1 | ERROR | ctlplane=192.168.24.7 | overcloud-full_renamed | oooq_control |
| ced0e0db-ad1f-4381-a523-0a79ae0303ff | overcloud-controller-2 | ERROR | ctlplane=192.168.24.21 | overcloud-full_renamed | oooq_control |
| 0731685f-a8fc-4126-868a-bdf9879238f3 | overcloud-controller-0 | ERROR | ctlplane=192.168.24.12 | overcloud-full_renamed | oooq_control |
| c4fd4e28-fe44-40d2-9b40-7196b7c3b6a2 | overcloud-novacompute-2 | ERROR | ctlplane=192.168.24.15 | overcloud-full_renamed | oooq_compute |
| 5bef3afd-4485-4cbc-b76c-f126c85bd015 | overcloud-novacompute-1 | ERROR | ctlplane=192.168.24.8 | overcloud-full_renamed | oooq_compute |
| e4c9da33-c452-446c-82c4-bd55a6b294d8 | overcloud-novacompute-0 | ERROR | ctlplane=192.168.24.9 | overcloud-full_renamed | oooq_compute |
+--------------------------------------+-------------------------+--------+------------------------+------------------------+--------------+
Looking at controller-1...
(undercloud) [stack@undercloud ~]$ openstack server show overcloud-controller-1
+-------------------------------------+---------------------------------------------------------------+
| Field | Value |
+-------------------------------------+---------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | undercloud |
| OS-EXT-SRV-ATTR:hypervisor_hostname | b6a32fba-b57a-4e7b-a6ce-99941b4d134d |
| OS-EXT-SRV-ATTR:instance_name | instance-00000005 |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | error |
| OS-SRV-USG:launched_at | 2018-06-11T14:56:34.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | ctlplane=192.168.24.7 |
| config_drive | True |
| created | 2018-06-11T14:53:09Z |
| flavor | oooq_control (04f8ba26-e9bd-472f-b92a-919e7ec8bed1) |
| hostId | da8848b6f3dc77a51235f70dcf44df197261eca0529173f670c94ce9 |
| id | fcda23f8-7b98-41bf-9e45-7a4579164874 |
| image | overcloud-full_renamed (4d965d80-61fc-45d0-88e1-e6365c4afd57) |
| key_name | default |
| name | overcloud-controller-1 |
| project_id | 3b778414471a47e4b0760bbecc9d4070 |
| properties | |
| security_groups | name='default' |
| status | ERROR |
| updated | 2018-06-15T09:29:17Z |
| user_id | 356b9a5451c643bb8162b9349bc9487b |
| volumes_attached | |
+-------------------------------------+---------------------------------------------------------------+
From /var/log/ironic/ironic-conductor.log I can see that ironic-conductor only started loading its extensions at 09:29:18.576, which is after the instance's "updated" timestamp (2018-06-15T09:29:17Z):
2018-06-15 09:29:18.576 1408 DEBUG oslo_concurrency.lockutils [req-6ad8cb26-b607-4f65-a8b7-30ac72a7a997 - - - - -] Lock "extension_manager" acquired by "ironic.common.driver_factory._init_extension_manager" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
The errors in /var/log/ironic/app.log occurred before this, because no conductor supporting the ipmi driver had registered yet:
2018-06-15 09:29:14.419 1793 DEBUG wsme.api [req-81ce866e-1c15-4499-a452-044a44efa1c0 1a609f3c25c24c45ac30ee2fcc721eac 5909cc46dbad48058daa89d48f07ba71 - default default] Client-side error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222
2018-06-15 09:29:15.304 1792 DEBUG wsme.api [req-be8a2443-12ad-4115-9640-b91b39d15765 1a609f3c25c24c45ac30ee2fcc721eac 5909cc46dbad48058daa89d48f07ba71 - default default] Client-side error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222
2018-06-15 09:29:16.105 1793 DEBUG wsme.api [req-0f2c2010-3bce-4600-8c8a-54a6aa0b8965 1a609f3c25c24c45ac30ee2fcc721eac 5909cc46dbad48058daa89d48f07ba71 - default default] Client-side error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222
2018-06-15 09:29:16.797 1792 DEBUG wsme.api [req-e660c0cd-8a25-4b93-815f-edc26c15ddba 1a609f3c25c24c45ac30ee2fcc721eac 5909cc46dbad48058daa89d48f07ba71 - default default] Client-side error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222
2018-06-15 09:29:17.596 1793 DEBUG wsme.api [req-ffa50a06-3a62-4f3e-937c-4304818dca4c 1a609f3c25c24c45ac30ee2fcc721eac 5909cc46dbad48058daa89d48f07ba71 - default default] Client-side error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222
2018-06-15 09:29:18.277 1793 DEBUG wsme.api [req-c391da42-997e-49ee-b604-11ab81f875e2 1a609f3c25c24c45ac30ee2fcc721eac 5909cc46dbad48058daa89d48f07ba71 - default default] Client-side error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222
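The ordering can be checked mechanically. The following is a minimal sketch, assuming the timestamps are copied from the excerpts above and that both logs use the same clock; the helper name parse_ts is hypothetical and the log lines are truncated to their leading fields for brevity.

```python
from datetime import datetime

# Leading timestamp format used in the log excerpts above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def parse_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.mmm' timestamp of a log line."""
    return datetime.strptime(line[:23], FMT)


# ironic-conductor.log: the extension manager lock is acquired (truncated line).
conductor_init = parse_ts("2018-06-15 09:29:18.576 1408 DEBUG oslo_concurrency.lockutils")

# app.log: "No conductor service registered" errors (truncated lines).
api_errors = [
    parse_ts(line)
    for line in (
        "2018-06-15 09:29:14.419 1793 DEBUG wsme.api",
        "2018-06-15 09:29:15.304 1792 DEBUG wsme.api",
        "2018-06-15 09:29:16.105 1793 DEBUG wsme.api",
        "2018-06-15 09:29:16.797 1792 DEBUG wsme.api",
        "2018-06-15 09:29:17.596 1793 DEBUG wsme.api",
        "2018-06-15 09:29:18.277 1793 DEBUG wsme.api",
    )
]

# True when every API error was logged before the conductor began loading extensions.
all_before = all(ts < conductor_init for ts in api_errors)

# Time between the last API error and the start of extension loading.
gap_seconds = (conductor_init - max(api_errors)).total_seconds()
```

In this data every API error precedes the conductor's extension loading, which is consistent with the startup race described above.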
And /var/log/nova/nova-compute.log:
2018-06-15 09:29:12.007 2623 INFO nova.service [req-42a10a27-6527-45d3-9b7d-0c4dc2ac2f13 - - - - -] Starting compute node (version 18.0.0-0.20180601221704.f902e0d.el7)
2018-06-15 09:29:12.135 2623 DEBUG nova.servicegroup.drivers.db [req-42a10a27-6527-45d3-9b7d-0c4dc2ac2f13 - - - - -] Seems service nova-compute on host undercloud is down. Last heartbeat was 2018-06-15 09:27:40. Elapsed time is 92.135605 is_up /usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py:79
2018-06-15 09:29:12.260 2623 DEBUG nova.compute.manager [req-42a10a27-6527-45d3-9b7d-0c4dc2ac2f13 - - - - -] [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] Checking state _get_power_state /usr/lib/python2.7/site-packages/nova/compute/manager.py:1167
2018-06-15 09:29:13.819 2623 DEBUG nova.virt.ironic.driver [req-42a10a27-6527-45d3-9b7d-0c4dc2ac2f13 - - - - -] plug: instance_uuid=5bef3afd-4485-4cbc-b76c-f126c85bd015 vif=[{"profile": {}, "ovs_interfaceid": null, "preserve_on_delete": true, "network": {"bridge": null, "subnets": [{"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "192.168.24.8"}], "version": 4, "meta": {"dhcp_server": "192.168.24.5"}, "dns": [{"meta": {}, "version": 4, "type": "dns", "address": "192.168.23.1"}], "routes": [{"interface": null, "cidr": "169.254.169.254/32", "meta": {}, "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "192.168.24.1"}}], "cidr": "192.168.24.0/24", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "192.168.24.1"}}], "meta": {"injected": false, "tenant_id": "3b778414471a47e4b0760bbecc9d4070", "mtu": 1500}, "id": "1fe54003-4d8c-4b37-9f20-f04216ac4e26", "label": "ctlplane"}, "devname": "tapcbd2c5be-32", "vnic_type": "baremetal", "qbh_params": null, "meta": {}, "details": {}, "address": "00:1a:24:52:db:70", "active": true, "type": "other", "id": "cbd2c5be-32fd-4c5c-9a14-43e323071912", "qbg_params": null}] _plug_vifs /usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py:1397
2018-06-15 09:29:14.422 2623 ERROR nova.virt.ironic.driver [req-42a10a27-6527-45d3-9b7d-0c4dc2ac2f13 - - - - -] Cannot attach VIF cbd2c5be-32fd-4c5c-9a14-43e323071912 to the node 059ac0f4-3a78-4f67-9f75-57a7a1f9ec5c due to error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. (HTTP 400): BadRequest: No valid host was found. Reason: No conductor service registered which supports driver ipmi. (HTTP 400)
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [req-42a10a27-6527-45d3-9b7d-0c4dc2ac2f13 - - - - -] [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] Vifs plug failed: VirtualInterfacePlugException: Cannot attach VIF cbd2c5be-32fd-4c5c-9a14-43e323071912 to the node 059ac0f4-3a78-4f67-9f75-57a7a1f9ec5c due to error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. (HTTP 400)
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] Traceback (most recent call last):
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 942, in _init_instance
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] self.driver.plug_vifs(instance, net_info)
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 1441, in plug_vifs
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] self._plug_vifs(node, instance, network_info)
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 1408, in _plug_vifs
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] raise exception.VirtualInterfacePlugException(msg)
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] VirtualInterfacePlugException: Cannot attach VIF cbd2c5be-32fd-4c5c-9a14-43e323071912 to the node 059ac0f4-3a78-4f67-9f75-57a7a1f9ec5c due to error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. (HTTP 400)
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015]
The same errors repeat for the other instances.
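One generic way to tolerate this kind of startup race is to retry the call until a conductor has registered. The sketch below only illustrates the idea and is not nova's or ironic's actual code: NoConductorError, call_with_retry and make_fake_attach are hypothetical names, and the fake attach function simply simulates the HTTP 400 seen above for the first few calls.

```python
import time


class NoConductorError(Exception):
    """Hypothetical stand-in for the 'No conductor service registered' HTTP 400."""


def call_with_retry(func, attempts=5, delay=1.0, sleep=time.sleep):
    """Call func(), retrying while it raises NoConductorError.

    Re-raises the last error once `attempts` calls have failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except NoConductorError:
            if attempt == attempts:
                raise
            sleep(delay)


def make_fake_attach(fail_times):
    """Build a simulated VIF attach that fails `fail_times` times, then succeeds."""
    state = {"calls": 0}

    def attach():
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise NoConductorError(
                "No conductor service registered which supports driver ipmi."
            )
        return "attached"

    return attach, state


# Simulate a conductor that becomes available on the third call.
attach, state = make_fake_attach(fail_times=2)
result = call_with_retry(attach, attempts=5, delay=0, sleep=lambda _s: None)
```

The sleep function is injected so the simulation runs without waiting; a real caller would keep the default and choose attempts and delay to cover the expected conductor startup time.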
This can also happen if a node with an instance was deleted from ironic while nova-compute was down: http://paste.openstack.org/show/728326/