unshelve to host fails with "Compute host could not be found" even when the compute exists

Bug #1988316 reported by Balazs Gibizer
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

We observed that the recently merged test_unshelve_to_specific_host[id-b5cc0889-50c2-46a0-b8ff-b5fb4c3a6e20] tempest test case fails 100% of the time in the nova-multi-cell job with:
Details: {'code': 400, 'message': 'Compute host ubuntu-focal-rax-dfw-0030919238 could not be found.'}[1]

Even though the requested host does exist and is up and enabled.

The problem appears to be that the compute.api.API.unshelve() code, running in the compute-api service, tries to query the cell DB to check the provided hostname without targeting the context to the proper cell DB [2].

Aug 31 12:35:18.468701 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi Traceback (most recent call last):
Aug 31 12:35:18.468701 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/openstack/compute/shelve.py", line 124, in _unshelve
Aug 31 12:35:18.468701 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi self.compute_api.unshelve(
Aug 31 12:35:18.468701 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/compute/api.py", line 388, in inner
Aug 31 12:35:18.468701 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi return function(self, context, instance, *args, **kwargs)
Aug 31 12:35:18.468701 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/compute/api.py", line 241, in inner
Aug 31 12:35:18.468701 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi return function(self, context, instance, *args, **kwargs)
Aug 31 12:35:18.468701 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/compute/api.py", line 167, in inner
Aug 31 12:35:18.468701 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi return f(self, context, instance, *args, **kw)
Aug 31 12:35:18.468701 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/compute/api.py", line 4577, in unshelve
Aug 31 12:35:18.468701 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi objects.ComputeNode.get_first_node_by_host_for_old_compat(
Aug 31 12:35:18.468701 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/usr/local/lib/python3.8/dist-packages/oslo_versionedobjects/base.py", line 184, in wrapper
Aug 31 12:35:18.468701 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi result = fn(cls, context, *args, **kwargs)
Aug 31 12:35:18.468701 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/objects/compute_node.py", line 293, in get_first_node_by_host_for_old_compat
Aug 31 12:35:18.468701 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi computes = ComputeNodeList.get_all_by_host(context, host, use_slave)
Aug 31 12:35:18.470614 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/usr/local/lib/python3.8/dist-packages/oslo_versionedobjects/base.py", line 184, in wrapper
Aug 31 12:35:18.470614 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi result = fn(cls, context, *args, **kwargs)
Aug 31 12:35:18.470614 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/objects/compute_node.py", line 476, in get_all_by_host
Aug 31 12:35:18.470614 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi db_computes = cls._db_compute_node_get_all_by_host(context, host,
Aug 31 12:35:18.470614 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/db/main/api.py", line 179, in wrapper
Aug 31 12:35:18.470614 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi return f(*args, **kwargs)
Aug 31 12:35:18.470614 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/objects/compute_node.py", line 472, in _db_compute_node_get_all_by_host
Aug 31 12:35:18.470614 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi return db.compute_node_get_all_by_host(context, host)
Aug 31 12:35:18.470614 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/db/main/api.py", line 241, in wrapper
Aug 31 12:35:18.470614 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi return f(context, *args, **kwargs)
Aug 31 12:35:18.470614 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/db/main/api.py", line 740, in compute_node_get_all_by_host
Aug 31 12:35:18.470614 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi raise exception.ComputeHostNotFound(host=host)
Aug 31 12:35:18.470614 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi nova.exception.ComputeHostNotFound: Compute host ubuntu-focal-rax-dfw-0030919238 could not be found.
Aug 31 12:35:18.470614 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi
Aug 31 12:35:18.470614 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi During handling of the above exception, another exception occurred:
Aug 31 12:35:18.470614 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi
Aug 31 12:35:18.470614 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi Traceback (most recent call last):
Aug 31 12:35:18.470614 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/openstack/wsgi.py", line 539, in _process_stack
Aug 31 12:35:18.470614 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi action_result = self.dispatch(meth, request, action_args)
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/openstack/wsgi.py", line 630, in dispatch
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi return method(req=request, **action_args)
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/openstack/wsgi.py", line 664, in wrapped
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi return f(*args, **kwargs)
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/validation/__init__.py", line 110, in wrapper
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi return func(*args, **kwargs)
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/validation/__init__.py", line 110, in wrapper
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi return func(*args, **kwargs)
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/openstack/compute/shelve.py", line 144, in _unshelve
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi raise exc.HTTPBadRequest(explanation=e.format_message())
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi webob.exc.HTTPBadRequest: Compute host ubuntu-focal-rax-dfw-0030919238 could not be found.
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi During handling of the above exception, another exception occurred:
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi Traceback (most recent call last):
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/openstack/wsgi.py", line 539, in _process_stack
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi action_result = self.dispatch(meth, request, action_args)
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/openstack/wsgi.py", line 372, in __exit__
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi raise Fault(ex_value)
Aug 31 12:35:18.472497 ubuntu-focal-rax-dfw-0030919238 <email address hidden>[98054]: ERROR nova.api.openstack.wsgi nova.api.openstack.wsgi.Fault: Compute host ubuntu-focal-rax-dfw-0030919238 could not be found.

[1] https://de836787b7e59a5adc13-298f4365cc798f0001a632f171eb41d6.ssl.cf2.rackcdn.com/831219/22/check/nova-multi-cell/9d8aa66/controller/logs/screen-n-api.txt
[2] https://github.com/openstack/nova/blob/733a87e6126e4da8261eada74ba2cd0ec55f8a72/nova/compute/api.py#L4577
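The failure mode described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions, not nova's actual code: `Context`, `CELL_DBS`, `target_cell` and `compute_node_get_by_host` are stand-ins defined here, and the cell and host names are made up. It shows that a host lookup only sees the cell DB the context currently points at, so a host that exists in a different cell looks as if it does not exist.

```python
from contextlib import contextmanager


class ComputeHostNotFound(Exception):
    """Stand-in for nova.exception.ComputeHostNotFound."""


# Hypothetical per-cell databases: cell name -> set of compute hosts.
CELL_DBS = {
    "cell1": {"compute-a"},
    "cell2": {"compute-b"},
}


class Context:
    """Toy request context that remembers which cell DB it is targeted to."""

    def __init__(self, cell=None):
        self.cell = cell


@contextmanager
def target_cell(context, cell):
    """Temporarily point the context at another cell, then restore it."""
    previous = context.cell
    context.cell = cell
    try:
        yield context
    finally:
        context.cell = previous


def compute_node_get_by_host(context, host):
    """Look the host up only in the cell DB the context is targeted to."""
    if host not in CELL_DBS.get(context.cell, set()):
        raise ComputeHostNotFound(f"Compute host {host} could not be found.")
    return host


if __name__ == "__main__":
    ctxt = Context(cell="cell1")  # targeted to the instance's cell
    try:
        compute_node_get_by_host(ctxt, "compute-b")
    except ComputeHostNotFound as exc:
        print("wrong cell:", exc)
    with target_cell(ctxt, "cell2"):
        print("right cell:", compute_node_get_by_host(ctxt, "compute-b"))
```

In this model the same hostname is rejected or accepted depending only on which cell the context is targeted to, which matches the symptom reported in the traceback.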

summary: - unshelve to host fails with Compute host could not be found even when
+ unshelve to host fails with "Compute host could not be found" even when
the compute exists
Changed in nova:
importance: Undecided → Critical
tags: added: gate-failure
Changed in nova:
status: New → Confirmed
tags: added: api cells shelve
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/855378

Balazs Gibizer (balazs-gibizer) wrote :

So the unshelve code assumes that the context is targeted to the cell of the instance when it checks whether the requested host exists. If the host exists but is in a different cell, the call fails with the above error. The unshelve code can safely assume that the context is targeted, as that is actually done when the instance is loaded from the DB [1]. But the code should emit a better error message, and we need a release note and an API ref update to state that cross-cell unshelve is not supported.

[1] https://github.com/openstack/nova/blob/3862cfc649d0099971c91d17e6f8800d10712a26/nova/compute/api.py#L2867-L2870
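One possible shape for the "better error message" mentioned above is sketched below. It is a hypothetical illustration, not the fix that was proposed or merged: `CrossCellUnshelveNotSupported`, `check_unshelve_host` and the `host_to_cell` dictionary are invented here, with the dictionary standing in for a global host-to-cell mapping. The idea is to tell apart a host that does not exist anywhere from a host that exists in a different cell than the instance.

```python
class ComputeHostNotFound(Exception):
    """Stand-in for nova.exception.ComputeHostNotFound."""


class CrossCellUnshelveNotSupported(Exception):
    """Hypothetical exception; not an existing nova exception."""


def check_unshelve_host(instance_cell, host, host_to_cell):
    """Validate a requested unshelve host against the instance's cell.

    host_to_cell is a hypothetical global host -> cell name mapping.
    """
    host_cell = host_to_cell.get(host)
    if host_cell is None:
        # The host is unknown in every cell: the original message is accurate.
        raise ComputeHostNotFound(f"Compute host {host} could not be found.")
    if host_cell != instance_cell:
        # The host exists, just not in the instance's cell: say so explicitly.
        raise CrossCellUnshelveNotSupported(
            f"Compute host {host} is in cell {host_cell} but the instance "
            f"is in cell {instance_cell}; cross-cell unshelve is not supported."
        )
    return host


if __name__ == "__main__":
    mapping = {"compute-a": "cell1", "compute-b": "cell2"}
    print(check_unshelve_host("cell1", "compute-a", mapping))
    try:
        check_unshelve_host("cell1", "compute-b", mapping)
    except CrossCellUnshelveNotSupported as exc:
        print(exc)
```

With a check like this, the API user would see why the request was rejected instead of a misleading "could not be found" message for a host that is up and enabled.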

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/855378
Committed: https://opendev.org/openstack/nova/commit/1bc1b599df7f2f69640654a22aafb8411a459ed4
Submitter: "Zuul (22348)"
Branch: master

commit 1bc1b599df7f2f69640654a22aafb8411a459ed4
Author: Balazs Gibizer <email address hidden>
Date: Wed Aug 31 16:48:33 2022 +0200

    Skip UnshelveToHostMultiNodesTest in nova-multi-cell

    The I303a28afe69d5d17261a07fd45c4fa92bbad5598 added tempest test coverage
    for the new unshelve-to-host nova feature. However the test fails in the
    multi cell job as it tries to unshelve the instance to another cell
    which is clearly not supported.

    So this patch skips the unshelve to host test cases to unblock the gate.

    Related-Bug: #1988316
    Change-Id: I50c08a5dcffbf7c31bf02bdfb8615966f9271791

Sylvain Bauza (sylvain-bauza) wrote :

Set to High as the gate is working again.

TBC, we should just state that cross-cell unshelve to host isn't supported yet, and we could handle the exception better, but I'll leave this open for other thoughts.

Changed in nova:
importance: Critical → High