nova-conductor may crash during deploy due to haproxy-keystone 504

Bug #1846820 reported by Radosław Piliszek
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  Low         Dan Smith
Wallaby                   In Progress   Undecided   Unassigned
Xena                      In Progress   Undecided   Unassigned
Yoga                      Fix Released  Undecided   Unassigned

Bug Description

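Keystone (behind HAProxy) was busy during the deploy, so the token request timed out with HTTP 504 and nova-conductor aborted on startup.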

nova-conductor:
2019-10-04 15:39:17.103 6 CRITICAL nova [-] Unhandled error: GatewayTimeout: Gateway Timeout (HTTP 504)
2019-10-04 15:39:17.103 6 ERROR nova Traceback (most recent call last):
2019-10-04 15:39:17.103 6 ERROR nova File "/var/lib/kolla/venv/bin/nova-conductor", line 10, in <module>
2019-10-04 15:39:17.103 6 ERROR nova sys.exit(main())
2019-10-04 15:39:17.103 6 ERROR nova File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/cmd/conductor.py", line 44, in main
2019-10-04 15:39:17.103 6 ERROR nova topic=rpcapi.RPC_TOPIC)
2019-10-04 15:39:17.103 6 ERROR nova File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/service.py", line 257, in create
2019-10-04 15:39:17.103 6 ERROR nova periodic_interval_max=periodic_interval_max)
2019-10-04 15:39:17.103 6 ERROR nova File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/service.py", line 129, in __init__
2019-10-04 15:39:17.103 6 ERROR nova self.manager = manager_class(host=self.host, *args, **kwargs)
2019-10-04 15:39:17.103 6 ERROR nova File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/conductor/manager.py", line 117, in __init__
2019-10-04 15:39:17.103 6 ERROR nova self.compute_task_mgr = ComputeTaskManager()
2019-10-04 15:39:17.103 6 ERROR nova File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/conductor/manager.py", line 243, in __init__
2019-10-04 15:39:17.103 6 ERROR nova self.report_client = report.SchedulerReportClient()
2019-10-04 15:39:17.103 6 ERROR nova File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 186, in __init__
2019-10-04 15:39:17.103 6 ERROR nova self._client = self._create_client()
2019-10-04 15:39:17.103 6 ERROR nova File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 229, in _create_client
2019-10-04 15:39:17.103 6 ERROR nova client = self._adapter or utils.get_sdk_adapter('placement')
2019-10-04 15:39:17.103 6 ERROR nova File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/utils.py", line 1039, in get_sdk_adapter
2019-10-04 15:39:17.103 6 ERROR nova return getattr(conn, service_type)
2019-10-04 15:39:17.103 6 ERROR nova File "/var/lib/kolla/venv/lib/python2.7/site-packages/openstack/service_description.py", line 92, in __get__
2019-10-04 15:39:17.103 6 ERROR nova endpoint = proxy_mod.Proxy.get_endpoint(proxy)
2019-10-04 15:39:17.103 6 ERROR nova File "/var/lib/kolla/venv/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 282, in get_endpoint
2019-10-04 15:39:17.103 6 ERROR nova return self.session.get_endpoint(auth or self.auth, **kwargs)
2019-10-04 15:39:17.103 6 ERROR nova File "/var/lib/kolla/venv/lib/python2.7/site-packages/keystoneauth1/session.py", line 1200, in get_endpoint
2019-10-04 15:39:17.103 6 ERROR nova return auth.get_endpoint(self, **kwargs)
2019-10-04 15:39:17.103 6 ERROR nova File "/var/lib/kolla/venv/lib/python2.7/site-packages/keystoneauth1/identity/base.py", line 380, in get_endpoint
2019-10-04 15:39:17.103 6 ERROR nova allow_version_hack=allow_version_hack, **kwargs)
2019-10-04 15:39:17.103 6 ERROR nova File "/var/lib/kolla/venv/lib/python2.7/site-packages/keystoneauth1/identity/base.py", line 271, in get_endpoint_data
2019-10-04 15:39:17.103 6 ERROR nova service_catalog = self.get_access(session).service_catalog
2019-10-04 15:39:17.103 6 ERROR nova File "/var/lib/kolla/venv/lib/python2.7/site-packages/keystoneauth1/identity/base.py", line 134, in get_access
2019-10-04 15:39:17.103 6 ERROR nova self.auth_ref = self.get_auth_ref(session)
2019-10-04 15:39:17.103 6 ERROR nova File "/var/lib/kolla/venv/lib/python2.7/site-packages/keystoneauth1/identity/generic/base.py", line 208, in get_auth_ref
2019-10-04 15:39:17.103 6 ERROR nova return self._plugin.get_auth_ref(session, **kwargs)
2019-10-04 15:39:17.103 6 ERROR nova File "/var/lib/kolla/venv/lib/python2.7/site-packages/keystoneauth1/identity/v3/base.py", line 184, in get_auth_ref
2019-10-04 15:39:17.103 6 ERROR nova authenticated=False, log=False, **rkwargs)
2019-10-04 15:39:17.103 6 ERROR nova File "/var/lib/kolla/venv/lib/python2.7/site-packages/keystoneauth1/session.py", line 1106, in post
2019-10-04 15:39:17.103 6 ERROR nova return self.request(url, 'POST', **kwargs)
2019-10-04 15:39:17.103 6 ERROR nova File "/var/lib/kolla/venv/lib/python2.7/site-packages/keystoneauth1/session.py", line 943, in request
2019-10-04 15:39:17.103 6 ERROR nova raise exceptions.from_response(resp, method, url)
2019-10-04 15:39:17.103 6 ERROR nova GatewayTimeout: Gateway Timeout (HTTP 504)
2019-10-04 15:39:17.103 6 ERROR nova

Matt Riedemann (mriedem) wrote :

It's blowing up trying to get the service catalog from keystone, which it needs in order to find the endpoint URL for the placement service. We could do something like in bug 1807219 and offload the placement SchedulerReportClient instantiation to a singleton lazy-load pattern, so it's not hit when conductor starts up. But if keystone is having problems, we would just fail later when trying to lazy-load the client, so I'm not sure that's helpful. I'd consider this very low priority since the root problem appears to be in keystone.
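
The singleton lazy-load pattern mentioned in this comment can be sketched roughly as follows. This is a minimal illustration and not nova's code: `FakeReportClient` and `get_report_client` are hypothetical names, and the constructor only stands in for the keystone catalog lookup seen in the traceback.

```python
import threading


class FakeReportClient:
    """Hypothetical stand-in for a placement client (not nova's class)."""

    instances_created = 0

    def __init__(self):
        # In a real service, the keystone lookup would happen around here.
        FakeReportClient.instances_created += 1


_client = None
_client_lock = threading.Lock()


def get_report_client():
    """Return the shared client, creating it on first use only."""
    global _client
    with _client_lock:
        if _client is None:
            # If construction raises, _client stays None and the error
            # propagates; the next call will try again.
            _client = FakeReportClient()
        return _client
```

Nothing is constructed at import time or at service start. As the comment points out, this only moves the failure to the first call if keystone is still unavailable at that point.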

Changed in nova:
importance: Undecided → Low
status: New → Triaged
tags: added: conductor
tags: removed: nova-conductor
Mark Goddard (mgoddard) wrote :

Are there any retries for this request?

Mark Goddard (mgoddard) wrote :

Radoslaw suggested looking at timeouts.
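
For illustration, retrying a request that fails with a gateway timeout could look like the sketch below. It is generic Python and does not show what nova or keystoneauth1 actually do: `GatewayTimeoutError` and `call_with_retries` are hypothetical names, and the attempt count and delays are arbitrary assumptions.

```python
import time


class GatewayTimeoutError(Exception):
    """Hypothetical stand-in for an HTTP 504 raised by an HTTP client."""


def call_with_retries(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(), retrying on GatewayTimeoutError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except GatewayTimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then twice that, and so on.
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter is injectable only so the backoff can be exercised without real waiting.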

Radosław Piliszek (yoctozepto) wrote :
Changed in kolla-ansible:
importance: High → Medium
Changed in kolla-ansible:
importance: Medium → Low
milestone: 9.0.0 → none
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/852900

Changed in nova:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/852901

Dan Smith (danms)
Changed in nova:
assignee: nobody → Dan Smith (danms)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/852900
Committed: https://opendev.org/openstack/nova/commit/c178d9360665c219cbcc71c9f37b9e6e3055a5e5
Submitter: "Zuul (22348)"
Branch: master

commit c178d9360665c219cbcc71c9f37b9e6e3055a5e5
Author: Dan Smith <email address hidden>
Date: Thu Aug 11 09:50:30 2022 -0700

    Unify placement client singleton implementations

    We have many places where we implement singleton behavior for the
    placement client. This unifies them into a single place and
    implementation. Not only does this DRY things up, but may cause us
    to initialize it fewer times and also allows for emitting a common
    set of error messages about expected failures for better
    troubleshooting.

    Change-Id: Iab8a791f64323f996e1d6e6d5a7e7a7c34eb4fb3
    Related-Bug: #1846820

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/852901
Committed: https://opendev.org/openstack/nova/commit/232684b44022f1bc4d72b07045900780de456e63
Submitter: "Zuul (22348)"
Branch: master

commit 232684b44022f1bc4d72b07045900780de456e63
Author: Dan Smith <email address hidden>
Date: Thu Aug 11 10:18:25 2022 -0700

    Avoid n-cond startup abort for keystone failures

    Conductor creates a placement client for the potential case where
    it needs to make a call for certain operations. A transient network
    or keystone failure will currently cause it to abort startup, which
    means it is not available for other unrelated activities, such as
    DB proxying for compute.

    This makes conductor test the placement client on startup, but only
    abort startup on errors that are highly likely to be permanent
    configuration errors, and only warn about things like being unable
    to contact keystone/placement during initialization. If a non-fatal
    error is encountered at startup, later operations needing the
    placement client will retry initialization.

    Closes-Bug: #1846820
    Change-Id: Idb7fcbce0c9562e7b9bd3e80f2a6d4b9bc286830
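
The behavior described in this commit message can be sketched as below. This is a simplified illustration under stated assumptions, not the merged change: `ConfigError`, `TransientError`, and `Conductor` are hypothetical names, and the real patch decides for itself which errors count as permanent.

```python
import logging

LOG = logging.getLogger(__name__)


class ConfigError(Exception):
    """Hypothetical: an error that is very likely a permanent misconfiguration."""


class TransientError(Exception):
    """Hypothetical: keystone or placement is unreachable or timing out."""


class Conductor:
    """Illustrative service; not nova's conductor manager."""

    def __init__(self, client_factory):
        self._client_factory = client_factory
        self._client = None
        try:
            self._client = client_factory()
        except TransientError as exc:
            # Warn and keep starting; the client is retried on demand.
            LOG.warning("Placement client unavailable at startup: %s", exc)
        # ConfigError is deliberately not caught, so startup aborts.

    def report_client(self):
        """Return the client, retrying initialization if startup failed."""
        if self._client is None:
            self._client = self._client_factory()
        return self._client
```

With this shape, a 504 during startup produces only a warning, and the service stays available for work that never touches placement.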

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 26.0.0.0rc1

This issue was fixed in the openstack/nova 26.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/yoga)

Related fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/nova/+/858997

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/nova/+/858998

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/xena)

Related fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/858999

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/859000

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/859001

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/859002

Changed in kolla-ansible:
importance: Low → Undecided
status: Triaged → Invalid
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/nova/+/858997
Committed: https://opendev.org/openstack/nova/commit/77273f067d96a4ec401c3b36f2922d63c4ad7103
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 77273f067d96a4ec401c3b36f2922d63c4ad7103
Author: Dan Smith <email address hidden>
Date: Thu Aug 11 09:50:30 2022 -0700

    Unify placement client singleton implementations

    We have many places where we implement singleton behavior for the
    placement client. This unifies them into a single place and
    implementation. Not only does this DRY things up, but may cause us
    to initialize it fewer times and also allows for emitting a common
    set of error messages about expected failures for better
    troubleshooting.

    Change-Id: Iab8a791f64323f996e1d6e6d5a7e7a7c34eb4fb3
    Related-Bug: #1846820
    (cherry picked from commit c178d9360665c219cbcc71c9f37b9e6e3055a5e5)

tags: added: in-stable-yoga
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/nova/+/858998
Committed: https://opendev.org/openstack/nova/commit/19346082058d51c78bb157ca5e1304d15691dd9a
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 19346082058d51c78bb157ca5e1304d15691dd9a
Author: Dan Smith <email address hidden>
Date: Thu Aug 11 10:18:25 2022 -0700

    Avoid n-cond startup abort for keystone failures

    Conductor creates a placement client for the potential case where
    it needs to make a call for certain operations. A transient network
    or keystone failure will currently cause it to abort startup, which
    means it is not available for other unrelated activities, such as
    DB proxying for compute.

    This makes conductor test the placement client on startup, but only
    abort startup on errors that are highly likely to be permanent
    configuration errors, and only warn about things like being unable
    to contact keystone/placement during initialization. If a non-fatal
    error is encountered at startup, later operations needing the
    placement client will retry initialization.

    Conflicts:
        nova/tests/unit/conductor/test_conductor.py

    NOTE(melwitt): The conflict is because change
    Id5b04cf2f6ca24af8e366d23f15cf0e5cac8e1cc
    (Use unittest.mock instead of third party mock) is not in Yoga.

    Closes-Bug: #1846820
    Change-Id: Idb7fcbce0c9562e7b9bd3e80f2a6d4b9bc286830
    (cherry picked from commit 232684b44022f1bc4d72b07045900780de456e63)

no longer affects: kolla-ansible
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 25.1.1

This issue was fixed in the openstack/nova 25.1.1 release.
