nova-api startup does not scan cells looking for minimum nova-compute service version
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | Fix Released | Medium | Dan Smith | |
| Pike | Won't Fix | Medium | Unassigned | |
| Queens | Won't Fix | Medium | Unassigned | |
| Rocky | Fix Committed | Medium | Matt Riedemann | |
Bug Description
This CI job failed devstack setup because nova-api took longer than 60 seconds to start (it took 64 seconds):
http://
Looking at what could be taking time there, it was noticed that this message is logged repeatedly:
Dec 05 20:14:00.919520 ubuntu-
That's coming from here:
Which is when the compute rpcapi client is initialized, which happens when nova.compute.
Which happens for most of the API extensions, e.g.:
So that init and DB query happens num_workers * num_extensions times (we have 2 workers in this case and it looks like there are at least 29 instantiations of the compute API code in the extensions).
The bigger problem is that in this case, nova-api is configured to point at cell0 for its [database] connection:
[database]
connection = mysql+pymysql:
And there will not be nova-compute services in the cell0 database (if configured properly).
So this query is always going to return 0, at least for devstack:
We should really be scanning the cells to get the minimum nova-compute version using this:
But even on the first startup of nova-api, before any computes are started and registered with a cell, that initial query will return 0, which means we won't cache the result and will continue to run the query and log that message extensions * workers times.
So there are really kind of two issues here:
1. We're not iterating cells properly for that version check. This is the more important issue.
2. We're needlessly running this query on initial startup, which slows startup down (and contributes to timeouts in the devstack jobs on slow nodes) and produces a lot of excessive logging.
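Issue 1 amounts to taking the minimum across every cell rather than querying only the cell the API database connection points at. A rough sketch of that shape (the cell dicts and helper below are hypothetical stand-ins for nova's real cell-targeting machinery):

```python
# Hypothetical sketch of scanning all cells for the minimum
# nova-compute service version; not the actual nova implementation.

def min_version_in_cell(cell):
    """Stand-in for querying one cell's database for its minimum
    nova-compute service version (0 means no computes in that cell)."""
    return cell['min_compute_version']


def minimum_version_all_cells(cells):
    versions = []
    for cell in cells:
        v = min_version_in_cell(cell)
        if v > 0:  # skip cells (like cell0) that have no computes
            versions.append(v)
    # 0 only if no cell anywhere has a registered compute yet
    return min(versions) if versions else 0


cells = [
    {'name': 'cell0', 'min_compute_version': 0},
    {'name': 'cell1', 'min_compute_version': 30},
    {'name': 'cell2', 'min_compute_version': 27},
]
```

With this shape, cell0 returning 0 no longer masks the real minimum (27 here), and an all-zero result genuinely means no computes are registered yet.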
Another thing that is probably contributing to the slow nova-api start time is that every nova.compute.api.API constructs a SchedulerReportClient, which grabs an in-memory lock per API worker during init:
Dec 05 20:14:27.694593 ubuntu-xenial-ovh-bhs1-0000959981 <email address hidden>[23459]: DEBUG oslo_concurrency.lockutils [None req-dfdfad07-2ff4-43ed-9f67-2acd59687e0c None None] Lock "placement_client" released by "nova.scheduler.client.report._create_client" :: held 0.006s {{(pid=23462) inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:339}}
We could probably be smarter about this, either by making it a singleton in the API or by only initializing it on first access, since most of the API extensions aren't even going to use that SchedulerReportClient.
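The lazy-singleton idea could look something like the following sketch (the class here is a trivial stand-in for the real, expensive-to-construct client; none of these names come from nova):

```python
# Hypothetical sketch: construct one shared report client per process,
# lazily on first access, instead of one per compute API instantiation.
import threading


class SchedulerReportClient:
    """Stand-in for the real (expensive-to-construct) client."""
    instances = 0

    def __init__(self):
        SchedulerReportClient.instances += 1


_client = None
_client_lock = threading.Lock()


def get_report_client():
    """Return the shared client, creating it only on first access."""
    global _client
    if _client is None:
        with _client_lock:
            # Re-check under the lock so concurrent first callers
            # don't each construct a client.
            if _client is None:
                _client = SchedulerReportClient()
    return _client
```

API code would call `get_report_client()` at the point of use, so extensions that never talk to placement never pay the construction (or lock) cost at startup.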