OpenStack Compute (nova)

Oversubscription broken for instances with NUMA topologies

Bug #1810977 reported by Stephen Finucane on 2019-01-08

Affects                    Status          Importance   Assigned to        Milestone
OpenStack Compute (nova)   Fix Released    Medium       Stephen Finucane
Rocky                      Fix Committed   Medium       Stephen Finucane

Bug Description

As described in [1], the fix to [2] appears to have inadvertently broken oversubscription of memory for instances with a NUMA topology but no hugepages.

Steps to reproduce:

1. Create a flavor that consumes more than 50% of the available memory on your host(s) and specify an explicit NUMA topology. For example, on my all-in-one deployment, where the host has 32GB RAM, we request a 20GB instance:

$ openstack flavor create --vcpus 2 --disk 0 --ram 20480 test.numa
$ openstack flavor set test.numa --property hw:numa_nodes=2

2. Boot an instance using this flavor:

$ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test

3. Boot another instance using this flavor:

$ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test2

# Expected result:

The second instance should boot.

# Actual result:

The second instance fails to boot. The scheduler logs show the following debug messages:

nova-scheduler[18295]: DEBUG nova.virt.hardware [None req-f7a6594b-8d25-424c-9c6e-8522f66ffd22 demo admin] No specific pagesize requested for instance, selected pagesize: 4 {{(pid=18318) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1045}}
nova-scheduler[18295]: DEBUG nova.virt.hardware [None req-f7a6594b-8d25-424c-9c6e-8522f66ffd22 demo admin] Not enough available memory to schedule instance with pagesize 4. Required: 10240, available: 5676, total: 15916. {{(pid=18318) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1055}}

If we revert the patch that addressed that bug [3], the correct behaviour returns and the instance boots. Reverting, however, obviously loses whatever benefits that change gave us.
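
The figures in the second log line above are consistent with the first instance already occupying its share of memory on the host NUMA cell. The following sketch reproduces them. It assumes that the flavor's RAM is split evenly across the two guest NUMA nodes and that nothing else consumes memory on the cell; the function name is illustrative and not part of nova.

```python
def per_cell_request_mb(flavor_ram_mb: int, numa_nodes: int) -> int:
    """Memory each guest NUMA node asks of a host cell (even split assumed)."""
    return flavor_ram_mb // numa_nodes


FLAVOR_RAM_MB = 20480   # --ram 20480
NUMA_NODES = 2          # hw:numa_nodes=2
CELL_TOTAL_MB = 15916   # "total" from the log line

required = per_cell_request_mb(FLAVOR_RAM_MB, NUMA_NODES)
available_after_first = CELL_TOTAL_MB - required

print(required)                           # 10240, matches "Required"
print(available_after_first)              # 5676, matches "available"
print(required <= available_after_first)  # False: second instance is rejected
print(required <= CELL_TOTAL_MB)          # True: it would fit against the total
```

Under these assumptions the request only fails because it is compared with what is left on the cell rather than with the cell's total.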

[1] http://lists.openstack.org/pipermail/openstack-discuss/2019-January/001459.html
[2] https://bugs.launchpad.net/nova/+bug/1734204
[3] https://review.openstack.org/#/c/532168

Tags: libvirt numa scheduler

sean mooney (sean-k-mooney) wrote on 2019-01-08:

Triaged as medium: while this will affect all deployments with ram_allocation_ratio > 1.0
that use NUMA-affined guests without hugepages, the proportion of clouds that it affects is
expected to be low.

For those that are affected, there is no workaround beyond disabling all NUMA-related features if they want to achieve
memory oversubscription.
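
To illustrate why the ratio matters for the reproduction steps in the description, here is a rough host-level calculation. The ratio of 1.5 is an assumption (the report does not state the value in use), the 32GB figure is the nominal host size from the description, and the helper name is hypothetical. Reserved host memory is ignored.

```python
def schedulable_ram_mb(physical_mb: int, ram_allocation_ratio: float) -> float:
    """Memory the scheduler may hand out at host level for a given ratio."""
    return physical_mb * ram_allocation_ratio


HOST_RAM_MB = 32 * 1024        # nominal 32GB host from the description
TWO_INSTANCES_MB = 2 * 20480   # two instances of the test.numa flavor

print(TWO_INSTANCES_MB <= schedulable_ram_mb(HOST_RAM_MB, 1.0))  # False
print(TWO_INSTANCES_MB <= schedulable_ram_mb(HOST_RAM_MB, 1.5))  # True
```

With any ratio of 1.0 or below, the second instance would not fit at host level either, so the regression is only visible to deployments that rely on overcommit.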

Changed in nova:
importance:	Undecided → Medium
status:	New → Confirmed
tags:	added: libvirt numa scheduler

OpenStack Infra (hudson-openstack) wrote on 2019-01-08: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/629281

Changed in nova:
assignee:	nobody → Stephen Finucane (stephenfinucane)
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2019-01-18: Fix merged to nova (master)

Reviewed: https://review.openstack.org/629281
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b24ad3780bc872d1a17907909cd6bcbea7e804b3
Submitter: Zuul
Branch: master

commit b24ad3780bc872d1a17907909cd6bcbea7e804b3
Author: Stephen Finucane <email address hidden>
Date: Tue Jan 8 17:01:41 2019 +0000

Fix overcommit for NUMA-based instances

    Change I5f5c621f2f0fa1bc18ee9a97d17085107a5dee53 modified how we
    evaluated available memory for instances with a NUMA topology.
    Previously, we used a non-pagesize aware check unless the user had
    explicitly requested a specific pagesize. This means that for instances
    without pagesize requests, nova considers hugepages as available memory
    when deciding if a host has enough available memory for the instance.

    The aforementioned change modified this so that all NUMA-based
    instances, whether they had hugepages or not, would use the
    pagesize-aware check. Unfortunately the functionality it was reusing to
    do this was functionality previously only used for hugepages. Hugepages
    cannot be oversubscribed so we did not take oversubscription into
    account, comparing against available memory on the host (i.e. memory not
    consumed by other instances) rather than total memory. This is OK when
    using hugepages but not small pages, where overcommit is OK.

    Given that overcommit is already handled elsewhere in the code, we
    simply modify the non-hugepage code path to check for available memory
    of the lowest pagesize vs. total memory.

    Change-Id: I890b2c81cd49c1c601e9baee6a249709d0f6810e
    Signed-off-by: Stephen Finucane <email address hidden>
    Closes-Bug: #1810977
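
A minimal sketch of the distinction the commit message draws, using hypothetical names rather than nova's actual classes or signatures: the hugepage path compares the request against memory still free on the cell, while the small-page path compares it against the cell's total and leaves overcommit limits to the checks that exist elsewhere.

```python
from dataclasses import dataclass


@dataclass
class HostCell:
    """Simplified stand-in for a host NUMA cell (not nova's object model)."""
    total_mb: int
    used_mb: int

    @property
    def free_mb(self) -> int:
        return self.total_mb - self.used_mb


def cell_can_fit(cell: HostCell, requested_mb: int, hugepages: bool) -> bool:
    """Return True if the request passes the per-cell memory check."""
    if hugepages:
        # Hugepages cannot be oversubscribed: compare against free memory.
        return requested_mb <= cell.free_mb
    # Small pages may be overcommitted: only require that the request fits
    # within the cell's total; the allocation ratio is enforced elsewhere.
    return requested_mb <= cell.total_mb


# Values taken from the log lines in the bug description.
cell = HostCell(total_mb=15916, used_mb=10240)
print(cell_can_fit(cell, 10240, hugepages=True))   # False
print(cell_can_fit(cell, 10240, hugepages=False))  # True
```

Applying the free-memory comparison to small pages as well is what produced the rejection seen in the bug description.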

Changed in nova:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2019-01-25: Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/633197

OpenStack Infra (hudson-openstack) wrote on 2019-03-04: Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/633197
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=780ccfcbdea919b196c18372d1c66bc88b4fa48c
Submitter: Zuul
Branch: stable/rocky

commit 780ccfcbdea919b196c18372d1c66bc88b4fa48c
Author: Stephen Finucane <email address hidden>
Date: Tue Jan 8 17:01:41 2019 +0000

Fix overcommit for NUMA-based instances

    Given that overcommit is already handled elsewhere in the code, we
    simply modify the non-hugepage code path to check for available memory
    of the lowest pagesize vs. total memory.

    Change-Id: I890b2c81cd49c1c601e9baee6a249709d0f6810e
    Signed-off-by: Stephen Finucane <email address hidden>
    Closes-Bug: #1810977
    (cherry picked from commit fd19aeafbce0fa11821b2a064bd694b078613c2f)

OpenStack Infra (hudson-openstack) wrote on 2019-03-22: Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote on 2019-03-24: Fix included in openstack/nova 18.2.0

This issue was fixed in the openstack/nova 18.2.0 release.

OpenStack Infra (hudson-openstack) wrote on 2020-05-11: Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/726868

OpenStack Infra (hudson-openstack) wrote on 2022-11-11: Change abandoned on nova (stable/queens)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/queens
Review: https://review.opendev.org/c/openstack/nova/+/726868
Reason: This branch transitioned to End of Life for this project; open patches need to be closed to be able to delete the branch.
