Root Cause
By default, Nova first fills up NUMA node 0 as long as it still has free pCPUs. This issue occurs when the requested pCPUs still fit into NUMA node 0, but NUMA node 0 does not have enough free hugepages to hold the instance memory. Unfortunately, at the time of this writing, one cannot tell Nova to spawn an instance on a specific NUMA node.
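Once an instance is running, the NUMA node that libvirt actually assigned can be checked from the compute node; a quick sketch (the domain name instance-00000001 is a placeholder):
Raw
[root@overcloud-compute-1 ~]# virsh numatune instance-00000001
[root@overcloud-compute-1 ~]# virsh vcpupin instance-00000001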
Diagnostic Steps
On a hypervisor with 2MB hugepages and 512 free hugepages per NUMA node:
Raw
[root@overcloud-compute-1 ~]# cat /sys/devices/system/node/node*/meminfo | grep -i huge
Node 0 AnonHugePages: 2048 kB
Node 0 HugePages_Total: 1024
Node 0 HugePages_Free: 512
Node 0 HugePages_Surp: 0
Node 1 AnonHugePages: 2048 kB
Node 1 HugePages_Total: 1024
Node 1 HugePages_Free: 512
Node 1 HugePages_Surp: 0
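To translate these counters into megabytes, multiply the free page count by the page size; a one-liner sketch, assuming 2 MB pages:
Raw
[root@overcloud-compute-1 ~]# awk '/HugePages_Free/ {print $1, $2 ": " $4 * 2 " MB free"}' /sys/devices/system/node/node*/meminfo
Node 0: 1024 MB free
Node 1: 1024 MB free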
And with the following NUMA architecture:
Raw
[root@overcloud-compute-1 nova]# lscpu | grep -i NUMA
NUMA node(s): 2
NUMA node0 CPU(s): 0-3
NUMA node1 CPU(s): 4-7
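The same CPU-to-node mapping can be cross-checked with numactl, if it is installed:
Raw
[root@overcloud-compute-1 ~]# numactl --hardware | grep cpus
node 0 cpus: 0 1 2 3
node 1 cpus: 4 5 6 7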
Spawn 3 instances with the following flavor (1 vCPU and 512 MB of memory):
Raw
[stack@undercloud-4 ~]$ nova flavor-show m1.tiny
+----------------------------+-------------------------------------------------------------+
| Property | Value |
+----------------------------+-------------------------------------------------------------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| disk | 8 |
| extra_specs | {"hw:cpu_policy": "dedicated", "hw:mem_page_size": "large"} |
| id | 49debbdb-c12e-4435-97ef-f575990b352f |
| name | m1.tiny |
| os-flavor-access:is_public | True |
| ram | 512 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 1 |
+----------------------------+-------------------------------------------------------------+
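For reference, a flavor with these properties could be built with the openstack client along these lines (a sketch, not the exact commands used here):
Raw
openstack flavor create --vcpus 1 --ram 512 --disk 8 m1.tiny
openstack flavor set m1.tiny --property hw:cpu_policy=dedicated --property hw:mem_page_size=large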
In this case, the new instance boots and uses memory from NUMA node 1:
Raw
[stack@undercloud-4 ~]$ nova list | grep d98772d1-119e-48fa-b1d9-8a68411cba0b
| d98772d1-119e-48fa-b1d9-8a68411cba0b | cirros-test0 | ACTIVE | - | Running | provider1=2000:10::f816:3eff:fe8d:a6ef, 10.0.0.102 |
Raw
[root@overcloud-compute-1 nova]# cat /sys/devices/system/node/node*/meminfo | grep -i huge
Node 0 AnonHugePages: 2048 kB
Node 0 HugePages_Total: 1024
Node 0 HugePages_Free: 0
Node 0 HugePages_Surp: 0
Node 1 AnonHugePages: 2048 kB
Node 1 HugePages_Total: 1024
Node 1 HugePages_Free: 256
Node 1 HugePages_Surp: 0
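The drop from 512 to 256 free pages on NUMA node 1 matches the flavor size: a 512 MB guest backed by 2 MB pages needs 256 of them:
Raw
[stack@undercloud-4 ~]$ echo $((512 / 2))
256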
Spawn the next instance with the same command:
Raw
nova boot --nic net-id=$NETID --image cirros --flavor m1.tiny --key-name id_rsa cirros-test0
The 3rd instance fails to boot:
Raw
[stack@undercloud-4 ~]$ nova list
+--------------------------------------+--------------+--------+------------+-------------+----------------------------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+--------------+--------+------------+-------------+----------------------------------------------------+
| 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc | cirros-test0 | ERROR | - | NOSTATE | |
| a44c43ca-49ad-43c5-b8a1-543ed8ab80ad | cirros-test0 | ACTIVE | - | Running | provider1=2000:10::f816:3eff:fe0f:565b, 10.0.0.105 |
| e21ba401-6161-45e6-8a04-6c45cef4aa3e | cirros-test0 | ACTIVE | - | Running | provider1=2000:10::f816:3eff:fe69:18bd, 10.0.0.111 |
+--------------------------------------+--------------+--------+------------+-------------+----------------------------------------------------+
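The scheduler's reason for the ERROR state can be pulled from the instance's fault field; for example:
Raw
[stack@undercloud-4 ~]$ nova show 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc | grep fault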
From the compute node, we can see that the free hugepages on NUMA node 0 are exhausted, whereas NUMA node 1 still has 512 free hugepages (1024 MB), more than enough for another 512 MB instance:
Raw
[root@overcloud-compute-1 qemu]# cat /sys/devices/system/node/node*/meminfo | grep -i huge
Node 0 AnonHugePages: 2048 kB
Node 0 HugePages_Total: 1024
Node 0 HugePages_Free: 0
Node 0 HugePages_Surp: 0
Node 1 AnonHugePages: 2048 kB
Node 1 HugePages_Total: 1024
Node 1 HugePages_Free: 512
Node 1 HugePages_Surp: 0
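libvirt's view should agree with meminfo; the per-node free hugepages can be cross-checked with virsh (available in reasonably recent libvirt versions):
Raw
[root@overcloud-compute-1 ~]# virsh freepages --all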
/var/log/nova/nova-compute.log contains the libvirt domain XML for the failed instance (abridged below) and reveals that the instance CPU was to be pinned to NUMA node 0:
Raw
<name>instance-00000006</name>
<uuid>1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc</uuid>
<metadata>
  <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
    <nova:name>cirros-test0</nova:name>
    <nova:creationTime>2017-11-23 19:53:00</nova:creationTime>
    <nova:flavor name="m1.tiny">
      <nova:memory>512</nova:memory>
      <nova:disk>8</nova:disk>
      <nova:swap>0</nova:swap>
      <nova:ephemeral>0</nova:ephemeral>
      <nova:vcpus>1</nova:vcpus>
    </nova:flavor>
    <nova:owner>
      <nova:user>admin</nova:user>
      <nova:project>admin</nova:project>
    </nova:owner>
  </nova:instance>
</metadata>
<memory unit='KiB'>524288</memory>
<currentMemory unit='KiB'>524288</currentMemory>
<vcpu placement='static'>1</vcpu>
<cputune>
  <shares>1024</shares>
</cputune>
<numatune>
  <memory mode='strict' nodeset='0'/>
</numatune>
In the above, also note the nodeset='0' in the numatune section, which indicates that memory is to be claimed from NUMA node 0.
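When searching a long nova-compute.log for this information, grepping for the numatune element directly can help; for example:
Raw
[root@overcloud-compute-1 ~]# grep -A 2 '<numatune>' /var/log/nova/nova-compute.log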