qemu-system-amd64 max cpus is too low for latest processors

Bug #2012763 reported by Jeff Lane 
Affects          Status      Importance   Assigned to             Milestone
qemu (Ubuntu)    Status tracked in Mantic
  Jammy          New         Undecided    Sergio Durigan Junior
  Lunar          New         Undecided    Sergio Durigan Junior
  Mantic         Confirmed   Critical     Sergio Durigan Junior

Bug Description

During testing of an AMD Genoa CPU, it was discovered that qemu-system-amd64 doesn't support enough cpus.

The specific error the tester received was:

qemu-system-x86_64: Invalid SMP CPUs 384. The max supported by machine 'pc-q35-7.1' is 288

Looking at the source, that seems to be an easy fix at first glance:

https://github.com/qemu/qemu/blob/master/hw/i386/pc_q35.c
machine_class_allow_dynamic_sysbus_dev(m, TYPE_VMBUS_BRIDGE);
m->max_cpus = 288;
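
The effect of that limit can be sketched as follows. This is a minimal Python illustration, not QEMU's actual C code: the function name `check_smp` is hypothetical, and the message format is copied from the error quoted above.

```python
def check_smp(requested_max_cpus: int, machine: str, machine_limit: int) -> None:
    """Raise ValueError if the requested CPU count exceeds the machine type's limit."""
    if requested_max_cpus > machine_limit:
        raise ValueError(
            f"Invalid SMP CPUs {requested_max_cpus}. "
            f"The max supported by machine '{machine}' is {machine_limit}"
        )


# 288 is the max_cpus value quoted from pc_q35.c above.
try:
    check_smp(384, "pc-q35-7.1", 288)
except ValueError as err:
    message = str(err)
    print(message)
```

A request at or below the limit, such as `check_smp(288, "pc-q35-7.1", 288)`, passes without an error in this sketch.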

Tags: server-todo

Christian Ehrhardt  (paelzer) wrote :

Hi Jeff,
thanks for the request, that is a known limit that is being worked on by various upstream projects.

The limit of 288 [1] was chosen deliberately: it reflected the limits of testing at the time and the limits of xAPIC [2].

Recently, around kernel 5.15 (Jammy and later), the limit was lifted on the kernel side [3][4], but that is only the first step.

You also need other components to be ready, like the smbios 3.0 entry point which is in seabios 1.16 (Kinetic and later) and edk2 (there it is rather old and should be ok for longer).

The work and discussions in QEMU are ongoing, as you can see in [5], but they haven't completed or landed yet; it is work in progress that has to complete and stabilize. As you can see there, it would be a post-7.2 change anyway.

There are more things in the stack which might need patching e.g. in libvirt or even higher parts, I haven't checked those yet - but overall this isn't a "change a number and done" change :-/

I hope that the upstream projects can continue their great work and complete it all, but right now despite looking like a simple number there is not enough confidence for all the implications yet to just bump up that number.

[1]: https://gitlab.com/qemu-project/qemu/-/commit/00d0f9fd6602a27b204f672ef5bc8e69736c7ff1
[2]: https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02266.html
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=074c82c8f7cf8a46c3b81965f122599e3a133450
[4]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=da1bfd52b930726288d58f066bd668df9ce15260
[5]: https://<email address hidden>/

Changed in qemu (Ubuntu):
importance: Undecided → Wishlist
status: New → Confirmed
Jeff Lane  (bladernr) wrote :

Thanks Christian. The tester reporting it was from one of the OEM labs during cert testing on the newer CPUs... I don't think this is really any sort of show-stopper, just one of those things noticed in the output that looked concerning to them (They report in anything that looks out of the ordinary).

So in the context of the details you provided I think it's safe on our end then to just know it's going to be a limitation and then wait for the various bits to update naturally.

Jeff Lane  (bladernr) wrote :

This causes QEMU to be unusable on systems with more than 288 cores, notably recent AMD CPUs, and is affecting certifications.

Christian Ehrhardt  (paelzer) wrote :

Hmm,
"unusable" - really. Isn't it just limiting you to have each guest at max 288 vcpus?
Or did I miss that, due to that, it won't work to create any guest at all?

Rod Smith (rodsmith) wrote :

Our test is failing to run, not simply running with fewer than the requested number of cores. From the test output (which includes test script output and formatting, not just QEMU output):

DEBUG:root:Start VM:
ERROR:root:Command lxc start testbed returned a code of 1
ERROR:root: STDOUT:
ERROR:root: STDERR: Error: Failed to run: forklimits limit=memlock:unlimited:unlimited fd=3 -- /snap/lxd/24322/bin/qemu-system-x86_64 -S -name testbed -uuid e149a6e6-ce67-4b5b-ab56-94c740521c0e -daemonize -cpu host,hv_passthrough -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/testbed/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/testbed/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/testbed/qemu.pid -D /var/snap/lxd/common/lxd/logs/testbed/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd: : Process exited with non-zero value 1
Try `lxc info --show-log testbed` for more info

I know that may not be the error logs or output you need to fully debug this, but it's what I have on hand. (The system in question belongs to a Canonical partner.) We can work to produce more logs or output, but it would be helpful to know what you need.

Jeff Lane  (bladernr) wrote :

Also, I did some digging. We use LXD to kick off KVMs, and this exists in the LXD docs:
https://linuxcontainers.org/lxd/docs/stable-4.0/instances/

limits.cpu string - yes - Number or range of CPUs to expose to the instance (defaults to 1 CPU for VMs)

I had hoped that the issue was that kicking off that single VM was somehow going crazy and attaching to every CPU core.

BUT it looks like LXD defaults to 1 CPU for VMs, meaning it's not coming anywhere close to that limit of 288. If that's the case, that means QEMU itself is unusable on these new high-core-count CPUs.

We can try to explicitly use limits.cpu with LXD, but if that doesn't work, we need some help sorting out exactly what's happening here and how to work around it.

Jeff Lane  (bladernr) wrote :

I launched a VM via LXC using qemu and verified that it does only create / attach a single CPU core:
root@maximum-porpoise:~# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 165
Model name: Intel(R) Core(TM) i9-10900F CPU @ 2.80GHz

Note my machine has a single 10 core CPU with HT enabled:
Architecture: x86_64
  CPU op-mode(s): 32-bit, 64-bit
  Address sizes: 39 bits physical, 48 bits virtual
  Byte Order: Little Endian
CPU(s): 20
  On-line CPU(s) list: 0-19
Vendor ID: GenuineIntel
  Model name: Intel(R) Core(TM) i9-10900F CPU @ 2.80GHz
    CPU family: 6
    Model: 165
    Thread(s) per core: 2
    Core(s) per socket: 10

So I do suspect that qemu itself simply fails on systems with more than 288 cores regardless of the config of the VM...

The servers that are failing have dual AMD EPYC 9654 96-Core Processors, which, with hyperthreading enabled, present 384 logical CPUs to the system.

I've gone back and asked them to disable hyperthreading to get the CPU count down to 192 cores to see if qemu works then or not... if the same test succeeds with that config, I think that would certainly confirm the issue.
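
The arithmetic behind those two counts can be written out as a small sketch. The helper `logical_cpus` is a hypothetical name used only for illustration, and 288 is the limit from the error message in this report.

```python
def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Number of logical CPUs the host presents to the operating system."""
    return sockets * cores_per_socket * threads_per_core


Q35_MAX_CPUS = 288  # limit quoted in the error message of this report

smt_on = logical_cpus(sockets=2, cores_per_socket=96, threads_per_core=2)
smt_off = logical_cpus(sockets=2, cores_per_socket=96, threads_per_core=1)

# With hyperthreading the host count is above the limit, without it below.
print(smt_on, smt_on > Q35_MAX_CPUS)
print(smt_off, smt_off > Q35_MAX_CPUS)
```

If the failure really depends on the host CPU count, the second configuration should start guests while the first does not.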

Jeff Lane  (bladernr) wrote :

So just to update/reconfirm something, qemu-system-amd64 fails on systems with more than 288 cores, regardless of how you've configured the KVM Guest.

We have had them test both the default (which defaults to 1 vCPU), and by explicitly setting the config to a single vCPU. We have NEVER launched a KVM guest that was handed more than 1 CPU core, as we have always used the default config for simplicity.

Currently, this causes certification tests on systems with high end AMD CPUs to fail, as those have far more than 288 cores.

We do not have a system currently in house to test this with, but we can get our OEM partner to test patched versions of packages that address this. I have also raised this directly with AMD.

Christian Ehrhardt  (paelzer) wrote :

Oh wow, sorry, but I hadn't read that between the lines of the report.
I expected that to only block extra large guests which is where we would have waited for upstream.

Indeed guests up to the size limit should work (almost) no matter how many CPUs the system has.
Could you please work with Sergio (assigned now) to give him access to the system, so that he can have a look and potentially debug on the real thing?

tags: added: server-todo
Changed in qemu (Ubuntu):
importance: Wishlist → Critical
assignee: nobody → Sergio Durigan Junior (sergiodj)
Christian Ehrhardt  (paelzer) wrote :

> So just to update/reconfirm something, qemu-system-amd64 fails on systems with more than
> 288 cores, regardless of how you've configured the KVM Guest.

This really should be a guest size limit, I wonder if the system is picking up any default like "but it could be 384 via hotplugging" that one needs to configure.

@Jeff
Could you - in preparation - please provide the simplest libvirt XML or qemu command line that you expect to work but that fails when the host CPU count is >288?

> I launched a VM via LXC using qemu and verified that it does only create / attach a single CPU core

They also just use qemu, so that shouldn't be different...
Have you done that test
a) on a different system to check how many CPUs it configures by default?
b) on the 384 cpu system and you are saying "it works with the LXD snaps qemu, but not with the qemu in the Archive"?

If it was (a), that test isn't sufficient, as qemu has the concept of current and max CPUs (the latter available for hot-plug), and the limit counts against maxcpus.

Christian Ehrhardt  (paelzer) wrote :

I checked LXD myself on my laptop

$ lxc launch ubuntu-minimal-daily:j j-vm --ephemeral --vm
$ lxc exec j-vm lscpu | grep '^CPU(s):'
CPU(s): 1
=> Yes it is one by default, but it just doesn't give any arguments at all
$ ps axlf | grep qemu | grep j-vm
7 999 2014958 1 20 0 1776840 480184 - Sl ? 0:33 /snap/lxd/24918/bin/qemu-system-x86_64 -S -name j-vm -uuid 6e58b1c8-9484-4131-b4f4-d61e32556d28 -daemonize -cpu host,hv_passthrough -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/j-vm/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/j-vm/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/j-vm/qemu.pid -D /var/snap/lxd/common/lxd/logs/j-vm/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd

And at first it looks like LXD does limit things via cpusets only
https://linuxcontainers.org/lxd/docs/stable-4.0/instances/#cpu-limits

Even with that set explicitly it behaves the same:

$ lxc launch ubuntu-minimal-daily:j j-vm --ephemeral --vm -c limits.cpu=1
Creating j-vm
Starting j-vm
$ lxc exec j-vm lscpu | grep '^CPU(s):'
CPU(s): 1
$ ps axlf | grep qemu | grep j-vm
7 999 2033243 1 20 0 1777348 477060 - Sl ? 0:12 /snap/lxd/24918/bin/qemu-system-x86_64 -S -name j-vm -uuid 4c469ad8-136e-422a-9366-3503f072cddd -daemonize -cpu host,hv_passthrough -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/j-vm/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/j-vm/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/j-vm/qemu.pid -D /var/snap/lxd/common/lxd/logs/j-vm/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd

$ lxc launch ubuntu-minimal-daily:j j-vm --ephemeral --vm -c limits.cpu=2
Creating j-vm
Starting j-vm
$ lxc exec j-vm lscpu | grep '^CPU(s):'
CPU(s): 2
$ ps axlf | grep qemu | grep j-vm
7 999 2036838 1 20 0 1984268 481300 - Sl ? 0:15 /snap/lxd/24918/bin/qemu-system-x86_64 -S -name j-vm -uuid 73ed3b5b-c1f9-4d8f-bed3-dc763a4329e2 -daemonize -cpu host,hv_passthrough -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/j-vm/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/j-vm/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/j-vm/qemu.pid -D /var/snap/lxd/common/lxd/logs/j-vm/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
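
The command lines above carry no -smp option, so any CPU settings would have to come from the file passed via -readconfig. A minimal sketch for inspecting such a file follows. It assumes the file has an INI-like `[smp-opts]` section with quoted values; the sample text and its numbers are hypothetical, and the real qemu.conf generated by LXD may look different.

```python
import configparser

# Hypothetical excerpt; not copied from a real LXD-generated qemu.conf.
SAMPLE_CONF = '''
[smp-opts]
cpus = "1"
maxcpus = "384"
'''


def read_smp_opts(text: str) -> dict:
    """Return the integer settings of the [smp-opts] section, or {} if absent."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    if not parser.has_section("smp-opts"):
        return {}
    # Values are quoted in this format, so strip the quotes before converting.
    return {key: int(value.strip('"')) for key, value in parser.items("smp-opts")}


smp = read_smp_opts(SAMPLE_CONF)
print(smp)
```

On a real system the text would be read from the path shown after -readconfig instead of the sample string.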

Christian Ehrhardt  (paelzer) wrote :

For a start to rule out a real bug...
And to rule out any other smartness let us start a very very small qemu that does almost nothing. Does the following stumble over the 384 cpu error as well?

$ sudo qemu-system-x86_64 -smp cpus=1,maxcpus=1 -enable-kvm -net none -m 512M -nographic -kernel /boot/vmlinuz -initrd /boot/initrd.img -chardev stdio,mux=on,id=char0 -mon chardev=char0,mode=readline -serial chardev:char0 -append "console=ttyS0"

That will load a kernel from your host disk; after the kernel loads it will fail for lack of a root disk, but that is fine. This way we would quickly know if really "everything fails" (bug) or if there might be just an argument needed in the way you spawn guests (configuration).
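
To tell the two cases apart from captured output, the stderr of such a run could be checked for the SMP limit error. The sketch below is only an illustration: the pattern is derived from the single error line quoted at the top of this report, and the function name `classify` is hypothetical.

```python
import re

# Pattern based on the error message quoted in the bug description.
SMP_ERROR = re.compile(
    r"Invalid SMP CPUs (?P<requested>\d+)\. "
    r"The max supported by machine '(?P<machine>[^']+)' is (?P<limit>\d+)"
)


def classify(stderr: str) -> str:
    """Report whether the captured stderr contains the SMP limit error."""
    match = SMP_ERROR.search(stderr)
    if match is None:
        return "no SMP limit error"
    return (
        f"SMP limit hit: requested {match['requested']}, "
        f"machine {match['machine']} allows {match['limit']}"
    )


sample = (
    "qemu-system-x86_64: Invalid SMP CPUs 384. "
    "The max supported by machine 'pc-q35-7.1' is 288"
)
print(classify(sample))
```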

Please let us know if this works.

Christian Ehrhardt  (paelzer) wrote :

AFAIC - If you insist/depend on LXD - you need to go all-in and use raw.qemu to add command-line parameters, bypassing LXD's intentionally opinionated use:

$ lxc launch ubuntu-minimal-daily:j j-vm --ephemeral --vm -c raw.qemu="-smp cpus=1,maxcpus=1"
Creating j-vm
Starting j-vm
$ ps axlf | grep qemu | grep j-vm | grep smp
7 999 2043048 1 20 0 1777460 323388 - Sl ? 0:07 /snap/lxd/24918/bin/qemu-system-x86_64 -S -name j-vm -uuid 9346be46-67fa-4931-ba2d-529cbc268190 -daemonize -cpu host,hv_passthrough -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/j-vm/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/j-vm/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/j-vm/qemu.pid -D /var/snap/lxd/common/lxd/logs/j-vm/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd -smp cpus=1,maxcpus=1

P.S. If only cpus is set, maxcpus takes the same value, and if nothing else is given, a bare number is taken as cpus. So I know that raw.qemu="-smp 1" does the same, but I wanted to be explicit while debugging here.
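
The two defaulting rules from the P.S. can be sketched like this. It is a simplified model that covers only the cpus and maxcpus keys described here; the function name `parse_smp` is hypothetical, and QEMU's real -smp parsing also handles topology keys that are not modelled.

```python
def parse_smp(option: str) -> dict:
    """Parse a simplified -smp value into cpus and maxcpus."""
    settings = {}
    for part in option.split(","):
        key, sep, value = part.partition("=")
        if not sep:
            # A bare number such as "1" is taken as cpus.
            key, value = "cpus", key
        settings[key] = int(value)
    if "cpus" not in settings:
        raise ValueError("this sketch requires cpus to be given")
    # If maxcpus is not given, it takes the same value as cpus.
    settings.setdefault("maxcpus", settings["cpus"])
    return settings


print(parse_smp("1"))
print(parse_smp("cpus=1,maxcpus=1"))
```

Both calls yield the same settings, which matches the statement that "-smp 1" and "-smp cpus=1,maxcpus=1" behave alike.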

Jeff Lane  (bladernr) wrote :

There seems to have been some movement on this upstream:

https://lore.kernel<email address hidden>/T/#m4f61669a283a87623e4b8ce484e65c1bbaa76935

The exact commands we use typically are:
lxc init ubuntu:22.04 testbed --vm
# lxc config set testbed limits.cpu 1
lxc start testbed

and assume defaults on everything. (The commented config line was added later as an experiment.)

I don't have direct access to a system with that many cores, but I'll ask them to try all your suggestions and update the bug with results.

Mark Coskey (mcoskey) wrote :

On our XD225v AMD server with 2P 9754 Bergamo 128c (512 vcpus) on Ubuntu 22.04.2LTS, I ran the command from comment #12, see attached output comment12.txt.

Jeff Lane  (bladernr) wrote :

So this was apparently fixed in qemu 8.1.0:

commit e0001297eb2f8569e950e55dbda8ad686e4155fb
Author: Suravee Suthikulpanit <email address hidden>
Date: Wed Jun 7 15:57:17 2023 -0500

    pc: q35: Bump max_cpus to 1024

    Since KVM_MAX_VCPUS is currently defined to 1024 for x86 as shown in
    arch/x86/include/asm/kvm_host.h, update QEMU limits to the same number.

    In case KVM could not support the specified number of vcpus, QEMU would
    return the following error message:

      qemu-system-x86_64: kvm_init_vcpu: kvm_get_vcpu failed (xxx): Invalid argument

    Also, keep max_cpus at 288 for machine version 8.0 and older.

    Cc: Igor Mammedov <email address hidden>
    Cc: Daniel P. Berrangé <email address hidden>
    Cc: Michael S. Tsirkin <email address hidden>
    Cc: Julia Suvorova <email address hidden>
    Reviewed-by: Igor Mammedov <email address hidden>
    Signed-off-by: Suravee Suthikulpanit <email address hidden>
    Message-Id: <email address hidden>
    Reviewed-by: Michael S. Tsirkin <email address hidden>
    Signed-off-by: Michael S. Tsirkin <email address hidden>
    Reviewed-by: Daniel P. Berrangé <email address hidden>

$ git tag --contains e0001297eb2
v8.1.0
v8.1.0-rc0
v8.1.0-rc1
v8.1.0-rc2
v8.1.0-rc3
v8.1.0-rc4
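
The versioning rule in that commit message (1024 for new q35 machine types, 288 kept for machine version 8.0 and older) can be illustrated with a short sketch. The function name `q35_max_cpus` is hypothetical, and the name pattern only covers upstream versioned q35 machine types such as the 'pc-q35-7.1' seen in the error message.

```python
import re


def q35_max_cpus(machine: str) -> int:
    """Return the CPU limit the commit message describes for a q35 machine type."""
    match = re.fullmatch(r"pc-q35-(\d+)\.(\d+)", machine)
    if match is None:
        raise ValueError(f"not a versioned upstream q35 machine type: {machine}")
    version = (int(match.group(1)), int(match.group(2)))
    # Machine version 8.0 and older keep 288; newer ones get 1024.
    return 1024 if version > (8, 0) else 288


for name in ("pc-q35-7.1", "pc-q35-8.0", "pc-q35-8.1"):
    print(name, q35_max_cpus(name))
```

This means a guest pinned to an older machine type keeps the old limit even on a newer QEMU.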

Looking at rmadison, mantic only has 8.0.4:

 qemu | 1:8.0.4+dfsg-1ubuntu1 | mantic | source

Would it be possible to:

A: get mantic bumped to 8.1.0
B: work on getting this back to Jammy to unblock 22.04 certs? (For now we are just accepting failed VM tests, because these larger CPUs have no support in Jammy due to the qemu-system-x86_64 max_cpus limitation.)

Sergio Durigan Junior (sergiodj) wrote :

Thanks for the update.

It's not possible to bump QEMU to 8.1.0 on Mantic anymore (we're already on Feature Freeze), but it is possible to backport the patch above. It's also possible to backport this patch to Jammy as part of an SRU.

I'm assigning the bug to myself, but I'll likely only have time to work on this bug next week. Also, it's possible that I'll need your help to test the fix.

Thanks.

Changed in qemu (Ubuntu Jammy):
assignee: nobody → Sergio Durigan Junior (sergiodj)
Changed in qemu (Ubuntu Lunar):
assignee: nobody → Sergio Durigan Junior (sergiodj)
Sergio Durigan Junior (sergiodj) wrote :

Jeff et al,

I worked to create new machine types for Jammy which support up to 1024 CPUs, which is exactly what the upstream patch pointed to by Jeff does. We decided to implement this via new machine types because, as Christian said, it is not entirely clear what kind of side effects this (apparently simple) setting can have, and also (perhaps most importantly) because it is much easier to justify SRUing such a change if it's as contained as possible.

You can find a PPA with the proposed change here:

https://launchpad.net/~sergiodj/+archive/ubuntu/qemu

The qemu version is 1:6.2+dfsg-2ubuntu6.16~ppa2. The new machine types are named:

pc-i440fx-jammy-maxcpus Ubuntu 22.04 PC (i440FX + PIIX, maxcpus=1024, 1996)
pc-i440fx-jammy-hpb-maxcpus Ubuntu 22.04 PC (i440FX + PIIX +host-phys-bits=true, maxcpus=1024, 1996)
pc-q35-jammy-maxcpus Ubuntu 22.04 PC (Q35 + ICH9, maxcpus=1024, 2009)
pc-q35-jammy-hpb-maxcpus Ubuntu 22.04 PC (Q35 + ICH9, +host-phys-bits=true, maxcpus=1024, 2009)
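
For anyone scripting a test run, selecting one of these four names could look like the sketch below. The machine type names are copied from the list above and exist only in the PPA build; the lookup table and the helper `machine_args` are hypothetical conveniences, and the result is just the `-machine` argument pair for a qemu command line.

```python
# Names copied from the list above; keys are (chipset, host-phys-bits variant).
MAXCPUS_MACHINE_TYPES = {
    ("i440fx", False): "pc-i440fx-jammy-maxcpus",
    ("i440fx", True): "pc-i440fx-jammy-hpb-maxcpus",
    ("q35", False): "pc-q35-jammy-maxcpus",
    ("q35", True): "pc-q35-jammy-hpb-maxcpus",
}


def machine_args(chipset: str, host_phys_bits: bool = False) -> list:
    """Build the -machine arguments for one of the maxcpus machine types."""
    return ["-machine", MAXCPUS_MACHINE_TYPES[(chipset, host_phys_bits)]]


print(machine_args("q35"))
print(machine_args("i440fx", host_phys_bits=True))
```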

Would it be possible for you to give this a try and let me know if it works? I still don't have access to a machine with that number of CPUs, so the amount of testing I can do is limited.

Thanks.
