cpu features hle and rtm disabled for security are present in /usr/share/libvirt/cpu_map.xml
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
libvirt (Ubuntu) | Fix Released | High | Christian Ehrhardt |
Bionic | Confirmed | Undecided | Ubuntu Security Team |
Eoan | Won't Fix | Undecided | Ubuntu Security Team |
qemu (Ubuntu) | Fix Released | Undecided | Unassigned |
Bionic | Confirmed | Undecided | Ubuntu Security Team |
Eoan | Won't Fix | Undecided | Ubuntu Security Team |
Bug Description
When trying to launch an instance in OpenStack Queens on Ubuntu 18.04 with the new kernels, this error happens:
Error: Failed to perform requested operation on instance "david", the instance has an error status: Please try again later [Error: Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance bf8dc8b8-
This seems to be caused by the new kernels disabling the tsx cpu feature as per https:/
Disabling tsx also disables hle and rtm, and /usr/share/
ubuntu@cloud3:~$ grep -e "model name" -e hle -e rtm -e tsx
[...]
<model name='Haswell'>
<feature name='hle'/>
<feature name='rtm'/>
<model name='Haswell-
<feature name='hle'/>
<feature name='rtm'/>
[...]
<model name='Broadwell'>
<feature name='hle'/>
<feature name='rtm'/>
<model name='Broadwell
<feature name='hle'/>
<feature name='rtm'/>
<model name='Skylake-
<feature name='hle'/>
<feature name='rtm'/>
<model name='Skylake-
<feature name='hle'/>
<feature name='rtm'/>
<model name='Skylake-
<feature name='hle'/>
<feature name='rtm'/>
<model name='Skylake-
<feature name='hle'/>
<feature name='rtm'/>
[...]
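The grep output above can also be gathered programmatically. The following is a minimal sketch, not part of the original report: it assumes the cpu_map file is XML in which each `<model name='...'>` element lists its own `<feature name='...'/>` children, and it does not resolve features inherited through nested model references. The sample document and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Features that disappear from the host when the kernel disables TSX.
TSX_FEATURES = {"hle", "rtm"}

# Hypothetical, heavily shortened stand-in for the real cpu_map file.
SAMPLE_CPU_MAP = """
<cpus>
  <arch name='x86'>
    <model name='Haswell-noTSX'>
      <feature name='avx2'/>
    </model>
    <model name='Haswell'>
      <feature name='avx2'/>
      <feature name='hle'/>
      <feature name='rtm'/>
    </model>
  </arch>
</cpus>
"""


def models_with_tsx(cpu_map_xml):
    """Map each model name to the TSX-related features it lists directly."""
    root = ET.fromstring(cpu_map_xml)
    result = {}
    for model in root.iter("model"):
        name = model.get("name")
        if name is None:
            continue
        found = sorted(
            feature.get("name")
            for feature in model.findall("feature")
            if feature.get("name") in TSX_FEATURES
        )
        if found:
            result[name] = found
    return result


print(models_with_tsx(SAMPLE_CPU_MAP))
```

To run it against a real installation, read the file named in the bug title (/usr/share/libvirt/cpu_map.xml) and pass its contents to `models_with_tsx`.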
This only happens when configuring cpu_mode and cpu_model in /etc/nova/
[libvirt]
cpu_mode = custom
cpu_model = Skylake-Server-IBRS
In my case, this was done by setting the cpu-mode and cpu-model nova-compute charm options.
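To confirm that a given host no longer exposes the two features, one option is to inspect the `flags` line of /proc/cpuinfo. This is a sketch under the assumption that the kernel reports the features there under the names `hle` and `rtm`; the function name is hypothetical.

```python
def missing_host_features(cpuinfo_text, wanted=("hle", "rtm")):
    """Return the wanted CPU flags that the first 'flags' line does not list."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in wanted if flag not in present]
    # No 'flags' line at all: treat every wanted flag as missing.
    return list(wanted)
```

On a compute node, passing the contents of /proc/cpuinfo to this function and getting a non-empty list back would suggest that a custom CPU model requiring those features cannot be provided.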
[Additional info]
I see this issue with the following kernel and libvirt versions:
Linux cloud3 4.15.0-70-generic #79-Ubuntu SMP Tue Nov 12 10:36:11 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@cloud3:~$ dpkg -l | grep -e libvirt -e nova
ii libvirt-clients 4.0.0-1ubuntu8.13 amd64 Programs for the libvirt library
ii libvirt-daemon 4.0.0-1ubuntu8.13 amd64 Virtualization daemon
ii libvirt-
ii libvirt-
ii libvirt0:amd64 4.0.0-1ubuntu8.13 amd64 library for interfacing with different virtualization systems
ii nova-common 2:17.0.11-0ubuntu1 all OpenStack Compute - common files
ii nova-compute 2:17.0.11-0ubuntu1 all OpenStack Compute - compute node base
ii nova-compute-kvm 2:17.0.11-0ubuntu1 all OpenStack Compute - compute node (KVM)
ii nova-compute-
ii python-libvirt 4.0.0-1 amd64 libvirt Python bindings
ii python-nova 2:17.0.11-0ubuntu1 all OpenStack Compute Python libraries
ii python-novaclient 2:9.1.1-0ubuntu1 all client library for OpenStack Compute API - Python 2.7
ubuntu@cloud3:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.3 LTS
Release: 18.04
Codename: bionic
[Workaround]
A workaround is to remove the cpu_mode and cpu_model lines in the libvirt section of /etc/nova/
This can be done with juju like this:
juju config nova-compute-kvm --reset cpu-model
juju config nova-compute-kvm --reset cpu-mode
Apparently another workaround would be to re-enable the tsx cpu feature on the host with tsx=yes on the boot command line, but I have not tested that workaround.
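For hosts where the nova configuration is edited by hand rather than through the charm, the first workaround amounts to dropping two options from the `[libvirt]` section. The sketch below is an illustration only: it assumes a simple INI layout with one `key = value` per line, the function name is hypothetical, and on charm-managed hosts the juju commands above remain the appropriate tool since a manual edit may be overwritten.

```python
def drop_cpu_overrides(conf_text, section="libvirt",
                       keys=("cpu_mode", "cpu_model")):
    """Return conf_text without the given keys in the given section."""
    kept = []
    current = None
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Section header: remember which section we are in.
            current = stripped[1:-1].strip()
        elif current == section and not stripped.startswith("#"):
            key = stripped.split("=", 1)[0].strip()
            if "=" in stripped and key in keys:
                continue  # drop this option
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Options with the same names in other sections are left untouched, as are comments.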
Related branches
- Rafael David Tinoco (community): Approve
- Canonical Server: Pending requested
- Canonical Server packageset reviewers: Pending requested
Diff: 5704 lines (+5514/-0), 31 files modified
debian/changelog (+13/-0)
debian/patches/series (+29/-0)
debian/patches/stable/lp-1868539-bhyve-command-remove-unused-includes.patch (+41/-0)
debian/patches/stable/lp-1868539-daemon-set-default-memlock-limit-for-systemd-service.patch (+94/-0)
debian/patches/stable/lp-1868539-m4-libxl-properly-fail-when-libxl-is-required.patch (+47/-0)
debian/patches/stable/lp-1868539-qemu-Don-t-compare-local-and-remote-hostnames-on-mig.patch (+62/-0)
debian/patches/stable/lp-1868539-qemu-Stop-domain-on-failed-restore.patch (+104/-0)
debian/patches/stable/lp-1868539-qemu-Use-g_autoptr-for-qemuDomainSaveCookie.patch (+140/-0)
debian/patches/stable/lp-1868539-qemu-do-not-revert-to-NULL-bandwidth.patch (+45/-0)
debian/patches/stable/lp-1868539-qemu-preserve-error-on-bandwidth-rollback.patch (+59/-0)
debian/patches/stable/lp-1868539-qemu-save-restore-original-error-when-recovering-fro.patch (+60/-0)
debian/patches/stable/lp-1868539-qemu-use-correct-backendType-when-checking-memfd-cap.patch (+46/-0)
debian/patches/stable/lp-1868539-qemuDomainGetStatsIOThread-Don-t-leak-array-with-0-i.patch (+49/-0)
debian/patches/stable/lp-1868539-qemuDomainSaveImageStartVM-Use-VIR_AUTOCLOSE-for-int.patch (+50/-0)
debian/patches/stable/lp-1868539-qemuDomainSaveImageStartVM-Use-g_autoptr-for-virComm.patch (+40/-0)
debian/patches/stable/lp-1868539-qemuTestParseCapabilitiesArch-Free-binary.patch (+52/-0)
debian/patches/stable/lp-1868539-security-Try-harder-to-run-transactions.patch (+97/-0)
debian/patches/stable/lp-1868539-tests-fix-double-unlock-of-monitor-in-hotplug-test.patch (+64/-0)
debian/patches/stable/lp-1868539-testutils-check-return-value-of-g_setenv.patch (+39/-0)
debian/patches/stable/lp-1868539-testutilsxen-error-out-on-initialization-failure.patch (+42/-0)
debian/patches/stable/lp-1868539-virDomainFSDefFree-Unref-private-data.patch (+52/-0)
debian/patches/stable/lp-1868539-virsystemdtest-do-not-leak-socket-path.patch (+55/-0)
debian/patches/stable/lp-1868539-vz-Fix-return-value-in-error-path.patch (+49/-0)
debian/patches/ubuntu/lp-1853200-cpu_map-Add-decode-element-to-x86-CPU-model-definiti.patch (+741/-0)
debian/patches/ubuntu/lp-1853200-cpu_map-Add-more-noTSX-x86-CPU-models.patch (+695/-0)
debian/patches/ubuntu/lp-1853200-cpu_map-Don-t-use-new-noTSX-models-for-host-model-CP.patch (+129/-0)
debian/patches/ubuntu/lp-1853200-cpu_x86-Honor-CPU-models-decode-element.patch (+59/-0)
debian/patches/ubuntu/lp-1853200-cputest-Add-data-for-Intel-R-Core-TM-i7-8550U-CPU-wi.patch (+2022/-0)
debian/patches/ubuntu/lp-1867460-qemu-fixing-auto-detecting-binary-in-domain-capabili.patch (+115/-0)
debian/patches/ubuntu/lp-1867460-qemu_capabilities-Rework-domain-caps-cache.patch (+325/-0)
debian/patches/ubuntu/lp-1868528-util-virhostcpu-Fail-when-fetching-CPU-Stats-for-inv.patch (+99/-0)
Changed in libvirt (Ubuntu):
status: Confirmed → Won't Fix
Hi Dave,
IIRC, OpenStack either tries to determine the lowest common denominator of CPU features or uses whatever you pass to it; in your case that was:
[libvirt]
cpu_mode = custom
cpu_model = Skylake-Server-IBRS
And your guest definition won't change after the initial definition. Even if you had run host-model instead of a named type, it would (in the past) have picked up the hle and rtm features and now couldn't start with them.
But Skylake-Server-IBRS is the name of a defined set of features, and it would be a bug to change "Skylake-Server-IBRS" to now contain other features.
As you have spotted yourself, people could set tsx=yes on the kernel command line or, for whatever probably non-smart reason, run with a kernel without the fixes.
Therefore changing the existing "Skylake-Server-IBRS" is a no-go as an SRU; let's consider other options.
---
Upstream did create these new custom names with the -IBRS suffix when the first security issues hit. But as you know, many issues followed that one, such as L1TF, MDS, ....
Upstream realized quickly that this would be a massive type proliferation that grows even further every now and then.
Also, back then these types were defined in qemu, not libvirt; you can see them with:
$ qemu-system-x86_64 -cpu ?
Libvirt only tracks names and features of those in /usr/share/libvirt/cpu_map*.
---
Interestingly, for all the dangers and drawbacks of host-passthrough, setups using it would not care in these cases, as they would just pass fewer features. But modelling the features in libvirt or OpenStack made them explicit, and it is now correctly telling us that it can't provide those.
---
Back when the first set of Spectre mitigations hit, Daniel made a great post summarizing how configuring models and features works, including modifying the named models to your needs.
=> https://www.berrange.com/posts/2018/06/29/cpu-model-configuration-for-qemu-kvm-on-x86-hosts/
An example for your "Skylake-Server-IBRS" would now be in libvirt like:
<cpu mode='custom'>
<model>Skylake-Server-IBRS</model>
<feature name="hle" policy="disable"/>
<feature name="rtm" policy="disable"/>
</cpu>
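A `<cpu>` element of the shape shown above can also be generated programmatically, for example when many guest definitions need the same change. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and it only builds the XML fragment, it does not talk to libvirt or modify any guest.

```python
import xml.etree.ElementTree as ET


def custom_cpu_xml(model, disabled=("hle", "rtm")):
    """Build a <cpu mode='custom'> fragment that disables the given features."""
    cpu = ET.Element("cpu", {"mode": "custom"})
    ET.SubElement(cpu, "model").text = model
    for name in disabled:
        ET.SubElement(cpu, "feature", {"name": name, "policy": "disable"})
    return ET.tostring(cpu, encoding="unicode")


print(custom_cpu_xml("Skylake-Server-IBRS"))
```

Keeping the named model and subtracting individual features matches the approach described in this comment: the meaning of the model name stays fixed, and the deviation from it is explicit in the guest definition.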
Therefore, from libvirt's perspective, there isn't much we can or should do, IMHO. I'll double-check whether upstream qemu/libvirt decided otherwise and defined new types or other quirks after all. But looking at L1TF, MDS and the like, I expect that using individual features is the intended approach.
---
Let's summarize the options we have right now:
a) You can define your own types for libvirt in /usr/share/libvirt/cpu_map. That seems tempting at first, but
a1) you'd still need to change the type in every guest, so you gain nothing
a2) those files are not meant to be edited, e.g. they are not conffiles and will be overwritten on upgrades of libvirt0
b) Define a new type in qemu and then libvirt, as was done for the -IBRS types
b1) as I said, recent security fixes didn't do this anymore, and I don't expect this to be different
b2) this needs to be in sync with others (upstream and distros), or the proliferation and confusion get even worse
c) Start to define your guests based on feature and not (only) on names
c1) that is what most recent security fixes ...