Windows 10 wil not install using qemu-system-x86_64

Bug #1915063 reported by David Ober
20
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QEMU
Fix Released
Undecided
Unassigned
linux (Ubuntu)
Confirmed
Undecided
Unassigned
linux-oem-5.10 (Ubuntu)
Fix Released
Undecided
Unassigned
linux-oem-5.6 (Ubuntu)
Confirmed
Undecided
Unassigned
qemu (Ubuntu)
Invalid
Undecided
Unassigned

Bug Description

Steps to reproduce
install virt-manager and ovmf if nopt already there
copy windows and virtio iso files to /var/lib/libvirt/images

Use virt-manager from local machine to create your VMs with the disk, CPUs and memory required
    Select customize configuration then select OVMF(UEFI) instead of seabios
    set first CDROM to the windows installation iso (enable in boot options)
    add a second CDROM and load with the virtio iso
 change spice display to VNC

  Always get a security error from windows and it fails to launch the installer (works on RHEL and Fedora)
I tried updating the qemu version from Focals 4.2 to Groovy 5.0 which was of no help
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu27.14
Architecture: amd64
CasperMD5CheckResult: skip
CurrentDesktop: ubuntu:GNOME
DistributionChannelDescriptor:
 # This is the distribution channel descriptor for the OEM CDs
 # For more information see http://wiki.ubuntu.com/DistributionChannelDescriptor
 canonical-oem-sutton-focal-amd64-20201030-422+pc-sutton-bachman-focal-amd64+X00
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2021-01-20 (19 days ago)
InstallationMedia: Ubuntu 20.04 "Focal" - Build amd64 LIVE Binary 20201030-14:39
MachineType: LENOVO 30E102Z
NonfreeKernelModules: nvidia_modeset nvidia
Package: linux (not installed)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 EFI VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.6.0-1042-oem root=UUID=389cd165-fc52-4814-b837-a1090b9c2387 ro locale=en_US quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 5.6.0-1042.46-oem 5.6.19
RelatedPackageVersions:
 linux-restricted-modules-5.6.0-1042-oem N/A
 linux-backports-modules-5.6.0-1042-oem N/A
 linux-firmware 1.187.8
RfKill:

Tags: focal
Uname: Linux 5.6.0-1042-oem x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip docker kvm libvirt lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 07/29/2020
dmi.bios.vendor: LENOVO
dmi.bios.version: S07KT08A
dmi.board.name: 1046
dmi.board.vendor: LENOVO
dmi.board.version: Not Defined
dmi.chassis.type: 3
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.modalias: dmi:bvnLENOVO:bvrS07KT08A:bd07/29/2020:svnLENOVO:pn30E102Z:pvrThinkStationP620:rvnLENOVO:rn1046:rvrNotDefined:cvnLENOVO:ct3:cvrNone:
dmi.product.family: INVALID
dmi.product.name: 30E102Z
dmi.product.sku: LENOVO_MT_30E1_BU_Think_FM_ThinkStation P620
dmi.product.version: ThinkStation P620
dmi.sys.vendor: LENOVO

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1915063

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
David Ober (dober60) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected focal
description: updated
Revision history for this message
David Ober (dober60) wrote : AudioDevicesInUse.txt

apport information

Revision history for this message
David Ober (dober60) wrote : CRDA.txt

apport information

Revision history for this message
David Ober (dober60) wrote : CurrentDmesg.txt

apport information

Revision history for this message
David Ober (dober60) wrote : IwConfig.txt

apport information

Revision history for this message
David Ober (dober60) wrote : Lspci.txt

apport information

Revision history for this message
David Ober (dober60) wrote : Lspci-vt.txt

apport information

Revision history for this message
David Ober (dober60) wrote : Lsusb.txt

apport information

Revision history for this message
David Ober (dober60) wrote : Lsusb-t.txt

apport information

Revision history for this message
David Ober (dober60) wrote : Lsusb-v.txt

apport information

Revision history for this message
David Ober (dober60) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
David Ober (dober60) wrote : ProcCpuinfoMinimal.txt

apport information

Revision history for this message
David Ober (dober60) wrote : ProcInterrupts.txt

apport information

Revision history for this message
David Ober (dober60) wrote : ProcModules.txt

apport information

Revision history for this message
David Ober (dober60) wrote : PulseList.txt

apport information

Revision history for this message
David Ober (dober60) wrote : UdevDb.txt

apport information

Revision history for this message
David Ober (dober60) wrote : WifiSyslog.txt

apport information

Revision history for this message
David Ober (dober60) wrote : acpidump.txt

apport information

David Ober (dober60)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Mark Pearson (markrhpearson) wrote :

We believe that the latest version of Windows doesn't play nice with the older version of QEMU - it seems Windows is broken somewhere between 4.2.0-2 and 5.0.0-6 on AMD Ryzen based processors (which is what we have on the P620).

What are the recommendations for the best way for a Ubuntu customer to get going again, with an updated version of QEMU?

affects: linux (Ubuntu) → qemu (Ubuntu)
Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

I'm subscribing Christian to this bug, who is our QEMU expert.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi,
I was just recently for a different case installing a Win10 guest and had the ISOs available.
So I was trying to install an OVMF based one through virt-manager as you asked.

ISOs
- win10 20H2_v2
- virtio iso 0.1.190

I used the default "new guest" of virt-manager and then went into "customize before install" to also add the virtio drivers and select the OVMF mode.

BTW I'd not expect that providing the virtio ISO is related to triggering (or not triggering the bug). If you could confirm that it happens without that as well it would make reproducing this a little bit easier. The default config uses SATA.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Info:
If the ISO is not auto-boointg (depends on some setup detail e.g. how you add your extra CD images) you can still manually load the Windows EFI loader like:
FS0:
cd EFI
cd BOOT
BOOTX64.EFI

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I started my tests on 21.04 Hirsute as that is what I had around.
qemu 1:5.2+dfsg-6ubuntu2 virt-manager 1:3.2.0-3 ovmf 2020.11-2
That started up the installer just fine no matter if I used SATA or virtio for the root disk.

Next I tried Focal (as reported)
qemu 1:4.2-3ubuntu6.14 virt-manager 1:2.2.1-3ubuntu2.1 ovmf 0~20191122.bd85bf54-2ubuntu3.1
That also started the windows installer without an error.

Therefore I can't confirm your issue (yet) and will set this to incomplete.
Have you in the meantime found more details about what exactly makes the issue trigger/pass for you?

In regard your further comments and for clarification:
- @Mark: I don't have a Ryzen based processor thou, so your case could still be absolutely
  valid, yet I can't confirm/deny at the time. Do you have any more data showing that it is
  processor dependent?
- David said 5.0 didn't help while Mark said between .2.0-2 and 5.0.0-6 it is fixed. Are
  we sure you all talk about the same bug?
  For those that want/can try a new version give the virt-stack from [1] a try, if that
  resolves it for you we can look fox fixes in between those versions.
- David you said "get a security error from windows" can you be more specific about the error
  that you get? That will also help to check if you two are facing the same issue.

[1]: https://launchpad.net/~canonical-server/+archive/ubuntu/server-backports

Changed in qemu (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
David Ober (dober60) wrote :

I am not using a virtio based drive so that should not be par of the issue, I normally do not use the virtio iso until windows is installed to clear errors in the device manager
I tried using even the version 5.2 from the hirsute release and that also is not working
As a test I tried doing this from an Intel based machine and it does install correctly using even the default version of qemu-system_x64 from focal
Attaching a screen show of the error I get when installing on an AMD Ryzen Threadripper processor

Revision history for this message
David Ober (dober60) wrote :
Changed in qemu (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks David,
I have no threadripper around atm, I think I can next week get hands on an EPYC Rome, but that isn't 100% the same.

But gladly you tried this on the latest qemu 5.2 and since it is failing there as well it might be worth to also report it upstream. That is a great community which might have ran things on a threadripper already and be able to point us to a qemu/kernel fix - or at least an existing discussions abut it.
For now I'm adding a qemu task here which will mirror this case to the mailing list.

Revision history for this message
David Ober (dober60) wrote :

I was playing around with this and find that if I change the Configuration under CPUs from the default (uncheck "Copy host CPU configuration") and select qemu64 in the Model drop down box I can get it to work

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

That is awesome David,
qemu64 is like a very low common denominator with only very basic CPU features.
While "copy host" means "enable all you can".

We can surely work with that a bit, but until I get access to the same HW I need you to do it.

If you run in a console `$virsh domcapabilities` it will spew some XML at you. One of the sections will be for "host-model". In my case that looks like

    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Skylake-Client-IBRS</model>
      <vendor>Intel</vendor>
      <feature policy='require' name='ss'/>
      <feature policy='require' name='vmx'/>
      <feature policy='require' name='hypervisor'/>
...
    </mode>

That means a names CPU type (the one that is closest to what you have) and some feature additionally enabled/disabled.

If you could please post the full output you have, that can be useful.
From there you could go two steps.
1. as you see in my example it will list some cpu features on top of the named type.
   If you remove them one by one you might be able to identify the single-cpu featute
   that breaks in your case.
2. The named CPU that you have also has a representation, it can be found in
   /usr/share/libvirt/cpu_map...
   That ill list all the CPU features that make up the named type.
   If #1 wasn't sufficient, you can now add those to your guest definition one by one in disabled
   state, example
    <feature policy='disable' name='ss'/>

A description of the underlying mechanism is here https://libvirt.org/formatdomain.html#cpu-model-and-topology

Revision history for this message
David Ober (dober60) wrote :
Download full text (6.1 KiB)

<domainCapabilities>
  <path>/usr/bin/qemu-system-x86_64</path>
  <domain>kvm</domain>
  <machine>pc-i440fx-hirsute</machine>
  <arch>x86_64</arch>
  <vcpu max='255'/>
  <iothreads supported='yes'/>
  <os supported='yes'>
    <enum name='firmware'>
      <value>efi</value>
    </enum>
    <loader supported='yes'>
      <value>/usr/share/OVMF/OVMF_CODE_4M.fd</value>
      <enum name='type'>
        <value>rom</value>
        <value>pflash</value>
      </enum>
      <enum name='readonly'>
        <value>yes</value>
        <value>no</value>
      </enum>
      <enum name='secure'>
        <value>no</value>
      </enum>
    </loader>
  </os>
  <cpu>
    <mode name='host-passthrough' supported='yes'/>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>EPYC-Rome</model>
      <vendor>AMD</vendor>
      <feature policy='require' name='x2apic'/>
      <feature policy='require' name='tsc-deadline'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='stibp'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='ssbd'/>
      <feature policy='require' name='xsaves'/>
      <feature policy='require' name='cmp_legacy'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='amd-ssbd'/>
      <feature policy='require' name='virt-ssbd'/>
      <feature policy='require' name='rdctl-no'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='require' name='mds-no'/>
      <feature policy='require' name='pschange-mc-no'/>
    </mode>
    <mode name='custom' supported='yes'>
      <model usable='yes'>qemu64</model>
      <model usable='yes'>qemu32</model>
      <model usable='no'>phenom</model>
      <model usable='yes'>pentium3</model>
      <model usable='yes'>pentium2</model>
      <model usable='yes'>pentium</model>
      <model usable='no'>n270</model>
      <model usable='yes'>kvm64</model>
      <model usable='yes'>kvm32</model>
      <model usable='no'>coreduo</model>
      <model usable='no'>core2duo</model>
      <model usable='no'>athlon</model>
      <model usable='no'>Westmere-IBRS</model>
      <model usable='yes'>Westmere</model>
      <model usable='no'>Skylake-Server-noTSX-IBRS</model>
      <model usable='no'>Skylake-Server-IBRS</model>
      <model usable='no'>Skylake-Server</model>
      <model usable='no'>Skylake-Client-noTSX-IBRS</model>
      <model usable='no'>Skylake-Client-IBRS</model>
      <model usable='no'>Skylake-Client</model>
      <model usable='no'>SandyBridge-IBRS</model>
      <model usable='yes'>SandyBridge</model>
      <model usable='yes'>Penryn</model>
      <model usable='no'>Opteron_G5</model>
      <model usable='no'>Opteron_G4</model>
      <model usable='yes'>Opteron_G3</model>
      <model usable='yes'>Opteron_G2</model>
      <model usable='yes'>Opteron_G1</model>
      <model usable='no'>Nehalem-IBRS</model>
      <model usable='yes'>Nehalem</model>
      <model usable='no'>IvyBridge-IBRS</model>
      <model usable='no'>IvyBridge</model>
      <model usable='no'>Icelake-Server-noTSX</model>
  ...

Read more...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (4.5 KiB)

Ok, so you should be able to drop these lines one by one:

      <feature policy='require' name='x2apic'/>
      <feature policy='require' name='tsc-deadline'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='stibp'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='ssbd'/>
      <feature policy='require' name='xsaves'/>
      <feature policy='require' name='cmp_legacy'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='amd-ssbd'/>
      <feature policy='require' name='virt-ssbd'/>
      <feature policy='require' name='rdctl-no'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='require' name='mds-no'/>
      <feature policy='require' name='pschange-mc-no'/>

If that does not yet make it work, add those one by one (removing the features of the named type)
    <feature policy='disable' name='3dnowprefetch'/>
    <feature policy='disable' name='abm'/>
    <feature policy='disable' name='adx'/>
    <feature policy='disable' name='aes'/>
    <feature policy='disable' name='amd-stibp'/>
    <feature policy='disable' name='apic'/>
    <feature policy='disable' name='arat'/>
    <feature policy='disable' name='avx'/>
    <feature policy='disable' name='avx2'/>
    <feature policy='disable' name='bmi1'/>
    <feature policy='disable' name='bmi2'/>
    <feature policy='disable' name='clflush'/>
    <feature policy='disable' name='clflushopt'/>
    <feature policy='disable' name='clwb'/>
    <feature policy='disable' name='clzero'/>
    <feature policy='disable' name='cmov'/>
    <feature policy='disable' name='cr8legacy'/>
    <feature policy='disable' name='cx16'/>
    <feature policy='disable' name='cx8'/>
    <feature policy='disable' name='de'/>
    <feature policy='disable' name='f16c'/>
    <feature policy='disable' name='fma'/>
    <feature policy='disable' name='fpu'/>
    <feature policy='disable' name='fsgsbase'/>
    <feature policy='disable' name='fxsr'/>
    <feature policy='disable' name='fxsr_opt'/>
    <feature policy='disable' name='ibpb'/>
    <feature policy='disable' name='lahf_lm'/>
    <feature policy='disable' name='lm'/>
    <feature policy='disable' name='mca'/>
    <feature policy='disable' name='mce'/>
    <feature policy='disable' name='misalignsse'/>
    <feature policy='disable' name='mmx'/>
    <feature policy='disable' name='mmxext'/>
    <feature policy='disable' name='movbe'/>
    <feature policy='disable' name='msr'/>
    <feature policy='disable' name='mtrr'/>
    <feature policy='disable' name='npt'/>
    <feature policy='disable' name='nrip-save'/>
    <feature policy='disable' name='nx'/>
    <feature policy='disable' name='osvw'/>
    <feature policy='disable' name='pae'/>
    <feature policy='disable' name='pat'/>
    <feature policy='disable' name='pclmuldq'/>
    <feature policy='disable' name='pdpe1gb'/>
    <feature policy='disable' name='perfctr_core'/>
    <feature policy='disable' name='pge'/>
    <feature policy='disable' name='pni'/>
    <feature policy='disable' name='popcnt'/>
    <feature p...

Read more...

Revision history for this message
Igor (imammedo) wrote : Re: [Bug 1915063] Re: Windows 10 wil not install using qemu-system-x86_64

On Sat, 03 Apr 2021 16:52:13 -0000
Christian Ehrhardt  <email address hidden> wrote:

> That is awesome David,
> qemu64 is like a very low common denominator with only very basic CPU features.
> While "copy host" means "enable all you can".

Also it's worth to try setting real CPU topology for if EPYC cpu model is used.
i.e. use -smp with options that resemble a real EPYC cpu
(for number of core complexes is configured with 'dies' option in QEMU)

> We can surely work with that a bit, but until I get access to the same
> HW I need you to do it.
>
>
> If you run in a console `$virsh domcapabilities` it will spew some XML at you. One of the sections will be for "host-model". In my case that looks like
>
> <mode name='host-model' supported='yes'>
> <model fallback='forbid'>Skylake-Client-IBRS</model>
> <vendor>Intel</vendor>
> <feature policy='require' name='ss'/>
> <feature policy='require' name='vmx'/>
> <feature policy='require' name='hypervisor'/>
> ...
> </mode>
>
>
> That means a names CPU type (the one that is closest to what you have) and some feature additionally enabled/disabled.
>
> If you could please post the full output you have, that can be useful.
> >From there you could go two steps.
> 1. as you see in my example it will list some cpu features on top of the named type.
> If you remove them one by one you might be able to identify the single-cpu featute
> that breaks in your case.
> 2. The named CPU that you have also has a representation, it can be found in
> /usr/share/libvirt/cpu_map...
> That ill list all the CPU features that make up the named type.
> If #1 wasn't sufficient, you can now add those to your guest definition one by one in disabled
> state, example
> <feature policy='disable' name='ss'/>
>
> A description of the underlying mechanism is here
> https://libvirt.org/formatdomain.html#cpu-model-and-topology
>

Revision history for this message
David Ober (dober60) wrote :

I have not done any of what you are asking so not exactly sure how to change those values, been looking and reading but not finding what I want so thought it might be better to just ask how to do what yo are asking. I did try CPU type EPYC and that did get past the error I am seeing on install

Revision history for this message
Igor (imammedo) wrote :

On Wed, 07 Apr 2021 13:00:23 -0000
David Ober <email address hidden> wrote:

> I have not done any of what you are asking so not exactly sure how to
> change those values, been looking and reading but not finding what I
> want so thought it might be better to just ask how to do what yo are
> asking.

see https://libvirt.org/formatdomain.html#cpu-model-and-topology
for the way to describe topology in domain xml.
Pick a real AMD CPU for cpu model you're are having problem with,
and use its config to define topology.

> I did try CPU type EPYC and that did get past the error I am
> seeing on install
So it works with EPYC but not with ECPY-Rome, then probably topology
is not issue.

CCing Babu,
who added EPYC-Rome cpu model, maybe he can help

Revision history for this message
Babu Moger (babumoger) wrote :

I remember seeing something similar before. This was supposed to be fixed by the linux kernel commit.

commit 841c2be09fe4f495fe5224952a419bd8c7e5b455
Author: Maxim Levitsky <email address hidden>
Date: Wed Jul 8 14:57:31 2020 +0300

kvm: x86: replace kvm_spec_ctrl_test_value with runtime test on the host

# git describe --contains 841c2be09fe4f495fe5224952a419bd8c7e5b455
v5.9-rc1~121^2~67

Problem seems to happen with EPYC-Rome model which exposes the feature STIBP but not IBRS.

Did you guys try "-cpu host"? It might work.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks Babu/Igor for chiming in!

@Babu
That exposed STIBP but not IBRS - isn't that what you tried to solve (for userspace) in qemu via a v2 for the Rome chips?
=> https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01020.html

I was recently pinging that, as it wasn't merged into the qemu 6.0-rc
Do you have any more insight why this is held back still?

If I might ask - how does the kernel fix you referenced interact with this proposed qemu change?
Assumptions (please correct me):
1. with the qemu change and using that Rome-v2 it would ask to expose both features and no more crash (even on unfixed kernels)
2. with the kernel fix it will no more crash, even with an unfixed qemu?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (3.3 KiB)

Finally I'm able to test on a Threadripper myself now.

Note: In regard to the commit that Babu identified - I'm on kernel 5.10.0-1020-oem so that patch would be applied already. I need to find an older kernel to retry with that as well

(on that new kernel) I did a full Win10 install and it worked fine for me.

In regard to CPU types (for comparison) I got

qemu 1:4.2-3ubuntu6.15 / libvirt 6.0.0-0ubuntu8.8:
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>EPYC-Rome</model>
      <vendor>AMD</vendor>
      <feature policy='require' name='x2apic'/>
      <feature policy='require' name='tsc-deadline'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='stibp'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='ssbd'/>
      <feature policy='require' name='xsaves'/>
      <feature policy='require' name='cmp_legacy'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='amd-ssbd'/>
      <feature policy='require' name='virt-ssbd'/>
      <feature policy='require' name='rdctl-no'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='require' name='mds-no'/>
      <feature policy='require' name='pschange-mc-no'/>
    </mode>

With a more recent qemu/libvirt it isn't much different for this chip (there recently were some Milan changes, but those seem not to matter for this chip).

qemu 1:5.2+dfsg-9ubuntu1 / libvirt 7.0.0-2ubuntu1

    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>EPYC-Rome</model>
      <vendor>AMD</vendor>
      <feature policy='require' name='x2apic'/>
      <feature policy='require' name='tsc-deadline'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='stibp'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='ssbd'/>
      <feature policy='require' name='xsaves'/>
      <feature policy='require' name='cmp_legacy'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='amd-ssbd'/>
      <feature policy='require' name='virt-ssbd'/>
      <feature policy='require' name='rdctl-no'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='require' name='mds-no'/>
      <feature policy='require' name='pschange-mc-no'/>
    </mode>

I wasn't able to crash this setup with an old (18.04) nor a new 21.04) Ubuntu guest.
Installing Win10 worked fine for a while and didn't break as reported. But the setup I have goes through triple ssh-tunnels and around the globe - that slows things down a lot :-/
This is how far I've got:
1. start up the install
2. select no license key -> custom install -> it started copying files
3. it goes into the first reboot

After this the latency kills me and virt-manager starts to abort the installation.
So far I did not hit (https://launchpadlibrarian.net/529734412/security.png) as reported by David.
@David - did this already pass the critical step for you, how earl...

Read more...

Revision history for this message
Babu Moger (babumoger) wrote :

@Christian,
Yes. This following patch fixes the problem
 https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01020.html

I saw your ping on the patch. I am not sure why it is not picked up. I am going ping them today.

>If I might ask - how does the kernel fix you referenced interact with this proposed qemu change?
>Assumptions (please correct me):
Problem seem to happen when guest tries to access the SPEC_CTRL register to with the wrong settings. The kernel fix avoids writing those values and avoids #GP fault.

>1. with the qemu change and using that Rome-v2 it would ask to expose both features and no more crash (even >on unfixed kernels)
Yes. With Qemu patch EPYC-Rome v2 this issue will be fixed.

>2. with the kernel fix it will no more crash, even with an unfixed qemu?
Yes, That is correct. We need at lease one of these patches to fix this problem.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

David used "5.6.0-1042.46-oem", the closest I had was "5.6.0-1052-oem" so I tried that one.

With that my win10 install immediately crashed into the reported issue.

So to summarize:
1. I can reproduce it
2. Chances are high that it is fixed by kernel commit 841c2be0 "kvm: x86: replace kvm_spec_ctrl_test_value with runtime test on the host"
3. there are some qemu changes which might be related, but we need Babu to reply about if/how those are related

I need to get myself updated on Ubuntu oem kernels.
If there is a 5.6 series that is supposed to work on that, then this patch needs to be backported.
But if OTOH it is a valid upgrade path that you'll get the 5.10.0-1020-oem that I had or later as part of your 20.04 OEM then that "is the fix" for you @David.

Changed in linux-oem-5.10 (Ubuntu):
status: New → Fix Released
Changed in linux-oem-5.6 (Ubuntu):
status: New → Confirmed
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks @Babu for the clarifications!
I really hope that the qemu patch makes it in v6.0 - then I can better consider picking it up as backport for qemu (already have a bug about that in bug 1921754 - therefore I'm setting the qemu task here as invalid)

The last step I can provide for the kernel bug that this one here is (before the rest of the work is with the kernel Team) is to verify/falsify if that also affects the non-oem linux-generic kernel.
There the latest was 5.4.0.71.74 from focal-proposed and the latest already released one is 5.4.0.70.73.

5.4.0.70.73 - failing
5.4.0.71.74 - failing

So while the almost-released oem kernel based on 5.10 will cover this - the patch should indeed also be backported to linux-generic and all the other flavours - otherwise Windows (and potentially more) will no more be usable as KVM guest on such Chips (threadrippers, but maybe more AMD chips that are not yet known as well)

Changed in qemu (Ubuntu):
status: Confirmed → Invalid
Revision history for this message
Andy Whitcroft (apw) wrote :

The commit in question is marked for stable:

  commit 841c2be09fe4f495fe5224952a419bd8c7e5b455
  Author: Maxim Levitsky <email address hidden>
  Date: Wed Jul 8 14:57:31 2020 +0300

    kvm: x86: replace kvm_spec_ctrl_test_value with runtime test on the host

    To avoid complex and in some cases incorrect logic in
    kvm_spec_ctrl_test_value, just try the guest's given value on the host
    processor instead, and if it doesn't #GP, allow the guest to set it.

    One such case is when host CPU supports STIBP mitigation
    but doesn't support IBRS (as is the case with some Zen2 AMD cpus),
    and in this case we were giving guest #GP when it tried to use STIBP

    The reason why can can do the host test is that IA32_SPEC_CTRL msr is
    passed to the guest, after the guest sets it to a non zero value
    for the first time (due to performance reasons),
    and as as result of this, it is pointless to emulate #GP condition on
    this first access, in a different way than what the host CPU does.

    This is based on a patch from Sean Christopherson, who suggested this idea.

    Fixes: 6441fa6178f5 ("KVM: x86: avoid incorrect writes to host MSR_IA32_SPEC_CTRL")
    Cc: <email address hidden>
    Suggested-by: Sean Christopherson <email address hidden>
    Signed-off-by: Maxim Levitsky <email address hidden>
    Message-Id: <email address hidden>
    Signed-off-by: Paolo Bonzini <email address hidden>

It appears to be in `v5.4.102` which is currently queued up for the cycle following the one just starting.

Revision history for this message
Thomas Huth (th-huth) wrote :

The patch for QEMU that has been mentioned in comment #38 has been merged already, so I'm marking this as Fix-Released there.

Changed in qemu:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.