A general-protection exception during guest migration to a machine without PKRU support

Bug #2032164 reported by Chengen Du
This bug affects 5 people
Affects          Status     Importance  Assigned to  Milestone
linux (Ubuntu)   Confirmed  Undecided   Unassigned
  Jammy          Triaged    High        Chengen Du

Bug Description

[Impact]
When a host that supports PKRU starts a guest that lacks PKRU support, the PKRU feature flag is still enabled in the guest's fpstate.
This information is then passed to userspace through the vcpu ioctl KVM_GET_XSAVE.
The problem arises when the user migrates that guest to another machine that does not support PKRU.
The destination host then attempts to restore the guest's FPU registers.
Because the destination host lacks PKRU support, the restore raises a general-protection exception, which crashes the guest.
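
To make the mechanism above concrete, here is a minimal sketch of how userspace could check whether a saved XSAVE image marks PKRU as in use. It assumes the standard x86 XSAVE layout (a 512-byte legacy region followed by a header whose first 8 bytes are the XSTATE_BV bitmap, with PKRU as state component 9). The function names are illustrative and are not part of KVM or QEMU; the buffer stands in for the data returned by KVM_GET_XSAVE.

```python
import struct

XSAVE_HEADER_OFFSET = 512  # the legacy FXSAVE region occupies the first 512 bytes
XFEATURE_PKRU_BIT = 9      # PKRU is XSAVE state component 9


def xstate_bv(xsave_area: bytes) -> int:
    """Return the XSTATE_BV bitmap stored in the first 8 bytes of the XSAVE header."""
    if len(xsave_area) < XSAVE_HEADER_OFFSET + 8:
        raise ValueError("buffer too short to contain an XSAVE header")
    (bv,) = struct.unpack_from("<Q", xsave_area, XSAVE_HEADER_OFFSET)
    return bv


def marks_pkru(xsave_area: bytes) -> bool:
    """True if the image flags the PKRU component in XSTATE_BV."""
    return bool(xstate_bv(xsave_area) & (1 << XFEATURE_PKRU_BIT))
```

An image for which marks_pkru() returns True is the kind that a destination host without PKRU support cannot restore.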

[Fix]
The problem is resolved by the following upstream commit:
ad856280ddea x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0

A follow-up commit fixes a migration problem introduced by that commit:
a1020a25e697 KVM: x86: Always enable legacy FP/SSE in allowed user XFEATURES
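
The masking idea described by the two commit titles can be sketched as follows. This is an illustration only, not the kernel's code: the function name and its parameters are hypothetical, and the bit positions are the standard x86 XCR0 assignments (x87 FP = bit 0, SSE = bit 1, AVX/YMM = bit 2, PKRU = bit 9).

```python
XFEATURE_MASK_FP = 1 << 0
XFEATURE_MASK_SSE = 1 << 1
XFEATURE_MASK_YMM = 1 << 2
XFEATURE_MASK_PKRU = 1 << 9
XFEATURE_MASK_FPSSE = XFEATURE_MASK_FP | XFEATURE_MASK_SSE


def allowed_user_xfeatures(host_xfeatures: int, guest_supported_xcr0: int) -> int:
    """Hypothetical helper: compute the features exposed to userspace for a guest.

    First limit the host's features to the bits the guest supports in XCR0,
    then always keep legacy FP/SSE so that state is never dropped.
    """
    limited = host_xfeatures & guest_supported_xcr0
    return limited | (host_xfeatures & XFEATURE_MASK_FPSSE)
```

With a PKRU-capable host and a guest whose supported XCR0 omits PKRU, the result no longer contains the PKRU bit. With an empty guest XCR0 (an assumed edge case used here for illustration), FP and SSE are still kept.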

[Test Plan]
1. Set up two machines: one with PKRU support and the other without.
2. On the machine with PKRU support, start a guest that lacks PKRU support.
3. Use libvirt to migrate the guest to the machine that lacks PKRU support.
4. The following error appears on the destination machine:
KVM: entry failed, hardware error 0x80000021

If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an invalid
state for Intel VT. For example, the guest maybe running in big real mode
which is not supported on less recent Intel processors.

EAX=86cf7970 EBX=00000000 ECX=00000001 EDX=005b0036
ESI=00000087 EDI=00000087 EBP=87c03e38 ESP=87c03e18
EIP=86cf7d5e EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2023-07-09T03:03:14.911750Z qemu-system-x86_64: terminating on signal 15 from pid 4134 (/usr/sbin/libvirtd)
2023-07-09 03:03:15.312+0000: shutting down, reason=destroyed
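
For step 1 of the test plan, a small helper along these lines could confirm which machine has PKRU support. It is a sketch that assumes a Linux host, where /proc/cpuinfo lists protection-key support as the "pku" flag; the function names are illustrative.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the CPU feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def host_supports_pkru(cpuinfo_text: str) -> bool:
    """True if the 'pku' flag is present in the given cpuinfo text."""
    return "pku" in cpu_flags(cpuinfo_text)
```

On each machine, passing the contents of /proc/cpuinfo to host_supports_pkru() should return True on the source host and False on the destination host used in the test plan.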

[Where problems could occur]
The commits change the guest migration path, so a regression could cause migrations to fail
or prevent the guest from running on the migration destination.

Chengen Du (chengendu)
Changed in linux (Ubuntu Jammy):
assignee: nobody → Chengen Du (chengendu)
Chengen Du (chengendu)
Changed in linux (Ubuntu Jammy):
status: New → In Progress
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Adrien Cunin (adri2000) wrote :

We see the same issue on focal, kernel 5.15.0-76-generic (linux-image-generic-hwe-20.04).

Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.15.0-85.95 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux' to 'verification-done-jammy-linux'. If the problem still exists, change the tag 'verification-needed-jammy-linux' to 'verification-failed-jammy-linux'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-v2 verification-needed-jammy-linux
Chengen Du (chengendu) wrote :

The kernels (5.15.0-85.95) have been tested without any issues.

tags: added: verification-done-jammy-linux
removed: verification-needed-jammy-linux
Alan Baghumian (alanbach) wrote :

We have now confirmed at three different locations that live migration from PKRU compute nodes running a pre-5.15.0-85.95 kernel to PKRU compute nodes running 5.15.0-85.95 breaks.

Stefan Bader (smb)
Changed in linux (Ubuntu Jammy):
importance: Undecided → High
Stefan Bader (smb) wrote :

This got reverted on request since it caused different migration issues (bug #2036675).

Changed in linux (Ubuntu Jammy):
status: Fix Committed → Triaged
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/5.15.0-1050.57 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-azure' to 'verification-done-jammy-linux-azure'. If the problem still exists, change the tag 'verification-needed-jammy-linux-azure' to 'verification-failed-jammy-linux-azure'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-azure-v2 verification-needed-jammy-linux-azure
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-tegra/5.15.0-1018.18 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-nvidia-tegra' to 'verification-done-jammy-linux-nvidia-tegra'. If the problem still exists, change the tag 'verification-needed-jammy-linux-nvidia-tegra' to 'verification-failed-jammy-linux-nvidia-tegra'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-nvidia-tegra-v2 verification-needed-jammy-linux-nvidia-tegra
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-tegra-igx/5.15.0-1005.5 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-nvidia-tegra-igx' to 'verification-done-jammy-linux-nvidia-tegra-igx'. If the problem still exists, change the tag 'verification-needed-jammy-linux-nvidia-tegra-igx' to 'verification-failed-jammy-linux-nvidia-tegra-igx'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-nvidia-tegra-igx-v2 verification-needed-jammy-linux-nvidia-tegra-igx