Synchronous Exception when booting VMs via qemu-efi-aarch64

Bug #2036604 reported by Heinrich Schuchardt
This bug affects 1 person
Affects               Status        Importance  Assigned to   Milestone
cloud-images          New           Undecided   Unassigned
autopkgtest (Ubuntu)  Confirmed     Undecided   Unassigned
edk2 (Debian)         Fix Released  Unknown
edk2 (Ubuntu)         Fix Released  High        dann frazier
qemu (Ubuntu)         Confirmed     Undecided   Unassigned
shim (Ubuntu)         Confirmed     Undecided   Unassigned

Bug Description

I try to create an autopkgtest VM on an arm64 system with

autopkgtest-buildvm-ubuntu-cloud -v --release mantic

or on an amd64 system with

autopkgtest-buildvm-ubuntu-cloud --arch arm64 -v --release mantic

In both cases I get:

Found linux image: /boot/vmlinuz-6.5.0-5-generic
Found initrd image: /boot/initrd.img-6.5.0-5-generic
Found linux image: /boot/vmlinuz-6.3.0-7-generic
Found initrd image: /boot/initrd.img-6.3.0-7-generic
Found linux image: /boot/vmlinuz-6.2.0-20-generic
Found initrd image: /boot/initrd.img-6.2.0-20-generic
Warning: os-prober will not be executed to detect other bootable partitions.
BdsDxe: loading Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x5,0x0)
BdsDxe: starting Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x5,0x0)

Synchronous Exception at 0x000000005C328000

Synchronous Exception at 0x000000005C328000
^A^Cqemu-system-aarch64: terminating on signal 2
Traceback (most recent call last):

/var/crash has no entries.

ProblemType: Bug
DistroRelease: Ubuntu 23.10
Package: qemu-system-arm 1:8.0.4+dfsg-1ubuntu1
ProcVersionSignature: Ubuntu 6.3.0-7.7-generic 6.3.5
Uname: Linux 6.3.0-7-generic aarch64
NonfreeKernelModules: zfs
ApportVersion: 2.27.0-0ubuntu2
Architecture: arm64
CasperMD5CheckResult: pass
CloudArchitecture: aarch64
CloudID: none
CloudName: none
CloudPlatform: none
CloudSubPlatform: config
Date: Tue Sep 19 15:37:20 2023
InstallationDate: Installed on 2021-08-17 (763 days ago)
InstallationMedia: Ubuntu-Server 21.10 "Impish Indri" - Alpha arm64 (20210813)
KvmCmdLine: COMMAND STAT EUID RUID PID PPID %CPU COMMAND
Lspci-vt:
 -[0000:00]-+-00.0 NVIDIA Corporation GK208B [GeForce GT 730]
            \-00.1 NVIDIA Corporation GK208 HDMI/DP Audio Controller
MachineType: {report['dmi.sys.vendor']} {report['dmi.product.name']}
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-6.3.0-7-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro
RebootRequiredPkgs: Error: path contained symlinks.
SourcePackage: qemu
UpgradeStatus: No upgrade log present (probably fresh install)
acpidump:

dmi.bios.date: Mar 26 2020
dmi.bios.release: 1.0
dmi.bios.vendor: EFI Development Kit II / Marvell
dmi.bios.version: EDK II
dmi.board.name: Armada 8040 MacchiatoBin
dmi.board.vendor: SolidRun
dmi.board.version: Rev. 1.3
dmi.chassis.type: 2
dmi.chassis.vendor: SolidRun
dmi.chassis.version: Rev. 1.3
dmi.modalias: dmi:bvnEFIDevelopmentKitII/Marvell:bvrEDKII:bdMar262020:br1.0:svnSolidRun:pnArmada8040MacchiatoBin:pvrRev.1.3:rvnSolidRun:rnArmada8040MacchiatoBin:rvrRev.1.3:cvnSolidRun:ct2:cvrRev.1.3:sku:
dmi.product.name: Armada 8040 MacchiatoBin
dmi.product.version: Rev. 1.3
dmi.sys.vendor: SolidRun

Heinrich Schuchardt (xypron) wrote :
summary: - qemu-system-arm64: Synchronous Exception
+ Synchronous Exception in qemu-system-arm64 VM during autopkgtest-
+ buildvm-ubuntu-cloud
Paride Legovini (paride) wrote : Re: Synchronous Exception in qemu-system-arm64 VM during autopkgtest-buildvm-ubuntu-cloud

Hi Heinrich, I tried:

$ autopkgtest-buildvm-ubuntu-cloud -v --release mantic

on my amd64 Mantic system, and it looks like it worked:

[...]
[ OK ] Reached target poweroff.target - System Power Off.
[ 500.184102] reboot: Power down
Moving image into final destination ./autopkgtest-mantic-amd64.img

Which host system are you using?

Changed in autopkgtest (Ubuntu):
status: New → Incomplete
Paride Legovini (paride) wrote :

I forgot --arch, testing again.

Paride Legovini (paride) wrote :

I can now immediately reproduce the issue.

This looks like a bootability issue of the cloud images?

Changed in autopkgtest (Ubuntu):
status: Incomplete → Confirmed
John Chittum (jchittum) wrote :

Do you know which image this is attempting to boot? The last image we have produced for arm64 is:

http://cloud-images.ubuntu.com/mantic/20230823/

It's a bit old, but we've been hitting some major infra issues with s390x, which have prevented publishing. However, we haven't had any failing tests over the past few weeks for arm64 mantic images booting in OpenStack.

I'm guessing this comes down to how autopkgtest is attempting to launch the VM, and its storage backend. Both virtio-blk and virtio-scsi should be supported at this point, but I'll need more info. Could you link to how autopkgtest is attempting to launch a VM?

Paride Legovini (paride) wrote :

autopkgtest-buildvm-ubuntu-cloud downloads from:

https://cloud-images.ubuntu.com/mantic/current/mantic-server-cloudimg-arm64.img

which I suppose is a link to 20230823/, given that it's currently the latest one. Note that autopkgtest-buildvm-ubuntu-cloud doesn't actually run autopkgtests: it merely prepares an image, booting it using bare qemu and running some commands in it.

I can reproduce the issue by downloading that .img file and running this tool you are probably familiar with:

launch-qcow2-image-qemu-arm64.sh --image mantic-server-cloudimg-arm64.img --password ubuntu

Actually I can reproduce the issue with jammy-server-cloudimg-arm64.img, so maybe this has something to do with the _host_ system (Mantic, in my case)?

Heinrich Schuchardt (xypron) wrote :

@John

This is the executed QEMU command:

qemu-system-aarch64 -machine virt -cpu cortex-a53 -m 512 -smp 1 -nographic -net nic,model=virtio -net user,hostfwd=tcp:127.0.0.1:10022-:22 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0,id=rng-device0 -monitor unix:/tmp/autopkgtest-qemu.09msxtav/monitor,server=on,wait=off -virtfs local,id=autopkgtest,path=/tmp/autopkgtest-qemu.09msxtav/shared,security_model=none,mount_tag=autopkgtest -device virtio-serial -chardev socket,path=/tmp/autopkgtest-qemu.09msxtav/hvc0,server=on,wait=off,id=hvc0 -device virtconsole,chardev=hvc0 -chardev socket,path=/tmp/autopkgtest-qemu.09msxtav/hvc1,server=on,wait=off,id=hvc1 -device virtconsole,chardev=hvc1 -serial unix:/tmp/autopkgtest-qemu.09msxtav/ttyS0,server=on,wait=off -drive index=0,file=/tmp/autopkgtest-buildvm-ubuntu-cloud433ii_sr/mantic-server-cloudimg-arm64.img,format=qcow2,if=virtio,discard=unmap -drive index=1,file=/tmp/autopkgtest-buildvm-ubuntu-cloud433ii_sr/autopkgtest.seed,format=raw,if=virtio,discard=unmap,readonly -drive if=pflash,format=raw,unit=0,read-only=on,file=/usr/share/AAVMF/AAVMF_CODE.fd -drive if=pflash,format=raw,unit=1,file=/tmp/autopkgtest-qemu.09msxtav/efivars.fd
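
For experimenting outside autopkgtest, the command above can be reduced to its firmware and disk options. The following is only a sketch: build_qemu_argv is a hypothetical helper, the option strings are copied from the command above, and efivars.fd is assumed to be a writable variable store that already exists (how it is created is not shown in this report). Whether this reduced command still triggers the exception has not been verified here.

```python
import shlex


def build_qemu_argv(image, efivars, firmware="/usr/share/AAVMF/AAVMF_CODE.fd"):
    """Build a reduced qemu-system-aarch64 argument list (hypothetical helper).

    Only options that appear in the command quoted above are used.
    Paths containing commas are not handled by this sketch.
    """
    return [
        "qemu-system-aarch64",
        "-machine", "virt",
        "-cpu", "cortex-a53",
        "-m", "512",
        "-smp", "1",
        "-nographic",
        # Cloud image as a virtio disk
        "-drive", f"index=0,file={image},format=qcow2,if=virtio,discard=unmap",
        # Read-only UEFI firmware code from the qemu-efi-aarch64 package
        "-drive", f"if=pflash,format=raw,unit=0,read-only=on,file={firmware}",
        # Writable UEFI variable store (assumed to exist already)
        "-drive", f"if=pflash,format=raw,unit=1,file={efivars}",
    ]


argv = build_qemu_argv("mantic-server-cloudimg-arm64.img", "efivars.fd")
# Print a shell-quoted command line that can be pasted into a terminal.
print(shlex.join(argv))
```

Building the list first and printing it with shlex.join keeps the quoting correct; the same list could be handed to subprocess.run(argv) to start the VM.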

Changed in autopkgtest (Debian):
status: Unknown → Confirmed
dann frazier (dannf)
Changed in edk2 (Ubuntu):
status: New → In Progress
assignee: nobody → dann frazier (dannf)
no longer affects: autopkgtest (Debian)
Paride Legovini (paride)
summary: - Synchronous Exception in qemu-system-arm64 VM during autopkgtest-
- buildvm-ubuntu-cloud
+ Synchronous Exception in qemu-system-arm64 VM
Changed in edk2 (Ubuntu):
importance: Undecided → High
tags: added: rls-mm-incoming
Paride Legovini (paride)
summary: - Synchronous Exception in qemu-system-arm64 VM
+ Synchronous Exception when booting VMs via qemu-efi-aarch64
Changed in edk2 (Debian):
status: Unknown → Confirmed
dann frazier (dannf) wrote :

shim 15.7-0ubuntu1

qemu-efi-aarch64 now implements the EFI Memory Attribute Protocol. When shim detects this, it uses the protocol to set memory attributes appropriately for the sections of the bootloader image it loads before passing control to it. After this change, fresh Ubuntu VMs began crashing on startup (bug 2036604):

  --------------------------------------
  BdsDxe: loading Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x1,0x3)/Pci(0x0,0x0)
  BdsDxe: starting Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x1,0x3)/Pci(0x0,0x0)

  Synchronous Exception at 0x00000000BC300000

  Synchronous Exception at 0x00000000BC300000

  --------------------------------------

I narrowed this down to the case where shim executes fbaa64.efi (which is why only fresh VMs are affected). I found that upstream shim is unaffected, so I used bisection to identify the relevant change:

  From c7b305152802c8db688605654f75e1195def9fd6 Mon Sep 17 00:00:00 2001
  From: Nicholas Bishop <REDACTED>
  Date: Mon, 19 Dec 2022 18:56:13 -0500
  Subject: [PATCH] pe: Align section size up to page size for mem attrs

  Setting memory attributes is generally done at page granularity, and
  this is enforced by checks in `get_mem_attrs` and
  `update_mem_attrs`. But unlike the section address, the section size
  isn't necessarily aligned to 4KiB. Round up the section size to fix
  this.

  Signed-off-by: Nicholas Bishop <email address hidden>

Please add this patch to shim.
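
The rounding that the patch description talks about is plain align-up arithmetic. The snippet below is a minimal illustration, not code from shim: align_up is a hypothetical helper, the section sizes are made up, and the 4 KiB page size is the one named in the patch description.

```python
PAGE_SIZE = 4096  # 4 KiB, the granularity named in the patch description


def align_up(value, alignment=PAGE_SIZE):
    """Round value up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


# Hypothetical section sizes, chosen only for illustration.
for size in (0x1, 0x1000, 0x1234, 0x2001):
    print(f"{size:#x} -> {align_up(size):#x}")
```

A size that is already a multiple of 4 KiB is left unchanged; anything else is rounded up to the next page boundary, which is what a page-granular attribute call needs.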

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in qemu (Ubuntu):
status: New → Confirmed
Changed in shim (Ubuntu):
status: New → Confirmed
Heinrich Schuchardt (xypron) wrote :

Hello Dann,

The UEFI specification requires that if a 64 KiB page contains any of

– EfiRuntimeServicesCode
– EfiRuntimeServicesData
– EfiReserved
– EfiACPIMemoryNVS

then all 4 KiB pages in that 64 KiB page must use identical attributes.

So, in addition to the cited patch, you must ensure that the buffer allocated with AllocatePages() in handle_image(), for which you set memory attributes, does not share a 64 KiB page with any of the above memory types. The easiest way to fulfill the requirement is appropriate alignment and rounding of the used memory. I can't find this in upstream shim.

Best regards

Heinrich
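
The constraint described in the comment above can be checked with simple range arithmetic. The following is a sketch under stated assumptions: the memory map is a simplified, hypothetical list of (type, start, size) tuples rather than real UEFI memory descriptors, all addresses are invented, and both function names are made up for illustration.

```python
GRANULE_64K = 64 * 1024

# Memory types listed in the comment above as constraining a 64 KiB page.
RESTRICTED_TYPES = {
    "EfiRuntimeServicesCode",
    "EfiRuntimeServicesData",
    "EfiReserved",
    "EfiACPIMemoryNVS",
}


def covering_64k_range(start, size):
    """Return (first, end) of the 64 KiB-aligned range covering [start, start + size)."""
    first = start & ~(GRANULE_64K - 1)
    end = (start + size + GRANULE_64K - 1) & ~(GRANULE_64K - 1)
    return first, end


def shares_granule_with_restricted(start, size, memory_map):
    """True if any 64 KiB page touched by the buffer also holds a restricted region."""
    first, end = covering_64k_range(start, size)
    return any(
        mem_type in RESTRICTED_TYPES
        and region_start < end
        and region_start + region_size > first
        for mem_type, region_start, region_size in memory_map
    )


# Hypothetical memory map: one runtime-data page and one block of ordinary memory.
memory_map = [
    ("EfiRuntimeServicesData", 0x4000C000, 0x1000),
    ("EfiConventionalMemory", 0x40010000, 0x10000),
]

# Buffer aligned to 4 KiB only: it lands in the same 64 KiB page as the runtime data.
print(shares_granule_with_restricted(0x40008000, 0x3000, memory_map))
# Buffer aligned and rounded to 64 KiB: it no longer shares a page with it.
print(shares_granule_with_restricted(0x40010000, 0x10000, memory_map))
```

Aligning the buffer start and rounding its size to 64 KiB, as suggested above, makes the check trivially pass because the buffer then owns every 64 KiB page it touches.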

Julian Andres Klode (juliank) wrote :

Unfortunately, this is not something we can fix in shim by the 23.10 release; I expect we'll have a new shim with the fix by November.

dann frazier (dannf) wrote :

OK, then I think I'll do the following:

 - I'll avoid the regression for 23.10 by disabling the EFI_MEMORY_ATTRIBUTE protocol in edk2 for now and close this bug.

 - I'll repurpose bug 2037137 (currently a dupe of this one) to track re-enabling NX support, with tasks for both edk2 (remove the workaround) and shim (address this bug, and the additional issue in Comment #11)

 - I'll drop the shim task from here.

 - I'll plan to re-enable EFI_MEMORY_ATTRIBUTE protocol again ahead of 24.04 LTS, and communicate this in NEWS.Debian in the package and the release notes for 24.04.

dann frazier (dannf) wrote : Re: [Bug 2036604] Re: Synchronous Exception when booting VMs via qemu-efi-aarch64

On Fri, Sep 22, 2023 at 6:15 PM Heinrich Schuchardt
<email address hidden> wrote:
>
> Hello Dann,
>
> The UEFI specification requires that if a 64 KiB page contains either of
>
> – EfiRuntimeServicesCode
> – EfiRuntimeServicesData
> – EfiReserved
> – EfiACPIMemoryNVS
>
> then all 4KiB pages in the 64KiB page must use identical attributes.
>
> So additionally to the cited patch you must ensure that buffer allocated
> with AllocatePages() in handle_image() for which you set memory
> attributes does not contain any of the above memory types. The easiest
> way to fulfill the requirement is appropriate alignment and rounding of
> the used memory. I can't find this in upstream shim.

Nice catch - would you mind reporting this upstream Heinrich?

  -dann

Changed in edk2 (Debian):
status: Confirmed → Fix Released
Heinrich Schuchardt (xypron) wrote :

@Dann

Upstream issue created

Setting memory attributes in handle_image() does not comply with UEFI specification
https://github.com/rhboot/shim/issues/614

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package edk2 - 2023.05-2

---------------
edk2 (2023.05-2) unstable; urgency=medium

  * qemu-efi-aarch64/qemu-efi-arm: Disable the EFI_MEMORY_ATTRIBUTE
    protocol temporarily to workaround a bug in shim until distributions
    have had a chance to fix it. Closes: #1042438, LP: #2036604.
  * Drop qemu-efi transitional package. Closes: #1032695.

 -- dann frazier <email address hidden> Sat, 23 Sep 2023 08:35:39 -0600

Changed in edk2 (Ubuntu):
status: In Progress → Fix Released
tags: added: rls-nn-incoming
removed: rls-mm-incoming