Jammy buildd image doesn't boot because grub is installed to \EFI\debian instead of \EFI\ubuntu

Bug #2034253 reported by Ricardo Abreu
This bug affects 2 people
Affects                   Status        Importance  Assigned to    Milestone
cloud-images              Fix Released  High        John Chittum
  Focal                   Fix Released  Undecided   Unassigned
  Jammy                   Fix Released  Undecided   Unassigned
grub2 (Ubuntu)            In Progress   Undecided   Unassigned
  Focal                   New           Undecided   Unassigned
  Jammy                   New           Undecided   Unassigned
livecd-rootfs (Ubuntu)    Invalid       Undecided   John Chittum
  Focal                   Fix Released  Undecided   Unassigned
  Jammy                   Fix Released  Undecided   Unassigned

Bug Description

Recent Jammy buildd images fail to boot (since at least September 1st: https://github.com/canonical/craft-application/actions/runs/6016993725/job/16424264050?pr=64).

Trying to run an image from https://cloud-images.ubuntu.com/buildd/daily/jammy/current in QEMU only gets me a GRUB prompt. Consequently, Multipass and Snapcraft fail to work with these images.

I haven't checked other image series.

This is similar to https://bugs.launchpad.net/cloud-images/+bug/2027686, but a new issue.

SRU Template for cloud-images and livecd-rootfs (not the change for grub2)

[ Impact ]

 * snapcraft consumes daily buildd images for running snap builds. In this case, Core22-based builds use the daily 22.04 buildd image. At this time, that image drops to a GRUB prompt. This causes snap builds to fail after everything hangs for a long time (Multipass doesn't handle this case well: it hangs and enters an odd state, and snapcraft then hangs, also in a bad state).

[ Test Plan ]

 * build image
 * boot image using qemu. example script:

qemu-system-x86_64 \
-cpu host -machine type=q35,accel=kvm -m 1024 \
-nographic \
-snapshot \
-netdev id=net00,type=user,hostfwd=tcp::2222-:22 \
-device virtio-net-pci,netdev=net00 \
-drive if=virtio,format=qcow2,file=build.output/livecd.ubuntu-base.disk-linux-virtual.img \
-cdrom <A_WORKING_CLOUD_INIT_FILE> \
-bios /usr/share/OVMF/OVMF_CODE.fd

This is very close to how multipass calls qemu under the hood.
  * observe that the machine successfully boots (no grub prompt)
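The `<A_WORKING_CLOUD_INIT_FILE>` placeholder can be produced with `cloud-localds` from cloud-image-utils, the tool named later in this thread. The following is a minimal sketch, not part of the SRU: the helper names `write_user_data` and `make_seed_iso` are hypothetical, and it assumes all you need is one SSH public key authorized for the image's default user through cloud-init's `ssh_authorized_keys`.

```shell
#!/bin/sh
# Hypothetical helpers for building the cloud-init seed passed via -cdrom.
# Assumes cloud-image-utils (which provides cloud-localds) is installed.

# write_user_data KEY OUTFILE: write a minimal cloud-config that authorizes
# one SSH public key for the image's default user.
write_user_data() {
    key=$1
    out=$2
    cat > "$out" <<EOF
#cloud-config
ssh_authorized_keys:
  - $key
EOF
}

# make_seed_iso KEY ISO: wrap that user-data into a NoCloud seed image,
# cleaning up the temporary user-data file afterwards.
make_seed_iso() {
    key=$1
    iso=$2
    ud=$(mktemp)
    write_user_data "$key" "$ud"
    cloud-localds "$iso" "$ud"
    status=$?
    rm -f "$ud"
    return "$status"
}
```

Assuming your key lives at `~/.ssh/id_ed25519.pub`, `make_seed_iso "$(cat ~/.ssh/id_ed25519.pub)" ci-ssh-pub-set.iso` would produce a file that can be passed as `-cdrom ci-ssh-pub-set.iso`.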

[ Where problems could occur ]

 * For `livecd-rootfs`, images could still fail to boot, most likely because an incorrect GRUB variable is written to the config file.
 * grub2 must remain properly updatable. The currently known use case for buildd daily VM images is Multipass, via snapcraft, so they're ephemeral, but nothing stops someone from using them in a longer-running setup.

[ Other Info ]

 * buildd images need more testing. That's on my team at this time; it's in the backlog, and we should endeavor to add it in the next roadmap cycle.


John Chittum (jchittum) wrote :

1. Could you provide any specific failures?
2. Can you describe how it's booting? UEFI? Secure Boot? etc.

Right now I'm testing directly with QEMU. In a BIOS-booting (not UEFI) setup, I cannot reproduce it.

With UEFI, I'm seeing this error, which is new to me:

BdsDxe: failed to load Boot0001 "UEFI QEMU DVD-ROM QM00005 " from PciRoot(0x0)/Pci(0x1F,0x2)/Sata(0x2,0xFFFF,0x0): Not Found
BdsDxe: loading Boot0002 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
BdsDxe: starting Boot0002 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
   GNU GRUB version 2.06
   Minimal BASH-like line editing is supported. For the first word, TAB
   lists possible command completions. Anywhere else TAB lists possible
   device or file completions.

this is my QEMU call:

qemu-system-x86_64 \
-cpu host -machine type=q35,accel=kvm -m 2048 \
-nographic \
-snapshot \
-netdev id=net00,type=user,hostfwd=tcp::2222-:22 \
-device virtio-net-pci,netdev=net00 \
-drive if=virtio,format=qcow2,file=20230901-jammy-server-cloudimg-amd64-disk1.img \
-drive if=virtio,format=raw,file=/home/jchittum/dev01/vmdks/ci-ssh-pub-set.iso \
-drive if=pflash,format=raw,file=/usr/share/OVMF/OVMF_CODE.fd,readonly=on

I can confirm that OVMF_CODE.fd is on my local filesystem at that path:

stat /usr/share/OVMF/OVMF_CODE.fd
  File: /usr/share/OVMF/OVMF_CODE.fd
  Size: 1966080 Blocks: 3840 IO Block: 4096 regular file
Device: 10305h/66309d Inode: 11534775 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2023-09-05 15:22:39.599644073 -0500
Modify: 2022-09-12 22:05:26.000000000 -0500
Change: 2022-10-28 01:11:35.003430249 -0500
 Birth: 2022-10-28 01:11:34.859427923 -0500

What's interesting is that when I do this on an OpenStack instance, I see the same failure, but it still boots:

BdsDxe: failed to load Boot0001 "UEFI QEMU DVD-ROM QM00005 " from PciRoot(0x0)/Pci(0x1F,0x2)/Sata(0x2,0xFFFF,0x0): Not Found
BdsDxe: loading Boot0002 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
BdsDxe: starting Boot0002 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
[ 0.000000] Linux version 5.15.0-82-generic (buildd@lcy02-amd64-027) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #91-Ubuntu SMP Mon Aug 14 14:14:14 UTC )
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-82-generic root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0

There's lots more triage to do, but knowing more specifically how this is being launched will help.

Ricardo Abreu (ricab) wrote :

For completeness, here is the command line we're using (the same one Chris sent on Mattermost):

/snap/multipass/10512/usr/bin/qemu-system-x86_64 -bios OVMF.fd --enable-kvm -cpu host -nic tap,ifname=tap-25903a30615,script=no,downscript=no,model=virtio-net-pci,mac=52:54:00:26:16:4d -device virtio-scsi-pci,id=scsi0 -drive file=/var/snap/multipass/common/data/multipassd/vault/instances/comic-whiting/jammy-server-cloudimg-amd64-disk1.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/snap/multipass/common/data/multipassd/vault/instances/comic-whiting/cloud-init-config.iso

If I remove `-nographic` from that command, I get the console window, which drops into the GRUB prompt soon after the initial "TianoCore" splash.

I don't remember seeing any error, but there may have been quick messages that I missed.

John Chittum (jchittum) wrote :

pinpointed to `livecd-rootfs` release 2.765.22 on Jammy:

Drop use of --removable flag to grub-install from
    live-build/buildd/hooks/02-disk-image-uefi.binary, to match the cloud
    images (7c760864fdcb278ca37396f06f5e3f297428d63d). This fixes
    bootloader updates in the buildd images, but also fixes compatibility
    with using devtmpfs for losetup.

https://git.launchpad.net/livecd-rootfs/commit/?h=ubuntu/jammy&id=5ac4df3a1a928b76e00c74647ad9d0fe30c007e7

This was reverted in `ubuntu/master` earlier because it caused images to fail to boot. For completeness, I tried a few different configurations in the code block to see whether anything would work:

1. Full revert of the commit. This worked.
2. Deleted just the `--removable` line. This failed to boot.
3. Kept `--removable` but deleted the `if` statement with `boot/efi`. This failed to boot.

I'm building the livecd-rootfs package now and will try to run a Launchpad build with it as well. The big complication here is that vorlon noticed a problem with the livecd-rootfs builders, as well as issues related to buildd. However, I had successful builds within a single day before the change was merged in, so the Launchpad builder issue doesn't appear to be preventing `buildd` builds on 22.04 as of 20230827.

Changed in cloud-images:
status: New → Confirmed
Changed in livecd-rootfs (Ubuntu):
status: New → Confirmed
Changed in cloud-images:
importance: Undecided → High
assignee: nobody → John Chittum (jchittum)
Changed in livecd-rootfs (Ubuntu):
assignee: nobody → John Chittum (jchittum)
Steve Langasek (vorlon) wrote :

The root cause here is that the environment when grub-install is called is apparently not set up correctly, and grub ends up installed to \EFI\debian on the ESP instead of to \EFI\ubuntu. So grub can't find its config at the expected location (which is built into the grub efi binary), and drops to a grub shell.
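To tell which case an image is in, one can check which vendor directory exists under `EFI/` on its ESP. This is a minimal sketch, assuming the ESP is already mounted somewhere readable; one way that is an assumption and was not used in this bug is libguestfs `guestmount -a IMAGE -i --ro MOUNTPOINT`, after which the ESP typically shows up under `MOUNTPOINT/boot/efi` if the guest's fstab mounts it there. The function name `check_esp` is hypothetical.

```shell
#!/bin/sh
# Hypothetical diagnostic: given a mounted ESP, list its EFI vendor
# directories and flag whether grub landed in EFI/ubuntu, where (per this
# bug) the Ubuntu grub binary expects to find its config.
check_esp() {
    esp=$1
    for d in "$esp"/EFI/*/; do
        # If EFI/ is empty or missing the glob stays literal; skip that case.
        [ -d "$d" ] && basename "$d"
    done
    if [ -d "$esp/EFI/ubuntu" ]; then
        echo "OK: EFI/ubuntu present"
    else
        echo "WARNING: no EFI/ubuntu; grub may drop to its shell"
    fi
}
```

On an affected image, the listing would be expected to show `debian` and the warning line instead of `ubuntu` and the OK line.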

Steve Langasek (vorlon)
summary: - Jammy buildd image doesn't boot
+ Jammy buildd image doesn't boot because grub is installed to \EFI\debian
+ instead of \EFI\ubuntu
Steve Langasek (vorlon) wrote :

The path on the ESP is constructed from GRUB_DISTRIBUTOR, which in /etc/default/grub is set to:

GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`

So first, 'Debian' is not a reasonable fallback on Ubuntu. The fallback should be Ubuntu.

And second, the lsb-release package is not guaranteed to be installed whenever grub-install is called, because grub2-common does not depend on it. It's part of the minimal seed, but the buildd images don't use the minimal seed; they subset it.

Rather than depending on lsb-release, however, we should just change this to use the not-deprecated /etc/os-release interface, e.g.:

GRUB_DISTRIBUTOR=`. /etc/os-release && echo $NAME || echo Ubuntu`
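The difference between the old and proposed expressions can be seen by simulating an image without lsb-release. A sketch under stated assumptions: the function names are hypothetical, `new_distributor` takes the os-release path as an argument so it can be pointed at a scratch file, and emptying `PATH` in a subshell stands in for lsb_release not being installed.

```shell
#!/bin/sh
# Old default from /etc/default/grub: needs lsb_release, else "Debian".
old_distributor() {
    lsb_release -i -s 2> /dev/null || echo Debian
}

# Proposed default: read NAME from an os-release file, else "Ubuntu".
# Runs in a subshell so sourcing the file does not leak variables.
new_distributor() {
    ( . "$1" && echo "$NAME" || echo Ubuntu )
}

# With PATH pointing nowhere, lsb_release cannot be found, as on a buildd
# image that subsets the minimal seed. echo is a shell builtin, so the
# fallback still runs.
echo "old: $( PATH=/nonexistent; old_distributor )"

osr=$(mktemp)
printf 'NAME="Ubuntu"\nID=ubuntu\n' > "$osr"
echo "new: $(new_distributor "$osr")"
rm -f "$osr"
```

This prints `old: Debian` and `new: Ubuntu`, which matches the `\EFI\debian` versus `\EFI\ubuntu` split described in this bug.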

Julian Andres Klode (juliank) wrote :

I don't plan to keep a delta, so I'm thinking

diff --git a/debian/default/grub b/debian/default/grub
index 03f98ec7f..5068a6566 100644
--- a/debian/default/grub
+++ b/debian/default/grub
@@ -5,7 +5,7 @@

 GRUB_DEFAULT=0
 GRUB_TIMEOUT=@DEFAULT_TIMEOUT@
-GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
+GRUB_DISTRIBUTOR=`. /etc/os-release && echo $NAME || echo @DPKG_VENDOR@`
 GRUB_CMDLINE_LINUX_DEFAULT="@DEFAULT_CMDLINE@"
 GRUB_CMDLINE_LINUX=""

diff --git a/debian/rules b/debian/rules
index 0da34cae0..5cfd8bf4e 100755
--- a/debian/rules
+++ b/debian/rules
@@ -145,6 +145,7 @@ endif
 # rebuild grub, need a programmatic way to get the vendor, as it's used by build-efi-images
 # to create the monolithic Grub image and thus is needed to create the partitions on the EFI
 # media. Add it to the control file user metadata: XB-Efi-Vendor: $vendor
+DPKG_VENDOR ?= $(shell dpkg-vendor --query vendor)
 SB_EFI_VENDOR ?= $(shell dpkg-vendor --query vendor | tr '[:upper:]' '[:lower:]')

 %:
@@ -355,6 +356,7 @@ platform_subst = \
        if [ -e debian/$(1) ]; then \
                debian/platform-subst \
                        PACKAGE="$(2)" \
+                       DPKG_VENDOR="$(DPKG_VENDOR)" \
                        DEFAULT_CMDLINE="$(DEFAULT_CMDLINE)" \
                        DEFAULT_TIMEOUT="$(DEFAULT_TIMEOUT)" \
                        DEFAULT_HIDDEN_TIMEOUT_BOOL="$(DEFAULT_HIDDEN_TIMEOUT_BOOL)" \

But maybe we should also just drop the dynamic lookup entirely, because it breaks expectations in some ways; e.g., shim can no longer find grub if you change GRUB_DISTRIBUTOR.
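For illustration, the effect of the `@DPKG_VENDOR@` placeholder in the diff can be approximated with `sed`. This is only a stand-in: it is not how `debian/platform-subst` works (its code isn't shown here), and the helper name `subst_vendor` is hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for the @DPKG_VENDOR@ substitution added to
# debian/rules; it mimics the result, not debian/platform-subst itself.
# subst_vendor VENDOR TEMPLATE: print TEMPLATE with the placeholder filled in.
subst_vendor() {
    vendor=$1
    template=$2
    # '|' is the sed delimiter because the template line contains '/'.
    sed "s|@DPKG_VENDOR@|$vendor|g" "$template"
}
```

On an Ubuntu build host, `subst_vendor "$(dpkg-vendor --query vendor)" debian/default/grub` would be expected to print a `GRUB_DISTRIBUTOR` line ending in `echo Ubuntu`, i.e. the vendor is fixed at package build time rather than looked up on the installed system.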

Steve Langasek (vorlon) wrote : Re: [Bug 2034253] Re: Jammy buildd image doesn't boot because grub is installed to \EFI\debian instead of \EFI\ubuntu

On Thu, Sep 07, 2023 at 10:21:05AM -0000, Julian Andres Klode wrote:
> +GRUB_DISTRIBUTOR=`. /etc/os-release && echo $NAME || echo @DPKG_VENDOR@`

Maybe you want

GRUB_DISTRIBUTOR=`. /etc/os-release; echo ${NAME:-@DPKG_VENDOR@}`

To handle the uncommon case of /etc/os-release exists but is broken (missing
NAME=)
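The uncommon case described here, an `/etc/os-release` that exists but lacks `NAME=`, can be reproduced against a scratch file. A sketch with hypothetical function names, using `Ubuntu` in place of `@DPKG_VENDOR@`; `NAME` is unset first so a value inherited from the caller's environment cannot mask the difference.

```shell
#!/bin/sh
# `&&`/`||` form: sourcing a file without NAME= still succeeds, so
# `echo $NAME` prints an empty line and the fallback never runs.
distributor_and_or() {
    ( unset NAME; . "$1" && echo $NAME || echo Ubuntu )
}

# ${NAME:-...} form: the default also covers an unset or empty NAME.
distributor_default() {
    ( unset NAME; . "$1"; echo "${NAME:-Ubuntu}" )
}

broken=$(mktemp)
printf 'ID=ubuntu\n' > "$broken"    # file exists, but NAME= is missing
echo "and-or:  '$(distributor_and_or "$broken")'"
echo "default: '$(distributor_default "$broken")'"
rm -f "$broken"
```

The first form yields an empty distributor, while the second falls back to `Ubuntu`.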

Julian Andres Klode (juliank) wrote :
Changed in grub2 (Ubuntu):
status: New → In Progress
Steve Langasek (vorlon)
Changed in livecd-rootfs (Ubuntu):
status: Confirmed → Invalid
John Chittum (jchittum)
description: updated
description: updated
Benjamin Drung (bdrung) wrote :

Better use ID from /etc/os-release instead.

https://www.freedesktop.org/software/systemd/man/os-release.html says: "NAME identifies the operating system, without a version component, and suitable for presentation to the user." and "ID is lower-case string (no spaces or other characters outside of 0–9, a–z, ".", "_" and "-") identifying the operating system, excluding any version information and suitable for processing by scripts or usage in generated filenames."

Julian Andres Klode (juliank) wrote :

That advice is incorrect, because GRUB_DISTRIBUTOR is the human-readable name. Since there is only one variable, grub also mangles it into a machine-readable name, but it is still meant to be human-readable.

The mangling is:

$(echo ${GRUB_DISTRIBUTOR} | tr 'A-Z' 'a-z' | cut -d' ' -f1 | LC_ALL=C sed 's,[^[:alnum:]_],_,g')
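Wrapped in a function, the mangling above shows how a distributor name presumably becomes the lower-case directory name under `EFI/`, which is consistent with the `Debian` fallback producing `\EFI\debian`. The function name is hypothetical; `$1` is quoted here, which only differs from the original pipeline for unusual inputs such as leading whitespace or glob characters.

```shell
#!/bin/sh
# mangle_distributor NAME: the mangling quoted above, as a function.
# Lower-case, keep only the first space-separated word, and replace any
# character that is not alphanumeric or '_' with '_'.
mangle_distributor() {
    echo "$1" | tr 'A-Z' 'a-z' | cut -d' ' -f1 | LC_ALL=C sed 's,[^[:alnum:]_],_,g'
}

for name in Ubuntu Debian 'Debian GNU/Linux'; do
    printf '%s -> %s\n' "$name" "$(mangle_distributor "$name")"
done
```

This prints `Ubuntu -> ubuntu`, `Debian -> debian` and `Debian GNU/Linux -> debian`.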

Christopher Townsend (townsend) wrote (last edit ):

Hi! Any movement on this? This is really affecting Multipass users that are trying to build snaps with Multipass on amd64 architecture.

Julian Andres Klode (juliank) wrote :

It is? I started working on the fix but forgot to include it in Monday's Debian release; we're still hunting down some UEFI fixes.

I was told weeks ago that we didn't really have to worry about it because a workaround was being applied, so of course I didn't prioritize this at all; I just planned to keep it for an SRU eventually, as a training exercise.

I see now the livecd-rootfs task has been marked Invalid, was this the planned workaround and has it been abandoned?

Julian Andres Klode (juliank) wrote :

Hmm, the task is Invalid, but the fix was merged?

Steve Langasek (vorlon) wrote :

There is another SRU in jammy-proposed that is currently awaiting verification. Working on expediting that so we can get it published to -updates and the fix for this reviewed for -proposed ASAP.

Christopher Townsend (townsend) wrote :

Great, thanks for the updates!

Andreas Hasenack (ahasenack) wrote :

Looking at this now. The other SRU was just released.

Andreas Hasenack (ahasenack) wrote : Please test proposed package

Hello Ricardo, or anyone else affected,

Accepted livecd-rootfs into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/livecd-rootfs/2.765.26 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in livecd-rootfs (Ubuntu Jammy):
status: New → Fix Committed
tags: added: verification-needed verification-needed-jammy
Changed in livecd-rootfs (Ubuntu Focal):
status: New → Fix Committed
tags: added: verification-needed-focal
Andreas Hasenack (ahasenack) wrote :

Hello Ricardo, or anyone else affected,

Accepted livecd-rootfs into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/livecd-rootfs/2.664.49 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Ricardo Abreu (ricab) wrote :

Hi Andreas, this issue prevents VMs from booting with certain disk images, so I believe I need an image to verify that the problem is fixed. Do you have any other testing procedure in mind?

John Chittum, can you perhaps herd things such that we get an image to test now? Thanks in advance!

John Chittum (jchittum) wrote :

I'll do verification of the working `livecd-rootfs` code, build an image, and test locally.

ricab, I can also post an image I build locally to a file share for you to test.

Ricardo Abreu (ricab) wrote :

Sounds good, thanks John.

John Chittum (jchittum) wrote :

I've verified Focal and Jammy. I've posted the test builds to my public file share so that anyone can verify them. The build and test steps are below; they should be reproducible by anyone (nothing Canonical-internal is needed).

https://people.canonical.com/~jchittum/buildd-lp2034253/

## Build and test steps

### Jammy
1. downloaded 2.765.26 source: https://launchpad.net/ubuntu/+source/livecd-rootfs/2.765.26
2. ran a build locally with Bartender to produce a buildd image. I ran the command from the directory above where the tarball unpacks (so the path passed to `--livecd-rootfs-dir` is the top level of the code)

https://github.com/ubuntu-bartenders/ubuntu-old-fashioned/tree/master/scripts/ubuntu-bartender

bartender --livecd-rootfs-dir ./livecd-rootfs --build-provider multipass -- --series jammy --project ubuntu-base --image-target all --subproject buildd

**NOTE: This build does not require any proprietary bits. I'm using the multipass provider, but there are many providers available. This command should be reproducible.

3. after a successful build, unpacked the finished product and launched it with qemu:

# ./ci-ssh-pub-set.iso is a cloud-init ISO built using cloud-image-utils (cloud-localds).
# /usr/share/OVMF/OVMF_CODE.fd is the local OVMF firmware for UEFI booting.
qemu-system-x86_64 \
-cpu host -machine type=q35,accel=kvm -m 1024 \
-nographic \
-snapshot \
-netdev id=net00,type=user,hostfwd=tcp::2222-:22 \
-device virtio-net-pci,netdev=net00 \
-drive if=virtio,format=qcow2,file=build.output/livecd.ubuntu-base.disk-linux-virtual.img \
-cdrom ./ci-ssh-pub-set.iso \
-bios /usr/share/OVMF/OVMF_CODE.fd

4. Watched the console and saw a successful boot. I was able to SSH into the node using my injected credentials (I added my public key with cloud-init).

### Focal
1. downloaded 2.664.49 source: https://launchpad.net/ubuntu/+source/livecd-rootfs/2.664.49
2. ran a build locally with Bartender to produce a buildd image. I ran the command from the directory above where the tarball unpacks (so the path passed to `--livecd-rootfs-dir` is the top level of the code)

bartender --livecd-rootfs-dir ./livecd-rootfs --build-provider multipass -- --series focal --project ubuntu-base --image-target all --subproject buildd

3. after a successful build, unpacked the finished product and launched it with qemu:

# ./ci-ssh-pub-set.iso is a cloud-init ISO built using cloud-image-utils (cloud-localds).
# /usr/share/OVMF/OVMF_CODE.fd is the local OVMF firmware for UEFI booting.
qemu-system-x86_64 \
-cpu host -machine type=q35,accel=kvm -m 1024 \
-nographic \
-snapshot \
-netdev id=net00,type=user,hostfwd=tcp::2222-:22 \
-device virtio-net-pci,netdev=net00 \
-drive if=virtio,format=qcow2,file=build.output/livecd.ubuntu-base.disk-linux-virtual.img \
-cdrom ./ci-ssh-pub-set.iso \
-bios /usr/share/OVMF/OVMF_CODE.fd

4. Watched the console and saw a successful boot. I was able to SSH into the node using my injected credentials (I added my public key with cloud-init).
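Step 4 is a manual check; since the SRU template notes that buildd images need more testing, it could be scripted by polling the forwarded SSH port until the guest answers. A hedged sketch: `retry` and `guest_ssh_ready` are hypothetical helpers, the `ubuntu` login name is an assumption about the image's default user, and port 2222 comes from the `hostfwd=tcp::2222-:22` option in the QEMU commands. An image that drops to the GRUB prompt never starts sshd, so the loop eventually gives up.

```shell
#!/bin/sh
# retry N CMD...: run CMD up to N times, sleeping RETRY_DELAY seconds
# (default 5) between attempts; succeed as soon as CMD succeeds.
retry() {
    n=$1; shift
    i=1
    while [ "$i" -le "$n" ]; do
        "$@" && return 0
        i=$((i + 1))
        [ "$i" -le "$n" ] && sleep "${RETRY_DELAY:-5}"
    done
    return 1
}

# Succeeds once sshd in the guest accepts the injected key.
# Assumes user "ubuntu" and the 2222 -> 22 port forward.
guest_ssh_ready() {
    ssh -p 2222 -o BatchMode=yes -o ConnectTimeout=5 \
        -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
        ubuntu@localhost true
}
```

With QEMU running in another terminal, `retry 60 guest_ssh_ready && echo "guest booted"` would wait up to roughly five minutes for a successful login.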

John Chittum (jchittum) wrote :

I've marked verification-done in relation to livecd-rootfs changes. I've published test images and build instructions for anyone else to verify.

tags: added: verification-done verification-done-focal verification-done-jammy
removed: verification-needed verification-needed-focal verification-needed-jammy
Christopher Townsend (townsend) wrote :

Hi!

Thanks John!

I have downloaded both published test images and have successfully booted these with Multipass. The issue appears to be fixed.

Robie Basak (racb) wrote : Update Released

The verification of the Stable Release Update for livecd-rootfs has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package livecd-rootfs - 2.664.49

---------------
livecd-rootfs (2.664.49) focal; urgency=medium

  * Address the missing GRUB_DISTRIBUTOR issue. LP: #2034253

livecd-rootfs (2.664.48) focal; urgency=medium

  * Drop use of --removable flag to grub-install from
    live-build/buildd/hooks/02-disk-image-uefi.binary, to match the cloud
    images (7c760864fdcb278ca37396f06f5e3f297428d63d). This fixes
    bootloader updates in the buildd images, but also fixes compatibility
    with using devtmpfs for losetup.

 -- jchittum <email address hidden> Fri, 08 Sep 2023 08:35:15 -0500

Changed in livecd-rootfs (Ubuntu Focal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package livecd-rootfs - 2.765.26

---------------
livecd-rootfs (2.765.26) jammy; urgency=medium

  * Set GRUB_DISTRIBUTION in 50-builddimg-settings.cfg to ensure
    EFI is installed in the right place (LP: #2034253)

 -- jchittum <email address hidden> Thu, 07 Sep 2023 13:51:33 -0500

Changed in livecd-rootfs (Ubuntu Jammy):
status: Fix Committed → Fix Released
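The variable discussed earlier in this bug is `GRUB_DISTRIBUTOR`; the changelog's `GRUB_DISTRIBUTION` presumably refers to it. The actual contents of `50-builddimg-settings.cfg` are not shown here, so the sketch below only illustrates the idea of pinning the value in an `/etc/default/grub.d/` drop-in so that it no longer depends on `lsb_release`. The helper name `pin_grub_distributor` and the root-directory parameter (so it can target an image chroot rather than the running system) are hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration, not the real livecd-rootfs hook: append a
# pinned GRUB_DISTRIBUTOR to a grub.d drop-in under ROOT.
# pin_grub_distributor ROOT [NAME]
pin_grub_distributor() {
    root=$1
    name=${2:-Ubuntu}
    mkdir -p "$root/etc/default/grub.d"
    # Append rather than overwrite, in case the file carries other settings.
    printf 'GRUB_DISTRIBUTOR="%s"\n' "$name" \
        >> "$root/etc/default/grub.d/50-builddimg-settings.cfg"
}
```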
John Chittum (jchittum) wrote :

Marking all cloud-images tasks as Fix Released, since new buildd images are confirmed to be building with the fixed code.

Changed in cloud-images:
status: Confirmed → Fix Released