race on boot between multiple invocations of grub-editenv

Bug #1910815 reported by Steve Dodd
20
This bug affects 2 people
Affects Status Importance Assigned to Milestone
grub2 (Ubuntu)
Fix Released
Undecided
Unassigned
Focal
Fix Released
Undecided
Unassigned
Groovy
Fix Released
Undecided
Unassigned
Hirsute
Fix Released
Undecided
Unassigned

Bug Description

[Impact]

 * two grub systemd units run in parallel.

 * ensure they are serialzed.

[Test Case]

 * boot on initrdless systems (such as modern/recent public cloud instances)

 * observe that grub-initrd-fallback.service is always after grub-common.service

 * observe that both are successful

this is easy to view using

$ sudo journalctl -u grub-initrd-fallback.service grub-common.service

the output should be in order, not interleaved, with grub-common first and then grub-initrd-fallback.

[Where problems could occur]

 * The boot is slightly slower as the two jobs are serialized, but they are not computationally intensive and shouldn't affect bootspeed at all, especially on multicore systems where other things are happening in parallel to these.

[Other Info]

 * Original bug report

On focal, it appears systemd can run /etc/init.d/grub-common in parallel with /lib/systemd/system/grub-initrd-fallback.service. Both of these invoke grub-editenv for different reasons, apparently resulting in race conditions that generate messages like this:

Jan 08 18:07:15 asr-host systemd[1]: Starting LSB: Record successful boot for GRUB...
Jan 08 18:07:15 asr-host systemd[1]: Starting GRUB failed boot detection...
[..]
Jan 08 18:07:15 asr-host grub-common[1822]: * Recording successful boot for GRUB
[..]
Jan 08 18:07:16 asr-host grub-editenv[1886]: /usr/bin/grub-editenv: error: cannot rename the file /boot/grub/grubenv.new to /boot/grub/grubenv.
Jan 08 18:07:16 asr-host systemd[1]: grub-initrd-fallback.service: Main process exited, code=exited, status=1/FAILURE
Jan 08 18:07:16 asr-host systemd[1]: grub-initrd-fallback.service: Failed with result 'exit-code'.
Jan 08 18:07:16 asr-host systemd[1]: Failed to start GRUB failed boot detection.
Jan 08 18:07:16 asr-host grub-common[1822]: ...done.
Jan 08 18:07:16 asr-host systemd[1]: Started LSB: Record successful boot for GRUB.

Google search for "Failed to start GRUB failed boot detection" throws up a few hits, which suggests this isn't necessarily something to weird about the machine I'm running on:

https://www.google.co.uk/search?q=%22Failed+to+start+GRUB+failed+boot+detection.%22

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: grub-common 2.04-1ubuntu26.7
ProcVersionSignature: Ubuntu 5.4.0-59.65-generic 5.4.78
Uname: Linux 5.4.0-59-generic x86_64
ApportVersion: 2.20.11-0ubuntu27.14
Architecture: amd64
CasperMD5CheckResult: skip
Date: Fri Jan 8 20:19:42 2021
ProcEnviron:
 TERM=screen.xterm-256color
 PATH=(custom, no user)
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
SourcePackage: grub2
UpgradeStatus: Upgraded to focal on 2020-12-23 (15 days ago)

Related branches

Revision history for this message
Steve Dodd (anarchetic) wrote :
Revision history for this message
Steve Dodd (anarchetic) wrote :

Fix might well be as simple as adding "After=grub-common.service" to /lib/systemd/system/grub-initrd-fallback.service - it's worked for one boot so far, which is obviously not a great test of a race condition :) It does seem to lead to sequential ordering of the two jobs, though:

Jan 08 20:31:26 asr-host systemd[1]: Starting LSB: Record successful boot for GRUB...
Jan 08 20:31:26 asr-host grub-common[1816]: * Recording successful boot for GRUB
Jan 08 20:31:27 asr-host grub-common[1816]: ...done.
Jan 08 20:31:27 asr-host systemd[1]: Started LSB: Record successful boot for GRUB.
Jan 08 20:31:27 asr-host systemd[1]: Starting GRUB failed boot detection...
Jan 08 20:31:27 asr-host systemd[1]: grub-initrd-fallback.service: Succeeded.
Jan 08 20:31:27 asr-host systemd[1]: Finished GRUB failed boot detection.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in grub2 (Ubuntu):
status: New → Confirmed
Revision history for this message
Cody Shepherd (codyshepherd) wrote :

This bug is also affecting Hirsute cloud images. The workaround mentioned in the comments does seem to alleviate the issue.

tags: added: rls-hh-incoming
Revision history for this message
Cody Shepherd (codyshepherd) wrote :

I've proposed an MP that lays down a systemd conf file in cloud images that implements the workaround described in this bug's top comment. This represents a temporary workaround for cloud images, and can be reverted if/when it's no longer needed.

Changed in grub2 (Ubuntu):
status: Confirmed → Fix Committed
tags: removed: rls-hh-incoming
Changed in grub2 (Ubuntu Focal):
status: New → Fix Committed
Changed in grub2 (Ubuntu Groovy):
status: New → In Progress
Changed in grub2 (Ubuntu Focal):
status: Fix Committed → In Progress
description: updated
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Steve, or anyone else affected,

Accepted grub2 into groovy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2/2.04-1ubuntu35.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-groovy to verification-done-groovy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-groovy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in grub2 (Ubuntu Groovy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-groovy
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Hello Steve, or anyone else affected,

Accepted grub2 into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2/2.04-1ubuntu26.9 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in grub2 (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed-focal
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (grub2/2.04-1ubuntu35.4)

All autopkgtests for the newly accepted grub2 (2.04-1ubuntu35.4) for groovy have finished running.
The following regressions have been reported in tests triggered by the package:

ubiquity/unknown (amd64)
grml2usb/unknown (amd64)
zsys/unknown (amd64)
grubzfs-testsuite/unknown (amd64)
ubuntu-image/unknown (amd64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/groovy/update_excuses.html#grub2

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

booted hirsute instance and observed the race:

$ journalctl -u grub-initrd-fallback -u grub-common
-- Journal begins at Tue 2021-02-16 18:28:59 UTC, ends at Tue 2021-02-16 18:29:30 UTC. --
Feb 16 18:29:06 ip-172-31-7-49 systemd[1]: Starting Record successful boot for GRUB...
Feb 16 18:29:06 ip-172-31-7-49 systemd[1]: Starting GRUB failed boot detection...
Feb 16 18:29:06 ip-172-31-7-49 sh[528]: grub-editenv: error: invalid environment block.
Feb 16 18:29:06 ip-172-31-7-49 systemd[1]: Started Record successful boot for GRUB.
Feb 16 18:29:06 ip-172-31-7-49 systemd[1]: grub-initrd-fallback.service: Succeeded.
Feb 16 18:29:06 ip-172-31-7-49 systemd[1]: Finished GRUB failed boot detection.

Upgraded to

$ dpkg-query -W grub-efi-arm64 grub-efi-arm64-signed grub-efi-arm64-bin
grub-efi-arm64 2.04-1ubuntu40
grub-efi-arm64-bin 2.04-1ubuntu40
grub-efi-arm64-signed 1.161+2.04-1ubuntu40

Rebooted twice

Observed

$ journalctl -b -u grub-initrd-fallback -u grub-common

Feb 16 18:33:22 ip-172-31-7-49 systemd[1]: Starting Record successful boot for GRUB...
Feb 16 18:33:22 ip-172-31-7-49 systemd[1]: Started Record successful boot for GRUB.
Feb 16 18:33:22 ip-172-31-7-49 systemd[1]: Starting GRUB failed boot detection...
Feb 16 18:33:22 ip-172-31-7-49 systemd[1]: grub-initrd-fallback.service: Succeeded.
Feb 16 18:33:22 ip-172-31-7-49 systemd[1]: Finished GRUB failed boot detection.

Notice how one job starting & started; before the next one does starting & finished. Thus they are are now run in sequence.

Rather than having two Starting messages back to back

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

$ dpkg-query -W grub-efi-arm64 grub-efi-arm64-bin grub-efi-arm64-signed
grub-efi-arm64 2.04-1ubuntu26.9
grub-efi-arm64-bin 2.04-1ubuntu26.9
grub-efi-arm64-signed 1.142.11+2.04-1ubuntu26.9

Before upgrade:

Feb 16 19:06:10 ip-172-31-7-182 systemd[1]: Starting LSB: Record successful boot for GRUB...
Feb 16 19:06:10 ip-172-31-7-182 systemd[1]: Starting GRUB failed boot detection...
Feb 16 19:06:10 ip-172-31-7-182 systemd[1]: grub-initrd-fallback.service: Succeeded.
Feb 16 19:06:10 ip-172-31-7-182 systemd[1]: Finished GRUB failed boot detection.
Feb 16 19:06:10 ip-172-31-7-182 grub-common[543]: * Recording successful boot for GRUB
Feb 16 19:06:10 ip-172-31-7-182 grub-common[543]: ...done.
Feb 16 19:06:10 ip-172-31-7-182 systemd[1]: Started LSB: Record successful boot for GRUB.

After upgrade:

Feb 16 19:08:44 ip-172-31-7-182 systemd[1]: Starting LSB: Record successful boot for GRUB...
Feb 16 19:08:44 ip-172-31-7-182 grub-common[485]: * Recording successful boot for GRUB
Feb 16 19:08:44 ip-172-31-7-182 grub-common[485]: ...done.
Feb 16 19:08:44 ip-172-31-7-182 systemd[1]: Started LSB: Record successful boot for GRUB.
Feb 16 19:08:44 ip-172-31-7-182 systemd[1]: Starting GRUB failed boot detection...
Feb 16 19:08:44 ip-172-31-7-182 systemd[1]: grub-initrd-fallback.service: Succeeded.
Feb 16 19:08:44 ip-172-31-7-182 systemd[1]: Finished GRUB failed boot detection.

tags: added: verification-done-focal
removed: verification-needed-focal
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Took groovy instance

-- Logs begin at Wed 2021-02-17 17:12:44 UTC, end at Wed 2021-02-17 17:16:09 UTC. --
Feb 17 17:12:52 ip-172-31-10-148 systemd[1]: Starting LSB: Record successful boot for GRUB...
Feb 17 17:12:52 ip-172-31-10-148 systemd[1]: Starting GRUB failed boot detection...

Boot badly.

Upgraded to:

$ dpkg-query -W | grep 2.04-1ubuntu35.4
grub-common 2.04-1ubuntu35.4
grub-efi-arm64 2.04-1ubuntu35.4
grub-efi-arm64-bin 2.04-1ubuntu35.4
grub2-common 2.04-1ubuntu35.4

Boots nice and in order

-- Reboot --
Feb 17 17:15:12 ip-172-31-10-148 systemd[1]: Starting LSB: Record successful boot for GRUB...
Feb 17 17:15:12 ip-172-31-10-148 grub-common[486]: * Recording successful boot for GRUB
Feb 17 17:15:12 ip-172-31-10-148 grub-common[486]: ...done.
Feb 17 17:15:12 ip-172-31-10-148 systemd[1]: Started LSB: Record successful boot for GRUB.
Feb 17 17:15:12 ip-172-31-10-148 systemd[1]: Starting GRUB failed boot detection...
Feb 17 17:15:13 ip-172-31-10-148 systemd[1]: grub-initrd-fallback.service: Succeeded.
Feb 17 17:15:13 ip-172-31-10-148 systemd[1]: Finished GRUB failed boot detection.

tags: added: verification-done-groovy
removed: verification-needed-groovy
tags: added: verification-done
removed: verification-needed
Revision history for this message
Dimitri John Ledkov (xnox) wrote :
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 2.04-1ubuntu35.4

---------------
grub2 (2.04-1ubuntu35.4) groovy; urgency=medium

  * Fix grub-initrd-fallback.service thanks to JawnSmith LP: #1910815

grub2 (2.04-1ubuntu35.3) groovy; urgency=medium

  * Revert: rhboot-f34-tcp-add-window-scaling-support.patch,
    rhboot-f34-support-non-ethernet.patch,
    ubuntu-fixup-rhboot-f34-support-non-ethernet.patch,
    ubuntu-fixup-rhboot-f34-support-non-ethernet-2.patch: these break MAAS
    LXD KVM pod deployments. LP: #1915288
  * Cherrypick fix crash in http LP: #1915288

 -- Dimitri John Ledkov <email address hidden> Fri, 12 Feb 2021 22:11:53 +0000

Changed in grub2 (Ubuntu Groovy):
status: Fix Committed → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for grub2 has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 2.04-1ubuntu26.9

---------------
grub2 (2.04-1ubuntu26.9) focal; urgency=medium

  * Revert: rhboot-f34-tcp-add-window-scaling-support.patch,
    rhboot-f34-support-non-ethernet.patch,
    ubuntu-fixup-rhboot-f34-support-non-ethernet.patch,
    ubuntu-fixup-rhboot-f34-support-non-ethernet-2.patch: these break MAAS
    LXD KVM pod deployments. LP: #1915288
  * Cherrypick fix crash in http LP: #1915288
  * Fix grub-initrd-fallback.service thanks to JawnSmith LP: #1910815

 -- Dimitri John Ledkov <email address hidden> Fri, 12 Feb 2021 22:03:32 +0000

Changed in grub2 (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 2.04-1ubuntu40

---------------
grub2 (2.04-1ubuntu40) hirsute; urgency=medium

  * Revert: rhboot-f34-tcp-add-window-scaling-support.patch,
    rhboot-f34-support-non-ethernet.patch,
    ubuntu-fixup-rhboot-f34-support-non-ethernet.patch,
    ubuntu-fixup-rhboot-f34-support-non-ethernet-2.patch: these break MAAS
    LXD KVM pod deployments. LP: #1915288

grub2 (2.04-1ubuntu39) hirsute; urgency=medium

  * Cherrypick a bunch of patches:
    - fix crash in http LP: #1915288
    - add bootp6 documentation
    - add support for UEFI boot protocols
    - use UEFI protocols for http & https networking
    - make netboot search for by-mac/by-uuid/by-ip for grub.cfg
    - update documentation for netboot search paths of grub.cfg
  * Make prebuilt netboot image look for MAAS grub.cfg
  * Fix grub-initrd-fallback.service thanks to JawnSmith LP: #1910815

grub2 (2.04-1ubuntu38) hirsute; urgency=medium

  [ Jean-Baptiste Lallement ]
  [ Didier Roche ]
  * Fix warnings during grub menu generation. Thanks wdoekes for the patch
    (LP: #1898177)
    - Fix warnings when bpool doesn't exist.
    - Fix warnings when snapshot name contains dashes.
  * Do not fail to generate grub menu when name of the snapshot contains
    spaces. (LP: #1903524)

 -- Dimitri John Ledkov <email address hidden> Fri, 12 Feb 2021 20:29:16 +0000

Changed in grub2 (Ubuntu Hirsute):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.