Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
cloud-images | New | Critical | Unassigned |
e2fsprogs (Ubuntu) | Status tracked in Mantic | | |
Trusty | Won't Fix | Critical | Matthew Ruffell |
Xenial | Won't Fix | Critical | Matthew Ruffell |
Bionic | Won't Fix | Critical | Matthew Ruffell |
Focal | In Progress | Critical | Matthew Ruffell |
Jammy | In Progress | Critical | Matthew Ruffell |
Lunar | In Progress | Critical | Matthew Ruffell |
Mantic | In Progress | Critical | Matthew Ruffell |
Bug Description
[Impact]
This is a long-running bug affecting cloud-images: on rare occasions resize2fs fails and the filesystem is not grown to fill the entire disk.
Online resizes fail with a superblock checksum mismatch because the superblock that resize2fs reads into memory differs from what is currently on disk, as the mounted filesystem is being modified while the read takes place.
$ resize2fs /dev/nvme1n1p1
resize2fs 1.47.0 (5-Feb-2023)
resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
Couldn't find valid filesystem superblock.
Changing the read of the superblock to Direct I/O solves the issue.
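For illustration, the difference between a buffered and a direct read can be observed from the shell. The sketch below is not the resize2fs change itself. `sb_magic` is a hypothetical helper that reads the first 4 KiB of a device or image with `dd`, optionally with `iflag=direct`, and prints the two magic bytes at byte offset 1080 (the superblock starts at byte 1024 and `s_magic` sits at offset 56 within it), which are `53 ef` on ext2/3/4.

```shell
# Print the two s_magic bytes of an ext2/3/4 superblock as hex.
# Usage: sb_magic DEVICE_OR_IMAGE [direct]
# "direct" asks dd to open the input with O_DIRECT, bypassing the page cache.
sb_magic() {
    local dev="$1"
    local flag=""
    if [ "${2:-buffered}" = "direct" ]; then
        flag="iflag=direct"
    fi
    # Read one aligned 4 KiB block, then pick bytes 1080-1081 out of it.
    dd if="$dev" bs=4096 count=1 ${flag:+"$flag"} 2>/dev/null |
        od -An -tx1 -j1080 -N2 | tr -d ' \n'
    echo
}
```

On the scratch disk from the test case this would be called as `sb_magic /dev/nvme1n1p1` or `sb_magic /dev/nvme1n1p1 direct`. A direct read can fail on storage that does not support O_DIRECT, in which case the helper prints an empty line.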
[Testcase]
Start a c5.large instance on AWS and attach a 60 GB gp3 volume for use as a scratch disk.
Run the following script, courtesy of Krister Johansen and his team:
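#!/usr/bin/bash
# Repeatedly create, mount, grow and online-resize an ext4 filesystem under
# I/O load until resize2fs hits the superblock checksum race.
set -euxo pipefail
while true
do
    # Create a fresh GPT label with a partition of roughly 1 GiB.
    parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
    sleep .5
    # Assumed steps, not present in this listing: mkfs.ext4, the background
    # load that sets STRESS_PID, and the resize2fs call further down.
    mkfs.ext4 /dev/nvme1n1p1
    mount -t ext4 /dev/nvme1n1p1 /mnt
    # Any writer on /mnt should do; stress-ng is one option.
    stress-ng --temp-path /mnt -D 4 &
    STRESS_PID=$!
    sleep 1
    # Grow the partition, then resize the mounted filesystem (the racy step).
    growpart /dev/nvme1n1 1
    resize2fs /dev/nvme1n1p1
    kill "$STRESS_PID"
    wait "$STRESS_PID"
    umount /mnt
    # Remove filesystem and partition-table signatures before the next round.
    wipefs -a /dev/nvme1n1p1
    wipefs -a /dev/nvme1n1
done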
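When a reproducer like this runs unattended, it can help to tell the checksum race apart from unrelated failures. A minimal sketch, assuming the resize2fs output has been captured into a variable; `is_checksum_race` is a hypothetical helper name, and the matched string is the error message quoted under [Impact]:

```shell
# Return 0 if the given resize2fs output contains the superblock checksum
# error from this bug, non-zero otherwise.
is_checksum_race() {
    printf '%s\n' "$1" | grep -qF 'Superblock checksum does not match superblock'
}
```

For example, `out=$(resize2fs /dev/nvme1n1p1 2>&1) || is_checksum_race "$out"` lets a loop record whether a failed resize was this race or something else.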
Test packages are available in the following ppa:
https:/
If you install the test packages, the race no longer occurs.
[Where problems could occur]
We are changing how resize2fs reads the superblock from underlying disks.
If a regression were to occur, resize2fs could fail to resize offline or online volumes. As all cloud-images are resized online during their initial boot, a regression could have a large impact on public and private clouds.
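One way to spot such a regression after first boot is to compare the filesystem size with the size of the partition it lives on. The sketch below is an assumption about how that check could look, not part of the fix. `fs_bytes` is a hypothetical helper that computes the filesystem size from the "Block count" and "Block size" fields printed by `dumpe2fs -h`.

```shell
# Read `dumpe2fs -h` output on stdin and print the filesystem size in bytes.
# Exits non-zero if either field is missing.
fs_bytes() {
    awk -F: '
        /^Block count:/ { count = $2 + 0 }
        /^Block size:/  { size  = $2 + 0 }
        END {
            if (count && size) printf "%.0f\n", count * size
            else exit 1
        }
    '
}
```

The result of `dumpe2fs -h /dev/nvme1n1p1 2>/dev/null | fs_bytes` can then be compared with `blockdev --getsize64 /dev/nvme1n1p1`. After a successful resize the two values should be close, while a filesystem still at its original image size points to a failed resize.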
[Other info]
Upstream mailing list discussion:
https://<email address hidden>/
https://<email address hidden>/
This was fixed in the below commit upstream:
commit 43a498e93888795
Author: Theodore Ts'o <email address hidden>
Date: Thu, 15 Jun 2023 00:17:01 -0400
Subject: resize2fs: use Direct I/O when reading the superblock for online resizes
Link: https:/
The commit has not yet been included in a tagged release. All supported Ubuntu releases require this fix, and the fixed packages need to be published in the standard non-ESM archives so that cloud images pick them up.
Changed in e2fsprogs (Ubuntu Mantic): | |
status: | Confirmed → In Progress |
Changed in e2fsprogs (Ubuntu Lunar): | |
status: | New → In Progress |
Changed in e2fsprogs (Ubuntu Jammy): | |
status: | New → In Progress |
Changed in e2fsprogs (Ubuntu Focal): | |
status: | New → In Progress |
Changed in e2fsprogs (Ubuntu Bionic): | |
status: | New → In Progress |
Changed in e2fsprogs (Ubuntu Xenial): | |
status: | New → In Progress |
Changed in e2fsprogs (Ubuntu Trusty): | |
status: | New → In Progress |
Changed in e2fsprogs (Ubuntu Mantic): | |
importance: | Undecided → Critical |
Changed in e2fsprogs (Ubuntu Lunar): | |
importance: | Undecided → Critical |
Changed in e2fsprogs (Ubuntu Jammy): | |
importance: | Undecided → Critical |
Changed in e2fsprogs (Ubuntu Focal): | |
importance: | Undecided → Critical |
Changed in e2fsprogs (Ubuntu Bionic): | |
importance: | Undecided → Critical |
Changed in e2fsprogs (Ubuntu Xenial): | |
importance: | Undecided → Critical |
Changed in e2fsprogs (Ubuntu Trusty): | |
importance: | Undecided → Critical |
Changed in e2fsprogs (Ubuntu Mantic): | |
assignee: | nobody → Matthew Ruffell (mruffell) |
Changed in e2fsprogs (Ubuntu Lunar): | |
assignee: | nobody → Matthew Ruffell (mruffell) |
Changed in e2fsprogs (Ubuntu Jammy): | |
assignee: | nobody → Matthew Ruffell (mruffell) |
Changed in e2fsprogs (Ubuntu Focal): | |
assignee: | nobody → Matthew Ruffell (mruffell) |
Changed in e2fsprogs (Ubuntu Bionic): | |
assignee: | nobody → Matthew Ruffell (mruffell) |
Changed in e2fsprogs (Ubuntu Xenial): | |
assignee: | nobody → Matthew Ruffell (mruffell) |
Changed in e2fsprogs (Ubuntu Trusty): | |
assignee: | nobody → Matthew Ruffell (mruffell) |
summary: |
- superblock checksum mismatch in resize2fs
+ Resizing cloud-images occasionally fails due to superblock checksum
+ mismatch in resize2fs
description: | updated |
tags: | added: sts |
description: | updated |
Changed in e2fsprogs (Ubuntu Bionic): | |
status: | In Progress → Won't Fix |
Status changed to 'Confirmed' because the bug affects multiple users.