2023-09-18 22:03:49 |
Krister Johansen |
bug |
|
|
added bug |
2023-09-18 22:04:51 |
Krister Johansen |
bug |
|
|
added subscriber David Reaver |
2023-09-22 23:54:14 |
Launchpad Janitor |
e2fsprogs (Ubuntu): status |
New |
Confirmed |
|
2023-10-05 16:58:35 |
Dimitri John Ledkov |
tags |
patch patch-accepted-upstream |
patch patch-accepted-upstream rls-mm-incoming |
|
2023-10-05 17:01:00 |
Dimitri John Ledkov |
information type |
Public |
Public Security |
|
2023-10-05 17:01:19 |
Dimitri John Ledkov |
bug task added |
|
cloud-images |
|
2023-10-05 17:01:26 |
Dimitri John Ledkov |
cloud-images: importance |
Undecided |
Critical |
|
2023-10-05 18:18:30 |
Julian Andres Klode |
nominated for series |
|
Ubuntu Jammy |
|
2023-10-05 18:18:30 |
Julian Andres Klode |
bug task added |
|
e2fsprogs (Ubuntu Jammy) |
|
2023-10-05 18:18:30 |
Julian Andres Klode |
nominated for series |
|
Ubuntu Mantic |
|
2023-10-05 18:18:30 |
Julian Andres Klode |
bug task added |
|
e2fsprogs (Ubuntu Mantic) |
|
2023-10-05 18:18:30 |
Julian Andres Klode |
nominated for series |
|
Ubuntu Focal |
|
2023-10-05 18:18:30 |
Julian Andres Klode |
bug task added |
|
e2fsprogs (Ubuntu Focal) |
|
2023-10-05 18:23:07 |
Julian Andres Klode |
tags |
patch patch-accepted-upstream rls-mm-incoming |
foundations-todo patch patch-accepted-upstream |
|
2023-10-05 18:30:14 |
Julian Andres Klode |
bug |
|
|
added subscriber Julian Andres Klode |
2023-10-09 01:51:31 |
Matthew Ruffell |
bug |
|
|
added subscriber Matthew Ruffell |
2023-10-09 01:51:49 |
Matthew Ruffell |
nominated for series |
|
Ubuntu Lunar |
|
2023-10-09 01:51:49 |
Matthew Ruffell |
bug task added |
|
e2fsprogs (Ubuntu Lunar) |
|
2023-10-09 01:51:49 |
Matthew Ruffell |
nominated for series |
|
Ubuntu Trusty |
|
2023-10-09 01:51:49 |
Matthew Ruffell |
bug task added |
|
e2fsprogs (Ubuntu Trusty) |
|
2023-10-09 01:51:49 |
Matthew Ruffell |
nominated for series |
|
Ubuntu Bionic |
|
2023-10-09 01:51:49 |
Matthew Ruffell |
bug task added |
|
e2fsprogs (Ubuntu Bionic) |
|
2023-10-09 01:51:49 |
Matthew Ruffell |
nominated for series |
|
Ubuntu Xenial |
|
2023-10-09 01:51:49 |
Matthew Ruffell |
bug task added |
|
e2fsprogs (Ubuntu Xenial) |
|
2023-10-09 01:52:03 |
Matthew Ruffell |
e2fsprogs (Ubuntu Mantic): status |
Confirmed |
In Progress |
|
2023-10-09 01:52:05 |
Matthew Ruffell |
e2fsprogs (Ubuntu Lunar): status |
New |
In Progress |
|
2023-10-09 01:52:07 |
Matthew Ruffell |
e2fsprogs (Ubuntu Jammy): status |
New |
In Progress |
|
2023-10-09 01:52:09 |
Matthew Ruffell |
e2fsprogs (Ubuntu Focal): status |
New |
In Progress |
|
2023-10-09 01:52:11 |
Matthew Ruffell |
e2fsprogs (Ubuntu Bionic): status |
New |
In Progress |
|
2023-10-09 01:52:14 |
Matthew Ruffell |
e2fsprogs (Ubuntu Xenial): status |
New |
In Progress |
|
2023-10-09 01:52:15 |
Matthew Ruffell |
e2fsprogs (Ubuntu Trusty): status |
New |
In Progress |
|
2023-10-09 01:52:21 |
Matthew Ruffell |
e2fsprogs (Ubuntu Mantic): importance |
Undecided |
Critical |
|
2023-10-09 01:52:22 |
Matthew Ruffell |
e2fsprogs (Ubuntu Lunar): importance |
Undecided |
Critical |
|
2023-10-09 01:52:24 |
Matthew Ruffell |
e2fsprogs (Ubuntu Jammy): importance |
Undecided |
Critical |
|
2023-10-09 01:52:25 |
Matthew Ruffell |
e2fsprogs (Ubuntu Focal): importance |
Undecided |
Critical |
|
2023-10-09 01:52:27 |
Matthew Ruffell |
e2fsprogs (Ubuntu Bionic): importance |
Undecided |
Critical |
|
2023-10-09 01:52:28 |
Matthew Ruffell |
e2fsprogs (Ubuntu Xenial): importance |
Undecided |
Critical |
|
2023-10-09 01:52:30 |
Matthew Ruffell |
e2fsprogs (Ubuntu Trusty): importance |
Undecided |
Critical |
|
2023-10-09 01:52:32 |
Matthew Ruffell |
e2fsprogs (Ubuntu Mantic): assignee |
|
Matthew Ruffell (mruffell) |
|
2023-10-09 01:52:35 |
Matthew Ruffell |
e2fsprogs (Ubuntu Lunar): assignee |
|
Matthew Ruffell (mruffell) |
|
2023-10-09 01:52:37 |
Matthew Ruffell |
e2fsprogs (Ubuntu Jammy): assignee |
|
Matthew Ruffell (mruffell) |
|
2023-10-09 01:52:40 |
Matthew Ruffell |
e2fsprogs (Ubuntu Focal): assignee |
|
Matthew Ruffell (mruffell) |
|
2023-10-09 01:52:42 |
Matthew Ruffell |
e2fsprogs (Ubuntu Bionic): assignee |
|
Matthew Ruffell (mruffell) |
|
2023-10-09 01:52:44 |
Matthew Ruffell |
e2fsprogs (Ubuntu Xenial): assignee |
|
Matthew Ruffell (mruffell) |
|
2023-10-09 01:52:48 |
Matthew Ruffell |
e2fsprogs (Ubuntu Trusty): assignee |
|
Matthew Ruffell (mruffell) |
|
2023-10-09 02:21:33 |
Matthew Ruffell |
attachment added |
|
Debdiff for e2fsprogs on mantic https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707893/+files/lp2036467_mantic.debdiff |
|
2023-10-09 02:22:01 |
Matthew Ruffell |
attachment added |
|
Debdiff for e2fsprogs on lunar https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707894/+files/lp2036467_lunar.debdiff |
|
2023-10-09 02:22:26 |
Matthew Ruffell |
attachment added |
|
Debdiff for e2fsprogs on jammy https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707895/+files/lp2036467_jammy.debdiff |
|
2023-10-09 02:22:56 |
Matthew Ruffell |
attachment added |
|
Debdiff for e2fsprogs on focal https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707896/+files/lp2036467_focal.debdiff |
|
2023-10-09 02:24:08 |
Matthew Ruffell |
attachment added |
|
Debdiff for e2fsprogs on bionic https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707898/+files/lp2036467_bionic.debdiff |
|
2023-10-09 02:24:39 |
Matthew Ruffell |
attachment added |
|
Debdiff for e2fsprogs on xenial https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707899/+files/lp2036467_xenial.debdiff |
|
2023-10-09 02:25:06 |
Matthew Ruffell |
attachment added |
|
Debdiff for e2fsprogs on trusty https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707900/+files/lp2036467_trusty.debdiff |
|
2023-10-09 02:47:34 |
Matthew Ruffell |
summary |
superblock checksum mismatch in resize2fs |
Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs |
|
2023-10-09 02:47:53 |
Matthew Ruffell |
description |
Hi,
We run ext4 on EBS volumes on EC2. During provisioning, cloud-init will occasionally report that resize2fs has failed due to a superblock checksum mismatch. We debugged this internally, and were able to come up with the following reproducer:
#!/usr/bin/bash
set -euxo pipefail
while true
do
    parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
    sleep .5
    mkfs.ext4 /dev/nvme1n1p1
    mount -t ext4 /dev/nvme1n1p1 /mnt
    stress-ng --temp-path /mnt -D 4 &
    STRESS_PID=$!
    sleep 1
    growpart /dev/nvme1n1 1
    resize2fs /dev/nvme1n1p1
    kill $STRESS_PID
    wait $STRESS_PID
    umount /mnt
    wipefs -a /dev/nvme1n1p1
    wipefs -a /dev/nvme1n1
done
(This was on a 60 GB gp3 volume attached to a c5.4xlarge.)
We found a fix that works and got the patch accepted upstream. In short, once the superblock read is switched to Direct I/O, we no longer see the problem.
The patch is available here, but hasn't been published in a released version of e2fsprogs:
https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84
A longer thread with the maintainer is available here:
https://lore.kernel.org/linux-ext4/20230609042239.GA1436857@mit.edu/
This bug report requests that Ubuntu backport this patch to the e2fsprogs versions shipped in the releases available as images on AWS, preferably Focal and Jammy. |
[Impact]
This is a long-running bug plaguing cloud-images: on rare occasions resize2fs fails and the image is not resized to fit the entire disk.
Online resizes would fail due to a superblock checksum mismatch, where the superblock in memory differs from what is currently on disk due to changes made to the image.
Changing the read of the superblock to Direct I/O solves the issue.
[Testcase]
Start a c5.large instance on AWS, and attach a 60 GB gp3 volume for use as a scratch disk.
Run the following script, courtesy of Krister Johansen and his team:
#!/usr/bin/bash
set -euxo pipefail
while true
do
    parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
    sleep .5
    mkfs.ext4 /dev/nvme1n1p1
    mount -t ext4 /dev/nvme1n1p1 /mnt
    stress-ng --temp-path /mnt -D 4 &
    STRESS_PID=$!
    sleep 1
    growpart /dev/nvme1n1 1
    resize2fs /dev/nvme1n1p1
    kill $STRESS_PID
    wait $STRESS_PID
    umount /mnt
    wipefs -a /dev/nvme1n1p1
    wipefs -a /dev/nvme1n1
done
Test packages are available in the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test
If you install the test packages, the race no longer occurs.
[Where problems could occur]
We are changing how resize2fs reads the superblock from underlying disks.
If a regression were to occur, resize2fs could fail to resize offline or online volumes. As all cloud-images are resized online during their initial boot, a regression could have a large impact on public and private clouds.
[Other info]
Upstream mailing list discussion:
https://lore.kernel.org/linux-ext4/20230605225221.GA5737@templeofstupid.com/
https://lore.kernel.org/linux-ext4/20230609042239.GA1436857@mit.edu/
This was fixed in the below commit upstream:
commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
Author: Theodore Ts'o <tytso@mit.edu>
Date: Thu, 15 Jun 2023 00:17:01 -0400
Subject: resize2fs: use Direct I/O when reading the superblock for
online resizes
Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84
The commit has not been tagged in any release. All supported Ubuntu releases require this fix, and it needs to be published in the standard non-ESM archives to be picked up in cloud images. |
|
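The description above attributes the fix to reading the superblock with Direct I/O rather than through the page cache. As a rough illustration only (this is not code from the upstream patch), the sketch below reads the first 4096 bytes of a device either buffered or with O_DIRECT and prints the two-byte s_magic field, which sits 56 bytes into the primary superblock that starts at byte 1024 and reads 53ef on ext2/3/4. The helper name sb_magic is hypothetical, and the sketch assumes GNU dd (for iflag=direct) and od.

```shell
# Hypothetical helper, not part of e2fsprogs.
# Usage: sb_magic DEVICE [buffered|direct]
# Prints the s_magic bytes of the primary superblock as hex (expected: 53ef).
sb_magic() {
    dev=$1
    mode=${2:-buffered}
    direct=""
    # iflag=direct makes GNU dd open the input with O_DIRECT,
    # bypassing the page cache for this read.
    if [ "$mode" = "direct" ]; then
        direct="iflag=direct"
    fi
    # Read one 4096-byte block so the request stays aligned for O_DIRECT,
    # then pick out bytes 1080-1081 (1024 + 56), where s_magic is stored.
    # $direct is intentionally unquoted so it vanishes when empty.
    dd if="$dev" bs=4096 count=1 $direct 2>/dev/null |
        od -An -tx1 -j1080 -N2 | tr -d ' \n'
}
```

On a mounted filesystem, comparing `sb_magic /dev/nvme1n1p1 buffered` with `sb_magic /dev/nvme1n1p1 direct` only shows how the two read paths are selected; the magic field itself does not change, so this does not reproduce the checksum race.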
2023-10-09 02:48:18 |
Matthew Ruffell |
tags |
foundations-todo patch patch-accepted-upstream |
foundations-todo patch patch-accepted-upstream sts |
|
2023-10-09 10:57:20 |
Julian Andres Klode |
e2fsprogs (Ubuntu Trusty): status |
In Progress |
Won't Fix |
|
2023-10-09 10:57:23 |
Julian Andres Klode |
e2fsprogs (Ubuntu Xenial): status |
In Progress |
Won't Fix |
|
2023-10-09 13:07:11 |
Andreas Hasenack |
bug |
|
|
added subscriber Andreas Hasenack |
2023-10-12 03:42:46 |
Matthew Ruffell |
description |
[Impact]
This is a long-running bug plaguing cloud-images: on rare occasions resize2fs fails and the image is not resized to fit the entire disk.
Online resizes would fail due to a superblock checksum mismatch, where the superblock in memory differs from what is currently on disk due to changes made to the image.
Changing the read of the superblock to Direct I/O solves the issue.
[Testcase]
Start a c5.large instance on AWS, and attach a 60 GB gp3 volume for use as a scratch disk.
Run the following script, courtesy of Krister Johansen and his team:
#!/usr/bin/bash
set -euxo pipefail
while true
do
    parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
    sleep .5
    mkfs.ext4 /dev/nvme1n1p1
    mount -t ext4 /dev/nvme1n1p1 /mnt
    stress-ng --temp-path /mnt -D 4 &
    STRESS_PID=$!
    sleep 1
    growpart /dev/nvme1n1 1
    resize2fs /dev/nvme1n1p1
    kill $STRESS_PID
    wait $STRESS_PID
    umount /mnt
    wipefs -a /dev/nvme1n1p1
    wipefs -a /dev/nvme1n1
done
Test packages are available in the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test
If you install the test packages, the race no longer occurs.
[Where problems could occur]
We are changing how resize2fs reads the superblock from underlying disks.
If a regression were to occur, resize2fs could fail to resize offline or online volumes. As all cloud-images are resized online during their initial boot, a regression could have a large impact on public and private clouds.
[Other info]
Upstream mailing list discussion:
https://lore.kernel.org/linux-ext4/20230605225221.GA5737@templeofstupid.com/
https://lore.kernel.org/linux-ext4/20230609042239.GA1436857@mit.edu/
This was fixed in the below commit upstream:
commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
Author: Theodore Ts'o <tytso@mit.edu>
Date: Thu, 15 Jun 2023 00:17:01 -0400
Subject: resize2fs: use Direct I/O when reading the superblock for
online resizes
Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84
The commit has not been tagged in any release. All supported Ubuntu releases require this fix, and it needs to be published in the standard non-ESM archives to be picked up in cloud images. |
[Impact]
This is a long-running bug plaguing cloud-images: on rare occasions resize2fs fails and the image is not resized to fit the entire disk.
Online resizes would fail due to a superblock checksum mismatch, where the superblock in memory differs from what is currently on disk due to changes made to the image.
$ resize2fs /dev/nvme1n1p1
resize2fs 1.47.0 (5-Feb-2023)
resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
Couldn't find valid filesystem superblock.
Changing the read of the superblock to Direct I/O solves the issue.
[Testcase]
Start a c5.large instance on AWS, and attach a 60 GB gp3 volume for use as a scratch disk.
Run the following script, courtesy of Krister Johansen and his team:
#!/usr/bin/bash
set -euxo pipefail
while true
do
    parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
    sleep .5
    mkfs.ext4 /dev/nvme1n1p1
    mount -t ext4 /dev/nvme1n1p1 /mnt
    stress-ng --temp-path /mnt -D 4 &
    STRESS_PID=$!
    sleep 1
    growpart /dev/nvme1n1 1
    resize2fs /dev/nvme1n1p1
    kill $STRESS_PID
    wait $STRESS_PID
    umount /mnt
    wipefs -a /dev/nvme1n1p1
    wipefs -a /dev/nvme1n1
done
Test packages are available in the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test
If you install the test packages, the race no longer occurs.
[Where problems could occur]
We are changing how resize2fs reads the superblock from underlying disks.
If a regression were to occur, resize2fs could fail to resize offline or online volumes. As all cloud-images are resized online during their initial boot, a regression could have a large impact on public and private clouds.
[Other info]
Upstream mailing list discussion:
https://lore.kernel.org/linux-ext4/20230605225221.GA5737@templeofstupid.com/
https://lore.kernel.org/linux-ext4/20230609042239.GA1436857@mit.edu/
This was fixed in the below commit upstream:
commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
Author: Theodore Ts'o <tytso@mit.edu>
Date: Thu, 15 Jun 2023 00:17:01 -0400
Subject: resize2fs: use Direct I/O when reading the superblock for
online resizes
Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84
The commit has not been tagged in any release. All supported Ubuntu releases require this fix, and it needs to be published in the standard non-ESM archives to be picked up in cloud images. |
|
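The reproducer in the description above runs under set -e, so it stops at the first resize2fs failure of any kind. When checking a candidate package it can help to tell the checksum race apart from unrelated failures. The sketch below is one possible way to do that, matching on the error text quoted in the description. The function names and the RESIZE2FS override variable are hypothetical and exist only for illustration and testing.

```shell
# Hypothetical helper: succeeds if the given text contains the
# superblock checksum error quoted in the bug description.
is_sb_checksum_error() {
    printf '%s\n' "$1" |
        grep -qF 'Superblock checksum does not match superblock'
}

# Hypothetical wrapper. Usage: run_resize DEVICE
# Returns 0 on success, 2 if resize2fs failed with the checksum
# mismatch, and 1 for any other failure.
# RESIZE2FS may name an alternative command (assumption, for testing).
run_resize() {
    out=$(${RESIZE2FS:-resize2fs} "$1" 2>&1) && return 0
    if is_sb_checksum_error "$out"; then
        return 2
    fi
    return 1
}
```

In the reproducer loop, replacing the bare resize2fs call with `rc=0; run_resize /dev/nvme1n1p1 || rc=$?` would let the script count iterations where rc is 2 instead of exiting on the first failure.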
2023-10-12 03:42:54 |
Matthew Ruffell |
e2fsprogs (Ubuntu Bionic): status |
In Progress |
Won't Fix |
|
2023-10-17 12:47:02 |
Philip Roche |
bug |
|
|
added subscriber Philip Roche |