Activity log for bug #2036467

Date Who What changed Old value New value Message
2023-09-18 22:03:49 Krister Johansen bug added bug
2023-09-18 22:04:51 Krister Johansen bug added subscriber David Reaver
2023-09-22 23:54:14 Launchpad Janitor e2fsprogs (Ubuntu): status New Confirmed
2023-10-05 16:58:35 Dimitri John Ledkov tags patch patch-accepted-upstream patch patch-accepted-upstream rls-mm-incoming
2023-10-05 17:01:00 Dimitri John Ledkov information type Public Public Security
2023-10-05 17:01:19 Dimitri John Ledkov bug task added cloud-images
2023-10-05 17:01:26 Dimitri John Ledkov cloud-images: importance Undecided Critical
2023-10-05 18:18:30 Julian Andres Klode nominated for series Ubuntu Jammy
2023-10-05 18:18:30 Julian Andres Klode bug task added e2fsprogs (Ubuntu Jammy)
2023-10-05 18:18:30 Julian Andres Klode nominated for series Ubuntu Mantic
2023-10-05 18:18:30 Julian Andres Klode bug task added e2fsprogs (Ubuntu Mantic)
2023-10-05 18:18:30 Julian Andres Klode nominated for series Ubuntu Focal
2023-10-05 18:18:30 Julian Andres Klode bug task added e2fsprogs (Ubuntu Focal)
2023-10-05 18:23:07 Julian Andres Klode tags patch patch-accepted-upstream rls-mm-incoming foundations-todo patch patch-accepted-upstream
2023-10-05 18:30:14 Julian Andres Klode bug added subscriber Julian Andres Klode
2023-10-09 01:51:31 Matthew Ruffell bug added subscriber Matthew Ruffell
2023-10-09 01:51:49 Matthew Ruffell nominated for series Ubuntu Lunar
2023-10-09 01:51:49 Matthew Ruffell bug task added e2fsprogs (Ubuntu Lunar)
2023-10-09 01:51:49 Matthew Ruffell nominated for series Ubuntu Trusty
2023-10-09 01:51:49 Matthew Ruffell bug task added e2fsprogs (Ubuntu Trusty)
2023-10-09 01:51:49 Matthew Ruffell nominated for series Ubuntu Bionic
2023-10-09 01:51:49 Matthew Ruffell bug task added e2fsprogs (Ubuntu Bionic)
2023-10-09 01:51:49 Matthew Ruffell nominated for series Ubuntu Xenial
2023-10-09 01:51:49 Matthew Ruffell bug task added e2fsprogs (Ubuntu Xenial)
2023-10-09 01:52:03 Matthew Ruffell e2fsprogs (Ubuntu Mantic): status Confirmed In Progress
2023-10-09 01:52:05 Matthew Ruffell e2fsprogs (Ubuntu Lunar): status New In Progress
2023-10-09 01:52:07 Matthew Ruffell e2fsprogs (Ubuntu Jammy): status New In Progress
2023-10-09 01:52:09 Matthew Ruffell e2fsprogs (Ubuntu Focal): status New In Progress
2023-10-09 01:52:11 Matthew Ruffell e2fsprogs (Ubuntu Bionic): status New In Progress
2023-10-09 01:52:14 Matthew Ruffell e2fsprogs (Ubuntu Xenial): status New In Progress
2023-10-09 01:52:15 Matthew Ruffell e2fsprogs (Ubuntu Trusty): status New In Progress
2023-10-09 01:52:21 Matthew Ruffell e2fsprogs (Ubuntu Mantic): importance Undecided Critical
2023-10-09 01:52:22 Matthew Ruffell e2fsprogs (Ubuntu Lunar): importance Undecided Critical
2023-10-09 01:52:24 Matthew Ruffell e2fsprogs (Ubuntu Jammy): importance Undecided Critical
2023-10-09 01:52:25 Matthew Ruffell e2fsprogs (Ubuntu Focal): importance Undecided Critical
2023-10-09 01:52:27 Matthew Ruffell e2fsprogs (Ubuntu Bionic): importance Undecided Critical
2023-10-09 01:52:28 Matthew Ruffell e2fsprogs (Ubuntu Xenial): importance Undecided Critical
2023-10-09 01:52:30 Matthew Ruffell e2fsprogs (Ubuntu Trusty): importance Undecided Critical
2023-10-09 01:52:32 Matthew Ruffell e2fsprogs (Ubuntu Mantic): assignee Matthew Ruffell (mruffell)
2023-10-09 01:52:35 Matthew Ruffell e2fsprogs (Ubuntu Lunar): assignee Matthew Ruffell (mruffell)
2023-10-09 01:52:37 Matthew Ruffell e2fsprogs (Ubuntu Jammy): assignee Matthew Ruffell (mruffell)
2023-10-09 01:52:40 Matthew Ruffell e2fsprogs (Ubuntu Focal): assignee Matthew Ruffell (mruffell)
2023-10-09 01:52:42 Matthew Ruffell e2fsprogs (Ubuntu Bionic): assignee Matthew Ruffell (mruffell)
2023-10-09 01:52:44 Matthew Ruffell e2fsprogs (Ubuntu Xenial): assignee Matthew Ruffell (mruffell)
2023-10-09 01:52:48 Matthew Ruffell e2fsprogs (Ubuntu Trusty): assignee Matthew Ruffell (mruffell)
2023-10-09 02:21:33 Matthew Ruffell attachment added Debdiff for e2fsprogs on mantic https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707893/+files/lp2036467_mantic.debdiff
2023-10-09 02:22:01 Matthew Ruffell attachment added Debdiff for e2fsprogs on lunar https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707894/+files/lp2036467_lunar.debdiff
2023-10-09 02:22:26 Matthew Ruffell attachment added Debdiff for e2fsprogs on jammy https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707895/+files/lp2036467_jammy.debdiff
2023-10-09 02:22:56 Matthew Ruffell attachment added Debdiff for e2fsprogs on focal https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707896/+files/lp2036467_focal.debdiff
2023-10-09 02:24:08 Matthew Ruffell attachment added Debdiff for e2fsprogs on bionic https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707898/+files/lp2036467_bionic.debdiff
2023-10-09 02:24:39 Matthew Ruffell attachment added Debdiff for e2fsprogs on xenial https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707899/+files/lp2036467_xenial.debdiff
2023-10-09 02:25:06 Matthew Ruffell attachment added Debdiff for e2fsprogs on trusty https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707900/+files/lp2036467_trusty.debdiff
2023-10-09 02:47:34 Matthew Ruffell summary superblock checksum mismatch in resize2fs Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
2023-10-09 02:47:53 Matthew Ruffell description Hi,

We run ext4 on EBS volumes on EC2. During provisioning, cloud-init will occasionally report that resize2fs has failed due to a superblock checksum mismatch. We debugged this internally and were able to come up with the following reproducer:

    #!/usr/bin/bash
    set -euxo pipefail

    while true
    do
            parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
            sleep .5
            mkfs.ext4 /dev/nvme1n1p1
            mount -t ext4 /dev/nvme1n1p1 /mnt
            stress-ng --temp-path /mnt -D 4 &
            STRESS_PID=$!
            sleep 1
            growpart /dev/nvme1n1 1
            resize2fs /dev/nvme1n1p1
            kill $STRESS_PID
            wait $STRESS_PID
            umount /mnt
            wipefs -a /dev/nvme1n1p1
            wipefs -a /dev/nvme1n1
    done

(This was on a 60 GB gp3 volume attached to a c5.4xlarge.)

We were able to find a fix that works and get the patch accepted upstream. The short explanation is that by switching the superblock read to Direct I/O, we no longer see the problem. The patch is available here, but hasn't been published in a released version of e2fsprogs:

https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

A longer thread with the maintainer is available here:

https://lore.kernel.org/linux-ext4/20230609042239.GA1436857@mit.edu/

This bug report is to request that Ubuntu backport this patch to the versions of e2fsprogs shipped in the Ubuntu releases that are available as images on AWS, preferably Focal and Jammy.

[Impact]

This is a long-running bug plaguing cloud-images, where on rare occasions resize2fs would fail and the image would not resize to fit the entire disk.

Online resizes would fail due to a superblock checksum mismatch, because the superblock in memory differs from what is currently on disk due to changes made to the image.

Changing the read of the superblock to Direct I/O solves the issue.

[Testcase]

Start a c5.large instance on AWS, and attach a 60 GB gp3 volume for use as a scratch disk.

Run the following script, courtesy of Krister Johansen and his team:

    #!/usr/bin/bash
    set -euxo pipefail

    while true
    do
            parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
            sleep .5
            mkfs.ext4 /dev/nvme1n1p1
            mount -t ext4 /dev/nvme1n1p1 /mnt
            stress-ng --temp-path /mnt -D 4 &
            STRESS_PID=$!
            sleep 1
            growpart /dev/nvme1n1 1
            resize2fs /dev/nvme1n1p1
            kill $STRESS_PID
            wait $STRESS_PID
            umount /mnt
            wipefs -a /dev/nvme1n1p1
            wipefs -a /dev/nvme1n1
    done

Test packages are available in the following PPA:

https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

If you install the test packages, the race no longer occurs.

[Where problems could occur]

We are changing how resize2fs reads the superblock from underlying disks. If a regression were to occur, resize2fs could fail to resize offline or online volumes.

As all cloud-images are online resized during their initial boot, a regression could have a large impact on public and private clouds.

[Other info]

Upstream mailing list discussion:
https://lore.kernel.org/linux-ext4/20230605225221.GA5737@templeofstupid.com/
https://lore.kernel.org/linux-ext4/20230609042239.GA1436857@mit.edu/

This was fixed in the following upstream commit:

    commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
    Author: Theodore Ts'o <tytso@mit.edu>
    Date: Thu, 15 Jun 2023 00:17:01 -0400
    Subject: resize2fs: use Direct I/O when reading the superblock for online resizes
    Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

The commit is not yet part of any tagged release.

All supported Ubuntu releases require this fix, and it needs to be published in the standard non-ESM archives to be picked up in cloud images.
2023-10-09 02:48:18 Matthew Ruffell tags foundations-todo patch patch-accepted-upstream foundations-todo patch patch-accepted-upstream sts
2023-10-09 10:57:20 Julian Andres Klode e2fsprogs (Ubuntu Trusty): status In Progress Won't Fix
2023-10-09 10:57:23 Julian Andres Klode e2fsprogs (Ubuntu Xenial): status In Progress Won't Fix
2023-10-09 13:07:11 Andreas Hasenack bug added subscriber Andreas Hasenack
2023-10-12 03:42:46 Matthew Ruffell description [Impact]

This is a long-running bug plaguing cloud-images, where on rare occasions resize2fs would fail and the image would not resize to fit the entire disk.

Online resizes would fail due to a superblock checksum mismatch, because the superblock in memory differs from what is currently on disk due to changes made to the image.

Changing the read of the superblock to Direct I/O solves the issue.

[Testcase]

Start a c5.large instance on AWS, and attach a 60 GB gp3 volume for use as a scratch disk.

Run the following script, courtesy of Krister Johansen and his team:

    #!/usr/bin/bash
    set -euxo pipefail

    while true
    do
            parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
            sleep .5
            mkfs.ext4 /dev/nvme1n1p1
            mount -t ext4 /dev/nvme1n1p1 /mnt
            stress-ng --temp-path /mnt -D 4 &
            STRESS_PID=$!
            sleep 1
            growpart /dev/nvme1n1 1
            resize2fs /dev/nvme1n1p1
            kill $STRESS_PID
            wait $STRESS_PID
            umount /mnt
            wipefs -a /dev/nvme1n1p1
            wipefs -a /dev/nvme1n1
    done

Test packages are available in the following PPA:

https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

If you install the test packages, the race no longer occurs.

[Where problems could occur]

We are changing how resize2fs reads the superblock from underlying disks. If a regression were to occur, resize2fs could fail to resize offline or online volumes.

As all cloud-images are online resized during their initial boot, a regression could have a large impact on public and private clouds.

[Other info]

Upstream mailing list discussion:
https://lore.kernel.org/linux-ext4/20230605225221.GA5737@templeofstupid.com/
https://lore.kernel.org/linux-ext4/20230609042239.GA1436857@mit.edu/

This was fixed in the following upstream commit:

    commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
    Author: Theodore Ts'o <tytso@mit.edu>
    Date: Thu, 15 Jun 2023 00:17:01 -0400
    Subject: resize2fs: use Direct I/O when reading the superblock for online resizes
    Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

The commit is not yet part of any tagged release.

All supported Ubuntu releases require this fix, and it needs to be published in the standard non-ESM archives to be picked up in cloud images.

[Impact]

This is a long-running bug plaguing cloud-images, where on rare occasions resize2fs would fail and the image would not resize to fit the entire disk.

Online resizes would fail due to a superblock checksum mismatch, because the superblock in memory differs from what is currently on disk due to changes made to the image.

    $ resize2fs /dev/nvme1n1p1
    resize2fs 1.47.0 (5-Feb-2023)
    resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
    Couldn't find valid filesystem superblock.

Changing the read of the superblock to Direct I/O solves the issue.

[Testcase]

Start a c5.large instance on AWS, and attach a 60 GB gp3 volume for use as a scratch disk.

Run the following script, courtesy of Krister Johansen and his team:

    #!/usr/bin/bash
    set -euxo pipefail

    while true
    do
            parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
            sleep .5
            mkfs.ext4 /dev/nvme1n1p1
            mount -t ext4 /dev/nvme1n1p1 /mnt
            stress-ng --temp-path /mnt -D 4 &
            STRESS_PID=$!
            sleep 1
            growpart /dev/nvme1n1 1
            resize2fs /dev/nvme1n1p1
            kill $STRESS_PID
            wait $STRESS_PID
            umount /mnt
            wipefs -a /dev/nvme1n1p1
            wipefs -a /dev/nvme1n1
    done

Test packages are available in the following PPA:

https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

If you install the test packages, the race no longer occurs.

[Where problems could occur]

We are changing how resize2fs reads the superblock from underlying disks. If a regression were to occur, resize2fs could fail to resize offline or online volumes.

As all cloud-images are online resized during their initial boot, a regression could have a large impact on public and private clouds.

[Other info]

Upstream mailing list discussion:
https://lore.kernel.org/linux-ext4/20230605225221.GA5737@templeofstupid.com/
https://lore.kernel.org/linux-ext4/20230609042239.GA1436857@mit.edu/

This was fixed in the following upstream commit:

    commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
    Author: Theodore Ts'o <tytso@mit.edu>
    Date: Thu, 15 Jun 2023 00:17:01 -0400
    Subject: resize2fs: use Direct I/O when reading the superblock for online resizes
    Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

The commit is not yet part of any tagged release.

All supported Ubuntu releases require this fix, and it needs to be published in the standard non-ESM archives to be picked up in cloud images.
2023-10-12 03:42:54 Matthew Ruffell e2fsprogs (Ubuntu Bionic): status In Progress Won't Fix
2023-10-17 12:47:02 Philip Roche bug added subscriber Philip Roche
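The fix discussed in this bug makes resize2fs read the superblock with Direct I/O (O_DIRECT), so that the read bypasses the page cache. The shell sketch below illustrates the same idea from userspace. It is not the upstream patch (which changes resize2fs's C code), it assumes GNU coreutils dd and od, and the function names read_primary_sb and sb_has_ext_magic are hypothetical. It reads the first 4 KiB of a device or image, either buffered or with dd iflag=direct, and prints the 1024-byte primary superblock (which starts at byte offset 1024) as hex. It only checks the ext2/3/4 magic number (0xEF53, stored little-endian at offset 0x38 within the superblock) and does not verify the superblock checksum.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of e2fsprogs.
set -eu

# Usage: read_primary_sb DEVICE_OR_IMAGE [buffered|direct]
# Prints the 1024-byte primary superblock as one lowercase hex string.
read_primary_sb() {
    dev=$1
    mode=${2:-buffered}
    # Read a whole 4 KiB block: bs=4096 keeps O_DIRECT reads aligned on
    # devices with either 512-byte or 4 KiB logical sectors.
    if [ "$mode" = direct ]; then
        # iflag=direct opens the input with O_DIRECT, bypassing the page cache.
        dd if="$dev" bs=4096 count=1 iflag=direct status=none
    else
        dd if="$dev" bs=4096 count=1 status=none
    fi | od -An -v -tx1 -j1024 -N1024 | tr -d ' \n'
    # od: skip the first 1024 bytes, dump the next 1024 as hex bytes;
    # -v stops od from collapsing repeated lines into '*'.
}

# Succeeds if the superblock carries the ext2/3/4 magic 0xEF53.
sb_has_ext_magic() {
    sb=$(read_primary_sb "$@")
    # s_magic is at superblock byte 56; each byte is 2 hex chars, so the
    # little-endian bytes 53 ef occupy (1-based) characters 113-116.
    [ "$(printf '%s' "$sb" | cut -c113-116)" = 53ef ]
}
```

For example, as root on the scratch disk from the test case, comparing a buffered read with a direct read shows whether the cached and on-disk copies of the superblock differ at that moment; they can differ while the filesystem is busy and superblock updates have not yet been written back:

    [ "$(read_primary_sb /dev/nvme1n1p1)" = "$(read_primary_sb /dev/nvme1n1p1 direct)" ] && echo same || echo differ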