Hi,
We run ext4 on EBS volumes on EC2. During provisioning, cloud-init will occasionally report that resize2fs has failed due to a superblock checksum mismatch. We debugged this internally, and were able to come up with the following reproducer:
#!/usr/bin/bash
set -euxo pipefail
while true
do
    parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
    sleep .5
    mkfs.ext4 /dev/nvme1n1p1
    mount -t ext4 /dev/nvme1n1p1 /mnt
    stress-ng --temp-path /mnt -D 4 &
    STRESS_PID=$!
    sleep 1
    growpart /dev/nvme1n1 1
    resize2fs /dev/nvme1n1p1
    kill $STRESS_PID
    wait $STRESS_PID
    umount /mnt
    wipefs -a /dev/nvme1n1p1
    wipefs -a /dev/nvme1n1
done
(This was on a 60 GB gp3 volume attached to a c5.4xlarge.)
We found a fix that works and got the patch accepted upstream. The short explanation is that once the superblock read is switched to direct I/O, we no longer see the problem.
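To illustrate what reading the superblock with direct I/O means in practice, here is a minimal sketch in shell. It is not the upstream patch, which changes resize2fs itself; it only shows a read of the first 4 KiB of a device that bypasses the page cache, followed by extracting the ext2/3/4 magic number. The function name `sb_magic` and its `direct|buffered` argument are hypothetical, made up for this example. Our assumption is that a buffered read can observe a superblock that the kernel is in the middle of updating, while a direct read returns what is actually on disk.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of e2fsprogs): print the two on-disk bytes of
# s_magic from an ext2/3/4 superblock. The superblock starts at byte 1024 and
# s_magic sits at offset 0x38 within it.
# Usage: sb_magic <device-or-image> [direct|buffered]
sb_magic() {
    local target=$1 mode=${2:-direct}
    local -a dd_args=(if="$target" bs=4096 count=1 status=none)
    # O_DIRECT requires block-aligned reads, so read the whole first 4 KiB
    # and slice out the two magic bytes afterwards.
    if [[ $mode == direct ]]; then
        dd_args+=(iflag=direct)
    fi
    dd "${dd_args[@]}" | od -An -tx1 -j $((1024 + 0x38)) -N 2 | tr -d ' \n'
}
```

Run as root against the partition from the reproducer, `sb_magic /dev/nvme1n1p1` should print `53ef` for a valid ext4 filesystem (the magic 0xEF53 is stored little-endian). The `buffered` mode exists only for comparison and for files on filesystems that do not support O_DIRECT.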
The patch is available here, but it has not yet been included in a released version of e2fsprogs:
https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84
A longer thread with the maintainer is available here:
https://<email address hidden>/
This bug report is to request that Ubuntu backport this patch to the e2fsprogs versions shipped in the Ubuntu releases available as images on AWS, preferably Focal and Jammy.