mantic images after 20230917 are failing to deploy with failure to mount root and kernel filesystems

Bug #2037417 reported by Francis Ginther
This bug affects 3 people
Affects                           Status                    Importance  Assigned to  Milestone
Release Notes for Ubuntu          Invalid                   Undecided   Unassigned
The Ubuntu-power-systems project  Invalid                   Undecided   Unassigned
cloud-images                      Fix Released              Undecided   Unassigned
maas-images                       Fix Released              Undecided   Unassigned
linux (Ubuntu)                    Status tracked in Mantic
    Mantic                        Invalid                   Undecided   Unassigned
systemd (Ubuntu)                  Status tracked in Mantic
    Mantic                        Invalid                   Undecided   Unassigned
util-linux (Ubuntu)               Status tracked in Mantic
    Mantic                        Fix Released              Critical    Unassigned

Bug Description

Mantic arm64 deploys started failing on Sept 18th with:

[ 41.913552] systemd[1]: Starting systemd-remount-fs.service - Remount Root and Kernel File Systems...
         Starting systemd-remount-f…t Root and Kernel File Systems...
[ 41.940748] systemd[1]: Starting systemd-udev-trigger.service - Coldplug All udev Devices...
         Starting systemd-udev-trig… - Coldplug All udev Devices...
[ 41.964758] systemd[1]: Started systemd-journald.service - Journal Service.
[ OK ] Started systemd-journald.service - Journal Service.
[ OK ] Mounted dev-hugepages.mount - Huge Pages File System.
[ OK ] Mounted dev-mqueue.mount… - POSIX Message Queue File System.
[ OK ] Mounted sys-kernel-debug.m…t - Kernel Debug File System.
[ OK ] Mounted sys-kernel-tracing…t - Kernel Trace File System.
[ OK ] Finished keyboard-setup.se… - Set the console keyboard layout.
[ OK ] Finished kmod-static-nodes…eate List of Static Device Nodes.
[ OK ] Finished lvm2-monitor.serv…ing dmeventd or progress polling.
[ OK ] Finished modprobe@configfs… - Load Kernel Module configfs.
[ OK ] Finished modprobe@dm_mod.s… - Load Kernel Module dm_mod.
[ OK ] Finished <email address hidden> - Load Kernel Module drm.
[ OK ] Finished modprobe@efi_psto… - Load Kernel Module efi_pstore.
[ OK ] Finished <email address hidden> - Load Kernel Module fuse.
[ OK ] Finished <email address hidden> - Load Kernel Module loop.
[ OK ] Finished systemd-modules-l…ervice - Load Kernel Modules.
[FAILED] Failed to start systemd-re…unt Root and Kernel File Systems.
See 'systemctl status systemd-remount-fs.service' for details.

After this, many other services and cloud-init fail. See the full kopter-0918.log. For comparison, a log from the prior day's test is also attached.
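To pull the failures out of a long console capture such as kopter-0918.log, a small filter is enough. This is a sketch assuming a POSIX shell with sed and grep; `failed_units` is a hypothetical helper name, not something used in this report. It strips ANSI colour codes first, since console captures like the one above often still carry them.

```sh
# Hypothetical helper: print the unit text from every
# "[FAILED] Failed to start ..." line of a saved console log ($1).
failed_units() {
    esc=$(printf '\033')                     # literal ESC byte for colour codes
    sed "s/${esc}\[[0-9;]*m//g" "$1" |       # drop ANSI colour sequences
        grep -F '[FAILED] Failed to start' | # keep only the failure lines
        sed 's/.*Failed to start //'         # leave just the unit text
}

# Usage:
#   failed_units kopter-0918.log
```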

Francis Ginther (fginther) wrote :

Log from failing deployment.

Francis Ginther (fginther) wrote :

Log from prior day's passing deployment.

Francis Ginther (fginther) wrote :

I have done some debugging, but I'm at a loss for what to do next.

Booting with 'apparmor=0' did not help.

Here's what the mounts look like:

# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=262851016k,nr_inodes=65712754,mode=755,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=52585616k,mode=755,inode64)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
/root.tmp.img (deleted) on /media/root-ro type squashfs (ro,relatime,errors=continue,threads=single)
tmpfs-root on /media/root-rw type tmpfs (rw,relatime,inode64)
overlayroot on / type overlay (ro,relatime,lowerdir=/media/root-ro,upperdir=/media/root-rw/overlay,workdir=/media/root-rw/overlay-workdir/_,xino=off,nouserxattr)
copymods on /usr/lib/modules type tmpfs (rw,relatime,inode64)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,inode64)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k,inode64)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
tmpfs on /etc/machine-id type tmpfs (ro,relatime,size=52585616k,mode=755,inode64)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
ramfs on /run/credentials/systemd-sysctl.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700)
ramfs on /run/credentials/systemd-tmpfiles-setup-dev.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700)
ramfs on /run/credentials/systemd-tmpfiles-setup.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700)
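systemd-remount-fs.service remounts the root file system (here the `overlayroot` overlay on /) and the kernel API file systems with the options from /etc/fstab, so the overlay's layer directories are the interesting part of the listing above. A sketch for extracting them, assuming `mount` output in exactly the format shown (no spaces inside the option list); `overlay_dirs` is a hypothetical name.

```sh
# Hypothetical helper: read `mount` output on stdin and print the
# lowerdir/upperdir/workdir options of the overlay mounted on "/".
overlay_dirs() {
    awk '$2 == "on" && $3 == "/" && $5 == "overlay" {
        opts = $6
        gsub(/^\(|\)$/, "", opts)          # strip the surrounding parentheses
        n = split(opts, kv, ",")
        for (i = 1; i <= n; i++)
            if (kv[i] ~ /^(lowerdir|upperdir|workdir)=/)
                print kv[i]
    }'
}

# Usage:
#   mount | overlay_dirs
```

On the listing above this should print `lowerdir=/media/root-ro`, `upperdir=/media/root-rw/overlay` and `workdir=/media/root-rw/overlay-workdir/_`.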

I can boot into single user mode and interact with the console to perform further debugging.

Francis Ginther (fginther) wrote :

journalctl -b --no-pager

Francis Ginther (fginther) wrote :

The last working maas image had kernel 6.3.0-7-generic; the first one to fail has 6.5.0-5-generic.

Francis Ginther (fginther) wrote :

I've confirmed this problem on amd64 and ppc64el (https://bugs.launchpad.net/maas/+bug/2037475).

summary: - mantic arm64 images are failing to deploy with failure to mount root and
- kernel filesystems
+ mantic images after 20230917 are failing to deploy with failure to mount
+ root and kernel filesystems
Francis Ginther (fginther) wrote :

The 6.5.0-6.6 linux kernel in mantic-proposed also fails.

Francis Ginther (fginther) wrote :

I have tested booting with various initrd/kernel/squashfs combinations.

 * The mantic 6.5.0-5 kernel and modules repacked into a jammy (20230927) or lunar (20230927) initrd and squashfs result in a deployed host.
 * The mantic initrd and squashfs from 20230908 with the 6.5.0-5 kernel did not deploy.
 * The original 20230908 mantic maas image (with the 6.3.0-7-generic kernel) does boot.

So the 6.5.0-5 kernel works with older images, but not with any of the recent mantic images.

Dimitri John Ledkov (xnox) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Changed in ubuntu-power-systems:
status: New → Confirmed
Patricia Domingues (patriciasd) wrote :

Adding the console log from the failed system (Power10 LPAR).

Francis Ginther (fginther) wrote :

My experiments with the workaround mentioned in https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/2037202/comments/1 did not help.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
affects: linux → linux (Ubuntu)
Changed in linux (Ubuntu):
milestone: none → ubuntu-23.10
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 2037417

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Dimitri John Ledkov (xnox) wrote :

I believe the reproducer is:

wget https://images.maas.io/ephemeral-v3/candidate/mantic/amd64/20231004.1/ga-23.10/generic/boot-kernel
wget https://images.maas.io/ephemeral-v3/candidate/mantic/amd64/20231004.1/ga-23.10/generic/boot-initrd

qemu-system-x86_64 -m 4G -smp 2 -nic user,model=virtio-net-pci -kernel boot-kernel -initrd boot-initrd --append 'ip=dhcp root=squash:http://images.maas.io/ephemeral-v3/candidate/mantic/amd64/20231004.1/squashfs overlayroot=tmpfs overlayroot_cfgdisk=disabled systemd.log_level=debug systemd.log_target=console systemd.journald.forward_to_console=1 debug=y'

For a faster download, you might want to host the squashfs on the local LAN.
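One way to act on that suggestion, sketched under assumptions: the squashfs has been downloaded into the current directory as `squashfs` and is served from the host with something like `python3 -m http.server 8000`. With QEMU's user-mode networking (`-nic user`), the host is reachable from the guest at 10.0.2.2. `local_append` is a hypothetical helper that only rebuilds the `--append` string; the other kernel parameters are copied unchanged from the command above.

```sh
# Hypothetical helper: print the kernel command line from the reproducer,
# with the squashfs URL pointing at an HTTP server on the QEMU host.
# $1 is the port the host serves the squashfs on (default 8000).
local_append() {
    port=${1:-8000}
    printf '%s' "ip=dhcp root=squash:http://10.0.2.2:${port}/squashfs" \
        " overlayroot=tmpfs overlayroot_cfgdisk=disabled" \
        " systemd.log_level=debug systemd.log_target=console" \
        " systemd.journald.forward_to_console=1 debug=y"
}

# Usage (same kernel and initrd as above):
#   qemu-system-x86_64 -m 4G -smp 2 -nic user,model=virtio-net-pci \
#       -kernel boot-kernel -initrd boot-initrd --append "$(local_append 8000)"
```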

tags: added: foundaitions-todo
tags: added: foundations-todo
removed: foundaitions-todo
Michael Hudson-Doyle (mwhudson) wrote :

So the cause of this, I am fairly sure, is the new "libmount-mountfd" support in util-linux which seems to have the consequence that "mount -o remount $mountpoint" fails for an overlay that references paths no longer available in the current mount namespace.

You can see the discussion of a similar-but-not-the-same issue here https://github.com/util-linux/util-linux/issues/1992

I'm trying to come up with a reproducer but I have to head afk now. I'll file a bug upstream when I get that done.

Maybe we should build util-linux with --disable-libmount-mountfd-support for now (bit late in the release cycle to be uploading util-linux though...)
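The command named here can be probed directly: ask mount(8) to remount a mount point with its existing options and check the exit status. A minimal sketch; `remount_probe` is a hypothetical name, it needs root on a real system, and a failure only points at this bug if it coincides with the util-linux and kernel combination discussed in this report.

```sh
# Hypothetical helper: run "mount -o remount MOUNTPOINT" (default "/"),
# report the result, and return mount's exit status. Needs root.
remount_probe() {
    mp=${1:-/}
    mount -o remount "$mp" 2>/dev/null
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "remount of $mp succeeded"
    else
        echo "remount of $mp failed (exit $rc)"
    fi
    return "$rc"
}

# Usage:
#   sudo sh -c '. ./probe.sh; remount_probe /'   # probe.sh holds the function
```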

Michael Hudson-Doyle (mwhudson) wrote :

Ah no it's a bit simpler than that https://github.com/util-linux/util-linux/issues/2528.

--disable-libmount-mountfd-support looking better tbh.
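To tell whether an installed util-linux was built with or without this feature, the libmount feature list printed by `mount --version` can be checked. ASSUMPTION: builds with mountfd support advertise it as `fd-based-mount` in that list; confirm this against your util-linux version before relying on it. `has_mountfd` is a hypothetical helper name.

```sh
# Hypothetical helper: succeed if libmount reports fd-based mount support.
# ASSUMPTION: the feature appears as "fd-based-mount" in `mount --version`.
has_mountfd() {
    mount --version 2>/dev/null | grep -q 'fd-based-mount'
}

# Usage:
#   if has_mountfd; then echo "mountfd support enabled"; else echo "disabled"; fi
```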

Changed in util-linux (Ubuntu Mantic):
milestone: none → ubuntu-23.10
Changed in util-linux (Ubuntu Mantic):
status: New → Triaged
importance: Undecided → Critical
Changed in systemd (Ubuntu Mantic):
status: Confirmed → Incomplete
Changed in linux (Ubuntu Mantic):
status: Incomplete → Invalid
Changed in systemd (Ubuntu Mantic):
status: Incomplete → Invalid
Changed in ubuntu-power-systems:
status: Confirmed → Invalid
Changed in maas-images:
status: New → Confirmed
Francis Ginther (fginther) wrote :

A special maas image built with util-linux 2.39.1-4ubuntu2 from https://ppa.launchpadcontent.net/xnox/release-critical/ubuntu is looking good. I have one machine deployed with it:

ubuntu@rumford:~$ uname -r
6.5.0-5-lowlatency
ubuntu@rumford:~$ apt-cache policy util-linux
util-linux:
  Installed: 2.39.1-4ubuntu2
  Candidate: 2.39.1-4ubuntu2
  Version table:
 *** 2.39.1-4ubuntu2 500
        500 https://ppa.launchpadcontent.net/xnox/release-critical/ubuntu mantic/main amd64 Packages
        100 /var/lib/dpkg/status
     2.39.1-4ubuntu1 500
        500 http://archive.ubuntu.com/ubuntu mantic/main amd64 Packages
ubuntu@rumford:~$ cat /etc/cloud/build.info
build_name: server
serial: 20231006.1732

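The version check in the `apt-cache policy` output above can be scripted. A sketch, assuming a Debian/Ubuntu system with dpkg; it uses dpkg's own version ordering rather than a plain string comparison, and the fixed version 2.39.1-4ubuntu2 is taken from this report. `has_util_linux_fix` is a hypothetical name; passing a version string skips the dpkg-query lookup.

```sh
# Hypothetical helper: succeed if util-linux is at least 2.39.1-4ubuntu2,
# the upload that disabled libmount mountfd support for this bug.
# $1 (optional): version to check instead of the installed one.
has_util_linux_fix() {
    if [ -n "${1:-}" ]; then
        installed=$1
    else
        installed=$(dpkg-query -W -f='${Version}' util-linux)
    fi
    dpkg --compare-versions "$installed" ge 2.39.1-4ubuntu2
}

# Usage:
#   has_util_linux_fix && echo "fixed util-linux installed"
```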
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package util-linux - 2.39.1-4ubuntu2

---------------
util-linux (2.39.1-4ubuntu2) mantic; urgency=medium

  * Disable brand new feature with --disable-libmount-mountfd-support that
    causes inability to deploy MAAS LP: #2037417

 -- Dimitri John Ledkov <email address hidden> Thu, 05 Oct 2023 22:27:31 +0100

Changed in util-linux (Ubuntu Mantic):
status: Triaged → Fix Released
Francis Ginther (fginther) wrote :

The latest maas images from 20231008 are booting without issue:

ubuntu@akis:~$ lsb_release -sc
No LSB modules are available.
mantic
ubuntu@akis:~$ cat /etc/cloud/build.info
build_name: server
serial: 20231008
ubuntu@akis:~$ uname -a
Linux akis 6.5.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 29 09:14:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
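Two of the facts checked above, the image serial and the running kernel, can be collected in one step when verifying a newly deployed machine. A sketch; `image_report` is a hypothetical name, and the build.info path defaults to /etc/cloud/build.info as shown above but can be overridden.

```sh
# Hypothetical helper: print the image serial from a build.info file
# (default /etc/cloud/build.info) followed by the running kernel release.
image_report() {
    info=${1:-/etc/cloud/build.info}
    serial=$(sed -n 's/^serial:[[:space:]]*//p' "$info")
    printf 'serial: %s\nkernel: %s\n' "$serial" "$(uname -r)"
}
```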

Changed in ubuntu-release-notes:
status: New → Invalid
John Chittum (jchittum) wrote :

Marking cloud-images and maas-images as fixed with the roll of util-linux, and confirmation from fginther

Changed in maas-images:
status: Confirmed → Fix Released
Changed in cloud-images:
status: New → Fix Released