iso9660 not supported on 4k devices for config drives

Bug #2028002 reported by Julia Kreger
This bug affects 1 person
Affects: Ironic
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

A downstream reporter has attempted to use ironic to deploy a running OS with shiny new 4k IO devices.

The deployment of the OS image works just fine. The issue is the creation and write-out of the configuration drive.

Specifically, when they attempt to boot a machine, the kernel logs the following error while attempting to access the iso9660 filesystem on a 4k device.

[ 82.511907] isofs_fill_super: bread failed, dev=sda2, iso_blknum=17, block=-2147483648

cloud-init, when it runs, logs the following:

2023-07-11 05:42:14,601 - subp.py[DEBUG]: Running command ['mount', '-o', 'ro', '-t', 'auto', '/dev/sda2', '/run/cloud-init/tmp/tmpkf7q2n3s'] with allowed return codes [0] (shell=False, capture=True)
2023-07-11 05:42:14,621 - util.py[DEBUG]: Failed mount of '/dev/sda2' as 'auto': Unexpected error while running command.
Command: ['mount', '-o', 'ro', '-t', 'auto', '/dev/sda2', '/run/cloud-init/tmp/tmpkf7q2n3s']
Exit code: 32
Reason: -
Stdout:
Stderr: mount: /run/cloud-init/tmp/tmpkf7q2n3s: wrong fs type, bad option, bad superblock on /dev/sda2, missing codepage or helper program, or other error.
2023-07-11 05:42:14,621 - __init__.py[DEBUG]: Datasource DataSourceConfigDrive [net,ver=None][source=None] not updated for events: New instance first boot
2023-07-11 05:42:14,622 - handlers.py[DEBUG]: finish: init-local/search-ConfigDrive: SUCCESS: no local data found from DataSourceConfigDrive

Examining the cloud-init source code[0], it appears the constraint has been that the config drive must be either:

* an iso9660 or vfat filesystem

OR

* A file system with a config-2/CONFIG-2 label.

That basic logic[0] has been present since 2014.
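
The rule as described above can be sketched as follows. This is only a paraphrase of the constraint as stated in this report, not cloud-init's actual code; the function and constant names are hypothetical.

```python
from typing import Optional

# Values taken from the description in this report.
CONFIG_DRIVE_FSTYPES = {"iso9660", "vfat"}
CONFIG_DRIVE_LABELS = {"config-2", "CONFIG-2"}


def is_config_drive_candidate(fstype: Optional[str], label: Optional[str]) -> bool:
    """Return True if a device matches either rule described in the report.

    Hypothetical helper: a device qualifies when its filesystem type is
    iso9660 or vfat, OR when its label is config-2/CONFIG-2.
    """
    return fstype in CONFIG_DRIVE_FSTYPES or label in CONFIG_DRIVE_LABELS
```

Under this reading, a VFAT filesystem would still be picked up as a configuration drive, which is what makes a non-ISO9660 fallback plausible.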

The original strict constraint seems largely rooted in nova, which also appears to have deprecated the vfat option. And it is a fairly reasonable constraint in the virtualized context.

So I think the option forward is to update ironic-python-agent to handle this gracefully.

Two options:
1) Attempt to remount the device and, if that fails, reformat, extract, and copy the config content to the new filesystem. This could inherently create a security issue, so we might need to make the filesystem configurable.

2) Identify that the device is a 4k IO device (how?!?) and then base the steps on that.
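
For option 2, one possible answer to the "how?!?" is to read the logical block size that Linux exposes in sysfs. This is a minimal sketch under that assumption; the helper names are hypothetical, and it is not what the merged fix does (the fix relies on a mount attempt instead). Whether the value reported here reflects the driver limitation seen in this report is also an assumption.

```python
from pathlib import Path


def logical_block_size(device_name: str, sysfs_root: str = "/sys/block") -> int:
    """Read the logical block size, in bytes, for a whole disk such as 'sda'.

    Note: the queue attributes live under the disk, not under a partition
    such as 'sda2'. sysfs_root is a parameter only so the helper can be
    exercised against a fake directory tree.
    """
    path = Path(sysfs_root) / device_name / "queue" / "logical_block_size"
    return int(path.read_text().strip())


def is_4k_device(device_name: str, sysfs_root: str = "/sys/block") -> bool:
    """Hypothetical check: treat a logical block size of 4096+ as '4k IO'."""
    return logical_block_size(device_name, sysfs_root) >= 4096
```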

One issue to consider is someone adding a lot of content to their configuration drive: with a larger block size, many small files may no longer fit in the same space. But... that shouldn't be an issue for the 99% of operators out there using configuration drives, and those embedding content can likely just ship a different filesystem anyhow.
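
The space concern can be illustrated with some simple arithmetic. This is a simplified model, assuming each file occupies a whole number of blocks and ignoring filesystem metadata; the real allocation unit of a given filesystem may differ from the device block size, and the file sizes below are made up.

```python
def allocated_bytes(file_sizes, block_size):
    """Sum the space used when every file is rounded up to whole blocks."""
    # -(-a // b) is integer ceiling division.
    return sum(-(-size // block_size) * block_size for size in file_sizes)


# Hypothetical workload: 100 small files of 300 bytes each.
small_files = [300] * 100
usage_2k = allocated_bytes(small_files, 2048)  # 204800 bytes
usage_4k = allocated_bytes(small_files, 4096)  # 409600 bytes
```

In this model, doubling the block size doubles the space consumed by files smaller than one block, while a handful of larger files would see little difference.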

[0]: https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/DataSourceConfigDrive.py#L278-L299

Julia Kreger (juliaashleykreger) wrote :

It looks like this is a blend of the device and the OS, plus constraints on ISO9660 filesystems. Specifically, their internal block size is 2K, whereas the block device driver may not support anything smaller than 4k.

The stock/generic difference here seems to be what the block device driver supports.

I've discussed this with @stevebaker and the consensus is basically: attempt to access it and, if that fails, re-pave it with a different filesystem type. That should at least get the user to a working node.

Changed in ironic:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic-python-agent (master)

Reviewed: https://review.opendev.org/c/openstack/ironic-python-agent/+/888794
Committed: https://opendev.org/openstack/ironic-python-agent/commit/b6c263a5dc34c10b50a480e3fdcf22784f2aa77c
Submitter: "Zuul (22348)"
Branch: master

commit b6c263a5dc34c10b50a480e3fdcf22784f2aa77c
Author: Julia Kreger <email address hidden>
Date: Tue Jul 18 11:01:22 2023 -0700

    preserve/handle config drives on 4k block devices

    When an underlying block device (or driver) only supports 4KB IO,
    this can cause some issues with aspects like using an ISO9660 filesystem
    which can only support a maximum of 2KB IO.

    The agent will now attempt to mount the filesystem *before* deleting the
    supplied file, and should that fail it will mount the configuration drive
    file from the ramdisk utilizing a loopback, and then extract the contents
    of the ramdisk into a newly created VFAT filesystem which supports 4KB
    block IO.

    Closes-Bug: #2028002
    Change-Id: I336acb8e8eb5a02dde2f5e24c258e23797d200ee
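
The flow described in the commit message above could look roughly like the following. This is only a sketch of the described steps, not the code that merged into ironic-python-agent: the function name, the mount point arguments, the use of `cp`, the `CONFIG-2` label, and the injectable `run` callable are all assumptions made for illustration.

```python
import subprocess
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(list(cmd), check=True)


def preserve_config_drive(
    device: str,
    iso_path: str,
    probe_dir: str,
    src_dir: str,
    dst_dir: str,
    run: Callable[[Sequence[str]], None] = _run,
) -> str:
    """Return 'kept' if the written config drive mounts, else 'rebuilt'.

    Hypothetical helper. The config drive image is assumed to have been
    written to `device` already, with the original file still at `iso_path`.
    """
    try:
        # Step 1: try to mount what was written, before deleting the file.
        run(["mount", "-o", "ro", "-t", "auto", device, probe_dir])
    except subprocess.CalledProcessError:
        pass  # e.g. ISO9660 on a device that only supports 4KB IO
    else:
        run(["umount", probe_dir])
        return "kept"

    # Step 2: mount the original image file from the ramdisk via loopback.
    run(["mount", "-o", "ro,loop", "-t", "iso9660", iso_path, src_dir])
    try:
        # Step 3: re-create the partition as VFAT and copy the contents over.
        run(["mkfs.vfat", "-n", "CONFIG-2", device])
        run(["mount", "-t", "vfat", device, dst_dir])
        try:
            run(["cp", "-r", src_dir + "/.", dst_dir])
        finally:
            run(["umount", dst_dir])
    finally:
        run(["umount", src_dir])
    return "rebuilt"
```

Passing a fake `run` callable makes it possible to exercise both branches without touching a real block device.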

Changed in ironic:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic-python-agent (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/ironic-python-agent/+/893736

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic-python-agent (stable/zed)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic-python-agent (stable/yoga)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic-python-agent (stable/xena)
Julia Kreger (juliaashleykreger) wrote :

I'm afraid we won't be able to backport this past xena. This is rooted in functionality based in a stable library which we can't really fix at this point. Once merged, please use a more recent release of Ironic.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ironic-python-agent 9.7.0

This issue was fixed in the openstack/ironic-python-agent 9.7.0 release.
