block probing fails with KeyError MAJOR

Bug #1868109 reported by Malte Kuhn
This bug affects 1 person
Affects              Status        Importance  Assigned to  Milestone
curtin               Fix Released  Undecided   Unassigned
subiquity            Fix Released  Undecided   Unassigned
subiquity (Ubuntu)   Fix Released  Undecided   Unassigned

Bug Description

We were unable to install Ubuntu Focal Live Server via subiquity using the daily-live ISO from 17.03.2020 (SHA256SUM 341d2990bab8d28a02414743eb45d74bb90771a8cc34ca0a792293cbbedfcecf).
The cause appears to be a failure in the block device probing step.
The ISO was loaded via IPMI as a virtual device.

Expected Result:
Subiquity lists the NVMe SSD residing in the first PCIe slot of the mainboard.
The installation continues.

Current Result:
The installation process crashes and no block devices are listed.
See the attached screencast.

Additional Information:
 Hardware Specs:
   Mainboard: Supermicro X11DPi-NT
   CPUs: 2 * Intel Xeon Gold 6226
   SSD: Samsung PM1725b 3.2TB PCIe 3.0x8 (AIC)

 The installation log was sent via subiquity's reporting functionality.


Malte Kuhn (mc-mkuhn) wrote :
description: updated
Paul White (paulw2u)
affects: ubuntu → subiquity (Ubuntu)
tags: added: focal
Michael Hudson-Doyle (mwhudson) wrote :

I wonder if this was the same problem as https://bugs.launchpad.net/subiquity/+bug/1868817

Malte Kuhn (mc-mkuhn) wrote :

@mwhudson: are there any commands I should run to confirm this?
Or does the "Fix Committed" status not mean "it will be on the next daily-live disc"?

Michael Hudson-Doyle (mwhudson) wrote :

Ah yes, you could try the process described in https://discourse.ubuntu.com/t/how-to-test-the-latest-version-of-subiquity/12428; that would actually be very useful.

FWIW, "Fix Committed" does not mean it will be on the next ISO; that's closer to what "Fix Released" means in this project. But we've not been very disciplined about keeping bug statuses up to date, I'm afraid.

Malte Kuhn (mc-mkuhn) wrote :

Tried subiquity from the edge channel: no change in the installation process in this regard.
The block probe still fails. The report was sent via subiquity's reporting functionality: 1585241307.605162859.block_probe_...

Malte Kuhn (mc-mkuhn) wrote :

We tried the installation with the Ubuntu Focal Live Server (current daily-live) ISO from 26.03.2020
(SHA256SUM b5a94bedc6833f2fabc4cda091cdcb8f5b8b5633ece68740145cedf6ec89b137)

On a side note: installing on a desktop system, cloning the disk (in this case a Samsung 970 EVO NVMe M.2 SSD) and transferring it onto the Samsung PM1725b 3.2TB PCIe 3.0x8 (AIC) results in a usable system.

Michael Hudson-Doyle (mwhudson) wrote :

Hm, is it possible to attach an error report to this bug? By design, it's not possible to get from an error report in the tracker back to a user. If that's not easy, can you look in the .meta file in /var/crash? It should have an oops_id, which will let me find the report reliably.
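For anyone following along, a minimal Python sketch of that lookup. It assumes each .meta file in /var/crash is a JSON object with the report identifier stored under an "oops_id" key; both are assumptions about the file format, and find_oops_ids is a hypothetical helper, not part of subiquity.

```python
import json
from pathlib import Path


def find_oops_ids(crash_dir="/var/crash"):
    """Return {meta file name: oops id} for every *.meta file in crash_dir.

    Assumes (hypothetically) that each .meta file holds a JSON object and
    that the uploaded report's identifier is stored under "oops_id".
    """
    found = {}
    for meta in sorted(Path(crash_dir).glob("*.meta")):
        try:
            data = json.loads(meta.read_text())
        except (OSError, ValueError):
            # Unreadable or not valid JSON: skip it instead of aborting.
            continue
        if isinstance(data, dict) and "oops_id" in data:
            found[meta.name] = data["oops_id"]
    return found


if __name__ == "__main__":
    for name, oops_id in find_oops_ids().items():
        print(f"{name}: {oops_id}")
```

If the files turn out to use a different layout, only the key lookup needs adjusting; the scan skips anything it cannot parse.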

Malte Kuhn (mc-mkuhn) wrote :

I'll just attach everything in /var/crash ;)
(Current daily-live ISO from yesterday, switched to the subiquity snap edge channel.)

Michael Hudson-Doyle (mwhudson) wrote :

Ah, the traceback is this:

Traceback:
 Traceback (most recent call last):
   File "/snap/subiquity/1581/lib/python3.6/site-packages/subiquity/controllers/filesystem.py", line 145, in _probe
     self._probe_once_task.task, 15.0)
   File "/snap/subiquity/1581/usr/lib/python3.6/asyncio/tasks.py", line 358, in wait_for
     return fut.result()
   File "/snap/subiquity/1581/lib/python3.6/site-packages/subiquity/controllers/filesystem.py", line 120, in _probe_once
     self.app.prober.get_storage, probe_types)
   File "/snap/subiquity/1581/lib/python3.6/site-packages/subiquitycore/async_helpers.py", line 44, in run_in_thread
     return await loop.run_in_executor(None, func, *args)
   File "/snap/subiquity/1581/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
     result = self.fn(*self.args, **self.kwargs)
   File "/snap/subiquity/1581/lib/python3.6/site-packages/subiquitycore/prober.py", line 60, in get_storage
     return Storage().probe(probe_types=probe_types)
   File "/snap/subiquity/1581/lib/python3.6/site-packages/probert/storage.py", line 182, in probe
     probed_data[ptype] = pfunc(context=self.context)
   File "/snap/subiquity/1581/lib/python3.6/site-packages/probert/filesystem.py", line 37, in probe
     if device['MAJOR'] not in ["1", "7"]:
   File "/snap/subiquity/1581/lib/python3.6/site-packages/pyudev/device/_device.py", line 957, in __getitem__
     return self.properties.__getitem__(prop)
   File "/snap/subiquity/1581/lib/python3.6/site-packages/pyudev/device/_device.py", line 1084, in __getitem__
     raise KeyError(prop)
 KeyError: 'MAJOR'

We've seen this before but I thought it was fixed. I'll investigate...
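The failing line indexes the udev properties with device['MAJOR'], and pyudev raises KeyError when a device has no MAJOR property. Majors 1 and 7 are the RAM-disk and loop-device majors that the check skips. Below is a minimal sketch of a more defensive version of that filter; devices_to_probe is a hypothetical helper that accepts any mapping of udev properties (for example pyudev's Device.properties), and it is not necessarily how probert ended up fixing this.

```python
from typing import Iterable, List, Mapping

# Major numbers the original check skips: 1 (RAM disks) and 7 (loop devices).
SKIPPED_MAJORS = {"1", "7"}


def devices_to_probe(devices: Iterable[Mapping[str, str]]) -> List[Mapping[str, str]]:
    """Return the devices that have a MAJOR property not in SKIPPED_MAJORS.

    device["MAJOR"] raises KeyError when udev reports no MAJOR property,
    which is the crash in the traceback above. Mapping.get() returns None
    instead, so such devices are skipped rather than aborting the probe.
    """
    selected = []
    for device in devices:
        major = device.get("MAJOR")
        if major is None:
            continue  # no MAJOR property: skip instead of crashing
        if major not in SKIPPED_MAJORS:
            selected.append(device)
    return selected


if __name__ == "__main__":
    sample = [
        {"DEVNAME": "/dev/nvme0n1", "MAJOR": "259"},
        {"DEVNAME": "/dev/loop0", "MAJOR": "7"},
        {"DEVNAME": "/dev/example"},  # hypothetical device without MAJOR
    ]
    print([d["DEVNAME"] for d in devices_to_probe(sample)])  # ['/dev/nvme0n1']
```

Skipping a device without MAJOR loses nothing useful here: without major/minor numbers there is no block device node to probe anyway.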

summary: - Subiquity on Focal Live Server (current daily-live) crashes on block probe
+ block probing fails with KeyError MAJOR
Malte Kuhn (mc-mkuhn) wrote :

Hi Michael,

Would it help if we granted you SSH access to the installation session (current daily-live ISO) on that machine?

Regards
Malte

Malte Kuhn (mc-mkuhn) wrote :

Even though my last comment was sent on April 1st, it was no joke :)

Michael Hudson-Doyle (mwhudson) wrote :

Sorry for the slow response and thanks for the offer but I think we have enough information to go on to debug this. Other things just keep on coming up! Hopefully I can look at this tomorrow.

Michael Hudson-Doyle (mwhudson) wrote :

I created https://github.com/CanonicalLtd/probert/pull/86, which hopefully works around or fixes this.

Malte Kuhn (mc-mkuhn) wrote :

My colleague manually patched probert in the subiquity snap (fictional snap revision 1999, based on revision 1626 plus your PR above), and we now get a different stack trace.

Michael Hudson-Doyle (mwhudson) wrote :

The crash is now:

 Traceback (most recent call last):
   File "/snap/subiquity/1999/lib/python3.6/site-packages/subiquity/controllers/filesystem.py", line 144, in _probe
     self._probe_once_task.task, 15.0)
   File "/snap/subiquity/1999/usr/lib/python3.6/asyncio/tasks.py", line 358, in wait_for
     return fut.result()
   File "/snap/subiquity/1999/lib/python3.6/site-packages/subiquity/controllers/filesystem.py", line 124, in _probe_once
     self.model.load_probe_data(storage)
   File "/snap/subiquity/1999/lib/python3.6/site-packages/subiquity/models/filesystem.py", line 1529, in load_probe_data
     self.reset()
   File "/snap/subiquity/1999/lib/python3.6/site-packages/subiquity/models/filesystem.py", line 1266, in reset
     self._probe_data)["storage"]["config"]
   File "/snap/subiquity/1999/lib/python3.6/site-packages/curtin/storage_config.py", line 1317, in extract_storage_config
     tree = get_config_tree(cfg.get('id'), final_config)
   File "/snap/subiquity/1999/lib/python3.6/site-packages/curtin/storage_config.py", line 275, in get_config_tree
     for dep in find_item_dependencies(item, sconfig):
   File "/snap/subiquity/1999/lib/python3.6/site-packages/curtin/storage_config.py", line 245, in find_item_dependencies
     _validate_dep_type(item_id, dep_key, dep, config)
   File "/snap/subiquity/1999/lib/python3.6/site-packages/curtin/storage_config.py", line 193, in _validate_dep_type
     'Invalid dep_id (%s) not in storage config' % dep_id)
 ValueError: Invalid dep_id (disk-nvme0n1) not in storage config

So, progress of a kind, I guess? I'll take a look later.
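The new error comes from curtin's dependency check: an item in the extracted storage config refers to disk-nvme0n1, but no item with that id exists. The simplified check below reproduces the same message. check_dependencies, its DEP_KEYS tuple and the sample config are illustrative assumptions, not curtin's actual _validate_dep_type.

```python
from typing import Dict, List

# Hypothetical: keys whose values name another storage config item.
DEP_KEYS = ("device",)


def check_dependencies(config: List[Dict[str, str]]) -> None:
    """Raise ValueError if any item references an id missing from config."""
    known_ids = {item["id"] for item in config}
    for item in config:
        for key in DEP_KEYS:
            dep_id = item.get(key)
            if dep_id is not None and dep_id not in known_ids:
                raise ValueError(
                    "Invalid dep_id (%s) not in storage config" % dep_id)


if __name__ == "__main__":
    # Hypothetical config: a partition whose parent disk entry is missing.
    config = [
        {"id": "partition-nvme0n1p1", "type": "partition", "device": "disk-nvme0n1"},
    ]
    try:
        check_dependencies(config)
    except ValueError as err:
        print(err)  # Invalid dep_id (disk-nvme0n1) not in storage config
```

In other words, the probe data still mentions the disk indirectly while the disk entry itself was dropped somewhere along the way.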

Michael Hudson-Doyle (mwhudson) wrote :

So the udev data for /dev/nvme0n1 is:

/devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1
 N: nvme0n1
 L: 0
 S: disk/by-id/nvme-SAMSUNG_MZPLL3T2HAJQ-00005_S4CCNE0M300015
 S: disk/by-id/nvme-eui.344343304d3000150025384500000004
 E: DEVPATH=/devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1
 E: SUBSYSTEM=block
 E: DEVNAME=/dev/nvme0n1
 E: DEVTYPE=disk
 E: MAJOR=259
 E: MINOR=1
 E: USEC_INITIALIZED=5210525
 E: MPATH_SBIN_PATH=/sbin
 E: DM_MULTIPATH_DEVICE_PATH=0
 E: ID_SERIAL_SHORT=S4CCNE0M300015
 E: ID_WWN=eui.344343304d3000150025384500000004
 E: ID_MODEL=SAMSUNG MZPLL3T2HAJQ-00005
 E: ID_REVISION=GPJA0B3Q
 E: ID_SERIAL=SAMSUNG MZPLL3T2HAJQ-00005_S4CCNE0M300015
 E: ID_PART_TABLE_UUID=4bac57b7-307b-4b0e-a853-e0232c6fb955
 E: ID_PART_TABLE_TYPE=gpt
 E: DEVLINKS=/dev/disk/by-id/nvme-SAMSUNG_MZPLL3T2HAJQ-00005_S4CCNE0M300015 /dev/disk/by-id/nvme-eui.344343304d3000150025384500000004
 E: TAGS=:systemd:
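The E: lines in a udevadm info dump like this one are the device's udev properties. A small parser makes them easy to check. parse_udev_properties is a hypothetical helper that assumes the line format shown here, and the sample input is only an excerpt of the dump. It confirms that this device does carry MAJOR; what stands out is its DEVPATH under /devices/virtual.

```python
from typing import Dict


def parse_udev_properties(dump: str) -> Dict[str, str]:
    """Collect the "E: KEY=VALUE" property lines of a udevadm info dump."""
    props = {}
    for line in dump.splitlines():
        line = line.strip()
        if not line.startswith("E: "):
            continue  # skip N:, L:, S: lines and the bare device path line
        key, sep, value = line[3:].partition("=")
        if sep:
            props[key] = value  # split on the first "=" only
    return props


# Excerpt of the dump above.
SAMPLE = """\
/devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1
 N: nvme0n1
 E: DEVPATH=/devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1
 E: SUBSYSTEM=block
 E: DEVTYPE=disk
 E: MAJOR=259
 E: MINOR=1
"""

props = parse_udev_properties(SAMPLE)
print(props["MAJOR"], props["DEVPATH"])
# 259 /devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1
```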

The specific problem here is that when scanning for physical disks, we filter out DEVPATHs that start with /devices/virtual, because, you know, we're looking for physical disks. This NVMe namespace lives under /devices/virtual/nvme-subsystem, so it gets dropped. Maybe we should only filter on /devices/virtual/block instead?
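A comparison of the two prefix filters makes the difference concrete. is_physical is an illustrative helper and the SATA path is a made-up example; this sketches the idea and is not the actual curtin patch.

```python
# The prefix currently filtered out, and the narrower one suggested above.
OLD_PREFIX = "/devices/virtual"
NEW_PREFIX = "/devices/virtual/block"


def is_physical(devpath: str, virtual_prefix: str) -> bool:
    """Treat a device as physical unless its DEVPATH lies under virtual_prefix."""
    # Compare whole path components, so "/devices/virtualx" is not matched.
    prefix = virtual_prefix.rstrip("/") + "/"
    return not devpath.startswith(prefix)


if __name__ == "__main__":
    paths = [
        # The NVMe namespace from the udev dump above: False True
        "/devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1",
        # A loop device, which stays filtered out: False False
        "/devices/virtual/block/loop0",
        # Made-up example of a SATA disk path: True True
        "/devices/pci0000:00/0000:00:17.0/ata1/host0/target0:0:0/0:0:0:0/block/sda",
    ]
    for path in paths:
        # Prints the old-filter result, the new-filter result, then the path.
        print(is_physical(path, OLD_PREFIX), is_physical(path, NEW_PREFIX), path)
```

The narrower prefix keeps excluding genuinely virtual block devices such as loop devices while letting the NVMe subsystem namespace through.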

Michael Hudson-Doyle (mwhudson) wrote :

This tiny patch to curtin https://paste.ubuntu.com/p/fbJVRmdBjz/ gets past this crash. If your expeditious colleague would like to patch the subiquity snap with it and try again, I'd be interested to hear whether that works!

Malte Kuhn (mc-mkuhn) wrote :

Well that worked for the block probing step :)

The device is now selectable and the installation process begins. :+1:

But the patched curtin (from your paste) seems to trip over the next error during installation:

Tags: focal uec-images
Title: install failed crashed with CalledProcessError
Traceback:
 Traceback (most recent call last):
   File "/snap/subiquity/1999/lib/python3.6/site-packages/curtin/commands/main.py", line 202, in main
     ret = args.func(args)
   File "/snap/subiquity/1999/lib/python3.6/site-packages/curtin/commands/curthooks.py", line 1638, in curthooks
     builtin_curthooks(cfg, target, state)
   File "/snap/subiquity/1999/lib/python3.6/site-packages/curtin/commands/curthooks.py", line 1604, in builtin_curthooks
     setup_grub(cfg, target, osfamily=osfamily)
   File "/snap/subiquity/1999/lib/python3.6/site-packages/curtin/commands/curthooks.py", line 627, in setup_grub
     join_stdout_err + args + instdevs, env=env, capture=True)
   File "/snap/subiquity/1999/lib/python3.6/site-packages/curtin/util.py", line 275, in subp
     return _subp(*args, **kwargs)
   File "/snap/subiquity/1999/lib/python3.6/site-packages/curtin/util.py", line 141, in _subp
     cmd=args)
 curtin.util.ProcessExecutionError: Unexpected error while running command.

The crash file from the failed install is attached.

Thank you for your commitment and patience, Michael!

Michael Hudson-Doyle (mwhudson) wrote :

It never ends; the failure is now:

         + grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=ubuntu --recheck
         Installing for x86_64-efi platform.
         grub-install: warning: efivarfs_get_variable: open(/sys/firmware/efi/efivars/blk0-47c7b225-c42a-11d2-8e57-00a0c969723b): No such file or directory.
         grub-install: warning: efi_get_variable: ops->get_variable failed: No such file or directory.
         grub-install: warning: device_get: readlink of /sys/block/(null)/device failed: No such file or directory.
         grub-install: warning: open_disk: could not open disk: No such file or directory.
         grub-install: warning: efi_va_generate_file_device_path_from_esp: could not open disk: No such file or directory.
         grub-install: warning: efi_generate_file_device_path_from_esp: could not generate File DP from ESP: No such file or directory.
         grub-install: error: failed to register the EFI boot entry: No such file or directory.
         failed to install grub!

I'm going to lie down for a bit before I think about this though.

Malte Kuhn (mc-mkuhn) wrote :

After you mentioned UEFI, we tried legacy BIOS mode and are now able to install. ヽ(•‿•)ノ
We still have a second system that can be used to test the latest ISO in UEFI mode.

Ryan Harper (raharper) wrote :

Can we get the /var/log/installer/block/probe-data*.json files?

Also, if you have the /var/crash files from your failed UEFI install, they would help me fix the UEFI install.
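To take a quick look at those files before attaching them, here is a minimal sketch. It assumes each probe-data file is a JSON object whose "blockdev" entry maps device names to their udev properties; that is an assumption about probert's output format, and list_blockdevs is a hypothetical helper.

```python
import glob
import json


def list_blockdevs(pattern="/var/log/installer/block/probe-data*.json"):
    """Return (file, device, MAJOR, DEVPATH) tuples from probe-data files.

    Assumes each file is a JSON object whose "blockdev" entry maps device
    names to dicts of udev properties; missing properties are shown as "?".
    """
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            data = json.load(f)
        for name, props in sorted(data.get("blockdev", {}).items()):
            rows.append((path, name, props.get("MAJOR", "?"), props.get("DEVPATH", "?")))
    return rows


if __name__ == "__main__":
    for path, name, major, devpath in list_blockdevs():
        print(f"{path}: {name} MAJOR={major} DEVPATH={devpath}")
```

A disk that shows up here with a DEVPATH under /devices/virtual is a candidate for being filtered out as non-physical.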

Changed in curtin:
status: New → Incomplete
Michael Hudson-Doyle (mwhudson) wrote :

The crash file from the failed UEFI install is in comment #18, and contains the probe data.

Changed in curtin:
status: Incomplete → Confirmed
Server Team CI bot (server-team-bot) wrote :

This bug is fixed with commit 0832e4ef to curtin on branch master.
To view that commit see the following URL:
https://git.launchpad.net/curtin/commit/?id=0832e4ef

Changed in curtin:
status: Confirmed → Fix Committed
Malte Kuhn (mc-mkuhn) wrote :

As soon as those patches are in the daily-live ISO, we would like to test again. Please let us know so we can report back.

Michael Hudson-Doyle (mwhudson) wrote :

They're not on an ISO yet but if you can try with the edge snap, you'll get a subiquity with these patches. https://discourse.ubuntu.com/t/how-to-test-the-latest-version-of-subiquity/12428 has a few ways you can do this.

I would not expect UEFI installs to work yet but hopefully the BIOS install will work with edge.

Michael Hudson-Doyle (mwhudson) wrote :

The latest daily (i.e. 20200414) has the patches discussed here.

Malte Kuhn (mc-mkuhn) wrote :

We'll try next week, when I'll have physical access.

Malte Kuhn (mc-mkuhn) wrote :

Sorry for the delay. Installation continues and finishes in UEFI mode.

Now we have run into boot issues (the SSD is the same model, but the mainboard is not). These are unlikely to be software-related (it looks like an AHCI driver bug in the mainboard firmware), so we can't yet confirm that the installed OS boots.

Malte Kuhn (mc-mkuhn) wrote :

So this issue can be closed.

Malte Kuhn (mc-mkuhn) wrote :

The mainboard firmware update worked. The system is installable and bootable in both legacy and UEFI mode.

Changed in subiquity:
status: New → Fix Released
Changed in subiquity (Ubuntu):
status: New → Fix Released
Changed in curtin:
status: Fix Committed → Fix Released
Michael Hudson-Doyle (mwhudson) wrote :

Thanks for the update! I wasn't expecting the UEFI install to work but happy to hear it does!
