install failed crashed with ReadTimeout

Bug #2034715 reported by Ken VanDine
This bug affects 6 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| snapd | In Progress | Critical | Unassigned | |
| subiquity | Triaged | Critical | Dan Bungert | |
| ubuntu-desktop-installer | Triaged | Critical | Unassigned | |

Bug Description

Installation with TPM enabled fails

ProblemType: Bug
DistroRelease: Ubuntu 23.10
Package: subiquity (unknown)
ProcVersionSignature: Ubuntu 6.3.0-7.7-generic 6.3.5
Uname: Linux 6.3.0-7-generic x86_64
NonfreeKernelModules: zfs
ApportVersion: 2.27.0-0ubuntu2
Architecture: amd64
CasperMD5CheckResult: pass
CasperVersion: 1.482
CurtinAptConfig: /var/log/installer/subiquity-curtin-apt.conf
Date: Thu Sep 7 14:20:01 2023
ExecutablePath: /snap/ubuntu-desktop-installer/1229/bin/subiquity/subiquity/cmd/server.py
InterpreterPath: /snap/ubuntu-desktop-installer/1229/usr/bin/python3.10
LiveMediaBuild: Ubuntu 23.10 "Mantic Minotaur" - Daily amd64 (20230907)
MachineType: LENOVO 4810CT0100
ProcAttrCurrent: snap.hostname-desktop-installer.subiquity-server (complain)
ProcCmdline: /snap/hostname-desktop-installer/1229/usr/bin/python3.10 -m subiquity.cmd.server --use-os-prober --storage-version=2 --postinst-hooks-dir=/snap/hostname-desktop-installer/1229/etc/subiquity/postinst.d
ProcEnviron:
 LANG=C.UTF-8
 LD_LIBRARY_PATH=<set>
 PATH=(custom, no user)
ProcKernelCmdLine: BOOT_IMAGE=/casper/vmlinuz layerfs-path=standard.live.squashfs --- quiet splash
Python3Details: /usr/bin/python3.11, Python 3.11.5, python3-minimal, 3.11.4-5
PythonDetails: N/A
SnapChannel:

SnapRevision: 1229
SnapUpdated: False
SnapVersion: 0+git.09349bf4
SourcePackage: subiquity
Title: install failed crashed with ReadTimeout
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 09/07/2017
dmi.bios.release: 0.8
dmi.bios.vendor: LENOVO
dmi.bios.version: R0RET08L (0.08 )
dmi.board.asset.tag: Not Available
dmi.board.name: 4810CT0100
dmi.board.vendor: LENOVO
dmi.board.version: Not Defined
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.ec.firmware.release: 0.0
dmi.modalias: dmi:bvnLENOVO:bvrR0RET08L(0.08):bd09/07/2017:br0.8:efr0.0:svnLENOVO:pn4810CT0100:pvrThinkPad133rdGen:rvnLENOVO:rn4810CT0100:rvrNotDefined:cvnLENOVO:ct10:cvrNone:skuLENOVO_MT_4810_BU_Think_FM_ThinkPad133rdGen:
dmi.product.family: ThinkPad 13 3rd Gen
dmi.product.name: 4810CT0100
dmi.product.sku: LENOVO_MT_4810_BU_Think_FM_ThinkPad 13 3rd Gen
dmi.product.version: ThinkPad 13 3rd Gen
dmi.sys.vendor: LENOVO

Ken VanDine (ken-vandine) wrote :
tags: added: fde
information type: Private → Public
Dan Bungert (dbungert) wrote :

I don't think this is FDE related; I think this is a transient network issue. This is not a crash we've seen before, but this morning I did see odd subiquity snap build failures that look network related and have since resolved without any change.

Please repeat the install on the same setup and see if you get a different result. Thanks!

Changed in subiquity:
status: New → Incomplete
Dan Bungert (dbungert)
Changed in subiquity:
status: Incomplete → New
Ken VanDine (ken-vandine) wrote :

I made several more attempts, with and without networking, and hit what appeared to be the same error condition each time. However, without FDE the install worked (without networking).

Dan Bungert (dbungert)
Changed in subiquity:
status: New → Confirmed
assignee: nobody → Dan Bungert (dbungert)
Dan Bungert (dbungert)
Changed in subiquity:
status: Confirmed → In Progress
importance: Undecided → Critical
Dan Bungert (dbungert) wrote :

We are hitting 60s read timeouts when attempting to perform action=INSTALL step=FINISH for core boot encrypted.

Changed in subiquity:
status: In Progress → Triaged
Dan Bungert (dbungert) wrote :

In my case, making the timeout longer allows the step to complete with apparent success, but the first boot then fails and prompts for the recovery key.

Michael Vogt (mvo) wrote :

Thanks for reporting this. The issue here is that the "seal" operation in snapd can take a very long time. For historic reasons we lock the state during the operation and block other snapd operations (including /v2/changes). This obviously is a bug and we are working on a fix.
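
To illustrate the failure mode described above, here is a minimal Python sketch, not snapd code. It assumes a single global lock (`state_lock`) that a slow operation holds for its whole duration, and a status query that needs the same lock. The names `seal`, `query_changes`, and `demo` are hypothetical, and the durations are scaled down from the 60s timeout mentioned in this report.

```python
import threading
import time

# Hypothetical stand-in for a global state lock shared by all operations.
state_lock = threading.Lock()


def seal(duration, started):
    """Simulate a slow operation that holds the state lock throughout."""
    with state_lock:
        started.set()          # signal that the lock is now held
        time.sleep(duration)   # the long-running work happens under the lock


def query_changes(read_timeout):
    """Simulate a status request that needs the same lock.

    Returns True if the request could be served within read_timeout
    seconds, False if the caller would have given up (a read timeout).
    """
    if not state_lock.acquire(timeout=read_timeout):
        return False
    state_lock.release()
    return True


def demo(seal_duration=0.5, read_timeout=0.1):
    """Query once while the slow operation runs and once after it ends."""
    started = threading.Event()
    worker = threading.Thread(target=seal, args=(seal_duration, started))
    worker.start()
    started.wait()                       # make sure the lock is held first
    during = query_changes(read_timeout)
    worker.join()
    after = query_changes(read_timeout)
    return during, after
```

In this sketch the query made during the slow operation times out, while the same query succeeds once the lock is released. That matches the symptom an installer would see if it polled for progress while the lock was held.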

Changed in snapd:
status: New → In Progress
importance: Undecided → Critical
Michael Vogt (mvo) wrote :

Just for the record - there is a draft PR in https://github.com/snapcore/snapd/pull/13171 - the code in there will change after the latest discussions we had, and the PR number may actually change (because we may do a prereq PR first with some groundwork), but we are actively working on this.

Michael Vogt (mvo) wrote :

@Dan if your first boot fails with the long timeout, could you please share your device details and (if possible) the log output of the failed boot?

no longer affects: ubuntu-desktop-installer
Dan Bungert (dbungert) wrote :

> log output of the failed boot?

Rather boring, or maybe that is itself a clue:

```
/EndEntire
/EndEntire
<screen clear>
[timestamp] x86/cpu: SGX disabled by BIOS.
Please enter the recovery key for disk /dev/disk/by-partuuid/<uuid>: (press TAB for no echo)
```

> device details

Not sure what would be relevant here. TPM info from dmesg below, tarball of logs attached.
kernel: tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1A, rev-id 16)

Changed in ubuntu-desktop-installer:
status: New → Triaged
importance: Undecided → Critical
Dan Bungert (dbungert) wrote :

Subiquity workaround committed in terms of a longer timeout.
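
As a rough illustration of what a longer-timeout workaround can look like, the Python sketch below applies a per-call timeout with `asyncio.wait_for`. It is not the committed subiquity change. The names `slow_finish_step`, `call_with_timeout`, and `run` and both timeout constants are hypothetical, with values scaled down from the 60s read timeout mentioned earlier in this report.

```python
import asyncio

# Hypothetical values: DEFAULT_TIMEOUT stands in for the default read
# timeout, FINISH_TIMEOUT for a longer limit used only on the slow step.
DEFAULT_TIMEOUT = 0.05
FINISH_TIMEOUT = 1.0


async def slow_finish_step(duration):
    """Simulate a request whose response takes `duration` seconds."""
    await asyncio.sleep(duration)
    return "done"


async def call_with_timeout(coro, timeout):
    """Await `coro`, returning "timeout" if it exceeds `timeout` seconds."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout)
    except asyncio.TimeoutError:
        return "timeout"


def run(duration, timeout):
    """Run one simulated request synchronously and return its outcome."""
    return asyncio.run(call_with_timeout(slow_finish_step(duration), timeout))
```

With the short limit the simulated step reports a timeout, and with the longer limit it completes. As noted earlier in this thread, a longer timeout let the step finish in one case but the first boot still failed, so this only addresses the installer-side symptom.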
