Ubuntu Core 20 randomly fails to boot and asks for recovery key

Bug #1979185 reported by daniel
This bug affects 1 person
Affects   Status          Importance   Assigned to   Milestone
snapd     Fix Committed   Critical     Unassigned

Bug Description

Hello everyone.

Our company purchased several BXNUC10i3FNH NUC units. We install Ubuntu Core on them and use them as part of an embedded system. Our installation process is as follows:

We first upgrade the BIOS as specified on the Intel support website. Then we install Ubuntu Core as specified on the Ubuntu Core website.

The BIOS is configured as follows:

power up when the NUC is plugged into power;
secure boot enabled in standard mode.

The installation completes successfully, and the NUC works flawlessly for some time.

The issue is that after an ungraceful shutdown, the Ubuntu Core boot sometimes gets stuck at “please enter the recovery key for disk <LONG_DISK_NAME>”. This happens randomly; most ungraceful shutdowns do not trigger it. We do not have the recovery key: we never set one, and it was configured automatically.

After a few minutes, the message changes to “cannot recover key: the platform’s secure device is unavailable: the TPM is in DA lockout mode” (images of both messages are attached).

And that’s it, we cannot unlock it, and we don’t know why it happens.

Images attached at the snapcraft forum: https://forum.snapcraft.io/t/ubuntu-core-20-randomly-fails-to-boot-and-asking-for-recovery-key/30541

The only thing we can do is reinstall Ubuntu Core, but then all the data on the NUC is lost.

Does anyone know why this happens and what we can do to prevent it?

Thanks, Daniel.

Tags: bot-comment
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Libera.chat.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1979185/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Michael Vogt (mvo)
affects: ubuntu → snapd
Chris Coulson (chrisccoulson) wrote :

This is most likely because the TPM's DA (dictionary attack) counter is incremented after an unclean shutdown. snapd needs to reset this counter on every successful boot.
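
To illustrate the mechanism described above, here is a toy model in POSIX shell. It is not snapd code: the function name `boots_until_lockout` and the 1000-cycle cutoff are made up for illustration, and the model deliberately ignores any time-based recovery the TPM may apply. It only shows why a counter that grows by one per unclean shutdown eventually hits the limit unless something resets it.

```shell
# boots_until_lockout MAX RESET  (hypothetical toy model, not snapd code)
# Simulates consecutive unclean power cycles. Each cycle adds 1 to the DA
# counter; if RESET is "yes", the counter is cleared after each successful
# boot. Prints the cycle at which lockout is hit, or "never" if it is not
# hit within 1000 cycles.
boots_until_lockout() {
    max=$1
    reset=$2
    counter=0
    cycle=0
    while [ "$cycle" -lt 1000 ]; do
        cycle=$((cycle + 1))
        counter=$((counter + 1))      # unclean shutdown bumps the counter
        if [ "$counter" -ge "$max" ]; then
            echo "$cycle"             # limit reached: TPM is locked out
            return 0
        fi
        if [ "$reset" = "yes" ]; then
            counter=0                 # successful boot clears the counter
        fi
    done
    echo "never"
}
```

With the limit of 32 reported later in this thread, `boots_until_lockout 32 no` prints `32`, while `boots_until_lockout 32 yes` prints `never`.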

daniel (acz-a) wrote :

Hey Chris,
Thank you for taking the time to comment.
Is there anything we can do in the meantime, until the bug is fixed?

Michael Vogt (mvo)
Changed in snapd:
importance: Undecided → Critical
Michael Vogt (mvo) wrote :

I started with that in https://github.com/snapcore/snapd/compare/master...mvo5:clear-lockout-mode?expand=1 - needs some more work but the principle should be sound.

daniel (acz-a) wrote :

Michael Vogt (mvo) / Chris Coulson (chrisccoulson) / Anyone:

Is there a terminal command that we can run in the meantime to see and clear the TPM DA Counter?
This is an urgent matter for us: we have crashing systems and need a temporary solution until the bug is fixed.

Thanks.

Michael Vogt (mvo)
Changed in snapd:
status: New → Fix Committed
Michael Vogt (mvo) wrote :

Hey Daniel, sorry for my slow reply. The fix is now committed; any version of snapd from the edge channel will clear the counter after the first boot.

You can get the counter value via:
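"""
# Work in an empty directory so "go get ." only sees getdalockout.go
mkdir -p ~/some/empty/dir && cd ~/some/empty/dir
wget -c https://raw.githubusercontent.com/snapcore/snapd/master/tests/nested/manual/core20-da-lockout/getdalockout.go
# Fetch dependencies and build in GOPATH mode (no go.mod needed)
GO111MODULE=off go get .
GO111MODULE=off go build ./getdalockout.go
# Reading the TPM requires root
sudo ./getdalockout
"""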

The snapd with the fix is planned to go to beta this week and to candidate next week.

daniel (acz-a) wrote :

Hey Michael.
Thank you very much for the support and debugging.

I want to inform you that prior to your update, I used the https://github.com/timchen119/tpm2-toolbox package to read the TPM DA lockout counter, using:
"""
sudo tpm2-toolbox.getcap properties-variable -T device:/dev/tpmrm0
"""
With every power off and on (power outage), the TPM2_PT_LOCKOUT_COUNTER parameter was incremented. When it reached 32 (0x20), the NUC locked up, as I described and as you suspected.
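
As a rough sketch of how such a reading could be turned into a check, the POSIX shell function below reads getcap-style output on standard input and prints how many further increments remain before lockout. It assumes that each property is printed as `NAME: 0xVALUE` on its own line and that the limit is reported as TPM2_PT_MAX_AUTH_FAIL; the function name `da_headroom` is made up for illustration.

```shell
# da_headroom (hypothetical helper): reads "NAME: VALUE" lines on stdin and
# prints MAX_AUTH_FAIL minus LOCKOUT_COUNTER. The property names and the
# line format are assumptions; adjust them if your tool prints them differently.
da_headroom() {
    counter=""
    max=""
    # Split each line on ':' and spaces; extra fields land in _rest.
    while IFS=': ' read -r key value _rest; do
        case "$key" in
            TPM2_PT_LOCKOUT_COUNTER) counter=$value ;;
            TPM2_PT_MAX_AUTH_FAIL)   max=$value ;;
        esac
    done
    # Fail if either property was missing from the input.
    if [ -z "$counter" ] || [ -z "$max" ]; then
        echo "lockout properties not found" >&2
        return 1
    fi
    # Shell arithmetic accepts 0x-prefixed hex values directly.
    echo $(( $max - $counter ))
}
```

If the output has the assumed format, piping the command above into the function (`sudo tpm2-toolbox.getcap properties-variable -T device:/dev/tpmrm0 | da_headroom`) would print `1` when the counter reads 0x1F and the limit is 0x20.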

I have now reinstalled the NUC and upgraded snapd to the edge channel.
It seems to have zeroed that parameter.
We are still testing it (powering off and on) to be 100% sure that it has solved the issue; I will update you on this matter.

Now, because our operational systems are used for industrial embedded purposes, we need the snapd package on them to stay on the stable channel.

Do you have an estimate of when an Ubuntu Core 20 base image with this bugfix will be available?

Again, thanks a lot for the support so far.

Daniel.
