Fail to boot with LUKS on top of RAID1 if the array is broken/degraded
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
cryptsetup (Debian) | Fix Released | Unknown | |
cryptsetup (Ubuntu) | Fix Released | Medium | Guilherme G. Piccoli |
cryptsetup (Ubuntu Xenial) | Won't Fix | Medium | Guilherme G. Piccoli |
cryptsetup (Ubuntu Bionic) | Fix Released | Medium | Guilherme G. Piccoli |
cryptsetup (Ubuntu Focal) | Fix Released | Medium | Guilherme G. Piccoli |
cryptsetup (Ubuntu Groovy) | Fix Released | Medium | Guilherme G. Piccoli |
initramfs-tools (Ubuntu) | Fix Released | Medium | Guilherme G. Piccoli |
initramfs-tools (Ubuntu Xenial) | Won't Fix | Medium | Guilherme G. Piccoli |
initramfs-tools (Ubuntu Bionic) | Fix Released | Medium | Guilherme G. Piccoli |
initramfs-tools (Ubuntu Focal) | Fix Released | Medium | Guilherme G. Piccoli |
initramfs-tools (Ubuntu Groovy) | Fix Released | Medium | Guilherme G. Piccoli |
mdadm (Ubuntu) | Opinion | Medium | Guilherme G. Piccoli |
mdadm (Ubuntu Xenial) | Won't Fix | Medium | Guilherme G. Piccoli |
mdadm (Ubuntu Bionic) | Opinion | Medium | Guilherme G. Piccoli |
mdadm (Ubuntu Focal) | Opinion | Medium | Guilherme G. Piccoli |
mdadm (Ubuntu Groovy) | Opinion | Medium | Guilherme G. Piccoli |
Bug Description
[Impact]
* Considering a setup with an encrypted rootfs on top of an md RAID1 device, Ubuntu is currently unable to decrypt the rootfs if the array becomes degraded, for example if one of the array's members is removed.
* The problem has 2 main aspects: first, the cryptsetup initramfs script attempts to decrypt the array only in the local-top boot stage; if that fails, it gives up and drops the user to a shell (boot is aborted).
* Second, the mdadm initramfs script that assembles degraded arrays executes later in the boot, in the local-block stage. So, in a stacked setup of an encrypted root on top of RAID, if the RAID is degraded, cryptsetup fails early in the boot, preventing mdadm from assembling the degraded array.
* The solution proposed here has 2 components: first, the cryptsetup script is modified to allow a gentle failure in the local-top stage; it then retries for a while (according to a heuristic based on ROOTDELAY, with a minimum of 30 executions) in a later stage (local-block). This gives other initramfs scripts, like mdadm in the local-block stage, time to run. This is how it is meant to work according to the initramfs-tools documentation (although Ubuntu changed it a bit with wait-for-root, hence we stopped looping on local-block; see next bullet).
* Second, initramfs-tools was adjusted. Currently, it runs the mdadm local-block script for a while, in order to assemble the arrays in non-degraded mode. We extended this approach to also execute cryptsetup, so that after mdadm finishes, cryptsetup is executed at least one more time. In an ideal world we would loop on local-block as Debian's initramfs does (removing the hardcoded mdadm/cryptsetup mentions from the initramfs-tools code), but this would be a really big change, probably not SRU-able. I plan to work on that for future Ubuntu releases.
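As a rough illustration of the retry idea described above, the following sketch shows a bounded retry loop in POSIX sh. It is not the actual cryptsetup patch: the function names, the RETRY_PAUSE variable, and the exact way the budget is derived from ROOTDELAY (here assumed to be "ROOTDELAY, but never fewer than 30") are hypothetical.

```shell
#!/bin/sh
# Sketch only: all names here are hypothetical, not taken from the cryptsetup package.

# Number of unlock attempts allowed in local-block. Assumption: the budget is
# ROOTDELAY (default 30), but never fewer than 30 attempts.
retry_budget() {
    delay="${1:-30}"
    if [ "$delay" -lt 30 ]; then
        echo 30
    else
        echo "$delay"
    fi
}

# Run the given command (the unlock attempt) until it succeeds or the budget
# is exhausted. RETRY_PAUSE is the pause between attempts in seconds (default 1).
retry_unlock() {
    budget="$1"
    shift
    attempt=1
    while [ "$attempt" -le "$budget" ]; do
        if "$@"; then
            echo "unlocked after $attempt attempt(s)"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_PAUSE:-1}"
    done
    echo "giving up after $budget attempt(s)"
    return 1
}
```

A caller could run something like `retry_unlock "$(retry_budget "$ROOTDELAY")" some_unlock_command`, where `some_unlock_command` stands in for whatever actually tries to open the LUKS device; between attempts, other local-block scripts such as mdadm get a chance to make the underlying device appear.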
[Test case]
* Install Ubuntu in a Virtual Machine with 2 disks. Use the installer to create a RAID1 volume and an encrypted root on top of it.
* Boot the VM, and use "sgdisk"/"wipefs" to erase the partition table from one of the RAID members. Reboot; it will fail to mount the rootfs and the boot process will not continue.
* If using the initramfs-
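For reference, the member-wiping step of the test case might look like the sketch below. The device name /dev/vdb is an assumption (a second virtio disk in a throwaway VM), and the helper names are hypothetical. Because these commands destroy data, the sketch only prints them unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only: /dev/vdb is a hypothetical RAID1 member disk in a disposable test VM.
# The commands are destructive, so DRY_RUN=1 (the default) only prints them.
MEMBER_DISK="${MEMBER_DISK:-/dev/vdb}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Erase signatures and partition tables from one RAID member.
degrade_array() {
    run wipefs --all "$MEMBER_DISK"       # wipe signatures found on the device
    run sgdisk --zap-all "$MEMBER_DISK"   # destroy GPT and MBR data structures
}
```

Calling `degrade_array` with the defaults prints the two commands; after running them for real (as root, with DRY_RUN=0) and rebooting, the array should come up with a missing member.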
[Regression potential]
* There is potential for regressions, since this is a change in 2 boot components. The patches were designed to keep the regular case working; they change the failure case, which is not currently working anyway.
* A modification in the behavior of cryptsetup was introduced: right now, if we fail the password 3 times (the default maximum attempts), the script doesn't "panic" and drop to a shell immediately; instead it runs once more (or twice, if mdadm is installed) before failing. This is a minor change given the benefit of being able to mount the rootfs in a degraded RAID1 scenario.
* Other potential regressions could show up as boot problems, but the change in initramfs-tools specifically is not invasive; it may just delay boot time a bit, given we now run cryptsetup multiple times on local-block, with 1 sec delays between executions.
Related branches
- Guilherme G. Piccoli (community): Approve
- git-ubuntu developers: Pending requested
- Diff: 2287 lines (+1775/-33), 11 files modified
debian/changelog (+1603/-0)
debian/control (+4/-3)
debian/cryptsetup-initramfs.install (+1/-0)
debian/functions (+11/-1)
debian/initramfs/cryptroot-unlock (+12/-6)
debian/initramfs/hooks/cryptroot (+5/-3)
debian/initramfs/scripts/local-block/cryptroot (+4/-0)
debian/initramfs/scripts/local-bottom/cryptroot (+23/-0)
debian/initramfs/scripts/local-top/cryptroot (+56/-20)
debian/patches/decrease_memlock_ulimit.patch (+55/-0)
debian/patches/series (+1/-0)
Changed in mdadm (Ubuntu): | |
status: | New → Confirmed |
importance: | Undecided → Medium |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
Changed in initramfs-tools (Ubuntu): | |
status: | New → Confirmed |
importance: | Undecided → Medium |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
description: | updated |
Changed in cryptsetup (Ubuntu): | |
status: | Confirmed → In Progress |
Changed in cryptsetup (Debian): | |
status: | Unknown → New |
Changed in mdadm (Ubuntu): | |
status: | Confirmed → Opinion |
Changed in initramfs-tools (Ubuntu): | |
status: | Confirmed → In Progress |
Changed in mdadm (Ubuntu Xenial): | |
status: | New → Opinion |
Changed in mdadm (Ubuntu Bionic): | |
status: | New → Opinion |
Changed in cryptsetup (Ubuntu Xenial): | |
status: | New → Opinion |
Changed in cryptsetup (Ubuntu Bionic): | |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
status: | New → In Progress |
Changed in cryptsetup (Ubuntu Xenial): | |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
importance: | Undecided → Medium |
Changed in cryptsetup (Ubuntu Bionic): | |
importance: | Undecided → Medium |
Changed in cryptsetup (Ubuntu Xenial): | |
status: | Opinion → Won't Fix |
Changed in cryptsetup (Ubuntu Focal): | |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
importance: | Undecided → Medium |
status: | New → In Progress |
Changed in initramfs-tools (Ubuntu Xenial): | |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
importance: | Undecided → Medium |
status: | New → Won't Fix |
Changed in initramfs-tools (Ubuntu Bionic): | |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
importance: | Undecided → Medium |
status: | New → In Progress |
Changed in initramfs-tools (Ubuntu Focal): | |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
importance: | Undecided → Medium |
status: | New → In Progress |
Changed in mdadm (Ubuntu Xenial): | |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
importance: | Undecided → Medium |
status: | Opinion → Won't Fix |
Changed in mdadm (Ubuntu Bionic): | |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
importance: | Undecided → Medium |
Changed in mdadm (Ubuntu Focal): | |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
importance: | Undecided → Medium |
status: | New → Opinion |
description: | updated |
tags: | added: patch |
tags: | added: sts-sponsor-mfo |
tags: | removed: sts-sponsor-mfo |
Changed in cryptsetup (Debian): | |
status: | New → Fix Released |
The issue is basically a failure to mount root when we have a stacked setup of LUKS on top of RAID1 and the RAID1 is degraded (e.g. a member is missing). In detail, a conjunction of factors leads to this problem:
(a) The initramfs script for cryptroot is currently present in two initramfs stages: local-top and local-block. The problem is that if the script fails in the local-top phase, it panics and opens a console, not allowing the boot process to continue. In this case, subsequent scripts are not executed automatically.
(b) The mdadm initramfs script that assembles degraded arrays runs in the local-block stage. It provides a heuristic that tries a regular array assembly (2/3*ROOTDELAY) times, and then assembles the array as degraded, in what is called the "poor man last resort" mechanism.
So, the first and far more serious issue is cryptroot failing early in the local-top phase. An idea I've implemented to fix this is to allow some retries in the local-block stage, given that local-block should loop for a while running its scripts (at least according to the documentation and Debian's initramfs code). But guess what?
(c) In Ubuntu, we have wait-for-root, which aims to speed up the boot, in my shallow understanding. Basically, wait-for-root consumes almost all of the ROOTDELAY time (30s by default, if not specified), and local-block scripts run only once. Except... mdadm, which has the previously mentioned heuristic of running 2/3*ROOTDELAY times. And for that, we have a hack in initramfs-tools to cope with mdadm (!), as per commit: salsa.debian.org/kernel-team/initramfs-tools/-/commit/033c948bb0 .
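To make the mdadm heuristic from (b) concrete, here is a minimal sketch of the decision it describes: keep trying a regular assembly until a counter passes 2/3 of ROOTDELAY, then fall back to degraded assembly. The function names and the integer rounding are assumptions for illustration; this is not the code of the real mdadm local-block script.

```shell
#!/bin/sh
# Sketch only: hypothetical helper names, not the actual mdadm initramfs script.

# Threshold of regular assembly attempts: 2/3 of ROOTDELAY (default 30),
# using integer arithmetic (the rounding is an assumption).
mdadm_threshold() {
    rootdelay="${1:-30}"
    echo $(( $rootdelay * 2 / 3 ))
}

# Given how many times the script has run so far, decide which kind of
# assembly to attempt on this run.
mdadm_mode() {
    count="$1"
    threshold=$(mdadm_threshold "$2")
    if [ "$count" -le "$threshold" ]; then
        echo "normal"      # keep waiting for all members to show up
    else
        echo "degraded"    # last resort: start the array with missing members
    fi
}
```

With the default ROOTDELAY of 30 the threshold is 20, so `mdadm_mode 20 30` still reports a normal attempt and `mdadm_mode 21 30` switches to degraded assembly.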
So, fixing cryptroot's inability to mount a root device on top of a degraded RAID1 is a matter of coordinating mdadm and cryptroot and (if my approach is taken) looping on local-block. Below are the steps I took to circumvent this long-standing issue:
1) Allow cryptsetup to retry in the local-block stage, relying on a heuristic based on ROOTDELAY (we try 1/4*ROOTDELAY times) and on initramfs looping in the local-block phase.
2) Reduce the heuristic frequency in mdadm so that it doesn't "beat" the cryptroot attempts, i.e., cryptroot must execute more times. For this, we reduced the heuristic to 1/5*ROOTDELAY.
3) Make local-block on Ubuntu loop again, while still relying on wait-for-root as a first step; also, I removed that heinous mdadm hack from initramfs-tools, it works without...that...if local-block loops.
Below I'll submit groovy debdiffs to gather reviews on my approach. There is also a PPA with built packages, in case somebody else wants to give them a try: https://launchpad.net/~gpiccoli/+archive/ubuntu/lp1879980
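The relationship between the two budgets in steps 1) and 2) can be checked with a little arithmetic. The helper below is hypothetical and assumes plain integer division; it only illustrates that, for the default ROOTDELAY, cryptroot gets more attempts than mdadm spends on regular assembly.

```shell
#!/bin/sh
# Sketch only: hypothetical helper; integer division is an assumption.

# Number of attempts for a heuristic of the form ROOTDELAY / divisor.
attempts() {
    rootdelay="${1:-30}"
    divisor="$2"
    echo $(( $rootdelay / $divisor ))
}

ROOTDELAY="${ROOTDELAY:-30}"
crypt_tries=$(attempts "$ROOTDELAY" 4)   # cryptroot: 1/4 * ROOTDELAY
mdadm_tries=$(attempts "$ROOTDELAY" 5)   # mdadm:     1/5 * ROOTDELAY
echo "cryptroot attempts: $crypt_tries, mdadm regular attempts: $mdadm_tries"
```

Note that with integer division the two budgets can coincide for very small ROOTDELAY values, so the "cryptroot must execute more times" property in this sketch only holds for larger delays such as the default.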
Thanks,
Guilherme