silently breaking raid: root raid_members opened as luks
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Release Notes for Ubuntu | Invalid | Undecided | Unassigned | |
| cryptsetup (Ubuntu) | Fix Released | Wishlist | Unassigned | |
| util-linux (Ubuntu) | Invalid | Low | Unassigned | |
Bug Description
Note:
When using luks encryption on top of software raid devices, the setup can eventually break because linux_raid_member devices get opened directly as luks instead of being assembled into md devices (Bug #531240), and all this happens silently because mdadm monitoring is not set up (Bug #491443). Additionally, luks-on-raid root filesystems cannot boot if the array is degraded (Bug #488317). It is advisable not to rely on luks on raid until a fix is released.
----
After the member was opened as a luks device, the system booted from it instead of from the md device, while the raid remained "inactive". I don't know what triggered this, but it happened repeatedly after a couple of reboots. It may depend on the order in which devices are enumerated at boot (random assembly order).
I only noticed this by chance, because /proc/mdstat reported the root raid as inactive (although the system seemed to run fine!).
Looking further, it seems that during boot the system unlocked and mounted the rootfs directly from (only) one raid_member, located on an external USB disk. ("dmsetup deps" pointed to it for mdX_crypt.)
"sudo blkid" now also reported this USB disk, which is actually a "linux_raid_member", as TYPE="crypto_LUKS".
After booting into a rescue system and reassembling the array, everything seemed back to normal (including the blkid output), until this bug hit again a couple of reboots/days later.
A more reliable way to reproduce:
Boot the luks-on-raid1 system with the alternate CD's rescue mode; it tries to open all raid member devices directly as luks.
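Since the breakage was only spotted by reading /proc/mdstat, and mdadm monitoring was not set up, a periodic check for arrays in the "inactive" state could at least make the failure visible. The following is a minimal sketch, assuming the usual /proc/mdstat line format (`md0 : active raid1 ...` or `md0 : inactive ...`); the function name `inactive_arrays` is hypothetical and this is meant to complement, not replace, mdadm's own monitoring.

```python
import os
import re

# Matches array lines such as "md0 : active raid1 sda1[0] sdb1[1]"
# or "md1 : inactive sdc1[1](S)".
ARRAY_LINE = re.compile(r"^(md\S+)\s*:\s*(active|inactive)\b")


def inactive_arrays(mdstat_text):
    """Return the names of md arrays whose state is reported as 'inactive'."""
    found = []
    for line in mdstat_text.splitlines():
        m = ARRAY_LINE.match(line)
        if m and m.group(2) == "inactive":
            found.append(m.group(1))
    return found


if __name__ == "__main__":
    path = "/proc/mdstat"
    if os.path.exists(path):
        with open(path) as f:
            bad = inactive_arrays(f.read())
        print("inactive arrays:", ", ".join(bad) if bad else "none")
```

Run from cron or a boot-time hook, a non-empty result would have flagged the "inactive" root array right away instead of by chance.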
---
@util-linux
I found the following in the util-linux changelog:
* Always return encrypted block devices as the first detected encryption
system (ie. LUKS, since that's the only one) rather than probing for
additional metadata and returning an ambivalent result. LP: #428435.
Might that cause luks to take precedence over raid_member when reporting the type of the USB disk?
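One way to see how such a precedence could produce the misreport: on a RAID1 member whose md superblock sits at the end of the device (assumed here to be metadata 0.90 or 1.0; the report does not state the version), the member's first bytes are identical to the array's, so the LUKS header of the md device also appears at offset 0 of each member. The sketch below is a simplified, hypothetical model with two probing orders, not libblkid's actual code; the helper names are made up for illustration.

```python
import struct

LUKS_MAGIC = b"LUKS\xba\xbe"  # LUKS1 header magic, found at byte offset 0
MD_MAGIC = 0xA92B4EFC         # md superblock magic number


def md_superblock_offsets(size):
    """Offsets of md superblocks kept at the END of a member (metadata 0.90 and 1.0)."""
    v090 = (size & ~(64 * 1024 - 1)) - 64 * 1024  # last 64 KiB-aligned 64 KiB block
    v10 = ((size // 512 - 16) & ~7) * 512          # 8-12 KiB from the end, 4 KiB aligned
    return [off for off in (v090, v10) if off >= 0]


def has_luks(image):
    return image[:len(LUKS_MAGIC)] == LUKS_MAGIC


def has_md(image):
    # Assumes little-endian storage (0.90 uses host byte order, e.g. x86).
    return any(struct.unpack_from("<I", image, off)[0] == MD_MAGIC
               for off in md_superblock_offsets(len(image)))


def probe_first_hit(image):
    """Hypothetical prober that stops at the first encryption signature."""
    if has_luks(image):
        return "crypto_LUKS"
    if has_md(image):
        return "linux_raid_member"
    return None


def probe_raid_first(image):
    """Hypothetical prober that lets a RAID superblock take precedence."""
    if has_md(image):
        return "linux_raid_member"
    if has_luks(image):
        return "crypto_LUKS"
    return None


# A fake 1 MiB RAID1 member: the mirrored data starts with the LUKS header,
# and a 0.90 superblock sits near the end.
member = bytearray(1024 * 1024)
member[0:6] = LUKS_MAGIC
struct.pack_into("<I", member, md_superblock_offsets(len(member))[0], MD_MAGIC)
print(probe_first_hit(member), probe_raid_first(member))  # crypto_LUKS linux_raid_member
```

If the real prober behaved like `probe_first_hit`, any such member would be reported as crypto_LUKS regardless of its md superblock.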
@cryptsetup
Also, "cryptsetup isLuks" gives a false positive: "cryptsetup isLuks <raid member device with luks on it>" returns $?=0 (success/true). That is the reason why the initramfs check implemented in scripts/
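Because `cryptsetup isLuks` only looks at the header at the start of the device, a boot-time check relying on it alone cannot tell a RAID1 member from a real LUKS device. A guard of the following kind could additionally ask mdadm whether the device carries an md superblock before unlocking it. This is a hypothetical sketch, not the actual initramfs script; it assumes `mdadm --examine` exits 0 only when it finds an md superblock.

```python
import subprocess


def should_unlock_as_luks(device, run=subprocess.run):
    """Hypothetical guard: only treat `device` as LUKS if it is not an md member."""
    quiet = {"stdout": subprocess.DEVNULL, "stderr": subprocess.DEVNULL}
    # `cryptsetup isLuks` only inspects the header at the start of the device,
    # so on a RAID1 member it can succeed even though the member must not be opened.
    is_luks = run(["cryptsetup", "isLuks", device], **quiet).returncode == 0
    # Assumption: `mdadm --examine` exits 0 only when it finds an md superblock.
    is_md_member = run(["mdadm", "--examine", device], **quiet).returncode == 0
    return is_luks and not is_md_member


if __name__ == "__main__":
    import sys
    for dev in sys.argv[1:]:
        print(dev, "unlock" if should_unlock_as_luks(dev) else "skip")
```

The `run` parameter exists only so the decision logic can be exercised without touching real block devices.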
summary: "blkid reports root raid_member (on usb) as luks, which is booted while raid remains "inactive"" → "breaking raid: root raid_member opened as luks"
description: updated
description: updated
Changed in util-linux (Ubuntu): status Incomplete → Confirmed
Changed in cryptsetup (Ubuntu): status Invalid → New
description: updated
description: updated
Changed in cryptsetup (Ubuntu): status Incomplete → New
description: updated
description: updated
summary: "breaking raid: root raid_member opened as luks" → "silently breaking raid: root raid_members opened as luks"
description: updated
description: updated
Changed in ubuntu-release-notes: status New → Invalid
A (more or less wild) guess as to why this misreporting may not have caused any visible harm before:
After booting and reassembling the array manually, I noticed that the USB disk failed after a while and was dropped from the array, so it has become unreliable.
So maybe this condition at boot (the USB disk having been dropped from the array) leaves the array not yet started (possibly waiting for the disk, which now has a different device name), and cryptsetup grabs the raid_member instead of waiting for the array.
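If this guess is right, the unlock step should wait for the md array to become active, with a timeout, and give up rather than fall back to opening a member directly. A minimal sketch of such a wait loop follows; `wait_for_array` and its `is_active` callback are hypothetical names, and in practice the callback could be backed by a /proc/mdstat check.

```python
import time


def wait_for_array(is_active, timeout=30.0, interval=0.5,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll `is_active()` until it returns True or `timeout` seconds pass.

    Returns True if the array came up in time, False otherwise. On False the
    caller should stop (e.g. drop to a shell) instead of unlocking a member.
    """
    deadline = clock() + timeout
    while True:
        if is_active():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Injecting `clock` and `sleep` keeps the loop testable without real delays; the important design choice is that a timeout ends in failure, never in opening a raw raid_member.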