Boot failure with root on /dev/md* (raid)

Bug #780492 reported by ptashek
This bug affects 2 people
Affects: mdadm (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: mdadm

Description: Ubuntu 11.04
Release: 11.04

Package: mdadm
Version: 3.1.4+8efb9d1ubuntu4.1

There are several bugs relating to boot failures when the root partition is on an mdadm managed RAID array, however none of them seem to describe the problem I am seeing with 11.04.

Some context:
- Ubuntu 11.04 x86_64 upgraded from 10.10 x86_64 using update-manager
- three raid arrays across 9 partitions on three physical disks - /dev/md125 (raid10), /dev/md126 (raid5), /dev/md127 (raid10)
- root partition is /dev/md127p3 with /boot on /dev/md127p1
- fstab and grub.cfg refer to partitions using UUID, not device names (a quick cross-check is sketched below)
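
A minimal sketch of how those UUID references can be cross-checked (the device names are the ones from the setup above; adjust for your layout):

sudo blkid /dev/md127p1 /dev/md127p3     # print the filesystem UUIDs of /boot and /
grep -i uuid /etc/fstab                  # UUIDs the mounts refer to
grep -i uuid /boot/grub/grub.cfg | head  # UUIDs grub uses to locate root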

Expected behaviour:
System boots from root partition on an mdadm managed raid array

Actual behaviour:
When booting with the root partition on an mdadm-managed RAID array, the boot process fails, with the udevd daemon hanging right after the init-premount and local-top scripts execute. The error message, repeated for each array, is:

"udevd[PID] worker [PID] unexpectedly returned with status 0x0100"

From my debugging efforts it would seem that there is an issue with how mdadm 3.x communicates with udevd, or the other way round. While the arrays get detected, assembling them at boot causes udevd to hang. The only way to solve this problem for the moment was (for me at least) to force-downgrade mdadm from 3.1.4+8efb9d1ubuntu4.1 (Natty repos) to 2.6.7.1-1ubuntu16 (Maverick repos) and regenerate the initrd. While boot is much slower than under 10.10, the OS comes up in a workable state with all arrays properly assembled and healthy.
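
For completeness, a rough sketch of that downgrade workaround; the archive URL and .deb filename are assumptions, not taken from this report (only the version strings match the ones above):

# Fetch and install the Maverick mdadm package, then regenerate the initrd.
wget http://archive.ubuntu.com/ubuntu/pool/main/m/mdadm/mdadm_2.6.7.1-1ubuntu16_amd64.deb
sudo dpkg -i mdadm_2.6.7.1-1ubuntu16_amd64.deb
echo "mdadm hold" | sudo dpkg --set-selections   # keep apt from upgrading it back
sudo update-initramfs -u -k all                  # rebuild the initrd with the old mdadm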

An identical issue exists on my system when booted from the 11.04 LiveCD.
If I install mdadm from 11.04 repos *and* leave udev running the udev daemon hangs a few seconds after "mdadm --assemble --scan" is executed. If I disable the udev service first, all raid arrays are assembled and started within a second or two. Again, downgrading mdadm to 2.6.7.1-1ubuntu16 solves the problem.
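
In other words, the LiveCD test boils down to something like this (a sketch; that the upstart job is named "udev" in the 11.04 live environment is an assumption):

sudo service udev stop         # with udev left running, the next command hangs udevd
sudo mdadm --assemble --scan   # with udev stopped, the arrays assemble within a second or two
cat /proc/mdstat               # confirm the arrays are assembled and healthy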

The very same setup previously worked flawlessly in both Ubuntu 10.04 LTS and 10.10.

Tags: boot mdadm natty
Revision history for this message
trendle (trendleforum) wrote :

I think I'm being plagued by the same issue. Like you, I was using 10.04 LTS with mdadm in the first place.
However, in 10.04 LTS I had upgraded my mdadm to 3.1.4+8efb9d1ubuntu and had been running that for some time.

It is now my misfortune to have chosen to upgrade to 11.04, there goes another few days of my life :-(

The upgrade was a complete bust; the system stopped working.

I've now tried a fresh install from the Ubuntu alternate install disk onto a new, clean HD (with the mdadm RAID-6 still connected inside the box).

Now I get the same udevd failure messages and then the system hangs completely, so I'll need to rip it all down again.

Revision history for this message
molostoff (molostoff) wrote :

This behaviour *is simple to reproduce* on an Ubuntu install with an LVM2 layout:

As an example, I have Ubuntu Natty with LVM2. Suppose vgs reports 20% free space in the system volume group.

lvcreate -n test -l 15%FREE sysvolgroup # create a new empty logical volume (lvcreate needs -l for percentage sizes)
...
pvcreate /dev/sysvolgroup/test # initialise that new LV as an LVM physical volume
...
# everything works fine here, until reboot: when the system boots, udev reports

"udevd[PID] worker [PID] unexpectedly returned with status 0x0100"

The udev.log-priority=debug kernel parameter shows that the device on which udevd times out is [ /dev/dm-XX ], the device the /dev/sysvolgroup/test link points to. In this sample it times out within scripts/local-premount at the rule:

85-lvm2.rules: RUN+="watershed sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange -a y'"

After the timeout everything goes well again, until the next boot.
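
To capture the same trace, the debug logging can be enabled from the kernel command line; a sketch (that the messages land in the kernel ring buffer and are visible via dmesg is an assumption):

# Add udev.log-priority=debug to the kernel line in GRUB (press 'e' at the boot menu),
# then after boot look for the udevd worker/timeout messages:
dmesg | grep -i udevd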

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mdadm (Ubuntu):
status: New → Confirmed