dmraid eats mdadm-managed raid in upgrading from 9.04 to 9.10

Bug #442735 reported by Spider
This bug affects 4 people
Affects           Status         Importance   Assigned to   Milestone
dmraid            Invalid        Undecided    Unassigned
dmraid (Debian)   Fix Released   Unknown
dmraid (Ubuntu)   Invalid        Undecided    Unassigned
mdadm (Ubuntu)    Invalid        Undecided    Unassigned

Bug Description

Binary package hint: mdadm

After rebooting into my newly upgraded 9.10 beta and rebooting a second time, my raid was lost. Apparently dmraid is loaded before the mdadm-managed raid, claims /dev/sda1 and the other members based on their raid signatures, and thereby prevents mdadm from assembling the array.

This leads to massive data loss, as dmraid will also erase the superblock/partition table on the devices.

In my case the solution is to forcibly remove dmraid and reboot.

After that you have to manually kill mountall in a single-user setup, because mountall will fail and go into a spin where it constantly and incessantly respawns fsck.ext3 (which fails and tells you to restart it without -p or -y).

On top of this, the terminal is garbled, forcing you to swap to another one. (It seems a curses-based interactive prompt gets overwritten, turning the whole screen into garbage.)

Once mountall is killed and all fsck jobs are pruned, you can re-create the partition tables on the lost disk and set the partition type to Linux Raid Autodetect.

Then you can once again run mdadm --assemble /dev/md0 /dev/sd[a-c]1 and wait 24+ hours for the resync.

Then you can fsck and hope that Ubuntu won't eat your raid again.
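
For reference, a rough sketch of the recovery sequence described above (the device names /dev/md0 and /dev/sd[a-c]1 are the ones from this report, and ext3 on the array is assumed; adapt everything to your own system and double-check each step, since mistakes here destroy data):

# stop the respawning mountall/fsck loop first
sudo killall mountall
sudo killall fsck.ext3

# re-create the partition table on the wiped disk; inside fdisk, set the
# partition type to fd (Linux raid autodetect) and write the table out
sudo fdisk /dev/sda

# re-assemble the array, watch the resync (24+ hours here), then fsck
sudo mdadm --assemble /dev/md0 /dev/sd[a-c]1
cat /proc/mdstat
sudo fsck.ext3 /dev/md0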

This is -clearly- suboptimal, and it caused me a lot of grief and panic.

On top of that, the ubuntu-bug tool for mdadm doesn't stop to ask for a password but happily runs mdadm -E on devices it has no permission to read, generating what appears to be a bogus report.

ProblemType: Bug
Architecture: amd64
Date: Mon Oct 5 03:53:24 2009
DistroRelease: Ubuntu 9.10
MDadmExamine.dev.sda: Error: command ['/sbin/mdadm', '-E', '/dev/sda'] failed with exit code 1: mdadm: cannot open /dev/sda: Permission denied
MDadmExamine.dev.sda1: Error: command ['/sbin/mdadm', '-E', '/dev/sda1'] failed with exit code 1: mdadm: cannot open /dev/sda1: Permission denied
MDadmExamine.dev.sdb: Error: command ['/sbin/mdadm', '-E', '/dev/sdb'] failed with exit code 1: mdadm: cannot open /dev/sdb: Permission denied
MDadmExamine.dev.sdb1: Error: command ['/sbin/mdadm', '-E', '/dev/sdb1'] failed with exit code 1: mdadm: cannot open /dev/sdb1: Permission denied
MDadmExamine.dev.sdc: Error: command ['/sbin/mdadm', '-E', '/dev/sdc'] failed with exit code 1: mdadm: cannot open /dev/sdc: Permission denied
MDadmExamine.dev.sdc1: Error: command ['/sbin/mdadm', '-E', '/dev/sdc1'] failed with exit code 1: mdadm: cannot open /dev/sdc1: Permission denied
MDadmExamine.dev.sdd: Error: command ['/sbin/mdadm', '-E', '/dev/sdd'] failed with exit code 1: mdadm: cannot open /dev/sdd: Permission denied
MDadmExamine.dev.sdd1: Error: command ['/sbin/mdadm', '-E', '/dev/sdd1'] failed with exit code 1: mdadm: cannot open /dev/sdd1: Permission denied
MDadmExamine.dev.sdd2: Error: command ['/sbin/mdadm', '-E', '/dev/sdd2'] failed with exit code 1: mdadm: cannot open /dev/sdd2: Permission denied
MDadmExamine.dev.sdd3: Error: command ['/sbin/mdadm', '-E', '/dev/sdd3'] failed with exit code 1: mdadm: cannot open /dev/sdd3: Permission denied
MDadmExamine.dev.sdd4: Error: command ['/sbin/mdadm', '-E', '/dev/sdd4'] failed with exit code 1: mdadm: cannot open /dev/sdd4: Permission denied
MDadmExamine.dev.sdd5: Error: command ['/sbin/mdadm', '-E', '/dev/sdd5'] failed with exit code 1: mdadm: cannot open /dev/sdd5: Permission denied
MDadmExamine.dev.sdd6: Error: command ['/sbin/mdadm', '-E', '/dev/sdd6'] failed with exit code 1: mdadm: cannot open /dev/sdd6: Permission denied
MachineType: Gigabyte Technology Co., Ltd. P35-DS4
NonfreeKernelModules: nvidia
Package: mdadm 2.6.7.1-1ubuntu13
ProcCmdLine: root=UUID=79dc13f4-cbae-42ee-bb76-a110e8a381d7 ro quiet splash
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/zsh
ProcVersionSignature: Ubuntu 2.6.31-11.38-generic
SourcePackage: mdadm
Uname: Linux 2.6.31-11-generic x86_64
dmi.bios.date: 07/21/2008
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: F13
dmi.board.name: P35-DS4
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrF13:bd07/21/2008:svnGigabyteTechnologyCo.,Ltd.:pnP35-DS4:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnP35-DS4:rvrx.x:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
dmi.product.name: P35-DS4
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

Revision history for this message
Spider (spider-alternating) wrote :

Please see https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/442737 for proper logs generated with admin privileges.

Revision history for this message
Spider (spider-alternating) wrote :

Further information: I'm not using the raid function on the mainboard; in fact it is explicitly disabled at boot because it was a hassle even in Windows.

I believe the main problem is:

a) load-time ordering of mdadm vs. dmraid. mdadm -REALLY- should be allowed to scan and assemble the drives before dmraid gets anywhere near my kernel's space.

b) clobbering by dmraid on failed load. The fact that it wiped my partition table on a disk was unforgivable.

c) bootup scripts failing on things like this and falling into semi-endless loops/spins without functional recovery. I realise this kind of failure isn't easy to reproduce on virtual hardware, but failure modes need to be tested more thoroughly and made to work. Having GDM load on top of the endless loop of fscks was even more damaging, because it hid the fact that something was seriously wrong.

Revision history for this message
Spider (spider-alternating) wrote :
summary: - dmraid /mdadm eats mdadm-managed raid in upgrading from 9.04 to 9.10
- beta.
+ dmraid eats mdadm-managed raid in upgrading from 9.04 to 9.10
Revision history for this message
PrivateUser132781 (privateuser132781-deactivatedaccount) wrote :

My mdadm-managed raid wouldn't mount after upgrading from jaunty to karmic. It seems this problem is caused by dmraid trying to take over the array. According to the Debian bug linked by Spider above, this is caused by dmraid recognising the device nodes as part of a fakeraid array rather than as part of a softraid array. In my case this could have been caused by the fact that, prior to setting up my softraid array, I had used the tool in my BIOS to set up a fakeraid array.

I could solve this problem by uninstalling dmraid. (I don't know why it was installed -- did it get installed with the karmic upgrade or was it always there and this behaviour is just new?) After rebooting mdadm initialised the array just fine.

Even so, this behaviour results in serious breakage from the user's point of view. Some way of avoiding it should be found -- see the discussion in the Debian bug tracker.

I am bumping the importance, but someone from the server team should probably look into this issue.

Changed in dmraid:
importance: Unknown → Undecided
status: Unknown → New
Revision history for this message
Tormod Volden (tormodvolden) wrote :

Thanks for your report. There are a number of issues in the original description; I will leave this report for the dmraid activation-by-default issue. Please open separate bug reports for the other issues.

Changed in mdadm (Ubuntu):
status: New → Invalid
Changed in dmraid:
status: New → Invalid
Revision history for this message
Tormod Volden (tormodvolden) wrote :

The problem here is that dmraid finds valid fakeraid signatures on your disks, which is the only way for it to know whether you have a fakeraid or not. It would not happen if you removed the fakeraid in your BIOS setup.

A workaround is to boot with the "nodmraid" boot option. This, and the fact that dmraid is included on the Desktop CD, should appear in the Release Notes, but for some reason it is not on the website yet.
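
For anyone unsure how to pass that option, here is a hedged sketch; nothing below comes from this report, and the exact steps depend on whether the machine still has GRUB legacy (typical after an upgrade from 9.04) or GRUB 2 (default on fresh 9.10 installs):

# one-off: at the GRUB menu press 'e', append nodmraid to the
# kernel (GRUB legacy) or linux (GRUB 2) line, then boot

# permanent with GRUB 2: add nodmraid to GRUB_CMDLINE_LINUX_DEFAULT
# in /etc/default/grub, then regenerate the config
sudo update-grub

# permanent with GRUB legacy: append nodmraid to the "# kopt=" line
# in /boot/grub/menu.lst, then regenerate the kernel entries
sudo update-grub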

Changed in dmraid (Ubuntu):
status: New → Confirmed
Revision history for this message
Brian Buchanan (brianbuchanan) wrote :

Thank you. Same problem here after upgrading 9.04 to 9.10, with dmraid taking the devices before mdadm. I couldn't figure out why I had no partitions in /proc/partitions (i.e. /dev/sda existed, but /dev/sda1 was missing). Using fdisk to rewrite the partition table created the nodes, but mdadm couldn't assemble the raid, stating that the device or resource was busy.

apt-get remove dmraid

got me going. I had additional, likely unrelated, problems with my LVM volumes at boot. Either this incident lost the ext3 journal or the volume never had one, but the volume (defined as ext3 in fstab) refused to mount. tune2fs -j /dev/mapper/vg-lv added the journal, and so far so good.
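
For anyone hitting the same "device or resource busy" symptom, a quick hedged way to check whether dmraid/device-mapper has claimed the disks (standard commands, not taken from this report; output will differ per system):

cat /proc/partitions    # are the sdX1 partitions visible at all?
sudo dmraid -r          # list any fakeraid metadata dmraid has found on the disks
sudo dmsetup table      # show active device-mapper mappings and the devices they map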

Revision history for this message
Peter Szmulik (peter-szmulik) wrote :

Having just upgraded from 8.10 to 9.04 and experiencing similar problems, I guess upgrading to 9.10 is not a fix and the cause is the same?

On boot I get "atax: softreset failed (device not ready)". It then continues booting but drops out at the file system check; the root partition, which sits on a non-raid, non-lvm volume, is fine. For my two logical volumes, which sit on top of a set of two RAID-1 sets, fsck.ext3 is unable to resolve the UUIDs and drops out to a maintenance shell. The two logical volumes host /home and /var. At the maintenance shell I tried running mdadm --examine --scan --config=mdadm.conf >> /etc/mdadm/mdadm.conf. At one point that did allow me to log into the system with access to /home and /var. I believe I was able to rescue all data to an external USB drive; however, gparted reported errors on all affected volumes, indicating that not all data could be read. After an (unwise) reboot I'm back where I started and the previous fix no longer works. Trying to boot with an older kernel avoids the softreset problem, but the system still fails the file system check.
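
If the array can be assembled from the maintenance shell but is gone again after a reboot, one hedged sketch (assuming the standard Ubuntu mdadm layout, not anything verified on this machine) is to record the array definitions and rebuild the initramfs so they are available at boot:

# as root (the maintenance shell already is)
mdadm --examine --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u    # rebuild the initramfs so the arrays are known at boot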

Would the current recommendation for regular users (and anyone with just modest time/energy available) be to avoid upgrading to 9.04 or 9.10 if they have software raid and lvm enabled? Or to back up all data and install from scratch if you really need to move to a new version? Until these issues are resolved I would think it could be a good idea to mention them in the release notes for 9.04 and 9.10.

Revision history for this message
Tormod Volden (tormodvolden) wrote :

Peter, if you have no fakeraids you can simply uninstall the dmraid package.

Revision history for this message
Peter Szmulik (peter-szmulik) wrote :

Tormod, thanks; that sort of hit me like a brick wall! I run Linux/software raid! Wow.

I'll see what happens!

Many thanks!
Peter

Revision history for this message
Ate Siemensma (ate-atesiemensma) wrote :

I cannot uninstall dmraid, as I boot from /dev/mapper/pdc_dejiafebgh1.
I disabled the unused dmraid set with "sudo dmraid -an -v"; output:

RAID set "nvidia_jaeffjja" is not active
INFO: Deactivating stripe raid set "nvidia_jaeffjja"

That deactivated the falsely detected set that was holding the mdadm raid disks.

After that I could use the disks in mdadm again by running:
"sudo mdadm --assemble --scan --verbose"
But I still have to perform this at every reboot.
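
One possible way to avoid retyping this on every boot (untested here, purely a sketch, and it only helps if nothing in /etc/fstab needs the array earlier in the boot sequence) would be to run the same two commands from /etc/rc.local:

#!/bin/sh -e
# /etc/rc.local -- executed as root at the end of each multi-user boot
dmraid -an -v || true                     # deactivate the falsely detected fakeraid set
mdadm --assemble --scan --verbose || true # then assemble the mdadm array
exit 0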

Revision history for this message
Phillip Susi (psusi) wrote :

I'm going to have to say that this is not a bug, but rather an invalid hardware configuration. A disk cannot be part of a fakeraid and an mdraid at the same time. Simply disabling the raid support in the BIOS has no effect on Linux, as it has no way of telling whether the BIOS raid support is enabled or not. If you aren't using the disk as part of a fakeraid set, then you should scrub the raid metadata from the disks with dmraid -E, or with the BIOS utility.
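
For reference, a hedged sketch of scrubbing stale fakeraid metadata with dmraid itself, along the lines suggested above and in the later comments; only do this if you are certain the disks are not part of a real fakeraid set, since erasing the metadata is irreversible:

sudo dmraid -r               # show which disks carry fakeraid metadata
sudo dmraid -r -E /dev/sda   # erase that metadata from a specific disk
                             # (dmraid normally asks for confirmation before wiping)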

Changed in dmraid (Ubuntu):
status: Confirmed → Invalid
Revision history for this message
darren (darrenm) wrote :

No BIOS utility here to scrub the metadata.

I reported one of the duplicates of this bug months ago and I see it still isn't fixed. I couldn't install Ubuntu on my PC using software RAID because I have old dmraid metadata on the devices. I've now read up on what the comments above mean, and I've managed to boot correctly into Ubuntu by editing the GRUB command line to add nodmraid, running dmraid -r -E in Ubuntu, and then rebooting happily without the dmraid metadata. This is undoubtedly the issue, and it is correctly described above: md and dmraid metadata can't coexist.

But the issue remains that if existing metadata is there, someone can't just install Ubuntu using md software RAID. It requires extra steps to get it working, something less technical users may not be able or inclined to do.

Changed in dmraid (Ubuntu):
status: Invalid → New
Phillip Susi (psusi)
Changed in dmraid (Ubuntu):
status: New → Invalid
Changed in dmraid (Debian):
status: Unknown → Fix Released