Comment 13 for bug 330298

RnSC (webclark) wrote : Re: mdadm software raid breaks on intrepid-jaunty upgrade

What the title describes definitely happened to me.

The system boots off a simple, separate disk.
Two 500GB disks (sdb, sdc) were mirrored with md, then a VG and an LV were created on the mirror, with ext3 on top. Simple, worked great. /etc/mdadm/mdadm.conf did NOT list the configuration, but rather depended on a scan to figure it all out on every boot. I did not set it up that way on purpose. I just ran the mdadm, vgxxx and lvxxx commands to create it all at install time. Never touched the .conf.
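
A minimal sketch of what listing the array in the config explicitly could look like, instead of depending on the scan. It assumes the array is assembled when the command is run and that `mdadm --detail --scan` prints one `ARRAY` line per array. The helper name `add_array_lines` is made up for illustration and is not part of mdadm.

```shell
#!/bin/sh
# Sketch only. Reads "mdadm --detail --scan" output on stdin and appends
# each ARRAY line to the given config file unless that exact line is
# already there. The function name is hypothetical.
#
# Intended use (as root), with the config path from the post:
#   mdadm --detail --scan | add_array_lines /etc/mdadm/mdadm.conf
add_array_lines() {
    conf=$1
    while IFS= read -r line; do
        case $line in
            ARRAY*)
                # -x: match the whole line, -F: treat it as a fixed string
                grep -qxF -- "$line" "$conf" 2>/dev/null ||
                    printf '%s\n' "$line" >> "$conf"
                ;;
        esac
    done
}
```

On Ubuntu the initramfs presumably carries its own copy of mdadm.conf, so running `update-initramfs -u` afterwards would likely be needed as well. That is an assumption and was not verified on this system.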

Upgraded from 0810 to 0904.
This is where I get fuzzy; I did not keep good records.

On boot, fsck failed saying that the volume did not exist.
mdadm has at various times told me that sdb did not exist, had a bad superblock, and was in use by another process. No doubt I caused some of my own problems. At one point in the past, after a re-install of 0810, the disks were not recognized and I recovered by setting one of the mirrors to fail, removing it, and re-adding it. That did not work this time. I continued to fiddle, as well as learn the syntax of mdadm (the man page is confusing, to me at least).
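
For anyone following along, the fail / remove / re-add sequence mentioned above can be sketched as below. The array name /dev/md0 is an assumption, since the actual name is not given here, and `readd_plan` is a made-up helper that only prints the commands so they can be checked before anything is run as root.

```shell
#!/bin/sh
# Sketch only: print the fail / remove / re-add sequence for one mirror
# half. Nothing is executed against the array.
readd_plan() {
    md=$1      # array device, e.g. /dev/md0 (assumed name)
    member=$2  # mirror half to cycle, e.g. /dev/sdb
    printf 'mdadm %s --fail %s\n'   "$md" "$member"
    printf 'mdadm %s --remove %s\n' "$md" "$member"
    printf 'mdadm %s --add %s\n'    "$md" "$member"
}

readd_plan /dev/md0 /dev/sdb
```

After the last step the mirror should resync, which can be watched in /proc/mdstat.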

At all times mdadm --examine /dev/sdb (or sdc) told me that both mirrors were fine / clean.

As I fiddled, one disk came on-line. I rebooted, and it was gone. I got it back. At one point I had both. Fool that I am, I rebooted, and lost them. After much more fiddling (they wouldn't come back) I got one back. I am now backing up to a non-md ext3 on a USB drive! I plan to wipe the disks and reinstall from scratch.

Error messages saying that /dev/sdb does not exist when I can dd blocks from it, or that it is in use by another process when (1) I have not run anything and (2) lsof does not show anything, and statements that the superblock is bad on a mirror that had been running for six months, was healthy when I pushed the "upgrade" button, and was dead on reboot at the end of the install process ... are not helpful at all! Plus, the results were inconsistent from one attempt to the next.

It sure looks like something buggy / flaky is happening. I would suspect flaky hardware, except that the system has been stable for 6 or 8 years, right up until I pressed "Upgrade". Two hours later, this.

Should I assume that 0904 is inherently stable and that this was just a botched "upgrade" procedure that failed to handle something that had changed, or should I re-install 0810?

Opinions and your rationale would be GREATLY appreciated.