mdadm refuses to re-add failed member
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
mdadm (Ubuntu) | New | Undecided | Unassigned |
Bug Description
My /home is a three-disk RAID1 array (/dev/md1): one member is a partition on my laptop's internal disk, a second is on an external disk connected via eSATA, and a third sits on another external disk. I booted with the array degraded and two members missing (external drive not plugged in). Before logging in, I used a console to umount /home, remove and fail the active member (the internal partition), and stop the array. I then plugged in my external disk, restarted /dev/md1 with the external partition as the active member, and remounted /home. I have executed this process many times before; it is scripted in a couple of files in /usr/local/bin.
However, this time, after logging in with the external member active, I tried to re-add the internal partition to bring /dev/md1 back in sync with the external disk and received an error saying the add had failed. I repeated the remove, fail, and re-add steps manually with the same outcome, as shown in the console output below, and filed this bug.
When I interrogate the failed disk with -Q --examine, it seems to think it is still active.
:~# mdadm /dev/md1 -r /dev/sda6
mdadm: hot remove failed for /dev/sda6: No such device or address
:~# mdadm /dev/md1 -f /dev/sda6
mdadm: set device faulty failed for /dev/sda6: No such device
:~# mdadm /dev/md1 -a /dev/sda6
mdadm: /dev/sda6 reports being an active member for /dev/md1, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sda6 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sda6" first.
:~# mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdc3[0]
86003840 blocks [3/1] [U__]
unused devices: <none>
:~# apport-bug mdadm
Here is a quick summary of what I did:
a) My disks were synced on an 11.10 system
b) I upgraded from 11.10 to 12.04 with one member failed (external)
c) After upgrade I failed the active disk (internal), stopped the array, and restarted it with the external disk
d) Attempted to re-add the failed internal disk after logging in
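Steps c) and d) can be sketched as a small shell script. This is not the script from /usr/local/bin (which is not shown in this report) but a hypothetical reconstruction: the function names `run`, `swap_to_external` and `readd_internal` and the variable names are made up for illustration, and the device names are taken from the console output above. It fails the member before removing it, on the assumption that mdadm only removes members that are already failed or spare. `DRY_RUN` defaults to 1, so the commands are only printed.

```shell
#!/bin/sh
# Hypothetical sketch of the member swap; names are illustrative only.
# Device names are the ones seen in this report; adjust before use.
ARRAY=${ARRAY:-/dev/md1}
INTERNAL=${INTERNAL:-/dev/sda6}
EXTERNAL=${EXTERNAL:-/dev/sdc3}
MOUNTPOINT=${MOUNTPOINT:-/home}
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, anything else = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step c): drop the internal member and restart the array on the external one.
swap_to_external() {
    run umount "$MOUNTPOINT"
    run mdadm "$ARRAY" --fail "$INTERNAL"      # assumption: fail before remove
    run mdadm "$ARRAY" --remove "$INTERNAL"
    run mdadm --stop "$ARRAY"
    run mdadm --assemble --run "$ARRAY" "$EXTERNAL"   # --run starts it degraded
    run mount "$ARRAY" "$MOUNTPOINT"
}

# Step d): try to bring the internal member back into the array.
readd_internal() {
    run mdadm "$ARRAY" --re-add "$INTERNAL"
}
```

Calling `swap_to_external` followed by `readd_internal` with the defaults prints the seven commands without touching the array. Step d) is the one that fails in this report.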
:~# blkid | grep raid_member
/dev/sda6: UUID="eeeb6708-
/dev/sdc3: UUID="eeeb6708-
:~# mdadm -D /dev/md1
/dev/md1:
Version : 0.90
Creation Time : Sun Jul 27 22:53:23 2008
Raid Level : raid1
Array Size : 86003840 (82.02 GiB 88.07 GB)
Used Dev Size : 86003840 (82.02 GiB 88.07 GB)
Raid Devices : 3
Total Devices : 1
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Sat Mar 3 13:56:05 2012
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : eeeb6708:
Events : 0.10186827
Number Major Minor RaidDevice State
0 8 35 0 active sync /dev/sdc3
1 0 0 1 removed
2 0 0 2 removed
:~# mdadm -Q /dev/sdc3
/dev/sdc3: is not an md array
/dev/sdc3: device 0 in 3 device active raid1 /dev/md1. Use mdadm --examine for more detail.
:~# mdadm -Q /dev/sda6
/dev/sda6: is not an md array
/dev/sda6: device 1 in 3 device mismatch raid1 /dev/md1. Use mdadm --examine for more detail.
:~# mdadm -Q /dev/sda6 --examine
/dev/sda6:
Magic : a92b4efc
Version : 0.90.00
UUID : eeeb6708:
Creation Time : Sun Jul 27 22:53:23 2008
Raid Level : raid1
Used Dev Size : 86003840 (82.02 GiB 88.07 GB)
Array Size : 86003840 (82.02 GiB 88.07 GB)
Raid Devices : 3
Total Devices : 1
Preferred Minor : 1
Update Time : Sat Mar 3 13:28:57 2012
State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
Checksum : 60f50ddb - correct
Events : 10128612
Number Major Minor RaidDevice State
this 1 8 6 1 active sync /dev/sda6
0 0 0 0 0 removed
1 1 8 6 1 active sync /dev/sda6
2 2 0 0 2 faulty removed
Clearly it is not active (the only active entry in the -D output above is 0, 8, 35, 0, i.e. /dev/sdc3), but it thinks it is.
I have captured enough; time to reboot and see what happens, hopefully an auto-rebuild. I keep the third disk of the array stored separately in case some corruption happens here.
ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: mdadm 3.2.3-2ubuntu1
ProcVersionSign
Uname: Linux 3.2.0-17-generic x86_64
NonfreeKernelMo
ApportVersion: 1.94-0ubuntu1
Architecture: amd64
Date: Sat Mar 3 13:33:11 2012
MDadmExamine.
/dev/sda:
MBR Magic : aa55
Partition[0] : 121660182 sectors at 63 (type 07)
Partition[1] : 503477100 sectors at 121660245 (type 05)
MDadmExamine.
/dev/sda2:
MBR Magic : aa55
Partition[0] : 78124032 sectors at 63 (type 83)
Partition[1] : 172007893 sectors at 78124095 (type 05)
MDadmExamine.
MDadmExamine.
MDadmExamine.
MDadmExamine.
/dev/sdc:
MBR Magic : aa55
Partition[0] : 104438502 sectors at 63 (type 83)
Partition[1] : 20498940 sectors at 104438565 (type 0b)
Partition[2] : 172007893 sectors at 124937505 (type fd)
MDadmExamine.
MDadmExamine.
/dev/sdc2:
MBR Magic : aa55
MachineType: Hewlett-Packard HP Pavilion dv5 Notebook PC
ProcEnviron:
LANGUAGE=en
TERM=xterm
LANG=en_US.utf8
SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=
ProcMDstat:
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdc3[0]
86003840 blocks [3/1] [U__]
unused devices: <none>
SourcePackage: mdadm
UpgradeStatus: Upgraded to precise on 2012-03-03 (0 days ago)
dmi.bios.date: 08/19/2009
dmi.bios.vendor: Hewlett-Packard
dmi.bios.version: F.37
dmi.board.
dmi.board.name: 30F2
dmi.board.vendor: Quanta
dmi.board.version: 98.36
dmi.chassis.type: 10
dmi.chassis.vendor: Quanta
dmi.chassis.
dmi.modalias: dmi:bvnHewlett-
dmi.product.name: HP Pavilion dv5 Notebook PC
dmi.product.
dmi.sys.vendor: Hewlett-Packard
mtime.conffile.
Still weird after the reboot. mdstat looks fine, but the failed member still thinks it is active.
:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdb3[0]
86003840 blocks [3/1] [U__]
unused devices: <none>
:~# mdadm -D /dev/md1
/dev/md1:
Version : 0.90
Creation Time : Sun Jul 27 22:53:23 2008
Raid Level : raid1
Array Size : 86003840 (82.02 GiB 88.07 GB)
Used Dev Size : 86003840 (82.02 GiB 88.07 GB)
Raid Devices : 3
Total Devices : 1
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Sat Mar 3 14:12:24 2012
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : eeeb6708:d1080847:57e9714c:01b7dbc8
Events : 0.10187219
Number Major Minor RaidDevice State
0 8 19 0 active sync /dev/sdb3
1 0 0 1 removed
2 0 0 2 removed
:~# mdadm -Q --examine /dev/sda6
/dev/sda6:
Magic : a92b4efc
Version : 0.90.00
UUID : eeeb6708:
Creation Time : Sun Jul 27 22:53:23 2008
Raid Level : raid1
Used Dev Size : 86003840 (82.02 GiB 88.07 GB)
Array Size : 86003840 (82.02 GiB 88.07 GB)
Raid Devices : 3
Total Devices : 1
Preferred Minor : 1
Update Time : Sat Mar 3 13:28:57 2012
State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
Checksum : 60f50ddb - correct
Events : 10128612
Number Major Minor RaidDevice State
this 1 8 6 1 active sync /dev/sda6
0 0 0 0 0 removed
1 1 8 6 1 active sync /dev/sda6
2 2 0 0 2 faulty removed
:~# mdadm -Q --examine /dev/sdb3
/dev/sdb3:
Magic : a92b4efc
Version : 0.90.00
UUID : eeeb6708:
Creation Time : Sun Jul 27 22:53:23 2008
Raid Level : raid1
Used Dev Size : 86003840 (82.02 GiB 88.07 GB)
Array Size : 86003840 (82.02 GiB 88.07 GB)
Raid Devices : 3
Total Devices : 1
Preferred Minor : 1
Update Time : Sat Mar 3 14:12:34 2012
State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 2
Spare Devices : 0
Checksum : 60f6e218 - correct
Events : 10187225
Number Major Minor RaidDevice State
this 0 8 19 0 active sync /dev/sdb3
0 0 8 19 0 active sync /dev/sdb3
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
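The two --examine dumps above differ in their Events counters (10128612 on /dev/sda6, 10187225 on /dev/sdb3), which is one way to see which superblock is out of date. A minimal sketch for pulling that counter out and comparing it follows. The helper names `events_of` and `stale_member` are hypothetical, and the sketch assumes the `Events : N` line format shown in the --examine output above.

```shell
#!/bin/sh
# Print the Events counter found in `mdadm --examine` text read from stdin.
# Assumes a line of the form "Events : 10128612" as in the dumps above.
events_of() {
    awk -F: '/^[ \t]*Events[ \t]*:/ {
        gsub(/[^0-9]/, "", $2)   # strip spaces, keep the digits
        print $2
        exit
    }'
}

# Name the member with the lower event count, i.e. the stale one.
# usage: stale_member NAME_A EVENTS_A NAME_B EVENTS_B
stale_member() {
    if [ "$2" -lt "$4" ]; then
        echo "$1"
    elif [ "$4" -lt "$2" ]; then
        echo "$3"
    else
        echo "neither"
    fi
}
```

On a live system the counters could be collected with something like `a=$(mdadm --examine /dev/sda6 | events_of)`. With the values from this report, `stale_member /dev/sda6 10128612 /dev/sdb3 10187225` prints `/dev/sda6`, matching the observation that the internal partition's superblock was last updated at 13:28:57, before the array was restarted on the external disk.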