Xen dom0 kernel corrupts software raid (2.6.24-19)
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux (Ubuntu) | Invalid | Medium | Stefan Bader | |
| linux (Ubuntu Hardy) | Fix Released | Medium | Stefan Bader | |
| linux-meta (Ubuntu) | Invalid | Undecided | Unassigned | |
| linux-meta (Ubuntu Hardy) | Invalid | Undecided | Unassigned | |
Bug Description
Binary package hint: linux-xen
Hardy Heron (fully patched) installed with this kickstart file:
http://
I build four RAID 5 arrays and create a filesystem, mount point, and test file on each:
mdadm -C /dev/md0 -l 5 -n 4 /dev/sd[abcd]3
mdadm -C /dev/md1 -l 5 -n 4 /dev/sd[efgh]3
mdadm -C /dev/md2 -l 5 -n 4 /dev/sd[ijkl]3
mdadm -C /dev/md3 -l 5 -n 4 /dev/sd[mnop]3
for i in `seq 0 3`; do
  mkfs.ext3 /dev/md$i
  mount /dev/md$i /disk/$i
  touch /disk/$i/f
done
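Creating a RAID 5 array with mdadm starts a background resync, and the loop above assumes the /disk/0 through /disk/3 mount points already exist (presumably created by the kickstart install). A minimal pre-flight check along these lines, not part of the original reproduction steps, confirms the arrays and mounts are in the expected state before benchmarking:
# Create the mount points if the kickstart file did not already do so.
mkdir -p /disk/{0,1,2,3}
# Confirm each array is assembled with all four members; a freshly created
# RAID 5 will still show a resync in progress here.
cat /proc/mdstat
for i in 0 1 2 3; do mdadm --detail /dev/md$i | grep -E 'State :|Active Devices'; done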
Then run:
iozone -s 16g -r 1024 -t 4 -F /disk/[0123]/f
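While the benchmark runs, the failure can be followed live from a second shell (a simple monitoring sketch, not part of the original report):
watch -n 5 cat /proc/mdstat
# and, in another terminal, check for DMA mapping errors as they appear:
dmesg | tail -n 30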
When I run iozone, the disk subsystem gets corrupted.
dmesg reports:
[ 2435.753683] 3w-9xxx: scsi8: ERROR: (0x06:0x001C): Failed to map scatter gather list.
[ 2435.753745] PCI-DMA: Out of SW-IOMMU space for 32768 bytes at device 0000:06:00.0
[ 2435.753811] 3w-9xxx: scsi8: ERROR: (0x06:0x001C): Failed to map scatter gather list.
[ 2435.753864] PCI-DMA: Out of SW-IOMMU space for 32768 bytes at device 0000:06:00.0
[ 2435.753913] 3w-9xxx: scsi8: ERROR: (0x06:0x001C): Failed to map scatter gather list.
[ 2435.754003] PCI-DMA: Out of SW-IOMMU space for 32768 bytes at device 0000:06:00.0
[ 2435.754068] 3w-9xxx: scsi8: ERROR: (0x06:0x001C): Failed to map scatter gather list.
[ 2435.754121] PCI-DMA: Out of SW-IOMMU space for 32768 bytes at device 0000:06:00.0
[ 2435.754170] 3w-9xxx: scsi8: ERROR: (0x06:0x001C): Failed to map scatter gather list.
[ 2435.754176] sd 8:0:3:0: [sdd] Result: hostbyte=DID_ERROR driverbyte=
[ 2435.754181] end_request: I/O error, dev sdd, sector 37085198
[ 2435.754191] sd 8:0:3:0: [sdd] Result: hostbyte=DID_ERROR driverbyte=
[ 2435.754194] end_request: I/O error, dev sdd, sector 37085582
[ 2435.854168] raid5:md0: read error corrected (8 sectors at 1902976 on sdd3)
[ 2435.854544] raid5:md0: read error corrected (8 sectors at 1902984 on sdd3)
[ 2435.854552] raid5:md0: read error corrected (8 sectors at 1902992 on sdd3)
[ 2435.854556] raid5:md0: read error corrected (8 sectors at 1903000 on sdd3)
[ 2435.854559] raid5:md0: read error corrected (8 sectors at 1903008 on sdd3)
[ 2435.854566] raid5:md0: read error corrected (8 sectors at 1903016 on sdd3)
...
/proc/mdstat reports:
md0 : active raid5 sda3[0] sdd3[4](F) sdc3[5](F) sdb3[6](F)
187526400 blocks level 5, 64k chunk, algorithm 2 [4/1] [U___]
md1 : active raid5 sde3[0] sdh3[3] sdg3[2] sdf3[1]
187526400 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
md2 : active raid5 sdi3[4](F) sdl3[5](F) sdk3[2] sdj3[1]
187526400 blocks level 5, 64k chunk, algorithm 2 [4/2] [_UU_]
md3 : active raid5 sdm3[0] sdp3[3] sdo3[2] sdn3[1]
187526400 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
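The (F) markers show that md0 and md2 have each kicked out multiple members. If the drops were caused only by the failed DMA mappings and the disks themselves are healthy, the arrays can usually be reassembled after rebooting into a working kernel; a hedged recovery sketch (not from the original report, and it assumes the member disks are undamaged):
umount /disk/0
mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sd[abcd]3
mdadm --detail /dev/md0    # verify all four members are back and resyncing
# an fsck of the filesystem afterwards is advisable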
When I replace the xen dom0 kernel:
Linux 43-246-120-128 2.6.24-19-xen #1 SMP Wed Jun 18 16:08:38 UTC 2008 x86_64 GNU/Linux
With:
Linux 43-246-120-128 2.6.24-19-generic #1 SMP Wed Jun 18 14:15:37 UTC 2008 x86_64 GNU/Linux
everything works. I've reproduced this several times: the Xen kernel causes multiple disks to drop out of the RAID every time, while the generic kernel works perfectly (no dropped disks, no dmesg errors).
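The "Out of SW-IOMMU space" messages point at the dom0 kernel exhausting its swiotlb bounce buffer while the 3ware controller does heavy DMA. A commonly suggested mitigation (untested here, and the accepted values differ between mainline and Xen-patched trees) is to boot the dom0 kernel with a larger swiotlb:
# Check the current dom0 boot parameters:
cat /proc/cmdline
# Then append something like "swiotlb=128" (value is an assumption; mainline
# interprets it as a slab count, some Xen trees as a size in MB) to the
# 2.6.24-19-xen entry in /boot/grub/menu.lst and reboot.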
Changed in linux-meta:
status: New → Invalid
Changed in linux:
assignee: nobody → stefan-bader-canonical
importance: Undecided → Medium
status: New → In Progress
Changed in linux:
status: In Progress → Invalid
found the package name