SW RAID weirdness
Yesterday a disk dropped out of my RAID array at home:
ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xe frozen
ata3.00: irq_stat 0x00000040, connection status changed
ata3: SError: { PHYRdyChg CommWake DevExch }
ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
res 40/00:04:14:d6:42/00:00:25:00:00/40 Emask 0x10 (ATA bus error)
ata3.00: status: { DRDY }
ata3: hard resetting link
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: configured for UDMA/133
end_request: I/O error, dev sdc, sector 625137172
md: super_written gets error=-5, uptodate=0
raid1: Disk failure on sdc2, disabling device.
raid1: Operation continuing on 1 devices.
ata3: EH complete
sd 2:0:0:0: [sdc] 625142448 512-byte hardware sectors (320073 MB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:1, o:0, dev:sdc2
disk 1, wo:0, o:1, dev:sdd2
RAID1 conf printout:
--- wd:1 rd:2
disk 1, wo:0, o:1, dev:sdd2
So I kicked the disk and rejoined it:
md: unbind<sdc2>
md: export_rdev(sdc2)
md: bind<sdc2>
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:1, o:1, dev:sdc2
disk 1, wo:0, o:1, dev:sdd2
md: recovery of RAID array md1
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
md: using 128k window, over a total of 312472192 blocks.
md: md1: recovery done.
RAID1 conf printout:
--- wd:2 rd:2
disk 0, wo:0, o:1, dev:sdc2
disk 1, wo:0, o:1, dev:sdd2
This morning I woke up to find that the other disk had dropped out, with an I/O error for the same sector. That is too much of a coincidence for my liking:
ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xe frozen
ata4.00: irq_stat 0x00000040, connection status changed
ata4: SError: { PHYRdyChg CommWake DevExch }
ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
res 40/00:04:14:d6:42/00:00:25:00:00/40 Emask 0x10 (ATA bus error)
ata4.00: status: { DRDY }
ata4: hard resetting link
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4.00: configured for UDMA/133
end_request: I/O error, dev sdd, sector 625137172
md: super_written gets error=-5, uptodate=0
raid1: Disk failure on sdd2, disabling device.
raid1: Operation continuing on 1 devices.
ata4: EH complete
sd 3:0:0:0: [sdd] 625142448 512-byte hardware sectors (320073 MB)
sd 3:0:0:0: [sdd] Write Protect is off
sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:0, o:1, dev:sdc2
disk 1, wo:1, o:0, dev:sdd2
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:0, o:1, dev:sdc2
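To make the coincidence concrete, here is a minimal Python sketch that pulls the failing device and sector out of the two `end_request` lines above and works out where that sector sits relative to the end of the disk. The helper names (`failed_sectors`, `sectors_from_end`) are made up for illustration, and the disk size is simply the sector count the `sd` driver reports in the logs. It only does the arithmetic; it does not explain why both disks failed at that spot.

```python
import re

# The two I/O error lines copied from the kernel logs above.
LOG = """\
end_request: I/O error, dev sdc, sector 625137172
end_request: I/O error, dev sdd, sector 625137172
"""

# Disk size as reported by the sd driver in the logs (512-byte sectors).
TOTAL_SECTORS = 625142448
SECTOR_BYTES = 512

IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def failed_sectors(log_text):
    """Map each device name to the set of sectors that reported an I/O error."""
    result = {}
    for dev, sector in IO_ERROR.findall(log_text):
        result.setdefault(dev, set()).add(int(sector))
    return result


def sectors_from_end(sector, total=TOTAL_SECTORS):
    """Number of sectors between the given sector and the end of the disk."""
    return total - sector


errors = failed_sectors(LOG)
# Sectors that failed on every device seen in the log.
common = set.intersection(*errors.values())

for sector in sorted(common):
    gap = sectors_from_end(sector)
    print(f"sector {sector} failed on {sorted(errors)}: "
          f"{gap} sectors ({gap * SECTOR_BYTES} bytes) before the end of the disk")
```

Both disks report the same size and the same failing sector, so the failing sector sits at the same offset from the end of each disk.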
Posted: August 22nd, 2008 under My setup.
Comments: 3
Comments
Comment from adrian
Time: Friday 22 August, 2008, 09:09
Thanks to Hugo for pointing out how to spell “weird”.
Comment from Kai Hendry
Time: Monday 25 August, 2008, 14:53
I’ve had terrible experiences with RAID. Whether software or hardware, once something fails everything fails. :/ Yearning for better.
Comment from adrian
Time: Monday 25 August, 2008, 15:58
It’s normally worked fine for me in the past on other machines. What we really need is a cheap, decent onboard RAID (i.e. with some _write_ cache). Or of course a dual-fabric SAN with some stonking RAID arrays.