SW RAID weirdness

Yesterday a disk dropped out of my RAID array at home:

ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xe frozen
ata3.00: irq_stat 0x00000040, connection status changed
ata3: SError: { PHYRdyChg CommWake DevExch }
ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         res 40/00:04:14:d6:42/00:00:25:00:00/40 Emask 0x10 (ATA bus error)
ata3.00: status: { DRDY }
ata3: hard resetting link
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: configured for UDMA/133
end_request: I/O error, dev sdc, sector 625137172
md: super_written gets error=-5, uptodate=0
raid1: Disk failure on sdc2, disabling device.
raid1: Operation continuing on 1 devices.
ata3: EH complete
sd 2:0:0:0: [sdc] 625142448 512-byte hardware sectors (320073 MB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:0, dev:sdc2
 disk 1, wo:0, o:1, dev:sdd2
RAID1 conf printout:
 --- wd:1 rd:2
 disk 1, wo:0, o:1, dev:sdd2
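Two details in that log are worth checking with a little arithmetic. The failing sector is very close to the end of the disk, and the `super_written gets error=-5` line (-5 is EIO) suggests the failed request was a write of the md superblock, which older md metadata formats keep near the end of the member device. The sketch below pulls both numbers out of the log and computes the distance; the function name and regular expressions are my own, and the superblock reading is an interpretation rather than something the log states outright.

```python
import re

# Two lines copied from the kernel log above.
LOG = """\
end_request: I/O error, dev sdc, sector 625137172
sd 2:0:0:0: [sdc] 625142448 512-byte hardware sectors (320073 MB)
"""

ERR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
SIZE_RE = re.compile(r"\[(\w+)\] (\d+) 512-byte hardware sectors")


def sectors_from_end(log):
    """Map each failing device to (failed sector, sectors left before end of disk)."""
    sizes = {dev: int(count) for dev, count in SIZE_RE.findall(log)}
    result = {}
    for dev, sector in ERR_RE.findall(log):
        sector = int(sector)
        if dev in sizes:
            result[dev] = (sector, sizes[dev] - sector)
    return result


for dev, (sector, gap) in sectors_from_end(LOG).items():
    mib = gap * 512 / 2**20
    print(f"{dev}: sector {sector} is {gap} sectors ({mib:.2f} MiB) from the end")
```

For the lines above this reports a gap of 5276 sectors, roughly 2.6 MiB, between the failed sector and the end of the disk.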

So I kicked the disk and rejoined it:

md: unbind<sdc2>
md: export_rdev(sdc2)
md: bind<sdc2>
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:1, dev:sdc2
 disk 1, wo:0, o:1, dev:sdd2
md: recovery of RAID array md1
md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
md: using 128k window, over a total of 312472192 blocks.
md: md1: recovery done.
RAID1 conf printout:
 --- wd:2 rd:2
 disk 0, wo:0, o:1, dev:sdc2
 disk 1, wo:0, o:1, dev:sdd2
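The recovery messages give enough numbers to bound how long a full resync should take. This is a rough sketch that assumes md's "blocks" are 1 KiB units and that the rebuild runs at a constant rate; the function name is hypothetical.

```python
def resync_time_bounds(total_kib, min_kib_per_s, max_kib_per_s):
    """Return (fastest, slowest) resync duration in seconds at the two rate limits."""
    fastest = total_kib / max_kib_per_s
    slowest = total_kib / min_kib_per_s
    return fastest, slowest


# Values taken from the recovery log above.
fastest, slowest = resync_time_bounds(312472192, 1000, 200000)
print(f"fastest: {fastest / 60:.0f} min, slowest: {slowest / 3600:.0f} h")
```

At the 200000 KB/sec ceiling the rebuild would take about 26 minutes; at the guaranteed minimum of 1000 KB/sec it would take about 87 hours. A real rebuild lands somewhere in between, depending on the disks and on other I/O.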

This morning I woke up to find that the other disk had dropped out, with an I/O error on the same sector (625137172). That is too much of a coincidence for my liking :-(

ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xe frozen
ata4.00: irq_stat 0x00000040, connection status changed
ata4: SError: { PHYRdyChg CommWake DevExch }
ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         res 40/00:04:14:d6:42/00:00:25:00:00/40 Emask 0x10 (ATA bus error)
ata4.00: status: { DRDY }
ata4: hard resetting link
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4.00: configured for UDMA/133
end_request: I/O error, dev sdd, sector 625137172
md: super_written gets error=-5, uptodate=0
raid1: Disk failure on sdd2, disabling device.
raid1: Operation continuing on 1 devices.
ata4: EH complete
sd 3:0:0:0: [sdd] 625142448 512-byte hardware sectors (320073 MB)
sd 3:0:0:0: [sdd] Write Protect is off
sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sdc2
 disk 1, wo:1, o:0, dev:sdd2
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sdc2
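To check the coincidence mechanically rather than by eye, the I/O error lines can be grouped by sector number. This is a minimal sketch with my own function names; it only looks at the `end_request` lines from the two log excerpts above.

```python
import re
from collections import defaultdict

# The two I/O error lines from the log excerpts above.
LINES = [
    "end_request: I/O error, dev sdc, sector 625137172",
    "end_request: I/O error, dev sdd, sector 625137172",
]

ERR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def devices_by_sector(lines):
    """Group the devices that reported an I/O error by failing sector."""
    by_sector = defaultdict(set)
    for line in lines:
        match = ERR_RE.search(line)
        if match:
            by_sector[int(match.group(2))].add(match.group(1))
    return dict(by_sector)


def shared_sectors(lines):
    """Return only the sectors that failed on more than one device."""
    return {
        sector: sorted(devs)
        for sector, devs in devices_by_sector(lines).items()
        if len(devs) > 1
    }


print(shared_sectors(LINES))
```

Both disks fail at sector 625137172. If the two partitions are laid out identically (an assumption, since the partition tables are not shown), the same array offset maps to the same sector on each mirror, so this looks more like the same logical write failing twice than two independent media defects.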

Comments

Comment from adrian
Time: Friday 22 August, 2008, 09:09

Thanks to Hugo for pointing out how to spell “weird”.

Comment from Kai Hendry
Time: Monday 25 August, 2008, 14:53

I’ve had terrible experiences with RAID. Whether software or hardware, once something fails, everything fails. :/ Yearning for better.

Comment from adrian
Time: Monday 25 August, 2008, 15:58

It’s normally worked fine for me in the past on other machines. What we really need is a cheap decent onboard RAID (i.e. with some _write_ cache). Or of course a dual-fabric SAN with some stonking RAID arrays :)