
ECEN 5623 RT Embedded Systems
mercury.pr.erau.edu/.../Lectures/Lecture-Basic-RAID-Backgrounder.pdf


September 10, 2014 Sam Siewert

Software Engineering

RAID Backgrounder

Scalable Enterprise File systems

Three Types of Media Storage

– Direct Attached Storage – e.g. SATA (Serial ATA)

– Network Attached Storage – e.g. NFS

– Storage Area Networks – e.g. SAS (Serial Attached SCSI), Fibre Channel

Flash / RAM based SSD Still 10x++ More Costly than Spinning Media

– Predictions for Demise of HDDs and RAID?

– Cost is the Driver (e.g. < $0.01 / GB tape, < $0.10 / GB HDD, $1.00 / GB SSD)

Fast Storage is Either SSD, RAID or Hybrid


Multiple Disk Drives

Disk Drives Fail – Like a Light-bulb

– MTBF of 100’s of Thousands of Hours [3 to 5 Years at Duty Cycle]

– Difficult to Determine When Failure Might Occur

– The Larger the Population, the More Often Failures will be Seen

Disk Drives Have Low Random Access Rates [100 to 200 I/Os per Second]

Idea – Write to them in Parallel and Mirror Data to Protect Against HDD Failures (Erasures)


RAID-10


RAID-0 Striping Over RAID-1 Mirrors
Logical blocks A1, A2, A3, … A12 striped across three RAID-1 mirror pairs:

  RAID-1 Mirror   RAID-1 Mirror   RAID-1 Mirror
  A1   A1         A2   A2         A3   A3
  A4   A4         A5   A5         A6   A6
  A7   A7         A8   A8         A9   A9
  A10  A10        A11  A11        A12  A12
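The striping shown above reduces to a simple address computation. Below is a minimal sketch (not from the course materials) of mapping a logical chunk number to a mirror pair and its two member disks, assuming chunk-granularity striping over 2-way mirrors; raid10_map and struct raid10_loc are hypothetical names.

#include <stdio.h>

/* Hypothetical location of one logical chunk in a RAID-10 array:
 * which mirror pair it lands on, the stripe row within that pair,
 * and the two physical disks holding identical copies. */
struct raid10_loc
{
    unsigned pair;     /* which RAID-1 mirror pair (column in the diagram) */
    unsigned row;      /* stripe row within that pair */
    unsigned disk[2];  /* the two physical disks holding the copies */
};

/* Map logical chunk number -> RAID-10 location, assuming npairs 2-way
 * mirror pairs with disks numbered pair*2 and pair*2+1. */
static struct raid10_loc raid10_map(unsigned chunk, unsigned npairs)
{
    struct raid10_loc loc;

    loc.pair    = chunk % npairs;   /* RAID-0: round-robin over mirror pairs */
    loc.row     = chunk / npairs;
    loc.disk[0] = loc.pair * 2;     /* RAID-1: same data on both disks */
    loc.disk[1] = loc.pair * 2 + 1;
    return loc;
}

int main(void)
{
    /* Reproduce the A1..A12 placement shown above: 3 mirror pairs. */
    for (unsigned a = 1; a <= 12; a++)
    {
        struct raid10_loc loc = raid10_map(a - 1, 3);
        printf("A%-2u -> pair %u, row %u, disks %u and %u\n",
               a, loc.pair, loc.row, loc.disk[0], loc.disk[1]);
    }
    return 0;
}

Running it with three mirror pairs reproduces the A1 through A12 placement in the diagram above.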

RAID Operates on LBAs/Sectors (Sometimes Files)

– SAN/DAS RAID

– NAS – Filesystem on top of RAID

RAID-10, RAID-50, RAID-60

– Stripe Over Mirror Sets

– Stripe Over RAID-5 XOR Parity Sets

– Stripe Over RAID-6 Reed-Solomon or Double-Parity Encoded Sets

EVENODD

Row Diagonal Parity

Minimum Density Codes (Liberation)

Reed-Solomon Codes – Generalized Erasure Codes

Cauchy Reed-Solomon, LDPC (Low-Density Parity-Check) Codes, WEAVER/HoVer

MDS (Maximum Distance Separable) – For each Parity Device, Another Level of Fault Tolerance is Provided

– Larger Drives (Multi-terabyte), Larger arrays (100’s of drives), and Cost Reduction are Driving RAID6 and Higher Levels
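The RAID-5 building block referenced above is plain XOR parity: one parity strip per stripe tolerates any single lost strip. A minimal sketch (illustrative only; strip count and size are arbitrary) of encoding a stripe and rebuilding one erased strip:

#include <stdio.h>
#include <string.h>

#define NDATA 4      /* data strips per stripe (hypothetical) */
#define STRIP 8      /* bytes per strip (tiny, for illustration) */

/* Compute the XOR parity strip over NDATA data strips. */
static void xor_parity(const unsigned char d[NDATA][STRIP], unsigned char p[STRIP])
{
    memset(p, 0, STRIP);
    for (int i = 0; i < NDATA; i++)
        for (int j = 0; j < STRIP; j++)
            p[j] ^= d[i][j];
}

/* Rebuild one erased data strip by XORing the survivors with the parity. */
static void rebuild(const unsigned char d[NDATA][STRIP], const unsigned char p[STRIP],
                    int lost, unsigned char out[STRIP])
{
    memcpy(out, p, STRIP);
    for (int i = 0; i < NDATA; i++)
        if (i != lost)
            for (int j = 0; j < STRIP; j++)
                out[j] ^= d[i][j];
}

int main(void)
{
    unsigned char d[NDATA][STRIP], p[STRIP], r[STRIP];

    for (int i = 0; i < NDATA; i++)          /* fill strips with test data */
        for (int j = 0; j < STRIP; j++)
            d[i][j] = (unsigned char)(i * 16 + j);

    xor_parity(d, p);                        /* encode */
    rebuild(d, p, 2, r);                     /* pretend strip 2 was erased */

    printf("strip 2 %s\n", memcmp(r, d[2], STRIP) == 0 ? "recovered" : "WRONG");
    return 0;
}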


RAID5,6 XOR Parity Encoding

MDS Encoding Can Achieve High Storage Efficiency: N/(N+1) for an N+1 (RAID5) Set and N/(N+2) for an N+2 (RAID6) Set


[Chart: Storage Efficiency (0% to 100%) vs. Number of Data Devices (1 to 20) for 1 XOR (RAID5) or 2 P,Q Encoded (RAID6) Devices]
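The two curves in the chart are simply N/(N+1) and N/(N+2). A short sketch (illustrative only) that regenerates the plotted values:

#include <stdio.h>

int main(void)
{
    /* Storage efficiency for N data devices plus 1 (RAID5) or 2 (RAID6)
     * coding devices: R = N/(N+m), as plotted in the chart above. */
    printf("%4s %10s %10s\n", "N", "RAID5", "RAID6");
    for (int n = 1; n <= 20; n++)
        printf("%4d %9.1f%% %9.1f%%\n",
               n, 100.0 * n / (n + 1), 100.0 * n / (n + 2));
    return 0;
}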

RAID-50


RAID-0 Striping Over RAID-5 Sets
Logical order: A1,B1,C1,D1,A2,B2,C2,D2,E1,F1,G1,H1,…,Q2,R2,S2,T2

RAID-5 Set 1:
  A1       B1       C1       D1       P(ABCD)
  E1       F1       G1       H1       P(EFGH)
  I1       J1       P(IJKL)  K1       L1
  M1       P(MNOP)  N1       O1       P1
  P(QRST)  Q1       R1       S1       T1

RAID-5 Set 2:
  A2       B2       C2       D2       P(ABCD)
  E2       F2       G2       H2       P(EFGH)
  I2       J2       P(IJKL)  K2       L2
  M2       P(MNOP)  N2       O2       P2
  P(QRST)  Q2       R2       S2       T2

RAID-60 (Reed-Solomon Encoding)

RAID-0 Striping Over RAID-6 Sets
Logical order: A1,B1,C1,D1,A2,B2,C2,D2,E1,F1,G1,H1,…, Q2,R2,S2,T2

RAID-6 Set 1:
  Disk1    Disk2    Disk3    Disk4    Disk5    Disk6
  A1       B1       C1       D1       P(ABCD)  Q(ABCD)
  E1       F1       G1       P(EFGH)  Q(EFGH)  H1
  I1       J1       P(IJKL)  Q(IJKL)  K1       L1
  M1       P(MNOP)  Q(MNOP)  N1       O1       P1
  P(QRST)  Q(QRST)  Q1       R1       S1       T1

RAID-6 Set 2:
  Disk1    Disk2    Disk3    Disk4    Disk5    Disk6
  A2       B2       C2       D2       P(ABCD)  Q(ABCD)
  E2       F2       G2       P(EFGH)  Q(EFGH)  H2
  I2       J2       P(IJKL)  Q(IJKL)  K2       L2
  M2       P(MNOP)  Q(MNOP)  N2       O2       P2
  P(QRST)  Q(QRST)  Q2       R2       S2       T2
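The Q strips in the RAID-6 layout above are commonly computed with Reed-Solomon coding over GF(2^8): P is the plain XOR of the data strips, and Q is a weighted XOR using powers of a generator. The sketch below is a generic illustration assuming the widely used field polynomial 0x11D and generator g = 2; it is not necessarily the exact construction of any particular RAID-60 implementation.

#include <stdio.h>
#include <string.h>

#define NDATA 4   /* data strips per RAID-6 stripe (hypothetical) */
#define STRIP 8   /* bytes per strip (tiny, for illustration) */

/* Multiply two bytes in GF(2^8) with polynomial x^8+x^4+x^3+x^2+1 (0x11D). */
static unsigned char gf_mul(unsigned char a, unsigned char b)
{
    unsigned char r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        b >>= 1;
        a = (unsigned char)((a << 1) ^ ((a & 0x80) ? 0x1D : 0));
    }
    return r;
}

/* P = XOR of all data strips; Q = sum over i of g^i * D_i in GF(2^8), g = 2. */
static void raid6_pq(const unsigned char d[NDATA][STRIP],
                     unsigned char p[STRIP], unsigned char q[STRIP])
{
    memset(p, 0, STRIP);
    memset(q, 0, STRIP);
    unsigned char coeff = 1;                 /* g^0 */
    for (int i = 0; i < NDATA; i++) {
        for (int j = 0; j < STRIP; j++) {
            p[j] ^= d[i][j];
            q[j] ^= gf_mul(coeff, d[i][j]);
        }
        coeff = gf_mul(coeff, 2);            /* next power of g */
    }
}

int main(void)
{
    unsigned char d[NDATA][STRIP], p[STRIP], q[STRIP];

    for (int i = 0; i < NDATA; i++)          /* fill strips with test data */
        for (int j = 0; j < STRIP; j++)
            d[i][j] = (unsigned char)(i * 32 + j + 1);

    raid6_pq(d, p, q);
    printf("P[0]=0x%02x Q[0]=0x%02x\n", p[0], q[0]);
    return 0;
}

Losing any two strips in a stripe leaves two independent equations (P and Q) in two unknowns over GF(2^8), which is what gives RAID-6 its double-erasure tolerance.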

RAID is an Erasure Code

RAID-1 is an MDS EC (James Plank, U. of Tennessee)


Comparison of ECs

Data Devices = n

Coding Devices = m

Total = m+n

Storage Efficiency: R = n/(n+m)

– RAID1 2-Way, R=1/(1+1)=50%, MDS=1, Reads 2x Speed-up, 1x Write

– RAID1 3-Way, R=1/(1+2)=33%, MDS=2, 3x Read, 1x Write

– RAID10 with 10 sets, R=10/(10+10)=50%, MDS=1, 20x Read, 10x Write

– RAID5 with 3+1 set, R=3/(3+1)=75%, MDS=1, 3x Read (Parity Check?), RMW Penalty, Striding Issues

– RAID6 with 5+2 set, R=5/(5+2)=71%, MDS=2, 5x Read, Reed-Solomon Encode on Write and RMW Penalty

– Beyond RAID6?

Cauchy Reed-Solomon Scales, but Encode/Decode Complexity is High

Low-Density Parity-Check (LDPC) Codes are Simpler, but not MDS


Read-Modify-Write Penalty

Any Update that is Less than the Full RAID5 or RAID6 Set Requires:

1. Read Old Data and Parity – 2 Reads

2. Compute New Parity (From Old & New Data)

3. Write New Parity and New Data – 2 Writes

Only Way to Remove the Penalty is a Write-Back Cache to Coalesce Updates and Always Perform Full-Set Writes


RAID-5 Set:
  A1       B1       C1       D1       P(ABCD)
  E1       F1       G1       H1       P(EFGH)
  I1       J1       P(IJKL)  K1       L1
  M1       P(MNOP)  N1       O1       P1
  P(QRST)  Q1       R1       S1       T1

Write A1: P(ABCD)new = A1new xor A1old xor P(ABCD)old

XOR Truth Table (partial), P(ABCD) = A1 xor B1 xor C1 xor D1:

  A1  B1  C1  D1 | P(ABCD)
   0   0   0   0 |   0
   0   0   0   1 |   1
   0   0   1   0 |   1
   0   0   1   1 |   0
   0   1   0   0 |   1
   0   1   0   1 |   0
   0   1   1   0 |   0
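The update rule above is easy to exercise directly: a small write needs only the old data strip and the old parity, not the rest of the stripe. A minimal sketch (illustrative, not course code) that checks the incremental parity update against a full recompute:

#include <stdio.h>

int main(void)
{
    /* One stripe of a 4+1 RAID-5 set, one byte per strip for illustration. */
    unsigned char a = 0x11, b = 0x22, c = 0x33, d = 0x44;
    unsigned char p = a ^ b ^ c ^ d;              /* full-stripe parity */

    /* Small write: update strip A only.
     * RMW: read old A and old P (2 reads), compute new P, write new A
     * and new P (2 writes).  P_new = A_new xor A_old xor P_old. */
    unsigned char a_new = 0x99;
    unsigned char p_new = a_new ^ a ^ p;

    /* Check against recomputing parity over the whole stripe. */
    unsigned char p_full = a_new ^ b ^ c ^ d;
    printf("incremental P = 0x%02x, full recompute P = 0x%02x -> %s\n",
           p_new, p_full, (p_new == p_full) ? "match" : "MISMATCH");
    return 0;
}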

Hands-On Coding Exercise(s)

Examples-RAID-Unit-Test, stripetest.c


[Diagram: A, B, C, D data strips plus an XOR strip = XOR[A,B,C,D]]

[siewerts@localhost Examples-RAID-Unit-Test]$ ./stripetest Baby-Musk-Ox.ppm Baby-Musk-Ox.ppm.replicated
read full stripe
hit end of file
FINISHED
[siewerts@localhost Examples-RAID-Unit-Test]$ diff Baby-Musk-Ox.ppm Baby-Musk-Ox.ppm.replicated
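The course's stripetest.c is not reproduced here; the sketch below is a hypothetical stand-in for the same kind of round trip the transcript shows: read the input a full stripe at a time, split it into A, B, C, D strips, compute the XOR strip, and write the data strips back out so that diff against the input reports no differences. File layout and strip size are assumptions.

/* Hypothetical stand-in for a stripetest-style round trip (not the course's
 * stripetest.c): split the input into 4 data strips per stripe, compute an
 * XOR strip, then reassemble the data strips into the output file. */
#include <stdio.h>
#include <string.h>

#define STRIP 512
#define NDATA 4

int main(int argc, char *argv[])
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <input> <output>\n", argv[0]);
        return 1;
    }

    FILE *in  = fopen(argv[1], "rb");
    FILE *out = fopen(argv[2], "wb");
    if (!in || !out) {
        perror("fopen");
        return 1;
    }

    unsigned char buf[NDATA * STRIP], xor_strip[STRIP];
    size_t n;

    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        if (n == sizeof(buf)) {
            /* Full stripe: compute the XOR strip over the 4 data strips. */
            memset(xor_strip, 0, STRIP);
            for (int i = 0; i < NDATA; i++)
                for (int j = 0; j < STRIP; j++)
                    xor_strip[j] ^= buf[i * STRIP + j];
        }
        /* "Reassemble" the data strips by writing them back out; the XOR
         * strip would go to a fifth device in a real array. */
        fwrite(buf, 1, n, out);
    }

    fclose(in);
    fclose(out);
    return 0;
}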