A Hybrid Approach of Failed Disk Recovery Using RAID-6 Codes: Algorithms and Performance Evaluation Yinlong Xu University of Science and Technology of China A joint work with Liping Xiang, John C.S. Lui and Qian Chang


A Hybrid Approach of Failed Disk Recovery Using RAID-6 Codes:

Algorithms and Performance Evaluation

Yinlong Xu

University of Science and Technology of China

A joint work with

Liping Xiang, John C.S. Lui and Qian Chang

I would like to thank

John C.S. Lui,

Raymond W. Yeung,

Patrick B.C. Lee,

Alfred C.L. Ho!

Outline

Background

A Hybrid Recovery Approach for Single Disk Failure

Row-Diagonal Optimal Recovery (RDOR) for Single Disk Failure

    A Recovery Scheme with Minimum Disk Reads

    Balancing Disk Reads

    Optimizing Memory Usage

Performance Evaluation

Summary

3/60


Remark:

This work can be applied to two RAID-6 codes, RDP and EVENODD.

This talk takes RDP as an example.

5/60

RDP Code

Note: With RDP code, all information data is recoverable when any two disks fail.

The code is laid out as a (p−1)×(p+1) array, where p is a prime number.

The first p−1 columns are information columns.

The last two columns are parity columns (row parity, diagonal parity).

6/60

Example (p = 5):

Row parity: d0,4 = d0,0 ⊕ d0,1 ⊕ d0,2 ⊕ d0,3

Diagonal parity: d0,5 = d0,0 ⊕ d2,3 ⊕ d3,2 ⊕ d1,4

Missing diagonal: one diagonal has no diagonal parity symbol.
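The two parity formulas on this slide can be sketched in code. This is a minimal illustration, assuming symbols are small integers combined with XOR. The function name `rdp_encode` and the list-of-lists layout are hypothetical choices made for the example, not something the slides specify.

```python
def rdp_encode(data, p):
    """Append row parity (column p-1) and diagonal parity (column p).

    data: (p-1) rows, each holding p-1 integer symbols; p is assumed prime.
    """
    rows = p - 1
    arr = [list(row) + [0, 0] for row in data]
    # Row parity: XOR of the p-1 information symbols in the same row.
    for i in range(rows):
        for c in range(p - 1):
            arr[i][p - 1] ^= arr[i][c]
    # Diagonal parity j: XOR of every d[r][c] with (r + c) mod p == j,
    # over columns 0..p-1 (the row parity column is included).
    # Diagonal p-1 is the missing diagonal and gets no parity symbol.
    for j in range(rows):
        for c in range(p):
            r = (j - c) % p
            if r < rows:
                arr[j][p] ^= arr[r][c]
    return arr


data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
a = rdp_encode(data, 5)
# Both checks mirror the two example equations for p = 5.
print(a[0][4] == a[0][0] ^ a[0][1] ^ a[0][2] ^ a[0][3])
print(a[0][5] == a[0][0] ^ a[2][3] ^ a[3][2] ^ a[1][4])
```

Computing the row parity first matters here, because the diagonals also run through the row parity column.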

Outline of our work

Problem: The recovery of single disk failure in RDP coded systems

Motivation: RDP code tolerates two disk failures, but a single disk failure is much more likely than a double disk failure.

Contributions:

Give the lower bound of disk reads.

Propose a recovery scheme such that:

    Disk reads match the lower bound, a reduction of 1/4.

    Disk reads are balanced.

    Extra memory usage is minimal: (p−1)/2 blocks.

    XOR operations: no more than the conventional scheme.

7/60

A Naive Recovery Scheme for Single Disk Failure of RDP Code: Case 1

Case 1: A single information disk fails.

The row parity disk and the other information disks are used for the recovery.

[Figure: the recovery of Disk 1 (symbols d0,1, d1,1, d2,1, d3,1) in the array of Disk 0 to Disk 5.]

8/60

A Naive Recovery Scheme for Single Disk Failure of RDP Code: Case 2

Case 2: A single parity disk fails.

The recovery is equivalent to re-encoding the parity.

[Figure: the recovery of the diagonal parity disk (symbols d0,5, d1,5, d2,5, d3,5) in the array of Disk 0 to Disk 5.]

9/60

Features of the Naive Recovery Scheme

It uses only one parity column to recover a single failed disk, although the array has two parity columns.

(p−1)² symbols are read from the disks for the recovery.

10/60

Questions

Can the disk reads be reduced when recovering from a single disk failure?

What if both parity disks are used for single disk failure recovery?

11/60

Some Benefits from Reducing Disk Reads

Speeding up the recovery

Relieving system bus load

Relieving disk load

Enhancing user’s service performance

Saving disk energy

…

12/60


Row Parity or Diagonal Parity?

Either row parity or diagonal parity can be used to recover an erasure symbol

[Figure: the array of Disk 0 to Disk 5 with Disk 1 failed; the lost symbols are d0,1, d1,1, d2,1, d3,1.]

d0,1 can be recovered by row parity.

d0,1 can also be recovered by diagonal parity.

14/60

A Hybrid Recovery Approach for Single Disk Failure

Using diagonal parity to recover d0,1;

Using diagonal parity to recover d1,1;

Using row parity to recover d2,1;

Using row parity to recover d3,1.

[Figure: the array of Disk 0 to Disk 5 with Disk 1 failed and the overlapping symbols marked.]

Note: There are 4 overlapping symbols, each of which is needed twice. If the 4 overlapping symbols are pre-stored in memory, the number of disk reads is reduced to 16 − 4 = 12 < 16.

15/60
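The count of 12 disk reads can be checked by enumerating the distinct symbols that a hybrid scheme touches. The sketch below assumes the RDP layout from the earlier slides (diagonal j contains every symbol with (row + column) mod p = j). The function name `symbols_to_read` and its arguments are hypothetical.

```python
def symbols_to_read(p, k, diag_rows):
    """Distinct (row, column) symbols read to rebuild Disk k (k != p).

    Rows listed in diag_rows are rebuilt from their diagonal parity set,
    all other rows from their row parity set.
    """
    diag_rows = set(diag_rows)
    needed = set()
    for i in range(p - 1):
        if i in diag_rows:
            j = (i + k) % p                  # diagonal through d[i][k]
            if j == p - 1:
                raise ValueError("row %d lies on the missing diagonal" % i)
            for c in range(p + 1):           # columns 0..p, skipping Disk k
                r = (j - c) % p
                if c != k and r <= p - 2:
                    needed.add((r, c))
        else:
            needed.update((i, c) for c in range(p) if c != k)
    return needed


hybrid = symbols_to_read(5, 1, {0, 1})   # d0,1 and d1,1 via diagonal parity
naive = symbols_to_read(5, 1, set())     # every row via row parity
print(len(hybrid), len(naive))           # 12 16
```

Using a set makes each overlapping symbol count once, which is the effect of keeping it in memory after its first read.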

Consideration of Hybrid Recovery Approach

By using memory reads instead of disk reads:

    The recovery process is sped up. Note: Memory read is 100 times faster than disk read.

    The communication load of the storage system is reduced.

During the recovery, the more overlapping symbols there are, the fewer symbols need to be read from disks.

Questions

What is the lower bound of disk reads for single disk failure recovery?

How to design a recovery scheme which matches this lower bound?

16/60


Row Parity Sets

Ri = {di,k | 0 ≤ k ≤ p−1} is the i-th row parity set.

Disk0  Disk1  Disk2  Disk3  Disk4  Disk5
d0,0   d0,1   d0,2   d0,3   d0,4   d0,5
d1,0   d1,1   d1,2   d1,3   d1,4   d1,5
d2,0   d2,1   d2,2   d2,3   d2,4   d2,5
d3,0   d3,1   d3,2   d3,3   d3,4   d3,5

Disk4 holds the row parity and Disk5 holds the diagonal parity.

Each symbol in Ri can be recovered from the other symbols in Ri.

E.g. R0 = {d0,0, d0,1, d0,2, d0,3, d0,4}.

Because d0,4 = d0,0 ⊕ d0,1 ⊕ d0,2 ⊕ d0,3, we have d0,1 = d0,0 ⊕ d0,2 ⊕ d0,3 ⊕ d0,4.

18/60

Diagonal Parity Sets

Dj = {di,k | (i+k) mod p = j, 0 ≤ i ≤ p−2, 0 ≤ k ≤ p} is the j-th diagonal parity set.

[Figure: the 4×6 array of Disk0 to Disk5, with Disk4 as row parity and Disk5 as diagonal parity.]

Each symbol in Dj can be recovered from the other symbols in Dj.

E.g. D1 = {d1,0, d0,1, d3,3, d2,4, d1,5}, so d0,1 = d1,0 ⊕ d3,3 ⊕ d2,4 ⊕ d1,5.

19/60

Overlapping Symbols

Each pair of Ri and Dj has exactly one symbol in common, called the overlapping symbol.

E.g. R1 ∩ D3 = {d1,2}
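The parity sets and the single-overlap property can be checked mechanically. This sketch represents each symbol di,k as a (row, column) pair; the helper names `row_set` and `diag_set` are hypothetical.

```python
def row_set(i, p):
    """R_i: the symbols of row i in columns 0..p-1."""
    return {(i, c) for c in range(p)}


def diag_set(j, p):
    """D_j: the symbols with (row + column) mod p == j, columns 0..p."""
    return {(r, c)
            for r in range(p - 1)
            for c in range(p + 1)
            if (r + c) % p == j}


p = 5
print(sorted(diag_set(1, p)))           # the five symbols of D1
print(row_set(1, p) & diag_set(3, p))   # {(1, 2)}, i.e. d1,2

# Every pair (R_i, D_j) shares exactly one symbol.
print(all(len(row_set(i, p) & diag_set(j, p)) == 1
          for i in range(p - 1) for j in range(p - 1)))
```

The diagonal parity symbol dj,p falls into Dj automatically, because (j + p) mod p = j.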

20/60

Special Cases of Parity Sets

Symbols in Disk p (the diagonal parity disk) belong only to their diagonal parity sets.

Symbols on the missing diagonal belong only to their row parity sets.

Disk p can only be recovered by diagonal parity.

This work only considers the recovery of Disk k, with k ≠ p.

21/60

Recovery Combination

A combination of parity sets (Ri, … , Dj) corresponds to a recovery scheme.

E.g. Using recovery combination (D1, D2, R2, R3) to recover Disk 1:

Using D1 to recover d0,1;

Using D2 to recover d1,1;

Using R2 to recover d2,1;

Using R3 to recover d3,1.

[Figure: the 4×6 array of Disk0 to Disk5, built up step by step over slides 22 to 26.]

26/60

Number of Overlapping Symbols

Assumption: Disk k has failed, so the p−1 symbols d0,k, d1,k, … , dp−2,k need to be recovered.

A Recovery Scheme

t erased symbols are recovered from diagonal parity sets.

The other p−1−t symbols are recovered from row parity sets.

Conclusion

There are t(p−1−t) = −(t − (p−1)/2)² + (p−1)²/4 overlapping symbols.

When t = (p−1)/2, the number of overlapping symbols is maximized.

27/60

Lower Bound of Disk Reads for Single Failure Recovery

The maximum number of overlapping symbols is (p−1)²/4.

So at most (p−1)²/4 disk reads can be saved during recovery.

Conclusion: The lower bound of disk reads for recovery is (p−1)² − (p−1)²/4 = 3(p−1)²/4.
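As a quick numeric check of this bound, the sketch below maximizes the overlap count t(p−1−t) by brute force over t and compares the result with 3(p−1)²/4. The function names are hypothetical.

```python
def max_overlap(p):
    """Largest value of t * (p - 1 - t) over t = 0..p-1."""
    n = p - 1
    return max(t * (n - t) for t in range(n + 1))


def min_disk_reads(p):
    """Naive read count (p-1)^2 minus the largest possible overlap."""
    return (p - 1) ** 2 - max_overlap(p)


for p in (5, 7, 13):
    # p - 1 is even for an odd prime, so the division below is exact.
    print(p, max_overlap(p), min_disk_reads(p), 3 * (p - 1) ** 2 // 4)
```

For p = 5 this gives the 12 reads of the earlier (D1, D2, R2, R3) example, and for p = 7 it gives 27.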

28/60

[Figure legend: symbols to be read; overlapping symbols.]

Read Optimal Recovery Scheme

Any recovery combination consisting of (p−1)/2 row parity sets and (p−1)/2 diagonal parity sets is read optimal. It is named Row-Diagonal Optimal Recovery (RDOR).

Conclusion:

RDOR reduces disk reads by approximately 25% compared with the naive scheme.

29/60


Example: Two Read Optimal Recovery Combinations

Combination                 Disk reads       Result
(R0, R1, R2, D3, D4, D5)    5 4 3 3 4 5 3    Unbalanced
(D0, D1, R2, D3, R4, R5)    4 4 4 4 4 4 3    Balanced

31/60

Problem and Questions

Problem: During recovery, the disk with the most read operations may slow down the recovery.

Question: To reduce the recovery time, what is a balanced and read-optimal recovery scheme?

A balanced scheme reads the same (or almost the same) number of symbols from each disk.

32/60

Average Reads from Each Disk

The minimum number of disk reads for recovery is 3(p−1)²/4.

To be read optimal, (p−1)/2 symbols are read from Disk p (the diagonal parity disk).

Conclusion: The average number of symbols to be read from each of the other surviving disks (all except Disk k and Disk p) is [3(p−1)²/4 − (p−1)/2] / (p−1) = (3p−5)/4.

Note: A balanced read-optimal recovery should read (p−1)/2 symbols from Disk p and (3p−5)/4 symbols from each of the other disks.
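The average can be evaluated exactly with rational arithmetic, which also shows when it is a whole number. This is only an arithmetic sketch of the expression on this slide; the function name is hypothetical.

```python
from fractions import Fraction


def average_reads_per_disk(p):
    """[3(p-1)^2/4 - (p-1)/2] / (p-1), as an exact fraction."""
    total = Fraction(3 * (p - 1) ** 2, 4)     # read-optimal total
    from_diag_disk = Fraction(p - 1, 2)       # symbols read from Disk p
    return (total - from_diag_disk) / (p - 1)


for p in (5, 7, 11, 13):
    print(p, average_reads_per_disk(p), Fraction(3 * p - 5, 4))
```

For p = 7 and p = 11 the average is an integer (4 and 7). For p = 5 and p = 13 it is a half-integer, so in those cases the per-disk reads can only be almost the same.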

33/60

Example: A Balanced Example

(D0, D1, R2, D3, R4, R5), with disk reads 4 4 4 4 4 4 3: balanced.

E.g. p=7:

Total: 3(p−1)²/4 = 27

Disk 7: (p−1)/2 = 3

Each of the other disks: (27−3)/6 = 4

34/60

Recovery Sequence

Define a recovery sequence x0, x1, ... , xp−2, xp−1 that corresponds to a recovery combination, where

xi=0 means that di,k is recovered from its row parity set;

xi=1 means that di,k is recovered from its diagonal parity set.

E.g. (D0, D1, R2, D3, R4, R5) corresponds to 1 1 0 1 0 0 0, where the last 0 is the additional symbol xp−1.

35/60

Condition of Read Optimal and Balanced Recovery Sequence

Recovery sequence {xi}0≤i≤p−1 is read optimal and balanced if and only if the following hold, where <a>p denotes a mod p.

Read optimal:

x0+x1+…+xp−2+xp−1 = (p−1)/2 (1)

Symbols in the missing diagonal and the added row are recovered by row parity:

x<p−1−k>p = xp−1 = 0 (2)

Balanced:

(3p−5)/4 symbols are read from Disk j (0≤j≤p−1, j≠k) (3)

36/60

Read Optimal and Balanced Recovery: An Example

(D0, D1, R2, D3, R4, R5) is a read optimal and balanced recovery combination for p=7 and k=0.

The corresponding recovery sequence x0x1...x5x6 = 1101000 satisfies:

x0+x1+…+x5+x6 = (p−1)/2 = 3 (1)

x<p−1−k>p = xp−1 = 0 (2)

37/60

Condition of Read Optimal and Balanced Recovery Sequence (Cont.)

When xi=0 or x<i+j−k>p=1, di,j in Disk j is used for recovery.

Equivalently, when di,j is used for recovery, xi(1−x<i+j−k>p)=0.

The number of symbols that need to be read from Disk j is p−1 minus the number of symbols not read from Disk j, that is, (p−1) − Σi xi(1−x<i+j−k>p), with the sum taken over the rows 0≤i≤p−2.

E.g. Disk 3 (p=7, k=0): x2=0, so d2,3 is read; x0=1, so d4,3 is used.

38/60
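The counting rule on this slide can be turned into a small function. The closed form used below, (p−1) minus the sum of xi(1−x<i+j−k>p) over the rows, is a reconstruction from the slide's statements and not a formula quoted from it. The name `reads_from_disk` is hypothetical.

```python
def reads_from_disk(x, p, k, j):
    """Symbols read from Disk j (0 <= j <= p-1, j != k) when rebuilding Disk k.

    x is the recovery sequence of length p; x[p-1] is the added 0.
    d[i][j] is skipped only when x[i] == 1 and x[(i + j - k) % p] == 0.
    """
    not_read = sum(x[i] * (1 - x[(i + j - k) % p]) for i in range(p - 1))
    return (p - 1) - not_read


balanced = [1, 1, 0, 1, 0, 0, 0]      # (D0, D1, R2, D3, R4, R5)
unbalanced = [0, 0, 0, 1, 1, 1, 0]    # (R0, R1, R2, D3, D4, D5)
print([reads_from_disk(balanced, 7, 0, j) for j in range(1, 7)])
# [4, 4, 4, 4, 4, 4]
print([reads_from_disk(unbalanced, 7, 0, j) for j in range(1, 7)])
# [5, 4, 3, 3, 4, 5]
```

Both outputs agree with the per-disk read counts listed for the two example combinations. The diagonal parity disk (Disk 7) is not covered by this formula and contributes (p−1)/2 = 3 reads in both cases.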

Read Optimal and Balanced Recovery: An Example (Cont.)

Recovery sequence x0x1...x5x6 = 1101000 also satisfies:

4 symbols to be read from Disk j (0≤j≤6, j≠0) (3)

E.g. Disk 3:

x0=1, d4,3 is read;

x1=1, d5,3 is read;

x2=0, d2,3 is read;

x3=1, d0,3 is read;

x4=0, d4,3 is read;

x5=0, d5,3 is read.

d4,3 and d5,3 are each needed twice but read only once, so 4 distinct symbols are read from Disk 3. d1,3 and d3,3 are not read.

45/60

Recovery Set

Given a recovery sequence {xi}0≤i≤p−1, define

A = {i | xi=1, 0≤i≤p−1} as the recovery set.

E.g. x0x1...x5x6 = 1101000 gives A = {0,1,3}.

46/60

Recovery Set

xi·x<i+t>p = 1 if and only if i∈A and <i+t>p∈A. So the sum of xi·x<i+t>p over all i counts the pairs (a1, a2) in A with a1−a2 ≡ t (mod p).

Balanced Recovery Set

A corresponds to a balanced sequence if and only if

for any t (1≤t≤p−1), t has a multiplicity of (p−3)/4 in the multi-set MA = {a1−a2 | a1, a2∈A, a1≠a2}, with differences taken mod p.
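The multi-set condition is easy to test for a given recovery set. The sketch below counts each nonzero difference mod p with a `Counter`; the function names are hypothetical.

```python
from collections import Counter


def difference_multiplicities(A, p):
    """How often each value a1 - a2 (mod p) occurs over ordered pairs a1 != a2."""
    return Counter((a1 - a2) % p for a1 in A for a2 in A if a1 != a2)


def is_balanced_set(A, p):
    """True if every t in 1..p-1 occurs exactly (p-3)/4 times."""
    counts = difference_multiplicities(A, p)
    # Compare 4 * count with p - 3 to avoid fractional arithmetic.
    return all(4 * counts[t] == p - 3 for t in range(1, p))


print(is_balanced_set({0, 1, 3}, 7))   # True: each difference occurs once
print(is_balanced_set({3, 4, 5}, 7))   # False: differences 1 and 6 occur twice
```

The second set corresponds to the unbalanced combination (R0, R1, R2, D3, D4, D5) from the earlier example.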

47/60

The Existence of Read Optimal and Balanced Recovery Set

By using the concept of a (partial) difference set, we have the following conclusion.

Given a prime number p, the set of nonzero squares D = {i² | 1≤i≤(p−1)/2} in Fp is a difference set.

There is a g∈Fp such that A = D+g corresponds to a read-optimal and balanced recovery sequence {xi}0≤i≤p−1 for the recovery of Disk k (k≠p).
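A small search illustrates the construction. This sketch builds D from the nonzero squares mod p, tries every shift g, and keeps the shifts whose set A = D+g avoids the two positions that condition (2) forces to 0. It only filters by condition (2); the slides do not describe how g is actually chosen, and the function names are hypothetical.

```python
def nonzero_squares(p):
    """D = {i^2 mod p | 1 <= i <= (p-1)/2}."""
    return {(i * i) % p for i in range(1, (p - 1) // 2 + 1)}


def candidate_recovery_sets(p, k):
    """Yield (g, A) with A = D + g (mod p) and x[p-1] = x[<p-1-k>] = 0."""
    D = nonzero_squares(p)
    excluded = {p - 1, (p - 1 - k) % p}
    for g in range(p):
        A = {(d + g) % p for d in D}
        if not (A & excluded):
            yield g, sorted(A)


for g, A in candidate_recovery_sets(7, 0):
    print(g, A)
```

For p = 7 and k = 0, the shift g = 6 yields A = {0, 1, 3}, the recovery set of the running example. Each candidate has (p−1)/2 elements, so condition (1) holds as well.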

48/60

Review of the Read Balance Problem

Find the average number of disk reads on each disk.

Define the recovery sequence and the recovery set to describe a recovery scheme.

Find the constraints under which a recovery set is read optimal and balanced.

Use the concept of a (partial) difference set to solve these constraints.

The read optimal and balanced recovery scheme corresponds to the solved recovery set.

49/60


Extra Memory Usage Problem

The number of overlapping symbols that must be stored in memory is at most (p−1)²/4.

The more overlapping symbols, the more extra memory is used.

Question: How can the extra memory usage be minimized while the scheme remains read optimal and balanced?

51/60

Main Idea of Optimizing Extra Memory Usage

E.g. Using (D1, D2, R2, R3) to recover Disk 1.

Using D1 to recover d0,1: pre-store d3,3 and d2,4;

Using D2 to recover d1,1: pre-store d2,0 and d3,4;

Using R2 to recover d2,1: read d2,0, d2,4 from memory;

Using R3 to recover d3,1: read d3,3, d3,4 from memory.

This needs four extra memory units.

52/60

Main Idea of Optimizing Extra Memory Usage

Main idea: store the XOR-sum of overlapping symbols instead of the original symbols to optimize extra memory usage.

Two extra memory units M[2], M[3] are reserved for the recovery of d2,1, d3,1.

M[2]=0, M[3]=0;

M[2]=d2,4, M[3]=d3,3;

M[2]=d2,0⊕d2,4, M[3]=d3,3⊕d3,4;

Only two extra memory units are needed.

53/60
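The accumulator idea can be exercised end to end on the p = 5 example. The sketch below encodes a random array, then rebuilds Disk 1 with (D1, D2, R2, R3) while keeping only one running XOR per row-recovered symbol. It assumes integer symbols; `rdp_encode` and `recover_with_accumulators` are hypothetical names, and the set of coordinates is bookkeeping only, not stored data.

```python
import random


def rdp_encode(data, p):
    """Build the (p-1) x (p+1) RDP array from (p-1) x (p-1) data symbols."""
    rows = p - 1
    arr = [list(row) + [0, 0] for row in data]
    for i in range(rows):
        for c in range(p - 1):
            arr[i][p - 1] ^= arr[i][c]          # row parity
    for j in range(rows):
        for c in range(p):
            r = (j - c) % p
            if r < rows:
                arr[j][p] ^= arr[r][c]          # diagonal parity
    return arr


def recover_with_accumulators(arr, p=5, k=1, diag_rows=(0, 1), row_rows=(2, 3)):
    """Rebuild column k; return ({row: symbol}, number of disk reads)."""
    reads = 0
    M = {i: 0 for i in row_rows}    # one XOR accumulator per row-recovered symbol
    overlapping = set()             # coordinates already folded into M
    out = {}
    for i in diag_rows:
        j = (i + k) % p             # diagonal through d[i][k]
        acc = 0
        for c in range(p + 1):
            r = (j - c) % p
            if c == k or r > p - 2:
                continue
            v = arr[r][c]
            reads += 1
            acc ^= v
            if r in M and c < p:    # symbol also belongs to a needed row set
                M[r] ^= v
                overlapping.add((r, c))
        out[i] = acc
    for i in row_rows:
        acc = M[i]                  # start from the stored XOR-sum
        for c in range(p):
            if c != k and (i, c) not in overlapping:
                acc ^= arr[i][c]
                reads += 1
        out[i] = acc
    return out, reads


random.seed(0)
data = [[random.randrange(256) for _ in range(4)] for _ in range(4)]
arr = rdp_encode(data, 5)
recovered, reads = recover_with_accumulators(arr)
print(reads, all(recovered[i] == arr[i][1] for i in range(4)))   # 12 True
```

After the two diagonal passes, M[2] holds d2,0⊕d2,4 and M[3] holds d3,3⊕d3,4, matching the three update steps listed on the slide.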

Read Optimal and Balanced Recovery Scheme with Minimum Memory Usage

Use the read optimal and balanced recovery combination.

The recovery process is executed in a “row-parity-first” manner: first, recover all symbols that use row parity sets; then, use diagonal parity sets to recover the other symbols.

(p−1)/2 memory units are reserved to recover the (p−1)/2 symbols that use diagonal parity sets for recovery.

54/60


Methodology

Experiment settings:

    Off-line recovery mode

    DiskSim simulation

    Disk array size p+1 = 8, 14, and 20

    Strip size from 16KB to 64KB

Metrics:

    Total recovery time

    Individual disk access time

56/60

Experimental Results: Recovery Time

The total recovery time of RDOR is less than that of the naive scheme as the strip size varies from 16KB to 64KB.

Moreover, with a strip size less than 32KB, the recovery time of RDOR is reduced by approximately 20% compared with the naive scheme.

57/60

Experimental Results: Disk Access Time

The average disk access time of RDOR is reduced by 15.16% to 22.60% when p=7 and the strip size varies from 16KB to 64KB.

In on-line scenarios, each disk will be more available to serve user requests.

58/60


Summary

The proposed single-failure recovery scheme RDOR addresses the following issues:

Lower bound of disk reads for recovery: when k≠p, the number of symbols that must be read from disk is reduced by 1/4 compared with the conventional strategy.

Balancing disk reads: the number of read operations from each disk is the same (or almost the same).

Minimum memory usage: at any time, the maximum number of overlapping symbols (or XOR-sums of them) stored in memory is (p−1)/2.

XOR operations: no more than the conventional scheme.

60/60

Future work

Design efficient recovery algorithms for other codes.

Construct codes against multiple failures but more efficient for single failure recovery.

61/60

Thank you!