A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes. Yunfeng Zhu¹, Patrick P. C. Lee², Liping Xiang¹, Yinlong Xu¹, Lingling Gao¹ (¹ University of Science and Technology of China, ² The Chinese University of Hong Kong). DSN’12


Page 1: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Yunfeng Zhu¹, Patrick P. C. Lee², Liping Xiang¹, Yinlong Xu¹, Lingling Gao¹

¹ University of Science and Technology of China
² The Chinese University of Hong Kong

DSN’12

Page 2: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Fault Tolerance

Fault tolerance becomes more challenging in modern distributed storage systems
• Increase in scale
• Usage of inexpensive but less reliable storage nodes

Fault tolerance is ensured by introducing redundancy across storage nodes
• Replication
• Erasure codes (e.g., Reed-Solomon codes)

[Figure: replication keeps multiple copies of chunks A and B; an erasure code stores A, B, A+B and A+2B]

Page 3: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

XOR-Based Erasure Codes

Encoding/decoding involve XOR operations only
• Low computational overhead

Different redundancy levels
• 2-fault tolerant: RDP, EVENODD, X-Code
• 3-fault tolerant: STAR
• General-fault tolerant: Cauchy Reed-Solomon (CRS)
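To make the "XOR operations only" point concrete, here is a minimal Python sketch (not taken from the talk) of single-parity encoding and recovery over byte chunks. The helper name `xor_chunks` and the sample data are assumptions for illustration; real RAID-6 codes such as RDP add a second, diagonal parity on top of this idea.

```python
from functools import reduce

def xor_chunks(chunks):
    """Bytewise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

# Three hypothetical data chunks and their parity chunk.
data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = xor_chunks(data)

# If data[1] is lost, XOR-ing the survivors with the parity restores it.
recovered = xor_chunks([data[0], data[2], parity])
print(recovered == data[1])  # True
```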

Page 4: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Failure Recovery

Recovering node failures is necessary
• Preserve the required redundancy level
• Avoid data unavailability

Single-node failure recovery
• Single-node failure occurs more frequently than a concurrent multi-node failure

Page 5: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Example: Recovery in RDP

An RDP code example with 8 nodes (nodes 6 and 7 hold the parity symbols, marked ⊕ in the figure):

node 0  node 1  node 2  node 3  node 4  node 5  node 6  node 7
d0,0    d0,1    d0,2    d0,3    d0,4    d0,5    d0,6    d0,7
d1,0    d1,1    d1,2    d1,3    d1,4    d1,5    d1,6    d1,7
d2,0    d2,1    d2,2    d2,3    d2,4    d2,5    d2,6    d2,7
d3,0    d3,1    d3,2    d3,3    d3,4    d3,5    d3,6    d3,7
d4,0    d4,1    d4,2    d4,3    d4,4    d4,5    d4,6    d4,7
d5,0    d5,1    d5,2    d5,3    d5,4    d5,5    d5,6    d5,7

Let’s say node 0 fails. How do we recover node 0?

Page 6: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Conventional Recovery

Idea: use only row parity sets. Recover each lost data symbol (i.e., data chunk) independently

[Figure: RDP array, node 0 to node 7, highlighting the symbols read]

Read symbols: 36

Then how do we recover node 0 efficiently?

Different metrics can be used to measure the efficiency of a recovery scheme

Page 7: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Minimize Number of Read Symbols

Idea: use a combination of row and diagonal parity sets to maximize overlapping symbols [Xiang, ToS’11]

[Figure: RDP array, node 0 to node 7, highlighting the symbols read]

Read symbols: 27
Improve rate: 25%
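As a quick check of the two read counts, assuming the 8-node array of the earlier example (p = 7, so 6 lost symbols, each rebuilt from the 6 other symbols of its row parity set):

```latex
\[
\text{conventional reads} = (p-1)\times(p-1) = 6 \times 6 = 36,
\qquad
\text{improvement} = \frac{36 - 27}{36} = 25\%.
\]
```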

Page 8: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Need A New Metric?

A modern storage system is naturally composed of heterogeneous types of storage nodes
• System upgrades
• New node addition

A heterogeneous environment

[Figure: a proxy connected to node 0 through node 7 and to a new node; link bandwidths shown: 26Mbps, 68Mbps, 109Mbps, 110Mbps, 113Mbps, 10Mbps, 110Mbps, 86Mbps]

Need a new efficient failure recovery solution for heterogeneous environments!

Page 9: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Related Work

Hybrid recovery
• Minimize number of read symbols for RAID-6 XOR-based erasure codes
• e.g., RDP [Xiang, ToS’11], EVENODD [Wang, Globecom’10]

Enumeration recovery [Khan, FAST’12]
• Enumerate all recovery possibilities to achieve optimal recovery for general XOR-based erasure codes

Greedy recovery [Zhu, MSST’12]
• Efficient search of recovery solutions for general XOR-based erasure codes

Regenerating codes [Dimakis, ToIT’10]
• Nodes encode data during recovery
• Minimize recovery bandwidth
• Heterogeneous case considered in [Li, Infocom’10], but requires node encoding and collaboration

Page 10: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Challenges

How to enable efficient failure recovery for heterogeneous settings?
• Minimizing # of read symbols → homogeneous settings
• Performance bottlenecked by poorly performing nodes

How to quickly find the recovery strategy?
• Minimizing # of read symbols → deterministic metric
• Minimizing general cost → non-deterministic metric
• Recovery decision typically can’t be pre-determined

Page 11: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Our Contributions

Target two RAID-6 codes: RDP and EVENODD
• XOR-based encoding operations

Goals:
• Minimize search time
• Minimize recovery cost

Cost-based single-node failure recovery for heterogeneous distributed storage systems

Page 12: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Our Contributions

Formulate an optimization problem for single-node failure recovery in heterogeneous settings

Propose a cost-based heterogeneous recovery (CHR) algorithm
• Narrow down search space
• Suitable for online recovery

Implement and experiment on a heterogeneous networked storage testbed

Page 13: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Model Formulation

Our formulation:

Node:                   V_0, V_1, …, V_k, …, V_{p-1}, V_p
Weight:                 w_0, w_1, …, w_{p-1}, w_p
Download distribution:  y_0, y_1, …, y_{p-1}, y_p

Minimizing total recovery cost:

$$C = \sum_{i=0,\, i \neq k}^{p} w_i y_i$$

Page 14: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Physical Meanings

• w_i = 1 for all i: C is the total number of symbols being read from surviving nodes
• w_i = inverse of transmission bandwidth of node V_i: C is the total amount of transmission time to download symbols from surviving nodes
• w_i = monetary cost of migrating per unit of data outbound from node V_i: C is the total monetary cost of migrating data from surviving nodes (or clouds)
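The cost model can be sketched in a few lines of Python. This is an illustration only: the function name `recovery_cost`, the 4-node setup, and the bandwidth values are assumptions, not numbers from the talk. It shows how the same download distribution yields a symbol count under unit weights and a transmission-time figure under inverse-bandwidth weights.

```python
def recovery_cost(weights, downloads, failed):
    """C = sum of w_i * y_i over all surviving nodes i != failed."""
    return sum(w * y
               for i, (w, y) in enumerate(zip(weights, downloads))
               if i != failed)

# Hypothetical 4-node example: node 0 failed, nodes 1..3 survive.
downloads = [0, 3, 2, 2]                 # y_i: symbols read from node i
bandwidth_mbps = [100, 100, 1000, 1000]  # assumed link speeds

unit_weights = [1] * 4                          # C = number of symbols read
time_weights = [1 / b for b in bandwidth_mbps]  # C proportional to transfer time

print(recovery_cost(unit_weights, downloads, failed=0))  # 7
print(recovery_cost(time_weights, downloads, failed=0))
```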

Page 15: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Solving the Model

Important: which symbols are fetched from surviving nodes must follow the inherent rules of the specific coding scheme

To solve the model, we introduce the recovery sequence (x_0, x_1, …, x_{p-2}, 0)
– x_i = 0: d_{i,k} is recovered from its row parity set
– x_i = 1: d_{i,k} is recovered from its diagonal parity set

An example:

node 0  node 1  node 2  node 3  node 4  node 5
d0,0    d0,1    d0,2    d0,3    d0,4    d0,5
d1,0    d1,1    d1,2    d1,3    d1,4    d1,5
d2,0    d2,1    d2,2    d2,3    d2,4    d2,5
d3,0    d3,1    d3,2    d3,3    d3,4    d3,5

recovery sequence: (0, 0, 1, 1, 0)
download distribution: (3, 2, 2, 3, 2)

1) Each recovery sequence represents a feasible recovery solution;
2) Download distribution can be represented by recovery sequence;
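The mapping from a recovery sequence to a download distribution can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the standard RDP layout (nodes 0 to p-2 hold data, node p-1 row parity, node p diagonal parity), assumes node 0 is the failed node, and uses a hypothetical function name. Under those assumptions it reproduces the example above.

```python
def rdp_download_distribution(x, p):
    """Symbols read from nodes 1..p when node 0 of an RDP array fails.

    x[i] = 0: row i is rebuilt from its row parity set.
    x[i] = 1: row i is rebuilt from its diagonal parity set.
    """
    reads = {node: set() for node in range(1, p + 1)}
    for i in range(p - 1):
        if x[i] == 0:
            # Row parity set: the symbols of row i on nodes 1..p-1.
            for node in range(1, p):
                reads[node].add(i)
        else:
            # d[i][0] lies on diagonal i, which contains d[r][c] with
            # (r + c) % p == i, plus the diagonal parity symbol on node p.
            for node in range(1, p):
                r = (i - node) % p
                if r != p - 1:  # row p-1 is imaginary (all zeros)
                    reads[node].add(r)
            reads[p].add(i)
    # Overlapping symbols are counted once because each node's reads form a set.
    return tuple(len(reads[node]) for node in range(1, p + 1))

print(rdp_download_distribution((0, 0, 1, 1, 0), p=5))  # (3, 2, 2, 3, 2)
```

With the all-zero sequence the same function gives the conventional scheme, which reads 16 symbols for p = 5 instead of 12.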

Page 16: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

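Solving the Model (2)

Step 1: use recovery sequence to represent downloads

$$y_j = (p-1) - \sum_{i=0}^{p-1} x_i + \sum_{i=0}^{p-1} x_i \, x_{\langle i+j-k \rangle_p}$$

Step 2: narrow down search space by only considering min-read recovery sequences (i.e., those that download the minimum number of read symbols during recovery)

$$y_j = \frac{p-1}{2} + \sum_{i=0}^{p-1} x_i \, x_{\langle i+j-k \rangle_p}, \quad j \neq k$$

Step 3: reformulate the model as

$$\text{Minimize} \sum_{j=0,\, j \neq k}^{p-1} w_j \sum_{i=0}^{p-1} x_i \, x_{\langle i+j-k \rangle_p}$$

where $\langle \cdot \rangle_p$ denotes the value modulo p.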

Page 17: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Expensive Enumeration

p     Total # of recovery sequences    # of min-read recovery sequences    # of unique min-read recovery sequences
5     16                               6                                   2
7     64                               20                                  4
11    1024                             252                                 26
13    4096                             924                                 74
17    65536                            12870                               698
19    262144                           48620                               2338
23    4194304                          705432                              28216
29    268435456                        40116600                            1302688

Challenge: too many min-read recovery sequences to enumerate, even after we narrow down the search space

Observation: many min-read recovery sequences return the same download distribution
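The first two count columns of the table can be reproduced with a short Python sketch. It assumes that a recovery sequence has p-1 free binary entries and that a min-read sequence is one with exactly (p-1)/2 ones; the talk does not state this rule on the slide, but the assumption matches every row of the table. The function name is hypothetical.

```python
from math import comb

def sequence_counts(p):
    total = 2 ** (p - 1)                   # each of x_0..x_{p-2} is 0 or 1
    min_read = comb(p - 1, (p - 1) // 2)   # exactly (p-1)/2 entries equal to 1
    return total, min_read

for p in (5, 7, 11, 13, 17, 19, 23, 29):
    print(p, *sequence_counts(p))
```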

Page 18: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Optimize Enumeration Process

Two conditions under which different recovery sequences have the same download distribution:

Shift condition
(0, 0, 0, 1, 1, 1, 0), (0, 0, 1, 1, 1, 0, 0), (0, 1, 1, 1, 0, 0, 0), (1, 1, 1, 0, 0, 0, 0), …

Reverse condition
(0, 0, 0, 1, 1, 1, 0), (0, 1, 1, 1, 0, 0, 0)

Key idea: not all recovery sequences need to be enumerated (details in the paper)
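A small Python sketch of how the two conditions could be applied. The interpretation is inferred from the examples above and is an assumption: a shift is read as a cyclic shift that keeps the last element at 0, and a reverse as reading the sequence backwards. The function name is hypothetical; the exact conditions are in the paper.

```python
def equivalent_sequences(x):
    """Shifted and reversed variants of x that keep the last element 0."""
    p = len(x)
    variants = set()
    for base in (tuple(x), tuple(reversed(x))):
        for s in range(p):
            shifted = base[s:] + base[:s]  # cyclic left shift by s
            if shifted[-1] == 0:           # x_{p-1} must stay 0
                variants.add(shifted)
    return variants

for seq in sorted(equivalent_sequences((0, 0, 0, 1, 1, 1, 0))):
    print(seq)
```

For this input the sketch yields exactly the four sequences listed under the shift condition, so only one of them needs its cost evaluated.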

Page 19: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Cost-based Heterogeneous Recovery (CHR) Algorithm: Intuition

Step 1: initialize a bitmap to track all possible min-read recovery sequences R

Step 2: compute recovery cost of R

Step 3: mark all shifted and reverse sequences of R as being enumerated

Step 4: switch to another R; return the one with minimum cost
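The four steps can be sketched in Python as below. This is an illustrative approximation, not the released CHR source: it assumes an RDP array with node 0 failed, uses a Python set in place of the bitmap, treats shifts as cyclic shifts that keep the last element 0, and all function names are hypothetical. The bandwidth-to-node mapping is inferred by matching the cost expressions in the talk's example against its download distributions.

```python
from itertools import combinations

def download_distribution(x, p):
    """Symbols read per surviving node (1..p) when node 0 of an RDP array fails."""
    reads = {node: set() for node in range(1, p + 1)}
    for i in range(p - 1):
        if x[i] == 0:                      # row parity set of row i
            for node in range(1, p):
                reads[node].add(i)
        else:                              # diagonal parity set of d[i][0]
            for node in range(1, p):
                r = (i - node) % p
                if r != p - 1:             # skip the imaginary all-zero row
                    reads[node].add(r)
            reads[p].add(i)
    return tuple(len(reads[node]) for node in range(1, p + 1))

def equivalents(x):
    """Shifted and reversed variants of x whose last element is still 0."""
    out = set()
    for base in (x, x[::-1]):
        for s in range(len(x)):
            y = base[s:] + base[:s]
            if y[-1] == 0:
                out.add(y)
    return out

def chr_search(p, weights):
    """weights[j-1] is the weight of surviving node j (1..p)."""
    seen = set()                           # stands in for the bitmap
    best = (float("inf"), None, None)
    for ones in combinations(range(p - 1), (p - 1) // 2):
        x = tuple(1 if i in ones else 0 for i in range(p))
        if x in seen:
            continue                       # an equivalent sequence was already costed
        seen |= equivalents(x)
        y = download_distribution(x, p)
        cost = sum(w * n for w, n in zip(weights, y))
        if cost < best[0]:
            best = (cost, x, y)
    return best

# Bandwidths (Mbps) of nodes 1..7, inferred from the example's cost expressions.
bandwidth = [68, 109, 110, 86, 110, 10, 113]
cost, x, y = chr_search(7, [1 / b for b in bandwidth])
print(round(cost, 4), y)  # 0.5449 (3, 5, 4, 4, 5, 3, 3)
```

Under these assumptions the search returns the sequence (1, 0, 1, 0, 1, 0, 0), which agrees with the R* reported in the backup example.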

Page 20: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Example

[Figure: a proxy connected to node 0 through node 7 and to a new node; link bandwidths shown: 26Mbps, 68Mbps, 109Mbps, 110Mbps, 113Mbps, 10Mbps, 110Mbps, 86Mbps]

Symbols read from each surviving node (node 1 to node 7):

Our proposed CHR algorithm:        3 5 4 4 5 3 3
Hybrid approach [Xiang, ToS’11]:   5 4 3 3 4 5 3

Page 21: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

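Recovery Cost Comparison

CHR approach:

$$C = \frac{3}{113} + \frac{3}{10} + \frac{5}{110} + \frac{4}{86} + \frac{4}{110} + \frac{5}{109} + \frac{3}{68} = 0.5449$$

Hybrid approach:

$$C = \frac{3}{113} + \frac{5}{10} + \frac{4}{110} + \frac{3}{86} + \frac{3}{110} + \frac{4}{109} + \frac{5}{68} = 0.7353$$

Conventional approach:

$$C = \frac{6}{10} + \frac{6}{110} + \frac{6}{86} + \frac{6}{110} + \frac{6}{109} + \frac{6}{68} = 0.9221$$

CHR reduces the recovery cost by 25.89% relative to the hybrid approach and by 40.91% relative to the conventional approach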
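The three costs can be recomputed with a few lines of Python. The per-node bandwidths below are an inferred assumption (node 1 = 68, node 2 = 109, node 3 = 110, node 4 = 86, node 5 = 110, node 6 = 10, node 7 = 113 Mbps), chosen because they are the only assignment that matches the download distributions of the preceding example to the cost terms on this slide.

```python
bandwidth = {1: 68, 2: 109, 3: 110, 4: 86, 5: 110, 6: 10, 7: 113}  # Mbps, inferred

# Symbols read from node 1..7 under each approach (node 0 is the failed node).
downloads = {
    "CHR":          {1: 3, 2: 5, 3: 4, 4: 4, 5: 5, 6: 3, 7: 3},
    "Hybrid":       {1: 5, 2: 4, 3: 3, 4: 3, 5: 4, 6: 5, 7: 3},
    "Conventional": {1: 6, 2: 6, 3: 6, 4: 6, 5: 6, 6: 6, 7: 0},
}

# Weight of each node is the inverse of its bandwidth.
cost = {name: sum(y / bandwidth[n] for n, y in dist.items())
        for name, dist in downloads.items()}

for name, c in cost.items():
    print(f"{name}: {c:.4f}")
for other in ("Hybrid", "Conventional"):
    saving = 100 * (cost[other] - cost["CHR"]) / cost[other]
    print(f"CHR vs {other}: {saving:.2f}% lower")
```

The costs come out as 0.5449, 0.7353 and 0.9221; the percentage reductions agree with the slide up to rounding in the last digit.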

Page 22: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Simulation Studies (1): Traverse Efficiency

Evaluate the computational time of CHR

p     Naive traverse time (ms)    CHR’s traverse time (ms)    Improved rate (%)
5     0.0220                      0.0100                      54.55
7     0.0950                      0.0310                      67.37
11    2.3160                      0.3910                      83.12
13    11.9840                     1.6150                      86.52
17    107.7410                    10.0790                     90.65
19    455.2760                    40.5370                     91.10
23    9230.7800                   691.2800                    92.51
29    752296.2700                 45423.5570                  93.96

CHR significantly reduces the traverse time of the naive approach by over 90% as p increases!

Page 23: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Simulation Studies (2): Robustness Efficiency

Evaluate if CHR achieves the global optimal among all the feasible recovery sequences ($2^{p-1}$ in total)

p     Hit global optimal probability (%)    Global optimal max improvement (%)
5     94.9                                  6.12
7     94.5                                  5.54
11    93.6                                  5.98
13    93.2                                  6.46
17    92.8                                  5.97
19    93.1                                  5.73

CHR has a very high probability (about 93% or more) to hit the global optimal recovery cost!

Page 24: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Simulation Studies (3): Recovery Efficiency

Evaluate via 100 runs for each p the recovery efficiency of CHR in a heterogeneous storage environment

CHR can reduce recovery cost by up to 50% over the conventional approach

CHR can reduce recovery cost by up to 30% over the hybrid approach


Page 25: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Experiments

Experiments on a networked storage testbed
• Conventional vs. Hybrid vs. CHR
• Default chunk size = 1MB
• Communication via ATA over Ethernet (AoE)
• Consider two codes: RDP and EVENODD
  • Only RDP results shown in this talk

Recovery operation:
• Read chunks from surviving nodes
• Reconstruct lost chunks
• Write reconstructed chunks to a new node

[Figure: recovery process; storage nodes connected through a Gigabit switch]

Page 26: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Experiments

The physical storage devices are equipped with two types of Ethernet interface cards
• 100Mbps: set weight = 1/(100Mbps)
• 1Gbps: set weight = 1/(1Gbps)

Configuration for RDP code:

p     Total # of nodes    # of nodes with 100Mbps    # of nodes with 1Gbps
5     6                   2                          4
7     8                   3                          5
11    12                  5                          7
13    14                  6                          8
17    18                  9                          9

Page 27: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Different Number of Storage Nodes

Total recovery time for RDP
• CHR improves conventional by 21-31%
• CHR improves hybrid by 15-20%

[Figure: bar chart of recovery time (in seconds) per MB versus p; series: Conventional, Hybrid, CHR]

Percentage labels on the chart (reduction achieved by CHR):

p                       5        7        11       13       17
vs. Hybrid (%)          16.47    17.11    15.28    18.1     20.78
vs. Conventional (%)    24.15    24.61    21.45    23.93    31.19

Page 28: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Different Chunk Size

Total recovery time for RDP (p = 11)
• CHR improves conventional by 18-26%
• CHR improves hybrid by 14-19%

[Figure: bar chart of recovery time (in seconds) per MB versus chunk size; series: Conventional, Hybrid, CHR]

Percentage labels on the chart (reduction achieved by CHR):

Chunk size              256KB    512KB    1024KB    2048KB    4096KB
vs. Hybrid (%)          14.69    14.66    15.28     16.68     19.82
vs. Conventional (%)    18.32    22.02    21.45     23.76     25.66

Page 29: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Different Failed Nodes

Total recovery time for RDP (p = 11)
• CHR still outperforms conventional and hybrid

[Figure: bar chart of recovery time (in seconds) per MB when the failed node is Node0 through Node10; series: Conventional, Hybrid, CHR]

Percentage labels on the chart, in extracted order:
15.28 16.18 19.44 12.46 13.12 17.49 14.43 14.27 16.67 11.99 10.04
21.45 21.53 25.12 21.77 19.03 22.54 18.84 23.62 20.79 24.58 17.9

Page 30: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Conclusions

• Address single-node failure recovery in RAID-6 coded heterogeneous storage systems
• Formulate a computation-efficient optimization model
• Propose a cost-based heterogeneous recovery algorithm
• Validate the effectiveness of the CHR algorithm through extensive simulations and testbed experiments

Future work:
• Different cost formulations
• Extension for general XOR-based erasure codes
• Degraded reads

Source code:
• http://ansrlab.cse.cuhk.edu.hk/software/chr/

Page 31: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Backup

Page 32: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Cost-based Heterogeneous Recovery (CHR) Algorithm

Notation:

F         A bitmap that identifies if a min-read recovery sequence has been enumerated
R, C      A min-read recovery sequence with its recovery cost
R*, C*    The min-cost recovery sequence with the minimum total recovery cost

Algorithm:

1  Initialize F[0 … $2^{p-1}-1$] with 0-bits; initialize R with $(p-1)/2$ 1-bits followed by $(p-1)/2$ 0-bits; initialize R* with R; initialize C* with MAX_VALUE

2  If R is null, then go to Step 4; convert R into integer value v; if R has already been enumerated, then go to Step 3; mark all the shifted and reverse recovery sequences of R as being enumerated; calculate the recovery cost C of R; update R* and C* if necessary

3  Get the next min-read recovery sequence R and go to Step 2

4  Finally, initialize R with all 0-bits; calculate the recovery cost C of R; update R* and C* if necessary

Page 33: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Example

[Figure: a proxy connected to node 0 through node 7 and to a new node; link bandwidths shown: 26Mbps, 68Mbps, 109Mbps, 110Mbps, 113Mbps, 10Mbps, 110Mbps, 86Mbps]

Step 1: Initialize F[0..63] with 0-bits, R = {1110000}, the recovery cost C = MAX_VALUE

Step 2: F[7]=1, mark R’s shifted and reverse recovery sequences: F[56]=F[28]=F[14]=1; calculate the recovery cost for R, C will be 0.7353α; R*, C* will be updated by R, C

Step 3: Get the next min-read recovery sequence R and go to Step 2

Step 4: Finally, we can find that R* = {1010100} and C* = 0.5449α

Symbols read from each surviving node (node 1 to node 7): 3 5 4 4 5 3 3

Page 34: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

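Recovery Cost Comparison

CHR approach:

$$C = \frac{3}{113} + \frac{3}{10} + \frac{5}{110} + \frac{4}{86} + \frac{4}{110} + \frac{5}{109} + \frac{3}{68} = 0.5449$$

Hybrid approach:

$$C = \frac{3}{113} + \frac{5}{10} + \frac{4}{110} + \frac{3}{86} + \frac{3}{110} + \frac{4}{109} + \frac{5}{68} = 0.7353$$

Conventional approach:

$$C = \frac{6}{10} + \frac{6}{110} + \frac{6}{86} + \frac{6}{110} + \frac{6}{109} + \frac{6}{68} = 0.9221$$

CHR reduces the recovery cost by 25.89% relative to the hybrid approach and by 40.91% relative to the conventional approach

Symbols read from each surviving node (node 1 to node 7): 5 4 3 3 4 5 3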

Page 35: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Different Number of Storage Nodes

Consider the overall performance of the complete recovery operation for EVENODD

14.319.69

14.4415.06

18.31

1925.47

25.1727.49

32.11

[Figure: bar chart of recovery time (in seconds) per MB versus p (p=5, p=7, p=11, p=13, p=17); series: Conventional, Hybrid, CHR]

Page 36: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Different Chunk Size

Evaluate the impact of chunk size for EVENODD on the recovery time performance

[Figure: bar chart of recovery time (in seconds) per MB versus chunk size (256KB, 512KB, 1024KB, 2048KB, 4096KB); series: Conventional, Hybrid, CHR]

Percentage labels on the chart, in extracted order:
9.4 11.07 14.44 16.27 20.5
15.57 26.25 25.17 26.39 25.48

Page 37: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Different Failed Nodes

Evaluate the recovery time performance for EVENODD when the failed node is in a different column

[Figure: bar chart of recovery time (in seconds) per MB when the failed node is Node0 through Node10; series: Conventional, Hybrid, CHR]

Percentage labels on the chart, in extracted order:
14.44 8.1 11.52 10.22 13.97 13.06 9.4 15.31 13.79 8.83 9.98
25.17 17.06 17.09 21.8 22.62 23.93 18.58 19.77 18.19 22.56 16.95