A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes
Yunfeng Zhu1, Patrick P. C. Lee2, Liping Xiang1, Yinlong Xu1, Lingling Gao1
1 University of Science and Technology of China
2 The Chinese University of Hong Kong
DSN’12
Fault Tolerance
Fault tolerance becomes more challenging in modern distributed storage systems
• Increase in scale
• Usage of inexpensive but less reliable storage nodes
Fault tolerance is ensured by introducing redundancy across storage nodes
• Replication
• Erasure codes (e.g., Reed-Solomon codes)
[Figure: replication keeps copies of A and B on several nodes; an erasure code stores A, B, A+B, A+2B]
XOR-Based Erasure Codes
Encoding/decoding involve XOR operations only
• Low computational overhead
Different redundancy levels
• 2-fault tolerant: RDP, EVENODD, X-Code
• 3-fault tolerant: STAR
• General-fault tolerant: Cauchy Reed-Solomon (CRS)
Failure Recovery
Recovering node failures is necessary
• Preserve the required redundancy level
• Avoid data unavailability
Single-node failure recovery
• Single-node failure occurs more frequently than a concurrent multi-node failure
Example: Recovery in RDP
[Figure: an RDP code example with 8 nodes (p = 7). Each node j stores six symbols d0,j to d5,j. Nodes 0-5 hold data symbols, node 6 holds row parity symbols, node 7 holds diagonal parity symbols.]
Let’s say node 0 fails. How do we recover node 0?
Conventional Recovery
Idea: use only row parity sets. Recover each lost data symbol (i.e., data chunk) independently
[Figure: RDP array, node 0 to node 7]
Read symbols: 36
Then how do we recover node 0 efficiently?
Different metrics can be used to measure the efficiency of a recovery scheme
Minimize Number of Read Symbols
Idea: use a combination of row and diagonal parity sets to maximize overlapping symbols [Xiang, ToS’11]
[Figure: RDP array, node 0 to node 7]
Read symbols: 27
Improvement: 25%
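The two read counts can be checked by hand for p = 7. The worked arithmetic below is a sketch; the closed form 3(p-1)^2/4 for the minimum number of reads is the known RDP result from [Xiang, ToS’11] and is stated here as background, not taken from the slide itself.

```latex
\begin{align*}
\text{Conventional: } & (p-1)\ \text{lost symbols} \times (p-1)\ \text{symbols per row parity set} = 6 \times 6 = 36 \\
\text{Hybrid (min-read): } & \tfrac{3}{4}(p-1)^2 = \tfrac{3}{4} \cdot 36 = 27 \\
\text{Improvement: } & \frac{36 - 27}{36} = 25\%
\end{align*}
```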
Need A New Metric?
A modern storage system naturally consists of heterogeneous types of storage nodes
• System upgrades
• New node addition
A heterogeneous environment
[Figure: a proxy connected to node 0 through node 7 and a new node; link bandwidths labeled 26Mbps, 68Mbps, 109Mbps, 110Mbps, 113Mbps, 10Mbps, 110Mbps, 86Mbps]
Need a new efficient failure recovery solution for heterogeneous environments!
Related Work
Hybrid recovery
• Minimize number of read symbols for RAID-6 XOR-based erasure codes
• e.g., RDP [Xiang, ToS’11], EVENODD [Wang, Globecom’10]
Enumeration recovery [Khan, FAST’12]
• Enumerate all recovery possibilities to achieve optimal recovery for general XOR-based erasure codes
Greedy recovery [Zhu, MSST’12]
• Efficient search of recovery solutions for general XOR-based erasure codes
Regenerating codes [Dimakis, ToIT’10]
• Nodes encode data during recovery
• Minimize recovery bandwidth
• Heterogeneous case considered in [Li, Infocom’10], but requires node encoding and collaboration
Challenges
How to enable efficient failure recovery for heterogeneous settings?
• Minimizing # of read symbols → homogeneous settings
• Performance bottlenecked by poorly performing nodes
How to quickly find the recovery strategy?
• Minimizing # of read symbols → deterministic metric
• Minimizing general cost → non-deterministic metric
• Recovery decision typically can’t be pre-determined
Our Contributions
Target two RAID-6 codes: RDP and EVENODD
• XOR-based encoding operations
Goals:
• Minimize search time
• Minimize recovery cost
Cost-based single-node failure recovery for heterogeneous distributed storage systems
Our Contributions
Formulate an optimization problem for single-node failure recovery in heterogeneous settings
Propose a cost-based heterogeneous recovery (CHR) algorithm
• Narrow down search space
• Suitable for online recovery
Implement and experiment on a heterogeneous networked storage testbed
Model Formulation
[Figure: nodes v0, v1, …, vk, …, vp-1, vp (Node 0, Node 1, …, Node k, …, Node p-1, Node p). Each node vi has a weight wi and a download distribution yi; node k is the failed node.]
Our formulation: minimizing total recovery cost
$$C = \sum_{i=0,\, i \neq k}^{p} w_i y_i$$
Physical Meanings
wi | C
1 for all i | total number of symbols being read from surviving nodes
inverse of transmission bandwidth of node vi | total amount of transmission time to download symbols from surviving nodes
monetary cost of migrating per unit of data outbound from node vi | total monetary cost of migrating data from surviving nodes (or clouds)
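A minimal sketch of the cost model above, assuming the weights and download counts are given as plain Python lists indexed by node. The function name `recovery_cost` and the 4-node numbers are hypothetical and only illustrate how the choice of weights changes the meaning of C.

```python
from typing import Sequence


def recovery_cost(weights: Sequence[float], downloads: Sequence[int], failed: int) -> float:
    """C = sum of w_i * y_i over every node i except the failed node k."""
    if len(weights) != len(downloads):
        raise ValueError("weights and downloads must have one entry per node")
    return sum(
        w * y
        for i, (w, y) in enumerate(zip(weights, downloads))
        if i != failed
    )


# Hypothetical 4-node example; node 0 is the failed node, so its entries are ignored.
downloads = [0, 3, 2, 3]                      # y_i: symbols read from node i
bandwidth_mbps = [100.0, 100.0, 50.0, 10.0]   # made-up link speeds

# w_i = 1 for all i: C is the total number of symbols read.
read_count = recovery_cost([1.0] * 4, downloads, failed=0)

# w_i = 1 / bandwidth_i: C is proportional to the total transmission time.
transfer_cost = recovery_cost([1.0 / b for b in bandwidth_mbps], downloads, failed=0)
```

With unit weights the slow node 3 counts the same as the others; with inverse-bandwidth weights its three symbols dominate the cost.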
Solving the Model
Important: which symbols are fetched from surviving nodes must follow the inherent rules of the specific coding scheme
To solve the model, we introduce the recovery sequence (x0, x1, …, xp-2, 0)
– xi = 0: di,k is recovered from its row parity set
– xi = 1: di,k is recovered from its diagonal parity set
An example:
[Figure: RDP array with 6 nodes (node 0 to node 5), each node j storing four symbols d0,j to d3,j]
recovery sequence: (0, 0, 1, 1, 0)
download distribution: (3, 2, 2, 3, 2)
1) Each recovery sequence represents a feasible recovery solution;
2) The download distribution can be represented by the recovery sequence;
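The link between a recovery sequence and its download distribution can be sketched by collecting the symbols each parity set needs. The code below assumes the standard RDP layout (nodes 0 to p-2 hold data, node p-1 row parity, node p diagonal parity, and symbol (r, j) lies on diagonal (r + j) mod p); the function name is hypothetical. With node 0 as the failed node it reproduces the slide's example.

```python
from typing import List, Sequence


def rdp_download_distribution(p: int, x: Sequence[int], failed: int = 0) -> List[int]:
    """Count the symbols read from each surviving node of an RDP array.

    x[i] = 0 recovers d(i, failed) from its row parity set,
    x[i] = 1 recovers it from its diagonal parity set.
    Only x[0..p-2] is used; a trailing 0 entry is accepted and ignored.
    """
    rows = p - 1
    if not 0 <= failed <= p - 1 or len(x) < rows:
        raise ValueError("need 0 <= failed <= p-1 and at least p-1 entries in x")
    needed = set()  # (row, node) pairs that must be read
    for i in range(rows):
        if x[i] == 0:
            # Row parity set: every other symbol of row i in nodes 0..p-1.
            needed.update((i, j) for j in range(p) if j != failed)
        else:
            diag = (i + failed) % p
            if diag == p - 1:
                raise ValueError("this symbol lies on the diagonal without parity")
            for j in range(p):
                r = (diag - j) % p
                if j != failed and r < rows:
                    needed.add((r, j))
            needed.add((diag, p))  # the diagonal parity symbol itself
    counts = [0] * (p + 1)
    for _, j in needed:
        counts[j] += 1
    return [counts[j] for j in range(p + 1) if j != failed]


# Slide example: p = 5, recovery sequence (0, 0, 1, 1, 0), node 0 assumed failed.
example = rdp_download_distribution(5, (0, 0, 1, 1, 0))  # [3, 2, 2, 3, 2]
```

Symbols shared by a row set and a diagonal set are stored once in the set, which is why mixing the two kinds of parity sets lowers the read count.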
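Solving the Model (2)
Step 1: use the recovery sequence to represent downloads
$$y_j = (p-1) - \sum_{i=0}^{p-1} x_i + \sum_{i=0}^{p-1} x_i \, x_{\langle i+j-k \rangle_p}$$
where $\langle \cdot \rangle_p$ denotes modulo p
Step 2: narrow down the search space by only considering min-read recovery sequences (i.e., those that download the minimum number of read symbols during recovery, for which $\sum_i x_i = (p-1)/2$)
$$y_j = \frac{p-1}{2} + \sum_{i=0}^{p-1} x_i \, x_{\langle i+j-k \rangle_p}$$
Step 3: reformulate the model as
$$\text{Minimize} \quad \sum_{j=0,\, j \neq k}^{p-1} \; \sum_{i=0}^{p-1} x_i \, x_{\langle i+j-k \rangle_p} \, w_j$$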
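A quick way to sanity-check the min-read formula is to evaluate it on the earlier p = 5 example. The sketch below assumes the formula covers surviving nodes j = 0, …, p-1 and was only checked for failed node k = 0; the function name is hypothetical. Node p (diagonal parity) is left out here; it supplies one parity symbol per diagonal parity set used.

```python
from typing import Dict, Sequence


def min_read_downloads(p: int, x: Sequence[int], failed: int = 0) -> Dict[int, int]:
    """y_j = (p-1)/2 + sum_i x_i * x_<i+j-k>_p for nodes j in 0..p-1, j != k."""
    if len(x) != p or x[p - 1] != 0:
        raise ValueError("x must have p entries and end with 0")
    if sum(x) != (p - 1) // 2:
        raise ValueError("not a min-read sequence: expected (p-1)/2 ones")
    base = (p - 1) // 2
    return {
        j: base + sum(x[i] * x[(i + j - failed) % p] for i in range(p))
        for j in range(p)
        if j != failed
    }


# Slide example: p = 5, recovery sequence (0, 0, 1, 1, 0), failed node k = 0.
y = min_read_downloads(5, (0, 0, 1, 1, 0))  # {1: 3, 2: 2, 3: 2, 4: 3}
```

The values for nodes 1 to 4 match the first four entries of the download distribution (3, 2, 2, 3, 2) given for that example.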
Expensive Enumeration
p | Total # of recovery sequences | # of min-read recovery sequences | # of unique min-read recovery sequences
5 | 16 | 6 | 2
7 | 64 | 20 | 4
11 | 1024 | 252 | 26
13 | 4096 | 924 | 74
17 | 65536 | 12870 | 698
19 | 262144 | 48620 | 2338
23 | 4194304 | 705432 | 28216
29 | 268435456 | 40116600 | 1302688
Challenge: too many min-read recovery sequences to enumerate, even after we narrow down the search space
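The first two columns of the table agree with simple counting formulas: 2^(p-1) sequences in total (each of the p-1 lost symbols picks a row or a diagonal parity set) and C(p-1, (p-1)/2) min-read sequences. The second formula is an inference from the table values, and the sketch does not attempt to reproduce the third column.

```python
from math import comb
from typing import Tuple


def sequence_counts(p: int) -> Tuple[int, int]:
    """Return (total recovery sequences, min-read recovery sequences) for prime p."""
    total = 2 ** (p - 1)
    min_read = comb(p - 1, (p - 1) // 2)  # choose which (p-1)/2 symbols use diagonals
    return total, min_read


counts = {p: sequence_counts(p) for p in (5, 7, 11, 13, 17, 19, 23, 29)}
```

Both counts grow exponentially in p, which is why a plain enumeration becomes expensive for larger arrays.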
Optimize Enumeration Process
Observation: many min-read recovery sequences return the same download distribution
Two conditions under which different recovery sequences have the same download distribution:
• Shift condition: (0, 0, 0, 1, 1, 1, 0), (0, 0, 1, 1, 1, 0, 0), (0, 1, 1, 1, 0, 0, 0), (1, 1, 1, 0, 0, 0, 0), …
• Reverse condition: (0, 0, 0, 1, 1, 1, 0), (0, 1, 1, 1, 0, 0, 0)
Key idea: not all recovery sequences need to be enumerated (details in the paper)
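The two conditions can be checked numerically on the slide's own sequences. This sketch assumes failed node 0, the min-read download formula y_j = (p-1)/2 + Σ x_i x_<i+j>_p for nodes 1 to p-1, and that a shifted sequence is only valid if its last entry stays 0; the helper names are hypothetical.

```python
from typing import List, Tuple

Seq = Tuple[int, ...]


def downloads(x: Seq) -> Tuple[int, ...]:
    """Symbols read from nodes 1..p-1 for a min-read sequence x (failed node 0)."""
    p = len(x)
    base = (p - 1) // 2
    return tuple(base + sum(x[i] * x[(i + j) % p] for i in range(p)) for j in range(1, p))


def valid_shifts(x: Seq) -> List[Seq]:
    """Cyclic shifts of x whose last entry is still 0."""
    p = len(x)
    shifted = (tuple(x[(i - s) % p] for i in range(p)) for s in range(p))
    return [y for y in shifted if y[-1] == 0]


x = (0, 0, 0, 1, 1, 1, 0)
variants = valid_shifts(x) + [x[::-1]]          # shift condition + reverse condition
distributions = {downloads(v) for v in variants}
```

All variants collapse to a single download distribution, so only one of them needs its cost computed.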
Cost-based Heterogeneous Recovery (CHR) Algorithm: Intuition
Step 1: initialize a bitmap to track all possible min-read recovery sequences R
Step 2: compute the recovery cost of R
Step 3: mark all shifted and reverse sequences of R as being enumerated
Step 4: switch to another R; return the one with minimum cost
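The four steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: node 0 is the failed node, a Python set stands in for the bitmap, sequences are generated with `itertools.combinations`, and the all-zero (conventional) sequence is checked at the end as in the backup slide. The bandwidth-to-node assignment in the usage example is inferred by matching the talk's download distributions with its cost terms.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple

Seq = Tuple[int, ...]


def downloads(x: Seq) -> List[int]:
    """Symbols read from nodes 1..p when node 0 fails (assumed RDP closed form)."""
    p, ones = len(x), sum(x)
    per_node = [(p - 1) - ones + sum(x[i] * x[(i + j) % p] for i in range(p))
                for j in range(1, p)]
    return per_node + [ones]  # node p: one diagonal parity symbol per x_i = 1


def cost(x: Seq, weights: Sequence[float]) -> float:
    return sum(w * y for w, y in zip(weights, downloads(x)))


def equivalents(x: Seq) -> Set[Seq]:
    """Valid shifted and reversed variants of x (last entry must stay 0)."""
    p, out = len(x), set()
    for base in (x, x[::-1]):
        for s in range(p):
            y = tuple(base[(i - s) % p] for i in range(p))
            if y[-1] == 0:
                out.add(y)
    return out


def chr_search(p: int, weights: Sequence[float]) -> Tuple[Seq, float, int]:
    """Return (best sequence, its cost, number of min-read sequences evaluated)."""
    seen: Set[Seq] = set()                      # plays the role of the bitmap
    best, best_cost, evaluated = (0,) * p, float("inf"), 0
    for ones in combinations(range(p - 1), (p - 1) // 2):
        x = tuple(1 if i in ones else 0 for i in range(p))
        if x in seen:
            continue                            # same distribution already costed
        seen |= equivalents(x)
        c = cost(x, weights)
        evaluated += 1
        if c < best_cost:
            best, best_cost = x, c
    zero = (0,) * p                             # conventional recovery as a fallback
    if cost(zero, weights) < best_cost:
        best, best_cost = zero, cost(zero, weights)
    return best, best_cost, evaluated


# Inferred bandwidths (Mbps) of nodes 1..7 in the talk's example; weight = 1 / bandwidth.
weights = [1 / b for b in (68, 109, 110, 86, 110, 10, 113)]
best, best_cost, evaluated = chr_search(7, weights)
```

For p = 7 only four of the twenty min-read sequences are costed, and the search settles on the sequence that reads just three symbols from the 10Mbps node.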
Example
[Figure: the heterogeneous environment: a proxy connected to node 0 through node 7 and a new node; link bandwidths labeled 26Mbps, 68Mbps, 109Mbps, 110Mbps, 113Mbps, 10Mbps, 110Mbps, 86Mbps]
Symbols read from each surviving node (node 0 fails):
Approach | node 1 | node 2 | node 3 | node 4 | node 5 | node 6 | node 7
Our proposed CHR algorithm | 3 | 5 | 4 | 4 | 5 | 3 | 3
Hybrid approach [Xiang, ToS’11] | 5 | 4 | 3 | 3 | 4 | 5 | 3
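Recovery Cost Comparison
Hybrid approach:
$$C = \frac{5}{68} + \frac{4}{109} + \frac{3}{110} + \frac{3}{86} + \frac{4}{110} + \frac{5}{10} + \frac{3}{113} = 0.7353$$
CHR approach:
$$C = \frac{3}{68} + \frac{5}{109} + \frac{4}{110} + \frac{4}{86} + \frac{5}{110} + \frac{3}{10} + \frac{3}{113} = 0.5449$$
Conventional approach:
$$C = \frac{6}{68} + \frac{6}{109} + \frac{6}{110} + \frac{6}{86} + \frac{6}{110} + \frac{6}{10} = 0.9221$$
CHR reduces the recovery cost by 25.89% over the hybrid approach and by 40.91% over the conventional approach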
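The three costs can be recomputed directly from the per-node read counts. In this sketch the bandwidths are listed in the order of the cost terms (68, 109, 110, 86, 110, 10, 113 Mbps), which is assumed to correspond to nodes 1 to 7; the variable names are hypothetical.

```python
bandwidth_mbps = [68, 109, 110, 86, 110, 10, 113]   # assumed order: node 1 .. node 7

reads = {
    "conventional": [6, 6, 6, 6, 6, 6, 0],   # row parity sets only, node 7 unused
    "hybrid":       [5, 4, 3, 3, 4, 5, 3],
    "chr":          [3, 5, 4, 4, 5, 3, 3],
}

# Cost = sum over nodes of (symbols read) * (1 / bandwidth).
costs = {
    name: sum(y / b for y, b in zip(ys, bandwidth_mbps))
    for name, ys in reads.items()
}


def reduction_percent(old: float, new: float) -> float:
    """Relative cost reduction of `new` with respect to `old`, in percent."""
    return 100.0 * (old - new) / old


vs_hybrid = reduction_percent(costs["hybrid"], costs["chr"])
vs_conventional = reduction_percent(costs["conventional"], costs["chr"])
```

Hybrid and CHR both read 27 symbols; the gain comes entirely from moving reads away from the 10Mbps node.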
Simulation Studies (1): Traverse Efficiency
Evaluate the computational time of CHR
p | Naive traverse time (ms) | CHR’s traverse time (ms) | Improved rate (%)
5 | 0.0220 | 0.0100 | 54.55
7 | 0.0950 | 0.0310 | 67.37
11 | 2.3160 | 0.3910 | 83.12
13 | 11.9840 | 1.6150 | 86.52
17 | 107.7410 | 10.0790 | 90.65
19 | 455.2760 | 40.5370 | 91.10
23 | 9230.7800 | 691.2800 | 92.51
29 | 752296.2700 | 45423.5570 | 93.96
CHR reduces the traverse time of the naive approach by over 90% for p ≥ 17, and the improvement grows as p increases!
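The improved rate in the last column is consistent with the relative reduction in traverse time. A worked check for the first and last rows, where the symbols T_naive and T_CHR are notation introduced here for the two time columns:

```latex
\begin{align*}
\text{Improved rate} &= \frac{T_{\text{naive}} - T_{\text{CHR}}}{T_{\text{naive}}} \times 100\% \\
p = 5:\quad & \frac{0.0220 - 0.0100}{0.0220} \approx 54.55\% \\
p = 29:\quad & \frac{752296.27 - 45423.557}{752296.27} \approx 93.96\%
\end{align*}
```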
Simulation Studies (2): Robustness Efficiency
Evaluate if CHR achieves the global optimum among all the $2^{p-1}$ feasible recovery sequences
p | Probability of hitting the global optimum (%) | Global optimal max improvement (%)
5 | 94.9 | 6.12
7 | 94.5 | 5.54
11 | 93.6 | 5.98
13 | 93.2 | 6.46
17 | 92.8 | 5.97
19 | 93.1 | 5.73
CHR has a very high probability (roughly 93-95%) of hitting the globally optimal recovery cost!
Simulation Studies (3): Recovery Efficiency
Evaluate the recovery efficiency of CHR in a heterogeneous storage environment via 100 runs for each p
• CHR can reduce recovery cost by up to 50% over the conventional approach
• CHR can reduce recovery cost by up to 30% over the hybrid approach
Experiments
Experiments on a networked storage testbed
• Conventional vs. Hybrid vs. CHR
• Default chunk size = 1MB
• Communication via ATA over Ethernet (AoE)
• Consider two codes: RDP and EVENODD
• Only RDP results shown in this talk
Recovery operation:
• Read chunks from surviving nodes
• Reconstruct lost chunks
• Write reconstructed chunks to a new node
[Figure: recovery process; nodes connected through a Gigabit switch]
Experiments
Physical storage devices are equipped with two types of Ethernet interface cards
• 100Mbps → set weight = 1/(100Mbps)
• 1Gbps → set weight = 1/(1Gbps)
Configuration for RDP code
p | Total # of nodes | # of nodes with 100Mbps | # of nodes with 1Gbps
5 | 6 | 2 | 4
7 | 8 | 3 | 5
11 | 12 | 5 | 7
13 | 14 | 6 | 8
17 | 18 | 9 | 9
Different Number of Storage Nodes
Total recovery time for RDP
• CHR improves conventional by 21-31%
• CHR improves hybrid by 15-20%
[Chart: recovery time (in seconds) per MB vs. p (p = 5, 7, 11, 13, 17) for Conventional, Hybrid, and CHR]
Different Chunk Size
Total recovery time for RDP (p = 11)
• CHR improves conventional by 18-26%
• CHR improves hybrid by 14-19%
[Chart: recovery time (in seconds) per MB vs. chunk size (256KB, 512KB, 1024KB, 2048KB, 4096KB) for Conventional, Hybrid, and CHR]
Different Failed Nodes
Total recovery time for RDP (p = 11)
• CHR still outperforms conventional and hybrid
[Chart: recovery time (in seconds) per MB vs. failed node (Node0 to Node10) for Conventional, Hybrid, and CHR]
Conclusions
Address single-node failure recovery in RAID-6 coded heterogeneous storage systems
Formulate a computation-efficient optimization model
Propose a cost-based heterogeneous recovery algorithm
Validate the effectiveness of the CHR algorithm through extensive simulations and testbed experiments
Future work:
• Different cost formulations
• Extension for general XOR-based erasure codes
• Degraded reads
Source code:
• http://ansrlab.cse.cuhk.edu.hk/software/chr/
Backup
Cost-based Heterogeneous Recovery (CHR) Algorithm
Notation:
F | A bitmap that identifies whether a min-read recovery sequence has been enumerated
R, C | A min-read recovery sequence and its recovery cost
R*, C* | The min-cost recovery sequence and its (minimum) total recovery cost
Algorithm:
1 Initialize F[0 … 2^(p-1) - 1] with 0-bits; initialize R with 1-bits followed by 0-bits; initialize R* with R; initialize C* with MAX_VALUE
2 If R is null, go to Step 4; convert R into its integer value v; if R has already been enumerated, go to Step 3; mark all the shifted and reverse recovery sequences of R as being enumerated; calculate the recovery cost C of R; update R* and C* if necessary
3 Get the next min-read recovery sequence R and go to Step 2
4 Finally, initialize R with all 0-bits; calculate the recovery cost C of R; update R* and C* if necessary
Example
[Figure: the heterogeneous environment: a proxy connected to node 0 through node 7 and a new node; link bandwidths labeled 26Mbps, 68Mbps, 109Mbps, 110Mbps, 113Mbps, 10Mbps, 110Mbps, 86Mbps]
Step 1: Initialize F[0..63] with 0-bits, R = {1110000}, the recovery cost C = MAX_VALUE
Step 2: F[7]=1, mark R’s shifted and reverse recovery sequences: F[56]=F[28]=F[14]=1; calculate the recovery cost for R, C will be 0.7353α; R*, C* will be updated by R, C
Step 3: Get the next min-read recovery sequence R and go to Step 2
Step 4: Finally, we can find that R* = {1010100} and C* = 0.5449α
Symbols read from node 1 to node 7 under R*: 3 5 4 4 5 3 3
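The bitmap indices in Step 2 can be reproduced with a small sketch. It assumes the first entry of R is the least-significant bit (so that R = {1110000} maps to F[7], as the example states) and that a shifted or reversed sequence is only marked if its last entry is 0; the helper names are hypothetical.

```python
from typing import Set, Tuple

Seq = Tuple[int, ...]


def to_index(x: Seq) -> int:
    """Integer value of a recovery sequence, first entry as least-significant bit."""
    return sum(bit << i for i, bit in enumerate(x))


def marked_indices(x: Seq) -> Set[int]:
    """Bitmap positions set for x and its valid shifted and reversed variants."""
    p = len(x)
    indices = set()
    for base in (x, x[::-1]):
        for s in range(p):
            y = tuple(base[(i - s) % p] for i in range(p))
            if y[-1] == 0:          # last entry must stay 0 to be a recovery sequence
                indices.add(to_index(y))
    return indices


r = (1, 1, 1, 0, 0, 0, 0)
marks = marked_indices(r)   # {7, 14, 28, 56}
```

For this R the reversed variants coincide with the shifted ones, so only four bitmap entries are set.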
Different Number of Storage Nodes
Consider the overall performance of the complete recovery operation for EVENODD
[Chart: recovery time (in seconds) per MB vs. p (p = 5, 7, 11, 13, 17) for Conventional, Hybrid, and CHR]
Different Chunk Size
Evaluate the impact of chunk size for EVENODD on the recovery time performance
[Chart: recovery time (in seconds) per MB vs. chunk size (256KB, 512KB, 1024KB, 2048KB, 4096KB) for Conventional, Hybrid, and CHR]
Different Failed Nodes
Evaluate the recovery time performance for EVENODD when the failed node is in a different column
[Chart: recovery time (in seconds) per MB vs. failed node (Node0 to Node10) for Conventional, Hybrid, and CHR]