Contribution to the Design & Implementation of the
Highly Available, Scalable and Distributed Data Structure: LH*RS
Rim Moussa, [email protected]
http://ceria.dauphine.fr/rim/rim.html
Thesis Presentation in Computer Science (Distributed Databases)
Thesis Supervisor: Prof. Witold Litwin
Examiners: Prof. Thomas J. E. Schwarz
Prof. Toré Risch
Jury President: Prof. Gérard Lévy
Paris Dauphine University
CERIA Lab., 4 October 2004
04 Oct. 2004, Thesis Presentation, R. Moussa, U. Paris Dauphine
Outline
1. Issue
2. State of the Art
3. LH*RS Scheme
4. LH*RS Manager
5. Experiments
6. LH*RS File Creation
7. Bucket Recovery
8. Parity Bucket Creation
9. Conclusion & Future Work
Facts …
Volume of information grows by 30% per year
Technology
Network Infrastructure
>> Gilder's Law: bandwidth triples every year.
Evolution of PCs' storage & computing capacities
>> Moore's Law: the latter double every 18 months.
Bottleneck: disk accesses & CPUs
Need for Distributed Data Storage Systems: SDDSs (LH*, RP* …), High Throughput
Facts …
Network
Frequent & Costly Failures
>> Statistics published by Contingency Planning Research in 1996: the cost of one hour of service interruption for a brokerage application is $6.45 million.
Multicomputers
>> Modular Architecture >> Good Price/Performance Tradeoff
Need for Distributed & Highly Available Data Storage Systems
State of the Art
Data Replication
(+) Good Response Time, Mirrors are Functional
(-) High Storage Overhead (n times for n replicas)
Parity Calculus
Criteria to evaluate Erasure-Resilient Codes:
Encoding Rate (Parity Volume / Data Volume)
Update Penalty (Parity Volumes)
Group Size used for Data Reconstruction
Encoding & Decoding Complexity
Recovery Capabilities
Parity Schemes
1-Available Schemes
XOR Parity Calculus: RAID Technology (levels 3, 4, 5 …) [PGK88], SDDS LH*g [L96] …
k-Available Schemes
Binary Linear Codes [H94]: tolerate at most 3 failures
Array Codes: EVENODD [B94], X-code [XB99], RDP [C+04]: tolerate at most 2 failures
Reed-Solomon Codes: IDA [R89], RAID X [W91], FEC [B95], Tutorial [P97], LH*RS [LS00, ML02, MS04, LMS04]: tolerate k failures (k > 3)
Outline…
1. Issue
2. State of the Art
3. LH*RS Scheme
LH*RS?
SDDSs?
Reed Solomon Codes?
Encoding/ Decoding Optimizations
4. LH*RS Manager
5. Experiments
LH*RS ?
LH*: Scalable & Distributed Data Structure
Distribution using Linear Hashing (LH*LH [KLR96], LH*LH Manager [B00])
Scalability & High Throughput
High Availability
Parity Calculus using Reed-Solomon Codes [RS60]
LH*RS [LS00]
SDDSs Principles
(1) Dynamic File Growth
[Diagram: clients send insertions over the network to the data buckets; when a bucket is OVERLOADED, the coordinator tells it "You split", and records are transferred to a new bucket.]
SDDSs Principles (2)
No Centralized Directory Access
[Diagram: each client queries the data buckets through the network using its own file image; a misdirected query is forwarded to the right bucket, which returns an Image Adjustment Message to the client.]
Reed-Solomon Codes
Encoding: from m data symbols, calculus of n - m parity symbols.
Data Representation: Galois Field
A field with finite size q
Closure property: Addition, Subtraction, Multiplication, Division.
In GF(2^w): (1) Addition = XOR; (2) Multiplication via tables gflog and antigflog:
e1 * e2 = antigflog[ gflog[e1] + gflog[e2] ]
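The table-driven multiplication above can be sketched in Python. The primitive polynomial 0x11D and generator 2 are common choices for GF(2^8), assumed here; the deck does not fix them:

```python
# Build log/antilog tables for GF(2^8).
# Assumption: primitive polynomial x^8+x^4+x^3+x^2+1 (0x11D), generator 2.
PRIM = 0x11D
gflog = [0] * 256
antigflog = [0] * 512        # doubled so that index sums need no modulo

x = 1
for i in range(255):
    antigflog[i] = x
    gflog[x] = i
    x <<= 1
    if x & 0x100:            # reduce modulo the primitive polynomial
        x ^= PRIM
for i in range(255, 512):
    antigflog[i] = antigflog[i - 255]

def gf_mul(e1: int, e2: int) -> int:
    """e1 * e2 = antigflog[gflog[e1] + gflog[e2]]; zero is handled apart."""
    if e1 == 0 or e2 == 0:
        return 0
    return antigflog[gflog[e1] + gflog[e2]]
```

Doubling the antilog table avoids a modulo-255 on every product, the same trick the slide's formula relies on.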
RS Encoding
Systematic encoding with the matrix (Im | P), where Im is the m×m identity and P is the m×(n-m) parity matrix of coefficients Ci,j:

[S1 S2 … Sm] · (Im | P) = [S1 … Sm, P1 … Pn-m]

Each parity symbol:
Pj = (S1 ⊗ C1,j) ⊕ (S2 ⊗ C2,j) ⊕ … ⊕ (Sm ⊗ Cm,j)
i.e. m GF multiplications and m - 1 GF XORs.
Properties: (1) systematic encoding, matrix (Im | P); (2) any m columns are linearly independent.
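The per-symbol parity computation can be sketched as follows. The GF(2^8) tables use the common primitive polynomial 0x11D, an assumption; the deck leaves the field parameters to the Galois-field slides:

```python
from functools import reduce

# GF(2^8) log/antilog tables (primitive polynomial 0x11D assumed).
PRIM = 0x11D
gflog, antigflog = [0] * 256, [0] * 512
x = 1
for i in range(255):
    antigflog[i], gflog[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= PRIM
for i in range(255, 512):
    antigflog[i] = antigflog[i - 255]

def gf_mul(e1, e2):
    return 0 if 0 in (e1, e2) else antigflog[gflog[e1] + gflog[e2]]

def parity_symbol(data, column):
    """P_j = (S1*C1,j) XOR ... XOR (Sm*Cm,j): m GF multiplications, m-1 XORs."""
    return reduce(lambda acc, sc: acc ^ gf_mul(*sc), zip(data, column), 0)
```

With an all-'1' column the products degenerate and the parity is a plain XOR of the data symbols, which is exactly the optimization of the first parity bucket discussed later.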
RS Decoding
Take the m surviving ("OK") symbols among S1 … Sm, P1 … Pn-m.
Hm: the matrix of the m columns of the encoding matrix corresponding to those symbols.
Compute H^(-1) by Gauss transformation; multiplying the m OK symbols by H^(-1) yields [S1 S2 S3 S4 … Sm].
Optimized Decoding
Multiply the m OK symbols only by the columns of H^(-1) corresponding to the lost symbols.
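For intuition, a toy decoding in Python: with m = 2 and both data symbols lost, we invert the 2×2 matrix formed by the two surviving parity columns and multiply the surviving symbols by it. This is a sketch of the principle, not the thesis implementation; GF(2^8) with polynomial 0x11D is assumed:

```python
from functools import reduce

# GF(2^8) tables (primitive polynomial 0x11D assumed).
PRIM = 0x11D
gflog, antigflog = [0] * 256, [0] * 512
x = 1
for i in range(255):
    antigflog[i], gflog[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= PRIM
for i in range(255, 512):
    antigflog[i] = antigflog[i - 255]

def gf_mul(a, b):
    return 0 if 0 in (a, b) else antigflog[gflog[a] + gflog[b]]

def gf_inv(a):
    return antigflog[255 - gflog[a]]

def vec_mat(v, M):
    """Row vector times matrix over GF(2^8): result[j] = XOR_i v[i]*M[i][j]."""
    return [reduce(lambda s, i: s ^ gf_mul(v[i], M[i][j]), range(len(v)), 0)
            for j in range(len(M[0]))]

def inv2x2(M):
    """Inverse of a 2x2 matrix over GF(2^8); subtraction is XOR, so no signs."""
    (a, b), (c, d) = M
    di = gf_inv(gf_mul(a, d) ^ gf_mul(b, c))
    return [[gf_mul(di, d), gf_mul(di, b)],
            [gf_mul(di, c), gf_mul(di, a)]]

# Both data symbols lost: recover them from the two parity symbols.
Hm = [[1, 1], [1, 2]]          # the two surviving parity columns (toy values)
data = [17, 200]
parity = vec_mat(data, Hm)     # what the two parity buckets hold
assert vec_mat(parity, inv2x2(Hm)) == data
```

The real scheme inverts an m×m matrix by Gauss transformation; the 2×2 closed form is enough to show why any m surviving symbols suffice.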
Galois Field Optimizations
GF Multiplication: GF(2^16) vs. GF(2^8)
(+) Halves the number of symbols, hence the number of operations in the GF.
GF(2^8): 1 symbol = 1 byte; GF(2^16): 1 symbol = 2 bytes
(-) Multiplication table size:
GF(2^8): 0.768 KB; GF(2^16): 393.216 KB (512 × 0.768)
Parity Matrix Optimizations (2)
1st Column of '1's
Encoding of the 1st PB with plain XOR calculus: gain in encoding & decoding.
1st Row of '1's
Any update from the 1st DB is processed with XOR calculus: performance gain of 4% (PB creation, m = 4).
Parity matrix (hex):
0001 0001 0001 …
0001 eb9b 2284 …
0001 2284 9e74 …
0001 9e44 d7f1 …
Parity Matrix Optimizations (3)
Goal: reduce GF multiplication complexity: e1 * e2 = antigflog[ gflog[e1] + gflog[e2] ]
Encoding: log pre-calculation of the coefficients of the P matrix: improvement of 3.5%.
Pre-logged matrix (hex):
0000 0000 0000 …
0000 5ab5 e267 …
0000 e267 0dce …
0000 784d 2b66 …
Decoding: log pre-calculation of the coefficients of the H^(-1) matrix and of the OK-symbols vector: improvement of 4% to 8%, depending on the number of buckets to recover.
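The pre-calculation can be sketched as: log the column coefficients once per buffer, then each multiplication costs one gflog lookup on the data symbol plus one antilog lookup. Illustrative Python with an assumed GF(2^8) (polynomial 0x11D) and a hypothetical coefficient column:

```python
# GF(2^8) tables (primitive polynomial 0x11D assumed).
PRIM = 0x11D
gflog, antigflog = [0] * 256, [0] * 512
x = 1
for i in range(255):
    antigflog[i], gflog[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= PRIM
for i in range(255, 512):
    antigflog[i] = antigflog[i - 255]

def gf_mul(a, b):            # plain version: two gflog lookups per product
    return 0 if 0 in (a, b) else antigflog[gflog[a] + gflog[b]]

def parity_prelog(data, column_logs):
    """Column coefficients pre-logged once: one gflog lookup saved per
    multiplication (the coefficients are assumed non-zero)."""
    p = 0
    for s, clog in zip(data, column_logs):
        if s:
            p ^= antigflog[gflog[s] + clog]
    return p

column = [0x01, 0x5B, 0xE2, 0x10]          # hypothetical parity-matrix column
column_logs = [gflog[c] for c in column]   # pre-calculated once
```

The same idea applies at decoding time to the H^(-1) coefficients and the OK-symbols vector.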
LH*RS Parity Groups
Grouping concept: m data buckets, k parity buckets per group.
Data bucket record: (Key; Data); each insert is assigned a rank r within its bucket.
Parity bucket record: (Rank; [Key-list]; Parity).
A k-available group survives the failure of any k buckets.
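To make the grouping concrete, a minimal sketch (illustrative names, not the thesis code) of one parity record maintained by pure XOR, as for the first parity bucket; a k-available group keeps k such records per rank, each with its own coefficient column:

```python
class ParityRecord:
    """Parity record for one rank r: the key list of the (up to) m data
    records sharing rank r in the group, plus the XOR of their bodies."""
    def __init__(self, rank, m, record_size):
        self.rank = rank
        self.keys = [None] * m           # key list, one slot per data bucket
        self.parity = bytearray(record_size)

    def xor_delta(self, bucket_index, key, delta):
        # Record a data-record change: store the key and XOR the delta
        # buffer into the parity (inserts and updates propagate as deltas).
        self.keys[bucket_index] = key
        for i, b in enumerate(delta):
            self.parity[i] ^= b

    def recover(self, surviving_bodies):
        """1-availability: the missing body is the XOR of the parity with
        the surviving record bodies of the same rank."""
        out = bytearray(self.parity)
        for body in surviving_bodies:
            for i, b in enumerate(body):
                out[i] ^= b
        return bytes(out)

pr = ParityRecord(rank=0, m=2, record_size=4)
a, b = b"\x01\x02\x03\x04", b"\xff\x00\xff\x00"
pr.xor_delta(0, "k0", a)   # insert record a into data bucket 0 at rank 0
pr.xor_delta(1, "k1", b)   # insert record b into data bucket 1 at rank 0
assert pr.recover([b]) == a    # bucket 0 lost: a = parity XOR b
```

Replacing the plain XOR by a GF multiplication with the bucket's column coefficient gives the general Reed-Solomon parity bucket.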
Outline…
1. Issue
2. State of the Art
3. LH*RS Scheme
4. LH*RS Manager
Communication
Overall Architecture
5. Experiments
6. File Creation
7. Bucket Recovery …
Communication: UDP
Individual Operations (Insert, Update, Delete, Search)
Record Recovery
Control Messages
>> Performance
Communication: TCP/IP
Large Buffer Transfers
New Parity Buckets
Transfer of Parity Updates & Records (Bucket Split)
Bucket Recovery
>> Performance & Reliability
Communication: UDP Multicast
Looking for New Data/Parity Buckets
>> Multipoint Communication
Architecture
Enhancements to the SDDS2000 architecture:
(1) TCP/IP Connection Handler
Before: TCP/IP connections were passive OPEN (RFC 793 [ISI81]; TCP/IP under the Win2K Server OS [MB00]).
(2) Flow Control & Message Acknowledgement (FCMA)
Principle of "sending credit & message conservation until delivery" [J88, GRS97, D01].
Recovery of 1 bucket (3.125 MB): SDDS2000: 6.7 s; SDDS2000-TCP: 2.6 s, an improvement of 60%.
(Hardware config.: 733 MHz CPUs, 100 Mbps network)
Architecture (2)
(3) Dynamic IP Addressing Structure
Before: a pre-defined and static table of IP addresses.
New servers (data or parity) are now tagged using multicast.
[Diagram: the coordinator reaches a multicast group of blank data buckets and a multicast group of blank parity buckets to tag created buckets.]
Architecture (3)
[Diagram: bucket network architecture. Ports: multicast listening, UDP listening, UDP sending, TCP/IP. Threads: a UDP listening thread, a TCP listening thread, a multicast listening thread with its multicast working thread, a pool of working threads serving the message queues, and ACK management threads. The ACK structure tracks messages waiting for acknowledgement, not-yet-acknowledged messages, and free zones.]
Experiments
Performance evaluation: CPU time, communication time.
Experimental environment:
5 machines (Pentium IV 1.8 GHz, 512 MB RAM)
1 Gbps Ethernet network
O.S.: Win2K Server
Tested configuration: 1 client, a group of 4 data buckets, k parity buckets (k = 0, 1, 2, 3).
Outline…
1. Issue
2. State of the Art
3. LH*RS Scheme
4. LH*RS Manager
5. Experiments
6. File Creation
Parity Update
Performance
7. Bucket Recovery
8. Parity Bucket Creation
File Creation: Client Operations
Propagation of data record Inserts/Updates/Deletes to the parity buckets.
Update: send only the Δ-record.
Delete: management of free ranks within data buckets.
Data Bucket Split
N1: #remaining records; N2: #leaving records
Parity group of the splitting data bucket: N1 + N2 deletes + N1 inserts.
Parity group of the new data bucket: N2 inserts.
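The Δ-record update can be sketched as: only the XOR difference old ⊕ new travels to the parity buckets, each of which folds it into its parity. Illustrative Python for the first parity bucket's all-'1' column, where the GF scaling disappears (for the other buckets the delta would be multiplied by the column coefficient first):

```python
def delta_record(old: bytes, new: bytes) -> bytes:
    """The update message: only the XOR difference is sent to parity buckets."""
    return bytes(a ^ b for a, b in zip(old, new))

def apply_delta(parity: bytearray, delta: bytes) -> None:
    """First parity bucket (all-'1' column): new parity = parity XOR delta."""
    for i, b in enumerate(delta):
        parity[i] ^= b

# Toy walk-through: a group holding one data record, which gets updated.
old, new = b"abcd", b"abXd"
parity = bytearray(old)        # parity of a group holding just this record
apply_delta(parity, delta_record(old, new))
assert bytes(parity) == new
```

This is why an update costs one small message per parity bucket rather than a re-encoding of the whole record group.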
Performance: File Creation
Max bucket size = 10,000 records; file of 25,000 records; 1 record = 104 bytes.
No difference between GF(2^8) and GF(2^16): we do not wait for ACKs between DBs and PBs.
Performance: File Creation, Client Window = 1
[Plot: file creation time (sec) vs. inserted keys (0 to 25,000) for k = 0, 1, 2; final times 7.896 s, 9.990 s and 10.963 s respectively.]
k = 0 → k = 1: performance degradation of 20%
k = 1 → k = 2: performance degradation of 8%
Performance: File Creation, Client Window = 5
[Plot: file creation time (sec) vs. number of inserted keys (0 to 25,000) for k = 0, 1, 2; final times 4.349 s, 6.940 s and 7.720 s respectively.]
k = 0 → k = 1: performance degradation of 37%
k = 1 → k = 2: performance degradation of 10%
Outline…
1. Issue
2. State of the Art
3. LH*RS Scheme
4. LH*RS Manager
5. Experiments
6. File Creation
7. Bucket Recovery
Scenario
Performance
8. Parity Bucket Creation
Bucket Recovery: Scenario
Failure Detection
[Diagram: the coordinator asks all data and parity buckets: "Are you alive?"]
Scenario (2): Waiting for Responses
[Diagram: the surviving data and parity buckets reply "OK" to the coordinator.]
Scenario (3): Searching for Spare Buckets
[Diagram: the coordinator asks the multicast group of blank data buckets: "Wanna be a spare?"]
Scenario (4): Waiting for Replies
[Diagram: candidate blank data buckets reply "I would"; each launches UDP listening, TCP listening and working threads, then waits for confirmation; if the time-out elapses, it cancels everything.]
Scenario (5): Spare Selection
[Diagram: the coordinator confirms the selected spares ("You are hired") and sends cancellations to the other candidates.]
Scenario (6): Recovery Manager Selection
[Diagram: the coordinator selects one parity bucket as recovery manager and asks it to recover the failed buckets.]
Scenario (7): Query Phase
[Diagram: the recovery manager asks the data and parity buckets participating in the recovery: "Send me records of rank in [r, r + slice - 1]"; the spare buckets await the result.]
Scenario (8): Reconstruction Phase
[Diagram: the recovery manager decodes the requested buffers and sends the recovered slices to the spare buckets; decoding runs in parallel with the query phase.]
Performance: Bucket Recovery
File info: file of 125,000 records; record size = 100 bytes; bucket size = 31,250 records (3.125 MB); group of 4 data buckets (m = 4), k-available with k = 1, 2, 3.
Decoding: GF(2^16); RS+ decoding (RS + log pre-calculation of H^(-1) and of the OK-symbols vector).
Recovery per slice (adaptive to PCs' storage & computing capacities).
Performance: Recovery of 1 DB (XOR)
Slice | Total Time (s) | CPU Time (s) | Com. Time (s)
1250 | 0.625 | 0.266 | 0.348
3125 | 0.588 | 0.255 | 0.323
6250 | 0.552 | 0.240 | 0.312
15625 | 0.562 | 0.255 | 0.302
31250 | 0.578 | 0.250 | 0.328
Slice varied from 4% to 100% of the bucket content: total time is almost constant, about 0.58 s.
Performance: Recovery of 1 DB (RS)
Slice | Total Time (s) | CPU Time (s) | Com. Time (s)
1250 | 0.734 | 0.349 | 0.365
3125 | 0.688 | 0.359 | 0.323
6250 | 0.656 | 0.354 | 0.297
15625 | 0.667 | 0.360 | 0.297
31250 | 0.688 | 0.360 | 0.328
Slice varied from 4% to 100% of the bucket content: total time is almost constant, about 0.67 s.
Performance: XOR vs. RS (1 DB)
Time to recover 1 DB with XOR: 0.58 s; with RS: 0.67 s.
XOR in GF(2^16) yields a gain of 13% in total time (and 30% in CPU time).
Performance: Recovery of 2 DBs
Slice | Total Time (s) | CPU Time (s) | Com. Time (s)
1250 | 0.976 | 0.577 | 0.375
3125 | 0.932 | 0.589 | 0.338
6250 | 0.883 | 0.562 | 0.321
15625 | 0.875 | 0.562 | 0.281
31250 | 0.875 | 0.562 | 0.313
Slice varied from 4% to 100% of the bucket content: total time is almost constant, about 0.9 s.
Performance: Recovery of 3 DBs
Slice | Total Time (s) | CPU Time (s) | Com. Time (s)
1250 | 1.281 | 0.828 | 0.406
3125 | 1.250 | 0.828 | 0.390
6250 | 1.211 | 0.852 | 0.352
15625 | 1.188 | 0.823 | 0.361
31250 | 1.203 | 0.828 | 0.375
Slice varied from 4% to 100% of the bucket content: total time is almost constant, about 1.23 s.
Performance: Summary
f | Bucket Size (MB) | Total Time (s) | Recovery Speed (MB/s)
1 (XOR) | 3.125 | 0.58 | 5.38
1 (RS) | 3.125 | 0.67 | 4.66
2 | 6.250 | 0.90 | 6.94
3 | 9.375 | 1.23 | 7.62
Time to recover f buckets < f × time to recover 1 bucket: the query phase is factorized; the extra cost is the decoding time and the time to send the recovered buffers.
Performance: GF(2^8)
XOR in GF(2^8) improves decoding performance by 60% compared to RS in GF(2^8).
RS/RS+ decoding in GF(2^16) yields a gain of 50% compared to decoding in GF(2^8).
Outline…
1. Issue
2. State of the Art
3. LH*RS Scheme
4. LH*RS Manager
5. Experiments
6. File Creation
7. Bucket Recovery
8. Parity Bucket Creation
Scenario
Performance
Parity Bucket Creation: Scenario
Searching for a New Parity Bucket
[Diagram: the coordinator asks the multicast group of blank parity buckets: "Wanna join group g?"]
Scenario (2): Waiting for Replies
[Diagram: candidate blank parity buckets reply "I would"; each launches UDP listening, TCP listening and working threads, then waits for confirmation; if the time-out elapses, it cancels everything.]
Scenario (3): New Parity Bucket Selection
[Diagram: the coordinator confirms the selected bucket ("You are hired") and sends cancellations to the other candidates.]
Scenario (4): Auto-creation, Query Phase
[Diagram: the new parity bucket asks the group of data buckets: "Send me your contents!"]
Scenario (5): Auto-creation, Encoding Phase
[Diagram: the new parity bucket processes the buffers requested from the group of data buckets.]
Performance: Parity Bucket Creation
Max bucket size: 5,000 to 50,000 records; bucket load factor: 62.5%; record size: 100 bytes; group of 4 data buckets.
Encoding: GF(2^16); RS++ (log pre-calculation & row of '1's: XOR encoding to process the 1st DB buffer).
Performance: XOR Encoding
Bucket Size | Total Time (s) | CPU Time (s) | Com. Time (s)
5000 | 0.190 | 0.140 | 0.029
10000 | 0.429 | 0.304 | 0.066
25000 | 1.007 | 0.738 | 0.144
50000 | 2.062 | 1.484 | 0.322
Same encoding rate; whatever the bucket size, CPU time is about 74% of total time.
Performance: RS Encoding
Bucket Size | Total Time (s) | CPU Time (s) | Com. Time (s)
5000 | 0.193 | 0.149 | 0.035
10000 | 0.446 | 0.328 | 0.059
25000 | 1.053 | 0.766 | 0.153
50000 | 2.103 | 1.531 | 0.322
Same encoding rate; whatever the bucket size, CPU time is about 74% of total time.
Performance: XOR vs. RS Encoding
For a bucket size of 50,000 records: XOR encoding time: 2.062 s; RS encoding time: 2.103 s.
XOR realizes a performance gain in CPU time of 5% (only 0.02% on total time).
Performance: GF(2^8)
As in GF(2^16), CPU time = 3/4 of total time.
XOR in GF(2^8) improves CPU time by 22%.
Performance Summary (Wintel P4 1.8 GHz, 1 Gbps network)
File creation rate: 0.33 MB/s for k = 0; 0.25 MB/s for k = 1; 0.23 MB/s for k = 2
Record insert time: 0.29 ms for k = 0; 0.33 ms for k = 1; 0.36 ms for k = 2
Bucket recovery rate: 4.66 MB/s from 1-unavailability; 6.94 MB/s from 2-unavailability; 7.62 MB/s from 3-unavailability
Record recovery time: about 1.3 ms
Key search time: individual: 0.24 ms; bulk: 0.056 ms
Conclusion
The experiments prove:
The impact of the optimizations (encoding/decoding, architecture) on performance.
Good recovery performance.
Future Work
Update propagation to parity buckets: reliability vs. performance.
Reduce the coordinator's tasks.
"Parity declustering".
Investigation of new erasure-resilient codes.
References
[PGK88] D. A. Patterson, G. Gibson & R. H. Katz, A Case for Redundant Arrays of Inexpensive Disks (RAID), Proc. of the ACM SIGMOD Conf., pp. 109-116, June 1988.
[ISI81] Information Sciences Institute, RFC 793: Transmission Control Protocol (TCP) Specification, Sept. 1981, http://www.faqs.org/rfcs/rfc793.html
[MB00] D. MacDonald, W. Barkley, MS Windows 2000 TCP/IP Implementation Details, http://secinf.net/info/nt/2000ip/tcpipimp.html
[J88] V. Jacobson, M. J. Karels, Congestion Avoidance and Control, Computer Communication Review, Vol. 18, No. 4, pp. 314-329.
[XB99] L. Xu & J. Bruck, X-Code: MDS Array Codes with Optimal Encoding, IEEE Trans. on Information Theory, 45(1), pp. 272-276, 1999.
[C+04] P. Corbett, B. English, A. Goel, T. Grcanac, S. Kleiman, J. Leong, S. Sankar, Row-Diagonal Parity for Double Disk Failure Correction, Proc. of the 3rd USENIX Conf. on File and Storage Technologies, April 2004.
[R89] M. O. Rabin, Efficient Dispersal of Information for Security, Load Balancing and Fault Tolerance, Journal of the ACM, Vol. 36, No. 2, April 1989, pp. 335-348.
[W91] P. E. White, RAID X tackles design problems with existing design RAID schemes, ECC Technologies, ftp://members.aol.com.mnecctek.ctr1991.pdf
[GRS97] J. C. Gomez, V. Rego, V. S. Sunderam, Efficient Multithreaded User-Space Transport for Network Computing: Design & Test of the TRAP Protocol, Journal of Parallel & Distributed Computing, 40(1), 1997.
References (2)
[B95] J. Blomer, M. Kalfane, R. Karp, M. Karpinski, M. Luby & D. Zuckerman, An XOR-Based Erasure-Resilient Coding Scheme, ICSI Tech. Rep. TR-95-048, 1995.
[LS00] W. Litwin & T. Schwarz, LH*RS: A High-Availability Scalable Distributed Data Structure using Reed Solomon Codes, Proc. of the ACM SIGMOD 2000, pp. 237-248.
[KLR96] J. Karlsson, W. Litwin & T. Risch, LH*LH: A Scalable High Performance Data Structure for Switched Multicomputers, EDBT 96, Springer Verlag.
[RS60] I. Reed & G. Solomon, Polynomial Codes over Certain Finite Fields, Journal of the Society for Industrial and Applied Mathematics, 1960.
[P97] J. S. Plank, A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like Systems, Software: Practice & Experience, 27(9), Sept. 1997, pp. 995-1012.
[D01] A. W. Diène, Contribution à la Gestion de Structures de Données Distribuées et Scalables, PhD Thesis, Nov. 2001, Université Paris Dauphine.
[B00] F. Sahli Bennour, Contribution à la Gestion de Structures de Données Distribuées et Scalables, PhD Thesis, June 2000, Université Paris Dauphine.
+ References: http://ceria.dauphine.fr/rim/theserim.pdf
Publications
[ML02] R. Moussa, W. Litwin, Experimental Performance Analysis of LH*RS Parity Management, Carleton Scientific Records of the 4th International Workshop on Distributed Data & Structures: WDAS 2002, pp. 87-97.
[MS04] R. Moussa, T. Schwarz, Design and Implementation of LH*RS: A Highly-Available Scalable Distributed Data Structure, Carleton Scientific Records of the 6th International Workshop on Distributed Data & Structures: WDAS 2004.
[LMS04] W. Litwin, R. Moussa, T. Schwarz, Prototype Demonstration of LH*RS: A Highly Available Distributed Storage System, Proc. of VLDB 2004 (Demo Session), pp. 1289-1292.
[LMS04-a] W. Litwin, R. Moussa, T. Schwarz, LH*RS: A Highly Available Distributed Storage System, journal version submitted, under revision.