Coding Theory and applications to Distributed Computing
Carl Bosley, 10/29/2007
Overview
• Introduction to ECCs
  – History
  – Examples
  – Current Research
• Multicast with Erasure Codes
Noisy Channel

  (Diagram: x → Encode → C(x) → noisy channel → y = C(x) + error → Decode → x)

• Mapping C
  – Error-correcting code (“code”)
  – Encoding: x → C(x)
  – Decoding: y → x
  – C(x) is a codeword
A Noisy Channel
• Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.
Communication
• Internet
  – Checksum used in multiple layers of the TCP/IP stack
• Cell phones
• Satellite broadcast
  – TV
• Deep space telecommunications
  – Mars Rover
“Unusual” applications
• Data Storage
  – CDs and DVDs
  – RAID
  – ECC memory
• Paper bar codes
  – UPS (MaxiCode)
• ISBN

Codes are all around us.
Other applications of codes
• Applications in theory
  – Complexity Theory
    • Derandomization
  – Cryptography
  – Network algorithms
    • Network Coding
The birth of coding theory
• Claude E. Shannon
  – “A Mathematical Theory of Communication”
  – 1948
  – Gave birth to Information Theory
• Richard W. Hamming
  – “Error Detecting and Error Correcting Codes”
  – 1950
The fundamental tradeoff
• Correct as many errors as possible while using as little redundancy as possible
  – Intuitively, contradictory goals
The Binary Symmetric Channel

  (Channel diagram: 0 → 0 and 1 → 1 with probability 1-p; 0 → 1 and 1 → 0 with probability p.)

• Each bit sent is received correctly with probability 1-p, and flipped with probability p. Errors are independent.
• Encoding E: {0,1}^k → {0,1}^n
• Decoding D: {0,1}^n → {0,1}^k
• k = Rn, where R < 1 is called the rate of the code.
Notation
• Hamming Distance:
  – For x, y in Σ^n, d(x,y) = number of coordinates i such that x_i ≠ y_i.
  – wt(x) = d(x,0)
• Entropy:
  – H = -Σ p_i log p_i; for a bit, H(p) = -(p log p + (1-p) log (1-p)).
• Capacity of the Binary Symmetric Channel:
  – C(p) = 1 - H(p).
• There exist codes that achieve rate R arbitrarily close to C(p).
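A minimal Python sketch of these quantities (the function names are my own, for illustration only):

```python
import math

def hamming_distance(x, y):
    """d(x, y): number of coordinates where x and y differ."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

def weight(x):
    """wt(x) = d(x, 0): number of nonzero coordinates."""
    return sum(a != 0 for a in x)

def binary_entropy(p):
    """H(p) = -p log2 p - (1-p) log2(1-p), with H(0) = H(1) = 0."""
    if p in (0, 1):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def bsc_capacity(p):
    """Capacity of the binary symmetric channel, C(p) = 1 - H(p)."""
    return 1.0 - binary_entropy(p)

print(hamming_distance([0, 1, 1, 0], [1, 1, 0, 0]))  # 2
print(round(bsc_capacity(0.11), 3))                   # about 0.5
```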
Some Terminology
• C = (n,k,d)_q code:
  – n = block length
  – k = information length
  – d = distance
  – k/n = (information) rate
  – q = alphabet size
• Often, it is convenient to think of Σ as a finite field of size q
  – (“allows multiplication and addition”)
Basic Questions in Coding Theory
• Find optimal tradeoffs for n, k, d, q
• Usually, q is fixed, and we seek C:
  – Given n, k, q, maximize d
  – Given n, d, q, maximize k
  – Given rate, minimize n
Some main types of Codes
  Code type      Input interpretation   Encoding     Decoding
  Linear         Vector v               {v · x_i}    Matrix operations
  Reed-Solomon   Polynomial f           {f(x_i)}     Interpolation (Berlekamp-Welch)
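As a rough illustration of the Reed-Solomon row (not the presenter's own example), here is a Python sketch that treats the k message symbols as coefficients of a polynomial f over a small prime field and encodes by evaluating f at n distinct points; decoding by interpolation (e.g., Berlekamp-Welch when there are errors) is omitted. The field GF(257) and the helper names are illustrative choices:

```python
P = 257  # a small prime, so the integers mod P form a finite field

def rs_encode(message, n):
    """Encode k message symbols as the evaluations f(1), ..., f(n), where
    f(x) = message[0] + message[1]*x + ... + message[k-1]*x^(k-1) mod P."""
    assert len(message) <= n < P
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(message)) % P
    return [f(x) for x in range(1, n + 1)]

# k = 3 message symbols stretched to n = 6 codeword symbols; any 3 of the
# evaluations determine the degree-2 polynomial f, and hence the message.
print(rs_encode([5, 17, 42], 6))
```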
Trivial Example Codes
• Repetition
  – 0 → 000000, 1 → 111111
  – d = n, k = 1 (corrects up to ⌊(n-1)/2⌋ errors)
• Parity
  – Append a parity bit to the message
  – 000 → 0000, 001 → 0011, …, 111 → 1111
  – n = k+1, d = 2
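A toy Python sketch of these two codes (helper names are mine):

```python
def repetition_encode(bit, n=6):
    """0 -> 000000, 1 -> 111111."""
    return [bit] * n

def repetition_decode(received):
    """Majority vote: corrects up to floor((n-1)/2) flipped bits."""
    return 1 if 2 * sum(received) > len(received) else 0

def parity_encode(bits):
    """Append one parity bit so every codeword has an even number of 1s (d = 2)."""
    return bits + [sum(bits) % 2]

def parity_check(word):
    """Detects (but cannot correct) any single flipped bit."""
    return sum(word) % 2 == 0

print(repetition_decode([1, 1, 0, 1, 0, 1]))  # 1
print(parity_encode([0, 0, 1]))               # [0, 0, 1, 1]
print(parity_check([0, 0, 1, 0]))             # False (one bit was flipped)
```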
Linear Codes
• If Σ is a field, then Σ^n is a vector space
• C can be a linear subspace
  – Called an [n,k,d]_q code
• Short representation
• Efficient encoding
• Efficient error detection
Linear Codes are Nice
• Generator Matrix:
  – k × n matrix G s.t. C = {xG | x ∈ Σ^k}
• Parity Check Matrix:
  – n × (n-k) matrix H s.t. C = {y ∈ Σ^n | yH = 0}
Examples
• Hamming Code
  – [n = (q^t - 1)/(q - 1), k = n - t, d = 3]_q
  – Rows of H: all nonzero vectors of length t
• Hadamard Code
  – Dual of the Hamming code
  – {m · x | x ∈ Σ^k}
  – [n = q^t, k = t, d = q^t - q^(t-1)]_q
Hamming (7,4) code

  (The generator and parity check matrices, a worked encoding example, two
  decoding examples, and the use of the parity check matrix for decoding were
  shown as figures on these slides; a sketch follows below.)
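Since the original figures are not reproduced here, the following is a hedged Python reconstruction of Hamming (7,4) using one standard systematic choice of G and H. The specific matrix layout is an assumption, not necessarily the one on the slides, and H is written as an (n-k) × n matrix, the transpose of the convention used above:

```python
# Hamming (7,4): 4 data bits -> 7-bit codeword, corrects any single bit error.
G = [  # generator matrix (4 x 7), systematic: [data | parity]
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
H = [  # parity check matrix (3 x 7); its columns are all nonzero 3-bit vectors
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

def encode(data):
    """Codeword = data * G (arithmetic mod 2)."""
    return [sum(d * g for d, g in zip(data, col)) % 2 for col in zip(*G)]

def syndrome(word):
    """Syndrome = H * word; it is all-zero iff word is a codeword."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]

def decode(received):
    """Correct a single flipped bit by matching the syndrome to a column of H."""
    s = syndrome(received)
    if any(s):
        cols = list(zip(*H))
        i = cols.index(tuple(s))      # position of the flipped bit
        received = received[:]
        received[i] ^= 1
    return received[:4]               # systematic: first 4 bits are the data

c = encode([1, 0, 1, 1])
c[2] ^= 1                             # flip one bit "in transit"
print(decode(c))                      # [1, 0, 1, 1]
```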
Codes for Multicasting: Introduction
• Everyone thinks of data as an ordered stream: “I need packets 1-1,000.”
• Using codes, data is like water:
  – You don’t care what drops you get.
  – You don’t care if some spills.
  – You just want enough to get through the pipe.
  – “I need 1,000 packets.”
Erasure Codes
  (Diagram: Message (n packets) → Encoding Algorithm → Encoding (cn packets)
  → Transmission → Received (any ≥ n packets) → Decoding Algorithm → Message.)
Application: Trailer Distribution Problem
• Millions of users want to download a new movie trailer.
• 32 megabyte file, at 56 Kbits/second.
• Download takes around 75 minutes at full speed.
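(Rough sanity check on that figure, assuming 32 MB ≈ 256 × 10^6 bits: 256 × 10^6 bits ÷ 56,000 bits/second ≈ 4,600 seconds, i.e. roughly 76 minutes, consistent with the claim above.)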
Point-to-Point Solution Features
• Good
  – Users can initiate the download at their discretion.
  – Users can continue the download seamlessly after a temporary interruption.
  – Moderate packet loss is not a problem.
• Bad
  – High server load.
  – High network load.
  – Doesn’t scale well (without more resources).
Broadcast Solution Features
• Bad
  – Users cannot initiate the download at their discretion.
  – Users cannot continue the download seamlessly after a temporary interruption.
  – Packet loss is a problem.
• Good
  – Low server load.
  – Low network load.
  – Does scale well.
A Coding Solution: Assumptions
• We can take a file of n packets, and encode it into cn encoded packets.
• From any set of n encoded packets, the original message can be decoded.
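A hedged Python sketch of a code with exactly this property, in the spirit of the Reed-Solomon codes mentioned later rather than of Tornado codes; the prime field, packet representation, and function names are my own illustrative choices. The n message symbols define a degree-(n-1) polynomial, each encoded packet carries one evaluation of it, and any n packets determine it by Lagrange interpolation:

```python
P = 2**31 - 1  # a prime; each packet is treated as a symbol mod P

def lagrange_eval(points, t):
    """Evaluate at t the unique polynomial through the given (x, y) points (mod P)."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (t - xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P  # modular inverse (Python 3.8+)
    return total

def encode(message, c=2):
    """n message symbols -> c*n packets; packet x carries f(x), where f is the
    degree-(n-1) polynomial with f(0..n-1) = message, so the first n packets
    are the message itself (a systematic layout)."""
    n = len(message)
    pts = list(enumerate(message))
    return [(x, lagrange_eval(pts, x)) for x in range(c * n)]

def decode(packets, n):
    """Recover the n message symbols from ANY n received packets."""
    pts = packets[:n]
    return [lagrange_eval(pts, t) for t in range(n)]

msg = [11, 22, 33, 44]
pkts = encode(msg, c=2)                           # 8 encoded packets
survivors = [pkts[1], pkts[6], pkts[3], pkts[7]]  # any 4 of the 8 survive
print(decode(survivors, 4))                       # [11, 22, 33, 44]
```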
Coding Solution
  (Diagram: File → Encoding; Encoding Copy 1 and Encoding Copy 2 are
  transmitted; User 1 Reception and User 2 Reception are shown against a
  timeline from 0 hours to 5 hours.)
Coding Solution Features
• Users can initiate the download at their discretion.
• Users can continue the download seamlessly after a temporary interruption.
• Moderate packet loss is not a problem.
• Low server load - simple protocol.
• Does scale well.
• Low network load.
So, Why Aren’t We Using This...
• Encoding and decoding are slow for large files -- especially decoding.
• So we need fast codes for a coding scheme to be practical.
• We may have to give something up for fast codes.
  – Such codes were only recently developed.
Performance Measures
• Time Overhead
  – The time to encode and decode, expressed as a multiple of the encoding length.
• Reception Efficiency
  – Ratio of packets in the message to packets needed to decode. Optimal is 1.
Reception Efficiency
• Optimal
  – Can decode from any n words of the encoding.
  – Reception efficiency is 1.
• Relaxation
  – Decode from any (1+ε)n words of the encoding.
  – Reception efficiency is 1/(1+ε).
Parameters of the Code
  Message length: n
  Encoding length: cn
  Needed to decode: any (1+ε)n packets
  Reception efficiency: 1/(1+ε)
Previous Codes
• Reception efficiency is 1
  – e.g., standard Reed-Solomon
    • Time overhead is the number of redundant packets.
    • Uses finite field operations.
  – Fast Fourier-based
    • Time overhead is ln² n field operations.
• Reception efficiency is 1/(1+ε)
  – Random mixed-length linear equations
    • Time overhead is ln(1/ε)/ε.
Tornado Code Performance
• Reception efficiency is 1/(1+ε).
• Time overhead is ln(1/ε).
• Fast and efficient enough to be practical.
Codes: Other Applications?
• Using codes, data is like water.
  – What more can you do with this idea?
• Example: Parallel downloads: get data from multiple sources, without the need for co-ordination.
Recent Improvements
• Practical problem with Tornado codes: the encoding length
  – Must be decided a priori -- what is right?
  – Encoding/decoding time and memory are proportional to the encoded length.
• Luby transform:
  – Encoding produced “on the fly” -- no fixed encoding length.
  – Encoding/decoding time and memory are proportional to the message length.
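A much-simplified Python sketch of that on-the-fly idea. This is not Luby's actual construction: a real LT code draws the packet degree from the robust soliton distribution, and the decoder (omitted here) repeatedly peels packets of degree one and XORs recovered blocks out of the rest.

```python
import random

def lt_encode_packet(blocks):
    """Produce one encoded packet on the fly: the XOR of a random subset of
    message blocks, tagged with the chosen indices so a receiver can decode."""
    k = len(blocks)
    d = random.randint(1, k)           # illustrative only; LT codes use the
    idx = random.sample(range(k), d)   # robust soliton degree distribution
    value = 0
    for i in idx:
        value ^= blocks[i]
    return idx, value

# The sender can keep producing packets indefinitely: no encoding length is
# fixed in advance, and per-packet work depends only on the message blocks.
blocks = [0x05, 0xA3, 0x17, 0xFF]
for _ in range(3):
    print(lt_encode_packet(blocks))
```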
Coding Solution
  (Diagram: File → Encoding → Transmission; User 1 Reception and User 2
  Reception are shown against a timeline from 0 hours to 5 hours.)
Additional Resources
• Tornado codes
  – Slides: http://www.icsi.berkeley.edu/~luby/PAPERS/tordig.ps
  – Paper: http://www.icsi.berkeley.edu/~luby/PAPERS/losscode.ps
• Network Coding
  – Combination of coding theory and graph theory
    • Goal of coding theory: achieve capacity on a channel
    • Goal of network coding: achieve capacity on a network
  – See the [NC], [EXOR], [COPE] papers available at
    • http://www.news.cs.nyu.edu/~jinyang/fa07/schedule.html
  – More links at
    • http://www.ifp.uiuc.edu/~koetter/NWC/index.html