
Page 1: Coding Theory and applications to Distributed Computing

Coding Theory and applications to Distributed Computing

Carl Bosley, 10/29/2007

Page 2: Coding Theory and applications to Distributed Computing

Overview

• Introduction to ECCs
  – History
  – Examples
  – Current Research

• Multicast with Erasure Codes

Page 3: Coding Theory and applications to Distributed Computing

Noisy Channel

x → C(x) → noisy channel → y = C(x) + error → x (decoded)

• Mapping C
  – Error-correcting code (“code”)
  – Encoding: x → C(x)
  – Decoding: y → x
  – C(x) is a codeword

Page 4: Coding Theory and applications to Distributed Computing

A Noisy Channel

• Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaerin waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the fristand lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raedervey lteter by istlef, but the wrod as a wlohe.

Page 5: Coding Theory and applications to Distributed Computing

Communication

• Internet
  – Checksums used in multiple layers of the TCP/IP stack
• Cell phones
• Satellite broadcast
  – TV
• Deep space telecommunications
  – Mars Rover

Page 6: Coding Theory and applications to Distributed Computing

“Unusual” applications

• Data Storage
  – CDs and DVDs
  – RAID
  – ECC memory
• Paper bar codes
  – UPS (MaxiCode)
• ISBN

Codes are all around us.

Page 7: Coding Theory and applications to Distributed Computing

Other applications of codes

• Applications in theory
  – Complexity Theory
    • Derandomization
  – Cryptography
  – Network algorithms
    • Network Coding

Page 8: Coding Theory and applications to Distributed Computing

The birth of coding theory

• Claude E. Shannon
  – “A Mathematical Theory of Communication”
  – 1948
  – Gave birth to information theory
• Richard W. Hamming
  – “Error Detecting and Error Correcting Codes”
  – 1950

Page 9: Coding Theory and applications to Distributed Computing

The fundamental tradeoff

• Correct as many errors as possible while using as little redundancy as possible
  – Intuitively, these are contradictory goals

Page 10: Coding Theory and applications to Distributed Computing

The Binary Symmetric Channel

  0 → 0 and 1 → 1 with probability 1−p
  0 → 1 and 1 → 0 with probability p

E: {0,1}^k → {0,1}^n
D: {0,1}^n → {0,1}^k
k = Rn

R < 1 is called the rate of the code. Each bit sent is received correctly with probability 1−p and flipped with probability p. Errors are independent.
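
To make the channel model concrete, here is a minimal simulation sketch (not from the slides; the function name bsc and the example values are illustrative):

    import random

    def bsc(bits, p):
        """Pass a list of 0/1 bits through a binary symmetric channel:
        each bit is flipped independently with crossover probability p."""
        return [b ^ (random.random() < p) for b in bits]

    # Example: send 10 bits through a channel with crossover probability 0.1.
    sent = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
    received = bsc(sent, 0.1)
    print(received, "errors:", sum(s != r for s, r in zip(sent, received)))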

Page 11: Coding Theory and applications to Distributed Computing

Notation

• Hamming Distance:
  – For x, y in Σ^n, d(x, y) = number of coordinates i such that x_i ≠ y_i
  – wt(x) = d(x, 0)
• Entropy:
  – H(p) = −Σ p_i log p_i; for a binary source, H(p) = −(p log p + (1−p) log(1−p))
• Capacity of the Binary Symmetric Channel:
  – C(p) = 1 − H(p)
• There exist codes that achieve rate R arbitrarily close to C(p)
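
A small sketch of these quantities (names are illustrative; logarithms are base 2, matching the capacity formula):

    from math import log2

    def hamming_distance(x, y):
        """Number of coordinates in which x and y differ."""
        assert len(x) == len(y)
        return sum(a != b for a, b in zip(x, y))

    def binary_entropy(p):
        """H(p) = -(p log2 p + (1-p) log2(1-p)), with H(0) = H(1) = 0."""
        if p in (0, 1):
            return 0.0
        return -(p * log2(p) + (1 - p) * log2(1 - p))

    def bsc_capacity(p):
        """Capacity of the binary symmetric channel: C(p) = 1 - H(p)."""
        return 1 - binary_entropy(p)

    print(hamming_distance("10110", "10011"))  # 2
    print(bsc_capacity(0.1))                   # about 0.531: rates up to ~0.53 are achievable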

Page 12: Coding Theory and applications to Distributed Computing

Some Terminology

• C = (n, k, d)_q code:
  – n = block length
  – k = information length
  – d = distance
  – k/n = (information) rate
  – q = alphabet size
• Often, it is convenient to think of Σ as a finite field of size q
  – (“allows multiplication and addition”)

Page 13: Coding Theory and applications to Distributed Computing

Basic Questions in Coding Theory

• Find optimal tradeoffs for n, k, d, q
• Usually, q is fixed, and we seek C:
  – Given n, k, q, maximize d
  – Given n, d, q, maximize k
  – Given rate, minimize n

Page 14: Coding Theory and applications to Distributed Computing

Some main types of Codes

Code type      | Input interpretation | Encoding   | Decoding
Reed-Solomon   | Polynomial f         | {f(x_i)}   | Interpolation (Berlekamp-Welch)
Linear         | Vector v             | {v · x_i}  | Matrix operations

Page 15: Coding Theory and applications to Distributed Computing

Trivial Example Codes

• Repetition
  – 0 → 000000, 1 → 111111
  – d = n, k = 1 (corrects up to ⌊(n−1)/2⌋ errors)

• Parity
  – Append a parity bit to the message
  – 000 → 0000, 001 → 0011, …, 111 → 1111
  – n = k + 1, d = 2 (detects a single error)
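
A minimal sketch of these two trivial codes (function names are illustrative, not from the slides):

    def repetition_encode(bit, n=6):
        """Repeat a single bit n times: k = 1, d = n."""
        return [bit] * n

    def repetition_decode(received):
        """Majority vote; corrects up to (n-1)//2 bit flips."""
        return int(2 * sum(received) > len(received))

    def parity_encode(bits):
        """Append one parity bit: n = k + 1, d = 2."""
        return bits + [sum(bits) % 2]

    def parity_check(received):
        """True if parity is consistent; a single flipped bit makes this False."""
        return sum(received) % 2 == 0

    print(repetition_decode([1, 1, 0, 1, 0, 1]))  # 1 (two flips tolerated)
    print(parity_encode([0, 0, 1]))               # [0, 0, 1, 1]
    print(parity_check([0, 0, 1, 0]))             # False: error detected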

Page 16: Coding Theory and applications to Distributed Computing

Linear Codes

• If Σ is a field, then Σ^n is a vector space
• C can be a linear subspace
  – Called an [n, k, d]_q code
    • Short representation
    • Efficient encoding
    • Efficient error detection

Page 17: Coding Theory and applications to Distributed Computing

Linear Codes are Nice

• Generator matrix:
  – k × n matrix G such that C = {xG | x ∈ Σ^k}
• Parity check matrix:
  – n × (n−k) matrix H such that C = {y ∈ Σ^n | yH = 0}

Page 18: Coding Theory and applications to Distributed Computing

Examples

• Hamming Code
  – [n = (q^t − 1)/(q − 1), k = n − t, d = 3]_q
  – Rows of H: all nonzero vectors of length t
• Hadamard Code
  – Dual of the Hamming code
  – {m · x | x ∈ Σ^k}
  – [n = q^t, k = t, d = q^t − q^(t−1)]_q

Page 19: Coding Theory and applications to Distributed Computing

Hamming (7,4) code

Page 20: Coding Theory and applications to Distributed Computing

Hamming (7,4) code

Page 21: Coding Theory and applications to Distributed Computing

Encoding

Page 22: Coding Theory and applications to Distributed Computing

Decoding Example 1

Page 23: Coding Theory and applications to Distributed Computing

Decoding Example 2

Page 24: Coding Theory and applications to Distributed Computing

Using the Parity Check Matrix
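
The slides’ figures for the (7,4) Hamming code are not reproduced here. As a stand-in, the sketch below uses one standard systematic choice of generator and parity check matrices (an assumption, not necessarily the matrices shown on the slides) and walks through encoding and syndrome decoding:

    # Systematic Hamming (7,4) code over GF(2): codeword = (d1 d2 d3 d4 p1 p2 p3).
    # G is 4 x 7; H is written here as 3 x 7, the transpose of the n x (n-k)
    # parity check matrix convention used on the earlier slide (so H c^T = 0).
    G = [
        [1, 0, 0, 0, 1, 1, 0],
        [0, 1, 0, 0, 1, 0, 1],
        [0, 0, 1, 0, 0, 1, 1],
        [0, 0, 0, 1, 1, 1, 1],
    ]
    H = [
        [1, 1, 0, 1, 1, 0, 0],
        [1, 0, 1, 1, 0, 1, 0],
        [0, 1, 1, 1, 0, 0, 1],
    ]

    def encode(data):
        """Encode 4 data bits into a 7-bit codeword: c = data * G (mod 2)."""
        return [sum(d * g for d, g in zip(data, col)) % 2 for col in zip(*G)]

    def decode(received):
        """Syndrome decoding: corrects any single bit flip, then strips parity."""
        syndrome = [sum(h * r for h, r in zip(row, received)) % 2 for row in H]
        if any(syndrome):
            # A nonzero syndrome equals the column of H at the error position.
            for i in range(7):
                if [row[i] for row in H] == syndrome:
                    received = received[:]
                    received[i] ^= 1
                    break
        return received[:4]  # systematic: the data bits come first

    codeword = encode([1, 0, 1, 1])   # -> [1, 0, 1, 1, 0, 1, 0]
    corrupted = codeword[:]
    corrupted[2] ^= 1                 # flip one bit in transit
    print(decode(corrupted))          # [1, 0, 1, 1]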

Page 25: Coding Theory and applications to Distributed Computing

Codes for Multicasting: Introduction

• Everyone thinks of data as an ordered stream: “I need packets 1–1,000.”

• Using codes, data is like water:
  – You don’t care what drops you get.
  – You don’t care if some spills.
  – You just want enough to get through the pipe.
  – “I need 1,000 packets.”

Page 26: Coding Theory and applications to Distributed Computing

Erasure Codes

Message (n packets) → Encoding Algorithm → Encoding (cn packets) → Transmission → Received (any ≥ n packets) → Decoding Algorithm → Message (n packets)

Page 27: Coding Theory and applications to Distributed Computing

Application: Trailer Distribution Problem

• Millions of users want to download a new movie trailer.

• 32 megabyte file, at 56 Kbits/second.
• Download takes around 75 minutes at full speed.

Page 28: Coding Theory and applications to Distributed Computing

Point-to-Point Solution Features

• Good
  – Users can initiate the download at their discretion.
  – Users can continue the download seamlessly after temporary interruption.
  – Moderate packet loss is not a problem.
• Bad
  – High server load.
  – High network load.
  – Doesn’t scale well (without more resources).

Page 29: Coding Theory and applications to Distributed Computing

Broadcast Solution Features

• Bad
  – Users cannot initiate the download at their discretion.
  – Users cannot continue the download seamlessly after temporary interruption.
  – Packet loss is a problem.
• Good
  – Low server load.
  – Low network load.
  – Does scale well.

Page 30: Coding Theory and applications to Distributed Computing

A Coding Solution: Assumptions

• We can take a file of n packets, and encode it into cn encoded packets.

• From any set of n encoded packets, the original message can be decoded.
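
One classical way to realize these assumptions (Reed-Solomon style interpolation, not the Tornado/Luby codes discussed later, and with single numbers standing in for packets) is to view the message as values of a polynomial over a prime field and transmit extra evaluations. The names and the choice of prime below are illustrative:

    P = 2**31 - 1  # a prime; all packet symbols are integers mod P

    def interpolate(points, t):
        """Evaluate at t the unique polynomial through the given (x, y) points, mod P."""
        total = 0
        for j, (xj, yj) in enumerate(points):
            num, den = 1, 1
            for k, (xk, _) in enumerate(points):
                if k != j:
                    num = num * (t - xk) % P
                    den = den * (xj - xk) % P
            total = (total + yj * num * pow(den, P - 2, P)) % P  # P prime -> inverse
        return total

    def erasure_encode(message, c=2):
        """Treat message[i] as the value f(i) of a polynomial f of degree < n;
        emit c*n packets (x, f(x)) for x = 0, 1, ..., c*n - 1."""
        n = len(message)
        base = list(enumerate(message))                        # n systematic packets
        extra = [(x, interpolate(base, x)) for x in range(n, c * n)]
        return base + extra

    def erasure_decode(packets, n):
        """Any n distinct packets determine f, hence the original message."""
        pts = packets[:n]
        return [interpolate(pts, i) for i in range(n)]

    msg = [10, 20, 30, 40]
    pkts = erasure_encode(msg, c=2)                   # 8 encoded packets
    survivors = [pkts[1], pkts[4], pkts[6], pkts[7]]  # any 4 of the 8 suffice
    print(erasure_decode(survivors, 4))               # [10, 20, 30, 40]

The cost of this kind of interpolation-based decoding is exactly what motivates the faster codes discussed later in the talk.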

Page 31: Coding Theory and applications to Distributed Computing

Coding Solution

[Figure: timeline (0–5 hours) showing File → Encoding → Transmission as Encoding Copy 1 and Encoding Copy 2, with User 1 Reception and User 2 Reception windows.]

Page 32: Coding Theory and applications to Distributed Computing

Coding Solution Features

• Users can initiate the download at their discretion.
• Users can continue the download seamlessly after temporary interruption.
• Moderate packet loss is not a problem.
• Low server load - simple protocol.
• Does scale well.
• Low network load.

Page 33: Coding Theory and applications to Distributed Computing

So, Why Aren’t We Using This...

• Encoding and decoding are slow for large files, especially decoding.
• So we need fast codes to make a coding scheme practical.
• We may have to give something up for fast codes.
  – Such codes were only recently developed.

Page 34: Coding Theory and applications to Distributed Computing

Performance Measures

• Time overhead
  – The time to encode and decode, expressed as a multiple of the encoding length.
• Reception efficiency
  – Ratio of packets in the message to packets needed to decode. Optimal is 1.

Page 35: Coding Theory and applications to Distributed Computing

Reception Efficiency

• Optimal
  – Can decode from any n words of the encoding.
  – Reception efficiency is 1.
• Relaxation
  – Decode from any (1+ε)n words of the encoding.
  – Reception efficiency is 1/(1+ε).

Page 36: Coding Theory and applications to Distributed Computing

Parameters of the Code

Message length: n
Encoding length: cn
Packets needed to decode: (1+ε)n
Reception efficiency: 1/(1+ε)

Page 37: Coding Theory and applications to Distributed Computing

Previous Codes

• Reception efficiency is 1
  – e.g. standard Reed-Solomon
    • Time overhead is the number of redundant packets.
    • Uses finite field operations.
  – Fast Fourier-based
    • Time overhead is ln² n field operations.
• Reception efficiency is 1/(1+ε)
  – Random mixed-length linear equations
    • Time overhead is ln(1/ε)/ε.

Page 38: Coding Theory and applications to Distributed Computing

Tornado Code Performance

• Reception efficiency is 1/(1+ε).

• Time overhead is ln(1/ε).

• Fast and efficient enough to be practical.

Page 39: Coding Theory and applications to Distributed Computing

Codes: Other Applications?

• Using codes, data is like water.
  – What more can you do with this idea?
• Example: parallel downloads. Get data from multiple sources, without the need for coordination.

Page 40: Coding Theory and applications to Distributed Computing

Recent Improvements

• Practical problem with the Tornado code: encoding length
  – Must be decided a priori; what is the right length?
  – Encoding/decoding time and memory are proportional to the encoded length.
• Luby transform (a toy sketch follows this list):
  – Encoding produced “on-the-fly”; no fixed encoding length.
  – Encoding/decoding time and memory are proportional to the message length.
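
To make the on-the-fly idea concrete, here is a toy fountain-code sketch in the spirit of the Luby transform: each encoded packet is the XOR of a random subset of message blocks, tagged with the indices used, and a peeling decoder recovers the message. The degree distribution, names, and parameters below are stand-ins (a real LT code samples degrees from the robust soliton distribution), not the actual construction from the paper:

    import random

    def xor_all(values):
        out = 0
        for v in values:
            out ^= v
        return out

    def lt_encode_packet(message):
        """One encoded packet: XOR of a random subset of message blocks,
        tagged with the set of indices that were combined."""
        n = len(message)
        degree = random.choice([1, 1, 2, 2, 2, 3, 4])  # toy degree distribution
        idxs = set(random.sample(range(n), min(degree, n)))
        return idxs, xor_all(message[i] for i in idxs)

    def lt_decode(packets, n):
        """Peeling decoder: repeatedly find a packet covering exactly one
        unknown block, solve that block, and substitute it into the others."""
        known = {}
        progress = True
        while len(known) < n and progress:
            progress = False
            for idxs, val in packets:
                unknown = idxs - set(known)
                if len(unknown) == 1:
                    i = unknown.pop()
                    known[i] = val ^ xor_all(known[j] for j in idxs if j != i)
                    progress = True
        return [known[i] for i in range(n)] if len(known) == n else None

    message = [0x12, 0x34, 0x56, 0x78]                       # 4 message blocks
    stream = [lt_encode_packet(message) for _ in range(12)]  # produced on the fly
    print(lt_decode(stream, 4))  # usually [18, 52, 86, 120]; None if the random packets were unlucky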

Page 41: Coding Theory and applications to Distributed Computing

Coding Solution

[Figure: File → Encoding → Transmission over a 0–5 hour timeline, with User 1 Reception and User 2 Reception windows.]

Page 42: Coding Theory and applications to Distributed Computing

Additional Resources

• Tornado codes
  – Slides: http://www.icsi.berkeley.edu/~luby/PAPERS/tordig.ps
  – Paper: http://www.icsi.berkeley.edu/~luby/PAPERS/losscode.ps
• Network Coding
  – Combination of coding theory and graph theory
    • Goal of coding theory: achieve capacity on a channel
    • Goal of network coding: achieve capacity on a network
  – See the [NC], [EXOR], [COPE] papers available at
    • http://www.news.cs.nyu.edu/~jinyang/fa07/schedule.html
  – More links at
    • http://www.ifp.uiuc.edu/~koetter/NWC/index.html