RunLength Encoding

Page 1: RunLength Encoding

Computer Systems (159.253) ~ 1 ~ Data Communications: © P.Lyons 2004

Data Compression

RLE (RUN LENGTH ENCODING)

Aims to save money and time by reducing the amount of data transmitted
    Instead of sending <char> <char> <char> <char> <char> <char> <char>
    send ESC 7 <char>
    When the data includes ESC, send ESC ESC

There’s an alternative that would send <char> <char> 5
    5 = no. of repetitions
    More elegant; <char> acts as its own esc

Text is usually unsuitable for RLE
    it only contains repeated space chars

Binary files are better
    they often contain repeated chars, especially NUL

RLE is used for encoding faxes
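The ESC scheme on this slide can be sketched in Python as follows. This is a minimal illustration, not the lecturer's code: the choice of `"\x1b"` as the ESC character, the `min_run` threshold and the use of a token list (rather than a byte stream) are all assumptions made for the example.

```python
ESC = "\x1b"  # assumed escape character; the slide only calls it ESC

def rle_encode(data: str, min_run: int = 4) -> list:
    """Replace runs of min_run or more chars by ESC, count, char.
    A literal ESC in the data is sent as ESC ESC."""
    out = []
    i = 0
    while i < len(data):
        ch = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == ch:
            run += 1
        if ch == ESC:
            out.extend([ESC, ESC] * run)      # escape every literal ESC
        elif run >= min_run:
            out.extend([ESC, run, ch])        # ESC <count> <char>
        else:
            out.extend([ch] * run)            # short runs are sent as they are
        i += run
    return out

def rle_decode(tokens: list) -> str:
    """Invert rle_encode. The count is an int token, so it cannot be
    confused with the ESC character (a simplification of a real byte stream)."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == ESC:
            nxt = tokens[i + 1]
            if nxt == ESC:                    # ESC ESC -> one literal ESC
                out.append(ESC)
                i += 2
            else:                             # ESC <count> <char>
                out.append(tokens[i + 2] * nxt)
                i += 3
        else:
            out.append(tok)
            i += 1
    return "".join(out)

print(rle_encode("AAAAAAAB"))
```

Runs shorter than `min_run` are left alone because the three-token form would not save anything for them.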

Page 2: RunLength Encoding

Data Compression

HUFFMAN CODING

D.A. Huffman

ASCII encodes all characters with 7 bits

Characters occur with unequal frequencies
    f(e) = 100 x f(q)

Use fewer bits to encode the most common characters

Makes boundaries between characters hard to find

Page 3: RunLength Encoding

Data Compression

HUFFMAN CODING

Consider an alphabet with only 4 letters

                      2-bit code    Huffman code
    (most common)   A     00            1
                    B     01            01
                    C     10            001
    (least common)  D     11            000

Message: AAAABBBCCD

    2-bit code      00 00 00 00 01 01 01 10 10 11
    Huffman code    1 1 1 1 01 01 01 001 001 000

20 bits → 19 bits
Better compression with larger alphabets!
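The bit counts on this slide can be checked with a short Python sketch. The two code tables are copied from the slide; the function names `encode` and `decode` are hypothetical, chosen only for this illustration. The decoder also shows how the receiver can still find the boundaries between characters, because no Huffman codeword is a prefix of another.

```python
FIXED = {"A": "00", "B": "01", "C": "10", "D": "11"}
HUFFMAN = {"A": "1", "B": "01", "C": "001", "D": "000"}

def encode(message: str, code: dict) -> str:
    """Concatenate the codeword of every character."""
    return "".join(code[ch] for ch in message)

def decode(bits: str, code: dict) -> str:
    """Read bits until they match a codeword, emit it, start again.
    This only works for prefix-free codes such as the two above."""
    inverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

message = "AAAABBBCCD"
fixed_bits = encode(message, FIXED)
huffman_bits = encode(message, HUFFMAN)
print(len(fixed_bits), len(huffman_bits))   # 20 19
```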

Page 4: RunLength Encoding

Data Compression

HUFFMAN CODING

Most efficient to create a code specifically for the data being sent
    Allows for different letter frequencies in different languages
    Both ends must agree on the code set

Fax machines use a modified Huffman scheme
    Codes for sequences containing
        1, 2, 3, ..., 63, 64 black or white dots
        128, 192, ... (i.e. multiples of 64) dots
    So a 67-dot sequence would be sent as the codes for 64, then 3
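The way a run of dots is split into the two kinds of code can be sketched as below. This follows only the description on the slide; the function name `split_run` is hypothetical, and the sketch returns run lengths, not the actual bit patterns, which the slide does not give.

```python
def split_run(dots: int) -> list:
    """Split a run of dots into the run lengths whose codes would be sent:
    a multiple of 64 first (if the run is longer than 64), then the rest."""
    if dots < 1:
        raise ValueError("a run must contain at least one dot")
    if dots <= 64:
        return [dots]                 # one code covers the whole run
    big = (dots // 64) * 64           # largest multiple of 64 that fits
    rest = dots - big
    return [big] if rest == 0 else [big, rest]

print(split_run(67))    # [64, 3]
```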

Page 5: RunLength Encoding

Data Compression

LEMPEL-ZIV COMPRESSION

Abraham Lempel, Jacob Ziv

ZIP and the UNIX compress utility use (modified) L-Z compression

Codes are fixed-length (usually 12 or 16 bits)

Single characters: 7-bit ASCII + nine 0-bits
    Inefficient when sending single characters
    But after a while, very few single characters get sent

Extra codes for the most common character sequences
    Sender creates extra codes based on the letter frequencies in the message
    Receiver constructs the extra codes while decompressing the original message

Page 6: RunLength Encoding

Data Compression

LEMPEL-ZIV COMPRESSION

A code table relates character sequences to codes
    Initially, the code table just contains the alphabet
    Sender and receiver have the same table

    a       1
    b       2
    ...
    z       26
    ...
    th      27
    ...
    the     79
    ...
    then    158

New sequences are added when they are encountered
    sender and receiver both add the same codes
    easy for the sender!

R denotes the Remaining string: the unsent part of the message
    Initially, R is the complete message

L denotes the longest string of characters…
    starting from the first character of R
    occurring in the code table

L’ denotes L + the next character in R

Page 7: RunLength Encoding

Data Compression

LEMPEL-ZIV COMPRESSION

Message: …they then and there theorised that this was thus
    (L, L’ and R are marked on the message)

Sender
    identifies L
    sends the code for L to the receiver
    identifies L’, & makes a new entry in the code table for L’

Receiver
    receives the code
    looks it up in the code table
    adds L to the message string
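The sender's procedure (find L, send its code, add L’ to the table) can be sketched in Python. The initial table follows the later slides (a to z are 1 to 26 and _ is 27); the function names are hypothetical, and the sketch returns code numbers without packing them into 12-bit or 16-bit words.

```python
import string

def initial_table() -> dict:
    """a..z -> 1..26 and '_' -> 27, the starting table used in the example."""
    symbols = string.ascii_lowercase + "_"
    return {s: i for i, s in enumerate(symbols, start=1)}

def lz_encode(message: str):
    """Return (codes sent, final sender code table).
    The message may only use characters from the initial table."""
    table = initial_table()
    next_code = len(table) + 1            # first free code: 28
    codes = []
    r = message                           # R: the unsent part of the message
    while r:
        # L: the longest string at the start of R that is in the code table
        l = r[0]
        while len(l) < len(r) and r[:len(l) + 1] in table:
            l = r[:len(l) + 1]
        codes.append(table[l])            # send the code for L
        if len(l) < len(r):
            table[l + r[len(l)]] = next_code   # new entry for L'
            next_code += 1
        r = r[len(l):]                    # L has now been sent
    return codes, table

codes, table = lz_encode("they_then_and_there_theorised_that_this_was_thus")
print(codes[:7])    # [20, 8, 5, 25, 27, 28, 5]
```

The first single characters each cost a whole code, but from the sixth code onwards the entry for th (28) is already being reused.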

Page 8: RunLength Encoding

Data Compression

LEMPEL-ZIV COMPRESSION

Initial code table (identical at Sender and Receiver)

    a 1    b 2    c 3    d 4    e 5    f 6    g 7    h 8    i 9
    j 10   k 11   l 12   m 13   n 14   o 15   p 16   q 17   r 18
    s 19   t 20   u 21   v 22   w 23   x 24   y 25   z 26   _ 27

Page 9: RunLength Encoding

Data Compression

LEMPEL-ZIV COMPRESSION

    R  = they_then_and_there_theorised_that_this_was_thus
    L  = t      Sender sends code 20
    L’ = th     Sender adds th = 28 to its code table

    Receiver looks up 20 and adds t to the message string

Page 10: RunLength Encoding

Data Compression

LEMPEL-ZIV COMPRESSION

    R  = hey_then_and_there_theorised_that_this_was_thus
    L  = h      Sender sends code 8
    L’ = he     Sender adds he = 29 to its code table

    Receiver looks up 8, adds h to the message string, and adds th = 28 to its code table

Page 11: RunLength Encoding

Data Compression

LEMPEL-ZIV COMPRESSION

    R  = ey_then_and_there_theorised_that_this_was_thus
    L  = e      Sender sends code 5
    L’ = ey     Sender adds ey = 30 to its code table

    Receiver looks up 5, adds e to the message string, and adds he = 29 to its code table

Page 12: RunLength Encoding

Data Compression

LEMPEL-ZIV COMPRESSION

    R  = y_then_and_there_theorised_that_this_was_thus
    L  = y      Sender sends code 25
    L’ = y_     Sender adds y_ = 31 to its code table

    Receiver looks up 25, adds y to the message string, and adds ey = 30 to its code table

Page 13: RunLength Encoding

Data Compression

LEMPEL-ZIV COMPRESSION

    R  = _then_and_there_theorised_that_this_was_thus
    L  = _      Sender sends code 27
    L’ = _t     Sender adds _t = 32 to its code table

    Receiver looks up 27, adds _ to the message string, and adds y_ = 31 to its code table

Page 14: RunLength Encoding

Data Compression

LEMPEL-ZIV COMPRESSION

    R  = then_and_there_theorised_that_this_was_thus
    L  = th     Sender sends code 28
    L’ = the    Sender adds the = 33 to its code table

    Receiver looks up 28, adds th to the message string, and adds _t = 32 to its code table

Page 15: RunLength Encoding

Data Compression

LEMPEL-ZIV COMPRESSION

    R  = en_and_there_theorised_that_this_was_thus
    L  = e      Sender sends code 5
    L’ = en     Sender adds en = 34 to its code table

    Receiver looks up 5, adds e to the message string, and adds the = 33 to its code table

Several further steps ensue…

Page 16: RunLength Encoding

Data Compression

LEMPEL-ZIV COMPRESSION

    Message string at the Receiver so far: they_then_and_there

    R  = _theorised_that_this_was_thus
    L  = _th    Sender sends code 40
    L’ = _the   Sender adds _the = 44 to its code table

    Receiver looks up 40, adds _th to the message string, and adds e_ = 43 to its code table

Entries added to both code tables by this point

    th 28    he 29    ey 30    y_ 31    _t 32    the 33   en 34    n_ 35
    _a 36    an 37    nd 38    d_ 39    _th 40   her 41   re 42    e_ 43

Page 17: RunLength Encoding

Data Compression

LEMPEL-ZIV COMPRESSION

A special case (well, nearly)
    <stringa><stringa><not char1 of stringa>

Entries already in both code tables: sp 73, spi 79, spin 84, spin- 89

    R  = spin-spin-effect
    L  = spin-    Sender sends code 89
    L’ = spin-s   Sender adds spin-s = 92 to its code table

    Receiver looks up 89 and adds spin- to the message string

Page 18: RunLength Encoding

Data Compression

LEMPEL-ZIV COMPRESSION

A special case (well, nearly)
    <stringa><stringa><not char1 of stringa>

    R  = spin-effect
    L  = spin-    Sender sends code 89
    L’ = spin-e   Sender adds spin-e = 93 to its code table

    Receiver looks up 89, adds spin- to the message string, and adds spin-s = 92 to its code table

Page 19: RunLength Encoding

Data Compression

LEMPEL-ZIV COMPRESSION

A special case (well, nearly)
    <stringa><stringa><not char1 of stringa>

    R  = effect
    L  = e      Sender sends code 5
    L’ = ef     Sender adds ef = 94 to its code table

    Receiver looks up 5, adds e to the message string, and adds spin-e = 93 to its code table

Page 20: RunLength Encoding

Data Compression

LEMPEL-ZIV COMPRESSION

A special case (yes, really!)
    <stringa><stringa><char1 of stringa>

Entries already in both code tables: sp 73, spi 79, spin 84, spin- 89

    R  = spin-spin-splitting
    L  = spin-    Sender sends code 89
    L’ = spin-s   Sender adds spin-s = 92 to its code table

    Receiver looks up 89 and adds spin- to the message string

Page 21: RunLength Encoding

Data Compression

LEMPEL-ZIV COMPRESSION

A special case (yes, really!)
    <stringa><stringa><char1 of stringa>

    R  = spin-splitting
    L  = spin-s    Sender sends code 92
    L’ = spin-sp   Sender adds spin-sp = 93 to its code table

    Receiver receives 92 before it has added 92 to its own code table
        92 can only be the previous string (spin-) + the first character of that string (s)
        so the Receiver adds spin-s = 92 to its code table and adds spin-s to the message string
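The receiver's side, including the special case, can be sketched as follows. This is an illustrative Python version, not the lecturer's code: the function names are hypothetical, and the alphabet adds `-` after `_` (an assumption) so that strings such as spin- can be used, which means the code numbers differ from the 73, 79, 84, 89 shown on the slides. A compact encoder is included only so that the round trip can be run.

```python
import string

ALPHABET = string.ascii_lowercase + "_-"   # '-' added as an assumption

def lz_encode(message: str, alphabet: str = ALPHABET) -> list:
    """Sender: send the code for L, then add L' to the table."""
    table = {s: i for i, s in enumerate(alphabet, start=1)}
    codes, r = [], message
    while r:
        l = r[0]
        while len(l) < len(r) and r[:len(l) + 1] in table:
            l = r[:len(l) + 1]
        codes.append(table[l])
        if len(l) < len(r):
            table[l + r[len(l)]] = len(table) + 1
        r = r[len(l):]
    return codes

def lz_decode(codes: list, alphabet: str = ALPHABET) -> str:
    """Receiver: rebuild the message and the same code table."""
    if not codes:
        return ""
    table = {i: s for i, s in enumerate(alphabet, start=1)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table) + 1:
            # Special case: the sender used the entry it had only just created,
            # so it must be the previous string + that string's first character.
            entry = previous + previous[0]
        else:
            raise ValueError(f"unexpected code {code}")
        # The receiver now knows which character followed the previous string,
        # so it can add the entry the sender made one step earlier.
        table[len(table) + 1] = previous + entry[0]
        out.append(entry)
        previous = entry
    return "".join(out)

message = "spin-spin-spin-spin-spin-splitting"
print(lz_decode(lz_encode(message)) == message)
```

The shortest message that triggers the special case with this table is `aaa`: the sender emits the code for a, then immediately the code for aa, which the receiver has not yet stored.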

Page 22: RunLength Encoding

Error Detection and Correction

ERROR DETECTION

Errors are caused by noise
    Impulse (clicks)
    Crosstalk (between lines)
    Thermal (can’t eliminate)

If we can detect errors, we can eliminate them
    Undetected errors can’t be eliminated altogether
    Aim for a detection rate high enough for the application

Sensitivity of applications to errors
    Bank transactions    high
    Video transfer       low

Tanenbaum 3rd edition: 183-190

Page 23: RunLength Encoding

Error Detection and Correction

ERROR DETECTION METHODS

Double sending
    used by data prep operators
    not normally used in data comms

Parity
    Add 1 or 0 after each character
        to make the total no. of 1s even (even parity) or odd (odd parity)

    With even parity:
        1111111 → 11111111
        0111111 → 01111110

    On arrival, the no. of 1 bits in each character should still be even
        Single-bit corruptions make the no. of 1 bits odd
        Two-bit corruptions are undetected
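The parity rule can be tried out with a few lines of Python. The function names `add_parity` and `check_parity` are hypothetical; bits are kept as strings of 0s and 1s purely for readability.

```python
def add_parity(bits: str, even: bool = True) -> str:
    """Append the bit that makes the number of 1s even (or odd)."""
    ones = bits.count("1")
    parity = ones % 2 if even else (ones + 1) % 2
    return bits + str(parity)

def check_parity(bits: str, even: bool = True) -> bool:
    """True if the received character still has the agreed parity."""
    return bits.count("1") % 2 == (0 if even else 1)

def flip(bits: str, index: int) -> str:
    """Corrupt one bit, for experimenting."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]

sent = add_parity("0111111")              # "01111110"
one_error = flip(sent, 2)
two_errors = flip(one_error, 5)
print(sent, check_parity(sent), check_parity(one_error), check_parity(two_errors))
```

The last value printed is `True`: the two-bit corruption passes the check, which is the weakness noted on the slide.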

Page 24: RunLength Encoding

Error Detection and Correction

BLOCK CHECKSUMS

Sender sends the byte sequence & the XOR sum of the byte sequence

Receiver calculates the XOR sum of the complete byte sequence
    detects an error if the calculated XOR sum is non-zero
    does not detect two characters that are reversed
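A small Python sketch of the XOR block checksum, with hypothetical function names, shows both the detection rule and the blind spot for reversed characters.

```python
from functools import reduce

def xor_sum(data: bytes) -> int:
    """XOR of all bytes in the sequence."""
    return reduce(lambda a, b: a ^ b, data, 0)

def make_block(data: bytes) -> bytes:
    """Sender: byte sequence followed by its XOR sum."""
    return data + bytes([xor_sum(data)])

def block_ok(block: bytes) -> bool:
    """Receiver: the XOR sum of data plus checksum must be zero."""
    return xor_sum(block) == 0

block = make_block(b"HELLO")
corrupted = b"HELLP" + block[-1:]     # one character changed
swapped = b"HLELO" + block[-1:]       # two characters reversed
print(block_ok(block), block_ok(corrupted), block_ok(swapped))
```

XOR does not depend on the order of its operands, which is why the swapped block still sums to zero.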

Page 25: RunLength Encoding

Error Detection and Correction

BLOCK PARITY

Uses both longitudinal and horizontal parity
    conventional “horizontal” parity
    “longitudinal” parity

[Figure: a block of characters laid out as a grid of bits, with the horizontal parity bits and the longitudinal parity bits marked]

Note: positional information => block parity can be used for error correction
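The note about positional information can be illustrated with a short Python sketch: a single corrupted bit fails exactly one horizontal check and one longitudinal check, and their intersection gives its position. Even parity, the grid layout and the function names are assumptions for the example; the bit values are made up and are not the ones in the slide's figure.

```python
def add_block_parity(rows: list) -> list:
    """Append an even-parity bit to every row, then an even-parity row."""
    with_row_parity = [row + [sum(row) % 2] for row in rows]
    parity_row = [sum(col) % 2 for col in zip(*with_row_parity)]
    return with_row_parity + [parity_row]

def locate_error(block: list):
    """Return (row, column) of a single wrong bit, or None if all checks pass."""
    bad_rows = [r for r, row in enumerate(block) if sum(row) % 2]
    bad_cols = [c for c, col in enumerate(zip(*block)) if sum(col) % 2]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("more than one bit in error; cannot correct")

data = [
    [1, 0, 1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 1],
]
block = add_block_parity(data)
block[1][3] ^= 1                      # corrupt one bit in transit
position = locate_error(block)
print(position)                       # (1, 3)
row, col = position
block[row][col] ^= 1                  # correct it by flipping it back
```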

Page 26: RunLength Encoding

Error Detection and Correction

CRC (CYCLIC REDUNDANCY CHECK)

The divisor represents a polynomial:
    11001 represents a polynomial of degree D = 4
    1x^4 + 1x^3 + 0x^2 + 0x^1 + 1x^0

Sender
    DATA ÷ divisor = quotient + remainder
    DATA + remainder = DATA’
    sends DATA’

Receiver
    DATA’ ÷ divisor = quotient + 0 remainder

Page 27: RunLength Encoding

Error Detection and Correction

CRC (CYCLIC REDUNDANCY CHECK)

If data = 11100110 and divisor = 11001 (D = 4; x^4 is the highest term)
    add D 0s to the data
    divide, using the rules of modulo-2 division:
        XOR instead of subtraction
        In A/B, B “goes into” A if B’s high-order bit is in the same position as A’s high-order bit

Page 28: RunLength Encoding

Error Detection and Correction

CRC (CYCLIC REDUNDANCY CHECK)

            10110110        quotient
11001 ) 111001100000
        11001
         01011
         00000
          10111
          11001
           11100
           11001
            01010
            00000
             10100
             11001
              11010
              11001
               00110
               00000
                0110        remainder

The CRC-16 polynomial (divisor) is x^16 + x^15 + x^2 + x^0
    11000000000000101
    (the CCITT polynomial is x^16 + x^12 + x^5 + x^0)

CRCs detect
    all single-bit errors
    most double-bit errors
    all error bursts <16 bits
    most error bursts >16 bits
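The long division above can be reproduced with a short Python sketch of modulo-2 division. The data and divisor are the ones from the slides; the function names are hypothetical and bits are handled as strings for readability rather than speed.

```python
def mod2_remainder(bits: str, divisor: str) -> str:
    """Remainder of dividing bits by divisor, using XOR instead of subtraction."""
    work = [int(b) for b in bits]
    div = [int(b) for b in divisor]
    d = len(div) - 1                       # degree D of the polynomial
    for i in range(len(work) - d):
        if work[i]:                        # divisor "goes into" this position
            for j, bit in enumerate(div):
                work[i + j] ^= bit
    return "".join(str(b) for b in work[-d:])

def crc_append(data: str, divisor: str) -> str:
    """Sender: add D zeros, divide, and send the data followed by the remainder."""
    d = len(divisor) - 1
    return data + mod2_remainder(data + "0" * d, divisor)

def crc_ok(received: str, divisor: str) -> bool:
    """Receiver: the remainder of the received bits must be all zeros."""
    return set(mod2_remainder(received, divisor)) == {"0"}

remainder = mod2_remainder("11100110" + "0000", "11001")
sent = crc_append("11100110", "11001")
damaged = sent[:3] + ("1" if sent[3] == "0" else "0") + sent[4:]
print(remainder, sent, crc_ok(sent, "11001"), crc_ok(damaged, "11001"))
```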

Page 29: RunLength Encoding

Error Detection and Correction

ERROR CORRECTION

ARQ (Automatic Retransmission on reQuest)
    Most common in data comms
    If received data contains errors, request retransmission

FEC (Forward Error Control)
    Used where retransmission is undesirable
        Computer memory, or disk
        Simplex transmission from a data logger
        Transmissions from distant spacecraft
    Include extra information with the message so it can be reconstructed

Page 30: RunLength Encoding

Error Detection and Correction

HAMMING CODES

Richard Hamming

Facilitate error detection and correction
Use >1 bit to encode a bit
    0 in data becomes codeword 000
    1 in data becomes codeword 111
    “Hamming Distance” = 3

    Closer to 000 than to 111:  000  100  010  001
    Closer to 111 than to 000:  111  110  101  011

To detect d bit errors, a code’s HD must be d+1
To correct d bit errors, a code’s HD must be 2d+1

With HD = 3, it is possible
    EITHER to detect 2-bit errors
    OR to correct 1-bit errors

Page 31: RunLength Encoding

Error Detection and Correction

HAMMING CODES

To detect d bit errors, a code’s HD must be d+1
To correct d bit errors, a code’s HD must be 2d+1

Codewords
    a 0000011111
    b 0000000000
    c 1111100000
    d 1111111111

Inter-character Hamming Distances

         a    b    c    d
    a    -    5    10   5
    b    5    -    5    10
    c    10   5    -    5
    d    5    10   5    -

HD for some character-pairs is 10
But the minimum inter-character HD is 5, so HD for the whole code is 5
If 1 or 2 bits change, the result is nearer to the original valid codeword

Example: a 0000011111 is received as 0000101111
    distance to a 0000011111 = 2
    distance to b 0000000000 = 5
    distance to c 1111100000 = 8
    distance to d 1111111111 = 5

So, for 2-bit error correction, HD ≥ 5
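The distance table and the nearest-codeword decision can be checked with a few lines of Python. The four codewords are taken from the slide; the function names are hypothetical.

```python
from itertools import combinations

CODEWORDS = {
    "a": "0000011111",
    "b": "0000000000",
    "c": "1111100000",
    "d": "1111111111",
}

def hamming_distance(x: str, y: str) -> int:
    """Number of bit positions in which x and y differ."""
    return sum(p != q for p, q in zip(x, y))

def code_distance(codewords: dict) -> int:
    """HD of the whole code: the minimum distance over all pairs."""
    return min(hamming_distance(x, y)
               for x, y in combinations(codewords.values(), 2))

def nearest(received: str, codewords: dict) -> str:
    """Decode by choosing the character whose codeword is closest."""
    return min(codewords, key=lambda ch: hamming_distance(received, codewords[ch]))

received = "0000101111"                     # a with 2 bits changed
distances = {ch: hamming_distance(received, cw) for ch, cw in CODEWORDS.items()}
print(code_distance(CODEWORDS), distances, nearest(received, CODEWORDS))
```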

Page 32: RunLength Encoding

Error Detection and Correction

HAMMING CODES

Richard Hamming

    Bit position     15 14 13 12 11 10  9  8  7  6  5  4  3  2  1
    Bit value         0  0  1  1  1  0  1  0  1  0  0  1  0  0  0

    Bit position written in binary
    8s digit          1  1  1  1  1  1  1  1  0  0  0  0  0  0  0
    4s digit          1  1  1  1  0  0  0  0  1  1  1  1  0  0  0
    2s digit          1  1  0  0  1  1  0  0  1  1  0  0  1  1  0
    1s digit          1  0  1  0  1  0  1  0  1  0  1  0  1  0  1

Page 33: RunLength Encoding

Error Detection and Correction

HAMMING CODES

Richard Hamming

    Bit position     15 14 13 12 11 10  9  8  7  6  5  4  3  2  1
    Bit value         0  0  1  1  1  0  1  0  1  0  0  1  0  0  0

    Bits whose position has a 1 in the given binary digit
    8s digit (positions 15 14 13 12 11 10 9 8)    0 0 1 1 1 0 1 0
    4s digit (positions 15 14 13 12 7 6 5 4)      0 0 1 1 1 0 0 1
    2s digit (positions 15 14 11 10 7 6 3 2)      0 0 1 0 1 0 0 0
    1s digit (positions 15 13 11 9 7 5 3 1)       0 1 1 1 1 0 0 0

    Each group contains an even number of 1s
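The grouping of bit positions by their binary digits can be turned into a small checking routine. The sketch below assumes the usual Hamming arrangement (check bits at positions 1, 2, 4 and 8, even parity in each group), which the 15-bit word on the slide satisfies; the function names are hypothetical.

```python
def syndrome(bits_15_to_1: str) -> int:
    """XOR together the position numbers of all bits that are 1.
    The string lists the bits from the highest position down to position 1.
    The result is 0 for a valid codeword; after a single-bit error it is
    the position of the wrong bit."""
    s = 0
    for offset, bit in enumerate(bits_15_to_1):
        position = len(bits_15_to_1) - offset
        if bit == "1":
            s ^= position
    return s

def correct(bits_15_to_1: str) -> str:
    """Flip the bit the syndrome points at (assumes at most one error)."""
    s = syndrome(bits_15_to_1)
    if s == 0:
        return bits_15_to_1
    index = len(bits_15_to_1) - s
    flipped = "1" if bits_15_to_1[index] == "0" else "0"
    return bits_15_to_1[:index] + flipped + bits_15_to_1[index + 1:]

codeword = "001110101001000"                # bit values for positions 15 down to 1
damaged = codeword[:9] + "1" + codeword[10:]  # position 6 flipped from 0 to 1
print(syndrome(codeword), syndrome(damaged), correct(damaged) == codeword)
```

Each binary digit of the syndrome is the parity of one of the four groups, so a non-zero syndrome spells out the position of the damaged bit directly.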