25
1 Lempel-Ziv Compression Techniques Introduction to Run-Length Encoding Introduction to Lempel-Ziv Encoding: LZ77 & LZ78 Example 1: Encoding using LZ78 Example 2: Decoding using LZ78

1 Lempel-Ziv Compression Techniques Introduction to Run-Length Encoding Introduction to Lempel-Ziv Encoding: LZ77 & LZ78 Example 1: Encoding using

  • View
    243

  • Download
    2

Embed Size (px)

Citation preview

Page 1: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

1

Lempel-Ziv Compression Techniques

Introduction to Run-Length Encoding

Introduction to Lempel-Ziv Encoding: LZ77 & LZ78

Example 1: Encoding using LZ78

Example 2: Decoding using LZ78

Page 2: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

2

Introduction to Run-Length Encoding

A run is defined as a sequence of identical characters. Run-length encoding assumes data has many runs.

Each character that repeats itself in a sequence three or more times is replaced by the single character and its frequency.

Data to be transmitted is first scanned to find the coding information before transmission.

For example, the messageAAAABBBAABBBBBCCCCCCCCDABCBAAABBBBCCCD

is replaced by4A3BAA5B8CDABCB3A4B3CD

Saving can be dramatic when long runs are present.

Page 3: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

3

Run-Length Encoding: Possible Problems A problem arises if one of the characters being transmitted

is a digit, as in 11111111111544444

which is represented as 1111554 (eleven 1s, one 5, five 4s).

A solution is instead of using a number n, a character can be used whose ASCII value is n.

For example, the run of 43 consecutive letters “c” is represented as +c (“+” has ASSCII code 43), and the run of 49 1s is coded as 11 (“1” has ASCII code 49).

This method is only modestly efficient for text files and widely used for encoding fax images.

The main advantage of this algorithm is simplicity and speed.

Page 4: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

4

Introduction to Lempel-Ziv Encoding

Data compression up until the late 1970's mainly directed towards creating better methodologies for Huffman coding.

An innovative, radically different method was introduced in1977 by Abraham Lempel and Jacob Ziv.

This technique (called Lempel-Ziv) actually consists of two considerably different algorithms, LZ77 and LZ78.

Due to patents, LZ77 and LZ78 led to many variants:

The zip and unzip use the LZH techique while UNIX's compress methods belong to the LZW and LZC classes.

LZ77 Variants

LZRLZSSLZBLZH

LZ78 Variants

LZWLZCLZTLZMWLZJLZFG

Page 5: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

5

LZ78 Compression AlgorithmStart with empty dictionaryP = emptyWHILE (there more characters in the char

stream) C = next character in the char streamIF (the string P+C in the dictionary)

P = P+CELSE //(if P is empty,zero is its codeword)

output the code word corresponding to P output Cadd the string P+C to the dictionary P = empty

IF (P is not empty)output the code word corresponding to P

END

Page 6: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

6

Example 1: LZ78 Compression 1Example 1: Use the LZ78 algorithm to encode the message

ABBCBCABABCAABCAAB

Solution: The encoding process is presented below in which: The column STEP indicates the number of the encoding step. The column POS indicates the current position in the input

data. The column DICTIONARY shows what string has been added

to the dictionary. The index of the string is equal to the step number.

The column OUTPUT presents the output in the form (W,C). W represents the index of prefix in the dictionary.

The output of each step decodes to the string that has been added to the dictionary.

Page 7: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

7

Example 1: LZ78 Compression Step 1

ABBCBCABABCAABCAAB POS = 1C = empty

P = empty

STEP POSOUTPUTDICTIONARY

1 1(0,A)A

Page 8: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

8

Example 1: LZ78 Compression Step 2

ABBCBCABABCAABCAAB POS = 2C = empty

P = empty

STEP POSOUTPUTDICTIONARY

1 1(0,A)A

2 2)0,B(B

Page 9: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

9

Example 1: LZ78 Compression Step 3

ABBCBCABABCAABCAAB POS = 3C = empty

P = empty

STEP POSOUTPUTDICTIONARY

1 1(0,A)A

2 2)0,B(B

3 3)2,C(BC

Page 10: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

10

Example 1: LZ78 Compression Step 4

ABBCBCABABCAABCAAB POS = 5C = empty

P = empty

STEP POSOUTPUTDICTIONARY

1 1(0,A)A

2 2)0,B(B

3 3)2,C(BC

4 5)3,A(BCA

Page 11: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

11

Example 1: LZ78 Compression Step 5

ABBCBCABABCAABCAAB POS = 8C = empty

P = empty

STEP POSOUTPUTDICTIONARY

1 1(0,A)A

2 2)0,B(B

3 3)2,C(BC

4 5)3,A(BCA

5 8)2,A(BA

Page 12: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

12

Example 1: LZ78 Compression Step 6

ABBCBCABABCAABCAAB POS = 10C = AP = BCA

STEP POSOUTPUTDICTIONARY

1 1(0,A)A

2 2)0,B(B

3 3)2,C(BC

4 5)3,A(BCA

5 8)2,A(BA

6 10)4,A(BCAA

Page 13: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

13

Example 1: LZ78 Compression Step 7

ABBCBCABABCAABCAAB POS = 14C = empty

P = empty

STEP POSOUTPUTDICTIONARY

1 1(0,A)A

2 2)0,B(B

3 3)2,C(BC

4 5)3,A(BCA

5 8)2,A(BA

6 10)4,A(BCAA

7 14)6,B(BCAAB

Page 14: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

14

Example 1: Coded Information in Bits Now, we calculate the number of bits needed to represent the

coded information <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>

The number of bits needed to represent each integer with index n is at most equal to the number of bits needed to represent the (n-1)th index.

For example, the number of bits needed to represent the integer 3 within index 4 is equal to 2, because it takes two bits to express 3 (the (4-1)th index) in binary.

Thus, we see that the number of bits required when the string ABBCBCABABCAABCAAB is compressed is

(1+8)+(1+8)+(2+8)+(2+8)+(2+8)+(3+8)+(3+8) = 70 bits

Page 15: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

15

LZ78 Decompression Algorithm

At the start the dictionary is emptyWhile (more code words in the code stream)

W = code wordC = character following itoutput the string of W + Cadd the string of W + C to the dictionary

end

Page 16: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

16

Example 2: LZ78 Decompression

Example 2: Use the above algorithm to decompress the

compressed output sequence of Example 1, namely,

<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>

Page 17: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

17

Example 2: LZ78 Decompression Step 1

<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>

W=empty C=empty

STEPPOSINPUTOUTPUTDICTIONARY

11)0,A(AA

Page 18: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

18

Example 2: LZ78 Decompression Step 2<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>

W=empty C=empty

STEPPOSINPUTOUTPUTDICTIONARY

11)0,A(AA

22)0,B(BB

Page 19: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

19

Example 2: LZ78 Decompression Step 3<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>

W=empty C=empty

STEPPOSINPUTOUTPUTDICTIONARY

11)0,A(AA

22)0,B(BB

33)2,C(BCBC

Page 20: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

20

Example 2: LZ78 Decompression Step 4<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>

W=empty C=empty

STEPPOSINPUTOUTPUTDICTIONARY

11)0,A(AA

22)0,B(BB

33)2,C(BCBC

45)3,A(BCABCA

Page 21: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

21

Example 2: LZ78 Decompression Step 5<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>

W=empty C=empty

STEPPOSINPUTOUTPUTDICTIONARY

11)0,A(AA

22)0,B(BB

33)2,C(BCBC

45)3,A(BCABCA

58)2,A(BABA

Page 22: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

22

Example 2: LZ78 Decompression Step 6<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>

W=empty C=empty

STEPPOSINPUTOUTPUTDICTIONARY

11)0,A(AA

22)0,B(BB

33)2,C(BCBC

45)3,A(BCABCA

58)2,A(BABA

610)4,A(BCAABCAA

Page 23: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

23

Example 2: LZ78 Decompression Step 7<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>

W=empty C=empty

STEPPOSINPUTOUTPUTDICTIONARY

11)0,A(AA

22)0,B(BB

33)2,C(BCBC

45)3,A(BCABCA

58)2,A(BABA

610)4,A(BCAABCAA

714)6,B(BCAABBCAAB

Page 24: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

24

LZ78: Concluding Remarks

LZ77 is based on a sliding window. Its output is very similar to that LZ78.

The compression ratio of LZ77 is similar to that of LZ78.

The biggest advantage of LZ78 over the LZ77 algorithm is the reduced number of string comparisons in each encoding step.

More application areas of LZ77 and LZ78:• LZ77: gzip, Squeeze, LHA, PKZIP, ZOO• LZ78: compress, GIF, CCITT (modems), ARC, PAK

Page 25: 1 Lempel-Ziv Compression Techniques  Introduction to Run-Length Encoding  Introduction to Lempel-Ziv Encoding: LZ77 & LZ78  Example 1: Encoding using

25

Exercises1. How many bits are required to represent the codeword

(2,C) generated on page 11? How many bits are required to represent the associated string without compression?

2. Use LZ78 to trace encoding the string

SATATASACITASA.

3. Write a Java program that encodes a given string using LZ78.

4. Write a Java program that decodes a given set of encoded code words using LZ78.