Chapter 5 New

8/11/2019 Chapter 5 New


CSC 311

For many of the applications and uses we make of modern computers, data compression is absolutely essential:

Fax
MP3
Video
TV
etc.



For example, a typical fax uses 40,000 dots per square inch; sent uncompressed over a 56K modem, it would require more than one minute per page.

A typical 2-hour movie would require 1.04 * 10^12 bits, far beyond the capacity of any DVD, yet you can put two 2-hour movies on a DVD.

This is made possible by the use of data compression.
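The fax figure can be checked with a little arithmetic. The sketch below assumes a letter-size page of 8.5 x 11 inches, one bit per dot, and a modem that sustains exactly 56,000 bits per second; none of these values is stated on the slide.

```python
# Assumptions (not from the slides): 8.5 x 11 inch page, 1 bit per dot,
# and a modem that sustains exactly 56,000 bits per second.
DOTS_PER_SQUARE_INCH = 40_000
PAGE_AREA_SQUARE_INCHES = 8.5 * 11
MODEM_BITS_PER_SECOND = 56_000

# Uncompressed size of one page, and the time needed to send it.
page_bits = DOTS_PER_SQUARE_INCH * PAGE_AREA_SQUARE_INCHES
seconds_per_page = page_bits / MODEM_BITS_PER_SECOND

print(f"{page_bits:,.0f} bits per page")
print(f"{seconds_per_page:.1f} seconds per page")
```

Under these assumptions an uncompressed page is 3,740,000 bits and takes about 67 seconds, consistent with "more than one minute per page".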



There are fundamentally two types of data compression:

Lossless

Lossy



Lossless:

Lossless compression techniques allow the receiver to precisely reconstruct the original data being transmitted.

Lossy:

Lossy compression techniques allow the receiver to approximately reconstruct the original data.



Frequency Dependent Codes:

We first want to examine two compression techniques that rely on the frequency of occurrence of the various symbols in constructing a compression algorithm:

Huffman codes
Arithmetic compression



Huffman Codes:

Huffman codes rely on the frequency of use of the various symbols to produce codes of varying length to represent the symbols: the more frequent a symbol, the shorter its code.

Huffman codes have the property that no valid Huffman code for any symbol is a prefix of the code for any other symbol.

This is sometimes called the no-prefix property.
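The no-prefix property is what lets a receiver decode a bit stream without separators between code words. The sketch below is illustrative only: it assumes a small five-symbol code table (A=01, B=110, C=111, D=10, E=00), and the function names `is_prefix_free` and `decode` are made up for this example.

```python
# Five-symbol code table assumed for illustration.
CODES = {"A": "01", "B": "110", "C": "111", "D": "10", "E": "00"}

def is_prefix_free(codes):
    """Return True if no code word is a prefix of a different code word."""
    words = list(codes.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

def decode(bits, codes):
    """Read bits left to right and emit a symbol as soon as a code word matches."""
    by_word = {word: symbol for symbol, word in codes.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in by_word:      # unambiguous because the code is prefix-free
            symbols.append(by_word[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code word")
    return "".join(symbols)

encoded = "".join(CODES[s] for s in "DECADE")
decoded = decode(encoded, CODES)
print(encoded, decoded)
```

Because no code word begins another, the decoder never has to look ahead or back up: the first match is always the right one.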



Example:

Letter  Frequency  Huffman Code
A       25         01
B       15         110
C       10         111
D       20         10
E       30         00

Note: Huffman codes are not unique, but a properly formed Huffman code will always be optimal among prefix codes that encode one symbol at a time.
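A minimal sketch of how such a code can be built, using the standard Huffman procedure of repeatedly merging the two least frequent subtrees. The function name `huffman_codes` and the tie-breaking counter are choices made for this example. Because Huffman codes are not unique, the bit patterns it produces may differ from the table, but the code lengths and the average length should agree.

```python
import heapq
from itertools import count

def huffman_codes(frequencies):
    """Build a Huffman code by repeatedly merging the two lightest subtrees."""
    tiebreak = count()  # unique counter so equal weights never compare the dicts
    heap = [(weight, next(tiebreak), {symbol: ""})
            for symbol, weight in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, codes0 = heapq.heappop(heap)
        w1, _, codes1 = heapq.heappop(heap)
        # Prefix one subtree's codes with 0 and the other's with 1.
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]

FREQUENCIES = {"A": 25, "B": 15, "C": 10, "D": 20, "E": 30}
codes = huffman_codes(FREQUENCIES)

# Frequencies are percentages, so dividing by 100 gives bits per symbol.
average_bits = sum(FREQUENCIES[s] * len(codes[s]) for s in FREQUENCIES) / 100
print(codes, average_bits)
```

For these frequencies the average works out to 2.25 bits per symbol, compared with 3 bits for a fixed-length code over five symbols.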



Arithmetic Compression:

Another frequency-dependent compression technique, based on representing a character string as a single real number.

Assigning ranges based on frequency:

Letter  Frequency %  Subinterval [p,q]
A       25           [0, 0.25]
B       15           [0.25, 0.40]
C       10           [0.40, 0.50]
D       20           [0.50, 0.70]
E       30           [0.70, 1.0]



How does it work?

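We calculate the new interval from the old interval and the subinterval [p,q] of the current symbol.

In this case, the interval would change from [0.3, 0.9] to [0.45, 0.60]. This is what a symbol with subinterval [0.25, 0.5] produces: 0.3 + 0.6*0.25 = 0.45 and 0.3 + 0.6*0.5 = 0.60.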



Math shown in next slide



Step  String  Next char  Current interval [x,y]  [p,q]         Width w = y-x  New x = x + w*p                       New y = x + w*q
1     -       C          [0, 1]                  [0.4, 0.5]    1.0            0 + 1*0.4 = 0.4                       0 + 1*0.5 = 0.5
2     C       A          [0.4, 0.5]              [0, 0.25]     0.1            0.4 + 0.1*0 = 0.4                     0.4 + 0.1*0.25 = 0.425
3     CA      B          [0.4, 0.425]            [0.25, 0.40]  0.025          0.4 + 0.025*0.25 = 0.40625            0.4 + 0.025*0.4 = 0.41
4     CAB     A          [0.40625, 0.41]         [0, 0.25]     0.00375        0.40625 + 0.00375*0 = 0.40625         0.40625 + 0.00375*0.25 = 0.4071875
5     CABA    C          [0.40625, 0.4071875]    [0.4, 0.5]    0.0009375      0.40625 + 0.0009375*0.4 = 0.406625    0.40625 + 0.0009375*0.5 = 0.40671875

We could choose any number in the interval [0.406625, 0.40671875] to represent the string CABAC.
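The interval narrowing in the table can be sketched in a few lines. This is a minimal illustration rather than a production coder: the names `RANGES` and `arithmetic_interval` are made up for this example, and `fractions.Fraction` is used so that the arithmetic is exact.

```python
from fractions import Fraction

# Subintervals [p, q] from the frequency table, held as exact fractions.
RANGES = {
    "A": (Fraction(0), Fraction(25, 100)),
    "B": (Fraction(25, 100), Fraction(40, 100)),
    "C": (Fraction(40, 100), Fraction(50, 100)),
    "D": (Fraction(50, 100), Fraction(70, 100)),
    "E": (Fraction(70, 100), Fraction(1)),
}

def arithmetic_interval(message):
    """Narrow [x, y] once per character: new x = x + w*p, new y = x + w*q."""
    x, y = Fraction(0), Fraction(1)
    for ch in message:
        p, q = RANGES[ch]
        w = y - x                       # width of the current interval
        x, y = x + w * p, x + w * q
    return x, y

low, high = arithmetic_interval("CABAC")
print(float(low), float(high))
```

Any number between `low` and `high` identifies the string, given the same table of subintervals.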

Suppose we send N = 0.4067. The receiver knows only the number we sent and the contents of the original table of symbols and their probabilities.

How does the receiver reproduce the original string?



Step  N       Interval [p,q]  Width  Char  N - p   (N - p) / width
1     0.4067  [0.4, 0.5]      0.1    C     0.0067  0.067
2     0.067   [0, 0.25]       0.25   A     0.067   0.268
3     0.268   [0.25, 0.40]    0.15   B     0.018   0.12
4     0.12    [0, 0.25]       0.25   A     0.12    0.48
5     0.48    [0.4, 0.5]      0.10   C     0.08    0.8

How do we know when to stop? Obviously we could continue the decoding process begun above, but there are no more characters actually encoded in the message.

It is customary to include a terminating character in the code; when you decode the terminating character, you stop.

The number of characters that can be encoded is limited by the precision of the real-number representation on your machine.
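The decoding steps can be sketched as follows. Two simplifications are assumptions of this example rather than part of the slides: the receiver is told how many characters to decode instead of watching for a terminating character, and each subinterval is treated as including p but not q so that shared endpoints are unambiguous. Exact fractions are used to sidestep the precision limit for this short string.

```python
from fractions import Fraction

# Subintervals [p, q] from the frequency table, held as exact fractions.
RANGES = {
    "A": (Fraction(0), Fraction(25, 100)),
    "B": (Fraction(25, 100), Fraction(40, 100)),
    "C": (Fraction(40, 100), Fraction(50, 100)),
    "D": (Fraction(50, 100), Fraction(70, 100)),
    "E": (Fraction(70, 100), Fraction(1)),
}

def arithmetic_decode(n, length):
    """Find the subinterval holding n, emit its symbol, then rescale n."""
    decoded = []
    for _ in range(length):
        for ch, (p, q) in RANGES.items():
            if p <= n < q:
                decoded.append(ch)
                n = (n - p) / (q - p)   # subtract p, divide by the width
                break
    return "".join(decoded)

print(arithmetic_decode(Fraction("0.4067"), 5))
print(arithmetic_decode(Fraction("0.4067"), 6))
```

Asking for a sixth character still produces a symbol (the rescaled value 0.8 falls in the subinterval for E), which is why the decoder needs a terminating character or a known length to tell it when to stop.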



If we encounter a run of more than 15 zeros, how can we specify that with only 4-bit codes? A single 4-bit code allows a maximum of only 15 zeros.

To send, for example, 20 zeros we would send:

1111 0101

When the receiver sees 1111, it assumes that the next code is a continuation of the previous one, so the run length is 15 + 5 = 20.

How then might we send a code for 30 zeros?

1111 1111 0000

The code for 0 zeros is needed to terminate the 1111 code: 15 + 15 + 0 = 30.
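The continuation rule can be sketched as a pair of small functions. The names `encode_zero_run` and `decode_zero_run` are made up for this example, and only the run-length field is modeled, not the surrounding bit stream.

```python
def encode_zero_run(count, bits=4):
    """Split a run length into fixed-width groups; an all-ones group means 'more follows'."""
    max_code = (1 << bits) - 1          # 15 for 4-bit codes
    groups = []
    while count >= max_code:
        groups.append(format(max_code, f"0{bits}b"))
        count -= max_code
    groups.append(format(count, f"0{bits}b"))  # final group, possibly 0000
    return groups

def decode_zero_run(groups):
    """Add up groups until one is not all ones."""
    total = 0
    for group in groups:
        value = int(group, 2)
        total += value
        if value != (1 << len(group)) - 1:
            break
    return total

print(" ".join(encode_zero_run(20)))
print(" ".join(encode_zero_run(30)))
```

A run of exactly 15 zeros is sent as 1111 0000 under this rule, for the same reason that 30 zeros needs the trailing 0000.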



Lempel-Ziv Compression

Lempel-Ziv (LZ) is a compression technique that can realize compression ratios of up to 20 to 1. It relies on the fact that, in any document, character strings are going to be repeated.

For example, in legal documents such as contracts, one is likely to find a phrase such as "whereas the party of the first part" repeated many times. Would it not be nice if, rather than sending the thirty-five individual characters contained in that phrase, we could simply send a single integer, such as 18, and have the receiver understand that 18 stands for the phrase?



Lempel-Ziv Compression

Lempel-Ziv provides an elegant algorithm for accomplishing this.

The sender has the original message and a previously agreed-upon symbol table, usually the set of allowable characters in the alphabet.

The receiving party knows nothing about the message content, but it knows the contents and organization of the symbol table.




Let us suppose that, at the sender's end, we wish to send the message:

ABABABCBABABA

The sender would have the following symbol table, assuming that all possible messages consist only of patterns of the characters A, B, and C.

Beginning symbol table:

0 A
1 B
2 C

The receiver, knowing that all messages are composed only of the characters A, B, and C, would have the same symbol table at the beginning:

0 A
1 B
2 C




At the sending end, the goal is to build an expanded symbol table containing all of the character patterns encountered so far.

One pass through the algorithm processes one new character of the message. The sender tracks the following information:

Pass  Buffer content  Current char  What is sent    What is stored in table  New buffer content
1     A               B             0 (code for A)  AB (code = 3)            B

The algorithm begins by placing the first character, A, in the buffer; the first pass through the loop then reads the second character, B. Because AB is not yet in the table, the sender sends the code for A, stores AB as entry 3, and makes B the new buffer.

The sender's symbol table would now look as follows:

0 A
1 B
2 C
3 AB
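The sender's loop can be sketched as below. This is a minimal illustration under stated assumptions: the function name `lzw_compress` is made up, the example string ABABABCBABABA is chosen for this sketch, the message is assumed to be non-empty, and the final step that sends whatever is left in the buffer at the end of the message is an addition the slides do not show.

```python
def lzw_compress(message, alphabet="ABC"):
    """Send a code each time buffer + current char is a pattern not yet in the table."""
    table = {ch: code for code, ch in enumerate(alphabet)}
    buffer = message[0]                  # the first character starts the buffer
    sent = []
    for current in message[1:]:
        if buffer + current in table:
            buffer += current            # known pattern: keep extending, send nothing
        else:
            sent.append(table[buffer])   # send the code for the buffer
            table[buffer + current] = len(table)  # store the new pattern
            buffer = current
    sent.append(table[buffer])           # assumption: flush the buffer at the end
    return sent, table

codes, sender_table = lzw_compress("ABABABCBABABA")
print(codes)
print(sender_table)
```

The first pass of this loop matches the row above: the buffer holds A, the current character is B, code 0 is sent, and AB is stored as entry 3.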




At the other end of the transmission, the receiver is trying to reconstruct the symbol table that the sender is building. The receiver gathers the following information:

Pass  Prior (string)  Current (string)  Is current code in table?  C (1st char)  Tempstring/code pair  What is printed (current or temp?)
1     0 (A)           1 (B)             Yes                        B             AB/3                  B (current)

Since the receiver has received the codes for A and B in sequence, it knows the sender has seen the character pattern AB, and it stores this as entry 3 in its table.

Receiver's table after pass one:

0 A
1 B
2 C
3 AB
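The receiver's side can be sketched in the same style. The function name `lzw_decompress` and the code sequence used here are assumptions of this example. The branch for a code that is not yet in the table builds the string from the prior string plus its own first character, which is the usual way this case is handled.

```python
def lzw_decompress(codes, alphabet="ABC"):
    """Rebuild the sender's table one entry per received code (after the first)."""
    table = {code: ch for code, ch in enumerate(alphabet)}
    prior = table[codes[0]]
    output = [prior]
    for code in codes[1:]:
        if code in table:
            current = table[code]
            temp = prior + current[0]    # prior string + 1st char of current
            printed = current
        else:
            temp = prior + prior[0]      # code not in table yet: use the prior string
            printed = temp
        table[len(table)] = temp         # store the tempstring under the next code
        output.append(printed)
        prior = printed
    return "".join(output), table

text, receiver_table = lzw_decompress([0, 1, 3, 3, 2, 4, 8])
print(text)
print(receiver_table)
```

With this input the last code, 8, arrives before the receiver has an entry 8, so the second branch is taken and BAB is both stored and printed.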



Lempel-Ziv Compression

This process continues for the entire message:

ABABABCBABABA

Sender

Pass  Buffer content  Current char  What is sent      What is stored in table  New buffer content
1     A               B             0 (code for A)    AB (code = 3)            B
2     B               A             1 (code for B)    BA (code = 4)            A
3     A               B             -                 -                        AB
4     AB              A             3 (code for AB)   ABA (code = 5)           A
5     A               B             -                 -                        AB
6     AB              C             3 (code for AB)   ABC (code = 6)           C
7     C               B             2 (code for C)    CB (code = 7)            B
8     B               A             -                 -                        BA
9     BA              B             4 (code for BA)   BAB (code = 8)           B
10    B               A             -                 -                        BA
11    BA              B             -                 -                        BAB
12    BAB             A             8 (code for BAB)  BABA (code = 9)          A

Receiver

Pass  Prior (string)  Current (string)  Is current code in table?  C (1st char)  Tempstring/code pair  What is printed (current or temp?)
1     0 (A)           1 (B)             Yes                        B             AB/3                  B (current)
2     1 (B)           3 (AB)            Yes                        A             BA/4                  AB (current)
3     3 (AB)          3 (AB)            Yes                        A             ABA/5                 AB (current)
4     3 (AB)          2 (C)             Yes                        C             ABC/6                 C (current)
5     2 (C)           4 (BA)            Yes                        B             CB/7                  BA (current)
6     4 (BA)          8                 No                         B             BAB/8                 BAB (temp)




At this point the sender and receiver symbol tables would contain:

Code  Sender  Receiver
0     A       A
1     B       B
2     C       C
3     AB      AB
4     BA      BA
5     ABA     ABA
6     ABC     ABC
7     CB      CB
8     BAB     BAB
9     BABA    not yet


Image compression is an example of lossy compression:

At just 640 x 480 resolution, a color image would require 7,372,800 bits (24 bits per pixel). For motion, we send 30 images per second, which would require over 220 million bits per second for a single video stream.

Lossy compression schemes are used to dramatically reduce this requirement.
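The figures above follow from simple multiplication. The sketch assumes 24 bits per pixel, which is the value that makes 640 x 480 come out to 7,372,800 bits.

```python
WIDTH, HEIGHT = 640, 480
BITS_PER_PIXEL = 24        # assumption: 8 bits each for red, green, and blue
FRAMES_PER_SECOND = 30

bits_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL
bits_per_second = bits_per_frame * FRAMES_PER_SECOND

print(f"{bits_per_frame:,} bits per frame")
print(f"{bits_per_second:,} bits per second")
```

That is 221,184,000 bits per second before any compression is applied.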



We won't go through the details of how video compression is accomplished, but I suggest you read the remainder of the chapter for your own enlightenment.

Images that are transmitted consist of three different frame types:

P frame: encoded by computing the differences between the current frame and the previous frame

B frame: similar to a P frame, except that it is interpolated between a previous and a future frame

I frame: just a JPEG-encoded image
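The idea behind a P frame can be illustrated with a toy example that stores only per-pixel differences from the previous frame. This is a deliberately simplified sketch, not how a real video coder works: the eight-pixel "frames" and the function names are hypothetical, and no lossy step is applied.

```python
def encode_p_frame(previous, current):
    """Keep only the per-pixel differences from the previous frame."""
    return [cur - prev for prev, cur in zip(previous, current)]

def decode_p_frame(previous, differences):
    """Rebuild the current frame by adding the differences back."""
    return [prev + diff for prev, diff in zip(previous, differences)]

# Hypothetical 8-pixel frames: a bright object moves one pixel to the right.
i_frame = [10, 10, 12, 200, 200, 12, 10, 10]
next_frame = [10, 10, 12, 12, 200, 200, 10, 10]

differences = encode_p_frame(i_frame, next_frame)
rebuilt = decode_p_frame(i_frame, differences)
print(differences)
```

Most of the differences are zero when little changes between frames, and long runs of zeros are exactly what the frequency-dependent and run-length techniques earlier in the chapter compress well.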
