8/11/2019 Chapter 5 New
CSC 311
For many of the applications and uses we make of
modern computers, data compression is absolutely
essential.
Fax
MP3
Video
TV
etc.
For example: a typical fax uses 40,000 dots per square inch; sending it over a 56K modem would require more than one minute per page.
A typical 2-hour movie would require 1.04 * 10^12 bits, far beyond the capacity of any DVD, yet you can put two 2-hour movies on a DVD.
This is made possible by the use of data compression.
There are fundamentally two types of data
compression:
Lossless
Lossy
Lossless:
Lossless compression techniques allow the receiver to precisely reconstruct the original data being transmitted.
Lossy:
Lossy compression techniques allow the receiver to approximately reconstruct the original data.
Frequency Dependent Codes:
We first want to examine two compression techniques that rely
on the frequency of occurrence of various symbols in constructing
a compression algorithm.
Huffman Codes:
Arithmetic compression
Huffman Codes:
Huffman codes rely on the frequency of use of the various symbols to produce codes of varying length to represent the symbols.
Huffman codes display the canonical property that:
No valid Huffman code for any symbol is the prefix of the code for any other symbol.
This is sometimes called the no-prefix property.
Example:
Letter   Frequency   Huffman Code
A        25          01
B        15          110
C        10          111
D        20          10
E        30          00
Note: Huffman codes are not unique, but a properly formed Huffman code will always be optimal.
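One way to see how such a table can be produced is the usual greedy construction: repeatedly merge the two least frequent nodes into one. The sketch below is only an illustration under that assumption; the function name `huffman_codes` and the tie-breaking counter are hypothetical and not from the slides. Because Huffman codes are not unique, the bit patterns it prints may differ from the table above, but the code lengths and the total cost should agree.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a prefix-free code table from a {symbol: frequency} mapping.

    Assumes symbols are strings; internal tree nodes are (left, right) tuples.
    """
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(weight, next(tiebreak), sym) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a one-symbol alphabet still needs one bit
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # least frequent node
        w2, _, right = heapq.heappop(heap)  # second least frequent node
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: branch on 0 / 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                        # leaf: record the accumulated bits
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


freqs = {"A": 25, "B": 15, "C": 10, "D": 20, "E": 30}
codes = huffman_codes(freqs)
# Total bits needed per 100 symbols at these frequencies.
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
print(codes, total_bits)
```

For these frequencies the three most common letters get 2-bit codes and the two rarest get 3-bit codes, matching the code lengths in the example table.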
Arithmetic Compression:
Another frequency dependent compression technique.
Based on representing a character string as a single real number.
Assigning ranges based on frequency:
Letter   Frequency %   Subinterval [p, q]
A        25            [0, 0.25]
B        15            [0.25, 0.40]
C        10            [0.40, 0.50]
D        20            [0.50, 0.70]
E        30            [0.70, 1.00]
How does it work?
We calculate the new interval based on the old interval and the probabilities of the current symbol.
The interval, in this case, would change from 0.3-0.9 to 0.45-0.60.
Math shown in next slide
Step  String  Next char  Current interval [x, y]  [p, q]        Width w = y - x  New x = x + w*p                       New y = x + w*q
1     -       C          [0, 1]                   [0.4, 0.5]    1.0              0 + 1.0*0.4 = 0.4                     0 + 1.0*0.5 = 0.5
2     C       A          [0.4, 0.5]               [0, 0.25]     0.1              0.4 + 0.1*0 = 0.4                     0.4 + 0.1*0.25 = 0.425
3     CA      B          [0.4, 0.425]             [0.25, 0.40]  0.025            0.4 + 0.025*0.25 = 0.40625            0.4 + 0.025*0.40 = 0.41
4     CAB     A          [0.40625, 0.41]          [0, 0.25]     0.00375          0.40625 + 0.00375*0 = 0.40625         0.40625 + 0.00375*0.25 = 0.4071875
5     CABA    C          [0.40625, 0.4071875]     [0.4, 0.5]    0.0009375        0.40625 + 0.0009375*0.4 = 0.406625    0.40625 + 0.0009375*0.5 = 0.40671875
We could choose any number in the interval [0.406625, 0.40671875] to represent the string CABAC.
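The interval updates in the table can be reproduced with a few lines of code. This is a minimal sketch, assuming the subinterval table given earlier; the names `RANGES` and `encode_interval` are illustrative, and exact fractions are used instead of floats so the results match the hand calculation digit for digit.

```python
from fractions import Fraction

# Subintervals [p, q] from the frequency table (A 25%, B 15%, C 10%, D 20%, E 30%).
RANGES = {
    "A": (Fraction("0"), Fraction("0.25")),
    "B": (Fraction("0.25"), Fraction("0.40")),
    "C": (Fraction("0.40"), Fraction("0.50")),
    "D": (Fraction("0.50"), Fraction("0.70")),
    "E": (Fraction("0.70"), Fraction("1")),
}


def encode_interval(message):
    """Return the final interval (x, y) after processing every character."""
    x, y = Fraction(0), Fraction(1)
    for ch in message:
        p, q = RANGES[ch]
        w = y - x                      # width of the current interval
        x, y = x + w * p, x + w * q    # new x = x + w*p, new y = x + w*q
    return x, y


low, high = encode_interval("CABAC")
print(float(low), float(high))
```

Any number between `low` and `high` identifies the string, which is why a short value such as 0.4067 is enough.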
Suppose we send N = 0.4067. The receiver only knows the number we sent and the
contents of the original table of symbols and their probabilities.
How do we produce the original string?
Step  N       Interval [p, q]  Width  Char  N - p   (N - p) / width
1     0.4067  [0.4, 0.5]       0.10   C     0.0067  0.067
2     0.067   [0, 0.25]        0.25   A     0.067   0.268
3     0.268   [0.25, 0.40]     0.15   B     0.018   0.12
4     0.12    [0, 0.25]        0.25   A     0.12    0.48
5     0.48    [0.4, 0.5]       0.10   C     0.08    0.8
How do we know when to stop? Obviously we could continue the decoding process begun above, but there are no more characters actually encoded in the message.
It is customary to include a terminating character in the code; when you decode the terminating character, you stop.
The number of characters that can be encoded is limited by the precision of the real number representation on your machine.
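The decoding steps can be sketched the same way. Note one assumption that departs from the text: the symbol table on these slides has no terminating character, so this illustrative `decode` function stops after a caller-supplied number of characters instead. The names `RANGES` and `decode` are hypothetical, and exact fractions sidestep the precision limit mentioned above for this short example.

```python
from fractions import Fraction

# Subintervals [p, q] from the frequency table; treated as half-open [p, q)
# here so that every value of N belongs to exactly one symbol.
RANGES = {
    "A": (Fraction("0"), Fraction("0.25")),
    "B": (Fraction("0.25"), Fraction("0.40")),
    "C": (Fraction("0.40"), Fraction("0.50")),
    "D": (Fraction("0.50"), Fraction("0.70")),
    "E": (Fraction("0.70"), Fraction("1")),
}


def decode(n, length):
    """Recover `length` characters from the number n (assumed 0 <= n < 1)."""
    out = []
    for _ in range(length):
        for ch, (p, q) in RANGES.items():
            if p <= n < q:
                out.append(ch)
                n = (n - p) / (q - p)  # subtract p, then divide by the width
                break
        else:
            raise ValueError("n is outside [0, 1)")
    return "".join(out)


decoded = decode(Fraction("0.4067"), 5)
print(decoded)
```

With a real terminating symbol in the table, the loop would instead run until that symbol is decoded.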
If we encounter a run of more than 15 zeros, how can we specify that with only 4-bit codes? A single 4-bit code allows a run of at most 15 zeros.
To send, for example, 20 zeros we would send:
1111 0101
When the receiver sees 1111, it assumes that the next code is a continuation of the previous one, so the run lengths are added (15 + 5 = 20).
How then might we send a code for 30 zeros?
1111 1111 0000
The code for 0 zeros is needed to terminate the 1111 code.
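The continuation rule above can be written as a short routine. This is a sketch under the assumptions stated in the text (4-bit codes, with the all-ones code meaning "15 more, keep reading"); the function names `encode_run` and `decode_run` are illustrative.

```python
def encode_run(n, bits=4):
    """Encode a run of n zeros as a list of fixed-width binary codes."""
    max_code = (1 << bits) - 1            # 15 for 4-bit codes
    groups = []
    while n >= max_code:                  # all ones means "run continues"
        groups.append(format(max_code, f"0{bits}b"))
        n -= max_code
    groups.append(format(n, f"0{bits}b"))  # final code, possibly 0000
    return groups


def decode_run(groups, bits=4):
    """Add up codes until the first one that is not all ones."""
    max_code = (1 << bits) - 1
    total = 0
    for group in groups:
        value = int(group, 2)
        total += value
        if value != max_code:             # a non-1111 code ends the run
            break
    return total


print(encode_run(20))  # 15 + 5
print(encode_run(30))  # 15 + 15 + 0
```

A run of exactly 15 zeros also needs the trailing 0000, for the same reason as the 30-zero case.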
Lempel-Ziv Compression
LZ is a compression technique that realizes compression ratios of up to 20 to 1.
It relies on the fact that, in any document, character strings are going
to be repeated.
For example: in legal documents such as contracts, one is likely to find
phrases such as "whereas the party of the first part" repeated many
times in the document. Would it not be nice if we could, rather than
sending the thirty-five individual characters contained in the above
phrase, simply send a single integer, such as 18, and have the
receiver understand that 18 stands for the above phrase?
Lempel-Ziv provides an elegant algorithm for accomplishing this.
The sender has the original message and a previously agreed upon symbol table, usually the
set of allowable characters in the alphabet.
The receiving party knows nothing about the message content, but it knows the contents
and organization of the symbol table.
Let us suppose that, at the sender's end, we wish to send the message:
ABABAAABBCACABABACAC
The sender would have the following symbol table, assuming that all possible messages
consist only of patterns of the characters A, B, and C.
Beginning symbol table:
0 A
1 B
2 C
The receiver, knowing that all messages are composed only of the characters A, B, and C,
would have the same symbol table at the beginning:
0 A
1 B
2 C
The goal is to build an expanded symbol table containing all of the character patterns encountered so far.
One pass through the algorithm is the processing of one new character in the message. At the sending end, the sender keeps track of the following information:
Pass  Buffer content  Current char  What is sent      What is stored in table  New buffer content
1     A               B             0 (code for A)    AB (code = 3)            B
The algorithm begins by placing the first character in the buffer; the first pass through the loop begins by reading
the second character, B.
The sender's symbol table would now look as follows:
0 A
1 B
2 C
3 AB
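The sender's loop can be sketched in code. This is a minimal illustration, assuming the three-letter starting table above; the function name `lzw_compress` and the sample input `ABABABCBABABA` are illustrative choices, and the final flush of the buffer at the end of the message is an assumption the slides do not show.

```python
def lzw_compress(message, alphabet="ABC"):
    """Return (codes sent, final symbol table) for the given message."""
    table = {ch: i for i, ch in enumerate(alphabet)}  # 0 A, 1 B, 2 C
    if not message:
        return [], table
    codes = []
    buffer = message[0]                   # start with the first character
    for ch in message[1:]:                # one pass per new character
        if buffer + ch in table:
            buffer += ch                  # known pattern: keep extending
        else:
            codes.append(table[buffer])   # send the code for the buffer
            table[buffer + ch] = len(table)  # store the new pattern
            buffer = ch                   # restart from the current char
    codes.append(table[buffer])           # assumed: flush what is left
    return codes, table


codes, table = lzw_compress("ABABABCBABABA")
print(codes)
print(table)
```

On the first pass the buffer is A and the current character is B, so the code for A is sent and AB is stored as entry 3, as in the table above.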
At the other end of the transmission, the receiver is trying to reconstruct the symbol table that the sender is building. The receiver is gathering the following information:
Pass  Prior (string)  Current (string)  Is current code in table?  C (1st char)  Temp string / code  What is printed (current or temp?)
1     0 (A)           1 (B)             Yes                        B             AB / 3              B (current)
Since the receiver has received the codes for A and B in sequence, it knows the sender has seen the character pattern AB and stores this as entry 3 in its table.
Receiver's table after pass one:
0 A
1 B
2 C
3 AB
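The receiver's side can be sketched as follows. This is an illustration only: the function name `lzw_decompress` and the sample code sequence are assumptions, chosen so that a code arrives before the receiver has stored it. In that case the receiver builds the temp string from the prior string plus its own first character.

```python
def lzw_decompress(codes, alphabet="ABC"):
    """Rebuild the message from a list of codes and the starting table."""
    table = {i: ch for i, ch in enumerate(alphabet)}  # 0 A, 1 B, 2 C
    if not codes:
        return ""
    prior = table[codes[0]]
    out = [prior]                          # the first string is printed as is
    for code in codes[1:]:
        if code in table:
            current = table[code]          # print the current string
        elif code == len(table):
            current = prior + prior[0]     # not stored yet: print the temp string
        else:
            raise ValueError(f"unexpected code {code}")
        table[len(table)] = prior + current[0]  # prior + 1st char of current
        out.append(current)
        prior = current
    return "".join(out)


message = lzw_decompress([0, 1, 3, 3, 2, 4, 8, 0])
print(message)
```

Here code 8 arrives while the receiver's table ends at entry 7, so the receiver forms BAB from the prior string BA and stores it as entry 8 itself.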
This process continues for the entire message:
ABABAAABBCACABABACAC
Note: the passes traced below consume the characters ABABABCBABABA, which does not match the message string printed above after the fifth character.
Sender:
Pass  Buffer content  Current char  What is sent       What is stored in table  New buffer content
1     A               B             0 (code for A)     AB (code = 3)            B
2     B               A             1 (code for B)     BA (code = 4)            A
3     A               B             -                  -                        AB
4     AB              A             3 (code for AB)    ABA (code = 5)           A
5     A               B             -                  -                        AB
6     AB              C             3 (code for AB)    ABC (code = 6)           C
7     C               B             2 (code for C)     CB (code = 7)            B
8     B               A             -                  -                        BA
9     BA              B             4 (code for BA)    BAB (code = 8)           B
10    B               A             -                  -                        BA
11    BA              B             -                  -                        BAB
12    BAB             A             8 (code for BAB)   BABA (code = 9)          A
Receiver:
Pass  Prior (string)  Current (string)  Is current code in table?  C (1st char)  Temp string / code  What is printed (current or temp?)
1     0 (A)           1 (B)             Yes                        B             AB / 3              B (current)
2     1 (B)           3 (AB)            Yes                        A             BA / 4              AB (current)
3     3 (AB)          3 (AB)            Yes                        A             ABA / 5             AB (current)
4     3 (AB)          2 (C)             Yes                        C             ABC / 6             C (current)
5     2 (C)           4 (BA)            Yes                        B             CB / 7              BA (current)
6     4 (BA)          8                 No                         B             BAB / 8             BAB (temp)
At this point the sender and receiver symbol tables would contain:
Code  Sender  Receiver
0     A       A
1     B       B
2     C       C
3     AB      AB
4     BA      BA
5     ABA     ABA
6     ABC     ABC
7     CB      CB
8     BAB     BAB
9     BABA    not yet
Image compression is an example of Lossy Compression:
At just 640 x 480 resolution, a color image would require
7,372,800 bits. For motion, we send 30 images per second,
which would require over 220 million bits per second for a single
video stream.
Lossy compression schemes are used to dramatically reduce this
requirement.
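The figures above can be checked with a short calculation. The 24 bits per pixel used here is an assumption (the slides do not state the color depth), chosen because it reproduces the 7,372,800-bit figure exactly; the constant names are illustrative.

```python
WIDTH, HEIGHT = 640, 480
BITS_PER_PIXEL = 24        # assumption: 8 bits each for red, green, blue
FRAMES_PER_SECOND = 30

bits_per_image = WIDTH * HEIGHT * BITS_PER_PIXEL
bits_per_second = bits_per_image * FRAMES_PER_SECOND

print(f"{bits_per_image:,} bits per image")
print(f"{bits_per_second:,} bits per second")
```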
We won't go through the details of how video compression is accomplished, but
I suggest you read the remainder of the chapter for your own enlightenment.
Images that are transmitted consist of three different frame types:
P frame: encoded by computing the differences between the current
frame and the previous frame
B frame: similar to a P frame, except that it is interpolated between a previous and a future frame
I frame: just a JPEG-encoded image