Chapter 5 New


  • 8/11/2019 Chapter 5 New


    CSC 311

    For many of the applications and uses we make of modern computers, data compression is absolutely essential. Examples: fax, MP3, video, TV, etc.


    For example, a typical fax uses 40,000 dots per square inch; sent uncompressed over a 56K modem, it would require more than one minute per page.

    A typical 2-hour movie would require 1.04 × 10^12 bits, far beyond the capacity of any DVD, yet you can put two 2-hour movies on a DVD.

    This is made possible by the use of data compression.
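    The fax figure can be checked with a quick calculation. The sketch below assumes a US letter page (8.5 x 11 inches), one bit per dot (black or white), and a modem that actually delivers 56,000 bits per second; none of these assumptions are stated on the slide, and the function name is hypothetical.

```python
# Back-of-the-envelope fax timing. ASSUMPTIONS (not from the slide):
# US letter page, 1 bit per dot, a 56,000 bit/s modem link, no compression.
DOTS_PER_SQ_INCH = 40_000
PAGE_AREA_SQ_IN = 8.5 * 11        # assumed page size: 93.5 square inches
MODEM_BITS_PER_SEC = 56_000       # "56K" taken as 56,000 bit/s


def fax_seconds_per_page(dots_per_sq_in: float = DOTS_PER_SQ_INCH,
                         area_sq_in: float = PAGE_AREA_SQ_IN,
                         bits_per_sec: float = MODEM_BITS_PER_SEC) -> float:
    """Seconds needed to send one uncompressed page at 1 bit per dot."""
    bits_per_page = dots_per_sq_in * area_sq_in
    return bits_per_page / bits_per_sec


if __name__ == "__main__":
    print(f"{fax_seconds_per_page():.1f} seconds per page")
```

    Under these assumptions a page is 3,740,000 bits, which takes roughly 67 seconds at 56,000 bits per second.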


    There are fundamentally two types of data compression: lossless and lossy.


    Lossless:

    Lossless compression techniques allow the receiver to precisely reconstruct the original data being transmitted.

    Lossy:

    Lossy compression techniques allow the receiver to approximately reconstruct the original data.
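    The difference between the two types can be demonstrated in a few lines. This sketch uses Python's standard `zlib` module (DEFLATE) as the lossless example and a hypothetical quantization function as the lossy one; quantization is just one simple way of discarding information and is not taken from the slides.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress with zlib; the result is bit-for-bit identical."""
    return zlib.decompress(zlib.compress(data))


def lossy_roundtrip(samples: list[int], step: int = 16) -> list[int]:
    """Toy lossy scheme (hypothetical): round each sample down to a multiple of `step`.

    Fewer distinct values need fewer bits, but the fine detail is gone for good.
    """
    return [(s // step) * step for s in samples]


data = b"ABABABCBABABA" * 50
print(lossless_roundtrip(data) == data)   # exact reconstruction
print(lossy_roundtrip([3, 17, 250]))      # only an approximation survives
```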


    Frequency-Dependent Codes:

    We first want to examine two compression techniques that rely on the frequency of occurrence of the various symbols in constructing a compression algorithm:

    Huffman codes

    Arithmetic compression


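    Huffman Codes:

    Huffman codes rely on the frequency of use of the various symbols to produce codes of varying length to represent the symbols: frequent symbols get short codes and rare symbols get longer ones.

    Huffman codes have the defining property that no valid Huffman code for any symbol is the prefix of the code for any other symbol.

    This is sometimes called the no-prefix (prefix-free) property. It lets the receiver decode a stream of bits unambiguously, without separators between codes.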
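    A short sketch of why the no-prefix property matters: with a prefix-free code, the receiver can read bits one at a time and emit a symbol as soon as the bits seen so far match a codeword. The helper names are hypothetical, and `EXAMPLE_CODES` is copied from the example table in this section.

```python
EXAMPLE_CODES = {"A": "01", "B": "110", "C": "111", "D": "10", "E": "00"}


def is_prefix_free(codes: dict[str, str]) -> bool:
    """True if no codeword is a prefix of (or equal to) another symbol's codeword."""
    words = list(codes.values())
    return not any(i != j and words[j].startswith(words[i])
                   for i in range(len(words)) for j in range(len(words)))


def decode_prefix_code(bits: str, codes: dict[str, str]) -> str:
    """Greedy decoding; unambiguous only because the code is prefix-free."""
    lookup = {word: sym for sym, word in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:        # the first match is the only possible match
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError(f"leftover bits {current!r} do not form a codeword")
    return "".join(out)


print(decode_prefix_code("1100110", EXAMPLE_CODES))   # 110 | 01 | 10
```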


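    Example:

    Letter   Frequency (%)   Huffman Code
    A        25              01
    B        15              110
    C        10              111
    D        20              10
    E        30              00

    Note: Huffman codes are not unique, but a properly formed Huffman code will always be optimal: no other code that gives each symbol its own prefix-free codeword has a smaller average length.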
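    The tree-building procedure behind such a table can be sketched with Python's `heapq` module: repeatedly merge the two least frequent subtrees, prefixing 0 to the codes on one side and 1 to the codes on the other. This is a minimal illustration with hypothetical function names, not code from the course. Ties are broken by insertion order, so the exact bit patterns can differ from the table while the code lengths, and the 2.25-bit average, match.

```python
import heapq
from itertools import count


def huffman_codes(freqs: dict[str, int]) -> dict[str, str]:
    """Build a Huffman code by repeatedly merging the two least frequent subtrees."""
    if len(freqs) == 1:                      # a lone symbol still needs a 1-bit code
        return {sym: "0" for sym in freqs}
    tiebreak = count()                       # keeps heap entries comparable on equal weights
    # Heap entries: (total weight, tiebreak, {symbol: code built so far})
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # least frequent subtree gets prefix 0
        w2, _, right = heapq.heappop(heap)   # next least frequent gets prefix 1
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def average_length(freqs: dict[str, int], codes: dict[str, str]) -> float:
    """Frequency-weighted average code length in bits per symbol."""
    total = sum(freqs.values())
    return sum(freqs[s] * len(codes[s]) for s in freqs) / total


freqs = {"A": 25, "B": 15, "C": 10, "D": 20, "E": 30}
codes = huffman_codes(freqs)
print(codes, average_length(freqs, codes))
```

    The average of 2.25 bits per letter compares with 3 bits per letter for a fixed-length code over five letters.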


    Arithmetic Compression:

    Another frequency-dependent compression technique, based on representing a character string as a single real number.

    Ranges are assigned based on frequency:

    Letter   Frequency (%)   Subinterval [p, q]
    A        25              [0, 0.25]
    B        15              [0.25, 0.40]
    C        10              [0.40, 0.50]
    D        20              [0.50, 0.70]
    E        30              [0.70, 1.00]


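    How does it work?

    We calculate the new interval from the old interval and the subinterval [p, q] of the current symbol. If the current interval is [x, y], with width w = y - x, the new interval is [x + w*p, x + w*q].

    In this case, the interval would change from 0.3-0.9 to 0.45-0.60.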

    The math is worked out step by step in the table below, which encodes the string CABAC.

    Step  String  Next  Current interval       [p, q]        Width        New x = x + w*p                      New y = x + w*q
          so far  char  [x, y]                               w = y - x
    1     -       C     [0, 1]                 [0.40, 0.50]  1.0          0 + 1*0.4 = 0.4                      0 + 1*0.5 = 0.5
    2     C       A     [0.4, 0.5]             [0, 0.25]     0.1          0.4 + 0.1*0 = 0.4                    0.4 + 0.1*0.25 = 0.425
    3     CA      B     [0.4, 0.425]           [0.25, 0.40]  0.025        0.4 + 0.025*0.25 = 0.40625           0.4 + 0.025*0.4 = 0.41
    4     CAB     A     [0.40625, 0.41]        [0, 0.25]     0.00375      0.40625 + 0.00375*0 = 0.40625        0.40625 + 0.00375*0.25 = 0.4071875
    5     CABA    C     [0.40625, 0.4071875]   [0.40, 0.50]  0.0009375    0.40625 + 0.0009375*0.4 = 0.406625   0.40625 + 0.0009375*0.5 = 0.40671875

    We could choose any number in the interval [0.406625, 0.40671875] to represent the string CABAC.
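    The encoding steps in the table can be reproduced in a few lines of Python. This sketch assumes the subinterval table above and uses `fractions.Fraction` for exact rational arithmetic, so every entry matches the hand calculation exactly instead of picking up floating-point rounding; `arithmetic_encode` is a hypothetical name, not a library function.

```python
from fractions import Fraction as F

# Subintervals [p, q] from the frequency table (ASSUMED fixed and shared by both ends).
RANGES = {
    "A": (F(0), F("0.25")),
    "B": (F("0.25"), F("0.40")),
    "C": (F("0.40"), F("0.50")),
    "D": (F("0.50"), F("0.70")),
    "E": (F("0.70"), F(1)),
}


def arithmetic_encode(message: str, ranges=RANGES) -> tuple[F, F]:
    """Narrow [x, y) once per character: x' = x + w*p, y' = x + w*q, with w = y - x."""
    x, y = F(0), F(1)
    for ch in message:
        p, q = ranges[ch]
        w = y - x
        x, y = x + w * p, x + w * q
    return x, y


low, high = arithmetic_encode("CABAC")
print(float(low), float(high))
```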

    Suppose we send N = 0.4067. The receiver knows only the number we sent and the contents of the original table of symbols and their probabilities.

    How does it reproduce the original string?


    Step  N       Interval [p, q]  Width  Char  N - p   (N - p) / width
    1     0.4067  [0.40, 0.50]     0.10   C     0.0067  0.067
    2     0.067   [0, 0.25]        0.25   A     0.067   0.268
    3     0.268   [0.25, 0.40]     0.15   B     0.018   0.12
    4     0.12    [0, 0.25]        0.25   A     0.12    0.48
    5     0.48    [0.40, 0.50]     0.10   C     0.08    0.8

    At each step the receiver finds the subinterval that contains N, outputs that character, and replaces N with (N - p) / width for the next step.

    How do we know when to stop? We could obviously continue the decoding process begun above (0.8 would decode as E, and so on), but there are no more characters actually encoded in the message.

    It is customary to include a terminating character in the code; when you decode the terminating character, you stop.

    The number of characters that can be encoded is limited by the precision of the real-number representation on your machine.
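    The decoding procedure can be sketched the same way. One assumption differs from the slides: instead of watching for a terminating character, this hypothetical `arithmetic_decode` is told how many characters to decode. Exact `Fraction` arithmetic sidesteps the floating-point precision limit just mentioned, at the cost of numerators and denominators that grow with the message length.

```python
from fractions import Fraction as F

# Same subinterval table as the encoder (ASSUMED shared by sender and receiver).
RANGES = {
    "A": (F(0), F("0.25")),
    "B": (F("0.25"), F("0.40")),
    "C": (F("0.40"), F("0.50")),
    "D": (F("0.50"), F("0.70")),
    "E": (F("0.70"), F(1)),
}


def arithmetic_decode(n: F, length: int, ranges=RANGES) -> str:
    """Find the subinterval holding n, emit its symbol, rescale n = (n - p) / (q - p)."""
    out = []
    for _ in range(length):              # stop after `length` symbols (no terminator)
        for sym, (p, q) in ranges.items():
            if p <= n < q:
                out.append(sym)
                n = (n - p) / (q - p)
                break
        else:
            raise ValueError(f"{n} lies outside every subinterval")
    return "".join(out)


print(arithmetic_decode(F("0.4067"), 5))
```

    Asking for a sixth character returns CABACE: the leftover value 0.8 falls in E's subinterval, which is exactly why a stopping rule is needed.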


    If we encounter a run of more than 15 zeros, how can we specify that with only 4-bit codes? A single 4-bit code allows a run of at most 15 zeros.

    To send, for example, 20 zeros we would send:

    1111 0101

    When the receiver sees 1111, it assumes that the next code is a continuation of the previous one (15 + 5 = 20).

    How, then, might we send a code for 30 zeros?

    1111 1111 0000

    The code for 0 zeros is needed to terminate the 1111 code (15 + 15 + 0 = 30).
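    The continuation rule can be written as a pair of small functions. This is a sketch assuming, as the slide does, that each 4-bit group counts zeros and that 1111 means "15 zeros, and the run continues"; the function names are hypothetical.

```python
def encode_zero_run(n: int) -> list[str]:
    """Encode a run of n zeros as 4-bit groups; 1111 means '15 zeros, keep going'."""
    if n < 0:
        raise ValueError("run length must be non-negative")
    groups = []
    while n >= 15:
        groups.append("1111")
        n -= 15
    groups.append(format(n, "04b"))      # final group (0-14) terminates the run
    return groups


def decode_zero_run(groups: list[str]) -> int:
    """Add up groups through the first one that is not 1111."""
    total = 0
    for g in groups:
        value = int(g, 2)
        total += value
        if value != 15:
            return total
    raise ValueError("run not terminated: last group was 1111")


print(encode_zero_run(20), encode_zero_run(30))
```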


    Lempel-Ziv Compression

    LZ is a compression technique that can realize compression ratios of up to 20 to 1.

    It relies on the fact that, in any document, character strings are going to be repeated.

    For example, in legal documents such as contracts, one is likely to find phrases such as "whereas the party of the first part" repeated many times. Would it not be nice if, rather than sending the thirty-five individual characters contained in that phrase, we could simply send a single integer, such as 18, and have the receiver understand that 18 stands for the phrase?

    Lempel-Ziv provides an elegant algorithm for accomplishing this.

    The sender has the original message and a previously agreed-upon symbol table, usually the set of allowable characters in the alphabet.

    The receiving party knows nothing of the message content, but it knows the contents and organization of the symbol table.

    Let us suppose that, at the sender's end, we wish to send the message:

    ABABABCBABABA

    assuming that all possible messages consist only of patterns of the characters A, B, and C. The sender would begin with the following symbol table:

    Code  Pattern
    0     A
    1     B
    2     C

    The receiver, knowing that all messages are composed only of the characters A, B, and C, would have the same symbol table at the beginning.

    The goal is to build an expanded symbol table containing all of the character patterns encountered so far. One pass through the algorithm is the processing of one new character of the message. On each pass the sender tracks the following information:

    Pass  Buffer    Current  What is sent      What is stored    New buffer
          content   char                       in table          content
    1     A         B        0 (code for A)    AB (code = 3)     B

    The algorithm begins by reading the first character, A, into the buffer; the first pass through the loop reads the second character, B. Because AB is not yet in the table, the sender sends the code for the buffer content (0, for A), stores AB as entry 3, and keeps B as the new buffer content.

    The sender's symbol table would now look as follows:

    0  A
    1  B
    2  C
    3  AB

    At the other end of the transmission, the receiver is trying to reconstruct the symbol table that the sender is building. The receiver gathers the following information:

    Pass  Prior      Current    Current code  C (1st   Temp string /  What is printed
          (string)   (string)   in table?     char)    code           (current or temp)
    1     0 (A)      1 (B)      Yes           B        AB / 3         B (current)

    Since the receiver has received the codes for A and B in sequence, it knows that the sender has seen the character pattern AB, and it stores this as entry 3 in its table.

    Receiver's table after pass 1:

    0  A
    1  B
    2  C
    3  AB

    This process continues for the entire message, ABABABCBABABA.

    Sender:

    Pass  Buffer    Current  What is sent      What is stored    New buffer
          content   char                       in table          content
    1     A         B        0 (code for A)    AB (code = 3)     B
    2     B         A        1 (code for B)    BA (code = 4)     A
    3     A         B        -                 -                 AB
    4     AB        A        3 (code for AB)   ABA (code = 5)    A
    5     A         B        -                 -                 AB
    6     AB        C        3 (code for AB)   ABC (code = 6)    C
    7     C         B        2 (code for C)    CB (code = 7)     B
    8     B         A        -                 -                 BA
    9     BA        B        4 (code for BA)   BAB (code = 8)    B
    10    B         A        -                 -                 BA
    11    BA        B        -                 -                 BAB
    12    BAB       A        8 (code for BAB)  BABA (code = 9)   A

    A dash means that the buffer content followed by the current character is already in the table, so nothing is sent or stored on that pass; the buffer simply grows.

    Receiver:

    Pass  Prior      Current    Current code  C (1st   Temp string /  What is printed
          (string)   (string)   in table?     char)    code           (current or temp)
    1     0 (A)      1 (B)      Yes           B        AB / 3         B (current)
    2     1 (B)      3 (AB)     Yes           A        BA / 4         AB (current)
    3     3 (AB)     3 (AB)     Yes           A        ABA / 5        AB (current)
    4     3 (AB)     2 (C)      Yes           C        ABC / 6        C (current)
    5     2 (C)      4 (BA)     Yes           B        CB / 7         BA (current)
    6     4 (BA)     8          No            B        BAB / 8        BAB (temp)

    C is the first character of the current string; when the current code is not yet in the table (pass 6), C is instead the first character of the prior string. The temp string, the prior string followed by C, is stored under the next free code, and when the current code was missing, the temp string is what gets printed.
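    The sender's side of this trace can be sketched in Python. This dictionary-building variant of Lempel-Ziv is usually called LZW (Lempel-Ziv-Welch). The sketch assumes the three-letter alphabet A, B, C with codes 0-2, and `lz_encode` is a hypothetical name. After the last character it also sends the code for whatever is left in the buffer, a final step the trace above stops short of.

```python
def lz_encode(message: str, alphabet: str = "ABC") -> tuple[list[int], dict[str, int]]:
    """Sender side (LZW style): send codes and grow the pattern table."""
    table = {ch: i for i, ch in enumerate(alphabet)}   # 0 A, 1 B, 2 C
    sent = []
    buffer = message[0] if message else ""
    for ch in message[1:]:
        if buffer + ch in table:          # known pattern: keep extending the buffer
            buffer += ch
        else:                             # new pattern: send buffer, store pattern
            sent.append(table[buffer])
            table[buffer + ch] = len(table)
            buffer = ch
    if buffer:                            # flush what remains at the end
        sent.append(table[buffer])
    return sent, table


codes, table = lz_encode("ABABABCBABABA")
print(codes)
```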

    At this point the sender and receiver symbol tables would contain:

    Code  Sender  Receiver
    0     A       A
    1     B       B
    2     C       C
    3     AB      AB
    4     BA      BA
    5     ABA     ABA
    6     ABC     ABC
    7     CB      CB
    8     BAB     BAB
    9     BABA    (not yet)

    The receiver adds entry 9 only when the next code arrives.
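    The receiver's side can be sketched the same way. Note the branch for a code that is not yet in the table (pass 6, code 8): this can only be the entry the sender created on its previous step, so the string must be the prior string followed by its own first character. `lz_decode` is a hypothetical name, and the alphabet A, B, C is assumed as above.

```python
def lz_decode(codes: list[int], alphabet: str = "ABC") -> tuple[str, dict[int, str]]:
    """Receiver side (LZW style): rebuild the sender's table from the codes alone."""
    table = {i: ch for i, ch in enumerate(alphabet)}
    if not codes:
        return "", table
    prior = table[codes[0]]
    output = [prior]                      # the first code is printed directly
    for code in codes[1:]:
        if code in table:
            current = table[code]
        elif code == len(table):          # not yet in table: prior + its first char
            current = prior + prior[0]
        else:
            raise ValueError(f"invalid code {code}")
        table[len(table)] = prior + current[0]   # temp string = prior + C
        output.append(current)
        prior = current
    return "".join(output), table


text, table = lz_decode([0, 1, 3, 3, 2, 4, 8, 0])
print(text)
```

    Decoding only the first seven codes leaves entry 9 missing, which matches the "not yet" entry in the receiver's column.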


    Image compression is a typical application of lossy compression.

    At just 640 X 480 resolution, a color image (24 bits per pixel) would require 7,372,800 bits. For motion we send 30 images per second, which would require over 220 million bits per second for a single video stream.

    Lossy compression schemes are used to dramatically reduce this requirement.
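    The raw bit-rate figures follow from simple multiplication. This sketch assumes 24 bits per pixel (8 bits each for red, green, and blue), which is what reproduces the 7,372,800-bit figure; the function names are hypothetical.

```python
def raw_bits_per_frame(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Uncompressed size of one frame, ASSUMING 24-bit color by default."""
    return width * height * bits_per_pixel


def raw_bits_per_second(width: int, height: int, fps: int,
                        bits_per_pixel: int = 24) -> int:
    """Uncompressed bit rate of a video stream at `fps` frames per second."""
    return raw_bits_per_frame(width, height, bits_per_pixel) * fps


print(raw_bits_per_frame(640, 480), raw_bits_per_second(640, 480, 30))
```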

    We won't go through the details of how video compression is accomplished, but I suggest you read the remainder of the chapter for your own enlightenment.

    The images that are transmitted consist of three different frame types:

    P frame: encoded by computing the differences between the current frame and the previous frame.

    B frame: similar to a P frame, except that it is interpolated between a previous and a future frame.

    I frame: essentially a JPEG-encoded image.
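    The P-frame idea can be illustrated with a toy: store only the per-pixel differences from the previous frame, which are mostly zeros when little changes between frames and therefore compress well. This is a sketch on a hypothetical one-dimensional list of pixel values with hypothetical function names; a real video encoder does considerably more than this.

```python
def p_frame_diff(previous: list[int], current: list[int]) -> list[int]:
    """Toy P frame: keep only the per-pixel change from the previous frame."""
    if len(previous) != len(current):
        raise ValueError("frames must be the same size")
    return [c - p for p, c in zip(previous, current)]


def apply_p_frame(previous: list[int], diff: list[int]) -> list[int]:
    """Receiver side: rebuild the current frame from the previous one plus the diff."""
    if len(previous) != len(diff):
        raise ValueError("frames must be the same size")
    return [p + d for p, d in zip(previous, diff)]


prev = [10, 10, 10, 200, 200]
cur = [10, 10, 12, 200, 199]
print(p_frame_diff(prev, cur))   # mostly zeros when little has changed
```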
