
Data Compression Techniques


Page 1: Data Compression Techniques

By Sukanta Behera
Reg. No. 07SBSCA048

Page 2: Data Compression

Lossless data compression: Store/transmit big files using few bytes so that the original files can be perfectly retrieved. Example: zip.

Lossy data compression: Store/transmit big files using few bytes so that the original files can be approximately retrieved. Example: mp3.

Motivation: Save storage space and/or bandwidth.

Page 3: Definition of Codec

Let Σ be an alphabet and let S ⊆ Σ* be a set of possible messages.

A lossless codec (c,d) consists of

a coder c : S → {0,1}*

a decoder d : {0,1}* → Σ*

so that ∀ x ∈ S: d(c(x)) = x.
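A minimal sketch of such a codec in Python, assuming a hypothetical four-symbol alphabet and a fixed-length code of 2 bits per symbol (the table CODE and the names c and d are mine):

CODE = {"a": "00", "b": "01", "c": "10", "d": "11"}   # 2 bits per symbol
DECODE = {bits: sym for sym, bits in CODE.items()}

def c(x: str) -> str:
    # Coder c : S -> {0,1}*, concatenating 2 bits per symbol.
    return "".join(CODE[sym] for sym in x)

def d(bits: str) -> str:
    # Decoder d : {0,1}* -> Sigma*, reading 2 bits at a time.
    return "".join(DECODE[bits[i:i+2]] for i in range(0, len(bits), 2))

assert d(c("abca")) == "abca"   # d(c(x)) = x for every message x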

Page 4: Remarks

It is necessary for c to be an injective map (otherwise two distinct messages would share a code word, and d could not recover both).

If we do not worry about efficiency, we don’t have to specify d if we have specified c.

Terminology: Sometimes we just say “code” rather than “codec”.

Terminology: The set c(S) is called the set of code words of the codec. In the examples to follow, we often just state the set of code words.

Page 5: Proposition

Let S = {0,1}^n. Then, for any codec (c,d), there is some x ∈ S so that |c(x)| ≥ n. (There are 2^n messages, but only 2^n − 1 binary strings of length less than n, so an injective coder cannot give every message a shorter code word.)

“Compression is impossible.”

Page 6: Proposition

For any message x, there is a codec (c,d) so that |c(x)| = 1. (For instance, let c map x to the one-bit string 0 and prefix every other code word with 1.)

“The Encyclopedia Britannica can be compressed to 1 bit.”

Page 7: Remarks

We cannot compress all data. Thus, we must concentrate on compressing “relevant” data.

It is trivial to compress data known in advance. We should concentrate on compressing data about which there is uncertainty.

We will use probability theory as a tool to model uncertainty about relevant data.

Page 8: Can random data be compressed?

Suppose Σ = {0,1} and S = {0,1}^2.

We know we cannot compress all data, but can we do well on average?

Let us assume the uniform distribution on S and look at the expected length of the code words.
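A brute-force sketch of the answer for this example, leaning on the Kraft-McMillan inequality from a later slide (lengths m_1..m_4 are realizable by a prefix code iff Σ_i 2^(-m_i) ≤ 1): under the uniform distribution on {0,1}^2, no prefix code has expected length below 2 bits.

from itertools import product

# Search all code-word length assignments (m1..m4) for the four messages
# of S = {0,1}^2; Kraft-McMillan says a prefix code with these lengths
# exists iff 2^-m1 + ... + 2^-m4 <= 1.
best = min(
    (sum(m) / 4, m)                       # expected length under uniform p
    for m in product(range(1, 9), repeat=4)
    if sum(2.0 ** -mi for mi in m) <= 1   # realizable by a prefix code
)
print(best)   # (2.0, (2, 2, 2, 2)): the average cannot drop below 2 bits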

Page 9: Definition of prefix codes

A prefix code c is a code with the property that for all different messages x and y, c(x) is not a prefix of c(y).

Example: Fixed-length codes (such as ASCII). Example: {0, 11, 10}.

All codes in this course will be prefix codes.
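A direct way to test the property (a minimal sketch; the function name is my own):

def is_prefix_code(codewords):
    # No code word may be a prefix of another, distinct code word.
    return not any(
        a != b and b.startswith(a)
        for a in codewords for b in codewords
    )

assert is_prefix_code({"0", "11", "10"})       # the example above
assert not is_prefix_code({"0", "01", "11"})   # "0" is a prefix of "01"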

Page 10: Proposition

If c is a prefix code for S = Σ^1, then c^n is a prefix code for S = Σ^n, where

c^n(x_1 x_2 … x_n) = c(x_1) · c(x_2) · … · c(x_n)
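A sketch of this in Python with a hypothetical three-symbol prefix code: because no code word is a prefix of another, a greedy left-to-right scan decodes the concatenation unambiguously.

CODE = {"a": "0", "b": "10", "c": "11"}   # a prefix code on Sigma = {a,b,c}
DECODE = {bits: sym for sym, bits in CODE.items()}

def encode(msg: str) -> str:
    # c^n(x_1 ... x_n) = c(x_1) . c(x_2) ... c(x_n)
    return "".join(CODE[sym] for sym in msg)

def decode(bits: str) -> str:
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in DECODE:          # a complete code word has been read
            out.append(DECODE[buf])
            buf = ""
    return "".join(out)

assert decode(encode("abacab")) == "abacab"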

Page 11: Prefix codes and trees

Set of code words of a prefix code: {0, 11, 10}.

[Figure: a rooted binary tree with edges labeled 0 (left) and 1 (right); its three leaves correspond to the code words 0, 10, and 11.]

Page 12: Alternative view of prefix codes

A prefix code is an assignment of the messages of S to the leaves of a rooted binary tree.

The code word of a message x is found by reading the labels on the edges on the path from the root of the tree to the leaf corresponding to x.

Page 13: Binary trees and the interval [0,1)

[Figure: the depth-2 tree from the previous slide drawn above the interval [0,1), with tick marks at 1/4, 1/2, and 3/4. The leaf for code word 0 covers [0,1/2), the leaf for 10 covers [1/2,3/4), and the leaf for 11 covers [3/4,1).]

Page 14: Alternative view of prefix codes

A prefix code is an assignment of the messages of S to disjoint dyadic intervals.

A dyadic interval is a real interval of the form [k 2^(-m), (k+1) 2^(-m)) with k+1 ≤ 2^m. The corresponding code word is the m-bit binary representation of k.
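A sketch of the correspondence in Python, using exact fractions (the function name interval is my own):

from fractions import Fraction

def interval(codeword: str):
    # Dyadic interval [k/2^m, (k+1)/2^m) for an m-bit code word (binary k).
    m, k = len(codeword), int(codeword, 2)
    return Fraction(k, 2 ** m), Fraction(k + 1, 2 ** m)

# The code {0, 10, 11} partitions [0,1) into disjoint dyadic intervals:
for w in ["0", "10", "11"]:
    print(w, interval(w))   # 0 -> [0,1/2), 10 -> [1/2,3/4), 11 -> [3/4,1)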

Page 15: Kraft-McMillan Inequality

Let m_1, m_2, … be the lengths of the code words of a prefix code. Then Σ_i 2^(-m_i) ≤ 1.

Let m_1, m_2, … be integers with Σ_i 2^(-m_i) ≤ 1. Then there is a prefix code c so that {m_i} are the lengths of the code words of c.
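The second direction is constructive: hand out consecutive dyadic intervals, shortest lengths first. A sketch (both function names are mine):

def kraft_sum(lengths):
    return sum(2.0 ** -m for m in lengths)

def code_from_lengths(lengths):
    # Build a prefix code with the given code-word lengths, assuming the
    # Kraft sum is at most 1: assign consecutive dyadic intervals,
    # shortest lengths first, and emit each interval's code word.
    words, k, prev_m = [], 0, 0   # k = position in units of 2^-prev_m
    for m in sorted(lengths):
        k <<= (m - prev_m)        # rescale the position to the finer level
        words.append(format(k, f"0{m}b"))
        k += 1
        prev_m = m
    return words

lengths = [1, 2, 3, 3]
assert kraft_sum(lengths) <= 1
print(code_from_lengths(lengths))   # ['0', '10', '110', '111']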

Page 16: Probability

A probability distribution p on S is a map p : S → [0,1] so that Σ_{x∈S} p(x) = 1.

A U-valued stochastic variable is a map Y : S → U.

If Y : S → R is a stochastic variable, its expected value E[Y] is Σ_{x∈S} p(x) Y(x).
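In code, the expectation is a single weighted sum (a sketch with a hypothetical three-message S and arbitrary values for p and Y):

p = {"a": 0.5, "b": 0.25, "c": 0.25}   # a probability distribution on S
Y = {"a": 1, "b": 2, "c": 4}           # an R-valued stochastic variable

E = sum(p[x] * Y[x] for x in p)        # E[Y] = Sigma_{x in S} p(x) Y(x)
print(E)                               # 0.5*1 + 0.25*2 + 0.25*4 = 2.0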

Page 17: Self-entropy

Given a probability distribution p on S, the self-entropy of x ∈ S is defined as

H(x) = –log_2 p(x).

The self-entropy of a message with probability 1 is 0 bits.

The self-entropy of a message with probability 0 is +∞.

The self-entropy of a message with probability ½ is 1 bit.

We often measure entropy in the unit “bits”.

Page 18: Entropy

Given a probability distribution p on S, its entropy H[p] is defined as E[H], i.e.

H[p] = –Σ_{x∈S} p(x) log_2 p(x).

For a stochastic variable X, its entropy H[X] is the entropy of its underlying distribution:

H[X] = –Σ_i Pr[X=i] log_2 Pr[X=i]
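A sketch of the formula in Python, with the usual convention that terms with p(x) = 0 contribute 0:

from math import log2

def entropy(p):
    # H[p] = -Sigma_x p(x) log2 p(x), in bits; 0 log 0 is taken as 0.
    return -sum(q * log2(q) for q in p.values() if q > 0)

print(entropy({"a": 0.5, "b": 0.25, "c": 0.25}))  # 1.5 bits
print(entropy({x: 0.25 for x in "abcd"}))         # 2.0 bits (uniform on 4)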

Page 19: Facts

The entropy of the uniform distribution on {0,1}^n is n bits. Any other distribution on {0,1}^n has strictly smaller entropy.

If X_1 and X_2 are independent stochastic variables, then H(X_1, X_2) = H(X_1) + H(X_2).

For any function f, H(f(X)) ≤ H(X).
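A quick numerical check of the second fact, reusing the entropy function above on a hypothetical biased bit:

from math import log2

def entropy(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)

p1 = {0: 0.25, 1: 0.75}                                  # a biased bit
joint = {(a, b): p1[a] * p1[b] for a in p1 for b in p1}  # independent pair

assert abs(entropy(joint) - 2 * entropy(p1)) < 1e-12     # additivity holds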

Page 20: Shannon’s theorem

Let S be a set of messages and let X be an S-valued stochastic variable.

For all prefix codes c on S, E[|c(X)|] ≥ H[X].

There is a prefix code c on S so that E[|c(X)|] < H[X] + 1. In fact, for all x in S, |c(x)| < H(x) + 1.
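The upper bound comes from choosing the Shannon lengths |c(x)| = ⌈–log_2 p(x)⌉, which satisfy the Kraft-McMillan inequality and so are realizable by a prefix code. A numerical sketch for one hypothetical distribution:

from math import ceil, log2

p = {"a": 0.5, "b": 0.3, "c": 0.2}   # a hypothetical distribution of X

lengths = {x: ceil(-log2(q)) for x, q in p.items()}   # Shannon lengths
H = -sum(q * log2(q) for q in p.values())             # entropy H[X]
expected = sum(p[x] * lengths[x] for x in p)          # E[|c(X)|]

assert sum(2.0 ** -m for m in lengths.values()) <= 1  # Kraft holds
assert H <= expected < H + 1                          # Shannon's bounds
print(lengths, round(H, 3), expected)                 # {'a':1,'b':2,'c':3} 1.485 1.7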