Lie Gonzalez Ch8 ppt


Chapter 8: Image Compression

    Wen-Nung Lie, CCU, Taiwan


Applications of image compression

    Televideo conferencing, remote sensing (satellite imagery),
    document and medical imaging, facsimile transmission (FAX),
    control of remotely piloted vehicles in military, space, and
    hazardous waste management applications


Fundamentals

    What is data and what is information? Data are the means by which
    information is conveyed. Various amounts of data may be used to
    represent the same amount of information -> data redundancy:

    Coding redundancy

    Interpixel redundancy

    Psychovisual redundancy


Coding redundancy

    The graylevel histogram of an image can provide a great deal of
    insight into the construction of codes.

    The average number of bits required to represent each pixel is

    $L_{avg} = \sum_{k=0}^{L-1} l(r_k)\, p_r(r_k)$

    where $l(r_k)$ is the number of bits used to represent graylevel
    $r_k$ and $p_r(r_k)$ is its probability.

    Variable-length coding (VLC): higher probability, shorter bit length

    $L_{avg} = 3.0$ bits (code 1), $2.7$ bits (code 2)
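A minimal sketch of the $L_{avg}$ computation; the probabilities and the second, variable-length code below are illustrative stand-ins for the slide's code 1 / code 2 tables:

```python
# Sketch: L_avg = sum_k l(r_k) * p_r(r_k) for two codes over the same
# 8-level source (probabilities here are assumed examples).
probs = [0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02]  # p_r(r_k)
len_code1 = [3] * 8                                       # fixed 3-bit code
len_code2 = [2, 2, 2, 3, 4, 5, 6, 6]                      # variable-length code

L_avg1 = sum(l * p for l, p in zip(len_code1, probs))     # 3.0 bits
L_avg2 = sum(l * p for l, p in zip(len_code2, probs))     # 2.7 bits
print(L_avg1, L_avg2)
```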


Run-length coding to remove spatial redundancy

    $C_R = 2.63, \quad R_D = 1 - \frac{1}{C_R} = 1 - \frac{1}{2.63} = 0.62$


Psychovisual redundancy

    Certain information simply has less relative importance than other
    information in normal visual processing -> psychovisual redundancy.

    The elimination of psychovisually redundant data results in a loss
    of quantitative information; it is called quantization and leads to
    lossy data compression.

    E.g., quantization in graylevels

    E.g., line interlacing in TV (reduced video scanning rate)

    (Figure: (b) false contouring on quantization; (c) improved
    graylevel quantization via dithering)


Fidelity criteria

    Objective fidelity criterion

    Root-mean-square error:

    $e_{rms} = \left[ \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\hat{f}(x,y) - f(x,y)]^2 \right]^{1/2}$

    Mean-square signal-to-noise ratio:

    $SNR_{ms} = \frac{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \hat{f}(x,y)^2}{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\hat{f}(x,y) - f(x,y)]^2}$

    Subjective fidelity criterion

    {Much worse, worse, slightly worse, the same, slightly better,
    better, much better}


Image compression model

    Source encoder: removes input redundancies

    Channel encoder: increases the noise immunity of the source
    encoder's output


Source encoder model

    Mapper: transforms the input data into a format designed to reduce
    the interpixel redundancies in the image (reversible)

    Quantizer: reduces the accuracy of the mapper's output (reduces the
    psychovisual redundancies); irreversible, and must be omitted for
    error-free compression

    Symbol encoder: fixed- or variable-length coder (assigns the
    shortest code words to the most frequently occurring outputs to
    reduce coding redundancies); reversible


Channel encoder and decoder

    Designed to reduce the impact of channel noise by inserting a
    controlled form of redundancy into the source encoded data.

    Hamming code as a channel code: the 7-bit Hamming (7,4) code has
    minimum distance 3 and can correct one bit error.

    $h_1 = b_3 \oplus b_2 \oplus b_0 \qquad h_3 = b_3$
    $h_2 = b_3 \oplus b_1 \oplus b_0 \qquad h_5 = b_2$
    $h_4 = b_2 \oplus b_1 \oplus b_0 \qquad h_6 = b_1, \quad h_7 = b_0$

    ($h_1$, $h_2$, $h_4$ are even-parity bits)

    A single-bit error is indicated by a nonzero parity word $c_4 c_2 c_1$:

    $c_1 = h_1 \oplus h_3 \oplus h_5 \oplus h_7$
    $c_2 = h_2 \oplus h_3 \oplus h_6 \oplus h_7$
    $c_4 = h_4 \oplus h_5 \oplus h_6 \oplus h_7$


Information theory

    Explores the minimum amount of data that is sufficient to describe
    an image completely without loss of information.

    Self-information of an event $E$ with probability $P(E)$:

    $I(E) = \log \frac{1}{P(E)} = -\log P(E)$

    $P(E) = 1 \Rightarrow I(E) = 0$
    $P(E) = 1/2 \Rightarrow I(E) = 1$ bit

    The base of the logarithm determines the information units.


Information channel

    Source alphabet $A = \{a_1, a_2, \ldots, a_J\}$

    Symbol probabilities $\mathbf{z} = [P(a_1), P(a_2), \ldots, P(a_J)]^T$,
    with $\sum_{j=1}^{J} P(a_j) = 1$

    The ensemble $(A, \mathbf{z})$ describes the information source
    completely.

    Average information per source output:

    $H(\mathbf{z}) = -\sum_{j=1}^{J} P(a_j) \log P(a_j)$

    $H(\mathbf{z})$ is the uncertainty or entropy of the source, or the
    average amount of information (in m-ary units) obtained by observing
    a single source output.

    For equally probable source symbols, entropy is maximized.
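A small sketch of the entropy computation (the symbol probabilities are assumed, for illustration only):

```python
import math

def entropy(z, base=2):
    """H(z) = -sum_j P(a_j) log P(a_j); zero-probability symbols contribute 0."""
    return -sum(p * math.log(p, base) for p in z if p > 0)

# Equally probable symbols maximize the entropy: H = log2(J) bits
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits/symbol
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits/symbol
```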


    Channel alphabet $B = \{b_1, b_2, \ldots, b_K\}$

    Channel description $(B, \mathbf{v})$, with
    $\mathbf{v} = [P(b_1), P(b_2), \ldots, P(b_K)]^T$

    $P(b_k) = \sum_{j=1}^{J} P(b_k \mid a_j) P(a_j)$

    $Q = \begin{bmatrix} P(b_1 \mid a_1) & P(b_1 \mid a_2) & \cdots & P(b_1 \mid a_J) \\ P(b_2 \mid a_1) & & & \vdots \\ \vdots & & & \vdots \\ P(b_K \mid a_1) & P(b_K \mid a_2) & \cdots & P(b_K \mid a_J) \end{bmatrix}$

    Q: forward channel transition matrix (channel matrix)

    $\mathbf{v} = Q \mathbf{z}$


Conditional entropy function

    $H(\mathbf{z} \mid b_k) = -\sum_{j=1}^{J} P(a_j \mid b_k) \log P(a_j \mid b_k)$

    $H(\mathbf{z} \mid \mathbf{v}) = \sum_{k=1}^{K} H(\mathbf{z} \mid b_k) P(b_k) = -\sum_{j=1}^{J} \sum_{k=1}^{K} P(a_j, b_k) \log P(a_j \mid b_k)$

    $H(\mathbf{z} \mid \mathbf{v})$ is called the equivocation of
    $\mathbf{z}$ with respect to $\mathbf{v}$.

    The difference between $H(\mathbf{z})$ and $H(\mathbf{z} \mid \mathbf{v})$
    is the average information received upon observing a single output
    symbol -- the mutual information:

    $I(\mathbf{z}, \mathbf{v}) = H(\mathbf{z}) - H(\mathbf{z} \mid \mathbf{v})$


Mutual information

    $I(\mathbf{z}, \mathbf{v}) = \sum_{j=1}^{J} \sum_{k=1}^{K} P(a_j, b_k) \log \frac{P(a_j, b_k)}{P(a_j) P(b_k)} = \sum_{j=1}^{J} \sum_{k=1}^{K} P(a_j)\, q_{kj} \log \frac{q_{kj}}{\sum_{i=1}^{J} P(a_i)\, q_{ki}}$

    where $Q = \{q_{kj}\}$ with $q_{kj} = P(b_k \mid a_j)$.

    The minimum possible value of $I(\mathbf{z}, \mathbf{v})$ is zero
    and occurs when the input and output symbols are statistically
    independent (i.e., $P(a_j, b_k) = P(a_j) P(b_k)$).

    The maximum value of $I(\mathbf{z}, \mathbf{v})$ over all possible
    $\mathbf{z}$ is the capacity $C$ of the channel described by the
    channel matrix $Q$:

    $C = \max_{\mathbf{z}} [\, I(\mathbf{z}, \mathbf{v}) \,]$


Channel capacity

    Capacity: the maximum rate at which information can be transmitted
    reliably through the channel.

    Channel capacity does not depend on the input probabilities of the
    source (i.e., on how the channel is used) but is a function of the
    conditional probabilities defining the channel alone (i.e., Q).


Binary entropy function

    For a binary source: $\mathbf{z} = [p_{bs}, 1 - p_{bs}]^T$

    $H_{bs}(p_{bs}) = -p_{bs} \log_2 p_{bs} - (1 - p_{bs}) \log_2 (1 - p_{bs})$

    $\max H_{bs} = 1$ bit, when $p_{bs} = 0.5$

    Binary symmetric channel (BSC) with error probability $p_e$:

    channel matrix: $Q = \begin{bmatrix} 1 - p_e & p_e \\ p_e & 1 - p_e \end{bmatrix}$

    probability of the received symbols:
    $\mathbf{v} = Q\mathbf{z} = [\,(1-p_e)\,p_{bs} + p_e (1-p_{bs}),\;\; p_e\, p_{bs} + (1-p_e)(1-p_{bs})\,]^T$

    mutual information:
    $I(\mathbf{z}, \mathbf{v}) = H_{bs}\big(p_{bs}(1-p_e) + (1-p_{bs})\,p_e\big) - H_{bs}(p_e)$

    $I(\mathbf{z}, \mathbf{v}) = 0$ when $p_{bs} = 0$ or $1$

    $I(\mathbf{z}, \mathbf{v}) = 1 - H_{bs}(p_e) = C$ when $p_{bs} = 0.5$, for any $p_e$

    $H_{bs}$ is used to estimate the source entropy; $I(\mathbf{z}, \mathbf{v})$
    is used to estimate the channel capacity.


    When $p_e = 0$ or $1$: $C = 1$ bit.

    When $p_e = 0.5$: $C = 0$ bits (no information transmitted).
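A sketch that evaluates the BSC mutual information above (standard closed form; the numbers are illustrative):

```python
import math

def Hbs(p):
    """Binary entropy function, in bits."""
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def I_bsc(p_bs, p_e):
    """I(z,v) = Hbs(p_bs(1-p_e) + (1-p_bs)p_e) - Hbs(p_e) for a BSC."""
    return Hbs(p_bs*(1 - p_e) + (1 - p_bs)*p_e) - Hbs(p_e)

# Capacity is reached at p_bs = 0.5: C = 1 - Hbs(p_e)
print(I_bsc(0.5, 0.1), 1 - Hbs(0.1))  # both ~0.531 bits
print(I_bsc(0.5, 0.5))                # 0.0 -- no information transmitted
```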


Shannon's first theorem for noiseless coding

    Defines the minimum average code length per source symbol that can
    be achieved.

    For a zero-memory source (with finite ensemble $(A, \mathbf{z})$
    and statistically independent source symbols):

    Single symbols: $A = \{a_1, a_2, \ldots, a_J\}$

    Block symbols ($n$-tuples of symbols):
    $A' = \{\alpha_1, \alpha_2, \ldots, \alpha_{J^n}\}$, with
    $P(\alpha_i) = P(a_{j_1}) P(a_{j_2}) \cdots P(a_{j_n})$

    $H(\mathbf{z}') = -\sum_{i=1}^{J^n} P(\alpha_i) \log P(\alpha_i)$

    $H(\mathbf{z}') = n H(\mathbf{z})$ -- the entropy of a zero-memory
    source with block symbols.


n-extended source

    Use n symbols as a block symbol.

    The average word length of the n-extended code,
    $L'_{avg} = \sum_{i=1}^{J^n} P(\alpha_i)\, l(\alpha_i)$, is bounded by

    $H(\mathbf{z}') \le L'_{avg} < H(\mathbf{z}') + 1$

    so that, per source symbol,

    $H(\mathbf{z}) \le \frac{L'_{avg}}{n} < H(\mathbf{z}) + \frac{1}{n}$


Coding efficiency

    Coding efficiency: $\eta = \frac{n H(\mathbf{z})}{L'_{avg}}$

    Example:

    Original source: $P(a_1) = 2/3$, $P(a_2) = 1/3$
    $H(\mathbf{z}) = 0.918$ bits/symbol, $L_{avg} = 1$ bit/symbol
    $\eta = 0.918 / 1 = 0.918$

    Second extension source: $\mathbf{z}' = \{4/9, 2/9, 2/9, 1/9\}$
    $H(\mathbf{z}') = 1.83$ bits/symbol, $L'_{avg} = 17/9 = 1.89$ bits/symbol
    $\eta = 1.83 / 1.89 = 0.97$


Shannon's second theorem for noisy coding

    For any $R < C$ (the capacity of the zero-memory channel with
    matrix $Q$), there exist codes of block length $r$ and rate $R$
    such that the block decoding error can be made arbitrarily small.

    Rate-distortion function: given a distortion $D$, the minimum rate
    at which information about the source can be conveyed to the user:

    $R(D) = \min_{Q \in Q_D} I(\mathbf{z}, \mathbf{v})$

    $Q_D = \{\, Q \mid d(Q) \le D \,\}$

    Q: the transition probabilities from source symbols to output
    symbols implied by the compression

    d(Q): average distortion; D: distortion threshold


Example of rate-distortion function

    Consider a zero-memory binary source with equally probable source
    symbols {0, 1}:

    $R(D) = 1 - H_{bs}(D)$

    R(D) is monotonically decreasing and convex in the interval
    $(0, D_{max})$.

    The shape of the R-D function represents the coding efficiency of
    a coder.


Estimate the information content (entropy) of an image

    Construct a source model based on the relative frequency of
    occurrence of the graylevels:

    First-order estimate (histogram of single pixels): 1.81 bits/pixel

    Second-order estimate (histogram of pairs of adjacent pixels):
    1.25 bits/pixel

    Third-order, fourth-order, ...: approaching the source entropy

    Example image (4 x 8):

    21  21  21  95  169  243  243  243
    21  21  21  95  169  243  243  243
    21  21  21  95  169  243  243  243
    21  21  21  95  169  243  243  243
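A sketch reproducing the two entropy estimates for this image (assuming horizontally adjacent pairs with wrap-around at the row ends for the second-order estimate):

```python
import math
from collections import Counter

img = [[21, 21, 21, 95, 169, 243, 243, 243] for _ in range(4)]  # the 4x8 image
pixels = [p for row in img for p in row]

def estimate(samples):
    """First-order entropy of the sample histogram, in bits per sample."""
    n = len(samples)
    return -sum(c/n * math.log2(c/n) for c in Counter(samples).values())

print(estimate(pixels))  # first-order estimate: ~1.81 bits/pixel

# Histogram of horizontally adjacent pixel pairs (wrapping at row ends)
pairs = [(row[i], row[(i + 1) % 8]) for row in img for i in range(8)]
print(estimate(pairs) / 2)  # second-order estimate: ~1.25 bits/pixel
```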


    The difference between the higher-order estimates of entropy and
    the first-order estimate indicates the presence of interpixel
    redundancy.

    If the pixels were statistically independent, the higher-order
    estimates would be equivalent to the first-order estimate, and
    variable-length coding (VLC) would provide optimal compression
    (i.e., no further mapping would be necessary).

    First-order estimate of the difference-mapped source: 1.41 bits/pixel

    1.41 > 1.25 -> there is an even better mapping


Error-free compression -- variable-length coding (VLC)

    Huffman coding

    the most popular method; yields the smallest possible number of
    code symbols per source symbol

    construct the Huffman tree according to the source symbol
    probabilities

    code the Huffman tree

    compute the source entropy, average code length, and coding
    efficiency


    Huffman code is:

    a block code (each symbol is mapped to a fixed sequence of bits)

    instantaneous (decoded without referencing succeeding symbols)

    uniquely decodable (no code word is a prefix of another)

    for example: 010100111100 -> a3 a1 a2 a2 a6

    $H(\mathbf{z}) = 2.14$ bits/symbol, $L_{avg} = 2.2$ bits/symbol, $\eta = 0.973$
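A minimal Huffman-construction sketch; the six-symbol source below is an assumption chosen to be consistent with the slide's numbers (H = 2.14, L_avg = 2.2):

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code book {symbol: bitstring} from {symbol: probability}."""
    # Heap entries: (probability, tiebreak id, {symbol: partial code})
    heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)    # merge the two least probable nodes
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in c0.items()}
        merged.update({s: '1' + c for s, c in c1.items()})
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1
    return heap[0][2]

probs = {'a1': 0.4, 'a2': 0.3, 'a3': 0.1, 'a4': 0.1, 'a5': 0.06, 'a6': 0.04}
book = huffman_code(probs)
L_avg = sum(probs[s] * len(code) for s, code in book.items())
print(book, L_avg)   # L_avg = 2.2 bits/symbol
```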


    Variations of VLC


Variations of VLC (cont.)

    Considered when a large number of source symbols must be coded.

    Truncated Huffman coding

    Only some symbols (the most probable ones, here the top 12) are
    encoded with a Huffman code.

    A prefix code (whose probability is the sum of the probabilities
    of the remaining symbols) followed by a fixed-length code is used
    to represent all the other symbols.

    B-code

    Each code word is made up of continuation bits (C) and information
    bits (natural binary numbers).

    C can be 1 or 0, alternating between successive code words.

    E.g., a11 a2 a7 -> 001010101000010 or 101110001100110


Variations of VLC (cont.)

    Binary shift code

    Divide the symbols into blocks of equal size.

    Code the individual elements within all blocks identically.

    Add shift-up or shift-down symbols to identify each block (here,
    111).

    Huffman shift code

    Select one reference block.

    Sum the probabilities of all other blocks and use the sum to
    determine the shift symbol by the Huffman method (here, 00).


Arithmetic coding

    AC generates non-block codes (no look-up table as in Huffman
    coding).

    An entire sequence of source symbols is assigned a single code
    word.

    The code word itself defines an interval of real numbers between
    0 and 1. As the number of symbols in the message increases, the
    interval becomes smaller (more information bits are needed to
    represent this real number).

    Multiple symbols are coded jointly with one set of bits -> the
    number of bits per symbol is effectively fractional.


Arithmetic coding (cont.)

    Example: coding of a1 a2 a3 a3 a4

    Any real number in [0.06752, 0.0688) can represent the source
    symbols (e.g., 0.000100011b = 0.068359375).

    Theoretically, 0.068 is enough to represent the source symbols:
    3 decimal digits -> 0.6 decimal digits per symbol.

    Actually, 0.068359375 -> 9 bits for 5 symbols -> 1.8 bits per
    symbol.

    As the length of the source symbol sequence increases, the
    resulting arithmetic code approaches Shannon's bound.

    Two factors limit the coding performance:

    An end-of-message indicator is necessary to separate one message
    sequence from another.

    Finite-precision arithmetic.
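A sketch of the interval-shrinking step, using symbol slots that reproduce the [0.06752, 0.0688) interval above (P(a3) = 0.4 and 0.2 for the others -- an assumption consistent with those numbers):

```python
# Cumulative-probability slots: a1 [0,.2), a2 [.2,.4), a3 [.4,.8), a4 [.8,1)
probs = {'a1': 0.2, 'a2': 0.2, 'a3': 0.4, 'a4': 0.2}

def arithmetic_interval(message):
    """Shrink [low, high) by each symbol's slot; any number inside encodes it."""
    low, high = 0.0, 1.0
    for sym in message:
        span = high - low
        cum = 0.0
        for s, p in probs.items():
            if s == sym:
                low, high = low + cum * span, low + (cum + p) * span
                break
            cum += p
    return low, high

print(arithmetic_interval(['a1', 'a2', 'a3', 'a3', 'a4']))  # ~(0.06752, 0.0688)
```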



LZW (Lempel-Ziv-Welch) coding

    Assigns fixed-length code words to variable-length sequences of
    source symbols, but requires no a priori knowledge of the
    probabilities of occurrence of the symbols.

    LZW compression has been integrated into the GIF, TIFF, and PDF
    formats.

    For a 9-bit (512-word) dictionary, the latter half of the entries
    is constructed during the encoding process. An entry of two or
    more pixels is possible (i.e., assigned a 9-bit code).

    If the size of the dictionary is too small, the detection of
    matching gray-level sequences will be less likely; if too large,
    the size of the code words will adversely affect compression
    performance.
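A compact LZW-encoder sketch over 8-bit data (the 256 single-byte entries are fixed; new multi-pixel entries take the next codes, 9 bits each while the dictionary holds at most 512 words):

```python
def lzw_encode(data):
    """Emit one code per longest dictionary match; grow the dictionary as we go."""
    dictionary = {bytes([i]): i for i in range(256)}
    current, codes = b'', []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                # keep extending the match
        else:
            codes.append(dictionary[current])  # emit the longest match
            dictionary[candidate] = len(dictionary)  # register a new entry
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes

# Two identical 4-pixel runs: the second run reuses the new entries
print(lzw_encode(bytes([39, 39, 126, 126, 39, 39, 126, 126])))
# -> [39, 39, 126, 126, 256, 258]
```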


LZW coding (cont.)

    The LZW decoder builds an identical decompression dictionary as it
    simultaneously decodes the bit stream.

    Note: dictionary overflow must be handled.


Bit-plane coding

    Bit-plane decomposition: an m-bit graylevel can be written as

    $a_{m-1} 2^{m-1} + a_{m-2} 2^{m-2} + \cdots + a_1 2^1 + a_0 2^0$

    Binary: the bit change between two adjacent codes may be
    significant (e.g., 127 and 128).

    Gray code: only 1 bit changes between any two adjacent codes:

    $g_i = a_i \oplus a_{i+1}, \quad 0 \le i \le m-2$
    $g_{m-1} = a_{m-1}$

    Gray-coded bit planes are less complex than the corresponding
    binary bit planes.
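A two-line sketch of the Gray-code mapping (the XOR-shift form is equivalent to the bitwise definition above):

```python
def binary_to_gray(a):
    """g = a XOR (a >> 1): each g_i = a_i XOR a_{i+1}, top bit unchanged."""
    return a ^ (a >> 1)

# 127 and 128 differ in all 8 binary bits, but in only 1 Gray-coded bit:
print(format(binary_to_gray(127), '08b'))  # 01000000
print(format(binary_to_gray(128), '08b'))  # 11000000
```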

    (Figure: binary-coded vs. gray-coded bit planes)


Lossless compression for binary images

    1-D run-length coding: RLC + VLC according to the run-length
    statistics

    2-D run-length coding: used for FAX image compression

    Relative address coding (RAC): based on the principle of tracking
    the binary transitions that begin and end each black and white
    run, combined with VLC


Other methods for lossless compression of binary images

    White block skipping: code solid white lines as 0s and all other
    lines with a 1 followed by the original bit pattern

    Direct contour tracing: represent each contour by a single
    boundary point and a set of directionals

    Predictive differential quantizing (PDQ): a scan-line-oriented
    contour tracing procedure

    Comparison of various algorithms: only the entropies after pixel
    mapping are computed, instead of real encoding (see the next
    slide)


Comparison of various methods

    Run-length coding proved to be the best coding method for
    bit-plane coded images.

    2-D techniques (PDQ, DDC, RAC) perform better when compressing
    binary images or higher bit-plane images.

    Gray-coded images proved to gain an additional 1 bit/pixel of
    compression efficiency relative to binary-coded images.


Lossless predictive coding

    Eliminates the interpixel redundancies of closely spaced pixels by
    extracting and coding only the new information (the difference
    between the actual and the predicted value) in each pixel:

    $e_n = f_n - \hat{f}_n$

    System architecture:

    the same predictor on the encoder and decoder sides

    rounding of the predicted value

    RLC symbol encoder


Methods of prediction

    Use a local, global, or adaptive predictor to generate $\hat{f}_n$.

    Linear prediction: a linear combination of the m previous pixels

    $\hat{f}_n = \mathrm{round}\left[ \sum_{i=1}^{m} \alpha_i f_{n-i} \right]$

    m: order of prediction; $\alpha_i$: prediction coefficients

    Use local neighborhoods (e.g., pixels 1, 2, 3, 4) for the
    prediction of pixel X in 2-D images:

        2 3 4
        1 X

    Special case: previous-pixel predictor.
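A sketch of the previous-pixel special case (first-order prediction along each row; the first column is transmitted verbatim):

```python
import numpy as np

def predictive_encode(img, alpha=1.0):
    """e_n = f_n - round(alpha * f_{n-1}) along each row."""
    img = img.astype(np.int32)
    err = img.copy()
    err[:, 1:] = img[:, 1:] - np.round(alpha * img[:, :-1]).astype(np.int32)
    return err

def predictive_decode(err, alpha=1.0):
    """The decoder runs the identical predictor on its own reconstruction."""
    img = err.copy()
    for x in range(1, err.shape[1]):
        img[:, x] = err[:, x] + np.round(alpha * img[:, x - 1]).astype(np.int32)
    return img

f = np.random.randint(0, 256, (4, 8))
assert np.array_equal(predictive_decode(predictive_encode(f)), f)  # lossless
```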


Entropy reduction by difference mapping

    Due to the removal of interpixel redundancies by prediction, the
    first-order entropy of the difference mapping is lower than that
    of the original image (3.96 bits/pixel vs. 6.81 bits/pixel).

    The probability density function of the prediction errors is
    highly peaked at zero and characterized by a relatively small
    variance, modeled by the zero-mean, uncorrelated Laplacian pdf:

    $p(e) = \frac{1}{\sqrt{2}\,\sigma_e}\, e^{-\sqrt{2}\,|e| / \sigma_e}$


Lossy predictive coding

    Error-free encoding of images seldom results in more than a 3:1
    reduction in data.

    A lossy predictive coding model: the predictors in the encoder and
    decoder must be equivalent (the same) -- achieved by placing the
    encoder's predictor within a feedback loop:

    $\dot{f}_n = \dot{e}_n + \hat{f}_n$

    This closed-loop configuration prevents error buildup at the
    decoder's output.


Delta modulation (DM)

    DM:

    $\hat{f}_n = \alpha \dot{f}_{n-1}$

    $\dot{e}_n = +\zeta$ for $e_n > 0$; $-\zeta$ otherwise

    The resulting bit rate is 1 bit/pixel.

    Slope overload effect: $\zeta$ is too small to represent the
    input's large changes, leading to blurred object edges.

    Granular noise effect: $\zeta$ is too large to represent the
    input's small changes, leading to grainy or noisy surfaces.


    Delta modulation (cont)


Optimal predictors

    Minimize the encoder's mean-square prediction error

    $E\{e_n^2\} = E\{ [f_n - \hat{f}_n]^2 \}$

    with $\dot{f}_n = \dot{e}_n + \hat{f}_n \approx e_n + \hat{f}_n = f_n$
    and $\hat{f}_n = \sum_{i=1}^{m} \alpha_i f_{n-i}$ (DPCM).

    Select the m prediction coefficients to minimize

    $E\{e_n^2\} = E\left\{ \left[ f_n - \sum_{i=1}^{m} \alpha_i f_{n-i} \right]^2 \right\}$

    The solution: $\boldsymbol{\alpha} = R^{-1} \mathbf{r}$, where
    $\boldsymbol{\alpha} = [\alpha_1, \alpha_2, \ldots, \alpha_m]^T$,

    $R = \begin{bmatrix} E\{f_{n-1} f_{n-1}\} & \cdots & E\{f_{n-1} f_{n-m}\} \\ \vdots & & \vdots \\ E\{f_{n-m} f_{n-1}\} & \cdots & E\{f_{n-m} f_{n-m}\} \end{bmatrix}, \quad \mathbf{r} = \begin{bmatrix} E\{f_n f_{n-1}\} \\ E\{f_n f_{n-2}\} \\ \vdots \\ E\{f_n f_{n-m}\} \end{bmatrix}$


Optimal predictors (cont.)

    Variance of the prediction error:

    $\sigma_e^2 = \sigma^2 - \boldsymbol{\alpha}^T \mathbf{r} = \sigma^2 - \sum_{i=1}^{m} \alpha_i E\{f_n f_{n-i}\}$

    For a 2-D Markov source with separable autocorrelation function

    $E\{f(x,y) f(x-i, y-j)\} = \sigma^2 \rho_v^i \rho_h^j$

    and a 4th-order linear predictor

    $\hat{f}(x,y) = \alpha_1 f(x,y-1) + \alpha_2 f(x-1,y-1) + \alpha_3 f(x-1,y) + \alpha_4 f(x+1,y-1)$

    the optimal coefficients are $\alpha_1 = \rho_h$,
    $\alpha_2 = -\rho_v \rho_h$, $\alpha_3 = \rho_v$, $\alpha_4 = 0$.

    We generally have $\sum_{i=1}^{m} \alpha_i \le 1$ to reduce the
    DPCM decoder's susceptibility to transmission noise (this confines
    the impact of an input error to a small number of outputs).


Examples of predictors

    Four examples:

    $\hat{f}(x,y) = 0.97 f(x,y-1)$

    $\hat{f}(x,y) = 0.5 f(x,y-1) + 0.5 f(x-1,y)$

    $\hat{f}(x,y) = 0.75 f(x,y-1) + 0.75 f(x-1,y) - 0.5 f(x-1,y-1)$

    $\hat{f}(x,y) = 0.97 f(x,y-1)$ if $\Delta h \le \Delta v$, and
    $0.97 f(x-1,y)$ otherwise (adaptive, with a local measure of the
    directional property)

    The visually perceptible error decreases as the order of the
    predictor increases.


Optimal quantization

    The staircase quantization function is an odd function that can be
    described by the decision levels ($s_i$) and the reconstruction
    levels ($t_i$).

    Quantizer design is to select the best $s_i$ and $t_i$ for a
    particular optimization criterion and input probability density
    function $p(s)$.


L-level Lloyd-Max quantizer

    Optimal in the mean-square error sense:

    $\int_{s_{i-1}}^{s_i} (s - t_i)\, p(s)\, ds = 0, \quad i = 1, 2, \ldots, L/2$

    $s_i = \begin{cases} 0 & i = 0 \\ (t_i + t_{i+1})/2 & i = 1, 2, \ldots, L/2 - 1 \\ \infty & i = L/2 \end{cases}$

    $s_{-i} = -s_i, \quad t_{-i} = -t_i$

    $t_i$ is the centroid of each decision interval; the decision
    levels are halfway between the reconstruction levels -> a
    non-uniform quantizer.

    How about an optimal uniform quantizer?
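A sketch of the two Lloyd-Max conditions iterated on sampled data -- an empirical stand-in for integrating a known pdf p(s):

```python
import numpy as np

def lloyd_max(samples, L, iters=100):
    """Alternate the optimality conditions: t_i = centroid of its decision
    interval; s_i = midpoint between adjacent reconstruction levels."""
    t = np.linspace(samples.min(), samples.max(), L)  # initial levels
    for _ in range(iters):
        s = (t[:-1] + t[1:]) / 2            # decision levels (midpoints)
        idx = np.digitize(samples, s)       # assign samples to intervals
        for i in range(L):
            if np.any(idx == i):
                t[i] = samples[idx == i].mean()  # centroid update
    return s, t

rng = np.random.default_rng(0)
e = rng.laplace(0.0, 4.0, 10000)            # Laplacian-like prediction errors
s, t = lloyd_max(e, 4)                      # a non-uniform 4-level quantizer
print("decision:", s, "reconstruction:", t)
```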


Transform coding

    Subimage decomposition

    Transformation: decorrelates the pixels of each subimage (energy
    packing)

    Quantization: eliminates the coefficients that carry the least
    information

    Coding


Transform selection

    Requirements:

    orthonormal or unitary forward and inverse transformation kernels

    basis functions or basis images

    separable and symmetric kernels

    Types: DFT, DCT, WHT (Walsh-Hadamard transform)

    Compression is achieved during the quantization of the transformed
    coefficients (not during the transformation step).


WHT

    The summation in the exponent is performed in modulo-2 arithmetic;
    $b_k(z)$ is the k-th bit (from right to left) in the binary
    representation of z:

    $g(x,y,u,v) = h(x,y,u,v) = \frac{1}{N} (-1)^{\sum_{i=0}^{m-1} [b_i(x) p_i(u) + b_i(y) p_i(v)]}, \quad N = 2^m$

    $T(u,v) = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)\, g(x,y,u,v)$

    $f(x,y) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} T(u,v)\, h(x,y,u,v)$


DCT

    $g(x,y,u,v) = h(x,y,u,v) = \alpha(u)\,\alpha(v) \cos\left[\frac{(2x+1) u \pi}{2N}\right] \cos\left[\frac{(2y+1) v \pi}{2N}\right]$

    $\alpha(u) = \begin{cases} \sqrt{1/N} & u = 0 \\ \sqrt{2/N} & u = 1, 2, \ldots, N-1 \end{cases}$
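A sketch of the separable 2-D DCT built directly from this kernel:

```python
import numpy as np

def dct_matrix(N=8):
    """C[u, x] = alpha(u) * cos((2x + 1) u pi / (2N)) -- an orthonormal kernel."""
    C = np.zeros((N, N))
    for u in range(N):
        alpha = np.sqrt(1.0 / N) if u == 0 else np.sqrt(2.0 / N)
        for x in range(N):
            C[u, x] = alpha * np.cos((2*x + 1) * u * np.pi / (2 * N))
    return C

def dct2(block):
    """Separable 2-D DCT: T = C f C^T; the inverse is f = C^T T C."""
    C = dct_matrix(block.shape[0])
    return C @ block @ C.T

f = np.random.rand(8, 8)
T = dct2(f)
C = dct_matrix(8)
assert np.allclose(C.T @ T @ C, f)   # orthonormality: perfect inversion
```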


Approximations using DFT, WHT, and DCT

    Truncating 50% of the transformed coefficients based on maximum
    magnitudes.

    The RMS error values are 1.28 (DFT), 0.86 (WHT), and 0.68 (DCT).


Basis images

    The subimage is a linear combination of basis images:

    $\mathbf{F} = \sum_{u=0}^{n-1} \sum_{v=0}^{n-1} T(u,v)\, \mathbf{H}_{uv}$

    where $\mathbf{H}_{uv}$ is the $n \times n$ basis image with
    elements $h(x, y, u, v)$.


Subimage size selection

    The most popular subimage sizes are 8x8 and 16x16.

    A comparison of n x n subimages for n = 2, 4, 8, 16, truncating
    75% of the resulting coefficients:

    The curves of WHT and DCT flatten as the subimage size becomes
    greater than 8x8, while that of the DFT does not.

    The blocking effect decreases as the subimage size increases.


    (Figure: original vs. 2x2, 4x4, and 8x8 subimage reconstructions)


Zonal vs. threshold coding

    The coefficients are retained based on:

    maximum variance -> zonal coding

    maximum magnitude -> threshold coding

    Zonal coding

    Each DCT coefficient is considered a random variable whose
    distribution can be computed over the ensemble of all transformed
    subimages.

    The variances can also be based on an assumed image model (e.g.,
    a Markov autocorrelation function).

    The coefficients of maximum variance are usually located around
    the origin.

    Threshold coding

    Select the coefficients that have the largest magnitudes.

    Causes far less error than the zonal coding result.


Bit allocation for zonal coding

    The retained coefficients are allocated bits according to:

    The DC coefficient is modeled by a Rayleigh density function.

    The AC coefficients are often modeled by a Laplacian or Gaussian
    density.

    The number of bits is made proportional to
    $\log_2 \sigma^2_{T(u,v)}$ (the information content of a Gaussian
    random variable is proportional to $\log_2(\sigma^2 / D)$).

    (Figure: zonal coding vs. threshold coding masks)


Bit allocation in threshold coding

    The transform coefficients of largest magnitude make the most
    significant contribution to the reconstructed subimage quality.

    Inherently adaptive, in the sense that the locations of the
    retained transform coefficients vary from one subimage to another.

    The retained coefficients are reordered in a 1-D zigzag pattern.

    The mask pattern is run-length coded.
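A compact way to generate the zigzag ordering (anti-diagonals traversed in alternating directions):

```python
def zigzag_order(n=8):
    """Index pairs (u, v) of an n x n block in 1-D zigzag order."""
    return sorted(((u, v) for u in range(n) for v in range(n)),
                  key=lambda p: (p[0] + p[1],                # anti-diagonal
                                 p[0] if (p[0] + p[1]) % 2   # odd: downward
                                 else -p[0]))                # even: upward

order = zigzag_order()
print(order[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```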


    (Figure: zonal mask and bit allocation; thresholded and retained
    coefficients; zigzag ordering pattern)


Bit allocation in threshold coding (cont.)

    Thresholding the transform coefficients -- the threshold can be
    varied as a function of the location of each coefficient within
    the subimage.

    This results in a variable code rate, but offers the advantage
    that thresholding and quantization can be combined, by replacing
    the masking $\gamma(u,v)\, T(u,v)$ with the normalized (quantized)
    value

    $\hat{T}(u,v) = \mathrm{round}\left[ \frac{T(u,v)}{Z(u,v)} \right]$

    where Z(u,v) is the transform normalization array. The recovered
    (denormalized) value is

    $\dot{T}(u,v) = \hat{T}(u,v)\, Z(u,v)$

    The elements Z(u,v) can be scaled to achieve a variety of
    compression levels.
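A sketch of normalization and denormalization with the well-known JPEG luminance array for Z (the standard's example table):

```python
import numpy as np

Z = np.array([[16, 11, 10, 16,  24,  40,  51,  61],
              [12, 12, 14, 19,  26,  58,  60,  55],
              [14, 13, 16, 24,  40,  57,  69,  56],
              [14, 17, 22, 29,  51,  87,  80,  62],
              [18, 22, 37, 56,  68, 109, 103,  77],
              [24, 35, 55, 64,  81, 104, 113,  92],
              [49, 64, 78, 87, 103, 121, 120, 101],
              [72, 92, 95, 98, 112, 100, 103,  99]])

def normalize(T, scale=1.0):
    """T_hat(u,v) = round(T(u,v) / Z(u,v)); scaling Z trades rate for quality."""
    return np.round(T / (scale * Z)).astype(int)

def denormalize(T_hat, scale=1.0):
    """T_dot(u,v) = T_hat(u,v) * Z(u,v), the decoder's recovered coefficients."""
    return T_hat * (scale * Z)
```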

Normalization (quantization) matrix

    Used in the JPEG standard.


Wavelet coding

    Requirements in encoding an image:

    an analyzing wavelet

    the number of decomposition levels

    The discrete wavelet transform (DWT) converts a large portion of
    the original image to horizontal, vertical, and diagonal
    decomposition coefficients with zero mean and Laplacian-like
    distributions.


Wavelet coding (cont.)

    Many computed coefficients can be quantized and coded to minimize
    intercoefficient and coding redundancy.

    The quantization can be adapted to exploit any positional
    correlation across the different decomposition levels.

    Subdivision of the original image is unnecessary, eliminating the
    blocking artifact that characterizes DCT-based approximations at
    high compression ratios.


1-D subband decomposition

    Analysis filters: low-pass $h_0(n)$ (approximation), high-pass
    $h_1(n)$ (detail)

    Synthesis filters: low-pass $g_0(n)$, high-pass $g_1(n)$

    2-D Subband image coding


    (Figure: four-band filter bank)

An example of Daubechies orthonormal filters (figure)

Comparison between DCT-based and DWT-based coding

    A noticeable decrease of error in the wavelet coding results
    (based on compression ratios of 34:1 and 67:1):

    DWT-based rms error: 2.29, 2.96

    DCT-based rms error: 3.42, 6.33


    Rms error: 3.72, 4.73 (based on compression ratios of 108:1 and
    167:1).

    Even the 167:1 DWT result (rms = 4.73) is better than the 67:1
    DCT result (rms = 6.33).

    At more than twice the compression ratio, the wavelet-based
    reconstruction has only 75% of the error of the DCT-based result.


Wavelet selection

    The wavelet selection affects all aspects of wavelet coding system
    design and performance:

    the computational complexity of the transform

    the system's ability to compress and reconstruct images with
    acceptable error

    It includes the decomposition filters and the reconstruction
    filters.

    Useful analysis property: the number of zero moments.

    Important synthesis property: the smoothness of the
    reconstruction.

    The most widely used expansion functions:

    Daubechies wavelets

    Biorthogonal wavelets: allow filters with binary coefficients
    (numbers of the form k/2^a)


Comparison between different wavelets

    (1) Haar wavelets: the simplest

    (2) Daubechies wavelets: the most popular imaging wavelets

    (3) Symlets: an extension of Daubechies wavelets with increased
    symmetry

    (4) Cohen-Daubechies-Feauveau wavelets: biorthogonal wavelets


Comparison between different wavelets (cont.)

    Comparisons in the number of operations required (for the analysis
    and reconstruction filters, respectively).

    As the computational complexity increases, the information-packing
    ability does as well.


Other issues

    The number of decomposition levels required: the initial
    decompositions are responsible for the majority of the data
    compression (3 levels is enough).

    Quantizer design:

    introduce an enlarged quantization interval around zero (i.e., the
    dead zone)

    adapt the size of the quantization interval from scale to scale

    quantization of coefficients at deeper decomposition levels
    impacts larger areas of the reconstructed image


Binary image compression standards

    Most of the standards are issued by:

    International Standardization Organization (ISO)

    Consultative Committee of the International Telephone and
    Telegraph (CCITT)

    CCITT Group 3 and 4 are for binary image compression:

    originally designed as facsimile (FAX) coding methods

    G3: nonadaptive, 1-D run-length coding

    G4: a simplified or streamlined version of G3, with 2-D coding
    only

    the coding approach is quite similar to the RAC method

    Joint Bilevel Imaging Group (JBIG): a joint committee of CCITT
    and ISO

    JBIG1: adaptive arithmetic compression technique (the best average
    and worst-case performance available)

    JBIG2: achieves compression 2 to 4 times greater than JBIG1


Continuous-tone image compression -- JPEG

    JPEG, wavelet-based JPEG-2000, JPEG-LS

    JPEG defines three coding systems:

    lossy baseline coding system

    extended coding system for greater compression, higher precision,
    or progressive reconstruction applications

    lossless independent coding system

    A product or system must include support for the baseline system.

    Baseline system:

    input, output: 8 bits; quantized DCT values: 11 bits

    the image is subdivided into blocks of 8x8 pixels for encoding

    pixels are level-shifted by subtracting 128 graylevels


JPEG (cont.)

    DCT-transformed.

    Quantized by using the quantization (normalization) matrix.

    The DC DCT coefficient is difference-coded relative to the DC
    coefficient of the previous block.

    The nonzero AC coefficients are coded with a VLC that defines the
    coefficient's value and the number of preceding zeros.

    The user is free to construct custom tables and/or arrays, which
    may be adapted to the characteristics of the image being
    compressed.

    A special EOB (end of block) code is added after the last nonzero
    AC coefficient.


Huffman coding for the DC and AC coefficients

    VLC = base code (category code) + value code

    An example: encode DC = -9

    category 4: -15,...,-8, 8,...,15 (base code = 101)

    total length = 7 -> value code length = 4

    For a category K, K additional bits are needed as the value code,
    computed as either the K LSBs of the value (positive) or the K
    LSBs of the 1's complement of its absolute value (negative):
    K = 4 -> value code = 0110

    complete code: 101 0110

    An example: encode AC = (0, -3)

    length of the zero run: 0; AC coefficient value: -3

    AC = -3 -> category = 2 -> 0/2 (Table 8.19) -> base code = 01

    value code = 00

    complete code: 0100
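A sketch of the category/value-code rule used in both examples:

```python
def category(v):
    """JPEG category K = number of bits needed for |v| (e.g., -9 -> 4)."""
    return abs(v).bit_length()

def value_code(v):
    """K LSBs of v if positive; K LSBs of the 1's complement of |v| if negative."""
    K = category(v)
    bits = v if v >= 0 else (~abs(v)) & ((1 << K) - 1)
    return format(bits, '0%db' % K)

print(category(-9), value_code(-9))  # 4 '0110'
print(category(-3), value_code(-3))  # 2 '00'
```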


A JPEG coding example

    Original 8x8 block:

     52  55  61  66  70  61  64  73
     63  59  66  90 109  85  69  72
     62  59  68 113 144 104  66  73
     63  58  71 122 154 106  70  69
     67  61  68 104 126  88  68  70
     79  65  60  70  77  68  58  75
     85  71  64  59  55  61  65  83
     87  79  69  68  65  76  78  94

    Level-shifted (subtract 128):

    -76 -73 -67 -62 -58 -67 -64 -55
    -65 -69 -62 -38 -19 -43 -59 -56
    -66 -69 -60 -15  16 -24 -62 -55
    -65 -70 -57  -6  26 -22 -58 -59
    -61 -67 -60 -24  -2 -40 -60 -58
    -49 -63 -68 -58 -51 -65 -70 -53
    -43 -57 -64 -69 -73 -67 -63 -45
    -41 -49 -59 -60 -63 -52 -50 -34

    DCT-transformed:

    -415  -29  -62   25   55  -20   -1    3
       7  -21  -62    9   11   -7   -6    6
     -46    8   77  -25  -30   10    7   -5
     -50   13   35  -15   -9    6    0    3
      11   -8  -13   -2   -1    1   -4    1
     -10    1    3   -3   -1    0    2   -1
      -4   -1    2   -1    2   -3    1   -2
      -1    1   -1   -2   -1   -1    0   -1


    Quantized:

    -26  -3  -6   2   2   0   0   0
      1  -2  -4   0   0   0   0   0
     -3   1   5  -1  -1   0   0   0
     -4   1   2  -1   0   0   0   0
      1   0   0   0   0   0   0   0
      0   0   0   0   0   0   0   0
      0   0   0   0   0   0   0   0
      0   0   0   0   0   0   0   0

    1-D zigzag scanning:
    [-26 -3 1 -3 -2 -6 2 -4 1 -4 1 1 5 0 2 0 0 -1 2 0 0 0 0 0 -1 -1 EOB]

    Symbols to be coded: (DPCM = -9, assumed), (0,-3), (0,1), (0,-3),
    (0,-2), (0,-6), (0,2), (0,-4), (0,1), (0,-4), (0,1), (0,1), (0,5),
    (1,2), (2,-1), (0,2), (5,-1), (0,-1), EOB

    Final codes:
    1010110, 0100, 001, 0100, 0101, 100001, 0110, 100011, 001, 100011,
    001, 001, 100101, 11100110, 110110, 0110, 11110100, 000, 1010


    JPEG 2000


JPEG-2000

    Increased flexibility in accessing the compressed data: portions
    of a JPEG 2000 compressed image can be extracted for
    re-transmission, storage, display, and editing.

    Procedures:

    DC level shift by subtracting 128.

    Optional decorrelation using a reversible or non-reversible linear
    combination of the components (e.g., RGB to YCrCb component
    transformation; i.e., both RGB-based and YCrCb-based JPEG-2000
    coding are allowed):

    $Y_0(x,y) = 0.299\, I_0(x,y) + 0.587\, I_1(x,y) + 0.114\, I_2(x,y)$
    $Y_1(x,y) = -0.16875\, I_0(x,y) - 0.33126\, I_1(x,y) + 0.5\, I_2(x,y)$
    $Y_2(x,y) = 0.5\, I_0(x,y) - 0.41869\, I_1(x,y) - 0.08131\, I_2(x,y)$

    Y1 and Y2 are difference images, highly peaked around zero.


JPEG-2000 (cont.)

    Optionally divided into tiles (rectangular arrays of pixels that
    can be extracted and reconstructed independently), providing a
    mechanism for accessing a limited region of a coded image.

    A 2-D DWT using a biorthogonal 5-3 coefficient filter (lossless)
    or a 9-7 coefficient filter (lossy) produces a low-resolution
    approximation and horizontal, vertical, and diagonal frequency
    components (LL, HL, LH, HH bands).


    9-7 filter for lossy DWT; 5-3 filter for lossless DWT.

    5-3 analysis filter taps:

    i     H0 (lowpass)    H1 (highpass)
    0     6/8             1
    +-1   2/8             -1/2
    +-2   -1/8


JPEG-2000 (cont.)

    A fast DWT algorithm or a lifting-based approach is often used.

    Iterative decomposition of the approximation part is used to
    obtain further scales: important visual information is
    concentrated in a few coefficients.

    Difference from subband decomposition: the standard does not
    specify the number of scales (i.e., the number of decomposition
    levels) to be computed.



JPEG-2000 (cont.)

    Coefficient quantization is adapted to individual scales and
    subbands:

    $q_b(u,v) = \mathrm{sign}[a_b(u,v)]\, \mathrm{floor}\!\left[ \frac{|a_b(u,v)|}{\Delta_b} \right]$

    quantization step size: $\Delta_b = 2^{R_b - \varepsilon_b} \left( 1 + \frac{\mu_b}{2^{11}} \right)$

    $R_b$ is the nominal dynamic range (in bits) of subband b;
    $\varepsilon_b$ and $\mu_b$ are the numbers of bits allocated to
    the exponent and mantissa of the subband's coefficients.

    The numbers of exponent and mantissa bits must be provided to the
    decoder on a subband basis (explicit quantization) or for the LL
    subband only (implicit quantization: the remaining subbands are
    quantized using extrapolated LL subband parameters).

    For error-free compression, $\mu_b = 0$, $R_b = \varepsilon_b$,
    and $\Delta_b = 1$.


JPEG-2000 (cont.)

    Quantized coefficients are arithmetically coded on a bit-plane
    basis.

    The coefficients are arranged into rectangular blocks called code
    blocks, which are individually coded one bit plane at a time,
    starting from the most significant bit plane with nonzero elements
    down to the least significant bit plane.

    Encoding uses a context-based arithmetic coding method.

    The coding outputs from each code block are grouped to form
    layers.

    The resulting layers are finally partitioned into packets.

    The encoder can encode M_b bit planes for a particular subband
    while the decoder decodes only N_b of them, due to the embedded
    nature of the code stream.


    (Figure: the bit-streams of code blocks 1-7, spread across
    decomposition levels N-1, N, and N+1, are partitioned into layers
    1-5; some layer slots are empty. The layer axis corresponds to
    R-D / quality, the decomposition-level axis to resolution.)


Video compression -- motion-compensated transform coding

    Standards:

    video teleconferencing standards (H.261, H.263, H.263+, H.320,
    H.264)

    multimedia standards (MPEG-1/2/4)

    A combination of predictive coding (in the temporal domain) and
    transform coding (in the spatial domain).

    Estimate (predict) the image by simply propagating pixels in the
    previous frame along their motion trajectories -> motion
    compensation.

    The success depends on the accuracy, speed, and robustness of the
    displacement estimator -> motion estimation.

    Motion estimation is based on image blocks (16x16 or 8x8 pixels).


Motion-compensated transform coding -- encoder (figure)


Motion-compensated transform coding -- decoder

    (Figure: compressed bit stream -> BUF -> VLD -> IT -> (+) ->
    reconstructed frame; the MC predictor, driven by the motion
    vectors, feeds the adder.)

    IT: inverse DCT transform

    VLD: variable-length decoding


2-D motion estimation techniques

    Assumptions:

    rigid translational movements only

    uniform illumination in time and space

    Methods:

    optical flow approach

    pel-recursive approach

    block-matching approach

    Different from motion estimation in target tracking:

    causal operations (forward prediction; no future frames at the
    decoder)

    goal: reproduce pictures with minimum distortion, not necessarily
    estimate accurate motion parameters


Block-matching (BM) motion estimation

    Find the best match between the current image block and candidate
    blocks in the previous frame by minimizing the cost

    $\sum_{(x,y) \in block} \left| f_n(x,y) - f_{n-1}(x + mv_x,\, y + mv_y) \right|$

    One motion vector (MV) $(mv_x, mv_y)$ per block between successive
    frames.

    (Figure: a block in frame k matched within a search window in
    frame k-1)
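A full-search sketch of this minimization (SAD cost over integer displacements in a +/-w window; the function name and parameters are illustrative):

```python
import numpy as np

def full_search(cur, prev, bx, by, B=16, w=7):
    """Exhaustively minimize the sum of absolute differences (SAD) between the
    current BxB block at (bx, by) and displaced blocks in the previous frame."""
    block = cur[by:by + B, bx:bx + B].astype(np.int32)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            y, x = by + dy, bx + dx
            if 0 <= y and 0 <= x and y + B <= prev.shape[0] and x + B <= prev.shape[1]:
                sad = np.abs(block - prev[y:y + B, x:x + B].astype(np.int32)).sum()
                if best_sad is None or sad < best_sad:
                    best_sad, best_mv = sad, (dx, dy)
    return best_mv, best_sad
```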


BM motion estimation techniques

    Full search method

    Fast search (reduced search):

    usually a multistage search procedure that evaluates fewer search
    points

    local (or sub-) optima in general

    evaluation: number of search points, noise immunity

    Existing algorithms:

    three-step method

    four-step method

    log-search method

    hierarchical matching method

    prediction search method

    Kalman prediction method


Three-step search

    The full search examines 15x15 = 225 positions; the three-step
    search examines 9 + 8 + 8 = 25 positions (assuming an MV range of
    -7 to +7).

    (Figure: search range)
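A sketch of the three-step refinement (step sizes 4, 2, 1 re-centered on the current best point; sad() is a hypothetical helper like the one in the full-search sketch):

```python
import numpy as np

def sad(cur, prev, bx, by, dx, dy, B=16):
    """SAD between the current block and a (dx, dy)-displaced previous block."""
    y, x = by + dy, bx + dx
    if y < 0 or x < 0 or y + B > prev.shape[0] or x + B > prev.shape[1]:
        return np.inf
    return np.abs(cur[by:by + B, bx:bx + B].astype(np.int32)
                  - prev[y:y + B, x:x + B].astype(np.int32)).sum()

def three_step_search(cur, prev, bx, by, B=16):
    """3x3 pattern with steps 4, 2, 1: 9 + 8 + 8 = 25 distinct positions
    (the re-used center is simply re-evaluated here for simplicity)."""
    mvx, mvy = 0, 0
    for step in (4, 2, 1):
        candidates = [(mvx + sx * step, mvy + sy * step)
                      for sy in (-1, 0, 1) for sx in (-1, 0, 1)]
        mvx, mvy = min(candidates,
                       key=lambda mv: sad(cur, prev, bx, by, mv[0], mv[1], B))
    return mvx, mvy
```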


Predictive search

    Exploits the similarity of MVs between adjacent macroblocks.

    Obtain an initially predicted MV from the neighboring macroblocks
    by averaging: MVpredict = (MV1 + MV2 + MV3) / 3

    Prediction is followed by a small-area detailed search.

    Significantly reduces the CPU time, by 4~10x.

    (Figure: search area centered on the predicted MV, with MV1, MV2,
    MV3 taken from neighboring macroblocks)


Fractional-pixel motion estimation

    Question: what happens if motion vectors are real numbers instead
    of integers?

    Answer: find the integer motion vector first and then interpolate
    from the surrounding pixels for a refined search. The most popular
    case is half-pixel accuracy (i.e., 1 out of 8 positions for
    refinement). 1/3-pixel accuracy is proposed in the H.263L
    standard.

    Half-pixel accuracy often leads to a match with a smaller residue
    (higher reconstruction quality), hence decreasing the bit rate
    required in the subsequent encoding.

    (Figure: the integer MV position $(mv_x, mv_y)$ and the candidate
    half-pixel MV positions around it)


Evaluation of BM motion estimation

    Advantages:

    straightforward, regular operation, promising for VLSI chip design

    robustness (no differential operations as in the optical flow
    approach)

    Disadvantages:

    cannot handle several moving objects within one block (e.g.,
    around occlusion boundaries)

    Improvements:

    post-processing of images to eliminate the blocking effect

    interpolation of images or averaging of motion vectors to reach
    subpixel accuracy


Image prediction structure

    I/P frames only (H.261, H.263):

    I  P  P  P  P  P  . . .        (forward prediction)

    I/P/B frames (MPEG-x):

    I B B B P B B B P B B B P B . . .
    (forward prediction + bidirectional prediction)

    I: intra-coded frame; P: predictive frame; B: bidirectional frame