41
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 1 Introduction Much of the information is in form of images Images are handled by machines as a matrix of digital picture elements, or pixels The appearance of an image depends on image type resolution

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Embed Size (px)

Citation preview

Page 1: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 1

Introduction

Much of the information is in form of images Images are handled by machines as a matrix

of digital picture elements, or pixels The appearance of an image depends on

image type resolution

Page 2: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 2

Types of images & Resolution

bilevel (black & white) e.g. faxes

grayscale color

dot per inches (dpi) 600 x 600 – actual medium quality laser printer 1200 x 1200 – low cost phototypesetter 4800 x 4800 – high resolution phototypesetter

Page 3: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 3

Bilevel images: CCITT fax standard

fax: facsimile CCITT Comité Consultatif International Téléphonique et

Télégraphique, it is part of the ITU International

Telecommunication Union, one of the specialized agencies of the United Nations

In the late 70s CCITT starts thinking about a standard for fax transmission

1980 CCITT Group 3 standard group 1 & 2 are earlier attempt, which use simpler

encoding and modulations techniques, resulting in very slow transmissions

Page 4: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 4

CCITT Group 3 - I

It is the most common standard for fax transmission

It is accepted worldwide, almost every fax machine supports this standard

It uses compression algorithms for bilevel images

Page 5: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

5

CCITT Group 3 - II Paper size: international A4 (not US letter) standard resolution 204x98 dpi (200x100) high resolution 204x196 dpi (200x200)

1728 bits/line

1188 lines/page

bilevel image 1 bit/pixel image size: 1728x1188 bits at standard

resolution about 2 Mbit Transmission rate: 4.8 Kbit/s

today is usually higher, 14.4 – 33.6 Kbit/s At 4.8Kbit/s in std resolution one page would

take about 430 sec, but only 1 minute on average with Group 3 algorithms

Page 6: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 6

Run-length coding

Each scan line is composed by sequences of pixel of the same color

Count the number of element of each run

Example 3w 4b 9w 2b 2w 6b 5w 2b 5w...

Page 7: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 7

G3 1D

Group 3 One-Dimensional coding (G3 1D) is called Modified Huffman (MH) as it encodes runlengths using a predefined Huffman code

In order to maintain black/white syncronization, each line begins with a white run, eventually of zero length

Page 8: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 8

G3 1D

1000 011 10100 11 0111 0010 ...

predefined Huffman codewords have been found from the probabilities of the runs in typical handwritten documents

Page 9: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 9

G3 1D As one line has 1728 bits, we have to define a

codeword for all 1728 black and white run lengths

As shorter runs occur more frequently that longer runs, we code each run length in an additive form

there is a terminating and makeup codeword Lengths form 0 to 63 are coded with a single terminating

codeword Longer runs are coded with one or more makeup

codewords and a terminating codeword Each line is terminated with a EOL symbol composed of

eleven 0 and one 1

Page 10: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

10

G3 2D Group 3 Two-Dimensional coding (G3 2D) is

called Modified READ (MR) as it is a variant of a previously defined code, called READ (Relative Element Address Designate)

Many images have a high degree of vertical coherence between consecutive lines

changing elements are coded w.r.t. a “nearby” change position of the same color in the previous (reference) line

Page 11: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 11

G3 2D

Nearby means within an interval of radius 3 pixels

If there are changing elements in the current line without correspondents in the reference line switch to horizontal mode (1D)

On the opposite if the ref line has a run with no counterpart in the current line special pass code

Page 12: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

12

G3 2D

current line

reference line

generated code

vertical mode

-1 0

horizontal mode

pass code

from a Huffman table, with codewords for -3, -2, -1, 0, +1, +2, +3

<mode | length of preceding white run | length of black run>

0001

vertical mode

+2 -2

...

Page 13: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 13

G3 2D

Two dimensional coding is more prone to transmission errors

In the G3 1D an error may cause problems in the entire line, but syncronization is forced back by EOL codeword

Here an error in the reference line is likely propagated in all the other lines

For this reason there are 1 reference line for each k lines (i.e. k-1 are coded w.r.t. each ref line)

standard resolution k=2 high resolution k=4

Page 14: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

14

CCITT fax standard compression performances

Standard resolution (~200x100 dpi) G3 1D 0.13 bits/pixel 57s. for A4 at 4.8 Kbps G3 2D (k=2) 0.11 bits/pixel 47s. for A4 at 4.8 Kbps

High resolution (~200x200 dpi) G3 2D (k=4) 0.09 bits/pixel 74s. for A4 at 4.8 Kbps

Compression is very good for office image where run lengths are long

It would be very bad for bilevel natural images

Page 15: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 15

Continuous-tone images: why lossless compression?

lossy compression is often preferred to have remarkably more compressed images, with good quality

However there are some situations in which using an approximation may not be adequate

medical images historical documents images with legal relevance

Page 16: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

16

Continuous-tone images: lossless compression

GIF standard PNG standard JPEG-LS

It is a quite new standard. The original JPEG standard included a lossless mode, but its performances were not close to ‘state of the art’

extimation of pixel value using quite simple context: effective and low cost solution

www.hpl.hp.com/loco

Page 17: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 17

GIF image format - I

Adopted by CompuServe to minimize the time required to download images over a modem link

The most widely used lossless image format until 1995

8-bit pixel description 256 color images, but it is possible to use a

color map

Page 18: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 18

GIF image format - II

The color map can be specified for each image or can be omitted

if specified, it is included as an header into image file, in uncompressed form

color map is composed of 256 24-bit entries, that specify 256 RGB colors

Compression scheme used is LZW Alphabet symbols are the 256 colors of the color

map plus a “clear” code and an “end-of-information” code

Page 19: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

19

GIF image format - III Even if this feature is not widely used, GIF files

may contain more than one image, and it is possible to share the color map

LZW-coded information is grouped into blocks preceded by a byte-count, in order to skip an image without decompressing it

In 1995 Unisys announced that there would be royalties on GIF implementations due to an old patent they held on LZW

This catalyzed the development of a new lossless image format, designed for public domain and with the last improvements

Page 20: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 20

PNG image format - I Portable Network Graphics (pronounced “ping”) it uses gzip compression scheme through some improvements compression obtained

is about 10-30% better than GIF By default it encodes the pixels in raster scan order,

but some other methods are available it is possible to code horizontal difference, i.e. the

difference between current pixel value and the previous one or vertical difference, i.e. the difference w.r.t. the above pixel

average difference, the difference with the average of above and next pixel

...

Page 21: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 21

PNG image format - II

It is possible to use more than 256 colors, up to 16 bit grayscale and 48 bit color

GIF uses one special pixel value to indicate transparency, PNG uses 256 different values per pixel, allowing for picture progressively fading into the background

It seems inevitable that PNG format will gradually assume the role of standard lossless image format for the WWW, replacing GIF

Page 22: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 22

Continuous-tone images: why lossy compression?

Digital images are yet an approximation of the real analog phenomenon

lossy techniques allow to obtain very good compression with a modest lost of details

This is useful for storing and trasmitting images

Page 23: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 23

Continuous-tone images: lossy compression

JPEG JPEG2000

a new image coding system that uses state-of-the-art compression techniques based on wavelet technology

file extension .jp2 With very compressed files, if image size is

the same, perceived quality of JPEG2000 images is better w.r.t. JPEG images

Page 24: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 24

JPEG format - I

JPEG is a standard defined by the Joint Photographic Experts Group in 1992

It was conceived to transmit images at 64 Kbps

It has a lossy mode and a lossless mode (not so much used, and today replaced by the JPEG-LS standard)

With lossy mode it allows to obtain very good quality at about 1 bit/pixel

Implementation complexity is reasonable

Page 25: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 25

JPEG format - II

It could be used with graylevel and color images

Each channel of the color space (RGB, YUV...) is treated separately

it allows progressive transmission (that is much better suited for WWW than raster transmission)

Page 26: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Raster vs. progressive transmission

Page 27: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 27

JPEG Coder - I

Binary Binary EncoderEncoder

DiscreteDiscreteCosineCosine

TransformTransform

QuantizationQuantization

Page 28: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 28

JPEG Coder - II

Image is divided in 8x8-pixel squares Preprocessing Apply Discrete Cosine Transform on

each square Coefficient quantization Bit stream encoding

Page 29: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 29

Preprocessing: color space transformation & downsampling

from RGB into YUV The Y component represents the brightness

of a pixel, and the U and V components together represent the hue and saturation

Human eye can see more detail in the Y component than in the U and V, that can be compressed more aggressively

4:4:4 no downsampling 4:2:2 horizontal downsampling of a factor 2 4:2:0 both horizontal and vertical downsampling

Page 30: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 30

Discrete Cosine Transform - I

The discrete cosine transform (DCT) is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers

It is used in JPEG because it is fast and quite easy to implement efficiently

Page 31: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 31

Discrete Cosine Transform - II

where the block is pixels (in JPEG, 8x8) A(i,j) is the value of pixel of position (i,j) is the DCT coefficient of position low values for corresponds to low vertical

frequencies, low values for to low horizontal frequencies

Generally higher frequencies have very low values

1 2N N

1 2B(k ,k ) 1 2(k ,k )

1k2k

Page 32: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 32

Discrete Cosine Transform - III

DCT function basis

each 8x8 square is reduced to 64 coefficient

Page 33: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 33

Discrete Cosine Transform - IV

Knowing with infinite precision the 64 DCT coefficient it is possible to reconstruct exactly the pixels of the square

But finite precision quantization of the coefficients (always) Some coefficient related to high frequency are not

transmitted. This allows higher compression without sacrifying too much quality as human eye is less responsible

Page 34: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

34

Quantization - I

The DCT matrix obtained is scaled differently in each component, dividing each by a diferent factor

the factor for each component has been decided based on human sensitivity to changes at each frequency

In practice the matrix of factor is usually

Page 35: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 35

Quantization - II

Next, all values are rounded to nearest integer This leads to a quite high number of 0s in the

high frequency zone, as factors are bigger

Page 36: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 36

Zig-zag scan

Low frequency coefficients are transmitted before higher frequency coefficients

This allows for progressive visualization of this 8x8 block

Page 37: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 37

Raster vs. progressive transmission

Raster transmission DCT coefficient of the upper left block, then

those of all the others in the upper part of the image and so on

Progressive transmission first all (0,0) coefficients, than all (0,1) and

so on, following zig-zag scan in each block

Page 38: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 38

Binary coding

DCT(0,0) has usually a very slow variation from one block to the next, as it is the mean value

For this reason it is convenient to encode the difference from the previous value

Tipically the bit stream is coded with Huffman It is possible to use arithmetic scheme, gaining some

compression at cost of decoding speed Huffman codes are predefined, or it is possible

to build optimal tables and insert them in the stream

Page 39: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 39

JPEG Decoder

Some values are lost!

Binary Binary DecoderDecoder

Inverse DCTInverse DCT

DequantizationDequantization

Good quality, but reconstruction is not exact

Page 40: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 40

JPEG performances - I

Page 41: Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-20051 Introduction Much of the information is in form of images Images are handled by

41

JPEG performances - IIOriginal Quality factor 75

Quality factor 20 Quality factor 3