Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 1
Introduction
Much information is in the form of images
Images are handled by machines as a matrix of digital picture elements, or pixels
The appearance of an image depends on
image type
resolution
Types of images & Resolution
Image types
bilevel (black & white), e.g. faxes
grayscale
color
Resolution: dots per inch (dpi)
600 x 600: current medium-quality laser printer
1200 x 1200: low-cost phototypesetter
4800 x 4800: high-resolution phototypesetter
Bilevel images: CCITT fax standard
fax: facsimile
CCITT: Comité Consultatif International Téléphonique et Télégraphique. It is part of the ITU (International Telecommunication Union), one of the specialized agencies of the United Nations
In the late 1970s the CCITT started working on a standard for fax transmission
1980: CCITT Group 3 standard
Groups 1 and 2 were earlier attempts that used simpler encoding and modulation techniques, resulting in very slow transmission
CCITT Group 3 - I
It is the most common standard for fax transmission
It is accepted worldwide: almost every fax machine supports it
It uses compression algorithms for bilevel images
CCITT Group 3 - II
Paper size: international A4 (not US letter)
Standard resolution: 204x98 dpi (nominally 200x100)
High resolution: 204x196 dpi (nominally 200x200)
1728 bits/line
1188 lines/page
Bilevel image: 1 bit/pixel
Image size: 1728x1188 bits at standard resolution, about 2 Mbit
Transmission rate: 4.8 Kbit/s (today it is usually higher, 14.4 to 33.6 Kbit/s)
At 4.8 Kbit/s in standard resolution an uncompressed page would take about 430 s, but only about 1 minute on average with the Group 3 algorithms
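The figures above can be checked with a few lines of arithmetic. This is only a sketch: the function names are illustrative, and the constants are the ones quoted on the slide.

```python
BITS_PER_LINE = 1728    # pixels per scan line, 1 bit each
LINES_PER_PAGE = 1188   # scan lines per A4 page at standard resolution
RATE_BPS = 4800         # 4.8 Kbit/s modem


def page_bits(bits_per_line=BITS_PER_LINE, lines=LINES_PER_PAGE):
    """Size in bits of an uncompressed bilevel page."""
    return bits_per_line * lines


def transmission_seconds(n_bits, rate_bps=RATE_BPS):
    """Time needed to send n_bits at the given bit rate."""
    return n_bits / rate_bps


raw = page_bits()
print(raw)                               # 2052864 bits, about 2 Mbit
print(round(transmission_seconds(raw)))  # 428 s, i.e. "about 430 s"
```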
Run-length coding
Each scan line is composed of sequences (runs) of pixels of the same color
Count the number of elements in each run
Example: 3w 4b 9w 2b 2w 6b 5w 2b 5w...
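A minimal sketch of the run counting step, assuming a scan line is given as a string of 'w' and 'b' characters (a representation chosen here only for illustration):

```python
from itertools import groupby


def run_lengths(line):
    """Return the runs of a scan line as (color, length) pairs."""
    return [(color, len(list(group))) for color, group in groupby(line)]


def format_runs(runs):
    """Render runs in the slide's notation, e.g. '3w 4b'."""
    return " ".join(f"{length}{color}" for color, length in runs)


line = "w" * 3 + "b" * 4 + "w" * 9 + "b" * 2 + "w" * 2 + "b" * 6
print(format_runs(run_lengths(line)))  # 3w 4b 9w 2b 2w 6b
```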
G3 1D
Group 3 One-Dimensional coding (G3 1D) is called Modified Huffman (MH) because it encodes run lengths using a predefined Huffman code
In order to maintain black/white synchronization, each line begins with a white run, possibly of zero length
G3 1D
The example runs 3w 4b 9w 2b 2w 6b ... are coded as 1000 011 10100 11 0111 0010 ...
The predefined Huffman codewords were derived from the probabilities of the runs in typical handwritten documents
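The table lookup can be sketched as follows. The two dictionaries contain only the six codewords that appear in the example above, not the full Group 3 tables, and the function name is illustrative.

```python
# Partial codeword tables: only the entries used in the slide's example.
WHITE = {2: "0111", 3: "1000", 9: "10100"}
BLACK = {2: "11", 4: "011", 6: "0010"}


def encode_runs(runs):
    """Map (color, length) runs to codewords, one table per color."""
    words = []
    for color, length in runs:
        table = WHITE if color == "w" else BLACK
        words.append(table[length])  # KeyError for lengths not in the excerpt
    return " ".join(words)


runs = [("w", 3), ("b", 4), ("w", 9), ("b", 2), ("w", 2), ("b", 6)]
print(encode_runs(runs))  # 1000 011 10100 11 0111 0010
```

White and black runs use separate tables because their length statistics differ, which is why the same length (for example 2) has different codewords in the two colors.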
G3 1D
As one line has 1728 pixels, codewords must be defined for all run lengths up to 1728, for both black and white runs
As shorter runs occur more frequently than longer runs, each run length is coded in additive form
there are terminating codewords and makeup codewords
Lengths from 0 to 63 are coded with a single terminating codeword
Longer runs are coded with one or more makeup codewords followed by a terminating codeword
Each line is terminated with an EOL symbol composed of eleven 0s and one 1
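The additive form can be illustrated with a short sketch. It assumes, as in the Group 3 tables, that makeup codewords stand for multiples of 64; the slide does not state this explicitly, and the function name is hypothetical.

```python
MAX_RUN = 1728           # pixels per scan line
EOL = "0" * 11 + "1"     # end-of-line symbol: eleven 0s and one 1


def split_run(length):
    """Split a run length into (makeup, terminating) parts.

    makeup is a multiple of 64 (0 means no makeup codeword is needed);
    terminating is in the range 0..63.
    """
    if not 0 <= length <= MAX_RUN:
        raise ValueError("run length out of range")
    return (length // 64) * 64, length % 64


print(split_run(9))     # (0, 9): a terminating codeword only
print(split_run(200))   # (192, 8): makeup 192 plus terminating 8
print(EOL)              # 000000000001
```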
G3 2D
Group 3 Two-Dimensional coding (G3 2D) is called Modified READ (MR) because it is a variant of a previously defined code called READ (Relative Element Address Designate)
Many images have a high degree of vertical coherence between consecutive lines
Changing elements are coded with respect to a “nearby” change position of the same color in the previous (reference) line
G3 2D
Nearby means within an interval of radius 3 pixels
If there are changing elements in the current line with no counterpart in the reference line, the coder switches to horizontal mode (1D)
Conversely, if the reference line has a run with no counterpart in the current line, a special pass code is emitted
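The choice among the three modes can be sketched as below. This is a simplified illustration, not a full coder: a1 is assumed to be the position of the next changing element on the current line, b1 the nearby changing element of the same color on the reference line, and b2 the changing element that follows b1 on the reference line. These names and the sign convention of the offset are assumptions of the sketch.

```python
def choose_mode(a1, b1, b2):
    """Pick the 2D coding mode for the next changing element.

    Returns (mode, offset); offset is only meaningful in vertical mode.
    """
    if b2 < a1:
        # The reference run [b1, b2) has no counterpart in the current line.
        return ("pass", None)
    offset = a1 - b1
    if abs(offset) <= 3:
        # The change is within 3 pixels of the reference change.
        return ("vertical", offset)
    # No nearby change in the reference line: fall back to 1D run lengths.
    return ("horizontal", None)


print(choose_mode(10, 11, 15))  # ('vertical', -1)
print(choose_mode(20, 5, 9))    # ('pass', None)
print(choose_mode(10, 20, 25))  # ('horizontal', None)
```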
G3 2D
[Figure: a current line coded against its reference line, with the generated code underneath. Vertical mode: offsets -1, 0 and later +2, -2, each taken from a Huffman table with codewords for -3, -2, -1, 0, +1, +2, +3. Horizontal mode: <mode | length of preceding white run | length of black run>. Pass code: 0001.]
G3 2D
Two-dimensional coding is more prone to transmission errors
In G3 1D an error may corrupt an entire line, but synchronization is restored by the EOL codeword
Here an error in the reference line is likely to propagate to all the lines that follow
For this reason there is one reference line for every k lines (i.e. k-1 lines are coded w.r.t. each reference line)
standard resolution: k=2
high resolution: k=4
CCITT fax standard compression performances
Standard resolution (~200x100 dpi)
G3 1D: 0.13 bits/pixel, 57 s for A4 at 4.8 Kbps
G3 2D (k=2): 0.11 bits/pixel, 47 s for A4 at 4.8 Kbps
High resolution (~200x200 dpi)
G3 2D (k=4): 0.09 bits/pixel, 74 s for A4 at 4.8 Kbps
Compression is very good for office images, where run lengths are long
It would be very poor for bilevel natural images
Continuous-tone images: why lossless compression?
Lossy compression is often preferred because it gives much smaller images with good quality
However, there are some situations in which an approximation may not be adequate
medical images
historical documents
images with legal relevance
Continuous-tone images: lossless compression
GIF standard
PNG standard
JPEG-LS
JPEG-LS is a fairly recent standard. The original JPEG standard included a lossless mode, but its performance was not close to the state of the art
It estimates each pixel value using a quite simple context: an effective and low-cost solution
www.hpl.hp.com/loco
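As an illustration of how simple such a context can be, the sketch below shows the median edge detector used as the fixed predictor in LOCO-I, the algorithm behind JPEG-LS. The slide does not describe the predictor, so take this as background rather than as part of the lecture; the function name is illustrative.

```python
def med_predict(a, b, c):
    """Predict a pixel from its left (a), above (b) and above-left (c) neighbors."""
    if c >= max(a, b):
        return min(a, b)   # an edge is likely: follow the smaller neighbor
    if c <= min(a, b):
        return max(a, b)   # an edge is likely: follow the larger neighbor
    return a + b - c       # smooth area: planar prediction


print(med_predict(100, 100, 100))  # 100
print(med_predict(50, 200, 200))   # 50
print(med_predict(100, 120, 110))  # 110
```

Only the prediction error (actual value minus prediction) then needs to be coded, and it is usually small.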
GIF image format - I
Adopted by CompuServe to minimize the time required to download images over a modem link
The most widely used lossless image format until 1995
8-bit pixel description: 256-color images, but the colors can be chosen through a color map
GIF image format - II
The color map can be specified for each image or can be omitted
If specified, it is included as a header in the image file, in uncompressed form
The color map is composed of 256 24-bit entries that specify 256 RGB colors
The compression scheme used is LZW
The alphabet symbols are the 256 colors of the color map plus a “clear” code and an “end-of-information” code
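A minimal sketch of LZW over this alphabet is given below. It returns a list of integer codes and deliberately leaves out what a real GIF encoder adds on top: variable-width bit packing and the limit on the table size. The function name and the n_colors parameter are illustrative.

```python
def lzw_encode(pixels, n_colors=256):
    """LZW-encode a sequence of color indices into a list of codes."""
    clear_code = n_colors            # first code after the colors
    end_code = n_colors + 1          # "end-of-information" code
    table = {(i,): i for i in range(n_colors)}
    next_code = n_colors + 2
    codes = [clear_code]             # start with a clear code
    current = ()
    for p in pixels:
        candidate = current + (p,)
        if candidate in table:
            current = candidate      # keep extending the known string
        else:
            codes.append(table[current])
            table[candidate] = next_code   # learn the new string
            next_code += 1
            current = (p,)
    if current:
        codes.append(table[current])
    codes.append(end_code)
    return codes


# With 4 colors: clear code = 4, end code = 5, first learned string = 6.
print(lzw_encode([1, 1, 1, 1], n_colors=4))  # [4, 1, 6, 1, 5]
```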
GIF image format - III
Although this feature is not widely used, GIF files may contain more than one image, and the images can share the color map
LZW-coded information is grouped into blocks, each preceded by a byte count, so that an image can be skipped without decompressing it
In 1995 Unisys announced that there would be royalties on GIF implementations, due to an old patent it held on LZW
This catalyzed the development of a new lossless image format, intended for the public domain and incorporating the latest improvements
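The byte-count mechanism mentioned above can be sketched as follows. It assumes the GIF layout in which each block starts with a one-byte length and a zero-length block ends the sequence; the function name and the sample bytes are made up for the example.

```python
def skip_sub_blocks(data, pos):
    """Return the position just past a sequence of counted blocks.

    data is a bytes object; pos points at the first byte count.
    """
    while True:
        size = data[pos]     # one-byte count of the bytes that follow
        pos += 1
        if size == 0:        # zero-length block: end of the image data
            return pos
        pos += size          # jump over the block without decoding it


# Two blocks (3 bytes and 2 bytes), a terminator, then the next byte.
stream = bytes([3, 10, 11, 12, 2, 20, 21, 0, 0x3B])
print(skip_sub_blocks(stream, 0))  # 8
```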
PNG image format - I
Portable Network Graphics (pronounced “ping”)
It uses the gzip compression scheme; thanks to some improvements, the compression obtained is about 10-30% better than GIF
By default it encodes the pixels in raster scan order, but some other methods are available. It is possible to code
the horizontal difference, i.e. the difference between the current pixel value and the previous one
the vertical difference, i.e. the difference w.r.t. the pixel above
the average difference, i.e. the difference w.r.t. the average of the pixel above and the previous pixel
...
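The three differences can be sketched for a single row as below. The sketch assumes 8-bit grayscale pixels (one byte each), treats a missing left neighbor as 0, and takes differences modulo 256; the function name and method labels are illustrative.

```python
def filter_row(row, prev_row, method):
    """Replace each pixel by its difference from a prediction."""
    out = []
    for i, x in enumerate(row):
        left = row[i - 1] if i > 0 else 0   # previous pixel on this row
        up = prev_row[i]                    # pixel directly above
        if method == "horizontal":
            pred = left
        elif method == "vertical":
            pred = up
        elif method == "average":
            pred = (left + up) // 2
        else:
            raise ValueError("unknown method")
        out.append((x - pred) % 256)
    return out


row, prev_row = [10, 12, 13], [10, 10, 14]
print(filter_row(row, prev_row, "horizontal"))  # [10, 2, 1]
print(filter_row(row, prev_row, "vertical"))    # [0, 2, 255]
print(filter_row(row, prev_row, "average"))     # [5, 2, 0]
```

On smooth images the differences cluster around 0, which gives the general-purpose compressor that follows more repetition to exploit.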
PNG image format - II
It is possible to use more than 256 colors, up to 16-bit grayscale and 48-bit color
GIF uses one special pixel value to indicate transparency, whereas PNG allows 256 different transparency values per pixel, so that a picture can fade progressively into the background
It seems inevitable that the PNG format will gradually assume the role of standard lossless image format for the WWW, replacing GIF
Continuous-tone images: why lossy compression?
Digital images are already an approximation of the real analog phenomenon
Lossy techniques achieve very good compression with a modest loss of detail
This is useful for storing and transmitting images
Continuous-tone images: lossy compression
JPEG
JPEG2000
JPEG2000 is a new image coding system that uses state-of-the-art compression techniques based on wavelet technology
File extension: .jp2
At high compression ratios, for the same file size, the perceived quality of JPEG2000 images is better than that of JPEG images
JPEG format - I
JPEG is a standard defined by the Joint Photographic Experts Group in 1992
It was conceived to transmit images at 64 Kbps
It has a lossy mode and a lossless mode (rarely used, and today replaced by the JPEG-LS standard)
In lossy mode it achieves very good quality at about 1 bit/pixel
Implementation complexity is reasonable
JPEG format - II
It can be used with grayscale and color images
Each channel of the color space (RGB, YUV...) is treated separately
It allows progressive transmission (which is much better suited to the WWW than raster transmission)
Raster vs. progressive transmission
JPEG Coder - I
[Figure: JPEG coder block diagram with the blocks Discrete Cosine Transform, Quantization, Binary Encoder]
JPEG Coder - II
Preprocessing
The image is divided into 8x8-pixel squares
Apply the Discrete Cosine Transform to each square
Coefficient quantization
Bit stream encoding
Preprocessing: color space transformation & downsampling
From RGB into YUV
The Y component represents the brightness of a pixel, and the U and V components together represent the hue and saturation
The human eye can see more detail in the Y component than in U and V, which can therefore be compressed more aggressively
4:4:4: no downsampling
4:2:2: horizontal downsampling by a factor of 2
4:2:0: both horizontal and vertical downsampling by a factor of 2
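A sketch of the two preprocessing steps follows. It assumes the YCbCr variant of YUV used by JFIF files, with its usual coefficients, and implements 4:2:0 downsampling by averaging each 2x2 group of samples; the slide does not fix either choice, and the function names are illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr), each clamped to 0..255."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))


def downsample_420(plane):
    """Halve a chroma plane in both directions by averaging 2x2 groups.

    plane is a list of rows with even width and height.
    """
    h, w = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1]) // 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


print(rgb_to_ycbcr(255, 255, 255))               # (255, 128, 128)
print(downsample_420([[100, 102], [104, 106]]))  # [[103]]
```

With 4:2:0 each chroma plane keeps a quarter of its samples, so a color image goes from 3 to 1.5 samples per pixel before any other compression is applied.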
Discrete Cosine Transform - I
The discrete cosine transform (DCT) is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers
It is used in JPEG because it is fast and quite easy to implement efficiently
Discrete Cosine Transform - II
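Up to a normalization factor, the transform is

$$B(k_1,k_2) = \sum_{i=0}^{N_1-1} \sum_{j=0}^{N_2-1} A(i,j) \cos\left[\frac{\pi k_1 (2i+1)}{2N_1}\right] \cos\left[\frac{\pi k_2 (2j+1)}{2N_2}\right]$$

where
the block is $N_1 \times N_2$ pixels (in JPEG, 8x8)
$A(i,j)$ is the value of the pixel at position $(i,j)$
$B(k_1,k_2)$ is the DCT coefficient at position $(k_1,k_2)$
low values of $k_1$ correspond to low vertical frequencies, low values of $k_2$ to low horizontal frequencies
Generally the coefficients of the higher frequencies have very low values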
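The transform can be written directly from its definition. The sketch below is a plain, unoptimized implementation for square blocks that uses the orthonormal scaling (an assumption, since the normalization is a matter of convention); real codecs use fast factorizations instead. Function names are illustrative.

```python
import math


def _scale(k):
    """Extra factor that makes the k = 0 basis function unit length."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def _basis(i, k, n):
    return math.cos((2 * i + 1) * k * math.pi / (2 * n))


def dct2(block):
    """2D DCT of an n x n block A(i,j), returning B(k1,k2)."""
    n = len(block)
    return [
        [
            (2 / n) * _scale(k1) * _scale(k2) * sum(
                block[i][j] * _basis(i, k1, n) * _basis(j, k2, n)
                for i in range(n) for j in range(n)
            )
            for k2 in range(n)
        ]
        for k1 in range(n)
    ]


def idct2(coeffs):
    """Inverse 2D DCT: rebuild A(i,j) from B(k1,k2)."""
    n = len(coeffs)
    return [
        [
            (2 / n) * sum(
                _scale(k1) * _scale(k2) * coeffs[k1][k2]
                * _basis(i, k1, n) * _basis(j, k2, n)
                for k1 in range(n) for k2 in range(n)
            )
            for j in range(n)
        ]
        for i in range(n)
    ]


flat = [[10] * 8 for _ in range(8)]
print(round(dct2(flat)[0][0], 6))  # 80.0: a flat block has only a (0,0) term
```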
Discrete Cosine Transform - III
DCT basis functions
Each 8x8 square is represented by 64 coefficients
Discrete Cosine Transform - IV
Knowing the 64 DCT coefficients with infinite precision, it is possible to reconstruct the pixels of the square exactly
But
precision is finite
the coefficients are (always) quantized
some coefficients related to high frequencies are not transmitted. This allows higher compression without sacrificing too much quality, as the human eye is less responsive to high frequencies
Quantization - I
The DCT matrix obtained is scaled differently in each component, dividing each coefficient by a different factor
The factor for each component has been chosen according to human sensitivity to changes at that frequency
In practice the matrix of factors is usually
[Figure: matrix of quantization factors]
Quantization - II
Next, all values are rounded to the nearest integer
This leads to quite a high number of 0s in the high-frequency zone, as the factors there are bigger
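Quantization and its inverse can be sketched as below. The matrix Q_LUMA is the example luminance table given in Annex K of the JPEG standard; it is used here as an assumption and is not necessarily the matrix shown on the original slide. Function names are illustrative.

```python
import math

# Example luminance quantization table from Annex K of the JPEG standard.
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quantize(coeffs, q=Q_LUMA):
    """Divide each coefficient by its factor and round to the nearest integer."""
    return [
        [math.floor(coeffs[i][j] / q[i][j] + 0.5) for j in range(8)]
        for i in range(8)
    ]


def dequantize(levels, q=Q_LUMA):
    """Multiply back by the factors; the rounding error is not recovered."""
    return [[levels[i][j] * q[i][j] for j in range(8)] for i in range(8)]


coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[7][7] = 163.0, -30.0, 40.0
levels = quantize(coeffs)
print(levels[0][0], levels[0][1], levels[7][7])   # 10 -3 0
restored = dequantize(levels)
print(restored[0][0], restored[0][1], restored[7][7])  # 160 -33 0
```

The high-frequency value 40 is divided by 99 and rounds to 0, while the low-frequency values survive with a small error: this is where the information is lost.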
Zig-zag scan
Low frequency coefficients are transmitted before higher frequency coefficients
This allows for progressive visualization of this 8x8 block
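The scan order can be generated by walking the anti-diagonals of the block and alternating direction. This is a small sketch with positions written as (row, column); the function name is illustrative.

```python
def zigzag_order(n=8):
    """Return the (row, column) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):          # s = row + column of the diagonal
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diagonal.reverse()          # even diagonals run bottom-left to top-right
        order.extend(diagonal)
    return order


order = zigzag_order()
print(order[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(order[-1])   # (7, 7)
```

Reading quantized coefficients in this order puts the many high-frequency zeros at the end of the sequence, where they form long runs.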
Raster vs. progressive transmission
Raster transmission
the DCT coefficients of the upper-left block, then those of all the other blocks in the top row of the image, and so on
Progressive transmission
first all the (0,0) coefficients, then all the (0,1) coefficients, and so on, following the zig-zag scan in each block
Binary coding
DCT(0,0) usually varies very slowly from one block to the next, as it is proportional to the mean value of the block
For this reason it is convenient to encode the difference from the previous value
Typically the bit stream is coded with Huffman codes
It is possible to use an arithmetic coding scheme instead, gaining some compression at the cost of decoding speed
Huffman codes are predefined, or it is possible to build optimal tables and insert them in the stream
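The differential coding of the (0,0) coefficients can be sketched as follows, assuming the predictor starts at 0 for the first block; the function names are illustrative.

```python
def dc_differences(dc_values):
    """Replace each DC value by its difference from the previous block's DC."""
    previous = 0
    diffs = []
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def dc_restore(diffs):
    """Invert dc_differences by accumulating the differences."""
    previous = 0
    values = []
    for d in diffs:
        previous += d
        values.append(previous)
    return values


dcs = [80, 82, 81, 81]
print(dc_differences(dcs))              # [80, 2, -1, 0]
print(dc_restore(dc_differences(dcs)))  # [80, 82, 81, 81]
```

The differences are small numbers concentrated around 0, which a Huffman code can represent with short codewords.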
JPEG Decoder
[Figure: JPEG decoder block diagram with the blocks Binary Decoder, Dequantization, Inverse DCT]
Some values are lost!
Good quality, but reconstruction is not exact
JPEG performances - I
JPEG performances - II
[Figure: the same image shown as the original and at quality factors 75, 20 and 3]