
Entropy-coded pyramid vector quantisation for interband wavelet image coding


M.Vij and N.Kingsbury

Abstract: Novel class-based entropy coding algorithms for lattice quantised hierarchical (or interband) vectors are presented. The vectors are formed from wavelet coefficients on different scales from similarly oriented sub-bands corresponding to the same spatial location. Structures have been designed specifically for interband vectors drawn from wavelet coefficients, where large groups of approximately equiprobable lattice points are grouped into relatively few classes, known as sub-classes, super-classes and super-super-classes, enabling accurate probability estimates from training data to be obtained for entropy coding of the class indices. Further, it has been found that the best quantiser is a combination of the $Z_n$ and $D_n$ lattices which has been termed an augmented lattice. The performance of the method, entitled entropy-coded pyramid vector quantisation (ECPVQ), is evaluated on real images and the results show that ECPVQ is competitive, particularly at low bit-rates, with current state-of-the-art wavelet-based coders. A subjective comparison with current high performance scalar quantisation based coders shows that ECPVQ is likely to better preserve fine texture detail in the decoded images because of the finer quantisation of low energy wavelet coefficients that occurs with the augmented lattice.

1 Introduction

Current high performance image coding algorithms (e.g. [1-5]) tend to consist of a sub-band transform [e.g. the discrete wavelet transform (DWT)] followed by scalar quantisation (SQ) and entropy coding. In this paper, an alternative to SQ is examined which is known as vector quantisation (VQ).

However, for a high bit-rate of R bits/sample and a space of high dimension n, the number of code-book vectors required is $2^{nR}$. This number is not achievable by vector quantisers generated by the LBG [6] or ECVQ [7] algorithms. In fact, these optimal algorithms are computationally very expensive when the product nR is large.

In 1979 Gersho [8] conjectured that the optimal high-resolution ECVQ should have the form of a lattice. Thus lattice quantiser methods have been investigated for image coding (e.g. [9-12]).

The code-book of a lattice quantiser is a particular subset of a regular arrangement of points in an n-dimensional space centred on zero (lattice).

Quantisation algorithms have been developed by Conway and Sloane [13, 14] for many different types of lattice. Unlike LBG-type algorithms, there is usually no need to compute a norm to find (among all of the vectors in the code-book) the best reproduction vector.

© IEE, 2000
IEE Proceedings online no. 20000384
DOI: 10.1049/ip-vis:20000384
Paper first received 26th July 1999 and in revised form 13th January 2000
M. Vij is with Silicon and Software Systems, South County Business Park, Dublin 18, Ireland
N. Kingsbury is with the Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK


It has been found, by Antonini amongst others [10, 12], that, to obtain an implementable coding scheme, it is necessary to scale and truncate the lattice suitably. Various researchers [10, 12] have considered the question of the shape of the truncated area. When the signal to be compressed has an independent, identically distributed multivariate Gaussian distribution, the surfaces of equal probability are ordinary hyperspheres. The truncated area should then be spherical. However, motivated by image coding applications, Fischer [15] investigated the case of Laplacian sources (for cubic lattices) where surfaces of equal probability are planar and form shapes called pyramids.
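The origin of the pyramid shape can be seen in a short worked step, assuming n independent Laplacian coefficients that share a common parameter $\lambda$ (this common $\lambda$ is an assumption made only for the illustration):

$$p(x) = \prod_{i=1}^{n} \frac{\lambda}{2} e^{-\lambda |x_i|} = \left(\frac{\lambda}{2}\right)^{n} e^{-\lambda \|x\|_1}$$

The density depends on x only through its $l_1$ norm, so each surface of equal probability is a set $\|x\|_1 = \text{const}$, which is bounded by planar faces. The same argument for the Gaussian case, where the density depends on $\|x\|_2^2$, gives the hyperspheres mentioned above.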

A limitation of Antonini et al.'s approach [10] is that, to obtain accurate probability estimates for lattice points (to design entropy codes) from training data, the dimensionality of the vectors must be restricted and the lattice needs to be truncated (to ensure a small enough code-book of lattice points). More recently, Rogers and Woolf [9] attempted to alleviate this problem by quantising 63-D interband vectors onto spherically shaped shells. They assumed that all lattice vectors on a particular shell were equiprobable. They were able to gather accurate statistics to entropy encode the shell indices (as there were at most 200 of these), but their results were poor (probably because their vectors contained coefficients with different variances and so the equiprobable assumption was not valid).

Mohdyusof and Fischer [11] developed a practical coding scheme for Fischer's weighted pyramid [16] which takes into account the different variances of the coefficients that formed their 63-D vectors (the AC coefficients from an 8 × 8 DCT). A brief summary of their method is as follows:

Assume a vector x of independent Laplacian random variables. Let the elements of the vector $x = (x_1, x_2, \ldots, x_L)^T$ have variances $2/\lambda_i^2$, $i = 1, \ldots, L$, and mean zero.

IEE Proc.-Vis. Image Signal Process., Vol. 147, No. 4, August 2000

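They define the ‘weighted pyramid’ as

$$S(L, w, K) = \left\{ \hat{y} : \sum_{i=1}^{L} w_i |\hat{y}_i| = K \right\} \qquad (1)$$

where $\tilde{x} = x/\Delta$ and $\Delta$ is the scaling factor, $w_i = \mathrm{Rnd}[\lambda_i / \min_j \lambda_j]$, $i = 1, \ldots, L$, are positive integers, Rnd[x] denotes the closest integer to x and $\hat{y}$ is the closest integer lattice point to $\tilde{x}$. Note that, in Mohdyusof and Fischer’s method, each $\lambda_i$ is obtained from an estimate of the variance $\sigma_i^2$.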

Each lattice quantised vector $\hat{y}$ is represented by two codewords. The first code is a prefix code for the ‘size’ of the weighted pyramid on which $\hat{y}$ lies. The second code is an enumeration code for the particular lattice point $\hat{y}$ on $S(L, w, K)$, where it is assumed that all lattice code-vectors on $S(L, w, K)$ are equiprobable.

2 ECPVQ algorithm

An alternative approach is proposed in this paper for interband formed vectors which (i) considers the use of different quantisers other than the integer lattice; and (ii) is specially adapted to coding vectors formed hierarchically from wavelet coefficients.

Furthermore, in the first main class structure developed, known as sub-classes, no assumption is made about the shape of the PDF from which coefficients forming the vectors are drawn.

The algorithms make use of pyramidal shells in such a way that no weighting is necessary even though some elements of the vectors generated have different variances from others. This is because the method represents the position of a particular lattice-quantised vector on a pyramidal shell by two different codes, one of which is entropy coded. This is explained further in Sections 2.3 and 2.4.

2.1 Vector formation

The performance of the algorithms has been evaluated on 256 × 256 and 512 × 512-sized monochrome images which contain eight bits per pixel. Images of other sizes can also be coded. A 2-D dyadic separable wavelet transform [17] using bi-orthogonal 9/7 [1] or 10/18 [18] tap filter pairs has been used. Symmetric extension [19] has been used to reduce boundary effects.

For 256 × 256 images, a three-level decomposition is used. There are therefore three vertically ($V_k$, k = 1..3) oriented sub-bands, of sizes 128 × 128, 64 × 64 and 32 × 32, respectively, and similarly also for the three horizontally ($H_k$, k = 1..3) and three diagonally ($D_k$, k = 1..3) oriented sub-bands. For each sub-band orientation (V, H, D), 21-D vectors are formed comprising one coefficient from the coarsest scale (k = 3), 4 coefficients from scale k = 2 and 16 coefficients from scale k = 1.

The 21-D vectors which are coded are formed as follows:

$$x = \{(i, j), d\{(i, j)\}\} \quad \forall (i, j) \in V_3, H_3, D_3 \qquad (2)$$

where $d\{(i, j)\}$ are the descendants (as defined in [1]) of (i, j). Fig. 1 shows the formation of a 21-D vector from sub-bands of a vertical orientation.
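The vector formation can be sketched in a few lines of Python. The sketch assumes that the descendants defined in [1] follow the usual quadtree relation, in which the children of (i, j) occupy the 2 × 2 block starting at (2i, 2j) on the next finer scale; the function name and the ordering of coefficients inside each group are hypothetical choices made for illustration.

```python
def form_21d_vector(coarse, mid, fine, i, j):
    """Collect one 21-D interband vector rooted at (i, j) in the coarsest sub-band.

    coarse, mid and fine are 2-D arrays (lists of rows) holding one orientation
    at scales k = 3, 2 and 1. Assumption: quadtree parent-child relation.
    """
    # Group of 1: the root coefficient on the coarsest scale.
    root = [coarse[i][j]]
    # Group of 4: the 2 x 2 block of children on the middle scale.
    children = [mid[2 * i + a][2 * j + b] for a in range(2) for b in range(2)]
    # Group of 16: the 4 x 4 block of grandchildren on the finest scale.
    grandchildren = [fine[4 * i + a][4 * j + b] for a in range(4) for b in range(4)]
    return root + children + grandchildren
```

With 32 × 32 root positions in each of the three orientations, this construction yields 3 × 1024 = 3072 vectors for a 256 × 256 image.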

The training set for collecting statistics to design entropy codes consists of eight 256 × 256 monochrome images (‘Trevor’, ‘Miss America’, ‘Couple’, ‘House’, ‘Peppers’, ‘Sailboat’, ‘Plane’ and ‘Baboon’) coded at five different bit-rates (0.12, 0.3, 0.6, 0.8 and 1.0 bits/pixel). The training set for coding the larger 512 × 512 images consists of four similarly sized images (‘House’, ‘Sailboat’, ‘Plane’ and ‘Peppers’) coded at the same five different bit-rates.

Fig. 1 Diagram of a three-level sub-band decomposition. The construction of a vertically oriented 21-D vector is shown

2.2 Choice of quantiser employed in ECPVQ method

Let the $Z_n$ lattice be the n-dimensional lattice of integers and the $D_n$ lattice be an n-dimensional lattice such that the n components are integers whose sum is even [13, 14].

The performance of three different quantisers has been evaluated: (i) $Z_n$ lattice; (ii) $D_n$ lattice; and (iii) a union of the $D_n$ lattice with the innermost shell of the $Z_n$ lattice (i.e. all vectors with norm one) which has been termed the $Z_n/D_n$ augmented lattice (this is not a lattice in the strictest sense [13]).

Simple nearest-neighbour quantisation algorithms are available for the $Z_n$ and $D_n$ lattices [13]. The quantisation algorithm for the $Z_n/D_n$ lattice, which is also nearest-neighbour, is now given: let x be an input vector; let $c = Z_n\{x\}$, $e = D_n\{x\}$ and $w = Z_n/D_n\{x\}$, where, for example, $Z_n\{x\}$ means the vector x quantised by the $Z_n$ lattice. Then $w = c$ if $\|c\| \le 1$, otherwise $w = e$.
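The rule above can be sketched as follows. The $D_n$ step uses the rounding rule commonly attributed to Conway and Sloane (round every coordinate, and if the sum is odd re-round the coordinate with the largest rounding error in the other direction). The function names, the tie-breaking of the rounding and the use of the $l_1$ norm in the test on c are assumptions made for this illustration; for integer vectors the $l_1$ and Euclidean norms give the same test.

```python
import math


def quantise_zn(x):
    """Nearest point of the integer lattice (ties rounded upwards, an assumption)."""
    return [math.floor(v + 0.5) for v in x]


def quantise_dn(x):
    """Nearest integer point whose coordinates have an even sum."""
    c = quantise_zn(x)
    if sum(c) % 2 == 0:
        return c
    # Odd sum: move the coordinate with the largest rounding error to its
    # second-nearest integer, which restores even parity at the least cost.
    k = max(range(len(x)), key=lambda i: abs(x[i] - c[i]))
    c[k] += 1 if x[k] > c[k] else -1
    return c


def quantise_augmented(x):
    """Z_n/D_n augmented lattice: w = c if the norm of c is at most 1, else w = e."""
    c = quantise_zn(x)
    if sum(abs(v) for v in c) <= 1:
        return c
    return quantise_dn(x)
```

For example, the input (0.9, 0.1, -0.2) is mapped to (1, 0, 0) by the augmented quantiser, whereas the plain $D_n$ quantiser maps it to (1, 0, -1); low-energy vectors therefore keep the finer $Z_n$ point.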

2.3 The class structure

The concept of classes is now introduced. The main objective is to put large numbers of lattice points, on pyramidal shells, into a small number of classes, within each of which all points are approximately equiprobable. Then a particular quantised vector can be identified by three codes: (i) the shell on which the lattice point lies (i.e. the $l_1$ norm of the vector, which is denoted by the variable K); (ii) the index of the class which contains the vector; and (iii) the index of the vector within a specific class.

If the number of classes is small, then an efficient entropy code can be designed for the second code from finite training data. If it can be assumed that all lattice points within a class are equiprobable, then a simple near-uniform code can be used for the third code. An efficient conditional entropy code has been designed for the first code and is described in Section 3.

A vector on a particular pyramidal shell is mapped into its class by two transformations. First the absolute value of its co-ordinates is computed and then they are sorted into descending order. A class is similar to a leader as defined by Moureaux et al. [20] except that a leader contains elements that are in ascending order. A recursive algorithm, based on the sequential partitioning of an integer into a variable number of parts [22], for determining the index of a specific class, is given by Vij [21]. Moureaux et al. [20] pre-store the indexes of the leaders in a coding table. However, this can require considerable memory storage.

It is assumed that all the vectors in a particular class (or vectors in the orbit of a leader [20]), which are the ‘signed permutations’ [20], are equiprobable if the components which comprise the vectors are of the same variance. Algorithms for indexing the signed permutations are given by Moureaux et al. [20] and Vij [21].
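The mapping of a vector to its class, and the number of signed permutations a class contains, can be sketched as below. The function names are hypothetical, and the sketch only counts the members of a class; it does not reproduce the indexing algorithms of [20] or [21].

```python
from collections import Counter
from math import factorial


def to_class(v):
    """Map a quantised vector to its class: absolute values in descending order."""
    return tuple(sorted((abs(c) for c in v), reverse=True))


def class_size(leader):
    """Number of signed permutations of a class representative.

    Distinct orderings of the magnitudes (a multinomial coefficient)
    multiplied by two sign choices for every non-zero element.
    """
    orderings = factorial(len(leader))
    for multiplicity in Counter(leader).values():
        orderings //= factorial(multiplicity)
    nonzero = sum(1 for c in leader if c != 0)
    return orderings * 2 ** nonzero
```

For a 4-D group, `class_size((2, 0, 0, 0))` gives 8 and `class_size((1, 1, 0, 0))` gives 24.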

2.4 The sub-class structure

In this Section, the coding of the 21-D interband vectors described in Section 2.1 is considered, where the vector is of the format (1, 4, 16) (i.e. one coefficient from one coarse sub-band, four coefficients from a finer sub-band and the other 16 coefficients from another even finer sub-band).

Since these three types of coefficient will have significantly different variances, it is necessary to introduce a new structure called a sub-class, which, on any particular shell indexed by K, contains code vectors which are assumed to be equiprobable. It is necessary to think of the 21-D vector as containing three distinct groups of elements with different variances (of sizes 1, 4 and 16, respectively).

It is desired to allow any ordering within each group but not to allow an interchange between the groups. A vector is mapped into its sub-class by three transformations. First the absolute value of its co-ordinates is calculated. Next the 2nd to 5th co-ordinates are sorted into descending order. Finally the 6th to 21st co-ordinates are sorted into descending order.
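The three transformations can be sketched as follows; the function name and the representation of a sub-class as a tuple of three sorted groups are illustrative assumptions.

```python
def to_subclass(v, groups=(1, 4, 16)):
    """Map a 21-D quantised vector to its sub-class.

    Absolute values are taken, then each group (co-ordinate 1, co-ordinates
    2 to 5, co-ordinates 6 to 21) is sorted into descending order on its own,
    so elements are never exchanged between groups.
    """
    subclass, start = [], 0
    for size in groups:
        part = sorted((abs(c) for c in v[start:start + size]), reverse=True)
        subclass.append(tuple(part))
        start += size
    return tuple(subclass)
```

Two vectors fall in the same sub-class exactly when they differ only by sign changes and by reorderings inside the groups of 4 and 16 elements.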

The problem of counting the number of sub-classes in a particular class (and by extension on a particular pyramidal shell) is not considered in this paper, but the solution has been obtained [21] by use of the limited-repetition principle [22] and the results are presented in Table 1. An algorithm for indexing all the lattice points in a particular sub-class is given by Vij [21].

Table 1: Number of classes, sub-classes and super-classes on each pyramidal shell K for the $Z_{21}/D_{21}$ augmented lattice

K     Classes   Sub-classes   Super-classes
1     1         3             3
2     2         8             6
4     5         38            15
6     11        135           28
8     22        404           45
10    42        1073          66
12    77        2606          91
14    135       5903          120
16    231       12641         153
18    385       25832         190
20    627       50734         231
22    1001      96283         276
24    1571      177318        325
26    2424      317978        378
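The counts in Table 1 can be cross-checked with a short counting sketch. It rests on an interpretation that the text does not state in this form: a class on shell K is taken to be a partition of K into at most 21 parts, and a sub-class a triple of partitions into at most 1, 4 and 16 parts whose totals sum to K. The function names are hypothetical, and the method of [21] based on the limited-repetition principle is not reproduced here.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def partitions(n, max_parts, max_value=None):
    """Number of partitions of n into at most max_parts parts, each <= max_value."""
    if max_value is None:
        max_value = n
    if n == 0:
        return 1
    if max_parts == 0 or max_value == 0:
        return 0
    # Choose the largest part first; the remaining parts may not exceed it.
    return sum(partitions(n - first, max_parts - 1, first)
               for first in range(min(n, max_value), 0, -1))


def count_classes(K, n=21):
    """Classes on shell K: sorted magnitude patterns of n elements with l1 norm K."""
    return partitions(K, n)


def count_subclasses(K, groups=(1, 4, 16)):
    """Sub-classes on shell K: each group sorted independently, norms summing to K."""
    # counts[k] = number of patterns with total norm k over the groups seen so far
    counts = [1] + [0] * K
    for size in groups:
        counts = [sum(counts[k - r] * partitions(r, size) for r in range(k + 1))
                  for k in range(K + 1)]
    return counts[K]


def count_superclasses(K):
    """Super-classes on shell K: triples (r1, r4, r16) with r1 + r4 + r16 = K."""
    return (K + 1) * (K + 2) // 2
```

Under these assumptions the sketch gives 77 classes, 2606 sub-classes and 91 super-classes for K = 12, in agreement with the corresponding row of Table 1.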

It was decided to implement a method which effectively groups sub-classes on each pyramidal shell indexed by K. One of the reasons for doing this, which can be observed from Table 1, is that the total number of sub-classes increases rapidly as the shell index K increases (e.g. for K = 12, the number of sub-classes is 2606). This means that accurate probabilities for each of the sub-classes on a shell where K is greater than, say, 12 are difficult to obtain from training and thus good entropy codes are hard to design. By allowing each super-class to comprise a number of sub-classes where K is large, more accurate probability estimates can be obtained for entropy encoding the super-class indices and performance can therefore actually improve.

2.5 Super-class structure for 21-D vectors

Let K be the $l_1$ norm of the input quantised vector. This is the index of the shell on which the quantised vector lies. Then define three different radii, $r_1$, $r_4$ and $r_{16}$, that are, respectively, the $l_1$ norms of the three distinct groups of 1, 4 and 16 elements which comprise the 21-D vectors. Define the set of all super-classes on a shell K to be all combinations of $(r_1, r_4, r_{16})$ where $0 \le r_1, r_4, r_{16} \le K$ such that $r_1 + r_4 + r_{16} = K$. An example is now given which shows the relationship between sub-classes and super-classes.

2.5.1 Example: Let K = 2. The six super-classes on the shell indexed by K = 2 are given by {(2,0,0), (0,2,0), (0,0,2), (1,1,0), (1,0,1), (0,1,1)} where each element in the set is of the form $(r_1, r_4, r_{16})$. Assume that all the possible code-vectors in each super-class are equiprobable. So, for example, the super-class (0,2,0) comprises the two sub-classes (0,(2,0,0,0),(0,0,...,0,0)) and (0,(1,1,0,0),(0,0,...,0,0)). The number of code-vectors in the former sub-class is 8 ($4 \times 2^1$) and in the latter is 24 ($6 \times 2^2$). Therefore the total number of code-vectors in the super-class (0,2,0) is 32 and it is assumed that they are equiprobable.
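The numbers in this example can be reproduced with a small enumeration. The sketch below lists the super-classes on a shell and counts the code-vectors in one of them by summing the signed permutations of every sorted magnitude pattern in each group; all function names are hypothetical, and the enumeration is meant for small K only.

```python
from collections import Counter
from math import factorial


def superclasses(K):
    """All triples (r1, r4, r16) of non-negative integers summing to K."""
    return [(r1, r4, K - r1 - r4)
            for r1 in range(K, -1, -1) for r4 in range(K - r1, -1, -1)]


def leaders(norm, size, largest=None):
    """Yield descending tuples of `size` non-negative integers with the given l1 norm."""
    if largest is None:
        largest = norm
    if size == 0:
        if norm == 0:
            yield ()
        return
    for first in range(min(norm, largest), -1, -1):
        for rest in leaders(norm - first, size - 1, first):
            yield (first,) + rest


def signed_permutations(leader):
    """Distinct orderings of the magnitudes times 2 sign choices per non-zero element."""
    count = factorial(len(leader))
    for multiplicity in Counter(leader).values():
        count //= factorial(multiplicity)
    return count * 2 ** sum(1 for c in leader if c)


def superclass_size(r, sizes=(1, 4, 16)):
    """Number of code-vectors in super-class r = (r1, r4, r16)."""
    total = 1
    for norm, size in zip(r, sizes):
        total *= sum(signed_permutations(l) for l in leaders(norm, size))
    return total
```

For K = 2, `superclasses(2)` returns the six triples listed above, `leaders(2, 4)` yields (2,0,0,0) and (1,1,0,0) with 8 and 24 signed permutations, and `superclass_size((0, 2, 0))` returns 32.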

This is equivalent to assuming that the coefficients in the four-element group are independent, have the same variance and come from a Laplacian PDF. For other super-classes, this concept can also be extended to the 16-element group.

It can easily be shown that the total number of super-classes on a particular shell is given by (K + 1)(K + 2)/2. Table 1 shows the number of super-classes for various values of K.
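A brief counting argument, given here for completeness, leads to this expression: each choice of $r_1 \in \{0, \ldots, K\}$ leaves $K - r_1 + 1$ admissible values of $r_4$, after which $r_{16}$ is fixed, so

$$\sum_{r_1=0}^{K} (K - r_1 + 1) = \sum_{m=1}^{K+1} m = \frac{(K+1)(K+2)}{2}$$

For K = 2 this gives 6, in agreement with the example in Section 2.5.1.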

2.6 Super-super-class structure

For large values of K, a significant number of super-classes are highly improbable, necessitating further grouping. Therefore an efficient heuristic method has been developed where each super-super-class comprises a number of super-classes. It has been found that overall performance is improved as a consequence of this grouping. This is because, with the training set used, over 80% of the histogram bins for the super-classes on the outer shells (for K > 22) are empty.

2.6.1 Heuristic approach to grouping super-classes: First the heuristic formula which has been found to work well in practice [21] is given. Let $S_i$ denote a super-class on a shell given by K such that i is in the range $[0, ((K + 1)(K + 2)/2) - 1]$ and let $S_i$ be of the form $(r_1, r_4, r_{16})$ where $r_1 + r_4 + r_{16} = K$ and $r_1$, $r_4$ and $r_{16}$ are the $l_1$ norms of the first element, second to fifth elements and sixth to twenty-first elements, respectively, of the 21-D input quantised vectors. Then, for each $S_i$ calculate an associated index, $j_{m_i}$, given by:

$$j_{m_i} = \mathrm{Rnd}\left[\lambda_1 \{\max(|r_1 - r_4|, |r_1 - r_{16}|, |r_4 - r_{16}|)\}^{\lambda_2} \log(P_i)\right] \qquad (3)$$

where max(x, y, z) is the maximum of integers x, y and z, Rnd(x) is the nearest integer to the real number x, $P_i$ is the number of equiprobable code-vectors in the super-class of index i and $\lambda_1 > 1.0$, $0 \le \lambda_2 \le 1.0$ are parameters assigned by the user (they determine how many super-super-classes there are in total on a particular shell). The super-classes with the same integer index $j_{m_i}$ are grouped into a single super-super-class.

Fig. 2 shows the number of super-super-classes which were used on the outer shells (the quantiser used is the $Z_{21}/D_{21}$ lattice). The number of quantised vectors lying on a shell K decreases as K increases. Therefore, as K increases, the number of super-super-classes should decrease (towards 1) to permit design of good entropy codes. The super-class grouping was done offline. The variation in $\lambda_1$ and $\lambda_2$ with K is shown in Fig. 3. Here $\lambda_1$ and $\lambda_2$ were chosen to minimise the number of super-super-classes for a given small percentage increase in entropy (which results when $n_1$ super-classes are grouped into $n_2$ super-super-classes with $n_2 < n_1$), based on training data.

Fig. 2 The number of super-super-classes used on the outer shells in the ECPVQ algorithm (i.e. for shell index K > 22). Horizontal axis: shell index K

Fig. 3 Variation of $\lambda_1$ and $\lambda_2$ with shell index K. Even values of K only. Horizontal axis: shell index K

Fig. 4 Five-level sub-band decomposition. The construction of a 5-D vector from diagonally oriented sub-bands is shown

2.7 Extension for coding large images

For the 512 × 512 images which have been coded, a five-level sub-band decomposition was used. Using the notation of Section 2.1, the sub-bands $V_k$, $H_k$ and $D_k$ for k = 4, 5 are coded by 5-D interband vectors. The formation of a 5-D vector from diagonally oriented sub-bands is given in Fig. 4. For future reference, the shell index of a quantised 5-D vector, which is given by its $l_1$ norm, is denoted by the variable KK.

3 Results

The 21-D vectors are quantised by the $Z_{21}/D_{21}$ lattice with scaling factor $\Delta$ which controls the bit-rate. A comparison between the performance of the different quantisers defined in Section 2.2 is given in Section 3.1. For 512 × 512-sized images, the 5-D vectors are quantised by a $Z_5/D_5$ lattice (the performance of the $D_5$ lattice is similar in this case, both being superior to the $Z_5$ lattice) using a scaling factor of $0.75\Delta$. It has been found using the equal-slope rate-distortion method [23] that coarser quantisation of the higher-frequency sub-bands is appropriate to maximise rate against PSNR performance. The DC band is also quantised using a uniform mid-tread quantiser [24] with step size $0.75\Delta$. The scaling factor of $0.75\Delta$ is used to achieve approximately equal rate-distortion slopes [21]. An adaptive arithmetic coder [25-27] is used for the implementation of entropy codes unless otherwise stated. The DC band is coded by a differential code similar to that used in JPEG [24].

A quantised 21-D vector is represented by the following three codes:
(i) coding of the shell index K with a conditional entropy code [21] designed using variations of algorithms suggested by Young [26], Chrysafis and Ortega [3] and Elnahas et al. [28] (see Fig. 5 for an example);
(ii) coding of the index of the sub/super/super-super-class within which the vector lies;
(iii) coding of the index of the vector within a specific sub/super/super-super-class using a simple near-uniform Huffman code (e.g. if there are 10 equiprobable lattice points within a particular class, the first six points would be represented by 3-bit codewords and the last four points by 4-bit codewords).

Fig. 5 Distribution of shell index K for the quantised 21-D vectors. The quantiser used is the $D_{21}$ lattice with $\Delta = 30$, where $\Delta$ is the scaling factor. A context-based entropy code is used for K < 10 and a simple first-order code for K ≥ 10 (the probabilities are modelled by a decaying exponential with the optimal exponent being transmitted to the decoder). Horizontal axis: shell index K
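The near-uniform code used for the third of the codes listed above can be sketched as a truncated binary code. This is one standard construction that produces the codeword lengths quoted in the example (six 3-bit and four 4-bit codewords for 10 points); the function name and the particular bit patterns are assumptions, and the code used in [21] may assign codewords differently.

```python
def near_uniform_code(index, n):
    """Codeword (as a bit string) for `index` in 0..n-1 among n equiprobable points."""
    if n == 1:
        return ""                      # a single point needs no bits
    k = n.bit_length() - 1             # floor(log2(n))
    short = 2 ** (k + 1) - n           # number of k-bit codewords
    if index < short:
        return format(index, "0{}b".format(k))
    # The remaining points receive (k + 1)-bit codewords that do not collide
    # with any k-bit codeword used as a prefix.
    return format(index + short, "0{}b".format(k + 1))
```

For n = 10, indices 0 to 5 receive the codewords 000 to 101 and indices 6 to 9 receive 1100 to 1111, so the code is prefix-free and no codeword is wasted.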

For the 512 × 512 images coded, a quantised 5-D vector for levels 4 and 5 is represented by the following three codes:
(i) coding of shell index KK, for which a simple first-order entropy code is used with the probabilities being modelled by an exponential curve as shown in Fig. 6;
(ii) coding of the super-class index within which the vector lies (the number of super-classes on a shell KK is given by KK + 1); and
(iii) coding of the index of the vector within a specific super-class, using a near-uniform Huffman code as previously described.

The reproduction vectors at the decoder are determined by a method [21] similar to that used by Mohdyusof and Fischer [11] for determining centroids. Fig. 7 shows a diagram of the encoder/decoder for the complete ECPVQ algorithm.

Fig. 6 Distribution of shell index KK for the quantised 5-D vectors. Solid line: actual distribution; dotted line: modelled distribution with a decaying exponential. The quantiser used is the $D_5$ lattice. The factor by which the input vectors are scaled (or alternatively the quantiser) is 30. Horizontal axis: shell index KK

Fig. 7 Encoder for the ECPVQ algorithm. The decoder is similar with ‘in’ and ‘out’ reversed and ‘in’ denoting binary bit-stream and ‘out’ denoting decoded image
A - wavelet/inverse wavelet transform
B - quantisation stage/determination of reproduction vectors
C - coding of DC band
D - coding of shell index
E - coding of index of sub/super/super-super-class
F - coding of index within a specific sub/super/super-super-class
G - arithmetic coder/decoder
H - Huffman coder/decoder
I - multiplexer/demultiplexer
in - input image
out - binary bit-stream
s - side information

3.1 Performance of different quantisers

Fig. 8 gives the comparison between different quantisers used in the ECPVQ algorithm for coding the 256 × 256 ‘Lena’ image. It can be seen that the $Z_{21}/D_{21}$ lattice performs about 0.1 dB better than the $Z_{21}$ lattice, which in turn is slightly superior to the $D_{21}$ lattice.

A rate-distortion analysis suggests that the big difference in variance between the single low-frequency coefficient and the 16 highest-frequency coefficients which (together with the four medium frequency coefficients) comprise the 21-D vectors results in the $Z_{21}/D_{21}$ lattice being superior to both the $Z_{21}$ and $D_{21}$ lattices.

3.2 Comparison between sub-classes and super-classes

Consider the question of comparing the performance of sub-classes with that of super-classes to see whether the simpler super-classes perform almost as well as the more general sub-classes.

Fig. 8 PSNR against rate performance of ECPVQ method for 256 × 256 ‘Lena’ using three different quantisers. Solid line: $Z_{21}/D_{21}$ lattice; asterisks: $D_{21}$ lattice; dotted line: $Z_{21}$ lattice. Axes: PSNR (dB) against bit-rate (bits/pixel)

Let the test image be the 256 × 256 image ‘Lena’, the scaling factor $\Delta = 30$ (corresponding to a bit-rate of about 0.6 bits/pixel) and the quantisation be carried out by the $D_{21}$ lattice. Then look at shells with K equal to 2, 4, 6, 8, 10 and 12. The two entropies, $e_{sub}$ and $e_{sup}$, are defined as follows:

$$e_{sub} = \frac{1}{3072} \sum_i \left[-\log_2(P(S_i)) + \log_2(c_i)\right] \ \text{bits/vector} \qquad (4)$$

$$e_{sup} = \frac{1}{3072} \sum_i \left[-\log_2(P(SP_i)) + \log_2(cp_i)\right] \ \text{bits/vector} \qquad (5)$$

where both summations are over all quantised vectors where 0 < K ≤ 12. The total number of quantised vectors is 3072. $P(S_i)$ is the probability of occurrence of the sub-class in which the ith vector lies and $c_i$ is the number of possible code-vectors (including those with different signs) in that sub-class. The notation $P(SP_i)$ and $cp_i$ corresponds to the previous notation except that super-classes are being used.
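The two entropies share one form, which can be sketched as a single helper. In each term, $-\log_2 P$ is the ideal cost of entropy coding the class index and $\log_2 c$ is the ideal cost of indexing one of c equiprobable code-vectors inside the class. The function name and the input layout (one probability and class-size pair per vector) are assumptions made for illustration; how the probabilities are estimated from training data is left outside the sketch.

```python
from math import log2


def average_class_bits(vectors, total_vectors):
    """Average cost in bits/vector of coding the class index and the index within the class.

    vectors: iterable of (p, c) pairs, one per quantised vector included in the
        sum, where p is the probability of the class (sub-class or super-class)
        containing the vector and c is the number of code-vectors in that class.
    total_vectors: normalising count (3072 in eqns. 4 and 5).
    """
    bits = sum(-log2(p) + log2(c) for p, c in vectors)
    return bits / total_vectors
```

For instance, a vector in a class of probability 0.5 holding 8 code-vectors contributes 1 + 3 = 4 bits to the sum.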

Table 2 compares the performance of the sub-class and super-class structures as K varies. It can be seen that using super-classes is only 0.03 bits/vector (0.03 × 3072 bits) inferior to the scheme where sub-classes are used. It has been found that this difference increases to 0.05 bits/vector if coarser quantisation is employed (i.e. $\Delta > 40$). It can be seen that, for K = 12, the super-class method is slightly superior. This is because, on the shell indexed by K = 12, there are 2606 sub-classes, many of which are highly improbable, whereas there are only 91 super-classes. This is the main reason super-classes [or modifications thereof (i.e. super-super-classes)] are used for all K > 12. Coding other images suggests that, for K = 12, slightly superior performance is achieved by the sub-class structure compared with the super-class structure.

For the 512 × 512 image ‘Lena’, using the $Z_{21}/D_{21}$ lattice for quantisation, the advantage in using the sub-class method at bit-rates less than 0.25 bits/pixel is equivalent to an improvement in PSNR performance (for a comparable bit-rate) of about 0.05 dB (or alternatively the advantage is about 0.05 bits/vector or 650 bits).

Therefore, it can be concluded that the assumptions made in deriving the super-class structure are generally valid. At low bit-rates and low shell radii, there is a small advantage in using the sub-class structure and this method is therefore used for 0 < K ≤ 12 to optimise the ECPVQ algorithm.

Table 2: Comparison between performance of sub-class and super-class structures

K        2      4      6      8      10     12     Total
e_sub    1.159  1.069  0.921  0.960  0.980  0.718  5.805
e_sup    1.163  1.088  0.927  0.967  0.987  0.713  5.835

The definitions of e_sub and e_sup are given in eqns. 4 and 5, respectively. The units for ‘Total’ are bits/vector.

3.3 Comparison between ECPVQ and other coders

The performance of the ECPVQ method will now be compared with that of the following coders:

ECPVQ-1: ECPVQ method using the 9/7-tap filter pair;
ECPVQ-2: ECPVQ method using the 10/18-tap filter pair;
SPIHT: entropy-coded version of the method of Said and Pearlman [1] using the 9/7-tap filters;
ECLVQ: method of Mohdyusof and Fischer [11] in which a uniform sub-band decomposition (16 bands) is used with Johnston’s [29] 32-tap filters. The lowest-frequency sub-band is further decomposed by an 8 x 8 DCT. The 63-D vectors from the DCT are coded using a ‘weighted pyramid’ as previously described. The remaining 15 sub-bands are coded using 32-D or 64-D intraband vectors;
SFQ: space frequency quantisation method [2] of Xiong et al. using the 9/7-tap filter pair;
CSQ: the context-based scalar quantisation method of Chrysafis and Ortega [3] using the 10/18-tap filters in a separable wavelet transform;
UCLA-best: the best current results at the UCLA web site [5] for wavelet-based coders. The PSNR number at 0.2 bpp is from the estimation quantisation algorithm [4].

3.3.1 PSNR against rate comparison with other coders: Tables 3 and 4 give the PSNR comparisons, at various bit-rates, between the present ECPVQ method (ECPVQ-1 and ECPVQ-2) and the coders described above for the 512 x 512 monochrome ‘Lena’ and ‘Goldhill’ images, respectively.

From Table 3 for the ‘Lena’ image, it can be observed that ECPVQ (ECPVQ-2) is between 0.06 and 0.25 dB inferior to the best current quoted results for wavelet-based coders (UCLA-best). Table 4 for the ‘Goldhill’ image shows ECPVQ (ECPVQ-2) inferior by about 0.11-0.14 dB to the best results (UCLA-best).

Therefore, ECPVQ appears to be competitive at low bit-rates with current state-of-the-art coders (e.g. [1-5, 30]) based on an objective (PSNR) comparison.
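The quoted gaps can be reproduced from the table entries. The sketch below pairs ECPVQ-2 with UCLA-best at the rates where both are reported; that pairing of values with rates is inferred from the tables and the quoted ranges, and the helper names are illustrative. Since PSNR is 10 log10 of a ratio involving MSE, a gap in dB also maps directly to an MSE ratio.

```python
# (ECPVQ-2, UCLA-best) PSNR in dB at rates where both are reported.
lena = {0.2: (33.51, 33.57), 0.25: (34.46, 34.57), 0.5: (37.44, 37.69)}
goldhill = {0.25: (30.75, 30.86), 0.5: (33.39, 33.53)}

def psnr_gaps(table):
    """PSNR shortfall of ECPVQ-2 relative to UCLA-best, per rate (dB)."""
    return {rate: round(best - ours, 2) for rate, (ours, best) in table.items()}

def mse_ratio(gap_db):
    """Factor by which MSE is larger for a coder that is gap_db worse."""
    return 10 ** (gap_db / 10)

lena_gaps = psnr_gaps(lena)          # {0.2: 0.06, 0.25: 0.11, 0.5: 0.25}
goldhill_gaps = psnr_gaps(goldhill)  # {0.25: 0.11, 0.5: 0.14}
worst = max(lena_gaps.values())
print(lena_gaps, goldhill_gaps, round(mse_ratio(worst), 3))
```

Even the largest gap, 0.25 dB, corresponds to an MSE only about 6% higher, which gives some scale to the word "competitive".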

Table 3: Comparison between rate against PSNR (dB) performance of various coders (see Section 3.3 for definitions) for coding the 512 x 512 ‘Lena’ image (same original as that used by Xiong et al. [2], Joshi et al. [30], which is slightly different from that used by Said and Pearlman [1])

Rate (bpp)   0.15   0.2    0.25   0.258  0.327  0.5
ECPVQ-1      32.11  33.37  34.33  34.48  35.51  37.35
ECPVQ-2      32.26  33.51  34.46  34.61  35.62  37.44
SPIHT        31.90  33.17  34.14  34.28  35.35  37.25
SFQ          -      33.32  34.33  -      -      37.36
CSQ          -      -      34.53  -      -      37.66
UCLA-best    -      33.57  34.57  -      -      37.69

ECLVQ is reported at two rates only: 33.7* and 34.8* dB.
* measured from Fig. 7 in [11]

Table 4: Comparison between the rate against PSNR (dB) performance of various coders for coding the 512 x 512 ‘Goldhill’ image

Rate (bpp)   0.2    0.25   0.327  0.5
ECPVQ-1      29.95  30.67  31.81  33.32
ECPVQ-2      30.03  30.75  31.89  33.39
SPIHT        29.85  30.56  31.65  33.13
SFQ          29.94  30.71  -      33.37
CSQ          30.37  30.80  -      33.51
UCLA-best    -      30.86  -      33.53

ECLVQ is reported at two rates only: 30.27 and 33.05 dB.

IEE Proc.-Vis. Image Signal Process., Vol. 147, No. 4, August 2000

Fig. 9 Comparison between 256 x 256 sections of ‘Lena’ coded by three different methods using the 9/7-tap filters
a Original 512 x 512 ‘Lena’
b ECPVQ method, bit-rate = 0.25 bpp with PSNR = 34.33 dB
c Method of Said/Pearlman [1], bit-rate = 0.25 bpp with PSNR = 34.14 dB
d Baseline JPEG [24], bit-rate = 0.255 bpp with PSNR = 31.68 dB

Fig. 10 Comparison between 256 x 256 sections of ‘Goldhill’ coded by three different methods using the 9/7-tap filters
a Original 512 x 512 ‘Goldhill’
b ECPVQ method, bit-rate = 0.5 bpp with PSNR = 33.32 dB
c Method of Said/Pearlman [1], bit-rate = 0.5 bpp with PSNR = 33.13 dB
d Baseline JPEG [24], bit-rate = 0.5 bpp with PSNR = 31.68 dB


Fig. 11 Comparison between 128 x 128 sections of ‘Goldhill’ and ‘Lena’ coded by two different methods using the 10/18-tap filter pair
a Coded ‘Lena’ by ECPVQ method at 0.25 bpp with PSNR = 34.46 dB
b Coded ‘Lena’ by context-based method [3] at 0.25 bpp with PSNR = 34.53 dB
c Coded ‘Goldhill’ by ECPVQ method at 0.5 bpp with PSNR = 33.39 dB
d Coded ‘Goldhill’ by context-based method [3] at 0.5 bpp with PSNR = 33.51 dB

Fig. 12 Comparison between 60 x 60-pixel sections of ‘Lena’ coded by two different methods using the 10/18-tap filter pair and one method using the 9/7-tap filters
a Original ‘Lena’
b Coded ‘Lena’ by method of Said/Pearlman [1] at 0.25 bpp with PSNR = 34.14 dB
c Coded ‘Lena’ by ECPVQ method at 0.25 bpp with PSNR = 34.46 dB
d Coded ‘Lena’ by context-based method [3] at 0.25 bpp with PSNR = 34.53 dB


3.3.2 Subjective comparison of different coders: Subjective comparisons are shown in Figs. 9 to 12 and versions of these can be found at http://www.eng.cam.ac.uk/~ngk/. Fig. 9 permits a comparison between the ‘Lena’ image coded by ECPVQ (ECPVQ-1), Said/Pearlman [1] and JPEG [24]. Many blocking artefacts can be seen in the JPEG image, as expected. The image coded by ECPVQ contains more detail in the hat region of ‘Lena’ than the image coded with Said/Pearlman’s method.

In Fig. 10, the JPEG image looks considerably less degraded than the corresponding JPEG-coded ‘Lena’ image because the rectangular objects in ‘Goldhill’ tend to hide the blocking artefacts. The major subjective difference between the ECPVQ coded image and that of Said/Pearlman is that there is a visually annoying artefact on some of the roof tiles, indicated by the arrow, in Said/Pearlman’s decoded image. Using ECPVQ, this artefact is less severe and the texture is better preserved.

In Figs. 11 and 12, a comparison can be seen of the present method (ECPVQ-2) and the recent context-based scalar quantisation method of Chrysafis and Ortega [3], which is currently amongst the very best coders [5]. It has been found that, below both eyelids, two distinct eyelashes can be seen in Fig. 11a but they seem to have merged in Fig. 11b. Also, in certain parts of the hat area, there is more texture detail in Fig. 12c than in Fig. 12d. Comparing the two coded ‘Goldhill’ images (Figs. 11c and 11d), one can again observe that some of the roof tiles in Fig. 11c contain more texture information than those in Fig. 11d. However, it has also been observed that an object (a tower) in the top right-hand corner of Fig. 11d is more sharply defined than the same object in Fig. 11c. This indicates that, though the subjective quality of Fig. 11c would appear to be better in most areas than Fig. 11d, there is at least one region of Fig. 11d which appears to be better coded than in Fig. 11c.

The apparently better preservation of texture detail with the proposed ECPVQ method, as opposed to the methods of Said/Pearlman [1] and Chrysafis and Ortega [3], for example, appears to result from the finer quantisation of the low-energy high-frequency wavelet coefficients. For example, in the method of Said/Pearlman, the quantisation employed is equivalent to a uniform scalar quantiser with a quantisation region or ‘dead-zone’ at the origin of width 2Δ, where Δ is the nominal step size. Low-magnitude transform coefficients are therefore more coarsely quantised than larger-magnitude coefficients. In contrast, it is found that the best quantiser in the ECPVQ method (the augmented Z^n/D_n lattice) gives finer quantisation of low-magnitude coefficients and thus leads to better preservation of low-contrast textures in the image.
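The effect of a dead-zone on small coefficients can be illustrated with two scalar quantisers. This is a hedged sketch only: the plain mid-tread quantiser below is a stand-in for "finer quantisation near the origin" and is not the augmented lattice quantiser of ECPVQ; the function names, the step size and the sample coefficients are hypothetical.

```python
import math

def deadzone_quantise(x, step):
    """Index from a uniform quantiser whose zero bin is 2*step wide,
    so every coefficient with |x| < step is mapped to 0."""
    return int(math.copysign(math.floor(abs(x) / step), x))

def uniform_quantise(x, step):
    """Index from a mid-tread uniform quantiser whose zero bin is only
    step wide (|x| < step/2 is mapped to 0)."""
    return int(math.floor(x / step + 0.5))

step = 10.0
coeffs = [3.0, 6.0, 9.0, 12.0, -7.0]  # low-magnitude texture-like values

dz = [deadzone_quantise(c, step) for c in coeffs]
un = [uniform_quantise(c, step) for c in coeffs]
print(dz)  # [0, 0, 0, 1, 0]
print(un)  # [0, 1, 1, 1, -1]
```

With the dead-zone quantiser four of the five small coefficients vanish, whereas the mid-tread quantiser keeps four of them non-zero, which is the kind of low-contrast detail the paragraph above refers to.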

Further, an analysis of MSE (mean-squared error) as a function of sub-band suggests that ECPVQ coded images produce lower MSE for the three highest-frequency sub-bands (and higher MSE for the lower-frequency sub-bands) compared with algorithms which give comparable rate against PSNR performance (e.g. [2, 3, 4]).

4 Conclusions

Entropy coding methods for lattice-vector quantisation of hierarchical vectors formed from wavelet coefficients have been developed. It has been found that the algorithm ECPVQ is competitive with current state-of-the-art wavelet-based coders in terms of PSNR against rate performance. Further, it has been found that the best quantiser for these vectors in fact utilises finer quantisation near the origin. This seems to result in better preservation of texture detail in decoded images at low bit-rates than scalar-quantisation-based algorithms which give similar PSNR performance.

5 References

1 SAID, A., and PEARLMAN, W.: ‘A new fast and efficient image codec based on set partitioning in hierarchical trees’, IEEE Trans. Circuits Syst. Video Technol., 1996, 6, (3), pp. 243-250
2 XIONG, Z., RAMCHANDRAN, K., and ORCHARD, M.: ‘Space-frequency quantization for wavelet image coding’, IEEE Trans. Image Process., 1997, 6, (5), pp. 677-693
3 CHRYSAFIS, C., and ORTEGA, A.: ‘Efficient context-based entropy coding for lossy wavelet image compression’. Proceedings of IEEE Data Compression Conference, Snowbird, USA, 1997, pp. 241-250
4 LOPRESTO, S., RAMCHANDRAN, K., and ORCHARD, M.: ‘Image coding based on mixture modelling of wavelet coefficients and a fast estimation quantization framework’. Proceedings of IEEE Data Compression Conference, Snowbird, USA, 1997, pp. 221-230
5 Wavelet image coding: PSNR results. UCLA web site at: http://www.icsl.ucla.edu/~ipl/psnr_results.html
6 LINDE, Y., BUZO, A., and GRAY, R.: ‘An algorithm for vector quantizer design’, IEEE Trans. Commun., 1980, COM-28, (1), pp. 84-95
7 CHOU, P., LOOKABAUGH, T., and GRAY, R.: ‘Entropy-constrained vector quantization’, IEEE Trans. Acoust. Speech Signal Process., 1989, 37, (1), pp. 31-42
8 GERSHO, A.: ‘Asymptotically optimal block quantization’, IEEE Trans. Inf. Theory, 1979, IT-25, (4), pp. 373-380
9 WOOLF, A., and ROGERS, G.: ‘Lattice vector quantization of image wavelet coefficient vectors using a simplified form of entropy coding’. Proceedings of IEEE international conference on Acoustics, speech, signal processing, ICASSP ’94, 1994, pp. 269-272
10 BARLAUD, M., SOLE, P., GAIDON, T., ANTONINI, M., and MATHIEU, P.: ‘Pyramidal lattice vector quantization for multiscale image coding’, IEEE Trans. Image Process., 1994, 3, (7), pp. 367-381
11 MOHDYUSOF, Z., and FISCHER, T.: ‘Entropy coded lattice vector quantizer for transform and subband image coding’, IEEE Trans. Image Process., 1996, 5, (2), pp. 289-298
12 KASEI, S., and DERICHE, M.: ‘Fingerprint compression using a piecewise-uniform pyramid lattice vector quantization’. Proceedings of IEEE international conference on Acoustics, speech, signal processing, ICASSP ’97, 1997, pp. 3117-3120
13 CONWAY, J., and SLOANE, N.: ‘Sphere packings, lattices and groups’ (Springer, NY, 1993)
14 CONWAY, J., and SLOANE, N.: ‘Fast quantising and decoding algorithm for lattice quantizers and codes’, IEEE Trans. Inf. Theory, 1982, IT-28, (2), pp. 227-232
15 FISCHER, T.: ‘A pyramid vector quantizer’, IEEE Trans. Inf. Theory, 1986, IT-32, (4), pp. 568-583
16 FISCHER, T.: ‘Geometric source coding and vector quantization’, IEEE Trans. Inf. Theory, 1989, 35, (1), pp. 137-145
17 MALLAT, S.G.: ‘A theory of multiresolution signal decomposition: the wavelet representation’, IEEE Trans. Pattern Anal. Mach. Intell., 1989, 11, (7), pp. 674-681
18 TSAI, M., VILLASENOR, J., and CHEN, F.: ‘Stack-run image coding’, IEEE Trans. Circuits Syst. Video Technol., 1996, 6, (10), pp. 519-521
19 SMITH, M., and EDDINS, S.: ‘Analysis/synthesis techniques for sub-band coding’, IEEE Trans. Acoust. Speech Signal Process., 1990, 38, (8), pp. 1446-1455
20 MOUREAUX, J.-M., LOYER, P., and ANTONINI, M.: ‘Low complexity indexing method for Z^n and D_n lattice quantizers’, IEEE Trans. Commun., 1998, 46, (12), pp. 1602-1609
21 VIJ, M.: ‘Hierarchical lattice vector quantization of wavelet transformed images’. PhD dissertation, Department of Engineering, University of Cambridge, 1999
22 BERGE, C.: ‘Principles of combinatorics’ (Academic Press, 1971)
23 SHOHAM, Y., and GERSHO, A.: ‘Efficient bit allocation for an arbitrary set of quantizers’, IEEE Trans. Acoust. Speech Signal Process., 1988, ASSP-36, (9), pp. 1445-1453
24 JPEG: ‘Digital compression and coding of continuous tone still images’. Draft ISO 10918, International Organisation for Standardisation, 1991
25 RISSANEN, J., and LANGDON, G.: ‘Arithmetic coding’, IBM J. Res. Develop., 1979, 23, (2), pp. 149-162
26 YOUNG, R.: ‘Video coding using lapped transforms’. PhD dissertation, Department of Engineering, University of Cambridge, 1994
27 WITTEN, I., NEAL, R., and CLEARY, J.: ‘Arithmetic coding for data compression’, Comm. ACM, 1987, 30, (6), pp. 520-540
28 ELNAHAS, S., and DUNHAM, J.: ‘Entropy coding for low bit-rate visual telecommunications’, IEEE J. Selected Areas Commun., 1987, 5, (7), pp. 1175-1183
29 JOHNSTON, J.: ‘A filter family designed for use in quadrature mirror filter banks’. Proceedings of IEEE international conference on Acoustics, speech, signal processing, ICASSP ’80, April 1980, pp. 291-294
30 JOSHI, R., JAFARKHANI, H., KASNER, J., FISCHER, T., MARCELLIN, M., and BAMBERGER, R.: ‘Comparison of different methods of classification in subband coding of images’, IEEE Trans. Image Process., 1997, 6, (11), pp. 1473-1484