

Signal Processing 82 (2002) 1633–1647
www.elsevier.com/locate/sigpro

An adaptive lapped biorthogonal transform and its application in orientation adaptive image coding☆

Toshihisa Tanaka∗, Yukihiko Yamashita

Department of International Development Engineering, Tokyo Institute of Technology, 2-12-1, Ookayama, Meguro-ku, Tokyo 152-8552, Japan

Received 30 April 2001

Abstract

We present the theory and design of an adaptive lapped biorthogonal transform for image coding. The proposed transform consists of basis functions overlapping across adjacent blocks and non-overlapping basis functions, where the basis functions' centers of symmetry are aligned. Because of the alignment, we can use the symmetric extension method at image boundaries when we transform an input image. Next, we show that the optimal non-overlapping basis functions in the minimal mean square error sense can be found by solving an eigenvalue problem without numerical search when the feasible overlapping basis functions are given. This derivation is enabled by the subspace Karhunen–Loève transform, which provides the optimal approximation of an original signal in a given subspace, and which has been proposed by the present authors. We show an orientation adaptive example, where each adaptive transform is characterized by the angle of edges in image blocks. In the encoder, each block is selectively transformed by one of the orientation adaptive transforms. Experimental results show that the use of adaptation improves the quality of decoded images. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Transform coding; Lapped transform; Biorthogonal transform; Adaptive transform; Sub-space KLT

1. Introduction

Block-transform coding is one of the most efficient methods for image compression. It is well known that among all block transforms and at a given rate, the Karhunen–Loève transform (KLT) minimizes the expected distortion [8, p. 240]. The discrete cosine transform (DCT), which approximates the KLT of a first-order Gauss–Markov process with a large positive correlation coefficient, is used in several international image and video compression standards [18]. On the other hand, there is some doubt about the use of global statistics for generating an optimal transform. Therefore, several attempts have been made to use space-varying transforms, which are adaptively constructed for local signals (image blocks). As a result, it has been recognized that the use of multiple adaptive transforms can improve coding performance compared to the use of a single transform [3,7,9,17,20,28]. In adaptive image coding, image blocks are classified into different classes and are decomposed by the transforms corresponding to those classes. A number of adaptation methods have been proposed. Bjøntegaard [3] introduced a priori classes which are characterized by directionality such as edges and lines. Some other methods for orientation adaptation have been reported in [9,20]. On the other hand, instead of adopting a priori classes for the adaptation, there have been some self-organizing methods with training of input signals [7,17,28].

The use of adaptation has resulted in significant improvements in both compression ratio and visual quality around edges. However, as long as we use block-transform coding, decoded images cannot be free from annoying blocking effects at low bit-rates. By using wavelets or lapped transforms [13,19], we can obtain reconstructed images without the blocking artifacts; however, they are usually blurred, especially around edges. In the wavelet case, moreover, the construction of locally adaptive basis functions is very difficult since "double-shift" orthogonality/biorthogonality is required for perfect reconstruction. Lapped transforms (LT) are also powerful tools for the reduction of the blocking artifacts in image compression [4,14–16,26]. The blocking artifacts are reduced by overlapping basis functions whose size is larger than the block size. It is, however, difficult to construct a space-varying LT [25] since the LT has a strong constraint: the overlapping parts of the basis functions must be orthogonal or biorthogonal. Several studies on space-varying LTs have been conducted [5,12]. However, they are 1-D transforms, so the difficulty in designing 2-D adaptive LTs such as orientation adaptive ones remains, since the number of degrees of freedom increases greatly in the 2-D case.

In order to solve those problems, the present authors have proposed adaptive transforms with overlapping basis functions (ALT) [21,25], where the transform matrix consists of $K$ overlapping functions of size $2M$ and $(M-K)$ non-overlapping functions of size $M$ ($M > K$), and the non-overlapping functions can vary according to block signals. As a result, adaptive image coding using multiple transforms is possible. We provided the analytic solution for the optimal non-overlapping basis functions in the minimum mean square error (MSE) sense when feasible overlapping basis functions are given. This analytic solution enables us to derive a 2-D ALT without numerical search. As an example, an orientation adaptive lapped transform (OALT) has been designed. It has been shown that the OALT reduces both the blocking effects and the distortions around edges.

In the previous work [21,25], moreover, our approach required orthogonality for the basis functions of the ALT such that the subspaces spanned by the overlapping basis functions and by the non-overlapping basis functions are orthogonal. However, no work has been done on biorthogonalization of the ALT, although it has been reported that biorthogonal lapped transforms result in higher coding efficiency than orthogonal ones [1,6,14]. Also, in the previous work [21,25], the centers of symmetry of the long and short basis functions of the ALT are not aligned, so we cannot use the symmetric extension method at the image boundaries, where special transforms are required. Therefore, the scanning method for transform coefficients is very complicated. In this paper, we propose an adaptive lapped biorthogonal transform (ALBT) for image coding where the basis functions' centers of symmetry are aligned. Because of the alignment, we can use the simple symmetric extension method [2,11] at image boundaries when we transform an input image. In Section 2, we formulate a general form of a 1-D lapped biorthogonal transform with variable length basis functions (VLLBT). In the field of filter banks, a linear-phase version of the VLLBT has been introduced [6], but we provide a vector-matrix form, which is easily extended to a 2-D non-separable form and is not limited to linear phase. In Section 3, in order to construct non-overlapping basis functions, we review the subspace Karhunen–Loève transform (SKLT) [22–24] that provides the optimal approximation of an original signal in a given subspace in the minimum MSE sense. The SKLT enables us to find the optimal non-overlapping basis functions without numerical searching. In Section 4, we show an orientation adaptation example. We construct a transform matrix of the VLLBT with respect to a class characterized by the angle of edges. The resulting VLLBT is two-dimensional and will be called an orientation adaptive lapped biorthogonal transform (OALBT). In Section 5, we show a method for designing the long basis functions and construct the OALBT. In Section 6, we perform orientation adaptive image coding with the OALBT. Experimental results show that the use of adaptive transforms with overlapping basis functions can significantly reduce the blocking effects, and the use of orientation adaptation improves visual quality around edges and lines. In Section 7, we give concluding remarks.

Nomenclature

$\mathbb{R}^N$: $N$-dimensional Euclidean space
$I_N$: $N \times N$ identity matrix
$0_N$: $N \times N$ null matrix
$A^T$: transposition of a matrix $A$
$\mathrm{tr}[A]$: trace of $A$
$\mathcal{R}(A)$: range of $A$
$\mathcal{N}(A)$: null space of $A$
$\langle f, g \rangle$: inner product of two vectors $f$ and $g$
$\| f \|$: Euclidean norm of $f$

☆ This work is supported in part by JSPS Grant-in-Aid for JSPS Fellows 1210283. Toshihisa Tanaka and Yukihiko Yamashita are with the Graduate School of Science and Engineering, Tokyo Institute of Technology, Japan.

∗ Corresponding author. Tel.: +81-3-5734-3906; fax: +81-3-5734-3497.

E-mail addresses: [email protected] (T. Tanaka), [email protected] (Y. Yamashita).

0165-1684/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0165-1684(02)00306-7

2. Lapped biorthogonal transforms with non-overlapping basis functions

Before deriving the proposed adaptive lapped transform, we formulate in this section a generalized form of a lapped biorthogonal transform with overlapping and non-overlapping basis functions. A vector–matrix form will be utilized for the formulation.

2.1. Formulation

Consider two matrices $A = [A_0^T \; A_1^T \cdots A_{L-1}^T]^T$ and $B = [B_0^T \; B_1^T \cdots B_{L-1}^T]^T$ of size $LM \times M$, which will be called the forward and the inverse transform matrices of a lapped biorthogonal transform (LBT), respectively. Keep in mind that for $i = 0, \ldots, L-1$, $A_i$ and $B_i$ are $M \times M$ matrices. The columns of $A$ and $B$ are basis functions. Let $f_i$ be the $i$th block with $M$ samples of an input signal. Then, the transform vector $g_i$ is obtained by

$$ g_i = \sum_{l=1}^{L} A_{l-1}^T f_{i-L+l} \tag{1} $$

and the reconstructed block $\hat{f}_i$ is obtained by

$$ \hat{f}_i = \sum_{l=1}^{L} B_{l-1} g_{i+L-l}. \tag{2} $$

For perfect reconstruction $f_i = \hat{f}_i$, the generalized LBT has to satisfy the following conditions [19]:

$$ \sum_{l=0}^{L-1} B_l^T A_l = \sum_{l=0}^{L-1} B_l A_l^T = I_M, \tag{3} $$

$$ \sum_{l=0}^{L-1-s} B_l^T A_{l+s} = \sum_{l=0}^{L-1-s} B_l A_{l+s}^T = 0_M, \tag{4} $$

where $s = 1, \ldots, L-1$. Eq. (3) describes biorthogonality of the basis functions, and (4) means that the overlapping functions of neighboring blocks must also be biorthogonal.

Let us introduce two matrices $\hat{A}_l$ and $\bar{A}_l$ which contain the first $K$ and the remaining $(M-K)$ column vectors of $A_l$, respectively, that is,

$$ A_l = [\hat{A}_l \;\; \bar{A}_l], \quad l = 0, \ldots, L-1. \tag{5} $$

Assume that $L$ is odd. Setting $\bar{A}_l = 0_M$ except for $l = (L-1)/2$, $A$ becomes

$$ A = \begin{bmatrix} \hat{A}_0 & 0_M \\ \vdots & \vdots \\ \hat{A}_{(L-1)/2} & \bar{A} \\ \vdots & \vdots \\ \hat{A}_{L-1} & 0_M \end{bmatrix}, \tag{6} $$

where $\bar{A}_{(L-1)/2}$ is written as $\bar{A}$ for simplicity. In this case, $A$ is a forward transform such that $K$ basis functions have length $LM$ and the remaining $(M-K)$ functions have length $M$. The former will be called the long basis functions, and the latter will be called the short basis functions. In the same manner, we define an inverse transform as follows:

$$ B = \begin{bmatrix} \hat{B}_0 & 0_M \\ \vdots & \vdots \\ \hat{B}_{(L-1)/2} & \bar{B} \\ \vdots & \vdots \\ \hat{B}_{L-1} & 0_M \end{bmatrix}. \tag{7} $$

The columns of $\hat{A} = [\hat{A}_0^T \cdots \hat{A}_{L-1}^T]^T$ and those of $\hat{B} = [\hat{B}_0^T \cdots \hat{B}_{L-1}^T]^T$ correspond to the long basis functions of the forward and the inverse transforms, respectively. Similarly, $\bar{A}$ and $\bar{B}$ contain the short basis functions. In our previous work [25], we dealt with only the case $\bar{A} = \bar{B}$.
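The analysis and synthesis relations (1) and (2) can be checked numerically on a small example. The following is a minimal sketch, not the authors' implementation: it assumes NumPy, uses $L = 2$ only to keep the example small, treats block indices circularly instead of handling boundaries, and builds a toy biorthogonal pair from a hypothetical orthogonal projector `P` and invertible matrix `T` (neither appears in the paper).

```python
import numpy as np

def lbt_forward(A_blocks, f):
    """Eq. (1): g_i = sum_{l=1..L} A_{l-1}^T f_{i-L+l}; block index taken modulo n."""
    L, n = len(A_blocks), len(f)
    g = np.zeros_like(f)
    for i in range(n):
        for l in range(1, L + 1):
            g[i] += A_blocks[l - 1].T @ f[(i - L + l) % n]
    return g

def lbt_inverse(B_blocks, g):
    """Eq. (2): fhat_i = sum_{l=1..L} B_{l-1} g_{i+L-l}; block index taken modulo n."""
    L, n = len(B_blocks), len(g)
    fhat = np.zeros_like(g)
    for i in range(n):
        for l in range(1, L + 1):
            fhat[i] += B_blocks[l - 1] @ g[(i + L - l) % n]
    return fhat

M = 4
P = np.diag([1.0, 1.0, 0.0, 0.0])                   # toy orthogonal projector (assumption)
T = np.eye(M) + 0.5 * np.triu(np.ones((M, M)), 1)   # invertible but not orthogonal (assumption)
T_inv = np.linalg.inv(T)

# Toy LBT with L = 2: these blocks satisfy (3) and (4) by construction.
A_blocks = [P @ T.T, (np.eye(M) - P) @ T.T]
B_blocks = [P @ T_inv, (np.eye(M) - P) @ T_inv]

rng = np.random.default_rng(0)
f = rng.standard_normal((6, M))      # six blocks of M samples each
g = lbt_forward(A_blocks, f)
fhat = lbt_inverse(B_blocks, g)
print("perfect reconstruction:", np.allclose(fhat, f))
```

Because the forward and inverse matrices differ (`T` is not orthogonal), the example is biorthogonal rather than orthogonal, which is the case conditions (3) and (4) are written for.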

In this structure, the short basis functions never overlap across block boundaries, so that they enable us to easily generate adaptive lapped biorthogonal transforms. Since $L$ is odd, if all basis functions are symmetric or antisymmetric, then the basis functions' centers of symmetry are aligned. In the context of linear-phase perfect reconstruction filter banks, this transform can be regarded as a particular case of the generalized lapped biorthogonal transform with variable length functions [6]. We will use the term LBT with variable length (VLLBT) to refer to the proposed transform.

Substituting (6) and (7) into the LBT conditions (3) and (4), we obtain the following conditions:

Condition 1. The long basis functions of the VLLBT are required to be biorthogonal, and the overlapping parts of the basis functions of neighboring blocks must also be biorthogonal:

$$ \sum_{l=0}^{L-1} \hat{B}_l^T \hat{A}_l = I_K, \tag{8} $$

$$ \sum_{l=0}^{L-1-s} \hat{B}_l^T \hat{A}_{l+s} = \sum_{l=0}^{L-1-s} \hat{B}_{l+s}^T \hat{A}_l = 0_K, \quad s = 1, \ldots, L-1. \tag{9} $$

Furthermore, the following condition for overlapping parts is required:

$$ \sum_{l=0}^{L-1-s} \hat{B}_l \hat{A}_{l+s}^T = \sum_{l=0}^{L-1-s} \hat{B}_{l+s} \hat{A}_l^T = 0_M, \quad s = 1, \ldots, L-1. \tag{10} $$

Condition 2. For the short basis functions, biorthogonality is required:

$$ \bar{B}^T \bar{A} = I_{M-K}. \tag{11} $$

The long and short basis functions have the following relation:

$$ \sum_{l=0}^{L-1} \hat{B}_l \hat{A}_l^T + \bar{B} \bar{A}^T = I_M. \tag{12} $$

If long basis functions meet Condition 1, we say that they are feasible.

3. SKLT

3.1. Formulation

This section gives theoretical preliminaries to derive the optimal short basis functions of the VLLBT. The SKLT was first formulated by Tanaka and Yamashita [23]. All proofs in this subsection can be found in [22,23].

Assume that $\mathbb{R}^M$ is a direct sum of two spaces $\mathcal{S}_1$ and $\mathcal{S}_2$, that is, $\mathbb{R}^M = \mathcal{S}_1 \oplus \mathcal{S}_2$. The case where $\mathcal{S}_1$ and $\mathcal{S}_2$ are orthogonal was discussed in previous work [20]. In this work, however, orthogonality between $\mathcal{S}_1$ and $\mathcal{S}_2$ is not required (see Fig. 1).

Let $f$ be a vector of $M$ consecutive samples of a real wide-sense stationary random process. Let $\mathcal{L}$ be the projection matrix onto $\mathcal{S}_1$. The projection matrix $\mathcal{L}$ is not necessarily orthogonal.

Definition 1. When a projection matrix $\mathcal{L}$ whose rank is $K < M$ is given, the SKLT $X$ minimizes the functional

$$ J[X] = E_f \| f - (\mathcal{L} + X) f \|^2 \tag{13} $$

under the condition $\mathrm{rank}(X) = N$ for any $N \le M-K$.

Fig. 1. The case where the subspaces $\mathcal{S}_1$ and $\mathcal{S}_2$ are not orthogonal.

When the subspace $\mathcal{S}_1$ is given, the SKLT provides the optimal approximation in $\mathcal{S}_2$. Since letting $\mathcal{L} = 0$ leads to the criterion for the KLT [29], the KLT is a subclass of SKLTs.

Fortunately, the analytic solution of the above problem can be derived as follows. Let $R = E_f[f f^T]$ be the correlation matrix of the input vector $f$.

Lemma 1. Assume the rank of $R$ is full. Then, there exist the $(M-K)$ non-zero eigenvalues of

$$ Q = (I - \mathcal{L}) R (I - \mathcal{L})^T \tag{14} $$

such that $\lambda_0 > \cdots > \lambda_{M-K-1} > 0$, and the corresponding eigenvectors $\phi_0, \ldots, \phi_{M-K-1}$.

Theorem 1. Let $\phi_i$ be eigenvectors of $Q$ with respect to non-zero eigenvalues. Assume that we choose the eigenvectors $\phi_i$ such that $\{\phi_i\}_{i=0}^{M-K-1}$ forms an orthonormal system. Then, the functional $J[X]$ in (13) is minimized by

$$ X = \sum_{n=0}^{N-1} \phi_n \phi_n^{*T}, \tag{15} $$

where

$$ \phi_n^* = (I - \mathcal{L})^T \phi_n. \tag{16} $$

Here, we call $\{\phi_i, \phi_i^*\}_{i=0}^{M-K-1}$ an SKLT basis. We have the following result on biorthogonality of the SKLT basis.

Proposition 1. The SKLT basis $\{\phi_i, \phi_i^*\}_{i=0}^{M-K-1}$ forms a biorthonormal system, that is,

$$ \langle \phi_i, \phi_j^* \rangle = \delta_{i,j}, \quad i, j = 0, \ldots, M-K-1. \tag{17} $$
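The biorthonormality in (17) can be seen in a few lines; the proofs themselves are in [22,23], so the following is only a short worked check. It writes $\phi_i$ for the orthonormal eigenvectors of $Q$ and $\mathcal{L}$ for the projection matrix, and uses two facts: an eigenvector with non-zero eigenvalue lies in the range of $I - \mathcal{L}$ (since $\phi_i = Q\phi_i/\lambda_i$), and $I - \mathcal{L}$ is itself a projection, so it leaves such a vector unchanged.

```latex
\begin{align*}
\langle \phi_i, \phi_j^* \rangle
  &= \phi_i^T (I - \mathcal{L})^T \phi_j
   && \text{by (16)} \\
  &= \bigl( (I - \mathcal{L}) \phi_i \bigr)^T \phi_j \\
  &= \phi_i^T \phi_j
   && \text{since } \phi_i \in \mathcal{R}(I - \mathcal{L})
      \text{ and } (I - \mathcal{L})^2 = I - \mathcal{L} \\
  &= \delta_{i,j}
   && \text{orthonormality of } \{\phi_i\}.
\end{align*}
```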

3.2. Application in VLLBT

Once $\hat{A}$ and $\hat{B}$ satisfy Condition 1 ((8)–(10)), the short functions that fulfill Condition 2 ((11) and (12)) can be easily constructed by the SKLT. Let $\bar{A}$ and $\bar{B}$ be $M \times (M-K)$ matrices whose columns correspond to short basis functions. The following result leads to the optimal short basis functions in the sense of minimizing criterion (13):

Proposition 2. $\sum_{l=0}^{L-1} \hat{B}_l \hat{A}_l^T$ is a projection matrix of rank $K$.

Proof appears in Appendix A.1.

Proposition 2 guarantees data compression ability of short basis functions if suitable long basis functions are given. From Theorem 1, consequently, by setting

$$ \mathcal{L} = \sum_{l=0}^{L-1} \hat{B}_l \hat{A}_l^T \tag{18} $$

in Eq. (14), we obtain a biorthogonal system $\{\phi_i, \phi_i^*\}_{i=0}^{M-K-1}$, which leads to the optimal short basis functions. Let $\bar{a}_i$ and $\bar{b}_i$ be the $i$th columns of $\bar{A}$ and $\bar{B}$.

Proposition 3. If we set

$$ \bar{a}_i = \phi_i^* \tag{19} $$

and

$$ \bar{b}_i = \phi_i, \tag{20} $$

the resulting VLLBT achieves perfect reconstruction, that is, $\bar{A}$ and $\bar{B}$ satisfy Condition 2.

Proof appears in Appendix A.2.

Remark. When $\mathcal{L}$ is an orthogonal projection matrix, the resulting short basis functions are orthogonal, that is, $\bar{A} = \bar{B}$. In this special case, which was discussed in [20], we must impose the condition $\mathcal{L} = \mathcal{L}^T$, which seems to be too severe, on the long basis functions.

The steps to design the short basis functions can be summarized as follows:

1. Choose a set of the feasible long basis functions $\hat{A} = [\hat{A}_0^T \cdots \hat{A}_{L-1}^T]^T$ and $\hat{B} = [\hat{B}_0^T \cdots \hat{B}_{L-1}^T]^T$.
2. Obtain the projection matrix $\mathcal{L}$.
3. Obtain a correlation matrix $R$ of the signal.
4. From Lemma 1, obtain $(M-K)$ eigenvectors such that the corresponding eigenvalues are not zero.
5. Obtain the forward short basis functions from (16) and (19), and obtain the inverse short basis functions from (20).
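The five steps above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the authors' design: the long basis functions are a hand-made toy pair for $M = 4$, $K = 2$, $L = 3$ (the vectors `x`, `xp`, `u`, `up` are hypothetical and were chosen only so that Condition 1 holds), and the signal model is a first-order Markov correlation with coefficient 0.95.

```python
import numpy as np

M, K, rho = 4, 2, 0.95

# Step 1: toy feasible long basis functions (hypothetical values, L = 3 blocks each).
x, xp = np.array([1., 1., 0., 0.]), np.array([0., 0., 1., 1.])
u, up = np.array([.5, 0., 0., 0.]), np.array([0., 0., 0., .5])
z = np.zeros(M)
A_hat = [np.column_stack(c) for c in ((x, z), (xp, x), (z, -xp))]
B_hat = [np.column_stack(c) for c in ((u, z), (up, u), (z, -up))]

# Step 2: projection matrix of Eq. (18); here it is not symmetric (oblique projection).
proj = sum(B @ A.T for A, B in zip(A_hat, B_hat))

# Step 3: correlation matrix of the assumed first-order Markov model.
idx = np.arange(M)
R = rho ** np.abs(idx[:, None] - idx[None, :])

# Step 4: eigenvectors of Q in Eq. (14) with the (M - K) largest, non-zero eigenvalues.
I = np.eye(M)
Q = (I - proj) @ R @ (I - proj).T
w, V = np.linalg.eigh(Q)                    # Q is symmetric; eigenvalues come back ascending
Phi = V[:, np.argsort(w)[::-1][:M - K]]

# Step 5: forward short functions from (16) and (19), inverse short functions from (20).
A_bar = (I - proj).T @ Phi
B_bar = Phi

print("Condition 2, Eq. (11):", np.allclose(B_bar.T @ A_bar, np.eye(M - K)))
print("Condition 2, Eq. (12):", np.allclose(proj + B_bar @ A_bar.T, I))
```

Since the projection in this toy case is oblique, `A_bar` and `B_bar` differ, which is the biorthogonal situation the Remark contrasts with the orthogonal special case.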


4. Orientation adaptation

4.1. Extension to 2-D transform

The extension to a 2-D transform is simply accomplished. We treat an $M \times M$ image block as an $M^2 \times 1$ vector in the Euclidean space $\mathbb{R}^{M^2}$ by lexicographic ordering. In this case, the "long" basis functions have "length" $(LM)^2$. Similarly, the "short" ones have "length" $M^2$. (We will use terms for the 1-D case such as "long", "short", and "length".) Let

$$ \mathcal{L} = \left( \sum_{l=0}^{L-1} \hat{B}_l \hat{A}_l^T \right) \otimes \left( \sum_{l=0}^{L-1} \hat{B}_l \hat{A}_l^T \right), \tag{21} $$

where $\otimes$ denotes the Kronecker product. It can then be easily checked that $\mathcal{L}$ is still a projection matrix of rank $K^2$ and size $M^2 \times M^2$. By applying the result of Theorem 1, we obtain the $(M^2 - K^2)$ 2-D optimal short basis functions $\bar{a}_i$ and $\bar{b}_i$ of size $M^2$ for $i = 0, \ldots, M^2 - K^2 - 1$. The size of $\bar{A}$ and $\bar{B}$ is therefore $M^2 \times (M^2 - K^2)$.
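The claim that the Kronecker product in (21) is again a projection, now of rank $K^2$, is easy to confirm numerically. The sketch below assumes NumPy and uses a hypothetical 1-D projection `L1` of rank 2 for $M = 4$ (toy values, not taken from the paper); it also checks that applying the 2-D matrix to a row-major vectorized block equals applying the 1-D projection along both axes.

```python
import numpy as np

# Hypothetical 1-D oblique projection of rank K = 2 for M = 4.
L1 = np.array([[1., 1., 0., 0.],
               [0., 0., 0., 0.],
               [0., 0., 0., 0.],
               [0., 0., 1., 1.]])

L2 = np.kron(L1, L1)                      # Eq. (21): size M^2 x M^2
rank_2d = np.linalg.matrix_rank(L2)       # expected to equal K^2
is_projection = np.allclose(L2 @ L2, L2)

# Row-major (lexicographic) vectorization of an M x M block.
X = np.arange(16, dtype=float).reshape(4, 4)
same_as_separable = np.allclose(L2 @ X.reshape(-1), (L1 @ X @ L1.T).reshape(-1))

print(L2.shape, rank_2d, is_projection, same_as_separable)
```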

4.2. OALBT

Since the short basis functions never overlap across block boundaries, they can easily generate space-varying lapped transforms via the SKLT. Let $\theta$ be a parameter with respect to the angle of an edge in each block. To simplify matters, the variation of $\theta$ is discretized into $J$ levels such that $\theta_j = -\pi/2 + (\pi/J) j$ for $j = 0, \ldots, J-1$. For $j = 0, \ldots, J-1$, we define the class $C_j$ which is characterized by the corresponding parameter $\theta_j$. Image blocks are then classified into one of the classes $\{C_j\}_{j=0}^{J}$. The class $C_J$ is introduced for image blocks which do not belong to $C_j$ for $j = 0, \ldots, J-1$. We define the corresponding correlation matrix set $\{R_j\}_{j=0}^{J-1}$, where $R_j$ is an $M^2 \times M^2$ matrix of the "directional" Markov model [3] defined as

$$ (R_j)_{p+qM,\; p'+q'M} = \alpha^{|d_x(\theta_j)|} \cdot \beta^{|d_y(\theta_j)|}, \quad p, p', q, q' = 0, \ldots, M-1, \tag{22} $$

where $\alpha$ and $\beta$ are the correlation coefficients and $d_x(\theta_j)$ and $d_y(\theta_j)$ are defined as

$$ \begin{bmatrix} d_x(\theta_j) \\ d_y(\theta_j) \end{bmatrix} = \begin{bmatrix} \cos\theta_j & -\sin\theta_j \\ \sin\theta_j & \cos\theta_j \end{bmatrix} \begin{bmatrix} p - p' \\ q - q' \end{bmatrix}. \tag{23} $$
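To make the directional model concrete, the following sketch builds the angle grid and the correlation matrix of (22) and (23) in plain Python. The function names and the parameter names `alpha` and `beta` (standing for the two correlation coefficients of the model) are chosen here for illustration and are not from the paper.

```python
import math

def angle(j, J):
    """theta_j = -pi/2 + (pi/J) * j for j = 0, ..., J-1."""
    return -math.pi / 2 + (math.pi / J) * j

def directional_correlation(M, theta, alpha, beta):
    """M^2 x M^2 matrix of Eq. (22); row index p + q*M, column index p' + q'*M."""
    c, s = math.cos(theta), math.sin(theta)
    n = M * M
    R = [[0.0] * n for _ in range(n)]
    for q in range(M):
        for p in range(M):
            for q2 in range(M):
                for p2 in range(M):
                    # Eq. (23): rotate the displacement between the two pixels.
                    dx = c * (p - p2) - s * (q - q2)
                    dy = s * (p - p2) + c * (q - q2)
                    R[p + q * M][p2 + q2 * M] = alpha ** abs(dx) * beta ** abs(dy)
    return R

J = 32
# Class j = 16 corresponds to theta = 0, where the model is axis-aligned.
R16 = directional_correlation(8, angle(16, J), alpha=0.95, beta=0.50)
print(len(R16), R16[1][0], R16[8][0])
```

At $\theta = 0$ the matrix reduces to a separable model: neighbors along the $p$ axis are correlated with `alpha` and neighbors along the $q$ axis with `beta`, which gives a quick sanity check of the index convention.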

The VLLBT resulting from $\mathcal{L}$ and $R_j$ will be called the OALBT for the direction $\theta_j$. We write the short basis functions of the OALBT as a pair of $M^2 \times (M^2 - K^2)$ matrices $\bar{A}^{(j)} = [\bar{a}_0^{(j)}, \ldots, \bar{a}_{M^2-K^2-1}^{(j)}]$ and $\bar{B}^{(j)} = [\bar{b}_0^{(j)}, \ldots, \bar{b}_{M^2-K^2-1}^{(j)}]$. Similarly, we introduce a non-directional correlation matrix $R_J$ with respect to the class $C_J$ given by

$$ (R_J)_{p+qM,\; p'+q'M} = \rho^{\sqrt{|p-p'|^2 + |q-q'|^2}}, \quad p, p', q, q' = 0, \ldots, M-1, \tag{24} $$

where $\rho$ is the correlation coefficient. It generates the non-adaptive 2-D VLLBT with the optimal short basis functions (we will call this non-adaptive version just "VLLBT" to distinguish it from the OALBT in encoders). Its short functions are included in $\{\bar{A}^{(J)}, \bar{B}^{(J)}\}$ as columns.

5. Design

5.1. Long basis functions

Consider the case that the basis functions of the VLLBT are all symmetric/antisymmetric, that is, linear-phase. In our previous work [21,25], since $L = 2$ was chosen, the symmetric extension method [11,2] could not be used, and special care was required at the image boundary in order to avoid border distortion. In this paper, however, since $L$ is odd as formulated previously, the basis functions' centers of symmetry are aligned. Therefore, we can use the simple symmetric extension method at image boundaries when we perform the transformation.

The long basis functions are produced by a method like the GenLOT [4]. Let $h_i$, $i = 0, \ldots, K-1$, be linearly independent vectors of size $M$ such that $h_i$ is symmetric if $i$ is even, and antisymmetric otherwise. Define

$$ H_0 = [H_e \;\; H_o], \quad \Phi_i = \begin{bmatrix} U_i & 0 \\ 0 & V_i \end{bmatrix}, \quad W = \frac{1}{\sqrt{2}} \begin{bmatrix} I_{K/2} & I_{K/2} \\ I_{K/2} & -I_{K/2} \end{bmatrix}, \tag{25} $$

where $H_e$ and $H_o$ are $M \times K/2$ matrices consisting of the columns $h_i$ for even $i$ and $h_i$ for odd $i$, respectively, and $U_i$ and $V_i$ are non-singular matrices of size $K/2 \times K/2$. Then, $\hat{A}$ can be found from the following recursion:

$$ H_i = \begin{bmatrix} H_{i-1} W & 0_{M \times K} \\ 0_{M \times K} & H_{i-1} W \end{bmatrix} \begin{bmatrix} I_{K/2} & 0_{K/2} \\ 0_{K/2} & 0_{K/2} \\ 0_{K/2} & 0_{K/2} \\ 0_{K/2} & I_{K/2} \end{bmatrix} W \Phi_{i-1}, \tag{26} $$

$$ \hat{A} = H_{L-1}. \tag{27} $$

The corresponding inverse long basis function matrix $\hat{B}$ can be obtained by replacing $\Phi_i$ with $(\Phi_i^{-1})^T$ and $H_0$ with $G_0$ such that $G_0^T H_0 = I_K$ in (26) and (27). Specifically, $\hat{B}$ is written as

$$ G_i = \begin{bmatrix} G_{i-1} W & 0_{M \times K} \\ 0_{M \times K} & G_{i-1} W \end{bmatrix} \begin{bmatrix} I_{K/2} & 0_{K/2} \\ 0_{K/2} & 0_{K/2} \\ 0_{K/2} & 0_{K/2} \\ 0_{K/2} & I_{K/2} \end{bmatrix} W (\Phi_{i-1}^{-1})^T, \tag{28} $$

$$ \hat{B} = G_{L-1}. \tag{29} $$

These $\hat{A}$ and $\hat{B}$ meet the conditions (8)–(10). As a result, the free parameters which we should find are $H_0$, $G_0$, and $\Phi_i$ ($i = 0, \ldots, K-1$).
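The recursion (26)–(29) and the feasibility conditions (8)–(10) can be tried out together. The sketch below assumes NumPy and one particular reading of (26): the block matrix staggers two copies of $H_{i-1}W$ by $M$ rows, and the selection matrix keeps the first $K/2$ columns of the upper copy and the last $K/2$ columns of the lower copy. The starting vectors and the scalar $U_i$, $V_i$ values are arbitrary toy choices, not the optimized parameters of the paper.

```python
import numpy as np

def lift(H_prev, Phi, M):
    """One step of recursion (26): a (rows x K) matrix grows by M rows."""
    K = H_prev.shape[1]
    h = K // 2
    Ih = np.eye(h)
    W = np.block([[Ih, Ih], [Ih, -Ih]]) / np.sqrt(2.0)
    X = H_prev @ W
    Z = np.zeros((M, h))
    left = np.vstack([X[:, :h], Z])      # first K/2 columns, zero rows appended
    right = np.vstack([Z, X[:, h:]])     # last K/2 columns, shifted down by M rows
    return np.hstack([left, right]) @ W @ Phi

def long_basis(H0, G0, Phis):
    """Stacked long basis functions (A_hat, B_hat), each of size L*M x K."""
    M = H0.shape[0]
    H, G = H0, G0
    for Phi in Phis:
        H = lift(H, Phi, M)                      # (26), (27)
        G = lift(G, np.linalg.inv(Phi).T, M)     # (28), (29)
    return H, G

def is_feasible(A_hat, B_hat, M):
    """Check Condition 1, Eqs. (8)-(10), block by block."""
    A = [A_hat[i:i + M] for i in range(0, A_hat.shape[0], M)]
    B = [B_hat[i:i + M] for i in range(0, B_hat.shape[0], M)]
    L, K = len(A), A_hat.shape[1]
    ok = np.allclose(sum(b.T @ a for a, b in zip(A, B)), np.eye(K))       # (8)
    for s in range(1, L):
        pairs = [(l, l + s) for l in range(L - s)]
        ok &= np.allclose(sum(B[l].T @ A[m] for l, m in pairs), 0)        # (9)
        ok &= np.allclose(sum(B[m].T @ A[l] for l, m in pairs), 0)
        ok &= np.allclose(sum(B[l] @ A[m].T for l, m in pairs), 0)        # (10)
        ok &= np.allclose(sum(B[m] @ A[l].T for l, m in pairs), 0)
    return bool(ok)

# Toy parameters: M = 4, K = 2, L = 3 (two lifting steps with scalar U_i, V_i).
M = 4
H0 = np.column_stack([[1., 2., 2., 1.], [1., 1., -1., -1.]])   # symmetric, antisymmetric
G0 = H0 @ np.linalg.inv(H0.T @ H0)                             # ensures G0^T H0 = I_K
Phis = [np.diag([1.2, 0.8]), np.diag([0.9, 1.5])]
A_hat, B_hat = long_basis(H0, G0, Phis)
print(A_hat.shape, is_feasible(A_hat, B_hat, M))
```

Choosing `G0` as the transposed pseudo-inverse of `H0` is only one way to satisfy the constraint $G_0^T H_0 = I_K$; in the design problem both matrices are free parameters.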

For application in image coding, we use coding gain as a cost function. Higher coding gain correlates most consistently with higher PSNR. The correlation matrix for a random process with $LM$ samples is given by $C_{p,q} = \rho^{|p-q|}$ for $p, q = 0, \ldots, LM-1$. Coding gain for a biorthogonal transform is given by [1,10,23]

$$ J_{CG} = 10 \log_{10} \left( \prod_{i=0}^{M-1} \langle a_i, C a_i \rangle \, \| b_i \|^2 \right)^{-1/M}, \tag{30} $$

where $a_i$ and $b_i$ are columns of the forward transform $A$ and the inverse transform $B$, respectively.

For image compression purposes, "low DC leakage" is an essential requirement [19]. Assume that the input signal is a constant function, that is, a DC signal, and then consider the case that only the lowest transform coefficient is kept and the rest are set to zero. If each transform coefficient except the lowest one is not zero, then the reconstructed signal is no longer identical to the original one. Such basis functions cause the checkerboard artifact [19]. Therefore, the inner product of each basis function except the lowest one with the original signal must be zero. In other words, let $1_N$ be a DC vector such that all components are one, that is, $1_N = (\underbrace{1, \ldots, 1}_{N})^T$. Then, for $i = 1, \ldots, M-1$, $a_i$ has to satisfy

$$ \langle a_i, 1_{LM} \rangle = 0. \tag{31} $$

A cost function for the low DC leakage is defined as

$$ J_{DC} = \sum_{i=1}^{M-1} \langle a_i, 1_{LM} \rangle^2. \tag{32} $$

Consequently, finding the long basis functions is formulated as a non-linear optimization problem:

$$ \min_{H_0,\, G_0,\, U_i,\, V_i} \; -J_{CG} + \gamma J_{DC}, \tag{33} $$

subject to $H_0^T G_0 = I_K$, where $\gamma$ is a weight for the combination of the two cost functions. In every step of the optimization, the short basis functions are found via the SKLT with a correlation matrix of the Markov model, as expressed in Section 3.2, and then the cost function (33) into which $A$ and $B$ are substituted is evaluated.
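The two cost terms in (33) are simple to evaluate once the basis functions are available. The sketch below is a plain-Python illustration with hypothetical function names; it takes each basis function as a list of samples (a short function would be zero-padded to the common length) and assumes the Markov correlation $\rho^{|p-q|}$ stated above. The 2-point Haar pair is used only as a sanity check.

```python
import math

def coding_gain_db(A_cols, B_cols, rho):
    """Eq. (30) with C[p][q] = rho**|p-q|; A_cols and B_cols are lists of basis vectors."""
    M = len(A_cols)
    log_prod = 0.0
    for a, b in zip(A_cols, B_cols):
        n = len(a)
        # <a, C a>: variance of the coefficient produced by the forward function a.
        var = sum(a[p] * rho ** abs(p - q) * a[q] for p in range(n) for q in range(n))
        norm2 = sum(v * v for v in b)            # ||b||^2 of the matching inverse function
        log_prod += math.log10(var * norm2)
    return -10.0 * log_prod / M

def dc_leakage(A_cols):
    """Eq. (32): squared DC response of every forward function except the lowest one."""
    return sum(sum(a) ** 2 for a in A_cols[1:])

# Sanity check with the orthonormal 2-point Haar transform (forward = inverse).
s = 1.0 / math.sqrt(2.0)
haar = [[s, s], [s, -s]]
gain = coding_gain_db(haar, haar, rho=0.95)
leak = dc_leakage(haar)
print(round(gain, 3), leak)
```

For the Haar pair the coefficient variances are $1 + \rho$ and $1 - \rho$, so the gain reduces to $-5 \log_{10}(1 - \rho^2)$, and the high-pass function has no DC response.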

In our test, we choose $M = 8$, $L = 3$, and $K = 2$; there are two sets of long basis functions $\{a_0, a_1\}$ and $\{b_0, b_1\}$, where each function has length 24. The case $K = 2$ gives the minimum number of long basis functions because of the existence condition described in [27]. In other words, this case gives the maximum number of adaptive basis functions, which may be a good choice for adaptive image coding. In this case, $U_i$ and $V_i$ are scalars, and both $H_0$ and $G_0$ consist of two vectors of size $M$. This optimization problem can be reduced to an unconstrained optimization problem as described in [20,23]. The resulting long basis functions are illustrated in Fig. 2. It is interesting that the inverse long basis functions decay to zero at their ends, even though we never impose any decay constraints on the long basis functions. This property is effective in reducing the blocking effects.

5.2. 2-D short basis functions

Since the long basis functions have already been obtained, the short basis functions are automatically found as described in Section 4. By using $\mathcal{L}$ given in (21), we can obtain the 2-D short basis functions with respect to the angle $\theta_j$. The total number of classes is 33 ($J = 32$), where the classes $C_j$, $j = 0, \ldots, 31$, denote 32 "directional" blocks and $C_{32}$ denotes one non-directional block. For the correlation coefficients of $R_j$, we choose $\alpha = 0.95$ and $\beta = 0.50$. The correlation coefficient of $R_{32}$ for design of the VLLBT is set to 0.95 ($\rho = 0.95$). Consequently, we obtain 32 OALBTs and one VLLBT. As an example, the first eight short functions of the OALBT with respect to the angle $\theta = 2\pi/15$ are illustrated in Fig. 3.

Fig. 2. The resulting long basis functions: (a) the long basis functions of the forward transform (No. 0 and No. 1); (b) the long basis functions of the inverse transform (No. 0 and No. 1).

Fig. 3. First eight short basis functions of the OALBT for $\theta = 2\pi/15$ (the correlation coefficients are set to $\alpha = 0.95$ and $\beta = 0.50$).

6. Image coding applications

Recall that this paper does not address the "coding step" but the "transform step". Therefore, we will use a simple and well-known algorithm to encode the transform coefficients. The encoder consists of the following four steps: classification, transformation, quantization, and coding. An input image is partitioned into $8 \times 8$ blocks. Each block is classified into one of the classes and transformed by the corresponding adaptive transform. To be fair, the scanned coefficients are uniformly quantized with the same step size for each coefficient. In order to code the transform coefficients, we use the same run-length/Huffman table as the baseline JPEG [18]. Side information on the transform selected in each block is coded by using the run-length and Huffman techniques. The details are described as follows.

6.1. Classification

In order to transform blocks with the short basis functions, the encoder has to classify blocks into one of the classes $\{C_j\}_{j=0}^{32}$. Then, each block is transformed by the OALBT or the VLLBT derived from the corresponding correlation matrix $R_j$.

Let $f$ be an input block. If the block is a part of smooth image regions, that is, the variance of $f$ is comparatively small, then it would not contain any strong edges. Therefore, it is natural that $f$ belongs to $C_{32}$. Specifically, if the variance is below a certain threshold $\tau$, then the input signal is assigned to the "non-directional" class $C_{32}$, i.e.,

$$ \sigma_f^2 < \tau \;\Rightarrow\; f \in C_{32}, \tag{34} $$

where $\sigma_f^2$ denotes the variance of $f$. On the other hand, if the variance of $f$ is comparatively large, that is, $\sigma_f^2 > \tau$, the block would contain strong edges or textures. In this case, we classify the data $f$ by the subspace method. The projection matrix with respect to the class $C_j$ is defined as

$$ P_j = \sum_{r=0}^{R-1} \bar{b}_r^{(j)} \bar{a}_r^{(j)T}, \tag{35} $$

where $1 \le R \le M-K-1$. In subspace methods, the projection matrix $P_j$ characterizes each class $C_j$. Hence, an input block $f$ is assigned to the class such that the norm of its projection is maximized, that is, for $i, j = 0, \ldots, 32$,

$$ \| P_i f \|^2 > \| P_j f \|^2 \;\; \forall i \ne j \;\Rightarrow\; f \in C_i. \tag{36} $$

Assume that $\bar{b}_i$, $i = 0, \ldots, M-K-1$, is normalized so that $\| \bar{b}_i \| = 1$. Then, the above expression can be reduced to

$$ \sum_{r=0}^{R-1} |\langle \bar{a}_r^{(i)}, f \rangle|^2 > \sum_{r=0}^{R-1} |\langle \bar{a}_r^{(j)}, f \rangle|^2 \;\; \forall i \ne j \;\Rightarrow\; f \in C_i, \tag{37} $$

since

$$ \| P_j f \|^2 = \left\| \sum_{r=0}^{R-1} \langle \bar{a}_r^{(j)}, f \rangle \bar{b}_r^{(j)} \right\|^2 = \sum_{r=0}^{R-1} \sum_{s=0}^{R-1} \langle \bar{a}_r^{(j)}, f \rangle \langle \bar{a}_s^{(j)}, f \rangle \langle \bar{b}_r^{(j)}, \bar{b}_s^{(j)} \rangle = \sum_{r=0}^{R-1} |\langle \bar{a}_r^{(j)}, f \rangle|^2 \tag{38} $$

by using the fact that the inverse short basis functions $\bar{b}_r$ are orthonormal because the matrix $Q$ is symmetric.
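The classification rule of (34) and (37) can be sketched in plain Python as follows. The function name, the argument layout, and the tiny $2 \times 2$ "bases" in the usage part are hypothetical and serve only to show the control flow; in the paper each class would supply the first $R$ forward short basis functions of its transform.

```python
from statistics import pvariance

def classify(f, analysis_bases, tau, nondirectional):
    """Return the class index for block f, given as a flat list of samples.

    analysis_bases[j] holds the first R forward short basis functions of class j.
    """
    if pvariance(f) < tau:                      # Eq. (34): smooth block
        return nondirectional

    def energy(basis):                          # one side of Eq. (37)
        return sum(sum(a_k * f_k for a_k, f_k in zip(a, f)) ** 2 for a in basis)

    scores = [energy(basis) for basis in analysis_bases]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy setup: 2 x 2 blocks in lexicographic order, R = 1 function per class.
bases = [
    [[0.5, -0.5, 0.5, -0.5]],    # class 0: responds to variation along the first axis
    [[0.5, 0.5, -0.5, -0.5]],    # class 1: responds to variation along the second axis
    [[0.5, -0.5, -0.5, 0.5]],    # class 2: stands in for the non-directional class
]
print(classify([10.0, 0.0, 10.0, 0.0], bases, tau=1.0, nondirectional=2))
print(classify([10.0, 10.0, 0.0, 0.0], bases, tau=1.0, nondirectional=2))
print(classify([5.0, 5.0, 5.0, 5.0], bases, tau=1.0, nondirectional=2))
```

Only inner products with the forward functions are needed, which is the practical benefit of reducing (36) to (37).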

Table 1
Huffman codebook for run-length on the non-directional class to encode side information

Run   Code length   Code word        Hex
0     1             0                0000
1     2             10               0002
2     4             1100             000C
3     4             1101             000D
4     5             11100            001C
5     5             11110            001E
6     6             111110           003E
7     8             11111100         00FC
8     8             11111101         00FD
9     8             11111110         00FE
10    9             111111110        01FE
11    10            1111111110       03FE
12    12            111111111100     0FFC
13    12            111111111101     0FFD
14    12            111111111110     0FFE
15    13            1111111111110    1FFE
DRL   5             11101            001D
EOH   13            1111111111111    1FFF

6.2. Transform

An input image is first transformed with the long basis functions. The transformation is performed in the horizontal and vertical directions of the image. At image boundaries, the symmetric extension method [4,6,11] is employed. As a result, we obtain $K^2$ coefficients per block. Next, each block belonging to a class $C_j$ is transformed with the short basis functions $\bar{\mathbf{A}}^{(j)}$. The short basis functions generate $(M^2 - K^2)$ transform coefficients for each block. Therefore, the total number of coefficients is the same as the size of the input image. Note that there is no redundancy.
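As an illustration of the boundary handling mentioned above, the following sketch mirrors a 1-D signal at both ends before filtering. It is only a sketch under assumptions: the function name and the `whole_sample` switch are hypothetical, and the text does not state which symmetry type the transform uses, since that depends on the symmetry of the basis functions described in [4,6,11].

```python
def symmetric_extend(signal, pad, whole_sample=False):
    """Mirror `pad` samples at each end of a 1-D signal.

    whole_sample=False repeats the edge sample (half-sample symmetry);
    whole_sample=True mirrors about the edge sample without repeating it.
    """
    n = len(signal)
    offset = 1 if whole_sample else 0
    if pad + offset > n:
        raise ValueError("pad is too large for a single reflection")
    left = [signal[i + offset] for i in range(pad - 1, -1, -1)]
    right = [signal[n - 1 - offset - i] for i in range(pad)]
    return left + list(signal) + right
```

Because the extended samples are copies of samples inside the image, no extra coefficients have to be stored for the border, which is consistent with the statement that the coefficient count equals the image size.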

6.3. Bit allocation

For decoding, side information which indicates the class of each block should be transmitted or stored. We build the Huffman codebook for run-length on the non-directional class. For a block which does not belong to the non-directional class $C_{32}$, five bits are allocated. When the rest of the blocks in the image are non-directional class blocks, the end-of-header (EOH) symbol is used for efficient coding. The Huffman codes used for side information are listed in Table 1.
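The code words of Table 1 can be checked mechanically. The sketch below copies the table into a dictionary and tests two properties that a usable Huffman code should have: no code word is a prefix of another, and the Kraft sum equals one, so no code space is wasted. The dictionary and function names are illustrative, and the sketch deliberately does not model how the encoder sequences these symbols, which the text does not spell out.

```python
from fractions import Fraction

# Code words copied from Table 1; integer keys 0..15 are run lengths.
SIDE_INFO_CODES = {
    0: "0", 1: "10", 2: "1100", 3: "1101", 4: "11100", 5: "11110",
    6: "111110", 7: "11111100", 8: "11111101", 9: "11111110",
    10: "111111110", 11: "1111111110", 12: "111111111100",
    13: "111111111101", 14: "111111111110", 15: "1111111111110",
    "DRL": "11101", "EOH": "1111111111111",
}


def is_prefix_free(codes):
    """True if no code word is a prefix of another code word."""
    words = sorted(codes.values())
    # After sorting, any prefix relation appears between neighbours.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def kraft_sum(codes):
    """Exact Kraft sum; it equals 1 for a complete prefix code."""
    return sum(Fraction(1, 2 ** len(word)) for word in codes.values())


print(is_prefix_free(SIDE_INFO_CODES), kraft_sum(SIDE_INFO_CODES))
```

Both checks pass for the 18 symbols of Table 1, and the Hex column of the table is simply each code word read as a binary number.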


Fig. 4. Classification map: each white line segment indicates the angle of the directional block.

[Fig. 5 plots, panels (a)–(d): PSNR [dB] versus rate [bpp] over 0.2–1.0 bpp, with curves for OALBT, VLLBT, and DCT.]

Fig. 5. Comparison of PSNR (dB) results for 512 × 512 “Barbara”, “Lena”, “Pepper” and “Boat” images.


Fig. 6. Comparison of the decoded “Barbara” images at rate 0.25 bpp.

For transform coefficients, in this test, we adopt a uniform scalar quantizer and the same run-length/Huffman codebook as the baseline JPEG [18]. However, zig-zag scanning is applied only to the coefficients produced by the long basis functions, since the short basis functions have a 2-D non-separable form.
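A uniform scalar quantizer of the kind referred to above can be sketched as follows. The function names and the step size are hypothetical; the text does not give the step sizes used in the experiments or say whether a dead zone is applied, so this mid-tread version without a dead zone is only one plausible reading.

```python
import math


def quantize(coefficients, step):
    """Uniform mid-tread quantizer: index of the nearest multiple of `step`."""
    indices = []
    for c in coefficients:
        magnitude = math.floor(abs(c) / step + 0.5)  # round half away from zero
        indices.append(int(math.copysign(magnitude, c)))
    return indices


def dequantize(indices, step):
    """Reconstruct each coefficient at the centre of its quantization bin."""
    return [q * step for q in indices]
```

With a step of 5.0, the coefficients `[10.0, -3.9, 0.4, 7.5]` map to the indices `[2, -1, 0, 2]`, and the reconstruction error of every coefficient is at most half a step.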

6.4. Image coding results

In the classifier, eight short basis functions are used ($R = 8$). The threshold $\theta = 200$ is chosen through experiments. In order to evaluate our proposed transform, three different transforms are compared:

DCT: The case that the DCT is used.
VLLBT: The case that the 1-D VLLBT is used.
OALBT: The case that one VLLBT and 32 OALBTs are adaptively used.

Only the last case needs side information for decoding.

The amount of side information and the percentage of directional blocks (corresponding to classes $C_j$, $j = 0, \ldots, 31$) in the OALBT coder are shown in Table 2. Fig. 4 shows the resulting map of the test images “Barbara” and “Pepper”. In Fig. 4, four kinds of white line segments indicate directional blocks


Fig. 7. Comparison of the decoded “Pepper” images at rate 0.25 bpp.

Table 2
The amount of overhead and the percentage of directional regions

                     Barbara   Pepper   Lena     Boat
Overhead in bpp      0.0445    0.0276   0.0281   0.0309
Directional region   39.1%     19.3%    20.2%    24.2%

and their directions, where the 32 directions are grouped into four for legibility. In “Barbara”, some blocks containing striped patterns are classified into directional classes, and others are not. The latter blocks are judged to be smooth regions by the encoder with (34). This decision depends on the threshold $\theta$. For a smaller threshold, they would be classified into directional regions. However, a smaller threshold results in a longer overhead and can lead to lower coding efficiency. An empirical discussion of the threshold in orientation adaptive coding is given in [20].

Coding results and comparisons at different rates are illustrated in Fig. 5. In “Barbara”, OALBT consistently outperforms both DCT and VLLBT in the PSNR sense, even though side information is required. For example, at 0.25 bpp, OALBT gains 1.27 dB over DCT. In all images, at lower bit rates, both VLLBT and OALBT show higher PSNRs than DCT. However, PSNRs in “Pepper”, “Lena”, and “Boat”


with OALBT are slightly lower than those with VLLBT. At around 0.20 bpp, OALBT results in lower performance than VLLBT. This can be explained by the existence of side information. For example, at a PSNR of 30.00 dB, the bit rate (including side information) in OALBT is 0.2139 bpp, and that in VLLBT is 0.2049 bpp for “Pepper”. Thus, VLLBT requires only 0.0090 bpp fewer than OALBT, even though OALBT spends 0.0276 bpp on side information for “Pepper”. This fact implies that a more sophisticated scheme to encode side information than ours may lead to better performance.

Figs. 6–7 illustrate the original and the decoded 512 × 512 images “Barbara” and “Pepper” at 0.25 bpp, respectively. We can observe that OALBT provides better subjective quality of the decoded image compared to the other methods. It seems that OALBT reduces ringing and blurring around strong edges and therefore provides clearer edges and lines. These results suggest that the use of adaptive basis functions in the VLLBT can improve coding efficiency despite the fact that side information is required for decoding.
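The bit budget quoted for “Pepper” at 30.00 dB can be worked through explicitly. The short script below uses only the three figures given in the text and in Table 2; the variable names are illustrative. It separates the OALBT rate into side information and coefficient bits, which makes the argument about the cost of the overhead concrete.

```python
# Figures quoted in the text for "Pepper" at a PSNR of 30.00 dB.
oalbt_total_bpp = 0.2139   # OALBT rate, including side information
vllbt_total_bpp = 0.2049   # VLLBT rate (no side information needed)
side_info_bpp = 0.0276     # OALBT overhead for "Pepper" from Table 2

# How much more OALBT spends in total.
gap_bpp = oalbt_total_bpp - vllbt_total_bpp
# What OALBT spends on transform coefficients alone.
oalbt_coeff_bpp = oalbt_total_bpp - side_info_bpp
# How much OALBT saves on coefficients relative to VLLBT.
coeff_saving_bpp = vllbt_total_bpp - oalbt_coeff_bpp

print(f"total-rate gap:         {gap_bpp:.4f} bpp")
print(f"OALBT coefficient rate: {oalbt_coeff_bpp:.4f} bpp")
print(f"saving on coefficients: {coeff_saving_bpp:.4f} bpp")
```

On these numbers the adaptive transform spends 0.1863 bpp on coefficients, which is 0.0186 bpp less than VLLBT, but the 0.0276 bpp of side information more than offsets that saving and leaves the 0.0090 bpp gap.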

7. Conclusions

A novel adaptive lapped biorthogonal transform and its application in orientation adaptive coding have been proposed. The proposed transform consists of overlapping and non-overlapping basis functions, where the basis functions’ centers of symmetry are aligned, so that we can treat the image boundaries without special processing. To construct non-overlapping basis functions, we have also introduced a transform that provides the optimal approximation of an original signal in a given subspace. In the encoder, an image block is selectively transformed by one of the orientation adaptive transforms. Experimental results show that the adaptive transforms gain an advantage over non-adaptive transforms even though side information is needed.

In this paper, we have shown only the orientation adaptation example. However, since the non-overlapping basis functions can be obtained from a correlation matrix by solving the eigenvalue problem, any conventional adaptation procedure based on the Karhunen–Loève transform can be applied to the proposed design method of lapped transforms. For instance, we can construct various classes of adaptive lapped transforms by training correlation matrices of a source of input signals. Therefore, the proposed framework is powerful for the design of adaptive transforms without the blocking effect.

In this paper, the adaptation is applied only to short basis functions. However, constructing 2-D non-separable long basis functions adapted for a 2-D characteristic such as orientation would be very meaningful. This problem is left for future research.

Appendix A. Proofs

A.1. Proof of Proposition 2

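Let us introduce the transform matrices given by

$$\mathbf{T}_f = \begin{bmatrix}
\ddots & & \mathbf{0} \\
\mathbf{A}_0 & & \\
\mathbf{A}_1 & \mathbf{A}_0 & \\
\vdots & \vdots & \\
\mathbf{A}_{L-1} & \mathbf{A}_{L-2} & \\
 & \mathbf{A}_{L-1} & \\
\mathbf{0} & & \ddots
\end{bmatrix} \tag{A.1}$$

and

$$\mathbf{T}_i = \begin{bmatrix}
\ddots & & \mathbf{0} \\
\mathbf{B}_0 & & \\
\mathbf{B}_1 & \mathbf{B}_0 & \\
\vdots & \vdots & \\
\mathbf{B}_{L-1} & \mathbf{B}_{L-2} & \\
 & \mathbf{B}_{L-1} & \\
\mathbf{0} & & \ddots
\end{bmatrix}. \tag{A.2}$$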

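Before we prove Proposition 2, we show the following lemma.

Lemma 2. The matrix $\mathbf{P}$ defined by

$$\mathbf{P} = \mathbf{T}_i \mathbf{T}_f^T \tag{A.3}$$

is a projection matrix.

Proof. We have

$$\mathbf{P}^2 = \mathbf{T}_i \mathbf{T}_f^T \mathbf{T}_i \mathbf{T}_f^T = \mathbf{T}_i \mathbf{T}_f^T = \mathbf{P}, \tag{A.4}$$

since the columns of $\mathbf{A}$ and $\mathbf{B}$ are biorthogonal from (8), so that $\mathbf{T}_f^T \mathbf{T}_i = \mathbf{I}$. Therefore, Lemma 2 holds.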


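Now, we shall show the proof of Proposition 2.

Proof of Proposition 2. From (9), we can rewrite $\mathbf{P}$ as

$$\begin{aligned}
\mathbf{P} &= \begin{bmatrix}
\ddots & & \mathbf{0} \\
\mathbf{B}_0 & & \\
\mathbf{B}_1 & \mathbf{B}_0 & \\
\vdots & \vdots & \\
\mathbf{B}_{L-1} & \mathbf{B}_{L-2} & \\
 & \mathbf{B}_{L-1} & \\
\mathbf{0} & & \ddots
\end{bmatrix}
\times
\begin{bmatrix}
\ddots & & & & & \mathbf{0} \\
 & \mathbf{A}_0^T & \mathbf{A}_1^T & \cdots & \mathbf{A}_{L-1}^T & \\
 & & \mathbf{A}_0^T & \cdots & \mathbf{A}_{L-2}^T & \mathbf{A}_{L-1}^T \\
\mathbf{0} & & & & & \ddots
\end{bmatrix} \\
&= \begin{bmatrix}
\ddots & & & \mathbf{0} \\
 & \sum_{l=0}^{L-1} \mathbf{B}_l \mathbf{A}_l^T & & \\
 & & \sum_{l=0}^{L-1} \mathbf{B}_l \mathbf{A}_l^T & \\
\mathbf{0} & & & \ddots
\end{bmatrix} \\
&= \begin{bmatrix}
\ddots & & & \mathbf{0} \\
 & \mathbf{L} & & \\
 & & \mathbf{L} & \\
\mathbf{0} & & & \ddots
\end{bmatrix},
\end{aligned} \tag{A.5}$$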

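which implies that $\mathbf{P}$ is a block diagonal matrix. It follows from Lemma 2 that $\mathbf{L}^2 = \mathbf{L}$. Therefore, $\mathbf{L}$ is a projection matrix.

Next, because $\mathbf{L}$ is a projection matrix, we have

$$\operatorname{rank}(\mathbf{L}) = \operatorname{tr}(\mathbf{L}), \tag{A.6}$$

which can be rewritten as

$$\operatorname{rank}(\mathbf{L}) = \operatorname{tr}(\mathbf{L}) = \operatorname{tr}\!\left( \sum_{l=0}^{L-1} \mathbf{B}_l \mathbf{A}_l^T \right) = \operatorname{tr}\!\left( \sum_{l=0}^{L-1} \mathbf{B}_l^T \mathbf{A}_l \right) = \operatorname{tr}[\mathbf{I}_K] = K, \tag{A.7}$$

since (8) holds. This completes the proof.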
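The two facts used in this proof, idempotence of the diagonal block and the equality of its rank and trace, can be checked numerically on a toy case. The sketch below assumes that condition (8) amounts to $\sum_l \mathbf{B}_l^T \mathbf{A}_l = \mathbf{I}_K$ and takes a single block ($L = 1$) with $M = 3$ and $K = 2$; the matrices `A0` and `B0` are made-up illustrative values, not basis functions from the paper, and the helper names are hypothetical.

```python
from fractions import Fraction


def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def transpose(x):
    return [list(col) for col in zip(*x)]


def trace(x):
    return sum(x[i][i] for i in range(len(x)))


def rank(matrix):
    """Rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    found = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [v - factor * p for v, p in zip(rows[r], rows[found])]
        found += 1
    return found


# Toy M x K matrices with M = 3, K = 2 (illustrative values only).
A0 = [[1, 0], [0, 1], [1, 1]]
B0 = [[1, 0], [0, 1], [0, 0]]

assert matmul(transpose(B0), A0) == [[1, 0], [0, 1]]  # assumed form of (8)
proj = matmul(B0, transpose(A0))                       # diagonal block of P
assert matmul(proj, proj) == proj                      # idempotent, as in Lemma 2
assert rank(proj) == trace(proj) == 2                  # rank = trace = K, as in (A.7)
```

The example is deliberately non-orthogonal (`A0` and `B0` differ), so it exercises the biorthogonal case rather than the simpler orthogonal one.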

A.2. Proof of Proposition 3

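It is evident from Proposition 2 that $\bar{\mathbf{A}}$ and $\bar{\mathbf{B}}$ satisfy the condition (11).

The SKLT $\mathbf{X}$ of rank $M - K$ is identical to $\bar{\mathbf{B}} \bar{\mathbf{A}}^T$, since

$$\mathbf{X} = \sum_{n=0}^{M-K-1} \boldsymbol{\phi}_n \boldsymbol{\phi}_n^{*T} = \bar{\mathbf{B}} \bar{\mathbf{A}}^T. \tag{A.8}$$

From the fact that $\boldsymbol{\phi}_i$ is an eigenvector of $\mathbf{Q}$, $\bar{\mathbf{B}} \bar{\mathbf{A}}^T$ gives the projection into $\mathcal{R}(\mathbf{Q})$. Hence, we have

$$\mathcal{R}(\bar{\mathbf{B}} \bar{\mathbf{A}}^T) \subset \mathcal{R}(\mathbf{Q}) = \mathcal{R}\big((\mathbf{I} - \mathbf{L}) \mathbf{R} (\mathbf{I} - \mathbf{L})^T\big) = \mathcal{R}(\mathbf{I} - \mathbf{L}) \tag{A.9}$$

since the rank of $\mathbf{R}$ is full. From Lemma 1, $\operatorname{rank}(\mathbf{Q}) = M - K$. Therefore, we have

$$\mathcal{R}(\bar{\mathbf{B}} \bar{\mathbf{A}}^T) = \mathcal{R}(\mathbf{I} - \mathbf{L}). \tag{A.10}$$

On the other hand, because $\bar{\mathbf{B}} \bar{\mathbf{A}}^T \mathbf{L} = \mathbf{0}$, we obtain $\mathcal{N}(\bar{\mathbf{B}} \bar{\mathbf{A}}^T) \supset \mathcal{R}(\mathbf{L}) = \mathcal{N}(\mathbf{I} - \mathbf{L})$. However, since the rank of $\bar{\mathbf{B}} \bar{\mathbf{A}}^T$ is $M - K$, we have

$$\mathcal{N}(\bar{\mathbf{B}} \bar{\mathbf{A}}^T) = \mathcal{N}(\mathbf{I} - \mathbf{L}). \tag{A.11}$$

From (A.10) and (A.11),

$$\mathbf{I} - \mathbf{L} = \bar{\mathbf{B}} \bar{\mathbf{A}}^T, \tag{A.12}$$

which yields that (12) holds.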

References

[1] S.O. Aase, T.A. Ramstad, On the optimality of nonunitary filter banks in subband coders, IEEE Trans. Image Process. 4 (December 1995) 1585–1591.

[2] R.H. Bamberger, S.L. Eddins, V. Nuri, Generalized symmetric extension method of size-limited multirate filter banks, IEEE Trans. Image Process. 3 (January 1994) 82–87.

[3] G. Bjøntegaard, A novel method for compressing images using discrete directional transforms, in: Proceedings of SPIE Visual Communication and Image Processing 88, Vol. 1001, 1988, pp. 840–846.

[4] R.L. de Queiroz, T.Q. Nguyen, K.R. Rao, The GenLOT: Generalized linear-phase lapped orthogonal transform, IEEE Trans. Signal Process. 44 (March 1996) 497–507.

[5] R.L. de Queiroz, K.R. Rao, Time-varying lapped transforms and wavelet packets, IEEE Trans. Signal Process. 41 (December 1993) 3293–3305.


[6] R.L. de Queiroz, T.D. Tran, A fast lapped transform for image coding, in: Proceedings IS&T/SPIE Symposium on Electronic Imaging, Image and Video Communications and Processing, 2000.

[7] R.D. Dony, S. Haykin, Optimally adaptive transform coding, IEEE Trans. Image Process. 4 (October 1995) 1358–1370.

[8] A. Gersho, R.M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Boston, MA, 1992.

[9] M. Helsingius, P. Kuosmanen, J. Astola, Image compression using multiple transforms, Signal Process. Image Commun. 15 (March 2000) 513–529.

[10] J. Katto, Y. Yasuda, Performance evaluation of subband coding and optimization of its filter coefficients, in: Proceeding of SPIE Conference on Visual Commun. and Image Processing, Vol. 1605, 1991, pp. 95–106.

[11] H. Kiya, K. Hishikawa, M. Iwahashi, A development of symmetric extension method for subband image coding, IEEE Trans. Image Process. 3 (January 1994) 78–81.

[12] T.J. Klausutis, V.K. Madisetti, Adaptive lapped transform-based image coding, IEEE Signal Process. Lett. 4 (September 1997) 245–247.

[13] H.S. Malvar, Signal Processing with Lapped Transforms, Artech House, Norwood, MA, 1992.

[14] H.S. Malvar, Biorthogonal and nonuniform lapped transforms for transform coding with reduced blocking and ringing artifacts, IEEE Trans. Signal Process. 46 (April 1998) 1043–1053.

[15] H.S. Malvar, D.H. Staelin, The LOT: Transform coding without blocking effects, IEEE Trans. Acoust., Speech, Signal Process. 37 (April 1989) 553–559.

[16] S. Muramatsu, H. Kiya, A new factorization technique for the generalized linear-phase LOT and its fast implementation, IEICE Trans. Fundamentals E79-A (August 1996) 1173–1179.

[17] K. Ohzeki, Adaptive KL transform coding and its design method, Trans. IEICE, Part B-I J77-B-I (February 1994) 94–101 (in Japanese).

[18] W.B. Pennebaker, J.L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, New York, NY, 1992.

[19] G. Strang, T. Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press, Wellesley, MA, 1996.

[20] T. Tanaka, Y. Yamashita, Vector-embedded Karhunen–Loève transform and its application in orientation adaptive coding of images, IEICE Trans. Fundamentals E73-A (June 2000) 1257–1266.

[21] T. Tanaka, Y. Yamashita, The orientation adaptive lapped orthogonal transform for image coding, in: Proceedings of 2000 IEEE International Conference on Image Processing (ICIP 2000), Vol. III, September 2000, pp. 829–832.

[22] T. Tanaka, Y. Yamashita, A lapped biorthogonal transform with optimal non-overlapping basis functions, Tech. Rep. CS2000-92, IEICE, December 2000.

[23] T. Tanaka, Y. Yamashita, A biorthogonal transform with overlapping and non-overlapping basis functions for image coding, IEEE Trans. Signal Process. 2001, submitted for publication.

[24] T. Tanaka, Y. Yamashita, A biorthogonal transform with overlapping and non-overlapping basis functions for image coding, in: Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2001), May 2001.

[25] T. Tanaka, Y. Yamashita, Adaptive transforms with overlapping basis functions for image coding, J. Electron. Imaging 10 (July 2001) 706–719.

[26] T.D. Tran, R.L. de Queiroz, T.Q. Nguyen, Linear-phase perfect reconstruction filter bank: Lattice structure, design, and application in image coding, IEEE Trans. Signal Process. 48 (January 2000) 133–147.

[27] T.D. Tran, M. Ikehara, T.Q. Nguyen, Linear phase paraunitary filter bank with filters of different lengths and its application in image compression, IEEE Trans. Signal Process. 47 (October 1999) 2730–2744.

[28] G.W. Wornell, D.H. Staelin, Transform image coding with a new family of models, in: Proceeding of the IEEE International Conference on Acoustic, Speech, and Signal Processing, 1988, pp. 777–780.

[29] Y. Yamashita, H. Ogawa, Relative Karhunen–Loève transform, IEEE Trans. Signal Process. 44 (February 1996) 371–378.