

Rate-Distortion Analysis of Directional Wavelets and Partitioning Improvements

Arian Maleki, Boshra Rajaei and Hamid R. Pourreza

Abstract—The inefficiency of separable wavelets in representing smooth edges has led to a great interest in the study of new two-dimensional transformations. The most popular criterion for analyzing these transformations is the approximation power. Transformations with near-optimal approximation power are useful in many applications such as denoising, enhancement, etc. However, they are not necessarily good for compression. Therefore, most of the nearly optimal transformations, such as curvelets and contourlets, have not found any application in image compression yet. One of the most promising schemes for image compression is the elegant idea of directional wavelets. While these algorithms outperform the state-of-the-art image coders in practice, our theoretical understanding of them is very limited. In this paper, we adopt the notion of rate-distortion and calculate the performance of the directional wavelets on a class of edge-like images. Our theoretical analysis shows that if the edges are not ‘sharp’, the directional wavelets will compress them more efficiently than the separable wavelets. It also demonstrates the inefficiency of the quadtree partitioning that is often used with the directional wavelets. To solve this issue, we propose a new partitioning scheme called megaquad partitioning. Our simulation results on real-world images confirm the benefits of the proposed partitioning algorithm, promised by our theoretical analysis.

Index Terms—Rate-distortion, quadtree, wavelet transform, image coding, directional transform.

I. INTRODUCTION

IN the last decade, several schemes have been proposed to overcome the limitations of the traditional separable wavelets by incorporating directional representations [1]–[6]. One of the most successful transformations proposed for image compression is the directional wavelets (DIW) [7]–[16]. Aligning the direction of the wavelet transform with the edge allows DIW to reduce the energy of the high frequency bands, which in turn improves the efficiency of the compression. In spite of the success of DIW heuristics, the theoretical analysis of these algorithms has been very limited; for instance, the most successful heuristics, which use a partitioning algorithm and assign just one direction to each block of the partition, have not been analyzed yet. The relation between the partitioning algorithm and DIW coding is not known either. In this paper, our goal is to use the distortion-rate (DR) framework to address these issues. It will be proved that quadtree partitioning is not optimal in the DR sense. We will also show how the performance can be improved by a simple modification of the partitioning scheme.

Arian Maleki is with the Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005 USA (phone: +1-650-575-1480; e-mail: [email protected]).

Boshra Rajaei is with the Department of Computer Engineering, Ferdowsi University of Mashhad, Iran (corresponding author, phone: +98-915-522-2232; e-mail: [email protected]).

Hamid R. Pourreza is with the Department of Computer Engineering, Ferdowsi University of Mashhad, Iran (e-mail: [email protected]).

The paper is organized as follows. The next section formalizes the notion of DIW and clarifies the DR framework. Section III discusses the contributions of our work and compares them with the related work in the literature. Section IV is devoted to the proofs of our theoretical results. Section V addresses the implementation issues. Finally, Section VI includes the details of our implementations, the parameter values, and a comparison of our proposed algorithms with other algorithms.

II. FRAMEWORK AND MODEL

A. Notation

Let $\mathbb{R}$ be the set of real numbers. The $L_2$-norm of a function $f : \mathbb{R}^2 \to \mathbb{R}$ is defined as

\[ \|f\|_2 = \left( \iint f^2(t_1, t_2)\, dt_1\, dt_2 \right)^{1/2}. \]

$L_2(\mathbb{R}^2)$ is the set of all functions $f : \mathbb{R}^2 \to \mathbb{R}$ with finite $L_2$-norm. Finally, let the angle bracket represent the inner product of two functions in this space, i.e., for $f, g \in L_2(\mathbb{R}^2)$ we have

\[ \langle f, g \rangle = \iint f(t_1, t_2)\, g(t_1, t_2)\, dt_1\, dt_2. \]

B. Directional wavelets

Let $\phi : \mathbb{R} \to \mathbb{R}$ and $\psi : \mathbb{R} \to \mathbb{R}$ be the univariate scaling and wavelet functions of an orthonormal wavelet transform, respectively [17]. The shifted and scaled forms of these functions are denoted by $\psi_{j,n}(t) = 2^{j/2}\psi(2^j t - n)$ and $\phi_{j,n}(t) = 2^{j/2}\phi(2^j t - n)$, where $j, n \in \mathbb{Z}$, and $\mathbb{Z}$ is the set of integers. There are several approaches for extending the one-dimensional wavelets to two-dimensional wavelets. The most standard construction is the separable wavelets, which use

\[ \Psi^1_{j,n_1,n_2}(t_1, t_2) = \phi_{j,n_1}(t_1)\,\psi_{j,n_2}(t_2), \quad \Psi^2_{j,n_1,n_2}(t_1, t_2) = \psi_{j,n_1}(t_1)\,\phi_{j,n_2}(t_2), \quad \Psi^3_{j,n_1,n_2}(t_1, t_2) = \psi_{j,n_1}(t_1)\,\psi_{j,n_2}(t_2) \]

as the bases. It is proved in [17] that separable wavelets provide an orthonormal basis for $L_2(\mathbb{R}^2)$ and therefore any function $f \in L_2(\mathbb{R}^2)$ can be written as

\[ f(t_1, t_2) = \sum_{j,n_1,n_2} \sum_{i=1}^{3} C^i_{j,n_1,n_2}\, \Psi^i_{j,n_1,n_2}(t_1, t_2), \]

where for every $j, n_1, n_2 \in \mathbb{Z}$

\[ C^i_{j,n_1,n_2} = \langle f, \Psi^i_{j,n_1,n_2} \rangle, \quad i = 1, 2, 3. \]

Separable wavelets are suboptimal in representing image edges due to the lack of directionality in the construction of their bases [1]–[4], [18].
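As an illustration only, the following Python sketch applies one level of a separable transform to a small discrete image with a diagonal edge. It assumes the Haar wavelet as the orthonormal pair $(\phi, \psi)$, which the text does not prescribe, and the names `haar_1d`, `haar_2d`, and `energy` are hypothetical helpers. Because the transform is orthonormal, the total energy is preserved, while the diagonal edge leaves non-zero energy in every detail band.

```python
import math


def haar_1d(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (x[2 * k] + x[2 * k + 1]) for k in range(len(x) // 2)]
    detail = [s * (x[2 * k] - x[2 * k + 1]) for k in range(len(x) // 2)]
    return approx, detail


def haar_2d(img):
    """One level of the separable 2-D Haar transform (rows first, then columns).

    Returns the scaling band and three detail bands. Which detail band plays
    the role of Psi^1, Psi^2, or Psi^3 depends on the axis convention.
    """
    row_a, row_d = [], []
    for row in img:
        a, d = haar_1d(row)
        row_a.append(a)
        row_d.append(d)

    def filter_columns(mat):
        cols = list(zip(*mat))
        a_cols, d_cols = zip(*(haar_1d(list(c)) for c in cols))
        # Transpose back so that the result is again a list of rows.
        return [list(r) for r in zip(*a_cols)], [list(r) for r in zip(*d_cols)]

    ll, lh = filter_columns(row_a)
    hl, hh = filter_columns(row_d)
    return ll, lh, hl, hh


def energy(mat):
    """Sum of squared entries of a matrix given as a list of rows."""
    return sum(v * v for row in mat for v in row)


# A 4x4 image with a sharp diagonal edge (assumed test input).
image = [[1.0 if j <= i else 0.0 for j in range(4)] for i in range(4)]
bands = haar_2d(image)
print("input energy:", energy(image))
print("band energies:", [energy(b) for b in bands])
```

The detail bands do not vanish here because the edge is aligned with neither coordinate axis, which is the lack of directionality referred to above.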


One seemingly simple solution is to align the direction of the wavelets with the edge [7]–[15]. In this approach, the wavelets can be applied in two different directions $\theta_1$ and $\theta_2$. Let $\Psi_{j_1,j_2,n_1,n_2}(t_1, t_2) = \psi_{j_1,n_1}(t_1)\,\psi_{j_2,n_2}(t_2)$ and

\[ \Psi^{\theta_1,\theta_2}_{j_1,j_2,n_1,n_2}(t_1, t_2) = \Psi_{j_1,j_2,n_1,n_2}(t_1 \cos\theta_1 + t_2 \cos\theta_2,\; t_1 \sin\theta_1 + t_2 \sin\theta_2); \]

then the DIW coefficients are given by

\[ W^{\theta_1,\theta_2}_{j_1,j_2,n_1,n_2} = \iint f(t_1, t_2)\, \Psi^{\theta_1,\theta_2}_{j_1,j_2,n_1,n_2}(t_1, t_2)\, dt_1\, dt_2. \tag{1} \]

In practical applications, we usually stop applying DIW at a given scale $J_0$ and use the scaling function at that scale. Toward this goal, we should replace either one or both of the $\psi$ functions in the definition of $\Psi_{j_1,j_2,n_1,n_2}(t_1, t_2)$ with $\phi$. Finally, it is easy to confirm that this transform is invertible as long as $\theta_1 \neq \theta_2$.¹
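The invertibility claim can be checked on the change of variables alone. The sketch below, with hypothetical function names, implements the map $(t_1, t_2) \mapsto (t_1\cos\theta_1 + t_2\cos\theta_2,\ t_1\sin\theta_1 + t_2\sin\theta_2)$ and its inverse. The determinant of this linear map is $\sin(\theta_2 - \theta_1)$, so the sketch treats the two directions as distinct only when they differ modulo $\pi$; that reading of the condition is our assumption.

```python
import math


def directional_coords(t1, t2, theta1, theta2):
    """Arguments passed to Psi in the definition of the directional wavelet."""
    u1 = t1 * math.cos(theta1) + t2 * math.cos(theta2)
    u2 = t1 * math.sin(theta1) + t2 * math.sin(theta2)
    return u1, u2


def inverse_directional_coords(u1, u2, theta1, theta2):
    """Invert directional_coords; fails when the two directions coincide."""
    det = math.sin(theta2 - theta1)  # determinant of the 2x2 map
    if abs(det) < 1e-12:
        raise ValueError("theta1 and theta2 give a singular change of variables")
    t1 = (u1 * math.sin(theta2) - u2 * math.cos(theta2)) / det
    t2 = (-u1 * math.sin(theta1) + u2 * math.cos(theta1)) / det
    return t1, t2


# Round trip for one horizontal and one diagonal direction.
u = directional_coords(0.3, -1.2, 0.0, math.pi / 4)
print(inverse_directional_coords(u[0], u[1], 0.0, math.pi / 4))
```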

C. Partitioning

The main goal of DIW is to find the directions along which the image is more regular, and apply the wavelets in those directions.² However, in natural images different regions may have different dominant directions. Therefore, partitioning an image into its homogeneous regions is an inevitable part of DIW. Finding the optimal partition is an NP-hard problem. In other applications, greedy ideas such as CART have been explored for this purpose; see Chapter 9.2 in [19]. However, even greedy algorithms are still computationally infeasible for image compression. Therefore, simpler approaches such as quadtree partitioning have been proposed [12], [13], [15]. In this section, we first explain the quadtree partitioning, and then discuss another approach called megaquad partitioning.

1) Quadtree partition: Consider the interval $[0, d_1] \times [0, d_2]$. In this interval a dyadic rectangle is the set of points

\[ \left[ \frac{k_1 d_1}{2^j}, \frac{(k_1+1) d_1}{2^j} \right) \times \left[ \frac{k_2 d_2}{2^j}, \frac{(k_2+1) d_2}{2^j} \right), \]

where $0 \le k_1, k_2 < 2^j$. Here, $j$ is called the scale of the dyadic rectangle, which is different from the scale of the wavelet transform. Four dyadic rectangles of the same size are called siblings if and only if their union is also a dyadic rectangle, which is called the parent. The four siblings are then called the children of their common parent. The process of partitioning one dyadic rectangle into its four dyadic children is called quad partitioning. Following [18], we define the quadtree partition as

Definition 1. A quadtree partition is any partition of $I_d = [0, d_1] \times [0, d_2]$ reachable by applying the following production rules:

1) $I_d$ is a quadtree partition;

2) If $P = \{S_1, S_2, \ldots, S_n\}$ is a quadtree partition, then any other partition that can be obtained by applying quad partitioning to one of the rectangles in $P$ is also a quadtree partition.

¹ In real applications, where the signals are discrete time, both the application of DIW and its inversion need some care. They are discussed in Section VI.

² In this paper we use the phrase directional wavelets to indicate an adaptive transformation that, at every pixel, applies the wavelet transform in two arbitrary but fixed directions.

We use the notation $\Upsilon_J$ for the set of all quadtree partitions up to scale $J$. An example of a quadtree partition is shown in Figure 1(a).
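The production rules of Definition 1 can be sketched as follows. This is a minimal illustration, not the partitioning algorithm used in the experiments: the splitting criterion `needs_split` is a hypothetical callback (here, "the rectangle is crossed by the line $t_2 = t_1$"), and rectangles are stored as `(x0, y0, width, height)` tuples with exact rational coordinates.

```python
from fractions import Fraction


def quad_split(rect):
    """Split a dyadic rectangle (x0, y0, width, height) into its four children."""
    x0, y0, w, h = rect
    hw, hh = w / 2, h / 2
    return [(x0, y0, hw, hh), (x0 + hw, y0, hw, hh),
            (x0, y0 + hh, hw, hh), (x0 + hw, y0 + hh, hw, hh)]


def quadtree_partition(root, needs_split, max_scale):
    """Return the leaves of a quadtree partition of `root` up to `max_scale`.

    Rule 1: the root alone is a partition. Rule 2: any rectangle of a
    partition may be replaced by its four children. `needs_split` is an
    assumed user-supplied predicate deciding where rule 2 is applied.
    """
    leaves = []
    stack = [(root, 0)]
    while stack:
        rect, scale = stack.pop()
        if scale < max_scale and needs_split(rect):
            stack.extend((child, scale + 1) for child in quad_split(rect))
        else:
            leaves.append(rect)
    return leaves


def crosses_diagonal(rect):
    """Assumed criterion: the open rectangle meets the line t2 = t1."""
    x0, y0, w, h = rect
    return y0 < x0 + w and x0 < y0 + h


unit_square = (Fraction(0), Fraction(0), Fraction(1), Fraction(1))
for leaf in quadtree_partition(unit_square, crosses_diagonal, 2):
    print(leaf)
```

Refining only the rectangles that contain the edge keeps the number of leaves small away from the edge, but it still produces many small rectangles along it.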

2) Megablock: Let $P = \{S_1, S_2, \ldots, S_n\}$ be a partition of $[0, d_1] \times [0, d_2]$. Two blocks are called neighbors if and only if the intersection of their boundaries contains more than one point. The boundary of a set is defined as its closure minus its interior. Furthermore, we use the notation $\mathcal{N}(S_i)$ for the set of all the neighbors of the block $S_i$. Given an attribute set $A$, we assign an attribute $\alpha_i \in A$ to every $S_i$. In the context of the directional wavelets, the attributes correspond to the directions that are assigned to the blocks. After assigning the attributes, we represent the partition with $P^A = \{S_1^{\alpha_1}, S_2^{\alpha_2}, \ldots, S_n^{\alpha_n}\}$ and call it an $A$-decorated partition.

We also define the following concepts:

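Definition 2. Two blocks are called similar, denoted by $S_i^{\alpha_i} \sim S_j^{\alpha_j}$, if and only if $\alpha_i = \alpha_j$.

Definition 3. A path $S_1^{\alpha_1} \leftrightarrow S_2^{\alpha_2} \leftrightarrow \ldots \leftrightarrow S_N^{\alpha_N}$ is a sequence of blocks such that $S_{i-1}^{\alpha_{i-1}} \in \mathcal{N}(S_i^{\alpha_i})$ for every $i \in \{2, \ldots, N\}$.

Definition 4. Two blocks $S_i^{\alpha_i}, S_j^{\alpha_j}$ are connected if and only if $\alpha_i = \alpha_j$ and there exists a path $S_i^{\alpha_i} \leftrightarrow S_{i_1}^{\alpha_i} \leftrightarrow \ldots \leftrightarrow S_j^{\alpha_i}$ between them.

Definition 5. A megablock is a union of two or more blocks such that any pair of blocks are connected. Also, a megablock $M_b$ is called maximal iff there exists no other megablock $M_{b'}$ with $M_b \subset M_{b'}$.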

3) Megaquad partition: A megaquad partition is any partition that results from an $A$-decorated quadtree partition, $P^A = \{S_1^{\alpha_1}, S_2^{\alpha_2}, \ldots, S_n^{\alpha_n}\}$, by joining the connected similar blocks, and that also has the property that all the megablocks are maximal. It is clear that the megaquad partition corresponding to an $A$-decorated quadtree partition is unique. However, many $A$-decorated quadtree partitions may result in the same megaquad partition. There are several advantages in using the megaquad partitions. First, they can represent the geometry of the image better. Second, as we will discuss in the next section, they let us code far fewer directions. This idea was first introduced in [20] for the wedgelet transform. Figure 1 shows an $A$-decorated quadtree partition and the corresponding megaquad partition.
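Joining the connected similar blocks amounts to computing connected components among blocks that share an attribute. The sketch below is one possible way to do this, using a union-find structure; the data layout (a list of `(rectangle, attribute)` pairs with rectangles as `(x0, y0, width, height)` tuples in exact arithmetic) and the function names are assumptions for illustration. Regions consisting of a single block are returned as well, although Definition 5 reserves the word megablock for unions of two or more blocks.

```python
def are_neighbors(a, b):
    """True if the boundaries of two partition rectangles share a segment,
    i.e. more than one point. Corner contact alone does not count."""
    ax0, ay0, aw, ah = a
    bx0, by0, bw, bh = b
    ax1, ay1, bx1, by1 = ax0 + aw, ay0 + ah, bx0 + bw, by0 + bh
    overlap_x = min(ax1, bx1) - max(ax0, bx0)
    overlap_y = min(ay1, by1) - max(ay0, by0)
    share_vertical_side = (ax1 == bx0 or bx1 == ax0) and overlap_y > 0
    share_horizontal_side = (ay1 == by0 or by1 == ay0) and overlap_x > 0
    return share_vertical_side or share_horizontal_side


def megaquad(decorated):
    """Group the blocks of a decorated partition into maximal connected
    regions of equal attribute. Returns lists of block indices."""
    n = len(decorated)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            (rect_i, attr_i), (rect_j, attr_j) = decorated[i], decorated[j]
            if attr_i == attr_j and are_neighbors(rect_i, rect_j):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# A 2x2 grid of unit blocks; "a" and "b" stand for two directions.
blocks = [((0, 0, 1, 1), "a"), ((1, 0, 1, 1), "a"),
          ((0, 1, 1, 1), "b"), ((1, 1, 1, 1), "a")]
print(megaquad(blocks))
```

The pairwise neighbor test is quadratic in the number of blocks, which is acceptable for a sketch; an implementation on a pixel grid would normally use the grid adjacency directly.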

D. Rate-distortion framework

Consider a class of signals $F \subset L_2(\mathbb{R}^2)$. Also, consider an encoding scheme $E : F \to \{1, 2, \ldots, 2^R\}$. This is an $R$-bit encoder. We also call $D : \{1, 2, \ldots, 2^R\} \to F$ the decoder. For a given function $f \in F$ define $\hat{f} = D(E(f))$. The distortion of $f$ is defined as

\[ D(f, \hat{f}) = \|f - \hat{f}\|_2. \tag{2} \]

We define the distortion of the coding scheme on the class $F$ as the distortion of the algorithm on the least favorable function, i.e.,

\[ D_{E,D}(R) = \sup_{f \in F} D(f, D(E(f))). \tag{3} \]
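To make definitions (2) and (3) concrete, consider a toy class that is not the one studied in this paper: functions equal to a constant $c \in [0, 1]$ on the unit square and zero elsewhere, so that $\|f - \hat{f}\|_2 = |c - \hat{c}|$. The sketch below, under that assumption, uses a uniform $R$-bit scalar quantizer as the encoder and the cell midpoint as the decoder, and approximates the supremum in (3) by a maximum over a finite grid.

```python
def encode(c, R):
    """Map c in [0, 1] to an index in {1, ..., 2**R} (uniform cells)."""
    cells = 2 ** R
    return min(int(c * cells), cells - 1) + 1


def decode(index, R):
    """Reconstruct the midpoint of the quantization cell with this index."""
    return (index - 0.5) / 2 ** R


def worst_case_distortion(R, samples=10001):
    """Approximate the supremum in (3) by a maximum over a grid of c values."""
    grid = (k / (samples - 1) for k in range(samples))
    return max(abs(c - decode(encode(c, R), R)) for c in grid)


for R in range(1, 7):
    print(R, worst_case_distortion(R), 2.0 ** -(R + 1))
```

For this toy class the worst case is attained at a cell boundary and equals $2^{-(R+1)}$, an exponential decay in the rate. This is the kind of behavior one would hope for on any class described by a few parameters.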


Fig. 1. An $A$-decorated quadtree partition (left), and its corresponding megaquad partition (right). $\alpha_i$ represents the attribute of each dyadic rectangle. All the patches that are connected and have the attribute $\alpha_1$ have formed one megablock, and all the connected dyadic rectangles with attribute $\alpha_2$ have formed another megablock.

When the encoding and decoding strategies are clear from the context, we use the simpler notation $D(R)$ and call it the distortion-rate (DR) function. Under this notion of distortion, the best performance a compression scheme can achieve is specified by the well-known Kolmogorov $\varepsilon$-entropy [21].

E. Edge model

One of the most interesting aspects of real-world images is their edges. Therefore, in this paper we direct our attention to a class of edge-like images and analyze the DR performance of DIW on this class.

Let $BP^Q_N(I_x, A)$ be the space of piecewise polynomial functions on the interval $I_x \subset \mathbb{R}$, where $Q$ is the number of singularity points, $N$ is the degree of the polynomials, and $A$ is an upper bound on the magnitude of these functions. Let $I_x = [0, d]$ and $A = d$. We also assume that, except at the singularity points, the magnitude of the derivative of these functions is bounded by one. For any $h(x) \in BP^Q_N(I_x, d)$ we define $f_h : \mathbb{R}^2 \to \mathbb{R}$ as

\[ f_h(t_1, t_2) = \begin{cases} 1 & \text{if } t_2 \le h(t_1), \\ \dfrac{h(t_1) - t_2 + w}{w} & \text{if } h(t_1) \le t_2 \le h(t_1) + w, \\ 0 & \text{if } t_2 > h(t_1) + w. \end{cases} \tag{4} \]

This function is a cartoon image with a piecewise smooth edge. In the above equation, $h(t_1)$ is the edge of this cartoon image and $w > 0$ is the width of the edge. It is important to emphasize the following aspects of this model:

1. The edge is not sharp and has some width associated with it. DIW uses this fact to decrease the energy of the high frequency bands. In other words, since the coder uses a finite number of bits in coding the directions, there is always a discrepancy between the true direction of an edge and the direction that the algorithm considers. If the edges are very sharp, i.e., we remove the second rule from $f_h$ and set $w$ to zero in the third rule, this discrepancy may cause the high frequency wavelet coefficients to be as large as the corresponding coefficients of the separable wavelets. Therefore, no major improvement is observed for such edges. The smoothness of the edges is the main key to the success of DIW.

2. We assumed that all the edge parts have the same width. This simplifies our notation, and the analysis can be easily extended to more general settings, where each part has its own width. In fact, the width of the edge can be a function of $t_1$ and the analysis still works.

Consider the following class of edge-like cartoon images:

\[ H^Q_N = \{ f_h(t_1, t_2) : h(t_1) \in BP^Q_N([0, d], d) \}. \tag{5} \]

Inspired by [18], we call this class of functions the horizon model. Figure 2(a) depicts a function in $H^3_1$. In the rest of this paper, we direct our attention to $H^Q_1$. Extensions to higher polynomial degrees are presented in [22]. Therefore, for notational simplicity we skip the subscript and represent this class with $H^Q$.
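The horizon model is straightforward to evaluate numerically. The following sketch builds one member of the class under illustrative assumptions: $d = 1$, an edge $h$ that is continuous and piecewise linear with slopes of magnitude at most one, and $w = 0.1$. The helper names and the knot values are hypothetical.

```python
def piecewise_linear(knots):
    """Return h(x) that linearly interpolates a sorted list of (x, y) knots."""
    def h(x):
        for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        raise ValueError("x is outside the domain of h")
    return h


def horizon_image(h, w):
    """Return f_h from equation (4) for an edge h and an edge width w > 0."""
    def f(t1, t2):
        edge = h(t1)
        if t2 <= edge:
            return 1.0
        if t2 <= edge + w:
            return (edge - t2 + w) / w  # linear ramp from 1 down to 0
        return 0.0
    return f


# One singularity point at x = 0.5; the slopes are 0.8 and -0.4.
h = piecewise_linear([(0.0, 0.2), (0.5, 0.6), (1.0, 0.4)])
f = horizon_image(h, w=0.1)
print(f(0.25, 0.10), f(0.25, 0.45), f(0.25, 0.90))
```

Sampling `f` on a grid gives a discrete test image whose edge has a ramp of width `w`, rather than a jump.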

III. OUR CONTRIBUTIONS AND COMPARISON WITH OTHER WORK

A. Theoretical contributions

The inefficiency of the separable wavelets in capturing edges has been proved from several different perspectives including the approximation power [1], minimax estimation [18], and the rate-distortion theory [23]. We will first confirm this fact by calculating the DR performance of the separable wavelets on the simplest class of functions we consider in this paper, $H^0$. See Section IV for more information on the details of our derivation.

Theorem III.1. A coding scheme based on the uniform quantization of separable wavelet coefficients (just the large coefficients) results in the following DR function:

\[ D(R) = O\!\left( R^{-3/2} \sqrt{\log_2 R} \right). \]

Since this class of functions can be easily coded with a few parameters, intuitively speaking, we would expect to see an exponential rather than a polynomial decay of the distortion in terms of the rate. However, the separable wavelets do not exploit the structure of the edges and therefore generate many large coefficients at the high frequency bands.³ To utilize this structure, DIW aligns its direction with the edge. This decreases the size of the wavelet coefficients in the high frequency bands. See Figure 3. The following theorem, proved in Section IV, formalizes this heuristic.

Theorem III.2. The coding scheme that uses DIW with the uniform quantization achieves the following DR on the class $H^0$:

\[ D(R) = O\!\left( 2^{-c_8 \sqrt{R}} \right), \]

where $c_8$ is a positive constant that just depends on $d$ and $\ell$, the length of the wavelet filters.

It is important to note that the distortion of DIW decays much more rapidly than that of the separable wavelets.

³ Although we derive an upper bound for the DR performance, it can be verified that the bounds we are using are tight and we expect the lower bound to be close to the upper bound derived here.


Fig. 2. A sample function in $H^3_1$ (a) and its corresponding optimal partition (b). The results of the quadtree partition with four levels and of the megaquad partition are shown in (c) and (d), respectively. Similar to the optimal partitioning, megaquad has just four partitions.

Fig. 3. Comparison of the separable wavelets (left) with DIW (right). The high frequency bands of the separable wavelets have many non-zero elements, while DIW has moved most of the energy of the high frequency bands to the low frequency band by aligning the wavelet to the direction of the edge.

So far, our theoretical results are concerned with $H^0$. However, $H^0$ is too simple for modeling the edges of real-world images. Therefore, we should extend the theory to more complicated models. The first step is to consider the piecewise linear edges, $H^Q$. The performance of the separable wavelets on $H^Q$ is the same as on $H^1$, since they use the same direction for all the pieces; on the other hand, DIW needs to adapt its direction to the edge and therefore segmentation of the image is inevitable. As explained before, the most popular segmentation is the quad partitioning. The following theorem, proved in Section IV, derives the performance of DIW with the quadtree segmentation.

Theorem III.3. On $H^Q$ the directional wavelet coder with uniform quantization and quadtree segmentation achieves the following DR:

\[ D(R) = O\!\left( \sqrt[6]{R}\; 2^{-c_9 \sqrt[3]{R}} \right), \]

where $c_9$ is a constant that depends on $\ell$, $d$, and $Q$.

This theorem shows a major improvement over the traditional separable wavelets. However, comparing Theorem III.2 with Theorem III.3 reveals that quadtree partitioning has deteriorated the performance of DIW. The main reason is that the quadtree generates many unnecessary dyadic rectangles around the singularity points. See Figure 2. Coding these extra partitions and their attributes will consume a large portion of the bit budget. Figure 2(b) displays the optimal segmentation for an image from $H^3$. If we had access to this optimal partition, we could get the same performance as on $H^0$. However, as discussed before, finding the optimal partition of an image is computationally infeasible. Figure 2(d) shows that the megaquad partition can also provide a near-optimal partition. Therefore, we expect it to improve the DR performance of the DIW coder. The following theorem formalizes this observation.

Theorem III.4. A coding scheme that uses the directional wavelets, uniform quantization, and megaquad partitioning achieves the following DR function on the class $H^Q$:

\[ D(R) = O\!\left( 2^{-c_{10} \sqrt{R}} \right), \tag{6} \]

where $c_{10} = \frac{c_8}{\sqrt{Q}}$, and $c_8$ is the constant in Theorem III.2.

It is important to note that megaquad partitioning keeps the simplicity of the quad partitioning while improving the DR performance of the DIW compression.
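The four bounds can be compared numerically. The sketch below evaluates the expressions inside the $O(\cdot)$ of Theorems III.1 to III.4 with all hidden constants set to one and with $Q = 3$; these choices are assumptions made purely for illustration, since the theorems do not fix the constants. The ordering is asymptotic: with these unit constants the quadtree expression only drops below the separable one for fairly large $R$.

```python
import math


def separable_bound(R):
    """Expression in Theorem III.1: R^(-3/2) * sqrt(log2 R)."""
    return R ** -1.5 * math.sqrt(math.log2(R))


def diw_bound(R, c8=1.0):
    """Expression in Theorem III.2: 2^(-c8 * sqrt(R))."""
    return 2.0 ** (-c8 * math.sqrt(R))


def quadtree_bound(R, c9=1.0):
    """Expression in Theorem III.3: R^(1/6) * 2^(-c9 * R^(1/3))."""
    return R ** (1.0 / 6.0) * 2.0 ** (-c9 * R ** (1.0 / 3.0))


def megaquad_bound(R, c8=1.0, Q=3):
    """Expression in Theorem III.4 with c10 = c8 / sqrt(Q)."""
    return 2.0 ** (-(c8 / math.sqrt(Q)) * math.sqrt(R))


for R in (1e2, 1e3, 1e4, 1e5):
    print(f"R={R:8.0f}  separable={separable_bound(R):.3e}  "
          f"quadtree={quadtree_bound(R):.3e}  megaquad={megaquad_bound(R):.3e}")
```

The megaquad expression has the same $2^{-c\sqrt{R}}$ form as the single-edge bound of Theorem III.2, with the exponent reduced by the factor $\sqrt{Q}$.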

B. Algorithmic and experimental contributions

The models considered in this paper are ideal and there is no noise or unpredicted structure in them. However, in practice, due to noise and textures, even in the regions where there is no dominant direction, the algorithm may choose a random direction. If we do not correct these directions, they highly affect the performance of the megaquad partitioning. As we will show in the simulation section, for the same reason the noise has a major effect on the performance of the DIW with quadtree segmentation. Therefore, we will introduce the alignment scheme in Section V that accounts for these issues and makes the megaquad partitioning very efficient for practical purposes.

The second issue that has not been considered in our theoretical analysis is that the images are discrete and their values are just known on a specific grid. Therefore, all the ideas presented above shall be implemented in the discrete domain. Following [15], we use the lifting scheme [24], [25] for the wavelet transform, since it provides a very convenient framework for designing and modifying wavelets. We will then explain how all our ideas can be implemented in this setting.

Our final contribution in this paper is the comparison of the algorithms on real images, which is presented in Section VI. As the simulations demonstrate, the algorithm proposed in this paper outperforms the state-of-the-art algorithms both objectively and subjectively.

C. Relations to other work

The work presented here is related to several other papers in the literature, and our goal in this section is to clarify the connections.

The horizon model considered in our paper is a smoothed version of the models used in several other papers [20], [23], [26]. These papers construct a huge dictionary for representing these cartoon-like images. The size of these dictionaries makes the problem of finding the optimal match challenging. As a solution, [26] suggests a greedy approach and proves that this algorithm is nearly optimal. On the other hand, [20] uses the idea of megaquad partitioning. Both papers have done the analysis in the rate-distortion framework. The main difference between our work and these two papers is that we are using the specific dictionary of DIW, which is very different from the dictionaries considered in those papers. In practice, DIW outperforms the proposals of [20], [26].

Along the same line of research, we should also point to the seminal work of Donoho on wedgelets [18]. The wedgelet dictionary is the same as the dictionary used in [20]. However, for finding an approximation, it uses quadtree partitioning. Wedgelets display optimal rate-distortion performance on smooth edge models [27] and have also been extended to higher dimensions [28], [27]. The nice properties of these approaches have led researchers to exploit these types of ideas for practical image compression [29], [30]. Although it is a very promising area of research, the compression schemes proposed so far fall short when compared to DIW. The main reason is the poor performance of wedgelets on textures or the areas whose geometries do not match the geometry that wedgelets are optimal for.

DIW, discussed in this paper, has been around for more than a decade [7]–[15], [31]–[36]. These algorithms started from useful and interesting heuristics to exploit the anisotropic nature of images. However, their performances have not been analyzed theoretically yet. The only two algorithms that have been analyzed theoretically are directionlets and bandelets. Directionlets apply the wavelets on a lattice. The main advantage is that there is no interpolation involved in the process. However, this approach suffers from a major drawback: when integrated with partitioning, it performs poorly. This is due to the fact that it does not use the pixels of the neighboring blocks. Bandelets, on the other hand, have more freedom and consider the interpolated sub-pixels as real pixels. However, bandelets have not been as successful as simpler approaches such as [13]. The theoretical analysis of our models will show that the quadtree partitioning is not optimal for DIW and also that we do not need the multiscale approximations of the edges that are used in bandelets. These confirm the observation that simpler directional wavelets outperform bandelets. Furthermore, our analysis explains how one can adapt DIW for more complicated geometries and, therefore, provides a scope for further research in this area. It also provides a unified framework for analyzing several algorithms proposed in the literature and their scope for image compression.

Our empirical work is based on the more recent compression algorithms [13], [15]. We use the lifting scheme on the interpolated pixels. However, we have also incorporated the megablocking and alignment to the DIW and, as will be explained in the experimental section, they have improved the performance of DIW.

Finally, [1]–[6], [37], [38] proposed other dictionaries that are able to capture the structures of the edges. From the approximation power point of view, some of these transformations are nearly optimal for different classes of geometries. These transformations provide very redundant dictionaries and have not found any application in image compression.

IV. THEORETICAL RESULTS

This section provides the proofs of Theorems III.1, III.2, III.3, and III.4.

A. Rate-distortion analysis of separable wavelets

As mentioned before, the two-dimensional separable wavelet transform of a function is its projection on the three bases $\Psi^1_{j,n_1,n_2}$, $\Psi^2_{j,n_1,n_2}$, and $\Psi^3_{j,n_1,n_2}$. In order to prove Theorem III.1, we have to find an upper bound on the values of the wavelet coefficients. Lemma IV.1 serves this purpose.

In all the proofs, we assume that the wavelets have finite support of length $\ell$ and that their first moments are equal to zero [17]. This implies that the coefficients corresponding to the wavelets that do not intersect with either $h(t_1)$ or $h(t_1) + w$ are zero.

Lemma IV.1. On the class of functions $H^0$, the wavelet coefficients at scale $j$ ($j \ge 1 + \log \frac{\ell}{w}$) satisfy

\[ |C^1_{j,n_1,n_2}| \le c_1 2^{-2j}, \quad |C^2_{j,n_1,n_2}| \le c_2 2^{-2j}, \quad |C^3_{j,n_1,n_2}| \le c_3 2^{-2j}, \tag{7} \]

where $c_1 = \frac{\max|\psi| \max|\phi|\, \ell^3 \tan\theta}{w}$, $c_2 = \frac{\max|\psi| \max|\phi|\, \ell^3}{w}$, and $c_3 = \frac{\max|\psi|^2\, \ell^3 (1 + \tan\theta)}{w}$. Here, $\theta$ is the angle between the edge and the horizontal line.

Proof: We just write the proof for $C^3_{j,n_1,n_2}$, since the other bounds are derived similarly. Here, the only non-zero wavelet coefficients are the ones that sense (i.e., whose support overlaps with) either $h(t_1)$ or $h(t_1) + w$. Let the support of $\Psi^3_{j,n_1,n_2}(t_1, t_2)$ be $I_\Psi = [t'_0, t'_0 + \ell 2^{-j}] \times [t_0, t_0 + \ell 2^{-j}]$ and suppose this interval intersects with the lower boundary of the edge $f \in H^0$. It is clear that for any $(t_1, t_2) \in I_\Psi$ and $j \ge 1 + \log \frac{\ell}{w}$,⁴

\[ 1 - \frac{\ell 2^{-j} \tan\theta + \ell 2^{-j}}{w} \le f(t_1, t_2) \le 1. \tag{8} \]

Using (8) and the fact that $\langle f(t_1, t_2), \Psi^3_{j,n_1,n_2}(t_1, t_2) \rangle = \langle f(t_1, t_2) - 1, \Psi^3_{j,n_1,n_2}(t_1, t_2) \rangle$, we have

\[
\begin{aligned}
|C^3_{j,n_1,n_2}| &\le \int_{t_0}^{t_0 + \ell 2^{-j}} \int_{t'_0}^{t'_0 + \ell 2^{-j}} |f(t_1, t_2) - 1|\, |\Psi^3_{j,n_1,n_2}(t_1, t_2)|\, dt_1\, dt_2 \\
&\le \frac{\ell 2^{-j} (\tan\theta + 1)}{w} \int_{t_0}^{t_0 + \ell 2^{-j}} \int_{t'_0}^{t'_0 + \ell 2^{-j}} |\Psi^3_{j,n_1,n_2}(t_1, t_2)|\, dt_1\, dt_2 \\
&\le \frac{\max|\psi|^2\, \ell^3 (\tan\theta + 1)}{w}\, 2^{-2j}.
\end{aligned}
\]

Finally, we can use $1 - \frac{\ell 2^{-j} \tan\theta}{w} \le f(t_1, t_2) \le 1$ and $1 - \frac{\ell 2^{-j}}{w} \le f(t_1, t_2) \le 1$ instead of (8) to prove the claimed upper bounds for $C^1_{j,n_1,n_2}$ and $C^2_{j,n_1,n_2}$, respectively.

In the rest of this section, for notational simplicity we assume that $w > \ell$. The results clearly hold even if $w < \ell$; this only changes the constants.

Proof of Theorem III.1: Inspired by [39], we consider the same quantization level for each non-zero coefficient. This suboptimal choice of quantization will not change the decay rate of the distortion-rate (DR) function, and will just affect the constants. Let $c = \max(c_1, c_2, c_3)$, where $c_1$, $c_2$, and $c_3$ are as defined in Lemma IV.1. We have

$$|C^i_{j,n_1,n_2}| < c\,2^{-b-1}, \quad \text{for } j > b/2, \tag{9}$$

where $b$ is the number of bits assigned to each large coefficient. Suppose that the quantization steps are equal to $c\,2^{-b-1}$. According to (9), all the coefficients at scales $j > b/2$ will be mapped to zero. Therefore, the encoder only needs to code the values and the locations of the non-zero coefficients at scales $j \le b/2$. The location of each large coefficient can be coded with four bits, since we know that they are descendants of large coefficients in the previous scale. Furthermore, at every scale $j$ the number of non-zero coefficients is $3 \times 2 \times d\ell 2^j$ (the factors 3 and 2 account for the three wavelet bands and the two transitions from the flat regions to the shaded region, respectively). Hence, the total bit rate is

$$R = \sum_{j=0}^{b/2} (6 d\ell 2^j)(b + 4) = c_4\, b\, 2^{b/2}\left(1 + O\!\left(\tfrac{1}{b}\right)\right), \tag{10}$$

where $c_4 = 12 d\ell$. There are two sources of distortion: the quantization error of the non-zero coefficients at scales $j \le b/2$ and the quantization error at scales $j > b/2$. We call them $D_1$ and $D_2$, respectively. We have

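$$D_1^2 \le \left(c_4\, b\, 2^{b/2}\left(1 + O\!\left(\tfrac{1}{b}\right)\right)\right)\left(c\,2^{-b-1}\right)^2 \le c_5\, b\, 2^{-\frac{3}{2}b}\left(1 + O\!\left(\tfrac{1}{b}\right)\right)$$

and

$$D_2^2 \le \sum_{j=b/2+1}^{\infty} 2 d\ell 2^j \left(c_1^2 2^{-4j} + c_2^2 2^{-4j} + c_3^2 2^{-4j}\right) \le c_6\, 2^{-\frac{3}{2}b},$$

where $c_5 = c^2 c_4/4$ and $c_6 = 2 d\ell (c_1^2 + c_2^2 + c_3^2)$. So,

$$D = \sqrt{D_1^2 + D_2^2} \le \sqrt{c_5 b}\; 2^{-\frac{3}{4}b}(1 + o(1)). \tag{11}$$

By combining (10) and (11) and eliminating $b$, we obtain

$$D(R) = O\!\left(R^{-\frac{3}{2}} \sqrt{\log_2 R}\right).$$

$^4$ The condition $j \ge 1 + \log\frac{\ell}{w}$ ensures that the support of a wavelet does not intersect both edge boundaries simultaneously.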
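As a numerical illustration of how (10) and (11) interact, the following Python sketch evaluates the rate sum and the leading term of the distortion bound for a given bit depth $b$. The values $d = 1$, $\ell = 4$, and $c = 1$ are arbitrary assumptions chosen only for illustration, and the function names are hypothetical. The sketch only shows that, up to logarithmic factors, the log-log slope of the bound approaches $-3/2$.

```python
import math

def separable_rate(b, d=1.0, ell=4.0):
    """Bit rate of Eq. (10): 6*d*ell*2^j coefficients per scale, (b + 4) bits each."""
    return sum(6 * d * ell * 2**j * (b + 4) for j in range(b // 2 + 1))

def separable_distortion_bound(b, c=1.0, d=1.0, ell=4.0):
    """Leading term of the bound in Eq. (11): sqrt(c5 * b) * 2^(-3b/4)."""
    c4 = 12 * d * ell
    c5 = c**2 * c4 / 4
    return math.sqrt(c5 * b) * 2.0 ** (-0.75 * b)

def loglog_slope(b_lo, b_hi):
    """Slope of log2(D) against log2(R) between two even bit depths."""
    d_rate = math.log2(separable_rate(b_hi)) - math.log2(separable_rate(b_lo))
    d_dist = (math.log2(separable_distortion_bound(b_hi))
              - math.log2(separable_distortion_bound(b_lo)))
    return d_dist / d_rate
```

For example, `loglog_slope(200, 400)` is slightly above $-1.5$, the gap being due to the polynomial factors in $b$.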

B. Rate-distortion analysis of DIW

This section is devoted to the proofs of Theorems III.2, III.3, and III.4. We start with some notation. In addition to the values and the locations of the large wavelet coefficients, the DIW coder must also send the attribute (here, the direction) of each block to the decoder. Suppose that the coder assigns $b'$ bits to specify the angle $\theta$ between the direction of the edge and the horizontal line. The precision of the scheme in $\theta$ is then $\Delta\theta = \pi/2^{b'}$. Without loss of generality, we assume that when all directions perform similarly, i.e., there is no dominant direction, the wavelets are applied in the vertical direction. Suppose that for a given edge we have $|\theta| \le \Delta\theta/2$. Then, DIW is applied in the horizontal direction. Although we make this assumption to simplify the notation, the results we derive hold for any direction because of the symmetric construction of the algorithm. Under these assumptions, the DIW coefficients can be written as

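$$W(j_1, j_2, n_1, n_2) = \int_0^d w_f(j_1, n_1, t_2)\, 2^{j_2/2}\, \psi(2^{j_2} t_2 - n_2)\, dt_2, \tag{12}$$

$$C(0, j_2, n_1, n_2) = \int_0^d c_f(0, n_1, t_2)\, 2^{j_2/2}\, \psi(2^{j_2} t_2 - n_2)\, dt_2, \tag{13}$$

where

$$w_f(j_1, n_1, t_2) = \int_0^d f(t_1, t_2)\, 2^{j_1/2}\, \psi(2^{j_1} t_1 - n_1)\, dt_1, \tag{14}$$

and

$$c_f(0, n_1, t_2) = \int_0^d f(t_1, t_2)\, \phi(t_1 - n_1)\, dt_1. \tag{15}$$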

It is important to note that, unlike the separable wavelets, the two scales $j_1$ and $j_2$ are not necessarily equal. This adds to the flexibility of DIW, an advantage that is also emphasized in [9].

1) Analysis of one piece: In this section, we focus on $H_0$ and prove Theorem III.2.

Proof of Theorem III.2: We first prove that the total energy of $w_f(j_1, n_1, t_2)$, $j_1 > 0$, is proportional to $(\Delta\theta)^2$ for small values of $\Delta\theta$. Here, for simplicity of notation, we assume that $\frac{w}{\sin\Delta\theta} > \ell$. Suppose that we choose $b'$ so large that all the $W(j_1, j_2, n_1, n_2)$ coefficients are very small and can be skipped in the coding. Similar to the proof of Lemma IV.1, we have

$$|w_f(j_1, n_1, t_2)| \le \frac{\max|\psi|\,\ell^2}{w}\, 2^{-3j_1/2}\, \Delta\theta\, (1 + O(\Delta\theta^2)).$$

Here, we have assumed that $b'$ is an increasing function of $R$ and $b' \to \infty$ as $R \to \infty$. Therefore, the total energy of $w_f(j_1, n_1, t_2)$ is

$$\sum_{j_1=1}^{\infty} \left[\frac{\max|\psi|\,\ell^2}{w}\, \Delta\theta\, 2^{-3j_1/2}\right]^2 \times d\ell 2^{j_1}\left(w + 2\ell 2^{-j_1}\Delta\theta\right) = O(\Delta\theta^2), \tag{16}$$

where $d\ell 2^{j_1}(w + 2\ell 2^{-j_1}\Delta\theta)$ is an upper bound for the area of the region on which $|w_f(j_1, n_1, t_2)| > 0$. Since we assumed that the coefficients $W(j_1, j_2, n_1, n_2)$ are negligible, we only need to code the $C(0, j_2, n_1, n_2)$ coefficients. In the rest of this proof we use $j$ instead of $j_2$ to simplify the notation. According to Lemma IV.1, we have

$$\left|\int_0^d f(t_1, t_2)\, \psi(2^{j} t_2 - n_2)\, dt_2\right| \le \frac{\ell^2 \max|\psi|}{w}\, 2^{-2j}.$$

Hence,

$$|C(0, j, n_1, n_2)| \le c_2\, 2^{-3j/2}, \tag{17}$$

where $c_2$ is the parameter introduced in Lemma IV.1. The number of non-zero coefficients is bounded by $2 \times d\ell \times d\ell\,\Delta\theta\, 2^j + 2d\ell$. Suppose the coder assigns $b$ bits to each non-zero coefficient and uniformly quantizes all of them. We have

$$|c_2\, 2^{-3j/2}| < c_2\, 2^{-b-1}, \quad \text{for } j > 2b/3. \tag{18}$$

Therefore, all coefficients at scales $j > 2b/3$ will be mapped to zero and the total bit rate is

$$R = b' + (b + 2) \times \sum_{j=0}^{2b/3} \left(2 d^2 \ell^2 \Delta\theta\, 2^j + 2 d\ell\right).$$

Set $b' = b$ to obtain

$$R = c_7 b^2 (1 + o(1)), \tag{19}$$

where $c_7 = 2d\ell$. There are three sources of distortion: the skipped $W(j_1, j_2, n_1, n_2)$ coefficients, the quantization error of $C(0, j, n_1, n_2)$ for $j < 2b/3$, and the quantization error of $C(0, j, n_1, n_2)$ for $j > 2b/3$. We call the last two quantization errors $D_1$ and $D_2$, respectively. We have

$$D_1^2 = \left(c_7^2\, \theta\, 2^{2b/3} + c_7 b\right) \times \left(c_2\, 2^{-b-1}\right)^2 = c_2^2 c_7\, 2^{-2b}(1 + o(1)),$$

$$D_2^2 = \sum_{j > 2b/3}^{\infty} \left(c_2\, 2^{-3j/2}\right)^2 \times \left(c_7^2\, \theta\, 2^{2b/3} + c_7 b\right) = c_2^2 c_7\, 2^{-2b}(1 + o(1)).$$

The total distortion is, therefore,

$$D = \sqrt{D_1^2 + D_2^2 + \Delta\theta^2} = c_2 \sqrt{2 c_7}\; 2^{-b}(1 + o(1)). \tag{20}$$

We combine (19) and (20) to obtain

$$D(R) = O\!\left(2^{-\sqrt{c_7}\sqrt{R}}\right).$$

2) Quadtrees: This section provides the proof of Theorem III.3. Let us explain the coding strategy first. The coder fixes the depth of the tree and, up to that depth, divides each block that contains more than one piece of the edge.$^5$ Once the partitions are determined, the DIW coder proceeds as it does for $H_0$ images.

Proof of Theorem III.3: At each level of the quadtree and for each partition at that level, the coder uses one bit to code the partitioning decision (1 means that the specified partition is divided into its four children and 0 means that it remains intact, since it has at most one piece of the edge in it). Since the proposed functions contain no more than $2Q$ singularities, at most $2Q$ rectangles will be split at each level. Therefore, we need $8QJ$ bits to code the whole quadtree structure.

After applying quadtree partitioning, the picture is divided into at most $6QJ + 2Q$ smaller rectangles, $2Q$ of which still contain more than one edge (since we have $2Q$ singularity points). Suppose we just code the partitions that do not have any singularity points in them and we assign $R_0$ bits to each. Then the total rate is

$$R = 8QJ + 6QJR_0. \tag{21}$$

The distortion, on the other hand, is the sum of the errors of the $2Q$ rectangles with singularities and the $H_0$ error for the other $6JQ$ rectangles. Hence, using Theorem III.2, the total distortion can be expressed as

$$D = \sqrt{2Q\left(d2^{-J} \times d2^{-J}\right)^2 + 6QJ\left(2^{-c_8\sqrt{R_0}}\right)^2}, \tag{22}$$

where $d2^{-J} \times d2^{-J}$ is the area of the partitions that include a singularity point. Minimizing (22) subject to the constraint (21) results in

$$J = \frac{R^{1/3}}{(6Q)^{1/3}}(1 + o(1)), \qquad R_0 = \frac{R^{2/3}}{(6Q)^{2/3}}(1 + o(1)).$$

Therefore, we have

$$D(R) = O\!\left(\sqrt[6]{R}\; 2^{-c_9 \sqrt[3]{R}}\right), \tag{23}$$

where $c_9 = \frac{c_8}{(6Q)^{1/3}}$.

C. Megaquad partitioning

This section is devoted to the proof of Theorem III.4.

Proof of Theorem III.4: Assume we partition the image by the megaquad scheme. This approach obtains $2Q + 3$ megablocks ($2Q$ for edges and 3 for all the other smooth pieces) together with $2Q$ dyadic rectangles with more than one edge. For each partition, we need one bit to code the joining decision (refer to Section V-C2 for more information). Therefore, under the assumptions of Theorem III.3, the total bit rate can be written as

$$R = 8QJ + (2Q + 3)R_0, \tag{24}$$

and the distortion of this coding is

$$D = \sqrt{Q\left(d^2 2^{-2J}\right)^2 + Q\left(2^{-c_8\sqrt{R_0}}\right)^2}. \tag{25}$$

Again, by combining (24) and (25) and setting $J = O(R^{1/2})$, we have $R_0 = \frac{R}{2Q+3}(1 + o(1))$, and therefore,

$$D(R) = O\!\left(2^{-c_{10}\sqrt{R}}\right), \tag{26}$$

where $c_{10} = \frac{c_8}{\sqrt{2Q+3}}$.

$^5$ Practical schemes for partitioning of natural images are explained in Section V.
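To get a feel for the gap between (23) and (26), the Python sketch below evaluates both expressions with every constant hidden by the $O(\cdot)$ notation set to one. The values of $Q$ and $c_8$ passed in are assumptions for illustration only, and the function names are hypothetical.

```python
import math

def quadtree_bound(rate, q, c8=1.0):
    """Right-hand side of Eq. (23) with the hidden constant set to 1."""
    c9 = c8 / (6 * q) ** (1 / 3)
    return rate ** (1 / 6) * 2.0 ** (-c9 * rate ** (1 / 3))

def megaquad_bound(rate, q, c8=1.0):
    """Right-hand side of Eq. (26) with the hidden constant set to 1."""
    c10 = c8 / math.sqrt(2 * q + 3)
    return 2.0 ** (-c10 * math.sqrt(rate))
```

With $Q = 5$ and these unit constants, the megaquad expression falls below the quadtree expression at moderate rates, because $\sqrt{R}$ grows faster than $\sqrt[3]{R}$ in the exponent.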

V. IMPLEMENTATION

In order to apply the ideas introduced in the theoretical section, we should address several issues that we skipped in the previous sections. First, it is not clear how we should apply the wavelets in a given direction, since we only have access to the values of the image on a Cartesian grid. Second, the textures and the noise present in images may mislead the DIW into partitioning areas where there is no dominant direction. We will show in the simulation section that this phenomenon deteriorates the performance of the DIW coders. The goal of this section is to study all these effects and explain our approach to address them. The block diagram of our algorithm is shown in Figure 4. It includes the following steps: First, the best directions are selected. The next step is to use an alignment scheme to detect and adjust the patches with no dominant direction. Finally, wavelet coefficients are obtained via a lifting scheme and are coded together with all the side information (such as the selected directions). Each step will be explained in detail.

A. Implementation of DIW

Following [12], [13], [15], in order to implement DIW, we use the idea of lifting introduced in [25].

1) Lifting scheme: In [24], Daubechies and Sweldens proved that the polyphase matrix, $H(z)$, of any discrete bi-orthogonal wavelet filter with a compact support can be factored into a set of prediction, $P_i(z)$, and update, $U_i(z)$, steps followed by a normalization, i.e.,

$$H(z) = \prod_{i=1}^{m} \begin{bmatrix} 1 & P_i(z) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ U_i(z) & 1 \end{bmatrix} \begin{bmatrix} K & 0 \\ 0 & \frac{1}{K} \end{bmatrix}, \tag{27}$$

where $m$ denotes the number of lifting steps and $K$ is a non-zero constant called the normalization factor. It is easier to adapt the lifting scheme to the directional setting, since it provides a spatial construction for the wavelets. Figure 5 represents the analysis and synthesis steps of the lifted wavelets. Due to the structure of the lifting scheme, there is no loss of information after applying the prediction and the update steps.
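The losslessness of the predict and update steps can be checked with a minimal one-dimensional Python sketch. It uses a short two-tap predictor and update (in the style of the 5/3 wavelet) with periodic extension, which is a simpler stand-in chosen for illustration and not the (6, 6) interpolating wavelet used in the experiments; the function names are hypothetical.

```python
def lifting_forward(x):
    """One lifting stage on an even-length list, using periodic extension."""
    even, odd = x[0::2], x[1::2]
    m = len(even)
    # Predict: residual of each odd sample w.r.t. the mean of its two even neighbours.
    high = [odd[i] - 0.5 * (even[i] + even[(i + 1) % m]) for i in range(m)]
    # Update: lift each even sample with the two neighbouring residuals.
    low = [even[i] + 0.25 * (high[i - 1] + high[i]) for i in range(m)]
    return low, high

def lifting_inverse(low, high):
    """Undo the update, then the predict step, and interleave the samples."""
    m = len(low)
    even = [low[i] - 0.25 * (high[i - 1] + high[i]) for i in range(m)]
    odd = [high[i] + 0.5 * (even[i] + even[(i + 1) % m]) for i in range(m)]
    x = [0.0] * (2 * m)
    x[0::2], x[1::2] = even, odd
    return x
```

Because the inverse subtracts exactly what the forward transform added, reconstruction is exact for any choice of predictor and update filter, which is the property that makes the directional extension possible.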

2) Directional Lifting: Extending the ideas of the lifting scheme to the classical separable wavelet transform is straightforward. However, since the image samples lie on a Cartesian grid, the extension to the directional case is not as straightforward. As mentioned in the previous section, the lifting scheme has two stages: splitting the signal and applying the predict and update steps. Although the splitting is clear in the one-dimensional setting, it is not as clear for two-dimensional signals. Two successful approaches proposed in the literature are quincunx splitting [12] and row/column splitting [15]. Empirical results demonstrated that row/column splitting outperforms quincunx splitting [40]. This is also true in our setting and therefore we use row/column splitting. Once the partitions are specified, the next step is to apply the prediction and update steps in the specified location. Since some of the samples needed for these steps are not on the grid, we use an interpolation scheme to calculate their values and then use the interpolated values for either the prediction or the update step. To explain the process more formally, let $G_e$ and $G_o$ represent the even and odd samples, respectively. Here, we just consider row subsampling. Column subsampling follows the same relations with the roles of the row and column variables exchanged. The following formula provides the prediction of the odd sample $x^o_{i,j} \in G_o$ from the even samples:

$$P(x^o_{i,j}) = \sum_{n=-N_P}^{N_P-1} K^P_n\, x^e_{i+n,\; j+n\tan^{-1}d}, \tag{28}$$

where $x^e_{i,j} \in G_e$, and $2N_P$ and $K^P$ are the length and the coefficients of the prediction filter, respectively. Similarly, for updating $x^e_{i,j}$ we have

$$U(x^e_{i,j}) = \sum_{n=-N_U}^{N_U-1} K^U_n \left(x^o_{i+n,\; j+n\tan^{-1}d} - P(x^o_{i+n,\; j+n\tan^{-1}d})\right), \tag{29}$$

where $2N_U$ and $K^U$ are the length and the coefficients of the update filter, respectively. Note that at the boundaries a single even sample may participate in predicting odd samples with different directions; in these cases the update is performed along more than one direction, accordingly.

If $j + n\tan^{-1}d$ does not belong to the grid, the sample is interpolated in the following way:

$$x_{i+n,\; j+n\tan^{-1}d} = \sum_{l=-N_c}^{N_c-1} \beta_l\, x_{i+n,\; [j+n\tan^{-1}d - l]},$$

where $[\cdot]$ denotes the integer part. In our implementation, we use $N_c = 1$ and $\beta_{-1} = \beta_0 = 0.5$, which means that we use the two closest samples for the interpolation. We tested more complicated interpolation schemes, such as sinc interpolation with more samples, and did not observe any improvement.
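The following Python sketch illustrates the two-sample interpolation inside a directional prediction for row subsampling. For brevity it predicts an odd-row sample from only the two adjacent even rows with equal weights, rather than the $2N_P$-tap filter of (28). The per-row column displacement `shift`, the border clamping, and the function names are assumptions made for this illustration.

```python
import math

def sample_row(row, pos):
    """Value of `row` at a possibly fractional column `pos` (clamped at the borders)."""
    def at(k):
        return row[min(max(k, 0), len(row) - 1)]
    lo = math.floor(pos)
    if pos == lo:
        # On the grid: no interpolation needed.
        return at(lo)
    # Off the grid: average the two closest samples (beta_{-1} = beta_0 = 0.5).
    return 0.5 * (at(lo) + at(lo + 1))

def directional_residual(image, i, j, shift):
    """Residual of the odd-row sample (i, j) predicted from even rows i-1 and i+1.

    `shift` is the column displacement per row along the chosen direction.
    """
    pred = 0.5 * (sample_row(image[i - 1], j - shift)
                  + sample_row(image[i + 1], j + shift))
    return image[i][j] - pred
```

On an ideal diagonal edge, predicting along the edge (`shift = 1`) gives a zero residual, whereas the vertical prediction (`shift = 0`) leaves a residual at the edge.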

For simplicity of notation, in the rest of the paper we use $P^r_{S^d}(x)$ or $P^c_{S^d}(x)$ for the two-dimensional signal whose samples are obtained by applying the prediction step over a dyadic rectangle $S$ in the direction $d$. The superscript $r$ or $c$ indicates whether the subsampling is done along the rows or the columns. When it has no effect on the exposition, we may omit it.

B. Partitioning

In Section II-C we explained the partitioning approach for the case of continuous-time signals. The application of those ideas to discrete signals is straightforward. Suppose that we have an $n \times n$ image. Let $G = \{(i, j) \mid i, j \in \mathbb{Z},\ 0 \le i < n,\ 0 \le j < n\}$. For every partition $P = \{S_1, S_2, \ldots, S_p\}$ of the interval $[0, n) \times [0, n)$ we define the discrete partition $D_P = \{D_1, \ldots, D_p\}$, where $D_i = S_i \cap G$. We always ensure that all the resulting partitions are non-empty. For example, in the quad partition we set the maximum depth of the tree to $\log_2(n)$. Clearly, if the partitions have directions assigned to them, the resulting discrete partitions will have directions as well. In the rest of the paper, whenever we discuss partitions we are referring to discrete partitions.

Fig. 4. Block diagram of DIW using megaquad approach.

Fig. 5. One stage of the lifting scheme for signal analysis and reconstruction. HP and LP denote highpass and lowpass subbands, respectively. Also, $G_o$ and $G_e$ represent the odd and even samples of the signal $G$.

In the ideal setting, where the signals follow piecewise linear structures and there is no noise in the system, one sets the depth $J$ of the tree according to the bit budget and continues the quadtree partitioning until the partitioning no longer reduces the approximation error or the depth reaches $J$. However, as mentioned before, in practical settings this approach might not work, and if we follow the same strategy we may end up using the full tree every time. This partition is not desirable for several reasons. First, it does not reflect the structure of the underlying image. Second, it uses most of the bit budget for coding the structure of the partition and the directions, and will perform even worse than the classical separable wavelets in compression applications. To prevent such circumstances, we penalize the algorithm for the complexity of partitions. For a given two-dimensional signal $x$ and a given dyadic rectangle $S$, $x_S$ is defined as

$$x_S(i, j) = \begin{cases} x(i, j) & \text{if } (i, j) \in S, \\ 0 & \text{if } (i, j) \notin S. \end{cases} \tag{30}$$

As before, let $P_A = \{S_1^{\alpha_1}, S_2^{\alpha_2}, \ldots, S_p^{\alpha_p}\}$ be an $A$-decorated partition. When $A$ is the set of directions in our algorithm, we may call $P_A$ a direction-decorated partition.

Definition 6. For a given direction-decorated partition $P_A$, the complexity penalized prediction error, denoted by $\mathrm{CPPE}(P_A)$, is defined as

$$\mathrm{CPPE}(P_A) = \sum_i \left\| x_{S_i^{\alpha_i}} - P_{S_i^{\alpha_i}}\!\left(x_{S_i^{\alpha_i}}\right) \right\|_2^2 + \lambda_1 |P_A|, \tag{31}$$

where $P$ represents the prediction operation defined before and $\lambda_1$ is the penalization constant that controls the complexity of the partition. If we set $\lambda_1$ to zero, we may obtain the full tree, while setting $\lambda_1$ to $\infty$ does not allow any partitioning.

This cost function has been used in several other papers [12], [13], [15]. Finding the partition that minimizes the above cost function is an NP-complete problem and therefore, as explained before, we restrict the class of partitions. Suppose that we restrict our attention to the set of all quad partitions $\Upsilon_J$, where $J = \log_2(n)$, and the goal is

$$\min_{P_A \in \Upsilon_J} \mathrm{CPPE}(P_A).$$

Suppose that a quad partition $P_A$ is given. One easy test of whether this is the optimal partition is to check whether quadtree splitting of any of its quad-rectangles results in a better partition. To do this test, we fix all the partitions except, for instance, $S_i$, and we divide $S_i$ into its four children $S_{i,1}, S_{i,2}, S_{i,3}, S_{i,4}$. For each of the new blocks, we also find the best direction. Finally, we form a new partition $\tilde{P}_A = P_A \setminus \{S_i^{\alpha}\} \cup \{S_{i,1}^{\alpha_1}, S_{i,2}^{\alpha_2}, S_{i,3}^{\alpha_3}, S_{i,4}^{\alpha_4}\}$. If $\mathrm{CPPE}(P_A) \ge \mathrm{CPPE}(\tilde{P}_A)$, this indicates that $P_A$ is not the optimal partition. Using this criterion, [15] proposed a greedy approach for minimizing the CPPE function.
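The splitting test can be sketched in Python as a recursive search over quad partitions in which a block is split only if doing so lowers the penalized cost. This is a bottom-up pruning variant written for illustration and is not necessarily the greedy procedure of [15]. The per-block error `block_error` is a stand-in (squared deviation from the block mean) for the directional prediction error of (31), and all names are hypothetical.

```python
def block_error(image, r, c, size):
    """Stand-in block cost: squared deviation from the block mean (an assumption)."""
    vals = [image[i][j] for i in range(r, r + size) for j in range(c, c + size)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals)

def prune_quadtree(image, r, c, size, lam, min_size=1):
    """Return (leaves, cost), where cost = sum of block errors + lam * number of leaves."""
    keep_cost = block_error(image, r, c, size) + lam
    if size <= min_size:
        return [(r, c, size)], keep_cost
    half = size // 2
    leaves, split_cost = [], 0.0
    for dr in (0, half):
        for dc in (0, half):
            sub_leaves, sub_cost = prune_quadtree(image, r + dr, c + dc, half, lam, min_size)
            leaves += sub_leaves
            split_cost += sub_cost
    # Split only when the four children give a strictly smaller penalized cost.
    if split_cost < keep_cost:
        return leaves, split_cost
    return [(r, c, size)], keep_cost
```

A small `lam` lets the partition follow the image structure, while a large `lam` keeps the block intact, which mirrors the role of $\lambda_1$ described in Definition 6.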

C. Megaquad partitioning

The final step in the partitioning is to form the megablocks. Once we have direction-decorated quadtree partitions, it is easy to create megablocks and code them all together. However, due to the existence of noise and texture-like structures, some of the directions are either unreliable (noisy) or do not benefit the algorithm very much. This is especially true when the block sizes get smaller. To fix this issue, we use an alignment step before the megaquad partitioning. The alignment step is explained in the next section.


1) Alignment step: Usually there are two types of unreliable blocks in the presence of noise: 1. small blocks and 2. blocks without dominant directional features, e.g., blocks of smooth regions. Aligning the directions of such blocks with those of their neighbors may result in larger megablocks and improve the performance. If $d_i$ is the current direction of block $i$ and $\hat{d}_i$ is its direction after the alignment, we use the following optimization for calculating $\hat{d}_i$:

$$\min_{\hat{\mathbf{d}}} \sum_i w_i (\hat{d}_i - d_i)^2 + \lambda_2 \sum_j \sum_{i \in N(S_j)} |S_j| (\hat{d}_i - \hat{d}_j)^2, \tag{32}$$

where $|S_i|$ is the size of the block $S_i$ and $\lambda_2$ is the Lagrange multiplier. Here, $\hat{\mathbf{d}}$ is the vector of all the new directions and $w_i$ is a measure of the reliability of the direction $d_i$, defined as

$$w_i = C_i^- - C_i^+, \tag{33}$$

where $C_i^-$ and $C_i^+$ are, respectively, the maximum and the minimum cost of predicting block $i$ according to (31).

In (32), the first term aligns the directions according to block reliability, while the second term takes the block size into account. As we will show later, the selected directions of larger blocks are less affected by noise (refer to Section VI-A for a more detailed analysis).
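Since (32) is quadratic in the new directions, one way to minimize it is by Gauss-Seidel sweeps over the blocks, as in the Python sketch below. The sketch treats directions as real numbers, assumes symmetric neighbourhoods (if $i$ is a neighbour of $k$ then $k$ is a neighbour of $i$), and uses a fixed number of sweeps; these choices and the function name are assumptions for illustration, not the solver used in the paper.

```python
def align_directions(d, w, size, neighbours, lam2, sweeps=200):
    """Approximately minimise (32); `neighbours[k]` lists the blocks adjacent to block k."""
    d_hat = list(d)
    for _ in range(sweeps):
        for k in range(len(d)):
            num = w[k] * d[k]
            den = w[k]
            for i in neighbours[k]:
                # The pair (k, i) appears twice in (32), weighted by |S_k| and |S_i|.
                coupling = lam2 * (size[k] + size[i])
                num += coupling * d_hat[i]
                den += coupling
            if den > 0:
                # Closed-form minimiser of (32) in d_hat[k] with the others fixed.
                d_hat[k] = num / den
    return d_hat
```

A block with reliability $w_i = 0$ simply inherits the weighted average of its neighbours, which is how weak directions get pulled toward those of reliable, larger blocks.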

2) Creating megablocks: After applying the quadtree partitioning and aligning the directions, we join the blocks until all the megablocks are maximal. For coding the new partition we use the following scheme. We define two block types: 1. inner blocks, whose neighbors all belong to the same megablock, and 2. boundary blocks, with at least one neighbor from another megablock. Scanning the blocks in left-to-right and top-to-bottom order, we code each boundary block by 0 and each inner block by 1. Blocks of different sizes are scanned according to their origins (assumed to be at the top left corner). The above process is reversible and the decoder can retrieve the structure perfectly. This approach uses one bit per block to code the structure of the megablocks.
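The joining and the inner/boundary labelling can be sketched in Python for the simplified case of a uniform grid of equally sized blocks with 4-connected neighbourhoods. The actual scheme operates on quadtree blocks of different sizes, so the grid layout, the neighbourhood choice, and the function names here are assumptions for illustration.

```python
def megablock_labels(directions):
    """Label 4-connected regions of equal direction on a uniform grid of blocks."""
    rows, cols = len(directions), len(directions[0])
    labels = [[None] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] is not None:
                continue
            # Flood fill: grow the megablock until it is maximal.
            labels[r][c] = next_label
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] is None
                            and directions[ny][nx] == directions[y][x]):
                        labels[ny][nx] = next_label
                        stack.append((ny, nx))
            next_label += 1
    return labels

def inner_boundary_bits(labels):
    """Scan left to right, top to bottom: 1 for inner blocks, 0 for boundary blocks."""
    rows, cols = len(labels), len(labels[0])
    bits = []
    for r in range(rows):
        for c in range(cols):
            nbrs = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            inner = all(labels[ny][nx] == labels[r][c]
                        for ny, nx in nbrs if 0 <= ny < rows and 0 <= nx < cols)
            bits.append(1 if inner else 0)
    return bits
```

The resulting bit sequence has one entry per block, matching the one-bit-per-block cost stated above.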

D. Coding

From the above discussion, we can conclude that DIW requires four pieces of information for the reconstruction phase: 1. wavelet coefficients, 2. selected directions, 3. quadtree partitioning information, and 4. megaquad coding information. We code the wavelet coefficients using the TCE coder [41]. Direction coding is performed by first predicting each selection from the selections in the causal neighbors; its modulo-$n_d$ residual is then computed and coded with variable-length coding. The partitioning information, i.e., the quadtree and megaquad partitions, consists of runs of 0s and 1s, which we code using run-length encoding.
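Two of these side-information steps are simple enough to sketch in Python: run-length coding of a 0/1 sequence and the modular residual of a direction index. How the prediction is formed from the causal neighbors is not specified here, so `predicted` is taken as an input; the function names and the (first bit, run lengths) representation are assumptions for illustration.

```python
from itertools import groupby

def run_length_encode(bits):
    """Return (first_bit, run_lengths) for a non-empty 0/1 sequence."""
    runs = [sum(1 for _ in group) for _, group in groupby(bits)]
    return bits[0], runs

def run_length_decode(first_bit, runs):
    """Rebuild the 0/1 sequence from its first bit and its run lengths."""
    bits, bit = [], first_bit
    for length in runs:
        bits.extend([bit] * length)
        bit = 1 - bit
    return bits

def direction_residual(index, predicted, n_directions):
    """Residual of a direction index with respect to its prediction, modulo n_d."""
    return (index - predicted) % n_directions

def direction_from_residual(residual, predicted, n_directions):
    """Decoder side: recover the direction index from the residual."""
    return (predicted + residual) % n_directions
```

The modular residual keeps the symbol alphabet at $n_d$ values regardless of the sign of the prediction error.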

VI. SIMULATIONS

A. Analyzing the key concepts using polygonal model

So far we have explained our implementation of DIW with megaquad partitioning. To highlight the main concepts, we first present a simple polygonal image ($Q = 5$) shown in Figure 6(a). Natural images will be studied in the next section.

Fig. 6. Partitions and selected directions. (a) Polygonal model; (b) quadtree partitions; (c) megablocking partitions. The quadtree generates many blocks at the corners of the underlying object, whereas megablocking solves this issue by connecting the blocks with similar directions.

Fig. 7. Overhead of the quadtree algorithm as a function of $\lambda_1$. We enlarge one region of our polygon model to show the partitions and selected directions as $\lambda_1$ increases. The rightmost image shows that when the overhead of the quadtree is the same as that of the megaquad partition, it loses the details around the corner point.

In this ideal model, quad-splitting occurs in the blocks that include one or more vertices of the polygon. As demonstrated in Section IV, in these regions overpartitioning generates many subblocks with the same directions. The separate coding of these regions imposes an unnecessary overhead on DIW-based compression algorithms. This drawback in capturing edge singularities is clearly shown in Figure 6(b).

Our megaquad approach, as described in Section V, applies quadtree partitioning followed by aligning and joining steps. Figure 6(c) shows the results of the megablocking approach on the polygon image. Coding this image requires 1649 and 388 overhead bits for the quadtree and megablocking, respectively (in all experiments the same value $\lambda_1 = 9$ is chosen, and it is assumed that the structure and the overhead are coded and transmitted to the decoder at all bit rates).

Increasing the Lagrange multiplier $\lambda_1$ seems to be a simple solution for the overhead problem. However, this comes at the cost of a suboptimal representation of image geometries. This phenomenon is summarized in Figure 7.

Quadtree overhead and its suboptimal behavior in partitioning images are more problematic in the presence of noise; additive noise may mislead the algorithm as well. Suppose our previous polygonal model is corrupted by Gaussian white noise with variance $0.5 \times 10^{-3}$ (as shown in Figure 8(a)). Its effect on the quadtree partitioning algorithm can be clearly observed in Figure 8(b). Note that here we use the same parameters as in the noise-free simulation. Figure 9 studies the effect of noise on the DIW directions. It shows $\mathrm{CPPE}(P_A)$ as a function of direction for two edge-like patches. As shown in this figure, the noise may mislead the algorithm into choosing the wrong direction. It also confirms that the noise affects smaller blocks more than larger ones. In other words, the directions of larger blocks are more reliable. This is the main reason we also use the size of the blocks in the alignment step.

Fig. 8. Partitions and selected directions in the presence of Gaussian white noise with variance $0.5 \times 10^{-3}$. (a) Noisy polygon; (b) quadtree partitions; (c) megaquad partitions. Quadtree generates many unnecessary partitions with random directions in the smooth regions. Our megaquad approach, on the other hand, fixes these random directions and provides a better segmentation.

Fig. 9. (a) Two ideal patches of 45 and 172 degrees. (b) Value of $\mathrm{CPPE}(P_\alpha)$ for the two patches of size $32 \times 32$ shown in (a) as a function of the direction chosen by DIW. (c) The result of a similar experiment on $8 \times 8$ patches. Solid and dashed lines correspond to noisy (noise variance 0.01) and noiseless experiments, and red crosses show the global minima (the selected directions) in each case.

In addition to the non-optimal directions, increased overhead is another byproduct of generating many small blocks. This phenomenon is exhibited in Figure 10, which shows the dependence of the overhead on the variance of the noise. We define the overhead as the number of bits used for coding the partitions and their directions. As mentioned earlier, $\lambda_1 = 9$ and $\lambda_2 = 0.3$. We observe that the overhead of the quadtree algorithm increases tremendously as the noise variance becomes larger, whereas megablocking, thanks to its efficient partitioning of geometries, is less sensitive. Figure 10 also compares the PSNR of the two algorithms as a function of noise variance. The bitrate is fixed to 0.1 bpp. The megaquad outperforms the quadtree algorithm by 2.39 dB on average. Note that the original noiseless polygon, instead of the noisy input image, is used as the reference image in all distortion-rate simulations.
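For reference, PSNR against a chosen reference image can be computed as in the following Python sketch, which uses the standard definition $10\log_{10}(\text{peak}^2/\text{MSE})$. The default peak value of 255 assumes 8-bit images, which is an assumption for illustration; the function name is hypothetical.

```python
import math

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    sq_errors = [(a - b) ** 2
                 for ref_row, test_row in zip(reference, test)
                 for a, b in zip(ref_row, test_row)]
    mse = sum(sq_errors) / len(sq_errors)
    # Identical images have zero error, i.e. infinite PSNR.
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

In the noisy experiments, `reference` would be the noiseless image and `test` the decoded image, as described above.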

B. Real Images

In this section, we present the results of our megablock coding scheme on real images and compare it with JPEG2000 and DIW with quadtree partitioning. We also compare megablocking with DA-DWT as proposed in [13] in the next section. Here, in both the megablocking and the quadtree directional wavelet approaches, we use a set of 5 directions aligned at 45, 72.5, 90, 112.5 and 135 degrees. For the prediction and update steps, we use samples from the 6 nearest even rows and, in the case of non-integer locations, interpolation of the two nearest samples is used. The details are explained in Section V. Furthermore, in all experiments we employ (6, 6) interpolating wavelets [42], and the Lagrangian parameters $\lambda_1$ and $\lambda_2$ are fixed to 9 and 0.3, respectively.

Since we keep the partition and the number of directions fixed at different bit-rates$^6$, the overhead caused by quadtree partitioning is not an important issue at high bit rates. However, at lower bit-rates it starts to play a role again. To demonstrate this claim, we present the compression performance of different coding schemes for four $512 \times 512$ test images at bit-rates ranging from 0.01 to 0.1 bpp in Figure 11. The megaquad outperforms the quadtree, on average, by 1.01 dB for lena and 1.08 dB for barbara. It also performs better than JPEG2000 by up to 0.25 dB for lena and 0.99 dB for barbara.

Furthermore, the reconstruction of the megablocking represents image geometries better than the other algorithms. This is shown in Figure 12. JPEG2000 clearly introduces ringing artifacts to the image geometries and ruins their regularity. Figure 13 provides the same experiments at high bit rates. Here, megablocking outperforms quadtree by up to 0.45 dB for lena and 0.59 dB for barbara, and it also performs better than JPEG2000 by up to 0.49 dB for lena and 1.48 dB for barbara. Consequently, megablocking outperforms the other two methods both subjectively and objectively over a wide range of bit-rates.

1) Noisy Images: As we explained earlier in Section VI, quadtree partitioning is extremely sensitive to even the very small amounts of noise that are always present in natural images.

Figure 14 shows the resulting partitions and selected directions of the proposed megablocking and quadtree approaches on the lena image. The proposed megaquad approach reduces the overhead by generating reasonable partitions as a consequence of changing weak directions and jointly coding similar neighboring blocks.

In Figure 15 we study the noise sensitivity of the quadtree, megaquad, and JPEG2000 algorithms. Although JPEG2000 displays the least noise sensitivity in terms of PSNR, Figure 16 shows that megablocking results in a better geometrical representation due to its directional nature.

$^6$ This approach would not be optimal if we were dealing with continuous-time signals. The optimal strategy for continuous-time signals was explained in Section IV. However, in the discrete setting, since the samples are given on a grid, increasing the number of directions beyond 7-9 does not help. Therefore, we fix the number of directions and do not change them with the bitrate in our simulations.


Fig. 10. Overhead (left) and distortion in PSNR (right) versus noise variance for the polygon image.

Fig. 11. Comparison of compression performance between the proposed megablocking, the JPEG2000, and the quadtree for lena (left) and barbara (right)test images at rates up to 0.1 bpp.

Fig. 12. Comparison of visual quality between the three compression algorithms for lena image reconstructed at 0.06 bpp.

2) DA-DWT drawbacks: DA-DWT suggests a special kind of partitioning. In this algorithm each $64 \times 64$ block may only be decomposed into $32 \times 64$, $64 \times 32$, $32 \times 32$, $16 \times 64$, $64 \times 16$, $16 \times 32$, $32 \times 16$, or $16 \times 16$ subblocks. Thus, the smallest block is $16 \times 16$. It also suggests a set of 9 directions predicted from samples at integer locations and without interpolation.

Although the special kind of partitioning suggested by DA-DWT reduces the overhead, it does not efficiently capture the image geometries. We studied the problem of reducing overhead by using large partitions in Figure 7, and DA-DWT suffers from a similar issue. This is shown in Figure 17 for patches of the lena and barbara images. DA-DWT provides better PSNR results (due to its very low overhead), but the reconstructions suffer from two main problems: first, ringing artifacts are visible at some boundaries (because directions are assigned to relatively large portions of the images), and second, the images are affected by brushstroke-like artifacts along different directions, since it uses far samples instead of interpolation for the predict and update steps. On the other hand, these two undesirable patterns are completely absent in the results of megablocking.

VII. CONCLUSION

In this paper, we presented a rate-distortion analysis for the directional wavelet transform. Our analysis led us to a new scheme for partitioning images, called megaquad partitioning. Theoretical and simulation results confirmed that our new scheme outperforms the state-of-the-art compression algorithms.

ACKNOWLEDGEMENTS

The authors would like to thank the referees for their comments that helped to clarify the exposition.

REFERENCES

[1] E. J. Candes and D. L. Donoho. Ridgelets: a key to higher-dimensional intermittency? Phil. Trans. R. Soc. Lond. A., 357(1760):2495–2509, 1999.


Fig. 13. Comparison of compression performance between the proposed megablocking, JPEG2000, and the quadtree for the lena (left) and barbara (right) test images at rates ranging from 0.1 to 1.0 bpp.

Fig. 14. Partitioning structure and the selected directions for the first level of the directional wavelets with (from left to right) quadtree and megablocking on lena, and the same results for a noisy image with PSNR = 32.19 dB.

Fig. 15. Distortion in PSNR of megablocking, quadtree, and JPEG2000 in terms of noise variance at 0.12 bpp for the lena (left) and barbara (right) test images.

[2] E. J. Candes and D. L. Donoho. Curvelets - a surprisingly effective nonadaptive representation for objects with edges. In Curve and Surface Fitting. Vanderbilt University Press, 1999.

[3] M. N. Do and M. Vetterli. The contourlet transform: An efficient directional multiresolution image representation. IEEE Trans. Image Process., 14, 2005.

[4] G. Kutyniok and D. Labate. Construction of regular and irregular shearlet frames. J. Wavelet Theo. and Appl., 1, 2007.

[5] E. P. Simoncelli and W. T. Freeman. The steerable pyramid: A flexible architecture for multi-scale derivative computation. In IEEE International Conference on Image Processing, Oct. 1995.

[6] H. Greenspan, S. Belongie, R. Goodman, and P. Perona. Overcomplete steerable pyramid filters and rotation invariance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 222–228, June 1994.

[7] E. Le Pennec and S. Mallat. Sparse geometrical image approximation with bandelets. IEEE Trans. Image Process., 14(4):423–438, Apr. 2005.

[8] D. S. Taubman. Directionality and scalability in image and video compression. PhD thesis, University of California Berkeley, 1994.

[9] V. Velisavljevic, B. Beferull-Lozano, M. Vetterli, and P. L. Dragotti. Directionlets: Anisotropic multi-directional representation with separable filtering. IEEE Trans. Image Process., 15, 2006.

[10] V. Velisavljevic, B. Beferull-Lozano, M. Vetterli, and P. L. Dragotti. Low-rate reduced complexity image compression using directionlets. In Proc. IEEE International Conference on Image Processing, Atlanta, GA, Oct. 2006.

[11] O. N. Gerek and A. E. Cetin. A 2-d orientation-adaptive prediction filter in lifting structures for image coding. IEEE Trans. Image Process., 15(1):106–111, Jan. 2006.

[12] C.-L. Chang, A. Maleki, and B. Girod. Adaptive wavelet transform for image compression via directional quincunx lifting. In Proc. IEEE Workshop on Multimedia Signal Processing, Shanghai, China, Oct. 2005.

[13] C.-L. Chang and B. Girod. Direction-adaptive discrete wavelet transform for image compression. IEEE Trans. Image Process., 16(5):1289–1302, May 2007.

[14] V. Chappelier and C. Guillemot. Oriented wavelet transform for image compression and denoising. IEEE Trans. Image Process., 15(10):2892–2903, Oct. 2006.

[15] W. Ding, F. Wu, X. Wu, S. Li, and H. Li. Adaptive directional lifting-based wavelet transform for image coding. IEEE Trans. Image Process., 16(2):416–427, Feb. 2007.

[16] D. Wang, L. Zhang, A. Vincent, and F. Speranza. Curved wavelet transform for image coding. IEEE Trans. Image Process., 15(8):2413–2421, Aug. 2006.

[17] S. Mallat. A Wavelet Tour of Signal Processing. Academic Press, San Diego, CA, 1997.

[18] D. L. Donoho. Wedgelets: Nearly minimax estimation of edges. Annals of Stat., 27, 1999.

[19] T. Hastie, R. Tibshirani, and J. Friedman. Elements of Statistical Learning. Springer, second edition, 2009.


Fig. 16. Comparison of the visual quality for three compression approaches at 0.12 bpp. The test images are corrupted by zero-mean Gaussian noise with $\sigma^2 = 0.6 \times 10^{-3}$.

Fig. 17. Subjective performance of DA-DWT and megaquad approaches on lena and barbara images at 0.1 bpp. DA-DWT partitions and selected directions in each case are shown on the left. DA-DWT and megablocking provide PSNRs of 30.18 and 30.02 dB for lena and 25.95 and 25.52 dB for barbara, respectively. Additionally, their corresponding overheads are 0.003 and 0.018 bpp for lena and 0.006 and 0.024 bpp for barbara.

[20] R. Shukla, P. L. Dragotti, M. N. Do, and M. Vetterli. Rate-distortion optimized tree-structured compression algorithms for piecewise polynomial images. IEEE Trans. Image Process., 14(3):1–17, Mar. 2005.

[21] A. N. Kolmogorov and V. M. Tihomirov. ε-entropy and ε-capacity of sets in functional spaces. American Math. Society Translation (Ser. 2), 17, 1961.

[22] B. Rajaei, A. Maleki, and H. R. Pourreza. Rate-distortion analysis of curved wavelets. preprint.

[23] M. N. Do, P. L. Dragotti, R. Shukla, and M. Vetterli. On compression of two-dimensional piecewise smooth functions. In IEEE International Conference on Image Processing, Thessaloniki, Greece, Oct. 2001.

[24] I. Daubechies and W. Sweldens. Factoring wavelet transforms into lifting steps. J. Fourier Anal. Appl., 4(3):247–269, 1998.

[25] W. Sweldens. The lifting scheme: a construction of second generation wavelets. SIAM J. Math. Anal., 29(2):511–546, 1998.

[26] A. Maleki, M. Shahram, and G. Carlsson. Near optimal coder for image geometries with adaptive partitioning. In Proc. IEEE Int. Conf. on Image Proc., San Diego, 2008.

[27] V. Chandrasekaran, M. B. Wakin, D. Baron, and R. G. Baraniuk. Representation and compression of multi-dimensional piecewise functions using surflets. to appear in IEEE Trans. on Information Theory.

[28] R. Willett and R. Nowak. Platelets: a multiscale approach for recovering edges and surfaces in photon-limited medical imaging. IEEE Trans. on Medical Imag., 22, 2003.

[29] M. B. Wakin, J. K. Romberg, H. Choi, and R. G. Baraniuk. Image compression using an efficient edge cartoon + texture model. In IEEE Data Compression Conference, Utah, Apr. 2002.

[30] M. B. Wakin, J. K. Romberg, H. Choi, and R. G. Baraniuk. Wavelet-domain approximation and compression of piecewise smooth images. IEEE Trans. Image Process., 15(5), May 2006.


[31] L. Demaret, N. Dyn, and A. Iske. Image compression by linear splines over adaptive triangulations. Signal Process., 86:1604–1616, 2006.

[32] L. Jacques and J.-P. Antoine. Multiselective pyramidal decomposition of images: Wavelets with adaptive angular selectivity. Int. J. Wavelets Multires. Inf. Process., 5:785–814, 2007.

[33] J. Krommweh. Tetrolet transform: A new adaptive Haar wavelet algorithm for sparse image representation. J. Vis. Commun., 21(4):364–374, 2010.

[34] J. Krommweh and G. Plonka. Directional Haar wavelet frames on triangles. Appl. Comput. Harmon. Anal., 27:215–234, 2009.

[35] S. Mallat. Geometrical grouplets. Appl. Comput. Harmon. Anal., 26:161–180, 2009.

[36] G. Plonka. The easy path wavelet transform: A new adaptive wavelet transform for sparse representation of two-dimensional data. Multiscale Model. Sim., 7:1474–1496, 2009.

[37] E. J. Candes and D. L. Donoho. New tight frames of curvelets and optimal representations of objects with piecewise singularities. Comm. Pure Appl. Math., 57:219–266, 2004.

[38] G. Plonka, S. Tenorth, and D. Rosca. A new hybrid method for image compression using the easy path wavelet transform. IEEE Trans. Image Process., 20(2):372–381, 2011.

[39] P. Prandoni and M. Vetterli. Approximation and compression of piecewise smooth functions. Phil. Trans. Royal Society London, 357(1760):2573–2591, Sep. 1999.

[40] A. Gouze, M. Antonini, M. Barlaud, and B. Macq. Design of signal-adapted multidimensional lifting scheme for lossy coding. IEEE Trans. Image Process., 13(12):1589–1603, Dec. 2004.

[41] C. Tian and S. S. Hemami. An embedded image coding system based on tarp filter with classification. In Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, volume 3, pages 49–52, May 2004.

[42] R. Calderbank, I. Daubechies, W. Sweldens, and B.-L. Yeo. Wavelet transforms that map integers to integers. Appl. Comput. Harmon. Anal., 5(3):332–369, 1998.