VIDEO COMPRESSION AND RATE CONTROL
METHODS BASED ON THE WAVELET TRANSFORM
DISSERTATION
Presented in Partial Fulfillment of the Requirements for
the Degree Doctor of Philosophy in the
Graduate School of The Ohio State University
By
Eric J. Balster, B.S., M.S.
* * * * *
The Ohio State University
2004
Dissertation Committee:
Yuan F. Zheng, Adviser
Ashok K. Krishnamurthy
Steven B. Bibyk
Approved by
Adviser
Department of Electrical and Computer Engineering
© Copyright by
Eric J. Balster
2004
ABSTRACT
Wavelet-based image and video compression techniques have become popular areas of research. In March of 2000, the Joint Photographic Experts Group (JPEG) released JPEG2000, a wavelet-based image compression standard predicted to completely replace the original JPEG standard. In the field of video compression, a technique called 3D wavelet compression shows promise. As a result, wavelet-based compression techniques have received increasing attention from the research community.
This dissertation further investigates the wavelet transform in the compression of image and video signals and develops a rate control method for the real-time transfer of wavelet-based compressed video.
A pre-processing algorithm based on the wavelet transform is developed for the removal of noise in images prior to compression. The intelligent removal of noise reduces the entropy of the original signal, aiding compressibility. The proposed wavelet-based denoising method shows a computational speedup of at least an order of magnitude over previously established image denoising methods, along with a higher peak signal-to-noise ratio (PSNR).
A video denoising algorithm is also included which eliminates both intra- and inter-frame noise. The inter-frame noise removal technique estimates the amount of motion in the image sequence. Using motion and noise level estimates, a video denoising technique is established which is robust to varying levels of noise corruption and motion.
A virtual-object video compression method is also included. Object-based compression methods have come to the forefront of the research community with the adoption of the MPEG-4 (Moving Picture Experts Group) standard. Object-based compression methods promise higher compression ratios without further cost in reconstructed quality. Results show that virtual-object compression outperforms 3D wavelet compression, achieving both a higher compression ratio and a higher PSNR.
Finally, a rate-control method is developed for the real-time transmission of wavelet-based compressed video. Wavelet compression schemes demand a rate-control algorithm for real-time video communication systems. Using a leaky-bucket design approach, the proposed rate-control method manages the uncertainty in the acquisition time of the group of frames (GoF), the computation time of the compression/decompression algorithms, and the network delay. Results show good management and control of the buffers and minimal variance in frame rate.
To my parents
ACKNOWLEDGMENTS
I would like to express my sincere gratitude to my advisor Professor Yuan F. Zheng
for his constant encouragement, shrewd guidance, and financial support throughout
my years at The Ohio State University (OSU). I have benefited from his expert tech-
nical knowledge in science and engineering and learned from his creative and novel
solutions to many research problems. It has truly been an honor and a privilege to
study under his guidance. I would also like to thank Professors Ashok K. Krishna-
murthy and Steven B. Bibyk for serving on my committee and providing feedback on
this dissertation.
It has been my pleasure to work with my colleagues in the Wavelet Research Group at OSU. Specifically, I would like to thank Ms. Yi Liu and Mr. Zhigang (James) Gao for their continual help with the many technical problems I came across over the years and for their computer support, which is second to none. I would also
like to thank my former colleagues Dr. Jianyu (Jane) Dong (currently at California
State University) and Mr. Chao He (currently at Microsoft Corp.) for helping me to
become acclimated to our research group and to the university during the beginning
of my studies. Both Jane and Chao were also helpful in many productive discussions
concerning wavelet-based compression of video signals.
I would like to thank both the Dayton Area Graduate Studies Institute (DAGSI)
and the Air Force Research Laboratory (AFRL) for funding this research.
I want to give a special thanks to the AFRL Embedded Information Systems
Engineering Branch (IFTA) for their continued support over the years. Everyone
in the branch has been very encouraging and supportive throughout my studies.
Specifically, I would like to thank Mr. James Williamson and Mr. Eugene Blackburn
for giving me the opportunity to work at AFRL, an institution of superb research
and state-of-the-art technology. Thanks to Dr. Robert L. Ewing for his tutelage and
advice through many milestones over the years. I would also like to thank Mr. Al
Scarpelli for his support and help during many projects.
Lastly, I would also like to thank my family for their love and encouragement.
Susan, Craig, Jenny, Michael, Megan, Evan, Mom, and Dad, you have always been
a very supportive and loving family. Without you all, I would not have been able to pursue
my goals.
VITA
Dec. 24, 1975 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Born - Dayton, OH
May 1998 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B.S. Electrical Engineering, University of Dayton, Dayton, OH
Aug. 1998 - Aug. 1999 . . . . . . . . . . . . . . . . . . Graduate Teaching Assistant, Electrical Engineering, University of Dayton, Dayton, OH
Aug. 1999 - May 2000 . . . . . . . . . . . . . . . . . . Graduate Research Assistant, Electrical Engineering, University of Dayton, Dayton, OH
May 2000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M.S. Electrical Engineering, University of Dayton, Dayton, OH
Sept. 2000 - June 2002 . . . . . . . . . . . . . . . . . Graduate Research Associate, Electrical Engineering, The Ohio State University, Columbus, OH
July 2002 - present . . . . . . . . . . . . . . . . . . . . Associate Electronics Engineer, Embedded Information Systems Engineering Branch, Air Force Research Laboratory, Wright-Patterson AFB, OH
PUBLICATIONS
Research Publications
Eric J. Balster, Yuan F. Zheng, and Robert L. Ewing, "Combined Spatial and Temporal Domain Wavelet Shrinkage Algorithm for Video Denoising", submitted to IEEE Transactions on Circuits and Systems for Video Technology, Apr. 2004.

Eric J. Balster, Yuan F. Zheng, and Robert L. Ewing, "Combined Spatial and Temporal Domain Wavelet Shrinkage Algorithm for Video Denoising", in Proc. IEEE International Conference on Communication Systems, Networks, and Digital Signal Processing, March 2004.

Eric J. Balster, Yuan F. Zheng, and Robert L. Ewing, "Feature-Based Wavelet Shrinkage Algorithm for Image Denoising", submitted with one revision to IEEE Transactions on Image Processing, Feb. 2004.

Eric J. Balster, Yuan F. Zheng, and Robert L. Ewing, "Fast, Feature-Based Wavelet Shrinkage Algorithm for Image Denoising", in Proc. IEEE International Conference on Integration of Knowledge Intensive Multi-Agent Systems, pp. 722-728, Oct. 2003.

Eric J. Balster, Waleed W. Smari, and Frank A. Scarpino, "Implementation of Efficient Wavelet Image Compression Algorithms using Reconfigurable Devices", in Proc. IASTED International Conference on Signal and Image Processing, pp. 249-256, Aug. 2003.

Eric J. Balster and Yuan F. Zheng, "Constant Quality Rate Control for Content-based 3D Wavelet Video Communication", in Proc. World Congress on Intelligent Control and Automation, pp. 2056-2060, June 2002.

Eric J. Balster and Yuan F. Zheng, "Real-Time Video Rate Control Algorithm for a Wavelet-Based Compression Scheme", in Proc. IEEE Midwest Symposium on Circuits and Systems, pp. 492-496, Aug. 2001.

Eric J. Balster, Frank A. Scarpino, and Waleed W. Smari, "Wavelet Transform for Real-Time Image Compression Using FPGAs", in Proc. IASTED International Conference on Parallel and Distributed Computing and Systems, pp. 232-238, Nov. 2000.
FIELDS OF STUDY
Major Field: Electrical Engineering
Studies in:
Communication and Signal Processing
Circuits and Electronics
Mathematics
TABLE OF CONTENTS
Page
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Vita . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxii
Chapters:
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 A Review of Current Compression Standards . . . . . . . . . . . . 1
1.1.1 Image Compression Standard (JPEG) . . . . . . . . . . . . 1
1.1.2 JPEG2000 Image Compression Standard . . . . . . . . . . . 2
1.1.3 Video Compression Standards (H.26X and MPEG-X) . . . . 3
1.2 Motivation for Wavelet Image Compression Research . . . . . . . . 6
1.2.1 Wavelet Image Compression vs. JPEG Compression . . . . 6
1.2.2 Wavelet Image Pre-processing . . . . . . . . . . . . . . . . . 9
1.3 Motivation for Wavelet Video Compression Research . . . . . . . . 11
1.3.1 Video Signal Pre-processing for Noise Removal . . . . . . . 12
1.3.2 Virtual-Object Based Video Compression . . . . . . . . . . 13
1.4 Motivation for the Rate Control of Wavelet-Compressed Video . . . 14
1.5 Dissertation Overview . . . . . . . . . . . . . . . . . . . . . . . . . 15
2. Wavelet Theory Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1 Scaling Function and Wavelet Definitions . . . . . . . . . . . . . . 17
2.2 Scaling Function and Wavelet Restrictions . . . . . . . . . . . . . . 20
2.3 Wavelet Filterbank Analysis . . . . . . . . . . . . . . . . . . . . . . 20
2.4 Wavelet Filterbank Synthesis . . . . . . . . . . . . . . . . . . . . . 22
2.5 Two-Dimensional Wavelet Transform . . . . . . . . . . . . . . . . . 22
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3. Feature-Based Wavelet Selective Shrinkage Algorithm for Image Denoising 25
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 2D Non-Decimated Wavelet Analysis and Synthesis . . . . . . . . . 30
3.3 Retention of Feature-Supporting Wavelet Coefficients . . . . . . . . 33
3.4 Selection of Threshold τ and Support s . . . . . . . . . . . . . . . . 39
3.5 Estimation of Parameter Values . . . . . . . . . . . . . . . . . . . . 49
3.5.1 Noise Estimation . . . . . . . . . . . . . . . . . . . . . . . . 49
3.5.2 Parameter Estimation . . . . . . . . . . . . . . . . . . . . . 49
3.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4. Combined Spatial and Temporal Domain Wavelet Shrinkage Algorithm for Video Denoising . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2 Temporal Denoising and Order of Operations . . . . . . . . . . . . 62
4.2.1 Temporal Domain Denoising . . . . . . . . . . . . . . . . . 62
4.2.2 Order of Operations . . . . . . . . . . . . . . . . . . . . . . 64
4.3 Proposed Motion Index . . . . . . . . . . . . . . . . . . . . . . . . 66
4.3.1 Motion Index Calculation . . . . . . . . . . . . . . . . . . . 66
4.3.2 Motion Index Testing . . . . . . . . . . . . . . . . . . . . . 67
4.4 Temporal Domain Parameter Selection . . . . . . . . . . . . . . . . 69
4.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5. Virtual-Object Video Compression . . . . . . . . . . . . . . . . . . . . . 86
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.2 3D Wavelet Compression . . . . . . . . . . . . . . . . . . . . . . . 89
5.2.1 2D Wavelet Transform . . . . . . . . . . . . . . . . . . . . . 89
5.2.2 2D Quantization . . . . . . . . . . . . . . . . . . . . . . . . 91
5.2.3 3D Wavelet Transform . . . . . . . . . . . . . . . . . . . . . 91
5.2.4 3D Quantization . . . . . . . . . . . . . . . . . . . . . . . . 92
5.2.5 3D Wavelet Compression Results . . . . . . . . . . . . . . . 95
5.3 Virtual-Object Compression . . . . . . . . . . . . . . . . . . . . . . 97
5.3.1 Virtual-Object Definitions . . . . . . . . . . . . . . . . . . . 97
5.3.2 Virtual-Object Extraction Method . . . . . . . . . . . . . . 98
5.3.3 Virtual-Object Coding . . . . . . . . . . . . . . . . . . . . . 102
5.4 Performance Comparison Between 3D Wavelet and Virtual-Object Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6. Constant Quality Rate Control for Content-Based 3D Wavelet Video Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.2 Multi-Threaded, Content-Based 3D Wavelet Compression . . . . . 109
6.3 The Rate Control Algorithm . . . . . . . . . . . . . . . . . . . . . 112
6.3.1 Rate Control Overview . . . . . . . . . . . . . . . . . . . . . 112
6.3.2 Buffer Constraints . . . . . . . . . . . . . . . . . . . . . . . 114
6.3.3 Grouping Buffer Design . . . . . . . . . . . . . . . . . . . . 118
6.3.4 Display Buffer Design . . . . . . . . . . . . . . . . . . . . . 120
6.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
7. Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . 129
7.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Appendices:
A. Computation of S·,k[x, y] . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
LIST OF VARIABLES
In this dissertation, the following variables are used:
Greek Variables:
• α[x, y, z]: Boolean value of position (x, y, z) indicating the presence of back-
ground information
• αk[n]: Non-decimated scaling coefficient of scale k and position n
• αll,k[x, y]: Two-dimensional non-decimated scaling coefficient of scale k and spatial position (x, y)
• α3Dk [l, z]: Non-decimated scaling coefficient of level k, spatial position l, and
frame z, generated by temporal domain transformation
• α̂ll,k[x, y]: Reconstructed non-decimated scaling coefficient of spatial position (x, y)
• α̂optll,k[x, y]: Optimally reconstructed non-decimated scaling coefficient of spatial position (x, y)
• αA: Percent change in frame acquisition rate
• αD: Percent change in display rate
• γx[z]: Leftmost position of the virtual-object in frame z
• γy[z]: Highest vertical position of the virtual-object in frame z
• Γ: The maximum size of a group of frames (GoF)
• δA: Incremental change in the frame acquisition rate
• δD: Incremental change in the display rate
• εd: Empty display buffer warning threshold
• εg: Empty grouping buffer warning threshold
• εx[z]: Rightmost position of the virtual-object in frame z
• εy[z]: Lowest vertical position of the virtual-object in frame z
• η(x, y): Two-dimensional noise function value at spatial position (x, y)
• λk[n]: Non-decimated wavelet coefficient of scale k and position n
• λhl,k[x, y]: Two-dimensional non-decimated wavelet coefficient, high-low sub-
band, of scale k and spatial position (x, y)
• λlh,k[x, y]: Two-dimensional non-decimated wavelet coefficient, low-high sub-
band, of scale k and spatial position (x, y)
• λhh,k[x, y]: Two-dimensional non-decimated wavelet coefficient, high-high sub-
band, of scale k and spatial position (x, y)
• λ3Dk [l, z]: Non-decimated wavelet coefficient of level k, spatial position l, and
frame z, generated by temporal domain transformation
• λ·,k[x, y]: Non-decimated wavelet coefficient of level k and spatial position (x, y),
generated by the wavelet transform of f(·)
• λvo[x, y, z]: Non-decimated wavelet coefficient of position (x, y, z) used to de-
termine location of the virtual-object
• µl: Temporal mean of spatially averaged pixel values, Azl
• σn: Standard deviation of η(·)
• σ̂n: Estimated standard deviation of η(·)
• τ : Threshold used in image denoising
• τc: The critical time period before the display buffer is empty
• τm(·): Optimal threshold function used in image denoising
• τ̂m(·): Estimated threshold function used in image denoising
• τvo: Threshold used to determine motion in the wavelet coefficients, λvo[·]
• τz[·]: temporal domain threshold for video denoising
• φd: Full display buffer warning threshold
• φg: Full grouping buffer warning threshold
• Φ(t): Scaling function
• Φk,n(t): Scaling function of scale k and shift n
• Ψ(t): Mother wavelet
• Ψk,n(t): Wavelet of scale k and shift n
English Variables:
• ak[n]: Scaling coefficient of scale k and position n
• all,k[x, y]: Two-dimensional scaling coefficient of scale k and spatial position
(x, y)
• âll,k[x, y, z]: Quantized, two-dimensional scaling coefficient of scale k and position (x, y, z)
• a3D·,k,j[x, y, z]: Three-dimensional scaling coefficient of 2D scale k, 3D scale j, and
position (x, y, z)
• â3D·,k,j[x, y, z]: Quantized three-dimensional scaling coefficient of 2D scale k, 3D scale j, and position (x, y, z)
• as: Multiplicative term used in the LMMSE calculation of sm(·)
• aτ : Multiplicative term used in the LMMSE calculation of τm(·)
• Ai: Frame acquisition rate
• Azl : Spatially averaged pixel value of spatial position l and frame z used in
motion index calculation
• b(x, y): Background pixel of spatial location (x, y)
• bs: Additive term used in the LMMSE calculation of sm(·)
• bτ : Additive term used in the LMMSE calculation of τm(·)
• Bdi : Display buffer fullness at time i
• Bgi : Grouping buffer fullness at time i
• CN: Size of the Nth group of frames (GoF)
• dk[n]: Wavelet coefficient of scale k and position n
• dhl,k[x, y]: Two-dimensional wavelet coefficient, high-low subband, of scale k
and spatial position (x, y)
• dlh,k[x, y]: Two-dimensional wavelet coefficient, low-high subband, of scale k
and spatial position (x, y)
• dhh,k[x, y]: Two-dimensional wavelet coefficient, high-high subband, of scale k
and spatial position (x, y)
• d̂hl,k[x, y, z]: Quantized, 2D wavelet coefficient, high-low subband, of scale k and location (x, y, z)
• d̂lh,k[x, y, z]: Quantized, 2D wavelet coefficient, low-high subband, of scale k and location (x, y, z)
• d̂hh,k[x, y, z]: Quantized, 2D wavelet coefficient, high-high subband, of scale k and location (x, y, z)
• d3D·,k,j[x, y, z]: Three-dimensional wavelet coefficient of 2D scale k, 3D scale j and
position (x, y, z)
• d̂3D·,k,j[x, y, z]: Quantized three-dimensional wavelet coefficient of 2D scale k, 3D scale j, and position (x, y, z)
• D: Space below the virtual-object
• Davg|Bdi−1<εd: Estimated average display rate, given that the display buffer fullness Bdi−1 has fallen below the warning threshold εd
• Di: Display frame rate at time i
• Ei: Compression rate at time i
• Ex(z): Ending horizontal position of the virtual-object in frame z
• Ey(z): Ending vertical position of the virtual-object in frame z
• f(t): Arbitrary function
• fk(t): Arbitrary function of scale k
• f(x, y): Original image pixel of spatial position (x, y)
• f̃(x, y): Noisy image pixel of spatial position (x, y)
• f̂(x, y): Denoised image pixel of spatial position (x, y)
• fopt(x, y): Optimal denoised image pixel of spatial position (x, y)
• f(x, y, z): Original video signal pixel of position (x, y, z)
• f̂(x, y, z): Reconstructed video signal pixel of position (x, y, z)
• f zl : Video signal pixel of spatial location l and frame z
• F : Number of frames in a group of frames (GoF)
• g[n]: Wavelet filter coefficient of position n
• GN: Time period when the last frame of the Nth group of frames (GoF) is acquired
• h[n]: Scaling function filter coefficient of position n
• Hf : Height of image
• Ho: Height of the virtual-object
• I: The initial buffering level for the display buffer
• I·,k[x, y]: Boolean value formed by thresholding the noisy wavelet coefficient λ·,k[x, y] by τ
• Ivo[x, y, z]: Boolean value created by thresholding λvo[x, y, z] coefficient by the
threshold, τvo
• J·,k[x, y]: Boolean value formed by refining I·,k[x, y] with local support
• Jopt·,k [x, y]: Optimal Boolean value of spatial location (x, y)
• Jvo[x, y, z]: Refined Boolean value used for motion detection of location (x, y, z)
• K: Number of terms included in noise estimation calculation
• KM : Number of subband levels in the 2D wavelet transform
• JM : Number of subband levels in the 3D wavelet transform
• L: Space left of the virtual-object
• L·,k[x, y]: Wavelet coefficient of scale k and spatial location (x, y) used in re-
construction
• Lopt·,k [x, y]: Wavelet coefficient of scale k and spatial location (x, y) used in opti-
mal reconstruction
• LN: The total delay of the Nth group of frames (GoF)
• mse: Mean-squared error between original and modified image
• Ml: Motion index of spatial location l
• o(x, y, z): Virtual-object pixel of location (x, y, z)
• R: Space right of the virtual-object
• Ri: Video reconstruction rate at time i
• s: Support variable used to create Boolean map J·,k[·]
• s2: 2D Quantization step size
• s3: 3D Quantization step size
• sm(·): Optimal support function used in image denoising
• ŝm(·): Estimated support function used in image denoising
• svo: Support value used to refine motion detection
• S·,k[x, y]: Coefficient support value of level k and spatial location (x, y)
• Sd: Size of the display buffer
• Sg: Grouping buffer size
• Sx(z): Starting horizontal position of the virtual-object in frame z
• Sy(z): Starting vertical position of the virtual-object in frame z
• U : Space above the virtual-object
• Vk: Spanning set of scaling functions of scale k
• Wf : Width of image
• Wk: Spanning set of wavelet functions of scale k
• Wo: Width of the virtual-object
• zm,x: Frame which contains the maximum virtual-object width
• zm,y: Frame which contains the maximum virtual-object height
LIST OF TABLES
Table Page
3.1 Minimum average error of test images for various noise levels and their corresponding threshold and support values. . . . . . . . . . . . . . 48
3.2 PSNR comparison of the proposed method to other methods given inthe literature (results given in dB). . . . . . . . . . . . . . . . . . 52
3.3 Computation times for a 256x256 image, in seconds. . . . . . . . . . . 53
3.4 Compression ratios of 2D wavelet compression both with and without denoising applied as a pre-processing step. . . . . . . . . . . . . . . 54
4.1 Compression ratios of 3D wavelet compression both with and without denoising applied as a pre-processing step. . . . . . . . . . . . . . . 84
LIST OF FIGURES
Figure Page
1.1 Generalized architecture of the H.261 encoder. . . . . . . . . . . . . . 4
1.2 2D wavelet transform. Left: Original "Peppers" image. Center: Wavelet-transformed image, MRlevel = 3. Right: Subband reference. . . . . . 7
1.3 Comparison between JPEG and wavelet compression methods using the "Peppers" image. Left: JPEG compression, file size = 6782 bytes, compression ratio 116:1, PSNR = 22.32. Right: 2D wavelet compression, file size = 6635 bytes, compression ratio 118:1, PSNR = 25.64. . 9
2.1 Wavelet decomposition. . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2 Wavelet reconstruction. . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.1 Non-decimated wavelet decomposition. . . . . . . . . . . . . . . . . . 31
3.2 Non-decimated wavelet synthesis. . . . . . . . . . . . . . . . . . . . . 32
3.3 Generic coefficient array. . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.4 Generic coefficient array, with corresponding S·,k values. . . . . . . . . 37
3.5 Optimal denoising method applied to the noisy "Lenna" image. Left: Corrupted image f(x, y), σn = 50, PSNR = 14.16 dB. Right: Optimally denoised image fopt(x, y), PSNR = 27.72 dB. . . . . . . . . . . . . . 41
3.6 Test images. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.7 Average PSNR values using different wavelets. . . . . . . . . . . . . . 46
3.8 Error results for test images, σn = 30. . . . . . . . . . . . . . . . . . . 47
3.9 τm(·), sm(·) and their corresponding estimates, τ̂m(·), ŝm(·). . . . . . 51
3.10 Results of the proposed image denoising algorithm. Top left: Original "Peppers" image. Top right: Corrupted image, σn = 37.75, PSNR = 16.60 dB. Bottom: Denoised image using the proposed method, PSNR = 27.17 dB. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.11 Results of the proposed image denoising algorithm. Top left: Original "House" image. Top right: Corrupted image, σn = 32.47, PSNR = 17.90 dB. Bottom: Denoised image using the proposed method, PSNR = 29.81 dB. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.12 Wavelet-based compression results with and without pre-processing. . 58
4.1 Test results of both TFS and SFT denoising methods. Upper left: FOOTBALL image sequence, SFT denoising, max. PSNR = 30.85, τ = 18, τz = 12. Upper right: FOOTBALL image sequence, TFS denoising, max. PSNR = 30.71, τ = 18, τz = 12. Lower left: CLAIRE image sequence, SFT denoising, max. PSNR = 40.77, τ = 19, τz = 15. Lower right: CLAIRE image sequence, TFS denoising, max. PSNR = 40.69, τ = 15, τz = 21. . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.2 Spatial positions of motion estimation test points. Left: FOOTBALL image sequence, frame #96. Right: CLAIRE image sequence, frame #167. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3 Motion estimate given in [10] of image sequences CLAIRE and FOOTBALL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.4 Proposed motion estimate of image sequences CLAIRE and FOOTBALL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.5 α and β parameter testing for temporal domain denoising. . . . . . . 75
4.6 Denoising methods applied to the SALESMAN image sequence, std. = 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.7 Denoising methods applied to the SALESMAN image sequence, std. = 20. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.8 Denoising methods applied to the TENNIS image sequence, std. = 10. 77
4.9 Denoising methods applied to the TENNIS image sequence, std. = 20. 78
4.10 Denoising methods applied to the FLOWER image sequence, std. = 10. 78
4.11 Denoising methods applied to the FLOWER image sequence, std. = 20. 79
4.12 Original frame #7 of the SALESMAN image sequence. . . . . . . . . 79
4.13 SALESMAN image sequence corrupted, std. = 20, PSNR = 22.10. . . 80
4.14 Results of the 3D K-nearest neighbors filter, [83], PSNR = 28.42. . . 80
4.15 Results of the 2D wavelet denoising filter, given in Chapter 3, PSNR= 29.76. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.16 Results of the 2D wavelet filtering with linear temporal filtering, [55],PSNR = 30.47. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.17 Results of the proposed denoising method, PSNR = 30.66. . . . . . . 82
4.18 Wavelet-based compression results with and without pre-processing. . 83
5.1 3D wavelet compression. . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.2 Starting from left to right: 1) Original three-dimensional video signal. 2) 2D wavelet transform (KM = 2 and JM = 0). 3) Symmetric 3D wavelet transform. 4) Decoupled 3D wavelet transform (KM = 2 and JM = 2). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.3 Decoupled 3D wavelet transform subbands, KM = 2, JM = 2. Left: Subband d3Dhl,1,1[·] highlighted in gray. Right: Subband d3Dlh,0,2[·] highlighted in gray. . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.4 Comparison of 2D wavelet compression and 3D wavelet compression using the CLAIRE image sequence (frame #4 is shown). Left: 2D wavelet compression, s2 = 64, KM = 8, file size = 198KB, compression ratio = 256:1, average PSNR = 29.80. Right: 3D wavelet compression, s2 = 29, s3 = 29, KM = 8, JM = 8, file size = 196KB, compression ratio = 258:1, average PSNR = 33.31. . . . . . . . . . . . . . . . . . 96
5.5 Virtual-object extraction. . . . . . . . . . . . . . . . . . . . . . . . . 99
5.6 Virtual-object compression. . . . . . . . . . . . . . . . . . . . . . . . 103
5.7 Comparison of 3D wavelet compression and virtual-object compression using the CLAIRE image sequence (frame #4 is shown). Left: 3D wavelet compression, s2 = 29, s3 = 29, KM = 8, JM = 8, file size = 196KB, compression ratio = 258:1, average PSNR = 33.31. Right: Virtual-object compression, s2 = 25, s3 = 25, KM = 8, JM = 8 for the virtual-object and s2 = 9, KM = 8 for the background, file size = 195KB, compression ratio = 259:1, average PSNR = 34.00. . . . . . . 104
5.8 Comparison of 2D wavelet compression, 3D wavelet compression, andvirtual-object compression. . . . . . . . . . . . . . . . . . . . . . . . . 105
6.1 Content-based 3D wavelet compression/decompression design flow. . . 110
6.2 3D wavelet communication system. . . . . . . . . . . . . . . . . . . . 111
6.3 Complete rate control system. . . . . . . . . . . . . . . . . . . . . . . 113
6.4 Rate control model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.5 Display frame rate and display buffer size, D0=12 fps. . . . . . . . . . 124
6.6 Frame acquisition rate and grouping buffer size, D0=12 fps. . . . . . . 125
6.7 Display frame rate and display buffer size, D0=2 fps. . . . . . . . . . 126
6.8 Frame acquisition rate and grouping buffer size, D0=2 fps. . . . . . . 127
CHAPTER 1
Introduction
Effective image and video compression techniques have been active research areas for the last several years. Because raw digital image and video signals are vast in size while transmission bandwidth and storage space are limited, compression techniques are paramount in the development of digital image and video systems. Developing compression methods that both produce high compression ratios and preserve reconstructed quality is essential to the creation of high-quality, affordable image and video products.
It is this seemingly limitless demand for higher-quality image and video compression systems which provides substantial motivation for further compression research. First, a brief overview of the latest compression standards is provided prior to the presentation of specific research topics and objectives.
1.1 A Review of Current Compression Standards
1.1.1 Image Compression Standard (JPEG)
The Joint Photographic Experts Group (JPEG) committee developed a compression standard for digital images in the late 1980s. JPEG compression has long been the most widely accepted standard in image compression, embedded in most modern digital imaging products.
The JPEG image encoder operates on 8x8 or 16x16 blocks of image data. Thus,
images being compressed by JPEG are segmented into processing blocks called
macroblocks. JPEG compresses each macroblock separately by first transforming the
block with the Discrete Cosine Transform (DCT), quantizing the resultant coefficients,
run-length encoding them, and finally coding the result with a variable-length entropy
coder [47]. The block-based encoder facilitates simplicity, computational speed, and a
modest memory requirement.
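For illustration, the per-macroblock transform and quantization stages might be sketched as below. This is a simplified sketch, not the JPEG codec itself: it uses an orthonormal 8x8 DCT and a single uniform step `q` where JPEG uses a perceptual quantization table, and the function names are invented for this example.

```python
import numpy as np

def dct_matrix(N=8):
    """Orthonormal DCT-II basis matrix; C @ block @ C.T is the 2D DCT."""
    C = np.array([[np.cos(np.pi * (2 * m + 1) * k / (2 * N)) for m in range(N)]
                  for k in range(N)]) * np.sqrt(2.0 / N)
    C[0] /= np.sqrt(2)
    return C

def encode_block(block, q=16):
    """Transform one macroblock and quantize uniformly; the many resulting
    zeros are what make the subsequent run-length stage effective."""
    C = dct_matrix(len(block))
    return np.round(C @ block @ C.T / q).astype(int)

flat = np.full((8, 8), 128.0)
encode_block(flat)   # only the DC coefficient (1024 / 16 = 64) survives
```

For a flat block, all energy collects in the single DC coefficient and the remaining 63 entries quantize to zero, which is exactly the sparsity the run-length and entropy stages exploit.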
Typically, JPEG can compress images at a 10:1 to 20:1 compression ratio and
retain high-quality reconstruction. Ratios of 30:1 to 50:1 can be obtained
with only minor defects in the reconstructed image [34].
1.1.2 JPEG2000 Image Compression Standard
It has been known throughout the research community for several years that
the wavelet transform is superior to DCT methods in image compression. Thus, in
March of 2000, JPEG published the JPEG2000 standard based on wavelet technology
[63]. The compression method of JPEG2000 is similar to that of JPEG. However,
JPEG2000 uses the wavelet transform instead of the block-based DCT. This allows
the user to specify the size of the processing block (small block sizes reduce the
memory requirement, while large block sizes improve compression gain and reconstructed
image quality). After transformation, coefficients are quantized and encoded as in
the JPEG standard.
The JPEG2000 standard promises a 20%-25% smaller average file size with quality
comparable to the original JPEG standard [44].
1.1.3 Video Compression Standards (H.26X and MPEG-X)
The H.261 Video Compression Standard
H.261 is a compression standard developed by the ITU (International Telecommunication
Union) in 1990. The compression algorithm involves block-based DCT transformation
as in JPEG, but also inter-frame prediction and motion compensation (MC) for
temporal domain compression. Temporal domain compression starts with an initial
frame, the intra (or I) frame. Compression is achieved by creating a predicted (P)
frame: a motion-compensated prediction, formed from the closest reconstructed I
frame, is subtracted from the current frame. The I and P frames are then compressed
by a method very similar to JPEG, and because the P frames no longer contain as
much information as their original counterparts, temporal domain compression is
achieved. Figure 1.1
gives a generalized architecture of the H.261 encoder.
Because of the subtraction involved in temporal domain compression, the quality
of the P frames is highly dependent upon the quality of the I frames. To combat
this problem, the P frames are formed by subtraction from reconstructed, rather than
original, I frames. Thus, in decoding the P frames, little error is introduced by
temporal domain compression.
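The prediction step can be sketched as follows. Block size, search range, and all function names here are illustrative assumptions, not taken from the H.261 standard: a full-search block matcher finds the motion vector per macroblock, and the residual is the current frame minus the motion-compensated prediction drawn from the reconstructed reference.

```python
import numpy as np

def best_match(ref, block, by, bx, search=4):
    """Full-search block matching: find the motion vector minimizing the sum
    of absolute differences (SAD) against the reconstructed reference frame."""
    B = block.shape[0]
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if 0 <= y and 0 <= x and y + B <= ref.shape[0] and x + B <= ref.shape[1]:
                sad = np.abs(ref[y:y+B, x:x+B] - block).sum()
                if sad < best_sad:
                    best_sad, best = sad, (dy, dx)
    return best

def p_frame_residual(ref, cur, B=8):
    """P-frame residual: current frame minus its motion-compensated
    prediction, computed macroblock by macroblock."""
    res = np.zeros_like(cur)
    for by in range(0, cur.shape[0], B):
        for bx in range(0, cur.shape[1], B):
            dy, dx = best_match(ref, cur[by:by+B, bx:bx+B], by, bx)
            res[by:by+B, bx:bx+B] = (cur[by:by+B, bx:bx+B]
                                     - ref[by+dy:by+dy+B, bx+dx:bx+dx+B])
    return res
```

When the prediction is good, the residual is close to zero everywhere, which is why P frames compress far better than intra-coded frames.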
The H.263 Video Compression Standard
H.263, also developed by the ITU, was published in 1995. The standard is similar
to H.261, but provides more advanced techniques such as half-pixel precision MC,
whereas H.261 uses full pixel precision MC.
Figure 1.1: Generalized architecture of the H.261 encoder.
The MPEG-1 Video Compression Standard
The Motion Pictures Expert Group (MPEG) published the MPEG-1 standard
in 1990 [1]. The video compression algorithm embedded in MPEG-1 follows H.261
with a few differences. First, the MC algorithm is less restricted, providing better
predictive performance. Second, MPEG-1 generates not only I and P frames, but also
bi-directionally predicted (or B) frames. While a P frame is generated from the
difference between the current frame and a motion-compensated prediction from the
closest reconstructed reference frame, a B frame is produced from the difference
between the current frame and the average of the two closest reconstructed reference
frames (one preceding, one following). The introduction of the B frame in MPEG-1
gives a sequence of coded video frames of the form:
I BB P BB P BB P BB I BB P BB P...
These advances over H.261 and H.263 have made MPEG-1 a more popular
compression standard. A typical compression ratio for a high-quality MPEG-1 encoded
bitstream is 26:1 [8].
The MPEG-2 Video Compression Standard
Soon after the advent of MPEG-1, MPEG-2 was developed. The MPEG-2 standard
is much like MPEG-1, with some added capability. Among its many improvements,
MPEG-2, like H.263 over H.261, supports half-pixel precision MC for higher
performance inter-frame prediction [2, 30]. Typically, a high-quality MPEG-2 video
encoding will result in a 45:1 compression ratio [9]. Currently MPEG-2 is the most
widely used compression standard. It is the compression method used in digital video
disks (DVD), and most digital video recorders (DVR).
The MPEG-4 Video Compression Standard
The finalized version of the MPEG-4 standard was published in December of 1999.
The basis of coding in MPEG-4 is not a processing macroblock, as in MPEG-1 and
MPEG-2, but rather an audio-visual object [3]. Object based compression techniques
have certain advantages, such as:
1) Allowing more user interaction with video content.
2) Allowing the reuse of recurring object content.
3) Removal of artifacts due to the joint coding of objects.
Although MPEG-4 does specify the advantages of object-based compression and
provides a standard of communication between sender and receiver, it does not provide
the means by which a) the content is separated into audio-visual objects, or b) the
audio-visual objects are compressed. The MPEG-4 standard is a more open standard
which can accept various compression methods. As long as both sender and receiver
possess the correct respective tool set for compression and decompression, they can
communicate.
The advent of the MPEG-4 compression standard has opened up audio and video
compression to more researchers, and provides a flexible environment for continual
improvement in the compression of audio and video signals.
1.2 Motivation for Wavelet Image Compression Research
1.2.1 Wavelet Image Compression vs. JPEG Compression
With the exception of JPEG2000 and MPEG-4 (which does not provide a method
of compression), each of the aforementioned compression standards given in Section
1.1 has the same drawback: blocking artifacts that appear in the reconstructed
signals at low bit-rate coding. These artifacts are a direct result of the block-based
DCT transform.
The wavelet transform does not have the drawbacks of block-based DCT methods.
Compression algorithms based on the wavelet transform do not segment frames into
processing blocks. Thus, wavelets have been extensively researched as an alternative
to block-based DCT compression methods, for both images and video signals [37, 52,
70, 82].
Figure 1.2 shows the "Peppers" image, its wavelet decomposition, and a graphic
giving the referenced subband decomposition. As shown in Figure 1.2, the wavelet
transform does not break the image up into processing blocks, but processes the
entire image as a whole, creating subbands representative of differing spatial frequency
bandwidths.
Figure 1.2: 2D wavelet transform. Left: Original "Peppers" image. Center: Wavelet transformed image, MRlevel = 3. Right: Subband reference.
Each of the subbands in the subband reference in the rightmost portion of Figure
1.2 is labeled with the letter "a" or "d". The subband labeled "a" contains
scaling coefficients, which are the low spatial frequency representation of the
original image. The remaining subbands, labeled "d", contain wavelet coefficients.
Wavelet coefficients represent different levels of bandpass spatial frequency
information of the original image.
The subscript letters following the a's and d's, given in Figure 1.2, indicate the
horizontal and vertical contributions of the particular subband. Typically, in the
2D wavelet transform, the original data values are processed first in the horizontal
direction, then in the vertical direction; the data in each subband therefore receive
contributions from both horizontal and vertical processing. The "H" designation
represents high frequency information, and the "L" designation represents low
frequency information. For example, an HL designation denotes that the data in that
particular subband represent high frequency information in the horizontal dimension
and low frequency information in the vertical dimension.
Conversely, the LH designation denotes low frequency information in the horizontal
dimension and high frequency information in the vertical dimension. Also, the "a_{ll,2}"
subband is the lowest frequency representation of the original image: merely a copy
of the original image that has been decimated (low-pass filtered and downsampled)
by 2^{2+1} = 8 in both the horizontal and vertical dimensions.
The numbers following the subscript letters represent the multiresolution level
(MRlevel) of the wavelet decomposition; the higher the value, the lower the frequency
band of the original signal the coefficients represent.
After the wavelet transform is applied to an image as in Figure 1.2, each subband
is quantized, run-length encoded, and sometimes entropy encoded, much like JPEG
compression.
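The run-length step can be illustrated with a simple zero-run coder. This is a hypothetical sketch (the coders used by the actual standards differ in detail): quantized detail subbands contain long runs of zeros, so emitting (zero-run, value) pairs is already a substantial reduction.

```python
def run_length_encode(coeffs):
    """Zero-run encoding of quantized subband coefficients: emit
    (zero_run, value) pairs, exploiting the long zero runs that
    quantization produces in the wavelet detail subbands."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    if run:
        pairs.append((run, 0))   # trailing-zeros marker
    return pairs

run_length_encode([5, 0, 0, 0, -2, 0, 0, 1, 0])
# -> [(0, 5), (3, -2), (2, 1), (1, 0)]
```

The resulting pair stream is then handed to the entropy coder, which assigns short codewords to the most frequent pairs.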
Images compressed by methods utilizing the 2D wavelet transform have been
shown to degrade more gracefully in reconstructed quality as the compression ratio
increases. Unlike DCT-based compression, wavelet-based image encoders operate on
each frame as a whole, thus eliminating blocking artifacts. Figure 1.3 gives the
"Peppers" image compressed both by the JPEG standard and by wavelet-based
compression.
As displayed in Figure 1.3, the wavelet compression algorithm does not produce
the blocking artifacts that appear in JPEG compression, but rather exhibits a more
graceful degradation in image quality at high compression ratios.
The JPEG compressed image given in Figure 1.3 is produced by the Advanced
JPEG Compressor™, downloadable software that can be found at
http://www.winsoftmagic.com. The wavelet compressed image given in Figure 1.3 is
produced by in-house software developed by the OSU research group. The "Peppers"
Figure 1.3: Comparison between JPEG and wavelet compression methods using the "Peppers" image. Left: JPEG compression, file size = 6782 bytes, compression ratio 116:1, PSNR = 22.32. Right: 2D Wavelet compression, file size = 6635 bytes, compression ratio 118:1, PSNR = 25.64.
image is compressed by wavelet transformation, uniform quantization in all subbands,
stack-run coding [72], and Huffman coding [22]. No other processing is used. This
method of compression is referred to as 2D wavelet compression; the two dimensions
being processed are the vertical and horizontal dimensions of the image, as shown in
Figure 1.2.
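The uniform quantization step in the pipeline above can be sketched as below. The step size, test values, and function names are illustrative assumptions; the point is that quantization maps each coefficient to an integer index with reconstruction error bounded by half the step.

```python
import numpy as np

def quantize(coeffs, step):
    """Uniform (mid-tread) quantization, applied identically in every subband."""
    return np.round(coeffs / step).astype(int)

def dequantize(indices, step):
    """Decoder-side reconstruction to the center of each quantization bin."""
    return indices * step

c = np.array([0.4, -3.7, 12.2, 0.1])
q = quantize(c, 1.0)           # -> [0, -4, 12, 0]
err = c - dequantize(q, 1.0)   # each entry bounded by step / 2
```

Larger steps zero out more coefficients (raising the compression ratio) at the cost of larger reconstruction error, which is the basic rate/quality trade-off of the scheme.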
1.2.2 Wavelet Image Pre-processing
Our research motivation in image compression is to provide supplemental pre-
processing steps to further enhance the capabilities of 2D wavelet compression. Im-
age pre-processing techniques are well established in many compression algorithms.
However, we have developed an image pre-processing algorithm which has proven to
outperform established methods in both image quality and computation time.
Image pre-processing techniques are able to intelligently remove noise inherent
in digital images. The removal of noise decreases the entropy in the original image
signal, facilitating compressibility and reconstructed quality. With the removal of
noise, the encoder need not waste bits on noise, but rather use all the encoded bits
for storage of important image features.
Many different noise removal techniques have been applied to images, but the
wavelet transform has been viewed by many as the preferred technique for noise
removal [29, 42, 43, 54]. Rather than a complete transformation into the frequency
domain, as in DCT or FFT (Fast Fourier Transform), the wavelet transform produces
coefficient values which represent both time and frequency information. The hybrid
spatial-frequency representation of the wavelet coefficients allows for analysis based
on both spatial position and spatial frequency content. The hybrid analysis of the
wavelet transform is excellent in facilitating image denoising algorithms.
The wavelet transform does have a drawback, however. The computation time
of the wavelet transform hinders the performance of real-time image denoising ap-
plications. Thus, it is imperative to minimize the processing steps between wavelet
transformation and inverse transformation, i.e., the modification of wavelet coefficient
values for noise removal.
Thus, an image denoising method is developed which outperforms algorithms
given in [42, 43, 54] both in signal-to-noise ratio and computation time. This is
accomplished by providing an accurate and computationally simple coefficient selec-
tion process. Results of the proposed image denoising research show an improvement
in PSNR and a substantial reduction in computational complexity, with a speedup of
over an order of magnitude relative to the established methods given in [42, 43, 54].
1.3 Motivation for Wavelet Video Compression Research
Because the wavelet transform has been successful in achieving better image qual-
ity at high compression ratios than traditional JPEG image compression, it is only
natural to assume that wavelet video compression techniques would be able to out-
perform the block-based DCT compression methods of H.26X and MPEG-X.
Several wavelet compression techniques have been targeted toward video applications.
Tham et al. use block-based motion compensation for temporal domain
compression and the 2D wavelet transform for spatial compression [71]. Zheng et
al. use the wavelet transform for temporal domain compression as well as spatial
domain compression, i.e., 3D wavelet compression [24, 81].
The more straightforward approach in [81] exploits the advantages of the wavelet
transform in three dimensions for the compression of video. This approach uses the 2D
wavelet transform for intra-frame coding, and the wavelet transform between
frames for inter-frame coding.
Although both wavelet video compression techniques have had success in video
compression, there has not been an overwhelmingly superior wavelet video com-
pression technique to combat the industry standards. Thus, this research develops
wavelet-based techniques that further enhance the capabilities of 3D wavelet com-
pression.
We provide two processing methods to aid in the effectiveness of 3D wavelet
compression: a wavelet-based video noise removal algorithm for video pre-processing,
and a virtual-object based compression scheme utilizing 3D wavelet compression.
1.3.1 Video Signal Pre-processing for Noise Removal
It is well known that the removal of noise in images helps compression techniques
obtain higher compression ratios while achieving better reconstructed image quality.
However, there has not been much work in the removal of noise in video signals.
With video signals, there exists not only spatial domain noise, but also noise in the
temporal domain. Using the wavelet transform, we remove both spatial and temporal
noise providing a higher compression gain with 3D wavelet compression.
Noise reduction in digital images has been studied extensively [15, 16, 27, 29,
31, 42, 43, 54, 61, 77]. However, noise reduction in digital video has only rarely
been studied. Preliminary methods for temporal domain noise removal are variable
coefficient spatio-temporal filters [33, 83] and weighted median filters [45]. These
types of filters have also been studied for noise removal in images. Huang et al. use
an adaptive median filter for noise removal in images [27]. Rieder and Scheffler [61],
and Wong [77] both use an adaptive linear filter for image noise removal. But the
wavelet transform has not been used for temporal domain noise removal.
One can only speculate why the wavelet transform has not yet been considered
for video signal denoising. However, our own preliminary analysis shows that the
overwhelming difficulty with using the wavelet transform is its considerable
computational load. With our image denoising technique, however, we have shown a
significant speedup in wavelet image denoising compared to established methods, so
the computational burden in video denoising can be overcome.
Thus, we include a method of removing temporal domain noise in video sequences
via the wavelet transform. Using techniques similar to the proposed image denoising
technique, we overcome the overwhelming computational burden provided by the
application of the wavelet transform in the temporal domain. Our video denoising
technique is applied to image sequences prior to compression, enabling more effective
compressed video.
1.3.2 Virtual-Object Based Video Compression
With the advent of the MPEG-4 standard, video compression is based on an
audio-visual object instead of the traditional macroblock [3].
Due to the advantages of object-based compression, as provided in the MPEG-
4 standard [3], we propose a wavelet-based virtual-object compression algorithm.
Virtual-object compression first separates moving objects from stationary background
and compresses each separately, thus achieving the advantages of object-based com-
pression.
There are two separate processing areas in object-based compression. Object
extraction is the method of separating different objects in an image sequence, and
the compression of those objects is a method of coding arbitrarily shaped objects. In
the virtual-object compression method, the wavelet transform is used for both object
extraction and object compression.
When the wavelet transform is applied in the temporal domain, motion of objects
is detected by large coefficient values. Therefore, the wavelet transform is used in
the identification and extraction of moving objects prior to object-based compression.
Virtual-object compression uses the non-decimated wavelet transform in the temporal
domain for the separation of objects from stationary background.
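This motion-detection idea can be sketched with a non-decimated temporal Haar detail filter. This is an illustrative sketch, not the dissertation's exact algorithm: pixels whose intensity changes between frames produce large temporal detail coefficients, and thresholding those coefficients yields a motion mask.

```python
import numpy as np

def temporal_detail(frames):
    """Non-decimated temporal Haar detail: large values flag pixels whose
    intensity changes between consecutive frames, i.e., likely object motion."""
    f = np.asarray(frames, dtype=float)
    return np.abs(f[1:] - f[:-1]) / np.sqrt(2)

still = np.full((3, 4, 4), 10.0)            # three identical 4x4 frames
moving = still.copy()
moving[1:, 1:3, 1:3] = 50.0                 # a region changes after frame 0
d = temporal_detail(moving)
motion_mask = (d > 1.0).any(axis=0)         # True exactly where motion occurred
```

A static background produces all-zero detail coefficients, so only the moving region survives the threshold; that region is what the virtual-object extraction step then isolates.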
Virtual-object compression also restricts the virtual-object to be rectangular. This
restriction enables the use of 3D wavelet compression for the compression of the
virtual-object. Also, with a rectangular object restriction, the location and shape of
the object can be completely defined with only two sets of spatial coordinates (the
starting horizontal and vertical locations of the virtual-object, and the width and
height of the virtual-object), thus virtually eliminating shape coding overhead.
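The two-coordinate-pair description can be sketched with a hypothetical helper (not the dissertation's implementation): given a binary motion mask, the rectangular virtual-object is fully described by its top-left corner plus its width and height.

```python
import numpy as np

def virtual_object_rect(motion_mask):
    """Rectangular virtual-object descriptor: starting (x, y) plus width and
    height fully define the object's location and shape, so almost no
    shape-coding overhead is needed."""
    ys, xs = np.nonzero(motion_mask)
    x0, y0 = xs.min(), ys.min()
    return (int(x0), int(y0), int(xs.max() - x0 + 1), int(ys.max() - y0 + 1))

mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 3:7] = True             # hypothetical moving region
virtual_object_rect(mask)         # -> (3, 2, 4, 3)
```

Four integers per object replace the arbitrary-shape coding that a general object-based coder would require, and the rectangular region can be fed directly to the 3D wavelet compressor.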
Results show the virtual-object compression method to be superior in compression
ratio with higher PSNR when compared to 3D wavelet compression.
1.4 Motivation for the Rate Control of Wavelet-Compressed Video
Using the 3D wavelet compression method discussed in [24, 81], the number of
frames contained in a GoF (Group of Frames) varies due to video content. Thus,
there exists an unknown delay in the acquisition of the GoF, and the computation
time needed for compression. Also, in streaming applications across the Internet,
there exists another unknown delay in the transmission of the compressed GoF to the
receiver, and yet another unknown delay in the decompression time. The variability
in the time from frame acquisition to frame display requires a rate control algorithm
for real-time transmission of 3D wavelet compressed video.
A real-time video compression and transmission system is necessarily a multi-
threaded package. On the server side frame acquisition, GoF compression, and packet
transmission processes must work independently for real-time operation. For example,
in real-time compression the frame acquisition process may not wait for the compres-
sion process to finish before acquiring the next GoF. Frame acquisition must occur
at regular intervals for real-time processing. On the client side, the decompression of
the GoF must occur independently from frame display for real-time systems.
In a multi-threaded environment such as the real-time compression and transmis-
sion of video, there must exist a process to manage the computational activity of
each processing thread in order to avoid overflow or starvation of buffers between
the threads. Also, this management process must exist in both the client and server
systems, and the management processes must communicate to ensure equivalent ac-
quisition and display rates (a requirement for real-time video applications).
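The buffer-management idea can be sketched with a bounded queue between an acquisition thread and a compression thread. This is a minimal sketch under illustrative assumptions (names, buffer size, and the GoF placeholder are invented): the bounded buffer makes each thread block rather than overflow or starve the other.

```python
import threading
import queue

gof_buffer = queue.Queue(maxsize=4)   # grouping buffer between the two threads

def acquire_frames(n_gofs):
    """Server-side acquisition: deposit each captured GoF into the buffer."""
    for i in range(n_gofs):
        gof = f"GoF-{i}"              # stand-in for a captured group of frames
        gof_buffer.put(gof)           # blocks if the compressor falls behind

def compress_gofs(n_gofs, out):
    """Server-side compression: drain the buffer independently of acquisition."""
    for _ in range(n_gofs):
        gof = gof_buffer.get()        # blocks if acquisition falls behind
        out.append(gof + ":compressed")

out = []
t1 = threading.Thread(target=acquire_frames, args=(8,))
t2 = threading.Thread(target=compress_gofs, args=(8, out))
t1.start(); t2.start()
t1.join(); t2.join()
```

A real rate controller additionally adjusts the acquisition and display rates so that, on average, the buffers stay near a target occupancy rather than merely blocking at their limits.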
The true motivation for a rate-control algorithm in a 3D wavelet compression
scheme is that of necessity. We may possess an efficient and effective video compres-
sion scheme, but without an effective rate-control system, real-time video commu-
nication is not possible. Performance results give a continuous video stream from
sender to receiver with a modest variation in frame rate.
1.5 Dissertation Overview
The rest of the dissertation is organized as follows. Chapter 2 is an overview of
wavelet theory. The goal of the overview is to develop the wavelet filterbank analysis
and synthesis equations, used in the computation of the wavelet forward and inverse
transforms. The wavelet forward and inverse transforms are then used throughout
the dissertation.
In Chapter 3 we develop the feature-based wavelet selective shrinkage algorithm
for image denoising. The coefficient selection method is based on a two-threshold
criterion that determines which coefficients contain useful image information and
which are corrupted by noise. The two-threshold criterion proves to be
an effective means of distinguishing between useful and useless coefficients, and the
performance of the denoising method improves upon other methods given in
the literature in both PSNR and computation time.
Chapter 4 develops the video denoising algorithm which is based upon the image
denoising algorithm described in Chapter 3. However, the video denoising algorithm
also applies temporal domain processing to eliminate inter-frame noise. There is also
a motion estimation algorithm applied to the video signal prior to temporal domain
processing. The motion estimation algorithm is able to determine the amount of
temporal domain processing which can improve overall quality.
Chapter 5 describes the virtual-object compression method. The virtual-object
compression method separates moving objects from stationary background and com-
presses each separately. The independent coding of object and background gives the
virtual-object compression method an improvement in signal-to-noise ratio over GoF
based compression methods such as 3D wavelet compression.
Chapter 6 develops a rate control algorithm for real-time video communication
using wavelet-based compression schemes. The size of the GoF varies in the wavelet-
based codec, so the computation times of the compression and decompression algo-
rithms are unknown. Also, the transmission time of the compressed GoF from sender
to receiver is unknown and variable. Thus, it is necessary to include a rate con-
trol mechanism to ensure continuous video delivery from server to client. Chapter 7
concludes the dissertation and provides some areas for future research.
CHAPTER 2
Wavelet Theory Overview
An overview of wavelet theory is presented for completeness and for the formu-
lation of both the wavelet analysis and synthesis filterbank equations, used in the
computation of the wavelet forward and inverse transforms, respectively.
2.1 Scaling Function and Wavelet Definitions
The basic idea of a transform is to use a set of orthonormal basis functions to
convolve with an input function. The resultant output function, then, can be evalu-
ated or modified. The Fourier Transform, for example, uses complex sinusoids (i.e.
e^{jωn}, ∀ n) as its orthonormal basis set. The wavelet transform uses stretched and
shifted versions of one function, the mother wavelet, as its basis. However, not any
function can be a mother wavelet. There are certain criteria which the mother wavelet
must obey.
We will start with a scaling function, Φ(·). A basis can be generated by shifting
and stretching this function.
Φ_{k,n}(t) = 2^{−k/2} Φ(2^{−k}t − n),   (2.1)
and
||Φ(t)|| = 1,   (2.2)
where Φ_{k,n}(·) is the basis function at the kth scale and nth position.
It is required that the set of all Φk,n(·) be an orthonormal basis. Therefore, any
function, f(·), can be completely defined by a weighted sum of the basis functions
given in Equation 2.1.
f(t) = ∑_k ∑_n a_k[n] Φ_{k,n}(t),   (2.3)
where
a_k[n] = ⟨Φ_{k,n}(t), f(t)⟩ = ∫_{−∞}^{∞} Φ*_{k,n}(t) f(t) dt.   (2.4)
The a_k[·] are called scaling coefficients.
Let us define V_k, the span of the scaling functions Φ_{k,n}(·) at scale k:
V_k = span{Φ_{k,n}(t) : n ∈ Z}.   (2.5)
It is required that
· · · ⊂ V_{k+1} ⊂ V_k ⊂ V_{k−1} ⊂ · · ·,   (2.6)
where V_{k+1} defines a span of coarser scaling functions than does V_k.
We know from Equations 2.5 and 2.6 that, Φk+1,0(·) ∈ Vk+1 ⊂ Vk. So substituting
into Equation 2.3 we can show there exists a set of weights, h[·], such that
Φ_{k+1,0}(t) = ∑_n h[n] Φ_{k,n}(t),   (2.7)
which, using Equation 2.1 and setting k = 0, reduces to
Φ(t) = √2 ∑_n h[n] Φ(2t − n).   (2.8)
Equation 2.8 is referred to as the scaling equation, and the scaling function, Φ(·) is
completely defined by h[·].
A subset of scaling functions, Vk−1 can be defined by a subset of coarser scaling
functions Vk plus a difference subset, which we will call Wk. Therefore,
V_{k−1} = V_k + W_k   (V_k ⊥ W_k).   (2.9)
We can then define a basis for W_k:
W_k = span{Ψ_{k,n}(t) : n ∈ Z},   (2.10)
where
Ψ_{k,n}(t) = 2^{−k/2} Ψ(2^{−k}t − n).   (2.11)
Ψ(·) is the mother wavelet, and the set of all Ψk,n(·) are the wavelet basis functions
corresponding to the subset Wk.
Because Wk ⊂ Vk−1, as given in Equation 2.9, we can substitute into Equation 2.3
to show that there exists a set of values, g[·] such that,
Ψ_{k,0}(t) = ∑_n g[n] Φ_{k−1,n}(t),   (2.12)
which, using Equation 2.11 and setting k = 1, can be reduced to
Ψ(t) = √2 ∑_n g[n] Φ(2t − n).   (2.13)
Equation 2.13 is referred to as the wavelet scaling equation, and g[·] completely de-
scribes the Mother Wavelet, Ψ(·).
Notice from Equation 2.9 for any arbitrarily fine scale, k, we can show that,
V_k = V_{k+1} + W_{k+1}
    = V_{k+2} + W_{k+2} + W_{k+1}
    = V_{k+3} + W_{k+3} + W_{k+2} + W_{k+1}
    = ∑_{n=1}^{∞} W_{k+n}.   (2.14)
And therefore, any function, f(·), can be defined by
f(t) = ∑_k ∑_n d_k[n] Ψ_{k,n}(t),   (2.15)
where
d_k[n] = ⟨Ψ_{k,n}(t), f(t)⟩ = ∫_{−∞}^{∞} Ψ*_{k,n}(t) f(t) dt.   (2.16)
2.2 Scaling Function and Wavelet Restrictions
Recall, that we want to keep shifted basis functions, Φk,n(·), orthonormal. There-
fore, for a given scale, k, we have
δ[m] = ⟨Φ_{k,0}(t), Φ_{k,m}(t)⟩ = ⟨Φ_{k,0}(t), Φ_{k,0}(t − 2^k m)⟩,   (2.17)
where δ[·] is the Kronecker delta function [50]. Using Equations 2.1 and 2.7, and
setting k = 1, Equation 2.17 can be reduced to
δ[m] = ∑_n h[n] h[n − 2m].   (2.18)
The wavelet basis functions, Ψk,n(·), also need to be orthonormal to the scaling basis
functions Φk,n(·), for Equation 2.9 to be valid. Therefore,
0 = ⟨Ψ_{k,0}(t), Φ_{k,m}(t)⟩,   (2.19)
which can be reduced to
0 = ∑_n g[n] h[n − 2m].   (2.20)
Equation 2.20 can be solved by
g[n] = (−1)^n h[N − n],   (2.21)
where N is the length of both h[·] and g[·].
2.3 Wavelet Filterbank Analysis
Let fk(·) ∈ Vk. From Equations 2.3, 2.14, and 2.15 it can be shown that
f_k(t) = ∑_n a_k[n] Φ_{k,n}(t) = ∑_n a_{k+1}[n] Φ_{k+1,n}(t) + ∑_n d_{k+1}[n] Ψ_{k+1,n}(t),   (2.22)
where d_{k+1}[·] and a_{k+1}[·] are the wavelet coefficients and scaling coefficients of the
(k+1)th scale, respectively.
Using Equation 2.4 the scaling coefficients are realized, and substituting Equation
2.7 we obtain
a_{k+1}[n] = ⟨f_k(t), Φ_{k+1,n}(t)⟩
          = ⟨∑_m a_k[m] Φ_{k,m}(t), Φ_{k+1,n}(t)⟩
          = ∑_m a_k[m] ⟨Φ_{k,m}(t), Φ_{k+1,n}(t)⟩
          = ∑_m a_k[m] ⟨Φ_{k,m}(t), Φ_{k+1,0}(t − 2^{k+1} n)⟩.   (2.23)
Using Equations 2.1 and 2.7, Equation 2.23 can be reduced to
a_{k+1}[n] = ∑_m a_k[m] ∑_l h[l] ⟨2^{−k/2} Φ(2^{−k}t − m), 2^{−k/2} Φ(2^{−k}t − l − 2n)⟩.   (2.24)
Since the scaling function basis is orthonormal, the inner product in Equation 2.24 is
equal to one if and only if (l + 2n) = m. Therefore,
a_{k+1}[n] = ∑_m a_k[m] h[m − 2n].   (2.25)
Equation 2.25 indicates that the scaling coefficients a_{k+1}[·] can be obtained by
convolving a reversed h[·] with a_k[·] and downsampling by two.
Very similarly, it can be shown that,
d_{k+1}[n] = ∑_m a_k[m] g[m − 2n].   (2.26)
From Equations 2.25 and 2.26, we can obtain increasingly coarser scales of scaling
and wavelet coefficients by convolving the scaling coefficients, a_k[·], with both a
reversed scaling filter, h[·], and a reversed wavelet filter, g[·], and downsampling by
two. Figure 2.1 gives a block diagram of wavelet filterbank analysis.
Because each filtered output is downsampled by two, the total number of
coefficients remains the same regardless of the number of resolution levels, k.
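Equations 2.25 and 2.26 can be sketched directly in code. The following one-level analysis step uses the Haar filter pair for illustration; the filter choice, test signal, and function name are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np

def analysis_step(a, h, g):
    """One level of wavelet filterbank analysis (Eqs. 2.25 and 2.26):
    a_{k+1}[n] = sum_m a_k[m] h[m - 2n],  d_{k+1}[n] = sum_m a_k[m] g[m - 2n],
    i.e., correlate with each reversed filter and downsample by two."""
    half = len(a) // 2
    a_next = np.zeros(half)
    d_next = np.zeros(half)
    for n in range(half):
        for m in range(len(a)):
            i = m - 2 * n
            if 0 <= i < len(h):            # filters are zero outside 0..len-1
                a_next[n] += a[m] * h[i]
                d_next[n] += a[m] * g[i]
    return a_next, d_next

# Haar filters satisfy the orthonormality conditions (2.18) and (2.20):
h = np.array([1.0, 1.0]) / np.sqrt(2)
g = np.array([1.0, -1.0]) / np.sqrt(2)     # g[n] = (-1)^n h[N - n]

a1, d1 = analysis_step(np.array([4.0, 2.0, 6.0, 8.0]), h, g)
# a1 holds scaled local averages, d1 scaled local differences; the
# orthonormal basis splits the signal energy exactly between the two.
```

Because each output is downsampled by two, a1 and d1 together hold exactly as many coefficients as the input, matching the remark above.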
Figure 2.1: Wavelet decomposition.
2.4 Wavelet Filterbank Synthesis
Let fk(·) ∈ Vk. From Equations 2.4 and 2.22 it can be shown that
a_k[n] = ⟨f_k(t), Φ_{k,n}(t)⟩
      = ⟨∑_m a_{k+1}[m] Φ_{k+1,m}(t) + ∑_m d_{k+1}[m] Ψ_{k+1,m}(t), Φ_{k,n}(t)⟩.   (2.27)
With some further computation, and substituting in Equations 2.7 and 2.12 it can
be shown that
a_k[n] = ∑_m a_{k+1}[m] ⟨Φ_{k+1,m}(t), Φ_{k,n}(t)⟩ + ∑_m d_{k+1}[m] ⟨Ψ_{k+1,m}(t), Φ_{k,n}(t)⟩
      = ∑_m a_{k+1}[m] h[n − 2m] + ∑_m d_{k+1}[m] g[n − 2m].   (2.28)
From Equation 2.28, we can obtain the original signal, f_k(t), by upsampling
the scaling and wavelet coefficients and filtering the coefficients with their respective
filters, h[·] and g[·]. The wavelet reconstruction block diagram is given in Figure 2.2.
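Equation 2.28 can be sketched in the same style: upsample the coarse coefficients by two and filter with h[·] and g[·]. The Haar filters and the coefficient values (those one Haar analysis step produces from [4, 2, 6, 8]) are illustrative assumptions.

```python
import numpy as np

h = np.array([1.0, 1.0]) / np.sqrt(2)      # illustrative Haar scaling filter
g = np.array([1.0, -1.0]) / np.sqrt(2)     # illustrative Haar wavelet filter

def synthesis_step(a_next, d_next, h, g):
    """One level of wavelet filterbank synthesis (Eq. 2.28):
    a_k[n] = sum_m a_{k+1}[m] h[n - 2m] + sum_m d_{k+1}[m] g[n - 2m]."""
    a = np.zeros(2 * len(a_next))
    for n in range(len(a)):
        for m in range(len(a_next)):
            i = n - 2 * m
            if 0 <= i < len(h):
                a[n] += a_next[m] * h[i] + d_next[m] * g[i]
    return a

# Coefficients produced by one Haar analysis step of [4, 2, 6, 8]:
a1 = np.array([6.0, 14.0]) / np.sqrt(2)
d1 = np.array([2.0, -2.0]) / np.sqrt(2)
recon = synthesis_step(a1, d1, h, g)   # ≈ [4, 2, 6, 8]: perfect reconstruction
```

Because the analysis and synthesis filters form an orthonormal pair, the cascade of the two steps reconstructs the input exactly (up to floating-point error).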
2.5 Two-Dimensional Wavelet Transform
A digital image is, in most cases, considered a two-dimensional array, with
width and height as the dimensions. Let f(·) be a two-dimensional, discrete signal. As
shown in Equations 2.25 and 2.26, the wavelet transform in one dimension generates
two sets of coefficients: scaling coefficients, a_k[·], and wavelet coefficients, d_k[·]. When
Figure 2.2: Wavelet reconstruction.
dealing with two dimensions, however, four sets of coefficients are generated. That
is,
a_{ll,0}[x, y] = ∑_n h[n − 2y] ∑_m h[m − 2x] f(m, n)
d_{hl,0}[x, y] = ∑_n h[n − 2y] ∑_m g[m − 2x] f(m, n)
d_{lh,0}[x, y] = ∑_n g[n − 2y] ∑_m h[m − 2x] f(m, n)
d_{hh,0}[x, y] = ∑_n g[n − 2y] ∑_m g[m − 2x] f(m, n).   (2.29)
As in the case of the 1-dimensional wavelet transform, the scaling coefficients can
be processed further for a multiresolution analysis of the original image, f(·):
a_{ll,k+1}[x, y] = ∑_n h[n − 2y] ∑_m h[m − 2x] a_{ll,k}[m, n]
d_{hl,k+1}[x, y] = ∑_n h[n − 2y] ∑_m g[m − 2x] a_{ll,k}[m, n]
d_{lh,k+1}[x, y] = ∑_n g[n − 2y] ∑_m h[m − 2x] a_{ll,k}[m, n]
d_{hh,k+1}[x, y] = ∑_n g[n − 2y] ∑_m g[m − 2x] a_{ll,k}[m, n].   (2.30)
The four coefficient sets are referred to as the low-low band, a_{ll,·}[·], the high-low band,
d_{hl,·}[·], the low-high band, d_{lh,·}[·], and the high-high band, d_{hh,·}[·]. The subbands are
named according to the order in which the scaling and/or wavelet filters process the
scaling coefficients, a_{ll,·}[·].
The reconstruction of f(x, y) is accomplished by
a_{ll,k}[x, y] = ∑_m h[x − 2m] ∑_n h[y − 2n] a_{ll,k+1}[m, n]
             + ∑_m h[x − 2m] ∑_n g[y − 2n] d_{lh,k+1}[m, n]
             + ∑_m g[x − 2m] ∑_n h[y − 2n] d_{hl,k+1}[m, n]
             + ∑_m g[x − 2m] ∑_n g[y − 2n] d_{hh,k+1}[m, n],   (2.31)
and
f(x, y) = ∑_m h[x − 2m] ∑_n h[y − 2n] a_{ll,0}[m, n]
        + ∑_m h[x − 2m] ∑_n g[y − 2n] d_{lh,0}[m, n]
        + ∑_m g[x − 2m] ∑_n h[y − 2n] d_{hl,0}[m, n]
        + ∑_m g[x − 2m] ∑_n g[y − 2n] d_{hh,0}[m, n].   (2.32)
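The separable structure of Equations 2.29-2.32 — filter along the horizontal index m, then along the vertical index n, keeping every second output — can be sketched for one forward level as follows. The Haar filters and all function names are illustrative assumptions.

```python
import numpy as np

h = np.array([1.0, 1.0]) / np.sqrt(2)   # illustrative Haar filters
g = np.array([1.0, -1.0]) / np.sqrt(2)

def analyze_1d(x, f):
    """Correlate x with filter f and downsample by two (Eqs. 2.25/2.26 pattern)."""
    return np.array([sum(x[m] * f[m - 2 * n] for m in range(len(x))
                         if 0 <= m - 2 * n < len(f))
                     for n in range(len(x) // 2)])

def dwt2_level(img):
    """One level of the separable 2D transform (Eq. 2.29):
    filter each row (horizontal), then each column (vertical)."""
    lo = np.stack([analyze_1d(r, h) for r in img])       # horizontal low-pass
    hi = np.stack([analyze_1d(r, g) for r in img])       # horizontal high-pass
    a_ll = np.stack([analyze_1d(c, h) for c in lo.T]).T  # low-low band
    d_lh = np.stack([analyze_1d(c, g) for c in lo.T]).T  # low-high band
    d_hl = np.stack([analyze_1d(c, h) for c in hi.T]).T  # high-low band
    d_hh = np.stack([analyze_1d(c, g) for c in hi.T]).T  # high-high band
    return a_ll, d_hl, d_lh, d_hh
```

Applying `dwt2_level` again to the returned `a_ll` band produces the next multiresolution level, exactly as Equation 2.30 describes.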
2.6 Summary
In this chapter, a brief overview of wavelet theory is presented and a formulation
of the wavelet analysis and synthesis filterbank equations is developed. The wavelet
analysis equations are given by Equations 2.25 and 2.26, and wavelet synthesis equa-
tion is given by Equation 2.28. Also, the 2D wavelet transform is described. The 2D
forward wavelet transform is given by Equations 2.29 and 2.30, and the 2D wavelet
inverse transform is given by Equations 2.31 and 2.32. Both the wavelet analysis and
synthesis equations and the 2D wavelet transform are used throughout the rest of the
dissertation.
CHAPTER 3
Feature-Based Wavelet Selective Shrinkage Algorithm for Image Denoising
3.1 Introduction
The recent advancement in multimedia technology has promoted an enormous
amount of research in the area of image and video processing. Image and video
processing applications such as compression, enhancement, and target recognition
require preprocessing functions for noise removal to improve performance. Noise
removal is one of the most common and important processing steps in many image
and video systems.
Because of the importance and commonality of preprocessing in most image and
video systems, there has been an enormous amount of research dedicated to the
subject of noise removal, and many different mathematical tools have been proposed.
Variable coefficient linear filters [17, 49, 61, 77], adaptive nonlinear filters [27, 46,
53, 83], DCT based solutions [31], cluster filtering [76], genetic algorithms [73], fuzzy
logic [39, 64], etc. have all been proposed in the literature.
The wavelet transform has also been used to suppress noise in digital images. It has been shown that reducing the absolute value of wavelet coefficients is successful in signal restoration [43]. This process is known as wavelet shrinkage. Other
more complex denoising techniques select or reject wavelet coefficients based on their
predicted contribution to reconstructed image quality. This process is known as se-
lective wavelet shrinkage, and many works have used it as the preferred method of
image denoising. Preliminary methods predict the contribution of the wavelet co-
efficients based on the magnitude of the wavelet coefficients [69], and others based
on intra-scale dependencies of the wavelet coefficients [15, 20, 41, 43]. More recent
denoising methods are based on both intra- and inter-scale coefficient dependencies
[18, 26, 29, 42, 54].
Mallat and Hwang prove the successful removal of noise in signals via the wavelet
transform by selecting and rejecting wavelet coefficients based on their Lipschitz
(Holder) exponents [43]. The Holder exponent is a measure of regularity in a sig-
nal, and it may be approximated by the evolution of wavelet coefficient ratios across
scales. Thus, this regularity metric is used to select those wavelet coefficients which are to be used in reconstruction and those which are not. Although this fundamental
work in image denoising is successful in the removal of noise, its application is broad
and not focused on image noise removal, and the results are not optimal.
Malfait and Roose refined the selective shrinkage denoising approach by applying
a Bayesian probabilistic formulation, and modeling the wavelet coefficients as Markov
random sequences [42]. This method is focused on image denoising and its results are
an improvement upon [43]. The Holder exponents are roughly approximated by the
evolution of coefficient values across scales, i.e.
m_{l,n} = (1/(p − l)) Σ_{k=l}^{p−1} |λ_{k+1,n} / λ_{k,n}|,

where m_{l,n} is the approximated Holder exponent of position n of scale l, and λ_{k,n} is
the wavelet coefficient of scale k and position n. The rough approximation is refined
by assuming that the coefficient values are well modeled as a Markov chain, and
the probability of a coefficient's contribution to the image can be well approximated
by the Holder exponents of neighboring coefficients. Coefficients are then assigned
binary labels xk,n of scale k and position n depending on their predicted retention
for reconstruction (xk,n = 1), or predicted removal (xk,n = 0). The binary labels are
then randomly and iteratively switched until P (X|M) is maximized, where xk,n ∈ X
and m_{k,n} ∈ M. The coefficients are modified by λ^{new}_{k,n} = λ_{k,n} P(x_{k,n} = 1|M), and the
denoised image is formed by the inverse wavelet transform of the modified coefficients.
Each coefficient is reduced in magnitude depending on the probable contribution to
the image, i.e. P (xk,n = 1|M).
Later, Pizurica et al. [54] continued the work of [42] by using a different approximation of the Holder exponent given by

ρ_{l,n} = (1/(p − l)) Σ_{k=l}^{p−1} |I_{k+1,n} / I_{k,n}|,

where

I_{k,n} = Σ_{t∈C(k,n)} |λ_{k,t}|.
ρk,n is the approximation of the Holder exponent, and C(k, n) is the set of coefficients
surrounding λk,n. This work applies the same probabilistic model as [42] using the
new approximation of the Holder exponent. Coefficients are assigned binary labels,
xk,n, depending on their predicted retention for reconstruction (xk,n = 1), or predicted
removal (xk,n = 0). The binary labels are then randomly and iteratively switched until
P (X|M) is maximized. Unlike [42], the significance measure of a coefficient, M , is not
merely its Holder exponent, but evaluated by the magnitude of the coefficients as well
as its Holder approximation, i.e. f_{M|X}(m_{k,n}|x_{k,n}) = f_{Λ|X}(λ_{k,n}|x_{k,n}) f_{R|X}(ρ_{k,n}|x_{k,n}).
Thus a joint measure of coefficient significance is developed based on both the Holder
exponent approximation and the magnitude of the wavelet coefficient. As in [42], the
coefficients are modified by λ^{new}_{k,n} = λ_{k,n} P(x_{k,n} = 1|M).
Although both algorithms in [42] and [54] show promising results in denoised image
quality, the iterative procedure necessary to maximize the probability P (X|M) adds
computational complexity, making the processing times of the algorithms impractical
for most image and video processing applications. Also, the Markov Random Field
(MRF) model used in the calculation of P (X|M) is not appropriate for analysis of
wavelet coefficients because it ignores the influence of non-neighboring coefficients.
The MRF model is strictly used for simplicity and conceptual ease [42].
From the review of the literature, one can see that image denoising remains an active and challenging topic of research. The major challenge lies in the fact that one does not know the original signal of a corrupted image. The performance of a method, on the other hand, can only be measured by comparing the denoised image with its original. In this chapter, we present a new denoising approach which
consists of two components. The first is the selective wavelet shrinkage method for
denoising, and the second is a new threshold selection method which makes use of test
images as training samples.
In general, selective shrinkage methods are comprised of three processing steps.
First, a corrupted image is decomposed into multiresolution subbands via the wavelet
transform. Next, wavelet coefficients are modified based upon certain criteria to
predict their importance in reconstructed image quality. Finally, the denoised image
is formed by reconstructing the modified coefficients via the inverse wavelet transform.
The most computationally costly processing step in the methods of [42] and [54], and the one of greatest importance to denoising performance, is the coefficient modification process, which calls for effective and efficient criteria to modify wavelet coefficients. To improve
performance, this chapter presents a new coefficient selection process which uses a
two-threshold criteria to non-iteratively select and reject wavelet coefficients. The
two-threshold selection criteria results in an effective and computationally simple
coefficient selection process.
The threshold selection method presented is based on minimizing the error be-
tween the wavelet coefficients of the denoised image and the wavelet coefficients of
an optimally denoised image produced by a method using supplemental information.
The supplemental information provided produces a denoised image that is far superior to that of any method which does not utilize supplemental information. Thus, the image
produced by the method utilizing supplemental information is referred to as an op-
timally denoised image. Using several test cases, the threshold values which produce
the minimum difference between the wavelet coefficients of the denoised image and
the wavelet coefficients of the optimally denoised image are chosen as the threshold
values for the general case.
The two-threshold coefficient selection method results in a denoising algorithm
which gives improved results upon those provided by [42, 54] without the compu-
tational complexity. The two-threshold requirement investigates the regularities of
wavelet coefficients both spatially and across scales for predictive coefficient selection,
providing selective wavelet shrinkage to non-decimated wavelet subbands.
Following the Introduction, Section 3.2 gives theory on the 2D non-decimated
wavelet analysis and synthesis filters. Section 3.3 then describes the coefficient selec-
tion process prior to selective wavelet shrinkage. Section 3.4 gives testing results for
parameter selection. Section 3.5 gives the estimation algorithms for proper parameter
selection, and Section 3.6 gives the results. Section 3.7 gives the discussion.
3.2 2D Non-Decimated Wavelet Analysis and Synthesis
To facilitate the discussion of the proposed method, non-decimated wavelet filter-
bank theory is presented. In certain applications such as signal denoising, it is not desirable to downsample wavelet coefficients after decomposition, as in the traditional wavelet filterbank, because the spatial resolution of the coefficients is degraded by downsampling. Therefore, for the non-decimated case, each subband contains the same number of coefficients as the original signal.
Let ak[n] and dk[n] be scaling and wavelet coefficients, respectively, of scale k and
position n. Thus,
α_k[2^{k+1} n] = a_k[n]
λ_k[2^{k+1} n] = d_k[n],    (3.1)
where αk[·] are the non-decimated scaling coefficients, and λk[·] are the non-decimated
wavelet coefficients. Equation 3.1 is substituted into the scaling analysis filterbank
equation, Equation 2.25, to find the non-decimated filterbank equation:
a_{k+1}[n] = Σ_m h[m] a_k[m − 2n]
α_{k+1}[2^{k+2} n] = Σ_m h[m] α_k[2^{k+1}(m − 2n)]
α_{k+1}[n] = Σ_m h[m] α_k[2^{k+1} m − n],    (3.2)
where h[·] and g[·] are the filter coefficients corresponding to the low-pass and high-
pass filter, respectively, of the wavelet transform. The 2k+1 scalar introduced into
Equation 3.2 is equivalent to upsampling h[·] by 2k+1 prior to its convolution with
αk[·]. Similarly Equation 3.1 is substituted into the wavelet analysis filterbank equa-
tion, Equation 2.26, to obtain
λ_{k+1}[n] = Σ_m g[m] α_k[2^{k+1} m − n].    (3.3)
Figure 3.1 gives a block diagram of the non-decimated wavelet decomposition.
Figure 3.1: Non-decimated wavelet decomposition.
The synthesis of the non-decimated wavelet transform also differs from the down-
sampled case. From the wavelet synthesis filterbank equation, Equation 2.28, we
obtain,
a_k[2n] = Σ_m h[2(n − m)] a_{k+1}[m] + Σ_m g[2(n − m)] d_{k+1}[m].    (3.4)
Substituting p = n − m, we obtain

a_k[2n] = Σ_p h[2p] a_{k+1}[n − p] + Σ_p g[2p] d_{k+1}[n − p].    (3.5)
Substituting Equation 3.1 into Equation 3.5,
α_k[2^{k+2} n] = Σ_p h[2p] α_{k+1}[2^{k+2}(n − p)] + Σ_p g[2p] λ_{k+1}[2^{k+2}(n − p)],    (3.6)
and
α_k[n] = Σ_p h[2p] α_{k+1}[n − 2^{k+2} p] + Σ_p g[2p] λ_{k+1}[n − 2^{k+2} p].    (3.7)
Looking at Equation 3.7, samples are discarded by downsampling α_{k+1}[·] and λ_{k+1}[·] by 2 prior to convolution. Because the downsampling in the analysis filters is eliminated, a downsample by 2 appears in the synthesis equation, Equation 3.7. If the downsample by 2 is not performed, i.e. substituting m = 2p, then we must divide by 2 to preserve power equality. That is,
α_k[n] = (1/2) Σ_m h[m] α_{k+1}[n − 2^{k+1} m] + (1/2) Σ_m g[m] λ_{k+1}[n − 2^{k+1} m].    (3.8)
Figure 3.2 gives a block diagram of the non-decimated wavelet transform synthesis.
Figure 3.2: Non-decimated wavelet synthesis.
The above analysis is expanded to the two-dimensional case. For a 2D discrete
signal f(·), the 2D non-decimated wavelet transform is given by
α_{ll,k+1}[x, y] = Σ_{n,m} h[n] h[m] α_{ll,k}[2^{k+1} m − x, 2^{k+1} n − y]
λ_{hl,k+1}[x, y] = Σ_{n,m} h[n] g[m] α_{ll,k}[2^{k+1} m − x, 2^{k+1} n − y]
λ_{lh,k+1}[x, y] = Σ_{n,m} g[n] h[m] α_{ll,k}[2^{k+1} m − x, 2^{k+1} n − y]
λ_{hh,k+1}[x, y] = Σ_{n,m} g[n] g[m] α_{ll,k}[2^{k+1} m − x, 2^{k+1} n − y],    (3.9)

where

α_{ll,−1}[x, y] = f(x, y).    (3.10)
The four coefficient sets given in Equation 3.9 are referred to as the low-low band, α_{ll,k+1}[·], the high-low band, λ_{hl,k+1}[·], the low-high band, λ_{lh,k+1}[·], and the high-high band, λ_{hh,k+1}[·]. The subbands are named for the order in which the scaling and/or the wavelet filters process the scaling coefficients.
For the synthesis of f(·) we have,
α_{ll,k}[x, y] = (1/4) Σ_{m,n} h[m] h[n] α_{ll,k+1}[x − 2^{k+1} m, y − 2^{k+1} n]
             + (1/4) Σ_{m,n} h[m] g[n] λ_{hl,k+1}[x − 2^{k+1} m, y − 2^{k+1} n]
             + (1/4) Σ_{m,n} g[m] h[n] λ_{lh,k+1}[x − 2^{k+1} m, y − 2^{k+1} n]
             + (1/4) Σ_{m,n} g[m] g[n] λ_{hh,k+1}[x − 2^{k+1} m, y − 2^{k+1} n].    (3.11)
Equation 3.9 is recursively computed to produce several levels of wavelet coefficients,
and reconstruction of the 2D signal, f(·), is accomplished by the recursive computa-
tion of Equation 3.11.
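The recursion above can be illustrated with a short sketch. The following is a minimal single-level, non-decimated 2D Haar transform (the wavelet ultimately selected in Equation 3.26) with circular boundary handling; the shift convention and band naming are simplified relative to Equations 3.9 and 3.11, so this is an illustrative sketch rather than the dissertation's implementation.

```python
import numpy as np

R2 = np.sqrt(2.0)

def nd_analyze(a, step, axis=0):
    """One non-decimated Haar level along `axis`: h = [1, 1]/sqrt(2),
    g = [-1, 1]/sqrt(2), dilated by `step`, circular boundaries."""
    shifted = np.roll(a, -step, axis=axis)        # a[n + step]
    return (a + shifted) / R2, (-a + shifted) / R2

def nd_synthesize(low, high, step, axis=0):
    """Inverse of nd_analyze; the 1/2 factor restores power as in Eq. 3.8."""
    direct = low - high                           # recovers sqrt(2) * a[n]
    shifted = np.roll(low + high, step, axis=axis)
    return (direct + shifted) / (2.0 * R2)

def nd_haar2d(f, step=1):
    """One 2D level (cf. Eq. 3.9): returns (ll, hl, lh, hh), all same size as f."""
    low, high = nd_analyze(f, step, axis=0)       # filter along one axis
    ll, hl = nd_analyze(low, step, axis=1)        # then along the other
    lh, hh = nd_analyze(high, step, axis=1)
    return ll, hl, lh, hh

def ind_haar2d(ll, hl, lh, hh, step=1):
    """Inverse 2D level (cf. Eq. 3.11); two 1/2 factors give the overall 1/4."""
    low = nd_synthesize(ll, hl, step, axis=1)
    high = nd_synthesize(lh, hh, step, axis=1)
    return nd_synthesize(low, high, step, axis=0)

rng = np.random.default_rng(0)
img = rng.standard_normal((16, 16))
bands = nd_haar2d(img)
print(np.allclose(ind_haar2d(*bands), img))       # perfect reconstruction: True
```

Each returned subband has the same size as the input, as the non-decimated construction requires; multiple levels follow by re-applying `nd_haar2d` to `ll` with `step` doubled.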
The non-decimated wavelet transform has many advantages in signal denoising over the traditional decimated case. First, each subband in the wavelet decomposition is equal in size, so it is more straightforward to find the spatial relationships between subbands. Second, the spatial resolution of each of the subbands is preserved by eliminating the downsample by two. Because of the elimination of the downsampler,
information contained in the wavelet coefficients is redundant, and this redundancy
is exploited to determine the coefficients comprised of noise and the coefficients com-
prised of feature information contained in the original image.
3.3 Retention of Feature-Supporting Wavelet Coefficients
One of the many advantages of the wavelet transform over other mathematical
transformations is the retention of the spatial relationship between pixels in the orig-
inal image by the coefficients in the wavelet domain. These spatial relationships
represent features of the image and should be retained as much as possible during
denoising. In general, images are comprised of regular features, and the resulting
wavelet transform of an image generates few, large, spatially contiguous coefficients
which are representative of the features given in the original image. We refer to the
spatial contiguity of the wavelet coefficients as spatial regularity.
The concept of spatial regularity serves a similar function to that of signal regularity in previous denoising approaches for selecting the wavelet coefficients. The key
difference is that spatial correlation of the features are represented by connectivity of
wavelet coefficients rather than statistical models such as Markov random sequences
[42, 54] or Holder exponents [42, 43, 54] in previous methods. These models are often
computationally complicated and still do not reflect the geometry of the features explicitly. As a result, the current method achieves better performance with a much
simpler computation.
Because of spatial regularity, the resulting subbands of the wavelet transform do
not generally contain isolated coefficients. This regularity can aid in deciding which
coefficients should be selected for reconstruction and which should be discarded for maximum reconstructed image quality. The proposed coefficient selection method, in which spatial regularity is exploited, is described as follows.
Let us assume that an image is corrupted with additive noise, i.e.

f̃(x, y) = f(x, y) + η(x, y),    (3.12)

where f(·) is the noiseless 2D signal, η(·) is a random noise function, and f̃(·) is the corrupted signal.
The first step for selecting the wavelet coefficient is to form a preliminary binary
label for each coefficient, which collectively form a binary map. The binary map is
then used to determine whether or not a particular wavelet coefficient is included in
a regular spatial feature. The wavelet transform of f̃(·) generates coefficients, λ_{·,k}[·], from Equations 3.9 and 3.10. λ_{·,k}[·] is used to create the preliminary binary map, I_{·,k}[·]:

I_{·,k}[x, y] = { 1, when |λ_{·,k}[x, y]| > τ; 0, else },    (3.13)
where τ is a threshold for selecting valid coefficients in the construction of the binary
coefficient map. A valid coefficient is defined as a coefficient, λ·,k[x, y], which results
in I·,k[x, y] = 1; hence the coefficient has been selected due to its magnitude. After
coefficients are selected by magnitude, spatial regularity is used to further examine
the role of the valid coefficient: whether it is isolated noise or part of a spatial feature.
The number of supporting binary values around a particular non-zero value I·,k[x, y]
is used to make the judgement. The support value, S·,k[x, y], is the sum of all I·,k[·]
which support the current binary value I·,k[x, y]; that is, the total number of all valid
coefficients which are spatially connected to I·,k[x, y].
A coefficient is spatially connected to another if there exists a continuous path of
valid coefficients between the two. Figure 3.3 gives a generic coefficient map. The valid
coefficients are highlighted in gray. From Figure 3.3 it can be shown that coefficients
A, B, C, and H do not support any other valid coefficients in the coefficient map.
However, coefficients D and F support each other, coefficients E and G support each
other, and N and O support each other. Also, coefficients I, J, K, L, M, P, Q, and R
all support one another. Figure 3.4 gives the value of S·,k[x, y] for each of the valid
coefficients given in Figure 3.3. A method of computing S·,k[x, y] is given in Appendix
A. S·,k[·] is used to refine the original binary map I·,k[·] by
J_{·,k}[x, y] = { 1, when S_{·,k}[x, y] > s, or J_{·,k+1}[x, y] I_{·,k}[x, y] = 1; 0, else },    (3.14)
Figure 3.3: Generic coefficient array.
where J·,k[·] is the refined binary map, and s is the necessary number of support
coefficients for selection. J·,·[·] is calculated recursively, starting from the highest
multiresolution level, and progressing downward.
Equation 3.14 is equal to one when there exist enough large-magnitude wavelet coefficients around the current coefficient. However, it is also equal to one when the magnitude of the coefficient is effectively large (I_{·,k}[·] = 1) but not locally supported, provided that the coefficient at the larger scale is large and locally supported (J_{·,k+1}[·] = 1). This criterion addresses the somewhat rare case in which a useful coefficient is not locally supported. In the general case, wavelet coefficients of images are clustered together and rarely isolated. In [43], wavelet coefficients are modified only by their evolution across scales. Regular signal features contain wavelet coefficients which increase with increasing scale. Thus, if
Figure 3.4: Generic coefficient array, with corresponding S·,k values.
there exists a useful coefficient which is isolated in an image, it is reasonable to expect that the coefficient at the same spatial location in the next larger scale will be sufficiently large and spatially supported. Thus, the coefficient selection method given by Equation 3.14 selects coefficients which are sufficiently large and locally supported, as well as isolated coefficients which are sufficiently large and supported by scale.
This type of scale-selection is consistent with the findings of Said and Pearlman
[62], who developed an image codec based on a ”spatial self-symmetry” between dif-
fering scales in wavelet transformed images. They discovered that most of an image’s
energy is concentrated in the low-frequency subbands of the wavelet transform. And
because of the self-symmetry properties of wavelet transformed images, if a coefficient
value is insignificant (i.e. of small value or zero), then it can be assumed that the
coefficients of higher spatial frequency and same spatial location are insignificant also.
In our application, however, we are looking for significance rather than insignificance,
so we look to the significance of lower frequency coefficients to determine significance
of the current coefficient. In this way, the preliminary binary map is refined by both
spatial and scalar support, as given by Equation 3.14.
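The two-threshold refinement just described can be sketched as follows. This illustration assumes 8-connectivity for the "continuous path of valid coefficients" and, matching Figure 3.4, counts every other valid coefficient in the same connected group as support; the function names and the connectivity choice are assumptions (the dissertation's actual computation of S is given in Appendix A).

```python
import numpy as np
from collections import deque

def support_map(I):
    """S[x, y]: number of other valid coefficients connected to [x, y] by a
    path of valid coefficients (8-connectivity assumed); 0 elsewhere."""
    H, W = I.shape
    S = np.zeros((H, W), dtype=int)
    seen = np.zeros((H, W), dtype=bool)
    for x in range(H):
        for y in range(W):
            if I[x, y] and not seen[x, y]:
                seen[x, y] = True
                comp, q = [(x, y)], deque([(x, y)])
                while q:                      # breadth-first component search
                    cx, cy = q.popleft()
                    for dx in (-1, 0, 1):
                        for dy in (-1, 0, 1):
                            nx, ny = cx + dx, cy + dy
                            if (0 <= nx < H and 0 <= ny < W
                                    and I[nx, ny] and not seen[nx, ny]):
                                seen[nx, ny] = True
                                comp.append((nx, ny))
                                q.append((nx, ny))
                for cx, cy in comp:           # each member is supported by the rest
                    S[cx, cy] = len(comp) - 1
    return S

def refine_map(lam, J_coarser, tau, s):
    """Eq. 3.13 (magnitude test) followed by Eq. 3.14 (spatial/scalar support)."""
    I = (np.abs(lam) > tau).astype(int)
    S = support_map(I)
    return ((S > s) | ((J_coarser * I) == 1)).astype(int)

lam = np.array([[5.0, 5.0, 0.0],
                [0.0, 0.0, 0.0],
                [0.0, 0.0, 5.0]])
# The isolated coefficient at (2, 2) is rejected despite its magnitude,
# unless the coarser-scale map J_coarser marks that position.
print(refine_map(lam, np.zeros((3, 3), dtype=int), tau=1.0, s=0))
```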
The final coefficients retained for reconstruction are given by
L_{·,k}[x, y] = { λ_{·,k}[x, y], when J_{·,k}[x, y] = 1; 0, else }.    (3.15)
The denoised image is reconstructed by using the supported coefficients, L_{·,k}[·], in the synthesis equation given in Equation 3.11. Thus,

α_{ll,k}[x, y] = (1/4) Σ_{m,n} h[m] h[n] α_{ll,k+1}[x − 2^{k+1} m, y − 2^{k+1} n]
             + (1/4) Σ_{m,n} h[m] g[n] L_{hl,k+1}[x − 2^{k+1} m, y − 2^{k+1} n]
             + (1/4) Σ_{m,n} g[m] h[n] L_{lh,k+1}[x − 2^{k+1} m, y − 2^{k+1} n]
             + (1/4) Σ_{m,n} g[m] g[n] L_{hh,k+1}[x − 2^{k+1} m, y − 2^{k+1} n].    (3.16)
Equation 3.16 is calculated recursively, producing scaling coefficients of finer resolution until k = −1. The denoised image, f̂(·), is then given by

f̂(x, y) = α_{ll,−1}[x, y].    (3.17)

α_{ll,k}[·] are the reconstructed scaling coefficients of scale k.
In general, natural and synthetic imagery can be compactly represented by a few wavelet coefficients of large magnitude, and these coefficients are in general spatially clustered. Thus, it is useful to base selection methods on magnitude and spatial regularity to distinguish between useful coefficients which are representative of the image and useless coefficients representative of noise. The two-threshold criterion for the rejection of noisy wavelet coefficients is a computationally simple, non-iterative test for magnitude and spatial regularity which can effectively distinguish between useful and useless coefficients.
3.4 Selection of Threshold τ and Support s
The selection of threshold τ and support s is a key component of the denoising
algorithm. Unfortunately, the two parameters cannot be easily determined for a given
corrupted image because there is no information about the decomposition between
the original signal and the noise. We derive τ and s using a set of test images which
serve as training samples. These training samples are artificially corrupted by noise.
The noise is then removed using a range of τ and s values. The pair of τ and s which generates the best results is selected for noise removal in general. This approach has its root in an idea called the oracle [15], which is described below.
An oracle is an entity which provides extra information to aid in the denoising
process. The extra information provided by the oracle is undoubtedly beneficial in providing substantially better denoising results than methods which are not furnished supplemental information. Thus, the coefficient selection method which uses the oracle's information is referred to as the optimal denoising method. With the optimal denoising method, the threshold and support can be selected using test images for which both the original image and the noise are known. The selected threshold and support functions can then be applied to any corrupted image without supplemental information.
An optimal coefficient selection process has been defined based on the original
(noiseless) image. The optimal binary map, J^{opt}_{·,k}[·], is given by

J^{opt}_{·,k}[x, y] = { 1, when |λ̄_{·,k}[x, y]| > σ_n; 0, else },    (3.18)

where λ̄_{·,k}[·] are the wavelet coefficients of the original (noiseless) image, f(·), and σ_n is the standard deviation of the noise in the corrupted image, f̃(·). Thus, the extra information given by the oracle is the set of noiseless wavelet coefficients, λ̄_{·,k}[·]. The coefficients of the original image are used in the coefficient selection process, but not in the image reconstruction. The coefficients which are used in the reconstruction, L^{opt}_{·,k}[·], are given by

L^{opt}_{·,k}[x, y] = { λ_{·,k}[x, y], when J^{opt}_{·,k}[x, y] = 1; 0, else },    (3.19)
where λ·,k[·] are the wavelet coefficients of the noisy image.
The optimal coefficient map is used to create the optimal denoised image which
is given by
α^{opt}_{ll,k}[x, y] = (1/4) Σ_m Σ_n h[m] h[n] α^{opt}_{ll,k+1}[x − 2^{k+1} m, y − 2^{k+1} n]
                + (1/4) Σ_m Σ_n h[m] g[n] L^{opt}_{hl,k+1}[x − 2^{k+1} m, y − 2^{k+1} n]
                + (1/4) Σ_m Σ_n g[m] h[n] L^{opt}_{lh,k+1}[x − 2^{k+1} m, y − 2^{k+1} n]
                + (1/4) Σ_m Σ_n g[m] g[n] L^{opt}_{hh,k+1}[x − 2^{k+1} m, y − 2^{k+1} n].    (3.20)
Equation 3.20 is recursively computed for lesser values of k until the optimal denoised image is achieved, where

f^{opt}(x, y) = α^{opt}_{ll,−1}[x, y].    (3.21)

α^{opt}_{ll,k}[·] are the optimal scaling coefficients, and f^{opt}(·) is the optimally denoised image.
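The oracle's selection rule (Equations 3.18 and 3.19) reduces to a few lines. The sketch below is an illustration with hypothetical names, where `lam_clean` stands for the noiseless coefficients supplied by the oracle and `lam_noisy` for the coefficients of the corrupted image.

```python
import numpy as np

def oracle_select(lam_clean, lam_noisy, sigma_n):
    """Eq. 3.18: mark positions where the CLEAN coefficient exceeds sigma_n.
    Eq. 3.19: retain the NOISY coefficient at those positions, zero elsewhere."""
    J_opt = (np.abs(lam_clean) > sigma_n).astype(int)
    L_opt = np.where(J_opt == 1, lam_noisy, 0.0)
    return J_opt, L_opt

J, L = oracle_select(np.array([0.4, -8.0, 3.0]),
                     np.array([1.1, -7.2, 2.5]), sigma_n=2.0)
print(J, L)   # J = [0, 1, 1]; L keeps only the last two noisy values
```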
Figure 3.5 gives the denoising results of the optimal denoising method when applied
to the ”Lenna” image corrupted with additive white Gaussian noise (AWGN). As
shown in Figure 3.5, the optimal denoising method is able to effectively remove the
noise from the ”Lenna” image because of the added information given by the oracle.
PSNR is calculated for performance measurement and is given by
PSNR = 20 log_{10}(255 / √mse),    (3.22)

where

mse = (1 / (W_f H_f)) Σ_x Σ_y (f(x, y) − f̂(x, y))².    (3.23)
Figure 3.5: Optimal denoising method applied to noisy ”Lenna” image. Left: Corrupted image f̃(x, y), σ_n = 50, PSNR = 14.16 dB. Right: Optimally denoised image f^{opt}(x, y), PSNR = 27.72 dB.
mse is the mean-squared error between the original image f(·) and the denoised image f̂(·), and W_f and H_f are the width and height of the image, respectively.
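Equations 3.22 and 3.23 translate directly into code; a minimal sketch for 8-bit images (peak signal value 255):

```python
import numpy as np

def psnr(f, f_hat):
    """PSNR in dB between an original image f and a processed image f_hat,
    per Eqs. 3.22-3.23 (peak signal value 255 for 8-bit images)."""
    mse = np.mean((np.asarray(f, float) - np.asarray(f_hat, float)) ** 2)
    return 20.0 * np.log10(255.0 / np.sqrt(mse))

f = np.zeros((4, 4))
print(psnr(f, f + 25.5))   # 255/25.5 = 10, so 20*log10(10) = 20.0 dB
```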
PSNR is the most popular quality metric among researchers in the image and
video processing community and has been used almost exclusively in the literature
for more than a decade. However, it is also well known in the community that PSNR is
not always consistent with the human perception of quality. That is, although image
processing method A is shown to give a higher PSNR than image processing method
B, people on average may tend to prefer the results of image processing method B.
Because of this inconsistency, recently there has been research conducted in the
development of new quality metrics which tend to give results that more closely follow human perception. A metric called QI (quality index) has been developed based
not on pixel error as in PSNR, but on loss of correlation, luminance distortion, and
contrast distortion [75]. This method is tested, and the results suggest that QI may
be a better means of quantitative quality measurement than PSNR.
Also, another metric has been developed which suggests even more consistent quality assessment than both QI and PSNR. The weighted frequency-domain normalized
mean-squared error (W-NMSE) quality metric is based upon wavelet coefficient er-
ror [19]. Results given in [19] suggest that W-NMSE gives results that are closer to
human perception than both PSNR and QI.
In addition to PSNR, QI, and W-NMSE, there are also a number of proprietary
quality metrics available for purchase. So, there is a choice to be made when eval-
uating the performance of an image processing algorithm. The choice made in this
dissertation is to use PSNR, and there is a reason for the decision. The methods of
[19, 75] are very new metrics developed only in the past few years. These metrics may
be substantially better metrics than PSNR, but they have not had time to impact
the literature published by the image and video processing communities. Because
the methods of [19, 75] are new, it is unclear how much of an improvement they
have over PSNR, and until these metrics become more well known and commonplace
among researchers they will not replace PSNR as the quality metric of choice. Also,
the results of methods given in this dissertation are compared to methods developed
previously whose results are given in the literature. These methods all use PSNR as
the performance metric, so we must use PSNR for consistency.
It is rather obvious that the optimal coefficient selection process is unattainable
when no supplemental information is provided by the oracle. Thus the optimal image
denoising method is not possible for practical implementation. However, the knowl-
edge obtained from the optimal binary map, J^{opt}_{·,k}[·], is used to compare with the refined coefficient map generated by the two-threshold criteria, J_{·,k}[·], described in Section
3.3. The coefficient selection method is based on the error between the optimal coef-
ficient subband and the subband generated by the two-threshold criteria. The error
is given by
Error = [ Σ_{p∈{hl,lh,hh}, k, x, y} (J^{opt}_{p,k}[x, y] ⊕ J_{p,k}[x, y]) λ²_{p,k}[x, y] ] / [ Σ_{p∈{hl,lh,hh}, k, x, y} λ²_{p,k}[x, y] ],    (3.24)
where ⊕ is the exclusive OR operation.
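Equation 3.24 can be sketched as follows, assuming each subband is supplied as a triple of same-shaped integer maps and coefficient arrays; the names are illustrative:

```python
import numpy as np

def selection_error(bands):
    """Eq. 3.24: energy-weighted disagreement (XOR) between the oracle map
    J_opt and the two-threshold map J, summed over all detail subbands.
    `bands` is a list of (J_opt, J, lam) triples, one per subband and scale."""
    num = sum(np.sum((J_opt ^ J) * lam ** 2) for J_opt, J, lam in bands)
    den = sum(np.sum(lam ** 2) for _, _, lam in bands)
    return num / den

J_opt = np.array([1, 0, 1])
J     = np.array([1, 1, 0])
lam   = np.array([2.0, 1.0, 1.0])
print(selection_error([(J_opt, J, lam)]))   # (0*4 + 1*1 + 1*1) / 6 = 1/3
```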
In the proposed coefficient selection algorithm, we use a training sample approach.
The approach starts with a series of test images serving as training samples to derive
the functions which determine the optimal set of values for τ and s as well as the type
of wavelet used for denoising. Theoretically, we may represent each training sample
as a vector V_i, i = 1, ..., n. Those training samples should span a space which includes
many similar images corrupted by noise:
S = Span{V_i; i = 1, ..., n}.    (3.25)
The original data and the statistical distribution of the noise are given for each of
the training samples which are corrupted. The optimal set of parameters can then
be determined for the training samples using the approach described earlier. Ideally,
the space spanned by the training samples contains the types of corrupted images
which are to be denoised. As a result, the same set can generate an optimal or close
to optimal performance for the corrupted images of same type. It is clear that more
training samples will generate parameters suitable for more types of images, while
a space of fewer training samples is suitable for fewer types of images. In the
following, we will use some examples to illustrate this approach. The test images
Figure 3.6: Test images.
are all 256x256 pixels. Shown in Figure 3.6, each of the training sample images is
well known in the image processing community, and collectively the set represents as many
types of images as possible. Starting from the upper-left image and going clockwise,
the images are ”Lenna”, ”Airplane”, ”Fruits”, and ”Girl”. In this way, the τ and s
obtained will likely perform well in most cases.
A test is used to demonstrate the effectiveness of different wavelets in denoising.
First, each of the four test images is corrupted with AWGN at various levels. Next,
the 2D non-decimated wavelet transform, given in Section 3.2, is calculated using
several different wavelets. The wavelet coefficients are then hard thresholded using a
threshold T ranging from 0 to 150, and the inverse wavelet transform is applied to the
thresholded coefficients. The wavelet which gives the reconstructed images with the
highest average PSNR is chosen to be used in the general case.
Several wavelets were used in the testing. However, for simplicity only five are
presented. We have chosen the Daubechies wavelets [12] (Daub4 and Daub8) for
their smoothness properties, the spline wavelets (first order and quadratic spline) [6]
because of their use in the previous works of [42, 43, 54], and the Haar wavelet because
of its simplicity and compact support. The results are given in Figure 3.7. Based on the testing results given in Figure 3.7, the Haar wavelet is selected for image denoising:

h[n] = { 1/√2, when n = 0, 1; 0, else }    g[n] = { −1/√2, when n = 0; 1/√2, when n = 1; 0, else }.    (3.26)
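As a quick check, the Haar filter taps in Equation 3.26 have unit energy and are mutually orthogonal, the property behind the power-preserving 1/2 factor in Equation 3.8. A small sketch, not part of the dissertation:

```python
import numpy as np

h = np.array([1.0, 1.0]) / np.sqrt(2.0)    # low-pass taps, Eq. 3.26
g = np.array([-1.0, 1.0]) / np.sqrt(2.0)   # high-pass taps, Eq. 3.26

# Unit energy and orthogonality of the analysis pair.
print(np.isclose(h @ h, 1.0), np.isclose(g @ g, 1.0), np.isclose(h @ g, 0.0))
# prints: True True True
```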
Testing has shown the Haar wavelet to be the most promising in providing the highest
reconstructed image quality. The compact support of the Haar wavelet enables each wavelet coefficient to represent the smallest number of original pixels in comparison
with other types of wavelets. Therefore, when a coefficient is removed because of its
insignificance or isolation, the result affects the smallest area of the original image in the reconstruction, which reduces the impact on image quality even if a removed coefficient is not comprised solely of noise.
The Haar wavelet is used in a non-decimated wavelet decomposition of the original
image. Three subband levels are used, i.e. k = −1 to 2. The proposed selective
Figure 3.7: Average PSNR values using different wavelets. Each panel plots PSNR (dB) versus threshold T (0 to 150) for a different noise level (σ_n = 10, 20, 30, and 40); the wavelets compared are the Haar, first-order spline, quadratic spline, Daubechies 4, and Daubechies 8 wavelets.
wavelet shrinkage algorithm is applied to all wavelet subbands, and the subbands are
synthesized by the non-decimated inverse wavelet transform.
Testing for the optimal values of τ and s is accomplished by artificially adding
Gaussian noise to each of the four images, denoising all four images with a particular τ
and s, and recording the average error given by Equation 3.24. Then, the combination
of τ and s which gives the lowest error is the choice for that particular noise level.
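The search just described is an exhaustive grid minimization; a minimal sketch, where `avg_error` is assumed to return the average of Equation 3.24 over the four training images at one noise level:

```python
def best_parameters(avg_error, taus, supports):
    """Evaluate avg_error(tau, s) over the full (tau, s) grid and return the
    pair that minimizes it, as in Section 3.4."""
    return min(((t, s) for t in taus for s in supports),
               key=lambda p: avg_error(*p))

# Toy stand-in error surface with a known minimum at (63, 10).
toy = lambda t, s: (t - 63) ** 2 + (s - 10) ** 2
print(best_parameters(toy, range(0, 151), range(0, 21)))   # (63, 10)
```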
The average error is recorded when denoising each of the four test images given
in Figure 3.6 using τ ranging from 0 to 150 and s ranging from 0 to 20. The proposed algorithm is tested by applying AWGN with a standard deviation (σ_n) of 10, 20, 30, 40, and 50 to each of the test images. The proposed method of selective wavelet shrinkage is applied to the corrupted image, and the resulting error is recorded using Equation 3.24. The results of the testing for σ_n = 30 are given in Figure 3.8.
Figure 3.8: Error results for test images, σn = 30. (Surface plot of percent error
versus threshold value τ, ranging from 0 to 150, and spatial support s, ranging
from 0 to 20.)
Table 3.1 gives the τ and s which provide the lowest average error for each noise
level tested. These particular values are referred to as τm(·) and sm(·). Table 3.1
Noise Level (σn)    10      20      30      40      50
Min. Avg. Error     3E-4    11E-4   24E-4   42E-4   64E-4
sm value            5       9       10      15      14
τm value            23      43      63      85      108

Table 3.1: Minimum average error of test images for various noise levels and their
corresponding threshold and support values.
suggests that parameters τm(·) and sm(·) are functions of the standard deviation of
the noise, σn.
Because τm(·) and sm(·) generally increase with an increase in additive noise as
shown in Table 3.1, both parameters can be modeled as functions of the additive
noise, σn. Then, knowing the level of noise corruption, the threshold levels which
produce the minimum average error may be obtained by estimating the τm(·) and
sm(·) functions. The five noise levels provided in the test are used as sampling points
for the estimation of the continuous functions τm(·) and sm(·). With enough sampling
points both τm(·) and sm(·) can be effectively estimated, and the correct τ and s can
be calculated to denoise an image with any level of noise corruption, given that the
noise level is known.
The estimated functions of the sampled values τm(·) and sm(·) are referred to as
τ̂m(·) and ŝm(·), respectively. Once the estimated functions are calculated they are
used in the general case. Thus, given an image corrupted with noise, it is denoised with
no prior knowledge by estimating the level of noise corruption, calculating the proper
thresholds using the τ̂m(·) and ŝm(·) functions, and using the calculated threshold
levels in the denoising process given in Section 3.3.
3.5 Estimation of Parameter Values
It can be seen from the values given in Table 3.1 that the parameters τm(·) and
sm(·) are functions of σn; therefore, we need to estimate both the standard deviation
of the noise and the functions themselves. These two topics are discussed in this section.
3.5.1 Noise Estimation
The level of noise in a given digital image is unknown and must be estimated from
the noisy image data. Several well known algorithms have been given in the literature
to estimate image noise. In [16, 54], the median of the λhh,0[·] subband is used in
the estimation process. The median noise estimation method of [54] is used in our
algorithm:
σ̂n = Median(|λhh,0[·]|) / 0.6745,   (3.27)
where λhh,0[·] are the noisy wavelet coefficients in the high-high band of the 0th scale.
Because the vast majority of useful information in the wavelet domain is confined
to few and large coefficients, the median can effectively estimate the level of noise
(i.e. the average level of the useless coefficients) without being adversely influenced
by useful coefficients.
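To make the estimator concrete, a minimal numpy sketch of Equation 3.27 follows. Note the simplifying assumption: the high-high band is approximated here by a one-level decimated Haar HH subband, whereas the dissertation uses the non-decimated transform; the estimator itself has the same median/0.6745 form either way.

```python
import numpy as np

def estimate_noise_sigma(img):
    """Median-based noise estimate (cf. Equation 3.27).

    A minimal sketch: the high-high band is approximated by a one-level
    decimated Haar HH subband (the dissertation uses the non-decimated
    transform, but the estimator has the same form).
    """
    img = np.asarray(img, dtype=float)
    # Haar HH band: difference along rows, then along columns (scaled by 1/2),
    # so pure Gaussian noise of std sigma yields HH coefficients of std sigma.
    hh = (img[0::2, 0::2] - img[0::2, 1::2]
          - img[1::2, 0::2] + img[1::2, 1::2]) / 2.0
    return np.median(np.abs(hh)) / 0.6745
```

The division by 0.6745 converts the median absolute deviation of a zero-mean Gaussian into its standard deviation, which is why the median is insensitive to the few large signal coefficients.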
3.5.2 Parameter Estimation
Using the known level of noise added to the original images, the values of τm(·)
and sm(·), given in Table 3.1, are estimated. One of the simplest and most popular
estimation procedures is the LMMSE (Linear Minimum Mean Squared Error) method,
and it is used here [68]. That is, two parameters aτ and bτ
are found such that

τ̂m(σn) = aτ σn + bτ.   (3.28)

The choice of aτ and bτ will minimize the mean squared error. Similarly, an estimate
of sm, which must be an integer, is found as:

ŝm(σn) = ⌊as σn + bs⌋.   (3.29)
The parameters which minimize the mean squared error are: aτ = 2.12, bτ = 0.80,
as = 0.26, and bs = 2.81.
The LMMSE estimation procedure gives a simple description of the τm and sm
functions; only two values (a and b) are needed to determine the proper thresholds
for denoising. The LMMSE estimator is also shown to be a good fit to the test data
in Figure 3.9, which plots the values of τm(·) and sm(·) along with their corresponding
LMMSE estimates. The LMMSE estimate functions are the best linear fit to the data.
Note that the support value sm must be an integer.
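The linear fit for the threshold parameters can be reproduced directly from the values in Table 3.1. The sketch below uses a plain least-squares line (numpy's `polyfit`) on the τm samples; it recovers aτ = 2.12 and bτ = 0.80. (For sm, the dissertation's reported as and bs differ slightly from a plain linear fit, presumably because the floor operation in Equation 3.29 is part of the error being minimized.)

```python
import numpy as np

# Sampled minimum-error parameters from Table 3.1.
sigma_n = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
tau_m = np.array([23.0, 43.0, 63.0, 85.0, 108.0])

# Least-squares linear fit: tau_m(sigma) ~ a_tau * sigma + b_tau (Eq. 3.28).
a_tau, b_tau = np.polyfit(sigma_n, tau_m, deg=1)
print(round(a_tau, 2), round(b_tau, 2))  # 2.12 0.8
```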
The threshold τ and the support value s are determined by using the estimate of
the noise given by Equation 3.27. The two thresholds are given by
τ = aτ σ̂n + bτ,
s = ⌊as σ̂n + bs⌋.   (3.30)
Using this information, a new image denoising algorithm is formalized. With a given
image, the noise level is estimated by Equation 3.27, τ and s are then calculated using
Equation 3.30, and the image is denoised by the method given in Section 3.3.
Figure 3.9: τm(·), sm(·) and their corresponding estimates, τ̂m(·), ŝm(·). (Two
panels: the threshold value τ and the local support value s versus noise level σ,
showing the minimum-error values and their LMMSE estimates.)
3.6 Experimental Results
The "Peppers" and "House" images are used for gauging the performance of the
proposed denoising algorithm. These two images have also been used in the results of
[42, 43, 54]. Therefore, the proposed algorithm's performance is compared with the
performance of other recent algorithms given in the literature. Both the "Peppers"
and "House" images are corrupted with AWGN and the proposed method is used for
denoising. The results are given in Figures 3.10 and 3.11.
"Peppers" image
Input PSNR                     22.6    19.6    16.6    13.6    Average
Proposed Algorithm             31.00   28.98   27.17   25.46   28.15
Pizurica 3-band [54]           30.20   28.60   27.00   25.20   27.75
Pizurica 2-band [54]           29.90   28.20   26.60   24.90   27.40
Malfait and Roose [42]         28.60   27.30   26.00   24.60   26.63
Mallat and Hwang [43]          28.20   27.30   27.10   24.60   26.80
Matlab's Sp. Adaptive Wiener   29.00   27.10   25.30   23.30   26.18

"House" image
Input PSNR                     23.9    20.9    17.9    14.9    Average
Proposed Algorithm             33.09   31.55   29.81   28.34   30.70
Pizurica 3-band [54]           32.80   31.30   29.80   28.30   30.55
Pizurica 2-band [54]           32.10   30.50   29.30   28.10   30.00
Malfait and Roose [42]         32.90   31.30   29.80   28.20   30.55
Mallat and Hwang [43]          31.30   30.50   29.10   27.10   29.50
Matlab's Sp. Adaptive Wiener   30.30   28.60   26.70   24.90   27.63

Table 3.2: PSNR comparison of the proposed method to other methods given in the
literature (results given in dB).
Table 3.2 gives the results of the proposed method, as well as the results of
[42, 43, 54]. Note that the methods of [42, 43, 54] all use the quadratic spline wavelet
[6] in three subband levels, and each of the algorithms’ coefficient selection method
is based on a probabilistic formulation to determine how much a particular coeffi-
cient contributes to the overall image quality. The proposed algorithm uses the Haar
wavelet, given in Equation 4.4, in three subband levels, and the coefficient selection
process is based on a geometrical approach. As shown in Table 3.2, the results of the
proposed method are an improvement over other methods described in the literature.
In addition to improved performance, the proposed algorithm is computationally
simple, facilitating real-world applications. The proposed algorithm has been run
on older processors for an accurate comparison, and the computation time of the
Processor                  Pentium IV   Pentium III   IBM RS6000/320H
Proposed Algorithm         0.66         1.14          ***
Pizurica 3-band [54]       ***          45.00         ***
Pizurica 2-band [54]       ***          30.00         ***
Malfait and Roose [42]     ***          ***           180.00

*** Computation time not evaluated

Table 3.3: Computation times for a 256x256 image, in seconds.
proposed method is an order of magnitude less than the previous method of highest
performance, [54]. Table 3.3 gives the computational results of the proposed method
as well as the results of [42, 54].
The proposed algorithm shows a substantial drop in computation time. Both
[42] and [54] use iterative computation in the selection of wavelet coefficients for
reconstruction, which requires unreasonable computation time for certain applications.
The current two-threshold technique is a simpler, non-iterative coefficient selection
method which also produces better denoising results.
In addition to obtaining a higher signal-to-noise ratio than established image de-
noising algorithms, the proposed denoising algorithm facilitates image compression
when used as a pre-processing step. That is, the image is first denoised using the
proposed method, then compressed by 2D wavelet compression. The "Peppers" image is
compressed with various quantization step sizes, both with and without the proposed
denoising algorithm. Figure 3.12 gives the compression results.
As given in Figure 3.12, regardless of the quantization step, applying the proposed
denoising algorithm prior to compression improves the compression ratio. However,
pre-processing is most beneficial when the step size is small. This is not surprising,
Image                 Step Size   Without Denoising       With Denoising
Lenna (512x512)       2           4.82:1, 159.2 kbytes    7.15:1, 107.4 kbytes
Fruits (512x512)      4           8.66:1, 88.7 kbytes     10.92:1, 70.4 kbytes
Barb (512x512)        8           11.67:1, 65.8 kbytes    12.98:1, 59.19 kbytes
Goldhill (512x512)    16          24.56:1, 31.3 kbytes    28.30:1, 27.14 kbytes
Peppers (512x512)     32          49.28:1, 15.6 kbytes    51.19:1, 15.0 kbytes

Table 3.4: Compression ratios of 2D wavelet compression both with and without
denoising applied as a pre-processing step.
however. When a large step size is applied to the wavelet transform subbands, much
of the noise inherent in the image as well as much image content is removed, thus
increasing the compression ratio. However, when a small step size is applied, much of
the inherent noise is included in the compressed image, decreasing the compression
ratio.
Table 3.4 gives the results of 2D wavelet compression of various images both with
and without the denoising algorithm applied as a pre-processing step. As shown in
Table 3.4, when the denoising algorithm is applied to the image prior to compression,
the 2D wavelet compression algorithm achieves better performance. However, the
performance improvement is greater with a smaller quantization step size.
3.7 Discussion
A new selective wavelet shrinkage algorithm for image denoising has been
described. The proposed algorithm uses a two-threshold support criterion which
investigates coefficient magnitude, spatial support, and support across scales in the
coefficient selection process. In general, images can be accurately represented by a few
large wavelet coefficients, and those few coefficients are spatially clustered together.
The two-threshold criterion is an efficient and effective way of using the magnitude and
spatial regularity of wavelet coefficients to distinguish useful from useless coefficients.
Furthermore, the two-threshold criterion is a non-iterative approach to selective wavelet
shrinkage, providing a computationally simple solution that facilitates real-time image
processing applications.
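Purely as an illustration, a simplified magnitude-plus-spatial-support selection rule can be sketched as follows. This is a hypothetical reading, not the dissertation's exact criterion: Section 3.3 defines the actual rule (including support across scales), and the 3x3 neighbourhood count below is a stand-in.

```python
import numpy as np

def two_threshold_select(coeffs, tau, s):
    """Simplified two-threshold selection sketch (illustrative only).

    Keep a wavelet coefficient if its magnitude exceeds tau AND at least
    s other significant coefficients lie in its 3x3 neighbourhood -- a
    stand-in for the spatial-support test of Section 3.3.
    """
    sig = np.abs(coeffs) > tau                     # magnitude threshold
    # Count significant neighbours in a 3x3 window (excluding the centre);
    # the zero padding gives an empty neighbourhood beyond the borders.
    padded = np.pad(sig.astype(int), 1)
    support = sum(np.roll(np.roll(padded, dy, 0), dx, 1)[1:-1, 1:-1]
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                  if (dy, dx) != (0, 0))
    keep = sig & (support >= s)
    return np.where(keep, coeffs, 0.0)
```

An isolated large coefficient (support 0) is discarded as likely noise, while clustered large coefficients survive, which is the spatial-regularity intuition described above.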
The values of the two-thresholds are determined by minimizing the error between
the coefficients selected by the two-thresholds and the coefficients selected by a de-
noising method which uses supplemental information provided by an oracle. The
supplemental information provided by the oracle is useful in determining the cor-
rect coefficients to select, and the denoising performance is substantially greater than
methods which do not use the supplemental information. Thus, the method which
uses the supplemental information provided by the oracle is referred to as the opti-
mal denoising method. Therefore, by minimizing the error between the two-threshold
method and the optimal denoising method, the two-threshold method can come as
close as possible to the performance of the optimal denoising method.
Consequently, the two-threshold method of selective wavelet shrinkage provides an
image denoising algorithm which outperforms previous image denoising methods given
in the literature in both denoised image quality and computation time. The light
computational burden of the proposed denoising method makes it suitable for real-time
image processing applications.
Figure 3.10: Results of the proposed image denoising algorithm. Top left: Original
"Peppers" image. Top right: Corrupted image, σn = 37.75, PSNR = 16.60 dB.
Bottom: Denoised image using the proposed method, PSNR = 27.17 dB.
Figure 3.11: Results of the proposed image denoising algorithm. Top left: Original
"House" image. Top right: Corrupted image, σn = 32.47, PSNR = 17.90 dB. Bottom:
Denoised image using the proposed method, PSNR = 29.81 dB.
Figure 3.12: Wavelet-based compression results with and without pre-processing.
(Compressed file size in kbytes of the "Peppers" image versus quantization step size,
for 2-D wavelet compression with and without denoising as a pre-processing step.)
CHAPTER 4

Combined Spatial and Temporal Domain Wavelet Shrinkage Algorithm for Video Denoising
4.1 Introduction
As shown in the introduction of Chapter 3, the process of removing noise in digital
images has been studied extensively [15, 17, 18, 20, 26, 27, 29, 31, 39, 41, 42, 43, 46,
49, 53, 54, 61, 64, 69, 73, 77, 76, 83]. However, until recently, the removal of noise
in video signals has not been studied seriously. Cocchia et al. developed a
three-dimensional rational filter for noise removal in video signals [10]. The 3D rational
filter is able to remove noise while preserving important edge information. The 3D
rational filter also uses a motion estimation technique: where no motion is detected,
the filter is applied in the temporal domain; otherwise, only spatial domain processing
is applied.
Later, Zlokolica et al. used two new techniques for noise removal in image
sequences [83]; both show improved results over the method of [10]. The first method
is the alpha-trimmed mean filter of [4] extended to video signals,
and the second is the K nearest neighbors (KNN) filter. Both alpha-trimmed and
KNN denoising methods are based on ordering the pixel values in the neighborhood
of the location to be filtered, and averaging a portion of those spatially contiguous
pixels. Each of these methods attempts to average values which are close in value,
and avoid averaging values which are largely dissimilar in value. Thus, the image
sequence is smoothed without blurring edges.
However, because of the success of the wavelet transform over other mathematical
tools in denoising images, some researchers believe that wavelets may be successful
in the removal of noise in video signals as well. Pizurica et al. use a wavelet-based
image denoising method to remove noise from each individual frame in an
image sequence, then applies a temporal filtering process for temporal domain noise
removal [55]. The combination of wavelet image denoising and temporal filtering
outperforms both wavelet based image denoising techniques [42, 43, 54] and spatial-
temporal filtering techniques [4, 10, 83].
The temporal domain filtering technique described in [55] is a linear IIR filter
which will continue to filter until it reaches a large temporal discontinuity. It will not
filter the locations of large temporal discontinuity where the absolute difference in
neighboring pixel values is greater than a threshold, T , thus preserving motion while
removing noise.
Although temporal processing improves upon the quality of the original image
denoising method, the best value of the parameter T varies across video signals.
That is, T may be large in sequences where there is little motion, i.e., where there
is more redundancy between consecutive frames; a large T exploits this redundancy
to improve video quality. However, in image sequences containing a large amount of
motion, consecutive frames are more independent and there is little to no redundancy
to exploit. Thus, the parameter T must be small to achieve optimal performance.
In the case of video denoising, it has been fairly well documented that the amount
of noise removal achievable from temporal domain processing, while preserving overall
quality, is dependent on the amount of motion in the original video signal [10, 55].
Thus, a robust, high-quality video denoising algorithm is required to not only be
scalable to differing levels of noise corruption, but also scalable to differing amounts
of motion in the original signal. Unfortunately, this principle has not been seriously
considered in video denoising.
In this chapter, we develop a noise removal algorithm for video signals. This algo-
rithm uses selective wavelet shrinkage in all three dimensions of the image sequence
and proves to outperform the few video denoising algorithms given in the relevant
literature in terms of PSNR. First, the individual frames of the sequence are denoised
by the method described in Chapter 3, then a new selective wavelet shrinkage method
is used for temporal domain processing.
Also, a motion estimation algorithm is developed to determine the amount of
temporal domain processing to be performed. Several motion estimators have been
proposed [10, 55], but few are robust to noise corruption. The proposed motion esti-
mation algorithm is robust to noise corruption and an improvement over the motion
estimation method of [10]. The proposed denoising algorithm, including the proposed
motion estimation method, is experimentally determined to be an improvement over
the methods of [10, 55, 83].
Following the Introduction, Section 4.2 describes the temporal domain wavelet
shrinkage method and explores the proper order of temporal and spatial domain
processing functions. Section 4.3 provides the proposed motion estimation index
used in the temporal domain processing and compares it with the motion estimation
method of [10]. Section 4.4 develops the parameters for temporal domain processing,
and Section 4.5 gives the experimental results of the proposed method as well as other
established methods. Section 4.6 gives the discussion.
4.2 Temporal Denoising and Order of Operations
In this section, we develop the principal algorithm for video denoising. Additional
mechanisms required by this algorithm will be discussed in later sections.
4.2.1 Temporal Domain Denoising
Let us define f^z_l as the pixel at spatial location l and frame z in a given image
sequence. The non-decimated wavelet transform applied in the temporal domain is
given by:

λ^{3D}_{k+1}[l, z] = ∑_p g[p] α^{3D}_k[l, 2^{k+1}p − z],   (4.1)

and

α^{3D}_{k+1}[l, z] = ∑_p h[p] α^{3D}_k[l, 2^{k+1}p − z],   (4.2)

where

α^{3D}_{−1}[l, z] = f^z_l.   (4.3)

λ^{3D}_k[l, z] is the high-frequency wavelet coefficient at spatial location l, frame z,
and scale k. Also, α^{3D}_k[l, z] is the low-frequency scaling coefficient at spatial
location l, frame z, and scale k. Thus, multiple resolutions of wavelet coefficients
may be generated by iterative calculation of Equations 4.1 and 4.2.
The wavelet function used in the temporal domain denoising process is the Haar
wavelet, given by

h[n] = { 1/√2,  when n = 0, 1
       { 0,     else,

g[n] = { −1/√2, when n = 0
       { 1/√2,  when n = 1
       { 0,     else.          (4.4)

The decision to use the Haar wavelet is based on experimentation with several other
wavelet functions, which found the best results with the Haar. The compact support
of the Haar wavelet makes it a suitable function for denoising applications. Because
of its compact support, the Haar coefficients represent the least number of original
pixels in comparison to other types of wavelets. Thus, when a coefficient is removed
because of its insignificance, the result affects the smallest area of the original signal
in the reconstruction.
Significant wavelet coefficients are selected by their magnitude with a threshold
operation:

L^{3D}_k[l, z] = { λ^{3D}_k[l, z],  when |λ^{3D}_k[l, z]| > τ_z[l],
                 { 0,               else,                            (4.5)
where L^{3D}_k[·] are the thresholded wavelet coefficients used in signal reconstruction,
and τ_z[·] is the threshold value. The resulting denoised video signal is computed via
the inverse non-decimated wavelet transform

α^{3D}_k[l, z] = (1/2) ∑_p h[p] α^{3D}_{k+1}[l, z − 2^{k+1}p]
               + (1/2) ∑_p g[p] L^{3D}_{k+1}[l, z − 2^{k+1}p],   (4.6)

which leads to

f^{z,3D}_l = α^{3D}_{−1}[l, z].   (4.7)

f^{z,3D}_l is the temporally denoised video signal.
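The one-level temporal Haar step can be sketched as below. This is a simplified illustration: it uses a standard shift-invariant Haar pair with periodic boundaries along the frame axis and a single, spatially constant threshold, rather than the dissertation's multi-scale transform and spatially varying τ_z[l].

```python
import numpy as np

def haar_analysis(x):
    """One level of a non-decimated Haar transform along the time axis.

    x: array of shape (frames, ...) -- a pixel trajectory or whole frames.
    Returns (approx, detail), each the same shape as x (periodic boundary).
    """
    xn = np.roll(x, -1, axis=0)           # x[z+1]
    approx = (x + xn) / np.sqrt(2.0)
    detail = (xn - x) / np.sqrt(2.0)
    return approx, detail

def haar_synthesis(approx, detail):
    """Inverse of haar_analysis; averages the two redundant reconstructions,
    mirroring the 1/2 factors of Equation 4.6."""
    ap = np.roll(approx, 1, axis=0)       # a[z-1]
    dp = np.roll(detail, 1, axis=0)       # d[z-1]
    return ((approx - detail) + (ap + dp)) / (2.0 * np.sqrt(2.0))

def temporal_denoise(x, tau):
    """Hard-threshold the temporal detail coefficients (cf. Equation 4.5)."""
    a, d = haar_analysis(x)
    d = np.where(np.abs(d) > tau, d, 0.0)
    return haar_synthesis(a, d)
```

With tau = 0 the pair reconstructs the input exactly, which is a useful sanity check on the analysis/synthesis factors.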
4.2.2 Order of Operations
With a spatial denoising technique and a temporal denoising technique established
in Chapter 3 and above, respectively, there still remains the question of the order of
operations. The highest quality may occur with temporal domain denoising followed
by spatial domain (TFS) denoising, or spatial denoising followed by temporal (SFT)
denoising.
Theoretically, it is not possible to prove which order is better because the exact
description of the noise is not known. However, it is our hypothesis that SFT
denoising can more aptly separate noise from signal information. The reasoning
behind this hypothesis is that removing noise in the spatial domain is a well understood
process, and any noise removal prior to temporal domain processing helps in
discriminating between the residual noise and motion in the image sequence.
This hypothesis is validated heuristically.
Thus, a test is conducted using two video signals. The first video signal is one
which contains little motion, and the other contains a great deal of motion. The
selected image sequences are the "CLAIRE" sequence from frames #104-167 and the
"FOOTBALL" sequence from frames #33-96.
Both of the image sequences are denoised with τ and τz ranging from 0 to 30 for
both TFS and SFT denoising operations. Note that in the test, τz is a single,
spatially independent value, unlike the temporal threshold τz[·] used in the final
denoising algorithm, which depends on spatial position and is given in Equation 4.5.
Also, the s parameter for feature selection in the image denoising method described
in Section 3.3 is calculated by taking Equation 3.30 and solving for s. The parameter
s is given by:

s = ⌊(as/aτ)(τ − bτ) + bs⌋.   (4.8)
Also, the number of resolutions of the non-decimated wavelet transform used in both
the spatial and temporal denoising methods is k = 1...5. The average PSNR of each
trial is recorded. The PSNR of an image is given by Equation 3.22.
Figure 4.1 gives the results of testing. As shown in Figure 4.1, the highest
average PSNR is achieved by SFT denoising: first spatially denoising each frame of
the sequence, followed by temporal domain denoising. Thus, in the proposed
denoising method, spatial domain denoising always occurs prior to temporal domain
denoising.
In addition to a higher average PSNR, there is another benefit to SFT denoising.
The level of motion in an image sequence is known to be crucial in determining the
amount of noise reduction possible from temporal domain processing, and a motion
index calculation is inevitably done by comparing consecutive frames to one another.
Thus, let us define a noisy image sequence where f̃^z_l is a corrupted pixel at spatial
position l and frame z, defined by

f̃^z_l = f^z_l + η^z_l,   (4.9)

where f^z_l is the noiseless pixel value and η^z_l is the noise function. We can compare
consecutive frames by taking their difference, as in [10, 55], to find

f̃^z_l − f̃^{z+1}_l = ∆f^z_l + ∆η^z_l.   (4.10)

Thus, by taking the difference between frames to find the level of motion, one noise
function is subtracted from another, in effect doubling the noise variance [68].
Therefore, by applying spatial denoising prior to motion index calculation we can
reduce the value of ∆η^z_l and provide a more precise calculation of the motion given
in the image sequence.
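The variance-doubling effect of frame differencing is easy to check numerically. The sketch below (illustrative only) compares the variance of the difference of two independent noise fields against the single-field variance:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 10.0

# Two independent noise fields, as in consecutive frames of a static scene.
eta_z = rng.normal(0.0, sigma, 100_000)
eta_z1 = rng.normal(0.0, sigma, 100_000)

# For independent noise, Var(eta_z - eta_z1) = 2 * sigma^2.
ratio = np.var(eta_z - eta_z1) / sigma**2
print(round(ratio, 1))  # ~2.0
```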
4.3 Proposed Motion Index
A motion index is important to the success of a video denoising method in order
to discriminate between large temporal variances in the video signal which are caused
by noise and large temporal variances which are caused by motion in the original
(noiseless) signal. A motion index aids temporal denoising algorithms in
eliminating the large temporal variances caused by noise while preserving the temporal
variances caused by motion in the original image sequence, creating a higher quality
video signal. That is, the motion index is used to determine τz[·].
4.3.1 Motion Index Calculation
Several works have developed a motion estimation index to determine the amount
of temporal domain processing to perform, i.e., the amount of information that can
be removed from the original signal to improve the overall quality [10, 55]. However,
neither of these proposed indices is robust to noise corruption, which is an important
feature of a motion index. A motion index must possess a few characteristics. First,
a motion index should be a localized value, because the amount of motion may vary
in different spatial portions of an image sequence; the motion index should be able
to identify those differences. Second, a motion index needs to be unaffected by the
amount of noise corruption in a given video signal, so that it can determine the
proper amount of temporal domain processing.
Thus, a localized motion index is developed which is relatively unaffected by the
level of noise corruption in the original image sequence. A spatially averaged temporal
standard deviation (SATSD) is used as the index of motion. Spatial averaging is used
to remove the noise inherent in the signal, and the temporal standard deviation is
used to detect the amount of activity in the temporal domain.
Let us define f^{z,2D}_l as the pixel value at spatial location l of the zth frame of an
image sequence already processed by the 2D denoising method given in Chapter 3.
The spatial averaging of the spatially denoised signal is given by

A^z_l = (1/B²) ∑_{i∈I} f^{z,2D}_i,   (4.11)

where I is the set of spatial locations which form a square area centered around spatial
location l, and B² is the number of spatial locations contained in I; typically, B = 15.
The value of B must be odd to allow the square area to sit centrally around spatial
location l. This average is used to find the standard deviation in the temporal domain:
µ_l = (1/F) ∑_{i=1}^{F} A^i_l,   (4.12)

and

M_l = sqrt( (1/F) ∑_{i=1}^{F} (A^i_l − µ_l)² ).   (4.13)

M_l is the localized motion index, F is the number of frames in the image sequence,
and µ_l is the temporal mean of the spatial average at location l.
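Equations 4.11 through 4.13 can be sketched together in a few lines of numpy. Note the assumptions: the box average here uses zero padding at the image borders (the dissertation does not specify its boundary handling), and the population standard deviation matches the 1/F normalization of Equation 4.13.

```python
import numpy as np

def box_average(img, B=15):
    """B x B spatial mean (Eq. 4.11), via a separable 1-D convolution.

    Zero padding at the borders is an assumption; B should be odd so the
    window sits centrally on each pixel.
    """
    k = np.ones(B) / B
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, img)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, out)
    return out

def satsd(frames, B=15):
    """Spatially averaged temporal standard deviation (Eqs. 4.11-4.13).

    frames: array (F, H, W) of spatially denoised frames.
    Returns an (H, W) motion index map M_l.
    """
    A = np.stack([box_average(f, B) for f in frames])  # A^z_l, Eq. 4.11
    return A.std(axis=0)                               # Eqs. 4.12-4.13
```

A static sequence yields a motion index of zero everywhere, while noise surviving the spatial average raises the index only slightly, which is the robustness property argued for above.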
4.3.2 Motion Index Testing
The "FOOTBALL" and "CLAIRE" image sequences are used once more to test
the proposed motion index as well as the motion index given in [10], and two specific
spatial locations are selected from each sequence: a location where there is little to no
motion present, and a location where motion is present. A frame from each of the two
image sequences is given in Figure 4.2, and the four spatial locations for evaluation
of the proposed motion index are highlighted.
The two sequences are corrupted with various levels of noise, and the motion
is estimated at each of the four spatial locations selected with both the proposed
motion index and that of [10]. The results of the motion index used in [10] is given
in Figure 4.3. As shown in Figure 4.3, the motion index of [10] is not robust to noise
corruption. That is, the motion calculation from the same spatial location increases
with an increase in noise. Also, the motion index shows the ”FOOTBALL” image
sequence (x = 300, y = 220) as having a higher motion index than the ”CLAIRE”
image sequence (x = 40, y = 200) with zero noise corruption. However, the motion
index shows the opposite results with higher levels of noise. Thus, the motion index
gives conflicting results with the introduction of noise.
The results of the proposed SATSD motion index are given in Figure 4.4. As
shown in Figure 4.4, the proposed motion index is much more robust to varying noise
levels, and the ordering of locations from highest to lowest motion matches what one
would expect. The location with the lowest motion index is in the "CLAIRE"
image sequence, where there is no camera motion and there are no moving objects in
that spatial location. The next lowest motion location is in the "FOOTBALL" image
sequence in the spatial location where there are no moving objects; however, there
is some slight camera motion in the sequence, so the motion index is slightly higher
than in the "CLAIRE" image sequence. The location with the next highest motion
due to movement of the head, and the location with the highest motion index is the
"FOOTBALL" image sequence in the spatial location where many objects cross.
4.4 Temporal Domain Parameter Selection
The amount of temporal denoising which is beneficial to an image sequence is
dependent upon the amount of noise corruption as well as the amount of motion.
Thus, the threshold τz[·] is given by

τ_z[l] = α σ̂_n + β M_l,   (4.14)

where M_l is the motion index of spatial position l, and σ̂_n is the estimated noise
standard deviation of the image sequence. The two parameters α and β are determined
experimentally using test image sequences.
In the proposed coefficient selection method, we use a training sample approach.
The approach starts with a series of test image sequences serving as training samples
to derive the functions which determine the optimal set of the values for α and β.
Theoretically, we may represent each training sample as a vector Vi, i = 1, ..., n. The
training samples should span a space which covers more corrupted image sequences
than the training samples themselves:

S = Span{Vi; i = 1, ..., n}.   (4.15)
The original data and the statistical distribution of the noise are given for each of
the training samples which are corrupted. The optimal set of parameters can then be
determined which give the highest average PSNR for the training samples. Ideally,
the space spanned by the training samples contains the type of the corrupted image
sequences which are to be denoised. As a result, the same parameter set can generate
optimal or close to optimal performance for the corrupted image sequences of the
same type. It is clear that more training samples will generate parameters suitable
for more types of image sequences, while a space of fewer training samples is suitable
for fewer types of image sequences.
In order to obtain an estimate of the noise level, σn, an average is taken from
the noise estimates of each frame in the image sequence, given by Equation 3.27. It
is reasonable to assume an IID (Independent, Identically Distributed) model for the
level of noise for each pixel position since noise in each pixel position is generated by
individual sensing units of the image sensor such as CCD [25] which are independent.
As a result, the estimate of the standard deviation of the noise (σn) in each image also
represents the standard deviation of the noise in the temporal domain. Therefore,
we can use the estimate of the noise in the spatial domain to estimate that in the
temporal domain.
It should be pointed out that after denoising has occurred in the spatial domain
using the SFT method, the standard deviation of the noise is significantly reduced.
That reduction is statistically equal for each frame. As a result, the estimated noise
in the spatial domain can still be nominally used for noise reduction in the temporal
domain, as the reduction of σn can be automatically absorbed by α.
The sequences "CLAIRE", "FOOTBALL", and "TREVOR" are used for α and β
selection. Each of the image sequences is corrupted with differing levels of noise
(σn = 10, 20) and denoised with the SFT denoising method, where Equation 4.14 is
used as the temporal domain threshold. Values of α ranging from 0 to 3.0 and β
ranging from −0.3 to 0.3 are used. The results of this testing are given in Figure
4.5. As shown in Figure 4.5, the maximum average PSNR is achieved when α = 0.9
and β = −0.11. This result is reasonable: as the motion in an image sequence
increases, the redundancy between frames decreases, and the benefits of
temporal domain processing diminish. Thus, as the testing has shown, the temporal
domain threshold decreases as the motion increases.
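The α and β selection described above amounts to an exhaustive grid search that maximizes the average PSNR over the training sequences. A minimal sketch is given below; `denoise` is a hypothetical stand-in for the SFT pipeline, treated as a black box, and is not the dissertation's code.

```python
import numpy as np

def avg_psnr(clean, test):
    """Average PSNR in dB over a sequence, assuming an 8-bit peak of 255."""
    mse = np.mean((np.asarray(clean, float) - np.asarray(test, float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

def select_alpha_beta(clean_seqs, noisy_seqs, denoise, alphas, betas):
    """Exhaustive (alpha, beta) search maximizing the mean of the average
    PSNRs over the training sequences."""
    return max(((a, b) for a in alphas for b in betas),
               key=lambda ab: np.mean([avg_psnr(c, denoise(n, *ab))
                                       for c, n in zip(clean_seqs, noisy_seqs)]))
```

With the grid stated in the text (α from 0 to 3.0, β from −0.3 to 0.3), this search would return the reported optimum near (0.9, −0.11) for the training set used.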
4.5 Experimental Results
The proposed video denoising algorithm is first applied to each video frame
individually and independently. The method developed in Chapter 3 is used to denoise
single images, and serves as the spatial denoising portion of the wavelet-based video
denoising algorithm.
The video signal is then denoised in the temporal domain by the method developed
in Sections 4.2 and 4.4. The temporal denoising algorithm is a selective shrinkage
algorithm which uses a proposed motion estimation index to determine the temporal
threshold, τz[·]. The temporal threshold is modified by the motion index to effectively
eliminate temporal domain noise while preserving important motion information.
Three image sequences are used to determine the effectiveness of the proposed
video denoising method: the "SALESMAN", "TENNIS", and "FLOWER" image sequences.
These three sequences are all corrupted with various levels of noise and denoised
with the methods of [10, 55, 83] as well as the proposed method. Please note that
only the temporal domain denoising algorithm of [55] is tested; the spatial domain
denoising method given in Chapter 3 is used for all the wavelet-based video denoising
methods. The results are
given in Figures 4.6 through 4.11. As shown in these figures, the
proposed method consistently outperforms the other methods presented. In all cases,
the proposed denoising method has a higher average PSNR than all other denoising
methods tested. Also, note that in the method of [55], the threshold T must be changed
with video content and noise level to obtain the highest average PSNR using that
particular method. In the proposed method, the temporal domain threshold, τz[·], is
automatically calculated from estimates of the noise level and motion.
Figures 4.12 through 4.17 give an example of the effectiveness of each of the
denoising methods. Figure 4.12 gives the original frame #7 of the SALESMAN image
sequence, and Figure 4.13 gives frame #7 corrupted with noise. Figures 4.14 through
4.17 give frame #7 denoised by each of the methods mentioned in this section.
In addition to obtaining a higher signal-to-noise ratio than established video de-
noising algorithms, the proposed denoising algorithm facilitates the compression of
video signals when used as a pre-processing step. That is, the image sequence is first
denoised using the proposed method, then compressed by 3D wavelet compression.
The "CLAIRE" image sequence is compressed with various quantization step sizes,
both with and without the proposed denoising algorithm. Figure 4.18 gives the com-
pression results. As given in Figure 4.18, regardless of the quantization step, applying
the proposed denoising algorithm prior to compression improves the compression ra-
tio. However, pre-processing is most beneficial when the step size is small.
Table 4.1 gives the results of 3D wavelet compression of various image sequences
both with and without the denoising algorithm applied as a pre-processing step.
Figure 4.1: Test results of both TFS and SFT denoising methods. Upper left: FOOTBALL image sequence, SFT denoising, max. PSNR = 30.85, τ = 18, τz = 12. Upper right: FOOTBALL image sequence, TFS denoising, max. PSNR = 30.71, τ = 18, τz = 12. Lower left: CLAIRE image sequence, SFT denoising, max. PSNR = 40.77, τ = 19, τz = 15. Lower right: CLAIRE image sequence, TFS denoising, max. PSNR = 40.69, τ = 15, τz = 21.
Figure 4.2: Spatial positions of motion estimation test points. Left: FOOTBALL image sequence, frame #96. Right: CLAIRE image sequence, frame #167.
Figure 4.3: Motion estimate given in [10] of the CLAIRE and FOOTBALL image sequences (local motion estimate vs. noise standard deviation; CLAIRE frames 104−167 at positions (x=40, y=200) and (x=180, y=144), FOOTBALL frames 33−96 at positions (x=300, y=220) and (x=160, y=120)).
Figure 4.4: Proposed motion estimate of the CLAIRE and FOOTBALL image sequences (proposed local motion estimate Ml vs. noise standard deviation at the same four test points as Figure 4.3).
Figure 4.5: α and β parameter testing for temporal domain denoising (average PSNR surface over α and β for the image sequences used in the test).
Figure 4.6: Denoising methods applied to the SALESMAN image sequence, std. = 10 (PSNR vs. frame number for the proposed method, Pizurica (T = 20), the 2D wavelet filter, the 3D KNN filter, and the 3D rational filter).
Figure 4.7: Denoising methods applied to the SALESMAN image sequence, std. = 20 (PSNR vs. frame number for the proposed method, Pizurica (T = 40), the 2D wavelet filter, the 3D KNN filter, and the 3D rational filter).
Figure 4.8: Denoising methods applied to the TENNIS image sequence, std. = 10 (PSNR vs. frame number for the proposed method, Pizurica (T = 20), the 2D wavelet filter, the 3D KNN filter, and the 3D rational filter).
Figure 4.9: Denoising methods applied to the TENNIS image sequence, std. = 20 (PSNR vs. frame number for the proposed method, Pizurica (T = 40), the 2D wavelet filter, the 3D KNN filter, and the 3D rational filter).
Figure 4.10: Denoising methods applied to the FLOWER image sequence, std. = 10 (PSNR vs. frame number for the proposed method, Pizurica (T = 10), the 2D wavelet filter, the 3D KNN filter, and the 3D rational filter).
Figure 4.11: Denoising methods applied to the FLOWER image sequence, std. = 20 (PSNR vs. frame number for the proposed method, Pizurica (T = 20), the 2D wavelet filter, the 3D KNN filter, and the 3D rational filter).
Figure 4.12: Original frame #7 of the SALESMAN image sequence.
Figure 4.13: SALESMAN image sequence corrupted, std. = 20, PSNR = 22.10.
Figure 4.14: Results of the 3D K-nearest neighbors filter [83], PSNR = 28.42.
Figure 4.15: Results of the 2D wavelet denoising filter, given in Chapter 3, PSNR = 29.76.
Figure 4.16: Results of the 2D wavelet filtering with linear temporal filtering [55], PSNR = 30.47.
Figure 4.17: Results of the proposed denoising method, PSNR = 30.66.
Figure 4.18: Wavelet-based compression results with and without pre-processing (compressed file size vs. quantization step size for the "CLAIRE" image sequence, 3D wavelet compression with and without pre-processing).
Image Sequence            Step Size   Without Denoising       With Denoising
CLAIRE (360x288x168)          2       15.12:1, 3.29 Mbytes    31.72:1, 1.57 Mbytes
FOOTBALL (320x240x97)         4       6.45:1, 3.30 Mbytes     7.95:1, 2.68 Mbytes
MISSA (360x288x150)           8       33.10:1, 1.34 Mbytes    66.93:1, 0.68 Mbytes
CLAIRE (360x288x168)         16       137.2:1, 0.38 Mbytes    170.0:1, 0.30 Mbytes
MISSA (360x288x150)          32       198.2:1, 0.23 Mbytes    273.6:1, 0.17 Mbytes

Table 4.1: Compression ratios of 3D wavelet compression both with and without denoising applied as a pre-processing step.
As shown in Table 4.1, when the denoising algorithm is applied to an image se-
quence prior to compression, the 3D wavelet compression algorithm achieves better
performance. However, the performance improvement is greater with a smaller quan-
tization step size.
4.6 Discussion
In this chapter, a new combined spatial and temporal domain wavelet shrinkage
method is developed for the removal of noise in video signals. The proposed method
uses a geometrical approach to spatial domain denoising to preserve edge information,
and a newly developed motion estimation index for selective wavelet shrinkage in the
temporal domain.
The spatial denoising technique is the selective wavelet shrinkage algorithm developed
in Chapter 3, which is shown to outperform other wavelet shrinkage denoising
algorithms given in the literature in both denoised image quality and
computation time. The temporal denoising algorithm is also a selective wavelet
shrinkage algorithm which uses a motion estimation index to determine the level of
thresholding in the temporal domain.
The proposed motion index is experimentally determined to be more robust to
noise corruption than other methods, and is able to help determine the threshold
value for selective wavelet shrinkage in the temporal domain. With the motion index
and temporal domain wavelet shrinkage, the proposed video denoising method is
experimentally shown to provide a higher average PSNR than other methods given
in the literature for various levels of noise corruption applied to video signals with
varying amounts of motion.
CHAPTER 5
Virtual-Object Video Compression
5.1 Introduction
The finalized version of the MPEG-4 standard was published in December of 1999.
The basis of coding in MPEG-4 is not a processing macroblock, as in MPEG-1 and
MPEG-2, but rather an audio-visual object [3]. Object based compression techniques
have certain advantages, such as:
1) Allowing more user interaction with video content.
2) Allowing the reuse of recurring object content.
3) Removing artifacts caused by the joint coding of objects.
Although MPEG-4 does specify the advantages of object-based compression and
provides a standard of communication between sender and receiver, it does not provide
the means by which a) the content is separated into audio-visual objects, or b) the
audio-visual objects are compressed. Since the publication of the MPEG-4 standard,
much research has been conducted in the areas of shape coding [28, 40, 79] and texture
coding [36, 78] of arbitrarily shaped objects, and methods of object identification and
tracking [11, 23, 80].
However, although some success has been achieved in the various components
necessary for the implementation of an object-based compression method, no such
compression method exists to date. The reason that a robust, object-based compression
method does not exist is two-fold. First, robust multiple-object identification
and tracking methods have yet to be developed. The identification and tracking of
all objects that exist in a given image sequence is difficult, and the object extraction
and tracking technologies given in the literature are not mature enough to handle the
task. Second, it is unknown whether the additional bit savings achieved by object-based
compression will exceed the added overhead of shape coding of objects to
provide an overall compression gain.
Thus, a wavelet-based compression method is presented to provide some of the
benefits of object-based compression methods without the difficulties of true object-
based compression. An object-based wavelet compression algorithm, called virtual-
object compression, is developed for high quality, low bit-rate video.
Virtual-object compression separates the portion of video that exhibits motion
from the portion of the video that is stationary. The stationary video portion is
then grouped as the background, and the portion of the video which exhibits motion
is grouped as the virtual-object. After separation, both background and virtual-
object are coded independently by means of 2D wavelet compression and 3D wavelet
compression, respectively.
There are two separate processing areas in object-based compression: object
extraction, which separates the different objects in an image sequence, and object
compression, which codes the resulting arbitrarily shaped objects.
In the virtual-object compression method, the wavelet transform is used for both
object extraction and compression.
When the wavelet transform is applied in the temporal domain, the motion of ob-
jects is detected by large coefficient values. Therefore, the wavelet transform is used
in the identification and extraction of moving objects prior to object-based compres-
sion. Virtual-object based compression uses the non-decimated wavelet transform in
the temporal domain in the separation of objects and stationary background.
Virtual-object compression also restricts the shape of the virtual-object to be
rectangular. This restriction enables the use of known video compression methods
such as 3D wavelet compression for the compression of the virtual-object. Also,
with a rectangular object restriction, the location and shape of the object can be
completely defined with only two sets of spatial coordinates (the starting horizontal
and vertical locations of the virtual-object, and the width and height of the virtual-
object), virtually eliminating shape coding overhead.
Experimental results show the virtual-object compression method to be superior
in compression ratio and PSNR when compared to both 2D wavelet compression and
3D wavelet compression.
The organization of this chapter is as follows. Following the Introduction, Sec-
tion 5.2 gives a description of 3D wavelet compression. 3D wavelet compression is
a known compression method of video signals [21, 24] and is used to test the effec-
tiveness of virtual-object compression. Section 5.3 describes the virtual-object com-
pression method, and Section 5.4 gives the performance results of both virtual-object
compression and 3D wavelet compression. Section 5.5 gives the discussion.
5.2 3D Wavelet Compression
To show the improvement of the virtual-object compression method over more traditional
compression methods based on macroblocks or frames, we briefly describe a
known compression method called 3D wavelet compression, which is an extension of
the well-known image compression method, 2D wavelet compression. A block diagram
of 3D wavelet compression is given in Figure 5.1, the components of which are
as follows:
Figure 5.1: 3D wavelet compression.
5.2.1 2D Wavelet Transform
The first processing block of 3D wavelet compression is the spatial transformation
of each of the frames of the image sequence into the wavelet domain. This processing
block is referred to as 2D wavelet transformation.
First, let us define a 3-dimensional video signal f(·), where f(x, y, z) is a pixel
in the image sequence at horizontal position x, vertical position y, and frame z. The
dimensions of f(·) are width Wf, height Hf, and F frames. f(·) is a processing unit
referred to as a group of frames (GoF). The 2D wavelet transform of f(·) is given by:
a_{ll,k+1}[x, y, z] = \sum_n \sum_m h[n] h[m] a_{ll,k}[m − 2x, n − 2y, z]
d_{lh,k+1}[x, y, z] = \sum_n \sum_m g[n] h[m] a_{ll,k}[m − 2x, n − 2y, z]
d_{hl,k+1}[x, y, z] = \sum_n \sum_m h[n] g[m] a_{ll,k}[m − 2x, n − 2y, z]
d_{hh,k+1}[x, y, z] = \sum_n \sum_m g[n] g[m] a_{ll,k}[m − 2x, n − 2y, z],   (5.1)
where
a_{ll,−1}[x, y, z] = f(x, y, z).   (5.2)
d_{·,k}[·] and a_{ll,k}[·] are the wavelet and scaling coefficients of subband level k, respectively.
The subband level k ranges over [−1, K_M), where K_M is the 2D multiresolution
level (MRlevel). h[·] is the low-pass scaling filter, and g[·] is the high-pass
wavelet filter. The subscript designations of the coefficients, ll, lh, hl, hh, describe the
horizontal and vertical processing in the coefficient construction. For example, d_{hl,k}[·]
is obtained by first high-pass filtering a_{ll,k−1}[·] with g[·] in the horizontal dimension,
and then low-pass filtering the result with h[·] in the vertical dimension.
The type of wavelet used in all the given results is the FT wavelet, or 5/3 wavelet.
The FT wavelet is given by

h[·] = {−1/8, 1/4, 3/4, 1/4, −1/8}
g[·] = {1/2, −1, 1/2}.   (5.3)
The FT wavelet is chosen because it has been shown to give the best overall quality for
a given compression ratio among wavelets which produce only integer coefficients [74].
Note that the benefits of integer wavelet coefficients are reduced computational
complexity and reduced memory requirements.
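One level of the separable 2D transform of Equation 5.1, using the 5/3 filters of Equation 5.3, can be sketched as follows. This is an illustrative NumPy version with symmetric boundary extension, not the dissertation's implementation.

```python
import numpy as np

# 5/3 ("FT") analysis filters from Eq. 5.3
h = np.array([-1/8, 1/4, 3/4, 1/4, -1/8])   # low-pass scaling filter
g = np.array([1/2, -1, 1/2])                # high-pass wavelet filter

def analyze_1d(x, f):
    """Filter x with f under symmetric extension and keep even samples."""
    pad = len(f) // 2
    xe = np.pad(x, pad, mode="reflect")
    return np.convolve(xe, f, mode="valid")[::2]

def dwt2d_level(a):
    """One level of the separable 2D transform of Eq. 5.1."""
    lo = np.apply_along_axis(analyze_1d, 1, a, h)   # horizontal low-pass
    hi = np.apply_along_axis(analyze_1d, 1, a, g)   # horizontal high-pass
    ll = np.apply_along_axis(analyze_1d, 0, lo, h)  # + vertical low-pass
    lh = np.apply_along_axis(analyze_1d, 0, lo, g)  # + vertical high-pass
    hl = np.apply_along_axis(analyze_1d, 0, hi, h)
    hh = np.apply_along_axis(analyze_1d, 0, hi, g)
    return ll, lh, hl, hh
```

Since h sums to 1 and g sums to 0, a constant image yields a constant ll subband and zero detail subbands, a quick sanity check on the filters.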
After the coefficients are transformed in the spatial domain, they are then quan-
tized to represent the coefficients with no more precision than necessary to obtain the
desired reconstructed quality.
5.2.2 2D Quantization
After the GoF has been 2D wavelet transformed, the coefficients are quantized
uniformly across all subbands in the case of orthonormal wavelet transformation.
However, the wavelet transform used in the given 3D wavelet compression algorithm
is biorthogonal to facilitate integer computation and fast compression. Therefore, the
quantization level is modified according to scale. That is,
a_{ll,k}[x, y, z] = Int(2^{k+1} a_{ll,k}[x, y, z] / s_2)
d_{lh,k}[x, y, z] = Int(2^{k} d_{lh,k}[x, y, z] / s_2)
d_{hl,k}[x, y, z] = Int(2^{k} d_{hl,k}[x, y, z] / s_2)
d_{hh,k}[x, y, z] = Int(2^{k−1} d_{hh,k}[x, y, z] / s_2),   (5.4)
where s_2 is the 2D quantization step size, and a_{ll,k}[·], d_{lh,k}[·], d_{hl,k}[·], and d_{hh,k}[·]
above denote the 2D quantized coefficient values. For more information on orthogonal and biorthogonal
wavelets, refer to [12].
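The scale-dependent quantization of Equation 5.4 can be sketched as below. Int(·) is taken here as rounding, which is an assumption; the original may truncate instead.

```python
import numpy as np

def quantize_2d_level(bands_k, k, s2):
    """Apply Eq. 5.4 to the level-k subbands (a_ll, d_lh, d_hl, d_hh):
    each subband is scaled by a level-dependent power of two before
    dividing by the 2D step size s2."""
    a_ll, d_lh, d_hl, d_hh = bands_k
    q = lambda c, w: np.round(w * c / s2).astype(int)
    return (q(a_ll, 2.0 ** (k + 1)), q(d_lh, 2.0 ** k),
            q(d_hl, 2.0 ** k), q(d_hh, 2.0 ** (k - 1)))
```

The per-level weights compensate for the non-unit gains of the biorthogonal 5/3 filters, mimicking the uniform quantization that an orthonormal transform would permit.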
After all frames in the GoF have been spatially transformed and quantized, they
are then transformed in the temporal domain to exploit inter-frame redundancy.
This is generally referred to as 3D wavelet transformation. The temporal domain
transformation generally allows for greater compression, given that the frames in the
GoF are similar.
5.2.3 3D Wavelet Transform
The 3D wavelet transform is given by:

d^{3D}_{\zeta,k,j+1}[x, y, z] = \sum_p g[p] a^{3D}_{\zeta,k,j}[x, y, p − 2z]
a^{3D}_{\zeta,k,j+1}[x, y, z] = \sum_p h[p] a^{3D}_{\zeta,k,j}[x, y, p − 2z],   (5.5)

where

a^{3D}_{\zeta,k,−1}[x, y, z] = d_{\zeta,k}[x, y, z].   (5.6)
In Equations 5.5 and 5.6, \zeta \in \{lh, hl, hh\}, and a^{3D}_{\zeta,k,j}[·] and d^{3D}_{\zeta,k,j}[·] are the scaling and
wavelet subbands of spatial scale k and temporal scale j. The superscript indicator
3D denotes 3D wavelet transformation, and j is the subband level in the temporal
domain, which ranges over [−1, J_M), where J_M is the 3D MRlevel.
For the ll band of the 2D transform, a_{ll,k}[·], we have

d^{3D}_{ll,k,j+1}[x, y, z] = \sum_p g[p] a^{3D}_{ll,k,j}[x, y, p − 2z]
a^{3D}_{ll,k,j+1}[x, y, z] = \sum_p h[p] a^{3D}_{ll,k,j}[x, y, p − 2z],   (5.7)

where

a^{3D}_{ll,k,−1}[x, y, z] = a_{ll,k}[x, y, z].   (5.8)
Note that in Equations 5.5 and 5.7, all 2D wavelet coefficients which are processed with
the g[·] filter are designated as 3D wavelet coefficients, d^{3D}_{·}[·], and all the 2D coefficients
which are processed with the h[·] filter are designated as 3D scaling coefficients,
a^{3D}_{·}[·].
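A sketch of one temporal decomposition level (Equations 5.5 and 5.7) is given below, assuming a frames-first array layout. It reuses the 5/3 filters of Equation 5.3 with symmetric extension; both are illustrative choices, not a statement about the dissertation's implementation.

```python
import numpy as np

# 5/3 filters of Eq. 5.3, reused in the temporal direction
h = np.array([-1/8, 1/4, 3/4, 1/4, -1/8])
g = np.array([1/2, -1, 1/2])

def analyze_1d(x, f):
    """Filter x with f under symmetric extension and keep every other sample."""
    pad = len(f) // 2
    return np.convolve(np.pad(x, pad, mode="reflect"), f, mode="valid")[::2]

def temporal_level(band):
    """One temporal decomposition level: each pixel's time series is low-
    and high-pass filtered and downsampled by two. `band` is assumed to be
    frames-first, with shape (F, H, W)."""
    lo = np.apply_along_axis(analyze_1d, 0, band, h)   # a^3D scaling subband
    hi = np.apply_along_axis(analyze_1d, 0, band, g)   # d^3D wavelet subband
    return lo, hi
```

On a temporally constant GoF the wavelet subband vanishes, which matches the intuition that the temporal transform only spends bits on change between frames.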
As with the 2D wavelet transformation, the 3D wavelet coefficients are
quantized once they are obtained.
5.2.4 3D Quantization
The 3D wavelet and scaling coefficients are quantized by

d^{3D}_{ll,k,j}[x, y, z] = Int(s_2 2^{k+1} \sqrt{2}^{\,j} d^{3D}_{ll,k,j}[x, y, z] / s_3)
a^{3D}_{ll,k,j}[x, y, z] = Int(s_2 2^{k+1} \sqrt{2}^{\,j+1} a^{3D}_{ll,k,j}[x, y, z] / s_3)
d^{3D}_{lh,k,j}[x, y, z] = Int(s_2 2^{k} \sqrt{2}^{\,j} d^{3D}_{lh,k,j}[x, y, z] / s_3)
a^{3D}_{lh,k,j}[x, y, z] = Int(s_2 2^{k} \sqrt{2}^{\,j+1} a^{3D}_{lh,k,j}[x, y, z] / s_3)
d^{3D}_{hl,k,j}[x, y, z] = Int(s_2 2^{k} \sqrt{2}^{\,j} d^{3D}_{hl,k,j}[x, y, z] / s_3)
a^{3D}_{hl,k,j}[x, y, z] = Int(s_2 2^{k} \sqrt{2}^{\,j+1} a^{3D}_{hl,k,j}[x, y, z] / s_3)
d^{3D}_{hh,k,j}[x, y, z] = Int(s_2 2^{k−1} \sqrt{2}^{\,j} d^{3D}_{hh,k,j}[x, y, z] / s_3)
a^{3D}_{hh,k,j}[x, y, z] = Int(s_2 2^{k−1} \sqrt{2}^{\,j+1} a^{3D}_{hh,k,j}[x, y, z] / s_3),   (5.9)
where s3 is the 3D quantization level. Again, if the transform used in compression
is an orthonormal transform, the scalings of Equation 5.9 would not be necessary.
However, the bi-orthogonal wavelet transform requires an adjustment by subband
level.
The quantization levels s2 and s3 are left to the user to determine. The relation-
ship between s2 and s3 is an important one, however. If s3 is significantly larger than
s2, unwanted temporal artifacts may result in the reconstructed signal. Therefore,
it is recommended to maintain s3 ≤ s2. Also, there is specific reasoning to why
two quantization processes are necessary. It is known that the statistical properties
of the horizontal and vertical dimensions in a video signal are similar to each other
but differ from the time dimension [23]. Thus, a different quantization step applied
to the spatial and temporal domains is reasonable. Also, it is well known that the
quantization step leads to artifact generation in signal reconstruction. However, the
artifacts that appear from quantization of the 2D wavelet coefficients and the 3D
wavelet coefficients are perceptibly vastly different. The quantization of spatial do-
main wavelet coefficients leads to blurring and softening of the video signal, while
the quantization of the 3D wavelet coefficients leads to "trails" of moving objects
from frame to frame. Thus, to mitigate the differing types of artifacts generated from
wavelet transformation in the two domains, two quantization step sizes are necessary.
Also, the above formulation of the 2D and 3D wavelet transform is not consistent
with the traditional symmetric wavelet transformation of a 3-dimensional signal. In
the symmetric case, each dimension is transformed at a certain MRlevel level, and
the lowest subband is then processed further for the next MRlevel. In the above for-
mulation, however, the wavelet transform is applied in the spatial domain through all
93
subbands, and only afterwards is applied in the temporal domain. This is referred to
as the decoupled 3D wavelet transform, and it is the preferred wavelet transformation
method for video compression [5, 21, 24, 35].
A visual difference between the 2D wavelet transform and 3D wavelet transform
(both symmetric and decoupled) can be shown when viewing the differing sizes and
shapes of the various subbands that are calculated. Figure 5.2 gives the size and
shapes of each of the subbands calculated by the various wavelet transforms. The 2D
Figure 5.2: Starting from left to right. 1) Original three-dimensional video signal. 2) 2D wavelet transform (KM = 2 and JM = 0). 3) Symmetric 3D wavelet transform. 4) Decoupled 3D wavelet transform (KM = 2 and JM = 2).
wavelet transform, shown in Figure 5.2, applies no temporal domain processing, thus
there are no segmentation lines crossing the temporal domain separating different
subbands. There are only segmentation lines crossing the horizontal and vertical
dimensions, where the level 2 LL band, a_{ll,2}[·], is shown in the upper left-hand corner,
and the level 0 HH band, d_{hh,0}[·], is shown in the lower right-hand corner. Also
shown in Figure 5.2, there exists a greater number of subbands generated by the
decoupled 3D wavelet transform than in the symmetric 3D wavelet transform, allowing
for greater frequency analysis in both the spatial and temporal domains.
Each subband generated by the 3D wavelet transform is a 3-dimensional bandpass
signal representing the original signal, f(·). A sample of subband locations is given
in Figure 5.3.
Figure 5.3: Decoupled 3D wavelet transform subbands, KM = 2, JM = 2. Left: Subband d^{3D}_{hl,1,1}[·] highlighted in gray. Right: Subband d^{3D}_{lh,0,2}[·] highlighted in gray.
After the decoupled 3D wavelet transform and quantization are computed, stack-
run [72] followed by Huffman [22] encoding are applied to each of the subbands for
compression.
5.2.5 3D Wavelet Compression Results
The advantage of the 3D wavelet transform is evident when coding a video signal
with both 2D and 3D wavelet compression. Figure 5.4 gives the results of 2D wavelet
compression vs. 3D wavelet compression on the "CLAIRE" image sequence. 2D
wavelet compression is accomplished by computing the 2D wavelet transform on each
frame in the image sequence separately, applying 2D quantization, and using stack-
run [72] followed by Huffman [22] coding on the quantized coefficients. The 3D
wavelet transform exploits redundancy in the temporal domain as well as in the spatial
domain. Therefore, 3D wavelet compression produces a much higher compression
ratio and better overall quality. As shown in Figure 5.4, the performance of 3D
Figure 5.4: Comparison of 2D wavelet compression and 3D wavelet compression using the CLAIRE image sequence (frame #4 is shown). Left: 2D wavelet compression. s2 = 64, KM = 8, file size = 198 KB, compression ratio = 256:1, average PSNR = 29.80. Right: 3D wavelet compression. s2 = 29, s3 = 29, KM = 8, JM = 8, file size = 196 KB, compression ratio = 258:1, average PSNR = 33.31.
wavelet compression is greater than that of 2D wavelet compression. Note
that for the results given in Figure 5.4 the GoF processing block for 3D wavelet
compression is F = 64 frames.
5.3 Virtual-Object Compression
The advantages of 3D wavelet compression over the traditional 2D frame-by-frame
compression are evident from the results given in Figure 5.4. However, to further exploit
temporal domain redundancy in video signals, virtual-object compression is developed.
In virtual-object compression, the original video signal is separated into background
and virtual-object. Then each is compressed separately for better
compression results.
5.3.1 Virtual-Object Definitions
Let us define a three-dimensional rectangular object o(·) where o(x, y, z) is a pixel
in the object sequence of horizontal position x, vertical position y and frame z. The
dimensions of o(·) are width Wo, height Ho, and frames F . We restrict the object to be
the same size in each frame of the sequence to ensure that the virtual-object is easily
defined and compressible. Therefore, Wo and Ho are constant, and not dependent on
z.
However, because objects in an image sequence move, we must allow the virtual-
object to be placed anywhere within each frame. Thus, we define coordinates Sx[·] and
Sy[·] which correspond to the upper-left corner of the virtual-object in each frame, or the
starting horizontal and vertical positions of the virtual-object, respectively. We also
define Ex[·] and Ey[·] which correspond to the lower-right corner of the virtual-object,
or the ending horizontal and vertical positions of the virtual-object, respectively.
With these definitions some boundary conditions are required. The virtual-object
must be positive in width and height, and it cannot be larger than the original video
frames, thus 0 ≤ Wo ≤ Wf and 0 ≤ Ho ≤ Hf . Also, the virtual-object must lie
within each frame. Thus, 0 ≤ Sx[z] < Wf − 1 and 0 ≤ Sy[z] < Hf − 1, for all z. It is
also known that Sx[z] < Ex[z] < Wf and Sy[z] < Ey[z] < Hf , for all z.
As stated previously, the virtual-object must remain the same size for each frame
in the sequence. Therefore, Ex[z]− Sx[z] = Wo and Ey[z]− Sy[z] = Ho for all z.
The virtual-object is defined as:
o(x, y, z) = f(x+Sx[z], y+Sy[z], z), 0 ≤ x < Wo, 0 ≤ y < Ho, 0 ≤ z < F , (5.10)
where o(·) is the virtual-object and f(·) is the original image sequence.
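Equation 5.10 is a per-frame crop of the original sequence. A minimal sketch, assuming a frames-first array layout:

```python
import numpy as np

def extract_object(f, Sx, Sy, Wo, Ho):
    """Crop the virtual-object o(x, y, z) = f(x + Sx[z], y + Sy[z], z)
    (Eq. 5.10). `f` is frames-first, with shape (F, Hf, Wf); the object has
    the same Wo x Ho size in every frame, only its position changes."""
    return np.stack([f[z, Sy[z]:Sy[z] + Ho, Sx[z]:Sx[z] + Wo]
                     for z in range(f.shape[0])])
```

Because Wo and Ho are constant, the result is a regular (F, Ho, Wo) array that can be handed directly to a 3D wavelet coder.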
The background is defined as:
b(x, y) = \begin{cases} \dfrac{\sum_{z=0}^{F-1} f(x, y, z)\,\alpha[x, y, z]}{\sum_{z=0}^{F-1} \alpha[x, y, z]}, & \text{when } \sum_{z=0}^{F-1} \alpha[x, y, z] \neq 0 \\ 0, & \text{else,} \end{cases}   (5.11)

where

\alpha[x, y, z] = \begin{cases} 1, & \text{when } (x, y, z) \in L \cup R \cup U \cup D \\ 0, & \text{else.} \end{cases}   (5.12)
L, R, U , and D represent the area which lies outside the virtual-object, or the area
left (L), right (R), above (U), and below (D) the virtual-object. More specifically,
L = {(x, y, z) : x < Sx[z]}, R = {(x, y, z) : x ≥ Ex[z]}, U = {(x, y, z) : y < Sy[z]},
and D = {(x, y, z) : y ≥ Ey[z]}. As shown in Equation 5.11, the background is formed
by a temporal average, over the entire GoF, of the area outside the virtual-object boundary.
Figure 5.5 gives a frame of the "CLAIRE" image sequence including virtual-object
definitions.
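The background formation of Equations 5.11 and 5.12 can be sketched as a masked temporal average; the frames-first layout and the function name are assumptions of this illustration.

```python
import numpy as np

def form_background(frames, Sx, Sy, Wo, Ho):
    """Masked temporal average of Eq. 5.11: alpha (Eq. 5.12) is 1 outside
    the per-frame virtual-object rectangle and 0 inside it."""
    f = np.asarray(frames, dtype=float)        # frames-first: (F, Hf, Wf)
    alpha = np.ones_like(f)
    for z in range(f.shape[0]):
        alpha[z, Sy[z]:Sy[z] + Ho, Sx[z]:Sx[z] + Wo] = 0.0
    cover = alpha.sum(axis=0)                  # frames contributing per pixel
    return np.where(cover > 0,
                    (f * alpha).sum(axis=0) / np.maximum(cover, 1), 0.0)
```

Pixels that are covered by the object in every frame have no background observation, so Equation 5.11 assigns them zero, as does the sketch.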
5.3.2 Virtual-Object Extraction Method
The virtual-object is extracted by applying the wavelet transform in the temporal
domain to the original image sequence f(·). The extraction method separates the
portion of the video with motion from the portion of the video without motion.
Motion in an image sequence results in large temporal domain transform coefficients
which are spatially contiguous.

Figure 5.5: Virtual-object extraction.
The non-decimated wavelet transform in the temporal domain of a 3-dimensional
image sequence f(·) is given by

\lambda_{vo}[x, y, z] = \sum_m f(x, y, m) g_{vo}[m − z],   (5.13)

where \lambda_{vo}[·] are the wavelet coefficients, and g_{vo}[·] is the wavelet filter. The subscript
designation vo is given to identify the coefficients and wavelet filter for purposes of
virtual-object extraction.
Experimentally, it has been determined that the biorthogonal Haar wavelet function
provides the best motion identification. The biorthogonal Haar wavelet is given
by

g_{vo}[t] = \begin{cases} 1, & \text{when } t = 0 \\ −1, & \text{when } t = 1 \\ 0, & \text{else.} \end{cases}   (5.14)
The compact support of the biorthogonal Haar wavelet makes it a natural choice for
motion identification. Assuming there is no noise in the image sequence, a simple
difference between consecutive frames is the most effective means of motion identi-
fication. The compact support of the Haar wavelet is most aptly able to locate the
spatial and temporal position of motion in an image sequence.
A 3-dimensional Boolean map separating motion from non-motion is obtained
by thresholding the coefficient values \lambda_{vo}[·]:

I_{vo}[x, y, z] = \begin{cases} 1, & \text{when } |\lambda_{vo}[x, y, z]| > \tau_{vo} \\ 0, & \text{else.} \end{cases}   (5.15)
The Boolean motion map, I_{vo}[·], is refined by the spatial support criteria described in
Section 3.3. That is,

J_{vo}[x, y, z] = \begin{cases} 1, & \text{when } S_{vo}[x, y, z] > s_{vo} \\ 0, & \text{else,} \end{cases}   (5.16)

where S_{vo}[x, y, z] is calculated by an algorithm given in Appendix A.
The values of τvo and svo are experimentally determined. We find that τvo = 15
and svo = 2 give the best separation of object and background.
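Because the biorthogonal Haar filter reduces Equation 5.13 to a difference of consecutive frames, the thresholding of Equation 5.15 can be sketched as below. The spatial-support refinement of Equation 5.16 is omitted here, and the frames-first layout is an assumption.

```python
import numpy as np

def motion_map(frames, tau_vo=15.0):
    """Boolean motion map of Eq. 5.15. With the biorthogonal Haar filter of
    Eq. 5.14, the coefficients of Eq. 5.13 are lambda[z] = f(z) - f(z + 1)."""
    f = np.asarray(frames, dtype=float)      # frames-first: (F, H, W)
    lam = np.zeros_like(f)
    lam[:-1] = f[:-1] - f[1:]                # consecutive frame differences
    return np.abs(lam) > tau_vo
```

A pixel that changes between two frames lights up the map at both surrounding temporal positions, which is what makes spatially contiguous motion regions easy to box in afterwards.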
Each frame of the Boolean map is scanned to find the smallest rectangle that
contains all the non-zero J_{vo}[·]. This is obtained by

\gamma_x[z] = \max(\vec{K}), where k \in \vec{K} \iff \sum_{m=0}^{k-1} \sum_{n=0}^{H_f - 1} J_{vo}[m, n, z] = 0
\varepsilon_x[z] = \min(\vec{K}), where k \in \vec{K} \iff \sum_{m=0}^{k-1} \sum_{n=0}^{H_f - 1} J_{vo}[m, n, z] = \sum_{m=0}^{W_f - 1} \sum_{n=0}^{H_f - 1} J_{vo}[m, n, z]
\gamma_y[z] = \max(\vec{K}), where k \in \vec{K} \iff \sum_{n=0}^{k-1} \sum_{m=0}^{W_f - 1} J_{vo}[m, n, z] = 0
\varepsilon_y[z] = \min(\vec{K}), where k \in \vec{K} \iff \sum_{n=0}^{k-1} \sum_{m=0}^{W_f - 1} J_{vo}[m, n, z] = \sum_{n=0}^{H_f - 1} \sum_{m=0}^{W_f - 1} J_{vo}[m, n, z].   (5.17)
The vectors \gamma_x[·] and \varepsilon_x[·] are the starting and ending horizontal positions of the
virtual-object in each frame of the Boolean map. Similarly, \gamma_y[·] and \varepsilon_y[·] are the starting
and ending vertical positions of the virtual-object. However, these per-frame boundaries for
the virtual-object may not be the same size, i.e., \varepsilon_x[b] − \gamma_x[b] \neq \varepsilon_x[a] − \gamma_x[a], for a \neq b.
Therefore, the width and height of the virtual-object are defined by

W_o = \max(\vec{\varepsilon}_x − \vec{\gamma}_x),  z_{m,x} = \arg\max(\vec{\varepsilon}_x − \vec{\gamma}_x)
H_o = \max(\vec{\varepsilon}_y − \vec{\gamma}_y),  z_{m,y} = \arg\max(\vec{\varepsilon}_y − \vec{\gamma}_y).   (5.18)
z_{m,x} and z_{m,y} are the frames which contain the maximum virtual-object width and
maximum virtual-object height, respectively.
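Equations 5.17 and 5.18 amount to per-frame tight bounding boxes followed by a maximum over frames. A sketch is given below; treating ε as an exclusive end, so that ε − γ is the width, is a convention chosen for this illustration.

```python
import numpy as np

def object_bounds(J):
    """Per-frame bounding box of the Boolean map J (Eq. 5.17) and the common
    object size W_o, H_o with its arg-max frames (Eq. 5.18). J: (F, Hf, Wf)."""
    F = J.shape[0]
    gx, ex, gy, ey = (np.zeros(F, dtype=int) for _ in range(4))
    for z in range(F):
        ys, xs = np.nonzero(J[z])
        if xs.size:                          # frames with no motion stay empty
            gx[z], ex[z] = xs.min(), xs.max() + 1
            gy[z], ey[z] = ys.min(), ys.max() + 1
    Wo, zmx = int((ex - gx).max()), int(np.argmax(ex - gx))
    Ho, zmy = int((ey - gy).max()), int(np.argmax(ey - gy))
    return (gx, ex, gy, ey), (Wo, Ho), (zmx, zmy)
```

Taking the maximum width and height over all frames guarantees that a single fixed-size window can contain the motion region in every frame.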
The starting horizontal and vertical positions of the virtual-object, S_x[·] and
S_y[·], are needed to completely specify the location of the virtual-object. These
positions are established to completely contain the virtual-object in all frames, and to
minimize the horizontal and vertical motion of the virtual-object border throughout
the image sequence. It has been experimentally determined that minimal spatial
movement of the virtual-object between consecutive frames provides the largest
compression ratios and best reconstructed quality. Thus, the starting horizontal and
vertical positions of the virtual-object are given by
S_x[0] = \begin{cases} \gamma_x[z_{m,x}], & \text{when } \gamma_x[0] < S_x[z_{m,x}] \\ \varepsilon_x[0] − W_o, & \text{when } \varepsilon_x[0] \geq E_x[z_{m,x}] \\ S_x[z_{m,x}], & \text{else,} \end{cases}   (5.19)

S_x[z] = \begin{cases} \gamma_x[z], & \text{when } \gamma_x[z] < S_x[z−1] \\ \varepsilon_x[z] − W_o, & \text{when } \varepsilon_x[z] \geq E_x[z−1] \\ S_x[z−1], & \text{else,} \end{cases}   (5.20)

S_y[0] = \begin{cases} \gamma_y[z_{m,y}], & \text{when } \gamma_y[0] < S_y[z_{m,y}] \\ \varepsilon_y[0] − H_o, & \text{when } \varepsilon_y[0] \geq E_y[z_{m,y}] \\ S_y[z_{m,y}], & \text{else,} \end{cases}   (5.21)

and

S_y[z] = \begin{cases} \gamma_y[z], & \text{when } \gamma_y[z] < S_y[z−1] \\ \varepsilon_y[z] − H_o, & \text{when } \varepsilon_y[z] \geq E_y[z−1] \\ S_y[z−1], & \text{else.} \end{cases}   (5.22)
The calculation of the starting horizontal and vertical positions, Sx[·] and Sy[·],
given in Equations 5.19 through 5.22 guarantees minimal movement of the virtual-
object border.
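A one-axis sketch of the recursion in Equations 5.19 through 5.22 follows. Two readings are assumed for illustration: E[·] is taken as the window end S[·] + Wo, and the window is anchored at the widest frame zm, whose start is its own γ; neither reading is spelled out in the text.

```python
def window_starts(gamma, eps, Wo, zm):
    """Starting positions of a width-Wo window that contains
    [gamma[z], eps[z]) in every frame while moving its border
    as little as possible between consecutive frames."""
    S = [0] * len(gamma)
    prev = gamma[zm]                  # the widest frame fits its window exactly
    for z in range(len(gamma)):
        if gamma[z] < prev:           # object crossed the left border
            S[z] = gamma[z]
        elif eps[z] >= prev + Wo:     # object crossed the right border
            S[z] = eps[z] - Wo
        else:
            S[z] = prev               # otherwise the border stays still
        prev = S[z]
    return S
```

For example, `window_starts([2, 2, 3], [4, 5, 5], 3, 1)` keeps the border fixed at 2 for all three frames, while a drifting object forces only the minimum movement needed to keep it inside the window.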
The reconstructed video signal, f(·), is given by
\[
f(x, y, z) = \begin{cases} b(x, y), & \text{when } \alpha[x, y, z] = 1\\ o(x, y, z), & \text{else,} \end{cases} \tag{5.23}
\]
where b(·) and o(·) are the reconstructed background frame and virtual-object, re-
spectively.
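Equation 5.23 amounts to pasting the decoded virtual-object back into the reconstructed background at its per-frame window. A hedged NumPy sketch, with invented function and argument names, assuming α is stored as a per-frame array and the object as a stack of windows:

```python
import numpy as np

def reconstruct(alpha, b, o, Sx, Sy, Wo, Ho):
    """Compose each output frame per Equation 5.23: background pixels where
    alpha == 1, virtual-object pixels elsewhere. b is the (H, W) background,
    o holds the decoded object windows with shape (frames, Ho, Wo)."""
    Z = o.shape[0]
    f = np.empty((Z,) + b.shape, dtype=b.dtype)
    for z in range(Z):
        f[z] = b                                          # start from the background
        win = f[z, Sy[z]:Sy[z] + Ho, Sx[z]:Sx[z] + Wo]    # object window (a view)
        mask = alpha[z, Sy[z]:Sy[z] + Ho, Sx[z]:Sx[z] + Wo] == 0
        win[mask] = o[z][mask]                            # object where alpha != 1
    return f
```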
5.3.3 Virtual-Object Coding
Once the virtual-object and background have been identified and separated, the
independent compression of each is straightforward. The background is compressed
by 2D wavelet compression, and the virtual-object is compressed by the 3D wavelet
compression described in Section 5.2. Figure 5.6 gives the design flow of the virtual-
object compression method.
Figure 5.6: Virtual-object compression.
As given in Figure 5.6, the original video signal is separated into the virtual-object
and background using the virtual-object extraction method. The virtual-object and
background are then compressed separately using the 3D wavelet compression and
2D wavelet compression methods, respectively. Each of the processing blocks given
in Figure 5.6 following the virtual-object extraction method is described in Section
5.2.
5.4 Performance Comparison Between 3D Wavelet and Virtual-Object Compression
The virtual-object compression method is compared to the 3D wavelet compres-
sion method. The "CLAIRE" image sequence is used for continuity with the com-
parison of 2D wavelet compression to 3D wavelet compression, given in Figure 5.4.
Figure 5.7 gives results of 3D wavelet compression and virtual-object compression
methods, using the "CLAIRE" image sequence. Note that for the results given in
Figure 5.7 the GoF processing block is F = 64 frames. As shown in Figure 5.7, the
Figure 5.7: Comparison of 3D wavelet compression and virtual-object compression using the CLAIRE image sequence (frame #4 is shown). Left: 3D wavelet compression, s2 = 29, s3 = 29, KM = 8, JM = 8, file size = 196KB, compression ratio = 258:1, average PSNR = 33.31. Right: virtual-object compression, s2 = 25, s3 = 25, KM = 8, JM = 8 for the virtual-object and s2 = 9, KM = 8 for the background, file size = 195KB, compression ratio = 259:1, average PSNR = 34.00.
virtual-object compression method achieves an increase in compression ratio from 3D
wavelet compression while providing higher PSNR.
Along with the "CLAIRE" image sequence, the virtual-object compression method
is tested against both 3D wavelet compression and 2D wavelet compression using the
"SALESMAN" and "MISSA" image sequences. The results of the quality comparison
are given in Figure 5.8. Figure 5.8 shows that virtual-object compression consistently
outperforms both 2D wavelet compression and 3D wavelet compression in compression
ratio and PSNR.
[Three plots of PSNR (dB) versus frame number: SALESMAN (virtual-object comp., 54,278 bytes; 3D wavelet comp., 56,449 bytes; 2D wavelet comp., 59,367 bytes), MISSA (virtual-object comp., 199,554 bytes; 3D wavelet comp., 202,035 bytes; 2D wavelet comp., 206,914 bytes), and CLAIRE (virtual-object comp., 200,205 bytes; 3D wavelet comp., 201,140 bytes; 2D wavelet comp., 202,878 bytes).]
Figure 5.8: Comparison of 2D wavelet compression, 3D wavelet compression, and virtual-object compression.
5.5 Discussion
In this chapter, a new object-based compression method called virtual-object com-
pression has been described. Virtual-object compression differs from typical video
compression methods by first extracting moving objects from stationary background
and compressing each separately. The separation of objects and background enables
independent coding of both, providing a low bit-rate compressed video signal.
Although virtual-object compression is not a truly object-based compression method
as set forth by the MPEG-4 standard, it is able to provide compression gain and im-
proved PSNR over the 3D wavelet compression method by relaxing some of the
constraints involved with object-based compression methods. Thus, the results of
virtual-object compression have shown a performance improvement over the more
traditional wavelet-based compression methods of 2D wavelet compression and 3D
wavelet compression.
CHAPTER 6
Constant Quality Rate Control for Content-Based 3D Wavelet Video Communication
6.1 Introduction
The vast amounts of data associated with digital images and video streams have
created a growing concern and a motivation for efficient image compression methods.
Many such compression algorithms have been developed around a variety of matrix
transforms [47, 48, 52]. One such method, the wavelet transform, has shown promising
results, achieving large compression ratios and high reconstructed image quality [37, 70, 82].
Recently, the efficient coding of video signals has become a leading topic in com-
pression research [30, 56]. A new compression algorithm, the 3D wavelet transform,
has been developed to provide very high compression ratios of digital video while
preserving the reconstructed quality [71, 81].
Tightly coupled with compression research is the reliable transmission and recep-
tion of compressed video. Real-time video communication applications using com-
pression algorithms demand a constant frame rate for a high quality of service (QoS).
This requirement is challenging, however. Inconsistent compression and decompres-
sion computation times, variable compressed video data size, and the unpredictable
available bandwidth of volatile communication channels all hinder the performance
of real-time video communication.
Many rate control algorithms have been proposed in recent history, and most have
been associated with providing constant frame rate with a variable quantization pa-
rameter [13, 32, 38, 51, 57, 59, 60, 65]. The quantization parameter directly affects
both the bit rate and reconstructed video quality. Therefore, for low bit-rate envi-
ronments, the constant frame rate approach may provide poor quality image frames
at the receiver. To combat this effect, other rate control algorithms have controlled
both the frame rate and the quantization parameter to provide a best possible QoS
[58, 66, 67]. However, for many applications, individual image frames of reasonable
visual quality are vastly more important than high frame rates. Therefore, we employ
a fixed quantization step-size to deliver constant quality video frames.
Also, most former rate control algorithms have a minimum bit rate requirement
for the communication channel [13, 14, 32, 51, 57, 58, 59, 60, 65, 66]. Unfortunately
many communication systems such as the Internet do not provide a minimum bit
rate guarantee. Furthermore, the content-based 3D wavelet compression scheme is
a special case of image compression and also a relatively new idea [71, 81]. Thus
it is desirable for a rate control algorithm specific to 3D wavelet compression to be
developed.
The content-based 3D wavelet compression scheme operates on a group of frames
(GoF), and the number of frames varies between groups depending on the video
content. Because we group only similar frames together, the number of frames in
each group is variable. Thus, the 3D wavelet transform produces a variable delay
for the transmission of real-time video. Because of this delay, rate control becomes
an even more difficult issue. To deal with the uncertainty of both the bandwidth
of the communication channel and the video content, we propose a new rate control
algorithm. It differs from previous algorithms in many ways. First, because there are
two uncertainties, there are two frame buffers for the storage of video frames, on both
the client and server sides. Secondly, the client-side buffer is developed to ensure the
continuous display of reconstructed image frames. The client side buffer must contain
enough reconstructed video content to overcome the acquisition delay of the next GoF
as well as the delay of data transfer over the network, and the computation time of
the compression and decompression algorithms. The buffer is based on a leaky bucket
algorithm with an adjustable window of constant frame rate (AWCF). Thirdly, for the
server side we develop a feedback mechanism from the client to control the server’s
buffer content and ensure that the frame rates of the server and client sides are equal.
This chapter is arranged into five sections. Following the Introduction, Section
6.2 gives a brief description of content-based 3D wavelet compression and illustrates
the functionality and importance of a multi-threaded application for real-time com-
munication. Section 6.3 provides an overview and analysis of the rate control system,
including the constraints imposed on the rate control buffers, design parameters of
the control buffers on the client and server sides, and a definition of the AWCF.
Section 6.4 gives experimental results of the rate control algorithm, and Section 6.5
summarizes the chapter.
6.2 Multi-Threaded, Content-Based 3D Wavelet Compression
The content-based 3D wavelet video compression/decompression system design
flow is given in Figure 6.1. As shown in Figure 6.1, the frame grabber loads video
frames into the compression system. The dynamic grouping of frames then compares
and groups frames of similar content together. The dynamic grouping process sends
the group of frames (GoF) to the 3D wavelet compression system. The compression
algorithm then compresses the video using wavelet analysis. By grouping frames of
similar content, the inter-frame redundancy of the individual pixels is assured, thus
providing high compression ratios. The compressed video is then either stored or sent
across a communication channel. The 3D wavelet decompression system reconstructs
the video, and the video is then displayed to the user. The content-based compression
approach develops GoFs of differing size, and because of the disparity in GoF size
the computation time required to compress and decompress each GoF varies. Thus,
continuous and smooth display of video becomes a challenging issue.
Figure 6.1: Content-based 3D wavelet compression/decompression design flow.
A real-time compression/decompression system must be able to perform many
tasks concurrently. For example, the compression algorithm must continuously cap-
ture and group frames while compressing video and sending it to the receiver. This
can only be performed when operations are being computed independently. There-
fore, four processing threads are created in the communication system: the grouping
thread, compression thread, decompression thread, and display thread. Figure 6.2 gives
a model of the communication system.
Figure 6.2: 3D wavelet communication system.
The two buffers that have been added to the system, shown in Figure 6.2, are
instrumental in achieving independent operation from each of the application threads.
Also, all four threads will be continuously active as long as both buffers are neither
empty nor full. The grouping thread will continue to group frames until the grouping
buffer is full. At that point, there is no space left for the next GoF. Conversely,
the compression thread will continue to compress until the grouping buffer is empty.
After the grouping buffer is empty there is no longer a GoF to compress. Therefore,
continuous activity from both the grouping thread and the compression thread depends
on the fullness of the grouping buffer. Similarly, at the receiving end, continuous
activity from the decompression thread and the display thread can only be achieved if
the display buffer is neither full nor empty.
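The four-thread structure can be sketched with bounded queues standing in for the two buffers: a full queue blocks its producer and an empty queue blocks its consumer, which is exactly the stall behaviour described above. Compression, the channel, and decompression are collapsed into a single stage here for brevity, and all names are illustrative:

```python
import queue
import threading

grouping_buffer = queue.Queue(maxsize=8)   # holds GoFs awaiting compression
display_buffer = queue.Queue(maxsize=8)    # holds reconstructed GoFs

def grouping_thread(gofs):
    for gof in gofs:                       # dynamic grouping itself is elided
        grouping_buffer.put(gof)           # blocks while the grouping buffer is full
    grouping_buffer.put(None)              # end-of-stream marker

def codec_thread():
    while (gof := grouping_buffer.get()) is not None:   # blocks while empty
        display_buffer.put([f.upper() for f in gof])    # stand-in for codec + channel
    display_buffer.put(None)

def display_thread(shown):
    while (gof := display_buffer.get()) is not None:
        shown.extend(gof)

shown = []
threads = [threading.Thread(target=grouping_thread, args=([["a", "b"], ["c"]],)),
           threading.Thread(target=codec_thread),
           threading.Thread(target=display_thread, args=(shown,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```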
6.3 The Rate Control Algorithm
6.3.1 Rate Control Overview
The rate control algorithm of the current system is based on a leaky bucket ap-
proach [7, 13, 14, 59]. The leaky bucket idea has been developed earlier for ATM
networks and other applications, but has never been considered for 3D wavelet com-
pression. As stated previously, all four computation threads are continuously active
if and only if both data buffers given in Figure 6.2 are neither full nor empty. There-
fore, the goal of the rate control algorithm is to keep the amount of data in both
buffers at a reasonable level while ensuring that the frame grabber rate and frame
display rate are constant and equal.
Also, the network bandwidth limitation has not yet been considered. With limited
bandwidth, all four of the threads cannot be completely active. In most applications,
the computational capacity of both platforms greatly exceeds the communication
bandwidth available. Therefore, a rate control algorithm must manage each thread's
computational activity. Figure 6.3 gives the completed rate control wavelet
communication system. The additions to the system given in Figure 6.2 are as follows:
Send Thread and Send Buffer – The most important part of the wavelet communi-
cation system is to maximally utilize the available bandwidth given by the communi-
cation channel, thus attempting to provide the highest possible frame rate. Therefore,
Figure 6.3: Complete rate control system.
another buffer and processing thread are created to continually send data at the max-
imum rate possible. The send buffer is inserted into the system to give the send thread
data to output through the channel. The compression thread's output bit rate
depends on the content of the input video, so the send buffer is necessary to achieve
continuous data throughput. The send thread also partitions the data
into smaller packets to enable the continuous flow of data.
Receive Thread and Receive Buffer – The receive thread is used to capture the
data packets from the communication channel, and the received data is stored in the
receive buffer. The send buffer and receive buffer need not be controlled. Given that
they are sufficiently large, the control of the grouping buffer and display buffer will
limit the amount of data that the send buffer and receive buffer must hold.
Send Monitor – The send monitor controls the rate at which the frame grabber
acquires each frame. Its decision is based on the amount of data in the grouping buffer.
The send monitor attempts to keep the grouping buffer fullness at a reasonable level by
adjusting the frame acquisition rate. However, the frame acquisition rate is confined
by the feedback provided by the receiver, because real-time communication requires
that the frame acquisition rate and display rate be equivalent. The send monitor
enforces the grouping buffer constraints, which are given in Subsection 6.3.2.
Receive Monitor – The receive monitor regulates the size of the receive buffer by
controlling the display rate at the receiver. The receive monitor attempts to keep the
display buffer fullness at a reasonable level by adjusting the display rate and enforcing
the display buffer constraints, which are given in Subsection 6.3.2.
Feedback – A virtual path over which the client sends information to the server. The
receive monitor uses the feedback path to ensure equivalent acquisition and display
rates.
The proposed leaky bucket control model reduces the number of variables in the
compression algorithm. Our interest lies only in rate control, not the specifics of
wavelet video compression. Therefore, the compression and decompression threads,
and network can be modeled as a single delay from transmitter to receiver. Figure
6.4 gives the control model for the rate control system. From the control model given
in Figure 6.4, we can develop the constraints of the grouping and display buffers.
6.3.2 Buffer Constraints
As shown in Subsection 6.3.1, the send monitor and receive monitor adjust the
flow of data into and out of the grouping buffer and display buffer, respectively, to
control buffer fullness. Therefore, it is necessary to analyze the constraints imposed
on both buffers by the send monitor and receive monitor.
The display buffer content is given by
\[
B^d_i = B^d_{i-1} + R_i - D_i, \tag{6.1}
\]
where i is the unit time, B^d_i is the display buffer fullness, R_i is the video recon-
struction rate, and D_i is the display frame rate. Also, since the display buffer has a
fixed size, it is also governed by
\[
0 \le B^d_i \le S_d, \tag{6.2}
\]
where S_d is the size of the display buffer. The receive monitor manages the size of
the display buffer by regulating D_i. Therefore
\[
D_i = \begin{cases} D_{i-1} - \delta_D, & \text{when } B^d_{i-1} < \varepsilon_d\\ D_{i-1}, & \text{when } \varepsilon_d \le B^d_{i-1} \le \phi_d\\ D_{i-1} + \delta_D, & \text{when } \phi_d < B^d_{i-1} \end{cases} \tag{6.3}
\]
where εd and φd are threshold levels corresponding to an almost empty and an almost
full display buffer, respectively, and δD corresponds to a modest change in the display
rate given by
\[
\delta_D = \alpha_D D_{i-1}, \tag{6.4}
\]
where αD is the percent change in display rate. Assuming a small value for αD, the
receive monitor applies a gradual reduction in the display rate when the display buffer
falls below εd, and a gradual increase when the display buffer rises above φd. The
gradual increase and decrease of frame rate is crucial in producing a high QoS for
the user.
The grouping buffer follows similar constraints.
\[
B^g_i = B^g_{i-1} + A_i - E_i, \tag{6.5}
\]
where B^g_i is the grouping buffer fullness, A_i is the frame acquisition rate, and
E_i is the compression rate. Similar to the display buffer, the grouping buffer is also
governed by
\[
0 \le B^g_i \le S_g, \tag{6.6}
\]
where S_g is the size of the grouping buffer. The grouping buffer fullness is controlled
by the send monitor, which regulates the frame acquisition rate A_i:
\[
A_i = \begin{cases} D_{i-1} + \delta_A, & \text{when } B^g_{i-1} < \varepsilon_g\\ D_{i-1}, & \text{when } \varepsilon_g \le B^g_{i-1} \le \phi_g\\ D_{i-1} - \delta_A, & \text{when } \phi_g < B^g_{i-1} \end{cases} \tag{6.7}
\]
where εg and φg are grouping buffer threshold levels similar to those of the display
buffer given in Equation 6.3, and δA corresponds to a modest change in the acquisition
rate given by
\[
\delta_A = \alpha_A D_{i-1}. \tag{6.8}
\]
αA is the percent change in acquisition rate. Note that the grouping buffer in the
server is controlled by the display rate of the client. The send monitor is provided
D_{i−1} by the receive monitor through the feedback path from client to server. Also,
\[
A_i \approx D_i, \qquad (\alpha_A, \alpha_D \ll 1) \tag{6.9}
\]
which is a requirement for real-time systems.
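The two monitor updates of Equations 6.3–6.4 and 6.7–6.8 share the same shape. A minimal sketch of the receive-monitor rule follows (the acquisition rule of Equation 6.7 mirrors it with the adjustments reversed); the function name and default αD are illustrative:

```python
def next_display_rate(D_prev, B_prev, eps_d, phi_d, alpha_D=0.01):
    """Equations 6.3-6.4: nudge the display rate down by a fraction alpha_D
    when the display buffer is nearly empty, up when it is nearly full,
    and leave it unchanged inside the [eps_d, phi_d] window."""
    delta_D = alpha_D * D_prev            # Eq. 6.4
    if B_prev < eps_d:                    # almost empty: slow the display
        return D_prev - delta_D
    if B_prev > phi_d:                    # almost full: speed the display up
        return D_prev + delta_D
    return D_prev                         # inside the window: hold the rate
```

Because each step changes the rate by at most a small fraction of itself, the displayed frame rate drifts smoothly rather than jumping, which is the gradual behaviour the text calls crucial for QoS.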
The compression algorithm can only operate on an entire GoF for temporal domain
compression. Therefore
\[
E_i = \begin{cases} C_N, & \text{when } i = G_N\\ 0, & \text{else,} \end{cases} \tag{6.10}
\]
and
\[
C_N \in \{1, 2, \ldots, \Gamma\}, \tag{6.11}
\]
where N is the GoF index, and GN corresponds to the unit time period when the last
frame of the N th group is acquired. CN depicts the size of the N th GoF, and Γ is
the maximum group size. Note that Γ is an important parameter to select. When
Γ is large, one is allowed to have more frames in a single group thus increasing the
compression ratio. On the other hand, a large Γ increases the delay time between
the acquisition and display of the video. Usually, Γ is selected to maximize the
compression ratio while staying within the delay requirement, which is application
specific.
Similar to Equation 6.10, the video reconstruction rate is given by
\[
R_i = \begin{cases} C_N, & \text{when } i = G_N + L_N\\ 0, & \text{else,} \end{cases} \tag{6.12}
\]
where L_N is the delay of the N th GoF from the grouping buffer to the display buffer,
as shown in Figure 6.4, caused by the compression and decompression computation
times and the network delay.
For the grouping buffer to neither overflow nor empty, it is necessary that
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n} A_i = \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n} E_i. \tag{6.13}
\]
As n increases, the system reaches steady state where the grouping buffer input rate
is equal to the grouping buffer output rate. Similarly, the display buffer input and
output rates become equal in steady state.
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n} R_i = \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n} D_i. \tag{6.14}
\]
The control of the buffers' fullness, given by Equations 6.3 and 6.7, is developed to
ensure the validity of Equations 6.13 and 6.14. The steady state of the buffers' fullness
is necessary for the success of the rate control algorithm. With steady-state data flow
through both buffers, the data flowing from the input of the grouping buffer to the
output of the display buffer approaches a constant rate, which is precisely what is
desired.
6.3.3 Grouping Buffer Design
The design parameters that need to be assigned for the grouping buffer are:
• The empty buffer threshold, εg.
• The full buffer threshold, φg.
• The grouping buffer size, Sg.
The empty buffer threshold
The basic idea of the grouping buffer is to continue to push more data through
the network until the maximum bandwidth available is utilized, or the computational
activity of one of the platforms is maximized. As seen from Equation 6.7, the grouping
thread continues to acquire frames at a slightly greater rate than the display thread in
an effort to continually push more data through the network. Also from Equation 6.10
we see that the grouping buffer empties when the last frame of a GoF is acquired. So
in an effort to keep constant the acquisition rate, and continually push the available
network bandwidth,
εg = Γ. (6.15)
With this threshold in place, the grouping thread will continually acquire frames at
a slightly greater rate than the display thread's frame rate, thus continually pushing
the bandwidth of the communication system.
The full buffer threshold and the grouping buffer size
With limited bandwidth it is possible for both the compression thread and the send
thread to be limited in the rate at which each can output data. Therefore, to
combat the possible overflow of both the send buffer and the grouping buffer, the value
of φg is determined.

If we look at the worst-case scenario of total network congestion, the grouping
thread may acquire up to φg frames before the send monitor starts to slow the
frame acquisition rate. Therefore, the value of φg is determined to be
φg = 2Γ. (6.16)
With this threshold in place, the grouping thread may acquire up to two GoF’s of the
maximum size before being penalized with a slowed acquisition rate. The size of the
grouping buffer is also determined:
Sg = φg + Γ = 3Γ. (6.17)
The size of the grouping buffer allows up to three GoFs of the maximum size to be
acquired with total network congestion. Therefore, the value of Sg gives enough space
for buffer overflow to be avoided.
The grouping buffer design is simple with fixed values for εg and φg, and mostly
governed by the frame rate of the display thread as seen in Equation 6.7. Therefore,
the display buffer design is the primary vehicle for rate control, which is discussed in
detail in the following subsection.
6.3.4 Display Buffer Design
There are several design parameters that need to be assigned for the display buffer:
• The initial buffering level, I.
• The empty buffer threshold, εd.
• The full buffer threshold, φd.
• The display buffer size, Sd.
The initial buffering level
Because the video frames are grouped by content, the groups are of different sizes
with a maximum threshold Γ, as given in Equation 6.11. Therefore, group sizes range
from 1 to Γ frames. As an example, assume the beginning of a video sequence contains
two groups: the first group consists of 1 frame, and the second group consists of Γ
frames. If the first group is sent to the receiver, and the receiver immediately displays
that frame after image reconstruction, the receiver will inevitably wait for the second
group to be sent with no frames in the display buffer, and a constant frame rate will
not be achieved. Therefore, an initial buffering level large enough to ensure constant
video display must exist.
From the previous example, it is obvious that the initial buffering level, I, must
be larger than Γ.
I ≥ Γ. (6.18)
However, the initial buffer level must also be larger than the empty buffer threshold,
εd. This is necessary to keep the display buffer level greater than εd to ensure that the
frame rate remains constant, as given in Equation 6.3. Therefore,
I ≥ Γ + εd. (6.19)
However, I directly corresponds to the initial waiting time for the receiver. If I is
chosen too large, the receiver will have an overly large initial buffering time, decreasing
the QoS. Therefore I must be kept at a minimum, and we choose
\[
I = \Gamma + \varepsilon_d. \tag{6.20}
\]
The empty buffer threshold
The variable delay LN , given in Equation 6.12, is used to calculate the minimum
value of εd needed to ensure that the display buffer never empties. From Equations
6.3 and 6.4 we can determine the average display rate during the critical empty buffer
warning level, i.e., B^d_{i−1} ≤ εd. First, we can determine the amount of time the buffer
has before it empties, without control. That is,
\[
\tau_c = \frac{\varepsilon_d}{D_i}. \tag{6.21}
\]
τc represents the critical time period before the display buffer is empty. We can now
assume control of the display buffer and then determine the estimated average display
rate:
\[
\left. D_{avg} \right|_{B^d_{i-1} \le \varepsilon_d} > \frac{D_i + (D_i - \delta_D \tau_c)}{2} = D_i - \frac{\varepsilon_d \alpha_D}{2}. \tag{6.22}
\]
Note that Equation 6.22 is merely an estimate of the average display rate during
the warning level; the exact expression is a polynomial of degree εd − 1. In Equation
6.22, we assume δD to be constant when in reality its value changes with each change
in the display rate, as seen in Equation 6.4. The choice to use this estimate is based
on computational simplicity and algorithmic intuitiveness.
Moreover, we know that enough frames must exist in the display buffer to keep
displaying throughout the delay of the next GoF, LN+1. Therefore,
\[
\frac{\varepsilon_d}{\left. D_{avg} \right|_{B^d_{i-1} \le \varepsilon_d}} \ge L_{N+1}. \tag{6.23}
\]
Solving for εd and substituting in Equation 6.22, we obtain
\[
\varepsilon_d \ge \frac{2 L_{N+1} D_i}{2 + \alpha_D L_{N+1}}. \tag{6.24}
\]
In practice, however, the variable delay L_{N+1} is greatly dependent on the size of
the next GoF, which is unknown. Therefore, for a worst-case scenario, we compute
the average delay per frame, Lf, and multiply by Γ to estimate the delay of a GoF
consisting of Γ frames. Therefore,
\[
\varepsilon_d \ge \frac{2 L_f \Gamma D_i}{2 + \alpha_D L_f \Gamma}. \tag{6.25}
\]
The average delay per frame can then be obtained by
\[
L_f = \frac{L_N}{C_N}. \tag{6.26}
\]
The value of LN is determined by calculating the round-trip time (RTT) of the com-
pressed GoF from client to server, dividing by 2, and adding the computation times
of the compression and decompression algorithms.
Again, to ensure the minimum possible delay for I, and substituting in for Lf, we
obtain
\[
\varepsilon_d = \frac{2 L_N \Gamma D_i}{2 C_N + \alpha_D L_N \Gamma}, \tag{6.27}
\]
and, substituting into Equation 6.20, we have
\[
I = \Gamma\left(1 + \frac{2 L_N D_i}{2 C_N + \alpha_D L_N \Gamma}\right). \tag{6.28}
\]
The full buffer threshold and the display buffer size
The full buffer threshold, φd, is set Γ greater than I in order to produce an
AWCF that is 2Γ in size. Therefore
\[
\phi_d = \Gamma\left(2 + \frac{2 L_N D_i}{2 C_N + \alpha_D L_N \Gamma}\right). \tag{6.29}
\]
The display frame rate is constant whenever the buffer fullness is within this window.
Also, the display buffer size can be arbitrarily set greater than φd. We find that
\[
S_d = 4\Gamma \tag{6.30}
\]
gives enough space for the AWCF to move.
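Putting Equations 6.20 and 6.27 through 6.30 together, the display buffer design reduces to a few lines. The operating point below (Γ = 64, an 8 fps display rate, a 2 s per-GoF delay over a 16-frame group, αD = 0.01) is purely illustrative, not a measurement from the experiments:

```python
def display_buffer_design(Gamma, D_i, L_N, C_N, alpha_D):
    """Display buffer design parameters of Section 6.3.4."""
    eps_d = 2 * L_N * Gamma * D_i / (2 * C_N + alpha_D * L_N * Gamma)  # Eq. 6.27
    I = Gamma + eps_d                                                  # Eq. 6.20
    phi_d = I + Gamma                                                  # Eq. 6.29
    S_d = 4 * Gamma                                                    # Eq. 6.30
    return eps_d, I, phi_d, S_d

# Illustrative operating point: Gamma=64 frames, 8 fps, 2 s GoF delay, 16-frame GoF.
eps_d, I, phi_d, S_d = display_buffer_design(64, 8.0, 2.0, 16, 0.01)
```

Note that φd − εd = 2Γ regardless of the operating point, which is the AWCF size given in the text.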
6.4 Experimental Results
The communication system is developed, and a test is run with a maximum group
size Γ of 64, an αA of 0.1, and an αD of 0.01. These parameters are found to produce
quality results, but their values are determined empirically, and without analysis
beyond the requirements given by Equation 6.9. The video is run for approximately
20 minutes. Also, for evaluation purposes, the initial display rate of the receiver is
deliberately set to a higher frame rate than the communication system can sustain.
The video sample has a 320x240 color frame size, and the initial frame rate, D0, is set
at 12 fps. The display frame rate, as well as the display buffer size, is given in Figure
6.5.
[Two plots versus time (minutes): the display frame rate (fps), and the display buffer size (frames) with the lower threshold ε and upper threshold φ marked.]
Figure 6.5: Display frame rate and display buffer size, D0=12 fps.
As seen in Figure 6.5, the rate control algorithm does reduce the frame rate until
steady state is found. Also, the frame rate stays constant unless the buffer fullness
reaches beyond the threshold levels of the AWCF, as given in Equation 6.3. Therefore,
the control algorithm produces a smooth and continuous frame rate for real-time video
communication.
The results of the acquisition frame rate and grouping buffer fullness are given in
Figure 6.6.
[Two plots versus time (minutes): the frame acquisition rate alongside the display rate (fps), and the grouping buffer size (frames) with the lower threshold ε and upper threshold φ marked.]
Figure 6.6: Frame acquisition rate and grouping buffer size, D0=12 fps.
As seen in Figure 6.6, the frame acquisition rate follows the frame display rate
as given in Equation 6.9. However, the acquisition rate is slightly higher than the
display rate. This is due to the grouping buffer fullness, which remains below the
empty buffer threshold, as shown in Equation 6.7.
The same video is run again, but the initial frame rate is set to 2 fps, intentionally
slower than the maximum frame rate the network can handle. The display frame
rate, as well as the display buffer fullness, is given in Figure 6.7.
[Two plots versus time (minutes): the display frame rate (fps), and the display buffer size (frames) with the lower threshold ε and upper threshold φ marked.]
Figure 6.7: Display frame rate and display buffer size, D0=2 fps.
As seen in Figure 6.7, the frame rate slowly reaches a steady-state frame rate
of approximately 8 fps, the same steady-state frame rate as given in Figure 6.5.
Therefore, the rate control algorithm does converge to a frame rate that maximally
utilizes the capacity of the platforms and the network. Figure 6.8 gives the acquisition
frame rate and grouping buffer fullness.
[Two plots versus time (minutes): the frame acquisition rate alongside the display rate (fps), and the grouping buffer size (frames) with the lower threshold ε and upper threshold φ marked.]
Figure 6.8: Frame acquisition rate and grouping buffer size, D0=2 fps.
Figure 6.8 indeed shows that the frame acquisition rate and display rate are close
to being equal, as given in Equation 6.9. Thus, the rate control algorithm continually
monitors the capacity of the network and adjusts the frame rate accordingly.
6.5 Discussion
We have developed a rate control algorithm designed for a content-based 3D
wavelet video compression scheme, used for real-time video transfer. With the GoF
requirement of 3D wavelet compression, an inherent delay is introduced in the trans-
mission of real-time video. Also because the wavelet transform is a content-based
compression scheme, the compression and decompression times vary with each group,
and the compressed file size also varies between differing GoF’s of the same size. A
rate control algorithm is designed to supply a smooth and continuous frame rate from
server to client in an environment with a variable and unknown network delay such
as the Internet and a compression scheme which allows for variable GoF sizes.
A buffering mechanism is developed on both the client and server sides to ensure
the continuous display of reconstructed image frames. On the server side, a grouping
buffer is sized according to the maximum GoF size. On the client side, a display
buffer is sized according to the maximum GoF size as well as the variable delay of
the network. As shown in the experimental results, the AWCF is able to provide
continuous video to the client despite the inherent characteristics of content-based
3D wavelet compression and real-time video transfer. In addition, a feedback
mechanism from the client controls the server's buffer content and ensures that the
acquisition rate of the server and the display rate of the client are equal.
Experimental results demonstrate that the rate control algorithm is effective for the
content-based 3D wavelet video compression scheme.
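The feedback rule can be sketched as follows; the additive step size `delta` and the rate bounds are illustrative assumptions, not the exact control law of the AWCF:

```python
def adjust_acquisition_rate(rate_fps, buffer_frames, eps, phi,
                            delta=0.5, min_rate=1.0, max_rate=10.0):
    """One step of a leaky-bucket style feedback rule (illustrative).

    If the grouping buffer grows past the upper threshold phi, the server
    is acquiring frames faster than the client can display them, so slow
    down; if it drains below the lower threshold eps, speed up.
    """
    if buffer_frames > phi:
        rate_fps -= delta
    elif buffer_frames < eps:
        rate_fps += delta
    # keep the rate within the platform's feasible range
    return max(min_rate, min(max_rate, rate_fps))
```

Applied once per monitoring interval, this drives the buffer occupancy back between ε and φ, which is what keeps the acquisition and display rates equal in the steady state.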
CHAPTER 7
Conclusions and Future Work
This dissertation presents several methods to improve the state-of-the-art in video
compression and communication technology. This concluding chapter summarizes the
research presented and specifies contributions made. Also, various topics are identified
for future research.
7.1 Contributions
Noise removal in natural digital imagery is an important part of many different
imaging systems. Denoising methods based on the non-decimated wavelet transform
have been shown to achieve a large PSNR increase. However, the computational burden
of previous wavelet-based noise removal algorithms is too large for real-time imaging
systems. Thus, a two-threshold criterion for coefficient selection in image denoising
has been developed to ease the computational burden associated with the coefficient
selection process. The two thresholds are defined using a training-sample approach:
the training images are artificially corrupted with AWGN and denoised at several
threshold levels, and the threshold levels which produce the minimum error relative
to the optimal denoising method are used in the general case. The resulting image
denoising algorithm is not only roughly an order of magnitude less computationally
complex, but it also shows an improvement in PSNR when compared to other
wavelet-based denoising algorithms given in the literature.
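As a rough illustration of the two-threshold selection idea (a sketch, not the dissertation's exact criterion), coefficients above an upper threshold are always kept, coefficients below a lower threshold are always discarded, and in-between coefficients are kept only when they have sufficient 8-connected support; the thresholds would come from the training procedure described above:

```python
import numpy as np

def two_threshold_select(coeffs, t_low, t_high, min_support=2):
    """Two-threshold wavelet coefficient selection (illustrative sketch)."""
    mag = np.abs(coeffs)
    strong = mag >= t_high          # always kept
    candidate = mag >= t_low        # potentially kept
    # count 8-connected candidate neighbours for every position
    padded = np.pad(candidate.astype(int), 1)
    support = np.zeros_like(coeffs, dtype=int)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di or dj:
                support += padded[1 + di: 1 + di + coeffs.shape[0],
                                  1 + dj: 1 + dj + coeffs.shape[1]]
    keep = strong | (candidate & (support >= min_support))
    return np.where(keep, coeffs, 0.0)
```

Because the in-between coefficients are resolved by a cheap neighbourhood count rather than a per-coefficient statistical test, the selection step stays inexpensive.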
The removal of noise from video signals is important in the development of high-quality
video systems. Therefore, a video denoising technique is described in this dissertation.
The technique first applies the image denoising method described in this work for
spatial-domain denoising, then applies a selective wavelet shrinkage algorithm for
temporal-domain denoising. The temporal-domain stage uses estimates of both the noise
level and the motion in the image sequence to determine the amount of filtering that
will improve the quality of the video signal. This video denoising technique is more
effective in noise removal and achieves a better average PSNR than the limited number
of methods presented in the literature.
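The motion/noise trade-off in the temporal stage can be conveyed with a minimal sketch; here selective wavelet shrinkage is replaced by a simple three-frame average, and the `motion_thresh` multiplier is an illustrative assumption:

```python
import numpy as np

def temporal_denoise(frames, noise_sigma, motion_thresh=3.0):
    """Motion-adaptive temporal filtering (illustrative sketch).

    A pixel is averaged with its temporal neighbours only where the
    frame differences are small relative to the noise level; large
    differences are treated as motion and the pixel is left unfiltered.
    """
    frames = np.asarray(frames, dtype=float)
    out = frames.copy()
    for t in range(1, len(frames) - 1):
        prev_d = np.abs(frames[t] - frames[t - 1])
        next_d = np.abs(frames[t] - frames[t + 1])
        static = (prev_d < motion_thresh * noise_sigma) & \
                 (next_d < motion_thresh * noise_sigma)
        avg = (frames[t - 1] + frames[t] + frames[t + 1]) / 3.0
        out[t] = np.where(static, avg, frames[t])
    return out
```

The key point mirrored from the text is that the amount of temporal filtering is gated jointly by the noise estimate and the motion estimate, so moving regions are not smeared.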
Also, a virtual-object compression method is developed to provide the compression
gain that object-based compression methods promise, without the many difficulties
those methods pose. With virtual-object compression, the stationary background is
separated from moving objects, and each is compressed independently. This independent
compression of objects and background gives virtual-object compression an improvement
in PSNR over 3D wavelet compression.
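A minimal sketch of the background/object separation step, using a temporal median as the background estimate (an illustrative stand-in for the dissertation's actual segmentation, with a hypothetical `diff_thresh`):

```python
import numpy as np

def split_background_objects(gof, diff_thresh=10.0):
    """Split a GoF into a static background and per-frame object masks.

    The temporal median over the group estimates the stationary
    background; pixels far from it are flagged as moving objects and
    would be compressed separately from the background.
    """
    gof = np.asarray(gof, dtype=float)
    background = np.median(gof, axis=0)           # one background image
    masks = np.abs(gof - background) > diff_thresh  # per-frame object pixels
    return background, masks
```

Compressing the single background image once and only the masked object regions per frame is where the compression gain over whole-frame 3D wavelet coding comes from.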
Real-time delivery of compressed video is a challenging problem because of the
many uncertain factors involved, such as the computational capacity of the client
and server platforms, the bandwidth and congestion of the network, and the inherent
acquisition delay of each GoF. We have provided a real-time video communication
solution which combats the many problems associated with real-time video delivery
over lossy channels by means of a rate control algorithm based on a leaky-bucket
approach. Both sender and receiver include an independent monitoring thread which
adjusts the acquisition and display rates, respectively, to ensure proper management
of the video stream. The result is real-time video delivery over a lossy channel.
Together, these contributions yield a high-quality real-time video compression and
transmission system.
7.2 Future Work
Although this work provides promising techniques to boost the overall performance
of 3D wavelet compression, many issues must still be addressed before wavelet-based
video compression is suitable for industry standards. In this section we outline a
few areas of related study:
• Currently, wavelet-based image and video compression systems use one particular
wavelet in the transformation of the original signal, and that wavelet is chosen
experimentally. However, for different input signals, different wavelet functions
may provide better results. Thus, it would be beneficial to analyze the statistics
of the input signal prior to compression in order to select the wavelet which most
compactly represents that signal. Also, in multiresolution analysis, the same
wavelet need not be used at each level of decomposition. Such signal analysis and
wavelet selection could provide a compression system that is well suited to all
types of imaging and video signals (e.g., long-wave infrared (LWIR), short-wave
infrared (SWIR), synthetic aperture radar (SAR), etc.).
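One way such signal-adaptive selection could work is to transform the signal with each candidate wavelet and keep the one whose coefficients have the lowest first-order entropy; the sketch below uses periodized single-level filter banks for Haar and Daubechies-2 as illustrative candidates, and the quantization step is an assumption:

```python
import numpy as np

# Analysis low-pass filters for two candidate wavelets; the high-pass
# filter is derived by the standard quadrature-mirror relation.
HAAR = np.array([1.0, 1.0]) / np.sqrt(2.0)
DB2 = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
                3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))

def dwt_level(signal, lo):
    """One level of a periodized DWT with the given low-pass filter."""
    hi = lo[::-1] * (-1.0) ** np.arange(len(lo))
    n = len(signal)
    a = np.array([sum(lo[k] * signal[(2 * i + k) % n] for k in range(len(lo)))
                  for i in range(n // 2)])
    d = np.array([sum(hi[k] * signal[(2 * i + k) % n] for k in range(len(lo)))
                  for i in range(n // 2)])
    return a, d

def coeff_entropy(coeffs, step=0.5):
    """First-order entropy (bits/coefficient) after uniform quantization."""
    q = np.round(np.concatenate(coeffs) / step).astype(int)
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def pick_wavelet(signal, candidates={"haar": HAAR, "db2": DB2}):
    """Return the candidate whose coefficients have the lowest entropy."""
    return min(candidates,
               key=lambda name: coeff_entropy(dwt_level(signal, candidates[name])))
```

Lower coefficient entropy is a reasonable proxy for compactness because the entropy coder's output rate is bounded below by it; a per-level selection would repeat this test at each decomposition level.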
• Currently, the image and video denoising algorithms are not computationally
efficient enough for real-time imaging and video systems. The image denoising
algorithm developed in this work denoises a 320x240 grayscale image in approximately
1 second, roughly 30 times slower than needed for real-time operation. The video
denoising algorithm carries an added computational load from the temporal-domain
processing; it denoises a 320x240x64 grayscale GoF in approximately 1.5 minutes.
A computational speedup of greater than 30 is most likely unattainable through
software optimization alone. Thus, a hardware implementation is necessary for
real-time applications.
• This dissertation in part defines an image and a video denoising algorithm.
These algorithms are designed to remove AWGN from images and video signals and
have been shown to give higher PSNR than other methods in the literature. However,
AWGN is only one of the many types of noise found in the image and video capture
process: fixed-pattern noise, shot noise, thermal noise, correlated noise, and
speckle also corrupt many different capture processes. Thus, for an image/video
denoising algorithm to be most useful in industry, the capture process must be
studied and the types of noise corruption involved in that process identified.
An image/video denoising process may then be developed that is tailored to removing
the noise produced by that capture process.
• Much of the work in this dissertation concerns the removal of noise from signals
prior to compression. The removal of noise facilitates compression by reducing the
entropy of the signal while improving the signal quality. However, the removal of
artifacts generated by the compression algorithm after reconstruction is also an
important processing step. Post-processing is used in most modern compression
systems: in both the JPEG and MPEG standards, there exist filtering algorithms to
remove the blocking artifacts associated with the block-based DCT used in the
compression engine. Thus, it would be fruitful to develop a post-processing method
to remove the artifacts generated by wavelet-based compression methods.
• This dissertation uses PSNR as the metric for quality, for reasons of legacy and
consistency: most of the image and video processing community continues to publish
results using PSNR, so to compare results with other methods we use PSNR as well.
However, in Chapter 3 we briefly mentioned some metrics that may be closer to the
human perception of quality. New denoising and compression methods can and should
be evaluated with not one but several quality metrics; in this way, researchers can
be more confident about the performance of such algorithms.
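For reference, PSNR against an 8-bit peak value is computed as follows:

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit imagery."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(test, float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Because PSNR depends only on the mean squared error, two distortions with very different perceptual impact can score identically, which is exactly the limitation the perceptual metrics of Chapter 3 aim to address.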
APPENDIX A
Computation of S·,k[x, y]
The computation of S·,k[x, y] is given by the following algorithm:
~N = {[−1,−1], [−1, 0], [−1, 1], [0,−1], [0, 1], [1,−1], [1, 0], [1, 1]}
O[·] = 0,  t = 0,  p = 0,  ~D·,k(0) = (x, y)
if I·,k[x, y] == 1,
    while ~D·,k(t) ≠ NULL,
        (i, j) = ~D·,k(t)
        t = t + 1
        for m = 0 to 7,
            if (I·,k[(i, j) + ~N(m)] == 1) and (O[(i, j) + ~N(m)] == 0),
                p = p + 1
                ~D·,k(p) = (i, j) + ~N(m)
                O[(i, j) + ~N(m)] = 1
            end if
        end for
    end while
end if
S·,k[x, y] = t .    (A.1)
O[x, y] is a Boolean flag indicating whether a particular I·,k[x, y] value has
already been counted. ~D is an array of spatial coordinates of valid coefficients
that support the current coefficient value I·,k[x, y], and ~N is the set of offset
vectors to the eight neighboring coefficient positions.
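Algorithm (A.1) amounts to a breadth-first count of the 8-connected region of selected coefficients containing (x, y). A direct transcription, assuming I·,k is an indexable 2-D 0/1 array; the bounds check is added to handle image borders, which the pseudocode leaves implicit:

```python
from collections import deque

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]

def support_size(I, x, y):
    """Size of the 8-connected region of 1-valued entries of I that
    contains (x, y); zero when I[x][y] itself is 0 (the final t of
    Algorithm (A.1))."""
    if I[x][y] != 1:
        return 0
    rows, cols = len(I), len(I[0])
    visited = {(x, y)}            # plays the role of O[.]
    queue = deque([(x, y)])       # plays the role of D
    count = 0                     # plays the role of t
    while queue:
        i, j = queue.popleft()
        count += 1
        for di, dj in NEIGHBOURS:
            ni, nj = i + di, j + dj
            if (0 <= ni < rows and 0 <= nj < cols
                    and I[ni][nj] == 1 and (ni, nj) not in visited):
                visited.add((ni, nj))
                queue.append((ni, nj))
    return count
```

Marking the seed as visited up front also closes a small gap in the pseudocode, where O is never set for (x, y) itself.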