VIDEO COMPRESSION AND RATE CONTROL
METHODS BASED ON THE WAVELET TRANSFORM
DISSERTATION
Presented in Partial Fulfillment of the Requirements for
the Degree Doctor of Philosophy in the
Graduate School of The Ohio State University
By
Eric J. Balster, B.S., M.S.
* * * * *
The Ohio State University
2004
Dissertation Committee:
Yuan F. Zheng, Adviser
Ashok K. Krishnamurthy
Steven B. Bibyk
Approved by
Adviser
Department of Electrical and Computer Engineering
© Copyright by
Eric J. Balster
2004
ABSTRACT
Wavelet-based image and video compression techniques have become popular areas of research. In March of 2000, the Joint Photographic Experts Group (JPEG) released JPEG2000, a wavelet-based image compression standard predicted to completely replace the original JPEG standard. In the field of video compression, a technique called 3D wavelet compression shows promise. As a result, wavelet-based compression techniques have received increasing attention from the research community.
This dissertation further investigates the wavelet transform in the compression of image and video signals and develops a rate control method for the real-time transfer of wavelet-based compressed video.
A pre-processing algorithm based on the wavelet transform is developed for the removal of noise in images prior to compression. The intelligent removal of noise reduces the entropy of the original signal, aiding compressibility. The proposed wavelet-based denoising method shows a computational speedup of at least an order of magnitude over previously established image denoising methods, along with a higher peak signal-to-noise ratio (PSNR).
A video denoising algorithm is also included which eliminates both intra- and inter-frame noise. The inter-frame noise removal technique estimates the amount of motion in the image sequence. Using motion and noise level estimates, a video denoising technique is established which is robust to varying levels of noise corruption and motion.
A virtual-object video compression method is also included. Object-based compression methods have come to the forefront of the research community with the adoption of the MPEG-4 (Moving Picture Experts Group) standard. Object-based compression methods promise higher compression ratios without further cost in reconstructed quality. Results show that virtual-object compression outperforms 3D wavelet compression, achieving both a higher compression ratio and a higher PSNR.
Finally, a rate-control method is developed for the real-time transmission of wavelet-based compressed video. Wavelet compression schemes demand a rate-control algorithm for real-time video communication systems. Using a leaky-bucket design approach, the proposed rate-control method manages the uncertainty in the acquisition time of the group of frames (GoF), the computation time of the compression/decompression algorithms, and the network delay. Results show good management and control of the buffers and minimal variance in frame rate.
To my parents
ACKNOWLEDGMENTS
I would like to express my sincere gratitude to my advisor Professor Yuan F. Zheng
for his constant encouragement, shrewd guidance, and financial support throughout
my years at The Ohio State University (OSU). I have benefited from his expert tech-
nical knowledge in science and engineering and learned from his creative and novel
solutions to many research problems. It has truly been an honor and a privilege to
study under his guidance. I would also like to thank Professors Ashok K. Krishna-
murthy and Steven B. Bibyk for serving on my committee and providing feedback on
this dissertation.
It has been my pleasure to work with my colleagues in the Wavelet Research Group at OSU. Specifically, I would like to thank Ms. Yi Liu and Mr. Zhigang (James) Gao for their continual help with the many technical problems I came across over the years and for their computer support, which is second to none. I would also
like to thank my former colleagues Dr. Jianyu (Jane) Dong (currently at California
State University) and Mr. Chao He (currently at Microsoft Corp.) for helping me to
become acclimated to our research group and to the university during the beginning
of my studies. Both Jane and Chao were also helpful in many productive discussions
concerning wavelet-based compression of video signals.
I would like to thank both the Dayton Area Graduate Studies Institute (DAGSI)
and the Air Force Research Laboratory (AFRL) for funding this research.
I want to give a special thanks to the AFRL Embedded Information Systems
Engineering Branch (IFTA) for their continued support over the years. Everyone
in the branch has been very encouraging and supportive throughout my studies.
Specifically, I would like to thank Mr. James Williamson and Mr. Eugene Blackburn
for giving me the opportunity to work at AFRL, an institution of superb research
and state-of-the-art technology. Thanks to Dr. Robert L. Ewing for his tutelage and
advice through many milestones over the years. I would also like to thank Mr. Al
Scarpelli for his support and help during many projects.
Lastly, I would also like to thank my family for their love and encouragement.
Susan, Craig, Jenny, Michael, Megan, Evan, Mom, and Dad, you have always been
a very supportive and loving family. Without you all, I would not have been able to pursue
my goals.
VITA
Dec. 24, 1975 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Born - Dayton, OH
May 1998 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B.S. Electrical Engineering, University of Dayton, Dayton, OH
Aug. 1998 - Aug. 1999 . . . . . . . . . . . . . . . . . . Graduate Teaching Assistant, Electrical Engineering, University of Dayton, Dayton, OH
Aug. 1999 - May 2000 . . . . . . . . . . . . . . . . . . Graduate Research Assistant, Electrical Engineering, University of Dayton, Dayton, OH
May 2000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M.S. Electrical Engineering, University of Dayton, Dayton, OH
Sept. 2000 - June 2002 . . . . . . . . . . . . . . . . . Graduate Research Associate, Electrical Engineering, The Ohio State University, Columbus, OH
July 2002 - present . . . . . . . . . . . . . . . . . . . . Associate Electronics Engineer, Embedded Information Systems Engineering Branch, Air Force Research Laboratory, Wright-Patterson AFB, OH
PUBLICATIONS
Research Publications
Eric J. Balster, Yuan F. Zheng, and Robert L. Ewing, "Combined Spatial and Temporal Domain Wavelet Shrinkage Algorithm for Video Denoising", submitted to IEEE Transactions on Circuits and Systems for Video Technology, Apr. 2004.

Eric J. Balster, Yuan F. Zheng, and Robert L. Ewing, "Combined Spatial and Temporal Domain Wavelet Shrinkage Algorithm for Video Denoising", in Proc. IEEE International Conference on Communication Systems, Networks, and Digital Signal Processing, March 2004.

Eric J. Balster, Yuan F. Zheng, and Robert L. Ewing, "Feature-Based Wavelet Shrinkage Algorithm for Image Denoising", submitted with one revision to IEEE Transactions on Image Processing, Feb. 2004.

Eric J. Balster, Yuan F. Zheng, and Robert L. Ewing, "Fast, Feature-Based Wavelet Shrinkage Algorithm for Image Denoising", in Proc. IEEE International Conference on Integration of Knowledge Intensive Multi-Agent Systems, pp. 722-728, Oct. 2003.

Eric J. Balster, Waleed W. Smari, and Frank A. Scarpino, "Implementation of Efficient Wavelet Image Compression Algorithms using Reconfigurable Devices", in Proc. IASTED International Conference on Signal and Image Processing, pp. 249-256, Aug. 2003.

Eric J. Balster and Yuan F. Zheng, "Constant Quality Rate Control for Content-based 3D Wavelet Video Communication", in Proc. World Congress on Intelligent Control and Automation, pp. 2056-2060, June 2002.

Eric J. Balster and Yuan F. Zheng, "Real-Time Video Rate Control Algorithm for a Wavelet-Based Compression Scheme", in Proc. IEEE Midwest Symposium on Circuits and Systems, pp. 492-496, Aug. 2001.

Eric J. Balster, Frank A. Scarpino, and Waleed W. Smari, "Wavelet Transform for Real-Time Image Compression Using FPGAs", in Proc. IASTED International Conference on Parallel and Distributed Computing and Systems, pp. 232-238, Nov. 2000.
FIELDS OF STUDY
Major Field: Electrical Engineering
Studies in:
Communication and Signal Processing
Circuits and Electronics
Mathematics
TABLE OF CONTENTS
Page
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Vita . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxii
Chapters:
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 A Review of Current Compression Standards . . . . . . . . . . . . 1
1.1.1 Image Compression Standard (JPEG) . . . . . . . . . . . . 1
1.1.2 JPEG2000 Image Compression Standard . . . . . . . . . . . 2
1.1.3 Video Compression Standards (H.26X and MPEG-X) . . . . 3
1.2 Motivation for Wavelet Image Compression Research . . . . . . . . 6
1.2.1 Wavelet Image Compression vs. JPEG Compression . . . . 6
1.2.2 Wavelet Image Pre-processing . . . . . . . . . . . . . . . . . 9
1.3 Motivation for Wavelet Video Compression Research . . . . . . . . 11
1.3.1 Video Signal Pre-processing for Noise Removal . . . . . . . 12
1.3.2 Virtual-Object Based Video Compression . . . . . . . . . . 13
1.4 Motivation for the Rate Control of Wavelet-Compressed Video . . . 14
1.5 Dissertation Overview . . . . . . . . . . . . . . . . . . . . . . . . . 15
2. Wavelet Theory Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1 Scaling Function and Wavelet Definitions . . . . . . . . . . . . . . 17
2.2 Scaling Function and Wavelet Restrictions . . . . . . . . . . . . . . 20
2.3 Wavelet Filterbank Analysis . . . . . . . . . . . . . . . . . . . . . . 20
2.4 Wavelet Filterbank Synthesis . . . . . . . . . . . . . . . . . . . . . 22
2.5 Two-Dimensional Wavelet Transform . . . . . . . . . . . . . . . . . 22
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3. Feature-Based Wavelet Selective Shrinkage Algorithm for Image Denoising 25
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 2D Non-Decimated Wavelet Analysis and Synthesis . . . . . . . . . 30
3.3 Retention of Feature-Supporting Wavelet Coefficients . . . . . . . . 33
3.4 Selection of Threshold τ and Support s . . . . . . . . . . . . . . . . 39
3.5 Estimation of Parameter Values . . . . . . . . . . . . . . . . . . . . 49
3.5.1 Noise Estimation . . . . . . . . . . . . . . . . . . . . . . . . 49
3.5.2 Parameter Estimation . . . . . . . . . . . . . . . . . . . . . 49
3.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4. Combined Spatial and Temporal Domain Wavelet Shrinkage Algorithm for Video Denoising . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2 Temporal Denoising and Order of Operations . . . . . . . . . . . . 62
4.2.1 Temporal Domain Denoising . . . . . . . . . . . . . . . . . 62
4.2.2 Order of Operations . . . . . . . . . . . . . . . . . . . . . . 64
4.3 Proposed Motion Index . . . . . . . . . . . . . . . . . . . . . . . . 66
4.3.1 Motion Index Calculation . . . . . . . . . . . . . . . . . . . 66
4.3.2 Motion Index Testing . . . . . . . . . . . . . . . . . . . . . 67
4.4 Temporal Domain Parameter Selection . . . . . . . . . . . . . . . . 69
4.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5. Virtual-Object Video Compression . . . . . . . . . . . . . . . . . . . . . 86
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.2 3D Wavelet Compression . . . . . . . . . . . . . . . . . . . . . . . 89
5.2.1 2D Wavelet Transform . . . . . . . . . . . . . . . . . . . . . 89
5.2.2 2D Quantization . . . . . . . . . . . . . . . . . . . . . . . . 91
5.2.3 3D Wavelet Transform . . . . . . . . . . . . . . . . . . . . . 91
5.2.4 3D Quantization . . . . . . . . . . . . . . . . . . . . . . . . 92
5.2.5 3D Wavelet Compression Results . . . . . . . . . . . . . . . 95
5.3 Virtual-Object Compression . . . . . . . . . . . . . . . . . . . . . . 97
5.3.1 Virtual-Object Definitions . . . . . . . . . . . . . . . . . . . 97
5.3.2 Virtual-Object Extraction Method . . . . . . . . . . . . . . 98
5.3.3 Virtual-Object Coding . . . . . . . . . . . . . . . . . . . . . 102
5.4 Performance Comparison Between 3D Wavelet and Virtual-Object Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6. Constant Quality Rate Control for Content-Based 3D Wavelet Video Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.2 Multi-Threaded, Content-Based 3D Wavelet Compression . . . . . 109
6.3 The Rate Control Algorithm . . . . . . . . . . . . . . . . . . . . . 112
6.3.1 Rate Control Overview . . . . . . . . . . . . . . . . . . . . . 112
6.3.2 Buffer Constraints . . . . . . . . . . . . . . . . . . . . . . . 114
6.3.3 Grouping Buffer Design . . . . . . . . . . . . . . . . . . . . 118
6.3.4 Display Buffer Design . . . . . . . . . . . . . . . . . . . . . 120
6.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
7. Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . 129
7.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Appendices:
A. Computation of S·,k[x, y] . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
LIST OF VARIABLES
In this dissertation, the following variables are used:
Greek Variables:
• α[x, y, z]: Boolean value of position (x, y, z) indicating the presence of back-
ground information
• αk[n]: Non-decimated scaling coefficient of scale k and position n
• αll,k[x, y]: Two-dimensional non-decimated scaling coefficient of scale k and spatial position (x, y)
• α3Dk [l, z]: Non-decimated scaling coefficient of level k, spatial position l, and
frame z, generated by temporal domain transformation
• α̂ll,k[x, y]: Reconstructed non-decimated scaling coefficient of spatial position (x, y)
• α̂optll,k[x, y]: Optimally reconstructed non-decimated scaling coefficient of spatial position (x, y)
• αA: Percent change in frame acquisition rate
• αD: Percent change in display rate
• γx[z]: Leftmost position of the virtual-object in frame z
• γy[z]: Highest vertical position of the virtual-object in frame z
• Γ: The maximum size of a group of frames (GoF)
• δA: Incremental change in the frame acquisition rate
• δD: Incremental change in the display rate
• εd: Empty display buffer warning threshold
• εg: Empty grouping buffer warning threshold
• εx[z]: Rightmost position of the virtual-object in frame z
• εy[z]: Lowest vertical position of the virtual-object in frame z
• η(x, y): Two-dimensional noise function value at spatial position (x, y)
• λk[n]: Non-decimated wavelet coefficient of scale k and position n
• λhl,k[x, y]: Two-dimensional non-decimated wavelet coefficient, high-low sub-
band, of scale k and spatial position (x, y)
• λlh,k[x, y]: Two-dimensional non-decimated wavelet coefficient, low-high sub-
band, of scale k and spatial position (x, y)
• λhh,k[x, y]: Two-dimensional non-decimated wavelet coefficient, high-high sub-
band, of scale k and spatial position (x, y)
• λ3Dk [l, z]: Non-decimated wavelet coefficient of level k, spatial position l, and
frame z, generated by temporal domain transformation
• λ·,k[x, y]: Non-decimated wavelet coefficient of level k and spatial position (x, y),
generated by the wavelet transform of f(·)
• λvo[x, y, z]: Non-decimated wavelet coefficient of position (x, y, z) used to de-
termine location of the virtual-object
• µl: Temporal mean of spatially averaged pixel values, Azl
• σn: Standard deviation of η(·)
• σ̂n: Estimated standard deviation of η(·)
• τ : Threshold used in image denoising
• τc: The critical time period before the display buffer is empty
• τm(·): Optimal threshold function used in image denoising
• τ̂m(·): Estimated threshold function used in image denoising
• τvo: Threshold used to determine motion in the wavelet coefficients, λvo[·]
• τz[·]: temporal domain threshold for video denoising
• φd: Full display buffer warning threshold
• φg: Full grouping buffer warning threshold
• Φ(t): Scaling function
• Φk,n(t): Scaling function of scale k and shift n
• Ψ(t): Mother wavelet
• Ψk,n(t): Wavelet of scale k and shift n
English Variables:
• ak[n]: Scaling coefficient of scale k and position n
• all,k[x, y]: Two-dimensional scaling coefficient of scale k and spatial position
(x, y)
• âll,k[x, y, z]: Quantized, two-dimensional scaling coefficient of scale k and position (x, y, z)
• a3D·,k,j[x, y, z]: Three-dimensional scaling coefficient of 2D scale k, 3D scale j, and
position (x, y, z)
• â3D·,k,j[x, y, z]: Quantized three-dimensional scaling coefficient of 2D scale k, 3D scale j, and position (x, y, z)
• as: Multiplicative term used in the LMMSE calculation of sm(·)
• aτ : Multiplicative term used in the LMMSE calculation of τm(·)
• Ai: Frame acquisition rate
• Azl : Spatially averaged pixel value of spatial position l and frame z used in
motion index calculation
• b(x, y): Background pixel of spatial location (x, y)
• bs: Additive term used in the LMMSE calculation of sm(·)
• bτ : Additive term used in the LMMSE calculation of τm(·)
• Bdi : Display buffer fullness at time i
• Bgi : Grouping buffer fullness at time i
• CN: Size of the Nth group of frames (GoF)
• dk[n]: Wavelet coefficient of scale k and position n
• dhl,k[x, y]: Two-dimensional wavelet coefficient, high-low subband, of scale k
and spatial position (x, y)
• dlh,k[x, y]: Two-dimensional wavelet coefficient, low-high subband, of scale k
and spatial position (x, y)
• dhh,k[x, y]: Two-dimensional wavelet coefficient, high-high subband, of scale k
and spatial position (x, y)
• d̂hl,k[x, y, z]: Quantized, 2D wavelet coefficient, high-low subband, of scale k and location (x, y, z)
• d̂lh,k[x, y, z]: Quantized, 2D wavelet coefficient, low-high subband, of scale k and location (x, y, z)
• d̂hh,k[x, y, z]: Quantized, 2D wavelet coefficient, high-high subband, of scale k and location (x, y, z)
• d3D·,k,j[x, y, z]: Three-dimensional wavelet coefficient of 2D scale k, 3D scale j and
position (x, y, z)
• d̂3D·,k,j[x, y, z]: Quantized three-dimensional wavelet coefficient of 2D scale k, 3D scale j, and position (x, y, z)
• D: Space below the virtual-object
• Davg|Bdi−1<εd: Estimated average display rate, given that the display buffer fullness Bdi−1 has fallen below the warning threshold εd
• Di: Display frame rate at time i
• Ei: Compression rate at time i
• Ex(z): Ending horizontal position of the virtual-object in frame z
• Ey(z): Ending vertical position of the virtual-object in frame z
• f(t): Arbitrary function
• fk(t): Arbitrary function of scale k
• f(x, y): Original image pixel of spatial position (x, y)
• f̃(x, y): Noisy image pixel of spatial position (x, y)
• f̂(x, y): Denoised image pixel of spatial position (x, y)
• fopt(x, y): Optimal denoised image pixel of spatial position (x, y)
• f(x, y, z): Original video signal pixel of position (x, y, z)
• f̂(x, y, z): Reconstructed video signal pixel of position (x, y, z)
• f zl : Video signal pixel of spatial location l and frame z
• F : Number of frames in a group of frames (GoF)
• g[n]: Wavelet filter coefficient of position n
• GN: Time period when the last frame of the Nth group of frames (GoF) is acquired
• h[n]: Scaling function filter coefficient of position n
• Hf : Height of image
• Ho: Height of the virtual-object
• I: The initial buffering level for the display buffer
• I·,k[x, y]: Boolean value formed by thresholding the noisy wavelet coefficient λ·,k[x, y] by τ
• Ivo[x, y, z]: Boolean value created by thresholding λvo[x, y, z] coefficient by the
threshold, τvo
• J·,k[x, y]: Boolean value formed by refining I·,k[x, y] with local support
• Jopt·,k [x, y]: Optimal Boolean value of spatial location (x, y)
• Jvo[x, y, z]: Refined Boolean value used for motion detection of location (x, y, z)
• K: Number of terms included in noise estimation calculation
• KM : Number of subband levels in the 2D wavelet transform
• JM : Number of subband levels in the 3D wavelet transform
• L: Space left of the virtual-object
• L·,k[x, y]: Wavelet coefficient of scale k and spatial location (x, y) used in re-
construction
• Lopt·,k [x, y]: Wavelet coefficient of scale k and spatial location (x, y) used in opti-
mal reconstruction
• LN: The total delay of the Nth group of frames (GoF)
• mse: Mean-squared error between original and modified image
• Ml: Motion index of spatial location l
• o(x, y, z): Virtual-object pixel of location (x, y, z)
• R: Space right of the virtual-object
• Ri: Video reconstruction rate at time i
• s: Support variable used to create Boolean map J·,k[·]
• s2: 2D Quantization step size
• s3: 3D Quantization step size
• sm(·): Optimal support function used in image denoising
• ŝm(·): Estimated support function used in image denoising
• svo: Support value used to refine motion detection
• S·,k[x, y]: Coefficient support value of level k and spatial location (x, y)
• Sd: Size of the display buffer
• Sg: Grouping buffer size
• Sx(z): Starting horizontal position of the virtual-object in frame z
• Sy(z): Starting vertical position of the virtual-object in frame z
• U : Space above the virtual-object
• Vk: Spanning set of scaling functions of scale k
• Wf : Width of image
• Wk: Spanning set of wavelet functions of scale k
• Wo: Width of the virtual-object
• zm,x: Frame which contains the maximum virtual-object width
• zm,y: Frame which contains the maximum virtual-object height
LIST OF TABLES
Table Page
3.1 Minimum average error of test images for various noise levels and their corresponding threshold and support values. . . . . . . . . . . . . . 48
3.2 PSNR comparison of the proposed method to other methods given inthe literature (results given in dB). . . . . . . . . . . . . . . . . . 52
3.3 Computation times for a 256x256 image, in seconds. . . . . . . . . . . 53
3.4 Compression ratios of 2D wavelet compression both with and without denoising applied as a pre-processing step. . . . . . . . . . . . . . . 54
4.1 Compression ratios of 3D wavelet compression both with and without denoising applied as a pre-processing step. . . . . . . . . . . . . . . 84
LIST OF FIGURES
Figure Page
1.1 Generalized architecture of the H.261 encoder. . . . . . . . . . . . . . 4
1.2 2D wavelet transform. Left: Original "Peppers" image. Center: Wavelet-transformed image, MRlevel = 3. Right: Subband reference. . . . . . 7
1.3 Comparison between JPEG and wavelet compression methods using the "Peppers" image. Left: JPEG compression, file size = 6782 bytes, compression ratio 116:1, PSNR = 22.32. Right: 2D wavelet compression, file size = 6635 bytes, compression ratio 118:1, PSNR = 25.64. . 9
2.1 Wavelet decomposition. . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2 Wavelet reconstruction. . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.1 Non-decimated wavelet decomposition. . . . . . . . . . . . . . . . . . 31
3.2 Non-decimated wavelet synthesis. . . . . . . . . . . . . . . . . . . . . 32
3.3 Generic coefficient array. . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.4 Generic coefficient array, with corresponding S·,k values. . . . . . . . . 37
3.5 Optimal denoising method applied to the noisy "Lenna" image. Left: Corrupted image f(x, y), σn = 50, PSNR = 14.16 dB. Right: Optimally denoised image fopt(x, y), PSNR = 27.72 dB. . . . . . . . . . . . . . 41
3.6 Test images. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.7 Average PSNR values using different wavelets. . . . . . . . . . . . . . 46
3.8 Error results for test images, σn = 30. . . . . . . . . . . . . . . . . . . 47
3.9 τm(·), sm(·) and their corresponding estimates, τ̂m(·), ŝm(·). . . . . . 51
3.10 Results of the proposed image denoising algorithm. Top left: Original "Peppers" image. Top right: Corrupted image, σn = 37.75, PSNR = 16.60 dB. Bottom: Denoised image using the proposed method, PSNR = 27.17 dB. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.11 Results of the proposed image denoising algorithm. Top left: Original "House" image. Top right: Corrupted image, σn = 32.47, PSNR = 17.90 dB. Bottom: Denoised image using the proposed method, PSNR = 29.81 dB. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.12 Wavelet-based compression results with and without pre-processing. . 58
4.1 Test results of both TFS and SFT denoising methods. Upper left: FOOTBALL image sequence, SFT denoising, max. PSNR = 30.85, τ = 18, τz = 12. Upper right: FOOTBALL image sequence, TFS denoising, max. PSNR = 30.71, τ = 18, τz = 12. Lower left: CLAIRE image sequence, SFT denoising, max. PSNR = 40.77, τ = 19, τz = 15. Lower right: CLAIRE image sequence, TFS denoising, max. PSNR = 40.69, τ = 15, τz = 21. . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.2 Spatial positions of motion estimation test points. Left: FOOTBALL image sequence, frame #96. Right: CLAIRE image sequence, frame #167. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3 Motion estimate given in [10] of image sequences CLAIRE and FOOTBALL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.4 Proposed motion estimate of image sequences CLAIRE and FOOTBALL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.5 α and β parameter testing for temporal domain denoising. . . . . . . 75
4.6 Denoising methods applied to the SALESMAN image sequence, std. = 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.7 Denoising methods applied to the SALESMAN image sequence, std. = 20. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.8 Denoising methods applied to the TENNIS image sequence, std. = 10. 77
4.9 Denoising methods applied to the TENNIS image sequence, std. = 20. 78
4.10 Denoising methods applied to the FLOWER image sequence, std. = 10. 78
4.11 Denoising methods applied to the FLOWER image sequence, std. = 20. 79
4.12 Original frame #7 of the SALESMAN image sequence. . . . . . . . . 79
4.13 SALESMAN image sequence corrupted, std. = 20, PSNR = 22.10. . . 80
4.14 Results of the 3D K-nearest neighbors filter, [83], PSNR = 28.42. . . 80
4.15 Results of the 2D wavelet denoising filter, given in Chapter 3, PSNR= 29.76. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.16 Results of the 2D wavelet filtering with linear temporal filtering, [55],PSNR = 30.47. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.17 Results of the proposed denoising method, PSNR = 30.66. . . . . . . 82
4.18 Wavelet-based compression results with and without pre-processing. . 83
5.1 3D wavelet compression. . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.2 Starting from left to right: 1) Original three-dimensional video signal. 2) 2D wavelet transform (KM = 2 and JM = 0). 3) Symmetric 3D wavelet transform. 4) Decoupled 3D wavelet transform (KM = 2 and JM = 2). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.3 Decoupled 3D wavelet transform subbands, KM = 2, JM = 2. Left: Subband d3Dhl,1,1[·] highlighted in gray. Right: Subband d3Dlh,0,2[·] highlighted in gray. . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.4 Comparison of 2D wavelet compression and 3D wavelet compression using the CLAIRE image sequence (frame #4 is shown). Left: 2D wavelet compression, s2 = 64, KM = 8, file size = 198KB, compression ratio = 256:1, average PSNR = 29.80. Right: 3D wavelet compression, s2 = 29, s3 = 29, KM = 8, JM = 8, file size = 196KB, compression ratio = 258:1, average PSNR = 33.31. . . . . . . . . . . . . . . . . . 96
5.5 Virtual-object extraction. . . . . . . . . . . . . . . . . . . . . . . . . 99
5.6 Virtual-object compression. . . . . . . . . . . . . . . . . . . . . . . . 103
5.7 Comparison of 3D wavelet compression and virtual-object compression using the CLAIRE image sequence (frame #4 is shown). Left: 3D wavelet compression, s2 = 29, s3 = 29, KM = 8, JM = 8, file size = 196KB, compression ratio = 258:1, average PSNR = 33.31. Right: Virtual-object compression, s2 = 25, s3 = 25, KM = 8, JM = 8 for the virtual-object and s2 = 9, KM = 8 for the background, file size = 195KB, compression ratio = 259:1, average PSNR = 34.00. . . . . . . 104
5.8 Comparison of 2D wavelet compression, 3D wavelet compression, andvirtual-object compression. . . . . . . . . . . . . . . . . . . . . . . . . 105
6.1 Content-based 3D wavelet compression/decompression design flow. . . 110
6.2 3D wavelet communication system. . . . . . . . . . . . . . . . . . . . 111
6.3 Complete rate control system. . . . . . . . . . . . . . . . . . . . . . . 113
6.4 Rate control model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.5 Display frame rate and display buffer size, D0=12 fps. . . . . . . . . . 124
6.6 Frame acquisition rate and grouping buffer size, D0=12 fps. . . . . . . 125
6.7 Display frame rate and display buffer size, D0=2 fps. . . . . . . . . . 126
6.8 Frame acquisition rate and grouping buffer size, D0=2 fps. . . . . . . 127
CHAPTER 1
Introduction
Effective image and video compression techniques have been active research areas for the last several years. Because raw digital image and video signals are vast in size while transmission bandwidth and storage space are limited, compression techniques are paramount in the development of digital image and video systems. Developing compression methods that both produce high compression ratios and preserve reconstructed quality is essential to the creation of high-quality, affordable image and video products.
It is this seemingly limitless demand for higher-quality image and video compression systems which provides substantial motivation for further compression research. First, a brief overview of the latest compression standards is provided prior to the presentation of specific research topics and objectives.
1.1 A Review of Current Compression Standards
1.1.1 Image Compression Standard (JPEG)
The Joint Photographic Experts Group (JPEG) committee developed a compression standard for digital images in the late 1980s. JPEG compression has long been the most widely accepted standard in image compression, embedded in most modern digital imaging products.
The JPEG image encoder operates on 8x8 or 16x16 blocks of image data. Thus,
images being compressed by JPEG are segmented into processing blocks called
macroblocks. JPEG compresses each macroblock separately by first transforming the
block with the Discrete Cosine Transform (DCT), quantizing the resultant coefficients,
run-length encoding them, and finally coding the result with a variable-length entropy
coder [47]. The block-based encoder facilitates simplicity, computational speed, and a
modest memory requirement.
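For illustration, the per-macroblock transform and quantization stages might be sketched as below. This is a simplified sketch, not the JPEG codec itself: it uses an orthonormal 8x8 DCT and a single uniform step `q` where JPEG uses a perceptual quantization table, and the function names are invented for this example.

```python
import numpy as np

def dct_matrix(N=8):
    """Orthonormal DCT-II basis matrix; C @ block @ C.T is the 2D DCT."""
    C = np.array([[np.cos(np.pi * (2 * m + 1) * k / (2 * N)) for m in range(N)]
                  for k in range(N)]) * np.sqrt(2.0 / N)
    C[0] /= np.sqrt(2)
    return C

def encode_block(block, q=16):
    """Transform one macroblock and quantize uniformly; the many resulting
    zeros are what make the subsequent run-length stage effective."""
    C = dct_matrix(len(block))
    return np.round(C @ block @ C.T / q).astype(int)

flat = np.full((8, 8), 128.0)
encode_block(flat)   # only the DC coefficient (1024 / 16 = 64) survives
```

For a flat block, all energy collects in the single DC coefficient and the remaining 63 entries quantize to zero, which is exactly the sparsity the run-length and entropy stages exploit.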
Typically, JPEG can compress images at a 10:1 to 20:1 compression ratio and
retain high-quality reconstruction. Ratios of 30:1 to 50:1 can be obtained
with only minor defects in the reconstructed image [34].
1.1.2 JPEG2000 Image Compression Standard
It has been known throughout the research community for several years that
the wavelet transform is superior to DCT methods in image compression. Thus, in
March of 2000, JPEG published the JPEG2000 standard based on wavelet technology
[63]. The compression method of JPEG2000 is similar to that of JPEG. However,
JPEG2000 uses the wavelet transform instead of the block-based DCT. This allows
the user to specify the size of the processing block (small block sizes reduce the
memory requirement, while large block sizes improve compression gain and reconstructed
image quality). After transformation, coefficients are quantized and encoded as in
the JPEG standard.
The JPEG2000 standard promises a 20%-25% smaller average file size with quality
comparable to the original JPEG standard [44].
1.1.3 Video Compression Standards (H.26X and MPEG-X)
The H.261 Video Compression Standard
H.261 is a compression standard developed by the ITU (International Telecommunication
Union) in 1990. The compression algorithm involves block-based DCT transformation
as in JPEG, but also inter-frame prediction and motion compensation (MC) for
temporal domain compression. Temporal domain compression starts with an initial
frame, the intra (or I) frame. Compression is achieved by creating a predicted (P)
frame: a motion-compensated prediction, formed from the closest reconstructed I
frame, is subtracted from the current frame. The I and P frames are then compressed
by a method very similar to JPEG, and because the P frames no longer contain as
much information as their original counterparts, temporal domain compression is
achieved. Figure 1.1
gives a generalized architecture of the H.261 encoder.
Because of the subtraction involved in temporal domain compression, the quality
of the P frames is highly dependent upon the quality of the I frames. To combat
this problem, the P frames are formed by subtraction from reconstructed, rather than
original, I frames. Thus, in decoding the P frames, little error is introduced by
temporal domain compression.
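The prediction step can be sketched as follows. Block size, search range, and all function names here are illustrative assumptions, not taken from the H.261 standard: a full-search block matcher finds the motion vector per macroblock, and the residual is the current frame minus the motion-compensated prediction drawn from the reconstructed reference.

```python
import numpy as np

def best_match(ref, block, by, bx, search=4):
    """Full-search block matching: find the motion vector minimizing the sum
    of absolute differences (SAD) against the reconstructed reference frame."""
    B = block.shape[0]
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if 0 <= y and 0 <= x and y + B <= ref.shape[0] and x + B <= ref.shape[1]:
                sad = np.abs(ref[y:y+B, x:x+B] - block).sum()
                if sad < best_sad:
                    best_sad, best = sad, (dy, dx)
    return best

def p_frame_residual(ref, cur, B=8):
    """P-frame residual: current frame minus its motion-compensated
    prediction, computed macroblock by macroblock."""
    res = np.zeros_like(cur)
    for by in range(0, cur.shape[0], B):
        for bx in range(0, cur.shape[1], B):
            dy, dx = best_match(ref, cur[by:by+B, bx:bx+B], by, bx)
            res[by:by+B, bx:bx+B] = (cur[by:by+B, bx:bx+B]
                                     - ref[by+dy:by+dy+B, bx+dx:bx+dx+B])
    return res
```

When the prediction is good, the residual is close to zero everywhere, which is why P frames compress far better than intra-coded frames.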
The H.263 Video Compression Standard
H.263, also developed by the ITU, was published in 1995. The standard is similar
to H.261, but provides more advanced techniques such as half-pixel precision MC,
whereas H.261 uses full pixel precision MC.
Figure 1.1: Generalized architecture of the H.261 encoder.
The MPEG-1 Video Compression Standard
The Motion Pictures Expert Group (MPEG) published the MPEG-1 standard
in 1990 [1]. The video compression algorithm embedded in MPEG-1 follows H.261
with a few differences. First, the MC algorithm is less restricted, providing better
predictive performance. Second, MPEG-1 generates not only I and P frames, but also
bi-directionally predicted (or B) frames. While a P frame is generated from the
difference between the current frame and a motion-compensated prediction from the
closest reconstructed reference frame, a B frame is produced from the difference
between the current frame and the average of the two closest reconstructed reference
frames (one preceding, one following). The introduction of the B frame in MPEG-1
gives a sequence of coded video frames of the form:
I BB P BB P BB P BB I BB P BB P...
These advances over H.261 and H.263 have made MPEG-1 a more popular
compression standard. A typical compression ratio for a high-quality MPEG-1 encoded
bitstream is 26:1 [8].
The MPEG-2 Video Compression Standard
Soon after the advent of MPEG-1, MPEG-2 was developed. The MPEG-2 standard
is much like MPEG-1, with some added capability. Among its many improvements,
MPEG-2, like H.263 over H.261, supports half-pixel precision MC for higher
performance inter-frame prediction [2, 30]. Typically, a high-quality MPEG-2 video
encoding will result in a 45:1 compression ratio [9]. Currently MPEG-2 is the most
widely used compression standard. It is the compression method used in digital video
disks (DVD), and most digital video recorders (DVR).
The MPEG-4 Video Compression Standard
The finalized version of the MPEG-4 standard was published in December of 1999.
The basis of coding in MPEG-4 is not a processing macroblock, as in MPEG-1 and
MPEG-2, but rather an audio-visual object [3]. Object based compression techniques
have certain advantages, such as:
1) Allowing more user interaction with video content.
2) Allowing the reuse of recurring object content.
3) Removal of artifacts due to the joint coding of objects.
Although MPEG-4 does specify the advantages of object-based compression and
provides a standard of communication between sender and receiver, it does not provide
the means by which a) the content is separated into audio-visual objects, or b) the
audio-visual objects are compressed. The MPEG-4 standard is a more open standard
which can accept various compression methods. As long as both sender and receiver
possess the correct respective tool set for compression and decompression, they can
communicate.
The advent of the MPEG-4 compression standard has opened up audio and video
compression to more researchers, and provides a flexible environment for continual
improvement in the compression of audio and video signals.
1.2 Motivation for Wavelet Image Compression Research
1.2.1 Wavelet Image Compression vs. JPEG Compression
With the exception of JPEG2000 and MPEG-4 (which does not provide a method
of compression), each of the aforementioned compression standards given in Section
1.1 has the same drawback: blocking artifacts that appear in the reconstructed
signals at low bit-rate coding. These artifacts are a direct result of the block-based
DCT transform.
The wavelet transform does not have the drawbacks of block-based DCT methods.
Compression algorithms based on the wavelet transform do not segment frames into
processing blocks. Thus, wavelets have been extensively researched as an alternative
to block-based DCT compression methods, for both images and video signals [37, 52,
70, 82].
Figure 1.2 shows the "Peppers" image, its wavelet decomposition, and a graphic
giving the referenced subband decomposition. As shown in Figure 1.2, the wavelet
transform does not break the image up into processing blocks, but processes the
entire image as a whole, creating subbands representative of differing spatial frequency
bandwidths.
Figure 1.2: 2D wavelet transform. Left: Original "Peppers" image. Center: Wavelet transformed image, MRlevel = 3. Right: Subband reference.
Each of the subbands in the subband reference in the rightmost portion of Figure
1.2 is labeled with the letter "a" or "d". The subband labeled "a" contains
scaling coefficients, which are the low spatial frequency representation of the
original image. The remaining subbands, labeled "d", contain wavelet coefficients.
Wavelet coefficients represent different levels of bandpass spatial frequency
information of the original image.
The subscript letters following the a's and d's, given in Figure 1.2, indicate the
horizontal and vertical contributions of the particular subband. Typically, in the
2D wavelet transform, the original data values are processed first in the horizontal
direction, then in the vertical direction; the data in each subband therefore receive
contributions from both horizontal and vertical processing. The "H" designation
represents high frequency information, and the "L" designation represents low
frequency information. For example, an HL designation denotes that the data in that
particular subband represent high frequency information in the horizontal dimension
and low frequency information in the vertical dimension.
Conversely, the LH designation denotes low frequency information in the horizontal
dimension and high frequency information in the vertical dimension. Also, the "a_{ll,2}"
subband is the lowest frequency representation of the original image: merely a copy
of the original image that has been decimated (low-pass filtered and downsampled)
by 2^{2+1} = 8 in both the horizontal and vertical dimensions.
The numbers following the subscript letters represent the multiresolution level
(MRlevel) of the wavelet decomposition; the higher the value, the lower the frequency
band of the original signal the coefficients represent.
After the wavelet transform is applied to an image as in Figure 1.2, each subband
is quantized, run-length encoded, and sometimes entropy encoded, much like JPEG
compression.
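The run-length step can be illustrated with a simple zero-run coder. This is a hypothetical sketch (the coders used by the actual standards differ in detail): quantized detail subbands contain long runs of zeros, so emitting (zero-run, value) pairs is already a substantial reduction.

```python
def run_length_encode(coeffs):
    """Zero-run encoding of quantized subband coefficients: emit
    (zero_run, value) pairs, exploiting the long zero runs that
    quantization produces in the wavelet detail subbands."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    if run:
        pairs.append((run, 0))   # trailing-zeros marker
    return pairs

run_length_encode([5, 0, 0, 0, -2, 0, 0, 1, 0])
# -> [(0, 5), (3, -2), (2, 1), (1, 0)]
```

The resulting pair stream is then handed to the entropy coder, which assigns short codewords to the most frequent pairs.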
Images compressed by methods utilizing the 2D wavelet transform have been
shown to degrade more gracefully in reconstructed quality as the compression ratio
increases. Unlike DCT-based compression, wavelet-based image encoders operate on
each frame as a whole, thus eliminating blocking artifacts. Figure 1.3 gives the
"Peppers" image compressed both by the JPEG standard and by wavelet-based
compression.
As displayed in Figure 1.3, the wavelet compression algorithm does not produce
the blocking artifacts that appear in JPEG compression, but rather exhibits a more
graceful degradation in image quality at high compression ratios.
The JPEG compressed image given in Figure 1.3 is produced by the Advanced
JPEG Compressor™, downloadable software that can be found at
http://www.winsoftmagic.com. The wavelet compressed image given in Figure 1.3 is
produced by in-house software developed by the OSU research group. The "Peppers"
Figure 1.3: Comparison between JPEG and wavelet compression methods using the "Peppers" image. Left: JPEG compression, file size = 6782 bytes, compression ratio 116:1, PSNR = 22.32. Right: 2D Wavelet compression, file size = 6635 bytes, compression ratio 118:1, PSNR = 25.64.
image is compressed by wavelet transformation, uniform quantization in all subbands,
stack-run coding [72], and Huffman coding [22]. No other processing is used. This
method of compression is referred to as 2D wavelet compression; the two dimensions
being processed are the vertical and horizontal dimensions of the image, as shown in
Figure 1.2.
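The uniform quantization step in the pipeline above can be sketched as below. The step size, test values, and function names are illustrative assumptions; the point is that quantization maps each coefficient to an integer index with reconstruction error bounded by half the step.

```python
import numpy as np

def quantize(coeffs, step):
    """Uniform (mid-tread) quantization, applied identically in every subband."""
    return np.round(coeffs / step).astype(int)

def dequantize(indices, step):
    """Decoder-side reconstruction to the center of each quantization bin."""
    return indices * step

c = np.array([0.4, -3.7, 12.2, 0.1])
q = quantize(c, 1.0)           # -> [0, -4, 12, 0]
err = c - dequantize(q, 1.0)   # each entry bounded by step / 2
```

Larger steps zero out more coefficients (raising the compression ratio) at the cost of larger reconstruction error, which is the basic rate/quality trade-off of the scheme.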
1.2.2 Wavelet Image Pre-processing
Our research motivation in image compression is to provide supplemental pre-
processing steps to further enhance the capabilities of 2D wavelet compression. Im-
age pre-processing techniques are well established in many compression algorithms.
However, we have developed an image pre-processing algorithm which has proven to
outperform established methods in both image quality and computation time.
Image pre-processing techniques are able to intelligently remove noise inherent
in digital images. The removal of noise decreases the entropy in the original image
signal, facilitating compressibility and reconstructed quality. With the removal of
noise, the encoder need not waste bits on noise, but rather use all the encoded bits
for storage of important image features.
Many different noise removal techniques have been applied to images, but the
wavelet transform has been viewed by many as the preferred technique for noise
removal [29, 42, 43, 54]. Rather than a complete transformation into the frequency
domain, as in DCT or FFT (Fast Fourier Transform), the wavelet transform produces
coefficient values which represent both time and frequency information. The hybrid
spatial-frequency representation of the wavelet coefficients allows for analysis based
on both spatial position and spatial frequency content. The hybrid analysis of the
wavelet transform is excellent in facilitating image denoising algorithms.
The wavelet transform does have a drawback, however. The computation time
of the wavelet transform hinders the performance of real-time image denoising ap-
plications. Thus, it is imperative to minimize the processing steps between wavelet
transformation and inverse transformation, i.e., the modification of wavelet coefficient
values for noise removal.
Thus, an image denoising method is developed which outperforms algorithms
given in [42, 43, 54] both in signal-to-noise ratio and computation time. This is
accomplished by providing an accurate and computationally simple coefficient selec-
tion process. Results of the proposed image denoising research show an improvement
in PSNR and a substantial reduction in computational complexity, with a speedup of
over an order of magnitude relative to the established methods given in [42, 43, 54].
1.3 Motivation for Wavelet Video Compression Research
Because the wavelet transform has been successful in achieving better image qual-
ity at high compression ratios than traditional JPEG image compression, it is only
natural to assume that wavelet video compression techniques would be able to out-
perform the block-based DCT compression methods of H.26X and MPEG-X.
Several wavelet compression techniques have been targeted toward video applications.
Tham et al. use block-based motion compensation for temporal domain
compression and the 2D wavelet transform for spatial compression [71]. Zheng et
al. use the wavelet transform for temporal domain compression as well as spatial
domain compression, i.e., 3D wavelet compression [24, 81].
The more straightforward approach in [81] exploits the advantages of the wavelet
transform in three dimensions for the compression of video. This approach uses the 2D
wavelet transform for intra-frame coding, and the wavelet transform between
frames for inter-frame coding.
Although both wavelet video compression techniques have had success in video
compression, there has not been an overwhelmingly superior wavelet video com-
pression technique to combat the industry standards. Thus, this research develops
wavelet-based techniques that further enhance the capabilities of 3D wavelet com-
pression.
We provide two processing methods to aid in the effectiveness of 3D wavelet
compression: a wavelet-based video noise removal algorithm for video pre-processing,
and a virtual-object based compression scheme utilizing 3D wavelet compression.
1.3.1 Video Signal Pre-processing for Noise Removal
It is well known that the removal of noise in images helps compression techniques
obtain higher compression ratios while achieving better reconstructed image quality.
However, there has not been much work in the removal of noise in video signals.
With video signals, there exists not only spatial domain noise, but also noise in the
temporal domain. Using the wavelet transform, we remove both spatial and temporal
noise providing a higher compression gain with 3D wavelet compression.
Noise reduction in digital images has been studied extensively [15, 16, 27, 29,
31, 42, 43, 54, 61, 77]. However, noise reduction in digital video has only rarely
been studied. Preliminary methods for temporal domain noise removal are variable
coefficient spatio-temporal filters [33, 83] and weighted median filters [45]. These
types of filters have also been studied for noise removal in images. Huang et al. use
an adaptive median filter for noise removal in images [27]. Rieder and Scheffler [61],
and Wong [77] both use an adaptive linear filter for image noise removal. But the
wavelet transform has not been used for temporal domain noise removal.
One can only speculate why the wavelet transform has not yet been considered
for video signal denoising. However, our own preliminary analysis shows that the
overwhelming difficulty with using the wavelet transform is its considerable
computational load. With our image denoising technique, however, we have shown a
significant speedup in wavelet image denoising compared to established methods, so
the computational burden in video denoising can be overcome.
Thus, we include a method of removing temporal domain noise in video sequences
via the wavelet transform. Using techniques similar to the proposed image denoising
technique, we overcome the overwhelming computational burden provided by the
application of the wavelet transform in the temporal domain. Our video denoising
technique is applied to image sequences prior to compression, enabling more effective
compressed video.
1.3.2 Virtual-Object Based Video Compression
With the advent of the MPEG-4 standard, video compression is based on an
audio-visual object instead of the traditional macroblock [3].
Due to the advantages of object-based compression, as provided in the MPEG-
4 standard [3], we propose a wavelet-based virtual-object compression algorithm.
Virtual-object compression first separates moving objects from stationary background
and compresses each separately, thus achieving the advantages of object-based com-
pression.
There are two separate processing areas in object-based compression. Object
extraction is the method of separating different objects in an image sequence, and
the compression of those objects is a method of coding arbitrarily shaped objects. In
the virtual-object compression method, the wavelet transform is used for both object
extraction and object compression.
When the wavelet transform is applied in the temporal domain, motion of objects
is detected by large coefficient values. Therefore, the wavelet transform is used in
the identification and extraction of moving objects prior to object-based compression.
Virtual-object compression uses the non-decimated wavelet transform in the temporal
domain for the separation of objects from stationary background.
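This motion-detection idea can be sketched with a non-decimated temporal Haar detail filter. This is an illustrative sketch, not the dissertation's exact algorithm: pixels whose intensity changes between frames produce large temporal detail coefficients, and thresholding those coefficients yields a motion mask.

```python
import numpy as np

def temporal_detail(frames):
    """Non-decimated temporal Haar detail: large values flag pixels whose
    intensity changes between consecutive frames, i.e., likely object motion."""
    f = np.asarray(frames, dtype=float)
    return np.abs(f[1:] - f[:-1]) / np.sqrt(2)

still = np.full((3, 4, 4), 10.0)            # three identical 4x4 frames
moving = still.copy()
moving[1:, 1:3, 1:3] = 50.0                 # a region changes after frame 0
d = temporal_detail(moving)
motion_mask = (d > 1.0).any(axis=0)         # True exactly where motion occurred
```

A static background produces all-zero detail coefficients, so only the moving region survives the threshold; that region is what the virtual-object extraction step then isolates.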
Virtual-object compression also restricts the virtual-object to be rectangular. This
restriction enables the use of 3D wavelet compression for the compression of the
virtual-object. Also, with a rectangular object restriction, the location and shape of
the object can be completely defined with only two sets of spatial coordinates (the
starting horizontal and vertical locations of the virtual-object, and the width and
height of the virtual-object), thus virtually eliminating shape coding overhead.
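The two-coordinate-pair description can be sketched with a hypothetical helper (not the dissertation's implementation): given a binary motion mask, the rectangular virtual-object is fully described by its top-left corner plus its width and height.

```python
import numpy as np

def virtual_object_rect(motion_mask):
    """Rectangular virtual-object descriptor: starting (x, y) plus width and
    height fully define the object's location and shape, so almost no
    shape-coding overhead is needed."""
    ys, xs = np.nonzero(motion_mask)
    x0, y0 = xs.min(), ys.min()
    return (int(x0), int(y0), int(xs.max() - x0 + 1), int(ys.max() - y0 + 1))

mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 3:7] = True             # hypothetical moving region
virtual_object_rect(mask)         # -> (3, 2, 4, 3)
```

Four integers per object replace the arbitrary-shape coding that a general object-based coder would require, and the rectangular region can be fed directly to the 3D wavelet compressor.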
Results show the virtual-object compression method to be superior in compression
ratio with higher PSNR when compared to 3D wavelet compression.
1.4 Motivation for the Rate Control of Wavelet-Compressed Video
Using the 3D wavelet compression method discussed in [24, 81], the number of
frames contained in a GoF (Group of Frames) varies due to video content. Thus,
there exists an unknown delay in the acquisition of the GoF, and the computation
time needed for compression. Also, in streaming applications across the Internet,
there exists another unknown delay in the transmission of the compressed GoF to the
receiver, and yet another unknown delay in the decompression time. The variability
in the time from frame acquisition to frame display requires a rate control algorithm
for real-time transmission of 3D wavelet compressed video.
A real-time video compression and transmission system is necessarily a multi-
threaded package. On the server side frame acquisition, GoF compression, and packet
transmission processes must work independently for real-time operation. For example,
in real-time compression the frame acquisition process may not wait for the compres-
sion process to finish before acquiring the next GoF. Frame acquisition must occur
at regular intervals for real-time processing. On the client side, the decompression of
the GoF must occur independently from frame display for real-time systems.
In a multi-threaded environment such as the real-time compression and transmis-
sion of video, there must exist a process to manage the computational activity of
each processing thread in order to avoid overflow or starvation of buffers between
the threads. Also, this management process must exist in both the client and server
systems, and the management processes must communicate to ensure equivalent ac-
quisition and display rates (a requirement for real-time video applications).
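The buffer-management idea can be sketched with a bounded queue between an acquisition thread and a compression thread. This is a minimal sketch under illustrative assumptions (names, buffer size, and the GoF placeholder are invented): the bounded buffer makes each thread block rather than overflow or starve the other.

```python
import threading
import queue

gof_buffer = queue.Queue(maxsize=4)   # grouping buffer between the two threads

def acquire_frames(n_gofs):
    """Server-side acquisition: deposit each captured GoF into the buffer."""
    for i in range(n_gofs):
        gof = f"GoF-{i}"              # stand-in for a captured group of frames
        gof_buffer.put(gof)           # blocks if the compressor falls behind

def compress_gofs(n_gofs, out):
    """Server-side compression: drain the buffer independently of acquisition."""
    for _ in range(n_gofs):
        gof = gof_buffer.get()        # blocks if acquisition falls behind
        out.append(gof + ":compressed")

out = []
t1 = threading.Thread(target=acquire_frames, args=(8,))
t2 = threading.Thread(target=compress_gofs, args=(8, out))
t1.start(); t2.start()
t1.join(); t2.join()
```

A real rate controller additionally adjusts the acquisition and display rates so that, on average, the buffers stay near a target occupancy rather than merely blocking at their limits.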
The true motivation for a rate-control algorithm in a 3D wavelet compression
scheme is that of necessity. We may possess an efficient and effective video compres-
sion scheme, but without an effective rate-control system, real-time video commu-
nication is not possible. Performance results give a continuous video stream from
sender to receiver with a modest variation in frame rate.
1.5 Dissertation Overview
The rest of the dissertation is organized as follows. Chapter 2 is an overview of
wavelet theory. The goal of the overview is to develop the wavelet filterbank analysis
and synthesis equations, used in the computation of the wavelet forward and inverse
transforms. The wavelet forward and inverse transforms are then used throughout
the dissertation.
In Chapter 3 we develop the feature-based wavelet selective shrinkage algorithm
for image denoising. The coefficient selection method is based on a two-threshold
criterion that determines which coefficients contain useful image information and
which are corrupted by noise. The two-threshold criterion proves to be
an effective means of distinguishing between useful and useless coefficients, and the
performance of the denoising method improves upon other methods given in
the literature in both PSNR and computation time.
Chapter 4 develops the video denoising algorithm which is based upon the image
denoising algorithm described in Chapter 3. However, the video denoising algorithm
also applies temporal domain processing to eliminate inter-frame noise. There is also
a motion estimation algorithm applied to the video signal prior to temporal domain
processing. The motion estimation algorithm is able to determine the amount of
temporal domain processing which can improve overall quality.
Chapter 5 describes the virtual-object compression method. The virtual-object
compression method separates moving objects from stationary background and com-
presses each separately. The independent coding of object and background gives the
virtual-object compression method an improvement in signal-to-noise ratio over GoF
based compression methods such as 3D wavelet compression.
Chapter 6 develops a rate control algorithm for real-time video communication
using wavelet-based compression schemes. The size of the GoF varies in the wavelet-
based codec, so the computation times of the compression and decompression algo-
rithms are unknown. Also, the transmission time of the compressed GoF from sender
to receiver is unknown and variable. Thus, it is necessary to include a rate con-
trol mechanism to ensure continuous video delivery from server to client. Chapter 7
concludes the dissertation and provides some areas for future research.
CHAPTER 2
Wavelet Theory Overview
An overview of wavelet theory is presented for completeness and for the formu-
lation of both the wavelet analysis and synthesis filterbank equations, used in the
computation of the wavelet forward and inverse transforms, respectively.
2.1 Scaling Function and Wavelet Definitions
The basic idea of a transform is to use a set of orthonormal basis functions to
convolve with an input function. The resultant output function, then, can be evalu-
ated or modified. The Fourier Transform, for example, uses complex sinusoids (i.e.
e^{jωn}, ∀ n) as its orthonormal basis set. The wavelet transform uses stretched and
shifted versions of one function, the mother wavelet, as its basis. However, not any
function can be a mother wavelet. There are certain criteria which the mother wavelet
must obey.
We will start with a scaling function, Φ(·). A basis can be generated by shifting
and stretching this function.
Φ_{k,n}(t) = 2^{−k/2} Φ(2^{−k}t − n),   (2.1)
and
||Φ(t)|| = 1,   (2.2)
where Φ_{k,n}(·) is the basis function at the kth scale and nth position.
It is required that the set of all Φk,n(·) be an orthonormal basis. Therefore, any
function, f(·), can be completely defined by a weighted sum of the basis functions
given in Equation 2.1.
f(t) = ∑_k ∑_n a_k[n] Φ_{k,n}(t),   (2.3)
where
a_k[n] = ⟨Φ_{k,n}(t), f(t)⟩ = ∫_{−∞}^{∞} Φ*_{k,n}(t) f(t) dt.   (2.4)
The a_k[·] are called scaling coefficients.
Let us define V_k, the span of the scaling functions Φ_{k,n}(·) at scale k:
V_k = span{Φ_{k,n}(t) : n ∈ Z}.   (2.5)
It is required that
· · · ⊂ V_{k+1} ⊂ V_k ⊂ V_{k−1} ⊂ · · ·,   (2.6)
where V_{k+1} defines a span of coarser scaling functions than does V_k.
We know from Equations 2.5 and 2.6 that, Φk+1,0(·) ∈ Vk+1 ⊂ Vk. So substituting
into Equation 2.3 we can show there exists a set of weights, h[·], such that
Φ_{k+1,0}(t) = ∑_n h[n] Φ_{k,n}(t),   (2.7)
which, using Equation 2.1 and setting k = 0, reduces to
Φ(t) = √2 ∑_n h[n] Φ(2t − n).   (2.8)
Equation 2.8 is referred to as the scaling equation, and the scaling function, Φ(·) is
completely defined by h[·].
A subset of scaling functions, Vk−1 can be defined by a subset of coarser scaling
functions Vk plus a difference subset, which we will call Wk. Therefore,
V_{k−1} = V_k + W_k   (V_k ⊥ W_k).   (2.9)
We can then define a basis for W_k:
W_k = span{Ψ_{k,n}(t) : n ∈ Z},   (2.10)
where
Ψ_{k,n}(t) = 2^{−k/2} Ψ(2^{−k}t − n).   (2.11)
Ψ(·) is the mother wavelet, and the set of all Ψk,n(·) are the wavelet basis functions
corresponding to the subset Wk.
Because Wk ⊂ Vk−1, as given in Equation 2.9, we can substitute into Equation 2.3
to show that there exists a set of values, g[·] such that,
Ψ_{k,0}(t) = ∑_n g[n] Φ_{k−1,n}(t),   (2.12)
which, using Equation 2.11 and setting k = 1, can be reduced to
Ψ(t) = √2 ∑_n g[n] Φ(2t − n).   (2.13)
Equation 2.13 is referred to as the wavelet scaling equation, and g[·] completely de-
scribes the Mother Wavelet, Ψ(·).
Notice from Equation 2.9 for any arbitrarily fine scale, k, we can show that,
V_k = V_{k+1} + W_{k+1}
    = V_{k+2} + W_{k+2} + W_{k+1}
    = V_{k+3} + W_{k+3} + W_{k+2} + W_{k+1}
    = ∑_{n=1}^{∞} W_{k+n}.   (2.14)
And therefore, any function, f(·), can be defined by
f(t) = ∑_k ∑_n d_k[n] Ψ_{k,n}(t),   (2.15)
where
d_k[n] = ⟨Ψ_{k,n}(t), f(t)⟩ = ∫_{−∞}^{∞} Ψ*_{k,n}(t) f(t) dt.   (2.16)
2.2 Scaling Function and Wavelet Restrictions
Recall, that we want to keep shifted basis functions, Φk,n(·), orthonormal. There-
fore, for a given scale, k, we have
δ[m] = ⟨Φ_{k,0}(t), Φ_{k,m}(t)⟩ = ⟨Φ_{k,0}(t), Φ_{k,0}(t − 2^k m)⟩,   (2.17)
where δ[·] is the Kronecker delta function [50]. Using Equations 2.1 and 2.7, and
setting k = 1, Equation 2.17 can be reduced to
δ[m] = ∑_n h[n] h[n − 2m].   (2.18)
The wavelet basis functions, Ψk,n(·), also need to be orthonormal to the scaling basis
functions Φk,n(·), for Equation 2.9 to be valid. Therefore,
0 = ⟨Ψ_{k,0}(t), Φ_{k,m}(t)⟩,   (2.19)
which can be reduced to
0 = ∑_n g[n] h[n − 2m].   (2.20)
Equation 2.20 can be solved by
g[n] = (−1)^n h[N − n],   (2.21)
where N is the length of both h[·] and g[·].
2.3 Wavelet Filterbank Analysis
Let fk(·) ∈ Vk. From Equations 2.3, 2.14, and 2.15 it can be shown that
f_k(t) = ∑_n a_k[n] Φ_{k,n}(t) = ∑_n a_{k+1}[n] Φ_{k+1,n}(t) + ∑_n d_{k+1}[n] Ψ_{k+1,n}(t),   (2.22)
where d_{k+1}[·] and a_{k+1}[·] are the wavelet coefficients and scaling coefficients of the
(k+1)th scale, respectively.
Using Equation 2.4 the scaling coefficients are realized, and substituting Equation
2.7 we obtain
a_{k+1}[n] = ⟨f_k(t), Φ_{k+1,n}(t)⟩
          = ⟨∑_m a_k[m] Φ_{k,m}(t), Φ_{k+1,n}(t)⟩
          = ∑_m a_k[m] ⟨Φ_{k,m}(t), Φ_{k+1,n}(t)⟩
          = ∑_m a_k[m] ⟨Φ_{k,m}(t), Φ_{k+1,0}(t − 2^{k+1} n)⟩.   (2.23)
Using Equations 2.1 and 2.7, Equation 2.23 can be reduced to
a_{k+1}[n] = ∑_m a_k[m] ∑_l h[l] ⟨2^{−k/2} Φ(2^{−k}t − m), 2^{−k/2} Φ(2^{−k}t − l − 2n)⟩.   (2.24)
Since the scaling function basis is orthonormal, the inner product in Equation 2.24 is
equal to one if and only if (l + 2n) = m. Therefore,
a_{k+1}[n] = ∑_m a_k[m] h[m − 2n].   (2.25)
Equation 2.25 indicates that the scaling coefficients a_{k+1}[·] can be obtained by
convolving a reversed h[·] with a_k[·] and downsampling by two.
Very similarly, it can be shown that,
d_{k+1}[n] = ∑_m a_k[m] g[m − 2n].   (2.26)
From Equations 2.25 and 2.26, we can obtain increasingly coarser scales of scaling
and wavelet coefficients by convolving the scaling coefficients, a_k[·], with both a
reversed scaling filter, h[·], and a reversed wavelet filter, g[·], and downsampling by
two. Figure 2.1 gives a block diagram of wavelet filterbank analysis.
Because each filtered output is downsampled by two, the total number of
coefficients remains the same regardless of the number of resolution levels, k.
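Equations 2.25 and 2.26 can be sketched directly in code. The following one-level analysis step uses the Haar filter pair for illustration; the filter choice, test signal, and function name are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np

def analysis_step(a, h, g):
    """One level of wavelet filterbank analysis (Eqs. 2.25 and 2.26):
    a_{k+1}[n] = sum_m a_k[m] h[m - 2n],  d_{k+1}[n] = sum_m a_k[m] g[m - 2n],
    i.e., correlate with each reversed filter and downsample by two."""
    half = len(a) // 2
    a_next = np.zeros(half)
    d_next = np.zeros(half)
    for n in range(half):
        for m in range(len(a)):
            i = m - 2 * n
            if 0 <= i < len(h):            # filters are zero outside 0..len-1
                a_next[n] += a[m] * h[i]
                d_next[n] += a[m] * g[i]
    return a_next, d_next

# Haar filters satisfy the orthonormality conditions (2.18) and (2.20):
h = np.array([1.0, 1.0]) / np.sqrt(2)
g = np.array([1.0, -1.0]) / np.sqrt(2)     # g[n] = (-1)^n h[N - n]

a1, d1 = analysis_step(np.array([4.0, 2.0, 6.0, 8.0]), h, g)
# a1 holds scaled local averages, d1 scaled local differences; the
# orthonormal basis splits the signal energy exactly between the two.
```

Because each output is downsampled by two, a1 and d1 together hold exactly as many coefficients as the input, matching the remark above.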
Figure 2.1: Wavelet decomposition.
2.4 Wavelet Filterbank Synthesis
Let fk(·) ∈ Vk. From Equations 2.4 and 2.22 it can be shown that
a_k[n] = ⟨f_k(t), Φ_{k,n}(t)⟩
      = ⟨∑_m a_{k+1}[m] Φ_{k+1,m}(t) + ∑_m d_{k+1}[m] Ψ_{k+1,m}(t), Φ_{k,n}(t)⟩.   (2.27)
With some further computation, and substituting in Equations 2.7 and 2.12 it can
be shown that
a_k[n] = ∑_m a_{k+1}[m] ⟨Φ_{k+1,m}(t), Φ_{k,n}(t)⟩ + ∑_m d_{k+1}[m] ⟨Ψ_{k+1,m}(t), Φ_{k,n}(t)⟩
      = ∑_m a_{k+1}[m] h[n − 2m] + ∑_m d_{k+1}[m] g[n − 2m].   (2.28)
From Equation 2.28, we can obtain the original signal, f_k(t), by upsampling
the scaling and wavelet coefficients and filtering the coefficients with their respective
filters, h[·] and g[·]. The wavelet reconstruction block diagram is given in Figure 2.2.
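Equation 2.28 can be sketched in the same style: upsample the coarse coefficients by two and filter with h[·] and g[·]. The Haar filters and the coefficient values (those one Haar analysis step produces from [4, 2, 6, 8]) are illustrative assumptions.

```python
import numpy as np

h = np.array([1.0, 1.0]) / np.sqrt(2)      # illustrative Haar scaling filter
g = np.array([1.0, -1.0]) / np.sqrt(2)     # illustrative Haar wavelet filter

def synthesis_step(a_next, d_next, h, g):
    """One level of wavelet filterbank synthesis (Eq. 2.28):
    a_k[n] = sum_m a_{k+1}[m] h[n - 2m] + sum_m d_{k+1}[m] g[n - 2m]."""
    a = np.zeros(2 * len(a_next))
    for n in range(len(a)):
        for m in range(len(a_next)):
            i = n - 2 * m
            if 0 <= i < len(h):
                a[n] += a_next[m] * h[i] + d_next[m] * g[i]
    return a

# Coefficients produced by one Haar analysis step of [4, 2, 6, 8]:
a1 = np.array([6.0, 14.0]) / np.sqrt(2)
d1 = np.array([2.0, -2.0]) / np.sqrt(2)
recon = synthesis_step(a1, d1, h, g)   # ≈ [4, 2, 6, 8]: perfect reconstruction
```

Because the analysis and synthesis filters form an orthonormal pair, the cascade of the two steps reconstructs the input exactly (up to floating-point error).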
2.5 Two-Dimensional Wavelet Transform
A digital image is, in most cases, considered a two-dimensional array, with
width and height as the dimensions. Let f(·) be a two-dimensional, discrete signal. As
shown in Equations 2.25 and 2.26, the wavelet transform in one dimension generates
two sets of coefficients: scaling coefficients, a_k[·], and wavelet coefficients, d_k[·]. When
Figure 2.2: Wavelet reconstruction.
dealing with two dimensions, however, four sets of coefficients are generated. That
is,
a_{ll,0}[x, y] = ∑_n h[n − 2y] ∑_m h[m − 2x] f(m, n)
d_{hl,0}[x, y] = ∑_n h[n − 2y] ∑_m g[m − 2x] f(m, n)
d_{lh,0}[x, y] = ∑_n g[n − 2y] ∑_m h[m − 2x] f(m, n)
d_{hh,0}[x, y] = ∑_n g[n − 2y] ∑_m g[m − 2x] f(m, n).   (2.29)
As in the case of the 1-dimensional wavelet transform, the scaling coefficients can
be processed further for a multiresolution analysis of the original image, f(·):
a_{ll,k+1}[x, y] = ∑_n h[n − 2y] ∑_m h[m − 2x] a_{ll,k}[m, n]
d_{hl,k+1}[x, y] = ∑_n h[n − 2y] ∑_m g[m − 2x] a_{ll,k}[m, n]
d_{lh,k+1}[x, y] = ∑_n g[n − 2y] ∑_m h[m − 2x] a_{ll,k}[m, n]
d_{hh,k+1}[x, y] = ∑_n g[n − 2y] ∑_m g[m − 2x] a_{ll,k}[m, n].   (2.30)
The four coefficient sets are referred to as the low-low band, a_{ll,·}[·], the high-low band,
d_{hl,·}[·], the low-high band, d_{lh,·}[·], and the high-high band, d_{hh,·}[·]. The subbands are
named according to the order in which the scaling and/or wavelet filters process the
scaling coefficients, a_{ll,·}[·].
The reconstruction of f(x, y) is accomplished by
a_{ll,k}[x, y] = ∑_m h[x − 2m] ∑_n h[y − 2n] a_{ll,k+1}[m, n]
             + ∑_m h[x − 2m] ∑_n g[y − 2n] d_{lh,k+1}[m, n]
             + ∑_m g[x − 2m] ∑_n h[y − 2n] d_{hl,k+1}[m, n]
             + ∑_m g[x − 2m] ∑_n g[y − 2n] d_{hh,k+1}[m, n],   (2.31)
and
f(x, y) = ∑_m h[x − 2m] ∑_n h[y − 2n] a_{ll,0}[m, n]
        + ∑_m h[x − 2m] ∑_n g[y − 2n] d_{lh,0}[m, n]
        + ∑_m g[x − 2m] ∑_n h[y − 2n] d_{hl,0}[m, n]
        + ∑_m g[x − 2m] ∑_n g[y − 2n] d_{hh,0}[m, n].   (2.32)
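The separable structure of Equations 2.29-2.32 — filter along the horizontal index m, then along the vertical index n, keeping every second output — can be sketched for one forward level as follows. The Haar filters and all function names are illustrative assumptions.

```python
import numpy as np

h = np.array([1.0, 1.0]) / np.sqrt(2)   # illustrative Haar filters
g = np.array([1.0, -1.0]) / np.sqrt(2)

def analyze_1d(x, f):
    """Correlate x with filter f and downsample by two (Eqs. 2.25/2.26 pattern)."""
    return np.array([sum(x[m] * f[m - 2 * n] for m in range(len(x))
                         if 0 <= m - 2 * n < len(f))
                     for n in range(len(x) // 2)])

def dwt2_level(img):
    """One level of the separable 2D transform (Eq. 2.29):
    filter each row (horizontal), then each column (vertical)."""
    lo = np.stack([analyze_1d(r, h) for r in img])       # horizontal low-pass
    hi = np.stack([analyze_1d(r, g) for r in img])       # horizontal high-pass
    a_ll = np.stack([analyze_1d(c, h) for c in lo.T]).T  # low-low band
    d_lh = np.stack([analyze_1d(c, g) for c in lo.T]).T  # low-high band
    d_hl = np.stack([analyze_1d(c, h) for c in hi.T]).T  # high-low band
    d_hh = np.stack([analyze_1d(c, g) for c in hi.T]).T  # high-high band
    return a_ll, d_hl, d_lh, d_hh
```

Applying `dwt2_level` again to the returned `a_ll` band produces the next multiresolution level, exactly as Equation 2.30 describes.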
2.6 Summary
In this chapter, a brief overview of wavelet theory is presented and a formulation
of the wavelet analysis and synthesis filterbank equations is developed. The wavelet
analysis equations are given by Equations 2.25 and 2.26, and wavelet synthesis equa-
tion is given by Equation 2.28. Also, the 2D wavelet transform is described. The 2D
forward wavelet transform is given by Equations 2.29 and 2.30, and the 2D wavelet
inverse transform is given by Equations 2.31 and 2.32. Both the wavelet analysis and
synthesis equations and the 2D wavelet transform are used throughout the rest of the
dissertation.
CHAPTER 3
Feature-Based Wavelet Selective Shrinkage Algorithm for Image Denoising
3.1 Introduction
The recent advancement in multimedia technology has promoted an enormous
amount of research in the area of image and video processing. Image and video
processing applications such as compression, enhancement, and target recognition
require preprocessing functions for noise removal to improve performance. Noise
removal is one of the most common and important processing steps in many image
and video systems.
Because of the importance and commonality of preprocessing in most image and
video systems, there has been an enormous amount of research dedicated to the
subject of noise removal, and many different mathematical tools have been proposed.
Variable coefficient linear filters [17, 49, 61, 77], adaptive nonlinear filters [27, 46,
53, 83], DCT based solutions [31], cluster filtering [76], genetic algorithms [73], fuzzy
logic [39, 64], etc. have all been proposed in the literature.
The wavelet transform has also been used to suppress noise in digital images. It has been shown that reducing the absolute value of wavelet coefficients is successful in signal restoration [43]. This process is known as wavelet shrinkage. Other
more complex denoising techniques select or reject wavelet coefficients based on their
predicted contribution to reconstructed image quality. This process is known as se-
lective wavelet shrinkage, and many works have used it as the preferred method of
image denoising. Preliminary methods predict the contribution of the wavelet co-
efficients based on the magnitude of the wavelet coefficients [69], and others based
on intra-scale dependencies of the wavelet coefficients [15, 20, 41, 43]. More recent
denoising methods are based on both intra- and inter-scale coefficient dependencies
[18, 26, 29, 42, 54].
Mallat and Hwang prove the successful removal of noise in signals via the wavelet
transform by selecting and rejecting wavelet coefficients based on their Lipschitz
(Holder) exponents [43]. The Holder exponent is a measure of regularity in a sig-
nal, and it may be approximated by the evolution of wavelet coefficient ratios across
scales. Thus, this regularity metric is used to select those wavelet coefficients which are to be used in reconstruction and those which are not. Although this fundamental
work in image denoising is successful in the removal of noise, its application is broad
and not focused on image noise removal, and the results are not optimal.
Malfait and Roose refined the selective shrinkage denoising approach by applying
a Bayesian probabilistic formulation, and modeling the wavelet coefficients as Markov
random sequences [42]. This method is focused on image denoising and its results are
an improvement upon [43]. The Holder exponents are roughly approximated by the
evolution of coefficient values across scales, i.e.
m_{l,n} = (1/(p − l)) Σ_{k=l}^{p−1} |λ_{k+1,n} / λ_{k,n}|,

where m_{l,n} is the approximated Holder exponent of position n of scale l, and λ_{k,n} is
the wavelet coefficient of scale k and position n. The rough approximation is refined
by assuming that the coefficient values are well modeled as a Markov chain, and
the probability of a coefficient's contribution to the image can be well approximated
by the Holder exponents of neighboring coefficients. Coefficients are then assigned
binary labels xk,n of scale k and position n depending on their predicted retention
for reconstruction (xk,n = 1), or predicted removal (xk,n = 0). The binary labels are
then randomly and iteratively switched until P (X|M) is maximized, where xk,n ∈ X
and m_{k,n} ∈ M. The coefficients are modified by λ^{new}_{k,n} = λ_{k,n} P(x_{k,n} = 1|M), and the
denoised image is formed by the inverse wavelet transform of the modified coefficients.
Each coefficient is reduced in magnitude depending on the probable contribution to
the image, i.e. P (xk,n = 1|M).
Later, Pizurica et al. [54] continued the work of [42] by using a different approximation of the Holder exponent given by

ρ_{l,n} = (1/(p − l)) Σ_{k=l}^{p−1} |I_{k+1,n} / I_{k,n}|,

where

I_{k,n} = Σ_{t∈C(k,n)} |λ_{k,t}|.
ρk,n is the approximation of the Holder exponent, and C(k, n) is the set of coefficients
surrounding λk,n. This work applies the same probabilistic model as [42] using the
new approximation of the Holder exponent. Coefficients are assigned binary labels,
xk,n, depending on their predicted retention for reconstruction (xk,n = 1), or predicted
removal (xk,n = 0). The binary labels are then randomly and iteratively switched until
P (X|M) is maximized. Unlike [42], the significance measure of a coefficient, M , is not
merely its Holder exponent, but evaluated by the magnitude of the coefficients as well
as its Holder approximation, i.e. f_{M|X}(m_{k,n}|x_{k,n}) = f_{Λ|X}(λ_{k,n}|x_{k,n}) f_{R|X}(ρ_{k,n}|x_{k,n}).
Thus a joint measure of coefficient significance is developed based on both the Holder
exponent approximation and the magnitude of the wavelet coefficient. As in [42], the
coefficients are modified by λ^{new}_{k,n} = λ_{k,n} P(x_{k,n} = 1|M).
Although both algorithms in [42] and [54] show promising results in denoised image
quality, the iterative procedure necessary to maximize the probability P (X|M) adds
computational complexity, making the processing times of the algorithms impractical
for most image and video processing applications. Also, the Markov Random Field
(MRF) model used in the calculation of P (X|M) is not appropriate for analysis of
wavelet coefficients because it ignores the influence of non-neighboring coefficients.
The MRF model is strictly used for simplicity and conceptual ease [42].
From the review of the literature, one can see that image denoising remains an active and challenging topic of research. The major challenge lies in the fact that one does not know the original signal of a corrupted image. The performance of a method, on the other hand, can only be measured by comparing the denoised image with its original. In this chapter, we present a new denoising approach which
consists of two components. The first is the selective wavelet shrinkage method for
denoising, and the second is a new threshold selection method which makes use of test
images as training samples.
In general, selective shrinkage methods are comprised of three processing steps.
First, a corrupted image is decomposed into multiresolution subbands via the wavelet
transform. Next, wavelet coefficients are modified based upon certain criteria to
predict their importance in reconstructed image quality. Finally, the denoised image
is formed by reconstructing the modified coefficients via the inverse wavelet transform.
The most computationally costly processing step in the methods of [42] and [54], and the one of greatest importance to denoising performance, is the coefficient modification process, which calls for effective and efficient criteria to modify wavelet coefficients. To improve
performance, this chapter presents a new coefficient selection process which uses a
two-threshold criteria to non-iteratively select and reject wavelet coefficients. The
two-threshold selection criteria results in an effective and computationally simple
coefficient selection process.
The threshold selection method presented is based on minimizing the error be-
tween the wavelet coefficients of the denoised image and the wavelet coefficients of
an optimally denoised image produced by a method using supplemental information.
The supplemental information provided produces a denoised image that is far superior to that of any method which does not utilize supplemental information. Thus, the image
produced by the method utilizing supplemental information is referred to as an op-
timally denoised image. Using several test cases, the threshold values which produce
the minimum difference between the wavelet coefficients of the denoised image and
the wavelet coefficients of the optimally denoised image are chosen as the threshold
values for the general case.
The two-threshold coefficient selection method results in a denoising algorithm
which gives improved results upon those provided by [42, 54] without the compu-
tational complexity. The two-threshold requirement investigates the regularities of
wavelet coefficients both spatially and across scales for predictive coefficient selection,
providing selective wavelet shrinkage to non-decimated wavelet subbands.
Following the Introduction, Section 3.2 gives theory on the 2D non-decimated
wavelet analysis and synthesis filters. Section 3.3 then describes the coefficient selec-
tion process prior to selective wavelet shrinkage. Section 3.4 gives testing results for
parameter selection. Section 3.5 gives the estimation algorithms for proper parameter
selection, and Section 3.6 gives the results. Section 3.7 gives the discussion.
3.2 2D Non-Decimated Wavelet Analysis and Synthesis
To facilitate the discussion of the proposed method, non-decimated wavelet filter-
bank theory is presented. In certain applications such as signal denoising, it is not desirable to downsample wavelet coefficients after decomposition, as in the traditional wavelet filterbank, because the spatial resolution of the coefficients is degraded by downsampling. Therefore, for the non-decimated case, each subband contains the same number of coefficients as the original signal.
Let ak[n] and dk[n] be scaling and wavelet coefficients, respectively, of scale k and
position n. Thus,
α_k[2^{k+1} n] = a_k[n]
λ_k[2^{k+1} n] = d_k[n],    (3.1)
where αk[·] are the non-decimated scaling coefficients, and λk[·] are the non-decimated
wavelet coefficients. Equation 3.1 is substituted into the scaling analysis filterbank
equation, Equation 2.25, to find the non-decimated filterbank equation:
a_{k+1}[n] = Σ_m h[m] a_k[m − 2n]
α_{k+1}[2^{k+2} n] = Σ_m h[m] α_k[2^{k+1}(m − 2n)]
α_{k+1}[n] = Σ_m h[m] α_k[2^{k+1} m − n],    (3.2)
where h[·] and g[·] are the filter coefficients corresponding to the low-pass and high-
pass filter, respectively, of the wavelet transform. The 2k+1 scalar introduced into
Equation 3.2 is equivalent to upsampling h[·] by 2k+1 prior to its convolution with
αk[·]. Similarly Equation 3.1 is substituted into the wavelet analysis filterbank equa-
tion, Equation 2.26, to obtain
λ_{k+1}[n] = Σ_m g[m] α_k[2^{k+1} m − n].    (3.3)
Figure 3.1 gives a block diagram of the non-decimated wavelet decomposition.
Figure 3.1: Non-decimated wavelet decomposition.
The synthesis of the non-decimated wavelet transform also differs from the down-
sampled case. From the wavelet synthesis filterbank equation, Equation 2.28, we
obtain,
a_k[2n] = Σ_m h[2(n − m)] a_{k+1}[m] + Σ_m g[2(n − m)] d_{k+1}[m].    (3.4)
Substituting p = n − m, we obtain

a_k[2n] = Σ_p h[2p] a_{k+1}[n − p] + Σ_p g[2p] d_{k+1}[n − p].    (3.5)
Substituting Equation 3.1 into Equation 3.5,
α_k[2^{k+2} n] = Σ_p h[2p] α_{k+1}[2^{k+2}(n − p)] + Σ_p g[2p] λ_{k+1}[2^{k+2}(n − p)],    (3.6)
and
α_k[n] = Σ_p h[2p] α_{k+1}[n − 2^{k+2} p] + Σ_p g[2p] λ_{k+1}[n − 2^{k+2} p].    (3.7)
Looking at Equation 3.7, samples are discarded by downsampling α_{k+1}[·] and λ_{k+1}[·] by 2 prior to convolution. Because the downsampling in the analysis filters is eliminated, a downsample by 2 appears in the synthesis equation, Equation 3.7. If the downsample by 2 is not performed, i.e. substituting m = 2p, then we must divide by 2 to preserve power equality. That is,
α_k[n] = (1/2) Σ_m h[m] α_{k+1}[n − 2^{k+1} m] + (1/2) Σ_m g[m] λ_{k+1}[n − 2^{k+1} m].    (3.8)
Figure 3.2 gives a block diagram of the non-decimated wavelet transform synthesis.
Figure 3.2: Non-decimated wavelet synthesis.
The above analysis is expanded to the two-dimensional case. For a 2D discrete
signal f(·), the 2D non-decimated wavelet transform is given by
α_{ll,k+1}[x, y] = Σ_{n,m} h[n] h[m] α_{ll,k}[2^{k+1} m − x, 2^{k+1} n − y]
λ_{hl,k+1}[x, y] = Σ_{n,m} h[n] g[m] α_{ll,k}[2^{k+1} m − x, 2^{k+1} n − y]
λ_{lh,k+1}[x, y] = Σ_{n,m} g[n] h[m] α_{ll,k}[2^{k+1} m − x, 2^{k+1} n − y]
λ_{hh,k+1}[x, y] = Σ_{n,m} g[n] g[m] α_{ll,k}[2^{k+1} m − x, 2^{k+1} n − y],    (3.9)

where

α_{ll,−1}[x, y] = f(x, y).    (3.10)
The four coefficient sets given in Equation 3.9 are referred to as the low-low band, α_{ll,k+1}[·], the high-low band, λ_{hl,k+1}[·], the low-high band, λ_{lh,k+1}[·], and the high-high band, λ_{hh,k+1}[·]. The subbands are named for the order in which the scaling and/or the wavelet filters process the scaling coefficients.
For the synthesis of f(·) we have,
α_{ll,k}[x, y] = (1/4) Σ_{m,n} h[m] h[n] α_{ll,k+1}[x − 2^{k+1} m, y − 2^{k+1} n]
             + (1/4) Σ_{m,n} h[m] g[n] λ_{hl,k+1}[x − 2^{k+1} m, y − 2^{k+1} n]
             + (1/4) Σ_{m,n} g[m] h[n] λ_{lh,k+1}[x − 2^{k+1} m, y − 2^{k+1} n]
             + (1/4) Σ_{m,n} g[m] g[n] λ_{hh,k+1}[x − 2^{k+1} m, y − 2^{k+1} n].    (3.11)
Equation 3.9 is recursively computed to produce several levels of wavelet coefficients,
and reconstruction of the 2D signal, f(·), is accomplished by the recursive computa-
tion of Equation 3.11.
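The recursion above can be illustrated with a short sketch. The following is a minimal single-level, non-decimated 2D Haar transform (the wavelet ultimately selected in Equation 3.26) with circular boundary handling; the shift convention and band naming are simplified relative to Equations 3.9 and 3.11, so this is an illustrative sketch rather than the dissertation's implementation.

```python
import numpy as np

R2 = np.sqrt(2.0)

def nd_analyze(a, step, axis=0):
    """One non-decimated Haar level along `axis`: h = [1, 1]/sqrt(2),
    g = [-1, 1]/sqrt(2), dilated by `step`, circular boundaries."""
    shifted = np.roll(a, -step, axis=axis)        # a[n + step]
    return (a + shifted) / R2, (-a + shifted) / R2

def nd_synthesize(low, high, step, axis=0):
    """Inverse of nd_analyze; the 1/2 factor restores power as in Eq. 3.8."""
    direct = low - high                           # recovers sqrt(2) * a[n]
    shifted = np.roll(low + high, step, axis=axis)
    return (direct + shifted) / (2.0 * R2)

def nd_haar2d(f, step=1):
    """One 2D level (cf. Eq. 3.9): returns (ll, hl, lh, hh), all same size as f."""
    low, high = nd_analyze(f, step, axis=0)       # filter along one axis
    ll, hl = nd_analyze(low, step, axis=1)        # then along the other
    lh, hh = nd_analyze(high, step, axis=1)
    return ll, hl, lh, hh

def ind_haar2d(ll, hl, lh, hh, step=1):
    """Inverse 2D level (cf. Eq. 3.11); two 1/2 factors give the overall 1/4."""
    low = nd_synthesize(ll, hl, step, axis=1)
    high = nd_synthesize(lh, hh, step, axis=1)
    return nd_synthesize(low, high, step, axis=0)

rng = np.random.default_rng(0)
img = rng.standard_normal((16, 16))
bands = nd_haar2d(img)
print(np.allclose(ind_haar2d(*bands), img))       # perfect reconstruction: True
```

Each returned subband has the same size as the input, as the non-decimated construction requires; multiple levels follow by re-applying `nd_haar2d` to `ll` with `step` doubled.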
The non-decimated wavelet transform has many advantages in signal denoising over the traditional decimated case. First, each subband in the wavelet decomposition is equal in size, so it is more straightforward to find the spatial relationships between subbands. Second, the spatial resolution of each of the subbands is preserved by eliminating the downsample by two. Because of the elimination of the downsampler,
information contained in the wavelet coefficients is redundant, and this redundancy
is exploited to determine the coefficients comprised of noise and the coefficients com-
prised of feature information contained in the original image.
3.3 Retention of Feature-Supporting Wavelet Coefficients
One of the many advantages of the wavelet transform over other mathematical
transformations is the retention of the spatial relationship between pixels in the orig-
inal image by the coefficients in the wavelet domain. These spatial relationships
represent features of the image and should be retained as much as possible during
denoising. In general, images are comprised of regular features, and the resulting
wavelet transform of an image generates few, large, spatially contiguous coefficients
which are representative of the features given in the original image. We refer to the
spatial contiguity of the wavelet coefficients as spatial regularity.
The concept of spatial regularity serves a similar function to that of signal regularity in previous denoising approaches for selecting the wavelet coefficients. The key
difference is that spatial correlation of the features are represented by connectivity of
wavelet coefficients rather than statistical models such as Markov random sequences
[42, 54] or Holder exponents [42, 43, 54] in previous methods. These models are often
computationally complicated and still do not reflect the geometry of the features explicitly. As a result, the current method achieves better performance with a much
simpler computation.
Because of spatial regularity, the resulting subbands of the wavelet transform do
not generally contain isolated coefficients. This regularity can aid in deciding which
coefficients should be selected for reconstruction and which should be discarded for maximum reconstructed image quality. The proposed coefficient selection method, in which spatial regularity is exploited, is described as follows.
Let us assume that an image is corrupted with additive noise, i.e.

f̃(x, y) = f(x, y) + η(x, y),    (3.12)

where f(·) is the noiseless 2D signal, η(·) is a random noise function, and f̃(·) is the corrupted signal.
The first step for selecting the wavelet coefficient is to form a preliminary binary
label for each coefficient, which collectively form a binary map. The binary map is
then used to determine whether or not a particular wavelet coefficient is included in
a regular spatial feature. The wavelet transform of f̃(·) generates coefficients, λ_{·,k}[·], from Equations 3.9 and 3.10. λ_{·,k}[·] is used to create the preliminary binary map, I_{·,k}[·]:

I_{·,k}[x, y] = { 1, when |λ_{·,k}[x, y]| > τ; 0, else },    (3.13)
where τ is a threshold for selecting valid coefficients in the construction of the binary
coefficient map. A valid coefficient is defined as a coefficient, λ·,k[x, y], which results
in I·,k[x, y] = 1; hence the coefficient has been selected due to its magnitude. After
coefficients are selected by magnitude, spatial regularity is used to further examine
the role of the valid coefficient: whether it is isolated noise or part of a spatial feature.
The number of supporting binary values around a particular non-zero value I·,k[x, y]
is used to make the judgement. The support value, S·,k[x, y], is the sum of all I·,k[·]
which support the current binary value I·,k[x, y]; that is, the total number of all valid
coefficients which are spatially connected to I·,k[x, y].
A coefficient is spatially connected to another if there exists a continuous path of
valid coefficients between the two. Figure 3.3 gives a generic coefficient map. The valid
coefficients are highlighted in gray. From Figure 3.3 it can be shown that coefficients
A, B, C, and H do not support any other valid coefficients in the coefficient map.
However, coefficients D and F support each other, coefficients E and G support each
other, and N and O support each other. Also, coefficients I, J, K, L, M, P, Q, and R
all support one another. Figure 3.4 gives the value of S·,k[x, y] for each of the valid
coefficients given in Figure 3.3. A method of computing S·,k[x, y] is given in Appendix
A. S·,k[·] is used to refine the original binary map I·,k[·] by
J_{·,k}[x, y] = { 1, when S_{·,k}[x, y] > s, or J_{·,k+1}[x, y] I_{·,k}[x, y] = 1; 0, else },    (3.14)
Figure 3.3: Generic coefficient array.
where J·,k[·] is the refined binary map, and s is the necessary number of support
coefficients for selection. J·,·[·] is calculated recursively, starting from the highest
multiresolution level, and progressing downward.
Equation 3.14 is equal to one when there exist enough large-magnitude wavelet coefficients around the current coefficient. However, it is also equal to one when the magnitude of the coefficient is effectively large (I_{·,k}[·] = 1) but not locally supported, provided that the coefficient at the larger scale is large and locally supported (J_{·,k+1}[·] = 1). This criterion addresses the somewhat rare case in which a useful coefficient is not locally supported. In the general case, wavelet coefficients of images are clustered together and rarely isolated. In [43], wavelet coefficients are modified only by their evolution across scales. Regular signal features contain wavelet coefficients which increase with increasing scale. Thus, if
Figure 3.4: Generic coefficient array, with corresponding S·,k values.
there exists a useful coefficient which is isolated in an image, it is reasonable to expect that the coefficient at the same spatial location in the next larger scale will be sufficiently large and spatially supported. Thus, the coefficient selection method given by Equation 3.14 selects coefficients which are sufficiently large and locally supported, as well as isolated coefficients which are sufficiently large and supported by scale.
This type of scale-selection is consistent with the findings of Said and Pearlman
[62], who developed an image codec based on a ”spatial self-symmetry” between dif-
fering scales in wavelet transformed images. They discovered that most of an image’s
energy is concentrated in the low-frequency subbands of the wavelet transform. And
because of the self-symmetry properties of wavelet transformed images, if a coefficient
value is insignificant (i.e. of small value or zero), then it can be assumed that the
coefficients of higher spatial frequency and same spatial location are insignificant also.
In our application, however, we are looking for significance rather than insignificance,
so we look to the significance of lower frequency coefficients to determine significance
of the current coefficient. In this way, the preliminary binary map is refined by both
spatial and scalar support, as given by Equation 3.14.
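The two-threshold refinement just described can be sketched as follows. This illustration assumes 8-connectivity for the "continuous path of valid coefficients" and, matching Figure 3.4, counts every other valid coefficient in the same connected group as support; the function names and the connectivity choice are assumptions (the dissertation's actual computation of S is given in Appendix A).

```python
import numpy as np
from collections import deque

def support_map(I):
    """S[x, y]: number of other valid coefficients connected to [x, y] by a
    path of valid coefficients (8-connectivity assumed); 0 elsewhere."""
    H, W = I.shape
    S = np.zeros((H, W), dtype=int)
    seen = np.zeros((H, W), dtype=bool)
    for x in range(H):
        for y in range(W):
            if I[x, y] and not seen[x, y]:
                seen[x, y] = True
                comp, q = [(x, y)], deque([(x, y)])
                while q:                      # breadth-first component search
                    cx, cy = q.popleft()
                    for dx in (-1, 0, 1):
                        for dy in (-1, 0, 1):
                            nx, ny = cx + dx, cy + dy
                            if (0 <= nx < H and 0 <= ny < W
                                    and I[nx, ny] and not seen[nx, ny]):
                                seen[nx, ny] = True
                                comp.append((nx, ny))
                                q.append((nx, ny))
                for cx, cy in comp:           # each member is supported by the rest
                    S[cx, cy] = len(comp) - 1
    return S

def refine_map(lam, J_coarser, tau, s):
    """Eq. 3.13 (magnitude test) followed by Eq. 3.14 (spatial/scalar support)."""
    I = (np.abs(lam) > tau).astype(int)
    S = support_map(I)
    return ((S > s) | ((J_coarser * I) == 1)).astype(int)

lam = np.array([[5.0, 5.0, 0.0],
                [0.0, 0.0, 0.0],
                [0.0, 0.0, 5.0]])
# The isolated coefficient at (2, 2) is rejected despite its magnitude,
# unless the coarser-scale map J_coarser marks that position.
print(refine_map(lam, np.zeros((3, 3), dtype=int), tau=1.0, s=0))
```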
The final coefficients retained for reconstruction are given by
L_{·,k}[x, y] = { λ_{·,k}[x, y], when J_{·,k}[x, y] = 1; 0, else }.    (3.15)
The denoised image is reconstructed by using the supported coefficients, L_{·,k}[·], in the synthesis equation given in Equation 3.11. Thus,

α_{ll,k}[x, y] = (1/4) Σ_{m,n} h[m] h[n] α_{ll,k+1}[x − 2^{k+1} m, y − 2^{k+1} n]
             + (1/4) Σ_{m,n} h[m] g[n] L_{hl,k+1}[x − 2^{k+1} m, y − 2^{k+1} n]
             + (1/4) Σ_{m,n} g[m] h[n] L_{lh,k+1}[x − 2^{k+1} m, y − 2^{k+1} n]
             + (1/4) Σ_{m,n} g[m] g[n] L_{hh,k+1}[x − 2^{k+1} m, y − 2^{k+1} n].    (3.16)
Equation 3.16 is calculated recursively, producing scaling coefficients of finer resolution until k = −1. The denoised image, f̂(·), is then given by

f̂(x, y) = α_{ll,−1}[x, y].    (3.17)

α_{ll,k}[·] are the reconstructed scaling coefficients of scale k.
In general, natural and synthetic imagery can be compactly represented by a few wavelet coefficients of large magnitude, and these coefficients are in general spatially clustered. Thus, it is useful to base selection methods on magnitude and spatial regularity to distinguish between useful coefficients which are representative of the image and useless coefficients representative of noise. The two-threshold criterion for the rejection of noisy wavelet coefficients is a computationally simple, non-iterative test for magnitude and spatial regularity which can effectively distinguish between useful and useless coefficients.
3.4 Selection of Threshold τ and Support s
The selection of threshold τ and support s is a key component of the denoising
algorithm. Unfortunately, the two parameters cannot be easily determined for a given
corrupted image because there is no information about the decomposition between
the original signal and the noise. We derive τ and s using a set of test images which
serve as training samples. These training samples are artificially corrupted by noise.
The noise is then removed using a range of τ and s values. The pair of τ and s which generates the best results is selected for noise removal in general. This approach has its root in an idea called the oracle [15], which is described below.
An oracle is an entity which provides extra information to aid in the denoising
process. The extra information provided by the oracle is undoubtedly beneficial in providing substantially better denoising results than methods which are not furnished supplemental information. Thus, the coefficient selection method which uses the oracle's information is referred to as the optimal denoising method. With the optimal denoising method, the threshold and support can be selected using test images for which both the original image and the noise are known. The selected threshold and support functions can then be applied to any corrupted image without supplemental information.
An optimal coefficient selection process has been defined based on the original
(noiseless) image. The optimal binary map, J^{opt}_{·,k}[·], is given by

J^{opt}_{·,k}[x, y] = { 1, when |λ̄_{·,k}[x, y]| > σ_n; 0, else },    (3.18)

where λ̄_{·,k}[·] are the wavelet coefficients of the original (noiseless) image, f(·), and σ_n is the standard deviation of the noise in the corrupted image, f̃(·). Thus, the extra information given by the oracle is the set of noiseless wavelet coefficients, λ̄_{·,k}[·]. The coefficients of the original image are used in the coefficient selection process, but not in the image reconstruction. The coefficients which are used in the reconstruction, L^{opt}_{·,k}[·], are given by

L^{opt}_{·,k}[x, y] = { λ_{·,k}[x, y], when J^{opt}_{·,k}[x, y] = 1; 0, else },    (3.19)
where λ·,k[·] are the wavelet coefficients of the noisy image.
The optimal coefficient map is used to create the optimal denoised image which
is given by
α^{opt}_{ll,k}[x, y] = (1/4) Σ_m Σ_n h[m] h[n] α^{opt}_{ll,k+1}[x − 2^{k+1} m, y − 2^{k+1} n]
                + (1/4) Σ_m Σ_n h[m] g[n] L^{opt}_{hl,k+1}[x − 2^{k+1} m, y − 2^{k+1} n]
                + (1/4) Σ_m Σ_n g[m] h[n] L^{opt}_{lh,k+1}[x − 2^{k+1} m, y − 2^{k+1} n]
                + (1/4) Σ_m Σ_n g[m] g[n] L^{opt}_{hh,k+1}[x − 2^{k+1} m, y − 2^{k+1} n].    (3.20)
Equation 3.20 is recursively computed for lesser values of k until the optimal denoised image is achieved, where

f^{opt}(x, y) = α^{opt}_{ll,−1}[x, y].    (3.21)

α^{opt}_{ll,k}[·] are the optimal scaling coefficients, and f^{opt}(·) is the optimally denoised image.
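The oracle's selection rule (Equations 3.18 and 3.19) reduces to a few lines. The sketch below is an illustration with hypothetical names, where `lam_clean` stands for the noiseless coefficients supplied by the oracle and `lam_noisy` for the coefficients of the corrupted image.

```python
import numpy as np

def oracle_select(lam_clean, lam_noisy, sigma_n):
    """Eq. 3.18: mark positions where the CLEAN coefficient exceeds sigma_n.
    Eq. 3.19: retain the NOISY coefficient at those positions, zero elsewhere."""
    J_opt = (np.abs(lam_clean) > sigma_n).astype(int)
    L_opt = np.where(J_opt == 1, lam_noisy, 0.0)
    return J_opt, L_opt

J, L = oracle_select(np.array([0.4, -8.0, 3.0]),
                     np.array([1.1, -7.2, 2.5]), sigma_n=2.0)
print(J, L)   # J = [0, 1, 1]; L keeps only the last two noisy values
```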
Figure 3.5 gives the denoising results of the optimal denoising method when applied
to the ”Lenna” image corrupted with additive white Gaussian noise (AWGN). As
shown in Figure 3.5, the optimal denoising method is able to effectively remove the
noise from the ”Lenna” image because of the added information given by the oracle.
PSNR is calculated for performance measurement and is given by
PSNR = 20 log_{10}(255 / √mse),    (3.22)

where

mse = (1 / (W_f H_f)) Σ_x Σ_y (f(x, y) − f̂(x, y))².    (3.23)
Figure 3.5: Optimal denoising method applied to noisy ”Lenna” image. Left: Corrupted image f̃(x, y), σ_n = 50, PSNR = 14.16 dB. Right: Optimally denoised image f^{opt}(x, y), PSNR = 27.72 dB.
mse is the mean-squared error between the original image f(·) and the denoised image f̂(·), and W_f and H_f are the width and height of the image, respectively.
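Equations 3.22 and 3.23 translate directly into code; a minimal sketch for 8-bit images (peak signal value 255):

```python
import numpy as np

def psnr(f, f_hat):
    """PSNR in dB between an original image f and a processed image f_hat,
    per Eqs. 3.22-3.23 (peak signal value 255 for 8-bit images)."""
    mse = np.mean((np.asarray(f, float) - np.asarray(f_hat, float)) ** 2)
    return 20.0 * np.log10(255.0 / np.sqrt(mse))

f = np.zeros((4, 4))
print(psnr(f, f + 25.5))   # 255/25.5 = 10, so 20*log10(10) = 20.0 dB
```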
PSNR is the most popular quality metric among researchers in the image and
video processing community and has been used almost exclusively in the literature
for more than a decade. However, it is also well known in the community that PSNR is
not always consistent with the human perception of quality. That is, although image
processing method A is shown to give a higher PSNR than image processing method
B, people on average may tend to prefer the results of image processing method B.
Because of this inconsistency, recently there has been research conducted in the
development of new quality metrics which tend to give results that more closely follow human perception. A metric called QI (quality index) has been developed based
not on pixel error as in PSNR, but on loss of correlation, luminance distortion, and
contrast distortion [75]. This method is tested, and the results suggest that QI may
be a better means of quantitative quality measurement than PSNR.
Also, another metric has been developed which suggests even more consistent quality assessment than both QI and PSNR. The weighted frequency-domain normalized
mean-squared error (W-NMSE) quality metric is based upon wavelet coefficient er-
ror [19]. Results given in [19] suggest that W-NMSE gives results that are closer to
human perception than both PSNR and QI.
In addition to PSNR, QI, and W-NMSE, there are also a number of proprietary
quality metrics available for purchase. So, there is a choice to be made when eval-
uating the performance of an image processing algorithm. The choice made in this
dissertation is to use PSNR, and there is a reason for the decision. The methods of
[19, 75] are very new metrics developed only in the past few years. These metrics may
be substantially better metrics than PSNR, but they have not had time to impact
the literature published by the image and video processing communities. Because
the methods of [19, 75] are new, it is unclear how much of an improvement they
have over PSNR, and until these metrics become more well known and commonplace
among researchers they will not replace PSNR as the quality metric of choice. Also,
the results of methods given in this dissertation are compared to methods developed
previously whose results are given in the literature. These methods all use PSNR as
the performance metric, so we must use PSNR for consistency.
It is rather obvious that the optimal coefficient selection process is unattainable
when no supplemental information is provided by the oracle. Thus the optimal image
denoising method is not possible for practical implementation. However, the knowl-
edge obtained from the optimal binary map, J^{opt}_{·,k}[·], is used to compare with the refined coefficient map generated by the two-threshold criteria, J_{·,k}[·], described in Section
3.3. The coefficient selection method is based on the error between the optimal coef-
ficient subband and the subband generated by the two-threshold criteria. The error
is given by
Error = [ Σ_{p∈{hl,lh,hh}, k, x, y} (J^{opt}_{p,k}[x, y] ⊕ J_{p,k}[x, y]) λ²_{p,k}[x, y] ] / [ Σ_{p∈{hl,lh,hh}, k, x, y} λ²_{p,k}[x, y] ],    (3.24)
where ⊕ is the exclusive OR operation.
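Equation 3.24 can be sketched as follows, assuming each subband is supplied as a triple of same-shaped integer maps and coefficient arrays; the names are illustrative:

```python
import numpy as np

def selection_error(bands):
    """Eq. 3.24: energy-weighted disagreement (XOR) between the oracle map
    J_opt and the two-threshold map J, summed over all detail subbands.
    `bands` is a list of (J_opt, J, lam) triples, one per subband and scale."""
    num = sum(np.sum((J_opt ^ J) * lam ** 2) for J_opt, J, lam in bands)
    den = sum(np.sum(lam ** 2) for _, _, lam in bands)
    return num / den

J_opt = np.array([1, 0, 1])
J     = np.array([1, 1, 0])
lam   = np.array([2.0, 1.0, 1.0])
print(selection_error([(J_opt, J, lam)]))   # (0*4 + 1*1 + 1*1) / 6 = 1/3
```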
In the proposed coefficient selection algorithm, we use a training sample approach.
The approach starts with a series of test images serving as training samples to derive
the functions which determine the optimal set of values for τ and s as well as the type
of wavelet used for denoising. Theoretically, we may represent each training sample
as a vector V_i, i = 1, ..., n. Those training samples should span a space which includes
many similar images corrupted by noise:
S = Span{V_i; i = 1, ..., n}.    (3.25)
The original data and the statistical distribution of the noise are given for each of
the training samples which are corrupted. The optimal set of parameters can then
be determined for the training samples using the approach described earlier. Ideally,
the space spanned by the training samples contains the types of corrupted images
which are to be denoised. As a result, the same set can generate an optimal or close
to optimal performance for the corrupted images of same type. It is clear that more
training samples will generate parameters suitable for more types of images, while
a space of fewer training samples is suitable for fewer types of images. In the
following, we will use some examples to illustrate this approach. The test images
Figure 3.6: Test images.
are all 256x256 pixels. Shown in Figure 3.6, each of the training sample images is
well known in the image processing community, and collectively the set represents as many
types of images as possible. Starting from the upper-left image and going clockwise,
the images are ”Lenna”, ”Airplane”, ”Fruits”, and ”Girl”. In this way, the τ and s
obtained will likely perform well in most cases.
A test is used to demonstrate the effectiveness of different wavelets in denoising.
First, each of the four test images is corrupted with AWGN at various levels. Next,
the 2D non-decimated wavelet transform, given in Section 3.2, is calculated using
several different wavelets. The wavelet coefficients are then hard thresholded using a
threshold T ranging from 0 to 150, and the inverse wavelet transform is applied to the
thresholded coefficients. The wavelet which gives the reconstructed images with the
highest average PSNR is chosen to be used in the general case.
Several wavelets were used in the testing. However, for simplicity only five are
presented. We have chosen the Daubechies wavelets [12] (Daub4 and Daub8) for
their smoothness properties, the spline wavelets (first order and quadratic spline) [6]
because of their use in the previous works of [42, 43, 54], and the Haar wavelet because
of its simplicity and compact support. The results are given in Figure 3.7. Based on the testing results given in Figure 3.7, the Haar wavelet is selected for image denoising:

h[n] = { 1/√2, when n = 0, 1; 0, else }    g[n] = { −1/√2, when n = 0; 1/√2, when n = 1; 0, else }.    (3.26)
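As a quick check, the Haar filter taps in Equation 3.26 have unit energy and are mutually orthogonal, the property behind the power-preserving 1/2 factor in Equation 3.8. A small sketch, not part of the dissertation:

```python
import numpy as np

h = np.array([1.0, 1.0]) / np.sqrt(2.0)    # low-pass taps, Eq. 3.26
g = np.array([-1.0, 1.0]) / np.sqrt(2.0)   # high-pass taps, Eq. 3.26

# Unit energy and orthogonality of the analysis pair.
print(np.isclose(h @ h, 1.0), np.isclose(g @ g, 1.0), np.isclose(h @ g, 0.0))
# prints: True True True
```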
Testing has shown the Haar wavelet to be the most promising in providing the highest
reconstructed image quality. The compact support of the Haar wavelet enables each wavelet coefficient to represent the smallest number of original pixels in comparison
with other types of wavelets. Therefore, when a coefficient is removed because of its
insignificance or isolation, the result affects the smallest area of the original image in the reconstruction, which reduces the impact on image quality even if a removed coefficient is not comprised solely of noise.
The Haar wavelet is used in a non-decimated wavelet decomposition of the original
image. Three subband levels are used, i.e. k = −1 to 2. The proposed selective
Figure 3.7: Average PSNR values using different wavelets. Each panel plots PSNR (dB) versus threshold T (0 to 150) for a different noise level (σ_n = 10, 20, 30, and 40); the wavelets compared are the Haar, first-order spline, quadratic spline, Daubechies 4, and Daubechies 8 wavelets.
wavelet shrinkage algorithm is applied to all wavelet subbands, and the subbands are
synthesized by the non-decimated inverse wavelet transform.
Testing for the optimal values of τ and s is accomplished by artificially adding
Gaussian noise to each of the four images, denoising all four images with a particular τ
and s, and recording the average error given by Equation 3.24. Then, the combination
of τ and s which gives the lowest error is the choice for that particular noise level.
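The search just described is an exhaustive grid minimization; a minimal sketch, where `avg_error` is assumed to return the average of Equation 3.24 over the four training images at one noise level:

```python
def best_parameters(avg_error, taus, supports):
    """Evaluate avg_error(tau, s) over the full (tau, s) grid and return the
    pair that minimizes it, as in Section 3.4."""
    return min(((t, s) for t in taus for s in supports),
               key=lambda p: avg_error(*p))

# Toy stand-in error surface with a known minimum at (63, 10).
toy = lambda t, s: (t - 63) ** 2 + (s - 10) ** 2
print(best_parameters(toy, range(0, 151), range(0, 21)))   # (63, 10)
```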
The average error is recorded when denoising each of the four test images given
in Figure 3.6 using τ ranging from 0 to 150 and s ranging from 0 to 20. The proposed algorithm is tested by applying AWGN with a standard deviation (σ_n) of 10, 20, 30, 40, and 50 to each of the test images. The proposed method of selective wavelet shrinkage is applied to the corrupted image, and the resulting error is recorded using Equation 3.24. The results of the testing for σ_n = 30 are given in Figure 3.8.
Figure 3.8: Error results for test images, σn = 30. (Surface plot of percent error
versus threshold value τ, ranging from 0 to 150, and spatial support s, ranging
from 0 to 20.)
Table 3.1 gives the τ and s which provide the lowest average error for each noise
level tested. These particular values are referred to as τm(·) and sm(·). Table 3.1
Noise Level (σn)    10      20      30      40      50
Min. Avg. Error     3E-4    11E-4   24E-4   42E-4   64E-4
sm value            5       9       10      15      14
τm value            23      43      63      85      108

Table 3.1: Minimum average error of test images for various noise levels and their
corresponding threshold and support values.
suggests that parameters τm(·) and sm(·) are functions of the standard deviation of
the noise, σn.
Because τm(·) and sm(·) generally increase with an increase in additive noise as
shown in Table 3.1, both parameters can be modeled as functions of the additive
noise, σn. Then, knowing the level of noise corruption, the threshold levels which
produce the minimum average error may be obtained by estimating the τm(·) and
sm(·) functions. The five noise levels provided in the test are used as sampling points
for the estimation of the continuous functions τm(·) and sm(·). With enough sampling
points both τm(·) and sm(·) can be effectively estimated, and the correct τ and s can
be calculated to denoise an image with any level of noise corruption, given that the
noise level is known.
The estimated functions of the sampled values τm(·) and sm(·) are referred to as
τ̂m(·) and ŝm(·), respectively. Once the estimated functions are calculated they are
used in the general case. Thus, given an image corrupted with noise, it is denoised with
no prior knowledge by estimating the level of noise corruption, calculating the proper
thresholds using the τ̂m(·) and ŝm(·) functions, and using the calculated threshold
levels in the denoising process given in Section 3.3.
3.5 Estimation of Parameter Values
It can be seen from the values given in Table 3.1 that the parameters τm(·) and
sm(·) are functions of σn; therefore, we need to estimate both the standard deviation
of the noise and the functions themselves. These two topics are discussed in this section.
3.5.1 Noise Estimation
The level of noise in a given digital image is unknown and must be estimated from
the noisy image data. Several well known algorithms have been given in the literature
to estimate image noise. In [16, 54], the median of the λhh,0[·] subband is used in
the estimation process. The median noise estimation method of [54] is used in our
algorithm:
σ̂n = Median(|λhh,0[·]|) / 0.6745,   (3.27)
where λhh,0[·] are the noisy wavelet coefficients in the high-high band of the 0th scale.
Because the vast majority of useful information in the wavelet domain is confined
to few and large coefficients, the median can effectively estimate the level of noise
(i.e. the average level of the useless coefficients) without being adversely influenced
by useful coefficients.
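To make the estimator concrete, a minimal numpy sketch of Equation 3.27 follows. Note the simplifying assumption: the high-high band is approximated here by a one-level decimated Haar HH subband, whereas the dissertation uses the non-decimated transform; the estimator itself has the same median/0.6745 form either way.

```python
import numpy as np

def estimate_noise_sigma(img):
    """Median-based noise estimate (cf. Equation 3.27).

    A minimal sketch: the high-high band is approximated by a one-level
    decimated Haar HH subband (the dissertation uses the non-decimated
    transform, but the estimator has the same form).
    """
    img = np.asarray(img, dtype=float)
    # Haar HH band: difference along rows, then along columns (scaled by 1/2),
    # so pure Gaussian noise of std sigma yields HH coefficients of std sigma.
    hh = (img[0::2, 0::2] - img[0::2, 1::2]
          - img[1::2, 0::2] + img[1::2, 1::2]) / 2.0
    return np.median(np.abs(hh)) / 0.6745
```

The division by 0.6745 converts the median absolute deviation of a zero-mean Gaussian into its standard deviation, which is why the median is insensitive to the few large signal coefficients.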
3.5.2 Parameter Estimation
Using the known level of noise added to the original images, the values of τm(·)
and sm(·), given in Table 3.1, are estimated. One of the simplest and most popular
estimation procedures is the LMMSE (Linear Minimum Mean Squared Error) method,
and it is used here [68]. That is, two parameters aτ and bτ
are found such that

τ̂m(σn) = aτ σn + bτ.   (3.28)

The choice of aτ and bτ will minimize the mean squared error. Similarly, an estimate
of sm, which must be an integer, is found as:

ŝm(σn) = ⌊as σn + bs⌋.   (3.29)
The parameters which minimize the mean squared error are: aτ = 2.12, bτ = 0.80,
as = 0.26, and bs = 2.81.
The LMMSE estimation procedure gives a simple description of the τm and sm
functions; only two values (a and b) are needed to determine the proper thresholds
for denoising. The LMMSE estimator is also shown to be a good fit to the test data
in Figure 3.9, which plots the values of τm(·) and sm(·) along with their corresponding
LMMSE estimates. The LMMSE estimate functions are the best linear fit to the data.
Note that the support value sm must be an integer.
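The linear fit for the threshold parameters can be reproduced directly from the values in Table 3.1. The sketch below uses a plain least-squares line (numpy's `polyfit`) on the τm samples; it recovers aτ = 2.12 and bτ = 0.80. (For sm, the dissertation's reported as and bs differ slightly from a plain linear fit, presumably because the floor operation in Equation 3.29 is part of the error being minimized.)

```python
import numpy as np

# Sampled minimum-error parameters from Table 3.1.
sigma_n = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
tau_m = np.array([23.0, 43.0, 63.0, 85.0, 108.0])

# Least-squares linear fit: tau_m(sigma) ~ a_tau * sigma + b_tau (Eq. 3.28).
a_tau, b_tau = np.polyfit(sigma_n, tau_m, deg=1)
print(round(a_tau, 2), round(b_tau, 2))  # 2.12 0.8
```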
The threshold τ and the support value s are determined by using the estimate of
the noise given by Equation 3.27. The two thresholds are given by
τ = aτ σ̂n + bτ,
s = ⌊as σ̂n + bs⌋.   (3.30)
Using this information, a new image denoising algorithm is formalized. With a given
image, the noise level is estimated by Equation 3.27, τ and s are then calculated using
Equation 3.30, and the image is denoised by the method given in Section 3.3.
Figure 3.9: τm(·), sm(·) and their corresponding estimates, τ̂m(·), ŝm(·). (Two
panels: the threshold value τ and the local support value s versus noise level σ,
showing the minimum-error values and their LMMSE estimates.)
3.6 Experimental Results
The "Peppers" and "House" images are used for gauging the performance of the
proposed denoising algorithm. These two images have also been used in the results of
[42, 43, 54]. Therefore, the proposed algorithm's performance is compared with the
performance of other recent algorithms given in the literature. Both the "Peppers"
and "House" images are corrupted with AWGN and the proposed method is used for
denoising. The results are given in Figures 3.10 and 3.11.
"Peppers" image
Input PSNR                     22.6    19.6    16.6    13.6    Average
Proposed Algorithm             31.00   28.98   27.17   25.46   28.15
Pizurica 3-band [54]           30.20   28.60   27.00   25.20   27.75
Pizurica 2-band [54]           29.90   28.20   26.60   24.90   27.40
Malfait and Roose [42]         28.60   27.30   26.00   24.60   26.63
Mallat and Hwang [43]          28.20   27.30   27.10   24.60   26.80
Matlab's Sp. Adaptive Wiener   29.00   27.10   25.30   23.30   26.18

"House" image
Input PSNR                     23.9    20.9    17.9    14.9    Average
Proposed Algorithm             33.09   31.55   29.81   28.34   30.70
Pizurica 3-band [54]           32.80   31.30   29.80   28.30   30.55
Pizurica 2-band [54]           32.10   30.50   29.30   28.10   30.00
Malfait and Roose [42]         32.90   31.30   29.80   28.20   30.55
Mallat and Hwang [43]          31.30   30.50   29.10   27.10   29.50
Matlab's Sp. Adaptive Wiener   30.30   28.60   26.70   24.90   27.63

Table 3.2: PSNR comparison of the proposed method to other methods given in the
literature (results given in dB).
Table 3.2 gives the results of the proposed method, as well as the results of
[42, 43, 54]. Note that the methods of [42, 43, 54] all use the quadratic spline wavelet
[6] in three subband levels, and each of the algorithms’ coefficient selection method
is based on a probabilistic formulation to determine how much a particular coeffi-
cient contributes to the overall image quality. The proposed algorithm uses the Haar
wavelet, given in Equation 4.4, in three subband levels, and the coefficient selection
process is based on a geometrical approach. As shown in Table 3.2, the results of the
proposed method are an improvement over other methods described in the literature.
In addition to improved performance, the proposed algorithm is computationally
simple, facilitating real-world applications. The proposed algorithm has been run
on older processors for an accurate comparison, and the computation time of the
Processor                  Pentium IV   Pentium III   IBM RS6000/320H
Proposed Algorithm         0.66         1.14          ***
Pizurica 3-band [54]       ***          45.00         ***
Pizurica 2-band [54]       ***          30.00         ***
Malfait and Roose [42]     ***          ***           180.00

*** Computation time not evaluated

Table 3.3: Computation times for a 256x256 image, in seconds.
proposed method is an order of magnitude less than the previous method of highest
performance, [54]. Table 3.3 gives the computational results of the proposed method
as well as the results of [42, 54].
The proposed algorithm shows a substantial drop in computation time. Both
[42] and [54] use iterative computation in the selection of wavelet coefficients for
reconstruction, which requires unreasonable computation time for certain applications.
The current two-threshold technique is a simpler, non-iterative coefficient selection
method which also produces better denoising results.
In addition to obtaining a higher signal-to-noise ratio than established image de-
noising algorithms, the proposed denoising algorithm facilitates image compression
when used as a pre-processing step. That is, the image is first denoised using the
proposed method, then compressed by 2D wavelet compression. The "Peppers" image is
compressed with various quantization step sizes, both with and without the proposed
denoising algorithm. Figure 3.12 gives the compression results.
As given in Figure 3.12, regardless of the quantization step, applying the proposed
denoising algorithm prior to compression improves the compression ratio. However,
pre-processing is most beneficial when the step size is small. This is not surprising,
Image                 Step Size   Without Denoising       With Denoising
Lenna (512x512)       2           4.82:1, 159.2 kbytes    7.15:1, 107.4 kbytes
Fruits (512x512)      4           8.66:1, 88.7 kbytes     10.92:1, 70.4 kbytes
Barb (512x512)        8           11.67:1, 65.8 kbytes    12.98:1, 59.19 kbytes
Goldhill (512x512)    16          24.56:1, 31.3 kbytes    28.30:1, 27.14 kbytes
Peppers (512x512)     32          49.28:1, 15.6 kbytes    51.19:1, 15.0 kbytes

Table 3.4: Compression ratios of 2D wavelet compression both with and without
denoising applied as a pre-processing step.
however. When a large step size is applied to the wavelet transform subbands, much
of the noise inherent in the image as well as much image content is removed, thus
increasing the compression ratio. However, when a small step size is applied, much of
the inherent noise is included in the compressed image, decreasing the compression
ratio.
Table 3.4 gives the results of 2D wavelet compression of various images both with
and without the denoising algorithm applied as a pre-processing step. As shown in
Table 3.4, when the denoising algorithm is applied to the image prior to compression,
the 2D wavelet compression algorithm achieves better performance. However, the
performance improvement is greater with a smaller quantization step size.
3.7 Discussion
A new selective wavelet shrinkage algorithm for image denoising has been
described. The proposed algorithm uses a two-threshold support criterion which
investigates coefficient magnitude, spatial support, and support across scales in the
coefficient selection process. In general, images can be accurately represented by a few
large wavelet coefficients, and those few coefficients are spatially clustered together.
The two-threshold criterion is an efficient and effective way of using the magnitude and
spatial regularity of wavelet coefficients to distinguish useful from useless coefficients.
Furthermore, the two-threshold criterion is a non-iterative approach to selective wavelet
shrinkage, providing a computationally simple solution that facilitates real-time image
processing applications.
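Purely as an illustration, a simplified magnitude-plus-spatial-support selection rule can be sketched as follows. This is a hypothetical reading, not the dissertation's exact criterion: Section 3.3 defines the actual rule (including support across scales), and the 3x3 neighbourhood count below is a stand-in.

```python
import numpy as np

def two_threshold_select(coeffs, tau, s):
    """Simplified two-threshold selection sketch (illustrative only).

    Keep a wavelet coefficient if its magnitude exceeds tau AND at least
    s other significant coefficients lie in its 3x3 neighbourhood -- a
    stand-in for the spatial-support test of Section 3.3.
    """
    sig = np.abs(coeffs) > tau                     # magnitude threshold
    # Count significant neighbours in a 3x3 window (excluding the centre);
    # the zero padding gives an empty neighbourhood beyond the borders.
    padded = np.pad(sig.astype(int), 1)
    support = sum(np.roll(np.roll(padded, dy, 0), dx, 1)[1:-1, 1:-1]
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                  if (dy, dx) != (0, 0))
    keep = sig & (support >= s)
    return np.where(keep, coeffs, 0.0)
```

An isolated large coefficient (support 0) is discarded as likely noise, while clustered large coefficients survive, which is the spatial-regularity intuition described above.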
The values of the two-thresholds are determined by minimizing the error between
the coefficients selected by the two-thresholds and the coefficients selected by a de-
noising method which uses supplemental information provided by an oracle. The
supplemental information provided by the oracle is useful in determining the cor-
rect coefficients to select, and the denoising performance is substantially greater than
methods which do not use the supplemental information. Thus, the method which
uses the supplemental information provided by the oracle is referred to as the opti-
mal denoising method. Therefore, by minimizing the error between the two-threshold
method and the optimal denoising method, the two-threshold method can come as
close as possible to the performance of the optimal denoising method.
Consequently, the two-threshold method of selective wavelet shrinkage provides an
image denoising algorithm which outperforms previous image denoising methods given
in the literature in both denoised image quality and computation time. The light
computational burden of the proposed denoising method makes it suitable for real-time
image processing applications.
Figure 3.10: Results of the proposed image denoising algorithm. Top left: Original
"Peppers" image. Top right: Corrupted image, σn = 37.75, PSNR = 16.60 dB.
Bottom: Denoised image using the proposed method, PSNR = 27.17 dB.
Figure 3.11: Results of the proposed image denoising algorithm. Top left: Original
"House" image. Top right: Corrupted image, σn = 32.47, PSNR = 17.90 dB. Bottom:
Denoised image using the proposed method, PSNR = 29.81 dB.
Figure 3.12: Wavelet-based compression results with and without pre-processing.
(Compressed file size in kbytes of the "Peppers" image versus quantization step size,
for 2-D wavelet compression with and without denoising as a pre-processing step.)
CHAPTER 4

Combined Spatial and Temporal Domain Wavelet Shrinkage Algorithm for Video Denoising
4.1 Introduction
As shown in the introduction of Chapter 3, the process of removing noise in digital
images has been studied extensively [15, 17, 18, 20, 26, 27, 29, 31, 39, 41, 42, 43, 46,
49, 53, 54, 61, 64, 69, 73, 77, 76, 83]. However, until recently, the removal of noise
in video signals has not been studied seriously. Cocchia et al. developed a
three-dimensional rational filter for noise removal in video signals [10]. The 3D rational
filter is able to remove noise while preserving important edge information. The 3D
rational filter also uses a motion estimation technique: where no motion is detected,
the filter is applied in the temporal domain; otherwise, only spatial domain processing
is applied.
Later, Zlokolica et al. used two new techniques for noise removal in image
sequences [83]; both show improved results over the method of [10]. The first method
is the alpha-trimmed mean filter of [4] extended to video signals,
and the second is the K nearest neighbors (KNN) filter. Both alpha-trimmed and
KNN denoising methods are based on ordering the pixel values in the neighborhood
of the location to be filtered, and averaging a portion of those spatially contiguous
pixels. Each of these methods attempts to average values which are close in value,
and avoid averaging values which are largely dissimilar in value. Thus, the image
sequence is smoothed without blurring edges.
However, because of the success of the wavelet transform over other mathematical
tools in denoising images, some researchers believe that wavelets may be successful
in the removal of noise in video signals as well. Pizurica et al. use a wavelet-based
image denoising method to remove noise from each individual frame in an
image sequence, then applies a temporal filtering process for temporal domain noise
removal [55]. The combination of wavelet image denoising and temporal filtering
outperforms both wavelet based image denoising techniques [42, 43, 54] and spatial-
temporal filtering techniques [4, 10, 83].
The temporal domain filtering technique described in [55] is a linear IIR filter
which will continue to filter until it reaches a large temporal discontinuity. It will not
filter the locations of large temporal discontinuity where the absolute difference in
neighboring pixel values is greater than a threshold, T , thus preserving motion while
removing noise.
Although temporal processing improves upon the quality of the original image
denoising method, the best value of the parameter T varies across video signals.
That is, T may be large in sequences where there is little motion, i.e., where there
is more redundancy between consecutive frames; a large T exploits this redundancy
to improve video quality. However, in image sequences containing a large amount of
motion, consecutive frames are more independent and there is little to no redundancy
to exploit. Thus, the parameter T must be small to achieve optimal performance.
In the case of video denoising, it has been fairly well documented that the amount
of noise removal achievable from temporal domain processing, while preserving overall
quality, is dependent on the amount of motion in the original video signal [10, 55].
Thus, a robust, high-quality video denoising algorithm is required to not only be
scalable to differing levels of noise corruption, but also scalable to differing amounts
of motion in the original signal. Unfortunately, this principle has not been seriously
considered in video denoising.
In this chapter, we develop a noise removal algorithm for video signals. This algo-
rithm uses selective wavelet shrinkage in all three dimensions of the image sequence
and proves to outperform the few video denoising algorithms given in the relevant
literature in terms of PSNR. First, the individual frames of the sequence are denoised
by the method described in Chapter 3, then a new selective wavelet shrinkage method
is used for temporal domain processing.
Also, a motion estimation algorithm is developed to determine the amount of
temporal domain processing to be performed. Several motion estimators have been
proposed [10, 55], but few are robust to noise corruption. The proposed motion esti-
mation algorithm is robust to noise corruption and an improvement over the motion
estimation method of [10]. The proposed denoising algorithm, including the proposed
motion estimation method, is experimentally determined to be an improvement over
the methods of [10, 55, 83].
Following the Introduction, Section 4.2 describes the temporal domain wavelet
shrinkage method and explores the proper order of temporal and spatial domain
processing functions. Section 4.3 provides the proposed motion estimation index
used in the temporal domain processing and compares it with the motion estimation
method of [10]. Section 4.4 develops the parameters for temporal domain processing,
and Section 4.5 gives the experimental results of the proposed method as well as other
established methods. Section 4.6 gives the discussion.
4.2 Temporal Denoising and Order of Operations
In this section, we develop the principal algorithm for video denoising. Additional
mechanisms required by this algorithm will be discussed in later sections.
4.2.1 Temporal Domain Denoising
Let us define f^z_l as the pixel at spatial location l and frame z in a given image
sequence. The non-decimated wavelet transform applied in the temporal domain is
given by:

λ^{3D}_{k+1}[l, z] = ∑_p g[p] α^{3D}_k[l, 2^{k+1}p − z],   (4.1)

and

α^{3D}_{k+1}[l, z] = ∑_p h[p] α^{3D}_k[l, 2^{k+1}p − z],   (4.2)

where

α^{3D}_{−1}[l, z] = f^z_l.   (4.3)

λ^{3D}_k[l, z] is the high-frequency wavelet coefficient at spatial location l, frame z,
and scale k. Also, α^{3D}_k[l, z] is the low-frequency scaling coefficient at spatial
location l, frame z, and scale k. Thus, multiple resolutions of wavelet coefficients
may be generated by iterative calculation of Equations 4.1 and 4.2.
The wavelet function used in the temporal domain denoising process is the Haar
wavelet, given by

h[n] = { 1/√2,  when n = 0, 1
       { 0,     else,

g[n] = { −1/√2, when n = 0
       { 1/√2,  when n = 1
       { 0,     else.          (4.4)

The decision to use the Haar wavelet is based on experimentation with several other
wavelet functions, which found the best results with the Haar. The compact support
of the Haar wavelet makes it a suitable function for denoising applications. Because
of its compact support, the Haar coefficients represent the least number of original
pixels in comparison to other types of wavelets. Thus, when a coefficient is removed
because of its insignificance, the result affects the smallest area of the original signal
in the reconstruction.
Significant wavelet coefficients are selected by their magnitude with a threshold
operation:

L^{3D}_k[l, z] = { λ^{3D}_k[l, z],  when |λ^{3D}_k[l, z]| > τ_z[l],
                 { 0,               else,                            (4.5)
where L^{3D}_k[·] are the thresholded wavelet coefficients used in signal reconstruction,
and τ_z[·] is the threshold value. The resulting denoised video signal is computed via
the inverse non-decimated wavelet transform

α^{3D}_k[l, z] = (1/2) ∑_p h[p] α^{3D}_{k+1}[l, z − 2^{k+1}p]
               + (1/2) ∑_p g[p] L^{3D}_{k+1}[l, z − 2^{k+1}p],   (4.6)

which leads to

f^{z,3D}_l = α^{3D}_{−1}[l, z].   (4.7)

f^{z,3D}_l is the temporally denoised video signal.
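The one-level temporal Haar step can be sketched as below. This is a simplified illustration: it uses a standard shift-invariant Haar pair with periodic boundaries along the frame axis and a single, spatially constant threshold, rather than the dissertation's multi-scale transform and spatially varying τ_z[l].

```python
import numpy as np

def haar_analysis(x):
    """One level of a non-decimated Haar transform along the time axis.

    x: array of shape (frames, ...) -- a pixel trajectory or whole frames.
    Returns (approx, detail), each the same shape as x (periodic boundary).
    """
    xn = np.roll(x, -1, axis=0)           # x[z+1]
    approx = (x + xn) / np.sqrt(2.0)
    detail = (xn - x) / np.sqrt(2.0)
    return approx, detail

def haar_synthesis(approx, detail):
    """Inverse of haar_analysis; averages the two redundant reconstructions,
    mirroring the 1/2 factors of Equation 4.6."""
    ap = np.roll(approx, 1, axis=0)       # a[z-1]
    dp = np.roll(detail, 1, axis=0)       # d[z-1]
    return ((approx - detail) + (ap + dp)) / (2.0 * np.sqrt(2.0))

def temporal_denoise(x, tau):
    """Hard-threshold the temporal detail coefficients (cf. Equation 4.5)."""
    a, d = haar_analysis(x)
    d = np.where(np.abs(d) > tau, d, 0.0)
    return haar_synthesis(a, d)
```

With tau = 0 the pair reconstructs the input exactly, which is a useful sanity check on the analysis/synthesis factors.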
4.2.2 Order of Operations
With a spatial denoising technique and a temporal denoising technique established
in Chapter 3 and above, respectively, there still remains the question of the order of
operations. The highest quality may occur with temporal domain denoising followed
by spatial domain (TFS) denoising, or spatial denoising followed by temporal (SFT)
denoising.
Theoretically, it is not possible to prove which order is better because the exact
description of the noise is not known. However, it is our hypothesis that SFT
denoising can more aptly separate noise from signal information. The reasoning
behind this hypothesis is that removing noise in the spatial domain is a well understood
process, and any noise removal prior to temporal domain processing helps in
discriminating between the residual noise and motion in the image sequence.
This hypothesis is validated heuristically.
Thus, a test is conducted using two video signals. The first video signal is one
which contains little motion, and the other contains a great deal of motion. The
selected image sequences are the "CLAIRE" sequence from frames #104-167 and the
"FOOTBALL" sequence from frames #33-96.
Both of the image sequences are denoised with τ and τz ranging from 0 to 30 for
both TFS and SFT denoising operations. Note that in the test, τz is a single,
spatially independent value, unlike the temporal threshold τz[·] used in the final
denoising algorithm, which depends on spatial position and is given in Equation 4.5.
Also, the s parameter for feature selection in the image denoising method described
in Section 3.3 is calculated by taking Equation 3.30 and solving for s. The parameter
s is given by:

s = ⌊(as/aτ)(τ − bτ) + bs⌋.   (4.8)
Also, the number of resolutions of the non-decimated wavelet transform used in both
the spatial and temporal denoising methods is k = 1...5. The average PSNR of each
trial is recorded. The PSNR of an image is given by Equation 3.22.
Figure 4.1 gives the results of testing. As shown in Figure 4.1, the highest
average PSNR is achieved by SFT denoising: first spatially denoising each frame of
the sequence, followed by temporal domain denoising. Thus, in the proposed
denoising method, spatial domain denoising always occurs prior to temporal domain
denoising.
In addition to a higher average PSNR, there is another benefit to SFT denoising.
The level of motion in an image sequence is known to be crucial in determining the
amount of noise reduction possible from temporal domain processing, and a motion
index calculation is inevitably done by comparing consecutive frames to one another.
Thus, let us define a noisy image sequence where f̃^z_l is a corrupted pixel at spatial
position l and frame z, defined by

f̃^z_l = f^z_l + η^z_l,   (4.9)

where f^z_l is the noiseless pixel value and η^z_l is the noise function. We can compare
consecutive frames by taking their difference, as in [10, 55], to find

f̃^z_l − f̃^{z+1}_l = ∆f^z_l + ∆η^z_l.   (4.10)

Thus, by taking the difference between frames to find the level of motion, one noise
function is subtracted from another, in effect doubling the noise variance [68].
Therefore, by applying spatial denoising prior to motion index calculation we can
reduce the value of ∆η^z_l and provide a more precise calculation of the motion given
in the image sequence.
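The variance-doubling effect of frame differencing is easy to check numerically. The sketch below (illustrative only) compares the variance of the difference of two independent noise fields against the single-field variance:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 10.0

# Two independent noise fields, as in consecutive frames of a static scene.
eta_z = rng.normal(0.0, sigma, 100_000)
eta_z1 = rng.normal(0.0, sigma, 100_000)

# For independent noise, Var(eta_z - eta_z1) = 2 * sigma^2.
ratio = np.var(eta_z - eta_z1) / sigma**2
print(round(ratio, 1))  # ~2.0
```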
4.3 Proposed Motion Index
A motion index is important to the success of a video denoising method in order
to discriminate between large temporal variances in the video signal which are caused
by noise and large temporal variances which are caused by motion in the original
(noiseless) signal. A motion index aids temporal denoising algorithms in
eliminating the large temporal variances caused by noise while preserving the temporal
variances caused by motion in the original image sequence, creating a higher quality
video signal. That is, the motion index is used to determine τz[·].
4.3.1 Motion Index Calculation
Several works have developed a motion estimation index to determine the amount
of temporal domain processing to perform, i.e., the amount of information that can
be removed from the original signal to improve the overall quality [10, 55]. However,
neither of these proposed indices is robust to noise corruption, which is an important
feature of a motion index. A motion index must possess a few characteristics. First,
a motion index should be a localized value, because the amount of motion may vary
in different spatial portions of an image sequence; the motion index should be able
to identify those differences. Second, a motion index needs to be unaffected by the
amount of noise corruption in a given video signal, so that it can determine the
proper amount of temporal domain processing.
Thus, a localized motion index is developed which is relatively unaffected by the
level of noise corruption in the original image sequence. A spatially averaged temporal
standard deviation (SATSD) is used as the index of motion. Spatial averaging is used
to remove the noise inherent in the signal, and the temporal standard deviation is
used to detect the amount of activity in the temporal domain.
Let us define f^{z,2D}_l as the pixel value at spatial location l of the zth frame of an
image sequence already processed by the 2D denoising method given in Chapter 3.
The spatial averaging of the spatially denoised signal is given by

A^z_l = (1/B²) ∑_{i∈I} f^{z,2D}_i,   (4.11)

where I is the set of spatial locations which form a square area centered around spatial
location l, and B² is the number of spatial locations contained in I; typically, B = 15.
The value of B must be odd to allow the square area to sit centrally around spatial
location l. This average is used to find the standard deviation in the temporal domain:
µ_l = (1/F) ∑_{i=1}^{F} A^i_l,   (4.12)

and

M_l = sqrt( (1/F) ∑_{i=1}^{F} (A^i_l − µ_l)² ).   (4.13)

M_l is the localized motion index, F is the number of frames in the image sequence,
and µ_l is the temporal mean of the spatial average at location l.
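Equations 4.11 through 4.13 can be sketched together in a few lines of numpy. Note the assumptions: the box average here uses zero padding at the image borders (the dissertation does not specify its boundary handling), and the population standard deviation matches the 1/F normalization of Equation 4.13.

```python
import numpy as np

def box_average(img, B=15):
    """B x B spatial mean (Eq. 4.11), via a separable 1-D convolution.

    Zero padding at the borders is an assumption; B should be odd so the
    window sits centrally on each pixel.
    """
    k = np.ones(B) / B
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, img)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, out)
    return out

def satsd(frames, B=15):
    """Spatially averaged temporal standard deviation (Eqs. 4.11-4.13).

    frames: array (F, H, W) of spatially denoised frames.
    Returns an (H, W) motion index map M_l.
    """
    A = np.stack([box_average(f, B) for f in frames])  # A^z_l, Eq. 4.11
    return A.std(axis=0)                               # Eqs. 4.12-4.13
```

A static sequence yields a motion index of zero everywhere, while noise surviving the spatial average raises the index only slightly, which is the robustness property argued for above.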
4.3.2 Motion Index Testing
The "FOOTBALL" and "CLAIRE" image sequences are used once more to test
the proposed motion index as well as the motion index given in [10], and two specific
spatial locations are selected from each sequence: a location where there is little to no
motion present, and a location where motion is present. A frame from each of the two
image sequences is given in Figure 4.2, and the four spatial locations for evaluation
of the proposed motion index are highlighted.
The two sequences are corrupted with various levels of noise, and the motion
is estimated at each of the four spatial locations selected with both the proposed
motion index and that of [10]. The results of the motion index used in [10] is given
in Figure 4.3. As shown in Figure 4.3, the motion index of [10] is not robust to noise
corruption. That is, the motion calculation from the same spatial location increases
with an increase in noise. Also, the motion index shows the ”FOOTBALL” image
sequence (x = 300, y = 220) as having a higher motion index than the ”CLAIRE”
image sequence (x = 40, y = 200) with zero noise corruption. However, the motion
index shows the opposite results with higher levels of noise. Thus, the motion index
gives conflicting results with the introduction of noise.
The results of the proposed SATSD motion index are given in Figure 4.4. As
shown in Figure 4.4, the proposed motion index is much more robust to varying noise
levels, and the ordering of locations from highest to lowest motion matches what one
would expect. The location with the lowest motion index is in the "CLAIRE"
image sequence, where there is no camera motion and there are no moving objects in
that spatial location. The next lowest motion location is in the "FOOTBALL" image
sequence in the spatial location where there are no moving objects; however, there
is some slight camera motion in the sequence, so the motion index is slightly higher
than in the "CLAIRE" image sequence. The location with the next highest motion
due to movement of the head, and the location with the highest motion index is the
"FOOTBALL" image sequence in the spatial location where many objects cross.
4.4 Temporal Domain Parameter Selection
The amount of temporal denoising which is beneficial to an image sequence is
dependent upon the amount of noise corruption as well as the amount of motion.
Thus, the threshold τz[·] is given by

τ_z[l] = α σ̂_n + β M_l,   (4.14)

where M_l is the motion index of spatial position l, and σ̂_n is the estimated noise
standard deviation of the image sequence. The two parameters α and β are determined
experimentally using test image sequences.
In the proposed coefficient selection method, we use a training sample approach.
The approach starts with a series of test image sequences serving as training samples
to derive the functions which determine the optimal set of the values for α and β.
Theoretically, we may represent each training sample as a vector Vi, i = 1, ..., n. The
training samples should span a space which covers more corrupted image sequences
than the training samples themselves:

S = Span{Vi; i = 1, ..., n}.   (4.15)
The original data and the statistical distribution of the noise are given for each of
the training samples which are corrupted. The optimal set of parameters can then be
determined which give the highest average PSNR for the training samples. Ideally,
the space spanned by the training samples contains the type of the corrupted image
sequences which are to be denoised. As a result, the same parameter set can generate
optimal or close to optimal performance for the corrupted image sequences of the
same type. It is clear that more training samples will generate parameters suitable
for more types of image sequences, while a space of fewer training samples is suitable
for fewer types of image sequences.
In order to obtain an estimate of the noise level, σn, an average is taken from
the noise estimates of each frame in the image sequence, given by Equation 3.27. It
is reasonable to assume an IID (Independent, Identically Distributed) model for the
level of noise for each pixel position since noise in each pixel position is generated by
individual sensing units of the image sensor such as CCD [25] which are independent.
As a result, the estimate of the standard deviation of the noise (σn) in each image also
represents the standard deviation of the noise in the temporal domain. Therefore,
we can use the estimate of the noise in the spatial domain to estimate that in the
temporal domain.
It should be pointed out that after denoising has occurred in the spatial domain
using the SFT method, the standard deviation of the noise is significantly reduced.
That reduction is statistically equal for each frame. As a result, the estimated noise
in the spatial domain can still be nominally used for noise reduction in the temporal
domain, as the reduction of σn can be automatically absorbed by α.
The sequences "CLAIRE", "FOOTBALL", and "TREVOR" are used for α and β
selection. Each of the image sequences is corrupted with differing levels of noise
(σn = 10, 20) and denoised with the SFT denoising method, where Equation 4.14 is
used as the temporal domain threshold. Values of α ranging from 0 to 3.0 and β
ranging from −0.3 to 0.3 are used. The results of this testing are given in Figure
4.5. As shown in Figure 4.5, the maximum average PSNR is achieved when α = 0.9
and β = −0.11. This result is reasonable: as the motion in an image sequence
increases, the redundancy between frames decreases, and the benefits of
temporal domain processing diminish. Thus, as the testing has shown, the temporal
domain threshold decreases as the motion increases.
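The α and β selection described above amounts to an exhaustive grid search that maximizes the average PSNR over the training sequences. A minimal sketch is given below; `denoise` is a hypothetical stand-in for the SFT pipeline, treated as a black box, and is not the dissertation's code.

```python
import numpy as np

def avg_psnr(clean, test):
    """Average PSNR in dB over a sequence, assuming an 8-bit peak of 255."""
    mse = np.mean((np.asarray(clean, float) - np.asarray(test, float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

def select_alpha_beta(clean_seqs, noisy_seqs, denoise, alphas, betas):
    """Exhaustive (alpha, beta) search maximizing the mean of the average
    PSNRs over the training sequences."""
    return max(((a, b) for a in alphas for b in betas),
               key=lambda ab: np.mean([avg_psnr(c, denoise(n, *ab))
                                       for c, n in zip(clean_seqs, noisy_seqs)]))
```

With the grid stated in the text (α from 0 to 3.0, β from −0.3 to 0.3), this search would return the reported optimum near (0.9, −0.11) for the training set used.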
4.5 Experimental Results
The proposed video denoising algorithm is first applied to each video frame
individually and independently. The method developed in Chapter 3 is used to denoise
single images, and serves as the spatial denoising portion of the wavelet-based video
denoising algorithm.
The video signal is then denoised in the temporal domain by the method developed
in Sections 4.2 and 4.4. The temporal denoising algorithm is a selective shrinkage
algorithm which uses a proposed motion estimation index to determine the temporal
threshold, τz[·]. The temporal threshold is modified by the motion index to effectively
eliminate temporal domain noise while preserving important motion information.
Three image sequences are used to determine the effectiveness of the proposed
video denoising method: the "SALESMAN", "TENNIS", and "FLOWER" image sequences.
These three sequences are all corrupted with various levels of noise and denoised
with the methods of [10, 55, 83] as well as the proposed method. Please note that
only the temporal domain denoising algorithm of [55] is tested; the spatial domain
denoising method given in Chapter 3 is used for all the wavelet-based video denoising
methods. The results are
given in Figures 4.6 through 4.11. As shown in these figures, the
proposed method consistently outperforms the other methods presented. In all cases,
the proposed denoising method has a higher average PSNR than all other denoising
methods tested. Also, note that in the method of [55], the threshold T must be changed
with video content and noise level to obtain the highest average PSNR using that
particular method. In the proposed method, the temporal domain threshold, τz[·], is
automatically calculated from estimates of the noise level and motion.
Figures 4.12 through 4.17 give an example of the effectiveness of each of the
denoising methods. Figure 4.12 gives the original frame #7 of the SALESMAN image
sequence, and Figure 4.13 gives frame #7 corrupted with noise. Figures 4.14 through
4.17 give frame #7 denoised by each of the methods mentioned in this section.
In addition to obtaining a higher signal-to-noise ratio than established video de-
noising algorithms, the proposed denoising algorithm facilitates the compression of
video signals when used as a pre-processing step. That is, the image sequence is first
denoised using the proposed method, then compressed by 3D wavelet compression.
The "CLAIRE" image sequence is compressed with various quantization step sizes,
both with and without the proposed denoising algorithm. Figure 4.18 gives the com-
pression results. As given in Figure 4.18, regardless of the quantization step, applying
the proposed denoising algorithm prior to compression improves the compression ra-
tio. However, pre-processing is most beneficial when the step size is small.
Table 4.1 gives the results of 3D wavelet compression of various image sequences
both with and without the denoising algorithm applied as a pre-processing step.
Figure 4.1: Test results of both TFS and SFT denoising methods. Upper left: FOOTBALL image sequence, SFT denoising, max. PSNR = 30.85, τ = 18, τz = 12. Upper right: FOOTBALL image sequence, TFS denoising, max. PSNR = 30.71, τ = 18, τz = 12. Lower left: CLAIRE image sequence, SFT denoising, max. PSNR = 40.77, τ = 19, τz = 15. Lower right: CLAIRE image sequence, TFS denoising, max. PSNR = 40.69, τ = 15, τz = 21.
Figure 4.2: Spatial positions of motion estimation test points. Left: FOOTBALL image sequence, frame #96. Right: CLAIRE image sequence, frame #167.
Figure 4.3: Motion estimate given in [10] of the CLAIRE and FOOTBALL image sequences (local motion estimate vs. noise standard deviation; CLAIRE frames 104−167 at positions (x=40, y=200) and (x=180, y=144), FOOTBALL frames 33−96 at positions (x=300, y=220) and (x=160, y=120)).
Figure 4.4: Proposed motion estimate of the CLAIRE and FOOTBALL image sequences (proposed local motion estimate Ml vs. noise standard deviation at the same four test points as Figure 4.3).
Figure 4.5: α and β parameter testing for temporal domain denoising (average PSNR surface over α and β for the image sequences used in the test).
Figure 4.6: Denoising methods applied to the SALESMAN image sequence, std. = 10 (PSNR vs. frame number for the proposed method, Pizurica (T = 20), the 2D wavelet filter, the 3D KNN filter, and the 3D rational filter).
Figure 4.7: Denoising methods applied to the SALESMAN image sequence, std. = 20 (PSNR vs. frame number for the proposed method, Pizurica (T = 40), the 2D wavelet filter, the 3D KNN filter, and the 3D rational filter).
Figure 4.8: Denoising methods applied to the TENNIS image sequence, std. = 10 (PSNR vs. frame number for the proposed method, Pizurica (T = 20), the 2D wavelet filter, the 3D KNN filter, and the 3D rational filter).
Figure 4.9: Denoising methods applied to the TENNIS image sequence, std. = 20 (PSNR vs. frame number for the proposed method, Pizurica (T = 40), the 2D wavelet filter, the 3D KNN filter, and the 3D rational filter).
Figure 4.10: Denoising methods applied to the FLOWER image sequence, std. = 10 (PSNR vs. frame number for the proposed method, Pizurica (T = 10), the 2D wavelet filter, the 3D KNN filter, and the 3D rational filter).
Figure 4.11: Denoising methods applied to the FLOWER image sequence, std. = 20 (PSNR vs. frame number for the proposed method, Pizurica (T = 20), the 2D wavelet filter, the 3D KNN filter, and the 3D rational filter).
Figure 4.12: Original frame #7 of the SALESMAN image sequence.
Figure 4.13: SALESMAN image sequence corrupted, std. = 20, PSNR = 22.10.
Figure 4.14: Results of the 3D K-nearest neighbors filter [83], PSNR = 28.42.
Figure 4.15: Results of the 2D wavelet denoising filter, given in Chapter 3, PSNR = 29.76.
Figure 4.16: Results of the 2D wavelet filtering with linear temporal filtering [55], PSNR = 30.47.
Figure 4.17: Results of the proposed denoising method, PSNR = 30.66.
Figure 4.18: Wavelet-based compression results with and without pre-processing (compressed file size vs. quantization step size for the "CLAIRE" image sequence, 3D wavelet compression with and without pre-processing).
Image Sequence            Step Size   Without Denoising       With Denoising
CLAIRE (360x288x168)          2       15.12:1, 3.29 Mbytes    31.72:1, 1.57 Mbytes
FOOTBALL (320x240x97)         4       6.45:1, 3.30 Mbytes     7.95:1, 2.68 Mbytes
MISSA (360x288x150)           8       33.10:1, 1.34 Mbytes    66.93:1, 0.68 Mbytes
CLAIRE (360x288x168)         16       137.2:1, 0.38 Mbytes    170.0:1, 0.30 Mbytes
MISSA (360x288x150)          32       198.2:1, 0.23 Mbytes    273.6:1, 0.17 Mbytes

Table 4.1: Compression ratios of 3D wavelet compression both with and without denoising applied as a pre-processing step.
As shown in Table 4.1, when the denoising algorithm is applied to an image se-
quence prior to compression, the 3D wavelet compression algorithm achieves better
performance. However, the performance improvement is greater with a smaller quan-
tization step size.
4.6 Discussion
In this chapter, a new combined spatial and temporal domain wavelet shrinkage
method is developed for the removal of noise in video signals. The proposed method
uses a geometrical approach to spatial domain denoising to preserve edge information,
and a newly developed motion estimation index for selective wavelet shrinkage in the
temporal domain.
The spatial denoising technique is the selective wavelet shrinkage algorithm developed
in Chapter 3, which is shown to outperform other wavelet shrinkage denoising
algorithms given in the literature in both denoised image quality and
computation time. The temporal denoising algorithm is also a selective wavelet
shrinkage algorithm which uses a motion estimation index to determine the level of
thresholding in the temporal domain.
The proposed motion index is experimentally determined to be more robust to
noise corruption than other methods, and is able to help determine the threshold
value for selective wavelet shrinkage in the temporal domain. With the motion index
and temporal domain wavelet shrinkage, the proposed video denoising method is
experimentally shown to provide a higher average PSNR than other methods given
in the literature for various levels of noise corruption applied to video signals with
varying amounts of motion.
CHAPTER 5
Virtual-Object Video Compression
5.1 Introduction
The finalized version of the MPEG-4 standard was published in December of 1999.
The basis of coding in MPEG-4 is not a processing macroblock, as in MPEG-1 and
MPEG-2, but rather an audio-visual object [3]. Object based compression techniques
have certain advantages, such as:
1) Allowing more user interaction with video content.
2) Allowing the reuse of recurring object content.
3) Removing artifacts caused by the joint coding of objects.
Although MPEG-4 does specify the advantages of object-based compression and
provides a standard of communication between sender and receiver, it does not provide
the means by which a) the content is separated into audio-visual objects, or b) the
audio-visual objects are compressed. Since the publication of the MPEG-4 standard,
much research has been conducted in the areas of shape coding [28, 40, 79] and texture
coding [36, 78] of arbitrarily shaped objects, and methods of object identification and
tracking [11, 23, 80].
However, although some success has been achieved in the various components
necessary for the implementation of an object-based compression method, no such
compression method exists to date. The reason that a robust, object-based compression
method does not exist is two-fold. First, robust multiple-object identification
and tracking methods have yet to be developed. The identification and tracking of
all objects that exist in a given image sequence is difficult, and the object extraction
and tracking technologies given in the literature are not mature enough to handle the
task. Second, it is unknown whether the additional bit savings achieved by object-based
compression will exceed the added overhead of shape coding of objects to
provide an overall compression gain.
Thus, a wavelet-based compression method is presented to provide some of the
benefits of object-based compression methods without the difficulties of true object-
based compression. An object-based wavelet compression algorithm, called virtual-
object compression, is developed for high quality, low bit-rate video.
Virtual-object compression separates the portion of video that exhibits motion
from the portion of the video that is stationary. The stationary video portion is
then grouped as the background, and the portion of the video which exhibits motion
is grouped as the virtual-object. After separation, both background and virtual-
object are coded independently by means of 2D wavelet compression and 3D wavelet
compression, respectively.
There are two separate processing areas in object-based compression: object
extraction, which separates the different objects in an image sequence, and object
compression, which codes the resulting arbitrarily shaped objects.
In the virtual-object compression method, the wavelet transform is used for both
object extraction and compression.
When the wavelet transform is applied in the temporal domain, the motion of ob-
jects is detected by large coefficient values. Therefore, the wavelet transform is used
in the identification and extraction of moving objects prior to object-based compres-
sion. Virtual-object based compression uses the non-decimated wavelet transform in
the temporal domain in the separation of objects and stationary background.
Virtual-object compression also restricts the shape of the virtual-object to be
rectangular. This restriction enables the use of known video compression methods
such as 3D wavelet compression for the compression of the virtual-object. Also,
with a rectangular object restriction, the location and shape of the object can be
completely defined with only two sets of spatial coordinates (the starting horizontal
and vertical locations of the virtual-object, and the width and height of the virtual-
object), virtually eliminating shape coding overhead.
Experimental results show the virtual-object compression method to be superior
in compression ratio and PSNR when compared to both 2D wavelet compression and
3D wavelet compression.
The organization of this chapter is as follows. Following the Introduction, Sec-
tion 5.2 gives a description of 3D wavelet compression. 3D wavelet compression is
a known compression method of video signals [21, 24] and is used to test the effec-
tiveness of virtual-object compression. Section 5.3 describes the virtual-object com-
pression method, and Section 5.4 gives the performance results of both virtual-object
compression and 3D wavelet compression. Section 5.5 gives the discussion.
5.2 3D Wavelet Compression
To show the improvement of the virtual-object compression method over more traditional
compression methods based on macroblocks or frames, we briefly describe a
known compression method called 3D wavelet compression, which is an extension of
the well-known image compression method, 2D wavelet compression. A block diagram
of 3D wavelet compression is given in Figure 5.1, the components of which are
as follows:
Figure 5.1: 3D wavelet compression.
5.2.1 2D Wavelet Transform
The first processing block of 3D wavelet compression is the spatial transformation
of each of the frames of the image sequence into the wavelet domain. This processing
block is referred to as 2D wavelet transformation.
First, let us define a 3-dimensional video signal f(·), where f(x, y, z) is a pixel
in the image sequence at horizontal position x, vertical position y, and frame z. The
dimensions of f(·) are width Wf, height Hf, and F frames. f(·) is a processing unit
referred to as a group of frames (GoF). The 2D wavelet transform of f(·) is given by:
a_{ll,k+1}[x, y, z] = \sum_n \sum_m h[n] h[m] a_{ll,k}[m − 2x, n − 2y, z]
d_{lh,k+1}[x, y, z] = \sum_n \sum_m g[n] h[m] a_{ll,k}[m − 2x, n − 2y, z]
d_{hl,k+1}[x, y, z] = \sum_n \sum_m h[n] g[m] a_{ll,k}[m − 2x, n − 2y, z]
d_{hh,k+1}[x, y, z] = \sum_n \sum_m g[n] g[m] a_{ll,k}[m − 2x, n − 2y, z],   (5.1)
where
a_{ll,−1}[x, y, z] = f(x, y, z).   (5.2)
d_{·,k}[·] and a_{ll,k}[·] are the wavelet and scaling coefficients of subband level k, respectively.
The subband level k ranges over [−1, K_M), where K_M is the 2D multiresolution
level (MRlevel). h[·] is the low-pass scaling filter, and g[·] is the high-pass
wavelet filter. The subscript designations of the coefficients, ll, lh, hl, hh, describe the
horizontal and vertical processing in the coefficient construction. For example, d_{hl,k}[·]
is obtained by first high-pass filtering a_{ll,k−1}[·] with g[·] in the horizontal dimension,
and then low-pass filtering the result with h[·] in the vertical dimension.
The type of wavelet used in all the given results is the FT wavelet, or 5/3 wavelet.
The FT wavelet is given by

h[·] = {−1/8, 1/4, 3/4, 1/4, −1/8}
g[·] = {1/2, −1, 1/2}.   (5.3)
The FT wavelet is chosen because it has been shown to give the best overall quality for
a given compression ratio among wavelets which produce only integer coefficients [74].
Note that the benefits of integer wavelet coefficients are reduced computational
complexity and reduced memory requirements.
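One level of the separable 2D transform of Equation 5.1, using the 5/3 filters of Equation 5.3, can be sketched as follows. This is an illustrative NumPy version with symmetric boundary extension, not the dissertation's implementation.

```python
import numpy as np

# 5/3 ("FT") analysis filters from Eq. 5.3
h = np.array([-1/8, 1/4, 3/4, 1/4, -1/8])   # low-pass scaling filter
g = np.array([1/2, -1, 1/2])                # high-pass wavelet filter

def analyze_1d(x, f):
    """Filter x with f under symmetric extension and keep even samples."""
    pad = len(f) // 2
    xe = np.pad(x, pad, mode="reflect")
    return np.convolve(xe, f, mode="valid")[::2]

def dwt2d_level(a):
    """One level of the separable 2D transform of Eq. 5.1."""
    lo = np.apply_along_axis(analyze_1d, 1, a, h)   # horizontal low-pass
    hi = np.apply_along_axis(analyze_1d, 1, a, g)   # horizontal high-pass
    ll = np.apply_along_axis(analyze_1d, 0, lo, h)  # + vertical low-pass
    lh = np.apply_along_axis(analyze_1d, 0, lo, g)  # + vertical high-pass
    hl = np.apply_along_axis(analyze_1d, 0, hi, h)
    hh = np.apply_along_axis(analyze_1d, 0, hi, g)
    return ll, lh, hl, hh
```

Since h sums to 1 and g sums to 0, a constant image yields a constant ll subband and zero detail subbands, a quick sanity check on the filters.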
After the coefficients are transformed in the spatial domain, they are then quan-
tized to represent the coefficients with no more precision than necessary to obtain the
desired reconstructed quality.
5.2.2 2D Quantization
After the GoF has been 2D wavelet transformed, the coefficients are quantized
uniformly across all subbands in the case of orthonormal wavelet transformation.
However, the wavelet transform used in the given 3D wavelet compression algorithm
is biorthogonal to facilitate integer computation and fast compression. Therefore, the
quantization level is modified according to scale. That is,
a_{ll,k}[x, y, z] = Int(2^{k+1} a_{ll,k}[x, y, z] / s_2)
d_{lh,k}[x, y, z] = Int(2^{k} d_{lh,k}[x, y, z] / s_2)
d_{hl,k}[x, y, z] = Int(2^{k} d_{hl,k}[x, y, z] / s_2)
d_{hh,k}[x, y, z] = Int(2^{k−1} d_{hh,k}[x, y, z] / s_2),   (5.4)
where s_2 is the 2D quantization step size, and a_{ll,k}[·], d_{lh,k}[·], d_{hl,k}[·], and d_{hh,k}[·]
above denote the 2D quantized coefficient values. For more information on orthogonal and biorthogonal
wavelets, refer to [12].
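The scale-dependent quantization of Equation 5.4 can be sketched as below. Int(·) is taken here as rounding, which is an assumption; the original may truncate instead.

```python
import numpy as np

def quantize_2d_level(bands_k, k, s2):
    """Apply Eq. 5.4 to the level-k subbands (a_ll, d_lh, d_hl, d_hh):
    each subband is scaled by a level-dependent power of two before
    dividing by the 2D step size s2."""
    a_ll, d_lh, d_hl, d_hh = bands_k
    q = lambda c, w: np.round(w * c / s2).astype(int)
    return (q(a_ll, 2.0 ** (k + 1)), q(d_lh, 2.0 ** k),
            q(d_hl, 2.0 ** k), q(d_hh, 2.0 ** (k - 1)))
```

The per-level weights compensate for the non-unit gains of the biorthogonal 5/3 filters, mimicking the uniform quantization that an orthonormal transform would permit.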
After all frames in the GoF have been spatially transformed and quantized, they
are then transformed in the temporal domain to exploit inter-frame redundancy.
This is generally referred to as 3D wavelet transformation. The temporal domain
transformation generally allows for greater compression, given that the frames in the
GoF are similar.
5.2.3 3D Wavelet Transform
The 3D wavelet transform is given by:

d^{3D}_{\zeta,k,j+1}[x, y, z] = \sum_p g[p] a^{3D}_{\zeta,k,j}[x, y, p − 2z]
a^{3D}_{\zeta,k,j+1}[x, y, z] = \sum_p h[p] a^{3D}_{\zeta,k,j}[x, y, p − 2z],   (5.5)

where

a^{3D}_{\zeta,k,−1}[x, y, z] = d_{\zeta,k}[x, y, z].   (5.6)
In Equations 5.5 and 5.6, \zeta \in \{lh, hl, hh\}, and a^{3D}_{\zeta,k,j}[·] and d^{3D}_{\zeta,k,j}[·] are the scaling and
wavelet subbands of spatial scale k and temporal scale j. The superscript indicator
3D denotes 3D wavelet transformation, and j is the subband level in the temporal
domain, which ranges over [−1, J_M), where J_M is the 3D MRlevel.
For the ll band of the 2D transform, a_{ll,k}[·], we have

d^{3D}_{ll,k,j+1}[x, y, z] = \sum_p g[p] a^{3D}_{ll,k,j}[x, y, p − 2z]
a^{3D}_{ll,k,j+1}[x, y, z] = \sum_p h[p] a^{3D}_{ll,k,j}[x, y, p − 2z],   (5.7)

where

a^{3D}_{ll,k,−1}[x, y, z] = a_{ll,k}[x, y, z].   (5.8)
Note that in Equations 5.5 and 5.7, all 2D wavelet coefficients which are processed with
the g[·] filter are designated as 3D wavelet coefficients, d^{3D}_{·}[·], and all the 2D coefficients
which are processed with the h[·] filter are designated as 3D scaling coefficients,
a^{3D}_{·}[·].
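A sketch of one temporal decomposition level (Equations 5.5 and 5.7) is given below, assuming a frames-first array layout. It reuses the 5/3 filters of Equation 5.3 with symmetric extension; both are illustrative choices, not a statement about the dissertation's implementation.

```python
import numpy as np

# 5/3 filters of Eq. 5.3, reused in the temporal direction
h = np.array([-1/8, 1/4, 3/4, 1/4, -1/8])
g = np.array([1/2, -1, 1/2])

def analyze_1d(x, f):
    """Filter x with f under symmetric extension and keep every other sample."""
    pad = len(f) // 2
    return np.convolve(np.pad(x, pad, mode="reflect"), f, mode="valid")[::2]

def temporal_level(band):
    """One temporal decomposition level: each pixel's time series is low-
    and high-pass filtered and downsampled by two. `band` is assumed to be
    frames-first, with shape (F, H, W)."""
    lo = np.apply_along_axis(analyze_1d, 0, band, h)   # a^3D scaling subband
    hi = np.apply_along_axis(analyze_1d, 0, band, g)   # d^3D wavelet subband
    return lo, hi
```

On a temporally constant GoF the wavelet subband vanishes, which matches the intuition that the temporal transform only spends bits on change between frames.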
As with the 2D wavelet transformation, the 3D wavelet coefficients are
quantized once they are obtained.
5.2.4 3D Quantization
The 3D wavelet and scaling coefficients are quantized by

d^{3D}_{ll,k,j}[x, y, z] = Int(s_2 2^{k+1} \sqrt{2}^{\,j} d^{3D}_{ll,k,j}[x, y, z] / s_3)
a^{3D}_{ll,k,j}[x, y, z] = Int(s_2 2^{k+1} \sqrt{2}^{\,j+1} a^{3D}_{ll,k,j}[x, y, z] / s_3)
d^{3D}_{lh,k,j}[x, y, z] = Int(s_2 2^{k} \sqrt{2}^{\,j} d^{3D}_{lh,k,j}[x, y, z] / s_3)
a^{3D}_{lh,k,j}[x, y, z] = Int(s_2 2^{k} \sqrt{2}^{\,j+1} a^{3D}_{lh,k,j}[x, y, z] / s_3)
d^{3D}_{hl,k,j}[x, y, z] = Int(s_2 2^{k} \sqrt{2}^{\,j} d^{3D}_{hl,k,j}[x, y, z] / s_3)
a^{3D}_{hl,k,j}[x, y, z] = Int(s_2 2^{k} \sqrt{2}^{\,j+1} a^{3D}_{hl,k,j}[x, y, z] / s_3)
d^{3D}_{hh,k,j}[x, y, z] = Int(s_2 2^{k−1} \sqrt{2}^{\,j} d^{3D}_{hh,k,j}[x, y, z] / s_3)
a^{3D}_{hh,k,j}[x, y, z] = Int(s_2 2^{k−1} \sqrt{2}^{\,j+1} a^{3D}_{hh,k,j}[x, y, z] / s_3),   (5.9)
where s3 is the 3D quantization level. Again, if the transform used in compression
is an orthonormal transform, the scalings of Equation 5.9 would not be necessary.
However, the bi-orthogonal wavelet transform requires an adjustment by subband
level.
The quantization levels s2 and s3 are left to the user to determine. The relation-
ship between s2 and s3 is an important one, however. If s3 is significantly larger than
s2, unwanted temporal artifacts may result in the reconstructed signal. Therefore,
it is recommended to maintain s3 ≤ s2. Also, there is specific reasoning to why
two quantization processes are necessary. It is known that the statistical properties
of the horizontal and vertical dimensions in a video signal are similar to each other
but differ from the time dimension [23]. Thus, a different quantization step applied
to the spatial and temporal domains is reasonable. Also, it is well known that the
quantization step leads to artifact generation in signal reconstruction. However, the
artifacts that appear from quantization of the 2D wavelet coefficients and the 3D
wavelet coefficients are perceptibly vastly different. The quantization of spatial do-
main wavelet coefficients leads to blurring and softening of the video signal, while
the quantization of the 3D wavelet coefficients leads to "trails" of moving objects
from frame to frame. Thus, to mitigate the differing types of artifacts generated from
wavelet transformation in the two domains, two quantization step sizes are necessary.
Also, the above formulation of the 2D and 3D wavelet transform is not consistent
with the traditional symmetric wavelet transformation of a 3-dimensional signal. In
the symmetric case, each dimension is transformed at a certain MRlevel level, and
the lowest subband is then processed further for the next MRlevel. In the above for-
mulation, however, the wavelet transform is applied in the spatial domain through all
93
subbands, and only afterwards is applied in the temporal domain. This is referred to
as the decoupled 3D wavelet transform, and it is the preferred wavelet transformation
method for video compression [5, 21, 24, 35].
A visual difference between the 2D wavelet transform and 3D wavelet transform
(both symmetric and decoupled) can be shown when viewing the differing sizes and
shapes of the various subbands that are calculated. Figure 5.2 gives the size and
shapes of each of the subbands calculated by the various wavelet transforms. The 2D
Figure 5.2: Starting from left to right. 1) Original three-dimensional video signal. 2) 2D wavelet transform (KM = 2 and JM = 0). 3) Symmetric 3D wavelet transform. 4) Decoupled 3D wavelet transform (KM = 2 and JM = 2).
wavelet transform, shown in Figure 5.2, applies no temporal domain processing, thus
there are no segmentation lines crossing the temporal domain separating different
subbands. There are only segmentation lines crossing the horizontal and vertical
dimensions, where the level 2 LL band, a_{ll,2}[·], is shown in the upper left-hand corner,
and the level 0 HH band, d_{hh,0}[·], is shown in the lower right-hand corner. Also
shown in Figure 5.2, there exists a greater number of subbands generated by the
decoupled 3D wavelet transform than in the symmetric 3D wavelet transform, allowing
for greater frequency analysis in both the spatial and temporal domains.
Each subband generated by the 3D wavelet transform is a 3-dimensional bandpass
signal representing the original signal, f(·). A sample of subband locations is given
in Figure 5.3.
Figure 5.3: Decoupled 3D wavelet transform subbands, KM = 2, JM = 2. Left: Subband d^{3D}_{hl,1,1}[·] highlighted in gray. Right: Subband d^{3D}_{lh,0,2}[·] highlighted in gray.
After the decoupled 3D wavelet transform and quantization are computed, stack-
run [72] followed by Huffman [22] encoding are applied to each of the subbands for
compression.
5.2.5 3D Wavelet Compression Results
The advantage of the 3D wavelet transform is evident when coding a video signal
with both 2D and 3D wavelet compression. Figure 5.4 gives the results of 2D wavelet
compression vs. 3D wavelet compression on the "CLAIRE" image sequence. 2D
wavelet compression is accomplished by computing the 2D wavelet transform on each
frame in the image sequence separately, applying 2D quantization, and using stack-
run [72] followed by Huffman [22] coding on the quantized coefficients. The 3D
wavelet transform exploits redundancy in the temporal domain as well as in the spatial
domain. Therefore, 3D wavelet compression produces a much higher compression
ratio and better overall quality. As shown in Figure 5.4, the performance of 3D
Figure 5.4: Comparison of 2D wavelet compression and 3D wavelet compression using the CLAIRE image sequence (frame #4 is shown). Left: 2D wavelet compression. s2 = 64, KM = 8, file size = 198 KB, compression ratio = 256:1, average PSNR = 29.80. Right: 3D wavelet compression. s2 = 29, s3 = 29, KM = 8, JM = 8, file size = 196 KB, compression ratio = 258:1, average PSNR = 33.31.
wavelet compression is greater than that of 2D wavelet compression. Note
that for the results given in Figure 5.4 the GoF processing block for 3D wavelet
compression is F = 64 frames.
5.3 Virtual-Object Compression
The advantages of 3D wavelet compression over the traditional 2D frame-by-frame
compression are evident from the results given in Figure 5.4. However, to further exploit
temporal domain redundancy in video signals, virtual-object compression is developed.
In virtual-object compression, the original video signal is separated into background
and virtual-object. Then each is compressed separately for better
compression results.
5.3.1 Virtual-Object Definitions
Let us define a three-dimensional rectangular object o(·) where o(x, y, z) is a pixel
in the object sequence of horizontal position x, vertical position y and frame z. The
dimensions of o(·) are width Wo, height Ho, and frames F . We restrict the object to be
the same size in each frame of the sequence to ensure that the virtual-object is easily
defined and compressible. Therefore, Wo and Ho are constant, and not dependent on
z.
However, because objects in an image sequence move, we must allow the virtual-
object to be placed anywhere within each frame. Thus, we define coordinates Sx[·] and
Sy[·] which correspond to the upper-left corner of the virtual-object in each frame, or the
starting horizontal and vertical positions of the virtual-object, respectively. We also
define Ex[·] and Ey[·] which correspond to the lower-right corner of the virtual-object,
or the ending horizontal and vertical positions of the virtual-object, respectively.
With these definitions some boundary conditions are required. The virtual-object
must be positive in width and height, and it cannot be larger than the original video
frames, thus 0 ≤ Wo ≤ Wf and 0 ≤ Ho ≤ Hf . Also, the virtual-object must lie
within each frame. Thus, 0 ≤ Sx[z] < Wf − 1 and 0 ≤ Sy[z] < Hf − 1, for all z. It is
also known that Sx[z] < Ex[z] < Wf and Sy[z] < Ey[z] < Hf , for all z.
As stated previously, the virtual-object must remain the same size for each frame
in the sequence. Therefore, Ex[z]− Sx[z] = Wo and Ey[z]− Sy[z] = Ho for all z.
The virtual-object is defined as:
o(x, y, z) = f(x+Sx[z], y+Sy[z], z), 0 ≤ x < Wo, 0 ≤ y < Ho, 0 ≤ z < F , (5.10)
where o(·) is the virtual-object and f(·) is the original image sequence.
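Equation 5.10 is a per-frame crop of the original sequence. A minimal sketch, assuming a frames-first array layout:

```python
import numpy as np

def extract_object(f, Sx, Sy, Wo, Ho):
    """Crop the virtual-object o(x, y, z) = f(x + Sx[z], y + Sy[z], z)
    (Eq. 5.10). `f` is frames-first, with shape (F, Hf, Wf); the object has
    the same Wo x Ho size in every frame, only its position changes."""
    return np.stack([f[z, Sy[z]:Sy[z] + Ho, Sx[z]:Sx[z] + Wo]
                     for z in range(f.shape[0])])
```

Because Wo and Ho are constant, the result is a regular (F, Ho, Wo) array that can be handed directly to a 3D wavelet coder.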
The background is defined as:
b(x, y) = \begin{cases} \dfrac{\sum_{z=0}^{F-1} f(x, y, z)\,\alpha[x, y, z]}{\sum_{z=0}^{F-1} \alpha[x, y, z]}, & \text{when } \sum_{z=0}^{F-1} \alpha[x, y, z] \neq 0 \\ 0, & \text{else,} \end{cases}   (5.11)

where

\alpha[x, y, z] = \begin{cases} 1, & \text{when } (x, y, z) \in L \cup R \cup U \cup D \\ 0, & \text{else.} \end{cases}   (5.12)
L, R, U , and D represent the area which lies outside the virtual-object, or the area
left (L), right (R), above (U), and below (D) the virtual-object. More specifically,
L = {(x, y, z) : x < Sx[z]}, R = {(x, y, z) : x ≥ Ex[z]}, U = {(x, y, z) : y < Sy[z]},
and D = {(x, y, z) : y ≥ Ey[z]}. As shown in Equation 5.11, the background is formed
by a temporal average, over the entire GoF, of the area outside the virtual-object boundary.
Figure 5.5 gives a frame of the "CLAIRE" image sequence including virtual-object
definitions.
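The background formation of Equations 5.11 and 5.12 can be sketched as a masked temporal average; the frames-first layout and the function name are assumptions of this illustration.

```python
import numpy as np

def form_background(frames, Sx, Sy, Wo, Ho):
    """Masked temporal average of Eq. 5.11: alpha (Eq. 5.12) is 1 outside
    the per-frame virtual-object rectangle and 0 inside it."""
    f = np.asarray(frames, dtype=float)        # frames-first: (F, Hf, Wf)
    alpha = np.ones_like(f)
    for z in range(f.shape[0]):
        alpha[z, Sy[z]:Sy[z] + Ho, Sx[z]:Sx[z] + Wo] = 0.0
    cover = alpha.sum(axis=0)                  # frames contributing per pixel
    return np.where(cover > 0,
                    (f * alpha).sum(axis=0) / np.maximum(cover, 1), 0.0)
```

Pixels that are covered by the object in every frame have no background observation, so Equation 5.11 assigns them zero, as does the sketch.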
5.3.2 Virtual-Object Extraction Method
The virtual-object is extracted by applying the wavelet transform in the temporal
domain to the original image sequence f(·). The extraction method separates the
portion of the video with motion from the portion of the video without motion.
Motion in an image sequence results in large temporal domain transform coefficients
which are spatially contiguous.

Figure 5.5: Virtual-object extraction.
The non-decimated wavelet transform in the temporal domain of a 3-dimensional
image sequence f(·) is given by

\lambda_{vo}[x, y, z] = \sum_m f(x, y, m) g_{vo}[m − z],   (5.13)

where \lambda_{vo}[·] are the wavelet coefficients, and g_{vo}[·] is the wavelet filter. The subscript
designation vo is given to identify the coefficients and wavelet filter for purposes of
virtual-object extraction.
Experimentally, it has been determined that the biorthogonal Haar wavelet function
provides the best motion identification. The biorthogonal Haar wavelet is given
by

g_{vo}[t] = \begin{cases} 1, & \text{when } t = 0 \\ −1, & \text{when } t = 1 \\ 0, & \text{else.} \end{cases}   (5.14)
The compact support of the biorthogonal Haar wavelet makes it a natural choice for
motion identification. Assuming there is no noise in the image sequence, a simple
difference between consecutive frames is the most effective means of motion identi-
fication. The compact support of the Haar wavelet is most aptly able to locate the
spatial and temporal position of motion in an image sequence.
A 3-dimensional Boolean map separating motion from non-motion is obtained
by thresholding the coefficient values \lambda_{vo}[·]:

I_{vo}[x, y, z] = \begin{cases} 1, & \text{when } |\lambda_{vo}[x, y, z]| > \tau_{vo} \\ 0, & \text{else.} \end{cases}   (5.15)
The Boolean motion map, I_{vo}[·], is refined by the spatial support criteria described in
Section 3.3. That is,

J_{vo}[x, y, z] = \begin{cases} 1, & \text{when } S_{vo}[x, y, z] > s_{vo} \\ 0, & \text{else,} \end{cases}   (5.16)

where S_{vo}[x, y, z] is calculated by an algorithm given in Appendix A.
The values of τvo and svo are experimentally determined. We find that τvo = 15
and svo = 2 give the best separation of object and background.
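Because the biorthogonal Haar filter reduces Equation 5.13 to a difference of consecutive frames, the thresholding of Equation 5.15 can be sketched as below. The spatial-support refinement of Equation 5.16 is omitted here, and the frames-first layout is an assumption.

```python
import numpy as np

def motion_map(frames, tau_vo=15.0):
    """Boolean motion map of Eq. 5.15. With the biorthogonal Haar filter of
    Eq. 5.14, the coefficients of Eq. 5.13 are lambda[z] = f(z) - f(z + 1)."""
    f = np.asarray(frames, dtype=float)      # frames-first: (F, H, W)
    lam = np.zeros_like(f)
    lam[:-1] = f[:-1] - f[1:]                # consecutive frame differences
    return np.abs(lam) > tau_vo
```

A pixel that changes between two frames lights up the map at both surrounding temporal positions, which is what makes spatially contiguous motion regions easy to box in afterwards.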
Each frame of the Boolean map is scanned to find the smallest rectangle that
contains all the non-zero J_{vo}[·]. This is obtained by

\gamma_x[z] = \max(\vec{K}), where k \in \vec{K} \iff \sum_{m=0}^{k-1} \sum_{n=0}^{H_f - 1} J_{vo}[m, n, z] = 0
\varepsilon_x[z] = \min(\vec{K}), where k \in \vec{K} \iff \sum_{m=0}^{k-1} \sum_{n=0}^{H_f - 1} J_{vo}[m, n, z] = \sum_{m=0}^{W_f - 1} \sum_{n=0}^{H_f - 1} J_{vo}[m, n, z]
\gamma_y[z] = \max(\vec{K}), where k \in \vec{K} \iff \sum_{n=0}^{k-1} \sum_{m=0}^{W_f - 1} J_{vo}[m, n, z] = 0
\varepsilon_y[z] = \min(\vec{K}), where k \in \vec{K} \iff \sum_{n=0}^{k-1} \sum_{m=0}^{W_f - 1} J_{vo}[m, n, z] = \sum_{n=0}^{H_f - 1} \sum_{m=0}^{W_f - 1} J_{vo}[m, n, z].   (5.17)
The vectors \gamma_x[·] and \varepsilon_x[·] are the starting and ending horizontal positions of the
virtual-object in each frame of the Boolean map. Similarly, \gamma_y[·] and \varepsilon_y[·] are the starting
and ending vertical positions of the virtual-object. However, these per-frame boundaries for
the virtual-object may not be the same size, i.e., \varepsilon_x[b] − \gamma_x[b] \neq \varepsilon_x[a] − \gamma_x[a], for a \neq b.
Therefore, the width and height of the virtual-object are defined by

W_o = \max(\vec{\varepsilon}_x − \vec{\gamma}_x),  z_{m,x} = \arg\max(\vec{\varepsilon}_x − \vec{\gamma}_x)
H_o = \max(\vec{\varepsilon}_y − \vec{\gamma}_y),  z_{m,y} = \arg\max(\vec{\varepsilon}_y − \vec{\gamma}_y).   (5.18)
z_{m,x} and z_{m,y} are the frames which contain the maximum virtual-object width and
maximum virtual-object height, respectively.
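Equations 5.17 and 5.18 amount to per-frame tight bounding boxes followed by a maximum over frames. A sketch is given below; treating ε as an exclusive end, so that ε − γ is the width, is a convention chosen for this illustration.

```python
import numpy as np

def object_bounds(J):
    """Per-frame bounding box of the Boolean map J (Eq. 5.17) and the common
    object size W_o, H_o with its arg-max frames (Eq. 5.18). J: (F, Hf, Wf)."""
    F = J.shape[0]
    gx, ex, gy, ey = (np.zeros(F, dtype=int) for _ in range(4))
    for z in range(F):
        ys, xs = np.nonzero(J[z])
        if xs.size:                          # frames with no motion stay empty
            gx[z], ex[z] = xs.min(), xs.max() + 1
            gy[z], ey[z] = ys.min(), ys.max() + 1
    Wo, zmx = int((ex - gx).max()), int(np.argmax(ex - gx))
    Ho, zmy = int((ey - gy).max()), int(np.argmax(ey - gy))
    return (gx, ex, gy, ey), (Wo, Ho), (zmx, zmy)
```

Taking the maximum width and height over all frames guarantees that a single fixed-size window can contain the motion region in every frame.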
The starting horizontal and vertical positions of the virtual-object, S_x[·] and
S_y[·], are needed to completely specify the location of the virtual-object. These
positions are established to completely contain the virtual-object in all frames, and to
minimize the horizontal and vertical motion of the virtual-object border throughout
the image sequence. It has been experimentally determined that minimal spatial
movement of the virtual-object between consecutive frames provides the largest
compression ratios and best reconstructed quality. Thus, the starting horizontal and
vertical positions of the virtual-object are given by
S_x[0] = \begin{cases} \gamma_x[z_{m,x}], & \text{when } \gamma_x[0] < S_x[z_{m,x}] \\ \varepsilon_x[0] − W_o, & \text{when } \varepsilon_x[0] \geq E_x[z_{m,x}] \\ S_x[z_{m,x}], & \text{else,} \end{cases}   (5.19)

S_x[z] = \begin{cases} \gamma_x[z], & \text{when } \gamma_x[z] < S_x[z−1] \\ \varepsilon_x[z] − W_o, & \text{when } \varepsilon_x[z] \geq E_x[z−1] \\ S_x[z−1], & \text{else,} \end{cases}   (5.20)

S_y[0] = \begin{cases} \gamma_y[z_{m,y}], & \text{when } \gamma_y[0] < S_y[z_{m,y}] \\ \varepsilon_y[0] − H_o, & \text{when } \varepsilon_y[0] \geq E_y[z_{m,y}] \\ S_y[z_{m,y}], & \text{else,} \end{cases}   (5.21)

and

S_y[z] = \begin{cases} \gamma_y[z], & \text{when } \gamma_y[z] < S_y[z−1] \\ \varepsilon_y[z] − H_o, & \text{when } \varepsilon_y[z] \geq E_y[z−1] \\ S_y[z−1], & \text{else.} \end{cases}   (5.22)
The calculation of the starting horizontal and vertical positions, Sx[·] and Sy[·],
given in Equations 5.19 through 5.22 guarantees minimal movement of the virtual-
object border.
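A one-axis sketch of the recursion in Equations 5.19 through 5.22 follows. Two readings are assumed for illustration: E[·] is taken as the window end S[·] + Wo, and the window is anchored at the widest frame zm, whose start is its own γ; neither reading is spelled out in the text.

```python
def window_starts(gamma, eps, Wo, zm):
    """Starting positions of a width-Wo window that contains
    [gamma[z], eps[z]) in every frame while moving its border
    as little as possible between consecutive frames."""
    S = [0] * len(gamma)
    prev = gamma[zm]                  # the widest frame fits its window exactly
    for z in range(len(gamma)):
        if gamma[z] < prev:           # object crossed the left border
            S[z] = gamma[z]
        elif eps[z] >= prev + Wo:     # object crossed the right border
            S[z] = eps[z] - Wo
        else:
            S[z] = prev               # otherwise the border stays still
        prev = S[z]
    return S
```

For example, `window_starts([2, 2, 3], [4, 5, 5], 3, 1)` keeps the border fixed at 2 for all three frames, while a drifting object forces only the minimum movement needed to keep it inside the window.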
The reconstructed video signal, f(·), is given by
\[
f(x, y, z) = \begin{cases} b(x, y), & \text{when } \alpha[x, y, z] = 1\\ o(x, y, z), & \text{else,} \end{cases} \tag{5.23}
\]
where b(·) and o(·) are the reconstructed background frame and virtual-object, re-
spectively.
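Equation 5.23 amounts to pasting the decoded virtual-object back into the reconstructed background at its per-frame window. A hedged NumPy sketch, with invented function and argument names, assuming α is stored as a per-frame array and the object as a stack of windows:

```python
import numpy as np

def reconstruct(alpha, b, o, Sx, Sy, Wo, Ho):
    """Compose each output frame per Equation 5.23: background pixels where
    alpha == 1, virtual-object pixels elsewhere. b is the (H, W) background,
    o holds the decoded object windows with shape (frames, Ho, Wo)."""
    Z = o.shape[0]
    f = np.empty((Z,) + b.shape, dtype=b.dtype)
    for z in range(Z):
        f[z] = b                                          # start from the background
        win = f[z, Sy[z]:Sy[z] + Ho, Sx[z]:Sx[z] + Wo]    # object window (a view)
        mask = alpha[z, Sy[z]:Sy[z] + Ho, Sx[z]:Sx[z] + Wo] == 0
        win[mask] = o[z][mask]                            # object where alpha != 1
    return f
```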
5.3.3 Virtual-Object Coding
Once the virtual-object and background have been identified and separated, the
independent compression of each is straightforward. The background is compressed
by 2D wavelet compression, and the virtual-object is compressed by the 3D wavelet
compression described in Section 5.2. Figure 5.6 gives the design flow of the virtual-
object compression method.
Figure 5.6: Virtual-object compression.
As given in Figure 5.6, the original video signal is separated into the virtual-object
and background using the virtual-object extraction method. The virtual-object and
background are then compressed separately using the 3D wavelet compression and
2D wavelet compression methods, respectively. Each of the processing blocks given
in Figure 5.6 following the virtual-object extraction method is described in Section
5.2.
5.4 Performance Comparison Between 3D Wavelet and Virtual-Object Compression
The virtual-object compression method is compared to the 3D wavelet compres-
sion method. The "CLAIRE" image sequence is used for continuity with the com-
parison of 2D wavelet compression to 3D wavelet compression, given in Figure 5.4.
Figure 5.7 gives results of 3D wavelet compression and virtual-object compression
methods, using the "CLAIRE" image sequence. Note that for the results given in
Figure 5.7 the GoF processing block is F = 64 frames. As shown in Figure 5.7, the
Figure 5.7: Comparison of 3D wavelet compression and virtual-object compression using the CLAIRE image sequence (frame #4 is shown). Left: 3D wavelet compression, s2 = 29, s3 = 29, KM = 8, JM = 8, file size = 196KB, compression ratio = 258:1, average PSNR = 33.31. Right: virtual-object compression, s2 = 25, s3 = 25, KM = 8, JM = 8 for the virtual-object and s2 = 9, KM = 8 for the background, file size = 195KB, compression ratio = 259:1, average PSNR = 34.00.
virtual-object compression method achieves an increase in compression ratio from 3D
wavelet compression while providing higher PSNR.
Along with the "CLAIRE" image sequence, the virtual-object compression method
is tested against both 3D wavelet compression and 2D wavelet compression using the
"SALESMAN" and "MISSA" image sequences. The results of the quality comparison
are given in Figure 5.8. Figure 5.8 shows that virtual-object compression consistently
outperforms both 2D wavelet compression and 3D wavelet compression in compression
ratio and PSNR.
[Three plots of PSNR (dB) versus frame number: SALESMAN (virtual-object comp., 54,278 bytes; 3D wavelet comp., 56,449 bytes; 2D wavelet comp., 59,367 bytes), MISSA (virtual-object comp., 199,554 bytes; 3D wavelet comp., 202,035 bytes; 2D wavelet comp., 206,914 bytes), and CLAIRE (virtual-object comp., 200,205 bytes; 3D wavelet comp., 201,140 bytes; 2D wavelet comp., 202,878 bytes).]
Figure 5.8: Comparison of 2D wavelet compression, 3D wavelet compression, and virtual-object compression.
5.5 Discussion
In this chapter, a new object-based compression method called virtual-object com-
pression has been described. Virtual-object compression differs from typical video
compression methods by first extracting moving objects from stationary background
and compressing each separately. The separation of objects and background enables
independent coding of both, providing a low bit-rate compressed video signal.
Although virtual-object compression is not a truly object-based compression method
as set forth by the MPEG-4 standard, it is able to provide compression gain and im-
proved PSNR over the 3D wavelet compression method by relaxing some of the
constraints involved with object-based compression methods. Thus, the results of
virtual-object compression have shown a performance improvement over the more
traditional wavelet-based compression methods of 2D wavelet compression and 3D
wavelet compression.
CHAPTER 6
Constant Quality Rate Control for Content-Based 3D Wavelet Video Communication
6.1 Introduction
The vast amounts of data associated with digital images and video streams have
created a growing concern and a motivation for efficient image compression methods.
Many such compression algorithms have been developed around a variety of matrix
transforms [47, 48, 52]. One such method, the wavelet transform, has shown promising
results, achieving large compression ratios and high reconstructed image quality [37, 70, 82].
Recently, the efficient coding of video signals has become a leading topic in com-
pression research [30, 56]. A new compression algorithm, the 3D wavelet transform,
has been developed to provide very high compression ratios of digital video while
preserving the reconstructed quality [71, 81].
Tightly coupled with compression research is the reliable transmission and recep-
tion of compressed video. Real-time video communication applications using com-
pression algorithms demand a constant frame rate for a high quality of service (QoS).
This requirement is challenging, however. Inconsistent compression and decompres-
sion computation times, variable compressed video data size, and the unpredictable
available bandwidth of volatile communication channels all hinder the performance
of real-time video communication.
Many rate control algorithms have been proposed in recent history, and most have
been associated with providing constant frame rate with a variable quantization pa-
rameter [13, 32, 38, 51, 57, 59, 60, 65]. The quantization parameter directly affects
both the bit rate and reconstructed video quality. Therefore, for low bit-rate envi-
ronments, the constant frame rate approach may provide poor quality image frames
at the receiver. To combat this effect, other rate control algorithms have controlled
both the frame rate and the quantization parameter to provide a best possible QoS
[58, 66, 67]. However, for many applications, individual image frames of reasonable
visual quality are vastly more important than high frame rates. Therefore, we employ
a fixed quantization step-size to deliver constant quality video frames.
Also, most former rate control algorithms have a minimum bit rate requirement
for the communication channel [13, 14, 32, 51, 57, 58, 59, 60, 65, 66]. Unfortunately
many communication systems such as the Internet do not provide a minimum bit
rate guarantee. Furthermore, the content-based 3D wavelet compression scheme is
a special case of image compression and also a relatively new idea [71, 81]. Thus
it is desirable for a rate control algorithm specific to 3D wavelet compression to be
developed.
The content-based 3D wavelet compression scheme operates on a group of frames
(GoF), and the number of frames varies between groups depending on the video
content. Because we group only similar frames together, the number of frames in
each group is variable. Thus, the 3D wavelet transform produces a variable delay
for the transmission of real-time video. Because of this delay, rate control becomes
an even more difficult issue. To deal with the uncertainty of both the bandwidth
of the communication channel and the video content, we propose a new rate control
algorithm. It differs from previous algorithms in many ways. First, because there are
two uncertainties, there are two frame buffers for the storage of video frames, on both
the client and server sides. Secondly, the client-side buffer is developed to ensure the
continuous display of reconstructed image frames. The client side buffer must contain
enough reconstructed video content to overcome the acquisition delay of the next GoF
as well as the delay of data transfer over the network, and the computation time of
the compression and decompression algorithms. The buffer is based on a leaky bucket
algorithm with an adjustable window of constant frame rate (AWCF). Thirdly, for the
server side we develop a feedback mechanism from the client to control the server’s
buffer content and ensure that the frame rates of the server and client sides are equal.
This chapter is arranged into five sections. Following the Introduction, Section
6.2 gives a brief description of content-based 3D wavelet compression and illustrates
the functionality and importance of a multi-threaded application for real-time com-
munication. Section 6.3 provides an overview and analysis of the rate control system,
including the constraints imposed on the rate control buffers, design parameters of
the control buffers on the client and server sides, and a definition of the AWCF.
Section 6.4 gives experimental results of the rate control algorithm, and Section 6.5
summarizes the chapter.
6.2 Multi-Threaded, Content-Based 3D Wavelet Compression
The content-based 3D wavelet video compression/decompression system design
flow is given in Figure 6.1. As shown in Figure 6.1, the frame grabber loads video
frames into the compression system. The dynamic grouping of frames then compares
and groups frames of similar content together. The dynamic grouping process sends
the group of frames (GoF) to the 3D wavelet compression system. The compression
algorithm then compresses the video using wavelet analysis. By grouping frames of
similar content, the inter-frame redundancy of the individual pixels is assured, thus
providing high compression ratios. The compressed video is then either stored or sent
across a communication channel. The 3D wavelet decompression system reconstructs
the video, and the video is then displayed to the user. The content-based compression
approach develops GoFs of differing size, and because of the disparity in GoF size
the computation time required to compress and decompress each GoF varies. Thus,
continuous and smooth display of video becomes a challenging issue.
Figure 6.1: Content-based 3D wavelet compression/decompression design flow.
A real-time compression/decompression system must be able to perform many
tasks concurrently. For example, the compression algorithm must continuously cap-
ture and group frames while compressing video and sending it to the receiver. This
can only be performed when operations are being computed independently. There-
fore, four processing threads are created in the communication system: the grouping
thread, compression thread, decompression thread, and display thread. Figure 6.2 gives
a model of the communication system.
Figure 6.2: 3D wavelet communication system.
The two buffers that have been added to the system, shown in Figure 6.2, are
instrumental in achieving independent operation from each of the application threads.
Also, all four threads will be continuously active as long as both buffers are neither
empty nor full. The grouping thread will continue to group frames until the grouping
buffer is full. At that point, there is no space left for the next GoF. Conversely,
the compression thread will continue to compress until the grouping buffer is empty.
After the grouping buffer is empty there is no longer a GoF to compress. Therefore,
continuous activity from both the grouping thread and the compression thread depends
on the fullness of the grouping buffer. Similarly, at the receiving end, continuous
activity from the decompression thread and the display thread can only be achieved if
the display buffer is neither full nor empty.
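The four-thread structure can be sketched with bounded queues standing in for the two buffers: a full queue blocks its producer and an empty queue blocks its consumer, which is exactly the stall behaviour described above. Compression, the channel, and decompression are collapsed into a single stage here for brevity, and all names are illustrative:

```python
import queue
import threading

grouping_buffer = queue.Queue(maxsize=8)   # holds GoFs awaiting compression
display_buffer = queue.Queue(maxsize=8)    # holds reconstructed GoFs

def grouping_thread(gofs):
    for gof in gofs:                       # dynamic grouping itself is elided
        grouping_buffer.put(gof)           # blocks while the grouping buffer is full
    grouping_buffer.put(None)              # end-of-stream marker

def codec_thread():
    while (gof := grouping_buffer.get()) is not None:   # blocks while empty
        display_buffer.put([f.upper() for f in gof])    # stand-in for codec + channel
    display_buffer.put(None)

def display_thread(shown):
    while (gof := display_buffer.get()) is not None:
        shown.extend(gof)

shown = []
threads = [threading.Thread(target=grouping_thread, args=([["a", "b"], ["c"]],)),
           threading.Thread(target=codec_thread),
           threading.Thread(target=display_thread, args=(shown,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```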
6.3 The Rate Control Algorithm
6.3.1 Rate Control Overview
The rate control algorithm of the current system is based on a leaky bucket ap-
proach [7, 13, 14, 59]. The leaky bucket idea has been developed earlier for ATM
networks and other applications, but has never been considered for 3D wavelet com-
pression. As stated previously, all four computation threads are continuously active
if and only if both data buffers given in Figure 6.2 are neither full nor empty. There-
fore, the goal of the rate control algorithm is to keep the amount of data in both
buffers at a reasonable level while ensuring that the frame grabber rate and frame
display rate are constant and equal.
Also, the network bandwidth limitation has not yet been considered. With limited
bandwidth, all four of the threads cannot be completely active. In most applications,
the computational capacity of both platforms greatly exceeds the communication
bandwidth available. Therefore, a rate control algorithm must manage each thread's
computational activity. Figure 6.3 gives the completed rate control wavelet
communication system. The additions to the system given in Figure 6.2 are as follows:
Send Thread and Send Buffer – The most important part of the wavelet communi-
cation system is to maximally utilize the available bandwidth given by the communi-
cation channel, thus attempting to provide the highest possible frame rate. Therefore,
Figure 6.3: Complete rate control system.
another buffer and processing thread are created to continually send data at the max-
imum rate possible. The send buffer is inserted into the system to give the send thread
data to output through the channel. The compression thread's output bit rate
depends on the content of the input video, so the send buffer is necessary to achieve
continuous data throughput. The send thread also partitions the data
into smaller packets to enable the continuous flow of data.
Receive Thread and Receive Buffer – The receive thread is used to capture the
data packets from the communication channel, and the received data is stored in the
receive buffer. The send buffer and receive buffer need not be controlled. Given that
they are sufficiently large, the control of the grouping buffer and display buffer will
limit the amount of data that the send buffer and receive buffer must hold.
Send Monitor – The send monitor controls the rate at which the frame grabber
acquires each frame. Its decision is based on the amount of data in the grouping buffer.
The send monitor attempts to keep the grouping buffer fullness at a reasonable level by
adjusting the frame acquisition rate. However, the frame acquisition rate is confined
by the feedback provided by the receiver, because real-time communication requires
that the frame acquisition rate and display rate be equivalent. The send monitor
enforces the grouping buffer constraints, which are given in Subsection 6.3.2.
Receive Monitor – The receive monitor regulates the size of the receive buffer by
controlling the display rate at the receiver. The receive monitor attempts to keep the
display buffer fullness at a reasonable level by adjusting the display rate and enforcing
the display buffer constraints, which are given in Subsection 6.3.2.
Feedback – A virtual path over which the client sends information to the server. The
receive monitor uses the feedback path to ensure equivalent acquisition and display
rates.
The proposed leaky bucket control model reduces the number of variables in the
compression algorithm. Our interest lies only in rate control, not the specifics of
wavelet video compression. Therefore, the compression and decompression threads,
and network can be modeled as a single delay from transmitter to receiver. Figure
6.4 gives the control model for the rate control system. From the control model given
in Figure 6.4, we can develop the constraints of the grouping and display buffers.
6.3.2 Buffer Constraints
As shown in Subsection 6.3.1, the send monitor and receive monitor adjust the
flow of data into and out of the grouping buffer and display buffer, respectively, to
control buffer fullness. Therefore, it is necessary to analyze the constraints imposed
on both buffers by the send monitor and receive monitor.
The display buffer content is given by
\[
B^d_i = B^d_{i-1} + R_i - D_i, \tag{6.1}
\]
where i is the unit time, B^d_i is the display buffer fullness, R_i is the video recon-
struction rate, and D_i is the display frame rate. Also, since the display buffer has a
fixed size, it is also governed by
\[
0 \le B^d_i \le S_d, \tag{6.2}
\]
where S_d is the size of the display buffer. The receive monitor manages the size of
the display buffer by regulating D_i. Therefore
\[
D_i = \begin{cases} D_{i-1} - \delta_D, & \text{when } B^d_{i-1} < \varepsilon_d\\ D_{i-1}, & \text{when } \varepsilon_d \le B^d_{i-1} \le \phi_d\\ D_{i-1} + \delta_D, & \text{when } \phi_d < B^d_{i-1} \end{cases} \tag{6.3}
\]
where εd and φd are threshold levels corresponding to an almost empty and an almost
full display buffer, respectively, and δD corresponds to a modest change in the display
rate given by
\[
\delta_D = \alpha_D D_{i-1}, \tag{6.4}
\]
where αD is the percent change in display rate. Assuming a small value for αD, the
receive monitor applies a gradual reduction in the display rate when the display buffer
falls below εd, and a gradual increase when the display buffer rises above φd. The
gradual increase and decrease of frame rate is crucial in producing a high QoS for
the user.
The grouping buffer follows similar constraints.
\[
B^g_i = B^g_{i-1} + A_i - E_i, \tag{6.5}
\]
where B^g_i is the grouping buffer fullness, A_i is the frame acquisition rate, and
E_i is the compression rate. Similar to the display buffer, the grouping buffer is also
governed by
\[
0 \le B^g_i \le S_g, \tag{6.6}
\]
where S_g is the size of the grouping buffer. The grouping buffer fullness is controlled
by the send monitor, which regulates the frame acquisition rate A_i:
\[
A_i = \begin{cases} D_{i-1} + \delta_A, & \text{when } B^g_{i-1} < \varepsilon_g\\ D_{i-1}, & \text{when } \varepsilon_g \le B^g_{i-1} \le \phi_g\\ D_{i-1} - \delta_A, & \text{when } \phi_g < B^g_{i-1} \end{cases} \tag{6.7}
\]
where εg and φg are grouping buffer threshold levels similar to those of the display
buffer given in Equation 6.3, and δA corresponds to a modest change in the acquisition
rate given by
\[
\delta_A = \alpha_A D_{i-1}. \tag{6.8}
\]
αA is the percent change in acquisition rate. Note that the grouping buffer in the
server is controlled by the display rate of the client. The send monitor is provided
D_{i−1} by the receive monitor through the feedback path from client to server. Also,
\[
A_i \approx D_i, \qquad (\alpha_A, \alpha_D \ll 1) \tag{6.9}
\]
which is a requirement for real-time systems.
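The two monitor updates of Equations 6.3–6.4 and 6.7–6.8 share the same shape. A minimal sketch of the receive-monitor rule follows (the acquisition rule of Equation 6.7 mirrors it with the adjustments reversed); the function name and default αD are illustrative:

```python
def next_display_rate(D_prev, B_prev, eps_d, phi_d, alpha_D=0.01):
    """Equations 6.3-6.4: nudge the display rate down by a fraction alpha_D
    when the display buffer is nearly empty, up when it is nearly full,
    and leave it unchanged inside the [eps_d, phi_d] window."""
    delta_D = alpha_D * D_prev            # Eq. 6.4
    if B_prev < eps_d:                    # almost empty: slow the display
        return D_prev - delta_D
    if B_prev > phi_d:                    # almost full: speed the display up
        return D_prev + delta_D
    return D_prev                         # inside the window: hold the rate
```

Because each step changes the rate by at most a small fraction of itself, the displayed frame rate drifts smoothly rather than jumping, which is the gradual behaviour the text calls crucial for QoS.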
The compression algorithm can only operate on an entire GoF for temporal domain
compression. Therefore
\[
E_i = \begin{cases} C_N, & \text{when } i = G_N\\ 0, & \text{else,} \end{cases} \tag{6.10}
\]
and
\[
C_N \in \{1, 2, \ldots, \Gamma\}, \tag{6.11}
\]
where N is the GoF index, and GN corresponds to the unit time period when the last
frame of the N th group is acquired. CN depicts the size of the N th GoF, and Γ is
the maximum group size. Note that Γ is an important parameter to select. When
Γ is large, one is allowed to have more frames in a single group thus increasing the
compression ratio. On the other hand, a large Γ increases the delay time between
the acquisition and display of the video. Usually, Γ is selected to maximize the
compression ratio while staying within the delay requirement, which is application
specific.
Similar to Equation 6.10, the video reconstruction rate is given by
\[
R_i = \begin{cases} C_N, & \text{when } i = G_N + L_N\\ 0, & \text{else,} \end{cases} \tag{6.12}
\]
where L_N is the delay of the N th GoF from the grouping buffer to the display buffer,
as shown in Figure 6.4, caused by the compression and decompression computation
times and the network delay.
For the grouping buffer to neither overflow nor empty, it is necessary that
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n} A_i = \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n} E_i. \tag{6.13}
\]
As n increases, the system reaches steady state where the grouping buffer input rate
is equal to the grouping buffer output rate. Similarly, the display buffer input and
output rates become equal in steady state.
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n} R_i = \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n} D_i. \tag{6.14}
\]
The control of the buffers' fullness, given by Equations 6.3 and 6.7, is developed to
ensure the validity of Equations 6.13 and 6.14. The steady state of the buffers' fullness
is necessary for the success of the rate control algorithm. With steady-state data flow
through both buffers, the data flowing from the input of the grouping buffer to the
output of the display buffer approaches a constant rate, which is precisely what is
desired.
6.3.3 Grouping Buffer Design
The design parameters that need to be assigned for the grouping buffer are:
• The empty buffer threshold, εg.
• The full buffer threshold, φg.
• The grouping buffer size, Sg.
The empty buffer threshold
The basic idea of the grouping buffer is to continue to push more data through
the network until the maximum bandwidth available is utilized, or the computational
activity of one of the platforms is maximized. As seen from Equation 6.7, the grouping
thread continues to acquire frames at a slightly greater rate than the display thread in
an effort to continually push more data through the network. Also from Equation 6.10
we see that the grouping buffer empties when the last frame of a GoF is acquired. So
in an effort to keep constant the acquisition rate, and continually push the available
network bandwidth,
εg = Γ. (6.15)
With this threshold in place, the grouping thread will continually acquire frames at
a slightly greater rate than the display thread's frame rate, thus continually pushing
the bandwidth of the communication system.
The full buffer threshold and the grouping buffer size
With limited bandwidth it is possible for both the compression thread and the send
thread to be limited in the rate at which each can output data. Therefore, to
combat the possible overflow of both the send buffer and the grouping buffer, the value
of φg is determined.

If we look at the worst-case scenario of total network congestion, the grouping
thread may acquire up to φg frames before the send monitor starts to slow the
frame acquisition rate. Therefore, the value of φg is determined to be
φg = 2Γ. (6.16)
With this threshold in place, the grouping thread may acquire up to two GoF’s of the
maximum size before being penalized with a slowed acquisition rate. The size of the
grouping buffer is also determined:
Sg = φg + Γ = 3Γ. (6.17)
The size of the grouping buffer allows up to three GoFs of the maximum size to be
acquired with total network congestion. Therefore, the value of Sg gives enough space
for buffer overflow to be avoided.
The grouping buffer design is simple with fixed values for εg and φg, and mostly
governed by the frame rate of the display thread as seen in Equation 6.7. Therefore,
the display buffer design is the primary vehicle for rate control, which is discussed in
detail in the following subsection.
6.3.4 Display Buffer Design
There are several design parameters that need to be assigned for the display buffer:
• The initial buffering level, I.
• The empty buffer threshold, εd.
• The full buffer threshold, φd.
• The display buffer size, Sd.
The initial buffering level
Because the video frames are grouped by content, the groups are of different sizes
with a maximum threshold Γ, as given in Equation 6.11. Therefore, group sizes range
from 1 to Γ frames. As an example, assume the beginning of a video sequence contains
two groups: the first group consists of 1 frame, and the second group consists of Γ
frames. If the first group is sent to the receiver, and the receiver immediately displays
that frame after image reconstruction, the receiver will inevitably wait for the second
group to be sent with no frames in the display buffer, and a constant frame rate will
not be achieved. Therefore, an initial buffering level large enough to ensure constant
video display must exist.
From the previous example, it is obvious that the initial buffering level, I, must
be larger than Γ.
I ≥ Γ. (6.18)
However, the initial buffer level must also be larger than the empty buffer threshold,
εd. This is necessary to keep the display buffer level greater than εd to ensure that the
frame rate remains constant, as given in Equation 6.3. Therefore,
I ≥ Γ + εd. (6.19)
However, I directly corresponds to the initial waiting time for the receiver. If I is
chosen too large, the receiver will have an overly large initial buffering time, decreasing
the QoS. Therefore I must be kept at a minimum, and we choose
\[
I = \Gamma + \varepsilon_d. \tag{6.20}
\]
The empty buffer threshold
The variable delay LN , given in Equation 6.12, is used to calculate the minimum
value of εd needed to ensure that the display buffer never empties. From Equations
6.3 and 6.4 we can determine the average display rate during the critical empty buffer
warning level, i.e., B^d_{i−1} ≤ εd. First, we can determine the amount of time the buffer
has before it empties, without control. That is,
\[
\tau_c = \frac{\varepsilon_d}{D_i}. \tag{6.21}
\]
τc represents the critical time period before the display buffer is empty. We can now
assume control of the display buffer and then determine the estimated average display
rate:
\[
\left. D_{avg} \right|_{B^d_{i-1} \le \varepsilon_d} > \frac{D_i + (D_i - \delta_D \tau_c)}{2} = D_i - \frac{\varepsilon_d \alpha_D}{2}. \tag{6.22}
\]
Note that Equation 6.22 is merely an estimate of the average display rate during
the warning level; the exact expression is a polynomial of degree εd − 1. In Equation
6.22, we assume δD to be constant when in reality its value changes with each change
in the display rate, as seen in Equation 6.4. The choice to use this estimate is based
on computational simplicity and algorithmic intuitiveness.
Moreover, we know that enough frames must exist in the display buffer to keep
displaying throughout the delay of the next GoF, LN+1. Therefore,
\[
\frac{\varepsilon_d}{\left. D_{avg} \right|_{B^d_{i-1} \le \varepsilon_d}} \ge L_{N+1}. \tag{6.23}
\]
Solving for εd and substituting in Equation 6.22, we obtain
\[
\varepsilon_d \ge \frac{2 L_{N+1} D_i}{2 + \alpha_D L_{N+1}}. \tag{6.24}
\]
In practice, however, the variable delay L_{N+1} is greatly dependent on the size of
the next GoF, which is unknown. Therefore, for a worst-case scenario, we compute
the average delay per frame, Lf, and multiply by Γ to estimate the delay of a GoF
consisting of Γ frames. Therefore,
\[
\varepsilon_d \ge \frac{2 L_f \Gamma D_i}{2 + \alpha_D L_f \Gamma}. \tag{6.25}
\]
The average delay per frame can then be obtained by
\[
L_f = \frac{L_N}{C_N}. \tag{6.26}
\]
The value of LN is determined by calculating the round-trip time (RTT) of the com-
pressed GoF from client to server, dividing by 2, and adding the computation times
of the compression and decompression algorithms.
Again, to ensure the minimum possible delay for I, and substituting in for Lf, we
obtain
\[
\varepsilon_d = \frac{2 L_N \Gamma D_i}{2 C_N + \alpha_D L_N \Gamma}, \tag{6.27}
\]
and, substituting into Equation 6.20, we have
\[
I = \Gamma\left(1 + \frac{2 L_N D_i}{2 C_N + \alpha_D L_N \Gamma}\right). \tag{6.28}
\]
The full buffer threshold and the display buffer size
The full buffer threshold, φd, is set Γ greater than I in order to produce an
AWCF that is 2Γ in size. Therefore
\[
\phi_d = \Gamma\left(2 + \frac{2 L_N D_i}{2 C_N + \alpha_D L_N \Gamma}\right). \tag{6.29}
\]
The display frame rate is constant whenever the buffer fullness is within this window.
Also, the display buffer size can be arbitrarily set greater than φd. We find that
\[
S_d = 4\Gamma \tag{6.30}
\]
gives enough space for the AWCF to move.
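Putting Equations 6.20 and 6.27 through 6.30 together, the display buffer design reduces to a few lines. The operating point below (Γ = 64, an 8 fps display rate, a 2 s per-GoF delay over a 16-frame group, αD = 0.01) is purely illustrative, not a measurement from the experiments:

```python
def display_buffer_design(Gamma, D_i, L_N, C_N, alpha_D):
    """Display buffer design parameters of Section 6.3.4."""
    eps_d = 2 * L_N * Gamma * D_i / (2 * C_N + alpha_D * L_N * Gamma)  # Eq. 6.27
    I = Gamma + eps_d                                                  # Eq. 6.20
    phi_d = I + Gamma                                                  # Eq. 6.29
    S_d = 4 * Gamma                                                    # Eq. 6.30
    return eps_d, I, phi_d, S_d

# Illustrative operating point: Gamma=64 frames, 8 fps, 2 s GoF delay, 16-frame GoF.
eps_d, I, phi_d, S_d = display_buffer_design(64, 8.0, 2.0, 16, 0.01)
```

Note that φd − εd = 2Γ regardless of the operating point, which is the AWCF size given in the text.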
6.4 Experimental Results
The communication system is developed, and a test is run with a maximum group
size Γ of 64, an αA of 0.1, and an αD of 0.01. These parameters are found to produce
quality results, but their values are determined empirically, and without analysis
beyond the requirements given by Equation 6.9. The video is run for approximately
20 minutes. Also, for evaluation purposes, the initial display rate of the receiver is
deliberately set to a higher frame rate than the communication system can sustain.
The video sample has a 320x240 color frame size, and the initial frame rate, D0, is set
at 12 fps. The display frame rate, as well as the display buffer size, is given in Figure
6.5.
[Two plots versus time (minutes): the display frame rate (fps), and the display buffer size (frames) with the lower threshold ε and upper threshold φ marked.]
Figure 6.5: Display frame rate and display buffer size, D0=12 fps.
As seen in Figure 6.5, the rate control algorithm does reduce the frame rate until
steady state is found. Also, the frame rate stays constant unless the buffer fullness
reaches beyond the threshold levels of the AWCF, as given in Equation 6.3. Therefore,
the control algorithm produces a smooth and continuous frame rate for real-time video
communication.
The results of the acquisition frame rate and grouping buffer fullness are given in
Figure 6.6.
[Two plots versus time (minutes): the frame acquisition rate alongside the display rate (fps), and the grouping buffer size (frames) with the lower threshold ε and upper threshold φ marked.]
Figure 6.6: Frame acquisition rate and grouping buffer size, D0=12 fps.
As seen in Figure 6.6, the frame acquisition rate follows the frame display rate
as given in Equation 6.9. However, the acquisition rate is slightly higher than the
display rate. This is due to the grouping buffer fullness, which remains below the
empty buffer threshold, as shown in Equation 6.7.
The same video is run again, but the initial frame rate is set to 2 fps, intentionally
slower than the maximum frame rate the network can handle. The display frame
rate, as well as the display buffer fullness, is given in Figure 6.7.
[Two plots versus time (minutes): the display frame rate (fps), and the display buffer size (frames) with the lower threshold ε and upper threshold φ marked.]
Figure 6.7: Display frame rate and display buffer size, D0=2 fps.
As seen in Figure 6.7, the frame rate slowly reaches a steady-state frame rate
of approximately 8 fps, the same steady-state frame rate as given in Figure 6.5.
Therefore, the rate control algorithm does converge to a frame rate that maximally
utilizes the capacity of the platforms and the network. Figure 6.8 gives the acquisition
frame rate and grouping buffer fullness.
[Two plots versus time (minutes): the frame acquisition rate alongside the display rate (fps), and the grouping buffer size (frames) with the lower threshold ε and upper threshold φ marked.]
Figure 6.8: Frame acquisition rate and grouping buffer size, D0=2 fps.
Figure 6.8 indeed shows that the frame acquisition rate and display rate are close
to being equal, as given in Equation 6.9. Thus, the rate control algorithm continually
monitors the capacity of the network and adjusts the frame rate accordingly.
6.5 Discussion
We have developed a rate control algorithm designed for a content-based 3D
wavelet video compression scheme, used for real-time video transfer. With the GoF
requirement of 3D wavelet compression, an inherent delay is introduced in the trans-
mission of real-time video. Also because the wavelet transform is a content-based
compression scheme, the compression and decompression times vary with each group,
and the compressed file size also varies between differing GoF’s of the same size. A
rate control algorithm is designed to supply a smooth and continuous frame rate from
server to client in an environment with a variable and unknown network delay such
as the Internet and a compression scheme which allows for variable GoF sizes.
A buffering mechanism is developed on both the client and server sides to ensure
the continuous display of reconstructed image frames. On the server side, a grouping
buffer is sized according to the maximum GoF size. On the client side, a display
buffer is sized according to the maximum GoF size as well as the variable delay of
the network. As shown in the experimental results, the AWCF is able to provide
continuous video to the client despite the inherent characteristics of content-based
3D wavelet compression and real-time video transfer. In addition, a feedback
mechanism from the client controls the server's buffer content and ensures that the
acquisition rate of the server and the display rate of the client are equal.
Experimental results demonstrate that the rate control algorithm is effective for the
content-based 3D wavelet video compression scheme.
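The feedback rule can be sketched as follows; the additive step size `delta` and the rate bounds are illustrative assumptions, not the exact control law of the AWCF:

```python
def adjust_acquisition_rate(rate_fps, buffer_frames, eps, phi,
                            delta=0.5, min_rate=1.0, max_rate=10.0):
    """One step of a leaky-bucket style feedback rule (illustrative).

    If the grouping buffer grows past the upper threshold phi, the server
    is acquiring frames faster than the client can display them, so slow
    down; if it drains below the lower threshold eps, speed up.
    """
    if buffer_frames > phi:
        rate_fps -= delta
    elif buffer_frames < eps:
        rate_fps += delta
    # keep the rate within the platform's feasible range
    return max(min_rate, min(max_rate, rate_fps))
```

Applied once per monitoring interval, this drives the buffer occupancy back between ε and φ, which is what keeps the acquisition and display rates equal in the steady state.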
CHAPTER 7
Conclusions and Future Work
This dissertation presents several methods to improve the state-of-the-art in video
compression and communication technology. This concluding chapter summarizes the
research presented and specifies contributions made. Also, various topics are identified
for future research.
7.1 Contributions
Noise removal in natural digital imagery is an important part of many different
imaging systems. Denoising methods based on the non-decimated wavelet transform
have been shown to achieve a large PSNR increase. However, the computational burden
of previous wavelet-based noise removal algorithms is too large for real-time imaging
systems. Thus, a two-threshold criterion for coefficient selection in image denoising
has been developed to ease the computational burden associated with the coefficient
selection process. The two thresholds are defined using a training-sample approach:
the training images are artificially corrupted with AWGN and denoised at several
threshold levels, and the threshold levels which produce the minimum error relative
to the optimal denoising method are used in the general case. The resulting image
denoising algorithm is not only roughly an order of magnitude less computationally
complex, but it also shows an improvement in PSNR when compared to other
wavelet-based denoising algorithms given in the literature.
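As a rough illustration of the two-threshold selection idea (a sketch, not the dissertation's exact criterion), coefficients above an upper threshold are always kept, coefficients below a lower threshold are always discarded, and in-between coefficients are kept only when they have sufficient 8-connected support; the thresholds would come from the training procedure described above:

```python
import numpy as np

def two_threshold_select(coeffs, t_low, t_high, min_support=2):
    """Two-threshold wavelet coefficient selection (illustrative sketch)."""
    mag = np.abs(coeffs)
    strong = mag >= t_high          # always kept
    candidate = mag >= t_low        # potentially kept
    # count 8-connected candidate neighbours for every position
    padded = np.pad(candidate.astype(int), 1)
    support = np.zeros_like(coeffs, dtype=int)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di or dj:
                support += padded[1 + di: 1 + di + coeffs.shape[0],
                                  1 + dj: 1 + dj + coeffs.shape[1]]
    keep = strong | (candidate & (support >= min_support))
    return np.where(keep, coeffs, 0.0)
```

Because the in-between coefficients are resolved by a cheap neighbourhood count rather than a per-coefficient statistical test, the selection step stays inexpensive.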
The removal of noise from video signals is important in the development of high-quality
video systems. Therefore, a video denoising technique is described in this dissertation.
The technique first applies the image denoising method described in this work for
spatial-domain denoising, then applies a selective wavelet shrinkage algorithm for
temporal-domain denoising. The temporal-domain stage uses estimates of both the noise
level and the motion in the image sequence to determine the amount of filtering that
will improve the quality of the video signal. This video denoising technique is more
effective in noise removal and achieves a better average PSNR than the limited number
of methods presented in the literature.
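The motion/noise trade-off in the temporal stage can be conveyed with a minimal sketch; here selective wavelet shrinkage is replaced by a simple three-frame average, and the `motion_thresh` multiplier is an illustrative assumption:

```python
import numpy as np

def temporal_denoise(frames, noise_sigma, motion_thresh=3.0):
    """Motion-adaptive temporal filtering (illustrative sketch).

    A pixel is averaged with its temporal neighbours only where the
    frame differences are small relative to the noise level; large
    differences are treated as motion and the pixel is left unfiltered.
    """
    frames = np.asarray(frames, dtype=float)
    out = frames.copy()
    for t in range(1, len(frames) - 1):
        prev_d = np.abs(frames[t] - frames[t - 1])
        next_d = np.abs(frames[t] - frames[t + 1])
        static = (prev_d < motion_thresh * noise_sigma) & \
                 (next_d < motion_thresh * noise_sigma)
        avg = (frames[t - 1] + frames[t] + frames[t + 1]) / 3.0
        out[t] = np.where(static, avg, frames[t])
    return out
```

The key point mirrored from the text is that the amount of temporal filtering is gated jointly by the noise estimate and the motion estimate, so moving regions are not smeared.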
Also, a virtual-object compression method is developed to provide the compression
gain that object-based compression methods promise, without the many difficulties
those methods pose. With virtual-object compression, the stationary background is
separated from moving objects, and each is compressed independently. This independent
compression of objects and background gives virtual-object compression an improvement
in PSNR over 3D wavelet compression.
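A minimal sketch of the background/object separation step, using a temporal median as the background estimate (an illustrative stand-in for the dissertation's actual segmentation, with a hypothetical `diff_thresh`):

```python
import numpy as np

def split_background_objects(gof, diff_thresh=10.0):
    """Split a GoF into a static background and per-frame object masks.

    The temporal median over the group estimates the stationary
    background; pixels far from it are flagged as moving objects and
    would be compressed separately from the background.
    """
    gof = np.asarray(gof, dtype=float)
    background = np.median(gof, axis=0)           # one background image
    masks = np.abs(gof - background) > diff_thresh  # per-frame object pixels
    return background, masks
```

Compressing the single background image once and only the masked object regions per frame is where the compression gain over whole-frame 3D wavelet coding comes from.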
Real-time delivery of compressed video is a challenging problem because of the
many uncertain factors involved, such as the computational capacity of the client
and server platforms, the bandwidth and congestion of the network, and the inherent
acquisition delay of each GoF. We have provided a real-time video communication
solution which combats the many problems associated with real-time video delivery
over lossy channels by means of a rate control algorithm based on a leaky-bucket
approach. Both sender and receiver include an independent monitoring thread which
adjusts the acquisition and display rates, respectively, to ensure proper management
of the video stream. The result is real-time video delivery over a lossy channel.
Together, these contributions yield a high-quality real-time video compression and
transmission system.
7.2 Future Work
Although this work provides promising techniques to boost the overall performance
of 3D wavelet compression, many issues must still be addressed before wavelet-based
video compression is suitable for industry standards. In this section we outline a
few areas of related study:
• Currently, wavelet-based image and video compression systems use one particular
wavelet in the transformation of the original signal, and that wavelet is chosen
experimentally. However, for different input signals, different wavelet functions
may provide better results. Thus, it would be beneficial to analyze the statistics
of the input signal prior to compression in order to select the wavelet which most
compactly represents that signal. Also, in multiresolution analysis, the same
wavelet need not be used at each level of decomposition. Such signal analysis and
wavelet selection could provide a compression system that is well suited to all
types of imaging and video signals (e.g., long-wave infrared (LWIR), short-wave
infrared (SWIR), synthetic aperture radar (SAR), etc.).
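One way such signal-adaptive selection could work is to transform the signal with each candidate wavelet and keep the one whose coefficients have the lowest first-order entropy; the sketch below uses periodized single-level filter banks for Haar and Daubechies-2 as illustrative candidates, and the quantization step is an assumption:

```python
import numpy as np

# Analysis low-pass filters for two candidate wavelets; the high-pass
# filter is derived by the standard quadrature-mirror relation.
HAAR = np.array([1.0, 1.0]) / np.sqrt(2.0)
DB2 = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
                3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))

def dwt_level(signal, lo):
    """One level of a periodized DWT with the given low-pass filter."""
    hi = lo[::-1] * (-1.0) ** np.arange(len(lo))
    n = len(signal)
    a = np.array([sum(lo[k] * signal[(2 * i + k) % n] for k in range(len(lo)))
                  for i in range(n // 2)])
    d = np.array([sum(hi[k] * signal[(2 * i + k) % n] for k in range(len(lo)))
                  for i in range(n // 2)])
    return a, d

def coeff_entropy(coeffs, step=0.5):
    """First-order entropy (bits/coefficient) after uniform quantization."""
    q = np.round(np.concatenate(coeffs) / step).astype(int)
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def pick_wavelet(signal, candidates={"haar": HAAR, "db2": DB2}):
    """Return the candidate whose coefficients have the lowest entropy."""
    return min(candidates,
               key=lambda name: coeff_entropy(dwt_level(signal, candidates[name])))
```

Lower coefficient entropy is a reasonable proxy for compactness because the entropy coder's output rate is bounded below by it; a per-level selection would repeat this test at each decomposition level.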
• Currently, the image and video denoising algorithms are not computationally
efficient enough for real-time imaging and video systems. The image denoising
algorithm developed in this work denoises a 320x240 grayscale image in approximately
1 second, roughly 30 times slower than needed for real-time operation. The video
denoising algorithm carries an added computational load from the temporal-domain
processing; it denoises a 320x240x64 grayscale GoF in approximately 1.5 minutes.
A computational speedup of greater than 30 is most likely unattainable through
software optimization alone. Thus, a hardware implementation is necessary for
real-time applications.
• This dissertation in part defines an image and a video denoising algorithm.
These algorithms are designed to remove AWGN from images and video signals and
have been shown to give higher PSNR than other methods in the literature. However,
AWGN is only one of the many types of noise found in the image and video capture
process: fixed-pattern noise, shot noise, thermal noise, correlated noise, and
speckle also corrupt many different capture processes. Thus, for an image/video
denoising algorithm to be most useful in industry, the capture process must be
studied and the types of noise corruption involved in that process identified.
An image/video denoising process may then be developed that is tailored to removing
the noise produced by that capture process.
• Much of the work in this dissertation concerns the removal of noise from signals
prior to compression. The removal of noise facilitates compression by reducing the
entropy of the signal while improving the signal quality. However, the removal of
artifacts generated by the compression algorithm after reconstruction is also an
important processing step. Post-processing is used in most modern compression
systems: in both the JPEG and MPEG standards, there exist filtering algorithms to
remove the blocking artifacts associated with the block-based DCT used in the
compression engine. Thus, it would be fruitful to develop a post-processing method
to remove the artifacts generated by wavelet-based compression methods.
• This dissertation uses PSNR as the metric for quality, for reasons of legacy and
consistency: most of the image and video processing community continues to publish
results using PSNR, so to compare results with other methods we use PSNR as well.
However, in Chapter 3 we briefly mentioned some metrics that may be closer to the
human perception of quality. New denoising and compression methods can and should
be evaluated with not one but several quality metrics; in this way, researchers can
be more confident about the performance of such algorithms.
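For reference, PSNR against an 8-bit peak value is computed as follows:

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit imagery."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(test, float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Because PSNR depends only on the mean squared error, two distortions with very different perceptual impact can score identically, which is exactly the limitation the perceptual metrics of Chapter 3 aim to address.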
APPENDIX A
Computation of S·,k[x, y]
The computation of S·,k[x, y] is given by the following algorithm:
~N = {[−1,−1], [−1, 0], [−1, 1], [0,−1], [0, 1], [1,−1], [1, 0], [1, 1]}
O[·] = 0,  t = 0,  p = 0,  ~D·,k(0) = (x, y)
if I·,k[x, y] == 1,
    while ~D·,k(t) ≠ NULL,
        (i, j) = ~D·,k(t)
        t = t + 1
        for m = 0 to 7,
            if (I·,k[(i, j) + ~N(m)] == 1) and (O[(i, j) + ~N(m)] == 0),
                p = p + 1
                ~D·,k(p) = (i, j) + ~N(m)
                O[(i, j) + ~N(m)] = 1
            end if
        end for
    end while
end if
S·,k[x, y] = t .    (A.1)
O[x, y] is a Boolean flag indicating whether a particular I·,k[x, y] value has
already been counted. ~D is an array of spatial coordinates of valid coefficients
that support the current coefficient value I·,k[x, y], and ~N is the set of offset
vectors to the eight neighboring coefficient positions.
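Algorithm (A.1) amounts to a breadth-first count of the 8-connected region of selected coefficients containing (x, y). A direct transcription, assuming I·,k is an indexable 2-D 0/1 array; the bounds check is added to handle image borders, which the pseudocode leaves implicit:

```python
from collections import deque

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]

def support_size(I, x, y):
    """Size of the 8-connected region of 1-valued entries of I that
    contains (x, y); zero when I[x][y] itself is 0 (the final t of
    Algorithm (A.1))."""
    if I[x][y] != 1:
        return 0
    rows, cols = len(I), len(I[0])
    visited = {(x, y)}            # plays the role of O[.]
    queue = deque([(x, y)])       # plays the role of D
    count = 0                     # plays the role of t
    while queue:
        i, j = queue.popleft()
        count += 1
        for di, dj in NEIGHBOURS:
            ni, nj = i + di, j + dj
            if (0 <= ni < rows and 0 <= nj < cols
                    and I[ni][nj] == 1 and (ni, nj) not in visited):
                visited.add((ni, nj))
                queue.append((ni, nj))
    return count
```

Marking the seed as visited up front also closes a small gap in the pseudocode, where O is never set for (x, y) itself.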