
2018 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 24, NO. 12, DECEMBER 2014

Novel DCT-Based Image Up-Sampling Using Learning-Based Adaptive k-NN MMSE Estimation

Kwok-Wai Hung, Student Member, IEEE, and Wan-Chi Siu, Fellow, IEEE

Abstract— Image up-sampling in the discrete cosine transform (DCT) domain is a challenging problem because DCT coefficients are de-correlated, such that it is nontrivial to estimate high-frequency DCT coefficients directly from observed low-frequency DCT coefficients. In the literature, DCT-based up-sampling algorithms usually pad zeros as high-frequency DCT coefficients or estimate such coefficients with limited success, mainly due to the nonadaptive estimator and the restricted information from a single observed image. In this paper, we tackle the problem of estimating high-frequency DCT coefficients in the spatial domain by proposing a learning-based scheme using an adaptive k-nearest neighbor weighted minimum mean squares error (MMSE) estimation framework. Our proposed scheme makes use of the information from precomputed dictionaries to formulate an adaptive linear MMSE estimator for each DCT block. The scheme is able to estimate high-frequency DCT coefficients with very successful results. Experimental results show that the proposed up-sampling scheme produces minimal ringing and blocking effects, and significantly better results compared with the state-of-the-art algorithms in terms of peak signal-to-noise ratio (more than 1 dB), structural similarity, and subjective quality measurements.

Index Terms— Interpolation, scalable video coding, super-resolution (SR), up-sampling.

I. INTRODUCTION

IMAGE and video interpolation, up-sampling, and super-resolution (SR) have recently attracted much attention in the community due to their wide applications, including conventional video coding [1]–[4], scalable video coding [5]–[9], error concealment [10], video zooming, surveillance, multiview synthesis [11], and so on. For example, 4k resolution displays are becoming widely available in the consumer market, such that videos in the current standard- and high-definition formats need to be up-sampled for display on 4k resolution screens. In recent years, many interpolation, up-sampling, and SR algorithms have been proposed in the literature. Most of the available algorithms perform the up-sampling process in the spatial domain [12]–[19]. Recent developments show that up-sampling an image in the wavelet domain [20], [21] and the discrete cosine transform (DCT) domain [22]–[36] can provide

Manuscript received April 22, 2013; revised August 9, 2013 and February 22, 2014; accepted May 2, 2014. Date of publication June 5, 2014; date of current version December 3, 2014. This work was supported in part by the Center for Signal Processing, Hong Kong Polytechnic University, Hong Kong, and in part by the Research Grant Council of the Hong Kong Special Administrative Region Government B-Q38S. This paper was recommended by Associate Editor E. Steinbach.

The authors are with the Centre for Signal Processing, Department of Electronic and Information Engineering, Hong Kong Polytechnic University, Hong Kong (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSVT.2014.2329352

better performance and lower complexity compared with those in conventional spatial up-sampling methods.

Since images/videos are often coded using the block-based DCT (e.g., H.264/AVC), it is computationally attractive to resize images directly in the block DCT domain. The most common way of down-sampling is to directly truncate the high-frequency DCT coefficients in all transformed DCT blocks [22]–[43]. Such a simple process can simultaneously perform the anti-aliasing filtering and down-sampling. Surprisingly, a few analyses show that down-sampling in the DCT domain results in high anti-aliasing performance (a narrow transition band near the Nyquist frequency) and preserves the most important energy due to the excellent energy compaction property of the DCT [8], [23], [24]. Due to these advantages, a large amount of image/video content has been obtained by this down-sampling process in the DCT domain. Hence, there is a large demand for algorithms to restore the high-resolution image from the low-resolution (LR) image that was formed in this process [6], [22]–[36].

To avoid heavy computation, previous developments focus mainly on low-complexity algorithms to up-sample images in the DCT domain [22]–[31]. The most conventional way is to tackle the problem by padding zeros as the high-frequency DCT coefficients [22]–[30]. However, the zero padding approaches introduce some ringing and blocking artifacts. Hence, some research works try to improve the zero padding approaches by: 1) sub-band approximation and low-pass truncation [31]; 2) using a larger DCT block size [23], [25], [28], [30]; 3) reducing the ringing artifacts generally [32], [33]; 4) reducing the blocking artifacts by overlapping [6]; and 5) estimating the high-frequency DCT coefficients directly [34]–[36]. Unfortunately, the improvements are often limited in terms of peak signal-to-noise ratio (PSNR) values, i.e., usually much less than 1 dB on average; the improvement may be limited to parts of the images, such as vertical and horizontal edges only [32]; and the training data have to be relevant to the testing data [34].

There are several reasons why the previous algorithms cannot provide significant improvements over the zero padding approach. First, methods that aim at reducing the ringing and blocking artifacts only suppress the visible artifacts but do not enhance the sharpness of the up-sampled image by compensating for the loss of the high-frequency components due to zero padding [6], [31]–[33]. In fact, the ringing and blocking artifacts [6], [31]–[33] originate from inaccurate estimation of the high-frequency DCT coefficients. However, due to the de-correlation property of the DCT, a direct

1051-8215 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.



estimation of high-frequency DCT coefficients in the DCT domain is nontrivial unless the training data are highly relevant to the testing data [34]. Instead, natural images are highly correlated in the spatial domain, such that better results were achieved by estimating high-frequency DCT coefficients in the spatial domain [35], [36]. However, the fixed Wiener filter coefficients for half-pixel interpolation in H.264 coding are not trained specifically for estimating DCT coefficients [35], which led to [36] training a specific Wiener filter for this purpose. The major drawback of these methods that directly estimate the high-frequency DCT coefficients [34]–[36] is that they use one fixed estimator for all DCT blocks in the image and only make use of limited local information from the observed LR image.

If we can accurately estimate the high-frequency DCT coefficients, not only can the ringing and blocking artifacts be minimized, but the fidelity of the up-sampled image can also be improved so that the reconstructed image approaches the ground truth image. In this paper, we propose a novel up-sampling scheme to estimate high-frequency DCT coefficients. The idea is to formulate an adaptive estimator for each DCT block in the up-sampled image by making use of some precomputed dictionaries. Our scheme comprises three major components, which represent three contributions. The first contribution is the proposed learning-based algorithm based on the adaptive k-nearest neighbor (k-NN) weighted minimum mean squares error (MMSE) framework [12], which is specifically designed to up-sample images that were down-sampled in the DCT domain. This learning-based algorithm makes use of the k-NN MMSE estimation to accurately obtain the high-frequency DCT coefficients. The second contribution is the proposed iterative back-projection [44], [45], which refines the low-frequency DCT coefficients of any initially estimated HR image to match those of the observed LR image. The third contribution is the proposed iterative refinement step for de-blurring and de-blocking, which applies the DCT down-sampling image model as the blurring kernel in the unsharp mask [58], [59]. Thus, our scheme is able to estimate high-frequency DCT coefficients with very successful results and addresses the major weaknesses of previous up-sampling methods [6], [22]–[36].

Due to the accurate estimation of high-frequency DCT coefficients, the results of our experimental work show that the proposed up-sampling scheme produces minimal ringing and blocking effects. Objective measurements also show that the proposed up-sampling scheme produces significantly better results compared with the zero padding approach [24] and other state-of-the-art up-sampling algorithms [6], [35], in terms of PSNR (more than 1 dB) and structural similarity (SSIM), for a wide range of natural images and videos. Both up-sampling factors of two and four were used.

The rest of this paper is organized as follows. Section II gives a literature review of the related works. Section III gives the problem formulation, including the image model in the spatial domain that represents the down-sampling process in the DCT domain, which is used to formulate the iterative back-projection (IBP) for refinement in the later process. Section IV gives a detailed description of the proposed up-sampling scheme, including the proposed learning-based adaptive k-NN MMSE algorithm and the iterative refinement process for de-blurring and de-blocking. Section V gives the results of our comprehensive experimental work, and Section VI concludes our work and discusses possible future directions.

II. LITERATURE REVIEW

In the literature, DCT-based up-sampling algorithms can be categorized into two classes. The first class is based on zero padding [22]–[30]. The concept of zero padding is to up-sample a block of low-frequency DCT coefficients by padding zeros as the high-frequency DCT coefficients. This concept was first implemented in [22] to increase the resolution for motion compensation. Subsequently, [24] generalized the concept of zero padding with a sophisticated formulation, which merges four DCT down-sampled image blocks in the spatial domain and then groups them together as a block in the DCT domain. Park and Park [23] further analyzed the performance of this type of resizing filter in the DCT domain. Based on this type of filter design, resizing algorithms with arbitrary resizing factors were proposed [26]–[30]. The basic idea is to down-sample and up-sample by different factors to achieve arbitrary (rational) resizing factors. Based on the resizing algorithms [26]–[30], a computationally scalable algorithm was proposed by controlling the number of input and output coefficients [25]. A related work on zero padding is the sub-band approximation and low-pass truncation approach [31]. This method weights the low-frequency DCT coefficients and concatenates the neighboring DCT blocks with zero padding.

The second class of DCT-based up-sampling algorithms aims to tackle the weaknesses of the zero padding approach. The major objective is to reduce the ringing and blocking artifacts of the zero padding approach [6], [32]–[36]. To remove blocking artifacts, the most intuitive way is to overlap the estimated DCT blocks [6]. To minimize the ringing artifacts, [32] proposes a hybrid scheme using a column and row representation of DCT blocks to reduce the ringing artifacts along horizontal and vertical edges. Lim and Park [33] compared several de-ringing methods and proposed a mask map for extracting the ringing areas for de-ringing. Other improvements of the zero padding approach can be obtained by directly estimating the high-frequency DCT coefficients instead of padding zeros [34]–[36]. Cho and Lee [34] analyzed the feasibility of directly estimating high-frequency DCT coefficients from low-frequency DCT coefficients by training one linear minimum mean squares error estimator. Their experimental results showed that, since one trained estimator is used for all image blocks in the image, the training data had to be highly relevant to the testing (validating) data to provide a better result compared with that of the zero padding approach. Recently, [35] and [36] investigated methods combining the DCT with spatial interpolation methods, which mainly make use of a fixed-coefficient Wiener filter to estimate high-frequency DCT coefficients. However, the fixed-coefficient Wiener filter is not adaptive, such that all image blocks use the same filter regardless of



the image contents. This means that the Wiener filter just uses limited local information from a single observed LR image for estimating the DCT coefficients.

Other DCT-related up-sampling methods include the face hallucination method, which aims at reconstructing the 17 lowest frequency DCT coefficients (out of 64 coefficients) instead of using all high-frequency DCT coefficients [37]. An example-based method was also proposed [38], which searches for a suitable HR patch by making use of low-frequency DCT coefficients. However, this method requires a relevant HR image as the reference for searching. Ni and Nguyen [13] give an SR method, which applies the support vector machine in the DCT domain to estimate the HR image. However, this method does not aim at reconstructing the high-frequency DCT coefficients [13]. Some SR methods were also proposed [39], [40], which aim at reconstructing an HR image, but the image quality is degraded by DCT quantization errors.

III. PROBLEM FORMULATION

A. Image Model of Down-Sampling in the DCT Domain

In this section, we will formulate an image model in the spatial domain that represents the down-sampling process in the DCT domain, such that we can conveniently make use of the IBP [44], [45] in the following section. Without loss of generality, let the down-sampling factor be dyadic. Since the estimation of high-frequency DCT coefficients is better performed in the spatial domain [35], [36], we will solve the problem in the spatial domain throughout this paper. In the spatial domain, let us denote the original HR image to be down-sampled as Y ∈ ℝ^(2n×2m), where n and m are the dimensions of the down-sampled LR image. Since the down-sampling process is block-based, let us partition the HR image into blocks of size [2q × 2q] and index the blocks as Y_i ∈ ℝ^(2q×2q), where i is the block index. In this paper, let us use y_i ∈ ℝ^(4q²) to represent the vector form of block Y_i for a simpler and clearer representation in later stages. The block-based down-sampling process in the DCT domain [22]–[43] can be generalized by the following image model:

x_i = F^T_DCT(q) D F_DCT(2q) y_i    (1)

where F_DCT(2q) ∈ ℝ^(4q²×4q²) is the 2-D forward transform matrix (which transforms the HR block into the DCT domain), D ∈ ℝ^(q²×4q²) is the down-sampling matrix that down-samples the transformed block from [4q² × 1] to [q² × 1] by truncating the high-frequency DCT coefficients and scaling the low-frequency DCT coefficients, F^T_DCT(q) ∈ ℝ^(q²×q²) is the 2-D inverse transform matrix (which inverse-transforms the down-sampled block to the spatial domain), and x_i ∈ ℝ^(q²) is the LR block. Note that the 2-D forward transform matrix F_DCT(2q) can be written as the Kronecker product of two 1-D forward DCT matrices [10], [60]

F_DCT(2q) = F1_DCT(2q) ⊗ F1_DCT(2q)    (2)

where F1_DCT(2q) ∈ ℝ^(2q×2q) is the 1-D forward DCT matrix for a 1-D signal of length 2q, and the Kronecker product is

Fig. 1. Graphical illustration of the dyadic down-sampling process in the block DCT domain [22]–[43].

Fig. 2. Graphical illustration of the IBP process in (5).

defined as [60]

A ⊗ [a b; c d] = [Aa Ab; Ac Ad].    (3)

Similarly, F_DCT(q) ∈ ℝ^(q×q) can be derived accordingly. Note that F^T_DCT(2q) F_DCT(2q) = I and F^T_DCT(q) F_DCT(q) = I, where I is the identity matrix. A graphical illustration of the down-sampling process is shown in Fig. 1 in detail.
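The block-based model in (1) can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code: we assume an orthonormal DCT-II matrix, and the 0.5 scaling applied by D is our assumption for the dyadic case so that the DC level is preserved.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal 1-D DCT-II matrix F1_DCT(n); row k holds the kth basis vector.
    k = np.arange(n)[:, None]
    t = 2 * np.arange(n)[None, :] + 1
    F = np.cos(np.pi * k * t / (2 * n))
    F[0] *= np.sqrt(1.0 / n)
    F[1:] *= np.sqrt(2.0 / n)
    return F

def downsample_block(Y_block, q):
    # Eq. (1) in 2-D block form: transform the [2q x 2q] HR block, keep the
    # [q x q] low-frequency corner (scaled by 0.5, our assumed convention),
    # and inverse-transform with the q-point DCT.
    F2, Fq = dct_matrix(2 * q), dct_matrix(q)
    C = F2 @ Y_block @ F2.T        # forward 2-D DCT of the HR block
    C_low = 0.5 * C[:q, :q]        # D: truncate high frequencies and scale
    return Fq.T @ C_low @ Fq       # back to the spatial domain: the LR block
```

Under these assumptions, a constant block maps to a constant block of the same value, consistent with the DC-preserving, anti-aliasing behavior of DCT-domain down-sampling described above.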

B. Formulating the IBP Framework

In this section, we will make use of the image model in (1) to formulate a version of the IBP [44], [45], which iteratively refines the estimate of the HR image to fit the image model. Let us consider the block-based model due to the block-based down-sampling process. By making use of the image model in (1), the IBP iteratively finds an estimate of the HR block that minimizes the squared L2 norm between the simulated LR block and the observed LR block, as follows:

ŷ_i = arg min_{y_i} ‖x_i − F^T_DCT(q) D F_DCT(2q) y_i‖²_2.    (4)

This IBP finds a solution through the iterative process

y_i^(n+1) = y_i^n + λ F^T_DCT(2q) D^T F_DCT(q) (x_i − F^T_DCT(q) D F_DCT(2q) y_i^n)

i.e., y_i^(n+1) = y_i^n + λ F^T_DCT(2q) D^T (F_DCT(q) x_i − D F_DCT(2q) y_i^n)    (5)

where n is the iteration index and λ is the step size. Since the squared L2 norm in (4) only defines the error between the low-frequency DCT coefficients, during the IBP in (5) only the low-frequency DCT coefficients of the estimated HR block, D F_DCT(2q) y_i^n, are compared and matched with those of the LR block, F_DCT(q) x_i. The difference, F_DCT(q) x_i − D F_DCT(2q) y_i^n, is projected onto the HR space and then added to the current estimate, y_i^n, to form y_i^(n+1), as shown in Fig. 2. The high-frequency DCT coefficients of the initial estimate of the HR block, y_i^0, are unchanged during the iterative back-projection. In other



words, the initialization, y_i^0, defines the high-frequency DCT coefficients of the converged solution that minimizes the squared L2 norm in (4).
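The iteration in (5) can be written directly in 2-D block form. The sketch below is an illustrative reconstruction under our assumptions (orthonormal DCT-II, and a 0.5 scaling inside D and D^T), not the authors' implementation; with λ = 4 the low-frequency coefficients are matched after a single iteration, while the high-frequency coefficients of the initialization are left untouched.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal 1-D DCT-II matrix.
    F = np.cos(np.pi * np.arange(n)[:, None] * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    F[0] *= np.sqrt(1.0 / n)
    F[1:] *= np.sqrt(2.0 / n)
    return F

def ibp_refine(y, x, q, lam=4.0, n_iter=1):
    """IBP steps of eq. (5) on a [2q x 2q] HR block estimate y,
    given the observed [q x q] LR block x."""
    F2, Fq = dct_matrix(2 * q), dct_matrix(q)
    target = Fq @ x @ Fq.T                # F_DCT(q) x_i: the LR block's coefficients
    for _ in range(n_iter):
        C = F2 @ y @ F2.T                 # F_DCT(2q) y_i^n
        resid = target - 0.5 * C[:q, :q]  # F_DCT(q) x_i - D F_DCT(2q) y_i^n
        R = np.zeros((2 * q, 2 * q))
        R[:q, :q] = 0.5 * resid           # D^T: zero-pad and scale the residual
        y = y + lam * (F2.T @ R @ F2)     # back-project onto the HR block
    return y
```

After refinement, the simulated low-frequency coefficients of the HR block equal those of the observed LR block, and only the low-frequency corner of the initialization's spectrum has changed, mirroring the convergence behavior discussed above.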

Different up-sampling algorithms in [6], [22]–[36] use different methods to estimate high-frequency DCT coefficients, such that these algorithms can be represented by our suggested IBP framework in (5) with different ways of setting the initial values of the high-frequency DCT coefficients. Let us illustrate three representative examples as follows.

1) Zero Padding Approach [22]–[24]: The zero padding approach pads zeros as the high-frequency DCT coefficients. Hence, it is equivalent to initializing the HR block with a zero vector, i.e., y_i^0 = 0. Let the step size λ be 4 (due to the two 0.5 multipliers during the IBP in Fig. 2). One iteration is sufficient to refine the low-frequency DCT coefficients; hence, the equivalent IBP representation of the zero padding approach becomes

y_i^zero = y_i^0 + λ F^T_DCT(2q) D^T (F_DCT(q) x_i − D F_DCT(2q) y_i^0)

y_i^zero = 4 · F^T_DCT(2q) D^T F_DCT(q) x_i    (6)

where y_i^zero is the HR block obtained by the zero padding approach. Specifically, zero coefficients are inserted as the high-frequency DCT coefficients in the DCT domain. The DCT coefficients are then converted into the spatial domain as F^T_DCT(2q) D^T F_DCT(q) x_i, which is scaled up by 4.
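Equation (6) can be sketched in 2-D block form as follows. This is an illustrative reconstruction, not the authors' code: we assume an orthonormal DCT-II, and that D^T zero-pads the LR coefficients into the [2q x 2q] grid while applying the same 0.5 scaling as D, so the final factor of 4 restores the DC level.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal 1-D DCT-II matrix.
    F = np.cos(np.pi * np.arange(n)[:, None] * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    F[0] *= np.sqrt(1.0 / n)
    F[1:] *= np.sqrt(2.0 / n)
    return F

def zero_pad_upsample(x_block, q):
    # Eq. (6) in 2-D block form: y_zero = 4 * F_DCT(2q)^T D^T F_DCT(q) x_i.
    F2, Fq = dct_matrix(2 * q), dct_matrix(q)
    C = Fq @ x_block @ Fq.T            # F_DCT(q) x_i: LR block coefficients
    C_pad = np.zeros((2 * q, 2 * q))
    C_pad[:q, :q] = 0.5 * C            # D^T: zero-pad and scale (assumed 0.5)
    return 4.0 * (F2.T @ C_pad @ F2)   # scale up by 4 and inverse-transform
```

Under these assumptions, a constant LR block up-samples to a constant HR block of the same value; the ringing arises only when the zeroed high-frequency coefficients of edge or texture blocks are actually nonzero.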

2) Zero Padding Approach With Overlapping [6]: The zero padding approach with overlapping [6] further overlaps the estimated HR blocks to reduce the blocking artifacts. For example, there are in total sixteen 2 × 2 sub-blocks in an 8 × 8 HR block, depending on their locations in the HR block. To implement the overlapping approach, let us shift the observed LR image by [0, 1, 2, 3] pixels in both the x- and y-directions and denote the shifted LR blocks in these 16 shifted LR images as x_i(m), where m represents the shift index. We can apply the zero padding approach in (6) to obtain the shifted HR blocks from the shifted LR blocks. The averages of the overlapped sub-blocks (within the shifted HR blocks) are then taken to obtain the final result, y_i,s^over, as

y_i,s^over = (1/M) Σ_{m=1}^{M} S_m · 4 F^T_DCT(2q) D^T F_DCT(q) x_i(m)    (7)

where M is the number of overlaps, S_m is the operator to extract the corresponding sub-blocks, and y_i,s^over is the result of averaging sub-blocks. For the sake of implementation, we obtain the result of each 2 × 2 sub-block y_i,s^over in the final HR block y_i^over through averaging in (7). Overlapping the estimated HR blocks results in nonzero high-frequency DCT coefficients in the final HR block y_i^over, which can improve the PSNR by around 0.1–0.2 dB on average, due to the de-blocking effects.
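The overlapping idea in (7) can be illustrated at the image level by averaging 16 shifted zero-padding reconstructions. This is a simplified sketch, not the authors' code: we assume circular shifts (np.roll) at the borders, an orthonormal DCT-II with a 0.5 scaling inside D^T as before, and image dimensions divisible by the block size q.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal 1-D DCT-II matrix.
    F = np.cos(np.pi * np.arange(n)[:, None] * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    F[0] *= np.sqrt(1.0 / n)
    F[1:] *= np.sqrt(2.0 / n)
    return F

def zero_pad_upsample(x_block, q):
    # Eq. (6): zero-pad the LR coefficients, scale, and inverse-transform.
    F2, Fq = dct_matrix(2 * q), dct_matrix(q)
    C_pad = np.zeros((2 * q, 2 * q))
    C_pad[:q, :q] = 0.5 * (Fq @ x_block @ Fq.T)
    return 4.0 * (F2.T @ C_pad @ F2)

def blockwise_upsample(x_img, q):
    # Apply eq. (6) independently to every [q x q] block of the LR image.
    H, W = x_img.shape
    y = np.zeros((2 * H, 2 * W))
    for r in range(0, H, q):
        for c in range(0, W, q):
            y[2 * r:2 * r + 2 * q, 2 * c:2 * c + 2 * q] = \
                zero_pad_upsample(x_img[r:r + q, c:c + q], q)
    return y

def overlap_upsample(x_img, q):
    # Eq. (7) at image level: shift the LR image by 0..3 pixels in each
    # direction, up-sample each shifted image by zero padding, shift the HR
    # result back by twice the offset, and average the 16 reconstructions.
    acc = np.zeros((2 * x_img.shape[0], 2 * x_img.shape[1]))
    for dy in range(4):
        for dx in range(4):
            shifted = np.roll(x_img, (-dy, -dx), axis=(0, 1))
            acc += np.roll(blockwise_upsample(shifted, q),
                           (2 * dy, 2 * dx), axis=(0, 1))
    return acc / 16.0
```

Each shift moves the block grid, so averaging the realigned reconstructions smooths the block boundaries, which is the de-blocking effect described above.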

3) Hybrid DCT-Wiener Interpolation Scheme: The hybrid DCT-Wiener interpolation scheme [35] estimates high-frequency DCT coefficients using a Wiener filter with fixed coefficients [46], which was originally proposed as a half-pixel interpolation filter for H.264/AVC. Let us denote y_i^WF as the HR block estimated by the fixed coefficient Wiener filter.

Fig. 3. Up-sampled results of different algorithms. (a) Zero padding [22]–[24]. (b) Overlapping zero padding [6]. (c) Bicubic interpolation. (d) Hybrid DCT-Wiener scheme [35]. (e) Proposed up-sampling scheme. (f) Original high-resolution image. Please read the electronic version for a clearer view.

Hence, the equivalent IBP representation of this scheme is to refine the initialized y_i^WF, as

y_i^WF(IBP) = y_i^WF + 4 · F^T_DCT(2q) D^T (F_DCT(q) x_i − D F_DCT(2q) y_i^WF)    (8)

where the result of the hybrid DCT-Wiener interpolation scheme is represented by y_i^WF(IBP). During the iteration using (8), the low-frequency DCT coefficients of the initially estimated HR block, D F_DCT(2q) y_i^WF, are refined to match those of the observed LR block, while keeping the high-frequency DCT coefficients unchanged. The hybrid DCT-Wiener interpolation scheme thus combines the low-frequency DCT coefficients of the observed LR block with the high-frequency DCT coefficients obtained by the fixed coefficient Wiener filter.

IV. PROPOSED UP-SAMPLING SCHEME USING k-NN MMSE ESTIMATION

As we have explained in Section I, the available up-sampling algorithms do not estimate high-frequency DCT coefficients very well. For example, the zero padding approach over-simplifies the model and assumes that most of the important energy concentrates in the low-frequency DCT coefficients. In reality, the edge and texture areas suffer much from padding zeros to form the high-frequency DCT coefficients, as shown in Fig. 3. The overlapping zero padding approach reduces the blocking artifacts; however, the ringing artifacts remain. The hybrid DCT-Wiener scheme does not estimate high-frequency DCT coefficients well with its nonadaptive Wiener filter. Hence, there is much room for improvement by making use of a better up-sampling scheme.

In this paper, we propose a novel up-sampling scheme, which produces minimal ringing and blocking artifacts, as shown in Fig. 3(e). The key to estimating high-frequency DCT coefficients is to formulate an adaptive estimator for each DCT block by making use of extra information from



Fig. 4. Flow diagram of the proposed up-sampling scheme (nearly all processes are in the spatial domain).

Fig. 5. Mismatch of the interpolated 8 × 8 HR pixels and the original 8 × 8 HR pixels due to half-pixel shifts.

some precomputed training sets. The learning-based interpolation technique fits our need because the learning process can accommodate various image details, including edges and texture areas [12]–[15], according to the training data. Since the estimation of high-frequency DCT coefficients in the DCT domain is nontrivial [34], we propose a learning-based algorithm that performs the estimation in the spatial domain.

Fig. 4 shows a flow diagram of the proposed scheme. Initially, the proposed learning-based algorithm estimates the HR image [Fig. 4(a)]. It is convenient to refine the low-frequency parts by the proposed iterative back-projection framework in Section III-B [Fig. 4(b)] because the low-frequency DCT coefficients are given in the observed LR image. Following the low-frequency refinement, we propose an iterative refinement process for de-blurring and de-blocking that runs until it reaches the maximum number of iterations [Fig. 4(c)]. Let us explain these three major components of our scheme in the following sections.

A. Learning-Based k-NN MMSE Algorithm for the Initialization of High-Frequency DCT Coefficients

The advantage of a learning-based interpolation algorithm is its adaptation to the input and output training data, such that the observed LR image can learn from the training data for better up-sampling. In particular, images down-sampled in the DCT domain involve different pixel shifts and specific blurring effects (due to the truncation of high-frequency DCT coefficients). For example, if we apply the conventional interpolation algorithms [17], [18] (using dyadic up-sampling factors) to an image down-sampled in the DCT domain, the interpolated HR image has a half-pixel shift with respect to the original HR image, as shown in Fig. 5. The pixel shift problem can apparently be resolved by interpolation algorithms designed for arbitrary-factor up-sampling; however, the specific blurring effects still cannot be addressed.

Our approach involves a learning-based interpolation algorithm for estimating the high-frequency DCT coefficients, which is based on designing a new framework of adaptive k-NN weighted MMSE estimation [12]. In particular, we empirically optimize the dictionary size and incorporate a spatial relationship constraint for better stability and compatibility of the estimated data. There are two major steps in our proposed learning-based algorithm, namely the dictionary training step and the estimation step. In the dictionary training step, we build a dictionary that comprises HR and LR training pairs from the HR training images and their LR images (down-sampled in the DCT domain). In the estimation step, relevant image pairs are searched within the dictionary using the adaptive k-NN criterion [47]. The k-nearest image pairs are grouped to perform the weighted MMSE estimation [51], obtaining the filter weights used to estimate the HR block from the observed LR block.

Fig. 6. Training blocks: HR block (blue), larger LR block (orange), and LR block (green).

1) Offline Dictionary Training Process: Let us denote Y′ ∈ ℝ^{2n×2m} as the high-resolution training image and X′ ∈ ℝ^{n×m} as the LR training image down-sampled using the image model in (1), where n and m are the dimensions of the image. Let us also partition the HR training image into nonoverlapping blocks [2q × 2q], and locate the corresponding nonoverlapping blocks [q × q] in the down-sampled LR image, as shown in Fig. 6. Note that the same block size, [q × q], was used in the down-sampling process as well. The HR training block is the original block covering high- and low-frequency DCT coefficients, whereas the LR training block is the down-sampled block covering the low-frequency DCT coefficients only. All the blocks in the proposed learning-based algorithm are processed in the spatial domain. We can then vectorize the HR and LR training blocks, and denote the HR training blocks as y′_i ∈ ℝ^{4q²} and the LR training blocks as x′_i ∈ ℝ^{q²}.

Our trained dictionaries are represented by matrices, which are grouped from the vectors of the HR and LR training blocks, as

B_x′ = [x′_1  x′_2  ⋯  x′_Q]    (9)

B_y′ = [y′_1  y′_2  ⋯  y′_Q]    (10)

where B_x′ ∈ ℝ^{q²×Q} is the LR training dictionary, B_y′ ∈ ℝ^{4q²×Q} is the HR training dictionary, and Q is the total number of blocks (vectors).

During the online searching process, if we directly use the observed LR block to search for the relevant LR training blocks in the precomputed dictionary in a block-by-block manner, the searching process (to be explained in the following section) becomes block independent, i.e., the neighborhood of the LR blocks is ignored. In the literature, a larger block size is often used to cover the neighborhood when searching for the relevant LR blocks [15]. To do so, we construct a larger LR block, denoted as u′_i ∈ ℝ^{(q+2)²}, by including one more line of pixels around the LR training block x′_i. The dictionary that includes the larger LR blocks is given by

B_u′ = [u′_1  u′_2  ⋯  u′_Q]    (11)

where B_u′ ∈ ℝ^{(q+2)²×Q} is the dictionary for searching the k-NNs in the estimation step. Since smooth areas may not benefit from the weighted MMSE estimation, we compute the average absolute gradient of each larger LR block to exclude the training blocks that belong to smooth areas from the dictionaries in (9)–(11), to save computation. Algorithm 1 summarizes the dictionary training process described in this section.

Algorithm 1 Dictionary Training Process of the Proposed k-NN MMSE Algorithm

Input: HR training image Y′
Output: Dictionaries B_y′, B_x′, B_u′ (as defined below)

1) Initialization
   (a) Down-sample the HR training image Y′ using the image model in (1) into the LR training image, denoted as X′.
   (b) Partition the HR training image into nonoverlapping blocks [2q × 2q] ({y′_i}).
   (c) Partition the LR training image into the corresponding nonoverlapping blocks [q × q] ({x′_i}).
   (d) Partition the LR training image into the corresponding overlapping blocks [(q+2) × (q+2)] ({u′_i}).
2) Build the dictionaries
   (a) Vectorize the image blocks in 1(b)–(d), denoted as y′_i, x′_i, u′_i, where i represents the block index.
   (b) Group the vectors into matrices as in (9)–(11), denoted as B_y′, B_x′, B_u′.
   (c) Iterate i over all vectors u′_i in dictionary B_u′:
       (i) Compute the average absolute gradient of the vector u′_i.
       (ii) If the gradient is less than 0.75, delete the corresponding vectors y′_i, x′_i, u′_i from the dictionaries B_y′, B_x′, B_u′.
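As a concrete illustration, the dictionary training in Algorithm 1 can be sketched as below. This is a minimal sketch under our own assumptions, not the authors' implementation: we take the image model in (1) to be per-block DCT truncation with a 1/2 amplitude scaling, fix q = 4, and interpret the "average absolute gradient" as the mean absolute value of `np.gradient`; all function names are ours.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II matrix: rows are basis vectors, F @ F.T == I.
    k = np.arange(n)[:, None]          # frequency index
    i = np.arange(n)[None, :]          # sample index
    F = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    F[0, :] /= np.sqrt(2.0)
    return F

def train_dictionaries(hr, q=4, grad_thresh=0.75):
    """Algorithm 1 sketch: build B_y', B_x', B_u' from one HR training image."""
    F2, F1 = dct_matrix(2 * q), dct_matrix(q)
    H, W = hr.shape
    lr = np.zeros((H // 2, W // 2))
    # Step 1(a): down-sample by truncating high-frequency DCT coefficients.
    for r in range(0, H, 2 * q):
        for c in range(0, W, 2 * q):
            X = F2 @ hr[r:r + 2 * q, c:c + 2 * q] @ F2.T
            lr[r // 2:r // 2 + q, c // 2:c // 2 + q] = F1.T @ (0.5 * X[:q, :q]) @ F1
    lr_pad = np.pad(lr, 1, mode='edge')    # one extra line of pixels for the u' blocks
    ys, xs, us = [], [], []
    # Steps 1(b)-(d) and 2: partition, vectorize, and prune smooth blocks.
    for r in range(0, H, 2 * q):
        for c in range(0, W, 2 * q):
            u = lr_pad[r // 2:r // 2 + q + 2, c // 2:c // 2 + q + 2]
            gy, gx = np.gradient(u)
            if (np.abs(gy).mean() + np.abs(gx).mean()) / 2 < grad_thresh:
                continue                   # step 2(c)(ii): drop smooth training pairs
            ys.append(hr[r:r + 2 * q, c:c + 2 * q].ravel())
            xs.append(lr[r // 2:r // 2 + q, c // 2:c // 2 + q].ravel())
            us.append(u.ravel())
    # Eqs. (9)-(11): one training vector per column.
    return np.array(ys).T, np.array(xs).T, np.array(us).T
```

Each dictionary stores one training vector per column, matching the column layout of (9)–(11), so the k-NN search and the MMSE estimation later reduce to matrix operations on column subsets.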

2) Adaptive k-NN Weighted MMSE Estimation Process: Let us assume that the required dictionaries have been precomputed. During the online estimation, the k-NNs (vectors of LR and HR training blocks) are searched within the dictionaries to perform the weighted linear MMSE estimation, which estimates the HR block of full frequency spectrum from the observed input LR block of low-frequency spectrum. Given an observed LR image X ∈ ℝ^{n×m}, which is a down-sampled version of the unobserved HR image Y ∈ ℝ^{2n×2m} using the image model in (1), we partition the observed LR image into nonoverlapping LR blocks [q × q] and overlapping larger LR blocks [(q+2) × (q+2)], as in the dictionary training process. Let us also vectorize and denote the LR block as x_i ∈ ℝ^{q²} and the larger LR block as u_i ∈ ℝ^{(q+2)²}.

Fig. 7. Block diagram of the estimation process of the proposed k-NN MMSE algorithm (before overlapping).

As an example, Fig. 7 shows the searching and estimation processes for the observed LR block, x_i. The searching process is done by computing the normalized correlation coefficient (NCC) [49], [50] between the observed larger LR block u_i and all the larger LR training blocks u′_i in the dictionary B_u′. All the training blocks in dictionary B_u′ are then sorted in descending order in terms of the computed normalized correlation coefficients, such that

Corr(u_i, u′_j) ≥ Corr(u_i, u′_{j+1})  for ∀j    (12)

where Corr(·,·) computes the NCC. Hence, the larger LR training blocks u′_i are sorted into u′_j, where the values of j ∈ [1, Q] represent the indexes of the sorted training blocks from the highest NCC to the lowest NCC. After sorting, the training vector u′_1 is the most similar vector to the observed vector u_i. The sorted training blocks with the k largest NCCs are chosen using the following adaptive criterion:

k = arg min_k |k|  subject to  Σ_{j=1..k} Corr(u_i, u′_j) ≥ T1    (13)

where T1 is a threshold to be determined empirically. Our experiments found that using the NCC to search for the k-NNs gives slightly better results than using the Euclidean distance. This is because the MMSE estimation estimates the linear coefficients using second-order statistics, whereas a first-order criterion (such as the Euclidean distance) for searching relevant blocks is less accurate. A second-order criterion (such as the NCC) instead finds relevant blocks with intensity variations similar to the input data, which matches the need of the MMSE estimation. Note that the value of Corr(·,·) is bounded to [0, 1]. Hence, for example, if we set threshold T1 to 1000, the minimal value of k is 1000, and an upper bound, T3, of k should be set accordingly.
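The search step (12)–(13) can be sketched as follows. This is our own minimal sketch: the zero-mean NCC definition, the tie-free `argsort`, and the example values of T1 and T3 are assumptions, not values from the paper.

```python
import numpy as np

def adaptive_knn(u, Bu, T1=2.0, T3=100):
    """Eqs. (12)-(13): rank the columns of Bu by NCC with u, then take the
    smallest k whose cumulative NCC reaches T1 (capped by an upper bound T3)."""
    uc = u - u.mean()
    Bc = Bu - Bu.mean(axis=0, keepdims=True)
    ncc = (Bc.T @ uc) / (np.linalg.norm(Bc, axis=0) * np.linalg.norm(uc) + 1e-12)
    order = np.argsort(-ncc)                       # descending NCC, eq. (12)
    csum = np.cumsum(np.clip(ncc[order], 0.0, 1.0))
    k = int(np.searchsorted(csum, T1)) + 1         # smallest k with sum >= T1, eq. (13)
    k = min(k, T3, Bu.shape[1])
    return order[:k], ncc[order[:k]]
```

Because each Corr value is at most 1, the cumulative-sum threshold T1 simultaneously acts as a lower bound on k, which is why the text pairs it with the upper bound T3.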

Fig. 7 shows the flow of the estimation process of the proposed algorithm. The training vectors x′_i and y′_i are sorted into x′_j and y′_j by copying the sorted indexes of the training vectors u′_j. Hence, after sorting, we assume that the training vector x′_1 is the most similar vector to the observed vector x_i, and the training vector y′_1 is the most similar vector to the unobserved vector y_i (which is the desirable output). We can then group the k-NNs from the sorted LR and HR training blocks into matrices, as

T_x′ = [x′_1  x′_2  ⋯  x′_k]    (14)

T_y′ = [y′_1  y′_2  ⋯  y′_k]    (15)

where x′_j and y′_j have been sorted in descending order in terms of the computed NCC between the larger LR block, u_i, and the larger LR training block, u′_j, i.e., Corr(u_i, u′_j), and k was found using (13). Note that the dimensions of T_x′ and T_y′ are [q² × k] and [4q² × k], respectively. Then, the weighted MMSE estimation [51] is performed to obtain the adaptive linear filter coefficients, as

H_i = arg min_{H_i} (H_i T_x′ − T_y′) W (H_i T_x′ − T_y′)^T    (16)

and its closed-form solution is given by

∂[(H_i T_x′ − T_y′) W (H_i T_x′ − T_y′)^T] / ∂H_i = 0
2 (H_i T_x′ − T_y′) W T_x′^T = 0
H_i T_x′ W T_x′^T = T_y′ W T_x′^T
H_i = T_y′ W T_x′^T (T_x′ W T_x′^T)^{−1}    (17)

where H_i ∈ ℝ^{4q²×q²} contains the linear filter coefficients, and W ∈ ℝ^{k×k} is the weighting matrix that weights the contribution of each pair of LR and HR nearest training neighbors. Let us define the weighting function using the exponential form, which has been found useful in many situations [12], [16], [17], as

W_j = exp(Corr(u_i, u′_j) · f)    (18)

where W_j represents the j-th diagonal element of the weighting matrix W and contains the weight of each nearest neighboring training pair, x′_j and y′_j, and f is a parameter to be determined empirically. The weighting function in (18) gives a higher weight to a training pair that has a larger correlation with the observed larger LR block u_i. Since smooth areas do not benefit much from the k-NN weighted MMSE estimation (17), it is unnecessary to apply the k-NN MMSE algorithm (12)–(18) to smooth areas. Hence, we precompute a set of filter coefficients using the complete dictionaries B_x′ and B_y′, as

H_uni = B_y′ B_x′^T (B_x′ B_x′^T)^{−1}    (19)

where H_uni ∈ ℝ^{4q²×q²} represents the universal filter coefficients for smooth areas. Before carrying out the searching process in (12), if the variance of the LR block x_i is smaller than a threshold T2, we skip all the steps in (12)–(18) and directly use the universal filter coefficients H_uni from (19) to perform the estimation in (20). Finally, the filter coefficients obtained from the k-NN weighted MMSE estimation in (17) or the universal filter in (19) are used together with the observed LR block x_i to estimate the HR block

y_i = H_i x_i.    (20)
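The estimation core (16)–(20) is a weighted least-squares problem with the closed form in (17). A sketch follows, assuming a small ridge term (our addition, for numerical safety when k is small) and a placeholder gain f; the paper determines f empirically.

```python
import numpy as np

def weighted_mmse_filter(Tx, Ty, ncc, f=5.0, ridge=1e-8):
    """Eqs. (16)-(18): H = Ty W Tx^T (Tx W Tx^T)^-1 with W = diag(exp(ncc * f))."""
    W = np.diag(np.exp(ncc * f))                         # eq. (18)
    A = Tx @ W @ Tx.T + ridge * np.eye(Tx.shape[0])      # ridge: our safeguard only
    return Ty @ W @ Tx.T @ np.linalg.inv(A)              # eq. (17)

def estimate_hr_block(H, x):
    # Eq. (20): apply the adaptive linear filter to the observed LR block.
    return H @ x
```

When the k neighbors are exactly linearly related (Ty = H0 Tx), the closed form recovers H0, which is the sense in which (17) is the minimizer of the weighted squared error (16).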

3) Overlapping as the Spatial Relationship Constraint: In the literature, the learning-based interpolation algorithms [12]–[15] always incorporate a spatial relationship constraint to make the found k-nearest image pairs coherent in the neighborhood of image blocks {x_i}. For example, belief propagation was adopted in the example-based SR [15] to constrain the overlapped parts of the estimated HR blocks to be coherent. Later on, a single-pass Markov random field was proposed in the k-NN MMSE algorithm to increase the weights of similar k-nearest image pairs within a neighborhood [12].

Different from conventional block sizes for SR [12]–[15], the estimated HR block of the proposed algorithm has a large size of [8 × 8] when the input LR block size is [4 × 4]. One intuitive way to accommodate the spatial relationship constraint is to overlap the large HR blocks, such that similar k-nearest image pairs are inevitably reused within a large neighborhood of overlapping blocks. The implementation details are the same as those in Section III-B. We artificially shift the observed LR blocks by integer pixels, and denote them as x_{i(m)}. We can then apply the proposed k-NN MMSE algorithm (12)–(20) to obtain the estimated HR blocks, y_{i(m)}. All overlapping 2 × 2 sub-blocks of the estimated 8 × 8 shifted HR blocks, y_{i(m)}, are averaged to improve the robustness and accuracy, as

y^MMSE_{i,s} = (1/M) Σ_{m=1..M} S_m y_{i(m)}    (21)

where m represents the index of overlaps (or shifts), M is the number of overlaps (or shifts), and S_m is the operator that extracts the overlapping sub-blocks from the shifted HR blocks y_{i(m)} that contribute to the final HR block y^MMSE_i. If the size of the output HR block is [8 × 8], the number of overlaps for each 2 × 2 sub-block in the final HR block y^MMSE_i is 16.

4) Comparison With the Adaptive Wiener Filters: Note that some adaptive Wiener filters were proposed in the literature for video coding [52]–[57]. These filters are optimal in the sense of minimizing the mean squared errors, which resembles the proposed learning-based algorithm. However, these filters were proposed for video coding applications, such that they require the original reference HR image to compute the filter coefficients or filter parameters. In this paper, our learning-based algorithm computes the filter weights using the trained dictionary, such that our algorithm can be used not only for interpolation but also for video coding. Another difference is that the proposed learning-based algorithm uses [4 × 4] and [8 × 8] block sizes for the input and output to match the block sizes of the DCT during the down-sampling process, while the input and output block sizes of the available adaptive Wiener filters are usually [6 × 6] and [4 × 4] for quarter-pixel interpolation.
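The overlapping average in (21) can be sketched with accumulation buffers. The 8 × 8 block size and the 2-pixel shift grid (giving 16 overlaps per interior 2 × 2 sub-block) follow the text; the border handling (only positions that fit inside the image) is our simplification.

```python
import numpy as np

def overlap_average(blocks, positions, out_shape):
    """Eq. (21): sum shifted 8x8 HR block estimates and divide by the per-pixel
    overlap count; interior 2x2 sub-blocks are covered by 16 shifted blocks."""
    acc = np.zeros(out_shape)
    cnt = np.zeros(out_shape)
    for blk, (r, c) in zip(blocks, positions):
        acc[r:r + 8, c:c + 8] += blk
        cnt[r:r + 8, c:c + 8] += 1
    return acc / np.maximum(cnt, 1)

def shift_grid(h, w):
    # One estimate every 2 pixels, i.e., per 2x2 output sub-block.
    return [(r, c) for r in range(0, h - 7, 2) for c in range(0, w - 7, 2)]
```

Averaging in the accumulation buffer is equivalent to applying the extraction operators S_m of (21) per sub-block, but avoids materializing them explicitly.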


B. Iterative De-Blurring and De-Blocking Process

As we have explained in previous sections, the proposed adaptive k-NN MMSE algorithm (21) can estimate the HR block, y^MMSE_i, with both low-frequency and high-frequency coefficients. To refine the low-frequency DCT coefficients, we can make use of the proposed iterative back-projection framework in Section III-B to update the low-frequency parts

y^0_i = y^MMSE_i + 4 · F^T_{DCT(2q)} D^T (F_{DCT(q)} x_i − D F_{DCT(2q)} y^MMSE_i)    (22)
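For a single block, the back-projection update (22) can be sketched as follows. We realize F_DCT(n) as an orthonormal DCT-II matrix and D as truncation of the [2q × 2q] DCT coefficients to the [q × q] low-frequency corner with a 1/2 scaling, which is our reading of the image model in (1); under that reading, one update with the factor 4 makes the low frequencies exactly consistent with the observed LR block while leaving the high frequencies untouched.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II matrix, F @ F.T == I.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    F = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    F[0, :] /= np.sqrt(2.0)
    return F

def backproject_block(y, x, q=4):
    """Eq. (22): y <- y + 4 * F^T_{DCT(2q)} D^T (F_{DCT(q)} x - D F_{DCT(2q)} y)."""
    F2, F1 = dct_matrix(2 * q), dct_matrix(q)
    resid = F1 @ x @ F1.T - 0.5 * (F2 @ y @ F2.T)[:q, :q]   # LR-domain DCT residual
    up = np.zeros((2 * q, 2 * q))
    up[:q, :q] = 0.5 * resid                                # D^T: zero-pad and rescale
    return y + 4.0 * (F2.T @ up @ F2)
```

The de-blurring step (23) below has the same operator structure with the factor 4 replaced by −α, so this sketch covers both updates up to the sign and step size.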

where y^0_i is the HR block updated by the iterative back-projection framework in (22). The refined solution in (22) retains the high-frequency DCT coefficients and refines the low-frequency DCT coefficients of the estimated HR block, y^MMSE_i, to match those in the observed LR block.

An analysis of the refined image in our experimental work shows that the refined solution in (22) is slightly blurry due to the overlapping process in the weighted MMSE algorithm in (21), and slightly blocky due to the block-based refinement in (22). In this paper, we propose a de-blurring and de-blocking process based on the idea of the unsharp mask [58], [59] to de-blur the results of the refined solution (22), as

y^{l+1}_i = y^l_i − α · F^T_{DCT(2q)} D^T (F_{DCT(q)} x_i − D F_{DCT(2q)} y^l_i)    (23)

where l is the iteration index (which starts from 0) and α is the parameter that controls the strength of de-blurring. During the iterative de-blurring step in (23), gradients of the squared errors between the observed LR image and the simulated LR image, −F^T_{DCT(2q)} D^T (F_{DCT(q)} x_i − D F_{DCT(2q)} y^l_i), are added to the estimate of the HR image, such that high-frequency components are amplified naturally using the DCT down-sampling image model in (1). The original idea of the unsharp mask is to estimate the gradients using the difference between the original image and a Gaussian-blurred image. In the proposed iterative de-blurring step in (23), we instead make use of the DCT down-sampling image model in (1) to approximate the gradients, which also takes care of the resolution difference between the estimated HR image and the observed LR image.

To address the blocking artifacts, we adopt the overlapping approach described in Sections III-B and IV-A. Recall that the shifted LR blocks are denoted as x_{i(m)}, where m represents the index of the shifts. Let us locate the corresponding shifted HR blocks in the current iteration, denoted as y^l_{i(m)}. We can then average the overlapping sub-blocks of the shifted HR blocks to smooth out the transitions of high-frequency components between the image blocks, such that the blocking effect is further alleviated. Hence, we replace (23) by (24) as

y^{l+1}_{i,s} = (1/M) Σ_{m=1..M} S_m [ y^l_{i(m)} − α · F^T_{DCT(2q)} D^T (F_{DCT(q)} x_{i(m)} − D F_{DCT(2q)} y^l_{i(m)}) ].    (24)

This approach performs de-blurring and de-blocking simultaneously and always produces a better result than the de-blurring solution in (23), according to our experiments. All sub-blocks y^{l+1}_{i,s} of the final HR block y^{l+1}_i are obtained using (24). The overlapping solution in (24) is similar to those in (7) and (21) as described in the previous sections, except that we additionally have the shifted HR blocks y^l_{i(m)} of the current iteration as the input. Note that the simultaneous de-blurring and de-blocking step in (24) not only amplifies the high-frequency components, but also modifies the low-frequency DCT coefficients (due to overlapping). Hence, only one iteration of the IBP is required to refine the low-frequency DCT coefficients [making the solution fit the image model in (1)], as

y^{l+2}_i = y^{l+1}_i + 4 · F^T_{DCT(2q)} D^T (F_{DCT(q)} x_i − D F_{DCT(2q)} y^{l+1}_i).    (25)

Algorithm 2 Algorithm Flow of the Proposed Up-Sampling Scheme (Fig. 4)

Input: LR image X, dictionaries B_x′, B_y′, B_u′
Output: Estimated HR image

1) Initialization using learning-based adaptive k-NN MMSE estimation
   (a) Partition the LR image into the nonoverlapping (shifted) LR blocks [q × q].
   (b) Partition the LR image into the corresponding overlapping larger LR blocks [(q+2) × (q+2)].
   (c) Vectorize the image blocks in 1(a)–(b), denoted as x_{i(m)}, u_{i(m)}.
   (d) Iterate i(m) over all (shifted) LR blocks in the LR image:
       (i) If the variance of the LR block x_{i(m)} is less than T2 = 25,
           (1) Use the universal filter coefficients in (19) to perform the MMSE estimation in (20).
       (ii) Otherwise,
           (1) Use the larger LR block u_{i(m)} to search for the k-NN training blocks in the dictionary B_u′ in (12) and (13).
           (2) Use the k indexes found in (13) to group the k-NN training pairs into T_x′ and T_y′ in (14) and (15).
           (3) Compute the k weights of the training pairs in (18) to perform the weighted MMSE estimation in (16)–(20).
   (e) Average the overlapping sub-blocks of all estimated HR blocks in (21) accordingly to obtain the initial HR image.
2) Iterative refinement process
   (a) Refine the initial HR image from 1(e) by the IBP in (22).
   (b) Iterate on l until the maximum number of iterations, L, is reached:
       (i) Perform simultaneous de-blurring and de-blocking in (24).
       (ii) Perform the IBP refinement in (25).

Fig. 8. Twenty-four training images (768 × 512) used in our experiments.

Fig. 9. Twelve natural images (512 × 512) used in our experiments.

Fig. 10. Eight CIF (352 × 288) and two 720p (1280 × 720) video sequences used in our experiments.

Since the de-blurring and de-blocking step in (24) and the iterative back-projection refinement step in (25) are dependent processes, we can alternately apply these steps (24), (25) until the maximum number of iterations, L, is reached (Fig. 4). In the following section, we will show detailed experimental results for this iterative refinement process, as given in (24) and (25).

Let us summarize the proposed iterative up-sampling scheme in Algorithm 2 with details.

V. EXPERIMENTAL RESULTS

Much experimental work has been done on an Intel i7 950 system. Twenty-four natural images [768 × 512] from Kodak were used as HR training images, as shown in Fig. 8. Twelve natural images (not including the training images) and 10 natural video sequences were used to analyze and verify the performance of the proposed up-sampling scheme, as shown in Figs. 9 and 10. All the testing images and video sequences were down-sampled using the image model in (1) (except in the comparisons of different down- and up-sampling methods in Section V-C), such that the PSNR and M-SSIM of the up-sampled images and video frames can be measured. The DCT block size of the HR block was chosen to be [8 × 8], which is the most commonly used block size in the literature.
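For reference, the PSNR figure of merit reported throughout this section is the standard definition for 8-bit images (a textbook formula, not specific to this paper):

```python
import numpy as np

def psnr(ref, est, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and an estimate."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(est, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

A 1 dB PSNR gap, as reported in the comparisons below, corresponds to roughly a 20% reduction in mean squared error.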

A. Analysis of the Proposed Learning-Based k-NN MMSE Algorithm in Section IV-A

Let us analyze the settings of the proposed learning-based k-NN MMSE algorithm in this section. During the dictionary training, we made use of image gradients [step 2(c)(ii) in Algorithm 1] to control the dictionary size. As shown in Table I and Fig. 11, the PSNR and M-SSIM values saturate when the dictionary size exceeds 200 000 training pairs.

TABLE I. PSNR (dB) AND M-SSIM [48] OF THE RECONSTRUCTED Lena IMAGE USING DIFFERENT DICTIONARY SIZES IN OUR PROPOSED k-NN MMSE ALGORITHM

Fig. 11. Plots of PSNR (dB) and M-SSIM [48] against the dictionary sizes (1 cluster).

Fig. 12. Portions (128 × 128) (top row) and (256 × 256) (bottom row) of the reconstructed Lena image using (a) 10563, (b) 50629, (c) 118061, (d) 177663, and (e) 228669 training pairs in our learning-based k-NN MMSE algorithm (Section IV-A).

These results confirm that further increasing the dictionary size by including more image blocks that belong to smooth areas gives very marginal improvements. To reduce the searching time, the k-means clustering approach was applied to divide the dictionary into several sub-parts. The computation times for the online estimation of the k-NN MMSE algorithm and the overall up-sampling scheme (implemented using nonoptimized MATLAB code) are provided in Table I. Fig. 12 shows portions of the reconstructed image using different dictionary sizes. When the dictionary contains merely 10 563 training pairs, which are mostly edges (due to large gradients), the quality of the edge reconstruction is still satisfactory. According to the experimental results in Table I, we chose to set the threshold of the image gradient to 0.75 and the corresponding dictionary size to 228 869 training pairs, without partitioning the dictionary into clusters, in the following experiments.

Let us analyze the overlapping step in (21). Overlapping the estimated HR blocks implicitly implements the spatial relationship constraint by reusing similar k-nearest image pairs in the overlapped neighborhood. For a fair comparison, we did not overlap the shifted HR blocks but took the centermost 2 × 2 pixels as the nonoverlapping approach in (20).

TABLE II. PSNR (dB) AND M-SSIM [48] OF SEVERAL NATURAL IMAGES USING (A) NONOVERLAPPING (20) AND (B) OVERLAPPING (21) STEPS

Fig. 13. Portions (128 × 128) of the reconstructed natural images using (a) nonoverlapping (20) and (b) overlapping (21) steps in our k-NN MMSE algorithm.

Table II shows that the overlapping step in (21) significantly improves the accuracy in terms of PSNR (around 0.88 dB improvement) and M-SSIM. Fig. 13 shows the visual comparison. There is some obvious noise around the edges of the reconstructed images in Fig. 13(a), which is mainly due to the block-independent estimation of the nonoverlapping approach.

1) Summary of Settings of the Proposed Learning-Based k-NN MMSE Algorithm: Let us summarize the settings of the proposed learning-based k-NN MMSE algorithm in Table III when the block size of the original HR block is [8 × 8]. Specifically, the block sizes of our training data should match the block sizes used for down-sampling in the DCT domain, for consistency. A set of 228 869 training pairs from 24 Kodak images is sufficient to provide state-of-the-art performance. In our algorithm, we have used the NCC as the criterion to search for the k-NNs and a weighting function based on the computed NCC to weight the k-NNs. Finally, we incorporated the spatial relationship constraint by overlapping the estimated HR blocks in a relatively large neighborhood. For further comparison, we have also included in Table III the settings of the k-NN MMSE algorithm [12], which was proposed to up-sample an image down-sampled by bicubic interpolation.

2) Summary of Settings of the Proposed Overall Up-Sampling Scheme: Table IV shows the values of the parameters used for the proposed up-sampling scheme. These parameters were fixed throughout all experiments.

TABLE III. SETTINGS OF THE PROPOSED ADAPTIVE k-NN MMSE ALGORITHM FOR IMAGES DOWN-SAMPLED IN THE DCT DOMAIN

TABLE IV. PARAMETER SETTINGS OF THE PROPOSED OVERALL UP-SAMPLING SCHEME

As shown in Algorithm 2 and Fig. 4, the proposed scheme has three major steps. Table V and Fig. 14 show the PSNR (dB) and M-SSIM [48] improvements of the reconstructed images during every major step of the proposed up-sampling scheme for two times up-sampling. In the first step [Fig. 14(a)], the HR image is initialized using the proposed k-NN MMSE algorithm, which contributes the initialization of the high-frequency DCT components using external information. The second step [Fig. 14(b)] refines the low-frequency DCT coefficients of the initial HR image using the IBP refinement in (22). This contributes to the consistency of the low-frequency DCT components between the observed LR image and the estimated HR image. In the last step [Fig. 14(c)–(h)], the iterative refinement process (de-blurring and de-blocking) in (24) and (25) progressively refines the high-frequency DCT coefficients, which contributes to de-blocking the image (blocky due to the block-based IBP refinement in the second step) and de-blurring the image (blurred due to the overlapping step in the de-blocking process). Our empirical analysis suggests that the proposed iterative refinement process converges at around the fifth to sixth iteration, which gives the best PSNR and M-SSIM values on average. Hence, the maximum number of iterations is set to 6 in Table IV.


TABLE V. PSNR (dB) AND M-SSIM OF THE PROPOSED UP-SAMPLING SCHEME FOR TWO TIMES UP-SAMPLING. (A) INITIALIZATION. (B) IBP REFINEMENT (22). (C)–(F) ITERATIVE REFINEMENT PROCESS (24), (25): (C) FIRST ITERATION. (D) THIRD ITERATION. (E) FIFTH ITERATION. (F) SIXTH ITERATION

Fig. 14. Average PSNR (dB) and M-SSIM of the proposed up-sampling scheme for two times up-sampling. (a) Initialization. (b) IBP refinement (22). (c)–(h) Iterative refinement process (24), (25).

Fig. 15. Portions (96 × 96) of the reconstructed Foreman sequence during intermediate steps of the proposed scheme. (a) Initialization. (b) IBP refinement. (c) 1st iteration. (d) 2nd iteration. (e) 3rd iteration. (f) 4th iteration. (g) 5th iteration. (h) 6th iteration.

Furthermore, subjective evaluations are shown in Fig. 15. Fig. 15(a) shows the initialization using the k-NN MMSE algorithm. This is followed by the IBP refinement in Fig. 15(b). The sharpness of the reconstructed image improves gradually from Fig. 15(a)–(h) during the iterative de-blurring and de-blocking process. Stronger de-blurring can easily be achieved by increasing the strength parameter, α; however, the fidelity of the reconstructed image then deteriorates in exchange for a visually more pleasing result.

TABLE VI. PROCESSING TIME (SECONDS) FOR TWO TIMES UP-SAMPLING. (A) ZERO PADDING [24]. (B) OVERLAPPING [6]. (C) BICUBIC INTERPOLATION. (D) WIENER FILTER [46]. (E) HYBRID SCHEME [35]. (F) PROPOSED SCHEME

TABLE VII. PSNR (dB) AND M-SSIM [48] FOR TWO TIMES UP-SAMPLING. (A) ZERO PADDING [24]. (B) OVERLAPPING [6]. (C) BICUBIC INTERPOLATION. (D) WIENER FILTER [46]. (E) HYBRID SCHEME [35]. (F) PROPOSED SCHEME

B. Comparison With State-of-the-Art Up-Sampling Algorithms for DCT Down-Sampling

In the following sections, we will give a comprehensive comparison among: 1) three state-of-the-art and representative up-sampling algorithms for images down-sampled in the DCT domain, as described in Section III-B; 2) an arbitrary-factor interpolation algorithm (bicubic interpolation) that addresses the pixel shift due to DCT down-sampling; and 3) a dyadic-factor interpolation algorithm (fixed-coefficients Wiener filter [46]). Moreover, we also include a comparison with state-of-the-art interpolation (robust soft-decision interpolation) [17] and SR (adaptive sparse domain selection) [14] algorithms, to provide better insight into the performance differences of various down- and up-sampling methods.

Table VI shows the processing time of the proposed scheme and the comparing methods. Tables VII and VIII summarize the PSNR and M-SSIM of reconstructed images and video sequences using our scheme and several state-of-the-art up-sampling algorithms [6], [24], [35]. For the video sequences, the YUV components were down-sampled independently using the image model in (1). Only the PSNR and M-SSIM of the Y components are compared, since the human visual system is more sensitive to the Y component. The UV components were up-sampled using bicubic interpolation (with pixel shifts). Fig. 16 shows the PSNR-Y and M-SSIM-Y plots of several video sequences. As shown in Tables VII and VIII, the proposed scheme has more than 1 dB PSNR advantage over the comparing algorithms. Note that the fixed-coefficient Wiener filter [46] is around 3 dB worse in PSNR than the zero padding approach [24] for two times up-sampling (due to the half-pixel shift), as shown in [35] and Tables VII–IX.

TABLE VIII. PSNR (dB) AND M-SSIM [48] FOR TWO TIMES UP-SAMPLING. (A) ZERO PADDING [24]. (B) OVERLAPPING [6]. (C) BICUBIC INTERPOLATION. (D) WIENER FILTER [46]. (E) HYBRID SCHEME [35]. (F) PROPOSED SCHEME

Fig. 16. PSNR-Y (dB) and M-SSIM-Y of reconstructed video sequences (Akiyo, Foreman, and BigShips) using (a) zero padding [24], (b) overlapping [6], (c) bicubic interpolation, (d) Wiener filter [46], (e) hybrid scheme [35], and (f) proposed up-sampling scheme.

For a more comprehensive comparison, let us also include results for four times up-sampling. Several natural images were down-sampled twice using the image model in (1), i.e., a 1/4 down-sampling factor, such that the proposed scheme and the state-of-the-art algorithms were applied twice to reconstruct the original images. Note that the down-sampled images have around a one-and-a-half-pixel shift with respect to the original HR images, which can be addressed using the zero padding approach [24], the overlapping zero padding approach [6], bicubic interpolation, or the proposed scheme. Table IX shows that the proposed scheme maintains its advantages of high PSNR (1 dB improvement) and M-SSIM values when the up-sampling factor is four.

TABLE IX. PSNR (dB) AND M-SSIM [48] FOR FOUR TIMES UP-SAMPLING. (A) ZERO PADDING [24]. (B) OVERLAPPING [6]. (C) BICUBIC INTERPOLATION. (D) WIENER FILTER [46]. (E) HYBRID SCHEME [35]. (F) PROPOSED SCHEME

Fig. 17. Portions (256 × 256) of the reconstructed Bicycle (top) and Boat (bottom) images for two times up-sampling using different approaches. (a) Zero padding [24]. (b) Overlapping [6]. (c) Bicubic interpolation. (d) Hybrid scheme [35]. (e) Proposed scheme. (f) Original HR image.

Figs. 17–19 show visual comparisons of the proposed scheme and the state-of-the-art algorithms for two times and four times up-sampling.

Fig. 18. First frames (352 × 288) of the reconstructed Akiyo (top) and Football (bottom) video sequences for two times up-sampling using different approaches. (a) Zero padding [24]. (b) Overlapping [6]. (c) Bicubic interpolation. (d) Hybrid scheme [35]. (e) Proposed scheme. (f) Original HR image.

TABLE X. DOWN-SAMPLING METHODS USED IN DIFFERENT UP-SAMPLING ALGORITHMS

Fig. 17 shows that the proposed scheme produces much sharper pictures with much less ringing and blocking around the edges compared with the other algorithms. Specifically, the reconstructed Bicycle image using the proposed scheme looks nearly identical to the original HR image. For the reconstructed Boat image in Fig. 17, the edges reconstructed by the proposed scheme are much smoother and sharper than those of the comparing algorithms. Fig. 18 shows the visual comparisons of the reconstructed video sequences. The reconstructed frames of the Akiyo and Football sequences give observations similar to those in Fig. 17. The proposed scheme produces video frames that look similar to the original HR frames, while the other algorithms produce ringing artifacts around the edges and texture areas. Fig. 19 shows some reconstructed images using four times up-sampling. The proposed scheme gives more pleasant pictures in terms of edge and texture reconstruction. The subjective results in Figs. 17–19 generally agree with the PSNR and M-SSIM results in Tables VII–IX.

Fig. 19. Portions (256 × 256) of the reconstructed Bicycle (top) and Pepper (bottom) images for four times up-sampling using different approaches. (a) Zero padding [24]. (b) Overlapping [6]. (c) Bicubic interpolation. (d) Hybrid scheme [35]. (e) Proposed scheme. (f) Original HR image.

C. Comparison to the State-of-the-Art Interpolation and SR Algorithms

To provide insight into different state-of-the-art down-sampling and up-sampling methods, let us also include some results of the state-of-the-art interpolation [17] and SR [14] algorithms. Table X summarizes the down-sampling methods used in the proposed scheme and the competing algorithms [14]–[17]. Several natural images were down-sampled using different down-sampling methods (as described in Table X) and then up-sampled using the corresponding algorithms. Both the down- and up-sampling factors are 2. For the SR algorithm [14], we tested three settings of the standard deviation of the Gaussian blur to find the best setting for this SR algorithm. Table XI shows the PSNR and M-SSIM values of several reconstructed images using the proposed scheme and the other algorithms [14], [17]. The SR algorithm [14] using a Gaussian blur of standard deviation 1 gives the highest average PSNR value; however, the proposed scheme gives the highest average SSIM value.

Let us recall that much research effort has been put into the interpolation and SR of images down-sampled in the spatial domain. We believe that there is still great potential for up-sampling algorithms for images down-sampled in the DCT domain or other compressed domains. Figs. 20 and 21 show the visual comparison of the proposed scheme and the other algorithms. The SR algorithm [14] using a Gaussian blur of standard deviation 1 shows some artifacts at the turning points of edges in the reconstructed Bicycle image. Such artifacts may be the result of insufficient training data in this learning-based algorithm. In contrast, the proposed scheme does not have such artifacts. The interpolation algorithm [17] produces some severe aliasing artifacts, which is the fundamental problem of direct down-sampling without an anti-aliasing filter. On the other hand, all algorithms perform very well in reconstructing the Lena image in Fig. 21.

TABLE XI
PSNR (dB) AND M-SSIM [48] OF SEVERAL NATURAL IMAGES USING (A) THE PROPOSED SCHEME, (B) INTERPOLATION ALGORITHM [17], AND (C)–(E) SR ALGORITHM [14] USING GAUSSIAN BLUR OF S.D. 0.75, 1, AND 1.6

Fig. 20. Portions (256 × 256) of the reconstructed Bicycle image using different down-sampling and up-sampling methods. (a) Proposed up-sampling scheme (left). (b) Interpolation algorithm [17] (right). (c) SR algorithm [14] (left). (d) Original high-resolution image (right).

Fig. 21. Portions (256 × 256) of the reconstructed Lena image using different down-sampling and up-sampling methods. (a) Proposed up-sampling scheme (left). (b) Interpolation algorithm [17] (right). (c) SR algorithm [14] (left). (d) Original high-resolution image (right).

VI. CONCLUSION

In this paper, we propose a novel DCT-based up-sampling scheme for images down-sampled in the DCT domain, which allows accurate estimation of the high-frequency DCT coefficients. The major idea is to make use of a sophisticated learning-based technique to estimate the truncated high-frequency DCT coefficients using extra information from precomputed training sets, which gives some very successful results. This is followed by an effective low-frequency DCT coefficient refinement step, and then the iterative de-blurring and de-blocking refinement process.

Experimental results justify the performance of the proposed learning-based up-sampling scheme, which significantly outperforms the state-of-the-art up-sampling algorithms in terms of PSNR (more than 1 dB improvement), M-SSIM, and subjective quality. The proposed scheme has a flexible structure. Since the proposed learning-based technique is adaptive to the input and output training data, it can theoretically be modified for nondyadic up-sampling factors and other block sizes with some minor optimization of parameters. Specifically, using a different image model from (1) for different down-sampling factors and block sizes, readers can design their own customized up-sampling scheme step by step, from formulating the iterative back-projection framework to the dictionary training and estimation process of the adaptive k-NN MMSE algorithm. The iterative refinement process for de-blurring and de-blocking can be designed according to the formulated IBP framework and the image model as well. Hence, the whole up-sampling scheme can be modified for different applications. For the convenience of readers regarding the implementation details of our scheme, the MATLAB source codes and more results of the proposed up-sampling scheme are available on our website [61].
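To illustrate the estimation step named above, the following is a minimal sketch of an adaptive k-NN weighted least-squares (MMSE-style) estimator: for a query patch it selects the k nearest training pairs, weights them by similarity, and solves the weighted normal equations for a linear filter. The function names and the Gaussian weighting are illustrative assumptions, not the paper's exact formulation:

```python
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting for small linear systems."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def knn_mmse_filter(train_in, train_out, query, k=4, h=1.0):
    """Fit a linear filter from the k training patches nearest to `query`,
    weighting each pair by a Gaussian of its distance (bandwidth h)."""
    dists = [math.dist(query, x) for x in train_in]
    order = sorted(range(len(train_in)), key=lambda i: dists[i])[:k]
    w = [math.exp(-dists[i] ** 2 / (2.0 * h * h)) for i in order]
    n = len(query)
    # Weighted normal equations: (X^T D X) a = X^T D y
    A = [[sum(w[m] * train_in[order[m]][p] * train_in[order[m]][q]
              for m in range(len(order))) for q in range(n)] for p in range(n)]
    b = [sum(w[m] * train_in[order[m]][p] * train_out[order[m]]
             for m in range(len(order))) for p in range(n)]
    return solve(A, b)
```

When the training targets are an exact linear function of the patches, the recovered filter matches that function regardless of the weights, which is a convenient correctness check.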

In the future, we can also investigate the feasibility of applying the proposed up-sampling scheme to scalable video coding to improve the coding efficiency. However, the computational complexity of the proposed scheme should be addressed. In this paper, we have successfully made use of k-means clustering to cluster the training pairs for acceleration. In the future, we can perhaps investigate the feasibility of precalculating (storing) a set of filter coefficients for each cluster. During the online estimation, the process of searching for the k-NNs can then be simplified to finding the nearest clusters, such that the process of precomputing the filter coefficients bears the major computational load instead of performing the weighted MMSE estimation to calculate the filter coefficients online. This can possibly save a huge amount of computation in real-time applications. However, our initial experiments show a substantial performance drop when precomputing sets of filter coefficients using k-means clustering of the training data. Hence, much future research work should be done to accelerate the proposed algorithm with minimal quality cost. The accelerated algorithm can then be incorporated into a scalable video coding platform for improving the coding efficiency, as one of its applications.
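The acceleration idea discussed above, replacing the online k-NN search and MMSE solve with a lookup of filters precomputed per k-means cluster, can be sketched as follows. The helper names are hypothetical, and as noted, the paper reports that a naive version of this trades away quality:

```python
def nearest_cluster(patch, centroids):
    """Index of the k-means centroid closest (squared Euclidean) to the patch."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(patch, centroids[i])))

def fast_estimate(patch, centroids, cluster_filters):
    """Online estimation: look up the filter precomputed offline for the
    patch's cluster and apply it as an inner product (no per-patch solve)."""
    coeffs = cluster_filters[nearest_cluster(patch, centroids)]
    return sum(c * p for c, p in zip(coeffs, patch))
```

Here the expensive weighted MMSE solve runs once per cluster offline, so the online cost per patch drops to one nearest-centroid search plus one inner product.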

REFERENCES

[1] L. Ma, S. Li, and K. N. Ngan, “Perceptual image compression via adaptive block-based super-resolution directed down-sampling,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2011, pp. 97–100.
[2] M. Shen, P. Xue, and C. Wang, “Down-sampling based video coding using super-resolution technique,” IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 6, pp. 755–765, Jun. 2011.
[3] E. M. Hung, R. L. de Queiroz, F. Brandi, K. F. Oliveira, and D. Mukherjee, “Video super-resolution using codebooks derived from key-frames,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 9, pp. 1321–1331, Sep. 2012.
[4] X. Wu, X. Zhang, and X. Wang, “Low bit-rate image compression via adaptive down-sampling and constrained least squares upconversion,” IEEE Trans. Image Process., vol. 18, no. 3, pp. 552–561, Mar. 2009.
[5] Z. Shi, X. Sun, and F. Wu, “Spatially scalable video coding for HEVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1813–1826, Dec. 2012.
[6] I. Shin and H. W. Park, “Adaptive up-sampling method using DCT for spatial scalability of scalable video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 2, pp. 206–214, Feb. 2009.
[7] Q. Hu and S. Panchanathan, “Image/video spatial scalability in compressed domain,” IEEE Trans. Ind. Electron., vol. 45, no. 1, pp. 23–31, Feb. 1998.
[8] H. Shu and L.-P. Chau, “A resizing algorithm with two-stage realization for DCT-based transcoding,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 2, pp. 248–253, Feb. 2007.
[9] C. A. Segall and G. J. Sullivan, “Spatial scalability within the H.264/AVC scalable video coding extension,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1121–1135, Sep. 2007.
[10] Z. Alkachouh and M. G. Bellanger, “Fast DCT-based spatial domain interpolation of blocks in images,” IEEE Trans. Image Process., vol. 9, no. 4, pp. 729–732, Apr. 2000.
[11] D. C. Garcia, C. Dorea, and R. L. de Queiroz, “Super resolution for multiview images using depth information,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 9, pp. 1249–1256, Sep. 2012.
[12] K. S. Ni and T. Q. Nguyen, “An adaptable k-nearest neighbors algorithm for MMSE image interpolation,” IEEE Trans. Image Process., vol. 18, no. 9, pp. 1976–1987, Sep. 2009.
[13] K. S. Ni and T. Q. Nguyen, “Image superresolution using support vector regression,” IEEE Trans. Image Process., vol. 16, no. 6, pp. 1596–1610, Jun. 2007.
[14] W. Dong, D. Zhang, G. Shi, and X. Wu, “Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization,” IEEE Trans. Image Process., vol. 20, no. 7, pp. 1838–1857, Jul. 2011.
[15] W. T. Freeman, T. R. Jones, and E. C. Pasztor, “Example-based super-resolution,” IEEE Comput. Graph. Appl., vol. 22, no. 2, pp. 56–65, Mar./Apr. 2002.
[16] H. He and W.-C. Siu, “Single image super-resolution using Gaussian process regression,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2011, pp. 449–456.
[17] K.-W. Hung and W.-C. Siu, “Robust soft-decision interpolation using weighted least squares,” IEEE Trans. Image Process., vol. 21, no. 3, pp. 1061–1069, Mar. 2012.
[18] K.-W. Hung and W.-C. Siu, “Fast image interpolation using the bilateral filter,” IET Image Process., vol. 6, no. 7, pp. 877–890, Oct. 2012.
[19] H. A. Aly and E. Dubois, “Image up-sampling using total-variation regularization with a new observation model,” IEEE Trans. Image Process., vol. 14, no. 10, pp. 1647–1659, Oct. 2005.
[20] H. Demirel and G. Anbarjafari, “Image resolution enhancement by using discrete and stationary wavelet decomposition,” IEEE Trans. Image Process., vol. 20, no. 5, pp. 1458–1460, May 2011.
[21] S. G. Chang, Z. Cvetkovic, and M. Vetterli, “Locally adaptive wavelet-based image interpolation,” IEEE Trans. Image Process., vol. 15, no. 6, pp. 1471–1485, Jun. 2006.
[22] J. I. Agbinya, “Interpolation using the discrete cosine transform,” Electron. Lett., vol. 28, no. 20, pp. 1927–1928, Sep. 1992.
[23] Y. S. Park and H. W. Park, “Design and analysis of an image resizing filter in the block-DCT domain,” IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 2, pp. 274–279, Feb. 2004.
[24] R. Dugad and N. Ahuja, “A fast scheme for image size change in the compressed domain,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 4, pp. 461–474, Apr. 2001.
[25] C. Salazar and T. D. Tran, “A complexity scalable universal DCT domain image resizing algorithm,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 4, pp. 495–499, Apr. 2007.
[26] Y. S. Park and H. W. Park, “Arbitrary-ratio image resizing using fast DCT of composite length for DCT-based transcoder,” IEEE Trans. Image Process., vol. 15, no. 2, pp. 494–500, Feb. 2006.
[27] C. Wang, H.-B. Yu, and M. Zheng, “A fast scheme for arbitrarily resizing of digital image in the compressed domain,” IEEE Trans. Consum. Electron., vol. 49, no. 2, pp. 466–471, May 2003.
[28] V. Patil, R. Kumar, and J. Mukherjee, “A fast arbitrary factor video resizing algorithm,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 9, pp. 1164–1171, Sep. 2006.
[29] J. Mukherjee and S. K. Mitra, “Arbitrary resizing of images in DCT space,” IEE Proc. Vis., Image Signal Process., vol. 152, no. 2, pp. 155–164, Apr. 2005.
[30] H. W. Park, Y. S. Park, and S.-K. Oh, “L/M-fold image resizing in block-DCT domain using symmetric convolution,” IEEE Trans. Image Process., vol. 12, no. 9, pp. 1016–1034, Sep. 2003.
[31] J. Mukherjee and S. K. Mitra, “Image resizing in the compressed domain using subband DCT,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 7, pp. 620–627, Jul. 2002.
[32] S.-J. Park and J. Jeong, “Hybrid image upsampling method in the discrete cosine transform domain,” IEEE Trans. Consum. Electron., vol. 56, no. 4, pp. 2615–2622, Nov. 2010.
[33] H. Lim and H. W. Park, “A ringing-artifact reduction method for block-DCT-based image resizing,” IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 7, pp. 879–889, Jul. 2011.
[34] M.-K. Cho and B.-U. Lee, “Discrete cosine transform domain image resizing using correlation of discrete cosine transform coefficients,” J. Electron. Imag., vol. 15, no. 3, p. 033009, Jul./Sep. 2006.
[35] Z. Wu, H. Yu, and C. W. Chen, “A new hybrid DCT-Wiener-based interpolation scheme for video intra frame up-sampling,” IEEE Signal Process. Lett., vol. 17, no. 10, pp. 827–830, Oct. 2010.
[36] K.-W. Hung and W.-C. Siu, “Hybrid DCT-Wiener-based interpolation via learnt Wiener filter,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2013, pp. 1419–1423.
[37] W. Zhang and W.-K. Cham, “Hallucinating face in the DCT domain,” IEEE Trans. Image Process., vol. 20, no. 10, pp. 2769–2779, Oct. 2011.
[38] T. Q. Pham, L. J. van Vliet, and K. Schutte, “Resolution enhancement of low-quality videos using a high-resolution frame,” Proc. SPIE, vol. 6077, p. 607708, Jan. 2006.
[39] A. J. Patti and Y. Altunbasak, “Super-resolution image estimation for transform coded video with application to MPEG,” in Proc. IEEE Int. Conf. Image Process., vol. 3, Oct. 1999, pp. 179–183.
[40] S. C. Park, M. G. Kang, C. A. Segall, and A. K. Katsaggelos, “Spatially adaptive high-resolution image reconstruction of DCT-based compressed images,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 573–585, Apr. 2004.
[41] J. M. Adant, P. Delogne, E. Lasker, B. Macq, L. Stroobants, and L. Vandendorpe, “Block operations in digital signal processing with application to TV coding,” Signal Process., vol. 13, no. 4, pp. 385–397, Dec. 1987.
[42] K. N. Ngan, “Experiments on two-dimensional decimation in time and orthogonal transform domains,” Signal Process., vol. 11, no. 3, pp. 249–263, 1986.
[43] H. Shu and L.-P. Chau, “An efficient arbitrary downsizing algorithm for video transcoding,” IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 6, pp. 887–891, Jun. 2004.
[44] M. Irani and S. Peleg, “Motion analysis for image enhancement: Resolution, occlusion, and transparency,” J. Vis. Commun. Image Represent., vol. 4, no. 4, pp. 324–335, Dec. 1993.
[45] M. Irani and S. Peleg, “Improving resolution by image registration,” CVGIP, Graph. Models Image Process., vol. 53, no. 3, pp. 231–239, May 1991.
[46] Advanced Video Coding for Generic Audiovisual Services, ITU-T and ISO/IEC JTC1, document Rec. H.264-ISO/IEC 14496-10 (AVC), 2003.
[47] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. New York, NY, USA: Wiley, 2000.
[48] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[49] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed. Reading, MA, USA: Addison-Wesley, 1992.
[50] A. Goshtasby, S. H. Gage, and J. F. Bartholic, “A two-stage cross correlation approach to template matching,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-6, no. 3, pp. 374–378, May 1984.
[51] E. L. Lehmann and G. Casella, Theory of Point Estimation, 2nd ed. New York, NY, USA: Springer-Verlag, 1998.
[52] W. Zhang, A. Men, and P. Chen, “Adaptive inter-layer intra prediction in scalable video coding,” in Proc. IEEE Int. Symp. Circuits Syst., May 2009, pp. 876–879.
[53] J. Dong and K. N. Ngan, “Parametric interpolation filter for HD video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 12, pp. 1892–1897, Dec. 2010.
[54] Y. Vatis and J. Ostermann, “Adaptive interpolation filter for H.264/AVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 2, pp. 179–192, Feb. 2009.
[55] D. Rusanovskyy, K. Ugur, A. Hallapuro, J. Lainema, and M. Gabbouj, “Video coding with low-complexity directional adaptive interpolation filters,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 8, pp. 1239–1243, Aug. 2009.
[56] D. Rusanovskyy, K. Ugur, and M. Gabbouj, “Adaptive interpolation with flexible filter structures for video coding,” in Proc. IEEE Int. Conf. Image Process., Nov. 2009, pp. 1025–1028.
[57] Y. Vatis and J. Ostermann, “Locally adaptive non-separable interpolation filter for H.264/AVC,” in Proc. IEEE Int. Conf. Image Process., Oct. 2006, pp. 33–36.
[58] G. Cao, Y. Zhao, R. Ni, and A. C. Kot, “Unsharp masking sharpening detection via overshoot artifacts analysis,” IEEE Signal Process. Lett., vol. 18, no. 10, pp. 603–606, Oct. 2011.
[59] G. Deng, “A generalized unsharp masking algorithm,” IEEE Trans. Image Process., vol. 20, no. 5, pp. 1249–1261, May 2011.
[60] M. C. Pease, Methods of Matrix Algebra. New York, NY, USA: Academic, 1965.
[61] MATLAB Source Codes and More Results of this Paper [Online]. Available: http://www.eie.polyu.edu.hk/~wcsiu/softmodule/9/dct.html, accessed Mar. 10, 2014.

Kwok-Wai Hung (S’09) received the B.Eng. (Hons.) and Ph.D. degrees in electronic and information engineering from Hong Kong Polytechnic University, Hong Kong, in 2009 and 2014, respectively.

His research interests include image and video signal processing, image and video interpolation, and super-resolution.

Dr. Hung received several awards and scholarships during his undergraduate and postgraduate studies, including the Dean’s Honors List, the Technical Excellent Award for his final-year project, and the Postentry Scholarship for current students on UGC-funded undergraduate programs.

Wan-Chi Siu (S’77–M’77–SM’90–F’12) received the M.Phil. degree from Chinese University of Hong Kong, Hong Kong, in 1977, and the Ph.D. degree from Imperial College of Science, Technology and Medicine, London, U.K., in 1984.

He joined Hong Kong Polytechnic University, Hong Kong, as a Lecturer in 1980, where he has been a Chair Professor with the Department of Electronic and Information Engineering since 1992. He was the Head of the Department of Electronic and Information Engineering and then the Dean of the Engineering Faculty from 1994 to 2002. He is currently the Director of the Research Centre for Signal Processing, Hong Kong Polytechnic University. He is an expert in digital signal processing, specializing in fast algorithms, video coding, and object recognition. He has authored 400 research papers, more than 190 of which appeared in international journals, such as the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY. He holds four patents granted and another four patents filed, and is also a Co-Editor of the book Multimedia Information Retrieval and Management (Springer, 2003). His work on fast computational algorithms (such as DCT and motion estimation algorithms) has been well received by academic peers, with good citations, and a number of them are now being used in high-tech industrial applications, such as modern video surveillance and video codec design for high-definition television systems, including some million-dollar contract consultancy works. His research interests include transforms, image coding, transcoding, 3-D videos, wavelets, and computational aspects of pattern recognition.

Prof. Siu is a Chartered Engineer and a fellow of the Institution of Engineering and Technology and the Hong Kong Institution of Engineers. He is the Vice President of the IEEE Signal Processing Society and the Chairman of its Conference Board, and was a core member of the Board of Governors from 2012 to 2014. He has been an Associate Editor, Guest Editor, and Editorial Board Member of a number of journals, including the IEEE TRANSACTIONS ON IMAGE PROCESSING, the Journal of VLSI Signal Processing Systems for Signal, Image, Video Technology, and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS. He is a very popular lecturing staff member within the University, whereas outside the University, he has been a keynote speaker at more than 10 international/national conferences in the last 10 years, and was an invited speaker at numerous professional events, such as the IEEE CPM in 2002 (keynote speaker, Taipei, Taiwan), the IEEE Intelligent Multimedia, Video and Speech Processing in 2004 (keynote speaker, Hong Kong), the IEEE International Conference on Information, Communications and Signal Processing in 2007 (invited speaker, Singapore), and the IEEE/IET 2012 International Conference on Audio, Language and Image Processing, Shanghai, China, in 2012. He is an organizer of many international conferences, including MMSP’08 (Australia) as the General Co-Chair, and three IEEE Society sponsored flagship conferences: ISCAS’1997 as the Technical Program Chair, ICASSP’2003 as the General Chair, and ICIP’2010 as the General Chair. He was a member of the Engineering Panel and the Physical Sciences and Engineering Panel of the Research Grant Council of the Hong Kong Government from 2010 to 2014 and from 1991 to 1995, respectively. From 1993 to 1994, he was the Chair of the first Engineering and Information Technology Panel of the Research Assessment Exercise to assess the research quality of 19 departments from all universities in Hong Kong, and initiated a set of objective indicators to assess the basic research quality of academia, which has had a substantial impact on the research culture in Hong Kong for the last 20 years.
He also received a number of awards, including the Distinguished Presenter Award in Hong Kong in 1997; the IEEE Third Millennium Medal in the USA in 2000; the Best Teacher Award in Hong Kong in 2003; the Outstanding Award in Research in Hong Kong in 2003; the Plaque for Exceptional Leadership from the IEEE SPCB, USA, in 2003; and the Honorable Mention Winner Award from Pattern Recognition, USA, in 2004.