6
Adaptive Guided Image Filter for Improved In-Loop Filtering in Video Coding Chen Chen #1 , Zexiang Miao *2 , Bing Zeng #*3,4 # Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology Clear Water Bay, Kowloon, Hong Kong, China 1,3 {cchenaf,eezeng}@ust.hk * Institute of Image Processing, University of Electronic Science and Technology of China No.2006, Xiyuan Avenue, West Hi-tech Zone, Chengdu, Sichuan, China 2 [email protected] 4 [email protected] Abstract—This paper proposes a new adaptive sharpening filter based on guided image filter and improves HEVC’s in- loop filter architecture by embedding sharpening filter between deblocking filter and SAO. The proposed algorithm classifies pixels of a frame into several groups according to uniform quantization of each pixel’s Sum-Modified-Laplacian value and assigns identical optimal filtering parameters to the pixels be- longing to the same group based on rate-distortion optimization. Simulation results show that our proposed algorithm achieves 0.7% on average and up to 8% BD-rate reduction with respect to the original HEVC in-loop filtering method. Encoding time increases slightly by about 15% and decoding time increases by 70% on average without special optimization of C++ program integrated in HM-16.5. I. I NTRODUCTION With the rapid growth of demand for high-definition videos with 1080p or even 4K resolutions and the limited storage space or network bandwidth, more and more people are inspired to dedicated in research on video compression. High Efficiency Video Coding (HEVC) is a newly established video coding standard in Jan. 2013 as the successor of H.264/AVC by joint collaborative team on video coding (JCT-VC), whose primary goal is to achieve 50% bit-rate reduction as compared with H.264/AVC under the same perceptual quality [1]. While inheriting most of H.264/AVC’s features, HEVC is still based on a hybrid coding scheme using block-based pre- dictions and transforms. Unlike the fixed 16×16 macroblock coding basic unit [2], HEVC introduces a flexible quad- tree structure called Coding Tree Unit (CTU), predefined by configuration with size 16×16, 32×32 or 64×64, which can be further split into different sizes of Coding Unit (CU) varying from 8×8 to 64×64, Prediction Unit (PU) and Transform Unit (TU) varying from 4×4 to 32×32 adaptively according to complexity of frame textures [1]. Such a block-based coding scheme adopted by H.264/AVC and HEVC will be no doubt that leads to blocking artifacts because of the discontinues occurred on block boundaries MMSP’15, Oct. 19 - Oct. 21, 2015, Xiamen, China. 978-1-4673-7478-1/15/$31.00 c 2015 IEEE caused by quantization errors and differences among motion vectors of adjacent blocks [3]. Besides, another artifact— ringing artifact will also be introduced after compression, which is analogous to the Gibbs phenomenon and results from the high frequency detail loss due to quantization [4]. In the most recent version of HEVC codec, two kinds of in-loop filters are adopted in the standard—deblocking filter [5] whose role is to reduce the blocking artifact and sample adaptive offset (SAO) [6] aimed at removing the ringing artifact. Both the two in-loop filters are designed to improve the subjective visual quality and objective coding efficiency while reducing the implement complexity, especially considering the parallel operations of encoder and decoder. Another in-loop filter adopted by early HEVC version is adaptive loop filter (ALF), which is a Wiener-based adaptive filter in order to minimize the mean square error between original samples and decoded samples [7]. The essence of current deblocking filter is still low-pass filtering by eliminating high frequency noise occurred on block boundaries. Even the HEVC standard has employed some smart strategy to identify the discontinues of block boundaries and real image edges, it still blurs some exact image textures. So the most important part of in-loop filter is to preserve the original image edges or even recover the real image textures as far as possible while removing the blocking and ringing artifacts at the same time. So, in recent years, many people spent a lot of efforts on pursing some sharpening filters, playing an opposite role of deblocking filter, which can sharpen image edges and restore image textures. In [8], a famous bilateral filter is proposed to smooth image while preserving edges by means of a nonlinear combina- tion of nearby image values. Based on such edge-preserving smooth image filter, paper [9] proposes a new type of trained bilateral filter by least mean square optimization to reduce coding artifacts. Paper [10] integrates bilateral filer into early HEVC coding standard, doing rate-distortion optimization (RDO) combined with ALF to enhance coding efficiency. Inspired by the classical bilateral filter, an explicit im- age filter named guided filter is proposed in [11]. Derived

Adaptive Guided Image Filter for Improved In-Loop ...vzhao/temp/Papers/MMSP15_143.pdf · ringing artifact will also be introduced after compression, ... In order to solve out the

  • Upload
    vuquynh

  • View
    216

  • Download
    3

Embed Size (px)

Citation preview

Page 1: Adaptive Guided Image Filter for Improved In-Loop ...vzhao/temp/Papers/MMSP15_143.pdf · ringing artifact will also be introduced after compression, ... In order to solve out the

Adaptive Guided Image Filter for ImprovedIn-Loop Filtering in Video Coding

Chen Chen #1, Zexiang Miao ∗2, Bing Zeng #∗3,4

# Department of Electronic and Computer Engineering, The Hong Kong University of Science and TechnologyClear Water Bay, Kowloon, Hong Kong, China

1,3 {cchenaf,eezeng}@ust.hk

∗ Institute of Image Processing, University of Electronic Science and Technology of ChinaNo.2006, Xiyuan Avenue, West Hi-tech Zone, Chengdu, Sichuan, China

2 [email protected] [email protected]

Abstract—This paper proposes a new adaptive sharpeningfilter based on guided image filter and improves HEVC’s in-loop filter architecture by embedding sharpening filter betweendeblocking filter and SAO. The proposed algorithm classifiespixels of a frame into several groups according to uniformquantization of each pixel’s Sum-Modified-Laplacian value andassigns identical optimal filtering parameters to the pixels be-longing to the same group based on rate-distortion optimization.Simulation results show that our proposed algorithm achieves0.7% on average and up to 8% BD-rate reduction with respectto the original HEVC in-loop filtering method. Encoding timeincreases slightly by about 15% and decoding time increases by70% on average without special optimization of C++ programintegrated in HM-16.5.

I. INTRODUCTION

With the rapid growth of demand for high-definition videoswith 1080p or even 4K resolutions and the limited storagespace or network bandwidth, more and more people areinspired to dedicated in research on video compression. HighEfficiency Video Coding (HEVC) is a newly established videocoding standard in Jan. 2013 as the successor of H.264/AVCby joint collaborative team on video coding (JCT-VC), whoseprimary goal is to achieve 50% bit-rate reduction as comparedwith H.264/AVC under the same perceptual quality [1].

While inheriting most of H.264/AVC’s features, HEVC isstill based on a hybrid coding scheme using block-based pre-dictions and transforms. Unlike the fixed 16×16 macroblockcoding basic unit [2], HEVC introduces a flexible quad-tree structure called Coding Tree Unit (CTU), predefined byconfiguration with size 16×16, 32×32 or 64×64, which can befurther split into different sizes of Coding Unit (CU) varyingfrom 8×8 to 64×64, Prediction Unit (PU) and Transform Unit(TU) varying from 4×4 to 32×32 adaptively according tocomplexity of frame textures [1].

Such a block-based coding scheme adopted by H.264/AVCand HEVC will be no doubt that leads to blocking artifactsbecause of the discontinues occurred on block boundaries

MMSP’15, Oct. 19 - Oct. 21, 2015, Xiamen, China.978-1-4673-7478-1/15/$31.00 c©2015 IEEE

caused by quantization errors and differences among motionvectors of adjacent blocks [3]. Besides, another artifact—ringing artifact will also be introduced after compression,which is analogous to the Gibbs phenomenon and results fromthe high frequency detail loss due to quantization [4]. In themost recent version of HEVC codec, two kinds of in-loopfilters are adopted in the standard—deblocking filter [5] whoserole is to reduce the blocking artifact and sample adaptiveoffset (SAO) [6] aimed at removing the ringing artifact. Boththe two in-loop filters are designed to improve the subjectivevisual quality and objective coding efficiency while reducingthe implement complexity, especially considering the paralleloperations of encoder and decoder. Another in-loop filteradopted by early HEVC version is adaptive loop filter (ALF),which is a Wiener-based adaptive filter in order to minimizethe mean square error between original samples and decodedsamples [7].

The essence of current deblocking filter is still low-passfiltering by eliminating high frequency noise occurred onblock boundaries. Even the HEVC standard has employedsome smart strategy to identify the discontinues of blockboundaries and real image edges, it still blurs some exactimage textures. So the most important part of in-loop filter isto preserve the original image edges or even recover the realimage textures as far as possible while removing the blockingand ringing artifacts at the same time. So, in recent years,many people spent a lot of efforts on pursing some sharpeningfilters, playing an opposite role of deblocking filter, which cansharpen image edges and restore image textures.

In [8], a famous bilateral filter is proposed to smooth imagewhile preserving edges by means of a nonlinear combina-tion of nearby image values. Based on such edge-preservingsmooth image filter, paper [9] proposes a new type of trainedbilateral filter by least mean square optimization to reducecoding artifacts. Paper [10] integrates bilateral filer into earlyHEVC coding standard, doing rate-distortion optimization(RDO) combined with ALF to enhance coding efficiency.

Inspired by the classical bilateral filter, an explicit im-age filter named guided filter is proposed in [11]. Derived

Page 2: Adaptive Guided Image Filter for Improved In-Loop ...vzhao/temp/Papers/MMSP15_143.pdf · ringing artifact will also be introduced after compression, ... In order to solve out the

from a local model, the guided filter generates output imageconsidering the content of a guidance image (can be theinput image itself), which can perform as an edge-preservingsmooth operator or even sharpening operator and has betterbehavior as compared with the bilateral filter. Paper [12] hasemployed this guided image filter on screen content coding.Consequently, the filter is with large probability useful forvideo coding, reducing coding artifacts while preserving realimage edges and finally resulting in significant enhancementof coding efficiency. Therefore, in this article, we will proposean adaptive sharpening filter to improve in-loop filter of videocoding based on guided image filtering.

The rest of the paper is organized as follows. In Section 2,we briefly introduce the related work of the guided image filter.In Section 3, we detailed present our proposed sharpening filterintegrated with HEVC standard. Some experimental results areshown in Section 4. Finally, Section 5 concludes this paper.

II. REVIEW OF GUIDED IMAGE FILTER

Guided image filter (GIF) is proposed in [11]. In thisalgorithm, a guidance image G, which could also be identicalto the input image I itself, is introduced. GIF assumes thatthe filter is a local linear model between the guidance imageand the filter output image Y as

Y (i) = akG(i) + bk,∀k ∈ ωk (1)

where {ak, bk} are some linear coefficients considered to beconstant in a square window ωk (can be 3×3, 5×5, 7×7, · · · )which contains image pixel i.

In order to solve out the coefficients {ak, bk}, the GIFalgorithm proposes a cost function that minimizes the squareerror between input pixel value I(i) and filter output pixelvalue Y (i) by

E(ak, bk) =∑i∈ωk

[(akG(i) + bk − I(i))

2+ εa2k

](2)

where εa2k is a regularization term preventing ak from beingtoo large. Simply computing ∂E(ak,bk)

∂ak= 0 and ∂E(ak,bk)

∂bk= 0,

the solution of Eq. (2) are given by:

ak =

1|ωk|

∑i∈ωk

G(i)I(i)− µkIk

σ2k + ε

(3)

bk = Ik − akµk (4)

where |ωk| means the pixel number in window ωk, µk and σ2k

are the mean and variance of guidance image G in windowωk, respectively and Ik represents the mean value of inputimage I in window ωk.

Finally, after averaging all the possible output values Y (i)for all windows ωk that contains pixel i, the filter output isgiven by:

Y (i) =1

|ωk|∑

k:i∈ωk

(akG(i) + bk) = a(i)G(i) + b(i) (5)

Ref

Motion

Est.

Motion

Comp.

+

M

C

IntraPred.

Intra

Inter

-

Residual

T & Q CABAC

Bit-Stream

Q-1& T-1+

+

MVs

IntraEst.

Mode

SAO ParamsEst.

Deblk.

Filter Control

Reconstructed

RefRefRef.

DPB

RefRefRef

Input Video

SAO params

Quantized

residuals

Reference samples Intra/Inter

Decision

MVs/Intra modes I

SAO

(a) HEVC encoder architecture

Ref

Motion

Est.

Motion

Comp.

+

M

C

IntraPred.

Intra

Inter

-

Residual

T & Q CABAC

Bit-Stream

Q-1& T-1+

+

MVs

IntraEst.

Mode

SAO ParamsEst.

Deblk.

Filter Control

Reconstructed

RefRefRef.

DPB

RefRefRef

Input Video

SAO params

Quantized

residuals

Reference samples Intra/Inter

Decision

MVs/Intra modes I

SAO

SharpeningFilter

Sharpening Filter paramsSF ParamsEst.

Filter Control

(b) New encoder architecture including proposed sharpening filter

Fig. 1. Comparison of traditional HEVC encoder architecture and newencoder architecture

III. PROPOSED ADAPTIVE SHARPENING FILTER

A. Proposed In-loop Filter Architecture

As shown in Fig. 1(a), the in-loop filter part of traditionalHEVC encoder architecture consists of two filters—deblockingfilter and SAO. Inheriting the two in-loop filters, our newproposed encoder architecture embeds a sharpening filterbased on guided image filter between the deblocking filterand SAO. The three filters aim at reducing blocking artifact,sharpening or restoring image edges and eliminating ringingartifact, respectively.

Like SAO, the sharpening filter also needs be controlledby some filtering parameters that are estimated by previousalready encoded frames and chosen in current frame accordingto rate-distortion optimization. And of course, such parame-ters should be packed into bitstream by entropy coding andtransmitted to decoder. Detailed algorithm of this sharpeningfilter will be described in the following subsections.

B. Derivation of Self-guided Image Filter

In term of video coding, the only known information forthe decoder is the already decoded frames, so the moststraightforward way to handle the GIF algorithm is regardingdecoded frame itself as guidance image directly. Therefore, theGIF algorithm can be simplified as self-guided image filter.Replacing guidance image G(i) as input image itself I(i) inEq. (3) and Eq. (4), the solutions of parameters {ak, bk} are

Page 3: Adaptive Guided Image Filter for Improved In-Loop ...vzhao/temp/Papers/MMSP15_143.pdf · ringing artifact will also be introduced after compression, ... In order to solve out the

(a) (b) (c) (d)

Fig. 2. Sequence BasketballDrill coded by encoder lowdelay P main configuration, QP=27, frame 10: (a) reconstructed image after deblocking filter,37.1278dB; (b) reconstructed image after deblocking and pixel-adaptive self-guided image filter, 38.1624dB; (c) difference map between (a) and (b); (d)optimal ε map (pixel with color black means no guided filter, blue means ε = 5, green means ε = 10, yellow means ε = 15 and red means ε = 20)

(a) (b) (c) (d)

Fig. 3. Sequence BasketballPass coded by encoder lowdelay P main configuration, QP=27, frame 5: (a) reconstructed image after deblocking filter,38.6681dB; (b) reconstructed image after deblocking and pixel-adaptive self-guided image filter, 39.9247dB; (c) difference map between (a) and (b); (d)optimal ε map (pixel with color black means no guided filter, blue means ε = 5, green means ε = 10, yellow means ε = 15 and red means ε = 20)

(a) (b) (c) (d)

Fig. 4. Sequence Johnny coded by encoder lowdelay P main configuration, QP=27, frame 5: (a) reconstructed image after deblocking filter, 41.2768dB;(b) reconstructed image after deblocking and pixel-adaptive self-guided image filter, 42.1389dB; (c) difference map between (a) and (b); (d) optimal ε map(pixel with color black means no guided filter, blue means ε = 5, green means ε = 10, yellow means ε = 15 and red means ε = 20)

given by:

ak =

1|ωk|

∑i∈ωk

I2(i)− I2kσ2k + ε

=I2k − I

2

k

σ2k + ε

=σ2k

σ2k + ε

(6)

bk = Ik − akIk =ε

σ2k + ε

Ik (7)

where σ2k and Ik represents variance and mean of input image

in window ωk. Then, it’s easy to derive the filter output as:

Y (i) =1

|ωk|∑

k:i∈ωk

(σ2kI(i) + εIkσ2k + ε

)(8)

Observing the solution of {ak, bk} in Eq. (6) and Eq. (7),suppose the parameters ωk and ε are determined, it’s obviousthat in some “flat areas” the value of ak will close to 0 whilebk will close to Ik, which means that the pixel located inthe middle of a “flat areas” will be smoothed by the pixelsnearby. On the other hand, in some “fluctuating area” thevalue of ak will close to 1 while bk will close to 0, whichmeans that the pixel located in the middle of a “fluctuatingarea” will remain unchange. Such features ensure that original

image edges will not be smoothed, achieving the goal—edge-preserving. Furthermore, changing of parameters {ωk, ε} willadjust the degree of edge sharpness.

In term of the algorithm complexity, Eq. (8) clearly showsthat only some local variances and means should be pre-calculated and then a linear combination of pixel values ina local window is regarded as the filter output of each pixel.So, such algorithm has a O(N) computation complexity (N isthe total pixel number), which is a great advantage for videocoding that requires fast encoding and decoding speed.

C. Performance of Pixel-adaptive Self-guided Image Filter

Original GIF algorithm applies the filter on the whole imageusing the same parameters [11], but it’s a very coarse strategyto enhance quality of a reconstructed frame after compressionsince there is an original frame as a reference and a strictobjective criteria—PSNR. So the most accurate solution toachieve the best filter output is determining parameters foreach pixel respectively, namely pixel-adaptive self-guided im-age filter. Fortunately, only two parameters—local windowsize ω and coefficient ε need to be handled (refer to Eq. (8)).

Experiments in [11] detailed demonstrate that larger window

Page 4: Adaptive Guided Image Filter for Improved In-Loop ...vzhao/temp/Papers/MMSP15_143.pdf · ringing artifact will also be introduced after compression, ... In order to solve out the

size results in smoother filter output. While our goal is notto smooth the flat area but to sharpen the image edges, weshould spend no efforts on searching the optimal window sizefor each pixel but fix the window size ω as the smallest 3×3.Therefore, the only parameter should be chosen adaptivelyis ε. We conduct a much more accurate experiment thatassigns an adaptive parameter ε to each pixel. ε is optimizedby calculation square errors after filtering over five possibleoptions—ε = {5, 10, 15, 20} and don’t filtering. Notice thatthe guided filter is only applied on Luma component.

Three sequences (BasketballDrill, BasketballPass andJohnny) are coded by default encoder lowdelay P main con-figuration with quantization parameter (QP) equaling 27. Ex-periment results are shown in Fig. 2–4. Sub-images (a) (b) (c)(d) represent reconstructed frames after deblocking filter, re-constructed frames after deblocking flter+pixel-adaptive self-guided image filter, difference map between (a) and (b) andoptimal ε map, respectively.

Observing these figures, especially the subparts in small redwindows, the optimal pixel-adaptive self-guided image filtersharpens the image edges more naturally and continuously,further reducing some blocking artifacts and even ringingartifacts. In addition, objective qualities are also significantlyimproved by fantastic more than 1dB PSNR gain. Differencemaps clearly illustrate that main enhancements occurred onimage edges and high variance areas. Finally, the most im-portant ε maps also indicate that the “flat areas” are assignedwith small ε values or even no filtering while the “fluctuatingareas” or image edges are filtered with large ε values.

Although such pixel-adaptive self-guided image filter willresult in significant PSNR gain with optimal parameter as-signment, it’s a huge bit-rate expense when transmitting the εmaps shown in Fig. 2–4(d). Therefore, a feasible alternativestrategy is to estimate the optimal ε map using already knowninformation—previous decoded frames, which can keep abalance between increasing of extra bit-rate and improvementof video quality.

D. Estimation of Parameters

The above section detailed discusses that the only parameterneed to be estimated in entire guided image filtering algorithmis the ε and experiments demonstrate that the chosen of ε arehighly correlated with the texture complexity. Consequently, alogical idea is to consider each pixel’s gradient intensity.

The Laplace operator is a differential operator given bythe divergence of the gradient of a continuous function onEuclidean space. A discrete approximation to the modifiedLaplacian is obtained as:

ML(x, y) =|2I(x, y)− I(x− 1, y)− I(x+ 1, y)|+|2I(x, y)− I(x, y − 1)− I(x, y + 1)|

(9)

Then the Sum-Modified-Laplacian (SML) [13] operator iscomputed in a small window (2N + 1)× (2N + 1) as below:

SML(i, j) =

i+N∑x=i−N

i+N∑y=i−N

[ML(x, y)] (10)

VPS id

VPS data

SPS id

VPS index

SPS data

PPS id

SPS index

PPS data

First slice flag

PPS index

Slice seg. header

Slice seg. data

VPS Slice segmentPPSSPS PPS

SF Param.

Fig. 5. Parameter set referencing hierarchy including the guided imagefiltering based sharpening filer parameters

All pixels in an entire frame are classified into M groupsaccording to uniform quantization of their SML values. Pixelsassigned in each group share one parameter—ε selected from acandidate list E and determined by rate-distortion optimization(RDO).

According to a large number of experiments, finally in ourproposed adaptive guided filter algorithm, the small windowfor SML calculation is defines as 7× 7 (N = 3), window ofguided filter local averaging is fixed as 3 × 3 (|ω| = 3 × 3),each frame is divided into 16 groups (M = 16) and the εcandidate list is E = {0, 1, 2, 3, · · · , 31}, where ε = 0 meansapplying on guided filter on these pixels.

Summarily, the process of adaptive guided image filteringoptimization is as follow:• Divide all pixels of a frame into 16 groups based on

calculation of SML,• For all the pixels associated to group i, do

– Define MinRDCost(i) = λ · rate(0) (means noguided filering)

– For each candidate ε in set E, do∗ compute RDCost(i) = ∆SSE(i, ε) +λ · rate(ε)

on the image pixels belonging to group i beforeand after guided filter.

∗ if RDCCost(i) < MinRDCost(i), store thebest parameter ε and update MinRDCost(i) =RDCCost(i).

– Go to next candidate ε in set E• Go to next group i• Perform guided image filter on all the pixels using the

optimal parameters for each group.Notice that the rate(ε) represents the bit cost when coding ε,which will be detailed explained in the next subsection.

E. Syntax Design and Entropy Coding of Parameters

The coded information filter coefficient ε need to be trans-mitted. As shown in Fig. 5, the parameters are located in thepicture parameter set (PPS) since the coefficient set of ε is usedfor an entire frame and will change among different pictures.

Since adjacent frames usually share similar parameters, forB or P frame, a most probable ε candidate list for each groupi is defined as Ebest

i = {ε1i , ε2i , ε3i , ε4i } which are estimatedby corresponding group of previous already coded referenceframes (up to 4). For the remaining ε in candidate list E,calculate the smallest difference between ε and the value inEbest

i , then transmit the index in Ebesti and the difference

value.

Page 5: Adaptive Guided Image Filter for Improved In-Loop ...vzhao/temp/Papers/MMSP15_143.pdf · ringing artifact will also be introduced after compression, ... In order to solve out the

TABLE IBD-RATE REDUCTION UNDER DIFFERENT SEQUENCES COMPARED WITH

HM-16.5 (60 FRAMES, encoder intra main CONFIGURATION)

Anchor: Sequence Y BD-rateHM-16.5 QP=22–37 QP=27–42

Traffic -0.26% -0.51%Class A PeopleOnStreet -0.85% -1.49%

(4k) Nebuta -0.10% -0.12%SteamLocomotiveTrain -0.28% -0.25%

Avg -0.37% -0.59%Kimono -1.22% -1.27%

ParkScene -0.00% -0.16%Class B Cactus -0.36% -0.52%(1080p) BasketballDrive -0.75% -1.35%

BQTerrace -0.01% -0.10%Avg -0.47% -0.68%

BasketballDrill -0.64% -0.87%Class C BQMall -0.38% -0.50%

(WVGA) PartyScene -0.32% -0.40%RaceHorses -0.53% -0.61%

Avg -0.46% -0.59%BasketballPass -0.51% -0.64%

Class D BQSquare -0.73% -0.88%(WQVGA) BlowingBubbles -0.30% -0.48%

RaceHorses -0.57% -0.64%Avg -0.53% -0.66%

FourPeople -0.80% -1.09%Class E Johnny +0.01% -0.18%(720p) KristenAndSara -0.74% -1.03%

Avg -0.51% -0.77%BasketballDrillText -0.28% -0.87%

Class F ChinaSpeed -0.44% -0.64%Synthetic SlideEditing -0.98% -0.91%

SlideShow -1.80% -1.95%Avg -0.87% -1.09%

Avg BD-rate -0.53% -0.73%Overall Encoding time 134% 138%

Summary Decoding time 158% 168%

IV. EXPERIMENT RESULTS

Simulations are conducted on a range of HEVC stan-dard test video sequences including natural and syntheticscenes. Performance of the proposed algorithm is comparedagainst the HEVC algorithm implemented in the new es-tablished HM-16.5 reference software [14] using the de-fault encoder intra main, encoder randomaccess main, en-coder lowdelay main and encoder lowdelay P main config-urations. Furthermore, we take the average over 60 frames toensure the correctness of experimental results.

The whole algorithm is implemented on Luma componentof 24 test sequences which are arranged into six classes:Class A (4K), Class B (1080p), Class C (WVGA), ClassD (WQVGA), Class E (720p) and Class F (synthetic). Thequantization parameters are set to QP = {22, 27, 32, 37}for the high quality test and QP = {27, 32, 37, 42} for thelow quality test. The coding efficiency results are presented inTable I–IV as the percentage of bit-rate savings (BD-rate [15])with respect to the HM-16.5 main profile anchor. Statisticsreveal that our algorithm obtain about 0.7% performance gainon average and can be up to 8% for the sequence “Kimono”under encoder lowdelay P main configuration, as comparedwith the original HEVC’s coding scheme.

Generally speaking, the proposed algorithm performs the

TABLE IIBD-RATE REDUCTION UNDER DIFFERENT SEQUENCES COMPARED WITHHM-16.5 (60 FRAMES, encoder randomaccess main CONFIGURATION)

Anchor: Sequence Y BD-rateHM-16.5 QP=22–37 QP=27–42

Traffic +0.23% +0.04%Class A PeopleOnStreet -0.26% -0.56%

(4k) Nebuta -0.45% -1.03%SteamLocomotiveTrain -1.46% -0.77%

Avg -0.49% -0.58%Kimono -1.13% -1.15%

ParkScene +0.37% +0.27%Class B Cactus +0.11% -0.07%(1080p) BasketballDrive -0.29% -0.45%

BQTerrace -0.24% -0.49%Avg -0.23% -0.38%

BasketballDrill -0.24% -0.51%Class C BQMall -0.38% -0.60%

(WVGA) PartyScene -0.16% -0.26%RaceHorses -0.46% -0.72%

Avg -0.31% -0.52%BasketballPass -0.21% -0.19%

Class D BQSquare -0.59% -0.69%(WQVGA) BlowingBubbles -0.13% +0.03%

RaceHorses -0.69% -1.07%Avg -0.40% -0.48%

FourPeople -0.67% -0.75%Class E Johnny -0.02% -0.26%(720p) KristenAndSara -0.78% -0.71%

Avg -0.49% -0.58%BasketballDrillText -0.35% -0.79%

Class F ChinaSpeed -0.62% -0.73%Synthetic SlideEditing -0.09% -0.66%

SlideShow -1.50% -1.46%Avg -0.64% -0.91%

Avg BD-rate -0.42% -0.57%Overall Encoding time 109% 110%

Summary Decoding time 168% 180%

best under encoder lowdelay P main configuration whichcan achieve 1.1% BD-rate reduction on average whilethe proposed filter save only 0.5% BD-rate under en-coder randomaccess main configuration. Besides, this sharp-ening filter is more effective for low bir-rate video codingsince the low quality test performs better than high qualitytest under all configurations.

In terms of the algorithm complexity, the encoding timewill increase by 10% on average because of introduction ofthe sharpening filter and the decoding time will increase byaverage 70%.

V. CONCLUSION

In this paper, we propose a new sharpening filter based onguided image filter and embeds this filter between deblockingfilter and sample adaptive offset (SAO) in the HEVC en-coding structure. Our algorithm classifies pixels of a frameinto several groups according to uniform quantization ofeach pixel’s Sum-Modified-Laplacian (SML) value and assignsoptimal filtering parameters to the pixels belonging to eachgroup based on rate-distortion optimization (RDO). Finally, weapply self-guided filtering on Luma component of images andeach group shares the same parameters. Experimental resultsemploying C++ program integrated in HM-16.5 show that ourproposed algorithm achieves 0.7% on average and up to 8%

Page 6: Adaptive Guided Image Filter for Improved In-Loop ...vzhao/temp/Papers/MMSP15_143.pdf · ringing artifact will also be introduced after compression, ... In order to solve out the

TABLE IIIBD-RATE REDUCTION UNDER DIFFERENT SEQUENCES COMPARED WITH

HM-16.5 (60 FRAMES, encoder lowdelay P main CONFIGURATION)

Anchor: Sequence Y BD-rateHM-16.5 QP=22–37 QP=27–42

Traffic +0.39% -0.28%Class A PeopleOnStreet -0.28% -1.27%

(4k) Nebuta -1.27% -2.37%SteamLocomotiveTrain -0.24% -0.79%

Avg -0.35% -1.18%Kimono -7.50% -8.39%

ParkScene +0.15% -0.77%Class B Cactus +0.10% -0.55%(1080p) BasketballDrive -0.24% -1.06%

BQTerrace -0.23% -0.86%Avg -1.54% -2.33%

BasketballDrill -0.22% -0.85%Class C BQMall -0.03% -0.58%

(WVGA) PartyScene -0.21% -0.37%RaceHorses -0.37% -1.02%

Avg -0.21% -0.70%BasketballPass -0.53% -1.51%

Class D BQSquare -1.11% -1.22%(WQVGA) BlowingBubbles -0.47% -1.11%

RaceHorses -0.68% -1.56%Avg -0.70% -1.35%

FourPeople -0.46% -1.39%Class E Johnny -0.61% -1.54%(720p) KristenAndSara -1.11% -1.90%

Avg -0.73% -1.61%BasketballDrillText -0.18% -0.88%

Class F ChinaSpeed -0.58% -0.65%Synthetic SlideEditing -0.44% -0.57%

SlideShow -2.18% -2.12%Avg -0.85% -1.05%

Avg BD-rate -0.76% -1.40%Overall Encoding time 109% 111%

Summary Decoding time 170% 180%

BD-rate reduction with respect to the original HEVC codingmethod. Encoding time increases slightly by about 15% anddecoding time increases by 70% on average.

Our future work will focus on more accurate estimation offiltering parameters which can achieve more significant en-hancement of compression efficiency and further optimizationof program’s running efficiency for the purpose of decreasingcoding time.

REFERENCES

[1] G. J. Sullivan, J.-R. Ohm, W.-J. Han and T. Wiegand, “Overview of thehigh efficiency video coding (HEVC) standard,” IEEE Trans. CircuitsSyst. Video Technol., vol. 22, no. 12, pp. 1649–1668, Dec. 2012.

[2] T. Wiegand, G. J. Sullivan, G. Bjøntegaard and A. Luthra, “Overviewof the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst.Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.

[3] T. Meier, K. N. Ngan and G. Crebbin, “Reduction of blocking artifactsin image and video coding,” IEEE Trans. Circuits Syst. Video Technol.,vol. 9, no. 3, pp. 490–500, Apr. 1999.

[4] S. H. Oguz, Y.-H. Hu and T. Q. Nguyen, “Image coding ringingartifact reduction using morphological post-filtering,” IEEE Workshopon Multimedia Signal Processing, MMSP 1998, Redondo Beach, CA,USA, Dec. 7–9, 1998.

[5] A. Norkin, G. Bjøntegaard, A. Fuldseth, M. Narroschke, M. Ikeda,K. Andersson, M. Zhou and G. Van der Auwera, “HEVC deblockingfilter,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp.1746–1753, Dec. 2012.

TABLE IVBD-RATE REDUCTION UNDER DIFFERENT SEQUENCES COMPARED WITH

HM-16.5 (60 FRAMES, encoder lowdelay main CONFIGURATION)

Anchor: Sequence Y BD-rateHM-16.5 QP=22–37 QP=27–42

Traffic +0.03% -0.23%Class A PeopleOnStreet -0.01% -0.56%

(4k) Nebuta -1.29% -2.55%SteamLocomotiveTrain -0.08% -0.34%

Avg -0.34% -0.92%Kimono -1.79% -2.00%

ParkScene +0.07% -0.14%Class B Cactus -0.08% -0.22%(1080p) BasketballDrive +0.06% -0.12%

BQTerrace -0.31% -0.44%Avg -0.41% -0.58%

BasketballDrill +0.04% -0.36%Class C BQMall -0.12% -0.28%

(WVGA) PartyScene -0.04% -0.25%RaceHorses -0.21% -0.65%

Avg -0.08% -0.38%BasketballPass -0.47% -0.56%

Class D BQSquare -0.55% -0.78%(WQVGA) BlowingBubbles -0.39% -0.83%

RaceHorses -0.66% -1.11%Avg -0.52% -0.82%

FourPeople -0.59% -1.05%Class E Johnny -0.46% -0.78%(720p) KristenAndSara -0.63% -0.73%

Avg -0.56% -0.85%BasketballDrillText +0.07% -0.34%

Class F ChinaSpeed -0.51% -0.58%Synthetic SlideEditing -0.56% -0.54%

SlideShow -1.36% -1.46%Avg -0.59% -0.73%

Avg BD-rate -0.41% -0.70%Overall Encoding time 109% 110%

Summary Decoding time 170% 180%

[6] C.-M. Fu, E. Alshina, A Alshin, Y.-W. Huang, C.-Y. Chen, C.-Y. Tsai,C.-W Hsu, S.-M. Lei, J.-H. Park and W.-J. Han, “Sample adaptive offsetin the HEVC standard,” IEEE Trans. Circuits Syst. Video Technol., vol.22, no. 12, pp. 1755–1764, Dec. 2012.

[7] C.-Y. Tsai, C.-Y. Chen, A. Alshin, T. Yamakage, I. S. Chong, Y.-W. Huang, C.-M. Fu, T. Itoh, T. Watanabe, T. Chujoh, M. Karczewiczand S.-M. Lei, “Adaptive Loop Filtering for Video Coding,” IEEE Trans.Circuits Syst. Video Technol., vol. 7, no. 6, pp. 934–944, Dec. 2013.

[8] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and colorimages,” IEEE Conference on Computer Vision, ICCV 1998, Bombay,India, Jan. 4–7, 1998.

[9] H. Hu and G. de Haan, “Trained bilateral filters and applications tocoding artifacts reduction,” IEEE Conference on Image Processing, ICIP2007, San Antonio, TX, USA, Sep. 16–19, 2007.

[10] M. Naccari and F. Pereira, “Adaptive bilateral filter for improved in-loopfiltering in the emerging high efficiency video coding standard,” PictureCoding Symposium, PCS 2012, Krakow, Poland, May 7–9, 2012.

[11] K. He, J. Sun and X. Tang, “Guided image filtering,” European Confer-ence on Computer Vision, ECCV 2010, Heraklion, Crete, Greece, Sep.5–11, 2010

[12] T. Vermeir, J. Slowack S. Van Leuven, G. Van Wallendael, J. De Cockand R. Van de Walle, “Adaptive guided image filtering for screen contentcoding,” IEEE Conference on Image Processing, ICIP 2014, Paris,France, Oct. 27–30, 2014

[13] S. K. Nayar and Y. Nakagawa, “Shape from focus,” IEEE Trans. PatternAnalysis and Mechine Intelligence, vol. 16, no. 8, pp. 824–831, Aug.1994.

[14] HEVC Test Model, HM-16.5. [Online]. Available:https://hevc.hhi.fraunhofer.de/svn/svn HEVCSoftware/tags/HM-16.5/.

[15] G. Bjøntegarrd, Calculation of average PSNR differences between RDcurves, VCEG-M33, 13th ITU-T Video Coding Experts Group (VCEG)Meeting, Austin, Texas, Apr. 2001.