


IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 22, NO. 9, SEPTEMBER 2013 3449

Multiview-Video-Plus-Depth Coding Based on the Advanced Video Coding Standard

Miska M. Hannuksela, Member, IEEE, Dmytro Rusanovskyy, Wenyi Su, Lulu Chen, Ri Li, Payman Aflaki, Deyan Lan, Michal Joachimiak, Houqiang Li, Member, IEEE, and Moncef Gabbouj, Fellow, IEEE

Abstract— This paper presents a multiview-video-plus-depth coding scheme, which is compatible with the advanced video coding (H.264/AVC) standard and its multiview video coding (MVC) extension. This scheme introduces several encoding and in-loop coding tools for depth and texture video coding, such as depth-based texture motion vector prediction, depth-range-based weighted prediction, joint inter-view depth filtering, and gradual view refresh. The presented coding scheme was submitted to the 3D video coding (3DV) call for proposals (CfP) of the Moving Picture Experts Group standardization committee. When measured with commonly used objective metrics against the MVC anchor, the proposed scheme provides an average bitrate reduction of 26% and 35% for the 3DV CfP test scenarios with two and three views, respectively. The observed bitrate reduction is similar according to an analysis of the results obtained for the subjective tests on the 3DV CfP submissions.

Index Terms— H.264/AVC, three-dimensional video, video coding.

I. INTRODUCTION

CODING of multiview-video-plus-depth (MVD) data [1] facilitates more flexible three-dimensional (3D) video displaying at the receiving or playback devices when compared to conventional frame-compatible stereoscopic video as well as multiview video coding, such as the Multiview Video Coding (MVC) extension of the Advanced Video Coding (H.264/AVC) standard [2]. While coding of two texture views provides a basic 3D perception on stereoscopic displays, it has been discovered that disparity adjustment between views is needed for adapting the content to different displays and viewing conditions, such as viewing distance, as well as for meeting individual preferences [3]. Moreover, auto-stereoscopic display technology typically requires displaying a relatively large number of views simultaneously, for which views have to be generated in the playback device from the received views. These needs can be served by the MVD format and by using the decoded MVD data as a source for depth-image-based rendering (DIBR) [4]. In the MVD format, each texture view is accompanied by a respective depth view, from which new views can be synthesized using any appropriate DIBR algorithm.

Manuscript received October 15, 2012; revised March 15, 2013; accepted May 31, 2013. Date of publication June 18, 2013; date of current version July 30, 2013. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Gary Sullivan.

M. M. Hannuksela is with Nokia Research Center, Tampere 33720, Finland (e-mail: [email protected]).

D. Rusanovskyy was with Nokia Research Center, Tampere 33720, Finland. He is now with LG Electronics (e-mail: [email protected]).

W. Su, L. Chen, R. Li, D. Lan, and H. Li are with the University of Science and Technology of China, Hefei 230026, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).

P. Aflaki, M. Joachimiak, and M. Gabbouj are with Tampere University of Technology, Tampere 33720, Finland (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2013.2269274

MPEG issued a Call for Proposals (CfP) for 3D video coding technology in March 2011 [5], aiming at standardizing a coding format supporting advanced stereoscopic display processing and improved support for auto-stereoscopic multiview displays. The CfP invited submissions in two categories, the first one compatible with H.264/AVC and the second compatible with the High Efficiency Video Coding (H.265/HEVC) standard [6], which was under development at the time of the CfP. As a result of the CfP evaluation, MPEG and, since July 2012, the Joint Collaborative Team on 3D Video Coding (JCT-3V) [7] have initiated two parallel H.264/AVC-based MVD coding developments, which are briefly described in the following paragraphs.

An MVC extension for inclusion of depth maps, abbreviated MVC+D, specifies the encapsulation of MVC-coded texture and depth views into a single bitstream [8], [9]. The coding technology is identical to MVC, and hence MVC+D is backward-compatible with MVC and the texture views of MVC+D bitstreams can be decoded with an MVC decoder. The last technical changes to the MVC+D specification were finalized in January 2013.

Another ongoing JCT-3V development is a multiview video and depth extension of H.264/AVC, referred to as 3D-AVC [10]. This development exploits redundancies between texture and depth and includes several coding tools that provide a compression improvement over MVC+D. The specification requires that the base texture view is compatible with H.264/AVC, while compatibility of dependent texture views with MVC may optionally be provided. 3D-AVC is planned to be finalized in November 2013.

In this paper we present Nokia's codec submission [11] to the MPEG 3DV CfP [5], which is referred to as the Nokia 3DV Test Model or Nokia 3DV-TM in this paper. Nokia 3DV-TM was evaluated as the best-performing submission in the H.264/AVC-compatible category of the MPEG 3DV CfP. Consequently, it was selected as the basis of the initial test model for MVC+D and 3D-AVC development, where the backward compatibility requirements of MVC+D could be reached by configuring the encoder.

1057-7149/$31.00 © 2013 IEEE



The paper is organized as follows. Section II describes the general principles and the architecture utilized for the coding of MVD data with Nokia 3DV-TM, as well as the bitstream design of Nokia 3DV-TM. Section III describes the coding tools of Nokia 3DV-TM for texture data, whereas the tools for coding depth map data are described in Section IV. The conditions of the MPEG 3DV CfP, its evaluation procedure, and the results achieved by the proposed coding design are given in Section V, which additionally analyzes the impact of individual tools on the results. Section VI briefly analyzes the complexity of the tools and explains how the tools have evolved in the 3D-AVC standardization process. Finally, the paper is concluded in Section VII.

II. PROPOSED CODEC ARCHITECTURE AND BITSTREAM DESIGN

A. Design Goals

A goal of the Nokia 3DV-TM development was an MVD coding system that is able to benefit from the wide deployment of H.264/AVC-based video services and from widely available hardware and software implementations of H.264/AVC. Our intent was to allow only a limited number of changes to low-level processing and at the same time obtain a significant compression improvement compared to MVC-compatible coding. In Nokia 3DV-TM, a modified motion vector prediction (MVP) scheme is the only low-level tool introduced to the original H.264/AVC technology. The Nokia 3DV-TM encoder can be configured to code a selected number of texture views as H.264/AVC and MVC compatible, while the remaining texture views utilize enhanced texture coding.

B. Codec Architecture and Bitstream Structure

The encoder input and decoder output of Nokia 3DV-TM follow the MVD data format, as detailed in the MPEG 3DV CfP [5]. The encoder codes the input data into a bitstream, which consists of a sequence of access units. Each access unit consists of texture view components and depth view components representing one sampling or playback instant of MVD data. Since the bitrate required for transmission of high-quality texture content is typically significantly larger than the bitrate required for coded depth maps [12], a design concept in Nokia 3DV-TM is to utilize depth data for enhanced texture coding. In particular, a depth view component (D) can be coded prior to the texture view component (T) of the same view and hence used as an inter-component prediction reference for the texture view component. In Nokia 3DV-TM this coding order is used for depth-based motion vector prediction (D-MVP) and joint view depth filtering (JVDF).

3DV-TM supports joint coding of texture and depth that have different spatial resolutions. In particular, coding of depth data is supported at full, half (reduced in the vertical or horizontal direction), and quarter spatial resolution (downsampled in both the vertical and horizontal directions) compared to the resolution of the texture data. To enable coding tools such as D-MVP and view synthesis prediction (VSP), the resolution of depth map images is normalized to the resolution of luma texture images. Depth image normalization is implemented as in-loop upsampling with bilinear interpolation.

Fig. 1. Example of GVR access units (picture order count 15 and 45) coded at every other random access point.
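As a rough illustration of this normalization step, doubling the width of one depth-map row with linear interpolation can be sketched as follows. This is a minimal sketch, not the codec's actual in-loop filter; applying the same operation first to rows and then to columns of the transposed result yields 2x bilinear upsampling.

```python
def upsample_row_2x(row):
    """Double a depth-map row's width with linear interpolation.

    Even output positions copy the input samples; odd positions are the
    integer average of the two surrounding samples (the edge sample is
    repeated at the right border).
    """
    out = []
    for i, v in enumerate(row):
        out.append(v)
        nxt = row[i + 1] if i + 1 < len(row) else v
        out.append((v + nxt) // 2)
    return out
```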

To enable the use of VSP and depth-range-based weighted prediction (DRWP), Nokia 3DV-TM transmits camera parameters and the depth range represented by the depth views as part of the bitstream. The parameters include, for example, the closest and farthest real-world depth values Znear and Zfar, respectively.

C. Gradual View Refresh (GVR)

Nokia 3DV-TM allows random access into the bitstream with a new type of access unit, referred to as a gradual view refresh (GVR) access unit. This section reviews GVR briefly, while an in-depth analysis of GVR is available in [13].

MVC enables random access through instantaneous decoding refresh (IDR) and anchor access units, which allow only inter-view prediction and disallow temporal prediction (a.k.a. inter prediction). All access units following an IDR or anchor access unit in output order can be correctly decoded. GVR access units are coded in such a way that inter prediction is selectively enabled, and hence a compression improvement compared to IDR and anchor access units may be obtained. When decoding is started from a GVR access unit, a subset of the views in the multiview bitstream can be accurately decoded, while the remaining views can only be approximately reconstructed. The encoder selects which views are refreshed in a GVR access unit and codes these view components in the GVR access unit without inter prediction, while the remaining non-refreshed views may use both inter and inter-view prediction. Accurate decoding of all views can be achieved in a subsequent IDR, anchor, or GVR access unit.

Fig. 1 presents an example bitstream where GVR access units are coded at every other random access point. It is assumed that the frame rate is 30 Hz and that random access points are coded every half a second. In the example, GVR access units refresh the base view only, while the non-base views are refreshed once per second with anchor access units.

When decoding is started from a GVR access unit, the texture and depth view components which do not use inter prediction are decoded. Then, DIBR may be used to reconstruct those views that cannot be decoded because inter prediction was used for them. It is noted that the separation between the base view and the synthesized view is selected based on the rendering preferences for the used display environment and therefore need not be the same as the camera separation between the coded views. Fig. 2 presents an example of the decoder side operation when decoding is started at a GVR access unit of the bitstream presented in Fig. 1.


HANNUKSELA et al.: MULTIVIEW-VIDEO-PLUS-DEPTH CODING 3451

Fig. 2. Decoder operation when starting decoding from a GVR access unit at picture order count 15.

Fig. 3. High-level flow chart of the texture encoder in the Nokia 3DV-TM.

III. DEPTH-BASED ENHANCED TEXTURE CODING TOOLS

A. Introduction

Nokia 3DV-TM was designed with the assumption that most of the compression gain for MVD coding can be achieved from improvements in texture coding, since the bitrate budget for depth maps is usually a minor share of the total MVD bitrate [12]. Therefore, Nokia 3DV-TM includes two texture coding tools that utilize depth information: view synthesis prediction (VSP) and depth-based motion vector prediction (D-MVP). Fig. 3 shows a high-level flowchart of the texture coding in Nokia 3DV-TM with the VSP and D-MVP modules marked in red.

B. View Synthesis Prediction (VSP)

In VSP, an already decoded texture view component is projected according to camera parameters to the viewing point of the currently (de)coded dependent view using DIBR, as described in many earlier papers, such as [14]. In Nokia 3DV-TM the projected image is included in the reference picture list(s) and serves as a reference for motion compensated prediction (MCP). The VSP implemented in Nokia 3DV-TM is similar to that presented in [15]. However, in order to keep the syntax of the macroblock coding layer unchanged, Nokia 3DV-TM does not include the specific VSP skip and direct modes of [15].

There are multiple DIBR implementations available, including different projection and post-processing techniques. The DIBR algorithm of VSP utilized in Nokia 3DV-TM was implemented using the 1D image projection of the MPEG view synthesis reference software (VSRS) [16], [17]. As part of the DIBR algorithm, the depth sample values are converted to disparity vectors, which are rounded to quarter-pixel accuracy, resulting in a virtual image t(x, y) that is horizontally four times the size of the source image s(x, y).
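The paper does not spell out the depth-to-disparity conversion itself. A commonly used VSRS-style mapping from an 8-bit depth sample to a quarter-pel disparity can be sketched as below; the focal length and baseline are assumed camera parameters, and the specific formula is an assumption, not a quotation of the Nokia 3DV-TM implementation.

```python
def depth_to_disparity_qpel(v, z_near, z_far, focal, baseline):
    """Convert an 8-bit depth-map sample to a quarter-pel disparity.

    Assumed VSRS-style mapping: the depth sample v in [0, 255] is first
    mapped to a real-world depth Z between z_near (v = 255) and z_far
    (v = 0), then disparity = focal * baseline / Z, rounded to the
    quarter-pixel grid.
    """
    z = 1.0 / ((v / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)
    disparity = focal * baseline / z
    return round(4.0 * disparity) / 4.0  # snap to quarter-pel accuracy
```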

In order to be used as a reference picture for MCP, the virtual image t(x, y) is downsampled using the default filter of VSRS before being inserted in the initial reference picture lists at a position subsequent to the temporal and inter-view reference frames. The reference picture list modification syntax was extended to support VSP reference pictures, thus any ordering of the reference picture lists is allowed.

C. Depth-Based Motion Vector Prediction (D-MVP)

In this sub-section we review the H.264/AVC motion vector prediction with the goal of explaining its shortcomings for MVD coding. We then introduce D-MVP, which is a novel feature in Nokia 3DV-TM. The D-MVP scheme is described in more detail in [18].

In H.264/AVC, the motion information associated with each prediction block of a current block (Cb) consists of three components: a reference index (refIdx) indicating the reference picture and the two spatial components of the motion vector (MVx and MVy). In order to reduce the number of bits required to encode the motion information, the blocks adjacent to Cb are used to produce a predicted motion vector (mvpx, mvpy), and the difference between the actual motion information of Cb and mvp is transmitted.

H.264/AVC specifies that the components of the predicted motion vector are calculated as the median of the corresponding motion vector components (MVx, MVy) of the neighboring blocks A, B, and C:

mvpx = median(MVx(A), MVx(B), MVx(C))
mvpy = median(MVy(A), MVy(B), MVy(C))    (1)

where the subscripts x and y indicate the horizontal and vertical components of the motion vector MV, respectively. The layout of the spatial neighbors (A, B, C) utilized in MVP is depicted in the top-left corner of Fig. 4. The motion vectors of the corresponding blocks (A, B, C) are marked accordingly (MV(A), MV(B), MV(C)).
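The component-wise median rule of (1) can be sketched as follows; this is a minimal illustration, and `median3` is a hypothetical helper, not part of the reference software.

```python
def median3(a, b, c):
    """Median of three values, as used by the H.264/AVC MVP rule."""
    return max(min(a, b), min(max(a, b), c))

def median_mvp(mv_a, mv_b, mv_c):
    """Component-wise median motion vector predictor, Eq. (1).

    Each argument is an (MVx, MVy) tuple taken from the neighboring
    blocks A, B, and C of the current block Cb.
    """
    mvp_x = median3(mv_a[0], mv_b[0], mv_c[0])
    mvp_y = median3(mv_a[1], mv_b[1], mv_c[1])
    return (mvp_x, mvp_y)
```

Note that the two components are predicted independently, which is exactly the property the next paragraph identifies as problematic when the neighbors use different prediction directions.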

As described in more detail in [18], the median MVP of H.264/AVC is not suitable for using more than one prediction direction (inter, inter-view, VSP), because it operates independently in the horizontal and vertical directions and because the magnitude of the motion vector components can differ to a great extent between different prediction directions. Therefore, in Nokia 3DV-TM we restricted the conventional median MVP of (1) to identical prediction directions. All available neighboring blocks are classified according to the direction of their prediction (temporal, inter-view, VSP). For example, if Cb uses



Fig. 4. Flow chart of direction-separated MVP.

an inter-view reference picture, all neighboring blocks which do not utilize inter-view prediction are marked as not available for MVP and are not considered in the median MVP of (1). The flowchart of this process is depicted in Fig. 4 for inter and inter-view prediction; it is applied similarly for VSP in Nokia 3DV-TM. Furthermore, we introduced a new default candidate vector, not present in the original H.264/AVC design, for the case when inter-view prediction is in use: if no motion vector candidates are available from the neighboring blocks, MVx is set to the average disparity D̄ which is associated with Cb and computed by (2):

D̄(Cb) = (1/N) Σi D(Cb(i))    (2)

where i is the index of a pixel within Cb, D(Cb(i)) is the disparity of pixel Cb(i), and N is the total number of luma pixels in Cb.
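The averaging in (2) can be sketched as follows; a minimal illustration that takes the per-pixel disparity values of Cb as a 2D list.

```python
def average_disparity(disparity_block):
    """Average disparity D̄(Cb) over all N luma pixels of Cb, Eq. (2).

    disparity_block: 2D list of per-pixel disparities D(Cb(i)).
    """
    values = [d for row in disparity_block for d in row]
    return sum(values) / len(values)
```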

In addition to the generic MVP defined by (1), there are two special modes in H.264/AVC, the Direct and Skip modes. In these modes, the motion vector components are predicted as shown in (1), whereas the minimal reference index used in the neighboring blocks (A, B, C) is selected for Cb. This selection of reference indices favors the prediction direction of the first reference picture in the reference picture list and hence constrains the use of the Direct and Skip modes for multiple prediction directions. We therefore introduced depth-based motion competition (DMC) into Nokia 3DV-TM, as described in the following paragraphs.

The flow chart of DMC in the Skip mode is shown in Fig. 5. In the Skip mode, the motion vectors {MVi} of the texture data blocks {A, B, C} are grouped according to their prediction direction. The DMC process, which is detailed in the grey block of Fig. 5, is performed for each group independently.

For each motion vector MVi within a given group, we first derive a motion-compensated depth block d(Cb, MVi), where the motion vector MVi is applied relative to the position of Cb to obtain the depth block from the reference picture pointed to by MVi. Then, we estimate the similarity of d(Cb) and d(Cb, MVi) by computing the sum of absolute differences

Fig. 5. Flowchart of the DMC for Skip mode in P Slice.

(SAD) as follows:

SAD(MVi) = SAD(d(Cb, MVi), d(Cb))    (3)

The MVi that provides the minimal SAD value within the current group is selected as the optimal predictor for that particular direction (mvpdir). Following this, the predictor in the temporal direction (mvptemp) is compared to the predictor in the inter-view direction (mvpinter), and the predictor which provides the minimal SAD is used in the Skip mode.

The MVP for the Direct mode of B slices is very similar to that of the Skip mode, but DMC (marked with grey blocks) is performed over both reference picture lists (List 0 and List 1) independently. Thus, for each prediction direction (temporal or inter-view), DMC produces two predictors (mvp0dir and mvp1dir) for List 0 and List 1, respectively. The SAD values of mvp0dir and mvp1dir are computed as shown in (3) and averaged to form the SAD of bi-prediction for each direction independently. Finally, the MVP for the Direct mode is selected among mvpinter and mvptemp based on which one produces the smaller SAD, similarly to the Skip mode.
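The SAD-based competition of (3) within one prediction-direction group can be sketched as follows. This is a minimal sketch; `fetch_depth_block` is a hypothetical callback standing in for the motion-compensated depth access d(Cb, MVi), and the surrounding grouping by prediction direction is omitted.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized depth blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def dmc_select(cb_depth, candidates, fetch_depth_block):
    """Depth-based motion competition for one prediction-direction group.

    cb_depth: the depth block d(Cb) of the current block.
    candidates: candidate motion vectors {MVi} of the group.
    fetch_depth_block(mv): returns the motion-compensated depth block
    d(Cb, MVi) from the reference picture (assumed callback).
    Returns the MVi minimizing SAD(d(Cb, MVi), d(Cb)), per Eq. (3).
    """
    return min(candidates,
               key=lambda mv: sad(fetch_depth_block(mv), cb_depth))
```

Running the same selection once per group and then comparing the per-group winners by their SAD values reproduces the mvptemp versus mvpinter decision described above.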

IV. DEPTH CODING TOOLS

A. Introduction

The following depth coding tools were included in Nokia 3DV-TM: joint view depth filtering (JVDF), intended to improve the fidelity of depth maps across views; depth-range-based weighted prediction (DRWP), which utilizes transmitted depth range parameters for deriving weighted prediction parameters implicitly; and VSP, which operates as described above. The interaction of these tools with the other depth coding blocks is illustrated in Fig. 6. VSP has been described in Section III.B; JVDF and DRWP are described in the next sub-sections.

B. Joint View Depth Filtering (JVDF)

The main idea of JVDF is that depth map filtering can utilize the redundancy of the multi-view depth map representation, so the depth maps of all available N viewpoints are filtered jointly.



Fig. 6. High-level flow chart of the depth encoder in the Nokia 3DV-TM.

JVDF attempts to make depth maps of the same time instant consistent across views and hence removes depth estimation and coding errors. JVDF is similar to, but somewhat simpler than, the approach proposed in [19]. A detailed description and simulation results for JVDF can be found in [20], while a brief description of the JVDF algorithm is presented next.

All available depth maps are first warped to a single view m. Since warping results in multiple estimates of a noise-free depth map value at a spatial location (xm, ym), we select the samples among which the filtering is carried out. We assume that the depth value Zm of view m is relatively accurate, and therefore a correctly projected depth value Zi from another view which describes the same object should be close in value to Zm. A classification of similarity is defined through a confidence range and a threshold T on the absolute difference between Zi and Zm. Depth values Zi for which the absolute difference exceeds the threshold T are excluded from the joint filtering, whereas the other depth values at location (xm, ym) are averaged in order to produce a "noise-free" estimate. After that, the produced "noise-free" estimate of the depth value is warped back to the corresponding views that participated in the joint filtering. Depth map values which were found to be outliers remain unchanged.
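The confidence-range test and averaging at a single location (xm, ym) can be sketched as follows; a minimal illustration that assumes the warping to view m has already been performed.

```python
def jvdf_filter(z_m, z_others, threshold):
    """Joint filtering of co-located depth values warped to view m.

    z_m: the depth value of view m at (xm, ym), assumed relatively accurate.
    z_others: depth values projected from the other views to the same location.
    Values whose absolute difference from z_m exceeds the threshold are
    treated as outliers and excluded; the remaining values are averaged
    with z_m into a "noise-free" estimate.
    """
    inliers = [z_m] + [z for z in z_others if abs(z - z_m) <= threshold]
    return sum(inliers) / len(inliers)
```

In the full scheme the returned estimate would be warped back to every view that contributed an inlier, while outlier samples keep their original values.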

C. Depth-Range-Based Weighted Prediction (DRWP)

Since every frame of the depth map may be created with a different Znear/Zfar, the same actual Z values can be represented with different depth map values. To compensate for this mismatch, we introduced a novel coding tool, DRWP, which can be utilized to produce weights for the weighted prediction process of H.264/AVC. The weighted prediction of H.264/AVC is implemented as shown in (4):

v2 = ⌊v1 · W + Offset + 0.5⌋    (4)

In DRWP, the weighted prediction parameters W and Offset are computed as follows:

W = ((Zfar1 − Znear1) / (Zfar2 − Znear2)) · ((Znear2 · Zfar2) / (Znear1 · Zfar1))    (5)

Offset = 255 · (Zfar2 / Zfar1) · (Zfar2 − Zfar1) / (Zfar2 − Znear2)    (6)

where variables with subscript 1 represent the parameters of the currently coded/decoded depth image and variables with subscript 2 represent the parameters of the reference depth map image.
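The derivation of W and Offset and their use in (4) can be sketched as follows; a minimal floating-point sketch assuming the fraction grouping of (5) and (6) as written above, not the normative fixed-point weighted-prediction arithmetic of H.264/AVC.

```python
import math

def drwp_params(znear1, zfar1, znear2, zfar2):
    """Implicit weighted-prediction parameters from depth ranges, Eqs. (5)-(6).

    Subscript 1: currently coded/decoded depth image.
    Subscript 2: reference depth image.
    """
    w = ((zfar1 - znear1) / (zfar2 - znear2)) * \
        (znear2 * zfar2) / (znear1 * zfar1)
    offset = 255.0 * (zfar2 / zfar1) * (zfar2 - zfar1) / (zfar2 - znear2)
    return w, offset

def drwp_predict(v1, w, offset):
    """Weighted prediction of a depth sample, Eq. (4)."""
    return math.floor(v1 * w + offset + 0.5)
```

When the current and reference frames share the same depth range, W evaluates to 1 and Offset to 0, so the prediction degenerates to an identity mapping, as expected.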

V. CODING CONDITIONS AND RESULTS

A. Introduction

In this section, we present the coding and view synthesis performance results of Nokia 3DV-TM. The results are based on the conditions of the MPEG 3DV CfP [5], which are described in sub-section V.B. The Nokia 3DV-TM encoding and view synthesis settings used to fulfill the requirements of the MPEG 3DV CfP are presented in sub-section V.C. Objective and subjective performance results are presented in sub-sections V.D and V.E, respectively. Sub-section V.F analyzes the tool-wise performance.

B. MPEG 3DV CfP

Eight test sequences were used, as summarized in Table I. As can be observed from the table, four of the sequences have a 1080p resolution while the other four have a resolution of 1024×768. Six of the sequences were captured with a multi-camera setup, while two sequences were fully or mostly computer-generated.

Bitstreams were encoded for two test scenarios: the 2-view scenario (C2), providing disparity adjustment capability for stereoscopic displays, and the 3-view scenario (C3), additionally providing the capability of view synthesis for multiview autostereoscopic displays. The input views selected for encoding in the two scenarios are presented in Table III. In order to test the operation of the codec at a wide range of operation points, bitstreams were generated for four bitrates, R1 to R4, listed in Table II.

The JMVC reference software [21] of MVC was used as the anchor for comparisons in the CfP. Texture views were coded as one MVC bitstream and the depth views were coded as another MVC bitstream. The anchor bitstreams follow the same constraints imposed on the proposals. MPEG VSRS was used for DIBR for the anchor encoding. For further information, the anchor bitstreams as well as the configuration files are available on-line [16].

Two viewing environments, a stereoscopic glasses-based one and an autostereoscopic one, were used in the subjective evaluations organized by MPEG. Stereoscopic viewing was performed on a 46" stereo display with passive glasses, while in the autostereoscopic viewing 28 views were displayed simultaneously on a 52" panel. The subjective test setup is described in [5].



TABLE I
MPEG 3DV CfP TEST SEQUENCES

TABLE II
BITRATES FOR RATE POINTS R1 TO R4 OF C2 AND C3

TABLE III
INPUT VIEWS FOR CODING, GENERATED STEREO PAIRS FOR VIEWING, AND CAMERA SEPARATION FOR SYNTHESIZED VIEWS (FOR C3)

C2 was tested only with the stereoscopic display, while C3 was tested both with the stereoscopic and the autostereoscopic display. Synthesized views were generated from the decoded texture and depth views as summarized in Table III. The displayed views for C2 are also indicated in the “Stereo pair” column of Table III. As can be observed, the displayed stereo pair consisted of one decoded view and one synthesized view, which is a reasonable assumption for disparity-adjusted stereoscopic viewing. For C3 and stereo viewing, views were synthesized at regular intervals as presented in Table III. Out of the synthesized views, one fixed stereo pair and, for the lower-resolution sequences, also one randomly selected stereo pair were evaluated as indicated in Table III. For C3 and autostereoscopic viewing, 28 adjacent views were randomly selected from all the synthesized and coded views. The same random selections were made for all submissions, but the proponents were not aware of the random selections at the time of the submission, in order to avoid tuning of the encoding settings to favour certain views.

C. Encoding and View Synthesis Settings for Nokia 3DV-TM

Nokia 3DV-TM coding tools as well as high-level syntax and codec operation were implemented on top of the JM 17.2 reference software [22] of H.264/AVC.

In C3, the PIP inter-view prediction structure was used, in which the central view is the base view used as an inter-view reference for coding the two side views. In C2, the non-base view was inter-view-predicted from the base view. As governed by the CfP, the random access period was set to 12 and 15 for 25-Hz and 30-Hz sequences, respectively. GVR was used at every other random access point.

Quantization parameter (QP) cascading was used between views, i.e. side views were quantized more coarsely with a QP value increase of 3 compared to the QP value of the base view. Subjective testing was performed to verify that the selected inter-view QP cascading was preferred over some other tested options for selecting QP values across views [23].

A dyadic inter prediction hierarchy was used. Temporal QP cascading was used for different temporal levels of the inter prediction hierarchy, as proposed in [24], i.e. intra frames were coded with a certain QP value b, while pictures at temporal level n ≥ 1 were coded with QP b + 3 + n. The value b was kept unchanged for the entire coded view sequence.
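The combined effect of the inter-view and temporal QP cascading described above can be sketched as follows. This is an illustrative reading of the rules, not the encoder's actual configuration code; the function name and the assumption that the side-view offset of 3 applies to that view's intra QP are ours.

```python
def picture_qp(base_view_intra_qp, temporal_level, is_base_view):
    """Illustrative QP derivation for one picture under the cascading above:
    - side views are quantized more coarsely: QP value increased by 3;
    - intra pictures (temporal level 0) use the view's QP value b;
    - pictures at temporal level n >= 1 use QP b + 3 + n."""
    b = base_view_intra_qp if is_base_view else base_view_intra_qp + 3
    n = temporal_level
    return b if n == 0 else b + 3 + n
```

For example, with a base-view intra QP of 26, a side-view picture at temporal level 2 would be coded with QP 34 under this reading.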

The same inter-view and GOP prediction patterns as well as the same QP cascading scheme were used for both texture and depth. The QP values were selected manually to match the target bitrates, with an emphasis on the selection of the texture QP value and using the depth QP as a mechanism for finer-granularity bitrate matching.

Depth views were selected to be coded at quarter resolution from their original resolution listed in Table I. To produce the reduced-resolution depth map data, the linear 13-tap filter specified in the JSVM reference software [25] for the Scalable Video Coding extension of H.264/AVC was utilized. After decoding, the depth map data is up-sampled back to the original resolution with a simple 2-tap bilinear filter. However, more advanced upsampling approaches may be used in order to preserve depth contours and yield a better subjective and objective quality [26].
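The 2-tap bilinear up-sampling step can be sketched as below. This is a minimal integer sketch assuming a factor-of-two upscale per direction with edge replication; the exact filter phases and rounding used in the codec may differ.

```python
def bilinear_upsample_1d(row):
    """Double a 1-D row of depth samples: even output positions copy the
    input sample, odd positions average the two neighbours (2-tap filter);
    the last sample is replicated at the edge."""
    out = []
    for i, v in enumerate(row):
        out.append(v)
        nxt = row[i + 1] if i + 1 < len(row) else v  # edge replication
        out.append((v + nxt) // 2)
    return out

def bilinear_upsample_2d(img):
    """Separable 2-D upsampling: rows first, then columns."""
    rows = [bilinear_upsample_1d(r) for r in img]
    cols = [bilinear_upsample_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```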

The spatial resolution of the texture views was automatically selected from three options: full resolution (as listed in Table I), ¾ resolution, and ½ resolution horizontally and vertically. For each spatial resolution, the Mean Square Error (MSE) of the reconstructed image upsampled to the full resolution was calculated against the original image. The resolution providing the smallest MSE value was selected. If two resolutions provided approximately equal MSE within a 5% margin, the smaller resolution of the two was selected. Later on, we studied a resolution selection algorithm based on frequency analysis, which provided better resolution selection subjectively, particularly for the 1080p test sequences [27].
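The MSE-driven selection with the 5% preference margin can be sketched as follows. The generalization from "two resolutions" to an arbitrary candidate set is our reading of the rule, and the function name is hypothetical.

```python
def select_resolution(mse_by_scale, margin=0.05):
    """Pick a coding resolution: the smallest MSE wins, but any smaller
    resolution whose MSE is within `margin` (5%) of the best is preferred.
    `mse_by_scale` is a list of (scale_factor, mse) pairs, e.g. for the
    full, 3/4, and 1/2 resolutions."""
    best_mse = min(m for _, m in mse_by_scale)
    # candidates: approximately equal MSE within the margin of the best
    near_best = [(s, m) for s, m in mse_by_scale if m <= best_mse * (1 + margin)]
    # among those, choose the smallest spatial resolution
    return min(near_best, key=lambda sm: sm[0])[0]
```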

The presented enhanced texture coding tools were used for all non-base texture views. Hence, non-base depth view components preceded the respective texture view components in coding/decoding order. The presented depth coding tools were used except for VSP for depth, which was turned off mainly to reduce execution times.



TABLE IV

CODING EFFICIENCY OF NOKIA 3DV-TM VS. REFERENCE MVC CODING,

IN TERMS OF BJONTEGAARD METRICS

The 1D parallel mode of the MPEG VSRS [16], [17] was used for DIBR at the post-processing stage. The same VSRS configuration files and camera parameters as those used for the anchor encoding were utilized as such [16].

D. Objective Results

The objective quality of the decoded views of Nokia 3DV-TM bitstreams was measured against the MVC reference encoding (anchor), where texture views were coded as one MVC bitstream and depth views were coded as another MVC bitstream. The commonly used Bjontegaard delta bitrate (dBR) and Bjontegaard delta Peak Signal-to-Noise Ratio (dPSNR) [28] metrics were applied in comparing the rate-distortion (RD) curves of Nokia 3DV-TM against the reference MVC results. These metrics were produced by taking into account the total bitrate required for MVD data transmission and the PSNR results for the luma component of the decoded texture views. The obtained results are presented in Table IV.
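For reference, the Bjontegaard delta bitrate [28] fits log-rate as a cubic polynomial in PSNR for each RD curve and averages the gap between the fits over the overlapping PSNR range. A minimal sketch of the computation, assuming four rate points per curve (our implementation, not the exact VCEG script):

```python
from math import log10

def _solve(a, b):
    """Solve a small linear system a.x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def _cubic_through(xs, ys):
    """Coefficients c0..c3 of the cubic through four (x, y) points."""
    return _solve([[x ** k for k in range(4)] for x in xs], ys)

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta bitrate in percent for four-point RD curves:
    fit log10(rate) as a cubic in PSNR, average the gap between the two
    fits over the overlapping PSNR range, and convert back to a
    percentage. Negative values mean the test codec needs fewer bits."""
    ca = _cubic_through(psnr_anchor, [log10(r) for r in rate_anchor])
    ct = _cubic_through(psnr_test, [log10(r) for r in rate_test])
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    def integral(c, x):  # antiderivative of the cubic evaluated at x
        return sum(c[k] * x ** (k + 1) / (k + 1) for k in range(4))
    avg_diff = ((integral(ct, hi) - integral(ct, lo))
                - (integral(ca, hi) - integral(ca, lo))) / (hi - lo)
    return (10.0 ** avg_diff - 1.0) * 100.0
```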

As can be observed from Table IV, Nokia 3DV-TM outperformed MVC coding with a clear margin. The RD improvement compared to MVC in C2 is smaller than that in C3, because the coding tools in Nokia 3DV-TM improve particularly the coding efficiency of non-base texture views. It should be noted that the perceived quality obtained with MVD coding is not limited to the quality of the decoded texture views, although it serves as a clear indicator of the coding technology performance; the quality of synthesized views should also be taken into account. However, for most of the synthesized views there is no original data available, and hence full-reference objective quality metrics, such as PSNR, are not applicable as such. Therefore, this component of MVD coding (i.e. the quality of synthesized views) was evaluated through a large-scale formal viewing procedure arranged by MPEG. It is also noted that in an analysis of the subjective evaluation results of the MPEG 3DV CfP it was discovered that the PSNR of the decoded view had the highest correlation to subjective ratings of displayed stereo pairs with one decoded view and one synthesized view [29]. Consequently, the results of Table IV can be considered indicative of the subjective quality improvement of synthesized stereo pairs too.

[Fig. 7 appears here as a line chart: average subjective rating (0.00–10.00) plotted against bitrate (500–1100 kbps) for H.264/MVC and Nokia 3DV-TM.]

Fig. 7. An example of subjective viewing results performed for the MPEG 3DV CfP evaluation. The average subjective ratings of Nokia 3DV-TM (the C2 case, sequence S06) are compared against the reference MVC results.

TABLE V

BITRATE SAVING OF NOKIA 3DV-TM VS. REFERENCE MVC CODING

OBTAINED FROM THE MEAN OPINION SCORE

E. Subjective Results

A detailed review of the MPEG 3DV CfP test arrangement and results is provided in [30], while the results for Nokia 3DV-TM compared to MVC are analyzed in greater detail in this paper. The Double Stimulus Impairment Scale (DSIS) test method was used in the subjective testing with 11 quality levels, where 10 indicates the highest quality and 0 indicates the lowest quality.

In order to provide results that are intuitively comprehensible as well as comparable to the dBR results presented in the previous sub-section, we analyzed the graphs of the mean opinion score (MOS) plotted against the target bitrate. An example of such a curve is provided in Fig. 7 for the C2 case. In the figure, the obtained MOS points from the subjective testing are piece-wise linearly connected. For the MOS range that overlaps between the curves, the bitrates of the MVC curve were compared to the bitrates of the Nokia 3DV-TM curve at an equal MOS value. The bitrate saving averaged over the overlapping MOS range is presented in Table V. It can be observed that the objective bitrate results presented in Table IV are approximately aligned with the bitrate saving results at equal MOS values. There are sequence-wise differences, such as for the 2-view coding of sequence S04, where the subjective scores of Nokia 3DV-TM were slightly inferior to those of MVC, whereas objective coding results



TABLE VI

ESTIMATED TOOL-WISE AVERAGE BJONTEGAARD BITRATE REDUCTION

IN NOKIA’S 3DV CfP RESPONSE (%)

TABLE VII

BJONTEGAARD DELTA BITRATE IMPACT OF VSP AND D-MVP (%)

indicate that Nokia 3DV-TM clearly outperformed MVC for 2-view coding of S04. We suspect that this behavior is due to RD optimization in the encoder causing some residual blocks to be left uncoded, creating a clearly visible temporal trail. This problem was later fixed by an improved texture view resolution selection algorithm [27].
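The equal-MOS bitrate comparison behind Table V can be sketched as follows: both MOS-vs-bitrate curves are treated as piece-wise linear, inverted to give a bitrate at a given MOS, and the relative difference is averaged over the overlapping MOS range. This is our reconstruction of the procedure with hypothetical function names; the sampling density is arbitrary.

```python
def bitrate_at_mos(curve, mos):
    """Invert a piece-wise linear MOS-vs-bitrate curve.
    `curve`: (bitrate, mos) pairs sorted by bitrate, MOS non-decreasing."""
    for (r0, m0), (r1, m1) in zip(curve, curve[1:]):
        if m0 <= mos <= m1:
            if m1 == m0:
                return r0
            return r0 + (r1 - r0) * (mos - m0) / (m1 - m0)
    raise ValueError("MOS outside the curve range")

def average_bitrate_saving(anchor, test, samples=100):
    """Average relative bitrate difference (%) of `test` vs `anchor` over
    the MOS range covered by both curves; negative values are savings."""
    lo = max(anchor[0][1], test[0][1])
    hi = min(anchor[-1][1], test[-1][1])
    total = 0.0
    for i in range(samples + 1):
        m = lo + (hi - lo) * i / samples
        ra, rt = bitrate_at_mos(anchor, m), bitrate_at_mos(test, m)
        total += (rt - ra) / ra
    return 100.0 * total / (samples + 1)
```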

F. Objective Results Per Coding Tool

This sub-section attempts to give a summary of the contribution of the main tools of Nokia 3DV-TM to the objective quality improvement relative to the MVC anchor of the MPEG 3DV CfP. Table VI presents a summary of the tool-wise estimated coding gain in dBR for the MPEG 3DV CfP coding conditions, while the next paragraphs provide more details on how the estimates were obtained. A more detailed analysis of the most substantial coding tools and encoding methods has been provided in [13], [18], [20], [27]. It can be observed that the total RD improvement reported in Table VI does not add up to the RD improvement reported in Table IV. The remainder of the RD gain can be considered to result from various encoder algorithms and configurations.

VSP and D-MVP were tested using the common test conditions [31] of the JCT-3V by turning off these tools individually. The results of this experiment are reported in Table VII. Even if the common test conditions differ from the conditions we used in the submission to the MPEG 3DV CfP, for example when it comes to QP settings, it can be estimated based on these results that the depth-based texture coding tools provided roughly a 10–15% dBR reduction in C3.

The impact of the depth coding tools, namely JVDF and DRWP, was analyzed as follows. As presented in [20], JVDF applied to full-resolution depth maps provides a noticeable coding gain (up to 10% dBR) for the noisy depth maps of natural test sequences. However, taking into account the low-resolution depth map coding used in Nokia 3DV-TM and the small share of the depth bitrate in the total MVD bitrate (about 11% in Nokia's response to the MPEG 3DV CfP), the impact of JVDF in Nokia's CfP response can be estimated to be up to 1% dBR of the total bitrate of texture and depth views. DRWP is applicable only if the depth range varies during the coded sequence, and among the sequences in the CfP only GT_Fly (S04) is such a sequence. The impact of DRWP for GT_Fly was measured to be 5.6% dBR of the coded depth views, which corresponds to 0.15% dBR of all coded views [32]. In conclusion, as the depth bitrate share was small in Nokia's CfP response and as the underlying coding technology remained the same as in the MVC anchor, the depth coding tools (i.e., JVDF and DRWP) have a minor impact on the overall results presented in sub-sections V.D and V.E.

GVR was analyzed in [13], where it was concluded that an RD gain was achieved in 3 out of the 8 test sequences in the MPEG 3DV CfP. The average texture coding gain of GVR over all 8 test sequences was 1.4% and 3.0% dBR in C2 and C3, respectively, and the estimated impact of the respective depth coding gain on the total bitrate was 0.1% and 0.2% dBR in C2 and C3, respectively.

VI. DISCUSSION ON COMPLEXITY AND STANDARDIZATION DEVELOPMENT

A. Introduction

Nokia 3DV-TM was intended to include coding tools and coding approaches that have significant potential in terms of coding efficiency, while the optimization and finer tuning of these coding tools were expected to be conducted during the collaboration phase of the standardization process. With this approach, Nokia 3DV-TM provided a solid starting point for collaboration, and it was selected as the initial basis for the 3DV-ATM reference test model [33]. The later development within MPEG and JCT-3V has resulted in rigorous study of complexity-performance tradeoffs for tools in 3DV-ATM, complexity-optimized solutions, and improved compression efficiency.

In this section, we briefly describe the relevant tools in terms of their complexity, complexity optimization techniques, and the evolution of the presented coding tools within the scope of the 3D-AVC specification.

B. Depth-Based Enhanced Texture Coding Tools

The original design of VSP in Nokia 3DV-TM was implemented with a forward view synthesis approach (F-VSP) [15]. This process is considered to be demanding in terms of memory use and processing power. For example, the sub-pixel-based processing of F-VSP and the hole and occlusion handling, inherently part of F-VSP, significantly increase memory access rates and disable the block-based processing concept which is typically utilized in state-of-the-art video coding systems. As a result of the analysis provided in [34], it was shown that in-loop F-VSP is the most computationally demanding module of the 3DV-ATM, requiring about 30–40% of the total decoding time. This was considered unacceptable



and hence the original F-VSP design was replaced by a backward VSP approach (B-VSP) [34] utilizing the depth-first coding order for non-base views. In B-VSP, the depth view component of the current non-base view is used to derive a block-wise disparity to obtain a prediction block from an adjacent texture view component. This design is aligned with the conventional MCP of H.264/AVC and was found to provide comparable compression efficiency to F-VSP [34].
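The block-wise depth-to-disparity conversion at the heart of B-VSP can be sketched for a parallel (1-D) camera setup as below. The inverse-depth quantization of 8-bit depth samples follows the usual MPEG MVD convention; taking the block's maximum depth sample as the representative is one common simplification, not necessarily the normative 3D-AVC rule.

```python
def depth_to_disparity(d, z_near, z_far, focal_length, baseline, bitdepth=8):
    """Map a quantized depth sample to a horizontal disparity in pixels.
    Samples are assumed inverse-depth quantized: the maximum value maps
    to z_near and 0 maps to z_far. Disparity = focal_length * baseline / Z."""
    max_d = (1 << bitdepth) - 1
    z = 1.0 / (d / max_d * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)
    return focal_length * baseline / z

def bvsp_block_disparity(depth_block, **camera):
    """Derive one disparity for a whole block from its depth samples
    (here: the nearest sample, i.e. the maximum depth value), which can
    then locate the prediction block in the adjacent texture view."""
    return depth_to_disparity(max(depth_block), **camera)
```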

The original D-MVP design described in this paper consists of two conceptual modules, direction-separated MVP (DS-MVP) and depth-based motion competition (DMC). The complexity of DS-MVP can be considered very close to that of the original MVP design in H.264/AVC. The only change is the computation of a disparity vector, which is required if no other candidate is available. The disparity derivation described in this paper specifies computing the average depth value over d(Cb) as shown in (2). However, the spatial redundancy of depth information allowed a simple sub-sampling approach to replace the averaging procedure of (2), as proposed in [32]. The complexity of DMC, in turn, can be regarded as significant, since a sum of absolute differences (SAD) operation between two depth blocks should be performed at the decoder side for up to three pairs of blocks. Therefore, DMC was replaced during the 3D-AVC standardization by the simpler depth-based MVP process for the Skip and Direct modes proposed in [35].
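The simplification mentioned above, replacing the block average of (2) with sparse sub-sampling, can be illustrated as follows. The corner-sample pattern is our illustrative choice; the exact sub-sampling pattern adopted in [32] may differ.

```python
def block_average_depth(depth):
    """Average depth over the whole block d(Cb), as in (2)."""
    return sum(sum(row) for row in depth) / (len(depth) * len(depth[0]))

def corner_subsampled_depth(depth):
    """Sub-sampling shortcut: average only the four corner samples.
    Depth maps are spatially smooth, so this tracks the full average
    at a fraction of the operations."""
    h, w = len(depth) - 1, len(depth[0]) - 1
    return (depth[0][0] + depth[0][w] + depth[h][0] + depth[h][w]) / 4.0
```

On a smoothly varying (planar) depth block the two agree exactly; on natural depth data they are typically close.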

C. Depth Coding Tools

In terms of the number of operations, the complexity of JVDF can be considered insignificant. JVDF processing can be estimated to require seven operations per pixel, which is significantly lower than the average complexity of H.264/AVC interpolation, for example. However, the pixel-wise operation and the relatively large amount of required memory accesses made this tool relatively complex, similarly to F-VSP. These facts and the fairly small coding gain led to excluding JVDF from the normative part of 3D-AVC. However, JVDF remains a part of 3DV-ATM as a non-normative pre-processing and post-processing tool.

The complexity of DRWP is considered to be negligible, since it introduces no changes to the block-level processing of H.264/AVC. However, the original implementation of DRWP described in this paper utilizes floating-point calculations (4)–(6), which are performed at the slice level. As floating-point operations may be rounded differently in different computing systems and are computationally more demanding than fixed-point operations, an implementation of (4)–(6) in fixed-point arithmetic was proposed in [32] and was adopted into 3D-AVC.
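The float-to-fixed-point issue can be illustrated with the weight representation used by H.264/AVC explicit weighted prediction, where a real-valued scale becomes an integer weight plus a shift. This is a generic sketch of the principle, not the specific derivation adopted in [32].

```python
def to_fixed_point(scale, shift=6):
    """Quantize a floating-point weighting factor to an integer weight:
    weight = round(scale * 2**shift). All later per-sample arithmetic is
    then integer-only and bit-exact across platforms."""
    return int(round(scale * (1 << shift))), shift

def weighted_sample(sample, weight, shift, offset=0):
    """Integer-only weighted prediction of one sample with rounding."""
    return ((sample * weight + (1 << (shift - 1))) >> shift) + offset
```

For example, a scale of 1.5 with a 6-bit denominator becomes the integer weight 96, so `weighted_sample` needs only a multiply, add, and shift per sample.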

D. Gradual View Refresh

Finally, GVR is included in 3DV-ATM as a non-normative tool. The use of GVR can be signaled with supplemental enhancement information messages, as explained in [33]. GVR does not involve additional coding tools, and hence its complexity impact can be considered negligible.

VII. CONCLUSION

This paper describes the Nokia 3D video coding test model (Nokia 3DV-TM), which was found to be the best-performing submission in the H.264/AVC compatible category of the call for proposals (CfP) on 3D video coding technology organized by the Moving Picture Experts Group (MPEG). Both objective coding performance results and subjective viewing experience results were provided and compared against the coding of texture views as one Multiview Video Coding (MVC) bitstream and depth views as another MVC bitstream. As a result of the CfP evaluation, Nokia 3DV-TM was selected as a starting point for the development of MVC and H.264/AVC compatible 3D video coding standards.

ACKNOWLEDGMENT

The authors would like to thank T. Utriainen, E. Pesonen, and S. Jumisko-Pyykkö from the laboratory of Human-Centered Technology of Tampere University of Technology for performing systematic subjective testing supporting the development of Nokia 3DV-TM. Moreover, the authors thank Prof. M. Domanski et al. for providing the Poznan test sequences and their camera parameters [36].

REFERENCES

[1] Multi-View Video Plus Depth (MVD) Format for Advanced 3D Video Systems, document JVT-W100.doc, Joint Video Team, Apr. 2007.

[2] Advanced Video Coding for Generic Audiovisual Services, document H.264.doc, ITU-T Recommendation, Apr. 2013.

[3] T. Shibata, J. Kim, D. M. Hoffman, and M. S. Banks, “The zone of comfort: Predicting visual discomfort with stereo displays,” J. Vis., vol. 11, no. 8, p. 11, Jul. 2011.

[4] L. McMillan, Jr., “An image-based approach to three-dimensional computer graphics,” Ph.D. thesis, Dept. Comput. Sci., Univ. North Carolina, Charlotte, NC, USA, 1997.

[5] Call for Proposals on 3D Video Coding Technology, document N12036.doc, MPEG, Mar. 2011.

[6] High Efficiency Video Coding, document H.265.doc, ITU-T Recommendation, Apr. 2013.

[7] [Online]. Available: http://phenix.int-evry.fr/jct3v/

[8] MVC Extension for Inclusion of Depth Maps Draft Text 6, document JCT3V-C1001.doc, JCT-3V, Mar. 2013.

[9] Y. Chen, M. M. Hannuksela, T. Suzuki, and S. Hattori, “Overview of the MVC+D 3D video coding standard,” J. Vis. Commun. Image Represent., Apr. 2013.

[10] 3D-AVC Draft Text 6, document JCT3V-D1002.doc, JCT-3V, May 2013.

[11] Description of 3D Video Coding Technology Proposal by Nokia, document M22552.doc, MPEG, Nov. 2011.

[12] A. Vetro, S. Yea, and A. Smolic, “Toward a 3D video format for auto-stereoscopic displays,” Proc. SPIE Applications of Digital Image Processing XXXI, vol. 7073, pp. 1–12, Sep. 2008, doi:10.1117/12.797353.

[13] M. M. Hannuksela, L. Chen, D. Rusanovskyy, and H. Li, “Gradual view refresh in depth-enhanced multiview video,” in Proc. Picture Coding Symp., May 2012, pp. 141–144.

[14] S. Shimizu, M. Kitahara, H. Kimata, K. Kamikura, and Y. Yashima, “View scalable multiview video coding using 3-D warping with depth map,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 11, pp. 1485–1495, Nov. 2007.

[15] S. Yea and A. Vetro, “View synthesis prediction for multiview video coding,” Signal Process., Image Commun., vol. 24, nos. 1–2, pp. 89–100, Jan. 2009.

[16] MPEG View Synthesis Reference Software. [Online]. Available FTP: ftp.merl.com/pub/avetro/3dv-cfp/

[17] D. Tian, P.-L. Lai, P. Lopez, and C. Gomila, “View synthesis techniques for 3D video,” Proc. SPIE Applications of Digital Image Processing XXXII, vol. 7443, pp. 74430T-1–74430T-11, Sep. 2009, doi:10.1117/12.829372.

[18] W. Su, D. Rusanovskyy, M. M. Hannuksela, and H. Li, “Depth-based motion vector prediction in 3D video coding,” in Proc. Picture Coding Symp., May 2012, pp. 37–40.

[19] E. Ekmekcioglu, V. Velisavljevic, and S. T. Worrall, “Content adaptive enhancement of multi-view depth maps for free viewpoint video,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 2, pp. 352–361, Apr. 2011.

[20] R. Li, D. Rusanovskyy, M. M. Hannuksela, and H. Li, “Joint view filtering for multiview depth map sequences,” in Proc. IEEE Int. Conf. Image Process., Sep./Oct. 2012, pp. 1329–1332.

[21] MVC Reference Software, document N10897.doc, MPEG, Dec. 2009.

[22] JM Software. [Online]. Available: http://iphome.hhi.de/suehring/tml/download/old_jm/jm17.2.zip

[23] P. Aflaki, D. Rusanovskyy, T. Utriainen, E. Pesonen, M. M. Hannuksela, S. Jumisko-Pyykkö, and M. Gabbouj, “Study of asymmetric quality between coded views in depth-enhanced multiview video coding,” in Proc. IC3D, Dec. 2011.

[24] Hierarchical B Pictures, document JVT-P014.doc, JVT, Jul. 2005.

[25] JSVM Software. [Online]. Available: http://wftp3.itu.int/av-arch/jvt-site/2008_01_Antalya/JVT-Z203.zip

[26] P. Aflaki, M. M. Hannuksela, D. Rusanovskyy, and M. Gabbouj, “Nonlinear depth map resampling for depth-enhanced 3-D video coding,” IEEE Signal Process. Lett., vol. 20, no. 1, pp. 87–90, Jan. 2013.

[27] P. Aflaki, D. Rusanovskyy, M. M. Hannuksela, and M. Gabbouj, “Frequency based adaptive spatial resolution selection for 3D video coding,” in Proc. EUSIPCO, Aug. 2012, pp. 759–763.

[28] Calculation of Average PSNR Differences Between RD-Curves, document VCEG-M33.doc, ITU-T SG16 Q.6 (VCEG), Apr. 2001.

[29] P. Hanhart, F. De Simone, and T. Ebrahimi, “Quality assessment of asymmetric stereo pair formed from decoded and synthesized views,” in Proc. Int. Workshop QoMEX, Jul. 2012, pp. 236–241.

[30] Report of Subjective Test Results from the Call for Proposals on 3D Video Coding Technology, document N12347.doc, MPEG, Jan. 2012.

[31] Common Test Conditions of 3DV Core Experiments, document JCT3V-A1100.doc, JCT-3V, Jul. 2012.

[32] Calculation Process for Parameters of Depth-Range-Based Weighted Prediction with Fixed-Point/Integer Operations, document JCT3V-A0112.doc, JCT-3V, Jul. 2012.

[33] 3D-AVC Test Model 5, document JCT3V-C1003.doc, JCT-3V, Jan. 2013.

[34] 3DV-CE1.a: Block-Based View Synthesis Prediction for 3DV-ATM, document JCT3V-A0107.doc, JCT-3V, Jul. 2012.

[35] 3D-CE5.a Results on Motion Vector Competition-Based Skip/Direct Mode with Explicit Signaling, document JCT3V-A0045.doc, JCT-3V, Jul. 2012.

[36] Poznan Multiview Video Test Sequences and Camera Parameters, document M17050.doc, MPEG, Oct. 2009.

Miska M. Hannuksela (M’03) received the Master of Science degree in engineering and the Doctor of Science degree in technology from the Tampere University of Technology, Tampere, Finland, in 1997 and 2010, respectively.

He has been with Nokia since 1996 in different roles, including Research Manager and Leader in the areas of video and image compression, end-to-end multimedia systems, as well as sensor signal processing and context extraction. Currently, he works as a Distinguished Scientist with Multimedia Technologies, Nokia Research Center, Tampere. He has published more than 100 journal and conference papers and 100 standardization contributions in JCT-VC, JCT-3V, JVT, MPEG, 3GPP, and DVB. He has granted patents from more than 70 patent families. His current research interests include video compression and multimedia communication systems.

Dr. Hannuksela received the Best Doctoral Thesis award of the Tampere University of Technology in 2009 and the Scientific Achievement Award nominated by the Centre of Excellence of Signal Processing, Tampere University of Technology, in 2010. He has been an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY since 2010.

Dmytro Rusanovskyy received the Master of Science degree from the Kharkiv National University of Radioelectronics, Kharkiv, Ukraine, in 2000, and the Doctor of Science degree in technology from the Tampere University of Technology, Tampere, Finland, in 2009.

He is currently a Consultant in video architecture for LG Electronics. His current research interests include image and video processing and enhancement, 2-D/3-D video coding, and depth-enhanced video processing. He is the co-author of multiple journal and conference papers, standardization contributions, and patent applications.

Wenyi Su received the Bachelor of Science degree from Xidian University, Xi’an, China, in 2010. He is currently pursuing the master’s degree in signal and information processing with the University of Science and Technology of China, Hefei, China. His current research interests include 3-D video coding and processing.

Lulu Chen received the B.S. degree in electronic information engineering from the University of Science and Technology of China (USTC), Hefei, China, in 2010. She is currently pursuing the M.S. degree in signal and information processing with USTC. Her current research interests include 3-D video coding and processing.

She received the Best Paper Award of the 2012 Visual Communications and Image Processing conference with Dr. Hannuksela and Dr. Li.

Ri Li received the Bachelor of Engineering and master’s degrees in signal and information processing from the University of Science and Technology of China, Hefei, China, in 2009 and 2012, respectively. His current research interests include 3-D depth coding and pre/post-processing.

Payman Aflaki received the master’s degrees from the Polytechnic University of Turin, Turin, Italy, and the Polytechnic University of Catalonia, Catalonia, Spain, in 2008 and 2009, respectively. He is pursuing the Ph.D. degree with the Tampere University of Technology, Tampere, Finland. Since June 2011, he has been an External Researcher with the Nokia Research Center, Tampere, contributing actively to ongoing 3-D video coding standardization activities.

He has been working in 3-D video coding for four years. His current research interests include asymmetric 3-D video compression and depth-enhanced video processing/coding.

Deyan Lan received the master’s degree in signal and information processing from the University of Science and Technology of China, Hefei, China, in 2012. His current research interests include 3-D depth coding and pre/post-processing.

Michal Joachimiak received the master’s degree in computer science from the Lodz University of Technology, Lodz, Poland, in 2006. From 2007 to 2009, he was a Computer Vision Researcher with the Tampere University of Technology (TUT), Tampere, Finland. In 2009, he started Ph.D. studies in 3-D video coding with the Department of Signal Processing, TUT. His current research interests include 3-D video processing and coding.

Houqiang Li (M’10) received the B.S., M.Eng., and Ph.D. degrees from the University of Science and Technology of China (USTC), Hefei, China, in 1992, 1997, and 2000, respectively, all in electronic engineering.

He is currently a Professor with the Department of Electronic Engineering and Information Science, USTC. He has authored or co-authored over 90 papers in journals and conferences. His current research interests include video coding and communication, multimedia search, and image/video analysis.

He is an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY and serves on the Editorial Board of the Journal of Multimedia. He has served on technical/program committees and organizing committees, and as a program co-chair and track/session chair, for over ten international conferences. He was a recipient of the Best Paper Award of the International Conference on Mobile and Ubiquitous Multimedia from ACM in 2011 and a senior author of the Best Student Paper of the 5th International Mobile Multimedia Communications Conference (MobiMedia) in 2009.

Moncef Gabbouj (M’85–SM’95–F’11) received the B.S. degree in electrical engineering from Oklahoma State University, Stillwater, OK, USA, in 1985, and the M.S. and Ph.D. degrees in electrical engineering from Purdue University, West Lafayette, IN, USA, in 1986 and 1989, respectively.

He has been an Academy Professor with the Academy of Finland, Helsinki, Finland, since January 2011. He is currently with the Department of Electronic and Computer Engineering and the Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong. He was with the School of Electrical Engineering, Purdue University, from August 2011 to December 2011, and the Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA, from January 2012 to June 2012. He was a Senior Research Fellow with the Academy of Finland from 1997 to 1998 and 2007 to 2008. His current research interests include multimedia content-based analysis, indexing and retrieval, nonlinear signal and image processing and analysis, voice conversion, and video processing and coding.

Dr. Gabbouj has served as an Associate Editor of the IEEE TRANSACTIONS ON IMAGE PROCESSING and a Guest Editor of Multimedia Tools and Applications and the European Journal of Applied Signal Processing.