IEEE TRANSACTIONS ON BROADCASTING, VOL. 49, NO. 3, SEPTEMBER 2003 241

Temporal Compensated Motion Estimation With Simple Block-Based Prediction

Jisheng Wang, Dong Wang, and Wenjun Zhang

Abstract—Motion compensated video format conversion (MC-VFC) needs true motion vectors, which are characterized by spatial consistency and temporal extension. Based on these features, this paper presents a temporal compensated motion estimation algorithm with simple block-based prediction (TC-SBP). By exploiting the spatial-temporal correlation, the simple block-based prediction needs only three neighboring candidate vectors for each block, which greatly reduces its computational complexity. At the same time, three temporal compensated updating candidates are used to accelerate the convergence of the algorithm for the global motion field. Measured with criteria relevant to MC-VFC and video processing applications, the new TC-SBP algorithm is shown to outperform the alternatives.

Index Terms—Motion compensated video format conversion, simple block-based prediction, spatial-temporal correlation, temporal compensated strategy, video processing.

I. INTRODUCTION

IT HAS BEEN shown by many experiments and studies that image sequences exhibit both spatial and temporal correlation. The inertia of moving objects causes correlation of the vector fields in the temporal domain, while the object size explains the spatial correlation of motion vectors.

As shown in Fig. 1, although the amplitudes of the motion vector differences are distributed randomly over the range from 0 to 7, about 89% of them are consistent with their neighbors. Due to the “ill-posed” problem [1] of 2-D motion estimation, the output vectors of the full search algorithm are more disordered than the true motion vectors. Therefore, the true correlation of the test sequence is stronger than the result shown in Fig. 1. By exploiting this correlation, smaller search windows can be used in block-matching algorithms, which reduces the computational cost. At the same time, the smoothness constraint implied by the correlation can prevent the occlusion and aperture problems [1] of motion estimation, so that the spatial-temporal consistency, which is the most important feature of the vector fields used in motion compensated video format conversion (MC-VFC) and video processing, is enhanced.

Many algorithms exploiting this correlation have been proposed. In [2] and [3], only the spatial correlation is used, and the motion vectors of the blocks marked A and B in Fig. 2(b) are selected as the candidate vectors of the current block. Because no motion information is taken from the blocks to the right of and below the current block, if the estimated block lies on a motion edge such as the sloped line shown in Fig. 2(b), its correct output vector is not included in the candidate set. In [4], the bidirectional convergence principle is used, and the vectors of the blocks marked C in Fig. 2(b) are selected as candidates. Because the bidirectional search method must store two temporary results, its hardware cost inevitably increases. The method in [5] has the same disadvantage as [4] and is therefore not suitable for reducing the hardware cost either. In this paper, a new block-matching algorithm is presented in which the number of candidate vectors is further reduced. Moreover, a temporal compensated strategy is proposed to accelerate its convergence.

The paper is organized as follows. In Sections II and III, the simple block-based prediction is presented. The updating strategy and the temporal compensated strategy are given in Sections IV and V. In Section VI, two new evaluation measures relevant to MC-VFC and video processing applications are proposed. Simulation results are discussed in Section VII.

Fig. 1. Distribution of the motion vector amplitudes and the minimum neighboring difference in the vector fields of the test sequence Renata (field periods 10 to 20), using the full search algorithm.

Fig. 2. The neighboring area of the spatial-temporal correlated search.

Manuscript received May 8, 2003; revised July 31, 2003. This work was supported in part by the Institute of Image Communication & Information Processing Research, Shanghai Jiaotong University, Shanghai, P. R. China.

J. Wang is with the Institute of Image Communication & Information Processing Research, SJTU, Shanghai, China (e-mail: wangjs@cdtv.org.cn).

D. Wang is with Trident Multimedia Technologies (Shanghai) Co., Ltd., on leave from the Department of Electronics & Information Technology, SJTU (e-mail: wangdong@trident.com.cn).

W. Zhang is with the Department of Electronics & Information Technology, SJTU (e-mail: hdtvteeg@163.net).

Digital Object Identifier 10.1109/TBC.2003.817684

0018-9316/03$17.00 © 2003 IEEE


II. SPATIAL-TEMPORAL CORRELATED PREDICTIVE SEARCH

Conventionally, the following features are assumed to exist in the motion fields of MC-VFC and video processing: (1) the image includes one or more moving objects, whose sizes are much larger than the block size; (2) the moving edges are smooth in comparison with the block size; (3) the moving objects have spatial consistency, that is, there is little difference in motion among the spatially neighboring blocks of an object; (4) the moving objects have inertia, that is, the difference in motion of an object between two neighboring intervals is also very small.

Then, for motion fields with the features mentioned above, the spatial-temporal correlated search procedure is designed as follows:

Step 1) Get the predicted motion vectors from the spatial and temporal neighbors:

(1)

where is the center of the current block, and denote the offset vectors of the neighbors in the present and previous fields, and is the field period. The spatial and temporal neighbors together form the predicted block set.

Step 2) Generate the candidate vectors by adding update vectors to the predicted ones:

(2)

where denotes the update vectors.

Step 3) Find the best output vector in the candidate set by comparing the distortion of the candidates:

(3)

where is the distortion function, is the luminance function.

It can be concluded from this procedure that the key elements of the algorithm are the method for selecting the predicted blocks and the updating strategy. The former determines the algorithm’s complexity and the consistency of its output vector fields, while the latter determines its convergence rate when the motion model changes from one to another. Both are discussed in detail in the following sections.
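The three steps above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names, the use of the sum of absolute differences (SAD) as the distortion function, the 2 × 2 block size, and the choice to keep the unmodified predictions in the candidate set are all assumptions made for the sketch.

```python
def sad(cur, prev, top, left, vec, block=2):
    """Sum of absolute differences between a block of `cur` and the block of
    `prev` displaced by `vec` = (dy, dx). SAD is an assumed distortion function."""
    dy, dx = vec
    h, w = len(prev), len(prev[0])
    total = 0
    for y in range(top, top + block):
        for x in range(left, left + block):
            py, px = y - dy, x - dx
            if not (0 <= py < h and 0 <= px < w):
                return float("inf")  # candidate points outside the field
            total += abs(cur[y][x] - prev[py][px])
    return total


def predictive_search(cur, prev, top, left, predictions, updates, block=2):
    # Step 1: `predictions` holds vectors taken from spatial/temporal neighbours.
    # Step 2: candidates are the predictions plus each prediction + update
    #         (keeping the raw predictions is an assumption of this sketch).
    candidates = list(predictions)
    for p in predictions:
        for u in updates:
            candidates.append((p[0] + u[0], p[1] + u[1]))
    # Step 3: keep the candidate with the smallest distortion.
    return min(candidates, key=lambda c: sad(cur, prev, top, left, c, block))
```

With a field that is simply shifted one pixel to the right, a zero prediction plus the four unit updates is enough for the search to return the vector (0, 1).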

III. THE SIMPLE BLOCK-BASED PREDICTION

The key criterion for selecting the predicted blocks is to obtain as much motion information about the current block as possible with as few blocks as possible. It follows from Feature (1) in Section II that at least one neighbor must belong to the same object as the current block, and Feature (3) tells us that the motion information of blocks in the same object is similar. Accordingly, if the fewest predicted blocks are to be used, it suffices to select a single predicted block that belongs to the same object as the current block. If the current block lies in the interior of an object, any of its neighbors can serve as the predicted block. However, the solution is more complicated if the current block is at a motion edge. Considering the size of the block, the motion edge can, according to Features (1) and (2), be regarded as a straight line that divides the spatial neighboring region of the current block into two parts: the same-motion region and the different-motion region. As shown in Fig. 3, only three points are needed to form a triangle that contains a point A, and an arbitrary line through A then separates the three points onto two different sides. Therefore, at least three neighboring blocks are needed to ensure that at least one of them is in the same motion region as the current block; at the same time, the current block must lie within the triangle formed by the three neighboring blocks.

Fig. 3. The geometrical relationship between the motion edge and the block position.

Fig. 4. Six selection methods of the simple predicted blocks. The gray blocks are the nonconsequence blocks, whose motion vectors must be obtained from their corresponding blocks in the previous field. Moreover, because of the VLSI architecture, the left-side block (marked 1/6), whose estimation process ends after the estimation of the current block has begun, is also a nonconsequence block.

Therefore, the key to this motion estimation method, which is called the simple block-based prediction (SBP), is to select three neighboring predicted blocks according to the conclusion above. As shown in Fig. 4, there are six selection methods for SBP. The fewer the nonconsequence blocks, the more accurate the motion information. As a result, Mode 4 in Fig. 4 is selected. By replacing the motion vector of the nonconsequence block (marked 4/5/6 in Fig. 4) with its corresponding output vector from the previous field, the complete set of predicted motion vectors of SBP is obtained as follows:

(4)
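The geometric argument for using three predicted blocks can be checked numerically. The sketch below is only an illustration under assumed coordinates: the current block is placed at the origin, the neighbor positions are hypothetical, and the helper names are not taken from the paper. It tests whether a straight motion edge through the current block leaves at least one neighbor strictly on each side.

```python
import math

def side(point, origin, angle):
    """Which side of the line through `origin` with direction `angle` the
    point lies on: +1, -1, or 0 if it is (numerically) on the line."""
    dx, dy = math.cos(angle), math.sin(angle)
    cross = dx * (point[1] - origin[1]) - dy * (point[0] - origin[0])
    return (cross > 1e-12) - (cross < -1e-12)

def splits_vertices(vertices, origin, angle):
    """True if the line leaves at least one vertex strictly on each side."""
    signs = {side(v, origin, angle) for v in vertices}
    return 1 in signs and -1 in signs

# Three neighbours forming a triangle around the current block at (0, 0).
triangle = [(-1, -1), (1, -1), (0, 1)]
every_edge_is_covered = all(
    splits_vertices(triangle, (0, 0), math.radians(a)) for a in range(180)
)

# Two neighbours only (left and top): an edge at 135 degrees isolates both.
two_neighbours_fail = not splits_vertices([(-1, 0), (0, -1)], (0, 0), math.radians(135))
```

For every tested edge orientation the triangle has a neighbor on each side, whereas the two-neighbor configuration can end up entirely on one side of the edge.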

IV. UPDATING STRATEGY

The predicted motion vectors of SBP cannot cover all of the motion models of the current block. Many complex motion models, such as moving edges and scene changes, generate new motion vectors. Therefore, the predicted vectors should be updated so that the algorithm converges to the best outputs. The common updating strategies include the 3 × 3 neighboring-area full search method [2] and the random updating strategy [4]. For the former, the update vectors are:

(5)

Every block must then search 27 different blocks to estimate its motion vector, which greatly increases the complexity of the motion estimation algorithm. As a result, the random updating strategy is selected. However, unlike [4], the new strategy updates not all three predicted vectors but only the best one of them, which not only reduces the number of candidate vectors but also accelerates convergence. Thus, the candidate vectors are:

(6)

where , denotes the best one of the three predicted motion vectors, denotes the update vectors, which are selected randomly from the update set. Due to camera panning, large horizontal movements seem to occur more frequently than large vertical movements [4], so the variance of the vertical update is smaller than that of the horizontal update. includes the following updates:

(7)

Then, to estimate one block, only five blocks need to be searched. The new random updating strategy is therefore more efficient than the strategy in [4].
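The candidate counts quoted in this section can be reproduced with a few lines of arithmetic. The sketch assumes that the 3 × 3 strategy combines every one of the three predictions with all nine updates, and that the proposed strategy tests the three predictions unchanged plus two randomly updated versions of the best one; the second assumption is only one reading that is consistent with the stated total of five.

```python
def full_search_updates(radius=1):
    """All update vectors in a (2*radius+1) x (2*radius+1) neighbourhood."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)]

def count_full_update(num_predictions=3, radius=1):
    # Every prediction is combined with every update vector.
    return num_predictions * len(full_search_updates(radius))

def count_best_only(num_predictions=3, num_random_updates=2):
    # Predictions are tested as they are; only the best one is updated.
    # The value num_random_updates=2 is an assumption of this sketch.
    return num_predictions + num_random_updates
```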

V. TEMPORAL COMPENSATED STRATEGY

Although the random updating strategy based on SBP is computationally efficient, its convergence rate is still low, which weakens the algorithm’s tracking performance for scene changes in the spatial domain. Moreover, global motion, such as the camera motion due to panning, tilting, traveling, and zooming of the camera [6], makes the motion field change continuously; such a field is called a slope motion field. If the motion along the convergence path of the SBP search changes faster than the convergence rate can follow, the algorithm does not converge to its correct output. Increasing the number of update vectors and increasing their amplitudes are the two methods commonly used to accelerate convergence. However, the former inevitably increases the computational complexity, and too many update vectors also weaken the smoothness constraint implied in the spatial-temporal correlated prediction. The latter makes the update set so sparse that the estimation accuracy is reduced. Hence, a new temporal compensated strategy is proposed, which makes use of the convergence result of the previous field to improve the convergence efficiency of the present one.

The model of the slope motion field can be fitted by piecewise continuous functions [7], expressed as , where is the 2-D spatial coordinate and is the temporal coordinate. Then can be expanded into a Taylor series about :

(8)

Equation (8) is the fundamental formula of the spatial search for the motion field at time , and is the spatial gradient of the motion field. If the image contains fast-moving objects, the spatial gradient of its motion field can be very large in some areas, and its value is unknown; the SBP search then cannot track it, and the algorithm cannot converge to the correct output. The spatial gradient can in turn be expanded into a first-order Taylor series:

(9)

Substituting (9) into (8):

(10)

where can be estimated from the first-order Taylor series expansion of :

(11)

Substituting (11) into (10):

(12)

That is:

(13)

where is the higher-order term, which can be neglected, and

(14)

and are the differences of the motion vectors between positions and at times and , respectively. Because the global (camera) motion changes slowly in the temporal domain and the interval between two neighboring fields is very short, has a smaller value and is more stable than the spatial gradient of in (8). If, in the SBP search, the motion differences (spatial changes of motion vectors) between blocks in the previous field, which are known, are used and instead of is tracked, it is easy to predict the motion differences in the current field, and the convergence efficiency is greatly improved. Equation (13) can also be expressed as:

(15)

where the value of is so small that the term in (15) can be neglected during the estimation. Therefore, according to (15), the new temporal compensated vector can be added to the candidate set as follows:

(16)

where the subscript index identifies the three simple predicted blocks, is the motion vector of the predicted block, and is the gradient of the motion vectors in the predicted direction in the previous field. In addition, the gradient of the current block’s motion vector in direction in the previous field is calculated from the motion vectors of its preceding and following blocks in the same direction:

(17)

where both and denote the motion vectors from the previous field.
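The reasoning of this section can be restated compactly as a worked derivation. The notation below is assumed for illustration and is not necessarily the paper's: D(x, t) is the motion field, Δx is the offset from a predicted block to the current block, T is the field period, and δ collects the terms that the text argues are small for slowly changing global motion.

```latex
% Assumed notation: D(x,t) motion field, \Delta x offset from a predicted
% block to the current block, T field period.
\begin{align*}
D(x+\Delta x,\,t) &= D(x,t) + \nabla D(x,t)\,\Delta x + o(\|\Delta x\|) \\
\nabla D(x,t) &\approx \nabla D(x,t-T) + T\,\frac{\partial}{\partial t}\nabla D(x,t-T) \\
\nabla D(x,t-T)\,\Delta x &\approx D(x+\Delta x,\,t-T) - D(x,\,t-T) =: \Delta D(x,\,t-T) \\
D(x+\Delta x,\,t) &\approx D(x,t) + \Delta D(x,\,t-T) + \delta, \qquad
\delta = T\,\frac{\partial}{\partial t}\nabla D(x,t-T)\,\Delta x
\end{align*}
```

Under these assumptions, the unknown spatial gradient of the current field is replaced by the known vector difference of the previous field, and only the slowly varying remainder δ has to be tracked by the search.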

Then, the whole candidate set of the temporal compensated SBP (TC-SBP) algorithm is as follows:

(18)

where is the same as (6).
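A temporal compensated candidate of the kind described above can be sketched as follows. The central-difference form of the gradient, the factor 1/2, the data layout (a 2-D list of vectors per block), and all names are assumptions for illustration; `steps` holds the unit block step from each predicted block toward the current block.

```python
def add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def previous_gradient(prev_field, pos, step):
    """Central difference of the previous-field vectors around `pos` along
    `step`, using the following and preceding blocks (assumed form)."""
    (r, c), (dr, dc) = pos, step
    ahead = prev_field[r + dr][c + dc]
    behind = prev_field[r - dr][c - dc]
    return ((ahead[0] - behind[0]) / 2, (ahead[1] - behind[1]) / 2)

def temporal_compensated_candidates(predicted, prev_field, pos, steps):
    """One compensated candidate per predicted block: its vector plus the
    previous-field gradient along the step toward the current block."""
    return [add(vec, previous_gradient(prev_field, pos, s))
            for vec, s in zip(predicted, steps)]
```

For a stationary zoom-like field in which the vector of block (r, c) is (r, c), the predictions taken from the blocks above and to the left of block (2, 2) are both compensated to (2, 2), the true vector of the current block, without any random update.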

VI. EVALUATION CRITERIA

To evaluate the performance of a motion estimation algorithm, the mean square prediction error (MSE) and peak signal-to-noise ratio (PSNR) are often used. Both are relevant and objective measures for coding applications of motion estimation. However, for MC-VFC and video processing applications, the motion vectors with minimum error may not be the true motion vectors, owing to the ill-posed problem. Accordingly, this section attempts to present objective evaluation measures of motion estimation algorithms for MC-VFC and video processing applications.
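For reference, the two conventional measures mentioned above can be computed as in the following sketch. It assumes 8-bit luminance (peak value 255) and images stored as 2-D lists; the function names are illustrative.

```python
import math

def mse(a, b):
    """Mean square error between two equally sized luminance images."""
    n = len(a) * len(a[0])
    return sum((pa - pb) ** 2
               for ra, rb in zip(a, b)
               for pa, pb in zip(ra, rb)) / n

def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    m = mse(a, b)
    return math.inf if m == 0 else 10 * math.log10(peak ** 2 / m)
```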

A. The Spatial Consistency Indicator

The spatial consistency of a motion field refers to the motion correlation among spatially neighboring blocks, which can be measured by the difference between their motion vectors. At the same time, both isolated disordered vectors and moving edges are expected to be identified by this indicator. As a result, the sum of absolute minimum neighboring difference (SAMND) criterion is presented:

(19)

where

(20)
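One possible reading of the SAMND criterion is sketched below. The eight-block neighborhood and the L1 norm of the vector difference are assumptions, since the exact definitions in (19) and (20) are not reproduced here; the field must contain at least two blocks.

```python
def samnd(field):
    """Sum over all blocks of the smallest L1 difference between a block's
    vector and the vectors of its (up to eight) spatial neighbours."""
    rows, cols = len(field), len(field[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            diffs = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                        v, n = field[r][c], field[nr][nc]
                        diffs.append(abs(v[0] - n[0]) + abs(v[1] - n[1]))
            total += min(diffs)
    return total
```

With this reading, a field split by a straight motion edge scores zero, because every block still has a neighbor with the same vector, whereas an isolated disordered vector contributes its distance to the closest neighboring vector.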

B. The Temporal Continuity Indicator

The temporal continuity of a motion field refers to the smoothness of the motion trajectories. In general, for smooth motion trajectories, the motion vectors (the first derivative of the trajectories) have strong temporal extension; in other words, they remain valid for the next motion field. Hence, the temporal extending motion compensation mean square error (TEMC-MSE) criterion is proposed:

(21)

where is the total number of sampled pixels of the image, is the luminance at position at time , and is the motion vector at position at time , which can be estimated from the luminance fields and .
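The idea behind TEMC-MSE can be illustrated with the sketch below, which reuses the vectors estimated for the current field to compensate the next field. The per-pixel vector layout, the sign convention of the displacement, and the skipping of pixels whose source falls outside the field are assumptions made for the example.

```python
def temc_mse(next_field, cur_field, vectors):
    """Mean square error when the vectors estimated for the current field are
    reused ("extended") to predict the next field from the current one."""
    h, w = len(cur_field), len(cur_field[0])
    err, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            dy, dx = vectors[y][x]
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:  # skip sources outside the field
                err += (next_field[y][x] - cur_field[sy][sx]) ** 2
                count += 1
    return err / count if count else 0.0
```

If the content keeps moving one pixel to the right per field, vectors of (0, 1) extend perfectly and give zero error, while zero vectors do not.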

VII. SIMULATION RESULTS

In addition to the new SBP and TC-SBP algorithms proposed in this paper, four other algorithms are selected as references in the evaluation: the full search algorithm (FS), the three-step search algorithm (TSS) [8], the three-dimensional recursive search algorithm (TDR) [4], and the block-based gradient descent search algorithm (BBGDS) [9]. The four test video sequences used for comparison are Mobile & Calendar (Fig. 5(a)), Renata (Fig. 5(b)), Flower Garden (Fig. 5(c)), and Football (Fig. 5(d)). Each frame has a resolution of and 30 frames/sec as the ITU-R 601 format. The performance of the algorithms is measured in terms of SAMND, TEMC-MSE, and mean square prediction error (MSE).

A. Comparison of Spatial-Temporal Consistency

The SAMND and TEMC-MSE curves of the algorithms are shown in Fig. 6 and Fig. 7, respectively. As shown in Fig. 6, the new SBP and TC-SBP algorithms have lower SAMND than the others, which indicates that their output vector fields have stronger spatial consistency. In addition, the vector fields of SBP and TC-SBP are visibly more regular than the others; in other words, their motion vectors are much closer to the true motion.

The TEMC-MSE curves in Fig. 7 show that the vector fields of the FS algorithm have the best temporal extension performance, which attests to the existence of inertia in the motion of objects. Meanwhile, SBP and TC-SBP have TEMC-MSE curves much closer to that of FS than the others do, which shows their stronger temporal continuity. Furthermore, comparing SBP with TC-SBP, it can be concluded that their temporal performance is almost the same for the slow-translation, simple scene (Mobile & Calendar), whereas TC-SBP is better than SBP for the fast-moving, complex scene (Football).


Fig. 5. Four test video sequences (from the 11th field period). (a) Mobile & Calendar; (b) Renata; (c) Flower Garden; (d) Football.

Fig. 6. SAMND figures of various algorithms.


Fig. 7. TEMC-MSE figures of various algorithms.

Fig. 8. MSE figures of various algorithms.

Fig. 9 shows the normalized performance scores of each algorithm for the four test sequences. Although the FS algorithm has the lowest MSE score and the strongest temporal continuity, its spatial consistency is too weak and its vector fields are too disordered to be used for the image reconstruction of MC-VFC. The TSS and BBGDS algorithms have the same disadvantages as FS and are not suitable either. For TDR, in spite of its strong spatial consistency, its weak temporal continuity greatly weakens its ability to track the true motion. In comparison with the others, SBP and TC-SBP have the best overall performance; of the two, TC-SBP has stronger temporal continuity, whereas the tracking performance of SBP is better.

Fig. 9. The normalized performance scores of various algorithms.

TABLE I COMPARISON OF COMPUTATIONAL COMPLEXITY OF VARIOUS ALGORITHMS

B. Comparison of Convergence Performance

The comparison of convergence performance mainly concerns the predictive search algorithms (SBP, TC-SBP, and TDR). Because of their recursive convergence strategies, their convergence is a stepwise process toward the correct vectors, as can be seen from Figs. 6–8. Although the curves of all three algorithms decrease noticeably during the first five field periods, TC-SBP converges fastest, SBP is a little slower, and TDR is the slowest. Especially for the complex, fast-moving scene (Football), only TC-SBP shows an obvious convergence process at the beginning of its MSE curve in Fig. 8. All of these facts show that TC-SBP has the best convergence performance of the three predictive algorithms.

C. Comparison of Computational Complexity

Table I compares the computational complexity of the various algorithms. The data are expressed as percentages of the complexity of the FS algorithm. Because the complexity of a motion estimation algorithm is mainly determined by the block-matching process, auxiliary operations other than block matching are ignored.
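A relative complexity figure of this kind can be computed as in the sketch below, which counts block matches per block only. The search range used in the example is hypothetical, and the result is not intended to reproduce the values of Table I.

```python
def relative_complexity(num_candidates, search_range):
    """Block matches per block as a percentage of full search, which tests
    every displacement in a (2*search_range + 1)^2 window."""
    full = (2 * search_range + 1) ** 2
    return 100.0 * num_candidates / full

# Hypothetical example: five candidates against a +/-7 pixel full search.
example = relative_complexity(5, 7)
```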

VIII. CONCLUSION

In this paper, a new temporal compensated motion estimation algorithm with simple block-based prediction has been presented. There are two key contributions. The first is the simple block-based prediction, which simplifies the hardware implementation by using three neighboring candidate vectors for the motion estimation of each block. The second is the temporal compensated updating strategy, which increases the convergence rate of the block-matching algorithm, especially for the global motion field.

Measured with the three evaluation criteria, the newly designed TC-SBP algorithm has the advantage over the other motion estimation algorithms in both spatial-temporal consistency and convergence performance, while its complexity is significantly lower.

It can be concluded from the evaluation results that the new TC-SBP emerges as the most attractive of the compared block-matching algorithms for MC-VFC and video processing applications. Furthermore, owing to its simplicity, the proposed algorithm is easy to implement.

ACKNOWLEDGMENT

The authors wish to thank the members of the Shanghai High Definition Digital Technology Innovation Center of Shanghai Jiao Tong University, and S. Yu in particular, for their valuable help and suggestions in the preparation of this paper.

REFERENCES

[1] M. Tekalp, Digital Video Processing. Prentice Hall, 1995.

[2] J. L. Chen and P. Y. Chen, “A fast-search motion estimation method and its VLSI architecture,” in Proc. 43rd IEEE Midwest Symp. on Circuits and Systems, Lansing, MI, Aug. 8–11, 2000, pp. 164–167.

[3] L. J. Luo, C. R. Zou, X. Q. Gao, and Z. Y. He, “A new prediction search algorithm for block motion estimation in video coding,” IEEE Trans. Consumer Electron., vol. 43, no. 1, pp. 56–61, Feb. 1997.

[4] G. de Haan, P. W. A. C. Biezen, H. Huijgen, and O. A. Ojo, “True-motion estimation with 3-D recursive search block matching,” IEEE Trans. Circuits Syst. Video Technol., vol. 3, no. 5, pp. 368–379, Oct. 1993.

[5] T. Chen, “Adaptive temporal interpolation using bidirectional motion estimation and compensation,” in Proceedings, 2002 International Conference on Image Processing, vol. 2, Sept. 2002, pp. 313–316.

[6] G. de Haan and P. W. A. C. Biezen, “An efficient true-motion estimator using candidate vectors from a parametric motion model,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 1, pp. 85–91, Feb. 1998.

[7] D. W. Murray and B. F. Buxton, “Scene segmentation from visual motion using global optimization,” IEEE Trans. Pattern Anal. Machine Intell., vol. 9, no. 2, pp. 220–228, Mar. 1987.

[8] T. Koga, K. Iinuma, and T. Ishiguro, “Motion compensated interframe coding for video conferencing,” in Proc. Nat. Telecomm. Conf., New Orleans, LA, Nov. 1981, pp. G5.3.1–G5.3.5.

[9] L. K. Liu and E. Feig, “A block-based gradient descent search algorithm for block motion estimation in video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 4, pp. 419–422, Aug. 1996.

Jisheng Wang was born in Jiangsu, P. R. China, on October 25, 1979. He received the B.Sc. degree from the Department of Electronic Power Engineering, Shanghai Jiaotong University, Shanghai, China, in 2001. He is now a master's degree candidate in the Department of Electronics & Information Technology, SJTU.

He is a member of the Shanghai High Definition Digital Technology Innovation Center and has been engaged in DTV research since 2000. He is currently taking part in the national project on video translation processor (VTP) chip design. His main research interests include video signal processing, multimedia communication technologies, VLSI architecture design, and digital video compression coding and decoding algorithms.

Dong Wang was born in Jiangsu, China, in 1974. He received the B.Sc. and M.Sc. degrees from the Department of Electronic Engineering, NUAA, China, in 1996 and 1999, respectively, and the Ph.D. degree in video signal processing from the Department of Electronics and Information Technology, SJTU, China, in 2002.

Dr. Wang is a member of Trident Multimedia Technologies (Shanghai) Co., Ltd. His main research interests include video signal processing, VLSI architecture design, and parallel algorithms.

Wenjun Zhang was born in Shandong, China, in 1963. He received the Ph.D. degree from the Department of Electronics and Information Technology, SJTU, China, in 1989. From 1990 to 1993, he worked as a postdoctoral researcher at Philips Communications Industry Co. in Nuremberg, Germany, where he was actively involved in developing the HD-MAC (former European HDTV) system.

In 1995, Dr. Zhang was elected head of the HDTV Technical Executive Experts Group (TEEG) of China. Since then, he has been the chief technical expert for the Chinese government in the field of digital television and multimedia communications. In the past five years, he has led TEEG in successfully implementing several generations of HDTV prototype systems for terrestrial broadcasting. Two formal TEEG schemes have been presented to the government as candidates for the Chinese digital TV terrestrial broadcasting standard. Dr. Zhang is also a member of the Digital Television Standardization Committee (DTVSC) of China.

In addition to his TEEG and DTVSC responsibilities, Dr. Zhang is a distinguished professor and the vice president of SJTU. He has published about fifty technical papers in the area of digital television and multimedia communications. Dr. Zhang has received 5 patents and has 7 patents pending.
