HAL Id: hal-01212168
https://hal.inria.fr/hal-01212168
Submitted on 23 Feb 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

To cite this version: Adnane Boukhayma, Edmond Boyer. Video based Animation Synthesis with the Essential Graph. 3DV 2015 - International Conference on 3D Vision, Oct 2015, Lyon, France. pp.478-486. 10.1109/3DV.2015.60. hal-01212168




Video based Animation Synthesis with the Essential Graph

Adnane Boukhayma, Edmond Boyer
INRIA Rhône-Alpes, Grenoble
{adnane.boukhayma,edmond.boyer}@inria.fr

Abstract

We propose a method to generate animations using video-based mesh sequences of elementary movements of a shape. New motions that satisfy high-level user-specified constraints are built by recombining and interpolating the frames in the observed mesh sequences. The interest of video-based meshes is to provide real full shape information and therefore to enable realistic shape animations. A resulting issue lies, however, in the difficulty of combining and interpolating human poses without a parametric pose model, as with skeleton-based animations. To address this issue, our method brings two innovations that contribute at different levels: locally, between two motion sequences, we introduce a new approach to generate realistic transitions using dynamic time warping; more globally, over a set of motion sequences, we propose the essential graph as an efficient structure to encode the most realistic transitions between all pairs of input shape poses. Graph search in the essential graph then allows the generation of realistic motions that are optimal with respect to various user-defined constraints. We present both quantitative and qualitative results on various 3D video datasets. They show that our approach compares favourably with previous strategies in this field that use the motion graph.

1. Introduction

3D animation has become a crucial part of digital media production with numerous applications, in particular in the game and motion picture industry. Traditionally, virtual characters are created and animated either manually, with for instance user-defined keyframes, or automatically with motion capture data. While manual animation is usually tedious and requires artistic expertise, motion capture data allows characters to be animated with real motion information extracted from real performances. In addition to significantly reducing the animation cost, the benefit of motion capture data also lies in the fact that movements are real and, therefore, ensure realism in the created animations. Going a step further, recent seminal works in computer animation [6] propose to bypass purely virtual characters and to use real videos for both the creation and the animation of characters. The advantage of this strategy is twofold: it reduces the creation cost and increases realism by considering only real data. Furthermore, it allows the creation of new motions, for real characters, by recombining recorded elementary movements. In this paper, we consider this synthesis of real motions using videos and propose a new solution to automatically build animations that are as real as possible.

Multi-view capture systems, e.g. [26], can produce 3D videos of human movements. In these 3D videos, shape information is generally encoded in the form of temporal mesh sequences, coherent or not. In order to recombine elementary movements, motion information could be parametrized using skeletal articulated models, as with traditional motion capture systems, and concatenated based on such a parametrization. While methods exist that provide articulated motion information, e.g. [29, 7], this strategy suffers from two drawbacks: first, the identification of such information is difficult to perform robustly with generic surfaces that do not always evolve according to the assumed articulated motion model, as with clothes for instance; second, the use of an intermediate skeletal model makes it difficult to guarantee realism for transitions that ultimately concern the surfaces around skeletons. Another strategy is to avoid the intermediate motion model and to directly consider surfaces or meshes. This is the direction taken by [10], who proposed to extend motion graphs, originally designed for skeletal motion capture data [15], to structure and recombine motion information with mesh sequences. In such a graph, edges connect consecutive meshes within a sequence, and new edges are added between pairs of sequences with respect to a similarity cost between meshes. Although simple and intuitive, motion graphs do not guarantee global optimality of created motion paths, since such paths are composed of locally optimal edges, i.e. between two sequences, while other paths, through edges not in the motion graph, with better global costs can exist. In order to find such paths and to optimally exploit all the available shape poses, a fully connected graph must be considered.

Figure 1. Overview of the proposed video based animation synthesis pipeline: from the input sequences and high-level constraints, pose distances and transition costs yield a complete digraph, from which the essential subgraph is extracted and a minimum cost walk produces the generated animation.

In light of this, we consider animations that are globally optimal with respect to a defined realism criterion. This criterion derives from physical considerations and takes into account a surface interpolation cost, which measures the amount of surface deformation along a motion segment, as well as a duration cost that accounts for the number of frames of a segment. The first cost tries to preserve shape consistency along a motion segment, while the second tries to minimize the number of poses within a segment. Our approach first builds a fully connected graph where every shape pose is connected to all others and where edges weight direct transitions between poses with respect to the realism criterion. To estimate the edge weights, each transition is optimized through dynamic time warping and variable blended segment lengths. In a second step, the graph is pruned and the essential subgraph, that is, the union of all shortest path trees rooted at every node, is extracted. The essential subgraph is an optimal structure that contains only the edges necessary to generate the most realistic graph transitions between all pairs of input frames. Finally, the resulting structure is used to generate motion that satisfies user-specified spatial, temporal and behavioural constraints, as demonstrated in our results, by solving a standard search optimization problem over the essential graph with the appropriate cost function. At a lower level, our approach combines non-linear mesh interpolation with 2D rigid alignments to generate the spatially continuous motion sequence corresponding to a graph walk in the essential subgraph.

To summarize, our contribution is twofold:

  • We introduce a transition approach that combines dynamic time warping and variable length blended segments to generate mesh-based interpolated transitions that are optimal with respect to a realism criterion.

  • We propose the essential graph as an optimal structure to organize several pose sequences of a person's movements, from which minimal cost paths between two given poses are easily identified.

In order to evaluate the benefit of the essential graph over the motion graph, we compare both graph structures in a similar context, quantitatively with our criterion, and qualitatively with visual results (see also the accompanying video).

The remainder of the paper is organized as follows. Section 2 reviews related work. Section 3 presents our graph representation, Section 4 describes high-level constraints, and Section 5 explains motion synthesis. Section 6 compares our approach with a standard motion graph based strategy and shows results, before concluding in Section 7.

2. Related work

2.1. Motion Data Structures

Traditional motion capture systems provide sparse information on moving shapes in the form of marker positions. Using such markers positioned at joint locations of, for example, a human body, it is possible to track the poses of an associated skeletal model. A large body of work considers such input data to animate virtual characters. To this aim, motion graphs [15] are largely used to organize and structure recorded movement sequences of a shape. A motion graph is composed of nodes that represent shape poses and of edges that represent transitions between them. To build such a graph, all consecutive nodes in a temporal sequence are connected. Edges are added to this graph by considering a pose and velocity similarity matrix that encodes all possible pairs of poses between two sequences. Local minima found in this matrix, sometimes with thresholding to reduce the number of resulting transitions, define the new edges in the motion graph.

Following motion graphs, several solutions have been proposed in the literature. [17] first builds a directed graph in a similar manner to a motion graph, then subsequently builds a statistical model over it for more intuitive control. [1] also creates a graph where nodes represent sequences, and transition edges are added between sequence poses if a similarity cost between them is lower than a user-specified threshold. A hierarchical randomized search is then used to generate motions. [22] introduces a structure called interpolated motion graphs that consists of interpolating time-scaled paths within a standard motion graph. [33] increases the connectivity in a motion graph by using multi-target blending to generate intermediate interpolated motions between some of the dataset sequences, building a well-connected motion graph that incorporates these new motions. [20] constructs an optimization-based graph combining constrained optimization and motion graphs. In the method we present, we also use a graph representation that connects frame poses in temporal sequences. However, we consider mesh vertices instead of skeleton joint locations and, more importantly, our graph representation enables paths between any two frames that satisfy global optimality, over pose recombinations, with respect to surface deformation and transition duration.

Some statistical approaches have been proposed to solve the animation problem for skeletal data. [28] applies dimension reduction and clustering to the input dataset and constructs a Markov chain with these clusters as states in a first level. A second level then consists of a hidden Markov model that relates the states of the Markov chain to the motion examples. Given source and destination key frames, the corresponding sequence of clusters is found in the first level, and the most likely succession of motion segments is found in the second. More recently, [18] proposes a graph where nodes represent motion primitives and edges transitions between them. Each node stores a statistical model learned from similar motion segments associated with the same primitive, and a transition distribution function is associated with every directed edge. However, unlike skeletal motion capture data, and besides their very high dimensionality, the 3D mesh motion datasets that are currently available mostly represent limited sets of movements. Thus, such statistical strategies do not adapt to this specific type of data. Additionally, and unlike [18], our solution also solves for random acyclic motion, such as an unstructured dancing sequence (see results for Cathy), which seems hard to decompose into distinctive simple motion primitives. In a graph-less approach, [11] clusters motion segments and pre-computes a table defining intermediate segments that are used to blend all pairs of cluster medoids. In order to generate a transition from a source segment to a target segment, these two segments are blended together with their respective cluster medoid intermediaries. Although this work enables, in principle, transitions between all pairs of frames in the dataset, it requires transition scoring, motion segment clustering and multi-way frame blending that would be difficult to achieve with 3D mesh sequences and without contact information.

2.2. Motion Similarity

Measuring motion similarity is an important aspect of animation synthesis. Although [14] argued that logical motion similarity does not necessarily imply numerical motion similarity, most of the works mentioned previously define some quantitative pose and velocity similarity metric to estimate motion similarity. For instance, [15] computes a weighted sum of squared distances between point clouds driven by the skeleton and formed over a window of frames that are 2D rigidly aligned, while [17] uses a weighted sum of Euclidean differences of joint velocities, skeleton root positions and geodesic distances between joint orientations. Our approach considers mesh-based pose distances but is independent of the distance function used for that purpose.

2.3. Motion Transition

Another important component of motion synthesis is the generation of transitions between sequences. Gradual frame blending remains a reasonable and intuitive solution that is widely used in the literature, either in a start-end [15], center-aligned [1] or left-aligned [17, 22] transition form. However, it might generate unpleasant results if incorrectly applied, especially with linear interpolation, unless costly space-time constraints and inverse kinematics are used as in [21]. These limitations can nevertheless be substantially reduced with motion dynamic time warping. Following a strategy similar to [13], we use dynamic time warping but extend the framework to transitions between mesh sequences. In addition, inspired by the study of interpolated transitions in [30], which emphasizes the importance of a suitable transition duration, we allow this duration to vary during a transition.

2.4. Mesh Sequences

Only a few works yet consider mesh sequences as input data for motion synthesis. They were first introduced in [10] in association with motion graphs and frame juxtaposition for transitions, with therefore the limitations mentioned before for transitions and motion graphs. [12] also uses motion graphs, but over separate groups arranged in a tree-structured kinematic hierarchy forming the global object. Linear interpolation in the local coordinate frame of each group was sufficient for their type of data. In [34], the animation is solved in the skeleton domain using linear blend skinning. Motion transitions are selected by simple thresholding of a transition cost calculated using the pose parameters and their first derivatives, as in [1]. Our objective is to solve the animation in the mesh domain, thereby removing the issue of mesh/skeleton conversion, and we propose an elaborate model for the selection and generation of transitions.

More recently, in a work similar to [14] with skeletal data, [5] achieves motion synthesis by temporally aligning and blending logically similar 3D mesh motion sequences, and mapping the blend weights to a high-level parameter, thus creating a parametrized motion space. A parametric motion graph is then built by connecting several parametric motion spaces through pre-computed transitions, as in [9] for skeletal data, or by computing optimal transitions at run-time, as in [6] for 3D meshes. The theory behind parametric motion graphs is similar to that of fat graphs [23], inspired by the work of [8]. These methods allow motion synthesis with precise control, but only for specific input data, and require a high level of supervision. Our objective is different and improves over [10] with globally optimal motion sequences generated from a general set of input sequences.

3. Graph model

This section details the steps involved in the construction of the essential graph for mesh based motion segments. The input data is a set of temporally aligned sequences of 3D triangulated meshes, i.e. meshes that present the same topology and vertex connectivity over all sequences. These sequences represent a model, typically but not necessarily a human, undergoing arbitrary movements with no prior restrictions on the nature of the movements. It should be noted here that while we consider meshes as input, the graph construction is independent of the pose model used and could be applied with other models, including articulated models.

A directed weighted graph structure is drawn from the input sequences, where nodes are sequence frames, corresponding to shape poses, and edges represent possible transitions between them. Edge weights account for both the surface interpolation cost of transitions and their duration. Initially, edges in the graph connect successive frames in the input sequences, as shown in figure 1. Since these edges correspond to natural transitions, their surface interpolation cost is low, zero in practice in our implementation, and their duration cost is one time unit (assuming uniform temporal sampling). From this initial graph, a fully connected graph is built, where the additional edge weights are estimated using a static pose similarity metric and dynamic time warping, as detailed in section 3.2. In a last step, the essential graph is extracted from the fully connected graph by pruning the edges that do not contribute to minimal cost paths in the graph.

3.1. Pose similarity metric

We define a static distance function between two given shape poses. This basic distance will be used at various stages of our approach, for instance when estimating the cost of a dynamic transition that involves several intermediate poses. Given two poses i and j, we first rigidly align them by finding the rigid transformation that minimizes the weighted sum of squared distances between mesh vertices, where the rigid transformation is composed of a translation in the horizontal plane (i.e. the plane parallel to the motion) and of a rotation around the vertical axis. Weights are used, for instance, to give more importance to torso vertices with humans. Once aligned, one could take the residual of the alignment cost as the distance between the poses. However, this residual does not account for the mesh geometry [24]. We use instead local deformations [2], or deformation gradients as they are referred to in [27, 32], as explained below.
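The constrained rigid alignment above admits a closed-form solution. The following is a minimal sketch, not the authors' code: we assume the vertical axis is z (so the horizontal plane is xy) and solve a weighted 2D Kabsch problem on the horizontal coordinates; the function and variable names are ours.

```python
import numpy as np

def align_horizontal(src, dst, w):
    """Weighted rigid alignment restricted to a rotation about the
    vertical (z) axis and a translation in the horizontal (x, y) plane.
    src, dst: (N, 3) corresponding vertex arrays; w: (N,) vertex weights."""
    w = w / w.sum()
    s, d = src[:, :2], dst[:, :2]          # horizontal coordinates only
    mu_s, mu_d = w @ s, w @ d              # weighted centroids
    # Weighted 2D Kabsch: optimal rotation from the cross-covariance.
    H = (s - mu_s).T @ (w[:, None] * (d - mu_d))
    U, _, Vt = np.linalg.svd(H)
    R2 = Vt.T @ U.T
    if np.linalg.det(R2) < 0:              # keep a proper rotation
        Vt[-1] *= -1
        R2 = Vt.T @ U.T
    t2 = mu_d - R2 @ mu_s
    out = src.copy()
    out[:, :2] = s @ R2.T + t2             # z is left untouched
    return out, R2, t2
```

The residual of this alignment is the sum-of-squared-distances cost that, as noted above, is then replaced by the deformation-gradient distance.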

For every pair of corresponding triangles (v_1, v_2, v_3) and (v'_1, v'_2, v'_3) in a pair of meshes, we compute the following unique transform matrix:

$$T = \left(v_1 - v_3,\; v_2 - v_3,\; n\right)^{-1} \left(v'_1 - v'_3,\; v'_2 - v'_3,\; n'\right), \tag{1}$$

where n and n' are the triangle unit normals. We perform a polar decomposition on this linear transformation matrix to extract the orthogonal rotation component R_m and the symmetric scale/shear component S_m for triangle m, and also store them for later purposes such as pose interpolation:

$$T = R_m S_m. \tag{2}$$

Combining the Riemannian metric of the Lie group of rotations SO(3) and the Frobenius norm for symmetric 3×3 matrices, we define our pose similarity metric between frames i and j as follows:

$$d(i, j) = \sum_{m \in \text{Triangles}} \frac{1}{\sqrt{2}} \left\| \log\left(R_m\right) \right\|_F + w_m \left\| S_m - I_3 \right\|_F. \tag{3}$$

Through the weighting factors w_m, we give more importance to the rotation term. In practice, we use the same value w_m = 0.01 for all mesh triangles. In order to validate this distance function, we have conducted experiments with skeleton-based mesh animations for which skeleton pose distances are assumed to capture exact shape pose distances. Comparisons between the distance matrices obtained with pairs of sequences show that the above distance function gives better results than the sum of squared distances mentioned previously, i.e. distances (3) present higher correlations with skeleton pose distances.
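Equations (1)-(3) can be sketched as follows. This is an illustrative implementation (names are ours), assuming NumPy/SciPy; it exploits the fact that for a rotation matrix R, (1/√2)·||log R||_F equals the rotation angle, which can be read off the trace instead of computing a matrix logarithm.

```python
import numpy as np
from scipy.linalg import polar

def triangle_frame(v1, v2, v3):
    """Edge-vector frame (v1 - v3, v2 - v3, n) used in Eq. (1)."""
    e1, e2 = v1 - v3, v2 - v3
    n = np.cross(e1, e2)
    n /= np.linalg.norm(n)
    return np.column_stack([e1, e2, n])

def pose_distance(verts_i, verts_j, tris, wm=0.01):
    """Eq. (3): sum over triangles of a rotation term (geodesic on SO(3))
    plus a weighted scale/shear term.  verts_*: (N, 3), tris: (M, 3) indices."""
    d = 0.0
    for a, b, c in tris:
        Fi = triangle_frame(verts_i[a], verts_i[b], verts_i[c])
        Fj = triangle_frame(verts_j[a], verts_j[b], verts_j[c])
        T = np.linalg.inv(Fi) @ Fj          # per-triangle transform, Eq. (1)
        R, S = polar(T)                     # polar decomposition T = R S, Eq. (2)
        # Rotation angle of R, equal to (1/sqrt(2)) * ||log R||_F.
        cos_t = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
        d += np.arccos(cos_t) + wm * np.linalg.norm(S - np.eye(3), "fro")
    return d
```

Note that the distance of a pose to itself is zero, and a uniform in-plane scaling contributes only through the scale/shear term, as intended.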

3.2. Graph edge weights

Weights attributed to edges in our graph should ideally capture the realism of the transitions they represent. Assuming that a realistic transition should involve as little surface deformation as possible, we introduce a metric that measures the minimum surface interpolation cost between any pair of nodes in the graph. It involves several intermediate poses for that purpose, which makes it different from the static pose similarity. This metric, in addition to the transition duration, allows us to quantitatively evaluate the realism of a transition.

Figure 2. In order to generate an interpolated transition between frames i and j, we gradually blend the source and destination segments [i, i + l^i_{i,j} - 1] and [j - l^j_{i,j} + 1, j] using dynamic time warping, which finds the minimum-cost path joining element (i, j - l^j_{i,j} + 1) to element (i + l^i_{i,j} - 1, j) in the pose distance matrix between the two segments' frames. This path should be continuous, not reverse direction, and not exceed 3 consecutive vertical or horizontal steps. The resulting path defines the temporal warps w^i_{i,j} and w^j_{i,j}.

In order to generate a smooth interpolated transition from a frame i in a source sequence to a frame j in a destination sequence, we consider a window of successive frames of size l_i in the source sequence starting at frame i, [i, i + l_i - 1], and a similar window of successive frames of size l_j in the destination sequence ending at frame j, [j - l_j + 1, j]. These are the source and destination segments whose frames are to be blended gradually so as to generate a smooth transition between frames i and j. First, these two segments' frames are temporally aligned with warps w_i and w_j respectively, with respect to the pose metric defined previously in section 3.1. We use a standard dynamic time warping algorithm [19] to estimate the best temporal warps. In addition, following [30], who argued that a transition duration in skeletal motion interpolation should lie between a third of a second and 2 seconds, we allow the segment lengths l_i and l_j to vary between boundaries l_min and l_max that correspond to these values. We choose the transition with the least total surface deformation cost D(i, j) through the interpolation path (see figure 2).

$$D(i, j) = \min_{l_i, l_j, w_i, w_j, L} \sum_{t \in \llbracket 0, L \rrbracket} d\left(w_i^{-1}(t),\, w_j^{-1}(t)\right), \tag{4}$$

where d(·,·) is the pose similarity metric defined in equation (3), and L is the transition duration, i.e. the length of the interpolation path found by the dynamic time warping optimization (see figure 2).
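The inner optimization of equation (4) can be sketched with a standard dynamic programming DTW over the pose-distance matrix between the two candidate segments, and an outer loop over the segment lengths. This is a simplified illustration with hypothetical names; the paper's additional slope constraint (at most 3 consecutive horizontal or vertical steps, figure 2) is omitted for brevity.

```python
import numpy as np

def dtw_cost(dist):
    """Minimum-cost monotone warping path from dist[0, 0] to dist[-1, -1]
    over a pose-distance matrix between source and destination segments.
    Returns the total cost and the number of steps L along the path."""
    n, m = dist.shape
    acc = np.full((n, m), np.inf)    # accumulated cost
    steps = np.zeros((n, m), dtype=int)
    acc[0, 0] = dist[0, 0]
    for r in range(n):
        for c in range(m):
            if r == c == 0:
                continue
            cands = []               # predecessors: vertical, horizontal, diagonal
            if r > 0:
                cands.append((acc[r - 1, c], steps[r - 1, c]))
            if c > 0:
                cands.append((acc[r, c - 1], steps[r, c - 1]))
            if r > 0 and c > 0:
                cands.append((acc[r - 1, c - 1], steps[r - 1, c - 1]))
            best, ln = min(cands)
            acc[r, c] = best + dist[r, c]
            steps[r, c] = ln + 1
    return acc[-1, -1], steps[-1, -1]

def transition_cost(pose_dist, i, j, lmin, lmax):
    """Eq. (4): D(i, j) minimized over source/destination segment lengths.
    pose_dist(a, b) returns the pose distance d between dataset frames a, b."""
    best_cost, best_len = np.inf, 0
    for li in range(lmin, lmax + 1):
        for lj in range(lmin, lmax + 1):
            D = np.array([[pose_dist(i + r, j - lj + 1 + c)
                           for c in range(lj)] for r in range(li)])
            cost, L = dtw_cost(D)
            if cost < best_cost:
                best_cost, best_len = cost, L
    return best_cost, best_len
```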

Since the transition duration L is uniquely defined for a given set of values (l_i, l_j, w_i, w_j), expression (4) rewrites as:

$$D(i, j) = \min_{l_i, l_j \in \llbracket l_{min}, l_{max} \rrbracket} \underbrace{\left[ \min_{w_i, w_j} \sum_{t \in \llbracket 0, L \rrbracket} d\left(w_i^{-1}(t),\, w_j^{-1}(t)\right) \right]}_{\text{Dynamic time warping}}.$$

The optimal parameters found with the above optimization, i.e. (l^i_{i,j}, l^j_{i,j}, w^i_{i,j}, w^j_{i,j}, L_{i,j}), will be used later on for motion synthesis.

For each pair of nodes i and j in the graph, we define the weight E_{i,j} of the directed edge (i, j) as the weighted sum of the optimal surface deformation cost D_{i,j} = D(i, j) and the associated duration cost L_{i,j}: E_{i,j} = D_{i,j} + α L_{i,j}. In the case where frames i and j belong to the same sequence, with i < j, it might be unnecessary to consider an interpolated transition if its cost is greater than α(j − i), that is, the cost of going from i to j through the original motion sequence. The following expressions summarize the definition of the edge weight E_{i,j} between poses i and j. If i and j do not belong to the same sequence:

$$\underbrace{E_{i,j}}_{\text{realism}} = \underbrace{D_{i,j}}_{\text{surface deformation}} + \alpha \underbrace{L_{i,j}}_{\text{duration}}, \tag{5}$$

otherwise:

$$E_{i,j} = \min\left[\alpha(j - i),\; D_{i,j} + \alpha L_{i,j}\right], \quad i < j. \tag{6}$$

The weight α represents the ratio of tolerance between surface deformation and transition duration. This user-defined coefficient allows for some flexibility in the admissible surface deformation during a transition with respect to its time duration.
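Equations (5) and (6) reduce to a simple weighting rule; a sketch with hypothetical argument names:

```python
def edge_weight(D_ij, L_ij, alpha, same_sequence=False, i=0, j=0):
    """Eqs. (5)-(6): realism weight of the directed edge (i, j).
    D_ij: optimal surface deformation cost, L_ij: transition duration."""
    w = D_ij + alpha * L_ij
    if same_sequence and i < j:
        # Within a sequence, following the original frames from i to j
        # deforms nothing and costs only its duration alpha * (j - i).
        w = min(alpha * (j - i), w)
    return w
```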

At this stage, all nodes in the graph are connected with directed weighted edges, hence forming a complete digraph (see figure 1). The resulting structure is analogous to a quasi-semi-metric space (I, E), where I is the set of nodes and E : I × I → R+ is the asymmetric function defined in equations (5) and (6). In the next stage, we proceed with the extraction of the essential subgraph from this complete digraph.

3.3. Essential subgraph

In a traditional motion graph, edges between sequences are obtained by selecting minima in transition cost matrices between pairs of sequences. This strategy is intuitive, yet not globally optimal, as transitions between sequences other than these minima might give a better total cost to a graph walk if the minimized criterion also considers duration, for instance, which is an important constraint for motion synthesis. We therefore propose a global and principled strategy that consists of extracting the best paths between all pairs of poses and keeping only the edges in the graph that contribute to these paths. This corresponds to extracting the essential subgraph from the complete digraph induced from the input sequences. Unlike previous methods, ours ensures the existence of at least one transition between any two nodes in the graph, which potentially yields a better use of the original data with less dead-end trimming.

For every pair of nodes in the graph, we use Dijkstra's algorithm to find the path with the least cost joining the source node to the destination node. The cost of a path p = [n_1, n_2, ..., n_N] that goes through nodes n_1, n_2, ..., n_N is defined as:

$$J(p) = J\left([n_1, n_2, \dots, n_N]\right) = \sum_{i \in \llbracket 1, N-1 \rrbracket} E_{n_i, n_{i+1}}. \tag{7}$$

Once the optimal paths joining all pairs of nodes are found, all edges that do not belong to any of these paths are pruned. This gives an optimally connected graph which contains only the edges necessary to connect every pair of nodes with the best possible path cost in terms of realism (see figure 1). The resulting structure is also referred to as the union of the shortest path trees rooted at every graph node, and its density is proportional to the α factor introduced in equations (5) and (6). The following sections explain how to use this structure to generate new animations.
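The extraction step described above, i.e. the union of shortest-path trees rooted at every node, can be sketched as follows (an illustrative O(n³) version for a dense weight matrix; the names are ours):

```python
import heapq

def essential_subgraph(weights):
    """Union, over all source nodes, of shortest-path trees in the
    complete digraph.  weights[u][v] is the edge weight E_{u,v};
    returns the set of retained directed edges (u, v)."""
    n = len(weights)
    kept = set()
    for src in range(n):
        # Dijkstra from src, recording each node's shortest-path parent.
        dist = [float("inf")] * n
        parent = [None] * n
        dist[src] = 0.0
        pq = [(0.0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist[u]:
                continue                     # stale queue entry
            for v in range(n):
                if v == u:
                    continue
                nd = d + weights[u][v]
                if nd < dist[v]:
                    dist[v] = nd
                    parent[v] = u
                    heapq.heappush(pq, (nd, v))
        # Every edge on some shortest-path tree is essential.
        for v in range(n):
            if parent[v] is not None:
                kept.add((parent[v], v))
    return kept
```

An expensive direct edge is pruned whenever a cheaper multi-hop walk exists, which is exactly the situation motion graphs miss.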

4. High-level constraints

In this part, we consider constrained navigation in the essential graph with various user-specified constraints, such as spatial, temporal and behavioural constraints. We cast motion extraction as a graph search problem that consists of finding the walk p = [n_1, n_2, ..., n_N] with minimal total error cost J_c(p), as follows:

$$J_c(p) = J_c\left([n_1, n_2, \dots, n_N]\right) = \sum_{i \in \llbracket 1, N \rrbracket} j_c\left([n_1, \dots, n_{i-1}], n_i\right), \tag{8}$$

where j_c([n_1, ..., n_{i-1}], n_i) is a scalar function evaluating the additional error of adding node n_i to the walk [n_1, ..., n_{i-1}] with respect to the user specification. Since a graph node might be visited more than once in this search, a halting condition must be specified to prevent the search from infinite recursion. In order to demonstrate the interest of essential graphs for animation purposes, we implement two scenarios with different types of constraints. In both scenarios, the graph search is solved using a depth-first algorithm with branch-and-bound to reduce the computational complexity.
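The depth-first branch-and-bound search over equation (8) can be sketched generically; `adj`, `jc` and `halt` are our hypothetical stand-ins for the essential-graph adjacency, the incremental cost of equation (8), and the halting condition, which the paper instantiates per scenario.

```python
def best_walk(adj, jc, start, halt):
    """Depth-first search with branch-and-bound for the minimum-cost walk
    of Eq. (8).  adj[n] lists the successors of n in the essential subgraph,
    jc(walk, n) is the incremental error of appending node n, and
    halt(walk) ends a candidate walk.  Returns (best_cost, best_walk)."""
    best = [float("inf"), None]

    def dfs(walk, cost):
        if halt(walk):
            if cost < best[0]:
                best[0], best[1] = cost, list(walk)
            return
        for n in adj[walk[-1]]:
            step = cost + jc(walk, n)
            if step >= best[0]:     # bound: prune branches already worse
                continue            # than the best complete walk found
            walk.append(n)
            dfs(walk, step)
            walk.pop()

    dfs([start], 0.0)
    return best[0], best[1]
```

Since walks may revisit nodes, the bound (rather than a visited set) is what keeps the search finite once a first complete walk is found.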

3D behavioural path synthesis: In this scenario (see figure 5), the user provides a 3D curve that the character's center of mass should follow as closely as possible. Additionally, some parts of this path may impose specific types of motion taken from the dataset. This kind of constraint is better adapted to datasets with a variety of basic motions involving locomotion activities, and for which there are significant displacements of the character in space. This is the case with our dataset Thomas. In this example, the cost function j_c measures the 3D distance between the frame added to the currently generated path and the point along the input 3D curve with the same arc length as the added frame along the generated path.

Pose/time constraint: In this scenario, the user provides a sequence of poses with associated times of occurrence. The search generates the motion sequence that satisfies these pose/time constraints as closely as possible. This type of constraint adapts well to sequences of random unplanned activities, such as the dance footage in our dataset Cathy. The search problem is solved here for every two successive query poses separately, and the cost function j_c measures the duration cost of adding a node to the currently generated path. The halting condition occurs when the duration of the travelled path between the source and destination poses exceeds the desired duration.

5. Motion synthesis

In this section, we explain how a walk in the essential subgraph is converted into a continuous stream of motion in the form of a 3D mesh sequence. A walk path is represented by a node sequence, where every pair of successive nodes should correspond to an edge in the essential subgraph. We browse the path and sequentially generate the purely original or newly interpolated motion segments corresponding to every pair of consecutive nodes in the path. Interpolated segments are generated using the optimal transition scheme introduced in section 3.2. That is, the temporally warped source and destination motion segments are blended by gradually aligning and interpolating the matched frame meshes as well as the displacement of their respective centres of mass. Finally, we perform a rigid alignment (as explained in section 3.1) at the junction between two consecutive motion segments to obtain a spatially continuous motion stream.

Pose interpolation: In order to interpolate frame meshes, we use a variant of Poisson shape interpolation [31]. Our experiments demonstrate that this strategy performs better than linear vertex interpolation or as-rigid-as-possible approaches [24]. Using the gradient field manipulation notations of [3], the 3 × 3 per-face gradient of the mesh for triangle m can be expressed as follows:

$$G_m = \begin{pmatrix} (v_1 - v_3)^T \\ (v_2 - v_3)^T \\ n^T \end{pmatrix}^{-1} \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} v_1^T \\ v_2^T \\ v_3^T \end{pmatrix}. \tag{9}$$

A global 3M × N gradient operator G can then be assembled, where M and N are the total numbers of mesh triangles and vertices respectively:

$$\begin{pmatrix} G_1 \\ \vdots \\ G_M \end{pmatrix} = G \begin{pmatrix} v_1^T \\ \vdots \\ v_N^T \end{pmatrix}. \tag{10}$$
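The assembly of G from equations (9)-(10) can be sketched as a sparse matrix whose row block m holds the three vertex coefficients of triangle m (a minimal illustration with our own names, assuming SciPy sparse matrices):

```python
import numpy as np
from scipy.sparse import lil_matrix

def gradient_operator(verts, tris):
    """Assemble the sparse 3M x N per-face gradient operator G of Eq. (10).
    Row block m maps stacked vertex positions to the 3x3 gradient G_m of Eq. (9)."""
    M, N = len(tris), len(verts)
    G = lil_matrix((3 * M, N))
    B = np.array([[1.0, 0.0, -1.0],        # middle matrix of Eq. (9)
                  [0.0, 1.0, -1.0],
                  [0.0, 0.0, 0.0]])
    for m, (a, b, c) in enumerate(tris):
        e1 = verts[a] - verts[c]
        e2 = verts[b] - verts[c]
        n = np.cross(e1, e2)
        n /= np.linalg.norm(n)
        F = np.vstack([e1, e2, n])          # rows (v1-v3)^T, (v2-v3)^T, n^T
        coeffs = np.linalg.inv(F) @ B       # 3x3 block of vertex coefficients
        for r in range(3):
            for k, col in enumerate((a, b, c)):
                G[3 * m + r, col] = coeffs[r, k]
    return G.tocsr()
```

Because the rows of B sum to zero, G annihilates global translations, which is why a Dirichlet condition is needed in the Poisson solve below.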

In order to interpolate poses with an interpolation parameter λ ∈ [0, 1], the interpolated rotation and scale/shear, extracted previously from the per-face gradient deformations (see section 3.1), are applied to each per-face gradient, which yields the new gradients $\tilde{G}_m$:

$$\tilde{T}_m = e^{\lambda \log(R_m)} \left((1 - \lambda) I_3 + \lambda S_m\right), \tag{11}$$

$$\tilde{G}_m = \tilde{T}_m G_m. \tag{12}$$

The following standard Poisson system is then solved for the new mesh vertex positions v_1 ... v_N, with a Dirichlet boundary condition imposing that the center of mass of the interpolated mesh be at the same location as the original one.

$$\underbrace{G^T D G}_{\Delta} \begin{pmatrix} v_1^T \\ \vdots \\ v_N^T \end{pmatrix} = \underbrace{G^T D}_{\text{div}} \begin{pmatrix} \tilde{G}_1 \\ \vdots \\ \tilde{G}_M \end{pmatrix}, \tag{13}$$

where D is a diagonal 3M × 3M mass matrix containing the triangle areas. We note that the left-hand operator of this equation is equivalent to a discrete Laplace-Beltrami operator with cotangent weights. To ensure the uniqueness of the solution, we perform a double interpolation using the source and target meshes and linearly interpolate the resulting meshes.
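The per-face blend of equations (11)-(12) and the solve of equation (13) can be sketched as follows. This is our own simplified version, assuming SciPy: the centroid constraint is imposed by replacing one row of the system with a mean constraint, a simple (non-symmetric) stand-in for a proper Dirichlet treatment.

```python
import numpy as np
from scipy.linalg import expm, logm
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

def interpolate_transform(R, S, lam):
    """Eq. (11): geodesic blend of the rotation R and linear blend of the
    scale/shear S, with interpolation parameter lam in [0, 1]."""
    return expm(lam * logm(R)) @ ((1.0 - lam) * np.eye(3) + lam * S)

def poisson_solve(G, areas, new_grads, centroid):
    """Eq. (13): recover vertex positions from target per-face gradients.
    G: sparse 3M x N gradient operator, areas: (M,) triangle areas,
    new_grads: dense (3M, 3) stacked interpolated gradients."""
    D = diags(np.repeat(areas, 3))            # 3M x 3M mass matrix
    A = (G.T @ D @ G).tolil()                 # discrete Laplace-Beltrami
    b = G.T @ (D @ new_grads)                 # divergence of target gradients
    # Pin the translation nullspace: force the mean vertex to the centroid.
    n = A.shape[0]
    A[0, :] = np.full(n, 1.0 / n)
    b[0] = centroid
    A = A.tocsr()
    return np.column_stack([spsolve(A, b[:, k]) for k in range(3)])
```

At λ = 0 the blended transform is the identity and at λ = 1 it reproduces the full per-face transform R S, so the solve recovers the source and target shapes at the endpoints.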

6. Results

To evaluate our framework, we recorded two datasets of temporally consistent mesh sequences, with 5000 vertices and 10000 faces per mesh, and a frame rate of 50 fps. These datasets were obtained using multi-view reconstruction and temporal tracking. The Thomas dataset contains 2633 frames of a male actor performing basic locomotion activities, and the Cathy dataset has 1910 frames of a female actress performing random dancing. We also applied our method to standard datasets of 3D mesh sequences, such as the Dan and JP datasets [25] [4], the Capoeira dancer dataset [26] and the Adobe dataset [29].

Figure 3. (a) Our transition with dynamic time warping and variable-length blended segments; (b) standard interpolated transition; both using Poisson pose interpolation.

Transition approach: Our transition approach, combining dynamic time warping and variable-length blended segments, offers substantial improvements in the visual results. Compared to a standard interpolated transition, our transitions show fewer visual flaws such as surface distortion (see figure 3) and foot-skate, and also compensate for the differential in velocity between source and destination motions through temporal warping.
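As background for the temporal warping step, a textbook dynamic time warping over a frame-to-frame dissimilarity matrix might look as follows. This is a generic sketch; the paper's actual cost matrix and step constraints may differ.

```python
import numpy as np

def dtw(cost):
    """Dynamic time warping over a precomputed cost matrix.

    cost[i, j] is the dissimilarity between frame i of the source
    segment and frame j of the destination segment; returns the
    monotonic alignment path used to pair frames before blending.
    """
    n, m = cost.shape
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == j == 0:
                continue
            best = min(acc[i - 1, j] if i else np.inf,
                       acc[i, j - 1] if j else np.inf,
                       acc[i - 1, j - 1] if i and j else np.inf)
            acc[i, j] = cost[i, j] + best
    # Backtrack from the last pair to recover the warping path.
    path, i, j = [(n - 1, m - 1)], n - 1, m - 1
    while (i, j) != (0, 0):
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        candidates = [(a, b) for a, b in candidates if a >= 0 and b >= 0]
        i, j = min(candidates, key=lambda p: acc[p])
        path.append((i, j))
    return path[::-1]
```

The returned path pairs each source frame with one or more destination frames, which is what allows segments of different lengths and speeds to be blended without velocity artifacts.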

Figure 4. Illustration of motion graph failures to find a graph walk as short and low-cost as the essential graph's, regardless of the transition approach. Input poses are in green. Note that the rotation of the input destination pose around the vertical axis is estimated by the algorithm in both cases. (a) Essential graph, (b) motion graph.

Graph structure: Given a dataset of mesh sequences, we want to generate motions joining all frames while minimizing a joint cost of surface interpolation and duration. To this end, we construct an essential graph and a motion graph, and compute the costs of the optimal paths joining every pair of frames in the dataset. In order to compare graph structures independently of the transition approach, we use the same transition cost for both methods. To construct a motion graph, we compute transition cost matrices between all pairs of sequences and select the local minima within them as transition points. The original paper's implementation also recommends accepting only local minima below an empirically determined threshold to reduce the computational burden. We deliberately ignore this measure to obtain a highly connected motion graph that better challenges our method. Note that we do not compare to existing extensions of the motion graph, since our primary objective is to evaluate the initial structure used to select optimal transitions, and we believe that many of these extensions could apply to the essential graph as well. Figure 6 presents the total costs of all the optimal paths using both our method and the standard motion graph. The plots show clearly that our method outperforms standard motion graphs quantitatively for various values of the parameter α that weights the surface deformation and path duration contributions in the cost. Additionally, we show some visual results where motion graphs fail to find graph walks as short and low-cost as those of essential graphs (see figure 4).
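The path-cost comparison can be illustrated with a standard Dijkstra search over either graph, where each edge carries a surface-deformation cost and a duration traded off by α. The edge representation below is a hypothetical sketch, not the paper's exact implementation.

```python
import heapq

def optimal_path_cost(adj, source, target, alpha):
    """Dijkstra shortest-path cost with a joint edge cost.

    adj: dict node -> list of (neighbour, deformation, duration) edges.
    The cost of a path is sum(deformation + alpha * duration) over its
    edges, as in the comparison of figure 6.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float('inf')):
            continue  # stale heap entry
        for v, deform, dur in adj.get(u, ()):
            nd = d + deform + alpha * dur
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float('inf')
```

Varying α changes which walks are optimal: a small α favours low-deformation detours, while a large α penalizes long walks, which is why the comparison in figure 6 is reported for several values of α.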

Figure 5. 3D Behavioural path synthesis (Section 4). Green meshes are original and red ones are interpolated.

Figure 6. Comparison between the essential graph and a standard motion graph. Plots (a)-(d) show the costs (ordinate) of all optimal paths (abscissa) joining all pairs of frames in the dataset using both methods while varying α. Dan dataset: (a) α = 0.01, (b) α = 0.1. JP dataset: (c) α = 0.01, (d) α = 0.1.

7. Conclusion

We presented a new strategy for automatic motion generation using temporally aligned 3D mesh sequences. This method introduces the essential graph as the organizing data structure for motion information. This structure identifies all the shortest paths in the fully connected graph of shape poses. It appears to be a good alternative to motion graphs that improves realism and better exploits the input information. In addition, the method proposes interpolated transitions between 3D surface meshes in place of frame concatenations.

Our contributions also include an approach to estimate frame-to-frame transitions using Poisson pose interpolation. It combines variable-length source and destination motion segments with dynamic time warping. In addition, we present a new motion transition cost function that evaluates a realism criterion. Although there is no standard objective way of measuring motion transition realism, for either skeletal data or 3D surface data, other than perceptual evaluation, the criterion that we introduced in this work achieved satisfactory results in that respect over the datasets we experimented with. In practice, we have observed very few mesh distortions and abrupt changes in the generated motions, while such artifacts are often present with standard interpolated transitions.

Nonetheless, we have also experienced artifacts that might appear during transitions, such as foot-skate, which is a common issue in motion synthesis with multi-target blending. With skeletal data, this issue is partially dealt with by annotating the input data with contact information and taking this annotation into consideration while generating transitions, and also by applying inverse kinematics on transitions to erase the foot-skate effect [16]. Although mesh sequence annotation seems to be more complicated, a similar approach could be applied to 3D mesh sequences.

References

[1] O. Arikan and D. A. Forsyth. Interactive motion generation from examples. ACM Trans. Graph., 21(3), 2002.

[2] A. H. Barr. Global and local deformations of solid primitives. In Proc. of ACM SIGGRAPH, 1984.

[3] M. Botsch and O. Sorkine. On linear variational surface deformation methods. IEEE Transactions on Visualization and Computer Graphics, 14(1), 2008.

[4] C. Budd, P. Huang, M. Klaudiny, and A. Hilton. Global non-rigid alignment of surface sequences. Int. J. Comput. Vision, 102(1-3), 2013.

[5] D. Casas, M. Tejera, J. Guillemaut, and A. Hilton. Parametric control of captured mesh sequences for real-time animation. In Motion in Games - 4th International Conference, 2011.

[6] D. Casas, M. Tejera, J.-Y. Guillemaut, and A. Hilton. 4D parametric motion graphs for interactive animation. In Proc. of the 2012 Symposium on Interactive 3D Graphics and Games, 2012.

[7] J. Gall, C. Stoll, E. De Aguiar, C. Theobalt, B. Rosenhahn, and H.-P. Seidel. Motion capture using joint skeleton tracking and surface estimation. In Proc. of CVPR, 2009.

[8] M. Gleicher, H. J. Shin, L. Kovar, and A. Jepsen. Snap-together motion: Assembling run-time animations. ACM Trans. Graph., 22(3), 2003.

[9] R. Heck and M. Gleicher. Parametric motion graphs. In Proc. of the 2007 Symposium on Interactive 3D Graphics and Games, 2007.

[10] P. Huang, A. Hilton, and J. Starck. Human motion synthesis from 3D video. In Proc. of CVPR, 2009.

[11] L. Ikemoto, O. Arikan, and D. Forsyth. Quick transitions with cached multi-way blends. In Proc. of the 2007 Symposium on Interactive 3D Graphics and Games, 2007.

[12] D. L. James, C. D. Twigg, A. Cove, and R. Y. Wang. Mesh ensemble motion graphs: Data-driven mesh animation with constraints. ACM Trans. Graph., 26(4), 2007.

[13] L. Kovar and M. Gleicher. Flexible automatic motion blending with registration curves. In Proc. of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2003.

[14] L. Kovar and M. Gleicher. Automated extraction and parameterization of motions in large data sets. ACM Trans. Graph., 23(3), 2004.

[15] L. Kovar, M. Gleicher, and F. Pighin. Motion graphs. ACM Trans. Graph., 21(3), 2002.

[16] L. Kovar, J. Schreiner, and M. Gleicher. Footskate cleanup for motion capture editing. In Proc. of the 2002 ACM/Eurographics Symposium on Computer Animation, 2002.

[17] J. Lee, J. Chai, P. S. A. Reitsma, J. K. Hodgins, and N. S. Pollard. Interactive control of avatars animated with human motion data. ACM Trans. Graph., 21(3), 2002.

[18] J. Min and J. Chai. Motion graphs++: A compact generative model for semantic motion analysis and synthesis. ACM Trans. Graph., 31(6), 2012.

[19] M. Muller. Information Retrieval for Music and Motion. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2007.

[20] C. Ren, L. Zhao, and A. Safonova. Human motion synthesis with optimization-based graphs. Comput. Graph. Forum, 29(2), 2010.

[21] C. Rose, B. Guenter, B. Bodenheimer, and M. F. Cohen. Efficient generation of motion transitions using spacetime constraints. In Proc. of ACM SIGGRAPH, 1996.

[22] A. Safonova and J. K. Hodgins. Construction and optimal search of interpolated motion graphs. ACM Trans. Graph., 26(3), 2007.

[23] H. J. Shin and H. S. Oh. Fat graphs: constructing an interactive character with continuous controls. In Proc. of the 2006 ACM/Eurographics Symposium on Computer Animation, 2006.

[24] O. Sorkine, D. Cohen-Or, Y. Lipman, M. Alexa, C. Rossl, and H.-P. Seidel. Laplacian surface editing. In Proc. of the 2004 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, 2004.

[25] J. Starck and A. Hilton. Surface capture for performance-based animation. IEEE Comput. Graph. Appl., 27(3), 2007.

[26] C. Stoll, J. Gall, E. de Aguiar, S. Thrun, and C. Theobalt. Video-based reconstruction of animatable human characters. ACM Trans. Graph., 29(6), 2010.

[27] R. W. Sumner and J. Popovic. Deformation transfer for triangle meshes. ACM Trans. Graph., 23(3), 2004.

[28] L. M. Tanco and A. Hilton. Realistic synthesis of novel human movements from a database of motion capture examples. In Proc. of the Workshop on Human Motion (HUMO'00), 2000.

[29] D. Vlasic, I. Baran, W. Matusik, and J. Popovic. Articulated mesh animation from multi-view silhouettes. ACM Trans. Graph., 27(3), 2008.

[30] J. Wang and B. Bodenheimer. Synthesis and evaluation of linear motion transitions. ACM Trans. Graph., 27(1), 2008.

[31] D. Xu, H. Zhang, Q. Wang, and H. Bao. Poisson shape interpolation. In Proc. of the 2005 ACM Symposium on Solid and Physical Modeling, 2005.

[32] Y. Yu, K. Zhou, D. Xu, X. Shi, H. Bao, B. Guo, and H.-Y. Shum. Mesh editing with Poisson-based gradient field manipulation. In Proc. of ACM SIGGRAPH, 2004.

[33] L. Zhao and A. Safonova. Achieving good connectivity in motion graphs. In Proc. of the 2008 ACM/Eurographics Symposium on Computer Animation, July 2008.

[34] C. Zheng. One-to-many: example-based mesh animation synthesis. In Proc. of the 2013 ACM/Eurographics Symposium on Computer Animation, 2013.