Motion Curves: A versatile representation for motion data
by
Kevin Forbes
A thesis submitted in conformity with the requirements for the degree of Master of Science
Graduate Department of Computer Science
University of Toronto
Copyright © 2005 by Kevin Forbes
Abstract
Motion Curves: A versatile representation for motion data
Kevin Forbes
Master of Science
Graduate Department of Computer Science
University of Toronto
2005
This thesis presents Motion Curve space: a novel representation scheme for the poses
of an articulated skeletal figure. A Motion Curve space is defined by a set of orthogo-
nal basis vectors that have been found by performing a weighted principal component
analysis on an example motion clip. An animator can control the properties of the space
through the selection of the example clip and the PCA weights. We explore the expres-
sive and computational power of the representation through the creation of several new
motion processing and analysis algorithms, which are demonstrated through prototype
applications. These prototypes help to establish the workflow for a hypothetical produc-
tion application. In presenting this work, we hope to expand the size of the animator’s
toolbox. By providing a new and usable framework for editing motions, we make it
possible to quickly modify existing motion assets and stretch animation budgets.
Acknowledgements
I’d like to thank my advisor, Dr. Eugene Fiume, for his guidance and for giving me the
freedom to pursue my choice of research topics. I’d also like to thank Dr. Karan Singh
for being my second reader.
Science is an inherently collaborative endeavor, and I am indebted to everyone in the
lab who offered suggestions and help along the way. I owe Alex Kolliopoulos a huge
favour at some point for his help with submitting this thesis from a distance. I owe
my wife, Shannon, an even bigger favour for her moral support over the course of this
project.
Finally, I’d like to thank OGS and NSERC for financial support.
Contents
1 Introduction
  1.1 Motivation
  1.2 Statement of Thesis
  1.3 Contributions
  1.4 Thesis Organization

2 Background
  2.1 Representing Poses
    2.1.1 Skeletal Animation
    2.1.2 Driving a Mesh
    2.1.3 Other Pose Representations
  2.2 PCA
  2.3 Creating Motion
    2.3.1 Keyframing
    2.3.2 Rotoscoping
    2.3.3 Motion Capture
    2.3.4 Simulation
    2.3.5 Digital Puppetry
  2.4 Motion Processing
    2.4.1 Signal Based Techniques
    2.4.2 State Based Techniques
  2.5 Motion Segmentation and Recognition
  2.6 High Dimensional Data Search Techniques
  2.7 Summary

3 Motion Representation
  3.1 The Trouble with Motion Data
  3.2 Motion Curve Space
    3.2.1 Constructing the space
    3.2.2 Projections and Unprojections
  3.3 Space Characteristics
    3.3.1 Pose Distance Metric
    3.3.2 Dimensionality Reduction
    3.3.3 Visualization
    3.3.4 Representational Error
  3.4 Pose Detection in Motion Curve Space
  3.5 Summary

4 Interpolation
  4.1 Two-Pose Interpolation
  4.2 M-way Interpolation
  4.3 Improved Non-overlapping Blends
  4.4 Case study - Motion Graphs
  4.5 Summary

5 Geometric Operations
  5.1 Overview
  5.2 Finding Mean Poses
  5.3 Scaling-Based Operations
  5.4 Translation-Based Operations
  5.5 Filtering
    5.5.1 A Wavelet Approach to Smoothing
  5.6 Case Study - PCA Explorer
  5.7 Extensions: Joint Limits and Selective Blending
  5.8 Summary

6 Unsegmented Motion Searching
  6.1 Introduction and Motivation
  6.2 Algorithm
    6.2.1 Finding the Characteristic Point
    6.2.2 Generating Seed Points
    6.2.3 Seed Point Clustering
    6.2.4 Dynamic Time Warping
    6.2.5 Results Ranking
    6.2.6 Interface
  6.3 Results
    6.3.1 Synthetic Data
    6.3.2 Validation
    6.3.3 Motion Capture Data
    6.3.4 Scalability
    6.3.5 Performance Optimization
  6.4 Summary

7 Conclusion and Future Work
  7.1 Conclusion
  7.2 Future work
    7.2.1 Representation
    7.2.2 New Operators
    7.2.3 Search Refinements
    7.2.4 Software Development

Bibliography
Chapter 1
Introduction
1.1 Motivation
Animation provides a powerful and compelling artistic medium. Given complete control
over the canvas, an animator can envision anything from the abstract work of Nor-
man McLaren to the gritty hyper-reality of Linklater’s adaptation of A Scanner Darkly.
Within the frames of moving images, there are no physical constraints limiting what can
be represented. Where sculptors must do battle with gravity, musicians must play within
the range of their instruments, and dancers may bend but not transcend the capabilities
of the human body, the animator, in theory, is only held back by his or her imagination.
That said, animation does in fact have real-world limits. One of the constraints upon
animation is economic, rather than technical. While it is theoretically possible to create
any sequence of two-dimensional images, the types of sequences that are economically
feasible, in terms of available time and expertise, are limited by the expressiveness
of the animators’ tools and the available computational resources.
In the beginning, the only way to create animation was to hand-draw every frame.
This approach rapidly becomes infeasible for an animation of any appreciable
length. Even if a single animator drawing every frame from scratch could meet the
demands of frame-to-frame consistency and stave off the tedium of re-drawing rarely-
changing objects again and again, s/he could simply not draw quickly enough to complete
a complicated project within a reasonable amount of time.
Traditional animation studios developed a catalogue of techniques to surmount these
limitations. Cel animation separates the foreground from the background by placing layers
on transparent sheets, allowing each to be animated separately. Keyframing allows a lead
animator to define the flow of an animation with very few drawings, leaving the bulk of
drawing to a team of junior artists. These techniques allow for animation reuse, and for
parallel frame production, both of which improve a studio’s throughput.
These techniques carried the animation industry for many decades - from Snow White
to Saturday morning cartoons. But as the twentieth century drew to a close, cheap
computing power and digital storage revolutionized the medium. Computers tend to
change the way in which we do things (not always for the better). In no field is this more
true than in animation.
The invention of the word processor may have changed the interface one uses to write
a novel, but it did not change the actual substance of the activity of writing. Computer
animation, however, is an entirely different medium than its 2D predecessor. With the
shift in technology and workflow, and the inexorable increase in audience expectations, the
animator’s task has fundamentally changed. The creation of motion data has been freed
from the representation of the character exhibiting the motion. Modelers and texture
painters create detailed three dimensional descriptions of sets, props, and characters that
can be rendered (relatively) quickly, from any angle. An animator typically interacts with
these virtual objects by directly manipulating them, or through procedural methods. The
product of the animator’s labour is no longer a single, concrete representation of a moving
character, but rather, an abstract representation of a character’s movement.
In this way, the task of animation has come to resemble puppetry, but with an im-
portant distinction. The motion that a traditional puppeteer creates is real, and in-the-
moment. It is by nature ephemeral - it is a performance, not a piece. The motion that
a 3D animator creates, in contrast, is an abstract mathematical representation of a mo-
tion. It is data. As such, it can be stored, manipulated, and re-used like any other piece
of data. This presents new opportunities for expression. Much in the way that digital
sampling has expanded the scope of musical expression, digital motion editing and re-use
have the potential to create new ways to work with motion.
If a 3D animator is to fully exploit that medium’s digital nature, he or she will need
two things: a large body of existing motion clips with which to work, and a flexible
representation for the motion that facilitates interesting and expressive operators. The
first requirement can be filled by using motion capture. This thesis seeks to fill the second
requirement.
Motion capture enables the quick recording and representation of subtle, nuanced
physical performance. Unfortunately, the cost of purchasing or renting time with motion
capture equipment is often prohibitive. Techniques that facilitate the synthesis of new
motions from existing motion clips help to alleviate this problem. Animation software lets
the animator manually edit the individual degrees of freedom of an animation. While
this does permit animation re-use, the process is tedious. Semi-automatic techniques,
which operate over more than one degree of freedom at a time under an animator’s
direction, can be much more useful.
Sequences of motion capture data are usually stored and processed as hierarchical lists
of orientations. Most methods for expressing orientations have undesirable properties,
such as non-euclidean distance metrics or discontinuities, which complicate the treatment
of the data. It would be advantageous to transform the data into a form that is easier
to work with.
As we shall show, it is possible to perform a weighted principal components analysis
on pose data. Projecting the poses of a motion into the resulting Euclidean space results
in a series of points that can be used to define a discrete but explicit path through a high
dimensional space. We reconstruct such paths as curves that we call Motion Curves. This
representation allows for the direct application of techniques from geometric modeling
and signal processing. These techniques can be used to simplify animation tasks, such as
interpolation. They also present new and interesting ways to interact with motion data,
and have been leveraged to create unique motion editing tools.
1.2 Statement of Thesis
This thesis formalizes the Motion Curve representation, and explores the expressive power
of various operations within the Motion Curve space. In doing so, it introduces several
new algorithms for dealing with motion data, including a search algorithm for unseg-
mented motion clips. These algorithms are implemented as standalone prototypes uti-
lizing a common data format. The purpose of these prototypes is to establish the workflow
for a hypothetical production application. The prototypes validate the functionality of
the proposed application.
In presenting this work, we hope to expand the size of the animator’s toolbox. By
providing a new and usable framework for editing motions, we make it possible to quickly
modify existing motion assets and stretch animation budgets. Our techniques can also
be used to modify motions dynamically and continuously, in situations such as games or
real-time visualizations. In this context, our work gives the designer of such a system
meaningful axes for high-level control of animations. It also provides a flexible frame-
work for pose interpolation, which can be integrated with existing blend-based animation
systems.
1.3 Contributions
The main contribution of this thesis is the introduction and characterization of the Mo-
tion Curve representation. The unique characteristics of this representation permit the
development of several useful algorithms for dealing with motion data. We provide both
low-level data manipulation tools, as well as high-level algorithms that leverage the tools.
The major technical contributions include:
• An algorithm for robustly detecting key poses (section 3.4)
• Quick, M-way pose interpolation (section 4.2)
• A prototype motion editing application that implements several unique operators
(chapter 5)
• A search algorithm for unsegmented motion data, which was published as [24].
1.4 Thesis Organization
Chapter 2 presents an overview of the state of the art in the various fields this work touches
upon. It begins by discussing the representation of poses in the literature. The standard
skeletal hierarchy model is presented in detail, and other more obscure or specialized
models are mentioned briefly. Next, the various methods used to generate motion data
are explained. We continue by presenting an overview of modern motion processing
techniques, dividing the field into two camps: signal-based and state-based. We then survey
some recent works in motion segmentation and automatic recognition, and finish with
a survey of high dimensional data search techniques. The information in this chapter
provides a good sense of context for the ensuing work.
The Motion Curve representation is formalized in chapter 3. A case for Motion Curves
is built first, by discussing existing representations, and enumerating a list of desirable
but as yet unmet characteristics for a motion representation. The steps for constructing a
Motion Curve space are enumerated next. The chapter ends by demonstrating the prop-
erties of the representation, and presenting a method for pose detection using statistical
modeling within the space. This chapter is crucial to the remainder of the thesis, because
all of the techniques developed later depend upon the Motion Curve representation.
Several new results in motion interpolation are presented in chapter 4. The first is
a simple method for multi-way interpolation. Next, we demonstrate linear interpolation
in the Motion Curve space, and compare the results to the standard spherical linear
interpolation result. We also present a method for preserving the appearance of dynamics
when extrapolating through gaps between motion clips. The methods introduced in this
chapter greatly simplify several very important cases of the pose interpolation problem.
In chapter 5, several families of motion editing geometric operations are introduced.
First, a method for finding bounded mean poses is presented. This method is then used
to develop a series of operations based upon scaling and translation which can be used to
change the character of regions of motion clips. Examples of edited clips are presented.
In addition, several operations are presented that lack artistic usefulness, but help to
flesh out the space. Finally, the filtering of Motion Curves is discussed, and a wavelet
decomposition model is built. The operators described in this chapter provide enough
functionality for a highly expressive motion editing platform.
An efficient search algorithm for unsegmented motion clips is presented in Chapter
6. This search algorithm finds the regions in a long database clip that are most similar
to a short query clip. The components of the algorithm are first presented in isolation,
then the performance of the resulting system is evaluated through experimentation.
The example-based search algorithm presented in this chapter is useful in its own right,
and is a powerful enhancement to the motion editing platform described in the previous
chapter.
Chapter 7 presents the future work stimulated by this thesis, and draws conclusions
from the results presented in previous chapters.
Chapter 2
Background
In this chapter, we survey the state of the art in animation representations. We begin
by laying down the fundamentals of how poses are stored and manipulated in modern
works. This leads into a discussion of how motion data is represented for editing in both
manual and automatic contexts. Automatic editing contexts often include an element
of pose-based segmentation or recognition, so we outline these areas as well. Finally, we
discuss high dimensional data search techniques, which provide a background for our
work on motion searching.
2.1 Representing Poses
In this thesis, we describe a pose as the instantaneous configuration of an articulated
figure. We only consider the figure’s spatial position; poses are regarded outside of time.
In this section, we describe the most commonly used pose representation in detail, and
briefly discuss other models.
2.1.1 Skeletal Animation
Human character animation, whether rendered in real time or off-line, is usually implemented
using a hierarchical skeletal model. In such a model, the body is divided into
rigid sections, called bones, that roughly correspond to the character’s skeleton. These
bones are arranged hierarchically, with a parent-child relationship forming a joint. The
orientation of each joint can be represented as a local rotation matrix, and each bone
can be represented as a rigid translation. Motion can be introduced to the system by
changing the matrices over time. In general, the joint transformation can be any combi-
nation of transformations, although many systems make the simplifying assumption that
all joints are purely rotational.
The character can be posed by specifying rotational values at the joints. Joint limits
derived from anatomical data are often enforced to prevent the skeleton from assuming
unrealistic positions, although these do nothing to limit self-intersection or balance con-
straints. Joints may be constrained to only allow movement along certain axes. Each of
these axes is referred to as a degree of freedom. A pose is fully specified by a complete
listing of all of its degrees of freedom. Often, some elements of the global position and
orientation of the root bone (usually the pelvis) are also included in the definition of the
pose.
The mathematics of the transformation are quite simple. Consider the joint and bone
hierarchy depicted in figure 2.1. Each bone has a translation matrix associated with
it, denoted T_x. Rotational joint transformations are named after their child joints, and
denoted R_x. The root transform, which can be any combination of rotations and
translations, is M_root. To express the position P of the far end of a bone in world
coordinates, it is only required to concatenate the transformations. The full transformation
for the tip of each bone can be expressed as
Figure 2.1: The transformation chain of a skeletal hierarchy.
P_1 = T_1 R_1 M_root    (2.1)
P_2 = T_2 R_2 P_1    (2.2)
P_3 = T_3 R_3 P_2    (2.3)
P_4 = T_4 R_4 P_2    (2.4)
These transformations can be used directly to position graphical representations of
the bones, or the transformation chain can be used to drive a linear-blend skinning scheme
(as will be discussed later). The application prototypes developed for this thesis use a
simple rigid body part graphical model, for simplicity.
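To make the transformation chain concrete, the following sketch accumulates the composite transform of each bone down a single parent-to-child chain, following the composition order of equations (2.1)-(2.4). It assumes 4 x 4 homogeneous matrices and a single rotational degree of freedom per joint; the bone offsets and joint angles are hypothetical values chosen only for illustration.

import numpy as np

def translation(dx, dy, dz):
    # Homogeneous 4x4 translation matrix for a bone offset (a T_x term).
    T = np.eye(4)
    T[:3, 3] = [dx, dy, dz]
    return T

def rotation_z(angle):
    # Homogeneous 4x4 rotation about the z axis (a purely rotational joint R_x).
    c, s = np.cos(angle), np.sin(angle)
    R = np.eye(4)
    R[:2, :2] = [[c, -s], [s, c]]
    return R

def chain_transforms(m_root, bones):
    # Accumulate P_i = T_i R_i P_(i-1) down a parent-to-child chain,
    # starting from the root transform M_root.
    P = m_root
    tips = []
    for T, R in bones:
        P = T @ R @ P
        tips.append(P)
    return tips

# Hypothetical two-bone chain (for example, an upper arm and a forearm).
M_root = translation(0.0, 1.0, 0.0)
bones = [(translation(0.0, 0.3, 0.0), rotation_z(0.4)),
         (translation(0.0, 0.25, 0.0), rotation_z(-0.7))]
tip = chain_transforms(M_root, bones)[-1]
print(tip[:3, 3])  # translation component of the composite transform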
This model contains many simplifications. Often many fewer bones are used than
exist in an actual skeleton. For example, the human spine has 26 vertebrae. The default
skeleton used by the Vicon 9 motion capture system has only 3 bones in its pelvis-to-
head chain. Joints are commonly simplified in terms of allowable axes of rotations. The
translational effects of stretched tendons and soft tissue are ignored when the translations
are excluded from the joint transform. The assumption that bones are rigid is also
suspect, as real bones exhibit surprising flexibility under load.
2.1.2 Driving a Mesh
Skeletons provide a fast and convenient method for representing the motion of an articu-
lated figure, but are not attractive when rendered. The eventual goal with most character
animation is to deform a surface model. The underlying motion representation discussed
in the previous section is often used to drive such a deformation. Given a complex enough
surface model and deformation method, the results can look quite good. With greater
artistic expectations, however, comes a requirement for more realistic motions. In [31]
Hodgins et al. present experimental results that suggest that people are more able to
spot differences in motions when they are expressed through a polygonal surface model,
rather than through a stick figure.
One of the simplest methods for deforming a mesh by a skeleton is to use linear blend
skinning, which maps each vertex to one or more of the figure’s joints with a set of real-
valued weights. This leads to the points being deformed by a linear combination of their
parent transforms [39]. This technique is conceptually simple, and can be implemented
in graphics hardware. As such, it is often used in games, or similar real-time applications.
While linear blend skinning provides a fast solution for on-line applications, it introduces
unsightly artifacts to the mesh. For this reason, more complicated models are often used
in off-line animation.
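As a rough sketch of this scheme (not code from any particular engine), suppose the per-joint skinning matrices, i.e. the current joint transforms composed with the inverse bind pose, have already been computed; each rest-pose vertex is then deformed by the weighted sum of its joints' transforms. The array shapes and names below are illustrative assumptions.

import numpy as np

def linear_blend_skinning(rest_vertices, skinning_matrices, weights):
    # rest_vertices:      (V, 3) rest-pose vertex positions
    # skinning_matrices:  (J, 4, 4) per-joint transforms (current pose * inverse bind pose)
    # weights:            (V, J) per-vertex joint weights, each row summing to one
    V = rest_vertices.shape[0]
    homo = np.hstack([rest_vertices, np.ones((V, 1))])             # homogeneous coordinates
    per_joint = np.einsum('jab,vb->vja', skinning_matrices, homo)  # every vertex under every joint
    blended = np.einsum('vj,vja->va', weights, per_joint)          # weighted blend per vertex
    return blended[:, :3]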
2.1.3 Other Pose Representations
Not all researchers use a skeletal animation system. Skeletal animation, even when
paired with a skinning algorithm, is a simplification that does not accurately reflect the
deformations of a flexing subject’s surface.
In [1], Alexa and Muller present a PCA-based vertex representation for time-varying
geometry. They perform PCA on an animation represented by a collection of keyframe
meshes with isomorphic vertex-edge topology. The use of PCA has several benefits, but
the motivating factor for using it in this case is dimensionality reduction. It also
facilitated a mesh-correspondence algorithm to transfer animations between similar
meshes.
Kovar et al. use a skeletal animation system with a point-based pose distance metric
in an ongoing series of papers [45, 44, 43]. In order to compute a distance between
two poses, a low-resolution mesh is deformed to the current pose. The metric is then
based upon a squared sum of distances between corresponding mesh vertices. The root
transformation that cancels the difference between the two poses is found by a closed
form minimization.
In [46], Kulpa et al. describe a motion representation that is independent of character
morphology and which encodes the constraints in the motion itself. This allows for the
easy transfer of motion between different characters, and facilitates the enforcement of
spacetime constraints.
2.2 PCA
Principal Components Analysis is a statistical technique that is widely used for dimen-
sionality reduction [8]. The result of performing PCA on a given dataset is, in effect, a vector
space with the same dimensionality. Each axis in the space represents a principal com-
ponent vector. Any point in the space is thus a weighted combination of the principal
components. If the principal components are ordered according to the amount of variance
that they describe in the original dataset, the variances typically show an exponential
drop-off. It is this property that admits dimensionality reduction: a full data point can
be represented with a predictable degree of fidelity by using some smaller subset of its
PC coordinates.
The standard method for performing PCA on a set of n d-dimensional points is to
first determine the sample mean, and subtract it from the data set. Next, the covariance
matrix of the points is found. An eigenanalysis is then performed on the covariance ma-
trix, yielding d eigenvectors and eigenvalues. The eigenvectors, which are orthogonal by
virtue of the symmetry of the originating covariance matrix, form the basis of the PCA space.
d-dimensional points can be transformed into the PCA space by multiplying them by the basis matrix. If the
original data exhibits a low-dimensional linear structure (such as lying about an embed-
ded plane), further data that conforms to the same structure, when projected into the
space, can be represented using fewer than the full set of basis vectors with minimal loss
of fidelity. Full-dimensional points are reconstructed by multiplying the projected points
by the basis vectors, and re-adding the original sample mean.
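The standard procedure just described can be sketched in a few lines. This is a generic illustration rather than the thesis prototype code, and it assumes the observations are stored as the rows of a data matrix.

import numpy as np

def fit_pca(X):
    # X is an (n, d) matrix of n d-dimensional observations.
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)      # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # symmetric matrix, so eigh is sufficient
    order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
    return mean, eigvecs[:, order], eigvals[order]

def project(X, mean, basis, k):
    # Express observations in the first k principal component coordinates.
    return (X - mean) @ basis[:, :k]

def reconstruct(coords, mean, basis):
    # Map reduced coordinates back to the original d-dimensional space.
    k = coords.shape[1]
    return coords @ basis[:, :k].T + mean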
An interesting twist on standard PCA is weighted PCA. Skocaj and Leonardis present
a framework for wPCA in [63]. Working within a vision context, they seek to construct a
PCA model of a video stream. They apply temporal and spatial weights to pixels in the
video, denoting their relevance to the model. For example, occlusions in the camera’s
field of view can be masked out, and periods of bad lighting or focus can be ignored.
2.3 Creating Motion
Once we have a way to represent poses, we can consider ways to generate them. Currently,
motion data is quite expensive to acquire, compared to other forms of multimedia. For
example, high quality images can be taken with consumer-grade cameras. Collecting
motion data requires either considerable technical and artistic expertise or specialized
hardware.
2.3.1 Keyframing
The most common way to generate motion data is to meticulously build it by hand. In
production studios, animators often use software packages like Alias’ Maya, or Discreet’s
3D Studio Max. While techniques such as inverse kinematics and procedural animation
help to reduce the workload, most time is spent setting up key frames. Key framing is
a concept borrowed from traditional animation, in which a lead animator often draws
only the most important frames in a sequence. Other animators then proceed to draw
the “in between” frames. In computer animation, interpolation takes the place of the “in
betweeners”.
2.3.2 Rotoscoping
Rotoscoping is an animation technique that results in extremely life-like motion, because
it is in fact drawn from live motion. The desired motion is first recorded on film or
video, resulting in a series of frames. Drawings are then done over each frame, using the
captured images as a reference. The process of rotoscoping can be used as a time-saving
shortcut to producing traditional-looking animation, or as a means to creating stylized
animations. An example of a film that did the former is Disney’s Snow White, and an
example of a film that did the latter is Linklater’s Waking Life.
2.3.3 Motion Capture
An alternative to keyframing is to use motion capture. Motion capture systems use
various techniques to digitize the movements of an actor. Where rotoscoping generally
only recovers the two-dimensional projection of the position of the actor’s body from the
perspective of the camera, motion capture recovers a fully three-dimensional representa-
tion of the actor’s pose. For this thesis work, we had access to a Vicon 9 motion capture
system. The Vicon 9 is vision based: the actor wears special reflective markers, which are
viewed by an array of cameras. Given enough cameras to avoid self-occlusion, the loca-
tions of the markers can be found via computer vision techniques. Software provided by
the manufacturer can be used to fit the reconstructed marker positions to an underlying
skeletal model, and estimate the joint angles. Motion capture is useful for creating large
volumes of realistic motion data quickly and easily, but it has several limitations. First,
the equipment involved is expensive and awkward to use. Secondly, the resultant motion
is limited to the realm of the possible. In order to get animation of superhuman feats,
post-processing with traditional animation tools is required. Similarly, motion capture
is hard to implement for non-human animal subjects, and impossible to implement for
imaginary subjects.
2.3.4 Simulation
As the computing power available to animators grows, simulation is becoming a more
feasible option for generating certain kinds of character motion. A good general intro-
duction to the concepts behind numerical simulation is the Siggraph 1997 course note
package prepared by Baraff and Witkin [4]. The notes start with a review of differential
equations, and work their way up to rigid body dynamics and constrained dynamics, two
subjects crucial for physical character animation.
Most work in physical character simulation focuses on specific behaviours or aspects
of motion. A good example of this is the controller-based work of Yang et al. [71].
A swimming character is intimately tied to its environment through full-body contact
with a viscous medium, so simulation works well to add the subtle interactions that an
animator might miss. A further example is the work of Hodgins et al. that deals with
animating human athletics [32]. In this case, specific motions that depend upon balance
or ballistics were simulated with a high degree of verisimilitude. Work on composable
controllers by Faloutsos et al. provides a framework for switching between specialized
controllers during a simulation to allow a simulated agent a wider repertoire [20]. This idea
was further explored in [21]. A mixture of kinematic animation and physical simulation
was used by Shapiro et al. in [60], where they implemented a supervisory controller
similar to Faloutsos’ which switched between animation methods depending upon the
circumstances in the scene.
Procedural controllers can also be used to drive kinematic animation. In [66], Sun
and Metaxas present a layered controller that uses a novel interpolation technique to
synthesize walking motion from database examples. Here, the plausibility of the resultant
motion is maintained through heuristics and the sample-remix nature of the data, rather
than from a physically correct simulation. In [55], Neff and Fiume present another use
of kinematic controllers. In this case, they use a heuristic approximation of balance
(amongst other things) to increase the expressive power of an IK solver. Taking the
complementary approach in [54], they used dynamics simulation to increase the expressive
range of PD controllers.
An interesting use of simulation is presented in [73]. In this work, Zordan and Hodgins
use motion capture data and IK techniques to drive very stiff controllers in a physical
simulation. When a contact occurs in the animation (such as a boxer getting punched),
the controllers are loosened to allow the dynamics to have a greater effect on the overall
motion. This technique allows for motion capture reuse, but maintains a degree of
interactivity.
2.3.5 Digital Puppetry
Techniques from puppetry have successfully been used to create nuanced motion perfor-
mances. Puppetry itself is an ancient art, but even the concept of remapping a person’s
movements to an exterior manipulator is not new. For example, Heinlein provided the
intellectual groundwork for telerobotics in his 1942 novella Waldo. It is a small step to
move from remapping one’s degrees of freedom to a robotic manipulator to remapping
them to a virtual character.
Remapping motions from one character to another is a common problem in computer
graphics, and a fundamental issue in virtual puppetry. One of the first papers to attempt
to solve this problem was Gleicher’s work [27], which transfers motion between characters
with the same skeletal structure, but different limb lengths. Key features of the input
motion, such as foot falls or interactions with external objects, are specified as constraints,
and the new motion is found via a non-linear optimization, using the input motion as a
starting point. Shin et al. present a framework for filtering real-time motion capture input
and remapping it to a virtual character [61]. This remapping is guided by constraints
which are deemed as having “dynamic importance”, which depends upon the context of
the motion itself. Shin’s system operates in real time, so the types of constraints that
can be specified are more limited than those in Gleicher’s off-line system, which is free
to optimize over all of spacetime.
Many characters that an animator/puppeteer might want to control will have non-
standard body configurations. In [19], Dontcheva et al. present a system that allows the
animator to interactively animate characters by manipulating motion-captured widgets
made from Tinker Toys. Mappings between the widgets and the character’s degrees of
freedom are built on the fly by the animator imitating the character’s movements. Since
the animator can only operate a few degrees of freedom at one time, complex animations
are built in multiple passes. This provides an intuitive, play-like interface that allows for
impressive results in a short amount of time.
In [48], Laszlo et al. take a different approach to the problem of mapping input DOFs
to performance DOFs. In this work, several different control schemes are presented that
relate mouse movements and keypresses to simulated motor controllers. The figures being
animated are physically simulated in real-time. The simulation provides a subtlety to the
motion beyond the raw data provided by an input device, and with some training, inter-
esting performances are possible. This process presents a highly interactive environment
that is closer to performance art than traditional animation.
2.4 Motion Processing
Most works take some variation of the skeletal representation developed in the previous
sections for granted. There is much more diversity of opinion when it comes to repre-
senting motion. We present an overview here by dividing the various approaches into
two camps: those that treat motion as a signal, and those that treat it as a progression
through a series of states.
2.4.1 Signal Based Techniques
Bruderlin and Williams’ “Motion Signal Processing” [12] provides a good introduction
to the signal theoretic approach to motion. In this paper, they introduce the concept
of multiresolution filtering for motion data. By applying band-pass filters to the indi-
vidual degrees of freedom of recorded motions, they are able to change the character of
the motions. They also present a multi-target interpolation technique that can be used
in conjunction with the dynamic time warp algorithm (see below) to produce blended
animations. They introduce motion displacement mapping, wherein motion signals are
locally altered to resemble example motions. The paper uses a straightforward Euler
angle parameterization, which limits its applicability to complex joints such as the shoul-
der. The interpolation techniques that they present, in particular, would be fraught with
rotation order artifacts that would not be present had they used quaternions.
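The flavour of this kind of band manipulation can be illustrated on a single degree of freedom. The moving-average filter and the parameters below are stand-ins chosen for brevity, not a reproduction of the filter bank used in the paper.

import numpy as np

def smooth(signal, width):
    # Simple moving-average low-pass filter (a stand-in for a proper filter bank).
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode='same')

def adjust_band(signal, narrow, wide, gain):
    # Isolate a frequency band as the difference between a lightly and a heavily
    # smoothed version of the curve, then boost or attenuate that band. This alters
    # the 'character' of the motion while preserving its coarse trajectory.
    band = smooth(signal, narrow) - smooth(signal, wide)
    return signal + (gain - 1.0) * band

# Hypothetical joint-angle curve: a slow swing plus a small, fast wobble.
t = np.linspace(0.0, 4.0 * np.pi, 400)
dof = np.sin(t) + 0.05 * np.sin(15.0 * t)
exaggerated = adjust_band(dof, narrow=3, wide=25, gain=2.0)  # amplify the higher band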
Li et al. present a novel signal-based motion editing technique in [49]. The central
conceit of this work is that the structure of a motion comes from its mid and low fre-
quency components, while its character (or “texture” in the paper) is expressed in higher
frequencies. By decomposing motion signals into Laplacian triangles, it is possible to
transfer “texture” from one example to another through a pattern-matching algorithm.
With this technique, motions that have been coarsely keyframed can be automatically
updated with detail from a previously completed (or motion captured) example.
Gleicher offers another perspective on the signal view of motions in [26]. In this work,
motions are modified through a displacement mapping procedure. The displacement map
is found through a constrained optimization, where the constraints are defined by the
animator. For example, the animator might specify that the figure’s hand must follow
a certain trajectory. Using automatic differentiation, the Jacobian of world-space con-
straint parameters is expressed with respect to the motion’s native parameterization
(Euler angle joint positions, in the paper). The optimization minimizes the weighted
magnitude of the displacement vector. The spacetime framework is used again in Gle-
icher’s work on retargetting [27].
In “Verbs and Adverbs” [58], Rose et al. describe a technique for creating a space that
supports parameterized interpolation and extrapolation. Through manual segmentation
and mark-up, example motions are clustered into ‘verbs’ and described with subjective
‘adverbs’. A good example of this taxonomy would be two clips labeled as walking, with
a real-valued ‘jauntiness’ parameter. Given several parameters, a coordinate system can
be constructed to hold all examples of a particular verb. Interpolation values can then
be found using radial basis functions. New motions are synthesized by applying these
interpolation values directly to the motions’ individual Euler-angle valued degrees of
freedom. The paper also describes how to create ‘verb graphs’, which define transitions
between different verbs. A time warping algorithm, along with spacetime constraints are
used to construct the transitions between the verbs.
2.4.2 State Based Techniques
An alternative to looking at motion as a collection of signals is to view it as a discrete
collection of poses. Each pose is represented by a particular configuration of the feature
vector (such as joint angles or vertex positions).
A series of papers by Kovar et al. play upon this idea [45, 44, 43]. These papers all
build distance tables between pairs of sampled motions. The table is then operated upon
to get various effects. The important distinction here is that the pose is the fundamental
entity. In Kovar et al.’s “Motion Graphs” [45], new motions are synthesized by finding
suitable sequences of poses from an example motion, much in the same way that video
frames are processed in [59].
Works that perform PCA upon motion data usually take a state-based perspective.
In [25], entire cycles of walking sequences are collected into vectors, and PCA is performed
at a very high level. Points in the resulting space represent cycles. By projecting several
parameterized examples into the space, it is possible to build axes. Points can then
be sampled along the axes in order to interpolate or extrapolate the original parameter-
ization.
Brand and Hertzmann’s Style Machines [11] used PCA as well, but took a more
pose-centric view. The main thrust of this paper was an application of Hidden Markov
Models to pose data. PCA was used to reduce the dimensionality of the dataset to make
the HMM training feasible. Given examples of similar movements performed in different
styles, related HMMs are trained, which indicate which portions of the two motions are
similar. The pairing of HMM states allows an animation to be synthesized which moves
between styles at will, but maintains a consistent choreographic structure.
A more recent work by Grochow et al. [29] fits a Scaled Gaussian Process Latent
Variable Model to pose data in order to construct a map of the likelihood of poses in a
given movement. This map can then be used as a part of the objective function of an
optimization process in order to perform inverse kinematics that conform to the ‘style’
of the constructing motion data. Again, the base unit of currency is the pose.
A novel use of PCA is found in the work of Barbic et al. on segmentation [5]. A space
is built from the raw quaternion representation of a motion. The inherent dimensionality
of the motion is calculated over time. The authors contend that a gross segmentation can
be made at the zero crossings of the derivative of the inherent dimensionality. They also
experiment with fitting a Gaussian Mixtures Model to the projected data, and segmenting
based upon the resulting clusters. While the segmentation algorithms presented in this
paper work well, the motion representation used is limited in that interpolation is not
possible in the constructed PCA space. This precludes it from being used for synthesis.
2.5 Motion Segmentation and Recognition
Motion capture data is most often captured in long takes. Individual motions, if they
are to be used in a production, or as a part of an interactive application, must then
be segmented out of the original sequence. This is a repetitive and boring task, so a
system for automatically segmenting the data would be a boon. A related task is mo-
tion recognition. If motion data is to be used for real-time interaction, a method for
interpreting the motion as it occurs must be implemented. Since motion capture equip-
ment is not yet widespread, most of the relevant work in this field comes from the vision
field. While a video of a person’s motions and motion capture data representing those
motions differ greatly in representation, they describe the same underlying phenomena.
In the research community, motion capture is often viewed as a way to study advanced
interaction techniques, under the assumption that the same functionality will eventually
be provided by commodity vision techniques.
The boundaries between one motion and other motions are often ambiguous. A
person may express more than one gesture at a time with different parts of their body - for
example juggling while walking. For this reason, it is somewhat easier to segment motions
that have a sense of structure, such as sports, dance, or martial arts. An example of a
system designed for dance is [38]. In this paper, Kahol et al. derive velocity, acceleration,
and positional data for various body segments from motion capture data, and aggregate
the results into an observation vector. They then trained a Bayesian classifier with
manual segmentations provided by several different human choreographers. The trained
system was able to correctly predict 93 percent of the gesture boundaries produced by
the five choreographers when presented with novel motion data. This system emphasises
the subjective nature of segmentation. Even within a structured context, the various
human observers produced different results. That the system was able to predict each
observer’s style is most impressive.
Bobick and Wilson present a technique for recognizing motions in [10]. In this work,
motions are represented as trajectories, regardless of their source, such as from a 2D
mouse, a motion-captured point, or even the PCA projection of an image sequence. The
method hinges upon the notion of “state”: a gesture is defined as an ordered progression
through several regions of configuration space. A prototype trajectory is built from one
or more examples, and states are found via a clustering algorithm. When the trained
system is presented with a novel trajectory, a dynamic programming algorithm is used to
estimate its support from each gesture prototype. Overall, this technique is conceptually
similar to Hidden Markov Models. The authors state that the most important distinction
is that this method can build a prototype from a single example motion, whereas Hidden
Markov models require a larger training set.
Hidden Markov models [56] provide a method for modelling and predicting the be-
haviour of a time varying system. The model assumes that the system can be approxi-
mated by a stochastic state machine. The internal (“hidden”) states and state-to-state
transition probabilities of the system are determined solely through the observation of its
output. In motion terms, the states found by such a system would be gestures or actions
that a figure can exhibit, and the output would be an observation of the figure in some
form (such as joint angles, or video).
Many papers have been written applying HMMs to various gesture recognition tasks.
In [70], Wilson and Bobick describe a vision-based system that trains an HMM online,
effectively learning new gestures on the fly. Starner and Pentland use HMMs to track
American sign language through video [64]. In vision-based techniques, the feature vector
used is of utmost importance. Campbell et al. explore the effectiveness of various feature
vectors for use in recognising Tai Chi movements using an HMM in [13]. Beck created a
vision-based system for Tai Chi training using HMMs in [6].
An interesting image-processing based motion recognition framework is presented by
Davis and Bobick in [18]. This system makes extensive use of Motion History Images
(MHIs). MHIs are produced by extracting binary segmentations of the foreground figures
in a sequence of video frames. These binary images are then superimposed over each
other, with an intensity keyed to their frame index. The resulting images are quite
distinctive (and interesting artistically), and are amenable to standard image recognition
techniques. MHIs were used extensively in [23].
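A rough sketch of MHI construction follows, assuming the binary foreground masks have already been extracted from the video frames; the normalization of intensities to the [0, 1] range is an arbitrary choice made here for simplicity.

import numpy as np

def motion_history_image(masks):
    # masks: (F, H, W) array of 0/1 foreground silhouettes, oldest frame first.
    # Later frames overwrite earlier ones with a higher intensity, so the result
    # records both where and how recently the figure moved.
    num_frames = masks.shape[0]
    mhi = np.zeros(masks.shape[1:], dtype=float)
    for i, mask in enumerate(masks):
        intensity = (i + 1) / num_frames   # intensity keyed to the frame index
        mhi[mask.astype(bool)] = intensity
    return mhi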
2.6 High Dimensional Data Search Techniques
In chapter 6, we develop a method to search a long motion for segments of high similarity
to a short query segment. Searching within a set of loosely-ordered, high-dimensional
data points is a difficult task, and an area of active research. In this section we will first
discuss some general strategies for high dimensional searching, then present some relevant
results in the specific field of motion searching. We finish by presenting details about the
dynamic time warping algorithm - a technique that we use in our search algorithm.
One way to search high-dimensional data, such as pictures, sound, or motion capture
data, is through markup. ’Markup’ refers to textual annotations that are added to
data. Search is then done by proxy on the text. The MPEG-7 standard [50] describes
a framework for multimedia mark-up. The standard covers multiple media types, and
can be extended to cover others. Features in the media stream are assigned descriptors
according to a schema based upon the media type. These descriptors can then be queried
to navigate the media stream.
One thing that the MPEG-7 standard does not address is how to apply the markup
in the first place. Certain features can be extracted from a stream automatically, but
higher level features that require an understanding of the stream, or outside knowledge,
must be found manually. In chapter 7 we suggest (as future work) a method to semi-
automatically apply subjective markup to motion capture data using a search algorithm.
More sophisticated results may be realized with modern machine learning techniques.
If meta-data is not used in a search, it can be difficult to phrase the query. One
strategy that has been successful for several types of data is query-by-example. In a QBE
system, the user provides a sample data point, and the system returns other points that
it deems similar. A survey of such systems is provided in [72].
The key component of a similarity-based search algorithm is a well-defined distance
measure between the data points that are visited. Unfortunately, efficient and robust
distance measures are hard to design for many types of media. Salesin and Finkelstein
present a wavelet-based search method for static images in [36]. Their method transforms
an entire image into a robust and much more compact signature. The signatures that
they define are robust enough that the user can specify a very rough version of the image
as a search key. This lends itself to an intuitive sketching interface.
Using a signature for searching works well for discrete entities, like whole images, but
is not applicable to motion data, where potential matches take the form of subintervals
within a much larger time-series. Fortunately, there are several techniques for finding
similarities in sequences. Hidden Markov Models, which were introduced above in our
discussion on segmentation, are a good candidate. In [68], Valivelli et al. use HMMs to
implement an example-based search for audio data. A model of “uninteresting” sound
is built from a large library of noises that do not match the query clip. This model is
then used in conjunction with a model of the query to find matching regions in the input
stream.
Motion data is still quite scarce, so few authors have addressed the task of searching
through it. As motion data becomes more prevalent, however, research in this direction
is starting to appear. In [44], Kovar and Gleicher create an exhaustive table of the
inter-pose difference between two motion sequences of arbitrary length. With some post-
processing, this table can be used to quickly find matches for segments from one motion
in the other. While useful for certain applications, such as the parametric extraction task
which is the major focus of their paper, the long pre-processing time precludes it from
use with novel or real-time queries.
Dynamic time warping is a technique that is traditionally associated with speech
recognition, but is often applied to other signals as well. DTW defines a non-linear
correspondence between two signals, effectively stretching and compressing one of them
to match the other. The algorithm is computationally expensive, and is solved using
dynamic programming [7]. Conceptually, the two signals are arranged along the axes of
a two dimensional matrix. This matrix is filled with the pair-wise sample distances of the
two signals. Starting from any index in the matrix, an optimal alignment can be found
by accumulating the minimum distance forward and backward to the boundaries of the
matrix. In most situations, the high computational cost of the algorithm stems from the
filling of the distance table.
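A minimal version of the dynamic programming step is sketched below; it accumulates cost from one corner of the table to the other and uses a Euclidean per-sample distance, which is an assumption made here rather than a requirement of the algorithm.

import numpy as np

def dtw_cost(a, b):
    # a: (n, d) and b: (m, d) sequences of d-dimensional samples.
    n, m = len(a), len(b)
    # Pair-wise sample distances (usually the expensive part of the algorithm).
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j],      # skip a sample of a
                                                 acc[i, j - 1],      # skip a sample of b
                                                 acc[i - 1, j - 1])  # match the two samples
    return acc[n, m]   # cost of the optimal corner-to-corner alignment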
Bruderlin and Williams applied DTW to animation parameters in [12], which we
have previously mentioned. Kovar and Gleicher have used it to align motion clips before
interpolation [43], and there is active research within the data mining community to
improve upon the basic algorithm [16, 41].
Not all motion matching systems use DTW to align signals. In [14], Cardle et al.
present a system for motion searches based upon the Longest Common Subsequence-
based multidimensional trajectory comparison measure proposed by Gunopulos et al.
[69]. Keogh et al. use uniform scaling to match signals globally in [40], avoiding the
degenerate over-fit warps to which DTW is prone.
2.7 Summary
In this chapter we have presented the background materials that define the context of
our work. We began by discussing various methods of representing poses, and describing
the dominant skeletal hierarchy method upon which our work is based in detail. We then
explored various methods of creating motion data from which poses can be extracted.
Next, we considered two perspectives on the problem of processing motion data once it
has been created. We finished the chapter with brief overviews of motion segmentation,
recognition, and search techniques. This chapter presented general background materials
and papers related to the thesis; more specific references are provided in context as we
develop the technical material in the remainder of the document, particularly the background
work related to the creation of Motion Curves in Chapter 3.
Chapter 3
Motion Representation
In this chapter we introduce motion curve space, a representational framework for pose
and motion data. We begin by highlighting the problems with other pose representations
that informed the development of motion curve space. Next, we explain the steps that
must be taken to construct a motion curve space from example data. We then briefly
describe the features of the space, foreshadowing the detailed descriptions in the following
chapters. A visualization method for the motion curve representation, which is used in
almost all of the prototype applications that were developed for this thesis, is described
next. Finally, we present a method for building statistical models of poses in motion
curve space in order to recognize those poses within novel motion clips.
3.1 The Trouble with Motion Data
In its raw form, motion data is not easy to work with. Much of the difficulty stems from
the lack of an inherent distance function between poses. Researchers have used many
different approaches in their own motion work, such as the deformed point-cloud method
described be Kovar et al. [45], or the weighted sum of quaternion distances proposed
by Johnson [37]. We present a weighted-PCA based representation for poses that has a
Euclidean distance metric. The simple distance metric allows for the direct application of
standard data processing techniques. Being PCA-based, our representation also benefits
from having a coarse-to-fine interpretation, which may allow for less accurate, but quicker
distance calculations.
3.2 Motion Curve Space
A motion curve space is constructed using a motion clip. The choice of clip is very
important, because the joint angle correlations that it contains are reflected in the distri-
bution of the axes in the resulting space. The clip should be long enough to explore the full range
of motion for each joint in the figure. If a joint is not fully exercised in the example clip,
certain valid poses may fall outside of the span of the space. Since it is impossible to
represent out-of-span poses accurately, the use of motion curve space becomes lossy. We
will analyze the error inherent in our representation after we describe how to construct
a space.
When working with motion capture data, we usually use a space created from the
range of motion test data that was used to calibrate the capture system. This ensures
that the maximal amount of variance is introduced during the construction of the space.
Sometimes such clips are not available. For example, in Chapter 6, we use synthetic
motion data created using dynamic simulation and controllers. The controllers were very
simple, and incapable of fully exploring the space of possible poses. In this case, we used
the data clip that we were operating on to create the space. The scarce input data led
to a space with a small span, and thus less expressive power. It was acceptable for the
purposes of the search algorithm that we were testing, however, because we were not
trying to create and represent new poses.
3.2.1 Constructing the space
Creating a motion curve space is a two-step process. The first step is to linearize the
quaternions of the example clip, and put the data into a matrix form. The second step
is to apply the weighted PCA algorithm, and orthogonalize the resulting vectors. Each
step will now be explained in detail.
Linearizing a unit quaternion brings it from a four-element imaginary vector to a
three element real vector. The advantage of doing so is that one does not have to worry
about keeping the vector normalized: all possible vectors in R3 correspond to a valid
rotation. Grassia explains the procedure in [28], and we outline it here for completeness.
A unit quaternion can be expressed in Euler form as
\[ Q_T = e^{n\frac{\theta}{2}}, \tag{3.1} \]
where n is the axis of rotation, and θ is the angle of rotation. The quaternion is linearized
by taking its logarithm in this form:
\[ \log Q_T = \log e^{n\frac{\theta}{2}} = n\frac{\theta}{2}. \]
Computationally, the mapping from vector ~v to quaternion [ w x y z ] is implemented
as:
\[ \theta = |\vec{v}|, \qquad w = \cos\frac{\theta}{2}, \qquad [\,x\ y\ z\,] = \vec{v}\,\frac{\sin\frac{\theta}{2}}{\theta}. \]
And the reverse operation is:
\[ m = \frac{2\arccos w}{|[\,x\ y\ z\,]|}, \qquad \vec{v} = m\,[\,x\ y\ z\,]. \]
Derivations for these operations can be found in [37] and [28].
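As an illustrative sketch, these two mappings might be implemented as follows in Python with NumPy; the function names and the guards against near-zero rotations are our own additions, not part of the thesis prototypes:

    import numpy as np

    def quat_exp(v, eps=1e-12):
        # Map a linearized 3-vector back to a unit quaternion [w, x, y, z].
        theta = np.linalg.norm(v)
        if theta < eps:                        # near-zero rotation: identity quaternion
            return np.array([1.0, 0.0, 0.0, 0.0])
        w = np.cos(theta / 2.0)
        xyz = v * (np.sin(theta / 2.0) / theta)
        return np.concatenate(([w], xyz))

    def quat_log(q, eps=1e-12):
        # Inverse mapping: unit quaternion [w, x, y, z] to a 3-vector.
        w, xyz = q[0], np.asarray(q[1:])
        norm_xyz = np.linalg.norm(xyz)
        if norm_xyz < eps:                     # identity rotation maps to the zero vector
            return np.zeros(3)
        m = 2.0 * np.arccos(np.clip(w, -1.0, 1.0)) / norm_xyz
        return m * xyz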
The linearization of a quaternion is performed with respect to some reference orienta-
tion. This is done by ‘rotating out’ the reference via quaternion multiplication before the
log is taken. The choice of reference orientation is very important, because the accuracy of
an interpolation between two linearized orientations is reduced with their distance from
the reference. The orientations that show up in hierarchical skeletal models are often
based upon actual skeletal joints. Real joints usually have tightly constrained bounds,
and in most natural cases, the movement will tend to fall into an even tighter comfort-
able range. We exploit these features and use each joint’s sample mean orientation as its
reference. The procedure for finding an estimate for the mean of a set of quaternions is
discussed in detail in section 5.2. Each linearized quaternion in a pose is concatenated
to create a vector of length 3DOF. These vectors will be the observations in the WPCA
algorithm.
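A hedged sketch of this per-joint linearization follows; it reuses the quat_log helper from the previous fragment, and the names quat_mul, quat_conj, and linearize_pose are introduced here purely for illustration:

    import numpy as np

    def quat_mul(a, b):
        # Hamilton product of two quaternions stored as [w, x, y, z].
        aw, ax, ay, az = a
        bw, bx, by, bz = b
        return np.array([aw*bw - ax*bx - ay*by - az*bz,
                         aw*bx + ax*bw + ay*bz - az*by,
                         aw*by - ax*bz + ay*bw + az*bx,
                         aw*bz + ax*by - ay*bx + az*bw])

    def quat_conj(q):
        return np.array([q[0], -q[1], -q[2], -q[3]])

    def linearize_pose(pose_quats, mean_quats):
        # 'Rotate out' each joint's reference (mean) orientation, take the log map,
        # and concatenate the per-joint 3-vectors into one observation vector.
        parts = [quat_log(quat_mul(quat_conj(m), q))
                 for q, m in zip(pose_quats, mean_quats)]
        return np.concatenate(parts)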
We can use the pose vectors created during the linearization step to construct a PCA
space. Such a space, however, will not take into account the hierarchical nature of the
pose data. Perceptually speaking, a few degrees of change in the angle of a shoulder
changes the shape of a pose much more than a similar change in a toe. In fact, ‘noisy
toes’ can threaten to dominate the PCA space, and lead to an inefficient distribution of
the motion’s degrees of freedom over the principal components. This in turn increases
the number of dimensions that must be used to produce acceptable looking motion.
In order to prevent this, we use weighted PCA. Skocaj and Leonardis present a wPCA
formulation for vision applications, wherein weights can be applied to both subsections
of individual frames, and to entire frames [63]. We use only the former, and apply a
real-valued weight to each joint. The specific weights used can be manipulated to change
the properties of the resulting space, as we will show later. In the general case, we
use weights that are derived from an approximation of the relative amount of body mass
that is influenced by the movement of each joint. Pseudocode for the wPCA construction
algorithm is given in algorithm 1.
Algorithm 1 Creating the wPCA space
Ensure: X ← linearized ROM data
  Find the mean pose
  for all samples in ROM do
    for all DOF do
      Rotate out the mean quaternion
      Linearize the result
      Accumulate in matrix X
    end for
  end for
Ensure: X = wU × A
  U ← random values
  its ← 0
  reconError ← ∞
  while (its < maxIts) ∧ (reconError > ε) do
    E Step: QR solve for projection A
    M Step: LU solve for space vectors U
    Update its and reconError
  end while
  return the orthogonalized columns of U as the PCs
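A simplified sketch of the iterative step is given below. It assumes the per-joint weights apply uniformly to every frame, so they can be folded into the data and each step solved with ordinary least squares; the thesis's implementation (following [63]) uses QR and LU solves and also supports per-frame weights, so this is an approximation of Algorithm 1 rather than a transcription of it:

    import numpy as np

    def weighted_pca(X, joint_weights, n_components, max_its=100, tol=1e-8):
        # X: (3 * n_joints, n_frames) matrix of linearized, mean-removed poses.
        # joint_weights: one weight per joint, expanded to the three rows it owns.
        w = np.sqrt(np.repeat(np.asarray(joint_weights, dtype=float), 3))
        Xw = X * w[:, None]                              # fold the weights into the data
        U = np.random.randn(X.shape[0], n_components)    # random initial basis
        prev_err = np.inf
        for _ in range(max_its):
            # E step: solve U A = Xw for the projection coefficients A.
            A = np.linalg.lstsq(U, Xw, rcond=None)[0]
            # M step: solve for the basis U given A.
            U = np.linalg.lstsq(A.T, Xw.T, rcond=None)[0].T
            err = np.linalg.norm(Xw - U @ A)
            if abs(prev_err - err) < tol:
                break
            prev_err = err
        Q, _ = np.linalg.qr(U)                           # orthogonalize the converged basis
        return Q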
We construct the motion curve space using an offline application. The principal
components, along with their corresponding eigenvalues and joint means, are saved to
a file. Any number of these files, each built with different weightings or reference datasets,
can be used during a session with the interactive programs that we will describe over
the course of this document.
3.2.2 Projections and Unprojections
Motions can be expressed within a space by projecting them into it. They can be taken
out of the space (after modification, for example) through the process of unprojection.
Before projection, a pose must be linearized. This is done in the same way as it was
for the construction of the space, except the stored joint means are used. The projection
itself is then a matter of a simple vector-matrix multiplication. The projected coordinates
p can be found by multiplying the pose vector v by a matrix B, which has for rows the
space’s bases:
\[ \vec{p} = \vec{v}B. \tag{3.2} \]
Unprojection is equally simple. First, the opposite multiplication is made:
\[ \vec{v} = \vec{p}B^{-1}. \tag{3.3} \]
By construction, matrix B is orthogonal, so B^{-1} is simply B^T. Given \vec{v}, a quaternion
representation of the pose can be found by applying the exponential map (the inverse of the linearization in equation 3.1).
The matrix multiplications in these operations are readily optimized. Multiple poses
can be concatenated into matrices for batch processing in both directions. The log and
exponential mappings are more expensive, since they involve the evaluation of square
roots and trigonometric functions.
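For concreteness, a sketch of both operations is shown below; it assumes (for this sketch only) that the basis matrix B stores one orthonormal component per column:

    import numpy as np

    def project(v, B):
        # v: linearized pose vector; B: (3 * n_joints, k) orthonormal basis.
        return v @ B

    def unproject(p, B):
        # Since B is orthonormal, its (pseudo-)inverse is its transpose.
        return p @ B.T

    # Batch form: stack poses as the rows of V and project the whole clip at once:
    #   P = V @ B;  V_reconstructed = P @ B.T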
A single pose projects to a point in high-dimensional motion curve space. A sampled
motion that is made up of multiple sequential poses projects into a time-ordered series
of points. As we shall see in later sections, this representation lends itself to a geometric
interpretation. The fact that the lower dimensions of the projection can be visualized
geometrically reinforces the metaphor.
3.3 Space Characteristics
Various operations that are complex to perform with the original quaternion-based mo-
tion representation are greatly simplified using the wPCA representation. In this section,
we discuss several of the characteristics of motion curve space that make it useful for
working with motion.
3.3.1 Pose Distance Metric
The most significant feature of motion curve space is that it has an implicitly defined
distance metric. Since it is by construction a real vector space, the L2 norm can be used
as a metric. In practice, however, we usually subject the space to an affine scaling before
applying the norm, to take into account the relative amount of variance captured in each
axis. If ~v = [v1, v2, ..., vn] is a vector containing the eigenvalues from the orthogonalization
step of the space construction, the distance metric for comparing poses p and q in an
n-dimensional space is written as:
\[ \sqrt{\sum_{i=0}^{n} v_i^2\,(p_i - q_i)^2}. \tag{3.4} \]
It is sometimes advantageous to truncate the sum when evaluating the distance. This
estimates the high-dimensional deviation between poses by a lower-dimensional approxi-
mation. The relative distances between sets of points are not necessarily preserved under such a projection.
The frequency of such projection errors is reduced by the fact that in many cases the eigenvalues
decay exponentially, so v_i ≫ v_{i+1}. Still, projection errors can creep in under certain conditions, such
as the reduced-dimension Approximate Nearest Neighbour search described in chapter 6.
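One possible implementation of the scaled, optionally truncated metric of equation 3.4 (the function name and the truncation argument k are ours):

    import numpy as np

    def pose_distance(p, q, eigvals, k=None):
        # Scaled Euclidean distance between two projected poses (equation 3.4).
        p, q, v = np.asarray(p), np.asarray(q), np.asarray(eigvals)
        if k is not None:                  # truncate for a cheaper approximation
            p, q, v = p[:k], q[:k], v[:k]
        return np.sqrt(np.sum((v * (p - q)) ** 2))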
The weighting scheme used during the wPCA phase of space construction is reflected
in the distance metric. Movement in joints that were weighted heavily is represented
in the lower dimensions of the space, and thus has much higher ~v coefficients. An
animator thus has some control over the nature of the distance metric. By strategically
weighting different joints, it is possible to build spaces that have distance metrics suited
to specific tasks. For example, if an animator is working with walk cycles, s/he might
decide to weight the joints of the legs higher than those of the upper body. This will
cause two poses that have similar leg orientations and dissimilar arm orientations to be
considered as closer together than two poses with similar arm orientations and dissimilar
leg orientations.
3.3.2 Dimensionality Reduction
One of the primary uses of PCA is to reduce the dimensionality of a dataset. By combin-
ing correlated axes, PCA allows for a data point to be represented by fewer coordinates
than in its natural form, with some loss in fidelity. For the size of data that we work
with (usually 57 degrees of freedom), we have found that this is not needed to attain
interactive manipulation rates with the techniques that we have developed. Since it does
not cost much, it is usually best to use the full set of bases for reconstruction. Artifacts
typically become noticeable on the 57 DOF dataset as soon as anything more than the
spurious DOFs have been removed. The essential character of the motion is retained
much longer, with most motions being recognizable with as few as 3-5 DOF, but the
fidelity is not acceptable for most applications. We do use dimensionality reduction,
combined with different weighting schemes, to target the pose distance metric and direct
the search algorithm developed in chapter 6. PCA also guarantees that the bases of the
motion curve space that we produce are orthogonal.
Given our representation, some dimensionality reduction is natural, however. A phys-
ical knee joint has only one degree of freedom, barring the bending that we are already
abstracting away in our model. Synthetic knee joints, such as from a physical simulation,
are even more likely to have information for only one degree of freedom. By representing
every orientation in our skeleton using a quaternion, we inflate the number of degrees of
freedom for the sake of consistency. Luckily, the wPCA procedure finds all of these spu-
rious degrees of freedom, and relegates them to the lowest-value principal components,
where they can be safely ignored.
3.3.3 Visualization
Motion, being time dependent, is challenging to visualize. One of the most compelling
features of motion curve space is that it lends itself to a natural visualization, which
presents the entire motion as a static entity outside of time.
We visualize motion curve space in three dimensions by displaying the lowest three
dimensions of the space. Poses can be rendered as points in the space. Sequential poses
from a motion can be joined using line segments (or even higher-order polynomials) to
reinforce the sense of continuity. We navigate the space using a mouse-dragging interface
similar to the one used in Maya. The camera is locked in a spherical coordinate system
built around a focal point. Left dragging the mouse orbits the view position about the
focal point (which is rendered as a small coordinate axis). Right dragging moves the
focal point and view-local ground plane, and middle-dragging moves the focal point on
the view plane. Scrolling the mouse wheel adjusts the camera’s distance to the focal
point. Figure 3.1 shows an example motion projection.
The appearance of a motion visualized using this system depends upon the content
of the principal components. The same motion, when viewed under projection into two
separate spaces, can appear drastically different. The features of the motion that are
reflected in the visualization are determined by which joints are controlled by the lowest
principal components - something that the animator can control indirectly through the
choice of weighting schemes during space construction. This is a beneficial feature - the
Figure 3.1: An example projection
visualization simplifies the data, while giving the animator the choice of what types of
things that s/he wants to see. In figure 3.2, we show several steps of a walking motion
projected into two spaces. The left hand space uses our standard weighting scheme, while
the other uses a scheme that is weighted heavily toward the leg joints. The phase structure
of the walk cycle is visible in both examples, because walking is a highly coordinated full-
body motion. For more localized motions, such as punching or tapping, the animator
may need to try using specialized weighting schemes to discover the motions’ structure.
3.3.4 Representational Error
As mentioned earlier, the projection of a pose into motion curve space is guaranteed to
be reversible if the pose was part of the dataset used to create the space, and all of the
space’s dimensions are used. In other cases, some error may be introduced. The amount
of error depends upon the rank of the projection matrix. Degenerate spaces, made from
motion clips that do not exercise every joint in the skeleton, cannot be made to represent
those joints that were not used. When using motion capture data, the rank will almost
Figure 3.2: Two steps of a walking motion projected into two different spaces
always be fully expressed, but this can be a problem when working with synthetic data.
In order to determine the effect of using different datasets for space construction,
we built several spaces using clips with different characteristics (the default weighting
scheme was used in each case):
• Full Range of Motion Test. This clip is a recording of the trial used to calibrate
the motion capture array. The actor starts in the T-pose, and then proceeds to
exercise each major joint in isolation. He finishes with some walking and stretches.
The total length of the clip is about 128 seconds.
• Truncated Range of Motion Test. This clip is 20 seconds, taken from the
middle of the range of motion test.
• T-pose. This clip is 3 seconds of the actor standing in the T-pose.
• Assorted Moves. This clip is approximately 70 seconds of the actor performing
various Aikido movements.
The reconstruction error was tested subjectively using a small application that allows
the user to view an animation alongside its reconstruction. The number of bases used
for the reconstruction is user-specified with a slider. Unsurprisingly, the full range of
motion test clip resulted in the space with the best properties. Motion reconstructed
with as few as 25 bases (out of 56) passed visual inspection, and no glaring artifacts
were present at any level of reconstruction. The T-pose and truncated ROM trials led to
similar spaces - reconstruction with the full range of bases (minus the redundant DOFs)
was perfect, but the reconstructions did not degrade gracefully with reduced numbers of
bases. Using the assorted clip and a reduced number of bases caused a reconstruction
artifact resulting in contorted poses, but using the full set of bases fixed the problem.
The apparent robustness of the spaces (using the full number of bases) likely stems
from the random initialization of the base matrix during the wPCA procedure. Any full-
rank base matrix will produce a perfect reconstruction. The fact that there are several
redundant DOFs in our skeleton definition (since we are using quaternions to specify
all joints) reduces the rank required for a perfect reconstruction. In order to get good
reconstruction behaviour when using fewer than the full set of bases, the user should use
clips that exhibit a large range of motion when constructing a wPCA space. As we shall
see in chapter 4, the mean pose of the constructing clip will also affect the quality of
joint interpolations. In order to reduce artifacts, each joint's mean should be as close as
possible to the interpolant joint orientations. Thus, the constructing clip should depict
natural motion, preferably reflecting the same range of motion as the target motions
that will make use of the resulting space. A standard motion capture range of motion
test provides a good general case.
3.4 Pose Detection in Motion Curve Space
In this section we present a method for robustly detecting when a hierarchical skeleton
assumes previously modeled poses. Interestingly, this application is what led to the
development of the Motion Curves representation. The original context for the task
was segmenting real-time motion captured movements for use in a sonification-based
physical training system. The representational issues involved with the segmenting tasks
proved more interesting than the training system, however. As the expressive power
of the Motion Curves representation became apparent (as will be seen in the following
chapters), this work took its current form.
We want to be able to determine when the subject driving the motion data has
entered a specific static pose. This pose may be the canonical T-Pose, a certain martial
arts stance, or any other static configuration. A naive implementation would be to use
the pose distance metric to compare the incoming poses to an example of the target pose,
and threshold the results. The result of such a scheme is shown in figure 3.3.
Figure 3.3: Comparing the data to a single target pose. The horizontal axis is time, and the vertical axis is similarity.
In this example, the motion clip depicts the subject performing random movements interspersed
with returns to the T-Pose. The naive approach does a remarkably good job of indicating
when the subject is in the T-pose - these regions in time are indicated by the plateaus
in the graph.
Figure 3.4: Comparing the data to an average target pose
We can improve upon these results by taking the average of many examples of the
T-pose as the target, rather than one particular instance of the T-pose. The results of
this test are shown in figure 3.4. Note that the results are almost identical to the naive
case. This is because the particular pose chosen for that test was near the average point.
Measurement error with the motion capture system prevents exact skeletal configura-
tions from being repeated, but on a larger scale, it is very difficult for an actor to exactly
repeat a pose. A difference of millimeters in certain parts of the body will not usually
be detectable to the human observer, especially if the two poses are presented with an
intervening movement. Thus, instead of specifying one particular instance of a pose to
be the Platonic ideal, we will describe a distribution of valid poses. This will lead to a
much more robust comparison that allows some deviation from the “ideal” pose.
A special motion capture trial was taken wherein the actor tried to explore all possible
variations of the ready pose. This data was projected into a truncated Motion Curves
space, yielding a set of three-dimensional coordinates. A Mixture of Gaussians model
was fit to the data using readily available software [15]. The result of the clustering is
shown in figure 3.5.
Figure 3.5: Two dimensional projection of the MoG model (axes X1 and X2)
Given a multidimensional MoG model, the log likelihood that a novel point belongs
to a particular Gaussian distribution can be evaluated by the following equation:
\[ P(\vec{x}\,|\,\vec{\mu},\Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\; e^{-\frac{1}{2}(\vec{x}-\vec{\mu})^T\Sigma^{-1}(\vec{x}-\vec{\mu})} \]
\[ \ln P(\vec{x}\,|\,\vec{\mu},\Sigma) = -\frac{1}{2}(\vec{x}-\vec{\mu})^T\Sigma^{-1}(\vec{x}-\vec{\mu}) - \left(\frac{d}{2}\ln(2\pi) + \frac{1}{2}\ln|\Sigma|\right), \]
where \vec{\mu} and \Sigma are the Gaussian's mean vector and covariance matrix, d is the
dimensionality of the MoG model, and \vec{x} is the projected pose point.
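A sketch of this evaluation, including the weighted sum over mixture components described below, might look as follows (a numerically careful implementation would prefer a log-sum-exp formulation; the function names are ours):

    import numpy as np

    def gaussian_logpdf(x, mu, cov):
        # Log density of a single multivariate Gaussian (the equation above).
        d = len(mu)
        diff = np.asarray(x) - np.asarray(mu)
        _, logdet = np.linalg.slogdet(cov)
        quad = diff @ np.linalg.solve(cov, diff)
        return -0.5 * quad - 0.5 * (d * np.log(2.0 * np.pi) + logdet)

    def mog_loglik(x, weights, means, covs):
        # Overall likelihood: the weighted sum of the component likelihoods.
        total = sum(w * np.exp(gaussian_logpdf(x, mu, cov))
                    for w, mu, cov in zip(weights, means, covs))
        return np.log(total)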
A MoG model generally consists of more than one Gaussian distribution, along with
associated weights. The likelihood of ~x is calculated for each distribution, and the overall
likelihood is taken as the weighted sum. This formulation can be used within the same
framework as in our previous pose detection test. The result of such an experiment is
shown in figure 3.6. These results are very similar, because there was not much variance
in the T-pose trial. This makes sense: the T-pose is used as a reference in animation
because it is distinctive and easy to assume. The standard weighting scheme that we use
heavily favours the shoulder and hip joints, and the space used for the MoG clustering
model was truncated at three dimensions, so only a very gross description of the pose
was retained. This is by design: we want a system that is robust to inconsequential
differences. Note that this technique's sensitivity can be targeted: the importance of
various features of the pose is defined by the weighting scheme. If we wanted to detect
a pose entirely by the position of the right arm, this could be accomplished via a careful
tuning of the weights and the truncation point.
Figure 3.6: MoG model evaluation of motion trial (using a different dataset)
3.5 Summary
In this chapter we introduced motion curve space. We began by describing the steps
involved in constructing a motion curve space, and showed how motions can be put into
the space and taken out of it. Next, we explored the features of the space, establishing a
distance metric, discussing dimensionality reduction, presenting a method for visualizing
projected motions, and explaining how error can creep into the representation. Finally,
we discussed how the work came about, and presented a method for detecting poses in
motion curve space.
Chapter 4
Interpolation
Interpolation is a very important tool when working with most graphics entities, and
motion data is no exception. By blending between poses, it is possible to combine dis-
parate clips seamlessly, to portray a continuous range of some underlying characteristic,
or even to do inverse kinematics. In short, interpolation allows the animator to greatly
amplify the expressive power of his dataset very quickly. In this chapter we investigate
several techniques for interpolating across and between motion clips. These techniques
exploit the fact that every point in Motion Curves space fully specifies a valid pose. We
also present a case study, where we implement the motion graphs of Kovar et al. [45].
4.1 Two-Pose Interpolation
As mentioned in chapter 3, interpolating multidimensional angular values is not straight-
forward. Directly interpolating angular values in a Euler-angle representation often gives
unsatisfactory results. Interpolating between two angles is the same task as finding the
rotational path that one of the orientations must travel to match the other. The ro-
tational movement in the Eulerian case does not take the most direct route. Different
rotation orders will give different results.
This problem can be solved by using Quaternions. The spherical linear interpolation
(SLERP) operation finds the shortest path for a rotation by carving out a geodesic in
SO(3) [28]. Unfortunately, SLERP is a relatively expensive operation. SLERP also
suffers from the fact that it does not generalize easily to more than two interpolants, as
we shall see in the next section.
As shown in [37], the formula for SLERPing two quaternions Q1 and Q2 is
\[ \mathrm{slerp}(Q_1, Q_2, t) = Q_1\,(Q_1^{*}Q_2)^{t}, \tag{4.1} \]
where the operator * denotes taking the quaternion conjugate, and t varies from 0 to 1.
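Equation 4.1 can be implemented with the exponential and logarithmic maps sketched in chapter 3 (quat_exp, quat_log, quat_mul, and quat_conj are the hypothetical helpers introduced there):

    def quat_pow(q, t):
        # Raise a unit quaternion to a scalar power via the exp/log maps.
        return quat_exp(t * quat_log(q))

    def slerp(q1, q2, t):
        # Equation 4.1: Q1 (Q1* Q2)^t; for unit quaternions the conjugate is the inverse.
        # (A production version would also negate q2 when dot(q1, q2) < 0,
        # so that the shorter arc is taken.)
        return quat_mul(q1, quat_pow(quat_mul(quat_conj(q1), q2), t))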
The usual method to interpolate between two poses is to apply SLERP at every
joint. We present an alternate method for pose interpolation that takes advantage of the
motion curve representation. Motion curve space is Euclidean1, and every point within
it corresponds to a fully specified pose. When projected into such a space, smooth body
motions look like continuous, sampled curves. Intuitively, then, it would seem that simple
Euclidean linear interpolation would provide a reasonable approximation of SLERP-based
pose interpolation. We call this interpolation method MC-LERP, for brevity.
MC-LERP performs an interpolation that is very similar to the log-quaternion linear
interpolation described by Grassia in [28]. If each adjacent three-vector within a mo-
tion curve space’s principal components can be interpreted as a linearized quaternion,
and only two interpolants are used, the methods are equivalent. Grassia points out that
while this interpolation does not guarantee travel about the SO(3) geodesic like SLERP,
a reasonable approximation can be made through careful selection of the reference ori-
entation when linearizing the quaternion. The mapping that gives this approximation
is the one that minimizes the Euclidean distance between the interpolants. Obviously,
this criterion is not observed by MC-LERP: the reference orientation for the linearization
of all of the quaternions for a given joint is fixed during the construction of the wPCA
space. Recall, however, that the orientation used is the sample mean from the constructing clip.
(1. An n-dimensional Euclidean space is a space constructed such that the distance between any two points \vec{p}_1 and \vec{p}_2 is \sqrt{\sum_{i=0}^{n} (p_1[i] - p_2[i])^2}.)
Most joints in the human body have a relatively small range of motion, and
many have only one or two degrees of freedom. If the constructing clip is truly indicative
of the motions used in an interpolation, the reference orientation should be close to the
minimizer. We posit that MC-LERP is good enough for use in most interactive
situations where an animator would be using our system, and that the speed increase and
the flexibility of using more than two interpolants outweigh the minimal visual artifacts
that it introduces.
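Computationally, MC-LERP is nothing more than linear interpolation of projected coordinates; a minimal sketch (names ours):

    def mc_lerp(p1, p2, t):
        # Linear blend of two projected poses in motion curve space, 0 <= t <= 1.
        return (1.0 - t) * p1 + t * p2

    # A blended pose is recovered by unprojecting the result and applying the
    # per-joint exponential map, e.g.
    #   blended = unproject(mc_lerp(project(v1, B), project(v2, B), 0.5), B)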
Experimentation provides empirical evidence to support our approximation. When applied
to various poses, the results of both interpolation methods look very similar, and both
look plausible.
The case for MC-LERP can be made stronger by investigating its behaviour when
applied to whole motions. Figure 4.1 shows a screenshot of a prototype application for
comparing motion interpolations. The user begins by loading two motion clips, and
arranging them on the timeline. Once the clips are in position, the parameters of the
interpolation can be set, and the actual operation performed. The results are shown both
applied to a figure, and projected into three dimensions. The user may choose to use
either SLERP or MC-LERP.
When interpolating between two motions, there are three cases to consider: when
the motions are separated in time, when they partially overlap, and when one motion is
contained within the other. We use the same interpolating weights for both SLERP and
MC-LERP, and define them for each case as follows.
• Separate: The lead-in and lead-out times define a linear ramp centered about the
mid-point between the end of the first clip and the start of the second clip. If the
clips are far enough apart in time that the ramp would reach its maximal value
before the start or end of the clip, it is stretched to fit. The ramp is not modified
if it is too wide for the gap between the clips.
Figure 4.1: Screen Shot from the Interpolation Explorer Prototype
• Overlapping: The ramp is constructed in the same way as the previous case.
• Containing: The lead in and lead out times define the widths of two ramps, one
centered on each extremity of the contained clip, that bring the value away from
and back to the containing clip. The value of the interpolating value between the
ramps is user-specified.
We compare the projections of LERP and SLERP interpolations between a pair of
overlapping motions in figure 4.2. The motions for MC-LERP and SLERP, when applied
to a figure and played back, look very similar. The results are similar to those from the
single pose case. Significant artifacts can only be detected when the target poses are very
different, and far from the mean pose.
As mentioned in chapter 3, projecting and unprojecting motions is an expensive
operation. As such, MC-LERP is not a cost-effective replacement for SLERP in all
cases. In cases where other operations need to be done in Motion Curve space, however,
or when multiple interpolations must be done with the same data, the cost of projection
can be amortized. As we shall see in the next section, however, MC-LERP can be used
Figure 4.2: Motion interpolation comparison: LERP is on the left and SLERP is on the
right
as a springboard to more interesting techniques that justify the computational cost of
projection.
It is important to note that MC-LERP is convex in pose space only if the full set
of bases is used. Information about joint angles that is encoded in bases that are not
included in a projection is not used in the interpolation. For example, if the weighting
scheme is organized such that the angle of the figure’s elbow joint is encoded in the
35th base, and only the first 34 bases are used during projection, the elbow joint will
assume the space’s mean position for the duration of the interpolation. The fidelity of an
interpolation is thus constrained by the fidelity of the projections involved. If all bases
are used, the joint angles of an interpolated pose should lie between the two example
poses’ joint angles.
4.2 M-way Interpolation
MC-LERP or SLERP can be used to create a blend of two poses. It is often useful to
combine more than two poses. For example, an animator might want to create a weighted
average of several emotionally charged poses to generate a continuum of expression.
One way to perform such an interpolation is to assign an order to the operations, and
proceed pairwise. This solution is problematic, because pairwise interpolations do not
compose symmetrically: the order in which they are made changes the result. In certain constrained
cases, this might be acceptable. If the source poses have distinct meanings, a logical
ordering may be possible. For example, the first interpolation may establish the result’s
position along a single parametric axis, and the second may establish a separate axis. In
the general case, however, where the number of examples, and their parametric values
are not known in advance, a consistent ordering is hard to define.
Johnson provides two separate algorithms for multi-way unit quaternion interpolation
in [37], which he calls Slime and Sasquatch. The Slime algorithm begins by linearizing
the interpolants about their sample mean. The result is then found using ordinary vector
interpolation on the linearized quaternions. This works well for range-limited joints, but
suffers from a discontinuity 180 degrees off-mean. The Sasquatch algorithm is more
general, and does not suffer from any discontinuities. It works by iteratively minimizing
an ODE representing the sum of the spherical distances between the result and the
weighted interpolants. Since it is an iterative procedure, it is not as fast as Slime.
Computationally, MC-LERP is just an ordinary vector interpolation, so it is trivial to
extend it to multiple interpolants. The scaling behaviour is linear with the number of
interpolants.
We developed a prototype application to demonstrate the usefulness of multi-way
interpolation. This application is similar in spirit to the technique described by Igarashi
et al. in [35], but it differs greatly in the underlying interpolation technique. Using the
application, the user can select single poses from motion clips. The selected poses are
represented as points on a 2D plane. The user can arrange the points in any configu-
ration that is desired by right-click dragging them. By left-clicking and specifying 2D
coordinates (X, Y ), the user can specify a set of n interpolation weights wi governed by
the following equation:
\[ w_i = \left((X - dx_i)^2 + (Y - dy_i)^2 + \varepsilon\right)^{-\frac{1}{2}}, \tag{4.2} \]
where (dxi, dyi) are the ith interpolant’s planar coordinates, and ε is a small number.
These weights are expressed as percentages of their sum, and then used to create a
projected point:
\[ \mathrm{Proj} = \sum_{i=0}^{n} w_i P_i. \tag{4.3} \]
This point is unprojected and applied to a figure in real time. This gives the user a
puppet-like interface for specifying new poses. Figure 4.3 shows a set of example poses
with several interpolated results.
Figure 4.3: The Planar interpolation application
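A sketch of the planar blending computation (equations 4.2 and 4.3); the function name and array shapes are assumptions of this sketch rather than the prototype's actual interface:

    import numpy as np

    def planar_blend(X, Y, anchors, projected_poses, eps=1e-6):
        # anchors: (n, 2) user-arranged 2D positions of the example poses.
        # projected_poses: (n, k) motion curve coordinates of the same examples.
        anchors = np.asarray(anchors, dtype=float)
        P = np.asarray(projected_poses, dtype=float)
        d2 = (X - anchors[:, 0]) ** 2 + (Y - anchors[:, 1]) ** 2
        w = (d2 + eps) ** -0.5                  # equation 4.2
        w /= w.sum()                            # express as fractions of the total
        return w @ P                            # equation 4.3: point to be unprojected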
4.3 Improved Non-overlapping Blends
By inspection, the 3D projections of smooth motions appear to be smooth themselves.
Joining two clips that are separated in time by a linear interpolation through pose space
gives reasonable results, but suffers from a characteristic ‘over smoothed’ appearance.
One way to reduce the blending artifacts is to use a higher-order interpolation between
the clips. In these cases, we use a simple cubic interpolation: a Hermite spline
that depends upon the derivatives and positions of the endpoints of the two clips. The
value of each interpolated point is given by
\[ x(t) = P_0(2t^3 - 3t^2 + 1) + P_1(-2t^3 + 3t^2) + v\big(D_0(t^3 - 2t^2 + t) + D_1(t^3 - t^2)\big), \tag{4.4} \]
where v is a control variable that the animator can set between zero and one to damp
out the nonlinear portion of the interpolation.
We use a simple finite difference to estimate the derivative at the boundaries of each
clip. This can lead to problems if a clip is noisy, so smoothing should be done first to
get the best results.
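A direct transcription of equation 4.4, together with the finite-difference derivative estimate, might look as follows (names ours):

    def hermite_blend(P0, P1, D0, D1, t, v=1.0):
        # Cubic Hermite blend between projected endpoint poses (equation 4.4).
        h00 = 2*t**3 - 3*t**2 + 1
        h01 = -2*t**3 + 3*t**2
        h10 = t**3 - 2*t**2 + t
        h11 = t**3 - t**2
        return P0*h00 + P1*h01 + v*(D0*h10 + D1*h11)

    # Endpoint derivatives from a simple finite difference over projected samples:
    #   D0 = clip_a[-1] - clip_a[-2]      (outgoing derivative of the first clip)
    #   D1 = clip_b[1] - clip_b[0]        (incoming derivative of the second clip)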
In most cases, the effects of using the cubic interpolator are subtle. When working
with clips with large derivative values at their boundaries, however, the improvement is
quite noticeable. One such case is illustrated in figure 4.4. Here, the first clip portrays
the figure stepping to the left, while the second clip shows it shuffling forward into a
crouch. With a linear interpolation, the momentum of the stepping motion is lost as
soon as the interpolation begins. Cubic interpolation provides a smoother transition by
incorporating the momentum within the interpolated result. The character anticipates
the next motion as it crosses clip boundaries, without the robotic-looking transition
artifact that is characteristic to linear blends. Of course, there is no physical simulation
going on, so the figure’s ‘momentum’ in this discussion is simply a side effect of the
Hermite blend. While there is nothing guaranteeing physical plausibility in the final
Figure 4.4: Cubic interpolation in motion curve space
animation, the observation that smooth motions project as smooth curves would tend to
support the hypothesis of a cubic interpolation as leading to a more realistic motion than
a linear interpolation.
4.4 Case study - Motion Graphs
In [45], Kovar et al. introduce an adaptation of Schodl et al.’s Video Textures [59]
to motion data, which they call Motion Graphs. Both techniques create an endless
animation by finding instances of self-similarity within a finite data stream. Cross-over
points are built at these instances, and a directed graph structure is built to represent all
possible transitions. Animations can then be produced by traversing the graph. In order
to test the features of motion curve space, we built an implementation of the motion
graphs algorithm using Motion Curves as its underlying representation.
Porting the motion graphs algorithm to a new representation requires two major
pieces of infrastructure: a distance metric, and an interpolation operator. As shown in
chapters 3 and 4, both of these are readily available in Motion Curves representation. In
addition, both operations are computationally simple, and are scalable in the sense that
extra speed can be bought at the price of accuracy.
The original motion graphs paper uses a very heavyweight pose difference calculation.
Two poses are compared by using them to deform polygonal meshes, resulting in a
pair of vertex point clouds. A closed-form optimization is then performed to find a
transformation to align the two point clouds and cancel the poses’ root transformations.
The distance is then taken to be the sum of squared displacements between corresponding
vertices. Interpolation is done using the usual quaternion-based SLERP on each joint.
Our implementation has two major components, which are embodied in two separate
programs. The first program is used to create the motion graph structure itself, and
saving the results to a file. The second program accepts the results of the first, and
presents a random walk around the graph. The walk is visualized by animating a figure,
and showing the projection of its current pose and the graph structure in Motion Curve
space.
Constructing the motion graph is a time consuming process. The algorithm accepts
as input a long motion clip. The first step is to construct a table of inter-pose distances
for the entire clip. The clip is usually smoothed and sub-sampled as a preprocessing step
to reduce the number of poses. Once constructed, a diagonal filter is convolved with
the table to enforce causality, as described in [59]. The table is normalized, and local
minima are found. These local minima represent possible transitions.
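A much-simplified sketch of this candidate-finding step is given below; it omits the smoothing, sub-sampling, diagonal causality filter, and normalization described in the text, and reuses the pose_distance helper sketched in chapter 3:

    import numpy as np

    def transition_candidates(projected, eigvals, k=None):
        # Build the table of inter-pose distances and keep entries that are
        # local minima of their 3x3 neighbourhood as candidate transitions.
        n = len(projected)
        D = np.array([[pose_distance(projected[i], projected[j], eigvals, k)
                       for j in range(n)] for i in range(n)])
        candidates = [(i, j)
                      for i in range(1, n - 1)
                      for j in range(1, n - 1)
                      if i != j and D[i, j] <= D[i-1:i+2, j-1:j+2].min()]
        return D, candidates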
The pose indices of the minima found in the table become vertices in a graph (built
using the Boost graph library [62]). The edges are created for each potential transition,
and for each pair of time-adjacent vertices. In this form, the motion graph may contain
dead ends, so only the largest strongly connected component is retained.
A screenshot showing a single moment of a random walk is shown in figure 4.5. The
random walk is performed by traversing the graph structure, and taking the natural (non-
transitional) edge most of the time. MC-LERP is used to perform the interpolation when
transitional edges are taken. While such an undirected animation serves little purpose
on its own, it demonstrates the use of several of the algorithms developed in this thesis,
and is interesting to watch. Through this animation, we demonstrate the real-world
applicability of MC-LERP.
Figure 4.5: The Random Walk Demo Application
4.5 Summary
In this chapter we have shown how pose interpolation can be done in Motion Curve
space. We have subjectively compared our interpolation with the standard method, and
demonstrated how it can be easily applied to multiple blend targets. We also showed how
non-linear interpolations can be used to create smoother transitions. Finally, we put our
interpolation algorithm to use in implementing the results of Kovar et al.’s Motion Graphs
paper, and showed that it creates acceptable results.
Chapter 5
Geometric Operations
5.1 Overview
In this chapter we introduce several motion-editing operations that are made possible
by the Motion Curve space introduced in the previous chapter. Screenshots from the
prototype editing application illustrate both key concepts and the animator’s workflow.
Motions in this application are visualized ‘out of time’ as three dimensional curves.
This representation emphasizes the object-editing metaphor employed by the geometric
operations, and presents a familiar interface to animators used to working with 3D model
editing programs. Figure 5.1 shows the results of two of the operations described in this
chapter.
5.2 Finding Mean Poses
Many of the operations that we develop require reference points or directions. Such
navigational guideposts can be difficult to come by in a high-dimensional space, especially
when the basis vectors do not necessarily correspond to anything meaningful to the
animator. The task is further complicated by the fact that an animator will often use more
than one space in a single editing session: any landmarks created in one space will need
Figure 5.1: Translation and Scaling Operations: The left image depicts a bounded trans-
lation applied to a running motion. The figure in the foreground has been brought closer
to a ducking stance. The right image depicts a bounded scaling applied to a martial
arts move. The foreground figure’s stance has been widened, and its hand motions
exaggerated.
to be re-projected or recalculated into the other spaces before it can be used.
Given the lack of inherent points of reference in our spaces, we construct reference
points using projected data. This most often involves finding the sample mean of one or
more sets of poses.
Our unprojected data takes the form of a list of unit quaternions, each one repre-
senting the orientation of one of the figure’s joints. Taking the arithmetic mean of the
quaternion values does not work because the results are not necessarily of unit length,
and therefore do not correspond to a rotation. Also problematic is the fact that quater-
nions Q and −Q, which correspond to the same rotation, have a null arithmetic mean.
This precludes any re-normalizing step.
Johnson gives a robust algorithm for finding the sample mean of a set of quaternions
in [37]. This algorithm, which is outlined in listing 2, reformulates the problem as a
minimization exercise. The mean quaternion is taken to be that which minimizes the
sum of squared inner products with all of the data points, subject to the constraint that
it must lie upon the unit hypersphere. This requires a large matrix multiplication, and
a 4 by 4 eigen-decomposition. In order to find the mean of a pose, this calculation has
to be repeated for each joint. We use this method during the construction of our Motion
Curve spaces.
Algorithm 2 Calculating a sample mean quaternion
1. Construct a 4×N matrix Q containing the sample quaternions in its columns
2. S = QQ^T
3. Perform the eigenanalysis of S
4. The mean quaternion can be taken as the eigenvector associated with the largest eigenvalue
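A compact NumPy sketch of Algorithm 2 (the function name is ours):

    import numpy as np

    def mean_quaternion(quats):
        # quats: (N, 4) array of unit quaternions, one per row.
        Q = np.asarray(quats, dtype=float).T      # 4 x N, one quaternion per column
        S = Q @ Q.T
        eigvals, eigvecs = np.linalg.eigh(S)      # S is symmetric
        mean = eigvecs[:, np.argmax(eigvals)]
        return mean / np.linalg.norm(mean)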
Once constructed, a Motion Curve space of dimensionality n behaves much like R^n.
Poses project to single points. Given the space’s closure over addition and scalar
multiplication, we can use the arithmetic mean to calculate the mean of M projected points:
\[ \bar{P} = \frac{1}{M}\sum_{i=0}^{M} p_i. \tag{5.1} \]
This projected mean can be unprojected to yield an approximation to the quaternion-
based mean pose. The quality of this approximation will depend upon the span of motion
curve axes, which in turn depends upon the choice of construction data.
Given a suitable space, this method for finding the sample mean is conceptually
straightforward and computationally cheap. It is expensive to project the data, but this
cost will be amortized over the application of several Motion Curve operations in most
use cases.
5.3 Scaling-Based Operations
In [9], Blanz and Vetter use scaling in a PCA space to generate caricatures of faces. In
their formulation, texture and segmented geometry are represented as points in separate
spaces that were derived from a database of face data. In order to generate a caricature,
the projected points are moved away from the origin by scalar multiplication. The logic
behind this is that the origins of the PCA spaces represent a zero offset from the mean face
of the constructing dataset. A caricature of a face can be thought of as an exaggeration
of those features which distinguish it from the average. Scaling the projected datapoint
increases the distance from the average, and thus the perceptual distinctiveness of the
reconstructed face.
We can perform a similar operation in Motion Curve space. Given a projected motion
clip, an exaggeration can be made by performing a simple linear scaling about the origin.
This increases the distance between the origin and every pose in the motion. Scaling a
clip by a factor less than one subdues the motion, bringing it closer to the base point.
The expressive power of this operation can be increased by subjecting it to a time-
based envelope. Instead of scaling the entire clip, the animator can isolate a
particular segment to exaggerate. This is done by dragging start and end markers to the
desired positions on the timeline. The envelope is ramped at either end to prevent popping
artifacts in the final animation. The animator can control the lengths of both the lead
in and lead out times. The application of the gating function is shown in figure 5.2. In
our prototype, we can apply this gate to any of the geometric operations.
A further improvement to the scaling operation can be made by changing its reference
frame. Scaling about the origin increases the dissimilarity between the poses in the clip
and the origin. The origin in a Motion Curve space represents the mean pose of the
training data set. This may not always be a good choice of baseline for a motion clip.
Figure 5.3 presents an example of a bad scaling. When a motion’s projection is entirely
to one side of the mean, for example, its projected ‘center of mass’ is moved by the
scaling. This may or may not be the effect that an animator would be seeking in such a situation.
Figure 5.2: The gating function and its application (G(t) versus time, ramping between 0 and 1 over the lead-in and lead-out intervals around t0 and t1)
Recall that we can quickly find the projected mean of an arbitrary set of poses. If
we find the mean of the animator-selected region during the scaling operation, we can
use it as the origin for the operation. This is accomplished using a simple chain of affine
transformations. First the projected poses are translated by −Tmean, the negative of
the selection’s projected mean. Next, scaling is done as before. Finally, the poses are
translated back into position by Tmean. Figure 5.3 shows the difference between local and
global scaling.
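The local variant reduces to a translate-scale-translate chain on the projected poses; a minimal sketch under the same assumptions as the earlier fragments:

    import numpy as np

    def scale_about_mean(projected_clip, factor):
        # Exaggerate (factor > 1) or subdue (factor < 1) a projected clip by
        # scaling each pose about the clip's own projected mean.
        P = np.asarray(projected_clip, dtype=float)     # (n_frames, k)
        T_mean = P.mean(axis=0)
        return (P - T_mean) * factor + T_mean

    # Global scaling, by contrast, is simply projected_clip * factor, which scales
    # about the origin (the mean pose of the space's construction data).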
The main advantage of using linear scaling is that it offers a wide expressive range
for very little computational cost. The entire operation, including the translations, can
be wrapped into a single matrix multiplication, which can be optimized using SIMD
instructions on modern processors. Given that motion editing is most often an offline
technique, we are free to explore more expensive options.
Making the scaling factor dependent upon a point’s distance from the base point
can lead to interesting effects. Using a shifted and scaled sigmoid function S(d) as
Figure 5.3: Global vs Local Scaling (global scaling on the left, local scaling on the right)
the distance function modifies only certain portions of the motion. By adjusting the
function’s parameters, it is possible to scale only those parts of the motion that differ
from the baseline by at least some threshold. This is similar in effect to applying several
separate, manually bounded scalings.
Complex space warps can be built by combining multiple scaling fields with exponen-
tial drop-offs. Such a field can be configured to attract motions toward certain areas of
pose space, or away from others. Using poses taken from library animations, an animator
can ‘sculpt’ a field that will bend motions toward characteristic poses. New animations
can then be ‘coloured’ by applying the field to their projections. An example of this is a
field that attracts footfall poses toward exaggerated, limping counterparts.
5.4 Translation-Based Operations
As we will see in chapter 4, Motion Curve space allows us to blend between poses using
a simple linear interpolator. Translation along a straight line can be viewed as a special
case of interpolation, where the two ends of the interpolation are the initial point and
some point located along the vector describing the translation. The translation operators
in our system work on the idea of similarity: poses can be made more or less like other
poses by linear translations through space. For example, a normal walking motion can
be translated toward a crouching pose to make a crouching walk.
The simplest translation that can be made is one along one of a space’s axes. The
effects on the figure’s pose will of course depend upon the principal poses of the current
space. These poses are not guaranteed to be meaningful to an animator, although they
can be manipulated through judicious setting of the joint weights when building a Motion
Curve space.
Indeed, it is possible to build a puppet-like interface using such a scheme. By directly
mapping a translational offset to input devices, the animator can rapidly change
a pose (or motion). Unfortunately, degrees of freedom are quite limited on most input
devices, so the amount of control this scheme gives is limited. Still, if suitable principal
components can be developed in a space, a single input DOF will map to several correlated
joints in the final animation.
The biggest problem with using translation to modify a pose through its Motion
Curve space projection is one of navigation. It is hard to control, or even visualize, many
more than just a few degrees of freedom at once. In most general spaces, the axes will not
relate to a feature that the animator wants to control anyway. Clearly, if the animator
is to navigate Motion Curve space with any degree of effectiveness, we need to establish
a more intuitive navigational framework.
The easiest way to establish a location in Motion Curve space is not to construct
it manually, but to take it from example. Given a desired pose, it is easy to project
it into the space, and calculate the direction to it from another point. A single pose
can be made more or less like another by constructing a vector connecting the two and
translating along it.
Entire motion clips, or gated segments of clips, can be translated in a similar manner.
In this case, however, the base of the translational direction vector is the mean pose of
the set of points to be edited. Translating the segment as a whole preserves the relative
positions of its poses, such that the motion’s original character is preserved. Figure 5.4
demonstrates this process.
Figure 5.4: Translating a clip toward a target. In this case a sneaking motion is translated
toward the ‘hands-up’ pose pictured on the left.
By selecting two poses that represent two extremes in a continuum, the animator can
create an axis along which to move other clips in parallel. The advantage of this ap-
proach over translating directly toward a target is that incidental features of the original
animation are preserved. Figure 5.4 presents a comparison of the two methods.
5.5 Filtering
The notion of using signal processing techniques on motion data is not new. Bruderlin
and Williams proposed several frequency-based techniques in [12]. Using an angle-based
representation for such a task introduces many problems, as was described in chapter
3. We avoid many of the problems inherent to angle-based representations by bringing
signal processing techniques into Motion Curve space. Each dimension of the Motion
Curves data can be treated as an independent one-dimensional signal. Filters can then
be applied by convolving their kernels with the signals. The width of the kernel will
depend upon the sampling rate of the motion, and the desired effect.
One of the most useful kernels to use on motion data is the low pass filter. Motion
capture data is often plagued by high-frequency noise, which manifests as popping and
jittering upon playback. Most of the content in large-muscle human motion is quite
low-frequency, so filtering out the high-frequency noise has little visual effect. Low-pass
filtering is necessary to prevent aliasing when sub-sampling the data. Other filters are
also useful for processing motions. In [49], band pass filters are used to separate style
from content in angle-based motion data.
5.5.1 A Wavelet Approach to Smoothing
One disadvantage of using convolution to perform smoothing is that it is quite slow.
Performance can be greatly increased at the cost of some preprocessing if a multiresolution
analysis is done first. Wavelet theory is a very rich and mathematically dense subject.
We present a very limited description of one particular type of wavelet, and defer a
thorough explanation to more authoritative sources, such as [17].
A wavelet decomposition expresses a signal as a hierarchical combination of basis
functions. There are many different basis functions to choose from, each having different
characteristics. We use the Haar basis because of its easy implementation. Despite
its simplicity, we have achieved interesting results using the Haar function. Further
exploration using continuous bases [22] may be warranted, but it is important to note
that using more complicated bases will come at the cost of increased computation, which
may reduce the practical effectiveness of the techniques we present.
The Haar basis function is described by the following formula:
\[ h(x) = \begin{cases} 1 & \text{if } 0 < x \le \frac{1}{2}, \\ -1 & \text{if } \frac{1}{2} < x \le 1, \\ 0 & \text{if } x \le 0 \text{ or } 1 < x. \end{cases} \tag{5.2} \]
The decomposition is made by recursively taking averages of adjacent data points,
and encoding the results and differences. An example decomposition is shown in figure
5.5. One requirement of the decomposition is that the signal has a length that is a power
of two. Since our motion data is most often of an arbitrary length, we perform a linear
supersampling before decomposition.
Figure 5.5: The Haar wavelet basis function in action. Each level stores the pairwise averages followed by the differences from those averages:
  original data: 1 7 4 0 9 4 8 8 2 4 5 5 1 7 1 1
  l3: averages 4 2 6.5 8 3 5 4 1 | differences -3 2 2.5 0 -1 0 -3 0
  l2: averages 3 7.25 4 2.5 | differences 1 -0.75 -1 1.5
  l1: averages 5.125 3.25 | differences -2.125 0.75
  l0: average 4.1875 | difference 0.9375
  result: 4.1875 0.9375 -2.125 0.75 1 -0.75 -1 1.5 -3 2 2.5 0 -1 0 -3 0
Given the wavelet data, the original signal can be reconstructed by performing the
reverse of the operations used in the decomposition. Reconstruction proceeds as a series
of refinements. The level zero reconstruction is simply the first value in the wavelet
representation. This is the arithmetic mean of the signal. The level one reconstruction is
two elements long. If the level zero reconstruction is r0, and the second value in the wavelet
representation is w2, the level one reconstruction can be expressed as [ r0 + w2  r0 − w2 ].
Further levels can be reconstructed by following the pattern recursively. In this way, a
smoothed signal can be reconstructed more quickly than it can be found by convolution
with a low-pass filter.
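A sketch of the decomposition and level-limited reconstruction described here, for a one-dimensional signal whose length is a power of two (function names ours):

    def haar_decompose(signal):
        # Recursively store pairwise averages and the differences from them.
        data, out = list(signal), []
        while len(data) > 1:
            avgs = [(data[i] + data[i + 1]) / 2.0 for i in range(0, len(data), 2)]
            diffs = [data[i] - avgs[i // 2] for i in range(0, len(data), 2)]
            out = diffs + out
            data = avgs
        return data + out        # [overall mean] followed by coarse-to-fine differences

    def haar_reconstruct(coeffs, levels=None):
        # Rebuild by refinement; stopping after `levels` steps yields a smoothed signal.
        data, pos, level = [coeffs[0]], 1, 0
        while pos < len(coeffs) and (levels is None or level < levels):
            diffs = coeffs[pos:pos + len(data)]
            data = [x for a, d in zip(data, diffs) for x in (a + d, a - d)]
            pos += len(diffs)
            level += 1
        return data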
One limitation of this approach is that it allows smoothing only at discrete levels.
When using a filter-based approach, one can simply widen the kernel, but that is not an
option with a discrete wavelet representation. One solution, which is based on a technique
used in [22], is to reconstruct two adjacent discrete levels of the curve, and perform a
linear interpolation between the two to get the continuous-level result. The first three
dimensions of a Motion Curves signal are shown at continuous levels of smoothing in
figure 5.6.
An area where wavelet representations have seen much success is in compression. Using
the algorithm presented in [65], we were able to compress motion data captured at 120Hz
up to around half of its normal size, before artifacts became visible. When the artifacts
did appear, they took the form of shakiness - the figure still performed the recorded
motions (and looked good in individual still poses), but moved with a palsy. Since the
key poses are still hit, it may be possible to avoid the artifacts through an intelligent
resampling. While compression was not a major area of research for this work, we theorize
that more impressive results could be found by leveraging the inherent dimensionality
reduction powers of the weighted PCA space in addition to the wavelet result.
Johnson did some initial work with unweighted pose PCA for compression in [37],
but found the results mixed at best, suggesting that more complex reduction techniques
that search for curved manifolds rather than orthogonal bases might provide better compression.
Figure 5.6: Continuous levels of smoothing. The original motion is shown on the left. The area highlighted in red is shown under increasing levels of smoothing in the sequence on the right.
The use of weighted PCA, where the weights are used to concentrate the more
important degrees of freedom (perceptually speaking) lower in the list of PCs, might
improve the results.
5.6 Case Study - PCA Explorer
The PCA Explorer application presents a good example of the direct application of a
geometric algorithm to Motion Curve space.
In its default mode, the application allows the user to control a puppet figure by
directly navigating the three most significant dimensions of a PCA space. This interface
is limited by two major factors. The first is that there simply is not much expressive
power in only three dimensions. Correlated joints lead to motion in more than three of
the puppet’s degrees of freedom, but there are still only three axes of control. The second
problem is that it is difficult to control three dimensions at one time using only a mouse.
The application may be set into nearest neighbour mode to help offset these limita-
tions. In this mode, an example motion clip is loaded and projected into the PCA space.
When the example motion is loaded, its three dimensional projection is used to populate
an octree structure. The closest projected point to the user’s cursor can then be found
using the standard algorithms associated with octrees. When the user moves the cursor,
the system finds the closest point in the projection, and applies the full-dimensional pose
from that point in the original clip to the puppet.
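The lookup itself is simple; the following sketch uses a kd-tree from scipy as a stand-in for the octree, with placeholder array names and sizes chosen purely for illustration.

import numpy as np
from scipy.spatial import cKDTree

proj = np.random.rand(500, 3)    # example clip projected onto the first three PCs (placeholder data)
poses = np.random.rand(500, 57)  # corresponding full-dimensional poses (placeholder data)
tree = cKDTree(proj)             # built once when the example clip is loaded

def pose_under_cursor(cursor_xyz):
    # Return the full-dimensional pose of the example frame whose projection
    # lies closest to the user's cursor position.
    _, idx = tree.query(cursor_xyz)
    return poses[idx]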
By adjusting the viewing angle, the user can find planes of control that can give
good results with only two input degrees of freedom. The motion of the puppet is
still quite limited, but the improvement over the default direct manipulation scheme is
impressive. The expressiveness of the system could easily be increased by using the M-way
interpolation technique described in chapter 4 instead of a direct projected nearest-neighbour
query. In chapter 6, we will expand upon the idea of using spatial search structures to
perform queries in Motion Curve space when we use the approximate nearest neighbour
algorithm, which is based on kd-trees.
A possible extension to this interface that would enhance interactive performance
would be to use a predictive feedback system similar to that used in [47]. This would
allow the user more freedom in exploring the space of poses around the figure’s current
position without necessarily affecting the resulting animation.
5.7 Extensions: Joint Limits and Selective Blending
When applying any of the operations discussed in this chapter, it is quite easy to cause
the figure to adopt an unnatural posture. Posture errors can take the form of inter-
penetrations, broken foot contacts, and hyper-extended joints. While the current system
favours flexibility over realism, and as such does not attempt to fix any of these problems,
the framework could be extended to support clean-up as a post-processing step.
Inter-penetrations occur when one part of the model moved by the motion data inter-
sects with another part. For example, the figure’s hands might collide with each other.
Detecting such collisions requires knowledge of the geometry moved by the motion data.
There has been much research in motion planning for articulated figures to avoid collisions
in the robotics and simulation field [57, 42, 53].
Broken foot contacts can make the figure appear to float above the ground, or even
descend into it. Currently, the system can be set to move the figure vertically until its
lowest joint touches the ground. This is a quick and easy method that works in most
situations, but has some problems. The most glaring issue is that the figure is unable to
jump. Certain extreme poses, such as when the motion capture actor reaches down
below their feet while standing on a non-modeled step, would also be rendered incorrectly.
Some of these issues could be resolved using physical simulation, but that lies outside
the current scope of the Motion Curves representation.
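The vertical adjustment mentioned above amounts to a per-frame translation. A minimal sketch, assuming a y-up coordinate system and world-space joint positions stored in a numpy array:

def ground_figure(joint_positions, ground_height=0.0):
    # joint_positions: (n_joints, 3) world-space positions for one frame.
    # Translate vertically so the lowest joint rests on the ground plane.
    grounded = joint_positions.copy()
    grounded[:, 1] += ground_height - joint_positions[:, 1].min()
    return grounded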
Mathematically, a quaternion-based joint in an articulated figure can assume any ro-
tational value. No naturally-occurring joint has such freedom, however. Even a shoulder
joint has clearly determined limits. Many commercial animation packages support the
concept of joint limits, where joints can have arbitrary restrictions placed upon them
to limit their range of movement [2]. The actual values of the limits can be based on
anatomic data, or automatically calculated from reference motions [30]. While this can be
helpful to maintain physically realistic motion, animators often choose to go ‘off-model’
to get certain effects. Given the gross over-extensions that navigating Motion Curve space
can create, however, toggleable limits would probably be useful. An even more useful feature
might be a pose inspector that gives warnings about invalid rotations without limiting their
movement.
Currently, all edits made in Motion Curve space are applied to the entire skeleton. By
design, the weighted PCA procedure finds correlations between the joints’ movements,
and these correlations end up being reflected in the principal components. This is nor-
mally advantageous: it multiplies the expressive power of each input degree of freedom
in a principled way. Changing a degree of freedom that moves one major joint will often
affect sympathetic joints (as defined by the training data) ‘for free’. To get a specific
pose or motion, however, an animator might want to intentionally break these discovered
correlations.
Under the current system, this is not possible. It would be a useful extension to
allow the animator to freeze joints that are in the correct position, and protect them
from the effects of further edits. An alternate solution would be to segment the body
into major regions (such as limbs), and train individual (but smaller) wPCA spaces for
each. This would allow for the body parts represented in each space to be moved totally
independently. The trade-off in this situation would be that the animator would receive
very little ‘for free’ from correlation, and at least one value for each segment would have
to be specified to define a full pose.
5.8 Summary
In this chapter we have detailed several geometric operations that can be used to edit
motions projected into a Motion Curve space. An efficient method to calculate an ap-
proximation to the mean of a set of poses was presented first. This result was used in
the development of useful operators based on scaling and translation. Next, we described
filtering operations in Motion Curve space, and showed the use of the Haar wavelet transform
for quick smoothing and compression. We concluded the discussion by pointing out
some possibly useful extensions to the technique. We will see some of the techniques
outlined in this chapter put to use in chapter 6.
Chapter 6
Unsegmented Motion Searching
In this chapter, we present a new method for searching long motions for regions of
high similarity to a shorter query motion. We begin by providing motivation for the
search. Then we present our search algorithm and an overview of the system, followed by
detailed descriptions of each of its components. We finish the chapter by presenting an
experimental evaluation of the system, and highlighting areas for future research. This
work appears in [24].
6.1 Introduction and Motivation
As the corpus of information regarding virtually every human endeavour grows exponentially,
computer-based indexing and searching becomes correspondingly
important. The ubiquity of the Web could not have occurred without the coincident rise
of the search engine. There is little value in information unless one can explore it. Once
a collection of data grows to a certain size, its index becomes almost as important as its
content.
In some ways, designing a search algorithm for web pages is easy. The Web is primarily
text-based, and comes pre-packaged in a structural mark-up language. Other forms of
information are not such easy targets. Recently, there has been much academic interest
in search engines for non-textual media. Search algorithms and heuristics exist for most
common media, such as still images, video, and audio. These types of data have received
the most attention because of their ubiquity. In an era of cheap digital cameras and
considerable disk storage, even individual consumers are starting to require some kind of
media indexing solution.
Digitized motion data is expensive to create and manipulate. Its creation requires
the talents of a skilled animator using specialized software, or exotic and finicky motion
capture hardware. The motion data that goes into the production of a feature animation
represents an investment of millions of dollars. As an animation studio accumulates more
such data, it is in its best interest to leverage this investment, or at least try to. In order
to do so, however, it needs an efficient way to search the data.
Animators often save time when creating new animations by working from prior ex-
amples. It is often more productive to modify a walk cycle to match the requirements
of a particular situation than to start from scratch on every scene. As individual pro-
duction studios accumulate 3D character animation, the possibilities for motion reuse
at once grow and diminish. Reuse becomes potentially more fruitful, since there are
more examples to choose from, but the act of actually finding useful clips gets consid-
erably more difficult. Motions, whether they are key-framed or motion-captured, are
high-dimensional objects that are hard to compare numerically. Two motions that look
similar to a human observer may in fact be numerically very dissimilar using certain
representational schemes. In most cases, searching through a catalog for a particular type
of motion quickly becomes an exercise in patience, memory, and hard work. Clearly,
a method for quickly searching a database of motions is a prerequisite for large-scale
motion reuse. Furthermore, it is important to develop similarity measures that can be
more readily adapted to user needs.
We present a method for querying a skeletal motion database with example clips. The
motion database is constructed from one or more long motion sequences. These sequences
can be taken from previously finished animations, unsegmented motion capture trials, or
manually keyframed motion tests. All motions must be expressed over the same skeleton.
As a preprocessing step, all sequences in the database are re-sampled to a uniform rate,
and spliced together to form a single long motion. The database is queried with a short
example clip, which ideally expresses one distinct motion, such as a single reach, step,
punch, or jump. The search algorithm finds the subsegments of the database which are
most similar to the query, subject to a nonlinear time warping. These subsegments are
ranked and returned as the search results.
6.2 Algorithm
Given our motion representation and pose distance metric, we will now describe our
motion search strategy. Our search application is similarity-based. This means that the
user must have an existing motion clip with which to query the system. This clip could
be from a library of pre-segmented, canonical actions, the results of a previous query, or
even from real-time motion capture. As a preprocessing step, both the query clip and
the database are projected into a user-specified wPCA space. Projecting the database
is an expensive operation, but it only needs to be done once for each wPCA space, and
the results can be placed in permanent storage. After the query clip is projected, its
characteristic pose is found. An efficient spatial sorting data structure is then used to
find the indices of all similar poses in the database. These indices are clustered to reduce
redundancy, and then a variant of the dynamic time warping algorithm is used to warp
the database subregions surrounding the cluster means to match the query clip as closely
as possible. The resulting warps are ranked according to fit, and returned as the search
results. We will now describe each step in this process in detail. Pseudocode for the
querying operation is given in algorithm 3.
Algorithm 3 Performing a Query
Require: projected database and query clip, and offset to characteristic pose in query
Perform an Approximate Nearest Neighbour search query with the characteristic pose
for all ANN results r do
if r can be joined with an existing cluster c then
Grow cluster c, join with neighbours if necessary
else
Create a new cluster initialized with r
end if
end for
for all Cluster min points do
Calculate the forward and backward distance tables
Find the min forward and backward paths
Join the two half paths
calculate the mean warp distance
end for
return the sorted warp paths
6.2.1 Finding the Characteristic Point
Queries represent single, coherent motions. Such motions can often be expressed using
single poses [51]. We call these poses characteristic points, and we will use them as
starting points in our motion search. First, however, we must come up with a workable
definition of “characteristic”.
A good characteristic point for a punching motion would be the moment of maximum
arm extension. All punching motions contain some element of arm extension. Most
verb-level action descriptions, such as stepping, jumping, or ducking imply some similar
common element. In a step motion, the characteristic point could be the moment when the
legs are farthest apart. The characteristic point of a jump might be at its apex. Likewise,
a ducking motion might be characterized by its lowest point. The common thread in all of
these examples is that the characteristic point represents a moment of maximal extension
or deviation from some neutral pose.
This concept fits well with our motion representation. If we define the neutral pose
to be some point in the wPCA space, we can find the characteristic point with respect
to that point by searching for the most distant pose from that point. The neutral pose
can be defined in any number of ways. If the action phase of the query is proportionally
short, the mean pose of the whole motion provides a good approximation. If the query
is nicely segmented, a reasonable assumption may be that the subject begins and ends
in a neutral pose. Either boundary pose could be used directly, or alternately the mean
of the two could be taken. The origin of the PCA space represents the mean pose of the
(probably significantly longer) motion used in its creation, so it can also be used as the
neutral point.
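In code, the search for a characteristic point reduces to a distance maximization in the projected space. The sketch below (an illustration, not the production implementation) defaults the neutral pose to the query clip's mean projection, one of the options discussed above.

import numpy as np

def characteristic_point(query_proj, neutral=None):
    # query_proj: (n_frames, d) query clip projected into the wPCA space.
    if neutral is None:
        neutral = query_proj.mean(axis=0)   # mean pose of the query clip
    distances = np.linalg.norm(query_proj - neutral, axis=1)
    return int(np.argmax(distances))        # frame farthest from the neutral pose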
Our objective in choosing a characteristic point is to find a class of poses that is guaranteed
to have a close analogue in all possible matching motions, but is unlikely to exist
in non-matches. If the pose is too common, spurious matches will drown out the actual
results during the next step of the algorithm. For this reason, finding a suitable charac-
teristic point is a crucial task. Of course, if the query is quite short, it is not unreasonable
to require the user to specify a characteristic point directly. For certain types of queries,
this gives better results than the automatic methods.
6.2.2 Generating Seed Points
An exhaustive solution to our search problem would be to use DTW to rank all possible
alignments of the query clip and the database. This is analogous to the technique in
[44], but is slow because the DTW operation is expensive. In order to provide interactive
response rates to the user, we must cull the search space before the DTW step. We refer
to this culling as finding the seed points in the database. Seed points are the indices of
poses in the database that are similar to the characteristic point of the query.
Our measure of similarity is the Euclidean distance within the scaled wPCA space, so
we can use algorithms from computational geometry to speed our search. We also have
to choose the number of dimensions within which we will operate. The weighting scheme
used to construct the wPCA space greatly influences the results of a search, which can
be exploited to considerable advantage in searching selectively. The weights should be
picked by the user to reflect the constraints of the animation for which s/he is searching.
There are several different search structures that would work for our implementation.
We chose to use Approximate Nearest Neighbour search because of its quick running
time, flexibility, and readily available source code [52]. The effects of varying the
parameters of the ANN software are discussed in the results section.
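Purely as an illustration (our implementation uses the ANN library of [52]), the same style of approximate k-nearest-neighbour query can be expressed with scipy's kd-tree, whose eps parameter plays a role similar to ANN's error bound.

import numpy as np
from scipy.spatial import cKDTree

database_proj = np.random.rand(100000, 10)  # database poses in the wPCA space (placeholder data)
index = cKDTree(database_proj)              # built once per wPCA space

def seed_points(char_pose, k=1000, eps=0.0):
    # Indices of the k database poses most similar to the characteristic pose,
    # together with their distances.
    distances, idx = index.query(char_pose, k=k, eps=eps)
    return idx, distances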
6.2.3 Seed Point Clustering
Motion in the database takes the form of contiguous, time-ordered strands of pose-points.
Nearest neighbour searches within such a context result in sequences of temporally ad-
jacent points. Since we will be subjecting the seed points to the DTW algorithm in the
next step, all of these points will return valid, yet similar results. We cluster the seed
points in order to avoid overwhelming the user with hundreds of very similar results, and
to improve the search’s run time. The clustering is performed on the seed points’ time
indices. This data, since it is one dimensional, integer-valued, and mostly sequential, is
very well-behaved and easy to cluster. Data points are simply collected into contiguous
(to within a noise term) intervals as they are found. The final results are given as the
closest points within each cluster.
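A sketch of this interval clustering, using the seed indices and their distances from the previous step (the gap tolerance stands in for the noise term mentioned above):

import numpy as np

def cluster_seed_points(indices, distances, gap=5):
    # Group sorted frame indices into contiguous runs, allowing gaps of up to
    # 'gap' frames, and return the closest point of each run as its representative.
    order = np.argsort(indices)
    indices, distances = np.asarray(indices)[order], np.asarray(distances)[order]
    reps, start = [], 0
    for i in range(1, len(indices) + 1):
        if i == len(indices) or indices[i] - indices[i - 1] > gap:
            run = slice(start, i)
            reps.append(int(indices[run][np.argmin(distances[run])]))
            start = i
    return reps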
6.2.4 Dynamic Time Warping
The query signal has a well-defined start and end point, but we have no such luxury
when looking for a subsequence within the database. The search is further complicated
by the fact that motions tend to be performed slightly differently each time they occur.
Changes in motion timing which are subtle to a human observer may cause an enormous
numerical difference.
Both of these issues are surmounted through the use of dynamic time warping. As
discussed in chapter 2, DTW is a signal processing technique that finds a non-linear
alignment minimizing the error between two signals. It returns a time displacement
function that compresses and dilates one of the functions to match the other. DTW
has been used extensively on sound signals for speech recognition, and is often used to
improve the interpolation of multiple motion clips [12, 43].
Each clustered index represents a single moment of similarity between some point in
the database, and the characteristic point of the query. A valid time warp must pass
through this point. The warp is also constrained to run from the beginning to the end
of the query. These constraints are visualized in figure 6.1.
We can divide the time warp into two subproblems: one running forward in time and
one running backward. The method for solving each subproblem is identical. First, a
distance table is computed involving the pertinent half of the query and the corresponding
section of the database.
(Figure 6.1 axes: the database motion runs along the horizontal axis and the query motion along the vertical axis.)
Figure 6.1: The DTW constraints. The warp must pass through the intersection of
the characteristic and seed points, and contact both horizontal edges of the table. The
search is constrained by causality, so distances in the shaded areas of the table need not
be calculated.
A limit imposed on the slope of the warp line through the distance table provides a bound
on the size of the distance table. Starting from the characteristic
point, each cell in the table is filled with the sum of the pose distances between the
indexed animation frames and the minimum of its previously filled neighbours. Once the
table is filled, the minimum value along the query’s boundary frame is found. The DTW
path is then found by greedily searching through the table toward the characteristic
point. This search is subject to causality, so there are at most three possible steps
to take at any given point. The slope limit is enforced to prevent degenerate warps.
Degeneracies in the warp are still possible around the characteristic point, but these can
be culled during the results ranking.
When the characteristic point is in the middle of the query, splitting the DTW into two
problems halves the number of distance calculations required. There are several methods
available to further reduce the number of calculations, and/or improve the warp quality
[16, 41]. Our pose distance metric is quick enough that this has not been necessary to
achieve interactive rates with the test data that we have used. Another desirable feature
of the distance metric is that it is possible to trade accuracy for speed, and use less than
its full dimensionality in the calculation.
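A simplified sketch of the forward half of the constrained warp follows (the backward half is symmetric, and the two half paths are joined at the characteristic point). For brevity the slope limit only bounds the width of the table here, rather than constraining each step, and the routine also returns the ranking score of section 6.2.5, the mean pose distance along the path.

import numpy as np

def forward_half_warp(query, database, q_char, d_seed, slope_limit=2):
    # query, database: (n, d) projected pose sequences; q_char and d_seed are
    # the aligned characteristic and seed frame indices.
    nq = len(query) - q_char
    nd = min(len(database) - d_seed, slope_limit * nq)   # slope limit bounds the table
    # Local pose distances between the pertinent halves of the two signals.
    local = np.linalg.norm(query[q_char:q_char + nq, None, :] -
                           database[None, d_seed:d_seed + nd, :], axis=2)
    # Cumulative costs, filled forward from the aligned pair at (0, 0).
    cost = np.full((nq, nd), np.inf)
    cost[0, 0] = local[0, 0]
    for i in range(nq):
        for j in range(nd):
            if i == 0 and j == 0:
                continue
            prev = min(cost[i - 1, j] if i else np.inf,
                       cost[i, j - 1] if j else np.inf,
                       cost[i - 1, j - 1] if i and j else np.inf)
            cost[i, j] = local[i, j] + prev
    # Greedy trace-back from the cheapest cell on the query's last frame.
    i, j = nq - 1, int(np.argmin(cost[-1]))
    path = [(i, j)]
    while (i, j) != (0, 0):
        steps = [(i - 1, j), (i, j - 1), (i - 1, j - 1)]
        i, j = min((s for s in steps if s[0] >= 0 and s[1] >= 0), key=lambda s: cost[s])
        path.append((i, j))
    path.reverse()
    score = np.mean([local[p] for p in path])
    return path, score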
6.2.5 Results Ranking
The time warps must be scored before they can be returned as ranked search results.
Any of a number of motion distance measures can be used. We define the final score of
a warp to be the average pose distance of each cell in its path. This measure does not
penalize warping, so it is more forgiving of timing differences in the results. Alternative
measures could take into account the effects of outliers along the path, or put a premium
on time distortions. It may also be useful to cull results that have large degeneracies
about the characteristic point, or at least penalize them so that they are ranked lower.
6.2.6 Interface
Our implementation consists of several linked applications. Space weights are specified
in XML files, and wPCA spaces are created using a command line utility. The search
algorithm itself is implemented as a GUI-based application. The user begins a session by
loading database and query clips. A PCA space description file must also be loaded, and
further spaces can be loaded on the fly afterward. The user can preview the query clip
by scrubbing along a timeline. After setting the search parameters, the user can perform
a search by clicking the search button. Results are displayed as a list of offset times,
sorted by score. The user can see a side-by-side comparison of any result with the query
by selecting it and scrubbing along the timeline.
6.3 Results
6.3.1 Synthetic Data
We first used synthetically-generated data to verify the functioning of our system. This
allowed us to produce clean motion clips with controlled variations in movement parameters.
We used the physically-based animation system designed by Neff and Fiume to generate
this data procedurally [55].
Two synthetic motion sets were generated. The first is approximately 300 seconds
long, sampled at 50 fps. The figure begins by raising its right arm 15 times. The exact
position of each raised hand was selected from a 10 cm³ cube using a uniform random
distribution. The posture of the figure was randomly set along the Alberts axis from .25
to .75 [55]. Finally, the overall timing of each motion was scaled, ranging from .7 to 1.5
times the normal length. The figure then performs 20 similarly varied left arm raises, and
20 double-arm-raises. It finishes by performing 10 identical shrugs at different speeds,
and 10 slouches. The second dataset only contains arm raises, but the bounds for the
arm targets are increased in the x and y directions by a factor of three.
Neff and Fiume’s animation system uses an SD-Fast derived physical model [33].
Using the measurements provided in the SD system definition file, a similar skeleton
definition was created. The mass definitions from the SD file were used to set the
wPCA weights: each joint was weighted with the amount of mass under it in the skeleton
hierarchy. No range of motion trial was available to train the PCA space, so we used the
longer of the two samples.
A 3D projection of the long motion clip is shown in figure 6.2. The individual motion
classes stand out as the path extremities. The sample mean of the training data is the
same as the rest position in this clip, and is represented by the cluster of points at the
origin. More complicated motions embedded within spaces generated from richer training
data take on a much less angular appearance when projected in three dimensions. This
is consistent with the fact that the synthetic data was designed to have a low inherent
dimensionality.
6.3.2 Validation
Validation was performed using the synthetic data. The first of each type of arm raising
motion was manually segmented from the longer motion clip, and then used to query
Figure 6.2: PCA projection of synthetic motion, showing low inherent dimensionality.
both motion clips. The quality of the results of the queries was highly dependent upon
the characteristic point used, and the size of the initial ANN search.
The automatic characteristic point finder did not work well with the arm movements,
because the time that the hands are raised is very long in relation to the length of the
whole clip. This shifted the clip’s mean point away from the rest position, and put the
characteristic point down near the rest pose. Using the rest pose as the query point in the
ANN search led to mostly spurious results, with the time warping algorithm left trying
to match essentially random segments of the database to the query. Manually setting the
characteristic point to the moment of maximum arm extension and re-querying allowed
the algorithm to proceed as designed.
The size of the initial ANN search determines the broadness of the results. The
structure of the motion data is such that a spatial proximity search returns long chains
of time-adjacent points for any query. We simplify the results in the clustering phase,
but in order to find all valid target clusters, the initial search must be set to return a
very large number of points. For example, when querying the long clip with the raised
left hand motion, the ANN search must be widened to 1130 points before all of the raised
left hand targets are found. If the search is widened even further, other types of motions
start to creep in to the results. At 1900 points, several instances of the raise both hands
action are returned after all of the raise right hands. This is good behaviour, since the
‘raising both hands’ motion is perceptually closer to raising only the right hand than
any other action in the database. The motion distance measure also holds up, since it
consistently ranks the second tier of matches below the actual matches.
Interestingly, querying the database with one of its own motions does not always rank
that motion most highly. This is due to several factors. During the manual segmentation
of the query clips, the designated motion range is re-sampled, introducing small numerical
differences. With synthetic data, the figure holds absolutely static poses at several points
during its performance. These quiescent points tend to occur at the characteristic points,
so the ANN search often finds several identically closest points. The clustering technique
has no way to distinguish among these points, so it picks the first. This causes some
noise in the alignment of the database ranges with the query clip, which is corrected by
the time warping step. This correction has a non-zero cost, so self-matching does not
necessarily hold.
6.3.3 Motion Capture Data
After verifying the system with synthetic data, we tested its real-world applicability
with motion capture data. The data was collected using a Vicon optical motion capture
system, and post-processed into joint angles using the Vicon IQ software. This data uses
a different skeleton than the synthetic data, having the same dimensionality (19 joints),
but a different arrangement (see figure 6.3). The Vicon system samples at 120 Hz.
Many individual motion trials were captured, with each being on the order of a couple
Figure 6.3: The two skeletal structures used: the Vicon Real-Time skeleton and the SD-Fast skeleton.
minutes in length. Specific range of motion trials were made to train the PCA spaces.
Trials included walking, cyclic motions, martial arts moves, and others. As with the
synthetic data, query clips were manually segmented from the longer trials. An example
projection is shown in figure 6.4.
Searching with the real data worked well, but the results were not as clean as those
from the synthetic tests. An illustrative example is a search done using recorded Aikido
movements. The actor in the Aikido motion trial performed a specific script. By tinkering
with the location of the characteristic point, it is possible to cause the system to return
movements in the wrong order. The matches are still consistent, however, with the bodies
being in similar, if not identical poses. The ranking that our search method provides is
somewhat arbitrary, much like the ranking of web pages returned by most web search
portals.
Differently-weighted PCA spaces can be used to modify the results. Re-doing the pre-
vious example with a weighting scheme that emphasizes the arms improves the ranking
of the results. This is because the position of the arms is what most clearly differenti-
Figure 6.4: wPCA space projection of a martial arts move.
ates the moves. An animator can use different spaces to accomplish specific goals. For
example, when animating a character picking up an object, it would be a good idea to
use a space weighted heavily toward the hand the character is using.
6.3.4 Scalability
One of the most important features of a search algorithm is its scaling behaviour. In this
section we discuss the theoretical complexity of our algorithm and present some search
heuristics. We then provide experimental results that demonstrate the efficacy of these
techniques.
The complexity of the overall search algorithm is best evaluated in individual sections.
The complexity of the ANN k-nearest neighbour search in d dimensions over n points
and with error bound ε is shown to be O((c_{d,ε} + kd) log n) in [3], where c_{d,ε} is a constant
dependent upon d and ε. This gives good performance with large databases, but is
quite sensitive to dimensionality. The clustering technique that we used has a worst-case
complexity of r(r+1)/2, and returns c clusters, where r is the number of results returned by the
ANN search. Each application of our DTW algorithm requires at most sq^2 and at least
(sq^2)/2 pose distance calculations to build the distance table, and 3s comparisons to trace
a path through the table (where q is the length of the query and s is the slope limit).
Sorting the c warps takes c log c comparisons. The aggregate complexity of all of the
steps is

(c_{d,ε} + kd) log n + r(r+1)/2 + c(sq^2 + 3s + log c).

Length (s)  Num. Results  Query Time (ms)
54          46            580
100         21            263
130         30            662
230         40            854
300         41            877
440         25            534
670         27            580

Table 6.1: Query Time vs Database Length
We tested our system with motion capture databases of various sizes to experimentally
evaluate its scalability. The results of the tests are given in table 6.1.
The data shows that our implementation’s real-world performance largely depends
upon the number of results passed on to the DTW step, rather than the size of the
database. The databases used for the last two trials contained exact matches for the
query. It follows that the area around the characteristic point would be dense with
consecutive poses, which would lead to more clustering, and fewer returned results. This
emphasizes the importance of having both a unique characteristic point and a good
clustering result. It also indicates that superior performance can be gained by adjusting
the DTW parameters. All tests in this chapter were performed using a 3 GHz Pentium 4
computer.
Data Sampling rate 120 Hz
Database length 70 seconds
Query Length 2.8 seconds
ANN dimensionality 10
ANN search size 1000
ANN epsilon 0
DTW dimensionality 5
Table 6.2: Baseline Parameters
6.3.5 Performance Optimization
There are several ways to improve the running time of the algorithm. Most involve a
trade-off of either search breadth or accuracy for query time performance. In order to
build a basis for comparison, we set up what could be considered an average query. We
used the Aikido data described in the previous section, and the search parameters listed
in table 6.2. Using the automatically generated characteristic point, the average query
time (from 10 trials) was 347 ms, and the search returned 12 results. Using a manually
specified characteristic point, the average query time was 130 ms, with 9 results returned.
The results from the manual characteristic point trial were subjectively much closer to
the query than those from the automatic trial, so the manual point was used in all
subsequent trials.
Pre-smoothing the data
The Vicon system samples motion at a default rate of 120 Hz, a much higher rate than is
necessary to capture most large muscle movements. Reducing the sampling rate obviously
reduces search time. We resampled the database and queried at a progression of sampling
rates to illustrate the effects of this reduction on average query time. The results are
shown in table 6.3. As expected, using fewer samples greatly speeds up the algorithm.
Hz Num. Results Query Time (ms)
233 9 453
166 10 158
58 9 29
29 9 7
Table 6.3: Query Time vs Sampling Rate
The quality of the results is consistent until the low sampling rate conflicts with the
clustering algorithm. At very low rates, the gaps between clusters all get filled, and the
system fails by returning a single result.
Adjusting the ANN parameters
The scalability results indicate that the performance barrier in the system lies with the
DTW stage. That being said, an investigation of the effects of the ANN parameters is
important, if only to verify the earlier result.
The number of results returned by the ANN search affects both its own running
time, and the number of final results after the clustering step. Using the default setup
as a baseline, we varied the number of neighbours to be returned by the ANN search.
The results are given in table 6.4. With a small number of neighbours, many potential
results are missing. As the number of neighbours increases, there tend to be more results
returned. After a point, the seed points’ neighbourhoods grow too large, and incorporate
spurious points, which in turn causes the clustering algorithm to create improper clusters.
The ‘sweet spot’ between too few results and over-clustering depends upon the specific
nature of the database and query being used.
One of the most desirable features of ANN search is that it can deliver improved
performance if some measure of error is acceptable. Paradoxically, increasing this error
tolerance leads to reduced performance in our system. Non-exact results cause gaps in
Neighbours Num. Results Query Time (ms)
100 4 54
250 5 70
360 6 80
490 8 109
600 8 118
1000 9 132
1500 9 137
2500 12 193
3010 17 256
5000 12 219
Table 6.4: Query Time vs Neighbours
the runs of nearest neighbour poses, which leads to an increased number of clusters.
Adjusting the dimensionality of the data used for the ANN search had a negligible effect
upon the running time of the overall system.
Adjusting the DTW parameters
The running time of the search algorithm directly corresponds with the number of clus-
tered results that make it to the DTW phase, so the time warping algorithm is a good area
upon which to focus optimizations. An easy optimization is to exploit the flexibility of
our motion representation by reducing the dimensionality of the pose distance metric. As
shown in table 6.5, reducing the accuracy of the distance comparisons does improve
the system’s run time. For the types of motions tested, the quality of the warp is not
subjectively affected by reducing the dimensionality. This of course depends upon the
distribution of the principal components. The user is required to pick a set of weighted
PCA bases that reflect the content of the motions that s/he is using.
Dimensions Num Results Query Time (ms)
1 9 70
3 9 101
5 9 130
20 9 200
54 9 372
Table 6.5: Query Time vs DTW Dimensions
6.4 Summary
In this chapter we have presented a search algorithm for use with sampled motion data.
In doing so we have also developed a representation for motion data that introduces a
meaningful distance metric for poses. We have shown how an animator can control the
properties of the wPCA space through its weights, and how this may be used to direct
the search results. We have demonstrated the use of the search algorithm on both real
and synthetic data, and have analyzed its performance. Finally, we have experimented
with the algorithm’s settings in order to gauge its scalability to large databases. In our
tests, the algorithm returned both examples of the same motion as the query clip, and
different motions with similar features. This algorithm produces reasonable results in
very little time, allowing a user to quickly locate clips with which to work.
Chapter 7
Conclusion and Future Work
7.1 Conclusion
This thesis has introduced the Motion Curve representation for sampled motion data.
We presented a detailed description of the steps for constructing a motion curve space,
taking care to show how it relates to existing motion representation schemes. We explored
the features of the space, highlighting its strong points, but also pointing out potential
weaknesses.
Through this exploration of motion curve space, several new motion editing techniques
were introduced. The most serendipitous, perhaps, was the pose detection algorithm
that motivated the development of the space in the first place. While investigating the
properties of projected interpolation, we happened upon an efficient and flexible method
for blending more than two poses. An automatic motion segmentation algorithm was
hinted at by some of our work with geometric operators, although we chose not to flesh
it out, in favour of doing work on other aspects of the research problems. The motion-
editing geometric operations described in chapter 5 provide a compelling tool set for
modifying existing motion clips to suit new needs, or to adjust their subjective qualities.
In creating new techniques for editing motion data, we have expanded the number
of tools at the disposal of animators. The overarching goal of all of the tools that we
have developed is to provide the means to quickly modify existing motions to create new
motions. It is our hope that this will allow animators to rapidly prototype motions
without resorting to traditional motion creation techniques.
We have also shown the usefulness of having an interpolating embedding space for
motion data when developing automatic processing algorithms. The simplified interpo-
lation scheme that we presented in chapter 4 opens the door for new and interesting
applications, such as the interpolated plane interface that we demonstrated. Having a
well-defined pose distance metric also facilitates automatic processing, as shown by the
success of the search algorithm of chapter 6.
7.2 Future work
No thesis is complete without a discussion of further work that could be done and refine-
ments that could be made to the existing work. Indeed, there are several areas where
this work could be continued, ranging from trivial extensions to more fundamental en-
hancements.
7.2.1 Representation
Principal components analysis was chosen as the basis for the motion curves projection
because of its simplicity, and because of its existing uses in the literature. PCA projection
is a linear operation, so the degree to which it can compress non-linear structures embedded
within the data is limited. This thesis did not dwell upon the use of PCA for compression,
in part at least because of initially lackluster results. Using a different change of basis,
however, might provide results worth pursuing.
Independent Component Analysis (ICA) is a linear projection technique, like PCA
[34]. The difference is that ICA seeks to find bases that are as statistically independent
as possible. This may or may not be of benefit for the techniques described in this thesis.
Often, controlling more than one correlated factor in an animation is a desired effect.
Nonlinear dimensionality reduction techniques fit reduced spaces to curved structures
in original data. An example of such a technique is IsoMap [67]. Most motions projected
into a PCA space exhibit non-linear structure, so they could, in theory, be characterized
quite well by such a technique. This would be useful in building models of particular
classes of repeated motions, but would work less well with a heterogeneous mix of motions. Non-
linear dimensionality reduction could be done on the original linearized motion data, or
after wPCA projection, in order to take advantage of the weighting scheme.
7.2.2 New Operators
The collection of operators presented in chapter 5 is by no means exhaustive. Nonlin-
ear operators, such as distance-based translations, could be used to create caricatured
motions. The biggest problem with implementing complex operations is in creating an
intuitive user interface for actually using them.
An example of an easy to implement and usable tool that could be added is an
interactive speed adjuster. The speed at which a motion is performed is determined by the
apparent density of samples along the projected motion curve. This could be displayed to
the animator using a colour coding scheme, where brighter colours correspond to faster
movements. Then, by using a painting analogy, the animator could specify re-sampling
rates along the curve to adjust the performance speed.
7.2.3 Search Refinements
Our search system was designed to be integrated into a larger motion editing system
utilizing our pose space representation. In this context, the search system can be used
to find motions that are similar to a specified clip, so that fine adjustments can be made
via interpolation. It can also be useful as a motion exploration tool. A clip created using
the various editing tools can be used as a query in order to find a similar, but more
realistic motion. The time warping code prototyped for the search system is also useful
for improving the quality of arbitrary interpolations.
A possible use of the system would be to quickly apply markup to a large motion
database. This procedure would start with a small set of manually marked-up clips. The
mark-up would take the form of 〈descriptor, value〉 tuples, where descriptor describes
the motion’s action, and value indicates how well the descriptor fits. For example, if
the descriptor is ‘step forward’, an unambiguous step forward might have the value of
1, while a ‘step forward and to the left’ would have a lower value. Queries could be
performed using each of the motions in the marked-up set. The ranges that are returned
from each query would then take on the descriptors of the queries, with values set to be
a function of the query’s values and the result’s ranking score. After mark-up, semantic
queries could be made to the database very quickly.
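A sketch of the proposed propagation step is given below; the search function, the descriptor names, and the rule for combining the query's value with the result's score are all assumptions made for illustration.

def propagate_markup(marked_clips, search):
    # marked_clips: iterable of (query_clip, descriptor, value) tuples.
    # search(clip) is assumed to return (database_range, score) pairs, where
    # score is the mean pose distance of the warp (lower is better).
    markup = []
    for clip, descriptor, value in marked_clips:
        for db_range, score in search(clip):
            # Propagated value: some function of the query's value and the
            # result's ranking score; a simple decay with distance is used here.
            markup.append((db_range, descriptor, value / (1.0 + score)))
    return markup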
The system could be extended to work with more complicated motions by adding
support for more than one characteristic point. If a query were determined to have mul-
tiple characteristic points, the modified algorithm would start by finding and clustering
seed points for them all. The dynamic time warping phase of the algorithm would have
to be modified to take into account multiple constraints. It would work by finding seed
points in the same order as their corresponding characteristic points. Warps would then
be found between adjacent matching pairs. The adjacency information could then be
represented as a directed graph, and all possible traversing paths enumerated.
7.2.4 Software Development
The various prototype applications developed over the course of this project were designed
with an eye toward their eventual integration. The usefulness of the techniques that they
embody would be greatly enhanced by a synergistic framework. For example, an animator
could create a pose using planar interpolation, then use it as a search key or blend target
without changing programs.
Most of the actual computation involved in the prototypes is done deep within com-
mon base classes. Most of the programs use a similar set of structures for tracking their
operation. The most disparate component is the user interface. In order to bring the
prototypes together, a common interface standard would have to be created. Alternately,
the base classes could be compiled into a plugin for an existing motion editing suite, such
as Alias’ Maya. It may be conceptually difficult to fit motion curves into an existing
framework, but if it could be done without too much contortion of the framework design-
ers’ original intentions, it would be the quickest way to move the techniques described in
this thesis into a production environment.
Bibliography
[1] Marc Alexa and Wolfgang Muller. Representing animations by principal compo-
nents. Comput. Graph. Forum, 19(3), 2000.
[2] Alias. Maya unlimited 7.0.
[3] Sunil Arya, David M. Mount, Nathan S. Netanyahu, Ruth Silverman, and An-
gela Wu. An optimal algorithm for approximate nearest neighbor searching. In
SODA ’94: Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete
Algorithms, pages 573–582, Philadelphia, PA, USA, 1994. Society for Industrial and
Applied Mathematics.
[4] David Baraff and Andrew Witkin. Physically based modeling: Principles and prac-
tice, 1997.
[5] Jernej Barbic, Alla Safonova, Jia-Yu Pan, Christos Faloutsos, Jessica K. Hodgins,
and Nancy S. Pollard. Segmenting motion capture data into distinct behaviors. In
GI ’04: Proceedings of the 2004 conference on Graphics Interface, pages 185–194.
Canadian Human-Computer Communications Society, 2004.
[6] D.A. Becker. Sensei: A real-time recognition, feedback, and training system for t’ai
chi gestures. In Vismod, 1997.
[7] D. J. Berndt and J. Clifford. Using dynamic time warping to find patterns in time
series. In KDD-94: AAAI Workshop on Knowledge Discovery in Databases, pages
359–370, July 1994.
[8] Christopher M. Bishop. Neural networks for pattern recognition. Oxford University
Press, Oxford, UK, UK, 1996.
[9] Volker Blanz and Thomas Vetter. A morphable model for the synthesis of 3d faces.
In SIGGRAPH ’99: Proceedings of the 26th annual conference on Computer graph-
ics and interactive techniques, pages 187–194, New York, NY, USA, 1999. ACM
Press/Addison-Wesley Publishing Co.
[10] Aaron F. Bobick and Andrew D. Wilson. A state-based approach to the repre-
sentation and recognition of gesture. IEEE Trans. Pattern Anal. Mach. Intell.,
19(12):1325–1337, 1997.
[11] Matthew Brand and Aaron Hertzmann. Style machines. In Kurt Akeley, editor,
Siggraph 2000, Computer Graphics Proceedings, pages 183–192. ACM Press / ACM
SIGGRAPH / Addison Wesley Longman, 2000.
[12] Armin Bruderlin and Lance Williams. Motion signal processing. In SIGGRAPH,
pages 97–104, 1995.
[13] L. W. Campbell, D. A. Becker, A. Azarbayejani, A. F. Bobick, and A. Pentland.
Invariant features for 3-d gesture recognition. In FG ’96: Proceedings of the 2nd
International Conference on Automatic Face and Gesture Recognition (FG ’96),
page 157, Washington, DC, USA, 1996. IEEE Computer Society.
[14] Marc Cardle, Michalis Vlachos, Stephen Brooks, Eamonn Keogh, and Dimitrios
Gunopulos. Fast motion capture matching with replicated motion editing. In SIG-
GRAPH 2003, Sketches and Applications. ACM Press, jul 2003.
[15] Naval Undersea Warfare Center. Gaussian mixtures / hmm toolkit for matlab.
http://www.npt.nuwc.navy.mil/Csf/.
[16] Selina Chu, Eamonn J. Keogh, David Hart, and Michael J. Pazzani. Iterative deep-
ening dynamic time warping for time series. In SDM, 2002.
[17] Charles K. Chui. An Introduction to Wavelets. Academic Press, 1992.
[18] James W. Davis and Aaron F. Bobick. The representation and recognition of hu-
man movement using temporal templates. In CVPR ’97: Proceedings of the 1997
Conference on Computer Vision and Pattern Recognition (CVPR ’97), page 928,
Washington, DC, USA, 1997. IEEE Computer Society.
[19] Mira Dontcheva, Gary Yngve, and Zoran Popovic. Layered acting for character
animation. ACM Trans. Graph., 22(3):409–416, 2003.
[20] Petros Faloutsos, Michiel van de Panne, and Demetri Terzopoulos. Composable
controllers for physics-based character animation. In SIGGRAPH ’01: Proceedings of
the 28th annual conference on Computer graphics and interactive techniques, pages
251–260, New York, NY, USA, 2001. ACM Press.
[21] Petros Faloutsos, Michiel van de Panne, and Demetri Terzopoulos. The virtual stunt-
man: dynamic characters with a repertoire of autonomous motor skills. Computers
& Graphics, 25(6):933–953, 2001.
[22] Adam Finkelstein and David H. Salesin. Multiresolution curves. In Proceedings of
SIGGRAPH 94, pages 261–268, July 1994.
[23] Kevin Forbes. Summarizing motion in video sequences.
http://thekrf.com/projects/motionsummary/, 2004.
[24] Kevin Forbes and Eugene Fiume. An efficient search algorithm for motion data using
weighted pca. In SCA ’05: Proceedings of the 2005 ACM SIGGRAPH/Eurographics
Symposium on Computer animation, Aire-la-Ville, Switzerland, Switzerland, 2005.
Eurographics Association.
[25] Pascal Glardon, Ronan Boulic, and Daniel Thalmann. Pca-based walking engine
using motion capture data. In Computer Graphics International, pages 292–298,
2004.
[26] Michael Gleicher. Motion editing with spacetime constraints. In SI3D ’97: Proceed-
ings of the 1997 symposium on Interactive 3D graphics, pages 139–ff., New York,
NY, USA, 1997. ACM Press.
[27] Michael Gleicher. Retargetting motion to new characters. In SIGGRAPH ’98:
Proceedings of the 25th annual conference on Computer graphics and interactive
techniques, pages 33–42, New York, NY, USA, 1998. ACM Press.
[28] F. Sebastin Grassia. Practical parameterization of rotations using the exponential
map. J. Graph. Tools, 3(3):29–48, 1998.
[29] Keith Grochow, Steven L. Martin, Aaron Hertzmann, and Zoran Popović.
Style-based inverse kinematics. ACM Trans. Graph., 23(3):522–531, 2004.
[30] Lorna Herda, Raquel Urtasun, Pascal Fua, and Andrew Hanson. Automatic deter-
mination of shoulder joint limits using quaternion field boundaries. I. J. Robotic
Res., 22(6):419–438, 2003.
[31] Jessica K. Hodgins, James F. O’Brien, and Jack Tumblin. Perception of human
motion with different geometric models. IEEE Transactions on Visualization and
Computer Graphics, 4(4):307–316, 1998.
[32] Jessica K. Hodgins, Wayne L. Wooten, David C. Brogan, and James F. O’Brien.
Animating human athletics. In SIGGRAPH ’95: Proceedings of the 22nd annual
conference on Computer graphics and interactive techniques, pages 71–78, New York,
NY, USA, 1995. ACM Press.
[33] Michael G. Hollars, Dan E. Rosenthal, and Michael A. Sherman. SD Fast User’s
Manual, 1994.
[34] A. Hyvarinen and E. Oja. Independent component analysis: algorithms and appli-
cations. Neural Netw., 13(4-5):411–430, 2000.
[35] T. Igarashi, T. Moscovich, and J. F. Hughes. Spatial keyframing for
performance-driven animation. In SCA ’05: Proceedings of the 2005 ACM SIG-
GRAPH/Eurographics symposium on Computer animation, pages 107–115, New
York, NY, USA, 2005. ACM Press.
[36] Charles E. Jacobs, Adam Finkelstein, and David H. Salesin. Fast multiresolution
image querying. In SIGGRAPH ’95: Proceedings of the 22nd annual conference on
Computer graphics and interactive techniques, pages 277–286, New York, NY, USA,
1995. ACM Press.
[37] Michael Patrick Johnson. Exploiting Quaternions to Support Expressive Interactive
Character Motion. PhD thesis, Massachusetts Institute of Technology, 2003.
[38] Kanav Kahol, Priyamvada Tripathi, and Sethuraman Panchanathan. Automated
gesture segmentation from dance sequences. In Sixth IEEE International Conference
on Automatic Face and Gesture Recognition, pages 883–888, 2004.
[39] Ladislav Kavan and Jiri Zara. Spherical blend skinning: a real-time deformation of
articulated models. In SI3D ’05: Proceedings of the 2005 Symposium on Interactive
3D Graphics and Games, pages 9–16, New York, NY, USA, 2005. ACM Press.
[40] Eamonn Keogh, Themis Palpanas, Victor Zordan, Dimitrios Gunopulos, and Marc
Cardle. Indexing large human-motion databases. In VLDB 2004, 2004.
[41] Eamonn J. Keogh and Michael J. Pazzani. Scaling up dynamic time warping to
massive dataset. In PKDD ’99: Proceedings of the Third European Conference on
Principles of Data Mining and Knowledge Discovery, pages 1–11, London, UK, 1999.
Springer-Verlag.
[42] Evangelos Kokkevis, Dimitri Metaxas, and Norman I. Badler. User-controlled
physics-based animation for articulated figures. In CA ’96: Proceedings of the Com-
puter Animation, page 16, Washington, DC, USA, 1996. IEEE Computer Society.
[43] Lucas Kovar and Michael Gleicher. Flexible automatic motion blending with regis-
tration curves. In SCA ’03: Proceedings of the 2003 ACM SIGGRAPH/Eurographics
Symposium on Computer animation, pages 214–224, Aire-la-Ville, Switzerland,
Switzerland, 2003. Eurographics Association.
[44] Lucas Kovar and Michael Gleicher. Automated extraction and parameterization of
motions in large data sets. ACM Trans. Graph., 23(3):559–568, 2004.
[45] Lucas Kovar, Michael Gleicher, and Frédéric Pighin. Motion graphs. In
SIGGRAPH ’02: Proceedings of the 29th annual conference on Computer graphics
and interactive techniques, pages 473–482, New York, NY, USA, 2002. ACM Press.
[46] Richard Kulpa, Franck Multon, and Bruno Arnaldi. Morphology-independent rep-
resentation of motions for interactive human-like animation. In Eurographics 2005,
August 2005.
[47] Joe Laszlo, Michael Neff, and Karan Singh. Predictive feedback for interactive
control of physics-based characters. In Eurographics 2005, August 2005.
[48] Joseph Laszlo, Michiel van de Panne, and Eugene Fiume. Interactive control for
physically-based animation. In SIGGRAPH ’00: Proceedings of the 27th Annual
Conference on Computer Graphics and Interactive Techniques, pages 201–208, New
York, NY, USA, 2000. ACM Press/Addison-Wesley Publishing Co.
[49] Yan Li, Tianshu Wang, and Heung-Yeung Shum. Motion texture: a two-level sta-
tistical model for character motion synthesis. In SIGGRAPH ’02: Proceedings of
the 29th annual conference on Computer graphics and interactive techniques, pages
465–472. ACM Press, 2002.
[50] José M. Martínez, Rob Koenen, and Fernando Pereira. Mpeg-7: the generic multimedia
content description standard. IEEE Computer Society, pages 78–87, 2002.
[51] Scott McCloud. Understanding Comics. Perennial Currents, 1994.
[52] David M Mount. ANN Programming Manual, 2005.
[53] J.C Nebel. Keyframe animation of articulated figures using autocollision-free inter-
polation. In 17th Eurographics UK Conference’99, Cambridge, UK, April 1999.
[54] Michael Neff and Eugene Fiume. Modeling tension and relaxation for computer
animation. In SCA ’02: Proceedings of the 2002 ACM SIGGRAPH/Eurographics
symposium on Computer animation, pages 81–88, New York, NY, USA, 2002. ACM
Press.
[55] Michael Neff and Eugene Fiume. Methods for exploring expressive stance. In SCA
’04: Proceedings of the 2004 ACM SIGGRAPH/Eurographics symposium on Com-
puter animation, pages 49–58, New York, NY, USA, 2004. ACM Press.
[56] Lawrence R. Rabiner. A tutorial on hidden markov models and selected applications
in speech recognition. Readings in speech recognition, pages 267–296, 1990.
[57] Stephane Redon, Young J. Kim, Ming C. Lin, and Dinesh Manocha. Fast continuous
collision detection for articulated models. In Proceedings of ACM Symposium on
Solid Modeling and Applications, 2004.
[58] Charles Rose, Michael F. Cohen, and Bobby Bodenheimer. Verbs and adverbs:
Multidimensional motion interpolation. IEEE Comput. Graph. Appl., 18(5):32–40,
1998.
[59] Arno Schodl, Richard Szeliski, David H. Salesin, and Irfan Essa. Video textures. In
Kurt Akeley, editor, Siggraph 2000, Computer Graphics Proceedings, pages 489–498.
ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2000.
[60] Ari Shapiro, Fred Pighin, and Petros Faloutsos. Hybrid control for interactive char-
acter animation. In PG ’03: Proceedings of the 11th Pacific Conference on Computer
Graphics and Applications, page 455, Washington, DC, USA, 2003. IEEE Computer
Society.
[61] Hyun Joon Shin, Jehee Lee, Sung Yong Shin, and Michael Gleicher. Computer
puppetry: An importance-based approach. ACM Trans. Graph., 20(2):67–94, 2001.
[62] Jeremy G. Siek, Lie-Quan Lee, and Andrew Lumsdaine. Boost Graph Library, The:
User Guide and Reference Manual, 2002.
[63] Danijel Skocaj and Ales Leonardis. Weighted incremental subspace learning. In
Workshop on Cognitive Vision, proceedings, Zurich, Switzerland, September 19-20
2002.
[64] T. Starner and A. Pentland. Visual recognition of american sign language using
hidden markov models. In International Workshop on Automatic Face and Gesture
Recognition, pages 189–194, 1995.
[65] Eric J. Stollnitz, Tony D. DeRose, and David H. Salesin. Wavelets for computer
graphics: A primer, part 1. IEEE Comput. Graph. Appl., 15(3):76–84, 1995.
[66] Harold C. Sun and Dimitris N. Metaxas. Automating gait generation. In SIG-
GRAPH, pages 261–270, 2001.
[67] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework
for nonlinear dimensionality reduction. Science, 290:2319–2323, 2000.
[68] Atulya Velivelli, ChengXiang Zhai, and Thomas S. Huang. Audio segment retrieval
using a short duration example query. In ICME, pages 1603–1606, 2004.
[69] M. Vlachos, G. Kollios, and D. Gunopulos. Discovering similar multidimensional
trajectories. In Proceedings of the 18th ICDE, pages 673–684, San Jose, CA, 2002.
[70] Andrew D. Wilson and Aaron F. Bobick. Realtime online adaptive gesture recogni-
tion. In ICPR, pages 1270–1275, 2000.
[71] Po-Feng Yang, Joe Laszlo, and Karan Singh. Layered dynamic control for in-
teractive character swimming. In SCA ’04: Proceedings of the 2004 ACM SIG-
GRAPH/Eurographics symposium on Computer animation, pages 39–47, New York,
NY, USA, 2004. ACM Press.
[72] Atsuo Yoshitaka and Tadao Ichikawa. A survey on content-based retrieval for multi-
media databases. IEEE Transactions on Knowledge and Data Engineering, 11(1):81–
93, 1999.
[73] Victor Brian Zordan and Jessica K. Hodgins. Motion capture-driven simulations that
hit and react. In SCA ’02: Proceedings of the 2002 ACM SIGGRAPH/Eurographics
symposium on Computer animation, pages 89–96, New York, NY, USA, 2002. ACM
Press.