
Accurate Human Motion Capture in Large Areas by Combining IMU- and Laser-Based People Tracking

Jakob Ziegler Henrik Kretzschmar Cyrill Stachniss Giorgio Grisetti Wolfram Burgard

Abstract— A large number of applications use motion capture systems to track the location and the body posture of people. For instance, the movie industry captures actors to animate virtual characters that perform stunts. Today's tracking systems either operate with statically mounted cameras and thus can be used in confined areas only, or rely on inertial sensors that allow for free and large-scale motion but suffer from drift in the pose estimate. This paper presents a novel tracking approach that aims to provide globally aligned full body posture estimates by combining a mobile robot and an inertial motion capture system. In our approach, a mobile robot equipped with a laser scanner is used to anchor the pose estimates of a person given a map of the environment. It uses a particle filter to globally localize a person wearing a motion capture suit and to robustly track the person's position. To obtain a smooth and globally aligned trajectory of the person, we solve a least squares optimization problem formulated from the motion capture suit and tracking data. Our approach has been implemented on a real robot and exhaustively tested. As the experimental evaluation shows, our system is able to provide locally precise and globally aligned estimates of the person's full body posture.

I. INTRODUCTION

The movie industry and creators of computer games are major markets for professional motion capture systems. Marker-based visual tracking approaches that use multiple statically mounted cameras, such as the Vicon mocap studio [18], belong to the most commonly used systems. These tracking systems are highly accurate but restrict the actor to a rather small area that can be captured by the cameras. To overcome this limitation, motion capture systems based on inertial measurement units (IMUs), such as the Xsens MVN motion capture suit [20], have been developed. These inertial motion capture suits attach multiple IMUs to a person and provide an estimate of the person's full body posture. They can accurately track the person's pose locally but suffer from global drift, particularly when the person moves over large distances. As this drift accumulates over time due to the incremental nature of the IMU-based filter, interactions between the person and the environment are difficult to realize. Consequently, applications that require a locally accurate as well as a globally correctly aligned estimate of the person's posture are still limited to rather small scenes.

The contribution of this paper is a novel approach that bridges this gap by combining locally accurate full body posture estimates with a precise global alignment of the tracked person with respect to the environment.

All authors are with the University of Freiburg, Department of Computer Science, Freiburg, Germany. G. Grisetti is also with Sapienza University of Rome, Department of Systems and Computer Science, Rome, Italy.

This work has been supported by the German Research Foundation (DFG) under contract number SFB/TR-8 and by Microsoft Research, Redmond. Their support is gratefully acknowledged.

Fig. 1: Globally aligned full body posture estimate of a person obtained using an inertial motion capture system and a mobile robot equipped with a laser range finder.

Our approach combines the estimates of an inertial motion capture suit with particle filter-based people tracking techniques using a mobile robot equipped with a laser range finder (see Fig. 1). To obtain accurate trajectory estimates, our method computes the solution of a global least squares problem consisting of the estimates of both the inertial motion capture suit and the laser-based tracker. As a result, our approach is able to compute a globally accurate alignment of the smooth track of the person in the scene, including the full body posture estimate of the person from the inertial motion capture suit.

II. RELATED WORK

Tracking people in the environment is a well-studied problem in robotics and related disciplines. Approaches to people tracking in robotics can be classified according to the sensors used, e.g., laser scanners [1], [7], monocular cameras [3], stereo cameras [6], IMUs [16], or combinations of these sensors [4], [10], [12], [19], [21].

Key techniques found in most people tracking approaches are Kalman filters [5], [7], [8], particle filters [15], [19], and HMMs [2]. To tackle the data association problem, approaches rely on nearest neighbor techniques [15], the Hungarian method [3], multi-hypothesis data association trackers [1], probabilistic data association filters [12], or their joint variant called JPDAFs [2], [17].

Montemerlo et al. [15] present an approach to simultaneously localize a mobile robot and people in its vicinity. This method is related to our system insofar as we also estimate the trajectory of the robot and the person jointly. In contrast to Montemerlo et al., we do not only track but also smooth the output to obtain a more accurate maximum likelihood solution. However, we do not consider that the laser beams reflected by a person may negatively influence the pose estimate of the robot itself.

Grzonka et al. [9] present an approach that reconstructs the trajectory of a person wearing an inertial motion capture suit along with localized events, such as opening doors, to estimate the path of the person as well as a topological-metric map.

The approach presented in this paper is based on a mobile robot that observes the tracked person with a laser range finder and a mobile motion capture system based on IMUs, such as the Xsens MVN suit. Our method is able to provide globally consistent and locally accurate full-body motion estimates of a person, even over large distances. We use a combination of particle filter-based localization, people tracking, and global optimization to obtain the most likely motion path. In addition, our approach can deal with situations in which the person being tracked is temporarily outside the field of view of the robot.

III. APPROACH TO PARTICLE FILTER-BASED PERSON TRACKING USING AN INERTIAL MOTION CAPTURE SUIT

In our approach, we use a particle filter to track the pose x_t of a person. We estimate the posterior

p(x_t | z_{1:t}, u_{1:t}) ∝ p(z_t | x_t) ∫ p(x_t | x', u_t) p(x' | z_{1:t-1}, u_{1:t-1}) dx',   (1)

where u_t describes the movement of the person, and z_t corresponds to an observation of the person at time t. The particle filter estimates this posterior using the motion model p(x_t | x_{t-1}, u_t) and the observation model p(z_t | x_t). In our approach, the motion model is based on the estimate provided by the inertial motion capture suit, and the observation model is based on the laser measurements obtained by the robot.
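To make the filter structure concrete, the following Python sketch shows one predict/update cycle implied by (1). The functions sample_motion and observation_likelihood stand for the motion and observation models detailed in the following subsections; all names and array layouts here are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def particle_filter_step(particles, weights, u_t, z_t,
                         sample_motion, observation_likelihood):
    """One predict/update cycle of the tracker sketched in Eq. (1).

    particles: (N, 3) array of poses (x, y, theta)
    weights:   (N,) importance weights, summing to one
    u_t:       suit motion estimate (translation, delta_theta)
    z_t:       leg candidates detected in the current laser scan
    """
    # Prediction: propagate every particle through the motion model.
    particles = np.array([sample_motion(p, u_t) for p in particles])

    # Correction: reweight by the observation model p(z_t | x_t).
    weights = weights * np.array(
        [observation_likelihood(z_t, p) for p in particles])
    weights /= weights.sum()  # normalization (eta in Eq. (5))

    return particles, weights
```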

A. Motion Model

An estimate of the motion of the person to be tracked is provided by the inertial motion capture suit. In our current implementation, the Xsens MVN suit estimates the motion of the person's full body posture using a set of IMUs. Consequently, the estimate of the inertial motion capture suit is affected by drift. Although the relative motion estimate is globally inaccurate, it is typically locally precise and therefore well-suited for the motion model.

The estimates provided by the inertial motion capture suit are given with respect to its own reference frame. The motion u_t = (T, Δθ)^T consists of a translation T and a change Δθ in orientation.

The particle filter algorithm samples a new pose for each particle according to the motion model based on u_t and its estimate of the current state. Assuming Gaussian noise in T and Δθ, the sampled motion for a sample i is given by

Δθ^i = Δθ + ε_rot, where ε_rot ∼ N(0, α_1|T| + α_2|Δθ|),   (2)

T^i = T + ε_trans, where ε_trans ∼ N(0, α_3|T| + α_4|Δθ|).   (3)

The noise parameters α_1, α_2, α_3, and α_4 determine the uncertainty with respect to translation and rotation. Finally, the new pose for sample i is computed as x^i_t = x^i_{t-1} ⊕ (T^i, Δθ^i)^T.
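The following is a minimal sketch of this sampling step, assuming 2D poses (x, y, θ), a suit-frame translation T, and that N(0, ·) in (2)-(3) denotes a variance; the α values are invented placeholders.

```python
import numpy as np

def sample_motion(pose, u_t, alphas=(0.1, 0.01, 0.2, 0.05)):
    """Sample a new pose according to Eqs. (2)-(3).

    pose: (x, y, theta); u_t: (T, delta_theta), with T a 2D translation
    in the suit's reference frame. The alpha values are placeholders.
    """
    x, y, theta = pose
    T, dtheta = u_t
    a1, a2, a3, a4 = alphas
    t_norm = np.linalg.norm(T)

    # Eq. (2): noisy rotation increment.
    dtheta_i = dtheta + np.random.normal(
        0.0, np.sqrt(a1 * t_norm + a2 * abs(dtheta)))
    # Eq. (3): noisy translation increment (same noise on both axes).
    T_i = T + np.random.normal(
        0.0, np.sqrt(a3 * t_norm + a4 * abs(dtheta)), size=2)

    # Pose composition x_t = x_{t-1} (+) (T_i, dtheta_i): rotate the
    # suit-frame translation into the global frame, then add.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([x + c * T_i[0] - s * T_i[1],
                     y + s * T_i[0] + c * T_i[1],
                     theta + dtheta_i])
```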

B. Observation Model

The particle filter uses the observation model p(z_t | x_t) to correct the prediction estimated by the motion model. In our approach, the observation model is based on a (localized) mobile robot equipped with a laser range finder.

To obtain p(z_t | x_t), we compute the expected observation of the tracked person given its current estimated pose. We do so by constructing a 3D skeleton model of the person using the current estimate of the person's full body posture, which is provided by the inertial motion capture suit. The intersection of the skeleton and the plane of the laser range finder indicates where the legs of the person are expected to appear in the scan.

To evaluate how well the expected observation matches the actual observation, we detect potential legs in the current laser range scan. We consider the distances between the expected locations of the legs and the locations of the detected legs. Let p(l_l) = p(l_r) be the probability of the detector identifying the legs of the person to track, which we assume to be independent of the state x_t and the measurement z_t. This probability can easily be learned from a labeled dataset. Since the data associations are unknown, we sum over all possible associations. Assuming Gaussian noise, we have

p(z_t | x_t) ∝ p(¬l_l) p(¬l_r)
  + p(l_l) p(l_r) Σ_{l_l ≠ l_r ∈ L} exp(−λ(‖l*_l − l_l‖² + ‖l*_r − l_r‖²))
  + p(l_l) p(¬l_r) Σ_{l_l ∈ L} exp(−λ‖l*_l − l_l‖²)
  + p(¬l_l) p(l_r) Σ_{l_r ∈ L} exp(−λ‖l*_r − l_r‖²),   (4)

where l*_l and l*_r are the positions of the left and the right leg of the skeleton, according to the current particle, and L refers to the set of legs detected in the current range scan. Finally, λ is a scaling parameter that determines how peaked the distribution is.
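As a sketch, (4) transcribes directly into code. Here the expected positions l*_l and l*_r are assumed to come from intersecting the particle's skeleton with the scan plane, and p_det and lam are assumed placeholder values.

```python
import itertools
import numpy as np

def observation_likelihood(detected_legs, expected_left, expected_right,
                           p_det=0.9, lam=5.0):
    """Evaluate Eq. (4) for one particle.

    detected_legs: list of 2D numpy arrays (the set L)
    expected_left/right: l*_l and l*_r from the skeleton-scan
    intersection. p_det = p(l_l) = p(l_r) and lam are placeholders.
    """
    p_miss = 1.0 - p_det

    # Term 1: both legs missed by the detector.
    score = p_miss * p_miss

    # Term 2: both legs detected; all ordered pairs with l_l != l_r.
    for ll, lr in itertools.permutations(detected_legs, 2):
        d = (np.sum((expected_left - ll) ** 2)
             + np.sum((expected_right - lr) ** 2))
        score += p_det * p_det * np.exp(-lam * d)

    # Terms 3 and 4: exactly one of the two legs detected.
    for l in detected_legs:
        score += p_det * p_miss * np.exp(
            -lam * np.sum((expected_left - l) ** 2))
        score += p_miss * p_det * np.exp(
            -lam * np.sum((expected_right - l) ** 2))

    return score
```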

Our approach applies a technique similar to that proposed by Kluge [11] to detect legs in the laser range scans. We group the laser endpoints into segments based on the Euclidean distance between the endpoints of the laser beams and subsequently examine the width and shape of each segment. If the width of a segment approximately corresponds to that of a human leg and the shape roughly resembles a half circle, the segment is classified as a leg candidate.

Note that for evaluating a particle it is more important to obtain few false negatives (i.e., missing a leg) than a non-zero number of false positives (i.e., legs of other persons or similar-looking objects). This is similar to the observation likelihood in robot localization, which is often a multi-modal distribution over the environment. Hence, this relatively naive heuristic for leg extraction turned out to be sufficient in practice.
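A minimal sketch of such a detector follows, assuming an ordered 2D scan; the distance and width thresholds are invented, and the half-circle shape test is omitted for brevity.

```python
import numpy as np

def detect_leg_candidates(scan_xy, gap=0.15, min_width=0.05,
                          max_width=0.25):
    """Segment-based leg detection in the spirit of Section III-B.

    scan_xy: (N, 2) array of laser endpoints in scan order.
    All thresholds (in meters) are placeholder values.
    """
    # Cluster consecutive endpoints by Euclidean distance.
    segments, current = [], [scan_xy[0]]
    for p in scan_xy[1:]:
        if np.linalg.norm(p - current[-1]) < gap:
            current.append(p)        # same cluster
        else:
            segments.append(current)  # close the current cluster
            current = [p]
    segments.append(current)

    # Keep segments whose width roughly matches a human leg.
    legs = []
    for seg in segments:
        seg = np.asarray(seg)
        width = np.linalg.norm(seg[0] - seg[-1])
        if min_width <= width <= max_width:
            legs.append(seg.mean(axis=0))  # centroid as leg position
    return legs
```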

To update the weight w^i of each sample i, we apply

w^i_t = η p(z_t | x^i_t) w^i_{t-1},   (5)

where η is a normalization constant computed by summing up the individual likelihoods. Our approach carries out the resampling step selectively by considering the effective number of samples, similar to Liu [14].
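A sketch of this selective resampling step, using the effective sample size n_eff = 1/Σ w_i²; the trigger threshold is an assumed value.

```python
import numpy as np

def selective_resample(particles, weights, ess_ratio=0.5):
    """Resample only when the effective number of samples is low.

    Resampling is triggered when n_eff = 1 / sum(w_i^2) drops below
    ess_ratio * N, in the spirit of Liu [14]; the threshold value is
    our own placeholder.
    """
    n = len(particles)
    n_eff = 1.0 / np.sum(weights ** 2)
    if n_eff < ess_ratio * n:
        idx = np.random.choice(n, size=n, p=weights)
        particles = particles[idx]
        weights = np.full(n, 1.0 / n)  # reset to uniform weights
    return particles, weights
```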

C. Globally Localizing a Person

A common approach to global localization is to uniformly sample poses from the entire state space, which is rather inefficient. An alternative method is to add samples to the particle filter such that the probability of adding a particular sample is proportional to its observation likelihood, similar to Lenser and Veloso [13]. In our application, however, such a technique is likely to make the particle filter diverge, since the robot may observe a different person while the person that is supposed to be tracked is outside the robot's field of view.

To mitigate this problem, we require that a potential pose be consistent with the measurements for some time before we incorporate it into the particle filter. For this purpose, we sample from the distribution

x_t ∼ p(x_t | z_{t−k:t}, u_{t−k:t}),   (6)

taking into account the last k measurement updates and motion predictions. In this way, our algorithm prevents samples that correspond to a different person and that move differently from being added to the particle filter as the person to be tracked. To efficiently sample from (6), we run an additional particle filter at the corresponding locations and only incorporate those samples into the main particle filter whose motion and measurements are consistent.
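One way to realize this delayed admission is sketched below; the data layout, the age counter, and the thresholds are our own illustrative assumptions.

```python
import numpy as np

def admit_candidates(main_p, main_w, cand_p, cand_w, cand_age,
                     k=10, floor=1e-3):
    """Delayed admission of globally sampled poses, sketching Eq. (6).

    cand_p/cand_w are the particles and weights of the auxiliary
    filter started at a high-likelihood location; cand_age counts how
    many predict/update cycles it has survived. k and floor are
    invented placeholders.
    """
    if cand_age < k:
        return main_p, main_w          # candidate not yet trusted

    survivors = cand_p[cand_w > floor]  # keep consistent samples only
    if len(survivors) == 0:
        return main_p, main_w          # hypothesis died out

    merged_p = np.concatenate([main_p, survivors])
    # Admit new samples with a small initial weight, then renormalize.
    merged_w = np.concatenate(
        [main_w, np.full(len(survivors), main_w.min())])
    return merged_p, merged_w / merged_w.sum()
```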

D. Following a Person

Our approach is designed to track the person over longer periods of time and longer distances. Therefore, the robot needs to follow the person in order to obtain relevant observations, which are required to correct the pose estimate obtained from the inertial motion capture suit. To follow the tracked person, our current implementation uses a standard robot navigation approach to guide the robot to the vicinity of the person's current location.

IV. OPTIMIZING THE TRAJECTORIES OF PERSONS BY MEANS OF LEAST SQUARES

When an animated character walks in a movie or in a virtual game according to the recorded trajectory, it is necessary that this trajectory accurately reflects the motion of the person. A tracking approach, however, especially one based on particle filters, can lead to non-smooth tracks with small jumps in the trajectory. In an animation, this would look like the character being teleported between nearby locations, because the track does not reflect the smooth motion of a person that the inertial motion capture suit itself provides.

To bridge this gap between local and global accuracy, we perform a post-processing step that refines the trajectory estimate provided by the particle filter. Our solution bears resemblance to the graph-based SLAM problem, except that three different types of constraints are considered, namely the location of the robot, the person's location observation, and the inertial motion capture suit data. See Fig. 2 for an illustration.

Fig. 2: The structure of the underlying optimization problem illustrated as a graph, similar to the pose graph in the SLAM problem. The nodes in the graph represent the poses r_t of the robot while observing the person and the tracker estimates of the person's locations x_t. The edges model the laser-based observations z_t and the inertial motion capture suit data u_t. The poses r_t of the robot are anchored in the global reference frame according to the MCL estimate of the robot during data acquisition. Uncertainties are modeled by the corresponding information matrices Ω.

As before, let x_t be the person's pose at time t, let z_t be the observation of that person seen from the location r_t of the robot, and let u_t be the motion estimate of the inertial motion capture suit. Let Ω_{z_t} be the information matrix encoding the accuracy of the estimated poses of the particle-based tracker of the person (see Section III), let Ω_{r_t} encode the accuracy of the localization of the robot itself, which is, in contrast to Section III, considered explicitly here, and let Ω_{u_t} be the information matrix encoding the accuracy of the inertial motion capture suit.

The overall goal is to find the most likely configuration x*_{1:n} of the person's trajectory of length n as

x*_{1:n} = argmax_{x_{1:n}} p(x_{1:n}, r_{1:n} | z_{1:n}, u_{1:n}).   (7)

Under the assumption of independent Gaussian noise, the maximum likelihood estimation is equivalent to a least squares error minimization given the error function

F = Σ_{t=1}^{n} ( e_{z_t}^T Ω_{z_t} e_{z_t} + e_{r_t}^T Ω_{r_t} e_{r_t} ) + Σ_{t=2}^{n} e_{u_t}^T Ω_{u_t} e_{u_t},   (8)

where we abbreviate the three summands as f_{z_t}, f_{r_t}, and f_{u_t}, respectively.

The individual error terms e_{r_t} for the robot's own localization, e_{z_t} for the observed locations of the person, and e_{u_t} for the inertial motion capture suit estimate are defined as

e_{z_t} = (r_t ⊕ z_t) ⊖ x_t,   (9)
e_{r_t} = r_t ⊖ µ^{MCL}_t,   (10)
e_{u_t} = (x_{t-1} ⊕ u_t) ⊖ x_t,   (11)

where ⊖ denotes the inverse of the pose composition operator ⊕.
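For concreteness, here is a sketch of the operators ⊕ and ⊖ and the error terms (9)-(11) for 2D poses (x, y, θ); a full implementation would additionally normalize the angular components to [−π, π).

```python
import numpy as np

def pose_compose(a, b):
    """a (+) b: apply the relative pose b in the frame of pose a."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = np.cos(at), np.sin(at)
    return np.array([ax + c * bx - s * by,
                     ay + s * bx + c * by,
                     at + bt])

def pose_diff(a, b):
    """a (-) b: express pose a in the frame of pose b."""
    ax, ay, at = a
    bx, by, bt = b
    dx, dy = ax - bx, ay - by
    c, s = np.cos(bt), np.sin(bt)
    return np.array([c * dx + s * dy,
                     -s * dx + c * dy,
                     at - bt])

# Error terms (9)-(11); mu_mcl_t is the MCL pose estimate of the robot.
def e_z(r_t, z_t, x_t):
    return pose_diff(pose_compose(r_t, z_t), x_t)   # Eq. (9)

def e_r(r_t, mu_mcl_t):
    return pose_diff(r_t, mu_mcl_t)                 # Eq. (10)

def e_u(x_prev, u_t, x_t):
    return pose_diff(pose_compose(x_prev, u_t), x_t)  # Eq. (11)
```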

Minimizing these error functions is not directly possible since they include non-linear terms. Therefore, we compute an approximation of these error functions by applying a Taylor expansion. For simplicity of notation, we introduce e_{k_t} as a generic error function that can be used for k ∈ {z, r, u}. Given this notation, the approximations of the individual error functions can be written as

e_{k_t}(x + Δx) ≈ e_{k_t}(x) + J_{k_t} Δx,   (12)


where x is a stacked vector of the person's poses x_{1:t} and the robot poses r_{1:t}. Furthermore, J_{k_t} is the Jacobian of the corresponding error function given in (9)-(11). Given this approximation of the error function, we can write

f_{k_t}(x + Δx) ≈ e_{k_t}^T Ω_{k_t} e_{k_t} + 2 e_{k_t}^T Ω_{k_t} J_{k_t} Δx + Δx^T J_{k_t}^T Ω_{k_t} J_{k_t} Δx
             = c_{k_t} + 2 b_{k_t}^T Δx + Δx^T H_{k_t} Δx,   (13)

yielding the variables c_{k_t} = e_{k_t}^T Ω_{k_t} e_{k_t}, b_{k_t}^T = e_{k_t}^T Ω_{k_t} J_{k_t}, and H_{k_t} = J_{k_t}^T Ω_{k_t} J_{k_t}. Based on this equation, we can directly obtain an approximation of F that can be minimized via the Gauss-Newton algorithm:

F ≈ Σ_{t=1}^{n} Σ_{k∈{z,r}} ( c_{k_t} + 2 b_{k_t}^T Δx + Δx^T H_{k_t} Δx ) + Σ_{t=2}^{n} ( c_{u_t} + 2 b_{u_t}^T Δx + Δx^T H_{u_t} Δx )   (14)

  = c + 2 b^T Δx + Δx^T H Δx.   (15)

The quadratic form in (15) is obtained from (14) by setting c = Σ c_{k_t}, b = Σ b_{k_t}, and H = Σ H_{k_t}. To minimize (15), we take its derivative with respect to Δx and set it to zero. Thus, the minimum can be found by solving the linear system

H Δx* = −b.   (16)

Note that the matrix H is sparse by construction, which allows us to efficiently solve (16) without computing H^{-1} explicitly by applying a sparse Cholesky factorization. The solution Δx* to (16) is used to update the current estimate of the trajectory of the person and the robot by

x ← x + Δx*.   (17)

The Gauss-Newton algorithm iterates this procedure until convergence and yields a solution that represents the maximum likelihood trajectory of the person given all observations. Our experiments suggest that the resulting smooth trajectory estimate is well suited for rendering a virtual character in a locally and globally consistent way in a virtual scene.
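A compact sketch of this Gauss-Newton loop follows; build_normal_equations is a hypothetical callback that accumulates H and b from all constraints as in (14)-(15), and SciPy's sparse LU stands in for the sparse Cholesky factorization mentioned above (which would require an extra package such as scikit-sparse).

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

def gauss_newton(x0, build_normal_equations, iterations=3, tol=1e-6):
    """Iterate Eqs. (13)-(17) until convergence.

    x0 is the stacked vector of person and robot poses;
    build_normal_equations(x) must return (H, b) accumulated from all
    constraints. Names are our own; the paper reports that 2-3
    iterations typically suffice given the tracker's initial guess.
    """
    x = x0.copy()
    for _ in range(iterations):
        H, b = build_normal_equations(x)
        # Solve H dx = -b (Eq. (16)); H is sparse by construction,
        # so a sparse factorization avoids forming H^-1 explicitly.
        dx = splu(csc_matrix(H)).solve(-b)
        x = x + dx                      # Eq. (17)
        if np.linalg.norm(dx) < tol:    # converged
            break
    return x
```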

V. EXPERIMENTAL EVALUATION

To evaluate the presented approach, we carried out several experiments. In these experiments, we used an Xsens MVN suit, an inertial mobile motion capture suit that consists of 17 IMUs and estimates the full body posture of the person wearing it. We furthermore used an ActivMedia PowerBot mobile robot equipped with a SICK LMS laser range finder.

A. Accuracy of the MVN Inertial Motion Capture Suit

The first experiment evaluates the global accuracy of the Xsens MVN suit without any external system, such as the mobile robot. For this purpose, a subject walked around our campus multiple times in a loop, passing predefined locations. Each loop had a length of 300 m to 350 m. Figure 3 depicts a typical trajectory. The red solid line is the raw estimate of the Xsens MVN suit.

Fig. 3: The accuracy of the Xsens MVN suit in an experiment conducted on our campus. The manually verified trajectory of a person (obtained by inspecting every scan) is plotted as a dashed line. The raw trajectory estimate of the Xsens MVN suit is depicted in red. To measure the global drift, the start and the end locations of the trajectory were identical; the final offset was 5.2 m (see magnification).

The black dashed line is a ground truth trajectory that was recorded by an accurately localized mobile robot. This ground truth trajectory was obtained by manually verifying every single laser scan of the robot and providing the data association. After having walked a loop of 350 m, the subject returned to its starting point, which allowed us to measure an offset of 5.2 m. The run includes a number of rapid motions and small turns, as can be seen from the trajectory.

In addition to that, we measured the tracking error when walking 10 times around the campus. Fig. 4 depicts the corresponding plot. Along this trajectory, we physically labeled 9 locations (1 and 10 are the same). We then evaluated the error for each round individually. The error bars in Fig. 5 show the standard deviation around the mean error. The red and blue points correspond to the estimates of the Xsens MVN suit and our approach, respectively. We also carried out a paired sample t-test taking into account the errors at all checkpoints, which suggests that the results are statistically significant at a 99% confidence level.

B. Following a Person

The next experiment evaluates the capability of our approach to track and follow a person over long distances. For this evaluation, we used our approach to track a person who walked a distance of more than 1 km. In this experiment, the person wearing the Xsens MVN suit was additionally accompanied by three other persons walking close to the tracked person, frequently blocking the field of view of the robot. Furthermore, there were several other people walking independently on our campus. During this experiment, the robot autonomously followed the tracked person. Fig. 6 shows the trajectory of the tracked person estimated by our approach (blue) and estimated by the Xsens MVN suit (red). In all of our experiments, our approach never lost the person it followed, which suggests that it operates robustly.


Fig. 4: A person walked on a 300 m loop ten times. After every round, we calculated the error of the pose estimate of the Xsens MVN suit with respect to a known position, which we had marked on the ground. In contrast to the Xsens MVN suit estimate, which drifts, the error of our tracking system is significantly lower and always stayed below 20 cm during the entire experiment.

Fig. 5: The error of the estimate of the Xsens MVN suit (red) and of our tracker (blue) at predefined checkpoints along a 300 m loop (10 runs).

C. Optimization of the Person’s Trajectory

The particle filter-based tracking approach described in Section III is well suited for online person tracking. In certain situations, however, the estimate of a particle filter is non-smooth, which is suboptimal for our application. When rendering animated characters in virtual games or movies, we do not only require the person's trajectory estimate to be globally consistent but also locally accurate and smooth. In this section, we evaluate our optimization procedure presented in Section IV.

To analyze the local smoothness of the trajectory obtained by our approach, we used the dataset from the previous experiment (see Fig. 6). In this experiment, we selected two relevant situations to analyze the smoothing. In the first situation, there was an error in the leg detection, which caused a local jump in the trajectory estimate of the person, as shown in Fig. 7. In animations, such jumps are undesirable. Fortunately, our smoothing approach is able to compensate for that by exploiting the high local accuracy of the estimate of the inertial motion capture suit. In other words, the determinant of Ω_{u_t} is bigger than that of Ω_{z_t}.

The second situation illustrates that the optimization preserves local movements of the person, as seen in Fig. 8. The MVN suit estimate (red) and the estimate of our approach (blue) exhibit the same local behavior (apart from the global displacement), which suggests that local motions are appropriately conserved by the least squares optimization.

Fig. 6: Trajectory estimated while multiple people were walking close to the person to be tracked, causing substantial occlusions. The black rectangle at the bottom corresponds to the magnified view shown in Fig. 8.

Fig. 7: Comparison of three trajectory estimates of a person walking roughly on a straight line. The image on the left shows a part of the trajectory in the map, whereas the plots on the right show the trajectories obtained with the different estimation approaches. Top: particle filter-based tracker. Middle: Xsens MVN suit. Bottom: our approach using smoothing. The estimates of our approach preserve the local accuracy of the inertial motion capture suit estimate but are located correctly in the global reference frame.

Note that our smoothing approach can be implemented very efficiently. For instance, the dataset shown in Fig. 6, which comprises 5343 nodes and 7158 constraints, was optimized by our approach in around 45 ms per iteration. Given the good initial guess provided by the tracker, typically 2-3 iterations are sufficient for convergence.

D. Comparison to Other Motion Models

This final experiment illustrates that a standard motion model, such as a constant velocity motion model or a Brownian motion model, is not sufficient for robust person tracking in crowded scenes. This motivates the usage of the inertial motion capture suit (or something similar).

In this experiment, the robot aims to track one specific person in a walking group of three people. We initialized the tracker with the correct person and evaluated the tracking performance of three different motion models, i.e., the inertial motion capture suit motion model, a constant velocity motion model, and a Brownian motion model. As can be seen in Figure 9, the Brownian motion model as well as the constant velocity motion model fail to keep track of the person.


Fig. 8: The smoothed estimate (blue) preserves local movements in the trajectory (red), such as small loops (no differences in local shape).

Fig. 9: Performance of the tracker using alternative motion models (Brownian motion and constant velocity) in the presence of a second person in the scene. As can be seen, the Brownian noise motion model and the constant velocity motion model fail to accurately track the person. In contrast to that, the error of the inertial motion capture suit motion model stays close to zero.

Both filters failed when other people came too close to the person being tracked. Once the filter diverged to a wrong person, it had no means of realizing its error. In contrast, the motion model using the inertial motion capture suit data is able to correctly track the right person.

VI. CONCLUSION

In this paper, we addressed the problem of accurately and consistently tracking the full body posture of a person in large-scale environments over longer periods of time. Our approach relies on the locally accurate full body posture estimates of an inertial motion capture suit and tracks the position of the person in the environment using an autonomously navigating robot equipped with a laser range finder. To provide a smooth estimate of the person's trajectory, we formulate a joint least squares optimization problem that computes a maximum likelihood trajectory estimate given all measurements. We implemented our approach and exhaustively tested it in real-world scenarios. As shown in our experiments, our algorithm yields smooth, consistent, and globally aligned estimates of the full body posture of the person.

Our method can be applied in areas that are too large for tracking systems based on static cameras and in which IMU-based systems suffer from substantial drift. It thus overcomes the key limitations of today's commercially available motion capture systems.

Note that our approach is not restricted to a single robot. Multiple robots or multiple static laser scanners can be used with only marginal changes. In such a case, every robot introduces nodes for its own poses (r_t). Static sensors are a special case of mobile robots: instead of multiple nodes representing the different robots' locations, a single node (r) is sufficient for each sensor. As a result, the optimization approach automatically calibrates the sensors in the global coordinate frame.

REFERENCES

[1] K. Arras, S. Grzonka, M. Luber, and W. Burgard. Efficient people tracking in laser range data using a multi-hypothesis leg-tracker with adaptive occlusion probabilities. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2008.

[2] M. Bennewitz. Mobile Robot Navigation in Dynamic Environments. PhD thesis, University of Freiburg, Dept. of Computer Science, 2004.

[3] M. Bennewitz, F. Faber, D. Joho, and S. Behnke. Fritz – A humanoid communication robot. In Proc. of the IEEE Int. Symp. on Robot and Human Interactive Communication (RO-MAN), 2007.

[4] Z. Byers, M. Dixon, K. Goodier, C.M. Grimm, and W.D. Smart. An autonomous robot photographer. In Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pages 2636–2641, 2003.

[5] J. Cui, H. Zha, H. Zhao, and R. Shibasaki. Tracking multiple people using laser and vision. In Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pages 2116–2121, 2005.

[6] A. Ess, K. Schindler, B. Leibe, and L. Van Gool. Moving obstacle detection in highly dynamic scenes. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2009.

[7] A. Fod, A. Howard, and M.J. Mataric. Laser-based people tracking. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), pages 3024–3029, 2002.

[8] L.M. Fuentes and S.A. Velastin. People tracking in surveillance applications. Image and Vision Computing, pages 1165–1171, 2006.

[9] S. Grzonka, F. Dijoux, A. Karwath, and W. Burgard. Mapping indoor environments based on human activity. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), pages 476–481, 2010.

[10] M. Kleinehagenbrock, S. Lang, J. Fritsch, F. Lomker, G.A. Fink, and G. Sagerer. Person tracking with a mobile robot based on multi-modal anchoring. In Proc. of the IEEE Int. Symp. on Robot and Human Interactive Communication (RO-MAN), pages 423–429, 2002.

[11] B. Kluge, C. Kohler, and E. Prassler. Fast and robust tracking of multiple moving objects with a laser range finder. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2001.

[12] M. Kobilarov, G. Sukhatme, J. Hyams, and P. Batavia. People tracking and following with mobile robot using an omnidirectional camera and a laser. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), pages 557–562, 2006.

[13] S. Lenser and M. Veloso. Sensor resetting localization for poorly modelled mobile robots. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), pages 1225–1232, 2000.

[14] J.S. Liu. Metropolized independent sampling with comparisons to rejection sampling and importance sampling. Statist. Comput., 6:113–119, 1996.

[15] M. Montemerlo, S. Thrun, and W. Whittaker. Conditional particle filters for simultaneous mobile robot localization and people-tracking. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), pages 1259–1264, 2002.

[16] L. Ojeda and J. Borenstein. Non-GPS navigation for emergency responders. Sharing Solutions for Emergencies and Hazardous Environments, 2006.

[17] D. Schulz, D. Fox, and A.B. Cremers. Tracking multiple moving targets with a mobile robot using particle filters and statistical data association. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2001.

[18] Vicon Motion Systems. http://www.vicon.com.

[19] T. Wilhelm, H.J. Bohme, and H.M. Gross. Sensor fusion for vision and sonar based people tracking on a mobile service robot. In Proc. of the Int. Workshop on Dynamic Perception, pages 315–320, 2002.

[20] Xsens. MVN – Inertial Motion Capture. http://www.xsens.com.

[21] Z. Zivkovic and B. Krose. Part based people detection using 2D range data and images. In Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pages 214–219, 2007.