Multiple Targets Tracking in World Coordinate with a

Multiple Targets Tracking in World Coordinate with a Single, Minimally Calibrated Camera.Wongun Choi Silvio Savarese

Department of Electrical Engineering and Computer Science, University of MichiganExperimental Results

Conclusion

References

• Joint estimation of camera and targets’ state helps improve tracking performance.• Other geometric cues further stabilize the system.• Better target association can be achieved by modeling interaction between targets.

Challenges• Unknow (uncalibrated) camera motion.

• Background clutter.

• Detector failure (missed detection).

• Occlusion between targets.

Estimation : MCMC Particle Filtering• Motivation

- Non-linear and non-gaussian density function.

• Proposal Distribution Q(Ωt; Ωt)

- Choose one of the variables : camera parameter, target states, or ground features.- Sample from Gaussian distribution centered on current sample.

- Randomly change the class of a target or a ground feature.

- Direct sampling from high dimensional space is not ecient.

• Improved detection accuracy.

0 0.5 1 1.5 2 2.5 30.6

0.65

0.7

0.75

0.8

0.85

Recall/FPPI

FPPI

Reca

ll

Full ModelNo GeometryNo InteractionBaseline [9]

• Example tracking results on ETH sequence.

12 34 51314 1516 171819 20

117 121

122

121

161 162

165

167

121162

166

−20 −15 −10 −5 0 5 10 150

5

10

15

20

25 Frame 6

−30 −25 −20 −15 −10 −5 0 50

5

10

15

20

25

30

35

40Frame 96

−50 −45 −40 −35 −30 −25 −20 −1510

15

20

25

30

35

40

45Frame 600

−80 −75 −70 −65 −60 −55 −50 −45 −4015

20

25

30

35

40

45

50Frame 995

−80 −75 −70 −65 −60 −55 −50 −45 −4015

20

25

30

35

40

45

50 Frame 1005

Frame 176

42

4344

45

Frame 1064

121

177178

179

• Accurate localization of camera

• Eect of interaction model.

4 25 31 333435

Frame 533

24

24

2728 30

Frame 479

424

28 30 3233

Frame 533

2 425

2829 3133

Frame 479

−15 −10 −5 0 5 10 150

5

10

15

20

25Frame 479

Group Interaction

−15 −10 −5 0 5 10 150

5

10

15

20

25Frame 533

Group Interaction

−15 −10 −5 0 5 10 150

5

10

15

20

25Frame 479

−15 −10 −5 0 5 10 150

5

10

15

20

25Frame 533

• Eect of geometric features.

1 2

34

Frame 89

1 2

34

Frame 223

1 2

34

Frame 89

1 2

68

Frame 223

−15 −10 −5 0 5 10 150

5

10

15

20

25 Frame 89

−15 −10 −5 0 5 10 150

5

10

15

20

25 Frame 223

−15 −10 −5 0 5 10 150

5

10

15

20

25Frame 89

−15 −10 −5 0 5 10 150

5

10

15

20

25Frame 223

Interaction Model

Zk(t-1) Zk(t) Zk(t+1)

Zj(t-1) Zj(t) Zj(t+1)

Zi(t-1) Zi(t) Zi(t+1) • Pairwise interaction Model

• Two exclusive mode of interactions- Repulsion: people want to keep a distance from others.

- Group Interaction: People moving as a group tend to move together.

- Targets’ motions are dependent.

P (Zt|Zt−1) =i<j

ψ(Zit, Zjt; βijt)i<j

P (βijt|βij(t−1))Ni=1

P (Zit|Zi(t−1))Repulsion

Group Interaction

ψ(Zit, Zjt; βijt) =

ψg(Zit, Zjt), if βijt = 1ψr(Zit, Zjt), otherwise

- Switch variables to select one of the above models.

[1] Ess, A., Leibe, B., Schindler, K., , van Gool, L.: A mobile vision system for robust multi-person tracking. In: CVPR. (2008)

[2] Khan, Z., Balch, T., Dellaert, F.: Mcmc-based particle tering for tracking a variable number of interacting targets. PAMI (2005)

[3] Pellegrini, S., Ess, A., Schindler, K., van Gool, L.: You'll never walk alone: Modeling social behavior for multi-target tracking. In: ICCV. (2009)

[4] Hoiem, D., Efros, A., Hebert, M.: Putting objects in perspective. In: CVPR. (2006)

Joint Model

X2

τ1

X3

X1

τ3

τ2

ϴZ1

G2

Z3Z2

G3

G1

Xt-1 Xt

θt-1 θt

Zt-1 Zt

Xt+1

θt+1

Zt+1

τt-1 τt

Gt-1 Gt

τt+1

Gt+1

Graphical Representation World Illustration

• Sequential bayesian formulationP (Ωt|χt) ∝ P (Ωt, χt|χt−1) = P (χt|Ωt)

P (Ωt|Ωt−1)P (Ωt−1|χt−1)dΩt−1

P (χt|Ωt) = P (Xt, Yt|Zt,Θt)P (τt|Gt,Θt)P (Ωt|Ωt−1) = P (Zt|Zt−1)P (Θt|Θt−1)P (Gt|Gt−1)

• Motion models

• Observation models- Simplied camera projection function [4].

- Target’s individual motion modeled as a rst order linear dynamic model.

- Ground features assumed to be static.

- Camera assumed to move along the viewing direction.

• Indicator variable- Target : indicate whether this target is valid human or not.

- Ground features : indicate whether the feature is on static ground plane.

Overview• Objective : robustly track multiple targets in 3D.

• Modeling interaction between targets.

• MCMC particle ltering for ecient estimation.

• Ground feature points for robust camera estimation.

• Joint estimation of camera and target’s motion.

−36 −34 −32 −30 −28 −26 −24 −22 −20 −18 −1618

20

22

24

26

28

30

32

34Frame 657

3D Trajectory

Ground Plane Feature

Frame 657

114

121122123124

Projected Target

2D TrajectoryKLT Feature

Recall/FPPI on ETH dataset

ETH [1]Recall

FPPI

0.498 0.404 0.338

0.781 0.431 0.262

0.673 0.616 0.484

2.772 1.593 0.638

Our AlgorithmRecall

FPPI

0.556 0.541 0.519

0.792 0.442 0.267

0.339 0.421 0.497

2.792 1.608 0.647

Method Seq.#2 Seq.#3

Randomly Select one

Human? Non-human?How fast?Where? ...New random sample X’

Accept/Reject(Metropolis Hasting)

Time t

Time t+1

Documents

Multiple Targets Tracking in World Coordinate with a