1
Multiple Targets Tracking in World Coordinate with a Single, Minimally Calibrated Camera. Wongun Choi Silvio Savarese Department of Electrical Engineering and Computer Science, University of Michigan Experimental Results Conclusion References Joint estimation of camera and targets’ state helps improve tracking performance. Other geometric cues further stabilize the system. Better target association can be achieved by modeling interaction between targets. Challenges Unknow (uncalibrated) camera motion. Background clutter. Detector failure (missed detection). Occlusion between targets. Estimation : MCMC Particle Filtering Motivation - Non-linear and non-gaussian density function. Proposal Distribution Q(Ω t ;Ω t ) - Choose one of the variables : camera parameter, target states, or ground features. - Sample from Gaussian distribution centered on current sample. - Randomly change the class of a target or a ground feature. - Direct sampling from high dimensional space is not efficient. Improved detection accuracy. 0 0.5 1 1.5 2 2.5 3 0.6 0.65 0.7 0.75 0.8 0.85 Recall/FPPI FPPI Recall Full Model No Geometry No Interaction Baseline [9] Example tracking results on ETH sequence. 1 2 3 4 5 13 14 15 16 17 18 19 20 117 121 122 121 161 162 165 167 121 162 166 -20 -15 -10 -5 0 5 10 10 5 10 15 20 25 Frame 6 -30 -25 -20 -15 -10 -5 0 0 5 10 15 20 25 30 35 40 Frame 96 -50 -45 -40 -35 -30 -25 -20 -110 15 20 25 30 35 40 45 Frame 600 -80 -75 -70 -65 -60 -55 -50 -45 -415 20 25 30 35 40 45 50 Frame 995 -80 -75 -70 -65 -60 -55 -50 -45 -40 15 20 25 30 35 40 45 50 Frame 1005 Frame 176 42 43 44 45 Frame 1064 121 177 178 179 Accurate localization of camera Effect of interaction model. 4 25 31 33 34 35 Frame 533 2 4 24 27 28 30 Frame 479 4 24 28 30 32 33 Frame 533 2 4 25 28 29 31 33 Frame 479 −15 −10 −5 0 5 10 15 0 5 10 15 20 25 Frame 479 Group Interaction −15 −10 −5 0 5 10 15 0 5 10 15 20 25 Frame 533 Group Interaction −15 −10 −5 0 5 10 15 0 5 10 15 20 25 Frame 479 −15 −10 −5 0 5 10 15 0 5 10 15 20 25 Frame 533 Effect of geometric features. 1 2 3 4 Frame 89 1 2 3 4 Frame 223 1 2 3 4 Frame 89 1 2 6 8 Frame 223 −15 −10 −5 0 5 10 15 0 5 10 15 20 25 Frame 89 −15 −10 −5 0 5 10 15 0 5 10 15 20 25 Frame 223 −15 −10 −5 0 5 10 15 0 5 10 15 20 25 Frame 89 −15 −10 −5 0 5 10 15 0 5 10 15 20 25 Frame 223 Interaction Model Z k(t-1) Z k(t) Z k(t+1) Z j(t-1) Z j(t) Z j(t+1) Z i(t-1) Z i(t) Z i(t+1) Pairwise interaction Model Two exclusive mode of interactions - Repulsion: people want to keep a distance from others. - Group Interaction: People moving as a group tend to move together. - Targets’ motions are dependent. P (Z t |Z t1 )= i<j ψ (Z it ,Z jt ; β ijt ) i<j P (β ijt |β ij (t1) ) N i=1 P (Z it |Z i(t1) ) Repulsion Group Interaction ψ (Z it ,Z jt ; β ijt )= ψ g (Z it ,Z jt ), if β ijt =1 ψ r (Z it ,Z jt ), otherwise - Switch variables to select one of the above models. [1] Ess, A., Leibe, B., Schindler, K., , van Gool, L.: A mobile vision system for robust multi-person tracking. In: CVPR. (2008) [2] Khan, Z., Balch, T., Dellaert, F.: Mcmc-based particle fltering for tracking a variable number of interacting targets. PAMI (2005) [3] Pellegrini, S., Ess, A., Schindler, K., van Gool, L.: You'll never walk alone: Modeling social behavior for multi-target tracking. In: ICCV. (2009) [4] Hoiem, D., Efros, A., Hebert, M.: Putting objects in perspective. In: CVPR. (2006) Joint Model X 2 τ 1 X 3 X 1 τ 3 τ 2 ϴ Z 1 G 2 Z 3 Z 2 G 3 G 1 X t-1 X t θ t-1 θ t Z t-1 Z t X t+1 θ t+1 Z t+1 τ t-1 τ t G t-1 G t τ t+1 G t+1 Graphical Representation World Illustration Sequential bayesian formulation P (Ω t |χ t ) P (Ω t t |χ t1 )= P (χ t |t ) P (Ω t |t1 )P (Ω t1 |χ t1 )dt1 P (χ t |t )= P (X t ,Y t |Z t , Θ t )P (τ t |G t , Θ t ) P (Ω t |t1 )= P (Z t |Z t1 )P t |Θ t1 )P (G t |G t1 ) Motion models Observation models - Simplified camera projection function [4]. - Target’s individual motion modeled as a first order linear dynamic model. - Ground features assumed to be static. - Camera assumed to move along the viewing direction. Indicator variable - Target : indicate whether this target is valid human or not. - Ground features : indicate whether the feature is on static ground plane. Overview Objective : robustly track multiple targets in 3D. Modeling interaction between targets. MCMC particle filtering for efficient estimation. Ground feature points for robust camera estimation. Joint estimation of camera and target’s motion. -36 -34 -32 -30 -28 -26 -24 -22 -20 -18 -16 18 20 22 24 26 28 30 32 34 Frame 657 3D Trajectory Ground Plane Feature Frame 657 114 121 122 123 124 Projected Target 2D Trajectory KLT Feature Recall/FPPI on ETH dataset ETH [1] Recall FPPI 0.498 0.404 0.338 0.781 0.431 0.262 0.673 0.616 0.484 2.772 1.593 0.638 Our Algorithm Recall FPPI 0.556 0.541 0.519 0.792 0.442 0.267 0.339 0.421 0.497 2.792 1.608 0.647 Method Seq.#2 Seq.#3 Randomly Select one Human? Non-human? How fast? Where? ... New random sample X’ Accept/Reject (Metropolis Hasting) Time t Time t+1

Multiple Targets Tracking in World Coordinate with a

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Multiple Targets Tracking in World Coordinate with a Single, Minimally Calibrated Camera.Wongun Choi Silvio Savarese

Department of Electrical Engineering and Computer Science, University of MichiganExperimental Results

Conclusion

References

• Joint estimation of camera and targets’ state helps improve tracking performance.• Other geometric cues further stabilize the system.• Better target association can be achieved by modeling interaction between targets.

Challenges• Unknow (uncalibrated) camera motion.

• Background clutter.

• Detector failure (missed detection).

• Occlusion between targets.

Estimation : MCMC Particle Filtering• Motivation

- Non-linear and non-gaussian density function.

• Proposal Distribution Q(Ωt; Ωt)

- Choose one of the variables : camera parameter, target states, or ground features.- Sample from Gaussian distribution centered on current sample.

- Randomly change the class of a target or a ground feature.

- Direct sampling from high dimensional space is not ecient.

• Improved detection accuracy.

0 0.5 1 1.5 2 2.5 30.6

0.65

0.7

0.75

0.8

0.85

Recall/FPPI

FPPI

Reca

ll

Full ModelNo GeometryNo InteractionBaseline [9]

• Example tracking results on ETH sequence.

12 34 51314 1516 171819 20

117 121

122

121

161 162

165

167

121162

166

−20 −15 −10 −5 0 5 10 150

5

10

15

20

25 Frame 6

−30 −25 −20 −15 −10 −5 0 50

5

10

15

20

25

30

35

40Frame 96

−50 −45 −40 −35 −30 −25 −20 −1510

15

20

25

30

35

40

45Frame 600

−80 −75 −70 −65 −60 −55 −50 −45 −4015

20

25

30

35

40

45

50Frame 995

−80 −75 −70 −65 −60 −55 −50 −45 −4015

20

25

30

35

40

45

50 Frame 1005

Frame 176

42

4344

45

Frame 1064

121

177178

179

• Accurate localization of camera

• Eect of interaction model.

4 25 31 333435

Frame 533

24

24

2728 30

Frame 479

424

28 30 3233

Frame 533

2 425

2829 3133

Frame 479

−15 −10 −5 0 5 10 150

5

10

15

20

25Frame 479

Group Interaction

−15 −10 −5 0 5 10 150

5

10

15

20

25Frame 533

Group Interaction

−15 −10 −5 0 5 10 150

5

10

15

20

25Frame 479

−15 −10 −5 0 5 10 150

5

10

15

20

25Frame 533

• Eect of geometric features.

1 2

34

Frame 89

1 2

34

Frame 223

1 2

34

Frame 89

1 2

68

Frame 223

−15 −10 −5 0 5 10 150

5

10

15

20

25 Frame 89

−15 −10 −5 0 5 10 150

5

10

15

20

25 Frame 223

−15 −10 −5 0 5 10 150

5

10

15

20

25Frame 89

−15 −10 −5 0 5 10 150

5

10

15

20

25Frame 223

Interaction Model

Zk(t-1) Zk(t) Zk(t+1)

Zj(t-1) Zj(t) Zj(t+1)

Zi(t-1) Zi(t) Zi(t+1) • Pairwise interaction Model

• Two exclusive mode of interactions- Repulsion: people want to keep a distance from others.

- Group Interaction: People moving as a group tend to move together.

- Targets’ motions are dependent.

P (Zt|Zt−1) =i<j

ψ(Zit, Zjt; βijt)i<j

P (βijt|βij(t−1))Ni=1

P (Zit|Zi(t−1))Repulsion

Group Interaction

ψ(Zit, Zjt; βijt) =

ψg(Zit, Zjt), if βijt = 1ψr(Zit, Zjt), otherwise

- Switch variables to select one of the above models.

[1] Ess, A., Leibe, B., Schindler, K., , van Gool, L.: A mobile vision system for robust multi-person tracking. In: CVPR. (2008)

[2] Khan, Z., Balch, T., Dellaert, F.: Mcmc-based particle tering for tracking a variable number of interacting targets. PAMI (2005)

[3] Pellegrini, S., Ess, A., Schindler, K., van Gool, L.: You'll never walk alone: Modeling social behavior for multi-target tracking. In: ICCV. (2009)

[4] Hoiem, D., Efros, A., Hebert, M.: Putting objects in perspective. In: CVPR. (2006)

Joint Model

X2

τ1

X3

X1

τ3

τ2

ϴZ1

G2

Z3Z2

G3

G1

Xt-1 Xt

θt-1 θt

Zt-1 Zt

Xt+1

θt+1

Zt+1

τt-1 τt

Gt-1 Gt

τt+1

Gt+1

Graphical Representation World Illustration

• Sequential bayesian formulationP (Ωt|χt) ∝ P (Ωt, χt|χt−1) = P (χt|Ωt)

P (Ωt|Ωt−1)P (Ωt−1|χt−1)dΩt−1

P (χt|Ωt) = P (Xt, Yt|Zt,Θt)P (τt|Gt,Θt)P (Ωt|Ωt−1) = P (Zt|Zt−1)P (Θt|Θt−1)P (Gt|Gt−1)

• Motion models

• Observation models- Simplied camera projection function [4].

- Target’s individual motion modeled as a rst order linear dynamic model.

- Ground features assumed to be static.

- Camera assumed to move along the viewing direction.

• Indicator variable- Target : indicate whether this target is valid human or not.

- Ground features : indicate whether the feature is on static ground plane.

Overview• Objective : robustly track multiple targets in 3D.

• Modeling interaction between targets.

• MCMC particle ltering for ecient estimation.

• Ground feature points for robust camera estimation.

• Joint estimation of camera and target’s motion.

−36 −34 −32 −30 −28 −26 −24 −22 −20 −18 −1618

20

22

24

26

28

30

32

34Frame 657

3D Trajectory

Ground Plane Feature

Frame 657

114

121122123124

Projected Target

2D TrajectoryKLT Feature

Recall/FPPI on ETH dataset

ETH [1]Recall

FPPI

0.498 0.404 0.338

0.781 0.431 0.262

0.673 0.616 0.484

2.772 1.593 0.638

Our AlgorithmRecall

FPPI

0.556 0.541 0.519

0.792 0.442 0.267

0.339 0.421 0.497

2.792 1.608 0.647

Method Seq.#2 Seq.#3

Randomly Select one

Human? Non-human?How fast?Where? ...New random sample X’

Accept/Reject(Metropolis Hasting)

Time t

Time t+1