Real-time Tracking of Multiple People Using Stereo

David Beymer Bob Bolles

Kurt Konolige Chris Eveland

Artificial Intelligence Center SRI International

Problem: people tracking for surveillance

• return coarse 3D locations of people

• real-time on standard hardware

• multiple people in scene

• stationary camera

Approach

• consider: template-based tracking

– maintain template of object, $T(x, y)$

– correlation used to update object position, $(x_p, y_p)$

– template is recursively updated to handle changing object appearance:

$T(x, y) \leftarrow \alpha\, T(x, y) + (1 - \alpha)\, I(x + x_p,\; y + y_p)$

• limitations/problems

1) object initialization/detection

2) template drift
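The template-based tracking loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it uses a sum of squared differences as a stand-in for the correlation measure, and the names `locate`, `update_template`, the search `radius`, and the blending weight `alpha` are assumptions made for the example.

```python
import numpy as np

def locate(image, template, prev_xy, radius=4):
    """Search around prev_xy for the offset (x_p, y_p) with the lowest
    sum of squared differences (used here in place of correlation)."""
    h, w = template.shape
    best, best_xy = np.inf, prev_xy
    for yp in range(prev_xy[1] - radius, prev_xy[1] + radius + 1):
        for xp in range(prev_xy[0] - radius, prev_xy[0] + radius + 1):
            if yp < 0 or xp < 0 or yp + h > image.shape[0] or xp + w > image.shape[1]:
                continue  # candidate window falls outside the image
            patch = image[yp:yp + h, xp:xp + w]
            score = np.sum((patch - template) ** 2)
            if score < best:
                best, best_xy = score, (xp, yp)
    return best_xy

def update_template(template, image, xy, alpha=0.9):
    """Recursive update: T <- alpha * T + (1 - alpha) * I(x + x_p, y + y_p)."""
    xp, yp = xy
    h, w = template.shape
    return alpha * template + (1.0 - alpha) * image[yp:yp + h, xp:xp + w]

# Toy usage: the whole scene shifts 2 px down and 3 px right between frames.
rng = np.random.default_rng(0)
frame0 = rng.random((40, 60))
template = frame0[10:18, 20:26].copy()
frame1 = np.roll(frame0, shift=(2, 3), axis=(0, 1))
xy = locate(frame1, template, prev_xy=(20, 10))
template = update_template(template, frame1, xy)
```

Because each update blends in whatever lies under the located window, small localization errors accumulate in the template; this is the drift problem listed under limitations.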

Goal: add modality of stereo

• segmentation: background subtraction on stereo disparities to detect foreground

• detection: person templates encoding head and torso shape

• tracking:

– person templates used to avoid drift

– stereo segmentation used to add “support” template

[Figure panels: left image, background, disparities, foreground]

Approach

• detection

– segment foreground into depth layers

– correlate with person templates

• tracking

– intensity and "support" templates are recursively updated

– Kalman filtering on person location in 3D

– person templates used to avoid drift

[Block diagram; stages: stereo, background init, background subtraction, foreground, detection, tracking; inputs: left intensity, person templates]

Related Work

• Companies

– Teleos Research/Autodesk, People Tracker

– DEC/Compaq, Smart Kiosk [Rehg, et al., 1997]

– Interval, Morphin' Mirror [Darrell, et al., 1998]

– Sarnoff [IUW, 1998]

– Texas Instruments [Flinchbaugh, 1998]

– Electric Planet

• Universities

– MIT, Pfinder [Wren, et al., 1997]

– Toronto, [Fieguth and Terzopoulos, 1997]

– Maryland, W4S [Haritaoglu, et al., 1998]

– MIT, Forest of Sensors [Grimson, et al., 1998]

– CMU [Kanade, et al., 1998]

– Columbia/Lehigh [Nayar and Boult, 1998]

– Boston Univ., [Rosales and Sclaroff, 1998]

Stereo module: SRI's Small Vision System (SVS)

• Hardware

– two CMOS cameras

– low power (150mW), inexpensive ($100 components)

– adjustable baseline: 2.7'' to 6.2'' in 1'' increments

– another version with DSP processing onboard

• Software

– stereo algorithm is area correlation based

– optimized C and MMX code

– 20 Hz on 320x240 image, 24 disparities, 400 MHz Pentium II

SVS Stereo Results

[Figure panels: left image, right image, disparities]

Background subtraction

notation:

$d(x, y)$: current disparities

$d_0(x, y)$: background estimate

• look for disparities closer than background

$$f(x, y) = \begin{cases} d(x, y) & \text{if } d(x, y) - d_0(x, y) > \text{thresh}, \text{ or } d(x, y) \text{ defined and } d_0(x, y) \text{ undefined} \\ 0 & \text{otherwise} \end{cases}$$

[Figure panels: left image, background $d_0(x, y)$, disparities $d(x, y)$, foreground $f(x, y)$]

• using stereo disparities versus intensities

+ less sensitive to lighting changes, shadows

+ can segment people at different depths

– more computationally expensive

– tends to blur & expand object boundaries
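The foreground rule on this slide can be written as a short array operation. The sketch below is illustrative only: representing undefined disparities as NaN, the function name `foreground`, and the default threshold are assumptions, not details taken from the slides.

```python
import numpy as np

def foreground(d, d0, thresh=2.0):
    """Keep d where it is closer than the background by more than thresh,
    or where d is defined but the background is not; 0 elsewhere.
    Undefined disparities are encoded as NaN (an assumption of this sketch)."""
    d_def = ~np.isnan(d)
    d0_def = ~np.isnan(d0)
    both = d_def & d0_def
    closer = np.zeros(d.shape, dtype=bool)
    closer[both] = (d[both] - d0[both]) > thresh   # larger disparity = closer
    mask = closer | (d_def & ~d0_def)
    return np.where(mask, d, 0.0)

# Toy usage on a 2x3 disparity map.
d0 = np.array([[5.0, 5.0, np.nan],
               [5.0, np.nan, 5.0]])
d = np.array([[5.5, 12.0, 9.0],
              [np.nan, np.nan, 3.0]])
f = foreground(d, d0)
```

In the toy input, only the pixel that is 7 disparity levels in front of the background and the pixel with no background estimate survive; everything else maps to 0.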

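Handling scale

• idea: range info from stereo can be used to fix the scale of processing, avoiding a search over the scale parameter

– person width is proportional to disparity

– from similar triangles: $\dfrac{w'}{f} = \dfrac{w}{z}$, so $w' z = f w = \text{const}$

– stereo equation: $d = \dfrac{b f}{z}$

– combining the two: $w' = K d$

d: disparity

b: baseline

K: constant

[Figure: pinhole geometry showing the image plane, center of projection (COP), focal length f, depth z, person width w, and image width w']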
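A small numeric check makes the scale relation concrete. Eliminating z from the similar-triangles and stereo equations gives K = w / b. The baseline, focal length, and person width below are made-up illustrative values, and the function names are hypothetical.

```python
def disparity_px(z, baseline, focal_px):
    """Stereo equation: d = b * f / z."""
    return baseline * focal_px / z

def image_width_px(d, person_width, baseline):
    """Scale relation: w' = K * d with K = w / b."""
    return (person_width / baseline) * d

# Assumed values for illustration only.
b = 0.1      # baseline in metres
f = 400.0    # focal length in pixels
w = 0.5      # person width in metres

for z in (2.0, 4.0):
    d = disparity_px(z, b, f)
    w_img = image_width_px(d, w, b)
    # The same width follows directly from similar triangles: f * w / z.
    print(f"z={z} m  d={d:.1f} px  w'={w_img:.1f} px  check={f * w / z:.1f} px")
```

Doubling the depth halves both the disparity and the image width, which is why a measured disparity fixes the template scale without a separate search.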

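Detection

[Flowchart: foreground f(x,y) → histogram of disparities (count vs. disparity) → another peak? (no: exit; yes: threshold disparities to obtain layer(x,y)) → correlate with person template → found person? (yes: remove person from layer(x,y) and correlate again; no: look for another peak)]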
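The detection flowchart can be approximated by the loop below. It is a rough sketch under several assumptions: the person template is a fixed-size binary mask (the disparity-dependent scaling is omitted), the match score is the fraction of template pixels covered by the layer, and `detect_people` with its thresholds is a hypothetical name and parameter set, not the authors' implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def detect_people(f, template, min_count=20, min_score=0.7, half_width=1.0):
    """Histogram peaks -> depth layer -> template match -> remove person."""
    f = f.astype(float).copy()
    th, tw = template.shape
    detections = []
    while True:
        vals = np.rint(f[f > 0]).astype(int)
        hist = np.bincount(vals) if vals.size else np.zeros(1, dtype=int)
        peak = int(hist.argmax())
        if hist[peak] < min_count:          # no further peak: exit
            break
        # Threshold disparities around the peak to get this depth layer.
        in_layer = (f > 0) & (np.abs(f - peak) <= half_width)
        layer = in_layer.copy()
        while True:
            windows = sliding_window_view(layer.astype(float), (th, tw))
            scores = (windows * template).sum(axis=(2, 3)) / template.sum()
            y, x = np.unravel_index(scores.argmax(), scores.shape)
            if scores[y, x] < min_score:    # no person found in this layer
                break
            detections.append((int(x), int(y), peak))
            layer[y:y + th, x:x + tw] = False   # remove person from the layer
        f[in_layer] = 0.0                   # layer processed; look for next peak
    return detections

# Toy usage: a head-and-torso mask and two people at different disparities.
template = np.zeros((12, 6))
template[0:4, 2:4] = 1.0    # head
template[4:12, 0:6] = 1.0   # torso
scene = np.zeros((40, 40))
for (x, y, disp) in [(5, 10, 8.0), (25, 20, 15.0)]:
    region = scene[y:y + 12, x:x + 6]
    scene[y:y + 12, x:x + 6] = np.where(template > 0, disp, region)
detections = detect_people(scene, template)
```

Each detection is reported as (x, y, disparity) of the template's top-left corner; in the toy scene the two people are recovered at the positions where they were drawn.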

Detection example

• during detection, extract intensity and “support” template from layer(x,y)

Tracking -- coordinate space

[Figure: top view of the stereo head (left and right cameras) with axes X and Z]

• image coordinates (x, disparity) map to 3D coordinates (X, Z)
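The mapping from image coordinates to the top-view map can be sketched with a standard pinhole model. The principal-point column `cx`, the function name, and the numeric values are assumptions introduced for the example.

```python
def image_to_map(x_px, disparity, baseline, focal_px, cx):
    """Map (x, disparity) in the image to (X, Z) in the top view.
    Z comes from the stereo equation; X from back-projecting the column."""
    Z = baseline * focal_px / disparity
    X = (x_px - cx) * Z / focal_px
    return X, Z

# Illustrative values: 0.1 m baseline, 400 px focal length, 320-px-wide image.
X, Z = image_to_map(x_px=200, disparity=20.0, baseline=0.1, focal_px=400.0, cx=160.0)
```

With these numbers the person is about 2 m from the cameras and about 0.2 m to the right of the optical axis.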

Tracking Steps

• prediction

– predict Kalman filter (X, Z)

– predict person disparity

• segmentation

– select foreground layer around predicted disparity

• localization

– correlate gray level template against left image, weighted by support template [coarse localization]

– correlate head/torso shape template against segmented foreground layer [re-centering step that addresses template drift]

• update

– Kalman filter

– recursive update of intensity and support templates
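The prediction and update steps can be illustrated with a textbook constant-velocity Kalman filter on (X, Z). The slides do not give the state vector or noise settings, so the four-element state, the values of `dt`, `q`, and `r`, and the class name `PersonKalman` are all assumptions of this sketch.

```python
import numpy as np

class PersonKalman:
    """Constant-velocity Kalman filter over the state [X, Z, Vx, Vz]."""

    def __init__(self, X, Z, dt=0.1, q=0.05, r=0.1):
        self.s = np.array([X, Z, 0.0, 0.0])
        self.P = np.eye(4)
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt   # position += velocity * dt
        self.H = np.eye(2, 4)              # only (X, Z) is measured
        self.Q = q * np.eye(4)
        self.R = r * np.eye(2)

    def predict(self):
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]

    def update(self, measured_xz):
        y = np.asarray(measured_xz, dtype=float) - self.H @ self.s
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

def predicted_disparity(Z, baseline, focal_px):
    """Disparity expected at the predicted depth, d = b * f / Z."""
    return baseline * focal_px / Z

# Toy usage: a person walks sideways at 0.5 m/s at a constant depth of 3 m.
kf = PersonKalman(X=0.0, Z=3.0)
for k in range(1, 51):
    X_pred, Z_pred = kf.predict()
    d_pred = predicted_disparity(Z_pred, baseline=0.1, focal_px=400.0)
    kf.update((0.05 * k, 3.0))   # measurement from the localization step
```

In the full tracker the predicted disparity would select the foreground layer, and the measurement passed to `update` would come from the two correlation steps rather than from a synthetic trajectory.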

Tracking Videos

• recursive template update

[Videos: walking, figure eight, running]


Tracking Videos

visualizing tracks from map view

tracking under multiple occlusions


Tracking: quantitative results

Sequence  # people  # occlusions  TR    FP     MTD
1         1         0             96%   0%     6.0
2         1         0             98%   0%     4.0
3         1         0             96%   0%     10.0
4         2         0             89%   10%    2.5
5         2         0             92%   6%     11.0
6         2         1             86%   0%     9.0
7         3         2             79%   3%     7.7
8         4         2             85%   2%     5.0
9         3         6             84%   4%     5.8
10        5         10            78%   1.3%   6.6
11        4         9             69%   5.6%   7.0
12        5         20            68%   3.2%   5.4
13        5         28            70%   6.7%   6.2

TR = tracking rate FP = false positive rate

MTD = mean time to detect
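The per-sequence figures can be summarized with a few lines of arithmetic. The snippet below only averages the numbers reported in the table; the variable names are illustrative.

```python
# (sequence, people, occlusions, TR %, FP %, MTD) as reported in the table.
rows = [
    (1, 1, 0, 96, 0.0, 6.0),
    (2, 1, 0, 98, 0.0, 4.0),
    (3, 1, 0, 96, 0.0, 10.0),
    (4, 2, 0, 89, 10.0, 2.5),
    (5, 2, 0, 92, 6.0, 11.0),
    (6, 2, 1, 86, 0.0, 9.0),
    (7, 3, 2, 79, 3.0, 7.7),
    (8, 4, 2, 85, 2.0, 5.0),
    (9, 3, 6, 84, 4.0, 5.8),
    (10, 5, 10, 78, 1.3, 6.6),
    (11, 4, 9, 69, 5.6, 7.0),
    (12, 5, 20, 68, 3.2, 5.4),
    (13, 5, 28, 70, 6.7, 6.2),
]

mean_tr = sum(r[3] for r in rows) / len(rows)
mean_fp = sum(r[4] for r in rows) / len(rows)
print(f"mean TR = {mean_tr:.1f}%, mean FP = {mean_fp:.1f}%")
```

The mean false positive rate comes out at roughly 3%, in line with the baseline figure quoted in the stereo evaluation, and the mean tracking rate is a little under 84%.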

Evaluating use of stereo in tracker

• Experiment: disable stereo in tracker

– code modifications:

• disable re-centering step

• weighted intensity correlation → unweighted correlation

– results:

• mean tracking rate (TR) drops 4%

• mean false positive rate (FP) increases from 3% to 10%

• (qualitative) template drift causes people to be lost and re-detected
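The difference between weighted and unweighted matching can be seen in a toy example. The sketch below uses a mean squared difference as the matching cost, which is an assumption (the slides say only "correlation"), and `match_cost` is a hypothetical helper.

```python
import numpy as np

def match_cost(patch, template, support=None):
    """Mean squared difference between patch and template.
    If a support mask is given, each pixel's error is weighted by it."""
    diff2 = (patch - template) ** 2
    if support is None:
        return float(diff2.mean())
    return float((support * diff2).sum() / support.sum())

# Toy usage: the person occupies the two middle columns; the background changes.
template = np.zeros((4, 4))
template[:, 1:3] = 1.0
support = np.zeros((4, 4))
support[:, 1:3] = 1.0          # support marks pixels at the person's depth
patch = template.copy()
patch[:, 0] = 0.8              # background differs from when the template was taken

weighted = match_cost(patch, template, support)
unweighted = match_cost(patch, template)
```

With the support weighting, the changed background contributes nothing to the cost; without it, background pixels pull on the match, which is one way the template can start to drift off the person.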

Conclusion

• Stereo is an effective segmentation tool:

– detection: provides a foreground layer divided into different depth layers

– tracking: helps to avoid template drift by focusing on foreground pixels at object’s depth

• Combine segmentation with priors on person shape (i.e. head/torso templates) for person localization.