1

Motion and Target Tracking(Overview)

Suya You
Integrated Media Systems Center
Computer Science Department
University of Southern California

2

§ Commercial
- Personal/public
- Environment/wildlife monitoring
- Traffic measurement

§ Law enforcement
- National security

§ Military & defense

Applications - Video Surveillance

3

§ Cheaper, cheaper…
- Very prevalent in commercial/military establishments

§ High performance
- Millions of pixels

- Full range

- Networked (wired/wireless)

- On-board processors

Sensor and Technology

4

§ Covers many challenging issues

Sensor & data acquisition

- Multiple & distributed sensor network

Scene analysis & understanding

- Detection

- Tracking

- Recognition

Data representation & comprehension

- Object and environment modeling

- Simulation and Visualization

Machine Vision

5

§ Ground
Small/modest-scale environment
- Infrastructure, military base…
- Intelligent traffic monitoring

§ Airborne
Large-scale environment
- National infrastructure, battlefield…

§ Space
Global/outer space

- Battlefield, Environment monitoring, Mars…

Research & Systems

6

§ Distributed sensor network
- Rectilinear CCD, omnidirectional, IR cameras
- Location sensor - GPS
- Fixed, active, and mobile
- Networked – wired and wireless

§ Dynamic event detection & analysis
- Target detection/tracking/recognition
- Incident detection/classification/reporting

§ 3D environment
- 3D scene model (city 3D digital map)
- Target 3D geo-localization
- Immersive 3D visualization

§ Real-time information access
- Control center and drivers

Example: Intelligent Traffic Monitoring

Sensor network

Information processing

Information access

Concept of Operations

7

§ Camera modeling and calibration
- Perspective, panoramic cameras
- Allows automatic and on-site calibration

§ Dynamic image analysis
- Dynamic target detection/tracking
  - Vehicle and people
- Target recognition
  - Approximate classification
- Active vision
  - Fixed and mobile platforms

§ 3D processing
- 3D scene modeling
  - City model (building and road)
- Target 3D geo-localization
  - Tracking and positioning in 3D world
- Visualization
  - Immersive 3D (base station)
  - Abstract and full data (Web, drivers)

Vision Processing Issues

8

§ Camera modeling and calibration
- Basic techniques are adequate as they are
- Main challenges are automatic and on-site calibration
  - Model-based approach – given a 3D model
  - Self-calibration vision approach – included in the tracking module

§ Dynamic image analysis
- Outdoor imaging environment – lighting, weather…
  - Dynamic background modeling approach
  - Visual modeling – finding imaging invariants (lighting, geometry)
- Target detection/tracking – long sequences, drift, self-motion…
  - Model-based approach – 3D scene
  - Distributed vision approach – multi-view/camera geometry
  - Hybrid approach – active sensor (GPS/INS) aided vision
    - Active sensors aid the video system: reduces frame-to-frame vision processing
    - Video processing aids sensor performance: allows estimation of camera attitude, improves speed and accuracy

§ 3D scene modeling
- Urban site model (building and road) – city scale, accuracy to the level of a street block, less manual interaction
  - Stereo approach still plays a main role
  - LiDAR is a fairly new and promising approach
  - Ground-based laser range finder

Challenges

9

§ Heavy computation load is a main barrier
- High-resolution sensors – better for image analysis (e.g. detection…)
- Fast processing – can ease many vision processing tasks (e.g. tracking)
- Multiple camera arrays – huge amounts of data need to be fused and computed
- Users want results in step with what they are seeing

§ Real-time vision computation
- Developing fast algorithms
  e.g. the pyramid technique is a good example
- Aided by other sensors
  e.g. inertial sensor, GPS...
- Hardware
  - General computer
    - Special CPU features (low-level programming)
    - Processor clusters (parallel programming)
  - Special processor/board
    - DSP technique
    - FPGA technique (cheaper, flexible)
    - GPU power (CG language)
  - Smart camera (on-board processors)

Challenges (con.)

10

§ Dynamic Global Image Construction and Registration
- Construct a video mosaic and register mission-collected video frames to previously prepared reference imagery in order to geolocate both moving and stationary targets in real time

§ Multiple Target Surveillance
- Simultaneously track multiple moving targets in a sensor’s field of regard
- Fixed sensors and active moving platforms (satellite, UAV, robot)

§ Activity Monitoring
- Monitor several areas of the battle space for distinctive motion activities such as a soldier incursion or vehicle movement

Research Components (image related)

11

§ Achieved through successive refinement within a multi-resolution pyramid structure

§ Highly efficient; can handle very large camera motions across the field of view and provide very precise alignment

Motion Estimate

A Pyramid-Based Approach

- 2D motion flow estimation

- Fit motion model (linear/nonlinear)

- Warp to align
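The multi-resolution structure behind this approach can be sketched in a few lines. This is a minimal sketch, assuming a grayscale image held in a NumPy array and 2x2 block averaging as the reduction step; the slides do not specify the reduction filter, and `build_pyramid` is a hypothetical helper name.

```python
import numpy as np

def build_pyramid(image, levels):
    """Return [full-res, half-res, ...] using 2x2 block averaging (an assumed reduction filter)."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        img = pyramid[-1]
        # Crop to an even size so the image splits cleanly into 2x2 blocks.
        h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
        img = img[:h, :w]
        reduced = img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyramid.append(reduced)
    return pyramid

pyr = build_pyramid(np.arange(64.0).reshape(8, 8), levels=3)
shapes = [p.shape for p in pyr]
```

A displacement of d pixels at full resolution shrinks to d / 2^k pixels at level k, so motion can be estimated at the coarsest level first and then refined level by level.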

12

§ It’s simple, but still very useful
- Target detection
- Motion tracking
- Navigation
- Compression

§ It can handle large motion and helps accelerate vision processing, but constructing the pyramid itself needs extra computation

§ Pyramid Vision Processor/Board
- Single-chip
- Simultaneous input/processing of up to 2 channels
- Real-time (30 fps), low-latency processing (1-2 frame delay)

Multi-resolution Approach

13

Robust Image Motion Estimation

- Hybrid point and region
  - selecting “good” points and regions as tracking features
- Multi-stage tracking strategy
  - multiresolution
- A closed-loop cooperative manner
  – integrating the feature detection, tracking, and verification

[Flowchart blocks: Region/Point Detect & Select; Affine Region Warp and SSD Evaluation; Multiscale Region Optical Flow; Affine Region Warp and SSD Evaluation; Iteration Control; Linear Point Motion Refinement by Search]

14

Robust Image Motion Estimation (con.)

[Figure: Image i with Source Region R_t0; Image i+1 with Target Region R_t; Affine Warp to Confidence Frame R_c; SSD]

Affine model defines warp of source region to a confidence frame

Normalized SSD measures the difference between warped source and target regions, thereby measuring the quality of tracking, δ ∈ (0, 1]:

$$\delta = \frac{1}{1 + \varepsilon}, \qquad \varepsilon = \frac{\| R_t(\mathbf{x}, t) - R_c(\mathbf{x}, t) \|^2}{\max\{ \| R_t(\mathbf{x}, t) \|^2,\ \| R_c(\mathbf{x}, t) \|^2 \}}$$
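The tracking-quality measure can be illustrated numerically. This is a minimal sketch, assuming the confidence is δ = 1/(1+ε), with ε the squared difference between the target region and the warped source region divided by the larger of the two region energies; `tracking_confidence` is a hypothetical name, and the regions are assumed to be equal-sized arrays already resampled by the affine warp.

```python
import numpy as np

def tracking_confidence(warped_source, target):
    """delta = 1 / (1 + eps), eps = ||Rt - Rc||^2 / max(||Rt||^2, ||Rc||^2)."""
    rc = np.asarray(warped_source, dtype=float)
    rt = np.asarray(target, dtype=float)
    denom = max(np.sum(rt ** 2), np.sum(rc ** 2))
    if denom == 0.0:
        # Both regions are all zero, hence identical: eps = 0 and delta = 1.
        return 1.0
    eps = np.sum((rt - rc) ** 2) / denom
    return 1.0 / (1.0 + eps)

perfect = tracking_confidence(np.ones((4, 4)), np.ones((4, 4)))
poor = tracking_confidence(np.zeros((4, 4)), np.ones((4, 4)))
```

A perfect match gives δ = 1, and δ falls toward 0 as the normalized SSD grows, which is what lets the closed loop reject or re-estimate badly tracked regions.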

15

Performance Evaluation

(a) detected tracking features (b) estimated motion field

Synthetic image sequence (Yosemite-Fly-Through)

Technique               Average Angle Error   Standard Deviation
Horn and Schunck        11.26                 16.41
Lucas and Kanade         4.10                  9.58
Anandan                 15.84                 13.46
Fleet and Jepson         4.29                 11.24
Closed-loop approach     2.84                  7.69
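The slide does not define the error measure. A common choice for optical-flow benchmarks on the Yosemite sequence is the angle between the space-time vectors (u, v, 1) of the estimated and the true flow, and the sketch below assumes that definition; `angular_error_deg` is a hypothetical name.

```python
import numpy as np

def angular_error_deg(u_est, v_est, u_true, v_true):
    """Per-pixel angle, in degrees, between (u, v, 1) vectors of estimated and true flow."""
    num = u_est * u_true + v_est * v_true + 1.0
    den = np.sqrt(u_est ** 2 + v_est ** 2 + 1.0) * np.sqrt(u_true ** 2 + v_true ** 2 + 1.0)
    # Clip guards against rounding pushing the cosine slightly outside [-1, 1].
    return np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))

u_true = np.zeros((2, 2))
v_true = np.zeros((2, 2))
errors = angular_error_deg(np.ones((2, 2)), np.zeros((2, 2)), u_true, v_true)
average_error = errors.mean()
spread = errors.std()
```

Averaging the per-pixel angles and taking their standard deviation yields two summary numbers of the kind reported in the table.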

16

Some Applications

- Tracking for ground and aerial images

- Movie special effects including “X-Men 2,”“Daredevil”, and “Dr. Seuss’ ‘The Cat in the Hat.’ ”

- Hardware implementation is under way (Olympus): PCMCIA size card

17

Video Stabilization/Mosaic

Inter-frame image motion estimation (Parameters)

Motion compensation and registration (Model)

Image alignments and mosaicking (Composition)

18

Global Motion Compensation

Registration model: translation, affine, and perspective

$$\begin{bmatrix} u(x, y) \\ v(x, y) \end{bmatrix} = \begin{bmatrix} v_0 + v_1 x + v_2 y + u_1 x^2 + u_2 xy \\ v_3 + v_4 x + v_5 y + u_1 y^2 + u_2 xy \end{bmatrix}$$

Model fitting: an over-constrained SVD solution
- Motion vector field (every pixel)
- Feature-based approach
- Coarse-to-fine approach

Image stabilization

Registering the two images and computing the geometric transformation that warps the source image so that it aligns with the reference image, which cancels the motion of the observer
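Model fitting from over-constrained data can be sketched for the affine case. This is a minimal sketch, assuming matched point pairs are already available and restricting the model to six affine parameters; `fit_affine` is a hypothetical helper. `np.linalg.lstsq` solves the over-determined system in the least-squares sense using an SVD-based routine.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine map dst ~ A @ src + t from N >= 3 point pairs."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    design = np.hstack([src, np.ones((len(src), 1))])      # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)  # shape (3, 2)
    A = params[:2].T   # 2x2 linear part
    t = params[2]      # translation
    return A, t

# Synthetic check: five points moved by a known affine transform.
src = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
A_true = np.array([[1.1, 0.2], [-0.1, 0.9]])
t_true = np.array([3.0, -2.0])
dst = src @ A_true.T + t_true
A_est, t_est = fit_affine(src, dst)
```

Warping the source image with the inverse of the fitted transform aligns it with the reference frame, which is the stabilization step described above.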

19

Video Stabilization/Mosaic

“Frame-to-Mosaic” alignment
• Mosaic reference (first, middle, defined…)
• Warping each frame to reference
• Hierarchical alignment

Temporal filtering (for mosaic)
• Intensity blending
• Weighted average blending function

20

Goal
§ Moving target detection/tracking
- Vehicle and people
§ Landmark recognition
- Buildings of interest and reference features

Target Detection & Tracking

Platform
§ Stationary sensors
- Ground cameras (perspective, panoramic cameras)
§ Moving sensors
- Satellite, UAV, robot carried
- Image, GPS, and INS data are available

21

Stationary cameras

§ Background is “static” (assumption)

§ Foreground is moving

BG/FG classification

§ Background matching

§ Matching image

§ Identification

§ Tracking

Stationary & Moving Platforms

Moving cameras

§ Background is “moving” – camera motion

§ Foreground is moving

BG/FG classification

§ Motion compensation

§ Background matching

§ Matching image

§ Identification

§ Tracking

Challenges:

§ Background modeling and maintaining

§ Motion compensation (image stabilization)

22

Target Detection/Tracking (stationary sensor)

[Block diagram: Video image, Preprocessing, Background matching, Background model, Detection, Tracking; the diagram is drawn a second time with an added Motion compensation block]

23

It’s a challenging problem
§ Appearance changes
- Time, lighting, weather…
§ Waking/sleeping objects
- BG objects moving, FG objects still
§ Color/contrast aperture
- Subsumed BG/FG, homogeneous region
§ Waving trees
- Vacillating BK
§ Apparent motion
- Camera motion

Background Modeling

24

§ Pre-defined constant BK
- “Blue screen” - movie special effect
- Everything is predefined – no need to be estimated on-line
- Some preprocessing may be necessary – log filtering
- Adjacent Frame Difference (AFD) approach

§ Constant BK, but unknown
- BK is modeled as intensity constant
- Need parameter estimate/update on-line
- Mean Estimate Approach
  - Linear model, i.e.
    $$m(x, y) = \frac{N-1}{N}\, m_{old}(x, y) + \frac{1}{N}\, I(x, y)$$
- Mean-Covariance Approach
  - Both $m$ and $\sigma$ need to be estimated
  - Optimal estimators (Kalman filter)
- Block Correlation Matching Approach
  - Block-wise median template
  - Correlation matching

Constant Intensity Model
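The mean estimate approach can be sketched as a per-pixel running average followed by a difference test. This is a minimal sketch, assuming the update m = ((N-1)/N) m_old + (1/N) I and a fixed intensity threshold for the foreground decision; the threshold and both function names are assumptions, not taken from the slides.

```python
import numpy as np

def update_mean(m_old, frame, n):
    """m_new = (N-1)/N * m_old + 1/N * I, applied per pixel."""
    return (n - 1) / n * m_old + frame / n

def foreground_mask(frame, background, threshold):
    """Flag pixels whose intensity differs from the background by more than threshold."""
    return np.abs(frame - background) > threshold

# Three background frames with intensities 10, 12, 14: the running mean is 12.
frames = [np.full((2, 2), v, dtype=float) for v in (10.0, 12.0, 14.0)]
mean = frames[0].copy()
for n, frame in enumerate(frames[1:], start=2):
    mean = update_mean(mean, frame, n)

mask = foreground_mask(np.array([[12.0, 40.0], [12.0, 12.0]]), mean, threshold=10.0)
```

Because the update is linear in the new frame, only the current mean and the frame count need to be stored, which is why it suits on-line estimation.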

25

§ Complex background
- Feature-based approach – matching feature is a 4D spatio-temporal vector, i.e.
  $$m = [I, I_x, I_y, I_t]$$
- BK is modeled as a certain statistical distribution in the 4D vector space
- Background update – temporal blending
  $$B_{new} = (1 - \alpha)\, B_{old} + \alpha\, m$$
- Single Gaussian Estimate Approach
  $$m_{mean} = \frac{1}{N} \sum_{i=1}^{N} m_i, \qquad \sigma = \frac{1}{N-1} \sum_{i=1}^{N} (m_i - m_{mean})(m_i - m_{mean})^T$$
  $$m_{new}(x, y) = \frac{N-1}{N}\, m_{old}(x, y) + \frac{1}{N}\, m(x, y)$$
  $$\sigma_{new}(x, y) = \frac{N}{N+1}\, \sigma_{old} + \frac{N}{(N+1)^2}\, (m - m_{mean})(m - m_{mean})^T$$
- Mixture of Gaussian Estimate Approach
  - BK is modeled as multiple Gaussian distributions
  - Multiple frequency Gaussian channels
  - Markov Model, EM (Expectation-Maximization) approaches

Statistical Feature Model
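The single Gaussian model can be sketched for one pixel. This is a minimal sketch, assuming the feature is a 4D vector of intensity and its derivatives, the covariance uses 1/(N-1) normalization, and new observations are scored by squared Mahalanobis distance; the distance test, the `ridge` regularizer, and all function names are assumptions added for illustration.

```python
import numpy as np

def gaussian_background(samples):
    """Mean and covariance (1/(N-1) normalization) of N feature vectors at one pixel."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    centered = samples - mean
    cov = centered.T @ centered / (len(samples) - 1)
    return mean, cov

def mahalanobis_sq(m, mean, cov, ridge=1e-6):
    """Squared Mahalanobis distance; ridge (an assumption) keeps cov invertible."""
    d = np.asarray(m, dtype=float) - mean
    return float(d @ np.linalg.solve(cov + ridge * np.eye(len(d)), d))

def blend(b_old, m, alpha):
    """Temporal blending: B_new = (1 - alpha) * B_old + alpha * m."""
    return (1.0 - alpha) * np.asarray(b_old, dtype=float) + alpha * np.asarray(m, dtype=float)
```

A pixel whose feature vector lies far from the mean in this metric would be labeled foreground, while nearby vectors would be blended into the background with a small `alpha`.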

26

§ Motion Estimate Techniques
- Instead of using the intensity-constancy constraint, BK is modeled as a constant motion/optical flow field
  $$I_x u + I_y v + I_t = 0$$
- Matching feature is a 3D vector, i.e.
  $$m = [I_x, I_y, I_t]$$
- Background update is an optical flow estimation problem
  $$\begin{bmatrix} u \\ v \end{bmatrix} = - \begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum I_x I_t \\ \sum I_y I_t \end{bmatrix}$$
- Extend to multi-resolution detection and update

Motion Field Model
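The least-squares flow solution can be sketched directly from the gradient sums. This is a minimal sketch, assuming the spatial and temporal derivatives over a window are already computed and the flow is constant within that window; `flow_from_gradients` is a hypothetical name.

```python
import numpy as np

def flow_from_gradients(Ix, Iy, It):
    """Solve [u, v] = -G^{-1} b, G = [[sum Ix^2, sum IxIy], [sum IxIy, sum Iy^2]], b = [sum IxIt, sum IyIt]."""
    G = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return -np.linalg.solve(G, b)

# Synthetic window: gradients that satisfy Ix*u + Iy*v + It = 0 for a known flow.
rng = np.random.default_rng(1)
Ix = rng.normal(size=(5, 5))
Iy = rng.normal(size=(5, 5))
u_true, v_true = 0.5, -0.25
It = -(Ix * u_true + Iy * v_true)
u_est, v_est = flow_from_gradients(Ix, Iy, It)
```

The 2x2 matrix becomes singular when the window has gradients in only one direction (the aperture problem), so a practical detector would check its conditioning before trusting the result.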

27

§ Statistical Prediction Techniques
- The BK pixels are predicted – what is expected in the next input frame
  $$B_t(x, y) = \sum_{i=1} a_i\, I_{t-i}(x, y)$$
- Linear estimation problem - LS, Wiener filtering
  $$E[e_t^2] = E[I_t^2] + \sum_{i=1} a_i\, E[I_t I_{t-i}]$$
- More complex prediction models are possible
  - Motion/optical field prediction model
  - Non-linear prediction model

Prediction Model
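The linear prediction model can be sketched for a single pixel's intensity history. This is a minimal sketch, assuming a fixed predictor order p (written `order` below, which the slide does not specify) and an ordinary least-squares fit of the coefficients; both function names are hypothetical.

```python
import numpy as np

def fit_predictor(history, order):
    """Least-squares coefficients a_1..a_p so that I_t ~ sum_i a_i * I_{t-i}."""
    h = np.asarray(history, dtype=float)
    # Each row holds the previous `order` samples, most recent first.
    rows = [h[t - order:t][::-1] for t in range(order, len(h))]
    targets = h[order:]
    coeffs, *_ = np.linalg.lstsq(np.array(rows), targets, rcond=None)
    return coeffs

def predict_next(history, coeffs):
    """Predicted background value B_t from the most recent len(coeffs) samples."""
    recent = np.asarray(history, dtype=float)[-len(coeffs):][::-1]
    return float(recent @ coeffs)

# Synthetic history generated by a known second-order recurrence.
history = [1.0, 0.5]
for _ in range(20):
    history.append(1.5 * history[-1] - 0.9 * history[-2])
coeffs = fit_predictor(history, order=2)
prediction = predict_next(history, coeffs)
```

A pixel whose observed intensity departs strongly from `prediction` would be a foreground candidate; the size of that departure threshold is left open here.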

28

§ Estimate as a Recognition Problem§ Training – motionless background frames

§ Feature extraction – statistical image feature

§ Eigenbackground

§ PCA (Principal Component Analysis)

§ Matching – PCA projection

Statistical Recognition Model

Image space PCA Space Image space

Background training

Live video projection

Foreground Background
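The eigenbackground idea can be sketched with an SVD-based PCA. This is a minimal sketch, assuming training frames are motionless background images of equal size and foreground is scored by the reconstruction residual after projecting into the PCA space; the residual rule and both function names are assumptions added for illustration.

```python
import numpy as np

def train_eigenbackground(frames, k):
    """PCA of flattened background frames; returns the mean and top-k principal directions."""
    X = np.asarray([f.ravel() for f in frames], dtype=float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def foreground_residual(frame, mean, basis):
    """Project a frame onto the PCA space, reconstruct, and return |frame - reconstruction|."""
    x = frame.ravel().astype(float) - mean
    recon = basis.T @ (basis @ x)
    return np.abs(x - recon).reshape(frame.shape)
```

Background variation seen during training (for example a global lighting change) is reproduced by the projection and leaves a small residual, while a new object is not and stands out.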

29

- The problem with the above approaches is that they separate the detection/tracking process into three independent phases
  - Low level – pixel-wise detection/segmentation
  - Middle level – labeling pixels as grouped targets
  - High level – temporal tracking, spatial recognition
- An Integrated Approach
  – integrating the pixel classification, region detection, and inter-frame tracking in a closed-loop manner

More Techniques

Frame-wise processing (Matching)

Region-wise processing (Clustering)

Pixel-wise processing (Segmentation)

……

K-means clustering

Linear prediction

30

More Techniques (con.)

- Illumination invariant

$$I_{x,y} = \alpha\, (L_{x,y}\, \rho_{x,y}\, \varphi_{x,y})^{\gamma}$$

$$\log(I_{x,y}) = \log(\alpha) + \underbrace{\gamma \log(L_{x,y})}_{\text{illumination-dependent}} + \underbrace{\gamma \log(\rho_{x,y}) + \gamma \log(\varphi_{x,y})}_{\text{illumination-invariant}}$$

Surface lighting model

$$I(x, y) = L(x, y) + M(x, y)$$

$$\hat{L}(x, y) = \mathrm{Filter}(I(x, y))$$

$$\hat{M}(x, y) = I(x, y) - \hat{L}(x, y) \quad \text{or} \quad \hat{m}(x, y) = \exp(I(x, y) - \hat{L}(x, y))$$

Strong surface shading: effective

Strong illumination gradients: less effective

Low intensity: no benefit, or worse
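The log-domain separation of lighting and surface terms can be sketched with a simple low-pass filter. This is a minimal sketch, assuming the lighting estimate is a box-filtered copy of the log image (the slides leave the filter unspecified) and clamping intensities from below to avoid log(0); `box_filter`, `illumination_invariant`, `radius`, and `floor` are all assumptions.

```python
import numpy as np

def box_filter(img, radius):
    """Mean filter with edge replication (a stand-in for the unspecified Filter)."""
    padded = np.pad(img, radius, mode="edge")
    size = 2 * radius + 1
    out = np.zeros(img.shape, dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / size ** 2

def illumination_invariant(intensity, radius=2, floor=1.0):
    """M_hat = log I - Filter(log I); floor (an assumption) avoids log(0)."""
    log_i = np.log(np.maximum(np.asarray(intensity, dtype=float), floor))
    return log_i - box_filter(log_i, radius)
```

A uniform change in lighting adds a constant in the log domain, which the low-pass estimate absorbs, so the same surface under two uniform lighting levels yields the same output. Spatially varying lighting is only partly removed, in line with the "less effective" case noted above.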

31

§ Moving cameras
- Background is “moving” – camera motion
- Foreground is moving

§ Motion compensation
- Registering images and computing the geometric transformation that compensates the source image so that it aligns with the reference images

Target Detection/Tracking (moving sensor)

[Block diagram: Video image, Preprocessing, Motion compensation, Background matching, Background model, Detection, Tracking]

32

Motion Compensation

Parametric model: translation, affine, and perspective

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} v_0 + v_1 x + v_2 y + u_1 x^2 + u_2 xy \\ v_3 + v_4 x + v_5 y + u_1 y^2 + u_2 xy \end{bmatrix}$$

Model fitting: an over-constrained optimal estimation problem
- It’s hard – BK contains moving objects
- Motion vector field vs. feature-based approaches
- Iterative vs. non-iterative approaches

33

Motion Vector Field Estimation

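Parametric model – Affine transformation

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} v_0 & v_1 \\ v_2 & v_3 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} v_4 \\ v_5 \end{bmatrix}$$

Optical flow tracking and warping

[Figure: Source point in Frame i-1 and Target point in Frame i; regions R_t0, R_t, and R_c linked by Affine Warp and SSD]

Affine model defines warp of source frame to a reference frame

Normalized SSD measures the difference between warped source and target

Multi-resolution iterative refinement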

34

Dynamic Object Tracking: Results

Hand-held camera: Multiple objects Tracked object visualized in 3D

Hand-held camera: Integration of mosaic, image stabilization, and object tracking

35

Dynamic Object Tracking: Results

UAV sensor: Integration of mosaic, image stabilization, and object tracking

36

Feature Matching

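Parametric model – Affine transformation

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} v_0 & v_1 \\ v_2 & v_3 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} v_4 \\ v_5 \end{bmatrix}$$

Feature tracking and warping

Multi-resolution iterative refinement

[Flowchart: Frame i-1 and Frame i; Feature selection (N) & SSD; Affine (T1), Affine (T2), …, Affine (TM); Selection of optimal T; Affine Warp]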
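One plausible reading of the flowchart is that several candidate affine transforms are computed from the N tracked features and the best one is kept. The sketch below assumes that reading: it fits each candidate to a random three-feature subset and scores it by the median squared residual over all features, so that a minority of features on moving objects does not dominate. The sampling scheme, the median score, and both function names are assumptions and not the method stated on the slides.

```python
import numpy as np

def affine_from_pairs(src, dst):
    """Least-squares affine parameters, shape (3, 2), with dst ~ [x, y, 1] @ params."""
    design = np.hstack([src, np.ones((len(src), 1))])
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return params

def select_best_affine(src, dst, num_candidates=50, seed=0):
    """Fit candidates T1..TM on random 3-feature subsets; keep the lowest median residual."""
    rng = np.random.default_rng(seed)
    design = np.hstack([src, np.ones((len(src), 1))])
    best_params, best_score = None, np.inf
    for _ in range(num_candidates):
        idx = rng.choice(len(src), size=3, replace=False)
        params = affine_from_pairs(src[idx], dst[idx])
        residuals = np.sum((design @ params - dst) ** 2, axis=1)
        score = np.median(residuals)   # robust to a minority of moving features
        if score < best_score:
            best_params, best_score = params, score
    return best_params

# Synthetic scene: 20 background features follow one affine motion, 5 do not.
rng = np.random.default_rng(3)
src = rng.uniform(0.0, 100.0, size=(25, 2))
A_true = np.array([[1.02, 0.05], [-0.04, 0.98]])
t_true = np.array([4.0, -3.0])
dst = src @ A_true.T + t_true
dst[20:] += rng.uniform(15.0, 30.0, size=(5, 2))   # independently moving objects
best = select_best_affine(src, dst)
```

The selected transform is then used for the affine warp, and features with large residuals under it are candidates for independently moving targets.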

37

Others

Perceptual Grouping Methodology - Tensor Voting
- A simulation of perceptual organization – infer what we perceive from noisy/missing data
- A Computational Framework for Segmentation and Grouping (formalized by USC Prof. Gérard Medioni)
- Tensor Voting
  - Description – data is represented as tensors to generate descriptions in terms of surfaces, regions, curves, and labeled junctions, from sparse, noisy, binary data in 2D/3D
  - Voting – how the tensors communicate and propagate information between neighbors
- Has been applied to many vision problems, including
  - Segmentation/detection
  - Motion tracking, trajectory extraction
  - Stereo vision
  - Epipolar geometry estimation

38

Others (con.)

§ Multi-view Cameras
- Continuous cross-view tracking
  - Stationary platform – Stationary platform
  - Stationary platform – Moving platform
  - Moving platform – Moving platform
- Requires continuous and complete tracking trajectories
- Requires registration of trajectories and viewpoints

39

Omnidirectional Image
- Wide (360 degree) horizontal FOV
- Fewer partial occlusions
- Fewer motion ambiguities (pure translation and rotation)
- Limited resolution – used for close-range objects

Others (con.)

40

Benefits Using Panoramic Imaging

§ Wide FOV ensures:
- A sufficient number of features for tracking
- Less partial occlusion

§ Accurate estimates for large motion
- Provides sufficient information for distinguishing motion ambiguities (pure translation and rotation)

41

Integration of Imagery and Range Data
- Wide coverage
- Rapidness and robustness
- Direct recovery of 3D models and geolocations

Others (con.)

Image warping

Camera parameters

Residual estimate

DEM

Reference images

Live images

Space Filtering

Detected targets

LiDAR accuracy is typically ~0.5-1.0 m in ground spacing and centimeters in height
