1
Motion and Target Tracking (Overview)
Suya You
Integrated Media Systems Center
Computer Science Department
University of Southern California
2
§ Commercial
- Personals/Publics
- Environment/Wildlife animal monitoring
- Traffic measurement
§ Law enforcement
- National security
§ Military & defense
Applications - Video Surveillance
3
§ Cheaper, cheaper…
- Very prevalent in commercial/military establishments
§ High performance
- Millions of pixels
- Full range
- Networked (wired/wireless)
- On-board processors
Sensor and Technology
4
§ Covers many challenging issues
Sensor & data acquisition
- Multiple & distributed sensor network
Scene analysis & understanding
- Detection
- Tracking
- Recognition
Data representation & comprehension
- Object and environment modeling
- Simulation and Visualization
Machine Vision
5
§ Ground
Small/modest-scale environment
- Infrastructure, Military base…
- Intelligent traffic monitoring
§ Airborne
Large-scale environment
- National Infrastructure, Battlefield…
§ Space
Global/outer space
- Battlefield, Environment monitoring, Mars…
Research & Systems
6
§ Distributed sensor network
- Rectilinear CCD, omnidirectional, IR cameras
- Location sensor - GPS
- Fixed, active, and mobile
- Networked – wired and wireless
§ Dynamic event detection & analysis
- Target detection/tracking/recognition
- Incident detection/classification/reporting
§ 3D environment
- 3D scene model (city 3D digital map)
- Target 3D geo-localization
- Immersive 3D visualization
§ Real-time information access
- Control center and drivers
Example: Intelligent Traffic Monitoring
Sensor network
Information processing
Information access
Concept of Operations
7
§ Camera modeling and calibration
- Perspective, panoramic cameras
- Allows automatic and on-site calibration
§ Dynamic image analysis
- Dynamic target detection/tracking
  - Vehicle and people
- Target recognition
  - Approximate classification
- Active vision
  - Fixed and mobile platforms
§ 3D processing
- 3D scene modeling:
  - City model (building and road)
- Target 3D geo-localization
  - Tracking and positioning in 3D world
- Visualization
  - Immersive 3D (base station)
  - Abstract and full data (Web, drivers)
Vision Processing Issues
8
§ Camera modeling and calibration
- Basic techniques are pretty good as is
- Main challenges are automatic and on-site calibration
  - Model based approach – given 3D model
  - Self-calibration vision approach – included in the tracking module
§ Dynamic image analysis
- Outdoor imaging environment – lighting, weather…
  - Dynamic background modeling approach
  - Visual modeling – finding imaging invariants (lighting, geometry)
- Target detection/tracking – long sequence, drifts, self-motion…
  - Model based approach – 3D scene
  - Distributed vision approach – multi-view/camera geometry
  - Hybrid approach – active sensor (GPS/INS) aided vision
- Active sensors aid video system
  - Reduces frame-to-frame vision processing
- Video processing aids sensor performance
  - Allows estimate of camera attitude
  - Improves speed and accuracy
§ 3D scene modeling
- Urban site model (building and road) – city scale, accuracy to level of street block, less manual interaction
  - Stereo approach still plays a main role
  - LiDAR is a fairly new and promising approach
  - Ground based laser range finder
Challenges
9
§ Heavy computation load is a main barrier
- High resolution sensor – better for image analysis (e.g. detection…)
- Fast processing – can ease many vision processing jobs (e.g. tracking)
- Multiple camera arrays – huge data needs to be fused and computed
- Users want results for what they are currently seeing
§ Real-time vision computation
- Developing fast algorithms, e.g. the pyramid technique is a good example
- Aided by other sensors, e.g. inertial sensor, GPS...
- Hardware
  - General computer
    - Special CPU features (low-level programming)
    - Processor clusters (parallel programming)
  - Special processor/board
    - DSP technique
    - FPGA technique (cheaper, flexible)
    - GPU power (Cg language)
  - Smart camera (on-board processors)
Challenges (con.)
10
§ Dynamic Global Image Construction and Registration
- Construct video mosaic and register mission-collected video frames to previously prepared reference imagery in order to geolocate both moving and stationary targets in real time
§ Multiple Target Surveillance
- Simultaneously track multiple moving targets in a sensor's field of regard
- Fixed sensors and active moving platforms (Satellite, UAV, robot)
§ Activity Monitoring
- The monitoring of several areas of the battle space for distinctive motion activities such as a soldier incursion and vehicle movement
Research Components (image related)
11
§ Achieved through successive refinement within a multi-resolution pyramid structure
§ Highly efficient; can handle very large camera motions of the field of view and provide very precise alignment
Motion Estimate
A Pyramid-Based Approach
- 2D motion flow estimation
- Fit motion model (linear/nonlinear)
- Warp to align
12
§ It's simple, but still very useful
- Target detection
- Motion tracking
- Navigation
- Compression
§ It can handle large motion and helps accelerate vision processing, but constructing the pyramid itself needs extra computation
§ Pyramid Vision Processor/Board
- Single-chip
- Simultaneous input/processing of up to 2 channels
- Real-time (30fps), low-latency processing (1-2 frame delay)
Multi-resolution Approach
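A minimal sketch of the multi-resolution idea, assuming a plain 2×2 block average as the reduction step (the slides do not say which filter the pyramid processor uses). The helper name `build_pyramid` and the frame size are illustrative only. It shows why large motions become tractable at coarse levels: each level halves the apparent displacement.

```python
import numpy as np

def build_pyramid(image, levels):
    """Return [full-res, half-res, ...] using 2x2 block averaging (assumed filter)."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        img = pyramid[-1]
        # Drop a trailing odd row/column so the image splits into 2x2 blocks.
        h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
        img = img[:h, :w]
        reduced = (img[0::2, 0::2] + img[1::2, 0::2] +
                   img[0::2, 1::2] + img[1::2, 1::2]) / 4.0
        pyramid.append(reduced)
    return pyramid

frame = np.random.default_rng(0).random((240, 320))
pyr = build_pyramid(frame, levels=4)
shapes = [level.shape for level in pyr]

# A 16-pixel shift at full resolution is a 2-pixel shift at the coarsest level.
coarse_shift = 16 / 2 ** (len(pyr) - 1)
```

The extra construction cost is modest in this sketch: the reduced levels together hold about one third as many pixels as the original frame.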
13
Robust Image Motion Estimation
- Hybrid point and region
  - selecting “good” points and regions as tracking features
- Multi-stage tracking strategy
  - multiresolution
- A closed-loop cooperative manner
  - integrating the feature detection, tracking, and verification
Region/Point Detect & Select
Affine Region Warp and SSD Evaluation
Multiscale Region Optical Flow
Affine Region Warp and SSD Evaluation
Iteration Control
Linear Point Motion Refinement by Search
14
Robust Image Motion Estimation (con.)
[Figure: Image i and Image i+1; Source Region, Target Region, Confidence Frame; Affine Warp; SSD; regions R_t0, R_t, R_c]
Affine model defines warp of source region to a confidence frame
Normalized SSD measures the difference between warped source and target regions, thereby measuring the quality of tracking, δ ∈ (0, 1]
$$\delta = \frac{1}{1 + \varepsilon}$$
$$\varepsilon = \frac{\left(R_t(\mathbf{x}, t) - R_c(\mathbf{x}, t)\right)^2}{\max\{R_t^2(\mathbf{x}, t),\; R_c^2(\mathbf{x}, t)\}}$$
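The confidence measure can be sketched as follows, assuming the squared terms are summed over all pixels of the region (the usual meaning of SSD) and that R_t is the target region and R_c the warped source region in the confidence frame. The function name `tracking_confidence` is hypothetical.

```python
import numpy as np

def tracking_confidence(target_region, warped_region):
    """delta = 1 / (1 + eps), with eps the normalized SSD of the two regions."""
    rt = np.asarray(target_region, dtype=float)
    rc = np.asarray(warped_region, dtype=float)
    ssd = np.sum((rt - rc) ** 2)
    # Normalize by the larger of the two region energies (assumed summed per region).
    norm = max(np.sum(rt ** 2), np.sum(rc ** 2))
    eps = ssd / norm if norm > 0 else 0.0
    return 1.0 / (1.0 + eps)

perfect = tracking_confidence(np.ones((4, 4)), np.ones((4, 4)))
poor = tracking_confidence(np.ones((4, 4)), np.zeros((4, 4)))
```

A perfect match gives δ = 1, and δ falls toward 0 as the warped source and the target diverge, so a threshold on δ can drive the verification step of the closed loop.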
15
Performance Evaluation
(a) detected tracking features (b) estimated motion field
Synthetic image sequence (Yosemite-Fly-Through)
Technique              Average Angle Error   Standard Deviation
Horn and Schunck       11.26                 16.41
Lucas and Kanade       4.10                  9.58
Anandan                15.84                 13.46
Fleet and Jepson       4.29                  11.24
Closed-loop approach   2.84                  7.69
16
Some Applications
- Tracking for ground and aerial images
- Movie special effects, including “X-Men 2”, “Daredevil”, and “Dr. Seuss’ The Cat in the Hat”
- Hardware implementation is under way (Olympus): PCMCIA size card
17
Video Stabilization/Mosaic
Inter-frame image motion estimation (Parameters)
Motion compensation and registration (Model)
Image alignments and mosaicking (Composition)
18
Global Motion Compensation
Registration model: translation, affine, and perspective
$$\begin{bmatrix} u(x,y) \\ v(x,y) \end{bmatrix} = \begin{bmatrix} v_0 + v_1 x + v_2 y + u_1 x^2 + u_2 xy \\ v_3 + v_4 x + v_5 y + u_1 y^2 + u_2 xy \end{bmatrix}$$
Model fitting: an over-constrained SVD solution
- Motion vector field (every pixel)
- Feature based approach
- Coarse-fine approach
Image stabilization
Registering the two images and computing the geometric transformation that warps the source image so that it aligns with the reference image, which cancels the motion of the observer
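As a sketch of the over-constrained fit, the example below estimates only the six affine terms of the registration model (v0 to v5) from a set of motion vectors, using NumPy's `lstsq`, which solves the system with an SVD-based routine. The function name and the synthetic data are assumptions for illustration, not the implementation used in the slides.

```python
import numpy as np

def fit_affine_motion(points, flows):
    """Least-squares fit of u = v0 + v1*x + v2*y and v = v3 + v4*x + v5*y."""
    points = np.asarray(points, dtype=float)
    flows = np.asarray(flows, dtype=float)
    # One row [1, x, y] per measured motion vector.
    design = np.column_stack([np.ones(len(points)), points[:, 0], points[:, 1]])
    params_u, *_ = np.linalg.lstsq(design, flows[:, 0], rcond=None)
    params_v, *_ = np.linalg.lstsq(design, flows[:, 1], rcond=None)
    return np.concatenate([params_u, params_v])  # [v0, v1, v2, v3, v4, v5]

rng = np.random.default_rng(1)
pts = rng.uniform(0, 100, size=(50, 2))
true = np.array([2.0, 0.01, -0.02, -1.0, 0.03, 0.005])
flow = np.column_stack([
    true[0] + true[1] * pts[:, 0] + true[2] * pts[:, 1],
    true[3] + true[4] * pts[:, 0] + true[5] * pts[:, 1],
])
estimated = fit_affine_motion(pts, flow)
```

With 50 vectors and 6 unknowns the system is heavily over-constrained, which is what lets the fit average out noisy measurements in practice.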
19
Video Stabilization/Mosaic
“Frame-to-Mosaic” alignment
• Mosaic reference (first, middle, defined…)
• Warping each frame to reference
• Hierarchical alignment
Temporal filtering (for mosaic)
• Intensity blending
• Weighted average blending function
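The weighted-average blending step might look like the sketch below. It assumes each frame has already been warped into the mosaic's coordinate frame and comes with a per-pixel weight map (zero where the frame has no coverage); the names and the tiny 1×4 "mosaic" are illustrative only.

```python
import numpy as np

def blend_into_mosaic(mosaic_sum, weight_sum, warped_frame, frame_weight):
    """Accumulate an aligned frame; the mosaic is the weighted average so far."""
    mosaic_sum = mosaic_sum + frame_weight * warped_frame
    weight_sum = weight_sum + frame_weight
    # Leave pixels that no frame has covered yet at zero.
    mosaic = np.divide(mosaic_sum, weight_sum,
                       out=np.zeros_like(mosaic_sum), where=weight_sum > 0)
    return mosaic_sum, weight_sum, mosaic

mosaic_sum = np.zeros((1, 4))
weight_sum = np.zeros((1, 4))
frame_a, weight_a = np.full((1, 4), 10.0), np.array([[1.0, 1.0, 0.0, 0.0]])
frame_b, weight_b = np.full((1, 4), 20.0), np.array([[0.0, 1.0, 1.0, 0.0]])
for frame, weight in ((frame_a, weight_a), (frame_b, weight_b)):
    mosaic_sum, weight_sum, mosaic = blend_into_mosaic(
        mosaic_sum, weight_sum, frame, weight)
```

Where the two frames overlap, the mosaic takes the average of their intensities, which softens seams between frames.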
20
Goal
§ Moving target detection/tracking
- Vehicle and people
§ Landmark recognition
- Buildings of interest and reference features
Target Detection & Tracking
Platform
§ Stationary sensors
- Ground cameras (perspective, panoramic cameras)
§ Moving sensors
- Satellite, UAV, robot carried
- Image, GPS, and INS data are available
21
Stationary cameras
§ Background is “static” -assumption
§ Foreground is moving
BG/FG classification
§ Background matching
§ Matching image
§ Identification
§ Tracking
Stationary & Moving Platforms
Moving cameras
§ Background is “moving” – camera motion
§ Foreground is moving
BG/FG classification
§ Motion compensation
§ Background matching
§ Matching image
§ Identification
§ Tracking
Challenges:
§ Background modeling and maintaining
§ Motion compensation (image stabilization)
22
Target Detection/Tracking (stationary sensor)
[Block diagram labels: Video image, Preprocessing, Background matching, Background model, Detection, Tracking; Motion compensation]
23
It's a challenging problem
§ Appearance changes
- Time, lighting, weather…
§ Waking/sleeping objects
- BG objects moving, FG object still
§ Color/contrast aperture
- Subsumed BG/FG, homogeneous region
§ Waving trees
- Vacillating BK
§ Apparent motion
- Camera motion
Background Modeling
24
§ Pre-defined constant BK
§ “Blue screen” - movie special effect
§ Everything is predefined – no need to be estimated on-line
§ Some preprocessing may be necessary – log filtering
§ Adjacent Frame Difference (AFD) approach
§ Constant BK, but unknown
§ BK is modeled as intensity constant
§ Need parameter estimate/update on-line
§ Mean Estimate Approach
§ Linear model, i.e. $m(x,y) = \frac{N-1}{N}\, m_{old}(x,y) + \frac{1}{N}\, I(x,y)$
§ Mean-Covariance Approach
§ Both $m$ and $\sigma$ need to be estimated
§ Optimal estimators (Kalman filter)
§ Block Correlation Matching Approach
§ Block-wise median template
§ Correlation matching
Constant Intensity Model
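A small sketch of the mean estimate approach, assuming N counts the frames seen so far including the current one and that a pixel is declared foreground when it departs from the background mean by more than a fixed threshold. The threshold value and the function names are illustrative, not taken from the slides.

```python
import numpy as np

def update_mean(m_old, frame, n):
    """m = (N-1)/N * m_old + 1/N * I, with N counting the current frame."""
    return (n - 1) / n * m_old + frame / n

def foreground_mask(frame, background, threshold):
    """Pixels whose intensity departs from the constant-intensity model."""
    return np.abs(frame - background) > threshold

frames = [np.full((4, 4), v, dtype=float) for v in (10.0, 12.0, 14.0)]
background = np.zeros((4, 4))
for n, frame in enumerate(frames, start=1):
    background = update_mean(background, frame, n)

probe = np.full((4, 4), 12.0)
probe[1, 2] = 60.0  # one bright "target" pixel
mask = foreground_mask(probe, background, threshold=20.0)
```

The recursive update reproduces the plain average of the frames without storing them, which is why it suits on-line estimation.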
25
§ Complex background
§ Feature based approach - matching feature is a 4D spatio-temporal vector, i.e. $m = [I, I_x, I_y, I_t]$
§ BK is modeled as a certain statistical distribution in the 4D vector space
§ Background update – temporal blending
$$B_{new} = (1 - \alpha)\, B_{old} + \alpha\, m$$
§ Single Gaussian Estimate approach
$$m_{mean} = \frac{1}{N} \sum_{i=1}^{N} m_i, \qquad \sigma = \frac{1}{N-1} \sum_{i=1}^{N} (m_i - m_{mean})(m_i - m_{mean})^T$$
$$m_{new}(x,y) = \frac{N-1}{N}\, m_{old}(x,y) + \frac{1}{N}\, m(x,y)$$
$$\sigma_{new}(x,y) = \frac{N}{N+1}\, \sigma_{old} + \frac{N}{(N+1)^2}\, (m - m_{mean})(m - m_{mean})^T$$
§ Mixture of Gaussian Estimate Approach
§ BK is modeled as multiple Gaussian distributions
§ Multiple frequency Gaussian channels
§ Markov Model, EM (Expectation-Maximization) approaches
Statistical Feature Model
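A simplified sketch of a single-Gaussian background with temporal blending. To stay short it models scalar intensity per pixel rather than the 4D feature vector, and it blends the variance with the same factor α as the mean; both choices, along with the class name and default values, are assumptions for illustration.

```python
import numpy as np

class GaussianBackground:
    """Per-pixel Gaussian on intensity (a scalar stand-in for the 4D feature)."""

    def __init__(self, first_frame, alpha=0.05, initial_var=25.0):
        self.mean = np.asarray(first_frame, dtype=float).copy()
        self.var = np.full(self.mean.shape, float(initial_var))
        self.alpha = alpha

    def classify(self, frame, k=2.5):
        """True where the pixel lies more than k standard deviations from the mean."""
        return np.abs(frame - self.mean) > k * np.sqrt(self.var)

    def update(self, frame):
        """Blend B_new = (1 - alpha) * B_old + alpha * m on background pixels only."""
        fg = self.classify(frame)
        diff = frame - self.mean
        a = np.where(fg, 0.0, self.alpha)  # do not absorb foreground pixels
        self.mean = self.mean + a * diff
        self.var = (1 - a) * self.var + a * diff ** 2
        return fg

model = GaussianBackground(np.full((3, 3), 100.0))
frame = np.full((3, 3), 101.0)
frame[0, 0] = 200.0  # a foreground pixel
fg = model.update(frame)
```

Freezing the update at foreground pixels is one common way to keep a stopped target from being blended into the background too quickly.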
26
§ Motion Estimate Techniques
§ Instead of using the constant-intensity constraint, BK is modeled as a constant motion/optical flow field
$$I_x u + I_y v + I_t = 0$$
§ Matching feature is a 3D vector, i.e. $m = [I_x, I_y, I_t]$
§ Background update is an optical flow estimation problem
$$\begin{bmatrix} u \\ v \end{bmatrix} = -\begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum I_x I_t \\ \sum I_y I_t \end{bmatrix}$$
§ Extend to multi-resolution detection and update
Motion Field Model
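The least-squares flow solution can be sketched on a synthetic pattern as below. It assumes the sums run over the whole patch, uses finite differences for the spatial gradients, and takes the frame difference as I_t; the test pattern and function names are made up for the example.

```python
import numpy as np

def flow_least_squares(prev_frame, next_frame):
    """Solve [u, v] = -A^{-1} b, with A and b built from summed gradient products."""
    ix = np.gradient(prev_frame, axis=1)  # derivative along x (columns)
    iy = np.gradient(prev_frame, axis=0)  # derivative along y (rows)
    it = next_frame - prev_frame          # temporal derivative
    a = np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                  [np.sum(ix * iy), np.sum(iy * iy)]])
    b = np.array([np.sum(ix * it), np.sum(iy * it)])
    return -np.linalg.solve(a, b)

def pattern(x, y):
    """Smooth synthetic texture with gradients in both directions."""
    return np.sin(0.2 * x) + np.sin(0.15 * y)

ys, xs = np.mgrid[0:64, 0:64].astype(float)
true_u, true_v = 0.3, -0.2
frame0 = pattern(xs, ys)
frame1 = pattern(xs - true_u, ys - true_v)  # content moved by (u, v)
u, v = flow_least_squares(frame0, frame1)
```

The estimate is only valid for small displacements, which is one reason the slides pair it with multi-resolution processing.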
27
§ Statistical Prediction Techniques
§ The BK pixels are predicted – what is expected in the next input frame
$$B_t(x,y) = \sum_{i=1} a_i\, I_{t-i}(x,y)$$
§ Linear estimation problem - LS, Wiener filtering
$$E[e_t^2] = E[I_t^2] + \sum_{i=1} a_i\, E[I_t I_{t-i}]$$
§ More complex prediction model is possible
§ Motion/optical field prediction model
§ Non-linear prediction model
Prediction Model
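A sketch of the linear prediction idea for a single pixel, assuming the coefficients a_i are fitted by ordinary least squares over that pixel's recent history (the LS option named above). The function names, the prediction order, and the sinusoidal test signal are illustrative choices.

```python
import numpy as np

def fit_predictor(history, order):
    """Least-squares coefficients a_1..a_p for B_t = sum_i a_i * I_{t-i}."""
    history = np.asarray(history, dtype=float)
    # Each row holds [I_{t-1}, I_{t-2}, ...] for one target sample I_t.
    rows = [history[t - order:t][::-1] for t in range(order, len(history))]
    targets = history[order:]
    coeffs, *_ = np.linalg.lstsq(np.array(rows), targets, rcond=None)
    return coeffs

def predict_next(history, coeffs):
    """Predicted background value for the next frame."""
    recent = np.asarray(history[-len(coeffs):], dtype=float)[::-1]
    return float(np.dot(coeffs, recent))

# A periodically varying background pixel (for example, a waving branch).
t = np.arange(40)
history = 10.0 * np.sin(0.3 * t)
coeffs = fit_predictor(history, order=2)
predicted = predict_next(history, coeffs)
actual_next = 10.0 * np.sin(0.3 * 40)
```

A pixel would then be flagged as foreground when the observed value differs from the prediction by much more than the expected error.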
28
§ Estimate as a Recognition Problem
§ Training – motionless background frames
§ Feature extraction – statistical image feature
§ Eigenbackground
§ PCA (Principal Component Analysis)
§ Matching – PCA projection
Statistical Recognition Model
[Figure: Image space, PCA space, Image space; labels: Background training, Live video projection, Foreground, Background]
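The eigenbackground idea can be sketched as follows: PCA is trained on motionless background frames, a live frame is projected into PCA space and reconstructed, and pixels with a large reconstruction residual are labeled foreground. The function names, the threshold, and the synthetic frames (a fixed pattern under varying global brightness) are assumptions for illustration.

```python
import numpy as np

def train_eigenbackground(frames, n_components):
    """PCA of flattened background frames: mean image plus leading eigenvectors."""
    data = np.array([np.ravel(f) for f in frames], dtype=float)
    mean = data.mean(axis=0)
    # Rows of vt are the principal directions of the centered training set.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_components]

def detect_foreground(frame, mean, basis, threshold):
    """Project to PCA space, reconstruct, and threshold the residual."""
    x = np.ravel(frame).astype(float) - mean
    reconstruction = basis.T @ (basis @ x)
    residual = np.abs(x - reconstruction).reshape(np.shape(frame))
    return residual > threshold

base = np.arange(64, dtype=float).reshape(8, 8)
training = [base * gain for gain in (0.8, 0.9, 1.0, 1.1, 1.2)]
mean, basis = train_eigenbackground(training, n_components=1)

live = base * 1.05
live[3, 3] += 50.0  # a foreground object
mask = detect_foreground(live, mean, basis, threshold=10.0)
```

Because the brightness change lies inside the learned subspace, it is reconstructed well and only the inserted object survives the threshold.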
29
- The problem with the above approaches is that they separate the three detection/tracking processes into independent phases
  - Low level – pixel-wise detection/segmentation
  - Middle level – labeling pixels as grouped targets
  - High level – temporal tracking, spatial recognition
- An integrated approach
  - integrating the pixel classification, region detection, and inter-frame tracking in a closed-loop manner
More Techniques
Frame-wise processing (Matching)
Region-wise processing (Clustering)
Pixel-wise processing (Segmentation)
……
K-means clustering
Linear prediction
30
- Illumination invariant
More Techniques (con.)
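$$I_{x,y} = \alpha\,(L_{x,y}\,\rho_{x,y}\,\varphi_{x,y})^{\gamma}$$
$$\log(I_{x,y}) = \log(\alpha) + \underbrace{\gamma\log(L_{x,y})}_{\text{illumination-dependent}} + \underbrace{\gamma\log(\rho_{x,y}) + \gamma\log(\varphi_{x,y})}_{\text{illumination-invariant}}$$
$$I(x,y) = L(x,y) + M(x,y)$$
Surface lighting model
$$\hat{L}(x,y) = \mathrm{Filter}(I(x,y))$$
$$\hat{M}(x,y) = I(x,y) - \hat{L}(x,y)$$
or
$$\hat{m}(x,y) = \exp(I(x,y) - \hat{L}(x,y))$$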
Strong surface shading: effective
Strong illumination gradients: less effective
Low intensity: ineffective or worse
Illumination invariant
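A sketch of the illumination-invariant estimate, assuming the additive model applies to log intensity and that Filter(·) is a simple mean (box) filter; the slides do not name the filter, so `box_filter`, its radius, and the test scene are placeholders.

```python
import numpy as np

def box_filter(image, radius):
    """Mean filter with edge replication (an assumed stand-in for Filter)."""
    padded = np.pad(image, radius, mode="edge")
    size = 2 * radius + 1
    out = np.zeros_like(image, dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out / size ** 2

def illumination_invariant(intensity, radius=3):
    """M_hat = log(I) - Filter(log(I)): remove the slowly varying lighting term."""
    log_i = np.log(np.asarray(intensity, dtype=float))
    return log_i - box_filter(log_i, radius)

scene = np.random.default_rng(2).uniform(50.0, 200.0, size=(16, 16))
m_bright = illumination_invariant(scene)
m_dim = illumination_invariant(0.4 * scene)  # same scene under dimmer lighting
```

A global lighting change adds a constant in the log domain, which the low-pass estimate absorbs, so the two results agree. Very dark pixels are a problem for the logarithm, in line with the low-intensity caveat above.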
31
§ Moving cameras
- Background is “moving” – camera motion
- Foreground is moving
§ Motion compensation
- Registering images and computing the geometric transformation that compensates the source image so that it aligns with the reference images
Target Detection/Tracking (moving sensor)
[Block diagram labels: Video image, Preprocessing, Motion compensation, Background matching, Background model, Detection, Tracking]
32
Motion Compensation
Parametric model: translation, affine, and perspective
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} v_0 + v_1 x + v_2 y + u_1 x^2 + u_2 xy \\ v_3 + v_4 x + v_5 y + u_1 y^2 + u_2 xy \end{bmatrix}$$
Model fitting: an over-constrained optimal estimate problem
- It's hard – BK contains moving objects
- Motion vector field vs. feature based approaches
- Iterative vs. non-iterative approaches
33
Motion Vector Field Estimation
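Parametric model – Affine transformation
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} v_0 & v_1 \\ v_2 & v_3 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} v_4 \\ v_5 \end{bmatrix}$$
Optical flow tracking and warping
[Figure: Frame i-1 with source point, Frame i with target point; Affine Warp; SSD; regions R_t0, R_t, R_c]
Affine model defines warp of source frame to a reference frame
Normalized SSD measures the difference between warped source and target
Multi-resolution iterative refinement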
34
Dynamic Object Tracking: Results
Hand-held camera: Multiple objects
Tracked object visualized in 3D
Hand-held camera: Integration of mosaic, image stabilization, and object tracking
35
Dynamic Object Tracking: Results
UAV sensor: Integration of mosaic, image stabilization, and object tracking
36
Feature Matching
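Parametric model – Affine transformation
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} v_0 & v_1 \\ v_2 & v_3 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} v_4 \\ v_5 \end{bmatrix}$$
Feature tracking and warping
Multi-resolution iterative refinement
[Figure: Frame i-1 and Frame i; Feature selection (N) & SSD; Affine (T1), Affine (T2), …, Affine (TM); Selection optimal T; Affine Warp]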
37
Others
Perceptual Grouping Methodology - Tensor Voting
- A simulation of perceptual organization – infer what we perceive from noisy/missing data
- A Computational Framework for Segmentation and Grouping (formalized by USC Prof. Gérard Medioni)
- Tensor Voting
  - Description – data is represented as tensors to generate descriptions in terms of surfaces, regions, curves, and labeled junctions, from sparse, noisy, binary data in 2D/3D
  - Voting – how the tensors communicate and propagate information between neighbors
- Has been applied to many vision problems, including
  - Segmentation/detection
  - Motion tracking, trajectory extraction
  - Stereo vision
  - Epipolar geometry estimation
38
Others (con.)
§ Multi-view Cameras
- Continuous cross-view tracking
  - Stationary platform – Stationary platform
  - Stationary platform – Moving platform
  - Moving platform – Moving platform
- Requires continuous and complete tracking trajectories
- Requires registration of trajectories and viewpoints
39
Omnidirectional Image
- Wide (360 degree) horizontal FOV
- Fewer partial occlusions
- Fewer motion ambiguities (pure translation and rotation)
- Limited resolution – used for close range objects
Others (con.)
40
Benefits Using Panoramic Imaging
§ Wide FOV ensures:
- A sufficient number of features for tracking
- Less partial occlusion
§ Accurate estimates for large motion
- Provides sufficient information for distinguishing motion ambiguities (pure translation and rotation)
41
Integration of Imagery and Range Data
- Wide coverage
- Rapidness and robustness
- Direct recovery of 3D models and geolocations
Others (con.)
[Block diagram labels: Live images, Reference images, DEM, Camera parameters, Image warping, Residual estimate, Space Filtering, Detected targets]
LiDAR typically has an accuracy of ~0.5-1.0 m in ground spacing and centimeters in height