Detecting Pedestrians Using Patterns of Motion and Appearance
Paul Viola, Microsoft Research
Irfan Ullah, Dept. of Info. and Comm. Engr., Myongji University
Michael J. Jones and Daniel Snow, Mitsubishi Electric Research Laboratories
Copyright © solarlits.com
Contents
1. Introduction
2. Background
3. System architecture
4. Objective
5. Rectangle features
6. Boosting algorithm
7. Training algorithm
8. Detection results
9. Conclusions
• Pattern recognition approaches
• Face, automobile, and pedestrian detection
• Works well for face detection
Introduction
Automobile
Face detection
Pedestrian detection
Training examples
Detector
Scanning
Pattern of intensities
• Earlier work presumed that the moving object had already been detected
• The remaining problem was to recognize, categorize, or analyze the long-term pattern of motion
Background
• R. Cutler and L. S. Davis (2000): low resolution, 9 x 15 pixels
• Gavrila and Philomin (1999): pedestrian detection in static images; detection rate: 75%; false positive rate: 2 per image
• Papageorgiou et al. (1998): support vector machine; false positive rate higher than for related face detection systems
• Paul Viola and Michael J. Jones (2004): rectangle features and AdaBoost
System architecture
• Input image
• Rectangle filters: two-rectangle and three-rectangle features
• Motion filters (computed with the integral image):
  1. Difference
  2. Motion
  3. Direction of motion: U, D, L, and R
• Filters: fi, fj, fk, fm
• AdaBoost: builds the classifier from features (a feature is a thresholded filter)
• Final classifier: pedestrian detection
Training Process
• Pedestrian detection system
• Integrates image intensity information with motion information
• Detection style algorithm (using AdaBoost)
• Detectors based on motion information and detectors based on appearance information
• Runs at about 4 frames/second and detects pedestrians as small as 20 x 15 pixels
• Representation of image motion
• Pedestrian detection system
• Works under difficult conditions (rain and snow)
• Full human figures
Objective
Example
Rectangle features
• Two-rectangle feature: difference between the sum of the pixels within two rectangular regions
• Three-rectangle feature: sum within two outside rectangles subtracted from the sum in a center rectangle
• Four-rectangle feature: difference between diagonal pairs of rectangles
Figure labels: Dark - Bright; (Bright1 + Bright2) - Dark
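The three feature types can be sketched with plain array sums. This is a minimal illustration, not the presenters' implementation: the function names, the side-by-side layout of the rectangles, and the use of NumPy are all assumptions made for the example.

```python
import numpy as np

def two_rectangle_feature(window):
    """Difference between the pixel sums of two adjacent regions.
    Assumed layout: left half minus right half of the window."""
    window = np.asarray(window, dtype=np.float64)
    half = window.shape[1] // 2
    return float(window[:, :half].sum() - window[:, half:2 * half].sum())

def three_rectangle_feature(window):
    """Sum of the two outside rectangles subtracted from the center sum.
    Assumed layout: three vertical strips of equal width."""
    window = np.asarray(window, dtype=np.float64)
    third = window.shape[1] // 3
    left = window[:, :third].sum()
    center = window[:, third:2 * third].sum()
    right = window[:, 2 * third:3 * third].sum()
    return float(center - (left + right))

def four_rectangle_feature(window):
    """Difference between the two diagonal pairs of quadrants."""
    window = np.asarray(window, dtype=np.float64)
    hh, hw = window.shape[0] // 2, window.shape[1] // 2
    tl = window[:hh, :hw].sum()
    tr = window[:hh, hw:2 * hw].sum()
    bl = window[hh:2 * hh, :hw].sum()
    br = window[hh:2 * hh, hw:2 * hw].sum()
    return float((tl + br) - (tr + bl))
```

Converting to floating point first avoids wrap-around when the input is an unsigned 8-bit image.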
Integral image
“Intermediate representation for the image”
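• Integral image at (x, y): sum of the pixels of the original image above and to the left of (x, y)
• Computed using the cumulative row sum
• Sum of the pixels within rectangle D, from the integral image at four corner points:
  • Point 1: sum of pixels in A
  • Point 2: A + B
  • Point 3: A + C
  • Point 4: A + B + C + D
  • Sum within D = 4 + 1 - (2 + 3)
• Integral image: double integral of the image, first along rows and then along columns
• i is the image and r is the box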
Simard et al. (1999)
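The four-corner lookup 4 + 1 - (2 + 3) can be sketched as follows. This is a minimal example assuming NumPy; the function names and the extra zero row and column (added so that boxes touching the image border need no special case) are choices made for the illustration.

```python
import numpy as np

def integral_image(img):
    """Cumulative sum along rows, then along columns, with a leading
    row and column of zeros (an assumption of this sketch)."""
    img = np.asarray(img, dtype=np.float64)
    ii = img.cumsum(axis=1).cumsum(axis=0)
    return np.pad(ii, ((1, 0), (1, 0)))

def box_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle using four lookups."""
    bottom, right = top + height, left + width
    p1 = ii[top, left]       # point 1: A
    p2 = ii[top, right]      # point 2: A + B
    p3 = ii[bottom, left]    # point 3: A + C
    p4 = ii[bottom, right]   # point 4: A + B + C + D
    return p4 + p1 - (p2 + p3)
```

Once the integral image is built, every rectangle sum costs four array reads regardless of the rectangle size.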
• Rectangle filters on motion pair
• Two-rectangle filters
  • Sum of the pixels within the lighter rectangles minus the sum of the pixels in the darker rectangles
• Three-rectangle filters
  • Sum of the pixels in the darker rectangle is multiplied by 2 to account for twice as many lighter pixels
Detection of Motion Patterns
Bright-dark
• Motion information
• Optical flow
  • 100s or 1000s of operations per pixel
• Block motion estimation
  • Not entirely compatible with multi-scale object detection
• Differences between pairs of images in time
• Motion: regions where the sum of the absolute values of the differences is large
• Direction of motion: difference between shifted versions of the second image in time and the first image
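The difference images described above can be sketched as below. The one-pixel shift, the zero fill at the border, and the names `shift` and `motion_images` are assumptions made for this illustration; the slides only state that shifted versions of the second image are compared with the first.

```python
import numpy as np

def shift(img, dy, dx):
    """Shift an image by (dy, dx) pixels, filling the vacated border with zeros."""
    h, w = img.shape
    out = np.zeros_like(img)
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)] = src
    return out

def motion_images(frame_t, frame_t1):
    """Absolute differences between the first frame and the second frame,
    unshifted (delta) and shifted up, down, left, and right by one pixel."""
    a = np.asarray(frame_t, dtype=np.float64)
    b = np.asarray(frame_t1, dtype=np.float64)
    return {
        "delta": np.abs(a - b),
        "U": np.abs(a - shift(b, -1, 0)),
        "D": np.abs(a - shift(b, 1, 0)),
        "L": np.abs(a - shift(b, 0, -1)),
        "R": np.abs(a - shift(b, 0, 1)),
    }
```

For a region that moved one pixel to the right between frames, shifting the second frame back to the left cancels the motion, so the "L" image is small there while "delta" is large.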
Detection of Motion Patterns
Filters
• Filter fi: ri() is a single box rectangular sum within the detection window; S is one of {U, L, R, D}; responds to a region moving in a given direction
• Filter fj: φj is one of the rectangle filters; measures something closer to motion shear
• Filter fk: rk() is a single box rectangular sum within the detection window; measures the magnitude of motion in one of the motion images
• Filter fm: appearance filter
Integral image
Classifier
Feature is a thresholded filter that outputs one of two votes
Classifier is a thresholded sum of features
• ti ∈ ℝ is a feature threshold
• fi is one of the motion or appearance filters
• Real-valued α and β are computed during AdaBoost learning, as are the filter threshold ti and the classifier threshold θ
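The two definitions above (a feature is a thresholded filter that casts one of two votes; the classifier is a thresholded sum of features) can be sketched as follows. The `Feature` container, the function names, and the direction of the inequalities (vote α when the filter output exceeds ti, accept when the sum exceeds θ) are assumptions of this sketch.

```python
from typing import Callable, NamedTuple, Sequence

class Feature(NamedTuple):
    filter_fn: Callable[[object], float]  # one motion or appearance filter f_i
    threshold: float                      # feature threshold t_i
    alpha: float                          # vote when the filter output exceeds t_i
    beta: float                           # vote otherwise

def feature_vote(feature: Feature, window) -> float:
    """A feature outputs one of two votes, alpha or beta."""
    if feature.filter_fn(window) > feature.threshold:
        return feature.alpha
    return feature.beta

def classify(features: Sequence[Feature], theta: float, window) -> int:
    """The classifier is a thresholded sum of feature votes."""
    total = sum(feature_vote(f, window) for f in features)
    return 1 if total > theta else 0
```

In a real detector the window would carry the image and its motion images; here it can be any object the filter functions understand.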
Detection at multiple scales
• Training images are scaled during the training process
• 20 × 15 training images
• Pyramids are computed
• Scale factor: 0.8 to generate each successive layer of the pyramid
where Xl refers to the lth level of the pyramid
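As a rough illustration of the pyramid geometry, the sketch below lists the size of each level for a scale factor of 0.8. The function name, the rounding rule, and the stopping condition (stop once a level is smaller than an assumed 20-pixel-high by 15-pixel-wide detection window) are assumptions, not details given in the slides.

```python
def pyramid_shapes(height, width, scale=0.8, min_height=20, min_width=15):
    """Return (height, width) for each pyramid level, largest first."""
    shapes = []
    h, w = float(height), float(width)
    # Keep adding levels while the detection window still fits.
    while round(h) >= min_height and round(w) >= min_width:
        shapes.append((round(h), round(w)))
        h *= scale
        w *= scale
    return shapes
```

The same level sizes would be used for both frames, so that each difference image at level l is computed from the two frames at level l.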
“Select the features and to train the classifier”
Combining a collection of weak classification functions to form a stronger classifier
AdaBoost
Weak classifier
• f: feature
• θ: threshold
• p: polarity (direction of the inequality)
• x is a (24 × 24) pixel sub-window of an image
“Generates final classifier”
Depends on designed system
Boosting algorithm
• Given example images, initialize weights (m and l are the number of negatives and positives)
• For each round t:
  • Normalize weights
  • Select the best weak classifier with respect to the weighted error
  • Define ht(x), where ft, pt, and θt are the minimizers of the error εt
  • Update weights (ei = 0 if xi is classified correctly, ei = 1 otherwise)
• Final strong classifier
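The boosting loop can be sketched as below, assuming the standard Viola-Jones formulation: initial weights 1/(2m) and 1/(2l), weak classifier h(x) = 1 if p·f(x) < p·θ, update factor β = ε/(1 − ε), vote α = log(1/β), and a final classifier that accepts when the weighted votes reach half of the total. The exhaustive threshold search and all function names are simplifications made for this example.

```python
import numpy as np

def best_stump(values, labels, weights):
    """Return (error, feature index, polarity, threshold) with the lowest
    weighted error, for weak classifiers h(x) = 1 if p*f(x) < p*theta."""
    best = (np.inf, 0, 1, 0.0)
    for j in range(values.shape[1]):
        col = values[:, j]
        # Candidate thresholds: every observed value plus one above the maximum.
        for theta in np.append(np.unique(col), col.max() + 1.0):
            for p in (1, -1):
                pred = (p * col < p * theta).astype(int)
                err = float(np.sum(weights * (pred != labels)))
                if err < best[0]:
                    best = (err, j, p, float(theta))
    return best

def adaboost(values, labels, rounds):
    """values: (examples x features) filter outputs; labels: 1 positive, 0 negative."""
    values = np.asarray(values, dtype=np.float64)
    labels = np.asarray(labels)
    l = int(labels.sum())            # number of positives
    m = len(labels) - l              # number of negatives
    weights = np.where(labels == 1, 1.0 / (2 * l), 1.0 / (2 * m))
    stumps = []
    for _ in range(rounds):
        weights = weights / weights.sum()                 # normalize weights
        err, j, p, theta = best_stump(values, labels, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)             # keep beta finite
        beta = err / (1.0 - err)
        pred = (p * values[:, j] < p * theta).astype(int)
        e = (pred != labels).astype(int)                  # 0 if correct, 1 otherwise
        weights = weights * beta ** (1 - e)               # shrink correct examples
        stumps.append((float(np.log(1.0 / beta)), j, p, theta))
    return stumps

def strong_classify(stumps, x):
    """Final strong classifier: weighted vote of the selected weak classifiers."""
    total = sum(a for a, j, p, t in stumps if p * x[j] < p * t)
    return int(total >= 0.5 * sum(a for a, *_ in stumps))
```

A practical implementation sorts each feature column once per round instead of testing every threshold from scratch; the brute-force search here is only meant to keep the sketch short.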
Training process
• To select a subset of features and construct the classifier
• AdaBoost
• Learning round
  • Appearance filters
  • Motion direction filters
  • Motion shear filters
  • Motion magnitude filters
  • Threshold
  • α and β votes of each feature
• Lowest weighted error
Cascade architecture
• Fewest features
• False positive rate and detection rate
• “classifiers are applied to every sub-window”
• Initial classifier eliminates a large number of negative examples with very little processing
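The early-rejection behaviour of a cascade can be sketched as follows. Each stage is modelled as a hypothetical callable returning True (pass on) or False (reject); the function names are illustrative only.

```python
from typing import Callable, List, Sequence

def cascade_classify(stages: Sequence[Callable[[object], bool]], window) -> bool:
    """Apply the stages in order and stop at the first rejection."""
    for stage in stages:
        if not stage(window):
            return False  # rejected early: later, costlier stages never run
    return True

def scan(stages: Sequence[Callable[[object], bool]], windows: Sequence[object]) -> List[int]:
    """Return the indices of the sub-windows accepted by every stage."""
    return [i for i, w in enumerate(windows) if cascade_classify(stages, w)]
```

Because most sub-windows are negatives, placing a cheap stage first means the bulk of the image is discarded after very few feature evaluations.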
Training process
False positive rate of the cascade
• K: number of classifiers
• fi: false positive rate of the ith classifier on the examples
Detection rate
• di: detection rate of the ith classifier on the examples
Expected number of features
• pi is the positive rate of the ith classifier
• ni is the number of features in the ith classifier
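Assuming the usual cascade formulas (the overall false positive rate is the product of the per-stage rates fi, the overall detection rate is the product of the di, and a stage's features are only evaluated for windows that passed every earlier stage), the three quantities can be computed as below. Function names and the example stage values are illustrative.

```python
from math import prod

def cascade_false_positive_rate(f):
    """Product of the per-stage false positive rates f_i."""
    return prod(f)

def cascade_detection_rate(d):
    """Product of the per-stage detection rates d_i."""
    return prod(d)

def expected_features(n, p):
    """Expected number of features evaluated per window: each stage's n_i
    is weighted by the fraction of windows that reach that stage."""
    total, reach = 0.0, 1.0
    for n_i, p_i in zip(n, p):
        total += reach * n_i
        reach *= p_i
    return total
```

For example, ten stages that each keep 99% of the pedestrians and pass 30% of the negatives give an overall detection rate of about 0.90 and a false positive rate of about 6 × 10^-6.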
Optimization framework
• the number of classifier stages
• the number of features, ni, of each stage
• the threshold of each stage
Training algorithm for building a cascaded detector
• User selects f, the acceptable false positive rate per layer, and d, the minimum acceptable detection rate per layer
• User selects the overall false positive rate, Ftarget
• P = set of positive examples; N = set of negative examples
• F0 = 1.0; D0 = 1.0; i = 0
• while Fi > Ftarget:
    • Use P and N to train a classifier with ni features using AdaBoost
    • Evaluate the current classifier
    • Decrease the threshold until the detection rate reaches the minimum acceptable detection rate
    • Evaluate the detector on the set of non-face images and put any false detections into the set N
• 8 video sequences of street scenes with pedestrians
• Each contains 2000 frames
• 6 sequences were used for training
• The other two sequences were used to test the detectors
• Examples
  • 2250 positive and 2250 negative examples
  • 20 × 15 pedestrian images
Experiments
• Variance normalization is performed
• To reduce contrast variations
Experiments
Positive training examples
2250 positive examples
2250 false positives
Detection threshold
• Training
• Dynamic pedestrian detector: 54,624 filters
• Static detector: 24,328 filters
• 20 × 15 pixel window
Training the cascade
Difference in motion
Pedestrians in the center
Stand out from background
The first 5 filters learned for the static pedestrian detector
First 5 filters learned for the dynamic pedestrian detector
Legs Chest
• Dynamic detector
• Few false positives
Detection results
Dynamic detector
Static detector
Rain
• Static detector
• More false positives
Detection results
At 80% detection rate, false positive rates:
• dynamic detector: 1/400,000
• static detector: 1/15,000
At 80% detection rate:
• both detectors: 1/400,000
• a false positive every 2 frames for the 360 × 240 pixel frames
“Sequence 2 has some highly textured areas such as the tree and grass”
• Detection style algorithm
• Combines motion and appearance information
• Low false positive rate
• Low computation time
  • 0.25 seconds to detect pedestrians in a 360 × 240 pixel image
  • With a 2.8 GHz P4 processor
  • 0.1 seconds: scanning the cascade over all positions and scales of the image
  • 0.15 seconds: creating the pyramids of difference images
• Applications
  • Human motion (running, jumping)
  • Facial expression classification
  • Lip reading
Conclusions
Thanks ?