8/12/2019 Schneiderman Kanade Viola Jones Presentation
Object Detection Using the Statistics of Parts
Henry Schneiderman and Takeo Kanade
Robust Real-time Object Detection
Paul Viola and Michael Jones
Presented by Nicholas Chan
16-721 Advanced Perception
Object Detection Using the Statistics of Parts
Object detectors trained using information on image parts
Henry Schneiderman and Takeo Kanade
So what's a part?
Intuitively, a part is a portion of an object. For the purposes of image processing, a part is a group of features that are statistically dependent.
The assumption is that certain groups of pixels in an image tend to appear together and are (relatively) independent of other groups.
Choosing parts
First, a wavelet transform is applied to the image. This decorrelates the pixels, localizing dependencies and therefore producing more focused parts.
A wavelet transform is the result of applying a series of wavelet filters to an image. The result is a set of horizontal, vertical and diagonal responses at several scales.
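As a rough illustration of "horizontal, vertical and diagonal responses for several scales", here is a minimal sketch of a multi-level 2D transform. The slides do not say which wavelet filters are used, so the Haar wavelet here is an assumption chosen for brevity, and the function names `haar_level` and `haar_pyramid` are hypothetical.

```python
def haar_level(img):
    """One level of a 2D Haar transform (an assumed wavelet, see lead-in).

    img is a list of lists with even height and width. Returns four
    half-resolution subbands: (approx, horizontal, vertical, diagonal).
    """
    approx, horiz, vert, diag = [], [], [], []
    for r in range(0, len(img), 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, len(img[0]), 2):
            p, q = img[r][c], img[r][c + 1]          # top-left, top-right
            s, t = img[r + 1][c], img[r + 1][c + 1]  # bottom-left, bottom-right
            a_row.append((p + q + s + t) / 4.0)  # local average
            h_row.append((p + q - s - t) / 4.0)  # top minus bottom: horizontal edges
            v_row.append((p - q + s - t) / 4.0)  # left minus right: vertical edges
            d_row.append((p - q - s + t) / 4.0)  # diagonal structure
        approx.append(a_row)
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return approx, horiz, vert, diag


def haar_pyramid(img, levels):
    """Re-apply the transform to the approximation band to get several scales."""
    details = []
    current = img
    for _ in range(levels):
        current, h, v, d = haar_level(current)
        details.append((h, v, d))
    return current, details
```

Each entry of `details` holds the three oriented responses for one scale, finest first.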
Choosing parts (2)
Next, seventeen hand-designed local operators are applied across the image. These local operators combine pairs of filter results from the wavelet transform. Some relate horizontal to vertical responses, whereas others relate responses to those of the same orientation but a different scale.
The output is discrete over 3^8 values. These are the parts.
Are we even talking about parts of anything anymore..?
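One way an operator output could end up "discrete over 3^8 values" is to quantize eight coefficients into three levels each and read the result as a base-3 number. The sketch below assumes exactly that; the choice of eight coefficients, the symmetric threshold, and the names `quantize3` and `part_index` are illustrative assumptions, not details given on the slide.

```python
def quantize3(value, threshold=1.0):
    """Map a coefficient to one of three levels: 0 (negative), 1 (near zero), 2 (positive).

    The threshold is a hypothetical parameter for this sketch.
    """
    if value < -threshold:
        return 0
    if value > threshold:
        return 2
    return 1


def part_index(coeffs, threshold=1.0):
    """Encode eight quantized coefficients as one integer in [0, 3**8 - 1]."""
    if len(coeffs) != 8:
        raise ValueError("expected exactly 8 coefficients")
    index = 0
    for c in coeffs:
        index = index * 3 + quantize3(c, threshold)  # base-3 digit per coefficient
    return index
```

With this encoding a part is just an integer, which makes the later probability tables simple lookups.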
Choosing parts (3)
[Figure: four kinds of wavelet-coefficient groupings (intra-subband, inter-orientation, inter-frequency, inter-frequency/inter-orientation) are fed through the local operators (the "Box o' Mystery") to produce the parts.]
Classification by parts
Using this definition of parts and the base assumption that pixels within parts are independent of those outside parts, a classifier can be obtained:

$$\prod_{r} \frac{P(\mathrm{part}_r \mid \mathrm{object})}{P(\mathrm{part}_r \mid \text{non-object})}$$

A simple independence assumption
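The product of per-part ratios can be evaluated as a sum of logs, which avoids numerical underflow when many parts are multiplied. The following is a minimal sketch under stated assumptions: the probability tables are plain dictionaries, and the decision threshold `log_threshold` is a hypothetical parameter (the slide shows only the ratio itself).

```python
import math


def log_likelihood_ratio(parts, p_obj, p_non):
    """Sum of log P(part | object) - log P(part | non-object) over observed parts.

    p_obj and p_non map a part value to its (nonzero) probability.
    """
    total = 0.0
    for part in parts:
        total += math.log(p_obj[part]) - math.log(p_non[part])
    return total


def classify(parts, p_obj, p_non, log_threshold=0.0):
    """Declare 'object' when the log-ratio exceeds an assumed threshold."""
    return log_likelihood_ratio(parts, p_obj, p_non) > log_threshold
```

For example, with `p_obj = {"a": 0.8, "b": 0.2}` and `p_non = {"a": 0.2, "b": 0.8}`, a window showing mostly part `"a"` is classified as an object.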
Learning by parts
P(part | object) and P(part | non-object) are calculated with a simple MLE:

$$P(\mathrm{part} \mid \mathrm{object}) = \frac{\mathrm{count}(\mathrm{part} \,\&\, \mathrm{object})}{\mathrm{count}(\mathrm{object})}$$
AdaBoost is used to improve classification accuracy (more on this later).
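The counting estimate above amounts to building two histograms, one per class. A minimal sketch, assuming training data arrives as `(part_value, label)` pairs; the data layout and the name `estimate_part_probs` are hypothetical, and no smoothing is applied, so part values never seen for a class simply have no entry.

```python
from collections import Counter


def estimate_part_probs(observations):
    """Estimate P(part | label) as count(part & label) / count(label).

    observations: iterable of (part_value, label) pairs, where label is
    e.g. "object" or "non-object".
    """
    joint = Counter()      # count(part & label)
    per_label = Counter()  # count(label)
    for part, label in observations:
        joint[(part, label)] += 1
        per_label[label] += 1
    return {key: n / per_label[key[1]] for key, n in joint.items()}
```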
Detection examples
Robust Real-time Object Detection
Paul Viola and Michael Jones
High-speed face detection with good accuracy
The detector
A simple filter bank with learned weights applied across the image
But with some notable performance-boosting implementation tricks
Three big speed gains
Integral image representation and rectangle features
Selection of a small but effective feature set with AdaBoost
Cascading simple detectors to quickly eliminate false positives
The integral image representation
An image representation that stores, at each image point, the sum of the intensity values above and to the left of that point.
[Figure: the grey region above and to the left of the point (x, y). IntegralImage(x, y) = sum of the values in the grey region.]
So what's it good for?
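The definition can be computed in a single pass over the image using a running row sum. This is a minimal sketch; it assumes the sum is inclusive of the point (x, y) itself, and the function name `integral_image` is illustrative.

```python
def integral_image(img):
    """Return ii where ii[y][x] = sum of img[r][c] for all r <= y and c <= x.

    img is a list of rows of numbers. The inclusive convention is an
    assumption of this sketch.
    """
    rows, cols = len(img), len(img[0])
    ii = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(cols):
            row_sum += img[y][x]
            above = ii[y - 1][x] if y > 0 else 0
            ii[y][x] = row_sum + above
    return ii
```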
The integral image representation
This representation allows rectangular feature responses to be calculated in constant time. Rectangular features are simple filters that have only +1 and -1 values and are, well, rectangles.
[Figure: two-rectangle features; three-rectangle features; a third kind ("I bet you can guess what these are called").]
With an integral image and rectangular features, filter responses are just a fixed number of table lookups and additions away.
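To make the "fixed number of table lookups" concrete: any rectangle sum needs four lookups, and a two-rectangle feature is the difference of two such sums. The sketch below pads the integral image with a zero row and column to avoid boundary checks; the padding and the function names are implementation choices for this example, not something the slides prescribe.

```python
def padded_integral_image(img):
    """Integral image with an extra zero row and column at the top and left.

    ii[y][x] holds the sum of img over rows < y and columns < x.
    """
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        for x in range(cols):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using exactly four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Left half weighted +1, right half weighted -1 (width must be even)."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))
```

The cost of `two_rect_feature` does not depend on the rectangle size, which is what makes scanning at many scales cheap.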
Speed gain number two: AdaBoost selected features
AdaBoost is used to select the best set of rectangular features.
AdaBoost iteratively trains a classifier by emphasizing misclassified training data. The feature weights it assigns are used to select the most important features.
[Figure: the top two features weighted by AdaBoost]
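The loop of "pick the best feature, then emphasize the examples it got wrong" can be sketched as follows. This is textbook discrete AdaBoost over single-feature threshold classifiers, written under assumptions: it is not necessarily the exact variant used in the paper, it uses a brute-force threshold search, and all names (`best_stump`, `adaboost`, `predict`) are illustrative.

```python
import math


def stump_predict(row, j, thr, polarity):
    """Predict +polarity if feature j is at or above thr, else -polarity."""
    return polarity if row[j] >= thr else -polarity


def best_stump(rows, labels, weights):
    """Return (error, feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for j in range(len(rows[0])):
        for thr in sorted({row[j] for row in rows}):
            for polarity in (1, -1):
                err = sum(w for row, y, w in zip(rows, labels, weights)
                          if stump_predict(row, j, thr, polarity) != y)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def adaboost(rows, labels, rounds):
    """rows[i][j] is feature j of sample i; labels are +1 or -1."""
    n = len(labels)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, j, thr, pol = best_stump(rows, labels, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, j, thr, pol))
        # Misclassified samples gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(row, j, thr, pol))
                   for row, y, w in zip(rows, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all selected stumps."""
    score = sum(a * stump_predict(row, j, thr, pol) for a, j, thr, pol in ensemble)
    return 1 if score >= 0 else -1
```

Each round adds one feature to the ensemble, so running a fixed number of rounds doubles as feature selection.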
Intermediate results
The face detector using 200 AdaBoost-selected features achieved a false positive rate of 1 in 14084 when tuned for a 95% classification rate.
A 384x288 image took 0.7 seconds to scan.
There are more improvements to be made
Speed gain number three: Cascading detectors
Instead of applying all 200 filters at every location in the image, train several simpler classifiers to quickly eliminate easy negatives.
Each successive filter can be trained on true positives and the false positives passed by the filters before it.
The filters are trained to allow approximately 10% false positives.
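[Figure: a single 200-feature classifier that accepts or rejects each image segment, compared with a cascade of 20-feature classifiers in which each stage either rejects the segment or passes it on to the next stage.]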
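The control flow of a cascade is short enough to write out: a window is rejected as soon as any stage says no, so most windows cost only the first stage or two. A minimal sketch, assuming each stage is a callable that returns True to pass the window along; `run_cascade` and the toy stages in the usage note are hypothetical.

```python
def run_cascade(stages, window):
    """Run a window through the stages in order, stopping at the first rejection.

    stages: list of callables, each returning True (pass) or False (reject).
    Returns (accepted, number_of_stages_evaluated).
    """
    for evaluated, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, evaluated  # early exit: later stages are never run
    return True, len(stages)
```

For instance, with the toy stages `[lambda w: sum(w) > 10, lambda w: max(w) > 5]`, the window `[1, 1, 1]` is rejected after one stage, while `[6, 6]` passes both.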
Cascade improvements
The cascaded detector provides comparable accuracy, but at ten times the speed.
Results
Good accuracy with very fast evaluation.
0.067 seconds per image. On average, 8 out of 4297 features are evaluated.
Detection examples