Image Analysis Window-based face detection: The Viola-Jones …cnikou/Courses/Image_Analysis/09_Face... · 2011. 5. 25. · −Split the data set and average over all possible splits

University of Ioannina - Department of Computer Science

Christophoros Nikou

Image Analysis

Images taken from:
D. Forsyth and J. Ponce. Computer Vision: A Modern Approach, Prentice Hall, 2003.
Computer Vision course by Kristen Grauman, University of Texas at Austin.
Computer Vision course by Svetlana Lazebnik, University of North Carolina at Chapel Hill.

Window-based face detection: The Viola-Jones algorithm


2

C. Nikou – Image Analysis (T-14)

Face detection and recognition

3


Face detection and recognition

Detection Recognition “Sally”

4


Consumer application: iPhoto 2009

http://www.apple.com/ilife/iphoto/


5



It can be trained to recognize pets!

http://www.maclife.com/article/news/iphotos_faces_recognizes_cats


6



iPhoto decides that this is a face

http://www.flickr.com/groups/977532@N24/pool/

7


Outline

• Face recognition
– Eigenfaces

• Face detection
– The Viola and Jones algorithm.

P. Viola and M. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2), 2004. Also in CVPR 2001.

M. Turk and A. Pentland. "Eigenfaces for recognition". Journal of Cognitive Neuroscience, 3(1):71–86, 1991. Also in CVPR 1991.

http://www.vision.caltech.edu/html-files/EE148-2005-Spring/pprs/viola04ijcv.pdf

http://www.cs.ucsb.edu/~mturk/Papers/jcn.pdf

8


Recognition by finding patterns

• We have seen very simple template matching methods (using filters).

• Some objects behave like quite simple templates – Frontal face images.

• Strategy:
– Build/train a model (supervised classifier).
– Find image windows.
– Correct lighting.
– Pass them to a statistical test (a classifier) that accepts faces and rejects non-faces with respect to the model.

9


Basic ideas in classifiers

• Loss: some errors may be more expensive than others

– e.g. a fatal disease that is easily cured by a cheap medicine with no side-effects

• false positives in diagnosis are better than false negatives

– We discuss two-class classification: L(1→2) is the loss caused by calling 1 a 2.

• Total risk of strategy s is the total expected loss:

$$R(s) = \Pr\{1 \to 2 \mid s\}\, L(1 \to 2) + \Pr\{2 \to 1 \mid s\}\, L(2 \to 1)$$

• The optimal classifier minimizes the total risk.

10


Basic ideas in classifiers (cont.)

• Generally, we should classify x as 1 if the expected loss of classifying it as 2 is larger than the expected loss of classifying it as 1:

$$p(1 \mid x)\, L(1 \to 2) > p(2 \mid x)\, L(2 \to 1)$$

• Conversely, we should classify x as 2 if

$$p(1 \mid x)\, L(1 \to 2) < p(2 \mid x)\, L(2 \to 1)$$

• Using Bayes rule, classify as 1 if

$$p(x \mid 1)\, p(1)\, L(1 \to 2) > p(x \mid 2)\, p(2)\, L(2 \to 1)$$

and as 2 if

$$p(x \mid 1)\, p(1)\, L(1 \to 2) < p(x \mid 2)\, p(2)\, L(2 \to 1)$$
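The comparison of expected losses can be sketched in a few lines. This is only an illustration: the function name, argument names, and the tie-breaking choice (ties go to class 2) are assumptions, not part of the slides.

```python
def classify(p1_given_x, loss_1_to_2=1.0, loss_2_to_1=1.0):
    """Return 1 or 2 for a two-class problem by comparing expected losses.

    p1_given_x  : posterior p(1|x); p(2|x) is taken as 1 - p(1|x).
    loss_1_to_2 : L(1->2), the cost of calling a true 1 a 2.
    loss_2_to_1 : L(2->1), the cost of calling a true 2 a 1.
    """
    p2_given_x = 1.0 - p1_given_x
    risk_if_say_2 = p1_given_x * loss_1_to_2   # expected loss of answering "2"
    risk_if_say_1 = p2_given_x * loss_2_to_1   # expected loss of answering "1"
    # Ties are assigned to class 2 (an arbitrary choice for this sketch).
    return 1 if risk_if_say_2 > risk_if_say_1 else 2
```

With equal losses the rule reduces to picking the larger posterior. With a large L(1→2), as in the disease example where missing class 1 is expensive, even a small p(1|x) leads to answer 1.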

11



• Crucial notion: decision boundary
– points where the average loss is the same for either case.

$$L(1 \to 2) = L(2 \to 1) = 1, \qquad L(1 \to 1) = L(2 \to 2) = 0$$

12



• Some loss may be inevitable:
– the total risk (shaded area) is called the Bayes risk.

$$L(1 \to 2) = L(2 \to 1) = 1, \qquad L(1 \to 1) = L(2 \to 2) = 0$$

13



• Example: Gaussian class densities, p-dimensional measurements with common covariance Σ, different means μ1, μ2 and class priors πk:

$$p(x \mid k) = \left(\frac{1}{2\pi}\right)^{p/2} |\Sigma|^{-1/2} \exp\left[-\frac{1}{2}(x-\mu_k)^T \Sigma^{-1} (x-\mu_k)\right]$$

14



• Ignoring a common factor in the posteriors:

$$p(k \mid x) \propto \pi_k \left(\frac{1}{2\pi}\right)^{p/2} |\Sigma|^{-1/2} \exp\left[-\frac{1}{2}(x-\mu_k)^T \Sigma^{-1} (x-\mu_k)\right]$$

• The classifier boils down to choosing the class k that minimizes

$$(x-\mu_k)^T \Sigma^{-1} (x-\mu_k) - 2\log \pi_k$$

• The common covariance simplifies the resulting expression.
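A minimal NumPy sketch of this rule follows. The function names and the toy means, covariance, and priors are assumptions chosen for illustration; class indices start at 0 here.

```python
import numpy as np

def gaussian_scores(x, means, cov, priors):
    """Score each class k by (x - mu_k)^T Sigma^{-1} (x - mu_k) - 2 log(pi_k)."""
    cov_inv = np.linalg.inv(cov)
    scores = []
    for mu, pi in zip(means, priors):
        d = x - mu
        scores.append(d @ cov_inv @ d - 2.0 * np.log(pi))
    return np.array(scores)

def classify_gaussian(x, means, cov, priors):
    """Return the index of the class with the smallest score."""
    return int(np.argmin(gaussian_scores(x, means, cov, priors)))

# Toy setup (assumed values): two classes in 2-D with identity covariance.
means = [np.array([0.0, 0.0]), np.array([2.0, 0.0])]
cov = np.eye(2)
priors = [0.5, 0.5]
```

With equal priors the decision depends only on the Mahalanobis distance to each mean. A strongly unequal prior shifts the boundary toward the less likely class.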

15



• Cross-validation to estimate the total risk.
– Split the data set and average over all possible splits.
– Leave-one-out, 10-fold, …

• Bootstrapping to better estimate the decision boundary.
– Retrain the classifier with the false positives (FP) and false negatives (FN).
– They are the most significant in the configuration of the decision boundary.
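The cross-validation estimate can be sketched as below. The `train` and `predict` callables are hypothetical placeholders supplied by the caller, and contiguous folds are an assumption made to keep the example short (in practice the data would usually be shuffled first).

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validated_error(xs, ys, train, predict, k=10):
    """Average held-out error rate over k folds (assumes k <= len(xs)).

    train(xs, ys) -> model and predict(model, x) -> label are
    caller-supplied (hypothetical) callables.
    """
    n = len(xs)
    errors = []
    for fold in k_fold_indices(n, k):
        held_out = set(fold)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        model = train(train_x, train_y)
        wrong = sum(predict(model, xs[i]) != ys[i] for i in fold)
        errors.append(wrong / len(fold))
    return sum(errors) / len(errors)
```

Setting `k=len(xs)` gives leave-one-out; `k=10` gives 10-fold.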

16


Histogram based classifiers

• Use a histogram to represent the class-conditional densities p(x|1), p(x|2),…

• Advantage: estimates become quite good with enough data.

• Disadvantage: Histogram becomes big with high dimension.

– we may assume feature independence.

17


Example: finding skin pixels

• Skin has a very small range of (intensity independent) colours and little texture.

• Get class-conditional densities p(x|skin) and p(x|not skin) from examples.

• Get priors p(skin) and p(not skin) from examples
– count the number of face/non-face pixels.

• The classifier becomes:

18


Example: finding skin pixels (cont.)

• We can represent a class-conditional density using a histogram (a “non-parametric” distribution).

[Figure: histograms of P(x|skin) and P(x|not skin) over the feature x = hue.]

Kristen Grauman

19


• Now we get a new image, and want to label each pixel as skin or non-skin.

Kristen Grauman


20


posterior ∝ likelihood × prior: $P(\text{skin} \mid x) \propto P(x \mid \text{skin})\, P(\text{skin})$

Where does the prior come from?

Why use a prior?


21


For every pixel in a new image, we can estimate the probability that it is generated by skin.

Classify pixels based on these probabilities

Brighter pixels → higher probability of being skin

Kristen Grauman
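A rough NumPy sketch of the histogram-based skin posterior follows. It assumes hue values scaled to [0, 1), a bin count chosen arbitrarily, and a fallback to the prior for bins never seen in training; none of these choices come from the slides.

```python
import numpy as np

def fit_histogram(values, bins=32):
    """Normalized histogram of hue values in [0, 1): an estimate of p(x | class)."""
    counts, _ = np.histogram(values, bins=bins, range=(0.0, 1.0))
    return counts / max(counts.sum(), 1)

def skin_posterior(hue, hist_skin, hist_not, prior_skin):
    """Per-pixel P(skin | x) via Bayes rule for an array of hue values."""
    bins = len(hist_skin)
    idx = np.minimum((np.asarray(hue) * bins).astype(int), bins - 1)
    num = hist_skin[idx] * prior_skin
    den = num + hist_not[idx] * (1.0 - prior_skin)
    # Where neither class was ever observed, fall back to the prior.
    return np.where(den > 0, num / np.where(den > 0, den, 1.0), prior_skin)
```

Applying `skin_posterior` to the hue channel of a new image gives a probability map of the kind shown on the slide, which can then be thresholded to label pixels.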


22


Using skin color-based face detection and pose estimation as a video-based interface. OpenCV, Gary Bradski, 1998.

Kristen Grauman


23



• Parameter θ encapsulates the relative loss. There is a family of classifiers for each value of θ.

– Each one has different FP and FN rates described by the Receiver Operating Characteristic (ROC) curve.
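One way to trace such a curve is to sweep θ over classifier scores and record the error rates at each value. The sketch below assumes scores where larger means "more likely positive" and labels in {0, 1}; the function name is illustrative.

```python
def roc_points(scores, labels, thetas):
    """For each threshold theta, classify score > theta as positive and
    return (false positive rate, true positive rate) pairs."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    points = []
    for theta in thetas:
        tp = sum(1 for s, y in zip(scores, labels) if s > theta and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > theta and y == 0)
        points.append((fp / neg, tp / pos))
    return points
```

A low θ accepts everything (both rates near 1), a high θ rejects everything (both rates near 0), and the intermediate values trace the trade-off between FP and FN.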

24


Window-based generic object detection framework

• Build/train object model

– Choose a representation.

– Learn or fit parameters of model / classifier.

• Generate candidates in new image.

• Score the candidates.

25


Window-based models: building a model

Simple holistic descriptions of image content:
– grayscale / color histogram
– vector of pixel intensities

26


Window-based models: building a model (cont.)

• Pixel-based representations are sensitive to small shifts.

• Color or grayscale-based appearance descriptions can be sensitive to illumination and intra-class appearance variation.

27


• Consider edges, contours, and (oriented) intensity gradients.


28


• Consider edges, contours, and (oriented) intensity gradients.

• Summarize local distribution of gradients with a histogram.
– Locally orderless: offers invariance to small shifts and rotations.
– Contrast normalization: tries to correct for variable illumination.
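A small NumPy sketch of such a gradient-orientation histogram for a single patch follows. The bin count, the use of unsigned orientations, and L2 normalization are assumptions for illustration, not a specific published descriptor.

```python
import numpy as np

def orientation_histogram(patch, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations, L2-normalized."""
    gy, gx = np.gradient(patch.astype(float))      # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientation in [0, pi)
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, np.pi), weights=magnitude)
    return hist / (np.linalg.norm(hist) + 1e-12)   # contrast normalization
```

Because positions inside the patch are discarded, small shifts change the histogram little, and the normalization removes the effect of a global gain or offset in brightness.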


29


• Given the representation, train a binary classifier.

Car/non-car classifier: “Yes, car.” / “No, not a car.”


30


Discriminative classifiers

• Nearest neighbor (10^6 examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; …
• Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; …
• Support Vector Machines: Guyon, Vapnik; Heisele, Serre, Poggio 2001; …
• Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003; …
• Boosting: Viola, Jones 2001; Torralba et al. 2004; Opelt et al. 2006; …

31


Window-based generic object detection framework

• Build/train object model

– Choose a representation

– Learn or fit parameters of model / classifier

• Generate candidates in new image

• Score the candidates

32



Window-based models: generating and scoring candidates

33



Window-based models: summary

Training:
1. Obtain training data
2. Define features
3. Define classifier

Given new image:
1. Slide window
2. Score by classifier

[Figure labels: feature extraction; training examples.]
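The "slide window, score by classifier" step can be sketched as follows. The window size, step, and the `score_fn` callable are hypothetical, and the sketch scans a single scale only.

```python
def sliding_windows(height, width, win=24, step=4):
    """Yield (row, col) top-left corners of win x win windows that fit in the image."""
    for r in range(0, height - win + 1, step):
        for c in range(0, width - win + 1, step):
            yield r, c

def detect(image, score_fn, threshold, win=24, step=4):
    """Score every window with score_fn (a hypothetical classifier) and
    keep the corners of those scoring above threshold."""
    h, w = len(image), len(image[0])
    hits = []
    for r, c in sliding_windows(h, w, win, step):
        window = [row[c:c + win] for row in image[r:r + win]]
        if score_fn(window) > threshold:
            hits.append((r, c))
    return hits
```

To handle several object sizes, the same scan would be repeated over rescaled copies of the image or with rescaled windows.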

34



35


Boosting

• Boosting is a classification scheme that works by combining weak learners into a more accurate ensemble classifier

– A weak learner need only do better than chance.

• Training consists of multiple boosting rounds

– During each boosting round, we select a weak learner that does well on examples that were hard for the previous weak learners.

– “Hardness” is captured by weights attached to training examples.

Y. Freund and R. Schapire, A short introduction to boosting, Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, September, 1999.

http://www.cs.princeton.edu/~schapire/uncompress-papers.cgi/FreundSc99.ps

36


Boosting intuition

Weak Classifier 1

Slide credit: Paul Viola

37


Boosting illustration

Weights increased

38



Weak Classifier 2

39



Weights increased

40



Weak Classifier 3

41



Final classifier is a combination of weak classifiers

42


Boosting: training

• Initially, weight each training example equally.

• In each boosting round:
– Find the weak learner that achieves the lowest weighted training error.
– Raise the weights of training examples misclassified by the current weak learner.

• Compute the final classifier as a linear combination of all weak learners (the weight of each learner grows with its accuracy).

• Exact formulas for re-weighting and combining weak learners depend on the particular boosting scheme (e.g. AdaBoost).

Slide credit: Lana Lazebnik
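As one concrete instance of the re-weighting and combination steps, the standard discrete AdaBoost updates can be written as follows. The notation is an assumption and does not come from the slides: labels $y_i \in \{-1, +1\}$, weak learners $h_t(x) \in \{-1, +1\}$, and weights $w_i$ that sum to one.

```latex
\begin{align*}
\epsilon_t &= \sum_i w_i \,[\,h_t(x_i) \neq y_i\,]
  && \text{weighted error of the chosen weak learner} \\
\alpha_t &= \tfrac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}
  && \text{vote of the weak learner} \\
w_i &\leftarrow \frac{w_i \exp\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr)}{Z_t}
  && \text{re-weighting, with } Z_t \text{ chosen so that } \textstyle\sum_i w_i = 1 \\
H(x) &= \operatorname{sign}\Bigl(\sum_t \alpha_t\, h_t(x)\Bigr)
  && \text{final classifier}
\end{align*}
```

Here $\alpha_t$ is positive whenever $\epsilon_t < 1/2$ and increases as the error shrinks, so more accurate weak learners receive a larger vote, and misclassified examples ($y_i h_t(x_i) = -1$) have their weights multiplied by $e^{\alpha_t} > 1$.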

43


Boosting pros and cons

• Advantages of boosting
– Integrates classification with feature selection.
– Complexity of training is linear instead of quadratic in the number of training examples.
– Flexibility in the choice of weak learners, boosting scheme.
– Testing is fast.
– Easy to implement.

• Disadvantages
– Needs many training examples.
– Often doesn’t work as well as SVM (especially for many-class problems).

44


Challenges of face detection

• Sliding window detector must evaluate tens of thousands of location/scale combinations.
– PCA is slow for real-time face detection.

• Faces in images are rare: 0–10 per image.
– For computational efficiency, we should try to spend as little time as possible on the non-face windows.
– A megapixel image has ~10^6 pixels and a comparable number of candidate face locations.
– To avoid having a false positive in every image, the false positive rate has to be less than 10^-6.

45


The Viola/Jones Face Detector

• A seminal approach to real-time object detection.
• Training is slow but detection is very fast.
• Key ideas
– Represent local texture with efficiently computed rectangular features (integral images) within the window of interest.
– Select discriminative features as weak classifiers.
– Boosting for classification.
– Attentional cascade for fast rejection of non-face windows.

P. Viola and M. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2), 2004. Also in CVPR 2001.

http://www.vision.caltech.edu/html-files/EE148-2005-Spring/pprs/viola04ijcv.pdf

46


“Rectangular filters”

Output = ∑ (pixels in white area) – ∑ (pixels in black area)

Image features

47


Image features (cont.)

• For real problems, the result depends strongly on the features used.

• Do not employ pixel values.
• Use a very large set of simple functions:
– sensitive to edges and other critical features;
– at multiple scales.

48


Example

49


Fast computation with integral images

• The integral image computes a value at each pixel (x,y) that is the sum of the pixel values above and to the left of (x,y) inclusive.

• This can be quickly computed in one pass.

(x,y)

50


Computing the integral image

51


Computing the integral image

Cumulative row sum: s(x, y) = s(x−1, y) + i(x, y)
Integral image: ii(x, y) = ii(x, y−1) + s(x, y)

ii(x, y-1)

s(x-1, y)

i(x, y)

MATLAB: ii = cumsum(cumsum(double(i)), 2);
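A NumPy counterpart of the MATLAB one-liner, together with a direct transcription of the two recurrences, might look like this. The function names are illustrative; arrays are indexed as `[y, x]`, i.e. row first.

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[0..y, 0..x], inclusive (cumulative sum over rows, then columns)."""
    return np.cumsum(np.cumsum(np.asarray(img, dtype=np.float64), axis=0), axis=1)

def integral_image_one_pass(img):
    """Same result via the recurrences: s is the cumulative row sum,
    and ii is the value directly above plus s."""
    img = np.asarray(img, dtype=np.float64)
    ii = np.zeros_like(img)
    for y in range(img.shape[0]):
        s = 0.0                                   # cumulative row sum s(x, y)
        for x in range(img.shape[1]):
            s += img[y, x]                        # s(x, y) = s(x-1, y) + i(x, y)
            ii[y, x] = (ii[y - 1, x] if y > 0 else 0.0) + s
    return ii
```

Both versions visit each pixel once; the vectorized form is what one would normally use, while the loop makes the single-pass recurrence explicit.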

52


Computing sum within a rectangle

• Let A,B,C,D be the values of the integral image at the corners of a rectangle.

• The sum of original image values within the rectangle can be computed as:

sum = A – B – C + D

• Only 3 additions are required for any size of rectangle!

Corner layout in the figure: D (top-left), B (top-right), C (bottom-left), A (bottom-right).
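The four-corner lookup, and a two-rectangle feature built on it, can be sketched in plain Python as below. The inclusive coordinate convention, the handling of the image border, and the choice of "white = left half, black = right half" are assumptions made for this example.

```python
def integral_image(img):
    """Inclusive integral image as a list of lists (pure Python)."""
    ii = []
    for y, row in enumerate(img):
        s, out = 0, []
        for x, v in enumerate(row):
            s += v                                   # cumulative row sum
            out.append(s + (ii[y - 1][x] if y > 0 else 0))
        ii.append(out)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom and columns left..right (inclusive)."""
    A = ii[bottom][right]                                       # bottom-right corner
    B = ii[top - 1][right] if top > 0 else 0                    # just above the rectangle
    C = ii[bottom][left - 1] if left > 0 else 0                 # just left of the rectangle
    D = ii[top - 1][left - 1] if top > 0 and left > 0 else 0    # above-left
    return A - B - C + D

def two_rect_feature(ii, top, left, height, width):
    """White (left half) minus black (right half) of a height x 2*width region."""
    white = rect_sum(ii, top, left, top + height - 1, left + width - 1)
    black = rect_sum(ii, top, left + width, top + height - 1, left + 2 * width - 1)
    return white - black
```

The cost of `rect_sum` is the same for a 2x2 rectangle as for one covering the whole image, which is what makes evaluating many rectangle features per window affordable.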

53


Example

Integral Image

• Feature output is the difference between adjacent regions.

54


Feature selection

• The integral image significantly reduces the number of computations.

• However, for a base resolution of 24x24 detection region, the number of possible rectangle features is ~160,000! (scale, size, type, position,…)

• It is impractical to evaluate the entire feature set for training.

• Can we create a good classifier using just a small subset of all possible features?

• How to select such a subset?

Use AdaBoost both to select the informative features and to form the classifier.

55


• We seek to select the single rectangle feature and threshold that best separates positive (faces) and negative (non-faces) training examples, in terms of weighted error.

Outputs of a possible rectangle feature on faces and non-faces.

…

Resulting weak classifier:

For next round, reweight the examples according to errors, choose another filter/threshold combo.

Kristen Grauman

Boosting for face detection

56


AdaBoost Algorithm

• Start with uniform weights on training examples {x1, …, xn}.

• For T rounds:
– Evaluate weighted error for each feature, pick best.
– Re-weight the examples: incorrectly classified → more weight; correctly classified → less weight.

• Final classifier is a combination of the weak ones, weighted according to the error they had.

Freund & Schapire 1995
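The loop above can be sketched with single-feature threshold stumps as weak learners. This is a generic discrete AdaBoost under assumed conventions (labels in {-1, +1}, exhaustive threshold search), not the exact training procedure of the face detector.

```python
import math

def train_adaboost(X, y, rounds):
    """Discrete AdaBoost with single-feature threshold stumps.

    X: list of feature vectors, y: labels in {-1, +1}.
    Returns a list of (alpha, feature, threshold, polarity).
    """
    n = len(X)
    w = [1.0 / n] * n                                  # uniform initial weights
    model = []
    for _ in range(rounds):
        best = None
        for f in range(len(X[0])):                     # evaluate every feature...
            for thr in sorted({x[f] for x in X}):      # ...every threshold...
                for pol in (1, -1):                    # ...and both polarities
                    pred = [pol if x[f] >= thr else -pol for x in X]
                    err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol, pred)
        err, f, thr, pol, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)          # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, pol))
        # Misclassified examples gain weight, correct ones lose weight.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, pred)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (pol if x[f] >= thr else -pol) for a, f, thr, pol in model)
    return 1 if score >= 0 else -1
```

On a 1-D toy set such as `X = [[0], [1], [2], [3], [4], [5]]` with `y = [1, 1, -1, -1, 1, 1]`, no single stump separates the classes, but three boosting rounds do. In the face detector each "feature" would be one rectangle-feature response, so picking the best stump doubles as feature selection.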

57


Boosting for face detection (cont.)

• The first two features selected by boosting.

• This feature combination can yield 100% detection rate and 50% false positive rate.

58


Boosting for face detection

• A 200-feature classifier can yield a 95% detection rate and a false positive rate of 1 in 14,084.

Not good enough!

59


Attentional cascade

• Even if the filters are fast to compute, each new image has a lot of possible windows to search.

• How to make the detection more efficient?

• There is no need to compute the linear combination of all of the features in a window that is clearly negative from the first features.

• Define a computational risk hierarchy.

• The goal of the training process is different
– instead of minimizing classification errors, minimize the false positive rate.

60


Attentional cascade (cont.)

• Form a cascade with low false negative rates early on.

• Apply less accurate but faster classifiers first to immediately discard windows that clearly appear to be negative.
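The early-rejection logic can be sketched as follows. The stage representation (a list of score function and threshold pairs) is hypothetical, and the rate calculation assumes identical, independent stages, which is a simplification.

```python
def cascade_accepts(window, stages):
    """Run stages in order; reject at the first stage whose score falls below its threshold.

    stages: list of (score_fn, threshold) pairs, cheapest first (hypothetical callables).
    Returns (accepted, number_of_stages_evaluated).
    """
    for i, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window) < threshold:
            return False, i          # early rejection: later stages are never computed
    return True, len(stages)

def overall_rates(stage_detection, stage_false_positive, n_stages):
    """Detection and false positive rates of the whole cascade, assuming every
    stage has the same rates and stages act independently."""
    return stage_detection ** n_stages, stage_false_positive ** n_stages
```

As an illustration with assumed numbers, ten stages that each keep 99% of faces and pass 30% of non-faces give roughly a 90% overall detection rate and a false positive rate of about 6 in a million. This is why each stage needs a very low false negative rate but can tolerate many false positives.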

61


Viola-Jones detector: summary

• Train with 5K positives, 350M negatives (10K non-face images).
• Real-time detector using a 38-layer cascade.
• Implemented in OpenCV.

[Figure: faces and non-faces → train cascade of classifiers with AdaBoost → selected features, thresholds, and weights → new image.]

62


Output of face detector on test images

63


Other detection tasks

Facial Feature Localization

Male vs. female

Profile Detection

64


Profile Detection

65


Profile Features

66


Application: Naming characters in TV video

M. Everingham, J. Sivic, and A. Zisserman. "Hello! My name is... Buffy" - Automatic naming of characters in TV video. Proceedings of the 17th British Machine Vision Conference (BMVC 2006).

67


Application of sliding windows: pedestrian detection

C. Papageorgiou and T. Poggio. A Trainable System for Object Detection. International Journal of Computer Vision, Vol. 38, No 1, pp. 15-22, 2000.

SVM on Haar wavelets.

68


Application of sliding windows: pedestrian detection (cont.)

P. Viola, M. Jones and D. Snow. Detecting pedestrians using patterns of motion and appearance. International Journal of Computer Vision, Vol. 63, No 2, pp. 153-161, 2005.

Space-time rectangle features.

69


Application of sliding windows: pedestrian detection (cont.)

N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2005).

SVM on histogram of oriented gradients.

70


Summary: Viola-Jones detector

• Rectangle features.
• Integral images for fast computation.
• Boosting for feature selection.
• Attentional cascade for fast rejection of negative windows.
