AdaBoost

Classifier
• Simplest classifier
Adaboost: Agenda
• Adaptive Boosting (Y. Freund and R. Schapire, ICML, 1996)
• Supervised classifier
• Ensemble of classifiers
– Combine many low-accuracy classifiers (weak learners) to create a high-accuracy classifier (strong learner)
Example 1

[Adaboost: Example (1/10) through (10/10): figures.]
Adaboost
• Strong classifier = linear combination of T weak classifiers:

$$H(x) = \operatorname{sign}\Big(\sum_{t=1}^{T} \alpha_t h_t(x)\Big)$$

(1) Design of weak classifier $h_t(x)$
(2) Weight $\alpha_t$ for each classifier (hypothesis weight)
(3) Update weight for each data point (example distribution)
• Weak classifier: < 50% error over any distribution
Adaboost: Terminology (1/2)

Adaboost: Terminology (2/2)

Adaboost: Framework

Adaboost: Framework
Adaboost
• Strong classifier = linear combination of T weak classifiers:

$$H(x) = \operatorname{sign}\Big(\sum_{t=1}^{T} \alpha_t h_t(x)\Big)$$

(1) Design of weak classifier $h_t(x)$
(2) Weight $\alpha_t$ for each classifier (hypothesis weight)
(3) Update weight for each data point (example distribution)
• Weak classifier: < 50% error over any distribution
Adaboost: Design of weak classifier (1/2)

Adaboost: Design of weak classifier (2/2)
• Select the weak classifier with the smallest weighted error:

$$h_t = \arg\min_{h_j \in H} \epsilon_j, \qquad \epsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \ne h_j(x_i)]$$

• Prerequisite: $\epsilon_t < \frac{1}{2}$
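As a concrete illustration of step (1), here is a minimal sketch of selecting a decision stump (a single-feature threshold classifier) with the smallest weighted error; the function name and the brute-force scan over thresholds are my own choices, not from the slides:

import numpy as np

def best_stump(X, y, D):
    """Pick the decision stump (feature, threshold, polarity) minimizing
    the weighted error sum_i D[i] * [y_i != h(x_i)].
    X: (m, n_features) array, y: (m,) labels in {-1, +1},
    D: (m,) example distribution (non-negative, sums to 1)."""
    m, n_features = X.shape
    best = {"err": np.inf}
    for j in range(n_features):
        for theta in np.unique(X[:, j]):          # candidate thresholds
            for polarity in (+1, -1):
                pred = np.where(polarity * X[:, j] < polarity * theta, 1, -1)
                err = np.sum(D[pred != y])        # weighted error
                if err < best["err"]:
                    best = {"err": err, "feature": j,
                            "theta": theta, "polarity": polarity}
    return best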
Adaboost
• Strong classifier = linear combination of T weak classifiers:

$$H(x) = \operatorname{sign}\Big(\sum_{t=1}^{T} \alpha_t h_t(x)\Big)$$

(1) Design of weak classifier $h_t(x)$
(2) Weight $\alpha_t$ for each classifier (hypothesis weight)
(3) Update weight for each data point (example distribution)
• Weak classifier: < 50% error over any distribution
Adaboost: Hypothesis weight (1/2)
• How to set $\alpha_t$?

Unravelling the reweighting rule over all T rounds gives

$$D_{T+1}(i) = \frac{1}{N}\cdot\frac{\exp(-y_i f(x_i))}{\prod_t Z_t}, \qquad f(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$$

so the training error is bounded by the product of the normalizers:

$$\mathrm{error}_{\mathrm{training}}(H) = \frac{1}{N}\sum_i \begin{cases}1, & \text{if } y_i \ne H(x_i)\\ 0, & \text{else}\end{cases} = \frac{1}{N}\sum_i \begin{cases}1, & \text{if } y_i f(x_i) \le 0\\ 0, & \text{else}\end{cases} \le \frac{1}{N}\sum_i \exp(-y_i f(x_i)) = \Big(\prod_t Z_t\Big)\sum_i D_{T+1}(i) = \prod_t Z_t$$
Adaboost: Hypothesis weight (2/2)

Choosing $\alpha_t$ to minimize $Z_t$ yields

$$\alpha_t = \frac{1}{2}\ln\Big(\frac{1-\epsilon_t}{\epsilon_t}\Big)$$
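For intuition, two worked values (my own numbers, not the slides'): a weak classifier with low error gets a much larger vote than a near-random one.

$$\epsilon_t = 0.1 \;\Rightarrow\; \alpha_t = \tfrac{1}{2}\ln\tfrac{0.9}{0.1} \approx 1.10, \qquad \epsilon_t = 0.4 \;\Rightarrow\; \alpha_t = \tfrac{1}{2}\ln\tfrac{0.6}{0.4} \approx 0.20$$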
Adaboost
• Strong classifier = linear combination of T weak classifiers:

$$H(x) = \operatorname{sign}\Big(\sum_{t=1}^{T} \alpha_t h_t(x)\Big)$$

(1) Design of weak classifier $h_t(x)$
(2) Weight $\alpha_t$ for each classifier (hypothesis weight)
(3) Update weight for each data point (example distribution)
• Weak classifier: < 50% error over any distribution
Adaboost: Update example distribution (Reweighting)

$$D_{t+1}(i) = \frac{D_t(i)\,\exp(-\alpha_t\, y_i\, h_t(x_i))}{Z_t}$$

$y_i \cdot h_t(x_i) = 1$ (correctly classified): weight decreases
$y_i \cdot h_t(x_i) = -1$ (misclassified): weight increases

Reweighting
In this way, AdaBoost "focuses on" the informative or "difficult" examples.
Summary
• Initialize $D_1(i) = 1/N$.
• For t = 1, …, T:
– Select the weak classifier $h_t$ with the smallest weighted error $\epsilon_t$ under $D_t$.
– Set the hypothesis weight $\alpha_t = \frac{1}{2}\ln\big(\frac{1-\epsilon_t}{\epsilon_t}\big)$.
– Update the distribution: $D_{t+1}(i) = D_t(i)\exp(-\alpha_t y_i h_t(x_i)) / Z_t$.
• Output the strong classifier $H(x) = \operatorname{sign}\big(\sum_t \alpha_t h_t(x)\big)$.
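Putting the three steps together, here is a minimal runnable sketch of the full loop under the conventions above (labels in {-1, +1}); the names adaboost_train and weak_learner are illustrative, not from the slides:

import numpy as np

def adaboost_train(X, y, T, weak_learner):
    """Train T rounds of AdaBoost.
    X: (N, d) data, y: (N,) labels in {-1, +1},
    weak_learner(X, y, D) -> function h with h(X) -> (N,) predictions in {-1, +1}."""
    N = len(y)
    D = np.full(N, 1.0 / N)                  # initial example distribution
    ensemble = []
    for t in range(T):
        h = weak_learner(X, y, D)
        pred = h(X)
        eps = np.sum(D[pred != y])           # weighted error, must be < 0.5
        eps = np.clip(eps, 1e-10, 1 - 1e-10) # guard the log below
        alpha = 0.5 * np.log((1 - eps) / eps)
        D *= np.exp(-alpha * y * pred)       # reweight: down if correct, up if wrong
        D /= D.sum()                         # Z_t normalization
        ensemble.append((alpha, h))
    def H(Xq):                               # strong classifier
        return np.sign(sum(a * h(Xq) for a, h in ensemble))
    return H

A weak learner here can wrap the stump search sketched earlier, returning a function that applies the selected stump.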
Example 2

Example (1/5)
Original training set: equal weights for all training samples.
Taken from "A Tutorial on Boosting" by Yoav Freund and Rob Schapire.
Example (2/5)
ROUND 1
Example (3/5)
ROUND 2
Example (4/5)
ROUND 3
Example (5/5)
Example 3

[Figures: successive boosting rounds; each round computes $\alpha_t = \frac{1}{2}\ln\big(\frac{1-\epsilon_t}{\epsilon_t}\big)$.]
Example 4

Adaboost

Application

Discussion

Discrete Adaboost (DiscreteAB) (Friedman's wording)

Discrete Adaboost (DiscreteAB) (Freund and Schapire's wording)

Adaboost with Confidence-Weighted Predictions (RealAB)

Adaboost Variants Proposed by Friedman
• LogitBoost

Adaboost Variants Proposed by Friedman
• GentleBoost

Reference
Robust Real-time Object Detection

Keywords: feature extraction, integral image, AdaBoost, cascade

Outline
1. Introduction
2. Features
2.1 Features Extraction
2.2 Integral Image
3. AdaBoost
3.1 Training Process
3.2 Testing Process
4. The Attentional Cascade
5. Experimental Results
6. Conclusion
7. Reference
1. Introduction
This paper brings together new algorithms and insights to construct a framework for robust and extremely rapid object detection.

The frontal face detection system achieves:
1. High detection rates
2. Low false positive rates

Three main contributions:
1. Integral image
2. AdaBoost: selecting a small number of important features
3. Cascaded structure
2. Features
The detector is based on simple feature values. Reasons:
1. Domain knowledge is difficult to learn using a finite quantity of training data.
2. A feature-based system is much faster than an image-based system.

ps. Feature-based systems use extracted features such as eye and nose patterns; knowledge-based systems use rules about facial features; image-based systems use face segments and predefined face patterns. [3]
2.1 Feature Extraction (1/2)
Filter: the filter type (e.g., a Haar-like filter).
Feature: a. the coordinate position of the pattern; b. the size of the pattern (e.g., eye, nose).
Feature value: Feature value = Filter ∗ Feature (convolution).
2.1 Feature Extraction (2/2)
• Haar-like filter: the sum of the pixels which lie within the white rectangles is subtracted from the sum of the pixels in the grey rectangles.

[Figure 1: Haar-like filter types applied within a 24×24 window to produce a feature value.]
2.2 Integral Image (1/6)
Integral image:
1. Rectangle features
2. Computed very rapidly

$ii(x, y)$: the sum of the pixels above and to the left of $(x, y)$:

$$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$$
2.2 Integral Image (2/6)

[Figure 2: Integral image.]

Known:
A: sum of the pixels within rectangle A (similarly for B, C, D).
Location 1 value = A. Location 2 value = A + B.
Location 3 value = A + C. Location 4 value = A + B + C + D.

Q: The sum of the pixels within rectangle D = ?

Equation:
D = 4 + 1 − (2 + 3)
  = (A + B + C + D) + A − (A + B) − (A + C)

The sum within D can be computed as 4 + 1 − (2 + 3).
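A minimal sketch of this four-corner lookup, assuming the integral image ii is already available as a 2-D array indexable as ii[y][x]; the helper name rect_sum is mine, not from the paper:

def rect_sum(ii, x0, y0, x1, y1):
    """Sum of pixels in the rectangle with corners (x0, y0) (top-left,
    inclusive) and (x1, y1) (bottom-right, inclusive), using the
    four-corner identity D = 4 + 1 - (2 + 3) on the integral image ii.
    ii[y][x] holds the sum of all pixels above and to the left of (x, y),
    inclusive; out-of-range corners contribute 0."""
    def at(x, y):
        return ii[y][x] if x >= 0 and y >= 0 else 0
    return (at(x1, y1) + at(x0 - 1, y0 - 1)
            - at(x1, y0 - 1) - at(x0 - 1, y1))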
2.2 Integral Image (3/6)
Sum of the pixels within a rectangle (figure).

2.2 Integral Image (4/6)
Use the following pair of recurrences to get the integral image:

$$s(x, y) = s(x, y-1) + i(x, y) \qquad (1)$$
$$ii(x, y) = ii(x-1, y) + s(x, y) \qquad (2)$$

with $s(x, -1) = 0$ and $ii(-1, y) = 0$, where

$ii(x, y)$ is the integral image,
$i(x, y)$ is the original image,
$s(x, y)$ is the cumulative row sum.

ps. $ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$
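A sketch of the two recurrences in code, run on a made-up 3×3 image (the pixel values are illustrative, not the slides' example):

def integral_image(img):
    """Build the integral image with the recurrences
    s(x, y) = s(x, y-1) + i(x, y) and ii(x, y) = ii(x-1, y) + s(x, y)."""
    h, w = len(img), len(img[0])
    s = [[0] * w for _ in range(h)]   # cumulative row sum
    ii = [[0] * w for _ in range(h)]  # integral image
    for y in range(h):
        for x in range(w):
            s[y][x] = (s[y - 1][x] if y > 0 else 0) + img[y][x]   # (1)
            ii[y][x] = (ii[y][x - 1] if x > 0 else 0) + s[y][x]   # (2)
    return ii

img = [[1, 1, 4],
       [2, 2, 1],
       [3, 1, 3]]                     # hypothetical 3x3 original image
print(integral_image(img)[2][2])      # 18: the sum of all nine pixels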
2.2 Integral Image (5/6)

[Figure: a numeric example showing the original image $i(x, y)$, the cumulative row sum $s(x, y)$, and the integral image $ii(x, y)$, built with recurrences (1) and (2).]
2.2 Integral Image (6/6)

[Figure: the same 3×3 example worked cell by cell, applying $s(x, y) = s(x, y-1) + i(x, y)$ and $ii(x, y) = ii(x-1, y) + s(x, y)$ entry by entry; the bottom-right entry of the integral image is $ii(2, 2) = 10 + 8 = 18$, the sum of all pixels.]
3. AdaBoost (1/2)
AdaBoost (Adaptive Boosting) is a machine learning algorithm. AdaBoost works by choosing and combining weak classifiers together to form a more accurate strong classifier.
– Weak classifier:

$$h(X) = \begin{cases}\text{positive}, & \text{if filter}(X) < \theta\\ \text{negative}, & \text{otherwise}\end{cases}$$

where X is an image from the image set, filter(X) is the feature value, and θ is the threshold.
3. AdaBoost (2/2)
Subsequent classifiers are tweaked in favor of the instances misclassified by previous classifiers. [4]
The goal is to minimize the number of features that need to be computed for a new image, while still achieving high identification rates.
3.1 Training Process - Flowchart
1. Input: training image set X, with l face images (24×24) and m non-face images (24×24).
2. Feature Extraction: using Haar-like filters. Each image yields N feature values, so there are N×(l+m) candidate feature values in total; each candidate feature value also serves as a candidate threshold θ.
3. AdaBoost Algorithm:
3.0. Image weight initialization: $w_{1,i} = \frac{1}{2l}$ for positive, $\frac{1}{2m}$ for negative, $i = 1, \dots, n$.
3.1. Normalize image weights: $w_{t,i} \leftarrow w_{t,i} / \sum_{j=1}^{n} w_{t,j}$, $i = 1, \dots, n$.
3.2. Error calculation: $\epsilon_{i,j} = \sum_{k=1}^{n} w_{t,k}\,|h_{i,j}(x_k) - y_k|$, $i = 1, \dots, n$, $j = 1, \dots, N$.
3.3. Select a weak classifier $h_t$ with the lowest error $\epsilon_t$.
3.4. Image weight adjusting: $w_{t+1,i} = w_{t,i}\beta_t$ if $x_i$ is classified correctly, $w_{t,i}$ otherwise.
4. Output: a strong classifier built from T weak classifiers $h_t$ with weak classifier weights $\alpha_t$; $H_T(x)$ thresholds $\sum_{t=1}^{T} \alpha_t h_t(x)$.
3.2 Training Process - Input
1. Input: the training data set is denoted X. There are l positive (face) images and m negative (non-face) images, n (n = l + m) images in total:

$$X = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}, \qquad y_i = \begin{cases}1, & \text{positive/face}\\ 0, & \text{negative/non-face}\end{cases}$$

Each image yields N local feature values; $f_j(x_i)$ denotes the j-th local feature value of image $x_i$, for N × n local feature values in total.
3.2 Training Process - Feature Extraction (1/2)
2. Feature Extraction: using Haar-like filters.

Convolving each of the n images with the Haar-like filters produces the candidate feature values (n × N in total).

ps. One image yields N feature values, so N = 4 × f, where f is the feature number per filter type and 4 is the number of filter types.
3.2 Training Process - Feature Extraction (2/2)
Define weak classifiers $h_{i,j}$:

$$h_{i,j}(x_k) = \begin{cases}1, & \text{if } p_{i,j}\, f_j(x_k) < p_{i,j}\,\theta_{i,j}\\ 0, & \text{otherwise}\end{cases}$$

$\theta_{i,j}$: the j-th local feature value of image i, used as a candidate threshold.
$f_j(x_k)$: the j-th local feature value of image k.

Polarity: $p_{i,j} = 1$ if $\epsilon_{i,j} \le 0.5$, $-1$ otherwise.

Ex: 3 face and 1 non-face images evaluated with the 5th feature give candidate thresholds θ1,5, θ2,5, θ3,5, θ4,5, candidate classifiers h1,5, h2,5, h3,5, h4,5, and errors ε1,5, ε2,5, ε3,5, ε4,5.
[Figure: the four feature values placed on a face/non-face axis.]
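A sketch of this candidate-threshold scan for one feature: every training image's feature value is tried as θ with both polarities, and the lowest weighted error wins (the function name best_threshold is illustrative):

import numpy as np

def best_threshold(fvals, y, w):
    """For one feature: fvals[k] = f_j(x_k), y[k] in {0, 1}, w = image weights.
    Try every candidate threshold theta = fvals[i] with polarity p in {+1, -1}
    and return the (theta, p) minimizing the weighted error
    eps = sum_k w[k] * |h(x_k) - y[k]|."""
    best_eps, best_theta, best_p = np.inf, None, None
    for theta in fvals:                       # each feature value is a candidate
        for p in (+1, -1):
            h = (p * fvals < p * theta).astype(int)   # 1 = predicted face
            eps = np.sum(w * np.abs(h - y))
            if eps < best_eps:
                best_eps, best_theta, best_p = eps, theta, p
    return best_theta, best_p, best_eps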
3.2 Training Process - AdaBoost Algorithm (1/4)
3. AdaBoost Algorithm:
3-0. Image weight initialization:

$$w_{1,i} = \begin{cases}\dfrac{1}{2l}, & \text{for positive/face images}\\[4pt] \dfrac{1}{2m}, & \text{for negative/non-face images}\end{cases} \qquad i = 1, \dots, n$$

l is the quantity of positive images; m is the quantity of negative images.
3.2 Training Process - AdaBoost Algorithm (2/4)
Iterate t = 1, …, T (T: the number of weak classifiers):

3-1. Normalize image weights: $w_{t,i} \leftarrow \dfrac{w_{t,i}}{\sum_{j=1}^{n} w_{t,j}}$, $i = 1, \dots, n$.

3-2. Error calculation over the training data set X: the error rate of each candidate weak classifier is $\epsilon_{i,j} = \sum_{k=1}^{n} w_{t,k}\,|h_{i,j}(x_k) - y_k|$, $i = 1, \dots, n$, $j = 1, \dots, N$.

3-3. Select the weak classifier $h_t$ with the lowest error rate $\epsilon_t$.

3-4. Image weight adjusting: $w_{t+1,i} = w_{t,i}\beta_t$ if $x_i$ is classified correctly, $w_{t,i}$ otherwise, where $\beta_t = \dfrac{\epsilon_t}{1 - \epsilon_t}$.
3.2 Training Process – Output (1/2)

$$H_T(x) = \begin{cases}\text{positive/face}, & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \dfrac{1}{2}\sum_{t=1}^{T} \alpha_t\\[4pt] \text{negative/non-face}, & \text{otherwise}\end{cases}$$

where $\alpha_t = \log\dfrac{1}{\beta_t}$ is the weak classifier weight and $\dfrac{1}{2}\sum_{t=1}^{T}\alpha_t$ is the threshold.
3.2 Training Process – Output (2/2)
Why $\alpha_t = \log\dfrac{1}{\beta_t} = \log\dfrac{1-\epsilon_t}{\epsilon_t}$ rather than simply $\dfrac{1-\epsilon_t}{\epsilon_t}$?

[Fig. B: α = log((1 − ε)/ε) plotted against ε. Fig. C: (1 − ε)/ε plotted against ε.]

   ε        (1 − ε)/ε    α = log((1 − ε)/ε)
   0.001    999          2.99
   0.005    199          2.29      (Δ = 800)   (Δ = 0.70)
   0.101    8.90         0.94
   0.105    8.52         0.93      (Δ = 0.38)  (Δ = 0.01)

As Fig. B shows, when ε (the error rate) lies in the interval 0–0.1, the same change in ε produces a far larger change in α (the weak classifier weight) than in the interval 0.1–0.5; so as ε approaches 0, even a tiny change in ε would drastically increase that classifier's share in the strong classifier. Taking the log narrows the gap between the weights, so every weak classifier keeps a meaningful share in the strong classifier.
AdaBoost Algorithm – Image Weight Adjusting Example

$$w_{t+1,i} = \begin{cases}w_{t,i}\,\beta_t, & \text{if } x_i \text{ is classified correctly}\\ w_{t,i}, & \text{otherwise}\end{cases}$$

At t = 1, if $\epsilon_{1,3} = 0.167$ is the smallest error, then $\beta_{1,3} = \dfrac{0.167}{1 - 0.167} \approx 0.2$.

Weight changes (six images W1–W6; O = classified correctly, X = misclassified):

   Initial:      0.167      0.167      0.167      0.167      0.167   0.167
   Classified:   O          O          O          O          X       O
   Update:       0.167×0.2  0.167×0.2  0.167×0.2  0.167×0.2  0.167   0.167×0.2
   Normalized:   0.1        0.1        0.1        0.1        0.5     0.1

Every round lowers the weights of correctly classified images; after normalization, the weights of the misclassified images rise relatively, so images that are often misclassified end up with higher weights. A higher weight means the classifier evaluation focuses on that image.
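A few lines reproducing this arithmetic; a minimal sketch, with the O/X pattern encoded as a boolean mask:

import numpy as np

w = np.full(6, 1 / 6)                  # initial weights, ~0.167 each
correct = np.array([1, 1, 1, 1, 0, 1], dtype=bool)  # image 5 misclassified
eps = w[~correct].sum()                # weighted error = 0.167
beta = eps / (1 - eps)                 # ~0.2
w[correct] *= beta                     # lower correctly classified weights
w /= w.sum()                           # normalize
print(w.round(2))                      # [0.1 0.1 0.1 0.1 0.5 0.1]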
3.3 Testing Process - Flowchart
Load the T weak classifiers of the strong classifier

$$H_T(x) = \begin{cases}\text{positive}, & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \dfrac{1}{2}\sum_{t=1}^{T} \alpha_t\\[4pt] \text{negative}, & \text{otherwise}\end{cases}$$

1. Extract sub-windows: downsample the test image (360×420) into a pyramid (360×420, 228×336, …, 24×28) and slide a 24×24 sub-window over every scale, giving about 100,000 sub-windows.
2. Strong classifier detection: for all sub-windows, apply h1, h2, h3, …, hT; the strong classifier rejects or accepts each window.
3. Merge result: average the coordinates of the accepted windows to produce the result image.
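A sketch of this testing loop, assuming a trained list of (alpha, h) pairs in which each h scores a 24×24 window; the scan step, pyramid scale factor, and nearest-neighbor resize are my own simplifications, not the paper's procedure:

import numpy as np

def resize_nn(img, factor):
    """Nearest-neighbor downsample by a (float) factor."""
    h = int(img.shape[0] / factor); w = int(img.shape[1] / factor)
    ys = (np.arange(h) * factor).astype(int)
    xs = (np.arange(w) * factor).astype(int)
    return img[ys][:, xs]

def detect(image, classifiers, win=24, step=2, scale=1.25):
    """Slide a win x win sub-window over an image pyramid; keep windows
    the strong classifier H_T accepts. classifiers: list of (alpha, h),
    h(window) -> 0 or 1."""
    thresh = 0.5 * sum(alpha for alpha, _ in classifiers)
    hits = []
    factor = 1.0
    while min(image.shape) / factor >= win:
        img = resize_nn(image, factor)
        for y in range(0, img.shape[0] - win + 1, step):
            for x in range(0, img.shape[1] - win + 1, step):
                window = img[y:y + win, x:x + win]
                score = sum(a * h(window) for a, h in classifiers)
                if score >= thresh:           # accept window
                    hits.append((x * factor, y * factor, win * factor))
        factor *= scale
    return hits  # merging (e.g., averaging coordinates) is left to the caller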
4. The Attentional Cascade (1/4)
Advantage: reduces testing computation time.
Method: cascade the stages.
Idea: reject as many negatives as possible at the earliest stage; more complex classifiers are then used in later stages. The detection process is that of a degenerate decision tree, a so-called "cascade".

[Figure 4: Cascade structure: each stage passes its positives (true and false) to the next stage and discards its negatives (true and false).]
4. The Attentional Cascade (2/4)

[Figure: three cascade stages, Stage 1, Stage 2, and Stage 3, each with its own strong classifier threshold θ.]
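A sketch of how one sub-window passes through the stages with early rejection; each stage is assumed to be a (classifiers, theta) pair, a structure of my choosing rather than the paper's:

def cascade_classify(window, stages):
    """stages: list of (classifiers, theta) where classifiers is a list of
    (alpha, h) pairs and theta is that stage's strong-classifier threshold.
    A window must pass every stage to be accepted; most negatives are
    rejected by the first stages, which keeps detection fast."""
    for classifiers, theta in stages:
        score = sum(alpha * h(window) for alpha, h in classifiers)
        if score < theta:
            return False        # early rejection
    return True                 # survived all stages: report a face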
4. The Attentional Cascade (3/4)
True positive rate (detection rate): the probability of judging a positive as positive = TP / (TP + FN).
False positive rate (FP): the probability of judging a negative as positive = FP / (FP + TN).
True negative rate: the probability of judging a negative as negative = TN / (FP + TN).
False negative rate (FN): the probability of judging a positive as negative = FN / (TP + FN).
FP + FN => error rate.
4. The Attentional Cascade (4/4)
Training a cascade of classifiers involves a tradeoff between two goals:
1. Higher detection rates
2. Lower false positive rates

More features achieve higher detection rates and lower false positive rates, but classifiers with more features require more time to compute.

Define an optimization framework over:
1. the number of stages,
2. the number of features in each stage,
3. the strong classifier threshold in each stage.
4. The Attentional Cascade - Algorithm (1/3)
f: maximum acceptable false positive rate per stage (the largest fraction of negatives allowed to be classified as positive).
d: minimum acceptable detection rate per stage (the smallest fraction of positives that must be detected).
F_target: target overall false positive rate (the false positive rate tolerated at the end).

Initial values:
P: total positive images
N: total negative images
f = 0.5
d = 0.9999
F_0 = 1.0 (initial false positive rate)
D_0 = 1.0 (initial detection rate)
Threshold = 0.5 (AdaBoost threshold)
Threshold_EPS = 10^{-4} (threshold adjustment step)
i = 0 (the number of cascade stages)
F_target = 10^{-6}
4. The Attentional Cascade - Algorithm (2/3)
Iterate:

While (F_i > F_target) {
    i = i + 1                                 // add a stage
    n_i = 0; F_i = F_{i-1}
    While (F_i > f × F_{i-1}) {
        n_i = n_i + 1                         // add a feature
        (D_i, F_i) = AdaBoost(P, N, n_i)
        While (D_i < d × D_{i-1}) {
            Threshold = Threshold − Threshold_EPS   // lowering the threshold raises D_i (and F_i)
            (D_i, F_i) = re-compute the current strong classifier's detection
                         rate with Threshold (this also affects F_i)   // get new D_i, F_i
        }
    }
    If (F_i > F_target)
        N = false detections of the current cascaded detector on the non-face images (≈ F_i × N)
}
4. The Attentional Cascade - Algorithm (3/3)
f: maximum acceptable false positive rate
d: minimum acceptable detection rate
F_target: target overall false positive rate
P: total positive images
N: total negative images
i: the number of cascade stages
F_i: false positive rate at the i-th stage
D_i: detection rate at the i-th stage
n_i: the number of features at the i-th stage
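The loop above as a Python-shaped sketch; train_stage, evaluate, and cascade_accepts are hypothetical stand-ins for "train a stage with n features", "measure (D, F) on P and N", and "run the current cascade", not functions from the paper:

def train_cascade(P, N, f=0.5, d=0.9999, F_target=1e-6, eps=1e-4):
    # train_stage(P, N, n) -> stage with n features and a mutable .threshold
    # evaluate(stage, P, N) -> (D, F) on the current training sets
    # cascade_accepts(stages, x) -> bool (the cascade built so far)
    stages, F, D = [], 1.0, 1.0
    while F > F_target:
        n, F_prev, D_prev = 0, F, D
        while F > f * F_prev:
            n += 1
            stage = train_stage(P, N, n)      # add a feature
            D, F = evaluate(stage, P, N)
            while D < d * D_prev:
                stage.threshold -= eps        # lower threshold: D (and F) rise
                D, F = evaluate(stage, P, N)
        stages.append(stage)
        if F > F_target:                      # keep only the false detections
            N = [x for x in N if cascade_accepts(stages, x)]
    return stages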
5. Experimental Results (1/3)
Face training set:
Extracted from the World Wide Web, using face and non-face training images. It consists of 4916 hand-labeled faces, scaled and aligned to a base resolution of 24 by 24 pixels. The non-face sub-windows come from 9544 images which were manually inspected and found not to contain any faces.

Fig. 5: Example of frontal upright face images used for training.
5. Experimental Results (2/3)
In the cascade training:
Use 4916 training faces and 10,000 non-face sub-windows, with the AdaBoost training procedure.

Evaluated on the MIT+CMU test set:
An average of 10 features per stage is evaluated per sub-window. This is possible because a large majority of sub-windows are rejected by the first or second stage in the cascade. On a 700 MHz Pentium III processor, the face detector can process a 384 by 288 pixel image in about 0.067 seconds.
5. Experimental Results (3/3)

Fig. 6: ROC curve (receiver operating characteristic), created by adjusting the threshold θ of the final-stage classifier.
Reference
[1] P. Viola and M. Jones, "Rapid Object Detection Using a Boosted Cascade of Simple Features", Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 511-518, 2001.
[2] P. Viola and M. Jones, "Robust Real-Time Face Detection", International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[3] A. S. S. Mohamed, Ying Weng, S. S. Ipson, and Jianmin Jiang, "Face Detection based on Skin Color in Image by Neural Networks", ICIAS 2007, pp. 779-783, 2007.
[4] AdaBoost - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/AdaBoost