7/29/2019 Inf5300 v2013 Lecture2 Random 2pp
Technology for a better society
INF 5300 Advanced Topic: Video Content Analysis
Asbjørn Berge
Random algorithms in Computer Vision
Outline
Robust fitting by random sampling: Motivation / examples; Intuitive approach; RAndom SAmple Consensus; Algorithm specificities
Randomized classifiers: Recap of boosting classifiers; Tree classifiers / decision stumps; Randomness in training; Algorithm details
Structure from motion
Obtain 3D scene structure from multiple images from the same camera in different locations, poses
Typically, camera location & pose treated as unknowns
Track points across frames, infer camera pose & scene structure from correspondences
Simultaneous Localization And Mapping (SLAM)
Localize a robot and map its surroundings with a single camera
Inferring 3D
3D Reconstruction
Internet photos (Colosseum); reconstructed 3D cameras and points: http://photosynth.net/default.aspx
http://phototour.cs.washington.edu/applet/index.html
Why extract features? Motivation: panorama stitching
We have two images: how do we combine them?
Step 1: extract features
Step 2: match features
Step 3: align images
Local invariant features: outline
1) Detection: Identify the interest points
2) Description: Extract vector feature descriptor surrounding each interest point: $\mathbf{x}_1 = [x_1^{(1)}, \ldots, x_d^{(1)}]$, $\mathbf{x}_2 = [x_1^{(2)}, \ldots, x_d^{(2)}]$
3) Matching: Determine correspondence between descriptors in two views
Computing transformations
Given a set of matches between images A and B
How can we compute the transform T from A to B?
Find transform T that best agrees with the matches
Evaluating the results
How can we measure the performance of a feature matcher?
True/false positives
The distance threshold affects performance
True positives = # of detected matches that are correct. Suppose we want to maximize these: how to choose threshold?
False positives = # of detected matches that are incorrect. Suppose we want to minimize these: how to choose threshold?
[Figure: candidate matches at feature distances 50, 75 and 200, labelled as true or false matches]
Robustness: outliers
Robustness: let's consider a simpler example
Problem: fit a line to these data points (least squares fit)
How can we fix this?
Idea
Given a hypothesized line
Count the number of points that agree with the line
Agree = within a small distance of the line
I.e., the inliers to that line
For all possible lines, select the one with the largest number of inliers
How do we find the best line?
Unlike least-squares, no simple closed-form solution
Hypothesize-and-test: try out many lines, keep the best one
Which lines?
Translations
RAndom SAmple Consensus
Select one match at random, count inliers
Select another match at random, count inliers
Output the translation with the highest number of inliers
RANSAC
Idea: All the inliers will agree with each other on the translation vector; the (hopefully small) number of outliers will (hopefully) disagree with each other
RANSAC only has guarantees if there are < 50% outliers
"All good matches are alike; every bad match is bad in its own way." (Tolstoy via Alyosha Efros)
RANSAC (RANdom SAmple Consensus) [Fischler & Bolles, 1981]: learning technique to estimate parameters of a model by random sampling of observed data
$\pi : I \rightarrow \{P, O\}$ such that: $f(P, \beta) < \delta$, $\min |O|$
$f(P, \beta) = \lVert \beta - (P^T P)^{-1} P^T \rVert$
Model parameters: $\beta$
Algorithm:
1. Sample (randomly) the number of points required to fit the model
2. Solve for model parameters using samples
3. Score by the fraction of inliers within a preset threshold of the model
Repeat 1-3 until the best model is found with high confidence
RANSAC: line fitting example
For a line, the number of points required to fit the model in step 1 is 2.
[Figure: successive line hypotheses; one scores N_I = 6 inliers, a later one N_I = 14 inliers]
RANSAC
Inlier threshold related to the amount of noise we expect in inliers
Often model noise as Gaussian with some standard deviation (e.g., 3 pixels)
Number of rounds related to the percentage of outliers we expect, and the probability of success we'd like to guarantee
Suppose there are 20% outliers, and we want to find the correct answer with 99% probability
How many rounds do we need?
RANSAC
[Figure: matches plotted by x translation and y translation]
Set threshold so that, e.g., 95% of the Gaussian lies inside that radius
RANSAC
Back to linear regression
How do we generate a hypothesis?
RANSAC
General version:
1. Randomly choose s samples
Typically s = minimum sample size that lets you fit a model
2. Fit a model (e.g., line) to those samples
3. Count the number of inliers that approximately fit the model
4. Repeat N times
5. Choose the model that has the largest set of inliers
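The five steps above can be sketched for the line-fitting case as follows. This is a minimal illustration rather than the lecture's implementation: the function names `fit_line` and `ransac_line`, the default number of rounds, the threshold and the toy data are all assumptions made for the example.

```python
import random

def fit_line(p, q):
    """Line a*x + b*y + c = 0 through p and q, scaled so that a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0:  # identical points do not define a line
        return None
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)

def ransac_line(points, n_rounds=100, threshold=1.0, seed=0):
    """Hypothesize-and-test: keep the 2-point line with the most inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_rounds):                      # step 4: repeat N times
        p, q = rng.sample(points, 2)               # step 1: s = 2 samples
        model = fit_line(p, q)                     # step 2: fit a model
        if model is None:
            continue
        a, b, c = model
        # step 3: inliers lie within `threshold` of the line
        inliers = [(x, y) for x, y in points if abs(a * x + b * y + c) <= threshold]
        if len(inliers) > len(best_inliers):       # step 5: largest inlier set
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# Toy data (assumed): 20 points on y = 2x + 1 plus three gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
points += [(3.0, 40.0), (10.0, -15.0), (15.0, 90.0)]
model, inliers = ransac_line(points)
```

A final least-squares fit over `inliers` would normally follow, as the later "Final step" slide suggests.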
How big is s?
For alignment, depends on the motion model
Here, each sample is a correspondence (pair of matching points)
Final step: least squares fit
Find average translation vector over all inliers
Choosing the parameters
Initial number of points s: typically the minimum number needed to fit the model
Distance threshold t: choose t so probability for inlier is p (e.g. 0.95)
Zero-mean Gaussian noise with std. dev. $\sigma$: $t^2 = 3.84\sigma^2$
Number of samples N: choose N so that, with probability p, at least one random sample is free from outliers (e.g. p = 0.99) (outlier ratio: e)
$(1 - (1 - e)^s)^N = 1 - p$
$N = \log(1 - p) / \log(1 - (1 - e)^s)$

Number of samples N, by sample size s and proportion of outliers e:
s     5%   10%   20%   25%   30%   40%   50%
2      2     3     5     6     7    11    17
3      3     4     7     9    11    19    35
4      3     5     9    13    17    34    72
5      4     6    12    17    26    57   146
6      4     7    16    24    37    97   293
7      4     8    20    33    54   163   588
8      5     9    26    44    78   272  1177
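The formula for N can be checked against the table with a few lines of Python. The helper name `ransac_rounds` is made up for this sketch, and p = 0.99 is assumed throughout, as in the slide's example.

```python
import math

def ransac_rounds(p, e, s):
    """Smallest integer N with 1 - (1 - (1 - e)**s)**N >= p."""
    return math.ceil(math.log(1 - p) / math.log(1 - (1 - e) ** s))

# Rebuild a few table rows for p = 0.99.
outlier_ratios = (0.05, 0.10, 0.20, 0.25, 0.30, 0.40, 0.50)
rows = {s: [ransac_rounds(0.99, e, s) for e in outlier_ratios] for s in (2, 4, 8)}
for s, row in rows.items():
    print(s, row)
```

Under these assumptions, the question asked earlier in the deck (20% outliers, 99% success probability) comes out at 5 rounds for a two-point line model.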
Algorithmic specificities
Termination when inlier ratio reaches expected ratio of inliers: $T = (1 - e)\,n$
e is often unknown a priori, so pick worst case, e.g. 50%, and adapt if more inliers are found, e.g. 80% would yield e = 0.2
N = ∞, sample_count = 0
While N > sample_count repeat
    Choose a sample and count the number of inliers
    Set e = 1 - (number of inliers)/(total number of points)
    Recompute N from e
    Increment the sample_count by 1
Terminate
$N = \log(1 - p) / \log(1 - (1 - e)^s)$
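The adaptive loop can be sketched as below, applied to the translation example from earlier in the deck (s = 1: one match proposes a translation). The function `adaptive_ransac`, its arguments and the synthetic matches are assumptions for illustration. One common reading of the loop is used here: e is updated only when a hypothesis beats the best inlier count so far.

```python
import math
import random

def adaptive_ransac(data, fit, residual, s, threshold, p=0.99,
                    max_rounds=10000, seed=0):
    """RANSAC whose round budget N is recomputed from the best inlier ratio."""
    rng = random.Random(seed)
    n_needed = math.inf                       # N = infinity
    sample_count = 0
    best_model, best_count = None, 0
    while sample_count < n_needed and sample_count < max_rounds:
        model = fit(rng.sample(data, s))
        sample_count += 1
        count = sum(1 for d in data if residual(model, d) <= threshold)
        if count > best_count:
            best_model, best_count = model, count
            e = 1 - count / len(data)         # current outlier-ratio estimate
            w = (1 - e) ** s                  # chance that a sample is outlier-free
            # log1p(-w) = log(1 - w), kept accurate for very small w
            n_needed = 0 if w >= 1 else math.log(1 - p) / math.log1p(-w)
    return best_model, best_count, sample_count

# Synthetic matches (assumed): 30 translations near (5, -3), 10 scattered outliers.
inliers = [(5 + 0.1 * math.cos(i), -3 + 0.1 * math.sin(i)) for i in range(30)]
outliers = [(-20.0 + 4 * i, 15.0 + 3 * i) for i in range(10)]
model, count, rounds = adaptive_ransac(
    inliers + outliers,
    fit=lambda sample: sample[0],             # one match proposes the translation
    residual=math.dist,                       # Euclidean distance between translations
    s=1, threshold=0.5)
```

With 75% inliers and s = 1 the budget drops to about 3.3 rounds as soon as an inlier hypothesis is scored, so the loop stops far earlier than a worst-case setting would.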
* From Marc Pollefeys, COMP 256, 2003
RANSAC conclusions
Good
Robust to outliers
Applicable for larger number of parameters than Hough transform
Parameters are easier to choose than Hough transform
Bad
Computational time grows quickly with fraction of outliers and number of parameters
Not good for getting multiple fits
Common applications
Robust linear regression (and similar)
Computing a homography (e.g., image stitching)
Estimating fundamental matrix (relating two views)
Sounds familiar?
VLFeat demo of Ransac Homography fit
335 tentative matches
209 (62.39%) inlier matches out of 335
Mosaic
Reading materials and tools
R. Szeliski: Computer Vision: Algorithms and Applications, Chapters 4.1, 6.1. http://szeliski.org/Book/
M. Zuliani: RANSAC for Dummies. http://vision.ece.ucsb.edu/~zuliani/Research/RANSAC/docs/RANSAC4Dummies.pdf
Tools
RANSAC toolbox: https://github.com/RANSAC/RANSAC-Toolbox
VLFeat toolbox: http://www.vlfeat.org
OpenCV 3D reconstruction: http://opencv.itseez.com/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
Outline
Robust fitting by random sampling: Motivation / examples; Intuitive approach; RAndom SAmple Consensus; Algorithm specificities
Randomized classifiers: Recap of boosting classifiers; Tree classifiers / decision stumps; Randomness in training; Algorithm details
Recap: AdaBoost (Adaptive Boosting)
Instead of resampling, reweight misclassified training examples: increase the chance of being selected in a sampled training set, or increase the misclassification cost when training on the full set.
Components
: weak or base classifier Condition:
Recap: AdaBoost Intuition
B. Leibe
Consider a 2D feature space with positive and negative examples.
Each weak classifier splits the training examples with at least 50% accuracy.
Examples misclassified by a previous weak learner are given more emphasis at future rounds.
Slide credit: Kristen Grauman. Figure adapted from Freund & Schapire.
Final classifier is a combination of the weak classifiers.
Recap: AdaBoost Algorithm [Freund & Schapire, 1995]
Start with uniform weights on training examples {x1, ..., xn}
For T rounds:
    Evaluate weighted error for each feature, pick best.
    Re-weight the examples: incorrectly classified -> more weight; correctly classified -> less weight
Final classifier is combination of the weak ones, weighted according to error they had.
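The loop can be illustrated with a small discrete AdaBoost over decision stumps. This sketch assumes labels in {-1, +1} and the usual weight alpha = 0.5 ln((1 - err) / err); all function names and the toy data are chosen for the example and are not taken from the lecture.

```python
import math

def train_stump(X, y, w):
    """Best weighted stump as (error, feature, threshold, polarity)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[f] >= t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, pol)
    return best

def stump_predict(stump, x):
    _, f, t, pol = stump
    return pol if x[f] >= t else -pol

def adaboost(X, y, rounds):
    n = len(X)
    w = [1.0 / n] * n                            # uniform initial weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)             # pick the best weak classifier
        err = max(stump[0], 1e-12)
        if err >= 0.5:                           # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # misclassified examples gain weight, correct ones lose weight
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy 1D problem (assumed): positives form an interval, so no single stump suffices.
X = [[float(i)] for i in range(10)]
y = [1 if 3 <= i <= 6 else -1 for i in range(10)]
ensemble = adaboost(X, y, rounds=3)
```

On this toy set one stump alone misclassifies at least three points, while the weighted vote of three stumps labels every training example correctly.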
Randomized Decision Forests
Very fast tools for classification, clustering, regression
Good generalization through randomized training
Inherently multi-class: automatic feature sharing [Torralba et al. 07]
Simple training / testing algorithms
Randomized Decision Forests = Randomized Forests = Random Forests™
A brief history of forests
[ L. Breiman, J. Friedman, C.J. Stone, and R.A. Olshen. Classification and Regression Trees (CART). 1984 ]
[ Y. Amit and D. Geman. Randomized enquiries about shape; An application to handwritten digit recognition. Technical Report 1994 ]
[ Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. 1997 ]
[ L. Breiman. Random forests. 1999, 2001 ]
[ V. Lepetit and P. Fua. Keypoint recognition using randomized trees. 2005, 2006 ]
[ F. Moosmann, B. Triggs, F. Jurie. Fast discriminative visual codebooks using randomized clustering forests. 2006 ]
[ G. Rogez, J. Rihan, S. Ramalingam, C. Orrite, P. Torr. Randomized trees for human pose detection. 2008 ]
[ C. Leistner, A. Saffari, J. Santner, H. Bischof. Semi-supervised random forests. 2009 ]
[ A. Saffari, C. Leistner, J. Santner, M. Godec, H. Bischof. On-line random forests. 2009 ]
[ S. Nowozin, C. Rother, S. Bagon, T. Sharp, B. Yao, and P. Kohli. Decision tree fields. 2011 ]
What can decision forests do? tasks
Regression forests
Classification forests
Semi-supervised forests
What can decision forests do? Applications
Regression forests, e.g. object localization
Classification forests, e.g. semantic segmentation
Semi-supervised forests, e.g. semi-sup. semantic segmentation
Decision trees and decision forests
A forest is an ensemble of trees. The trees are all slightly different from one another.
[ Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. Neural Computation. 9:1545--1588, 1997]
[ L. Breiman. Random forests. Machine Learning. 45(1):5--32, 2001]
[Figure: a decision tree asking "Is top part blue?", "Is bottom part green?", "Is bottom part blue?"]
[Figure: a general tree structure with root node 0, internal (split) nodes and terminal (leaf) nodes, numbered 0 to 14]
Decision tree testing (runtime)
Input data in feature space
Input test point
Split the data at node
Prediction at leaf
The decision forest model
Basic notation
Input data point: collection of feature responses. d = ?
Output/label space: categorical, continuous?
Feature response selector: features can be e.g. wavelets? Pixel intensities? Context?
Forest model
Node test parameters: parameters related to each split node: i) which features, ii) what geometric primitive, iii) thresholds.
Node weak learner: the test function for splitting data at a node j.
Node objective function (train.): the energy to be minimized when training the j-th split node.
Stopping criteria (train.): when to stop growing a tree during training, e.g. max tree depth.
Randomness model (train.): how is randomness injected during training? How much? E.g. 1. Bagging, 2. Randomized node optimization.
Leaf predictor model: point estimate? Full distribution?
Forest size: total number of trees in the forest.
The ensemble model: how to compute the forest output from that of individual trees?
Decision forest model: the randomness model
1) Bagging (randomizing the training set)
The full training set
The randomly sampled subset of training data made available for the tree t
Forest training
Efficient training
Decision forest model: the randomness model
2) Randomized node optimization (RNO)
The full set of all possible node test parameters
For each node, the set of randomly sampled features
Randomness control parameter $\rho$: at one extreme, no randomness and maximum tree correlation; at the other, max randomness and minimum tree correlation.
The effect of $\rho$: small value of $\rho$, little tree correlation; large value of $\rho$, large tree correlation.
Node training: node weak learner, node test params
Decision forest model: the ensemble model
An example forest to predict continuous variables
Decision forest model: training and information gain
Node training (for categorical, non-parametric distributions)
Information gain
Shannon's entropy
[Figure: class distributions before the split and after candidate splits Split 1 and Split 2]
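The slide names the two quantities without their formulas. Assuming the standard definitions, Shannon entropy $H(S) = -\sum_c p(c) \log_2 p(c)$ and information gain $I = H(S) - \sum_{i \in \{L, R\}} \frac{|S^i|}{|S|} H(S^i)$, comparing two candidate splits might look like this (function names and labels are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of the empirical label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    return (entropy(parent)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))

parent = ["a"] * 4 + ["b"] * 4
# Split 1 separates the classes perfectly; Split 2 leaves both children mixed.
gain_split1 = information_gain(parent, ["a"] * 4, ["b"] * 4)
gain_split2 = information_gain(parent, ["a", "a", "b", "b"], ["a", "a", "b", "b"])
```

Node training keeps the candidate with the larger gain, here Split 1.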
Decision forest model: training and information gain
Node training (for continuous, parametric densities)
Information gain
Differential entropy of Gaussian
[Figure: densities before the split and after candidate splits Split 1 and Split 2]
Background: overfitting and underfitting
Classification forests
Efficient, supervised multi-class classification
[ V. Lepetit and P. Fua. Keypoint Recognition Using Randomized Trees. IEEE Trans. PAMI. 2006.]
Decision Tree Pseudo-Code
double[] ClassifyDT(node, v)
    if node.IsSplitNode then
        if node.f(v) >= node.t then
            return ClassifyDT(node.right, v)
        else
            return ClassifyDT(node.left, v)
        end
    else
        return node.P
    end
end
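A runnable Python rendering of the pseudo-code above, given as a sketch: the `Node` dataclass and the two-leaf example tree are assumptions, while the field names `f`, `t`, `left`, `right` and `P` follow the pseudo-code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

@dataclass
class Node:
    P: Optional[List[float]] = None                         # leaf: class distribution
    f: Optional[Callable[[Sequence[float]], float]] = None  # split: feature function
    t: float = 0.0                                          # split: threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_split_node(self) -> bool:
        return self.f is not None

def classify_dt(node: Node, v: Sequence[float]) -> List[float]:
    """Walk from the root to a leaf and return the distribution stored there."""
    while node.is_split_node:
        node = node.right if node.f(v) >= node.t else node.left
    return node.P

# Hypothetical tree: one split on the first coordinate at 0.5.
root = Node(f=lambda v: v[0], t=0.5,
            left=Node(P=[0.9, 0.1]), right=Node(P=[0.2, 0.8]))
low = classify_dt(root, [0.1, 3.0])
high = classify_dt(root, [0.7, 3.0])
```

The loop replaces the recursion of the pseudo-code; both visit the same nodes.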
Toy Example
feature vectors are x, y coordinates
split functions are lines with parameters a, b
threshold determines intercepts
four classes: purple, blue, red, green
Try several lines, chosen at random
Keep line that best separates data (information gain)
Recurse
33/50
33
Technology for a better society
feature vectors are x, y coordinates: ,
split functions are lines with parameters a, b: threshold determines intercepts: four classes: purple, blue, red, green
x
yToy Example Try several lines, chosen at
random
Keep line that best separates data
information gain
Recurse
Technology for a better society
x
y
feature vectors are x, y coordinates: ,
split functions are lines with parameters a, b: threshold determines intercepts: four classes: purple, blue, red, green
Toy Example Try several lines, chosen at
random
Keep line that best separates data
information gain
Recurse
7/29/2019 Inf5300 v2013 Lecture2 Random 2pp
34/50
34
Technology for a better society
feature vectors are x, y coordinates: ,
split functions are lines with parameters a, b: threshold determines intercepts: four classes: purple, blue, red, green
x
yToy Example Try several lines, chosen at
random
Keep line that best separates data
information gain
Recurse
Technology for a better society
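Randomized Learning
Recursively split examples at node n
set $I_n$ indexes labeled training examples $(v_i, l_i)$:
left split $I_l = \{ i \in I_n \mid f(v_i) < t \}$
right split $I_r = I_n \setminus I_l$
with threshold t and function f of example i's feature vector $v_i$
At node n, $P_n(c)$ is histogram of example labels $l_i$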
Randomized Learning Pseudo Code
TreeNode LearnDT(I)
    repeat featureTests times
        let f = RndFeature()
        let r = EvaluateFeatureResponses(I, f)
        repeat threshTests times
            let t = RndThreshold(r)
            let (I_l, I_r) = Split(I, r, t)
            let gain = InfoGain(I_l, I_r)
            if gain is best then remember f, t, I_l, I_r
        end
    end
    if best gain is sufficient
        return SplitNode(f, t, LearnDT(I_l), LearnDT(I_r))
    else
        return LeafNode(HistogramExamples(I))
    end
end
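The learning procedure above might be written in Python roughly as follows. Several choices are assumptions for the sketch: features are single coordinates (axis-aligned splits), thresholds are drawn uniformly between the smallest and largest response, "sufficient" means gain above `min_gain`, and a `max_depth` limit is added as a stopping criterion.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def learn_dt(examples, rng, feature_tests=10, thresh_tests=10,
             min_gain=1e-6, max_depth=8):
    """examples: list of (feature_vector, label). Returns a nested dict tree."""
    labels = [l for _, l in examples]
    best = None  # (gain, feature index, threshold, left examples, right examples)
    if max_depth > 0 and len(set(labels)) > 1:
        for _ in range(feature_tests):
            f = rng.randrange(len(examples[0][0]))   # RndFeature: one coordinate
            r = [v[f] for v, _ in examples]          # EvaluateFeatureResponses
            for _ in range(thresh_tests):
                t = rng.uniform(min(r), max(r))      # RndThreshold
                left = [ex for ex, ri in zip(examples, r) if ri < t]
                right = [ex for ex, ri in zip(examples, r) if ri >= t]
                if not left or not right:
                    continue
                gain = entropy(labels) - sum(        # InfoGain
                    len(part) / len(examples) * entropy([l for _, l in part])
                    for part in (left, right))
                if best is None or gain > best[0]:
                    best = (gain, f, t, left, right)
    if best is not None and best[0] > min_gain:
        _, f, t, left, right = best
        kwargs = dict(feature_tests=feature_tests, thresh_tests=thresh_tests,
                      min_gain=min_gain, max_depth=max_depth - 1)
        return {"f": f, "t": t,
                "left": learn_dt(left, rng, **kwargs),
                "right": learn_dt(right, rng, **kwargs)}
    return {"hist": Counter(labels)}                 # LeafNode(HistogramExamples(I))

def classify(node, v):
    """Route v to a leaf and return the most frequent label stored there."""
    while "hist" not in node:
        node = node["right"] if v[node["f"]] >= node["t"] else node["left"]
    return node["hist"].most_common(1)[0][0]

# Toy data (assumed): the first coordinate separates the classes, the second does not.
examples = ([((-1.0 - 0.1 * i, (7 * i % 5) / 5.0), 0) for i in range(10)] +
            [((1.0 + 0.1 * i, (3 * i % 5) / 5.0), 1) for i in range(10)])
tree = learn_dt(examples, random.Random(0))
```

Because only a random handful of features and thresholds is tried at each node, different seeds give different trees, which is the source of diversity a forest relies on.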
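A Forest of Trees
Forest is ensemble of several decision trees
classification is $P(c \mid v) = \frac{1}{T} \sum_{t=1}^{T} P_t(c \mid v)$
[Figure: trees 1 to T, each routing v through split nodes to a leaf node that stores a distribution over category c]
[Amit & Geman 97] [Breiman 01] [Lepetit et al. 06]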
Decision Forests Pseudo-Code
double[] ClassifyDF(forest, v)
    // allocate memory
    let P = double[forest.CountClasses]
    // loop over trees in forest
    for t = 1 to forest.CountTrees
        let P' = ClassifyDT(forest.Tree[t], v)
        P = P + P'  // sum distributions
    end
    // normalise
    P = P / forest.CountTrees
    return P
end
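The averaging step can be shown in a few lines of Python. To keep the sketch self-contained, each "tree" is assumed to be any callable that maps a feature vector to a class distribution; the three stand-in trees below are hypothetical.

```python
def classify_df(trees, v):
    """Average the class distributions returned by all trees for input v."""
    total = [0.0] * len(trees[0](v))
    for tree in trees:
        p_t = tree(v)
        total = [a + b for a, b in zip(total, p_t)]   # sum distributions
    return [a / len(trees) for a in total]            # normalise

# Hypothetical stand-ins for trained trees over two classes.
trees = [
    lambda v: [0.9, 0.1] if v[0] < 0 else [0.2, 0.8],
    lambda v: [0.6, 0.4] if v[1] < 0 else [0.3, 0.7],
    lambda v: [0.5, 0.5],
]
posterior = classify_df(trees, (-1.0, 1.0))
```

For this input the three trees vote [0.9, 0.1], [0.3, 0.7] and [0.5, 0.5], so the forest posterior is their mean and still sums to one.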
Learning a Forest
Divide training examples into subsets: improves generalization, reduces memory requirements & training time
Train each decision tree on subset: same decision tree learning as before
Multi-core friendly
Subsets can be chosen at random or hand-picked
Subsets can have overlap (and usually do)
Can enforce subsets of images (not just examples)
Could also divide the feature pool into subsets
Learning a Forest Pseudo Code
Forest LearnDF(countTrees, I)
    // allocate memory
    let forest = Forest(countTrees)
    // loop over trees in forest
    for t = 1 to countTrees
        let I_t = RandomSplit(I)
        forest[t] = LearnDT(I_t)
    end
    // return forest object
    return forest
end
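A short Python sketch of the same loop. `RandomSplit` is assumed here to draw a random half of the examples without replacement (so subsets drawn for different trees overlap), and the tree learner is passed in as a callable; `learn_leaf` is a deliberately trivial stand-in for `LearnDT`.

```python
import random
from collections import Counter

def random_split(examples, rng, fraction=0.5):
    """Random subset of the training examples for one tree."""
    k = max(1, int(fraction * len(examples)))
    return rng.sample(examples, k)

def learn_df(count_trees, examples, learn_tree, seed=0):
    """Train count_trees trees, each on its own random subset."""
    rng = random.Random(seed)
    return [learn_tree(random_split(examples, rng)) for _ in range(count_trees)]

def learn_leaf(subset):
    """Hypothetical stand-in for LearnDT: a single leaf holding a label histogram."""
    return Counter(label for _, label in subset)

examples = [((float(i),), i % 2) for i in range(10)]
forest = learn_df(5, examples, learn_leaf)
```

Since each tree is trained independently of the others, the list comprehension could be replaced by a parallel map, which is what makes forest training multi-core friendly.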
Toy Forest Classification Demo
6 classes in a 2 dimensional feature space. Split functions are lines in this space.
Toy Forest Classification Demo
With a depth 2 tree, you cannot separate all six classes.
Toy Forest Classification Demo
With a depth 3 tree, you are doing better, but still cannot separate all six classes.
Toy Forest Classification Demo
With a depth 4 tree, you now have at least as many leaf nodes as classes, and so are able to classify most examples correctly.
Toy Forest Classification Demo
Different trees within a forest can give rise to very different decision boundaries, none of which is particularly good on its own.
Toy Forest Classification Demo
But averaging together many trees in a forest can result in decision boundaries that look very sensible, and are even quite close to the max margin classifier. (Shading represents entropy: darker is higher entropy.)
Classification forest
[Figure: training data in feature space, with query points marked "?"]
Entropy of a discrete distribution
with
Classification tree training
Obj. funct. for node j (information gain)
Training node j
Output is categorical
Input data point
Node weak learner
Predictor model (class posterior)
Model specialization for classification
( is feature response)
(discrete set)
Classification forest: the weak learner model
Node weak learner
Node test params
Splitting data at node j
Examples of weak learners
Weak learner: axis aligned. Feature response for 2D example.
Weak learner: oriented line. Feature response for 2D example, with a generic line in homog. coordinates.
Weak learner: conic section. Feature response for 2D example, with a matrix representing a conic.
In general may select only a very small subset of features.
See Appendix C for relation with kernel trick.
Classification forest: the prediction model
What do we do at the leaf?
Prediction model: probabilistic
Classification forest: the ensemble model
Tree t=1 t=2 t=3
Forest output probability
The ensemble model
Training different trees in the forest
Testing different trees in the forest
(2 videos in this page)
Classification forest: effect of the weak learner model
Parameters: T=200, D=2, weak learner = aligned, leaf model = probabilistic
Accuracy of prediction
Quality of confidence
Generalization
Three concepts to keep in mind:
Training points
Training different trees in the forest
Testing different trees in the forest
Classification forest: effect of the weak learner model
Parameters: T=200, D=2, weak learner = linear, leaf model = probabilistic (2 videos in this page)
Training points
Classification forest: effect of the weak learner model
Training different trees in the forest
Testing different trees in the forest
Parameters: T=200, D=2, weak learner = conic, leaf model = probabilistic (2 videos in this page)
Training points
Classification forest: with >2 classes
Training different trees in the forest
Testing different trees in the forest
Parameters: T=200, D=3, weak learner = conic, leaf model = probabilistic (2 videos in this page)
Training points
Classification forest: effect of tree depth
[Figure: effect of max tree depth D, from underfitting (small D) to overfitting (large D); panels: T=200, D=3, w. l. = conic; T=200, D=6, w. l. = conic; T=200, D=15, w. l. = conic]
Predictor model = prob. (3 videos in this page)
Training points: 4-class mixed
Classification forest: analysing generalization
Parameters: T=200, D=3, leaf model = probabilistic
Weak learner: axis aligned
Weak learner: oriented line
Weak learner: conic section
Training points
(3 videos in this page. Increasing T)
Classification forest: analysing generalization
Parameters: T=200, D=13, w. l. = conic, predictor = prob. (3 videos in this page)
[Figure: testing posteriors for three training sets: 4-class spiral; 4-class spiral, large gaps; 4-class spiral, larger gaps]
Classification forest: comparison with boosting
[Boosting code in http://graphics.cs.msu.ru/ru/science/research/machinelearning/adaboosttoolbox]
Boosting parameters: 200 weak learners. Weak learners = axis aligned
Forest parameters: T=200, D=13, w. l. = axis aligned, l. m. = probabilistic
[Figure: classification forest vs. ModestBoost vs. ModestBoost (soft output), on Example 1 and Example 2]
Classification forest: comparison with SVM
Max-margin like behaviour for multi-class problem
Increased uncertainty away from training points
Increased uncertainty in mixed regions
Parameters: T=200, D=6, weak learner = conic, leaf model = probabilistic (4 videos in this page)
Classification forest: comparison with SVM
Note overfitting + overly confident
Same high confidence away from training data
Lack of symmetry
SVM produces nice separation but no confidence information
[SVM code in http://asi.insa-rouen.fr/enseignants/~arakotom/toolbox/index.html]
Classification forest: max-margin for multiple classes
Training points
weak learner: conic section; weak learner: oriented line
Summary: Random Forests
Properties
Very simple algorithm.
Resistant to overfitting: generalizes well to new data.
Faster training
Extensions available for clustering, distance learning, etc.
Limitations
Memory consumption: decision tree construction uses much more memory.
Well-suited for problems with little training data: little performance gain when training data is really large.
B. Leibe
Why do they work?
Suppose there are 25 base classifiers
Each classifier has error rate $\varepsilon = 0.35$
Assume independence among classifiers
Probability that the ensemble classifier makes a wrong prediction:
$\sum_{i=13}^{25} \binom{25}{i} \varepsilon^i (1 - \varepsilon)^{25 - i} = 0.06$
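The figure of 0.06 can be reproduced directly: a majority vote of 25 independent classifiers is wrong when at least 13 of them are wrong. The helper name `ensemble_error` is made up for this check.

```python
from math import comb

def ensemble_error(n, eps):
    """Probability that more than half of n independent classifiers err."""
    k = n // 2 + 1
    return sum(comb(n, i) * eps ** i * (1 - eps) ** (n - i) for i in range(k, n + 1))

err_25 = ensemble_error(25, 0.35)
print(round(err_25, 2))
```

The gain depends on the base classifiers being better than chance: with $\varepsilon = 0.5$ the same sum is exactly 0.5, so voting gives no benefit.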
Relation to Cascades [Viola & Jones 04]
Boosted Cascades: very unbalanced tree
good for unbalanced binary problems, e.g. sliding window object detection
Hard to learn
Randomized forests: less deep, fairly balanced
ensemble of trees gives robustness
good for multi-class problems
Credits, reading materials and tools
Many decision tree slides from [A. Criminisi and J. Shotton, 2013]
Tree software (C#/C++): Sherwood, downloadable from http://research.microsoft.com/projects/decisionforests/
Random Forests in Matlab: https://github.com/karpathy/Random-Forest-Matlab
Random Forests : http://www.stat.berkeley.edu/~breiman/RandomForests/
Hastie et al "The elements of statistical learning" Chap 9.1, 9.2