What you need to know
• Dual SVM formulation
  – How it's derived
• The kernel trick
• Derive polynomial kernel
• Common kernels
• Kernelized logistic regression
• SVMs vs. kernel regression
• SVMs vs. logistic regression
Example: Dalal-Triggs pedestrian detector
1. Extract a fixed-size (64x128 pixel) window at each position and scale
2. Compute HOG (histogram of oriented gradients) features within each window
3. Score the window with a linear SVM classifier
4. Perform non-maxima suppression to remove overlapping detections with lower scores

Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005. Slides by Pete Barnum.
(Figure: gradient filters compared: uncentered, centered, cubic-corrected, diagonal, and Sobel. The simple centered filter outperforms the more elaborate alternatives.)
• Histogram of gradient orientations
  – Votes weighted by gradient magnitude
  – Bilinear interpolation between cells
• Orientation: 9 bins (for unsigned angles)
• Histograms computed in 8x8 pixel cells
Normalize with respect to surrounding cells
# features = 15 x 7 (cells) x 9 (orientations) x 4 (normalizations by neighboring cells) = 3780
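The count above can be sketched directly; a minimal check, assuming 8x8-pixel cells and overlapping 2x2-cell normalization blocks as in the paper:

```python
# HOG descriptor size for the 64x128 Dalal-Triggs detection window.
cells_x = 64 // 8        # 8 cells across
cells_y = 128 // 8       # 16 cells down
blocks_x = cells_x - 1   # 7 overlapping 2x2-cell block positions across
blocks_y = cells_y - 1   # 15 down
n_orientations = 9       # unsigned orientation bins
cells_per_block = 4      # each cell is normalized within a 2x2-cell block
n_features = blocks_y * blocks_x * n_orientations * cells_per_block
print(n_features)  # 3780
```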
(Figure: visualization of the positive and negative linear SVM weights learned for the pedestrian template.)
Viola-Jones sliding window detector

Fast detection through two mechanisms:
• Quickly eliminate unlikely windows
• Use features that are fast to compute

Viola and Jones, Rapid Object Detection using a Boosted Cascade of Simple Features, 2001.
Cascade for Fast Detection

(Diagram: examples enter Stage 1 (H1(x) > t1?); a "No" at any stage rejects the window immediately, while a "Yes" sends it on to Stage 2 (H2(x) > t2?), and so on through Stage N (HN(x) > tN?). Only windows accepted by every stage pass.)

• Choose each threshold for a low false negative rate
• Fast classifiers early in the cascade
• Slow classifiers later, but most examples don't get there
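The early-rejection logic can be sketched as follows; `stages` here is a hypothetical list of (classifier, threshold) pairs with toy lambdas standing in for trained stage classifiers:

```python
def cascade_detect(x, stages):
    # stages: ordered list of (classifier, threshold) pairs.
    # Reject as soon as any stage's score falls at or below its threshold;
    # only windows that pass every stage count as detections.
    for H, t in stages:
        if H(x) <= t:
            return False  # rejected early; later (slower) stages never run
    return True

# Toy two-stage cascade: a cheap test first, a (pretend) slower one second.
stages = [
    (lambda x: x[0], 0.5),             # stage 1: first feature only
    (lambda x: sum(x) / len(x), 0.5),  # stage 2: average of all features
]
print(cascade_detect([0.9, 0.8], stages))  # True: passes both stages
print(cascade_detect([0.1, 0.9], stages))  # False: rejected at stage 1
```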
Features that are fast to compute
• "Haar-like features"
  – Differences of sums of intensity (adjacent rectangles weighted -1 and +1)
  – Thousands, computed at various positions and scales within the detection window
• Two-rectangle features, three-rectangle features, etc.
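What makes these features fast is the integral image: after one cumulative-sum pass, any rectangle sum takes four lookups. A minimal sketch (the 4x4 image and the feature placement are made up for illustration):

```python
import numpy as np

def integral_image(img):
    # One cumulative-sum pass over rows and columns.
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1, c0:c1] (exclusive ends) via four lookups.
    s = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        s -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        s -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        s += ii[r0 - 1, c0 - 1]
    return s

# Two-rectangle feature on a made-up 4x4 image: right half (+1) minus
# left half (-1).
img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
feat = rect_sum(ii, 0, 2, 4, 4) - rect_sum(ii, 0, 0, 4, 2)
print(feat)  # 16.0
```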
Feature selection with AdaBoost
• Create a large pool of features (~180K)
• Select features that are discriminative and work well together
  – "Weak learner" = feature + threshold
  – Choose the weak learner that minimizes error on the weighted training set
  – Reweight the training examples and repeat
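One boosting round can be sketched with decision stumps as the weak learners; the data below is a toy stand-in for the Haar-feature responses the real detector would use:

```python
import numpy as np

def adaboost_round(X, y, weights):
    # One AdaBoost round (sketch). Each weak learner is a decision stump,
    # i.e. a (feature index, threshold) pair. Pick the stump with the
    # lowest weighted error, then upweight the examples it got wrong.
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            pred = np.where(X[:, j] > t, 1, -1)
            err = weights[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, j, t, pred)
    err, j, t, pred = best
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))  # stump's vote weight
    weights = weights * np.exp(-alpha * y * pred)       # mistakes get heavier
    return j, t, alpha, weights / weights.sum()

X = np.array([[0.1], [0.2], [0.8], [0.9]])
y = np.array([-1, -1, 1, 1])
j, t, alpha, w1 = adaboost_round(X, y, np.full(4, 0.25))
print(j, t)  # 0 0.2: the stump "feature 0 > 0.2" separates the toy data
```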
Integration
• Feature level
• Margin-based
  – Max-margin structure learning
• Probabilistic
  – Graphical models
Feature Passing
• Compute features from one estimated scene property to help estimate another
(Diagram: the image yields an X estimate and a Y estimate; features computed from the X estimate feed the Y estimator, and features from the Y estimate feed the X estimator.)
Feature passing: example
(Diagram: features are pooled from the object window itself and from the regions above and below it.)

Use features computed from "geometric context" confidence images to improve object detection. Features: average confidence within each window. (Hoiem et al., ICCV 2005)
Feature Passing
• Pros
  – Simple training and inference
  – Very flexible in modeling interactions
• Cons
  – Not modular: if we get a new method for the first estimates, we may need to retrain
Integration
• Feature passing
• Margin-based
  – Max-margin structure learning
• Probabilistic
  – Graphical models
Structured Prediction
• Prediction of complex outputs
  – Structured outputs: multivariate, correlated, constrained
• A novel, general way to solve many learning problems
Bipartite Matching

Example: word alignment for machine translation.
x: "What is the anticipated cost of collecting fees under the new proposal?"
y: "En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?" (the French translation of x)

(Diagram: a bipartite matching that aligns each English word to a French word; in the tokenized French, "des" is split into "de les".)

The set of legal alignments has combinatorial structure.
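For a toy score matrix, the maximum-weight matching can be found by brute force over permutations (the scores below are made up; real systems solve the matching LP or run the Hungarian algorithm rather than enumerating):

```python
from itertools import permutations

# Made-up 3x3 alignment scores scores[j][k] between source word j and
# target word k; the best matching maximizes the total score.
scores = [[0.9, 0.1, 0.0],
          [0.2, 0.8, 0.1],
          [0.0, 0.3, 0.7]]
best = max(permutations(range(3)),
           key=lambda p: sum(scores[j][p[j]] for j in range(3)))
print(best)  # (0, 1, 2): each source word aligns to the same-index target
```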
Structured Models

Mild assumptions:
• The scoring function is a linear combination of features: s(x, y) = w · f(x, y)
• f(x, y) decomposes as a sum of part scores
• y ranges over the space of feasible outputs Y(x)
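A minimal sketch of these assumptions, with matching edges as the parts (the weight vector and per-edge feature vectors are made up):

```python
def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))

def score(w, edge_features, y):
    # The score of output y is a sum of per-part scores; here the parts
    # are matching edges, each with its own feature vector f(x, edge).
    return sum(dot(w, edge_features[e]) for e in y)

w = [1.0, -0.5]
edge_features = {(0, 0): [2.0, 1.0],   # made-up per-edge features
                 (0, 1): [0.5, 2.0]}
print(score(w, edge_features, {(0, 0)}))  # 1.5 = 1.0*2.0 + (-0.5)*1.0
```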
Supervised Structured Prediction

Learning: from data {(x_i, y_i)}, estimate w.
Prediction: y* = argmax over y in Y(x) of w · f(x, y); for example, weighted matching. Generally this is a combinatorial optimization problem.

Estimation options for the model:
• Likelihood (can be intractable)
• Margin
• Local (ignores structure)
Local Estimation
• Treat edges as independent decisions
• Estimate w locally, use it globally
  – E.g., naïve Bayes, SVM, logistic regression
  – Cf. [Matusov et al., 03] for matchings
• Pros and cons
  – Simple and cheap
  – Not well-calibrated for the matching model
  – Ignores correlations & constraints
Conditional Likelihood Estimation
• Estimate w jointly by maximizing the conditional likelihood of the training outputs
• The denominator (the partition function over all matchings) is #P-complete [Valiant 79; Jerrum & Sinclair 93]
• Tractable model, intractable learning
• Need a tractable learning method: margin-based estimation
Structured large margin estimation
• We want the true output to score higher than every other feasible output:
  w · f(x, y*) > w · f(x, y) for all y ≠ y*
• Equivalently: w · f(x, y*) must beat the max over y ≠ y* of w · f(x, y), and by a lot!
• Example (handwriting recognition): the score of the true label "brace" should beat "aaaaa", "aaaab", ..., "zzzzz".
Large margin estimation
• Given training examples (x_i, y_i), we want:
  w · f(x_i, y_i) ≥ w · f(x_i, y) + γ for all y ≠ y_i
• Maximize the margin γ
• Mistake-weighted margin: require a larger margin against outputs with more mistakes,
  w · f(x_i, y_i) ≥ w · f(x_i, y) + γ ℓ(y_i, y),
  where ℓ(y_i, y) is the number of mistakes in y

*Collins 02; Altun et al. 03; Taskar 03
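The mistake-weighted constraint corresponds to a structured hinge loss. A sketch over a toy two-string output space (made-up features; in reality the output space is exponential and the inner max is computed by inference, not enumeration):

```python
def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))

def hamming(a, b):
    # Number of character positions where the two strings disagree.
    return sum(c1 != c2 for c1, c2 in zip(a, b))

def structured_hinge(w, feats, y_true, candidates):
    # Zero loss only if the true output beats every alternative y by a
    # margin of at least hamming(y_true, y).
    s_true = dot(w, feats[y_true])
    return max(0.0, max(dot(w, feats[y]) + hamming(y_true, y) - s_true
                        for y in candidates if y != y_true))

feats = {"brace": [1.0, 0.0], "zzzzz": [0.0, 1.0]}  # made-up features
w = [3.0, 0.0]
print(structured_hinge(w, feats, "brace", feats))  # 2.0 = 0 + 5 - 3
```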
Large margin estimation
• Brute-force enumeration: one constraint per feasible output, exponentially many
• Min-max formulation
  – Replace the exponentially many constraints with a single max; "plug in" a linear program for inference

Min-max formulation
• Structured loss (Hamming): ℓ(y_i, y) counts the positions where y disagrees with y_i
• Key step: the inner maximization over y is inference, a discrete optimization; relaxing it to an LP turns it into a continuous optimization that can be folded into learning
Matching Inference LP
• Variables z_jk ∈ [0, 1] indicate that source word j is aligned to target word k
• Degree constraints: each word participates in at most one alignment edge
• Maximize the total score, the sum over j, k of s_jk z_jk

(Diagram: the English-French word-alignment example again, with j indexing the English words and k the French words.)
• Need a Hamming-like loss that decomposes over the edges
LP Duality
• Linear programming duality
  – Primal variables correspond to dual constraints
  – Primal constraints correspond to dual variables
• Optimal values are the same
  – When both feasible regions are bounded
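A tiny worked check of the "optimal values are the same" claim, on a made-up LP; weak duality guarantees any feasible primal value is at most any feasible dual value, so a zero gap certifies both points optimal:

```python
# Toy primal LP:  max 3*x1 + 2*x2   s.t.  x1 + x2 <= 4,  x1 <= 2,  x >= 0
# Its dual:       min 4*y1 + 2*y2   s.t.  y1 + y2 >= 3,  y1 >= 2,  y >= 0
# (one dual variable per primal constraint, one dual constraint per
# primal variable).
x = (2.0, 2.0)  # primal feasible: 2 + 2 <= 4 and 2 <= 2
y = (2.0, 1.0)  # dual feasible:   2 + 1 >= 3 and 2 >= 2
primal = 3 * x[0] + 2 * x[1]
dual = 4 * y[0] + 2 * y[1]
print(primal, dual)  # 10.0 10.0: zero duality gap, both points optimal
```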