CS 1699: Intro to Computer Vision
Intro to Machine Learning & Visual Recognition – Part II
Prof. Adriana Kovashka, University of Pittsburgh
October 27, 2015
Homework
• HW3: push back to Nov. 3
• HW4: push back to Nov. 24
• HW5: make half-length, still 10% of overall grade, out Nov. 24, due Dec. 10
• Take out a small piece of paper and vote yes/no on this proposal
Other announcements
• Piazza
– Can get participation credit by asking/answering
• Feedback from surveys
Plan for today
• Visual recognition problems
• Recognition pipeline
– Features and data
– Challenges
• Overview of some methods for classification
• Challenges and trade-offs
Some translations
• Feature vector = descriptor = representation
• Recognition often involves classification
• Classes = categories (hence classification = categorization)
• Training = learning a model (e.g. classifier), happens at training time from training data
• Classification = prediction, happens at test time
Classification
• Given a feature representation for images, learn a model for distinguishing features from different classes
[Figure: zebra vs. non-zebra image features separated by a decision boundary]
Slide credit: L. Lazebnik
Image categorization
• Cat vs Dog
Slide credit: D. Hoiem
Image categorization
• Object recognition
Caltech 101 Average Object Images
Slide credit: D. Hoiem
Image categorization
• Place recognition
Places Database [Zhou et al. NIPS 2014]
Slide credit: D. Hoiem
Region categorization
• Material recognition
[Bell et al. CVPR 2015]
Slide credit: D. Hoiem
Recognition: A machine learning approach
The machine learning framework
• Apply a prediction function to a feature representation of the image to get the desired output:
f( ) = “apple”
f( ) = “tomato”
f( ) = “cow”
Slide credit: L. Lazebnik
The machine learning framework
y = f(x), where y is the output label, f is the prediction function, and x is the image feature
• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set
• Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)
Slide credit: L. Lazebnik
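To make y = f(x) concrete, here is a minimal sketch in Python/NumPy (not from the slides): a toy nearest-class-mean predictor is fit to labeled training features and then applied to an unseen test example.

import numpy as np

# Toy training set: 2-D feature vectors x_i with labels y_i (0 or 1).
X_train = np.array([[1.0, 2.0], [1.2, 1.8], [4.0, 5.0], [4.2, 4.8]])
y_train = np.array([0, 0, 1, 1])

# "Training": estimate f by computing one mean feature vector per class.
class_means = {c: X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)}

def f(x):
    """Prediction function: return the label of the closest class mean."""
    return min(class_means, key=lambda c: np.linalg.norm(x - class_means[c]))

# Training error: fraction of training examples f gets wrong.
train_error = np.mean([f(x) != y for x, y in zip(X_train, y_train)])

# "Testing": apply f to a never-before-seen example.
x_test = np.array([3.9, 5.1])
print("training error:", train_error, "| predicted label:", f(x_test))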
Steps
[Pipeline diagram. Training: training images + training labels → image features → training → learned model. Testing: test image → image features → learned model → prediction]
Slide credit: D. Hoiem and L. Lazebnik
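A skeleton of this pipeline in Python (a sketch only: extract_features is a hypothetical stand-in for any of the descriptors discussed next, and scikit-learn's LinearSVC is just one convenient choice of classifier, not one prescribed by the course):

import numpy as np
from sklearn.svm import LinearSVC  # any classifier with fit/predict would do

def extract_features(image):
    # Hypothetical placeholder: return a fixed-length descriptor for the image
    # (e.g. a color histogram or HOG vector); here we just flatten the pixels.
    return np.asarray(image, dtype=float).ravel()

def train(train_images, train_labels):
    X = np.stack([extract_features(im) for im in train_images])  # "Image Features"
    model = LinearSVC()              # "Training": learn a model from features + labels
    model.fit(X, train_labels)
    return model                     # "Learned model"

def predict(model, test_image):
    x = extract_features(test_image)              # same features at test time
    return model.predict(x[np.newaxis, :])[0]     # "Prediction"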
BOARD
Popular global image features
• Raw pixels (and simple functions of raw pixels)
• Histograms, bags of features
• GIST descriptors [Oliva and Torralba, 2001]
• Histograms of oriented gradients (HOG) [Dalal and Triggs, 2005]
Slide credit: L. Lazebnik
What kind of things do we compute histograms of?
• Color (L*a*b* color space, HSV color space)
• Texture (filter banks or HOG over regions)
Slide credit: D. Hoiem
What kind of things do we compute histograms of?
• Histograms of descriptors
• “Bag of visual words”
SIFT – [Lowe IJCV 2004]
Slide credit: D. Hoiem
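As an example of a global histogram feature, here is a sketch of a simple color histogram descriptor (assuming an RGB image stored as an H x W x 3 NumPy array; real systems often histogram L*a*b*/HSV values, filter-bank responses, or quantized SIFT "visual words" instead):

import numpy as np

def color_histogram(image, bins=8):
    """Concatenate per-channel intensity histograms into one global descriptor."""
    hists = []
    for c in range(3):                                   # R, G, B channels
        h, _ = np.histogram(image[:, :, c], bins=bins, range=(0, 256))
        hists.append(h)
    desc = np.concatenate(hists).astype(float)
    return desc / desc.sum()                             # normalize to sum to 1

# Usage on a random "image":
img = np.random.randint(0, 256, size=(120, 160, 3))
print(color_histogram(img).shape)    # (24,) = 3 channels x 8 bins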
Steps
[Pipeline diagram. Training: training images + training labels → image features → training → learned model. Testing: test image → image features → learned model → prediction]
Slide credit: D. Hoiem and L. Lazebnik
Recognition training data
• Images in the training set must be annotated with the “correct answer” that the model is expected to produce
“Motorbike”
Slide credit: L. Lazebnik
Challenges: robustness
Illumination, object pose, clutter, viewpoint, intra-class appearance variation, occlusions
Slide credit: K. Grauman
Challenges: importance of context
Slide credit: Fei-Fei, Fergus & Torralba
Painter identification
• How would you learn to identify the author of a painting?
Goya Kirchner Klimt Marc Monet Van Gogh
Plan for today
• Visual recognition problems
• Recognition pipeline
– Features and data
– Challenges
• Overview of some methods for classification
• Challenges and trade-offs
One way to think about it…
• Training labels dictate that two examples are the same or different, in some sense
• Features and distances define visual similarity
• Goal of training is to learn feature weights so that visual similarity predicts label similarity
– Linear classifier: confidence in the positive label is a weighted sum of features
– What are the weights? (a scoring sketch follows below)
• We want the simplest function that is confidently correct
Adapted from D. Hoiem
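A minimal sketch of that weighted sum (the values below are made up for illustration; the weights w and bias b are exactly what training has to determine):

import numpy as np

x = np.array([0.2, 0.7, 0.1])      # feature vector for one image
w = np.array([1.5, -2.0, 0.5])     # learned feature weights (here just made up)
b = 0.1                            # bias term

score = np.dot(w, x) + b           # confidence in the positive label
label = 1 if score > 0 else -1     # simplest decision rule
print(score, label)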
BOARD
Supervised classification
• Given a collection of labeled examples, come up with a function that will predict the labels of new examples.
• How good is some function that we come up with to do the classification?
• Depends on
– Mistakes made
– Cost associated with the mistakes
[Figure: training examples labeled “four” and “nine”; a novel input to be classified]
Slide credit: K. Grauman
Supervised classification
• Given a collection of labeled examples, come up with a function that will predict the labels of new examples.
• Consider the two-class (binary) decision problem
– L(4→9): Loss of classifying a 4 as a 9
– L(9→4): Loss of classifying a 9 as a 4
• Risk of a classifier s is expected loss:
R(s) = Pr(4→9 | using s) · L(4→9) + Pr(9→4 | using s) · L(9→4)
• We want to choose a classifier so as to minimize this total risk.
Slide credit: K. Grauman
Supervised classification
[Plot: distributions of the two classes over feature value x]
Optimal classifier will minimize total risk.
At decision boundary, either choice of label yields same expected loss.
If we choose class “four” at the boundary, expected loss is:
P(class is 9 | x) · L(9→4) + P(class is 4 | x) · L(4→4)
If we choose class “nine” at the boundary, expected loss is:
P(class is 4 | x) · L(4→9)
So, the best decision boundary is at the point x where
P(class is 9 | x) · L(9→4) = P(class is 4 | x) · L(4→9)
Slide credit: K. Grauman
Supervised classification
To classify a new point, choose class with lowest expected loss; i.e., choose “four” if
P(class is 9 | x) · L(9→4) < P(class is 4 | x) · L(4→9)
(left side: loss for choosing “four”; right side: loss for choosing “nine”)
[Plot: distributions of the two classes over feature value x]
Optimal classifier will minimize total risk.
At decision boundary, either choice of label yields same expected loss.
Slide credit: K. Grauman
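A worked example with made-up numbers: suppose L(9→4) = 10, L(4→9) = 1, and at some x we have P(class is 9 | x) = 0.3 and P(class is 4 | x) = 0.7. Choosing “four” has expected loss 0.3 · 10 = 3, while choosing “nine” has expected loss 0.7 · 1 = 0.7. So we choose “nine” even though “four” is the more probable class, because misclassifying a true 9 is assumed to be much more costly.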
Disclaimers
• We will often assume the same loss for all possible types of misclassifications
• We won’t always build probability distributions – often we’ll just find a decision boundary (using discriminative methods)
• What’s the simplest classifier you can think of?
Nearest neighbor classifier
f(x) = label of the training example nearest to x
• All we need is a distance function for our inputs
• No training required!
[Figure: test example among training examples from class 1 and class 2]
Slide credit: L. Lazebnik
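A minimal 1-NN sketch (assuming feature vectors and Euclidean distance; any distance function could be substituted):

import numpy as np

def nearest_neighbor_label(x, X_train, y_train):
    """Return the label of the training example closest to x."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every example
    return y_train[np.argmin(dists)]              # no training step, just a lookup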
K-Nearest Neighbors classification
k = 5
Slide credit: D. Lowe
• For a new point, find the k closest points from training data
• Labels of the k points “vote” to classify
If a query lands here, the 5 NN consist of 3 negatives and 2 positives, so we classify it as negative.
Black = negative
Red = positive
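Extending the same idea to k neighbors with majority voting (a sketch; ties are broken arbitrarily here):

import numpy as np
from collections import Counter

def knn_label(x, X_train, y_train, k=5):
    """Label x by a majority vote of its k nearest training examples."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]                 # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]               # most frequent label wins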
1-nearest neighbor
[Figure: 2-D feature space (x1, x2) with x and o training points and two + query points]
Slide credit: D. Hoiem
3-nearest neighbor
[Figure: 2-D feature space (x1, x2) with x and o training points and two + query points]
Slide credit: D. Hoiem
5-nearest neighbor
[Figure: 2-D feature space (x1, x2) with x and o training points and two + query points]
Slide credit: D. Hoiem
What are the tradeoffs of choosing k too large? Too small?
A nearest neighbor recognition example:
im2gps: Estimating Geographic Information from a Single Image.
James Hays and Alexei Efros. CVPR 2008.
http://graphics.cs.cmu.edu/projects/im2gps/
Where in the World?
Slides: James Hays
Where in the World?
Slides: James Hays
Where in the World?
Slides: James Hays
How much can an image tell about its geographic location?
Slide credit: James Hays
Nearest Neighbors according to gist + bag of SIFT + color histogram + a few others
Slide credit: James Hays
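A toy sketch of retrieval over several global descriptors, roughly in the spirit of this slide (this is not the im2gps implementation; the actual features, distance functions, and weighting follow Hays & Efros 2008):

import numpy as np

def combined_distance(query_feats, db_feats):
    """Sum crudely normalized per-descriptor distances (e.g. gist, color histogram)."""
    total = 0.0
    for name in query_feats:                              # e.g. "gist", "color_hist"
        d = np.linalg.norm(query_feats[name] - db_feats[name])
        total += d / (1e-8 + np.linalg.norm(query_feats[name]))
    return total

def nearest_scene(query_feats, database):
    """Return the GPS tag of the database image with the smallest combined distance."""
    best = min(database, key=lambda entry: combined_distance(query_feats, entry["feats"]))
    return best["gps"]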
6+ million geotagged photos by 109,788 photographers
Slides: James Hays
A scene is a single surface that can be represented by global (statistical) descriptors
Spatial Envelope Theory of Scene Representation, Oliva & Torralba (2001)
Slide credit: Aude Oliva
Scene Matches
[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.] Slides: James Hays
Scene Matches
[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.] Slides: James Hays
Scene Matches
[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.] Slides: James Hays
The Importance of Data
[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.]Slides: James Hays
Nearest neighbor classifier
f(x) = label of the training example nearest to x
• All we need is a distance function for our inputs
• No training required!
[Figure: test example among training examples from class 1 and class 2]
Slide credit: L. Lazebnik
Evaluating Classifiers
• Accuracy
– # correctly classified / # all test examples
• Precision/recall
– Precision = # retrieved positives / # retrieved
– Recall = # retrieved positives / # positives
• F-measure = 2PR / (P + R)
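A sketch of these metrics for a binary problem (assuming label 1 means positive and anything else means negative):

import numpy as np

def evaluate(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy  = np.mean(y_true == y_pred)
    retrieved = (y_pred == 1)                      # examples the classifier called positive
    positives = (y_true == 1)                      # examples that really are positive
    precision = (retrieved & positives).sum() / max(retrieved.sum(), 1)
    recall    = (retrieved & positives).sum() / max(positives.sum(), 1)
    f_measure = 2 * precision * recall / max(precision + recall, 1e-8)
    return accuracy, precision, recall, f_measure

print(evaluate([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))   # approx (0.6, 0.667, 0.667, 0.667)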
Discriminative classifiers
Learn a simple function of the input features that correctly predicts the true labels on the training set
Training Goals
1. Accurate classification of training data
2. Correct classifications are confident
3. Classification function is simple
y = f(x)
Slide credit: D. Hoiem
Linear classifier
• Find a linear function to separate the classes
f(x) = sgn(w1x1 + w2x2 + … + wDxD) = sgn(w · x)
Slide credit: L. Lazebnik
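One simple, classic way to find such weights is the perceptron update rule; the sketch below uses made-up data and is only one of many possible training methods, not one the lecture specifically advocates:

import numpy as np

def train_perceptron(X, y, epochs=20):
    """Learn w, b so that sgn(w . x + b) matches labels y in {-1, +1} (if separable)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # misclassified (or on the boundary)
                w += yi * xi                    # nudge the boundary toward correctness
                b += yi
    return w, b

X = np.array([[1.0, 2.0], [2.0, 1.0], [4.0, 5.0], [5.0, 4.0]])
y = np.array([-1, -1, 1, 1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))    # should reproduce y on this separable toy set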
What about this line?
NN vs. linear classifiers
• NN pros:
+ Simple to implement
+ Decision boundaries not necessarily linear
+ Works for any number of classes
+ Nonparametric method
• NN cons:
– Need good distance function
– Slow at test time (large search problem to find neighbors)
– Storage of data
• Linear pros:
+ Low-dimensional parametric representation
+ Very fast at test time
• Linear cons:
– Only works for two classes (as formulated here)
– How to train the linear function?
– What if data is not linearly separable?
Adapted from L. Lazebnik