CS 1699: Intro to Computer Vision
Intro to Machine Learning & Visual Recognition – Part II
Prof. Adriana Kovashka, University of Pittsburgh
October 27, 2015
Homework
• HW3: push back to Nov. 3
• HW4: push back to Nov. 24
• HW5: make half-length, still 10% of overall grade, out Nov. 24, due Dec. 10
• Take out a small piece of paper and vote yes/no on this proposal
Other announcements
• Piazza
– Can get participation credit by asking/answering
• Feedback from surveys
Plan for today
• Visual recognition problems
• Recognition pipeline
– Features and data
– Challenges
• Overview of some methods for classification
• Challenges and trade-offs
Some translations
• Feature vector = descriptor = representation
• Recognition often involves classification
• Classes = categories (hence classification = categorization)
• Training = learning a model (e.g. classifier), happens at training time from training data
• Classification = prediction, happens at test time
Classification
• Given a feature representation for images, learn a model for distinguishing features from different classes
[Figure: zebra vs. non-zebra image features separated by a decision boundary]
Slide credit: L. Lazebnik
Image categorization
• Cat vs Dog
Slide credit: D. Hoiem
Image categorization
• Object recognition
Caltech 101 Average Object Images
Slide credit: D. Hoiem
Image categorization
• Place recognition
Places Database [Zhou et al. NIPS 2014]
Slide credit: D. Hoiem
Region categorization
• Material recognition
[Bell et al. CVPR 2015]
Slide credit: D. Hoiem
Recognition: A machine learning approach
The machine learning framework
• Apply a prediction function to a feature representation of the image to get the desired output:
f( ) = “apple”
f( ) = “tomato”
f( ) = “cow”
Slide credit: L. Lazebnik
The machine learning framework
y = f(x), where y is the output label, f is the prediction function, and x is the image feature
• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set
• Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)
Slide credit: L. Lazebnik
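To make y = f(x) concrete, here is a minimal sketch in Python/NumPy (not from the slides): a toy nearest-class-mean predictor is fit to labeled training features and then applied to an unseen test example.

import numpy as np

# Toy training set: 2-D feature vectors x_i with labels y_i (0 or 1).
X_train = np.array([[1.0, 2.0], [1.2, 1.8], [4.0, 5.0], [4.2, 4.8]])
y_train = np.array([0, 0, 1, 1])

# "Training": estimate f by computing one mean feature vector per class.
class_means = {c: X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)}

def f(x):
    """Prediction function: return the label of the closest class mean."""
    return min(class_means, key=lambda c: np.linalg.norm(x - class_means[c]))

# Training error: fraction of training examples f gets wrong.
train_error = np.mean([f(x) != y for x, y in zip(X_train, y_train)])

# "Testing": apply f to a never-before-seen example.
x_test = np.array([3.9, 5.1])
print("training error:", train_error, "| predicted label:", f(x_test))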
Steps
[Pipeline diagram. Training: training images + training labels → image features → training → learned model. Testing: test image → image features → learned model → prediction]
Slide credit: D. Hoiem and L. Lazebnik
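A skeleton of this pipeline in Python (a sketch only: extract_features is a hypothetical stand-in for any of the descriptors discussed next, and scikit-learn's LinearSVC is just one convenient choice of classifier, not one prescribed by the course):

import numpy as np
from sklearn.svm import LinearSVC  # any classifier with fit/predict would do

def extract_features(image):
    # Hypothetical placeholder: return a fixed-length descriptor for the image
    # (e.g. a color histogram or HOG vector); here we just flatten the pixels.
    return np.asarray(image, dtype=float).ravel()

def train(train_images, train_labels):
    X = np.stack([extract_features(im) for im in train_images])  # "Image Features"
    model = LinearSVC()              # "Training": learn a model from features + labels
    model.fit(X, train_labels)
    return model                     # "Learned model"

def predict(model, test_image):
    x = extract_features(test_image)              # same features at test time
    return model.predict(x[np.newaxis, :])[0]     # "Prediction"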
BOARD
Popular global image features
• Raw pixels (and simple functions of raw pixels)
• Histograms, bags of features
• GIST descriptors [Oliva and Torralba, 2001]
• Histograms of oriented gradients (HOG) [Dalal and Triggs, 2005]
Slide credit: L. Lazebnik
What kind of things do we compute histograms of?
• Color (L*a*b* color space, HSV color space)
• Texture (filter banks or HOG over regions)
Slide credit: D. Hoiem
What kind of things do we compute histograms of?
• Histograms of descriptors
• “Bag of visual words”
SIFT – [Lowe IJCV 2004]
Slide credit: D. Hoiem
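As an example of a global histogram feature, here is a sketch of a simple color histogram descriptor (assuming an RGB image stored as an H x W x 3 NumPy array; real systems often histogram L*a*b*/HSV values, filter-bank responses, or quantized SIFT "visual words" instead):

import numpy as np

def color_histogram(image, bins=8):
    """Concatenate per-channel intensity histograms into one global descriptor."""
    hists = []
    for c in range(3):                                   # R, G, B channels
        h, _ = np.histogram(image[:, :, c], bins=bins, range=(0, 256))
        hists.append(h)
    desc = np.concatenate(hists).astype(float)
    return desc / desc.sum()                             # normalize to sum to 1

# Usage on a random "image":
img = np.random.randint(0, 256, size=(120, 160, 3))
print(color_histogram(img).shape)    # (24,) = 3 channels x 8 bins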
Steps
[Pipeline diagram. Training: training images + training labels → image features → training → learned model. Testing: test image → image features → learned model → prediction]
Slide credit: D. Hoiem and L. Lazebnik
Recognition training data
• Images in the training set must be annotated with the “correct answer” that the model is expected to produce
“Motorbike”
Slide credit: L. Lazebnik
Challenges: robustness
Illumination, object pose, clutter, viewpoint, intra-class appearance variation, occlusions
Slide credit: K. Grauman
Challenges: importance of context
Slide credit: Fei-Fei, Fergus & Torralba
Painter identification
• How would you learn to identify the author of a painting?
Goya Kirchner Klimt Marc Monet Van Gogh
Plan for today
• Visual recognition problems
• Recognition pipeline
– Features and data
– Challenges
• Overview of some methods for classification
• Challenges and trade-offs
One way to think about it…
• Training labels dictate that two examples are the same or different, in some sense
• Features and distances define visual similarity
• Goal of training is to learn feature weights so that visual similarity predicts label similarity
– Linear classifier: confidence in the positive label is a weighted sum of features
– What are the weights? (a scoring sketch follows below)
• We want the simplest function that is confidently correct
Adapted from D. Hoiem
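A minimal sketch of that weighted sum (the values below are made up for illustration; the weights w and bias b are exactly what training has to determine):

import numpy as np

x = np.array([0.2, 0.7, 0.1])      # feature vector for one image
w = np.array([1.5, -2.0, 0.5])     # learned feature weights (here just made up)
b = 0.1                            # bias term

score = np.dot(w, x) + b           # confidence in the positive label
label = 1 if score > 0 else -1     # simplest decision rule
print(score, label)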
BOARD
Supervised classification
• Given a collection of labeled examples, come up with a function that will predict the labels of new examples.
• How good is some function that we come up with to do the classification?
• Depends on
– Mistakes made
– Cost associated with the mistakes
[Figure: training examples labeled “four” and “nine”; a novel input to be classified]
Slide credit: K. Grauman
Supervised classification
• Given a collection of labeled examples, come up with a function that will predict the labels of new examples.
• Consider the two-class (binary) decision problem
– L(4→9): Loss of classifying a 4 as a 9
– L(9→4): Loss of classifying a 9 as a 4
• Risk of a classifier s is expected loss:
R(s) = Pr(4→9 | using s) · L(4→9) + Pr(9→4 | using s) · L(9→4)
• We want to choose a classifier so as to minimize this total risk.
Slide credit: K. Grauman
Supervised classification
[Plot: distributions of the two classes over feature value x]
Optimal classifier will minimize total risk.
At decision boundary, either choice of label yields same expected loss.
If we choose class “four” at the boundary, expected loss is:
P(class is 9 | x) · L(9→4) + P(class is 4 | x) · L(4→4)
If we choose class “nine” at the boundary, expected loss is:
P(class is 4 | x) · L(4→9)
So, the best decision boundary is at the point x where
P(class is 9 | x) · L(9→4) = P(class is 4 | x) · L(4→9)
Slide credit: K. Grauman
Supervised classification
To classify a new point, choose class with lowest expected loss; i.e., choose “four” if
P(class is 9 | x) · L(9→4) < P(class is 4 | x) · L(4→9)
(left side: loss for choosing “four”; right side: loss for choosing “nine”)
[Plot: distributions of the two classes over feature value x]
Optimal classifier will minimize total risk.
At decision boundary, either choice of label yields same expected loss.
Slide credit: K. Grauman
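A worked example with made-up numbers: suppose L(9→4) = 10, L(4→9) = 1, and at some x we have P(class is 9 | x) = 0.3 and P(class is 4 | x) = 0.7. Choosing “four” has expected loss 0.3 · 10 = 3, while choosing “nine” has expected loss 0.7 · 1 = 0.7. So we choose “nine” even though “four” is the more probable class, because misclassifying a true 9 is assumed to be much more costly.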
Disclaimers
• We will often assume the same loss for all possible types of misclassifications
• We won’t always build probability distributions – often we’ll just find a decision boundary (using discriminative methods)
• What’s the simplest classifier you can think of?
Nearest neighbor classifier
f(x) = label of the training example nearest to x
• All we need is a distance function for our inputs
• No training required!
[Figure: test example among training examples from class 1 and class 2]
Slide credit: L. Lazebnik
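A minimal 1-NN sketch (assuming feature vectors and Euclidean distance; any distance function could be substituted):

import numpy as np

def nearest_neighbor_label(x, X_train, y_train):
    """Return the label of the training example closest to x."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every example
    return y_train[np.argmin(dists)]              # no training step, just a lookup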
K-Nearest Neighbors classification
k = 5
Slide credit: D. Lowe
• For a new point, find the k closest points from training data
• Labels of the k points “vote” to classify
If a query lands here, the 5 NN consist of 3 negatives and 2 positives, so we classify it as negative.
Black = negative
Red = positive
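Extending the same idea to k neighbors with majority voting (a sketch; ties are broken arbitrarily here):

import numpy as np
from collections import Counter

def knn_label(x, X_train, y_train, k=5):
    """Label x by a majority vote of its k nearest training examples."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]                 # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]               # most frequent label wins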
1-nearest neighbor
[Figure: 2-D feature space (x1, x2) with x and o training points and two + query points]
Slide credit: D. Hoiem
3-nearest neighbor
[Figure: 2-D feature space (x1, x2) with x and o training points and two + query points]
Slide credit: D. Hoiem
5-nearest neighbor
[Figure: 2-D feature space (x1, x2) with x and o training points and two + query points]
Slide credit: D. Hoiem
What are the tradeoffs of choosing k too large? Too small?
A nearest neighbor recognition example:
im2gps: Estimating Geographic Information from a Single Image.
James Hays and Alexei Efros. CVPR 2008.
http://graphics.cs.cmu.edu/projects/im2gps/
Where in the World?
Slides: James Hays
Where in the World?
Slides: James Hays
Where in the World?
Slides: James Hays
How much can an image tell about its geographic location?
Slide credit: James Hays
Nearest Neighbors according to gist + bag of SIFT + color histogram + a few others
Slide credit: James Hays
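A toy sketch of retrieval over several global descriptors, roughly in the spirit of this slide (this is not the im2gps implementation; the actual features, distance functions, and weighting follow Hays & Efros 2008):

import numpy as np

def combined_distance(query_feats, db_feats):
    """Sum crudely normalized per-descriptor distances (e.g. gist, color histogram)."""
    total = 0.0
    for name in query_feats:                              # e.g. "gist", "color_hist"
        d = np.linalg.norm(query_feats[name] - db_feats[name])
        total += d / (1e-8 + np.linalg.norm(query_feats[name]))
    return total

def nearest_scene(query_feats, database):
    """Return the GPS tag of the database image with the smallest combined distance."""
    best = min(database, key=lambda entry: combined_distance(query_feats, entry["feats"]))
    return best["gps"]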
6+ million geotagged photos by 109,788 photographers
Slides: James Hays
A scene is a single surface that can be represented by global (statistical) descriptors
Spatial Envelope Theory of Scene Representation, Oliva & Torralba (2001)
Slide credit: Aude Oliva
Scene Matches
[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.] Slides: James Hays
Scene Matches
[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.] Slides: James Hays
Scene Matches
[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.] Slides: James Hays
The Importance of Data
[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.]Slides: James Hays
Nearest neighbor classifier
f(x) = label of the training example nearest to x
• All we need is a distance function for our inputs
• No training required!
[Figure: test example among training examples from class 1 and class 2]
Slide credit: L. Lazebnik
Evaluating Classifiers
• Accuracy
– # correctly classified / # all test examples
• Precision/recall
– Precision = # retrieved positives / # retrieved
– Recall = # retrieved positives / # positives
• F-measure = 2PR / (P + R)
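A sketch of these metrics for a binary problem (assuming label 1 means positive and anything else means negative):

import numpy as np

def evaluate(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy  = np.mean(y_true == y_pred)
    retrieved = (y_pred == 1)                      # examples the classifier called positive
    positives = (y_true == 1)                      # examples that really are positive
    precision = (retrieved & positives).sum() / max(retrieved.sum(), 1)
    recall    = (retrieved & positives).sum() / max(positives.sum(), 1)
    f_measure = 2 * precision * recall / max(precision + recall, 1e-8)
    return accuracy, precision, recall, f_measure

print(evaluate([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))   # approx (0.6, 0.667, 0.667, 0.667)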
Discriminative classifiers
Learn a simple function of the input features that correctly predicts the true labels on the training set
Training Goals
1. Accurate classification of training data
2. Correct classifications are confident
3. Classification function is simple
y = f(x)
Slide credit: D. Hoiem
Linear classifier
• Find a linear function to separate the classes
f(x) = sgn(w1x1 + w2x2 + … + wDxD) = sgn(w · x)
Slide credit: L. Lazebnik
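One simple, classic way to find such weights is the perceptron update rule; the sketch below uses made-up data and is only one of many possible training methods, not one the lecture specifically advocates:

import numpy as np

def train_perceptron(X, y, epochs=20):
    """Learn w, b so that sgn(w . x + b) matches labels y in {-1, +1} (if separable)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # misclassified (or on the boundary)
                w += yi * xi                    # nudge the boundary toward correctness
                b += yi
    return w, b

X = np.array([[1.0, 2.0], [2.0, 1.0], [4.0, 5.0], [5.0, 4.0]])
y = np.array([-1, -1, 1, 1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))    # should reproduce y on this separable toy set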
What about this line?
NN vs. linear classifiers
• NN pros:
+ Simple to implement
+ Decision boundaries not necessarily linear
+ Works for any number of classes
+ Nonparametric method
• NN cons:
– Need good distance function
– Slow at test time (large search problem to find neighbors)
– Storage of data
• Linear pros:
+ Low-dimensional parametric representation
+ Very fast at test time
• Linear cons:
– Only works for two classes (as formulated here)
– How to train the linear function?
– What if data is not linearly separable?
Adapted from L. Lazebnik