CS 1699: Intro to Computer Vision. Intro to Machine Learning & Visual Recognition, Part II. Prof. Adriana Kovashka, University of Pittsburgh. October 27, 2015.

Source: people.cs.pitt.edu/~kovashka/cs1699_fa15/vision_16_ml_recognition...


Page 1:

CS 1699: Intro to Computer Vision

Intro to Machine Learning & Visual Recognition – Part II

Prof. Adriana Kovashka, University of Pittsburgh

October 27, 2015

Page 2:

Homework

• HW3: push back to Nov. 3

• HW4: push back to Nov. 24

• HW5: make half-length, still 10% of overall grade, out Nov. 24, due Dec. 10

• Take out a small piece of paper and vote yes/no on this proposal

Page 3:

Other announcements

• Piazza

– Can get participation credit by asking/answering questions

• Feedback from surveys

Page 4:

Plan for today

• Visual recognition problems

• Recognition pipeline

– Features and data

– Challenges

• Overview of some methods for classification

• Challenges and trade-offs

Page 5:

Some translations

• Feature vector = descriptor = representation

• Recognition often involves classification

• Classes = categories (hence classification = categorization)

• Training = learning a model (e.g. classifier), happens at training time from training data

• Classification = prediction, happens at test time

Page 6:

Classification

• Given a feature representation for images,

learn a model for distinguishing features from

different classes

Zebra

Non-zebra

Decision

boundary

Slide credit: L. Lazebnik

Page 7:

Image categorization

• Cat vs Dog

Slide credit: D. Hoiem

Page 8:

Image categorization

• Object recognition

Caltech 101 Average Object Images. Slide credit: D. Hoiem

Page 9:

Image categorization

• Place recognition

Places Database [Zhou et al. NIPS 2014]. Slide credit: D. Hoiem

Page 10:

Region categorization

• Material recognition

[Bell et al. CVPR 2015]. Slide credit: D. Hoiem

Page 11:

Recognition: A machine learning approach

Page 12:

The machine learning framework

• Apply a prediction function to a feature representation of the image to get the desired output:

f( ) = “apple”

f( ) = “tomato”

f( ) = “cow”

Slide credit: L. Lazebnik

Page 13:

The machine learning framework

y = f(x)

(y: output, f: prediction function, x: image feature)

• Training: given a training set of labeled examples {(x1, y1), …, (xN, yN)}, estimate the prediction function f by minimizing the prediction error on the training set

• Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)

Slide credit: L. Lazebnik
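The train/test split above can be sketched with a deliberately simple, hypothetical choice of f (a 1-D threshold classifier; all function names and numbers here are illustrative, not from the slides):

```python
# Sketch of the y = f(x) framework: estimate f from labeled training
# pairs by minimizing training error, then apply it to unseen inputs.
# The threshold classifier is an illustrative, minimal choice of f.

def train(xs, ys):
    """Pick the threshold that minimizes error on the training set."""
    best_t, best_err = None, float("inf")
    for t in sorted(xs):
        preds = [1 if x >= t else 0 for x in xs]
        err = sum(p != y for p, y in zip(preds, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def predict(t, x):
    return 1 if x >= t else 0

# Training set: feature value x, label y
train_x = [0.1, 0.4, 0.35, 0.8, 0.9, 0.7]
train_y = [0,   0,   0,    1,   1,   1]

t = train(train_x, train_y)       # training time
print(predict(t, 0.2))            # test time: never-before-seen x -> 0
print(predict(t, 0.85))           # -> 1
```

The same two-phase structure carries over unchanged when f is a nearest-neighbor or linear classifier, as in the later slides.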

Page 14:

Steps

• Training: Training Images + Training Labels → Image Features → Training → Learned model

• Testing: Test Image → Image Features → Learned model → Prediction

Slide credit: D. Hoiem and L. Lazebnik

BOARD
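The two-stage pipeline on this slide can be sketched end to end. The toy feature (mean and variance of pixel intensities) and the nearest-class-mean "learned model" are illustrative stand-ins, not methods from the slides:

```python
# Sketch of the pipeline: (1) training images -> image features ->
# training -> learned model; (2) test image -> image features ->
# learned model -> prediction.

def features(img):
    """Toy global feature: mean and variance of pixel intensities."""
    n = len(img)
    mean = sum(img) / n
    var = sum((p - mean) ** 2 for p in img) / n
    return (mean, var)

def train(images, labels):
    """Learned model: the per-class mean feature vector."""
    sums, counts = {}, {}
    for img, y in zip(images, labels):
        f = features(img)
        s = sums.setdefault(y, [0.0, 0.0])
        s[0] += f[0]; s[1] += f[1]
        counts[y] = counts.get(y, 0) + 1
    return {y: (s[0] / counts[y], s[1] / counts[y]) for y, s in sums.items()}

def predict(model, img):
    """Assign the class whose mean feature is closest to the image's."""
    f = features(img)
    return min(model, key=lambda y: (f[0] - model[y][0]) ** 2
                                    + (f[1] - model[y][1]) ** 2)

# Tiny 4-pixel "images": a dark class vs. a bright class
imgs = [[10, 12, 11, 9], [8, 10, 9, 13],
        [200, 210, 190, 205], [220, 215, 225, 210]]
labs = ["dark", "dark", "bright", "bright"]
model = train(imgs, labs)
print(predict(model, [15, 14, 12, 10]))   # -> dark
```

Swapping in a real feature (HOG, bag of words) and a real classifier (k-NN, linear) changes only the two boxes, not the pipeline.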

Page 15:

Popular global image features

• Raw pixels (and simple functions of raw pixels)

• Histograms, bags of features

• GIST descriptors [Oliva and Torralba, 2001]

• Histograms of oriented gradients (HOG) [Dalal and Triggs, 2005]

Slide credit: L. Lazebnik

Page 16:

What kind of things do we compute histograms of?

• Color (L*a*b*, HSV color spaces)

• Texture (filter banks or HOG over regions)

Slide credit: D. Hoiem

Page 17:

What kind of things do we compute histograms of?

• Histograms of descriptors

• “Bag of visual words”

SIFT – [Lowe IJCV 2004]

Slide credit: D. Hoiem
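As a concrete instance of the histogram features above, here is a normalized intensity histogram over a toy 1-D "image" (the bin count and pixel values are illustrative):

```python
# A normalized intensity histogram used as a global image descriptor.
# Color, texture, and bag-of-words descriptors follow the same pattern:
# quantize measurements into bins, count, normalize.

def intensity_histogram(img, bins=4, lo=0, hi=256):
    """Histogram of pixel intensities, normalized to sum to 1."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for p in img:
        b = min(int((p - lo) / width), bins - 1)  # clamp the top edge
        counts[b] += 1
    total = len(img)
    return [c / total for c in counts]

img = [10, 30, 70, 100, 130, 200, 220, 250]
print(intensity_histogram(img))  # -> [0.25, 0.25, 0.125, 0.375]
```

Because the histogram sums to 1, images of different sizes become comparable fixed-length vectors, which is what the classifiers below consume.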

Page 18:

Steps

• Training: Training Images + Training Labels → Image Features → Training → Learned model

• Testing: Test Image → Image Features → Learned model → Prediction

Slide credit: D. Hoiem and L. Lazebnik

Page 19:

Recognition training data

• Images in the training set must be annotated with the “correct answer” that the model is expected to produce

“Motorbike”

Slide credit: L. Lazebnik

Page 20:

Challenges: robustness

Illumination, object pose, clutter, viewpoint, intra-class appearance, occlusions

Slide credit: K. Grauman

Page 21:

Challenges: importance of context

Slide credit: Fei-Fei, Fergus & Torralba

Page 22:

Painter identification

• How would you learn to identify the author of a painting?

Goya, Kirchner, Klimt, Marc, Monet, Van Gogh

Page 23:

Plan for today

• Visual recognition problems

• Recognition pipeline

– Features and data

– Challenges

• Overview of some methods for classification

• Challenges and trade-offs

Page 24:

One way to think about it…

• Training labels dictate that two examples are the same or different, in some sense

• Features and distances define visual similarity

• Goal of training is to learn feature weights so that visual similarity predicts label similarity

– Linear classifier: confidence in the positive label is a weighted sum of features

– What are the weights?

• We want the simplest function that is confidently correct

Adapted from D. Hoiem

BOARD

Page 25:

Supervised classification

• Given a collection of labeled examples, come up with a function that will predict the labels of new examples.

• How good is some function that we come up with to do the classification?

• Depends on

– Mistakes made

– Cost associated with the mistakes

(Figure: training examples labeled “four” and “nine”, and a novel input marked “?”.)

Slide credit: K. Grauman

Page 26:

Supervised classification

• Given a collection of labeled examples, come up with a function that will predict the labels of new examples.

• Consider the two-class (binary) decision problem

– L(4→9): loss of classifying a 4 as a 9

– L(9→4): loss of classifying a 9 as a 4

• Risk of a classifier s is expected loss:

R(s) = Pr(4→9 | using s) · L(4→9) + Pr(9→4 | using s) · L(9→4)

• We want to choose a classifier so as to minimize this total risk.

Slide credit: K. Grauman
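A worked numeric instance of the risk formula, with made-up probabilities and losses, shows why the lowest-error classifier is not always the lowest-risk one:

```python
# R(s) = Pr(4->9 | using s) * L(4->9) + Pr(9->4 | using s) * L(9->4)
# All numbers below are illustrative.

def risk(p_4to9, p_9to4, L_4to9, L_9to4):
    return p_4to9 * L_4to9 + p_9to4 * L_9to4

# Suppose misreading a 9 as a 4 is five times as costly.
L_4to9, L_9to4 = 1.0, 5.0

# Classifier A: 12% total errors, but mostly the cheap kind.
print(risk(0.10, 0.02, L_4to9, L_9to4))  # A: 0.10*1 + 0.02*5 = 0.2
# Classifier B: only 10% total errors, evenly split.
print(risk(0.05, 0.05, L_4to9, L_9to4))  # B: 0.05*1 + 0.05*5 = 0.3
```

Under these (illustrative) losses, A makes more mistakes overall yet has lower risk, because it rarely makes the expensive 9→4 mistake.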

Page 27:

Supervised classification

Feature value x

Optimal classifier will minimize total risk.

At decision boundary, either choice of label yields same expected loss.

If we choose class “four” at the boundary, expected loss is:

P(class is 9 | x) · L(9→4) + P(class is 4 | x) · L(4→4) = P(class is 9 | x) · L(9→4)   (assuming L(4→4) = 0)

If we choose class “nine” at the boundary, expected loss is:

P(class is 4 | x) · L(4→9)

So, the best decision boundary is at the point x where

P(class is 9 | x) · L(9→4) = P(class is 4 | x) · L(4→9)

Slide credit: K. Grauman

Page 28:

Supervised classification

To classify a new point, choose the class with lowest expected loss; i.e., choose “four” if

P(class is 9 | x) · L(9→4) < P(class is 4 | x) · L(4→9)

Feature value x

Optimal classifier will minimize total risk.

At decision boundary, either choice of label yields same expected loss.

Loss for choosing “four” Loss for choosing “nine”

Slide credit: K. Grauman
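The minimum-expected-loss rule can be executed directly; the posterior and losses below are made up for illustration:

```python
# Choose the label with lowest expected loss at x, per the rule above:
# pick "four" iff P(9 | x) * L(9->4) < P(4 | x) * L(4->9).

def choose(p9_given_x, L_9to4=1.0, L_4to9=1.0):
    p4_given_x = 1.0 - p9_given_x
    loss_if_four = p9_given_x * L_9to4  # we say "four"; truth may be 9
    loss_if_nine = p4_given_x * L_4to9  # we say "nine"; truth may be 4
    return "four" if loss_if_four < loss_if_nine else "nine"

print(choose(0.3))              # -> four  (a 9 is unlikely here)
print(choose(0.3, L_9to4=5.0))  # -> nine  (missing a 9 is too costly)
```

Note how the asymmetric loss shifts the decision even though the posterior is unchanged; this is the boundary shift sketched on the slide.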

Page 29:

Disclaimers

• We will often assume the same loss for all possible types of misclassifications

• We won’t always build probability distributions – often we’ll just find a decision boundary (using discriminative methods)

• What’s the simplest classifier you can think of?

Page 30:

Nearest neighbor classifier

f(x) = label of the training example nearest to x

• All we need is a distance function for our inputs

• No training required!

(Figure: a test example among training examples from class 1 and training examples from class 2.)

Slide credit: L. Lazebnik

Page 31:

K-Nearest Neighbors classification

k = 5

Slide credit: D. Lowe

• For a new point, find the k closest points from training data

• Labels of the k points “vote” to classify

If the query lands here, the 5 NN consist of 3 negatives and 2 positives, so we classify it as negative. (Black = negative, red = positive.)
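Extending the nearest-neighbor sketch to k > 1 with majority voting, mirroring the k = 5 scenario described above (points and labels are illustrative):

```python
# k-NN with majority voting: the k closest training points "vote",
# and the query takes the majority label.

import math
from collections import Counter

def knn_classify(train_pts, train_labels, x, k=5):
    order = sorted(range(len(train_pts)),
                   key=lambda j: math.dist(train_pts[j], x))
    votes = Counter(train_labels[j] for j in order[:k])
    return votes.most_common(1)[0][0]

# Near the query (2, 2): 3 negatives closer than the 2 positives,
# so with k = 5 the negatives win.
pts = [(1, 1), (1, 2), (2, 1), (5, 5), (4, 5), (9, 9), (9, 8)]
labels = ["neg", "neg", "neg", "pos", "pos", "pos", "pos"]
print(knn_classify(pts, labels, (2, 2), k=5))  # -> neg
```

Rerunning with different k is a quick way to see the trade-off the next slides ask about: small k tracks noise, large k blurs class boundaries.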

Page 32:

1-nearest neighbor

(Figure: training points from two classes, marked x and o, in a 2-D feature space (x1, x2), with two query points marked +.)

Slide credit: D. Hoiem

Page 33:

3-nearest neighbor

(Figure: the same two-class point set and query points as on the previous slide, now classified by the 3 nearest neighbors.)

Slide credit: D. Hoiem

Page 34:

5-nearest neighbor

(Figure: the same two-class point set and query points, now classified by the 5 nearest neighbors.)

Slide credit: D. Hoiem

What are the tradeoffs of having a too large k? Too small k?

Page 35:

A nearest neighbor recognition example:

im2gps: Estimating Geographic Information from a Single Image.

James Hays and Alexei Efros. CVPR 2008.

http://graphics.cs.cmu.edu/projects/im2gps/

Page 36:

Where in the World?

Slides: James Hays

Page 37:

Where in the World?

Slides: James Hays

Page 38:

Where in the World?

Slides: James Hays

Page 39:

How much can an image tell about its geographic location?

Slide credit: James Hays

Page 40:

Nearest Neighbors according to gist + bag of SIFT + color histogram + a few others

Slide credit: James Hays

Page 41:

6+ million geotagged photos by 109,788 photographers

Slides: James Hays

Page 42:

A scene is a single surface that can be represented by global (statistical) descriptors

Spatial Envelope Theory of Scene Representation, Oliva & Torralba (2001)

Slide credit: Aude Oliva

Page 43:

Scene Matches

[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.] Slides: James Hays

Page 44:

Slides: James Hays

Page 45:

Scene Matches

[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.] Slides: James Hays

Page 46:

[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.] Slides: James Hays

Page 47:

Scene Matches

[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.] Slides: James Hays

Page 48:

[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.] Slides: James Hays

Page 49:

The Importance of Data

[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.] Slides: James Hays

Page 50:

Nearest neighbor classifier

f(x) = label of the training example nearest to x

• All we need is a distance function for our inputs

• No training required!

(Figure: a test example among training examples from class 1 and training examples from class 2.)

Slide credit: L. Lazebnik

Page 51:

Evaluating Classifiers

• Accuracy

– # correctly classified / # all test examples

• Precision/recall

– Precision = # retrieved positives / # retrieved

– Recall = # retrieved positives / # positives

• F-measure = 2PR / (P + R)
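These four metrics follow directly from the definitions above; here they are computed for a toy set of binary predictions (the labels are illustrative):

```python
# Accuracy, precision, recall, and F-measure from their definitions:
# accuracy  = # correct / # all
# precision = # retrieved positives / # retrieved
# recall    = # retrieved positives / # positives

def evaluate(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return accuracy, precision, recall, f

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0]
# accuracy 0.625, precision 2/3, recall 0.5, F = 4/7
print(evaluate(y_true, y_pred))
```

Precision and recall can disagree sharply (retrieve one sure positive: precision 1, recall tiny), which is why the harmonic-mean F-measure is reported alongside them.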

Page 52:

Discriminative classifiers

Learn a simple function of the input features that correctly predicts the true labels on the training set

Training Goals

1. Accurate classification of training data

2. Correct classifications are confident

3. Classification function is simple

y = f(x)

Slide credit: D. Hoiem

Page 53:

Linear classifier

• Find a linear function to separate the classes

f(x) = sgn(w1x1 + w2x2 + … + wDxD) = sgn(w · x)

Slide credit: L. Lazebnik

What about this line?
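A minimal sketch of evaluating f(x) = sgn(w · x) with a hand-picked weight vector; how to learn w is exactly the open question the slide poses, so w here is illustrative, with the bias folded in as a constant feature:

```python
# Linear classifier: f(x) = sgn(w . x). The weight vector below is
# chosen by hand to separate points by the line x1 + x2 - 3 = 0;
# the bias term -3 rides on a constant feature fixed at 1.

def sgn(v):
    return 1 if v >= 0 else -1

def linear_classify(w, x):
    return sgn(sum(wi * xi for wi, xi in zip(w, x)))

w = (1.0, 1.0, -3.0)                   # (w1, w2, bias)
print(linear_classify(w, (1, 1, 1)))   # -> -1 (below the line)
print(linear_classify(w, (3, 2, 1)))   # -> 1  (above the line)
```

Test-time cost is one dot product per example, regardless of training-set size, which is the speed advantage over nearest neighbors summarized on the next slide.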

Page 54:

NN vs. linear classifiers

• NN pros:

+ Simple to implement

+ Decision boundaries not necessarily linear

+ Works for any number of classes

+ Nonparametric method

• NN cons:

– Need good distance function

– Slow at test time (large search problem to find neighbors)

– Storage of data

• Linear pros:

+ Low-dimensional parametric representation

+ Very fast at test time

• Linear cons:

– Works for two classes

– How to train the linear function?

– What if data is not linearly separable?

Adapted from L. Lazebnik