CS 1699: Intro to Computer Vision. Bias-Variance Trade-off + Other Models and Problems. Prof. Adriana Kovashka, University of Pittsburgh. November 3, 2015.


Page 1:

CS 1699: Intro to Computer Vision

Bias-Variance Trade-off + Other Models and Problems

Prof. Adriana Kovashka, University of Pittsburgh

November 3, 2015

Page 2:

Outline

• Support Vector Machines (review + other uses)

• Bias-variance trade-off

• Scene recognition: Spatial pyramid matching

• Other classifiers

– Decision trees

– Hidden Markov models

• Other problems

– Clustering: agglomerative clustering

– Dimensionality reduction

Page 3:

Lines in R²

Let w = [a, c] and x = [x, y]. A line ax + cy + b = 0 can then be written as w·x + b = 0.

Distance from point (x0, y0) to the line:

D = |a x0 + c y0 + b| / √(a² + c²) = |w·x0 + b| / ||w||

Kristen Grauman

Page 4:

Support vector machines

• Want line that maximizes the margin.

xi positive (yi = 1): w·xi + b ≥ 1
xi negative (yi = −1): w·xi + b ≤ −1

For support vectors, w·xi + b = ±1

Distance between point and line: |w·xi + b| / ||w||

So the support vectors lie at distance 1/||w|| on either side of the line, and the margin is M = 2/||w||.

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Page 5:

Finding the maximum margin line

1. Maximize margin 2/||w||

2. Correctly classify all training data points:

xi positive (yi = 1): w·xi + b ≥ 1
xi negative (yi = −1): w·xi + b ≤ −1

Quadratic optimization problem:

Minimize (1/2) wᵀw

Subject to yi(w·xi + b) ≥ 1 (one constraint for each training point; note the sign trick that folds both cases into one inequality)

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Page 6:

Finding the maximum margin line

• Solution: w = Σi αi yi xi

b = yi – w·xi (for any support vector)

• Classification function:

f(x) = sign(w·x + b) = sign(Σi αi yi (xi·x) + b)

If f(x) < 0, classify as negative, otherwise classify as positive.

• Notice that it relies on an inner product between the test point x and the support vectors xi

• (Solving the optimization problem also involves computing the inner products xi · xj between all pairs of training points)

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
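To make the solution concrete, here is a minimal NumPy sketch of this classification function; it assumes (my assumption, not shown on the slide) that a QP solver has already produced the multipliers αi, the support vectors, their labels, and b:

```python
import numpy as np

def svm_classify(x, support_vectors, alphas, labels, b):
    # f(x) = sign( sum_i alpha_i * y_i * (x_i . x) + b )
    scores = alphas * labels * (support_vectors @ x)  # one term per support vector
    return np.sign(scores.sum() + b)
```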

Page 7:

• Datasets that are linearly separable work out great:

• But what if the dataset is just too hard?

• We can map it to a higher-dimensional space:

[Figure: a 1-D dataset on the x axis that is not linearly separable becomes separable after mapping each point x to (x, x²).]

Nonlinear SVMs

Andrew Moore

Page 8:

Examples of kernel functions

Linear: K(xi, xj) = xiᵀxj

Gaussian RBF: K(xi, xj) = exp(−||xi − xj||² / 2σ²)

Histogram intersection: K(xi, xj) = Σk min(xi(k), xj(k))

Andrew Moore
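Written directly from the formulas above, the three kernels are one-liners; this is an illustrative NumPy sketch, with σ left as a free parameter:

```python
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj

def gaussian_rbf_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))

def histogram_intersection_kernel(xi, xj):
    # xi(k), xj(k) are non-negative histogram bin values
    return np.sum(np.minimum(xi, xj))
```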

Page 9:

Allowing misclassifications

Trade off maximizing the margin against minimizing misclassification: find the w that minimizes

(1/2) wᵀw + C Σi ξi (summing over the N data samples)

where ξi is the slack variable for training point i and C is the misclassification cost.

Page 10:

What about multi-class SVMs?

• Unfortunately, there is no “definitive” multi-class SVM formulation

• In practice, we have to obtain a multi-class SVM by combining multiple two-class SVMs

• One vs. others

– Training: learn an SVM for each class vs. the others

– Testing: apply each SVM to the test example, and assign it to the class of the SVM that returns the highest decision value

• One vs. one

– Training: learn an SVM for each pair of classes

– Testing: each learned SVM “votes” for a class to assign to the test example

Svetlana Lazebnik
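As a sketch of the one-vs.-others scheme (hypothetical variable names; the slide gives no code), assume each class c already has a trained weight vector w_c and bias b_c:

```python
import numpy as np

def one_vs_others_predict(x, ws, bs):
    # ws: (num_classes, dim) array of per-class SVM weights,
    # bs: (num_classes,) biases; each row is a "class c vs. rest" SVM.
    scores = ws @ x + bs            # decision value of each binary SVM
    return int(np.argmax(scores))   # class with the highest decision value
```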

Page 11:

Evaluating Classifiers

• Accuracy

– # correctly classified / # all test examples

• Precision/recall

– Precision = # predicted true pos / # predicted pos

– Recall = # predicted true pos / # true pos

• F-measure

= 2PR / (P + R)

• Want evaluation metric to be in some range, e.g. [0, 1]

– 0 = worst possible classifier, 1 = best possible classifier
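A small sketch computing these metrics from binary confusion-matrix counts (tp, fp, fn, tn are assumed inputs, not from the slides; the printed numbers match the worked example on the next slide):

```python
def classification_metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # predicted true pos / predicted pos
    recall = tp / (tp + fn)      # predicted true pos / true pos
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# Next slide's example: 2 of 5 predicted positives are correct,
# 4 images actually contain people, 10 images in total.
print(classification_metrics(tp=2, fp=3, fn=2, tn=3))
# (0.5, 0.4, 0.5, 0.444...)
```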

Page 12:

Precision / Recall / F-measure

• Precision = 2 / 5 = 0.4

• Recall = 2 / 4 = 0.5

• F-measure = (2 × 0.4 × 0.5) / (0.4 + 0.5) ≈ 0.44

True positives

(images that contain people)

True negatives

(images that do not contain people)

Predicted positives

(images predicted to contain people)

Predicted negatives

(images predicted not to contain people)

Accuracy: 5 / 10 = 0.5

Page 13:

Support Vector Regression

Page 14:

Regression

Regression is like classification, except the labels are real-valued.

Example applications:

‣ Stock value prediction

‣ Income prediction

‣ CPU power consumption

Subhransu Maji

Page 15:

Regularized Error Function for Regression

In linear regression, we minimize the error function:

(1/2) Σn (y(xn) − tn)² + (λ/2) ||w||²

Use the ε-insensitive error function Eε instead, and minimize:

C Σn Eε(y(xn) − tn) + (1/2) ||w||²

An example of ε-insensitive error function (y(x) = predicted value, t = true value):

Eε(y(x) − t) = 0 if |y(x) − t| < ε; |y(x) − t| − ε otherwise

Adapted from Huan Liu
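As a sketch (my code, not the lecture's), the ε-insensitive error and the resulting regularized objective can be written as:

```python
import numpy as np

def eps_insensitive_error(pred, target, eps=0.1):
    # Zero inside the eps-tube, linear outside it.
    return np.maximum(0.0, np.abs(pred - target) - eps)

def svr_objective(preds, targets, w, C=1.0, eps=0.1):
    # C * sum of eps-insensitive errors + (1/2) ||w||^2
    return C * eps_insensitive_error(preds, targets, eps).sum() + 0.5 * w @ w
```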

Page 16:

Slack Variables for Regression

For a target point to lie inside the tube: yn − ε ≤ tn ≤ yn + ε

Introduce slack variables ξn ≥ 0 and ξ̂n ≥ 0 to allow points to lie outside the tube:

tn ≤ y(xn) + ε + ξn
tn ≥ y(xn) − ε − ξ̂n

Minimize:

C Σn (ξn + ξ̂n) + (1/2) ||w||²

Adapted from Huan Liu
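In practice this optimization is rarely coded by hand; as a usage sketch (not part of the lecture), scikit-learn's SVR exposes the misfit cost C and the tube half-width epsilon directly:

```python
import numpy as np
from sklearn.svm import SVR

X = np.linspace(0, 6, 50).reshape(-1, 1)            # toy 1-D inputs
t = np.sin(X).ravel() + 0.1 * np.random.randn(50)   # noisy targets

model = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, t)
predictions = model.predict(X)
```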

Page 17:

Next time: Support Vector Ranking

Page 18:

Outline

• Support Vector Machines (review + other uses)

• Bias-variance trade-off

• Scene recognition: Spatial pyramid matching

• Other classifiers

– Decision trees

– Hidden Markov models

• Other problems

– Clustering: agglomerative clustering

– Dimensionality reduction

Page 19:

Generalization

• How well does a learned model generalize from

the data it was trained on to a new test set?

Training set (labels known). Test set (labels unknown).

Slide credit: L. Lazebnik

Page 20:

Generalization

• Components of generalization error

– Bias: how much the average model over all training sets differs

from the true model

• Error due to inaccurate assumptions/simplifications made by

the model

– Variance: how much models estimated from different training

sets differ from each other

• Underfitting: model is too “simple” to represent all the

relevant class characteristics

– High bias and low variance

– High training error and high test error

• Overfitting: model is too “complex” and fits irrelevant

characteristics (noise) in the data

– Low bias and high variance

– Low training error and high test error

Slide credit: L. Lazebnik

Page 21:

Bias-Variance Trade-off

• Models with too few parameters are inaccurate because of a large bias (not enough flexibility).

• Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).

Slide credit: D. Hoiem

Page 22:

Fitting a model

Figures from Bishop

Is this a good fit?

Page 23:

With more training data

Figures from Bishop

Page 24:

Bias-variance tradeoff

[Plot: training error and test error vs. model complexity. Complexity runs from high bias / low variance (left) to low bias / high variance (right). Underfitting: both training and test error are high. Overfitting: training error is low, test error is high.]

Slide credit: D. Hoiem

Page 25:

Bias-variance tradeoff

[Plot: test error vs. model complexity, shown for many training examples and for few training examples. Complexity again runs from high bias / low variance to low bias / high variance.]

Slide credit: D. Hoiem

Page 26:

Choosing the trade-off

• Need validation set

• Validation set is separate from the test set

[Plot: training error and held-out error vs. model complexity, from high bias / low variance to low bias / high variance; choose the complexity where held-out error is lowest.]

Slide credit: D. Hoiem

Page 27:

Effect of Training Size

[Plot, for a fixed prediction model: error vs. number of training examples. Training error rises and testing error falls as the training set grows, converging toward the generalization error.]

Adapted from D. Hoiem

Page 28:

How to reduce variance?

• Choose a simpler classifier

• Use fewer features

• Get more training data

• Regularize the parameters

Slide credit: D. Hoiem

Page 29:

Regularization

Figures from Bishop

Page 30:

Characteristics of vision learning problems

• Lots of continuous features

– Spatial pyramid may have ~15,000 features

• Imbalanced classes

– Often limited positive examples, practically infinite negative examples

• Difficult prediction tasks

• Recently, massive training sets have become available

– If we have a massive training set, we want classifiers with low bias (high variance is ok) and reasonably efficient training

Adapted from D. Hoiem

Page 31:

Remember…

• No free lunch: machine learning algorithms are tools

• Three kinds of error

– Inherent: unavoidable

– Bias: due to over-simplifications

– Variance: due to inability to perfectly estimate parameters from limited data

• Try simple classifiers first

• Better to have smart features and simple classifiers than simple features and smart classifiers

• Use increasingly powerful classifiers with more training data (bias-variance tradeoff)

Adapted from D. Hoiem

Page 32:

Outline

• Support Vector Machines (review + other uses)

• Bias-variance trade-off

• Scene recognition: Spatial pyramid matching

• Other classifiers

– Decision trees

– Hidden Markov models

• Other problems

– Clustering: agglomerative clustering

– Dimensionality reduction

Page 33:

Beyond Bags of Features: Spatial Pyramid Matching

for Recognizing Natural Scene Categories

CVPR 2006

Svetlana Lazebnik ([email protected])

Beckman Institute, University of Illinois at Urbana-Champaign

Cordelia Schmid ([email protected])

INRIA Rhône-Alpes, France

Jean Ponce ([email protected])

Ecole Normale Supérieure, France

http://www-cvr.ai.uiuc.edu/ponce_grp

Page 34:

Bags of words

Slide credit: L. Lazebnik

Page 35:

Bag-of-features steps

1. Extract local features

2. Learn “visual vocabulary” using clustering

3. Quantize local features using visual vocabulary

4. Represent images by frequencies of “visual words” (see the sketch below)

Slide credit: L. Lazebnik
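A minimal sketch of steps 2-4 (my illustration; it assumes local descriptors such as SIFT have already been extracted as rows of an array):

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

def build_vocabulary(all_descriptors, k=200):
    # Step 2: cluster training descriptors into k visual words.
    centers, _ = kmeans2(all_descriptors.astype(float), k, minit="++", seed=0)
    return centers

def bow_histogram(descriptors, vocabulary):
    # Step 3: assign each descriptor to its nearest visual word.
    words, _ = vq(descriptors.astype(float), vocabulary)
    # Step 4: represent the image by normalized word frequencies.
    hist, _ = np.histogram(words, bins=np.arange(len(vocabulary) + 1))
    return hist / max(hist.sum(), 1)
```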

Page 36:

Local feature extraction

Slide credit: Josef Sivic

Page 37:

Learning the visual vocabulary

Slide credit: Josef Sivic

Page 38:

Learning the visual vocabulary

Clustering

Slide credit: Josef Sivic

Page 39:

Learning the visual vocabulary

Clustering

Slide credit: Josef Sivic

Visual vocabulary

Page 40:

Image categorization with bag of words

Training:

1. Extract bag-of-words representation

2. Train classifier on labeled examples using histogram values as features

Testing:

1. Extract keypoints/descriptors

2. Quantize into visual words using the clusters computed at training time

3. Compute visual word histogram

4. Compute label using classifier

Slide credit: D. Hoiem

Page 41:

What about spatial layout?

All of these images have the same color histogram.

Slide credit: D. Hoiem

Page 42:

Spatial pyramid

Compute histogram in each spatial bin

Slide credit: D. Hoiem

Page 43:

Spatial pyramid

[Lazebnik et al., CVPR 2006]

Slide credit: D. Hoiem

Page 44:

Pyramid matching

Indyk & Thaper (2003), Grauman & Darrell (2005)

[Figure: matching two images xi and xj using the pyramid and histogram intersection for some particular visual word. The original images are divided into increasingly fine grids (levels 0 through 3), a feature histogram is computed in each cell at each level, and the weighted sum of the per-level histogram intersections gives the total weight, i.e. the value of the pyramid match kernel K(xi, xj).]

Adapted from L. Lazebnik
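A sketch of the pyramid match value over precomputed per-level histograms, using the level weights from Lazebnik et al. (2006); hists_a and hists_b are assumed inputs, one concatenated spatial-bin histogram per level, level 0 coarsest:

```python
import numpy as np

def histogram_intersection(h1, h2):
    return np.minimum(h1, h2).sum()

def pyramid_match(hists_a, hists_b):
    L = len(hists_a) - 1
    # Level 0 is weighted 1/2^L; level l >= 1 is weighted 1/2^(L - l + 1),
    # so matches found in finer cells count more.
    total = histogram_intersection(hists_a[0], hists_b[0]) / 2 ** L
    for l in range(1, L + 1):
        w = 1.0 / 2 ** (L - l + 1)
        total += w * histogram_intersection(hists_a[l], hists_b[l])
    return total
```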

Page 45:

Scene category dataset

Fei-Fei & Perona (2005), Oliva & Torralba (2001)
http://www-cvr.ai.uiuc.edu/ponce_grp/data

Multi-class classification results (100 training images per class). Fei-Fei & Perona baseline: 65.2%

Slide credit: L. Lazebnik

Page 46:

Scene category confusions

Difficult indoor images

kitchen, living room, bedroom

Slide credit: L. Lazebnik

Page 47:

Caltech101 dataset

Multi-class classification results (30 training images per class)

Fei-Fei et al. (2004)

http://www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html

Slide credit: L. Lazebnik

Page 48:

Outline

• Support Vector Machines (review + other uses)

• Bias-variance trade-off

• Scene recognition: Spatial pyramid matching

• Other classifiers

– Decision trees

– Hidden Markov models

• Other problems

– Clustering: agglomerative clustering

– Dimensionality reduction

Page 49:

Decision tree classifier

Example problem: decide whether to wait for a table at a restaurant, based on the following attributes (a code sketch follows the list):

1. Alternate: is there an alternative restaurant nearby?

2. Bar: is there a comfortable bar area to wait in?

3. Fri/Sat: is today Friday or Saturday?

4. Hungry: are we hungry?

5. Patrons: number of people in the restaurant (None, Some, Full)

6. Price: price range ($, $$, $$$)

7. Raining: is it raining outside?

8. Reservation: have we made a reservation?

9. Type: kind of restaurant (French, Italian, Thai, Burger)

10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)

Slide credit: L. Lazebnik
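As a hypothetical sketch (the lecture shows no code), one could fit such a tree with scikit-learn after encoding the attributes numerically; the data and labels below are made up purely for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# Columns: Alternate, Bar, Fri/Sat, Hungry (0/1); multi-valued attributes
# such as Patrons or Price would be one-hot encoded the same way.
X = [[1, 0, 0, 1],
     [0, 1, 0, 0],
     [1, 0, 1, 1],
     [0, 0, 1, 0]]
y = [1, 0, 1, 0]  # 1 = wait, 0 = leave (made-up labels)

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(tree.predict([[1, 1, 0, 1]]))
```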

Page 50:

Decision tree classifier

Slide credit: L. Lazebnik

Page 51:

Decision tree classifier

Slide credit: L. Lazebnik

Page 52:

Sequence Labeling Problem

• Unlike most computer vision problems, many

NLP problems can be viewed as sequence labeling.

• Each token in a sequence is assigned a label.

• Labels of tokens are dependent on the labels of

other tokens in the sequence, particularly their

neighbors (not i.i.d.).

foo bar blam zonk zonk bar blam

Adapted from Ray Mooney

Page 53:

Markov Model / Markov Chain

• A finite state machine with probabilistic

state transitions.

• Makes Markov assumption that next state

only depends on the current state and

is independent of previous history.

Ray Mooney

Page 54:

Sample Markov Model for POS

[State-transition diagram over states start, Det, Noun, PropNoun, Verb, and stop, annotated with transition probabilities (0.95, 0.9, 0.8, 0.5, 0.4, 0.25, 0.1, 0.05).]

Ray Mooney

Page 55:

Sample Markov Model for POS

[Same state-transition diagram, highlighting the path used for the sentence “Mary ate the cake.”]

P(PropNoun Verb Det Noun) = 0.4 × 0.8 × 0.25 × 0.95 × 0.1 = 0.0076

Adapted from Ray Mooney
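A small sketch of scoring a tag sequence under this chain. The assignment of each factor in the slide's product to a specific edge is my reading of the diagram, so treat the table as illustrative:

```python
# Transition probabilities read off the worked example above.
transition = {
    ("start", "PropNoun"): 0.4,
    ("PropNoun", "Verb"): 0.8,
    ("Verb", "Det"): 0.25,
    ("Det", "Noun"): 0.95,
    ("Noun", "stop"): 0.1,
}

def sequence_probability(states):
    p = 1.0
    for prev, cur in zip(states, states[1:]):
        p *= transition.get((prev, cur), 0.0)
    return p

print(sequence_probability(["start", "PropNoun", "Verb", "Det", "Noun", "stop"]))
# 0.4 * 0.8 * 0.25 * 0.95 * 0.1 = 0.0076
```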

Page 56:

Hidden Markov Model

• Probabilistic generative model for sequences.

• Assume an underlying set of hidden (unobserved) states in which the model can be (e.g. parts of speech).

• Assume probabilistic transitions between states over time (e.g. transition from POS to another POS as sequence is generated).

• Assume a probabilistic generation of tokens from states (e.g. words generated for each POS).

Ray Mooney

Page 57:

Sample HMM for POS

[HMM diagram: the same transition structure over hidden states start, Det, Noun, PropNoun, Verb, and stop, with each state now emitting words, e.g. PropNoun → {John, Mary, Alice, Jerry, Tom}, Noun → {cat, dog, car, pen, bed, apple}, Det → {a, the, that}, Verb → {bit, ate, saw, played, hit, gave}.]

Ray Mooney
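A sketch of the HMM's generative story: the joint probability of a hidden tag path and an observed word sequence multiplies a transition probability and an emission probability at each step (the numbers below are made up for illustration):

```python
def joint_probability(tags, words, transition, emission):
    p = transition[("start", tags[0])]
    for i, (tag, word) in enumerate(zip(tags, words)):
        p *= emission[tag].get(word, 0.0)               # P(word | tag)
        nxt = tags[i + 1] if i + 1 < len(tags) else "stop"
        p *= transition.get((tag, nxt), 0.0)            # P(next state | tag)
    return p

transition = {("start", "PropNoun"): 0.4, ("PropNoun", "Verb"): 0.8,
              ("Verb", "stop"): 0.1}
emission = {"PropNoun": {"Mary": 0.2}, "Verb": {"ate": 0.1}}
print(joint_probability(["PropNoun", "Verb"], ["Mary", "ate"],
                        transition, emission))
```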

Page 58:

Outline

• Support Vector Machines (review + other uses)

• Bias-variance trade-off

• Scene recognition: Spatial pyramid matching

• Other classifiers

– Decision trees

– Hidden Markov models

• Other problems

– Clustering: agglomerative clustering

– Dimensionality reduction

Page 59:

Slide credit: D. Hoiem

Page 60:

Clustering Strategies

• K-means

– Iteratively re-assign points to the nearest cluster center

• Mean-shift clustering

– Estimate modes

• Graph cuts

– Split the nodes in a graph based on assigned links with similarity weights

• Agglomerative clustering

– Start with each point as its own cluster and iteratively merge the closest clusters

Page 61:

Agglomerative clustering

Page 62:

Agglomerative clustering

Page 63:

Agglomerative clustering

Page 64:

Agglomerative clustering

Page 65:

Agglomerative clustering

Page 66:

Agglomerative clustering

How to define cluster similarity?

– Average distance between points, maximum distance, minimum distance

How many clusters?

– Clustering creates a dendrogram (a tree) whose vertical axis is the merge distance

– Threshold based on max number of clusters or based on distance between merges

Adapted from J. Hays
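A quick usage sketch with SciPy (not from the lecture); the "average", "complete", and "single" linkage methods correspond to the average-, maximum-, and minimum-distance similarities above, and fcluster cuts the dendrogram either by cluster count or by merge distance:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.random.rand(20, 2)         # 20 random 2-D points
Z = linkage(points, method="average")  # dendrogram encoded as a merge list

labels_by_count = fcluster(Z, t=3, criterion="maxclust")   # at most 3 clusters
labels_by_dist = fcluster(Z, t=0.5, criterion="distance")  # cut at distance 0.5
```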

Page 67:

Why do we cluster?

• Summarizing data

– Look at large amounts of data

– Represent a large continuous vector with the cluster number

• Counting

– Histograms of texture, color, SIFT vectors

• Segmentation

– Separate the image into different regions

• Prediction

– Images in the same cluster may have the same labels

Slide credit: J. Hays, D. Hoiem

Page 68:

Slide credit: D. Hoiem

Page 69:

Dimensionality Reduction

Figure from Genevieve Patterson, IJCV 2014