Tamara Berg Object Recognition – BoF models 790-133 Recognizing People, Objects, & Actions 1



Topic Presentations

• Hopefully you have met the members of your topic presentation group by now.

• Group 1: see me to run through your slides this week, or Monday at the latest (I’m traveling Thursday/Friday). Send me links to 2-3 papers for the class to read.

• Sign up for the class Google group (790-133). To find the group, go to groups.google.com and search for 790-133 (sorted by date). Use it to post and answer questions related to the class.


Bag-of-features models

[Figure: an object represented as a bag of ‘features’]

source: Svetlana Lazebnik


Exchangeability

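• De Finetti's theorem of exchangeability (the “bag of words theorem”): if the joint probability distribution underlying the data is invariant to permutation, the data can be represented as independent and identically distributed, conditioned on some latent parameter.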
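In symbols (a standard textbook formulation added for reference, not taken from the slide): a sequence x_1, …, x_n is exchangeable if its joint distribution is unchanged by every permutation π, and de Finetti's theorem states that an infinitely exchangeable sequence is a mixture of i.i.d. sequences governed by a latent parameter θ:

$$p(x_1, \dots, x_n) = p(x_{\pi(1)}, \dots, x_{\pi(n)}) \quad \text{for every permutation } \pi$$

$$p(x_1, \dots, x_n) = \int p(\theta) \prod_{i=1}^{n} p(x_i \mid \theta) \, d\theta$$

This is the usual justification for topic models such as LDA treating the words of a document as conditionally i.i.d. given latent parameters, so that word order can be discarded.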


Origin 2: Bag-of-words models

US Presidential Speeches Tag Cloud: http://chir.ag/phernalia/preztags/

• Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)

source: Svetlana Lazebnik


Bag of words for text

• Represent documents as “bags of words”


Example

• Doc1 = “the quick brown fox jumped”
• Doc2 = “brown quick jumped fox the”

Would a bag of words model represent these two documents differently?
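A minimal Python sketch of a bag-of-words representation for the two documents above. It assumes naive whitespace tokenization and a vocabulary built from the two documents themselves (both are illustrative assumptions, not part of the slides); the function name `bag_of_words` is hypothetical.

```python
from collections import Counter


def bag_of_words(doc, vocabulary):
    """Count how often each vocabulary word occurs in doc; word order is discarded."""
    counts = Counter(doc.split())  # naive whitespace tokenization (an assumption)
    return [counts[word] for word in vocabulary]


doc1 = "the quick brown fox jumped"
doc2 = "brown quick jumped fox the"
# Hypothetical vocabulary: every distinct word seen in the two documents.
vocabulary = sorted(set(doc1.split()) | set(doc2.split()))

print(vocabulary)                      # ['brown', 'fox', 'jumped', 'quick', 'the']
print(bag_of_words(doc1, vocabulary))  # [1, 1, 1, 1, 1]
print(bag_of_words(doc2, vocabulary))  # [1, 1, 1, 1, 1]
```

Because only counts are kept, any reordering of a document's words produces the same vector.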


Bag of words for images

• Represent images as a “bag of features”


Bag of features: outline
1. Extract features
2. Learn “visual vocabulary”
3. Represent images by frequencies of “visual words”

source: Svetlana Lazebnik


2. Learning the visual vocabulary

[Figure: feature descriptors grouped by clustering; the cluster centers form the visual vocabulary]

Slide credit: Josef Sivic


K-means clustering (reminder)

• Want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k:

$$D(X, M) = \sum_{\text{cluster } k} \; \sum_{\text{point } i \text{ in cluster } k} \lVert x_i - m_k \rVert^2$$

Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
– Assign each data point to the nearest center
– Recompute each cluster center as the mean of all points assigned to it

source: Svetlana Lazebnik
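A minimal pure-Python sketch of the K-means loop above (Lloyd's algorithm), assuming descriptors are given as equal-length tuples of floats and that the K initial centers are K distinct data points. The names `kmeans` and `kmeans_objective` are hypothetical; real vocabularies are built from many more, higher-dimensional descriptors, typically with an optimized library.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm. Returns (centers, assignments)."""
    rng = random.Random(seed)
    # Randomly initialize K cluster centers (here: K distinct data points).
    centers = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        # Assign each data point to the nearest center.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Recompute each center as the mean of all points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(coords) / len(members) for coords in zip(*members)]
    return centers, assignments


def kmeans_objective(points, centers, assignments):
    """D(X, M): sum of squared distances from each point to its assigned center."""
    return sum(squared_distance(p, centers[a]) for p, a in zip(points, assignments))


# Toy 2-D "descriptors" (real SIFT descriptors are 128-dimensional).
descriptors = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
vocabulary, labels = kmeans(descriptors, k=2)
print(labels)  # the three points near the origin share one label, the other three the other
print(kmeans_objective(descriptors, vocabulary, labels))
```

The returned centers play the role of the visual vocabulary: each center is one visual word.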


Example visual vocabulary

Fei-Fei et al. 2005


Image Representation

• For a query image:
– Extract features
– Associate each feature with the nearest cluster center (visual word)
– Accumulate visual word frequencies over the image

[Figure: extracted features (x) mapped onto the visual vocabulary]
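A minimal sketch of the three steps for one query image, assuming features are already extracted as tuples and that the histogram is normalized by the number of features (an assumption; raw counts are also common). The names `nearest_word` and `bof_histogram` are hypothetical.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the closest cluster center (visual word) by squared Euclidean distance."""
    return min(
        range(len(vocabulary)),
        key=lambda j: sum((d - c) ** 2 for d, c in zip(descriptor, vocabulary[j])),
    )


def bof_histogram(descriptors, vocabulary, normalize=True):
    """Accumulate visual-word frequencies over one image."""
    counts = [0] * len(vocabulary)
    for desc in descriptors:
        counts[nearest_word(desc, vocabulary)] += 1
    if normalize and descriptors:
        # Divide by the number of features so images with different feature counts compare.
        return [c / len(descriptors) for c in counts]
    return counts


# Hypothetical 3-word vocabulary and the features extracted from one query image.
vocabulary = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
image_features = [(0.2, 0.1), (4.8, 5.1), (5.2, 4.9), (0.1, 4.7)]
print(bof_histogram(image_features, vocabulary, normalize=False))  # [1, 2, 1]
print(bof_histogram(image_features, vocabulary))                   # [0.25, 0.5, 0.25]
```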


3. Image representation

[Figure: histogram of visual word frequencies (x-axis: codewords, y-axis: frequency)]

source: Svetlana Lazebnik


4. Image classification

[Figure: histogram of visual word frequencies (x-axis: codewords, y-axis: frequency), with the image classified as CAR]

source: Svetlana Lazebnik

Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?


Image Categorization

Choose from many categories

What is this? helicopter


Image Categorization

Choose from many categories

What is this?

SVM/NB: Csurka et al. (Caltech 4/7)

Nearest Neighbor: Berg et al. (Caltech 101)

Kernel + SVM: Grauman et al. (Caltech 101)

Multiple Kernel Learning + SVMs: Varma et al. (Caltech 101)

…


Visual Categorization with Bags of Keypoints
Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, Cédric Bray


Data

• Images in 7 classes: faces, buildings, trees, cars, phones, bikes, books

• Caltech 4 dataset: faces, airplanes, cars (rear and side), motorbikes, background


Method

Steps:
– Detect and describe image patches.
– Assign patch descriptors to a set of predetermined clusters (a visual vocabulary).
– Construct a bag of keypoints, which counts the number of patches assigned to each cluster.
– Apply a classifier (SVM or Naïve Bayes), treating the bag of keypoints as the feature vector.
– Determine which category or categories to assign to the image.


Bag-of-Keypoints Approach

[Figure: bag-of-keypoints pipeline: Interest Point Detection → Key Patch Extraction → Feature Descriptors → Bag of Keypoints → Multi-class Classifier]

Slide credit: Yun-hsueh Liu


SIFT Descriptors

[Figure: bag-of-keypoints pipeline diagram]

Slide credit: Yun-hsueh Liu


Bag of Keypoints (1)

• Construction of a vocabulary
– K-means clustering finds “centroids” (run on all the descriptors extracted from all the training images).
– Define a “vocabulary” as the set of “centroids”, where every centroid represents a “word”.

[Figure: bag-of-keypoints pipeline diagram]

Slide credit: Yun-hsueh Liu


Bag of Keypoints (2)

• Histogram
– Counts the number of occurrences of each visual word in each image

[Figure: bag-of-keypoints pipeline diagram]

Slide credit: Yun-hsueh Liu


Multi-class Classifier

• In this paper, classification is based on conventional machine learning approaches:
– Support Vector Machine (SVM)
– Naïve Bayes

[Figure: bag-of-keypoints pipeline diagram]

Slide credit: Yun-hsueh Liu


SVM


Reminder: Linear SVM

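[Figure: two classes in the (x1, x2) plane separated by the hyperplane w^T x + b = 0; the margin is bounded by w^T x + b = 1 and w^T x + b = -1, which pass through the support vectors x+ and x-]

Discriminant function:

$$g(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b$$

Training, for labels y_i ∈ {+1, -1}:

$$\text{minimize } \frac{1}{2}\lVert \mathbf{w} \rVert^2 \quad \text{s.t. } y_i(\mathbf{w}^T \mathbf{x}_i + b) \ge 1 \;\; \text{for all } i$$

Slide credit: Jinwei Gu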


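Nonlinear SVMs: The Kernel Trick

With a mapping φ(x) into an expanded feature space, our discriminant function becomes:

$$g(\mathbf{x}) = \mathbf{w}^T \phi(\mathbf{x}) + b = \sum_{i \in SV} \alpha_i y_i \, \phi(\mathbf{x}_i)^T \phi(\mathbf{x}) + b$$

where SV is the set of support vectors and α_i ≥ 0 are the learned multipliers.

We do not need to know this mapping explicitly, because only dot products of feature vectors are used, in both training and testing.

A kernel function is defined as a function that corresponds to a dot product of two feature vectors in some expanded feature space:

$$K(\mathbf{x}_i, \mathbf{x}_j) \equiv \phi(\mathbf{x}_i)^T \phi(\mathbf{x}_j)$$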

Slide credit: Jinwei Gu
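A small worked check of the kernel idea, using a homogeneous degree-2 polynomial kernel on 2-D inputs as an illustrative choice (the slide does not fix one). The explicit map φ(x) = (x1², √2·x1·x2, x2²) gives the same dot product as evaluating (x^T z)² directly in the input space; the names `phi` and `poly2_kernel` are hypothetical.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for 2-D inputs: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(a, b):
    """Ordinary dot product."""
    return sum(ai * bi for ai, bi in zip(a, b))


def poly2_kernel(x, z):
    """The same quantity computed directly in the 2-D input space: (x . z)^2."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, 1.0)
explicit = dot(phi(x), phi(z))  # map to 3-D, then take the dot product
implicit = poly2_kernel(x, z)   # never forms phi(x) or phi(z)
print(explicit, implicit)       # equal up to floating-point rounding (both about 25)
```

For higher polynomial degrees or the RBF kernel, the explicit feature space grows quickly (infinite-dimensional for RBF), which is why only the kernel values are computed.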


Nonlinear SVMs: The Kernel Trick

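Examples of commonly-used kernel functions:

• Linear kernel:
$$K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T \mathbf{x}_j$$

• Polynomial kernel:
$$K(\mathbf{x}_i, \mathbf{x}_j) = (1 + \mathbf{x}_i^T \mathbf{x}_j)^p$$

• Gaussian (Radial-Basis Function (RBF)) kernel:
$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2}{2\sigma^2}\right)$$

• Sigmoid:
$$K(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\beta_0 \mathbf{x}_i^T \mathbf{x}_j + \beta_1)$$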

Slide credit: Jinwei Gu
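The four kernels written out in Python, as a sketch. The default parameter values (p = 2, σ = 1, β0 = 1, β1 = 0) are arbitrary assumptions for illustration, and `gram_matrix` is a hypothetical helper that computes all pairwise kernel values, which is all a kernel SVM needs during training.

```python
import math


def linear_kernel(xi, xj):
    """K(xi, xj) = xi^T xj"""
    return sum(a * b for a, b in zip(xi, xj))


def polynomial_kernel(xi, xj, p=2):
    """K(xi, xj) = (1 + xi^T xj)^p"""
    return (1.0 + linear_kernel(xi, xj)) ** p


def rbf_kernel(xi, xj, sigma=1.0):
    """K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigmoid_kernel(xi, xj, beta0=1.0, beta1=0.0):
    """K(xi, xj) = tanh(beta0 * xi^T xj + beta1)"""
    return math.tanh(beta0 * linear_kernel(xi, xj) + beta1)


def gram_matrix(points, kernel):
    """All pairwise kernel values K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(a, b) for b in points] for a in points]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
K = gram_matrix(points, rbf_kernel)
print(K[0][0], K[0][1])  # 1.0 and exp(-0.5)
```

Note that the sigmoid kernel is not positive semidefinite for all choices of β0 and β1, so it does not always correspond to a true dot product in some feature space.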


SVM for image classification

• Train k binary 1-vs-all SVMs (one per class)
• For a test instance, evaluate with each classifier
• Assign the instance to the class with the largest SVM output
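A sketch of the 1-vs-all scheme above. The slides do not say how each binary SVM is trained; as an assumption, this example trains linear SVMs with a simple Pegasos-style stochastic subgradient method, a swapped-in solver chosen only to keep the code dependency-free (in practice one would use an established library such as LIBSVM). The bias is handled by appending a constant feature, and all names and data are hypothetical.

```python
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_svm(X, y, lam=0.05, epochs=500, seed=0):
    """Binary soft-margin linear SVM via Pegasos-style stochastic subgradient descent.

    X: list of feature vectors (e.g. bag-of-features histograms); y: labels in {+1, -1}.
    Returns (w, b).
    """
    rng = random.Random(seed)
    # Append a constant 1 so the last weight acts as the bias b
    # (this also regularizes b, a common simplification).
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    order = list(range(len(Xa)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                     # decreasing step size
            violates = y[i] * dot(w, Xa[i]) < 1.0     # inside the margin or misclassified?
            w = [(1.0 - eta * lam) * wj for wj in w]  # step for the (lam/2)||w||^2 term
            if violates:                              # step for the hinge-loss term
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w[:-1], w[-1]


def train_one_vs_all(X, labels, **svm_args):
    """One binary SVM per class: that class (+1) against all other classes (-1)."""
    return {
        c: train_linear_svm(X, [1 if l == c else -1 for l in labels], **svm_args)
        for c in sorted(set(labels))
    }


def predict(models, x):
    """Evaluate every classifier and pick the class with the largest output w^T x + b."""
    return max(models, key=lambda c: dot(models[c][0], x) + models[c][1])


# Hypothetical, well-separated 2-D training data for three classes.
X = [(5, 0), (6, 1), (5, 1), (0, 5), (1, 6), (0, 6), (-5, -5), (-6, -5), (-5, -6)]
labels = ["face"] * 3 + ["car"] * 3 + ["bike"] * 3
models = train_one_vs_all(X, labels)
print(predict(models, (5.5, 0.5)), predict(models, (-5.5, -5.5)))
```

Comparing raw outputs across classifiers assumes they are on comparable scales; this is a known weakness of the 1-vs-all heuristic.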


Naïve Bayes


Naïve Bayes Model

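C – Class, F – Features

We only specify (parameters):
• P(C): the prior over class labels
• P(F_i | C): how each feature depends on the class

so that the joint distribution factorizes as

$$P(C, F_1, \dots, F_n) = P(C) \prod_i P(F_i \mid C)$$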


Slide from Dan Klein

Example:


Percentage of documents in the training set labeled as spam/ham (this estimates the class prior P(C)).

Slide from Dan Klein


In the documents labeled as spam, the occurrence percentage of each word (e.g. # times “the” occurred / # total words); this estimates P(W | spam).

Slide from Dan Klein


In the documents labeled as ham, the occurrence percentage of each word (e.g. # times “the” occurred / # total words); this estimates P(W | ham).

Slide from Dan Klein
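A minimal sketch of how the two kinds of tables on these slides can be estimated by counting, assuming whitespace tokenization and a made-up four-document training set (both hypothetical). The estimates are unsmoothed relative frequencies, so a word never seen in a class gets zero probability there.

```python
from collections import Counter


def estimate_naive_bayes(docs):
    """docs: list of (label, text) pairs. Returns (priors, word_probs) as relative frequencies."""
    label_counts = Counter(label for label, _ in docs)
    # P(C): fraction of training documents carrying each label.
    priors = {c: n / len(docs) for c, n in label_counts.items()}
    word_probs = {}
    for c in label_counts:
        # Pool all words from documents of class c.
        words = Counter(w for label, text in docs if label == c for w in text.split())
        total = sum(words.values())
        # P(W | C): occurrences of each word / total words in that class.
        word_probs[c] = {w: n / total for w, n in words.items()}
    return priors, word_probs


# Hypothetical toy training set.
docs = [
    ("spam", "win money now"),
    ("spam", "win the prize"),
    ("ham", "the meeting is now"),
    ("ham", "see the notes"),
]
priors, word_probs = estimate_naive_bayes(docs)
print(priors)                     # {'spam': 0.5, 'ham': 0.5}
print(word_probs["spam"]["win"])  # 2 of the 6 spam words
print(word_probs["ham"]["the"])   # 2 of the 7 ham words
```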


Classification

The class c that maximizes:

$$P(c) \prod_i P(W_i \mid c)$$


Classification

• In practice:
– Multiplying lots of small probabilities can result in floating point underflow.
– Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities.
– Since log is a monotonically increasing function, the class with the highest score does not change.
– So, what we usually compute in practice is:

$$\arg\max_c \Big[ \log P(c) + \sum_i \log P(W_i \mid c) \Big]$$
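A small demonstration of the underflow problem and the log-space fix, using a hypothetical two-class model over a two-word vocabulary and a long made-up document. The direct product underflows to 0.0 for both classes, so no winner can be picked, while the sum of logs still ranks the classes.

```python
import math


def product_score(prior, word_probs, words):
    """Direct product P(c) * prod_i P(w_i | c); can underflow to 0.0."""
    score = prior
    for w in words:
        score *= word_probs[w]
    return score


def log_score(prior, word_probs, words):
    """log P(c) + sum_i log P(w_i | c); ranks classes the same way as the product."""
    return math.log(prior) + sum(math.log(word_probs[w]) for w in words)


# Hypothetical model: class -> (prior, P(word | class)).
model = {"spam": (0.5, {"a": 0.01, "b": 0.99}), "ham": (0.5, {"a": 0.001, "b": 0.999})}
doc = ["a"] * 200 + ["b"] * 50

products = {c: product_score(p, wp, doc) for c, (p, wp) in model.items()}
logs = {c: log_score(p, wp, doc) for c, (p, wp) in model.items()}
print(products)                 # {'spam': 0.0, 'ham': 0.0}: both products underflow
print(max(logs, key=logs.get))  # spam
```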


Naïve Bayes on images


Naïve Bayes Parameters

Problem: Categorize images as one of k object classes using a Naïve Bayes classifier:
– Classes: object categories (face, car, bicycle, etc.)
– Features: images are represented as a histogram of visual words; the features F_i are visual words.
– P(C) is treated as uniform.
– P(F_i | C) is learned from training data (images labeled with their category): the probability of a visual word given an image category.


Multi-class classifier –Naïve Bayes (1)

• Let V = {v_t}, t = 1, …, |V|, be a visual vocabulary, in which each v_t represents a visual word (a cluster center) in the feature space.

• Let I = {I_i} be a set of labeled images.

• Let C_j, j = 1, …, M, denote our classes.

• N(t, i) = number of times v_t occurs in image I_i

• Compute P(C_j | I_i):

$$P(C_j \mid I_i) \propto P(C_j) \prod_{t=1}^{|V|} P(v_t \mid C_j)^{N(t,i)}$$

Slide credit: Yun-hsueh Liu


Multi-class Classifier –Naïve Bayes (2)

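• Goal: find the maximum-probability class C_j:

$$C^* = \arg\max_j \; P(C_j) \prod_{t=1}^{|V|} P(v_t \mid C_j)^{N(t,i)}$$

• To avoid zero probabilities, use Laplace smoothing:

$$P(v_t \mid C_j) = \frac{1 + \sum_{I_i \in C_j} N(t,i)}{|V| + \sum_{s=1}^{|V|} \sum_{I_i \in C_j} N(s,i)}$$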

Slide credit: Yun-hsueh Liu
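A minimal sketch of Naïve Bayes on bag-of-features histograms with the Laplace-smoothed estimate above, scored in log space. It assumes `histograms[i][t]` holds N(t, i) and uses a uniform prior by default (an assumption matching the slides; an empirical prior is optional). Function names and the toy counts are hypothetical, not from the paper.

```python
import math


def train_nb(histograms, labels, uniform_prior=True):
    """histograms[i][t] = N(t, i), the number of times visual word v_t occurs in image I_i."""
    classes = sorted(set(labels))
    vocab_size = len(histograms[0])
    log_prior, log_likelihood = {}, {}
    for c in classes:
        members = [h for h, l in zip(histograms, labels) if l == c]
        # P(C_j): uniform by default, otherwise the fraction of training images in C_j.
        prior = 1.0 / len(classes) if uniform_prior else len(members) / len(histograms)
        log_prior[c] = math.log(prior)
        # Total count of each visual word over all training images of class c.
        word_totals = [sum(h[t] for h in members) for t in range(vocab_size)]
        # Laplace smoothing: add 1 to every count and |V| to the denominator.
        denom = vocab_size + sum(word_totals)
        log_likelihood[c] = [math.log((1 + n) / denom) for n in word_totals]
    return log_prior, log_likelihood


def classify(histogram, log_prior, log_likelihood):
    """argmax_j  log P(C_j) + sum_t N(t, i) * log P(v_t | C_j)."""
    def score(c):
        return log_prior[c] + sum(n * lp for n, lp in zip(histogram, log_likelihood[c]))
    return max(log_prior, key=score)


# Hypothetical visual-word counts over a 3-word vocabulary.
train = [[5, 0, 1], [4, 1, 0], [0, 4, 2], [1, 5, 1]]
train_labels = ["face", "face", "car", "car"]
log_prior, log_likelihood = train_nb(train, train_labels)
print(classify([3, 0, 1], log_prior, log_likelihood))  # face
print(classify([0, 3, 0], log_prior, log_likelihood))  # car
```

Thanks to smoothing, every visual word keeps a nonzero probability in every class, so one unseen word cannot zero out a class score.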


Results


Results on Dataset 2



Thoughts?

• Pros?

• Cons?


Related BoF models

pLSA, LDA, …


pLSA

[Figure: pLSA model relating document, topic, and word]


pLSA


pLSA on images


Discovering objects and their location in images
Josef Sivic, Bryan C. Russell, Alexei A. Efros, Andrew Zisserman, William T. Freeman

• Documents – images
• Words – visual words (vector-quantized SIFT descriptors)
• Topics – object categories

Images are modeled as a mixture of topics (objects).
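A minimal pure-Python sketch of fitting pLSA with EM, assuming the standard formulation P(w | d) = Σ_z P(w | z) P(z | d), with images as documents, visual words as words, and topics as candidate object categories. This is a generic textbook EM loop, not Sivic et al.'s implementation; the function name `plsa` and the toy counts are hypothetical.

```python
import random


def normalize(vec):
    """Scale non-negative values to sum to 1 (uniform if they are all zero)."""
    s = sum(vec)
    return [v / s for v in vec] if s > 0 else [1.0 / len(vec)] * len(vec)


def plsa(counts, num_topics, iters=200, seed=0):
    """Fit pLSA by EM.

    counts[d][w]: occurrences of visual word w in image (document) d.
    Returns (p_w_z, p_z_d) with p_w_z[z][w] = P(w | z) and p_z_d[d][z] = P(z | d).
    """
    rng = random.Random(seed)
    D, W, Z = len(counts), len(counts[0]), num_topics
    # Random strictly positive initialization, normalized into distributions.
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(W)]) for _ in range(Z)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(Z)]) for _ in range(D)]
    for _ in range(iters):
        exp_w_z = [[0.0] * W for _ in range(Z)]  # expected counts n(w, z)
        exp_z_d = [[0.0] * Z for _ in range(D)]  # expected counts n(d, z)
        for d in range(D):
            for w in range(W):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(w | z) P(z | d).
                post = normalize([p_w_z[z][w] * p_z_d[d][z] for z in range(Z)])
                for z in range(Z):
                    exp_w_z[z][w] += n * post[z]
                    exp_z_d[d][z] += n * post[z]
        # M-step: re-estimate both distributions from the expected counts.
        p_w_z = [normalize(row) for row in exp_w_z]
        p_z_d = [normalize(row) for row in exp_z_d]
    return p_w_z, p_z_d


# Hypothetical counts: images 0-1 use visual words 0-1, images 2-3 use words 2-3.
counts = [[4, 4, 0, 0], [2, 2, 0, 0], [0, 0, 3, 3], [0, 0, 5, 5]]
p_w_z, p_z_d = plsa(counts, num_topics=2)
dominant = [max(range(2), key=lambda z: row[z]) for row in p_z_d]
print(dominant)  # images 0-1 share one topic, images 2-3 the other
```

Sorting each row of `p_w_z` gives the most likely visual words per topic, the kind of list used for topic discovery. EM only finds a local optimum, so results on real data can depend on the initialization.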


Goals

They investigate three areas:
– (i) topic discovery, where categories are discovered by pLSA clustering on all available images;
– (ii) classification of unseen images, where topics corresponding to object categories are learnt on one set of images and then used to determine the object categories present in another set;
– (iii) object detection, where the goal is to determine the location and approximate segmentation of the object(s) in each image.


(i) Topic Discovery

Most likely words for 4 learnt topics (face, motorbike, airplane, car)


(ii) Image Classification

Confusion table for unseen test images against pLSA trained on images containing four object categories, but no background images.


(ii) Image Classification

Confusion table for unseen test images against pLSA trained on images containing four object categories and background images. Performance is not quite as good as when background images are left out of training.


(iii) Topic Segmentation
