Object recognition (part 2)

CSE P 576
Larry Zitnick (larryz@microsoft.com)


Support Vector Machines

Modified from the slides by Dr. Andrew W. Moore
http://www.cs.cmu.edu/~awm/tutorials

Nov 23rd, 2001
Copyright © 2001, 2003, Andrew W. Moore


Linear Classifiers

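[Figure: scatter plot of datapoints, one marker denoting +1 and the other denoting -1; block diagram x → f → y_est with parameters α]

f(x, w, b) = sign(w · x - b)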

How would you classify this data?
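As a minimal sketch of the decision rule f(x, w, b) = sign(w · x - b): the function name, the weights, and the test points below are hypothetical and chosen only for illustration. Sending a point that lies exactly on the boundary to +1 is also an assumption, since the slide does not say.

```python
def linear_classify(x, w, b):
    """Return +1 or -1 according to sign(w . x - b)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) - b
    # Assumption: a score of exactly 0 (on the boundary) is labeled +1.
    return 1 if score >= 0 else -1


# Hypothetical weights and points, for illustration only.
w, b = (1.0, 1.0), 2.0
print(linear_classify((2.0, 2.0), w, b))  # score is 2.0, so +1
print(linear_classify((0.0, 0.5), w, b))  # score is -1.5, so -1
```

Any (w, b) that puts every +1 point on the positive side and every -1 point on the negative side classifies the training data correctly.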


Any of these would be fine...

...but which is best?


Classifier Margin


Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
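The definition above can be turned into a small calculation. This is a sketch under one assumption: the boundary is widened symmetrically, so the width is twice the distance from the boundary w · x - b = 0 to the nearest datapoint. The function name and the toy points are hypothetical.

```python
import math


def margin_width(points, w, b):
    """Width the boundary w . x - b = 0 can grow to before touching a point.

    Assumption: the boundary is widened symmetrically, so the width is twice
    the distance from the boundary to the nearest datapoint.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    distances = [abs(sum(wi * xi for wi, xi in zip(w, x)) - b) / norm
                 for x in points]
    return 2.0 * min(distances)


# Toy data, for illustration only.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
print(margin_width(points, (1.0, 1.0), 2.0))  # diagonal boundary
print(margin_width(points, (1.0, 0.0), 1.0))  # vertical boundary
```

On this toy set the diagonal boundary has width 2√2 ≈ 2.83 and the vertical one has width 2, so the diagonal boundary has the larger margin.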


Maximum Margin


The maximum margin linear classifier is the linear classifier with the, um, maximum margin.

This is the simplest kind of SVM, called a linear SVM (LSVM).


Support vectors are those datapoints that the margin pushes up against.
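A small sketch of what "pushes up against" means numerically. It does not train an SVM. It assumes (w, b) is already the maximum-margin solution, scaled in the usual canonical form so that y (w · x - b) = 1 for the closest points. The function name, the toy data, and the hand-chosen (w, b) are hypothetical.

```python
def support_vectors(points, labels, w, b, tol=1e-9):
    """Return the datapoints that lie exactly on the margin.

    Assumes (w, b) is the maximum-margin solution in canonical form, i.e.
    y * (w . x - b) == 1 for the closest points and > 1 for all others.
    """
    result = []
    for x, y in zip(points, labels):
        functional_margin = y * (sum(wi * xi for wi, xi in zip(w, x)) - b)
        if abs(functional_margin - 1.0) <= tol:
            result.append(x)
    return result


# Toy data; (w, b) was chosen by hand as the perpendicular bisector between
# the closest opposite-class points (0, 0) and (2, 2).
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [+1, +1, -1, -1]
w, b = (0.5, 0.5), 1.0
print(support_vectors(points, labels, w, b))  # [(2.0, 2.0), (0.0, 0.0)]
```

Removing either of the other two points leaves this boundary unchanged.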


Why Maximum Margin?


1. Intuitively this feels safest.

2. If we’ve made a small error in the location of the boundary (it’s been jolted in its perpendicular direction) this gives us least chance of causing a misclassification.

3. LOOCV is easy since the model is immune to removal of any non-support-vector datapoints.

4. There’s some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing.

5. Empirically it works very very well.


Nonlinear Kernel (I)


Nonlinear Kernel (II)


Caltech-101: Drawbacks

• Smallest category size is 31 images: N_train ≤ 30

• Too easy?

– Left-right aligned

– Rotation artifacts

– Soon will saturate performance


Antonio Torralba generated these average images of the Caltech 101 categories


• Jump to Nicolas Pinto’s slides. (page 29)


Objects in Context

R. Gokberk Cinbis

MIT 6.870 Object Recognition and Scene Understanding


Papers

• A. Torralba. Contextual priming for object detection. IJCV 2003.

• A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora and S. Belongie. Objects in Context. ICCV 2007.


Object Detection: Probabilistic Framework

Object presence at a particular location/scale

Given all image features (local/object and scene/context)

(Single Object Likelihood)

v = vLocal + vContextual
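The equation these labels annotated did not survive extraction. One hedged way to write the quantity is given below. It assumes the "+" in v = vLocal + vContextual means the features split into a local part v_L and a contextual part v_C rather than a numeric sum, and it applies Bayes' rule in the spirit of the contextual priming paper listed under Papers. The symbols O, v_L and v_C are notation introduced here.

```latex
P(O \mid v) = P(O \mid v_L, v_C)
            = \frac{P(v_L \mid O, v_C)}{P(v_L \mid v_C)} \; P(O \mid v_C)
```

The ratio is the local evidence for the object, and P(O | v_C) is the contextual prior: how likely the object is given only the scene-level features.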


Contextual Reasoning

Scene Centered

?

Contextual priming for object detection

Object Centered

Objects in Context

2D Reasoning / 2.5D / 3D Reasoning

[Figure: image regions labeled SUPPORT, VERTICAL, SKY]

Geometric context from a single image.

Surface orientations w.r.t. camera


Preview: Contextual Priming for Object Detection

Input test image


Correlate with many filters


Using previously collected statistics about filter outputs, predict information about objects


Predict information about objects:

Where can I find the objects easily?

Which objects do I expect to see? (people, car, chair)

How large do I expect the objects to be?


Contextual Priming for Object Detection: Probabilistic Framework

Local measurements (a lot in the literature)

Contextual features


Contextual Priming for Object Detection: Contextual Features

Gabor filters at 4 scales and 6 orientations

Use PCA on filter output images to reduce the number of features (< 64)

Use Mixture of Gaussians to model the probabilities. (Other alternatives include KNN, Parzen window, logistic regression, etc.)
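The first step, a bank of Gabor filters at 4 scales and 6 orientations, can be sketched in plain Python. Only the 4 x 6 layout comes from the slide. The wavelengths, the kernel sizes, and tying sigma to the wavelength are assumptions made for illustration, and the PCA and mixture-of-Gaussians steps are not shown.

```python
import math


def gabor_kernel(wavelength, theta, size):
    """Real part of a Gabor filter as a size x size list of lists.

    Assumptions: sigma = 0.5 * wavelength and an isotropic Gaussian envelope;
    the slides do not give these details.
    """
    sigma = 0.5 * wavelength
    half = size // 2
    kernel = []
    for row in range(-half, half + 1):
        line = []
        for col in range(-half, half + 1):
            # Rotate coordinates so the cosine carrier runs along theta.
            xr = col * math.cos(theta) + row * math.sin(theta)
            envelope = math.exp(-(col * col + row * row) / (2.0 * sigma * sigma))
            line.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(line)
    return kernel


def gabor_bank(n_scales=4, n_orientations=6, base_wavelength=4.0):
    """Build n_scales * n_orientations filters; wavelength doubles per scale."""
    bank = []
    for s in range(n_scales):
        wavelength = base_wavelength * (2 ** s)
        size = 2 * int(wavelength) + 1  # odd kernel width (assumed)
        for k in range(n_orientations):
            theta = math.pi * k / n_orientations
            bank.append((s, k, gabor_kernel(wavelength, theta, size)))
    return bank


bank = gabor_bank()
print(len(bank))  # 24 filters: 4 scales x 6 orientations
```

Correlating an image with each of the 24 kernels gives 24 response images, which is the input the PCA step on the slide would then compress.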


Contextual Priming for Object Detection: Object Priming Results

(o1 = people, o2 = furniture, o3 = vehicles and o4 = trees)


Contextual Priming for Object Detection: Focus of Attention Results

Heads


Contextual Priming for Object Detection: Conclusions

• Demonstrates the relation between low-level features and scene/context

• Can be seen as computational evidence for the (possible) existence of low-level, feature-based biological attention mechanisms

• Also a warning: does an object recognition system understand the object, or does it work from lots of background features?


Preview: Objects in Context

Input test image


Do segmentation on the image


Do classification (find label probabilities) in each segment using only local info

Candidate labels per segment:

Building, boat, motorbike

Building, boat, person

Water, sky

Road


Most consistent labeling according to object co-occurrences & local label probabilities:

Boat, Building, Water, Road


Objects in Context: Local Categorization

• Extract random patches on zero-padded segments

• Calculate SIFT descriptors

• Use BoF:
  Training:
  - Cluster patches in training (hierarchical K-means, K=10x3)
  - Histogram of words in each segment
  - NN classifier (returns a sorted list of categories)

Each segment is classified independently.
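The bag-of-features steps above can be sketched on toy data. Everything concrete here is hypothetical: 2-D tuples stand in for 128-D SIFT descriptors, the codebook is fixed by hand instead of learned with hierarchical K-means, and the training histograms are made up. The sketch also assumes each segment yields at least one descriptor.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_word(descriptor, codebook):
    """Index of the visual word (cluster center) closest to the descriptor."""
    return min(range(len(codebook)), key=lambda i: dist2(descriptor, codebook[i]))


def bof_histogram(descriptors, codebook):
    """Normalized histogram of visual words for one segment (non-empty input)."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist]


def rank_categories(hist, training):
    """Nearest-neighbor ranking: categories sorted by histogram distance."""
    ranked = sorted(training, key=lambda item: dist2(hist, item[1]))
    return [category for category, _ in ranked]


# Toy values, for illustration only.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
training = [("water", [0.8, 0.1, 0.1]),
            ("road", [0.1, 0.8, 0.1]),
            ("sky", [0.1, 0.1, 0.8])]
segment = [(0.1, 0.0), (0.0, 0.2), (0.9, 0.1), (0.1, 0.1)]

hist = bof_histogram(segment, codebook)
print(hist)                             # [0.75, 0.25, 0.0]
print(rank_categories(hist, training))  # ['water', 'road', 'sky']
```

The sorted list plays the role of the per-segment candidate labels that the contextual step later chooses among.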


Objects in Context: Contextual Refinement

Contextual model based on co-occurrences.

Try to find the most consistent labeling with high posterior probability and high mean pairwise interaction.

Use a CRF for this purpose.

[Figure: segments labeled Boat, Building, Water, Road]

[Equation annotations: independent segment classification; mean interaction of all label pairs]

Φ(i,j) is basically the observed label co-occurrences in the training set.
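A toy version of this refinement step, as a sketch rather than the paper's CRF inference: it scores every joint labeling by the sum of local log-probabilities plus the mean co-occurrence score over all label pairs, and keeps the best. The exhaustive search, the probabilities, and the co-occurrence values are all hypothetical. Only the candidate labels come from the slides. At least two segments are assumed.

```python
import itertools
import math


def phi(a, b, cooccurrence):
    """Symmetric co-occurrence score; unseen pairs default to 0."""
    return cooccurrence.get((a, b), cooccurrence.get((b, a), 0.0))


def most_consistent_labeling(local_probs, cooccurrence):
    """Exhaustively score every joint labeling and return the best one.

    score = sum of log local probabilities + mean pairwise co-occurrence.
    Assumes at least two segments, so there is at least one label pair.
    """
    best_labels, best_score = None, -math.inf
    candidates = [list(p.keys()) for p in local_probs]
    for labels in itertools.product(*candidates):
        local = sum(math.log(p[c]) for p, c in zip(local_probs, labels))
        pairs = list(itertools.combinations(labels, 2))
        context = sum(phi(a, b, cooccurrence) for a, b in pairs) / len(pairs)
        if local + context > best_score:
            best_labels, best_score = labels, local + context
    return best_labels


# Made-up local label probabilities for the four segments.
local_probs = [
    {"building": 0.5, "boat": 0.4, "motorbike": 0.1},
    {"building": 0.5, "boat": 0.3, "person": 0.2},
    {"water": 0.6, "sky": 0.4},
    {"road": 1.0},
]
# Made-up co-occurrence scores standing in for the learned Φ(i,j).
cooccurrence = {
    ("boat", "water"): 4.0,
    ("boat", "building"): 1.0,
    ("building", "road"): 1.0,
    ("motorbike", "road"): 1.0,
    ("person", "road"): 1.0,
}

print([max(p, key=p.get) for p in local_probs])
# ['building', 'building', 'water', 'road']  (local info only)
print(most_consistent_labeling(local_probs, cooccurrence))
# ('boat', 'building', 'water', 'road')      (with co-occurrence context)
```

With these numbers, context flips the first segment from building to boat because boat co-occurs strongly with water. Exhaustive search is only practical for a handful of segments.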


Objects in Context: Learning Context

Using labeled image datasets (MSRC, PASCAL)

Using labeled text-based data (Google Sets):
- Contains lists of related items
- A large set turns out to be useless! (anything is related)


Objects in Context: Results


“Objects in Context” – Limitations: Context modeling

[Figure panels: Segmentation; Categorization without context (local information only); With co-occurrence context]

Means: P(person, dog) > P(person, cow)

(Bonus Q: How did it handle the background?)


[Figure panels: Segmentation; Categorization without context (local information only); With co-occurrence context]

P(person, horse) > P(person, dog)

But why? Isn’t it only a dataset bias? We have seen in the previous example that P(person, dog) is common too.


“Objects in Context”: Object-Object or Stuff-Object?

[Figure: labels with high co-occurrences with other labels, marked as Stuff or Stuff-like]

It looks like “background” stuff-object co-occurrences (such as water-boat) help more than “foreground” object-object co-occurrences (such as person-horse), but car-person-motorbike is still useful in PASCAL.


“Objects in Context” – Limitations: Segmentation

• Too good: a few or many? How to select a good segmentation among multiple segmentations?

• Can make object recognition & contextual reasoning (due to stuff detection) much easier.


“Objects in Context” - Limitations

• No cue from unknown objects

• No spatial relationship reasoning

• Object detection part heavily depends on good segmentations

• Improvements using object co-occurrences are demonstrated with images where many labels are already correct. How good is the model?


Contextual Priming vs. Objects in Context

Contextual Priming (Scene -> Object):

• Simpler training data (only the target object’s labels are enough)

• Scene information is view-dependent (due to gist)

• Object detector independent

Objects in Context ({Object, Stuff} <-> {Object, Stuff}):

• May need a huge amount of labeled data

• Can be more generic than scene -> object with a very good model

• Contextual model is object detector independent, in theory. But:
  - uses segmentation: can be unreliable
  + uses segmentation: easier to detect stuff


Finding the weakest link in person detectors

Larry Zitnick, Microsoft Research

Devi Parikh, TTI, Chicago


Object recognition

We’ve come a long way…

Fischler and Elschlager, 1973

Dollar et al., BMVC 2009


Still a ways to go…

Dollar et al., BMVC 2009


Part-based person detector

4 main components:

• Feature selection

• Part detection

• Spatial model

• NMS / context

[Figure labels: Color, Intensity, Edges]

Felzenszwalb et al., 2005

Hoiem et al., 2006


How can we help?

• Humans supply training data…
– 100,000s of labeled images

• We design the algorithms.
– Going on 40 years.

• Can we use humans to debug?

Help me!


Human debugging

[Figure: the pipeline Feature selection → Part detection → Spatial model → NMS / context, repeated several times with different stages missing in each copy]

Amazon Mechanical Turk


Human performance

• Humans: ~90% average precision

• Machines: ~46% average precision

PASCAL VOC dataset


Human debugging

[Figure: pipeline showing Feature selection, Spatial model, NMS / context]

Low resolution: 20x20 pixels

Is it a head, torso, arm, leg, foot, hand, or nothing?

[Figure: example patches labeled Head, Leg, Feet, Head, ?, Nothing]


Part detections

[Figure: part detections by humans and by machine; legend: Head, Torso, Arm, Hand, Leg, Foot, Person]


[Figure: machine part detections, high res vs. low res]


AP results


Spatial model

[Figure: pipeline showing Feature selection, Part detection, NMS / context; high-res and low-res examples labeled Person / Not a person]


Context/NMS

vs.

NMS / context


Conclusion