Context as Supervisory Signal: Discovering Objects with Predictable Context


Page 1: Context as Supervisory Signal: Discovering Objects with Predictable Context

CONTEXT AS SUPERVISORY SIGNAL: DISCOVERING OBJECTS WITH PREDICTABLE CONTEXT

Carl Doersch, Abhinav Gupta, Alexei Efros

Page 2: Context as Supervisory Signal: Discovering Objects with Predictable Context

Unsupervised Object Discovery

• Children learn to see without millions of labels
• Is there a cue hidden in the data that we can use to learn better representations?

Page 3: Context as Supervisory Signal: Discovering Objects with Predictable Context

How hard is unsupervised object discovery?

• In state-of-the-art literature it is still commonplace to select 20 categories from Caltech 101. [1]
• Le et al. got in the New York Times for spending 1M+ CPU hours and reporting just 3 objects. [2]

[1] Faktor & Irani, “Clustering by Composition” – Unsupervised Discovery of Image Categories, ECCV 2012
[2] Le et al., Building High-level Features Using Large Scale Unsupervised Learning, ICML 2012

Page 4: Context as Supervisory Signal: Discovering Objects with Predictable Context

Object Discovery = Patch Clustering

Page 5: Context as Supervisory Signal: Discovering Objects with Predictable Context

Object Discovery = Patch Clustering

Page 6: Context as Supervisory Signal: Discovering Objects with Predictable Context

Object Discovery = Patch Clustering

Page 7: Context as Supervisory Signal: Discovering Objects with Predictable Context

Clustering algorithms

Page 8: Context as Supervisory Signal: Discovering Objects with Predictable Context

Clustering algorithms

Page 9: Context as Supervisory Signal: Discovering Objects with Predictable Context

Clustering algorithms?

Page 10: Context as Supervisory Signal: Discovering Objects with Predictable Context

Clustering as Prediction

Y value is here

Prediction:

Actual

Page 11: Context as Supervisory Signal: Discovering Objects with Predictable Context

Clustering as Prediction

Y value is here

Prediction:

Actual

Page 12: Context as Supervisory Signal: Discovering Objects with Predictable Context

What cue is used for clustering here?

• The datapoints are redundant!
  – Within a cluster, knowing one coordinate tells you the other
• Images are redundant too!
• Contribution #1 (of 3): use patch-context redundancy for clustering.

Page 13: Context as Supervisory Signal: Discovering Objects with Predictable Context

More rigorously

• Unlike standard clustering, we measure the affinity of a whole cluster rather than of pairs of points
• Datapoints x can be divided into x = [p, c]
• Affinity is defined by how ‘learnable’ the function c = F(p) is
  – Or P(c|p)
  – Measure how well the function has been learned via leave-one-out cross-validation
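The leave-one-out idea above can be illustrated on toy data. The sketch below is not the authors' implementation: it assumes a 1-nearest-neighbour predictor for F and squared error as the score, and the function name `loo_prediction_error` is hypothetical. A cluster whose c is a function of p should score a much lower held-out error than one whose c is unrelated to p.

```python
import numpy as np

def loo_prediction_error(p, c):
    """Mean leave-one-out error of predicting c from p with a 1-nearest-neighbour rule."""
    p = np.asarray(p, dtype=float)
    c = np.asarray(c, dtype=float)
    # Pairwise squared distances between the p-parts of all cluster members.
    d = ((p[:, None, :] - p[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)  # leave-one-out: a point may not predict itself
    nearest = d.argmin(axis=1)
    # Squared error between each held-out c and the c transferred from its neighbour.
    return float(((c - c[nearest]) ** 2).sum(axis=-1).mean())

# Toy data (assumed for illustration only).
rng = np.random.default_rng(0)
p = rng.uniform(-1.0, 1.0, size=(200, 1))
coherent_c = 2.0 * p + 0.01 * rng.normal(size=(200, 1))   # c is a function of p
incoherent_c = rng.uniform(-2.0, 2.0, size=(200, 1))      # c is unrelated to p

coherent_err = loo_prediction_error(p, coherent_c)
incoherent_err = loo_prediction_error(p, incoherent_c)
```

Here a low `coherent_err` relative to `incoherent_err` plays the role of high cluster affinity; any other regressor could be swapped in for the nearest-neighbour rule.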

Page 14: Context as Supervisory Signal: Discovering Objects with Predictable Context

Previous whispers of this idea

• Weakly supervised methods
• Probabilistic clustering
  – choose clusters s.t. dataset can be compressed
• Autoencoders & RBMs

(Olshausen & Field 1997)

Page 15: Context as Supervisory Signal: Discovering Objects with Predictable Context

Clustering as Prediction

Y value is here

Prediction:

Actual

Page 16: Context as Supervisory Signal: Discovering Objects with Predictable Context

When is a prediction “good enough”?

• Image distance metrics are unreliable

Min distance: 2.59e-4

Max distance: 1.22e-4

Input Nearest neighbor

Page 17: Context as Supervisory Signal: Discovering Objects with Predictable Context

It’s even worse during prediction

• Horizontal gratings are very close together in HOG feature space, and are easy to predict

• Cats have more complex shapes that are farther apart in feature space and harder to predict.

? ?

Page 18: Context as Supervisory Signal: Discovering Objects with Predictable Context

Our task is more like this:

Page 19: Context as Supervisory Signal: Discovering Objects with Predictable Context

Contribution #2 (of 3): Cluster Membership as Hypothesis Testing

[Figure labels: P1(X|Y); P2(X|Y); “Y value is here”]

Page 20: Context as Supervisory Signal: Discovering Objects with Predictable Context

What are the hypotheses?

• Thing hypothesis
  – The patch is a member of the proposed cluster
  – Context is best predicted by transferring appearance from other cluster members
• Stuff hypothesis
  – Patch is ‘miscellaneous’
  – The best we can do is model the context as clutter or texture, via low-level image statistics.
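Choosing between the two hypotheses amounts to comparing how likely the observed context is under each model. The sketch below assumes independent Gaussian likelihoods around each model's prediction; this is a simplification for illustration, not the models used in the talk. All names and numbers are hypothetical.

```python
import numpy as np

def gaussian_log_likelihood(x, mean, var):
    """Log density of x under an independent (diagonal) Gaussian."""
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var))

def log_likelihood_ratio(context, thing_pred, thing_var, stuff_pred, stuff_var):
    """Positive values favour the thing hypothesis, negative values the stuff hypothesis."""
    return (gaussian_log_likelihood(context, thing_pred, thing_var)
            - gaussian_log_likelihood(context, stuff_pred, stuff_var))

# Toy context features (assumed numbers, not from the talk).
thing_pred = np.array([1.0, 0.0, 1.0, 0.0])   # confident, specific prediction
stuff_pred = np.array([0.5, 0.5, 0.5, 0.5])   # vague, texture-like prediction

# A context that agrees with the thing prediction.
member_context = np.array([0.9, 0.1, 0.8, 0.2])
llr_member = log_likelihood_ratio(member_context, thing_pred, 0.05, stuff_pred, 0.25)

# A context that contradicts the thing prediction.
other_context = np.array([0.1, 0.9, 0.2, 0.8])
llr_non_member = log_likelihood_ratio(other_context, thing_pred, 0.05, stuff_pred, 0.25)
```

The thing model is given a smaller variance because it makes a sharper claim: it wins clearly when it is right and loses heavily when it is wrong, which is what lets the ratio act as a membership test.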

Page 21: Context as Supervisory Signal: Discovering Objects with Predictable Context

Our Hypotheses

[Figure: the true region is compared with a stuff-based prediction (stuff model) and a thing-based prediction (thing model, using a predicted correspondence); the two predictions are marked high likelihood and low likelihood. Condition region = patch; prediction region = context.]

Page 22: Context as Supervisory Signal: Discovering Objects with Predictable Context

Real predictions aren’t so easy!

Page 23: Context as Supervisory Signal: Discovering Objects with Predictable Context

What to do?

• Generate multiple predictions from different images, and take the best
• Take bits and pieces from many images
• Still not enough? Generate more with warping.
• With arbitrarily small pieces and lots of warping, you can match anything!

Page 24: Context as Supervisory Signal: Discovering Objects with Predictable Context

Generative Image Models

• Long history in computer vision

• Come up with a distribution P(c|p), without looking at c

• Distribution is extremely complicated due to dependencies between features

- Hinton et al. 1995
- Weber et al. 2000
- Fergus et al. 2003
- Fei-Fei et al. 2007
- Sudderth et al. 2005
- Wang et al. 2006

Page 25: Context as Supervisory Signal: Discovering Objects with Predictable Context

Generative Image Models

Object

Part Locations

Features

Page 26: Context as Supervisory Signal: Discovering Objects with Predictable Context

Inference in Generative Models

Object

Part Locations

Features

Page 27: Context as Supervisory Signal: Discovering Objects with Predictable Context

Contribution #3: Likelihood Factorization

• Let c be context and p be a patch
• Break c into chunks c1 … ck (e.g. HOG cells)

• Now the problem boils down to estimating
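The expression that followed on this slide did not survive extraction. Since the talk goes on to describe the factorization as exact and as a series of easy discriminative problems, one reading consistent with that is the chain rule of probability applied to the chunks. The formula below is an assumption made for illustration, not a recovered equation:

```latex
\[
P(c \mid p) \;=\; \prod_{i=1}^{k} P\left(c_i \mid c_1, \ldots, c_{i-1},\, p\right)
\]
```

Under this reading, each factor asks for the distribution of a single chunk given the patch and the chunks already seen, so the quantities to estimate would be the individual conditionals rather than the full joint over the context.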

Page 28: Context as Supervisory Signal: Discovering Objects with Predictable Context

Our Hypotheses

[Figure: true region, stuff model, and thing model with predicted correspondence. Condition region = patch; prediction region = context.]

Page 29: Context as Supervisory Signal: Discovering Objects with Predictable Context

Surprisingly many good properties

• This is not an approximation
• Broke a hard generative problem into a series of easy discriminative ones
• Latent variable inference can use maximum likelihood: it’s okay to overfit to the condition region
• ci’s do not need to be integrated out
• Likelihoods can be compared across models

Page 30: Context as Supervisory Signal: Discovering Objects with Predictable Context

Recap of Contributions

• Clustering as prediction
• Statistical hypothesis testing of stuff vs. thing to determine cluster membership
• Factorizing the computation of stuff and thing data likelihoods

Page 31: Context as Supervisory Signal: Discovering Objects with Predictable Context

Stuff Model

Condition Region

Find NNs for this region and transfer

Prediction Region
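The nearest-neighbour transfer named on this slide can be sketched on toy vectors: find the pool patches whose condition regions are closest to the query, then transfer (here, average) their prediction regions. The function name `nn_transfer`, the squared-Euclidean distance, and the averaging step are assumptions for illustration, not details given in the talk.

```python
import numpy as np

def nn_transfer(query_condition, pool_condition, pool_prediction, k=3):
    """Average the prediction regions of the k pool patches whose
    condition regions are closest to the query's condition region."""
    pool_condition = np.asarray(pool_condition, dtype=float)
    pool_prediction = np.asarray(pool_prediction, dtype=float)
    query = np.asarray(query_condition, dtype=float)
    # Squared Euclidean distance from the query to every pool patch.
    dists = ((pool_condition - query) ** 2).sum(axis=1)
    nearest = np.argsort(dists)[:k]
    return pool_prediction[nearest].mean(axis=0)

# Hypothetical pool in which the prediction region continues the condition region's value.
pool_condition = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.2], [1.0, 1.0], [0.9, 0.9]])
pool_prediction = np.array([[0.0], [0.1], [0.2], [1.0], [0.9]])

predicted = nn_transfer([0.95, 0.95], pool_condition, pool_prediction, k=2)
```

Searching a large pool of raw patches for every query is expensive, which is presumably the motivation for the faster dictionary-based variant that follows.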

Page 32: Context as Supervisory Signal: Discovering Objects with Predictable Context

Faster Stuff Model

[Figure: Random Image Patches (x1,000,000) and Dictionary (GMM of patches, x5,000); label: Conditional]

Page 33: Context as Supervisory Signal: Discovering Objects with Predictable Context

Faster Stuff Model

[Figure labels: Condition Region; Prediction Region; Region Specified in Dictionary; Marginalization; two panels, each marked x5,000 and Cond’l (conditional)]
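The dictionary and marginalization labels suggest conditioning a mixture model over whole patches on the observed condition region. The sketch below shows that standard operation for a Gaussian mixture over the joint vector [p, c]. It assumes diagonal covariances, which the slides do not specify, and the function name `conditional_gmm` is hypothetical.

```python
import numpy as np

def conditional_gmm(p, weights, means, variances, n_cond):
    """Condition a diagonal-covariance GMM over [p, c] on the observed p.

    Returns the posterior component weights plus each component's mean and
    variance for c. With diagonal covariances, marginalizing a component onto
    the condition dimensions just means keeping those dimensions.
    """
    p = np.asarray(p, dtype=float)
    mu_p, var_p = means[:, :n_cond], variances[:, :n_cond]
    # Log of (mixing weight * marginal likelihood of p) for every component.
    log_resp = np.log(weights) - 0.5 * np.sum(
        np.log(2.0 * np.pi * var_p) + (p - mu_p) ** 2 / var_p, axis=1)
    log_resp -= log_resp.max()   # stabilise before exponentiating
    resp = np.exp(log_resp)
    resp /= resp.sum()
    return resp, means[:, n_cond:], variances[:, n_cond:]

# Two hypothetical dictionary elements: 1-D condition region, 1-D prediction region.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [1.0, 1.0]])
variances = np.full((2, 2), 0.01)

resp, mu_c, var_c = conditional_gmm([0.95], weights, means, variances, n_cond=1)
expected_c = resp @ mu_c   # conditional mean of the prediction region
```

Because the mixture is fitted once, each query only needs a pass over the dictionary elements rather than a search through the full pool of raw patches.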

Page 34: Context as Supervisory Signal: Discovering Objects with Predictable Context

Thing Model

[Figure: image i and the query image; labels: Condition Region, Prediction Region]

Page 35: Context as Supervisory Signal: Discovering Objects with Predictable Context

Finite and Occluded ‘Things’

• If the thing model believes it can’t predict well, ‘mimic’ the background model
  – When the thing model believes the prediction region corresponds to regions previously predicted poorly
  – When the thing model has been predicting badly nearby (handles occlusion!)

[Figure: image i and the query image]

Page 36: Context as Supervisory Signal: Discovering Objects with Predictable Context

Clusters discovered on 6 Pascal Categories

[Plot: Purity (y-axis, 0 to 1) vs. Cumulative Coverage (x-axis, 0 to 1)]

Legend:
Visual Words .63 (.37)
Russell et al. .66 (.38)
HOG Kmeans .70 (.40)
Our Approach .83 (.48)
Singh et al. .83 (.47)

Page 37: Context as Supervisory Signal: Discovering Objects with Predictable Context

Results: Pascal 2011, unsupervised

[Figure: clusters at ranks 1, 2, 5, and 6; columns: Rank, Initial, Final]

Page 38: Context as Supervisory Signal: Discovering Objects with Predictable Context

Results: Pascal 2011, unsupervised

[Figure: clusters at ranks 19, 20, 31, and 47; columns: Rank, Initial, Final]

Page 39: Context as Supervisory Signal: Discovering Objects with Predictable Context

Results: Pascal 2011, unsupervised

[Figure: clusters at ranks 153, 200, 220, and 231; columns: Rank, Initial, Final]

Page 40: Context as Supervisory Signal: Discovering Objects with Predictable Context

Errors

Page 41: Context as Supervisory Signal: Discovering Objects with Predictable Context

Unsupervised keypoint transfer

Keypoint labels: L F Wheel Ctr, R F Wheel Ctr, L Headlight, L Mirror, R Headlight [sic], R Mirror, R F Roof, L B Roof, L Taillight, L B Wheel Ctr, R B Wheel Ctr, L F Roof, R B Roof

Page 42: Context as Supervisory Signal: Discovering Objects with Predictable Context

Keypoint Precision-Recall

[Plot: Precision (y-axis, 0 to 1) vs. Recall (x-axis, 0 to 0.15); curves: Ours, Exemplar LDA]