40
Using the Forest to see the Trees: A computational model relating features, objects and scenes Antonio Torralba CSAIL-MIT Joint work with Aude Oliva, Kevin Murphy, William Freeman Monica Castelhano, John

Using the Forest to see the Trees: A computational model relating features, objects and scenes

  • Upload
    thor

  • View
    30

  • Download
    0

Embed Size (px)

DESCRIPTION

Using the Forest to see the Trees: A computational model relating features, objects and scenes. Antonio Torralba CSAIL-MIT. Joint work with. Aude Oliva, Kevin Murphy, William Freeman Monica Castelhano, John Henderson. SceneType 2 {street, office, …}. S. O 1. O 1. O 1. O 1. O 2. O 2. - PowerPoint PPT Presentation

Citation preview

Page 1: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Using the Forest to see the Trees: A computational model relating

features, objects and scenes

Antonio Torralba

CSAIL-MIT

Joint work with

Aude Oliva, Kevin Murphy, William Freeman

Monica Castelhano, John Henderson

Page 2: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

From objects to scenes

ImageI

Local features L L L L

O2 O2 O2 O2

SSceneType 2 {street, office, …}

O1 O1 O1 O1Object localization

Riesenhuber & Poggio (99); Vidal-Naquet & Ullman (03); Serre & Poggio, (05); Agarwal & Roth, (02), Moghaddam, Pentland (97), Turk, Pentland (91),Vidal-Naquet, Ullman, (03) Heisele, et al, (01), Agarwal & Roth, (02), Kremp, Geman, Amit (02), Dorko, Schmid, (03) Fergus, Perona, Zisserman (03), Fei Fei, Fergus, Perona, (03), Schneiderman, Kanade (00), Lowe (99)

Page 3: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

From scenes to objects

SSceneType 2 {street, office, …}

Image

Local features L L L

I

O1 O1 O1 O1 O2 O2 O2 O2

L

Global gistfeatures

GObject localization

Page 4: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

From scenes to objects

SSceneType 2 {street, office, …}

Image

Local features L L L

I

O1 O1 O1 O1 O2 O2 O2 O2

L

Global gistfeatures

GObject localization

Page 5: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

The context challenge

2

What do you think are the hidden objects?

1

Biederman et al 82; Bar & Ullman 93; Palmer, 75;

Page 6: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

The context challengeWhat do you think are the hidden objects?

Answering this question does not require knowing how the objects look like. It is all about context.

Chance ~ 1/30000

Page 7: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

From scenes to objects

SSceneType 2 {street, office, …}

Image

Local features L L L

I

L

Global gistfeatures

G

Page 8: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Scene categorization

Office Corridor Street

Oliva & Torralba, IJCV’01; Torralba, Murphy, Freeman, Mark, CVPR 03.

Page 9: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Place identificationOffice 610 Office 615

Draper street

59 other places…

Scenes are categories, places are instances

Page 10: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Supervised learning

Vg , Office}

Vg ,

{

{ Office}

Vg ,{ Corridor}

Vg ,{ Street}

Classifier

Page 11: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Supervised learning

Vg , Office}

Vg ,

{

{ Office}

Vg ,{ Corridor}

Vg ,{ Street}

Classifier

Which feature vector for a whole image?

Page 12: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Global features (gist)First, we propose a set of features that do not encode specific object information

Oliva & Torralba, IJCV’01; Torralba, Murphy, Freeman, Mark, CVPR 03.

Page 13: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

| vt | PCA

80 features

Global features (gist)

V = {energy at each orientation and scale} = 6 x 4 dimensions

First, we propose a set of features that do not encode specific object information

G

Oliva & Torralba, IJCV’01; Torralba, Murphy, Freeman, Mark, CVPR 03.

Page 14: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Example visual gists

I

I’

Global features (I) ~ global features (I’)

Cf. “Pyramid Based Texture Analysis/Synthesis”, Heeger and Bergen, Siggraph, 1995

Page 15: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Learning to recognize places

• Hidden states = location (63 values)

• Observations = vGt (80 dimensions)

• Transition matrix encodes topology of environment

• Observation model is a mixture of Gaussians centered on prototypes (100 views per place)

Office 610 Corridor 6b Corridor 6c Office 617

We use annotated sequences for training

Page 16: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Wearable test-bed v1

Kevin Murphy

Page 17: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Wearable test-bed v2

Page 18: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Place/scene recognition demo

Page 19: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

From scenes to objects

SSceneType 2 {street, office, …}

Image

Local features L L L

I

O1 O1 O1 O1 O2 O2 O2 O2

L

Global gistfeatures

GObject localization

Page 20: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Global scene features predicts object location

vg

New image

Image regions likely to contain the target

Page 21: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Vg ,1{ }X1

Vg ,2{ }X2

Vg ,3{ }X3

Vg ,4{ }X4

Training set (cars)

Global scene features predicts object location

The goal of the training is to learn the association between the location of the target and the global scene features

Page 22: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Global scene features predicts object location

vg X

Results for predicting the vertical location of people

Estimated Y

Tru

e Y

Results for predicting the horizontal location of people

Estimated X

Tru

e X

Page 23: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

The layered structure of scenes

p(x2|x1)p(x)

In a display with multiple targets present, the location of one target constraints the ‘y’ coordinate of the remaining targets, but not the ‘x’ coordinate.

Page 24: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Global scene features predicts object location

Stronger contextual constraints can be obtained using other objects.

vg X

Page 25: Using the Forest to see the Trees:  A computational model relating features, objects and scenes
Page 26: Using the Forest to see the Trees:  A computational model relating features, objects and scenes
Page 27: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

1

Page 28: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

1

Page 29: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Attentional guidance

Localfeatures

Saliency

Saliency models: Koch & Ullman, 85; Wolfe 94; Itti, Koch, Niebur, 98; Rosenholtz, 99

Page 30: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Attentional guidance

Localfeatures

Globalfeatures

Saliency

Scene prior

TASK

Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003

Page 31: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Attentional guidance

Localfeatures

Globalfeatures

Saliency

Objectmodel

TASK

Scene prior

Page 32: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Comparison regions of interest

Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003

Page 33: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Comparison regions of interest

Saliencypredictions

Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003

10%

20%

30%

Page 34: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Comparison regions of interest

Saliencypredictions

Saliency and Global scene priors

Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003

10%

20%

30%

Page 35: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Comparison regions of interest

Dots correspond to fixations 1-4

Saliencypredictions

10%

20%

30%

Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003

Page 36: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Comparison regions of interest

Dots correspond to fixations 1-4

Saliencypredictions

Saliency and Global scene priors

10%

20%

30%

Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003

Page 37: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Results

Saliency Region Contextual Region

1 2 3 4

Chance level: 33 %

100

90

80

70

60

50

% offixationsinsidethe region

Fixation number1 2 3 4

100

90

80

70

60

50

Scenes without people Scenes with people

Fixation number

Page 38: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Task modulation

Localfeatures

Globalfeatures

Saliency

Scene prior

TASK

Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003

Page 39: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

Task modulation

Mug search Painting search

Saliencypredictions

Saliency and Global scene priors

Page 40: Using the Forest to see the Trees:  A computational model relating features, objects and scenes

• From the computational perspective, scene context can be derived from global image properties and predict where objects are most likely to be.

• Scene context considerably improves predictions of fixation locations. A complete model of attention guidance in natural scenes requires both saliency and contextual pathways

Discussion