Lecture 11 Hierarchies 6.870 Object Recognition and Scene Understanding http://people.csail.mit.edu/torralba/courses/6.870/6.870.recognition.htm


Page 1: Mit6870 orsu lecture11

Lecture 11

Hierarchies

6.870 Object Recognition and Scene Understanding http://people.csail.mit.edu/torralba/courses/6.870/6.870.recognition.htm

Page 2: Mit6870 orsu lecture11

Next week

Alec Rivers

Scene Understanding Based on Object Relationships

Gokberk Cinbis

Category Level 3D Object Detection Using View-Invariant Representations

Hueihan Jhuang and Sharat Chikkerur

Video shot boundary detection using GIST representation

Jenny Yuen

Semiautomatic alignment of text and images

Nathaniel R Twarog

A Filtering Approach to Image Segmentation: Perceptual Grouping in Feature Space

Nicolas Pinto

Evaluating dense feature descriptor and multi-kernel learning for face detection/recognition

Tilke Judd and Vladimir Bychkovsky

Identify the same people in different photographs from the same event

Tom Kollar

Context-based object priors for scene understanding

Tom Ouyang

Hand-Drawn Sketch Recognition, A Vision-Based Approach

Papers due this Friday (5pm): send PDF by email

Page 3: Mit6870 orsu lecture11

Hierarchies vs. holistic features

Although we have seen some “successful” holistic methods.

Page 4: Mit6870 orsu lecture11

Hierarchies, compositionality and reusable parts

Compositionality refers to our evident ability to construct hierarchical representations, whereby constituents are used and reused in an essentially infinite variety of relational compositions.

Assumption (Bienenstock, Geman): what is learnable is what is representable as a hierarchy of more-or-less simple composition rules.

Bienenstock, Geman. Compositionality in neural systems.

Page 5: Mit6870 orsu lecture11

Hierarchies vs. holistic features

Feature hierarchies are often inspired by the structure of the primate visual system, which has been shown to use a hierarchy of features of increasing complexity, from simple local features in the primary visual cortex, to complex shapes and object views in higher cortical areas.

S. Ullman et al.

Page 6: Mit6870 orsu lecture11

Diagram of the visual system

Felleman and Van Essen, 1991

Page 7: Mit6870 orsu lecture11

Modified by T. Serre from Ungerleider and Haxby, and then shamelessly copied by me.

Page 8: Mit6870 orsu lecture11

Modified by T. Serre from Ungerleider and Haxby, and then copied by me.

Page 9: Mit6870 orsu lecture11

Modified by T. Serre from Ungerleider and Haxby, and then copied by me.

Page 10: Mit6870 orsu lecture11

Modified by T. Serre from Ungerleider and Haxby, and then copied by me.

Page 11: Mit6870 orsu lecture11

Modified by T. Serre from Ungerleider and Haxby, and then copied by me.

Page 12: Mit6870 orsu lecture11

IT readout

Slide by Serre

Page 13: Mit6870 orsu lecture11

Identifying natural images from human brain activity

?

Kay, K.N., Naselaris, T., Prenger, R.J., & Gallant, J.L. (2008). Identifying natural images from human brain activity. Nature, 452, 352-355.

Page 14: Mit6870 orsu lecture11
Page 15: Mit6870 orsu lecture11

Voxel Activity Model

Goal: to predict which image, out of a large collection of possible images, the observer saw. Doing this for new images requires predicting fMRI activity for images that were not used to fit the model.
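The identification step can be sketched as follows: predict the voxel activity for every candidate image, then pick the candidate whose prediction best matches the measured activity. This is only a toy illustration. The linear voxel model, the correlation score, and all names and numbers (`predict_voxels`, `identify`, `weights`, `candidates`) are assumptions made for the example, not the model fitted by Kay et al.

```python
import math

def predict_voxels(features, weights):
    """Predicted activity per voxel: a linear readout of image features (assumed model)."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def correlation(a, b):
    """Pearson correlation between two equal-length activity patterns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def identify(measured, candidate_features, weights):
    """Index of the candidate image whose predicted activity best matches the measurement."""
    scores = [correlation(measured, predict_voxels(f, weights))
              for f in candidate_features]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical toy data: 4 voxels, 3 image features, 3 candidate images.
weights = [[1.0, 0.0, 0.5],
           [0.0, 1.0, -0.5],
           [0.5, 0.5, 0.0],
           [-1.0, 0.2, 0.3]]
candidates = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]]

# Simulate a noisy measurement taken while the observer views candidate 1.
noise = [0.05, -0.03, 0.02, 0.01]
measured = [p + e for p, e in zip(predict_voxels(candidates[1], weights), noise)]
best = identify(measured, candidates, weights)
```

Because the candidates are scored through predicted activity rather than stored recordings, the same procedure applies to images that were never shown during model fitting.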

Kay, K.N., Naselaris, T., Prenger, R.J., & Gallant, J.L. (2008). Identifying natural images from human brain activity. Nature, 452, 352-355.

Page 16: Mit6870 orsu lecture11

Kay, K.N., Naselaris, T., Prenger, R.J., & Gallant, J.L. (2008). Identifying natural images from human brain activity. Nature, 452, 352-355.

Page 17: Mit6870 orsu lecture11

Performance

Kay, K.N., Naselaris, T., Prenger, R.J., & Gallant, J.L. (2008). Identifying natural images from human brain activity. Nature, 452, 352-355.

Page 18: Mit6870 orsu lecture11

D. Marr

Page 19: Mit6870 orsu lecture11

Neocognitron

Fukushima (1980). Hierarchical multilayered neural network

S-cells work as feature-extracting cells. They resemble simple cells of the primary visual cortex in their response.

C-cells, which resemble complex cells in the visual cortex, are inserted in the network to allow for positional errors in the features of the stimulus. The input connections of C-cells, which come from S-cells of the preceding layer, are fixed and invariable. Each C-cell receives excitatory input connections from a group of S-cells that extract the same feature, but from slightly different positions. The C-cell responds if at least one of these S-cells yields an output.
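A minimal one-dimensional sketch of the S-cell / C-cell arrangement described above. It is heavily simplified: responses are binary, C-cell pools do not overlap, and the template, threshold, and pool size are made-up values, whereas the actual Neocognitron uses graded responses and learned S-cell connections.

```python
def s_cell_layer(signal, template, threshold):
    """One S-cell per position: fires (1) when the local patch matches the template."""
    k = len(template)
    out = []
    for i in range(len(signal) - k + 1):
        match = sum(s * t for s, t in zip(signal[i:i + k], template))
        out.append(1 if match >= threshold else 0)
    return out

def c_cell_layer(s_out, pool):
    """One C-cell per group of neighbouring S-cells: fires if at least one of them fires."""
    return [1 if any(s_out[i:i + pool]) else 0
            for i in range(0, len(s_out), pool)]

template = [1, 1]                       # hypothetical feature: two adjacent active inputs
a = [0, 1, 1, 0, 0, 0, 0, 0, 0]         # feature at position 1
b = [0, 0, 1, 1, 0, 0, 0, 0, 0]         # same feature shifted by one position

s_a = s_cell_layer(a, template, threshold=2)
s_b = s_cell_layer(b, template, threshold=2)
c_a = c_cell_layer(s_a, pool=4)
c_b = c_cell_layer(s_b, pool=4)
```

The S-cell outputs differ between the two inputs (`s_a != s_b`), but the C-cell outputs are the same (`c_a == c_b`), which is the tolerance to positional error that the C-cells are meant to provide.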

Page 20: Mit6870 orsu lecture11

Neocognitron

Learning is done greedily for each layer

Page 21: Mit6870 orsu lecture11

Convolutional Neural Network

The output neurons share all the intermediate levels
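To make the sharing concrete, here is a small sketch in which the convolutional features are computed once and every output unit reads from the same feature vector. The one-dimensional signal, the two kernels, and the readout weights are hypothetical values chosen for the example; a real convolutional network stacks several such layers and learns all of the weights.

```python
def conv1d(signal, kernel):
    """Valid 1-D correlation: the same kernel weights are applied at every position."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Rectification applied element-wise."""
    return [max(0.0, v) for v in values]

def shared_features(signal, kernels):
    """Intermediate level: one rectified feature map per kernel, concatenated."""
    feats = []
    for kernel in kernels:
        feats.extend(relu(conv1d(signal, kernel)))
    return feats

def output_units(features, readouts):
    """Each output unit is a different linear readout of the SAME shared features."""
    return [sum(w * f for w, f in zip(row, features)) for row in readouts]

signal = [0.0, 1.0, 1.0, 0.0]
kernels = [[1.0, -1.0], [-1.0, 1.0]]          # hypothetical edge detectors
readouts = [[1, 1, 1, 0, 0, 0],               # output 0 reads the first feature map
            [0, 0, 0, 1, 1, 1],               # output 1 reads the second feature map
            [1, 1, 1, 1, 1, 1]]               # output 2 reads both

features = shared_features(signal, kernels)   # computed once
scores = output_units(features, readouts)     # reused by all three outputs
```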

LeCun et al., 98

Page 22: Mit6870 orsu lecture11

Hierarchical models of object recognition in cortex

Hierarchical extension of the classical paradigm of building complex cells from simple cells. Uses the same notation as Fukushima: “S” units perform template matching (solid lines) and “C” units perform non-linear operations (“MAX” operation, dashed lines).
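A rough sketch of one S/C stage with graded responses. The Gaussian tuning function, the value of `sigma`, and the toy signals are assumptions for illustration; the slide only states that S units do template matching and C units take a MAX.

```python
import math

def s_unit(patch, template, sigma=1.0):
    """Template matching with Gaussian tuning (assumed form): 1.0 for a perfect match."""
    d2 = sum((p - t) ** 2 for p, t in zip(patch, template))
    return math.exp(-d2 / (2 * sigma ** 2))

def s_layer(signal, template, sigma=1.0):
    """Apply the same S unit at every position of a 1-D signal."""
    k = len(template)
    return [s_unit(signal[i:i + k], template, sigma)
            for i in range(len(signal) - k + 1)]

def c_unit(s_responses):
    """MAX pooling: keep the strongest S response, discard where it occurred."""
    return max(s_responses)

template = [1, 2, 1]                         # hypothetical preferred pattern
left = [1, 2, 1, 0, 0, 0, 0]                 # pattern at the left edge
middle = [0, 0, 1, 2, 1, 0, 0]               # same pattern in the middle
blank = [0, 0, 0, 0, 0, 0, 0]                # pattern absent

c_left = c_unit(s_layer(left, template))
c_middle = c_unit(s_layer(middle, template))
c_blank = c_unit(s_layer(blank, template))
```

`c_left` and `c_middle` are both exactly 1.0, so the C response is unchanged by the shift, while `c_blank` stays well below 1.0, so the unit remains selective for the pattern.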

Riesenhuber, M. and Poggio, T. 99

Page 23: Mit6870 orsu lecture11

Slide by T. Serre

Page 24: Mit6870 orsu lecture11

Slide by T. Serre

Page 25: Mit6870 orsu lecture11
Page 26: Mit6870 orsu lecture11
Page 27: Mit6870 orsu lecture11
Page 28: Mit6870 orsu lecture11
Page 29: Mit6870 orsu lecture11
Page 30: Mit6870 orsu lecture11
Page 31: Mit6870 orsu lecture11

Learning a Compositional Hierarchy of Object Structure

Fidler & Leonardis, CVPR’07; Fidler, Boben & Leonardis, CVPR 2008

The architecture

Parts model

Learned parts

Page 32: Mit6870 orsu lecture11

Learning a Compositional Hierarchy of Object Structure

Fidler & Leonardis, CVPR’07; Fidler, Boben & Leonardis, CVPR 2008

Page 33: Mit6870 orsu lecture11

Learning a Compositional Hierarchy of Object Structure

• Fidler & Leonardis, CVPR’07
• Fidler, Boben & Leonardis, CVPR 2008

[Figure labels: Layer 1, Layer 2, Layer 3, Layer 4; LEARN hierarchical library; car, motorcycle, dog, person; Learned L1 – L3; Learned hierarchical vocabulary; Detections]

• Hierarchical compositional architecture
• Features are shared at each layer
• Learning is done on natural images
• Indexing and matching detection scheme

Page 34: Mit6870 orsu lecture11

Learning a Compositional Hierarchy of Object Structure

• Fidler & Leonardis, CVPR’07
• Fidler, Boben & Leonardis, CVPR 2008

[Figure labels: Layer 1, Layer 2, Layer 3, Layer 4; LEARN hierarchical library; car, motorcycle, dog, person; Learned hierarchical vocabulary; Detections]

• Hierarchical compositional architecture
• Features are shared at each layer
• Learning is done on natural images
• Biologically plausible?
• Learns T- and L-junctions, different curvatures, and features that gradually increase in complexity

Page 35: Mit6870 orsu lecture11

Hierarchical Topic Models

Latent Dirichlet Allocation (LDA)

Blei, Ng, & Jordan, JMLR 2003

[Figure: LDA graphical model. Labels: z, x, J, N, K]

Pr(topic | doc)

Pr(word | topic)
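The two distributions combine by summing over topics: Pr(word | doc) = Σ_topic Pr(word | topic) · Pr(topic | doc). A small sketch with made-up numbers follows; the function name and both probability tables are hypothetical. In the bag-of-features setting, a "word" is a quantized image feature and a "doc" is an image.

```python
def word_distribution(topic_given_doc, word_given_topic):
    """Mix the per-topic word distributions using the document's topic proportions."""
    n_topics = len(topic_given_doc)
    n_words = len(word_given_topic[0])
    return [sum(topic_given_doc[k] * word_given_topic[k][w] for k in range(n_topics))
            for w in range(n_words)]

# Hypothetical example: 2 topics, vocabulary of 3 visual words.
topic_given_doc = [0.75, 0.25]          # Pr(topic | doc)
word_given_topic = [[0.5, 0.5, 0.0],    # Pr(word | topic 0)
                    [0.0, 0.5, 0.5]]    # Pr(word | topic 1)

word_given_doc = word_distribution(topic_given_doc, word_given_topic)
```

The result is itself a probability distribution over the vocabulary, so a document that mixes topics produces a blend of their typical words.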

“bag of features” models:

Object Recognition (Sivic et al., ICCV 2005)

Scene Recognition (Fei-Fei et al., CVPR 2005)

Page 36: Mit6870 orsu lecture11

HDP Object Model

• We learn the number of parts.

• Each object uses a different number of parts.

• The model assumes a known number of object categories.

Parts are distributions over appearances and locations

Sudderth et al. IJCV 2008
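One way to read "parts are distributions over appearances and locations" is sketched below: each part pairs a categorical distribution over appearance codewords with a Gaussian over image position, and an object is a weighted mixture of parts. This is only an illustration under simplifying assumptions. The number of parts is fixed at two here, whereas the HDP model learns it, and the dictionary layout, axis-aligned Gaussians, and all numbers are hypothetical.

```python
import math

def gaussian_2d(x, y, mean, var):
    """Axis-aligned 2-D Gaussian density (simplifying assumption: no covariance term)."""
    (mx, my), (vx, vy) = mean, var
    expo = -0.5 * ((x - mx) ** 2 / vx + (y - my) ** 2 / vy)
    return math.exp(expo) / (2 * math.pi * math.sqrt(vx * vy))

def part_joint(word, x, y, part, weight):
    """weight * Pr(appearance word | part) * density(location | part)."""
    return weight * part["appearance"][word] * gaussian_2d(x, y, part["mean"], part["var"])

def part_posterior(word, x, y, parts, weights):
    """Posterior over which part generated a feature with this appearance and location."""
    joint = [part_joint(word, x, y, p, w) for p, w in zip(parts, weights)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical object with two parts and a vocabulary of two appearance codewords.
parts = [
    {"appearance": [0.8, 0.2], "mean": (0.0, 0.0), "var": (1.0, 1.0)},
    {"appearance": [0.1, 0.9], "mean": (5.0, 0.0), "var": (1.0, 1.0)},
]
weights = [0.5, 0.5]

# A feature with codeword 0 observed near the first part's mean location.
posterior = part_posterior(0, 0.2, 0.1, parts, weights)
```

In this toy setup the feature is assigned almost entirely to the first part, because both its appearance and its location agree with that part.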