
What should be done at the Low Level? 16-721: Learning-Based Methods in Vision A. Efros, CMU, Spring 2009


What should be done at the Low Level?

16-721: Learning-Based Methods in Vision, A. Efros, CMU, Spring 2009

Class Introductions

• Name:
• Research area / project / advisor
• What do you want to learn in this class?
• When I am not working, I ______________
• Favorite fruit:

Analysis Projects / Presentations

Wed: Varun (note-taker: Dan)

Next Wed: Dan (note-taker: Edward)

Dan and Edward need to meet with me ASAP.

Varun needs to meet a second time.

Four Stages of Visual Perception

© Stephen E. Palmer, 2002

Light → Image-Based Processing → Surface-Based Processing → Object-Based Processing → Category-Based Processing

[Diagram: sensory inputs — light, sound, movement, odor (etc.) — feed processes such as vision, audition, and motor control, linked to short-term memory (STM) and long-term memory (LTM); vision's end product is a description like "ceramic cup on a table".]

David Marr, 1982

Four Stages of Visual Perception

© Stephen E. Palmer, 2002

The Retinal Image

[Figure: an image, a blowup of it, and the corresponding receptor output]

Four Stages of Visual Perception

© Stephen E. Palmer, 2002

Image-based Representation: Retinal Image → [image-based processes] → Primal Sketch (Marr), a line drawing of edges, lines, blobs, etc.

Four Stages of Visual Perception

© Stephen E. Palmer, 2002

Surface-based Representation: Primal Sketch → [surface-based processes: stereo, shading, motion, etc.] → 2.5-D Sketch

Koenderink’s trick

Four Stages of Visual Perception

© Stephen E. Palmer, 2002

Object-based Representation: 2.5-D Sketch → [object-based processes: grouping, parsing, completion, etc.] → Volumetric Sketch, e.g. Geons (Biederman '87)

Four Stages of Visual Perception

© Stephen E. Palmer, 2002

Category-based Representation: Volumetric Sketch → [category-based processes: pattern recognition, spatial description] → Basic-level Category

Category: cup; Color: light-gray; Size: 6”; Location: table

We likely throw away a lot.

Line drawings are universal.

However, things are not so simple…

• Problems with the feed-forward model of processing…

[Two-tone image examples, annotated: hair (not shadow!), inferred external contours, “attached shadow” contours, “cast shadow” contours]

Finding 3D structure in two-tone images requires distinguishing cast shadows, attached shadows, and areas of low reflectivity.

The images do not contain this information a priori (at low level)

Cavanagh's argument

Marr's model (circa 1980) vs. Cavanagh's model (circa 1990s)

Feedforward vs. feedback models

[Diagrams: Marr's feedforward pipeline runs stimulus → primal sketch → 2½D sketch (reconstruction of shape from image features) → 3D model → object (object recognition by matching 3D models). Cavanagh's version runs stimulus → 2D shape (basic recognition with 2D primitives) → 3D shape, with feedback from memory.]

A Classical View of Vision

Grouping /Segmentation

Figure/GroundOrganization

Object and Scene Recognition

Low-level: pixels, features, edges, etc.

Mid-level

High-level

A Contemporary View of Vision

Figure/GroundOrganization

Grouping /Segmentation

Object and Scene Recognition

Low-level: pixels, features, edges, etc.

Mid-level

High-level

But where do we draw this line?

Question #1: What (if anything) should be done at the “Low-Level”?

N.B. I have already told you everything that is known. From now on, there aren’t any answers… only questions…

Who cares? Why not just use pixels?

Pixel differences vs. Perceptual differences

The eye is not a photometer!

"Every light is a shade, compared to the higher lights, till you come to the sun; and every shade is a light, compared to the deeper shades, till you come to the night."

— John Ruskin, 1879

Cornsweet Illusion
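The stimulus behind the illusion is easy to synthesize. A minimal sketch (the `cornsweet_edge` function and its ramp width, amplitude, and sizes are illustrative choices, not taken from the slides):

```python
import numpy as np

def cornsweet_edge(width=256, height=128, ramp_frac=0.15, amplitude=0.2, base=0.5):
    """Two flat regions of identical luminance meet at a sharp edge flanked
    by opposing luminance ramps; the whole left half then appears darker
    than the right half, even though the flat regions are physically equal."""
    x = np.linspace(-1.0, 1.0, width)
    profile = np.full(width, base)
    in_ramp = np.abs(x) < ramp_frac
    # Dip toward the edge on the left, bump on the right
    profile[in_ramp] += amplitude * np.sign(x[in_ramp]) * (1.0 - np.abs(x[in_ramp]) / ramp_frac)
    return np.tile(profile, (height, 1))

img = cornsweet_edge()
```

Comparing `img[0, 0]` and `img[0, -1]` confirms the two flat regions are pixel-identical, which is exactly why pixel differences and perceptual differences diverge here.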

Campbell-Robson contrast sensitivity curve

Sine wave
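The chart itself is just a sine-wave chirp whose frequency grows along x and whose contrast grows along y; where the grating fades from view traces out the viewer's own sensitivity curve. A sketch (the frequency and contrast ranges are assumptions, not values from the slide):

```python
import numpy as np

def campbell_robson(width=512, height=256, f_min=0.5, f_max=60.0, c_min=0.005):
    """Spatial frequency grows exponentially left to right; contrast grows
    exponentially top to bottom."""
    x = np.linspace(0.0, 1.0, width)
    y = np.linspace(0.0, 1.0, height)
    freq = f_min * (f_max / f_min) ** x              # instantaneous frequency
    phase = 2 * np.pi * np.cumsum(freq) / width      # integrate frequency -> smooth chirp
    grating = np.sin(phase)
    contrast = c_min ** (1.0 - y)                    # c_min at the top, 1.0 at the bottom
    return 0.5 + 0.5 * contrast[:, None] * grating[None, :]

chart = campbell_robson()
```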

Metamers

Question #1: What (if anything) should be done at the “Low-Level”?

i.e. What input stimulus should we be invariant to?

Invariant to:

• Brightness / Color changes?

e.g. small brightness / color changes, low-frequency changes

But one can be too invariant
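A quick numerical illustration of the trade-off: under a smooth lighting ramp, raw pixels change substantially while a gradient-based representation barely moves (the toy image and ramp values here are my own, not from the slides):

```python
import numpy as np

# A step image plus a smooth (low-frequency) lighting ramp: pixels change a
# lot, but gradient magnitudes -- an invariant representation -- barely move.
step = np.zeros((16, 16))
step[:, 8:] = 1.0
ramp = np.tile(np.linspace(0.0, 0.3, 16), (16, 1))   # smooth brightness change
lit = step + ramp

pixel_change = np.abs(lit - step).mean()             # ~0.15
g_step = np.hypot(*np.gradient(step))
g_lit = np.hypot(*np.gradient(lit))
gradient_change = np.abs(g_lit - g_step).mean()      # ~0.02
```

The flip side of "too invariant" also shows up here: if the ramp had carried signal rather than lighting, the gradient representation would have discounted it just the same.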

Invariant to:

• Edge contrast / reversal?

I shouldn’t care what background I am on!

But be careful of exaggerating noise.

Representation choices

Raw Pixels

Gradients:

Gradient Magnitude:

Thresholded gradients (edge + sign):

Thresholded gradient mag. (edges):
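The representation choices listed above can be sketched in a few lines of numpy (the threshold value is an arbitrary assumption):

```python
import numpy as np

def representations(img, thresh=0.1):
    """Raw pixels, gradients, gradient magnitude, and their thresholded
    variants, for a grayscale image with values in [0, 1]."""
    gy, gx = np.gradient(img)                        # per-axis finite differences
    mag = np.hypot(gx, gy)                           # gradient magnitude
    signed = np.sign(gx) * (np.abs(gx) > thresh)     # thresholded gradient (edge + sign), x only
    edges = mag > thresh                             # thresholded gradient magnitude (edges)
    return {"pixels": img, "gradients": (gx, gy),
            "magnitude": mag, "signed_edges": signed, "edges": edges}

# A vertical step edge: dark left half, bright right half
step = np.zeros((8, 8))
step[:, 4:] = 1.0
reps = representations(step)
```

Note that `np.gradient` returns the axis-0 derivative first, hence the `gy, gx` unpacking.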

Typical filter bank

pyramid (e.g. wavelet, steerable, etc.)

[Figure: a filter bank applied to an input image]

What does it capture?

v = F * Patch (where F is the filter matrix)
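A toy instance of v = F * Patch: stack each filter as a row of F, flatten the patch, and multiply (the three filters here are illustrative, not the bank shown in the slide):

```python
import numpy as np

def make_filter_bank(size=5):
    """Three toy filters, each stored as one row of the matrix F: a mean
    (low-pass) filter and horizontal / vertical ramp filters."""
    dc = np.ones((size, size)) / size ** 2
    dx = np.tile(np.linspace(-1, 1, size), (size, 1))  # responds to left-right gradients
    dy = dx.T                                          # responds to top-bottom gradients
    return np.stack([f.ravel() for f in (dc, dx, dy)])

F = make_filter_bank()                                 # 3 x 25 filter matrix
patch = np.tile(np.linspace(0, 1, 5), (5, 1))          # patch brightening left to right
v = F @ patch.ravel()                                  # v = F * Patch
```

For this patch, `v[0]` is the patch mean, `v[1]` is large (strong horizontal gradient), and `v[2]` is zero (no vertical gradient), so the response vector summarizes what the patch contains.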

Why these filters?

Learned filters

Spatial invariance

• Rotation, Translation, Scale
• Yes, but not too much…

• In the brain: complex cells (partial invariance)

• In computer vision: histogram-binning methods (SIFT, GIST, Shape Context, etc.) or, equivalently, blurring (e.g. Geometric Blur); will discuss later
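A SIFT-flavored sketch of the histogram-binning idea (the cell grid, bin count, and lack of normalization or interpolation are simplifications of real SIFT):

```python
import numpy as np

def orientation_histograms(img, cells=2, bins=8):
    """Split the image into cells x cells spatial cells and build a
    magnitude-weighted histogram of gradient orientations in each.
    Binning positions coarsely is what buys partial spatial invariance."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)        # orientation in [0, 2*pi)
    h, w = img.shape
    desc = np.zeros((cells, cells, bins))
    for i in range(cells):
        for j in range(cells):
            ys = slice(i * h // cells, (i + 1) * h // cells)
            xs = slice(j * w // cells, (j + 1) * w // cells)
            desc[i, j], _ = np.histogram(ang[ys, xs], bins=bins,
                                         range=(0, 2 * np.pi), weights=mag[ys, xs])
    return desc.ravel()

# A vertical step edge: all gradient energy points along orientation 0
img = np.zeros((16, 16))
img[:, 8:] = 1.0
d = orientation_histograms(img)
```

Small translations of the edge within a cell leave the descriptor unchanged, which is the "yes, but not too much" invariance from the bullet above.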

Many lives of a boundary

Often, context-dependent…

[Figure: input image, Canny edge output, and human-marked boundaries]

Maybe low-level is never enough?