Challenges in Visual Recognition: A Historical Perspective

Jitendra Malik
University of California at Berkeley


The more you look, the more you see!


PASCAL Visual Object Challenge


We want to locate the object

[Figure panels: Orig. Image, Segmentation]


And we want to detect and label parts..

The Visually Tagged Human Project


Categorization at Multiple Levels

[Figure: a tiger photograph labeled at multiple levels; labels: water, grass, sand, outdoor, wildlife, tiger, head, eye, mouth, legs, tail, back, shadow]

Computer Vision Group, UC Berkeley


Examples of Actions

• Movement and posture change
– run, walk, crawl, jump, hop, swim, skate, sit, stand, kneel, lie, dance (various), …

• Object manipulation
– pick, carry, hold, lift, throw, catch, push, pull, write, type, touch, hit, press, stroke, shake, stir, turn, eat, drink, cut, stab, kick, point, drive, bike, insert, extract, juggle, play musical instrument (various), …

• Conversational gesture
– point, …

• Sign Language


Key cues for action recognition

• “Morpho-kinetics” of action (shape and movement of the body)

• Identity of the object/s

• Activity context

• ACTION = MOVEMENT + GOAL


Resolution Regimes

• Far field: 3-pixel man, blob tracking

• Near field: 300-pixel man, stick figure


Medium-field Recognition

The 30-Pixel Man


The more you look, the more you see!


We need to identify

• Objects

• Agents

• Relationships of objects with objects, objects with agents, agents with agents …

• Events and Actions


Different aspects of vision

• Perception: study the “laws of seeing” and predict what a human would perceive in an image.

• Neuroscience: understand the mechanisms in the retina and the brain.

• Function: how the laws of optics, and the statistics of the world we live in, make certain interpretations of an image more likely to be valid.

The match between human and computer vision is strongest at the level of function, but since the results of computer vision are typically meant to be conveyed to humans, it is useful to be consistent with human perception. Neuroscience is a source of ideas, but being bio-mimetic is not a requirement.


Taxonomy and Partonomy

• Taxonomy: e.g., cats are in the family Felidae, which in turn is in the class Mammalia.
– Recognition can be at multiple levels of categorization, or be identification at the level of specific individuals, as in faces.

• Partonomy: objects have parts, which have subparts, and so on. The human body contains the head, which in turn contains the eyes.

• These notions apply equally well to scenes and to activities.

• Psychologists have argued that there is a “basic level” at which categorization is fastest (Eleanor Rosch et al.).

• In a partonomy, each level contributes useful information for recognition.


Visual Processing Areas


Macaque Visual Areas


Hubel and Wiesel (1962) discovered orientation sensitive neurons in V1


These cells respond to edges and bars ..


Orientation-based features were inspired by V1 (SIFT, GIST, HOG, GB, etc.)
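As a rough illustration of such orientation-based features, the sketch below computes the core of a HOG cell: a histogram of unsigned gradient orientations in which each pixel votes with its gradient magnitude. It is a minimal plain-Python version under stated assumptions: 9 bins over 0-180 degrees (common HOG practice), hard binning instead of interpolated votes, and no block normalization; the function name is hypothetical.

```python
import math


def cell_orientation_histogram(patch, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees)
    over the interior pixels of a grayscale patch, L2-normalized."""
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for r in range(1, len(patch) - 1):
        for c in range(1, len(patch[0]) - 1):
            # Centered [-1, 0, 1] derivatives in x and y
            gx = patch[r][c + 1] - patch[r][c - 1]
            gy = patch[r + 1][c] - patch[r - 1][c]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            # Unsigned orientation: dark-to-light and light-to-dark edges vote alike
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle / bin_width) % n_bins] += mag
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]


# A vertical step edge: intensity changes along x only
vertical_edge = [[0, 0, 10, 10, 10] for _ in range(5)]
print(cell_orientation_histogram(vertical_edge))  # first bin 1.0, all others 0.0
```

A full HOG descriptor concatenates such cell histograms over a detection window after normalizing them within overlapping blocks.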


Attneave’s Cat (1954)

Line drawings convey most of the information


Modeling simple cells

• Elongated directional Gaussian derivatives

• 2nd derivative and Hilbert transform

• L1 normalized for scale invariance

• 6 orientations, 3 scales

• Zero mean

Used for texture discrimination and classification by Malik and Perona (1990), Leung and Malik (1999)
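A filter bank of this kind can be sketched in plain Python. This minimal version builds only the even-symmetric (second-derivative) member of each pair; the Hilbert-transform partner is omitted. The elongation ratio of 3 and the scale values (1, 2, 4) are assumptions chosen for illustration, not parameters taken from the slide, and the function names are hypothetical.

```python
import math


def elongated_gaussian_d2(sigma, theta, elongation=3.0):
    """Second derivative, taken across the filter's long axis, of an elongated 2-D
    Gaussian rotated by theta. Square list of rows, zero mean, unit L1 norm."""
    s_long, s_short = sigma * elongation, sigma
    half = int(math.ceil(3 * s_long))           # support covers +/- 3 standard deviations
    ct, st = math.cos(theta), math.sin(theta)
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            u = ct * x + st * y                 # coordinate along the long axis
            v = -st * x + ct * y                # coordinate across it
            g = math.exp(-u * u / (2 * s_long ** 2) - v * v / (2 * s_short ** 2))
            # d^2/dv^2 exp(-v^2 / (2 s^2)) = (v^2 / s^4 - 1 / s^2) * exp(-v^2 / (2 s^2))
            row.append((v * v / s_short ** 4 - 1.0 / s_short ** 2) * g)
        kernel.append(row)
    # Zero mean: a uniform region produces no response
    mean = sum(map(sum, kernel)) / (len(kernel) ** 2)
    kernel = [[k - mean for k in row] for row in kernel]
    # L1 normalization: responses are comparable across scales
    l1 = sum(abs(k) for row in kernel for k in row)
    return [[k / l1 for k in row] for row in kernel]


def filter_bank(scales=(1.0, 2.0, 4.0), n_orientations=6):
    """6 orientations x 3 scales; the scale values are illustrative assumptions."""
    return [elongated_gaussian_d2(s, math.pi * i / n_orientations)
            for s in scales for i in range(n_orientations)]


bank = filter_bank()
print(len(bank))  # 18
```

Each filter is tuned to bars and edges at one orientation and scale, which is the V1 simple-cell analogy the slide draws.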


Texton Histogram Model for Recognition (Leung & Malik, 1999); cf. Bag of Words

[Figure: example materials: Rough Plastic, Pebbles, Plaster-b, Terrycloth]

ICCV '99, Corfu, Greece
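In a texton model, each pixel's vector of filter responses is mapped to its nearest texton, an image becomes a histogram of texton frequencies, and histograms are compared with a chi-square distance. The sketch below assumes the textons have already been learned (for instance by k-means clustering of filter responses) and uses toy 2-D response vectors; the helper names and data are hypothetical.

```python
def nearest_texton(feature, textons):
    """Index of the texton (cluster centre) closest to a filter-response vector."""
    return min(range(len(textons)),
               key=lambda k: sum((f - t) ** 2 for f, t in zip(feature, textons[k])))


def texton_histogram(features, textons):
    """Normalized histogram of texton labels over all pixels of an image."""
    hist = [0.0] * len(textons)
    for f in features:
        hist[nearest_texton(f, textons)] += 1
    total = sum(hist)
    return [h / total for h in hist]


def chi_square(h1, h2):
    """Chi-square distance between two normalized histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def classify(query_hist, model_hists):
    """Nearest-neighbour label among {material_name: histogram} models."""
    return min(model_hists, key=lambda name: chi_square(query_hist, model_hists[name]))


# Toy 2-D "filter responses" and two textons (assumed already learned)
textons = [[0.0, 0.0], [1.0, 1.0]]
models = {
    "Pebbles": texton_histogram([[0.1, 0.0], [0.0, 0.2], [0.9, 1.0]], textons),
    "Terrycloth": texton_histogram([[1.0, 0.9], [1.1, 1.0], [0.0, 0.1]], textons),
}
query = texton_histogram([[0.2, 0.1], [0.1, 0.1], [0.0, 0.0], [1.0, 1.0]], textons)
print(classify(query, models))  # Pebbles
```

The same recipe, with image patches or local descriptors in place of filter responses, is what later became the bag-of-words model.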


Object Detection can be very fast

• On a task of judging animal vs. no animal, humans can make mostly correct saccades in 150 ms (Kirchner & Thorpe, 2006).
– Comparable to the synaptic delay along the retina, LGN, V1, V2, V4, IT pathway.
– Doesn’t rule out feedback, but shows that feedforward processing alone is very powerful.

• Detection and categorization are practically simultaneous (Grill-Spector & Kanwisher, 2005).


Rolls et al. (2000)


Convolutional Neural Networks (LeCun et al.)
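A convolutional layer slides small kernels over the image (the same weights at every position), applies a nonlinearity, and then pools to gain tolerance to small shifts. The sketch below is a forward pass only, in plain Python, with a hand-set kernel standing in for learned weights. It uses ReLU and max pooling, which are later conventions; LeCun's original networks used sigmoid-type nonlinearities and averaging subsampling. Function names are hypothetical.

```python
def conv2d(image, kernel, bias=0.0):
    """'Valid' 2-D cross-correlation of a single-channel image with one kernel,
    which is what a convolutional layer computes for one input/output channel pair."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[bias + sum(kernel[i][j] * image[r + i][c + j]
                        for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, k=2):
    """Non-overlapping k x k max pooling; leftover rows/columns are dropped."""
    return [[max(fmap[r + i][c + j] for i in range(k) for j in range(k))
             for c in range(0, len(fmap[0]) - k + 1, k)]
            for r in range(0, len(fmap) - k + 1, k)]


# A vertical edge and a hand-set edge-detecting kernel (training would learn it)
image = [[0, 0, 1, 1] for _ in range(4)]
edge_kernel = [[-1, 1]]
print(max_pool(relu(conv2d(image, edge_kernel))))  # [[1.0], [1.0]]
```

Stacking several such stages, with many kernels per stage, yields the layered architecture shown on the slide.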


A brief history of computer vision ..

“Those who cannot remember the past are condemned to repeat it.” - George Santayana


Fifty years of computer vision, 1963-2013

• 1960s: Beginnings in artificial intelligence, image processing and pattern recognition

• 1970s: Foundational work on image formation: Horn, Koenderink, Longuet-Higgins …

• 1980s: Vision as applied mathematics: geometry, multi-scale analysis, probabilistic modeling, control theory, optimization

• 1990s: Geometric analysis largely completed, vision meets graphics, statistical learning approaches resurface

• 2000s: Significant advances in visual recognition, range of practical applications


Object recognition in computer vision

• Recognition as Pose Estimation

• Recognition as Description using Volumetric primitives

• Recognition as Pattern Classification

• Recognition as Deformable Matching


Recognition as Pose Estimation: Object as a set of points in 3D

• Roberts (1963), Faugeras & Hebert (1983), Huttenlocher & Ullman (1987)

• Variants
– Geometric Hashing: Lamdan & Wolfson (1988)
– Pose Clustering: Stockman (1987), Olson (1994)
– Linear Combination of Views: Basri & Ullman (1991)


Huttenlocher & Ullman’s Alignment Algorithm (1990)
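Alignment is hypothesize-and-verify: a minimal set of correspondences fixes the transformation, which is then checked against the rest of the model. Huttenlocher & Ullman worked with 3-D models under weak perspective; the sketch below swaps in the simpler 2-D affine case, where three point pairs determine the map, and brute-forces every ordered triple of image points. Function names, the tolerance and the toy point sets are hypothetical.

```python
import math
from itertools import permutations


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def affine_from_triples(src, dst):
    """2-D affine map sending three model points onto three image points (Cramer's rule).
    Returns ((a, b, tx), (c, d, ty)), or None if the model points are collinear."""
    m = [[x, y, 1.0] for x, y in src]
    d = det3(m)
    if abs(d) < 1e-12:
        return None
    rows = []
    for coord in (0, 1):                      # solve for the x-row, then the y-row
        rhs = [p[coord] for p in dst]
        row = []
        for col in range(3):
            mc = [r[:] for r in m]
            for i in range(3):
                mc[i][col] = rhs[i]
            row.append(det3(mc) / d)
        rows.append(tuple(row))
    return tuple(rows)


def apply_affine(T, p):
    (a, b, tx), (c, d, ty) = T
    return (a * p[0] + b * p[1] + tx, c * p[0] + d * p[1] + ty)


def align(model, image, tol=1e-6):
    """Hypothesize: match the first three model points to every ordered triple of
    image points. Verify: count model points landing within tol of some image point."""
    best_T, best_score = None, -1
    for triple in permutations(image, 3):
        T = affine_from_triples(model[:3], triple)
        if T is None:
            continue
        score = sum(any(math.dist(apply_affine(T, p), q) < tol for q in image)
                    for p in model)
        if score > best_score:
            best_T, best_score = T, score
    return best_T, best_score


model = [(0, 0), (1, 0), (0, 1), (1, 1)]
image = [(5, 7), (10, 10), (3, 5), (3, 7), (5, 5)]  # model scaled by 2, shifted, plus clutter
T, score = align(model, image)
print(score)  # 4: every model point is explained
```

The brute-force search over triples is what makes alignment expensive with many features; geometric hashing and pose clustering, listed on the previous slide, are ways of organizing that search.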


Recognition as Fitting Volumetric Primitives: Object as a hierarchy of simple shapes

• Binford (1971), Marr & Nishihara (1978), Biederman (1987)

• Though discredited as a general approach to recognition, it has retained its appeal for analyzing images of people.


The Stick Figure Ideal


Recognition as Statistical Pattern Classification: Object as a feature vector

• Optical Character Recognition has been studied as far back as the 1950s. Recent years focus on handwritten digit classification and face detection.

• Some examples:
– Neural networks: Neocognitron (Fukushima, 1980, 1988), Convolutional Neural Networks (LeCun et al.), C2 Features (Serre, Wolf & Poggio, 2005)
– Support Vector Machines (various)
– Decision Trees (Amit, Geman & Wilder, 1997)
– Boosted Decision Trees (Viola & Jones, 2001)
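Viola & Jones' detector evaluates its boosted classifiers on rectangle ("Haar-like") features, which an integral image makes cheap: any rectangle sum costs four lookups regardless of its size. A minimal sketch of that step follows; the function names are hypothetical, and the boosting and cascade stages are not shown.

```python
def integral_image(img):
    """ii[r][c] = sum of img[0..r-1][0..c-1]; an extra zero row/column simplifies lookups."""
    H, W = len(img), len(img[0])
    ii = [[0] * (W + 1) for _ in range(H + 1)]
    for r in range(H):
        row_sum = 0
        for c in range(W):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, r, c, h, w):
    """Sum of the h x w rectangle with top-left corner (r, c), using four lookups."""
    return ii[r + h][c + w] - ii[r][c + w] - ii[r + h][c] + ii[r][c]


def two_rect_feature(ii, r, c, h, w):
    """Left half minus right half: a Haar-like feature that responds to vertical edges."""
    half = w // 2
    return rect_sum(ii, r, c, h, half) - rect_sum(ii, r, c + half, h, half)


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))  # 28 = 5 + 6 + 8 + 9
```

Because each feature is a handful of additions, a boosted classifier can afford to evaluate many of them at every window position and scale.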


Handwritten digit recognition (MNIST, USPS): test error rates

• LeCun’s Convolutional Neural Networks variations (0.8%, 0.6% and 0.4% on MNIST)

• Tangent Distance (Simard, LeCun & Denker: 2.5% on USPS)

• Randomized Decision Trees (Amit, Geman & Wilder: 0.8%)

• K-NN based Shape context/TPS matching (Belongie, Malik & Puzicha: 0.6% on MNIST)
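A shape context describes one sample point by a log-polar histogram of where the other points lie relative to it; distances are divided by the mean pairwise distance, which makes the descriptor scale invariant. The sketch below uses 5 radial by 12 angular bins; the radial range of 1/8 to 2 mean distances is an assumed, commonly used choice. It covers only the descriptor and a chi-square matching cost, not the correspondence search or the thin-plate-spline warp, and the function names are hypothetical.

```python
import math


def shape_contexts(points, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """One n_r x n_theta log-polar histogram per point, counting the other points."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    mean_d = sum(dist[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    log_lo, log_hi = math.log(r_inner), math.log(r_outer)
    step = (log_hi - log_lo) / n_r
    descriptors = []
    for i, (xi, yi) in enumerate(points):
        hist = [[0] * n_theta for _ in range(n_r)]
        for j, (xj, yj) in enumerate(points):
            if i == j or dist[i][j] == 0:
                continue
            lr = math.log(dist[i][j] / mean_d)      # normalized log-distance
            if not log_lo <= lr < log_hi:
                continue                            # outside the radial range
            r_bin = min(int((lr - log_lo) / step), n_r - 1)
            angle = math.atan2(yj - yi, xj - xi) % (2 * math.pi)
            t_bin = int(angle / (2 * math.pi) * n_theta) % n_theta
            hist[r_bin][t_bin] += 1
        descriptors.append(hist)
    return descriptors


def matching_cost(h1, h2):
    """Chi-square cost between two shape contexts after normalizing each to sum to 1."""
    a = [v for row in h1 for v in row]
    b = [v for row in h2 for v in row]
    sa, sb = (sum(a) or 1), (sum(b) or 1)
    return 0.5 * sum((x / sa - y / sb) ** 2 / (x / sa + y / sb)
                     for x, y in zip(a, b) if x + y > 0)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
moved = [(3 + 2 * x, 5 + 2 * y) for x, y in square]  # scaled by 2 and translated
sc1, sc2 = shape_contexts(square), shape_contexts(moved)
print([matching_cost(a, b) for a, b in zip(sc1, sc2)])  # all zeros
```

In the full method these costs fill a point-to-point cost matrix, an assignment step picks correspondences, and a thin-plate spline is fitted to them.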


Convolutional Neural Networks (LeCun et al.)


The idea behind Tangent Distance (Simard et al.)
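Tangent distance makes a nearest-neighbour matcher invariant to small transformations by measuring the distance from one image to the linear approximation (tangent plane) of the other's transformation manifold. The one-sided version below solves the least-squares problem min over alpha of ||x + T alpha - y|| through the normal equations. The tangent vectors are taken as given (in practice they are derivatives of the image under shifts, rotations, scalings and so on); Simard et al. also describe a two-sided variant not shown here, and the helper names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [v - f * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def one_sided_tangent_distance(x, y, tangents):
    """min over alpha of ||x + sum_i alpha_i * t_i - y||, where the t_i span the
    tangent plane of x's transformation manifold. Returns (distance, alpha)."""
    diff = [yi - xi for xi, yi in zip(x, y)]
    G = [[dot(ti, tj) for tj in tangents] for ti in tangents]   # Gram matrix T^T T
    rhs = [dot(t, diff) for t in tangents]                      # T^T (y - x)
    alpha = solve(G, rhs)
    proj = [xi + sum(a * t[k] for a, t in zip(alpha, tangents)) for k, xi in enumerate(x)]
    return math.sqrt(sum((p - yi) ** 2 for p, yi in zip(proj, y))), alpha


# A constant tangent vector models a brightness offset, so [1, 2, 3] and [3, 4, 5]
# are at tangent distance 0 even though their Euclidean distance is 2 * sqrt(3)
print(one_sided_tangent_distance([1, 2, 3], [3, 4, 5], [[1, 1, 1]]))  # (0.0, [2.0])
```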


Amit, Geman & Wilder (1997)


Recognition as Pictorial Structure Matching: Object as a spatial configuration of features

• Transformations to model shape variation: D’Arcy Wentworth Thompson (1910)

• Grenander (1970s and later): probabilistic models on transformations

• Fischler and Elschlager (1973): deformable matching of landmarks (“point masses”) in a configuration of “springs” to model deformable templates

• Von der Malsburg: dynamic link architecture for neural modeling, elastic graph matching for face recognition (1993, 1997)

• Felzenszwalb and Huttenlocher (2000): pictorial structures for aligning human bodies to stick figures using dynamic programming

• Belongie, Malik & Puzicha (2001) use “shape contexts” as point descriptors and thin plate splines to model deformation.
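The "parts plus springs" model scores a placement of parts by each part's appearance cost plus a quadratic spring cost on the displacement between connected parts; when the parts form a tree, dynamic programming finds the best placement exactly. The sketch below handles the simplest tree, a chain, over a small shared list of candidate locations, and minimizes by brute force where Felzenszwalb & Huttenlocher use generalized distance transforms to make each step linear time. Costs, offsets and names are hypothetical.

```python
def match_chain(locations, unary, offsets, stiffness=1.0):
    """Minimize sum_i unary[i][l_i] + stiffness * sum_{i>0} ||l_i - l_{i-1} - offsets[i]||^2.

    locations: candidate (x, y) positions shared by all parts
    unary[i][a]: appearance cost of part i at locations[a]
    offsets[i]: ideal displacement of part i from part i-1 (offsets[0] is unused)
    Returns (total_cost, list of location indices, one per part)."""
    n_parts, n_loc = len(unary), len(locations)

    def spring(i, a, b):
        """Deformation cost of part i at locations[b] when part i-1 is at locations[a]."""
        (ax, ay), (bx, by) = locations[a], locations[b]
        dx, dy = offsets[i]
        return stiffness * ((bx - ax - dx) ** 2 + (by - ay - dy) ** 2)

    # Backward pass: best[i][a] = cheapest cost of parts i..end with part i at a
    best = [None] * n_parts
    choice = [None] * n_parts
    best[-1] = list(unary[-1])
    for i in range(n_parts - 2, -1, -1):
        best[i], choice[i] = [], []
        for a in range(n_loc):
            b = min(range(n_loc), key=lambda b: spring(i + 1, a, b) + best[i + 1][b])
            choice[i].append(b)
            best[i].append(unary[i][a] + spring(i + 1, a, b) + best[i + 1][b])
    # Forward pass: best root location, then follow the stored choices
    loc = min(range(n_loc), key=lambda a: best[0][a])
    placement = [loc]
    for i in range(n_parts - 1):
        loc = choice[i][loc]
        placement.append(loc)
    return best[0][placement[0]], placement


locations = [(0, 0), (1, 0), (2, 0), (3, 0)]
unary = [[0, 5, 5, 5],   # root part looks best at x = 0
         [5, 5, 0, 5]]   # child part looks best at x = 2
offsets = [(0, 0), (1, 0)]  # child ideally sits 1 unit right of the root
print(match_chain(locations, unary, offsets))  # (1.0, [0, 2])
```

Here the best compromise keeps both parts at their preferred locations and pays a spring cost of 1 for the stretched connection.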


Modeling shape variation in a category

• D’Arcy Thompson: On Growth and Form, 1917
– studied transformations between shapes of organisms


Matching Example

[Figure panels: model, target]


EZ-Gimpy Results (Mori & Malik, 2003)

• 171 of 192 images correctly identified: 92%

[Figure: example words: horse, spade, smile, join, canvas, here]


Face Detection (Carnegie Mellon University)

Results on various images submitted to the CMU on-line face detector: http://www.vasc.ri.cmu.edu/cgi-bin/demos/findface.cgi


Multiscale sliding window

Ask this question repeatedly, varying position, scale, category…

Paradigm introduced by Rowley, Baluja & Kanade 96 for face detection; Viola & Jones 01, Dalal & Triggs 05, Felzenszwalb, McAllester, Ramanan 08
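The multiscale sliding-window loop can be sketched as: build an image pyramid, evaluate a classifier on every window at every level, and map positive windows back to input coordinates. In this plain-Python sketch, nearest-neighbour downsampling, a factor-of-2 pyramid step, the stride and the toy scoring function are all assumptions chosen for brevity (detectors such as Dalal & Triggs use much finer scale steps), and non-maximum suppression of overlapping detections is omitted.

```python
def downsample(img, scale):
    """Nearest-neighbour resize by a factor scale < 1."""
    H, W = len(img), len(img[0])
    h, w = int(H * scale), int(W * scale)
    return [[img[min(H - 1, int(r / scale))][min(W - 1, int(c / scale))]
             for c in range(w)] for r in range(h)]


def sliding_window_detect(img, score_fn, win=4, stride=2, scale_step=0.5, threshold=0.0):
    """Score every win x win window at every pyramid level. Returns a list of
    (score, top, left, size) boxes in the coordinates of the input image."""
    detections = []
    scale = 1.0
    while True:
        level = img if scale == 1.0 else downsample(img, scale)
        if len(level) < win or len(level[0]) < win:
            break                                  # window no longer fits: stop
        for r in range(0, len(level) - win + 1, stride):
            for c in range(0, len(level[0]) - win + 1, stride):
                crop = [row[c:c + win] for row in level[r:r + win]]
                s = score_fn(crop)
                if s > threshold:
                    # Undo the pyramid scaling so the box refers to the input image
                    detections.append((s, int(r / scale), int(c / scale), int(win / scale)))
        scale *= scale_step
    return detections


def mean_brightness(crop):  # hypothetical stand-in for a trained classifier
    return sum(map(sum, crop)) / (len(crop) * len(crop[0]))


img = [[1] * 8 for _ in range(8)]
dets = sliding_window_detect(img, mean_brightness, threshold=0.5)
print(len(dets))  # 10: nine 4x4 windows at full scale, one at half scale (8x8 in the input)
```

Shrinking the image rather than growing the window lets a single fixed-size classifier find objects of any size.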


Caltech-101 [Fei-Fei et al. 04]

• 102 classes, 31-300 images/class


Caltech 101 classification results

(even better by combining cues…)


PASCAL Visual Object Challenge


A good building block is a linear SVM trained on HOG features (Dalal & Triggs)
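A linear SVM scores a window with a single dot product, which is what makes it practical inside a sliding-window search. The sketch below swaps in a simple Pegasos-style stochastic subgradient solver for the hinge-loss objective, not necessarily the solver Dalal & Triggs used, and toy 2-D vectors stand in for HOG descriptors. The regularization weight, iteration count and names are assumptions.

```python
import random


def train_linear_svm(X, y, lam=0.01, n_iter=2000, seed=0):
    """Stochastic subgradient descent (Pegasos-style) on
    lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * w.x_i), with labels y_i in {-1, +1}.
    A constant 1 is appended to every feature vector, so the last weight is a bias
    (regularized along with the rest, a common simplification)."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    for t in range(1, n_iter + 1):
        i = rng.randrange(len(Xb))
        eta = 1.0 / (lam * t)                              # decreasing step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
        w = [(1.0 - eta * lam) * wj for wj in w]           # gradient step on the regularizer
        if margin < 1.0:                                   # hinge loss active for this sample
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def svm_score(w, x):
    """Signed score w . [x, 1]; a detector thresholds this for each window."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


# Toy 2-D feature vectors standing in for HOG descriptors
pos = [(2, 2), (3, 1), (1, 3), (2.5, 2.5)]
neg = [(-2, -2), (-3, -1), (-1, -3), (-2.5, -2.5)]
w = train_linear_svm(pos + neg, [1] * len(pos) + [-1] * len(neg))
print([svm_score(w, x) > 0 for x in pos + neg])  # four True, then four False
```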


AP = 0.23
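AP (average precision) summarizes a detector's ranked output: sort detections by score, trace precision against recall, and take the area under that curve. The sketch below assumes detections have already been matched to ground truth (PASCAL counts a detection as correct when its box overlaps a ground-truth box with intersection-over-union above 0.5) and uses the "all points" interpolation of later PASCAL VOC devkits; VOC 2007 instead averaged interpolated precision at 11 recall levels. The function name and inputs are hypothetical.

```python
def average_precision(detections, n_positives):
    """detections: list of (score, is_true_positive); n_positives > 0 is the number of
    ground-truth objects. Returns the area under the precision-recall curve after
    making precision monotonically non-increasing."""
    ranked = sorted(detections, key=lambda d: -d[0])
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_positives)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at recall r becomes the max precision at any recall >= r
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


dets = [(0.9, True), (0.8, False), (0.7, True)]
print(average_precision(dets, n_positives=2))  # about 0.833
```

Missed objects lower AP because recall never reaches 1; false positives ranked above true ones lower it through precision.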


Datasets and computer vision (slide credit: Fei-Fei Li)

• UIUC Cars (2004): S. Agarwal, A. Awan, D. Roth

• FERET Faces (1998): P. Phillips, H. Wechsler, J. Huang, P. Rauss

• CMU/VASC Faces (1998): H. Rowley, S. Baluja, T. Kanade

• COIL Objects (1996): S. Nene, S. Nayar, H. Murase

• MNIST digits (1998-10): Y. LeCun & C. Cortes

• KTH human action (2004): I. Laptev & B. Caputo

• Sign Language (2008): P. Buehler, M. Everingham, A. Zisserman

• Segmentation (2001): D. Martin, C. Fowlkes, D. Tal, J. Malik

• 3D Textures (2005): S. Lazebnik, C. Schmid, J. Ponce

• CUReT Textures (1999): K. Dana, B. Van Ginneken, S. Nayar, J. Koenderink

• CAVIAR Tracking (2005): R. Fisher, J. Santos-Victor, J. Crowley

• Middlebury Stereo (2002): D. Scharstein, R. Szeliski


Comparison among free datasets (slide credit: Fei-Fei Li)

[Figure: # of clean images per category (log_10) plotted against # of visual concept categories (log_10) for PASCAL(1), LabelMe, MSRC, Caltech101/256 and Tiny Images(2)]

1. Excluding the Caltech101 datasets from PASCAL.
2. No image in this dataset is human annotated. The # of clean images per category is a rough estimation.


The more you look, the more you see!


So much remains to be done…

• Objects, Scenes, Events

• The semantic gap is to be confronted, not avoided!
