THE PROBLEM OF VISUAL RECOGNITION (Ch. 3, Farah)

Why is it difficult to identify real-world objects from the retinal image?
How many shape representations of each distinguishable object do we need in memory?
What is the nature of shape representations in memory?
How do we evaluate a proposed representation?
What are the fundamental dimensions of a useful shape representation?

Just a small reminder

From image to object: A hard problem

Why is it difficult to identify a distal object using only our 2D retinal image?

The 2D retinal image is only partly determined by the shape of a distal object.

Shape of the 2D retinal image …

The shape of a 2D retinal image of an object varies depending upon the spatial relation between the viewer (e.g., you) and the object.
  In a 2D image, the shape of a CUBE or SPHERE is rarely square or round; more often it appears as a parallelogram or an oval, respectively.

What else varies in a 2D retinal image?
  Position and size of the object in the picture plane
  Which surfaces are visible, foreshortened, or occluded
  The presence or absence of shadows
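The dependence of image shape on viewpoint can be illustrated with a small sketch. It assumes an orthographic projection (simply dropping depth), which is a simplification of what the eye does, and the function names are hypothetical. Under this assumption a square face rotated in depth keeps its height but its projected width shrinks, so the same distal shape yields different image shapes.

```python
import math

def rotate_y(point, degrees):
    """Rotate a 3D point about the vertical (y) axis."""
    x, y, z = point
    t = math.radians(degrees)
    return (x * math.cos(t) + z * math.sin(t), y, -x * math.sin(t) + z * math.cos(t))

def project(point):
    """Orthographic projection onto the picture plane: drop depth (z)."""
    x, y, _ = point
    return (x, y)

# One face of a cube: a unit square centred on the origin, facing the viewer.
SQUARE = [(-0.5, -0.5, 0.0), (0.5, -0.5, 0.0), (0.5, 0.5, 0.0), (-0.5, 0.5, 0.0)]

def image_width_height(degrees):
    """Width and height of the projected face after rotation in depth."""
    pts = [project(rotate_y(p, degrees)) for p in SQUARE]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (max(xs) - min(xs), max(ys) - min(ys))

for angle in (0, 30, 60):
    w, h = image_width_height(angle)
    print(f"{angle:>2} deg: image width {w:.2f}, height {h:.2f}")
```

Only at 0 degrees is the image of the face actually square; every other viewing angle foreshortens it.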

Shape and Size Constancy

How do we identify the object shape from a 2D retinal image?
  Do we infer (compute) the shape from the image?
  Or do we learn a separate association between each view of an object and its identity?

Object recognition in normal humans: Two hypotheses

Human shape perception is:

H1: Viewpoint dependent (Rock, Tarr)

H2: Viewpoint independent (Marr, Biederman)

H1: Human shape perception is tied to viewing conditions.

Novel perspectives of wire figures can be hard to identify (accuracy: 75% - 39% correct; Rock et al., 1981).

But the same shapes with clay surfaces can be recognized from different perspectives (Farah et al., 1994).

H2: Human shape perception is independent of viewing conditions.

Mental rotation in the picture plane:
  We can name highly familiar letters/numbers equally quickly and accurately at any orientation (Corballis, 1988).
  But orientation is important on first encounter (Jolicoeur, 1985, see figure).

Canonical perspective

Variation from canonical perspective

Rotating in depth mimics foreshortening and changes canonical perspective.

Time to name objects increases as they are rotated away from a canonical perspective (Palmer et al., 1981).

Multiple Views Theory (Tarr, 1995)

Shape representations in memory combine shape and viewpoint information (a la Rock).

We transform perceptual representations to match shape representations across changes in viewpoint.
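One way to picture the matching step is as a rotation from the current view to the nearest stored view. The sketch below is only an illustration under stated assumptions: the stored viewpoints (0, 120, and 240 degrees) and the function names are hypothetical, and the amount of rotation is used as a stand-in for the cost of the transformation, not as a model taken from Tarr (1995).

```python
# Hypothetical viewpoints (in degrees) at which views of one object are stored.
STORED_VIEWS = [0, 120, 240]

def angular_distance(a, b):
    """Smallest rotation, in degrees, that carries view a onto view b."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def rotation_needed(input_view):
    """Rotation required to align the input with its nearest stored view."""
    return min(angular_distance(input_view, v) for v in STORED_VIEWS)

for view in (0, 60, 130, 350):
    print(f"input at {view:>3} deg needs {rotation_needed(view)} deg of rotation")
```

Views close to a stored viewpoint need little transformation, while views far from every stored viewpoint need more, which fits the general pattern of slower naming away from familiar perspectives.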

General conclusions

Our ability to identify “a familiar object from a novel image may depend strongly on the type, complexity, and familiarity of the object” (Farah, p. 68).

Most likely, we have more than one shape representation in memory per distinguishable object.

Two potential ways to identify an image:
  Transform it to correspond to a familiar shape
  Factor it into true object shape + viewing condition

Shape representation: a computational framework

Some information about an object is EXPLICIT in the 2D retinal image (e.g., location in visual field, distance of parts from viewer).

But much of the information important for visual recognition is only IMPLICIT in the retinal image (e.g., 3D shape, presence and shape of component parts).

What is the nature of the shape representation in memory?

What are the criteria by which we can evaluate proposed shape representations?

What goes into creating a shape representation?

Criteria for evaluating the usefulness of the internal shape representation for object recognition (Marr & Nishihara, 1978)

Accessibility
Scope
Uniqueness
Stability
Sensitivity

Accessibility: Ease of deriving (recovering) shape information about an object from a 2D retinal image

Human object perception is typically fast, effortless, and accurate.

Hence, the relevant information should be recoverable from the 2D image with minimal demand on resources.

Scope: Range of stimuli over which a shape representation is effective

Most machine vision representations are special-purpose systems that can only recognize stimuli in a limited domain (e.g., bank numbers, blocks world).

In contrast, the human object recognition system is often viewed as a general-purpose system, capable of representing all types of stimuli (objects, faces, printed letters, handwriting).

From Palmer (1999)

Uniqueness: Assigning the same shape description to a given image of an object

To describe an image of an object the same way on different occasions requires that the image is always coded using the same coordinate system.
  For example: Assigning the same shape representation to a particular chair on different occasions requires that the chair be coded using the same coordinates on each occasion.

Stability: Assigning the same shape representation to images of the same object under different viewing conditions

A stable representation captures the intrinsic shape of an object regardless of changes in image appearance due to shifts in location, perspective, lighting, position of moving parts (e.g., a cat in many positions).

Stability also captures the similarity relations that exist between images of similar objects (e.g., seeing a polar bear and a black bear as bears or seeing different black bears in different locations or on different occasions as bears).

Stability: Cats

Cats have movable parts, can be in different positions, colors, etc.

A stable shape representation will capture the intrinsic shape of a cat, regardless of variation in the 2D retinal image.

From Kosslyn (1994)

Sensitivity: The degree to which the shape representation codes (subtle) differences between similar shapes and different images of the same shape

Making within-category discriminations:

Being able to distinguish between the shape representations of different bears (black bears, polar bears, grizzly bears), chairs (wooden chair, folding chair) and faces (your face, my face, your friend’s face).

Four fundamental aspects of shape representation

Marr: Three dimensions of shape representation that must be specified in any computational model:
  Coordinate system
  Primitives
  Organization

Plaut & Farah: How the shape representation is implemented.

Coordinate system: A fundamental aspect of shape representation.

“… shape is nothing more than a set of locations occupied by an object” (Farah, 2000, p. 71) and hence, representing these locations has to be relative to some coordinate system.

Accessibility and stability trade-off. Highly accessible coordinate systems have low stability and vice versa.

Three types of coordinate systems

Viewer centered
Environment centered
Object centered

Viewer-centered Coordinate System

Locations are specified relative to viewer – retina, head, hand, etc.

Visual stimuli are initially represented in a retinotopic coordinate system (2D space with origin fixed with respect to retina). If either the eyes or the object moves, the retinotopic representation changes.

Very accessible, poor stability.

Viewer-centered photos

Environment-centered Coordinate System

Locations of objects are specified relative to other objects in the environment.

Stable over movements of viewer, but not over movements of objects.

Requires the viewer to continually update the spatial relationship of the environment to the viewer as the viewer moves about the environment. Accessibility is reduced.

Object-Centered Coordinate System

Locations occupied by different parts of an object are represented in a coordinate system intrinsic to, or fixed relative to, the object.

Mug: Handle is on the outside wall of a cylinder. This spatial relation stays the same, regardless of viewing perspective.
  Position and orientation invariance yields perfect stability, but reduced accessibility.
  Interesting difficulty: How do you assign relations between parts before you recognize the object?
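The contrast between viewer-centered and object-centered descriptions can be sketched in a few lines. This is a minimal illustration under assumptions: the mug is reduced to three hypothetical part locations in 2D, and distances between parts stand in for an object-centered description because they do not depend on where the mug is or how it is turned. It does not address the difficulty noted above of finding the parts in the first place.

```python
import math

# Hypothetical mug parts, given in the mug's own (object-centered) frame.
MUG_PARTS = {"body": (0.0, 0.0), "handle": (1.0, 0.0), "rim": (0.0, 2.0)}

def viewer_centered(parts, angle_deg, shift):
    """Part coordinates in the viewer's frame after the object is rotated
    by angle_deg and translated by shift."""
    t = math.radians(angle_deg)
    out = {}
    for name, (x, y) in parts.items():
        out[name] = (x * math.cos(t) - y * math.sin(t) + shift[0],
                     x * math.sin(t) + y * math.cos(t) + shift[1])
    return out

def part_distances(coords):
    """Pose-invariant description: distance between every pair of parts."""
    names = sorted(coords)
    return {(a, b): round(math.dist(coords[a], coords[b]), 6)
            for i, a in enumerate(names) for b in names[i + 1:]}

view1 = viewer_centered(MUG_PARTS, 0, (0.0, 0.0))
view2 = viewer_centered(MUG_PARTS, 90, (5.0, -3.0))

# Viewer-centered coordinates of the handle change with the pose...
print(view1["handle"], view2["handle"])
# ...while the relations among parts stay the same.
print(part_distances(view1) == part_distances(view2))
```

The viewer-centered coordinates are read directly off the image (high accessibility) but change with every movement, whereas the part relations are stable but have to be computed.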

Primitives: What is localized in space: Contours, surfaces, or 3D shapes?

Contour-based primitives?
  Edges are extracted from the visual image early in cortical processing.
  They are relatively accessible, but have limited scope, and are not stable across viewing conditions, especially depth rotation.

Primitives cont.

Surface-based primitives? Evidence suggests simple cells in V1 actually code surfaces. Surfaces provide broader scope, better stability. (Marr’s 2 ½-D sketch).

Primitives cont…

Volume-based primitives:

Although it is computationally difficult to derive them from a 2D image, volume-based primitives seem ideal for object recognition.
  Marr’s cylinders (upper figure)
  Biederman’s geons (lower figure)

Biederman’s GEON model

Some geons

Organization: Degree and type of relation among elements of shape representation.

Are the elements:
  on the same scale, as in Biederman’s geon model, or
  related hierarchically, as in Marr’s model?

Recapping …

Have examined:
  Need for multiple shape representations in memory
  Criteria for evaluating shape representations
  Three coordinate systems
  Nature of the primitive elements

Taken together, the evidence suggests that object recognition may use an object-centered coordinate system, where volume-based primitive parts combine to represent objects.

Implementation

Neural net modeling blurs the distinction between the algorithmic (computational processes involved in perception) and implementation (brain, machine) levels. Hence, consider two aspects here.
  Nature of the computations underlying memory search differs between symbolic and neural net models.
  Local vs. distributed representations

Models in Cognitive Psychology

Function:
  Help to organize what we know
  Help to identify gaps in our knowledge
  Are the source of testable hypotheses
  When implemented as a computer model, allow us to test the adequacy of the model

Main Types of Cognitive Models

Symbolic Models
  Parallel processing vs serial processing
  Transformation of symbolic information from stage to stage

Nature of the computations underlying memory search.

Symbolic model:
  Perceptual representation is separate from the stored shape representation in memory.
  Comparison process is separated from knowledge.
  Explicitly compares input (perceptual representation) to memory (shape representations in memory).
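The three properties of the symbolic model listed above can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the stored feature sets, the overlap measure, and the function names are hypothetical and do not come from any particular published model. The point is only that the stored knowledge (MEMORY), the percept, and the comparison process are three separate things.

```python
# Hypothetical stored shape representations: one feature set per object.
MEMORY = {
    "mug": {"cylinder", "curved handle", "open top"},
    "bucket": {"cylinder", "arched handle", "open top"},
    "chair": {"seat", "back", "four legs"},
}

def match_score(percept, stored):
    """Comparison process: shared features divided by all features
    (Jaccard overlap). It holds no knowledge about any object."""
    return len(percept & stored) / len(percept | stored)

def recognize(percept):
    """Explicit search: compare the percept with each stored
    representation and return the best-matching name."""
    return max(MEMORY, key=lambda name: match_score(percept, MEMORY[name]))

percept = {"cylinder", "curved handle"}
print(recognize(percept))
```

Changing what the system knows means editing MEMORY only; the comparison process stays untouched, which is the separation a neural net model does not offer.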

Neural Net Models

Simple units: Nodes organized in layers (input, hidden, output)
Activation level of unit
Connections between units
Connection weights

Computations underlying “memory search” in neural net model

IN NEURAL NET MODELS
  Pattern of activation across units corresponds to recognized object, jointly determined by input activation and weights of network (system knowledge).
  Difficult to distinguish structure/process; perception/memory.
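A minimal sketch of how an output pattern is jointly determined by input activation and connection weights is given below. It assumes a single layer of connections with hand-picked, hypothetical weights and a sigmoid squashing function; real models typically have hidden layers and learned weights. Note that there is no separate store to search: the knowledge is the weights, and "retrieval" is just the computation of the output.

```python
import math

# Hypothetical connection weights: one row per output unit,
# one column per input unit. These numbers are the system's knowledge.
WEIGHTS = [
    [2.0, -1.0, 0.5],
    [-1.5, 2.0, 0.0],
]

def sigmoid(x):
    """Squash a weighted sum into an activation level between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def activate(inputs, weights):
    """Activation of each output unit: squashed weighted sum of the inputs."""
    return [sigmoid(sum(w * a for w, a in zip(row, inputs))) for row in weights]

pattern_a = activate([1.0, 0.0, 1.0], WEIGHTS)
pattern_b = activate([0.0, 1.0, 0.0], WEIGHTS)

print([round(a, 2) for a in pattern_a])
print([round(a, 2) for a in pattern_b])
```

Different inputs produce different output patterns through the same weights, and changing the weights changes the output for the same input.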

Local vs distributed representations

Local: One-to-one mapping of things doing the representing to that which is being represented (i.e., grandmother cells).

Distributed: Many-to-many mapping of things representing onto things being represented. A pattern of activation over many units.

Distributed representations …

Represent and retrieve information efficiently in a network of highly interconnected representational units (like neurons in the brain)
Allow a greater number of entities to be represented within a given number of units
Degrade gracefully
Automatically generalize (but this can cause interference)
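Two of the properties listed above, capacity and graceful degradation, can be illustrated with a small sketch. The codes below are hypothetical binary patterns chosen for the example, units are assumed to be simply on or off, and "damage" is modelled as silencing one unit. Retrieval picks the stored pattern or patterns closest to the damaged one.

```python
N_UNITS = 8
local_capacity = N_UNITS              # one dedicated unit per entity
distributed_capacity = 2 ** N_UNITS   # distinct on/off patterns over the same units

# Hypothetical codes for three entities.
LOCAL = {"cat": (1, 0, 0), "dog": (0, 1, 0), "bear": (0, 0, 1)}
DISTRIBUTED = {
    "cat":  (1, 1, 0, 0, 1, 0, 1, 0),
    "dog":  (0, 1, 1, 0, 0, 1, 0, 1),
    "bear": (1, 0, 0, 1, 0, 1, 1, 1),
}

def lesion(pattern, unit):
    """Silence one unit, as a crude stand-in for damage."""
    return tuple(0 if i == unit else a for i, a in enumerate(pattern))

def hamming(p, q):
    """Number of units on which two patterns disagree."""
    return sum(a != b for a, b in zip(p, q))

def retrieve(pattern, codes):
    """Names of all stored codes that are closest to the pattern."""
    best = min(hamming(pattern, code) for code in codes.values())
    return [name for name, code in codes.items() if hamming(pattern, code) == best]

print(local_capacity, distributed_capacity)
print(retrieve(lesion(DISTRIBUTED["cat"], 0), DISTRIBUTED))
print(retrieve(lesion(LOCAL["cat"], 0), LOCAL))
```

Losing the single unit that stands for "cat" in the local code leaves nothing to tell the three entities apart, whereas the distributed pattern with one unit silenced is still closest to "cat".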

Onward to Object Recognition

Chapter 4: Object recognition

Chapter 5: Face Recognition

Chapter 6: Word Recognition
