Insights into High-level Visual Perception


Jeff B. Pelz

Visual Perception Laboratory

Carlson Center for Imaging Science

Rochester Institute of Technology

Insights into High-level Visual Perception

or “Where You Look is What You Get”

Students

Roxanne Canosa (Ph.D. Imaging Science)

Jason Babcock (MS Color Science)

Eric Knappenberger (MS Imaging Science)

Dan Lerner (BS Imaging Science)

Marianne Lipps (BS Imaging Science)

“Optical Illusions”

Reveal the shortcomings of the visual system, and our best effort to make sense of incomplete information

Outline

1. What are the fundamental limitations of the visual system?

Outline

1. Fundamental limitations

2. What strategies are employed to compensate for those limitations?

Outline

1. Fundamental limitations

2. Strategies to compensate for limitations

3. Can we build tools that take advantage of those strategies to inform the design and evaluation of imaging systems?

Outline

1. Fundamental limitations

2. Strategies to compensate for limitations

3. Build design and evaluation tools

4. Can we use our understanding of the human visual system to aid design of next-generation computer vision systems?

• Visual perception is a complex process that unfolds over time, typically occurring at a level below conscious awareness.

• People are often unaware of the details of how they perform many tasks, including gathering visual information from the environment.

• By monitoring the eye movement patterns of observers as they perform a task, we can learn about task strategy and performance.

Introduction

Fundamental Limitations

1. What are the fundamental limitations of the visual system?

There were evolutionary pressures for high-acuity vision (human as predator) and a wide field of view (human as prey).

Even if the entire cortex were devoted to vision, there are not sufficient resources to represent a large visual field at high acuity.

The Design of the Visual System

The solution favored by nature represented a compromise between the two demands.

The foveal compromise makes use of:

A. Anisotropic sampling of the scene

B. Serial execution (task switching)

C. Limited internal representations

D. Focused attention

The Foveal Compromise

The foveal compromise

High-acuity central fovea

Limited-acuity periphery

A. Anisotropic Sampling of the Visual Field

[Figure: photoreceptor density across the retina, from periphery through center to periphery]

If you can read this you must be cheating. +

Anisotropic Sampling of the Visual Field

The visual field must be sampled by the high-acuity fovea:

If you can read this you must be cheating

The foveal compromise requires a mechanism for moving the eyes about the scene.

Anisotropic Sampling of the Visual Field
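The gap between a high-acuity fovea and a limited-acuity periphery can be illustrated with a simple inverse-linear falloff. This is a common textbook approximation, not a model taken from these slides; the function name and the half-acuity eccentricity `e2_deg` (set to 2 degrees here) are assumptions chosen for illustration.

```python
def relative_acuity(eccentricity_deg, e2_deg=2.0):
    """Relative acuity (1.0 at the fovea) under an inverse-linear falloff.

    e2_deg is the eccentricity at which acuity drops to half its foveal
    value. The default of 2.0 degrees is an assumed, illustrative value.
    """
    if eccentricity_deg < 0:
        raise ValueError("eccentricity must be non-negative")
    return 1.0 / (1.0 + eccentricity_deg / e2_deg)


if __name__ == "__main__":
    # Print how quickly acuity falls away from the point of fixation.
    for ecc in (0, 2, 10, 40):
        print(f"{ecc:>2} deg from fixation: relative acuity {relative_acuity(ecc):.2f}")
```

Under these assumptions, text only 10 degrees from fixation is seen at roughly one sixth of foveal acuity, which is why the eyes must move to read it.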

Outline

1. Fundamental limitations

2. What strategies are employed to compensate for those limitations?

Each eye has three agonist-antagonist muscle pairs to rotate the eye horizontally, vertically, and about the optical axis.

Foveal Compromise: Eye Movements

Types of Eye Movements

Smooth pursuit: match object motion

Vestibular-ocular response: compensate for self-motion

Vergence: merge images at different distances

Saccades: move fovea to new location

Background: Eye Movement Types

Image stabilization:

Smooth pursuit

Vestibular-ocular response

Vergence

Image destabilization:

Saccades: shift the fovea to a new image region

• Saccades

Amplitude: < 1° to > 45° of visual angle

Velocity: > 600°/second

Frequency: ~3-4/second (> 150,000/day)

Saccades are made to targets requiring high spatial resolution and to the locus of attention.

Destabilizing Eye Movements
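The daily figure follows from the per-second rate with a little arithmetic. The sketch below assumes 16 waking hours per day and a constant saccade rate; both are simplifying assumptions, and the function name is illustrative.

```python
def saccades_per_day(rate_per_second, waking_hours=16.0):
    """Estimate daily saccade count from a constant per-second rate.

    waking_hours=16 is an assumption; the slides give only the rate
    and the resulting daily order of magnitude.
    """
    seconds_awake = waking_hours * 3600
    return rate_per_second * seconds_awake


if __name__ == "__main__":
    for rate in (3, 4):
        print(f"{rate}/second -> {saccades_per_day(rate):,.0f} saccades/day")
```

At 3 to 4 saccades per second this gives roughly 170,000 to 230,000 per day, consistent with the "> 150,000/day" figure.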

B. Serial Execution: Sequential Sampling

Serial Execution: Foveations

With each eye movement, the fovea ‘slides under’ a new portion of the retinal image.

A new portion of the image is sampled, but each new sample is centered on the fovea.

C. Internal Representation

B

A

If a high-acuity internal representation is built up over multiple fixations, it should be easy to detect even small differences between images.

Internal Representation

Following are two versions of the school children image, separated by a blank slide. There is a difference between the two; your task is to identify it. View them in alternation, trying to find the difference. The difference is clearly visible in the slide at the end.

Internal Representation

A

View ~3 sec, then advance

View ~1/2 sec, then continue

B

View ~3 sec, then REVERSE

A

Compare to previous slide

Something beyond variable acuity is responsible.

Deploying attention to different areas in sequence conserves limited resources.

Changes to the scene can be made to unattended regions without affecting conscious perception.

In nature, such changes usually induce apparent motion, drawing attention to the region.

Limited Neural Resources

The limited-acuity periphery must be sampled by the high-acuity fovea, resulting in serial data acquisition.

The eye movements guiding that acquisition are externally observable markers of acuity demands, deployment of attention, and perceptual strategies.

Serial Execution: Eye Movements
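To use eye movements as markers, a gaze recording first has to be split into saccades and fixations. One widely used approach is a velocity threshold; the sketch below is a minimal version of that idea, not the method used in this work. The function name, the input format (gaze positions in degrees of visual angle), and the 30°/second threshold are assumptions for illustration.

```python
import math


def classify_samples(gaze_deg, sample_rate_hz, velocity_threshold_deg_s=30.0):
    """Label each interval between consecutive gaze samples.

    gaze_deg: list of (x, y) gaze positions in degrees of visual angle.
    Returns one label per interval: 'saccade' if the point-to-point speed
    exceeds the threshold, otherwise 'fixation'.
    The 30 deg/s default is an assumed, illustrative threshold.
    """
    labels = []
    for (x0, y0), (x1, y1) in zip(gaze_deg, gaze_deg[1:]):
        # Distance travelled in one sample period, converted to deg/second.
        speed = math.hypot(x1 - x0, y1 - y0) * sample_rate_hz
        labels.append("saccade" if speed > velocity_threshold_deg_s else "fixation")
    return labels


if __name__ == "__main__":
    # Hypothetical 60 Hz recording: steady gaze, a 10-degree jump, steady gaze.
    samples = [(0.0, 0.0), (0.1, 0.0), (0.1, 0.1), (5.0, 0.1), (10.0, 0.1), (10.1, 0.1)]
    print(classify_samples(samples, sample_rate_hz=60))
```

Consecutive 'fixation' intervals can then be merged into fixations with a start time, duration, and mean position for later analysis.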

Serial Execution: Image Preference

3 sec viewing

Outline

1. Fundamental limitations

2. Strategies to compensate for limitations

3. Can we build tools that take advantage of those strategies to inform the design and evaluation of imaging systems?

Measuring eye movements

The Problem:

“After all, the eye is sitting in a bag of fat in a hole in your head, and there are six big muscles pulling on it.”

Cornsweet, 1976

The Solution:

“Barlow photographed a droplet of mercury placed on the limbus. Translations of the head were minimized by having subjects lie on a stone slab with their heads wedged tightly inside a rigid iron frame”

Kowler, 1990

Measuring eye movements

Video-based eyetracker

Limbus eyetracker

Scleral eye-coils

Dual Purkinje eyetracker

Infrared / Video

Headband-mounted eyetracker

Head-mounted eyetracker

Infrared, Video-based Eyetrackers

• Bright Pupil; On-axis Illumination

IRED

IR camera

Remote eyetracker

Infrared / Video

Remote-head eyetracker

Change Blindness

Human Computer Interface

= 250 ms

Visualization

Image & Subject Dependence

Radiographic Search: Scanpath

Radiographic Search: Fixation Density
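A fixation-density display of the kind named in the slide title can be approximated by accumulating fixation durations on a coarse grid over the image. This is a minimal sketch under assumed conventions: fixations are given as (x, y, duration in ms) tuples in pixel coordinates, and the function name and 50-pixel cell size are illustrative choices, not details from the slides.

```python
def fixation_density(fixations, width, height, cell=50):
    """Accumulate fixation durations into a grid of cell x cell pixel bins.

    fixations: iterable of (x_px, y_px, duration_ms) tuples (assumed format).
    Returns a list of rows; grid[row][col] holds total dwell time in ms.
    Fixations that fall outside the image are ignored.
    """
    cols = -(-width // cell)   # ceiling division
    rows = -(-height // cell)
    grid = [[0.0] * cols for _ in range(rows)]
    for x, y, duration_ms in fixations:
        if 0 <= x < width and 0 <= y < height:
            grid[int(y // cell)][int(x // cell)] += duration_ms
    return grid


if __name__ == "__main__":
    # Hypothetical fixations on a 200 x 100 pixel image.
    demo = [(10, 10, 200), (20, 30, 300), (120, 60, 250)]
    for row in fixation_density(demo, width=200, height=100):
        print(row)
```

Weighting by duration rather than counting fixations emphasizes regions that held the observer's gaze longest; either choice is reasonable depending on the question being asked.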

Measuring eye movements

These commercially available eyetrackers are restricted to laboratory use.

The ability to monitor perception as people perform real tasks in the real world would allow us to ask new kinds of questions.

RIT Wearable Eyetracker

color CMOS scene camera

calibration LASER

hot mirror

folding mirror

IR illuminator/optics module

monochrome CMOS eye camera

RIT Wearable Eyetracker

Fixation Sequence Before Image Capture

Complex, Familiar Tasks

Outline

1. Fundamental limitations

2. Strategies to compensate for limitations

3. Build design and evaluation tools

4. Can we use our understanding of the human visual system to aid design of next-generation computer vision systems?

Because vision is effortless for humans, computer vision was chosen as an early research domain.

Early attempts at computer vision systems attacked the problem by brute force, with limited success:

Tried Image Understanding on static 2D images (“From Pixels to Predicates”)

Motivation

Even in the face of Moore’s Law, computers will not have sufficient power in the foreseeable future to solve “vision” by brute force.

Computer-based perception faces the same fundamental challenge that human perception did during evolution:

limited computational resources

Limited Computational Resources

The solution favored by nature:

A. Anisotropic sampling of the scene

B. Serial execution (task switching)

C. Limited internal representations

D. Focused attention

The Foveal Compromise

Sensorial Experience

High-level Visual Perception

Attentional Mechanisms

Eye Movements

Motivation: Cognitive Science

Human Cognition

Attentional Mechanisms

Eye Movements

Motivation: Cognitive Science

Artificial Intelligence

Computer Vision

“Active Vision”

Human Cognition

Sensorial Experience

High-level Visual Perception

Inspiration - Active Vision

Active vision was the first step. Unlike traditional approaches to computer vision, active vision systems focused on extracting information from dynamic, 3D scenes.

CS @ U Penn

Vision & robotics @ UR

Aloimonos, 1987 Bajcsy, 1988

Ballard, 1989 Brooks, 1991

Active Vision

Inspired by anisotropic, binocular vision in humans, researchers built neuromorphic vision systems that took advantage of ‘active’ cameras.

Humanoid robotics @ MIT

Vision & robotics @ UR

Inspiration - “Active Vision”

Visual routines were an important component of the Active Vision approach. Pre-defined routines are scheduled and run to extract information when and where it is needed.

Limited representation + task-switching

The deployment of attention and eye movements is controlled below conscious awareness; there must be mechanisms (strategies) that protect us from the constraints of visual perception in the real world, and that help us make sense of the incomplete data available.

Perceptual Strategies

Beyond the mechanics of how the eyes move during real tasks, we are interested in strategies that may support a conscious perception that is continuous temporally as well as spatially.

Perceptual strategies

Goal - “Strategic Vision”

Strategic Vision can use high-level, top-down strategies for extracting information from complex environments.

One goal of our research is to study human behavior in natural, complex tasks to search for visual routines that emerge under real-world constraints.

Perceptual Strategies

Limited representations: Successive Foveations

[Image sequence at 0, 770, 1400, 2000, 2700, and 2800 msec]

[Figure: timeline with labels "guiding fixation", "look-ahead fixation", and "interaction"; intervals of 2000 msec and 800 msec]

Perceptual Strategies: Look-ahead fixations

[Figure: sub-task and fixation timelines in milliseconds. Interposed look-ahead: axis from 0 to 5000, with intervening tasks. Sequenced look-ahead: axis from 2000 to 7000.]

Perceptual Strategies: Look-ahead fixations

Humans employ strategies to ease the computational and memory loads inherent in complex tasks. Look-ahead fixations represent one such strategy:

Opportunistic execution of information-gathering visual routines to pre-fetch information needed for future subtasks.

Perceptual Strategies: Look-ahead fixations
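One way to make the distinction concrete is to label each fixation by how long it precedes the next interaction with the fixated object. The sketch below is an illustration under stated assumptions, not the analysis used in this work: the function name, the (time, object) input format, the object names, and the 1000 ms "guiding" window are all hypothetical.

```python
def label_fixations(fixations, interactions, guiding_window_ms=1000):
    """Label fixations as 'guiding', 'look-ahead', or 'other'.

    fixations, interactions: lists of (start_ms, object_name) tuples
    (assumed format). A fixation is 'guiding' if the next interaction with
    the same object starts within guiding_window_ms, 'look-ahead' if that
    interaction comes later, and 'other' if no such interaction follows.
    The 1000 ms window is an assumed threshold for illustration only.
    """
    labels = []
    for t_fix, obj in fixations:
        upcoming = [t for t, o in interactions if o == obj and t >= t_fix]
        if not upcoming:
            labels.append("other")
        elif min(upcoming) - t_fix <= guiding_window_ms:
            labels.append("guiding")
        else:
            labels.append("look-ahead")
    return labels


if __name__ == "__main__":
    # Hypothetical data: object_b is glanced at well before it is used.
    fixations = [(0, "object_a"), (300, "object_b"), (2500, "object_b"), (4000, "object_c")]
    interactions = [(500, "object_a"), (3000, "object_b")]
    print(label_fixations(fixations, interactions))
```

A fuller analysis would also check that fixations on other objects or intervening sub-tasks fall between the look-ahead and the interaction, as the timelines on the slides suggest.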

• Monitoring eye movements gives us a window into perception and cognition that can reveal details not available even to the observer.

• Visual strategies observed can help us understand how people use vision in their interaction with the world, and perhaps aid in the design of artificial systems that take advantage of this knowledge.

Conclusions

Conclusions

Tools that monitor subjects’ eye movements can aid in the design and evaluation of imaging systems.

The design of next-generation computer vision systems may be aided by implementing algorithms derived by understanding the strategies employed by the human visual system to compensate for limited computational resources.

Questions?
