IIIT-B Computer Vision, Fall 2006 Lecture 1 Introduction to Computer Vision Arvind Lakshmikumar Technology Manager, Sarnoff Corporation Adjunct Faculty, IIIT-B


IIIT-B Computer Vision, Fall 2006

Lecture 1 Introduction to Computer Vision

Arvind Lakshmikumar

Technology Manager, Sarnoff Corporation

Adjunct Faculty, IIIT-B


Course Overview

• Introduction to vision
• Case Studies of Applied Vision
  – Automotive Safety
  – Autonomous Navigation
  – Industrial Inspection
  – Medical Imaging
  – Entertainment
• Image Formation
• About Cameras
• Image Processing
• Geometric Vision
• Camera Motion
• Paper readings


Computer Graphics

[Diagram: Model → Synthetic Camera → Image (output)]

(slides courtesy of Michael Cohen)


Computer Vision

[Diagram: Real Scene → Real Cameras → Model (output)]

(slides courtesy of Michael Cohen)


Combined

[Diagram: Real Scene → Real Cameras → Model → Synthetic Camera → Image (output)]

(slides courtesy of Michael Cohen)


The Vision Problem

How to infer salient properties of the 3-D world from its time-varying 2-D image projection

• What is salient?
• How to deal with loss of information going from 3-D to 2-D?


Why study Computer Vision?

• Images and movies are everywhere
• Fast-growing collection of useful applications
  – building representations of the 3D world from pictures
  – automated surveillance (who’s doing what)
  – movie post-processing
  – face finding
• Various deep and attractive scientific mysteries
  – how does object recognition work?
• Greater understanding of human vision


Properties of Vision

• One can “see the future”
  – Cricketers avoid being hit in the head
    • There’s a reflex: when the right eye sees something going left, and the left eye sees something going right, move your head fast.
  – Gannets pull their wings back at the last moment
    • Gannets are diving birds; they must steer with their wings, but wings break unless pulled back at the moment of contact.
    • Area of target over rate of change of area gives time to contact (up to a constant factor).
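The time-to-contact cue can be sketched in a few lines. This is a minimal illustration, not something from the slides: it assumes a pinhole camera, a frontal target, and a constant approach speed, so that image area grows as 1/distance². Under those assumptions the ratio area / (rate of change of area) equals half the time to contact, hence the factor 2. The names `time_to_contact` and `image_area` and all numbers are hypothetical.

```python
def time_to_contact(area, area_rate):
    """Time to contact from image area A and its rate of change dA/dt.

    Assumes A is proportional to 1/distance**2 and constant approach speed,
    which gives tau = 2 * A / (dA/dt).
    """
    if area_rate <= 0:
        raise ValueError("target is not approaching (image area is not growing)")
    return 2.0 * area / area_rate


def image_area(t, start_distance=10.0, speed=2.0, scale=1.0):
    """Simulated image area of a target approaching at constant speed."""
    distance = start_distance - speed * t
    return scale / distance ** 2


# Measure the area and its rate of change at t = 1 s (central difference).
t, dt = 1.0, 1e-4
area = image_area(t)
area_rate = (image_area(t + dt) - image_area(t - dt)) / (2 * dt)
tau = time_to_contact(area, area_rate)

# At t = 1 s the target is 8 units away and closing at 2 units/s,
# so tau should be close to 4 s.
print(f"estimated time to contact: {tau:.3f} s")
```

Note that the estimate needs neither the true size of the target nor its distance, only quantities measured in the image.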


Properties of Vision

• 3D representations are easily constructed
  – There are many different cues.
  – Useful
    • to humans (avoid bumping into things; planning a grasp; etc.)
    • in computer vision (build models for movies).
  – Cues include
    • multiple views (motion, stereopsis)
    • texture
    • shading


Properties of Vision

• People draw distinctions between what is seen

– “Object recognition”

– This could mean “is this a fish or a bicycle?”

– It could mean “is this George Washington?”

– It could mean “is this someone I know?”

– It could mean “is this poisonous or not?”

– It could mean “is this slippery or not?”

– It could mean “will this support my weight?”

– Great mystery

• How to build programs that can draw useful distinctions based on image properties.


Part I: The Physics of Imaging

• How images are formed
  – Cameras
    • What a camera does
    • How to tell where the camera was
  – Light
    • How to measure light
    • What light does at surfaces
    • How the brightness values we see in cameras are determined
  – Color
    • The underlying mechanisms of color
    • How to describe it and measure it


Part II: Early Vision in One Image

• Representing small patches of image
  – For three reasons
    • We wish to establish correspondence between (say) points in different images, so we need to describe the neighborhood of the points
    • Sharp changes are important in practice; these are known as “edges”
    • Representing texture by giving some statistics of the different kinds of small patch present in the texture.
      – Tigers have lots of bars, few spots
      – Leopards are the other way


Representing an image patch

• Filter outputs
  – essentially form a dot-product between a pattern and an image, while shifting the pattern across the image
  – strong response -> image locally looks like the pattern
  – e.g. derivatives measured by filtering with a kernel that looks like a big derivative (bright bar next to dark bar)
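The "dot-product while shifting the pattern" idea can be written out directly. The sketch below is illustrative only and assumes NumPy; the function name `filter_response`, the tiny test image, and the two-pixel kernel are all made up for the example. It slides the kernel over the image without flipping it (correlation), and the response is strongest where the image locally looks like the kernel.

```python
import numpy as np


def filter_response(image, kernel):
    """Dot product between the kernel and every image patch of the same size.

    Only positions where the kernel fits entirely inside the image are kept.
    """
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + kh, c:c + kw]
            out[r, c] = np.sum(patch * kernel)
    return out


# A 5x6 image with a vertical edge: dark on the left, bright on the right.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# A kernel that looks like a horizontal derivative: dark bar next to bright bar.
kernel = np.array([[-1.0, 1.0]])

response = filter_response(image, kernel)
# The response is nonzero only in the column where the dark-to-bright step sits.
print(response)
```

A real system would use an optimized convolution routine rather than Python loops; the loops are kept here to make the shifting dot product explicit.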


[Figure: an image, a kernel, and the result of convolving the image with the kernel]


Texture

• Many objects are distinguished by their texture
  – Tigers, cheetahs, grass, trees
• We represent texture with statistics of filter outputs
  – For tigers, bar filters at a coarse scale respond strongly
  – For cheetahs, spots at the same scale
  – For grass, long narrow bars
  – For the leaves of trees, extended spots
• Objects with different textures can be segmented
• The variation in textures is a cue to shape
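As a rough sketch of "statistics of filter outputs", the example below summarizes a texture by the mean squared response of two small filters. Everything here is an assumption made for illustration: the two derivative kernels stand in for a real filter bank of bars and spots at several scales, the striped test images are synthetic, and `filter_energy` and `texture_descriptor` are hypothetical names. It assumes NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def filter_energy(image, kernel):
    """Mean squared filter response over all positions where the kernel fits."""
    windows = sliding_window_view(image, kernel.shape)
    responses = np.einsum("ijkl,kl->ij", windows, kernel)
    return float(np.mean(responses ** 2))


# Two toy filters: one responds to left-right changes, one to up-down changes.
HORIZONTAL_CHANGE = np.array([[-1.0, 1.0]])
VERTICAL_CHANGE = np.array([[-1.0], [1.0]])


def texture_descriptor(image):
    """A two-number texture summary: energy of each filter's output."""
    return (filter_energy(image, HORIZONTAL_CHANGE),
            filter_energy(image, VERTICAL_CHANGE))


# Synthetic textures: vertical bars and the same pattern rotated by 90 degrees.
vertical_bars = np.tile(np.array([0.0, 0.0, 1.0, 1.0]), (8, 2))
horizontal_bars = vertical_bars.T

print("vertical bars:  ", texture_descriptor(vertical_bars))
print("horizontal bars:", texture_descriptor(horizontal_bars))
```

The two textures have identical pixel histograms, yet their descriptors differ, which is the sense in which filter statistics can separate regions for segmentation.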


Part III: Early Vision in Multiple Images

• The geometry of multiple views
  – Where could it appear in camera 2 (3, etc.) given it was here in 1 (1 and 2, etc.)?
• Stereopsis
  – What we know about the world from having 2 eyes
• Structure from motion
  – What we know about the world from having many eyes
    • or, more commonly, our eyes moving.
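One concrete instance of what two eyes provide is depth from disparity. The sketch below is not from the slides; it assumes the simplest textbook setup, two identical pinhole cameras with parallel optical axes (a rectified stereo pair), where depth = focal length × baseline / disparity. The function name and the numbers are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth of a point seen by a rectified stereo pair.

    disparity_px: horizontal shift of the point between the two images (pixels)
    focal_px:     focal length expressed in pixels
    baseline_m:   distance between the two camera centers (meters)
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers: eye-like baseline of 6.5 cm, focal length of 800 pixels.
near = depth_from_disparity(disparity_px=20.0, focal_px=800.0, baseline_m=0.065)
far = depth_from_disparity(disparity_px=2.0, focal_px=800.0, baseline_m=0.065)
print(f"near point: {near:.2f} m, far point: {far:.2f} m")
```

Larger disparity means a closer point, and distant points produce disparities too small to measure reliably, which is one reason motion (a much longer effective baseline) is also useful.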


Part IV: Mid-Level Vision

• Finding coherent structure so as to break the image or movie into big units
  – Segmentation:
    • Breaking images and videos into useful pieces
    • E.g. finding video sequences that correspond to one shot
    • E.g. finding image components that are coherent in internal appearance
  – Tracking:
    • Keeping track of a moving object through a long sequence of views


Part V: High Level Vision (Geometry)

• The relations between object geometry and image geometry
  – Model based vision
    • find the position and orientation of known objects
  – Smooth surfaces and outlines
    • how the outline of a curved object is formed, and what it looks like
  – Aspect graphs
    • how the outline of a curved object moves around as you view it from different directions
  – Range data


Part VI: High Level Vision (Probabilistic)

• Using classifiers and probability to recognize objects
  – Templates and classifiers
    • how to find objects that look the same from view to view with a classifier
  – Relations
    • break up objects into big, simple parts, find the parts with a classifier, and then reason about the relationships between the parts to find the object.
  – Geometric templates from spatial relations
    • extend this trick so that templates are formed from relations between much smaller parts


Applications: Factory Inspection

Cognex’s “CapInspect” system:

Low-level image analysis: Identify edges, regions

Mid-level: Distinguish “cap” from “no cap”

Estimation: What are the orientation of the cap and the height of the liquid?


Applications: Face Detection

courtesy of H. Rowley

How is this like the bottle problem on the previous slide?


Applications: Text Detection & Recognition

from J. Zhang et al.

Similar to face finding: Where is the text and what does it say?

Viewing at an angle complicates things...


Applications: MRI Interpretation

Coronal slice of brain; segmented white matter (from W. Wells et al.)


Detection and Recognition: How?

• Build models of the appearance characteristics (color, texture, etc.) of all objects of interest

• Detection: Look for areas of image with sufficiently similar appearance to a particular object

• Recognition: Decide which of several objects is most similar to what we see

• Segmentation: “Recognize” every pixel
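The four bullets above can be sketched with the simplest possible appearance model: one mean color per object. This is a toy illustration under stated assumptions, not how any of the systems on these slides work; the object names, the colors in `MODELS`, and the distance threshold are invented, and real appearance models would include texture and much more.

```python
import numpy as np

# Hypothetical appearance models: one mean RGB color (values in 0..1) per object.
MODELS = {
    "grass": np.array([0.1, 0.6, 0.1]),
    "sky": np.array([0.4, 0.6, 0.9]),
    "ball": np.array([0.9, 0.9, 0.9]),
}


def recognize(color):
    """Recognition: which of several objects is most similar to what we see?"""
    return min(MODELS, key=lambda name: np.linalg.norm(color - MODELS[name]))


def detect(image, name, max_distance=0.2):
    """Detection: mask of pixels whose appearance is close enough to one object."""
    return np.linalg.norm(image - MODELS[name], axis=-1) <= max_distance


def segment(image):
    """Segmentation: 'recognize' every pixel, returning an array of object names."""
    names = list(MODELS)
    distances = np.stack(
        [np.linalg.norm(image - MODELS[n], axis=-1) for n in names], axis=-1
    )
    return np.array(names)[np.argmin(distances, axis=-1)]


# A 1x2 image: one grass-like pixel and one sky-like pixel.
image = np.array([[[0.12, 0.58, 0.10], [0.42, 0.62, 0.88]]])
print(recognize(np.array([0.95, 0.90, 0.85])))
print(detect(image, "grass"))
print(segment(image))
```

Detection uses a threshold because the object may be absent; recognition and segmentation always pick the closest model, even when nothing fits well.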


Applications: Football First-Down Line

courtesy of Sportvision


Applications: Virtual Advertising

courtesy of Princeton Video Image


First-Down Line, Virtual Advertising: How?

• Where should the message go?
  – Sensors that measure pan, tilt, zoom and focus are attached to calibrated cameras at surveyed positions
  – Knowledge of the 3-D position of the line, advertising rectangle, etc. can be directly translated into where in the image it should appear for a given camera
• What pixels get painted?
  – Occluding image objects like the ball, players, etc. where the graphic is to be put must be segmented out. These are recognized by being a sufficiently different color from the background at that point. This allows pixel-by-pixel compositing.
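Both steps can be sketched under simple assumptions. The code below is an illustration, not Sportvision's or Princeton Video Image's method: it assumes an ideal pinhole camera whose rotation, position, and focal length are already known (in practice derived from the pan/tilt/zoom sensors and calibration), and it treats any pixel far from a single expected background color as an occluder. All function names, parameters, and values are hypothetical.

```python
import numpy as np


def project_point(point_world, rotation, camera_center, focal_px, principal_point):
    """Where does a known 3-D point appear in the image of a calibrated camera?"""
    p_cam = rotation @ (point_world - camera_center)
    if p_cam[2] <= 0:
        raise ValueError("point is behind the camera")
    u = focal_px * p_cam[0] / p_cam[2] + principal_point[0]
    v = focal_px * p_cam[1] / p_cam[2] + principal_point[1]
    return np.array([u, v])


def composite(frame, graphic, graphic_mask, background_color, tolerance=0.15):
    """Paint the graphic only where the frame still shows the background color.

    Pixels that differ too much from the background (players, ball) are left
    untouched, so they appear in front of the graphic.
    """
    is_background = np.linalg.norm(frame - background_color, axis=-1) < tolerance
    paint = graphic_mask & is_background
    out = frame.copy()
    out[paint] = graphic[paint]
    return out


# Projection: a point 10 m in front of an axis-aligned camera, 1 m to the right.
pixel = project_point(
    point_world=np.array([1.0, 0.0, 10.0]),
    rotation=np.eye(3),
    camera_center=np.zeros(3),
    focal_px=1000.0,
    principal_point=(640.0, 360.0),
)

# Compositing on a 1x3 frame: grass, grass, white (an occluding player).
grass = np.array([0.1, 0.6, 0.1])
frame = np.array([[grass, grass, [0.9, 0.9, 0.9]]])
graphic = np.tile(np.array([1.0, 1.0, 0.0]), (1, 3, 1))  # yellow line
graphic_mask = np.array([[True, False, True]])           # where the line projects
result = composite(frame, graphic, graphic_mask, background_color=grass)
print(pixel)
print(result)
```

In the result, the first pixel becomes yellow, the second stays grass because the line does not cover it, and the third stays white because it is an occluder.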


Applications: Inserting Computer Graphics with a Moving Camera

How does motion complicate things?

Opening titles from the movie “Panic Room”


Applications: Inserting Computer Graphics with a Moving Camera

courtesy of 2d3


CG Insertion with a Moving Camera: How?

• This technique is often called matchmove
• Once again, we need camera calibration, but also information on how the camera is moving (its egomotion). This allows the CG object to correctly move with the real scene, even if we don’t know the 3-D parameters of that scene.
• Estimating camera motion:
  – Much simpler if we know camera is moving sideways (e.g., some of the “Panic Room” shots), because then the problem is only 2-D
  – For general motions: By identifying and following scene features over the entire length of the shot, we can solve retrospectively for what 3-D camera motion would be consistent with their 2-D image tracks. Must also make sure to ignore independently moving objects like cars and people.
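For the simple sideways case, a minimal sketch of estimating the 2-D motion from feature tracks might look as follows. It makes strong assumptions that are not in the slides: the tracked features lie at roughly the same depth, so the static scene shifts by one common 2-D offset between frames, and fewer than half of the features belong to independently moving objects. The median is a swapped-in robust estimator chosen for brevity; production matchmove tools solve a full 3-D problem. All names and numbers are hypothetical.

```python
import numpy as np


def estimate_image_shift(points_prev, points_next):
    """Robust estimate of the common 2-D shift between two sets of tracked points.

    The per-coordinate median ignores a minority of features that move
    differently from the static scene.
    """
    return np.median(points_next - points_prev, axis=0)


def static_feature_mask(points_prev, points_next, shift, max_residual=2.0):
    """True for features whose motion is explained by the estimated shift."""
    residuals = np.linalg.norm(points_next - points_prev - shift, axis=1)
    return residuals <= max_residual


# Five tracked features; the last one sits on a car that moves on its own.
points_prev = np.array([[100.0, 50.0], [220.0, 80.0], [310.0, 200.0],
                        [400.0, 120.0], [150.0, 300.0]])
points_next = points_prev + np.array([3.0, 0.0])
points_next[4] = points_prev[4] + np.array([10.0, -4.0])

shift = estimate_image_shift(points_prev, points_next)
static = static_feature_mask(points_prev, points_next, shift)
print("estimated shift:", shift)
print("static features:", static)
```

A CG element anchored to the scene would be moved by the estimated shift each frame, while the flagged feature is excluded from later estimates.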


Applications: Rotoscoping

2d3’s Pixeldust


Applications: Motion Capture

Vicon software: 12 cameras, 41 markers for body capture; 6 zoom cameras, 30 markers for face


Applications: Motion Capture without Markers

courtesy of C. Bregler

What’s the difference between these two problems?


Motion Capture: How?

• Similar to matchmove in that we follow features and estimate underlying motion that explains their tracks

• Difference is that the motion is not of the camera but rather of the subject (though camera could be moving, too)
  – Face/arm/person has more degrees of freedom than camera flying through space, but still constrained

• Special markers make feature identification and tracking considerably easier

• Multiple cameras gather more information


Applications: Image-Based Modeling

courtesy of P. Debevec

Façade project: UC Berkeley Campanile


Image-Based Modeling: How?

• 3-D model constructed from manually-selected line correspondences in images from multiple calibrated cameras

• Novel views generated by texture-mapping selected images onto model


Applications: Robotics

Autonomous driving: Lane & vehicle tracking (with radar)


Why is Vision Interesting?

• Psychology
  – ~50% of cerebral cortex is for vision.
  – Vision is how we experience the world.
• Engineering
  – Want machines to interact with world.
  – Digital images are everywhere.


Vision is inferential: Light

(http://www-bcs.mit.edu/people/adelson/checkershadow_illusion.html)



Vision is Inferential: Geometry

(movies: knill-movie.swf, plaid-movie.swf)


Computer Vision

• Inference → Computation

• Building machines that see

• Modeling biological perception


Boundary Detection: Local cues



Boundary Detection

http://www.robots.ox.ac.uk/~vdg/dynamics.html


Boundary Detection

Finding the Corpus Callosum

(G. Hamarneh, T. McInerney, D. Terzopoulos)


(Sharon, Galun, Brandt, Basri)


Texture

[Figure panels: Photo; Pattern Repeated]


Texture

[Figure panels: Computer Generated; Photo]


Tracking

(Comaniciu and Meer)


Understanding Action


Tracking and Understanding

(www.brickstream.com)


Tracking



Stereo

http://www.magiceye.com/



Motion

Courtesy Yiannis Aloimonos


Motion - Application

(www.realviz.com)


Pose Determination

Visually guided surgery


Recognition - Shading

Lighting affects appearance


Classification

(Funkhouser, Min, Kazhdan, Chen, Halderman, Dobkin, Jacobs)


Viola and Jones: Real time Face Detection


Modeling + Algorithms

• Build a simple model of the world (e.g., flat, uniform intensity).

• Find provably good algorithms.

• Experiment on real world.

• Update model.

Problem: Too often models are simplistic or intractable.


Bayesian inference

• Bayes’ law: P(A|B) = P(B|A)*P(A)/P(B).
• P(world|image) = P(image|world)*P(world)/P(image)
• P(image|world) is computer graphics
  – Geometry of projection.
  – Physics of light and reflection.
• P(world) means modeling objects in world. Leads to statistical/learning approaches.

Problem: Too often probabilities can’t be known and are invented.
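To make the formula concrete, here is a minimal sketch of Bayesian inference over a small, discrete set of candidate worlds. The two hypotheses loosely echo the checker-shadow illusion shown earlier; the prior and likelihood numbers are invented for illustration (which is exactly the problem the slide warns about), and `posterior` is a hypothetical helper name.

```python
def posterior(prior, likelihood):
    """P(world | image) for each candidate world.

    prior:      dict mapping world -> P(world)
    likelihood: dict mapping world -> P(image | world) for the observed image
    """
    joint = {world: likelihood[world] * prior[world] for world in prior}
    evidence = sum(joint.values())  # P(image), summed over the candidate worlds
    if evidence == 0:
        raise ValueError("the observed image is impossible under every candidate world")
    return {world: p / evidence for world, p in joint.items()}


# Invented numbers: both worlds are equally likely a priori, but the observed
# gray level is three times as likely under the first one.
prior = {"white square in shadow": 0.5, "gray square in light": 0.5}
likelihood = {"white square in shadow": 0.6, "gray square in light": 0.2}

result = posterior(prior, likelihood)
for world, p in result.items():
    print(f"P({world} | image) = {p:.2f}")
```

With these numbers the posterior is 0.75 for the first world and 0.25 for the second. In real vision problems the set of worlds is enormous, so the sum that forms P(image) cannot be enumerated like this.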


Related Fields

• Graphics. “Vision is inverse graphics”.
• Visual perception.
• Neuroscience.
• AI
• Learning
• Math: e.g., geometry, stochastic processes.
• Optimization.