COMP 776: Computer Vision
Today
• Introduction to computer vision
• Course overview
• Course requirements
The goal of computer vision
• To bridge the gap between pixels and “meaning”
What we see
What a computer sees
Source: S. Narasimhan
What kind of information can we extract from an image?
• Metric 3D information
• Semantic information
Vision as measurement device
Real-time stereo (NASA Mars Rover)
Structure from motion (Pollefeys et al.)
Reconstruction from Internet photo collections (Goesele et al.)
Vision as a source of semantic information
slide credit: Fei-Fei, Fergus & Torralba
Object categorization
Labels: sky, building, flag, wall, banner, face, bus, street lamp, cars
slide credit: Fei-Fei, Fergus & Torralba
Scene and context categorization
• outdoor
• city
• traffic
• …
slide credit: Fei-Fei, Fergus & Torralba
Qualitative spatial information
Labels: slanted, vertical, horizontal, rigid moving object, non-rigid moving object
slide credit: Fei-Fei, Fergus & Torralba
Why study computer vision?
• Vision is useful: Images and video are everywhere!
Personal photo albums Movies, news, sports
Surveillance and security Medical and scientific images
Why study computer vision?
• Vision is useful
• Vision is interesting
• Vision is difficult
– Half of primate cerebral cortex is devoted to visual processing
– Achieving human-level visual perception is probably “AI-complete”
Why is computer vision difficult?
Challenges: viewpoint variation
Michelangelo 1475-1564
slide credit: Fei-Fei, Fergus & Torralba
Challenges: illumination
image credit: J. Koenderink
Challenges: scale
slide credit: Fei-Fei, Fergus & Torralba
Challenges: deformation
Xu, Beihong 1943
slide credit: Fei-Fei, Fergus & Torralba
Challenges: occlusion
Magritte, 1957
slide credit: Fei-Fei, Fergus & Torralba
Challenges: background clutter
Challenges: Motion
Challenges: object intra-class variation
slide credit: Fei-Fei, Fergus & Torralba
Challenges: local ambiguity
slide credit: Fei-Fei, Fergus & Torralba
Challenges or opportunities?
• Images are confusing, but they also reveal the structure of the world through numerous cues
• Our job is to interpret the cues!
Image source: J. Koenderink
Depth cues: Linear perspective
Depth cues: Aerial perspective
Depth ordering cues: Occlusion
Source: J. Koenderink
Shape cues: Texture gradient
Shape and lighting cues: Shading
Source: J. Koenderink
Position and lighting cues: Cast shadows
Source: J. Koenderink
Grouping cues: Similarity (color, texture, proximity)
Grouping cues: “Common fate”
Image credit: Arthus-Bertrand (via F. Durand)
Bottom line
• Perception is an inherently ambiguous problem
– Many different 3D scenes could have given rise to a particular 2D picture
• Possible solutions
– Bring in more constraints (more images)
– Use prior knowledge about the structure of the world
• Need a combination of different methods
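To make the ambiguity concrete, here is a minimal Python sketch, assuming an ideal pinhole camera at the origin looking down the Z axis. The function names and the toy scene are illustrative assumptions, not part of the lecture. It shows that scaling a scene about the camera center leaves its image unchanged, so a small nearby object and a large distant one can produce the same picture.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (X, Y, Z), Z > 0, onto the image plane."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def scale_scene(points, s):
    """Scale every 3D point about the camera center (the origin) by factor s."""
    return [(s * x, s * y, s * z) for (x, y, z) in points]


def same_image(img_a, img_b, tol=1e-9):
    """True if two lists of 2D image points agree up to a small tolerance."""
    return all(
        abs(ua - ub) <= tol and abs(va - vb) <= tol
        for (ua, va), (ub, vb) in zip(img_a, img_b)
    )


# Toy scene (hypothetical): corners of a square 4 units in front of the camera.
scene = [(-1.0, -1.0, 4.0), (1.0, -1.0, 4.0), (1.0, 1.0, 4.0), (-1.0, 1.0, 4.0)]

# The same square, 10 times larger and 10 times farther away.
big_far_scene = scale_scene(scene, 10.0)

image_a = [project(p) for p in scene]
image_b = [project(p) for p in big_far_scene]

# Two different 3D scenes, one 2D picture.
print(same_image(image_a, image_b))
```

A second view from a different camera position would separate the two scenes, which is one sense in which more images bring in more constraints.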
Connections to other disciplines
Computer Vision is connected to: Artificial Intelligence, Machine Learning, Robotics, Cognitive science, Neuroscience, Computer Graphics, Image Processing
Origins of computer vision
L. G. Roberts, Machine Perception of Three Dimensional Solids, Ph.D. thesis, MIT Department of Electrical Engineering, 1963.
Computer Vision in the Real World
Special effects: shape and motion capture
Source: S. Seitz
3D urban modeling
Bing maps, Google Streetview
Source: S. Seitz
3D urban modeling: Microsoft Photosynth
http://labs.live.com/photosynth/
Source: S. Seitz
Face detection
Many new digital cameras now detect faces
• Canon, Sony, Fuji, …
Source: S. Seitz
Smile detection
Sony Cyber-shot® T70 Digital Still Camera
Source: S. Seitz
Face recognition: Apple iPhoto software
http://www.apple.com/ilife/iphoto/
Biometrics
How the Afghan Girl was Identified by Her Iris Patterns
Source: S. Seitz
Biometrics
Fingerprint scanners on many new laptops, other devices
Face recognition systems now beginning to appear more widely
http://www.sensiblevision.com/
Source: S. Seitz
Optical character recognition (OCR)
Technology to convert scanned docs to text
• If you have a scanner, it probably came with OCR software
Digit recognition, AT&T labs
License plate readers
http://en.wikipedia.org/wiki/Automatic_number_plate_recognition
Source: S. Seitz
Mobile visual search: Google Goggles
Mobile visual search: iPhone Apps
Automotive safety
Mobileye: Vision systems in high-end BMW, GM, Volvo models
• “In mid 2010 Mobileye will launch a world's first application of full emergency braking for collision mitigation for pedestrians where vision is the key technology for detecting pedestrians.”
Source: A. Shashua, S. Seitz
Vision in supermarkets
LaneHawk by EvolutionRobotics
“A smart camera is flush-mounted in the checkout lane, continuously watching for items. When an item is detected and recognized, the cashier verifies the quantity of items that were found under the basket, and continues to close the transaction. The item can remain under the basket, and with LaneHawk, you are assured to get paid for it…”
Source: S. Seitz
Vision-based interaction (and games)
Sony EyeToy
Nintendo Wii has camera-based IR tracking built in. See Lee’s work at CMU on clever tricks on using it to create a multi-touch display!
Source: S. Seitz
Assistive technologies
Vision for robotics, space exploration
NASA's Mars Exploration Rover Spirit captured this westward view from atop a low plateau where Spirit spent the closing months of 2007.
Vision systems (JPL) used for several tasks
• Panorama stitching
• 3D terrain modeling
• Obstacle detection, position tracking
• For more, read “Computer Vision on Mars” by Matthies et al.
Source: S. Seitz
The computer vision industry
• A list of companies here:
http://www.cs.ubc.ca/spider/lowe/vision.html
Course overview
I. Early vision: Image formation and processing
II. Mid-level vision: Grouping and fitting
III. Multi-view geometry
IV. Recognition
V. Advanced topics
I. Early vision
• Basic image formation and processing
Cameras and sensors
Light and color
Linear filtering
Edge detection
Feature extraction: corner and blob detection
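As a preview of the linear filtering and edge detection topics, the following is a minimal pure-Python sketch, not course material. The toy image, the box and Sobel kernels, and the threshold of 2.0 are illustrative assumptions. It convolves an image with a small kernel, then marks pixels where the horizontal derivative is large.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution of a list-of-lists image with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    # Flipping the kernel in both axes is what makes this a convolution
    # rather than a correlation.
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for r in range(ih - kh + 1):
        out_row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * flipped[i][j]
            out_row.append(acc)
        out.append(out_row)
    return out


# Toy 5x6 image: dark left half (0), bright right half (1), so one vertical edge.
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

# 3x3 box (mean) filter for smoothing.
box = [[1 / 9.0] * 3 for _ in range(3)]

# Sobel kernel that responds to intensity changes along the horizontal axis.
sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

smoothed = convolve2d(image, box)
grad_x = convolve2d(image, sobel_x)

# Mark pixels whose horizontal derivative magnitude exceeds the threshold.
edges = [[abs(v) >= 2.0 for v in row] for row in grad_x]

for row in edges:
    print(row)
```

The derivative response is nonzero only in the columns that straddle the step between the dark and bright halves. Its sign is negative here because true convolution flips the kernel.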
II. “Mid-level vision”
• Fitting and grouping
Alignment
Fitting: Least squares, Hough transform, RANSAC
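To give a flavor of robust fitting, here is a minimal RANSAC line-fitting sketch in pure Python. It is an illustration under stated assumptions, not the course's implementation: the synthetic data, the iteration count, the inlier threshold, and all function names are hypothetical. The loop repeatedly fits a line to two random points and keeps the line that agrees with the most points.

```python
import random


def fit_line(p, q):
    """Line through two points as (a, b, c) with a*x + b*y + c = 0, a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0:
        return None  # degenerate sample: the two points coincide
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)


def ransac_line(points, n_iters=200, threshold=0.1, seed=0):
    """Return the two-point line with the largest inlier set, and that set."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(n_iters):
        line = fit_line(*rng.sample(points, 2))
        if line is None:
            continue
        a, b, c = line
        # With a unit normal (a, b), |a*x + b*y + c| is the point-to-line distance.
        inliers = [(x, y) for (x, y) in points if abs(a * x + b * y + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = line, inliers
    return best_line, best_inliers


# Synthetic data (hypothetical): 20 points on y = 2x + 1 plus 5 gross outliers.
line_points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
outliers = [(3.0, 40.0), (7.0, -12.0), (12.0, 60.0), (15.0, 0.0), (18.0, -30.0)]

best_line, inliers = ransac_line(line_points + outliers)
slope = -best_line[0] / best_line[1]
print(len(inliers), slope)
```

In practice the winning line is usually refit to all of its inliers with least squares. That step is omitted here to keep the sketch short.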
III. Multi-view geometry
Stereo
Epipolar geometry
Affine structure from motion
Projective structure from motion
Tomasi & Kanade (1993)
IV. Recognition
Patch description and matching
Clustering and visual vocabularies
Bag-of-features models
Classification
Sources: D. Lowe, L. Fei-Fei
V. Advanced Topics
• Time permitting…
Segmentation
Face detection
Articulated models
Motion and tracking
Basic Info
• Instructor: Svetlana Lazebnik (lazebnik@cs.unc.edu)
• Office hours: By appointment, FB 244
• Textbooks (suggested):
Forsyth & Ponce, Computer Vision: A Modern Approach
Richard Szeliski, Computer Vision: Algorithms and Applications (draft available online)
• Class webpage: http://www.cs.unc.edu/~lazebnik/spring10
Course requirements
• Philosophy: computer vision is best experienced hands-on
• Programming assignments: 50%
– Four assignments
– Expect the first one in the next couple of classes
– Brush up on your MATLAB skills (see web page for tutorial)
• Final assignment: 30%
– Recognition competition
– Winner gets a prize!
• Participation: 20%
– Come to class regularly
– Ask questions
– Answer questions
Collaboration policy
• Feel free to discuss assignments with each other, but coding must be done individually
• Feel free to incorporate code or tips you find on the Web, provided this doesn’t make the assignment trivial and you explicitly acknowledge your sources
• Remember: I can Google too (and I have the copies of everybody’s assignments from the last two years this class was offered)
For next time
• Self-study: MATLAB tutorial
• Reading: cameras and image formation (F&P chapter 1)