16-824: Learning-based Methods in Vision
Instructors:
• Alexei (Alyosha) Efros, [email protected], 225 Smith Hall
• Leon Sigal, [email protected], Disney Research Pittsburgh
Web Page:
• http://www.cs.cmu.edu/~efros/courses/LBMV12/
Today
Introduction
Why This Course?
Administrative stuff
Overview of the course
Alexei (Alyosha) Efros
Ph.D 2003, from UC Berkeley (signed by Arnie!)
Postdoctoral Fellow, University of Oxford, ’03-’04
Research Interests:
Vision, Graphics, Data-driven “stuff”
Leonid Sigal
PhD 2007, from Brown University
Postdoctoral Fellow, University of Toronto, ’07-’09
Research interests:
Vision, Graphics, Machine Learning
A bit about Us
Why this class?
The Old Days™:
1. Graduate Computer Vision
2. Advanced Machine Perception
Why this class?
The New and Improved Days:
1. Graduate Computer Vision
2. Advanced Machine Perception
• Physics-based Methods in Vision
• Geometry-based Methods in Vision
• Learning-based Methods in Vision
Describing Visual Scenes using Transformed Dirichlet Processes. E. Sudderth, A. Torralba, W. Freeman, and A. Willsky. NIPS, Dec. 2005.
The Hip & Trendy Learning
Learning as Last Resort
from [Sinha and Adelson 1993]
EXAMPLE: Recovering 3D geometry from a single 2D projection
Infinite number of possible solutions!
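A minimal sketch of why this problem is ill-posed, assuming an idealized pinhole camera at the origin with focal length `f`. The function name `project` and the numeric values are illustrative and not from the lecture. Every 3D point along the same viewing ray lands on the same image position, so a single image cannot tell the candidates apart.

```python
def project(point, f=1.0):
    """Pinhole projection of a 3D point (X, Y, Z) onto the image plane z = f."""
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)


# One 3D point, then copies pushed further away along its viewing ray.
base = (0.5, -0.25, 1.0)
candidates = [tuple(s * c for c in base) for s in (1.0, 2.0, 4.0, 8.0)]

# Four different 3D points, one shared 2D image position.
images = [project(p) for p in candidates]
for p, img in zip(candidates, images):
    print(p, "->", img)
```

Any positive scale factor would work here, which is the sense in which the number of solutions is infinite; extra assumptions or prior experience are needed to pick one.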
Learning-based Methods in Vision
This class is about trying to solve problems that do not have a solution!
• Don’t tell your mathematician friends!
This will be done using Data:
• E.g. what happened before is likely to happen again
• Google Intelligence (GI): The AI for the post-modern world!
• Note: this is not quite statistics
Why is this even worthwhile?
• Even a decade ago at ICCV99 Faugeras claimed it wasn’t!
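One hedged way to read "what happened before is likely to happen again" is nearest-neighbor lookup: keep past examples around and answer a new query with the label of the most similar one. The sketch below is an illustration of that idea only, not a method from the lecture; the function name `nearest_neighbor_label`, the two-number "features", and the stored examples are all made up.

```python
import math


def nearest_neighbor_label(query, memory):
    """Return the label of the stored example closest to `query` (Euclidean distance)."""
    _, label = min(memory, key=lambda item: math.dist(item[0], query))
    return label


# Hypothetical "past experience": (feature vector, label) pairs with invented values.
memory = [
    ((0.9, 0.1), "sky"),
    ((0.2, 0.8), "building"),
    ((0.5, 0.5), "bus"),
]

prediction = nearest_neighbor_label((0.85, 0.2), memory)
print(prediction)
```

No model is fitted here; the data itself does the work, so the quality of the answer depends on how much data is stored and on how well the features capture similarity.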
The Vision Story Begins…
“What does it mean, to see? The plain man's answer (and Aristotle's, too) would be, to know what is where by looking.”
-- David Marr, Vision (1982)
Vision: a split personality
“What does it mean, to see? The plain man's answer (and Aristotle's, too) would be, to know what is where by looking. In other words, vision is the process of discovering from images what is present in the world, and where it is.”
Answer #1: pixel of brightness 243 at position (124,54)
…and depth .7 meters
Answer #2: looks like bottom edge of whiteboard showing at the top of the image
Which do we want?
Is the difference just a matter of scale?
depth map
Measurement vs. Perception
Brightness: Measurement vs. Perception
Proof!
Lengths: Measurement vs. Perception
http://www.michaelbach.de/ot/sze_muelue/index.html
Müller-Lyer Illusion
Vision as Measurement Device
Real-time stereo on Mars
Structure from Motion
Physics-based Vision
Virtualized Reality
…but why do Learning for Vision?
“What if I don’t care about this wishy-washy human perception stuff? I just want to make my robot go!”
Small Reason:
• For measurement, other sensors are often better (in DARPA Grand Challenge, vision was barely used!)
• For navigation, you still need to learn!
Big Reason:
The goals of computer vision (what + where) are in terms of what humans care about.
So what do humans care about?
slide by Fei Fei, Fergus & Torralba
Verification: is that a bus?
slide by Fei Fei, Fergus & Torralba
Detection: are there cars?
slide by Fei Fei, Fergus & Torralba
Identification: is that a picture of Mao?
slide by Fei Fei, Fergus & Torralba
Object categorization
sky
building
flag
wall
banner
bus
cars
bus
face
street lamp
slide by Fei Fei, Fergus & Torralba
Scene and context categorization
• outdoor
• city
• traffic
• …
slide by Fei Fei, Fergus & Torralba
Rough 3D layout, depth ordering
slide by Fei Fei, Fergus & Torralba
Challenges 1: view point variation
Michelangelo 1475-1564 slide by Fei Fei, Fergus & Torralba
Challenges 2: illumination
slide credit: S. Ullman
Challenges 3: occlusion
Magritte, 1957 slide by Fei Fei, Fergus & Torralba
Challenges 4: scale
slide by Fei Fei, Fergus & Torralba
Challenges 5: deformation
Xu, Beihong 1943 slide by Fei Fei, Fergus & Torralba
Challenges 6: background clutter
Klimt, 1913 slide by Fei Fei, Fergus & Torralba
Challenges 7: object intra-class variation
slide by Fei-Fei, Fergus & Torralba
Challenges 8: local ambiguity
slide by Fei-Fei, Fergus & Torralba
Challenges 9: the world behind the image
In this course, we will:
Take a few baby steps…
Role of Learning
Learning Algorithm
Features
Data
Role of Learning
Data
Features
Algorithm
Shashua
Course Outline
• Overview of Learning for Vision (1 lecture)
• Overview of Data for Vision (1 lecture)
• Features
• Human Perception and visual neuroscience
• Theories of Human Vision
• Low-level Vision
  • Filters, edge detection, interest points, etc.
• Mid-level Vision
  • Segmentation, Occlusions, 2-1/2D, scene layout, etc.
• High-Level Vision
  • Object recognition
  • Scene Understanding
  • Action / Motion Understanding
  • Etc.
Goals
Read some interesting papers together
• Learn something new: both you and us!
Get up to speed on a big chunk of vision research
• understand 70% of CVPR papers!
Use learning-based vision in your own work
Learn how to speak
Learn how to think critically about papers
Participate in an exciting meta-study!
Course Organization
Requirements:
1. Class Participation (33%)
• Keep annotated bibliography
• Post on the Class Blog before each class
• Ask questions / debate / fight / be involved!
2. Two Projects (66%)
• Deconstruction Project
  • Implement and evaluate a paper and present it in class
  • Must talk to us AT LEAST 2 weeks beforehand!
  • Can be done in groups of 2 (but must do 2 projects)
• Synthesis Project
  • Do something worthwhile with what you learned for Deconstruction Project
  • Can be done in groups of 2 (1 project)
Class Participation
Keep annotated bibliography of papers you read (always a good idea!). The format is up to you. At least, it needs to have:
• Summary of key points
• A few interesting insights, “aha moments”, keen observations, etc.
• Weaknesses of approach. Unanswered questions. Areas of further investigation, improvement.
Before each class:
• Submit your summary for current paper(s) in hard copy (printout/xerox)
• Submit a comment on the Class Blog
  • ask a question, answer a question, post your thoughts, praise, criticism, start a discussion, etc.
Deconstruction Project
1. Pick a paper / set of papers from the list
2. Understand it as if you were the author
• Re-implement it
• If there is code, understand the code completely
• Run it on the same data (you can contact authors for data and even code sometimes)
3. Understand it better than the author
• Run it on two other data sets (e.g. LabelMe dataset, Flickr dataset, etc.)
• Run it with two other feature representations
• Run it with two other learning algorithms
• Maybe suggest directions for improvement.
4. Prepare an amazing 45min presentation
• Discuss with me twice: once when you start the project, and once 3 days before the presentation
Synthesis Project
Hopefully can grow out of the deconstruction project
2 people can work on one
End of Semester Awards
We will vote for:
• Best Deconstruction Project
• Best Synthesis Project
• Best Blog Comment
Prize: dinner in a French restaurant in Paris (transportation not included!) or some other worthy prizes