16-824: Learning-based Methods in Vision. Instructors: Alexei (Alyosha) Efros, efros@cs.cmu.edu, 225 Smith Hall; Leon Sigal, lsigal@disneyresearch.com, Disney Research Pittsburgh. Web Page: http://www.cs.cmu.edu/~efros/courses/LBMV12/


16-824: Learning-based Methods in Vision

Instructors:
• Alexei (Alyosha) Efros, efros@cs.cmu.edu, 225 Smith Hall
• Leon Sigal, lsigal@disneyresearch.com, Disney Research Pittsburgh

Web Page:
• http://www.cs.cmu.edu/~efros/courses/LBMV12/


Today

Introduction

Why This Course?

Administrative stuff

Overview of the course


A bit about Us

Alexei (Alyosha) Efros
• Ph.D. 2003, from UC Berkeley (signed by Arnie!)
• Postdoctoral Fellow, University of Oxford, ’03-’04
• Research interests: Vision, Graphics, Data-driven “stuff”

Leonid Sigal
• Ph.D. 2007, from Brown University
• Postdoctoral Fellow, University of Toronto, ’07-’09
• Research interests: Vision, Graphics, Machine Learning


Why this class?

The Old Days™:

1. Graduate Computer Vision

2. Advanced Machine Perception


Why this class?

The New and Improved Days:

1. Graduate Computer Vision

2. Advanced Machine Perception
• Physics-based Methods in Vision
• Geometry-based Methods in Vision
• Learning-based Methods in Vision


Describing Visual Scenes using Transformed Dirichlet Processes. E. Sudderth, A. Torralba, W. Freeman, and A. Willsky. NIPS, Dec. 2005.

The Hip & Trendy Learning


Learning as Last Resort


Learning as Last Resort

from [Sinha and Adelson 1993]

EXAMPLE: Recovering 3D geometry from a single 2D projection

Infinite number of possible solutions!
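
A minimal sketch of why the solutions are infinite, assuming an ideal pinhole camera at the origin looking down the Z axis (the camera model, the function names, and the sample point are illustrative assumptions, not from the slides). Every 3D point on the same ray through the camera centre lands on the same 2D image point, so the image alone cannot say which one was the real one:

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (X, Y, Z), Z > 0, onto the image plane."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def points_on_ray(point, scales):
    """Slide a 3D point along the ray that joins it to the camera centre."""
    x, y, z = point
    return [(s * x, s * y, s * z) for s in scales]


# Hypothetical scene point and four candidate depths along its viewing ray.
base = (0.5, -0.25, 2.0)
candidates = points_on_ray(base, [0.5, 1.0, 2.0, 4.0])
images = [project(p) for p in candidates]

for p, img in zip(candidates, images):
    print(p, "->", img)
```

All four distinct 3D candidates print the same image coordinates; any positive scale would do the same, which is the ambiguity that learned priors are meant to resolve.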


Learning-based Methods in Vision

This class is about trying to solve problems that do not have a solution!
• Don’t tell your mathematician friends!

This will be done using Data:
• E.g. what happened before is likely to happen again
• Google Intelligence (GI): The AI for the post-modern world!
• Note: this is not quite statistics

Why is this even worthwhile?
• Even a decade ago at ICCV99 Faugeras claimed it wasn’t!


The Vision Story Begins…

“What does it mean, to see? The plain man's answer (and Aristotle's, too) would be, to know what is where by looking.”

-- David Marr, Vision (1982)


Vision: a split personality

“What does it mean, to see? The plain man's answer (and Aristotle's, too) would be, to know what is where by looking. In other words, vision is the process of discovering from images what is present in the world, and where it is.”

Answer #1: pixel of brightness 243 at position (124,54)

…and depth .7 meters

Answer #2: looks like bottom edge of whiteboard showing at the top of the image

Which do we want?

Is the difference just a matter of scale?

depth map


Measurement vs. Perception


Brightness: Measurement vs. Perception


Brightness: Measurement vs. Perception

Proof!


Lengths: Measurement vs. Perception

http://www.michaelbach.de/ot/sze_muelue/index.html

Müller-Lyer Illusion


Vision as Measurement Device

Real-time stereo on Mars

Structure from Motion

Physics-based Vision

Virtualized Reality


…but why do Learning for Vision?

“What if I don’t care about this wishy-washy human perception stuff? I just want to make my robot go!”

Small Reason:
• For measurement, other sensors are often better (in DARPA Grand Challenge, vision was barely used!)
• For navigation, you still need to learn!

Big Reason:

The goals of computer vision (what + where) are in terms of what humans care about.


So what do humans care about?

slide by Fei-Fei, Fergus & Torralba


Verification: is that a bus?

slide by Fei-Fei, Fergus & Torralba


Detection: are there cars?

slide by Fei-Fei, Fergus & Torralba


Identification: is that a picture of Mao?

slide by Fei-Fei, Fergus & Torralba


Object categorization

sky

building

flag

wall

banner

bus

cars

bus

face

street lamp

slide by Fei-Fei, Fergus & Torralba


Scene and context categorization

• outdoor

• city

• traffic

• …

slide by Fei-Fei, Fergus & Torralba


Rough 3D layout, depth ordering

slide by Fei-Fei, Fergus & Torralba


Challenges 1: view point variation

Michelangelo 1475-1564

slide by Fei-Fei, Fergus & Torralba


Challenges 2: illumination

slide credit: S. Ullman


Challenges 3: occlusion

Magritte, 1957

slide by Fei-Fei, Fergus & Torralba


Challenges 4: scale

slide by Fei-Fei, Fergus & Torralba


Challenges 5: deformation

Xu, Beihong 1943

slide by Fei-Fei, Fergus & Torralba


Challenges 6: background clutter

Klimt, 1913

slide by Fei-Fei, Fergus & Torralba


Challenges 7: object intra-class variation

slide by Fei-Fei, Fergus & Torralba


Challenges 8: local ambiguity

slide by Fei-Fei, Fergus & Torralba


Challenges 9: the world behind the image


In this course, we will:

Take a few baby steps…


Role of Learning

Learning Algorithm

Features

Data


Role of Learning

Data

Features

Algorithm

Shashua


Course Outline

• Overview of Learning for Vision (1 lecture)
• Overview of Data for Vision (1 lecture)
• Features
• Human Perception and visual neuroscience
• Theories of Human Vision
• Low-level Vision
    • Filters, edge detection, interest points, etc.
• Mid-level Vision
    • Segmentation, Occlusions, 2-1/2D, scene layout, etc.
• High-Level Vision
    • Object recognition
    • Scene Understanding
    • Action / Motion Understanding
    • Etc.


Goals

Read some interesting papers together
• Learn something new: both you and us!

Get up to speed on a big chunk of vision research
• understand 70% of CVPR papers!

Use learning-based vision in your own work

Learn how to speak

Learn how to think critically about papers

Participate in an exciting meta-study!


Course Organization

Requirements:

1. Class Participation (33%)
• Keep annotated bibliography
• Post on the Class Blog before each class
• Ask questions / debate / fight / be involved!

2. Two Projects (66%)
• Deconstruction Project
    • Implement and evaluate a paper and present it in class
    • Must talk to us AT LEAST 2 weeks beforehand!
    • Can be done in groups of 2 (but must do 2 projects)
• Synthesis Project
    • Do something worthwhile with what you learned for the Deconstruction Project
    • Can be done in groups of 2 (1 project)


Class Participation

Keep annotated bibliography of papers you read (always a good idea!). The format is up to you. At least, it needs to have:
• Summary of key points
• A few interesting insights, “aha moments”, keen observations, etc.
• Weaknesses of approach. Unanswered questions. Areas of further investigation, improvement.

Before each class:
• Submit your summary for current paper(s) in hard copy (printout/xerox)
• Submit a comment on the Class Blog
    • ask a question, answer a question, post your thoughts, praise, criticism, start a discussion, etc.


Deconstruction Project

1. Pick a paper / set of papers from the list

2. Understand it as if you were the author
• Re-implement it
• If there is code, understand the code completely
• Run it on the same data (you can contact authors for data and even code sometimes)

3. Understand it better than the author
• Run it on two other data sets (e.g. LabelMe dataset, Flickr dataset, etc.)
• Run it with two other feature representations
• Run it with two other learning algorithms
• Maybe suggest directions for improvement.

4. Prepare an amazing 45min presentation
• Discuss with me twice: once when you start the project, and once 3 days before the presentation
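
One possible way to keep track of the step 3 experiments, sketched under assumptions: the slide does not say whether the variations should be crossed, so this simply enumerates the full grid of the paper's own setup plus two alternatives per axis. All names below ("original", "feature_A", "learner_A", and so on) are hypothetical placeholders to be replaced with whatever the chosen paper and your alternatives actually are.

```python
from itertools import product


def experiment_grid(datasets, features, learners):
    """Return every (dataset, features, learner) combination as a dict."""
    return [
        {"dataset": d, "features": f, "learner": l}
        for d, f, l in product(datasets, features, learners)
    ]


# Placeholder names; "original" stands for whatever the paper itself used.
DATASETS = ["original", "LabelMe", "Flickr"]
FEATURES = ["original", "feature_A", "feature_B"]
LEARNERS = ["original", "learner_A", "learner_B"]

grid = experiment_grid(DATASETS, FEATURES, LEARNERS)
baseline = grid[0]  # the paper's own configuration, reproduced in step 2

print(len(grid), "runs; baseline:", baseline)
```

If the full grid is too expensive, varying one axis at a time from the baseline still covers what the slide asks for with far fewer runs.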


Synthesis Project

Hopefully can grow out of the deconstruction project

2 people can work on one


End of Semester Awards

We will vote for:
• Best Deconstruction Project
• Best Synthesis Project
• Best Blog Comment

Prize: dinner in a French restaurant in Paris (transportation not included!) or some other worthy prizes