Computer Vision II Scene Understanding Michael Yang 16/07/2015


Page 1

Computer Vision II – Scene Understanding

Michael Yang


Page 2

Roadmap (4 lectures)

• Object Detection (26.06)

• Image Categorization (03.07)

• Convolutional neural network (10.07)

• Scene Understanding (17.07)

• Poster, Q&A (24.07)

16/07/2015 Computer Vision II: Recognition 2

Page 3

Slides credits

• Bernt Schiele

• Li Fei-Fei

• Rob Fergus

• Kirsten Grauman

• Derek Hoiem

• Antonio Torralba

• James Hays

• Jianxiong Xiao

• Stefan Roth

• Andreas Geiger

• Jamie Shotton

• Antonio Criminisi

• Carsten Rother


Page 4

Roadmap (last lecture)

• Shallow vs. deep architectures

• Convolutional neural network (CNN)

• Training CNN

• CNN for X


Page 5

“Shallow” vs. “deep” architectures

Traditional recognition (“shallow” architecture): Image/Video Pixels → Hand-designed feature extraction → Trainable classifier → Object Class

Deep learning (“deep” architecture): Image/Video Pixels → Layer 1 → … → Layer N → Simple classifier → Object Class

Page 6

Neural Net Events

• 1943: founded by Warren McCulloch and Walter Pitts

• Criticism by Minsky in his book “Perceptrons”

• 1986: back propagation by Rumelhart and Hinton

• Convolutional neural networks

• Deep belief networks

• ImageNet classification over millions of images


Page 7

Convolutional Neural Network (CNN/Convnet)

• Neural network with specialized connectivity structure

• Stack multiple stages of feature extractors

• Higher stages compute more global, more invariant features

• Classification layer at the end

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86(11): 2278–2324, 1998.

Page 8

Convolutional Neural Network (CNN/Convnet)

• Feed-forward feature extraction:

1. Convolve input with learned filters

2. Non-linearity

3. Spatial pooling

4. Normalization

• Supervised training of convolutional filters by back-propagating classification error

Pipeline: Input Image → Convolution (Learned) → Non-linearity → Spatial pooling → Normalization → Feature maps
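The four feed-forward steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the filter is hand-picked rather than learned, the non-linearity is assumed to be a ReLU, pooling is assumed to be 2x2 max pooling, and normalization is a simple zero-mean, unit-variance rescaling of the whole map. All function names are hypothetical.

```python
# Minimal sketch of one feed-forward CNN stage (pure Python, no framework).
# The filter values, the image, and all function names are illustrative assumptions.

def convolve2d(image, kernel):
    """Valid-mode 2D cross-correlation, the operation usually called convolution in CNNs."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(fmap):
    """Non-linearity: keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

def normalize(fmap, eps=1e-8):
    """Zero-mean, unit-variance normalization over the whole map (one simple choice)."""
    values = [v for row in fmap for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = (var + eps) ** 0.5
    return [[(v - mean) / std for v in row] for row in fmap]

def cnn_stage(image, kernel):
    """Convolution -> non-linearity -> spatial pooling -> normalization."""
    return normalize(max_pool(relu(convolve2d(image, kernel))))

# Toy input: dark left half, bright right half.
image = [[0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1]]
edge_filter = [[-1.0, 1.0]]  # responds to dark-to-bright vertical edges
features = cnn_stage(image, edge_filter)
```

In a real network the filter weights would be learned by back-propagating the classification error, and several such stages would be stacked before the final classifier.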

Page 9

ImageNet Challenge 2012

• Similar framework to LeCun’98 but:

  • Bigger model (7 hidden layers, 650,000 units, 60,000,000 params)

  • More data (10^6 vs. 10^3 images)

  • GPU implementation (50x speedup over CPU)

  • Trained on two GPUs for a week

  • Better regularization for training (DropOut)

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012

Page 11

Transfer Learning

• Improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned.

• Weight initialization for CNN

Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks [Oquab et al. CVPR 2014]

Page 12

CNN for X

• Detection

• Segmentation

• Regression

• Pose estimation

• Matching patches

• Synthesis

and many more…

Beyond classification

Page 14

Roadmap (this lecture)

• Defining the Problem

• Context

• Spatial Layout

• 3D Scene Understanding


Page 15

Scene Understanding

• What is the goal of scene understanding?

  • Build machines that can see like humans and automatically interpret the content of images

• Compared with traditional vision problems:

  • Study on a larger scale

  • Human-vision-related tasks

Page 16

Larger Scale

[Figure panels: What your eyes see · What a camera sees (focal length = 35 mm) · Whole-room model]

More image information. Context information.

Page 17

Human vision related task

• Closer to the way humans understand an image

• Infers more useful information from the image

Page 18

How do humans learn?

• Bayes' rule:

P(A | B) = P(B | A) × P(A) / P(B)

• In practice: infer abstract knowledge W based on an observation I

P(W | I) = P(I | W) × P(W) / P(I) ∝ P(I | W) × P(W)

Posterior probability: P(W | I)

Likelihood: P(I | W), the probability of getting I given model W

Prior: P(W), the probability of W without seeing any observation

Page 19

How do humans learn?

• To teach a human baby what a “horse” is: show 3 pictures and let the baby learn unaided.

• Babies are usually very successful at learning the correct concept.

• But all of the following concepts can explain the images I:

• “horse” = all horses

• “horse” = all horses except Clydesdales

• “horse” = all animals
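A small numerical sketch shows how the posterior can still single out one concept even though all three explain the pictures. Everything here is an assumption made for illustration: the hypothesis sizes and priors are invented, and the likelihood assumes the three pictures were sampled uniformly and independently from the true concept, so broader concepts make each specific picture less probable.

```python
# Sketch of Bayesian concept learning for the "horse" example.
# All numbers (hypothesis sizes, priors) are hypothetical, chosen for illustration only.

hypotheses = {
    # name: (number of distinct things the concept covers, prior P(W))
    "all horses": (100, 0.45),
    "all horses except Clydesdales": (95, 0.10),
    "all animals": (10000, 0.45),
}

def posterior(hypotheses, n_examples):
    """P(W | I) is proportional to P(I | W) * P(W).

    Assumes the n examples are drawn uniformly and independently from the
    concept, so that P(I | W) = (1 / size) ** n for examples consistent with W.
    """
    unnormalized = {
        name: (1.0 / size) ** n_examples * prior
        for name, (size, prior) in hypotheses.items()
    }
    evidence = sum(unnormalized.values())  # P(I)
    return {name: value / evidence for name, value in unnormalized.items()}

post = posterior(hypotheses, n_examples=3)
for name, p in sorted(post.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {p:.3f}")
```

With these made-up numbers the likelihood penalizes the overly broad concept and the prior penalizes the contrived one, so “all horses” ends up with the highest posterior.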

Page 20

Roadmap (this lecture)

• Defining the Problem

• Context

• Spatial Layout

• 3D Scene Understanding


Page 21

Context in Recognition

• Objects usually are surrounded by a scene that can provide context in the form of nearby objects, surfaces, scene category, geometry, etc.

Page 22

Contextual Reasoning

• Definition: Making a decision based on more than local image evidence.

Page 23

Context provides clues for function

• What is this?

Page 24

Context provides clues for function

• What is this?

• Now can you tell?

Page 25

Context provides clues for function

• Once more: how amazing the visual system is

Page 26

Context provides clues for function

• Once more: how amazing the visual system is

Page 27

Is local information enough?

Page 28

Is local information enough?



Local features

Contextual features

Page 29

Context in Recognition

We know there is a keyboard present in this scene even if we cannot see it clearly.

We know there is no keyboard present in this scene

… even if there is one indeed.

Page 30

Context in Recognition

Page 31

Context in Recognition

Look-Alikes by Joan Steiner

Page 32

Context in Recognition

Biederman 1982

• Pictures shown for 150 ms

• Objects in appropriate context were detected more accurately than objects in an inappropriate context

• Scene consistency affects object detection

Page 33

Why is context important?

• Changes the interpretation of an object (or its function)

• Context defines what an unexpected event is


Page 34

There are many types of context

• Local pixels: window, surround, image neighborhood, object boundary/shape, global image statistics

• 2D Scene Gist: global image statistics

• 3D Geometric: 3D scene layout, support surface, surface orientations, occlusions, contact points, etc.

• Semantic: event/activity depicted, scene category, objects present in the scene and their spatial extents

• Photogrammetric: camera height and orientation, focal length, lens distortion, radiometric response function

• Illumination: sun direction, sky color, cloud cover, shadow contrast, etc.

• Geographic: GPS location, terrain type, land use category, elevation, population density, etc.

• Temporal: nearby frames of video, photos taken at similar times, videos of similar scenes, time of capture

• Cultural: photographer bias, dataset selection bias, visual cliches, etc.

from Divvala et al. CVPR 2009

Page 35

Roadmap (this lecture)

• Defining the Problem

• Context

• Spatial Layout

• 3D Scene Understanding


Page 36

Spatial layout is especially important

1. Context for recognition

Page 37

Spatial layout is especially important

1. Context for recognition

Page 38

Spatial layout is especially important

1. Context for recognition

2. Scene understanding

Page 39

Spatial layout is especially important

1. Context for recognition

2. Scene understanding

3. Many direct applications

a) Assisted driving

b) Robot navigation/interaction

c) 2D to 3D conversion for 3D TV

d) Object insertion

Page 40

Spatial Layout: 2D vs. 3D

Page 41


Context in Image Space

[Kumar Hebert 2005] [Torralba Murphy Freeman 2004]

[He Zemel Carreira-Perpiñán 2004]

Page 42

But object relations are in 3D…


Not Close

Slide: Derek Hoiem

Page 43

How to represent scene space?

Page 44

Wide variety of possible representations

Page 45

Wide variety of possible representations

Page 46

Wide variety of possible representations

Page 47

Key Trade-offs

• Level of detail: rough “gist”, or detailed point cloud?

  • Precision vs. accuracy

  • Difficulty of inference

• Abstraction: depth at each pixel, or ground planes and walls?

• What is it for: e.g., metric reconstruction vs. navigation

Page 48

Low detail, Low abstraction

Holistic Scene Space: “Gist”

Oliva & Torralba 2001

Torralba & Oliva 2002

Page 49

High detail, Low abstraction

Depth Map

Saxena, Chung & Ng 2005, 2007

Page 50

Medium detail, High abstraction

[Hedau Hoiem Forsyth 2009]

Room as a Box

Page 51




Planar (Left/Center/Right)

Non-Planar Porous

Non-Planar Solid

Surface Layout

Page 52




The challenge

Page 53

Our World is Structured

Abstract World Our World

Image Credit (left): F. Cunin and M.J. Sailor, UCSD

Page 54

Learn the Structure of the World

Training Images

Page 55

Unlikely Likely

Infer the most likely interpretation

Page 56

Geometry estimation as recognition

Surface Geometry Classifier

Vertical, Planar

Training Data


Features: Color, Texture, Perspective


Page 57

Surface Layout Algorithm

Input Image → Features (Perspective, Color, Texture, Position) → Trained Region Classifier (learned from Training Data) → Surface Labels

[Hoiem Efros Hebert 2007]

Page 58

Surface Layout Algorithm

Input Image → Multiple Segmentations → Features (Perspective, Color, Texture, Position) → Trained Region Classifier (learned from Training Data) → Confidence-Weighted Predictions → Final Surface Labels

[Hoiem Efros Hebert 2007]
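The confidence-weighted step can be sketched as follows. This is a simplified illustration, not the published implementation: each segment is assumed to carry a homogeneity confidence and a label distribution from the region classifier, and a pixel's score for a label is the sum of those products over every segment that contains it. The label names, data layout, and numbers are assumptions.

```python
# Sketch: combine region-classifier outputs from several segmentations.
# Segmentations, probabilities, and label names below are made-up toy data.

LABELS = ("support", "vertical", "sky")

def combine_segmentations(num_pixels, segmentations):
    """Return one label per pixel from confidence-weighted segment predictions.

    Each segmentation is a list of segments. A segment is a dict with
    'pixels' (pixel indices), 'homogeneity' (confidence that the segment has a
    single label), and 'label_probs' (label -> probability for the segment).
    """
    scores = [{label: 0.0 for label in LABELS} for _ in range(num_pixels)]
    for segments in segmentations:
        for seg in segments:
            for p in seg["pixels"]:
                for label in LABELS:
                    scores[p][label] += seg["homogeneity"] * seg["label_probs"][label]
    return [max(LABELS, key=lambda label: s[label]) for s in scores]

# Two alternative segmentations of a 4-pixel "image".
segmentation_a = [
    {"pixels": [0, 1], "homogeneity": 0.9,
     "label_probs": {"support": 0.8, "vertical": 0.15, "sky": 0.05}},
    {"pixels": [2, 3], "homogeneity": 0.3,   # low confidence: likely a mixed segment
     "label_probs": {"support": 0.3, "vertical": 0.5, "sky": 0.2}},
]
segmentation_b = [
    {"pixels": [0, 1, 2], "homogeneity": 0.8,
     "label_probs": {"support": 0.7, "vertical": 0.2, "sky": 0.1}},
    {"pixels": [3], "homogeneity": 0.9,
     "label_probs": {"support": 0.1, "vertical": 0.1, "sky": 0.8}},
]
final_labels = combine_segmentations(4, [segmentation_a, segmentation_b])
```

Using several segmentations means a pixel that falls into a poor, mixed segment in one segmentation can still be labeled correctly by a cleaner segment from another.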

Page 59

Surface Description Result

Page 60


Input Image Ground Truth Result

Page 61


Input Image Ground Truth Result

Page 62

Failures: Reflections, Rare Viewpoint

Input Image Ground Truth Result

Page 63

Average Accuracy

Main Class: 88%

Subclasses: 61%

Page 64

Automatic Photo Popup

Labeled Image → Fit Ground-Vertical Boundary with Line Segments → Form Segments into Polylines → Cut and Fold → Final Pop-up Model

[Hoiem Efros Hebert 2005]

Page 65


• Can learn to predict surface geometry from a single image

• Very rough models, much room for improvement

Page 66

Things to remember

• Objects should be interpreted in the context of the surrounding scene

• Many types of context to consider

• Spatial layout is an important part of scene interpretation, but many open problems remain

  • How to represent space?

  • How to learn and infer spatial models?

• Consider trade-offs of detail vs. accuracy and abstraction vs. quantification

Page 67

Roadmap (this lecture)

• Defining the Problem

• Context

• Spatial Layout

• 3D Scene Understanding


Page 68

Half way slide

10 Minutes break



Page 69

Complete Scene Understanding


Localization of all instances of foreground objects (“things”)

Localization of all background classes (“stuff”)

Pixel-wise segmentation

3D reconstruction

Pose detection

Action recognition

Event recognition


Page 70

Semantic Scene Understanding

We're interested in whole scene understanding.

Given an image, detect every thing in it.

Thing: An object with a specific size and shape.

Adelson, Forsyth et al. 96

Slides credit: Ľubor Ladický

Page 71

Semantic Scene Understanding

We're interested in whole scene understanding.

Given an image, label all the stuff.

Stuff: Material defined by a homogeneous or repetitive pattern, with no specific spatial extent / shape.

Adelson, Forsyth et al. 96

Page 72

Combining Object Detectors and CRFs

Why not combine?

– State-of-the-art sliding window object detection

– State-of-the-art segmentation techniques


Page 73

Algorithms for Object Localization

Sliding window detectors

HOG descriptor (Dalal & Triggs CVPR05)

Based on histograms of features (Vedaldi et al. ICCV09)

Part-based models (Felzenszwalb et al. CVPR09)

Page 74

Sliding window detectors

• Sliding window + Segmentation

– OBJCUT (Kumar et al. 05)

– Updating colour model (GrabCut - Rother et al. 04)


Page 75

Sliding window detectors

Sliding window detectors are not good for “stuff”

Sky has an irregular shape, so it is not suited to the sliding window approach
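A minimal sketch of the sliding window idea makes the limitation concrete: a fixed-size rectangle is scored at every position, which suits compact “things” but cannot follow the outline of an irregular region such as sky. The scoring function below is a hypothetical stand-in for a trained classifier (for example one operating on HOG features); here it is simply the mean brightness of the window.

```python
# Sketch of a sliding window detector over a small grayscale image.
# score_window is a hypothetical stand-in for a trained classifier.

def score_window(image, top, left, height, width):
    """Mean brightness of the window, used here as a toy detection score."""
    total = sum(image[r][c]
                for r in range(top, top + height)
                for c in range(left, left + width))
    return total / (height * width)

def sliding_window_detect(image, height, width, stride=1, threshold=0.5):
    """Score every window position; return (score, top, left) for windows
    at or above the threshold, best first."""
    detections = []
    for top in range(0, len(image) - height + 1, stride):
        for left in range(0, len(image[0]) - width + 1, stride):
            score = score_window(image, top, left, height, width)
            if score >= threshold:
                detections.append((score, top, left))
    return sorted(detections, reverse=True)

# A compact 2x2 bright "thing" on a dark background.
image = [[0, 0, 0, 0, 0],
         [0, 1, 1, 0, 0],
         [0, 1, 1, 0, 0],
         [0, 0, 0, 0, 0]]
detections = sliding_window_detect(image, height=2, width=2, threshold=0.9)
```

Practical detectors also scan over several window sizes and suppress overlapping detections; both are omitted here for brevity.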

Page 76

Algorithms for Object-class Segmentation

Pairwise CRF over pixels

Input image → Training of Potentials → CRF construction → Final segmentation

Shotton et al. ECCV06
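To make the term “pairwise CRF over pixels” concrete, the sketch below evaluates the energy of a labeling on a 4-connected grid: a unary cost per pixel plus a Potts penalty for each pair of neighbouring pixels with different labels. The Potts form, the cost values, and the label names are assumptions for illustration; the cited work uses learned potentials.

```python
# Sketch of a pairwise CRF energy on a 4-connected pixel grid.
# Unary costs and the Potts weight are toy values, not learned potentials.

def crf_energy(labels, unary, potts_weight=1.0):
    """E(x) = sum_i unary[i][x_i] + potts_weight * (number of neighbouring
    pixel pairs with different labels).

    labels: 2D list of label indices; unary: 2D list of per-label cost lists.
    """
    rows, cols = len(labels), len(labels[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            energy += unary[r][c][labels[r][c]]
            # Count each vertical and horizontal neighbour pair exactly once.
            if r + 1 < rows and labels[r][c] != labels[r + 1][c]:
                energy += potts_weight
            if c + 1 < cols and labels[r][c] != labels[r][c + 1]:
                energy += potts_weight
    return energy

# Two labels (0 = "road", 1 = "car"); only the centre pixel weakly prefers "car".
unary = [[[0.2, 1.0] for _ in range(3)] for _ in range(3)]
unary[1][1] = [0.6, 0.4]

smooth = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
noisy = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

energy_smooth = crf_energy(smooth, unary)
energy_noisy = crf_energy(noisy, unary)
```

With the pairwise term switched on, the isolated “car” pixel costs more than it gains, so the smooth labeling has lower energy; setting `potts_weight=0` reverses that preference. Inference means searching for the labeling with minimum energy, which this sketch does not attempt.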

Page 77

Algorithms for Object-class Segmentation

Pairwise CRF over Super-pixels / Segments

Input image → Unsupervised segmentation → Training of potentials → Final segmentation

Batra et al. CVPR08, Yang et al. CVPR07, Zitnick et al. CVPR08, Rabinovich et al. ICCV07, Boix et al. CVPR10


Page 78

Algorithms for Object-class Segmentation

Associative Hierarchical CRF

Input image → Multiple segmentations or hierarchies → CRF construction → Final segmentation

Ladický et al. ICCV09, Russell et al. UAI10

Page 79

CRF Formulation with Detectors

CRF formulation altered with a potential for each detection

Set of pixels of d-th detection

Classifier response

Detected label

CRF graph over pixels

AH-CRF energy without detectors

Page 80

CRF Formulation with Detectors

Joint CRF formulation should contain

• Possibility to reject detection hypothesis

• Recover the status of the detection (0 / 1)

Thus, the potential is a minimum over the indicator variable y_d ∈ {0, 1}

CRF graph over pixels

Indicator variables
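The two requirements above (a detection can be rejected, and its status can be recovered) can be sketched with a potential that takes the cheaper of two options. The specific cost form is an assumption for illustration: accepting the detection (y_d = 1) earns a reward equal to the classifier response but pays a penalty for every pixel inside the detection whose label disagrees with the detected label, while rejecting it (y_d = 0) costs nothing. Function and parameter names are hypothetical.

```python
# Sketch of a detector potential as a minimum over an indicator variable y_d.
# The linear penalty and all numbers are illustrative assumptions.

def detection_potential(pixel_labels, detected_label, strength, penalty_per_pixel):
    """Return (cost, y_d) for one detection.

    pixel_labels: current labels of the pixels covered by the detection.
    strength: reward for accepting the detection (classifier response).
    penalty_per_pixel: cost for each covered pixel that disagrees with detected_label.
    """
    disagreeing = sum(1 for label in pixel_labels if label != detected_label)
    cost_accept = -strength + penalty_per_pixel * disagreeing  # y_d = 1
    cost_reject = 0.0                                          # y_d = 0
    if cost_accept < cost_reject:
        return cost_accept, 1
    return cost_reject, 0

# Detection of a "car" over 10 pixels that are mostly labeled "car": accepted.
accepted = detection_potential(["car"] * 8 + ["road"] * 2, "car",
                               strength=2.0, penalty_per_pixel=0.5)

# The same detection over pixels that are mostly "road": rejected.
rejected = detection_potential(["road"] * 8 + ["car"] * 2, "car",
                               strength=2.0, penalty_per_pixel=0.5)

# Summing the recovered indicators gives the number of accepted instances.
num_instances = accepted[1] + rejected[1]
```

In the joint model the pixel labels and the indicators are optimized together, so a strong detection can also pull nearby pixels toward its label; this sketch only evaluates the potential for fixed labels.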

Page 81

Results on CamVid dataset

Result without detections Set of detections Final Result

Brostow et al.

Sturgess et al.

Brostow et al. ECCV08, Sturgess et al. BMVC09

Page 82

Results on CamVid dataset

Result without detections

Set of detections Final Result

Also provides the number of object instances (using the y_d’s)

Page 83

Results on VOC2009 dataset

[Figure panels: Input image · CRF without detectors · CRF with detectors]

Page 84

3D Traffic Scene Understanding

KITTI (video)

3D Traffic Scene Understanding from Movable Platforms

Andreas Geiger

Page 85

3D Traffic Scene Understanding

• Goal: Infer from short video sequences (moving observer)

  • Topology and geometry of the scene

  • Semantic information (traffic situation)

• Probabilistic generative model of 3D urban scenes

Page 86

Topology and Geometry Model

Page 87

Image Evidence

E = {T, V, S, F, O}: vehicle tracklets (T), vanishing points (V), semantic labels (S), scene flow (F), occupancy (O)

Page 88

Probabilistic Graphical Model

Page 89

Probabilistic Graphical Model

Vehicle Tracklets

• Object detection [Felzenszwalb et al. 2010]

• Associate objects over time (tracking by detection)

• Projection to 3D object tracklet t = {d1, … , d} (d captures the object location and orientation)
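The association step can be sketched with a simple greedy rule: each detection joins the tracklet whose box in the previous frame overlaps it most, and otherwise starts a new tracklet. This is a generic tracking-by-detection illustration in 2D image space, not the method used in the cited work; the overlap threshold, the box format, and the function names are assumptions, and the projection to 3D is not covered.

```python
# Sketch of tracking by detection: link per-frame boxes into tracklets by overlap.
# Boxes are (x1, y1, x2, y2); the greedy rule and threshold are illustrative assumptions.

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def build_tracklets(frames, min_iou=0.3):
    """frames: list of per-frame detection lists.

    Returns a list of tracklets, each a list of (frame_index, box). A detection
    extends the tracklet whose latest box is from the previous frame and overlaps
    it most (at least min_iou); unmatched detections start new tracklets.
    """
    tracklets = []
    for t, detections in enumerate(frames):
        for box in detections:
            best, best_iou = None, min_iou
            for k, track in enumerate(tracklets):
                last_t, last_box = track[-1]
                if last_t != t - 1:
                    continue  # ended earlier, or already extended in this frame
                overlap = iou(last_box, box)
                if overlap >= best_iou:
                    best, best_iou = k, overlap
            if best is None:
                tracklets.append([(t, box)])
            else:
                tracklets[best].append((t, box))
    return tracklets

# Two vehicles in frames 0 and 1; only the first is still detected in frame 2.
frames = [
    [(0, 0, 10, 10), (50, 0, 60, 10)],
    [(1, 0, 11, 10), (51, 0, 61, 10)],
    [(2, 0, 12, 10)],
]
tracklets = build_tracklets(frames)
```

A more careful tracker would solve the assignment jointly per frame and bridge short gaps caused by missed detections.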

Page 90

Probabilistic Graphical Model

Vanishing Points

Page 91

Probabilistic Graphical Model

Semantic Labels

Page 92

Probabilistic Graphical Model

Occupancy, Scene Flow

Page 93


Page 94

Experimental Results

Experiments

• 113 sequences, 5-30 seconds each (9438 frames)

• Best results when combining all feature cues

• Most important: Occupancy grid, tracklets, 3D scene flow

• Less important: Semantic labels, vanishing points

Metrics

• Topology Accuracy: 92.0%

• Location Error: 3.0 m

• Street Orientation Error: 3.0°

• Tracklet-to-Lane Accuracy: 82.0%

• Vehicle Orientation Error: 14.0°

Page 95

Experimental Results

Page 96

3D Scene Understanding

• Defining the Problem

• Context

• Spatial Layout

• 3D Scene Understanding
