An Introduction to Computer Vision

Matthew Dockrey June 18, 2009

Introduction

State of the art in computer vision

PASCAL Visual Object Classes (VOC) Challenge

Semantic Robot Vision Challenge

PASCAL VOC Data

Table by Mark Everingham

PASCAL VOC Results

Table by Mark Everingham

SRVC

SRVC Results

Table by Paul E. Rybski and Alexei Efros

What is computer vision?

Step 1: Image analysis/feature extraction

Step 2: Machine learning/statistical analysis

Step 3: Profit?

Scale-invariant feature transform (SIFT)

The standard image feature / descriptor

David Lowe, 2004

The algorithm is patented, but free for non-commercial use

Binaries only for original implementation, but GPL'd versions exist

SIFT Point Detection

Images by David Lowe

SIFT Point Descriptor

Image by David Lowe

SIFT Feature

SIFT =

Row location

Column location

Orientation

Scale

128 dimensional vector
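
The five parts listed above can be pictured as a single record. The following is a minimal sketch only: the class name, field names, and the choice of radians for orientation are illustrative assumptions, not the layout used by any particular SIFT implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiftFeature:
    """One SIFT keypoint plus its descriptor (field names are hypothetical)."""
    row: float               # row (y) location in the image
    col: float               # column (x) location in the image
    orientation: float       # dominant gradient orientation, assumed to be in radians
    scale: float             # scale at which the keypoint was detected
    descriptor: List[float]  # the 128-dimensional descriptor vector

    def __post_init__(self) -> None:
        # Reject descriptors that do not have the 128 dimensions named on the slide.
        if len(self.descriptor) != 128:
            raise ValueError("a SIFT descriptor has 128 dimensions")


# Example record with a placeholder all-zero descriptor.
feature = SiftFeature(row=12.5, col=40.0, orientation=0.3, scale=2.1,
                      descriptor=[0.0] * 128)
```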

SIFT Image Matching

1) Extract SIFT features

2) Match using nearest-neighbor search

3) Apply semi-local constraints on orientation, scale, and location

4) Huzzah!
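
Step 2 of the matching procedure above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the original implementation: the function names are hypothetical, the search is brute force, and the distance-ratio check (with its 0.8 threshold) is an added assumption that does not appear on the slide. The semi-local constraints of step 3 are not implemented here.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(query: List[Sequence[float]],
                      train: List[Sequence[float]],
                      max_ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Return (query_index, train_index) pairs found by nearest-neighbor search.

    A match is dropped when the best distance is not clearly smaller than
    the second-best one (assumed ratio test, not taken from the slides).
    """
    matches = []
    for qi, q in enumerate(query):
        # Brute-force search: distance from this query descriptor to every train descriptor.
        dists = sorted((euclidean(q, t), ti) for ti, t in enumerate(train))
        if not dists:
            continue
        best_dist, best_idx = dists[0]
        if len(dists) > 1 and best_dist >= max_ratio * dists[1][0]:
            continue  # ambiguous: two candidates are almost equally close
        matches.append((qi, best_idx))
    return matches


# Toy 2-D "descriptors" (real SIFT descriptors have 128 dimensions).
query = [[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]]
train = [[0.0, 0.1], [10.0, 10.2], [100.0, 100.0]]
matches = match_descriptors(query, train)
```

In this toy example the third query point sits roughly halfway between two candidates, so the ratio check discards it and only the first two points are matched.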

Goal: Object Classifier

Most object classes won't share exact SIFT features

Need to abstract properties of the class into a form that we can reason with

...do we really need context?

No.

Image parts: Thomas Hawk

Image: Thomas Hawk

Bag of Words

Comes from computational linguistics, document matching

Cluster features into codebook words

(Using k-means, usually)

Image descriptor is a histogram vector counting how many times each word is seen

[5 8 14 2 12 4 3 5 11 26 1 3 ...]
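
The codebook and histogram steps above could look roughly like this. It is a small pure-Python sketch with hypothetical function names, using a plain Lloyd-style k-means with random initialization; a real system would more likely use an optimized library and far more descriptors.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor: Vector, codebook: List[Vector]) -> int:
    """Index of the codebook word closest to the descriptor."""
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def kmeans(descriptors: List[Vector], k: int,
           iterations: int = 20, seed: int = 0) -> List[List[float]]:
    """Cluster descriptors into k codebook words (basic Lloyd iterations)."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(d, centers)].append(d)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster went empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bow_histogram(descriptors: List[Vector], codebook: List[Vector]) -> List[int]:
    """Count how many descriptors fall on each codebook word."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1
    return hist


# Toy 2-D data forming two tight groups (real descriptors are 128-D).
data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
codebook = kmeans(data, k=2)
histogram = bow_histogram(data, codebook)
```

The histogram has one entry per codebook word, so every image is described by a vector of the same length regardless of how many features it contains.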

Support Vector Machine (SVM)

Very popular and reasonably fast

Given a set of training vectors and their labels, it builds a classifier that assigns a label to any other vector

(It finds the hyperplane that maximizes the margin between the classes)
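
For reference, the margin idea can be written as an optimization problem. This is the standard hard-margin formulation, which assumes two linearly separable classes with labels y_i in {-1, +1}; the symbols w, b, and x_i are notation introduced here rather than taken from the slides.

```latex
\min_{w,\,b} \ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,\bigl(w^{\top} x_i + b\bigr) \ge 1 \quad \text{for all } i
```

Under this formulation the margin has width 2 / ||w||, so minimizing ||w|| maximizes the margin, and a new vector x is labeled by the sign of w^T x + b.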

Training Data

But where do we get training data?

Dangers of data sets

Background class?

Well framed?

Sufficient variation?

Bag of Words Classifier

Putting it all together

For each training image

Extract SIFT features

Cluster into codebook words

Generate the vectors

Train the SVM on these vectors

To test an image

Extract SIFT features

Generate the vector

Use the SVM to predict the class
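
The training and testing steps above might be wired together as follows. This is a hedged sketch, not the code behind the results on the next slide: it assumes NumPy and scikit-learn are available, the helper names are hypothetical, the linear kernel is an arbitrary choice, and random arrays stand in for real SIFT descriptors, which would come from a separate feature extractor.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC


def bow_vector(descriptors, codebook):
    """Histogram of codebook words for one image's descriptors."""
    words = codebook.predict(descriptors)
    return np.bincount(words, minlength=codebook.n_clusters)


def train_classifier(train_descriptors, labels, n_words):
    """Cluster all training descriptors into a codebook, then train the SVM."""
    codebook = KMeans(n_clusters=n_words, n_init=10, random_state=0)
    codebook.fit(np.vstack(train_descriptors))
    vectors = np.array([bow_vector(d, codebook) for d in train_descriptors])
    classifier = SVC(kernel="linear").fit(vectors, labels)
    return codebook, classifier


def predict_class(descriptors, codebook, classifier):
    """Generate the test image's vector and ask the SVM for its class."""
    return classifier.predict([bow_vector(descriptors, codebook)])[0]


# Stand-in data: each "image" is 30 random 128-D descriptors around a class center.
rng = np.random.default_rng(0)


def fake_image(center):
    return rng.normal(loc=center, scale=0.1, size=(30, 128))


train_descriptors = [fake_image(0.0) for _ in range(5)] + [fake_image(1.0) for _ in range(5)]
labels = ["none"] * 5 + ["bike"] * 5
codebook, classifier = train_classifier(train_descriptors, labels, n_words=2)
prediction = predict_class(fake_image(1.0), codebook, classifier)
```

Note that the codebook is built once from the training descriptors and then reused unchanged at test time, so training and test vectors count the same words.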

Classifier Results

68.9% accurate!

(Not bad for a couple hours of programming...)

Confusion matrix:

Any questions?

Other Machine Learning Systems

Randomized trees/forests

Image: Wikipedia

Boosting

Image: Kihwan Kim

Viola Jones Face Detector

Image: Kihwan Kim

Summed Area Table / Integral Image

Image: Nvidia
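
A summed area table stores, for each pixel, the sum of all pixels above and to the left of it, so the sum over any rectangle takes four lookups. The Viola-Jones detector relies on this to evaluate its rectangle features in constant time. A minimal sketch follows, with hypothetical function names and an extra zero row and column added as a convenience:

```python
from typing import List


def integral_image(image: List[List[float]]) -> List[List[float]]:
    """Build a summed area table with one extra zero row and column.

    table[r][c] holds the sum of image[0:r][0:c], so the padding removes
    the need for special cases at the top and left edges.
    """
    rows, cols = len(image), len(image[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            table[r + 1][c + 1] = (image[r][c]
                                   + table[r][c + 1]   # sum of the block above
                                   + table[r + 1][c]   # sum of the block to the left
                                   - table[r][c])      # overlap counted twice
    return table


def box_sum(table: List[List[float]],
            top: int, left: int, bottom: int, right: int) -> float:
    """Sum of the pixels in rows top..bottom-1 and columns left..right-1."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(image)
lower_right = box_sum(table, 1, 1, 3, 3)  # 5 + 6 + 8 + 9
```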

Confusion matrix (Predicted / Actual):

          None    Bike
None      0.83    0.17
Bike      0.37    0.63