Pixel level localizationaz/icvss08_az_spatial.pdf · 2009-10-11 · Spatial Pyramid with optimization Spatial pyramid with feature combinationSpatial Pyramid with optimization Lazebnik

Outline

1. Bag of visual words model for categorization

• SVM classifier

2. Adding spatial information for localization

3. Databases and challenges

4. Spatial layout

5. Class based segmentation

• Pixel level localization

6. Conclusions and the future

Beyond a bag of visual words

Outline

• Region of Interest (ROI)

• jumping/sliding window for localization

• Spatial tiling

• Histogram of Gradients (HOG)

• Spatial pyramid

• Case study

• Problem of background clutter • Use a sub-window

– At correct position, no clutter is present

Region of Interest (ROI)

– Scale / orientation range to search over – Speed– Context

Sliding window detection

search over scale

SmallestScale

LargerScale

Problems with sliding windows …

• aspect ratio

• granuality (finite grid)

• partial occlusion

• multiple responses

See recent work by

• Christoph Lampert et al CVPR 08, ECCV 08

• Bosch et al BMVC 08

Sliding window• Classifier: SVM with linear kernel

• bag of visual word representation of ROI

• Stronger training: ROI on object instance

Example detections for dog

Lampert et al CVPR 08

More spatial information - tilingUse spatial grid to define correspondence

If codebook has V visual words, then representation has dimension 4V

Fergus et al ICCV 05

• parameter: number of tiles

Ex: Leibe & Schiele 03/04 : Generalized Hough Transform

• Learning: for every cluster, store possible “occurrences”

• Recognition: for new image, let the matched patches vote for possible object positions

Voting Space(continuous)

Interest Points Matched Codebook Entries

Probabilistic Voting

Backprojectionof Maximum

Ex: Leibe & Schiele 03/04 : Generalized Hough Transform

More features – histogram of orientations

Counts in orientation bins can be thought of as visual words

imagedominant direction HOG

frequ

ency

orientation

• tiling

• each tile represents HOG

• dense descriptor

Ex 1: Human (Pedestrian) DetectionHistograms of Oriented Gradients for Human DetectionDalal & Triggs, CVPR 2005

• training: ROI over pedestrian

• classification: linear SVM on HOG

• NB similarity to SIFT, GIST

Dalal and Triggs, CVPR 2005

Learned model

Slide from Deva Ramanan

f(x) = w>x+ b

Slide from Deva Ramanan

Ex 2: Upper body detector – using HOGs

• Ferrari et al CVPR 08

average training data

More features – dense visual words

Vogel & Schiele 2004, Jurie & Triggs ICCV 05 , Fei-Fei & Perona CVPR 05, Bosch et al ECCV 06

DENSE PATCHES

…

…

Row reorder gray valuesand form a vector ofsize N2

Parameters: N – size of patchM – distance between patches

Textons

… …… …… …

128- SIFT descriptor

Parameters: r – radi of patchM – distance between patches

SIFT

Luong & Malik 1999, Varma & Zisserman 2003

More Spatial information – Pyramid kernels

• Divide image into grids of varying resolution, and give more weight to agreement in finer grids.– 2^l grids at level l

• Intersect histograms, multiply by weight.

Lazebnik et al. [CVPR 2006]

Based on Grauman & Darrell ICCV 05

Spatial Pyramid Kernels for Geometry/Appearance Matching(Lazebnik et al CVPR’06)

More Spatial information – Pyramid kernels

9 matches x 1

= 9

9 matches x 1

= 9

9 matches x 1 4 matches x ½

= 9 = 2

9 matches x 1 4 matches x ½

= 9 = 2

9 matches x 1 4 matches x ½ 2 matches x ¼

= 9 = 2 = 1/2

Total matching weight (value of spatial pyramid kernel ): 9 + 2 + 0.5 = 11.5

l = 0 l = 1 l = 2

+ +

Pyramid spatial layout for appearance patches – for images

BoW 4BoW 16BoW

Represent appearance as dense grid of visual words

l = 1

l = 0

x y

Generalizations• Use chi-squared kernel instead of histogram intersection

Kf(i, j) =Xl∈L

βlfe−μχ2(hlf(i),hlf(j))

l = 0 l = 1 l = 2

1HOG 4HOG 16HOG

Pyramid HOG – for imagesRepresent local orientated gradients

Pyramid HOG for image regions0 1 LEVEL 2

Pyramid HOG for image regions0 1 LEVEL 2

Case study: scene classification

Suburb Bedroom Kitchen Living room Office

Coast Forest Mountain Open country River Sky/clouds

Vog

el&

Sch

iele

-VS

FeiF

ei&

Per

ona

-FP

Coast Forest Mountain Open country Highway Inside city Tall building Street

Oliv

a &

Tor

ralb

a -O

T

Laze

bnik

et a

l. -L

SP

Store Industrial

Vogel & Schiele DATASET

Coast Forest Mountain

Open country River Sky/clouds

702 images VS dataset6 categories

Oliva & Torralba DATASET

Coast Forest Mountain Open country

Highway Inside city Tall building Street

2688 images8 categories

OT dataset


FP dataset

Fei-Fei & Perona DATASET



Lazebnik, Schmid & Ponce DATASET



Store Industrial4385 images15 categories

LSP dataset

Mulit-way Classification

• For each class ‘c’ learn a 1-vs-rest SVM classifier

• Classification of test image I according to:

• where is the distance for the SVM for class c

c∗ = argmaxcDc(I)

Dc(I)

VisualVocabulary

w1

w2

w3w4

……

……

Features

• bag of visual words

• HOG

• spatial pyramid of visual words

• spatial pyramid HOG

Parameters• vocabulary size V

• level weightings

• feature combination weights

l = 0 l = 1 l = 2

Dataset

Validation set

100

Learn themodels

Others

½ Training ½ Testing

Classification

Methodology for learning parameter values

• optimize classification performance on a validation set

• 1 vs rest SVM classifier

SVM-BOW

60

65

70

75

80

85

90

200 500 1000 1500 2000 2500 5000

V

Per

form

ance

(%)

SVM-BOW

Optimize vocabulary size V on validation set


OT dataset

Spatial Pyramid with optimizationSpatial Pyramid with optimizationSpatial pyramid with learnt weights

without optimization:

with optimization:

Kf(i, j) =Xl∈L

βlfe−μχ2(hlf(i),hlf(j))

• Learn level weights – linear combination of kernels

level weightbase kernel

dense visual words

• if weights common to all classes:

β0 = 0.25, β1 = 0.25, β2 = 0.5

β0 : β2 = 1.0, β1 : β2 = 0.8

Optimized values for each datasetSame number of Images as authors

SVM one against all – x2 kernel for visual words Up to L = 2 for spatial pyramid

88.0

100

89.2

78.3

93.6

91.1

90.8

97.5

Coast

Forest

Mountain

Op. Country

Highway

Inside City

Street

Tall building

Ope

nco

untry

Mou

ntai

ns

Coa

st

Ope

nco

untry

Hig

hway

Stre

et

Spatial Pyramid with optimizationSpatial Pyramid with optimizationSpatial pyramid with feature combination

Lazebnik et al.dense visual words

dense visual words optimized

dense visual words & HOG optimized

81.1 83.5 90.2

Kopt(i, j) =Xf∈F

dfKf(i, j)

• feature weights – linear combination of kernels

feature weightfeature kernel

• dense visual words

• HOG

Take home messages

• Lite use of spatial information

• tiling, spatial pyramid

• Combination of features

• visual words, sparse, dense, HOG

• Learn parameters on validation set

More classifiers …• SVM Classifier

• good performance

• convex optimization

• Logistic regression

• Adaboost

• e.g. used by Viola & Jones face detector

• slow to learn, fast to test

• Random forests

• fast to learn, fast to test

• Jamie Shotton tutorial

Documents

Pixel level localizationaz/icvss08_az_spatial.pdf · 2009-10-11 · Spatial Pyramid with optimization Spatial pyramid with feature combinationSpatial Pyramid with optimization Lazebnik