Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Outline
1. Bag of visual words model for categorization
• SVM classifier
2. Adding spatial information for localization
3. Databases and challenges
4. Spatial layout
5. Class based segmentation
• Pixel level localization
6. Conclusions and the future
Beyond a bag of visual words
Outline
• Region of Interest (ROI)
• jumping/sliding window for localization
• Spatial tiling
• Histogram of Gradients (HOG)
• Spatial pyramid
• Case study
• Problem of background clutter • Use a sub-window
– At correct position, no clutter is present
Region of Interest (ROI)
– Scale / orientation range to search over – Speed– Context
Sliding window detection
search over scale
SmallestScale
LargerScale
Problems with sliding windows …
• aspect ratio
• granuality (finite grid)
• partial occlusion
• multiple responses
See recent work by
• Christoph Lampert et al CVPR 08, ECCV 08
• Bosch et al BMVC 08
Sliding window• Classifier: SVM with linear kernel
• bag of visual word representation of ROI
• Stronger training: ROI on object instance
Example detections for dog
Lampert et al CVPR 08
More spatial information - tilingUse spatial grid to define correspondence
If codebook has V visual words, then representation has dimension 4V
Fergus et al ICCV 05
• parameter: number of tiles
Ex: Leibe & Schiele 03/04 : Generalized Hough Transform
• Learning: for every cluster, store possible “occurrences”
• Recognition: for new image, let the matched patches vote for possible object positions
Voting Space(continuous)
Interest Points Matched Codebook Entries
Probabilistic Voting
Backprojectionof Maximum
Ex: Leibe & Schiele 03/04 : Generalized Hough Transform
More features – histogram of orientations
Counts in orientation bins can be thought of as visual words
imagedominant direction HOG
frequ
ency
orientation
• tiling
• each tile represents HOG
• dense descriptor
Ex 1: Human (Pedestrian) DetectionHistograms of Oriented Gradients for Human DetectionDalal & Triggs, CVPR 2005
• training: ROI over pedestrian
• classification: linear SVM on HOG
• NB similarity to SIFT, GIST
Dalal and Triggs, CVPR 2005
Learned model
Slide from Deva Ramanan
f(x) = w>x+ b
Slide from Deva Ramanan
Ex 2: Upper body detector – using HOGs
• Ferrari et al CVPR 08
average training data
More features – dense visual words
Vogel & Schiele 2004, Jurie & Triggs ICCV 05 , Fei-Fei & Perona CVPR 05, Bosch et al ECCV 06
DENSE PATCHES
…
…
Row reorder gray valuesand form a vector ofsize N2
Parameters: N – size of patchM – distance between patches
Textons
… …… …… …
128- SIFT descriptor
Parameters: r – radi of patchM – distance between patches
SIFT
Luong & Malik 1999, Varma & Zisserman 2003
More Spatial information – Pyramid kernels
• Divide image into grids of varying resolution, and give more weight to agreement in finer grids.– 2^l grids at level l
• Intersect histograms, multiply by weight.
Lazebnik et al. [CVPR 2006]
Based on Grauman & Darrell ICCV 05
Spatial Pyramid Kernels for Geometry/Appearance Matching(Lazebnik et al CVPR’06)
More Spatial information – Pyramid kernels
9 matches x 1
= 9
9 matches x 1
= 9
9 matches x 1 4 matches x ½
= 9 = 2
9 matches x 1 4 matches x ½
= 9 = 2
9 matches x 1 4 matches x ½ 2 matches x ¼
= 9 = 2 = 1/2
Total matching weight (value of spatial pyramid kernel ): 9 + 2 + 0.5 = 11.5
l = 0 l = 1 l = 2
+ +
Pyramid spatial layout for appearance patches – for images
BoW 4BoW 16BoW
Represent appearance as dense grid of visual words
l = 1
l = 0
x y
Generalizations• Use chi-squared kernel instead of histogram intersection
Kf(i, j) =Xl∈L
βlfe−μχ2(hlf(i),hlf(j))
l = 0 l = 1 l = 2
1HOG 4HOG 16HOG
Pyramid HOG – for imagesRepresent local orientated gradients
Pyramid HOG for image regions0 1 LEVEL 2
Pyramid HOG for image regions0 1 LEVEL 2
Case study: scene classification
Suburb Bedroom Kitchen Living room Office
Coast Forest Mountain Open country River Sky/clouds
Vog
el&
Sch
iele
-VS
FeiF
ei&
Per
ona
-FP
Coast Forest Mountain Open country Highway Inside city Tall building Street
Oliv
a &
Tor
ralb
a -O
T
Laze
bnik
et a
l. -L
SP
Store Industrial
Vogel & Schiele DATASET
Coast Forest Mountain
Open country River Sky/clouds
702 images VS dataset6 categories
Oliva & Torralba DATASET
Coast Forest Mountain Open country
Highway Inside city Tall building Street
2688 images8 categories
OT dataset
3759 images13 categories
FP dataset
Fei-Fei & Perona DATASET
Suburb Bedroom Kitchen Living room Office
Coast Forest Mountain Open country Highway Inside city Tall building Street
Lazebnik, Schmid & Ponce DATASET
Suburb Bedroom Kitchen Living room Office
Coast Forest Mountain Open country Highway Inside city Tall building Street
Store Industrial4385 images15 categories
LSP dataset
Mulit-way Classification
• For each class ‘c’ learn a 1-vs-rest SVM classifier
• Classification of test image I according to:
• where is the distance for the SVM for class c
c∗ = argmaxcDc(I)
Dc(I)
VisualVocabulary
w1
w2
w3w4
……
……
Features
• bag of visual words
• HOG
• spatial pyramid of visual words
• spatial pyramid HOG
Parameters• vocabulary size V
• level weightings
• feature combination weights
l = 0 l = 1 l = 2
Dataset
Validation set
100
Learn themodels
Others
½ Training ½ Testing
Classification
Methodology for learning parameter values
• optimize classification performance on a validation set
• 1 vs rest SVM classifier
SVM-BOW
60
65
70
75
80
85
90
200 500 1000 1500 2000 2500 5000
V
Per
form
ance
(%)
SVM-BOW
Optimize vocabulary size V on validation set
2688 images8 categories
OT dataset
Spatial Pyramid with optimizationSpatial Pyramid with optimizationSpatial pyramid with learnt weights
without optimization:
with optimization:
Kf(i, j) =Xl∈L
βlfe−μχ2(hlf(i),hlf(j))
• Learn level weights – linear combination of kernels
level weightbase kernel
dense visual words
• if weights common to all classes:
β0 = 0.25, β1 = 0.25, β2 = 0.5
β0 : β2 = 1.0, β1 : β2 = 0.8
Optimized values for each datasetSame number of Images as authors
SVM one against all – x2 kernel for visual words Up to L = 2 for spatial pyramid
88.0
100
89.2
78.3
93.6
91.1
90.8
97.5
Coast
Forest
Mountain
Op. Country
Highway
Inside City
Street
Tall building
Ope
nco
untry
Mou
ntai
ns
Coa
st
Ope
nco
untry
Hig
hway
Stre
et
Spatial Pyramid with optimizationSpatial Pyramid with optimizationSpatial pyramid with feature combination
Lazebnik et al.dense visual words
dense visual words optimized
dense visual words & HOG optimized
81.1 83.5 90.2
Kopt(i, j) =Xf∈F
dfKf(i, j)
• feature weights – linear combination of kernels
feature weightfeature kernel
• dense visual words
• HOG
Take home messages
• Lite use of spatial information
• tiling, spatial pyramid
• Combination of features
• visual words, sparse, dense, HOG
• Learn parameters on validation set
More classifiers …• SVM Classifier
• good performance
• convex optimization
• Logistic regression
• Adaboost
• e.g. used by Viola & Jones face detector
• slow to learn, fast to test
• Random forests
• fast to learn, fast to test
• Jamie Shotton tutorial