k-Nearest Neighbors Search in High Dimensions
Tomer Peled
Dan Kushnir
"Tell me who your neighbors are, and I'll know who you are."
Outline
• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Nearest Neighbor Search – Problem definition
• Given a set P of n points in R^d, over some distance metric
• Find the nearest neighbor p of q in P
Distance metric
Applications
• Classification
• Clustering
• Segmentation
• Indexing
• Dimension reduction (e.g. LLE)
(Example feature space: color vs. weight.)
Naïve solution
• No preprocessing
• Given a query point q: go over all n points, do the comparison in R^d
• Query time = O(nd)
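The naïve scan above can be sketched in a few lines of Python (an illustrative sketch; the function name is ours):

```python
import math

def nearest_neighbor(points, q):
    """Naive NN: scan all n points and compare in R^d -- O(nd) per query."""
    best, best_dist = None, math.inf
    for p in points:
        # one O(d) distance computation
        dist = math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))
        if dist < best_dist:
            best, best_dist = p, dist
    return best
```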
Keep in mind
Common solution
• Use a data structure for acceleration
• Scalability with n and with d is important
When to use nearest neighbor
High level algorithms
Assuming no prior knowledge about the underlying probability structure:
• Parametric → probability distribution estimation
• Non-parametric → density estimation / nearest neighbors
(Complex models, sparse data, and high dimensions favor the non-parametric route.)
Nearest Neighbor
NN(q) = argmin_{p_i ∈ P} dist(q, p_i) — the closest point to q
r - Nearest Neighbor
dist(q, p1) ≤ r
dist(q, p2) ≤ (1 + ε) r
r2 = (1 + ε) r1
Outline
• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
The simplest solution
• Lion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
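The three steps above can be sketched as a minimal 2-D quadtree build (illustrative; assumes distinct points so the recursion terminates):

```python
def build_quadtree(points, xmin, xmax, ymin, ymax):
    """Recursively split the cell in both dimensions until it holds <= 1 point."""
    if len(points) <= 1:
        return {"points": points}
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    children = {"SW": [], "SE": [], "NW": [], "NE": []}
    for (x, y) in points:
        # route the point to the quadrant containing it
        key = ("N" if y >= ymid else "S") + ("E" if x >= xmid else "W")
        children[key].append((x, y))
    return {
        "split": (xmid, ymid),
        "SW": build_quadtree(children["SW"], xmin, xmid, ymin, ymid),
        "SE": build_quadtree(children["SE"], xmid, xmax, ymin, ymid),
        "NW": build_quadtree(children["NW"], xmin, xmid, ymid, ymax),
        "NE": build_quadtree(children["NE"], xmid, xmax, ymid, ymax),
    }
```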
Quadtree - structure
(Each split point (X1, Y1) partitions the cell into four children: P<X1,P<Y1; P≥X1,P<Y1; P<X1,P≥Y1; P≥X1,P≥Y1.)
Quadtree - Query
In many cases this works: descend the tree to the leaf quadrant containing q.
Quadtree ndash Pitfall1
In some cases it doesn't: the nearest neighbor may lie in an adjacent cell, forcing backtracking.
Quadtree ndash Pitfall1
In some cases nothing works.
Quadtree – pitfall 2
O(2^d) cells may need to be inspected, which could result in query time exponential in the dimension.
Space partition based algorithms
"Multidimensional Access Methods", Volker Gaede & Oliver Günther
Could be improved
Outline
• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Curse of dimensionality
• Query time or space: O(min(n·d, n^d)) — for d > 10..20 this is worse than a sequential scan for most geometric distributions
• Techniques specific to high dimensions are needed
• Proved in theory and in practice by Barkol & Rabani (2000) and Beame & Vee (2002)
Curse of dimensionality – some intuition: the number of cells grows as 2, 2², 2³, …, 2^d.
Outline
• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Preview
• General solution – Locality Sensitive Hashing
• Implementation for Hamming space
• Generalization to l1 & l2
Hash function
Data_Item → Key → Bin/Bucket
Hash function
Example: h(X) = X modulo 3
X = a number in the range 0..n; the key (0..2) gives the storage address in the data structure.
Usually we would like related data items to be stored in the same bin.
Recall r - Nearest Neighbor
dist(q, p1) ≤ r
dist(q, p2) ≤ (1 + ε) r
r2 = (1 + ε) r1
Locality sensitive hashing
A family of hash functions is (r1, r2, p1, p2)-sensitive if:
• Pr[I(p) = I(q)] is "high" (≥ p1) if p is "close" to q (dist ≤ r1)
• Pr[I(p) = I(q)] is "low" (≤ p2) if p is "far" from q (dist ≥ r2 = (1 + ε) r1)
Preview
• General solution – Locality Sensitive Hashing
• Implementation for Hamming space
• Generalization to l1 & l2
Hamming Space
• Hamming space = the 2^N binary strings of length N
• Hamming distance = the number of differing digits
a.k.a. signal distance (Richard Hamming)
Example (N-bit strings):
010100001111
010010000011
Distance = 4
• Hamming distance: dist(X1, X2) = SUM(X1 XOR X2)
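In code, this distance is a popcount of the XOR (a minimal Python sketch):

```python
def hamming_distance(x1, x2):
    """Hamming distance = number of set bits in x1 XOR x2."""
    return bin(x1 ^ x2).count("1")
```

On the example strings above it returns 4, matching the slide.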
L1 to Hamming Space Embedding
Each coordinate value v (0 ≤ v ≤ C) is written in unary: v ones followed by C − v zeros.
Example (C = 11): p = (8, 2) → 11111111000 11000000000
The embedded string has d' = C·d bits.
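The unary embedding can be sketched directly; it reproduces the slide's example and makes l1 distances equal to Hamming distances (illustrative; the function name is ours):

```python
def embed_l1_to_hamming(p, C):
    """Unary embedding: each coordinate v in 0..C becomes v ones
    followed by C - v zeros, so l1 distances become Hamming distances."""
    return "".join("1" * v + "0" * (C - v) for v in p)
```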
Hash function
G_j(p) = p|I_j — k bits sampled from p ∈ H^d' at positions I_j, for j = 1..L (here k = 3 digits).
Store p into bucket G_j(p); there are 2^k buckets (e.g. key 101).
Construction: insert each point p into tables 1, 2, …, L.
Query: look up q in the buckets G_1(q), …, G_L(q).
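The construction and query steps can be sketched as follows (illustrative Python; the function names are ours):

```python
import random

def build_lsh(points, d, k, L, seed=0):
    """Build L tables; table j keys each d'-bit string by k sampled positions I_j."""
    rng = random.Random(seed)
    samplings = [rng.sample(range(d), k) for _ in range(L)]
    tables = [{} for _ in range(L)]
    for p in points:
        for I, table in zip(samplings, tables):
            key = "".join(p[i] for i in I)  # G_j(p) = p | I_j
            table.setdefault(key, []).append(p)
    return samplings, tables

def query_lsh(q, samplings, tables):
    """Candidate set = union of the L buckets G_1(q), ..., G_L(q)."""
    candidates = set()
    for I, table in zip(samplings, tables):
        candidates.update(table.get("".join(q[i] for i in I), []))
    return candidates
```

A stored point always collides with itself in every table, so querying with a database string returns it among the candidates.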
Alternative intuition: random projections
Example (C = 11): p = (8, 2) → 11111111000 11000000000, d' = C·d
Each of the k sampled bits acts as a random axis-parallel cut; together they map p into one of 2^k = 2³ buckets (000, 100, 110, 001, 101, 111, …). Repeating this L times gives L independent bucketings.
Secondary hashing
Supports volume tuning: dataset size vs. storage volume. The 2^k buckets are mapped by a simple hashing into M buckets of size B, with M·B = αn, α = 2.
The above hashing is locality-sensitive
• Pr(p, q in the same bucket) = (1 − Dist(p, q)/d')^k
(The plots show this probability vs. Dist(q, p_i) for k = 1 and k = 2.)
Adapted from Piotr Indyk's slides
Preview
• General solution – Locality Sensitive Hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Uses a mathematical trick:
• A p-stable distribution for the l_p distance; the Gaussian distribution for the l2 distance
Central limit theorem
For real numbers v1, …, vn and X1, …, Xn independent identically distributed (i.i.d.) Gaussians:
v1·X1 + v2·X2 + … + vn·Xn is again Gaussian — a weighted sum of Gaussians is Gaussian.
Central limit theorem
Dot product and norm:  Σ_i v_i·X_i ~ ||v||₂ · X,  with X ~ N(0, 1).
Dot product and distance: for two feature vectors u and v,
Σ_i u_i·X_i − Σ_i v_i·X_i = Σ_i (u_i − v_i)·X_i ~ ||u − v||₂ · X,
so the difference of the two dot products is Gaussian with scale equal to the distance between the vectors.
The full Hashing

h_{a,b}(v) = ⌊(a·v + b) / w⌋

• v – the feature vector (e.g. [34 82 21])
• a – d random numbers, i.i.d. from a p-stable distribution
• b – a random phase in [0, w]
• w – the discretization step

Example: with a·v = 7944, b = 34, w = 100, the value 7944 + 34 = 7978 falls in the bucket [7900, 8000), i.e. h = 79 (bucket boundaries at 7800, 7900, 8000, 8100, 8200).
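A sketch of h_{a,b} for the l2 case, with Gaussian (2-stable) coefficients (illustrative; the function names are ours):

```python
import math
import random

def make_l2_hash(d, w, seed=0):
    """h_{a,b}(v) = floor((a . v + b) / w), with a ~ N(0,1)^d (2-stable)
    and b a random phase drawn uniformly from [0, w)."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h
```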
Generalization: p-stable distributions
• l_p, 0 < p ≤ 2: Generalized Central Limit Theorem → a p-stable distribution (e.g. Cauchy for l1)
• l2: Central Limit Theorem → the Gaussian (normal) distribution
P-stable summary
• Works for the r-nearest-neighbor problem; generalizes to 0 < p ≤ 2
• Improves query time: from O(d·n^{1/(1+ε)}·log n) to O(d·n^{1/(1+ε)²}·log n)
(Latest results, reported by e-mail by Alexander Andoni.)
Parameters selection (for Euclidean space)
• Target: ≥ 90% success probability with the best query-time performance
Parameters selection… (for Euclidean space)
• A single projection hits an r-nearest neighbor with Pr = p1
• k projections hit it with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure a collision with probability ≥ 1 − δ (e.g. 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ  ⇒  L ≥ log(δ) / log(1 − p1^k)
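Solving the collision inequality for L gives the number of tables directly (a small sketch; the function name is ours):

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))
```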
…Parameters selection
(The plots show query time vs. k: as k grows, candidate-extraction time falls while candidate-verification time rises; non-neighbors are rejected and neighbors accepted at the verification step.)
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimension)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – E-mail Alex Andoni (andoni@mit.edu)
  – Test it over your own data (C code, under Red Hat Linux)
LSH – Applications
• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola and T. Darrell
• Finding sensitive hash functions
"Mean Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni and P. Meer
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?

Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola and T. Darrell
Ingredients
• Input: a query image with unknown angles (parameters)
• A database of human poses with known angles
• An image feature extractor – edge detector
• A distance metric in feature space: d_x
• A distance metric in angle space: d_θ(θ1, θ2) = Σ_{i=1}^{m} (1 − cos(θ1,i − θ2,i))
Example-based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query

Input: query → find the KNN in the database of examples → output: average angles of the KNN
The algorithm flow:
Input query → feature extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms.
Feature Extraction PSH LWR
PSH: the basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ). We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
(Figure: the query q mapped between the feature space and the parameter space of angles.) Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
1. Label pairs of examples with similar angles
2. Define hash functions h on the feature space
3. Predict the labeling of similar/non-similar examples by using h
4. Compare the labelings
5. If the labeling by h is good, accept h; else change h
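The accept/reject step above can be phrased as measuring a candidate hash's accuracy on the labeled pairs (a sketch; the threshold-hash form mirrors the binary hash used here, and the helper names are ours):

```python
def make_threshold_hash(coord, T):
    """A binary hash: +1 if feature `coord` of x is at least T, else -1."""
    return lambda x: 1 if x[coord] >= T else -1

def hash_accuracy(h, labeled_pairs):
    """Fraction of pairs ((x_i, x_j), y) whose label h predicts correctly:
    predicted +1 (similar) iff h puts both examples in the same bucket."""
    correct = 0
    for (xi, xj), y in labeled_pairs:
        y_hat = 1 if h(xi) == h(xj) else -1
        correct += (y_hat == y)
    return correct / len(labeled_pairs)
```

A hash with high accuracy on the labeled pairs is accepted; otherwise the threshold (or feature) is changed.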
PSH as a classification problem
Labels (r = 0.25): a pair of examples (x_i, x_j) is labeled
  y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
  y_ij = −1 if d_θ(θ_i, θ_j) ≥ (1 + ε) r
A binary hash function on the features:
  h_T(x) = +1 if x ≥ T, −1 otherwise
Predict the labels:
  ŷ_ij = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best T that predicts the true labeling under the probability constraints: h_T will place both examples in the same bin, or separate them.
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query: each neighbor x_i ∈ N(x_0) contributes with weight K(d_x(x_i, x_0)), a kernel of its feature-space distance from the query, and the angles minimizing the weighted error are returned
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, facial expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?): 18-bit hash functions (k = 18), 150 hash tables (L = 150)
• Test on 1,000 synthetic examples: PSH searched only 3.4% of the data per query
• Without the selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, B the maximum number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Results – real data
Interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p1, …, pn} with radii r1, …, rn
• Goal: given a query q, preprocess the points in P to find a point p_i whose sphere covers the query q
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
(Figure: the mean-shift window of a given bandwidth around a point.)
Mean-shift | LSH: optimal k,l | LSH: data partition | LSH: data struct
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth.
The bandwidth is based on the kth nearest neighbor of the point (e.g. proportional to the distance to that neighbor).
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
(original → filtered → segmented)
Filtering: the pixel value is set to that of the nearest mode.
Mean-shift trajectories
Filtering examples (original vs. filtered: squirrel, baboon)
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Segmentation examples
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled with a variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x we evaluate the K inequalities x_{d_k} < v_k; the resulting vector of outcomes is the cell of x
• This partitions the data into cells
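The cell-membership test can be sketched in a few lines (illustrative; the function name is ours):

```python
def cell_key(x, partition):
    """A partition is K pairs (d_k, v_k); the cell of x is the K-vector
    of outcomes of the inequalities x[d_k] < v_k."""
    return tuple(x[d_k] < v_k for (d_k, v_k) in partition)
```

Two points fall in the same cell of a partition exactly when all K inequality outcomes agree.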
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• Large K → a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• As L increases, coverage increases but selectivity decreases; K and L determine the resolution of the data structure
Choosing the optimal K and L
• Determine accurately the KNN for m randomly-selected data points, giving the true distance (bandwidth)
• Choose an error threshold ε
• The optimal K and L should keep the approximate distance within the threshold of the true one
Choosing the optimal K and L
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
(Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)]; the minimum.)
Data-driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
(Figure: bucket distribution for uniform vs. data-driven cut points.)
Additional speedup
Assume that all the points in a cell C will converge to the same mode (C acts as a kind of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
(Low dimension vs. high dimension)
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 – cookies…
Summary
• LSH suggests a compromise on accuracy for a gain in complexity
• Applications that involve massive data in high dimension require LSH's fast performance
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – E-mail Alex Andoni (andoni@mit.edu)
  – Test it over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Outline
bullProblem definition and flavorsProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
bull Given a set P of n points in Rd
Over some metric
bull find the nearest neighbor p of q in P
Nearest Neighbor SearchProblem definition
Distance metric
Applications
bullClassification bullClustering
bullSegmentation
q
bullIndexingbullDimension reduction
(eg lle)
color
Weight
Naiumlve solution
bullNo preprocess
bullGiven a query point qndashGo over all n pointsndashDo comparison in Rd
bullquery time = O(nd)
Keep in mind
Common solution
bullUse a data structure for acceleration
bullScale-ability with n amp with d is important
When to use nearest neighbor
High level algorithms
Assuming no prior knowledge about the underlying probability structure
complex models Sparse data High dimensions
Parametric Non-parametric
Density estimation
Probability distribution estimation
Nearest neighbors
Nearest Neighbor
min pi P dist(qpi)
Closestqq
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN + smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in Rd, centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: given a query q, preprocess the spheres so as to find a point pi whose sphere covers the query q
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g., in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
(figure: a window of a given bandwidth around a point is shifted toward the local mean)
Mean-shift | LSH: optimal k,l | LSH: data partition | LSH: data struct
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region:
high density → small bandwidth; low density → large bandwidth.
It is based on the kth nearest neighbor of the point: the bandwidth is h_i = ||x_i − x_{i,k}||.
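The adaptive scheme above can be sketched as follows: per-point bandwidths taken from the k-th nearest neighbor, then a flat-kernel mean-shift iteration. This is illustrative only; the brute-force distance matrix here is exactly the cost that LSH is brought in to avoid.

```python
import numpy as np

def knn_bandwidths(X, k):
    """Per-point bandwidth h_i = distance from x_i to its k-th nearest neighbor."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.sort(D, axis=1)[:, k]          # column 0 is the point itself

def mean_shift_mode(x, X, h, iters=100, tol=1e-6):
    """Shift x to the mean of the points inside a window of bandwidth h (flat kernel)."""
    for _ in range(iters):
        inside = np.linalg.norm(X - x, axis=1) <= h
        x_new = X[inside].mean(axis=0)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

Dense regions get small h_i, sparse regions large h_i, matching the rule stated above.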
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
(figure: original → filtered → segmented)
Filtering: pixel value of the nearest mode
Mean-shift trajectories
Filtering examples
(original squirrel → filtered; original baboon → filtered)
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Segmentation examples
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions. Each partition includes K pairs (d_k, v_k).
• For each point we check the K inequalities x_{d_k} ≤ v_k.
• This partitions the data into cells.
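A sketch of this structure (hypothetical helper names, seeded RNG for reproducibility): each of the L partitions holds K random (dimension, cut-value) pairs, and a point's cell is the vector of its K inequality outcomes.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_partition(X, K):
    """One partition: K random (dimension d_k, cut value v_k) pairs."""
    dims = rng.integers(0, X.shape[1], size=K)
    cuts = rng.uniform(X[:, dims].min(axis=0), X[:, dims].max(axis=0))
    return dims, cuts

def cell_of(x, partition):
    """Cell label: the K boolean outcomes of x[d_k] <= v_k."""
    dims, cuts = partition
    return tuple(x[dims] <= cuts)

def build_tables(X, K, L):
    """L independent partitions, each bucketing the data points by cell label."""
    tables = []
    for _ in range(L):
        part = make_partition(X, K)
        buckets = {}
        for i, x in enumerate(X):
            buckets.setdefault(cell_of(x, part), []).append(i)
        tables.append((part, buckets))
    return tables
```

A query's candidate set is then the union of its buckets across the L tables (the C∪ of the next slides).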
Choosing the optimal K and L
• For a query q, we want the smallest number of distance computations to the points in its buckets.
• Large K → a smaller number of points in a cell.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
• As L increases, the union C∪ of the query's cells increases, but their intersection C∩ decreases; C∩ determines the resolution of the data structure.
Choosing the optimal K and L
• Determine accurately the KNN distance (bandwidth) for m randomly-selected data points.
• Choose an error threshold ε on the approximate distance.
• The optimal K and L should satisfy this ε-constraint.
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).
(plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)]; the minimum)
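The selection loop can be sketched generically. Here `error` and `query_time` stand for measurements taken on the m sampled points; they are hypothetical caller-supplied callbacks, not the paper's code.

```python
def tune_lsh(Ks, Ls, error, query_time, eps=0.05):
    """For each K, take the minimal L whose measured error is within eps,
    then return the (K, L, time) triple with the smallest query time."""
    best = None
    for K in Ks:
        L_min = next((L for L in Ls if error(K, L) <= eps), None)
        if L_min is None:
            continue                  # no L satisfies the constraint for this K
        t = query_time(K, L_min)
        if best is None or t < best[2]:
            best = (K, L_min, t)
    return best
```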
Data-driven partitions
• In the original LSH, cut values are drawn at random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
(figure: bucket distribution, uniform vs. data-driven points)
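The two choices of cut value, side by side (illustrative helpers with a seeded RNG):

```python
import numpy as np

rng = np.random.default_rng(1)

def uniform_cut(X, dim):
    """Original LSH: a cut value drawn uniformly over the data range in this dimension."""
    return rng.uniform(X[:, dim].min(), X[:, dim].max())

def data_driven_cut(X, dim):
    """Data-driven variant: the coordinate of a randomly selected data point,
    so cuts concentrate where the data are dense and buckets stay balanced."""
    return X[rng.integers(len(X)), dim]
```

With skewed data, uniform cuts tend to fall in empty regions, while data-driven cuts land where the points are.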
Additional speedup
Assume that all points in C∩ will converge to the same mode (C∩ acts like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
Low dimension vs. high dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30: cookies…
Summary
• LSH trades some accuracy for a large gain in complexity.
• Applications that involve massive data in high dimensions require the fast performance of LSH.
• Extension of LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.
Conclusion
• …but at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
bull Given a set P of n points in Rd
Over some metric
bull find the nearest neighbor p of q in P
Nearest Neighbor SearchProblem definition
Distance metric
Applications
bullClassification bullClustering
bullSegmentation
q
bullIndexingbullDimension reduction
(eg lle)
color
Weight
Naiumlve solution
bullNo preprocess
bullGiven a query point qndashGo over all n pointsndashDo comparison in Rd
bullquery time = O(nd)
Keep in mind
Common solution
bullUse a data structure for acceleration
bullScale-ability with n amp with d is important
When to use nearest neighbor
High level algorithms
Assuming no prior knowledge about the underlying probability structure
complex models Sparse data High dimensions
Parametric Non-parametric
Density estimation
Probability distribution estimation
Nearest neighbors
Nearest Neighbor
min pi P dist(qpi)
Closestqq
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Applications
bullClassification bullClustering
bullSegmentation
q
bullIndexingbullDimension reduction
(eg lle)
color
Weight
Naiumlve solution
bullNo preprocess
bullGiven a query point qndashGo over all n pointsndashDo comparison in Rd
bullquery time = O(nd)
Keep in mind
Common solution
bullUse a data structure for acceleration
bullScale-ability with n amp with d is important
When to use nearest neighbor
High level algorithms
Assuming no prior knowledge about the underlying probability structure
complex models Sparse data High dimensions
Parametric Non-parametric
Density estimation
Probability distribution estimation
Nearest neighbors
Nearest Neighbor
min pi P dist(qpi)
Closestqq
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH – Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
• Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, and T. Darrell)
  – Finding sensitive hash functions
• Mean Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, and P. Meer)
  – Tuning LSH parameters
  – The LSH data structure is used for algorithm speedups
The Problem
Given an image x, what are the parameters θ in this image?
i.e., angles of joints, orientation of the body, etc.

Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola, and T. Darrell
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space, d_x
• Distance metric in angle space:
  d_θ(θ¹, θ²) = Σ_{i=1}^{m} (1 − cos(θ¹_i − θ²_i))
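The angle-space metric can be sketched directly from the formula above (a minimal illustration, not the paper's code):

```python
import math

def angle_dist(theta1, theta2):
    """d_theta(t1, t2) = sum over joints of (1 - cos(t1_i - t2_i))."""
    return sum(1.0 - math.cos(a - b) for a, b in zip(theta1, theta2))

print(angle_dist([0.0, 0.5], [0.0, 0.5]))  # identical poses -> 0.0
print(angle_dist([0.0], [math.pi]))        # opposite angle  -> 2.0
```

Each joint contributes between 0 (same angle) and 2 (opposite angle), so the metric is bounded by 2m.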
Example-based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute KNN from the database
• Use these KNNs to compute the average angles of the query

Input: query → find KNN in the database of examples → output: average angles of the KNN
The algorithm flow:
input query → feature extraction → processed query → PSH (LSH), against the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms.
[figure: edge-direction histograms computed over image regions A, B at several scales]
[Feature Extraction → PSH → LWR]
PSH: the basic assumption
There are two metric spaces here: feature space (d_x) and parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[figure: mapping between parameter space (angles) and feature space, query q]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick:
• Estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
• The hash functions are applied in feature space, but the KNN are valid in angle space.

• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar / non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
PSH as a classification problem

Labels (r = 0.25): a pair of examples (x_i, x_j) is labeled
  y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
  y_ij = −1 if d_θ(θ_i, θ_j) ≥ (1 + ε) r

A binary hash function on features:
  h_{φ,T}(x) = +1 if φ(x) ≥ T, −1 otherwise

Predict the labels:
  ŷ_ij = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
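The pair labeling and the single-feature threshold hash can be sketched as follows (an illustration, not the paper's code; pairs whose distance falls strictly between r and (1+ε)r return 0 and are skipped, which is an assumption made here):

```python
def make_label(d_theta, r, eps):
    """y_ij = +1 if d_theta <= r, -1 if d_theta >= (1 + eps) * r.
    In-between pairs return 0 and are ignored (illustrative choice)."""
    def label(theta_i, theta_j):
        d = d_theta(theta_i, theta_j)
        if d <= r:
            return +1
        if d >= (1.0 + eps) * r:
            return -1
        return 0
    return label

def make_h(phi, T):
    """Binary hash on features: h_{phi,T}(x) = +1 if phi(x) >= T else -1."""
    return lambda x: +1 if phi(x) >= T else -1

def predicted_label(h, x_i, x_j):
    """y_hat_ij = +1 if the hash places both examples in the same bin."""
    return +1 if h(x_i) == h(x_j) else -1

label = make_label(lambda a, b: abs(a - b), r=0.25, eps=1.0)
h = make_h(lambda x: x[0], T=0.5)
print(label(0.0, 0.1), predicted_label(h, [0.7], [0.9]))  # 1 1
```

Comparing `label` with `predicted_label` over many pairs scores how parameter-sensitive a candidate (φ, T) is.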
Find the best (φ, T) that predicts the true labeling subject to the probability constraints:
h_{φ,T} will place both examples in the same bin, or separate them.
Local Weighted Regression (LWR)
• Given a query image, PSH returns KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:

  β̂ = argmin_β Σ_{x_i ∈ N(x₀)} d_θ( g(x_i; β), θ_i ) · K( d_x(x_i, x₀) )

where d_θ(g(x_i; β), θ_i) is the distance term and K(d_x(x_i, x₀)) is the weight.
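A minimal zeroth-order sketch of the idea — a kernel-weighted average of the neighbors' angles, not the full regression of the paper; the Gaussian kernel and the function names are assumptions:

```python
import math

def lwr_estimate(x0, neighbors, d_x, bandwidth):
    """Weighted average of neighbor angles, weight K(d_x(x_i, x0)).

    neighbors: list of (x_i, theta_i) pairs returned by PSH.
    """
    weights = [math.exp(-(d_x(x, x0) / bandwidth) ** 2) for x, _ in neighbors]
    total = sum(weights)
    m = len(neighbors[0][1])  # number of angle parameters
    return [sum(w * th[j] for w, (_, th) in zip(weights, neighbors)) / total
            for j in range(m)]

nbrs = [([0.0], [0.0, 2.0]), ([2.0], [2.0, 4.0])]
d_x = lambda a, b: abs(a[0] - b[0])
print(lwr_estimate([1.0], nbrs, d_x, bandwidth=1.0))  # equidistant -> [1.0, 3.0]
```

Closer neighbors in feature space get larger weights, so a bad but distant neighbor pulled in by the hash has little influence on the estimated pose.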
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables were needed

Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Results – real data
Interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere covers the query q
[figure: query q inside the sphere of radius ri around pi]
Courtesy of Mohamad Hegaze
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer

Motivation
• Clustering high-dimensional data by using local density measurements (e.g., in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Outline
• Mean-shift in a nutshell + examples

Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[figure: a window of a given bandwidth around a point is shifted toward the local mean]
(Mean-shift | LSH optimal k,l | LSH data partition | LSH | LSH data struct)
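In code, the shift-to-the-local-mean iteration looks roughly like this (a sketch with a flat kernel; the kernel choice and names are assumptions):

```python
import math

def mean_shift_step(x, points, h):
    """Move x to the mean of the points within bandwidth h (flat kernel)."""
    window = [p for p in points if math.dist(p, x) <= h]
    if not window:
        return x
    return [sum(p[j] for p in window) / len(window) for j in range(len(x))]

def mean_shift(x, points, h, tol=1e-9, max_iter=100):
    """Iterate the shift until the point converges to a mode."""
    for _ in range(max_iter):
        nx = mean_shift_step(x, points, h)
        if math.dist(nx, x) < tol:
            break
        x = nx
    return x

pts = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0], [10.1, 10.0]]
print(mean_shift([0.3, 0.3], pts, h=1.0))  # converges to the left cluster's mode
```

The range query inside `mean_shift_step` is exactly the operation that becomes expensive in high dimensions, which is where LSH comes in.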
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region:
high density → small bandwidth; low density → large bandwidth.
Based on the kth nearest neighbor x_{i,k} of the point x_i, the bandwidth is h_i = ||x_i − x_{i,k}||.

Adaptive mean-shift vs. non-adaptive
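A brute-force sketch of that adaptive bandwidth (fine for small n; the function name is illustrative, and in practice the kth-neighbor distance is what the LSH structure approximates):

```python
import math

def adaptive_bandwidths(points, k):
    """h_i = distance from x_i to its k-th nearest neighbor:
    dense regions get small bandwidths, sparse regions large ones."""
    hs = []
    for i, x in enumerate(points):
        dists = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)
        hs.append(dists[k - 1])
    return hs

print(adaptive_bandwidths([[0.0], [1.0], [3.0]], k=1))  # [1.0, 1.0, 2.0]
```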
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
[figure: 3D feature space]
(Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)

Image segmentation algorithm
[figure: original, segmented, and filtered images]
Filtering: pixel value of the nearest mode
Mean-shift trajectories
Filtering examples
[figures: original squirrel → filtered; original baboon → filtered]

Segmentation examples
(Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled with variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x_i, we check whether x_i[d_k] ≤ v_k; the K results determine its cell
• This partitions the data into cells
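A sketch of the cell assignment (function names and the uniform cut range are assumptions):

```python
import random

def make_partition(d, K, lo, hi, rng):
    """One partition: K pairs (d_k, v_k) - a coordinate index and a cut value."""
    return [(rng.randrange(d), rng.uniform(lo, hi)) for _ in range(K)]

def cell_key(x, partition):
    """Concatenate the K boolean tests x[d_k] <= v_k into the cell label."""
    return tuple(x[dk] <= vk for dk, vk in partition)

partition = [(0, 0.5), (1, 0.5)]           # fixed cuts for illustration
print(cell_key([0.2, 0.9], partition))     # (True, False)
```

Building L such partitions and hashing every point by its L cell keys gives the bucket structure queried below.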
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• Large K ⟹ a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• The neighborhood returned for q is the union of the L cells, C = ∪_l C_l
As L increases, the union C grows while the distance to the approximate nearest neighbor decreases; K determines the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points, giving the true distance (bandwidth)
• Choose an error threshold ε
• The optimal K and L should keep the approximate distance within ε of the true one
Choosing optimal K and L
• For each K, estimate the error for each L
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[figures: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] with its minimum]
Data-driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[figure: bucket distribution — uniform cuts vs. data-driven points]
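The two cut-selection strategies side by side (a sketch; names are illustrative):

```python
import random

def random_cut(lo, hi, rng):
    """Original LSH: the cut value is uniform over the data range."""
    return rng.uniform(lo, hi)

def data_driven_cut(points, dim, rng):
    """Variant: use a coordinate of a randomly chosen data point, so cuts
    follow the data density and the buckets stay more balanced."""
    return rng.choice(points)[dim]

rng = random.Random(0)
pts = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(data_driven_cut(pts, 0, rng))  # one of 1.0, 3.0, 5.0
```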
Additional speedup
Assume that all points in C will converge to the same mode (C acts like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
[figure: low dimension vs. high dimension]

A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN

15:30: cookies…
Summary
• LSH trades accuracy for a gain in complexity
• Applications that involve massive data in high dimensions require LSH's fast performance
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Naiumlve solution
bullNo preprocess
bullGiven a query point qndashGo over all n pointsndashDo comparison in Rd
bullquery time = O(nd)
Keep in mind
Common solution
bullUse a data structure for acceleration
bullScale-ability with n amp with d is important
When to use nearest neighbor
High level algorithms
Assuming no prior knowledge about the underlying probability structure
complex models Sparse data High dimensions
Parametric Non-parametric
Density estimation
Probability distribution estimation
Nearest neighbors
Nearest Neighbor
min pi P dist(qpi)
Closestqq
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Common solution
bullUse a data structure for acceleration
bullScale-ability with n amp with d is important
When to use nearest neighbor
High level algorithms
Assuming no prior knowledge about the underlying probability structure
complex models Sparse data High dimensions
Parametric Non-parametric
Density estimation
Probability distribution estimation
Nearest neighbors
Nearest Neighbor
min pi P dist(qpi)
Closestqq
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
• Hamming space = 2^N binary strings
• Hamming distance = number of changed digits
a.k.a. signal distance (Richard Hamming)
Hamming Space
Example (N digits): 010100001111 vs. 010010000011 → distance = 4
• Hamming distance = SUM(X1 XOR X2)
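The XOR-and-sum rule above can be sketched in a few lines of Python (function name is illustrative, not from the talk):

```python
def hamming_distance(x1: int, x2: int) -> int:
    # Hamming distance = number of set bits in x1 XOR x2
    return bin(x1 ^ x2).count("1")

# The slide's example: 010100001111 vs. 010010000011
a = int("010100001111", 2)
b = int("010010000011", 2)
print(hamming_distance(a, b))  # -> 4
```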
L1 to Hamming Space Embedding
p = (8, 2), C = 11 → 11111111000 11000000000
Each coordinate x maps to x ones followed by C−x zeros; d' = C·d
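A minimal sketch of this unary embedding, assuming integer coordinates bounded by C (helper names are mine). L1 distance between vectors equals Hamming distance between their embeddings:

```python
def unary_embed(v, C):
    # Each coordinate x in 0..C becomes x ones followed by C - x zeros,
    # so the embedding lives in {0,1}^(C*d)
    return "".join("1" * x + "0" * (C - x) for x in v)

p = unary_embed([8, 2], C=11)   # '11111111000' + '11000000000'
q = unary_embed([5, 4], C=11)
# L1 distance |8-5| + |2-4| = 5 equals the Hamming distance of the embeddings
ham = sum(c1 != c2 for c1, c2 in zip(p, q))
print(ham)  # -> 5
```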
Hash function
p ∈ H^d'
G_j(p) = p|I_j, j = 1..L (bit sampling from p; here k = 3 digits)
Store p into bucket p|I_j (2^k buckets)
Construction: insert each point p into its bucket in every table 1, 2, …, L
Query: look up q's bucket in every table 1, 2, …, L
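The construction and query steps above can be sketched as bit-sampling LSH in Python (a toy sketch with illustrative names, not the reference implementation):

```python
import random

def make_bit_sampler(d_prime, k, seed):
    # G_j(p) = p restricted to a random index set I_j of k bit positions
    rng = random.Random(seed)
    idx = [rng.randrange(d_prime) for _ in range(k)]
    return lambda bits: "".join(bits[i] for i in idx)

def build_tables(points, d_prime, k, L):
    # Construction: store every point in its bucket p|I_j, for j = 1..L
    tables = []
    for j in range(L):
        g = make_bit_sampler(d_prime, k, seed=j)
        table = {}
        for p in points:
            table.setdefault(g(p), []).append(p)
        tables.append((g, table))
    return tables

def query(tables, q):
    # Query: the union of the L buckets q falls into is the candidate set
    cands = set()
    for g, table in tables:
        cands.update(table.get(g(q), []))
    return cands
```

Close strings share many bits, so they collide in at least one of the L sampled projections with high probability.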
Alternative intuition: random projections
p = (8, 2), C = 11 → 11111111000 11000000000, d' = C·d
Alternative intuition: random projections
k samplings of bits map p into one of 2^k buckets (here k = 3: keys 000, 100, 110, 001, 101, 111, …)
Repeating L times
Secondary hashing
Supports volume tuning: dataset size vs. storage volume
2^k buckets → M buckets of size B via simple hashing; M·B = α·n, α = 2
The above hashing is locality-sensitive
• Probability(p, q in the same bucket) = (1 − Distance(q,p)/dimensions)^k
(Plots: probability Pr vs. Distance(q,p_i), for k = 1 and k = 2)
Adapted from Piotr Indyk's slides
Preview
• General solution – Locality Sensitive Hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Using a mathematical trick:
• p-stable distribution for Lp distance; Gaussian distribution for L2 distance
Central limit theorem
v1·X1 + v2·X2 + … + vn·Xn
(weighted Gaussians) = weighted Gaussian
v1..vn = real numbers; X1..Xn = independent, identically distributed (i.i.d.)
Central limit theorem
Σᵢ vᵢ·Xᵢ ≈ ‖v‖₂ · X    (dot product → norm)
Σᵢ uᵢ·Xᵢ − Σᵢ vᵢ·Xᵢ = Σᵢ (uᵢ−vᵢ)·Xᵢ ≈ ‖u−v‖₂ · X    (dot-product difference → distance)
u, v = feature vectors 1 and 2
The full hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
a = d random numbers; v = features vector (d-dimensional)
b = random phase in [0, w]; w = discretization step
The full hashing – example
h_{a,b}(v) = ⌊(a·v + b) / w⌋ with a·v = 7944, b = +34, w = 100
(bucket boundaries at 7800, 7900, 8000, 8100, 8200)
The full hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
a = (a1..ad) i.i.d. from a p-stable distribution; v = features vector
b = random phase in [0, w]; w = discretization step
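A minimal sketch of h_{a,b} for L2, using the Gaussian as the 2-stable distribution (function names are mine, not the authors' code):

```python
import math
import random

def make_l2_hash(d, w, seed):
    # h_{a,b}(v) = floor((a . v + b) / w),
    # with a ~ N(0,1)^d (2-stable) and b uniform in [0, w]
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h

h = make_l2_hash(d=3, w=4.0, seed=0)
v1 = [3.4, 8.2, 2.1]
v2 = [3.4, 8.2, 2.1]
# identical vectors always share a cell; near vectors do so with high probability
print(h(v1) == h(v2))  # -> True
```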
Generalization: p-stable distribution
• L2: Central Limit Theorem → Gaussian (normal) distribution
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution (Cauchy for L1)
p-stable summary
• Works for the r-nearest-neighbor problem; generalizes to 0 < p ≤ 2
• Improves query time: O(d·n^{1/(1+ε)}·log n) → O(d·n^{1/(1+ε)²}·log n)
Latest results
(Reported in an email by Alexander Andoni)
Parameters selection (for Euclidean space)
• 90% probability → best query-time performance
Parameters selection…
For Euclidean space
• A single projection hits an r-near neighbor with Pr = p1
• k projections hit an r-near neighbor with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure collision (e.g. with 1 − δ ≥ 90%):
1 − (1 − p1^k)^L ≥ 1 − δ  ⇒  L ≥ log(δ) / log(1 − p1^k)
(Reject non-neighbors; accept neighbors)
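The bound above can be checked numerically; a minimal Python sketch (the p1 and k values below are illustrative, not from the talk):

```python
import math

def tables_needed(p1, k, delta):
    # Smallest L with 1 - (1 - p1**k)**L >= 1 - delta
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

L = tables_needed(p1=0.9, k=18, delta=0.1)   # 90% collision guarantee
assert 1 - (1 - 0.9 ** 18) ** L >= 0.9
print(L)
```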
…Parameters selection
(Plot: running time vs. k, trading candidate extraction against candidate verification)
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test over your own data (C code, under Red Hat Linux)
LSH – Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short, whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, T. Darrell)
• Finding sensitive hash functions
Mean Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, P. Meer)
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, T. Darrell)
Given an image x, what are the parameters θ in this image, i.e. the angles of joints, the orientation of the body, etc.?
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space: d_θ(θ1, θ2) = Σᵢ₌₁ᵐ (1 − cos(θ1,i − θ2,i))
Example-based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find KNN in the database of examples → output: average angles of the KNN
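The KNN-then-average step can be sketched with a brute-force search (a toy stand-in for the PSH lookup; names and data are illustrative):

```python
def knn_average_angles(query_feat, database, k):
    # database: list of (feature_vector, angle_vector) pairs.
    # Brute-force KNN in feature space, then average the neighbors' angles.
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    nearest = sorted(database, key=lambda item: dist2(query_feat, item[0]))[:k]
    m = len(nearest[0][1])
    return [sum(item[1][i] for item in nearest) / k for i in range(m)]

db = [([0.0], [10.0]), ([1.0], [20.0]), ([9.0], [90.0])]
print(knn_average_angles([0.2], db, k=2))  # -> [15.0]
```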
The algorithm flow
Input query → feature extraction → processed query → PSH (LSH) against a database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms
(Pipeline: Feature Extraction → PSH → LWR)
PSH: the basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
(Figure: parameter space (angles) ↔ feature space, query q)
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick:
Estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
Label pairs of examples with similar angles; define hash functions h on the feature space
Predict the labeling of similar/non-similar examples by using h, and compare the labelings.
If the labeling by h is good, accept h; else change h.
PSH as a classification problem (labels +1, +1, −1, −1; r = 0.25)
Labels:
a pair of examples (x_i, x_j) is labeled
y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
y_ij = −1 if d_θ(θ_i, θ_j) > (1+ε)·r
A binary hash function on features:
h_T(x) = +1 if feature(x) ≥ T, −1 otherwise
Predict the labels:
ŷ_ij(h) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best T that predicts the true labeling, subject to probability constraints:
h_T(x) will place both examples in the same bin, or separate them.
Local Weighted Regression (LWR)
• Given a query image, PSH returns KNNs
• LWR uses the KNN to compute a distance-weighted average of the estimated angles of the query:
θ0 = argmin Σ_{x_i ∈ N(x)} d_θ(g(x_i), θ_i) · K(d_x(x_i, x))
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (L)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables would be needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Results – real data
Interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in R^d, centered at P = {p1,…,pn}, with radii r1,…,rn
• Goal: given a query q, preprocess the points in P to find a point p_i whose sphere r_i 'covers' the query q
Courtesy of Mohamad Hegaze
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, P. Meer)
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
1. Finding optimal LSH parameters
2. Data-driven partitions into buckets
3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
(Figure: a bandwidth window around a point)
(Progress bar: Mean-shift → LSH → optimal k,l → LSH data partition → LSH data struct)
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region:
high density → small bandwidth; low density → large bandwidth
Based on the k-th nearest neighbor of the point, the bandwidth is h_i = ‖x_i − x_{i,k}‖
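The adaptive-bandwidth rule can be sketched with a brute-force neighbor search (a toy stand-in for the LSH lookup; names are illustrative):

```python
import math

def adaptive_bandwidths(points, k):
    # h_i = distance from x_i to its k-th nearest neighbor:
    # dense regions get a small bandwidth, sparse regions a large one
    hs = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        hs.append(dists[k - 1])
    return hs

pts = [(0.0,), (1.0,), (2.0,), (10.0,)]
print(adaptive_bandwidths(pts, k=1))  # -> [1.0, 1.0, 1.0, 8.0]
```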
Adaptive mean-shift vs. non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
(Figure: original → filtered → segmented)
Filtering: pixel value of the nearest mode
Mean-shift trajectories
Filtering examples
(original squirrel → filtered; original baboon → filtered)
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Segmentation examples
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point, perform the K comparisons x_{d_k} ≤ v_k
• This partitions the data into cells
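This partition scheme can be sketched as follows (a minimal sketch under the assumption of real-valued points; function names are mine):

```python
import random

def make_partition(points, K, rng):
    # One random partition: K (dimension, cut value) pairs, with cut values
    # drawn uniformly from the data range (the original LSH scheme)
    d = len(points[0])
    cuts = []
    for _ in range(K):
        dim = rng.randrange(d)
        lo = min(p[dim] for p in points)
        hi = max(p[dim] for p in points)
        cuts.append((dim, rng.uniform(lo, hi)))
    return cuts

def cell_key(x, cuts):
    # A point's cell = the bit vector of the K comparisons x[d_k] <= v_k
    return tuple(x[dim] <= v for dim, v in cuts)

# L such partitions are kept; a query's candidate neighbors are the union
# of its cells across the L partitions.
```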
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• Large K → a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• As L increases, the candidate set C grows but the chance of missing neighbors decreases
• K and L determine the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN for m randomly selected data points
• Choose an error threshold ε on the approximate distance (bandwidth)
• The optimal K and L should satisfy the threshold on the approximate distance
Choosing optimal K and L
• For each K, estimate the error
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
(Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)]; minimum)
Data-driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
(Figure: bucket distribution, uniform vs. data-driven cut points)
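The data-driven cut can be sketched in a few lines (an illustrative sketch, not the paper's code):

```python
import random

def data_driven_cut(points, rng):
    # Instead of a uniform random cut value, pick a random data point and
    # use one of its coordinates: bucket boundaries follow the data density
    p = rng.choice(points)
    dim = rng.randrange(len(p))
    return dim, p[dim]

rng = random.Random(1)
pts = [(0.1,), (0.2,), (0.3,), (9.9,)]
dim, v = data_driven_cut(pts, rng)
print(v in {0.1, 0.2, 0.3, 9.9})  # -> True: the cut value always comes from the data
```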
Additional speedup
Assume that all points in C will converge to the same mode (C acts like a type of aggregate)
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
(Low dimension vs. high dimension)
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 – cookies…
Summary
• LSH suggests a compromise on accuracy for a gain in complexity
• Applications that involve massive data in high dimensions require the fast performance of LSH
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
When to use nearest neighbor
High level algorithms
Assuming no prior knowledge about the underlying probability structure
complex models Sparse data High dimensions
Parametric Non-parametric
Density estimation
Probability distribution estimation
Nearest neighbors
Nearest Neighbor
min pi P dist(qpi)
Closestqq
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
1530 cookies…
Summary
• LSH trades a controlled loss of accuracy for a gain in complexity
• Applications that involve massive data in high dimension require the fast performance of LSH
• LSH extends to different spaces (PSH)
• The LSH parameters and hash functions can be learned for different applications
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test it over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth. It is based on the kth nearest neighbor of the point: the bandwidth h_i is the distance from the point x_i to its kth nearest neighbor x_{i,k}.
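The per-point bandwidth rule above can be sketched as follows. A minimal sketch: Euclidean distance is used here for simplicity (the paper uses an L1 norm), and `adaptive_bandwidth` is my name, not the paper's; the brute-force O(n²) scan is what the LSH structure replaces.

```python
import math

def adaptive_bandwidth(points, k):
    # For each point, the bandwidth is its distance to its k-th nearest
    # neighbor (identity check excludes the point itself from its own list).
    hs = []
    for x in points:
        d = sorted(math.dist(x, p) for p in points if p is not x)
        hs.append(d[k - 1])
    return hs
```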
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering
Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm (original, filtered, segmented)
Filtering: each pixel takes the value of its nearest mode
Mean-shift trajectories
Filtering examples
• squirrel: original vs. filtered
• baboon: original vs. filtered
Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Segmentation examples
Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries – implemented with LSH
• Statistical curse of dimensionality: sparseness of the data – variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x, compute the K boolean tests x_{d_k} ≤ v_k, k = 1…K
• This partitions the data into cells
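A minimal sketch of this data structure, under the assumption that coordinates lie in [0, 1] (all names here are mine): each of the L partitions holds K random (dimension, cut-value) pairs, a point's cell is the tuple of K inequality outcomes, and a query gathers the union of the buckets it falls into.

```python
import random

def make_partition(dim, K, lo=0.0, hi=1.0, rng=random):
    # K pairs (d_k, v_k): a random coordinate index and a random cut value.
    return [(rng.randrange(dim), rng.uniform(lo, hi)) for _ in range(K)]

def bucket_key(x, partition):
    # The cell of x: outcomes of the K inequality tests x[d_k] <= v_k.
    return tuple(x[d] <= v for d, v in partition)

def build_tables(points, dim, K, L, rng=random):
    # L independent partitions, each with its own hash table of buckets.
    tables = []
    for _ in range(L):
        part = make_partition(dim, K, rng=rng)
        table = {}
        for i, x in enumerate(points):
            table.setdefault(bucket_key(x, part), []).append(i)
        tables.append((part, table))
    return tables

def query(q, tables):
    # Candidate neighbors: union of the L buckets that q falls into.
    cand = set()
    for part, table in tables:
        cand.update(table.get(bucket_key(q, part), []))
    return cand
```

The candidates returned by `query` are then checked with exact distances, which is where the K/L trade-off discussed next comes from.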
Choosing the optimal K and L
• For a query q, we want the smallest number of distance computations to the points in its buckets
• Large K: a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• As L increases, the union of cells C∪ increases but the intersection C∩ decreases
• K determines the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN distance (bandwidth) for m randomly selected data points
• Choose an error threshold ε
• The optimal K and L should keep the approximate KNN distance within the error threshold of the true distance
Choosing optimal K and L
• For each K, estimate the approximation error
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] with its minimum]
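The selection procedure above can be sketched as a small search loop. Here `error_fn(K, L)` and `time_fn(K, L)` stand for the empirical measurements on the m sample points (hypothetical names; in practice both come from running the LSH structure on the sample):

```python
def choose_parameters(K_values, error_fn, time_fn, eps):
    # For each K, find the minimal L whose approximation error is within eps,
    # then keep the (K, L) pair with the lowest measured running time.
    # Assumes error_fn(K, L) eventually drops below eps as L grows.
    best = None
    for K in K_values:
        L = 1
        while error_fn(K, L) > eps:
            L += 1
        t = time_fn(K, L)
        if best is None or t < best[0]:
            best = (t, K, L)
    return best[1], best[2]
```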
Data driven partitions
• In the original LSH, cut values are drawn uniformly at random over the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: bucket point distribution – uniform vs. data-driven cuts]
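The suggested data-driven cut is a one-line change to how cut values are drawn (hypothetical helper name): instead of a uniform value over the data range, sample a data point and take its coordinate, so cuts concentrate where the data is dense and buckets end up more evenly populated.

```python
import random

def data_driven_cut(points, dim_index, rng=random):
    # Pick a random data point and use its coordinate in the chosen
    # dimension as the cut value, so cuts follow the data density.
    p = rng.choice(points)
    return p[dim_index]
```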
Additional speedup
Assume that all points in C∩ will converge to the same mode (C∩ acts like a type of aggregate).
Speedup results
65,536 points; 1638 points sampled; k = 100
Food for thought
Low dimension High dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 cookies…
Summary
• LSH trades some accuracy for a large gain in complexity
• Applications that involve massive data in high dimension require the fast performance of LSH
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test it over your own data
(C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Quadtree – pitfall 1

In some cases it doesn't work: the query's nearest neighbor lies in a neighboring cell, so the search must backtrack into adjacent cells. In some cases nothing works, and many cells must be inspected.

Quadtree – pitfall 2

Backtracking can visit O(2^d) cells, so the query time could be exponential in the number of dimensions.

Space-partition-based algorithms

See the survey "Multidimensional access methods", Volker Gaede and O. Günther. These methods can be improved.
Outline

• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Curse of dimensionality

• Query time or space is O(n^d); the naive solution is O(min(n·d, n^d))
• For d > 10..20 this is worse than a sequential scan, for most geometric distributions
• Proved in theory and in practice (Barkol & Rabani 2000; Beame & Vee 2002)
• Techniques specific to high dimensions are needed
Curse of dimensionality – some intuition

A grid split once per dimension has 2, 2², 2³, …, 2^d cells, so space partitioning blows up exponentially with d.
Outline

• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Preview

• General solution – locality sensitive hashing
• Implementation for Hamming space
• Generalization to l1 & l2
Hash function

A hash function maps a data item's key to a bin (bucket) inside the data structure. Example: for X = a number in the range 0..n, h(X) = X modulo 3 maps X to a storage address in 0..2. Usually we would like related data items to be stored in the same bin.
Recall: r-nearest neighbor

If dist(q, p1) ≤ r, a point must be reported; any point p2 with dist(q, p2) ≤ (1 + ε)·r is an acceptable answer. The two radii are related by r2 = (1 + ε)·r1.
Locality sensitive hashing

A hash family is (r1, r2, p1, p2)-sensitive if:
• Pr[I(p) = I(q)] is "high" (≥ p1) when p is "close" to q (dist ≤ r)
• Pr[I(p) = I(q)] is "low" (≤ p2) when p is "far" from q (dist ≥ (1 + ε)·r)
with r2 = (1 + ε)·r1.
Preview: implementation for Hamming space.
Hamming space

• Hamming space = the 2^N binary strings of length N
• Hamming distance = the number of changed digits, a.k.a. signal distance (Richard Hamming)
• Example: 010100001111 vs. 010010000011 → distance = 4
• Hamming distance = SUM(X1 XOR X2)
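The XOR formulation above can be sketched in a few lines; this is an illustrative snippet, not part of any package referenced in this talk:

```python
def hamming(x: int, y: int) -> int:
    """Hamming distance between two equal-length bit strings,
    i.e. SUM(X1 XOR X2): count the set bits of the XOR."""
    return bin(x ^ y).count("1")

# The slide's example pair, written as binary literals:
d = hamming(0b010100001111, 0b010010000011)  # distance = 4
```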
L1 to Hamming space embedding

Each coordinate is written in unary: with coordinates bounded by C = 11, the point p = (8, 2) becomes 11111111000 followed by 11000000000, i.e. 1111111100011000000000. The embedded dimension is d′ = C·d, and the L1 distance between points equals the Hamming distance between their embeddings.
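A minimal sketch of the unary embedding described above (function names are ours, for illustration):

```python
def unary_embed(point, C):
    """Embed each coordinate v (0 <= v <= C) as v ones followed by C - v
    zeros; concatenating the d coordinates gives a d' = C*d bit string."""
    return "".join("1" * v + "0" * (C - v) for v in point)

def hamming_str(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(ca != cb for ca, cb in zip(a, b))

# The slide's point p = (8, 2) with C = 11:
e_p = unary_embed((8, 2), 11)   # '1111111100011000000000'
# L1 distances are preserved exactly as Hamming distances:
e_q = unary_embed((5, 4), 11)
```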
Hash function

For j = 1..L, define G_j(p) = p|I_j for p ∈ H^d′: a sampling of k bits (digits) of p (here k = 3). Store p into the bucket p|I_j, one of 2^k buckets.
Construction: each point p is stored in all L tables, 1, 2, …, L.

Query: q is hashed into the same L tables, and the points colliding with it are collected as candidates.
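The construction and query steps can be sketched as follows; this is a toy illustration of bit sampling with L tables, not the authors' C implementation (class and method names are assumptions):

```python
import random
from collections import defaultdict

class BitSamplingLSH:
    """L hash tables; table j is keyed by the k bit positions I_j
    sampled from the d'-bit embedded strings."""

    def __init__(self, d_prime, k, L, seed=0):
        rng = random.Random(seed)
        self.samples = [rng.sample(range(d_prime), k) for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]

    def _key(self, bits, j):
        return "".join(bits[i] for i in self.samples[j])

    def insert(self, bits, label):
        # Construction: store the point in every table under g_j(p) = p|I_j.
        for j, table in enumerate(self.tables):
            table[self._key(bits, j)].append(label)

    def candidates(self, bits):
        # Query: collect every point colliding with q in at least one table.
        out = set()
        for j, table in enumerate(self.tables):
            out.update(table.get(self._key(bits, j), ()))
        return out

index = BitSamplingLSH(d_prime=12, k=3, L=10, seed=1)
index.insert("010100001111", "p1")
index.insert("101011110000", "p2")  # the bitwise complement of p1
```

An identical query string always collides with its stored copy, while the complement string (which differs in every bit) can never share a sampled key.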
Alternative intuition: random projections

Each sampled bit of the unary embedding tests one coordinate of p against a random threshold, so G(p) = p|I acts like k random axis-parallel cuts of the space. With k = 3 samplings, the space is partitioned into at most 2³ = 8 buckets (000, 100, 110, 001, 101, 111, …).
Repeating and secondary hashing

The sampling is repeated L times. Because most of the 2^k logical buckets are empty, a secondary simple hashing maps them into M physical buckets of size B, with M·B = α·n (α = 2); this supports tuning of dataset size vs. storage volume.
The above hashing is locality-sensitive

Probability that p and q fall in the same bucket:

Pr = (1 − Distance(p, q) / d′)^k

Plotted against Distance(q, pi), the collision probability decays gently for k = 1 and much more sharply for k = 2. (Adopted from Piotr Indyk's slides.)
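The collision-probability formula above can be evaluated directly (a sketch, using the with-replacement approximation shown on the slide):

```python
def collision_prob(dist, d_prime, k):
    """Pr[p and q share a bucket in one table] = (1 - dist/d')**k:
    each of the k sampled bits must avoid the differing positions."""
    return (1.0 - dist / d_prime) ** k
```

Raising k sharpens the curve: close pairs keep a high collision probability while far pairs are suppressed.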
Preview: generalization to l2.
Direct L2 solution

• A new hashing function
• Still based on sampling
• Using a mathematical trick: a p-stable distribution for the Lp distance, the Gaussian distribution for the L2 distance
Central limit theorem

A weighted sum of Gaussians is a Gaussian: for real numbers v1, …, vn and i.i.d. Gaussian variables X1, …, Xn,

v1·X1 + v2·X2 + … + vn·Xn ~ ||v||₂ · X

where X is again Gaussian. Applying this to the dot products of two feature vectors u and v with the same random vector X:

Σᵢ uᵢ·Xᵢ − Σᵢ vᵢ·Xᵢ = Σᵢ (uᵢ − vᵢ)·Xᵢ ~ ||u − v||₂ · X

so the difference of the two random projections is a Gaussian whose scale is exactly the L2 distance between the feature vectors.
The full hashing

h_(a,b)(v) = ⌊(a·v + b) / w⌋

• v – the features vector, e.g. [34, 82, 21, …], of dimension d
• a – d random numbers, drawn i.i.d. from a p-stable distribution
• b – a random phase in [0, w]
• w – the discretization step

Example: if a·v = 7944 and w = 100, the real line is cut into cells … 7800, 7900, 8000, 8100, 8200 …; with phase b = 34, v falls into bucket ⌊(7944 + 34)/100⌋ = 79.
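A sketch of h_(a,b) for the L2 (Gaussian, 2-stable) case; the function names and the choice of N(0, 1) entries are our illustrative assumptions:

```python
import math
import random

def make_pstable_hash(d, w, seed=0):
    """Build h_{a,b}(v) = floor((a . v + b) / w): the d entries of a are
    i.i.d. N(0, 1) draws (the 2-stable case, for L2), b is a random
    phase in [0, w), and w is the discretization step."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)

    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)

    return h

h = make_pstable_hash(3, 100.0, seed=7)
bucket = h([34.0, 82.0, 21.0])  # the slide's example feature vector
```

Because the projection of a small L2 move is much smaller than w, nearby vectors land in the same or an adjacent cell.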
Generalization: p-stable distributions

• L2: the Central Limit Theorem gives the Gaussian (normal) distribution
• Lp, 0 < p ≤ 2: the Generalized Central Limit Theorem gives a p-stable distribution (e.g. the Cauchy distribution for L1)

P-stable summary

• Works for the r-nearest-neighbor problem and generalizes to 0 < p ≤ 2
• Improves query time from O(d·n^(1/(1+ε))·log n) to O(d·n^(1/(1+ε)²)·log n) (latest results, reported by e-mail by Alexander Andoni)
Parameters selection

For Euclidean space, choose the parameters for ≥ 90% success probability at the best query-time performance:

• A single projection hits an ε-nearest neighbor with Pr = p1
• k projections hit it with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure a collision with probability 1 − δ (e.g. ≥ 90%): 1 − (1 − p1^k)^L ≥ 1 − δ, i.e. L ≥ log(δ) / log(1 − p1^k)

Larger k rejects more non-neighbors (less candidate-verification time) but accepts fewer true neighbors per table, so L must grow (more candidate-extraction time); the total query time is minimized at an intermediate k.
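The bound above gives the number of tables directly; a small helper (hypothetical, for illustration):

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta, i.e. the number
    of hash tables that make a collision with a true neighbor all but
    certain: L = ceil(log(delta) / log(1 - p1**k))."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

# With k = 18 bits, per-bit collision probability p1 = 0.9 and a 90%
# success target (delta = 0.1):
L = tables_needed(0.9, 18, 0.1)
```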
Pros & cons (from Piotr Indyk's slides)

Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time

Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
Conclusion

• But at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – E-mail Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)
LSH – applications

• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation

• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline

• Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola and T. Darrell) – finding sensitive hash functions
• Mean Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni and P. Meer) – tuning LSH parameters; the LSH data structure is used for algorithm speedups
Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola and T. Darrell

The problem: given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?
Ingredients

• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space: d_θ(θ¹, θ²) = Σ_{i=1..m} (1 − cos(θ¹_i − θ²_i))
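The angle-space metric can be sketched directly from the formula above:

```python
import math

def angle_dist(theta1, theta2):
    """d_theta(t1, t2) = sum_i (1 - cos(t1_i - t2_i)): zero for
    identical poses, up to 2 per joint for opposite angles, and
    invariant to 2*pi wrap-around."""
    return sum(1.0 - math.cos(a - b) for a, b in zip(theta1, theta2))
```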
Example based learning

• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
The algorithm flow

Input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match. In short, find the KNN in the database of examples and output the average angles of the KNN.
The image features

Image features are multi-scale edge histograms: counts of edge pixels, at several scales and orientations, inside sub-windows of the image (such as the windows A and B in the slide's figure).

Pipeline: Feature Extraction → PSH → LWR
PSH: the basic assumption

There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ). We want similarity to be measured in the angle space, whereas LSH works on the feature space.

• Assumption: the feature space is closely related to the parameter space
Insight: manifolds

• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.

A query q can thus be mapped between the parameter space (angles) and the feature space. Is this magic?
Parameter Sensitive Hashing (PSH)

The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle space.

• Label pairs of examples with similar angles
• Define hash functions h on the feature space
PSH as a classification problem

• Predict the labeling of similar/non-similar example pairs by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
(Example labels: +1, +1, −1, −1, with r = 0.25.)
Labels

A pair of examples (x_i, x_j) is labeled

  y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
  y_ij = −1 if d_θ(θ_i, θ_j) ≥ (1 + ε)·r

A binary hash function on features:

  h_T(x) = +1 if x ≥ T, −1 otherwise

Predict the labels:

  ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best T that predicts the true labeling within the probability constraints: h_T will place both examples of a pair in the same bin, or separate them.
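The pair-labeling and threshold-hash scoring above can be sketched as a tiny loop (a toy version; the paper's actual selection enforces probability constraints per hash function, and the helper names are ours):

```python
def pair_label(d_theta, r, eps):
    """True label of a pair: +1 if the poses are similar
    (d_theta <= r), -1 if clearly dissimilar
    (d_theta >= (1 + eps) * r), 0 = ignored (the margin)."""
    if d_theta <= r:
        return 1
    if d_theta >= (1.0 + eps) * r:
        return -1
    return 0

def hash_accuracy(T, pairs):
    """Fraction of labeled pairs (xi, xj, y) on which the threshold
    hash h_T(x) = +1 iff x >= T agrees with the true label y."""
    def h(x):
        return 1 if x >= T else -1
    labeled = [(xi, xj, y) for xi, xj, y in pairs if y != 0]
    hits = sum((1 if h(xi) == h(xj) else -1) == y for xi, xj, y in labeled)
    return hits / len(labeled)
```

A threshold that splits similar pairs from dissimilar ones scores 1.0, so scanning candidate thresholds and keeping the best mimics the "accept h or change h" loop.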
Local Weighted Regression (LWR)

• Given a query image, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:

  θ_0 = g(x), with g = argmin Σ_{x_i ∈ N(x)} K(d_x(x, x_i)) · (g(x_i) − θ_i)²

where the kernel K converts a feature-space distance into a weight.
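A zeroth-order sketch of the weighted averaging step (a Gaussian kernel and Euclidean feature distance are our illustrative assumptions):

```python
import math

def lwr_estimate(query_features, neighbors):
    """Weighted average of the KNN's known angle vectors; each
    neighbor is (features, angles), weighted by a Gaussian kernel
    of its feature-space distance to the query."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    num, den = None, 0.0
    for feats, angles in neighbors:
        w = math.exp(-dist(query_features, feats) ** 2)
        den += w
        if num is None:
            num = [w * t for t in angles]
        else:
            num = [acc + w * t for acc, t in zip(num, angles)]
    return [t / den for t in num]
```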
Results

Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, facial expression
• 1,775,000 example pairs
• 137 out of 5,123 features were selected as meaningful (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Tested on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
• Without feature selection, 40 bits and 1,000 hash tables would have been needed

Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the maximum number of points in a bucket.
Results – real data

• 800 images, processed by a segmentation algorithm
• 1/3 of the data were searched
• Some interesting mismatches occur
Fast pose estimation – summary

• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for thought

• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for thought: Point Location in Different Spheres (PLDS)

• Given n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere 'covers' the query q (dist(q, pi) ≤ ri)
(Courtesy of Mohamad Hegaze.)
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni and P. Meer

Motivation

• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• The LSH parameters should be adjusted for optimal performance
Outline

• Mean-shift in a nutshell + examples

Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups: 1. finding optimal LSH parameters; 2. data-driven partitions into buckets; 3. additional speedup by using the LSH data structure
Mean-Shift in a Nutshell

Each point is iteratively shifted to the mean of the points inside its bandwidth window until it converges to a density mode.
(Section map: mean-shift → LSH: optimal k, l → LSH: data partition → LSH data structure.)

KNN in mean-shift

The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth. It is based on the k-th nearest neighbor of the point: the bandwidth is the distance to that neighbor. This gives adaptive mean-shift, vs. the non-adaptive fixed-bandwidth version.
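A minimal sketch of the k-th-nearest-neighbor bandwidth and one mean-shift step (brute force with a flat kernel, for clarity; the paper replaces the quadratic scan with LSH):

```python
import math

def adaptive_bandwidths(points, k):
    """h_i = distance from x_i to its k-th nearest neighbor:
    small bandwidth in dense regions, large in sparse ones."""
    hs = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        hs.append(dists[k - 1])
    return hs

def mean_shift_step(x, points, h):
    """One mean-shift step with a flat kernel of radius h:
    move x to the mean of the points inside its window."""
    inside = [p for p in points if math.dist(x, p) <= h]
    return tuple(sum(c) / len(inside) for c in zip(*inside))
```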
Image segmentation algorithm

1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
Filtering assigns each pixel the value of its nearest mode, and segmentation groups pixels by mode (original → filtered → segmented); the mean-shift trajectories climb toward the modes.

Filtering examples: squirrel (original → filtered), baboon (original → filtered). Segmentation examples.

(Figures from "Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02.)
Mean-shift in high dimensions

• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled with variable bandwidth
LSH-based data structure

• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x, check the K inequalities x_{d_k} ≤ v_k; the K results form the cell key
• This partitions the data into cells
Choosing the optimal K and L

• For a query q, we want to compute the smallest number of distances to points in its buckets
• Large K → a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• As L increases, the union of cells C̄ = ∪_l C_l covers more of the true neighborhood, but the number of candidate points grows with both L and K; this trade-off determines the resolution of the data structure
Choosing optimal K and L: the procedure

Determine accurately the KNN (at the bandwidth distance) for m randomly selected data points, and choose an error threshold ε for the approximate distances. The optimal K and L should satisfy the error constraint:
• For each K, estimate the error
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K)) and take the minimum
(Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)].)
Data-driven partitions

• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
• The resulting points-per-bucket distribution is much more balanced than with uniform cuts
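The two cut-value strategies can be sketched side by side (illustrative names):

```python
import random

def uniform_cuts(low, high, K, rng):
    """Original LSH: the K cut values are drawn uniformly
    over the range of the data."""
    return [rng.uniform(low, high) for _ in range(K)]

def data_driven_cuts(points, K, rng):
    """Suggested variant: pick a random data point and use one of its
    coordinates as the cut value, so cuts concentrate where the data does."""
    cuts = []
    for _ in range(K):
        p = rng.choice(points)
        cuts.append(p[rng.randrange(len(p))])
    return cuts
```

Every data-driven cut coincides with some coordinate of some data point, which is what balances the bucket occupancy on clustered data.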
Additional speedup

Assume that all points in a cell C̄ will converge to the same mode (C̄ acts like a type of aggregate); then the mean-shift iterations need to be run only once per cell rather than once per point.
Speedup results: 65,536 points, 1,638 points sampled, k = 100.
Food for thought: low dimension vs. high dimension

A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning itself requires KNN

15:30: cookies…
Summary

• LSH trades a little accuracy for a large gain in complexity
• Applications that involve massive data in high dimension require the fast performance of LSH
• LSH extends to different spaces (PSH)
• The LSH parameters and hash functions can be learned for different applications
Conclusion

• But at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – E-mail Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)

Thanks

• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Quadtree – Pitfall 1
In some cases it doesn't.
[Figure: a query near the split point (X1,Y1) must visit several cells: P&lt;X1,P&lt;Y1; P≥X1,P&lt;Y1; P&lt;X1,P≥Y1; P≥X1,P≥Y1]
Quadtree – Pitfall 1
In some cases nothing works.
Quadtree – Pitfall 2
Could result in query time exponential in the dimension: O(2^d).
Space-partition based algorithms
"Multidimensional Access Methods", Volker Gaede, O. Günther
Could be improved
Outline
• Problem definition and flavors
• Algorithms overview - low dimensions
• Curse of dimensionality (d>10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Curse of dimensionality
• Query time or space: O(n^d)
• For d>10..20: worse than sequential scan for most geometric distributions
• Techniques specific to high dimensions are needed
• Proved in theory and in practice by Barkol &amp; Rabani 2000 and Beame &amp; Vee 2002
Naïve: O(min(n·d, n^d))
Curse of dimensionality: some intuition
[Figure: the number of cells grows as 2, 2², 2³, …, 2^d]
Outline
• Problem definition and flavors
• Algorithms overview - low dimensions
• Curse of dimensionality (d>10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Preview
• General solution – Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l1 &amp; l2
Hash function
[Figure: Data_Item → Hash function → Key → Bin/Bucket]
Example: h(X) = X modulo 3, mapping a number X in the range 0..n to a storage address in 0..2.
Usually we would like related data items to be stored in the same bin.
Recall: r - Nearest Neighbor
dist(q,p1) ≤ r
dist(q,p2) ≥ (1 + ε)·r
r2 = (1 + ε)·r1
[Figure: balls of radius r and (1+ε)r around the query q]
Locality sensitive hashing
A hash family is (r, ε, P1, P2)-sensitive, with r2 = (1 + ε)·r1, if:
≡ Pr[I(p)=I(q)] is "high" (≥ P1) if p is "close" to q (dist ≤ r1)
≡ Pr[I(p)=I(q)] is "low" (≤ P2) if p is "far" from q (dist ≥ r2)
Preview
• General solution – Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l1 &amp; l2
Hamming Space
• Hamming space = the 2^N binary strings of length N
• Hamming distance = the number of changed digits (aka signal distance, Richard Hamming)
Example:
010100001111
010010000011   Distance = 4
• Hamming distance = SUM(X1 XOR X2)
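The XOR formulation can be made concrete with a short sketch (illustrative only, not from the slides):

```python
def hamming(x1: int, x2: int) -> int:
    # Hamming distance = number of differing bits = popcount(x1 XOR x2)
    return bin(x1 ^ x2).count("1")

# The slide's pair of strings:
print(hamming(int("010100001111", 2), int("010010000011", 2)))  # 4
```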
L1 to Hamming Space Embedding
Point p = (8, 2), coordinates in the range 0..C, with C = 11.
Each coordinate x is mapped to x ones followed by C−x zeros (unary code):
8 → 11111111000, 2 → 11000000000, so p → 1111111100011000000000
The embedded dimension is d' = C·d.
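A sketch of the unary embedding (the function name and the check against the slide's example are mine):

```python
def embed_l1(point, C):
    # Thermometer code: coordinate x -> x ones followed by C - x zeros.
    # L1 distance between points = Hamming distance between the codes.
    return "".join("1" * x + "0" * (C - x) for x in point)

print(embed_l1((8, 2), C=11))  # 1111111100011000000000, as on the slide
```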
Hash function
For p ∈ H^d', the j-th hash is Gj(p) = p|Ij, j = 1..L: sampling k bits of p (here k = 3 digits), e.g. p|Ij = 101.
Store p into bucket p|Ij, one of 2^k buckets.
Construction: insert each point p into its bucket in every table 1, 2, …, L.
Query: look up the buckets of q in tables 1, 2, …, L.
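The construction and query steps might be sketched like this (a toy version under my own naming; real implementations add the secondary hashing described later):

```python
import random

def build_tables(points, d, k, L, seed=0):
    # L tables; table j keys each point by the k sampled bit positions I_j.
    rng = random.Random(seed)
    samplings = [rng.sample(range(d), k) for _ in range(L)]
    tables = [{} for _ in range(L)]
    for p in points:
        for I, table in zip(samplings, tables):
            table.setdefault(tuple(p[i] for i in I), []).append(p)  # G_j(p) = p|I_j
    return samplings, tables

def query(q, samplings, tables):
    # Union of the L buckets that q falls into: the candidate set.
    candidates = set()
    for I, table in zip(samplings, tables):
        candidates.update(table.get(tuple(q[i] for i in I), []))
    return candidates

points = [(0, 1, 1, 0), (1, 1, 1, 1), (0, 1, 0, 0)]
samplings, tables = build_tables(points, d=4, k=2, L=3)
print(query((0, 1, 1, 0), samplings, tables))  # candidates near the query
```

Exact matches always land in the query's buckets, so they are always candidates; the verification step then filters the candidates by true distance.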
Alternative intuition: random projections
[Figure: the same L1-to-Hamming embedding; each sampled bit of the unary code tests one coordinate against a threshold]
k samplings split the space into 2^k buckets (000, 001, …, 111 for k = 3); here p falls into bucket 101.
Repeating L times
Secondary hashing: the 2^k sparse buckets are hashed again (simple hashing) into M buckets of size B, supporting volume tuning (dataset size vs. storage volume): M·B = α·n, α = 2.
The above hashing is locality-sensitive:
• Probability(p, q in the same bucket) = (1 − Distance(q,p)/dimensions)^k
[Figure: Pr vs. Distance(q,pi) for k = 1 and k = 2; a larger k makes the collision probability fall off faster]
Adopted from Piotr Indyk's slides
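That collision probability is easy to sanity-check by simulation (the parameters are arbitrary; bits are sampled with replacement, matching the formula):

```python
import random

def empirical_collision(dist, d, k, trials=20000, seed=1):
    # Strings fixed to differ exactly on positions 0..dist-1; k bits sampled
    # with replacement collide iff every sampled position is an agreeing one.
    rng = random.Random(seed)
    hits = sum(
        all(rng.randrange(d) >= dist for _ in range(k))
        for _ in range(trials)
    )
    return hits / trials

print(empirical_collision(dist=5, d=20, k=3), (1 - 5 / 20) ** 3)  # both ~ 0.42
```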
Preview
• General solution – Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Using a mathematical trick
• P-stable distribution for Lp distance; Gaussian distribution for L2 distance
Central limit theorem
A weighted sum of Gaussians is a Gaussian: for real numbers v1..vn and i.i.d. Gaussian X1..Xn,
  v1·X1 + v2·X2 + … + vn·Xn ~ ||v||2 · X
so the dot product maps to the norm. Applying this to the difference of two feature vectors u and v:
  Σi ui·Xi − Σi vi·Xi = Σi (ui − vi)·Xi ~ ||u − v||2 · X
so the dot-product difference maps to the distance.
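The norm/distance correspondence can be checked numerically (the vectors and sample count below are arbitrary):

```python
import math, random

random.seed(0)
u = [3.0, 1.0, 4.0, 1.0, 5.0]
v = [2.0, 7.0, 1.0, 8.0, 2.0]
diffs = []
for _ in range(4000):
    a = [random.gauss(0.0, 1.0) for _ in range(len(u))]
    # a.u - a.v = sum_i (u_i - v_i) X_i, a Gaussian scaled by ||u - v||_2
    diffs.append(sum(ai * (ui - vi) for ai, ui, vi in zip(a, u, v)))
rms = math.sqrt(sum(x * x for x in diffs) / len(diffs))
print(rms, math.dist(u, v))  # the two values nearly agree
```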
The full Hashing
  h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v – the features vector (e.g. [34, 82, 21, …], dimension d)
• a – d random numbers, i.i.d. from a p-stable distribution
• b – a random phase in [0, w]
• w – the discretization step
Example: with w = 100, a projection value a·v + b = 7944 falls into the cell [7900, 8000).
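A minimal sketch of that hash (the 2-stable Gaussian case; the names are mine):

```python
import math, random

def make_hash(d, w, seed=0):
    # h_{a,b}(v) = floor((a.v + b) / w): a ~ N(0,1)^d (2-stable), b ~ U[0, w].
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)
    return lambda v: math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)

h = make_hash(d=3, w=100.0)
print(h([34.0, 82.0, 21.0]), h([34.5, 81.5, 21.0]))  # nearby points usually share a cell
```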
Generalization: P-stable distributions
• L2: Central Limit Theorem → Gaussian (normal) distribution (2-stable)
• Lp: Generalized Central Limit Theorem → p-stable distribution (e.g. Cauchy, the 1-stable distribution, for L1)
P-Stable summary
• Works for the r - Nearest Neighbor problem; generalizes to 0 &lt; p ≤ 2
• Improves query time: O(d·n^{1/(1+ε)}·log n), and O(d·n^{1/(1+ε)²}·log n) in the latest results (reported in email by Alexander Andoni)
Parameters selection
• 90% success probability; best query-time performance (for Euclidean space)
• A single projection hits an ε-Nearest Neighbor with Pr = p1
• k projections hit an ε-Nearest Neighbor with Pr = p1^k
• L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure a collision with probability 1 − δ (e.g. ≥ 90%): 1 − (1 − p1^k)^L ≥ 1 − δ, i.e. L ≥ log(δ) / log(1 − p1^k)
Larger k rejects more non-neighbors but accepts fewer neighbors per table; the total query time balances candidate extraction against candidate verification.
[Figure: query time vs. k; candidate-extraction cost grows with k while candidate-verification cost shrinks]
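The bound on L can be turned into a small helper (the illustrative values below are mine):

```python
import math

def tables_needed(p1, k, delta):
    # Smallest L with 1 - (1 - p1**k)**L >= 1 - delta.
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

L = tables_needed(p1=0.9, k=10, delta=0.1)
print(L, 1 - (1 - 0.9 ** 10) ** L)  # 6 tables give >= 90% collision probability
```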
Pros &amp; Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, linear scan is pretty much all we can do (for high dim)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code under Red Hat Linux)
LSH - Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short, whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash function construction and parameter tuning
Outline
"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola, T. Darrell
• Finding sensitive hash functions
"Mean Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni, P. Meer
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem
"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola, T. Darrell
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: dx
• Distance metric in angle space: d(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
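The angle-space metric, as reconstructed above, is a one-liner (a sketch only; the function name is mine):

```python
import math

def angle_dist(t1, t2):
    # d(theta1, theta2) = sum_i (1 - cos(theta1_i - theta2_i)):
    # 0 for identical poses, 2 per joint rotated by pi.
    return sum(1.0 - math.cos(a - b) for a, b in zip(t1, t2))

print(angle_dist([0.0, math.pi / 2], [0.0, math.pi / 2]))  # 0.0
print(angle_dist([0.0], [math.pi]))                        # 2.0
```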
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find KNN in the database of examples → output: average angles of the KNN
The algorithm flow
Input query → feature extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms.
[Figure: edge-direction histograms computed over image sub-windows A and B]
(Pipeline: Feature Extraction → PSH → LWR)
PSH: the basic assumption
There are two metric spaces here: the feature space (dx) and the parameter space (dθ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q mapped between the parameter space (angles) and the feature space]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those that are sensitive to dθ.
The hash functions are applied in the feature space, but the KNN are valid in the angle space.
PSH as a classification problem:
1. Label pairs of examples with similar angles
2. Define hash functions h on the feature space
3. Predict the labeling of similar/non-similar examples using h
4. Compare the labelings
5. If the labeling by h is good, accept h; else change h
Labels (r = 0.25):
A pair of examples (xi, xj) is labeled
  yij = +1 if dθ(θi, θj) ≤ r
  yij = −1 if dθ(θi, θj) ≥ (1 + ε)·r
A binary hash function on the features:
  hT(x) = +1 if x ≥ T, −1 otherwise
Predict the labels:
  ŷh(xi, xj) = +1 if hT(xi) = hT(xj), −1 otherwise
Find the best threshold T that predicts the true labeling under the probability constraints; hT will place both examples in the same bin or separate them.
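A toy sketch of that selection loop (the feature values, the pair labels, and the candidate-threshold grid are all invented):

```python
def h(x, T):
    # Binary hash on a single feature: +1 if x >= T else -1.
    return 1 if x >= T else -1

def accuracy(T, pairs):
    # Fraction of labeled pairs (xi, xj, y) whose prediction
    # (+1 if hashed together, -1 if separated) matches y.
    ok = sum((1 if h(xi, T) == h(xj, T) else -1) == y for xi, xj, y in pairs)
    return ok / len(pairs)

# similar poses (+1) have close feature values; dissimilar (-1) do not
pairs = [(0.1, 0.2, 1), (0.8, 0.9, 1), (0.1, 0.9, -1), (0.2, 0.8, -1)]
best_T = max((t / 10 for t in range(11)), key=lambda T: accuracy(T, pairs))
print(best_T, accuracy(best_T, pairs))  # a threshold between the clusters wins
```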
Local Weighted Regression (LWR)
• Given a query image x, PSH returns its KNNs
• LWR uses the KNNs to compute a weighted average of the estimated angles of the query:
  θ0 = argmin_θ Σ_{xi ∈ N(x)} dθ(g(xi), θ) · K(dx(xi, x))
  where g(xi) are the stored angles of neighbor xi and the kernel K turns feature-space distance into a weight.
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
• Without the selection, 40 bits and 1,000 hash tables would be needed
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, B the maximum number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Interesting mismatches
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere covers the query q
Courtesy of Mohamad Hegaze
"Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni, P. Meer
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in a feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions, using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
(Roadmap: Mean-shift → LSH → optimal k,l → LSH data partition → LSH data structure)
[Figure: each point moves toward the local density mode within its bandwidth window]
KNN in mean-shift:
• The bandwidth should be inversely proportional to the density in the region: high density → small bandwidth; low density → large bandwidth
• It is based on the kth nearest neighbor of the point: the bandwidth is the distance to that neighbor
Adaptive mean-shift vs. non-adaptive
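The kth-nearest-neighbor bandwidth rule can be sketched in one dimension (a toy version; the paper works in the full feature space):

```python
def adaptive_bandwidth(points, k):
    # Per-point bandwidth = distance to the k-th nearest neighbor:
    # small in dense regions, large in sparse ones.
    bw = []
    for i, p in enumerate(points):
        dists = sorted(abs(p - q) for j, q in enumerate(points) if j != i)
        bw.append(dists[k - 1])
    return bw

pts = [0.0, 0.1, 0.2, 0.3, 5.0, 9.0]   # dense cluster, then a sparse tail
print(adaptive_bandwidth(pts, k=2))    # grows toward the sparse end
```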
Image segmentation algorithm:
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering
["Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02]
[Figure: original → filtered → segmented]
Filtering: each pixel takes the value of the nearest mode
Mean-shift trajectories
Filtering examples
[Figure: original vs. filtered — squirrel, baboon]
Segmentation examples
["Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02]
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries (implemented with LSH)
• Statistical curse of dimensionality: sparseness of the data (handled with variable bandwidth)
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (dk, vk)
• For each point x we check whether x_{dk} ≤ vk; the K results form the cell key
• It partitions the data into cells
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• Large K → a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• As L increases, the covered neighborhood C grows but the query cost increases; K determines the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN distance (bandwidth) for m randomly-selected data points
• Choose an error threshold ε
• The optimal K and L should satisfy: the approximate distance is within (1 + ε) of the true distance
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K)
• Minimize the running time t(K, L(K))
[Figure: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] with its minimum]
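The tuning loop might be sketched generically; the `error` and `time` models below are invented stand-ins for measurements on sample queries:

```python
def tune(Ks, max_L, error, time, eps=0.05):
    # For each K, find the minimal L with error(K, L) <= eps (this is L(K)),
    # then keep the (K, L(K)) pair minimizing time.
    best = None
    for K in Ks:
        L = next((L for L in range(1, max_L + 1) if error(K, L) <= eps), None)
        if L is not None and (best is None or time(K, L) < time(*best)):
            best = (K, L)
    return best

# toy stand-ins: more bits/tables -> less error; cost grows with K and L
err = lambda K, L: 1.0 / (K * L)
cost = lambda K, L: K + 2 * L
print(tune(range(1, 31), 64, err, cost))  # (5, 4)
```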
Data-driven partitions
• In the original LSH, the cut values are chosen uniformly at random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: bucket distribution — uniform cuts vs. data-driven cuts]
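The effect of data-driven cuts shows up on skewed data; a small simulation (all numbers invented):

```python
import random

def max_bucket_share(data, cuts):
    # Partition by the cut values; return the largest bucket's share.
    buckets = {}
    for x in data:
        key = tuple(x <= c for c in sorted(cuts))
        buckets[key] = buckets.get(key, 0) + 1
    return max(buckets.values()) / len(data)

data_rng, cut_rng = random.Random(1), random.Random(0)
data = [data_rng.gauss(0, 1) for _ in range(500)] + [100.0]  # cluster + one far outlier
uniform_cuts = [cut_rng.uniform(min(data), max(data)) for _ in range(4)]
driven_cuts = [cut_rng.choice(data) for _ in range(4)]
print(max_bucket_share(data, uniform_cuts), max_bucket_share(data, driven_cuts))
```

With the outlier stretching the data range, the uniform cuts land in empty space and almost everything falls into one bucket, while the data-driven cuts split the dense cluster.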
Additional speedup
Assume that all points in a cell C will converge to the same mode (C acts like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
Low dimension vs. high dimension
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning itself requires KNN
15:30: cookies…
Summary
• LSH trades some accuracy for a gain in complexity
• Applications that involve massive data in high dimensions require the fast performance of LSH
• The LSH framework extends to different spaces (PSH)
• The LSH parameters and hash functions can be learned for different applications
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for the Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimension)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– E-mail Alex Andoni (andoni@mit.edu)
– Test it on your own data (C code, under Red Hat Linux)
LSH – Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola and T. Darrell
• Finding sensitive hash functions
"Mean Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni and P. Meer
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?
"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola and T. Darrell
Ingredients
• Input: a query image with unknown angles (parameters)
• A database of human poses with known angles
• An image feature extractor – edge detector
• A distance metric in feature space: d_x
• A distance metric in angle space:
  d_θ(θ₁, θ₂) = Σᵢ₌₁ᵐ (1 − cos(θ₁,ᵢ − θ₂,ᵢ))
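A sketch of this angle distance, as reconstructed from the slide's formula (the test poses below are illustrative):

```python
import math

# d_theta(t1, t2) = sum_i (1 - cos(t1_i - t2_i)) over the m joint angles:
# 0 for identical poses, and up to 2 per angle for opposite ones.
def d_theta(t1, t2):
    return sum(1.0 - math.cos(a - b) for a, b in zip(t1, t2))

same = d_theta([0.1, 0.5], [0.1, 0.5])
opposite = d_theta([0.0], [math.pi])
print(same, opposite)
```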
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find the KNN in the database of examples → output: average angles of the KNN
The algorithm flow
Input query → feature extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms, computed over sub-windows (such as A and B).
Feature Extraction → PSH → LWR
PSH: the basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q mapped between the parameter space (angles) and the feature space]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in the feature space, but the KNN are valid in the angle space.
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
PSH as a classification problem
Labels (r = 0.25): +1, +1, −1, −1
A pair of examples (xᵢ, xⱼ) is labeled:
  yᵢⱼ = +1 if d_θ(θᵢ, θⱼ) ≤ r
  yᵢⱼ = −1 if d_θ(θᵢ, θⱼ) > (1 + ε)·r
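The pair-labeling rule can be sketched as follows (r = 0.25 as on the slide; ε and the sample distances are illustrative):

```python
# +1 for pairs closer than r in angle space, -1 for pairs farther than
# (1 + eps) * r, and 0 for the unlabeled "gray zone" in between.
def label(d, r=0.25, eps=0.5):
    if d <= r:
        return +1
    if d > (1.0 + eps) * r:
        return -1
    return 0

print(label(0.2), label(0.5), label(0.3))
```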
A binary hash function on the features:
  h_T(x) = +1 if the feature value x ≥ T, −1 otherwise
Predict the labels:
  ŷ_h(xᵢ, xⱼ) = +1 if h_T(xᵢ) = h_T(xⱼ), −1 otherwise
Find the best threshold T that predicts the true labeling within the probability constraints; h_T(x) will place both examples in the same bin, or separate them.
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs
• LWR uses the KNNs to compute a weighted average of the estimated angles of the query:
  β₀ = argmin_β Σ_{xᵢ ∈ N(x₀)} d_θ(g(xᵢ, β), θᵢ) · K(d_x(xᵢ, x₀))
  (K is a kernel turning distance into weight)
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, facial expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (L)
• Tested on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without the feature selection, 40 bits and 1,000 hash tables were needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the maximum number of points in a bucket
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 13% of the data were searched
Results – real data
Interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in R^d, centered at P = {p₁,…,pₙ}, with radii r₁,…,rₙ
• Goal: given a query q, preprocess the points in P so as to find a point pᵢ whose sphere covers the query q
Courtesy of Mohamad Hegaze
"Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni and P. Meer
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: a window of radius "bandwidth" around a point is shifted toward the local mean]
Mean-shift → LSH: optimal k, l → LSH: data partition → LSH: data structure
KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density → small bandwidth; low density → large bandwidth
• Based on the kth nearest neighbor of the point: the bandwidth is the distance to the point's kth nearest neighbor
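A 1-D sketch of this adaptive rule (the toy data and k are illustrative):

```python
# Each point's bandwidth is the distance to its k-th nearest neighbor,
# so dense regions get small bandwidths and sparse regions large ones.
def kth_nn_bandwidth(points, k):
    return [sorted(abs(p - q) for j, q in enumerate(points) if j != i)[k - 1]
            for i, p in enumerate(points)]

data = [0.0, 0.1, 0.2, 5.0]  # a dense clump and one isolated point
hs = kth_nn_bandwidth(data, k=2)
print(hs)
```

The isolated point receives a much larger bandwidth than the clump.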
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths: h_s (spatial), h_r (color)
3. Apply filtering
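The mean-shift step behind the filtering can be sketched in 1-D with a flat kernel (toy data and bandwidth are illustrative; the paper uses multivariate kernels):

```python
# Repeatedly replace x by the mean of the data points within bandwidth h
# until it stops moving; the fixed point is a mode of the density.
def mean_shift_mode(x, data, h, iters=100):
    for _ in range(iters):
        window = [p for p in data if abs(p - x) <= h]
        nxt = sum(window) / len(window)
        if abs(nxt - x) < 1e-9:
            break
        x = nxt
    return x

data = [1.0, 1.1, 1.2, 5.0, 5.1]
print(mean_shift_mode(0.9, data, h=0.5), mean_shift_mode(5.3, data, h=0.5))
```

Starting points near each clump converge to that clump's mode, which is exactly what the filtering step exploits.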
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
[3D feature-space view]
"Mean-shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
original → filtered → segmented
Filtering: pixel value of the nearest mode
Mean-shift trajectories
Filtering examples
original squirrel → filtered; original baboon → filtered
"Mean-shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Segmentation examples
"Mean-shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x we check whether x_{d_k} ≤ v_k for each of the K pairs; the results partition the data into cells
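A minimal sketch of one such partition (dimension, K and the seed are illustrative):

```python
import random

# Each partition is K pairs (d_k, v_k); a point's cell is the K-bit
# vector of boolean tests x[d_k] <= v_k, so the cuts slice the space
# into axis-parallel cells.
random.seed(3)
dim, K = 4, 3
cuts = [(random.randrange(dim), random.random()) for _ in range(K)]

def cell(x):
    return tuple(x[d] <= v for d, v in cuts)

a = [0.1, 0.2, 0.3, 0.4]
print(cell(a))
```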
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• Large K → a smaller number of points in each cell C
• If L is too small, points might be missed; but if L is too big, extra points might be included
[Expected numbers of points per cell C, and per union of cells over the L partitions, as functions of n, d, K and L]
As L increases, ∪Cₗ increases but ∩Cₗ decreases; ∩Cₗ determines the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points; their distance gives the bandwidth
• Choose an error threshold ε
• The optimal K and L should satisfy: the approximate distance is within the threshold of the true one
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K); then minimize the running time t(K, L(K))
[Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] with its minimum marked]
Data driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Histogram: bucket distribution for uniform cut points vs. data-driven cut points]
Additional speedup
Assume that all the points in ∩C converge to the same mode (∩C is like a type of aggregate)
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
Low dimension vs. high dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 – cookies…
Summary
• LSH suggests a compromise: trade some accuracy for a large gain in complexity
• Applications that involve massive data in high dimension require LSH's fast performance
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– E-mail Alex Andoni (andoni@mit.edu)
– Test it on your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Quadtree – Pitfall 1
In some cases nothing works
[Figure: X-Y point configuration]
Quadtree – Pitfall 2
Could result in query time exponential in the dimension: O(2^d)
Space-partition based algorithms could be improved
"Multidimensional Access Methods", V. Gaede, O. Günther
Outline
• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Curse of dimensionality
• Query time or space O(n^d)
• For d > 10..20, worse than a sequential scan for most geometric distributions
• Techniques specific to high dimensions are needed
• Proved in theory and in practice by Barkol & Rabani (2000) and Beame & Vee (2002)
• Naive: O(min(n·d, n^d))
Curse of dimensionality – some intuition
The number of cells grows exponentially with the dimension: 2, 2², 2³, …, 2^d
Outline
• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Preview
• General solution – locality sensitive hashing
• Implementation for the Hamming space
• Generalization to l1 & l2
Hash function
Data item → hash function → key → bin/bucket
Example: h(X) = X modulo 3, where X is a number in the range 0..n; the key (0..2) is the storage address in the data structure.
Usually we would like related data items to be stored in the same bin.
Recall: r-nearest neighbor
dist(q, p₁) ≤ r
dist(q, p₂) ≥ (1 + ε)·r,  where r₂ = (1 + ε)·r₁
Locality sensitive hashing
A family is (r₁, r₂, p₁, p₂)-sensitive if:
≡ Pr[I(p) = I(q)] is "high" if p is "close" to q (dist ≤ r₁)
≡ Pr[I(p) = I(q)] is "low" if p is "far" from q (dist ≥ r₂ = (1 + ε)·r₁)
Preview
• General solution – locality sensitive hashing
• Implementation for the Hamming space
• Generalization to l1 & l2
Hamming Space
• The Hamming space is the set of 2^N binary strings of length N
• The Hamming distance is the number of changed digits, a.k.a. the signal distance (Richard Hamming)
Example:
010100001111
010010000011  → distance = 4
• Hamming distance = SUM(X1 XOR X2)
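The definition above in one line of code, using the slide's own example:

```python
# Hamming distance exactly as the slide defines it: SUM(x1 XOR x2).
def hamming(x1: str, x2: str) -> int:
    assert len(x1) == len(x2)
    return sum(b1 != b2 for b1, b2 in zip(x1, x2))

print(hamming("010100001111", "010010000011"))  # 4
```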
L1 to Hamming space embedding
A point p = (8, 2) with C = 11 maps to the concatenation of unary codes:
8 → 11111111000,  2 → 11000000000  ⇒  1111111100011000000000
The new dimension is d′ = C·d, and the L1 distance becomes the Hamming distance.
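A sketch of the unary embedding, using the slide's p = (8, 2) and C = 11 (the second point q is illustrative):

```python
# Each coordinate x in [0, C] becomes x ones followed by C - x zeros,
# so L1 distance turns into Hamming distance.
C = 11

def embed(point):
    return "".join("1" * x + "0" * (C - x) for x in point)

def hamming(s, t):
    return sum(a != b for a, b in zip(s, t))

p, q = (8, 2), (5, 4)
l1 = sum(abs(a - b) for a, b in zip(p, q))
print(embed(p))                       # 22 bits: d' = C * d
print(hamming(embed(p), embed(q)), l1)
```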
Hash function
G_j(p) = p|I_j – k bits sampled from p ∈ H^{d′} (in the example, j = 1..L and k = 3 digits)
Store p in the bucket indexed by p|I_j (one of 2^k buckets), e.g. p|I_j = 101
Construction: insert every p into its bucket in each of the tables 1, 2, …, L
Query: look up q's bucket in each of the tables 1, 2, …, L
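The construction and query steps above can be sketched as follows (string length, k, L and the seed are illustrative):

```python
import random

# Bit-sampling LSH over Hamming strings: each of the L tables samples
# k bit positions I_j and buckets a string p by the key p|I_j.
random.seed(2)
N, k, L = 22, 3, 4
tables = [(random.sample(range(N), k), {}) for _ in range(L)]

def key(p, idx):
    return "".join(p[i] for i in idx)

def insert(p):
    for idx, buckets in tables:
        buckets.setdefault(key(p, idx), set()).add(p)

def query(q):
    candidates = set()
    for idx, buckets in tables:
        candidates |= buckets.get(key(q, idx), set())
    return candidates

insert("1111111100011000000000")
insert("0000000000011111111111")
print(query("1111111100011000000000"))
```

The query returns the union of the L buckets as the candidate set, which is then verified by exact distance computations.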
Alternative intuition: random projections
The embedded string p (e.g. 1111111100011000000000, with C = 11 and d′ = C·d) is projected onto k randomly sampled coordinates.
The k sampled bits (e.g. 101) index one of 2^k buckets – for k = 3: 000, 100, 110, 001, 101, 111, …
Repeating L times
Secondary hashing supports volume tuning (dataset size vs. storage volume): the 2^k buckets (e.g. 011, each of size B) are mapped by a simple hash into M buckets, with M·B = α·n, α = 2.
The above hashing is locality-sensitive:
• Pr[p, q in the same bucket] = (1 − Distance(p, q)/d′)^k
[Plots: collision probability vs. distance(q, pᵢ) for k = 1 and k = 2]
Adapted from Piotr Indyk's slides
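A quick numeric check of this collision probability (d′ and the distances are illustrative):

```python
# Pr[collision] = (1 - dist/d')**k: farther pairs collide less often,
# and raising k sharpens the drop-off.
def p_collide(dist, d_prime, k):
    return (1.0 - dist / d_prime) ** k

near, far = p_collide(2, 22, 3), p_collide(10, 22, 3)
print(near, far)
```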
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1, …, vn — real numbers; X1, …, Xn — independent, identically distributed (i.i.d.) Gaussians.
v1·X1 + v2·X2 + … + vn·Xn is itself Gaussian:
a weighted sum of Gaussians is a (scaled) Gaussian.
Central limit theorem
Σi vi·Xi  ~  ||v||2 · X    (dot product ↔ norm)
For two feature vectors u and v:
Σi ui·Xi − Σi vi·Xi = Σi (ui − vi)·Xi  ~  ||u − v||2 · X    (dot product ↔ distance)
so the gap between the two projections is a Gaussian scaled by the L2 distance.
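The 2-stability property can be checked numerically (an illustrative sketch): projecting u − v onto random Gaussian vectors gives a Gaussian whose standard deviation is the L2 distance ||u − v||2.

```python
import math
import random

def projected_gap_std(u, v, trials=50000, seed=7):
    """Std of a.u - a.v = a.(u - v) over random Gaussian vectors a;
    by 2-stability this estimates ||u - v||_2."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in u]
        gaps.append(sum(ai * (ui - vi) for ai, ui, vi in zip(a, u, v)))
    mean = sum(gaps) / trials
    return math.sqrt(sum((g - mean) ** 2 for g in gaps) / trials)
```

For u = (3, 4, 0) and v = 0 the estimate should be close to 5.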
The full Hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v — features vector, e.g. [34, 82, 21, …] (d coordinates)
• a — d random numbers, i.i.d. from a p-stable distribution
• b — random phase in [0, w]
• w — discretization step
Example: a·v = 7944, phase b = 34, step w = 100: the line is divided into
cells …7800, 7900, 8000, 8100, 8200…, and 7944 + 34 falls into cell 79.
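h_{a,b} can be sketched directly (a hypothetical helper with invented names; Gaussian entries give the 2-stable case for L2, and w and seed are free parameters):

```python
import math
import random

def make_hash(d, w, seed=None):
    """One p-stable LSH function for L2: h(v) = floor((a.v + b) / w),
    with a ~ N(0, 1)^d (2-stable) and phase b uniform in [0, w)."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(d)]
    b = rng.uniform(0, w)
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h
```

Nearby vectors usually land in the same cell of width w; distant ones rarely do.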
Generalization: P-Stable distribution
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution
  (e.g. Cauchy for L1)
• L2: Central Limit Theorem → Gaussian (normal) distribution
P-Stable summary
• Works for r-Nearest Neighbor; generalizes to 0 < p ≤ 2
• Improves query time: from O(d·n^(1/(1+ε))·log n) to O(d·n^(1/(1+ε)²)·log n)
  (latest results, reported by email by Alexander Andoni)
Parameters selection
• 90% probability ↔ best query-time performance
For Euclidean space:
• A single projection hits an ε-Nearest Neighbor with Pr = p1
• k projections hit an ε-Nearest Neighbor with Pr = p1^k
• L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure collision (e.g. with probability 1 − δ ≥ 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ  ⟹  L ≥ log δ / log(1 − p1^k)
Accept neighbors, reject non-neighbors.
…Parameters selection
[plot: query time vs. k = candidates-verification time + candidates-extraction
time; larger k cheapens verification but raises extraction cost, so the
total is minimized at an intermediate k]
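The collision constraint above gives L directly (a small sketch; p1, k and δ are whatever your hash family and success target dictate):

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with (1 - p1**k)**L <= delta, i.e. an eps-near
    neighbor collides in at least one of the L tables with
    probability >= 1 - delta."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

# e.g. p1 = 0.9, k = 10 bits per table, 90% success target (delta = 0.1)
```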
Pros & Cons
+ Better query time than spatial data structures
+ Scales well to higher dimensions and larger data size (sub-linear dependence)
+ Predictable running time
− Extra storage overhead
− Inefficient for data with distances concentrated around the average
− Works best for Hamming distance (although it can be generalized to Euclidean space)
− In secondary storage, linear scan is pretty much all we can do (for high dim.)
− Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni: andoni@mit.edu
  – Test over your own data
    (C code, under Red Hat Linux)
LSH - Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short, whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash function construction and parameter tuning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing; G. Shakhnarovich, P. Viola, and T. Darrell
• Finding sensitive hash functions
Mean Shift Based Clustering in High Dimensions: A Texture Classification Example;
B. Georgescu, I. Shimshoni, and P. Meer
• Tuning LSH parameters
• LSH data structure is used for algorithm speedups
Given an image x, what are the parameters θ in this image?
I.e., the angles of the joints, the orientation of the body, etc.
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola, and T. Darrell
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: dx
• Distance metric in angle space:
  dθ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find its KNN in the database of examples → output: average angles of the KNN
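The example-based scheme reduces to a few lines (an illustrative sketch with a brute-force KNN and invented names; PSH replaces the exhaustive search in practice):

```python
import math

def estimate_angles(query_feat, database, k=3):
    """Average the known angle vectors of the k nearest database
    entries in feature space; `database` = [(features, angles)]."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(database, key=lambda e: dist(e[0], query_feat))[:k]
    m = len(nearest[0][1])
    return [sum(ang[i] for _, ang in nearest) / k for i in range(m)]
```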
The algorithm flow
Input query → features extraction → processed query → PSH (LSH) against
the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms, computed over
sub-windows of the image (e.g. regions A, B) at several scales.
[Feature Extraction → PSH → LWR]
PSH: The basic assumption
There are two metric spaces here: the feature space (dx) and the
parameter space (dθ). We want similarity to be measured in the angle
space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[figure: parameter space (angles) vs. feature space, query q]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on
examples, and select those sensitive to dθ. The hash functions are
applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
PSH as a classification problem
A pair of examples (xi, xj) is labeled (with r = 0.25):
y_ij = +1  if dθ(θi, θj) ≤ r
y_ij = −1  if dθ(θi, θj) ≥ (1 + ε)·r
A binary hash function on features:
h_T(x) = +1 if φ(x) ≥ T, −1 otherwise
(φ is a single feature; T a threshold)
Predict the labels:
ŷ_h(xi, xj) = +1 if h_T(xi) = h_T(xj), −1 otherwise
Find the best T that predicts the true labeling within the probability
constraints: h_T will place both examples in the same bin, or separate them.
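Selecting parameter-sensitive threshold hashes can be sketched as a filter over the labeled pairs (names and the accuracy cutoff `min_acc` are invented for this sketch; the paper's actual selection enforces the probability constraints):

```python
def select_sensitive_hashes(pairs, candidates, min_acc=0.7):
    """Keep threshold hash functions h_T that agree with the pair labels.
    pairs: [(x_i, x_j, y)] with y = +1 for similar angles, -1 otherwise.
    candidates: [(feature_index, T)]."""
    chosen = []
    for phi, T in candidates:
        h = lambda x: 1 if x[phi] >= T else -1
        correct = sum((1 if h(xi) == h(xj) else -1) == y
                      for xi, xj, y in pairs)
        if correct / len(pairs) >= min_acc:
            chosen.append((phi, T))
    return chosen
```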
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles
  of the query:
  θ0 = argmin_θ Σ_{xi ∈ N(x)} dθ(θi, θ) · K(dx(xi, x))
  where the kernel K of the feature-space distance supplies the weights
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (L)
• Tested on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables would be needed
Recall: P1 is the probability of a positive hash, P2 the probability of a
bad hash, and B the maximal number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in Rd, centered at P = p1, …, pn, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P to find a point pi
  whose sphere covers the query q (||q − pi|| ≤ ri)
Courtesy of Mohamad Hegaze
Motivation
• Clustering high dimensional data by using local density measurements (e.g. feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[figure: a window of radius "bandwidth" around a point is shifted toward the local mean]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
KNN in mean-shift: the bandwidth should be inversely proportional to the
density in the region: high density → small bandwidth, low density →
large bandwidth. Based on the kth nearest neighbor of the point, the
bandwidth is h_i = ||x_i − x_{i,k}||.
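One mean-shift trajectory with a flat kernel, in one dimension for brevity (an illustrative sketch; adaptive mean-shift would instead set h per point from its kth nearest neighbor):

```python
def mean_shift_point(x, data, h, iters=50):
    """Repeatedly move x to the mean of the data points within
    bandwidth h until it converges to a mode."""
    for _ in range(iters):
        window = [p for p in data if abs(p - x) <= h]
        if not window:
            break
        x_new = sum(window) / len(window)
        if abs(x_new - x) < 1e-9:   # converged to a mode
            break
        x = x_new
    return x
```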
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering
3D
Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
original segmented
filtered
Filtering: each pixel takes the value of its nearest mode
Mean-shift trajectories
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Segmentation examples
Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries — implemented with LSH
• Statistical curse of dimensionality: sparseness of the data — handled by variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (dk, vk)
• For each point, check whether x_{dk} ≤ vk for each of the K pairs
• This partitions the data into cells
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• Large K ⇒ a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• As L increases, the union of cells C̄ containing the query grows: fewer
  neighbors are missed, but more candidates must be checked
• K and L determine the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN distance (bandwidth) for m randomly-selected data points
• Choose an error threshold ε for the approximate distance
• The optimal K and L should satisfy the threshold:
  – For each K, estimate the approximation error
  – In one run over all L's, find the minimal L satisfying the constraint: L(K)
  – Minimize the running time t(K, L(K)), which attains a minimum at an intermediate K
[plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)]]
Data driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its
  coordinates as the cut value
[figure: bucket distribution — uniform cuts vs. data-driven points]
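The two cut strategies side by side (a tiny sketch in one dimension; `rng` is any seeded `random.Random`):

```python
import random

def uniform_cut(lo, hi, rng):
    """Original LSH: cut value drawn uniformly from the data range."""
    return rng.uniform(lo, hi)

def data_driven_cut(coords, rng):
    """Data-driven variant: a coordinate of a randomly selected data
    point, so dense regions receive proportionally more cuts."""
    return rng.choice(coords)
```

The data-driven cuts equalize the number of points per bucket.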
Additional speedup
Assume that all points in C̄ will converge to the same mode (C̄ acts like
a type of aggregate), so the mean-shift iterations can be run once per
cell instead of once per point.
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
Low dimension High dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
  Intuitively, dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning itself requires KNN…
15:30 — cookies…
Summary
• LSH trades some accuracy for a large reduction in complexity
• Applications that involve massive data in high dimension require the LSH fast performance
• Extension of the LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni: andoni@mit.edu
  – Test over your own data
    (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Space partition based algorithms
"Multidimensional Access Methods", Volker Gaede, O. Günther
Could be improved…
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
For j = 1..L, Gj(p) = p|Ij: a sampling of k bits from p ∈ Hd' (here k = 3 digits, e.g. the key 101).
Store p into bucket p|Ij — one of 2^k buckets.
Construction: insert every point p into its bucket in each of the L tables (1, 2, …, L).
Query: look up q's bucket in each of the L tables and examine the candidates found there.
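The construction and query steps can be sketched as follows (a minimal, illustrative implementation; the function names and toy strings are mine, not the slides'):

```python
import random
from collections import defaultdict

def build_tables(points, dprime, k, L, seed=0):
    """Bit-sampling LSH for Hamming space: L tables; table j stores each
    bit-string p of length dprime under the key g_j(p) = p|I_j, i.e. the
    k bits of p at the randomly chosen index set I_j."""
    rng = random.Random(seed)
    tables = []
    for _ in range(L):
        I_j = rng.sample(range(dprime), k)
        buckets = defaultdict(list)
        for p in points:
            buckets["".join(p[i] for i in I_j)].append(p)
        tables.append((I_j, buckets))
    return tables

def query(tables, q):
    """Gather every point that collides with q in at least one of the L tables."""
    candidates = set()
    for I_j, buckets in tables:
        candidates.update(buckets.get("".join(q[i] for i in I_j), []))
    return candidates

points = ["11110000", "11100000", "00001111"]
tables = build_tables(points, dprime=8, k=3, L=10)
print(sorted(query(tables, "11110000")))
```

A query point always collides with itself in every table; near points collide with high probability, far points rarely.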
Alternative intuition: random projections
In the unary embedding (p = (8, 2), C = 11 → 1111111100011000000000, d' = C·d), each sampled bit asks whether a coordinate exceeds some threshold, so k samplings act like k random axis-parallel cuts. The resulting k-bit key (e.g. 101) selects one of the 2^3 buckets: 000, 100, 110, 001, 101, 111, …
Repeating L times.
Secondary hashing: the 2^k buckets (e.g. key 011, bucket size B) are mapped by a simple hash into M buckets, tuning storage volume to dataset size: M·B = αn, with α = 2.
The above hashing is locality-sensitive:
• Probability(p, q in same bucket) = (1 − Distance(p, q) / dimensions)^k
(Plots: probability Pr vs Distance(q, pi) for k = 1 and k = 2 — a larger k makes the collision probability fall faster with distance.)
Adapted from Piotr Indyk's slides.
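The collision probability above is easy to tabulate; a short illustrative sketch:

```python
def collision_prob(dist, dims, k):
    """Pr[p and q share a bucket] for one k-bit sampling hash:
    each sampled bit agrees with probability (1 - dist/dims), independently."""
    return (1.0 - dist / dims) ** k

# Larger k makes the probability fall faster with distance, as in the k=1 vs k=2 plots.
for k in (1, 2):
    print([round(collision_prob(d, 100, k), 3) for d in (0, 25, 50, 75)])
```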
Preview
• General solution – Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• A new hashing function
• Still based on sampling
• Uses a mathematical trick: a p-stable distribution for Lp distance — the Gaussian distribution for L2 distance
Central limit theorem
A weighted sum of Gaussians is a Gaussian:
v1, …, vn = real numbers; X1, …, Xn = independent, identically distributed (i.i.d.)
v1·X1 + v2·X2 + … + vn·Xn
Central limit theorem
For i.i.d. Gaussian Xi,
Σi vi·Xi  ~  ‖v‖2 · X,  with X ~ N(0, 1)
(the dot product behaves like the norm), and for two feature vectors u and v,
Σi ui·Xi − Σi vi·Xi = Σi (ui − vi)·Xi  ~  ‖u − v‖2 · X
(the difference of dot products behaves like the distance): projecting onto a random Gaussian vector turns L2 distances into one-dimensional Gaussian spreads.
The full Hashing
h_{a,b}(v) = ⌊ (a · v + b) / w ⌋
• v — the features vector (d entries)
• a — d random numbers, drawn i.i.d. from a p-stable distribution
• b — a random phase in [0, w]
• w — the discretization step
Worked example: with a · v = 79.44, b = 34 and w = 100, the shifted projection is rounded down onto the discretization grid (ticks 78.00, 79.00, 80.00, 81.00, 82.00 in the figure) to pick the bucket.
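A sketch of this hash family in Python (illustrative; `make_hash` and the parameter values are mine):

```python
import math
import random

def make_hash(d, w, seed=0):
    """One L2 (2-stable) LSH function h_{a,b}(v) = floor((a . v + b) / w):
    a has d i.i.d. N(0,1) entries, b is a uniform phase in [0, w),
    and w is the discretization step."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h

h = make_hash(d=3, w=100.0)
print(h([3.4, 8.2, 2.1]))  # an integer bucket; close vectors usually share it
```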
Generalization: P-stable distributions
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → a p-stable distribution (e.g. the Cauchy distribution for L1)
• L2: Central Limit Theorem → the Gaussian (normal) distribution
P-stable summary
• Generalizes to 0 < p ≤ 2
• Improves the r-Nearest Neighbor query time: from O( d·n^{1/(1+ε)}·log n ) to O( d·n^{1/(1+ε)²}·log n )
(Latest results reported by email by Alexander Andoni.)
Parameters selection
For Euclidean space: aim for ≥ 90% success probability at the best query-time performance.
Parameters selection…
For Euclidean space:
• A single projection hits an r-Nearest Neighbor with Pr = p1
• k projections hit an r-Nearest Neighbor with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure a collision (e.g. with 1 − δ ≥ 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ   ⇒   L ≥ log(δ) / log(1 − p1^k)
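The bound above pins down L given p1, k and δ; a quick illustrative sketch:

```python
import math

def tables_needed(p1, k, delta):
    """Smallest integer L with 1 - (1 - p1**k)**L >= 1 - delta,
    i.e. L = ceil( log(delta) / log(1 - p1**k) )."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

# e.g. p1 = 0.5, k = 3, 90% success probability (delta = 0.1):
print(tables_needed(0.5, 3, 0.1))  # 18
```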
…Parameters selection
(Figure: query time vs k — candidate-extraction time grows with k while candidate-verification time shrinks; the hash should accept neighbors and reject non-neighbors, so k is chosen at the minimum of the total time.)
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides.
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test it over your own data
  (C code, under Red Hat Linux)
LSH – Applications
• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
• Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, T. Darrell) — finding sensitive hash functions
• Mean Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, P. Meer) — tuning LSH parameters; the LSH data structure is used for algorithm speedups
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, T. Darrell)
Given an image x, what are the parameters θ in this image — i.e. the angles of the joints, the orientation of the body, etc.?
Ingredients
• Input: a query image with unknown angles (parameters)
• A database of human poses with known angles
• An image feature extractor – an edge detector
• A distance metric in feature space: d_x
• A distance metric in angle space:
  d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
Example-based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find its KNN in the database of examples → output: the average angles of the KNN
The algorithm flow
Input query → feature extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms.
[Feature Extraction → PSH → LWR]
PSH: the basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ). We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
(Figure: the query q in parameter space (angles) vs feature space.)
Is this Magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those that are sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle space.
1. Label pairs of examples with similar angles
2. Define hash functions h on the feature space
3. Predict the labeling of similar/non-similar examples by using h
4. Compare the labelings
5. If the labeling by h is good, accept h; else change h
PSH as a classification problem
A pair of examples (x_i, θ_i), (x_j, θ_j) is labeled (with r = 0.25):
y_ij = +1  if d_θ(θ_i, θ_j) ≤ r
y_ij = −1  if d_θ(θ_i, θ_j) ≥ (1 + ε) r
A binary hash function on the features:
h_T(x) = +1 if the feature value of x is above the threshold T, −1 otherwise
Predict the labels:
ŷ_ij(h) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best threshold T that predicts the true labeling within the probability constraints: h_T will either place both examples of a pair in the same bin, or separate them.
Local Weighted Regression (LWR)
• Given a query image x, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
  β̂ = argmin_β Σ_{x_i ∈ N(x)} d_θ(g(x_i, β), θ_i) · K(d_x(x_i, x))
  where the kernel K turns each neighbor's feature-space distance into a weight.
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, facial expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (L)
• Tested on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
• Without feature selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, and B the maximum number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: preprocess the points in P so that, given a query q, we can find a point pi whose sphere 'covers' the query q
(Courtesy of Mohamad Hegaze.)
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
Iteratively shift each point to the weighted mean of the data points inside its bandwidth window; points converge to the modes of the density.
[Mean-shift | LSH: optimal k,l | LSH: data partition | LSH data struct]
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density → small bandwidth, low density → large bandwidth. It is based on the kth nearest neighbor of the point: the bandwidth h_i is the distance from x_i to its kth neighbor.
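The kth-neighbor bandwidth rule can be sketched directly (brute force here; the point of the paper is that this KNN query is exactly what LSH accelerates):

```python
def adaptive_bandwidths(points, k):
    """h_i = distance from x_i to its k-th nearest neighbor: small in dense
    regions, large in sparse ones (brute-force O(n^2), for illustration only)."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    bands = []
    for i, p in enumerate(points):
        ds = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        bands.append(ds[k - 1])
    return bands

pts = [(0, 0), (1, 0), (0, 1), (10, 10)]
print(adaptive_bandwidths(pts, k=1))  # the isolated point gets a much larger bandwidth
```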
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering: each pixel takes the value of its nearest mode
(Figure: original → filtered → segmented, in a 3D feature space.)
Mean-shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift trajectories (figure).
Filtering examples: original squirrel → filtered; original baboon → filtered.
Segmentation examples.
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries — implemented with LSH
• Statistical curse of dimensionality: sparseness of the data — handled with a variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point, check whether x_{d_k} ≤ v_k for each of the K pairs
• This partitions the data into cells
Choosing the optimal K and L
• For a query q, we want to compute the smallest number of distances to points in its buckets
• If L is too small, points might be missed; but if L is too big, extra points might be included
• A large K means a smaller number of points in a cell
• As L increases, the union of cells C̄ increases but the intersection cell C decreases; this determines the resolution of the data structure
Choosing the optimal K and L
• Determine accurately the KNN (and hence the bandwidth distance) for m randomly selected data points
• Choose an error threshold ε for the approximate distance
• The optimal K and L should satisfy this error constraint
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
(Figures: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] and its minimum.)
Data-driven partitions
• In the original LSH, cut values are drawn uniformly at random over the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
(Figure: bucket distribution — uniform cuts vs data-driven cuts.)
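The data-driven cut rule can be sketched like this (illustrative; the function names are mine):

```python
import random

def data_driven_cuts(points, K, seed=0):
    """Each of the K cuts is a pair (d_k, v_k): a random dimension and, instead
    of a uniform draw over the data range, a coordinate of a randomly chosen
    data point -- so the buckets follow the data density."""
    rng = random.Random(seed)
    dim = len(points[0])
    cuts = []
    for _ in range(K):
        d = rng.randrange(dim)
        v = rng.choice(points)[d]
        cuts.append((d, v))
    return cuts

def bucket_key(point, cuts):
    """A point's cell: the K-bit pattern of which side of each cut it falls on."""
    return tuple(point[d] <= v for d, v in cuts)

pts = [(0.0, 1.0), (2.0, 3.0), (4.0, 5.0)]
cuts = data_driven_cuts(pts, K=4)
print(bucket_key((0.5, 2.5), cuts))
```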
Additional speedup
Assume that all the points in the intersection cell C converge to the same mode (C acts like a type of aggregate), so one mean-shift run can serve all of them.
Speedup results
65,536 points; 1,638 points sampled; k = 100.
Food for thought
(Low dimension vs high dimension.)
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning itself requires KNN
Summary
• LSH trades accuracy for a gain in complexity
• Applications that involve massive data in high dimensions require LSH's fast performance
• LSH extends to different spaces (PSH)
• The LSH parameters and hash functions can be learned for different applications
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test it over your own data
  (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
• Hamming space = the set of 2^N binary strings of length N.
• Hamming distance = the number of changed digits, a.k.a. signal distance (Richard Hamming).
Example: 010100001111 vs. 010010000011 → distance = 4.
• Hamming distance = SUM(X1 XOR X2).
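In code, the XOR formulation is one line (Python sketch):

```python
# Hamming distance = popcount of the XOR, i.e. SUM(X1 XOR X2).
def hamming(x1: str, x2: str) -> int:
    assert len(x1) == len(x2)
    return sum(a != b for a, b in zip(x1, x2))

# On integers the same thing is bin(a ^ b).count("1").
print(hamming("010100001111", "010010000011"))  # 4
```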
L1 to Hamming Space Embedding
A point p = (8, 2), with coordinates bounded by C = 11, is embedded by writing each coordinate in unary: 8 → 11111111000, 2 → 11000000000, concatenated to 1111111100011000000000. The embedded dimension is d′ = C·d, and L1 distances become Hamming distances.
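The embedding in code (a short sketch; the helper names and the second example point are ours):

```python
# Unary (thermometer) embedding: a coordinate v in 0..C becomes
# v ones followed by C - v zeros; L1 distance turns into Hamming distance.
C = 11

def embed(point):
    return "".join("1" * v + "0" * (C - v) for v in point)

def hamming(s, t):
    return sum(a != b for a, b in zip(s, t))

p, q = (8, 2), (5, 4)
print(embed(p))  # '1111111100011000000000'
l1 = sum(abs(a - b) for a, b in zip(p, q))
print(l1 == hamming(embed(p), embed(q)))  # the two distances agree
```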
Hash function
For j = 1…L, define G_j(p) = p|I_j , where p ∈ H^d′ and I_j is a random set of k bit positions (here k = 3 digits) – bit sampling from p. Store p in the bucket indexed by p|I_j (e.g. 101); there are 2^k buckets.
Construction: insert every point p into its bucket in each of the tables j = 1, 2, …, L.
Query: for q, look up the buckets G_1(q), …, G_L(q) and check only the candidates found there.
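The construction and query steps can be sketched together (a minimal bit-sampling LSH; the class name and parameter defaults are ours):

```python
import random

class HammingLSH:
    """Bit-sampling LSH over d-bit strings: L tables, each keyed on k positions."""
    def __init__(self, d, k=3, L=4, seed=0):
        rng = random.Random(seed)
        self.index_sets = [rng.sample(range(d), k) for _ in range(L)]
        self.tables = [{} for _ in range(L)]

    def _key(self, p, j):
        return "".join(p[i] for i in self.index_sets[j])

    def insert(self, p):
        for j, table in enumerate(self.tables):
            table.setdefault(self._key(p, j), []).append(p)

    def candidates(self, q):
        out = set()
        for j, table in enumerate(self.tables):
            out.update(table.get(self._key(q, j), []))
        return out

lsh = HammingLSH(d=12)
for s in ("010100001111", "010010000011", "111111000000"):
    lsh.insert(s)
print(sorted(lsh.candidates("010100001111")))
```

The candidates are then verified by computing their true Hamming distances to q.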
Alternative intuition: random projections
Each sampled bit of the unary embedding of p = (8, 2) (C = 11, d′ = C·d) acts as a random axis-parallel cut of the space. The k = 3 sampled bits partition the space into 2³ buckets (000, 100, 110, 001, 101, 111, …), and p lands in the bucket matching its k sampled bits (e.g. 101).
Repeating L times; secondary hashing
The k samplings are repeated L times. A secondary (standard) hash then maps the 2^k logical buckets into M physical buckets of size B, supporting tuning of storage volume vs. dataset size: simple hashing with M·B = α·n, α = 2.
The above hashing is locality-sensitive
Probability that p and q fall into the same bucket:
Pr = (1 − Distance(q, p)/dimensions)^k
The collision probability decays with distance, and more sharply for larger k (compare the curves for k = 1 and k = 2). Adopted from Piotr Indyk's slides.
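The curve is easy to tabulate (a sketch under the with-replacement sampling assumption; the example dimension is ours):

```python
# Each of the k sampled positions agrees with probability 1 - dist/d,
# so Pr[p and q share a bucket] = (1 - dist/d) ** k.
def collision_prob(dist, d, k):
    return (1.0 - dist / d) ** k

d = 22  # dimension of the embedded Hamming space
for k in (1, 2):
    print(k, [round(collision_prob(dist, d, k), 3) for dist in (0, 5, 11, 22)])
```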
Preview
• General solution – Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function.
• Still based on sampling.
• Using a mathematical trick: a p-stable distribution for the Lp distance – the Gaussian distribution for the L2 distance.
Central limit theorem
v1·X1 + v2·X2 + … + vn·Xn, where v1, …, vn are real numbers and X1, …, Xn are independent, identically distributed (i.i.d.) Gaussians: a weighted sum of Gaussians is again a Gaussian.
For the Gaussian (2-stable) case, the dot product turns into the norm:
Σ_i v_i·X_i ∼ ‖v‖_2 · X
Applied to two feature vectors u and v, the difference of dot products is distributed like their L2 distance:
Σ_i u_i·X_i − Σ_i v_i·X_i = Σ_i (u_i − v_i)·X_i ∼ ‖u − v‖_2 · X
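The 2-stable property is easy to verify empirically (a sketch; the sample size and test vector are arbitrary):

```python
import math, random

# Empirical check of 2-stability: sum_i v_i * X_i with X_i ~ N(0, 1)
# is Gaussian with standard deviation ||v||_2.
random.seed(0)
v = [3.0, 4.0]                     # ||v||_2 = 5
samples = [sum(vi * random.gauss(0, 1) for vi in v) for _ in range(20000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(round(mean, 2), round(std, 2))  # mean near 0, std near 5
```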
The full Hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v – the features vector, e.g. [34, 82, 21].
• a – d random numbers, i.i.d. from a p-stable distribution.
• b – a random phase in [0, w].
• w – the discretization step.
Example: with a·v = 7944, b = 34 and w = 100, the hashed value falls in the cell covering [7900, 8000).
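A minimal sketch of such a p-stable hash for L2 (Python; w, the seed and the dimensionality are illustrative choices, not values from the slides):

```python
import math, random

class L2Hash:
    """One p-stable LSH function for L2: h_{a,b}(v) = floor((a.v + b) / w)."""
    def __init__(self, d, w=4.0, seed=0):
        rng = random.Random(seed)
        self.a = [rng.gauss(0, 1) for _ in range(d)]  # 2-stable: i.i.d. Gaussian
        self.b = rng.uniform(0, w)                    # random phase in [0, w]
        self.w = w                                    # discretization step

    def __call__(self, v):
        dot = sum(ai * vi for ai, vi in zip(self.a, v))
        return math.floor((dot + self.b) / self.w)

h = L2Hash(d=3)
v = [34.0, 82.0, 21.0]
print(h(v))  # the bucket index of the features vector
```

In practice k such functions are concatenated into one key and L keys are maintained, exactly as in the Hamming scheme.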
Generalization: p-stable distributions
• L2: by the Central Limit Theorem, the Gaussian (normal) distribution is 2-stable.
• Lp, 0 < p ≤ 2: by the Generalized Central Limit Theorem, a p-stable distribution exists (e.g. the Cauchy distribution for L1).
P-stable summary
• Works for the r-nearest-neighbor problem and generalizes to 0 < p ≤ 2.
• Improves the query time from O(d·n^(1/(1+ε))·log n) to O(d·n^(1/(1+ε)²)·log n) (latest results, reported by e-mail by Alexander Andoni).
Parameters selection
For Euclidean space, choose parameters for ~90% success probability at the best query-time performance:
• A single projection hits an ε-nearest neighbor with Pr = p1.
• k projections hit it with Pr = p1^k.
• All L hashings fail to collide with Pr = (1 − p1^k)^L.
• To ensure a collision (e.g. with probability 1 − δ ≥ 90%): 1 − (1 − p1^k)^L ≥ 1 − δ, hence L ≥ log(δ) / log(1 − p1^k).
Parameters selection (cont.)
A good k both rejects non-neighbors and accepts neighbors. As k grows, the candidate-extraction time drops while the candidate-verification time grows; the total query time is minimized at an intermediate k.
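The bound on L can be evaluated directly (sketch; the p1 value is an arbitrary example, not from the slides):

```python
import math

# Smallest L with 1 - (1 - p1**k)**L >= 1 - delta,
# i.e. L >= log(delta) / log(1 - p1**k).
def tables_needed(p1, k, delta=0.1):
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

print(tables_needed(p1=0.8, k=10, delta=0.1))
```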
Pros & Cons
Pros:
• Better query time than spatial data structures.
• Scales well to higher dimensions and larger data sizes (sub-linear dependence).
• Predictable running time.
Cons:
• Extra storage overhead.
• Inefficient for data with distances concentrated around the average.
• Works best for Hamming distance (although it can be generalized to Euclidean space).
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions).
• Requires the radius r to be fixed in advance.
From Piotr Indyk's slides.
Conclusion
• … but in the end, everything depends on your data set.
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– E-mail Alex Andoni (andoni@mit.edu)
– Test over your own data (C code, under Red Hat Linux).
LSH – Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun).
• Searching image databases (see the following).
• Image segmentation (see the following).
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani).
• Texture classification (see the following).
• Clustering (see the following).
• Embedding and manifold learning (LLE and many others).
• Compression – vector quantization.
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan).
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler).
• In short: whenever k-nearest neighbors (KNN) are needed.
Motivation
• A variety of procedures in learning require KNN computation.
• KNN search is a computational bottleneck.
• LSH provides a fast approximate solution to the problem.
• LSH requires hash-function construction and parameter tuning.
Outline
• "Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola and T. Darrell – finding sensitive hash functions.
• "Mean Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni and P. Meer – tuning the LSH parameters; the LSH data structure is used for algorithm speedups.
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola and T. Darrell)
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?
Ingredients
• Input: query image with unknown angles (parameters).
• Database of human poses with known angles.
• Image feature extractor – edge detector.
• Distance metric in feature space, d_x.
• Distance metric in angles space:
d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
Example based learning
• Construct a database of example images with their known angles.
• Given a query image, run your favorite feature extractor.
• Compute the KNN from the database.
• Use these KNNs to compute the average angles of the query.
Input: query → find the KNN in the database of examples → output: average angles of the KNN.
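The recipe above in toy form (the features, database and naive angle averaging are stand-ins, not the paper's):

```python
# Toy example-based pose estimation: the angles of a query are estimated
# as the average angles of its K nearest neighbors in feature space.
def knn_average_angles(query_feat, database, k=3):
    # database: list of (feature_vector, angle) pairs
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
    nearest = sorted(database, key=lambda ex: dist(ex[0], query_feat))[:k]
    return sum(angle for _, angle in nearest) / k

db = [((0.0, 0.0), 10.0), ((0.1, 0.0), 12.0), ((0.0, 0.2), 14.0), ((5.0, 5.0), 90.0)]
print(knn_average_angles((0.05, 0.05), db, k=3))  # 12.0
```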
The algorithm flow
Input query → features extraction → processed query → PSH (LSH), matched against the database of examples → LWR (regression) → output: match.
The image features
Image features are multi-scale edge histograms, computed over image sub-windows (e.g. regions A and B) at several scales. [Feature Extraction → PSH → LWR]
PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ). We want similarity to be measured in the angles space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
The parameters space (angles) and the feature space of a query q are linked by such a manifold. Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle space.
Training loop:
1. Label pairs of examples with similar angles.
2. Define hash functions h on the feature space.
3. Predict the labeling of similar/non-similar examples by using h.
4. Compare the labelings: if the labeling by h is good, accept h, else change h.
PSH as a classification problem
A pair of examples (x_i, x_j) is labeled (with r = 0.25):
• y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
• y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε)·r
A binary hash function on features (a decision stump on one feature value):
• h_T(x) = +1 if the selected feature of x is ≥ T, −1 otherwise.
Predict the labels:
• ŷ_ij = +1 if h_T(x_i) = h_T(x_j), −1 otherwise.
Find the best T that predicts the true labeling, subject to the probability constraints: h_T will place both examples in the same bin, or separate them.
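Threshold selection can be sketched as a search over T scored by agreement with the pair labels (toy data; the paper instead enforces probability constraints on true/false collisions):

```python
# Toy selection of a decision-stump hash h_T: pick the threshold T whose
# same-bin / different-bin predictions best agree with the pair labels y_ij.
def stump(value, T):
    return 1 if value >= T else -1

def agreement(pairs, labels, T):
    # pairs: (feature_i, feature_j); labels: +1 similar, -1 dissimilar
    hits = 0
    for (fi, fj), y in zip(pairs, labels):
        y_hat = 1 if stump(fi, T) == stump(fj, T) else -1
        hits += (y_hat == y)
    return hits / len(labels)

pairs = [(0.1, 0.2), (0.8, 0.9), (0.1, 0.9), (0.2, 0.8)]
labels = [1, 1, -1, -1]
best_T = max([0.05, 0.5, 0.95], key=lambda T: agreement(pairs, labels, T))
print(best_T)  # the cut that separates dissimilar pairs, keeps similar ones together
```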
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query: roughly,
θ0 = argmin_g Σ_{x_i ∈ N(x)} d_θ(g(x_i), θ_i) · K(d_x(x_i, x))
with the kernel K assigning each neighbor a weight that decays with its feature-space distance to the query.
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints.
• 150,000 images.
• Nuisance parameters added: clothing, illumination, face expression.
• 1,775,000 example pairs.
• Selected 137 out of 5,123 meaningful features (how?).
• 18-bit hash functions (k), 150 hash tables (L).
• Test on 1,000 synthetic examples; PSH searched only 34 of the data per query.
• Without selection, 40 bits and 1,000 hash tables would be needed.
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, and B the maximum number of points in a bucket.
Results – real data
• 800 images.
• Processed by a segmentation algorithm.
• 13 of the data were searched.
Interesting mismatches occur.
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure.
• Moving from one representation space to another.
• Training a sensitive hash function.
• KNN smart averaging.
Food for Thought
• The basic assumption may be problematic (distance metric, representations).
• The training set should be dense.
• Texture and clutter.
• In general, some features are more important than others and should be weighted.
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn.
• Goal: given a query q, preprocess the points in P to find a point p_i whose sphere 'covers' the query q, i.e. ‖q − p_i‖ ≤ r_i.
Courtesy of Mohamad Hegaze.
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni and P. Meer)
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space).
• Statistical curse of dimensionality: sparseness of the data.
• Computational curse of dimensionality: expensive range queries.
• LSH parameters should be adjusted for optimal performance.
Outline
• Mean-shift in a nutshell + examples.
Our scope:
• Mean-shift in high dimensions – using LSH.
• Speedups: 1. finding optimal LSH parameters; 2. data-driven partitions into buckets; 3. additional speedup by using the LSH data structure.
Mean-Shift in a Nutshell
Repeatedly shift each point to the (weighted) mean of the data points inside its bandwidth window, until it converges to a mode.
[Progress: Mean-shift → LSH optimal k,l → LSH data partition → LSH data struct]
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth. It is based on the kth nearest neighbor of the point: the bandwidth is the distance to that neighbor, h_i = ‖x_i − x_{i,k}‖.
Adaptive mean-shift vs. non-adaptive.
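A minimal flat-kernel mean-shift step for a single point (a 1-D toy sketch, ours, not the paper's adaptive version):

```python
# Flat-kernel mean shift in 1-D: move x to the mean of the data points
# within the bandwidth window, and repeat until the shift vanishes.
def mean_shift_point(x, data, bandwidth, tol=1e-6, max_iter=100):
    for _ in range(max_iter):
        window = [p for p in data if abs(p - x) <= bandwidth]
        if not window:
            return x  # no neighbors inside the window
        new_x = sum(window) / len(window)
        if abs(new_x - x) < tol:
            break
        x = new_x
    return x

data = [1.0, 1.1, 1.2, 5.0, 5.1]
print(mean_shift_point(0.9, data, bandwidth=1.0))  # converges to the left mode
```

In high dimensions, the expensive step is exactly the neighbor query inside the window, which is where LSH comes in.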
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color).
3. Apply filtering: each pixel takes the value of the nearest mode.
Original → filtered → segmented.
("Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02)
Mean-shift trajectories (figure: the iteration paths toward the modes).
Filtering examples: original squirrel → filtered; original baboon → filtered.
Segmentation examples.
("Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries – implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data – variable bandwidth.
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k).
• For each point x we check, over the K pairs, whether x_{d_k} ≤ v_k; the resulting K-bit vector indexes its cell.
• This partitions the data into cells.
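One possible reading of this structure in code (cut values drawn uniformly from the data range, as in the original LSH; the data-driven variant discussed later would instead copy a coordinate of a random data point):

```python
import random

# L random partitions; each holds K (coordinate, cut-value) pairs, and a
# point's cell in a partition is the K-bit string of tests x[d_k] <= v_k.
def make_partition(data, K, rng):
    dim = len(data[0])
    lo = [min(p[i] for p in data) for i in range(dim)]
    hi = [max(p[i] for p in data) for i in range(dim)]
    cuts = []
    for _ in range(K):
        i = rng.randrange(dim)
        cuts.append((i, rng.uniform(lo[i], hi[i])))  # cut within the data range
    return cuts

def cell_of(x, cuts):
    return "".join("1" if x[d] <= v else "0" for d, v in cuts)

rng = random.Random(0)
data = [(1.0, 2.0), (1.1, 2.1), (8.0, 9.0), (8.2, 9.1)]
partitions = [make_partition(data, K=4, rng=rng) for _ in range(3)]  # L = 3
cells = [cell_of((1.05, 2.05), cuts) for cuts in partitions]
print(cells)
```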
Choosing the optimal K and L
• For a query q, we want to compute the smallest number of distances to the points in its buckets.
• Large K → a smaller number of points in a cell.
• If L is too small, neighbor points might be missed; but if L is too big, the union of cells C∪ might include extra points.
• As L increases, C∪ increases but the chance of missing a neighbor decreases; K determines the resolution of the data structure.
Choosing optimal K and L (procedure)
• Determine accurately the KNN, and hence the distance (bandwidth), for m randomly-selected data points.
• Choose an error threshold ε.
• The optimal K and L should satisfy that the approximate distance stays within the threshold of the true one.
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K); then minimize the running time t(K, L(K)).
(Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)]; the chosen pair sits at the minimum.)
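The tuning loop reads roughly as follows in code (`err` and `time_model` are stand-ins for measurements on the m sampled points, not the paper's formulas):

```python
# Tuning sketch: for each K take the minimal L meeting the error constraint,
# then keep the (K, L(K)) pair with the lowest measured running time.
def tune(Ks, Ls, lsh_error, lsh_time, eps=0.05):
    best = None
    for K in Ks:
        L_K = next((L for L in Ls if lsh_error(K, L) <= eps), None)
        if L_K is None:
            continue  # no L satisfies the constraint for this K
        t = lsh_time(K, L_K)
        if best is None or t < best[2]:
            best = (K, L_K, t)
    return best

# Stand-in models: error falls with K*L, time grows with L and falls with K.
err = lambda K, L: 1.0 / (L * max(K, 1))
time_model = lambda K, L: L * 1.0 + 100.0 / K
print(tune(Ks=[5, 10, 20], Ls=list(range(1, 50)), lsh_error=err, lsh_time=time_model))
```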
Data driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value; this adapts the bucket distribution to the data (uniform vs. data-driven points).
Additional speedup
Assume that all points in C∪ will converge to the same mode (C∪ is like a type of an aggregate).
Speedup results: 65,536 points; 1,638 points sampled; k = 100.
Food for thought: low dimension vs. high dimension.
A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold? Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30 – cookies…
Summary
• LSH suggests a compromise on accuracy for the gain of complexity.
• Applications that involve massive data in high dimension require the fast performance of LSH.
• Extensions of LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.
Conclusion
• … but in the end, everything depends on your data set.
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– E-mail Alex Andoni (andoni@mit.edu)
– Test over your own data (C code, under Red Hat Linux).
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
p ∈ H^d'  (a d'-bit string)
Gj(p) = p|Ij : the bits of p sampled at index set Ij,  j = 1..L,  k = 3 digits
Store p into bucket p|Ij, one of 2^k buckets.
[Example: sampled key 101 for strings such as 11000000000, 111111110000, 111000000000, 111111110001]
Construction: each point p is stored in the L tables 1, 2, ..., L.
Query: q is looked up in the same L tables, and the colliding points are checked.
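The construction and query steps above can be sketched as follows (illustrative Python, not the authors' code; the tiny example strings and function names are assumptions):

```python
import random
from collections import defaultdict

def build_tables(points, d, k, L, seed=0):
    """Bit-sampling LSH: L tables; table j hashes a d-bit string p by the
    k bits at index set Ij, i.e. Gj(p) = p|Ij."""
    rng = random.Random(seed)
    index_sets = [rng.sample(range(d), k) for _ in range(L)]
    tables = [defaultdict(list) for _ in range(L)]
    for p in points:
        for I, table in zip(index_sets, tables):
            table["".join(p[i] for i in I)].append(p)
    return index_sets, tables

def query(q, index_sets, tables):
    """Union of candidates colliding with q in any of the L tables."""
    cands = set()
    for I, table in zip(index_sets, tables):
        cands.update(table["".join(q[i] for i in I)])
    return cands

points = ["11000000000", "11111111000", "11100000000"]
I, T = build_tables(points, d=11, k=3, L=4)
print(query("11000000001", I, T))  # candidates close to the query in Hamming distance
```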
Alternative intuition: random projections
With the unary embedding (p = (8, 2), C = 11, d' = C·d), each sampled bit corresponds to an axis-parallel cut of the space, so sampling k bits partitions the data into 2^k cells.
[Figure: strings such as 11000000000, 111111110000, 111000000000, 111111110001 land in 2^3 = 8 buckets labeled 000, 100, 110, 001, 101, 111, ...; here p falls in bucket 101]
The k samplings are repeated L times.
Secondary hashing
The 2^k buckets (e.g., bucket 011, of size B) are mapped by a simple hash into M buckets, which supports tuning dataset size vs. storage volume: M·B = α·n, with α = 2.
The above hashing is locality-sensitive
• Probability(p, q in same bucket) = [1 − Distance(p, q)/dimensions]^k
[Plots: Pr vs. Distance(q, pi) for k = 1 and k = 2 - larger k makes the collision probability fall off faster with distance]
Adopted from Piotr Indyk's slides
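The collision probability above is easy to evaluate directly (a small sketch; d' denotes the embedded dimension):

```python
def collision_prob(dist: int, d_prime: int, k: int) -> float:
    """Pr[p and q share a bucket] = (1 - dist/d')**k for k sampled bits."""
    return (1 - dist / d_prime) ** k

# The gap between close pairs (dist=2) and far pairs (dist=50) widens with k:
for k in (1, 2, 4):
    print(k, collision_prob(2, 100, k), collision_prob(50, 100, k))
```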
Preview
• General solution – Locality Sensitive Hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Using a mathematical trick:
  p-stable distribution for Lp distance; Gaussian distribution for L2 distance
Central limit theorem
v1·X1 + v2·X2 + … + vn·Xn ~ Gaussian
(a weighted sum of Gaussians is a Gaussian)
where v1,…,vn are real numbers and X1,…,Xn are independent, identically distributed (i.i.d.) Gaussian variables.
Central limit theorem
Σi vi·Xi ~ ‖v‖2 · X      (dot product → norm)
Norm → Distance: for two feature vectors u, v
Σi ui·Xi − Σi vi·Xi = Σi (ui − vi)·Xi ~ ‖u − v‖2 · X
so a difference of dot products (dot-product distance) distributes like the L2 distance times a single Gaussian X.
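A quick numerical check of the 2-stability property above (a sketch using only the standard library; u, v are made-up vectors):

```python
import math
import random
import statistics

random.seed(0)
u = [1.0, 2.0, 3.0]
v = [1.0, 0.0, 7.0]
w = [a - b for a, b in zip(u, v)]          # u - v
true_norm = math.sqrt(sum(x * x for x in w))

# 2-stability: sum_i w_i * X_i ~ ||w||_2 * X for i.i.d. standard Gaussians X_i,
# so the sample std of the projections should approach ||u - v||_2.
samples = [sum(wi * random.gauss(0, 1) for wi in w) for _ in range(100_000)]
print(statistics.stdev(samples), true_norm)  # the two numbers nearly agree
```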
The full Hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
a : d random numbers, i.i.d. from a p-stable distribution
v : the features vector (d numbers)
b : a random phase, drawn from [0, w]
w : the discretization step
[Example: features vector v = [34, 82, 21, ...]; a·v = 7944; with b = +34 and w = 100 the value falls in the cell 7900..8000 on the axis ...7800, 7900, 8000, 8100, 8200...]
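A minimal sketch of one h_{a,b}, assuming the 2-stable (Gaussian) case for L2; the names are illustrative:

```python
import math
import random

def make_pstable_hash(d, w, seed=0):
    """One h_{a,b}(v) = floor((a.v + b) / w): a holds d i.i.d. N(0,1) draws
    (2-stable, for L2), b is a random phase in [0, w), w is the cell width."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(d)]
    b = rng.uniform(0, w)
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h

h = make_pstable_hash(d=3, w=4.0)
print(h([1.0, 2.0, 3.0]), h([1.1, 2.0, 3.0]))  # nearby points usually share a cell
```

In a full index, k such functions are concatenated into one key and the scheme is repeated over L tables, exactly as in the Hamming case.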
Generalization: P-Stable distribution
• Lp, 0 < p ≤ 2 : Generalized Central Limit Theorem → p-stable distribution (Cauchy for L1)
• L2 : Central Limit Theorem → Gaussian (normal) distribution
P-Stable summary
• Works for r-Nearest Neighbor queries
• Generalizes to 0 < p ≤ 2
• Improves query time: from O(d·n^(1/(1+ε))·log n) to O(d·n^(1/(1+ε)²)·log n)
  (latest results, reported by e-mail by Alexander Andoni)
Parameters selection
• 90% probability ⇒ best query time performance
For Euclidean space:
• A single projection hits an r-Nearest Neighbor with Pr = p1
• k projections hit an r-Nearest Neighbor with Pr = p1^k
• L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure collision (e.g., 1 − δ ≥ 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ   ⇒   L ≥ log(δ) / log(1 − p1^k)
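The bound above translates directly into code (a sketch; the example values of p1, k, δ are made up):

```python
import math

def tables_needed(p1: float, k: int, delta: float) -> int:
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta,
    i.e. L = ceil(log(delta) / log(1 - p1**k))."""
    return math.ceil(math.log(delta) / math.log(1 - p1 ** k))

print(tables_needed(p1=0.9, k=10, delta=0.1))  # tables needed for >=90% collision
```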
[Figure: an ideal hash accepts neighbors and rejects non-neighbors]
…Parameters selection
[Plot: query time vs. k - candidates-extraction time decreases with k while candidates-verification time increases, giving an optimal k]
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data size (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimension)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
• ...but at the end, everything depends on your data set
• Try it at home:
  - Visit http://web.mit.edu/andoni/www/LSH/index.html
  - E-mail Alex Andoni (andoni@mit.edu)
  - Test over your own data
  (C code, under Red Hat Linux)
LSH - Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression - vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash function construction and parameter tuning
Outline
• "Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola, T. Darrell
  - Finding sensitive hash functions
• "Mean Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni, P. Meer
  - Tuning LSH parameters
  - The LSH data structure is used for algorithm speedups
The Problem
"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola, T. Darrell
Given an image x, what are the parameters θ in this image?
i.e., angles of joints, orientation of the body, etc.
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor - edge detector
• Distance metric in feature space: dx
• Distance metric in angles space:
  dθ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find KNN in the database of examples → output: average angles of the KNN
The algorithm flow:
Input Query → Features extraction → Processed query → PSH (LSH) against the database of examples → LWR (Regression) → Output: Match
The image features
Image features are multi-scale edge histograms.
[Figure: edge-direction histograms computed over image regions A, B at several scales]
Pipeline: Feature Extraction → PSH → LWR
PSH: The basic assumption
There are two metric spaces here: the feature space (dx) and the parameter space (dθ).
We want similarity to be measured in the angles space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: corresponding neighborhoods of the query q in the parameters space (angles) and in the feature space]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to dθ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
1. Label pairs of examples with similar angles
2. Define hash functions h on the feature space
3. Predict the labeling of similar/non-similar examples by using h
4. Compare the labelings
5. If the labeling by h is good, accept h; else change h
PSH as a classification problem
A pair of examples (xi, θi), (xj, θj) is labeled
  yij = +1 if dθ(θi, θj) ≤ r
  yij = −1 if dθ(θi, θj) ≥ (1 + ε)·r
(e.g., r = 0.25)
[Figure: example pairs labeled +1, +1, −1, −1]
A binary hash function on the features:
  h_T(x) = +1 if feature(x) ≥ T, −1 otherwise
Predict the labels:
  ŷ_h(xi, xj) = +1 if h_T(xi) = h_T(xj), −1 otherwise
Find the best threshold T that predicts the true labeling, with probability constraints:
h_T will place both examples of a pair in the same bin, or separate them.
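A toy sketch of the selection criterion just described (the 1-D features, candidate thresholds, and names are hypothetical, not from the paper):

```python
def pair_accuracy(features, labels, T):
    """Fraction of labeled pairs ((fi, fj), y) that the threshold hash
    h_T(x) = +1 if x >= T else -1 classifies correctly: the predicted
    pair label is +1 iff both sides hash alike."""
    correct = 0
    for (fi, fj), y in zip(features, labels):
        same_bin = (fi >= T) == (fj >= T)
        correct += (1 if same_bin else -1) == y
    return correct / len(labels)

# Hypothetical pairs: similar poses (+1) have close features here.
pairs = [(0.1, 0.2), (0.8, 0.9), (0.1, 0.9), (0.2, 0.8)]
labels = [+1, +1, -1, -1]
best_T = max([0.0, 0.5, 1.0], key=lambda T: pair_accuracy(pairs, labels, T))
print(best_T, pair_accuracy(pairs, labels, best_T))  # -> 0.5 1.0
```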
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query, where each neighbor xi ∈ N(x0) is weighted by a kernel of its feature distance (dist → weight):
  θ0 = argmin Σ_{xi ∈ N(x0)} K(dx(x0, xi)) · d(g(xi), θi)
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, B the max number of points in a bucket.
Results - real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
[Figures: real-data results, including interesting mismatches]
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in Rd, centered at P = p1,…,pn, with radii r1,…,rn
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere 'covers' the query q
Courtesy of Mohamad Hegaze
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer
Motivation
• Clustering high dimensional data by using local density measurements (e.g., in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions - using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: a window of a given bandwidth centered on a point; the mean-shift vector moves it toward the local density mode]
(Slide track: Mean-shift | LSH | optimal k,l | LSH data partition | LSH data struct)
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region:
high density - small bandwidth; low density - large bandwidth.
The bandwidth is based on the kth nearest neighbor of the point.
Adaptive mean-shift vs. non-adaptive
[Figures: the adaptive bandwidth tracks the local density]
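The kth-nearest-neighbor bandwidth rule above can be sketched in a few lines (1-D points for brevity; a sketch, not the paper's code):

```python
def knn_bandwidth(points, x, k):
    """Adaptive bandwidth: the distance from x to its k-th nearest neighbor,
    so dense regions get small bandwidths and sparse regions large ones."""
    dists = sorted(abs(p - x) for p in points if p != x)
    return dists[k - 1]

pts = [0.0, 0.1, 0.2, 0.3, 5.0, 7.0]
print(knn_bandwidth(pts, 0.1, k=2))  # dense region -> small bandwidth
print(knn_bandwidth(pts, 5.0, k=2))  # sparse region -> large bandwidth
```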
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths: hs (spatial), hr (color)
3. Apply filtering
[Figures: original → filtered → segmented; 3D view of mean-shift trajectories. Filtering: each pixel takes the value of the nearest mode]
Filtering examples: original squirrel / filtered; original baboon / filtered
Segmentation examples
"Mean-shift: A Robust Approach Towards Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled with variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (dk, vk)
• For each point x, check whether x_{dk} ≤ vk; the K boolean results form the hash key
• This partitions the data into cells
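A minimal sketch of one such partition (illustrative; drawing cut values uniformly from [0, 1] is an assumption that fits this toy data, not the paper's procedure):

```python
import random

def make_partition(data, K, seed=0):
    """One random partition: K (coordinate, cut-value) pairs; a point's cell
    is the K-bit key of 'is x[d_k] <= v_k' tests."""
    rng = random.Random(seed)
    dims = len(data[0])
    cuts = [(rng.randrange(dims), rng.uniform(0, 1)) for _ in range(K)]
    def cell(x):
        return tuple(x[d] <= v for d, v in cuts)
    return cell

data = [(0.1, 0.9), (0.15, 0.85), (0.9, 0.1)]
cell = make_partition(data, K=4)
print(cell(data[0]) == cell(data[1]), cell(data[0]) == cell(data[2]))
```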
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• If L is too small, points might be missed; but if L is too big, the union ∪C might include extra points
• Large K ⇒ a smaller number of points in a cell C
• As L increases, ∪C increases but each C decreases; K determines the resolution of the data structure
[The original gives formulas for the expected number of points in C and in ∪C as functions of n, K, L, d]
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points; their kth-NN distance is the bandwidth
• Choose an error threshold ε
• The optimal K and L should satisfy the constraint on the approximate distance
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] with its minimum]
Data driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: bucket distribution - uniform cuts vs. data-driven cuts]
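The two cut-value strategies above side by side (a sketch; 1-D toy data, illustrative names):

```python
import random

def uniform_cut(data, dim, rng):
    """Original LSH: cut value drawn uniformly over the data range in `dim`."""
    lo = min(x[dim] for x in data)
    hi = max(x[dim] for x in data)
    return rng.uniform(lo, hi)

def data_driven_cut(data, dim, rng):
    """Suggested variant: take the coordinate of a randomly chosen data point,
    so cuts concentrate where the data does."""
    return rng.choice(data)[dim]

rng = random.Random(0)
data = [(0.01,), (0.02,), (0.03,), (9.0,)]  # clustered data with one outlier
print(uniform_cut(data, 0, rng), data_driven_cut(data, 0, rng))
```

On clustered data like this, uniform cuts usually land in the empty gap, while data-driven cuts split the dense cluster, balancing the bucket sizes.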
Additional speedup
Assume that all points in C will converge to the same mode (C is like a type of an aggregate).
Speedup results: 65,536 points, 1,638 points sampled, k = 100.
Food for thought
[Figure: low dimension vs. high dimension behavior]
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 cookies…
Summary
• LSH suggests a compromise on accuracy for the gain of complexity
• Applications that involve massive data in high dimension require the LSH fast performance
• Extensions of the LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• ...but at the end, everything depends on your data set
• Try it at home:
  - Visit http://web.mit.edu/andoni/www/LSH/index.html
  - E-mail Alex Andoni (andoni@mit.edu)
  - Test over your own data
  (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive:
• Probability(p, q in same bucket) = (1 − Distance(p, q)/dimensions)^k
(Figure: collision probability Pr vs. Distance(q, pi) for k = 1 and k = 2 — a larger k sharpens the fall-off. Adopted from Piotr Indyk's slides.)
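A quick empirical check of this collision probability, assuming the k bit positions are sampled independently with replacement (a simplifying assumption; the helper is hypothetical):

```python
import random

def collision_prob(dist, d, k, trials=20000, seed=1):
    """Fraction of trials in which none of k sampled bits hits a differing position."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        I = [rng.randrange(d) for _ in range(k)]  # k sampled bit positions
        # p and q are assumed to differ exactly in the first `dist` positions
        hits += all(i >= dist for i in I)
    return hits / trials

# Theory: (1 - 3/12)**2 = 0.5625
print(collision_prob(dist=3, d=12, k=2))
```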
Preview
• General solution – locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Using a mathematical trick: a p-stable distribution for Lp distance — the Gaussian distribution for L2 distance
Central limit theorem
For real numbers v1,…,vn and independent, identically distributed (i.i.d.) Gaussian X1,…,Xn:
v1·X1 + v2·X2 + … + vn·Xn is again Gaussian — a weighted sum of Gaussians is a (scaled) Gaussian.
Central limit theorem
Σi vi·Xi ≈ ||v||2 · X, with X ~ N(0,1) — the dot product with a Gaussian vector distributes like the L2 norm.
For two feature vectors u and v:
Σi ui·Xi − Σi vi·Xi = Σi (ui − vi)·Xi ≈ ||u − v||2 · X
— the difference of dot products distributes like the L2 distance.
The full Hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v – the features vector (e.g. [34 82 21 …], of dimension d)
• a – d random numbers, i.i.d. from a p-stable distribution
• b – a random phase in [0, w]
• w – the discretization step
Example: a·v = 7944, b = 34, w = 100 → h = ⌊7978/100⌋ = 79, i.e. the bin [7900, 8000).
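A sketch of h_{a,b}(v) with Gaussian (2-stable) coefficients; the helper names and parameter values are illustrative:

```python
import math
import random

def make_hash(d, w, seed=0):
    """Return one p-stable hash h(v) = floor((a.v + b) / w)."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(d)]  # i.i.d. N(0,1): 2-stable, for L2
    b = rng.uniform(0, w)                    # random phase in [0, w]
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h

h = make_hash(d=3, w=100.0)
print(h([34, 82, 21]))  # nearby vectors tend to land in the same bin
```

In a full index, k such functions are concatenated per table and L tables are kept, exactly as in the Hamming-space scheme.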
Generalization: P-Stable distribution
• Lp, 0 < p ≤ 2: generalized central limit theorem → a p-stable distribution (Cauchy for L1)
• L2: central limit theorem → the Gaussian (normal) distribution
P-Stable summary
• Generalizes to 0 < p ≤ 2
• Improves query time for the r-nearest-neighbor problem: O(d·n^{1/(1+ε)}·log n) → O(d·n^{1/(1+ε)²}·log n)
(Latest results, reported in e-mail by Alexander Andoni.)
Parameters selection (for Euclidean space)
• 90% probability ⇒ best query time performance
• A single projection hits an ε-nearest neighbor with Pr = p1
• k projections hit an ε-nearest neighbor with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure a collision (e.g. 1 − δ ≥ 90%): 1 − (1 − p1^k)^L ≥ 1 − δ ⇒ L ≥ log(δ) / log(1 − p1^k)
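The bound above gives the number of tables L directly; a one-line sketch with illustrative values:

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta."""
    return math.ceil(math.log(delta) / math.log(1 - p1 ** k))

print(tables_needed(p1=0.9, k=10, delta=0.1))  # 6
```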
…Parameters selection
(Figure: query time vs. k — the candidates-extraction and candidates-verification costs trade off as k grows: a larger k rejects more non-neighbors, a smaller k accepts more neighbors; choose k at the minimum of the total time.)
Pros & Cons (from Piotr Indyk's slides)
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– E-mail Alex Andoni (andoni@mit.edu)
– Test over your own data (C code, under Red Hat Linux)
LSH – Applications
• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short, whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash function construction and parameter tuning
Outline
"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola, and T. Darrell
• Finding sensitive hash functions
"Mean Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni, and P. Meer
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem
("Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola, and T. Darrell)
Given an image x, what are the parameters θi in this image? I.e. the angles of joints, the orientation of the body, etc.
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: dx
• Distance metric in angle space: dθ(θ1, θ2) = Σ_{i=1}^{m} (1 − cos(θ1,i − θ2,i))
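The angle-space metric can be sketched directly (the function name is illustrative; the formula follows the reconstruction above):

```python
import math

def d_theta(t1, t2):
    """Angle-space distance: sum_i (1 - cos(theta1_i - theta2_i))."""
    return sum(1 - math.cos(a - b) for a, b in zip(t1, t2))

print(d_theta([0.0, math.pi / 2], [0.0, math.pi / 2]))  # 0.0 for identical poses
```

Each term lies in [0, 2], so joints that differ by π contribute the maximum penalty.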
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find the KNN in the database of examples → output: average angles of the KNN
The algorithm flow
Input query → features extraction → processed query → PSH (LSH) over the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms, computed over image sub-windows (figure: regions A and B with their edge-direction histograms).
Pipeline: Feature Extraction → PSH → LWR
PSH: The basic assumption
There are two metric spaces here: the feature space (dx) and the parameter space (dθ). We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
(Figure: corresponding neighborhoods of a query q in the feature space and in the parameters (angles) space.) Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to dθ. The hash functions are applied in feature space, but the KNN are valid in angle space.
1. Label pairs of examples with similar angles
2. Define hash functions h on the feature space
3. Predict the labeling of similar/non-similar examples by using h
4. Compare the labelings
5. If the labeling by h is good, accept h; else change h
PSH as a classification problem
Labels (with r = 0.25): a pair of examples (xi, xj) is labeled
y_ij = +1 if dθ(θi, θj) ≤ r
y_ij = −1 if dθ(θi, θj) ≥ (1 + ε)·r
(Example: pairs labeled +1, +1, −1, −1.)
A binary hash function on the features:
h_T(x) = +1 if the feature value of x ≥ T, −1 otherwise
Predict the labels:
ŷ_ij(h) = +1 if h_T(xi) = h_T(xj), −1 otherwise
Find the best threshold T that predicts the true labeling subject to the probability constraints: h_T will place both examples of a pair in the same bin, or separate them.
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query, schematically:
θ0 = argmin_β Σ_{xi ∈ N(x)} dθ(g(xi; β), θi) · K(dx(xi, x))
where the kernel K (weight = f(distance)) down-weights distant neighbors.
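As a rough illustration of the idea (not the paper's exact estimator), the zeroth-order case of LWR reduces to a kernel-weighted average of the neighbors' known angles; all names here are hypothetical:

```python
import math

def weighted_angles(neighbors, query_feat, feat_dist, bandwidth=1.0):
    """neighbors: list of (features, angles). Weight each neighbor's angles
    by a Gaussian kernel of its feature distance, then average per angle."""
    w = [math.exp(-(feat_dist(f, query_feat) / bandwidth) ** 2)
         for f, _ in neighbors]
    total = sum(w)
    m = len(neighbors[0][1])
    return [sum(wi * ang[j] for wi, (_, ang) in zip(w, neighbors)) / total
            for j in range(m)]
```

With equally distant neighbors this degenerates to the plain KNN average mentioned earlier.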
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (L)
• Tested on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables were needed
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, B the max number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
(Including some interesting mismatches.)
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d centered at P = {p1,…,pn} with radii r1,…,rn
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere 'covers' the query q
(Courtesy of Mohamad Hegaze.)
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
1. Finding optimal LSH parameters
2. Data-driven partitions into buckets
3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
(Figure: a point shifted within its bandwidth window toward the local mean.)
(Roadmap: mean-shift → LSH → optimal k, l → data-driven partition → LSH data structure.)
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth. The bandwidth of each point is based on its kth nearest neighbor (taken as the distance to it).
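A brute-force sketch of this adaptive bandwidth rule (distance to the kth nearest neighbor; helper names are illustrative):

```python
def knn_bandwidth(points, k, dist):
    """Per-point bandwidth = distance to the k-th nearest neighbor, so
    dense regions get small bandwidths and sparse regions large ones."""
    hs = []
    for x in points:
        ds = sorted(dist(x, y) for y in points if y is not x)
        hs.append(ds[k - 1])
    return hs

pts = [0.0, 0.1, 0.2, 5.0]
print(knn_bandwidth(pts, k=2, dist=lambda a, b: abs(a - b)))
```

The isolated point 5.0 receives a far larger bandwidth than the clustered ones.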
Adaptive mean-shift vs. non-adaptive (figure comparison).
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering: each pixel takes the value of its nearest mode
(Figures: original, filtered, and segmented images; mean-shift trajectories. From "Mean-Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02.)
Filtering and segmentation examples (figures: original vs. filtered squirrel and baboon; from Comaniciu et al., TPAMI '02).
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries — implemented with LSH
• Statistical curse of dimensionality: sparseness of the data — variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point, check whether x_{d_k} ≤ v_k for each of the K pairs; the resulting K bits partition the data into cells
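A sketch of such a partition, assuming the K tests have the form x[d_k] ≤ v_k (names and the uniform cut range are illustrative):

```python
import random
from collections import defaultdict

def make_partition(dim, K, lo=0.0, hi=1.0, seed=0):
    """One partition = K pairs (d_k, v_k): a coordinate index and a cut value."""
    rng = random.Random(seed)
    return [(rng.randrange(dim), rng.uniform(lo, hi)) for _ in range(K)]

def cell(x, partition):
    """A point's cell is the K-bit string of the tests x[d_k] <= v_k."""
    return "".join("1" if x[d] <= v else "0" for d, v in partition)

part = make_partition(dim=2, K=4)
buckets = defaultdict(list)
for x in [(0.1, 0.2), (0.15, 0.25), (0.9, 0.8)]:
    buckets[cell(x, part)].append(x)
```

Repeating with L independent partitions and taking the union of a query's cells yields its candidate neighbors, as in the Hamming-space scheme.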
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets.
• If L is too small, points might be missed; but if L is too big, the union cell C̃ might include extra points.
• A large K → a smaller number of points in a cell.
• As L increases, C̃ increases but the per-cell resolution decreases; together K and L determine the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points → their distances give the bandwidth.
• Choose an error threshold ε.
• The optimal K and L should satisfy: the approximate distance stays within the threshold of the true distance.
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K); then minimize the running time t(K, L(K)).
(Figures: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] with its minimum.)
Data driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
(Figure: bucket distribution — uniform cuts vs. data-driven points.)
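The suggested data-driven cut can be sketched as follows (the helper is hypothetical):

```python
import random

def data_driven_cut(points, seed=0):
    """Pick a random data point and use one of its coordinates as the cut,
    so cuts concentrate where the data actually lives."""
    rng = random.Random(seed)
    p = rng.choice(points)          # random data point
    dim = rng.randrange(len(p))     # random coordinate index
    return dim, p[dim]              # test: x[dim] <= value ?

dim, val = data_driven_cut([(1.0, 9.0), (2.0, 8.0), (3.0, 7.0)])
```

This tends to even out the bucket occupancy compared with cuts drawn uniformly over the data range.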
Additional speedup
Assume that all points in C̃ will converge to the same mode (C̃ acts as a type of aggregate).
Speedup results: 65,536 points; 1,638 points sampled; k = 100.
Food for thought (low dimension vs. high dimension)
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• A thought for food: does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30 — cookies…
Summary
• LSH suggests a compromise on accuracy for the gain of complexity
• Applications that involve massive data in high dimensions require the fast LSH performance
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– E-mail Alex Andoni (andoni@mit.edu)
– Test over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization: p-stable distributions
• Lp, 0 < p ≤ 2: by the Generalized Central Limit Theorem, use a p-stable distribution (e.g. the Cauchy distribution for L1)
• L2: by the Central Limit Theorem, use the Gaussian (normal) distribution
P-stable summary
• Generalizes to 0 < p ≤ 2
• Improves query time: from O(d·n^(1/(1+ε))·log n) to O(d·n^(1/(1+ε)²)·log n)
(r, ε)-nearest neighbor: latest results, reported by e-mail by Alexander Andoni.
Parameters selection

For Euclidean space, choose the parameters for 90% success probability and the best query-time performance:

• A single projection hits an ε-nearest neighbor with Pr = p1.
• k projections hit an ε-nearest neighbor with Pr = p1^k.
• All L hashings fail to collide with Pr = (1 − p1^k)^L.
• To ensure a collision with probability at least 1 − δ (e.g. 1 − δ ≥ 90%), require 1 − (1 − p1^k)^L ≥ 1 − δ, i.e.

L ≥ log(δ) / log(1 − p1^k)
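The bound above yields the number of tables directly; a small sketch:

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta,
    i.e. L >= log(delta) / log(1 - p1**k)."""
    return math.ceil(math.log(delta) / math.log(1 - p1 ** k))

# e.g. p1 = 0.9 per projection, k = 18 bits, 90% success (delta = 0.1)
L = tables_needed(0.9, 18, 0.1)
print(L)
```

Raising k makes each table more selective (fewer candidates) but forces a larger L to keep the same success probability.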
…Parameters selection

[Plot: candidate-extraction and candidate-verification time as a function of k. A larger k rejects more non-neighbors but accepts fewer true neighbors; the best k balances the two.]
Pros & Cons

Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time

Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimension)
• Requires the radius r to be fixed in advance

From Piotr Indyk's slides
Conclusion
• But at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – E-mail Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)
LSH – Applications
• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short, whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation.
• KNN search is a computational bottleneck.
• LSH provides a fast approximate solution to the problem.
• LSH requires hash-function construction and parameter tuning.
Outline

Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola and T. Darrell:
• Finding sensitive hash functions

Mean Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni and P. Meer:
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem

Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola and T. Darrell

Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – an edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space:

d_θ(θ1, θ2) = Σ_{i=1}^{m} (1 − cos(θ1,i − θ2,i))
Example-based learning
• Construct a database of example images with their known angles.
• Given a query image, run your favorite feature extractor.
• Compute the KNN from the database.
• Use these KNNs to compute the average angles of the query.

Input: query → find the KNN in the database of examples → output: average angles of the KNN.
The algorithm flow

Input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match.
The image features

[Figure: image features are multi-scale edge histograms computed over image sub-windows A, B, …]

(Feature extraction → PSH → LWR)
PSH: the basic assumption

There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ). We want similarity to be measured in the angle space, whereas LSH works in the feature space.

• Assumption: the feature space is closely related to the parameter space.
Insight: manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.

[Figure: a query q and its neighbors occupy corresponding manifolds in the parameter (angle) space and in the feature space. Is this magic?]
Parameter Sensitive Hashing (PSH)

The trick: estimate the performance of different hash functions on examples, and select those that are sensitive to d_θ. The hash functions are applied in the feature space, but the KNN are valid in the angle space.

Training procedure:
1. Label pairs of examples with similar angles.
2. Define hash functions h on the feature space.
3. Predict the labeling of similar/non-similar examples using h.
4. Compare the labelings: if the labeling by h is good, accept h; otherwise change h.
PSH as a classification problem

A pair of examples (xi, xj) is labeled (with, e.g., r = 0.25):

yij = +1  if d_θ(θi, θj) ≤ r
yij = −1  if d_θ(θi, θj) ≥ (1 + ε)·r

A binary hash function on the features:

h_T(x) = +1 if x ≥ T, −1 otherwise

Predict the label of a pair by

ŷij = +1 if h_T(xi) = h_T(xj), −1 otherwise

that is, h_T either places both examples in the same bin or separates them. Find the best T that predicts the true labeling, subject to the probability constraints.
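One way to score a candidate threshold T on labeled pairs (a toy sketch; the paper's actual selection uses probability constraints rather than raw accuracy):

```python
def pair_accuracy(T, pairs):
    """Fraction of labeled pairs ((x_i, x_j), y_ij) that the threshold hash
    h_T(x) = +1 if x >= T else -1 predicts correctly: the predicted label is
    +1 iff both sides fall on the same side of T."""
    correct = 0
    for (xi, xj), y in pairs:
        hi = 1 if xi >= T else -1
        hj = 1 if xj >= T else -1
        pred = 1 if hi == hj else -1
        correct += (pred == y)
    return correct / len(pairs)

# similar pairs (+1) have close feature values; dissimilar pairs (-1) straddle
pairs = [((0.1, 0.2), 1), ((0.8, 0.9), 1), ((0.1, 0.9), -1), ((0.2, 0.8), -1)]
print(pair_accuracy(0.5, pairs), pair_accuracy(0.0, pairs))
```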
Local Weighted Regression (LWR)
• Given a query image x, PSH returns its KNNs.
• LWR uses the KNNs to compute a weighted average of the estimated angles of the query:

θ0 = argmin_θ Σ_{xi ∈ N(x)} d_θ(g(xi), θi) · K(d_x(xi, x))

where N(x) are the neighbors and the kernel K weights each neighbor by its feature-space distance to the query.
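A zeroth-order sketch of this idea: simple kernel-weighted averaging of the neighbors' angles, ignoring angle wrap-around for simplicity (names are illustrative):

```python
import math

def lwr_angle(query_feat, neighbors, bandwidth):
    """Kernel-weighted average of neighbor angles.
    neighbors: list of (feature_vector, angle); the weight of each neighbor
    decays with its feature-space distance to the query (Gaussian kernel)."""
    num = den = 0.0
    for feat, angle in neighbors:
        d = math.dist(query_feat, feat)
        w = math.exp(-(d / bandwidth) ** 2)   # K(d_x)
        num += w * angle
        den += w
    return num / den

neighbors = [([0.0, 0.0], 10.0), ([1.0, 0.0], 20.0), ([5.0, 5.0], 90.0)]
print(lwr_angle([0.1, 0.0], neighbors, bandwidth=1.0))
```

The far neighbor (angle 90) gets almost no weight, so the estimate stays close to the nearby examples.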
Results

Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (L)
• Test on 1,000 synthetic examples: PSH searched only 3.4% of the data per query
• Without feature selection, 40 bits and 1,000 hash tables would have been needed

Recall: p1 is the probability of a positive hash, p2 is the probability of a bad hash, B is the maximum number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched

Interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging

Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in Rd, centered at P = {p1, …, pn}, with radii r1, …, rn.
• Goal: given a query q, preprocess the points in P so as to find a point pi whose sphere 'covers' the query q, i.e. ||q − pi|| ≤ ri.

Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space).
• Statistical curse of dimensionality: sparseness of the data.
• Computational curse of dimensionality: expensive range queries.
• The LSH parameters should be adjusted for optimal performance.

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni and P. Meer
Outline
• Mean-shift in a nutshell + examples

Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell

[Figure: a window of a given bandwidth around a point is repeatedly shifted toward the mean of the points inside it.]

(Roadmap: mean-shift → LSH → optimal k, l → LSH data partition → LSH data structure)
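The core iteration, sketched in 1-D with a flat kernel:

```python
def mean_shift_1d(points, start, bandwidth, iters=50):
    """Move a point to the mean of its neighbors inside the bandwidth window,
    repeatedly, so it converges on a density mode (flat kernel, 1-D)."""
    x = start
    for _ in range(iters):
        window = [p for p in points if abs(p - x) <= bandwidth]
        x = sum(window) / len(window)
    return x

pts = [1.0, 1.1, 1.2, 5.0, 5.1]
print(mean_shift_1d(pts, start=1.4, bandwidth=0.5))  # converges near 1.1
```

Each query of "points within the bandwidth" is a range query, which is exactly the step LSH accelerates in high dimensions.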
KNN in mean-shift

The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth. It is based on the kth nearest neighbor of the point: the bandwidth is the distance from the point to its kth nearest neighbor.
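A 1-D sketch of this kth-nearest-neighbor bandwidth rule:

```python
def adaptive_bandwidths(points, k):
    """Per-point bandwidth = distance to the k-th nearest neighbor, so dense
    regions get small windows and sparse regions get large ones (1-D sketch)."""
    out = []
    for i, x in enumerate(points):
        dists = sorted(abs(x - y) for j, y in enumerate(points) if j != i)
        out.append(dists[k - 1])
    return out

pts = [0.0, 0.1, 0.2, 5.0]        # three dense points and one outlier
print(adaptive_bandwidths(pts, k=2))
```

The outlier at 5.0 gets a bandwidth an order of magnitude larger than the dense points, as the slide prescribes.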
Adaptive mean-shift vs. non-adaptive

[Figure: clustering results with adaptive vs. non-adaptive bandwidth.]
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).
2. Resolution is controlled by the bandwidths h_s (spatial) and h_r (color).
3. Apply filtering.

"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Image segmentation algorithm

[Figure: original, filtered, and segmented images, with mean-shift trajectories. Filtering: each pixel takes the value of its nearest mode.]
Filtering examples

[Figures: original vs. filtered squirrel and baboon images.]

Segmentation examples

"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data, handled with a variable bandwidth.
LSH-based data structure
• Choose L random partitions. Each partition includes K pairs (d_k, v_k).
• For each point x, check whether x_{d_k} ≤ v_k for each of the K pairs.
• This partitions the data into cells.
Choosing the optimal K and L

For a query q, compute distances only to the points in its buckets; the goal is the smallest number of distance computations.

• Large K: a smaller number of points per cell.
• If L is too small, neighbor points might be missed; if L is too big, extra points might be included.
• As L increases, the union cell C∪ increases but the intersection cell C∩ decreases; C∩ determines the resolution of the data structure.
Choosing optimal K and L

Determine accurately the KNN (the bandwidth distance) for m randomly-selected data points, and choose an error threshold ε; the optimal K and L should keep the approximate distance within ε of the true one.
• For each K, estimate the error.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).

[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] with its minimum.]
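The selection loop above can be sketched with stand-in error/runtime probes (both hypothetical here; in the paper they are measured empirically on the m sampled queries):

```python
def tune(Ks, error, runtime, eps):
    """For each K, find the minimal L whose approximation error is within eps,
    then keep the (K, L) pair with the smallest measured running time.
    error(K, L) and runtime(K, L) are caller-supplied probes."""
    best = None
    for K in Ks:
        L = 1
        while error(K, L) > eps:   # error shrinks as L grows
            L += 1
        t = runtime(K, L)
        if best is None or t < best[2]:
            best = (K, L, t)
    return best

# toy stand-ins: error falls with L, faster for small K; time grows with K*L
err = lambda K, L: 1.0 / (L * (11 - K))
time_ = lambda K, L: K * L
print(tune(range(1, 11), err, time_, eps=0.05))
```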
Data-driven partitions
• In the original LSH, cut values are drawn at random from the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.

[Figure: bucket-occupancy distribution for uniform cuts vs. data-driven cuts.]
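A sketch of the suggested cut-value sampling (illustrative names):

```python
import random

def data_driven_cuts(points, K, seed=0):
    """K (dimension, cut-value) pairs, where each cut value is a coordinate of
    a randomly chosen data point, so the buckets follow the data distribution."""
    rng = random.Random(seed)
    d = len(points[0])
    cuts = []
    for _ in range(K):
        dim = rng.randrange(d)
        cuts.append((dim, rng.choice(points)[dim]))
    return cuts

def bucket(point, cuts):
    """One bit per cut: which side of the cut the point falls on."""
    return tuple(int(point[dim] >= v) for dim, v in cuts)

pts = [(0.0, 0.1), (0.2, 0.3), (10.0, 9.0)]
cuts = data_driven_cuts(pts, K=4)
print(bucket(pts[0], cuts), bucket(pts[2], cuts))
```

Because the cuts are drawn from actual coordinates, dense regions receive more cuts and the bucket sizes become more uniform than with cuts drawn uniformly from the data range.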
Additional speedup

Assume that all points in C∩ will converge to the same mode (C∩ is like a type of aggregate); the mean-shift procedure can then be run once per cell rather than once per point.

Speedup results

[Table: 65,536 points, 1,638 points sampled, k = 100.]
Food for thought

[Figure: low dimension vs. high dimension.]

A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.

15:30 cookies…
Summary
• LSH trades a compromise in accuracy for a gain in complexity.
• Applications that involve massive data in high dimension require LSH's fast performance.
• LSH extends to different spaces (PSH).
• The LSH parameters and hash functions can be learned for different applications.
Conclusion
• But at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – E-mail Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH – Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
• Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, T. Darrell): finding sensitive hash functions
• Mean Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, P. Meer): tuning LSH parameters; the LSH data structure is used for algorithm speedups
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, T. Darrell)
Given an image x, what are the parameters θ in this image, i.e., the angles of the joints, the orientation of the body, etc.?
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space:
  d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNN to compute the average angles of the query
Input: query → find KNN in the database of examples → output: average angles of the KNN
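A toy sketch of that last averaging step (the helper and the tiny database are ours; the real system uses PSH for the KNN step and locally weighted regression rather than a plain mean):

```python
import math

def knn_mean_angles(query_feat, database, dist, k=3):
    """database: list of (features, angles). Return the per-joint circular mean
    of the angles of the k examples nearest to query_feat in *feature* space."""
    nbrs = sorted(database, key=lambda ex: dist(query_feat, ex[0]))[:k]
    m = len(nbrs[0][1])
    means = []
    for j in range(m):  # circular mean per angle, robust to wrap-around at 2*pi
        s = sum(math.sin(ex[1][j]) for ex in nbrs)
        c = sum(math.cos(ex[1][j]) for ex in nbrs)
        means.append(math.atan2(s, c))
    return means

db = [((0.0,), [0.10]), ((1.0,), [0.20]), ((9.0,), [3.00])]
dist = lambda a, b: abs(a[0] - b[0])
print(knn_mean_angles((0.5,), db, dist, k=2))  # ≈ [0.15]
```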
The algorithm flow
Input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms.
[Figure: example images A and B with their edge maps and histograms.]
(Pipeline: Feature Extraction → PSH → LWR)
PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ). We want similarity to be measured in the angle space, whereas LSH works in the feature space.
• Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: the query q mapped between the parameter space (angles) and the feature space.]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
PSH as a classification problem
A pair of examples (x_i, x_j) is labeled
  y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
  y_ij = −1 if d_θ(θ_i, θ_j) ≥ (1 + ε)·r
(e.g., r = 0.25). Example labels: +1, +1, −1, −1.
A binary hash function on features:
  h_{φ,T}(x) = +1 if x_φ ≥ T, −1 otherwise.
Predict the labels:
  ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise.
Find the best threshold T that predicts the true labeling under the probability constraints: such a T will place both examples of a similar pair in the same bin, and separate the examples of a non-similar pair.
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNN
• LWR uses the KNN to compute a weighted average of the estimated angles of the query, with weights K(d_x(x, x_i)) that decay with the feature-space distance:
  θ̂(x) = argmin_β Σ_{x_i ∈ N(x)} K(d_x(x, x_i)) · ‖g(x_i; β) − θ_i‖²
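A zeroth-order sketch of that weighted averaging (our names; the paper fits a local model g rather than a constant):

```python
def lwr_estimate(neighbors, kernel):
    """neighbors: list of (feature_distance, angles). Kernel-weighted
    average of the neighbors' angle vectors (constant local model)."""
    wsum = sum(kernel(d) for d, _ in neighbors)
    m = len(neighbors[0][1])
    return [sum(kernel(d) * th[j] for d, th in neighbors) / wsum
            for j in range(m)]

nbrs = [(0.0, [1.0]), (1.0, [3.0])]
est = lwr_estimate(nbrs, kernel=lambda d: 1.0 / (1.0 + d))
print(est)  # the closer neighbor dominates, pulling the estimate below 2.0
```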
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (L)
• Tested on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
• Without feature selection, 40 bits and 1,000 hash tables were needed
Recall: p1 is the probability of a positive hash, p2 the probability of a bad hash, and B is the maximal number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
[Figures: real-data results, including some interesting mismatches.]
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN with smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter are challenging
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in Rd, centered at P = p1,…,pn, with radii r1,…,rn
• Goal: preprocess the points in P so that, given a query q, we can find a point pi whose sphere 'covers' the query q
(Courtesy of Mohamad Hegaze)
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, P. Meer)
Motivation:
• Clustering high-dimensional data by using local density measurements (e.g., in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Outline
• Mean-shift in a nutshell + examples
• Our scope: mean-shift in high dimensions, using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: a point with the bandwidth window around it; the mean-shift vector points toward the local mean.]
(Roadmap: mean-shift → optimal k, l → data-driven partitions → LSH data structure)

KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density → small bandwidth; low density → large bandwidth. It is based on the k-th nearest neighbor of the point: the bandwidth h_i is the distance from x_i to its k-th neighbor.
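A tiny 1-D sketch of that bandwidth rule (illustrative; real data would be high-dimensional and the neighbor search itself done with LSH):

```python
def adaptive_bandwidths(points, k):
    """Per-point bandwidth = distance to the k-th nearest neighbor:
    small in dense regions, large in sparse ones. O(n^2) brute force."""
    hs = []
    for x in points:
        dists = sorted(abs(x - y) for y in points if y is not x)
        hs.append(dists[k - 1])
    return hs

pts = [0.0, 0.1, 0.2, 5.0]
print(adaptive_bandwidths(pts, k=2))  # the isolated point 5.0 gets a large bandwidth
```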
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm:
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution is controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering: each pixel takes the value of its nearest mode
[Figures: original, filtered, and segmented images; mean-shift trajectories in the 3D feature space.]
(Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)
Filtering examples: original squirrel → filtered; original baboon → filtered.
Segmentation examples.
(Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled with variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition is defined by K pairs (d_k, v_k): a coordinate index and a cut value
• For each point x we check whether x_{d_k} ≤ v_k; the K boolean results determine the point's cell
• This partitions the data into cells
Choosing the optimal K and L
• For a query q, we want the smallest number of distance computations to the points in its buckets
• A large K means a smaller number of points in a cell
• If L is too small, neighbors might be missed; but if L is too big, extra points might be included
• As L increases, the union of cells C_∪ increases but the intersection C_∩ decreases; K determines the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN distance (bandwidth) for m randomly-selected data points
• Choose an error threshold ε on the approximate distance
• The optimal K and L should satisfy the error bound on the approximate distance
• For each K, estimate the error; in one run, over all L's, find the minimal L satisfying the constraint, L(K); then minimize the running time t(K, L(K))
[Figures: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] and its minimum.]
Data driven partitions
• In the original LSH, cut values are chosen uniformly at random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: bucket occupancy distribution, uniform cuts vs. data-driven cuts.]
Additional speedup
Assume that all the points in C_∪ converge to the same mode (C_∪ acts like a type of aggregate); then one mean-shift procedure can serve all of them.
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought: low dimension vs. high dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold? Intuitively, the dimensionality implies the number of hash functions needed; the catch is that efficient dimensionality learning itself requires KNN.
15:30 cookies…
Summary
• LSH compromises on accuracy for a gain in complexity
• Applications that involve massive data in high dimensions require LSH's fast performance
• LSH extends to different spaces (PSH)
• The LSH parameters and hash functions can be learned per application
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – E-mail Alex Andoni: andoni@mit.edu
  – Test it over your own data (C code, under Red Hat Linux)
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Hamming Space
• Hamming space = the 2^N binary strings of length N
• Hamming distance = the number of changed digits, a.k.a. signal distance (Richard Hamming)
• Hamming distance = SUM(X1 XOR X2)
Example: d(010100001111, 010010000011) = 4
L1 to Hamming Space Embedding
Embed each coordinate in unary: a coordinate x ∈ {0,…,C} becomes x ones followed by C − x zeros; concatenating the d coordinates gives a binary string of length d' = C·d.
Example (C = 11, d = 2): p = (8, 2) → 11111111000 11000000000.
Under this embedding, L1 distances become Hamming distances.
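A minimal sketch of the embedding (function names are ours):

```python
def unary(x, C):
    """Embed the integer coordinate x in {0..C} as x ones then C - x zeros."""
    return "1" * x + "0" * (C - x)

def embed(p, C):
    """Concatenate per-coordinate unary codes; L1 distance becomes Hamming."""
    return "".join(unary(x, C) for x in p)

def hamming(s, t):
    return sum(a != b for a, b in zip(s, t))

p, q, C = (8, 2), (5, 6), 11
print(embed(p, C))  # → 1111111100011000000000 (the slide's example)
l1 = sum(abs(a - b) for a, b in zip(p, q))
print(hamming(embed(p, C), embed(q, C)) == l1)  # → True
```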
Hash function
G_j(p) = p|_{I_j}: bits sampling from p ∈ H^{d'}. The j-th hash keeps only the bit positions in a random index set I_j (j = 1…L; e.g., k = 3 digits), and p is stored in bucket p|_{I_j} (2^k buckets; e.g., bucket 101).
Construction: insert each p into its bucket in each of the tables 1, 2, …, L.
Query: look up q's bucket in each of the tables 1, 2, …, L.
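A compact sketch of that construction and query (names are ours; points are bit strings):

```python
import random
from collections import defaultdict

def build_tables(points, d_prime, k, L, seed=0):
    """L tables; table j keys each point by k sampled bit positions I_j."""
    rng = random.Random(seed)
    tables = []
    for _ in range(L):
        I = sorted(rng.sample(range(d_prime), k))  # the sampled positions I_j
        buckets = defaultdict(list)
        for p in points:
            buckets["".join(p[i] for i in I)].append(p)
        tables.append((I, buckets))
    return tables

def query(tables, q):
    """Union of q's buckets over the L tables: candidates for exact checking."""
    cands = set()
    for I, buckets in tables:
        cands.update(buckets.get("".join(q[i] for i in I), []))
    return cands

tables = build_tables(["0000", "0001", "1111"], d_prime=4, k=2, L=3)
print(query(tables, "0000"))  # "0000" always collides with itself
```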
Alternative intuition: random projections
In the embedded space (d' = C·d; e.g., p = (8, 2) → 1111111100011000000000 for C = 11), each sampled bit acts as an axis-parallel cut, so k samplings project a point into one of 2^k buckets (for k = 3: buckets 000, 100, 110, 001, 101, 111, …). Nearby points tend to land in the same bucket.
Repeating L times.
Secondary hashing: the 2^k sparse buckets (e.g., 011) are mapped by a simple hash into M physical buckets of size B, with M·B = αn (α = 2): support/volume tuning, dataset size vs. storage volume.
The above hashing is locality-sensitive:
Probability(p, q in the same bucket) = (1 − Distance(p, q)/d')^k,
where the distance is Hamming and d' is the number of dimensions (bits).
[Figure: collision probability vs. distance(q, p_i) for k = 1 and k = 2; a larger k makes the probability fall off more sharply.]
(Adapted from Piotr Indyk's slides)
Preview
• General solution – locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• A new hashing function
• Still based on sampling
• Using a mathematical trick: p-stable distributions for the Lp distance, the Gaussian distribution for the L2 distance
Central limit theorem
v1·X1 + v2·X2 + … + vn·Xn = a weighted sum of Gaussians = a (scaled) Gaussian,
where v1,…,vn are real numbers and X1,…,Xn are independent, identically distributed (i.i.d.) standard Gaussians.
Dot product and norm (2-stability):
  Σ_i v_i·X_i ∼ ‖v‖_2 · X,  X ∼ N(0, 1),
so the dot product of a features vector with a Gaussian vector encodes its norm.
Dot product and distance: for two features vectors u and v,
  Σ_i u_i·X_i − Σ_i v_i·X_i = Σ_i (u_i − v_i)·X_i ∼ ‖u − v‖_2 · X,
so the difference of the two dot products encodes their distance.
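This 2-stability property is easy to verify empirically (a sketch using the standard-library Gaussian; the two vectors are arbitrary):

```python
import math
import random

rng = random.Random(0)
u = [1.0, -2.0, 0.5]
v = [0.0, 1.0, 2.5]
dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))  # ||u - v||_2

# Project u - v onto many random Gaussian vectors a ~ N(0,1)^d:
samples = []
for _ in range(50000):
    a = [rng.gauss(0.0, 1.0) for _ in u]
    samples.append(sum(ai * (ui - vi) for ai, ui, vi in zip(a, u, v)))

# By 2-stability, a.u - a.v is distributed as ||u - v||_2 * N(0, 1):
std = math.sqrt(sum(s * s for s in samples) / len(samples))
print(round(std, 2), round(dist, 2))  # the two should nearly coincide
```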
The full Hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v: the features vector, d numbers (e.g., [3.4, 8.2, …, 2.1])
• a: d random numbers, i.i.d. from a p-stable distribution (so a·v ∼ ‖v‖_p · X)
• b: a random phase in [0, w]
• w: the discretization step
Example: with a·v = 7944, b = 34 and w = 100, the shifted projection 7978 falls in the bucket [7900, 8000) of the grid …, 7800, 7900, 8000, 8100, 8200, …
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH: The basic assumption
There are two metric spaces here: feature space (d_x) and parameter space (d_θ).
We want similarity to be measured in angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
Parameter space (angles) ↔ feature space
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle space.
PSH as a classification problem:
1. Label pairs of examples with similar angles.
2. Define hash functions h on the feature space.
3. Predict the labeling of similar/non-similar examples by using h.
4. Compare the labelings: if the labeling by h is good, accept h; else change h.
Labels (with r = 0.25):
A pair of examples (x_i, θ_i), (x_j, θ_j) is labeled
y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε) r
A binary hash function on features:
h_T(x) = +1 if x ≥ T, −1 otherwise
Predict the labels:
ŷ_ij = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best threshold T that predicts the true labeling, subject to the probability constraints: h_T will place both examples of a similar pair in the same bin, and separate dissimilar pairs.
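A minimal sketch of the selection idea: score candidate threshold hashes h_T by how well "same bucket" predicts "similar angles" on labeled pairs. The toy data, candidate thresholds, and function names here are illustrative, not from the paper:

```python
import random

random.seed(0)

# Toy data: each example is (feature_value, angle); the feature tracks the angle.
examples = [(random.gauss(a, 0.3), a) for a in [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]]
r = 0.25

def pair_label(ti, tj):
    # +1 for pairs with similar angles, -1 otherwise (the epsilon gap is ignored here)
    return 1 if abs(ti - tj) <= r else -1

def h(x, T):
    # Binary threshold hash on the feature value
    return 1 if x >= T else -1

def accuracy(T):
    """Fraction of pairs whose predicted label (same bin?) matches the true label."""
    pairs = [(i, j) for i in range(len(examples)) for j in range(i + 1, len(examples))]
    correct = 0
    for i, j in pairs:
        (xi, ti), (xj, tj) = examples[i], examples[j]
        predicted = 1 if h(xi, T) == h(xj, T) else -1
        correct += predicted == pair_label(ti, tj)
    return correct / len(pairs)

# Accept the candidate threshold that classifies the pairs best.
best_T = max([0.5, 1.0, 1.5], key=accuracy)
```

In the paper many such hash bits are selected and combined into k-bit LSH keys; this sketch only shows the scoring of a single bit.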
Local Weighted Regression (LWR)
• Given a query image x0, PSH returns KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query: the model g is fit by minimizing, over the neighborhood N(x0),
  Σ_{x_i ∈ N(x0)} K(d_x(x_i, x0)) · d_θ(g(x_i), θ_i)
  where the kernel K turns feature-space distance into a weight (distance → weight).
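The smart-averaging step can be sketched as a kernel-weighted average of the neighbors' angle vectors. This is a zeroth-order simplification of LWR; the Gaussian kernel and the bandwidth value are illustrative choices:

```python
import math

def lwr_angles(query_dists, neighbor_angles, bandwidth=1.0):
    """Kernel-weighted average of the neighbors' angle vectors.

    Zeroth-order locally weighted regression: neighbors that are closer
    to the query in feature space get larger Gaussian weights.
    """
    weights = [math.exp(-(d / bandwidth) ** 2) for d in query_dists]
    total = sum(weights)
    m = len(neighbor_angles[0])
    return [sum(w * ang[k] for w, ang in zip(weights, neighbor_angles)) / total
            for k in range(m)]

# Three neighbors at increasing feature distance, each with 2 angle parameters.
est = lwr_angles([0.1, 0.5, 0.9], [[1.0, 2.0], [1.2, 2.2], [3.0, 0.0]])
```

The estimate is pulled toward the closest neighbors, which is the intended behavior of the weighting.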
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
• Interesting mismatches were observed
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere 'covers' the query q, i.e., ||q − pi|| ≤ ri
Courtesy of Mohamad Hegaze
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer
Motivation
• Clustering high-dimensional data by using local density measurements (e.g., in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
(figure: a window of radius "bandwidth" around a point is shifted toward the local mean)
Mean-shift | LSH: optimal k,l | LSH: data partition | LSH | LSH data struct
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region:
high density – small bandwidth; low density – large bandwidth.
Based on the kth nearest neighbor of the point: the bandwidth is h_i = ||x_i − x_{i,k}||, the distance from x_i to its kth nearest neighbor.
Adaptive mean-shift vs. non-adaptive
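The adaptive bandwidth rule can be sketched directly, taking each point's bandwidth as the distance to its kth nearest neighbor (1-D, brute force, purely illustrative):

```python
def adaptive_bandwidths(points, k):
    """Per-point bandwidth = distance to the k-th nearest neighbor (brute force).

    Dense regions get small bandwidths, sparse regions large ones.
    """
    bandwidths = []
    for p in points:
        dists = sorted(abs(p - q) for q in points if q is not p)
        bandwidths.append(dists[k - 1])
    return bandwidths

# 1-D toy data: a tight cluster and one outlier.
pts = [0.0, 0.1, 0.2, 5.0]
bw = adaptive_bandwidths(pts, k=2)
```

The outlier gets a far larger bandwidth than the clustered points, exactly the behavior the slide describes. In high dimensions this kth-NN query is the expensive step that LSH is brought in to approximate.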
Image segmentation algorithm:
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
(figure: 3D feature space; original, filtered, and segmented images)
Mean-shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Filtering: pixel value of the nearest mode
Mean-shift trajectories
Filtering examples (original squirrel → filtered; original baboon → filtered)
Segmentation examples
Mean-shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k).
• For each point x, check whether x_{d_k} ≤ v_k for each of the K pairs, giving a K-bit signature.
• This partitions the data into cells.
Choosing the optimal K and L
• For a query q, we want to compute the smallest number of distances to points in its buckets.
• Large K → smaller number of points in a cell C.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
• The query is compared against the union of its L cells, C∪ = C_1 ∪ … ∪ C_L.
• As L increases, C∪ increases (more candidate points) but the chance of missing true neighbors decreases; K determines the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN distance (bandwidth) for m randomly-selected data points.
• Choose an error threshold ε on the approximate distance.
• The optimal K and L should satisfy the error constraint:
  – For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K).
  – Minimize the running time t(K, L(K)).
(figure: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] with its minimum marking the chosen pair)
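The tuning loop above can be sketched as follows. The `error` and `query_time` functions here are toy stand-ins for the quantities the paper measures empirically on a random sample; only the structure of the search (minimal L per K, then minimal time over K) follows the slide:

```python
def error(K, L):
    # Toy error model: more tables (L) reduce misses, larger K increases them.
    return (1.0 - 0.5 ** K) ** L

def query_time(K, L):
    # Toy cost model: hashing cost grows with K*L, candidate-scan cost shrinks with K.
    return K * L + 100.0 / (2 ** K)

def tune(eps, K_range, L_max):
    """For each K find the smallest L meeting the error budget, then pick the
    (K, L(K)) pair with the lowest modeled query time."""
    best = None
    for K in K_range:
        L = next((L for L in range(1, L_max + 1) if error(K, L) <= eps), None)
        if L is None:
            continue
        t = query_time(K, L)
        if best is None or t < best[2]:
            best = (K, L, t)
    return best

K_opt, L_opt, t_opt = tune(eps=0.05, K_range=range(1, 8), L_max=200)
```

With real measurements substituted for the two models, this is the one-pass selection the paper describes.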
Data-driven partitions
• In the original LSH, cut values are chosen at random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
(figure: bucket-occupancy distribution for uniform vs. data-driven cut points)
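A quick way to see the benefit (illustrative, not the paper's experiment): on skewed data, cut values drawn from the data points themselves split buckets more evenly than cuts drawn uniformly from the data's range:

```python
import random

random.seed(1)

# Skewed 1-D data: most of the mass near 0, with a long tail.
data = [random.expovariate(1.0) for _ in range(1000)]

def uniform_cut(data):
    """Original LSH: cut value drawn uniformly from the data's range."""
    return random.uniform(min(data), max(data))

def data_driven_cut(data):
    """Suggested variant: cut value taken from a random data point's coordinate."""
    return random.choice(data)

def split_balance(data, cut):
    """How evenly a cut splits the data: 0.5 = perfectly balanced."""
    left = sum(x < cut for x in data)
    return min(left, len(data) - left) / len(data)
```

A uniform cut usually lands in the sparse tail and isolates a few points, while a data-driven cut lands where the data actually is.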
Additional speedup
• Assume that all points in a cell C will converge to the same mode (C acts like a type of aggregate).
Speedup results: 65,536 points, 1,638 points sampled, k = 100.
Food for thought: low dimension vs. high dimension
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
Summary
• LSH suggests a compromise on accuracy for a gain in complexity.
• Applications that involve massive data in high dimension require the fast performance of LSH.
• Extension of LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.
Conclusion
• But at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data
  (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
L1 to Hamming Space Embedding
Each coordinate of p is encoded in unary: a value v becomes v ones followed by C − v zeros (C = 11 here).
Example: p = (8, 2) → 11111111000 11000000000
The embedded dimension is d′ = C·d, and L1 distances become Hamming distances.
Hash function
For p ∈ H^d′, define G_j(p) = p|I_j, j = 1…L: bit sampling from p (here k = 3 digits).
Store p into bucket p|I_j, one of 2^k buckets, in each of the L tables.
Construction: insert p into tables 1, 2, …, L.
Query: probe q's bucket in tables 1, 2, …, L.
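The construction and query steps can be sketched as a minimal bit-sampling LSH over Hamming space (the class and parameter names are mine, purely illustrative):

```python
import random

random.seed(0)

class BitSamplingLSH:
    """Bit-sampling LSH for Hamming space: L tables, each keyed by k sampled bits."""

    def __init__(self, dim, k, L):
        # I_j: a random set of k bit positions for each of the L tables.
        self.index_sets = [random.sample(range(dim), k) for _ in range(L)]
        self.tables = [{} for _ in range(L)]

    def _key(self, p, j):
        # G_j(p) = p restricted to the sampled positions I_j.
        return tuple(p[i] for i in self.index_sets[j])

    def insert(self, p):
        for j, table in enumerate(self.tables):
            table.setdefault(self._key(p, j), []).append(p)

    def candidates(self, q):
        # Union of the query's buckets across all L tables.
        out = []
        for j, table in enumerate(self.tables):
            out.extend(table.get(self._key(q, j), []))
        return out

lsh = BitSamplingLSH(dim=22, k=3, L=5)
p = [1] * 8 + [0] * 3 + [1] * 2 + [0] * 9   # a 22-bit unary-embedded point
lsh.insert(p)
```

A query identical to `p` collides in every table; a query differing in some bits collides only in tables whose sampled positions avoid the differing bits.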
Alternative intuition: random projections
Sampling a bit from the unary embedding of a coordinate is equivalent to thresholding that coordinate at a random cut, so each of the k sampled bits acts like a random axis-parallel projection.
Together the k bits map each point into one of 2^k buckets (000, 100, 110, 001, 101, 111, …).
k samplings, repeated L times.
Secondary hashing: the 2^k buckets are mapped by a simple hash into M buckets of size B, supporting volume tuning (dataset size vs. storage volume), with M·B = α·n, α = 2.
The above hashing is locality-sensitive:
Probability(p, q in the same bucket) = (1 − Distance(p, q)/d′)^k
where d′ is the number of dimensions. Larger k makes the collision probability fall off faster with distance (compare the curves for k = 1 vs. k = 2).
Adopted from Piotr Indyk's slides.
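Evaluating the collision probability numerically shows how k sharpens the contrast between near and far points (the distances and d′ = 22 are illustrative values):

```python
def collision_prob(dist, d_prime, k):
    """Pr[p and q share a bucket] for bit-sampling LSH: (1 - dist/d')^k."""
    return (1.0 - dist / d_prime) ** k

# Larger k widens the gap between near and far points.
near_k1 = collision_prob(2, 22, k=1)   # ~0.909
far_k1 = collision_prob(11, 22, k=1)   # 0.5
near_k2 = collision_prob(2, 22, k=2)   # ~0.826
far_k2 = collision_prob(11, 22, k=2)   # 0.25
```

This is why k trades selectivity against recall: raising k rejects more far points but also loses some near ones, which the L repeated tables then recover.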
Preview
• General solution – locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Using a mathematical trick
• p-stable distribution for the Lp distance; Gaussian distribution for the L2 distance
Central limit theorem
Let v1, …, vn be real numbers and X1, …, Xn be independent, identically distributed (i.i.d.) random variables. Consider the weighted sum
v1·X1 + v2·X2 + … + vn·Xn = Σ_i v_i X_i
(a weighted sum of Gaussians is again a Gaussian).
For Gaussian X_i: Σ_i v_i X_i ~ ||v||_2 · X, so the dot product encodes the norm.
For two feature vectors u and v:
Σ_i (u_i − v_i) X_i = Σ_i u_i X_i − Σ_i v_i X_i ~ ||u − v||_2 · X
so the difference of dot products encodes the L2 distance.
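A quick numerical check of this fact (illustrative): projections of u − v onto random Gaussian vectors are distributed with standard deviation ≈ ||u − v||_2:

```python
import math
import random

random.seed(0)

u = [3.0, 4.0, 0.0]
v = [0.0, 0.0, 0.0]
l2 = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))  # 5.0

# Project u - v onto many Gaussian vectors; the projections are ~ N(0, l2^2).
projections = []
for _ in range(20000):
    x = [random.gauss(0, 1) for _ in u]
    projections.append(sum((a - b) * xi for a, b, xi in zip(u, v, x)))

std = math.sqrt(sum(p * p for p in projections) / len(projections))
# std should come out close to l2 = 5.0
```

This 2-stability of the Gaussian is exactly what the L2 hash function below relies on.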
The full hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v: the features vector (d components, e.g., [34, 82, 21, …])
• a: d random numbers, i.i.d. from a p-stable distribution
• b: random phase in [0, w]
• w: discretization step
Reconstructing the slide's example: with b = 34 and w = 100, a·v + b = 7944 falls in the bucket [7900, 8000), so h(v) = 79.
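The full hash can be sketched as follows: a minimal 2-stable (Gaussian) version for L2, with illustrative parameter values:

```python
import math
import random

random.seed(0)

def make_hash(dim, w):
    """One p-stable LSH function for L2: h(v) = floor((a . v + b) / w),
    with a ~ N(0, 1)^dim (the 2-stable case) and b uniform in [0, w)."""
    a = [random.gauss(0, 1) for _ in range(dim)]
    b = random.uniform(0, w)
    return lambda v: math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)

h = make_hash(dim=3, w=4.0)
v1 = [1.0, 2.0, 3.0]
v2 = [1.0, 2.0, 3.1]   # a near neighbor of v1
```

Because the projection a·v preserves L2 distances in distribution, near neighbors land in the same width-w bin far more often than distant points do.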
Generalization: p-stable distributions
• L2: Central Limit Theorem → Gaussian (normal) distribution (the 2-stable case)
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution (the Cauchy distribution is the 1-stable case, for L1)
P-stable summary
• Works for the r-nearest neighbor problem
• Generalizes to 0 < p ≤ 2
• Improves query time: O(d·n^(1/(1+ε)) log n) → O(d·n^(1/(1+ε)²) log n)
Latest results, reported by email by Alexander Andoni.
Parameter selection (for Euclidean space)
• 90% success probability ↔ best query time performance
• A single projection hits an r-nearest neighbor with Pr = p1.
• k projections hit an r-nearest neighbor with Pr = p1^k.
• L hashings fail to collide with Pr = (1 − p1^k)^L.
• To ensure a collision with probability at least 1 − δ (e.g., 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ  ⇒  L ≥ log(δ) / log(1 − p1^k)
• Larger k rejects more non-neighbors but requires a larger L to accept neighbors; the total query time trades candidate extraction against candidate verification.
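The bound on L can be computed directly (the values of p1, k, and δ below are illustrative):

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta,
    i.e. L = ceil(log(delta) / log(1 - p1**k))."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

# e.g. per-projection collision prob 0.9, 10-bit keys, 90% target success
L = tables_needed(p1=0.9, k=10, delta=0.1)
```

Raising k shrinks p1^k, so L (and hence storage) grows; this is the extraction-vs-verification trade-off named on the slide.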
Pros & Cons
+ Better query time than spatial data structures
+ Scales well to higher dimensions and larger data sizes (sub-linear dependence)
+ Predictable running time
− Extra storage overhead
− Inefficient for data with distances concentrated around the average
− Works best for Hamming distance (although it can be generalized to Euclidean space)
− In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
− Requires the radius r to be fixed in advance
From Piotr Indyk's slides.
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in R^d, centered at P = {p_1, ..., p_n}, with radii r_1, ..., r_n
• Goal: given a query q, preprocess the points in P to find a point p_i whose sphere 'covers' the query, i.e. ||q - p_i|| ≤ r_i
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g., in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions: using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: a window of radius equal to the bandwidth around a point]
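The slide's illustration is lost in extraction; as background, the standard mean-shift update that the window performs (a textbook fact about the algorithm, not recovered from the slide) is:

```latex
% one mean-shift step with bandwidth h: move x to the weighted mean of the
% points in its window, where g is the derivative of the kernel profile
m(x) \;=\; \frac{\sum_i x_i \, g\!\left(\left\lVert \tfrac{x - x_i}{h} \right\rVert^2\right)}
                {\sum_i g\!\left(\left\lVert \tfrac{x - x_i}{h} \right\rVert^2\right)} \;-\; x,
\qquad x \leftarrow x + m(x)
```

Iterating this step moves each point uphill in density until it converges to a mode.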
KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density - small bandwidth; low density - large bandwidth
• Base it on the k-th nearest neighbor of the point: the bandwidth is h_i = ||x_i - x_{i,k}||, the distance from x_i to its k-th nearest neighbor
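The k-th-neighbor bandwidth rule can be sketched as follows (brute-force kNN with a Euclidean norm assumed; the paper itself uses an L1 norm):

```python
import numpy as np

def adaptive_bandwidths(points, k):
    """h_i = distance from x_i to its k-th nearest neighbor:
    small in dense regions, large in sparse ones."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)        # pairwise distance matrix
    dists_sorted = np.sort(dists, axis=1)        # column 0 is the self-distance 0
    return dists_sorted[:, k]                    # k-th neighbor, excluding self

pts = np.array([[0.0], [0.1], [0.2], [5.0]])
h = adaptive_bandwidths(pts, k=2)
# the dense cluster gets small bandwidths, the outlier a large one
```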
Adaptive mean-shift vs. non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
original segmented
filtered
Filtering: pixel value of the nearest mode
Mean-shift trajectories
original squirrel filtered
original baboon filtered
Filtering examples
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Segmentation examples
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled with variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K (coordinate, cut-value) pairs (d_k, v_k)
• For each point x_i we check whether x_i[d_k] ≤ v_k for each of the K pairs
• The K boolean answers partition the data into cells
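A minimal sketch of the cell structure just described, assuming data scaled to [0,1] (names and the cut-value range are illustrative):

```python
import random
from collections import defaultdict

def build_partitions(points, K, L, seed=0):
    """Build L random partitions; each is K (coordinate, cut-value) pairs,
    and a point's K boolean answers to x[d_k] <= v_k form its cell key."""
    rng = random.Random(seed)
    dim = len(points[0])
    tables = []
    for _ in range(L):
        cuts = [(rng.randrange(dim), rng.uniform(0.0, 1.0)) for _ in range(K)]
        cells = defaultdict(list)
        for i, p in enumerate(points):
            key = tuple(p[d] <= v for d, v in cuts)
            cells[key].append(i)
        tables.append((cuts, cells))
    return tables

def query(tables, q):
    """Union, over the L partitions, of the cell q falls into -> candidates."""
    cands = set()
    for cuts, cells in tables:
        key = tuple(q[d] <= v for d, v in cuts)
        cands.update(cells.get(key, []))
    return cands

pts = [[0.1, 0.2], [0.15, 0.25], [0.9, 0.8]]
tables = build_partitions(pts, K=4, L=8)
print(sorted(query(tables, [0.12, 0.22])))  # candidate indices near the query
```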
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• Large K → smaller number of points in a cell C
• If L is too small, points might be missed; but if L is too big, extra points might be included
• As L increases, the candidate set grows, but the error decreases
• K and L determine the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN, and the distance (bandwidth) to them, for m randomly selected data points
• Choose an error threshold ε
• The optimal K and L should keep the approximate distance within the threshold of the true distance
Choosing optimal K and L
• For each K, estimate the error
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[Figures: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)], with the minimum marked]
Data-driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: points-per-bucket distribution, uniform vs. data-driven]
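The data-driven variant changes only how a cut value is drawn; a small sketch of the two choices (function names hypothetical):

```python
import random

def uniform_cut(lo, hi, rng):
    # original LSH: cut value drawn uniformly over the data range
    return rng.uniform(lo, hi)

def data_driven_cut(values, rng):
    # suggested variant: a coordinate of a randomly selected data point,
    # so cuts land where the data actually lie and buckets stay balanced
    return rng.choice(values)

rng = random.Random(1)
values = [0.01, 0.02, 0.03, 0.95]      # a skewed coordinate
cut = data_driven_cut(values, rng)
print(cut in values)  # True by construction
```

On skewed data a uniform cut usually falls in the empty gap, leaving most points on one side; a data-driven cut splits the dense region.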
Additional speedup
Assume that all points in a cell C will converge to the same mode (C acts like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
[Figure: low dimension vs. high dimension]
A thought for food...
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning itself requires KNN
Summary
• LSH trades accuracy for a gain in complexity
• Applications that involve massive data in high dimension require LSH's fast performance
• LSH extends to different spaces (PSH)
• The LSH parameters and hash functions can be learned for different applications
Conclusion
• ...but at the end, everything depends on your data set
• Try it at home:
  - Visit http://web.mit.edu/andoni/www/LSH/index.html
  - Email Alex Andoni: andoni@mit.edu
  - Test it on your own data
  (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
• Large K → smaller number of points in a cell.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
The intersection of a point's cells determines the resolution of the data structure: as L increases, the union C∪ increases but the intersection C∩ decreases.
Choosing optimal K and L
• Determine accurately the KNN distance (bandwidth) for m randomly-selected data points.
• Choose an error threshold ε.
• The optimal K and L should keep the approximate distance within the threshold of the true KNN distance.
Choosing optimal K and L
• For each K, estimate the error of the approximate KNN.
• In one run, for all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).
(Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] with its minimum marked.)
Data-driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
(Figure: bucket point distribution, uniform vs data-driven cuts.)
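The data-driven cut suggestion, as a sketch:

```python
import random

def data_driven_cut(data):
    """Pick a random data point and one of its coordinates as the cut value,
    so cuts land where the data actually is (more balanced buckets)."""
    point = random.choice(data)
    coord = random.randrange(len(point))
    return coord, point[coord]

random.seed(1)
data = [[0.0, 10.0], [0.1, 11.0], [0.2, 12.0]]
d, v = data_driven_cut(data)
assert v in [row[d] for row in data]
```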
Additional speedup
Assume that all points in C will converge to the same mode (C is like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100.
Food for thought
(Figures: low dimension vs high dimension.)
A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold? Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30 cookies…
Summary
• LSH trades some accuracy for a large gain in complexity.
• Applications that involve massive data in high dimension require the fast performance of LSH.
• Extension of LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.
Conclusion
• But at the end, everything depends on your data set.
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test over your own data (C code, under Red Hat Linux).
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Alternative intuition: random projections
A point p = (8, 2), with each coordinate bounded by C = 11, is embedded into a binary string of length d′ = C·d by unary-coding each coordinate:
8 → 11111111000, 2 → 11000000000 (concatenated: 1111111100011000000000).
Alternative intuition: random projections
Sampling k of the d′ bits maps each embedded string (e.g. 11000000000…, 111111110001…) to a k-bit key; with k = 3 the keys 000, 100, 110, 001, 101, 111, … index 2³ = 8 buckets.
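The unary embedding behind this intuition (assuming non-negative integer coordinates bounded by C) turns L1 distance into Hamming distance:

```python
def unary_embed(point, C):
    """Embed integer coordinates (each <= C) into a C*d bit string:
    coordinate value c becomes c ones followed by C - c zeros."""
    bits = []
    for c in point:
        bits += [1] * c + [0] * (C - c)
    return bits

# Hamming distance between embeddings equals the L1 distance between points.
p, q, C = [8, 2], [7, 4], 11
ep, eq = unary_embed(p, C), unary_embed(q, C)
hamming = sum(a != b for a, b in zip(ep, eq))
assert hamming == abs(8 - 7) + abs(2 - 4)  # == 3
```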
k samplings, repeating L times.
Secondary hashing: the 2^k buckets are mapped by simple hashing into M buckets of size B, which supports volume tuning (dataset size vs storage volume), with M·B = α·n, α = 2.
The above hashing is locality-sensitive:
Pr[p, q in the same bucket] = (1 − Distance(p, q) / dimensions)^k
(Plots: Pr vs Distance(q, p_i) for k = 1 and k = 2. Adopted from Piotr Indyk's slides.)
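A minimal sketch of this collision probability, assuming the (1 − distance/d)^k form for k sampled bits of a d-bit embedding:

```python
def collision_prob(distance, d, k):
    """Pr[p, q share a bucket] = (1 - distance/d)**k when k bits are
    sampled from a d-bit Hamming embedding."""
    return (1.0 - distance / d) ** k

d = 22
# Probability decays with distance, and a larger k sharpens the decay.
assert collision_prob(2, d, 1) > collision_prob(10, d, 1)
assert collision_prob(10, d, 2) < collision_prob(10, d, 1)
```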
Preview
• General solution – locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Using a mathematical trick: p-stable distribution for Lp distance, Gaussian distribution for L2 distance
Central limit theorem
v1,…,vn = real numbers; X1,…,Xn = independent, identically distributed (i.i.d.) random variables:
v1·X1 + v2·X2 + … + vn·Xn = (weighted Gaussians) = weighted Gaussian.
Central limit theorem
Σᵢ vᵢ·Xᵢ ∼ ‖v‖₂ · X    (dot product → norm)
For two feature vectors u, v:
Σᵢ uᵢ·Xᵢ − Σᵢ vᵢ·Xᵢ = Σᵢ (uᵢ − vᵢ)·Xᵢ ∼ ‖u − v‖₂ · X    (dot-product distance → norm distance)
The full hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v – features vector, e.g. [34, 82, 21, …]
• a – d random numbers, i.i.d. from a p-stable distribution
• b – random phase in [0, w]
• w – discretization step
Example: a·v + b = 7944; with discretization step w = 100 the bins are …, 7800, 7900, 8000, 8100, 8200, …, so v falls into the bin starting at 7900.
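A sketch of h_{a,b} (Gaussian entries for the 2-stable L2 case; the parameter values below are illustrative):

```python
import math
import random

def make_hash(d, w, rng):
    """h_{a,b}(v) = floor((a . v + b) / w), with a ~ Gaussian (2-stable)
    and b ~ Uniform[0, w]."""
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)
    return lambda v: math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)

rng = random.Random(0)
h = make_hash(d=3, w=4.0, rng=rng)
v = [34.0, 82.0, 21.0]
assert isinstance(h(v), int)  # the bucket index is a single integer
```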
Generalization: p-stable distributions
• L2: Central Limit Theorem → Gaussian (normal) distribution.
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution (e.g. Cauchy for L1).
P-stable summary
• Works for the r-nearest neighbor problem; generalizes to 0 < p ≤ 2.
• Improves query time: O(d·n^(1/(1+ε))·log n) → O(d·n^(1/(1+ε)²)·log n) (latest results, reported by email by Alexander Andoni).
Parameters selection (for Euclidean space)
• 90% probability → best query-time performance.
• A single projection hits an ε-nearest neighbor with Pr = p1.
• k projections hit an ε-nearest neighbor with Pr = p1^k.
• L hashings fail to collide with Pr = (1 − p1^k)^L.
• To ensure a collision (e.g. 1 − δ ≥ 90%): 1 − (1 − p1^k)^L ≥ 1 − δ, i.e. L ≥ log δ / log(1 − p1^k).
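The bound on L can be computed directly; a sketch assuming collision probability p1 per projection, k projections per table, and failure probability δ:

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta,
    i.e. L = ceil(log(delta) / log(1 - p1**k))."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

L = tables_needed(p1=0.9, k=10, delta=0.1)
# L tables succeed with >= 90% probability, L - 1 tables do not.
assert 1.0 - (1.0 - 0.9 ** 10) ** L >= 0.9
assert 1.0 - (1.0 - 0.9 ** 10) ** (L - 1) < 0.9
```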
Accept neighbors, reject non-neighbors.
…Parameters selection
(Plot: running time vs k, split into candidates-extraction and candidates-verification time.)
Pros & Cons (from Piotr Indyk's slides)
Pros:
• Better query time than spatial data structures.
• Scales well to higher dimensions and larger data size (sub-linear dependence).
• Predictable running time.
Cons:
• Extra storage overhead.
• Inefficient for data with distances concentrated around the average.
• Works best for Hamming distance (although it can be generalized to Euclidean space).
• In secondary storage, a linear scan is pretty much all we can do (for high dimension).
• Requires the radius r to be fixed in advance.
LSH - Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun).
• Searching image databases (see the following).
• Image segmentation (see the following).
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani).
• Texture classification (see the following).
• Clustering (see the following).
• Embedding and manifold learning (LLE and many others).
• Compression – vector quantization.
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan).
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler).
• In short, whenever K-Nearest Neighbors (KNN) are needed.
Motivation
• A variety of procedures in learning require KNN computation.
• KNN search is a computational bottleneck.
• LSH provides a fast approximate solution to the problem.
• LSH requires hash-function construction and parameter tuning.
Outline
Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola, and T. Darrell
• Finding sensitive hash functions
Mean Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni, and P. Meer
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola, and T. Darrell
Given an image x, what are the parameters θ in this image, i.e. the angles of joints, the orientation of the body, etc.?
Ingredients
• Input: query image with unknown angles (parameters).
• Database of human poses with known angles.
• Image feature extractor – edge detector.
• Distance metric in feature space: d_x.
• Distance metric in angles space: d_θ(θ1, θ2) = Σᵢ₌₁ᵐ (1 − cos(θ1,i − θ2,i)).
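The angle-space metric (assuming the per-joint form 1 − cos(Δθ), summed over m joints) is tiny to implement:

```python
import math

def d_theta(t1, t2):
    """Angle-space distance: sum over joints of 1 - cos(theta1_i - theta2_i).
    Zero for identical poses, maximal for joints that are pi apart."""
    return sum(1.0 - math.cos(a - b) for a, b in zip(t1, t2))

assert d_theta([0.0, 0.0], [0.0, 0.0]) == 0.0
assert abs(d_theta([0.0], [math.pi]) - 2.0) < 1e-9
```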
Example-based learning
• Construct a database of example images with their known angles.
• Given a query image, run your favorite feature extractor.
• Compute the KNN from the database.
• Use these KNNs to compute the average angles of the query.
Input: query → find KNN in the database of examples → output: average angles of the KNN.
The algorithm flow: input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output match.
The image features
Image features are multi-scale edge histograms, computed over sub-windows (A, B) of the image.
Feature Extraction → PSH → LWR
PSH: the basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ). We want similarity to be measured in the angles space, whereas LSH works in the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
(Figure: a query q mapped between the parameters space (angles) and the feature space.) Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ. The hash functions are applied in the feature space, but the KNN are valid in the angle space.
• Label pairs of examples with similar angles.
• Define hash functions h on the feature space.
• Predict the labeling of similar/non-similar examples by using h.
• Compare the labelings.
• If the labeling by h is good, accept h; else change h.
PSH as a classification problem
Labels (r = 0.25): a pair of examples (x_i, θ_i), (x_j, θ_j) is labeled
y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
y_ij = −1 if d_θ(θ_i, θ_j) ≥ (1 + ε)·r
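The pair-labeling rule can be sketched as follows (treating the gap between r and (1+ε)·r as unlabeled is an assumption consistent with the two thresholds):

```python
def pair_label(d_theta, r, eps):
    """+1 for pairs closer than r in angle space, -1 for pairs farther
    than (1 + eps)*r; pairs in between are left out of training (None)."""
    if d_theta <= r:
        return +1
    if d_theta >= (1.0 + eps) * r:
        return -1
    return None

r, eps = 0.25, 0.5
assert pair_label(0.1, r, eps) == +1
assert pair_label(0.5, r, eps) == -1
assert pair_label(0.3, r, eps) is None
```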
A binary hash function on the features:
h_T(x) = +1 if the feature value of x exceeds the threshold T, −1 otherwise.
Predict the labels:
ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise.
Find the best T that predicts the true labeling with the probability constraints: h_T will place both examples in the same bin, or separate them.
Local Weighted Regression (LWR)
• Given a query image x₀, PSH returns KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query, with distance-based weights:
θ̂₀ = argmin_θ Σ_{x_i ∈ N(x₀)} d_θ(θ, θ_i) · K(d_x(x_i, x₀))
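A simplified sketch of the idea: here the minimization is replaced by a kernel-weighted mean of the neighbors' angles (an assumption; the paper solves a weighted regression):

```python
import math

def lwr_estimate(neighbor_angles, query_dists, bandwidth):
    """Kernel-weighted average of the KNN angles: neighbors that are
    closer to the query in feature space get larger weights."""
    weights = [math.exp(-(d / bandwidth) ** 2) for d in query_dists]
    total = sum(weights)
    m = len(neighbor_angles[0])
    return [sum(w * th[i] for w, th in zip(weights, neighbor_angles)) / total
            for i in range(m)]

angles = [[10.0, 20.0], [30.0, 40.0]]
est = lwr_estimate(angles, query_dists=[0.1, 5.0], bandwidth=1.0)
# The near neighbor dominates the estimate.
assert abs(est[0] - 10.0) < 1.0
```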
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints.
• 150,000 images.
• Nuisance parameters added: clothing, illumination, face expression.
• 1,775,000 example pairs.
• Selected 137 out of 5,123 meaningful features (how?).
• 18-bit hash functions (k), 150 hash tables (l).
• Test on 1,000 synthetic examples: PSH searched only 3.4% of the data per query.
• Without selection, 40 bits and 1,000 hash tables were needed.
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, B the max number of points in a bucket.
Results – real data
• 800 images.
• Processed by a segmentation algorithm.
• 1.3% of the data were searched.
Interesting mismatches (figure).
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure.
• Moving from one representation space to another.
• Training a sensitive hash function.
• KNN smart averaging.
Food for Thought
• The basic assumption may be problematic (distance metric, representations).
• The training set should be dense.
• Texture and clutter.
• In general, some features are more important than others and should be weighted.
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p1,…,pn}, with radii r1,…,rn.
• Goal: given a query q, preprocess the points in P to find a point p_i whose sphere 'covers' the query q.
(Courtesy of Mohamad Hegaze.)
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni, and P. Meer
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space).
• Statistical curse of dimensionality: sparseness of the data.
• Computational curse of dimensionality: expensive range queries.
• LSH parameters should be adjusted for optimal performance.
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
1. Finding optimal LSH parameters
2. Data-driven partitions into buckets
3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell (bandwidth)
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
A weighted sum of Gaussians is again a Gaussian:
v₁·X₁ + v₂·X₂ + … + vₙ·Xₙ
• v₁, …, vₙ = real numbers
• X₁, …, Xₙ = independent, identically distributed (i.i.d.) Gaussians
Central limit theorem
Dot product → norm:
 Σᵢ vᵢXᵢ ≈ ‖v‖₂ · X
Norm → distance (for features vectors u, v):
 Σᵢ uᵢXᵢ − Σᵢ vᵢXᵢ = Σᵢ (uᵢ − vᵢ)Xᵢ ≈ ‖u − v‖₂ · X
i.e. the difference of the two dot products is distributed as the L2 distance between the features vectors, times a Gaussian.
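The 2-stable property above — a·u − a·v behaves like ‖u − v‖₂ times a standard Gaussian — can be checked numerically. A stdlib-only sketch (the vectors u, v are illustrative assumptions):

```python
import math
import random

def projected_diff_std(u, v, trials=50000, seed=1):
    """Empirical std of a.(u - v) over random Gaussian vectors a.
    By 2-stability it should equal ||u - v||_2."""
    rng = random.Random(seed)
    d = len(u)
    diffs = []
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(d)]  # i.i.d. N(0,1) entries
        diffs.append(sum(ai * (ui - vi) for ai, ui, vi in zip(a, u, v)))
    mean = sum(diffs) / trials
    return math.sqrt(sum((x - mean) ** 2 for x in diffs) / trials)

u = [3.0, 4.0, 0.0]
v = [0.0, 0.0, 0.0]
true_dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))  # 5.0
print(round(projected_diff_std(u, v), 2), true_dist)
```

The empirical spread of the projected differences matches the true L2 distance of 5.0.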
The full hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v = features vector of dimension d (e.g. [34, 82, 21, …])
• a = d random numbers, a₁ … a_d i.i.d. from a p-stable distribution
• b = random phase in [0, w]
• w = discretization step
Example: with w = 100 and b = +34, a features vector with a·v + b = 7944 falls into the bin [7900, 8000), i.e. bucket 79.
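A minimal sketch of h_{a,b}(v) = ⌊(a·v + b)/w⌋ for the L2 case (stdlib only; the vectors, w, and seed are illustrative assumptions):

```python
import math
import random

def make_hash(d, w, seed=0):
    """Build one p-stable hash h_{a,b}(v) = floor((a.v + b) / w)."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(d)]  # i.i.d. N(0,1): 2-stable, for L2
    b = rng.uniform(0, w)                    # random phase in [0, w]
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h

h = make_hash(d=3, w=4.0)
v1 = [34.0, 82.0, 21.0]
v2 = [34.1, 82.0, 21.2]  # close in L2 -> very likely the same bucket
v3 = [0.0, 0.0, 0.0]
print(h(v1), h(v2), h(v3))
```

In a real index, k such functions are concatenated per table and L tables are built, exactly as in the Hamming construction.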
Generalization: p-stable distributions
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution (e.g. Cauchy for L1)
• L2: Central Limit Theorem → Gaussian (normal) distribution
p-stable summary
• Generalizes to 0 < p ≤ 2
• Improves query time: O(d·n^(1/(1+ε))·log n) → O(d·n^(1/(1+ε)²)·log n)
r - Nearest Neighbor: latest results
(reported in email by Alexander Andoni)
Parameters selection
• 90% probability ⇒ best query-time performance
For Euclidean space
Parameters selection…
For Euclidean space:
• A single projection hits an ε-Nearest Neighbor with Pr = p₁
• k projections hit an ε-Nearest Neighbor with Pr = p₁^k
• All L hashings fail to collide with Pr = (1 − p₁^k)^L
• To ensure a collision (e.g. with 1 − δ ≥ 90%):
 1 − (1 − p₁^k)^L ≥ 1 − δ ⟹ L ≥ log(δ) / log(1 − p₁^k)
Accept neighbors, reject non-neighbors.
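A minimal stdlib sketch of turning the bound above into a table count (the p₁, k, δ values are illustrative assumptions, not from the slides):

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta."""
    return math.ceil(math.log(delta) / math.log(1 - p1 ** k))

# e.g. a single projection collides with a near neighbor with p1 = 0.9;
# with k = 10 bits per table, ensure >= 90% recall (delta = 0.1):
L = tables_needed(p1=0.9, k=10, delta=0.1)
print(L)  # -> 6
```

Larger k makes each table more selective (fewer candidates to verify) but drives L up, which is exactly the extraction-vs-verification trade-off of the next slide.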
…Parameters selection
[Figure: query time vs. k — candidate-extraction time grows with k while candidate-verification time shrinks; the optimal k balances the two]
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimension)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test it over your own data
(C code, under Red Hat Linux)
LSH - Applications
• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
• "Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola, and T. Darrell
 – Finding sensitive hash functions
• "Mean Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni, and P. Meer
 – Tuning LSH parameters; the LSH data structure is used for algorithm speedups
The Problem
"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola, and T. Darrell
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angles space:
 d_θ(θ¹, θ²) = Σᵢ₌₁..ₘ (1 − cos(θ¹ᵢ − θ²ᵢ))
Example-based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find KNN in the database of examples → output: average angles of the KNN
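Averaging angles naively fails near the 0°/360° wrap-around; one standard fix (an illustrative choice here, not necessarily the paper's exact estimator) is the circular mean of the neighbors' angles:

```python
import math

def mean_angle(angles):
    """Average angles on the circle (avoids the 359-vs-1 degree trap)
    by averaging unit vectors and taking atan2 of the result."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)

# angles of one joint among the K nearest neighbors, in radians (toy values)
knn_angles = [math.radians(a) for a in (350.0, 5.0, 15.0)]
print(round(math.degrees(mean_angle(knn_angles)) % 360, 1))  # -> 3.3
```

A naive arithmetic mean of 350°, 5°, 15° would give 123.3°, which is nowhere near the neighbors.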
The algorithm flow
Input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms.
[Figure: edge-direction histograms computed over image sub-windows at several scales]
Feature Extraction → PSH → LWR
PSH: The basic assumption
• There are two metric spaces here: feature space (d_x) and parameter space (d_θ)
• We want similarity to be measured in the angles space, whereas LSH works on the feature space
• Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q maps between the parameters space (angles) and the feature space]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
1. Label pairs of examples with similar angles
2. Define hash functions h on the feature space
3. Predict the labeling of similar/non-similar examples by using h
4. Compare the labelings
5. If the labeling by h is good, accept h; else change h
PSH as a classification problem
Labels (r = 0.25): a pair of examples (xᵢ, θᵢ), (xⱼ, θⱼ) is labeled
 yᵢⱼ = +1 if d_θ(θᵢ, θⱼ) < r
 yᵢⱼ = −1 if d_θ(θᵢ, θⱼ) > (1 + ε)·r
A binary hash function on features:
 h_T(x) = +1 if the selected feature value of x ≥ T, −1 otherwise
Predict the labels:
 ŷ_h(xᵢ, xⱼ) = +1 if h_T(xᵢ) = h_T(xⱼ), −1 otherwise
Find the best T that predicts the true labeling, subject to the probability constraints: h_T(x) will place both examples in the same bin, or separate them.
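The selection of T can be sketched as a tiny search over candidate (feature, threshold) pairs, scoring each by agreement with the ±1 pair labels (the toy pairs and thresholds below are assumptions for illustration, not from the paper):

```python
def h(x, phi, T):
    """Axis-parallel binary hash: +1 if feature phi of x exceeds threshold T."""
    return 1 if x[phi] >= T else -1

def agreement(pairs, phi, T):
    """Fraction of labeled pairs whose collision prediction matches the label."""
    ok = 0
    for xi, xj, y in pairs:  # y = +1: similar angles, -1: dissimilar
        y_hat = 1 if h(xi, phi, T) == h(xj, phi, T) else -1
        ok += (y_hat == y)
    return ok / len(pairs)

# toy pairs: feature 0 separates the classes, feature 1 is noise
pairs = [
    ([5.0, 1.0], [6.0, 9.0], +1),
    ([5.5, 8.0], [6.5, 2.0], +1),
    ([1.0, 5.0], [6.0, 5.5], -1),
    ([0.5, 3.0], [5.5, 3.5], -1),
]
best = max(((agreement(pairs, phi, T), phi, T)
            for phi in (0, 1) for T in (0.75, 3.0, 5.25, 8.5)),
           key=lambda t: t[0])
print(best)  # the winning (score, feature, threshold)
```

Here the search correctly picks feature 0 with threshold 3.0, which sends every similar pair to the same bin and every dissimilar pair to different bins.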
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
 β₀ = argmin_β Σ_{xᵢ∈N(x₀)} d_θ(g(xᵢ; β), θᵢ) · K(d_x(xᵢ, x₀))  (dist × weight)
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables were needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
[Figure: interesting mismatches]
Fast pose estimation - summary
• A fast way to compute the angles of the human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN: smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in Rᵈ, centered at P = p₁, …, pₙ, with radii r₁, …, rₙ
• Goal: given a query q, preprocess the points in P to find a point pᵢ whose sphere covers the query q
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
"Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni, and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
 1. Finding optimal LSH parameters
 2. Data-driven partitions into buckets
 3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: a window of radius h (the bandwidth) around a point is shifted toward the local mean until it converges to a mode]
Mean-shift → LSH: optimal k,l → LSH: data partition → LSH: data struct
KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density - small bandwidth; low density - large bandwidth
• Based on the kth nearest neighbor of the point, the bandwidth is hᵢ = ‖xᵢ − xᵢ,ₖ‖
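As a sketch (stdlib only; the toy points and k are illustrative assumptions), the adaptive bandwidth of each point can be taken as its distance to its k-th nearest neighbor:

```python
import math

def adaptive_bandwidth(points, k):
    """h_i = distance from x_i to its k-th nearest neighbor:
    small in dense regions, large in sparse ones."""
    hs = []
    for i, x in enumerate(points):
        dists = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)
        hs.append(dists[k - 1])
    return hs

dense = [(0.0, 0.1 * i) for i in range(5)]    # tightly packed points
sparse = [(10.0, 3.0 * i) for i in range(5)]  # spread-out points
h = adaptive_bandwidth(dense + sparse, k=2)
print(h[0], h[5])  # bandwidth in the dense region << sparse region
```

Brute-force KNN as above is O(n²); this is exactly the bottleneck the LSH data structure replaces in high dimensions.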
Adaptive mean-shift vs. non-adaptive
[Figure: clustering results with fixed vs. adaptive bandwidth]
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths: h_s (spatial) and h_r (color)
3. Apply filtering
[Figure: original → filtered → segmented; filtering assigns each pixel the value of its nearest mode]
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift trajectories
[Figure: trajectories of mean-shift iterations converging to modes]
Filtering examples
[Figure: original squirrel → filtered; original baboon → filtered]
Segmentation examples
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled with variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x we check whether x_{d_k} ≤ v_k; the K boolean results define the point's cell
• This partitions the data into cells
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• Large K ⇒ a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• As L increases, the union of cells C∪ increases but each cell C decreases; this determines the resolution of the data structure
[Equations relating the expected number of points per cell and per union of cells to n, K, d, and L]
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points → the distance (bandwidth)
• Choose an error threshold ε
• The optimal K and L should satisfy the approximate distance within (1 + ε)
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[Figure: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] with its minimum]
Data-driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: bucket distribution — uniform cuts vs. data-driven cuts]
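The contrast between uniform and data-driven cut values can be sketched on skewed toy data (all names and numbers below are illustrative assumptions):

```python
import random

def uniform_cuts(data, dim, k, rng):
    """Original LSH: cut values drawn uniformly from the range of the data."""
    lo = min(x[dim] for x in data)
    hi = max(x[dim] for x in data)
    return [rng.uniform(lo, hi) for _ in range(k)]

def data_driven_cuts(data, dim, k, rng):
    """Suggested variant: use a coordinate of a randomly selected data point,
    so cuts concentrate where the points do."""
    return [rng.choice(data)[dim] for _ in range(k)]

rng = random.Random(0)
# skewed 1-D data: 95% of the mass near 0, a few far outliers
data = [(rng.random(),) for _ in range(95)] + [(100.0,)] * 5
u = uniform_cuts(data, 0, 200, rng)
dd = data_driven_cuts(data, 0, 200, rng)
print(sum(c < 1 for c in u), "uniform cuts land in the dense region")
print(sum(c < 1 for c in dd), "data-driven cuts land in the dense region")
```

With uniform cuts, almost all splits fall in the empty gap and buckets stay badly unbalanced; data-driven cuts follow the point distribution, which is the effect shown in the bucket-distribution figure.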
Additional speedup
Assume that all points in a cell C will converge to the same mode (C is like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
[Figure: behavior in low dimension vs. high dimension]
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space:
  d_θ(θ_1, θ_2) = Σ_{i=1}^{m} (1 − cos(θ_{1,i} − θ_{2,i}))
Example-based learning
• Construct a database of example images with their known angles.
• Given a query image, run your favorite feature extractor.
• Compute the KNN from the database.
• Use these KNNs to compute the average angles of the query.
Input: query → find KNN in the database of examples → output: average angles of the KNN.
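A minimal sketch of the example-based learning loop above, with a brute-force neighbor search standing in for PSH and synthetic random data; `knn_average_angles` and all names here are illustrative, not from the paper:

```python
import numpy as np

def knn_average_angles(query_feat, db_feats, db_angles, k=3):
    """Brute-force stand-in for the PSH lookup: find the k nearest
    database examples in feature space, then average their known angles."""
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    idx = np.argsort(dists)[:k]          # indices of the k nearest examples
    return db_angles[idx].mean(axis=0)   # naive average; LWR refines this

# toy database: 100 example images, 16-D features, 13 angles each
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 16))
angles = rng.uniform(0.0, 2.0 * np.pi, size=(100, 13))
est = knn_average_angles(feats[0], feats, angles, k=5)
```

Replacing the brute-force scan with a hash lookup is exactly what PSH contributes; the averaging step stays the same.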
The algorithm flow:
Input: query → feature extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match.
The image features
Image features are multi-scale edge direction histograms.
(Pipeline: Feature Extraction → PSH → LWR)
PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
Parameter space (angles) vs. feature space: a query q and its neighbors map between the two spaces. Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles.
• Define hash functions h on the feature space.
• Predict the labeling of similar/non-similar examples by using h.
• Compare the labelings.
• If the labeling by h is good, accept h; else change h.
PSH as a classification problem
Labels (r = 0.25): a pair of examples (x_i, θ_i), (x_j, θ_j) is labeled
  y_ij = +1 if d_θ(θ_i, θ_j) < r
  y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε) r
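The labeling rule above can be sketched as a small helper; `pair_label` is a hypothetical name, and the default ε = 1.0 is an assumption for illustration (pairs in the gray zone between r and (1 + ε)r get no label):

```python
def pair_label(d_theta, r=0.25, eps=1.0):
    """PSH training label for a pair of examples, given their distance
    d_theta in angle space: +1 if closer than r, -1 if farther than
    (1 + eps) * r, and None for the unused 'gray zone' in between."""
    if d_theta < r:
        return +1
    if d_theta > (1.0 + eps) * r:
        return -1
    return None
```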
A binary hash function on features:
  h_T(x) = +1 if the selected feature of x exceeds the threshold T, −1 otherwise.
Predict the labels:
  ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise.
Find the best threshold T that predicts the true labeling, subject to the probability constraints: h_T will place both examples of a pair in the same bin, or separate them.
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
  θ_0 = argmin_g Σ_{x_i ∈ N(x)} K(d_x(x_i, x)) · (g(x_i) − θ_i)²
  where K is a distance-weighting kernel and N(x) is the neighbor set.
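A zeroth-order sketch of the weighted-averaging step, assuming a Gaussian kernel K and a constant local model (a simplification of full locally-weighted regression; `lwr_angles` and the bandwidth value are illustrative):

```python
import numpy as np

def lwr_angles(query_feat, nn_feats, nn_angles, bandwidth=1.0):
    """Weighted average of the neighbors' angles: weights K(d_x) decay
    with feature-space distance to the query (a constant local model,
    i.e. zeroth-order LWR)."""
    d = np.linalg.norm(nn_feats - query_feat, axis=1)
    w = np.exp(-(d / bandwidth) ** 2)   # Gaussian kernel K
    w = w / w.sum()
    return w @ nn_angles

# the far neighbor contributes almost nothing at this bandwidth
est = lwr_angles(np.zeros(2),
                 np.array([[0.0, 0.0], [10.0, 10.0]]),
                 np.array([[1.0], [3.0]]),
                 bandwidth=1.0)
```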
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, facial expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Tested on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without feature selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, and B is the maximum number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Interesting mismatches occur.
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p_1, …, p_n}, with radii r_1, …, r_n.
• Goal: given a query q, preprocess the points in P so as to find a point p_i whose sphere 'covers' the query q.
Courtesy of Mohamad Hegaze.
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, P. Meer)
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
(Progress: mean-shift → LSH: optimal k, l → LSH: data partition → LSH data structure)
Each mean-shift iteration moves a point toward the weighted mean of its neighbors within the bandwidth.
KNN in mean-shift:
The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth.
Based on the kth nearest neighbor of the point, the bandwidth is h_i = ||x_i − x_{i,k}||.
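The adaptive-bandwidth rule can be sketched with a brute-force pairwise computation (exactly the k-NN step that LSH is meant to accelerate); `adaptive_bandwidths` is an illustrative name:

```python
import numpy as np

def adaptive_bandwidths(points, k=5):
    """Per-point bandwidth h_i = distance from x_i to its k-th nearest
    neighbor: small in dense regions, large in sparse ones."""
    diff = points[:, None, :] - points[None, :, :]
    d = np.linalg.norm(diff, axis=-1)    # full pairwise distance matrix
    d.sort(axis=1)                       # row i: sorted distances from x_i
    return d[:, k]                       # column 0 is x_i itself (distance 0)

rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(50, 2))            # tight cluster
sparse = rng.normal(0.0, 5.0, size=(50, 2)) + 100.0   # spread-out cluster
h = adaptive_bandwidths(np.vstack([dense, sparse]), k=5)
```

The O(n²) distance matrix is what makes this infeasible at scale; the paper replaces it with the LSH lookup below.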
Adaptive mean-shift vs. non-adaptive mean-shift.

Image segmentation algorithm:
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color).
3. Apply filtering.
("Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02)
Image segmentation algorithm: original → filtered → segmented.
Filtering: each pixel takes the value of its nearest mode.
Mean-shift trajectories.
Filtering examples: original squirrel → filtered; original baboon → filtered.
Segmentation examples.
("Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data, handled with a variable bandwidth.
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k).
• For each point x, check the K inequalities x_{d_k} < v_k (coordinate d_k against cut value v_k).
• This partitions the data into cells.
Choosing the optimal K and L
• For a query q, we want to compute the smallest number of distances to points in its buckets.
• Large K → a smaller number of points in a cell.
• If L is too small, points might be missed; if L is too big, extra points might be included.
• As L increases, the union of cells C∪ increases but the intersection C∩ decreases; C∩ determines the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points; let d_k be the exact distance (bandwidth) and d̂_k the approximate distance returned with a given K, L.
• Choose an error threshold ε; the optimal K and L should keep the relative error of d̂_k within ε.
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K).
• Minimize the running time t(K, L(K)).
(Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] and its minimum.)
Data-driven partitions
• In the original LSH, cut values are chosen uniformly at random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
(Figure: bucket distribution, uniform vs. data-driven cut points.)
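A sketch of the data-driven cut selection, under the assumption that a partition is a list of (coordinate, threshold) pairs as described above; `data_driven_cuts` and `bucket_key` are illustrative names:

```python
import random

def data_driven_cuts(points, K, seed=0):
    """K cut pairs (d_k, v_k): instead of drawing each threshold uniformly
    over the data range, pick a random data point and use one of its
    coordinates, so cuts land where the data actually is."""
    rnd = random.Random(seed)
    dim = len(points[0])
    cuts = []
    for _ in range(K):
        d_k = rnd.randrange(dim)      # which coordinate to cut on
        p = rnd.choice(points)        # random data point
        cuts.append((d_k, p[d_k]))    # its d_k-th coordinate = cut value
    return cuts

def bucket_key(x, cuts):
    """A point's cell: the pattern of the K boolean tests x[d_k] < v_k."""
    return tuple(x[d] < v for d, v in cuts)

pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)]
cuts = data_driven_cuts(pts, K=4)
```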
Additional speedup
Assume that all points in C∩ will converge to the same mode (C∩ acts as a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100.
Food for thought
Low dimension vs. high dimension.
A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30 – cookies…
Summary
• LSH trades accuracy for a gain in complexity.
• Applications that involve massive data in high dimensions require the fast performance of LSH.
• Extension of LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.
Conclusion
• …but in the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test it over your own data
  (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Repeating L times
Secondary hashing
Support volume tuning: dataset size vs. storage volume.
The 2^k buckets are hashed again into M buckets of size B.
Simple hashing: M·B = α·n, with α = 2.
(Skip)
The above hashing is locality-sensitive:
• Probability(p and q in the same bucket) = (1 − Distance(p, q)/d)^k
(Plots: collision probability vs. Distance(q, p_i), for k = 1 and k = 2.)
Adopted from Piotr Indyk's slides.
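A quick empirical check of the collision probability above for bit-sampling in Hamming space, assuming the k coordinates are sampled uniformly with replacement (`hamming_lsh_collision` is an illustrative name):

```python
import random

def hamming_lsh_collision(dist, d, k, trials=20000, seed=0):
    """Empirical collision rate of the bit-sampling hash
    g(p) = (p[i1], ..., p[ik]) for two binary vectors at Hamming
    distance `dist`; theory predicts (1 - dist/d) ** k."""
    rnd = random.Random(seed)
    p = [0] * d
    q = [0] * d
    for i in range(dist):
        q[i] = 1                                    # flip `dist` bits
    hits = 0
    for _ in range(trials):
        idx = [rnd.randrange(d) for _ in range(k)]  # k sampled coordinates
        if all(p[i] == q[i] for i in idx):
            hits += 1
    return hits / trials

est = hamming_lsh_collision(dist=10, d=100, k=3)    # theory: 0.9**3 = 0.729
```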
Preview
• General solution – locality sensitive hashing
• Implementation for the Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Uses a mathematical trick:
• p-stable distributions for the Lp distance; the Gaussian distribution for the L2 distance
Central limit theorem
v_1·(Gaussian) + v_2·(Gaussian) + … + v_n·(Gaussian) = (weighted Gaussians) = a weighted Gaussian.

Central limit theorem
v_1, …, v_n = real numbers; X_1, …, X_n = independent, identically distributed (i.i.d.).
What is v_1·X_1 + v_2·X_2 + … + v_n·X_n?

Central limit theorem (dot product → norm)
Σ_i v_i X_i = ||v||_2 · X, with X a standard Gaussian.

Central limit theorem (norm → distance)
For feature vectors u and v:
Σ_i u_i X_i − Σ_i v_i X_i = Σ_i (u_i − v_i) X_i = ||u − v||_2 · X.
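The 2-stability property sketched above can be checked numerically; this simulation assumes standard normal X_i:

```python
import numpy as np

# 2-stability: a Gaussian-weighted sum v1*X1 + ... + vn*Xn is distributed
# like ||v||_2 times a single standard Gaussian X.
rng = np.random.default_rng(1)
v = np.array([3.0, 4.0])              # ||v||_2 = 5
X = rng.normal(size=(200_000, 2))     # i.i.d. N(0, 1) draws
s = X @ v                             # one weighted sum per row
# s should look like N(0, 5**2)
```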
The full Hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v: the features vector, e.g. [34, 82, 21]
• a: d random numbers (one per dimension)
• b: a random phase in [0, w]
• w: the discretization step
The full Hashing – worked example
h_{a,b}(v) = ⌊(a·v + b) / w⌋ with phase b = 34 and discretization step w = 100: the projection a·v + b = 7944 falls in the cell [7900, 8000), on an axis marked 7800, 7900, 8000, 8100, 8200.
The full Hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v: the features vector (d-dimensional)
• a_1, …, a_d: drawn i.i.d. from a p-stable distribution
• b: a random phase in [0, w]
• w: the discretization step
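A sketch of the h_{a,b} family for L2, assuming the 2-stable (Gaussian) case; `make_p_stable_hash` and the parameter values are illustrative:

```python
import numpy as np

def make_p_stable_hash(dim, w, rng):
    """One L2 LSH function h_{a,b}(v) = floor((a.v + b) / w), with a drawn
    i.i.d. from N(0,1) (the 2-stable distribution) and b uniform in [0, w);
    nearby vectors land in the same cell with high probability."""
    a = rng.normal(size=dim)
    b = rng.uniform(0.0, w)
    return lambda v: int(np.floor((np.dot(a, v) + b) / w))

rng = np.random.default_rng(0)
hashes = [make_p_stable_hash(3, w=4.0, rng=rng) for _ in range(100)]
v = np.array([1.0, 2.0, 3.0])
near = v + 0.01                     # tiny perturbation
far = v + 100.0                     # far away in L2
near_collisions = sum(h(v) == h(near) for h in hashes)
far_collisions = sum(h(v) == h(far) for h in hashes)
```

Concatenating k such functions and repeating over L tables gives the full scheme from the earlier slides.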
Generalization: p-stable distributions
• L2: Central Limit Theorem → the Gaussian (normal) distribution
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → a p-stable distribution (e.g. Cauchy for L1)
p-stable summary
• Works for the r-nearest neighbor problem
• Generalizes to 0 < p ≤ 2
• Improves the query time: O(d·n^{1/(1+ε)}·log n) → O(d·n^{1/(1+ε)²}·log n)
(Latest results reported by email by Alexander Andoni.)
Parameters selection (for Euclidean space)
• Target: 90% success probability with the best query-time performance.
Parameters selection…
• A single projection hits an ε-nearest neighbor with Pr = p_1.
• k projections hit an ε-nearest neighbor with Pr = p_1^k.
• All L hashings fail to collide with Pr = (1 − p_1^k)^L.
• To ensure a collision (e.g. 1 − δ ≥ 90%):
  1 − (1 − p_1^k)^L ≥ 1 − δ  ⟹  L ≥ log(δ) / log(1 − p_1^k)
(Plot: query time vs. k, split into candidate extraction and candidate verification; k trades off rejecting non-neighbors against accepting neighbors.)
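The bound above gives the number of tables L directly; a small helper (illustrative name `tables_needed`) for a target retrieval probability 1 − δ:

```python
import math

def tables_needed(p1, k, delta=0.1):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta: the number of hash
    tables needed so a true near neighbor collides with the query in at
    least one table with probability >= 1 - delta."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

# e.g. p1 = 0.9 per projection, k = 18-bit hash keys, 90% recall target
L = tables_needed(0.9, 18, delta=0.1)
```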
Pros & Cons
+ Better query time than spatial data structures
+ Scales well to higher dimensions and larger data sizes (sub-linear dependence)
+ Predictable running time
− Extra storage overhead
− Inefficient for data with distances concentrated around the average
− Works best for the Hamming distance (although it can be generalized to Euclidean space)
− In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
− Requires the radius r to be fixed in advance
(From Piotr Indyk's slides)
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions, using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
(figure: a window of a given bandwidth around a point shifts toward the mean of the points it covers)
Progress: Mean-shift → LSH → optimal k,l → LSH data partition → LSH data struct
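The nutshell above can be sketched in a few lines: repeatedly replace a point by the mean of the data falling inside its window until it stops moving. The flat kernel, the stopping rule, and the toy data are illustrative choices, not the paper's exact procedure:

```python
import numpy as np

def mean_shift_point(x, data, bandwidth, n_iter=50, tol=1e-6):
    """Shift a single query point toward the nearest density mode.

    Each iteration replaces x with the mean of all data points inside a
    window of the given bandwidth (flat kernel)."""
    for _ in range(n_iter):
        dists = np.linalg.norm(data - x, axis=1)
        window = data[dists <= bandwidth]
        if len(window) == 0:
            break
        new_x = window.mean(axis=0)
        if np.linalg.norm(new_x - x) < tol:
            break
        x = new_x
    return x

# Two well-separated clusters in 2D; a start near the first cluster
# should converge to that cluster's mode.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.1, (50, 2)),
                  rng.normal(5, 0.1, (50, 2))])
mode = mean_shift_point(np.array([0.3, 0.3]), data, bandwidth=1.0)
```

The expensive part is the range query inside the loop, which is exactly what the LSH structure later accelerates.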
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region:
high density → small bandwidth; low density → large bandwidth.
Based on the kth nearest neighbor of the point, the bandwidth is h_i = ‖x_i − x_{i,k}‖, the distance from x_i to its kth nearest neighbor.
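A minimal sketch of this kth-nearest-neighbor bandwidth rule (brute-force pairwise distances; the function name, k, and the toy data are illustrative assumptions):

```python
import numpy as np

def adaptive_bandwidths(data, k):
    """Per-point bandwidth h_i = distance from x_i to its k-th nearest
    neighbor, so dense regions get small windows and sparse ones large."""
    d = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    d.sort(axis=1)          # column 0 is the point itself (distance 0)
    return d[:, k]

# Same number of points over a 1-unit and a 10-unit interval:
dense = np.linspace(0, 1, 20).reshape(-1, 1)
sparse = np.linspace(0, 10, 20).reshape(-1, 1)
h_dense = adaptive_bandwidths(dense, k=3)
h_sparse = adaptive_bandwidths(sparse, k=3)   # 10x larger, point for point
```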
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths: hs (spatial), hr (color)
3. Apply filtering
(3D figure)
Mean-shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
(figures: original → filtered → segmented)
Filtering: pixel value of the nearest mode
Mean-shift trajectories
Filtering examples
(figures: original squirrel → filtered; original baboon → filtered)
Mean-shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Segmentation examples
Mean-shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x we check, for k = 1…K, whether x(d_k) ≤ v_k; the K Boolean results select the point's cell
It partitions the data into cells.
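A small sketch of this structure with uniformly drawn cut values (the function names and parameters below are ours, not the paper's):

```python
import numpy as np
from collections import defaultdict

def build_lsh_partitions(data, K, L, rng):
    """Build L random partitions; each partition is defined by K pairs
    (d_k, v_k): a dimension index and a cut value.  A point's K-bit key
    of tests x[d_k] <= v_k assigns it to an axis-parallel cell."""
    tables = []
    for _ in range(L):
        dims = rng.integers(0, data.shape[1], size=K)
        cuts = rng.uniform(data.min(0)[dims], data.max(0)[dims])
        buckets = defaultdict(list)
        for i, x in enumerate(data):
            buckets[tuple(x[dims] <= cuts)].append(i)
        tables.append((dims, cuts, buckets))
    return tables

def query_union(tables, q):
    """Candidate set: the union of q's buckets over all L partitions."""
    cand = set()
    for dims, cuts, buckets in tables:
        cand.update(buckets.get(tuple(q[dims] <= cuts), []))
    return cand

rng = np.random.default_rng(1)
data = rng.normal(size=(200, 5))
tables = build_lsh_partitions(data, K=4, L=5, rng=rng)
candidates = query_union(tables, data[0])   # always contains point 0 itself
```

Distances are then computed only against the candidate set instead of all n points.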
Choosing the optimal K and L
• For a query q, the cost is the number of distance computations to the points in its buckets; choose K and L so that this number is smallest while the true neighbors are still found
• Large K → a smaller number of points in a cell C
• If L is too small, points might be missed; but if L is too big, the union C̄ of the query's L cells might include extra points
(equation residue for the expected numbers of points N_C and N_C̄ as functions of n, K, d and L omitted)
As L increases, C̄ increases but C decreases; C determines the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN for m randomly selected data points, recording each point's exact kNN distance (bandwidth)
• Choose an error threshold ε
• The optimal K and L should satisfy: the approximate distance returned by the LSH stays within the threshold of the exact distance
Choosing optimal K and L
• For each K, estimate the error for each L
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
(plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] with its minimum marked)
Data-driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
(figure: bucket distribution for uniform vs. data-driven cut points)
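The two cut-selection rules can be contrasted in a few lines (the function names are ours; the uniform variant is included only for comparison):

```python
import numpy as np

def uniform_cuts(data, K, rng):
    """Original LSH: cut values drawn uniformly over each chosen dimension's range."""
    dims = rng.integers(0, data.shape[1], size=K)
    return dims, rng.uniform(data.min(0)[dims], data.max(0)[dims])

def data_driven_cuts(data, K, rng):
    """Data-driven variant: each cut value is a coordinate of a randomly
    chosen data point, so cuts land where the data actually is and the
    buckets come out more evenly populated."""
    dims = rng.integers(0, data.shape[1], size=K)
    rows = rng.integers(0, data.shape[0], size=K)
    return dims, data[rows, dims]

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3))
dims, cuts = data_driven_cuts(data, K=8, rng=rng)
```

Every data-driven cut value is, by construction, an actual coordinate of some data point along the chosen dimension.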
Additional speedup
Assume that all points in C̄ will converge to the same mode (C̄ is like a type of aggregate).
Speedup results
(65,536 points; 1,638 points sampled; k = 100)
Food for thought
(figure: low dimension vs. high dimension)
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 cookies…
Summary
• LSH trades some accuracy for a gain in complexity
• Applications that involve massive data in high dimension require LSH's fast performance
• Extension of the LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• But at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni: andoni@mit.edu
  – Test over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Repeating L times
Secondary hashing
Support volume tuning: dataset size vs. storage volume.
The 2^k logical buckets are mapped by a simple secondary hash into M buckets of size B, with M·B = α·n, α = 2.
The above hashing is locality-sensitive:
• Probability(p, q in same bucket) = (1 − Distance(p, q)/dimensions)^k
(plots: collision probability vs. Distance(q, p_i) for k = 1 and k = 2)
Adopted from Piotr Indyk's slides
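A quick empirical check of this collision probability for the bit-sampling hash in Hamming space (positions are drawn without replacement here, which makes the true probability very slightly below the formula):

```python
import random

def sample_bits_hash(k, dim, rng):
    """Hamming-space LSH: project onto k randomly chosen bit positions."""
    positions = rng.sample(range(dim), k)
    return lambda v: tuple(v[i] for i in positions)

# Estimate Pr[h(p) == h(q)] and compare with (1 - Distance/dimensions)^k.
dim, k, dist, trials = 100, 3, 20, 5000
rng = random.Random(0)
p = [rng.randint(0, 1) for _ in range(dim)]
q = p[:]
for i in rng.sample(range(dim), dist):   # flip `dist` bits to fix the distance
    q[i] ^= 1
hits = 0
for _ in range(trials):
    h = sample_bits_hash(k, dim, rng)
    hits += h(p) == h(q)
estimate = hits / trials                 # near (1 - 20/100)**3 = 0.512
```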
Preview
• General solution – locality-sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Using a mathematical trick
• P-stable distribution for Lp distance; Gaussian distribution for L2 distance
Central limit theorem
v1, …, vn = real numbers
X1, …, Xn = independent identically distributed (i.i.d.) Gaussians
v1·X1 + v2·X2 + … + vn·Xn = (sum of weighted Gaussians) = a weighted Gaussian
Central limit theorem
∑_i v_i·X_i ∼ ‖v‖₂ · X
Dot product → norm
Norm → distance:
∑_i u_i·X_i − ∑_i v_i·X_i = ∑_i (u_i − v_i)·X_i ∼ ‖u − v‖₂ · X
(u and v are features vectors 1 and 2; ‖u − v‖₂ is the distance between them)
Dot product → distance
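This chain (dot product → norm → distance) can be verified numerically: for a Gaussian projection vector a, the difference of dot products a·u − a·v spreads exactly like ‖u − v‖₂ times a standard normal. The vectors below are arbitrary illustrative values:

```python
import numpy as np

# Empirical check of 2-stability: a·u − a·v = a·(u − v) ~ ||u − v||_2 · N(0, 1).
rng = np.random.default_rng(0)
u = np.array([3.4, 8.2, 2.1])            # features vector 1
v = np.array([2.2, 7.7, 4.2])            # features vector 2
A = rng.normal(size=(100_000, 3))        # 100k independent projection vectors a
proj_diff = A @ u - A @ v                # equals A @ (u - v)
emp_std = proj_diff.std()                # should match the L2 distance
true_dist = np.linalg.norm(u - v)        # ||u − v||_2
```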
The full Hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
• a: d random numbers; v: the features vector (dimension d)
• b: random phase in [0, w]
• w: discretization step
(figure: example features vector [34 82 21] being projected and binned)
The full Hashing (worked example)
h_{a,b}(v) = ⌊(a·v + b) / w⌋
• a = (a_1, …, a_d), i.i.d. from a p-stable distribution; v is the features vector
• b: random phase in [0, w]; w: discretization step
• Example: a·v = 7944, w = 100 (bin edges …, 7800, 7900, 8000, 8100, 8200, …), b = 34, so h(v) = ⌊(7944 + 34)/100⌋ = 79
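A sketch of this hash for L2, using the Gaussian as the 2-stable distribution (the class name and parameter values are ours):

```python
import numpy as np

class PStableHash:
    """One L2 LSH function h_{a,b}(v) = floor((a·v + b) / w), with
    a ~ N(0, I) (the Gaussian is 2-stable), b a random phase uniform
    in [0, w), and w the discretization step."""
    def __init__(self, dim, w, rng):
        self.a = rng.normal(size=dim)
        self.b = rng.uniform(0, w)
        self.w = w

    def __call__(self, v):
        return int(np.floor((self.a @ v + self.b) / self.w))

rng = np.random.default_rng(0)
hashes = [PStableHash(dim=3, w=4.0, rng=rng) for _ in range(200)]
u = np.array([3.4, 8.2, 2.1])
near = u + 0.01                # a tiny perturbation of u
far = u + 100.0                # much farther away than w
near_coll = sum(h(u) == h(near) for h in hashes)   # nearly all 200 collide
far_coll = sum(h(u) == h(far) for h in hashes)     # very few collide
```

Points within the discretization step almost always share a bin, while distant points almost never do, which is exactly the locality-sensitive property.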
Generalization: P-stable distributions
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → a p-stable distribution (e.g. Cauchy for L1)
• L2: Central Limit Theorem → the Gaussian (normal) distribution
P-Stable summary
• Works for Lp, and generalizes to 0 < p ≤ 2
• Improves the query time: O(d·n^(1/(1+ε)) · log n) → O(d·n^(1/(1+ε)²) · log n)
r - Nearest Neighbor
Latest results
Reported in an e-mail by Alexander Andoni
Parameters selection
• 90% probability: best query time performance
• For Euclidean space

Parameters selection…
For Euclidean space:
• A single projection hits an ε-nearest neighbor with Pr = p₁
• k projections hit an ε-nearest neighbor with Pr = p₁^k
• L hashings fail to collide with Pr = (1 − p₁^k)^L
• To ensure collision (e.g. with probability 1 − δ ≥ 90%):
  1 − (1 − p₁^k)^L ≥ 1 − δ  ⇒  L ≥ log(δ) / log(1 − p₁^k)
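The last inequality gives the number of tables directly; a small helper (the function name and the example values p₁ = 0.8, k = 10, δ = 0.1 are ours):

```python
import math

def required_tables(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta: the number of
    hash tables needed so that an ε-NN whose single-projection collision
    probability is p1 is retrieved with probability at least 1 - delta."""
    return math.ceil(math.log(delta) / math.log(1 - p1 ** k))

L = required_tables(p1=0.8, k=10, delta=0.1)
```

Increasing k makes each table more selective (p₁^k shrinks), so L must grow to keep the overall success probability at 1 − δ.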
Parameters selection…
Accept neighbors; reject non-neighbors.
(plot: query time vs. k, trading off candidate-verification time, which falls with k, against candidate-extraction time, which grows with k)
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides
LSH – Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash function construction and parameter tuning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola and T. Darrell)
• Finding sensitive hash functions
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni and P. Meer)
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?
Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola and T. Darrell
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angles space: d_θ(θ¹, θ²) = Σ_{i=1..m} (1 − cos(θ¹_i − θ²_i))
Example-based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
(figure: input query → find KNN in database of examples → output: average angles of KNN)
The algorithm flow
Input query → Features extraction → Processed query → PSH (LSH) over the database of examples → LWR (regression) → Output: match
The image features
Image features are multi-scale edge histograms.
(figure: edge histograms over image regions A and B; equation residue omitted)
PSH: The basic assumption
There are two metric spaces here: feature space (d_x) and parameter space (d_θ).
We want similarity to be measured in the angles space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
(figure: a query q mapped between the parameters space (angles) and the feature space)
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
Label pairs of examples with similar angles
Define hash functions h on feature space
Predict the labeling of similar/non-similar examples by using h.
Compare the labelings.
If the labeling by h is good, accept h; else change h.
PSH as a classification problem
A pair of examples (x_i, θ_i), (x_j, θ_j) is labeled (here r = 0.25):
  y_ij = +1 if d_θ(θ_i, θ_j) < r
  y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε)·r
(figure: four example pairs labeled +1, +1, −1, −1)
A binary hash function on the features:
  h_T(x) = +1 if T(x), −1 otherwise
Predict the labels:
  ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best T that predicts the true labeling, subject to the probability constraints: h_T will place both examples in the same bin, or separate them.
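A toy sketch of this selection step. A 1-D "angle", a single scalar feature correlated with it, and a same-side-of-threshold hash are all illustrative assumptions; the paper selects over many candidate hash functions, not one threshold:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: a 1-D angle theta and a noisy feature correlated with it.
theta = rng.uniform(0, 1, 300)
feat = theta + rng.normal(0, 0.05, 300)

r = 0.1   # pairs closer than r in angle space are labeled +1
pairs = [(i, j) for i in range(0, 300, 7) for j in range(1, 300, 11)]
y = np.array([1 if abs(theta[i] - theta[j]) < r else -1 for i, j in pairs])

def agreement(T):
    """Fraction of pairs where the hash label (same side of T) matches y."""
    yhat = np.array([1 if (feat[i] > T) == (feat[j] > T) else -1
                     for i, j in pairs])
    return float((yhat == y).mean())

# Sweep candidate thresholds and keep the most parameter-sensitive one.
Ts = np.linspace(0, 1, 21)
best_T = max(Ts, key=agreement)
```

Each candidate T is scored by how well its bin assignments in feature space reproduce the similarity labels from angle space, which is the classification view of PSH.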
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
The above hashing is locality-sensitive:

• Probability (p, q in same bucket) = (1 - Hamming(p, q) / d)^k

[Figure, adopted from Piotr Indyk's slides: collision probability Pr vs. Distance(q, pi), for k = 1 and k = 2 over d dimensions - the probability decays faster with distance for larger k]
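The collision probability above has a simple closed form for bit-sampling hashes. A minimal sketch (function and variable names are my own, not from the slides):

```python
import random

def sample_bits_hash(k, d, seed=0):
    """Bit-sampling LSH for Hamming space: keep k randomly chosen coordinates."""
    rng = random.Random(seed)
    idx = [rng.randrange(d) for _ in range(k)]  # sampled with replacement
    return lambda p: tuple(p[i] for i in idx)

def collision_prob(hamming, d, k):
    """Pr[h(p) == h(q)] = (1 - Hamming(p, q) / d) ** k."""
    return (1.0 - hamming / d) ** k

d, k = 16, 4
h = sample_bits_hash(k, d, seed=7)
p = [0] * d
q = [1] + [0] * (d - 1)          # Hamming distance 1 from p
print(collision_prob(1, d, k))   # (15/16)**4, about 0.77
```

Each sampled bit agrees with probability 1 - Hamming/d, and the k samples are independent, which gives the product form.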
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Using a mathematical trick: p-stable distributions for Lp distance (the Gaussian distribution for L2 distance)
Central limit theorem

(Weighted Gaussians) = Weighted Gaussian:

v1·X1 + v2·X2 + … + vn·Xn  ~  ||v||_2 · X

where v1, …, vn are real numbers, X1, …, Xn are independent, identically distributed (i.i.d.) standard Gaussians, and X ~ N(0, 1).

• Dot product ↔ norm: Σᵢ vᵢXᵢ is distributed as ||v||_2 · X.
• Dot-product distance ↔ norm distance: for two feature vectors u and v,

Σᵢ uᵢXᵢ − Σᵢ vᵢXᵢ = Σᵢ (uᵢ − vᵢ)Xᵢ  ~  ||u − v||_2 · X
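The 2-stability property above is easy to check empirically. A minimal sketch (names are my own):

```python
import math
import random

def stable_projection(v, rng):
    """Dot product of v with i.i.d. standard Gaussians: sum_i v_i * X_i."""
    return sum(vi * rng.gauss(0.0, 1.0) for vi in v)

rng = random.Random(0)
v = [3.0, 4.0]                    # ||v||_2 = 5
n = 20000
samples = [stable_projection(v, rng) for _ in range(n)]
# 2-stability: the projections are distributed as ||v||_2 * N(0, 1),
# so their standard deviation should be close to 5.
std = math.sqrt(sum(s * s for s in samples) / n)
print(round(std, 1))
```

This is exactly why a random Gaussian projection preserves L2 distances in distribution: replacing v with u - v gives a projection whose spread is ||u - v||_2.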
The full Hashing

h_{a,b}(v) = ⌊(a · v + b) / w⌋

• v - features vector (e.g. v = [34, 82, 21, …] in R^d)
• a - d random numbers, i.i.d. from a p-stable distribution
• b - random phase, drawn uniformly from [0, w]
• w - discretization step

Worked example: suppose a · v = 7944, b = 34 and w = 100. The projection line is cut into bins [… 7800, 7900, 8000, 8100, 8200 …]; the shifted projection 7944 + 34 = 7978 falls into the bin starting at 7900, so h(v) = ⌊7978 / 100⌋ = 79.
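The hash above can be sketched in a few lines (function names are my own; the formula is the slides' h_{a,b}):

```python
import math
import random

def make_l2_hash(d, w, seed=0):
    """h_{a,b}(v) = floor((a . v + b) / w) with a_i ~ N(0, 1) and b ~ U[0, w]."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]   # d random numbers
    b = rng.uniform(0.0, w)                        # random phase in [0, w]
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h

h = make_l2_hash(d=3, w=100.0, seed=1)
v_near1 = [34.0, 82.0, 21.0]
v_near2 = [34.5, 81.5, 21.5]   # close in L2, so very likely the same bucket
print(h(v_near1), h(v_near2))
```

Larger w makes nearby points collide more often but discriminates distant points less; in practice w is tuned together with k and L.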
Generalization: P-Stable distribution

• L2: Central Limit Theorem → Gaussian (normal) distribution
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution (e.g. the Cauchy distribution for L1)
P-Stable summary

• Works for the r-nearest-neighbor problem; generalizes to any Lp with 0 < p ≤ 2
• Improves query time: from O(d·n^(1/(1+ε))·log n) to O(d·n^(1/(1+ε)²)·log n)
  (latest results, reported by email by Alexander Andoni)
Parameters selection

For Euclidean space, aim e.g. for 90% collision probability at the best query-time performance:

• A single projection hits an ε-nearest neighbor with Pr = p1
• k projections hit an ε-nearest neighbor with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure collision with probability at least 1 − δ (e.g. 1 − δ ≥ 90%):

1 − (1 − p1^k)^L ≥ 1 − δ,  i.e.  L ≥ log(δ) / log(1 − p1^k)

Trade-off in k: a larger k rejects more non-neighbors per table (cheaper candidate extraction) but needs more tables, so candidate-extraction and candidate-verification times must be balanced to minimize total query time.
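The bound on L above is directly computable. A minimal sketch (the function name is my own):

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta (collision w.p. >= 1 - delta)."""
    miss_one_table = 1.0 - p1 ** k   # probability that one table misses the neighbor
    return math.ceil(math.log(delta) / math.log(miss_one_table))

# e.g. p1 = 0.9 per projection, k = 10 projections, target 90% success (delta = 0.1)
L = tables_needed(0.9, 10, 0.1)
print(L)  # 6 tables suffice
```

Note how quickly L grows as k increases: p1^k shrinks geometrically, so each extra projection bit must be paid for with more tables.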
Pros & Cons (from Piotr Indyk's slides)

Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time

Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
Conclusion

• …but at the end, everything depends on your data set
• Try it at home:
  - Visit http://web.mit.edu/andoni/www/LSH/index.html
  - Email Alex Andoni (andoni@mit.edu)
  - Test over your own data (C code under Red Hat Linux)
LSH - Applications

• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression - vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation

• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline

"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola and T. Darrell
• Finding sensitive hash functions

"Mean Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni and P. Meer
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem

"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola and T. Darrell

Given an image x, what are the pose parameters θ in this image, i.e. angles of joints, orientation of the body, etc.?
Ingredients

• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor - edge detector
• Distance metric in feature space: d_X
• Distance metric in angle space:

d_θ(θ1, θ2) = Σᵢ₌₁ᵐ (1 − cos(θ1,ᵢ − θ2,ᵢ))
Example based learning

• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query

The algorithm flow:
input query → features extraction → processed query → PSH (LSH) over the database of examples → KNN → LWR (regression) → output: match (average angles of the KNN)
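The example-based pipeline above can be sketched end-to-end with a brute-force KNN step standing in for PSH, and plain averaging standing in for LWR (names and toy data are my own):

```python
import math

def knn_pose(query_feat, database, k):
    """database: list of (features, angles) examples.
    Return the mean angles of the k examples nearest in feature space (L2)."""
    nearest = sorted(database, key=lambda ex: math.dist(query_feat, ex[0]))[:k]
    m = len(nearest[0][1])
    return [sum(ex[1][i] for ex in nearest) / k for i in range(m)]

db = [([0.0, 0.0], [0.10, 1.0]),
      ([0.1, 0.1], [0.20, 1.2]),
      ([5.0, 5.0], [3.00, 0.0])]   # a far-away pose that should be ignored
est = knn_pose([0.05, 0.05], db, k=2)
print([round(a, 2) for a in est])  # averages the two nearby examples
```

In the actual system the sorted scan is replaced by the PSH lookup, which is what makes the search sub-linear.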
The image features

Image features are multi-scale edge histograms.

Feature Extraction → PSH → LWR
PSH: The basic assumption

There are two metric spaces here: feature space (with metric d_X) and parameter space (with metric d_θ). We want similarity to be measured in the angle space, whereas LSH works on the feature space.

• Assumption: the feature space is closely related to the parameter space
Insight: Manifolds

• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.

The parameter space (angles) and the feature space can be seen as two such manifolds, with the query q mapped between them. Is this magic?
Parameter Sensitive Hashing (PSH)

The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle space.

• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
PSH as a classification problem

A pair of examples (xᵢ, θᵢ), (xⱼ, θⱼ) is labeled (e.g. r = 0.25):

yᵢⱼ = +1  if d_θ(θᵢ, θⱼ) ≤ r
yᵢⱼ = −1  if d_θ(θᵢ, θⱼ) ≥ (1 + ε)·r

Feature Extraction → PSH → LWR
A binary hash function on features:

h_T(x) = +1  if x ≥ T (a threshold on a feature),  −1 otherwise

Predict the labels:

ŷ_h(xᵢ, xⱼ) = +1  if h_T(xᵢ) = h_T(xⱼ),  −1 otherwise
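The threshold hash and its pair-label prediction fit in a few lines (function names are my own; the rule is the slides' h_T and ŷ):

```python
def stump_hash(feature_index, threshold):
    """h(x) = +1 if x[feature_index] >= threshold, else -1."""
    return lambda x: 1 if x[feature_index] >= threshold else -1

def predicted_pair_label(h, xi, xj):
    """y_hat = +1 if h puts both examples in the same bin, else -1."""
    return 1 if h(xi) == h(xj) else -1

h = stump_hash(0, 0.5)
print(predicted_pair_label(h, [0.9], [0.7]),   # same bin -> +1
      predicted_pair_label(h, [0.9], [0.1]))   # separated -> -1
```

Selecting hash functions then reduces to scoring each candidate (feature, threshold) pair by how often its predicted labels agree with the true angle-based labels.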
Find the best threshold T that predicts the true labeling subject to the probability constraints: h_T(x) will place both examples of a pair in the same bin, or separate them.
Local Weighted Regression (LWR)

• Given a query image, PSH returns KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:

θ₀ = argmin_θ Σ_{xᵢ ∈ N(x)} d_θ(g(xᵢ), θ) · K(d_X(xᵢ, x))

where N(x) are the neighbors of the query x, g(xᵢ) are their known angles, and the kernel K turns feature-space distance into a weight.
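A kernel-weighted average captures the spirit of the step above. This is a simplified stand-in for the paper's locally-weighted regression, with a Gaussian kernel and names of my own choosing:

```python
import math

def kernel_weighted_angles(query_feat, neighbors, bandwidth):
    """neighbors: list of (features, angles) returned by the KNN search.
    Gaussian-kernel weighted mean of the neighbors' angles."""
    total_w = 0.0
    acc = [0.0] * len(neighbors[0][1])
    for feat, angles in neighbors:
        # weight = K(d_X(x_i, x)): nearer neighbors count more
        w = math.exp(-math.dist(query_feat, feat) ** 2 / (2 * bandwidth ** 2))
        total_w += w
        acc = [s + w * a for s, a in zip(acc, angles)]
    return [s / total_w for s in acc]

nbrs = [([0.0], [1.0]), ([1.0], [3.0])]
est = kernel_weighted_angles([0.0], nbrs, bandwidth=1.0)
print(round(est[0], 3))  # pulled toward the nearer neighbor's angle (below 2.0)
```

The true LWR minimizes the angle-space distance d_θ rather than averaging in Euclidean angle coordinates, which matters when angles wrap around.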
Results

Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples; PSH searched only 34 of the data per query
• Without selection, 40 bits and 1,000 hash tables would have been needed

Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket.
Results - real data

• 800 images
• Processed by a segmentation algorithm
• 13 of the data were searched
• Some interesting mismatches occur (figures)
Fast pose estimation - summary

• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging

Food for Thought

• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)

• Given: n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere covers the query q

Courtesy of Mohamad Hegaze
"Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni and P. Meer

Motivation

• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Outline

• Mean-shift in a nutshell + examples

Our scope:
• Mean-shift in high dimensions - using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell

Each iteration moves a point toward the (kernel-weighted) mean of the data inside its bandwidth window, so points climb toward modes of the density.

[Navigation: Mean-shift | LSH | optimal k,l | LSH data partition | LSH data struct]

KNN in mean-shift

• The bandwidth should be inversely proportional to the density in the region: high density - small bandwidth; low density - large bandwidth
• Based on the kth nearest neighbor of the point: the bandwidth is hᵢ = ||xᵢ − xᵢ,ₖ||, the distance from xᵢ to its kth neighbor
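The per-point bandwidth rule above is a one-liner per point with a brute-force neighbor scan (names and toy data are my own):

```python
import math

def adaptive_bandwidths(points, k):
    """h_i = distance from x_i to its kth nearest neighbor:
    small in dense regions, large in sparse regions."""
    hs = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        hs.append(dists[k - 1])
    return hs

pts = [[0.0], [0.1], [0.2], [5.0]]   # three dense points and one outlier
hs = adaptive_bandwidths(pts, k=2)
print(hs[0] < hs[3])   # True: the outlier gets a much larger bandwidth
```

The O(n²) scan here is exactly the bottleneck the paper replaces with approximate KNN via LSH.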
Adaptive mean-shift vs. non-adaptive (figure comparison)
Image segmentation algorithm

1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths: hs (spatial), hr (color)
3. Apply filtering: each pixel takes the value of the nearest mode

original → filtered → segmented

"Mean-shift: A Robust Approach Towards Feature Space Analysis", D. Comaniciu et al., TPAMI '02
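The core mean-shift iteration behind the filtering step can be sketched with a flat kernel (names and toy data are my own; the paper uses smooth kernels and adaptive bandwidths):

```python
import math

def mean_shift_mode(start, points, h, iters=100):
    """Flat-kernel mean-shift: repeatedly move to the mean of the points
    within bandwidth h until the trajectory converges to a mode."""
    y = list(start)
    for _ in range(iters):
        window = [p for p in points if math.dist(p, y) <= h]
        y_new = [sum(coord) / len(window) for coord in zip(*window)]
        if math.dist(y_new, y) < 1e-9:
            break
        y = y_new
    return y

data = [[0.0], [0.2], [0.4], [5.0], [5.2]]   # two 1-D clusters
left = mean_shift_mode([0.0], data, h=1.0)
right = mean_shift_mode([5.2], data, h=1.0)
print(round(left[0], 3), round(right[0], 3))  # the two cluster means
```

Filtering runs this from every pixel's feature vector and assigns the pixel the value of the mode it converges to; pixels converging to the same mode form one segment.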
Mean-shift trajectories (figure)

Filtering examples: original squirrel → filtered; original baboon → filtered

Segmentation examples

"Mean-shift: A Robust Approach Towards Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions

• Computational curse of dimensionality: expensive range queries - implemented with LSH
• Statistical curse of dimensionality: sparseness of the data - variable bandwidth
LSH-based data structure

• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x we check whether x_{d_k} ≤ v_k; the K test results partition the data into cells
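The coordinate-cut partitioning above is easy to sketch (function names are my own; the (d_k, v_k) pairs are the slides' notation):

```python
import random

def make_partition(dim, K, lo, hi, rng):
    """One random partition: K pairs (d_k, v_k) of coordinate index and cut value."""
    return [(rng.randrange(dim), rng.uniform(lo, hi)) for _ in range(K)]

def cell_of(x, partition):
    """The cell label of x: the K-bit pattern of the tests  x[d_k] <= v_k."""
    return tuple(x[d] <= v for d, v in partition)

rng = random.Random(0)
partitions = [make_partition(dim=2, K=3, lo=0.0, hi=1.0, rng=rng)
              for _ in range(4)]                     # L = 4 partitions
x = [0.3, 0.7]
labels = [cell_of(x, p) for p in partitions]
print(len(labels), len(labels[0]))  # 4 cell labels, 3 bits each
```

A query's candidate neighbors are the union, over the L partitions, of the points sharing its cell label.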
Choosing the optimal K and L

• For a query q, the goal is the smallest number of distance computations to the points in its buckets
• Large K - smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, the union of cells might include extra points
As L increases, the union of cells grows but the intersection of cells shrinks; K determines the resolution of the data structure.
Choosing optimal K and L

• Determine accurately the KNN distance (bandwidth) for m randomly-selected data points
• Choose an error threshold ε
• The optimal K and L should satisfy: the approximate KNN distance stays within the error threshold of the true one
• For each K, estimate the error for every L; in one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K)) over K

[Figures: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] - the time curve has a clear minimum]
Data driven partitions

• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value

[Figure: uniform vs. data-driven points/bucket distribution]
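The data-driven cut suggestion above is a two-line change to how cut values are drawn (names are my own):

```python
import random

def data_driven_cut(points, rng):
    """Pick a random data point and one of its coordinates as the cut value,
    so cuts concentrate where the points actually lie."""
    p = rng.choice(points)
    d = rng.randrange(len(p))
    return d, p[d]

rng = random.Random(3)
pts = [[0.1, 10.0], [0.2, 20.0], [0.3, 30.0]]
d, v = data_driven_cut(pts, rng)
print(d, v in [p[d] for p in pts])  # the cut value always comes from the data
```

Because cuts follow the data distribution, buckets end up with a more even number of points than uniform cuts give on clustered data.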
Additional speedup

Assume that all points in the intersection cell will converge to the same mode (the intersection cell acts like a type of aggregate).

Speedup results: 65,536 points, 1,638 points sampled, k = 100
A thought for food… (low dimension vs. high dimension)

• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold? Intuitively, dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning itself requires KNN

15:30 - cookies…
Summary

• LSH suggests a compromise: accuracy is traded for a gain in complexity
• Applications that involve massive data in high dimensions require the LSH's fast performance
• The LSH extends to different spaces (PSH)
• The LSH parameters and hash functions can be learned for different applications
Conclusion

• …but at the end, everything depends on your data set
• Try it at home:
  - Visit http://web.mit.edu/andoni/www/LSH/index.html
  - Email Alex Andoni (andoni@mit.edu)
  - Test over your own data (C code under Red Hat Linux)

Thanks

• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
• Input: a query image with unknown angles (parameters)
• A database of human poses with known angles
• An image feature extractor – an edge detector
• A distance metric in feature space, d_x
• A distance metric in angle space:
  d_θ(θ¹, θ²) = Σ_{i=1}^{m} [1 − cos(θ¹_i − θ²_i)]
Example-based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find its KNN in the database of examples → output: average angles of the KNN
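The example-based pipeline above can be sketched in a few lines. This is a minimal illustration, not the paper's system: brute-force KNN stands in for PSH, the feature arrays are hypothetical, and the angles are averaged on the unit circle since they are circular quantities:

```python
import numpy as np

def knn_pose_estimate(query_feat, db_feats, db_angles, k=5):
    """Estimate pose angles as the circular mean of the k nearest examples.

    query_feat: (d,) feature vector of the query image
    db_feats:   (n, d) features of the example database
    db_angles:  (n, m) known joint angles (radians) per example
    """
    # Brute-force KNN in feature space (the paper replaces this with PSH).
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    nn = np.argsort(dists)[:k]
    # Angles are circular, so average on the unit circle, not arithmetically.
    s = np.sin(db_angles[nn]).mean(axis=0)
    c = np.cos(db_angles[nn]).mean(axis=0)
    return np.arctan2(s, c)
```

The circular mean avoids the wrap-around artifact of a plain arithmetic average (e.g., averaging 359° and 1° should give 0°, not 180°).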
The algorithm flow:
Input query → feature extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms.
(Pipeline: Feature Extraction → PSH → LWR)
PSH: The basic assumption
There are two metric spaces here: feature space (d_x) and parameter space (d_θ). We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
(figure: the parameter space (angles) mapped to the feature space; query q)
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
PSH as a classification problem
Labels (r = 0.25): a pair of examples (x_i, x_j) is labeled
  y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
  y_ij = −1 if d_θ(θ_i, θ_j) ≥ (1 + ε) r
A binary hash function on features:
  h_T(x) = +1 if x_T ≥ T, −1 otherwise
Predict the labels:
  ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best T that predicts the true labeling subject to the probability constraints: h_T should place both examples of a similar pair in the same bin, and separate dissimilar pairs.
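The selection loop above can be sketched as follows. This is a toy illustration with hypothetical names; the random search over (dimension, threshold) candidates is my simplification, not the paper's exact procedure:

```python
import numpy as np

def pair_accuracy(feats, pairs, labels, dim, T):
    """Accuracy of h_T (threshold T on feature `dim`) as a pair classifier.

    pairs:  list of (i, j) index pairs
    labels: +1 if the pair has similar angles, -1 otherwise
    h predicts +1 when both examples fall on the same side of T.
    """
    side = np.where(feats[:, dim] >= T, 1, -1)
    pred = np.array([1 if side[i] == side[j] else -1 for i, j in pairs])
    return np.mean(pred == np.array(labels))

def select_hashes(feats, pairs, labels, n_candidates=50, min_acc=0.7, seed=0):
    """Sample random (dim, T) candidates and keep the accurate ones."""
    rng = np.random.default_rng(seed)
    kept = []
    for _ in range(n_candidates):
        dim = rng.integers(feats.shape[1])
        T = rng.uniform(feats[:, dim].min(), feats[:, dim].max())
        if pair_accuracy(feats, pairs, labels, dim, T) >= min_acc:
            kept.append((dim, T))
    return kept
```

Each kept (dim, T) pair is one parameter-sensitive bit; k of them form one hash key.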
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
  θ̂(x₀) = argmin_θ Σ_{x_i ∈ N(x₀)} d_θ(g(x_i), θ) · K(d_x(x_i, x₀))
  where K is a kernel turning feature-space distance into a weight
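The argmin above has no closed form under the angular distance; a common approximation (my sketch, not the paper's exact robust solver) replaces it with a kernel-weighted circular mean of the neighbors' angles:

```python
import numpy as np

def lwr_angles(query_feat, nn_feats, nn_angles, h=1.0):
    """Kernel-weighted circular mean of the neighbours' angles.

    Approximates argmin_theta sum_i d_theta(g(x_i), theta) * K(d_x(x_i, x0))
    with a Gaussian kernel K over feature-space distance.
    """
    d = np.linalg.norm(nn_feats - query_feat, axis=1)
    w = np.exp(-(d / h) ** 2)               # closer neighbours weigh more
    w = w / w.sum()
    s = (w[:, None] * np.sin(nn_angles)).sum(axis=0)
    c = (w[:, None] * np.cos(nn_angles)).sum(axis=0)
    return np.arctan2(s, c)
```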
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, and B is the max number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Results – real data: interesting mismatches
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general: some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d centered at P = p1, …, pn, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere 'covers' the query q
(figure: query q inside the sphere of radius ri around pi)
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g., in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
"Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni, and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
(figure: a bandwidth window centered on a point)
Roadmap: Mean-shift → LSH → optimal k, l → LSH data partition → LSH data structure
KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density → small bandwidth, low density → large bandwidth
• Based on the kth nearest neighbor of the point: the bandwidth is the distance from the point to its kth nearest neighbor
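A sketch of the adaptive-bandwidth idea (brute-force distances here; in the paper the expensive k-NN queries are exactly what LSH accelerates, and the function names are mine):

```python
import numpy as np

def knn_bandwidths(X, k):
    """Per-point bandwidth = distance to the k-th nearest neighbour:
    dense regions get small windows, sparse regions large ones."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D.sort(axis=1)               # column 0 is the point itself (distance 0)
    return D[:, k]

def mean_shift_step(y, X, h):
    """One mean-shift update with a flat kernel of radius h:
    move y to the mean of the data points inside its window."""
    inside = np.linalg.norm(X - y, axis=1) <= h
    return X[inside].mean(axis=0)
```

Iterating `mean_shift_step` from each data point (with that point's own bandwidth) moves it uphill in density until it converges to a mode.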
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths: h_s (spatial) and h_r (color)
3. Apply filtering
(figure: the 3D feature space)
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
(figures: original → filtered → segmented)
Filtering: each pixel takes the value of its nearest mode
Mean-shift trajectories
Filtering examples
(figures: original squirrel → filtered; original baboon → filtered)
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Segmentation examples
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x we check whether x_{d_k} ≤ v_k, for all K pairs
• This partitions the data into cells
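A sketch of such a structure (the names are mine; a real implementation hashes the K-bit pattern into buckets rather than scanning the data, but the cell logic is the same):

```python
import numpy as np

def build_partitions(X, K, L, rng):
    """L random partitions, each defined by K (dimension, cut-value) pairs."""
    parts = []
    for _ in range(L):
        dims = rng.integers(0, X.shape[1], size=K)
        cuts = rng.uniform(X[:, dims].min(axis=0), X[:, dims].max(axis=0))
        parts.append((dims, cuts))
    return parts

def cell_of(x, dims, cuts):
    """A point's cell: the K-bit pattern of the tests x[d_k] <= v_k."""
    return tuple((x[dims] <= cuts).tolist())

def candidates(q, X, parts):
    """Indices of points sharing a cell with q in at least one partition."""
    idx = set()
    for dims, cuts in parts:
        qc = cell_of(q, dims, cuts)
        idx.update(i for i, x in enumerate(X) if cell_of(x, dims, cuts) == qc)
    return idx
```

The query's neighborhood is then searched only within `candidates(q, X, parts)`, the union of its L cells.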
Choosing the optimal K and L
• For a query q, we want to compute the smallest possible number of distances, namely only to the points in its buckets
• Large K → a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• The query scans the union of its cells, C = ∪_l C_l
• As L increases, |C| increases but the distance d_C decreases; K and L determine the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points, recording each point's exact k-NN distance (bandwidth)
• Choose an error threshold ε
• The optimal K and L should keep the approximate distance within the threshold of the exact one
Choosing optimal K and L
• For each K, estimate the error
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
(figures: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)]; the minimum gives the optimal pair)
Data-driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
(figure: bucket distributions for uniform vs. data-driven cut points)
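The two cut-selection rules compare like this; a small sketch (function names are mine):

```python
import numpy as np

def uniform_cut(X, dim, rng):
    """Original LSH: cut value drawn uniformly over the data range."""
    return rng.uniform(X[:, dim].min(), X[:, dim].max())

def data_driven_cut(X, dim, rng):
    """Suggested variant: the coordinate of a randomly chosen data point,
    so cuts fall where the data is dense and buckets fill more evenly."""
    return X[rng.integers(len(X)), dim]
```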
Additional speedup
• Assume that all points in a cell C will converge to the same mode (C acts as a kind of aggregate)
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
(figure: low dimension vs. high dimension)
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 cookies…
Summary
• LSH trades some accuracy for a gain in complexity
• Applications that involve massive data in high dimensions require LSH's fast performance
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test it on your own data
  (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Central limit theorem
A sum of weighted Gaussians is again a weighted Gaussian:
  v1·X1 + v2·X2 + … + vn·Xn, where
  v1, …, vn are real numbers and
  X1, …, Xn are independent, identically distributed (i.i.d.) Gaussians.
Dot product → norm:
  Σ_i v_i X_i ≈ ||v||₂ · X
Norm → distance: for two feature vectors u and v,
  Σ_i u_i X_i − Σ_i v_i X_i = Σ_i (u_i − v_i) X_i ≈ ||u − v||₂ · X
so the difference of the dot products behaves like the ℓ2 distance between the feature vectors, scaled by a single standard Gaussian X.
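The stability property is easy to check numerically; assuming i.i.d. standard normal X_i, the projection Σ_i v_i X_i should have standard deviation ||v||₂:

```python
import numpy as np

# Empirical check of 2-stability: for i.i.d. standard Gaussians X_i,
# sum_i v_i * X_i is distributed as ||v||_2 times one standard Gaussian.
rng = np.random.default_rng(0)
v = np.array([3.0, 4.0])              # ||v||_2 = 5
samples = rng.normal(size=(200_000, 2)) @ v
print(samples.std())                  # close to 5.0
```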
The full hashing
  h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v: the features vector (d components)
• a: d random numbers, i.i.d. from a p-stable distribution
• b: a random phase in [0, w]
• w: the discretization step
Example: with b = 34 and w = 100, a features vector whose projection gives a·v + b = 7944 falls in the bucket [7900, 8000), i.e., hash value 79.
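Putting the pieces together, one hash function of this family can be sketched as follows (a minimal illustration; the class name is mine):

```python
import numpy as np

class PStableHash:
    """One LSH function h_{a,b}(v) = floor((a . v + b) / w).
    With Gaussian a (2-stable), collisions are sensitive to l2 distance."""
    def __init__(self, d, w, rng):
        self.a = rng.normal(size=d)    # d i.i.d. N(0, 1) projection weights
        self.b = rng.uniform(0, w)     # random phase in [0, w]
        self.w = w                     # discretization step
    def __call__(self, v):
        return int(np.floor((self.a @ v + self.b) / self.w))
```

Nearby vectors project to nearby values of a·v + b, so they usually land in the same width-w bucket; a full index concatenates k such functions per table, over L tables.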
Generalization: p-stable distributions
• L2 ← Central Limit Theorem ← Gaussian (normal) distribution
• Lp, p ∈ (0, 2] ← Generalized Central Limit Theorem ← p-stable distributions (e.g., Cauchy for L1)
P-stable summary
• Generalizes LSH to 0 < p ≤ 2
• Improves query time:
  Query time = O(d·n^{1/(1+ε)}·log n), improved to O(d·n^{1/(1+ε)²}·log n)
r-Nearest Neighbor: latest results, reported by e-mail by Alexander Andoni
Parameters selection
For Euclidean space:
• 90% success probability with the best query-time performance
• A single projection hits an r-nearest neighbor with probability Pr = p1
• k projections (one table) hit it with probability Pr = p1^k
• All L hashings fail to collide with probability Pr = (1 − p1^k)^L
• To ensure a collision (e.g., with probability 1 − δ ≥ 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ  ⇒  L ≥ log(δ) / log(1 − p1^k)
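The inequality above gives the number of tables directly; a small helper (the function name is mine):

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta,
    i.e. L >= log(delta) / log(1 - p1**k)."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

# e.g. p1 = 0.9, k = 10, delta = 0.1 (90% success): p1**10 ~ 0.349,
# so L = ceil(log 0.1 / log 0.651) = 6 tables.
```

Larger k makes each table more selective (fewer false candidates) but drives L, and hence storage, up: this is the extraction/verification trade-off shown on the next slide.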
…Parameters selection
(figure: accept neighbors / reject non-neighbors; time vs. k for candidate extraction and candidate verification)
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for the Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test it on your own data
  (C code, under Red Hat Linux)
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Filtering examples
[Figures: original squirrel → filtered; original baboon → filtered]
Segmentation examples
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (dk, vk) — a coordinate index and a cut value
• For each point x, evaluate the K inequalities x_{dk} ≤ vk; the resulting bit string selects a cell
• This partitions the data into cells
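The partition-and-cell idea can be sketched as follows (a hedged illustration: the dimensions, cut ranges, and K, L values are arbitrary; a real implementation would also store the points that fall in each cell):

```python
import random

def make_partition(d, K, lo=0.0, hi=1.0, rng=random):
    """One partition: K pairs (d_k, v_k) - a random coordinate index and a cut value."""
    return [(rng.randrange(d), rng.uniform(lo, hi)) for _ in range(K)]

def cell_of(x, partition):
    """The cell label is the K-bit string of inequality tests x[d_k] <= v_k."""
    return tuple(x[dk] <= vk for dk, vk in partition)

random.seed(0)
parts = [make_partition(d=3, K=4) for _ in range(5)]  # L = 5 partitions
x = (0.2, 0.8, 0.5)
cells = [cell_of(x, p) for p in parts]  # one cell label per partition
```

Points sharing a cell label in some partition become query candidates for each other; the union of a query's L cells is its candidate set.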
Choosing the optimal K and L
• For a query q, we want to compute the smallest possible number of distances to the points in its buckets
• Large K → a smaller number of points in each cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• The candidate set is the union C∪ of the L cells containing q; as L increases, C∪ increases but the intersection C∩ of the cells decreases
• C∩ determines the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points; record each KNN distance (the bandwidth)
• Choose an error threshold ε
• The optimal K and L should satisfy: the approximate KNN distance is within (1 + ε) of the true one
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint — L(K)
• Minimize the running time t(K, L(K))
[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)], with the chosen minimum marked]
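The tuning procedure above can be sketched schematically. Here `approx_error` and `query_time` are hypothetical stand-ins for quantities that would be measured on the m sampled points:

```python
def choose_K_L(Ks, Ls, approx_error, query_time, eps):
    """For each K, find the minimal L meeting the error constraint L(K),
    then pick the pair (K, L(K)) with the smallest running time."""
    best = None
    for K in Ks:
        L_K = next((L for L in Ls if approx_error(K, L) <= eps), None)
        if L_K is None:
            continue  # no L satisfies the constraint for this K
        t = query_time(K, L_K)
        if best is None or t < best[2]:
            best = (K, L_K, t)
    return best

# Toy stand-ins: error falls as K*L grows, time grows as K*L.
best = choose_K_L(range(1, 6), range(1, 20),
                  approx_error=lambda K, L: 1.0 / (K * L),
                  query_time=lambda K, L: K * L,
                  eps=0.05)
```

With these toy functions the search settles on K = 2, L = 10; with measured error and time curves the same loop reproduces the "minimize t(K, L(K))" recipe on the slide.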
Data-driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: bucket distribution — uniform vs. data-driven cut points]
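A minimal sketch contrasting the two cut-value rules (the data set and coordinate choice are illustrative):

```python
import random

def uniform_cut(lo, hi, rng=random):
    """Original LSH: the cut value is drawn uniformly over the data range."""
    return rng.uniform(lo, hi)

def data_driven_cut(data, coord, rng=random):
    """Suggestion: pick a random data point and use one of its coordinates,
    so cut values concentrate where the data is dense."""
    return rng.choice(data)[coord]

random.seed(1)
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (9.0, 9.0)]
cut = data_driven_cut(data, coord=0)
```

With three of the four points clustered near the origin, a data-driven cut lands near the cluster 75% of the time, while a uniform cut over [0, 9] usually falls in the empty region, producing badly unbalanced buckets.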
Additional speedup
• Assume that all points in C will converge to the same mode (C acts like a type of aggregate), so the mode needs to be computed only once per cell
Speedup results
[Table: 65,536 points; 1,638 points sampled; k = 100]
Food for thought
[Figure: low dimension vs. high dimension]
A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning itself requires KNN
15:30 — cookies…
Summary
• LSH trades some accuracy for a gain in complexity
• Applications that involve massive data in high dimension require LSH's fast performance
• Extensions of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data
  (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Central Limit Theorem
For a vector v and i.i.d. Gaussian variables X_i:
  Σ_i v_i · X_i ∼ ||v||_2 · X
(a random projection of v is distributed like its L2 norm times a single Gaussian X)

Dot Product ↔ Norm
  a · v = Σ_i a_i · v_i

Norm Distance → Dot Product Distance
For two feature vectors u and v:
  Σ_i u_i · X_i − Σ_i v_i · X_i = Σ_i (u_i − v_i) · X_i ∼ ||u − v||_2 · X
(the difference of the two projections is distributed like the L2 distance times a Gaussian)
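The slide's norm-to-dot-product reduction rests on the 2-stable property: projections of u − v onto random Gaussian vectors have standard deviation ‖u − v‖₂. A small Monte-Carlo sketch (vectors and sample count are illustrative):

```python
import math
import random
import statistics

random.seed(0)
u = [1.0, 2.0, 3.0]
v = [0.0, 2.0, 1.0]
l2 = math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))  # ||u - v||_2

# a.u - a.v = a.(u - v) for Gaussian a is distributed as N(0, ||u - v||_2^2).
samples = []
for _ in range(20000):
    a = [random.gauss(0.0, 1.0) for _ in u]
    samples.append(sum(ai * (ui - vi) for ai, ui, vi in zip(a, u, v)))
sd = statistics.pstdev(samples)  # empirically close to l2
```

The empirical standard deviation lands within a few percent of ‖u − v‖₂ = √5, so comparing projected values really does compare L2 distances in distribution.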
The full Hashing
  h_{a,b}(v) = ⌊(a · v + b) / w⌋
• v — the features vector (dimension d)
• a — d random numbers, drawn i.i.d. from a p-stable distribution (e.g., a = [3.4, 8.2, 2.1, …])
• b — a random phase in [0, w]
• w — the discretization step

Example: for a projection a · v = 79.44 and phase b = 3.4, with step w = 1.00 the real line is cut into cells […, 78.00, 79.00, 80.00, 81.00, 82.00, …], and the hash value is the index of the cell that a · v + b falls into.
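The hash can be written down directly. The numbers below are toy values in the spirit of the slide's example (a projection near 79.4, phase b = 3.4, step w = 1.00), not data from the talk:

```python
import math

def hash_ab(a, b, w):
    """h_{a,b}(v) = floor((a.v + b) / w): project onto a, shift by a random
    phase b, and cut the line into cells of width w."""
    return lambda v: math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)

a = [3.4, 8.2, 2.1]           # would be i.i.d. p-stable draws in real LSH
v = [1.0, 8.0, 5.0]           # a.v = 3.4 + 65.6 + 10.5 = 79.5
h = hash_ab(a, b=3.4, w=1.0)
bucket = h(v)                 # floor(82.9) = 82
```

A nearby vector such as [1.0, 8.0, 5.01] projects to 79.521 and lands in the same cell 82, which is the locality-sensitivity the construction is after.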
Generalization: P-Stable distributions
• L2: Central Limit Theorem → Gaussian (normal) distribution
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution (e.g., Cauchy for L1)

P-Stable summary
• Works for the r-Nearest Neighbor problem; generalizes to 0 < p ≤ 2
• Improves query time: O(d · n^{1/(1+ε)} · log n) → O(d · n^{1/(1+ε)²} · log n)
  (latest results, reported by e-mail by Alexander Andoni)
Parameters selection
• 90% probability → best query-time performance
For Euclidean space:
• A single projection hits an ε-Nearest Neighbor with Pr = p1
• k projections hit an ε-Nearest Neighbor with Pr = p1^k
• L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure a collision (e.g., with probability 1 − δ ≥ 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ, i.e., L ≥ log(δ) / log(1 − p1^k)
…Parameters selection
[Plots: the collision probabilities separate neighbors (accept) from non-neighbors (reject); total query time vs. k trades candidate-extraction time against candidate-verification time]
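The collision bound gives a direct recipe for the number of tables L (a sketch; the single-projection probability p1 for the chosen hash family is assumed known, and the values below are illustrative):

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta,
    i.e. L >= log(delta) / log(1 - p1**k)."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

# Example: p1 = 0.9 per projection, k = 10 projections, 90% success target.
L = tables_needed(p1=0.9, k=10, delta=0.1)  # -> 6 tables
```

This makes the tension on the slide concrete: raising k sharpens each table (fewer false candidates) but shrinks p1^k, so L must grow to keep the collision guarantee.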
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
(From Piotr Indyk's slides)
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data
  (C code, under Red Hat Linux)
LSH – Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, and T. Darrell)
• Finding sensitive hash functions
Mean Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, and P. Meer)
• Tuning the LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola, and T. Darrell
Given an image x, what are the parameters θ in this image?
i.e., the angles of the joints, the orientation of the body, etc.
Ingredients
• Input: a query image with unknown angles (parameters)
• A database of human poses with known angles
• An image feature extractor – an edge detector
• A distance metric in feature space, d_x
• A distance metric in angle space:
  d_θ(θ1, θ2) = Σ_{i=1}^{m} (1 − cos(θ1,i − θ2,i))
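The angle-space metric can be written directly (a sketch, assuming the per-joint 1 − cos(Δθ) form; angles in radians):

```python
import math

def d_theta(t1, t2):
    """Distance in angle space: sum over joints of 1 - cos(theta1_i - theta2_i).
    Zero iff all angles match modulo 2*pi, so it ignores the wrap-around."""
    return sum(1.0 - math.cos(a - b) for a, b in zip(t1, t2))

same = d_theta([0.0, math.pi / 2], [2 * math.pi, math.pi / 2])  # wrap-around: ~0
far = d_theta([0.0, 0.0], [math.pi, math.pi])                   # opposite: 4.0
```

Each joint contributes at most 2 (when the angles are opposite), so the metric is bounded and insensitive to the 0/2π discontinuity that a plain squared difference would suffer from.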
Example-based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
(Input query → find the KNN in the database of examples → output: average angles of the KNN)
The algorithm flow
Input query → features extraction → processed query → PSH (LSH) against a database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms
[Figure: image sub-windows A, B and their edge-histogram features]
(Pipeline: Feature Extraction → PSH → LWR)
PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q and its neighborhood, mapped between the parameter space (angles) and the feature space]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those that are sensitive to d_θ.
The hash functions are applied in feature space, but the returned KNN are valid in angle space.
Training loop:
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the predicted labeling with the true one
• If the labeling by h is good, accept h; else change h
PSH as a classification problem
Labels (with r = 0.25): a pair of examples (x_i, x_j) is labeled
  y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
  y_ij = −1 if d_θ(θ_i, θ_j) ≥ (1 + ε) · r
A binary hash function on the features:
  h_T(x) = +1 if the selected feature of x is above the threshold T, −1 otherwise
Predict the labels:
  ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
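The paired-classification view can be sketched with decision-stump hashes; the feature index, threshold, and example vectors below are illustrative:

```python
def stump_hash(feature_idx, threshold):
    """A binary hash on features: +1 if the selected feature clears the threshold."""
    return lambda x: 1 if x[feature_idx] >= threshold else -1

def predicted_label(h, xi, xj):
    """y_hat = +1 if h puts both examples in the same bin, -1 otherwise."""
    return 1 if h(xi) == h(xj) else -1

h = stump_hash(feature_idx=0, threshold=0.5)
p = predicted_label(h, (0.7, 0.1), (0.9, 0.3))  # both features >= 0.5 -> +1
n = predicted_label(h, (0.7, 0.1), (0.2, 0.3))  # split by the threshold -> -1
```

Training then amounts to comparing these predicted labels against the true angle-based labels y_ij over many pairs and keeping only the stumps that agree often enough.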
Find the best T that predicts the true labeling, subject to the probability constraints:
h_T should place both examples of a positive pair in the same bin, and separate the examples of a negative pair.
Local Weighted Regression (LWR)
• Given a query image x0, PSH returns its KNNs
• LWR uses these KNNs to compute a weighted estimate of the query's angles:
  β0 = argmin_β Σ_{x_i ∈ N(x0)} d_θ(g(x_i; β), θ_i) · K(d_x(x_i, x0)),
  where g is the local regression model and K is a kernel turning each neighbor's feature-space distance to the query into a weight
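As a simplified stand-in for the full LWR fit, one can kernel-weight the neighbors' angle vectors by their feature-space distance to the query (an illustrative sketch, not the paper's exact estimator; the neighbor data and bandwidth are made up):

```python
import math

def weighted_angle_average(neighbors, query, bandwidth=1.0):
    """Average the neighbors' angle vectors, weighting each neighbor by a
    Gaussian kernel of its feature-space distance to the query.
    neighbors: list of (feature_vec, angle_vec) pairs returned by PSH."""
    def K(d):
        return math.exp(-(d / bandwidth) ** 2)
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    ws = [K(dist(f, query)) for f, _ in neighbors]
    total = sum(ws)
    m = len(neighbors[0][1])
    return [sum(w * ang[i] for w, (_, ang) in zip(ws, neighbors)) / total
            for i in range(m)]

nbrs = [((0.0, 0.0), [10.0, 20.0]), ((1.0, 0.0), [30.0, 40.0])]
est = weighted_angle_average(nbrs, query=(0.0, 0.0))
```

The neighbor at feature-space distance 0 dominates, so the estimate sits closer to its angles — the "smart averaging" behavior the summary slide refers to.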
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Tested on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
• Without feature selection, 40 bits and 1,000 hash tables were needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, and B is the maximal number of points in a bucket
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Results – real data: interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
• "Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola, T. Darrell
– Finding sensitive hash functions
• "Mean Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni, P. Meer
– Tuning LSH parameters
– The LSH data structure is used for algorithm speedups
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola, T. Darrell
Given an image x, what are the parameters θ in this image?
i.e. angles of joints, orientation of the body, etc.
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: dx
• Distance metric in angle space:
dθ(θ1, θ2) = Σ_{i=1..m} (1 - cos(θ1,i - θ2,i))
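The angle-space metric above can be written as a one-liner (a sketch; angles are assumed to be given in radians):

```python
import math

def angle_distance(theta1, theta2):
    # d(theta1, theta2) = sum_i (1 - cos(theta1_i - theta2_i)):
    # 0 for identical poses, up to 2 per joint for opposite angles.
    return sum(1 - math.cos(a - b) for a, b in zip(theta1, theta2))
```

The 1 - cos form makes the metric insensitive to the 2π wrap-around of each joint angle.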
Example-based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find KNN in the database of examples → output: average angles of the KNN
The algorithm flow:
Input query → feature extraction → processed query → PSH (LSH), against the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms.
[Figure: image regions A and B with their multi-scale edge-histogram counts]
Feature Extraction → PSH → LWR
PSH: The basic assumption
There are two metric spaces here: the feature space (dx) and the parameter space (dθ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q mapped between the parameter space (angles) and the feature space]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick:
• Estimate the performance of different hash functions on examples, and select those sensitive to dθ
• The hash functions are applied in feature space, but the KNN are valid in angle space
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
PSH as a classification problem
Labels (r = 0.25): a pair of examples (xi, xj) is labeled
y_ij = +1 if dθ(θi, θj) ≤ r
y_ij = -1 if dθ(θi, θj) ≥ (1 + ε) r
[Figure: example image pairs labeled +1, +1, -1, -1]
A binary hash function on the features:
h_T(x) = +1 if the selected feature of x is ≥ T, -1 otherwise
Predict the labels:
ŷ_ij(h) = +1 if h_T(xi) = h_T(xj), -1 otherwise
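A sketch of such a threshold hash and the pair labels it predicts; the feature index and threshold here are arbitrary illustrations:

```python
# One-feature threshold hash: +1 above the threshold, -1 below.
def make_hash(feature_index, threshold):
    return lambda x: +1 if x[feature_index] >= threshold else -1

# Predicted pair label: +1 when both points land in the same bin.
def predicted_label(h, xi, xj):
    return +1 if h(xi) == h(xj) else -1

h = make_hash(0, 0.5)
```

Comparing these predictions against the true pair labels is what lets the algorithm accept or reject each candidate h.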
Find the best threshold T that predicts the true labeling, subject to the probability constraints.
The threshold T(x) will place both examples of a pair in the same bin, or separate them.
[Figure: distribution of a feature value, split by the threshold T(x)]
Local Weighted Regression (LWR)
• Given a query image, PSH returns KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query, weighting each neighbor by a kernel of its feature-space distance to the query:
θ0 = argmin Σ_{xi ∈ N(x)} dθ(θ(xi), g(xi)) · K(dx(xi, x))   (weight = kernel of distance)
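A minimal, zeroth-order sketch of this idea: a kernel-weighted average of the neighbors' angles, where neighbors closer to the query in feature space get larger weight. The Gaussian kernel and its bandwidth are illustrative assumptions, not the paper's exact choice:

```python
import math

def weighted_angles(neighbors, query, bandwidth=1.0):
    """neighbors: list of (features, angle); returns the weighted mean angle."""
    dist = lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
    # Gaussian kernel of the feature-space distance to the query.
    w = [math.exp(-(dist(f, query) / bandwidth) ** 2) for f, _ in neighbors]
    return sum(wi * ang for wi, (_, ang) in zip(w, neighbors)) / sum(w)
```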
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 34 of the data per query
• Without selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, and B the maximum number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 13 of the data were searched
Results – real data: interesting mismatches
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general: some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in Rd, centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere 'covers' the query q
Courtesy of Mohamad Hegaze
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, P. Meer
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
1. Finding optimal LSH parameters
2. Data-driven partitions into buckets
3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: a window of the given bandwidth around a point, shifted toward the local mean]
Mean-shift · LSH · optimal k,l · LSH data partition · LSH data struct
KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth
• It is based on the kth nearest neighbor of the point: the bandwidth is h_i = ||x_i - x_{i,k}||, the distance from x_i to its kth neighbor
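The adaptive-bandwidth rule can be sketched as follows (brute-force distances, for illustration only; the real algorithm obtains these neighbors via LSH):

```python
# Per-point bandwidth = distance to the k-th nearest neighbor,
# so dense regions get small bandwidths and sparse regions large ones.
def adaptive_bandwidths(points, k):
    dist = lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
    h = []
    for p in points:
        d = sorted(dist(p, q) for q in points if q is not p)
        h.append(d[k - 1])  # distance to the k-th nearest neighbor
    return h
```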
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm:
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
original segmented
filtered
Filtering: pixel value of the nearest mode
Mean-shift trajectories
original squirrel filtered
original baboon filtered
Filtering examples
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Segmentation examples
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled with variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (dk, vk)
• For each point x we check, for each of the K pairs, whether x_{dk} ≤ vk; the K boolean answers determine the point's cell
• It partitions the data into cells
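A sketch of the cell (bucket) key this structure computes for one partition; the coordinate/cut-value pairs below are arbitrary examples:

```python
# For one partition, K (coordinate, cut-value) pairs turn a point
# into a K-bit key; points sharing the key share a cell.
def bucket_key(x, pairs):
    """pairs: list of (d_k, v_k); returns a tuple of K booleans."""
    return tuple(x[d] <= v for d, v in pairs)

key = bucket_key([0.2, 0.9], [(0, 0.5), (1, 0.5)])
```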
Choosing the optimal K and L
• Goal: for a query q, compute the smallest number of distances to the points in its buckets
• Large K ⟹ a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, the union of buckets C̄ might include extra points
• As L increases, the union C̄ increases, but the chance of missing a true neighbor decreases
• K determines the resolution of the data structure
Choosing optimal K and L:
• Determine accurately the KNN for m randomly-selected data points, and record the kth-NN distance (bandwidth) of each
• Choose an error threshold ε
• The optimal K and L should satisfy: the approximate kth-NN distance returned via LSH stays within the error threshold of the true one
Choosing optimal K and L:
• For each K, estimate the error
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[Figure: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] with its minimum marked]
Data-driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: bucket-occupancy distribution for uniform vs. data-driven cut points]
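The two cut-value choices can be contrasted in a few lines (a sketch: with data-driven cuts, dense regions attract more cut values, so points spread more evenly over the buckets):

```python
import random

# Original LSH: a cut value drawn uniformly from the data's range.
def uniform_cut(points, d):
    lo = min(p[d] for p in points)
    hi = max(p[d] for p in points)
    return random.uniform(lo, hi)

# Data-driven variant: a coordinate of a randomly chosen data point.
def data_driven_cut(points, d):
    return random.choice(points)[d]
```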
Additional speedup
• Assume that all points in C̄ will converge to the same mode (C̄ is like a type of aggregate)
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
[Figure: low dimension vs. high dimension]
A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30: cookies…
Summary
• LSH suggests a compromise: some accuracy is given up for a large gain in complexity
• Applications that involve massive data in high dimensions require LSH's fast performance
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
The full Hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
where:
• v is the features vector (of dimension d)
• a is a vector of d random numbers, drawn i.i.d. from a p-stable distribution
• b is a random phase, drawn from [0, w]
• w is the discretization step
Numeric example: with a·v = 7944, phase b = +34, and step w = 100, the value 7944 + 34 = 7978 falls in the bucket [7900, 8000), i.e. h = 79.
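A hedged sketch of this hash for the L2 case, where the 2-stable distribution is the Gaussian; the dimension, step w, and seed below are illustrative choices:

```python
import math, random

def make_pstable_hash(d, w, seed=0):
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]  # i.i.d. 2-stable (Gaussian)
    b = rng.uniform(0.0, w)                      # random phase in [0, w)
    def h(v):
        # h_{a,b}(v) = floor((a.v + b) / w)
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h

h = make_pstable_hash(d=3, w=4.0)
# Nearby vectors usually land in the same bucket; distant ones usually don't.
```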
Generalization: p-stable distributions
• L2 → Central Limit Theorem → Gaussian (normal) distribution (the 2-stable distribution)
• Lp, 0 < p ≤ 2 → Generalized Central Limit Theorem → p-stable distributions (e.g. Cauchy for L1)
P-stable summary:
• Works for the r-Nearest Neighbor problem
• Generalizes to 0 < p ≤ 2
• Improves the query time to O(d·n^(1/(1+ε)²)·log n)
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
(figure: original → filtered → segmented)
Filtering: pixel value of the nearest mode
(figure: mean-shift trajectories)
Filtering examples (original vs. filtered: squirrel, baboon)
Segmentation examples
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (dk, vk)
• For each point x, check the K inequalities x_{dk} ≤ vk
This partitions the data into cells.
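A minimal sketch of such a boolean-partition structure, including the paper's data-driven variant where cut values are taken from the data itself (helper names are hypothetical):

```python
import random

def build_partitions(points, K, L, data_driven=True, rng=None):
    """Build L random partitions; each is a list of K tests (d_k, v_k).
    With data_driven=True, the cut value v_k is a coordinate of a random
    data point rather than uniform over the coordinate's range."""
    rng = rng or random.Random(0)
    d = len(points[0])
    partitions = []
    for _ in range(L):
        tests = []
        for _ in range(K):
            dk = rng.randrange(d)
            if data_driven:
                vk = rng.choice(points)[dk]
            else:
                lo = min(p[dk] for p in points)
                hi = max(p[dk] for p in points)
                vk = rng.uniform(lo, hi)
            tests.append((dk, vk))
        partitions.append(tests)
    return partitions

def cell_of(x, tests):
    """Hash key: the boolean vector of the K inequality tests x[d_k] <= v_k."""
    return tuple(x[dk] <= vk for dk, vk in tests)
```

Each partition maps a point to one of at most 2^K cells; a query is compared only against points sharing a cell in at least one of the L partitions.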
Choosing the optimal K and L
• For a query q, distances are computed only to the points in its buckets – the goal is to keep this number as small as possible
If L is too small, points might be missed; but if L is too big, extra points might be included.
Large K → a smaller number of points in a cell.
As L increases, the union of cells C∪ grows and fewer neighbors are missed, but the query cost increases; K determines the resolution of the data structure.
Choosing optimal K and L
Determine accurately the KNN for m randomly-selected data points and record the true KNN distance (bandwidth).
Choose an error threshold ε.
The optimal K and L should satisfy the constraint relating the approximate distance to the true one.
Choosing optimal K and L
• For each K, estimate the error for each L
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
(figures: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] and its minimum)
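The tuning procedure above can be sketched as a simple search. Here `error_fn` and `time_fn` stand for the measured quantities (mean relative error on the m sample queries and the empirical running time); both names are hypothetical placeholders:

```python
def minimal_L(error_fn, K, Ls, eps):
    """Smallest L whose measured approximation error on the sample queries
    drops below the threshold eps; None if no candidate L qualifies."""
    for L in sorted(Ls):
        if error_fn(K, L) <= eps:
            return L
    return None

def best_K_L(error_fn, time_fn, Ks, Ls, eps):
    """Among pairs (K, L(K)) meeting the error constraint, pick the
    fastest according to time_fn(K, L)."""
    candidates = []
    for K in Ks:
        L = minimal_L(error_fn, K, Ls, eps)
        if L is not None:
            candidates.append((time_fn(K, L), K, L))
    return min(candidates)[1:] if candidates else None
```

With measured error and time surfaces plugged in, this reproduces the two-step scheme on the slide: find L(K) per K, then minimize t(K, L(K)).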
Data driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
(figure: bucket distribution – uniform vs. data-driven cut points)
Additional speedup
Assume that all points in C∪ will converge to the same mode (C∪ acts as a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
(figure: low dimension vs. high dimension)
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 – cookies…
Summary
• LSH suggests trading accuracy for a gain in complexity
• Applications that involve massive data in high dimensions require the fast performance of LSH
• Extension of the LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• But at the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test it over your own data
(C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
The full Hashing
h_{a,b}(v) = ⌊(a · v + b) / w⌋
• v – the features vector, v ∈ Rd
• a – a vector with entries drawn i.i.d. from a p-stable distribution
• b – a random phase in [0, w]
• w – the discretization step
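A minimal sketch of one such hash for the L2 case, where the 2-stable distribution is the Gaussian (function and parameter names are hypothetical):

```python
import math
import random

def make_pstable_hash(d, w, rng=None):
    """One L2 hash h_{a,b}(v) = floor((a.v + b) / w):
    a has i.i.d. N(0,1) coordinates (Gaussian is 2-stable), b ~ U[0, w]."""
    rng = rng or random.Random(0)
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)

    def h(v):
        # Project onto a, shift by the random phase, quantize into width-w bins.
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)

    return h
```

By 2-stability, the projection a·v1 − a·v2 is distributed as ||v1 − v2||2 times a standard Gaussian, so close points fall into the same bin with higher probability than distant ones.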
Generalization: P-Stable distribution
• L2: Central Limit Theorem → Gaussian (normal) distribution
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution (e.g. Cauchy, which is 1-stable, for L1)
P-Stable summary
• Works for ℓp and generalizes to any 0 < p ≤ 2
• Improves the query time: from O(d·n^{1/(1+ε)}·log n) to O(d·n^{1/(1+ε)²}·log n)
r-Nearest Neighbor: latest results (reported by email by Alexander Andoni)
Parameters selection (for Euclidean space)
• 90% success probability → best query-time performance
• A single projection hits an ε-nearest neighbor with Pr = p1
• k projections hit it with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure a collision (e.g. 1 − δ ≥ 90%): 1 − (1 − p1^k)^L ≥ 1 − δ, i.e. L ≥ log(δ) / log(1 − p1^k)
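The last inequality gives the number of tables directly; a one-line helper (hypothetical name) makes the calculation concrete:

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta, i.e. the number of
    hash tables needed so a true neighbor collides with probability
    at least 1 - delta:  L = ceil(log(delta) / log(1 - p1**k))."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))
```

For example, with p1 = 0.9, k = 4 and δ = 0.1, three tables suffice, and two do not.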
Parameters selection…
(figure: the hash should reject non-neighbors and accept neighbors; as k grows, candidate-extraction time decreases while candidate-verification time increases)
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides
LSH – Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola, and T. Darrell
• Finding sensitive hash functions
Mean Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni, and P. Meer
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem
Given an image x, what are the parameters θ in this image, i.e. angles of the joints, orientation of the body, etc.?
Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola, and T. Darrell
Ingredients
• Input: a query image with unknown angles (parameters)
• A database of human poses with known angles
• An image feature extractor – an edge detector
• A distance metric in feature space, dx
• A distance metric in angle space: dθ(θ1, θ2) = Σ_{i=1}^{m} (1 − cos(θ1,i − θ2,i))
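The angle-space metric is easy to compute directly; a small sketch (the function name is an assumption):

```python
import math

def d_theta(t1, t2):
    """Distance in parameter (angle) space:
    d(theta1, theta2) = sum_i (1 - cos(theta1_i - theta2_i))."""
    return sum(1.0 - math.cos(a - b) for a, b in zip(t1, t2))
```

Each term is 0 for identical angles and at most 2 for opposite ones, and the metric is naturally invariant to full 2π wrap-arounds, which is why it suits joint angles better than a plain Euclidean difference.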
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: a query → find its KNN in the database of examples → output: the average angles of the KNN
The algorithm flow:
input query → feature extraction → processed query → PSH (LSH) over the database of examples → LWR (regression) → output match
The image features
(figure: image sub-windows A, B and their edge histograms)
Image features are multi-scale edge histograms.
PSH: The basic assumption
There are two metric spaces here: the feature space (dx) and the parameter space (dθ).
We want similarity to be measured in the angle space, whereas LSH works in the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
(figure: the mapping between the parameter space (angles) and the feature space around a query q)
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick:
Estimate the performance of different hash functions on examples, and select those sensitive to dθ.
The hash functions are applied in the feature space, but the KNN are valid in the angle space.
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
PSH as a classification problem
Labels (with r = 0.25): a pair of examples (xi, xj) is labeled
yij = +1 if dθ(θi, θj) ≤ r
yij = −1 if dθ(θi, θj) ≥ (1 + ε) r
A binary hash function on the features:
hT(x) = +1 if x ≥ T, −1 otherwise
Predict the labels:
ŷh(xi, xj) = +1 if hT(xi) = hT(xj), −1 otherwise
Find the best T that predicts the true labeling subject to the probability constraints: hT will place both examples in the same bin, or separate them.
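A minimal sketch of the labeling and of a single-feature threshold hash as described above (all names are hypothetical, and a real hash would act on one selected feature of x):

```python
def pair_label(dtheta, r, eps):
    """True label for a pair of training examples: +1 if their angle
    distance is within r, -1 if beyond (1 + eps) * r, 0 (ignored) in between."""
    if dtheta <= r:
        return 1
    if dtheta >= (1.0 + eps) * r:
        return -1
    return 0

def h_T(feature_value, T):
    """Binary threshold hash on a single feature."""
    return 1 if feature_value >= T else -1

def predicted_label(xi, xj, T):
    """+1 when the hash puts both examples in the same bin."""
    return 1 if h_T(xi, T) == h_T(xj, T) else -1
```

Selecting a hash function then amounts to choosing the feature and threshold T whose predicted labels best match the true pair labels.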
Local Weighted Regression (LWR)
• Given a query image, PSH returns the KNNs
• LWR uses the KNNs to compute a weighted average of the estimated angles of the query, weighting each neighbor by a kernel of its feature-space distance: θ0 = argmin_θ Σ_{xi ∈ N(x)} K(dx(xi, x)) · dθ(θ, θi)
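As a simplified sketch, the zeroth-order case of LWR reduces to a kernel-weighted mean of the neighbors' angles. This Gaussian-kernel version is an assumption for illustration, not the paper's exact robust formulation:

```python
import math

def lwr_estimate(query_feat, neighbors, bandwidth):
    """Kernel-weighted average of the neighbors' angles; weights come from a
    Gaussian kernel of the feature-space distance to the query.
    neighbors: list of (feature_vector, angle_vector) pairs from PSH."""
    weights = [
        math.exp(-math.dist(query_feat, f) ** 2 / (2 * bandwidth ** 2))
        for f, _ in neighbors
    ]
    total = sum(weights)
    m = len(neighbors[0][1])
    return [
        sum(w * ang[i] for w, (_, ang) in zip(weights, neighbors)) / total
        for i in range(m)
    ]
```

Neighbors far from the query in feature space contribute almost nothing, so a bad hash collision has little effect on the final angle estimate.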
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Tested on 1,000 synthetic examples
• PSH searched only 34 of the data per query
• Without the selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 13 of the data were searched
Results – real data: interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space:
  d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
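A minimal sketch of the angle-space metric above, assuming angles are given in radians:

```python
import math

def d_theta(theta1, theta2):
    """Angle-space distance: sum over the m joints of 1 - cos(angle difference).
    Zero for identical poses; each joint contributes at most 2."""
    return sum(1.0 - math.cos(a - b) for a, b in zip(theta1, theta2))
```

Identical poses are at distance 0, and a joint rotated by π contributes the maximum of 2.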
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find KNN in the database of examples → output: average angles of the KNN
The algorithm flow
Input query → feature extraction → processed query → PSH (LSH) over a database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms
Roadmap: Feature Extraction → PSH → LWR
PSH: The basic assumption
There are two metric spaces here: feature space (d_x) and parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
Parameter space (angles) vs. feature space: is this magic?
Parameter Sensitive Hashing (PSH)
The trick:
• Estimate the performance of different hash functions on examples, and select those sensitive to d_θ
• The hash functions are applied in feature space, but the KNN are valid in angle space
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h

PSH as a classification problem
Labels: a pair of examples (x_i, x_j) is labeled
  y_ij = +1 if d_θ(θ_i, θ_j) < r
  y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε) r        (r = 0.25)
A binary hash function on features:
  h_{φ,T}(x) = +1 if φ(x) ≥ T, −1 otherwise

Predict the labels:
  ŷ_ij(h) = +1 if h_{φ,T}(x_i) = h_{φ,T}(x_j), −1 otherwise
Find the best T that predicts the true labeling under the probability constraints:
h_{φ,T(φ)} will place both examples in the same bin, or separate them.
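The selection loop above can be sketched as follows; the scalar feature `phi` and the toy pairs are illustrative, not the paper's actual features:

```python
def true_label(d_angles, r=0.25, eps=1.0):
    """+1 for similar poses, -1 for clearly dissimilar, 0 = ignored gray zone."""
    if d_angles < r:
        return 1
    if d_angles > (1.0 + eps) * r:
        return -1
    return 0

def h(phi, T, x):
    """Binary hash: threshold a single feature value."""
    return 1 if phi(x) >= T else -1

def score(phi, T, pairs):
    """Fraction of labeled pairs whose hash agreement predicts the true label.
    pairs: (x_i, x_j, d_theta(theta_i, theta_j)); pick the T maximizing this."""
    labeled = [(xi, xj, true_label(d)) for xi, xj, d in pairs]
    labeled = [t for t in labeled if t[2] != 0]
    hits = sum((1 if h(phi, T, xi) == h(phi, T, xj) else -1) == y
               for xi, xj, y in labeled)
    return hits / len(labeled)
```

A threshold that separates dissimilar pairs while keeping similar pairs together scores 1.0.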
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs
• LWR uses the KNNs to compute a weighted average of the estimated angles of the query:
  θ̂ = argmin_θ Σ_{x_i ∈ N(x)} d_θ(θ, θ_i) · K(d_x(x, x_i)),   with kernel K mapping distance to weight
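As a sketch of this step, under assumptions of my own (an exponential kernel and a per-joint circular mean; this is one plausible reading of the formula, not the paper's exact estimator):

```python
import math

def lwr_angles(query_feat, neighbors, d_x, kernel=lambda d: math.exp(-d)):
    """Kernel-weighted average of the KNNs' joint angles.
    neighbors: list of (feature_vector, angle_vector) returned by PSH."""
    w = [kernel(d_x(query_feat, f)) for f, _ in neighbors]
    m = len(neighbors[0][1])
    est = []
    for j in range(m):
        # circular mean, so that angles near -pi/+pi average sensibly
        s = sum(wi * math.sin(ang[j]) for wi, (_, ang) in zip(w, neighbors))
        c = sum(wi * math.cos(ang[j]) for wi, (_, ang) in zip(w, neighbors))
        est.append(math.atan2(s, c))
    return est
```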
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Tested on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables would be needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Interesting mismatches
Fast pose estimation - summary
• A fast way to compute the angles of the human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging

Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere 'covers' the query q
Courtesy of Mohamad Hegaze
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer

Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell (bandwidth h)
Roadmap: mean-shift → LSH → optimal k, l → LSH data partition → LSH data structure
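In a nutshell indeed; a flat-kernel sketch of the iteration (window radius = bandwidth h). This is a generic illustration, not the paper's adaptive variant:

```python
def mean_shift_mode(x, points, h, iters=100):
    """Repeatedly move x to the mean of the points within distance h
    (flat kernel); x converges to a local density mode."""
    for _ in range(iters):
        window = [p for p in points
                  if sum((a - b) ** 2 for a, b in zip(x, p)) ** 0.5 <= h]
        if not window:
            return x
        x = tuple(sum(coord) / len(window) for coord in zip(*window))
    return x
```

Starting points in the basin of a cluster converge to that cluster's mode, which is what the clustering step exploits.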
KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region:
  high density – small bandwidth; low density – large bandwidth
• Based on the kth nearest neighbor of the point: the bandwidth is h_i = ||x_i − x_{i,k}||
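A brute-force sketch of this rule (in the real algorithm the LSH structure replaces the exhaustive neighbor scan):

```python
def adaptive_bandwidths(points, k):
    """h_i = distance from x_i to its k-th nearest neighbor:
    small in dense regions, large in sparse ones."""
    hs = []
    for i, p in enumerate(points):
        dists = sorted(
            sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
            for j, q in enumerate(points) if j != i
        )
        hs.append(dists[k - 1])
    return hs
```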
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths: hs (spatial), hr (color)
3. Apply filtering
Mean-Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm: original → filtered → segmented
Filtering: pixel value of the nearest mode
Mean-shift trajectories
Filtering examples: original squirrel → filtered; original baboon → filtered
Segmentation examples
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries – implemented with LSH
• Statistical curse of dimensionality: sparseness of the data – variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point, check whether x_{d_k} ≤ v_k; the K answers select a cell
• It partitions the data into cells
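A sketch of this structure, with cut values drawn uniformly over each coordinate's data range (names are illustrative):

```python
import random
from collections import defaultdict

def build_tables(points, K, L, seed=0):
    """L random partitions; each uses K (coordinate, cut-value) pairs.
    A point's K-bit key (x[d_k] <= v_k) selects its cell."""
    rng = random.Random(seed)
    d = len(points[0])
    lo = [min(p[j] for p in points) for j in range(d)]
    hi = [max(p[j] for p in points) for j in range(d)]
    tables = []
    for _ in range(L):
        dims = [rng.randrange(d) for _ in range(K)]
        cuts = [(j, rng.uniform(lo[j], hi[j])) for j in dims]
        buckets = defaultdict(list)
        for i, p in enumerate(points):
            buckets[tuple(p[j] <= v for j, v in cuts)].append(i)
        tables.append((cuts, buckets))
    return tables

def candidate_union(tables, q):
    """The union set: all points sharing a cell with q in at least one table."""
    out = set()
    for cuts, buckets in tables:
        out.update(buckets.get(tuple(q[j] <= v for j, v in cuts), ()))
    return out
```

Querying with any stored point always returns at least that point, since it hashes to its own cell in every table.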
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• Large K → a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, the union of cells C∪ might include extra points
As L increases, C∪ increases but C∩ decreases; K determines the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN distance (bandwidth) for m randomly selected data points
• Choose an error threshold ε
• The optimal K and L should keep the approximate distance within the threshold
Choosing optimal K and L
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K)) over K
Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)]; minimum
Data-driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
• Bucket distribution: uniform vs. data-driven cut points
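The suggested change is a one-liner (sketch): the cut follows the data density, so buckets are filled more evenly:

```python
import random

def data_driven_cut(points, rng):
    """Cut value = a random coordinate of a random data point,
    instead of a uniform draw over the coordinate's range."""
    j = rng.randrange(len(points[0]))
    return j, rng.choice(points)[j]
```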
Additional speedup
Assume that all points in C∩ will converge to the same mode (C∩ acts as a type of aggregate)
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought: low dimension vs. high dimension

A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• A thought for food: does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 cookies…
Summary
• LSH suggests a compromise on accuracy for a gain in complexity
• Applications that involve massive data in high dimension require the fast performance of LSH
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• ...but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
P-Stable summary
• Works for ℓ2; generalizes to 0 < p ≤ 2
• Improves query time: from O(d·n^{1/(1+ε)} log n) to O(d·n^{1/(1+ε)²} log n)

Latest results for the r-nearest-neighbor problem, reported in email by Alexander Andoni
Parameters selection (for Euclidean space)
• 90% probability ↔ best query-time performance
• A single projection hits an ε-nearest neighbor with Pr = p1
• k projections hit an ε-nearest neighbor with Pr = p1^k
• L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure collision (e.g. 1 − δ ≥ 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ  ⇒  L ≥ log(δ) / log(1 − p1^k)
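The bound above can be evaluated directly; a minimal sketch:

```python
import math

def min_tables(p1, k, delta):
    """Smallest L guaranteeing a collision with probability >= 1 - delta:
    1 - (1 - p1**k)**L >= 1 - delta  =>  L >= log(delta) / log(1 - p1**k)."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))
```

For example, p1 = 0.9 and k = 10 give p1^k ≈ 0.35, so a handful of tables already reaches 90% collision probability.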
Parameters selection: choosing k trades off candidate-extraction time against candidate-verification time (accept neighbors, reject non-neighbors)
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimension)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Parameters selection…
For Euclidean Space:
• A single projection hits an ε-Nearest Neighbor with Pr = p1
• k projections (one k-bit hash function) hit an ε-Nearest Neighbor with Pr = p1^k
• All L hash tables fail to collide with Pr = (1 - p1^k)^L
• To ensure a collision with probability at least 1 - δ (e.g. ≥ 90%):
  1 - (1 - p1^k)^L ≥ 1 - δ  ⇒  L ≥ log(δ) / log(1 - p1^k)
Reject Non-Neighbors / Accept Neighbors
…Parameters selection
[Figure: query time as a function of k, split into candidate-extraction time and candidate-verification time]
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
(From Piotr Indyk's slides)
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni: andoni@mit.edu
  – Test it over your own data
  (C code, under Red Hat Linux)
LSH - Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash function construction and parameter tuning
Outline
"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola, and T. Darrell
• Finding sensitive hash functions
"Mean Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni, and P. Meer
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola, and T. Darrell

The Problem
Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.?
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angles space:
  d_θ(θ1, θ2) = Σ_{i=1..m} (1 - cos(θ1,i - θ2,i))
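The angle-space metric above is easy to state in code. A minimal sketch in plain Python; the pose vectors below are illustrative:

```python
import math

def d_theta(t1, t2):
    """Angle-space distance: sum of 1 - cos(difference) per joint.
    Zero for identical poses; each joint contributes at most 2."""
    return sum(1.0 - math.cos(a - b) for a, b in zip(t1, t2))

print(d_theta([0.0, 0.5], [0.0, 0.5]))   # identical poses -> 0.0
print(d_theta([0.0], [math.pi]))         # opposite angle -> 2.0
```

Using 1 - cos makes the metric insensitive to full-turn wraparound, which a plain squared difference of angles would get wrong.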
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query

Input: query → find KNN in the database of examples → output: average angles of the KNN
The algorithm flow:
Input Query → Features extraction → Processed query → PSH (LSH) against the database of examples → LWR (Regression) → Output: Match
The image features
Image features are multi-scale edge histograms.
[Figure: multi-scale edge-histogram features for two image regions A and B]
PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ). We want similarity to be measured in the angles space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q mapped between the parameters space (angles) and the feature space]
Is this Magic?

Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
PSH as a classification problem
[Figure: example pairs labeled +1, +1, -1, -1 at r = 0.25]
A pair of examples (x_i, θ_i), (x_j, θ_j) is labeled:
  y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
  y_ij = -1 if d_θ(θ_i, θ_j) ≥ (1 + ε) r
A binary hash function on features:
  h_T(x) = +1 if x ≥ T, -1 otherwise
Predict the labels:
  ŷ_ij = +1 if h_T(x_i) = h_T(x_j), -1 otherwise
Find the best threshold T that predicts the true labeling within the probability constraints: h_T will place both examples of a pair in the same bin, or separate them.
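The threshold selection above can be sketched as a toy search over candidate cuts on a single feature, scored by how well they reproduce the pair labels. All names and data here are illustrative, not from the paper:

```python
def pair_accuracy(T, pairs):
    """Fraction of labeled pairs (fi, fj, y) that threshold T classifies
    correctly: both features on the same side of T predicts y = +1."""
    correct = 0
    for fi, fj, y in pairs:
        y_hat = 1 if (fi >= T) == (fj >= T) else -1
        correct += (y_hat == y)
    return correct / len(pairs)

def best_threshold(candidates, pairs):
    """Pick the candidate cut that best predicts the true pair labels."""
    return max(candidates, key=lambda T: pair_accuracy(T, pairs))

# Toy data: one feature value per image; pairs labeled by angle similarity
pairs = [(0.1, 0.2, 1), (0.8, 0.9, 1), (0.1, 0.9, -1), (0.2, 0.8, -1)]
T = best_threshold([0.0, 0.5, 1.0], pairs)
print(T)  # 0.5: separates the dissimilar pairs, keeps similar ones together
```

In the real algorithm each accepted threshold becomes one bit of a parameter-sensitive hash function, and the selection is repeated over many features.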
Local Weighted Regression (LWR)
• Given a query image, PSH returns the KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query (distance → weight):
  θ(x) = g(x; β0), where β0 = argmin_β Σ_{xi ∈ N(x)} d_θ(g(xi; β), θi) · K(d_x(x, xi))
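In the simplest case (a locally constant model g) the objective above reduces to a kernel-weighted average of the neighbors' angles. A sketch under that assumption, using a circular mean so angles near ±π average correctly; all names and values are illustrative:

```python
import math

def weighted_angle_average(angles, dists, bandwidth=1.0):
    """Kernel-weighted circular mean of neighbor angles.
    dists are feature-space distances from the query to each neighbor."""
    w = [math.exp(-(d / bandwidth) ** 2) for d in dists]
    s = sum(wi * math.sin(a) for wi, a in zip(w, angles))
    c = sum(wi * math.cos(a) for wi, a in zip(w, angles))
    return math.atan2(s, c)

# Two close neighbors agree; one far neighbor disagrees but gets tiny weight
est = weighted_angle_average([0.1, -0.1, 3.0], [0.2, 0.2, 5.0])
print(round(est, 3))  # ~0.0: dominated by the two nearby neighbors
```

Weighting by feature-space distance is what makes the estimate robust to the occasional bad neighbor that PSH may return.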
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without the feature selection, 40 bits and 1,000 hash tables would be needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 13 of the data were searched
[Figure: example matches on real data, and some interesting mismatches]
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging

Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p1, …, pn} with radii r1, …, rn
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere covers the query q
[Figure: query q inside the sphere of radius ri around pi]
Courtesy of Mohamad Hegaze
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer

Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: a point and its bandwidth window; the window mean shifts toward the density mode]
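The nutshell picture can be sketched in a few lines: move a point to the mean of its window and repeat. A 1-D toy with a flat kernel of fixed bandwidth; all data here is illustrative:

```python
def mean_shift(x, points, bandwidth, iters=50):
    """Repeatedly move x to the mean of the points within the bandwidth
    window; x converges to a local density mode."""
    for _ in range(iters):
        window = [p for p in points if abs(p - x) <= bandwidth]
        if not window:
            break
        x = sum(window) / len(window)
    return x

data = [1.0, 1.1, 1.2, 5.0, 5.1]   # two clusters
print(round(mean_shift(0.9, data, bandwidth=1.0), 2))   # -> 1.1
print(round(mean_shift(5.3, data, bandwidth=1.0), 2))   # -> 5.05
```

Each starting point converges to the mode of its own cluster; grouping points by their final mode is exactly mean-shift clustering.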
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density → small bandwidth; low density → large bandwidth. The bandwidth of each point is based on its kth nearest neighbor.
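A sketch of that adaptive choice, taking each point's bandwidth to be its distance to its kth nearest neighbor (one common form of the rule; the data and the exact norm are illustrative):

```python
def knn_bandwidth(points, k):
    """Per-point bandwidth = distance to the k-th nearest neighbor,
    so dense regions get small windows and sparse regions large ones."""
    bw = []
    for x in points:
        dists = sorted(abs(x - p) for p in points if p is not x)
        bw.append(dists[k - 1])
    return bw

data = [1.0, 1.1, 1.2, 5.0]
print(knn_bandwidth(data, k=2))   # dense points get ~0.2, the outlier ~3.9
```

Computing these k-NN distances for every point is precisely the expensive step that the LSH data structure below accelerates.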
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm:
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering
[Figure: 3D feature-space view. From "Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02]

Image segmentation algorithm
[Figure: original, filtered, and segmented images; mean-shift trajectories]
Filtering: each pixel takes the value of its nearest mode.
Filtering examples
[Figure: squirrel and baboon images, original vs. filtered]

Segmentation examples
(Figures from "Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x, test whether x_{d_k} ≤ v_k for each of the K pairs; the resulting K-bit vector is the point's bucket in that partition
• This partitions the data into cells
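The structure above can be sketched compactly. Plain Python; the coordinate/threshold choices and the sample points are illustrative:

```python
import random
from collections import defaultdict

def build_lsh(points, K, L, dim, seed=0):
    """L partitions; each is K (coordinate, threshold) pairs.
    A point's key in a partition is the K-bit vector of tests x[d] <= v."""
    rng = random.Random(seed)
    cuts = [[(rng.randrange(dim), rng.uniform(0, 1)) for _ in range(K)]
            for _ in range(L)]
    tables = [defaultdict(list) for _ in range(L)]
    for x in points:
        for table, pairs in zip(tables, cuts):
            key = tuple(x[d] <= v for d, v in pairs)
            table[key].append(x)
    return cuts, tables

def query(q, cuts, tables):
    """Union of the query's buckets over all L partitions."""
    out = []
    for table, pairs in zip(tables, cuts):
        out.extend(table[tuple(q[d] <= v for d, v in pairs)])
    return out

pts = [(0.1, 0.2), (0.15, 0.25), (0.9, 0.8)]
cuts, tables = build_lsh(pts, K=4, L=3, dim=2)
print(query((0.12, 0.22), cuts, tables))  # nearby points are likely returned
```

The union of the L buckets is the candidate set; only these candidates get exact distance computations.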
Choosing the optimal K and L
• Goal: for a query q, compute the smallest possible number of distances to the points in its buckets
• If L is too small, neighbor points might be missed; but if L is too big, extra points might be included
• Large K → a smaller number of points in each cell
• As L increases, the union of the query's cells C∪ increases, but their intersection C∩ decreases; C∩ determines the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN distance (bandwidth) for m randomly-selected data points
• Choose an error threshold ε
• The optimal K and L should satisfy that the approximate distance is within (1 + ε) of the true distance
Choosing optimal K and L
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[Figure: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] with its minimum marked]
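The selection loop above can be sketched as follows; the error and time estimators are illustrative stand-ins for measurements on the m sampled points:

```python
def choose_k_l(error, time_cost, Ks, Ls, eps):
    """For each K, find the minimal L meeting the error constraint,
    then return the (K, L) pair with the smallest running time."""
    best = None
    for K in Ks:
        L_K = next((L for L in Ls if error(K, L) <= eps), None)
        if L_K is None:
            continue  # no L meets the constraint for this K
        t = time_cost(K, L_K)
        if best is None or t < best[2]:
            best = (K, L_K, t)
    return best

# Illustrative stand-ins: error falls as K*L grows, time grows with K*L
err = lambda K, L: 1.0 / (K * L)
cost = lambda K, L: K * L
print(choose_k_l(err, cost, Ks=[5, 10, 20], Ls=range(1, 100), eps=0.05))
```

In the paper's scheme the same idea is used with measured quantities: the error comes from comparing approximate and exact KNN distances on the sample, and one pass over the data evaluates all L's per K.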
Data driven partitions
• In the original LSH, cut values are drawn at random from the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: points-per-bucket distribution, uniform vs. data-driven cuts]
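The two ways of drawing a cut can be contrasted in a few lines (an illustrative sketch; function names are not from the paper):

```python
import random

def uniform_cut(points, dim, rng):
    """Original LSH: random coordinate, threshold uniform over its range."""
    d = rng.randrange(dim)
    vals = [p[d] for p in points]
    return d, rng.uniform(min(vals), max(vals))

def data_driven_cut(points, dim, rng):
    """Suggested variant: take the threshold from an actual data point,
    so cuts concentrate where the data does."""
    d = rng.randrange(dim)
    return d, rng.choice(points)[d]

rng = random.Random(0)
pts = [(0.01, 0.02), (0.02, 0.03), (0.03, 0.01), (0.9, 0.95)]
d, v = data_driven_cut(pts, dim=2, rng=rng)
print(d, v)   # the cut value is always a coordinate of some data point
```

With clustered data, uniform cuts often fall in empty regions and produce very uneven buckets; data-driven cuts track the empirical distribution and even out the points-per-bucket histogram, as the figure above illustrates.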
Additional speedup
• Assume that all points in C∩ will converge to the same mode (C∩ acts as a type of aggregate)
[Figure: the intersection cell C∩ around a point and its mode]
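Under that assumption, each intersection cell needs only one mean-shift run. A cache sketch; the cell function and mode finder are assumed given, and the toy stand-ins below are illustrative:

```python
def cluster_with_cache(points, cell_of, find_mode):
    """Run the (expensive) mode search once per cell and reuse the
    result for every other point that falls in the same cell."""
    cache = {}
    modes = []
    for x in points:
        key = cell_of(x)
        if key not in cache:
            cache[key] = find_mode(x)
        modes.append(cache[key])
    return modes

# Toy stand-ins: the cell is the integer part, the "mode" is the cell center
modes = cluster_with_cache([0.2, 0.4, 1.7, 1.9],
                           cell_of=lambda x: int(x),
                           find_mode=lambda x: int(x) + 0.5)
print(modes)   # [0.5, 0.5, 1.5, 1.5]
```

The speedup comes from skipping the iterative mode search for all but one representative per cell, at the cost of a small approximation when a cell straddles two basins of attraction.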
Speedup results
65,536 points; 1,638 points sampled; k = 100
[Figure: speedup in low dimension vs. high dimension]

Food for thought
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality, or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN

15:30 – cookies…
Summary
• LSH trades some accuracy for a large gain in complexity
• Applications that involve massive data in high dimensions require LSH's fast performance
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni: andoni@mit.edu
  – Test it over your own data
  (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering
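Step 1 above can be sketched as follows. Dividing the spatial part by hs and the color part by hr, so that a single window applies in the joint space, is a common convention assumed here rather than quoted from the paper:

```python
# Sketch of building the joint 5D feature space for segmentation:
# 2 spatial coordinates scaled by hs, 3 color channels scaled by hr
# (normalization convention assumed, not quoted from the paper).
def joint_features(image, hs, hr):
    """image: 2D list of (r, g, b) tuples -> list of 5D feature vectors."""
    feats = []
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            feats.append((x / hs, y / hs, r / hr, g / hr, b / hr))
    return feats
```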
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
[Figure: original, filtered, and segmented images]
Filtering: pixel value of the nearest mode
Mean-shift trajectories
[Figure: mean-shift trajectories in feature space]
Filtering examples
[Figure: original and filtered squirrel and baboon images]
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Segmentation examples
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (dk, vk)
• For each point x, check the K inequalities: is coordinate dk of x below the cut value vk?
• This partitions the data into cells
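The structure just described might be sketched like this; the exact cut-value distribution and bookkeeping in the paper differ, so treat `make_partitions` and `cell_key` as illustrative names:

```python
import random

# Sketch of the cell structure above (details assumed): each of the L
# partitions is a list of K (coordinate, threshold) pairs; a point's key
# in a partition is the K-bit pattern of its threshold tests.
def make_partitions(points, K, L, seed=0):
    rng = random.Random(seed)
    d = len(points[0])
    lo = [min(p[j] for p in points) for j in range(d)]
    hi = [max(p[j] for p in points) for j in range(d)]
    return [
        [(j, rng.uniform(lo[j], hi[j]))
         for j in (rng.randrange(d) for _ in range(K))]
        for _ in range(L)
    ]

def cell_key(x, partition):
    """K boolean threshold tests identify x's cell in this partition."""
    return tuple(x[j] < v for j, v in partition)
```

Points that share a key in some partition fall in the same cell of that partition, which is the bucket searched at query time.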
Choosing the optimal K and L
• Goal: for a query q, keep small the number of distance computations to the points in its buckets
• Large K: smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• As L increases, the set of retrieved points grows, but efficiency decreases; K determines the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN distance (bandwidth) for m randomly-selected data points
• Choose an error threshold ε
• The optimal K and L should satisfy: the approximate distance is within the threshold of the true distance
Choosing optimal K and L
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[Figure: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] and its minimum]
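The tuning loop above can be sketched as follows. `exact_knn` and `lsh_knn` are assumed stand-ins returning the true and approximate k-NN distances for a query; for each K the minimal L whose mean relative error is below eps is L(K), and the fastest feasible (K, L) pair wins:

```python
import time

# Sketch of the K/L tuning procedure (helper names assumed): scan L in
# increasing order, take the first feasible L per K, keep the fastest pair.
def tune(sample, Ks, Ls, eps, exact_knn, lsh_knn):
    best = None  # (time, K, L)
    for K in Ks:
        for L in sorted(Ls):
            t0 = time.perf_counter()
            errs = [(lsh_knn(q, K, L) - exact_knn(q)) / exact_knn(q)
                    for q in sample]
            t = time.perf_counter() - t0
            if sum(errs) / len(errs) <= eps:  # first feasible L is L(K)
                if best is None or t < best[0]:
                    best = (t, K, L)
                break
    return (best[1], best[2]) if best else None
```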
Data-driven partitions
• In the original LSH, cut values are uniformly random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: points-per-bucket distribution, uniform vs. data-driven cuts]
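The two cut-value strategies can be contrasted in a couple of lines: uniform cuts sample the value range, while data-driven cuts reuse a coordinate of a random data point, so cuts concentrate where the data does and buckets come out more balanced (a minimal sketch, not the paper's implementation):

```python
import random

# Uniform cut: any value in the data's range is equally likely.
def uniform_cut(values, rng):
    return rng.uniform(min(values), max(values))

# Data-driven cut: reuse a coordinate of a randomly chosen data point,
# so the cut distribution follows the data distribution.
def data_driven_cut(values, rng):
    return rng.choice(values)
```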
Additional speedup
• Assume that all points in a cell C will converge to the same mode (C acts as a type of aggregate)
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
[Figure: low-dimension vs. high-dimension comparison]
A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• A thought for food: does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 – cookies…
Summary
• LSH trades some accuracy for a large gain in complexity
• Applications that involve massive data in high dimension require LSH's fast performance
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test over your own data
(C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
LSH – Applications
• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
(figure: original, filtered, and segmented versions of an image, with the mean-shift trajectories)
Filtering: each pixel takes the value of its nearest mode
Filtering examples
(figures: original vs. filtered — squirrel, baboon)
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Segmentation examples
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k) — a coordinate index and a cut value
• For each point x_i, check whether x_{i,d_k} ≤ v_k for k = 1, …, K
• This partitions the data into cells
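The cell structure above can be sketched as follows; the helper names are illustrative, and the cut values here are drawn uniformly over the data range (the data-driven variant comes later):

```python
import numpy as np
from collections import defaultdict

def build_partitions(data, K, L, rng):
    """L random partitions, each defined by K (coordinate, cut value) pairs."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    tables = []
    for _ in range(L):
        dims = rng.integers(0, data.shape[1], size=K)   # d_k: coordinate indices
        cuts = rng.uniform(lo[dims], hi[dims])          # v_k: cut values
        buckets = defaultdict(list)
        for i, x in enumerate(data):
            key = tuple(x[dims] <= cuts)                # K boolean tests -> cell id
            buckets[key].append(i)
        tables.append((dims, cuts, buckets))
    return tables

def query_candidates(q, tables):
    """Union of the L cells that contain the query."""
    out = set()
    for dims, cuts, buckets in tables:
        out.update(buckets.get(tuple(q[dims] <= cuts), []))
    return out

rng = np.random.default_rng(2)
data = rng.uniform(0.0, 1.0, (200, 5))
tables = build_partitions(data, K=4, L=8, rng=rng)
cand = query_candidates(data[0], tables)
```

The candidate set returned for a query is exactly the union of the cells it falls into, one per partition.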
Choosing the optimal K and L
• For a query q, we want to compute as few distances as possible to the points in its buckets
• Large K → a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• The neighborhood of a query is the union of the cells containing it over the L partitions, C = ∪_l C_l
• The intersection cell C∩ determines the resolution of the data structure
• As L increases, the union C∪ increases but the intersection C∩ decreases
Choosing optimal K and L
• Determine accurately the KNN (at the bandwidth distance) for m randomly selected data points
• Choose an error threshold ε
• The optimal K and L should satisfy: the approximate k-NN distance stays within (1 + ε) of the exact one
Choosing optimal K and L
• For each K, estimate the error for each L
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
(figures: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] — the minimum gives the chosen pair)
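The error estimate that drives this selection can be sketched as follows; `candidates_fn` stands in for the bucket lookup of the LSH structure (an illustrative name, not from the paper):

```python
import numpy as np

def knn_distance_error(data, queries, k, candidates_fn):
    """Mean relative error of the approximate k-th NN distance vs. the exact one."""
    errs = []
    for q in queries:
        exact = np.sort(np.linalg.norm(data - q, axis=1))[k]
        cand = data[sorted(candidates_fn(q))]
        if len(cand) <= k:
            errs.append(np.inf)  # the structure missed too many points
            continue
        approx = np.sort(np.linalg.norm(cand - q, axis=1))[k]
        errs.append(approx / exact - 1.0)  # >= 0: pruning can only grow the distance
    return float(np.mean(errs))

rng = np.random.default_rng(3)
data = rng.normal(size=(100, 3))
# sanity check: with all points as candidates the error is exactly zero
err = knn_distance_error(data, data[:5], k=3, candidates_fn=lambda q: range(len(data)))
```

For each K one would sweep L upward until this error drops below the chosen ε, then keep the (K, L(K)) pair with the smallest running time.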
Data-driven partitions
• In the original LSH, cut values are drawn at random over the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
(figure: bucket-occupancy distribution for uniform vs. data-driven cut points)
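The difference between the two cut-value policies shows up immediately on skewed data; a small sketch (the toy data and names are illustrative):

```python
import numpy as np

def uniform_cut(col, rng):
    # original LSH: cut value uniform over the range of the data
    return rng.uniform(col.min(), col.max())

def data_driven_cut(col, rng):
    # suggestion: the coordinate of a randomly selected data point
    return col[rng.integers(len(col))]

# skewed 1D data: 95 points near 0, 5 outliers near 100
rng = np.random.default_rng(4)
col = np.concatenate([rng.uniform(0.0, 1.0, 95), rng.uniform(99.0, 100.0, 5)])
frac_dd = np.mean([data_driven_cut(col, rng) < 1.0 for _ in range(200)])
frac_u = np.mean([uniform_cut(col, rng) < 1.0 for _ in range(200)])
```

Data-driven cuts land where the mass is, keeping bucket occupancy balanced; uniform cuts mostly fall in the empty gap between the clusters.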
Additional speedup
• Assume that all points in C converge to the same mode (C acts as a type of aggregate)
Speedup results
(table: 65,536 points, 1,638 points sampled, k = 100)
Food for thought
(figure: low dimension vs. high dimension)
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• A thought for food: does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30: cookies…
Summary
• LSH trades some accuracy for a large gain in complexity
• Applications that involve massive data in high dimensions require LSH's fast performance
• LSH extends to different spaces (PSH)
• The LSH parameters and hash functions can be learned for different applications
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test it over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Outline
Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola, and T. Darrell
• Finding sensitive hash functions
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni, and P. Meer
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola, and T. Darrell
The Problem
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – an edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space: d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
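The angle metric reconstructed above is easy to sanity-check directly:

```python
import math

def d_theta(t1, t2):
    """Angle-space distance: sum over joints of 1 - cos(difference)."""
    return sum(1.0 - math.cos(a - b) for a, b in zip(t1, t2))

identical = d_theta([0.1, 0.2, 0.3], [0.1, 0.2, 0.3])  # same pose
opposite = d_theta([0.0], [math.pi])                    # opposite angle
```

Unlike plain squared error on raw angles, this metric is insensitive to 2π wrap-around.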
Example-based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find the KNN in the database of examples → output: the average angles of the KNN
The algorithm flow
Input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match
The image features
(figure: edge histograms computed over image sub-windows A and B)
Image features are multi-scale edge histograms
Feature Extraction | PSH | LWR
PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
(figure: a query q with corresponding neighborhoods in the parameter space (angles) and in the feature space)
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those that are sensitive to d_θ.
The hash functions are applied in the feature space, but the KNN are valid in the angle space.
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
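The accept/reject loop above amounts to scoring candidate threshold hashes against the ±1 similarity labels; a toy sketch (the data and names are illustrative, not the paper's selection procedure):

```python
import numpy as np

def pair_accuracy(T, feat, pairs, labels):
    """How well the hash h_T labels pairs: +1 if hashed together, -1 if apart."""
    h = np.where(feat >= T, 1, -1)                        # binary hash on one feature
    pred = np.where(h[pairs[:, 0]] == h[pairs[:, 1]], 1, -1)
    return float(np.mean(pred == labels))

def select_threshold(feat, pairs, labels, candidates):
    # keep the threshold whose induced labeling best matches the true labels
    return max(candidates, key=lambda T: pair_accuracy(T, feat, pairs, labels))

# toy setup: one feature separates two pose groups at 0
rng = np.random.default_rng(5)
feat = np.concatenate([rng.uniform(-1.0, -0.2, 50), rng.uniform(0.2, 1.0, 50)])
similar = np.array([[i, i + 1] for i in range(49)])       # pairs within group 1
different = np.array([[i, i + 50] for i in range(50)])    # pairs across groups
pairs = np.vstack([similar, different])
labels = np.concatenate([np.ones(49), -np.ones(50)])
best = select_threshold(feat, pairs, labels, candidates=[-0.5, 0.0, 0.5])
```

In the paper the selection runs over many features and thresholds, with bounds on the false-positive and false-negative rates rather than raw accuracy.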
PSH as a classification problem
(figure: example pairs labeled +1, +1, −1, −1; r = 0.25)
A pair of examples (x_i, θ_i), (x_j, θ_j) is labeled:
  y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
  y_ij = −1 if d_θ(θ_i, θ_j) ≥ (1 + ε) r
A binary hash function on the features:
  h_T(x) = +1 if the selected feature of x is above the threshold T, −1 otherwise
Predict the labels:
  ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best threshold T that predicts the true labeling subject to the probability constraints: h_T either places both examples of a pair in the same bin, or separates them.
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
  θ̂ = argmin_θ Σ_{x_i ∈ N(x_0)} K(d_x(x_i, x_0)) · d_θ(g(x_i), θ)
  (the neighbors N(x_0) are weighted by their feature-space distance to the query x_0; g(x_i) is the stored pose of example x_i)
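A zeroth-order sketch of that estimate, i.e. a kernel-weighted average of the neighbor poses (the paper fits a local regression; the names and the Gaussian kernel here are illustrative):

```python
import numpy as np

def lwr_pose(query_feat, nbr_feats, nbr_angles, h=1.0):
    """Weighted average of the neighbors' angles, weighted in feature space."""
    d = np.linalg.norm(nbr_feats - query_feat, axis=1)
    w = np.exp(-(d / h) ** 2)                 # kernel weight K(d_x)
    return (w[:, None] * nbr_angles).sum(axis=0) / w.sum()

# three neighbors returned by PSH; the closest one dominates the estimate
query = np.array([0.0, 0.0])
feats = np.array([[0.1, 0.0], [2.0, 0.0], [2.5, 0.0]])
angles = np.array([[0.5], [1.5], [1.7]])
est = lwr_pose(query, feats, angles)
```

Because the weights decay with feature-space distance, a spurious far-away neighbor barely perturbs the estimated pose.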
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, facial expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Tested on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without the selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Results – real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Example-based learning

• Construct a database of example images with their known angles.
• Given a query image, run your favorite feature extractor.
• Compute the KNN from the database.
• Use these KNNs to compute the average angles of the query.

Input: query → find the KNN in the database of examples → Output: average angles of the KNN.
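The steps above can be sketched in a few lines. This is a toy illustration with made-up features and angles, not the paper's pipeline; `estimate_angles`, the database arrays, and k = 3 are all assumptions for the example.

```python
import numpy as np

def estimate_angles(query_feat, db_feats, db_angles, k=3):
    """Toy example-based pose estimation: find the k nearest database
    examples in feature space and average their known angles."""
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    knn = np.argsort(dists)[:k]          # indices of the k closest examples
    return db_angles[knn].mean(axis=0)

# hypothetical database: 4 examples, 2-D features, one angle each
db_feats = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
db_angles = np.array([[10.0], [20.0], [30.0], [90.0]])
estimate_angles(np.array([0.1, 0.1]), db_feats, db_angles, k=3)  # -> [20.]
```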
The algorithm flow:

Input query → feature extraction → processed query → PSH (LSH), matched against the database of examples → LWR (regression) → output: match.
The image features

Image features are multi-scale edge histograms.

Feature Extraction PSH LWR
PSH: The basic assumption

There are two metric spaces here: the feature space (X, d_x) and the parameter space (Θ, d_θ).

We want similarity to be measured in the angle space, whereas LSH works on the feature space.

• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds

• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
(figure: a query q mapped between the parameter space of angles and the feature space)

Is this magic?

Parameter Sensitive Hashing (PSH)

The trick:
Estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in the feature space, but the KNN are valid in the angle space.
• Label pairs of examples with similar angles.
• Define hash functions h on the feature space.
• Predict the labeling of similar/non-similar examples by using h.
• Compare the labelings.
• If the labeling by h is good, accept h; else change h.
PSH as a classification problem

Labels (r = 0.25):
A pair of examples (x_i, θ_i), (x_j, θ_j) is labeled

  y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
  y_ij = -1 if d_θ(θ_i, θ_j) ≥ (1 + ε) r

(figure: example pairs labeled +1, +1, -1, -1)
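A minimal sketch of the pair-labeling rule above, for scalar angles; r = 0.25 matches the slide, while ε = 1 and the handling of in-margin pairs are assumptions for the example.

```python
def pair_label(theta_i, theta_j, r=0.25, eps=1.0):
    """Label a training pair by parameter-space distance:
    +1 if closer than r, -1 if farther than (1 + eps) * r."""
    d = abs(theta_i - theta_j)           # d_theta for scalar angles
    if d <= r:
        return +1
    if d >= (1 + eps) * r:
        return -1
    return None                          # pairs in the margin are unused

pair_label(0.1, 0.2)   # -> 1   (d = 0.1 <= 0.25)
pair_label(0.1, 0.9)   # -> -1  (d = 0.8 >= 0.5)
```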
A binary hash function on features:

  h_T(x) = +1 if x ≥ T
           -1 otherwise

Predict the labels:

  ŷ_ij = +1 if h_T(x_i) = h_T(x_j)
         -1 otherwise
Find the best threshold T that predicts the true labeling, subject to the probability constraints: h_T will place both examples in the same bin or separate them.

Feature Extraction PSH LWR
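One way to read the selection step: score each candidate stump (feature, T) by how often its same-bin/different-bin prediction agrees with the true pair labels, and keep the best scorers. The toy data and the plain agreement score are illustrative assumptions, not the paper's exact probabilistic constraints.

```python
def h(x, feature, T):
    """Axis-parallel binary hash: +1 if the chosen feature is >= T."""
    return 1 if x[feature] >= T else -1

def agreement(pairs, labels, feature, T):
    """Fraction of labeled pairs whose predicted label
    (same bin -> +1, different bins -> -1) matches the true label."""
    ok = 0
    for (xi, xj), y in zip(pairs, labels):
        y_hat = 1 if h(xi, feature, T) == h(xj, feature, T) else -1
        ok += (y_hat == y)
    return ok / len(labels)

pairs = [([0.1], [0.2]), ([0.1], [0.9]), ([0.8], [0.95])]
labels = [1, -1, 1]
agreement(pairs, labels, feature=0, T=0.5)   # -> 1.0
```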
Local Weighted Regression (LWR)

• Given a query image, PSH returns its KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:

  θ_0 = argmin_θ Σ_{x_i ∈ N(x)} K(d(x_i, x)) · (g(x_i; θ) - θ_i)²

  where the kernel K turns feature-space distance into a weight.

Feature Extraction PSH LWR
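A zeroth-order sketch of the idea: instead of the full regression, just take a kernel-weighted average of the neighbors' angles, with weights decaying in feature-space distance. The Gaussian kernel and the bandwidth value are assumptions for the example.

```python
import numpy as np

def lwr_angle(query, nbr_feats, nbr_angles, bandwidth=1.0):
    """Kernel-weighted average of the KNN's angles (zeroth-order LWR)."""
    d = np.linalg.norm(nbr_feats - query, axis=1)
    w = np.exp(-(d / bandwidth) ** 2)    # dist -> weight
    return float(np.sum(w * nbr_angles) / np.sum(w))

# three equidistant neighbors -> a plain average of their angles
nbr_feats = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
nbr_angles = np.array([10.0, 20.0, 30.0])
lwr_angle(np.zeros(2), nbr_feats, nbr_angles)   # -> 20.0
```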
Results

Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints.
• 150,000 images.
• Nuisance parameters added: clothing, illumination, face expression.
• 1,775,000 example pairs.
• Selected 137 out of 5,123 meaningful features (how?).
• 18-bit hash functions (k), 150 hash tables (l).
• Test on 1,000 synthetic examples.
• PSH searched only 3.4% of the data per query.
• Without the feature selection, 40 bits and 1,000 hash tables would be needed.

Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, and B is the maximum number of points in a bucket.
Results – real data

• 800 images.
• Processed by a segmentation algorithm.
• 1.3% of the data were searched.

Interesting mismatches
Fast pose estimation – summary

• A fast way to compute the angles of a human body figure.
• Moving from one representation space to another.
• Training a sensitive hash function.
• KNN smart averaging.

Food for thought

• The basic assumption may be problematic (distance metric, representations).
• The training set should be dense.
• Texture and clutter.
• In general, some features are more important than others and should be weighted.
Food for thought: Point Location in Different Spheres (PLDS)

• Given n spheres in R^d, centered at P = {p_1, …, p_n}, with radii r_1, …, r_n.
• Goal: given a query q, preprocess the points in P so as to find a point p_i whose sphere 'covers' the query q.

Courtesy of Mohamad Hegaze
Motivation

• Clustering high-dimensional data by using local density measurements (e.g., in feature space).
• Statistical curse of dimensionality: sparseness of the data.
• Computational curse of dimensionality: expensive range queries.
• LSH parameters should be adjusted for optimal performance.

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer
Outline

• Mean-shift in a nutshell + examples.

Our scope:
• Mean-shift in high dimensions – using LSH.
• Speedups:
  1. Finding optimal LSH parameters.
  2. Data-driven partitions into buckets.
  3. Additional speedup by using the LSH data structure.
Mean-Shift in a Nutshell

(roadmap: mean-shift | LSH | optimal k,l | data-driven partition | LSH data structure)

(figure: a point and its bandwidth window)

KNN in mean-shift

The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth.

Based on the kth nearest neighbor x_{i,k} of the point x_i, the bandwidth is h_i = ||x_i - x_{i,k}||.
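The kth-nearest-neighbor bandwidth rule can be sketched directly; the brute-force distance matrix below is only for illustration and the toy data are an assumption.

```python
import numpy as np

def adaptive_bandwidths(X, k):
    """Per-point bandwidth h_i = distance from x_i to its k-th nearest
    neighbor: small in dense regions, large in sparse ones."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D.sort(axis=1)           # column 0 holds the zero self-distance
    return D[:, k]

X = np.array([[0.0], [0.1], [0.2], [5.0]])
adaptive_bandwidths(X, k=2)   # -> [0.2, 0.1, 0.2, 4.9]
```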
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm:
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color).
3. Apply filtering.

(figure: 3D feature-space view)
Mean-Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm

(figures: original, filtered, and segmented images; mean-shift trajectories)
Filtering: each pixel takes the value of its nearest mode.

Filtering examples

(figures: squirrel and baboon, original vs. filtered)
Mean-Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02

Segmentation examples

Mean-Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions

• Computational curse of dimensionality: expensive range queries, implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data, handled with a variable bandwidth.
LSH-based data structure

• Choose L random partitions. Each partition includes K pairs (d_k, v_k).
• For each point x_i we check, for each of the K pairs, whether x_i[d_k] ≤ v_k.
• This partitions the data into cells.
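The partition structure can be sketched as follows; the dimensions, cut values, and table layout here are randomly generated assumptions, not tuned parameters.

```python
import numpy as np

def bucket_key(x, cuts):
    """K-bit label of x under one partition: bit k records whether
    coordinate d_k of x is below the cut value v_k."""
    return tuple(bool(x[d] <= v) for d, v in cuts)

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 5))           # toy data: 100 points in 5-D

K, L = 3, 4                              # K cuts per partition, L partitions
partitions = [[(int(rng.integers(5)), float(rng.uniform()))
               for _ in range(K)] for _ in range(L)]

tables = [{} for _ in range(L)]          # bucket key -> list of point indices
for i, x in enumerate(X):
    for table, part in zip(tables, partitions):
        table.setdefault(bucket_key(x, part), []).append(i)
# a query is then compared only against the union of its L buckets
```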
Choosing the optimal K and L

• For a query q, we want to compute the smallest number of distances to points in its buckets.
• Large K: a smaller number of points in a cell.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
• K and L determine the resolution of the data structure: as L increases, the union of cells C∪ increases, but the intersection C∩ decreases.
Choosing optimal K and L

• Determine accurately the KNN distance (the bandwidth) for m randomly-selected data points.
• Choose an error threshold ε.
• The optimal K and L should satisfy: the approximate KNN distance is within a factor of (1 + ε) of the true one.
Choosing optimal K and L

• For each K, estimate the error.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).

(figure: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)], with the minimum marked)
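The shape of this optimization can be illustrated with toy models. Both the error model (per-table miss probability (1 - Q**K)**L) and the cost model below are assumptions invented for the sketch; the paper estimates these quantities from sampled data instead.

```python
import math

Q = 0.9   # assumed probability that one random cut keeps a true neighbor

def L_of_K(K, eps=0.05):
    """Smallest L whose miss probability (1 - Q**K)**L is <= eps."""
    return math.ceil(math.log(eps) / math.log(1 - Q ** K))

def t(K, L, n=1000):
    """Assumed cost: L*K hash evaluations plus ~n*L/2**K candidates."""
    return L * K + n * L / 2 ** K

best_K = min(range(1, 31), key=lambda K: t(K, L_of_K(K)))
best_K, L_of_K(best_K)   # with these toy models: K = 8, L = 6
```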
Data-driven partitions

• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.

(figure: bucket distribution, uniform vs. data-driven cut points)
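A quick numerical illustration of why data-driven cuts help on non-uniform data; the Gaussian sample and the balance score are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=1000)             # clearly non-uniform 1-D data

def balance(cut):
    """How evenly a cut splits the data (0.5 would be a perfect split)."""
    left = float(np.mean(data <= cut))
    return min(left, 1.0 - left)

# original LSH: cuts drawn uniformly over the data's range
u = np.mean([balance(rng.uniform(data.min(), data.max()))
             for _ in range(200)])
# data-driven variant: cuts taken from randomly selected data points
d = np.mean([balance(rng.choice(data)) for _ in range(200)])
# on this data the data-driven cuts give markedly more even buckets (d > u)
```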
Additional speedup

Assume that all points in C∩ will converge to the same mode (C∩ acts like a type of aggregate).
Speedup results

65,536 points; 1,638 points sampled; k = 100.

Food for thought: low dimension vs. high dimension

A thought for food…
• Choose K, L by sample-based learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold? Intuitively, the dimensionality implies the number of hash functions needed. The catch: efficient dimensionality learning requires KNN.

15:30, cookies…
Summary

• LSH trades accuracy for a gain in complexity.
• Applications that involve massive data in high dimension require the LSH's fast performance.
• The LSH extends to different spaces (PSH).
• The LSH parameters and hash functions can be learned for different applications.

Conclusion

• But in the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)
Thanks

• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and L
• Determine accurately the KNN for m randomly selected data points; their KNN distance is the bandwidth.
• Choose an error threshold ε.
• The optimal K and L should satisfy an error bound on the approximate distance.
Choosing optimal K and L
• For each K, estimate the error.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).
[Figure panels: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)]; the minimum]
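The selection loop can be sketched with toy error/time models; in the real procedure both quantities are measured on the m sampled points, so `error` and `time_cost` below are illustrative stand-ins:

```python
def choose_k_l(error, time_cost, Ks, Ls, eps):
    """For each K, scan L in increasing order and keep the minimal L whose
    estimated error meets the threshold eps; among the feasible (K, L(K))
    pairs, return the one with the smallest running time."""
    best = None  # (K, L, time)
    for K in Ks:
        L_ok = next((L for L in sorted(Ls) if error(K, L) <= eps), None)
        if L_ok is None:
            continue  # no L satisfies the error constraint for this K
        t = time_cost(K, L_ok)
        if best is None or t < best[2]:
            best = (K, L_ok, t)
    return best

# Toy monotone models: error falls as L grows and rises with K;
# time grows with L but each table gets cheaper as K grows.
err = lambda K, L: K / (10.0 * L)
cost = lambda K, L: L * 2 ** -K + K
```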
Data driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Figure: bucket point distribution, uniform vs. data-driven cuts]
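A sketch of the two cut-value rules (helper names are hypothetical); the point is that data-driven cuts follow the empirical distribution of the data, so buckets stay balanced in dense regions:

```python
import random

def uniform_cut(points, dim, rng):
    """Original LSH: cut value drawn uniformly over the data range in `dim`."""
    vals = [p[dim] for p in points]
    return rng.uniform(min(vals), max(vals))

def data_driven_cut(points, dim, rng):
    """Suggested variant: the coordinate of a randomly chosen data point,
    so cuts land where the data is dense."""
    return rng.choice(points)[dim]
```

With skewed data (say, a tight cluster plus one far outlier), uniform cuts mostly fall in empty space, while data-driven cuts land inside the cluster.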
Additional speedup
Assume that all points in a cell C will converge to the same mode (C acts like a type of aggregate).
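The aggregation idea can be sketched as a per-cell cache; `find_mode` stands in for a full mean-shift iteration and is an assumed callable, not the paper's code:

```python
def modes_with_cell_cache(points, cell_of, find_mode):
    """Run the (expensive) mode search once per cell and reuse the result
    for every other point in the same cell."""
    cache = {}
    modes = []
    for i, p in enumerate(points):
        c = cell_of[i]
        if c not in cache:
            cache[c] = find_mode(p)  # full mean-shift iteration, once per cell
        modes.append(cache[c])
    return modes
```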
Speedup results
65,536 points; 1,638 points sampled; k = 100.
Food for thought
Low dimension High dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning itself requires KNN.
15:30: cookies…
Summary
• LSH trades some accuracy for a gain in complexity.
• Applications that involve massive data in high dimension require the fast performance of LSH.
• The LSH extends to different spaces (PSH).
• The LSH parameters and hash functions can be learned for different applications.
Conclusion
• But in the end, everything depends on your data set.
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni, andoni@mit.edu
– Test it over your own data
(C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated and curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
Pipeline: Feature Extraction → PSH → LWR
[Figure: a query q mapped between the feature space and the parameter space (angles)]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick:
Estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
1. Label pairs of examples with similar angles.
2. Define hash functions h on the feature space.
3. Predict the labeling of similar/non-similar examples by using h.
4. Compare the labelings.
5. If the labeling by h is good, accept h; else change h.
PSH as a classification problem
[Figure: example pairs labeled +1, +1, −1, −1 (r = 0.25)]
Labels: a pair of examples (x_i, x_j) is labeled
y_ij = +1 if d(θ_i, θ_j) ≤ r
y_ij = −1 if d(θ_i, θ_j) ≥ (1 + ε)r
A binary hash function on features:
h_T(x) = +1 if φ(x) > T, −1 otherwise
Predict the labels:
ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
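A toy version of this evaluation, assuming single-feature threshold hashes (here φ(x) is just one coordinate, an illustrative simplification):

```python
def h(x, feat, T):
    """Binary hash on one feature: +1 if it exceeds the threshold T."""
    return 1 if x[feat] > T else -1

def pair_accuracy(pairs, labels, feat, T):
    """Fraction of labeled pairs where the same-bucket prediction
    (+1 iff both hash values agree) matches the true label."""
    hits = 0
    for (xi, xj), y in zip(pairs, labels):
        y_hat = 1 if h(xi, feat, T) == h(xj, feat, T) else -1
        hits += int(y_hat == y)
    return hits / len(pairs)
```

Scanning candidate thresholds T and keeping those with high pair accuracy is the spirit of "accept h or change h" above.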
Find the best T that predicts the true labeling, with the probability constraints: h_T will place both examples in the same bin, or separate them.
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
θ̂_0 = argmin_β Σ_{x_i ∈ N(x_0)} K(d(x_i, x_0)) (g(x_i; β) − θ_i)², where the kernel K turns distance into weight.
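A sketch of the zeroth-order case, where the weighted regression reduces to a kernel-weighted average of the neighbors' angle vectors (the Gaussian kernel and the names are assumptions):

```python
import math

def lwr_angles(query, neighbors, angles, bandwidth):
    """Kernel-weighted average of the neighbors' angle vectors, with
    weights decaying in the feature-space distance to the query."""
    w = [math.exp(-(math.dist(query, x) / bandwidth) ** 2) for x in neighbors]
    s = sum(w)
    n_angles = len(angles[0])
    return [sum(wi * a[j] for wi, a in zip(w, angles)) / s
            for j in range(n_angles)]
```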
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for the joints.
• 150,000 images.
• Nuisance parameters added: clothing, illumination, facial expression.
• 1,775,000 example pairs.
• Selected 137 out of 5,123 meaningful features (how?).
• 18-bit hash functions (k), 150 hash tables (l).
• Tested on 1,000 synthetic examples.
• PSH searched only 3.4% of the data per query.
• Without feature selection, 40 bits and 1,000 hash tables would have been needed.
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, and B is the maximum number of points in a bucket.
Results – real data
• 800 images.
• Processed by a segmentation algorithm.
• 1.3% of the data were searched.
Results – real data
Interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure.
• Moving from one representation space to another.
• Training a sensitive hash function.
• KNN smart averaging.
Food for Thought
• The basic assumption may be problematic (distance metric, representations).
• The training set should be dense.
• Texture and clutter.
• In general, some features are more important than others and should be weighted.
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d centered at P = p_1, …, p_n, with radii r_1, …, r_n.
• Goal: given a query q, preprocess the points in P to find a point p_i whose sphere 'covers' the query q.
[Figure: query q inside the sphere of radius r_i around p_i]
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data using local density measurements (e.g., in feature space).
• Statistical curse of dimensionality: sparseness of the data.
• Computational curse of dimensionality: expensive range queries.
• The LSH parameters should be adjusted for optimal performance.
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
1. Finding the optimal LSH parameters
2. Data-driven partitions into buckets
3. Additional speedup using the LSH data structure
Mean-Shift in a Nutshell
[Figure: the bandwidth window around a point and the resulting mean-shift vector]
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region:
high density – small bandwidth; low density – large bandwidth.
Based on the k-th nearest neighbor of the point: the bandwidth is the distance from the point to that neighbor.
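A brute-force sketch of this per-point bandwidth (quadratic in n; the point of the LSH machinery above is precisely to avoid this exhaustive neighbor search):

```python
import math

def adaptive_bandwidths(points, k):
    """h_i = distance from x_i to its k-th nearest neighbor:
    small in dense regions, large in sparse ones."""
    hs = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        hs.append(dists[k - 1])
    return hs
```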
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
Labels
(figure: example pairs labeled +1 / −1, r = 0.25)
A pair of examples (x_i, x_j) is labeled:
y_ij = +1 if d(x_i, x_j) ≤ r
y_ij = −1 if d(x_i, x_j) ≥ (1 + ε)r
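A minimal sketch of this pairwise labeling rule, assuming Euclidean distance (the radius and slack values are illustrative; the slide's figure uses r = 0.25):

```python
import math

def label_pair(xi, xj, r, eps):
    """Label a pair of examples: +1 if they are closer than r,
    -1 if farther than (1 + eps) * r, None in the ambiguous band."""
    d = math.dist(xi, xj)
    if d <= r:
        return 1
    if d >= (1 + eps) * r:
        return -1
    return None  # inside the (r, (1+eps)r) margin: left unlabeled

print(label_pair((0.0, 0.0), (0.1, 0.1), r=0.25, eps=1.0))  # close pair -> 1
print(label_pair((0.0, 0.0), (1.0, 1.0), r=0.25, eps=1.0))  # far pair -> -1
```

Pairs falling in the margin between r and (1 + ε)r are not used for training, mirroring the gap between the two thresholds on the slide.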
Feature Extraction PSH LWR
A binary hash function on features:
h_T(x) = +1 if T(x), −1 otherwise
Predict the labels:
ŷ(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
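In code, with T modeled as a (feature index, threshold) test — a hypothetical concretization of the slide's predicate T(x):

```python
def h(x, feature, threshold):
    """Binary hash bit: +1 if the chosen feature clears the threshold."""
    return 1 if x[feature] >= threshold else -1

def predict_pair(xi, xj, feature, threshold):
    """Predicted pair label: +1 when both examples hash to the same bin."""
    return 1 if h(xi, feature, threshold) == h(xj, feature, threshold) else -1

# both above the threshold on feature 0 -> same bin -> +1
print(predict_pair((0.9, 0.2), (0.8, 0.7), feature=0, threshold=0.5))
# opposite sides of the threshold -> separated -> -1
print(predict_pair((0.9, 0.2), (0.1, 0.7), feature=0, threshold=0.5))
```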
Feature
Find the best T that predicts the true labeling with the probability constraints:
h_T(x) will place both examples in the same bin, or separate them.
Local Weighted Regression (LWR)
• Given a query image, PSH returns KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
θ̂(x_0) = argmin Σ_{x_i ∈ N(x_0)} K(d(x_i, x_0)) · (θ_i − g(x_i))²
where K is a distance-weighting kernel and g is the local model being fit.
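A zeroth-order sketch of this step: a kernel-weighted mean of the neighbors' stored angles. The Gaussian kernel and bandwidth here are assumptions; the paper fits a local regression model, of which the weighted mean is the simplest case:

```python
import math

def lwr_angle(query_dists, neighbor_angles, bandwidth=1.0):
    """Weighted average of the KNN's angle estimates, weighting each
    neighbor by K(d(x_i, x_0)) = exp(-(d / bandwidth)^2)."""
    weights = [math.exp(-(d / bandwidth) ** 2) for d in query_dists]
    return sum(w * a for w, a in zip(weights, neighbor_angles)) / sum(weights)

# a nearby neighbor dominates a distant one
print(lwr_angle([0.0, 3.0], [10.0, 90.0]))  # close to 10
```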
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without selection, needed 40 bits and 1,000 hash tables
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, and B is the max number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Results – real data
Interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p_1, …, p_n}, with radii r_1, …, r_n
• Goal: given a query q, preprocess the points in P to find a point p_i whose sphere 'covers' the query q
(figure: query q inside the sphere of radius r_i around p_i)
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
1. Finding optimal LSH parameters
2. Data-driven partitions into buckets
3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
(figure: bandwidth window around a data point)
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region:
high density – small bandwidth; low density – large bandwidth.
It is based on the k-th nearest neighbor of the point: the bandwidth is the distance from the point to its k-th nearest neighbor.
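A brute-force sketch of this adaptive bandwidth rule. It is O(n²), which is exactly the cost the deck's LSH machinery is brought in to avoid on large data:

```python
import math

def adaptive_bandwidths(points, k):
    """h_i = distance from x_i to its k-th nearest neighbor:
    small bandwidth in dense regions, large in sparse ones.
    Brute force O(n^2); the deck uses LSH to approximate this step."""
    out = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points if q is not p)
        out.append(dists[k - 1])
    return out

pts = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 0.0)]
print(adaptive_bandwidths(pts, k=1))  # the isolated point gets a large bandwidth
```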
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths: h_s (spatial), h_r (color)
3. Apply filtering
(figure: 3D feature space)
Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
(figures: original, filtered, and segmented images; mean-shift trajectories)
Filtering: pixel value of the nearest mode
Filtering examples
(figures: squirrel and baboon, original vs. filtered)
Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Segmentation examples
Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
Computational curse of dimensionality: expensive range queries – implemented with LSH.
Statistical curse of dimensionality: sparseness of the data – variable bandwidth.
LSH-based data structure
• Choose L random partitions. Each partition includes K pairs (d_k, v_k).
• For each point x, check whether x_{d_k} ≤ v_k for each of the K pairs.
• This partitions the data into cells.
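A sketch of this structure, assuming axis-aligned cuts: each of the L partitions is K (dimension, value) pairs, and a point's bucket key is the K-bit vector of comparison outcomes:

```python
import random
from collections import defaultdict

def make_partition(points, K, rng):
    """One random partition: K (dimension, cut-value) pairs, with cut
    values drawn uniformly over each dimension's data range."""
    dim = len(points[0])
    part = []
    for _ in range(K):
        d = rng.randrange(dim)
        lo, hi = min(p[d] for p in points), max(p[d] for p in points)
        part.append((d, rng.uniform(lo, hi)))
    return part

def bucket_key(x, partition):
    """The K boolean comparisons x[d_k] <= v_k form the cell label."""
    return tuple(x[d] <= v for d, v in partition)

def build_tables(points, K, L, rng):
    """L independent partitions, each hashing every point into its cell."""
    tables = []
    for _ in range(L):
        part = make_partition(points, K, rng)
        table = defaultdict(list)
        for i, p in enumerate(points):
            table[bucket_key(p, part)].append(i)
        tables.append((part, table))
    return tables

def query(q, tables):
    """Candidate neighbors: the union of q's buckets over the L partitions."""
    hits = set()
    for part, table in tables:
        hits.update(table[bucket_key(q, part)])
    return hits

rng = random.Random(0)
pts = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0)]
tables = build_tables(pts, K=2, L=3, rng=rng)
print(sorted(query((0.05, 0.05), tables)))
```

Only points sharing a bucket with the query in at least one of the L partitions are ever compared against it.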
Choosing the optimal K and L
• For a query q, compute distances only to the points in its buckets – as few distances as possible.
Large K – a smaller number of points in a cell.
If L is too small, points might be missed; but if L is too big, extra points might be included.
(figure: a query's cell in each partition; the union C∪ and intersection C∩ of its L cells)
As L increases, C∪ increases but C∩ decreases; K determines the resolution of the data structure.
Choosing optimal K and L
Determine accurately the KNN distance (bandwidth) for m randomly-selected data points.
Choose an error threshold ε.
The optimal K and L should keep the approximate KNN distance within the threshold of the true distance.
Choosing optimal K and L
• For each K, estimate the error for each L.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).
(figures: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)], whose minimum gives the chosen pair)
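The selection loop can be sketched as a grid search. Here `error` and `time` stand in for the quantities measured on the m sample points; the analytic shapes used in the demo are purely illustrative, not from the paper:

```python
def choose_k_l(k_values, l_values, error, time, eps=0.05):
    """For each K, find the minimal L meeting the error constraint (L(K)),
    then return the (K, L(K)) pair minimizing the running time."""
    best = None
    for K in k_values:
        L = next((L for L in l_values if error(K, L) <= eps), None)
        if L is None:
            continue  # no L satisfies the constraint for this K
        t = time(K, L)
        if best is None or t < best[2]:
            best = (K, L, t)
    return best

# illustrative stand-ins: error falls as L grows (faster for small K),
# running time grows with both K and L
err = lambda K, L: K / (8.0 * L)
cost = lambda K, L: K * L
print(choose_k_l(range(4, 21, 4), range(1, 200), err, cost))  # -> (4, 10, 40)
```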
Data-driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
(figure: bucket point-distribution, uniform vs. data-driven cuts)
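The suggestion in one line: draw the cut from the empirical distribution instead of the uniform one (a sketch):

```python
import random

def data_driven_cut(points, dim, rng):
    """Cut value = that coordinate of a randomly chosen data point,
    so cuts fall where the data is and buckets stay better balanced."""
    return rng.choice(points)[dim]

# skewed data: a uniform cut over [0.0, 9.9] would usually split off nothing
pts = [(0.0,), (0.1,), (0.2,), (9.9,)]
print(data_driven_cut(pts, dim=0, rng=random.Random(1)))
```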
Additional speedup
Assume that all points in C∩ will converge to the same mode (C∩ is like a type of an aggregate).
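A sketch of this aggregation: run the iterative convergence once per cell and reuse the mode for every other point in that cell. `cell_of` and `converge` are hypothetical stand-ins for the C∩ assignment and the mean-shift iteration:

```python
def cluster_with_cell_cache(points, cell_of, converge):
    """Pay the convergence cost once per cell; later points reuse the mode."""
    mode_of_cell = {}
    modes = []
    for p in points:
        c = cell_of(p)
        if c not in mode_of_cell:
            mode_of_cell[c] = converge(p)   # expensive step, once per cell
        modes.append(mode_of_cell[c])       # cheap lookup for the rest
    return modes

calls = []
def toy_converge(p):
    calls.append(p)          # count how often the expensive step runs
    return round(p)          # pretend the mode is the nearest integer

modes = cluster_with_cell_cache([0.1, 0.2, 0.9, 1.1, 0.15],
                                cell_of=round, converge=toy_converge)
print(modes, len(calls))  # -> [0, 0, 1, 1, 0] 2
```

Five points, but only two runs of the iteration: one per occupied cell.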
Speedup results
(table: 65,536 points, 1,638 points sampled, k = 100)
Food for thought
Low dimension vs. high dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30: cookies…
Summary
• LSH suggests a compromise on accuracy for a gain in complexity.
• Applications that involve massive data in high dimension require the LSH's fast performance.
• Extension of the LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.
Conclusion
• But at the end, everything depends on your data set.
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (dk, vk).
• For each point x we test x[dk] ≤ vk, k = 1…K.
• This partitions the data into cells.
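The partition scheme above (L partitions of K (dk, vk) pairs each, with buckets keyed by the pattern of comparisons) can be sketched as follows. Drawing the cut values uniformly over the data range is an assumption here, as are the function names:

```python
import random
from collections import defaultdict

def build_lsh(points, K, L, seed=0):
    """L random partitions; each is K (dimension, cut-value) pairs.
    A point's bucket key in one partition is the K-bit pattern of
    x[d_k] <= v_k tests."""
    rng = random.Random(seed)
    d = len(points[0])
    lo = [min(p[i] for p in points) for i in range(d)]
    hi = [max(p[i] for p in points) for i in range(d)]
    tables = []
    for _ in range(L):
        cuts = []
        for _ in range(K):
            dim = rng.randrange(d)
            cuts.append((dim, rng.uniform(lo[dim], hi[dim])))
        buckets = defaultdict(list)
        for idx, p in enumerate(points):
            key = tuple(p[dim] <= v for dim, v in cuts)
            buckets[key].append(idx)
        tables.append((cuts, buckets))
    return tables

def candidates(tables, q):
    """Union of q's buckets across the L partitions."""
    out = set()
    for cuts, buckets in tables:
        key = tuple(q[dim] <= v for dim, v in cuts)
        out.update(buckets.get(key, ()))
    return out
```

A query then inspects one bucket per partition, so only the union of L buckets is searched instead of all n points.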
Choosing the optimal K and L
• For a query q, we want to compute the smallest number of distances to points in its buckets.
• Large K: smaller number of points in a cell.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
• K determines the resolution of the data structure.
• As L increases, [the number of retrieved points] increases but [the chance of missing neighbors] decreases.
[Equations relating the expected cell population and query time to n, d, K, and L were lost in extraction.]
Choosing optimal K and L
• Determine accurately the KNN distance (bandwidth) for m randomly selected data points.
• Choose an error threshold ε.
• The optimal K and L should satisfy an error bound relating the approximate distance to the true KNN distance.
Choosing optimal K and L
• For each K, estimate the error.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).
[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] with its minimum marked]
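The selection procedure on these slides boils down to a small search. This sketch takes the error estimator and the time model as caller-supplied callables, since the slides' exact formulas were lost in extraction:

```python
def choose_parameters(error, time_cost, K_values, L_values, eps):
    """For each K, find the minimal L whose estimated error is <= eps
    (L_values must be sorted ascending), then return the (K, L(K)) pair
    with the smallest modeled running time."""
    best = None
    for K in K_values:
        for L in L_values:
            if error(K, L) <= eps:
                t = time_cost(K, L)
                if best is None or t < best[0]:
                    best = (t, K, L)
                break  # minimal L for this K found
    if best is None:
        raise ValueError("no (K, L) meets the error threshold")
    return best[1], best[2]
```

In practice `error(K, L)` would be measured by comparing LSH results against the exact KNN of the m sampled points, and `time_cost` by timing queries.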
Data-driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Figure: bucket distribution for uniform vs. data-driven cut points]
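The two cut-value strategies compared above can be sketched side by side; both helper names are illustrative, not from the paper:

```python
import random

def uniform_cut(points, dim, rng):
    """Original LSH: cut value drawn uniformly over the data range."""
    vals = [p[dim] for p in points]
    return rng.uniform(min(vals), max(vals))

def data_driven_cut(points, dim, rng):
    """Slide's suggestion: a random data point's coordinate, so cuts
    land where the data actually is and buckets come out balanced."""
    return rng.choice(points)[dim]
```

With skewed data (say, a tight cluster plus a far outlier), uniform cuts tend to fall in the empty gap and leave one bucket holding almost everything, while data-driven cuts split the cluster itself.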
Additional speed-up
• Assume that all points in C will converge to the same mode (C is like a type of aggregate).
Speed-up results
65,536 points; 1,638 points sampled; k = 100
Food for thought
Low dimension High dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30 — cookies…
Summary
• LSH suggests a compromise: accuracy is traded for reduced complexity.
• Applications that involve massive data in high dimension require LSH's fast performance.
• Extensions of LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.
Conclusion
• But at the end, everything depends on your data set.
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test over your own data
(C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 34 of the data per query
• Without feature selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash; P2 is the probability of a bad hash; B is the max number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 13 of the data were searched
Results – real data
Interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
[Figure: low dimension vs. high dimension]
A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 cookies…
Summary
• LSH trades some accuracy for a large gain in complexity
• Applications that involve massive data in high dimensions require the fast performance of LSH
• Extensions of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test it over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
• Given n spheres in R^d centered at P = {p_1, …, p_n}, with radii r_1, …, r_n
• Goal: preprocess the points in P so that, given a query q, we can find a point p_i whose sphere covers q
[Figure: query q inside the sphere of radius r_i around p_i]
Courtesy of Mohamad Hegaze
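The open problem asks for a data structure that beats the naive linear scan, which for reference can be sketched as:

```python
def covering_sphere(query, centers, radii):
    # naive O(n) scan: return the index of a sphere that covers the query, else None
    for i, (c, r) in enumerate(zip(centers, radii)):
        if sum((q - ci) ** 2 for q, ci in zip(query, c)) <= r * r:
            return i
    return None

centers = [(0.0, 0.0), (3.0, 0.0)]
radii = [1.0, 0.5]
print(covering_sphere((0.5, 0.5), centers, radii))  # → 0
print(covering_sphere((5.0, 5.0), centers, radii))  # → None
```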
Motivation
• Clustering high-dimensional data by using local density measurements (e.g., in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
1. Finding optimal LSH parameters
2. Data-driven partitions into buckets
3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: mean-shift window of the given bandwidth around a point]
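In symbols: with a flat kernel, one mean-shift step moves a point to the mean of the data points inside its bandwidth window, and iterating converges to a density mode. A minimal 1-D sketch (not the paper's multivariate implementation):

```python
def mean_shift_step(x, points, bandwidth):
    # flat (uniform) kernel: average of all points within the bandwidth window
    window = [p for p in points if abs(p - x) <= bandwidth]
    return sum(window) / len(window)

def mean_shift(x, points, bandwidth, tol=1e-6, max_iter=100):
    # iterate until the shift is negligible: x converges to a density mode
    for _ in range(max_iter):
        nx = mean_shift_step(x, points, bandwidth)
        if abs(nx - x) < tol:
            break
        x = nx
    return x

data = [1.0, 1.1, 1.2, 5.0, 5.1]
print(round(mean_shift(0.9, data, bandwidth=0.5), 3))  # → 1.1
```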
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region:
high density → small bandwidth; low density → large bandwidth
• Based on the kth nearest neighbor of the point
• The bandwidth is the distance to that neighbor
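A brute-force sketch of this rule (illustrative only; the paper computes these neighbors approximately with LSH):

```python
def knn_bandwidth(points, k):
    # bandwidth of each point = distance to its kth nearest neighbor,
    # so dense regions get small windows and sparse regions large ones
    hs = []
    for i, x in enumerate(points):
        dists = sorted(abs(x - p) for j, p in enumerate(points) if j != i)
        hs.append(dists[k - 1])
    return hs

data = [0.0, 0.1, 0.2, 5.0]
print(knn_bandwidth(data, k=2))  # → [0.2, 0.1, 0.2, 4.9]
```

The isolated point at 5.0 gets a bandwidth an order of magnitude larger than the clustered points, which is exactly the adaptive behavior the slide describes.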
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
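The 5-D input of step 1 is the joint spatial–range representation: each pixel contributes its two coordinates plus its three color channels. A sketch with a hypothetical helper (normalization by h_s and h_r is omitted):

```python
def features_5d(image):
    # image: H x W grid of (r, g, b) pixels; joint domain = 2 spatial + 3 color coords
    return [(x, y, r, g, b)
            for y, row in enumerate(image)
            for x, (r, g, b) in enumerate(row)]

img = [[(10, 20, 30), (11, 21, 31)],
       [(12, 22, 32), (13, 23, 33)]]
feats = features_5d(img)
print(len(feats), feats[0])  # → 4 (0, 0, 10, 20, 30)
```

Mean-shift is then run on these 5-D vectors; dividing the spatial and color coordinates by h_s and h_r respectively is what gives the two bandwidths their role as resolution controls.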
[Figure: 3D feature space]
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
4. Filtering: replace each pixel value with that of the nearest mode
[Figure: original, filtered, and segmented images]
Mean-shift trajectories
Filtering examples
[Figures: squirrel and baboon, original vs. filtered]
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Segmentation examples
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Choosing optimal K and L
• For each K, estimate the approximation error
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)], minimum marked]
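The selection loop above can be sketched generically; `true_dist`, `approx_dist`, and `time_of` are hypothetical callables standing in for the brute-force KNN distances on the m sample queries, the LSH estimates under a given (K, L), and a timing run:

```python
def choose_k_l(Ks, Ls, true_dist, approx_dist, time_of, eps=0.05):
    """Pick (K, L(K)) per the slides: for each K take the minimal L
    meeting the error constraint, then keep the pair with the smallest
    measured running time.
    """
    best = None
    for K in Ks:
        for L in sorted(Ls):
            # Constraint: every approximate distance within (1 + eps)
            # of the corresponding true KNN distance.
            ok = all(a <= (1 + eps) * t
                     for a, t in zip(approx_dist(K, L), true_dist))
            if ok:
                t_run = time_of(K, L)
                if best is None or t_run < best[0]:
                    best = (t_run, K, L)
                break  # minimal satisfying L for this K
    return None if best is None else best[1:]
```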
Data-driven partitions
• In the original LSH, cut values are drawn uniformly at random over the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: points-per-bucket distribution, uniform vs. data-driven cuts]
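A sketch of the two ways to draw cut values (the function name `cut_values` is illustrative, not from the slides):

```python
import numpy as np

def cut_values(data, dims, rng, data_driven=True):
    """One cut value per chosen dimension.

    data_driven=False: uniform over the data range (original LSH).
    data_driven=True: a coordinate of a randomly chosen data point,
    so cuts land where the points actually are.
    """
    if data_driven:
        rows = rng.integers(0, len(data), size=len(dims))
        return data[rows, dims]
    return np.array([rng.uniform(data[:, d].min(), data[:, d].max())
                     for d in dims])
```

On skewed data, uniform cuts leave many near-empty buckets; data-driven cuts follow the density, spreading points more evenly across buckets.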
Additional speedup
• Assume that all points in C will converge to the same mode (C is like a type of aggregate)
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
Low dimension vs. high dimension

A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 cookies…
Summary
• LSH trades some accuracy for a large gain in complexity
• Applications that involve massive data in high dimension require LSH's fast performance
• Extensions of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• But at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test it on your own data
(C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
bull Mica and Denis
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis