Optimal Dimensionality of Metric Space for kNN Classification
Wei Zhang, Xiangyang Xue, Zichen Sun
Yuefei Guo, and Hong Lu
Dept. of Computer Science & Engineering
FUDAN University, Shanghai, China
Outline
Motivation: Related Work, Main Idea
Proposed Algorithm: Discriminant Neighborhood Embedding, Dimensionality Selection Criterion
Experimental Results: Toy Datasets, Real-world Datasets
Conclusions
Related Work
Many techniques have recently been proposed to learn a more appropriate metric space for better performance of learning and data mining algorithms, for example:
Relevant Component Analysis (Bar-Hillel, A., et al., ICML 2003)
Locality Preserving Projections (He, X., et al., NIPS 2003)
Neighborhood Components Analysis (Goldberger, J., et al., NIPS 2004)
Marginal Fisher Analysis (Yan, S., et al., CVPR 2005)
Local Discriminant Embedding (Chen, H.-T., et al., CVPR 2005)
Local Fisher Discriminant Analysis (Sugiyama, M., ICML 2006)
...
However, in all of the above approaches the target dimensionality of the new space is selected empirically.
Main Idea
Given finite labeled multi-class samples, what can we do to improve the performance of kNN classification?
Can we learn a low-dimensional embedding such that kNN points in the same class have smaller distances to each other than to points in different classes?
Can we estimate the optimal dimensionality of the new metric space at the same time?
[Figure: Original Space (D = 2) vs. New Space (d = 1)]
Outline
Motivation: Related Work, Main Idea
Proposed Algorithm: Discriminant Neighborhood Embedding, Dimensionality Selection Criterion
Experimental Results: Toy Datasets, Real-world Datasets
Conclusions
Setup
N labeled multi-class points: $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, $\mathbf{x}_i \in \mathbb{R}^D$, $y_i \in \{1, 2, \ldots, c\}$
$k$ nearest neighbors of $\mathbf{x}_i$ in the same class: $Neig^I(i)$
$k$ nearest neighbors of $\mathbf{x}_i$ in the other classes: $Neig^E(i)$
Discriminant adjacency matrix $F$:
$$F_{ij} = \begin{cases} +1 & \text{if } \mathbf{x}_i \in Neig^I(j) \text{ or } \mathbf{x}_j \in Neig^I(i) \\ -1 & \text{if } \mathbf{x}_i \in Neig^E(j) \text{ or } \mathbf{x}_j \in Neig^E(i) \\ 0 & \text{otherwise} \end{cases}$$
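As a concrete illustration of this setup, here is a minimal NumPy sketch (not the authors' code; the function name and interface are assumptions for illustration) that builds $Neig^I(i)$, $Neig^E(i)$, and the discriminant adjacency matrix $F$:

```python
import numpy as np

def discriminant_adjacency(X, y, k=1):
    """Discriminant adjacency matrix F for N samples (rows of X) with labels y."""
    N = X.shape[0]
    # Pairwise squared Euclidean distances in the original space
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)                      # a point is never its own neighbor

    F = np.zeros((N, N))
    for i in range(N):
        same = np.flatnonzero(y == y[i])
        same = same[same != i]
        other = np.flatnonzero(y != y[i])
        neig_I = same[np.argsort(d2[i, same])[:k]]    # k intra-class neighbors, Neig^I(i)
        neig_E = other[np.argsort(d2[i, other])[:k]]  # k inter-class neighbors, Neig^E(i)
        F[i, neig_I] = F[neig_I, i] = +1              # symmetric, matching the "or" in the definition
        F[i, neig_E] = F[neig_E, i] = -1
    return F
```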
Objective Function
Intra-class compactness in the new space:
$$\Phi(P) = \sum_{i,j:\ \mathbf{x}_i \in Neig^I(j) \text{ or } \mathbf{x}_j \in Neig^I(i)} \left\| P^T \mathbf{x}_i - P^T \mathbf{x}_j \right\|^2$$
Inter-class separability in the new space:
$$\Psi(P) = \sum_{i,j:\ \mathbf{x}_i \in Neig^E(j) \text{ or } \mathbf{x}_j \in Neig^E(i)} \left\| P^T \mathbf{x}_i - P^T \mathbf{x}_j \right\|^2$$
Overall objective:
$$\Phi(P) - \Psi(P) = \sum_{i,j} \left\| P^T \mathbf{x}_i - P^T \mathbf{x}_j \right\|^2 F_{ij} = 2\,\mathrm{trace}\!\left( P^T X (S - F) X^T P \right)$$
($S$ is a diagonal matrix whose entries are the column sums of $F$)
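To make the identity above concrete, the following sketch (an illustrative assumption, not the authors' code) checks numerically that the pairwise sum weighted by $F$ equals $2\,\mathrm{trace}(P^T X(S-F)X^T P)$. The identity holds for any symmetric $F$, so a random symmetric matrix with entries in {-1, 0, +1} stands in for the discriminant adjacency matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, d = 20, 5, 2
X = rng.normal(size=(N, D))                  # samples as rows (the slides use X as a D x N matrix)
P = rng.normal(size=(D, d))                  # an arbitrary D x d transformation

# Stand-in for the discriminant adjacency matrix: symmetric, zero diagonal, entries in {-1, 0, +1}
F = rng.integers(-1, 2, size=(N, N)).astype(float)
F = np.triu(F, 1)
F = F + F.T
S = np.diag(F.sum(axis=0))                   # diagonal matrix of column sums of F

# Pairwise form: sum_{i,j} ||P^T x_i - P^T x_j||^2 * F_ij
Z = X @ P                                    # projected points, one per row
pair_sum = sum(F[i, j] * np.sum((Z[i] - Z[j]) ** 2)
               for i in range(N) for j in range(N))

# Trace form: 2 * trace(P^T X (S - F) X^T P), with the D x N data matrix written as X.T here
trace_form = 2.0 * np.trace(P.T @ X.T @ (S - F) @ X @ P)

print(np.isclose(pair_sum, trace_form))      # True: the two forms agree
```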
How to Compute P
Note:
The matrix $X(S-F)X^T$ is symmetric, but not positive definite; it might have negative, zero, or positive eigenvalues.
The optimal transformation $P$ is obtained from the eigenvectors of $X(S-F)X^T$ corresponding to all of its $d$ negative eigenvalues:
$$P = \arg\min_{P} \sum_{i=1}^{m} P_i^T X(S-F)X^T P_i \quad \text{s.t. } P_i^T P_i = 1,\ P_i^T P_j = 0\ (i \neq j)$$
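A sketch of this eigen-step under the same assumptions as the earlier sketches (NumPy, samples stored as rows of X, and F the discriminant adjacency matrix); only eigenvectors with negative eigenvalues are retained:

```python
import numpy as np

def dne_projection(X, F):
    """Eigenvectors of X^T (S - F) X (samples as rows of X) with negative eigenvalues.

    Returns P (D x d) and the d negative eigenvalues, largest |lambda| first.
    """
    S = np.diag(F.sum(axis=0))            # diagonal matrix of column sums of F
    L = X.T @ (S - F) @ X                 # the slides' X(S-F)X^T, written for a D x N data matrix
    evals, evecs = np.linalg.eigh(L)      # L is symmetric, so eigh returns real eigenvalues
    neg = np.flatnonzero(evals < 0)       # keep only the negative eigenvalues
    order = neg[np.argsort(evals[neg])]   # most negative first, i.e. largest |lambda| first
    return evecs[:, order], evals[order]
```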
What does the Positive/Negative Eigenvalue Mean?
For the $i$th eigenvector $P_i$ corresponding to the $i$th eigenvalue $\lambda_i$:
$$\Phi(P_i) - \Psi(P_i) = 2\, P_i^T X(S-F)X^T P_i = 2\lambda_i\, P_i^T P_i = 2\lambda_i$$
$\Phi(P_i)$: the total kNN pairwise distance within the same class
$\Psi(P_i)$: the total kNN pairwise distance across different classes
If $\lambda_i < 0$, most points might be correctly classified; if $\lambda_i > 0$, most might be misclassified.
Choosing the Leading Negative Eigenvalues
Among all the negative eigenvalues, some might have much larger absolute values, while the others, with small absolute values, can be ignored.
We then choose the $t$ ($t < d$) negative eigenvalues with the largest absolute values such that
$$\sum_{i=1}^{t} |\lambda_i| \Big/ \sum_{i=1}^{d} |\lambda_i|$$
is above a chosen threshold close to 1.
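One plausible reading of this criterion in code (a sketch; the 0.95 threshold is an assumed example value, not taken from the slides, and the eigenvalues are the output of the eigen-step sketch above):

```python
import numpy as np

def choose_dimensionality(neg_evals, ratio=0.95):
    """Smallest t such that the t largest-|lambda| negative eigenvalues carry
    at least `ratio` of the total |negative eigenvalue| mass."""
    mags = np.sort(np.abs(neg_evals))[::-1]   # |lambda_1| >= |lambda_2| >= ... >= |lambda_d|
    cum = np.cumsum(mags) / mags.sum()        # cumulative share of the total
    t = int(np.searchsorted(cum, ratio)) + 1  # first index reaching the threshold
    return min(t, len(mags))
```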
Learned Mahalanobis Distance
In the original space, the distance between any pair of points can be obtained by
$$\mathrm{dist}_M^2(\mathbf{x}_i, \mathbf{x}_j) = \left\| P^T \mathbf{x}_i - P^T \mathbf{x}_j \right\|^2 = (\mathbf{x}_i - \mathbf{x}_j)^T P P^T (\mathbf{x}_i - \mathbf{x}_j) = (\mathbf{x}_i - \mathbf{x}_j)^T M (\mathbf{x}_i - \mathbf{x}_j), \qquad M = P P^T$$
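A small sketch of the induced metric (assuming a projection matrix P, e.g. the output of the eigen-step sketch above):

```python
import numpy as np

def learned_distance_sq(xi, xj, P):
    """Squared Mahalanobis distance induced by M = P P^T in the original space."""
    diff = xi - xj
    M = P @ P.T                               # the learned metric matrix
    return float(diff @ M @ diff)             # equals ||P^T xi - P^T xj||^2
```

Equivalently, one can project every point with $P^T$ once and run ordinary Euclidean kNN in the new low-dimensional space.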
Outline
Motivation: Related Work, Main Idea
Proposed Algorithm: Discriminant Neighborhood Embedding, Dimensionality Selection Criterion
Experimental Results: Toy Datasets, Real-world Datasets
Conclusions
Three Classes of Well-Clustered Data
Both eigenvalues are negative and comparable, so no dimensionality reduction is needed (k = 1).
Two Classes of Data with Multimodal Distribution
There is a big difference between the two negative eigenvalues ($|\lambda_1| \gg |\lambda_2|$); only the leading eigenvector $P_1$, corresponding to $\lambda_1$, will be kept (k = 1).
Three Classes of Data
The two eigenvectors correspond to a positive and a negative eigenvalue, respectively. The eigenvector with the positive eigenvalue should be discarded from the point of view of kNN classification (k = 1).
Five Classes of Non-separable Data
Both eigenvalues are positive, which means that kNN classification cannot be performed well in either the original or the new space (k = 1).
UCI Sonar Dataset
As long as the eigenvalues being added are negative, keeping more dimensions yields higher accuracy.
The optimum is reached when the eigenvalues are near 0.
Once the eigenvalues become positive, the performance decreases.
[Figure: cumulative eigenvalue curve]
Comparisons with the State-of-the-Art
UMIST Face Database
Comparisons with the State-of-the-Art
UMIST Face Database (k = 1)
Outline
Motivation: Related Work, Main Idea
Proposed Algorithm: Discriminant Neighborhood Embedding, Dimensionality Selection Criterion
Experimental Results: Toy Datasets, Real-world Datasets
Conclusions
Conclusions
Summary
A low-dimensional embedding can be learned for better accuracy in kNN classification given finite training samples.
The optimal dimensionality can be estimated at the same time.
Future work
For large-scale datasets, how can the computational complexity be reduced?
Thanks for your Attention!
Any questions?