Proximity Searching in High Dimensional Spaces with a Proximity Preserving Order. Edgar Chávez, Karina Figueroa, Gonzalo Navarro. UNIVERSIDAD DE CHILE, CHILE. UNIVERSIDAD MICHOACANA, MEXICO. Content: About the problem, Basic concepts, Previous work, Our technique, Experiments.
Proximity Searching in High Dimensional Spaces with a Proximity Preserving Order
Edgar Chávez
Karina Figueroa
Gonzalo Navarro
UNIVERSIDAD MICHOACANA, MEXICO
UNIVERSIDAD DE CHILE, CHILE
Content
1. About the problem
2. Basic concepts
3. Previous work
4. Our technique
5. Experiments
6. Conclusion and future work
Proximity Searching
Huge Database
• Exact searching is not possible
Expensive distance
Applications
• Information retrieval
• Classification
• People finder through the web
• Clustering
• Currently used on
– Classification of spider webs
– Face recognition on the Chilean Web
Problems (metric spaces)
Index
Extraction of characteristics
Complex objects
High dimension
Memory limited
Huge databases
Terminology
• Queries
– Range query
– K nearest neighbor
Properties
• Symmetry
• Strict positiveness
• Triangle inequality
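The two query types can be illustrated with a brute-force baseline that compares the query against every element. This is only a sketch: the Euclidean distance, the function names, and the sample points are assumptions chosen for illustration, and any function satisfying the three properties above could replace `euclidean`.

```python
import heapq
import math

def euclidean(a, b):
    """Example metric: symmetric, strictly positive, obeys the triangle inequality."""
    return math.dist(a, b)

def range_query(db, q, r, dist=euclidean):
    """Return every element u of db with dist(q, u) <= r."""
    return [u for u in db if dist(q, u) <= r]

def knn_query(db, q, k, dist=euclidean):
    """Return the k elements of db closest to q."""
    return heapq.nsmallest(k, db, key=lambda u: dist(q, u))

# Hypothetical toy database of 2D points.
db = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
in_range = range_query(db, (0.0, 0.0), 1.5)
nearest = knn_query(db, (0.0, 0.0), 3)
```

Both functions evaluate the distance once per database element, which is the cost an index tries to avoid when the distance is expensive.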
Previous work
• Pivot based
• Partition based
[Figure: a pivot, its distance, and the query q]
[Figure: a partition center and the query q]
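As a reminder of how pivot based methods typically work, the sketch below applies the standard triangle-inequality filter: if |d(q,p) - d(u,p)| > r for some pivot p, then u cannot lie within distance r of q. The slides do not give an implementation, so the function names, the single pivot, and the sample data are assumptions.

```python
import math

def build_pivot_table(db, pivots, dist=math.dist):
    """Preprocessing: store the distance from every element to every pivot."""
    return [[dist(u, p) for p in pivots] for u in db]

def pivot_range_query(db, table, pivots, q, r, dist=math.dist):
    """Range query that skips elements excluded by the triangle inequality."""
    dq = [dist(q, p) for p in pivots]
    result, compared = [], 0
    for u, du in zip(db, table):
        # |d(q,p) - d(u,p)| is a lower bound on d(q,u); if it exceeds r, discard u.
        if any(abs(a - b) > r for a, b in zip(dq, du)):
            continue
        compared += 1  # only the survivors cost a real distance evaluation
        if dist(q, u) <= r:
            result.append(u)
    return result, compared

# Hypothetical toy data: four 2D points and one pivot at the origin.
db = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
pivots = [(0.0, 0.0)]
table = build_pivot_table(db, pivots)
hits, compared = pivot_range_query(db, table, pivots, (0.5, 0.0), 1.0)
```

In this toy case the filter discards the two far points without computing their distance to the query.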
Our technique: Permutation
[Figure: an element u surrounded by the permutants p1, p2, p3, p4, p5, p6]
Our technique
• Exact matching elements have the same permutation
• Similar elements must have a similar permutation (our conjecture)
• Spearman footrule metric
– Measures the similarity of the permutations
– Promising elements first
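One plausible reading of "permutation" here is the list of permutants sorted by increasing distance from the element. The sketch below follows that reading; the Euclidean distance, the index-based tie break, and the sample coordinates are assumptions, not details taken from the slides.

```python
import math

def permutation(u, permutants, dist=math.dist):
    """Indices of the permutants ordered from closest to farthest from u.

    Ties are broken by permutant index (an assumption for determinism).
    """
    return sorted(range(len(permutants)), key=lambda i: (dist(u, permutants[i]), i))

# Hypothetical permutants and elements in the plane.
permutants = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
perm_u = permutation((1.0, 2.0), permutants)
perm_v = permutation((1.5, 2.0), permutants)  # close to u
perm_w = permutation((9.0, 1.0), permutants)  # far from u
```

With these points, the two nearby elements see the permutants in the same order, while the distant element sees a different order.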
Spearman Footrule metric: Example
3-1, 6-2, 3-2, 4-1, 5-5, 6-4
Difference of positions
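The Spearman footrule sums, over all permutants, the absolute difference between the position a permutant occupies in one permutation and in the other. A minimal sketch follows; the two six-element permutations are hypothetical and are not the ones pictured on the slide.

```python
def spearman_footrule(perm_a, perm_b):
    """Sum of |position in perm_a - position in perm_b| over all permutants."""
    pos_b = {p: i for i, p in enumerate(perm_b)}
    return sum(abs(i - pos_b[p]) for i, p in enumerate(perm_a))

# Hypothetical permutations over six permutants.
perm_a = ["p1", "p2", "p3", "p4", "p5", "p6"]
perm_b = ["p3", "p1", "p2", "p4", "p6", "p5"]
distance = spearman_footrule(perm_a, perm_b)
```

Identical permutations give a distance of 0, and the value grows as permutants move farther from their original positions.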
Searching process (part 1): Preprocessing time
Permutants: p1, p2, p3
Permutations stored for the database elements:
• p3,p1,p2
• p3,p2,p1
• p2,p1,p3
• p2,p3,p1
Searching process (part 2): Query time
Permutants: p1, p2, p3
Permutations stored for the database elements:
• p3,p1,p2
• p3,p2,p1
• p2,p1,p3
• p2,p3,p1
Permutation of the query q: p2,p1,p3
Sorting elements by Spearman Footrule metric:
• p2,p1,p3
• p2,p3,p1
• …
• p3,p1,p2
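The two phases can be put together as follows: permutations are computed once at preprocessing time, and at query time the database is sorted by footrule distance to the query's permutation so that only the most promising fraction is compared with the real distance. This is a sketch under assumptions: the Euclidean distance, the `fraction` parameter, the function names, and the sample points are illustrative and not taken from the slides.

```python
import math

def permutation(u, permutants, dist=math.dist):
    """Permutant indices ordered from closest to farthest from u."""
    return sorted(range(len(permutants)), key=lambda i: (dist(u, permutants[i]), i))

def footrule(perm_a, perm_b):
    """Spearman footrule: sum of position differences."""
    pos_b = {p: i for i, p in enumerate(perm_b)}
    return sum(abs(i - pos_b[p]) for i, p in enumerate(perm_a))

def build_index(db, permutants):
    """Preprocessing time: one permutation per database element."""
    return [permutation(u, permutants) for u in db]

def search_nn(db, index, permutants, q, fraction, dist=math.dist):
    """Query time: compare q only against the most promising elements."""
    q_perm = permutation(q, permutants)
    order = sorted(range(len(db)), key=lambda i: footrule(q_perm, index[i]))
    budget = max(1, math.ceil(fraction * len(db)))
    best = min(order[:budget], key=lambda i: dist(q, db[i]))
    return db[best], budget

# Hypothetical toy data.
permutants = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
db = [(1.0, 2.0), (9.0, 1.0), (2.0, 9.0), (8.0, 3.0)]
index = build_index(db, permutants)
answer, compared = search_nn(db, index, permutants, (1.5, 2.0), fraction=0.5)
```

Because only part of the database is examined, the answer is probabilistic: the true nearest neighbor is found only if it falls within the examined prefix of the ordering.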
Experiments
• 93% retrieved, comparing 10% of database
• 90% retrieved, comparing 60% of database
[Plot: y-axis "% retrieved"; labels "Pivot based algorithm" and "Retrieved 48%"]
Experiments
• 100% retrieved, comparing 15% of database
• 100% retrieved, comparing 90% of database
[Plot: y-axis "% retrieved"]
How good is our prediction?
[Plot: "Dimension 256, using 256 pivots"; y-axis "retrieved", x-axis "Percentage of the database compared"]
Metric algorithms are using one of them
Similarities between permutations
Almost the same value
Conclusion
• A new probabilistic algorithm for proximity searching in metric spaces.
• Our technique is based on permutations.
• Close elements will have similar permutations.
• This technique is the fastest known algorithm for high dimension.
• Permutations are good predictors.
Future Work
• Can non-metric spaces be tackled with this technique?
• An approximate all-K-nearest-neighbor algorithm.
• Improving other metric indexes.