Washington, 08/27/03. Martin Pfeifle, Database Group, University of Munich
Representatives for Visually Analyzing Cluster Hierarchies
Hans-Peter Kriegel, Stefan Brecheisen, Peer Kröger, Martin Pfeifle, Maximillian Viermetz
MDM/KDD 2003, Washington, DC, August 24 - 27, 2003
Database Group, Institute for Computer Science, University of Munich, Germany
Outline of the Talk
• Introduction
• OPTICS
• Cluster Recognition
• Cluster Representatives
• BOSS
• Conclusion
Introduction
Problem:
• Larger and larger amounts of data are gathered automatically (e.g. telecommunication data, market-basket data, space telescopes)
• Too large for humans to analyze manually
Data analysis tools:
• Help the user to get an overview over large data sets
• Help companies to get a competitive advantage out of the data
Introduction: Solution Based on Visual Data Mining
[Figure: processing pipeline DATA -> OPTICS -> visualisation of the intermediate result (reachability plot) -> BOSS -> Knowledge]
BOSS:
• Cluster Recognition
• Cluster Representatives
OPTICS: Ordering Points to Identify the Clustering Structure
OPTICS [Ankerst, Breunig, Kriegel, Sander 99]
• Yields a density-based hierarchical clustering
• Insensitive to its two input parameters ε and MinPts
• Result (the so-called reachability plot) can be easily visualized and is suitable for interactive exploration
[Figure: data space with clusters A (containing A1 and A2) and B, next to the corresponding reachability plot]
OPTICS: Algorithm
• Example database (2-dimensional, 16 points)
• ε = 44, MinPts = 3
[Figure: data space with the 16 points next to the reachability plot, whose reach axis is capped at ε = 44; a core-distance is marked]
Seedlist after each step, given as (object, reachability), together with the objects processed so far:
 0. processed: none; seedlist: (A, ∞)
 1. processed: A; seedlist: (B, 40) (I, 40)
 2. processed: A B; seedlist: (I, 40) (C, 40)
 3. processed: A B I; seedlist: (J, 20) (K, 20) (L, 31) (C, 40) (M, 40) (R, 43)
 4. processed: A B I J; seedlist: (L, 19) (K, 20) (R, 21) (M, 30) (P, 31) (C, 40)
 5. processed: A B I J L; seedlist: (M, 18) (K, 18) (R, 20) (P, 21) (N, 35) (C, 40)
 6. processed: A B I J L M; seedlist: (K, 18) (N, 19) (R, 20) (P, 21) (C, 40)
 7. processed: A B I J L M K; seedlist: (N, 19) (R, 20) (P, 21) (C, 40)
 8. processed: A B I J L M K N; seedlist: (R, 20) (P, 21) (C, 40)
 9. processed: A B I J L M K N R; seedlist: (P, 21) (C, 40)
10. processed: A B I J L M K N R P; seedlist: (C, 40)
11. processed: A B I J L M K N R P C; seedlist: (D, 22) (F, 22) (E, 30) (G, 35)
12. processed: A B I J L M K N R P C D; seedlist: (F, 22) (E, 22) (G, 32)
13. processed: A B I J L M K N R P C D F; seedlist: (G, 17) (E, 22)
14. processed: A B I J L M K N R P C D F G; seedlist: (E, 15) (H, 43)
15. processed: A B I J L M K N R P C D F G E; seedlist: (H, 43)
16. processed: A B I J L M K N R P C D F G E H; seedlist: -
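The seedlist walkthrough above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `optics`, the brute-force neighbourhood search and the convention that a point counts as its own neighbour are assumptions made here for brevity. The coordinates of the 16 example points are not given on the slides, so the sketch cannot reproduce the exact numbers shown there.

```python
import heapq
import math


def optics(points, eps, min_pts):
    """Return the cluster ordering as a list of (index, reachability) pairs."""
    n = len(points)
    reach = [math.inf] * n          # best reachability found so far per point
    processed = [False] * n
    ordering = []

    def dist(i, j):
        return math.dist(points[i], points[j])

    def neighbours(i):
        # Brute-force eps-neighbourhood; assumption: i is its own neighbour.
        return [j for j in range(n) if dist(i, j) <= eps]

    def core_distance(i, nbrs):
        # Distance to the MinPts-th neighbour, or infinity for non-core points.
        if len(nbrs) < min_pts:
            return math.inf
        return sorted(dist(i, j) for j in nbrs)[min_pts - 1]

    for start in range(n):
        if processed[start]:
            continue
        seeds = [(math.inf, start)]             # the seedlist, kept as a heap
        while seeds:
            r, i = heapq.heappop(seeds)
            if processed[i]:
                continue                        # outdated seedlist entry
            processed[i] = True
            ordering.append((i, r))
            nbrs = neighbours(i)
            cd = core_distance(i, nbrs)
            if math.isinf(cd):
                continue                        # not a core object: no expansion
            for j in nbrs:
                if processed[j]:
                    continue
                new_r = max(cd, dist(i, j))     # reachability of j w.r.t. i
                if new_r < reach[j]:
                    reach[j] = new_r
                    heapq.heappush(seeds, (new_r, j))
    return ordering
```

Plotting the reachability values in the returned order gives the reachability plot. Instead of updating seedlist entries in place, the sketch pushes a new entry and skips outdated ones when they are popped, which keeps the code short.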
Cluster Recognition: ξ-Clustering [Kriegel et al. 99]
• Recognition of clusters via steepness
• Definition: Steep Elements
  • UpPoint: the successor is ξ% higher than this point
  • DownPoint: the successor is ξ% lower than this point
• Definition: Steep Areas
  • A steep area starts and ends with a steep point
  • A steep area contains at most MinPts contiguous non-steep points
  • A steep area must be maximal
[Figure: reachability plot with steep downward points, steep upward points, a steep down area and a steep upward area delimiting a cluster]
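The steep-point definitions can be illustrated with a short Python sketch. It assumes the formulation from the OPTICS paper, where ξ is a fraction between 0 and 1, a point is an UpPoint if r(p) ≤ r(p+1)·(1 − ξ) and a DownPoint if r(p)·(1 − ξ) ≥ r(p+1). The function name `steep_points` is hypothetical.

```python
def steep_points(reach, xi):
    """Return (up_points, down_points) as index lists into a reachability plot.

    Assumption: xi is a fraction in (0, 1), e.g. 0.2 for 20%.
    """
    up, down = [], []
    for i in range(len(reach) - 1):
        if reach[i] <= reach[i + 1] * (1 - xi):
            up.append(i)        # successor is at least xi higher
        elif reach[i] * (1 - xi) >= reach[i + 1]:
            down.append(i)      # successor is at least xi lower
    return up, down


up, down = steep_points([10, 5, 5, 5, 10], xi=0.2)
```

Steep areas would then be built by merging neighbouring steep points that are separated by at most MinPts non-steep points; that step is omitted here.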
Cluster Recognition: Cluster-Tree [Sander et al. 03]
Algorithm:
• Find all local maxima and sort them in descending order
• Split the data set
• Test for significance of the split
• Decide where to attach the subclusters
• Call the method recursively for the new subclusters
Similar reachability values => no new cluster hierarchy
[Figure: reachability plot with significant and insignificant local maxima, and the resulting cluster tree with a root node and nodes 1 to 5]
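Only the first step of the cluster-tree algorithm, finding the local maxima and sorting them in descending order, is specified precisely enough on the slide to sketch. The following Python fragment is an illustration under that limitation; the function name and the plateau handling (only the first element of a flat top counts) are assumptions, and the significance test is not covered.

```python
def local_maxima_descending(reach):
    """Indices of local maxima of a reachability plot, highest value first."""
    maxima = [
        i
        for i in range(1, len(reach) - 1)
        # Strictly higher than the predecessor, not lower than the successor,
        # so a plateau contributes only its first element.
        if reach[i] > reach[i - 1] and reach[i] >= reach[i + 1]
    ]
    return sorted(maxima, key=lambda i: reach[i], reverse=True)


split_candidates = local_maxima_descending([0, 1, 5, 2, 2, 8, 3, 4, 1])
```

Each returned index is a candidate split point; the algorithm would split the plot there, test the split for significance and recurse into the two parts.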
Cluster Recognition: Drop-Down Clustering: Motivation
Motivation:
• Detection of narrowing clusters, e.g. cluster C
Cluster Definition:
• A set of elements which is smaller than a given value
• A set of elements which contains at least MinPts elements and at least MinPts elements fewer than its parent cluster
[Figure: reachability plot and cluster hierarchy with clusters A, B and C]
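The two size conditions in the cluster definition can be written as a small predicate. This is a sketch only; the function name `is_valid_subcluster` and its parameters are hypothetical and do not come from the slides.

```python
def is_valid_subcluster(size, parent_size, min_pts):
    """Check the two size conditions of the drop-down cluster definition."""
    large_enough = size >= min_pts                   # at least MinPts elements
    smaller_than_parent = parent_size - size >= min_pts  # MinPts fewer than parent
    return large_enough and smaller_than_parent


accepted = is_valid_subcluster(size=5, parent_size=10, min_pts=3)
```

The second condition prevents a subcluster that is almost identical to its parent from creating a new hierarchy level.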
Cluster Recognition: Drop-Down Clustering: Algorithm
Phase 1 (initial clustering):
• Sort all elements by descending reachability value
• Find root clusters by scanning the sorted list
[Figure: sorted list of reachability values]
Phase 2 (recursive pool draining):
• Sort all elements by descending reachability value
• Scan the sorted list
  if (non-adjacent border elements) or (inflexion point) then
  • Test if the cluster contains at least MinPts elements
  • Test if the cluster is at least MinPts elements smaller than its parent cluster
  • Call the method recursively for the new subclusters
[Figure: root cluster, sorted elements of the root cluster, border points with predecessor (pred) and successor (succ), and the resulting cluster hierarchy]
Cluster Representatives: First Experimental Results
[Figure: results for Drop-Down-Clustering, Tree-Clustering and ξ-Clustering, annotated "many clusters and subclusters are recognized", "some clusters are recognized", "no clusters are recognized" and "detection of narrowing clusters"]
Cluster Representatives: Medoid-Approach
Algorithms for Detecting Cluster Representatives:
• Medoid-Approach
[Figure: example data set with labelled points (A to V), MinPts = 3; point I is marked]
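As a reminder of what a medoid is: the object of a cluster whose summed distance to all other objects of the cluster is minimal. A minimal Python sketch, assuming Euclidean distance and a hypothetical function name `medoid`:

```python
import math


def medoid(points):
    """Return the point with the smallest summed distance to all points."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
representative = medoid(cluster)
```

Unlike a centroid, the medoid is always an actual object of the data set, which matters when the representative has to be shown to the user.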
Cluster Representatives: Core-Distance Approach
Algorithms for Detecting Cluster Representatives:
• Medoid-Approach
• Core-Distance Approach (based on an OPTICS run)
[Figure: example data set with labelled points, MinPts = 3; points I and P are marked]
[Figure: the same data set with MinPts = 5; points I, P and L are marked]
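The two examples suggest that the chosen representative depends on MinPts. A possible reading, stated here as an assumption rather than as the authors' exact rule, is that the representative is the object with the smallest core-distance, i.e. the object in the densest part of the cluster. The sketch below recomputes core-distances inside one cluster with ε ignored and each point counted as its own neighbour; in practice the values would be taken from the OPTICS run. All function names are hypothetical.

```python
import math


def core_distance(points, i, min_pts):
    """Distance from points[i] to its MinPts-th nearest point (itself included)."""
    dists = sorted(math.dist(points[i], q) for q in points)
    return dists[min_pts - 1] if len(dists) >= min_pts else math.inf


def core_distance_representative(points, min_pts):
    """Index of the point with the smallest core-distance."""
    return min(range(len(points)), key=lambda i: core_distance(points, i, min_pts))


cluster = [(0.0, 0.0), (1.0, 0.0), (1.5, 0.0), (2.0, 0.0), (10.0, 0.0)]
rep_small = core_distance_representative(cluster, min_pts=3)
rep_large = core_distance_representative(cluster, min_pts=5)
```

On this toy cluster the two MinPts values select different points, which mirrors the behaviour shown in the two example figures.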
Cluster Representatives: Maximizing Successors
Algorithms for Detecting Cluster Representatives:
• Medoid-Approach
• Core-Distance Approach (based on an OPTICS run)
• Maximizing Successors (based on an OPTICS run)
[Figure: example data set with MinPts = 3 and a narrowing cluster, with points I, P and L marked, next to the reachability plot of the OPTICS run (x-axis labels E G B I K L P R N J M)]
Cluster Representatives: First Experimental Results
BOSS: Browsing OPTICS-Plots for Similarity Search
• BOSS (Browsing OPTICS-Plots for Similarity Search)
• Interactive data browsing tool based on reachability plots
• User-friendly method to support the time-consuming task of finding similar parts:
  • Revealing the hierarchical clustering structure of the dataset at a glance
  • Displaying suitable representatives for large clusters
BOSS: Architecture
[Figure: BOSS architecture]
BOSS: Screenshot
[Figure: BOSS screenshot]
Conclusions
• Contribution
  • New algorithm for cluster recognition
  • New algorithms for finding suitable cluster representatives
  • BOSS: a new data analysis tool
• Future Work
  • Detailed evaluation of the new algorithms
Thank you for your attention
Any questions?
OPTICS: Application Ranges
• OPTICS yields an intermediate result which serves as a multi-purpose basis for further analysis:
  • Similarity search
  • Visual data mining
  • Evaluation of similarity models
[Figure: DATA -> OPTICS -> visualisation of the intermediate result -> Knowledge, with branches for similarity search, visual data mining, evaluation of similarity models and other algorithms]
k-nn query: