Washington, 08/27/03, Martin Pfeifle, Database Group, University of Munich


Representatives for Visually Analyzing Cluster Hierarchies

Hans-Peter Kriegel, Stefan Brecheisen, Peer Kröger, Martin Pfeifle, Maximillian Viermetz

MDM/KDD 2003, Washington, DC, August 24-27, 2003

Database Group, Institute for Computer Science, University of Munich, Germany

Outline of the Talk

• Introduction

• OPTICS

• Cluster Recognition

• Cluster Representatives

• BOSS

• Conclusion

Introduction

Problem:

• Larger and larger amounts of data are gathered automatically

• Too large for humans to analyze manually

[Figure: example data sources: telecommunication data, market-basket data, space telescopes]

Data analysis tools:

• Help the user to get an overview of large data sets

• Help companies to gain a competitive advantage from the data

Introduction: Solution Based on Visual Data Mining

[Figure: processing pipeline. DATA → OPTICS → visualisation of the intermediate result (reachability plot) → BOSS (cluster recognition, cluster representatives) → Knowledge]


OPTICS: Ordering Points to Identify the Clustering Structure

OPTICS [Ankerst, Breunig, Kriegel, Sander 99]

• Yields a density-based hierarchical clustering

• Insensitive to its two input parameters ε and MinPts

• The result (the so-called reachability plot) can be easily visualized and is suitable for interactive exploration

[Figure: data space with clusters A and B and subclusters A1 and A2, next to the corresponding reachability plot with two threshold levels (1 and 2)]
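As a rough illustration of how a reachability plot is drawn, the sketch below renders one text bar per object in cluster order. The function name `reachability_bars` and the parameters `cap` and `width` are assumptions made for this example; they are not part of OPTICS or BOSS. Undefined (infinite) reachability values are assumed to be drawn at the cap.

```python
import math


def reachability_bars(labels, reach, cap, width=20):
    """Return one text bar per object, in cluster order.

    labels: object names in cluster order
    reach:  reachability value of each object (math.inf if undefined)
    cap:    largest value shown on the axis (hypothetical parameter)
    """
    lines = []
    for label, r in zip(labels, reach):
        shown = min(r, cap)                      # clip undefined values to the cap
        bar = "#" * round(width * shown / cap)   # bar length proportional to the value
        lines.append(f"{label:>2} {bar}")
    return lines


if __name__ == "__main__":
    for line in reachability_bars(["A", "B", "I"], [math.inf, 40, 40], cap=44):
        print(line)
```

Valleys in such a plot correspond to dense regions, which is what makes the plot suitable for interactive exploration.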

OPTICS: Algorithm

• Example database (2-dimensional, 16 points)

• ε = 44, MinPts = 3

[Figure: data space with the 16 points A to N, P and R, and the reachability plot (axis "reach", marked at 44), built up one object per slide; the core-distance is marked on the slide for the first processed object]

In each step the first seedlist entry is processed and appended to the ordering, and the seedlist is updated:

Step  Processed  Seedlist afterwards (object, reachability)
 0    -          (A, ∞)
 1    A          (B, 40) (I, 40)
 2    B          (I, 40) (C, 40)
 3    I          (J, 20) (K, 20) (L, 31) (C, 40) (M, 40) (R, 43)
 4    J          (L, 19) (K, 20) (R, 21) (M, 30) (P, 31) (C, 40)
 5    L          (M, 18) (K, 18) (R, 20) (P, 21) (N, 35) (C, 40)
 6    M          (K, 18) (N, 19) (R, 20) (P, 21) (C, 40)
 7    K          (N, 19) (R, 20) (P, 21) (C, 40)
 8    N          (R, 20) (P, 21) (C, 40)
 9    R          (P, 21) (C, 40)
10    P          (C, 40)
11    C          (D, 22) (F, 22) (E, 30) (G, 35)
12    D          (F, 22) (E, 22) (G, 32)
13    F          (G, 17) (E, 22)
14    G          (E, 15) (H, 43)
15    E          (H, 43)
16    H          -

Resulting cluster ordering: A B I J L M K N R P C D F G E H
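The seedlist walk-through can be sketched in a few lines of Python. This is a minimal, quadratic-time illustration and not the implementation used in the talk. It assumes Euclidean distance and that a point counts as its own neighbour when the core-distance is computed. The function name `optics` and its return format are chosen for this example only.

```python
import heapq
import math


def optics(points, eps, min_pts):
    """Return (ordering, reach): the cluster ordering as a list of point indices
    and the reachability value of every point (math.inf if undefined)."""
    n = len(points)
    reach = [math.inf] * n
    processed = [False] * n
    ordering = []

    def dist(i, j):
        return math.dist(points[i], points[j])

    for start in range(n):
        if processed[start]:
            continue
        seeds = [(math.inf, start)]          # seedlist kept as a min-heap
        while seeds:
            _, i = heapq.heappop(seeds)      # smallest reachability first
            if processed[i]:
                continue                     # outdated heap entry
            processed[i] = True
            ordering.append(i)
            # eps-neighbourhood; the point is its own neighbour (assumption)
            nbrs = [j for j in range(n) if dist(i, j) <= eps]
            if len(nbrs) < min_pts:
                continue                     # not a core object
            core = sorted(dist(i, j) for j in nbrs)[min_pts - 1]
            for j in nbrs:
                if processed[j]:
                    continue
                new_reach = max(core, dist(i, j))
                if new_reach < reach[j]:     # insert or improve the seedlist entry
                    reach[j] = new_reach
                    heapq.heappush(seeds, (new_reach, j))
    return ordering, reach
```

Instead of decreasing a key inside the heap, the sketch pushes a new entry and skips outdated ones when they are popped, which keeps the code short at the cost of a slightly larger heap.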


Cluster Recognition: ξ-Clustering [Kriegel et al. 99]

• Recognition of clusters via steepness

• Definition: steep elements
  • UpPoint: the successor is ξ % higher than this point
  • DownPoint: the successor is ξ % lower than this point

• Definition: steep areas
  • A steep area starts and ends with a steep point
  • A steep area contains at most MinPts contiguous non-steep points
  • A steep area must be maximal

[Figure: reachability plot with steep downward points and steep upward points, a steep down area, a steep upward area, and the cluster between them]
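The steep-element definitions can be illustrated with a short sketch. It assumes that "ξ % higher/lower" means a relative change of at least ξ between a point and its successor, with ξ given as a fraction between 0 and 1. The function name `steep_points` is hypothetical, and the grouping of steep points into maximal steep areas is left out.

```python
def steep_points(reach, xi):
    """Classify positions of a reachability plot by comparing each value
    with its successor. Returns (up, down) as lists of positions."""
    up, down = [], []
    for i in range(len(reach) - 1):
        cur, nxt = reach[i], reach[i + 1]
        if cur <= nxt * (1 - xi):
            up.append(i)       # successor is at least xi higher
        elif cur * (1 - xi) >= nxt:
            down.append(i)     # successor is at least xi lower
    return up, down


if __name__ == "__main__":
    print(steep_points([10, 10, 5, 5, 5, 10], xi=0.2))
```

A cluster candidate would then start in a steep down area and end in a matching steep up area.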

Cluster Recognition: Cluster-Tree [Sander et al. 03]

[Figure: reachability plot split into regions 1 to 5 at its local maxima, which are marked as significant or insignificant, together with the corresponding cluster tree below the root]

Algorithm:
• Find all local maxima and sort them in descending order
• Split the data set
• Test the split for significance
• Decide where to attach the subclusters
• Call the method recursively for the new subclusters

Similar reachability values => no new level in the cluster hierarchy
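The steps above can be sketched as follows. This is a simplified illustration, not the algorithm of [Sander et al. 03]: the significance test (mean reachability on both sides at most `ratio` times the value at the split) and the parameters `min_size` and `ratio` are assumptions made for this example. The step "decide where to attach the subclusters" is omitted, so subclusters are always attached to the node that was split.

```python
def local_maxima(reach):
    """Interior positions that are not lower than both neighbours,
    sorted by descending reachability value."""
    idx = [i for i in range(1, len(reach) - 1)
           if reach[i] >= reach[i - 1] and reach[i] >= reach[i + 1]]
    return sorted(idx, key=lambda i: reach[i], reverse=True)


def build_tree(reach, lo, hi, maxima, min_size=2, ratio=0.75):
    """Recursively split the region [lo, hi) at its highest significant maximum.
    min_size must be at least 2 so that both sides have values to average."""
    node = {"range": (lo, hi), "children": []}
    for m in maxima:                       # highest maxima are tried first
        if not lo < m < hi:
            continue
        if m - lo < min_size or hi - m < min_size:
            continue                       # one side would be too small
        left = reach[lo + 1:m]             # skip the entry value of each side
        right = reach[m + 1:hi]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        if max(mean_left, mean_right) <= ratio * reach[m]:
            node["children"] = [
                build_tree(reach, lo, m, maxima, min_size, ratio),
                build_tree(reach, m, hi, maxima, min_size, ratio),
            ]
            break                          # region has been split
    return node
```

A typical call is `build_tree(reach, 0, len(reach), local_maxima(reach))`. Regions whose maxima have values similar to their surroundings fail the test and stay leaves.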

Cluster Recognition: Drop-Down Clustering (Motivation)

Motivation:
• Detection of narrowing clusters, e.g. cluster C

Cluster definition:
• A set of elements whose reachability values are smaller than a given value
• A set of elements which contains at least MinPts elements and at least MinPts fewer elements than its parent cluster

[Figure: reachability plot with nested clusters A, B and C, built up over three slides]

Cluster Recognition: Drop-Down Clustering (Algorithm)

Phase 1 (initial clustering):

• Sort all elements by descending reachability value
• Find root clusters by scanning the sorted list

[Figure: sorted list of reachability values]

Phase 2 (recursive pool draining):

• Sort all elements by descending reachability value
• Scan the sorted list:
  if (non-adjacent border elements) or (inflexion point) then
  • Test whether the cluster contains at least MinPts elements
  • Test whether the cluster is at least MinPts elements smaller than its parent cluster
  • Call the method recursively for the new subclusters

[Figure: a root cluster and the sorted elements of the root cluster, with border points, pred/succ markers (pred << succ) and the resulting cluster hierarchy, built up over several slides]
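The two size tests of phase 2 can be illustrated with a sketch loosely inspired by the pool-draining idea. It is not the authors' algorithm: it lowers a threshold through the reachability values of a region in descending order and accepts the first runs of consecutive elements below the threshold that pass both tests. Inflexion points are ignored, and the names `runs_below` and `drop_down` are hypothetical.

```python
def runs_below(reach, lo, hi, level):
    """Maximal runs [a, b) of consecutive positions in [lo, hi) with reach < level."""
    runs, start = [], None
    for i in range(lo, hi):
        if reach[i] < level:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, hi))
    return runs


def drop_down(reach, lo, hi, min_pts):
    """Build a cluster hierarchy for the region [lo, hi) of a reachability plot."""
    node = {"range": (lo, hi), "children": []}
    size = hi - lo
    for level in sorted(set(reach[lo:hi]), reverse=True):   # lower the threshold
        accepted = [(a, b) for a, b in runs_below(reach, lo, hi, level)
                    if b - a >= min_pts               # at least MinPts elements
                    and b - a <= size - min_pts]      # MinPts smaller than parent
        if accepted:
            node["children"] = [drop_down(reach, a, b, min_pts) for a, b in accepted]
            break
    return node
```

Because a run only becomes a subcluster once it has lost at least MinPts elements compared with its parent, a cluster that narrows gradually shows up as a chain of nested clusters.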

Cluster Representatives: First Experimental Results

[Figure: results of Drop-Down-Clustering, Tree-Clustering and ξ-Clustering on a sample reachability plot, annotated: many clusters and subclusters are recognized; some clusters are recognized; no clusters are recognized; detection of narrowing clusters]


Cluster Representatives: Medoid Approach

Algorithms for detecting cluster representatives:
• Medoid approach

[Figure: example data set with points A to V, MinPts = 3; highlighted label: I]
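The slide names the medoid approach without spelling it out. A minimal sketch, assuming the usual definition of a medoid (the cluster member with the smallest total distance to all other members) and Euclidean distance, could look like this. The function name `medoid` is chosen for this example.

```python
import math


def medoid(points):
    """Return the index of the point with the smallest summed distance
    to all points of the cluster."""
    return min(
        range(len(points)),
        key=lambda i: sum(math.dist(points[i], q) for q in points),
    )


if __name__ == "__main__":
    cluster = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
    print(medoid(cluster))
```

Unlike a centroid, the medoid is always an actual object of the data set, so it can be displayed to the user as a representative.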

Cluster Representatives: Core-Distance Approach

Algorithms for detecting cluster representatives:
• Core-distance approach (based on an OPTICS run)

[Figure: example data set with points A to V, MinPts = 3; highlighted label: P]

[Figure: the same data set with MinPts = 5; highlighted label: L]
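One plausible reading of the core-distance approach is to pick the cluster member with the smallest core-distance, i.e. the object in the densest neighbourhood. The sketch below follows that reading as an assumption. It recomputes core-distances directly instead of taking them from an OPTICS run, ignores the ε bound, and counts a point as its own neighbour. The function names are hypothetical.

```python
import math


def core_distance(points, i, min_pts):
    """Distance from point i to its min_pts-th nearest neighbour,
    counting the point itself (requires len(points) >= min_pts)."""
    dists = sorted(math.dist(points[i], q) for q in points)
    return dists[min_pts - 1]


def core_distance_representative(points, cluster, min_pts):
    """Return the member of `cluster` (a list of indices into `points`)
    with the smallest core-distance."""
    return min(cluster, key=lambda i: core_distance(points, i, min_pts))


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (1.5, 0), (3, 0), (10, 0)]
    members = [0, 1, 2, 3, 4]
    print(core_distance_representative(pts, members, min_pts=3))
    print(core_distance_representative(pts, members, min_pts=5))
```

As in the two examples on the slides, the chosen representative can change with MinPts, because a larger MinPts measures density over a wider neighbourhood.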

Cluster Representatives: Maximizing Successors

Algorithms for detecting cluster representatives:
• Maximizing successors (based on an OPTICS run)

[Figure: example data set with MinPts = 3, including a narrowing cluster, and the reachability plot of the OPTICS run with the ordering E G B I K L P R N J M; highlighted labels: I, P, L]

Cluster Representatives: First Experimental Results

BOSS: Browsing OPTICS-Plots for Similarity Search

• BOSS (Browsing OPTICS-Plots for Similarity Search)
• Interactive data browsing tool based on reachability plots
• User-friendly method to support the time-consuming task of finding similar parts:
  • Revealing the hierarchical clustering structure of the dataset at a glance
  • Displaying suitable representatives for large clusters

BOSS: Architecture

[Figure: BOSS architecture]

BOSS: Screenshot

[Figure: BOSS screenshot]


Conclusions

• Contribution
  • New algorithm for cluster recognition
  • New algorithms for finding suitable cluster representatives
  • BOSS: a new data analysis tool

• Future work
  • Detailed evaluation of the new algorithms


Thank you for your attention

Any questions?


OPTICS: Application Ranges

• OPTICS yields an intermediate result which serves as a multi-purpose basis for further analysis:
  • Similarity search
  • Visual data mining
  • Evaluation of similarity models

[Figure: DATA feeds OPTICS; the visualised intermediate result feeds similarity search, visual data mining, evaluation of similarity models and other algorithms, which lead to knowledge]

k-nn query: