

Page 1: Representatives for Visually Analyzing Cluster Hierarchies

Hans-Peter Kriegel, Stefan Brecheisen, Peer Kröger, Martin Pfeifle, Maximillian Viermetz

MDM/KDD 2003, Washington, DC, August 24 - 27, 2003

Database Group, Institute for Computer Science, University of Munich, Germany

Slide footer: Washington, 08/27/03, Martin Pfeifle, Database Group, University of Munich

Page 2: Outline of the Talk

• Introduction (current section)
• OPTICS
• Cluster Recognition
• Cluster Representatives
• BOSS
• Conclusion

Page 3: Introduction

[Figures: Telecommunication Data, Market-Basket Data, Space Telescopes]

Problem:

• Larger and larger amounts of data are gathered automatically

• Too large for humans to analyze manually

Data analysis tools:

• Help the user to get an overview of large data sets

• Help companies to gain a competitive advantage from the data

Page 4: Introduction: Solution based on Visual Data Mining

[Figure: processing pipeline: DATA → OPTICS → Visualisation of the intermediate result (Reachability Plot) → BOSS → Knowledge]

BOSS:

• Cluster Recognition

• Cluster Representatives

Page 5: Outline of the Talk

• Introduction
• OPTICS (current section)
• Cluster Recognition
• Cluster Representatives
• BOSS
• Conclusion

Page 6: OPTICS: Ordering Points to Identify the Clustering Structure

OPTICS [Ankerst, Breunig, Kriegel, Sander 99]

• Yields a density-based hierarchical clustering

• Insensitive to its two input parameters ε, MinPts

• Result (so-called reachability plot) can be easily visualized and is suitable for interactive exploration

[Figure: Data Space with clusters A (subclusters A1, A2) and B, next to the Reachability Plot showing the same clusters]

Page 7: OPTICS: Algorithm

• Example Database (2-dimensional, 16 points)

• ε = 44, MinPts = 3

[Figure: data space with the 16 points A, B, C, D, E, F, G, H, I, J, K, L, M, N, P, R; reachability plot (y-axis: reach, ε = 44)]

Processed so far: -

seedlist: (A, ∞)

Page 8: OPTICS: Algorithm

• Example Database (2-dimensional, 16 points)

• ε = 44, MinPts = 3

[Figure: data space with the 16 points and the core-distance of A marked; reachability plot (y-axis: reach, ε = 44)]

Processed so far: A

seedlist: (B, 40) (I, 40)
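The two quantities behind the seedlist entries, core-distance and reachability, can be illustrated with a minimal Python sketch. It assumes Euclidean distance and that a point counts as its own neighbour; the function names and the use of `None` for "undefined" are illustrative choices, not taken from the slides.

```python
import math


def core_distance(points, i, eps, min_pts):
    """Distance from points[i] to its MinPts-th nearest neighbour.

    Assumption: the point itself counts as a neighbour.
    Returns None (undefined) if fewer than MinPts points lie within eps.
    """
    dists = sorted(math.dist(points[i], q) for q in points)
    if len(dists) < min_pts or dists[min_pts - 1] > eps:
        return None
    return dists[min_pts - 1]


def reachability_distance(points, p, o, eps, min_pts):
    """Reachability of points[p] with respect to points[o].

    Defined only if points[o] is a core point; it is never smaller
    than the core-distance of points[o].
    """
    core = core_distance(points, o, eps, min_pts)
    if core is None:
        return None
    return max(core, math.dist(points[o], points[p]))
```

With such a definition, every neighbour that lies closer to a core point than its core-distance receives the same reachability value, which would be consistent with B and I both entering the seedlist with the value 40.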

Page 9: OPTICS: Algorithm

• Example Database (2-dimensional, 16 points)

• ε = 44, MinPts = 3

[Figure: data space with the 16 points; reachability plot (y-axis: reach, ε = 44)]

Processed so far: A B

seedlist: (I, 40) (C, 40)

Page 10: OPTICS: Algorithm

• Example Database (2-dimensional, 16 points)

• ε = 44, MinPts = 3

[Figure: data space with the 16 points; reachability plot (y-axis: reach, ε = 44)]

Processed so far: A B I

seedlist: (J, 20) (K, 20) (L, 31) (C, 40) (M, 40) (R, 43)

Page 11: OPTICS: Algorithm

• Example Database (2-dimensional, 16 points)

• ε = 44, MinPts = 3

[Figure: data space with the 16 points; reachability plot (y-axis: reach, ε = 44)]

Processed so far: A B I J

seedlist: (L, 19) (K, 20) (R, 21) (M, 30) (P, 31) (C, 40)

Page 12: OPTICS: Algorithm

• Example Database (2-dimensional, 16 points)

• ε = 44, MinPts = 3

[Figure: data space with the 16 points; reachability plot (y-axis: reach, ε = 44)]

Processed so far: A B I J L

seedlist: (M, 18) (K, 18) (R, 20) (P, 21) (N, 35) (C, 40)

Page 13: OPTICS: Algorithm

• Example Database (2-dimensional, 16 points)

• ε = 44, MinPts = 3

[Figure: data space with the 16 points; reachability plot (y-axis: reach, ε = 44)]

Processed so far: A B I J L M

seedlist: (K, 18) (N, 19) (R, 20) (P, 21) (C, 40)

Page 14: OPTICS: Algorithm

• Example Database (2-dimensional, 16 points)

• ε = 44, MinPts = 3

[Figure: data space with the 16 points; reachability plot (y-axis: reach, ε = 44)]

Processed so far: A B I J L M K

seedlist: (N, 19) (R, 20) (P, 21) (C, 40)

Page 15: OPTICS: Algorithm

• Example Database (2-dimensional, 16 points)

• ε = 44, MinPts = 3

[Figure: data space with the 16 points; reachability plot (y-axis: reach, ε = 44)]

Processed so far: A B I J L M K N

seedlist: (R, 20) (P, 21) (C, 40)

Page 16: OPTICS: Algorithm

• Example Database (2-dimensional, 16 points)

• ε = 44, MinPts = 3

[Figure: data space with the 16 points; reachability plot (y-axis: reach, ε = 44)]

Processed so far: A B I J L M K N R

seedlist: (P, 21) (C, 40)

Page 17: OPTICS: Algorithm

• Example Database (2-dimensional, 16 points)

• ε = 44, MinPts = 3

[Figure: data space with the 16 points; reachability plot (y-axis: reach, ε = 44)]

Processed so far: A B I J L M K N R P

seedlist: (C, 40)

Page 18: OPTICS: Algorithm

• Example Database (2-dimensional, 16 points)

• ε = 44, MinPts = 3

[Figure: data space with the 16 points; reachability plot (y-axis: reach, ε = 44)]

Processed so far: A B I J L M K N R P C

seedlist: (D, 22) (F, 22) (E, 30) (G, 35)

Page 19: OPTICS: Algorithm

• Example Database (2-dimensional, 16 points)

• ε = 44, MinPts = 3

[Figure: data space with the 16 points; reachability plot (y-axis: reach, ε = 44)]

Processed so far: A B I J L M K N R P C D

seedlist: (F, 22) (E, 22) (G, 32)

Page 20: OPTICS: Algorithm

• Example Database (2-dimensional, 16 points)

• ε = 44, MinPts = 3

[Figure: data space with the 16 points; reachability plot (y-axis: reach, ε = 44)]

Processed so far: A B I J L M K N R P C D F

seedlist: (G, 17) (E, 22)

Page 21: OPTICS: Algorithm

• Example Database (2-dimensional, 16 points)

• ε = 44, MinPts = 3

[Figure: data space with the 16 points; reachability plot (y-axis: reach, ε = 44)]

Processed so far: A B I J L M K N R P C D F G

seedlist: (E, 15) (H, 43)

Page 22: OPTICS: Algorithm

• Example Database (2-dimensional, 16 points)

• ε = 44, MinPts = 3

[Figure: data space with the 16 points; reachability plot (y-axis: reach, ε = 44)]

Processed so far: A B I J L M K N R P C D F G E

seedlist: (H, 43)

Page 23: OPTICS: Algorithm

• Example Database (2-dimensional, 16 points)

• ε = 44, MinPts = 3

[Figure: data space with the 16 points; complete reachability plot (y-axis: reach, ε = 44)]

Processed so far: A B I J L M K N R P C D F G E H

seedlist: -
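The seedlist walk-through above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses Euclidean distance, a brute-force neighbourhood query, a point counting as its own neighbour, and a `heapq` priority queue with stale entries skipped instead of an updatable seedlist. The name `optics_order` is hypothetical.

```python
import heapq
import math


def optics_order(points, eps, min_pts):
    """Return [(point index, reachability)] in OPTICS processing order."""
    n = len(points)
    reach = [math.inf] * n          # best reachability seen so far
    processed = [False] * n
    order = []

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def core_dist(i, nbrs):
        if len(nbrs) < min_pts:
            return None             # not a core point
        return sorted(math.dist(points[i], points[j]) for j in nbrs)[min_pts - 1]

    for start in range(n):
        if processed[start]:
            continue
        seeds = [(math.inf, start)]  # the seedlist, ordered by reachability
        while seeds:
            r, i = heapq.heappop(seeds)
            if processed[i]:
                continue            # stale entry superseded by a smaller value
            processed[i] = True
            order.append((i, r))
            nbrs = neighbours(i)
            cd = core_dist(i, nbrs)
            if cd is None:
                continue
            for j in nbrs:
                if processed[j]:
                    continue
                new_r = max(cd, math.dist(points[i], points[j]))
                if new_r < reach[j]:
                    reach[j] = new_r
                    heapq.heappush(seeds, (new_r, j))
    return order
```

Plotting the returned reachability values in processing order gives the reachability plot; each point that starts a new seedlist has an undefined (infinite) reachability, like A in the example.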


Page 25: Outline of the Talk

• Introduction
• OPTICS
• Cluster Recognition (current section)
• Cluster Representatives
• BOSS
• Conclusion

Page 26: Cluster Recognition: ξ-Clustering [Kriegel et al. 99]

Recognition of clusters via steepness:

• Definition: Steep Elements
  • UpPoint: the successor is ξ% higher than this point
  • DownPoint: the successor is ξ% lower than this point

• Definition: Steep Areas
  • A steep area starts and ends with a steep point
  • A steep area contains at most MinPts contiguous non-steep points
  • A steep area must be maximal

[Figure: reachability plot with Steep Downward Points, Steep Upward Points, a Steep Down Area, a Steep Upward Area, and the resulting Cluster]
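The steep-element definition can be made concrete with a short Python sketch. It reads "ξ% higher" as "at least a factor (1 + ξ) higher" and "ξ% lower" as "at most a factor (1 - ξ) of the current value"; this reading, the function name, and the assumption of finite reachability values are illustrative and may differ from the exact inequality used in the cited work.

```python
def steep_points(reach, xi):
    """Classify positions of a reachability plot as steep up or steep down.

    reach: finite reachability values in OPTICS order.
    xi:    relative steepness threshold, e.g. 0.2 for 20 %.
    Returns (up_indices, down_indices).
    """
    up, down = [], []
    for i in range(len(reach) - 1):
        cur, nxt = reach[i], reach[i + 1]
        if nxt >= cur * (1 + xi):
            up.append(i)        # successor is at least xi higher
        elif nxt <= cur * (1 - xi):
            down.append(i)      # successor is at least xi lower
    return up, down
```

Steep areas would then be built by merging neighbouring steep points of the same kind while tolerating short runs of non-steep points, which is not shown here.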

Page 27: Cluster Recognition: Cluster-Tree [Sander et al. 03]

Algorithm:

• Find all local maxima and sort them in descending order
• Split data set
• Test for significance of split
• Decide where to attach the subclusters
• Call the method recursively for new subclusters

[Figure: reachability plot with significant and insignificant local maxima; cluster tree with Root and nodes 1 to 5]
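The first two steps of the algorithm above can be sketched as follows. The sketch only finds interior local maxima of the reachability plot, sorts them by descending reachability, and splits an index range at a chosen maximum. The significance test and the attachment rule are not specified on the slide and are left out; the function names and the convention that the split point opens the right-hand part are assumptions.

```python
def local_maxima_descending(reach):
    """Indices of interior local maxima, highest reachability first."""
    maxima = [
        i for i in range(1, len(reach) - 1)
        if reach[i] > reach[i - 1] and reach[i] >= reach[i + 1]
    ]
    return sorted(maxima, key=lambda i: reach[i], reverse=True)


def split_at(start, end, m):
    """Split the half-open index range [start, end) at maximum m.

    Assumption: the point at the maximum starts the right-hand part.
    """
    return (start, m), (m, end)
```

A recursive driver would take the highest remaining maximum inside a range, split there, test the split, and recurse into both parts.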


Page 30: Cluster Recognition: Cluster-Tree [Sander et al. 03]

Algorithm:

• Find all local maxima and sort them in descending order
• Split data set
• Test for significance of split
• Decide where to attach the subclusters
• Call the method recursively for new subclusters

Similar reachability values => no new cluster hierarchy

[Figure: reachability plot; cluster tree with Root and nodes 1 to 5]


Page 32: Cluster Recognition: Drop-Down Clustering: Motivation

Motivation:

• Detection of narrowing clusters, e.g. cluster C

Cluster Definition:

• A set of elements whose reachability values are smaller than a given value
• A set of elements which contains at least MinPts elements and at least MinPts fewer elements than its parent cluster

[Figure: reachability plot with nested clusters A, B and C]


Page 35: Cluster Recognition: Drop-Down Clustering: Algorithm

Phase 1 (initial clustering):

• Sort all elements by descending reachability value
• Find root clusters by scanning the sorted list

[Figure: reachability plot and sorted list of reachability values]


Page 37: Cluster Recognition: Drop-Down Clustering: Algorithm

Phase 2 (recursive pool draining):

• Sort all elements by descending reachability value
• Scan sorted list
  if (non-adjacent border elements) or (inflexion point) then
    Test if the cluster contains at least MinPts elements
    Test if the cluster is at least MinPts elements smaller than its parent cluster
    Call the method recursively for new subclusters

[Figure: root cluster, sorted elements of the root cluster, border points, cluster hierarchy]
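The two size tests in Phase 2 amount to a simple predicate. The sketch below is only an illustration of those tests as worded on the slide; the function name and argument names are hypothetical, and the detection of border elements and inflexion points is not covered.

```python
def accept_subcluster(candidate_size, parent_size, min_pts):
    """Size tests for a candidate subcluster in Phase 2.

    The candidate must contain at least MinPts elements and must be
    at least MinPts elements smaller than its parent cluster.
    """
    large_enough = candidate_size >= min_pts
    smaller_than_parent = candidate_size <= parent_size - min_pts
    return large_enough and smaller_than_parent
```

For example, with MinPts = 3 and a parent of 20 elements, candidates of size 3 to 17 pass both tests, so a subcluster that differs from its parent by only one or two elements does not create a new hierarchy level.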


Page 40: Cluster Recognition: Drop-Down Clustering: Algorithm

Phase 2 (recursive pool draining), continued

[Figure: root cluster, sorted elements of the root cluster, border points, cluster hierarchy; the current element is shown with its neighbours pred and succ and the annotation "pred << succ"]


Page 48: Cluster Representatives: First Experimental Results

• Drop-Down-Clustering: many clusters and subclusters are recognized; detection of narrowing clusters
• Tree-Clustering: some clusters are recognized
• ξ-Clustering: no clusters are recognized

Page 49: Outline of the Talk

• Introduction
• OPTICS
• Cluster Recognition
• Cluster Representatives (current section)
• BOSS
• Conclusion

Page 50: Washington, 08/27/03 Washington, 08/27/03 Martin Pfeifle, Database Group, University of Munich Representatives for Visually Analyzing Cluster Hierarchies

Washington, 08/27/03Washington, 08/27/03Martin Pfeifle, Database Group, University of MunichMartin Pfeifle, Database Group, University of Munich

Cluster Representatives

Algorithms for Detecting Cluster Representatives:
• Medoid-Approach

Medoid-Approach

[Figure: example with MinPts = 3 on a set of labeled points (A to V); point I is marked]
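The slide names the medoid approach without defining it. A minimal sketch, assuming the usual definition (the medoid is the cluster member with the smallest total distance to all other members) and Euclidean distance; the function name `medoid` and the sample points are hypothetical and not taken from the talk:

```python
import math

def medoid(points):
    """Return the point whose summed distance to all points is smallest.

    Assumes Euclidean distance; any other metric could be substituted.
    """
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

# Hypothetical one-dimensional cluster embedded in 2D, with one far outlier.
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
representative = medoid(cluster)
print(representative)
```

Unlike a centroid, the medoid is always an actual object of the cluster, which matters when the representative is to be displayed to the user.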

Cluster Representatives

Algorithms for Detecting Cluster Representatives:
• Medoid-Approach
• Core-Distance Approach (based on an OPTICS run)

Core-Distance Approach

[Figure: example with MinPts = 3 on a set of labeled points (A to V); point P is marked]
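The slides show the core-distance approach only by example (MinPts = 3 and MinPts = 5). A rough sketch under stated assumptions: the representative is taken to be the object with the smallest core distance, the core distance is the distance to the MinPts-th nearest neighbor counting the object itself, and the OPTICS radius ε is treated as unbounded. The function names and sample points are hypothetical; in BOSS the core distances would come from the OPTICS run rather than being recomputed.

```python
import math

def core_distance(p, points, min_pts):
    """Distance from p to its min_pts-th nearest neighbor.

    Assumption: p counts as its own first neighbor (distance 0),
    and no epsilon bound is applied.
    """
    dists = sorted(math.dist(p, q) for q in points)
    return dists[min_pts - 1]

def core_distance_representative(points, min_pts):
    """Pick the object with the smallest core distance (densest surroundings)."""
    if len(points) < min_pts:
        raise ValueError("cluster has fewer than min_pts objects")
    return min(points, key=lambda p: core_distance(p, points, min_pts))

# Hypothetical cluster: a dense group on the left, sparser objects on the right.
cluster = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
rep_3 = core_distance_representative(cluster, 3)
rep_4 = core_distance_representative(cluster, 4)
print(rep_3, rep_4)
```

In this toy data the chosen object changes when MinPts changes, which is the kind of parameter dependence the two example slides illustrate.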

Cluster Representatives

Algorithms for Detecting Cluster Representatives:
• Medoid-Approach
• Core-Distance Approach (based on an OPTICS run)

Core-Distance Approach

[Figure: example with MinPts = 5 on a set of labeled points (A to V); point L is marked]

Cluster Representatives

Algorithms for Detecting Cluster Representatives:
• Medoid-Approach
• Core-Distance Approach (based on an OPTICS run)
• Maximizing Successors (based on an OPTICS run)

Maximizing Successors

[Figure: example with MinPts = 3 on a set of labeled points; point L is marked. Below it, the reachability plot of the OPTICS run (axis: reach) in the order E G B I K L P R N J M, with a narrowing cluster indicated]

Cluster Representatives
First Experimental Results

Outline of the Talk

• Introduction
• OPTICS
• Cluster Recognition
• Cluster Representatives
• BOSS (next section)
• Conclusion

BOSS
Browsing OPTICS-Plots for Similarity Search

• BOSS (Browsing OPTICS-Plots for Similarity Search)
• Interactive data browsing tool based on reachability plots
• User-friendly method to support the time-consuming task of finding similar parts:
  • Revealing the hierarchical clustering structure of the dataset at a glance
  • Displaying suitable representatives for large clusters

BOSS
Architecture

BOSS
Screenshot

Outline of the Talk

• Introduction
• OPTICS
• Cluster Recognition
• Cluster Representatives
• BOSS
• Conclusion (next section)

Conclusions

• Contribution
  • New algorithm for cluster recognition
  • New algorithms for finding suitable cluster representatives
  • BOSS: a new data analysis tool
• Future Work
  • Detailed evaluation of the new algorithms

Thank you for your attention

Any questions?

OPTICS
Application Ranges

• OPTICS yields an intermediate result which serves as a multi-purpose basis for further analysis:
  • Similarity search
  • Visual data mining
  • Evaluation of similarity models

[Diagram labels: DATA, OPTICS, Visualisation of the intermediate result, Other Algorithms, Knowledge; application areas: Similarity Search, Visual Data Mining, Evaluation of Similarity Models]

k-nn query: