
PARTITIONAL CLUSTERING


Page 1: PARTITIONAL CLUSTERING

PARTITIONAL CLUSTERING

ACM Student Chapter, Heritage Institute of Technology

17th February, 2012
SIGKDD Presentation by Megha Nangia, J. M. Mansa, Koustav Mullick

Page 2: PARTITIONAL CLUSTERING

Why do we cluster?

• Clustering results are used:

– As a stand-alone tool to get insight into data distribution

• Visualization of clusters may unveil important information

– As a preprocessing step for other algorithms

• Efficient indexing or compression often relies on clustering

Page 3: PARTITIONAL CLUSTERING

Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more “similar” (in some sense or another) to each other than to those in other clusters.

Cluster analysis itself is not one specific algorithm; the general task to be solved is forming similar clusters, and it can be achieved by various algorithms.

What is Cluster Analysis?

Page 4: PARTITIONAL CLUSTERING

How do we define “similarity”?

Recall that the goal is to group together "similar" data, but what does this mean?

There is no single answer: it depends on what we want to find or emphasize in the data; this is one reason why clustering is an "art".

The similarity measure is often more important than the clustering algorithm used; don't overlook this choice!

Page 5: PARTITIONAL CLUSTERING

Clustering:

• Minimize intra-cluster distance

• Maximize inter-cluster distance

Page 6: PARTITIONAL CLUSTERING

Applications:

Clustering is a main task of exploratory data mining, used to reduce the size of large data sets. It is a common technique for statistical data analysis used in many fields, including:

Machine learning

Pattern recognition

Image analysis

Information retrieval

Bioinformatics

Web applications such as social network analysis, grouping of shopping items, search result grouping, etc.

Page 7: PARTITIONAL CLUSTERING

Requirements of Clustering in Data Mining

• Scalability

• Ability to deal with different types of attributes

• Discovery of clusters with arbitrary shape

• Able to deal with noise and outliers

• Insensitive to order of input records

• High dimensionality

• Interpretability and usability

Page 8: PARTITIONAL CLUSTERING

Notion of clustering:

How many clusters?

[Figure: the same set of points grouped into two, four, or six clusters.]

Page 9: PARTITIONAL CLUSTERING

Clustering Algorithms:

Clustering algorithms can be categorized in several ways. Some of the major categories are:

1) Hierarchical or connectivity-based clustering
2) Partitional clustering (K-means or centroid-based clustering)
3) Density-based clustering
4) Grid-based clustering
5) Model-based clustering

Page 10: PARTITIONAL CLUSTERING

[Figure: example clustering of mammals.]

Page 11: PARTITIONAL CLUSTERING
Page 12: PARTITIONAL CLUSTERING

Partitional Clustering:

In statistics and data mining, k-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. This results in a partitioning of the data space into Voronoi cells.

Partitional clustering is a division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset.

Page 13: PARTITIONAL CLUSTERING

Partitional Clustering:

[Figure: the original points and a partitional clustering of them.]

Page 14: PARTITIONAL CLUSTERING

Hierarchical Clustering:

Connectivity-based clustering, also known as hierarchical clustering, is based on the core idea of objects being more related to nearby objects than to objects farther away. As such, these algorithms connect "objects" to form "clusters" based on their distance. At different distances, different clusters will form, which can be represented using a dendrogram. These algorithms do not provide a single partitioning of the data set, but instead provide an extensive hierarchy of clusters that merge with each other at certain distances. The result is a set of nested clusters organized as a hierarchical tree.

Page 15: PARTITIONAL CLUSTERING

Hierarchical Clustering:

[Figure: traditional and non-traditional hierarchical clusterings of points p1–p4, with their corresponding dendrograms.]

Page 16: PARTITIONAL CLUSTERING

[Figure: hierarchical clustering vs. partitional clustering of the same data.]

Page 17: PARTITIONAL CLUSTERING

Partitioning Algorithms:

Partitioning method: construct a partition of n objects into a set of K clusters.

Given: a set of objects and the number K.

Find: a partition into K clusters that optimizes the chosen partitioning criterion.

Effective heuristic methods: the K-means and K-medoids algorithms.

Page 18: PARTITIONAL CLUSTERING

Common choices for similarity/distance measures:

Euclidean distance: $d(x, y) = \sqrt{\sum_{n=1}^{N} (x_n - y_n)^2}$

City block or Manhattan distance: $d(x, y) = \sum_{n=1}^{N} |x_n - y_n|$

Cosine similarity: $\cos(x, y) = \frac{\sum_{i=1}^{N} x_i y_i}{\|x\|\,\|y\|}$

Jaccard similarity: $JSim(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}$
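For concreteness, here is a small Python sketch of the four measures listed above. This is our own illustrative code, not part of the original slides; it assumes NumPy is available, and the example vectors and sets are made up.

```python
import numpy as np

def euclidean(x, y):
    """d(x, y) = sqrt(sum_n (x_n - y_n)^2)"""
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    """City-block distance: sum_n |x_n - y_n|"""
    return np.sum(np.abs(x - y))

def cosine_similarity(x, y):
    """cos(x, y) = (x . y) / (||x|| * ||y||)"""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def jaccard_similarity(X, Y):
    """JSim(X, Y) = |X ∩ Y| / |X ∪ Y| for two sets."""
    X, Y = set(X), set(Y)
    return len(X & Y) / len(X | Y)

if __name__ == "__main__":
    x = np.array([2.0, 6.0])   # example vectors, chosen only for illustration
    y = np.array([3.0, 4.0])
    print("Euclidean:", euclidean(x, y))     # ~2.236
    print("Manhattan:", manhattan(x, y))     # 3.0
    print("Cosine:   ", cosine_similarity(x, y))
    print("Jaccard:  ", jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```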

Page 19: PARTITIONAL CLUSTERING

K-means Clustering:

Partitional clustering approach

Each cluster is associated with a centroid (center point)

Each point is assigned to the cluster with the closest centroid

Number of clusters, K, must be specified

The basic algorithm is very simple

Page 20: PARTITIONAL CLUSTERING

K-Means Algorithm:

1. Select K points as initial Centroids.

2. Repeat:

3. Form k clusters by assigning all points to their respective closest centroid.

4. Re-compute the centroid for each cluster

5. Until: The centroids don't change.

[Flowchart: choose K centroids → form K clusters → recompute centroids; repeat while the centroids change, then stop. A code sketch of this loop follows.]
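The basic loop above can be written in a few lines of Python. This is a minimal illustration under the usual assumptions (Euclidean distance, initial centroids sampled from the data); the function and variable names are our own, not the presenters'.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-means on an (n, m) NumPy array of points."""
    rng = np.random.default_rng(seed)
    # 1. Select K points as initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # 2-3. Form k clusters by assigning each point to its closest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 4. Re-compute the centroid of each cluster (keep the old one if a cluster is empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # 5. Stop when the centroids don't change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```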

Page 21: PARTITIONAL CLUSTERING

Time Complexity

• Assume computing the distance between two instances is O(m), where m is the dimensionality of the vectors.

• Reassigning clusters: O(kn) distance computations, or O(knm).

• Computing centroids: each instance vector gets added once to some centroid: O(nm).

• Assume these two steps are each done once for I iterations: O(Iknm).

Page 22: PARTITIONAL CLUSTERING

K-means Clustering: Step 1 (Algorithm: k-means; Distance Metric: Euclidean Distance)

Page 23: PARTITIONAL CLUSTERING

K-means Clustering: Step 2 (Algorithm: k-means; Distance Metric: Euclidean Distance)

Page 24: PARTITIONAL CLUSTERING

K-means Clustering: Step 3 (Algorithm: k-means; Distance Metric: Euclidean Distance)

Page 25: PARTITIONAL CLUSTERING

K-means Clustering: Step 4 (Algorithm: k-means; Distance Metric: Euclidean Distance)

Page 26: PARTITIONAL CLUSTERING

K-means Clustering: Step 5 (Algorithm: k-means; Distance Metric: Euclidean Distance)

[Figure: final cluster assignment with centroids k1, k2, and k3.]

Page 27: PARTITIONAL CLUSTERING

K-Means Clustering: Example 2

[Figure: K-means iterations 1–6 for Example 2, showing the cluster assignments and centroids converging.]

Page 28: PARTITIONAL CLUSTERING

K-Means Clustering: Example 2

[Figure: K-means iterations 1–6 for Example 2.]

Page 29: PARTITIONAL CLUSTERING

Importance of Choosing Initial Centroids …

[Figure: K-means iterations 1–5, illustrating the effect of the initial centroids.]

Page 30: PARTITIONAL CLUSTERING

Importance of Choosing Initial Centroids …

[Figure: K-means iterations 1–5 from another choice of initial centroids.]

Page 31: PARTITIONAL CLUSTERING

Two different K-means Clusterings

[Figure: the original points together with two K-means results: a sub-optimal clustering and the optimal clustering.]

Page 32: PARTITIONAL CLUSTERING

Solutions to Initial Centroids Problem

• Multiple runs
– Helps, but probability is not on your side

• Sample and use hierarchical clustering to determine initial centroids

• Select more than k initial centroids and then select among these initial centroids
– Select the most widely separated

• Postprocessing

• Bisecting K-means
– Not as susceptible to initialization issues
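As a practical note, library implementations already bundle two of these remedies. For example, assuming scikit-learn is available, its KMeans estimator supports multiple restarts (n_init) and k-means++ seeding (init); the data X below is generated purely for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data for illustration only.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# n_init=10: run K-means 10 times from different seeds and keep the run with
# the lowest SSE (inertia). init="k-means++": spread the initial centroids
# apart instead of picking them uniformly at random.
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
labels = km.fit_predict(X)
print("SSE (inertia) of the best run:", km.inertia_)
```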

Page 33: PARTITIONAL CLUSTERING

Evaluating K-means Clusters

• The most common measure is the Sum of Squared Error (SSE):

– For each point, the error is the distance to the nearest cluster centroid.
– To get the SSE, we square these errors and sum them:

$SSE = \sum_{i=1}^{K} \sum_{x \in C_i} dist(m_i, x)^2$

– Here x is a data point in cluster $C_i$ and $m_i$ is the representative point for cluster $C_i$.
• It can be shown that $m_i$ corresponds to the center (mean) of the cluster.

– Given two clusterings, we can choose the one with the smaller error.
– One easy way to reduce the SSE is to increase K, the number of clusters.

• A good clustering with smaller K can have a lower SSE than a poor clustering with higher K.
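Continuing the illustrative NumPy style of the earlier sketch, the SSE can be computed directly from this definition (the function and parameter names are ours):

```python
import numpy as np

def sse(points, labels, centroids):
    """SSE = sum over clusters i, sum over x in C_i, of dist(m_i, x)^2 (Euclidean)."""
    total = 0.0
    for i, m in enumerate(centroids):
        members = points[labels == i]          # points assigned to cluster C_i
        total += np.sum((members - m) ** 2)    # squared distances to the centroid m_i
    return total
```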

Page 34: PARTITIONAL CLUSTERING

Strength

Relatively efficient: O(ikn), where n is # objects, k is # clusters, and i is # iterations. Normally, k, i << n.

Often terminates at a local optimum. The global optimum may be found using techniques such as: deterministic annealing and genetic algorithms

Weakness

Applicable only when a mean is defined; what about categorical data?

Need to specify k, the number of clusters, in advance.
Unable to handle noisy data and outliers.
Not suitable for discovering clusters with non-convex shapes.
May also give rise to empty clusters.

Page 35: PARTITIONAL CLUSTERING

Outliers

• Outliers are objects that do not belong to any cluster or form clusters of very small cardinality

[Figure: a cluster together with a few outlier points.]

Page 36: PARTITIONAL CLUSTERING

Bisecting K-Means:

A variant of k-means that can produce a partitional or hierarchical clustering.

Which cluster should be picked for bisection?

We can pick the largest cluster, the cluster with the lowest average similarity, or the cluster with the largest SSE.

Page 37: PARTITIONAL CLUSTERING

Bisecting K-Means Algorithm:


1. Initialize the list of clusters.

2. Repeat:

3. Select a cluster from the list of clusters.

4. For i=1 to number_of_iterations

5. Bisect the cluster using k-means algorithm

6. End for

7. Select the bisection that produces the two clusters with the lowest total SSE.

8. Add the two clusters from that bisection to the list of clusters.

9. Until: The list contains k clusters.

[Flowchart of the bisecting K-means loop described above; a code sketch follows.]
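Assuming the `kmeans` and `sse` helpers sketched earlier, the numbered procedure above might look as follows in Python. This is again an illustrative sketch, not the presenters' code; here the cluster with the largest SSE is the one picked for bisection, and degenerate splits are simply skipped.

```python
import numpy as np

def cluster_sse(c):
    """SSE of a single cluster about its own mean."""
    return np.sum((c - c.mean(axis=0)) ** 2)

def bisecting_kmeans(points, k, n_trials=5, seed=0):
    rng = np.random.default_rng(seed)
    clusters = [points]                                  # 1. initialize the list of clusters
    while len(clusters) < k:                             # 9. until the list contains k clusters
        # 3. select a cluster to split: here, the one with the largest SSE
        cluster = clusters.pop(max(range(len(clusters)),
                                   key=lambda j: cluster_sse(clusters[j])))
        best_split, best_sse = None, np.inf
        for _ in range(n_trials):                        # 4-6. several trial bisections
            labels, cents = kmeans(cluster, 2, seed=rng.integers(10**6))
            split = [cluster[labels == 0], cluster[labels == 1]]
            if min(len(split[0]), len(split[1])) == 0:
                continue                                 # ignore splits with an empty cluster
            trial_sse = sse(cluster, labels, cents)
            if trial_sse < best_sse:                     # 7. keep the bisection with the lowest SSE
                best_split, best_sse = split, trial_sse
        clusters.extend(best_split)                      # 8. add the two new clusters to the list
    return clusters
```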

Page 38: PARTITIONAL CLUSTERING

Bisecting K-means: Example

Page 39: PARTITIONAL CLUSTERING

Why does bisecting K-means work better than regular K-means?

–Bisecting K-means tends to produce clusters of relatively uniform size.

–Regular K-means tends to produce clusters of widely different sizes.

–Bisecting K-means beats regular K-means on entropy measures.

Page 40: PARTITIONAL CLUSTERING

Limitations of K-means:

K-means has problems when clusters are of differing:
– Sizes
– Densities
– Non-globular shapes

K-means has problems when the data contains outliers.

Page 41: PARTITIONAL CLUSTERING

Limitations of K-means: Differing Sizes

Original Points K-means (3 Clusters)

Page 42: PARTITIONAL CLUSTERING

Original Points K-means (3 Clusters)

Limitations of K-means: Differing Density

Page 43: PARTITIONAL CLUSTERING

Limitations of K-means: Non-globular Shapes

Original Points K-means (2 Clusters)

Page 44: PARTITIONAL CLUSTERING

Overcoming K-means Limitations

Original Points K-means Clusters

One solution is to use many clusters: this finds parts of the natural clusters, which then need to be put together.

Page 45: PARTITIONAL CLUSTERING

Overcoming K-means Limitations

Original Points K-means Clusters

Page 46: PARTITIONAL CLUSTERING

Overcoming K-means Limitations

Original Points K-means Clusters

Page 47: PARTITIONAL CLUSTERING

K-Medoids Algorithm
What is a medoid?

A medoid can be defined as the object of a cluster whose average dissimilarity to all the objects in the cluster is minimal, i.e., it is the most centrally located point in the cluster.

In contrast to the k-means algorithm, k-medoids chooses data points as centers (medoids or exemplars).

The most common realisation of k-medoid clustering is the Partitioning Around Medoids (PAM) algorithm.

Page 48: PARTITIONAL CLUSTERING

1. Initialize: randomly select k of the n data points as the medoids.

2. Associate each data point to the closest medoid.

3. For each medoid m:

1. For each non-medoid data point o:

1. Swap m and o and compute the total cost of the configuration.

4. Select the configuration with the lowest cost.

5. Repeat steps 2 to 4 until there is no change in the medoids.

Partitioning Around Medoids (PAM) algorithm
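Under the assumption of Manhattan distance as the cost (the same measure used in the worked example that follows), a minimal PAM-style sketch of these steps could look like this. The names are ours, and this greedy swap loop is only one possible realisation, not any particular library's implementation.

```python
import numpy as np

def manhattan(a, b):
    return np.sum(np.abs(a - b))

def total_cost(points, medoids):
    """Sum, over all points, of the distance to the nearest medoid."""
    return sum(min(manhattan(p, m) for m in medoids) for p in points)

def pam(points, k, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Randomly select k of the n data points as medoids.
    medoids = [points[i] for i in rng.choice(len(points), size=k, replace=False)]
    best = total_cost(points, medoids)
    improved = True
    while improved:                      # 5. repeat until no swap lowers the cost
        improved = False
        for i in range(k):               # 3. for each medoid m ...
            for o in points:             #    ... and each non-medoid data point o
                if any(np.array_equal(o, m) for m in medoids):
                    continue
                candidate = medoids[:i] + [o] + medoids[i + 1:]
                cost = total_cost(points, candidate)   # cost of the swapped configuration
                if cost < best:          # 4. keep the configuration with the lowest cost
                    medoids, best = candidate, cost
                    improved = True
    return medoids, best
```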

Page 49: PARTITIONAL CLUSTERING

Demonstration of PAM

Cluster the following set of ten objects into two clusters, i.e. k = 2. Consider a data set of ten objects as follows:

Point   Coordinate 1   Coordinate 2
X1      2              6
X2      3              4
X3      3              8
X4      4              7
X5      6              2
X6      6              4
X7      7              3
X8      7              4
X9      8              5
X10     7              6

Page 50: PARTITIONAL CLUSTERING

Distribution of the data

[Figure: scatter plot of the ten data points.]

Page 51: PARTITIONAL CLUSTERING

Step 1

Initialize k centres: let us assume c1 = (3, 4) and c2 = (7, 4), so c1 and c2 are selected as medoids. Calculate the distance (here the Manhattan distance is used as the cost) from each data object to each medoid, so as to associate every object with its nearest medoid.

Cost from c1 = (3, 4):

Data object (Xi)   Cost
(2, 6)             3
(3, 8)             4
(4, 7)             4
(6, 2)             5
(6, 4)             3
(7, 3)             5
(8, 5)             6
(7, 6)             6

Cost from c2 = (7, 4):

Data object (Xi)   Cost
(2, 6)             7
(3, 8)             8
(4, 7)             6
(6, 2)             3
(6, 4)             1
(7, 3)             1
(8, 5)             2
(7, 6)             2

Page 52: PARTITIONAL CLUSTERING

So the clusters become:

Cluster1={(3,4)(2,6)(3,8)(4,7)}

Cluster2={(7,4) (6,2)(6,4)(7,3)(8,5)(7,6)}

The total cost involved is 20
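The clusters and the total cost of 20 can be checked with a few lines of Python (Manhattan distance, matching the cost tables above):

```python
points = [(2, 6), (3, 4), (3, 8), (4, 7), (6, 2),
          (6, 4), (7, 3), (7, 4), (8, 5), (7, 6)]
medoids = [(3, 4), (7, 4)]                     # c1 and c2

def d(p, q):                                   # Manhattan distance
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

clusters = {m: [] for m in medoids}
total = 0
for p in points:
    nearest = min(medoids, key=lambda m: d(p, m))
    clusters[nearest].append(p)
    total += d(p, nearest)

print(clusters)   # c1: (2,6),(3,4),(3,8),(4,7); c2: the remaining six points
print(total)      # 20
```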

Page 53: PARTITIONAL CLUSTERING

Cluster after step 1

[Figure: the two clusters formed around medoids c1 and c2 after step 1.]

Next, we choose a non-medoid point for each medoid, swap it with the medoid, and re-compute the total cost. If the cost decreases, we make it the new medoid, and we proceed in this way until there is no change in the medoids.

Page 54: PARTITIONAL CLUSTERING

Comments on PAM Algorithm

PAM is more robust than k-means in the presence of noise and outliers, because a medoid is less influenced by outliers or other extreme values than a mean.

PAM works well for small data sets but does not scale well to large data sets.

Page 55: PARTITIONAL CLUSTERING

Conclusion:

Partitional clustering is a very efficient and easy-to-implement clustering method. Its heuristic approaches, such as the K-means and K-medoids algorithms, converge quickly, but generally to local optima rather than the global optimum.

However, partitional clustering also suffers from a number of shortcomings:

The performance of the algorithm depends on the initial centroids, so the algorithm gives no guarantee of an optimal solution.
Choosing poor initial centroids may also lead to the generation of empty clusters.
The number of clusters needs to be determined beforehand.
It does not work well with non-globular clusters.

Some of these drawbacks can be addressed using other popular clustering approaches, such as hierarchical or density-based clustering. Nevertheless, the importance of partitional clustering cannot be denied.