
Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)



Definition

Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups.

• Inter-cluster distances are maximized
• Intra-cluster distances are minimized


Applications

• Group related documents for browsing
• Group genes and proteins that have similar functionality
• Group stocks with similar price fluctuations
• Reduce the size of large data sets
• Group users with similar buying mentalities


Clustering is ambiguous

There is no correct or incorrect solution for clustering.

How many clusters?

[Figure panels: Two Clusters; Four Clusters; Six Clusters]


Challenges faced

• Scalability
• Ability to deal with different types of attributes
• Noise and outliers
• Complex shapes and types of data
• Incremental clustering and insensitivity to the order of input records
• High dimensionality
• Constraint-based clustering
• Interpretability and usability


Types of Data

Data Matrix
• n objects with p variables.
• The structure is in the form of a relational table, or an n × p matrix.

Dissimilarity Matrix
• Object-by-object structure.
• Stores a collection of proximities that are available for all pairs of the n objects.
• d(i, j) is the dissimilarity between objects i and j.
• d(i, j) = d(j, i) and d(i, i) = 0
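As an illustration of how the two structures relate, the sketch below builds a dissimilarity matrix from a small data matrix. It assumes Euclidean distance as the proximity measure, and the data values and the function name `dissimilarity_matrix` are hypothetical, not taken from the slides.

```python
import math

def dissimilarity_matrix(data):
    """Return the n x n matrix of pairwise Euclidean distances between rows of `data`."""
    n = len(data)
    d = [[0.0] * n for _ in range(n)]          # d(i, i) = 0 on the diagonal
    for i in range(n):
        for j in range(i + 1, n):
            # Fill both halves at once so that d(i, j) = d(j, i).
            d[i][j] = d[j][i] = math.dist(data[i], data[j])
    return d

# Hypothetical data matrix: n = 3 objects, p = 2 variables.
data = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
d = dissimilarity_matrix(data)
for row in d:
    print(row)
```

Only the entries above the diagonal are computed; the symmetry property supplies the rest.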


Types of Data

• Interval-scaled variables
• Binary variables
• Nominal
• Ordinal
• Ratio-scaled variables
• Variables of mixed types


Interval-Scaled Variables


Interval-scaled variables contd…


Binary variables

• A binary variable has only two states: 0 and 1.
• Dissimilarity between two objects described by binary variables is computed from a 2×2 contingency table:

                   Object j
                   1       0       sum
Object i   1       q       r       q+r
           0       s       t       s+t
           sum     q+s     r+t     p


Dissimilarity between binary variables

Name   Gender   Fever   Cough   Test-1   Test-2   Test-3   Test-4
Jack   M        Y       N       P        N        N        N
Mary   F        Y       N       P        N        P        N
Jim    M        Y       Y       N        N        N        N

D(Jack, Mary) = 0.33
D(Jack, Jim) = 0.67
D(Mary, Jim) = 0.75
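The slide does not state which formula produces these values. A minimal sketch, assuming the asymmetric binary dissimilarity (r + s) / (q + r + s), with Y and P coded as 1, N coded as 0, and Gender left out, reproduces all three numbers. The function name and the 0/1 coding are assumptions made for illustration.

```python
def asymmetric_binary_dissimilarity(a, b):
    """(r + s) / (q + r + s): joint absences (t) are ignored.

    Undefined (division by zero) if both objects are 0 on every variable.
    """
    q = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)  # both 1
    r = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)  # only the first is 1
    s = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)  # only the second is 1
    return (r + s) / (q + r + s)

# Assumed coding: Fever, Cough, Test-1 .. Test-4 with Y/P -> 1 and N -> 0.
jack = [1, 0, 1, 0, 0, 0]
mary = [1, 0, 1, 0, 1, 0]
jim = [1, 1, 0, 0, 0, 0]

print(round(asymmetric_binary_dissimilarity(jack, mary), 2))  # 0.33
print(round(asymmetric_binary_dissimilarity(jack, jim), 2))   # 0.67
print(round(asymmetric_binary_dissimilarity(mary, jim), 2))   # 0.75
```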


Categorical Variables


Ordinal
• Similar to nominal variables, but values are ordered in some sequence.
• E.g., the rank of employees can be assistant, associate, full.

Ratio-Scaled variables
• Makes a positive measurement on a non-linear scale.
• E.g., growth of bacteria, radioactivity.

Variables of Mixed Types

Other types of data


Types of clustering

Hierarchical clustering (BIRCH)
• A set of nested clusters organized as a hierarchical tree

Partitional clustering (k-means, k-medoids)
• A division of data objects into non-overlapping (distinct) subsets (i.e., clusters) such that each data object is in exactly one subset

Density-based (DBSCAN)
• Based on density functions

Grid-based (STING)
• Based on a multiple-level granularity structure

Model-based (SOM)
• Hypothesize a model for each of the clusters and find the best fit of the data to the given model


Partitional Clustering

[Figure panels: Original Points; A Partitional Clustering]


Hierarchical Clustering

[Figure panels, each over points p1 to p4: Traditional Hierarchical Clustering; Non-traditional Hierarchical Clustering; Traditional Dendrogram; Non-traditional Dendrogram]


Clustering Algorithms

Partitional
• K-means
• K-medoids

Hierarchical
• Agglomerative
• Divisive


K-Means Algorithm

Each cluster is represented by the mean value of the objects in the cluster.

Input: a set of n objects and the number of clusters k
Output: a set of k clusters

Algorithm:
  Randomly select k samples and mark them as the initial clusters.
  Repeat
    Assign or reassign each sample to the cluster to which it is most similar, based on the mean of the cluster.
    Update each cluster's mean.
  until no change.
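The loop above can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: it assumes Euclidean distance as the similarity measure, leaves the mean of an empty cluster unchanged, and uses hypothetical names (`k_means`, `seed`, `max_iter`) and hypothetical sample points.

```python
import math
import random

def k_means(points, k, seed=0, max_iter=100):
    """Return (assignment, means) for a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    means = rng.sample(points, k)            # k randomly selected samples
    assignment = None
    for _ in range(max_iter):
        # Assign each sample to the cluster whose mean is nearest.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, means[j])) for p in points
        ]
        if new_assignment == assignment:     # until no change
            break
        assignment = new_assignment
        # Update each cluster's mean (an empty cluster keeps its old mean).
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, means

# Hypothetical, well-separated sample points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
assignment, means = k_means(points, k=2)
print(assignment, means)
```

Cluster numbering depends on which samples are drawn first, so only the grouping itself is meaningful, not the label values.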


K-Means (Array)

Step 1: Randomly assign objects to k clusters.
Step 2: Find the mean of each cluster.
Step 3: Re-assign objects to the cluster with the closest mean.
Step 4: Go to step 2.

Repeat until no change.


Example 1

Given: {2, 3, 6, 8, 9, 12, 15, 18, 22}. Assume k = 3.

Solution:

Randomly partition the given data set:
K1 = 2, 8, 15          mean = 8.3
K2 = 3, 9, 18          mean = 10
K3 = 6, 12, 22         mean = 13.3

Reassign:
K1 = 2, 3, 6, 8, 9     mean = 5.6
K2 = (empty)           mean taken as 0
K3 = 12, 15, 18, 22    mean = 16.75


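Reassign:
K1 = 3, 6, 8, 9        mean = 6.5
K2 = 2                 mean = 2
K3 = 12, 15, 18, 22    mean = 16.75

Reassign:
K1 = 6, 8, 9           mean = 7.67
K2 = 2, 3              mean = 2.5
K3 = 12, 15, 18, 22    mean = 16.75

Reassign (12 is now closer to 7.67 than to 16.75, so it moves to K1):
K1 = 6, 8, 9, 12       mean = 8.75
K2 = 2, 3              mean = 2.5
K3 = 15, 18, 22        mean = 18.33

Reassign:
K1 = 6, 8, 9, 12       mean = 8.75
K2 = 2, 3              mean = 2.5
K3 = 15, 18, 22        mean = 18.33

No change: STOP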
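The reassignment rounds of this example can be replayed with a short sketch. It starts from the same random partition and, following the example, treats the mean of an empty cluster as 0; the function name `k_means_from_partition` and the `max_iter` cap are assumptions added for illustration.

```python
def k_means_from_partition(partition, max_iter=100):
    """1-D k-means that starts from an initial partition of the data."""
    clusters = [sorted(c) for c in partition]
    values = sorted(x for c in clusters for x in c)
    for _ in range(max_iter):
        # An empty cluster gets mean 0, as in the worked example.
        means = [sum(c) / len(c) if c else 0.0 for c in clusters]
        new_clusters = [[] for _ in clusters]
        for x in values:
            # Re-assign x to the cluster with the closest mean.
            nearest = min(range(len(means)), key=lambda j: abs(x - means[j]))
            new_clusters[nearest].append(x)
        if new_clusters == clusters:         # no object changed cluster
            break
        clusters = new_clusters
    return clusters

result = k_means_from_partition([[2, 8, 15], [3, 9, 18], [6, 12, 22]])
print(result)  # [[6, 8, 9, 12], [2, 3], [15, 18, 22]]
```

The loop stops only when a full pass leaves every object in its current cluster, which is the "no change" condition of the algorithm.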


Example 2

Given: {2, 4, 10, 12, 3, 20, 30, 11, 25}. Assume k = 2.

Solution:
K1 = 2, 3, 4, 10, 11, 12
K2 = 20, 25, 30
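Because the slide gives only the final clusters, one way to check them is to confirm that they are a fixed point of the algorithm: every object must already sit in the cluster with the closest mean. The sketch below does that check; the helper name `is_stable` is hypothetical.

```python
def is_stable(clusters):
    """True if every value is closest to the mean of its own cluster."""
    means = [sum(c) / len(c) for c in clusters]
    for own, cluster in enumerate(clusters):
        for x in cluster:
            nearest = min(range(len(means)), key=lambda j: abs(x - means[j]))
            if nearest != own:
                return False
    return True

k1 = [2, 3, 4, 10, 11, 12]   # mean 7
k2 = [20, 25, 30]            # mean 25
print(is_stable([k1, k2]))   # True

# A different split of the same data is not stable: 10 is closer to mean 3 than to mean 18.
print(is_stable([[2, 3, 4], [10, 11, 12, 20, 25, 30]]))  # False
```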


Advantages
• K-means is relatively scalable and efficient in processing large data sets
• The computational complexity of the algorithm is O(nkt), where
  n: the total number of objects
  k: the number of clusters
  t: the number of iterations
  Normally k << n and t << n

Disadvantages
• Can be applied only when the mean of a cluster is defined
• Users need to specify k
• K-means is not suitable for discovering clusters with non-convex shapes or clusters of very different sizes
• It is sensitive to noise and outlier data points (they can influence the mean value)
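The last disadvantage is easy to see numerically. In the sketch below, with hypothetical values chosen only for illustration, a single outlier drags the cluster mean far away from the other members.

```python
from statistics import mean

cluster = [2, 3, 4]               # hypothetical cluster members
with_outlier = cluster + [100]    # the same cluster plus one outlier

mean_clean = mean(cluster)
mean_noisy = mean(with_outlier)
print(mean_clean)   # 3
print(mean_noisy)   # 27.25
```

With the outlier included, the mean (27.25) is no longer close to any of the original members, so later reassignments are made against a distorted center.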


K-Means (graph)

Step 1: Form k centroids, randomly.
Step 2: Calculate the distance between the centroids and each object. Use the Euclidean distance to determine the minimum distance:

$$d(A, B) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

Step 3: Assign objects to the k clusters based on minimum distance.
Step 4: Calculate the centroid of each cluster using

$$C = \left( \frac{x_1 + x_2 + \dots + x_n}{n},\ \frac{y_1 + y_2 + \dots + y_n}{n} \right)$$

Go to step 2. Repeat until no change in centroids.


Example 1

There are four types of medicines and each has two attributes, as shown below. Find a way to group them into 2 groups based on their features.

Medicine   Weight   pH
A          1        1
B          2        1
C          4        3
D          5        4


Solution

Plot the values on a graph.

Mark any k centroids.


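Calculate the Euclidean distance of each point from the centroids.

              A      B      C      D
D =           0      1      3.61   5        (distance from centroid 1)
              1      0      2.83   4.24     (distance from centroid 2)

Based on minimum distance, we assign points to clusters:
K1 = A
K2 = B, C, D

Calculate the new centroids. K1 contains only A, so its centroid stays at (1, 1). For K2:

$$C = \left( \frac{2 + 4 + 5}{3},\ \frac{1 + 3 + 4}{3} \right) = \left( \frac{11}{3},\ \frac{8}{3} \right)$$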
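The first iteration can be checked with a short sketch. The zeros in the distance matrix suggest that the two initial centroids were placed on A = (1, 1) and B = (2, 1); that choice is inferred from the numbers rather than stated on the slide, and the variable names are hypothetical.

```python
import math

medicines = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}  # (weight, pH)
centroids = [medicines["A"], medicines["B"]]   # assumed initial centroids

# Distance of every medicine from each centroid, rounded as in the example.
distances = [
    [round(math.dist(c, p), 2) for p in medicines.values()] for c in centroids
]

# Assign each medicine to the centroid at minimum distance.
clusters = [[] for _ in centroids]
for name, p in medicines.items():
    nearest = min(range(len(centroids)), key=lambda j: math.dist(centroids[j], p))
    clusters[nearest].append(name)

# New centroid of each cluster: the coordinate-wise mean of its members.
new_centroids = [
    tuple(sum(medicines[n][axis] for n in names) / len(names) for axis in range(2))
    for names in clusters
]

print(distances)      # [[0.0, 1.0, 3.61, 5.0], [1.0, 0.0, 2.83, 4.24]]
print(clusters)       # [['A'], ['B', 'C', 'D']]
print(new_centroids)  # second centroid is (11/3, 8/3)
```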


Marking the new centroids

Continue the iteration, until there is no change in the centroids or clusters.
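Continuing the iteration by hand is mechanical, so a sketch of the full loop may help. It assumes the initial centroids sit on A = (1, 1) and B = (2, 1), as the distance matrix suggests, and it stops when no medicine changes cluster; the function name `k_means_2d` and the `max_iter` cap are illustrative.

```python
import math

def k_means_2d(points, centroids, max_iter=100):
    """Iterate assignment and centroid update until the labels stop changing."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:      # no change in clusters
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:               # an empty cluster keeps its old centroid
                centroids[j] = (
                    sum(x for x, y in members) / len(members),
                    sum(y for x, y in members) / len(members),
                )
    return labels, centroids

points = [(1, 1), (2, 1), (4, 3), (5, 4)]          # medicines A, B, C, D
labels, centroids = k_means_2d(points, [(1, 1), (2, 1)])
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(1.5, 1.0), (4.5, 3.5)]
```

Under these assumptions the loop settles on the grouping {A, B} and {C, D} after the second reassignment.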


Final solution


Example 2

Use the K-means algorithm to create two clusters. Given:


Example 3

Group the points below into 3 clusters.

Page 32: Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)
Page 33: Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)
Page 34: Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)
Page 35: Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)
Page 36: Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)
Page 37: Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)
Page 38: Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)
Page 39: Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)
Page 40: Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)
Page 41: Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)
Page 42: Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)
Page 43: Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)
Page 44: Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)