K-means Clustering
• What is clustering?
• Why would we want to cluster?
• How would you determine clusters?
• How can you do this efficiently?
• Strengths
– Simple iterative method
– User provides “K”
• Weaknesses
– Often too simple, leading to bad results
– Difficult to guess the correct “K”
Basic Algorithm:
• Step 0: select K
• Step 1: randomly select initial cluster seeds
[Figure: initial cluster seeds on a number line: Seed 1 = 650, Seed 2 = 200]
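As a sketch of Step 1, the initial seeds can be drawn at random from the data. The data list below is hypothetical; only the seed values 650 and 200 come from the figure.

```python
import random

# Hypothetical 1-D data; only the seed values 650 and 200 appear in the slides
data = [650, 200, 710, 230, 640, 180, 760]

K = 2
random.seed(0)
seeds = random.sample(data, K)  # Step 1: pick K distinct objects as initial seeds
print(seeds)
```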
• An initial cluster seed represents the “mean value” of its cluster.
• In the preceding figure:
– Cluster seed 1 = 650
– Cluster seed 2 = 200
• Step 2: calculate distance from each object to each cluster seed.
• What type of distance should we use?
– Squared Euclidean distance
• Step 3: Assign each object to the closest cluster
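Steps 2 and 3 can be sketched in a few lines. The data list here is hypothetical; the seeds 650 and 200 are the slide's example values.

```python
# Hypothetical 1-D data; seeds 650 and 200 are the slide's example values
data = [650, 200, 710, 230, 640, 180, 760]
seeds = [650, 200]

def squared_euclidean(x, s):
    # Step 2: squared Euclidean distance (1-D case)
    return (x - s) ** 2

# Step 3: assign each object to the index of its closest seed
assignments = [min(range(len(seeds)), key=lambda k: squared_euclidean(x, seeds[k]))
               for x in data]
print(assignments)  # prints [0, 1, 0, 1, 0, 1, 0]
```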
[Figure: objects assigned to the closest seed, Seed 1 or Seed 2]
• Step 4: Compute the new centroid for each cluster
– Cluster Seed 1 = 708.9
– Cluster Seed 2 = 214.2
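Step 4 can be sketched on hypothetical 1-D data. The slide's values 708.9 and 214.2 come from its own unshown data set, so the numbers below differ.

```python
# Hypothetical 1-D data with assignments from the previous step
data = [650, 200, 710, 230, 640, 180, 760]
assignments = [0, 1, 0, 1, 0, 1, 0]
K = 2

# Step 4: each new centroid is the mean of the objects assigned to it
centroids = []
for k in range(K):
    members = [x for x, a in zip(data, assignments) if a == k]
    centroids.append(sum(members) / len(members))

print(centroids)  # cluster means: 690.0 and roughly 203.3
```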
• Iterate:
– Calculate distance from objects to cluster centroids
– Assign objects to the closest cluster
– Recalculate new centroids
• Stop based on convergence criteria:
– No change in clusters
– Max iterations
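Putting the steps together, a minimal k-means loop might look like the sketch below (1-D, hypothetical data; `kmeans_1d` is an illustrative name, not from the slides).

```python
def kmeans_1d(data, seeds, max_iter=100):
    """Minimal k-means sketch: iterate steps 2-4 until convergence."""
    centroids = list(seeds)
    for _ in range(max_iter):
        # Steps 2-3: assign each object to its nearest centroid (squared Euclidean)
        assignments = [min(range(len(centroids)), key=lambda k: (x - centroids[k]) ** 2)
                       for x in data]
        # Step 4: recompute each centroid as the mean of its cluster
        new_centroids = []
        for k in range(len(centroids)):
            members = [x for x, a in zip(data, assignments) if a == k]
            new_centroids.append(sum(members) / len(members) if members else centroids[k])
        # Convergence: centroids (and hence clusters) no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignments

# Hypothetical data; seeds 650 and 200 are the slide's example values
centroids, assignments = kmeans_1d([650, 200, 710, 230, 640, 180, 760], [650, 200])
```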
K-means Issues
• Distance measure is squared Euclidean
– Scale should be similar in all dimensions; rescale data?
– Not good for nominal data. Why?
• Approach tries to minimize the within-cluster sum of squares error (WCSS)
– Implicit assumption that SSE is similar for each group
WCSS
• The overall WCSS is given by:
WCSS = Σₖ Σ_{x ∈ Cₖ} ‖x − μₖ‖², where μₖ is the centroid of cluster Cₖ
• The goal is to find the smallest WCSS
• Does this depend on the initial seed values? Possibly.
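As a sketch, WCSS sums each object's squared distance to its own cluster centroid. The 1-D data and assignments below are hypothetical.

```python
# Hypothetical 1-D data, split into two clusters
data = [650, 200, 710, 230, 640, 180, 760]
assignments = [0, 1, 0, 1, 0, 1, 0]

# Centroids: the mean of each cluster's members
centroids = []
for k in range(2):
    members = [x for x, a in zip(data, assignments) if a == k]
    centroids.append(sum(members) / len(members))

# WCSS: sum of squared distances from each object to its cluster centroid
wcss = sum((x - centroids[a]) ** 2 for x, a in zip(data, assignments))
print(wcss)  # roughly 10666.7 for this data
```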
Bottom Line
• K-means
– Easy to use
– Need to know K
– May need to scale data
– Good initial method
• Local optima
– No guarantee of an optimal solution
– Repeat with different starting values
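The restart strategy can be sketched end to end. Everything here (data, function names) is illustrative, with a minimal k-means inlined so the example is self-contained.

```python
import random

def kmeans_1d(data, seeds, max_iter=100):
    # Minimal 1-D k-means: assign to the nearest centroid, then recompute means
    centroids = list(seeds)
    for _ in range(max_iter):
        assignments = [min(range(len(centroids)), key=lambda k: (x - centroids[k]) ** 2)
                       for x in data]
        new_centroids = []
        for k in range(len(centroids)):
            members = [x for x, a in zip(data, assignments) if a == k]
            new_centroids.append(sum(members) / len(members) if members else centroids[k])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignments

def wcss(data, centroids, assignments):
    # Within-cluster sum of squares for a given clustering
    return sum((x - centroids[a]) ** 2 for x, a in zip(data, assignments))

def kmeans_with_restarts(data, K, runs=10, rng_seed=0):
    # Repeat k-means from different random starting seeds; keep the lowest-WCSS run
    rng = random.Random(rng_seed)
    best_score, best_result = float("inf"), None
    for _ in range(runs):
        seeds = rng.sample(data, K)
        centroids, assignments = kmeans_1d(data, seeds)
        score = wcss(data, centroids, assignments)
        if score < best_score:
            best_score, best_result = score, (centroids, assignments)
    return best_score, best_result

score, (centroids, assignments) = kmeans_with_restarts(
    [650, 200, 710, 230, 640, 180, 760], K=2)
```

Restarts do not guarantee the global optimum, but keeping the best of several runs makes a bad local optimum much less likely.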