
K-means Clustering J.-S. Roger Jang ( 張智星 ) CSIE Dept., National Taiwan Univ., Taiwan mirlab.org Machine Learning K-means Clustering


K-means Clustering

J.-S. Roger Jang (張智星)

CSIE Dept., National Taiwan Univ., Taiwan
http://mirlab.org/jang
[email protected]


Problem Definition

Input:
• $X = \{x_1, x_2, \dots, x_n\}$: a data set in d-dimensional space
• $m$: number of clusters (we avoid using k here to prevent confusion with other summation indices)

Output:
• $m$ cluster centers: $c_j,\ 1 \le j \le m$
• Assignment of each $x_i$ to one of the $m$ clusters: $a_{ij} \in \{0, 1\},\ 1 \le i \le n,\ 1 \le j \le m$, with $\sum_{j=1}^{m} a_{ij} = 1$ for all $i$

Requirement:
• The output should minimize the objective function.


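Objective Function

Objective function (a.k.a. distortion):

$$J(X; C, A) = \sum_{j=1}^{m} e_j = \sum_{j=1}^{m} \sum_{x_i \in G_j} \|x_i - c_j\|^2 = \sum_{j=1}^{m} \sum_{i=1}^{n} a_{ij} \|x_i - c_j\|^2$$

where
• $e_j = \sum_{x_i \in G_j} \|x_i - c_j\|^2$ is the squared error within cluster $G_j$
• $X = \{x_1, x_2, \dots, x_n\}$
• $C = \{c_1, c_2, \dots, c_m\}$
• $a_{ij} = 1$ iff $x_i \in G_j$, with $\sum_{j=1}^{m} a_{ij} = 1$ for all $i$

• d*m (for matrix C) plus n*m (for matrix A) tunable parameters, with certain constraints on matrix A
• NP-hard problem if an exact solution is required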
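As a minimal sketch (written in Python rather than with the MATLAB toolbox used in the demos), the distortion can be computed directly from its definition. The names `squared_distance`, `distortion`, and `labels` are illustrative assumptions; `labels[i] = j` is a compact stand-in for the row of A in which $a_{ij} = 1$.

```python
def squared_distance(x, c):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def distortion(X, C, labels):
    """J(X; C, A), with A stored compactly: labels[i] = j means a_ij = 1."""
    return sum(squared_distance(x, C[j]) for x, j in zip(X, labels))


# Hypothetical toy data: four 2-D points and two centers.
X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
C = [(0.0, 1.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
print(distortion(X, C, labels))  # 4.0
```

Storing one label per point instead of the full n*m matrix A enforces the constraint that each row of A sums to 1 by construction.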


Strategy for Minimization

Observation
• J(X; C, A) is parameterized by C and A
• Joint optimization is hard, but separate optimization with respect to C and A is easy

Strategy
• Fix C and find the best A to minimize J(X; C, A)
• Fix A and find the best C to minimize J(X; C, A)
• Iterate the above two steps until convergence

Properties
• The approach is also known as “coordinate optimization”


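Task 1: How to Find Assignment A?

Goal
• Find A to minimize J(X; C, A) with fixed C

Facts
• An analytic (closed-form) solution exists:

$$\hat{a}_{ij} = \begin{cases} 1 & \text{if } j = \arg\min_{q} \|x_i - c_q\|^2 \\ 0 & \text{otherwise} \end{cases}$$

• $\hat{A} = \arg\min_{A} J(X; C, A) \;\Rightarrow\; J(X; C, A) \ge J(X; C, \hat{A})$ for every $A$ and any fixed $C$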
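The closed-form assignment amounts to a nearest-center search for each point. A minimal Python sketch follows; the function names and the toy data are assumptions for illustration, and ties are broken in favor of the lowest cluster index, which the slides leave unspecified.

```python
def squared_distance(x, c):
    """Squared Euclidean distance between two points."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def assign(X, C):
    """Return labels with labels[i] = argmin_q ||x_i - c_q||^2."""
    return [min(range(len(C)), key=lambda q: squared_distance(x, C[q]))
            for x in X]


# Hypothetical toy data: four points on a line and two centers.
X = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
C = [(2.0, 0.0), (8.0, 0.0)]
print(assign(X, C))  # [0, 0, 1, 1]
```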


Task 2: How to Find Centers in C?

Goal
• Find C to minimize J(X; C, A) with fixed A

Facts
• An analytic (closed-form) solution exists:

$$\hat{c}_j = \frac{\sum_{i=1}^{n} a_{ij} x_i}{\sum_{i=1}^{n} a_{ij}}$$

• $\hat{C} = \arg\min_{C} J(X; C, A) \;\Rightarrow\; J(X; C, A) \ge J(X; \hat{C}, A)$ for every $C$ and any fixed $A$
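The center update is simply the mean of the points assigned to each cluster. The Python sketch below is one possible reading; `update_centers` is a hypothetical name, and returning `None` for an empty cluster is an assumption, since the formula's denominator is zero in that case and the slides do not say how to handle it.

```python
def update_centers(X, labels, m):
    """Return the mean of each of the m clusters; None marks an empty cluster."""
    d = len(X[0])
    sums = [[0.0] * d for _ in range(m)]
    counts = [0] * m
    for x, j in zip(X, labels):
        counts[j] += 1
        for k in range(d):
            sums[j][k] += x[k]
    return [tuple(s / counts[j] for s in sums[j]) if counts[j] else None
            for j in range(m)]


# Hypothetical toy data: four 2-D points in two clusters.
X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(update_centers(X, [0, 0, 1, 1], 2))  # [(0.0, 1.0), (10.0, 1.0)]
```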


Algorithm

1. Initialize
   - Select initial m cluster centers
2. Find clusters
   - For each xi, assign the cluster with the nearest center
   - That is, find A to minimize J(X; C, A) with fixed C
3. Find centers
   - Recompute each cluster center as the mean of the data in the cluster
   - That is, find C to minimize J(X; C, A) with fixed A
4. Stopping criterion
   - Stop if the clusters stay the same. Otherwise go to step 2.
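The four steps can be sketched in a few lines of Python. This is an illustrative sketch, not the toolbox's `kMeansClustering.m`: the function name `kmeans`, the explicit `init_centers` argument, and the choice to keep the old center when a cluster becomes empty are all assumptions.

```python
def squared_distance(x, c):
    """Squared Euclidean distance between two points."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def kmeans(X, init_centers, max_iter=100):
    """Alternate assignment and center update until the clusters stop changing."""
    C = list(init_centers)                      # step 1: initial centers
    m = len(C)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to the cluster with the nearest center.
        new_labels = [min(range(m), key=lambda q: squared_distance(x, C[q]))
                      for x in X]
        if new_labels == labels:                # step 4: clusters unchanged
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of its members.
        for j in range(m):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:                         # empty cluster: keep old center
                C[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return C, labels


# Hypothetical toy data: two well-separated groups of four points each.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
     (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
C, labels = kmeans(X, [X[0], X[4]])
print(C)       # [(0.5, 0.5), (10.5, 10.5)]
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```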


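Stopping Criteria

Two stopping criteria
• Repeat until there is no more change in cluster assignment
• Repeat until the distortion improvement is less than a threshold

Facts
• Convergence is assured since J never increases from one step to the next and is bounded below by zero:

$$J(X; C_1, A_1) \ge J(X; C_2, A_1) \ge J(X; C_2, A_2) \ge J(X; C_3, A_2) \ge J(X; C_3, A_3) \ge \cdots$$

  Here $C_1$ denotes the initial centers, and $A_t$ and $C_{t+1}$ are obtained by Task 1 and Task 2 in turn.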
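The second stopping criterion and the non-increasing chain can be checked numerically. The Python sketch below records J after every half-step and stops once the improvement per full iteration falls below a threshold; `kmeans_trace`, `tol`, and the toy data are assumptions made for illustration.

```python
def sqdist(x, c):
    return sum((a - b) ** 2 for a, b in zip(x, c))


def assign(X, C):
    """Task 1: nearest-center labels for fixed centers C."""
    return [min(range(len(C)), key=lambda q: sqdist(x, C[q])) for x in X]


def update(X, labels, C):
    """Task 2: cluster means for fixed labels (old center kept if empty)."""
    new_C = []
    for j, old in enumerate(C):
        members = [x for x, lab in zip(X, labels) if lab == j]
        new_C.append(tuple(sum(col) / len(members) for col in zip(*members))
                     if members else old)
    return new_C


def distortion(X, C, labels):
    return sum(sqdist(x, C[j]) for x, j in zip(X, labels))


def kmeans_trace(X, C, tol=1e-9, max_iter=100):
    """Return the distortion recorded after every half-step."""
    trace, prev = [], float("inf")
    for _ in range(max_iter):
        labels = assign(X, C)
        trace.append(distortion(X, C, labels))   # J(X; C_t, A_t)
        C = update(X, labels, C)
        trace.append(distortion(X, C, labels))   # J(X; C_{t+1}, A_t)
        if prev - trace[-1] < tol:               # improvement below threshold
            break
        prev = trace[-1]
    return trace


# Hypothetical toy data; both initial centers start inside the same group.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
     (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
trace = kmeans_trace(X, [(0.0, 0.0), (1.0, 1.0)])
print([round(j, 2) for j in trace])
```

Each recorded value should be no larger than the one before it, which is the chain of inequalities above.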


Properties of K-means Clustering

• K-means can find an approximate solution efficiently.
• The distortion (squared error) is a monotonically non-increasing function of iterations.
• The goal is to minimize the squared error, but the algorithm could end up in a local minimum.
• To increase the probability of finding the global minimum, try starting k-means with different initial conditions.
• “Cluster validation” refers to a set of methods that try to determine the best value of k (the number of clusters, written m in these slides).
• Other distance measures can be used in place of the Euclidean distance, with a corresponding change in center identification.
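The local-minimum behaviour and the benefit of several starts can be reproduced on a tiny data set. In the Python sketch below, the helper names, the toy data, and the two hand-picked initializations are assumptions chosen so that one start gets stuck while the other reaches the better clustering.

```python
def sqdist(x, c):
    return sum((a - b) ** 2 for a, b in zip(x, c))


def kmeans(X, init_centers, max_iter=100):
    """Plain k-means from the given initial centers; returns (centers, labels)."""
    C, labels = list(init_centers), None
    for _ in range(max_iter):
        new_labels = [min(range(len(C)), key=lambda q: sqdist(x, C[q]))
                      for x in X]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(C)):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                C[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return C, labels


def distortion(X, C, labels):
    return sum(sqdist(x, C[j]) for x, j in zip(X, labels))


# Hypothetical toy data: two well-separated groups of four points each.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
     (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
inits = [[(0.0, 1.0), (1.0, 0.0)],      # both centers inside one group
         [(0.0, 0.0), (10.0, 10.0)]]    # one center in each group

runs = []
for init in inits:
    C, labels = kmeans(X, init)
    runs.append((distortion(X, C, labels), C, labels))

# Keep the run with the smallest distortion.
best_J, best_C, best_labels = min(runs, key=lambda r: r[0])
print([round(r[0], 2) for r in runs])  # the first start ends far above the second
print(best_J)                          # 4.0
```

Both runs stop because the assignments no longer change, yet only the second one separates the two groups; picking the smallest J over several starts is the simplest safeguard.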


K-means Snapshots


Demo of K-means Clustering

• Toolbox download
  - Utility Toolbox
  - Machine Learning Toolbox
• Demos
  - kMeansClustering.m
  - vecQuantize.m


Demos of K-means Clustering

kMeansClustering.m

[Figure: two 2-D scatter plots with axes "Input 1" and "Input 2", each ranging from -0.8 to 0.8]


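Application: Image Compression

Goal
• Convert an image from true colors to indexed colors with minimum distortion.

Steps
• Collect data from a true-color image
• Perform k-means clustering to obtain cluster centers as the indexed colors
• Compute the compression rate

For an m*n image with c indexed colors (here m and n are the image height and width, not the cluster count and data size used earlier):

$$\text{before} = m \cdot n \cdot 3 \cdot 8 \text{ bits}$$

$$\text{after} = m \cdot n \cdot \log_2 c + c \cdot 3 \cdot 8 \text{ bits}$$

$$\frac{\text{before}}{\text{after}} = \frac{m \cdot n \cdot 3 \cdot 8}{m \cdot n \cdot \log_2 c + c \cdot 3 \cdot 8} = \frac{24}{\log_2 c + \dfrac{24 c}{m n}} \approx \frac{24}{\log_2 c}$$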
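The compression-rate arithmetic is easy to check numerically. The Python sketch below evaluates the ratio for a 480x640 image with 256 indexed colors; `compression_ratio` is a hypothetical helper name, and the formula takes $\log_2 c$ bits per pixel at face value, which is exact only when c is a power of two.

```python
import math


def compression_ratio(m, n, c):
    """Ratio of true-color size to indexed-color size for an m*n image."""
    before = m * n * 3 * 8                      # 24 bits per pixel
    after = m * n * math.log2(c) + c * 3 * 8    # indices plus the color map
    return before / after


print(round(compression_ratio(480, 640, 256), 3))  # 2.993
print(24 / math.log2(256))                         # 3.0 (approximation)
```

The color map contributes only 256*24 = 6144 bits here, which is why the ratio is so close to the approximation 24 / log2(c).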


True-color vs. Index-color Images

True-color image
• Each pixel is represented by a vector of 3 components [R, G, B]

Index-color image
• Each pixel is represented by an index into a color map


Example: Image Compression

Original image
• Dimension: 480x640
• Data size: 480*640*3*8 bits = 0.92 MB


Example: Image Compression

Some quantities of the k-means clustering
• n = 480x640 = 307200 (no. of vectors to be clustered)
• d = 3 (R, G, B)
• m = 256 (no. of clusters)


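Indexing Techniques

Indexing of pixels for a 2*3*3 image (linear indices in MATLAB's column-major order):

Page 1        Page 2        Page 3
1  3  5       7   9  11     13  15  17
2  4  6       8  10  12     14  16  18

Index matrix produced by reshape (one row per page, one column per pixel):

 1   2   3   4   5   6
 7   8   9  10  11  12
13  14  15  16  17  18

Related command: reshape

X = imread('annie19980405.jpg');     % m-by-n-by-3 true-color image
image(X);
[m, n, p] = size(X);
index = reshape(1:m*n*p, m*n, 3)';   % 3-by-(m*n) matrix of linear indices
data = double(X(index));             % each column holds the 3 color values of one pixel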
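To see why the transposed `reshape` lines up the three color values of each pixel, the index matrix can be emulated outside MATLAB. The Python sketch below reproduces the 1-based, column-major linear indices for the 2*3*3 example; `pixel_index` is a hypothetical helper written for illustration only.

```python
def pixel_index(m, n, p=3):
    """1-based linear indices: one row per page, one column per pixel."""
    return [[k * m * n + j + 1 for j in range(m * n)] for k in range(p)]


index = pixel_index(2, 3)
for row in index:
    print(row)
# [1, 2, 3, 4, 5, 6]
# [7, 8, 9, 10, 11, 12]
# [13, 14, 15, 16, 17, 18]

# Column j collects the indices of the same pixel on pages 1, 2, and 3.
print([row[0] for row in index])  # [1, 7, 13]
```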


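Code

X = imread('annie19980405.jpg');          % true-color image, m-by-n-by-3
image(X)
[m, n, p] = size(X);
index = reshape(1:m*n*p, m*n, 3)';        % linear indices, one column per pixel
data = double(X(index));                  % 3-by-(m*n) matrix of RGB vectors
maxI = 6;
for i = 1:maxI
    centerNum = 2^i;                      % number of indexed colors: 2, 4, ..., 64
    fprintf('i=%d/%d: no. of centers=%d\n', i, maxI, centerNum);
    center = kMeansClustering(data, centerNum);   % cluster centers, one per column
    distMat = distPairwise(center, data);         % distances between centers and pixels
    [minValue, minIndex] = min(distMat);          % nearest center for each pixel
    X2 = reshape(minIndex, m, n);                 % index-color image
    map = center'/255;                            % color map scaled to [0, 1]
    figure; image(X2); colormap(map); colorbar; axis image;
end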
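The last few steps of the loop (nearest center per pixel, then a color map scaled to [0, 1]) can be mirrored in a few lines of Python. This sketch does not call the toolbox functions `kMeansClustering` or `distPairwise`; the helper `quantize`, the 0-based indices, and the hand-picked centers are assumptions for illustration.

```python
def quantize(pixels, centers):
    """Return (indices, colormap) for a list of RGB pixels and RGB centers."""
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # Index of the nearest center for each pixel (0-based, unlike MATLAB).
    indices = [min(range(len(centers)), key=lambda j: sqdist(px, centers[j]))
               for px in pixels]
    # Color map with components scaled from [0, 255] to [0, 1].
    colormap = [tuple(v / 255 for v in c) for c in centers]
    return indices, colormap


# Hypothetical data: four pixels and two centers (pure red and pure blue).
pixels = [(250, 5, 5), (0, 0, 240), (255, 0, 0), (10, 10, 255)]
centers = [(255, 0, 0), (0, 0, 255)]
indices, colormap = quantize(pixels, centers)
print(indices)   # [0, 1, 0, 1]
print(colormap)  # [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
```

Reshaping `indices` back to the image's height and width gives the index-color image that the MATLAB loop displays with `image` and `colormap`.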


Extensions to Image Compression

Extensions to image data compression via clustering
1. Use blocks as the unit for VQ (see exercise)
   - Smart indexing by creating the indices of the blocks of page 1 first.
   - True-color image display (there is no way to display the compressed image as an index-color image)
2. Use separate code books for RGB

What are the corresponding compression ratios?
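One possible way to approach the question, using the same bit accounting as the single-code-book case, is sketched below in Python. The formulas are assumptions rather than the slides' answer: block VQ is taken to use b-by-b pixel blocks (with b dividing both image dimensions) and c code words of 3*b*b bytes each, and the separate-code-book variant is taken to use c 8-bit levels per channel. Both function names are hypothetical.

```python
import math


def ratio_block_vq(m, n, c, b):
    """Compression ratio when b-by-b pixel blocks are the vectors for VQ."""
    blocks = (m // b) * (n // b)                 # assumes b divides m and n
    before = m * n * 3 * 8
    after = blocks * math.log2(c) + c * (b * b * 3 * 8)
    return before / after


def ratio_separate_rgb(m, n, c):
    """Compression ratio with one code book of c 8-bit levels per channel."""
    before = m * n * 3 * 8
    after = 3 * m * n * math.log2(c) + 3 * c * 8
    return before / after


print(ratio_block_vq(480, 640, 256, 2))    # 2x2 blocks, 256 code words
print(ratio_separate_rgb(480, 640, 16))    # 16 levels per channel
```

With b = 1 the block formula reduces to the per-pixel ratio, which is a quick consistency check on the accounting.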


Extension to Circle Finding

How to find circles via a k-means-like algorithm?

[Figure: 2-D scatter plot; horizontal axis from -1 to 4, vertical axis from -2 to 4]
