DATA MINING CLUSTERING ANALYSIS

Data Mining (by R.S.K. Baber)

CLUSTERING

Example: suppose we have 9 balls of three different colours. We are interested in clustering the balls into three groups, one for each colour.

Balls of the same colour are clustered into the same group, giving three groups.

Concept

Definition (Cluster, Cluster analysis)

Which is a good cluster?

Data structures in data mining / clustering

Types of data in cluster analysis

Types of clustering

K-means:

Concept

Algorithm

Example

Comments

The K-Means Clustering Method

Example

[Figure: scatter plots of the objects; both axes run from 0 to 10]

K = 2

Arbitrarily choose K objects as the initial cluster centers.

Assign each object to the most similar center.

Update the cluster means.

Reassign each object to the most similar center.

Update the cluster means again and reassign the objects.
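The steps above can be sketched in Python as follows. This is a minimal sketch, not the slides' own code: the function name `kmeans`, the `init`, `max_iter` and `seed` parameters, the sample data, and the use of squared Euclidean distance as the similarity measure are all assumptions made for illustration.

```python
import random


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Cluster 2-D points into k groups; returns (centers, labels)."""
    # Arbitrarily choose k objects as initial cluster centers,
    # unless explicit starting centers are passed in (assumed option).
    centers = list(init) if init is not None else random.Random(seed).sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign each object to the most similar (nearest) center.
        # Similarity here is squared Euclidean distance (an assumption).
        new_labels = [
            min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                                        + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        if new_labels == labels:  # nothing was reassigned: stop
            break
        labels = new_labels
        # Update the cluster means from the current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old center
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centers, labels


# Hypothetical data: two well-separated groups, K = 2.
data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, labels = kmeans(data, k=2, init=[(1, 1), (8, 8)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Passing `init` makes the run reproducible; leaving it out falls back to an arbitrary choice of K objects, as in the first step of the method.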

How does it work? Suppose we have 8 points, A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2) and A8(4, 9), and 3 clusters initially.

Initial cluster centers are A1(2, 10), A4(5, 8) and A7(1, 2).

The distance function between two points a = (x1, y1) and b = (x2, y2) is the Manhattan distance, d(a, b) = |x2 – x1| + |y2 – y1|.
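As a small illustration, the distance function and the example data could be written in Python like this. The names `manhattan`, `points` and `initial_centers` are chosen here for the sketch and do not come from the slides.

```python
def manhattan(a, b):
    """d(a, b) = |x2 - x1| + |y2 - y1| for a = (x1, y1), b = (x2, y2)."""
    (x1, y1), (x2, y2) = a, b
    return abs(x2 - x1) + abs(y2 - y1)


# The eight example points and the three initial cluster centers.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
initial_centers = [points["A1"], points["A4"], points["A7"]]

# Distances from A1 to each initial center.
print([manhattan(points["A1"], c) for c in initial_centers])  # [0, 5, 9]
```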

Iteration # 1:

Point A1 = (x1, y1) = (2, 10) and mean1 = (x2, y2) = (2, 10):
d(point, mean1) = |x2 – x1| + |y2 – y1| = |2 – 2| + |10 – 10| = 0 + 0 = 0

Point A1 = (x1, y1) = (2, 10) and mean2 = (x2, y2) = (5, 8):
d(point, mean2) = |x2 – x1| + |y2 – y1| = |5 – 2| + |8 – 10| = 3 + 2 = 5

Point A1 = (x1, y1) = (2, 10) and mean3 = (x2, y2) = (1, 2):
d(point, mean3) = |x2 – x1| + |y2 – y1| = |1 – 2| + |2 – 10| = 1 + 8 = 9

Iteration # 1:

Mean 1 = (2, 10), Mean 2 = (5, 8), Mean 3 = (1, 2)

Point        Dist Mean 1   Dist Mean 2   Dist Mean 3   Cluster
A1 (2, 10)   0             5             9             1
A2 (2, 5)    5             6             4             3
A3 (8, 4)    12            7             9             2
A4 (5, 8)    5             0             10            2
A5 (7, 5)    10            5             9             2
A6 (6, 4)    10            5             7             2
A7 (1, 2)    9             10            0             3
A8 (4, 9)    3             2             10            2
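The table can be reproduced with a short Python sketch: each point is assigned to the mean with the smallest distance. The helper names `manhattan` and `assign` are assumptions introduced here for illustration.

```python
def manhattan(a, b):
    """Manhattan distance between two 2-D points."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])


def assign(points, means):
    """Return {name: (distances, cluster)}; clusters are numbered from 1."""
    table = {}
    for name, p in points.items():
        dists = [manhattan(p, m) for m in means]
        table[name] = (dists, dists.index(min(dists)) + 1)
    return table


points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
means = [(2, 10), (5, 8), (1, 2)]

table = assign(points, means)
for name, (dists, cluster) in table.items():
    # One row per point: name, coordinates, three distances, cluster.
    print(name, points[name], *dists, cluster)
```

If two means were equally close, `dists.index(min(dists))` would pick the lower-numbered cluster. That tie-breaking rule is a choice made in this sketch; no such tie occurs in the table above.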

Iteration # 1: New clusters:

Cluster 1: (2, 10)
Cluster 2: (8, 4) (5, 8) (7, 5) (6, 4) (4, 9)
Cluster 3: (2, 5) (1, 2)

New means:

For Cluster 1, we only have one point, A1(2, 10), which was the old mean, so the cluster center remains the same.
Cluster 2: ( (8+5+7+6+4)/5, (4+8+5+4+9)/5 ) = (6, 6)
Cluster 3: ( (2+1)/2, (5+2)/2 ) = (1.5, 3.5)
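The mean update is a coordinate-wise average of each cluster's members. A minimal Python sketch follows; the names `mean_point`, `clusters` and `new_means` are illustrative and not taken from the slides.

```python
def mean_point(members):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(members)
    return (sum(x for x, _ in members) / n, sum(y for _, y in members) / n)


# Cluster membership after the first assignment step.
clusters = {
    1: [(2, 10)],
    2: [(8, 4), (5, 8), (7, 5), (6, 4), (4, 9)],
    3: [(2, 5), (1, 2)],
}

new_means = {c: mean_point(members) for c, members in clusters.items()}
print(new_means)  # {1: (2.0, 10.0), 2: (6.0, 6.0), 3: (1.5, 3.5)}
```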

After Iteration 1:

After Iteration 2 & 3:
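The figures for these iterations are not available in the text, so the later iterations can only be traced by repeating the assignment and update steps on the same data. The sketch below does that, assuming the same Manhattan distance and initial centers as above; the function name `run_kmeans` and the `max_iter` cap are assumptions of this sketch, and its output is what the sketch computes, not a transcription of the slides.

```python
def manhattan(a, b):
    """Manhattan distance between two 2-D points."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])


def run_kmeans(points, means, max_iter=100):
    """Repeat assign/update until no point changes cluster."""
    names = list(points)
    means = list(means)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest mean for every point.
        new_labels = [
            min(range(len(means)), key=lambda c: manhattan(points[n], means[c]))
            for n in names
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: recompute each mean from its current members.
        for c in range(len(means)):
            members = [points[n] for n, lab in zip(names, labels) if lab == c]
            if members:
                means[c] = (sum(x for x, _ in members) / len(members),
                            sum(y for _, y in members) / len(members))
    # Report clusters numbered from 1, as in the tables above.
    return means, {n: lab + 1 for n, lab in zip(names, labels)}


points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
final_means, final_clusters = run_kmeans(points, [(2, 10), (5, 8), (1, 2)])
print(final_clusters)
print(final_means)
```

Under these assumptions the assignments stop changing after the third update, with A1, A4, A8 in cluster 1, A3, A5, A6 in cluster 2, and A2, A7 in cluster 3.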
