A genetic clustering algorithm for data with non-spherical-shape clusters


Intelligent Database Systems Lab

國立雲林科技大學National Yunlin University of Science and Technology

A genetic clustering algorithm for data with non-spherical-shape clusters

Advisor: Dr. Hsu

Graduate: Sheng-Hsuan Wang

Authors: Lin Yu Tseng, Shiueng Bien Yang

Department of Information Management

Pattern Recognition 33 (2000) 1251-1259


N.Y.U.S.T.

I. M.

Outline

Motivation

Objective

Introduction

The basic concept of genetic strategy

The genetic clustering algorithm

Experiments

Concluding remarks and Summary

Personal opinions

Review


Motivation

Some problems of traditional clustering:

How many clusters are there?

How should the threshold distance d be chosen in neighborhood clustering?

How can non-spherical-shape clusters be handled?


Objective

To solve these problems of traditional clustering algorithms.

To propose a genetic clustering algorithm that handles non-spherical-shape clusters, groups objects according to their similarities, and automatically finds the proper k (the number of clusters).


Introduction

Clustering methods can broadly be classified into two categories:

Hierarchical: agglomerative, divisive

Non-hierarchical: k-means


Introduction

Problems in most of these clustering algorithms:

The number of clusters?

Non-spherical-shape clusters?

The distance threshold for merging?

GA clustering algorithm: a GA is a search method, and clustering can likewise be viewed as a search problem.


Basic concept of Classical Genetic Algorithm

Flowchart:

1. Encoding schemas

2. Fitness evaluation

3. Testing the end of the algorithm: YES, halt; NO, continue

4. Parent selection

5. Crossover operators

6. Mutation operators, then back to fitness evaluation
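The flowchart above can be sketched as a short loop. This is only an illustrative sketch, not the authors' implementation: the function name `run_ga`, the fixed generation budget used as the termination test, roulette-wheel selection, one-point crossover, and the toy fitness `sum` (count of 1 bits) are all assumptions made for the example.

```python
import random

def run_ga(fitness, length, pop_size=20, pc=0.8, pm=0.1, generations=30, seed=0):
    """Classical GA loop over fixed-length bit strings (illustrative sketch).

    Assumes pop_size is even and length >= 2.
    """
    rng = random.Random(seed)
    # Encoding: each individual is a list of 0/1 genes.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):  # termination test: fixed generation budget (assumption)
        # Fitness evaluation; shifting by the minimum keeps every weight positive.
        scores = [fitness(ind) for ind in population]
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        # Parent selection: fitness-proportional (roulette wheel).
        parents = rng.choices(population, weights=weights, k=pop_size)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]
            if rng.random() < pc:  # one-point crossover with probability pc
                cut = rng.randint(1, length - 1)
                a[cut:], b[cut:] = b[cut:], a[cut:]
            children += [a, b]
        # Mutation: flip each bit independently with probability pm.
        population = [[1 - g if rng.random() < pm else g for g in ind] for ind in children]
        # Remember the best string seen so far.
        best = max(population + [best], key=fitness)
    return best

best = run_ga(sum, length=10)
```

Because the random generator is seeded, repeated calls with the same arguments return the same string.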


The genetic clustering algorithm

The algorithm CLUSTERING consists of two stages.

Input: n objects O1, O2, …, On

First stage: nearest-neighbor clustering groups the objects into clusters C1, C2, …, Cm.

Second stage: GA clustering merges these clusters.


First Stage

Step 1: For each object Oi, find its nearest neighbor.

Step 2: Compute d_av, the average of the nearest-neighbor distances.

The meaning of u?


First Stage

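Step 3: Compute the n × n adjacency matrix A:

$$A(i, j) = \begin{cases} 1 & \text{if } \|O_i - O_j\| \le d \\ 0 & \text{otherwise} \end{cases} \qquad \text{where } 1 \le i, j \le n$$

Step 4: Find the connected components of the graph defined by A and denote them C1, C2, …, Cm.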
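Steps 1 to 4 of the first stage can be sketched as follows. This is a minimal sketch under stated assumptions: the slides do not show how the threshold d is obtained, so the code assumes d = u * d_av (consistent with the u values quoted in the experiments); the function name `first_stage` and the depth-first search used to find connected components are also choices made for this example.

```python
import math

def first_stage(points, u=1.5):
    """Group points into connected components of the threshold graph."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Step 1: nearest-neighbor distance of each object.
    nn = [min(dist[i][j] for j in range(n) if j != i) for i in range(n)]
    # Step 2: average nearest-neighbor distance.
    d_av = sum(nn) / n
    d = u * d_av  # assumption: threshold is u times d_av
    # Step 3: adjacency matrix, A(i, j) = 1 if ||Oi - Oj|| <= d.
    adj = [[1 if dist[i][j] <= d else 0 for j in range(n)] for i in range(n)]
    # Step 4: connected components via depth-first search.
    label = [-1] * n
    clusters = []
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = len(clusters)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if adj[i][j] and label[j] == -1:
                    label[j] = label[start]
                    stack.append(j)
        clusters.append(sorted(component))
    return clusters

points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
clusters = first_stage(points, u=1.5)  # [[0, 1, 2], [3, 4]]
```

In this toy input every nearest-neighbor distance is 1, so d = 1.5 and the chain of three points is separated from the pair at x = 10 and x = 11.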


Second Stage

The initialization step: population, coding, Dinter and Dintra

The three phases of the GA: reproduction phase, crossover phase, mutation phase



Second Stage

Compute the m × m distance matrix D, whose entry D(i, j) is the distance between clusters Ci and Cj.
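A distance matrix between clusters could be built as sketched below. The slide does not say how the distance between two clusters is measured, so this example assumes the minimum distance over all pairs of member points (a single-linkage style choice); the function name `cluster_distance_matrix` is likewise hypothetical.

```python
import math

def cluster_distance_matrix(points, clusters):
    """Return the symmetric m x m matrix of cluster-to-cluster distances.

    Assumption: D(i, j) is the smallest point-to-point distance between
    a member of cluster i and a member of cluster j.
    """
    m = len(clusters)
    D = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            d = min(math.dist(points[a], points[b])
                    for a in clusters[i] for b in clusters[j])
            D[i][j] = D[j][i] = d
    return D

points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
D = cluster_distance_matrix(points, [[0, 1, 2], [3, 4]])  # D[0][1] == 8.0
```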


Second Stage

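The initialization step

Population: 50 strings, each of length m, with one bit per cluster in {C1, C2, …, Cm}.

For each string Ri, two sets Ui and U’i are defined: Ui holds the clusters whose bit is 1, and U’i holds the clusters whose bit is 0.

Example with m = 5:

R1 = 1 1 1 0 0: U1={C1, C2, C3} ; U’1={C4, C5}

R2 = 1 0 1 1 0: U2={C1, C3, C4} ; U’2={C2, C5}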
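The coding can be illustrated with a small sketch that builds a random population and splits one string into its two sets. The function names `init_population` and `decode`, the seeded random generator, and the use of uniformly random bits are assumptions for this example; the slides only give the population size and the string length.

```python
import random

def init_population(m, size=50, seed=0):
    """Create `size` random bit strings of length m (one bit per cluster)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(m)] for _ in range(size)]

def decode(bits):
    """Split a bit string into U (bits equal to 1) and U' (bits equal to 0)."""
    u = [f"C{k + 1}" for k, b in enumerate(bits) if b == 1]
    u_prime = [f"C{k + 1}" for k, b in enumerate(bits) if b == 0]
    return u, u_prime

population = init_population(m=5)
u1, u1_prime = decode([1, 1, 1, 0, 0])  # (['C1', 'C2', 'C3'], ['C4', 'C5'])
u2, u2_prime = decode([1, 0, 1, 1, 0])  # (['C1', 'C3', 'C4'], ['C2', 'C5'])
```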


Second Stage

Intra-distance Dintra and the inter-distance Dinter

U1={C1, C2, C3} ; U’1={C4, C5, C7}


Second Stage

Reproduction phase

Fitness function: SCORE(Ri) = Dinter(Ri)*w – Dintra(Ri), with w within [1,3].

Reproduction probability of Ri:

$$\frac{SCORE(R_i)}{\sum_{j=1}^{N} SCORE(R_j)}$$

Crossover phase: pc = 0.8.

Mutation phase: pm = 0.1.

Example strings: R1 = 1 1 1 0 0, R2 = 1 0 1 1 0
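The fitness function and the reproduction probability can be worked through numerically as below. This is a sketch under assumptions: Dinter and Dintra are passed in as plain numbers because their definitions are not given in the slides, all scores are assumed positive so that the probabilities are valid, and one-point crossover is assumed since the slide states only the crossover rate. The function names are hypothetical.

```python
def score(d_inter, d_intra, w=2.0):
    """SCORE(Ri) = Dinter(Ri) * w - Dintra(Ri)."""
    return d_inter * w - d_intra

def reproduction_probabilities(scores):
    """Probability of each string: its score divided by the sum of all scores.

    Assumes every score is positive.
    """
    total = sum(scores)
    return [s / total for s in scores]

def one_point_crossover(r1, r2, cut):
    """Swap the tails of two strings after position `cut` (assumed operator)."""
    return r1[:cut] + r2[cut:], r2[:cut] + r1[cut:]

s = score(d_inter=5.0, d_intra=3.0, w=2.0)        # 7.0
probs = reproduction_probabilities([1.0, 3.0])    # [0.25, 0.75]
c1, c2 = one_point_crossover([1, 1, 1, 0, 0], [1, 0, 1, 1, 0], cut=2)
# c1 == [1, 1, 1, 1, 0], c2 == [1, 0, 1, 0, 0]
```

A larger w rewards separation between the chosen set and the remaining clusters more strongly relative to compactness within the set.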


Merge_Sets_Finding Algorithm

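Step 1: Sort the strings by fitness so that $SCORE(R_1) \ge SCORE(R_2) \ge \dots \ge SCORE(R_N)$. Set $i = 1$ and $U = \emptyset$.

Step 2: Choose Ri and set $U = U \cup U_i$.

Step 3: Choose the smallest l > i such that $U \cap U_l = \emptyset$. IF no such l exists THEN go to Step 4, ELSE set i = l and go to Step 2.

Step 4: End.

Example:

R1={C1, C2, C3}

R2={C3, C4, C6} (discarded, because it shares C3 with R1)

R3={C4, C5} (merge, because it is disjoint from R1)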
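The Merge_Sets_Finding steps amount to a greedy scan over the strings in descending order of fitness, keeping each set that does not overlap the sets already kept. The sketch below illustrates this reading; the function name, the input format (pairs of score and set of cluster indices), and the scores used in the example are assumptions.

```python
def merge_sets_finding(scored_sets):
    """Return the sets to merge, given (score, set_of_cluster_ids) pairs."""
    # Step 1: sort by fitness, best first.
    ranked = sorted(scored_sets, key=lambda pair: pair[0], reverse=True)
    covered = set()   # U: union of the sets chosen so far
    chosen = []
    for _, u_i in ranked:
        # Step 3: accept the next set only if it is disjoint from U.
        if covered.isdisjoint(u_i):
            chosen.append(u_i)   # Step 2: choose Ri
            covered |= u_i       # U = U union Ui
    return chosen

example = [(3.0, {1, 2, 3}), (2.0, {3, 4, 6}), (1.0, {4, 5})]
result = merge_sets_finding(example)  # [{1, 2, 3}, {4, 5}]
```

Scanning once in sorted order is equivalent to repeatedly searching for the smallest l > i, because U only grows and a set that overlaps U once will overlap it from then on.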


Experiments - 1

Noise: distance > 2 d_av

[Figure panels, in slide order: Original; u = 1.2, 8 clusters; 7 clusters; 6 clusters; u = 1.5 or 2, 5 clusters; u = 1.2, w = 2, 4 clusters (best); 3 clusters; 2 clusters; 4 clusters (direct GA); 4 clusters (k-means)]


Experiments - 2

[Figure panels: Original; 4 clusters; 3 clusters; 2 clusters]


Experiments - 3

[Figure panels: Original; 4 clusters]


Concluding remarks and Summary

The genetic clustering algorithm CLUSTERING:

Handles non-spherical-shape clusters.

Clusters automatically.

Uses binary search to find the proper interval for w.


Personal opinions

The proper number of clusters is decided by the value of w.


Review

Using the genetic clustering algorithm (GCA) for automatic clustering. Split: nearest neighbor (NN). Merge: Merge_Sets_Finding algorithm.
