Cluster Analysis

Based on Dunham, Clifton, Ullman, and Tan et al.

Objectives

After finishing this class, students will:

Get an overview of clustering problems

Know and understand algorithms for solving clustering problems

Clustering Problem:

Given a database D = {t1, t2, …, tn} of tuples and an integer value k, the Clustering Problem is to define a mapping f: D → {1, …, k} where each ti is assigned to one cluster Kj, 1 ≤ j ≤ k.

Clustering Problem:

A cluster, Kj, contains precisely those tuples mapped to it
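The definition above can be made concrete with a small sketch. The toy database `D`, the mapping `f`, and the helper `clusters_from_mapping` are hypothetical names chosen for illustration; the mapping is written out by hand here instead of being produced by any algorithm.

```python
from collections import defaultdict

# Hypothetical toy database D of 2-D tuples (illustrative values only).
D = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.1), (4.9, 5.3)]

# The mapping f: index i of tuple t_i -> cluster number j, with 1 <= j <= k (k = 2).
f = {0: 1, 1: 1, 2: 2, 3: 2}

def clusters_from_mapping(database, mapping):
    """Build each cluster K_j as the list of tuples that f maps to j."""
    clusters = defaultdict(list)
    for i, t in enumerate(database):
        clusters[mapping[i]].append(t)
    return dict(clusters)

# K[j] holds precisely the tuples mapped to cluster j.
K = clusters_from_mapping(D, f)
```

Every tuple lands in exactly one cluster, so the clusters together form a partition of D.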

Clustering Problem:

A clustering is a set of clusters

No prior knowledge of:

The number of clusters

The meaning of clusters

Unsupervised learning

Not A Clustering Problem:

Supervised classification

Class label information is available

Simple segmentation

For example, dividing students into different registration groups alphabetically, by last name

Not A Clustering Problem:

Results of a query

Groupings are the result of an external specification

Graph partitioning

Some mutual relevance and synergy, but the areas are not identical

Intermezzo: How many clusters?


Types of Clustering Methods

Partitioning Methods

The simplest and most fundamental methods in cluster analysis

Construct various partitions and then evaluate them by some criterion

Find mutually exclusive clusters of spherical shape

Distance-based

Effective for small and medium-sized datasets
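The slide does not say which criterion is used to evaluate a partition. As one common choice, assumed here for illustration, the sketch below scores a partition by the sum of squared Euclidean distances from each point to its cluster centroid (lower is better). The function names and the two toy partitions are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def sse(clusters):
    """Sum of squared distances from every point to its own cluster centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p in points)
    return total

# Two candidate partitions of the same four points (illustrative values).
partition_a = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
partition_b = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

# The partition with the smaller score groups nearby points together.
better = min([partition_a, partition_b], key=sse)
```

Under this criterion `partition_a` wins, because it pairs the points that are 2 units apart and not the ones that are 10 units apart.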

Types of Clustering Methods

Hierarchical Methods

Create a hierarchical decomposition (i.e., multiple levels) of the set of data (or objects) using some criterion

The bottom-up approach starts with each object forming a separate group, then merges the objects or groups closest to one another until all the groups are merged into one or a termination condition holds.
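The bottom-up procedure can be sketched as follows. The slide does not specify how the distance between two groups is measured; this sketch assumes single linkage (the distance between the closest pair of members), and uses a hypothetical `stop_at` parameter, the desired number of remaining groups, as the termination condition.

```python
import math

def single_link(a, b):
    """Assumed group distance: the closest pair of points, one from each group."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerate(points, stop_at=1):
    """Merge the two closest groups repeatedly until stop_at groups remain."""
    groups = [[p] for p in points]  # each object starts as its own group
    merges = []                     # record of merged pairs, in order
    while len(groups) > stop_at:
        # Find the indices of the two closest groups.
        i, j = min(
            ((i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))),
            key=lambda ij: single_link(groups[ij[0]], groups[ij[1]]),
        )
        merges.append((groups[i], groups[j]))
        merged = groups[i] + groups[j]
        groups = [g for idx, g in enumerate(groups) if idx not in (i, j)] + [merged]
    return groups, merges

points = [(0, 0), (0, 1), (5, 5), (5, 6)]
groups, merges = agglomerate(points, stop_at=2)
```

The `merges` list records the hierarchy: reading it in order gives the levels of the decomposition from the bottom up.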

Types of Clustering Methods

Density-Based Methods

Based on connectivity and density functions

Able to find arbitrarily shaped clusters

Clusters are the dense regions of objects in space that are separated by low-density regions

Types of Clustering Methods

Density-Based Methods

Cluster density requires each point to have a minimum number of points within its “neighborhood”

May filter out outliers
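The density requirement can be illustrated with a minimal check. This is only the neighborhood test, not a complete density-based clustering algorithm. The radius `eps`, the threshold `min_pts`, and the convention that a point counts as its own neighbor are assumptions made for this sketch.

```python
import math

def neighbors(points, p, eps):
    """All points within distance eps of p (p itself included, by assumption)."""
    return [q for q in points if math.dist(p, q) <= eps]

def split_dense_and_sparse(points, eps, min_pts):
    """Separate points that meet the density requirement from those that do not."""
    dense, sparse = [], []
    for p in points:
        if len(neighbors(points, p, eps)) >= min_pts:
            dense.append(p)
        else:
            sparse.append(p)  # candidate outliers
    return dense, sparse

points = [(0, 0), (0, 1), (1, 0), (10, 10)]
dense, sparse = split_dense_and_sparse(points, eps=1.5, min_pts=3)
```

Here the isolated point `(10, 10)` fails the test, which is how a density-based method can filter out outliers.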

Types of Clustering Methods

Grid-Based Methods

Quantize the object space into a finite number of cells that form a grid structure

Fast processing time
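Quantization can be sketched by mapping each point to the integer coordinates of the cell that contains it. The uniform `cell_size` and the function names are assumptions for illustration; the slide does not prescribe a particular grid layout.

```python
import math
from collections import Counter

def cell_of(point, cell_size):
    """Integer grid coordinates of the cell containing the point."""
    return tuple(math.floor(c / cell_size) for c in point)

def grid_counts(points, cell_size):
    """Number of points that fall into each occupied cell."""
    return Counter(cell_of(p, cell_size) for p in points)

points = [(0.2, 0.3), (0.9, 0.1), (2.5, 2.5), (-0.1, 0.0)]
counts = grid_counts(points, cell_size=1.0)
```

Later steps can then work on the per-cell counts, whose number depends on the grid and not on the number of points, which is one reason such methods can be fast.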

k-Means Algorithm

Assumes a Euclidean space

Start by picking k, the number of clusters

Initialize clusters by picking one point per cluster

Example: pick one point at random, then k - 1 other points, each as far away as possible from the previous points
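The initialization example can be sketched as below. "As far away as possible from the previous points" is interpreted here as maximizing the distance to the nearest already-chosen point; that reading, the function name, and the `rng` parameter are assumptions for this sketch.

```python
import math
import random

def pick_initial_points(points, k, rng=random):
    """Pick one random point, then k - 1 more, each farthest from those chosen."""
    if k > len(set(points)):
        raise ValueError("k exceeds the number of distinct points")
    chosen = [rng.choice(points)]
    while len(chosen) < k:
        candidates = [p for p in points if p not in chosen]
        # Farthest candidate, measured by distance to its nearest chosen point.
        chosen.append(
            max(candidates, key=lambda p: min(math.dist(p, c) for c in chosen))
        )
    return chosen

points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 8)]
seeds = pick_initial_points(points, 3, random.Random(0))
```

Passing a seeded `random.Random` makes the first pick reproducible; the remaining picks are deterministic once the first point is fixed.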

k-Means Algorithm: Populating Clusters

For each point, place it in the cluster whose current centroid is nearest to it

After all points are assigned, recompute the centroids of the k clusters

Optional: reassign all points to their closest centroid

This sometimes moves points between clusters
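The steps above can be sketched as a loop that assigns points and then recomputes centroids. Repeating the two steps until no point changes cluster, the `max_rounds` cap, and keeping the old centroid for an empty cluster are assumptions of this sketch, not details given on the slide.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, centroids, max_rounds=100):
    """Alternate assignment and centroid updates until assignments stop changing."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_rounds):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # no point moved between clusters
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return labels, centroids

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels, centroids = k_means(points, [(0.0, 0.0), (10.0, 0.0)])
```

The initial centroids are passed in explicitly, so any initialization scheme can be combined with this loop.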


References

M. H. Dunham, Data Mining: Introductory and Advanced Topics, Prentice Hall, 2002

J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques, Elsevier, 2012

P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Pearson International, 2005

Dr. Ir. Muhammad Ikhwan Jambak, MEng