A k-means type clustering algorithm for subspace clustering of mixed numeric and categorical datasets. Presenter: Keng-Yu Lin. Authors: Amir Ahmad, Lipika Dey. PRL, 2011.
Intelligent Database Systems Lab
National Yunlin University of Science and Technology
A k-means type clustering algorithm for subspace clustering of mixed numeric and categorical datasets
Presenter: Keng-Yu Lin
Authors: Amir Ahmad, Lipika Dey
PRL, 2011
N.Y.U.S.T.
I. M.
Outline
· Motivation
· Objectives
· Methodology
· Experiments
· Conclusions
· Comments
Motivation
· Almost all subspace clustering algorithms proposed so far are designed for numeric datasets.
Objectives
· This paper presents a k-means type clustering algorithm that finds clusters in data subspaces of mixed numeric and categorical datasets.
Methodology
· k-means clustering algorithm
1. Place K points into the space represented by the objects that are being clustered. These points represent initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the K centroids.
4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.
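The four steps above can be sketched in a few lines of Python. This is a minimal illustration of plain k-means on numeric data only, not the subspace algorithm for mixed data proposed in the paper. The function name `kmeans`, its parameters, and the toy points are assumptions made for this example, and the initial centroids are simply K objects drawn at random from the data.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on numeric points (lists of floats). Illustrative only."""
    rng = random.Random(seed)
    # Step 1: pick K distinct objects as the initial centroids (an assumption;
    # the slide only says to place K points in the object space).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2: assign each object to the group with the closest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 3: recalculate each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    [sum(coord) / len(members) for coord in zip(*members)]
                )
            else:
                # An empty group keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Toy data (hypothetical): two well-separated groups of numeric objects.
points = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0],
          [10.0, 10.0], [11.0, 11.0], [12.0, 12.0]]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The Euclidean distance used in Step 2 only makes sense for numeric attributes, which is why a different distance is needed once categorical attributes are mixed in.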
Methodology
Experiments
· Vote dataset
error rate: 4.8%
Zaki et al. error rate: 3.8%
Experiments
· Mushroom datasets
error rate: 4.1%
Zaki et al. error rate: 0.3%
Experiments
· DNA datasets
error rate: 17%
Experiments
· Australian credit data
error rate: 13.9%
Huang et al. (2005) error rate: 15%
Conclusions
· This paper presented a clustering algorithm for subspace clustering of mixed numeric and categorical data.
Comments
· Advantage
· Applications: Subspace clustering.