Intelligent Database Systems Lab
國立雲林科技大學 National Yunlin University of Science and Technology

A k-means type clustering algorithm for subspace clustering of mixed numeric and categorical datasets

Presenter: Keng-Yu Lin
Authors: Amir Ahmad, Lipika Dey
PRL, 2011



Outline
· Motivation
· Objectives
· Methodology
· Experiments
· Conclusions
· Comments


Motivation
· Almost all subspace clustering algorithms proposed so far are designed for numeric datasets.


Objectives
· This paper presents a k-means type clustering algorithm that finds clusters in data subspaces in mixed numeric and categorical datasets.


Methodology
· k-means clustering algorithm

1. Place K points into the space represented by the objects that are being clustered. These points represent the initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the K centroids.
4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.
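The four steps above can be sketched in plain Python. This is a minimal illustration of standard k-means on numeric data only, not the subspace algorithm for mixed data that the paper proposes. The function names (`squared_distance`, `kmeans`), the squared Euclidean distance, the random choice of initial centroids among the objects, and the `max_iter` and `seed` parameters are assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two numeric points (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on numeric points; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Step 1: choose K of the objects as the initial group centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = []
    for _ in range(max_iter):
        # Step 2: assign each object to the group with the closest centroid.
        assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 3: recalculate each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                new_centroids.append(
                    [sum(coord) / len(members) for coord in zip(*members)]
                )
            else:
                # An empty group keeps its previous centroid (an assumption).
                new_centroids.append(centroids[j])
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignments


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = kmeans(data, k=2)
    print(centers, labels)
```

The metric to be minimized in Step 4 corresponds here to the sum of squared distances from each object to its assigned centroid; a different distance would be needed for categorical attributes.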


Experiments
· Vote dataset
  · Error rate: 4.8%
  · Zaki et al. error rate: 3.8%


· Mushroom dataset
  · Error rate: 4.1%
  · Zaki et al. error rate: 0.3%


· DNA dataset
  · Error rate: 17%


· Australian credit data
  · Error rate: 13.9%
  · Huang et al. (2005) error rate: 15%
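The slides do not say how the error rates reported above are computed. One common convention, assumed here purely for illustration, maps each cluster to its majority class and counts every object outside that class as an error. The function name `clustering_error_rate` and the toy labels are hypothetical.

```python
from collections import Counter


def clustering_error_rate(cluster_labels, class_labels):
    """Fraction of objects not in the majority class of their cluster.

    This majority-class definition is an assumption; the slides do not
    state which definition the paper uses.
    """
    counts_by_cluster = {}
    for cluster, cls in zip(cluster_labels, class_labels):
        counts_by_cluster.setdefault(cluster, Counter())[cls] += 1
    # Objects in the majority class of their cluster count as correct.
    correct = sum(max(c.values()) for c in counts_by_cluster.values())
    return 1 - correct / len(class_labels)


if __name__ == "__main__":
    clusters = [0, 0, 0, 1, 1, 1]
    classes = ["a", "a", "b", "b", "b", "b"]
    print(clustering_error_rate(clusters, classes))
```

Under this convention a lower value means the clusters agree more closely with the known class labels, which is how the comparisons with Zaki et al. and Huang et al. (2005) are read here.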


Conclusions
· This paper presented a k-means type algorithm for subspace clustering of mixed numeric and categorical data.


Comments
· Advantage
· Applications: subspace clustering.
