Intelligent Database Systems Lab, N.Y.U.S.T. I.M. Local-density based spatial clustering algorithm with noise. Presenter: Lin, Shu-Han. Authors: Lian Duan, Lida Xu, Feng Guo, Jun Lee, Baoping Yan. Information Systems 32 (2007).

Intelligent Database Systems Lab

N.Y.U.S.T.I. M.

local-density based spatial clustering algorithm with noise

Presenter : Lin, Shu-Han

Authors : Lian Duan, Lida Xu, Feng Guo, Jun Lee, Baoping Yan

Information Systems 32 (2007)

Outline

Motivation

Objectives

Methodology

Experiments

Conclusions

Comments

Motivation

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering method.

It uses global density parameters to characterize the whole dataset.

Clustering

DBSCAN

DBSCAN is a density-based algorithm.

Density = number of points within a specified radius (Eps).

A point is a core point if it has more than a specified number of points (MinPts) within Eps. These are points in the interior of a cluster.

A border point has fewer than MinPts points within Eps, but is in the neighborhood of a core point.

A noise point is any point that is neither a core point nor a border point.
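The three point types can be illustrated with a minimal Python sketch. It is not the presenter's code: the function name `classify_points` is hypothetical, and it assumes that a point counts as its own neighbor and that "enough points" means at least `min_pts` (the slide says "more than"; conventions differ).

```python
from math import dist


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise'.

    Assumptions (not from the slides): the Eps-neighborhood includes the
    point itself, and a core point has at least min_pts points in it.
    """
    # Indices of all points within distance eps of each point.
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nbrs in enumerate(neighbors) if len(nbrs) >= min_pts}

    labels = []
    for i, nbrs in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nbrs):
            # Not dense enough itself, but inside a core point's neighborhood.
            labels.append("border")
        else:
            labels.append("noise")
    return labels


# Toy data: a small dense group, one point on its edge, one far away.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (10, 10)]
toy_labels = classify_points(toy_points, eps=1.5, min_pts=4)
```

This brute-force version compares every pair of points, so it is only meant for small examples.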

DBSCAN: Core, Border and Noise Points

[Figure: two panels, "Original Points" and "Point types: core, border and noise"; Eps = 10, MinPts = 4]

Objectives

Replace the global density parameters: Eps, MinPts

Methodology – Overview

Core point: a point p whose local outlier factor LOF(p) is small enough

LOF: the degree to which the object is outlying

LRD: the local reachability density of the object

Local-density reachability

Methodology – LDBSCAN

Local-density reachable

LRD: the local reachability density of the object

reach-dist_k(p, o) = max{k-distance(o), d(p, o)}

Ex: LRD(p)/LRD(q) = 1.28
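The reachability distance above is the building block of LRD. The sketch below assumes the standard LOF definitions, which the slides do not spell out: k-distance(o) is the distance from o to its k-th nearest other point, the k-neighborhood is every other point within that distance, and LRD(p) is the inverse of the mean reach-dist from p to its k-neighbors. All function names are hypothetical.

```python
from math import dist


def k_distance(points, i, k):
    """Distance from points[i] to its k-th nearest other point."""
    others = sorted(dist(points[i], q) for j, q in enumerate(points) if j != i)
    return others[k - 1]


def k_neighbors(points, i, k):
    """Indices of all other points within k-distance of points[i]."""
    kd = k_distance(points, i, k)
    return [j for j, q in enumerate(points)
            if j != i and dist(points[i], q) <= kd]


def reach_dist(points, p, o, k):
    """reach-dist_k(p, o) = max{k-distance(o), d(p, o)}."""
    return max(k_distance(points, o, k), dist(points[p], points[o]))


def lrd(points, p, k):
    """Local reachability density: 1 / mean reach-dist to the k-neighbors.

    Assumes no duplicate points, otherwise the mean could be zero.
    """
    nbrs = k_neighbors(points, p, k)
    mean_reach = sum(reach_dist(points, p, o, k) for o in nbrs) / len(nbrs)
    return 1.0 / mean_reach


# Toy data: three evenly spaced points and one far away.
line_points = [(0, 0), (1, 0), (2, 0), (10, 0)]
```

On this toy data the isolated point at (10, 0) gets a much lower LRD than the three clustered points, which is the property the local-density comparison relies on.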

Methodology – LDBSCAN

LOF: the degree to which the object is outlying
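The formula for LOF did not survive on this slide. A minimal sketch under the standard definition, in which LOF(p) is the mean ratio of the neighbors' LRD to p's own LRD, could look as follows. The function names, the `lofub` threshold argument, and the toy values are illustrative assumptions, not taken from the slides.

```python
def lof(p, neighbors, lrd):
    """LOF(p): mean of LRD(o) / LRD(p) over the neighbors o of p.

    neighbors maps a point id to the ids of its k-neighbors;
    lrd maps a point id to its precomputed local reachability density.
    """
    nbrs = neighbors[p]
    return sum(lrd[o] / lrd[p] for o in nbrs) / len(nbrs)


def is_core(p, neighbors, lrd, lofub):
    """Assumed core test: LOF(p) does not exceed the upper bound lofub."""
    return lof(p, neighbors, lrd) <= lofub


# Hypothetical toy values: points 0-2 sit in a dense region, point 3 is isolated.
toy_neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [1, 2]}
toy_lrd = {0: 2 / 3, 1: 0.5, 2: 2 / 3, 3: 2 / 17}
```

A LOF close to 1 means a point is about as dense as its neighbors; the isolated point 3 has a LOF well above 1, so it would fail the core test for any small `lofub`.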

Experiments – parameter

LOFUB

MinPts

Experiments – parameter

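Local-density reachable: pct

LRD(q) = 0.8, LRD(p) = 1

Condition: LRD(q)/(1 + pct) < LRD(p) < LRD(q) × (1 + pct)

pct = 0.2: 0.8/1.2 < 1, but 1 is not less than 0.8 × 1.2 = 0.96, so q is not local-density reachable

pct = 0.5: 0.8/1.5 < 1 and 1 < 0.8 × 1.5 = 1.2, so q is local-density reachable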
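The two worked cases on this slide can be checked with a short sketch. The function name `density_similar` is hypothetical, and the code tests only the LRD ratio condition; any further requirements of local-density reachability (such as p being a core point and q lying in its neighborhood) are left out.

```python
def density_similar(lrd_p, lrd_q, pct):
    """True if LRD(q)/(1 + pct) < LRD(p) < LRD(q) * (1 + pct)."""
    return lrd_q / (1 + pct) < lrd_p < lrd_q * (1 + pct)


# The slide's two cases, with LRD(p) = 1 and LRD(q) = 0.8.
case_pct_20 = density_similar(1.0, 0.8, 0.2)  # upper bound is 0.96, so the test fails
case_pct_50 = density_similar(1.0, 0.8, 0.5)  # upper bound is 1.2, so the test passes
```

A larger pct tolerates a bigger density difference between p and q, so clusters merge more readily; a smaller pct splits regions of different density apart.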

Experiments – compare with OPTICS

OPTICS: Ordering Points To Identify the Clustering Structure

Experiments – compare with OPTICS

The idea of LOF

Conclusions

Global density parameter vs. different local densities

LDBSCAN: local-density-based

Comments

Advantage: improves on ideas from other approaches

Drawback: it is still hard to set the parameters

Real data is not a 2-D problem

Application: not suitable for SOM