Recent Trends in Fuzzy Clustering: From Data to Knowledge

Shenyang, August 2009

Witold Pedrycz

Department of Electrical & Computer Engineering, University of Alberta, Edmonton, Canada, and Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Agenda

Introduction: clustering, information granulation and paradigm shift

Key challenges in clustering

Fuzzy objective-based clustering

Knowledge-based augmentation of fuzzy clustering

Collaborative fuzzy clustering

Concluding comments

Clustering

Areas of research and applications:

• Data analysis
• Modeling
• Structure determination

Google Scholar: 2,190,000 hits for “clustering” (as of August 6, 2009)

Clustering as a conceptual and algorithmic framework of information granulation

Data → information granules (clusters) → abstraction of data

Formalisms: set theory (K-Means), fuzzy sets (FCM), rough sets, shadowed sets

Main categories of clustering

Graph-oriented and hierarchical (single linkage, complete linkage, average linkage, ...)

Objective function-based clustering

Diversity of formalisms and optimization tools (e.g., methods of Evolutionary Computing)

Key challenges of clustering

Data-driven methods

Selection of distance function (geometry of clusters)

Number of clusters

Quality of clustering results

The dichotomy and the shift of paradigm

Fuzzy Clustering: Fuzzy C-Means (FCM)

Given data x1, x2, …, xN, determine its structure by forming a collection of information granules (fuzzy sets)

Objective function

\[ Q = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{m} \, \| x_k - v_i \|^{2} \]

Minimize Q; structure in data (partition matrix and prototypes)

Fuzzy Clustering: Fuzzy C-Means (FCM)

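vi: prototypes

U: partition matrix

FCM optimization

Minimize

\[ Q = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{m} \, \| x_k - v_i \|^{2} \]

subject to

\[ \sum_{i=1}^{c} u_{ik} = 1, \qquad k = 1, 2, \ldots, N \]

with respect to

(a) prototypes

(b) partition matrix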

Optimization - details

Partition matrix – the use of Lagrange multipliers

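\[ V = \sum_{i=1}^{c} u_{ik}^{m} d_{ik}^{2} + \lambda \left( \sum_{i=1}^{c} u_{ik} - 1 \right) \]

\[ \frac{\partial V}{\partial u_{st}} = 0, \qquad \frac{\partial V}{\partial \lambda} = 0 \]

where \( d_{ik}^{2} = \| x_k - v_i \|^{2} \) and \( \lambda \) is the Lagrange multiplier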

Optimization – partition matrix (1)

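\[ V = \sum_{i=1}^{c} u_{ik}^{m} d_{ik}^{2} + \lambda \left( \sum_{i=1}^{c} u_{ik} - 1 \right), \qquad \frac{\partial V}{\partial u_{st}} = 0, \qquad \frac{\partial V}{\partial \lambda} = 0 \]

\[ \frac{\partial V}{\partial u_{st}} = m \, u_{st}^{m-1} d_{st}^{2} + \lambda = 0 \]

\[ u_{st} = \left( -\frac{\lambda}{m} \right)^{\frac{1}{m-1}} \frac{1}{d_{st}^{\frac{2}{m-1}}} \]

The normalization condition \( \sum_{j=1}^{c} u_{jt} = 1 \) gives

\[ \left( -\frac{\lambda}{m} \right)^{\frac{1}{m-1}} \sum_{j=1}^{c} \frac{1}{d_{jt}^{\frac{2}{m-1}}} = 1 \]

\[ \left( -\frac{\lambda}{m} \right)^{\frac{1}{m-1}} = \frac{1}{\sum_{j=1}^{c} \dfrac{1}{d_{jt}^{\frac{2}{m-1}}}} \]

\[ u_{st} = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{d_{st}^{2}}{d_{jt}^{2}} \right)^{\frac{1}{m-1}}} \]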
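As a quick numerical check of the closed-form update \( u_{st} = 1 / \sum_{j=1}^{c} (d_{st}^{2}/d_{jt}^{2})^{1/(m-1)} \), here is a small worked example. The values c = 2, m = 2 and the squared distances \( d_{1t}^{2} = 1 \), \( d_{2t}^{2} = 4 \) are made up for illustration and do not come from the slides.

```latex
\[
u_{1t} = \frac{1}{\dfrac{d_{1t}^{2}}{d_{1t}^{2}} + \dfrac{d_{1t}^{2}}{d_{2t}^{2}}}
       = \frac{1}{1 + \tfrac{1}{4}} = 0.8,
\qquad
u_{2t} = \frac{1}{\dfrac{d_{2t}^{2}}{d_{1t}^{2}} + \dfrac{d_{2t}^{2}}{d_{2t}^{2}}}
       = \frac{1}{4 + 1} = 0.2
\]
```

The two grades sum to 1, as the constraint requires, and the closer prototype receives the higher membership.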

Optimization- prototypes (2)

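\[ Q = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{m} \sum_{j=1}^{n} (x_{kj} - v_{ij})^{2} \]

Gradient of Q with respect to vs (Euclidean distance):

\[ \sum_{k=1}^{N} u_{sk}^{m} (x_{kt} - v_{st}) = 0, \qquad t = 1, 2, \ldots, n \]

\[ v_{st} = \frac{\sum_{k=1}^{N} u_{sk}^{m} \, x_{kt}}{\sum_{k=1}^{N} u_{sk}^{m}} \]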

Fuzzy C-Means (FCM): An overview

procedure FCM-CLUSTERING (x) returns prototypes and partition matrix

    input: data x = {x1, x2, ..., xN}

    local: fuzzification parameter: m

           threshold: ε

           norm: ||.||

    INITIALIZE-PARTITION-MATRIX

    t ← 0

    repeat

        for i = 1:c do

            compute prototypes

            \[ v_i(t) = \frac{\sum_{k=1}^{N} u_{ik}^{m}(t) \, x_k}{\sum_{k=1}^{N} u_{ik}^{m}(t)} \]

        for i = 1:c do

            for k = 1:N do

                update partition matrix

                \[ u_{ik}(t+1) = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{\| x_k - v_i(t) \|}{\| x_k - v_j(t) \|} \right)^{2/(m-1)}} \]

        t ← t + 1

    until ||U(t) - U(t-1)|| ≤ ε

    return U, V
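The procedure above can be sketched in plain Python as follows. This is a minimal illustration, not the author's implementation: the function name `fcm`, the random initialization, the `max_iter` cap, the use of the largest absolute change in U as the norm, and the handling of a datum that coincides with a prototype are all assumptions made here.

```python
import random

def fcm(data, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Fuzzy C-Means sketch. data is a list of equal-length lists of floats.
    Returns (U, V): U[i][k] is the membership of data[k] in cluster i,
    V[i] is the i-th prototype."""
    rng = random.Random(seed)
    N, n = len(data), len(data[0])
    # Random partition matrix whose columns sum to 1 (assumed initialization).
    U = [[rng.random() + 1e-9 for _ in range(N)] for _ in range(c)]
    for k in range(N):
        s = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= s
    V = []
    for _ in range(max_iter):
        # Prototypes: weighted means of the data with weights u_ik^m.
        V = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(N)]
            total = sum(w)
            V.append([sum(w[k] * data[k][j] for k in range(N)) / total
                      for j in range(n)])
        # Partition matrix from squared Euclidean distances.
        U_new = [[0.0] * N for _ in range(c)]
        for k in range(N):
            d2 = [sum((data[k][j] - V[i][j]) ** 2 for j in range(n))
                  for i in range(c)]
            zeros = [i for i in range(c) if d2[i] == 0.0]
            for i in range(c):
                if zeros:
                    # Datum sits on a prototype: split membership among those.
                    U_new[i][k] = 1.0 / len(zeros) if i in zeros else 0.0
                else:
                    # (d_i^2 / d_j^2)^(1/(m-1)) equals (d_i / d_j)^(2/(m-1)).
                    U_new[i][k] = 1.0 / sum(
                        (d2[i] / d2[j]) ** (1.0 / (m - 1.0)) for j in range(c))
        change = max(abs(U_new[i][k] - U[i][k])
                     for i in range(c) for k in range(N))
        U = U_new
        if change <= eps:
            break
    return U, V

# Two well-separated groups of made-up 2-D points.
points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.0],
          [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
U, V = fcm(points, c=2)
```

Each column of `U` sums to 1, and reading off the largest entry per column gives a crisp assignment if one is needed.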

Geometry of information granules

[Figure: information granules obtained for m = 1.2, m = 2.0, and m = 3.5 (n = 1)]

Domain knowledge: categories of knowledge-oriented guidance

Partially labeled data: some data are provided with labels (classes)

Proximity knowledge: some pairs of data are quantified in terms of their proximity (closeness)

Viewpoints: some structural information is provided

Context-based guidance: clustering realized in a certain context specified with regard to some attribute

Clustering with domain knowledge

(Knowledge-based clustering)

[Figure: data-driven clustering (Data → CLUSTERING → information granules (structure)) compared with data- and knowledge-driven clustering (Data and domain knowledge → CLUSTERING → information granules (structure))]

Context-based clustering

To align the agenda of fuzzy clustering with the principles of fuzzy modeling, the following features are considered:

Active role of the designer [customization of the model]

The structural backbone of the model is fully reflective of relationships between information granules in the input and output space

Clustering : construct clusters in input space X

Context-based Clustering : construct clusters in input space X given some context expressed in output space Y

Context-based clustering: Computing considerations

• computationally more efficient
• well-focused
• designer-guided clustering process

[Figure: Data → structure, compared with Data and context → structure]

Context-based clustering: Context design

Context: a hint (piece of domain knowledge) provided by the designer, who actively impacts the development of the model. As such, the context is imposed by the designer at the beginning.

Realization of context

Designer focus information granule (fuzzy set)

(a) Designer, and (b) clustering of scalar data in output space

Context – fuzzy set (set) formed in the output space

Context-based clustering: Modeling

Determine structure in input space given the output is high

Determine structure in input space given the output is medium

Determine structure in input space given the output is low

Input space (data)

Context-based clustering: Examples

No context: find a structure of customer data [clustering]

Context: find a structure of customer data considering customers making weekly purchases in the range [$1,000, $3,000]

Context: find a structure of customer data considering customers making weekly purchases at the level of around $2,500

Context (compound): find a structure of customer data considering customers making significant weekly purchases who are young

Context-oriented FCM

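Data (xk, targetk), k = 1, 2, …, N

Contexts: fuzzy sets W1, W2, …, Wp

wjk = Wj(targetk): membership of the k-th datum in the j-th context

Context-driven partition matrix:

\[ U(W_j) = \left\{ u_{ik} \in [0,1] \;\middle|\; \sum_{i=1}^{c} u_{ik} = w_{jk} \ \ \forall k \ \ \text{and} \ \ 0 < \sum_{k=1}^{N} u_{ik} < N \ \ \forall i \right\} \]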

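Context-oriented FCM: Optimization flow

Objective function

\[ Q = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{m} \, \| x_k - v_i \|^{2} \]

subject to the constraint U ∈ U(Wj)

Iterative adjustment of partition matrix and prototypes

\[ u_{ik} = \frac{w_{jk}}{\sum_{l=1}^{c} \left( \dfrac{\| x_k - v_i \|}{\| x_k - v_l \|} \right)^{\frac{2}{m-1}}} \]

\[ v_i = \frac{\sum_{k=1}^{N} u_{ik}^{m} \, x_k}{\sum_{k=1}^{N} u_{ik}^{m}} \]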
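The only change relative to standard FCM is that the memberships of the k-th datum sum to its context level wjk rather than to 1. A minimal sketch of that partition update for a single datum follows; the function names, the triangular shape of the context, and the numeric values are illustrative assumptions, not part of the slides.

```python
def context_memberships(x, prototypes, w, m=2.0):
    """Memberships of one datum x in all clusters under context level w.
    The returned grades sum to w (0 <= w <= 1) instead of 1."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in prototypes]
    if any(d == 0.0 for d in d2):
        # x coincides with one or more prototypes: split w among them.
        hits = [d == 0.0 for d in d2]
        return [w / sum(hits) if h else 0.0 for h in hits]
    c = len(d2)
    # (d_i^2 / d_l^2)^(1/(m-1)) equals (d_i / d_l)^(2/(m-1)).
    return [w / sum((d2[i] / d2[l]) ** (1.0 / (m - 1.0)) for l in range(c))
            for i in range(c)]

def triangular(a, b, c):
    """Triangular fuzzy set with support (a, c) and modal value b
    (a hypothetical shape for a context such as "around $2,500")."""
    def mu(y):
        if y <= a or y >= c:
            return 0.0
        return (y - a) / (b - a) if y <= b else (c - y) / (c - b)
    return mu

context = triangular(1000.0, 2500.0, 4000.0)
u = context_memberships([1.0, 2.0], [[0.0, 0.0], [3.0, 3.0]], context(2500.0))
```

A datum whose target lies outside the support of the context gets w = 0 and therefore drops out of the prototype computation, which is where the computational savings of the method come from.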

Viewpoints: definition

Description of an entity (concept) that is deemed essential in describing a phenomenon (system) and helpful in casting an overall analysis in a required setting

“external” , “reinforced” clusters

[Figure: data in the (x1, x2) plane with two kinds of viewpoints: viewpoint (a, b), with both coordinates specified, and viewpoint (a, ?), with only the x1 coordinate specified]

Viewpoints in fuzzy clustering

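[Figure: data in the (x1, x2) plane with viewpoints a (on x1) and b (on x2)]

\[ b_{ij} = \begin{cases} 1, & \text{if the } j\text{-th feature of the } i\text{-th row of } B \text{ is determined by the viewpoint} \\ 0, & \text{otherwise} \end{cases} \]

\[ B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad F = \begin{bmatrix} a & 0 \\ 0 & b \\ 0 & 0 \end{bmatrix} \]

B: Boolean matrix characterizing structure: viewpoints and prototypes (induced by data)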

Viewpoints in fuzzy clustering

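\[ Q = \sum_{k=1}^{N} \sum_{i=1}^{c} u_{ik}^{m} \sum_{\substack{j=1 \\ i,j:\, b_{ij}=0}}^{n} (x_{kj} - v_{ij})^{2} + \sum_{k=1}^{N} \sum_{i=1}^{c} u_{ik}^{m} \sum_{\substack{j=1 \\ i,j:\, b_{ij}=1}}^{n} (x_{kj} - f_{ij})^{2} \]

\[ g_{ij} = \begin{cases} v_{ij}, & \text{if } b_{ij} = 0 \\ f_{ij}, & \text{if } b_{ij} = 1 \end{cases} \]

\[ Q = \sum_{k=1}^{N} \sum_{i=1}^{c} u_{ik}^{m} \sum_{j=1}^{n} (x_{kj} - g_{ij})^{2} \]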
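A short sketch of how the matrices B and F enter the computation: coordinates with bij = 1 are pinned to fij, and only the remaining coordinates are re-estimated from the data. The weighted-mean update for the free coordinates follows from setting the gradient of Q to zero; the function names and the toy values are assumptions made for illustration.

```python
def effective_prototypes(B, F, V):
    """g_ij = f_ij where b_ij = 1 (viewpoint), v_ij where b_ij = 0."""
    return [[F[i][j] if B[i][j] == 1 else V[i][j] for j in range(len(B[i]))]
            for i in range(len(B))]

def update_prototypes(data, U, B, F, m=2.0):
    """Recompute only the data-driven coordinates (b_ij = 0) as weighted
    means; viewpoint coordinates keep their values from F.
    Assumes every cluster has some nonzero membership."""
    c, n, N = len(B), len(B[0]), len(data)
    V = [[0.0] * n for _ in range(c)]
    for i in range(c):
        w = [U[i][k] ** m for k in range(N)]
        total = sum(w)
        for j in range(n):
            if B[i][j] == 0:
                V[i][j] = sum(w[k] * data[k][j] for k in range(N)) / total
    return effective_prototypes(B, F, V)

def objective(data, U, G, m=2.0):
    """Q = sum_k sum_i u_ik^m sum_j (x_kj - g_ij)^2."""
    return sum(U[i][k] ** m * sum((x - g) ** 2 for x, g in zip(data[k], G[i]))
               for i in range(len(G)) for k in range(len(data)))

# Toy setting: the first coordinate of prototype 0 is fixed at 5.0.
data = [[0.0, 0.0], [2.0, 2.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
B = [[1, 0], [0, 0]]
F = [[5.0, 0.0], [0.0, 0.0]]
G = update_prototypes(data, U, B, F)
Q = objective(data, U, G)
```

In the toy setting the viewpoint drags the first prototype away from its data, which shows up as a nonzero Q even though the memberships are crisp.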

Labelled data and their description

Characterization in terms of membership degrees:

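F = [fik], i = 1, 2, …, c, k = 1, 2, …, N

and supervision indicator b = [bk], k = 1, 2, …, N

Augmented objective function

\[ Q = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{2} \, \| x_k - v_i \|^{2} + \beta \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ik} - f_{ik})^{2} \, b_k \, \| x_k - v_i \|^{2}, \qquad \beta > 0 \]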
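To make the role of the supervision term concrete, the following sketch evaluates the augmented objective, with m = 2 in the first term and the labelled-data penalty weighted by β, for given U, V, F, and b. The function name and the toy numbers are assumptions for illustration.

```python
def augmented_objective(data, U, V, F, b, beta):
    """Q = sum_i sum_k u_ik^2 d_ik^2
         + beta * sum_i sum_k (u_ik - f_ik)^2 * b_k * d_ik^2,
    where d_ik^2 is the squared Euclidean distance between x_k and v_i."""
    q = 0.0
    for i, v in enumerate(V):
        for k, x in enumerate(data):
            d2 = sum((a - p) ** 2 for a, p in zip(x, v))
            q += U[i][k] ** 2 * d2
            q += beta * (U[i][k] - F[i][k]) ** 2 * b[k] * d2
    return q

# Two 1-D data points; only the first one is labelled (b = [1, 0]),
# and its label says it belongs fully to cluster 0.
data = [[0.0], [1.0]]
V = [[0.0], [1.0]]
U = [[0.5, 0.5], [0.5, 0.5]]
F = [[1.0, 0.0], [0.0, 0.0]]
b = [1, 0]
q_supervised = augmented_objective(data, U, V, F, b, beta=2.0)
q_plain = augmented_objective(data, U, V, F, b, beta=0.0)
```

With β = 0 the expression reduces to the standard FCM objective for m = 2; increasing β penalizes memberships of labelled data that disagree with F.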

Proximity hints

Characterization in terms of proximity degrees:

Prox(k, l), k, l=1,2, …., N

and supervision indicator matrix B = [bkl], k, l=1,2,…, N

[Figure: proximity hints attached to selected pairs of data, e.g., Prox(k, l) and Prox(s, t)]

Proximity measure

Properties of proximity:

(a) Prox(k, k) = 1

(b) Prox(k, l) = Prox(l, k)

Proximity induced by partition matrix U:

\[ \mathrm{Prox}(k, l) = \sum_{i=1}^{c} \min(u_{ik}, u_{il}) \]
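The induced proximity is straightforward to compute from a partition matrix. A minimal sketch, with a made-up 2 × 3 partition matrix and a hypothetical function name:

```python
def induced_proximity(U):
    """Prox(k, l) = sum_i min(u_ik, u_il) for a partition matrix U[i][k]."""
    c, N = len(U), len(U[0])
    return [[sum(min(U[i][k], U[i][l]) for i in range(c)) for l in range(N)]
            for k in range(N)]

# Columns sum to 1, so Prox(k, k) = 1 holds.
U = [[0.8, 0.3, 0.8],
     [0.2, 0.7, 0.2]]
P = induced_proximity(U)
```

Property (a) relies on the columns of U summing to 1; property (b) follows from the symmetry of min.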

Augmented objective function

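\[ Q = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{2} \, \| x_k - v_i \|^{2} + \beta \sum_{i=1}^{c} \sum_{k_1=1}^{N} \sum_{k_2=1}^{N} \left[ \mathrm{Prox}(k_1, k_2) - \mathrm{Prox}(U)(k_1, k_2) \right]^{2} b(k_1, k_2) \, \| x_{k_1} - x_{k_2} \|^{2}, \qquad \beta > 0 \]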

Two general development strategies

SELECTION OF A “MEANINGFUL” SUBSET OF INFORMATION GRANULES

Two general development strategies

(1) HIERARCHICAL DEVELOPMENT OF INFORMATION GRANULES (INFORMATION GRANULES OF HIGHER TYPE)

Information granules: type-1

Information granules: type-2

Two general development strategies

(2) HIERARCHICAL DEVELOPMENT OF INFORMATION GRANULES AND THE USE OF VIEWPOINTS

Information granules: type-1

Information granules: type-2

Viewpoints

Two general development strategies

(3) HIERARCHICAL DEVELOPMENT OF INFORMATION GRANULES – A MODE OF SUCCESSIVE CONSTRUCTION

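Information granules and their representatives

Represent vk[ii] with the use of z1, z2, …, zc

\[ u_i(v_k[ii]) = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{\| v_k[ii] - z_i \|_{F_{ii} \cap F}}{\| v_k[ii] - z_j \|_{F_{ii} \cap F}} \right)^{2/(m-1)}} \]

[Figure: v1[ii] among the representatives z1, z2, …, zc; labels Fii and F]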

Representation of fuzzy sets: two performance measures

Entropy measure

Reconstruction criterion (error)

Expressing performance through entropy measure

\[ \sum_{ii=1}^{p} \sum_{i=1}^{c} \sum_{k=1}^{c[ii]} H\big( u_i(v_k[ii]) \big) \]

Reconstruction error

\[ Q = \sum_{ii=1}^{p} \sum_{k=1}^{c[ii]} \left\| \hat{v}(v_k[ii]) - v_k[ii] \right\|_{F_{ii}}^{2} \]

where

\[ \hat{v}(v_k[ii]) = \frac{\sum_{i=1}^{c} u_i^{m}(v_k[ii]) \, z_i}{\sum_{i=1}^{c} u_i^{m}(v_k[ii])} \]
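The reconstruction criterion can be sketched as follows: each prototype v is granulated into memberships with respect to z1, …, zc and then decoded back as a membership-weighted mean. For simplicity the sketch assumes that all prototypes share one feature space, so the restriction of the norm to a subset of features is ignored; the function names and numbers are illustrative.

```python
def memberships(v, Z, m=2.0):
    """FCM-style membership of v in each representative z_i."""
    d2 = [sum((a - b) ** 2 for a, b in zip(v, z)) for z in Z]
    if 0.0 in d2:
        # v coincides with one or more representatives.
        hits = [d == 0.0 for d in d2]
        return [1.0 / sum(hits) if h else 0.0 for h in hits]
    c = len(Z)
    return [1.0 / sum((d2[i] / d2[j]) ** (1.0 / (m - 1.0)) for j in range(c))
            for i in range(c)]

def reconstruct(v, Z, m=2.0):
    """v_hat = sum_i u_i^m z_i / sum_i u_i^m."""
    w = [u ** m for u in memberships(v, Z, m)]
    total = sum(w)
    return [sum(w[i] * Z[i][j] for i in range(len(Z))) / total
            for j in range(len(v))]

def reconstruction_error(prototypes, Z, m=2.0):
    """Sum of squared distances between prototypes and their reconstructions."""
    return sum(sum((a - b) ** 2 for a, b in zip(reconstruct(v, Z, m), v))
               for v in prototypes)

Z = [[0.0], [10.0]]
v_hat = reconstruct([2.0], Z)
```

A prototype that coincides with one of the representatives is reconstructed exactly, so the error measures how well Z covers the local prototypes.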

Requirement of “coverage” condition

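[Formula not recoverable from the source: a condition relating the feature sets F taken over ii = 1, …, p and over k = 1, …, c]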

Optimization problem

Form a collection of prototypes Z = {z1, z2, …, zc} such that the entropy (or reconstruction error) is minimized while satisfying the coverage criterion:

\[ \min_{Z} Q \quad \text{subject to the coverage condition} \]

Optimization of fuzzification coefficient (m)

\[ \min_{Z} Q \quad \text{subject to } m > 1 \text{ and the coverage condition} \]

Collaborative structure development (2)

[Figure: a phenomenon, process, or system observed through data-1, data-2, …, data-P; information granules are formed for each data set and then combined into information granules of higher type]

Collaborative structure determination: Information granules of higher order

[Figure: the prototypes obtained for D[1], D[2], …, D[P] are clustered to form prototypes of higher order]

Determining correspondence between clusters (3)

[Figure: clustering of the prototypes yields the higher-order prototype zj]

Select the prototypes in D[1], D[2], …, D[p] associated with zj with the highest degree of membership

Determining correspondence between clusters (4)

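[Figure: prototype vi[ii] of data set D[ii] and the higher-order prototype zj]

\[ \lambda_{ij}[ii] = \frac{1}{\sum_{k=1}^{c[ii]} \left( \dfrac{\| v_i[ii] - z_j \|}{\| v_k[ii] - z_j \|} \right)^{2}} \]

\[ \lambda_{i_0 j}[ii] = \max_{i = 1, 2, \ldots, c[ii]} \lambda_{ij}[ii] \]

Prototype i0 associated with prototype zj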
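The matching step can be sketched as below: for a given higher-order prototype zj, compute the association degree of every local prototype of one data set and pick the index with the largest degree. The function names and the sample prototypes are assumptions for illustration; indices are 0-based in the code.

```python
def association_degrees(local_prototypes, z):
    """lambda_i = 1 / sum_k (||v_i - z|| / ||v_k - z||)^2 over the local
    prototypes v_1, ..., v_c[ii] of one data set."""
    d2 = [sum((a - b) ** 2 for a, b in zip(v, z)) for v in local_prototypes]
    if 0.0 in d2:
        # z coincides with one or more local prototypes.
        hits = [d == 0.0 for d in d2]
        return [1.0 / sum(hits) if h else 0.0 for h in hits]
    n = len(d2)
    return [1.0 / sum(d2[i] / d2[k] for k in range(n)) for i in range(n)]

def best_match(local_prototypes, z):
    """Index i0 of the local prototype with the highest association to z."""
    lam = association_degrees(local_prototypes, z)
    return max(range(len(lam)), key=lam.__getitem__)

local = [[0.0, 0.0], [4.0, 0.0], [10.0, 0.0]]
z = [1.0, 0.0]
lam = association_degrees(local, z)
i0 = best_match(local, z)
```

Repeating the selection for each of D[1], …, D[p] yields one associated prototype per data set.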

Family of associated prototypes

Prototype i1 in D[1] associated with prototype zj

Prototype i2 in D[2] associated with prototype zj

Prototype ip in D[p] associated with prototype zj

\[ i_1, i_2, \ldots, i_p: \qquad v_{i_1}[1], \; v_{i_2}[2], \; \ldots, \; v_{i_p}[P] \]

From numeric prototypes to granular prototypes

\[ i_1, i_2, \ldots, i_p: \qquad v_{i_1}[1], \; v_{i_2}[2], \; \ldots, \; v_{i_p}[P] \]

Individual coordinate of the associated prototypes: a1, a2, …, ap, with membership grades μ1, μ2, …, μp

Information granule: R → [0, 1]

The principle of justifiable granularity: Interval representation

[Figure: values a1, a2, …, ap with membership grades μ1, μ2, …, μp on a [0, 1] scale, an interval [b, d], and a point a0]

If ai ∈ [b, d], then elevate its membership grade to 1; required change: 1 − μi

If ai ∉ [b, d], then reduce its membership grade to 0; required change: μi

The principle of justifiable granularity: Optimization criterion

\[ \min_{b, d \in \mathbb{R}:\, b \le d} \left\{ \sum_{a_i \in [b,d]} (1 - \mu_i) + \sum_{a_i \notin [b,d]} \mu_i \right\} \]
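Because the criterion depends only on which of the values ai fall inside [b, d], a brute-force search over intervals whose endpoints are data values is enough for small p. The sketch below takes that approach, which is an assumption of this example rather than the method of the slides; it also does not consider an interval that contains none of the values.

```python
def justifiable_interval(a, mu):
    """Return (cost, b, d) minimizing
    sum_{a_i in [b, d]} (1 - mu_i) + sum_{a_i not in [b, d]} mu_i
    over intervals whose endpoints are taken from the values in a."""
    best = None
    values = sorted(set(a))
    for idx, b in enumerate(values):
        for d in values[idx:]:
            cost = sum((1.0 - m) if b <= x <= d else m
                       for x, m in zip(a, mu))
            if best is None or cost < best[0]:
                best = (cost, b, d)
    return best

# Made-up coordinates of associated prototypes and their membership grades.
a = [1.0, 2.0, 3.0, 10.0]
mu = [0.9, 0.8, 0.7, 0.2]
cost, b, d = justifiable_interval(a, mu)
```

In this toy case the outlying value with a low grade is left outside the interval, since including it would cost 0.8 while excluding it costs only 0.2.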

Hyperbox prototypes

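[Figure: hyperboxes Hi and Hj]

Hi, Hj, with i and j running over the number of clusters at the aggregation level [relation between Hi and Hj not recoverable from the source]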

Interval-valued fuzzy sets and granular prototypes

[Figure: a point x and the hyperbox prototypes Hi and Hj; a point x and a granular prototype vi]

Bounds of distances determined coordinate-wise:

\[ \| x - v_i \|_{\min}, \qquad \| x - v_i \|_{\max} \]

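Interval-valued fuzzy sets: Membership function

Upper bound:

\[ u_i^{+}(x) = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{\| x - v_i \|_{\min}}{\| x - v_j \|_{\max}} \right)^{\frac{2}{m-1}}} \]

Lower bound:

\[ u_i^{-}(x) = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{\| x - v_i \|_{\max}}{\| x - v_j \|_{\min}} \right)^{\frac{2}{m-1}}} \]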

Collaborative structure determination: Structure refinement

Feedback and structure refinement

[Figure: information granules of higher type, formed from the information granules of data-1, data-2, …, data-P, are fed back to refine the local structures]

Collaborative structure determination: Structure refinement

Iterate

    Clustering at the local level

    Sharing findings and clustering at the higher (global) level

    Assessment of quality of clusters in light of the global structure, γi(U)[ii], formed at the higher level

    Refinement of clustering

Until termination criterion satisfied

\[ Q[ii] = \sum_{i=1}^{c[ii]} \; \sum_{x_k \in X[ii]} \gamma_i(U)[ii] \, \| x_k - v_i[ii] \|^{2} \]

Concluding comments

Paradigm shift from data-based clustering to knowledge-based clustering

Accommodation of knowledge in augmented objective functions

Emergence of type-2 (higher type) information granules when working with collaborative clustering