Cluster labeling fcl_weeklymeeting30102013

svm@arch.ethz.ch

SEC

Cluster Labeling with Double Application of SOM

Vahid Moosavi

Researcher at Future Cities Laboratory

PhD Student at the Chair for Computer Aided Architectural Design (CAAD) of Ludger Hovestadt, ETH Zurich

1 November 2013

Outline

• How to explain clustering and clusters?

• Cluster labeling problem

• Current methods and the proposed method

• The Case: Finding Thematic Research Areas within FCL

How to explain clustering and clusters?

• Conceptually, clusters are

– Meant to show temporal and evolving identities

– Meant to show emergent concepts arising from lower-level concepts

– Built bottom-up from instances vs. taken from external references

– A way of decoupling identities from objects, things and instances, and of creating new dimensions in between

• Brands

• Spoken languages and dialects

• Academic disciplines

• Genres of movies, music classes, …

• And technically, clusters

– Show the directions of the eigenvectors of the data matrix

– Are the result of transformations, which can be linear or nonlinear

• With new phenomena like Big Data and information sharing, externally defined (sometimes imposed!) references are no longer sufficient.

– Academic disciplines

– Software industries: Google Android vs. Microsoft Windows XP!

The goal: How to make the process of clustering and concept generation computationally practical for individuals?
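
The eigenvector remark on this slide can be illustrated with a small sketch. It is not taken from the talk: the toy points, the function name `principal_axis` and the restriction to two dimensions are assumptions made for illustration. For two tight groups of points, the leading eigenvector of the covariance matrix points along the line that joins the groups, so the sign of each point's projection on that axis already separates them.

```python
import math

def principal_axis(points):
    """Unit eigenvector of the 2x2 covariance matrix with the largest eigenvalue."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed form for a symmetric 2x2 matrix: angle of the major axis.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    return (mx, my), (math.cos(theta), math.sin(theta))

# Hypothetical toy data: two tight groups around (0, 0) and (3, 6).
offsets = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
group_a = [(dx, dy) for dx, dy in offsets]
group_b = [(3.0 + dx, 6.0 + dy) for dx, dy in offsets]
points = group_a + group_b

mean, axis = principal_axis(points)
# Signed coordinate of each point along the principal axis.
projections = [(x - mean[0]) * axis[0] + (y - mean[1]) * axis[1] for x, y in points]
print(axis, projections)
```

This covers only the linear case. A SOM, as used later in the deck, plays the role of a nonlinear transformation.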

Cluster labeling problem

• Topic Modeling in Natural Language Processing

– Document clustering

– Automatic sentiment analysis

• Market Segmentation and Customer Clustering

– CRM data

– City call center data (Mood of the City)

• Enterprise Knowledge Modeling Using Text Archives

Topic Modeling in Natural Language Processing

The Expression of Emotions in 20th Century Books

Alberto Acerbi et al., 2013

Clustering and Cluster Labeling in Terms of Geography (Andre Skupin, 2005)

Clustering and Cluster Labeling

A Semantic Landscape of the Last.fm Music

Current Methods

• Differential Cluster Labeling

– Mutual Information

– Chi-Squared Selection

• Cluster-Internal Labeling

– Centroid Labels

– Title Labels

– External knowledge labels
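
As an illustration of differential cluster labeling, the sketch below scores each term by the mutual information between "term occurs in the document" and "document belongs to the cluster". The toy documents, the term names and both function names are hypothetical. Mutual information also rewards terms that are characteristically absent from a cluster, so the sketch only considers terms that are more frequent inside the cluster than outside it; that filter is an assumption, not something stated on the slide.

```python
import math

def mutual_information(docs, clusters, term, cluster_id):
    """Mutual information (in bits) between term presence and cluster membership."""
    n = len(docs)
    mi = 0.0
    for has_term in (True, False):
        for in_cluster in (True, False):
            joint = sum(1 for d, c in zip(docs, clusters)
                        if (term in d) == has_term and (c == cluster_id) == in_cluster) / n
            p_term = sum(1 for d in docs if (term in d) == has_term) / n
            p_cluster = sum(1 for c in clusters if (c == cluster_id) == in_cluster) / n
            if joint > 0:
                mi += joint * math.log2(joint / (p_term * p_cluster))
    return mi

def label_cluster(docs, clusters, cluster_id, top_k=2):
    """Return the top_k terms that best distinguish one cluster from the rest."""
    vocab = sorted(set().union(*docs))
    inside = [d for d, c in zip(docs, clusters) if c == cluster_id]
    outside = [d for d, c in zip(docs, clusters) if c != cluster_id]
    scored = []
    for term in vocab:
        rate_in = sum(term in d for d in inside) / len(inside)
        rate_out = sum(term in d for d in outside) / len(outside) if outside else 0.0
        if rate_in > rate_out:  # keep positively associated terms only
            scored.append((mutual_information(docs, clusters, term, cluster_id), term))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [term for _, term in scored[:top_k]]

# Hypothetical documents, each reduced to a set of terms, with given cluster ids.
docs = [
    {"urban", "transport", "mobility"},
    {"transport", "mobility", "simulation"},
    {"urban", "energy", "building"},
    {"energy", "building", "simulation"},
]
clusters = [0, 0, 1, 1]

labels_0 = label_cluster(docs, clusters, 0)
labels_1 = label_cluster(docs, clusters, 1)
print(labels_0, labels_1)
```

Terms such as "urban" and "simulation" occur equally often in both clusters, so they carry no information about membership and are not proposed as labels.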

The Proposed Method

• Use of SOM as a nonlinear data clustering (transformation) and visualization technique

• Use of the concept of tensors to produce the data matrices required by SOM
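
For readers unfamiliar with SOM, a minimal sketch of the training loop follows. It is a simplification and not the implementation used in the talk: the map is a one-dimensional chain of nodes rather than a two-dimensional grid, the weights are initialised linearly along the diagonal of the data's bounding box, and samples are presented in a fixed order. The names `bmu` and `train_som` and the toy data are assumptions.

```python
import math

def bmu(weights, x):
    """Index of the best-matching unit (closest weight vector) for sample x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))

def train_som(data, n_nodes=4, epochs=50, lr0=0.5, sigma0=1.0):
    """Train a 1-D SOM (a chain of n_nodes) and return its weight vectors."""
    dim = len(data[0])
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]
    # Assumption: deterministic linear initialisation along the bounding-box diagonal.
    weights = [[lo[d] + (hi[d] - lo[d]) * i / (n_nodes - 1) for d in range(dim)]
               for i in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.1)  # neighbourhood radius shrinks
        for x in data:
            winner = bmu(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood: nodes near the winner move the most.
                h = math.exp(-((i - winner) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

# Hypothetical toy data: two groups of points near (0, 0) and (1, 1).
data = [(0.0, 0.0), (0.1, 0.1), (0.0, 0.1), (1.0, 1.0), (0.9, 0.9), (1.0, 0.9)]
weights = train_som(data)
print(bmu(weights, (0.0, 0.0)), bmu(weights, (1.0, 1.0)))
```

After training, the two groups map to different nodes of the chain, which is the sense in which the SOM acts as a clustering and visualization step.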

Tensors

(multi-aspect data representation)

[Figure: two matrices with rows "Objects" and columns "Aspect A Features"]

• Wavelet Decomposition

• One original object (one signal) is decomposed into several aspects (different scales or frequencies)
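
The wavelet example can be sketched with an unnormalised Haar transform, chosen here only because it is the simplest wavelet; the slide does not say which wavelet is meant. The signal values and function names are hypothetical. Each level yields one list of detail coefficients, and the final approximation is kept as the coarsest aspect.

```python
def haar_step(signal):
    """One Haar level: pairwise averages (approximation) and half-differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Decompose one signal into several aspects, from finest detail to coarsest average."""
    aspects = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        aspects.append(detail)
    aspects.append(current)  # remaining low-frequency approximation
    return aspects

# Hypothetical signal of length 8 (a power of two, so three levels are possible).
signal = [4, 6, 10, 12, 8, 6, 5, 5]
aspects = haar_decompose(signal, levels=3)
print(aspects)
```

Doing this for every object gives one Objects × Features matrix per scale, which is the multi-aspect representation named on the slide.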

SOM (as a nonlinear data transformation: here used for clustering and visualization)

SOM is a generic machine that normally works with matrices of data:

• 10 records, 100+ dimensions

• 200+ records, 100+ dimensions

• 200+ records, 100+ dimensions, but with clear clusters

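[Diagram of the proposed method]

• X: original data set (rows: Objects, columns: Features)

• SOM clustering of X gives Y, the Clusters Vector (one entry per Object)

• Tensor step: X (Objects × Features) and Y (Objects × Clusters Vector) are combined, as a second-order tensor, into Z (rows: Features, columns: Clusters Vector)

• SOM on Z: visualization of the main concepts (potential labels) within each cluster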
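
The slides do not spell out how Z is computed from X and Y. The sketch below assumes the simplest reading: Z[f][c] is the mean value of feature f over the objects assigned to cluster c. The function names, feature names and numbers are hypothetical, and the final ranking step stands in for the second SOM, which the talk uses for visualization instead.

```python
def cluster_feature_matrix(X, Y, n_clusters):
    """Build Z (Features x Clusters): mean of each feature within each cluster."""
    n_features = len(X[0])
    Z = [[0.0] * n_clusters for _ in range(n_features)]
    sizes = [0] * n_clusters
    for row, c in zip(X, Y):
        sizes[c] += 1
        for f, value in enumerate(row):
            Z[f][c] += value
    for f in range(n_features):
        for c in range(n_clusters):
            if sizes[c]:
                Z[f][c] /= sizes[c]
    return Z

def top_features(Z, feature_names, cluster, k=1):
    """Candidate labels: the k features with the highest mean in the given cluster."""
    order = sorted(range(len(Z)), key=lambda f: -Z[f][cluster])
    return [feature_names[f] for f in order[:k]]

# Hypothetical data: 4 objects x 3 features, and a cluster id per object
# (in the proposed method Y would come from the first SOM).
feature_names = ["transport", "energy", "simulation"]
X = [
    [1.0, 0.0, 0.5],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.0, 0.5, 0.5],
]
Y = [0, 0, 1, 1]

Z = cluster_feature_matrix(X, Y, n_clusters=2)
print(Z, top_features(Z, feature_names, 0), top_features(Z, feature_names, 1))
```

Each row of Z describes one feature by its profile across clusters, so Z has the matrix shape a second SOM would expect as input.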

The Case: Finding Thematic Research Areas within FCL

Each row vector shows one person's interests in relation to the selected features

[Figure: data matrix X (rows: Objects, columns: Features)]

First plot (each curve is one person)

SOM

SOM + K-means clustering

5 clusters detected
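
One common way to combine the two steps is to run K-means on the trained SOM weight vectors (the codebook) and let each object inherit the cluster of its best-matching unit. Whether the talk does exactly this is an assumption. The sketch below shows only the K-means part on a hypothetical codebook, with a deterministic farthest-point initialisation in place of the usual random seeding.

```python
def kmeans(points, k, iterations=20):
    """Plain K-means; returns (labels, centers)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Assumption: farthest-point initialisation, so the result is reproducible.
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(dist2(p, c) for c in centers))))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical SOM codebook: six node weight vectors forming two obvious groups.
codebook = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(codebook, k=2)
print(labels)
```

K-means needs the number of clusters in advance; the case study reports five, and the toy example uses two.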

Now… What are the main concepts within each cluster? How should these clusters be labeled?

[Figure: Y (rows: Objects, column: Clusters Vector)]

Tensor-based transformation

A simple visualization of each cluster with respect to all the features

[Diagram: Y (Objects × Clusters Vector) and X (Objects × Features) are combined into Z (rows: Features, columns: Clusters Vector)]

Tensor-based transformation + another SOM

Thanks!
