Cluster Labeling with Double Application of SOM
Vahid Moosavi
Researcher at Future Cities Laboratory
PhD Student at Chair for Computer Aided Architectural Design (CAAD), ETH Zurich, Ludger Hovestadt
1 November 2013
Outline
• How to explain clustering and clusters?
• Cluster labeling problem
• Current methods and the proposed method
• The Case: Finding Thematic Research Areas within FCL
How to explain clustering and clusters?
• Conceptually, clusters are
– Meant to show temporal and evolving identities
– Meant to show emergent concepts from lower level concepts
– Built bottom up from instances vs. from external references
– A way of decoupling identities from objects, things and instances, and of creating new dimensions in between
• Brands
• Spoken languages and dialects
• Academic disciplines
• Genres of movies, music classes, …
How to explain clustering and clusters?
• And technically, clusters
– Show the direction of the eigenvectors of the data matrix
– Are the result of transformations, which can be linear or nonlinear
• With new phenomena like Big Data and information sharing, externally defined (sometimes imposed!) references are not sufficient any more.
– Academic disciplines
– Software industries: Google Android vs. Microsoft Windows XP!
The goal: How to make the process of clustering and concept generation computationally practical for individuals?
Cluster labeling problem
• Topic Modeling in Natural Language Processing
– Document Clustering
– Automatic Sentiment Analysis
• Market Segmentation and Customer Clustering
– CRM data
– City Call Center Data (Mood of the City)
• Enterprise Knowledge Modeling Using Text Archives
Topic Modeling in Natural Language Processing
The Expression of Emotions in 20th Century Books
Alberto Acerbi et al., 2013
Clustering and Cluster Labeling in Terms of Geography: Andre Skupin (2005)
Clustering and Cluster Labeling
A Semantic Landscape of the Last.fm Music
Current Methods
• Differential Cluster Labeling
– Mutual Information
– Chi-Squared Selection
• Cluster-Internal Labeling
– Centroid Labels
– Title Labels
– External knowledge labels
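The difference between the two families can be illustrated with a small sketch. It is not taken from the slides: the toy corpus, the function names and the use of term presence (rather than weighted counts) are assumptions made for illustration. Centroid-style labels pick the terms that are most frequent inside a cluster, while mutual information picks the terms whose presence best separates the cluster from the rest.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (cluster id, terms of the document).
docs = [
    (0, ["energy", "building", "city"]),
    (0, ["energy", "simulation", "city"]),
    (1, ["transport", "mobility", "city"]),
    (1, ["transport", "simulation", "city"]),
]

def centroid_labels(docs, cluster, top=2):
    """Cluster-internal labeling: the most frequent terms inside the cluster."""
    counts = Counter(t for c, terms in docs if c == cluster
                     for t in sorted(set(terms)))
    # Sort by descending count, then alphabetically, to keep ties deterministic.
    return sorted(counts, key=lambda t: (-counts[t], t))[:top]

def mutual_information(docs, term, cluster):
    """Mutual information (in bits) between term presence and cluster membership."""
    n = len(docs)
    mi = 0.0
    for has_term in (True, False):
        for in_cluster in (True, False):
            joint = sum(1 for c, terms in docs
                        if (term in terms) == has_term
                        and (c == cluster) == in_cluster)
            n_term = sum(1 for _, terms in docs if (term in terms) == has_term)
            n_clu = sum(1 for c, _ in docs if (c == cluster) == in_cluster)
            if joint:  # empty cells contribute nothing
                mi += joint / n * math.log2(joint * n / (n_term * n_clu))
    return mi

def differential_labels(docs, cluster, top=1):
    """Differential labeling: terms of the cluster ranked by mutual information."""
    vocab = sorted({t for c, terms in docs if c == cluster for t in terms})
    return sorted(vocab,
                  key=lambda t: (-mutual_information(docs, t, cluster), t))[:top]

print(centroid_labels(docs, 0))      # frequent terms, including shared ones
print(differential_labels(docs, 0))  # terms that discriminate the cluster
```

In this toy corpus "city" appears in every document, so it ranks high as a centroid label but carries no mutual information about any cluster.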
The Proposed Method
• Use of SOM as a nonlinear Data-Clustering (transformation) and visualization technique
• Use of the concept of tensor to produce required data matrices for SOM
The Proposed Method
Tensors
(multi-aspect data representation)
(Figure: a stack of Objects × Features matrices, one per aspect, e.g. Aspect A Features)
• Wavelet Decomposition
• One original object (one signal) is decomposed into several aspects (different scales or frequencies)
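As a minimal sketch of how one signal becomes several aspects, the following uses an unnormalized Haar transform (pairwise averages and half-differences). The slides do not say which wavelet is used, so the Haar choice, the function names and the sample signal are assumptions; the signal length is assumed to be divisible by 2 to the power of `levels`.

```python
def haar_step(signal):
    """One Haar step: pairwise averages (coarse part) and half-differences (detail)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return approx, detail

def haar_aspects(signal, levels):
    """Decompose one signal into `levels` detail aspects plus one coarse aspect."""
    aspects = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        aspects.append(detail)   # finer scales come first
    aspects.append(current)      # what remains is the coarsest scale
    return aspects

# Hypothetical signal of length 8, decomposed into three aspects.
signal = [4.0, 2.0, 5.0, 7.0, 1.0, 1.0, 8.0, 0.0]
aspects = haar_aspects(signal, levels=2)
for scale, values in enumerate(aspects):
    print(scale, values)
```

Applying the same decomposition to every object gives one Objects × Features matrix per scale, which is the multi-aspect (tensor) representation described above.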
The Proposed Method
SOM (as a nonlinear data transformation: here used for clustering and visualization)
SOM is a generic machine that normally works with matrices of data
10 records, 100+ dimensions
200+ records, 100+ dimensions
200+ records, 100+ dimensions, but with clear clusters
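For readers unfamiliar with the technique, here is a minimal sketch of online SOM training on a data matrix. It is not the implementation behind the slides: the grid size, the linear decay of the learning rate and neighbourhood width, and the names `train_som` and `bmu` are all assumptions chosen to keep the example short.

```python
import math
import random

def bmu(weights, x):
    """Index of the best matching unit: the node whose weights are closest to x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - xi) ** 2 for w, xi in zip(weights[i], x)))

def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns grid coordinates and the codebook."""
    rng = random.Random(seed)
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise each node's weight vector from a randomly chosen data row.
    weights = [list(rng.choice(data)) for _ in nodes]
    sigma0 = max(rows, cols) / 2
    steps, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):    # shuffled pass over the data
            frac = t / steps
            lr = lr0 * (1 - frac)                # learning rate decays to 0
            sigma = sigma0 * (1 - frac) + 0.01   # neighbourhood shrinks
            br, bc = nodes[bmu(weights, x)]
            for i, (r, c) in enumerate(nodes):
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian neighbourhood
                weights[i] = [w + lr * h * (xi - w)
                              for w, xi in zip(weights[i], x)]
            t += 1
    return nodes, weights

# Hypothetical data matrix: six records forming two well separated groups.
data = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
        [9.9, 10.0], [10.0, 9.8], [10.1, 10.0]]
nodes, weights = train_som(data, rows=3, cols=3)
print([nodes[bmu(weights, x)] for x in data])  # grid position of each record
```

Each record is mapped to a grid position, so records that are similar in the high-dimensional space end up on the same or neighbouring nodes.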
The Proposed Method
• X: original data set (Objects × Features)
• SOM clustering of X gives Y: clusters vector (Objects × Clusters)
• Tensor: X (Objects × Features) and Y (Objects × Clusters vector) are combined into a second-order tensor Z (Features × Clusters vector)
• SOM on Z: visualization of the main concepts (potential labels) within each cluster
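The slides do not give the formula for Z, so the following is only one plausible reading: Y is taken as a one-hot Objects × Clusters matrix and Z is computed as the product of the transpose of X with Y, so that each entry of Z sums one feature over the objects of one cluster. The data, feature names and helper names are hypothetical.

```python
def one_hot(labels, n_clusters):
    """Y: Objects x Clusters membership matrix built from cluster labels."""
    return [[1.0 if k == lab else 0.0 for k in range(n_clusters)]
            for lab in labels]

def transpose_times(X, Y):
    """Z = X^T Y, i.e. Z[f][k] = sum over objects i of X[i][f] * Y[i][k]."""
    n_obj, n_feat, n_clu = len(X), len(X[0]), len(Y[0])
    return [[sum(X[i][f] * Y[i][k] for i in range(n_obj))
             for k in range(n_clu)]
            for f in range(n_feat)]

# Hypothetical example: 4 objects, 3 features, 2 clusters from a first SOM.
features = ["energy", "transport", "city"]
X = [[3, 0, 1],
     [2, 1, 1],
     [0, 4, 1],
     [1, 3, 1]]
labels = [0, 0, 1, 1]

Y = one_hot(labels, n_clusters=2)
Z = transpose_times(X, Y)          # Features x Clusters
for name, row in zip(features, Z):
    print(name, row)
```

Each row of Z describes one feature by its weight in every cluster. Under this reading, the rows of Z are the input records of the second SOM, which places features with similar cluster profiles close together on the map.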
The Case: Finding Thematic Research Areas within FCL
Finding Thematic Research Areas within FCL
Each row vector shows one person's interest related to the selected features
(Figure: data matrix X, Objects × Features)
Finding Thematic Research Areas within FCL
First plot (each curve is one person)
Finding Thematic Research Areas within FCL
SOM
Finding Thematic Research Areas within FCL
SOM + K-means clustering
5 clusters detected
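A common way to combine the two steps, sketched here under assumptions, is to run K-means on the SOM codebook vectors and then give every object the cluster of its best matching unit. The codebook, the BMU indices and the farthest-point initialisation below are hypothetical, and the toy example uses 3 clusters rather than the 5 found in the case study.

```python
def dist2(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Plain K-means with a deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical codebook of a trained SOM (one weight vector per node).
codebook = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
            [5.0, 5.1], [5.2, 4.9],
            [9.8, 10.0], [10.0, 9.9]]
node_cluster, _ = kmeans(codebook, k=3)

# Hypothetical BMU index of each object; objects inherit their node's cluster.
bmu_of_object = [0, 2, 3, 6]
object_cluster = [node_cluster[b] for b in bmu_of_object]
print(node_cluster, object_cluster)
```

Clustering the codebook instead of the raw records keeps the K-means step small and ties the clusters to regions of the map.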
Now… What are the main concepts within each cluster? How to label these clusters?
(Figure: clusters vector Y, Objects × Clusters)
Finding Thematic Research Areas within FCL
Tensor based transformation
A simple visualization of each cluster with regard to all the features
(Figure: Y (Objects × Clusters vector) and X (Objects × Features) are combined into Z (Features × Clusters vector))
Finding Thematic Research Areas within FCL
Tensor based transformation + another SOM
Thanks!