
TreeSOM: Cluster analysis in the self-organizing map

Neural Networks 19 (2006) 935-949 2006 Special Issue

Reporter: 張欽隆 D9515012

What is a SOM?

• Self-Organizing Map (Kohonen, 1987)

• Mapping high-dimensional data onto a low-dimensional grid

• Similar data elements are placed close together

SOM

• A SOM consists of neurons organized on a regular, low-dimensional grid

• Each neuron has a weight vector

• During training, neighboring neurons on the grid acquire similar weight vectors (they are pulled closer together).

• As a result, similar data elements are placed close together on the map (see the training sketch below).
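As a rough illustration of this training loop, here is a minimal NumPy sketch; the grid size, learning-rate schedule, and Gaussian neighborhood are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def train_som(data, grid_h=10, grid_w=10, epochs=20, lr0=0.5, sigma0=3.0):
    """Train a toy SOM; returns a (grid_h, grid_w, dim) weight array."""
    rng = np.random.default_rng(0)
    weights = rng.random((grid_h, grid_w, data.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                                  indexing="ij"), axis=-1)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)              # learning rate decays over time
        sigma = sigma0 * (1.0 - frac) + 0.5  # neighborhood radius shrinks
        for x in data[rng.permutation(len(data))]:  # input order is randomized
            # best-matching unit: the neuron whose weight vector is closest to x
            bmu = np.unravel_index(
                np.argmin(np.linalg.norm(weights - x, axis=-1)),
                (grid_h, grid_w))
            # the BMU and its grid neighbors are pulled toward x, which is why
            # neighboring neurons end up with similar weight vectors
            h = np.exp(-((coords - bmu) ** 2).sum(-1) / (2 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
    return weights
```

Note that both the random initialization and the randomized input order appear explicitly here; these are exactly the two sources of variability the next slide identifies as the problem.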

Problems of SOM

• Different map initializations and input orders of the data elements may result in different clusterings.

• For large data sets, analyzing the different clusterings to find the best one is a lengthy and laborious task.

This paper presents…

• A new SOM visualization

• An unsupervised method for cluster analysis and confidence testing, for finding a reliable clustering.

Outline

• Visualization
• Cluster discovery
• Calibration
• Best clustering
• SOM as Tree
• Cluster confidence
• The most representative SOM

Visualization

• Three variants of SOM visualization with the U-matrix (Kraaijveld, Mao & Jain, 1992):

– (a) Each cell is shaded according to the average distance from the neuron to its neighbors.

– (b) The grid density is doubled; some cells represent neurons and the others represent borders.

– (c) The original SOM grid density is kept, but the borders between the neurons are drawn as thick lines.

• Based on representation (c), we can define a cluster as a group of nodes surrounded by an uninterrupted border of a given shade.

Cluster discovery

• Cluster: a group of nodes with short distances between them and long distances to the other nodes.

• Algorithm for finding clusters: each node within a cluster must be connected to at least one other node of the same cluster by an edge shorter than the distance threshold (see the sketch below).
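A small sketch of this rule, assuming a rectangular grid and using union-find over grid-neighbor edges; `discover_clusters` and its parameter names are illustrative, not the paper's code.

```python
import numpy as np

def discover_clusters(weights, threshold):
    """Group grid nodes into clusters: neighbors joined by short edges."""
    grid_h, grid_w, _ = weights.shape
    parent = {(i, j): (i, j) for i in range(grid_h) for j in range(grid_w)}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(grid_h):
        for j in range(grid_w):
            for ni, nj in ((i + 1, j), (i, j + 1)):  # right and down neighbors
                if ni < grid_h and nj < grid_w:
                    # an edge shorter than the threshold links the two nodes
                    if np.linalg.norm(weights[i, j] - weights[ni, nj]) < threshold:
                        parent[find((i, j))] = find((ni, nj))

    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), []).append(node)
    return list(clusters.values())
```

Each returned cluster is a list of grid coordinates; every node in it is reachable from the others through edges below the threshold, which matches the definition above.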

Cluster discovery: clustering map

• Cluster borders are always displayed in black

• Cluster areas are shaded according to the average distance between the nodes within the cluster (node density).

Calibration

• Mapping data onto a trained SOM is called calibration.

• The clusters are then shaded according to the average distance between the data elements mapped into them (element density); see the sketch below.
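A sketch of calibration and of the element-density shading described above; the function names and the exact density definition are my reading of the slide, not the paper's code.

```python
import numpy as np

def calibrate(weights, data):
    """Map each data element to the grid position of its best-matching unit."""
    flat = weights.reshape(-1, weights.shape[-1])
    bmu = np.argmin(((data[:, None, :] - flat[None, :, :]) ** 2).sum(-1), axis=1)
    return [tuple(np.unravel_index(k, weights.shape[:2])) for k in bmu]

def element_density(cluster_nodes, positions, data):
    """Average pairwise distance between the elements mapped into a cluster."""
    nodes = set(cluster_nodes)
    members = data[[i for i, p in enumerate(positions) if p in nodes]]
    if len(members) < 2:
        return 0.0
    d = np.linalg.norm(members[:, None] - members[None, :], axis=-1)
    return d[np.triu_indices(len(members), k=1)].mean()
```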

Best clustering

• Problem:

– At high thresholds, the data is grouped into a few large, loose clusters; the clustering may be too coarse, grouping together unrelated items.

– At low thresholds, the data is grouped into many small, tight clusters; the clustering may be too fine, unnecessarily separating similar items.

• What is the best clustering?
– The clustering with the tightest clusters and the greatest distances between them (looseness / average distance).
– Algorithm: compare the clusterings obtained at different thresholds and keep the best-scoring one (see the sketch below).
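A sketch of such a threshold sweep, reusing `discover_clusters` from the earlier sketch. The score below (mean cluster looseness divided by mean centroid separation) follows the slide's "looseness / average distance" idea and may differ from the paper's exact formula.

```python
import numpy as np

def looseness(cluster, weights):
    """Average pairwise distance between a cluster's weight vectors."""
    w = np.array([weights[i, j] for i, j in cluster])
    if len(w) < 2:
        return 0.0
    d = np.linalg.norm(w[:, None] - w[None, :], axis=-1)
    return d[np.triu_indices(len(w), k=1)].mean()

def clustering_score(clusters, weights):
    """Smaller is better: tight clusters that lie far apart."""
    if len(clusters) < 2:
        return np.inf
    centroids = np.array([np.mean([weights[i, j] for i, j in c], axis=0)
                          for c in clusters])
    sep = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=-1)
    avg_sep = sep[np.triu_indices(len(centroids), k=1)].mean()
    return np.mean([looseness(c, weights) for c in clusters]) / avg_sep

def best_clustering(weights, thresholds):
    """Try every threshold and keep the best-scoring clustering."""
    return min((discover_clusters(weights, t) for t in thresholds),
               key=lambda cl: clustering_score(cl, weights))
```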

SOM as Tree

• Why a tree?
– Traditional hierarchical clustering visualizations use trees.
– Comparison becomes easier if the SOM itself is represented as a tree.

Cluster confidence

• Problem: the SOM (and hence the clustering) may differ with different initializations and input orders.

• How to determine the "true" clustering?
– Only the clusters found in the majority of cases may be included in the final clustering (Kohonen).

• A consensus tree is used to tackle this problem (Margush & McMorris, 1981).

Consensus tree: node equivalence

• Node equivalence:

– Two nodes are equivalent if their sets of leaves are identical.
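A minimal way to make this concrete, representing a tree as nested tuples and a node by the set of leaves under it; this representation is an assumption of the sketch, not the paper's data structure.

```python
def leaf_sets(tree):
    """Return (leaves under the root, leaf set of every node in the tree)."""
    if not isinstance(tree, tuple):          # a leaf
        s = frozenset([tree])
        return s, {s}
    leaves, nodes = frozenset(), set()
    for child in tree:
        child_leaves, child_nodes = leaf_sets(child)
        leaves |= child_leaves
        nodes |= child_nodes
    nodes.add(leaves)                        # the internal node itself
    return leaves, nodes

def equivalent_nodes(t1, t2):
    """Nodes of t1 and t2 with identical leaf sets are equivalent."""
    return leaf_sets(t1)[1] & leaf_sets(t2)[1]
```

For example, `equivalent_nodes((("a", "b"), ("c", "d")), (("a", "b"), "c", "d"))` contains `frozenset({"a", "b"})`: both trees have a node covering exactly those two leaves.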

Consensus tree

• Confidence value: the fraction of the trees in which an equivalent node occurs.

• Based on the confidence values, the consensus tree is built, starting with the nodes with confidence above 50% (see the sketch below).
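A sketch of the majority rule over a collection of trees, reusing `leaf_sets` from the node-equivalence sketch; taking a node's confidence to be the fraction of trees containing an equivalent node matches the 50% rule above.

```python
from collections import Counter

def majority_clusters(trees):
    """Confidence per node, plus the >50% nodes kept for the consensus tree."""
    counts = Counter()
    for t in trees:
        counts.update(leaf_sets(t)[1])
    confidence = {node: c / len(trees) for node, c in counts.items()}
    # nodes above 50% are pairwise compatible, so they nest into one tree;
    # sorting by size gives the order in which the consensus is assembled
    kept = sorted((n for n, c in confidence.items() if c > 0.5),
                  key=len, reverse=True)
    return confidence, kept
```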

The most representative SOM

• The most representative SOM is the one whose tree is closest to the consensus tree.

• How to measure the similarity?
– The more equivalent nodes two trees share, the more similar they are (sketched below).
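With the same tree representation, selecting the most representative SOM reduces to counting shared (equivalent) nodes against the consensus and taking the maximum; a sketch reusing `leaf_sets` from above.

```python
def most_representative(trees, consensus_nodes):
    """The tree sharing the most equivalent nodes with the consensus."""
    consensus = set(consensus_nodes)
    return max(trees, key=lambda t: len(leaf_sets(t)[1] & consensus))
```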