TreeSOM: Cluster analysis in the self-organizing map
Neural Networks 19 (2006) 935-949 2006 Special Issue
Presenter: 張欽隆 (Chin-Lung Chang), D9515012
What is a SOM?
• Self-Organizing Map (Kohonen, 1982)
• Mapping high-dimensional data onto a low-dimensional grid
• Similar data elements are placed close together
SOM
• A SOM consists of neurons organized on a regular, low-dimensional grid.
• Each neuron has a weight vector.
• During training, neighboring neurons on the grid acquire similar weight vectors (they come closer).
• Finally, similar data elements are placed close together on the map.
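The training loop described above can be sketched in a few lines of NumPy. The grid size, learning rate, and neighborhood width below are illustrative choices, not values from the paper:

```python
import numpy as np

def train_som(data, grid_h=4, grid_w=4, epochs=20, lr=0.5, sigma=1.5):
    """Minimal SOM training sketch (hypothetical parameters, not the paper's)."""
    rng = np.random.default_rng(0)
    dim = data.shape[1]
    weights = rng.random((grid_h, grid_w, dim))  # one weight vector per neuron
    # grid coordinates of every neuron, used by the neighborhood function
    coords = np.stack(
        np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij"), axis=-1
    )
    for epoch in range(epochs):
        decay = np.exp(-epoch / epochs)
        for x in data:
            # best-matching unit: the neuron whose weight vector is closest to x
            d = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(np.argmin(d), d.shape)
            # neighbors on the grid are pulled toward x as well, so
            # nearby neurons end up with similar weight vectors
            grid_dist2 = np.sum((coords - np.array(bmu)) ** 2, axis=2)
            h = np.exp(-grid_dist2 / (2 * (sigma * decay) ** 2))
            weights += (lr * decay) * h[..., None] * (x - weights)
    return weights
```

The Gaussian neighborhood `h` is what makes neighboring neurons move together; shrinking it (and the learning rate) over epochs lets the map settle.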
Problem of SOM
• Different map initializations and input orders of the data elements may result in different clusterings.
• For large data sets, analyzing the different clusterings to find the best one is a lengthy and laborious task.
This paper presents…
• A new SOM visualization
• An unsupervised method for cluster analysis, with confidence testing for finding a reliable clustering.
Outline
• Visualization
• Cluster discovery
• Calibration
• Best clustering
• SOM as tree
• Cluster confidence
• The most representative SOM
Visualization
• Three variants of SOM visualization with the u-matrix (Kraaijveld, Mao & Jain, 1992):
– (a) Each cell is shaded according to the average distance from the neuron to its neighbors.
– (b) The grid density is doubled: some cells represent neurons and the others represent borders.
– (c) The original SOM grid density is kept, but the borders between the neurons are drawn as thick lines.
• Based on representation (c), we can define a cluster as a group of nodes surrounded by an uninterrupted border of a given shade.
Cluster discovery
• Cluster: a group of nodes with short distances between them and long distances to the other nodes.
• Algorithm for finding clusters: each node within a cluster must be connected to at least one other node within the same cluster by an edge shorter than the distance threshold.
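The connectivity rule above amounts to a flood fill over grid edges shorter than the threshold. A minimal sketch (the 4-neighbor grid and Euclidean edge length are assumptions, not necessarily the paper's exact implementation):

```python
import numpy as np

def find_clusters(weights, threshold):
    """Label SOM nodes so that nodes joined by grid edges shorter than
    `threshold` share a cluster (single-linkage flood fill sketch)."""
    h, w, _ = weights.shape
    labels = -np.ones((h, w), dtype=int)
    cluster = 0
    for i in range(h):
        for j in range(w):
            if labels[i, j] != -1:
                continue
            labels[i, j] = cluster
            stack = [(i, j)]  # flood fill from this unlabeled seed node
            while stack:
                a, b = stack.pop()
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    na, nb = a + da, b + db
                    if 0 <= na < h and 0 <= nb < w and labels[na, nb] == -1:
                        edge = np.linalg.norm(weights[a, b] - weights[na, nb])
                        if edge < threshold:  # short edge: same cluster
                            labels[na, nb] = cluster
                            stack.append((na, nb))
            cluster += 1
    return labels
```

Long edges are simply never crossed, so they become the black cluster borders in the visualization.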
Cluster discovery: clustering map
• Cluster borders are always displayed in black.
• Cluster areas are shaded according to the average distance between the nodes within the cluster (node density).
Calibration
• Mapping data onto a trained SOM is called calibration.
• The clusters are shaded according to the average distance between the data elements they contain (element density).
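Calibration can be sketched as assigning every data element to its best-matching unit, the node with the nearest weight vector; the hit-list return structure below is an illustrative choice:

```python
import numpy as np

def calibrate(weights, data):
    """Map each data element to its best-matching unit (BMU) on a trained SOM.
    Returns, per node, the list of element indices that landed on it (sketch)."""
    hits = {}
    for idx, x in enumerate(data):
        # distance from x to every node's weight vector
        d = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(np.argmin(d), d.shape)
        hits.setdefault(bmu, []).append(idx)
    return hits
```

The per-node element lists are what the element-density shading is computed from.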
Best clustering
• Problem:
– At high thresholds the data is grouped into a few large, loose clusters; the clustering may be too coarse, grouping together unrelated items.
– At low thresholds the data is grouped into many small, tighter clusters; the clustering may be too fine, unnecessarily separating similar items.
• What is the best clustering?
– The clustering with the tightest clusters and the greatest distances between them (looseness / average distance).
– Algorithm:
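The criterion above can be sketched as a score to minimize while sweeping the threshold: mean within-cluster node distance (looseness) divided by mean between-cluster distance. This is an assumed reading of "looseness / average distance", not the paper's exact formula:

```python
import numpy as np

def clustering_score(weights, labels):
    """Looseness / separation for one clustering of the SOM nodes:
    mean within-cluster distance over mean between-cluster distance
    (lower is better; a sketch of the criterion, not the paper's formula)."""
    flat_w = weights.reshape(-1, weights.shape[-1])
    flat_l = labels.ravel()
    within, between = [], []
    n = len(flat_w)
    for a in range(n):
        for b in range(a + 1, n):
            d = np.linalg.norm(flat_w[a] - flat_w[b])
            (within if flat_l[a] == flat_l[b] else between).append(d)
    if not within or not between:
        return np.inf  # degenerate: one big cluster or all singletons
    return np.mean(within) / np.mean(between)
```

Sweeping the distance threshold and keeping the clustering with the lowest score picks tight, well-separated clusters.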
SOM as Tree
• Why a tree?
– Traditional hierarchical clustering visualizations use trees.
– Comparison becomes easier if the SOM itself is represented as a tree.
Cluster confidence
• Problem: the SOM (clustering) may differ with different initializations and input orders.
• How to determine the "true" clustering?
– Only the clusters found in the majority of cases may be included in the final clustering (Kohonen).
• A consensus tree was used to tackle this problem (Margush & McMorris, 1981).
Consensus tree – node equivalence
• Node equivalence:
– Two nodes are equivalent if their sets of leaves are identical.
Consensus tree
• Confidence value
• Based on the confidence values, the consensus tree is built, starting with the nodes with confidence above 50%.
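Majority-rule consensus over several SOM runs can be sketched with the leaf-set equivalence above: a cluster's confidence is the fraction of runs in which its exact leaf set appears, and only clusters above 50% enter the consensus. Representing each tree as a set of `frozenset` leaf sets is an illustrative choice:

```python
from collections import Counter

def majority_consensus(trees, min_conf=0.5):
    """Majority-rule consensus sketch: `trees` is a list of clusterings,
    each a set of clusters, each cluster a frozenset of leaf names.
    Two nodes are equivalent when their leaf sets are identical; clusters
    present in more than `min_conf` of the runs are kept, mapped to
    their confidence value."""
    counts = Counter()
    for tree in trees:
        for cluster in tree:
            counts[cluster] += 1  # identical leaf sets hash equal
    n = len(trees)
    return {c: k / n for c, k in counts.items() if k / n > min_conf}
```

Because `frozenset`s with identical members compare and hash equal, counting them directly implements the node-equivalence test from the previous slide.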