Page 1

The Supervised Network Self-Organizing Map for Classification of Large Data Sets

Authors: Papadimitriou et al,

Advisor: Dr. Hsu

Graduate: Yu-Wei Su

Page 2

Outline

Motivation
Objective
Introduction
The Supervised Network SOM
    The classification partition SOM (CP-SOM)
    The supervised expert network
Applications
Conclusions
Personal opinion

Page 3

Motivation

Real data sets are frequently characterized by a large number of noisy observations

Unsupervised learning schemes usually cannot discriminate well over the regions of the state space with complex decision boundaries

Page 4

Objective

To develop the Supervised Network Self-Organizing Map (SNet-SOM) to handle the ambiguous regions of the state space

To develop a more computationally efficient unsupervised learning scheme

Page 5

Introduction

The SNet-SOM utilizes a two-stage learning process: unsupervised learning identifies and classifies the simple regions, and supervised learning handles the difficult ones

The simple regions are handled by the SNet-SOM component based on Kohonen's SOM

The basic SOM is modified with a dynamic node insertion/deletion process driven by an entropy-based criterion

Page 6

Introduction (cont.)

The difficult regions are handled by a supervised learning process, such as an RBF (radial basis function) network or an SVM (support vector machine)

Page 7

The Supervised Network SOM

The SNet-SOM consists of two components:
    The classification partition SOM (CP-SOM)
    The supervised expert network

Page 8

CP-SOM

The size of the CP-SOM is dynamically expanded with an adaptive process for the ambiguous regions

The dynamic growth is based on an entropy-based criterion

Classification is performed only on the unambiguous part of the state space, which corresponds to the neurons of small entropy

Page 9

CP-SOM (cont.)

CP-SOM learning flow
Initialization phase
    Usually four nodes are used to represent the input data
    Computational demands are lighter because fine-tuning of the neurons is avoided and the map is small
Adaptation phase
    The learning parameters do not need to shrink with time: the neighborhood is initially large enough to include the whole map, and during subsequent training epochs it becomes localized near the winning neuron

Page 10

CP-SOM (cont.)

Expansion phase
    Controlling the number of training patterns that correspond to the ambiguous regions is the motivation for modifying the SOM
    The expansion phase follows the adaptation phase
    SupervisedExpertMaxPatterns specifies an upper limit on the size of the supervised expert's training set
    SupervisedExpertMinPatterns specifies the lower bound on that training set

$w_j(k+1) = w_j(k) + \eta(k)\,[x_k - w_j(k)], \quad j \in N_j(k)$

$w_j(k+1) = w_j(k), \quad j \notin N_j(k)$
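A minimal sketch of this weight update in Python/NumPy, assuming a flat array of weight vectors, a scalar learning rate eta, and a simple index-distance neighborhood (all names are illustrative, not the authors' code):

```python
import numpy as np

def som_adapt_step(weights, x, eta, radius):
    """One SOM adaptation step: move neighborhood weights toward the input x."""
    # Winner: the neuron whose weight vector is closest to the input
    winner = np.argmin(np.linalg.norm(weights - x, axis=1))
    # Neighborhood N_j(k): here, neurons within `radius` index positions of the winner
    in_hood = np.abs(np.arange(len(weights)) - winner) <= radius
    # w_j(k+1) = w_j(k) + eta(k) * (x_k - w_j(k)) inside the neighborhood; unchanged outside
    weights[in_hood] += eta * (x - weights[in_hood])
    return weights
```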

Page 11

CP-SOM (cont.)

1. Compute the entropy for every node i

2. Detect the neurons that are ambiguous according to the entropy threshold value

3. Evaluate the map to compute the number of training patterns that correspond to the ambiguous neurons, denoted NumTrainingSetAtAmbiguous

$H(m) = -\sum_{k=1}^{N_c} P_k \log P_k$

where $N_c$ is the number of classes and $P_k$ is the fraction of the patterns mapped to neuron $m$ that belong to class $k$
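A short sketch of this entropy computation and of the ambiguity test of step 2, assuming per-neuron class counts are available (variable names are hypothetical):

```python
import numpy as np

def node_entropy(class_counts):
    """H(m) = -sum_k P_k log P_k over the classes hitting one neuron."""
    counts = np.asarray(class_counts, dtype=float)
    total = counts.sum()
    if total == 0:
        return 0.0                      # unused neuron: treated as unambiguous
    p = counts[counts > 0] / total      # class probabilities P_k at this neuron
    return float(-(p * np.log(p)).sum())

def ambiguous_neurons(per_neuron_counts, entropy_threshold):
    """Step 2: neurons whose entropy exceeds the threshold are marked ambiguous."""
    return [i for i, c in enumerate(per_neuron_counts)
            if node_entropy(c) > entropy_threshold]
```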

Page 12

CP-SOM (cont.)

4. If NumTrainingSetAtAmbiguous > SupervisedExpertMaxPatterns
    1. Perform map expansion by smoothly inserting, at the neighborhood of each ambiguous neuron, a number of neurons that depends on its fuzziness
    2. Repeat the adaptation phase after the dynamic extension

else if NumTrainingSetAtAmbiguous < SupervisedExpertMinPatterns
    Reduce the parameter NodeEntropyThresholdForConsideringAmbiguous so that more nodes are considered ambiguous, and restart from step 2

Page 13

CP-SOM (cont.)

else
    Generate the training and testing sets for the supervised expert
endif

The assignment of a class label to each neuron of the CP-SOM is performed by a majority-voting scheme

It acts as a local averaging operator defined over the class labels of all the patterns that activate the neuron as the winner
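A hedged sketch of the control loop formed by steps 2-4, written in Python; the helpers count_ambiguous_patterns, expand_map, run_adaptation_phase, and build_expert_sets are hypothetical placeholders that only mirror the decision logic described above:

```python
def expansion_control(som, max_patterns, min_patterns, entropy_threshold):
    """Grow or relax until the ambiguous-pattern count fits the supervised expert."""
    while True:
        ambiguous = ambiguous_neurons(som.class_counts, entropy_threshold)  # step 2
        n_ambiguous = count_ambiguous_patterns(som, ambiguous)              # step 3
        if n_ambiguous > max_patterns:       # too many patterns for the expert
            expand_map(som, ambiguous)       # insert neurons near ambiguous ones
            run_adaptation_phase(som)        # re-adapt after the dynamic extension
        elif n_ambiguous < min_patterns:     # too few patterns for the expert
            entropy_threshold *= 0.9         # relax the threshold: more nodes ambiguous
        else:
            return build_expert_sets(som, ambiguous)  # train/test sets for the expert
```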

Page 14

The supervised expert network

Has the task of discriminating over the state-space regions where the class decision boundaries are complex

Appropriate neural network models are the Radial Basis Function (RBF) network and the Support Vector Machine (SVM)

Page 15

RBF supervising expert

Generalization performance is obtained through a tradeoff between the fit of the solution to the training set and the smoothness of the solution

The tradeoff cost function is (with $\lambda$ a positive real number called the regularization parameter and $D$ a stabilizer):

$C(F) = C_s(F) + \lambda\, C_r(F)$

where

$C_s(F) = \frac{1}{2}\sum_{i=1}^{l}[d_i - F(x_i)]^2$

$C_r(F) = \frac{1}{2}\,\|DF\|^2$
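A minimal illustration of this fit-versus-smoothness tradeoff, using Gaussian basis functions and a ridge-style weight penalty as a stand-in for the stabilizer term (the paper's exact stabilizer and center-selection scheme are not reproduced; all names are illustrative):

```python
import numpy as np

def rbf_fit(centers, X, d, lam, sigma=1.0):
    """Solve for RBF weights minimizing squared error plus lam * ||w||^2."""
    # Design matrix of Gaussian basis functions phi_j(x) = exp(-||x - c_j||^2 / (2 sigma^2))
    diff = X[:, None, :] - centers[None, :, :]
    Phi = np.exp(-np.sum(diff**2, axis=2) / (2.0 * sigma**2))
    # Regularized least squares: (Phi^T Phi + lam I) w = Phi^T d
    A = Phi.T @ Phi + lam * np.eye(len(centers))
    return np.linalg.solve(A, Phi.T @ d)
```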

Page 16

RBF supervising expert (cont.)

Achieving proper generalization performance is a difficult issue, as is the selection of the centers and parameters

SupervisedExpertMinPatterns is hard to estimate

Page 17

SVM supervising expert

The SVM obtains high generalization performance without prior knowledge, even when the dimension of the input space is high

Classification amounts to estimating a function $f: \mathbb{R}^N \to \{\pm 1\}$ using input-output training data $(x_1, y_1), \ldots, (x_l, y_l) \in \mathbb{R}^N \times \{\pm 1\}$

Page 18

SVM supervising expert (cont.)

The risk is minimized in order to obtain generalization performance

Since P(x, y) is unknown, only the empirical risk can be minimized

$R[f] = \int \tfrac{1}{2}\,|f(x) - y|\; dP(x, y)$

$R_{\mathrm{emp}}[f] = \frac{1}{l}\sum_{i=1}^{l}\tfrac{1}{2}\,|f(x_i) - y_i|$
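A tiny sketch of the empirical risk for ±1 labels, where |f(x_i) - y_i| / 2 is 0 for a correct prediction and 1 for a wrong one, so R_emp reduces to the training error rate (variable names are illustrative):

```python
import numpy as np

def empirical_risk(predictions, labels):
    """R_emp[f] = (1/l) * sum_i |f(x_i) - y_i| / 2 for labels in {-1, +1}."""
    predictions = np.asarray(predictions, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return float(np.mean(np.abs(predictions - labels) / 2.0))
```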

Page 19

SVM supervising expert (cont.)

The bound on R[f] depends on the VC-dimension parameter h, which is controlled by maximizing the separation Δ between the different classes with a linear hyperplane

For a set of pattern vectors $x_1, \ldots, x_l \in X$, the hyperplanes can be written as $\{x \in X : w \cdot x + b = 0\}$, with $w$ a weight vector and $b$ a bias

Page 20

SVM supervising expert (cont.)

Page 21

SVM supervising expert (cont.)

$x_i \cdot w + b \geq +1$ for $y_i = +1$  (1)
$x_i \cdot w + b \leq -1$ for $y_i = -1$  (2)
Combined: $y_i(x_i \cdot w + b) - 1 \geq 0$, $i = 1, \ldots, l$
$w$ is the normal vector of the hyperplanes H1: $x_i \cdot w + b = 1$ and H2: $x_i \cdot w + b = -1$
Margin $= 2/\|w\|$; the points lying on H1 or H2 (marked ◎ in the figure) are the support vectors
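A small sketch that checks these canonical constraints and computes the margin for a given (w, b); names are illustrative:

```python
import numpy as np

def margin_and_constraints(w, b, X, y):
    """Check y_i (x_i . w + b) - 1 >= 0 for all i and return the margin 2 / ||w||."""
    slack = y * (X @ w + b) - 1.0          # constraint values; all should be >= 0
    satisfied = bool(np.all(slack >= -1e-9))
    margin = 2.0 / np.linalg.norm(w)       # separation between H1 and H2
    return satisfied, margin
```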

Page 22

Applications

Synthetic data
Distinction of chaos from noise
Ischemia detection

Page 23

Synthetic data

The synthetic data model looks like:

Construction steps:
1. Generation of proper values $V_i$, $i = 1, \ldots, N$
2. Induction of observation noise: $V'_i$, $i = 1, \ldots, N$
3. Computation of the values of the outcome variables
4. Induction of observation noise to the outcome variables: $O'_i$

$Y = f(A_1, A_2, \ldots, A_n)$
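A sketch of the four construction steps under generic assumptions (Gaussian observation noise and a hypothetical outcome function f; the paper's exact model is not reproduced here):

```python
import numpy as np

def make_synthetic(n_samples, n_vars, f, noise_std=0.1, seed=0):
    """Follow the four construction steps with a user-supplied outcome function f."""
    rng = np.random.default_rng(seed)
    V = rng.uniform(0.0, 1.0, size=(n_samples, n_vars))    # 1. proper values V_i
    V_noisy = V + rng.normal(0.0, noise_std, V.shape)      # 2. observation noise -> V'_i
    O = np.apply_along_axis(f, 1, V_noisy)                 # 3. outcome variables
    O_noisy = O + rng.normal(0.0, noise_std, O.shape)      # 4. observation noise -> O'_i
    return V_noisy, O_noisy
```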

Page 24

Synthetic data (cont.)

Page 25

Distinction of chaos from noise

To design a classification system that is able to distinguish between a three-dimensional chaotic vector and random Gaussian noise

The Lorenz chaotic system has been used to generate a chaotic trajectory lying in the three-dimensional space

The difficulty of distinguishing chaos from noise depends on the state-space region

Page 26

Distinction of chaos from noise (cont.)

The regions far from the attractor can be handled effectively with the CP-SOM classification

The remaining regions are passed to the supervised expert, since these are regions where the classes overlap and cannot be distinguished by the CP-SOM alone

Training set: 20,000 patterns, half from the Lorenz system and half Gaussian noise

Test set: 20,000 patterns, constructed similarly
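A sketch of how such a data set could be assembled, integrating the Lorenz system with a simple Euler step and pairing it with Gaussian noise vectors (step size, parameters, and the noise scaling are common textbook choices, not values taken from the paper):

```python
import numpy as np

def lorenz_trajectory(n_points, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Generate a 3-D Lorenz trajectory with simple Euler integration."""
    xyz = np.empty((n_points, 3))
    x, y, z = 1.0, 1.0, 1.0
    for i in range(n_points):
        dx, dy, dz = sigma * (y - x), x * (rho - z) - y, x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        xyz[i] = (x, y, z)
    return xyz

# Half chaotic points (label +1) and half Gaussian noise vectors (label -1),
# with the noise roughly matched to the trajectory's location and spread.
n = 10_000
chaos = lorenz_trajectory(n)
noise = np.random.default_rng(0).normal(size=(n, 3)) * chaos.std(axis=0) + chaos.mean(axis=0)
X = np.vstack([chaos, noise])
y = np.concatenate([np.ones(n), -np.ones(n)])
```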

Page 27

Distinction of chaos from noise (cont.)

The size of the ambiguous pattern set was near 2,000 with an entropy criterion of 0.2

Plain SOM average performance: 79%
SNet-SOM with RBF: 81%
SNet-SOM with SVM: 82%

Page 28

Ischemia detection

The ECG signals come from the European ST-T database, a set of long-term Holter recordings provided by eight countries

From the samples composing each beat, a window of 400 milliseconds is selected

The signal component forms the input to PCA in order to describe most of its content with a few coefficients

Page 29

Ischemia detection (cont.)

Dimensionality reduction here means that the 100-dimensional data vector X is represented with a 5-dimensional vector

A wavelet-based denoising technique based on Lipschitz regularization theory is applied

Training set: 15,000 ST-T segments taken from 44,000 beats in 6 records; two classes: normal and ischemic

Test set: 15 records with 120,000 ECG beats
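A minimal sketch of the PCA step described above, projecting 100-dimensional beat vectors onto their first 5 principal components (illustrative code, not the paper's processing pipeline):

```python
import numpy as np

def pca_reduce(X, n_components=5):
    """Project rows of X (e.g. 100-dim ST-T segments) onto the top principal components."""
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T   # n_samples x n_components coefficients
```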

Page 30

Ischemia detection (cont.)

Page 31

Ischemia detection (cont.)

Page 32

Conclusions

The SNet-SOM obtains significant computational benefits in large-scale problems

The SNet-SOM is a modular architecture that can be improved along many directions

Page 33

Personal opinion

It provides a direction for detecting noise with an improved SOM

It is a nice reference for my research