Self-Organizing Map (SOM)
The self-organizing map (SOM) is a method for unsupervised learning, based on a grid of artificial neurons whose weights are adapted to match input vectors in a training set.
It was first described by the Finnish professor Teuvo Kohonen and is thus sometimes referred to as a Kohonen map.
SOM is one of the most popular neural computation methods in use, and several thousand scientific articles have been written about it. SOM is especially good at producing visualizations of high-dimensional data.
Self-Organizing Maps (SOM)
SOM is an unsupervised neural network technique that approximates an unlimited number of input data by a finite set of models arranged in a grid, where neighbouring nodes correspond to more similar models.
The models are produced by a learning algorithm that automatically orders them on the two-dimensional grid according to their mutual similarity.
Brain's self-organization
The brain maps the external multidimensional representation of the world into a similar 1- or 2-dimensional internal representation. That is, the brain processes the external signals in a topology-preserving way.
Mimicking the way the brain learns, our system should be able to do the same thing.
Why SOM ?
Unsupervised Learning
Clustering
Classification
Monitoring
Data Visualization
Potential for combining SOM with other neural networks (MLP, RBF)
Concept of the SOM
(Figure: the input space (input layer, with variables such as Mn, Sr, Ba and samples s1, s2) is mapped onto a reduced feature space (map layer).)
Clustering and ordering of the cluster centres in a two-dimensional grid: the cluster centres (code vectors) are given a place in the reduced space.
Network Architecture
Two layers of units
Input: n units (length of training vectors)
Output: m units (number of categories)
Input units fully connected with weights to output units
Intralayer (lateral) connections
Within output layer
Defined according to some topology
Not weights, but used in algorithm for updating weights
SOM - Architecture
A lattice of neurons (nodes) accepts and responds to a set of input signals
Responses are compared; the winning neuron is selected from the lattice
The selected neuron is activated together with its neighbourhood neurons
An adaptive process changes the weights to more closely match the inputs
(Figure: a 2-D array of neurons receives a set of input signals x1, x2, x3, ..., xn through the weights wj1, wj2, wj3, ..., wjn of neuron j.)
Measuring distances between nodes
Distances between output neurons will be used in the learning process.
It may be based upon:
a) Rectangular lattice
b) Hexagonal lattice
Let d(i,j) be the distance between the output nodes i, j
d(i,j) = 1 if node j is in the first outer rectangle/hexagon of node i
d(i,j) = 2 if node j is in the second outer rectangle/hexagon of node i
And so on...
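As a concrete sketch of this distance for the rectangular lattice, assuming each node is identified by its (row, column) grid coordinates, the k-th outer rectangle corresponds to a Chebyshev distance of k (illustrative Python, not from the lecture):

```python
def lattice_distance(pos_i, pos_j):
    """d(i, j) on a rectangular lattice: 0 for the node itself, 1 for the
    first outer rectangle, 2 for the second, and so on (Chebyshev distance)."""
    return max(abs(pos_i[0] - pos_j[0]), abs(pos_i[1] - pos_j[1]))

# Example: node at (2, 2) and node at (3, 4) lie two rectangles apart.
print(lattice_distance((2, 2), (3, 4)))  # 2
```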
Each neuron is a node containing a template against
which input patterns are matched.
All nodes are presented with the same input pattern in parallel and compute the distance between their template and the input in parallel.
Only the node with the closest match between the input
and its template produces an active output.
Each Node therefore acts like a separate decoder (or
pattern detector, feature detector) for the same input and
the interpretation of the input derives from the presence
or absence of an active response at each location
(rather than the magnitude of response or an input-output
transformation as in feedforward or feedback networks).
SOM: interpretation
Each SOM neuron can be seen as representing a cluster containing all the input examples which are mapped to that neuron.
For a given input, the output of the SOM is the neuron with the weight vector most similar (with respect to Euclidean distance) to that input.
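A minimal sketch of this winner selection in Python (the array names weights and x are illustrative, not from the lecture):

```python
import numpy as np

def find_winner(weights, x):
    """weights: (m, n) array, one row per neuron; x: input vector of length n.
    Returns the index of the neuron whose weight vector is closest to x."""
    distances = np.linalg.norm(weights - x, axis=1)  # Euclidean distances
    return int(np.argmin(distances))

weights = np.array([[0.2, 0.6, 0.5, 0.9],
                    [0.8, 0.4, 0.7, 0.3]])
x = np.array([1.0, 1.0, 0.0, 0.0])
print(find_winner(weights, x))  # 1: the second neuron is closer to x
```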
Types of Mapping
Familiarity: the net learns how similar a given new input is to the typical (average) pattern it has seen before
PCA: the net finds Principal Components in the data
Clustering: the net finds the appropriate categories based on correlations in the data
Encoding: the output represents the input, using a smaller number of bits
Feature Mapping: the net forms a topographic map of the input
Possible Applications
Familiarity and PCA can be used to analyze unknown data
PCA is used for dimension reduction
Encoding is used for vector quantization
Clustering is applied to any type of data
Feature mapping is important for dimension reduction and for functionality (as in the brain)
Simple Models
Network has inputs and outputs
There is no feedback from the environment: no supervision
The network updates the weights following some learning rule, and finds patterns, features or categories within the inputs presented to the network
Unsupervised Learning
In unsupervised competitive learning the neurons take part in some competition for each input. The winner of the competition, and sometimes some other neurons, are allowed to change their weights.
In simple competitive learning only the winner is allowed to learn (change its weights).
In self-organizing maps other neurons in the neighborhood of the winner may also learn.
Simple Competitive Learning
(Figure: a single layer of P output neurons Y1, Y2, ..., YP, fully connected to N input units x1, x2, ..., xN through weights W11, W12, ..., WPN.)
N input units
P output neurons
P x N weights
$h_i = \sum_{j=1}^{N} W_{ij} X_j, \quad i = 1, 2, \ldots, P$
$Y_i = 1$ or $0$
Network Activation
The unit $i^*$ with the highest field $h_i$ fires
$i^*$ is the winner unit
Geometrically, $W_{i^*}$ is the weight vector closest to the current input vector
The winning unit's weight vector $W_{i^*}$ is updated to be even closer to the current input vector
Learning
Starting with small random weights, at each step:
1. a new input vector is presented to the network
2. all fields are calculated to find a winner
3. $W_{i^*}$ is updated to be closer to the input
using the standard competitive learning equation:
$\Delta W_{i^* j} = \eta \,(X_j - W_{i^* j})$
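A minimal sketch of one such step in Python, assuming the winner is the unit whose weight vector is closest to the input (geometrically, the unit with the highest field for normalised weights); names are illustrative:

```python
import numpy as np

def competitive_step(W, x, eta=0.1):
    """W: (P, N) weight matrix; x: input of length N; eta: learning rate.
    Only the winning unit's weight vector moves toward the input."""
    winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # find i*
    W[winner] += eta * (x - W[winner])  # Delta W_{i*j} = eta (X_j - W_{i*j})
    return winner
```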
Result
Each output unit moves to the center of
mass of a cluster of input vectors
clustering
Competitive Learning, Cont'd
It is important to break the symmetry in the initial random weights
The final configuration depends on the initialization
A winning unit has more chances of winning the next time a similar input is seen
Some outputs may never fire. This can be compensated for by updating the non-winning units with a smaller update
More about SOM learning
Upon repeated presentations of the training examples, the weight vectors of the neurons tend to follow the distribution of the examples.
This results in a topological ordering of the
neurons, where neurons adjacent to each other
tend to have similar weight vectors.
The input space of patterns is mapped onto a
discrete output space of neurons.
SOM Learning Algorithm
1. Randomly initialise all weights
2. Select input vector x = [x1, x2, x3, ..., xn] from the training set
3. Compare x with the weights wj of each neuron j to determine the winner
4. Find the unit j with the minimum distance: $d_j = \sum_i (w_{ij} - x_i)^2$
5. Update the winner so that it becomes more like x, together with the winner's neighbours (units within the radius), according to $w_{ij}(n+1) = w_{ij}(n) + \eta(n)\,[x_i - w_{ij}(n)]$
6. Adjust parameters: learning rate & neighbourhood function
7. Repeat from (2) until convergence
Note that the learning rate generally decreases with time: $0 \le \eta(n) \le \eta(n-1) \le 1$
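A minimal sketch of steps 2-7 for a single input vector, assuming a 2-D grid whose (row, column) neuron positions are stored in positions; variable names are illustrative:

```python
import numpy as np

def som_step(weights, positions, x, eta, radius):
    """weights: (m, n) weight matrix; positions: (m, 2) grid coordinates;
    x: input vector (n,). Updates the winner and all units within `radius`."""
    d = np.sum((weights - x) ** 2, axis=1)          # d_j = sum_i (w_ij - x_i)^2
    winner = int(np.argmin(d))                      # winning unit
    grid_dist = np.max(np.abs(positions - positions[winner]), axis=1)
    for j in np.where(grid_dist <= radius)[0]:      # winner plus neighbours
        weights[j] += eta * (x - weights[j])        # w_ij(n+1) = w_ij(n) + eta [x_i - w_ij(n)]
    return winner
```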
Example
An SOFM network with three inputs and two cluster units is to be trained using the four training vectors:
[0.8 0.7 0.4], [0.6 0.9 0.9], [0.3 0.4 0.1], [0.1 0.1 0.2]
and initial weights (each column holds the weights to one cluster unit; the weights to the first cluster unit are 0.5, 0.6, 0.8):
$W = \begin{pmatrix} 0.5 & 0.4 \\ 0.6 & 0.2 \\ 0.8 & 0.5 \end{pmatrix}$
The initial radius is 0 and the learning rate is 0.5. Calculate the weight changes during the first cycle through the data, taking the training vectors in the given order.
Solution
The squared Euclidean distance of input vector 1 from cluster unit 1 is:
$d_1^2 = (0.5 - 0.8)^2 + (0.6 - 0.7)^2 + (0.8 - 0.4)^2 = 0.26$
The squared Euclidean distance of input vector 1 from cluster unit 2 is:
$d_2^2 = (0.4 - 0.8)^2 + (0.2 - 0.7)^2 + (0.5 - 0.4)^2 = 0.42$
Input vector 1 is closest to cluster unit 1, so update the weights to cluster unit 1 using $w_{ij}(n+1) = w_{ij}(n) + 0.5\,[x_i - w_{ij}(n)]$:
$0.65 = 0.5 + 0.5\,(0.8 - 0.5)$
$0.65 = 0.6 + 0.5\,(0.7 - 0.6)$
$0.60 = 0.8 + 0.5\,(0.4 - 0.8)$
Updated weights: to cluster unit 1: (0.65, 0.65, 0.60); to cluster unit 2 (unchanged): (0.4, 0.2, 0.5)
Solution
The squared Euclidean distance of input vector 2 from cluster unit 1 is:
$d_1^2 = (0.65 - 0.6)^2 + (0.65 - 0.9)^2 + (0.60 - 0.9)^2 = 0.155$
The squared Euclidean distance of input vector 2 from cluster unit 2 is:
$d_2^2 = (0.4 - 0.6)^2 + (0.2 - 0.9)^2 + (0.5 - 0.9)^2 = 0.69$
Input vector 2 is closest to cluster unit 1, so update the weights to cluster unit 1 again:
$0.625 = 0.65 + 0.5\,(0.6 - 0.65)$
$0.775 = 0.65 + 0.5\,(0.9 - 0.65)$
$0.750 = 0.60 + 0.5\,(0.9 - 0.60)$
Updated weights: to cluster unit 1: (0.625, 0.775, 0.750); to cluster unit 2 (unchanged): (0.4, 0.2, 0.5)
Repeat the same update procedure for input vectors 3 and 4.
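The following short script (a sketch written for this lecture example) reproduces the calculations above and carries on through training vectors 3 and 4:

```python
import numpy as np

X = np.array([[0.8, 0.7, 0.4],   # training vectors, in the given order
              [0.6, 0.9, 0.9],
              [0.3, 0.4, 0.1],
              [0.1, 0.1, 0.2]])
W = np.array([[0.5, 0.6, 0.8],   # weights to cluster unit 1
              [0.4, 0.2, 0.5]])  # weights to cluster unit 2
eta, radius = 0.5, 0             # learning rate 0.5, radius 0

for x in X:
    d = np.sum((W - x) ** 2, axis=1)    # squared Euclidean distances
    winner = int(np.argmin(d))
    W[winner] += eta * (x - W[winner])  # update only the winning unit
    print("d^2 =", d.round(3), "-> unit", winner + 1, "wins; W =", W.round(3).tolist())
```

The first two iterations print d^2 = [0.26, 0.42] and [0.155, 0.69], matching the hand calculations above.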
Neighborhood Function
Gaussian neighborhood function:
$h_{j,i} = \exp\!\left(-\dfrac{d_{ji}^2}{2\sigma^2}\right)$
$d_{ji}$: lateral distance of neurons $i$ and $j$
in a 1-dimensional lattice: $|j - i|$
in a 2-dimensional lattice: $\|r_j - r_i\|$, where $r_j$ is the position of neuron $j$ in the lattice.
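A minimal sketch of this neighborhood function for a 2-D lattice, where r_j is the (row, column) position of neuron j (illustrative names, not from the lecture):

```python
import numpy as np

def gaussian_neighbourhood(r_j, r_i, sigma):
    """h_{j,i} = exp(-d_{ji}^2 / (2 sigma^2)) with d_{ji} = ||r_j - r_i||."""
    d2 = np.sum((np.asarray(r_j, float) - np.asarray(r_i, float)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

print(gaussian_neighbourhood((3, 4), (3, 3), sigma=1.0))  # ~0.61, one grid step away
```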
(Figure: neighbourhood sets N13(1) and N13(2) around neuron 13.)
Neighborhood Function
$\sigma$ measures the degree to which excited neurons in the vicinity of the winning neuron cooperate in the learning process.
In the learning algorithm $\sigma$ is updated at each iteration during the ordering phase using the following exponential decay update rule, with parameters $\sigma_0$ and $T_1$:
$\sigma(n) = \sigma_0 \exp\!\left(-\dfrac{n}{T_1}\right)$
Neighbourhood function
(Figure: the degree of neighbourhood plotted against distance from the winner; as time increases, the neighbourhood function becomes narrower.)
UPDATE RULE
$w_j(n+1) = w_j(n) + \eta(n)\, h_{j,i(x)}(n)\, [x - w_j(n)]$
Exponential decay update of the learning rate:
$\eta(n) = \eta_0 \exp\!\left(-\dfrac{n}{T_2}\right)$
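Putting the update rule together with the two decay schedules, a minimal sketch (parameter values are illustrative placeholders, not prescribed by the lecture):

```python
import numpy as np

def schedules(n, eta0=0.1, T2=1000.0, sigma0=5.0, T1=1000.0):
    """Exponential decay of the learning rate and of the neighbourhood width."""
    return eta0 * np.exp(-n / T2), sigma0 * np.exp(-n / T1)

def som_update(weights, positions, x, n):
    """w_j(n+1) = w_j(n) + eta(n) * h_{j,i(x)}(n) * (x - w_j(n))."""
    eta, sigma = schedules(n)
    winner = int(np.argmin(np.sum((weights - x) ** 2, axis=1)))   # i(x)
    d2 = np.sum((positions - positions[winner]) ** 2, axis=1)     # squared lattice distances
    h = np.exp(-d2 / (2.0 * sigma ** 2))                          # Gaussian neighbourhood
    weights += eta * h[:, None] * (x - weights)
    return winner
```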
Illustration of learning for Kohonen maps
Inputs: coordinates (x, y) of points drawn from a square
Display neuron j at the position (xj, yj) where its response sj is maximum
(Figure: map positions after random initialisation, then after 100, 200, and 1000 inputs.)
Two-phase learning approach
Self-organizing or ordering phase. The learning rateand spread of the Gaussian neighborhood function
are adapted during the execution of SOM, using for
instance the exponential decay update rule.
Convergence phase. The learning rate and Gaussian
spread have small fixed values during the execution
of SOM.
Ordering Phase
Self-organizing or ordering phase: topological ordering of the weight vectors.
May take 1000 or more iterations of the SOM algorithm.
The choice of parameter values is important. For instance:
$\eta(n)$: $\eta_0 = 0.1$, $T_2 = 1000$; decrease gradually, keeping $\eta(n) \ge 0.01$
$h_{ji(x)}(n)$: $\sigma_0$ big enough, $T_1 = \dfrac{1000}{\log \sigma(0)}$
With this parameter setting, initially the neighborhood of the winning neuron includes almost all neurons in the network; then it shrinks slowly with time.
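A small sketch of these schedules (sigma0 = 10 is an illustrative "big enough" choice, not from the lecture); note that sigma(n) shrinks to about 1 at n = 1000:

```python
import numpy as np

sigma0 = 10.0                     # illustrative "big enough" initial spread
T1 = 1000.0 / np.log(sigma0)      # T1 = 1000 / log(sigma(0))
eta0, T2 = 0.1, 1000.0

for n in (0, 250, 500, 1000):
    eta = eta0 * np.exp(-n / T2)      # learning rate schedule
    sigma = sigma0 * np.exp(-n / T1)  # neighbourhood width schedule
    print(f"n={n:4d}  eta={eta:.3f}  sigma={sigma:.2f}")
```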
Convergence Phase
Convergence phase: fine-tune the weight vectors.
Must last at least 500 times the number of neurons in the network: thousands or tens of thousands of iterations.
Choice of parameter values:
$\eta(n)$ maintained on the order of 0.01.
Neighborhood function such that the neighborhood of the winning neuron contains only the nearest neighbors. It eventually reduces to one or zero neighboring neurons.
Another Self-Organizing Map (SOM) Example, from Fausett (1994)
n = 4, m = 2
More typical of SOM application
Smaller number of units in output than in input; dimensionality reduction
Training samples
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)
(Figure: network architecture with 4 input units fully connected to output units 1 and 2.)
What should we expect as outputs?
What are the Euclidean Distances Between the Data Samples?
Training samples
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)

      i1   i2   i3   i4
i1    0
i2         0
i3              0
i4                   0
Euclidean Distances Between Data Samples
Training samples
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)

d^2   i1   i2   i3   i4
i1    0
i2    3    0
i3    1    2    0
i4    4    1    3    0

(The entries are squared Euclidean distances.)
What might we expect from the SOM?
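A quick numeric check of the table in Python:

```python
import numpy as np

samples = np.array([[1, 1, 0, 0],   # i1
                    [0, 0, 0, 1],   # i2
                    [1, 0, 0, 0],   # i3
                    [0, 0, 1, 1]])  # i4
# pairwise squared Euclidean distances
d2 = np.sum((samples[:, None, :] - samples[None, :, :]) ** 2, axis=2)
print(d2)
# [[0 3 1 4]
#  [3 0 2 1]
#  [1 2 0 3]
#  [4 1 3 0]]
```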
Example Details
Training samples
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)
With only 2 outputs, neighborhood = 0
Only update weights associated with the winning output unit (cluster) at each iteration
Learning rate $\eta(t) = 0.6$ during the first epoch
First Weight Update
Training sample: i1 = (1, 1, 0, 0)
Unit 1 weights: $d^2 = (.2-1)^2 + (.6-1)^2 + (.5-0)^2 + (.9-0)^2 = 1.86$
Unit 2 weights: $d^2 = (.8-1)^2 + (.4-1)^2 + (.7-0)^2 + (.3-0)^2 = .98$
Unit 2 wins, so the weights on the winning unit are updated:
new unit 2 weights $= [.8\ .4\ .7\ .3] + 0.6\,([1\ 1\ 0\ 0] - [.8\ .4\ .7\ .3]) = [.92\ .76\ .28\ .12]$
Giving an updated weight matrix:
Unit 1: [.2 .6 .5 .9]
Unit 2: [.92 .76 .28 .12]
Second Weight Update
Training sample: i2 = (0, 0, 0, 1)
Unit 1 weights: $d^2 = (.2-0)^2 + (.6-0)^2 + (.5-0)^2 + (.9-1)^2 = .66$
Unit 2 weights: $d^2 = (.92-0)^2 + (.76-0)^2 + (.28-0)^2 + (.12-1)^2 = 2.28$
Unit 1 wins, so the weights on the winning unit are updated:
new unit 1 weights $= [.2\ .6\ .5\ .9] + 0.6\,([0\ 0\ 0\ 1] - [.2\ .6\ .5\ .9]) = [.08\ .24\ .20\ .96]$
Giving an updated weight matrix:
Unit 1: [.08 .24 .20 .96]
Unit 2: [.92 .76 .28 .12]
Third Weight Update
Training sample: i3 = (1, 0, 0, 0)
Unit 1 weights: $d^2 = (.08-1)^2 + (.24-0)^2 + (.2-0)^2 + (.96-0)^2 = 1.87$
Unit 2 weights: $d^2 = (.92-1)^2 + (.76-0)^2 + (.28-0)^2 + (.12-0)^2 = 0.68$
Unit 2 wins, so the weights on the winning unit are updated:
new unit 2 weights $= [.92\ .76\ .28\ .12] + 0.6\,([1\ 0\ 0\ 0] - [.92\ .76\ .28\ .12]) = [.97\ .30\ .11\ .05]$
Giving an updated weight matrix:
Unit 1: [.08 .24 .20 .96]
Unit 2: [.97 .30 .11 .05]
Fourth Weight Update
Training sample: i4 = (0, 0, 1, 1)
Unit 1 weights: $d^2 = (.08-0)^2 + (.24-0)^2 + (.2-1)^2 + (.96-1)^2 = .71$
Unit 2 weights: $d^2 = (.97-0)^2 + (.30-0)^2 + (.11-1)^2 + (.05-1)^2 = 2.74$
Unit 1 wins, so the weights on the winning unit are updated:
new unit 1 weights $= [.08\ .24\ .20\ .96] + 0.6\,([0\ 0\ 1\ 1] - [.08\ .24\ .20\ .96]) = [.03\ .10\ .68\ .98]$
Giving an updated weight matrix:
Unit 1: [.03 .10 .68 .98]
Unit 2: [.97 .30 .11 .05]
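A short sketch that reproduces the four updates above for this Fausett example (learning rate 0.6, neighbourhood radius 0):

```python
import numpy as np

X = np.array([[1, 1, 0, 0],    # i1
              [0, 0, 0, 1],    # i2
              [1, 0, 0, 0],    # i3
              [0, 0, 1, 1.]])  # i4
W = np.array([[0.2, 0.6, 0.5, 0.9],   # unit 1
              [0.8, 0.4, 0.7, 0.3]])  # unit 2
eta = 0.6

for x in X:
    d2 = np.sum((W - x) ** 2, axis=1)   # squared distances to each unit
    winner = int(np.argmin(d2))
    W[winner] += eta * (x - W[winner])  # move only the winning unit
    print("d^2 =", d2.round(2), "-> unit", winner + 1, "wins")
print("weights after the first epoch:\n", W.round(2))
```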
Applying the SOM Algorithm

time (t)   data sample utilized   winning output unit   D(t)   eta(t)
   1              i1                    Unit 2            0     0.6
   2              i2                    Unit 1            0     0.6
   3              i3                    Unit 2            0     0.6
   4              i4                    Unit 1            0     0.6

After many iterations (epochs) through the data set, the weights converge to:
Unit 1: (0, 0, .5, 1.0)
Unit 2: (1, .5, 0, 0)
Did we get the clustering that we expected?
What clusters do the data samples fall into?
Weights
Unit 1: (0, 0, .5, 1.0)
Unit 2: (1, .5, 0, 0)
Training samples
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)
Solution
Using $d^2 = (\text{Euclidean distance})^2 = \sum_{k=1}^{n} (i_{l,k} - w_{k,j}(t))^2$ with weights Unit 1: (0, 0, .5, 1.0) and Unit 2: (1, .5, 0, 0):
Sample i1 = (1, 1, 0, 0)
Distance from unit 1 weights: $(1-0)^2 + (1-0)^2 + (0-.5)^2 + (0-1.0)^2 = 1 + 1 + .25 + 1 = 3.25$
Distance from unit 2 weights: $(1-1)^2 + (1-.5)^2 + (0-0)^2 + (0-0)^2 = 0 + .25 + 0 + 0 = .25$ (winner)
Sample i2 = (0, 0, 0, 1)
Distance from unit 1 weights: $(0-0)^2 + (0-0)^2 + (0-.5)^2 + (1-1.0)^2 = 0 + 0 + .25 + 0 = .25$ (winner)
Distance from unit 2 weights: $(0-1)^2 + (0-.5)^2 + (0-0)^2 + (1-0)^2 = 1 + .25 + 0 + 1 = 2.25$
Solution
Sample i3 = (1, 0, 0, 0)
Distance from unit 1 weights: $(1-0)^2 + (0-0)^2 + (0-.5)^2 + (0-1.0)^2 = 1 + 0 + .25 + 1 = 2.25$
Distance from unit 2 weights: $(1-1)^2 + (0-.5)^2 + (0-0)^2 + (0-0)^2 = 0 + .25 + 0 + 0 = .25$ (winner)
Sample i4 = (0, 0, 1, 1)
Distance from unit 1 weights: $(0-0)^2 + (0-0)^2 + (1-.5)^2 + (1-1.0)^2 = 0 + 0 + .25 + 0 = .25$ (winner)
Distance from unit 2 weights: $(0-1)^2 + (0-.5)^2 + (1-0)^2 + (1-0)^2 = 1 + .25 + 1 + 1 = 3.25$
So unit 1 represents the cluster {i2, i4} and unit 2 represents the cluster {i1, i3}.
Word categories
Examples of Applications
Kohonen (1984): speech recognition - a map of phonemes in the Finnish language
Optical character recognition - clustering of letters of different fonts
Angeniol et al. (1988): travelling salesman problem (an optimization problem)
Kohonen (1990): learning vector quantization (pattern classification problem)
Ritter & Kohonen (1989): semantic maps
Summary
Unsupervised learning is very common
Unsupervised learning requires redundancy in the stimuli
Self-organization is a basic property of the brain's computational structure
SOMs are based on competition (winner-take-all units), cooperation, and synaptic adaptation
SOMs conserve topological relationships between the stimuli
Artificial SOMs have many applications in computational neuroscience
End of slides