Self-Organizing Map (SOM)
The self-organizing map (SOM) is a method for unsupervised learning, based on a grid of artificial neurons whose weights are adapted to match input vectors in a training set.
It was first described by the Finnish professor Teuvo Kohonen and is thus sometimes referred to as a Kohonen map.
SOM is one of the most popular neural computation methods in use, and several thousand scientific articles have been written about it. SOM is especially good at producing visualizations of high-dimensional data.
Self-Organizing Maps (SOM)
SOM is an unsupervised neural network technique that approximates an unlimited number of input data by a finite set of models arranged in a grid, where neighbouring nodes correspond to more similar models.
The models are produced by a learning algorithm that automatically orders them on the two-dimensional grid according to their mutual similarity.
Brain's self-organization
The brain maps the external multidimensional representation of the world into a similar 1- or 2-dimensional internal representation. That is, the brain processes the external signals in a topology-preserving way.
Mimicking the way the brain learns, our system should be able to do the same thing.
Why SOM ?
Unsupervised Learning
Clustering
Classification
Monitoring
Data Visualization
Potential for combining SOM with other neural networks (MLP, RBF)
Concept of the SOM
(Figure: the input space (input layer, with variables such as Mn, Sr, Ba and samples s1, s2) is mapped onto a reduced feature space (map layer).)
Clustering and ordering of the cluster centres in a two-dimensional grid: the cluster centres (code vectors) are given a place in the reduced space.
Network Architecture
Two layers of units
Input: n units (length of training vectors)
Output: m units (number of categories)
Input units fully connected with weights to output units
Intralayer (lateral) connections
Within output layer
Defined according to some topology
Not weights, but used in algorithm for updating weights
SOM - Architecture
A lattice of neurons (nodes) accepts and responds to a set of input signals
Responses are compared; the winning neuron is selected from the lattice
The selected neuron is activated together with its neighbourhood neurons
An adaptive process changes the weights to more closely match the inputs
(Figure: a 2-D array of neurons receives a set of input signals x1, x2, x3, ..., xn through the weights wj1, wj2, wj3, ..., wjn of neuron j.)
Measuring distances between nodes
Distances between output neurons will be used in the learning process.
It may be based upon:
a) Rectangular lattice
b) Hexagonal lattice
Let d(i,j) be the distance between the output nodes i, j
d(i,j) = 1 if node j is in the first outer rectangle/hexagon of node i
d(i,j) = 2 if node j is in the second outer rectangle/hexagon of node i
And so on...
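As a concrete sketch of this distance for the rectangular lattice, assuming each node is identified by its (row, column) grid coordinates, the k-th outer rectangle corresponds to a Chebyshev distance of k (illustrative Python, not from the lecture):

```python
def lattice_distance(pos_i, pos_j):
    """d(i, j) on a rectangular lattice: 0 for the node itself, 1 for the
    first outer rectangle, 2 for the second, and so on (Chebyshev distance)."""
    return max(abs(pos_i[0] - pos_j[0]), abs(pos_i[1] - pos_j[1]))

# Example: node at (2, 2) and node at (3, 4) lie two rectangles apart.
print(lattice_distance((2, 2), (3, 4)))  # 2
```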
Each neuron is a node containing a template against
which input patterns are matched.
All nodes are presented with the same input pattern in parallel and compute the distance between their template and the input in parallel.
Only the node with the closest match between the input
and its template produces an active output.
Each Node therefore acts like a separate decoder (or
pattern detector, feature detector) for the same input and
the interpretation of the input derives from the presence
or absence of an active response at each location
(rather than the magnitude of response or an input-output
transformation as in feedforward or feedback networks).
SOM: interpretation
Each SOM neuron can be seen as representing a cluster containing all the input examples which are mapped to that neuron.
For a given input, the output of the SOM is the neuron with the weight vector most similar (with respect to Euclidean distance) to that input.
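A minimal sketch of this winner selection in Python (the array names weights and x are illustrative, not from the lecture):

```python
import numpy as np

def find_winner(weights, x):
    """weights: (m, n) array, one row per neuron; x: input vector of length n.
    Returns the index of the neuron whose weight vector is closest to x."""
    distances = np.linalg.norm(weights - x, axis=1)  # Euclidean distances
    return int(np.argmin(distances))

weights = np.array([[0.2, 0.6, 0.5, 0.9],
                    [0.8, 0.4, 0.7, 0.3]])
x = np.array([1.0, 1.0, 0.0, 0.0])
print(find_winner(weights, x))  # 1: the second neuron is closer to x
```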
Types of Mapping
Familiarity: the net learns how similar a given new input is to the typical (average) pattern it has seen before
PCA: the net finds Principal Components in the data
Clustering: the net finds the appropriate categories based on correlations in the data
Encoding: the output represents the input, using a smaller number of bits
Feature Mapping: the net forms a topographic map of the input
Possible Applications
Familiarity and PCA can be used to analyze unknown data
PCA is used for dimension reduction
Encoding is used for vector quantization
Clustering is applied to any type of data
Feature mapping is important for dimension reduction and for functionality (as in the brain)
Simple Models
Network has inputs and outputs
There is no feedback from the environment: no supervision
The network updates the weights following some learning rule, and finds patterns, features or categories within the inputs presented to the network
Unsupervised Learning
In unsupervised competitive learning the neurons take part in some competition for each input. The winner of the competition, and sometimes some other neurons, are allowed to change their weights.
In simple competitive learning only the winner is allowed to learn (change its weights).
In self-organizing maps other neurons in the neighborhood of the winner may also learn.
Simple Competitive Learning
(Figure: a single layer of P output neurons Y1, Y2, ..., YP, fully connected to N input units x1, x2, ..., xN through weights W11, W12, ..., WPN.)
N input units
P output neurons
P x N weights
$h_i = \sum_{j=1}^{N} W_{ij} X_j, \quad i = 1, 2, \ldots, P$
$Y_i = 1$ or $0$
Network Activation
The unit $i^*$ with the highest field $h_i$ fires
$i^*$ is the winner unit
Geometrically, $W_{i^*}$ is the weight vector closest to the current input vector
The winning unit's weight vector $W_{i^*}$ is updated to be even closer to the current input vector
Learning
Starting with small random weights, at each step:
1. a new input vector is presented to the network
2. all fields are calculated to find a winner
3. $W_{i^*}$ is updated to be closer to the input
using the standard competitive learning equation:
$\Delta W_{i^* j} = \eta \,(X_j - W_{i^* j})$
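A minimal sketch of one such step in Python, assuming the winner is the unit whose weight vector is closest to the input (geometrically, the unit with the highest field for normalised weights); names are illustrative:

```python
import numpy as np

def competitive_step(W, x, eta=0.1):
    """W: (P, N) weight matrix; x: input of length N; eta: learning rate.
    Only the winning unit's weight vector moves toward the input."""
    winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # find i*
    W[winner] += eta * (x - W[winner])  # Delta W_{i*j} = eta (X_j - W_{i*j})
    return winner
```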
Result
Each output unit moves to the center of
mass of a cluster of input vectors
clustering
Competitive Learning, Cont'd
It is important to break the symmetry in the initial random weights
The final configuration depends on the initialization
A winning unit has more chances of winning the next time a similar input is seen
Some outputs may never fire. This can be compensated for by updating the non-winning units with a smaller update
More about SOM learning
Upon repeated presentations of the training examples, the weight vectors of the neurons tend to follow the distribution of the examples.
This results in a topological ordering of the
neurons, where neurons adjacent to each other
tend to have similar weight vectors.
The input space of patterns is mapped onto a
discrete output space of neurons.
SOM Learning Algorithm
1. Randomly initialise all weights
2. Select input vector x = [x1, x2, x3, ..., xn] from the training set
3. Compare x with the weights wj of each neuron j to determine the winner
4. Find the unit j with the minimum distance: $d_j = \sum_i (w_{ij} - x_i)^2$
5. Update the winner so that it becomes more like x, together with the winner's neighbours (units within the radius), according to $w_{ij}(n+1) = w_{ij}(n) + \eta(n)\,[x_i - w_{ij}(n)]$
6. Adjust parameters: learning rate & neighbourhood function
7. Repeat from (2) until convergence
Note that the learning rate generally decreases with time: $0 \le \eta(n) \le \eta(n-1) \le 1$
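A minimal sketch of steps 2-7 for a single input vector, assuming a 2-D grid whose (row, column) neuron positions are stored in positions; variable names are illustrative:

```python
import numpy as np

def som_step(weights, positions, x, eta, radius):
    """weights: (m, n) weight matrix; positions: (m, 2) grid coordinates;
    x: input vector (n,). Updates the winner and all units within `radius`."""
    d = np.sum((weights - x) ** 2, axis=1)          # d_j = sum_i (w_ij - x_i)^2
    winner = int(np.argmin(d))                      # winning unit
    grid_dist = np.max(np.abs(positions - positions[winner]), axis=1)
    for j in np.where(grid_dist <= radius)[0]:      # winner plus neighbours
        weights[j] += eta * (x - weights[j])        # w_ij(n+1) = w_ij(n) + eta [x_i - w_ij(n)]
    return winner
```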
Example
An SOFM network with three inputs and two cluster units is to be trained using the four training vectors:
[0.8 0.7 0.4], [0.6 0.9 0.9], [0.3 0.4 0.1], [0.1 0.1 0.2]
and initial weights (each column holds the weights to one cluster unit; the weights to the first cluster unit are 0.5, 0.6, 0.8):
$W = \begin{pmatrix} 0.5 & 0.4 \\ 0.6 & 0.2 \\ 0.8 & 0.5 \end{pmatrix}$
The initial radius is 0 and the learning rate is 0.5. Calculate the weight changes during the first cycle through the data, taking the training vectors in the given order.
Solution
The squared Euclidean distance of input vector 1 from cluster unit 1 is:
$d_1^2 = (0.5 - 0.8)^2 + (0.6 - 0.7)^2 + (0.8 - 0.4)^2 = 0.26$
The squared Euclidean distance of input vector 1 from cluster unit 2 is:
$d_2^2 = (0.4 - 0.8)^2 + (0.2 - 0.7)^2 + (0.5 - 0.4)^2 = 0.42$
Input vector 1 is closest to cluster unit 1, so update the weights to cluster unit 1 using $w_{ij}(n+1) = w_{ij}(n) + 0.5\,[x_i - w_{ij}(n)]$:
$0.65 = 0.5 + 0.5\,(0.8 - 0.5)$
$0.65 = 0.6 + 0.5\,(0.7 - 0.6)$
$0.60 = 0.8 + 0.5\,(0.4 - 0.8)$
Updated weights: to cluster unit 1: (0.65, 0.65, 0.60); to cluster unit 2 (unchanged): (0.4, 0.2, 0.5)
Solution
The squared Euclidean distance of input vector 2 from cluster unit 1 is:
$d_1^2 = (0.65 - 0.6)^2 + (0.65 - 0.9)^2 + (0.60 - 0.9)^2 = 0.155$
The squared Euclidean distance of input vector 2 from cluster unit 2 is:
$d_2^2 = (0.4 - 0.6)^2 + (0.2 - 0.9)^2 + (0.5 - 0.9)^2 = 0.69$
Input vector 2 is closest to cluster unit 1, so update the weights to cluster unit 1 again:
$0.625 = 0.65 + 0.5\,(0.6 - 0.65)$
$0.775 = 0.65 + 0.5\,(0.9 - 0.65)$
$0.750 = 0.60 + 0.5\,(0.9 - 0.60)$
Updated weights: to cluster unit 1: (0.625, 0.775, 0.750); to cluster unit 2 (unchanged): (0.4, 0.2, 0.5)
Repeat the same update procedure for input vectors 3 and 4.
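The following short script (a sketch written for this lecture example) reproduces the calculations above and carries on through training vectors 3 and 4:

```python
import numpy as np

X = np.array([[0.8, 0.7, 0.4],   # training vectors, in the given order
              [0.6, 0.9, 0.9],
              [0.3, 0.4, 0.1],
              [0.1, 0.1, 0.2]])
W = np.array([[0.5, 0.6, 0.8],   # weights to cluster unit 1
              [0.4, 0.2, 0.5]])  # weights to cluster unit 2
eta, radius = 0.5, 0             # learning rate 0.5, radius 0

for x in X:
    d = np.sum((W - x) ** 2, axis=1)    # squared Euclidean distances
    winner = int(np.argmin(d))
    W[winner] += eta * (x - W[winner])  # update only the winning unit
    print("d^2 =", d.round(3), "-> unit", winner + 1, "wins; W =", W.round(3).tolist())
```

The first two iterations print d^2 = [0.26, 0.42] and [0.155, 0.69], matching the hand calculations above.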
Neighborhood Function
Gaussian neighborhood function:
$h_{j,i} = \exp\!\left(-\dfrac{d_{ji}^2}{2\sigma^2}\right)$
$d_{ji}$: lateral distance of neurons $i$ and $j$
in a 1-dimensional lattice: $|j - i|$
in a 2-dimensional lattice: $\|r_j - r_i\|$, where $r_j$ is the position of neuron $j$ in the lattice.
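A minimal sketch of this neighborhood function for a 2-D lattice, where r_j is the (row, column) position of neuron j (illustrative names, not from the lecture):

```python
import numpy as np

def gaussian_neighbourhood(r_j, r_i, sigma):
    """h_{j,i} = exp(-d_{ji}^2 / (2 sigma^2)) with d_{ji} = ||r_j - r_i||."""
    d2 = np.sum((np.asarray(r_j, float) - np.asarray(r_i, float)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

print(gaussian_neighbourhood((3, 4), (3, 3), sigma=1.0))  # ~0.61, one grid step away
```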
(Figure: neighbourhood sets N13(1) and N13(2) around neuron 13.)
Neighborhood Function
$\sigma$ measures the degree to which excited neurons in the vicinity of the winning neuron cooperate in the learning process.
In the learning algorithm $\sigma$ is updated at each iteration during the ordering phase using the following exponential decay update rule, with parameters $\sigma_0$ and $T_1$:
$\sigma(n) = \sigma_0 \exp\!\left(-\dfrac{n}{T_1}\right)$
Neighbourhood function
(Figure: the degree of neighbourhood plotted against distance from the winner; as time increases, the neighbourhood function becomes narrower.)
UPDATE RULE
$w_j(n+1) = w_j(n) + \eta(n)\, h_{j,i(x)}(n)\, [x - w_j(n)]$
Exponential decay update of the learning rate:
$\eta(n) = \eta_0 \exp\!\left(-\dfrac{n}{T_2}\right)$
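Putting the update rule together with the two decay schedules, a minimal sketch (parameter values are illustrative placeholders, not prescribed by the lecture):

```python
import numpy as np

def schedules(n, eta0=0.1, T2=1000.0, sigma0=5.0, T1=1000.0):
    """Exponential decay of the learning rate and of the neighbourhood width."""
    return eta0 * np.exp(-n / T2), sigma0 * np.exp(-n / T1)

def som_update(weights, positions, x, n):
    """w_j(n+1) = w_j(n) + eta(n) * h_{j,i(x)}(n) * (x - w_j(n))."""
    eta, sigma = schedules(n)
    winner = int(np.argmin(np.sum((weights - x) ** 2, axis=1)))   # i(x)
    d2 = np.sum((positions - positions[winner]) ** 2, axis=1)     # squared lattice distances
    h = np.exp(-d2 / (2.0 * sigma ** 2))                          # Gaussian neighbourhood
    weights += eta * h[:, None] * (x - weights)
    return winner
```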
Illustration of learning for Kohonen maps
Inputs: coordinates (x, y) of points drawn from a square
Display neuron j at the position (xj, yj) where its response sj is maximum
(Figure: map positions after random initialisation, then after 100, 200, and 1000 inputs.)
Two-phase learning approach
Self-organizing or ordering phase. The learning rateand spread of the Gaussian neighborhood function
are adapted during the execution of SOM, using for
instance the exponential decay update rule.
Convergence phase. The learning rate and Gaussian
spread have small fixed values during the execution
of SOM.
Ordering Phase
Self-organizing or ordering phase: topological ordering of the weight vectors.
May take 1000 or more iterations of the SOM algorithm.
The choice of parameter values is important. For instance:
$\eta(n)$: $\eta_0 = 0.1$, $T_2 = 1000$; decrease gradually, keeping $\eta(n) \ge 0.01$
$h_{ji(x)}(n)$: $\sigma_0$ big enough, $T_1 = \dfrac{1000}{\log \sigma(0)}$
With this parameter setting, initially the neighborhood of the winning neuron includes almost all neurons in the network; then it shrinks slowly with time.
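A small sketch of these schedules (sigma0 = 10 is an illustrative "big enough" choice, not from the lecture); note that sigma(n) shrinks to about 1 at n = 1000:

```python
import numpy as np

sigma0 = 10.0                     # illustrative "big enough" initial spread
T1 = 1000.0 / np.log(sigma0)      # T1 = 1000 / log(sigma(0))
eta0, T2 = 0.1, 1000.0

for n in (0, 250, 500, 1000):
    eta = eta0 * np.exp(-n / T2)      # learning rate schedule
    sigma = sigma0 * np.exp(-n / T1)  # neighbourhood width schedule
    print(f"n={n:4d}  eta={eta:.3f}  sigma={sigma:.2f}")
```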
Convergence Phase
Convergence phase: fine-tune the weight vectors.
Must last at least 500 times the number of neurons in the network: thousands or tens of thousands of iterations.
Choice of parameter values:
$\eta(n)$ maintained on the order of 0.01.
Neighborhood function such that the neighborhood of the winning neuron contains only the nearest neighbors. It eventually reduces to one or zero neighboring neurons.
Another Self-Organizing Map (SOM) Example, from Fausett (1994)
n = 4, m = 2
More typical of SOM application
Smaller number of units in output than in input; dimensionality reduction
Training samples
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)
(Figure: network architecture with 4 input units fully connected to output units 1 and 2.)
What should we expect as outputs?
What are the Euclidean Distances Between the Data Samples?
Training samples
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)

      i1   i2   i3   i4
i1    0
i2         0
i3              0
i4                   0
Euclidean Distances Between Data Samples
Training samples
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)

d^2   i1   i2   i3   i4
i1    0
i2    3    0
i3    1    2    0
i4    4    1    3    0

(The entries are squared Euclidean distances.)
What might we expect from the SOM?
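A quick numeric check of the table in Python:

```python
import numpy as np

samples = np.array([[1, 1, 0, 0],   # i1
                    [0, 0, 0, 1],   # i2
                    [1, 0, 0, 0],   # i3
                    [0, 0, 1, 1]])  # i4
# pairwise squared Euclidean distances
d2 = np.sum((samples[:, None, :] - samples[None, :, :]) ** 2, axis=2)
print(d2)
# [[0 3 1 4]
#  [3 0 2 1]
#  [1 2 0 3]
#  [4 1 3 0]]
```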
Example Details
Training samples
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)
With only 2 outputs, neighborhood = 0
Only update weights associated with the winning output unit (cluster) at each iteration
Learning rate $\eta(t) = 0.6$ during the first epoch
First Weight Update
Training sample: i1 = (1, 1, 0, 0)
Unit 1 weights: $d^2 = (.2-1)^2 + (.6-1)^2 + (.5-0)^2 + (.9-0)^2 = 1.86$
Unit 2 weights: $d^2 = (.8-1)^2 + (.4-1)^2 + (.7-0)^2 + (.3-0)^2 = .98$
Unit 2 wins, so the weights on the winning unit are updated:
new unit 2 weights $= [.8\ .4\ .7\ .3] + 0.6\,([1\ 1\ 0\ 0] - [.8\ .4\ .7\ .3]) = [.92\ .76\ .28\ .12]$
Giving an updated weight matrix:
Unit 1: [.2 .6 .5 .9]
Unit 2: [.92 .76 .28 .12]
Second Weight Update
Training sample: i2 = (0, 0, 0, 1)
Unit 1 weights: $d^2 = (.2-0)^2 + (.6-0)^2 + (.5-0)^2 + (.9-1)^2 = .66$
Unit 2 weights: $d^2 = (.92-0)^2 + (.76-0)^2 + (.28-0)^2 + (.12-1)^2 = 2.28$
Unit 1 wins, so the weights on the winning unit are updated:
new unit 1 weights $= [.2\ .6\ .5\ .9] + 0.6\,([0\ 0\ 0\ 1] - [.2\ .6\ .5\ .9]) = [.08\ .24\ .20\ .96]$
Giving an updated weight matrix:
Unit 1: [.08 .24 .20 .96]
Unit 2: [.92 .76 .28 .12]
Third Weight Update
Training sample: i3 = (1, 0, 0, 0)
Unit 1 weights: $d^2 = (.08-1)^2 + (.24-0)^2 + (.2-0)^2 + (.96-0)^2 = 1.87$
Unit 2 weights: $d^2 = (.92-1)^2 + (.76-0)^2 + (.28-0)^2 + (.12-0)^2 = 0.68$
Unit 2 wins, so the weights on the winning unit are updated:
new unit 2 weights $= [.92\ .76\ .28\ .12] + 0.6\,([1\ 0\ 0\ 0] - [.92\ .76\ .28\ .12]) = [.97\ .30\ .11\ .05]$
Giving an updated weight matrix:
Unit 1: [.08 .24 .20 .96]
Unit 2: [.97 .30 .11 .05]
Fourth Weight Update
Training sample: i4 = (0, 0, 1, 1)
Unit 1 weights: $d^2 = (.08-0)^2 + (.24-0)^2 + (.2-1)^2 + (.96-1)^2 = .71$
Unit 2 weights: $d^2 = (.97-0)^2 + (.30-0)^2 + (.11-1)^2 + (.05-1)^2 = 2.74$
Unit 1 wins, so the weights on the winning unit are updated:
new unit 1 weights $= [.08\ .24\ .20\ .96] + 0.6\,([0\ 0\ 1\ 1] - [.08\ .24\ .20\ .96]) = [.03\ .10\ .68\ .98]$
Giving an updated weight matrix:
Unit 1: [.03 .10 .68 .98]
Unit 2: [.97 .30 .11 .05]
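A short sketch that reproduces the four updates above for this Fausett example (learning rate 0.6, neighbourhood radius 0):

```python
import numpy as np

X = np.array([[1, 1, 0, 0],    # i1
              [0, 0, 0, 1],    # i2
              [1, 0, 0, 0],    # i3
              [0, 0, 1, 1.]])  # i4
W = np.array([[0.2, 0.6, 0.5, 0.9],   # unit 1
              [0.8, 0.4, 0.7, 0.3]])  # unit 2
eta = 0.6

for x in X:
    d2 = np.sum((W - x) ** 2, axis=1)   # squared distances to each unit
    winner = int(np.argmin(d2))
    W[winner] += eta * (x - W[winner])  # move only the winning unit
    print("d^2 =", d2.round(2), "-> unit", winner + 1, "wins")
print("weights after the first epoch:\n", W.round(2))
```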
Applying the SOM Algorithm

time (t)   data sample utilized   winning output unit   D(t)   eta(t)
   1              i1                    Unit 2            0     0.6
   2              i2                    Unit 1            0     0.6
   3              i3                    Unit 2            0     0.6
   4              i4                    Unit 1            0     0.6

After many iterations (epochs) through the data set, the weights converge to:
Unit 1: (0, 0, .5, 1.0)
Unit 2: (1, .5, 0, 0)
Did we get the clustering that we expected?
What clusters do the data samples fall into?
Weights
Unit 1: (0, 0, .5, 1.0)
Unit 2: (1, .5, 0, 0)
Training samples
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)
Solution
Using $d^2 = (\text{Euclidean distance})^2 = \sum_{k=1}^{n} (i_{l,k} - w_{k,j}(t))^2$ with weights Unit 1: (0, 0, .5, 1.0) and Unit 2: (1, .5, 0, 0):
Sample i1 = (1, 1, 0, 0)
Distance from unit 1 weights: $(1-0)^2 + (1-0)^2 + (0-.5)^2 + (0-1.0)^2 = 1 + 1 + .25 + 1 = 3.25$
Distance from unit 2 weights: $(1-1)^2 + (1-.5)^2 + (0-0)^2 + (0-0)^2 = 0 + .25 + 0 + 0 = .25$ (winner)
Sample i2 = (0, 0, 0, 1)
Distance from unit 1 weights: $(0-0)^2 + (0-0)^2 + (0-.5)^2 + (1-1.0)^2 = 0 + 0 + .25 + 0 = .25$ (winner)
Distance from unit 2 weights: $(0-1)^2 + (0-.5)^2 + (0-0)^2 + (1-0)^2 = 1 + .25 + 0 + 1 = 2.25$
Solution
Sample i3 = (1, 0, 0, 0)
Distance from unit 1 weights: $(1-0)^2 + (0-0)^2 + (0-.5)^2 + (0-1.0)^2 = 1 + 0 + .25 + 1 = 2.25$
Distance from unit 2 weights: $(1-1)^2 + (0-.5)^2 + (0-0)^2 + (0-0)^2 = 0 + .25 + 0 + 0 = .25$ (winner)
Sample i4 = (0, 0, 1, 1)
Distance from unit 1 weights: $(0-0)^2 + (0-0)^2 + (1-.5)^2 + (1-1.0)^2 = 0 + 0 + .25 + 0 = .25$ (winner)
Distance from unit 2 weights: $(0-1)^2 + (0-.5)^2 + (1-0)^2 + (1-0)^2 = 1 + .25 + 1 + 1 = 3.25$
So unit 1 represents the cluster {i2, i4} and unit 2 represents the cluster {i1, i3}.
Word categories
Examples of Applications
Kohonen (1984): speech recognition - a map of phonemes in the Finnish language
Optical character recognition - clustering of letters of different fonts
Angeniol et al. (1988): travelling salesman problem (an optimization problem)
Kohonen (1990): learning vector quantization (pattern classification problem)
Ritter & Kohonen (1989): semantic maps
Summary
Unsupervised learning is very common
Unsupervised learning requires redundancy in the stimuli
Self-organization is a basic property of the brain's computational structure
SOMs are based on competition (winner-take-all units), cooperation, and synaptic adaptation
SOMs conserve topological relationships between the stimuli
Artificial SOMs have many applications in computational neuroscience
End of slides