Lec-7-SOM



    Self-Organizing Map (SOM)

    The self-organizing map (SOM) is a method for unsupervised learning, based on a grid of artificial neurons whose weights are adapted to match input vectors in a training set.

    It was first described by the Finnish professor Teuvo Kohonen and is thus sometimes referred to as a Kohonen map.

    SOM is one of the most popular neural computation methods in use, and several thousand scientific articles have been written about it. SOM is especially good at producing visualizations of high-dimensional data.


    Self-Organizing Maps (SOM)

    SOM is an unsupervised neural network technique that approximates an unlimited number of input data by a finite set of models arranged in a grid, where neighbouring nodes correspond to more similar models.

    The models are produced by a learning algorithm that automatically orders them on the two-dimensional grid according to their mutual similarity.


    Brain's self-organization

    The brain maps the external multidimensional representation of the world into a similar one- or two-dimensional internal representation. That is, the brain processes the external signals in a topology-preserving way.

    Mimicking the way the brain learns, our system should be able to do the same thing.


    Why SOM ?

    Unsupervised Learning

    Clustering

    Classification

    Monitoring

    Data Visualization

    Potential for combination of SOM with other neural networks (MLP, RBF)


    Concept of the SOM

    Input space (input layer) is mapped onto a reduced feature space (map layer).

    [Figure: input vectors with components such as s1, s2, Mn, Sr, Ba in the input space are projected onto a two-dimensional map layer.]

    Clustering and ordering of the cluster centers in a two-dimensional grid: the cluster centers (code vectors), and the place of these code vectors in the reduced space.


    Network Architecture

    Two layers of units

    Input: n units (length of training vectors)

    Output: m units (number of categories)

    Input units fully connected with weights to output units

    Intralayer (lateral) connections

    Within output layer

    Defined according to some topology

    Not weights, but used in algorithm for updating weights


    SOM - Architecture

    Lattice of neurons (nodes) accepts and responds to a set of input signals

    Responses are compared; the winning neuron is selected from the lattice

    The selected neuron is activated together with its neighbourhood neurons

    The adaptive process changes the weights to more closely match the inputs

    [Figure: a 2-d array of neurons receiving a set of input signals x1, x2, x3, ..., xn; each neuron j is connected to the inputs through weights wj1, wj2, wj3, ..., wjn.]


    Measuring distances between nodes

    Distances between output neurons will be used in the learning process.

    It may be based upon:

    a) Rectangular lattice

    b) Hexagonal lattice

    Let d(i,j) be the distance between the output nodes i, j:

    d(i,j) = 1 if node j is in the first outer rectangle/hexagon of node i

    d(i,j) = 2 if node j is in the second outer rectangle/hexagon of node i

    And so on...
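    As an illustration, the ring distance above can be sketched in a few lines of Python (a minimal sketch assuming a rectangular lattice indexed by (row, col); on such a lattice the ring number equals the Chebyshev distance):

        # Minimal sketch: ring distance d(i, j) on a rectangular lattice.
        # On a rectangular grid the k-th outer rectangle of node i is exactly the set of
        # nodes at Chebyshev (maximum-coordinate) distance k from i.
        def lattice_distance(pos_i, pos_j):
            """pos_i, pos_j: (row, col) coordinates of two output nodes."""
            return max(abs(pos_i[0] - pos_j[0]), abs(pos_i[1] - pos_j[1]))

        print(lattice_distance((2, 2), (1, 3)))  # 1: first outer rectangle of node (2, 2)
        print(lattice_distance((2, 2), (0, 2)))  # 2: second outer rectangle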


    Each neuron is a node containing a template against which input patterns are matched.

    All nodes are presented with the same input pattern in parallel and compute the distance between their template and the input in parallel.

    Only the node with the closest match between the input and its template produces an active output.

    Each node therefore acts like a separate decoder (or pattern detector, feature detector) for the same input, and the interpretation of the input derives from the presence or absence of an active response at each location (rather than the magnitude of response or an input-output transformation as in feedforward or feedback networks).


    SOM: interpretation

    Each SOM neuron can be seen as representing a cluster containing all the input examples which are mapped to that neuron.

    For a given input, the output of the SOM is the neuron with the weight vector most similar (with respect to Euclidean distance) to that input.
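    A minimal sketch of this in Python/NumPy (names such as best_matching_unit are illustrative, not from the slides): the SOM output for an input is the index of the neuron whose weight vector is nearest in Euclidean distance.

        import numpy as np

        def best_matching_unit(weights, x):
            """weights: (m, n) array with one weight vector per SOM neuron; x: (n,) input.
            Returns the index of the neuron closest to x in Euclidean distance."""
            distances = np.linalg.norm(weights - x, axis=1)
            return int(np.argmin(distances))

        # Toy usage: 3 neurons with 2-dimensional weight vectors
        W = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.0]])
        print(best_matching_unit(W, np.array([0.9, 0.8])))   # -> 1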


    Types of Mapping

    Familiarity: the net learns how similar a given new input is to the typical (average) pattern it has seen before

    Principal Component Analysis: the net finds the principal components in the data

    Clustering: the net finds the appropriate categories based on correlations in the data

    Encoding: the output represents the input using a smaller number of bits

    Feature Mapping: the net forms a topographic map of the input


    Possible Applications

    Familiarity and PCA can be used to analyze unknown data

    PCA is used for dimension reduction

    Encoding is used for vector quantization

    Clustering is applied to any type of data

    Feature mapping is important for dimension reduction and for functionality (as in the brain)


    Simple Models

    Network has inputs and outputs

    There is no feedback from the environment: no supervision.

    The network updates the weights following some learning rule, and finds patterns, features or categories within the inputs presented to the network.


    Unsupervised Learning

    In unsupervised competitive learning the neurons take part in some competition for each input. The winner of the competition, and sometimes some other neurons, are allowed to change their weights.

    In simple competitive learning only the winner is allowed to learn (change its weights).

    In self-organizing maps other neurons in the neighborhood of the winner may also learn.


    Simple Competitive Learning

    [Figure: N input units x1, ..., xN fully connected to P output neurons Y1, ..., YP through weights W11, W12, ..., WPN.]

    N input units

    P output neurons

    P x N weights

    $h_i = \sum_{j=1}^{N} W_{ij} x_j$,   $i = 1, 2, \ldots, P$

    $Y_i = 1$ or $0$
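    The activation and winner-take-all output above can be written out as a short Python/NumPy sketch (variable names are illustrative): each output neuron computes its field h_i as a weighted sum of the inputs, and only the unit with the largest field sets Y_i = 1.

        import numpy as np

        def competitive_output(W, x):
            """W: (P, N) weight matrix, x: (N,) input vector.
            Returns the fields h and the binary outputs Y (1 for the winner, 0 otherwise)."""
            h = W @ x                        # h_i = sum_j W_ij x_j
            Y = np.zeros(len(h))
            Y[np.argmax(h)] = 1              # only the unit with the highest field fires
            return h, Y

        W = np.array([[0.2, 0.8], [0.9, 0.1]])        # P = 2 outputs, N = 2 inputs
        h, Y = competitive_output(W, np.array([1.0, 0.0]))
        print(h, Y)                                   # h = [0.2 0.9], Y = [0. 1.]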


    Network Activation

    The unit with the highest field $h_i$ fires.

    $i^*$ is the winner unit.

    Geometrically, its weight vector $W_{i^*}$ is closest to the current input vector.

    The winning unit's weight vector is updated to be even closer to the current input vector.


    Learning

    Starting with small random weights, at each step:

    1. a new input vector is presented to the network

    2. all fields are calculated to find a winner $i^*$

    3. $W_{i^*}$ is updated to be closer to the input

    using the standard competitive learning equation:

    $\Delta W_{i^*j} = \eta\,(x_j - W_{i^*j})$
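    A minimal sketch of one learning step under this rule (Python/NumPy; the field-based winner selection follows the previous slides, and the learning rate value is illustrative):

        import numpy as np

        def competitive_learning_step(W, x, eta=0.1):
            """One step of simple competitive learning. W: (P, N) weights, x: (N,) input."""
            h = W @ x                               # fields of all output units
            winner = int(np.argmax(h))              # winner i*
            W[winner] += eta * (x - W[winner])      # Delta W_{i*j} = eta (x_j - W_{i*j})
            return winner

        rng = np.random.default_rng(0)
        W = rng.uniform(0.0, 0.1, size=(3, 2))          # small random initial weights
        for x in rng.uniform(0.0, 1.0, size=(100, 2)):  # 100 random input vectors
            competitive_learning_step(W, x)
        print(W)   # each row has drifted toward the inputs that unit kept winning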


    Result

    Each output unit moves to the center of mass of a cluster of input vectors: clustering.


    Competitive Learning, Cont'd

    It is important to break the symmetry in the initial random weights.

    The final configuration depends on the initialization.

    A winning unit has more chances of winning the next time a similar input is seen.

    Some outputs may never fire. This can be compensated by updating the non-winning units with a smaller update.


    More about SOM learning

    Upon repeated presentations of the training examples, the weight vectors of the neurons tend to follow the distribution of the examples.

    This results in a topological ordering of the neurons, where neurons adjacent to each other tend to have similar weight vectors.

    The input space of patterns is mapped onto a discrete output space of neurons.


    SOM Learning Algorithm

    1. Randomly initialise all weights

    2. Select input vector x = [x1, x2, x3, ..., xn] from the training set

    3. Compare x with the weights wj for each neuron j

    4. Determine the winner: find the unit j with the minimum distance  $d_j = \sum_i (w_{ij} - x_i)^2$

    5. Update the winner so that it becomes more like x, together with the winner's neighbours for units within the radius, according to  $w_{ij}(n+1) = w_{ij}(n) + \eta(n)\,[x_i - w_{ij}(n)]$

    6. Adjust parameters: learning rate & neighbourhood function

    7. Repeat from (2) until ?

    Note that the learning rate generally decreases with time:  $0 \le \eta(n) \le \eta(n-1) \le 1$
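    Putting the seven steps together, here is a rough Python/NumPy sketch of the algorithm for a 1-dimensional chain of neurons (the stopping rule, parameter values and function names are assumptions for illustration, not part of the slides):

        import numpy as np

        def train_som(data, m=10, n_iter=2000, eta0=0.1, sigma0=3.0, seed=0):
            """Minimal 1-d SOM: m neurons in a chain; data is an (S, n) array of inputs."""
            rng = np.random.default_rng(seed)
            W = rng.random((m, data.shape[1]))              # 1. random initial weights
            positions = np.arange(m)                        # lattice coordinates of the neurons
            for t in range(n_iter):
                x = data[rng.integers(len(data))]           # 2. select an input vector
                d = np.sum((W - x) ** 2, axis=1)            # 3. distances d_j to all neurons
                j_star = int(np.argmin(d))                  # 4. winner = minimum distance
                eta = eta0 * np.exp(-t / n_iter)            # 6. decreasing learning rate
                sigma = sigma0 * np.exp(-t / n_iter)        #    shrinking neighbourhood radius
                h = np.exp(-(positions - j_star) ** 2 / (2 * sigma ** 2))
                W += eta * h[:, None] * (x - W)             # 5. update winner and neighbours
            return W                                        # 7. here: stop after n_iter steps

        data = np.random.default_rng(1).random((200, 2))
        print(train_som(data).round(2))                     # neighbouring rows end up similar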


    Example

    An SOFM network with three inputs and two cluster units is to be trained using the four training vectors:

    [0.8 0.7 0.4], [0.6 0.9 0.9], [0.3 0.4 0.1], [0.1 0.1 0.2]

    and initial weights

        0.5  0.4
        0.6  0.2
        0.8  0.5

    (the first column, [0.5 0.6 0.8], holds the weights to the first cluster unit).

    The initial radius is 0 and the learning rate is 0.5. Calculate the weight changes during the first cycle through the data, taking the training vectors in the given order.


    Solution

    The Euclidean distance (squared) of input vector 1 to cluster unit 1 is:

    d₁² = (0.5 − 0.8)² + (0.6 − 0.7)² + (0.8 − 0.4)² = 0.26

    The Euclidean distance (squared) of input vector 1 to cluster unit 2 is:

    d₂² = (0.4 − 0.8)² + (0.2 − 0.7)² + (0.5 − 0.4)² = 0.42

    Input vector 1 is closest to cluster unit 1, so the weights to cluster unit 1 are updated using  $w_{ij}(n+1) = w_{ij}(n) + 0.5\,[x_i - w_{ij}(n)]$:

    0.5 + 0.5 (0.8 − 0.5) = 0.65
    0.6 + 0.5 (0.7 − 0.6) = 0.65
    0.8 + 0.5 (0.4 − 0.8) = 0.60

    giving the updated weight matrix

        0.65  0.4
        0.65  0.2
        0.60  0.5


    Solution

    The Euclidean distance (squared) of input vector 2 to cluster unit 1 is:

    d₁² = (0.65 − 0.6)² + (0.65 − 0.9)² + (0.60 − 0.9)² = 0.155

    The Euclidean distance (squared) of input vector 2 to cluster unit 2 is:

    d₂² = (0.4 − 0.6)² + (0.2 − 0.9)² + (0.5 − 0.9)² = 0.69

    Input vector 2 is closest to cluster unit 1, so the weights to cluster unit 1 are updated again:

    0.65 + 0.5 (0.6 − 0.65) = 0.625
    0.65 + 0.5 (0.9 − 0.65) = 0.775
    0.60 + 0.5 (0.9 − 0.60) = 0.750

    giving the updated weight matrix

        0.625  0.4
        0.775  0.2
        0.750  0.5

    Repeat the same update procedure for input vectors 3 and 4 also.
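    The hand calculation can be checked with a short script (Python/NumPy sketch; radius 0 means only the winning cluster unit is updated):

        import numpy as np

        W = np.array([[0.5, 0.4],          # rows: the three inputs
                      [0.6, 0.2],          # columns: the two cluster units
                      [0.8, 0.5]])
        vectors = [np.array([0.8, 0.7, 0.4]), np.array([0.6, 0.9, 0.9]),
                   np.array([0.3, 0.4, 0.1]), np.array([0.1, 0.1, 0.2])]
        eta = 0.5

        for x in vectors:
            d2 = np.sum((W - x[:, None]) ** 2, axis=0)    # squared distance to each unit
            winner = int(np.argmin(d2))
            W[:, winner] += eta * (x - W[:, winner])      # radius 0: update the winner only
            print(d2.round(3), "winner:", winner + 1)
        print(W.round(3))                                 # weights after the first cycle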


    Neighborhood Function

    Gaussian neighborhood function:

    $h_{j,i} = \exp\!\left(-\dfrac{d_{ji}^2}{2\sigma^2}\right)$

    $d_{ji}$: lateral distance of neurons i and j

      in a 1-dimensional lattice: $|j - i|$

      in a 2-dimensional lattice: $\|r_j - r_i\|$, where $r_j$ is the position of neuron j in the lattice.
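    In Python/NumPy this neighbourhood function is a one-liner (a sketch; sigma is the width parameter discussed on the following slides):

        import numpy as np

        def gaussian_neighbourhood(d_ji, sigma):
            """h_ji = exp(-d_ji^2 / (2 sigma^2)), d_ji being the lateral lattice distance."""
            return np.exp(-np.asarray(d_ji, dtype=float) ** 2 / (2 * sigma ** 2))

        print(gaussian_neighbourhood([0, 1, 2, 3], sigma=1.0).round(3))
        # [1.    0.607 0.135 0.011] -> cooperation falls off with distance from the winner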


    [Figure: example neighbourhoods N13(1) and N13(2) around neuron 13 on the lattice.]


    Neighborhood Function

    $\sigma$ measures the degree to which excited neurons in the vicinity of the winning neuron cooperate in the learning process.

    In the learning algorithm $\sigma$ is updated at each iteration during the ordering phase using the following exponential decay update rule, with parameters $\sigma_0$, $T_1$:

    $\sigma(n) = \sigma_0 \exp\!\left(-\dfrac{n}{T_1}\right)$


    Neighbourhood function

    [Figure: two plots of the degree of neighbourhood versus distance from the winner; as time increases, the neighbourhood function becomes narrower.]


    UPDATE RULE

    $w_j(n+1) = w_j(n) + \eta(n)\, h_{j,i(x)}(n)\,[x - w_j(n)]$

    Exponential decay update of the learning rate:

    $\eta(n) = \eta_0 \exp\!\left(-\dfrac{n}{T_2}\right)$
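    Combining the update rule with the two decay schedules gives the following Python/NumPy sketch of a single SOM iteration (the concrete values of eta0, T2, sigma0 and T1 here are placeholders, not taken from the slides):

        import numpy as np

        def som_update(W, positions, x, n, eta0=0.1, T2=1000.0, sigma0=3.0, T1=1000.0):
            """One application of w_j(n+1) = w_j(n) + eta(n) h_{j,i(x)}(n) (x - w_j(n)).
            W: (m, dim) weights; positions: (m, 2) lattice coordinates; x: (dim,) input."""
            eta = eta0 * np.exp(-n / T2)                       # eta(n) = eta0 exp(-n / T2)
            sigma = sigma0 * np.exp(-n / T1)                   # sigma(n) = sigma0 exp(-n / T1)
            winner = int(np.argmin(np.sum((W - x) ** 2, axis=1)))   # i(x)
            d2 = np.sum((positions - positions[winner]) ** 2, axis=1)
            h = np.exp(-d2 / (2 * sigma ** 2))                 # Gaussian neighbourhood
            W += eta * h[:, None] * (x - W)
            return W

        # usage inside a training loop: for n, x in enumerate(inputs): som_update(W, pos, x, n)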


    Illustration of learning for Kohonen maps

    Inputs: coordinates (x, y) of points drawn from a square.

    Display neuron j at the position (xj, yj) where its output sj is maximum.

    [Figure: map unfolding from random initial positions, shown after 100, 200 and 1000 inputs.]


    Two-phase learning approach

    Self-organizing or ordering phase. The learning rate and the spread of the Gaussian neighborhood function are adapted during the execution of SOM, using for instance the exponential decay update rule.

    Convergence phase. The learning rate and the Gaussian spread have small fixed values during the execution of SOM.


    Ordering Phase

    Self-organizing or ordering phase: topological ordering of the weight vectors.

    May take 1000 or more iterations of the SOM algorithm.

    Important choice of the parameter values. For instance:

    $\eta(n)$: $\eta_0 = 0.1$, $T_2 = 1000$, decreasing gradually to $\eta(n) \approx 0.01$

    $h_{j,i(x)}(n)$: $\sigma_0$ big enough, $T_1 = \dfrac{1000}{\log \sigma_0}$

    With this parameter setting, initially the neighborhood of the winning neuron includes almost all neurons in the network, then it shrinks slowly with time.
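    With these slide values the schedules can be tabulated directly (Python sketch; the choice sigma0 = 10, roughly the radius of a 10 x 10 map, is an assumed example of "big enough"):

        import numpy as np

        eta0, T2 = 0.1, 1000.0              # learning rate schedule parameters from the slide
        sigma0 = 10.0                       # "big enough": about the radius of a 10 x 10 grid
        T1 = 1000.0 / np.log(sigma0)        # T1 = 1000 / log(sigma0)

        for n in (0, 250, 500, 1000):
            eta = eta0 * np.exp(-n / T2)
            sigma = sigma0 * np.exp(-n / T1)
            print(f"n={n:4d}  eta={eta:.3f}  sigma={sigma:.2f}")
        # the neighbourhood starts by covering almost the whole map and shrinks to ~1 by n=1000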


    Convergence Phase

    Convergence phase: fine-tune the weight vectors.

    Must be at least 500 times the number of neurons in the network: thousands or tens of thousands of iterations.

    Choice of parameter values:

    $\eta(n)$ maintained on the order of 0.01.

    Neighborhood function such that the neighborhood of the winning neuron contains only the nearest neighbors. It eventually reduces to one or zero neighboring neurons.


    Another Self-Organizing Map (SOM) Example
    From Fausett (1994)

    n = 4, m = 2 (more typical of SOM applications)

    Smaller number of units in output than in input; dimensionality reduction

    Training samples
    i1: (1, 1, 0, 0)
    i2: (0, 0, 0, 1)
    i3: (1, 0, 0, 0)
    i4: (0, 0, 1, 1)

    Network architecture: 4 input units fully connected to output units 1 and 2.

    What should we expect as outputs?


    What are the Euclidean Distances Between the Data Samples?

    Training samples
    i1: (1, 1, 0, 0)
    i2: (0, 0, 0, 1)
    i3: (1, 0, 0, 0)
    i4: (0, 0, 1, 1)

            i1   i2   i3   i4
      i1    0
      i2         0
      i3              0
      i4                   0


    Euclidean Distances Between Data Samples

    Training samples
    i1: (1, 1, 0, 0)
    i2: (0, 0, 0, 1)
    i3: (1, 0, 0, 0)
    i4: (0, 0, 1, 1)

    Squared distances d²:

            i1   i2   i3   i4
      i1    0
      i2    3    0
      i3    1    2    0
      i4    4    1    3    0

    Input units: 4; output units: 1, 2

    What might we expect from the SOM?



    Example Details

    Training samples
    i1: (1, 1, 0, 0)
    i2: (0, 0, 0, 1)
    i3: (1, 0, 0, 0)
    i4: (0, 0, 1, 1)

    With only 2 outputs, neighborhood = 0: only the weights associated with the winning output unit (cluster) are updated at each iteration.

    Learning rate η(t) = 0.6 for the first cycle (t = 1, ..., 4).


    First Weight Update

    Training sample: i1 = (1, 1, 0, 0)

    Unit 1 weights (.2, .6, .5, .9):
    d² = (.2 − 1)² + (.6 − 1)² + (.5 − 0)² + (.9 − 0)² = 1.86

    Unit 2 weights (.8, .4, .7, .3):
    d² = (.8 − 1)² + (.4 − 1)² + (.7 − 0)² + (.3 − 0)² = .98   →   Unit 2 wins

    Weights on the winning unit are updated:
    new unit 2 weights = [.8 .4 .7 .3] + 0.6 ([1 1 0 0] − [.8 .4 .7 .3]) = [.92 .76 .28 .12]

    Giving the updated weight matrix:
    Unit 1: (.2, .6, .5, .9)
    Unit 2: (.92, .76, .28, .12)


    Second Weight Update

    Training sample: i2 = (0, 0, 0, 1)

    Unit 1 weights (.2, .6, .5, .9):
    d² = (.2 − 0)² + (.6 − 0)² + (.5 − 0)² + (.9 − 1)² = .66

    Unit 2 weights (.92, .76, .28, .12):
    d² = (.92 − 0)² + (.76 − 0)² + (.28 − 0)² + (.12 − 1)² = 2.28   →   Unit 1 wins

    Weights on the winning unit are updated:
    new unit 1 weights = [.2 .6 .5 .9] + 0.6 ([0 0 0 1] − [.2 .6 .5 .9]) = [.08 .24 .20 .96]

    Giving the updated weight matrix:
    Unit 1: (.08, .24, .20, .96)
    Unit 2: (.92, .76, .28, .12)


    Third Weight Update

    Training sample: i3 = (1, 0, 0, 0)

    Unit 1 weights (.08, .24, .20, .96):
    d² = (.08 − 1)² + (.24 − 0)² + (.2 − 0)² + (.96 − 0)² = 1.87

    Unit 2 weights (.92, .76, .28, .12):
    d² = (.92 − 1)² + (.76 − 0)² + (.28 − 0)² + (.12 − 0)² = 0.68   →   Unit 2 wins

    Weights on the winning unit are updated:
    new unit 2 weights = [.92 .76 .28 .12] + 0.6 ([1 0 0 0] − [.92 .76 .28 .12]) = [.97 .30 .11 .05]

    Giving the updated weight matrix:
    Unit 1: (.08, .24, .20, .96)
    Unit 2: (.97, .30, .11, .05)


    Fourth Weight Update

    Training sample: i4 = (0, 0, 1, 1)

    Unit 1 weights (.08, .24, .20, .96):
    d² = (.08 − 0)² + (.24 − 0)² + (.2 − 1)² + (.96 − 1)² = .71

    Unit 2 weights (.97, .30, .11, .05):
    d² = (.97 − 0)² + (.30 − 0)² + (.11 − 1)² + (.05 − 1)² = 2.74   →   Unit 1 wins

    Weights on the winning unit are updated:
    new unit 1 weights = [.08 .24 .20 .96] + 0.6 ([0 0 1 1] − [.08 .24 .20 .96]) = [.03 .10 .68 .98]

    Giving the updated weight matrix:
    Unit 1: (.03, .10, .68, .98)
    Unit 2: (.97, .30, .11, .05)


    Applying the SOM Algorithm

    time (t)   data sample utilized   winning output unit   D(t)   η(t)
       1            i1                     Unit 2             0     0.6
       2            i2                     Unit 1             0     0.6
       3            i3                     Unit 2             0     0.6
       4            i4                     Unit 1             0     0.6

    After many iterations (epochs) through the data set, the weights approach:
    Unit 1: (0, 0, .5, 1.0)
    Unit 2: (1.0, .5, 0, 0)

    Did we get the clustering that we expected?
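    This can be reproduced with a short script (Python/NumPy sketch). The slides only give the learning rate 0.6 for the first cycle, so the slow decay used below is an assumption; with it, the two weight vectors settle near the values shown above.

        import numpy as np

        samples = np.array([[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]], float)
        W = np.array([[0.2, 0.6, 0.5, 0.9],     # unit 1 initial weights
                      [0.8, 0.4, 0.7, 0.3]])    # unit 2 initial weights

        for epoch in range(500):
            eta = 0.6 / (1 + epoch)             # assumed decay; only 0.6 (first cycle) is given
            for x in samples:                   # neighbourhood radius 0: winner only
                winner = int(np.argmin(np.sum((W - x) ** 2, axis=1)))
                W[winner] += eta * (x - W[winner])

        print(W.round(2))   # approximately [[0. 0. 0.5 1.], [1. 0.5 0. 0.]]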


    What clusters do the data samples fall into?

    Weights:
    Unit 1: (0, 0, .5, 1.0)
    Unit 2: (1.0, .5, 0, 0)

    Training samples
    i1: (1, 1, 0, 0)
    i2: (0, 0, 0, 1)
    i3: (1, 0, 0, 0)
    i4: (0, 0, 1, 1)


    Solution

    d² = (Euclidean distance)² = $\sum_{k=1}^{n} (i_{l,k} - w_{k,j}(t))^2$

    Weights:
    Unit 1: (0, 0, .5, 1.0)
    Unit 2: (1.0, .5, 0, 0)

    Sample: i1
    Distance from unit 1 weights:
    (1 − 0)² + (1 − 0)² + (0 − .5)² + (0 − 1.0)² = 1 + 1 + .25 + 1 = 3.25
    Distance from unit 2 weights:
    (1 − 1)² + (1 − .5)² + (0 − 0)² + (0 − 0)² = 0 + .25 + 0 + 0 = .25  (winner)

    Sample: i2
    Distance from unit 1 weights:
    (0 − 0)² + (0 − 0)² + (0 − .5)² + (1 − 1.0)² = 0 + 0 + .25 + 0 = .25  (winner)
    Distance from unit 2 weights:
    (0 − 1)² + (0 − .5)² + (0 − 0)² + (1 − 0)² = 1 + .25 + 0 + 1 = 2.25


    Solution

    Sample: i3
    Distance from unit 1 weights:
    (1 − 0)² + (0 − 0)² + (0 − .5)² + (0 − 1.0)² = 1 + 0 + .25 + 1 = 2.25
    Distance from unit 2 weights:
    (1 − 1)² + (0 − .5)² + (0 − 0)² + (0 − 0)² = 0 + .25 + 0 + 0 = .25  (winner)

    Sample: i4
    Distance from unit 1 weights:
    (0 − 0)² + (0 − 0)² + (1 − .5)² + (1 − 1.0)² = 0 + 0 + .25 + 0 = .25  (winner)
    Distance from unit 2 weights:
    (0 − 1)² + (0 − .5)² + (1 − 0)² + (1 − 0)² = 1 + .25 + 1 + 1 = 3.25

    Weights:
    Unit 1: (0, 0, .5, 1.0)
    Unit 2: (1.0, .5, 0, 0)

    d² = (Euclidean distance)² = $\sum_{k=1}^{n} (i_{l,k} - w_{k,j}(t))^2$
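    The same assignments follow from a couple of NumPy lines (a sketch using the converged weights above):

        import numpy as np

        W = np.array([[0.0, 0.0, 0.5, 1.0],     # unit 1
                      [1.0, 0.5, 0.0, 0.0]])    # unit 2
        samples = np.array([[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]], float)

        d2 = ((samples[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)   # 4 x 2 squared distances
        print(d2)                      # [[3.25 0.25] [0.25 2.25] [2.25 0.25] [0.25 3.25]]
        print(d2.argmin(axis=1) + 1)   # winners: [2 1 2 1] -> i1, i3 in cluster 2; i2, i4 in cluster 1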


    Word categories


    Examples of Applications

    Kohonen (1984). Speech recognition: a map of phonemes in the Finnish language

    Optical character recognition: clustering of letters of different fonts

    Angéniol et al. (1988): travelling salesman problem (an optimization problem)

    Kohonen (1990): learning vector quantization (pattern classification problem)

    Ritter & Kohonen (1989): semantic maps


    Summary

    Unsupervised learning is very common

    Unsupervised learning requires redundancy in the stimuli

    Self-organization is a basic property of the brain's computational structure

    SOMs are based on
      competition (winner-take-all units)
      cooperation
      synaptic adaptation

    SOMs conserve topological relationships between the stimuli

    Artificial SOMs have many applications in computational neuroscience


    End of slides