
3. Learning

In the previous lecture, we discussed the biological foundations of neural computation, including

• single neuron models
• connecting single neuron behaviour with network models
• spiking neural networks
• computational neuroscience

In the present one, we introduce

Statistical foundations of neural computation = Artificial foundations of neural computation

Artificial Neural Networks

Biological foundations (Neuroscience) vs. artificial foundations (Statistics, Mathematics)

Duck: can
• swim (but not like a fish) (Feng)
• fly (but not like a bird) (all my colleagues here)
• walk (in a funny way)


Topic: Pattern recognition
• Cluster
• Statistical approach

Statistical learning (training from a data set, adaptation): change the weights or interactions between neurons according to examples and previous knowledge.

The purpose of learning is to minimize
• training errors on the learning data: the learning error
• prediction errors on new, unseen data: the generalization error
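A minimal sketch of this distinction (the data, the degree-5 polynomial model, and the train/test split are all invented for illustration): fit on the learning data, then compare the error there with the error on held-out data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a noisy linear relationship (for illustration only).
x = rng.uniform(-1, 1, size=200)
y = 2.0 * x + rng.normal(scale=0.3, size=200)

# Split into learning (training) data and new, unseen data.
x_train, y_train = x[:100], y[:100]
x_test, y_test = x[100:], y[100:]

# "Learning": fit a degree-5 polynomial by least squares on the training set.
coeffs = np.polyfit(x_train, y_train, deg=5)

def mse(xs, ys):
    """Mean squared error of the fitted polynomial on (xs, ys)."""
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2)

print("learning error      :", mse(x_train, y_train))   # error on the learning data
print("generalization error:", mse(x_test, y_test))      # error on unseen data
```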

The neuroscience basis of learning remains elusive, although we have seen some progress (see references in the previous lecture).

LEARNING: extracting principles from a data set.

• Supervised learning: there is a teacher, telling you where to go

• Unsupervised learning: no teacher; the system learns by itself

• Reinforcement learning: there is a critic, saying only whether you are wrong or correct

We will concentrate on the first two. Reinforcement learning can be found in the books by Haykin or Hertz et al., or in

Sutton R. S. and Barto A. G. (1998). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.

Statistical learning: the artificial, reasonable way of training and prediction

Pattern recognition (classification), a special case of learning

The simplest case: f(x) = 1 or -1 for x in X (the set of objects we intend to separate)

For example: X, a bunch of faces; x, a single face,

$f(x) = \begin{cases} 1 & \text{if } x \text{ is male} \\ -1 & \text{if } x \text{ is female} \end{cases}$


Pattern: as opposed to chaos; it is an entity, vaguely defined, that could be given a name

Examples: • a fingerprint image, • a handwritten word, • a human face, • a speech signal, • an iris pattern etc.

Given a pattern:

a. supervised classification (discriminant analysis) in which the input pattern is identified as a member of a predefined class

b. unsupervised classification (e.g. clustering) in which the pattern is assigned to a hitherto unknown class. Unsupervised classification will be introduced in later lectures.

Pattern recognition is the process of assigning patterns to one of a number of classes

[Figure: the pattern recognition pipeline. Feature extraction maps the pattern space (data) x into the feature space y, and classification maps the feature space into the decision space. Example: a face image x is mapped to the feature hair length (y = 0 or y = 30 cm), and the decision is: short hair = male, long hair = female.]
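A minimal sketch of this pipeline, with a made-up "face" record, hair length as the extracted feature, and an arbitrary threshold standing in for the classifier (the 15 cm value is an assumption, not from the lecture):

```python
# Hypothetical pattern: a face represented as a small record (for illustration).
face = {"name": "pattern_1", "hair_length_cm": 30}

def extract_feature(pattern):
    """Feature extraction: map the pattern (data) to a point in feature space."""
    return pattern["hair_length_cm"]

def classify(hair_length, threshold_cm=15):
    """Classification: map the feature to the decision space
    (the threshold is an arbitrary illustrative choice)."""
    return "female" if hair_length > threshold_cm else "male"

y = extract_feature(face)   # feature space: y = 30 cm
decision = classify(y)      # decision space: long hair -> female
print(y, decision)
```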

Feature extraction: a very fundamental issue.

For example: when we recognize a face, which features do we use?

Eye pattern, geometric outline, etc.

Two approaches:
• Statistical approach
• Clusters: template matching

In two steps:
1. Find a discriminant function in terms of certain features
2. Make a decision in terms of the discriminant function

discriminant function: a function used to decide on class membership

Cluster: patterns of a class should be grouped or clustered together in pattern or feature space if the decision space is to be partitioned:

• objects near together must be similar
• objects far apart must be dissimilar

Distance measures: the choice of distance becomes important as the basis of classification.

Once a distance is given, the pattern recognition is accomplished.
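A sketch of template matching in this spirit: each class is represented by the centroid (template) of its examples in feature space, and a new pattern is assigned to the class whose template is nearest. Euclidean distance is assumed here, and the feature vectors (hair length, height) are invented for illustration.

```python
import numpy as np

# Hypothetical feature vectors [hair length in cm, height in cm] for each class.
class_a = np.array([[2.0, 180.0], [4.0, 175.0], [1.0, 182.0]])      # "male" examples
class_b = np.array([[30.0, 165.0], [25.0, 170.0], [35.0, 160.0]])   # "female" examples

# Templates: the centroid of each cluster.
templates = {"male": class_a.mean(axis=0), "female": class_b.mean(axis=0)}

def nearest_template(x):
    """Assign x to the class whose template (centroid) is closest in Euclidean distance."""
    return min(templates, key=lambda c: np.linalg.norm(x - templates[c]))

print(nearest_template(np.array([28.0, 168.0])))   # -> 'female'
```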

[Figure: two clusters of patterns separated along the hair length axis.]

Distance metrics:

Different distances will be employed later. To be a valid measure of the distance between two objects in an abstract space W, a distance metric must satisfy the following conditions:


• d(x, y) ≥ 0 (nonnegative)
• d(x, x) = 0 (reflexivity)
• d(x, y) = d(y, x) (symmetric)
• d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality)

We will encounter different distances later, for example the relative entropy (a distance from information theory).
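As a quick numerical sanity check of these four conditions (a sketch only, using the Euclidean distance defined below and random vectors):

```python
import numpy as np

def d(x, y):
    """Euclidean distance between two vectors."""
    return np.sqrt(np.sum((x - y) ** 2))

rng = np.random.default_rng(1)
x, y, z = rng.normal(size=(3, 5))   # three random 5-dimensional points

assert d(x, y) >= 0                          # nonnegative
assert d(x, x) == 0                          # reflexivity
assert np.isclose(d(x, y), d(y, x))          # symmetric
assert d(x, y) <= d(x, z) + d(z, y) + 1e-12  # triangle inequality
print("all four metric conditions hold for this sample")
```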

Hamming distance

For x = {x_i} and y = {y_i}: $d_H(x, y) = \sum_i |x_i - y_i|$

a measure of the sum of the absolute differences between the elements of the two vectors x and y

most often used in comparing binary vectors (binary pixel figures, black and white figures), e.g. d_H([1 0 0 1 1 1 0 1], [1 1 0 1 0 0 1 1]) = 4

(the element-wise differences |x_i − y_i| are (0 1 0 0 1 1 1 0), which sum to 4)
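A small sketch reproducing the binary example above:

```python
import numpy as np

def hamming(x, y):
    """Hamming distance: sum of absolute element-wise differences."""
    return int(np.sum(np.abs(np.asarray(x) - np.asarray(y))))

x = [1, 0, 0, 1, 1, 1, 0, 1]
y = [1, 1, 0, 1, 0, 0, 1, 1]
print(hamming(x, y))   # -> 4
```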

Euclidean Distance

For x = {x_i} and y = {y_i}:

$d(x, y) = \left[\sum_i (x_i - y_i)^2\right]^{1/2}$

Most widely used distance, easy to calculate

Minkowski distance: for x = {x_i} and y = {y_i},

$d(x, y) = \left[\sum_i |x_i - y_i|^r\right]^{1/r}$

r > 0
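A sketch of the Minkowski distance; r = 2 recovers the Euclidean distance above, and r = 1 gives the sum of absolute differences (the Hamming distance for binary vectors). The sample vectors are invented.

```python
import numpy as np

def minkowski(x, y, r):
    """Minkowski distance of order r > 0 between vectors x and y."""
    return np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** r) ** (1.0 / r)

x, y = np.array([1.0, 2.0, 3.0]), np.array([2.0, 0.0, 3.0])
print(minkowski(x, y, r=2))   # Euclidean: sqrt(1 + 4 + 0) ~ 2.236
print(minkowski(x, y, r=1))   # sum of absolute differences: 3.0
```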

Statistical approach:

[Figure: distribution densities p1(x) and p2(x) of the two classes, plotted against hair length.]

If p1(x) > p2(x), then x is in class one; otherwise it is in class two.

The discriminant function is given by p1(x) = p2(x)

Now the problem of statistical pattern recognition is reduced to estimating the probability densities from the given data {x} and {y}.
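A minimal parametric sketch of this rule (Gaussian class densities are assumed, and the sample values are invented): estimate p1 and p2 from the two data sets, then classify a new x by comparing the estimated densities.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D normal distribution with mean mu and standard deviation sigma."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

# Hypothetical hair-length samples for the two classes (illustration only).
class1 = np.array([1.0, 2.0, 0.5, 3.0, 1.5])        # e.g. male
class2 = np.array([25.0, 30.0, 28.0, 35.0, 22.0])   # e.g. female

# Parametric estimation: fit mean and standard deviation for each class.
mu1, s1 = class1.mean(), class1.std(ddof=1)
mu2, s2 = class2.mean(), class2.std(ddof=1)

def classify(x):
    """Assign x to class one if p1(x) > p2(x), otherwise to class two."""
    return 1 if gaussian_pdf(x, mu1, s1) > gaussian_pdf(x, mu2, s2) else 2

print(classify(4.0))    # -> 1
print(classify(27.0))   # -> 2
```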

In general there are two approaches:
• Parametric methods
• Nonparametric methods

Parametric methods

Assume knowledge of the underlying probability density distribution p(x).

Advantages: we need only adjust the parameters of the distribution to obtain the best fit. According to the central limit theorem, we can assume in many cases that the distribution is Gaussian (see below).

Disadvantage: if the assumption is wrong, then performance is poor in terms of misclassification. However, if a crude classification is acceptable, then this can be OK.

Normal (Gaussian) probability distribution: a common assumption is that the density distribution is normal.

For a single variable X:

mean: $E X = \mu$

variance: $E(X - E X)^2 = \sigma^2$

$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$

For multiple dimensions x:

x: feature vector; $\mu$: mean vector; $\Sigma$: covariance matrix, an n×n symmetric matrix with $\sigma_{ij} = E[(X_i - \mu_i)(X_j - \mu_j)]$, the correlation between $X_i$ and $X_j$

$|\Sigma|$ = determinant of $\Sigma$; $\Sigma^{-1}$ = inverse of $\Sigma$

$p(x) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$

with $x = (x_1, \ldots, x_n)^T$, $\mu = (\mu_1, \ldots, \mu_n)^T$ and

$\Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1n} \\ \vdots & \ddots & \vdots \\ \sigma_{n1} & \cdots & \sigma_{nn} \end{pmatrix}$

Fig. here
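A sketch of evaluating the multivariate density with NumPy (the mean vector and covariance matrix below are invented for illustration):

```python
import numpy as np

def normal_pdf_nd(x, mu, Sigma):
    """p(x) for an n-dimensional Gaussian with mean vector mu and covariance Sigma."""
    n = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff            # (x - mu)^T Sigma^{-1} (x - mu)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
print(normal_pdf_nd(np.array([0.5, -0.3]), mu, Sigma))
```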

Mahalanobis distance

$d(x, \mu) = (x - \mu)^T \Sigma^{-1} (x - \mu)$

[Figure: contours of constant Mahalanobis distance d(x, c) in the (u1, u2) plane.]
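A sketch of the Mahalanobis distance; note that it is exactly the quadratic form in the exponent of the multivariate Gaussian above (the covariance matrix below is invented):

```python
import numpy as np

def mahalanobis(x, mu, Sigma):
    """Mahalanobis distance: (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    return float(diff @ np.linalg.inv(Sigma) @ diff)

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([1.0, 1.0])
print(mahalanobis(x, mu, Sigma))        # distance weighted by the covariance structure
print(np.sum((x - mu) ** 2))            # compare with the plain squared Euclidean distance
```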

Topic: Hebbian learning rule

The Hebbian learning rule is local: it involves only two neurons and is independent of other variables.

We will return to the Hebbian learning rule later in the course, in PCA learning.

There are other possible ways of learning which are demonstrated in experiments (see Nature Neuroscience, as in the previous lecture).

Biological learning vs. statistical learning

Biological learning: Hebbian learning rule

When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic changes take place in one or both cells such that A's efficiency, as one of the cells firing B, is increased.

A → B

Cooperation between two neurons. In mathematical terms, with w(t) the weight between the two neurons at time t:

w(t+1) = w(t) + r_A r_B

where r_A and r_B are the activities of neurons A and B.
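A minimal sketch of this update rule (the random firing rates and the small learning-rate factor eta are assumptions for illustration, not part of the rule as written above):

```python
import numpy as np

rng = np.random.default_rng(2)

w = 0.1      # initial weight between neurons A and B
eta = 0.01   # small learning-rate factor (an assumption, not in the slide)

for t in range(100):
    r_A = rng.random()        # activity (firing rate) of the presynaptic neuron A
    r_B = rng.random()        # activity of the postsynaptic neuron B
    w = w + eta * r_A * r_B   # Hebbian update: w(t+1) = w(t) + r_A * r_B (scaled by eta)

print(w)   # the weight only grows: correlated activity strengthens the connection
```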
