3. Learning
In the previous lecture, we discussed the biological foundations of neural computation, including:
• single neuron models
• connecting single-neuron behaviour with network models
• spiking neural networks
• computational neuroscience
In the present one, we introduce:
Statistical foundations of neural computation = artificial foundations of neural computation
Artificial Neural Networks
Biological foundations (neuroscience) vs. artificial foundations (statistics, mathematics)
Duck: can swim (but not like a fish), fly (but not like a bird), walk (in a funny way)
Topics: pattern recognition; cluster; statistical approach
Statistical learning (training from a data set, adaptation):
change the weights, or interactions between neurons, according to examples and previous knowledge.
The purpose of learning is to minimize:
• training errors on the learning data: the learning error
• prediction errors on new, unseen data: the generalization error
The neuroscience basis of learning remains elusive, although we have seen some progress (see references in the previous lecture).
LEARNING: extracting principles from a data set.
• Supervised learning: you have a teacher telling you where to go
• Unsupervised learning: no teacher; the system learns by itself
• Reinforcement learning: you have a critic saying wrong or correct
Statistical learning: the artificial, reasonable way of training and prediction
We will concentrate on the first two. Reinforcement learning is covered in the books by Haykin and by Hertz et al., or in:
Sutton R.S. and Barto A.G. (1998) Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.
Pattern recognition (classification), a special case of learning.
The simplest case: f(x) = 1 or -1 for x in X (the set of objects we intend to separate).
Example: X is a collection of faces, x a single face, and
f(x) = 1 if x is male, f(x) = -1 if x is female.
Pattern: as opposed to chaos; it is an entity, vaguely defined, that could be given a name.
Examples: • a fingerprint image, • a handwritten word, • a human face, • a speech signal, • an iris pattern etc.
Given a pattern, there are two tasks:
a. supervised classification (discriminant analysis), in which the input pattern is identified as a member of a predefined class
b. unsupervised classification (e.g. clustering), in which the pattern is assigned to a hitherto unknown class. Unsupervised classification will be introduced in later lectures.
Pattern recognition is the process of assigning patterns to one of a number of classes
[Figure: feature extraction maps the pattern space (data) x to the feature space y; e.g. for a face image x, the extracted feature is hair length, y = 0 or y = 30 cm.]
[Figure: pattern space (data) x → feature extraction → feature space y → classification → decision space; e.g. hair length = 0 → short hair = male; hair length = 30 cm → long hair = female.]
Feature extraction is a very fundamental issue.
For example, when we recognize a face, which features do we use? Eye pattern, geometric outline, etc.
Two approaches: the statistical approach, and clusters (template matching).
In two steps:
• find a discriminant function in terms of certain features
• make a decision in terms of the discriminant function
Discriminant function: a function used to decide on class membership.
Cluster: patterns of a class should be grouped or clustered together in pattern or feature space if the decision space is to be partitioned.
Objects near together must be similar; objects far apart must be dissimilar.
Distance measures: the choice of distance becomes important as the basis of classification.
Once a distance is given, pattern recognition is accomplished.
Distance metrics: different distances will be employed later.
To be a valid measure of the distance between two objects in an abstract space W, a distance metric must satisfy the following conditions:
• d(x,y) >= 0 (nonnegativity)
• d(x,x) = 0 (reflexivity)
• d(x,y) = d(y,x) (symmetry)
• d(x,y) <= d(x,z) + d(z,y) (triangle inequality)
We will encounter different distances, for example the relative entropy (a "distance" from information theory, though not a true metric since it is not symmetric).
Hamming distance
For x = {xi} and y = {yi}, dH(x, y) = Σi |xi - yi|,
a measure of the sum of absolute differences between the elements of the two vectors x and y.
It is most often used in comparing binary vectors (binary pixel figures, black and white figures), e.g.
dH([1 0 0 1 1 1 0 1], [1 1 0 1 0 0 1 1]) = 4 (the two vectors differ in four positions).
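The Hamming distance above can be sketched in a few lines of Python (the function name is our own choice):

```python
# Hamming distance: sum of absolute element-wise differences.
def hamming_distance(x, y):
    """dH(x, y) = sum_i |xi - yi| over paired elements of x and y."""
    assert len(x) == len(y), "vectors must have equal length"
    return sum(abs(a - b) for a, b in zip(x, y))

# The example from the text: the vectors differ in four positions.
print(hamming_distance([1, 0, 0, 1, 1, 1, 0, 1],
                       [1, 1, 0, 1, 0, 0, 1, 1]))  # -> 4
```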
Euclidean distance
For x = {xi} and y = {yi},
d(x, y) = [Σi (xi - yi)^2]^(1/2).
It is the most widely used distance and is easy to calculate.
Minkowski distance
For x = {xi} and y = {yi},
d(x, y) = [Σi |xi - yi|^r]^(1/r), r > 0.
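A minimal sketch of the Minkowski family, which contains the Euclidean distance (r = 2) as a special case:

```python
def minkowski_distance(x, y, r=2.0):
    """d(x, y) = (sum_i |xi - yi|^r)^(1/r), r > 0.
    r = 2 gives the Euclidean distance; r = 1 the city-block distance."""
    assert r > 0, "r must be positive"
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

print(minkowski_distance([0, 0], [3, 4]))       # Euclidean -> 5.0
print(minkowski_distance([0, 0], [3, 4], r=1))  # city-block -> 7.0
```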
Statistical approach:
Suppose the two classes (e.g. hair length for males and females) have distribution densities p1(x) and p2(x).
If p1(x) > p2(x), then x is in class one; otherwise it is in class two.
The discriminant function is given by p1(x) = p2(x).
The problem of statistical pattern recognition is now reduced to estimating the probability densities from the given data {x} and {y}.
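The decision rule p1(x) > p2(x) can be sketched as follows; the two Gaussian class densities and their parameters are hypothetical, chosen only to echo the hair-length example:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def classify(x, mu1, sigma1, mu2, sigma2):
    """Assign x to class 1 if p1(x) > p2(x), else to class 2."""
    p1 = gaussian_pdf(x, mu1, sigma1)
    p2 = gaussian_pdf(x, mu2, sigma2)
    return 1 if p1 > p2 else 2

# Hypothetical class densities: class 1 (male) has mean hair length 5 cm,
# class 2 (female) has mean 30 cm.
print(classify(8.0, 5.0, 4.0, 30.0, 8.0))   # -> 1
print(classify(25.0, 5.0, 4.0, 30.0, 8.0))  # -> 2
```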
In general there are two approaches:
• parametric methods
• nonparametric methods
Parametric methods
Assume knowledge of the underlying probability density distribution p(x).
Advantage: one need only adjust the parameters of the distribution to obtain the best fit. By the central limit theorem, we may assume in many cases that the distribution is Gaussian (see below).
Disadvantage: if the assumption is wrong, performance is poor in terms of misclassification. However, if a crude classification is acceptable, then this can be OK.
Normal (Gaussian) probability distribution: a common assumption is that the density distribution is normal.
For a single variable X:
mean E[X] = μ
variance E[(X - E[X])^2] = σ^2
p(x) = (1 / (σ √(2π))) exp( -(x - μ)^2 / (2σ^2) )
For multiple dimensions: x is the feature vector, μ the mean vector, and Σ the covariance matrix, an n×n symmetric matrix with entries
σij = E[(Xi - μi)(Xj - μj)], the covariance between Xi and Xj.
|Σ| = determinant of Σ; Σ^(-1) = inverse of Σ.
p(x) = (2π)^(-n/2) |Σ|^(-1/2) exp( -(1/2) (x - μ)^T Σ^(-1) (x - μ) )
Fig. here
Mahalanobis distance
d(x, μ) = [ (x - μ)^T Σ^(-1) (x - μ) ]^(1/2)
[Figure: contours of equal Mahalanobis distance d(x, c) from a centre c are ellipses with principal axes u1 and u2.]
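The Mahalanobis distance is a small computation with the covariance inverse; a sketch using NumPy:

```python
import numpy as np

def mahalanobis(x, mu, sigma):
    """d(x, mu) = sqrt((x - mu)^T Sigma^{-1} (x - mu))."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(sigma) @ diff))

# With the identity covariance it reduces to the Euclidean distance:
print(mahalanobis([3.0, 4.0], [0.0, 0.0], np.eye(2)))  # -> 5.0
```

With a non-identity Σ, directions of high variance are down-weighted, which is why the equidistant contours are ellipses rather than circles.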
Topic: Hebbian learning rule
The Hebbian learning rule is local: it involves only the two neurons concerned, independent of other variables.
We will return to the Hebbian learning rule later in the course, in PCA learning.
There are other possible ways of learning which have been demonstrated in experiments (see Nature Neuroscience, as in the previous lecture).
Biological learning vs. statistical learning
Biological learning: the Hebbian learning rule.
"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
A → B
Cooperation between two neurons, in mathematical terms: with w(t) the weight between the two neurons at time t,
w(t+1) = w(t) + η rA rB
where rA and rB are the activities of the two neurons and η is a learning rate.
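The update above is a one-line rule; a minimal sketch, where the learning rate η = 0.01 and the activity values are illustrative assumptions:

```python
def hebbian_update(w, r_a, r_b, eta=0.01):
    """One Hebbian step: w(t+1) = w(t) + eta * rA * rB,
    strengthening w in proportion to the co-activity of A and B."""
    return w + eta * r_a * r_b

w = 0.5
for _ in range(10):          # both neurons firing together -> weight grows
    w = hebbian_update(w, r_a=1.0, r_b=1.0)
print(round(w, 2))           # -> 0.6
```

Note the rule only ever increases w when both activities are positive; in practice some decay or normalization term is added to keep weights bounded.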