

Radial Basis Functions: An Introduction

Prof. Sarat K. Patra, Senior Member, IEEE

National Institute of Technology, Rourkela

Odisha, India

Email: [email protected]


Presentation Outline


Books and reference materials:

• S. Haykin; Neural Networks: A Comprehensive Foundation; Pearson Education

• C. M. Bishop; Neural Networks for Pattern Recognition; Oxford University Press

• B. Mulgrew; "Applying radial basis functions"; IEEE Signal Processing Magazine; vol. 13, no. 2; 1996


What are we going to cover

Introduction

Soft computing Techniques

NN Architectures

Linearly and non-linearly separable problems

Basis Functions

Regularized RBF; Generalized RBF

RBF Training and Examples

Comparison with MLP

Conclusion


Different NN Architectures

• Perceptron (only one neuron)

– Linear decision boundary

– Limited functionality

• MLP

• RBF

• Recurrent networks

• Self-organizing maps

• Many more


Linearly and Non-linearly Separable Problems

• Take a 2-input, single-output network.

• Plot each category of output in the input space using different symbols.

• Take the inputs in the “x-y” plane.

• Can you have a line separating the points into 2 categories?

– Yes – linearly separable (OR, AND gates)

– No – non-linearly separable (EX-OR gate; see the sketch below)
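A minimal sketch (not from the slides) of the separability point: a perceptron, trained with the standard learning rule, fits the linearly separable AND gate but can never fit XOR.

```python
# A perceptron finds a separating line for AND but not for XOR.
import numpy as np

def perceptron_predictions(X, t, epochs=100, lr=0.1):
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x, target in zip(Xb, t):
            y = 1 if w @ x > 0 else 0
            w += lr * (target - y) * x         # perceptron learning rule
    return (Xb @ w > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(perceptron_predictions(X, np.array([0, 0, 0, 1])))  # AND: [0 0 0 1]
print(perceptron_predictions(X, np.array([0, 1, 1, 0])))  # XOR: errors remain
```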


Why network models beyond the MLP?

• The MLP is already a universal approximator, but…

• The MLP can have many local minima.

• It is often too slow to train an MLP.

• Sometimes it is extremely difficult to optimize the structure of an MLP.

• There may exist other network architectures, in terms of the number of elements in each layer, whose performance could be superior to the one used.


Radial Basis Function (RBF) Networks

RBFNs are artificial neural networks for application to problems of supervised learning:

Regression

Classification


Parametric Regression

• Parametric regression: the form of the function is known but not the parameter values.

• Typically, the parameters (both the dependent and independent) have physical meaning.

• E.g., fitting a straight line to a set of points.


Non-Parametric Regression

• No prior knowledge of the true form of the function.

• Using many free parameters which have no physical meaning.

• The model should be able to represent a very broad class of functions.


Classification

• Purpose: assign previously unseen patterns to their respective classes.

• Training: previous examples of each class.

• Output: a class out of a discrete set of classes.

• Classification problems can be made to look like nonparametric regression.


Time Series Prediction

• Estimate the next value and future values of a sequence.

• The problem is that usually the sequence is not an explicit function of time. Normally time series are modeled as auto-regressive in nature, i.e., the outputs, suitably delayed, are also the inputs (see the sketch below).

• To create the training set from the available historical sequence first requires choosing how many and which delayed outputs affect the next output.
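In symbols, the slide's omitted formula can be reconstructed along these lines (the function $f$ and the number of delays $p$ are notation assumed here, not from the slide):

$$\hat{x}(n+1) = f\big(x(n),\; x(n-1),\; \dots,\; x(n-p+1)\big)$$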


Supervised Learning in RBFN

• Neural networks, including radial basis function networks, are nonparametric models, and their weights (and other parameters) have no particular meaning in relation to the problems to which they are applied.

• Estimating values for the weights of a neural network (or the parameters of any nonparametric model) is never the primary goal in supervised learning.

• The primary goal is to estimate the underlying function (or at least to estimate its output at certain desired values of the input).


The idea of RBFNN

The MLP is one way to get non-linearity. The other is to use the generalized linear discriminant function:


$$y = \sum_{j} w_j\, \phi_j(\mathbf{x})$$


The idea of RBFNN

For a Radial Basis Function (RBF) network, the basis function is radially symmetric with respect to the input: its value is determined by the distance from the data point to the RBF center.


The Gaussian kernel:

$$\phi_j(\mathbf{x}) = \exp\!\left(-\frac{\|\mathbf{x}-\mathbf{c}_j\|^2}{2\sigma_j^2}\right)$$

where $\mathbf{c}_j$ represents the center, $\sigma_j$ the width, and $\|\cdot\|$ is the distance measure. For the Euclidean distance,

$$\|\mathbf{x}-\mathbf{c}_j\|^2 = \sum_{m=1}^{M}\left(x_m - c_{jm}\right)^2.$$
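A minimal NumPy sketch of the Gaussian basis function above (the function name and the example values are illustrative, not from the slides):

```python
# Evaluate phi_j(x) = exp(-||x - c_j||^2 / (2 sigma^2)) for all centers.
import numpy as np

def gaussian_rbf(X, centers, sigma):
    """Return the N-by-M matrix Phi with Phi[n, j] = phi_j(x_n)."""
    # squared Euclidean distances between every input and every center
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

X = np.array([[0.0, 0.0], [1.0, 1.0]])        # two 2-D inputs
centers = np.array([[0.0, 0.0], [1.0, 0.0]])  # two RBF centers
print(gaussian_rbf(X, centers, sigma=1.0))
```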


Cover’s Theorem

“A complex pattern-classification problem cast in high-dimensional space non-linearly is more likely to be linearly separable than in a low-dimensional space” (Cover, 1965).


Radial Basis Function Networks

• In its most basic form, a Radial Basis Function network (RBF) involves three layers with entirely different roles.

• The input layer is made up of source nodes that connect the network to its environment.

• The second layer, the only hidden layer, applies a nonlinear transformation from the input space to the hidden space.

• The output layer is linear, supplying the response of the network to the activation pattern applied to the input layer.


The idea of RBFNN

• For an RBFNN, we expect that the function to be learnt can be expressed as a linear superposition of a number of RBFs.


(Figure: a function described as the linear superposition of three basis functions.)


RBF Structure

RBFNN: a two-layer network

Free parameters:

-- the network weights w in the 2nd layer

-- the form of the basis functions

-- the number of basis functions

-- the locations of the basis functions

E.g., for a Gaussian RBFNN these are the number, the centers, and the widths of the basis functions.

(Figure: a two-layer RBF network mapping input x through weights w to output y.)


Some Theory

Given a set of $N$ distinct points $\{\mathbf{x}_i \in \mathbb{R}^{m_0},\ i=1,2,\dots,N\}$ and a corresponding set of $N$ real numbers $\{d_i \in \mathbb{R},\ i=1,2,\dots,N\}$, find a function $F:\mathbb{R}^{m_0} \to \mathbb{R}$ that satisfies the interpolation condition

$$F(\mathbf{x}_i) = d_i, \qquad i=1,2,\dots,N.$$

The radial-basis-function technique consists of choosing a function $F$ of the form

$$F(\mathbf{x}) = \sum_{i=1}^{N} w_i\, \varphi(\|\mathbf{x}-\mathbf{x}_i\|).$$


Some Theory

Micchelli’s Theorem

Let $\{\mathbf{x}_i\}_{i=1}^{N}$ be a set of distinct points in $\mathbb{R}^{m_0}$. Then the $N$-by-$N$ interpolation matrix $\boldsymbol{\Phi}$, whose $ji$-th element is $\varphi_{ji} = \varphi(\|\mathbf{x}_j-\mathbf{x}_i\|)$, is non-singular.
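A quick numerical illustration of the theorem (a sketch, not a proof; a Gaussian $\varphi$ with $\sigma = 1$ is assumed):

```python
# For distinct random points, the Gaussian interpolation matrix Phi with
# Phi[j, i] = phi(||x_j - x_i||) comes out non-singular (full rank).
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))            # 10 distinct points in R^3
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
Phi = np.exp(-d2 / 2.0)                     # Gaussian kernel, sigma = 1
print(np.linalg.matrix_rank(Phi))           # prints 10: full rank
```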


Regularization Networks

The regularization network is a universal approximator

The regularization network has the best approximation property

The solution computed by the regularization network is optimal.


Generalized RBF Networks

• When N is large, the one-to-one correspondence between the training input data and the Green's function produces a regularization network that may be considered expensive.

• The generalized RBF network is an approximation of the regularized network.


Generalized RBF Networks

• The approach taken involves searching for a suboptimal solution in a lower-dimensional space that approximates the regularized solution (Galerkin's method):

$$F^*(\mathbf{x}) = \sum_{i=1}^{m_1} w_i\, \varphi_i(\mathbf{x}),$$

where $\{\varphi_i(\mathbf{x}) \mid i=1,2,\dots,m_1 \le N\}$ is a new set of linearly independent basis functions and the $w_i$ constitute a new set of weights.

• We set $\varphi_i(\mathbf{x}) = G(\|\mathbf{x}-\mathbf{t}_i\|)$, $i=1,2,\dots,m_1$, where the set of centers $\{\mathbf{t}_i \mid i=1,2,\dots,m_1\}$ is to be determined.

Note that this particular choice of basis functions is the only one that guarantees that, in the case of $m_1 = N$ and $\mathbf{x}_i = \mathbf{t}_i$, $i=1,2,\dots,N$, the correct solution is consistently recovered.


RBF Structure (2)

• Universal approximation: a Gaussian RBFNN is capable of approximating any function.

(Figure: localized vs. non-localized basis functions.)


Exact Interpolation

• The idea of RBFNN is that we ‘interpolate’ the target function by using the sum of a number of basis functions.

• To illustrate this idea, we consider the special case of exact interpolation, in which the number of basis functions M equals the number of data points N (M = N) and all the basis functions are centered at the data points.

• We want the target values to be exactly interpolated by the summation of basis functions.


Exact Interpolation


We require

$$y(\mathbf{x}^n) = \sum_{j=1}^{M} w_j\, \phi_j(\mathbf{x}^n) = t^n, \qquad n = 1,\dots,N, \qquad \phi_j(\mathbf{x}^n) = \phi(\|\mathbf{x}^n - \mathbf{c}_j\|),$$

or, in matrix form, $\boldsymbol{\Phi}\mathbf{w} = \mathbf{t}$.

Since M = N, $\boldsymbol{\Phi}$ is a square matrix and is non-singular in general cases, so the result is

$$\mathbf{w} = \boldsymbol{\Phi}^{-1}\mathbf{t}.$$


RBF Output with 3 centers

1-Dimensional problem

Center location (-1, 0, 1)


RBF Output with 4 Centers (EX-OR)


(Figure: network outputs for σ² = 0.1 and σ² = 1.0.)


RBF Output with 4 Centers


(Figure: network outputs for σ² = 0.1 and σ² = 1.0.)


An example of exact interpolation

For Gaussian RBF (1D input)

21 data points are generated from y = sin(πx) plus noise (strength 0.2).


The target data points are indeed exactly interpolated, but the generalization performance is not good.
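A minimal sketch of this experiment (assumptions: σ = 0.1 and a fixed random seed; the slide's exact settings may differ):

```python
# Exact interpolation with one Gaussian basis per data point: the fit is
# perfect at the training points but oscillates between them.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 21)                          # 21 input points
t = np.sin(np.pi * x) + 0.2 * rng.standard_normal(21)  # noisy targets

sigma = 0.1
Phi = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma ** 2))
w = np.linalg.solve(Phi, t)                            # w = Phi^{-1} t

print(np.allclose(Phi @ w, t))   # True: training points fit exactly
```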


The hybrid training procedure

• The number of basis functions need not equal the number of data points. Actually, in a typical situation, M should be much less than N.

• The centers of the basis functions are no longer constrained to be at the input data points. Instead, the determination of the centers becomes part of the training process.

• Instead of having a common width parameter σ, each basis function can have its own width, which is also determined by learning.


An example of RBFNN


(Figure: exact interpolation with σ = 0.1 vs. an RBFNN with 4 basis functions, σ = 0.4.)


The hybrid training procedure

• Unsupervised learning in the first layer. This fixes the basis functions using only the knowledge of the input data. For Gaussian RBF, it often includes deciding the number, locations, and widths of the RBFs.

• Supervised learning in the second layer. This determines the network weights in the second layer. If we choose the sum-of-squares error, it becomes a quadratic optimization problem, which is easy to solve.

• In summary, hybrid training avoids using supervised learning simultaneously in two layers, and greatly reduces the computational cost.


Basis function optimization

The form of the basis function is predefined, and is often chosen to be Gaussian.

The number of basis functions often has to be determined by trial, e.g., through monitoring the generalization performance.

The key issue in unsupervised learning is to determine the locations and the widths of the basis functions.


Algorithms for basis function optimization

Subsets of data points:

• Randomly select a number of input data points as basis function centers.

• The width can be chosen to be equal for all and given by some multiple of the average distance between the basis function centers (see the sketch below).
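A minimal sketch of this width heuristic (the multiplying factor 2 and the function name are assumptions for illustration):

```python
# Common width = factor * average distance between distinct center pairs.
import numpy as np

def common_width(centers, factor=2.0):
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    n = len(centers)
    avg = dist.sum() / (n * (n - 1))   # mean over distinct ordered pairs
    return factor * avg

centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(common_width(centers))
```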


Algorithms for basis function optimization

Gaussian mixture models.

• The basis functions are chosen essentially to model the density distribution of the input data (intuitively we want the centers of the basis functions to lie in high-density regions). We may assume the input data is generated by a mixture of Gaussian distributions; optimizing the probability density model then returns the basis function centers and widths.


Algorithms for basis function optimization

Clustering algorithms.

• In this approach the input data is assumed to consist of a number of clusters. Each cluster corresponds to one basis function, with the cluster center being the basis function center. The width can be set equal to some multiple of the average distance between all centers.


K-means clustering algorithm (1)

• The algorithm partitions the data points into K disjoint subsets (K is predefined).

• The clustering criteria are:

– The cluster centers are set in the high-density regions of the data.

– A data point is assigned to the cluster whose center is nearest to it.

• Mathematically, this is equivalent to minimizing the sum-of-squares clustering function shown below.


K-means clustering algorithm (2)


$$J = \sum_{j=1}^{K} \sum_{n \in S_j} \|\mathbf{x}^n - \mathbf{c}_j\|^2$$

where $S_j$ is the $j$-th cluster containing $N_j$ data points, and

$$\mathbf{c}_j = \frac{1}{N_j}\sum_{n \in S_j} \mathbf{x}^n$$

is the mean of the data points in cluster $j$.


K-means clustering algorithm (3)


• Step 1: Randomly assign data points to one of the K clusters. Each data point then has a cluster label.

• Step 2: Calculate the mean of each cluster.

• Step 3: Check whether each data point has the right cluster label. For each data point, calculate its distances to all K centers. If the minimum is not the distance to its own cluster center, reassign the point to the cluster that gives the minimum distance.

• Step 4: After each epoch of checking (one pass over all data points), if no update occurred, i.e., J has reached its minimum, stop. Otherwise, go back to Step 2. (A minimal sketch follows.)
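The four steps above map directly onto a few lines of NumPy. A minimal sketch (not from the slides; empty clusters are not handled):

```python
import numpy as np

def kmeans(X, K, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, K, size=len(X))    # Step 1: random labels
    while True:
        # Step 2: mean of each cluster (empty clusters not handled here)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(K)])
        # Step 3: reassign each point to its nearest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):  # Step 4: no change, stop
            return centers, labels
        labels = new_labels

data = np.random.default_rng(1).standard_normal((100, 2))
centers, labels = kmeans(data, K=4)
print(centers)
```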


An example of data clustering


(Figure: the data before and after clustering.)


The network training

• The network output after clustering is:


$$y(\mathbf{x}) = \sum_{j=0}^{K} w_j\, \phi_j(\mathbf{x})$$

where $\phi_j(\mathbf{x}) = \exp\!\left(-\|\mathbf{x}-\mathbf{c}_j\|^2 / (2\sigma_j^2)\right)$ for $j > 0$ is the Gaussian RBF, the centers $\mathbf{c}_j$ are those obtained by clustering, and $\phi_0(\mathbf{x}) = 1$ is the bias term.

The output error is

$$E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\left(\sum_{j=0}^{K} w_j\, \phi_j(\mathbf{x}^n) - t^n\right)^2.$$
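With the basis functions fixed by clustering, minimizing E(w) is linear least squares in w. A minimal NumPy sketch (the function names and the common width sigma are assumptions for illustration):

```python
import numpy as np

def design_matrix(X, centers, sigma):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-d2 / (2 * sigma ** 2))           # Gaussian basis functions
    return np.hstack([np.ones((len(X), 1)), Phi])  # prepend bias column phi_0

def train_output_weights(X, t, centers, sigma):
    Phi = design_matrix(X, centers, sigma)
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)    # minimizes E(w)
    return w
```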


RBF in Time Series Prediction

• We will show an example of using an RBFNN for time series prediction.

• Time series prediction: predict the system behavior based on its history.

• Suppose the time course of a system is denoted {S(1), S(2), …, S(n)}, where S(n) is the system state at time step n. The task is to predict the system behavior at step n+1 based on the knowledge of its history, i.e., {S(n), S(n−1), S(n−2), …}. This is possible for many problems in which system states are correlated over time.


RBF in Time Series Prediction

• Consider a simple example, the logistic map, in which the system state x is updated iteratively according to

• Our task is to predict the value of x at any step based on its values in the previous two steps, i.e., to estimate $x_n$ based on $x_{n-2}$ and $x_{n-1}$.


$$x_{n+1} = r\, x_n (1 - x_n)$$


Generating training data from the logistic map

• The logistic map, though simple, shows many interesting behaviors. (More detail can be found at http://mathworld.wolfram.com/LogisticMap.html.)

• The data collection process:

– Choose r = 4 and the initial value of x to be 0.3.

– Iterate the logistic map for 500 steps, and collect 100 examples from the last 100 iterations (chopping the data into triplets; each triplet gives one input-output pair). A sketch follows.
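A minimal sketch of this recipe (one assumption: 102 retained values are needed to form 100 overlapping triplets):

```python
# Iterate the logistic map with r = 4 from x_0 = 0.3, keep the tail of the
# series, and chop it into triplets (x_{n-2}, x_{n-1}) -> x_n.
import numpy as np

r, x = 4.0, 0.3
series = []
for _ in range(500):
    x = r * x * (1.0 - x)         # logistic map update
    series.append(x)
series = np.array(series[-102:])  # 102 values give 100 triplets

inputs = np.stack([series[:-2], series[1:-1]], axis=1)  # (x_{n-2}, x_{n-1})
targets = series[2:]                                    # x_n
print(inputs.shape, targets.shape)                      # (100, 2) (100,)
```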


Generating training data from the logistic map


(Figures: the input data space; the time course of the system state.)


Clustering the input data

• We cluster the input data by using the K-means clustering algorithm.

• We choose K=4. The clustering result returns the centers of basis functions and the scale of width.


The training result of RBFNN


(Figure: the relationship between $x_n$ and $x_{n-2}$.)


The training result of RBFNN


(Figure: the relationship between $x_n$ and $x_{n-1}$.)


Time series predicted data


Comparison with MLP

RBF:

• Simple structure: one hidden layer, with a linear combination at the output layer.

• Simple training: the hybrid procedure, clustering plus a quadratic error function.

• Localized representation: the input space is covered by a number of localized basis functions. A given input typically activates significantly only a limited number of hidden units (those within a close distance).

MLP:

• Complicated structure: often many layers and many hidden units.

• Complicated training: optimizing multiple layers together, with local minima and slow convergence.

• Distributed representation: for a given input, typically many hidden units will be activated.


Comparison with MLP (2)

• Different ways of interpolating data


MLP: data are classified by hyper-planes. RBF: data are classified according to clusters


Shortcomings of RBFNN

• Unsupervised learning implies that an RBFNN may only achieve a sub-optimal solution, since the training of the basis functions does not consider the information of the output distribution.


Example: a basis function is chosen based only on the density of the input data, which gives p(x); it does not match the real output function h(x).


Shortcomings of RBFNN


Example: the output function is determined by only one input component; the other component is irrelevant. Because its basis functions are set by unsupervised learning, an RBFNN is unable to detect this irrelevant component, whereas an MLP may do so (the network weights connected to irrelevant components will tend to have smaller values).


Some Theory

The XOR problem: (x1 OR x2) AND NOT (x1 AND x2). (See the sketch below.)

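A minimal sketch (assumptions: Gaussian basis functions centered at the four input patterns with σ = 1, following the exact-interpolation recipe above) showing that the XOR targets are recovered exactly:

```python
# Gaussian features centered on the input patterns make XOR solvable by a
# linear output layer: exact interpolation recovers the targets.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0.0, 1.0, 1.0, 0.0])             # XOR targets

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
Phi = np.exp(-d2 / 2.0)                        # centers at the data points
w = np.linalg.solve(Phi, t)                    # second-layer weights
print(np.round(Phi @ w, 6))                    # [0. 1. 1. 0.]
```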


Summary

• The structure of an RBF network is unusual in that the constitution of its hidden units is entirely different from that of its output units.

• Tikhonov’s regularization theory provides a sound mathematical basis for the formulation of RBF networks.

• The Green’s function G(x, x_i) plays a central role in the theory.


Queries?