Soft Computing 2
Neural Networks: NN Architecture vs. Learning Method

| NN Architecture    | Gradient Descent                                        | Hebbian                                                      | Competitive                                                             | Stochastic                    |
|--------------------|---------------------------------------------------------|--------------------------------------------------------------|-------------------------------------------------------------------------|-------------------------------|
| Single-layer FFN   | ADALINE (Adaptive Linear Neuron), Hopfield, Perceptron  | AM (Associative Memory), Hopfield                            | LVQ (Learning Vector Quantization), SOFM (Self-Organizing Feature Map) | -                             |
| Multilayer FFN     | CCM (Cauchy Machine), RBF (Radial Basis Function)       | Neocognitron                                                 | -                                                                       | -                             |
| Recurrent networks | RNN                                                     | BAM (Bidirectional AM), BSB (Brain State in a Box), Hopfield | ART (Adaptive Resonance Theory)                                         | Boltzmann and Cauchy Machines |
Adaptive Filtering Problem
Unconstrained Optimization Techniques
- Steepest Descent
- Newton's Method
- Gauss-Newton Method
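For reference, the standard weight-update rules of these three methods can be summarized as follows (this summary is added here for concreteness, not taken from the slides; notation: weight vector w, step size η, gradient g, Hessian H, and Jacobian J of the error vector e):

```latex
% Steepest descent: step against the local gradient
\mathbf{w}(n+1) = \mathbf{w}(n) - \eta\,\mathbf{g}(n)
% Newton's method: scale the step by the inverse Hessian
\mathbf{w}(n+1) = \mathbf{w}(n) - \mathbf{H}^{-1}(n)\,\mathbf{g}(n)
% Gauss-Newton (sum-of-squared-errors cost): approximate the Hessian by J^T J
\mathbf{w}(n+1) = \mathbf{w}(n) - \left(\mathbf{J}^{\mathsf{T}}(n)\,\mathbf{J}(n)\right)^{-1}\mathbf{J}^{\mathsf{T}}(n)\,\mathbf{e}(n)
```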
LMS Algorithm

The Least Mean Square (LMS) algorithm was introduced by Widrow and Hoff in 1959. The LMS algorithm uses estimates of the gradient vector from the available data. LMS incorporates an iterative procedure that makes successive corrections to the weight vector in the direction of the negative of the gradient vector, which eventually leads to the minimum mean square error. Compared to other algorithms, the LMS algorithm is relatively simple: it does not require correlation function calculation, nor does it require matrix inversions.
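A minimal Python sketch of this procedure (the function name and signal layout are illustrative, not from the slides). Each iteration nudges the weight vector along the negative of the instantaneous gradient estimate e(n)·u(n); note there are no correlation matrices and no inversions:

```python
import numpy as np

def lms_filter(x, d, n_taps=4, eta=0.01):
    """Adapt an FIR filter so that its output tracks the desired signal d.

    x : input signal (1-D array), d : desired signal, eta : step size.
    Returns the final weight vector and the error signal.
    """
    w = np.zeros(n_taps)              # initial weight vector
    e = np.zeros(len(x))
    for n in range(n_taps, len(x)):
        u = x[n - n_taps:n][::-1]     # most recent tap-input vector
        y = w @ u                     # filter output
        e[n] = d[n] - y               # estimation error
        w = w + eta * e[n] * u        # correction along the negative gradient estimate
    return w, e
```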
Graph Representation
Properties of the LMS Algorithm

- The solution follows a random trajectory; hence LMS is also referred to as a stochastic gradient algorithm. Steepest descent, by contrast, follows a well-defined trajectory.
- LMS does not require knowledge of the statistics of the environment.
- LMS is simple and robust, as it is model independent.
- Its main drawback is a slow rate of convergence.
Learning Curves
Multilayer Neural Network (Perceptrons)
Each neuron in the network includes a nonlinear activation function, and the network is highly connected.
Backpropagation Algorithm

Backpropagation is a common method of teaching artificial neural networks how to perform a given task. It was first described by Arthur E. Bryson and Yu-Chi Ho in 1969, but it wasn't until 1974 and later, through the work of Paul Werbos, David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams, that it gained recognition, and it led to a renaissance in the field of artificial neural network research.

It is a supervised learning method, and is a generalization of the delta rule. It requires a teacher that knows, or can calculate, the desired output for any input in the training set. It is most useful for feed-forward networks (networks that have no feedback, or simply, that have no connections that loop). The term is an abbreviation for "backward propagation of errors". Backpropagation requires that the activation function used by the artificial neurons (or "nodes") be differentiable.
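For reference, the delta rule that backpropagation generalizes updates each weight in proportion to the output error times the corresponding input (η is the learning rate, d the desired output, y the actual output):

```latex
\Delta w_i = \eta\,(d - y)\,x_i
```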
Backpropagation networks are necessarily multilayer perceptrons (usually with one input, one hidden, and one output layer). In order for the hidden layer to serve any useful function, multilayer networks must have nonlinear activation functions for the multiple layers: a multilayer network using only linear activation functions is equivalent to some single-layer, linear network. Nonlinear functions commonly used include the logistic function, the softmax function, and the Gaussian function.
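A small sketch of the two most common choices named above (illustrative code; the point of the nonlinearity is that stacking layers with only linear activations collapses to a single linear map):

```python
import numpy as np

def logistic(x):
    """Logistic (sigmoid) function: squashes activations into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    """Softmax: maps a vector of activations to a probability distribution."""
    z = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return z / z.sum()
```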
Backpropagation networks are a good fit when:
- A large amount of input/output data is available, but you're not sure how to relate it to the output.
- The problem appears to have overwhelming complexity, but there is clearly a solution.
- It is easy to create a number of examples of the correct behavior.
- The solution to the problem may change over time, within the bounds of the given input and output parameters (i.e., today 2 + 2 = 4, but in the future we may find that 2 + 2 = 3.99).
- Outputs can be "fuzzy", or non-numeric.
Limitations of backpropagation learning:
- The convergence obtained from backpropagation learning is very slow.
- The convergence in backpropagation learning is not guaranteed.
- The result may generally converge to any local minimum on the error surface, since stochastic gradient descent exists on a surface which is not flat.
- Backpropagation learning requires input scaling or normalization. Inputs are usually scaled into the range of 0.1 to 0.9 for best performance.
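To make the scaling point concrete, a minimal min-max normalization sketch (the function name and exact range are illustrative; constant features are assumed to have been removed):

```python
import numpy as np

def scale_inputs(X, lo=0.1, hi=0.9):
    """Min-max scale each input feature (column of X) into the range [lo, hi]."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return lo + (hi - lo) * (X - x_min) / (x_max - x_min)
```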
Training a Two-Layer Feedforward Network

1. Take the set of training patterns you wish the network to learn: {in_i^p, out_j^p : i = 1..ninputs, j = 1..noutputs, p = 1..npatterns}.
2. Set up your network with ninputs input units fully connected to nhidden nonlinear hidden units via connections with weights, which in turn are fully connected to the noutputs output units.
3. Generate random initial weights, e.g. from the range [-smwt, +smwt].
4. Select an appropriate error function E and learning rate η.
5. Apply the weight update equation Δw = -η ∂E/∂w for each training pattern p. One set of updates of all the weights for all the training patterns is called one epoch of training.
6. Repeat step 5 until the network error function is "small enough".

The extension to networks with more hidden layers should be straightforward. A sketch of these steps follows.
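A minimal Python sketch of steps 2-6 (sigmoidal hidden and output units, sum-of-squares error; biases are added for completeness, and all names are illustrative rather than taken from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_two_layer(X, T, nhidden=4, eta=0.5, smwt=0.5, nepochs=1000, seed=0):
    """Steps 2-6 above: random initial weights in [-smwt, +smwt] and one
    weight update per training pattern; one pass over all patterns = one epoch."""
    rng = np.random.default_rng(seed)
    ninputs, noutputs = X.shape[1], T.shape[1]
    W1 = rng.uniform(-smwt, smwt, (ninputs, nhidden));  b1 = np.zeros(nhidden)
    W2 = rng.uniform(-smwt, smwt, (nhidden, noutputs)); b2 = np.zeros(noutputs)
    for _ in range(nepochs):                  # step 6: repeat until good enough
        for x, t in zip(X, T):                # step 5: per-pattern updates
            h = sigmoid(x @ W1 + b1)          # hidden activations
            y = sigmoid(h @ W2 + b2)          # network outputs
            d_out = (y - t) * y * (1 - y)     # output deltas (sum-of-squares error)
            d_hid = (d_out @ W2.T) * h * (1 - h)
            W2 -= eta * np.outer(h, d_out);  b2 -= eta * d_out
            W1 -= eta * np.outer(x, d_hid);  b1 -= eta * d_hid
    return W1, b1, W2, b2
```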
Practical Considerations for Back-Propagation Learning

Most of the practical considerations necessary for general Back-Propagation learning concern the following questions:
1. Do we need to pre-process the training data? If so, how?
2. How do we choose the initial weights from which we start the training?
3. How do we choose an appropriate learning rate η?
4. Should we change the weights after each training pattern, or after the whole set?
5. Are some activation/transfer functions better than others?
6. How do we avoid flat spots in the error function?
7. How can we avoid local minima in the error function?
8. How do we know when we should stop the training?

However, there are also two important issues specific to multilayer networks:
9. How many hidden units do we need?
10. Should we have different learning rates for the different layers?
How Many Hidden Units?

The best number of hidden units depends in a complex way on many factors, including:
1. The number of training patterns
2. The numbers of input and output units
3. The amount of noise in the training data
4. The complexity of the function or classification to be learned
5. The type of hidden unit activation function
6. The training algorithm

Too few hidden units will generally leave high training and generalisation errors due to underfitting. Too many hidden units will result in low training errors, but will make the training unnecessarily slow, and will result in poor generalisation unless some other technique (such as regularisation) is used to prevent overfitting.

Virtually all rules of thumb you hear about are actually nonsense. A sensible strategy is to try a range of numbers of hidden units and see which works best.
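A hypothetical sketch of that "try a range and see" strategy, reusing the train_two_layer() and sigmoid() functions from the earlier sketch (X_train, T_train, X_val, T_val are assumed to be your own training/validation split):

```python
def forward(X, W1, b1, W2, b2):
    """Forward pass through the trained two-layer network."""
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)

best = None
for nhidden in (2, 4, 8, 16, 32):
    params = train_two_layer(X_train, T_train, nhidden=nhidden)
    err = np.mean((forward(X_val, *params) - T_val) ** 2)  # validation MSE
    if best is None or err < best[0]:
        best = (err, nhidden)
print(f"best validation MSE {best[0]:.4f} with {best[1]} hidden units")
```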
Different Learning Rates for Different Layers?

A network as a whole will usually learn most efficiently if all its neurons are learning at roughly the same speed. So maybe different parts of the network should have different learning rates η. There are a number of factors that may affect the choices:
1. The later network layers (nearer the outputs) will tend to have larger local gradients (deltas) than the earlier layers (nearer the inputs).
2. The activations of units with many connections feeding into or out of them tend to change faster than those of units with fewer connections.
3. Activations required for linear units will be different for Sigmoidal units.
4. There is empirical evidence that it helps to have different learning rates η for the thresholds/biases compared with the real connection weights.

In practice, it is often quicker to just use the same rates η for all the weights and thresholds, rather than spending time trying to work out appropriate differences. A very powerful approach is to use evolutionary strategies to determine good learning rates.
NN Architecture

- Hopfield Network
- Kohonen Self-Organizing Map
- Radial Basis Function Network
- ART (Adaptive Resonance Theory)
- BSB (Brain State in a Box) Model
- Markov Chains
- Helmholtz Machines
- Boltzmann Machine
- Simulated Annealing
- Kalman Filters
- Spatio-Temporal Models of a Neuron
- Bellman Theorem
- Kullback-Leibler Divergence
Applications in Power Systems

Expansion, Generation, Transmission, Distribution, Structural, Reactive power, Reliability
Planning: Generation scheduling, Economic dispatch, OPF, Unit commitment, Reactive power dispatch, Voltage control, Security assessment (static, dynamic), Maintenance scheduling, Contract management

Operation: Equipment monitoring, System load forecasting, Load management, Alarm processing/Fault diagnosis, Service restoration, Network switching, Contingency analysis, FACTS, State estimation

Analysis/Modeling: Power flow, Harmonics, Transient stability, Dynamic stability, Control design, Simulation of operators, Protection