SOFT COMPUTING (PECS 3401) - NEURAL NETWORKS
Lecture Notes | KISHORE KUMAR SAHU, RIT, BERHAMPUR
As per the syllabus of BPUT for the seventh semester of the AE&IE branch.



CHAPTER-04: NEURAL NETWORKS FUNDAMENTALS

INTRODUCTION
Neural networks are simplified models of biological neuron systems that have the ability to learn, acquire knowledge and make it available for future use.

BIOLOGICAL NEURON
The basic unit of the nervous system is the neuron. The components of a neuron are as follows:
i. Dendrite (Synapse): They act as the i/p channels of the neuron. They are responsible for feeding the inputs, which come from the neighbouring neurons, to the neuron.
ii. Cell body (Soma): It is the heart of the neuron. It is responsible for processing the inputs received from the synapses.
iii. Axon: It acts as the o/p channel of the neuron. It is responsible for giving the output to the neighbouring neurons.

ARTIFICIAL NEURON MODEL (ANN)
Artificial neurons are models that are inspired by the biological neurons. The components of an artificial neuron are similar to those of a biological neuron:
i. Inputs: They are responsible for receiving the inputs x1, x2, ..., xn and summing them up to get the final input to the activation function. The input to the neuron is the sum of the weighted inputs, i.e. I = w1x1 + w2x2 + ... + wnxn = Σi wixi.
ii. Activation function: It is responsible for checking whether the input signal is to be sent to the output or not. This is possible due to a threshold value θ present in the activation unit, against which the input is always checked. If the input exceeds the threshold value, then the input is propagated to the output, otherwise not, i.e. y = ∅(I) = ∅(Σi wixi − θ).
iii. Outputs: They are responsible for receiving the output from the activation function, depending on the decision taken by the activation function.
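The behaviour of inputs, weighted sum and threshold check described above can be sketched in a few lines of Python (the weights, inputs and threshold here are illustrative values, not taken from the notes):

    import numpy as np

    def artificial_neuron(x, w, theta):
        # I = sum_i w_i * x_i, followed by the threshold check of the activation unit
        i = float(np.dot(w, x))
        return 1 if i > theta else 0

    # illustrative inputs and weights: 0.4*1.0 + 0.6*0.5 = 0.7 > 0.5, so the output is 1
    print(artificial_neuron(np.array([1.0, 0.5]), np.array([0.4, 0.6]), theta=0.5))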

ARCHITECTURE OF NEURAL NETWORKS
An artificial neural network is a data processing system consisting of a large number of simple, highly interconnected processing elements, with an architecture inspired by the structure of the cerebral cortex of the brain. Generally, an ANN structure can be represented using a directed graph: since the flow of signals is restricted to one direction only, we make use of directed graphs, where the vertices represent the neurons and the edges represent the synaptic links. Broadly, there are three classes of neural networks, as follows.
i. Single layer feed-forward networks
This type of network comprises two layers, namely the input layer and the output layer. The input layer receives the input signals and the output layer gives out the output signals. The synaptic links carrying the weights connect every input neuron to the output neurons, but not vice-versa. Such a network is said to be feed-forward in type, or acyclic in nature. Despite the two layers, the network is called single layer because it is the output layer alone that performs the computation; the input layer just passes the signals on to the output layer.
ii. Multi layer feed-forward networks
This architecture is called a multi layer NN due to the presence of a number of intermediate layers called hidden layers. The units of a hidden layer are called hidden neurons or hidden units. These hidden layers are responsible for performing the processing of the input signals along with the output layer. They are connected to the input layer by weights called input-hidden layer weights and to the output layer by hidden-output layer weights. If a multi-layered NN has 3 input neurons, 5 hidden neurons in one layer, 4 in another, and 2 in the output layer, then this multi layered neural network is represented as 3-5-4-2.
iii. Recurrent networks
Networks in this class have at least one feedback loop. This is what makes them different from the other networks. It is also possible that a neuron has a self loop, i.e. feeds back into itself.

LEARNING METHODS
Learning methods in Neural Networks can be broadly classified into three basic types: supervised, unsupervised and reinforced.
i. Supervised Learning: In this, every input pattern that is used to train the network is associated with an output pattern, which is the target or the desired pattern. A teacher is assumed to be present during the learning process: a comparison is made between the network's computed output and the correct expected output to determine the error. The error can then be used to change the network parameters, which results in an improvement in performance.
  i.i. Gradient descent learning: This is based on the minimization of the error E defined in terms of the weights and the activation function of the network. It is required that the activation function employed by the network is differentiable, as the weight update depends on the gradient of the error E. Thus, if Δwij is the weight update of the link connecting the ith and jth neurons of two neighbouring layers, then Δwij = η ∂E/∂wij, where η is the learning rate parameter and ∂E/∂wij is the error gradient with reference to the weight wij.
  i.ii. Stochastic learning: In this method, weights are adjusted in a probabilistic fashion. An example is evident in simulated annealing, the learning mechanism employed by Boltzmann and Cauchy machines, which are a kind of NN system.
ii. Unsupervised Learning: In this learning method, the target output is not presented to the network. It is as if there is no teacher to present the desired patterns and hence the system learns on its own, by discovering and adapting to structural features in the input patterns.
  ii.i. Hebbian learning: It is based on correlative weight adjustment and is the oldest learning mechanism, inspired by biology. In this, the input-output pattern pairs (Xi, Yi) are associated by the weight matrix W, known as the correlation matrix, computed as W = Σi Xi Yi^T, where Yi^T is the transpose of the associated output vector Yi (a small sketch of this rule is given after this list).
  ii.ii. Competitive Learning: In this method, those neurons which respond strongly to the input stimuli have their weights updated. When an input pattern is presented, all the neurons in the layer compete and the winning neuron undergoes the weight adjustment. Hence, it is a "winner-takes-all" strategy.
iii. Reinforced Learning: In this method, a teacher, though available, does not present the expected answer but only indicates whether the computed output is correct or incorrect. The information provided helps the network in its learning process: a reward is given for a correct answer computed and a penalty for a wrong answer. Reinforced learning is, however, not one of the popular forms of learning.
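As a small illustration of the Hebbian rule in ii.i above, the correlation matrix can be accumulated from outer products of the training pairs; the two bipolar pattern pairs below are made up for the example:

    import numpy as np

    # Hebbian learning: W = sum_i X_i . Y_i^T (outer product of each input/output pair)
    pairs = [(np.array([1, -1, 1]), np.array([1, -1])),
             (np.array([-1, 1, 1]), np.array([-1, 1]))]
    W = np.zeros((3, 2))
    for X, Y in pairs:
        W += np.outer(X, Y)      # correlative weight adjustment
    print(W)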

CHARACTERISTICS OF NEURAL NETWORKS
i. The NNs exhibit mapping capabilities, that is, they can map input patterns to their associated output patterns.
ii. The NNs learn by examples. Thus, an NN architecture can be trained with known examples of a problem before it is tested for its inference capability on unknown instances of the problem; it can identify new objects it was not previously trained on.
iii. The NNs possess the capability to generalize. Thus, they can predict new outcomes from past trends.
iv. The NNs are robust systems and are fault tolerant. They can therefore recall full patterns from incomplete, partial or noisy patterns.
v. The NNs can process information in parallel, at high speed, and in a distributed manner.

SOME APPLICATION DOMAINS
Neural networks have been successfully applied for the solution of a variety of problems. Some of them are listed below:
i. Pattern recognition (PR)/image processing: Neural networks have shown remarkable progress in the recognition of visual images, handwritten characters, printed characters, speech and other PR based tasks.
ii. Optimization/constraint satisfaction: This comprises problems which need to satisfy constraints and obtain optimal solutions. Examples of such problems include manufacturing scheduling, finding the shortest possible tour given a set of cities, etc. Several problems of this nature arising out of the industrial and manufacturing fields have found acceptable solutions using NNs.
iii. Forecasting and risk assessment: Neural networks have exhibited the capability to predict situations from past trends. They have, therefore, found ample applications in areas such as meteorology, the stock market, banking and econometrics, with high success rates.
iv. Control systems: Neural networks have gained commercial ground by finding applications in control systems. Dozens of computer products, especially by Japanese companies, incorporating NN technology are a standing example. Besides, they have also been used for the control of chemical plants, robots and so on.

CHAPTER-05: BACKPROPAGATION NETWORKS

ACTIVATION FUNCTIONS
To generate the final output y, the sum I = w1x1 + w2x2 + ... + wnxn = Σi wixi is passed on to a non-linear filter called the activation function / transfer function / squash function, which releases the output, i.e. y = ∅(I).
Thresholding function: A very commonly used activation function. Here the sum is compared with a threshold value θ: if the value of I is greater than θ, then the output is 1, else it is 0, i.e. y = ∅(Σi wixi − θ), where ∅ is the step function known as the Heaviside function, ∅(I) = 1 if I > 0 and 0 if I ≤ 0.
Signum function: Also known as the quantizer function, defined as y = ∅(I) = +1 if I > θ and −1 if I ≤ θ.
Sigmoidal function: This is a continuous function that varies gradually between the asymptotic values 0 and 1 (or −1 and +1), given by ∅(I) = 1 / (1 + e^(−αI)), where α is the slope parameter, which adjusts the abruptness of the function as it changes between the two asymptotic values. Sigmoidal functions are differentiable, which is an important feature in NN theory.
Hyperbolic tangent function: The function is given by ∅(I) = tanh(I) and can produce negative output values.
The other activation functions are the Radial Basis function, Unipolar Multimodal, Piecewise Linear, Hard Limiter, Unipolar Sigmoidal and Bipolar Sigmoidal functions.
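A minimal sketch of the activation functions listed above (θ and the slope α are free parameters; the test values are illustrative):

    import numpy as np

    def heaviside(i, theta=0.0):       # thresholding function: 1 if I > theta else 0
        return 1 if i > theta else 0

    def signum(i, theta=0.0):          # quantizer function: +1 if I > theta else -1
        return 1 if i > theta else -1

    def sigmoid(i, alpha=1.0):         # continuous and differentiable, output in (0, 1)
        return 1.0 / (1.0 + np.exp(-alpha * i))

    def tanh_act(i):                   # hyperbolic tangent, can produce negative outputs
        return np.tanh(i)

    print(heaviside(0.3), signum(-0.3), round(sigmoid(0.3), 3), round(tanh_act(0.3), 3))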

SINGLE LAYER PERCEPTRON
A single layer perceptron consists of an input layer and an output layer. Here we have a perceptron with two inputs x1 and x2 with weights w1 and w2, and a bias input x0 with weight w0. The output of the perceptron is calculated from the sum of the weighted inputs and the bias, i.e. net = w0 + w1x1 + w2x2. This represents the equation of a straight line, which makes it clear that a single layer perceptron can represent only problems that are linearly separable. Since logic gates correspond to problems that are linearly separable, they can easily be simulated by a single layer perceptron.

Algorithm for training a single layer perceptron
Step 1. Initialize the weights and bias to zero. Also set the learning rate η to a value between 0 and 1.
Step 2. While the stopping condition fails, do steps 3 to 7.
Step 3. For each training pair, do steps 4 to 6.
Step 4. Set the activation of the i/p units.
Step 5. Compute the net i/p to the activation function, yin = b + Σi xi wi. The activation function is used to compute the o/p: y = f(yin) = +1 if yin > θ, 0 if −θ ≤ yin ≤ θ, and −1 if yin < −θ.
Step 6. If the o/p and the target are not equal, then change the weights and bias as follows: wi(new) = wi(old) + ηTxi and b(new) = b(old) + ηT; else wi(new) = wi(old) and b(new) = b(old).
Step 7. Test for the stopping condition.

Example: Train a single layer perceptron for an AND gate.
The training set for the AND gate (in bipolar form, with bias input 1) is as follows:

    x1  x2  b    T
    -1  -1  1   -1
     1  -1  1   -1
    -1   1  1   -1
     1   1  1    1

Training with η = 1 and θ = 0, starting from zero weights:

    x1  x2  b | yin |  y |  T | Δw1 Δw2 Δb | w1  w2   b
    -1  -1  1 |   0 |  0 | -1 |   1   1  -1 |  1   1  -1
     1  -1  1 |  -1 | -1 | -1 |   0   0   0 |  1   1  -1
    -1   1  1 |  -1 | -1 | -1 |   0   0   0 |  1   1  -1
     1   1  1 |   1 |  1 |  1 |   0   0   0 |  1   1  -1

In the next epoch the net inputs are −3, −1, −1 and 1, so the outputs are −1, −1, −1 and 1, which equal the targets for all four patterns. Training is successful, since the output of the single layer perceptron is equal to the target output.
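The training algorithm and the AND-gate example above can be condensed into a short sketch (learning rate 1 and threshold θ = 0, as in the worked table; the epoch limit is an illustrative safeguard):

    import numpy as np

    def step(y_in, theta=0.0):                 # the +1 / 0 / -1 activation of Step 5
        if y_in > theta:
            return 1
        if y_in < -theta:
            return -1
        return 0

    # bipolar AND-gate training set: columns are x1, x2 and the bias input
    X = np.array([[-1, -1, 1], [1, -1, 1], [-1, 1, 1], [1, 1, 1]])
    T = np.array([-1, -1, -1, 1])
    w = np.zeros(3)                            # w1, w2 and b all start at zero
    for epoch in range(10):
        changed = False
        for x, t in zip(X, T):
            if step(np.dot(w, x)) != t:        # update only when output and target differ
                w += t * x                     # w(new) = w(old) + eta*T*x with eta = 1
                changed = True
        if not changed:                        # stopping condition: a whole epoch with no change
            break
    print(w)                                   # converges to w1 = 1, w2 = 1, b = -1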

ADALINE NETWORKS
ADALINE networks (adaptive linear neuron networks) are similar to the single layer perceptron: all the components are the same as those of a single layer perceptron. So we can conclude that the ADALINE network is the same as the single layer perceptron, except that the weight update equations are wi(new) = wi(old) + α(T − yin)xi and b(new) = b(old) + α(T − yin); i.e. the target T in the perceptron training algorithm is replaced with the error (T − yin).

Algorithm for training an ADALINE network
Step 1. Initialize the weights and bias to zero. Also set the learning rate α to a value between 0 and 1.
Step 2. While the stopping condition fails, do steps 3 to 7.
Step 3. For each training pair, do steps 4 to 6.
Step 4. Set the activation of the i/p units.
Step 5. Compute the net i/p to the activation function, yin = b + Σi xi wi. The activation function is used to compute the o/p: y = f(yin) = +1 if yin ≥ 0 and −1 if yin < 0.
Step 6. If the o/p and the target are not equal, then change the weights and bias as follows: wi(new) = wi(old) + α(T − yin)xi and b(new) = b(old) + α(T − yin); else wi(new) = wi(old) and b(new) = b(old).
Step 7. Test for the stopping condition.
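A sketch of the ADALINE (delta-rule) update above, trained on the bipolar AND data used earlier; the learning rate 0.1 and the fixed epoch count are illustrative choices:

    import numpy as np

    X = np.array([[-1, -1, 1], [1, -1, 1], [-1, 1, 1], [1, 1, 1]], dtype=float)
    T = np.array([-1, -1, -1, 1], dtype=float)
    w = np.zeros(3)
    alpha = 0.1
    for epoch in range(50):
        for x, t in zip(X, T):
            y_in = np.dot(w, x)                # net input
            w += alpha * (t - y_in) * x        # weights move by the error (T - y_in), not by T
    y = np.where(X @ w >= 0, 1, -1)            # bipolar step output
    print(np.round(w, 2), y)                   # w settles near [0.5, 0.5, -0.5]; outputs match the targets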

MADALINE NETWORKS
The MADALINE (Many ADALINE) networks can handle problems that are not linearly separable, such as X-OR. A MADALINE is a network composed of several constituent ADALINE networks or single layer perceptrons. It is called MADALINE due to the reason that it has a number of hidden layers.

Algorithm for MADALINE
Step 1. Initialize the i/p-hidden neuron layer weights and bias to small random values, and the hidden-o/p layer weights and bias to 0.5.
Step 2. While the stopping condition fails, do steps 3 to 9.
Step 3. For each training pair, do steps 4 to 8.
Step 4. Set the activation of the i/p units.
Step 5. Compute the net i/p to the hidden units, zin(j) = bj + Σi xi wij.
Step 6. The activation function is used to compute the o/p of the hidden units, zj = f(zin(j)).
Step 7. Calculate the net i/p for the o/p neuron from the hidden outputs and the o/p bias, and apply the activation function to obtain y.
Step 8. If the o/p and the target are not equal, then update the weights and bias of the hidden units with the ADALINE rule, wij(new) = wij(old) + α(T − zin(j))xi and bj(new) = bj(old) + α(T − zin(j)); else leave them unchanged.
Step 9. Test for the stopping condition.

Example: Perform the training process for X-OR with a MADALINE having 2 hidden units and 1 output unit (TRAINING X-OR BY MADALINE ALGORITHM). The weights are initialized to w1 = w2 = 0.43 and the bias to b1 = 0.5. In each of the first three passes only the pattern (x1, x2) = (1, 1) produces a wrong output (y = 1 while T = −1), so its weight w1 is corrected by Δw1 = −0.6 in every pass, moving w1 from 0.43 to −0.17, then −0.77 and finally −1.37, while the remaining patterns need no change. In the final pass all four patterns are classified correctly:


    x1  x2  w1     w2    b1   yin    y   T  Δw1 Δw2 Δb1  w1n    w2n   b1n
    -1  -1   0.43  0.43  0.5  -0.36  -1  -1   0   0   0   0.43  0.43  0.5
     1   1  -1.37  0.43  0.5  -0.44  -1  -1   0   0   0  -1.37  0.43  0.5
    -1   1   0.43  0.43  0.5   0.5    1   1   0   0   0   0.43  0.43  0.5
     1  -1   0.43  0.43  0.5   0.5    1   1   0   0   0   0.43  0.43  0.5

BACK PROPAGATION NEURAL NETWORKS
A multilayer feed forward back propagation neural network with one layer of hidden units (z units) is shown in the figure. The output neurons have the biases w01, w02, ..., w0m and the hidden neurons have the biases v01, v02, ..., v0m. In the figure only the feed forward phase is shown, but during the back propagation phase of learning the signals are sent in the reverse direction, i.e. from the o/p to the i/p layer. The training algorithm of back propagation involves four steps:
Initialization of weights and bias - The bias and the weights are chosen as small random numbers.
Feedforward - Each i/p unit receives the input and transmits it to the hidden layer. Each hidden unit then calculates its o/p and transmits this signal to the o/p units.
Back propagation of error - Each o/p is compared with the target o/p to determine the error. Based on the error, the factor δk is computed and is used to distribute the error at o/p unit yk back to all units in the previous layer. Similarly, the factor δj is computed for each hidden unit.
Update the weights and the bias - The weights and the bias are updated using the δ-factor and the activation: δk = (tk − yk) f′(yin(k)) (generalized δ rule of learning).

Algorithm for Back propagation
Step 1. Initialize the weights and bias to small random values.
Step 2. While the stopping condition fails, do steps 3 to 10.
Step 3. For each training pair, do steps 4 to 9.
Step 4. Each i/p unit receives the signal xi and transmits this signal to all the hidden units.
Step 5. Compute the net i/p to the hidden units, zin(j) = v0j + Σi xi vij. The activation function is used to compute the o/p of the hidden units, zj = f(zin(j)).
Step 6. Calculate the net i/p to the o/p neurons, yin(k) = w0k + Σj zj wjk, and apply the activation function for each yk, yk = f(yin(k)).
Step 7. For each yk calculate δk = (tk − yk) f′(yin(k)) = (tk − yk) · f(yin(k)) · (1 − f(yin(k))).
Step 8. Each hidden unit sums the δ contributions it receives from the layer above, δin(j) = Σk δk wjk, and its error term is calculated as δj = δin(j) · f′(zin(j)).
Step 9. Each o/p unit updates its weights and bias as follows: wjk(new) = wjk(old) + Δwjk where Δwjk = α δk zj, and w0k(new) = w0k(old) + Δw0k where Δw0k = α δk. Each hidden unit updates its weights and bias as follows: vij(new) = vij(old) + Δvij where Δvij = α δj xi, and v0j(new) = v0j(old) + Δv0j where Δv0j = α δj.
Step 10. Test for the stopping condition.

Merits of the Back propagation Algorithm
1. The mathematical formulation is compatible with any kind of network.
2. A multilayer neural network trained with the back propagation algorithm has a great representation capability and can use any non-linear activation function.
3. It requires a good set of training data and can tolerate noise and missing data in the training sample.
4. It is easy to implement.
5. The computation time is reduced if the weights chosen are small at the beginning stage.
6. It has wide application.
7. It can be used to store a huge number of patterns.
8. Batch update of the weights is possible, which provides a smoothing effect on the weight corrections.

Demerits of the Back propagation Algorithm
1. Learning often takes a long time to converge.
2. Complex functions require more iterations.
3. The gradient descent method used in the back propagation algorithm only guarantees minimizing the error to a local minimum.
4. The network may be trapped in a local minimum even though a better solution is available nearby.
5. The training may sometimes cause temporal instability to the system.
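A compact sketch of one back propagation training step in the notation of the algorithm above (binary sigmoid, one hidden layer; the network size, random initialisation and learning rate are illustrative):

    import numpy as np

    def f(x):                                   # binary sigmoid, f'(x) = f(x)(1 - f(x))
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out, alpha = 3, 4, 1, 0.25
    v = rng.normal(0.0, 0.5, (n_in, n_hid))     # input-hidden weights v_ij
    v0 = np.zeros(n_hid)                        # hidden biases v_0j
    w = rng.normal(0.0, 0.5, (n_hid, n_out))    # hidden-output weights w_jk
    w0 = np.zeros(n_out)                        # output biases w_0k

    def train_step(x, t):
        global v, v0, w, w0
        z = f(v0 + x @ v)                       # Step 5: hidden net input and output
        y = f(w0 + z @ w)                       # Step 6: output of the network
        delta_k = (t - y) * y * (1 - y)         # Step 7: error term of the output units
        delta_j = (delta_k @ w.T) * z * (1 - z) # Step 8: back-propagated hidden error term
        w += alpha * np.outer(z, delta_k)       # Step 9: weight and bias updates
        w0 += alpha * delta_k
        v += alpha * np.outer(x, delta_j)
        v0 += alpha * delta_j
        return y

    print(train_step(np.array([0.6, 0.8, 0.0]), np.array([0.9])))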


Example: Find the new weights for a network with i/p pattern [0.6 0.8 0] and target o/p 0.9. The i/p-hidden weights are
    v = [2 1 0; 1 2 2; 0 3 1]
and the hidden-o/p weights are w = [−1 1 2]^T, with v03 = −1 and w01 = −1. The learning rate is α = 0.3 and a binary sigmoid activation function is used.
Solution: Take f(s) = 1/(1 + e^(−λs)) with slope λ = 0.3 (the slope that reproduces the values below); zin(j) = v0j + Σi xi vij and zj = f(zin(j)).
zin(1) = v01 + Σi xi vi1 = 0 + (0.6×2 + 0.8×1 + 0×0) = 2
zin(2) = v02 + Σi xi vi2 = 0 + (0.6×1 + 0.8×2 + 0×3) = 2.2
zin(3) = v03 + Σi xi vi3 = −1 + (0.6×0 + 0.8×2 + 0×1) = 0.6
z1 = f(2) = 1/(1 + e^(−0.3×2)) = 0.645
z2 = f(2.2) = 1/(1 + e^(−0.3×2.2)) = 0.659
z3 = f(0.6) = 1/(1 + e^(−0.3×0.6)) = 0.5448
yin = w01 + Σj zj wj = −1 + (0.645×(−1) + 0.659×1 + 0.5448×2) = 0.1
y = f(0.1) = 1/(1 + e^(−0.3×0.1)) = 0.50749 ≈ 0.5075
δk = (tk − yk) f′(yin(k)) = (tk − yk) · f(yin(k)) · (1 − f(yin(k)))
δ1 = (0.9 − 0.5075) × 0.5075 × (1 − 0.5075) = 0.0977
Error at the hidden layer: δin(j) = Σk δk wjk
δin(1) = 0.0977 × (−1) = −0.0977; δin(2) = 0.0977 × 1 = 0.0977; δin(3) = 0.0977 × 2 = 0.1954
Error term at the hidden layer: δj = δin(j) · f′(zin(j)) = δin(j) · zj · (1 − zj)
δ1 = −0.01802; δ2 = 0.01769; δ3 = 0.03906
Weight updating - change in the i/p-hidden layer weights, Δvij = α δj xi:
Δv11 = 0.3 × (−0.01802) × 0.6 = −0.00324; Δv12 = 0.3 × 0.01769 × 0.6 = 0.00318; Δv13 = 0.3 × 0.03906 × 0.6 = 0.00703
Δv21 = 0.3 × (−0.01802) × 0.8 = −0.00432; Δv22 = 0.3 × 0.01769 × 0.8 = 0.00424; Δv23 = 0.3 × 0.03906 × 0.8 = 0.00937
Δv31 = Δv32 = Δv33 = 0 (since x3 = 0)
The new i/p-hidden weights are vij(new) = vij(old) + Δvij:
v(new) = [2 − 0.00324, 1 + 0.00318, 0 + 0.00703; 1 − 0.00432, 2 + 0.00424, 2 + 0.00937; 0, 3, 1]
       = [1.99676, 1.00318, 0.00703; 0.99568, 2.00424, 2.00937; 0, 3, 1]

Change in the bias of the hidden layer, Δv0j = α δj:
Δv01 = 0.3 × (−0.01802) = −0.0054; Δv02 = 0.3 × 0.01769 = 0.0053; Δv03 = 0.3 × 0.03906 = 0.0117
New hidden biases, v0j(new) = v0j(old) + Δv0j = [0 − 0.0054, 0 + 0.0053, −1 + 0.0117] = [−0.0054, 0.0053, −0.9883]
Change in the weights of the hidden-o/p layer, Δwj = α δk zj:
Δw1 = 0.01526; Δw2 = 0.01558; Δw3 = 0.01287
New hidden-o/p weights, wj(new) = wj(old) + Δwj = [−1 + 0.01526, 1 + 0.01558, 2 + 0.01287] = [−0.98474, 1.01558, 2.01287]
Change in the bias of the o/p layer, Δw01 = α δ1 = 0.023
New o/p bias, w01(new) = w01(old) + Δw01 = −1 + 0.023 = −0.977
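The z and y values printed in the example are reproduced if the "binary sigmoid" is taken with slope λ = α = 0.3, i.e. f(s) = 1/(1 + e^(−0.3s)); a quick check of the forward pass under that assumption:

    import numpy as np

    lam = 0.3                                    # assumed sigmoid slope (matches the printed z, y)
    f = lambda s: 1.0 / (1.0 + np.exp(-lam * s))
    x = np.array([0.6, 0.8, 0.0])
    v = np.array([[2.0, 1.0, 0.0],               # input-hidden weights (rows: inputs, columns: hidden units)
                  [1.0, 2.0, 2.0],
                  [0.0, 3.0, 1.0]])
    v0 = np.array([0.0, 0.0, -1.0])              # hidden biases (v01 = v02 = 0, v03 = -1)
    w = np.array([-1.0, 1.0, 2.0])               # hidden-output weights
    w0 = -1.0                                    # output bias

    z_in = v0 + x @ v                            # [2.0, 2.2, 0.6]
    z = f(z_in)                                  # approx [0.645, 0.659, 0.545]
    y = f(w0 + z @ w)                            # approx 0.5075
    print(np.round(z_in, 2), np.round(z, 4), round(float(y), 4))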

BACK PROPAGATION THROUGH TIME (BPTT) ALGORITHM
To train a recurrent neural network using BPTT, some training data is needed. This data should be an ordered sequence of input-output pairs (a[t], y[t]). Also, an initial value must be specified for the context x; typically, the zero-magnitude vector is used for this purpose.
BPTT begins by unfolding the recurrent neural network through time, as shown in the figure. The recurrent neural network contains two feed-forward networks, f and g. When the network is unfolded through time, the unfolded network contains k instances of f and one instance of g. In the example shown, the network has been unfolded to a depth of k = 3.
Training then proceeds in a manner similar to training a feed-forward neural network with backpropagation, except that each epoch must run through the observations in sequential order. Each training pattern consists of the current context, the next k inputs and the target output k steps ahead. (All of the inputs for k time-steps are needed because the unfolded network has inputs at each unfolded level.) Typically, backpropagation is applied in an online manner to update the weights as each training pattern is presented.
After each pattern is presented and the weights have been updated, the weights in each instance of f (f1, f2, ..., fk) are averaged together so that they all have the same weights. Also, the new context is computed from f, which provides the information necessary so that the algorithm can move on to the next time-step, t + 1.


Pseudo-code for BPTT:

    Back_Propagation_Through_Time(a, y)     // a[t] is the input at time t, y[t] is the output
        Unfold the network to contain k instances of f
        do until stopping criteria is met:
            x = the zero-magnitude vector   // x is the current context
            for t from 0 to n - 1           // t is time, n is the length of the training sequence
                Set the network inputs to x, a[t], a[t+1], ..., a[t+k-1]
                p = forward-propagate the inputs over the whole unfolded network
                e = y[t+k] - p              // error = target - prediction
                Back-propagate the error, e, back across the whole unfolded network
                Update all the weights in the network
                Average the weights in each instance of f together, so that each f is identical
                x = f(x)                    // compute the context for the next time-step

Advantages
BPTT tends to be significantly faster for training recurrent neural networks than general-purpose optimization techniques such as evolutionary optimization.
Disadvantages
BPTT has difficulty with local optima. With recurrent neural networks, local optima are a much more significant problem than with feed-forward neural networks. The recurrent feedback in such networks tends to create chaotic responses in the error surface, which causes local optima to occur frequently, and in very poor locations on the error surface.

RADIAL BASIS FUNCTION NETWORK (RBFN)
Let us look at some regression models:
1. Polynomial regression with one variable: y(x, w) = w0 + w1x + w2x² + ... = Σj wj x^j.
2. Simple linear regression with D variables: y(x, w) = w0 + w1x1 + ... + wDxD; in the one-dimensional case y(x, w) = w0 + w1x, which is a straight line.
3. Linear regression with basis functions ∅j(x): y(x, w) = w0 + Σj wj ∅j(x) = w^T ∅(x); there are now M parameters instead of D parameters.
For this last model we use radial basis functions. A radial basis function depends only on the radial distance (typically Euclidean) from its centre, i.e. ∅(x) = ∅(‖x‖) for a function centred at the origin, and ∅j(x) = h(‖x − cj‖) for a basis function centred at cj. We will look at radial basis functions centred at the data points xn, n = 1, ..., N.
Typically h is a Gaussian, ∅j(x) = exp(−‖x − xj‖² / (2σ²)), or a logistic function, ∅j(x) = 1 / (1 + exp(‖x − xj‖²)).
The RBFN is characterised by two types of weights, i.e. hypothetical weights (the fixed weights of the input-hidden layer) and adjustable weights (the weights that can be adjusted, i.e. the hidden-output layer weights wj).

Algorithm for a Radial Basis Function N/W
Step 1. Initialize the weights to small random values.
Step 2. While the stopping condition fails, do steps 3 to 9.
Step 3. Activate the input neurons by applying a set of inputs.


Step 4. For each input, do steps 5 to 8.
Step 5. Calculate the RBF.
Step 6. Choose the centre for each radial basis function.
Step 7. Calculate the output of each hidden neuron as hj(x) = exp(−‖x − cj‖² / (2σ²)), where x is the applied input, cj is the centre of the Gaussian function and σ is the smoothing parameter (width) of the Gaussian function.
Step 8. Calculate the output of the network as y(x) = Σj wj hj(x) + w0, where wj are the weights of the adjustable connections, hj(x) is the response of the RB neurons, and w0 is the bias of the output neuron.
Step 9. Test for the stopping condition using the error E = Σn Σk (yk(n) − tk(n))², where n runs over the input patterns and k over the output nodes; yk(n) is the achieved output of node k for input n and tk(n) is the desired output of node k for input n.
Comparison of RBF and BP-MLP networks

    RBF N/W                                                                  | BP-MLP
    Always consists of three layers, i.e. input, hidden and output layer.    | Consists of one input layer, one output layer and a number of hidden layers.
    The input-hidden layer weights are hypothetical (cannot be changed).     | All the connections are adjustable.
    The response of the hidden layer follows the Gaussian function.          | The response of a hidden neuron is a linear combination of all its inputs.
    Constructs a local approximation to the non-linear input-output mapping. | Constructs a global approximation to the non-linear input-output mapping.
    Has a non-linear activation function.                                    | Has a linear activation function.
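A sketch of the RBF network output y(x) = Σj wj hj(x) + w0 described above, with Gaussian basis functions centred on the training points; the 1-D data, the width σ and the least-squares fit of the adjustable hidden-output weights (used here instead of the iterative update) are all illustrative:

    import numpy as np

    X = np.linspace(0.0, 1.0, 20)                 # illustrative 1-D training inputs (also the centres c_j)
    T = np.sin(2 * np.pi * X)                     # illustrative targets
    sigma = 0.1                                   # smoothing parameter / width of the Gaussians

    def design_matrix(x):
        # h_j(x) = exp(-||x - c_j||^2 / (2 sigma^2)); the last column is the bias term
        H = np.exp(-((x[:, None] - X[None, :]) ** 2) / (2 * sigma ** 2))
        return np.hstack([H, np.ones((len(x), 1))])

    W = np.linalg.pinv(design_matrix(X)) @ T      # only the hidden-output weights are adjusted
    y = design_matrix(np.array([0.25])) @ W
    print(round(float(y[0]), 3))                  # close to sin(pi/2) = 1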

KOHONEN SELF ORGANIZING FEATURE MAP (SOM)
The key principle for map formation is that training should take place over an extended region of the network, which uses the concept of neighbourhood neurons. The competitive network is similar to a single layer network, except that there exist interconnections between the output neurons, because of which all the output neurons compete with each other and a winner is selected. There are two methods to select a winner:
1. Squared Euclidean method: Here we find the square of the distance between the input vector x and the weight vector of each output unit, i.e. D(j) = Σi (xi − wij)². The winner is the neuron with the smallest distance.
2. Dot product method: Here we find the dot product of the input vector and the weight vector, D(j) = Σi xi wij. The neuron with the largest dot product is treated as the winner.
Algorithm for the Kohonen Self Organizing Feature Map
Initially the weights and the learning rate (α) are set to small values. The input vectors to be clustered are presented to the network. Once an input vector is given, based on the current weights all the output neurons compete with each other and the winner is selected. Based on the winner selection, the weights are updated for that particular unit and its neighbourhood.

Step 1. Initialize the weights and the learning rate.
Step 2. While the stopping condition fails, do steps 3 to 9.
Step 3. For each input vector, do steps 4 to 6.
Step 4. For each j compute the squared Euclidean distance D(j) = Σi (xi − wij)², for 1 ≤ i ≤ n and 1 ≤ j ≤ m.
Step 5. Find the index J for which D(J) is minimum.
Step 6. For all units j within the specified neighbourhood of J, and for all i, update the weights: wij(new) = wij(old) + α(xi − wij(old)).
Step 7. Update the learning rate.
Step 8. Reduce the radius of the topological neighbourhood.
Step 9. Test for the stopping condition.
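The winner selection (steps 4 and 5) and weight update (step 6) above in a few lines, reproducing the example that follows (neighbourhood radius 0, i.e. only the winner is updated):

    import numpy as np

    W = np.array([[0.2, 0.6, 0.4, 0.9, 0.2],      # w_ij: row i = input component, column j = cluster unit
                  [0.3, 0.5, 0.7, 0.6, 0.8]])
    x = np.array([0.3, 0.4])
    alpha = 0.3

    D = ((x[:, None] - W) ** 2).sum(axis=0)       # squared Euclidean distance D(j) for every unit
    j = int(np.argmin(D))                         # winning unit: smallest distance
    W[:, j] += alpha * (x - W[:, j])              # update only the winner
    print(np.round(D, 2), j, np.round(W[:, j], 2))   # winner is unit 1 of the example (index 0 here), new w = [0.23 0.33]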


Example: W = [0.2 0.6 0.4 0.9 0.2; 0.3 0.5 0.7 0.6 0.8] (the columns are the weight vectors of the five cluster units), x = [0.3 0.4], α = 0.3.
Solution: D(j) = Σi (xi − wij)²
D(1) = (x1 − w11)² + (x2 − w21)² = (0.3 − 0.2)² + (0.4 − 0.3)² = 0.01 + 0.01 = 0.02
D(2) = (x1 − w12)² + (x2 − w22)² = (0.3 − 0.6)² + (0.4 − 0.5)² = 0.09 + 0.01 = 0.1
D(3) = (x1 − w13)² + (x2 − w23)² = (0.3 − 0.4)² + (0.4 − 0.7)² = 0.01 + 0.09 = 0.1
D(4) = (x1 − w14)² + (x2 − w24)² = (0.3 − 0.9)² + (0.4 − 0.6)² = 0.36 + 0.04 = 0.4
D(5) = (x1 − w15)² + (x2 − w25)² = (0.3 − 0.2)² + (0.4 − 0.8)² = 0.01 + 0.16 = 0.17
D(1) is minimum, so unit 1 is the winner and its weights are updated with wij(new) = wij(old) + α(xi − wij(old)):
w11(new) = 0.2 + 0.3(0.3 − 0.2) = 0.2 + 0.03 = 0.23
w21(new) = 0.3 + 0.3(0.4 − 0.3) = 0.3 + 0.03 = 0.33
W(new) = [0.23 0.6 0.4 0.9 0.2; 0.33 0.5 0.7 0.6 0.8]

LEARNING VECTOR QUANTIZATION (LVQ)
The architecture of LVQ is similar to the SOM architecture. In an LVQ network each output unit has a known class; hence it uses a supervised learning method. The reference vectors can be initialized in two ways:
1. Take the first m training data points and use them as the weight (reference) vectors; the remaining vectors are used for training.
2. Initialize the reference vectors randomly and assign the initial weights and classes randomly.
Training Algorithm of LVQ
The algorithm for the LVQ network is to find the output unit whose reference vector best matches the input vector. If x (the input vector) and w (the winning weight vector) belong to the same class, then the weights are moved towards the input vector; otherwise they are moved away from it.

In this method the winner neuron is identified; the winner's class is compared with the target and, based on the comparison result, the weights are updated.
Step 1. Initialize the weights (reference vectors) and the learning rate.
Step 2. While the stopping condition fails, do steps 3 to 7.
Step 3. For each training input, perform steps 4 to 5.
Step 4. Compute D(j) using the squared Euclidean distance method, D(j) = Σi (xi − wij)², and find the J for which D(J) is minimum.
Step 5. Update wJ as follows: if T = CJ then wJ(new) = wJ(old) + α(x − wJ(old)), else wJ(new) = wJ(old) − α(x − wJ(old)).
Step 6. Reduce the learning rate α.
Step 7. Test for the stopping condition.
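A sketch of the step 4 and step 5 rules above; it reproduces the first training step of the example that follows (reference vectors, classes and α = 0.1 as in the notes):

    import numpy as np

    W = np.array([[1.0, 0.0, 1.0, 0.0],           # reference vector of class 1
                  [0.0, 0.0, 1.0, 1.0]])          # reference vector of class 2
    C = np.array([1, 2])                          # class of each reference vector
    alpha = 0.1

    def lvq_step(x, t):
        D = ((W - x) ** 2).sum(axis=1)            # squared Euclidean distance to every reference vector
        j = int(np.argmin(D))
        if t == C[j]:
            W[j] += alpha * (x - W[j])            # same class: move the winner towards x
        else:
            W[j] -= alpha * (x - W[j])            # different class: move the winner away from x
        return j

    lvq_step(np.array([1.0, 1.0, 0.0, 0.0]), 1)
    print(W[0])                                   # -> [1.  0.1 0.9 0. ], as in the example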

Example: The vectors [1 0 1 0], [0 0 1 1], [1 1 0 0] and [1 0 0 1] have the classes 1, 2, 1 and 2 respectively, and α = 0.1 (reduced towards 0.05).
Solution: Let us take the first two vectors as the reference vectors, w1 = [1 0 1 0] (class 1) and w2 = [0 0 1 1] (class 2); the other two vectors are used as training data.
α = 0.1, x1 = [1 1 0 0] and T = 1. Calculate D(j) = Σi (xi − wij)²:
D(1) = (1 − 1)² + (1 − 0)² + (0 − 1)² + (0 − 0)² = 2
D(2) = (1 − 0)² + (1 − 0)² + (0 − 1)² + (0 − 1)² = 4
Therefore D(1) is minimum, i.e. J = 1 and CJ = 1 = T, so w1(new) = w1(old) + α(x1 − w1(old)):
w1(new) = [1 0 1 0] + 0.1([1 1 0 0] − [1 0 1 0]) = [1 0 1 0] + 0.1[0 1 −1 0] = [1 0 1 0] + [0 0.1 −0.1 0] = [1 0.1 0.9 0]


w1 = [1 0.1 0.9 0], w2 = [0 0 1 1], x2 = [1 0 0 1], α = 0.1 and T = 2.
D(1) = (1 − 1)² + (0 − 0.1)² + (0 − 0.9)² + (1 − 0)² = 1.82
D(2) = (1 − 0)² + (0 − 0)² + (0 − 1)² + (1 − 1)² = 2
D(1) is minimum, so J = 1 and CJ = 1, but T ≠ CJ, so w1(new) = w1(old) − α(x2 − w1(old)):
w1(new) = [1 0.1 0.9 0] − 0.1([1 0 0 1] − [1 0.1 0.9 0]) = [1 0.1 0.9 0] − 0.1[0 −0.1 −0.9 1] = [1 0.1 0.9 0] − [0 −0.01 −0.09 0.1] = [1 0.11 0.99 −0.1]
The learning rate is then reduced: αnew = 0.5 × 0.1 = 0.05.

SIMULATED ANNEALING NEURAL NETWORKS
Simulated annealing was developed in the mid 1970s by Scott Kirkpatrick, along with a few other researchers. It was originally developed to better optimize the design of integrated circuit (IC) chips.
Annealing is the metallurgical process of heating up a solid and then cooling it slowly until it crystallizes. If this cooling process is carried out too quickly, many irregularities and defects will be seen in the crystal structure. Ideally the temperature should be decreased at a slower rate: a slower fall to the lower energy states allows a more consistent crystal structure to form, and this more stable crystal form makes the metal much more durable. Simulated annealing seeks to emulate this process. It begins at a very high temperature, where the input values are allowed to assume a great range of random values. As the training progresses, the temperature is allowed to fall, which restricts the degree to which the inputs are allowed to vary. This often leads the simulated annealing algorithm to a better solution, just as a metal achieves a better crystal structure through the actual annealing process.
Simulated annealing can be used to find the minimum of an arbitrary function with a specified number of inputs. In the case of a neural network, this function is the error function of the network, and the weight matrix of the neural network makes an excellent set of inputs for the simulated annealing algorithm to minimize. Different sets of weights are used for the neural network until one is found that produces a sufficiently low return from the error function.

First, for each temperature the simulated annealing algorithm runs through a number of cycles. This number of cycles is predetermined by the programmer. As the cycles run, the inputs are randomized; only randomizations which produce a better suited set of inputs are kept. Once the specified number of training cycles has been completed, the temperature can be lowered. It is then determined whether the temperature has reached the lowest allowed temperature. If it has not, the temperature is lowered and another set of randomization cycles takes place. If the temperature is lower than the minimum temperature allowed, the simulated annealing algorithm is complete.
A neural network's weight matrix can be thought of as a linear array of floating point numbers. Each weight is independent of the others, and it does not matter if two weights contain the same value. The only major constraint is that there are ranges within which all weights must fall. Because of this, the process generally used to randomize the weight matrix of a neural network is relatively simple: using the temperature, a random ratio is applied to all of the weights in the matrix. This ratio is calculated using the temperature and a random number. The higher the temperature, the more likely the ratio is to cause a large change in the weight matrix; a lower temperature will most likely produce a smaller ratio.
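A sketch of the annealing loop described above, minimising a stand-in error function over a flat array of weights; the error function, cooling schedule and cycle count are illustrative, and (as in the description above) only improving randomisations are kept:

    import numpy as np

    rng = np.random.default_rng(0)
    error = lambda w: float(np.sum((w - 1.5) ** 2))    # stand-in for the network's error function
    best = rng.uniform(-5.0, 5.0, size=4)              # the weight matrix treated as a flat array
    best_err = error(best)
    temp, stop_temp, cooling, cycles = 10.0, 0.01, 0.9, 20

    while temp > stop_temp:                            # lower the temperature step by step
        for _ in range(cycles):                        # a fixed number of randomization cycles per temperature
            cand = best + rng.normal(0.0, temp, size=best.shape)   # larger random moves when hot
            if error(cand) < best_err:                 # keep only randomizations that improve the error
                best, best_err = cand, error(cand)
        temp *= cooling
    print(np.round(best, 2), round(best_err, 4))       # weights close to the minimiser of the error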

HOPFIELD NETWORKS
Hopfield networks are constructed from artificial neurons. These artificial neurons have N inputs. With each input i there is a weight wi associated. They also have an output. The state of the output is maintained until the neuron is updated. Updating the neuron entails the following operations:
• The value of each input xi is determined and the weighted sum of all inputs, Σi wi xi, is calculated.
• The output state of the neuron is set to +1 if the weighted input sum is larger than or equal to 0, and to −1 if the weighted input sum is smaller than 0.
• A neuron retains its output state until it is updated again.
Written as a formula: o = +1 if Σi wi xi ≥ 0, and −1 if Σi wi xi < 0.
A Hopfield network is a network of N such artificial neurons, which are fully connected. The connection weight from neuron j to neuron i is given by a number wij. The collection of all such numbers is represented by the weight matrix W, whose components are wij.
Now, given the weight matrix and the updating rule for the neurons, the dynamics of the network is defined once we say in which order the neurons are updated. There are two ways of updating them:
• Asynchronous: one picks one neuron, calculates its weighted input sum and updates it immediately. This can be done in a fixed order, or neurons can be picked at random, which is called asynchronous random updating.
• Synchronous: the weighted input sums of all neurons are calculated without updating the neurons. Then all neurons are set to their new values according to the value of their weighted input sum. The lecture slides contain an explicit example of synchronous updating.

USE OF THE HOPFIELD NETWORK
The way in which the Hopfield network is used is as follows. A pattern is entered in the network by setting all nodes to a specific value, or by setting only part of the nodes. The network is then subjected to a number of iterations using asynchronous or synchronous updating, and is stopped after a while. The network neurons are then read out to see which pattern is in the network.
The idea behind the Hopfield network is that patterns are stored in the weight matrix. The input must contain part of these patterns. The dynamics of the network then retrieve the patterns stored in the weight matrix. This is called a Content Addressable Memory (CAM). The network can also be used for auto-association. The patterns that are stored in the network are divided in two parts: cue and association (see Fig. 2). By entering the cue into the network, the entire pattern, which is stored in the weight matrix, is retrieved. In this way the network restores the association that belongs to a given cue.

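The updating rule and the asynchronous dynamics above in a short sketch; storing a single pattern with an outer-product (Hebbian-style) weight matrix is an illustrative choice, not spelled out in the notes:

    import numpy as np

    rng = np.random.default_rng(0)
    pattern = np.array([1, -1, 1, -1, 1, -1])
    W = np.outer(pattern, pattern).astype(float)       # store one pattern in the weight matrix
    np.fill_diagonal(W, 0.0)                           # no self-connections

    state = np.array([1, -1, -1, -1, 1, 1])            # cue: a corrupted version of the stored pattern
    for _ in range(60):                                # asynchronous random updating
        i = rng.integers(len(state))
        s = float(W[i] @ state)                        # weighted input sum of neuron i
        state[i] = 1 if s >= 0 else -1                 # output rule: +1 if the sum >= 0, else -1
    print(state)                                       # the stored pattern is recovered from the cue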

ADAPTIVE NEURO FUZZY INFERENCE SYSTEM (ANFIS)
The fuzzy inference system that we have considered is a model that maps:
– input characteristics to input membership functions,
– input membership functions to rules,
– rules to a set of output characteristics,
– output characteristics to output membership functions, and
– the output membership function to a single-valued output, or a decision associated with the output.
We have only considered membership functions that have been fixed, and somewhat arbitrarily chosen. Also, we have only applied fuzzy inference to modelling systems whose rule structure is essentially predetermined by the user's interpretation of the characteristics of the variables in the model. In general, the shape of a membership function depends on parameters, and changing these parameters changes the shape of the membership function. These parameters can be adjusted automatically, depending on the data that we try to model.
Model Learning and Inference Through ANFIS
Suppose we already have a collection of input/output data and would like to build a fuzzy inference model/system that approximates the data. Such a model would consist of a number of membership functions and rules with adjustable parameters, similarly to neural networks. Rather than choosing the parameters associated with a given membership function arbitrarily, these parameters could be chosen so as to tailor the membership functions to the input/output data, in order to account for these types of variations in the data values. The neuro-adaptive learning techniques provide a method for the fuzzy modeling procedure to learn information about a data set, in order to compute the membership function parameters that best allow the associated fuzzy inference system to track the given input/output data. Using a given input/output data set, the toolbox function anfis constructs a fuzzy inference system (FIS) whose membership function parameters are tuned (adjusted) using either a backpropagation algorithm alone, or in combination with a least squares type of method. This allows your fuzzy systems to learn from the data they are modeling.
Figures (left to right and top to bottom): Sugeno FIS, Input, Output, Loading Datasets, Training, Testing, Rule base, FIS, ANFIS network.