BPDES1


  • 7/27/2019 BPDES1

    1/10


[Figure: S-shaped curve plotting OUTPUT (0.00 to 1.00) against INPUT (-10.00 to 10.00)]

Figure 3. Sigmoidal Transfer Function

It is possible to construct a BP network using any one of several transfer functions. The CHBPN software uses a logistic sigmoidal transfer function, which has the following form:

    Output, O = 1/{1 + exp(-Sj)} (1)

where Sj is the weighted sum of the inputs of neuron j. The benefit of using a logistic transfer function is that it is easy to calculate its derivative, as required later in Equation (8) and Equation (11).
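As a minimal sketch of Equation (1) and its derivative (the function names here are illustrative, not part of the CHBPN software), note that the derivative can be computed from the output alone:

```python
import math

def sigmoid(s):
    """Logistic transfer function, Equation (1): O = 1 / (1 + exp(-S))."""
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_derivative(output):
    """Derivative expressed in terms of the output itself: dO/dS = O * (1 - O)."""
    return output * (1.0 - output)

# The derivative needs only the neuron's output, not the original weighted sum:
o = sigmoid(0.0)          # midpoint of the S-curve
slope = sigmoid_derivative(o)
```

This is why the logistic function is convenient: once a neuron's output is known, the derivative needed during training comes from a single multiply.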

The hidden layer, which has no direct connection to input or output, is the first computational layer in the network. As the figure suggests, each hidden-layer neuron is connected to all of the input nodes. Each hidden neuron calculates the weighted sum of its inputs (Sj), applies the transfer function to the sum to generate a result (Oj), then passes the result to the output layer. The number of neurons in the hidden layer influences the network's behavior, often significantly. Networks with too many hidden neurons tend to memorize the training data; those with too few cannot learn the problem.

The output layer generates the network's output. Each output-layer neuron is connected to all of the hidden neurons. Like the hidden neurons, each output neuron calculates the weighted sum of its inputs and applies the transfer function to produce a result. The output layer then transmits all of the individual results as the network's output vector. The number of neurons in the output layer usually follows from the number of categories you want the network to recognize or from the function you want it to emulate.

A BP network always passes data forward through the hidden layer to the output layer. Applying a vector to the input side produces a corresponding vector on the output side. The network thus acts as a function, mapping input patterns to output patterns. The network learns to associate specific input patterns with specific output patterns by adjusting its weights during training, described in the next section.


    BP Algorithm Overview

A BP network learns by example, repeatedly processing a training file that contains a series of input vectors and the correct (or target) output vector for each. Each pass through the training file is one epoch. During each epoch, the network compares the target result with the actual result, calculates the error, and modifies the network's weights to minimize the error. Through this process, called supervised training, the network learns to associate input patterns with the correct output patterns. The accumulated change in the weights represents what the network learned, and saving the trained weights preserves the learned solution. This section provides a brief overview of the calculations BP performs during training.

As an example, the steps involved for a 3-layer network using the generalized delta rule with a momentum factor can be summarized as follows:

Assign the initial values for the primary weight matrices Wkh (for the input-to-hidden layer) and Whi (for the hidden-to-output layer).

Calculate the output vector for each pattern of the input vector (Inp)k, k = 1..km:

    Ih = Σ(k=1..km) Wkh (Inp)k                  (2)

    Oh = 1/{1 + exp(-Ih)}                       (3)

    Ii = Σ(h=1..hm) Whi Oh                      (4)

    Oi = 1/{1 + exp(-Ii)}                       (5)

where subscript h denotes a hidden neuron, hm is the total number of hidden neurons, and km is the total number of input nodes.
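The forward pass of Equations (2) through (5) can be sketched in Python (the function and argument names are illustrative, not taken from the CHBPN software):

```python
import math

def sigmoid(s):
    """Logistic transfer function, Equations (3) and (5)."""
    return 1.0 / (1.0 + math.exp(-s))

def forward(inp, W_kh, W_hi):
    """One forward pass through a 3-layer network.

    inp  : list of km input values, (Inp)_k
    W_kh : hm rows of km input-to-hidden weights
    W_hi : n rows of hm hidden-to-output weights
    """
    # Equations (2) and (3): hidden-layer weighted sums and outputs
    O_h = [sigmoid(sum(w * x for w, x in zip(row, inp))) for row in W_kh]
    # Equations (4) and (5): output-layer weighted sums and outputs
    O_i = [sigmoid(sum(w * o for w, o in zip(row, O_h))) for row in W_hi]
    return O_h, O_i

# With all-zero weights every weighted sum is zero, so every output is 0.5:
hidden, out = forward([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0]])
```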

Calculate the error for the output layer (if the output so obtained does not match the target coefficient vector Ai, i = 1..n):

    δi = Oi - Ai                                (6)

Calculate the sum of the squares of the errors for all the output neurons, Σ(i=1..n)(δi)², and then increment the total error E over all the patterns, T:

    E = Σ(t=1..T) Σ(i=1..n) (δi)²               (7)


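The error accumulation of Equations (6) and (7) can be sketched as follows (the function name and the sample outputs/targets are illustrative):

```python
def pattern_error(O_i, A_i):
    """Per-neuron errors (Equation 6) and their squared sum (inner sum of Equation 7)."""
    deltas = [o - a for o, a in zip(O_i, A_i)]
    return deltas, sum(d * d for d in deltas)

# Total error E accumulates the squared sums over all T patterns:
E = 0.0
for O_i, A_i in [([0.8, 0.2], [1.0, 0.0]), ([0.1, 0.9], [0.0, 1.0])]:
    _, sq = pattern_error(O_i, A_i)
    E += sq
```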

After calculating the error for a neuron, the algorithm then adjusts the neuron's weights using the Least Mean Squared (LMS) learning rule, also called the Delta rule. The following equation computes the new weight vector for output neuron j.

Update the weights between the output and hidden layers, i.e. Wih:

    Wih(new) = Wih(old) + [η δi (∂Oi/∂Ii) Oh + α ΔWih(old)]    (8)

where α ∈ (0,1) is called the momentum factor and η ∈ (0,1) is called the learning rate.

As Equation (8) suggests, the vector term added to the current weight vector to adjust it during training is often called the delta vector.
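A sketch of one hidden-to-output weight update with momentum follows (the function name and parameter values are illustrative). Note that with δi defined as Oi - Ai in Equation (6), moving downhill on the squared error requires the learning term to be subtracted; texts that define δ with the opposite sign write the update as an addition, as in Equation (8):

```python
def update_output_weight(w_old, delta_i, O_i, O_h, dw_prev, lr=0.5, momentum=0.9):
    """One weight update in the style of Equation (8).

    delta_i : output error, O_i - A_i (Equation 6)
    O_i     : this output neuron's result; dO_i/dI_i = O_i * (1 - O_i)
    O_h     : output of the hidden neuron on this connection
    dw_prev : previous weight change, scaled by the momentum factor
    """
    # learning term subtracted so the weight descends the squared-error surface
    dw = -lr * delta_i * O_i * (1.0 - O_i) * O_h + momentum * dw_prev
    return w_old + dw, dw   # return dw for the next step's momentum term
```

Keeping the previous weight change (dw_prev) is what gives the rule its momentum: consecutive steps in the same direction reinforce each other.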

This learning rule is designed to adjust the network's weights to find the least mean squared error for the network as a whole. This minimization has an intuitive geometrical meaning. It can be shown that the mean squared error is a quadratic function of the weight vector. As a result, plotting the mean squared error against the weight vector components produces a hyperparabolic surface. Figure 4 shows an idealized error surface, assuming two-dimensional weight vectors for simplicity.

[Figure: bowl-shaped aggregate-error surface over Weight x and Weight y; the delta vector moves the old weight vector toward the new weight vector, approaching the ideal weight vector at the bottom of the bowl]

Figure 4. Gradient Descent

As Figure 4 suggests, the geometrical effect of the LMS learning rule is to move the weight vectors toward values that produce the minimum mean squared error, represented by the bottom point of the bowl-shaped error surface. In reality, the error surface typically has complex ravine-like features and many local minima. The delta vector follows the locally steepest path, a little like a ball rolling downhill, so this process is sometimes called gradient descent.

So far, this discussion ignores the network's layered architecture. A typical network has at least two levels of weighted connections, one for the hidden neurons and the other for the output neurons. At first glance, it is difficult to determine how the hidden neurons contribute to the output-layer error because their target output is unknown. The key to the BP algorithm lies in assigning credit for the actual results back to the hidden layer as required.


First, finding the error for the output-layer neurons is straightforward. As shown in Equation (6), the error is proportional to the difference between the target result and the actual result. To find the error in the hidden layer, the error value for each output-layer neuron is sent back to the hidden layer using the same weighted connections. This backward propagation of errors gives the algorithm its name.

Because the error is transmitted backward using the original weighted connections, BP in effect assumes that a hidden neuron contributes to the output-layer error in proportion to the weighted sum of the back-propagated error values. This sum is calculated using the hidden-to-output weights before the output layer updates them.

Error for the hidden layer:

    δh = (∂Oh/∂Ih) Σ(i=0..n) Wih δi             (9)

Update the weights between the hidden and input layers, i.e. Whk:

    Whk(new) = Whk(old) + [η δh (Inp)k + α ΔWhk(old)]    (10)

It may be noted that for a sigmoid function the derivative ∂Olayer/∂Ilayer = Olayer (1 - Olayer). The derivative term serves to moderate the sum using the value of the hidden neuron's result. This moderation is necessary partly because a strong connection sometimes transmits a weak result. If a hidden neuron's result approaches zero, for instance, then that neuron probably did not contribute much to the output error, even if the neuron is strongly weighted. Allowing for the hidden-layer output when computing the hidden-layer error reduces the risk of blaming a neuron unfairly. The derivative term also contributes to the network's stability.
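Equations (9) and (10) can be sketched as follows (function names are illustrative; as with the output-layer sketch, the learning term is subtracted because δ is defined as output minus target):

```python
def hidden_delta(O_h, W_ih_col, output_deltas):
    """Equation (9): back-propagated error for one hidden neuron.

    W_ih_col holds the weights from this hidden neuron to each output
    neuron; the sigmoid derivative dO_h/dI_h equals O_h * (1 - O_h),
    so only the hidden neuron's output is needed.
    """
    incoming = sum(w * d for w, d in zip(W_ih_col, output_deltas))
    return O_h * (1.0 - O_h) * incoming

def update_hidden_weight(w_old, delta_h, inp_k, dw_prev, lr=0.5, momentum=0.9):
    """Equation (10): one input-to-hidden weight update with momentum."""
    dw = -lr * delta_h * inp_k + momentum * dw_prev
    return w_old + dw, dw
```

Note how the derivative factor O_h * (1 - O_h) shrinks the back-propagated error when the hidden neuron's output is near 0 or 1, which is exactly the moderation described above.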

If there is more than one hidden layer, the equation for the weight change is given by:

    Wh(L)h(L-1)(new) = Wh(L)h(L-1)(old) + [η (∂Oh(L)/∂Ih(L)) Oh(L-1) Σ(h(L+1)=0..hm(L+1)) Wh(L+1)h(L)(new) δh(L+1) + α ΔWh(L)h(L-1)(old)]    (11)

where subscript L is the layer number; L+1 is the adjacent layer nearer to the output layer.

Go on to the next pattern of the input vector (Inp)k, k = 1..km.

Repeat the process for all the patterns, T. As indicated earlier, sets of such patterns can be created from data generated either experimentally or by numerical simulation. Training ends when the error criterion based on the total error, E, is met.

After the training is over (i.e., adaptation of the weight matrices is frozen), the artificial neural network is validated using fresh experimental/simulated input vectors (Inp)k, k = 1..km. The weight matrices of the validated neural network are preserved for use in the final application.


Because the transfer function is an "S"-shaped curve, its derivative is a bell-shaped curve. The sigmoid's derivative has large values in the middle range and small values toward both extremes. This shape assures that large changes in weights do not occur when the sum of the back-propagated errors approaches a very large or very small value. Each hidden neuron calculates its own error, then adjusts the input-to-hidden layer connection weights using the LMS learning rule described in Equation (10). The network is then ready for the next input pattern.

    To summarize, the algorithm follows this sequence of steps during training:

1. Receives an input pattern at the unweighted input layer.
2. Calculates the hidden-layer weighted sums and applies the transfer function to the sums, producing the hidden-layer result.
3. Transmits the hidden-layer result to the output layer.
4. Calculates the output-layer weighted sums and applies the transfer function to the sums, producing the output-layer result.
5. Compares the actual output-layer result with the target result, calculating an output-layer error for each neuron.
6. Transmits the output-layer errors to the hidden layer (back-propagation).
7. Calculates the hidden-layer error using the weighted sum of the back-propagated error vector, moderated by the hidden-layer result.
8. Updates the output-layer and hidden-layer weights using the LMS learning rule.
9. Receives the next pattern.

For example, a typical pseudo-code description is as follows:

    Set maximum acceptable error
    /* user-specified value; often: worst-case element in any pattern is within 10% of desired */
    repeat
    {
        total error = 0;
        for each pattern in training set do
        {   /* forward activity flow */
            get next pattern
            for each NEURON in middle layer do
            {
                compute net input = weighted sum of input pattern elements
                apply transfer function f(I) = 1/(1 + exp(-I))
                save net input
                /* needed for backward error pass and derivative computation */
            } /* end for each middle-layer NEURON */

            for each NEURON in output layer do
            {
                compute net input = weighted sum of middle-layer output elements
                apply transfer function f(I) = 1/(1 + exp(-I))
                display output
            } /* end for each output-layer NEURON */

            /* backward error pass; omit this pass if not actively training the network */
            /* note: this is not the most efficient procedure, but it clarifies the order of each step */
            for each NEURON in the output layer do
            {   /* compute the error for each output-layer NEURON */
                compute error = func(desired output - actual output)
                total error = total error + error
                /* function func can be simple LMS or ABS or any other function */
            } /* end for each output-layer NEURON */

            for each NEURON in the middle layer do
            {   /* back-propagate the output-layer NEURONs' errors to the middle layer */
                compute incoming error = weighted sum of the output-layer errors
                compute final error = incoming error * (net input) * (1 - net input)
                /* because of the choice of transfer function, df/dI = f(I)*(1 - f(I));
                   net input I in this formula is this NEURON's net input
                   as computed in the forward activation flow */
            } /* end for each middle-layer NEURON */

            for each NEURON in the output layer do
            {   /* adjust the weights between the middle layer and the output layer */
                for each weight from a middle-layer NEURON do
                {   /* these are the incoming weights to the output layer */
                    compute weight change = lr * E * I
                    /* I is the incoming activity along this connection;
                       E is this NEURON's error; lr is the learning constant */
                    update weight
                } /* end for each weight */
            } /* end for each output-layer NEURON */

            for each NEURON in the middle layer do
            {   /* adjust the weights between the input layer and the middle layer */
                for each weight from an input-layer NEURON do
                {   /* these are the incoming weights to the middle layer */
                    compute weight change = lr * E * I
                    /* I is the incoming activity along this connection;
                       E is this NEURON's error; lr is the learning constant */
                    update weight
                } /* end for each weight */
            } /* end for each middle-layer NEURON */
        } /* end for each pattern in the training set */
    } until (total error < maximum acceptable error)
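The pseudo-code above might translate to the following minimal Python sketch (not part of the original software), trained here on the XOR problem; the δ terms use the target-minus-output sign so that the weight updates are additive, and a bias is modeled as an extra input fixed at 1.0:

```python
import math
import random

def train_xor(epochs=3000, lr=0.5, momentum=0.5, seed=1):
    """Plain BP on XOR: 2 inputs (+bias), 2 hidden neurons (+bias), 1 output.
    Returns the trained forward function and the per-epoch total error E."""
    random.seed(seed)
    rnd = lambda: random.uniform(-0.5, 0.5)
    W_kh = [[rnd() for _ in range(3)] for _ in range(2)]   # input+bias -> hidden
    W_hi = [rnd() for _ in range(3)]                       # hidden+bias -> output
    dW_kh = [[0.0] * 3 for _ in range(2)]                  # previous changes (momentum)
    dW_hi = [0.0] * 3
    f = lambda s: 1.0 / (1.0 + math.exp(-s))               # logistic transfer function
    patterns = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
                ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

    def forward(inp):
        x = inp + [1.0]                                    # bias input
        O_h = [f(sum(w * v for w, v in zip(row, x))) for row in W_kh]
        h = O_h + [1.0]                                    # bias for output layer
        return x, O_h, h, f(sum(w * v for w, v in zip(W_hi, h)))

    errors = []
    for _ in range(epochs):
        E = 0.0                                            # total error for this epoch
        for inp, target in patterns:
            x, O_h, h, O = forward(inp)
            E += (O - target) ** 2                         # Equation (7) accumulation
            d_o = (target - O) * O * (1.0 - O)             # output-layer delta
            # back-propagate through the hidden-to-output weights, Equation (9)
            d_h = [O_h[j] * (1.0 - O_h[j]) * W_hi[j] * d_o for j in range(2)]
            for j in range(3):                             # hidden-to-output, Eq. (8)
                dW_hi[j] = lr * d_o * h[j] + momentum * dW_hi[j]
                W_hi[j] += dW_hi[j]
            for j in range(2):                             # input-to-hidden, Eq. (10)
                for k in range(3):
                    dW_kh[j][k] = lr * d_h[j] * x[k] + momentum * dW_kh[j][k]
                    W_kh[j][k] += dW_kh[j][k]
        errors.append(E)
    return (lambda inp: forward(inp)[3]), errors

net, errors = train_xor()
```

The error history makes the gradient-descent behavior visible: the per-epoch total error falls from its initial value as the weight vectors move down the error surface.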

During training, you typically test the network to evaluate its ability to process data it has not seen before. Testing involves presenting a file that, like the training input file, contains known input and target output patterns. During a test, a series of input vectors is passed through the network to generate a series of output vectors. Comparing the actual and target results lets you measure the network's accuracy and decide whether to continue training, possibly after adjusting a constant or otherwise changing the network. No learning occurs during testing.

After training and testing end, you typically validate the network with fresh data to confirm that it behaves as expected. Validation is recommended because testing influences training, for instance by ending it.

After testing and validating the network, you save its weights to preserve what it learned during training. You then load the trained weights before running the network as an application. When running the trained network, the network processes a file that contains "real" data. Unlike the training, testing, and validation input files, the application input file does not supply a target result. Instead, it supplies only input vectors. The network reads a series of vectors from the file, passes them from layer to layer, then writes the corresponding series of output vectors to a file. The output vectors might represent categories or some other response, depending on the task the network has been trained to perform.

    When to Use BP

You can use BP for a broad range of applications that directly or indirectly depend on modeling a function. The underlying function can be simple or complex, linear or nonlinear, and continuous or discontinuous. Often, you use BP when the equation describing the function is unknown and traditional methods do not provide an adequate solution.

    The following applications take advantage of the algorithm's properties:

* Pattern classification
* Process control
* Signal processing


* Medical diagnosis
* Noise filtering
* Optical character recognition
* Converting speech or text to phonemes
* Encoding and compressing data
* Financial forecasting

Many other applications that require function mapping are also possible. Because BP is frequently used as a classifier, contrasting it with another classifier, such as Learning Vector Quantization (LVQ), may suggest a basis for choosing between them in applications that could use either algorithm. The differences include the following:

* Mathematically, BP minimizes sum-squared error, while LVQ minimizes the number of misclassifications. (In fact, LVQ often increases sum-squared error by moving its initial decision boundaries away from the "centers" of the categories to improve classification accuracy.)

* BP often trains more slowly than LVQ because it is computationally more complex. On the other hand, trained BP networks that perform the same task are often smaller and may be faster.

* BP learns from all input vectors, no matter how far they lie from the center of the class. LVQ, in contrast, can ignore outliers during training. And, since you can change the size of the check window, LVQ lets you tune the degree of rejection.

* BP supplies a vector as its result, while LVQ supplies a token called a class ID. Because the vector components are variable values, BP sometimes interpolates "new" results in between the original target output vectors. LVQ, on the other hand, draws its class IDs from a fixed population and never interpolates in this sense.

Which differences are benefits depends on the application at hand. For applications that can use either algorithm, usually the best way to decide between them is to try both.

BP has several drawbacks that may affect its suitability for some applications. First, BP is not guaranteed to solve the problem, because the LMS algorithm cannot always find an acceptable local minimum. Second, BP requires training data labeled with a target result for each training pattern, but labeled data is sometimes difficult or impossible to obtain. Finally, BP is computationally complex, requiring many passes through the training data. As a result, BP on conventional computers has for the most part been limited to small problems that can be solved off-line.