
The back-propagation algorithm

January 8, 2012

Ryan


The neuron

- The sigmoid equation is what is typically used as the transfer function between neurons. It is similar to the step function, but is continuous and differentiable.

- σ(x) = 1 / (1 + e^(−x))    (1)

Figure: The Sigmoid Function (plotted for x from −5 to 5; y approaches 1)

- One useful property of this transfer function is the simplicity of computing its derivative. Let's do that now...


The derivative of the sigmoid transfer function

d/dx σ(x) = d/dx [ 1 / (1 + e^(−x)) ]
          = e^(−x) / (1 + e^(−x))^2
          = ((1 + e^(−x)) − 1) / (1 + e^(−x))^2
          = (1 + e^(−x)) / (1 + e^(−x))^2 − ( 1 / (1 + e^(−x)) )^2
          = σ(x) − σ(x)^2

σ′ = σ(1 − σ)
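The identity σ′ = σ(1 − σ) is easy to sanity-check numerically. Below is a minimal Python sketch (my own illustration, not part of the original slides) comparing the closed form against a finite-difference approximation:

```python
import math

def sigmoid(x):
    """The sigmoid transfer function: sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Closed-form derivative from the slide: sigma'(x) = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Compare against a central finite difference at a few sample points.
h = 1e-6
for x in (-2.0, 0.0, 3.0):
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
    print(f"x={x:+.1f}  closed-form={sigmoid_prime(x):.6f}  numeric={numeric:.6f}")
```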


Single input neuron

Figure: A Single-Input Neuron (input ξ with weight ω, bias θ, and output O)

In the figure above you can see a diagram representing a single neuron with only a single input. The equation defining the figure is:

O = σ(ξω + θ)


Multiple input neuron

Figure: A Multiple-Input Neuron (inputs ξ_1, ξ_2, ξ_3 with weights ω_1, ω_2, ω_3 and bias θ; a summation node Σ feeds the transfer function σ, producing output O)

The figure above is the diagram representing the following equation:

O = σ(ω_1ξ_1 + ω_2ξ_2 + ω_3ξ_3 + θ)
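As a concrete illustration of this equation (a minimal sketch; the function name and the sample values are my own, not from the slides):

```python
import math

def neuron_output(inputs, weights, bias):
    """Compute O = sigma(omega_1*xi_1 + omega_2*xi_2 + ... + theta) for one neuron."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-activation))

# A three-input neuron, matching the figure (arbitrary example values).
print(neuron_output(inputs=[0.5, -1.0, 2.0], weights=[0.1, 0.4, -0.3], bias=0.2))
```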


A neural network

Figure: A neural network built from such layers, labeled I, J, and K (K being the output layer)



The back propagation algorithm

Notation

- x^ℓ_j: Input to node j of layer ℓ

- W^ℓ_ij: Weight from layer ℓ−1 node i to layer ℓ node j

- σ(x) = 1 / (1 + e^(−x)): Sigmoid transfer function

- θ^ℓ_j: Bias of node j of layer ℓ

- O^ℓ_j: Output of node j in layer ℓ

- t_j: Target value of node j of the output layer
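One way to mirror this notation in code is to index everything by layer. The sketch below is a hypothetical layout of my own (the slides do not prescribe a data structure), shown for a tiny 2-3-1 network with arbitrary placeholder values:

```python
# W[l][i][j]:  weight from layer l-1 node i to layer l node j
# theta[l][j]: bias of node j of layer l
# t[j]:        target value of output node j
W = {
    1: [[0.1, -0.2, 0.4],        # from input node 0 to hidden nodes 0..2
        [0.3, 0.7, -0.5]],       # from input node 1 to hidden nodes 0..2
    2: [[0.6], [-0.1], [0.2]],   # from hidden nodes 0..2 to the single output
}
theta = {1: [0.0, 0.1, -0.1], 2: [0.05]}
t = [1.0]
```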


The error calculation

Given a set of training data points t_k and the output layer outputs O_k, we can write the error as

E = (1/2) Σ_{k∈K} (O_k − t_k)^2

We let E denote the error of the network for a single training iteration. We want to calculate ∂E/∂W^ℓ_jk, the rate of change of the error with respect to a given connection weight, so that we can minimize it. Now we consider two cases: the node is an output node, or it is in a hidden layer...
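In code, the error for a single training iteration is just half the sum of squared differences between outputs and targets (a one-function sketch; the names are mine):

```python
def network_error(outputs, targets):
    """E = 1/2 * sum over k in K of (O_k - t_k)^2."""
    return 0.5 * sum((o - t) ** 2 for o, t in zip(outputs, targets))

print(network_error([0.8, 0.2], [1.0, 0.0]))  # 0.5 * (0.04 + 0.04) = 0.04
```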

Output layer node

∂E/∂W_jk = ∂/∂W_jk [ (1/2) Σ_{k∈K} (O_k − t_k)^2 ]
         = (O_k − t_k) ∂/∂W_jk O_k        (only the k-th term of the sum depends on W_jk)
         = (O_k − t_k) ∂/∂W_jk σ(x_k)
         = (O_k − t_k) σ(x_k)(1 − σ(x_k)) ∂/∂W_jk x_k
         = (O_k − t_k) O_k (1 − O_k) O_j

For notation purposes I will define δ_k to be the expression (O_k − t_k) O_k (1 − O_k), so we can rewrite the equation above as

∂E/∂W_jk = O_j δ_k

where

δ_k = O_k (1 − O_k) (O_k − t_k)

Hidden layer node

∂E/∂W_ij = ∂/∂W_ij [ (1/2) Σ_{k∈K} (O_k − t_k)^2 ]
         = Σ_{k∈K} (O_k − t_k) ∂/∂W_ij O_k
         = Σ_{k∈K} (O_k − t_k) ∂/∂W_ij σ(x_k)
         = Σ_{k∈K} (O_k − t_k) σ(x_k)(1 − σ(x_k)) ∂x_k/∂W_ij
         = Σ_{k∈K} (O_k − t_k) O_k (1 − O_k) (∂x_k/∂O_j)(∂O_j/∂W_ij)
         = Σ_{k∈K} (O_k − t_k) O_k (1 − O_k) W_jk ∂O_j/∂W_ij
         = ∂O_j/∂W_ij Σ_{k∈K} (O_k − t_k) O_k (1 − O_k) W_jk
         = O_j (1 − O_j) (∂x_j/∂W_ij) Σ_{k∈K} (O_k − t_k) O_k (1 − O_k) W_jk
         = O_j (1 − O_j) O_i Σ_{k∈K} (O_k − t_k) O_k (1 − O_k) W_jk

(The sum over K remains here because every output node depends on W_ij through O_j.)

But, recalling our definition of δ_k, we can write this as

∂E/∂W_ij = O_i O_j (1 − O_j) Σ_{k∈K} δ_k W_jk

Similar to before, we will now define all terms besides the O_i to be δ_j, so we have

∂E/∂W_ij = O_i δ_j


How weights affect errors

For an output layer node k ∈ K:

∂E/∂W_jk = O_j δ_k, where δ_k = O_k (1 − O_k) (O_k − t_k)

For a hidden layer node j ∈ J:

∂E/∂W_ij = O_i δ_j, where δ_j = O_j (1 − O_j) Σ_{k∈K} δ_k W_jk
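These two formulas translate directly into code. The sketch below is my own rendering (assuming plain Python lists, with weights[j][k] holding W_jk); it computes δ_k for every output node and δ_j for every hidden node:

```python
def output_deltas(outputs, targets):
    """delta_k = O_k * (1 - O_k) * (O_k - t_k), for each output node k."""
    return [o * (1.0 - o) * (o - t) for o, t in zip(outputs, targets)]

def hidden_deltas(hidden_outputs, weights, deltas_k):
    """delta_j = O_j * (1 - O_j) * sum over k of delta_k * W_jk.

    weights[j][k] is the weight from hidden node j to output node k.
    """
    return [
        o_j * (1.0 - o_j) * sum(d * w for d, w in zip(deltas_k, w_row))
        for o_j, w_row in zip(hidden_outputs, weights)
    ]

# The weight gradients are then O_j * delta_k (output layer) and O_i * delta_j (hidden layer).
```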


What about the bias?

If we incorporate the bias term θ into the equation, you will find that

∂O/∂θ = O(1 − O) ∂θ/∂θ

and because ∂θ/∂θ = 1, we can view the bias term as the output of a node which is always one.

This holds for any layer ℓ we are concerned with; a substitution into the previous equations gives us

∂E/∂θ = δ^ℓ

(because the constant output 1 takes the place of the previous layer's output O^(ℓ−1)).


The back propagation algorithm

1. Run the network forward with your input data to get the network output.

2. For each output node compute

   δ_k = O_k (1 − O_k) (O_k − t_k)

3. For each hidden node calculate

   δ_j = O_j (1 − O_j) Σ_{k∈K} δ_k W_jk

4. Update the weights and biases as follows. Given

   ΔW = −η δ^ℓ O^(ℓ−1)
   Δθ = −η δ^ℓ

   apply

   W + ΔW → W
   θ + Δθ → θ

   (One full iteration of these four steps is sketched in code below.)
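Putting the four steps together, here is a minimal NumPy sketch of one training iteration for a network with a single hidden layer. It is my own illustration of the algorithm above, not code from the slides; the array shapes, names, and learning rate η = 0.5 are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, t, W1, b1, W2, b2, eta=0.5):
    """One back-propagation update for a one-hidden-layer sigmoid network.

    x: input vector, t: target vector.
    W1 (inputs x hidden) and b1 feed the hidden layer;
    W2 (hidden x outputs) and b2 feed the output layer.
    """
    # 1. Run the network forward to get the output.
    O_j = sigmoid(x @ W1 + b1)      # hidden layer outputs
    O_k = sigmoid(O_j @ W2 + b2)    # output layer outputs

    # 2. Output-node deltas: delta_k = O_k (1 - O_k) (O_k - t_k).
    delta_k = O_k * (1.0 - O_k) * (O_k - t)

    # 3. Hidden-node deltas: delta_j = O_j (1 - O_j) sum_k delta_k W_jk.
    delta_j = O_j * (1.0 - O_j) * (W2 @ delta_k)

    # 4. Apply Delta W = -eta * delta^l * O^(l-1) and Delta theta = -eta * delta^l.
    W2 -= eta * np.outer(O_j, delta_k)
    b2 -= eta * delta_k
    W1 -= eta * np.outer(x, delta_j)
    b1 -= eta * delta_j
    return W1, b1, W2, b2
```

Repeating train_step over the training set moves each weight and bias downhill along the gradients derived on the previous slides.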