Introduction to Neural Networks

Gianluca Pollastri, Head of Lab
School of Computer Science and Informatics and
Complex and Adaptive Systems Labs
University College Dublin
[email protected]
Credits

Geoffrey Hinton, University of Toronto. Borrowed some of his slides for his "Neural Networks" and "Computation in Neural Networks" courses.

Paolo Frasconi, University of Florence. This guy taught me Neural Networks in the first place (*and* I borrowed some of his slides too!).
Recurrent Neural Networks (RNN)

One of the earliest versions: Jeffrey Elman, 1990, Cognitive Science.

Problem: it isn't easy to represent time with Feedforward Neural Nets: usually time is represented with space.

Attempt to design networks with memory.
RNNs

The idea is having discrete time steps, and considering the hidden layer at time t-1 as an input at time t.

This effectively removes cycles: we can model the network using an FFNN, and model memory explicitly.
[Figure: a recurrent network with input It, hidden/memory Xt and output Ot; d = delay element, feeding Xt-1 back in as an input at time t.]
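As a minimal sketch of this idea in numpy (all sizes, weight names and the tanh nonlinearity are illustrative choices, not from the slides): the hidden state computed at t-1 is simply fed back as an extra input at time t.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; names and dimensions are illustrative, not from the slides.
n_in, n_hid, n_out, T = 3, 4, 2, 5
W_xh = rng.normal(scale=0.1, size=(n_hid, n_in))   # input I_t -> hidden
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))  # memory X_{t-1} -> hidden
W_hy = rng.normal(scale=0.1, size=(n_out, n_hid))  # hidden X_t -> output O_t

I = rng.normal(size=(T, n_in))   # input sequence I_1..I_T
X = np.zeros(n_hid)              # memory, X_0 = 0
O = []
for t in range(T):
    X = np.tanh(W_xh @ I[t] + W_hh @ X)  # X_t depends on X_{t-1} and I_t
    O.append(W_hy @ X)                   # O_t read off the current memory
O = np.array(O)
print(O.shape)  # (5, 2)
```

The delay element of the figure corresponds to reusing the variable X across loop iterations: no cycle is ever materialised.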
BPTT

BackPropagation Through Time.

If Ot is the output at time t, It the input at time t, and Xt the memory (hidden) at time t, we can model the dependencies as follows:

Xt = f( Xt-1 , It )
Ot = g( Xt , It )
BPTT

We can model both f() and g() with (possibly multilayered) networks.

We can transform the recurrent network by unrolling it in time.

Backpropagation works on any DAG. An RNN becomes one once it's unrolled.
[Figure: the network unrolled in time: inputs It-2..It+2 feed hidden states Xt-2..Xt+2, each Xt also receiving Xt-1, and each Xt producing an output Ot.]
gradient in BPTT

GRADIENT(I,O,T) {
  # I=inputs, O=outputs, T=targets
  T := size(O);
  X0 := 0;
  for t := 1..T
    Xt := f( Xt-1 , It );
  for t := 1..T {
    Ot := g( Xt , It );
    g.gradient( Ot - Tt );
    δt = g.deltas( Ot - Tt );
  }
  for t := T..1 {
    f.gradient( δt );
    δt-1 += f.deltas( δt );
  }
}
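The pseudocode above can be instantiated concretely. Below is a minimal numpy sketch (toy sizes, tanh units and a linear scalar readout are my assumptions, not the slides'): a forward sweep stores all Xt, then the deltas flow backwards, each δt collecting the output error at time t plus the delta arriving from t+1.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, T = 2, 3, 4
W_xh = rng.normal(scale=0.5, size=(n_hid, n_in))
W_hh = rng.normal(scale=0.5, size=(n_hid, n_hid))
w_y  = rng.normal(scale=0.5, size=n_hid)   # scalar output O_t = w_y . X_t

I = rng.normal(size=(T, n_in))   # inputs I_1..I_T
Tgt = rng.normal(size=T)         # targets T_1..T_T

def loss_and_grad(W_hh):
    # forward: X_t = tanh(W_xh I_t + W_hh X_{t-1}), with X_0 = 0
    X = np.zeros((T + 1, n_hid))
    for t in range(1, T + 1):
        X[t] = np.tanh(W_xh @ I[t-1] + W_hh @ X[t-1])
    O = X[1:] @ w_y
    loss = 0.5 * np.sum((O - Tgt) ** 2)
    # backward through time: delta_t gets the local output error
    # plus whatever flows back from t+1
    gW = np.zeros_like(W_hh)
    delta = np.zeros(n_hid)
    for t in range(T, 0, -1):
        delta = delta + (O[t-1] - Tgt[t-1]) * w_y   # error injected at time t
        da = delta * (1 - X[t] ** 2)                # through the tanh
        gW += np.outer(da, X[t-1])                  # gradient w.r.t. W_hh
        delta = W_hh.T @ da                         # delta passed to t-1
    return loss, gW

loss, gW = loss_and_grad(W_hh)
```

A finite-difference check on one entry of W_hh confirms the accumulated gradient matches the unrolled-DAG backpropagation.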
What I will talk about

- Neurons
- Multi-Layered Neural Networks:
  - Basic learning algorithm
  - Expressive power
  - Classification
- How can we *actually* train Neural Networks:
  - Speeding up training
  - Learning just right (not too little, not too much)
  - Figuring out you got it right
- Feed-back networks?
  - Anecdotes on real feed-back networks (Hopfield Nets, Boltzmann Machines)
  - Recurrent Neural Networks
  - Bidirectional RNN
  - 2D-RNN
- Concluding remarks
Bidirectional Recurrent Neural Networks (BRNN)
BRNN

Ft = φ( Ft-1 , Ut )
Bt = β( Bt+1 , Ut )
Yt = η( Ft , Bt , Ut )

• φ(), β() and η() are realised with NNs
• φ(), β() and η() are independent of t: stationary
Inference in BRNNs

FORWARD(U) {
  T := size(U);
  F0 := BT+1 := 0;
  for t := 1..T
    Ft = φ( Ft-1 , Ut );
  for t := T..1
    Bt = β( Bt+1 , Ut );
  for t := 1..T
    Yt = η( Ft , Bt , Ut );
  return Y;
}
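The two sweeps can be sketched directly in numpy (a toy instantiation: sizes, tanh hidden units and a linear output are my assumptions). The forward chain F is filled left to right, the backward chain B right to left, and each Yt then sees Ft, Bt and Ut together.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_f, n_b, n_out, T = 3, 4, 4, 2, 6

Wf_u = rng.normal(scale=0.1, size=(n_f, n_in))   # phi: U_t part
Wf_f = rng.normal(scale=0.1, size=(n_f, n_f))    # phi: F_{t-1} part
Wb_u = rng.normal(scale=0.1, size=(n_b, n_in))   # beta: U_t part
Wb_b = rng.normal(scale=0.1, size=(n_b, n_b))    # beta: B_{t+1} part
Wy = rng.normal(scale=0.1, size=(n_out, n_f + n_b + n_in))  # eta

U = rng.normal(size=(T, n_in))
F = np.zeros((T + 2, n_f))   # F[0] is the F_0 = 0 boundary
B = np.zeros((T + 2, n_b))   # B[T+1] = 0 boundary
for t in range(1, T + 1):    # forward sweep: F_t = phi(F_{t-1}, U_t)
    F[t] = np.tanh(Wf_f @ F[t-1] + Wf_u @ U[t-1])
for t in range(T, 0, -1):    # backward sweep: B_t = beta(B_{t+1}, U_t)
    B[t] = np.tanh(Wb_b @ B[t+1] + Wb_u @ U[t-1])
Y = np.array([Wy @ np.concatenate([F[t], B[t], U[t-1]])
              for t in range(1, T + 1)])
print(Y.shape)  # (6, 2)
```

Note that each Yt depends on the entire input sequence: on U1..Ut through Ft, and on Ut..UT through Bt.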
Learning in BRNNs

GRADIENT(U,Y) {
  # U=inputs, Y=targets
  T := size(U);
  F0 := BT+1 := 0;
  for t := 1..T
    Ft = φ( Ft-1 , Ut );
  for t := T..1
    Bt = β( Bt+1 , Ut );
  for t := 1..T {
    Ŷt = η( Ft , Bt , Ut );
    [δFt, δBt] = η.backprop&gradient( Ŷt - Yt );
  }
  for t := T..1
    δFt-1 += φ.backprop&gradient( δFt );
  for t := 1..T
    δBt+1 += β.backprop&gradient( δBt );
}
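For the forward chain this is the same accumulation as BPTT: η injects a δFt at every position, and the deltas then propagate back along the F recurrence. A minimal numpy sketch of the gradient w.r.t. the φ weights (toy sizes, tanh units and a linear scalar η are my assumptions; the β chain is handled symmetrically and omitted here):

```python
import numpy as np

rng = np.random.default_rng(3)
n_in, n_h, T = 2, 3, 5
Wf = rng.normal(scale=0.4, size=(n_h, n_h + n_in))  # phi: [F_{t-1}; U_t] -> F_t
Wb = rng.normal(scale=0.4, size=(n_h, n_h + n_in))  # beta: [B_{t+1}; U_t] -> B_t
wy = rng.normal(scale=0.4, size=2 * n_h + n_in)     # eta: scalar Y_t
U = rng.normal(size=(T, n_in))
Tgt = rng.normal(size=T)                            # targets

def loss_grad_Wf(Wf):
    F = np.zeros((T + 2, n_h))
    B = np.zeros((T + 2, n_h))
    for t in range(1, T + 1):            # forward sweep
        F[t] = np.tanh(Wf @ np.concatenate([F[t-1], U[t-1]]))
    for t in range(T, 0, -1):            # backward sweep
        B[t] = np.tanh(Wb @ np.concatenate([B[t+1], U[t-1]]))
    Y = np.array([wy @ np.concatenate([F[t], B[t], U[t-1]])
                  for t in range(1, T + 1)])
    loss = 0.5 * np.sum((Y - Tgt) ** 2)
    # delta_F_t injected by eta at every t, then propagated along the F chain
    gWf = np.zeros_like(Wf)
    dF = np.zeros(n_h)
    for t in range(T, 0, -1):
        dF = dF + (Y[t-1] - Tgt[t-1]) * wy[:n_h]   # dL/dF_t from eta
        da = dF * (1 - F[t] ** 2)                  # through the tanh
        gWf += np.outer(da, np.concatenate([F[t-1], U[t-1]]))
        dF = Wf[:, :n_h].T @ da                    # pass to F_{t-1}
    return loss, gWf

loss, gWf = loss_grad_Wf(Wf)
```

Because φ, β and η are stationary, each weight matrix receives one gradient contribution per time step, summed over the sequence.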
2D RNNs

Pollastri & Baldi 2002, Bioinformatics
Baldi & Pollastri 2003, JMLR
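The slides here only show figures and the citations. As a rough, hedged illustration of the idea in those papers (this layout follows my reading of Baldi & Pollastri 2003, and every size and weight below is a toy assumption): the one-dimensional forward/backward chains of the BRNN generalise to four hidden planes on a 2D grid, each filled by a sweep from one of the four corners, so that the output at (i,j) can depend on the whole input grid.

```python
import numpy as np

rng = np.random.default_rng(4)
N, n_in, n_h = 5, 2, 3   # N x N grid (e.g. a residue-residue contact map)
Wdir = rng.normal(scale=0.2, size=(4, n_h, n_h + n_h + n_in))  # one net per sweep
wy = rng.normal(scale=0.2, size=4 * n_h + n_in)                # scalar output net
I = rng.normal(size=(N, N, n_in))

# Four hidden planes; the state at (i,j) in each plane depends on the two
# already-computed neighbours in that plane's sweep direction.
H = np.zeros((4, N + 2, N + 2, n_h))          # zero-padded boundaries
steps = [(1, 1), (1, -1), (-1, 1), (-1, -1)]  # (di, dj) per corner sweep
for d, (di, dj) in enumerate(steps):
    rows = range(1, N + 1) if di == 1 else range(N, 0, -1)
    cols = range(1, N + 1) if dj == 1 else range(N, 0, -1)
    for i in rows:
        for j in cols:
            ctx = np.concatenate([H[d, i - di, j], H[d, i, j - dj],
                                  I[i - 1, j - 1]])
            H[d, i, j] = np.tanh(Wdir[d] @ ctx)

# Output at (i,j) sees the local input and all four hidden planes.
Y = np.array([[wy @ np.concatenate([H[0, i, j], H[1, i, j],
                                    H[2, i, j], H[3, i, j], I[i-1, j-1]])
               for j in range(1, N + 1)] for i in range(1, N + 1)])
print(Y.shape)  # (5, 5)
```

As in the 1D case, unrolling over the grid yields a DAG, so backpropagation applies, and the four transition networks are stationary across positions.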