Introduction to neural networks

Stefano Rovetta

20/23-Jul-2016

Back to optimization


Stochastic optimization

• Optimize a cost that is a random variable

• Types of randomness:

- Measurement plus noise: $R + \nu$

- Multiple effects mixed together (we might use a mixture model)

- Unknown statistical properties


Monte Carlo integration

• Expectation of a random variable X:

$$E\{X\} = \int_E \xi \, p_x(\xi) \, d\xi$$

(over the whole data space E)

• ...but only a sample $\{x_1, \dots, x_n\}$ is given (the training set)

• Empirical distribution:

$$P_x(\xi) = \frac{1}{n} \sum_{l=1}^{n} \delta(\xi - x_l)$$

• Approximate (empirical) expectation of X:

$$\hat{E}\{X\} = \int_E \xi \, P_x(\xi) \, d\xi = \frac{1}{n} \sum_{l=1}^{n} x_l$$

• This is a Monte Carlo integral
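As a minimal sketch in Python (NumPy; the shifted normal distribution is an arbitrary stand-in for $p_x$), the Monte Carlo integral is just the sample mean:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    sample = rng.normal(loc=2.0, scale=1.0, size=n)  # draws x_1..x_n from p_x

    # Monte Carlo estimate of E{X}: the empirical mean (1/n) sum_l x_l
    estimate = sample.mean()
    print(estimate)  # close to the true mean 2.0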


• Suppose that R is classification performance (risk).

• We want to optimize the true risk, the one computed on all possible, infinite data:

$$R(w) = \int R(y(x), w)\, p(x)\, dx$$

• This is a function of w (the weights identify one specific neural net)

• It is also a function of the data distribution p(x)

(the performance is estimated on the data)


• When training a neural network we don't have p(x), but only the training set $\{x_1, \dots, x_n\}$

• From the training set we have the empirical distribution

$$P_x(\xi) = \frac{1}{n} \sum_{l=1}^{n} \delta(\xi - x_l)$$

• so we can compute a Monte Carlo estimate of the risk

$$\hat{R}(w, X) = \frac{1}{n_p} \sum_{l=1}^{n_p} R(y(x_l), w)$$

this is the empirical risk.
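A sketch of the same idea in code (hypothetical names: `predict` stands for the trained network $y(\cdot)$, `loss` for the pointwise risk $R$):

    import numpy as np

    def empirical_risk(predict, loss, X, T):
        """Monte Carlo estimate of the risk: average loss over the n_p training patterns."""
        return np.mean([loss(predict(x), t) for x, t in zip(X, T)])

    # example with a squared-error pointwise loss
    squared_error = lambda y, t: (t - y) ** 2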


Training by epoch

• Optimize using the whole training set to estimate the cost

• It means computing R (and the ∆W)

• on the basis of a Monte Carlo estimate of risk

• Finds the optimal value of an approximate (empirical) cost function


Stochastic approximation

• A special kind of stochastic optimization

• R is estimated at each input pattern using that pattern alone

• Extremely unreliable estimation – but it converges in probability!

• Robbins and Monro, 1951; Kiefer and Wolfowitz, 1952


• Convergence in probability:

$$\lim_{n \to \infty} \Pr\left( |\hat{R}_n - R| \ge \varepsilon \right) = 0$$

• $\hat{R}_n$ is the estimate of R on a training set of size n


Stochastic approximation

• Given:

- A function R whose gradient ∇R we want to set to zero, or minimize (but which we cannot compute analytically)

- A sequence $G_1, G_2, \dots, G_l, \dots$ of random samples of ∇R, affected by random noise

- A decreasing sequence $\eta_1, \eta_2, \dots, \eta_l, \dots$ of step size coefficients

• Basic iteration:

$$w(l+1) = w(l) - \eta_l\, G_l$$
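A minimal sketch of this iteration, assuming noisy gradient samples of the simple quadratic cost $R(w) = \frac{1}{2}\|w\|^2$ (whose exact gradient is $w$), with decreasing step sizes $\eta_l = \eta_0 / l$:

    import numpy as np

    rng = np.random.default_rng(0)
    w = np.array([5.0, -3.0])      # initial weights
    eta0 = 0.5

    for l in range(1, 10_001):
        G = w + rng.normal(scale=1.0, size=2)  # noisy sample G_l of grad R = w
        eta = eta0 / l                         # decreasing step sizes eta_l
        w = w - eta * G                        # w(l+1) = w(l) - eta_l G_l

    print(w)   # approaches the minimizer [0, 0]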


Stochastic approximation: The intuition

• Each sample gives a noisy (stochastic) estimate of the gradient

• ⇒ ∇R + noise

• By averaging over time, noise cancels out

• Random variations also make it possible to escape local minima


Results on convergence of stochastic approximation

• If R is twice differentiable and convex, then stochastic approximation converges with a rate of $O(1/l)$

• A condition of convergence (not an optimal rate of convergence):

$$0 < \sum_l \eta_l^2 = A < \infty$$

• Usually the hypotheses are not met (complex cost landscape) and we don't have guarantees.


Training by pattern

• is computing R (and the ∆W)

• on the basis of an estimate of the risk on a single point

• An extreme Monte Carlo estimate, on a training set of only one observation

• Finds the approximate optimal value of an approximate cost function


Implementation of training

• By epoch: estimation loop, then update

• By pattern: estimation + update loop

• By pattern on a training set: pick l at random

• Learning rate η: by pattern, keep it low; by epoch, make it adaptive (see the sketch below)
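A sketch of the two loops (the helper `grad(w, x)`, returning a single-pattern gradient estimate, is hypothetical):

    import numpy as np

    def train_by_epoch(w, X, grad, eta, epochs):
        for _ in range(epochs):
            delta = sum(grad(w, x) for x in X) / len(X)  # estimation loop...
            w = w - eta * delta                          # ...then one update
        return w

    def train_by_pattern(w, X, grad, eta, epochs):
        rng = np.random.default_rng(0)
        for _ in range(epochs):
            for l in rng.permutation(len(X)):            # l = random
                w = w - eta * grad(w, X[l])              # estimate + update per pattern
        return w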


Multi-layer neural networks


Connectionism and Parallel Distributed Processing

David Rumelhart, James McClelland, Geoffrey Hinton


What is connectionism?

• Connectionism is an approach to cognitive science that characterizes learning and memory through the discrete interactions between nodes of neural networks

• Representation of concepts and rules is not concentrated in symbols with a lot of meaning, but in sub-symbolic "neural encodings" (neuron activations) which have a meaning only if taken collectively, as patterns

• Neural networks are distributed and massively parallel

• They rely on spontaneously-generated internal representations


Network topologies

Most general: feedback.

Units may be visible or hidden.


Network topologies

A special type of feedback is given by lateral connections.


Network topologies

Less general: a topology where cycles are forbidden: feedforward. Visible units may be input or output.


Network topologies

Least general: multi-layer


Why multi-layer?

Linear separability

Feature discovery

Hierarchies of abstractions


Example: Parity

Problem: Given any input string of d bits, tell whether the number of bits set (= 1) is even.

This generalizes XOR: it is not linearly separable.


Example: Parity

The solution requires d hidden units
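One classical construction is sketched below (threshold units; an illustration, not the only possible solution): hidden unit j fires when at least j input bits are set, and alternating ±1 output weights make active units cancel in pairs, leaving 1 exactly when the count is odd:

    import numpy as np

    def parity_net(bits):
        """d-bit parity with d threshold hidden units: 1 if the count of set bits is odd."""
        d = len(bits)
        s = np.sum(bits)
        h = np.array([1.0 if s >= j else 0.0 for j in range(1, d + 1)])  # unit j fires iff >= j bits set
        out_w = np.array([(-1.0) ** (j + 1) for j in range(1, d + 1)])   # weights +1, -1, +1, ...
        return int(out_w @ h > 0.5)        # active units cancel in pairs; 1 left over iff odd

    assert parity_net([1, 0, 1, 1]) == 1   # three bits set: odd
    assert parity_net([1, 0, 1, 0]) == 0   # two bits set: even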


Universal approximation theorem

G. Cybenko, 1989

A feed-forward network with a single hidden layer containing a finite number of neurons (with a sigmoidal activation) can approximate any continuous function on compact subsets of $R^d$ to arbitrary accuracy.


How do we train a multi-layer neural network?

1 With a suitable algorithm

2 With a sequence of independent trainings


• As we have seen, learning (e.g., learning to recognize) can be cast as the problem of optimizing a suitable cost function (risk)

• But most optimization methods rely on the necessary minimum condition ∇E = 0 or on the direction of the gradient ∇E

→ requirement: E must be at least differentiable (even better if also convex, but that's not always possible)

• Even if E is differentiable, for hidden units we cannot compute an error term like $(t - a)^2$ (MSE)

→ requirement: we need a way to do this


A differentiable activation function

• Let's write the discriminant function for a problem with two Gaussian, spherical, equal-variance classes.

• Translation of the origin, rotation of axes...

• 1-dimensional symmetrical problem in x with only two parameters:

$$p(x|\omega_1) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x-\mu)^2}{2\sigma^2} \right] \qquad p(x|\omega_2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x+\mu)^2}{2\sigma^2} \right]$$


By Bayes' theorem:

$$P(\omega_1|x) = \frac{p(x|\omega_1)\, P(\omega_1)}{p(x|\omega_1)P(\omega_1) + p(x|\omega_2)P(\omega_2)}$$

$$P(\omega_2|x) = \frac{p(x|\omega_2)\, P(\omega_2)}{p(x|\omega_1)P(\omega_1) + p(x|\omega_2)P(\omega_2)}$$


2-class discriminant function (assuming equal priors):

$$g(x) = P(\omega_1|x) - P(\omega_2|x) = \frac{\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] - \exp\left[-\frac{(x+\mu)^2}{2\sigma^2}\right]}{\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] + \exp\left[-\frac{(x+\mu)^2}{2\sigma^2}\right]}$$

after removing the common factors $1/(\sqrt{2\pi}\,\sigma)$.


Expanding the squares, $-(x \mp \mu)^2 / (2\sigma^2) = -\frac{x^2 + \mu^2}{2\sigma^2} \pm \frac{x\mu}{\sigma^2}$:

$$g(x) = \frac{\exp\left[-\frac{x^2+\mu^2}{2\sigma^2}\right] \exp\left[\frac{x\mu}{\sigma^2}\right] - \exp\left[-\frac{x^2+\mu^2}{2\sigma^2}\right] \exp\left[-\frac{x\mu}{\sigma^2}\right]}{\exp\left[-\frac{x^2+\mu^2}{2\sigma^2}\right] \exp\left[\frac{x\mu}{\sigma^2}\right] + \exp\left[-\frac{x^2+\mu^2}{2\sigma^2}\right] \exp\left[-\frac{x\mu}{\sigma^2}\right]}$$

The common positive factor $\exp\left[-\frac{x^2+\mu^2}{2\sigma^2}\right]$ cancels out:

$$g(x) = \frac{e^{x\mu/\sigma^2} - e^{-x\mu/\sigma^2}}{e^{x\mu/\sigma^2} + e^{-x\mu/\sigma^2}}$$


• We replace x with the score $r = x \cdot w'$

• We can absorb the factor $\mu/\sigma^2$ into the norm of $w'$:

$$w = \frac{\mu}{\sigma^2}\, w'$$

• We obtain

$$g(r) = \frac{e^r - e^{-r}}{e^r + e^{-r}}, \qquad r = x \cdot w$$

$g(r)$ is the hyperbolic tangent activation, $\tanh(r)$

• logistic or sigmoid activation:

$$\sigma(r) = \frac{1}{1 + e^{-r}} = \frac{\tanh(r/2) + 1}{2}$$


[Figure: the SIGMOID and TANH activations, a as a function of r over the range −10 ≤ r ≤ 10]


[Figure: the HEAVISIDE and SIGN threshold activations, a as a function of r over the range −10 ≤ r ≤ 10]


• The sigmoid is the solution of the logistic equation

$$y' = y(1 - y)$$

• Therefore, by definition,

$$\frac{\partial \sigma(r)}{\partial r} = \sigma(r)\,(1 - \sigma(r))$$

• Also,

$$\frac{\partial \tanh(r)}{\partial r} = 1 - \tanh^2(r)$$
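A small sketch of these activations and their derivatives, with a finite-difference check of the two identities:

    import numpy as np

    def sigmoid(r):
        return 1.0 / (1.0 + np.exp(-r))

    def d_sigmoid(r):
        s = sigmoid(r)
        return s * (1.0 - s)          # sigma'(r) = sigma(r)(1 - sigma(r))

    def d_tanh(r):
        return 1.0 - np.tanh(r) ** 2  # tanh'(r) = 1 - tanh(r)^2

    r, eps = 0.7, 1e-6
    assert abs((sigmoid(r + eps) - sigmoid(r - eps)) / (2 * eps) - d_sigmoid(r)) < 1e-8
    assert abs((np.tanh(r + eps) - np.tanh(r - eps)) / (2 * eps) - d_tanh(r)) < 1e-8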


The error back-propagation algorithm

• Discovered by Amari/Werbos/Parker/Rumelhart/Hinton/Williams from 1974 to 1986

• The name appears in Rosenblatt's book "Principles of Neurodynamics" in 1962

• A clever application of the chain rule of differential calculus

• We can perform gradient descent in a distributed way, and without computing derivatives analytically

• The responsibility for errors is back-propagated from the outputs back inside the network, and distributed among the hidden layers.


The chain rule

$$\frac{d f(g(x))}{dx} = \left. \frac{d f(y)}{dy} \right|_{y=g(x)} \frac{d g(x)}{dx}$$

Where is the "chain"?

$$\frac{d f(g(h(x)))}{dx} = \frac{d f(g)}{dg} \cdot \frac{d g(h)}{dh} \cdot \frac{d h(x)}{dx}$$

which, for instance, can be used to prove that

$$\frac{\partial \sigma(r)}{\partial w_i} = \frac{d \sigma(r)}{dr} \frac{\partial r}{\partial w_i} = \sigma'(r)\, x_i = \sigma(r)\,(1 - \sigma(r))\, x_i \qquad (1)$$
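A quick numerical check of Eq. (1), on arbitrary example values: compare the analytic gradient on one weight with a central finite difference:

    import numpy as np

    sigmoid = lambda r: 1.0 / (1.0 + np.exp(-r))

    x = np.array([0.5, -1.2, 2.0])
    w = np.array([0.3, 0.8, -0.5])
    i, eps = 1, 1e-6

    r = x @ w
    analytic = sigmoid(r) * (1.0 - sigmoid(r)) * x[i]   # Eq. (1)

    w_plus, w_minus = w.copy(), w.copy()
    w_plus[i] += eps
    w_minus[i] -= eps
    numeric = (sigmoid(x @ w_plus) - sigmoid(x @ w_minus)) / (2 * eps)

    assert abs(analytic - numeric) < 1e-8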


Notation:

n_p    number of patterns in the training set
n_i    number of input units
n_h    number of hidden units
n_o    number of output units
n_w    total number of weights, n_w = (n_i + 1) n_h + (n_h + 1) n_o
i      index for input components
j      index for hidden units
k      index for output units
x_i    i-th component of the input pattern
r_j    net stimulus of the j-th hidden unit
r_k    net stimulus of the k-th output unit
sh_j   j-th hidden unit activation value
so_k   k-th component of the output
tg_k   k-th component of the target
wh_ji  weight to the j-th hidden unit from the i-th input unit   [(n_i + 1) × n_h weights]
wo_kj  weight to the k-th output unit from the j-th hidden unit  [(n_h + 1) × n_o weights]


Loss function: $\lambda(so_k, tg_k) = (tg_k - so_k)^2$

1 in general there may be several output units;

2 the overall cost function is not quadratic (a paraboloid) because the network is non-linear

Non-convex cost function


Expected cost

$$E = \int \frac{1}{2} \frac{1}{n_o} \sum_{k=1}^{n_o} \left( so_k(x) - tg_k(x) \right)^2 p(x)\, dx \qquad (2)$$

E is known only through its estimate on the training set (here by epoch):

$$E = \frac{1}{n_p} \sum_{l=1}^{n_p} \frac{1}{2} \frac{1}{n_o} \sum_{k=1}^{n_o} \left( so_k(x_l) - tg_k(x_l) \right)^2 \qquad (3)$$


Summation and differentiation are both linear, and can therefore be exchanged freely.

$$E = \frac{1}{2} \frac{1}{n_o} \sum_{k=1}^{n_o} (so_k - tg_k)^2 \qquad (4)$$

We only consider one pattern:

• For online training (= by pattern), we apply each ∆w immediately, as we did with the perceptron and Adaline

• For training by epoch, we sum the ∆w over all patterns and apply them only at the end of each pass (a training epoch).

• For training by batch, we sum several ∆w and apply them after some fraction of a complete pass.


The operation of the multilayer perceptron is divided into two steps:

• activation forward-propagation

• error back-propagation.


Forward propagation

[Figure: activations flow forward, from the input layer to the output layer]


$$\forall j \quad r_j = \sum_{i=0}^{n_i} wh_{ji}\, x_i \;\Rightarrow\; sh_j = \sigma(r_j) \qquad (5)$$

$$\forall k \quad r_k = \sum_{j=0}^{n_h} wo_{kj}\, sh_j \;\Rightarrow\; so_k = \sigma(r_k) \qquad (6)$$
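A sketch of the forward pass in NumPy (index 0 plays the role of the bias input, as the sums starting at i = 0 and j = 0 suggest; `Wh` and `Wo` are the two weight matrices):

    import numpy as np

    def forward(x, Wh, Wo):
        """Forward propagation; Wh is (n_h, n_i+1), Wo is (n_o, n_h+1)."""
        sigmoid = lambda r: 1.0 / (1.0 + np.exp(-r))
        x1 = np.concatenate(([1.0], x))     # x_0 = 1: bias input
        rh = Wh @ x1                        # Eq. (5): net stimuli of hidden units
        sh = sigmoid(rh)
        sh1 = np.concatenate(([1.0], sh))   # sh_0 = 1: bias for the output layer
        rk = Wo @ sh1                       # Eq. (6): net stimuli of output units
        return sh1, sigmoid(rk)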


Error back-propagation

[Figure: error signals flow backward, from the output layer to the input layer]


Error back-propagation and update

We start from the computation of the partial derivatives, i.e., the gradient of the error.

$w$ generically denotes any of the weights of the network.

We need all the components of the gradient ∇E. These are

$$\frac{\partial E}{\partial w}$$

for all possible $w$.


$$\frac{\partial E}{\partial w} = \frac{1}{2} \frac{1}{n_o} \sum_{k=1}^{n_o} \frac{\partial (so_k - tg_k)^2}{\partial w} = \frac{1}{n_o} \sum_{k=1}^{n_o} (so_k - tg_k) \frac{\partial so_k}{\partial w} \qquad (7)$$

Depending on whether $w$ is a $wo$ or a $wh$, we will have different expansions of the above expression.


Hidden-to-output weights $wo_{kj}$

$$\frac{\partial E}{\partial wo_{kj}} = \frac{1}{n_o} \sum_{k'=1}^{n_o} (so_{k'} - tg_{k'}) \frac{\partial so_{k'}}{\partial r_{k'}} \frac{\partial r_{k'}}{\partial wo_{kj}} \qquad (8)$$

We can drop all the terms not depending on $wo_{kj}$, i.e., those with $k' \neq k$:

$$\frac{\partial E}{\partial wo_{kj}} = \frac{1}{n_o} (so_k - tg_k) \frac{\partial so_k}{\partial r_k} \frac{\partial r_k}{\partial wo_{kj}} \qquad (9)$$

We plug in quantities known from the forward pass:

$$\frac{\partial E}{\partial wo_{kj}} = \frac{1}{n_o} (so_k - tg_k)\, \sigma'(r_k)\, sh_j \qquad (10)$$


If we define

$$\delta_k = (so_k - tg_k)\, \sigma'(r_k) \qquad (11)$$

we have a generalization of the "delta" term which we have seen in the delta rule by Widrow and Hoff.

Generalized delta rule for the hidden-to-output weights:

$$\Delta wo_{kj} = -\eta\, \delta_k\, sh_j \qquad (12)$$


Problem with the input-to-hidden weights: not all the terms are readily available. We use the chain rule again to find another formulation for $\partial E / \partial wh_{ji}$.


$$\frac{\partial E}{\partial wh_{ji}} = \frac{1}{2} \frac{1}{n_o} \sum_{k=1}^{n_o} \frac{\partial (so_k - tg_k)^2}{\partial wh_{ji}} \qquad (13)$$

$$= \frac{1}{n_o} \sum_{k=1}^{n_o} (so_k - tg_k) \frac{\partial so_k}{\partial r_k} \frac{\partial r_k}{\partial sh_j} \frac{\partial sh_j}{\partial wh_{ji}} \qquad (14)$$


Now the quantities appearing in the last equation are available, again from either the forward pass or theory:

• $(so_k - tg_k) \frac{\partial so_k}{\partial r_k} = \delta_k$

• $\frac{\partial r_k}{\partial sh_j} = wo_{kj}$

• $\frac{\partial sh_j}{\partial wh_{ji}} = \frac{\partial sh_j}{\partial r_j} \frac{\partial r_j}{\partial wh_{ji}} = \sigma'(r_j)\, x_i$


$$\frac{\partial E}{\partial wh_{ji}} = \frac{1}{n_o} \sum_{k=1}^{n_o} (so_k - tg_k) \frac{\partial so_k}{\partial r_k} \frac{\partial r_k}{\partial sh_j} \frac{\partial sh_j}{\partial wh_{ji}} \qquad (15)$$

$$= \frac{1}{n_o} \sum_{k=1}^{n_o} \left[ \delta_k\, wo_{kj} \right] \left[ \sigma'(r_j)\, x_i \right] \qquad (16)$$

Note that the summation here does not disappear.


We can further manipulate the expression, by first isolating the terms which do not depend on the summation index:

$$= \left[ \frac{1}{n_o} \sum_{k=1}^{n_o} \delta_k\, wo_{kj} \right] \sigma'(r_j)\, x_i \qquad (17)$$

and then identifying the generalized delta for the input-to-hidden weights:

$$\delta_j = \sigma'(r_j)\, \frac{1}{n_o} \sum_{k=1}^{n_o} \delta_k\, wo_{kj} \qquad (18)$$


Generalized delta rule

for the input-to-hidden weights:

$$\Delta wh_{ji} = -\eta\, \delta_j\, x_i \qquad (19)$$

amazingly similar in form to that for the hidden-to-output weights
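A sketch of one complete by-pattern step, combining the forward pass with Eqs. (11), (12), (18) and (19); sigmoid activations are assumed, so that $\sigma'(r) = s(1-s)$, and as in Eq. (12) the $1/n_o$ factor of Eq. (10) is folded into $\eta$:

    import numpy as np

    def backprop_step(x, tg, Wh, Wo, eta):
        """One by-pattern update of Wh (n_h, n_i+1) and Wo (n_o, n_h+1)."""
        sigmoid = lambda r: 1.0 / (1.0 + np.exp(-r))
        x1 = np.concatenate(([1.0], x))
        sh1 = np.concatenate(([1.0], sigmoid(Wh @ x1)))   # forward pass, with bias
        so = sigmoid(Wo @ sh1)

        n_o = len(so)
        delta_k = (so - tg) * so * (1.0 - so)             # Eq. (11)
        sh = sh1[1:]
        delta_j = sh * (1.0 - sh) * (Wo[:, 1:].T @ delta_k) / n_o  # Eq. (18)

        Wo -= eta * np.outer(delta_k, sh1)                # Eq. (12)
        Wh -= eta * np.outer(delta_j, x1)                 # Eq. (19)
        return Wh, Wo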


Important property of multi-layer networks

The layered network is the simplest possible connectivity that has the universal approximation property.

It should be large enough, or deep enough.


Generalization and overfitting

The number of weights needs to be high.

We must take care of controlling overfitting.


Overfitting

Is the situation where

• the empirical risk $\hat{R}$ is low

• but $|\hat{R} - R|$ is high

Symptom: while training we are happy, but then tests fail!

No generalization, due to too much specialization (learning the training set, not the classification rule).


Multi-layer perceptrons: not a good model for the brain?

There is some evidence that the brain uses sparse (localized) rather than dense (distributed) representations.

Probably both.


Deep neural networks


David Hubel and Torsten Wiesel


Hubel and Wiesel placed electrodes in animals' brains (visual cortex). They discovered the columnar organization of neurons.


Each layer in a cortical column extracts features from the input it receives from the previous layer.

These features are more and more abstract:

Edges – simple shapes – composite shapes – eyes, mouths, noses... grandmother (the "grandmother cell" hypothesis)


Learning features in neural networks

Internal representations in hidden layers.

Hierarchy requires many layers (deep networks).


Learning: Limits of multi-layer networks

Error back-propagation does not work well with very deep structures.

Vanishing gradient phenomenon: at each layer, the back-propagated components of the gradient become exponentially smaller.

To avoid the problem: use shallow networks (theoretically sufficient).


Example of a shallow architecture

Support vector machines


Representational advantage of depth

In the '80s and early '90s some works proved that certain logical functions, which can be implemented with a depth of k layers, require exponentially more units if reduced to k − 1 layers.

In the 2010s: dependent inputs (variables) need very deep networks.


How can we avoid training the whole network all at once?


Multi-level hierarchies of networks

Cascades of unsupervised layers, trained one after the other, plus a final classification layer.

The whole structure is finally trained with error back-propagation.


The idea is not new: Neocognitron

K. Fukushima, 1987


Unsupervised learning principles


Information Bottleneck


Techniques using the "information bottleneck" principle

Using statistics and entropy

• Coding theory

• Stochastic complexity and minimum description length

Using errors

• Autoencoders

• PCA

• Rate-distortion theory


Autoencoders

An autoencoder is a special case of a multi-layer perceptron characterized by two aspects:

1 Structure: number of units in the input layer = number of units in the output layer > number of hidden units

2 Learned task: an autoencoder is trained to approximate the identity function (= replicate its input at the output)

An autoencoder is not a classifier.
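A structural sketch (reusing the hypothetical `backprop_step` above): an autoencoder is trained like any MLP, except that the hidden layer is narrower and the target is the input itself:

    import numpy as np

    rng = np.random.default_rng(0)
    n_i, n_h = 8, 3                      # bottleneck: n_h < n_i = n_o
    Wh = rng.normal(scale=0.1, size=(n_h, n_i + 1))
    Wo = rng.normal(scale=0.1, size=(n_i, n_h + 1))

    X = rng.random((100, n_i))           # unlabeled data
    for _ in range(50):
        for x in X:
            Wh, Wo = backprop_step(x, tg=x, Wh=Wh, Wo=Wo, eta=0.5)  # target = input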


What is interesting is not the output value (it is just an approximation of the input) but the pattern present on the hidden layer.

Since we don't use any target (the target coincides with the input), the autoencoder task is unsupervised. It is sometimes termed "self-supervised".


Learned features from a set of images


Recognizing handwritten digits


Features for recognizing ’0’ from ’8’


Features for recognizing ’1’ from ’8’


An example of an autoencoder for learning features from symbolic data

Task: diagnose Lyme disease from patient records

Problem: many features (observed signs and symptoms) are binary and very sparse


An example of an autoencoder for learning features from symbolic data

Learning the features


An example of an autoencoder for learning features from symbolic data

Using the learned features


Principal component analysis

Is an instance of factor analysis: discover the few unobservable factors that give rise to observable (measurable) variables.


Example of a factor analysis problem: discover the abilities underlying performance in school tests.

Observed variables:
• Marks in algebra test
• Marks in geometry test
• Marks in literature test
• Marks in foreign language test
• Marks in music test
• Marks in essay

Hidden factors:
• Linguistic ability
• Spatial ability
• Symbolic processing ability


Principal Component Analysis or PCA

is a linear solution to the factor analysis problem.

Linear: factors are linear combinations of patterns:

$$v = \lambda_1 x_1 + \lambda_2 x_2 + \dots + \lambda_d x_d$$


PCA works on the covariance matrix of the data.

Covariance between input $x_i$ and input $x_j$:

$$\sigma_{i,j} = \sigma_{j,i} = E\{(x_i - \bar{x}_i)(x_j - \bar{x}_j)\}$$

with $E\{\}$ the expectation (or the mean over the training set) and $\bar{x}_i$ the mean of the i-th input.

$$\Sigma = \begin{pmatrix} \sigma_{1,1} & \sigma_{1,2} & \dots & \sigma_{1,d} \\ \sigma_{2,1} & \sigma_{2,2} & \dots & \sigma_{2,d} \\ \vdots & & \ddots & \vdots \\ \sigma_{d,1} & \sigma_{d,2} & \dots & \sigma_{d,d} \end{pmatrix}$$


Note: if $X$ is the training set as a matrix and all inputs have zero mean, i.e., $X \leftarrow X - \bar{X}$, then

$$\Sigma = X^T X$$

(up to a factor $1/n$). In Matlab: X = X - repmat(mean(X), size(X,1), 1)


Principal components

The "factors" in PCA are called principal components, and are given by the eigenvectors of $\Sigma$:

$$v_1, \dots, v_d$$

If we project pattern $x = [x_1, x_2, \dots, x_d]$ onto the component $v_i = [v_{i1}, v_{i2}, \dots, v_{id}]$, we obtain the value of the i-th factor, or component, or feature, for pattern x:

$$a_i = x \cdot v_i = \sum_{j=1}^{d} x_j v_{ij}$$

OK, components; but why "principal"?


Property

1 Eigenvectors of $\Sigma$ can be ordered by the corresponding eigenvalues, from largest to smallest

2 Eigenvectors are thus ordered by variance (or energy, or level of activity), from largest to smallest

3 Projection of the training set X onto the first r (principal) components gives the best rank-r approximation to X itself, when measured by mean square error

PCA is a form of lossy compression. The principal components are features useful to represent the data in a synthetic way (information bottleneck).
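A sketch of PCA in NumPy: eigendecomposition of the covariance matrix, components ordered by eigenvalue, projection, and the rank-r reconstruction:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.random((200, 5))                 # training set, one pattern per row

    Xc = X - X.mean(axis=0)                  # zero-mean inputs
    Sigma = Xc.T @ Xc / len(Xc)              # covariance matrix

    eigvals, eigvecs = np.linalg.eigh(Sigma) # eigh: Sigma is symmetric
    order = np.argsort(eigvals)[::-1]        # order by eigenvalue, largest first
    V = eigvecs[:, order]                    # columns = principal components v_1..v_d

    r = 2
    A = Xc @ V[:, :r]                        # projections a_i onto the first r components
    X_hat = A @ V[:, :r].T + X.mean(axis=0)  # best rank-r approximation (in MSE)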


It has been proved that an autoencoder with linear activations learns the principal components.

This is because the objective is the mean squared reconstruction error of a lower-rank representation, the same as in PCA.


Oja’s neuron

A single-unit model with linear (identity) activation:

$$a = x \cdot w$$

Learning rule:

$$w \leftarrow w + \eta\, a\, (x - a\, w)$$
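A sketch of the rule on synthetic 2-d data: with a small η and a few passes, w approaches the principal eigenvector of the data covariance (up to sign):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.multivariate_normal([0, 0], [[3.0, 1.0], [1.0, 1.0]], size=2000)

    w = rng.normal(size=2)
    eta = 0.01
    for _ in range(20):                    # a few passes over the data
        for x in X:
            a = x @ w                      # linear activation a = x . w
            w = w + eta * a * (x - a * w)  # Oja's learning rule

    print(w / np.linalg.norm(w))           # approx. the principal eigenvector (up to sign)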


It can be proven that, for small η, Oja's learning rule is a first-order Taylor approximation of the Rayleigh quotient iteration method for finding the principal eigenvector.

At convergence, w is the principal component (the principal eigenvector) of Σ.


Oja’s neuron is a neural principal component analyzer

Advantages over using explicit eigensolvers (e.g., the LAPACK eigensolvers, or Matlab's eig function):

1 Distributed

2 Online (big data!)

Disadvantages:

1 Stochastic (convergence in probability)

2 Slower, because of the requirement of a small η


Restricted Boltzmann Machines

A generative model, invented by G. Hinton. Started in the Eighties (Boltzmann machines), then developed in the following decades.


Boltzmann Machines:

• binary-valued units

• bi-directional connections

• symmetric weights (equal in the two directions)

• general topology (feedback possible)


The restricted version has the limitation that its topology must be a bipartite graph.

This makes it more tractable.


Energy

• $v = [v_i]$ and $h = [h_j]$: visible and hidden unit activation values, respectively

• $w_{i,j}$: weight between $v_i$ and $h_j$

• $a_i$ and $b_j$: biases of visible and hidden units, respectively

Then we can define an "energy":

$$E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_i \sum_j v_i\, w_{i,j}\, h_j$$


Probability of states

The probability of any possible network state is

$$P(v, h) = \frac{1}{Z}\, e^{-E(v,h)}$$

with $Z$ the partition function (a normalizer).


Probability of states

Since intra-layer connections are absent, the probability of activation of one unit does not depend on that of the other units in the same layer, but only on the units in the other layer:

$$P(v_i = 1 \mid h) = \frac{1}{1 + e^{-(a_i + \sum_j w_{i,j} h_j)}}$$

$$P(h_j = 1 \mid v) = \frac{1}{1 + e^{-(b_j + \sum_i w_{i,j} v_i)}}$$


Training an RBM

Algorithm called contrastive divergence. It uses random sampling from the probabilities (computed as above):

• Apply one input v

• Compute the probability P(h|v); sample from it to generate a hidden configuration h

• Compute a positive update step $\Delta w^+ = v h^T$ (outer product)

• Generate one possible input v′ from the hidden configuration

• Compute the probability P(h′|v′) and sample h′

• Compute a negative update step $\Delta w^- = v' h'^T$

• Apply the update: $w \leftarrow w + \eta\, (\Delta w^+ - \Delta w^-)$

This does not optimize any explicit objective function!
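A sketch of one CD-1 update for a binary RBM (W is the n_v × n_h weight matrix, a and b the visible and hidden biases; using the probabilities rather than a sample for the negative hidden term is a common variant):

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda r: 1.0 / (1.0 + np.exp(-r))

    def cd1_step(v, W, a, b, eta):
        """One contrastive-divergence update for a binary RBM."""
        p_h = sigmoid(b + v @ W)                      # P(h_j = 1 | v)
        h = (rng.random(p_h.shape) < p_h) * 1.0       # sample hidden configuration
        dw_pos = np.outer(v, h)                       # positive step v h^T

        p_v = sigmoid(a + W @ h)                      # P(v_i = 1 | h)
        v_neg = (rng.random(p_v.shape) < p_v) * 1.0   # generate a possible input v'
        p_h_neg = sigmoid(b + v_neg @ W)              # P(h' | v')
        dw_neg = np.outer(v_neg, p_h_neg)             # negative step v' h'^T

        return W + eta * (dw_pos - dw_neg)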


Training RBMs of large size is not simple. There are tricks to make the task easier.

Example: weight sharing and convolutional neural networks. These help with data having correlated inputs, as in images, video, speech, and general time series.


Deep Belief Networks

• A DBN is a sequence of RBMs

• Each RBM can be trained independently of the following ones

• A greedy strategy

• The last layer can be a classifier


Deep networks can be built out of RBMs, but also out of autoencoders.

Autoencoders, however, are more sensitive to random noise.


Neural networks: why bother?

Deep learning has achieved success in very complex tasks and won many competitions. Example: extracting words from audio and turning them into automatic subtitles (cf. YouTube).


THE END
