
Artificial Neural Networks

Historical description

Victor G. Lopez


Artificial Neural Networks (ANN)

• An artificial neural network is a computational model that attempts to emulate the functions of the brain.


Characteristics of ANNs

Modern ANNs are complex arrangements of processing units able to adapt their parameters using learning techniques.

Their plasticity, nonlinearity, robustness and highly distributed framework have attracted a lot of attention from many areas of research.

Several applications have been studied using ANNs: classification, pattern recognition, clustering, function approximation, optimization, forecasting and prediction, among others.

To date, ANN models are the artificial intelligence methods that imitate human intelligence most closely.


1890s: A neuron model

Santiago Ramon y Cajal proposes that the brain works in a parallel and distributed manner, with neurons as the basic processing units.

He described the first complete biological model of the neuron.


Neural synapses

A neural synapse is the region where the axon of a neuron interacts with another neuron.

A neuron usually receives information by means of its dendrites, but this is not always the case.

Neurons share information using electrochemical signals.


Action Potential

The signal sent by a single neuron is usually weak, but a neuron receives many inputs from many other neurons.

The inputs from all the neurons are integrated. If a threshold is reached, the neuron sends a powerful signal through its axon, called an action potential.


Neural Pathways

The action potential is an all-or-none signal. It doesn't matter whether the threshold is barely reached or vastly surpassed: the resulting action potential is the same.

This means that the action potential alone does not carry much information. All cerebral processes, like memory or learning, depend on neural pathways.

There are over $10^{11}$ neurons in the human brain, forming around $10^{15}$ synapses. They form the basis of human intelligence and consciousness.


1943: McCulloch and Pitts

Warren McCulloch (neurophysiologist) and Walter Pitts (mathematician) wrote a paper describing a logical calculus of neural networks.

Their model can, in principle, approximate any computable function.

This is considered the birth of artificial intelligence.


1958: The Perceptron

Frank Rosenblatt (psychologist) proposes the Perceptron with a novel method of supervised learning.

This is the oldest neural network still in use today.


Single-neuron Perceptron

Here, the activation function f was selected as a saturation function, which simulates the all-or-none property of the action potential.

The single-neuron Perceptron can solve classification problems involving two linearly separable groups.


Perceptron with a Layer of Neurons

Using several neurons, the Perceptron can classify objects into many categories, as long as they are linearly separable.

The total number of categories is $2^S$, with $S$ the number of neurons.


Training algorithm per neuron

1. Initialize the weights $W_0$.

2. Compute the output of the network for input $p_k$. If the output is correct, set
$$W_{k+1} = W_k$$

3. If the output is incorrect, set
$$W_{k+1} = W_k - \eta p_k, \quad \text{if } W_k^T p_k \ge 0$$
$$W_{k+1} = W_k + \eta p_k, \quad \text{if } W_k^T p_k < 0$$

Here, $0 < \eta \le 1$ is the learning rate.
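A minimal sketch of this rule in Python/NumPy (the function name, the appended bias input, the fixed epoch budget and the choice of η are illustrative assumptions, not part of the slides):

```python
import numpy as np

def train_perceptron(P, T, eta=0.5, epochs=20):
    """Single-neuron Perceptron trained with the rule above.

    P: (N, d) array of input patterns; a bias input of 1 is appended here.
    T: (N,) array of target classes in {0, 1}.
    eta: learning rate, 0 < eta <= 1.
    """
    P = np.hstack([P, np.ones((P.shape[0], 1))])   # append bias input
    W = np.zeros(P.shape[1])                       # W_0
    for _ in range(epochs):
        for p, t in zip(P, T):
            y = 1 if W @ p >= 0 else 0             # hard-limit (saturation) output
            if y == t:
                continue                           # correct: W_{k+1} = W_k
            # incorrect: move the decision boundary away from / toward p_k
            W = W - eta * p if W @ p >= 0 else W + eta * p
    return W
```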


Logical gates AND and OR

Separating the outputs of the logical gates AND and OR is a simple example of a problem solvable by the single-layer Perceptron.

In contrast, the outputs of the XOR gate are not linearly separable.

AND:
x1  x2 | y
 0   0 | 0
 0   1 | 0
 1   0 | 0
 1   1 | 1

OR:
x1  x2 | y
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 1

XOR:
x1  x2 | y
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 0
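As an illustrative check, reusing the hypothetical train_perceptron sketch from the previous slide (same bias convention and epoch budget), the single-layer rule separates AND and OR but never settles on XOR:

```python
import numpy as np

P = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
gates = {
    "AND": np.array([0, 0, 0, 1]),
    "OR":  np.array([0, 1, 1, 1]),
    "XOR": np.array([0, 1, 1, 0]),
}

for name, T in gates.items():
    W = train_perceptron(P, T)                    # sketch defined above
    Pb = np.hstack([P, np.ones((4, 1))])          # same bias convention
    pred = (Pb @ W >= 0).astype(int)
    print(name, pred, "solved" if np.array_equal(pred, T) else "not solved")
```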


1959: ADALINE and MADALINE

Bernard Widrow and Marcian Hoff developed models called ADALINE (adaptive linear elements) and MADALINE (Multiple ADALINE).

The main difference with respect to the Perceptron is the absence of the threshold activation function.

Training of these networks is performed using derivatives.
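A minimal sketch of that derivative-based training, the Widrow-Hoff (LMS) rule used by ADALINE: the weight update follows the gradient of the squared error of the linear output, with no threshold in the update (the learning rate, epoch count and bias handling below are illustrative assumptions):

```python
import numpy as np

def train_adaline(P, T, eta=0.05, epochs=50):
    """ADALINE: each update is a gradient step on (1/2) e^2, where
    e = t - W.p is the error of the *linear* output (Widrow-Hoff / LMS rule)."""
    P = np.hstack([P, np.ones((P.shape[0], 1))])   # bias input
    W = np.zeros(P.shape[1])
    for _ in range(epochs):
        for p, t in zip(P, T):
            e = t - W @ p            # error of the linear output
            W = W + eta * e * p      # gradient step on (1/2) e^2
    return W
```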


1970s: First Winter in ANNs research

After the successful introduction and development of ANNs during the 1960s, interest in their applications declined for almost two decades.

The limitations of the single-layer Perceptron narrowed its possible practical implementations.

Theoretical research showed that a multilayer Perceptron would drastically improve its performance, but there was no training algorithm for it.



1986: Multilayer Perceptron

In his 1974 PhD thesis, Paul Werbos proposed to use the backpropagation algorithm as a solution to the multilayer Perceptron training problem. His suggestion, however, remained ignored for more than a decade.

In 1986, the backpropagation method was finally popularized in a paper by Rumelhart, Hinton and Williams.

The multilayer Perceptron became the most powerful ANN model to date.

It is proven to solve nonlinearly separable classification problems, it can approximate any continuous function, and it generalizes from particular samples, among many other applications.


Backpropagation Training

This is supervised learning, so we have a list of inputs and their corresponding target outputs, $(p_k, t_k)$.

We can compute the output of the NN for each given input $p_k$. This is called the forward propagation step. For a 3-layer network, this would be

$$a_k = f^3(W^3 f^2(W^2 f^1(W^1 p_k + b^1) + b^2) + b^3)$$

Define the output error as $e_k = t_k - a_k$.

We now want to minimize the squared error

$$J = \frac{1}{2} e_k^2 = \frac{1}{2} (t_k - a_k)^2$$

or the average sum of the squared errors

$$J = \frac{1}{2Q} \sum_{k=1}^{Q} e_k^2 = \frac{1}{2Q} \sum_{k=1}^{Q} (t_k - a_k)^2$$
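A minimal NumPy sketch of the forward propagation step and the squared-error cost (the 2-4-3-1 layer sizes, the random weights and the activation choices are assumptions made only for illustration):

```python
import numpy as np

def forward(p, weights, biases, activations):
    """a_k = f3(W3 f2(W2 f1(W1 p_k + b1) + b2) + b3), computed layer by layer."""
    a = p
    for W, b, f in zip(weights, biases, activations):
        a = f(W @ a + b)
    return a

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 2)), rng.normal(size=(3, 4)), rng.normal(size=(1, 3))]
biases  = [rng.normal(size=4), rng.normal(size=3), rng.normal(size=1)]
acts    = [np.tanh, np.tanh, sigmoid]

p_k, t_k = np.array([0.5, -1.0]), np.array([1.0])
a_k = forward(p_k, weights, biases, acts)   # forward propagation
e_k = t_k - a_k                             # output error
J = 0.5 * float(e_k @ e_k)                  # squared-error cost J = (1/2) e_k^2
```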


Backpropagation Training

Use a gradient descent algorithm to update the weights $W_k$ while minimizing the error $e_k$:

$$W_{k+1} = W_k + \Delta W_k, \qquad \Delta W_k = -\eta \frac{\partial J}{\partial W_k}$$

The chain rule for derivatives can be used to obtain a clearer expression:

$$\frac{\partial J}{\partial W_k} = \frac{\partial J}{\partial e_k} \frac{\partial e_k}{\partial a_k} \frac{\partial a_k}{\partial W_k}$$

From the previous definitions we note that

$$\frac{\partial J}{\partial e_k} = e_k, \qquad \frac{\partial e_k}{\partial a_k} = -1$$

and $\frac{\partial a_k}{\partial W_k}$ depends on the activation functions $f^i$. Notice that all activation functions must be differentiable.
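To make the chain of factors concrete, here is a single-neuron sketch that evaluates $\partial J / \partial W$ as the product above and checks one component against a finite difference (the sigmoid activation and the numbers are assumptions for illustration):

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
W, b = rng.normal(size=3), 0.1
p, t = np.array([0.2, -0.4, 0.7]), 1.0

z = W @ p + b
a = sigmoid(z)
e = t - a

# dJ/dW = (dJ/de)(de/da)(da/dW) = e * (-1) * sigmoid'(z) * p
grad = e * (-1.0) * a * (1.0 - a) * p

# Finite-difference check of the first component.
eps = 1e-6
W_eps = W.copy(); W_eps[0] += eps
J0 = 0.5 * e ** 2
J1 = 0.5 * (t - sigmoid(W_eps @ p + b)) ** 2
print(grad[0], (J1 - J0) / eps)   # the two values should agree closely
```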


Backpropagation Algorithm

1. Initialize the weights $W_0$.

2. By forward propagation, get $a_k$.

3. Calculate the error $e_k = t_k - a_k$.

4. Update the neural weights as
$$W_{k+1} = W_k + \eta \frac{\partial a_k}{\partial W_k} e_k$$
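A sketch of one pass of these four steps for a network whose layers all use the sigmoid activation, so that $f'(z) = a(1-a)$; the activation choice, learning rate and helper name are assumptions, not the slides' notation:

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def backprop_step(p, t, weights, biases, eta=0.5):
    # Step 2: forward propagation, keeping every layer's output.
    a = [p]
    for W, b in zip(weights, biases):
        a.append(sigmoid(W @ a[-1] + b))
    # Step 3: output error.
    e = t - a[-1]
    # Step 4: propagate the sensitivity backwards, then update each layer.
    s = e * a[-1] * (1.0 - a[-1])               # e_k * f'(z) at the output
    grads = []
    for i in reversed(range(len(weights))):
        grads.append((np.outer(s, a[i]), s))    # (da/dW e, da/db e) for layer i
        if i > 0:
            s = (weights[i].T @ s) * a[i] * (1.0 - a[i])
    for i, (gW, gb) in zip(reversed(range(len(weights))), grads):
        weights[i] += eta * gW                  # W <- W + eta (da/dW) e
        biases[i]  += eta * gb
    return 0.5 * float(e @ e)                   # squared error, for monitoring
```

Iterating this step over the training pairs $(p_k, t_k)$ implements the full algorithm; other differentiable activation functions fit the same structure by replacing the $a(1-a)$ factor with the corresponding derivative.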


Activation functions
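The original slide is a figure, and the specific curves it shows are not recoverable from the text. As a stand-in, here are a few activation functions commonly used with backpropagation, together with the derivatives the algorithm needs:

```python
import numpy as np

# Each activation comes with the derivative that backpropagation needs.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh(x):
    return np.tanh(x)

def dtanh(x):
    return 1.0 - np.tanh(x) ** 2

def relu(x):
    return np.maximum(0.0, x)

def drelu(x):
    return (x > 0).astype(float)   # subgradient convention at x = 0
```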


Late 1990s - Early 2000s: Second Winter in ANNs research

In the 1990s, several applications of ANNs were studied and implemented. Areas such as vision, pattern recognition, unsupervised learning and reinforcement learning took advantage of the adaptive characteristics of ANNs.

Late in that decade, a new difficulty delayed advancement in the field: the basic backpropagation algorithm was not appropriate for several hidden layers, mainly because of limited computational capabilities.

Many researchers became pessimistic about ANNs.


2006-2016: Deep learning

In 2006, Hinton, Osindero and Teh published a fast learning algorithm for deep belief networks. This marks the dawn of deep learning.

The 2010s have seen a boom in deep neural network applications. Companies such as Microsoft, Google and Facebook have developed advanced deep learning ANNs. Optimism has returned to the field, and human-level intelligence is expected to be achieved within a few decades.
