
Artificial Neural Networks

Historical description

Victor G. Lopez


Artificial Neural Networks (ANN)

• An artificial neural network is a computational model that attempts to emulate the functions of the brain.


Characteristics of ANNs

Modern ANNs are complex arrangements of processing units able to adapt their parameters using learning techniques.

Their plasticity, nonlinearity, robustness and highly distributed framework have attracted a lot of attention from many areas of research.

Several applications have been studied using ANNs: classification, pattern recognition, clustering, function approximation, optimization, forecasting and prediction, among others.

To date, ANN models are the artificial intelligence methods that imitate human intelligence most closely.


1890s: A neuron model

Santiago Ramon y Cajal proposes that the brain works in a parallel and distributed manner, with neurons as the basic processing units.

He described the first complete biological model of the neuron.


Neural synapses

A neural synapse is the region where the axon of a neuron interacts with another neuron.

A neuron usually receives information by means of its dendrites, but this is not always the case.

Neurons share information using electrochemical signals.


Action Potential

The signal sent by a single neuron is usually weak, but a neuron receives many inputs from many other neurons.

The inputs from all the neurons are integrated. If a threshold is reached, the neuron sends a powerful signal through its axon, called an action potential.


Neural Pathways

The action potential is an all-or-none signal. It doesn't matter whether the threshold is barely reached or vastly surpassed: the resulting action potential is the same.

This means that the action potential alone does not carry much information. All cerebral processes, like memory or learning, depend on neural pathways.

There are over $10^{11}$ neurons in the human brain, forming around $10^{15}$ synapses. They form the basis of human intelligence and consciousness.


1943: McCulloch and Pitts

Warren McCulloch (neurophysiologist) and Walter Pitts (mathematician) wrote a paper describing a logical calculus of neural networks.

Their model can, in principle, approximate any computable function.

This is considered the birth of artificial intelligence.


1958: The Perceptron

Frank Rosenblatt (psychologist) proposes the Perceptron with a novel method of supervised learning.

This is the oldest neural network still in use today.


Single-neuron Perceptron

Here, the activation function f was selected as a saturation function, which simulates the all-or-none property of the action potential.

The single-neuron Perceptron can solve classification problems involving two linearly separable groups.


Perceptron with a Layer of Neurons

Using several neurons, the Perceptron can classify objects into many categories, as long as they are linearly separable.

The total number of categories is $2^S$, with $S$ the number of neurons.


Training algorithm per neuron

1. Initialize the weights $W_0$.

2. Compute the output of the network for input $p_k$. If the output is correct, set
$$W_{k+1} = W_k$$

3. If the output is incorrect, set
$$W_{k+1} = W_k - \eta p_k, \quad \text{if } W_k^T p_k \ge 0$$
$$W_{k+1} = W_k + \eta p_k, \quad \text{if } W_k^T p_k < 0$$

Here, $0 < \eta \le 1$ is the learning rate.
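A minimal sketch of this rule in Python/NumPy (the function name, the appended bias input, the fixed epoch budget and the choice of η are illustrative assumptions, not part of the slides):

```python
import numpy as np

def train_perceptron(P, T, eta=0.5, epochs=20):
    """Single-neuron Perceptron trained with the rule above.

    P: (N, d) array of input patterns; a bias input of 1 is appended here.
    T: (N,) array of target classes in {0, 1}.
    eta: learning rate, 0 < eta <= 1.
    """
    P = np.hstack([P, np.ones((P.shape[0], 1))])   # append bias input
    W = np.zeros(P.shape[1])                       # W_0
    for _ in range(epochs):
        for p, t in zip(P, T):
            y = 1 if W @ p >= 0 else 0             # hard-limit (saturation) output
            if y == t:
                continue                           # correct: W_{k+1} = W_k
            # incorrect: move the decision boundary away from / toward p_k
            W = W - eta * p if W @ p >= 0 else W + eta * p
    return W
```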


Logical gates AND and OR

Separating the outputs of the logical gates AND and OR is a simple example of a problem solvable by the single-layer Perceptron.

In contrast, the outputs of the XOR gate are not linearly separable.

AND:
x1  x2 | y
 0   0 | 0
 0   1 | 0
 1   0 | 0
 1   1 | 1

OR:
x1  x2 | y
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 1

XOR:
x1  x2 | y
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 0
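As an illustrative check, reusing the hypothetical train_perceptron sketch from the previous slide (same bias convention and epoch budget), the single-layer rule separates AND and OR but never settles on XOR:

```python
import numpy as np

P = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
gates = {
    "AND": np.array([0, 0, 0, 1]),
    "OR":  np.array([0, 1, 1, 1]),
    "XOR": np.array([0, 1, 1, 0]),
}

for name, T in gates.items():
    W = train_perceptron(P, T)                    # sketch defined above
    Pb = np.hstack([P, np.ones((4, 1))])          # same bias convention
    pred = (Pb @ W >= 0).astype(int)
    print(name, pred, "solved" if np.array_equal(pred, T) else "not solved")
```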


1959: ADALINE and MADALINE

Bernard Widrow and Marcian Hoff developed models called ADALINE (adaptive linear elements) and MADALINE (Multiple ADALINE).

The main difference with respect to the Perceptron is the absence of the threshold activation function.

Training of these networks is performed using derivatives.
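A minimal sketch of that derivative-based training, the Widrow-Hoff (LMS) rule used by ADALINE: the weight update follows the gradient of the squared error of the linear output, with no threshold in the update (the learning rate, epoch count and bias handling below are illustrative assumptions):

```python
import numpy as np

def train_adaline(P, T, eta=0.05, epochs=50):
    """ADALINE: each update is a gradient step on (1/2) e^2, where
    e = t - W.p is the error of the *linear* output (Widrow-Hoff / LMS rule)."""
    P = np.hstack([P, np.ones((P.shape[0], 1))])   # bias input
    W = np.zeros(P.shape[1])
    for _ in range(epochs):
        for p, t in zip(P, T):
            e = t - W @ p            # error of the linear output
            W = W + eta * e * p      # gradient step on (1/2) e^2
    return W
```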


1970s: First Winter in ANNs research

After the successful introduction and development of ANNs during the 1960s, interest in their applications declined for almost two decades.

The limitations of the single-layer Perceptron narrowed its possible practical implementations.

Theoretical research showed that a multilayer Perceptron would drastically improve its performance, but there was no training algorithm for it.



1986: Multilayer Perceptron

In his 1974 PhD thesis, Paul Werbos proposed to use the backpropagation algorithm as a solution to the multilayer Perceptron training problem. His suggestion, however, remained ignored for more than a decade.

In 1986, the backpropagation method was finally popularized in a paper by Rumelhart, Hinton and Williams.

The multilayer Perceptron became the most powerful ANN model to date.

It is proven to solve nonlinearly separable classification problems, it can approximate any continuous function, and it generalizes from particular samples, among many other applications.


Backpropagation Training

This is supervised learning, so we have a list of inputs and their corresponding target outputs, $(p_k, t_k)$.

We can compute the output of the NN for each given input $p_k$. This is called the forward propagation step. For a 3-layer network, this would be

$$a_k = f^3(W^3 f^2(W^2 f^1(W^1 p_k + b^1) + b^2) + b^3)$$

Define the output error as $e_k = t_k - a_k$.

We now want to minimize the squared error

$$J = \frac{1}{2} e_k^2 = \frac{1}{2} (t_k - a_k)^2$$

or the average sum of the squared errors

$$J = \frac{1}{2Q} \sum_{k=1}^{Q} e_k^2 = \frac{1}{2Q} \sum_{k=1}^{Q} (t_k - a_k)^2$$
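A minimal NumPy sketch of the forward propagation step and the squared-error cost (the 2-4-3-1 layer sizes, the random weights and the activation choices are assumptions made only for illustration):

```python
import numpy as np

def forward(p, weights, biases, activations):
    """a_k = f3(W3 f2(W2 f1(W1 p_k + b1) + b2) + b3), computed layer by layer."""
    a = p
    for W, b, f in zip(weights, biases, activations):
        a = f(W @ a + b)
    return a

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 2)), rng.normal(size=(3, 4)), rng.normal(size=(1, 3))]
biases  = [rng.normal(size=4), rng.normal(size=3), rng.normal(size=1)]
acts    = [np.tanh, np.tanh, sigmoid]

p_k, t_k = np.array([0.5, -1.0]), np.array([1.0])
a_k = forward(p_k, weights, biases, acts)   # forward propagation
e_k = t_k - a_k                             # output error
J = 0.5 * float(e_k @ e_k)                  # squared-error cost J = (1/2) e_k^2
```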


Backpropagation Training

Use a gradient descent algorithm to update the weights $W_k$ while minimizing the error $e_k$:

$$W_{k+1} = W_k + \Delta W_k, \qquad \Delta W_k = -\eta \frac{\partial J}{\partial W_k}$$

The chain rule for derivatives can be used to obtain a clearer expression:

$$\frac{\partial J}{\partial W_k} = \frac{\partial J}{\partial e_k} \frac{\partial e_k}{\partial a_k} \frac{\partial a_k}{\partial W_k}$$

From the previous definitions we note that

$$\frac{\partial J}{\partial e_k} = e_k, \qquad \frac{\partial e_k}{\partial a_k} = -1$$

and $\frac{\partial a_k}{\partial W_k}$ depends on the activation functions $f^i$. Notice that all activation functions must be differentiable.
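To make the chain of factors concrete, here is a single-neuron sketch that evaluates $\partial J / \partial W$ as the product above and checks one component against a finite difference (the sigmoid activation and the numbers are assumptions for illustration):

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
W, b = rng.normal(size=3), 0.1
p, t = np.array([0.2, -0.4, 0.7]), 1.0

z = W @ p + b
a = sigmoid(z)
e = t - a

# dJ/dW = (dJ/de)(de/da)(da/dW) = e * (-1) * sigmoid'(z) * p
grad = e * (-1.0) * a * (1.0 - a) * p

# Finite-difference check of the first component.
eps = 1e-6
W_eps = W.copy(); W_eps[0] += eps
J0 = 0.5 * e ** 2
J1 = 0.5 * (t - sigmoid(W_eps @ p + b)) ** 2
print(grad[0], (J1 - J0) / eps)   # the two values should agree closely
```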


Backpropagation Algorithm

1. Initialize the weights $W_0$.

2. By forward propagation, get $a_k$.

3. Calculate the error $e_k = t_k - a_k$.

4. Update the neural weights as
$$W_{k+1} = W_k + \eta \frac{\partial a_k}{\partial W_k} e_k$$
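A sketch of one pass of these four steps for a network whose layers all use the sigmoid activation, so that $f'(z) = a(1-a)$; the activation choice, learning rate and helper name are assumptions, not the slides' notation:

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def backprop_step(p, t, weights, biases, eta=0.5):
    # Step 2: forward propagation, keeping every layer's output.
    a = [p]
    for W, b in zip(weights, biases):
        a.append(sigmoid(W @ a[-1] + b))
    # Step 3: output error.
    e = t - a[-1]
    # Step 4: propagate the sensitivity backwards, then update each layer.
    s = e * a[-1] * (1.0 - a[-1])               # e_k * f'(z) at the output
    grads = []
    for i in reversed(range(len(weights))):
        grads.append((np.outer(s, a[i]), s))    # (da/dW e, da/db e) for layer i
        if i > 0:
            s = (weights[i].T @ s) * a[i] * (1.0 - a[i])
    for i, (gW, gb) in zip(reversed(range(len(weights))), grads):
        weights[i] += eta * gW                  # W <- W + eta (da/dW) e
        biases[i]  += eta * gb
    return 0.5 * float(e @ e)                   # squared error, for monitoring
```

Iterating this step over the training pairs $(p_k, t_k)$ implements the full algorithm; other differentiable activation functions fit the same structure by replacing the $a(1-a)$ factor with the corresponding derivative.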


Activation functions
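The original slide is a figure, and the specific curves it shows are not recoverable from the text. As a stand-in, here are a few activation functions commonly used with backpropagation, together with the derivatives the algorithm needs:

```python
import numpy as np

# Each activation comes with the derivative that backpropagation needs.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh(x):
    return np.tanh(x)

def dtanh(x):
    return 1.0 - np.tanh(x) ** 2

def relu(x):
    return np.maximum(0.0, x)

def drelu(x):
    return (x > 0).astype(float)   # subgradient convention at x = 0
```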


Late 1990s - Early 2000s: Second Winter in ANNs research

In the 1990s, several applications of ANNs were studied and implemented. Areas such as vision, pattern recognition, unsupervised learning and reinforcement learning took advantage of the adaptive characteristics of ANNs.

Late in that decade, a new difficulty delayed advancement in the field: the basic backpropagation algorithm was not appropriate for several hidden layers, mainly because of limited computational capabilities.

Many researchers became pessimistic about ANNs.


2006-2016: Deep learning

In 2006, Hinton, Osindero and Teh published a fast learning algorithm for deep belief networks. This marks the dawn of deep learning.

The 2010s have seen a boom in deep neural network applications. Companies such as Microsoft, Google and Facebook have developed advanced deep learning ANNs. Optimism has returned to the field, and human-level intelligence is expected to be achieved within a few decades.
