
An Introduction to Neural Networks: The Perceptron

© 2013 Lower Columbia College



The human brain is essentially a large and unimaginably complex Neural Network. We can also think of the brain as an organized series of interconnected subsections of Neural Networks. We will look at how nature has implemented the Neural Network, and then look at the workings of the most common artificial Neural Network, the Perceptron.

Neurons


The Neural Network of a mature human brain contains about 100 billion nerve cells called neurons. These neurons are the fundamental part of the Neural Network. Neurons form complex networks of interconnections, called synapses, with each other. A typical neuron can interconnect with up to 10,000 other neurons, with the average neuron interconnecting with about 1,000 other neurons.

Synapse of Interconnecting Neurons


For more detailed information on the synaptic interconnections between neurons at the microscopic level, there are interesting animations to be found at:

Chemical Synapse

The Mind Project

Brain Basics - Firing of Neurons

The Biological Neural Network


Although the mechanism of the synapse itself is compelling, the focus of this article is the Neural Network itself, and particularly, the Perceptron. The Perceptron is a simple and common configuration of an artificial Neural Network. We will start with a brief history of Artificial Intelligence and the Perceptron itself.

Artificial Intelligence - Mimicking the Human Brain


With the advent of electronic computers in the 1940s, people began to think about the possibility of artificial brains, or what is commonly known as Artificial Intelligence. In the beginning, some thought that the logic gate, the building block of digital computers, could serve as an artificial neuron, but this idea was quickly rejected. In 1949, Donald Hebb proposed an artificial neuron that more closely mimicked the biological neuron, where each neuron would have numerous interconnections with other neurons. Each of these interconnections would have a 'weight' multiplier associated with it. Learning would be achieved by changing the weight multipliers of each of the interconnections. In 1957, Frank Rosenblatt implemented a Hebb neuron, which he called a 'Perceptron'.

In 1974, Paul Werbos in his PhD thesis first described the process of training artificial neural networks through a process called the "Backpropagation of Errors". Just as Frank Rosenblatt developed the ideas of Donald Hebb, in 1986 David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams took the idea of Paul Werbos and developed a practical Backpropagation algorithm, which led to a renaissance in the field of artificial neural network research.

Where that renaissance has led is to a new 'Perceptron', a multi-layered Perceptron. The multi-layered Perceptron of today is now synonymous with the term 'Perceptron', and has also become synonymous with the term 'Neural Network' itself.

The Feed-Forward Multi-Layered Perceptron


What makes the modern Feed-Forward Multi-Layered Perceptron so powerful is that it essentially teaches itself by using the Backpropagation learning algorithm. We will look into how the Multi-Layered Perceptron works, and the process by which it teaches itself using Backpropagation.

Neural Networks Using the Multi-Layered Perceptron

NASA: A Prediction of Plant Growth in Space


Obviously, anything done in space must be done as efficiently as possible. To optimize plant growth, NASA created this Perceptron, taught with actual data, to simulate different growth environments.

Mayo Clinic: A Tumor Classifier


The above perceptron is self-explanatory. A perceptron need not be complex to be useful.

An Early Commercial Use: The Original Palm Pilot


Although it may seem awkward now since most cell phones have a full keyboard, the early Palm Pilot used a stylus, or electronic pen, to enter characters freehand. It used a perceptron to learn to read a particular user's handwriting. An unforeseen popular use of the Palm Pilot was for anthropologists to use it to enter script from ancient languages that they transcribed from ancient stone and clay tablets.


Papnet: Assisted Screening of Pap Smears

Papnet is a commercial neural network-based computer program for assisted screening of Pap (cervical) smears. A Pap smear test examines cells taken from the uterine cervix for signs of precancerous and cancerous changes. A properly taken and analysed Pap smear can detect very early precancerous changes. These precancerous cells can then be eliminated, usually in a relatively simple office or outpatient procedure.

Type These Characters: The anti-NeuralNet Application


You may have seen the kind of prompt shown above when logging on to some web site. Its purpose is to disguise a sequence of letters so a Neural Net cannot read it. With this readable-only-by-a-human safeguard, web sites are protected against other computers entering these sites via exhaustive attempts at passwords.

The Individual Nodes of the Multi-Layered Perceptron

Since the modern Perceptron is a Neural Network in itself, to understand it we need to go back to its basic building block, the artificial neuron. As was stated, the original perceptron served as the artificial neuron. We will call what serves today as the artificial neuron the Threshold Logic Unit, or TLU.


The original Hebb neuron would sum all of its inputs. Each input in turn was the product of an external input times that external input's corresponding weight multiplier. The Threshold Logic Unit, or TLU, adds a significant feature to the original Hebb neuron, the Activation Function. The Activation Function takes as input the sum of what is now called the Input Function, which is essentially the Hebb neuron, and scales it to a value between 0 and 1.

The selection of the mathematical function that implements the Activation Function is a pivotal design decision. Not only does it control the mapping of the Input Function's sum to a value between 0 and 1, its selection directly affects the development of the Perceptron's ability to teach itself, as we will see shortly. A common Activation Function that we will use is the sigmoid function.

With the sigmoid Activation Function, each TLU will now output a value between 0 and 1. This 0-1 valued output, coupled with a weight multiplier with a value between -1 and +1, will keep values within the network to a manageable level. With the Activation Function, the TLU more closely mimics the operation of the neuron.
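To make the TLU concrete, here is a minimal sketch in Python of a single TLU with a sigmoid Activation Function. The function and variable names are illustrative, not taken from the original.

```python
import math

def sigmoid(x):
    # Sigmoid Activation Function: maps any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tlu_output(inputs, weights):
    # Input Function: sum of each external input times its corresponding
    # weight multiplier (this sum by itself is essentially the Hebb neuron)
    total = sum(x * w for x, w in zip(inputs, weights))
    # Activation Function: scale the sum to a value between 0 and 1
    return sigmoid(total)

# Example: a TLU with two inputs and weight multipliers between -1 and +1
print(tlu_output([0.5, 0.9], [0.4, -0.6]))   # prints a value between 0 and 1
```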

Teaching the Perceptron Using Backpropagation

Let's start with a very simple perceptron with 2 Inputs and 2 Outputs, as shown below. As our perceptron receives Input 1 and Input 2 it responds with the values of its outputs, Output 1 and Output 2.

The process of teaching this perceptron consists of giving it a series of input pairs, and then comparing the actual output values that were generated with the desired output values that correspond to each pair of inputs. Based upon the difference between the actual output values and the desired output values, adjustments are made.

The only things that are changed during training are the weight multipliers. Consequently, the process of teaching the perceptron is a matter of changing the weight multipliers until the actual outputs are as close as possible to the desired outputs. As we stated earlier, the perceptron uses the process of Backpropagation to change its weight multipliers, and thus teach itself.
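As a rough Python sketch of that training loop, the code below presents input pairs, compares the actual outputs to the desired outputs, and nudges the weights. To stay short it uses a single sigmoid unit per output with direct input-to-output weights (no hidden layer), and the adjustment step uses the gradient-based rule derived in the sections that follow; the training data and learning rate are made-up illustrative values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Made-up training data: (Input 1, Input 2) -> (desired Output 1, desired Output 2)
training_pairs = [((0.0, 1.0), (1.0, 0.0)),
                  ((1.0, 0.0), (0.0, 1.0))]

# weights[i][j] is the weight multiplier from Input i+1 to Output j+1
weights = [[0.1, 0.2],
           [0.3, 0.1]]
eta = 0.5   # learning rate: keeps each individual adjustment small

for epoch in range(2000):                  # repeat the same learning sequence many times
    for inputs, desired in training_pairs:
        for j in range(2):                 # each output's weights adjusted independently
            # actual output: weighted sum of the inputs passed through the sigmoid
            y = sigmoid(sum(inputs[i] * weights[i][j] for i in range(2)))
            error = desired[j] - y         # difference between desired and actual
            for i in range(2):
                # adjustment rule derived later: eta * error * y * (1 - y) * input
                weights[i][j] += eta * error * y * (1.0 - y) * inputs[i]

print(weights)   # the trained weight multipliers
```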

Mathematics of Learning via Minimizing Error

Locating Output 1 in our simple perceptron, we can see that it has 2 inputs. Those inputs to Output 1 are the outputs from each of the TLUs in the Hidden Layer. The Hidden Layer outputs are each multiplied by their corresponding weight multipliers, wo11 and wo21. The weights are identified by their source and destination TLUs. For example, the ID 'wo21' stands for the weight to an output, from Hidden node 2 to Output node 1.

Looking again at the weight multipliers of the 2 inputs to Output 1, wo11 and wo21, we can make a 3 dimensional graph where the x-axis corresponds to the value of the wo11 weight multiplier and the y-axis corresponds to the value of the wo21 weight multiplier. The meaning of the height, or z-axis, will follow.


Although weight multipliers can be negative, we will consider only the possible positive values for these weights, which are between 0 and 1. For a given value of the 2 coordinates (wo11, wo21), there is an associated amount of difference between the desired value of Output 1 and the actual value. This difference we will call the delta, and it is the value of the height in the z-axis.
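In symbols (a reconstruction, since the original's graphic is not reproduced here), the height plotted on the z-axis could be written as:

$$ z \;=\; \Delta(w_{o11}, w_{o21}) \;=\; \bigl|\, d_1 - y_1(w_{o11}, w_{o21}) \,\bigr| $$

where $d_1$ is the desired value of Output 1 and $y_1$ is the actual value produced with those two weights; squaring the difference instead of taking its absolute value is an equally common convention and gives the same bowl shape.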

At the ideal values of wo11 and wo21, the delta is zero, so the height is zero. The farther any pair of wo11 and wo21 values is from the ideal values, the greater the height of the delta (the size of the error). The result is that this 3D graph forms a bowl or funnel shape, with the ideal values of wo11 and wo21 at the bottom.

So any time we find ourselves at some point (wo11, wo21) on the graph that is not the ideal point, we will want to slide downhill toward the bottom of our virtual bowl. Differential Calculus gives us this ability with the Gradient. The mathematical function for the Gradient is:
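The formula in the original is an image and is not reproduced here; for a function of the two weights it is the standard two-variable gradient, along the lines of:

$$ \nabla \Delta(w_{o11}, w_{o21}) \;=\; \left( \frac{\partial \Delta}{\partial w_{o11}}, \; \frac{\partial \Delta}{\partial w_{o21}} \right) $$

that is, the vector of partial derivatives, which points in the direction of steepest increase; stepping against it moves us downhill toward the bottom of the bowl.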

Yikes! Fortunately, we don't have to worry about the particulars. We just need to know that mathematically, once we have identified a non-ideal pair of values for wo11 and wo21 in our virtual bowl, we have a means of determining the direction to go to get closer to the ideal values.

Now we can get a feel for the theoretical "Backpropagation of Errors" process that Paul Werbos described in 1974. We will now go into the steps of the actual process that was finally implemented by a team in 1986. Obviously, it was not a trivial task.

In short, the team of 1986 constructed a generalization of the complicated mathematics for a generic perceptron, and then simplified that mathematical process down into simple parts. This mathematical simplification was essentially doing on a very large scale what we do on a small scale when we simplify a fraction to lowest terms.

To imagine the level of complexity of the original model, consider that in our simple perceptron, Output 1 has only 2 inputs. These 2 inputs form a 3 dimensional graph. To model up to n inputs, mathematicians had to imagine a virtual 'bowl' in n+1 dimensional hyperspace. Then they had to describe an n+1 dimensional gradient.

After all this complexity, the first major simplification was to eliminate the n+1 dimensional hyperspace. Differential Calculus is still involved, but only in 2 dimensions with 1 independent variable. So the adjustment to each incoming weight multiplier could be considered independently.

Taking another look at our perceptron:


We can now mathematically adjust our weights to Output 1 with the following equations:
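The equations in the original are images and are not reproduced here; for a sigmoid output unit with a squared-error measure, the standard gradient-descent adjustments they correspond to look roughly like:

$$ \Delta w_{o11} \;=\; \eta \,(d_1 - y_1)\, f'(\mathrm{net}_1)\, h_1, \qquad \Delta w_{o21} \;=\; \eta \,(d_1 - y_1)\, f'(\mathrm{net}_1)\, h_2 $$

where $d_1$ is the desired value of Output 1, $y_1$ its actual value, $f'$ the derivative of the Activation Function, $\mathrm{net}_1$ the Input Function's sum at Output node 1, and $h_1$, $h_2$ the outputs of Hidden nodes 1 and 2.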


The Learning Rate constant η is a fractional multiplier used to limit the size of any particular adjustment. With computers, the perceptron can go through the same learning sequence over and over again, so each adjustment can be small. If adjustments are too large, they might overcompensate and the weights would just oscillate back and forth.

Whew! Things are now much better than they were with partial derivatives in hyperspace, but there is still the matter of the derivative of the sigmoid function, which is our TLU's Activation Function. As stated earlier, using the sigmoid function was a pivotal design decision. By looking at its derivative, we can see why:

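The figure is not reproduced here, but the identity it shows is the well-known sigmoid derivative:

$$ f(x) \;=\; \frac{1}{1 + e^{-x}}, \qquad f'(x) \;=\; f(x)\,\bigl(1 - f(x)\bigr) \;=\; y\,(1 - y) $$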

Consequently, the derivative of the sigmoid function, or Activation Function, becomes a simple algebraic expression of the value of the function itself. Since the value of the Activation Function, which we call y, has to be computed anyway to determine the output value of any TLU, the derivative term becomes trivial.

With this simple derivative term, once we compute y, the adjustment to wo21 becomes the simple algebraic expression:
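That expression is also an image in the original; substituting $y(1-y)$ for the derivative term in the earlier update, it is presumably of the form:

$$ \Delta w_{o21} \;=\; \eta \,(d_1 - y_1)\, y_1 (1 - y_1)\, h_2 $$

with $h_2$ the output of Hidden node 2 feeding Output node 1.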


ONTO PAGE 2:

THE PROCESS OF BACKPROPAGATION

Copyleft © 2010 - Feel free to use for educational purposes by Cary Rhode, Math instructor at Lower Columbia College, Longview, WA, USA, and all around Great Guy. [email protected]