Neural networks for data mining
Eric Postma
MICC-IKAT
Universiteit Maastricht
Overview
Introduction: The biology of neural networks
• the biological computer
• brain-inspired models
• basic notions
Interactive neural-network demonstrations
• Perceptron
• Multilayer perceptron
• Kohonen’s self-organising feature map
• Examples of applications
A typical AI agent
Two types of learning
• Supervised learning: curve fitting, surface fitting, ...
• Unsupervised learning: clustering, visualisation, ...
An input-output function
Fitting a surface to four points
Regression
Classification
The history of neural networks
• A powerful metaphor
• Several decades of theoretical analyses led to the formalisation in terms of statistics
• Bayesian framework
• We discuss neural networks from the original metaphorical perspective
(Artificial) neural networks
The digital computer versus the neural computer
The Von Neumann architecture
The biological architecture
Digital versus biological computers
5 distinguishing properties:
• speed
• robustness
• flexibility
• adaptivity
• context-sensitivity
Speed: The “hundred time steps” argument
The critical resource that is most obvious is time. Neurons whose basic computational speed is a few milliseconds must be made to account for complex behaviors which are carried out in a few hundred milliseconds (Posner, 1978). This means that entire complex behaviors are carried out in less than a hundred time steps.
Feldman and Ballard (1982)
Graceful Degradation
[Figure: performance as a function of damage]
Flexibility: the Necker cube
vision = constraint satisfaction
And sometimes plain search…
Adaptivity
processing implies learning in biological computers
versus
processing does not imply learning in digital computers
Context-sensitivity: patterns
emergent properties
Robustness and context-sensitivity: coping with noise
The neural computer
• Is it possible to develop a model after the natural example?
• Brain-inspired models: models based on a restricted set of structural and functional properties of the (human) brain
The Neural Computer (structure)
Neurons, the building blocks of the brain
Synapses, the basis of learning and memory
Learning: Hebb’s rule
[Diagram: neuron 1 → synapse → neuron 2]
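Hebb’s rule states that a synapse is strengthened when the neurons on both sides of it are active at the same time. A minimal sketch in Python; the learning rate eta and the multiplicative update form are illustrative assumptions, since the slide gives no formula:

    # Hebbian update: strengthen the synapse when the pre- and
    # postsynaptic neurons fire together.
    def hebb_update(w, a_pre, a_post, eta=0.1):
        # w: current synaptic weight
        # a_pre, a_post: activities of neuron 1 and neuron 2
        return w + eta * a_pre * a_post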
Forgetting in neural networks
Towards neural networks
Connectivity
An example: the visual system is a feedforward hierarchy of neural modules. Every module is (to a certain extent) responsible for a certain function.
(Artificial) Neural Networks
• Neurons: activity; nonlinear input-output function
• Connections: weight
• Learning: supervised or unsupervised
Artificial Neurons
• input (vectors)
• summation (excitation): e = Σ_i w_i x_i
• output (activation): a = f(e)
Input-output function
• nonlinear (sigmoid) function:

    f(x) = 1 / (1 + e^(−x/a))

[Figure: f(e) rising from 0 to 1 as a function of the excitation e; the parameter a sets the steepness of the transition]
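A minimal sketch of such a neuron in Python, assuming numpy; the function names are illustrative, the formulas follow the slides above:

    import numpy as np

    def sigmoid(x, a=1.0):
        # Input-output function f(x) = 1 / (1 + e^(-x/a));
        # a controls the steepness of the sigmoid.
        return 1.0 / (1.0 + np.exp(-x / a))

    def neuron(x, w):
        # Excitation: weighted sum of the inputs.
        e = np.dot(w, x)
        # Activation: nonlinear function of the excitation.
        return sigmoid(e)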
Artificial Connections (Synapses)
• w_AB: the weight of the connection from neuron A to neuron B
[Diagram: neuron A → w_AB → neuron B]
The Perceptron
Learning in the Perceptron
• Delta learning rule
  • based on the difference between the desired output t and the actual output o, given input x
• Global error E
  • is a function of the differences between the desired and actual outputs
Gradient Descent
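The delta rule is a gradient-descent step: each weight moves a small amount against the gradient of E, giving Δw_i = η (t − o) x_i for a linear output unit. A minimal perceptron training sketch, assuming numpy; the linear output and the learning-rate value are illustrative choices:

    import numpy as np

    def train_perceptron(X, T, eta=0.1, epochs=100):
        # X: (n_patterns, n_inputs) input vectors
        # T: desired outputs t, one per pattern
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for x, t in zip(X, T):
                o = np.dot(w, x)           # actual output o
                w += eta * (t - o) * x     # delta rule: eta * (t - o) * x
        return w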
Linear decision boundaries
Minsky and Papert’s connectedness argument
The history of the Perceptron
• Rosenblatt (1959)
• Minsky & Papert (1969)
• Rumelhart & McClelland (1986)
The multilayer perceptron
[Diagram: input layer → one or more hidden layers → output layer]
Training the MLP
• supervised learning
• each training pattern: input + desired output
• in each epoch: present all patterns
• at each presentation: adapt weights
• after many epochs: convergence to a local minimum
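A minimal sketch of this training regime for a one-hidden-layer MLP, trained with backpropagation on XOR. The architecture, learning rate, and epoch count are illustrative assumptions; numpy is assumed:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # XOR: each training pattern is an input plus its desired output.
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    T = np.array([[0.], [1.], [1.], [0.]])
    Xb = np.hstack([X, np.ones((4, 1))])       # append a constant bias input

    rng = np.random.default_rng(0)
    W1 = rng.normal(0.0, 1.0, (3, 4))          # input (+bias) -> hidden
    W2 = rng.normal(0.0, 1.0, (5, 1))          # hidden (+bias) -> output
    eta = 0.5

    for epoch in range(5000):                  # each epoch presents all patterns
        H = sigmoid(Xb @ W1)                   # forward: hidden activations
        Hb = np.hstack([H, np.ones((4, 1))])
        O = sigmoid(Hb @ W2)                   # forward: actual outputs
        d_o = (T - O) * O * (1 - O)            # output error signal
        d_h = (d_o @ W2[:-1].T) * H * (1 - H)  # backpropagated to hidden layer
        W2 += eta * Hb.T @ d_o                 # adapt hidden-output weights
        W1 += eta * Xb.T @ d_h                 # adapt input-hidden weights

For brevity this adapts the weights once per epoch (batch updates); the slides describe adapting at each presentation, which would apply the same updates pattern by pattern.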
Phoneme recognition with an MLP
input: frequencies
output: pronunciation
Non-linear decision boundaries
Compression with an MLP: the autoencoder
hidden representation
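An autoencoder is an ordinary MLP trained to reproduce its own input: the desired output equals the input, and a hidden layer narrower than the input forces a compressed hidden representation. A minimal sketch reusing the names from the MLP sketch above; the layer sizes are illustrative assumptions:

    # Autoencoder = MLP whose target is its own input (T = X),
    # with a hidden layer narrower than the input, e.g. 8 -> 3 -> 8:
    n_in, n_hidden = 8, 3
    W1 = rng.normal(0.0, 1.0, (n_in + 1, n_hidden))
    W2 = rng.normal(0.0, 1.0, (n_hidden + 1, n_in))
    # ...train exactly as above with T = X; after training, the
    # hidden activations H are the compressed representation.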
Restricted Boltzmann machines (RBMs)
Learning in the MLP
Preventing Overfitting
GENERALISATION = performance on the test set
• Early stopping
• Training, test, and validation sets
• k-fold cross-validation
  • leave-one-out procedure
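A minimal sketch of a k-fold split, assuming numpy; only the index bookkeeping is shown, model training is omitted:

    import numpy as np

    def k_fold_indices(n, k, seed=0):
        # Shuffle the pattern indices and split them into k folds;
        # each fold serves once as the test set, the rest for training.
        idx = np.random.default_rng(seed).permutation(n)
        folds = np.array_split(idx, k)
        for i in range(k):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            yield train, test

    # The leave-one-out procedure is the special case k = n.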
Image Recognition with the MLP
Hidden Representations
Other Applications
• Practical: OCR, financial time series, fraud detection, process control, marketing, speech recognition
• Theoretical: cognitive modeling, biological modeling
Some mathematics…
Perceptron
Derivation of the delta learning rule
Target output: t
Actual output: o = f(h), with net input h = Σ_i w_i x_i
Error: E = ½ (t − o)²
Delta rule: Δw_i = −η ∂E/∂w_i = η (t − o) f′(h) x_i
Sigmoid function
• may also be the tanh function (range ⟨−1, +1⟩ instead of ⟨0, 1⟩)
• derivative: f′(x) = f(x) [1 − f(x)]
Derivation of the generalized delta rule
Error function (LMS)
Adaptation of the hidden-output weights
Adaptation of the input-hidden weights
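The formulas on these slides did not survive the extraction; the standard result, consistent with the LMS error and the sigmoid derivative given above, is:

    E = ½ Σ_k (t_k − o_k)²                          (LMS error)
    δ_k = (t_k − o_k) f′(h_k)                       (output units)
    δ_j = f′(h_j) Σ_k w_jk δ_k                      (hidden units)
    Δw_jk = η δ_k a_j,    Δw_ij = η δ_j a_i         (weight adaptations)

The hidden-unit error signal δ_j is obtained by propagating the output error signals backwards through the weights, which is what the forward and backward propagation on the next slide refers to.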
Forward and Backward Propagation
Decision boundaries of Perceptrons
Straight lines (surfaces), linearly separable
Decision boundaries of MLPs
Convex areas (open or closed)
Decision boundaries of MLPs
Combinations of convex areas
Learning and representing similarity
Alternative conception of neurons
• Neurons do not take the weighted sum of their inputs (as in the perceptron), but measure the similarity of the weight vector to the input vector
• The activation of the neuron is a measure of similarity: the more similar the weight is to the input, the higher the activation
• Neurons represent “prototypes”
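A minimal sketch of such a prototype neuron; the Euclidean distance and the Gaussian similarity function are illustrative assumptions, since any decreasing function of the distance would do:

    import numpy as np

    def prototype_activation(w, x, width=1.0):
        # Activation grows as the input x gets closer to the weight
        # vector w, which acts as the neuron's "prototype".
        d = np.linalg.norm(w - x)
        return np.exp(-(d ** 2) / (2 * width ** 2))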
Coarse Coding
2nd-order isomorphism
Prototypes for preprocessing
Kohonen’s SOFM (Self-Organizing Feature Map)
• Unsupervised learning
• Competitive learning
[Diagram: n-dimensional input connected to an output map; the winner is highlighted]
Competitive learning
• Determine the winner (the neuron whose weight vector has the smallest distance to the input vector)
• Move the weight vector w of the winning neuron towards the input i
[Diagram: before learning, the weight vector w lies far from the input i; after learning, w has moved towards i]
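A minimal sketch of one competitive-learning step, assuming numpy; the learning rate is an illustrative choice:

    import numpy as np

    def competitive_step(W, x, eta=0.1):
        # W: one weight vector per row (one row per neuron); x: input vector.
        # Winner: the neuron whose weight vector is closest to the input.
        winner = np.argmin(np.linalg.norm(W - x, axis=1))
        # Move the winner's weight vector towards the input.
        W[winner] += eta * (x - W[winner])
        return winner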
Kohonen’s idea
• Impose a topological order onto the competitive neurons (e.g., a rectangular map)
• Let the neighbours of the winner share the “prize” (the “postcode lottery” principle)
• After learning, neurons with similar weights tend to cluster on the map
Biological inspiration
Topological order: neighbourhoods
• Square: winner (red) and its nearest neighbours
• Hexagonal: winner (red) and its nearest neighbours
A simple example
• A topological map of 2 × 3 neurons and two inputs
[Diagram: 2D input, the weights, and their visualisation]
Weights before training
Input patterns (note the 2D distribution)
Weights after training
Another example
• Input: uniformly randomly distributed points
• Output: map of 20 × 20 neurons
• Training: starting with a large learning rate and neighbourhood size, both are gradually decreased to facilitate convergence
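A minimal SOFM training sketch along these lines, assuming numpy. The 20 × 20 map matches the slide; the step count, decay schedules, and Gaussian neighbourhood function are illustrative choices:

    import numpy as np

    rng = np.random.default_rng(0)
    side = 20
    W = rng.random((side, side, 2))            # 20 x 20 map of 2-D weights
    gy, gx = np.mgrid[0:side, 0:side]          # grid coordinates of the neurons

    n_steps = 10000
    for t in range(n_steps):
        x = rng.random(2)                      # uniformly distributed input point
        # Gradually decrease the learning rate and the neighbourhood size.
        eta = 0.5 * (0.01 / 0.5) ** (t / n_steps)
        sigma = (side / 2) * (1.0 / (side / 2)) ** (t / n_steps)
        # Winner: the neuron whose weight vector is closest to the input.
        d = np.linalg.norm(W - x, axis=2)
        wy, wx = np.unravel_index(np.argmin(d), d.shape)
        # Neighbours share the prize: Gaussian falloff on the map grid.
        g = np.exp(-((gy - wy) ** 2 + (gx - wx) ** 2) / (2 * sigma ** 2))
        W += eta * g[..., None] * (x - W)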
Weights visualisation
Dimension reduction
3D input → 2D output
Adaptive resolution
2D input → 2D output
Output map representation
Application of SOFM
Examples (input); SOFM after training (output)
Visual features (biologically plausible)
Face classification
Colour classification
Car classification
Relation with statistical methods 1
• Principal Components Analysis (PCA)
[Figure: data points with principal axes pca1 and pca2, and the projections of the data onto them]
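For comparison with the SOFM, a minimal PCA sketch, assuming numpy; nothing here is specific to the slides:

    import numpy as np

    def pca(X, n_components=2):
        # Centre the data, then take the eigenvectors of the covariance
        # matrix with the largest eigenvalues as the principal axes.
        Xc = X - X.mean(axis=0)
        cov = np.cov(Xc, rowvar=False)
        vals, vecs = np.linalg.eigh(cov)
        order = np.argsort(vals)[::-1][:n_components]
        components = vecs[:, order]            # pca1, pca2, ...
        return Xc @ components                 # projections of the data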
Relation with statistical methods 2
• Multi-Dimensional Scaling (MDS)
• Sammon Mapping
Distances in high-dimensional space
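Like the SOFM, these methods place points in a low-dimensional map so that the distances there reflect the distances in the high-dimensional space. The standard Sammon stress (not given on the slides) is

    E = (1 / Σ_{i<j} d_ij) Σ_{i<j} (d_ij − d̂_ij)² / d_ij

where d_ij are the distances in the high-dimensional space and d̂_ij the corresponding distances in the map; dividing by d_ij emphasises the preservation of small distances.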