Page 1:

Extreme Learning Machines

Tony Oakden

ANU AI Masters Project (early Presentation)

4/8/2014

Page 2:

This presentation covers:

• Revision of Neural Network theory

• Introduction to Extreme Learning Machines (ELM)

• Early Results

• Brief description of code

• Discussion of possible future work

Page 3:

Neural Network Revision

• In a single-layer perceptron, inputs are connected directly to the output nodes via weights

• Training is carried out using least squares or a similar method (a minimal sketch follows this list)

• Pros:
  • Simple and quick to train

• Cons:
  • Can only learn to classify linearly separable problems
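A minimal Matlab sketch of that least-squares step (the variable names and toy data here are illustrative, not from the project code):

% Least-squares fit of single-layer perceptron weights on a toy
% linearly separable problem (AND); one sample per column
P  = [0 0 1 1; 0 1 0 1];      % inputs
Pb = [P; ones(1, 4)];         % append a constant bias input
T  = [-1 -1 -1 1];            % targets
W  = T * pinv(Pb);            % least squares: W*Pb approximates T
Y  = sign(W * Pb);            % thresholded outputs, here equal to T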

Page 4:

Hidden Layer

• To classify non-linear data we must add an additional layer of weights between input and output (a hidden layer)

• When combined with a suitable activation function (sigmoidal, for example) the network can classify non-linear functions

• To train the hidden layer we propagate errors at the output back through the network. This is the back-propagation algorithm (a one-step sketch follows this list)

• Pros:
  • Can theoretically classify any data set

• Cons:
  • Training the network can be very slow
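A one-step sketch of that update rule (a minimal illustration with sigmoid hidden units, a linear output, and squared error; the sizes, data, and learning rate eta are arbitrary, and bias terms are omitted for brevity):

% One back-propagation step for a single hidden layer
P  = rand(2, 10);                        % 10 two-dimensional samples
T  = rand(1, 10);                        % targets
W1 = rand(5, 2) * 2 - 1;                 % hidden-layer weights (5 nodes)
W2 = rand(1, 5) * 2 - 1;                 % output weights
eta = 0.1;                               % learning rate
Z  = 1 ./ (1 + exp(-(W1 * P)));          % hidden activations (sigmoid)
E  = W2 * Z - T;                         % output error (linear output)
dW2 = E * Z';                            % gradient for the output weights
dW1 = ((W2' * E) .* Z .* (1 - Z)) * P';  % error propagated back to W1
W2 = W2 - eta * dW2;                     % gradient-descent updates;
W1 = W1 - eta * dW1;                     % repeat until the error is small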

Page 5:

Extreme Learning Machines

• Provide a way to train networks to classify non-linear problems without back propagation

• These networks still use a hidden layer, but the weights and biases in the hidden layer are set to random values

• We only train the output nodes.

• Training is achieved using the least-squares algorithm

• Pros:
  • Very fast training time

• Cons:
  • Less accurate

http://www.ntu.edu.sg/home/egbhuang/
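In matrix form (matching the Matlab code later in this deck), with hidden-layer output matrix H and training targets T, the output weights β are the least-squares solution β = H†T, where H† is the Moore-Penrose pseudoinverse of H.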

Page 6:

Wait, we use random weights? Huh?

• Sounds too good to be true, so let's look at some results:

• http://fastml.com/extreme-learning-machines/

Page 7:

Two Spirals Data Set

• The first set of experiments was carried out with the twin-spiral data set (a possible generator is sketched below).

• This was used because:
  • It is a difficult set to classify

  • It allows easy visualization of results
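One common way to generate such a set (a sketch using the classic two-spirals parametric form; the data used in these experiments may have been generated differently):

% Twin-spiral data in the classic parametric form
n = 97;                                % points per spiral
i = (0:n-1)';
r = 6.5 * (104 - i) / 104;             % radius shrinks along the spiral
a = pi * i / 16;                       % angle grows along the spiral
spiral1 = [r .* sin(a), r .* cos(a)];  % first spiral
spiral2 = -spiral1;                    % second spiral, rotated 180 degrees
P = [spiral1; spiral2]';               % inputs, one sample per column
T = [ones(1, n), -ones(1, n)];         % class labels +1 / -1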

Page 8:

Neural Network trained with back propagation

• 20 nodes in hidden layer

• Training time is 6.4 seconds

• Training accuracy is 100%

(Testing was performed with training data)

Page 9:

Extreme Learning Machine

• 20 nodes in hidden layer

• Training time is 0.02 seconds

• Training accuracy is 69%

Not great…

But…

Page 10:

Extreme Learning continued

• 200 nodes in hidden layer

• Training time is 0.066 seconds

• Training accuracy is 97%

If the number of nodes in the hidden layer is significantly increased, accuracy improves dramatically, yet training still remains much faster than for a traditional network.

Page 11:

Accuracy plotted against hidden-layer size / 20

[Chart: classification accuracy (y-axis, 0 to 1) against hidden-layer size in multiples of 20 nodes (x-axis, 1 to 20)]

Page 12:

Matlab Code

• http://www.ntu.edu.sg/home/egbhuang/reference.html

% create random weights for hidden layer

InputWeight=rand(NumberofHiddenNeurons,NumberofInputNeurons)*2-1;

BiasofHiddenNeurons=rand(NumberofHiddenNeurons,1);

….

tempH=InputWeight*trainData.P;

ind=ones(1,NumberofTrainingData);

BiasMatrix=BiasofHiddenNeurons(:,ind); % Extend the bias matrix BiasofHiddenNeurons to match the dimension of H

tempH=tempH+BiasMatrix;

% Calculate hidden neuron output matrix H

% we can use a variety of activation functions here but we’ll stick to sigmoidal for now…

H = 1 ./ (1 + exp(-tempH));

OutputWeight=pinv(H') * trainData.T'; % pinv gives the Moore-Penrose pseudoinverse

http://www.mathworks.com.au/help/matlab/ref/pinv.html
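A sketch of how the trained network could then be evaluated (assuming a testData.P that holds test inputs in the same column-per-sample layout as trainData.P):

% Evaluate the trained ELM on new inputs
tempH = InputWeight * testData.P;
tempH = tempH + BiasofHiddenNeurons(:, ones(1, size(testData.P, 2)));
H = 1 ./ (1 + exp(-tempH));            % same sigmoid used in training
Y = (H' * OutputWeight)';              % network outputs for the test data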

Page 13:

Conclusion

• As can be seen, training times for ELM are very fast.

• In these early experiments ELM was roughly 100 times faster than traditional back propagation for similar accuracy

• Accuracy is slightly lower: on other data sets back propagation achieved 85% where ELM achieved 80%, but for many applications this is still good enough

• Increasing the number of nodes in the hidden layer improves accuracy at the expense of a small increase in training time

Page 14:

Further research

• Use of ELM with a GA for feature selection (this week's work)

• Experiment with different data sets

• Perform more rigorous analysis of results

• So far we have only looked at binary classifiers. How does the ELM algorithm cope with multi-class classification?

• Can we improve the accuracy of ELM in some way, maybe by combining results with cascade networks?

• What about continuous data sources?

• The second part of the project is cascade networks; can these be combined with ELM in some way?

Page 15:

References

Guang-Bin Huang. "An Insight into Extreme Learning Machines: Random Neurons, Random Features and Kernels." Springer Science+Business Media New York, 2014.