Presentation at IWANN 2003
A New Learning Method for Single Layer Neural Networks Based on a
Regularized Cost Function
Juan A. Suárez-Romero, Óscar Fontenla-Romero
Bertha Guijarro-Berdiñas, Amparo Alonso-Betanzos
Laboratory for Research and Development in Artificial Intelligence
Department of Computer Science, University of A Coruña, Spain
Outline
• Introduction
• Supervised learning + regularization
• Alternative loss function
• Experimental results
• Conclusions and Future Work
Single layer neural network
[Diagram: a single layer network; the inputs x_1^s, ..., x_I^s and a constant input 1 (bias) feed J summing units producing y_1^s, ..., y_J^s, each followed by a nonlinear function f_j that yields the outputs z_1^s, ..., z_J^s]
• I inputs
• J outputs
• S samples
Single layer neural network

[Diagram: detail of neuron j, with weights w_j1, ..., w_jI and bias b_j; it computes y_j^s = w_j1·x_1^s + ... + w_jI·x_I^s + b_j and z_j^s = f_j(y_j^s)]
Cost function
• Supervised learning + regularization
$$C_j = (1-\alpha)\underbrace{\sum_{s=1}^{S}\left(d_j^s - z_j^s\right)^2}_{\text{MSE}} + \alpha\underbrace{\sum_{i=1}^{I} w_{ji}^2}_{\text{Regularization term (weight decay)}}$$

• Non-linear neural functions
⇓
Not guaranteed to have a unique minimum (local minima)
Alternative loss function
• Theorem: Let x_i^s be the i-th input of a one-layer neural network, d_j^s and z_j^s the j-th desired and actual outputs, y_j^s = Σ_i w_ji·x_i^s + b_j the input to the nonlinearity, w_ji and b_j the weights and bias, and f, f^-1, f' the nonlinear function, its inverse and its derivative. Then minimizing L_j = Σ_s (d_j^s − z_j^s)² is equivalent to minimizing, up to the first order of the Taylor series expansion, the following alternative loss function:

$$\bar{L}_j = \sum_{s=1}^{S}\left[f'\left(\bar{d}_j^s\right)\right]^2\left(\bar{d}_j^s - y_j^s\right)^2$$

where:

$$\bar{d}_j^s = f^{-1}\left(d_j^s\right)$$
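The equivalence comes from the first-order Taylor expansion of f around d̄_j^s (a sketch of the step, higher-order terms dropped):

$$f\left(y_j^s\right) \approx f\left(\bar{d}_j^s\right) + f'\left(\bar{d}_j^s\right)\left(y_j^s - \bar{d}_j^s\right)$$

so the output-space error can be rewritten as

$$d_j^s - z_j^s = f\left(\bar{d}_j^s\right) - f\left(y_j^s\right) \approx f'\left(\bar{d}_j^s\right)\left(\bar{d}_j^s - y_j^s\right)$$

and squaring and summing over the samples yields L̄_j.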
Alternative loss function

[Diagram: neuron j with the desired output mapped back through the nonlinearity, d̄_j^s = f^-1(d_j^s), so that the error d̄_j^s − y_j^s is measured before f_j instead of after it]
Alternative cost function
• Supervised learning + regularization
$$\bar{C}_j = (1-\alpha)\underbrace{\sum_{s=1}^{S}\left[f'\left(\bar{d}_j^s\right)\right]^2\left(\bar{d}_j^s - y_j^s\right)^2}_{\text{Alternative MSE}} + \alpha\underbrace{\sum_{i=1}^{I} w_{ji}^2}_{\text{Regularization term (weight decay)}}$$
Alternative cost function
• The optimal weights and bias can be obtained by differentiating the alternative cost function with respect to the weights and the bias of the network and equating the partial derivatives to zero
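Writing these conditions out explicitly gives the following (a sketch; whether the bias is also penalized by the weight-decay term is not stated here, so it is left unregularized below):

$$\frac{\partial \bar{C}_j}{\partial w_{jp}} = -2(1-\alpha)\sum_{s=1}^{S}\left[f'\left(\bar{d}_j^s\right)\right]^2\left(\bar{d}_j^s - y_j^s\right)x_p^s + 2\alpha\, w_{jp} = 0,\qquad p = 1,\ldots,I$$

$$\frac{\partial \bar{C}_j}{\partial b_j} = -2(1-\alpha)\sum_{s=1}^{S}\left[f'\left(\bar{d}_j^s\right)\right]^2\left(\bar{d}_j^s - y_j^s\right) = 0$$

Since y_j^s = Σ_i w_ji·x_i^s + b_j is linear in the unknowns, these are I+1 linear equations in w_j1, ..., w_jI, b_j.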
Alternative cost function
• We can rewrite the previous system as a system of I+1 linear equations in I+1 unknowns, i.e. an (I+1)×(I+1) system A·ŵ_j = b, where A is the coefficient matrix, ŵ_j the vector of variables (the weights and the bias), and b the vector of independent terms
• Advantages (a sketch of the method follows below):
– Solved using a system of linear equations ⇒ fast training with low computational cost
– Convex function ⇒ unique minimum
– Incremental + parallel learning ⇒ only the coefficient matrix and the independent-terms vector must be stored
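A minimal NumPy sketch of this training scheme (my own illustration, not the authors' code; it builds the coefficient matrix A and independent-terms vector b from the normal equations above, treats the bias as an extra input fixed to 1 and, as an assumption, leaves it unregularized):

```python
import numpy as np

def train_single_layer(X, D, finv, fprime, alpha=0.0):
    """Fit one output unit of a single layer network by solving the
    (I+1)x(I+1) linear system derived from the regularized
    alternative cost function.

    X : (S, I) inputs; D : (S,) desired outputs (inside the range of f);
    finv, fprime : inverse and derivative of the nonlinearity f;
    alpha : regularization parameter in [0, 1].
    """
    S, I = X.shape
    Xb = np.hstack([X, np.ones((S, 1))])   # append the bias input (fixed to 1)
    dbar = finv(D)                         # transformed desired outputs: dbar = f^-1(d)
    fp2 = fprime(dbar) ** 2                # per-sample weights [f'(dbar)]^2

    # Coefficient matrix and independent-terms vector of the normal
    # equations. Both are sums over the samples, which is what makes the
    # method incremental: only A and b need to be stored.
    A = (1 - alpha) * (Xb * fp2[:, None]).T @ Xb
    b = (1 - alpha) * Xb.T @ (fp2 * dbar)

    # Weight decay on the weights only (assumption: bias unpenalized).
    reg = alpha * np.eye(I + 1)
    reg[I, I] = 0.0
    A += reg

    wb = np.linalg.solve(A, b)             # unique minimum of the convex cost
    return wb[:I], wb[I]                   # weights, bias
```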
Experimental results
• Two kinds of problems:
– Intrusion Detection: a classification problem
– Box-Jenkins time series: a regression problem
• Logistic neural function: f(x) = 1 / (1 + e^(-x))
• Regularization parameter α ∈ [0,1]
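As a usage illustration of the earlier sketch with this logistic function (synthetic data, not the experimental data; `train_single_layer` is the hypothetical helper defined above):

```python
import numpy as np

# Logistic function f(x) = 1/(1 + e^{-x}), its inverse (the logit) and its derivative.
def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(d):
    return np.log(d / (1.0 - d))

def dlogistic(x):
    s = logistic(x)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                # 200 samples, 5 inputs
D = logistic(X @ rng.normal(size=5) + 0.3)   # synthetic targets in (0, 1)

w, b = train_single_layer(X, D, finv=logit, fprime=dlogistic, alpha=5e-3)
Z = logistic(X @ w + b)                      # network outputs
print("MSE:", np.mean((D - Z) ** 2))
```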
Intrusion Detection problem
• KDD’99 Classifier Learning Contest
• Two-class classification problem: attack and normal connections
• Each sample is formed by 41 high-level features
• 30000 samples for training
• 4996 samples for testing
Intrusion Detection problem
• To study the influence of the training set size and the regularization parameter (see the sketch below):
– Initial training set of 100 samples
– Each subsequent training set is obtained by adding 100 new samples to the previous set, up to 2500 samples
– For each training set, several neural networks were trained, with α from 0 (no regularization) to 1 in steps of 5×10⁻³
• To obtain a better estimate of the true error:
– The process is repeated 12 times with different training sets
• The α with the minimum test classification error is chosen
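A schematic sketch of this model-selection loop (my own illustration; `train_single_layer`, `logit` and `dlogistic` come from the earlier sketches, and `load_kdd99_split` is a hypothetical data loader):

```python
import numpy as np

alphas = np.arange(0.0, 1.0 + 5e-3, 5e-3)   # α from 0 (no regularization) to 1, step 5e-3
sizes = list(range(100, 2501, 100))         # training sets of 100, 200, ..., 2500 samples
errors = np.zeros((len(sizes), len(alphas)))

for r in range(12):                         # 12 repetitions with different training sets
    # Hypothetical loader; targets assumed encoded inside (0, 1),
    # e.g. 0.05/0.95, so that the logit is defined.
    X_pool, D_pool, X_test, D_test = load_kdd99_split(seed=r)
    for i, n in enumerate(sizes):
        X_tr, D_tr = X_pool[:n], D_pool[:n]   # previous set plus 100 new samples
        for k, a in enumerate(alphas):
            w, b = train_single_layer(X_tr, D_tr, finv=logit, fprime=dlogistic, alpha=a)
            pred = (X_test @ w + b) > 0.0     # logistic output > 0.5  <=>  y > 0
            errors[i, k] += np.mean(pred != (D_test > 0.5)) / 12.0

best_alpha = alphas[errors.mean(axis=0).argmin()]   # α with minimum test error
```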
Intrusion Detection problem

[Results figure]
Box-Jenkins problem
• Regression problem
• Estimate the CO2 concentration in a gas furnace from the methane flow rate
• Predict y(t) from {y(t-1), y(t-2), y(t-3), y(t-4), u(t-1), u(t-2), u(t-3), u(t-4), u(t-5), u(t-6)}
• 290 samples
Box-Jenkins problem
• To study the influence of the training set size and the regularization parameter:
– 10-fold cross-validation (261 examples for training and 29 for testing)
– For each validation round, several training sets are generated, from 9 to 261 examples, in steps of 9 examples
– For each of these data sets, several neural networks are trained and tested, varying α from 0 (no regularization) to 1 in steps of 10⁻³
• To obtain a better estimate of the true error, mainly with small training sets:
– The validation is repeated 10 times with different compositions of the training sets
• The α with the minimum NMSE is chosen
Box-Jenkins problem
Box-Jenkins problem
• There is no difference when using regularization (except for small training sets)
• The neural network performs well, and using regularization does not enhance the results
• Therefore, normal random noise is added with σ = γσ_t, where σ_t is the standard deviation of the original time series and γ ∈ {0.5, 1} (see the sketch below)
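A minimal sketch of this noise injection, assuming the series is stored in a NumPy array `y`:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_t = y.std()                 # σ_t: standard deviation of the original series
for gamma in (0.5, 1.0):
    # Add zero-mean Gaussian noise with σ = γ·σ_t
    y_noisy = y + rng.normal(0.0, gamma * sigma_t, size=y.shape)
```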
Box-Jenkins problem

[Results figure]
Conclusions and Future Work
• A new supervised learning method for single layer neural networks using regularization has been introduced:
– Global optimum
– Fast training
– Incremental and parallel learning
– Better generalization capability
• Applied to two problems, classification and regression:
– Regularization generally obtains a better solution, mainly with small training sets or noisy data
• As future work, an analytical method to obtain the regularization parameter is being studied
Thank you for your attention!