Presentation at IWANN 2003
A New Learning Method for Single Layer Neural Networks Based on a
Regularized Cost Function
Juan A. Suárez-Romero, Óscar Fontenla-Romero
Bertha Guijarro-Berdiñas, Amparo Alonso-Betanzos
Laboratory for Research and Development in Artificial Intelligence
Department of Computer Science, University of A Coruña, Spain
Outline
• Introduction
• Supervised learning + regularization
• Alternative loss function
• Experimental results
• Conclusions and Future Work
Single layer neural network
[Diagram: a single layer network; the inputs x_1^s, ..., x_I^s and a constant input 1 (bias) feed J summing units producing y_1^s, ..., y_J^s, each followed by a nonlinear function f_j that yields the outputs z_1^s, ..., z_J^s]
• I inputs
• J outputs
• S samples
Single layer neural network

[Diagram: detail of neuron j, with weights w_j1, ..., w_jI and bias b_j; it computes y_j^s = w_j1·x_1^s + ... + w_jI·x_I^s + b_j and z_j^s = f_j(y_j^s)]
Cost function
• Supervised learning + regularization
$$C_j = (1-\alpha)\underbrace{\sum_{s=1}^{S}\left(d_j^s - z_j^s\right)^2}_{\text{MSE}} + \alpha\underbrace{\sum_{i=1}^{I} w_{ji}^2}_{\text{Regularization term (weight decay)}}$$

• Non-linear neural functions
⇓
Not guaranteed to have a unique minimum (local minima)
Alternative loss function
• Theorem: Let x_i^s be the i-th input of a one-layer neural network, d_j^s and z_j^s the j-th desired and actual outputs, y_j^s = Σ_i w_ji·x_i^s + b_j the input to the nonlinearity, w_ji and b_j the weights and bias, and f, f^-1, f' the nonlinear function, its inverse and its derivative. Then minimizing L_j = Σ_s (d_j^s − z_j^s)² is equivalent to minimizing, up to the first order of the Taylor series expansion, the following alternative loss function:

$$\bar{L}_j = \sum_{s=1}^{S}\left[f'\left(\bar{d}_j^s\right)\right]^2\left(\bar{d}_j^s - y_j^s\right)^2$$

where:

$$\bar{d}_j^s = f^{-1}\left(d_j^s\right)$$
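The equivalence comes from the first-order Taylor expansion of f around d̄_j^s (a sketch of the step, higher-order terms dropped):

$$f\left(y_j^s\right) \approx f\left(\bar{d}_j^s\right) + f'\left(\bar{d}_j^s\right)\left(y_j^s - \bar{d}_j^s\right)$$

so the output-space error can be rewritten as

$$d_j^s - z_j^s = f\left(\bar{d}_j^s\right) - f\left(y_j^s\right) \approx f'\left(\bar{d}_j^s\right)\left(\bar{d}_j^s - y_j^s\right)$$

and squaring and summing over the samples yields L̄_j.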
Alternative loss function

[Diagram: neuron j with the desired output mapped back through the nonlinearity, d̄_j^s = f^-1(d_j^s), so that the error d̄_j^s − y_j^s is measured before f_j instead of after it]
Alternative cost function
• Supervised learning + regularization
$$\bar{C}_j = (1-\alpha)\underbrace{\sum_{s=1}^{S}\left[f'\left(\bar{d}_j^s\right)\right]^2\left(\bar{d}_j^s - y_j^s\right)^2}_{\text{Alternative MSE}} + \alpha\underbrace{\sum_{i=1}^{I} w_{ji}^2}_{\text{Regularization term (weight decay)}}$$
Alternative cost function
• The optimal weights and bias can be obtained by differentiating the alternative cost function with respect to the weights and the bias of the network and equating the partial derivatives to zero
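Writing these conditions out explicitly gives the following (a sketch; whether the bias is also penalized by the weight-decay term is not stated here, so it is left unregularized below):

$$\frac{\partial \bar{C}_j}{\partial w_{jp}} = -2(1-\alpha)\sum_{s=1}^{S}\left[f'\left(\bar{d}_j^s\right)\right]^2\left(\bar{d}_j^s - y_j^s\right)x_p^s + 2\alpha\, w_{jp} = 0,\qquad p = 1,\ldots,I$$

$$\frac{\partial \bar{C}_j}{\partial b_j} = -2(1-\alpha)\sum_{s=1}^{S}\left[f'\left(\bar{d}_j^s\right)\right]^2\left(\bar{d}_j^s - y_j^s\right) = 0$$

Since y_j^s = Σ_i w_ji·x_i^s + b_j is linear in the unknowns, these are I+1 linear equations in w_j1, ..., w_jI, b_j.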
Alternative cost function
• We can rewrite the previous system as a system of I+1 linear equations in I+1 unknowns, i.e. an (I+1)×(I+1) system A·ŵ_j = b, where A is the coefficient matrix, ŵ_j the vector of variables (the weights and the bias), and b the vector of independent terms
• Advantages (a sketch of the method follows below):
– Solved using a system of linear equations ⇒ fast training with low computational cost
– Convex function ⇒ unique minimum
– Incremental + parallel learning ⇒ only the coefficient matrix and the independent-terms vector must be stored
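A minimal NumPy sketch of this training scheme (my own illustration, not the authors' code; it builds the coefficient matrix A and independent-terms vector b from the normal equations above, treats the bias as an extra input fixed to 1 and, as an assumption, leaves it unregularized):

```python
import numpy as np

def train_single_layer(X, D, finv, fprime, alpha=0.0):
    """Fit one output unit of a single layer network by solving the
    (I+1)x(I+1) linear system derived from the regularized
    alternative cost function.

    X : (S, I) inputs; D : (S,) desired outputs (inside the range of f);
    finv, fprime : inverse and derivative of the nonlinearity f;
    alpha : regularization parameter in [0, 1].
    """
    S, I = X.shape
    Xb = np.hstack([X, np.ones((S, 1))])   # append the bias input (fixed to 1)
    dbar = finv(D)                         # transformed desired outputs: dbar = f^-1(d)
    fp2 = fprime(dbar) ** 2                # per-sample weights [f'(dbar)]^2

    # Coefficient matrix and independent-terms vector of the normal
    # equations. Both are sums over the samples, which is what makes the
    # method incremental: only A and b need to be stored.
    A = (1 - alpha) * (Xb * fp2[:, None]).T @ Xb
    b = (1 - alpha) * Xb.T @ (fp2 * dbar)

    # Weight decay on the weights only (assumption: bias unpenalized).
    reg = alpha * np.eye(I + 1)
    reg[I, I] = 0.0
    A += reg

    wb = np.linalg.solve(A, b)             # unique minimum of the convex cost
    return wb[:I], wb[I]                   # weights, bias
```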
Experimental results
• Two kinds of problems:
– Intrusion Detection: a classification problem
– Box-Jenkins time series: a regression problem
• Logistic neural function: f(x) = 1 / (1 + e^(-x))
• Regularization parameter α ∈ [0,1]
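As a usage illustration of the earlier sketch with this logistic function (synthetic data, not the experimental data; `train_single_layer` is the hypothetical helper defined above):

```python
import numpy as np

# Logistic function f(x) = 1/(1 + e^{-x}), its inverse (the logit) and its derivative.
def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(d):
    return np.log(d / (1.0 - d))

def dlogistic(x):
    s = logistic(x)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                # 200 samples, 5 inputs
D = logistic(X @ rng.normal(size=5) + 0.3)   # synthetic targets in (0, 1)

w, b = train_single_layer(X, D, finv=logit, fprime=dlogistic, alpha=5e-3)
Z = logistic(X @ w + b)                      # network outputs
print("MSE:", np.mean((D - Z) ** 2))
```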
Intrusion Detection problem
• KDD’99 Classifier Learning Contest
• Two-class classification problem: attack and normal connections
• Each sample is formed by 41 high-level features
• 30000 samples for training
• 4996 samples for testing
Intrusion Detection problem
• To study the influence of the training set size and the regularization parameter (see the sketch below):
– Initial training set of 100 samples
– Each subsequent training set is obtained by adding 100 new samples to the previous set, up to 2500 samples
– For each training set, several neural networks were trained, with α from 0 (no regularization) to 1 in steps of 5×10⁻³
• To obtain a better estimate of the true error:
– The process is repeated 12 times with different training sets
• The α with the minimum test classification error is chosen
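A schematic sketch of this model-selection loop (my own illustration; `train_single_layer`, `logit` and `dlogistic` come from the earlier sketches, and `load_kdd99_split` is a hypothetical data loader):

```python
import numpy as np

alphas = np.arange(0.0, 1.0 + 5e-3, 5e-3)   # α from 0 (no regularization) to 1, step 5e-3
sizes = list(range(100, 2501, 100))         # training sets of 100, 200, ..., 2500 samples
errors = np.zeros((len(sizes), len(alphas)))

for r in range(12):                         # 12 repetitions with different training sets
    # Hypothetical loader; targets assumed encoded inside (0, 1),
    # e.g. 0.05/0.95, so that the logit is defined.
    X_pool, D_pool, X_test, D_test = load_kdd99_split(seed=r)
    for i, n in enumerate(sizes):
        X_tr, D_tr = X_pool[:n], D_pool[:n]   # previous set plus 100 new samples
        for k, a in enumerate(alphas):
            w, b = train_single_layer(X_tr, D_tr, finv=logit, fprime=dlogistic, alpha=a)
            pred = (X_test @ w + b) > 0.0     # logistic output > 0.5  <=>  y > 0
            errors[i, k] += np.mean(pred != (D_test > 0.5)) / 12.0

best_alpha = alphas[errors.mean(axis=0).argmin()]   # α with minimum test error
```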
Intrusion Detection problem

[Results figure]
Box-Jenkins problem
• Regression problem
• Estimate the CO2 concentration in a gas furnace from the methane flow rate
• Predict y(t) from {y(t-1), y(t-2), y(t-3), y(t-4), u(t-1), u(t-2), u(t-3), u(t-4), u(t-5), u(t-6)}
• 290 samples
Box-Jenkins problem
• To study the influence of the training set size and the regularization parameter:
– 10-fold cross-validation (261 examples for training and 29 for testing)
– For each validation round, several training sets are generated, from 9 to 261 examples, in steps of 9 examples
– For each of these data sets, several neural networks are trained and tested, varying α from 0 (no regularization) to 1 in steps of 10⁻³
• To obtain a better estimate of the true error, mainly with small training sets:
– The validation is repeated 10 times with different compositions of the training sets
• The α with the minimum NMSE is chosen
Box-Jenkins problem
Box-Jenkins problem
• There is no difference when using regularization (except for small training sets)
• The neural network performs well, and using regularization does not enhance the results
• Therefore, normal random noise is added with σ = γσ_t, where σ_t is the standard deviation of the original time series and γ ∈ {0.5, 1} (see the sketch below)
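A minimal sketch of this noise injection, assuming the series is stored in a NumPy array `y`:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_t = y.std()                 # σ_t: standard deviation of the original series
for gamma in (0.5, 1.0):
    # Add zero-mean Gaussian noise with σ = γ·σ_t
    y_noisy = y + rng.normal(0.0, gamma * sigma_t, size=y.shape)
```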
Box-Jenkins problem

[Results figure]
Conclusions and Future Work
• A new supervised learning method for single layer neural networks using regularization has been introduced:
– Global optimum
– Fast training
– Incremental and parallel learning
– Better generalization capability
• Applied to two problems, classification and regression:
– Regularization generally obtains a better solution, mainly with small training sets or noisy data
• As future work, an analytical method to obtain the regularization parameter is being studied
Thank you for your attention!