bioinformatics.louisville.edu/localresources/software/HyperBFLib.pdf

REDUCED HYPERBF NETWORKS: REGULARIZATION BY EXPLICIT COMPLEXITY REDUCTION AND SCALED RPROP BASED TRAINING

Rami N. Mahdi
Eric C. Rouchka

Bioinformatics Lab
Department of Computer Engineering and Computer Science
University of Louisville

PATTERN RECOGNITION

Classify data samples based on either:
• A priori knowledge
• Statistical information extracted from available labeled data

Different methods learn the boundaries using different approaches.

SUPPORT VECTOR MACHINE

• Transform samples to a new space
• Find points at the boundary
• Maximize the separation margin


RBF - NN

• Learn significant clusters
• Class samples are distinctively described by a sum of weighted Gaussians


RBF - NN

Figure panels: Diagonal Scaling Matrices, Full Scaling Matrices

• Results are interpretable
• Significant neurons represent significant clusters


HyperBF Networks

Regular RBF HyperBF


Locally Scaled RBF (HyperBF)

Simplified Notation

HYPERBF-NN (ELLIPTICAL GAUSSIANS)

Training
• Perform clustering
• Initialize neurons
• Initialize weights
• Estimate all variables simultaneously using gradient optimization
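As a rough illustration of what a network of elliptical Gaussians computes, the sketch below evaluates a weighted sum of Gaussians that each have their own center and per-dimension scaling. The function names `hyperbf_neuron` and `hyperbf_output`, the diagonal scaling, and the exact form exp(-Σ (s_k (x_k - c_k))²) are assumptions made for illustration; the formulas on the slides were not recoverable, and HyperBFLib may parameterize its neurons differently.

```python
import math

def hyperbf_neuron(x, center, scale):
    """Elliptical Gaussian with a diagonal scaling (assumed form):
    exp(-sum_k (scale_k * (x_k - center_k))^2)."""
    return math.exp(-sum((s * (xi - ci)) ** 2
                         for xi, ci, s in zip(x, center, scale)))

def hyperbf_output(x, centers, scales, weights):
    """Network output: weighted sum of the neuron activations."""
    return sum(w * hyperbf_neuron(x, c, s)
               for c, s, w in zip(centers, scales, weights))

# One neuron centered at the origin, stretched differently along each axis.
centers = [[0.0, 0.0]]
scales = [[1.0, 2.0]]
weights = [3.0]

at_center = hyperbf_output([0.0, 0.0], centers, scales, weights)
off_center = hyperbf_output([1.0, 0.0], centers, scales, weights)
print(at_center, off_center)
```

A scale of zero in some dimension makes the neuron ignore that feature entirely.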


CHALLENGES

Challenging optimization
Example (MNIST handwritten digits, 784 features):

100 neurons would contain 156900 parameters.
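The parameter count can be reproduced with a short calculation. This sketch assumes each neuron holds one center value and one scaling value per feature plus a single output weight, and uses the 784 features listed for MNIST in the dataset table. The function name `hyperbf_param_count` is hypothetical.

```python
def hyperbf_param_count(n_features, n_neurons, weights_per_neuron=1):
    """Parameters of a HyperBF net with diagonal scaling (assumed layout):
    per neuron, one center and one scale per feature, plus output weights."""
    per_neuron = 2 * n_features + weights_per_neuron
    return n_neurons * per_neuron

mnist_params = hyperbf_param_count(n_features=784, n_neurons=100)
print(mnist_params)  # 156900
```

Under these assumptions the result matches the figure quoted above. Full (non-diagonal) scaling matrices would make the count grow quadratically with the number of features.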

Optimization function is not convex

Overfitting (very complex model)


RPROP ALGORITHM

• It uses a separate learning factor for every variable
• It uses the direction of the first derivative and not the magnitude

η increases if the direction of the derivative stays the same from the previous iteration.

η decreases if the direction changes.

Gradient Descent

RProp

- Subject to: [equation] and [equation]
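The update rule described above can be sketched in a few lines. This is a minimal Rprop-style loop, not the slides' exact formulation: the growth and shrink factors (1.2 and 0.5), the step bounds, and the choice to skip the move after a sign change are commonly used defaults assumed here, and `rprop_minimize` and `quadratic_grad` are hypothetical names.

```python
def rprop_minimize(grad, x0, n_iters=200, step_init=0.1,
                   step_min=1e-6, step_max=50.0, inc=1.2, dec=0.5):
    """Minimize a function using only the signs of its partial derivatives."""
    x = list(x0)
    steps = [step_init] * len(x)      # one learning factor per variable
    prev_grad = [0.0] * len(x)
    for _ in range(n_iters):
        g = list(grad(x))
        for i in range(len(x)):
            agreement = g[i] * prev_grad[i]
            if agreement > 0:         # same direction as before: grow the step
                steps[i] = min(steps[i] * inc, step_max)
            elif agreement < 0:       # direction changed: shrink, skip this move
                steps[i] = max(steps[i] * dec, step_min)
                g[i] = 0.0
            if g[i] > 0:              # move against the derivative's sign
                x[i] -= steps[i]
            elif g[i] < 0:
                x[i] += steps[i]
            prev_grad[i] = g[i]
    return x


def quadratic_grad(x):
    # Gradient of f(x) = (x0 - 3)^2 + 10 * (x1 + 1)^2, minimized at (3, -1).
    return [2.0 * (x[0] - 3.0), 20.0 * (x[1] + 1.0)]

solution = rprop_minimize(quadratic_grad, [0.0, 0.0])
print(solution)
```

Because only the sign of each derivative is used, the badly scaled second variable converges about as quickly as the first, which is the property that makes this family of methods attractive for networks with many heterogeneous parameters.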

SCALED RPROP

Adaptive Estimation of Ti-Init and Ti-Max

Ti-Init and Ti-Max are estimated by bounding the change to the output.

SCALED RPROP WITH PARTIAL BACKTRACKING

Init Network (hierarchical clustering)
Loop (iSRProp)
    Compute weights derivatives
    Update network
    For every neuron j
        Compute all center and scaling derivatives.
        Update network using Rprop
        if Error Increases
            • Roll back 25% of the last updates to neuron j.
        End if
    End for
Until Convergence.
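The rollback step can be illustrated in isolation. The sketch below takes one possible reading of "roll back 25% of the last updates": if the error rises after an update, a quarter of that update vector is subtracted again. This interpretation, the function name `apply_with_partial_backtracking`, and the toy error function are assumptions, not the authors' implementation.

```python
def apply_with_partial_backtracking(params, update, error_fn, rollback=0.25):
    """Apply an update; if the error increases, undo a fraction of it."""
    old_error = error_fn(params)
    new_params = [p + u for p, u in zip(params, update)]
    if error_fn(new_params) > old_error:
        # Assumed reading: retract `rollback` (25%) of the update just applied.
        new_params = [p - rollback * u for p, u in zip(new_params, update)]
    return new_params


def toy_error(params):
    # Squared distance of a single parameter from its optimum at 1.0.
    return (params[0] - 1.0) ** 2

overshoot = apply_with_partial_backtracking([0.0], [4.0], toy_error)
accepted = apply_with_partial_backtracking([0.0], [0.5], toy_error)
print(overshoot, accepted)  # [3.0] [0.5]
```

A partial rollback keeps most of the progress made by the other variables of the neuron, whereas reverting the whole step would discard it.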

DATASETS

Data Set             # of samples   # of classes   # of test samples   # of features
MNIST                60,000         10             10,000              784
USPS                 7,291          10             2,007               256
TSS                  93,550         2              N/A                 1024
ISOLET               6,238          26             1,559               617
Wis. Breast Cancer   569            2              N/A                 32
Protein              17,766         3              6,621               357
SatImage             4,435          6              2,000               36


ISRPROP VS. IRPROP+ VS. BPVS

(a) USPS net: 100 neurons, (b) MNIST net: 100 neurons, (c) TSS net: 30 neurons, (d) Breast Cancer net: 40 neurons, (e) Protein net: 30 neurons, and (f) Satimage net: 60 neurons

REGULARIZATION (ANTI – OVER-FITTING)

Simple models need fewer examples to approximate

Statistical Learning Theory: Generalization ∝ 1 / Complexity


REDUCED HYPERBF


HyperBF

Reduced HyperBF


RESULTS

                          CV Error %                Test Error %
Data Set        k-Folds   HBF     R-HBF   SVM       HBF     R-HBF   SVM
USPS            10        2.47    1.37    1.74      5.83    4.38    4.78
MNIST           5         3.33    2.29    1.52      3.23    2.05    1.42
ISOLET          10        4.44    3.03    2.45      6.54    3.78    3.21
Breast Cancer   10        4.04    1.67    1.93      N/A     N/A     N/A
Protein         10        38.61   32.03   29.56     38.07   29.9    29.9
Satimage        10        9.8     8.71    7.86      10.7    9.5     8.8

TSS Validation auROC %
HBF     R-HBF   SVM
88.5    94.06   94.42


COMPARISON OF MODEL STRUCTURE

DataSet         # of support vectors   # of neurons   active dims in R-HBF %   ~Size Ratio
USPS            1464                   200            0.36                     1:10
MNIST           16523                  200            0.24                     1:172
ISOLET          3956                   260            0.29                     1:26
Breast Cancer   79                     40             0.084                    1:12
Protein         12019                  30             0.22                     1:910
SATIMAGE        1322                   60             0.46                     1:24
TSS             14554                  30             0.13                     1:1900

MNIST-HBF is about 172 times smaller
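The size ratios in the table can be approximated with a simple count of stored values. The formula below is inferred from the table rather than stated on the slide: it assumes the SVM stores every feature of every support vector, and that Reduced HyperBF stores a center and a scaling value for each active dimension of each neuron. The function name `size_ratio` is hypothetical.

```python
def size_ratio(n_support_vectors, n_features, n_neurons, active_fraction):
    """Approximate SVM model size divided by Reduced HyperBF model size."""
    svm_size = n_support_vectors * n_features
    # Assumed: two stored values (center, scale) per active dimension per neuron.
    rhbf_size = n_neurons * n_features * active_fraction * 2
    return svm_size / rhbf_size

mnist_ratio = size_ratio(16523, 784, 200, 0.24)
usps_ratio = size_ratio(1464, 256, 200, 0.36)
print(round(mnist_ratio), round(usps_ratio))  # 172 10
```

Under these assumptions the number of features cancels out, so the ratio depends only on the number of support vectors, the number of neurons, and the fraction of active dimensions.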


SENSITIVITY TO REGULARIZATION PARAMETERS

a) ISOLET, b) USPS, and c) Protein. Starred boxes are the ones with the highest accuracy.

AVAILABILITY

The HyperBF optimization tool and its source code are available at: http://bioinformatics.louisville.edu/HyperBFLib.html

The important classes in HyperBFLib are:
• RHyperBFNet_2Class: Trains networks for two-class problems.
• RHyperBFNet_MultiClass: Trains networks for multi-class problems.
• HeirarchalAgglomerative: Performs hierarchical clustering with moving centers.
• DataLoader: Loads or saves data objects of different types, including arrays, clusters, and objects.
• USPS_Client: A sample implementation that uses the above classes to train HyperBF to classify the USPS dataset in two cases: multi-class and two-class classification.

A formatted USPS dataset is made available as an example of how to format data.

For further questions about the package, send email to: [email protected]

CONCLUSION

iSRprop is shown to be a practical and convergent optimization method for training HyperBF networks.

The proposed regularization improved the generalization of HyperBF networks significantly.

Reduced HyperBF is shown to be competitive with SVM with a significantly smaller model structure (1-3 orders of magnitude).

Reduced HyperBF is shown to facilitate higher-level analysis.
