REDUCED HYPERBF NETWORKS: REGULARIZATION BY EXPLICIT COMPLEXITY REDUCTION AND SCALED RPROP BASED TRAINING

Rami N. Mahdi, Eric C. Rouchka
Bioinformatics Lab, Department of Computer Engineering and Computer Science, University of Louisville
bioinformatics.louisville.edu/localresources/software/HyperBFLib.pdf



PATTERN RECOGNITION

Classify data samples based on either:
• A priori knowledge
• Statistical information extracted from available labeled data

Different methods learn the class boundaries using different approaches.


SUPPORT VECTOR MACHINE

• Transform samples to a new space
• Find the points at the boundary
• Maximize the separation margin


RBF - NN
• Learn significant clusters
• Class samples are distinctively described by a sum of weighted Gaussians
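A class score built from a sum of weighted Gaussians can be sketched in a few lines. This is a minimal illustration, assuming isotropic Gaussians with one width `sigma` per neuron and an optional bias; the function names and toy data are hypothetical and not taken from HyperBFLib.

```python
import math


def gaussian(x, center, sigma):
    """Isotropic Gaussian response exp(-||x - c||^2 / (2 * sigma^2))."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def rbf_output(x, centers, sigmas, weights, bias=0.0):
    """Class score: bias plus a weighted sum of Gaussian neuron responses."""
    return bias + sum(
        w * gaussian(x, c, s) for c, s, w in zip(centers, sigmas, weights)
    )


# Two made-up clusters in 2-D, each represented by one Gaussian neuron.
centers = [[0.0, 0.0], [3.0, 3.0]]
sigmas = [1.0, 1.0]
weights = [1.0, 0.5]
score_near = rbf_output([0.0, 0.0], centers, sigmas, weights)   # close to cluster 1
score_far = rbf_output([10.0, -10.0], centers, sigmas, weights)  # far from both
print(score_near, score_far)
```

A sample near a cluster center gets a high score; a sample far from every center scores close to zero.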


RBF - NN

[Figure: diagonal scaling matrices vs. full scaling matrices]

• Results are interpretable
• Significant neurons represent significant clusters


HyperBF Networks

[Figure: regular RBF vs. HyperBF]


Locally Scaled RBF (HyperBF)

[Equation: HyperBF model in simplified notation]
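The locally scaled (HyperBF) neuron can be sketched by giving every neuron its own per-dimension scale. This assumes a diagonal scaling matrix and the form exp(-sum_i (s_i (x_i - c_i))^2); the slide's own equations were not recoverable, so treat the parameterization and the names below as hypothetical. Under this form, a zero scale removes a dimension from a neuron, which is one way to picture a neuron that uses only some of the input dimensions.

```python
import math


def hyperbf_neuron(x, center, scales):
    """Elliptical Gaussian with an assumed diagonal scaling matrix:
    exp(-sum_i (s_i * (x_i - c_i))**2)."""
    return math.exp(-sum((s * (xi - ci)) ** 2 for xi, ci, s in zip(x, center, scales)))


def hyperbf_output(x, centers, scales, weights, bias=0.0):
    """Bias plus a weighted sum of locally scaled neuron responses."""
    return bias + sum(
        w * hyperbf_neuron(x, c, s) for c, s, w in zip(centers, scales, weights)
    )


center = [0.0, 0.0]
scales = [2.0, 0.0]  # this neuron ignores dimension 2 entirely
a = hyperbf_neuron([0.5, 0.0], center, scales)
b = hyperbf_neuron([0.5, 100.0], center, scales)
print(a == b)  # True: a zero scale switches a dimension off
```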


HYPERBF-NN (ELLIPTICAL GAUSSIANS)

Training:
• Perform clustering
• Initialize neurons
• Initialize weights
• Estimate all variables simultaneously using gradient optimization
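The clustering step can be sketched with centroid-linkage agglomerative clustering, where each merge moves the cluster center to the size-weighted mean. The library lists a `HeirarchalAgglomerative` class for hierarchal clustering with moving centers, but this is a generic sketch, not its code; `agglomerate` and the toy points are hypothetical, and the O(n^3) pair search is only meant for small examples.

```python
def agglomerate(points, k):
    """Centroid-linkage agglomerative clustering: start with one cluster per
    point and repeatedly merge the two clusters whose centers are closest,
    moving the merged center to the size-weighted mean."""
    clusters = [(list(p), 1) for p in points]  # (center, size)
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum((a - b) ** 2 for a, b in zip(clusters[i][0], clusters[j][0]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        merged = ([(ni * a + nj * b) / (ni + nj) for a, b in zip(ci, cj)], ni + nj)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return [center for center, _ in clusters]


points = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]]
centers = sorted(agglomerate(points, 2))
print(centers)  # two centers, near [0.1, 0.0] and [5.1, 5.0]
```

The returned centers could then seed the neuron centers before the gradient optimization starts.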


CHALLENGES

• Challenging optimization. Example: MNIST handwritten digits (784 features): 100 neurons would contain 156,900 parameters.
• The optimization function is not convex.
• Overfitting (very complex model).
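The figure of 156,900 parameters matches a layout in which each of the 100 neurons stores a center (784 values), a diagonal scaling vector (784 values), and one output weight: 100 × (2 × 784 + 1) = 156,900. That breakdown is our inference from the arithmetic, not something the slide states. For contrast, the sketch also counts a full 784 × 784 scaling matrix per neuron under the same assumed layout.

```python
def hyperbf_params_diagonal(n_features, n_neurons):
    # Assumed per-neuron layout: center (d) + diagonal scales (d) + 1 output weight.
    return n_neurons * (2 * n_features + 1)


def hyperbf_params_full(n_features, n_neurons):
    # Same layout, but with a full d x d scaling matrix instead of a diagonal.
    return n_neurons * (n_features + n_features ** 2 + 1)


print(hyperbf_params_diagonal(784, 100))  # 156900
print(hyperbf_params_full(784, 100))      # 61544100
```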


RPROP ALGORITHM

• Uses a separate learning factor η for every variable.
• Uses the direction of the first derivative, not its magnitude.
• η increases if the direction of the derivative stays the same as in the previous iteration.
• η decreases if the direction changes.

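Gradient descent: $w_i^{(t+1)} = w_i^{(t)} - \eta \, \frac{\partial E}{\partial w_i}$

RProp: $w_i^{(t+1)} = w_i^{(t)} - \eta_i^{(t)} \, \operatorname{sign}\left(\frac{\partial E}{\partial w_i}\right)$, subject to constraints on the step sizes $\eta_i^{(t)}$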
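The RProp rule can be sketched with a per-variable step size that grows while the derivative keeps its sign and shrinks when the sign flips. This follows the common iRprop- variant (on a sign flip, the step shrinks and that variable's update is skipped), with the usual factors 1.2 and 0.5; the step bounds and the quadratic toy objective are arbitrary choices, and this is not the paper's iSRProp.

```python
def sign(v):
    return (v > 0) - (v < 0)


def rprop_step(params, grads, prev_grads, steps,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=1.0):
    """One iRprop- update, in place. Each variable i keeps its own step size."""
    for i, g in enumerate(grads):
        if g * prev_grads[i] > 0:          # same direction: grow the step
            steps[i] = min(steps[i] * eta_plus, step_max)
        elif g * prev_grads[i] < 0:        # direction flipped: shrink the step
            steps[i] = max(steps[i] * eta_minus, step_min)
            g = 0.0                        # iRprop-: skip this variable's update
        params[i] -= sign(g) * steps[i]    # only the sign of the derivative is used
        prev_grads[i] = g


# Toy objective E(w) = sum_i (w_i - target_i)^2.
target = [3.0, -2.0]
w = [0.0, 0.0]
prev = [0.0, 0.0]
steps = [0.1, 0.1]
for _ in range(200):
    grads = [2 * (wi - ti) for wi, ti in zip(w, target)]
    rprop_step(w, grads, prev, steps)
print(w)  # close to [3.0, -2.0]
```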


SCALED RPROP

Adaptive Estimation of Ti-Init and Ti-Max

Ti-Init and Ti-Max are estimated by bounding the change to the output.
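The slide does not give the estimator itself. One plausible reading of "bounding the change to the output" is a first-order bound: if the output's derivative with respect to variable i has magnitude |∂y/∂θ_i|, a step of size T_i changes the output by about |∂y/∂θ_i| · T_i, so choosing T_i = budget / |∂y/∂θ_i| caps that change. The rule, the budget values, and `scaled_step_bounds` below are assumptions for illustration only.

```python
def scaled_step_bounds(output_sensitivities, init_budget=0.01, max_budget=0.1, eps=1e-12):
    """Hypothetical rule: per-variable initial and maximum step sizes chosen so
    that, to first order, |dy/dtheta_i| * T_i stays within the given budget.
    output_sensitivities[i] could be, e.g., the largest |dy/dtheta_i| over samples."""
    t_init, t_max = [], []
    for s in output_sensitivities:
        s = abs(s) + eps  # guard against a zero sensitivity
        t_init.append(init_budget / s)
        t_max.append(max_budget / s)
    return t_init, t_max


t_init, t_max = scaled_step_bounds([0.5, 4.0])
print(t_init, t_max)  # sensitive variables get smaller steps
```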


SCALED RPROP WITH PARTIAL BACKTRACKING

Init network (hierarchical clustering)
Loop (iSRProp)
    Compute weight derivatives
    Update network
    For every neuron j
        Compute the derivatives of all of neuron j's center and scaling variables
        Update network using RProp
        If error increases
            • Roll back 25% of the last updates to neuron j
        End if
    End for
Until convergence
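The partial-backtracking loop can be sketched as follows. Parameters live in a flat list, and each neuron j owns a block of indices (a hypothetical layout); after a neuron's update, if the error rises, 25% of that update is undone. A fixed-size sign step stands in for the RProp update and the outer weight-update step is omitted, so this shows only the rollback logic under those assumptions.

```python
def sign(v):
    return (v > 0) - (v < 0)


def train_with_partial_backtracking(params, blocks, error, grad, step=0.1,
                                    rollback=0.25, iters=50):
    """params: flat list of variables; blocks: one list of indices per neuron;
    error(params) -> float; grad(params) -> list of derivatives."""
    for _ in range(iters):
        for idx in blocks:                     # for every neuron j
            before = error(params)
            g = grad(params)
            old = {i: params[i] for i in idx}
            for i in idx:                      # sign step stands in for RProp
                params[i] -= step * sign(g[i])
            if error(params) > before:         # error increased:
                for i in idx:                  # roll back 25% of this update
                    params[i] -= rollback * (params[i] - old[i])
    return params


target = [1.0, -1.0, 0.5, 2.0]


def error(p):
    return sum((pi - ti) ** 2 for pi, ti in zip(p, target))


def grad(p):
    return [2.0 * (pi - ti) for pi, ti in zip(p, target)]


params = train_with_partial_backtracking([0.0] * 4, [[0, 1], [2, 3]], error, grad)
print(params)  # each value within one step (0.1) of its target
```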


DATASETS

Data Set             # of Samples   # of Classes   # of Test Samples   # of Features
MNIST                60,000         10             10,000              784
USPS                 7,291          10             2,007               256
TSS                  93,550         2              N/A                 1,024
ISOLET               6,238          26             1,559               617
Wis. Breast Cancer   569            2              N/A                 32
Protein              17,766         3              6,621               357
SatImage             4,435          6              2,000               36


ISRPROP VS. IRPROP+ VS. BP

[Figure: (a) USPS net: 100 neurons, (b) MNIST net: 100 neurons, (c) TSS net: 30 neurons, (d) Breast Cancer net: 40 neurons, (e) Protein net: 30 neurons, and (f) SatImage net: 60 neurons]


REGULARIZATION (ANTI-OVERFITTING)

Simple models need fewer examples to approximate.

Statistical learning theory: generalization ∝ 1 / complexity


REDUCED HYPERBF


[Figure: HyperBF vs. Reduced HyperBF]


RESULTS

                          CV Error %               Test Error %
Data Set        k-Folds   HBF     R-HBF   SVM      HBF     R-HBF   SVM
USPS            10        2.47    1.37    1.74     5.83    4.38    4.78
MNIST           5         3.33    2.29    1.52     3.23    2.05    1.42
ISOLET          10        4.44    3.03    2.45     6.54    3.78    3.21
Breast Cancer   10        4.04    1.67    1.93     N/A     N/A     N/A
Protein         10        38.61   32.03   29.56    38.07   29.9    29.9
Satimage        10        9.8     8.71    7.86     10.7    9.5     8.8

TSS validation auROC %: HBF 88.5, R-HBF 94.06, SVM 94.42


COMPARISON OF MODEL STRUCTURE

Data Set        # of Support Vectors   # of Neurons in R-HBF   Active Dims in R-HBF (fraction)   ~Size Ratio
USPS            1464                   200                     0.36                              1:10
MNIST           16523                  200                     0.24                              1:172
ISOLET          3956                   260                     0.29                              1:26
Breast Cancer   79                     40                      0.084                             1:12
Protein         12019                  30                      0.22                              1:910
SATIMAGE        1322                   60                      0.46                              1:24
TSS             14554                  30                      0.13                              1:1900

The MNIST R-HBF model is about 172 times smaller than the SVM model.
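The table does not state how size is measured. If an SVM is sized as support vectors × features, and a reduced HyperBF neuron as one center value and one scale value per active dimension plus one output weight, the arithmetic gives the USPS, MNIST, and ISOLET ratios (about 1:10, 1:172, and 1:26) but does not match every row exactly, so treat this sizing rule and `size_ratio` as assumptions.

```python
def size_ratio(n_support_vectors, n_features, n_neurons, active_fraction):
    """Hypothetical size measure. SVM: every feature of every support vector.
    Reduced HyperBF: per neuron, a center value and a scale value for each
    active dimension, plus one output weight."""
    svm_size = n_support_vectors * n_features
    rhbf_size = n_neurons * (2 * active_fraction * n_features + 1)
    return svm_size / rhbf_size


rows = {
    "USPS": (1464, 256, 200, 0.36),
    "MNIST": (16523, 784, 200, 0.24),
    "ISOLET": (3956, 617, 260, 0.29),
}
for name, row in rows.items():
    print(name, round(size_ratio(*row)))  # USPS 10, MNIST 172, ISOLET 26
```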


SENSITIVITY TO REGULARIZATION PARAMETERS

[Figure: (a) ISOLET, (b) USPS, and (c) Protein. Starred boxes mark the settings with the highest accuracy.]


AVAILABILITY

The HyperBF optimization tool, with source code, is available at: http://bioinformatics.louisville.edu/HyperBFLib.html

HyperBFLib has been developed; its most important classes are:
• RHyperBFNet_2Class: trains networks for two-class problems.
• RHyperBFNet_MultiClass: trains networks for multi-class problems.
• HeirarchalAgglomerative: performs hierarchical clustering with moving centers.
• DataLoader: loads or saves data objects of different types, including arrays, clusters, and objects.
• USPS_Client: a sample implementation that uses the above classes to train HyperBF to classify the USPS dataset in two cases: multi-class and two-class classification.

A formatted USPS dataset is provided as an example of the data format.

For further questions about the package, send email to: [email protected]


CONCLUSION

iSRProp is shown to be a practical and convergent optimization method for training HyperBF networks.

The proposed regularization significantly improved the generalization of HyperBF networks.

Reduced HyperBF is shown to be competitive with SVM while using a significantly smaller model structure (1 to 3 orders of magnitude smaller).

Reduced HyperBF is shown to facilitate higher-level analysis.
