REDUCED HYPERBF NETWORKS: REGULARIZATION BY EXPLICIT COMPLEXITY REDUCTION AND SCALED RPROP BASED TRAINING
Rami N. Mahdi, Eric C. Rouchka
Bioinformatics Lab
Department of Computer Engineering and Computer Science
University of Louisville
PATTERN RECOGNITION
Classify data samples based on either:
- A priori knowledge
- Statistical information extracted from available labeled data

Different methods learn the boundaries using different approaches.
SUPPORT VECTOR MACHINE
- Transform samples to a new space
- Find points at the boundary
- Maximize the separation margin
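For reference, the textbook hard-margin formulation behind "maximize the separation margin" (standard SVM background, not taken from the slide itself):

```latex
% Hard-margin SVM: the margin is 2/||w||, so maximizing it is
% equivalent to the convex program
\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^2
\quad\text{subject to}\quad
y_i\left(w^\top \phi(x_i) + b\right) \ge 1,\qquad i = 1,\dots,n
% phi is the feature map implied by "transform samples to a new space".
```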
RBF - NN

- Learn significant clusters
- Class samples are distinctively described by a sum of weighted Gaussians
RBF - NN
Diagonal Scaling Matrices vs. Full Scaling Matrices
- Results are interpretable
- Significant neurons represent significant clusters
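The slide's network equation was an image lost in extraction; a standard Gaussian-RBF form consistent with the bullets above (an assumed reconstruction, not verbatim from the talk):

```latex
% Sum of weighted Gaussians; T_j is the per-neuron scaling matrix.
f(x) = \sum_{j=1}^{N} w_j \,
       \exp\!\left( -(x - \mu_j)^\top T_j \,(x - \mu_j) \right)
% T_j diagonal -> axis-aligned Gaussians (diagonal scaling matrices)
% T_j full     -> arbitrarily oriented Gaussians (full scaling matrices)
```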
HyperBF Networks
[Figure: basis-function contours, Regular RBF vs. HyperBF]
Locally Scaled RBF (HyperBF)
Simplified Notation
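The equations on this slide are likewise images; a plausible reconstruction in common HyperBF notation, with the simplified per-neuron distance written out (an assumption, not verbatim from the talk):

```latex
% Locally scaled RBF: each neuron has its own full scaling matrix T_j.
f(x) = \sum_{j=1}^{N} w_j \, e^{-d_j(x)},
\qquad
d_j(x) = (x - \mu_j)^\top T_j \,(x - \mu_j)
% with T_j symmetric positive semidefinite, so each basis function is
% an arbitrarily oriented elliptical Gaussian.
```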
HYPERBF-NN (ELLIPTICAL GAUSSIANS)
Training:
- Perform clustering
- Initialize neurons
- Initialize weights
- Estimate all variables simultaneously using gradient optimization (see the sketch below)
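A minimal sketch of this initialize-then-optimize pipeline, using k-means as a stand-in for the talk's hierarchical clustering and a diagonal initial scaling; none of these function names come from HyperBFLib:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def init_hyperbf(X, y, n_neurons):
    """Initialize a HyperBF net: cluster, set diagonal scalings from
    within-cluster variance, then least-squares fit the output weights.
    (Conventional recipe; the talk itself uses hierarchical clustering.)"""
    centers, labels = kmeans2(X, n_neurons, minit='++', seed=0)

    # Diagonal scaling ~ 1 / (2 * variance) per cluster and dimension.
    scales = np.empty_like(centers)
    for j in range(n_neurons):
        members = X[labels == j]
        var = members.var(axis=0) if len(members) else X.var(axis=0)
        scales[j] = 1.0 / (2.0 * var + 1e-6)

    # Hidden-layer activations under the initial parameters.
    d = (((X[:, None, :] - centers[None]) ** 2) * scales[None]).sum(axis=2)
    H = np.exp(-d)

    # Least-squares initialization of the output weights.
    weights, *_ = np.linalg.lstsq(H, y, rcond=None)
    return centers, scales, weights
```

Gradient optimization (iSRProp, below) would then refine centers, scalings, and weights jointly.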
CHALLENGES
- Challenging optimization. Example: MNIST handwritten digits (784 features); a 100-neuron network contains 156,900 parameters (per neuron: 784 center coordinates + 784 diagonal scaling terms + 1 output weight = 1,569).
- The optimization function is not convex.
- Overfitting (a very complex model).
RPROP ALGORITHM
- Uses a separate learning factor η for every variable
- Uses the direction of the first derivative, not its magnitude
- η increases if the direction of the derivative stays the same as in the previous iteration
- η decreases if the direction changes
Gradient Descent: Δw_i = -α · ∂E/∂w_i
RProp: Δw_i = -η_i · sign(∂E/∂w_i), subject to: η_min ≤ η_i ≤ η_max (sketched below)
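A minimal NumPy sketch of the RProp rule described above. The increase/decrease factors (1.2 and 0.5) and the step bounds are the conventional defaults from Riedmiller and Braun, not values quoted on the slide:

```python
import numpy as np

def rprop_step(params, grad, prev_grad, eta,
               eta_plus=1.2, eta_minus=0.5, eta_min=1e-6, eta_max=50.0):
    """One RProp update: per-parameter step sizes adapted from the sign
    of the derivative only, never its magnitude."""
    same_sign = grad * prev_grad > 0   # derivative kept its direction
    flipped   = grad * prev_grad < 0   # derivative changed direction

    # Grow eta where the sign is stable, shrink where it flipped,
    # keeping every step inside [eta_min, eta_max].
    eta = np.where(same_sign, np.minimum(eta * eta_plus, eta_max), eta)
    eta = np.where(flipped,   np.maximum(eta * eta_minus, eta_min), eta)

    # Move against the gradient using only its sign.
    return params - eta * np.sign(grad), eta
```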
SCALED RPROP
Adaptive Estimation of Ti-Init and Ti-Max
Ti-Init and Ti-Max are estimated by bounding the change to the output.
SCALED RPROP WITH PARTIAL BACKTRACKING

Init network (hierarchical clustering)
Loop (iSRProp):
    Compute weight derivatives
    Update network
    For every neuron j:
        Compute all center (μ) and scaling (T) derivatives
        Update network using RProp
        if Error increases:
            Roll back 25% of the last updates to neuron j
        end if
    End for
Until convergence
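A sketch of the loop above in Python. The error and gradient callables, the initial step size, and the iteration cap are assumptions standing in for the library's internals; only the per-neuron RProp update and the 25% rollback mirror the pseudocode:

```python
import numpy as np

def isrprop(neurons, error_fn, grad_fn, n_iter=100):
    """neurons  : list of 1-D float parameter arrays, one per neuron
                  (center, scaling terms, output weight)
    error_fn : callable(neurons) -> scalar training error (assumed)
    grad_fn  : callable(neurons, j) -> gradient for neuron j (assumed)"""
    etas = [np.full_like(p, 0.01) for p in neurons]
    prev_grads = [np.zeros_like(p) for p in neurons]
    prev_error = error_fn(neurons)

    for _ in range(n_iter):
        for j in range(len(neurons)):
            g = grad_fn(neurons, j)
            # RProp on this neuron's parameters (sign-based steps).
            same = g * prev_grads[j] > 0
            flip = g * prev_grads[j] < 0
            etas[j] = np.where(same, np.minimum(etas[j] * 1.2, 50.0), etas[j])
            etas[j] = np.where(flip, np.maximum(etas[j] * 0.5, 1e-6), etas[j])
            step = etas[j] * np.sign(g)
            neurons[j] -= step
            prev_grads[j] = g

            error = error_fn(neurons)
            if error > prev_error:
                # Partial backtracking: roll back 25% of this update.
                neurons[j] += 0.25 * step
                error = error_fn(neurons)
            prev_error = error
    return neurons
```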
DATASETS
Data Set             # of samples   # of classes   # of test samples   # of features
MNIST                60,000         10             10,000              784
USPS                 7,291          10             2,007               256
TSS                  93,550         2              N/A                 1,024
ISOLET               6,238          26             1,559               617
Wis. Breast Cancer   569            2              N/A                 32
Protein              17,766         3              6,621               357
SatImage             4,435          6              2,000               36
ISRPROP VS. IRPROP+ VS. BPVS

[Figure: (a) USPS net: 100 neurons, (b) MNIST net: 100 neurons, (c) TSS net: 30 neurons, (d) Breast Cancer net: 40 neurons, (e) Protein net: 30 neurons, (f) Satimage net: 60 neurons]
REGULARIZATION (ANTI-OVERFITTING)

- Simpler models need fewer examples to approximate
- Statistical Learning Theory: Generalization ∝ 1 / Complexity
REDUCED HYPERBF
[Figure/equations: the HyperBF model vs. the Reduced HyperBF model]
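The slide's formulation is an image; one plausible shape of a penalized objective consistent with "regularization by explicit complexity reduction" is a sparsity penalty on the scaling matrices, which drives inactive dimensions to zero. This is an assumption about the form, not the paper's exact equation:

```latex
% Training error plus a sparsity penalty on the scaling matrices;
% entries of T_j pushed to zero deactivate dimensions of neuron j
% (cf. the "active dims in R-HBF" column in the results below).
E = \sum_{i=1}^{n} \big( y_i - f(x_i) \big)^2
    \;+\; \lambda \sum_{j=1}^{N} \lVert T_j \rVert_1
% lambda is the regularization parameter varied in the sensitivity
% analysis; the L1 choice of penalty is assumed.
```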
RESULTS
                        CV Error %               Test Error %
Data Set       k-Folds  HBF     R-HBF   SVM      HBF     R-HBF   SVM
USPS           10       2.47    1.37    1.74     5.83    4.38    4.78
MNIST          5        3.33    2.29    1.52     3.23    2.05    1.42
ISOLET         10       4.44    3.03    2.45     6.54    3.78    3.21
Breast Cancer  10       4.04    1.67    1.93     N/A     N/A     N/A
Protein        10       38.61   32.03   29.56    38.07   29.9    29.9
Satimage       10       9.8     8.71    7.86     10.7    9.5     8.8

TSS (validation auROC %): HBF 88.5, R-HBF 94.06, SVM 94.42
COMPARISON OF MODEL STRUCTURE
Data Set       # of support vectors   # of neurons   active dims in R-HBF %   ~Size Ratio
USPS           1,464                  200            0.36                     1:10
MNIST          16,523                 200            0.24                     1:172
ISOLET         3,956                  260            0.29                     1:26
Breast Cancer  79                     40             0.084                    1:12
Protein        12,019                 30             0.22                     1:910
SATIMAGE       1,322                  60             0.46                     1:24
TSS            14,554                 30             0.13                     1:1900

The MNIST R-HBF model is about 172 times smaller than the corresponding SVM.
SENSITIVITY TO REGULARIZATION PARAMETERS

[Figure: a) ISOLET, b) USPS, and c) Protein. Starred boxes are the ones with the highest accuracy]
AVAILABILITY
HyperBF optimization tool with source code is made available at: http://bioinformatics.louisville.edu/HyperBFLib.html

The important classes in HyperBFLib are:
- RHyperBFNet_2Class: trains networks for two-class problems.
- RHyperBFNet_MultiClass: trains networks for multi-class problems.
- HeirarchalAgglomerative: performs hierarchical clustering with moving centers.
- DataLoader: loads or saves data objects of different types, including arrays, clusters, and objects.
- USPS_Client: a sample implementation that uses the above classes to train HyperBF to classify the USPS dataset in two cases: multi-class and two-class classification.

A formatted USPS dataset is made available as an example of data formatting.

For further questions about the package, send email to: [email protected]
CONCLUSION
- iSRProp is shown to be a practical and convergent optimization method for training HyperBF networks.
- The proposed regularization significantly improved the generalization of HyperBF networks.
- Reduced HyperBF is shown to be competitive with SVM while having a significantly smaller model structure (1-3 orders of magnitude).
- Reduced HyperBF is shown to facilitate higher-level analysis.