Visual Media
Capture of shape and appearance of real objects and people
Sign language recognition
Preservation of cultural artefacts
3D broadcast production
Animation
Medical Imaging and Remote Sensing
3D MRI image analysis (brain tumour detection)
Alzheimer’s condition diagnosis (PET brain imaging)
2D-3D elastic image matching
3D liver reconstruction
Microcalcification detection
Vascular reconstruction
Seismic
Pipeline detection
Robot Vision
Visual learning
Scene interpretation
Model selection
Control of perception
3D object recognition from 2D views
Vision based navigation
Target detection
Visual surveillance
Multimedia Signal Processing and Interpretation
VOICE
FACE
LIPS
Fusion
Biometrics
Image/Video Retrieval
Ensemble MLP Classifier Design
Terry Windeatt University of Surrey, UK
Introduction
Ensembles – Multiple Classifier Systems (MCS)
Ensemble Multi-layer Perceptron Architecture
Tuning Base Classifiers using measures & OOB estimate
Multi-class ECOC using OOB
Feature Selection and Feature Ranking
Face Recognition
SINGLE CLASSIFIER APPROACH
Goals: assign a pattern to one of several classes
find best possible feature set, training set, learning machine structure & parameters
Task is especially difficult when
number of classes is high
classes highly overlapped in feature space
training samples are few and very noisy
Learning is an ill-posed problem & requires built-in assumptions
Multi-layer Perceptron (MLP)
Input layer Hidden layer Output layer
Unstable Base Classifier from random starting weights
#hidden nodes varies complexity &
#epochs varies degree of training
MCS Architecture
[Figure: MLP Classifier 1, MLP Classifier 2, …, MLP Classifier B, with outputs 1, 2, …, B feeding a Combiner]
Idea is to use multiple simple MLPs rather than single complex MLP
Bias/Variance for the 0/1 loss function is more complex than for regression
– ensemble reduces variance & tuning base classifier reduces bias
Multiple Classifiers (MCS)
MCS based upon:
• finding classifiers that perform well but diversely
• appropriate combining strategy
Techniques:
• Different types of classifiers
• Different parameters, same classifier
• Different unstable base classifiers e.g. MLP
• Different Feature Sets e.g. Random Subspace
• Different Training Sets e.g. Bagging/Boosting
• Different class labeling e.g. ECOC
Measures of Diversity
• Accuracy/Diversity Dilemma
Base Classifier Parameter Tuning
Importance of Parameter Tuning Every researcher seems to get good results but how?
Need to measure sensitivity to parameters Helps understand significance of results
Requires systematic change of parameters
How to set parameters? Alternatives to validation set or cross-validation techniques
Out-of-Bootstrap (OOB)
Bootstrapping – Sample with Replacement
Promotes diversity among classifiers
OOB provides alternative to validation
Base classifier OOB uses training patterns left out (approx. one third)
Ensemble OOB uses classifiers left out (approx. one third)
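The out-of-bootstrap idea can be illustrated with a minimal sketch. It assumes plain sampling with replacement and a majority-vote combiner; the function names `bootstrap_with_oob` and `ensemble_oob_error` are hypothetical and not taken from the slides. For each pattern, only the classifiers whose bootstrap sample did not contain that pattern are allowed to vote.

```python
import random


def bootstrap_with_oob(n_patterns, rng):
    """Draw one bootstrap sample (with replacement) and list the left-out patterns."""
    in_bag = [rng.randrange(n_patterns) for _ in range(n_patterns)]
    in_bag_set = set(in_bag)
    oob = [m for m in range(n_patterns) if m not in in_bag_set]
    return in_bag, oob


def ensemble_oob_error(votes, labels, oob_masks):
    """Ensemble OOB error estimate (assumed majority vote, ties resolved to class 0).

    votes[b][m]  : 0/1 decision of classifier b for pattern m
    oob_masks[b] : set of pattern indices NOT used to train classifier b
    """
    errors = counted = 0
    for m, label in enumerate(labels):
        ballots = [votes[b][m] for b in range(len(votes)) if m in oob_masks[b]]
        if not ballots:
            continue  # pattern appeared in every bootstrap sample
        majority = 1 if 2 * sum(ballots) > len(ballots) else 0
        errors += int(majority != label)
        counted += 1
    return errors / counted if counted else float("nan")


rng = random.Random(0)
in_bag, oob = bootstrap_with_oob(1000, rng)
oob_fraction = len(oob) / 1000  # typically close to one third
print(f"left-out fraction: {oob_fraction:.3f}")
```

In a real experiment the `votes` would come from the trained base classifiers; here they are just an input to the estimator.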
BINARY-TO-BINARY MAPPING
• 2-class problem
• B parallel base classifiers
$$X_m = (x_{m1}, x_{m2}, \ldots, x_{mB}), \qquad \omega_m = f(X_m)$$
where m = 1, …, μ and μ is the number of patterns; $x_{mi}, \omega_m \in \{0,1\}$, i = 1, …, B
• f is an incompletely specified & noisy function
CLASS SEPARABILITY MEASURE
2-CLASS
calculated over pairs of patterns chosen from different classes
Example: 1 indicates correct classification
0 0 1 1 1 0 0 1 1 0 class 1
1 1 1 0 1 0 0 1 1 0 class 2
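CLASS SEPARABILITY MEASURE
calculated over pairs of patterns p & q chosen from different classes
$$\tilde{N}^{ab}_{pq} = \sum_{j=1}^{B} \left( \psi^{a}_{pj} \wedge \psi^{b}_{qj} \right), \qquad a, b \in \{0,1\}, \quad \psi^{1} = y, \; \psi^{0} = \bar{y}$$
$$\sigma'_p = \frac{1}{K_1} \sum_{q} \tilde{N}^{11}_{pq} - \frac{1}{K_2} \sum_{q} \tilde{N}^{00}_{pq}$$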
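As a worked example of the pair counts, the sketch below takes the two 10-classifier correctness vectors from the example (1 indicates correct classification) and counts how many classifiers fall into each of the four (a, b) combinations. The function name `pair_counts` is hypothetical; only the counts are computed, since the normalisation used in the separability measure is not spelled out here.

```python
def pair_counts(y_p, y_q):
    """Count classifiers by (correct on p?, correct on q?) for a pair of patterns.

    y_p, y_q: 0/1 correctness vectors over the B base classifiers,
    where p and q are patterns taken from different classes.
    """
    counts = {"11": 0, "10": 0, "01": 0, "00": 0}
    for a, b in zip(y_p, y_q):
        counts[f"{a}{b}"] += 1
    return counts


y_p = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]  # pattern from class 1
y_q = [1, 1, 1, 0, 1, 0, 0, 1, 1, 0]  # pattern from class 2
counts = pair_counts(y_p, y_q)
print(counts)  # {'11': 4, '10': 1, '01': 2, '00': 3}
```

Four classifiers are correct on both patterns and three are wrong on both, so the four counts sum to B = 10.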
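PAIR-WISE DIVERSITY MEASURES Q
Use counts between classifier pairs:
$$N^{ab}_{ij} = \sum_{m=1}^{\mu} \psi^{a}_{mi} \wedge \psi^{b}_{mj}, \qquad a, b \in \{0,1\}, \quad \psi^{1} = y, \; \psi^{0} = \bar{y}$$
where the sum runs over the μ patterns
Giving N11 N10 N01 N00
$$Q_{ij} = \frac{N^{11} N^{00} - N^{01} N^{10}}{N^{11} N^{00} + N^{01} N^{10}}$$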
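The Q statistic for one pair of classifiers can be computed directly from the four counts. The sketch below assumes each classifier is described by a 0/1 correctness vector over the patterns (1 = correct); the function name `q_statistic` and the example vectors are made up for illustration.

```python
def q_statistic(y_i, y_j):
    """Pair-wise Q between classifiers i and j from 0/1 correctness vectors."""
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(y_i, y_j):
        if a and b:
            n11 += 1
        elif a and not b:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    denominator = n11 * n00 + n01 * n10
    if denominator == 0:
        raise ValueError("Q is undefined when N11*N00 + N01*N10 is zero")
    return (n11 * n00 - n01 * n10) / denominator


y_i = [1, 1, 1, 0, 0, 1, 0, 1]
y_j = [1, 0, 1, 1, 0, 1, 0, 0]
q = q_statistic(y_i, y_j)  # N11=3, N10=2, N01=1, N00=2
print(q)  # 0.5
```

Q ranges from -1 to 1. Classifiers that tend to be right and wrong on the same patterns give positive values; classifiers that err on different patterns give negative values.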
EXPERIMENTS 2-CLASS
100 single hidden-layer MLP base classifiers
Levenberg-Marquardt training, default parameters
Systematic variation of epochs and nodes
Different random starting weights + bootstrapping
Datasets: random 20/80 train/test split (10 runs), with added classification noise to encourage overfitting
DATASET #pat #class #con #dis
cancer 699 2 0 9
card 690 2 6 9
credita 690 2 3 11
diabetes 768 2 8 0
heart 920 2 5 30
ion 351 2 31 3
vote 435 2 0 16
Figure: Mean test error rates, OOB estimates, and measures σ′, Q for Diabetes 20/80 with [2,4,8,16] nodes
Figure: Mean test error, σ′, Q over seven 20/80 two-class datasets using 8 hidden-node bootstrapped base classifiers for [0,20,40] % noise
MULTI-CLASS ECOC
Coding step:
– Map training patterns into two super-classes according to the 1’s and 0’s in ECOC matrix Z
– Train base classifier on 2-class decompositions
Decoding step:
– Assign test pattern according to minimum distance to row of ECOC matrix Z
MULTI-CLASS ECOC CODE MATRIX
Example ECOC matrix:
0 1 1 1 .............
1 0 0 0 .............
0 1 0 1 .............
1 0 1 0 .............
1 1 0 1 .............
0 0 1 0 .............
each row is a code word
each column defines two super-classes
6 classes
Distance-based decoding rules (e.g. Hamming, L1)
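The coding and decoding steps can be sketched in a few lines. This is a minimal illustration, assuming the code matrix is truncated to the four columns visible in the example (the real matrix has more) and that decoding uses the L1 distance, which equals the Hamming distance when outputs are hard 0/1 decisions. The function names and the sample outputs are hypothetical.

```python
# First four columns of the six-class example matrix (truncated for illustration).
Z = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]


def encode_labels(labels, code_matrix, column):
    """Coding step: relabel each pattern with its class's bit in one column of Z."""
    return [code_matrix[c][column] for c in labels]


def decode(outputs, code_matrix):
    """Decoding step: index of the code word closest (L1 distance) to the outputs."""
    distances = [
        sum(abs(z - o) for z, o in zip(row, outputs)) for row in code_matrix
    ]
    return min(range(len(code_matrix)), key=distances.__getitem__)


# Super-class targets for the base classifier trained on column 2.
super_labels = encode_labels([0, 3, 5, 1], Z, column=2)
print(super_labels)  # [1, 1, 1, 0]

# Soft outputs of the four base classifiers for one test pattern.
predicted = decode([0.9, 0.2, 0.8, 0.1], Z)
print(predicted)  # 3 (code word 1 0 1 0)
```

With only four columns, some code words differ in a single bit, so one base-classifier error can change the decision. Longer code words give more room for error correction.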
[Figure: Pattern Space, ECOC Ensemble of MLP base classifiers, and Target Classes 1, 2, 3, with example code words such as 10…1, 01…0, 11…1]
*** OOB uses only classifiers that are not used in training
Experiments Multi-class
200 base classifiers
Random ECOC matrices
20/80 train/test split repeated 10 times
Levenberg-Marquardt training algorithm
DATASET #pat #class #con #dis
dermatology 366 6 1 33
ecoli 336 8 5 2
glass 214 6 9 0
iris 150 3 4 0
segment 2310 7 19 0
soybean 683 19 0 35
thyroid 7200 3 6 15
vehicle 846 4 18 0
vowel 990 11 10 1
wave 5000 3 21 0
yeast 1484 10 7 1
Yeast 2/4/8/16 nodes 1-69 epochs
Feature Ranking
• Intended for large number of features – small sample
• One vs multi-dimensional
• Context of MCS - base classifier vs combiner
• Simple one-dim methods
• Sophisticated multi-dim search methods
Modulus of MLP weights – ‘product of weights’
$$w_i = \sum_{j} \left| W^{1}_{ij} \, W^{2}_{j} \right|$$
where W1 is the first layer weight matrix and W2 is the output weight vector
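A small sketch of the product-of-weights ranking for a single-hidden-layer MLP with one output follows. The index convention is an assumption: `w1[i][j]` is taken to be the weight from input feature i to hidden node j, and `w2[j]` the weight from hidden node j to the output. The weights are invented numbers, not trained values.

```python
def rank_features_by_weights(w1, w2):
    """Score feature i as sum_j |w1[i][j] * w2[j]| and sort features by score."""
    scores = [
        sum(abs(w_ij * w2[j]) for j, w_ij in enumerate(row)) for row in w1
    ]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


w1 = [
    [0.5, -1.0],  # feature 0
    [0.1, 0.2],   # feature 1
    [-2.0, 0.5],  # feature 2
]
w2 = [1.0, -0.5]

order, scores = rank_features_by_weights(w1, w2)
print(order)  # [2, 0, 1]
```

Because of the modulus, large negative weights count as much as large positive ones. Feature 2 ranks first here mainly through its strong connection to the first hidden node.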
Recursive Feature Elimination
A simple algorithm for eliminating irrelevant features that operates recursively as follows:
1) Rank the features according to a suitable feature ranking method
2) Identify and remove the r least ranked features
If r>1, usually desirable from an efficiency viewpoint, a feature subset ranking is obtained.
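The two steps above can be sketched as a loop. This is an illustration under stated assumptions: `rank_fn` is a hypothetical callback standing in for any feature ranking method (in practice it would retrain the ensemble on the remaining features and return one score per feature), and the fixed relevance table is a toy stand-in.

```python
def recursive_feature_elimination(features, rank_fn, r=1, min_features=1):
    """Repeatedly rank the remaining features and drop the r lowest-ranked ones.

    rank_fn(remaining) must return one score per remaining feature,
    higher meaning more relevant. Returns the feature subset after each step.
    """
    remaining = list(features)
    history = [list(remaining)]
    while len(remaining) - r >= min_features:
        scores = rank_fn(remaining)
        # Positions of the r least-ranked features in the current subset.
        worst = set(sorted(range(len(remaining)), key=lambda k: scores[k])[:r])
        remaining = [f for k, f in enumerate(remaining) if k not in worst]
        history.append(list(remaining))
    return history


# Toy ranking: a fixed relevance table instead of a retrained classifier.
relevance = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.3}


def toy_rank(feats):
    return [relevance[f] for f in feats]


print(recursive_feature_elimination("abcd", toy_rank, r=1))
# [['a', 'b', 'c', 'd'], ['a', 'c', 'd'], ['a', 'c'], ['a']]
print(recursive_feature_elimination("abcd", toy_rank, r=2))
# [['a', 'b', 'c', 'd'], ['a', 'c']]
```

With r=1 the history gives a full feature ranking. With r=2 it takes half as many ranking passes but only gives a ranking of feature subsets.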
Mean test error rates, Bias, Variance for RFE MLP ensemble over seven 2-class datasets with 20/80, 10/90, 5/95 train/test splits
Yeast RFE for [20/80 10/90 5/95]
Face Recognition - ORL Database
400 images of forty faces - 40 class identification problem
Variation in lighting, facial hair, pose ….
Controlled background with subjects upright frontal
No need for face detection so fair comparison
We use 40-dim PCA + 20-dim LDA
Random 50/50 train/test split
16 hidden node MLP L-M base classifiers (x200)
Expts repeated twenty times with 40 x 200 ECOC code matrix
ORL Database - Results
Test error, σ′, Q for ORL 50/50 database using 16 hidden-node base classifiers for [0,20,40] % classification noise.
Facial Action Unit (FACS)
Difficult because depends on age, ethnicity, gender, and occlusions due to cosmetics, hair, glasses
FACS categorises deformation and motion into visual classes
Decouples interpretation from individual actions
Requires skilled practitioners
Small sample size problem
– Large #features and small #training patterns
Cohn-Kanade Database
• frontal camera from 100 university students
• contains posed (as opposed to the more difficult spontaneous) expression sequences
• only the last image is AU coded
• combinations of AUs, in some cases non-additive
• Upper face AUs: au1 au2 au4 au5 au6 au7
Design Decisions
a) All image sequences of size 640 x 480 chosen from the database
b) Last image in sequence (no neutral) giving 424 images, 115 containing au1
c) Full image resolution, no compression
d) Manually located eye centres plus rotation/scaling into 2 common eye coordinates
e) Window extracted of size 150 x 75 pixels centred on eye coordinates
f) Forty Gabor filters [18], five spatial frequencies at five orientations with top 4 principal components for each Gabor filter, giving a 160-dimensional feature vector
g) Comparison of feature selection schemes described in Section 3
h) Comparison of MLP ensemble and Support Vector Classifier
i) Random training/test split of 90/10 and 50/50 repeated twenty times and averaged
ID          sc1  sc2  sc3    sc4  sc5  sc6  sc7    sc8  sc9    sc10  sc11  sc12
superclass  {}   1,2  1,2,5  4    6    1,4  1,4,7  4,7  4,6,7  6,7   1     1,2,4
#patterns   149  21   44     26   64   18   10     39   16     7     6     4
Table: ECOC super-classes of action units and number of patterns
       2-class Error %   2-class ROC   ECOC Error %   ECOC ROC
au1    8.0/16/28         0.97/16/36    9.0/4/36       0.94/4/17
au2    2.9/1/22          0.99/16/36    3.2/16/22      0.97/1/46
au4    8.5/16/36         0.95/16/28    9.0/1/28       0.95/4/36
au5    5.5/1/46          0.97/1/46     3.5/1/36       0.98/1/36
au6    10.3/4/36         0.94/4/28     12.5/4/28      0.92/1/28
au7    10.3/1/28         0.92/16/60    11.6/4/46      0.92/1/36
mean   7.6               0.96          8.1            0.95
Table 3: Mean best error rates (%) and area under ROC, each shown as value/#nodes/#features, for au classification 90/10 with optimized PCA features and MLP ensemble
Conclusion
Measures may be used to optimise base classifier parameters without validation
OOB estimate can select optimal features
– Even for Ensemble OOB
Multi-class uses OOB with ECOC
Modulus of MLP weights is a simple feature-ranking method that works well with RFE
THANK YOU
Feature ranking schemes compared
• RFE with MLP weights
• RFE with noisy bootstrap
– Extends training set by resampling with noise
• Boosting single feature each iteration
• One-dimensional class-separability
– Trace(SW^-1 * SB), within- & between-class scatter
• SFFS (Sequential Floating Forward Search)
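For a single feature, the scatter matrices in Trace(SW^-1 * SB) reduce to scalars, so the criterion becomes the ratio of between-class to within-class scatter. The sketch below assumes unnormalised scatter sums (sums of squared deviations, with class means weighted by class size); the slides do not state which normalisation was used, and the function name `one_dim_separability` is hypothetical.

```python
def one_dim_separability(values, labels):
    """Between-class scatter divided by within-class scatter for one feature."""
    overall_mean = sum(values) / len(values)
    s_w = s_b = 0.0
    for c in sorted(set(labels)):
        xs = [v for v, l in zip(values, labels) if l == c]
        mean_c = sum(xs) / len(xs)
        s_w += sum((x - mean_c) ** 2 for x in xs)       # spread inside class c
        s_b += len(xs) * (mean_c - overall_mean) ** 2   # class mean vs overall mean
    return s_b / s_w if s_w > 0 else float("inf")


labels = [0, 0, 0, 1, 1, 1]
feature_a = [1, 2, 3, 7, 8, 9]    # classes well separated
feature_b = [1, 5, 9, 2, 6, 10]   # classes heavily overlapped

score_a = one_dim_separability(feature_a, labels)
score_b = one_dim_separability(feature_b, labels)
print(score_a, score_b)  # 13.5 0.0234375
```

Ranking features by this score evaluates each feature in isolation, which is what makes the method one-dimensional: interactions between features are ignored.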
perceptron-ensemble classifier
             rfenn  rfenb  1dim  SFFS  boost
Mean 20/80   15.1   14.6   14.2  15.4  15.4
Mean 10/90   16.3   16.3   16.6  18.0  17.6
Mean 5/95    18.4   18.5   20.0  21.3  21.3
Table: Mean best error rates for seven two-class problems (20/80, 10/90, 5/95 train/test) with five feature-ranking schemes
The extended M2VTS (XM2VTS) database
• Contains 295 subjects
• Recorded in four separate sessions over 5 months
• Experimental protocol assigns 200 clients and 95 impostors.
• 3 training, 3 evaluation and 2 test images.
• Impostor set partitioned into 25 evaluation and 70 test impostors
• Features are extracted using PCA + 199-dim LDA
Distance based combination
Use ECOC with 200 x 512 matrix
To test whether a client claim is authentic, use the average distance (L1 norm) between the vector y and the elements of the set of class i:
$$d_i(y) = \frac{1}{N} \sum_{l=1}^{N} \sum_{j=1}^{b} \left| y^{l}_{j} - y_{j} \right|$$
where $y_j$ is the jth binary classifier output, and $y^{l}_{j}$ is the jth classifier output for the lth member of class i.
distance is checked against a decision threshold
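The distance-based verification rule can be sketched as follows. The stored client outputs, the probe vectors and the threshold value 1.5 are all invented for illustration; in the protocol described here the threshold would be set on the evaluation set.

```python
def class_distance(y, class_outputs):
    """Average L1 distance between y and the N stored output vectors of one client."""
    total = sum(
        sum(abs(y_l_j - y_j) for y_l_j, y_j in zip(y_l, y))
        for y_l in class_outputs
    )
    return total / len(class_outputs)


def accept_claim(y, class_outputs, threshold):
    """Accept the identity claim when the average distance is within the threshold."""
    return class_distance(y, class_outputs) <= threshold


# Hypothetical stored outputs of b = 4 binary classifiers for N = 2 client images.
client_outputs = [[1, 0, 1, 1], [1, 0, 0, 1]]
threshold = 1.5  # assumed value, for illustration only

genuine = [0.9, 0.1, 0.8, 1.0]
impostor = [0.0, 1.0, 0.0, 0.0]

print(accept_claim(genuine, client_outputs, threshold))   # True  (distance 0.7)
print(accept_claim(impostor, client_outputs, threshold))  # False (distance 3.5)
```

Raising the threshold lowers the false rejection rate at the cost of more false acceptances, which is the trade-off behind the FA and FR figures.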
FA 1.3% FR 0.8%
16 node MLP-ensemble classifier
rfenn rfenb 1dim SFFS boost
10.0/28 10.9/43 10.9/43 12.3/104 11.9/43
Linear SVC classifier
rfesvc rfenb- 1dim SFFS boost
11.6/28 12.1/28 11.9/67 13.9/67 12.4/43
Mean best error rates (%)/number of Gabor features for au1 classification 90/10 with five feature ranking schemes
Windeatt T. and Ghaderi R., Coding and Decoding Strategies for multiclass learning problems, Information Fusion, 4(1), 2003, pp 11-21.
Windeatt T, Vote Counting Measures for Ensemble Classifiers, Pattern Recognition, 36(12), 2003, pp 2743-2756.
J. Kittler, R. Ghaderi, T. Windeatt and J. Matas Face verification via error correcting output codes, Image and Vision Computing, Volume 21, Issues 13-14, 1 December 2003, Pages 1163-1169.
T. Windeatt, Diversity Measures for Multiple Classifier System Analysis and Design, Information Fusion, 6 (1), 2004, 21-36.
T. Windeatt, Accuracy/ Diversity and Ensemble Classifier Design, IEEE Trans Neural Networks, 17(4), July, 2006.
R. S. Smith, T. Windeatt, Decoding Rules for ECOC, Proc. 6th Int. Workshop Multiple Classifier Systems, Editors: N. C. Oza, R. Polikar, J. Kittler, F. Roli, Seaside, Calif, USA, June 2005, Lecture notes in computer science, Springer-Verlag, pp 53-63.
M. Prior, T. Windeatt, Over-fitting in Ensembles of Neural Network Classifiers within ECOC frameworks, Proc. 6th Int. Workshop Multiple Classifier Systems, Editors: N. C. Oza, R. Polikar, J. Kittler, F. Roli, Seaside, Calif, USA, June 2005, Lecture notes in computer science, Springer-Verlag, pp 286-295.
T. Windeatt, Ensemble Neural Classifier Design for Face Recognition, European Symposium on Artificial Neural Networks, ESANN2007, Bruges, April 2007.
T. Windeatt, M. Prior, Stopping Criteria for Ensemble-based Feature Selection, Proc. 7th Int. Workshop Multiple Classifier Systems, Prague May 2007, Lecture notes in computer science, Springer-Verlag, pp
T. Windeatt, M. Prior, N. Effron, N. Intrator, Ensemble-based Feature Selection Criteria, Proc. Conference on Machine Learning Data Mining MLDM2007, Leipzig, July 2007.