Challenge Submissions for the Feature Extraction Class Georg Schneider ([email protected])
MISSION
In the class, different algorithms for feature extraction and selection were presented. To gain practical experience with these methods, we experimented on real datasets taken from the NIPS 2003 feature selection challenge. Starting from a given baseline model, different algorithms and modifications were tried. The goal was to outperform the baseline model, or even the best challenge entry.

A Matlab® framework was provided that contained code for different learning objects. Thanks to its modular structure, it was convenient to build models from different algorithms and to try various combinations of them.

DATASET / BASELINE MODEL / IMPROVEMENTS / MY MODEL

GISETTE
Baseline model:
my_classif=svc({'coef0=1', 'degree=3', 'gamma=0', 'shrinkage=1'});
my_model=chain({normalize, s2n('f_max=1000'), my_classif});
Improvements:
- Do feature selection before normalization
- Smooth the image before feature selection
- Find the optimal number of features using cross-validation
- Tune the classifier by modifying its parameters
My model:
my_classif=svc({'coef0=0.5', 'degree=5', 'gamma=0', 'shrinkage=1'});
my_model=chain({convolve(exp_ker({'dim1=13', 'dim2=13'})), s2n('f_max=2000'), normalize, my_classif})

DEXTER
Baseline model:
my_classif=svc({'coef0=1', 'degree=1', 'gamma=0', 'shrinkage=0.1'});
my_model=chain({s2n('f_max=300'), normalize, my_classif})
Improvements:
- Use the training and validation sets for training
- Find the optimal number of features using cross-validation
- Vary the shrinkage to further improve the error
My model:
my_classif=svc({'coef0=1', 'degree=1', 'gamma=0', 'shrinkage=0.2'});
my_model=chain({s2n('f_max=4000'), normalize, my_classif})

MADELON
Baseline model:
my_classif=svc({'coef0=1', 'degree=0', 'gamma=1', 'shrinkage=1'});
my_model=chain({probe(relief,{'p_num=2000', 'pval_max=0'}), standardize, my_classif})
Improvements:
- Experiment with the probe method using a relief filter (no better results)
- Increase the width of the RBF kernel
My model:
my_classif=svc({'coef0=1', 'degree=0', 'gamma=0.5', 'shrinkage=1'});
my_model=chain({probe(relief,{'p_num=2000', 'pval_max=0'}), standardize, my_classif})

ARCENE
Baseline model:
my_svc=svc({'coef0=1', 'degree=3', 'gamma=0', 'shrinkage=0.1'});
my_model=chain({standardize, s2n('f_max=1100'), normalize, my_svc})
Improvements:
- Use the training and validation sets for training
- Adjust the number of features (not much effect)
- Increase the shrinkage
My model:
my_svc=svc({'coef0=1', 'degree=3', 'gamma=0', 'shrinkage=0.9'});
my_model=chain({standardize, s2n('f_max=1000'), normalize, my_svc})

DOROTHEA
Baseline model:
my_model=chain({TP('f_max=1000'), naive, bias});
Improvements:
- Keep more features with TP
- Chain with s2n feature selection to further decrease the number of features
My model:
my_model=chain({TP('f_max=2000'), normalize, s2n('f_max=800'), naive, bias});
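The models above chain a feature-selection filter with normalization and a classifier. As an illustration only (a Python sketch of the idea, not the CLOP framework's actual code), the s2n filter ranks each feature by the signal-to-noise criterion |mu+ - mu-| / (sigma+ + sigma-) and keeps the f_max best, after which the reduced examples are normalized:

```python
import numpy as np

def s2n_scores(X, y):
    """Signal-to-noise criterion |mu+ - mu-| / (sigma+ + sigma-) per feature,
    the ranking behind s2n('f_max=...'). Sketch only, not CLOP code."""
    pos, neg = X[y == 1], X[y == -1]
    eps = 1e-12  # guard against zero-variance features
    return np.abs(pos.mean(0) - neg.mean(0)) / (pos.std(0) + neg.std(0) + eps)

def s2n_select(X, y, f_max):
    """Indices of the f_max highest-scoring features."""
    return np.argsort(s2n_scores(X, y))[::-1][:f_max]

def normalize(X):
    """Scale every example to unit Euclidean norm (one plausible reading of
    the framework's 'normalize' step)."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# toy chain: feature 0 separates the classes, feature 1 is noise
X = np.array([[5.0, 0.1], [4.8, -0.2], [-5.1, 0.0], [-4.9, 0.3]])
y = np.array([1, 1, -1, -1])
keep = s2n_select(X, y, f_max=1)
X_reduced = normalize(X[:, keep])
print(keep)  # -> [0]
```

Selecting features before normalizing mirrors the "feature selection before normalization" improvement used for GISETTE.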
CONCLUSION
COMPARISON
[Five bar charts, one per dataset, comparing the balanced error rate of the baseline model ("Base"), the best challenge entry ("Best"), and my model ("My").]
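The comparison uses the balanced error rate (BER), the challenge's scoring metric: the average of the error rates on the positive and the negative class. A minimal computation, using its standard definition:

```python
def balanced_error_rate(y_true, y_pred):
    """Balanced error rate: mean of the per-class error rates, so a
    trivial majority-class predictor cannot score well on skewed data."""
    pos = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
    neg = [(t, p) for t, p in zip(y_true, y_pred) if t == -1]
    err_pos = sum(1 for t, p in pos if p != t) / len(pos)
    err_neg = sum(1 for t, p in neg if p != t) / len(neg)
    return 0.5 * (err_pos + err_neg)

# 1 of 4 positives wrong (25%), 1 of 2 negatives wrong (50%)
y_true = [1, 1, 1, 1, -1, -1]
y_pred = [1, 1, 1, -1, -1, 1]
print(balanced_error_rate(y_true, y_pred))  # -> 0.375
```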
Feature selection is crucial for the performance of classifiers. Assessing the significance of features leads to better generalization and thus to a smaller error rate. Even a simple feature selection criterion such as the signal-to-noise ratio can result in better classification of the data. On the GISETTE dataset, prior knowledge about the data enabled us to use specialized methods (image smoothing) to obtain better performance.

Further work could analyse the structure of the datasets in order to find well-performing models for a specific type of data.
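The smoothing exploited for GISETTE corresponds to the convolve(exp_ker({'dim1=13', 'dim2=13'})) step above. As a rough sketch (the kernel width tau and the 28x28 toy image are assumptions, not taken from the poster), an exponential kernel blur can be written as:

```python
import numpy as np

def exp_kernel(dim1, dim2, tau=2.0):
    # Separable exponential kernel exp(-|d|/tau), normalized to sum to 1.
    # 'tau' is an assumed width parameter.
    r = np.abs(np.arange(dim1) - dim1 // 2)
    c = np.abs(np.arange(dim2) - dim2 // 2)
    k = np.exp(-r / tau)[:, None] * np.exp(-c / tau)[None, :]
    return k / k.sum()

def smooth(img, kernel):
    # Plain 2-D convolution, 'same' output size with zero padding.
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (padded[i:i + kh, j:j + kw] * kernel).sum()
    return out

img = np.zeros((28, 28))
img[14, 14] = 1.0                       # a single bright pixel
blurred = smooth(img, exp_kernel(13, 13))
```

Smoothing spreads each pixel's value over its neighbourhood, so the signal-to-noise filter applied afterwards sees less pixel noise.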