Selective Gaussian Naive Bayes Model for Diffuse Large-B-Cell Lymphoma Classification: Some Improvements in Preprocessing and Variable Elimination
Barcelona, July 2005
Andrés Cano, F. Javier García, Andrés Masegosa and Serafín Moral
Dept. Computer Science and Artificial Intelligence
University of Granada
– p. 1/13
Introduction: Gene Expression Data
MicroArray: a biochip that measures the expression level of thousands of genes in a single experiment.
It is a micromatrix.
Each row contains genetic material of a given gene.
Each tumoral pattern is put in a column and hybridized (each cell is colored).
The micromatrix is scanned and the data are obtained.
Hybridization of the Lymphochip
– p. 2/13
Diffuse Large-B-Cell Lymphoma Classification
60% of patients with Diffuse Large-B-Cell Lymphoma (DLBCL) succumb to this disease.
Alizadeh et al. (2000) discovered, using the Lymphochip, that DLBCL comprises two different diseases: GCB, with a high survival index, and ABC, with a low survival index.
They provide a data set with 42 cases (21 GCB and 21 ABC), each one with the measured expression levels of 4096 genes.
The problem, using this sort of data set, is to:
Build an automatic classifier for the prediction of the subtype of a DLBCL pattern.
Find a minimal subset of genes that supports this classification.
– p. 3/13
Bayesian Classification of gene expression
Data Domain:
Continuous Data or Discretized Data. An example conditional table for a discretized gene X1 with states x1, x2:

p(X1|C)   C1    C2
x1        0.3   0.8
x2        0.7   0.2

Data Dependences:
Naive Bayes Structure: the class C is the only parent of every gene X1, X2, X3.
TAN Structure: besides the class C, each gene may have one other gene as a parent (tree-augmented naive Bayes).
– p. 4/13
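As a concrete illustration of the continuous-data branch above, here is a minimal Gaussian naive Bayes sketch; all function names and the toy data are ours, not from the slides. Each gene is modelled per class by a normal density, so the expression levels need not be discretized.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit(X, y):
    """Estimate per-class prior, mean and variance for every feature."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        n, d = len(rows), len(rows[0])
        means = [sum(r[j] for r in rows) / n for j in range(d)]
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n + 1e-9
                     for j in range(d)]
        model[c] = (len(rows) / len(X), means, variances)
    return model

def predict(model, x):
    """Pick the class maximizing log prior + sum_i log p(x_i | c)."""
    best, best_score = None, float("-inf")
    for c, (prior, means, variances) in model.items():
        score = math.log(prior) + sum(
            math.log(gaussian_pdf(xi, m, v))
            for xi, m, v in zip(x, means, variances))
        if score > best_score:
            best, best_score = c, score
    return best
```

The naive Bayes structure of the previous slide is visible in `predict`: the per-gene densities are simply multiplied (summed in log space), i.e. the genes are assumed independent given the class.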
Feature Selection with Gene Expression Data
These data sets have:
High dimensionality: between 4,000 and 20,000 genes.
Low number of cases: between 40 and 200 cases.
FSS Problems:
High risk of overfitting.
Low-reliability results.
Solutions:
Filter methods:
− Select the best features using a reasonable criterion.
− Use a criterion independent of the classifier.
− Advantage: very efficient.
− Problem: the criterion is not associated with the problem.
Wrapper methods:
− Select the best features using the final criterion.
− For each subset of features, try to solve the problem.
− Advantage: very powerful.
− Problem: very time consuming.
Filter Method + Wrapper Method.
– p. 5/13
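A filter method of the kind described above can be sketched as follows. The scoring criterion here (a t-statistic-like class-separation score) is our assumption for illustration; the slides do not name the exact criterion used.

```python
import math

def t_like_score(values, labels, positive):
    """|mean_pos - mean_neg| / pooled std: a simple class-separation score."""
    pos = [v for v, l in zip(values, labels) if l == positive]
    neg = [v for v, l in zip(values, labels) if l != positive]
    def mean(xs): return sum(xs) / len(xs)
    def var(xs, m): return sum((x - m) ** 2 for x in xs) / len(xs)
    mp, mn = mean(pos), mean(neg)
    pooled = math.sqrt(var(pos, mp) + var(neg, mn)) + 1e-9
    return abs(mp - mn) / pooled

def filter_top_n(X, y, n, positive):
    """Score every gene (column of X) independently, keep the n best."""
    d = len(X[0])
    scores = [t_like_score([row[j] for row in X], y, positive) for j in range(d)]
    return sorted(range(d), key=lambda j: scores[j], reverse=True)[:n]
```

This shows why filters are very efficient (one pass per gene, no classifier training) and why the criterion is not associated with the problem: the score never sees the classifier that will actually be used.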
Preordering the features
FSS Search (greedy forward selection over the non-selected features):
Step 1: Search in the non-selected features, evaluating each candidate (e.g. X1: accuracy = 83%, X2: 89%, X8: 84%).
Step 2: Select the best node (X3): accuracy = 91%.
Step 3: Follow the search (X1: 88%, X2: 91%, ...).
Repeat until the stop condition: final subset {X3, X7, X5}, accuracy = 93%.
Changes:
Introduction of a preorder in the features: Filter Preorder or Accuracy Preorder.
Limit the search space to the N first features of the preorder.
Limited FSS Search:
Step 1: Search only in the non-selected features of the limited search space (X3: accuracy = 88%, X5: 88%, X7: 84%).
Step 2: Select the best node (X3): accuracy = 88%.
Step 3: Follow the search (X5: 89%, X7: 87%).
Repeat until the stop condition: final subset {X3, X1, X5}, accuracy = 95%.
– p. 6/13
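The limited search walkthrough above can be sketched as a small loop. `evaluate` stands in for the wrapper's accuracy estimate (e.g. a cross-validated classifier accuracy) and is a placeholder, not the authors' exact procedure.

```python
def limited_forward_selection(preorder, evaluate, limit):
    """Greedy forward search restricted to the `limit` first non-selected
    features of the preorder; stops when no candidate improves accuracy."""
    selected = []
    best_acc = evaluate(selected)
    while True:
        # Step 1: only the N-first non-selected features are candidates.
        remaining = [f for f in preorder if f not in selected][:limit]
        step_best, step_acc = None, best_acc
        for f in remaining:
            acc = evaluate(selected + [f])
            if acc > step_acc:
                step_best, step_acc = f, acc
        if step_best is None:           # stop condition: no improvement
            return selected, best_acc
        selected.append(step_best)      # Step 2: keep the best node
        best_acc = step_acc             # Step 3: follow the search
```

The preorder matters because only the `limit` highest-ranked remaining features are ever evaluated, which is what cuts the wrapper's cost on thousands of genes.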
Irrelevant Variable Elimination
Heuristic for irrelevant features:
A classifier X is built on the train set from the selected features X1, X2, X3; each train case is marked as right classified or non-right classified.
Each candidate feature is compared against X through a single-feature classifier:
Classifier Y: Y is irrelevant with respect to X.
Classifier Z: Z is not irrelevant with respect to X.
– p. 7/13
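The slides depict this heuristic only graphically. One plausible reading, which is our assumption and not the authors' exact definition, is a subset test on the correctly classified training cases: a candidate is irrelevant with respect to the current classifier if its single-feature classifier gets right only cases the current classifier already gets right.

```python
# Hedged sketch: the subset criterion below is our interpretation of the
# diagram, not a definition taken from the slides.

def right_set(predictions, labels):
    """Indices of correctly classified training cases."""
    return {i for i, (p, y) in enumerate(zip(predictions, labels)) if p == y}

def is_irrelevant(candidate_preds, current_preds, labels):
    """True if the candidate classifier adds no correctly classified case
    beyond those the current classifier already covers."""
    return right_set(candidate_preds, labels) <= right_set(current_preds, labels)
```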
Limited Search with Variable Elimination
Wrapper Search with Variable Elimination:
Step 1: Search in the non-selected features of the limited search space (X3, X5, X7 are evaluated).
Step 2: Select the best node (X3).
Step 3: Eliminate the features that are irrelevant with respect to X3; the limited search space is refilled from the preorder.
Step 4: Follow the search (X7, X2 are evaluated).
Repeat until the stop condition.
Final subset selected: {X3, X6, X4}. Irrelevant features: {X5, X1, X8, X7}. Not selected features: {X9, X10}.
– p. 8/13
Classifying Diffuse Large-B-Cell Lymphoma
Data Base I: taken from the work of Alizadeh et al. (2000).
42 samples (21 GCB + 21 ABC).
348 genes.
Validation scheme: leave-one-out validation.
Data Base II: taken from the work of Wright et al. (2004).
217 samples (134 GCB + 83 ABC).
8503 genes.
Validation scheme:
− 10 train and test sets of equal size.
− Each train set is reduced by a filter method.
− Number of filtered genes: 78.7 ± 4.4.
– p. 9/13
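The leave-one-out scheme used for Data Base I can be sketched generically: with only 42 samples, each case is held out once, the classifier is trained on the remaining 41, and accuracy is the fraction of held-out cases predicted correctly. `train` and `predict` are placeholders for any classifier.

```python
def leave_one_out_accuracy(X, y, train, predict):
    """Leave-one-out validation: one model per held-out case."""
    hits = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]   # all cases except case i
        y_train = y[:i] + y[i + 1:]
        model = train(X_train, y_train)
        hits += predict(model, X[i]) == y[i]
    return hits / len(X)
```

Leave-one-out is the natural choice here because with so few cases no larger held-out test set could be spared without starving the training set.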
Experimental Results I

Feature Preorder
Data Base  Data       Random Preorder  Filter Preorder  Accuracy Preorder
DB1        Accuracy   80.9 ± 4.9       81.0 ± 4.9       92.8 ± 2.1
DB1        No. Genes  4.3 ± 0.5        3.2 ± 0.1        3.8 ± 0.5
DB2        Accuracy   88.9 ± 0.6       91.0 ± 0.4       89.1 ± 0.5
DB2        No. Genes  8.0 ± 3.2        9.0 ± 5.1        7.6 ± 4.0

Preorder Limit
Data Base  Data       LFSS
DB1        Accuracy   92.8 ± 2.1
DB1        No. Genes  3.8 ± 0.3
DB2        Accuracy   91.8 ± 0.4
DB2        No. Genes  7.8 ± 3.0

– p. 10/13
Experimental Results II

Elimination of Irrelevant Features
Data Base  Data       LFSS-VE      LFSS         FSS
DB1        Accuracy   95.2 ± 1.4   92.8 ± 2.1   80.9 ± 4.9
DB1        No. Genes  5.4 ± 0.1    3.8 ± 0.3    4.3 ± 0.5
DB1        No. Eval   1882         2840         74900
DB2        Accuracy   93.0 ± 0.4   91.8 ± 0.4   88.9 ± 0.6
DB2        No. Genes  8.1 ± 5.6    7.8 ± 3.0    8.0 ± 3.2
DB2        No. Eval   1018         1080         8002

Filter Preorder vs Accuracy Preorder
Data Base  Data       LFSS-VE Filter Preorder  LFSS-VE Accuracy Preorder
DB1        Accuracy   88.1 ± 3.3               95.2 ± 1.4
DB1        No. Genes  3.9 ± 0.1                5.4 ± 0.1
DB2        Accuracy   90.7 ± 0.5               93.0 ± 0.4
DB2        No. Genes  7.6 ± 2.7                8.1 ± 5.6

– p. 11/13
Experimental Results III

Results Comparison

Wright et al. classifier − validated in one partition − 27 genes selected.
Test Dataset          Predicted class
True class   ABC    GCB    Unclass.
ABC          38     1      2
GCB          2      57     8

Wrapper + Abduction − validated in 10 partitions − 7.0 genes selected.
Test Dataset          Predicted class
True class   ABC    GCB    Unclass.
ABC          32.7   3.5    4.8
GCB          3.2    58.8   5.0

LFSS-VE − validated in 10 partitions − 8.1 genes selected.
Test Dataset          Predicted class
True class   ABC    GCB    Unclass.
ABC          32.7   1.3    7.0
GCB          1.7    57.4   7.9

– p. 12/13
Conclusions and Future Work
The wrapper technique is a powerful method for supervised classification tasks.
Its main disadvantage is its high computational cost.
In particular, in gene expression data bases, due to their high dimensionality.
LFSS-VE overcomes these disadvantages in the DLBCL classification by using a preordering of the features and a limited search space.
The elimination of irrelevant features is a good method to enhance the performance of a wrapper method.
The future line of work is the validation of our model with other data sets: breast cancer, colon cancer, leukemia, ...
– p. 13/13