Selective Gaussian Naive Bayes Model for Diffuse Large-B-Cell Lymphoma Classification: Some Improvements in Preprocessing and Variable Elimination

Barcelona, July 2005

Andrés Cano, F. Javier García, Andrés Masegosa and Serafín Moral

Dept. Computer Science and Artificial Intelligence, University of Granada


Introduction: Gene Expression Data

Microarray: a biochip that measures the expression level of thousands of genes in a single experiment.

It is a micromatrix.

Each row contains genetic material of a given gene.

Each tumor sample is placed in a column and hybridized (each cell becomes colored).

The micromatrix is scanned and the data are obtained.

[Figure: Hybridization of the Lymphochip]

Diffuse Large-B-Cell Lymphoma Classification

60 % of patients with Diffuse Large-B-Cell Lymphoma (DLBCL) succumb to this disease.

Alizadeh et al. (2000) discovered, using the Lymphochip, that DLBCL comprises two different diseases: GCB, with a high survival index, and ABC, with a low survival index.

They provide a data set with 42 cases, 21 of GCB and 21 of ABC, each with expression-level measurements for 4096 genes.

The problem, using this sort of data set, is to:

Build an automatic classifier that predicts the DLBCL subtype of a sample.

Find a minimal subset of genes that supports this classification.

Bayesian Classification of Gene Expression

Data Domain:

Continuous Data

Discretized Data

p(X1|C)   C1    C2
x1        0.3   0.8
x2        0.7   0.2

Data Dependences:

Naive Bayes Structure

[Diagram: the class C is the only parent of X1, X2 and X3.]

TAN Structure

[Diagram: C is a parent of X1, X2 and X3, with additional arcs between the features.]
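For the continuous case with the Naive Bayes structure, each p(Xi|C) can be modelled as a univariate Gaussian. The following is a minimal sketch of such a classifier, not the authors' implementation: the function names, the variance floor and the toy expression values are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels, var_floor=1e-6):
    """Estimate the class prior and the per-feature mean and variance of each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # var_floor (an assumption) avoids a zero variance on tiny samples.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def log_posterior(model, row):
    """Unnormalised log p(C) + sum_i log N(x_i; mean, variance) for each class."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        scores[label] = score
    return scores


def predict(model, row):
    """Return the class with the highest posterior score."""
    scores = log_posterior(model, row)
    return max(scores, key=scores.get)


# Toy expression values for two genes (made-up numbers, not from the data sets).
rows = [[2.1, 0.3], [1.9, 0.1], [2.3, 0.2], [0.2, 1.8], [0.1, 2.2], [0.4, 2.0]]
labels = ["GCB", "GCB", "GCB", "ABC", "ABC", "ABC"]
model = fit_gaussian_nb(rows, labels)
prediction = predict(model, [2.0, 0.2])
```

Working in log space keeps the product of many small densities from underflowing, which matters once thousands of genes are involved.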

Feature Selection with Gene Expression Data

These data sets have:

High dimensionality: between 4000 and 20000 genes.

Low number of cases: between 40 and 200 cases.

FSS (feature subset selection) problems:

High risk of overfitting.

Low reliability of the results.

Solutions:

Filter methods.

- Select the best features using a reasonable criterion.
- Use an independent criterion.
- Advantage: very efficient.
- Problem: the criterion is not tied to the classification problem.

Wrapper methods.

- Select the best features using the final criterion.
- For each candidate subset of features, try to solve the problem.
- Advantage: very powerful.
- Problem: very time consuming.

Filter Method + Wrapper Method
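A filter method ranks genes with a criterion that does not involve the classifier. The slides do not name the criterion used, so the sketch below assumes a simple signal-to-noise score (difference of class means over the sum of class standard deviations); the function names and the toy values are hypothetical.

```python
from statistics import mean, pstdev


def signal_to_noise(values, labels, positive):
    """|mean_pos - mean_neg| / (std_pos + std_neg) for one gene (assumed criterion)."""
    pos = [v for v, y in zip(values, labels) if y == positive]
    neg = [v for v, y in zip(values, labels) if y != positive]
    spread = pstdev(pos) + pstdev(neg)
    # The small constant guards against division by zero for constant genes.
    return abs(mean(pos) - mean(neg)) / (spread + 1e-12)


def filter_preorder(columns, labels, positive):
    """Return gene names sorted from the highest to the lowest score."""
    scores = {g: signal_to_noise(vals, labels, positive) for g, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up expression values, one list per gene, one entry per sample.
labels = ["GCB", "GCB", "GCB", "ABC", "ABC", "ABC"]
columns = {
    "gene_a": [2.1, 1.9, 2.3, 0.2, 0.1, 0.4],  # separates the classes well
    "gene_b": [1.0, 1.2, 0.9, 1.1, 1.0, 1.2],  # nearly uninformative
    "gene_c": [1.5, 1.0, 2.0, 0.8, 1.2, 0.5],  # weaker separation
}
order = filter_preorder(columns, labels, positive="GCB")
```

The resulting order can then be handed to a wrapper search, which is the combination the last line of the slide refers to.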

Preordering the features

FSS Search:

Non Selected Features: X1 X2 X3 X4 X5 X6 X7 X8

Step 1: Search in Non Selected Features. Each candidate is added in turn to the classifier and the accuracy is measured.

X1: Accuracy = 83 %
X2: Accuracy = 89 %
...
X8: Accuracy = 84 %

Step 2: Select the best Node. X3 is selected: Accuracy = 91 %.

Step 3: Follow the Search. With X3 selected, the remaining candidates are tried again.

X1: Accuracy = 88 %
X2: Accuracy = 91 %
...

Until the Stop Condition. Selected features: X3 X7 X5, Accuracy = 93 %. Non Selected Features: X1 X2 X4 X6 X8.

Changes:

Introduction of a preorder in the features: Filter Preorder or Accuracy Preorder.

Limit the search space to the first N features.

Preordering the features

Limited FSS Search:

Non Selected Features, in preorder: X3 X5 X7 X1 X2 X6 ...

Limited Search Space: only the first features of this list are candidates at each step.

Step 1: Search in Non Selected Features, within the Limited Search Space.

X3: Accuracy = 88 %
X5: Accuracy = 88 %
X7: Accuracy = 84 %

Step 2: Select the best Node. X3 is selected: Accuracy = 88 %.

Step 3: Follow the Search. Non Selected Features: X5 X7 X1 X2 X6 X4 ...

X5: Accuracy = 89 %
X7: Accuracy = 87 %

Until the Stop Condition. Selected features: X3 X1 X5, Accuracy = 95 %. Non Selected Features: X7 X2 X6 X4 ...
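The limited search can be sketched as a greedy forward wrapper that only looks at the first few unselected features of the preorder. This is an illustration under assumptions, not the authors' code: the window size of 3, the stop condition (no candidate improves the score) and the toy scoring function are all hypothetical, and `evaluate` stands in for whatever accuracy estimate is used.

```python
def limited_forward_selection(preorder, evaluate, window=3):
    """Greedy wrapper search restricted to the first `window` unselected features."""
    remaining = list(preorder)
    selected, best_score = [], float("-inf")
    while remaining:
        candidates = remaining[:window]
        scored = [(evaluate(selected + [f]), f) for f in candidates]
        # On ties, max keeps the candidate that comes first in the preorder.
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:  # assumed stop condition: no improvement
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Hypothetical per-feature gains standing in for a real accuracy estimate.
GAIN = {"X3": 0.30, "X5": 0.05, "X7": 0.02, "X1": 0.10, "X2": 0.0, "X6": 0.0}


def toy_accuracy(subset):
    """Made-up score: rises with useful features, penalised by subset size."""
    return 0.5 + sum(GAIN[f] for f in subset) - 0.03 * len(subset)


selected, score = limited_forward_selection(
    ["X3", "X5", "X7", "X1", "X2", "X6"], toy_accuracy, window=3
)
```

Each step costs at most `window` evaluations instead of one per unselected feature, which is where the saving over the unrestricted search comes from.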

Irrelevant Variable Elimination

Heuristic for Irrelevant features:

Classifier X: built on the selected features X1, X2 and X3. It divides the Train Set into Correctly Classified and Misclassified cases.

Classifier Y: built on the single feature Y. Y is irrelevant to X.

Classifier Z: built on the single feature Z. Z is not irrelevant to X.

[Diagram: the Train Set, split into Correctly Classified and Misclassified cases, compared against Classifier Y and Classifier Z.]
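The slide shows the heuristic only as a diagram, so the exact rule is not stated. The sketch below assumes one plausible reading: a candidate feature is irrelevant to the current classifier when the classifier built on that feature alone gets right only cases that the current classifier already gets right. The function name, the rule itself and the case ids are assumptions.

```python
def irrelevant_features(current_correct, candidate_correct):
    """Return the candidates whose single-feature classifier adds nothing new.

    current_correct: set of training-case ids the current classifier gets right.
    candidate_correct: dict mapping a feature name to the set of case ids that
        the classifier built on that feature alone gets right.
    Assumed rule: a candidate is irrelevant when every case it classifies
    correctly is already classified correctly by the current classifier.
    """
    return [
        name
        for name, correct in candidate_correct.items()
        if correct <= current_correct  # subset test
    ]


# Made-up example with ten training cases, ids 0..9.
right_by_x = {0, 1, 2, 3, 4, 5, 6}  # cases 7, 8 and 9 are misclassified by X
right_by_candidate = {
    "Y": {0, 1, 2, 3},  # only cases X already gets right
    "Z": {2, 3, 7, 8},  # recovers two of the cases X gets wrong
}
dropped = irrelevant_features(right_by_x, right_by_candidate)
```

Under this reading Y is dropped from the candidate list and Z stays, which matches the labels on the slide.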

Limited Search with Variable Elimination

Wrapper Search with Variable Elimination:

Not Selected Features, in preorder: X3 X5 X7 X1 X2 X6 ...

Step 1: Search in Not Selected Features, within the Limited Search Space. X3, X5 and X7 are evaluated in turn.

Step 2: Select the best Node. X3 is selected.

Step 3: Elimination of Irrelevant Features with respect to X3. X5 and X1 are removed, leaving X7 X2 X6 ...

Step 4: Follow the Search. Not Selected Features: X7 X2 X6 X8 X4 X9 ... X7 and X2 are evaluated in turn.

Until the Stop Condition.

Final Subset Selected: X3 X6 X4
Irrelevant Features: X5 X1 X8 X7
Not Selected Features: X9 X10

Classifying Diffuse Large-B-Cell Lymphoma

Data Base I: taken from the work of Alizadeh et al. (2000).

42 samples (21 GCB + 21 ABC).

348 genes.

Validation Scheme: leave-one-out validation.

Data Base II: taken from the work of Wright et al. (2004).

217 samples (134 GCB + 83 ABC).

8503 genes.

Validation Scheme:

- 10 Train and Test sets of equal size.

- Each Train set is reduced by a filter method.

- Number of Filtered Genes: 78.7 ± 4.4

Experimental Results I

Feature Preorder

Data Base  Measure        Random Preorder  Filter Preorder  Accuracy Preorder
DB1        Accuracy (%)   80.9 ± 4.9       81.0 ± 4.9       92.8 ± 2.1
DB1        No. Genes      4.3 ± 0.5        3.2 ± 0.1        3.8 ± 0.5
DB2        Accuracy (%)   88.9 ± 0.6       91.0 ± 0.4       89.1 ± 0.5
DB2        No. Genes      8.0 ± 3.2        9.0 ± 5.1        7.6 ± 4.0

Preorder Limit

Data Base  Measure        LFSS
DB1        Accuracy (%)   92.8 ± 2.1
DB1        No. Genes      3.8 ± 0.3
DB2        Accuracy (%)   91.8 ± 0.4
DB2        No. Genes      7.8 ± 3.0

Experimental Results II

Elimination of Irrelevant Features

Data Base  Measure          LFSS-VE      LFSS         FSS
DB1        Accuracy (%)     95.2 ± 1.4   92.8 ± 2.1   80.9 ± 4.9
DB1        No. Genes        5.4 ± 0.1    3.8 ± 0.3    4.3 ± 0.5
DB1        No. Evaluations  1882         2840         74900
DB2        Accuracy (%)     93.0 ± 0.4   91.8 ± 0.4   88.9 ± 0.6
DB2        No. Genes        8.1 ± 5.6    7.8 ± 3.0    8.0 ± 3.2
DB2        No. Evaluations  1018         1080         8002

Filter Preorder vs Accuracy Preorder

Data Base  Measure        LFSS-VE Filter Preorder  LFSS-VE Accuracy Preorder
DB1        Accuracy (%)   88.1 ± 3.3               95.2 ± 1.4
DB1        No. Genes      3.9 ± 0.1                5.4 ± 0.1
DB2        Accuracy (%)   90.7 ± 0.5               93.0 ± 0.4
DB2        No. Genes      7.6 ± 2.7                8.1 ± 5.6

Page 69: Selective Gaussian Naïve Bayes Model for Diffuse Large-B-Cell Lymphoma Classification: Some Improvements in Preprocessing and Variable Elimination

Experimental Results III

Results Comparison

Wright et al. classifier (validated in one partition; 27 genes selected), test dataset:

| True class | Predicted ABC | Predicted GCB | Unclass. |
|---|---|---|---|
| ABC | 38 | 1 | 2 |
| GCB | 2 | 57 | 8 |

Wrapper + Abduction (validated in 10 partitions; 7.0 genes selected), test dataset:

| True class | Predicted ABC | Predicted GCB | Unclass. |
|---|---|---|---|
| ABC | 32.7 | 3.5 | 4.8 |
| GCB | 3.2 | 58.8 | 5.0 |

LFSS-VE (validated in 10 partitions; 8.1 genes selected), test dataset:

| True class | Predicted ABC | Predicted GCB | Unclass. |
|---|---|---|---|
| ABC | 32.7 | 1.3 | 7.0 |
| GCB | 1.7 | 57.4 | 7.9 |
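The three confusion matrices above are easier to compare once they are reduced to a few rates. The sketch below is an illustration only: the summary measures (accuracy over all cases, accuracy over classified cases, unclassified rate) and the function name `summarize` are assumptions added here and do not appear on the slides. Matrices are matched to classifiers in the order the slide lists them.

```python
def summarize(matrix):
    """Summarize a confusion matrix with an extra 'Unclass.' column.

    `matrix` maps each true class to its counts per predicted label.
    The measures returned are illustrative, not taken from the slides.
    """
    total = sum(sum(row.values()) for row in matrix.values())
    # Diagonal entries: true class equals predicted class.
    correct = sum(row[true_class] for true_class, row in matrix.items())
    unclassified = sum(row["Unclass."] for row in matrix.values())
    classified = total - unclassified
    return {
        "total": total,
        "accuracy_all": correct / total,
        "accuracy_classified": correct / classified,
        "unclassified_rate": unclassified / total,
    }


# Counts copied from the slide (averages over 10 partitions for the last two).
wright = {
    "ABC": {"ABC": 38, "GCB": 1, "Unclass.": 2},
    "GCB": {"ABC": 2, "GCB": 57, "Unclass.": 8},
}
wrapper_abduction = {
    "ABC": {"ABC": 32.7, "GCB": 3.5, "Unclass.": 4.8},
    "GCB": {"ABC": 3.2, "GCB": 58.8, "Unclass.": 5.0},
}
lfss_ve = {
    "ABC": {"ABC": 32.7, "GCB": 1.3, "Unclass.": 7.0},
    "GCB": {"ABC": 1.7, "GCB": 57.4, "Unclass.": 7.9},
}

if __name__ == "__main__":
    for name, matrix in [
        ("Wright et al.", wright),
        ("Wrapper + Abduction", wrapper_abduction),
        ("LFSS-VE", lfss_ve),
    ]:
        s = summarize(matrix)
        print(
            f"{name}: all={s['accuracy_all']:.3f} "
            f"classified={s['accuracy_classified']:.3f} "
            f"unclassified={s['unclassified_rate']:.3f}"
        )
```

Reporting both accuracies separates two effects that a single figure would mix: how often a classifier abstains, and how often it is right when it does commit to a class.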


Conclusions and Future Work

The wrapper technique is a powerful method for supervised classification tasks.

Its main disadvantage is its high computational cost.

This is a particular problem for gene expression databases because of their high dimensionality.

LFSS-VE addresses this disadvantage in DLBCL classification by using a preordering of the features and a limited search space.

Eliminating irrelevant features is a good way to enhance the performance of a wrapper method.

Future work will validate our model on other data sets: breast cancer, colon cancer, leukemia ...
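To make the ideas in these conclusions concrete, here is a minimal sketch of a wrapper feature selection around a Gaussian naive Bayes classifier, with a preordering of the features and a limited search window. It is not the authors' LFSS-VE implementation. The function names, the window size, the leave-one-out evaluation, the stopping rule, the rule for discarding candidates and the toy data are all assumptions made for illustration.

```python
import math
import random


def fit(rows, labels, feats):
    """Estimate per-class prior, mean and variance for the chosen features."""
    model = {}
    for c in sorted(set(labels)):
        sub = [r for r, y in zip(rows, labels) if y == c]
        stats = []
        for f in feats:
            vals = [r[f] for r in sub]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-6
            stats.append((mean, var))
        model[c] = (len(sub) / len(rows), stats)
    return model


def predict(model, row, feats):
    """Return the class with the highest Gaussian naive Bayes log-posterior."""
    def log_posterior(c):
        prior, stats = model[c]
        score = math.log(prior)
        for f, (mean, var) in zip(feats, stats):
            score -= 0.5 * math.log(2 * math.pi * var) + (row[f] - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def loo_accuracy(rows, labels, feats):
    """Leave-one-out accuracy: the wrapper's evaluation of a feature subset."""
    hits = 0
    for i in range(len(rows)):
        model = fit(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:], feats)
        hits += predict(model, rows[i], feats) == labels[i]
    return hits / len(rows)


def limited_forward_selection(rows, labels, window=3):
    """Forward selection that only evaluates the first `window` pending features."""
    n_feats = len(rows[0])
    # Accuracy preorder (assumed): rank features by single-feature accuracy.
    pending = sorted(range(n_feats), key=lambda f: -loo_accuracy(rows, labels, [f]))
    selected, best = [], 0.0
    while pending:
        candidates = pending[:window]
        scores = {f: loo_accuracy(rows, labels, selected + [f]) for f in candidates}
        top = max(candidates, key=scores.get)
        if scores[top] <= best:
            break  # no candidate in the window improves the model
        # Candidates that did not beat the previous best are treated as irrelevant.
        irrelevant = {f for f in candidates if scores[f] <= best}
        selected.append(top)
        best = scores[top]
        pending = [f for f in pending if f != top and f not in irrelevant]
    return selected, best


def make_toy_data(seed=0, per_class=20, n_noise=5):
    """Toy data: feature 0 separates the classes, the other features are noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label, centre in (("ABC", 2.0), ("GCB", -2.0)):
        for _ in range(per_class):
            rows.append([rng.gauss(centre, 0.3)] + [rng.gauss(0, 1) for _ in range(n_noise)])
            labels.append(label)
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_toy_data()
    print(limited_forward_selection(rows, labels))
```

Restricting each step to a small window of the preordered features is what keeps the number of classifier evaluations low; a plain forward search would evaluate every remaining feature at every step.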
