Logical Analysis of Diffuse Large B Cell Lymphoma
Gabriela Alexe (1), Sorin Alexe (1), David Axelrod (2), Peter Hammer (1), and David Weissmann (3)
of RUTCOR (1) and Department of Genetics (2), Rutgers University; and Robert Wood Johnson Medical School (3)
This Talk
• Lymphoma
• Gene Expression Level Analysis
  • cDNA Microarray
  • Applied to Diffuse Large B-Cell Lymphoma
• Logical Analysis of Data
  • Discretization/Binarization
  • Support Sets
  • Pattern Generation
  • Theories and Models
  • Prediction
Lymphoma
• Cancer of lymphoid cells
  • Clonal
  • Uncontrolled growth
  • Metastasis
• Lymphoma
  • Diagnosis
  • Grade
Diffuse Large B Cell Lymphoma (DLBCL)
• 31% of non-Hodgkin lymphoma cases
• 50% long-term, disease-free survival
• Clinical variability
• Prognosis & therapy
  • IPI
  • Morphology
  • Gene expression
Diffuse Large B Cell Lymphoma
Spleen with Diffuse Large B Cell Lymphoma
Gene Expression Level Analysis
DNA-RNA Hybridization
Gene Expression Profiling
cDNA microarray analysis
Figure labels: Tumor; Standard
DLBCL & cDNA Microarray Analysis
• "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling," Alizadeh et al., Nature, Vol 403, pp 503-511
• cDNA microarray data -> unsupervised hierarchical agglomerative clustering
  • Germinal center signature: 76% survival at 5 years
  • Activated B cell signature: 16% at 5 years
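The clustering step named above can be illustrated with a minimal sketch. It assumes Euclidean distance and average linkage, neither of which is stated on the slide, and the expression profiles are hypothetical values chosen only to show the mechanics.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, points):
    """Mean Euclidean distance over all cross-cluster pairs of cases."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per case
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [sorted(c) for c in clusters]


# Hypothetical expression profiles: rows are cases, columns are genes.
profiles = [
    (2.1, 0.2, 0.1),
    (1.9, 0.3, 0.2),
    (0.1, 1.8, 2.2),
    (0.2, 2.0, 1.9),
]
groups = sorted(agglomerate(profiles, 2))
print(groups)  # [[0, 1], [2, 3]]
```

The procedure is unsupervised: class labels are never used, so any grouping that emerges reflects only similarity between expression profiles.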
DLBCL Clustering
Each case (patient) is a point in N-dimensional space, where N = # of genes
Figure labels: Germinal center genes; Activated B cell genes
DLBCL Survival by Type
Supervised Learning Classification of DLBCL
• "Diffuse large B-cell lymphoma prediction by gene-expression profiling and supervised machine learning," Shipp et al., Nature Medicine, vol 8, p 68-74
• Prognosis of DLBCL
• Highly correlated genes -> weighted voting algorithm
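A weighted voting classifier can be sketched as follows. The slide does not give the formula, so this assumes the commonly described signal-to-noise form: each gene votes with weight (mean1 - mean0) / (sd1 + sd0) relative to the midpoint of the two class means. The selection of highly correlated genes is omitted, and all data values are hypothetical.

```python
from statistics import mean, stdev


def train_weighted_voting(samples, labels):
    """Return one (weight, boundary) pair per gene from two labelled classes."""
    n_genes = len(samples[0])
    params = []
    for g in range(n_genes):
        pos = [s[g] for s, y in zip(samples, labels) if y == 1]
        neg = [s[g] for s, y in zip(samples, labels) if y == 0]
        # Signal-to-noise weight (assumed form); needs >= 2 cases per class.
        weight = (mean(pos) - mean(neg)) / (stdev(pos) + stdev(neg))
        boundary = (mean(pos) + mean(neg)) / 2  # midpoint between class means
        params.append((weight, boundary))
    return params


def predict(params, sample):
    """Sum the signed votes; a positive total favours class 1."""
    total = sum(w * (x - b) for (w, b), x in zip(params, sample))
    return 1 if total > 0 else 0


# Hypothetical training data: two genes, two cases per class.
samples = [(5.0, 1.0), (6.0, 1.5), (1.0, 4.0), (2.0, 5.0)]
labels = [1, 1, 0, 0]
params = train_weighted_voting(samples, labels)
print(predict(params, (5.5, 1.2)), predict(params, (1.2, 4.8)))
```

Each gene contributes independently to the total, which is the main contrast with the combinatorial patterns introduced in the next section.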
Logical Analysis of Data
Logical Analysis of Data (LAD)
• Non-statistical method based on:
  • Combinatorics
  • Optimization
  • Logic
• Based on dataset of cases/patients
• LAD learns patterns characteristic of classes
  • Subsets of patients who are +/- for a condition
• Collections of patterns are extensible
  • Predictions
The Problem: Approximation of Hidden Function
Figure labels: Hidden Function; LAD Approximation; Dataset
Main Components of LAD
• Discretization/Binarization
• Support Sets
• Pattern Generation
• Theories and Models
• Prediction
Discretization
Figure labels: Separating Cutpoints; Minimum Set of Separating Cutpoints
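One simple way to obtain separating cutpoints for a single gene is sketched below. It assumes that a candidate cutpoint is the midpoint between two consecutive distinct expression values whose cases are not all of one class; the function names and the data are hypothetical.

```python
def candidate_cutpoints(values, labels):
    """Midpoints between consecutive distinct values whose cases differ in class."""
    classes_at = {}
    for v, y in zip(values, labels):
        classes_at.setdefault(v, set()).add(y)
    ordered = sorted(classes_at)
    cuts = []
    for lo, hi in zip(ordered, ordered[1:]):
        # A cut is useful only if it separates at least one + case from a - case.
        if len(classes_at[lo] | classes_at[hi]) > 1:
            cuts.append((lo + hi) / 2)
    return cuts


def binarize(value, cuts):
    """One 0/1 indicator per cutpoint: is the value above it?"""
    return [int(value > c) for c in cuts]


# Hypothetical expression levels of one gene, with class labels (1 = +, 0 = -).
values = [10, 20, 40, 70, 90]
labels = [0, 0, 1, 1, 0]
cuts = candidate_cutpoints(values, labels)
print(cuts)                # [30.0, 80.0]
print(binarize(50, cuts))  # [1, 0]
```

No cutpoint is placed between 10 and 20 or between 40 and 70, because neighbouring values of the same class do not need to be separated.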
Cutpoints and Support Set
• Minimization is NP-hard
• Numerous powerful methods
• Support set:
  • Cutpoints define a grid in which ideally no cell contains both + and - cases
  • Cutpoints simplify data and decrease noise
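Because exact minimization is NP-hard, heuristics are used in practice. The sketch below assumes a greedy set-cover heuristic, which is one possible choice and not necessarily the method used in this work: it repeatedly picks the cutpoint that separates the most still-unseparated +/- pairs, then checks the grid condition. Cutpoints are hypothetical (gene index, threshold) pairs.

```python
from itertools import product


def separates(cut, p, q):
    """True when cases p and q fall on opposite sides of cut = (gene, threshold)."""
    gene, threshold = cut
    return (p[gene] > threshold) != (q[gene] > threshold)


def greedy_support_set(positives, negatives, cuts):
    """Greedy set-cover heuristic: repeatedly take the cut separating most pairs."""
    unseparated = set(product(range(len(positives)), range(len(negatives))))
    chosen = []
    while unseparated:
        def gain(cut):
            return sum(separates(cut, positives[i], negatives[j]) for i, j in unseparated)
        best = max(cuts, key=gain)
        if gain(best) == 0:
            break  # the remaining +/- pairs cannot be separated by any cut
        chosen.append(best)
        unseparated = {(i, j) for i, j in unseparated
                       if not separates(best, positives[i], negatives[j])}
    return chosen


def grid_is_pure(positives, negatives, cuts):
    """Check that no grid cell holds both a + and a - case."""
    def cell(case):
        return tuple(case[g] > t for g, t in cuts)
    return not ({cell(p) for p in positives} & {cell(n) for n in negatives})


# Hypothetical cases described by two genes.
positives = [(5, 1), (6, 2)]
negatives = [(1, 1), (2, 6)]
cuts = [(0, 3.5), (1, 4.0), (0, 5.5)]
support = greedy_support_set(positives, negatives, cuts)
print(support, grid_is_pure(positives, negatives, support))
```

Here a single cutpoint on the first gene already separates every + case from every - case, so the other two candidates are dropped.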
Patterns
• Examples:
  • Gene A > 34 & gene B < 24 & gene C < 2
• Positive and negative patterns
• Pattern parameters:
  • Degree (# of conditions)
  • Prevalence (# of +/- cases that satisfy it)
  • Homogeneity (proportion of +/- cases among those it covers)
• Best: low degree, large prevalence, high homogeneity
• Patterns are extensible!
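The three pattern parameters can be computed directly from their definitions. The sketch below uses the slide's example pattern and treats it as a positive pattern; the cases are hypothetical, and prevalence is reported as a fraction of the positive cases, which is an assumption about normalization.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt}

# The slide's example pattern: gene A > 34 & gene B < 24 & gene C < 2.
pattern = [("A", ">", 34), ("B", "<", 24), ("C", "<", 2)]


def covers(pattern, case):
    """A case satisfies a pattern when every condition holds."""
    return all(OPS[op](case[gene], cut) for gene, op, cut in pattern)


def describe(pattern, positives, negatives):
    """Degree, positive prevalence and positive homogeneity of a pattern."""
    pos_hits = sum(covers(pattern, c) for c in positives)
    neg_hits = sum(covers(pattern, c) for c in negatives)
    covered = pos_hits + neg_hits
    return {
        "degree": len(pattern),
        "prevalence": pos_hits / len(positives),
        "homogeneity": pos_hits / covered if covered else 0.0,
    }


# Hypothetical cases; keys are gene names, values are expression levels.
positives = [
    {"A": 40, "B": 10, "C": 1},
    {"A": 50, "B": 20, "C": 0},
    {"A": 36, "B": 30, "C": 1},
    {"A": 20, "B": 5, "C": 1},
]
negatives = [
    {"A": 45, "B": 12, "C": 1.5},
    {"A": 10, "B": 30, "C": 5},
    {"A": 30, "B": 10, "C": 1},
]
stats = describe(pattern, positives, negatives)
print(stats)
```

In this toy data the pattern has degree 3, covers half of the positive cases, and two of the three cases it covers are positive.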
Pattern Generation
• Generate patterns based on learning set
• Stipulate control parameters. For example:
  • Degree 4
  • + & - prevalences >= 70%
  • + & - homogeneities = 100%
• All 75 patterns in 1.2 seconds on Pentium IV 1 GHz PC
• Evaluate set:
  • Average # of patterns covering each observation
  • Accuracy applied to evaluation set
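Pattern generation under such control parameters can be sketched as a brute-force enumeration. This is a simplified illustration, not the generation procedure used in the talk: it builds positive patterns only, allows at most one condition per gene, and does not prune non-minimal patterns. Conditions are hypothetical (gene index, above?, threshold) triples.

```python
from itertools import combinations


def generate_positive_patterns(positives, negatives, conditions,
                               max_degree=4, min_prevalence=0.7):
    """Enumerate conjunctions that cover no negative case (100% homogeneity)."""
    patterns = []
    for degree in range(1, max_degree + 1):
        for combo in combinations(conditions, degree):
            genes = [gene for gene, _, _ in combo]
            if len(set(genes)) < degree:
                continue  # skip two conditions on the same gene, for simplicity

            def covers(case):
                return all((case[g] > t) == above for g, above, t in combo)

            if any(covers(n) for n in negatives):
                continue  # homogeneity would fall below 100%
            prevalence = sum(covers(p) for p in positives) / len(positives)
            if prevalence >= min_prevalence:
                patterns.append((combo, prevalence))
    return patterns


# Hypothetical cases described by two genes.
positives = [(5, 1), (6, 2), (7, 3), (8, 6)]
negatives = [(1, 1), (2, 6), (6, 7)]
conditions = [(0, True, 3.5), (1, False, 4.0)]  # gene0 > 3.5, gene1 <= 4.0
found = generate_positive_patterns(positives, negatives, conditions)
print(found)
```

Neither condition alone excludes every negative case, but their conjunction does while still covering three of the four positive cases.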
Patterns: Illustration
Figure labels: Positive Pattern; Negative Pattern
Theories: Approximations of the 2 Regions
Figure labels: Positive Theory; Negative Theory
A theory is a set of positive (or negative) patterns such that every positive (or negative) case is covered.
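The coverage requirement in this definition is easy to check mechanically. The sketch below assumes patterns are lists of hypothetical (gene index, above?, threshold) conditions and tests whether a set of positive patterns covers every positive case.

```python
def covers(pattern, case):
    """A case satisfies a pattern when every condition holds."""
    return all((case[g] > t) == above for g, above, t in pattern)


def uncovered_cases(patterns, cases):
    """Cases that satisfy none of the given patterns."""
    return [c for c in cases if not any(covers(p, c) for p in patterns)]


def is_theory(patterns, cases):
    """A set of patterns is a theory when it leaves no case uncovered."""
    return not uncovered_cases(patterns, cases)


# Hypothetical positive cases and two positive patterns.
positives = [(5, 1), (6, 2), (1, 9)]
p1 = [(0, True, 3.5)]  # gene0 > 3.5
p2 = [(1, True, 8.0)]  # gene1 > 8.0
print(is_theory([p1], positives))      # False: (1, 9) is not covered
print(is_theory([p1, p2], positives))  # True
```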
Models
• A positive theory together with a negative theory
• A good model:
  • Small number of features (genes)
  • Patterns are high quality
    • Low degrees
    • High prevalences
    • High homogeneities
  • Number of patterns is small
  • Maximize their biologic interpretability
Theories and Models
Figure labels: Positive Theory; Negative Theory; Model; Positive Area; Negative Area; Unexplained Area; Discordant Area
LAD Prediction
• A new case: a set of gene expression levels
• Satisfy some positive & no negative?
• Satisfy some negative & no positive?
• Satisfy some of both?
  • Which more?
• Does not satisfy any (rare)
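The decision questions above can be turned into a small classifier. The sketch assumes that "which more?" is settled by comparing the fraction of positive patterns satisfied with the fraction of negative patterns satisfied; the slide does not specify the exact rule, and the labels "undecided" and "unexplained" are illustrative names. Patterns are lists of hypothetical (gene index, above?, threshold) conditions.

```python
def covers(pattern, case):
    """A case satisfies a pattern when every condition holds."""
    return all((case[g] > t) == above for g, above, t in pattern)


def predict(case, positive_patterns, negative_patterns):
    """Classify a new case by the share of patterns it satisfies in each theory."""
    pos = sum(covers(p, case) for p in positive_patterns)
    neg = sum(covers(p, case) for p in negative_patterns)
    if pos == 0 and neg == 0:
        return "unexplained"  # satisfies no pattern at all
    pos_share = pos / len(positive_patterns) if positive_patterns else 0.0
    neg_share = neg / len(negative_patterns) if negative_patterns else 0.0
    if pos_share > neg_share:
        return "positive"
    if neg_share > pos_share:
        return "negative"
    return "undecided"  # covered by both theories to the same extent


# Hypothetical model over two genes.
positive_patterns = [[(0, True, 3.5)], [(1, False, 4.0)]]
negative_patterns = [[(0, False, 3.5)], [(1, True, 4.0)]]
print(predict((5, 1), positive_patterns, negative_patterns))  # positive
print(predict((1, 6), positive_patterns, negative_patterns))  # negative
print(predict((5, 6), positive_patterns, negative_patterns))  # undecided
```

A case that satisfies only positive patterns or only negative patterns is classified immediately; the share comparison matters only for cases covered by both theories.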
8 Gene Classification Model
Gene index  Accession #     Description
6642        U90543_at       Butyrophilin (BTF1) mRNA
6992        U46744_at       Dystrobrevin-alpha mRNA
3890        U87269_at       P120E4F transcription factor mRNA
5383        U12767_at       Mitogen induced nuclear orphan receptor (MINOR) mRNA
3674        U73167_cds5_at  SM15 gene (human interferon-related protein SM15 (U09585); final exon similar to partial sequence of human EST R48415, but would require alternative splice) extracted from Human cosmid LUCA14
2004        M37763_at       Neurotrophin-3 (NT-3) gene
1692        M12625_at       Lecithin-cholesterol acyltransferase mRNA, with 5' and 3' flanking DNA sequences
2280        M83651_at       BETA-1,4 N-ACETYLGALACTOSAMINYLTRANSFERASE

Prevalence (%)
Pattern  Cutpoint conditions (gene columns not recoverable)  Training set Positive  Training set Negative  Test set Positive  Test set Negative
P1       >0.49 0. >0.3                                       72.22                  0.00                   62.50              30.00
P2       >0.48 >0.46 0.40                                    72.22                  0.00                   50.00              20.00
P3       >0.48 >0.46 0.36                                    72.22                  0.00                   62.50              10.00
P4       >0.48 >0.47 >0.0                                    72.22                  0.00                   50.00              20.00
P5       0.3 >0.1 >0.6                                       61.11                  0.00                   62.50              20.00
P6       >0.46 0.30                                          61.11                  0.00                   50.00              10.00
P7       >0.63 >0.46                                         55.56                  0.00                   25.00              0.00
P8       >0.8 >0.8                                           55.56                  0.00                   50.00              20.00
P9       >0.49 >0.                                           55.56                  0.00                   50.00              30.00
N1       0.60 0.69 0.6                                       0.00                   72.73                  12.50              70.00
N2       0.3 0.69 0.7                                        0.00                   68.18                  12.50              50.00
N3       0.3 0.69 0.                                         0.00                   63.64                  12.50              40.00
N4       0.60 >0.10.6                                        0.00                   63.64                  50.00              70.00
N5       0.3 0.69 >0.10.69                                   0.00                   63.64                  0.00               50.00
N6       0.3 0.                                              0.00                   59.09                  0.00               40.00
Accuracy of Prognosis
Conclusion
• Logical Analysis of Data (LAD): a versatile new classification method here applied to diagnosis and prognosis of lymphoma.
• LAD genes differ almost entirely from those specified by other studies.
• Genes not individually correlated with diagnosis or prognosis but highly correlated in combinations of as few as two genes.
• Patterns suggest biologic pathways
• LAD provides highly accurate prognosis of DLBCL
Contacts
• Gabriela Alexe: [email protected]
• Sorin Alexe: [email protected]
• David Axelrod:
• Peter Hammer:
• David Weissmann: [email protected]