• Predictive sub-typing of subjects
• Retrospective and prospective studies
• Exploration of clinico-genomic data
• Identify relevant gene expression patterns
Issues in Bayesian Tree Modeling of Clinical and Gene
Expression Data
Current Areas of Application
Breast Cancer
lymph node status
disease recurrence
Ovarian Cancer
tumor location
Lymph Node Involvement Is a Key Breast Cancer Risk Factor
But -- lymph node dissection also carries morbidity and inaccuracy
Identifying Metagenes Associated With Lymph Node Status
[Figure: axes are Tumor Sample and Gene]
Metagenes/ Expression Signatures
• Dimension reduction: Signal improvement
• Clustering
• Singular value decomposition
• Empirical or model-based factor analysis
• Characterize patterns in data
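As a rough illustration of the singular value decomposition route to a metagene, the sketch below summarizes one cluster of genes by its dominant singular factor across samples. It is a minimal sketch, not the procedure used in the talk: the function name `first_singular_factor`, the row-centering step, and the use of power iteration instead of a full SVD routine are all assumptions made to keep the example dependency-free.

```python
import math

def first_singular_factor(X, iters=200):
    """Dominant right singular vector of a genes-by-samples matrix X.

    Hypothetical helper: each gene (row) is centered across samples, then
    power iteration on Xc^T Xc yields one value per sample, which can be
    read as a single 'metagene' for the cluster. The sign is arbitrary.
    """
    n_samples = len(X[0])
    # Center each gene so the factor reflects variation, not mean level.
    Xc = []
    for row in X:
        mean = sum(row) / n_samples
        Xc.append([v - mean for v in row])
    # Non-constant start vector (a constant vector is annihilated by centering).
    v = [1.0 + 0.1 * j for j in range(n_samples)]
    for _ in range(iters):
        u = [sum(r[j] * v[j] for j in range(n_samples)) for r in Xc]   # u = Xc v
        w = [sum(Xc[i][j] * u[i] for i in range(len(Xc)))              # w = Xc^T u
             for j in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n_samples  # no variation left after centering
        v = [x / norm for x in w]
    return v

# Three genes sharing one expression pattern across four samples.
cluster = [[1.0, 2.0, 3.0, 4.0],
           [2.0, 4.0, 6.0, 8.0],
           [0.5, 1.0, 1.5, 2.0]]
metagene = first_singular_factor(cluster)
```

The returned vector has unit length and one entry per tumor sample; it could then serve as a candidate predictor in the tree models.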
Gene Clustering
Gene Clustering (cont’d)
Factor extraction (SVD)
Differential Gene Expression
Differential Gene Expression (Threshold 1)
Differential Gene Expression (Threshold 2)
Differential Gene Expression (Threshold 3)
Nonlinear Expression
Nonlinear Expression (Threshold 1)
Nonlinear Expression (Threshold 2)
Lymph Node Metastasis Metagenes
Ovarian Tumor Site Genes
Statistical Tree Models for Clinico-Genomic Prediction
• Regression trees: Non-linear, interactions
• Recursive partitioning
• Retrospective studies
• Many trees: Model uncertainty
• Predictions average across trees
Binary Outcomes Retrospective Sampling
$\Pr(Y = 1 \mid x_1, \ldots, x_p)$
[Figure: groups labeled by lymph node (LN) status]
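Binary Outcomes: Prospective Inference from Retrospective Model
$\pi = \Pr(Y = 1 \mid x_1 \le \tau_1, \ldots, x_k \le \tau_k)$
$a_{1,y} = \Pr(x_1 \le \tau_1 \mid Y = y), \quad y = 0, 1$
$a_{2,y} = \Pr(x_2 \le \tau_2 \mid x_1 \le \tau_1, Y = y), \quad y = 0, 1$
$\dfrac{\pi}{1 - \pi} = \dfrac{a_{1,1}\, a_{2,1}}{a_{1,0}\, a_{2,0}} \cdot \dfrac{\Pr(Y = 1)}{\Pr(Y = 0)}$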
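The step from retrospective quantities to a prospective probability is Bayes' theorem on the odds scale: the prior odds of Y = 1 are multiplied, split by split, by the ratio of the conditional split probability given Y = 1 to that given Y = 0. The sketch below works through that arithmetic. The function name `prospective_probability` and the example numbers are hypothetical, and the population prevalence `prior_y1` has to be supplied separately, since a retrospective sample does not determine it.

```python
def prospective_probability(a_y1, a_y0, prior_y1):
    """Convert retrospective split probabilities into Pr(Y = 1 | path).

    a_y1[i]: Pr(split i event | earlier split events, Y = 1)
    a_y0[i]: Pr(split i event | earlier split events, Y = 0)
    prior_y1: assumed population Pr(Y = 1), strictly between 0 and 1.
    """
    if len(a_y1) != len(a_y0):
        raise ValueError("need one probability per split for each outcome")
    odds = prior_y1 / (1.0 - prior_y1)        # prior odds of Y = 1
    for a1, a0 in zip(a_y1, a_y0):
        odds *= a1 / a0                       # likelihood ratio for this split
    return odds / (1.0 + odds)                # back to a probability

# Two splits, each twice as likely under Y = 1 as under Y = 0.
pi_even_prior = prospective_probability([0.8, 0.5], [0.4, 0.25], prior_y1=0.5)
pi_low_prior = prospective_probability([0.8, 0.5], [0.4, 0.25], prior_y1=0.2)
```

With even prior odds the two splits give odds of 4, so `pi_even_prior` is 0.8; lowering the prior to 0.2 brings the odds back to 1 and `pi_low_prior` to 0.5.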
Binary Outcomes: Retrospective Model
• Model conditionals for predictors
• Nonparametric Bayes: Dirichlet model
• Modeling in x space – joint structure
• Implies Beta priors on $a_{i,y}$
$F(x_k \mid x_1 \le \tau_1, \ldots, x_{k-1} \le \tau_{k-1}, Y = y)$
Growing Binary Trees
Node split:
• Each candidate predictor:threshold pair
• 2x2 table: 2 Bernoulli’s, fixed columns (Y=0/1)
• Assess and select split, or stop
• Conservative Bayesian tests
Multiple trees:
• Multiple splits at any node
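One way to assess a candidate split from its 2x2 table is a Bayes factor comparing two Bernoulli probabilities (one per outcome column) against a single shared probability. The sketch below is an illustration under stated assumptions rather than the test used in the talk: the Beta(alpha, beta) priors default to uniform, the function names are hypothetical, and any cutoff applied to the result is left to the reader.

```python
from math import lgamma, exp

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_bayes_factor_split(n00, n10, n01, n11, alpha=1.0, beta=1.0):
    """Log Bayes factor for 'split probabilities differ by outcome'.

    Cells are n[row][col]: row 0 is x <= threshold, row 1 is x > threshold;
    col 0 is Y = 0, col 1 is Y = 1. Column totals are treated as fixed.
    Assumed priors: independent Beta(alpha, beta) under the alternative,
    one shared Beta(alpha, beta) under the null.
    """
    # Alternative: separate probability of row 0 within each outcome column.
    log_m1 = (log_beta(alpha + n00, beta + n10) - log_beta(alpha, beta)
              + log_beta(alpha + n01, beta + n11) - log_beta(alpha, beta))
    # Null: one probability of row 0 shared by both columns.
    log_m0 = (log_beta(alpha + n00 + n01, beta + n10 + n11)
              - log_beta(alpha, beta))
    return log_m1 - log_m0

# A perfectly separating split versus an uninformative one.
lbf_separating = log_bayes_factor_split(0, 10, 10, 0)
lbf_balanced = log_bayes_factor_split(5, 5, 5, 5)
```

A positive log Bayes factor favors splitting; a conservative rule would require it to exceed some threshold before a node is split, and otherwise stop.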
Inference with Many Binary Trees
Within-tree inference & prediction:
• Sequences of beta posteriors for $a_{i,y}$
• Simulate: Impute Pr(Y=1|leaf)
Multiple trees:
• Likelihood across trees
• Average predictions across trees
• Model (predictor:threshold)s uncertainty
• “Smoothing” classification boundaries
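The two levels of inference can be sketched together: within a tree, draw each split probability from its Beta posterior and push the draws through the odds formula to impute Pr(Y=1|leaf); across trees, average the per-tree predictions with weights proportional to each tree's likelihood. Everything named here is hypothetical (`simulate_leaf_probability`, `average_over_trees`, the uniform Beta prior, the example counts), and likelihood-proportional weighting is one plausible reading of the slide, not a confirmed detail.

```python
import math
import random

def simulate_leaf_probability(path_counts, prior_y1, draws=2000,
                              alpha=1.0, beta=1.0, seed=0):
    """Monte Carlo draws of Pr(Y = 1 | leaf).

    path_counts: one (hits_y0, total_y0, hits_y1, total_y1) tuple per split
    on the path to the leaf, where 'hits' counts samples taking the branch
    that leads toward the leaf. Beta(alpha, beta) priors are an assumption.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        odds = prior_y1 / (1.0 - prior_y1)
        for h0, t0, h1, t1 in path_counts:
            a0 = rng.betavariate(alpha + h0, beta + t0 - h0)  # posterior draw, Y = 0
            a1 = rng.betavariate(alpha + h1, beta + t1 - h1)  # posterior draw, Y = 1
            odds *= a1 / a0
        samples.append(odds / (1.0 + odds))
    return samples

def average_over_trees(predictions, log_likelihoods):
    """Average per-tree predictions with weights proportional to likelihood."""
    top = max(log_likelihoods)                      # subtract max for stability
    weights = [math.exp(l - top) for l in log_likelihoods]
    return sum(w * p for w, p in zip(weights, predictions)) / sum(weights)

# One split on the path: 2 of 20 Y=0 samples and 15 of 20 Y=1 samples reach the leaf.
leaf_draws = simulate_leaf_probability([(2, 20, 15, 20)], prior_y1=0.5)
leaf_mean = sum(leaf_draws) / len(leaf_draws)
combined = average_over_trees([0.2, 0.8], [math.log(1.0), math.log(3.0)])
```

The spread of `leaf_draws` reflects within-tree uncertainty, while `combined` shows how a better-fitting tree pulls the averaged prediction toward its own value.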
Binary Outcome: Lymph Node Metastasis
[Figure: axes are Tumor Sample and Gene]
Predictive trees:
• Nonparametric Bayes
• Metagene expression
• Retrospective sampling
Lancet 2003 (Huang, West et al)
Predicting Lymph Node Status With Metagenes
[Figure: out-of-sample cross validation; y-axis: Probability of LN+; x-axis: Sample; groups: LN+, LN-]
Forests of Clinico-Genomic Trees
Select from potential clinical and genomic predictor variables
multiple trees
variable combination – co-occurrence
multiple subtypes
… With Metagenes and Clinical Predictors
[Figure: out-of-sample cross validation; y-axis: Probability of LN+; x-axis: Sample; groups: LN+, LN-]
Lymph Node Clinico-Genomic Predictors
Predicting Ovarian Tumor Site
[Figure: out-of-sample cross validation; y-axis: Probability of Omentum; x-axis: Sample; groups: Omentum, Ovary]
Gene Identification
• Implicated metagenes – gene subsets
• Genes correlated with key metagenes
Breast Cancer – nodal metastasis:
• Interferon pathway/inducible gene subset
• Interferons mediate anti-tumor response
Evidence of dysfunction of normal anti-tumor response?
Ovarian Cancer – tumor site:
• Growth regulatory pathway/inducible gene subset
Evidence of dysfunction of normal cell growth?
Ongoing Research
• Stochastic search (sequential, annealing)
• Representation of tree ‘forest’
• Metagene definition/creation
• Cluster implementation of tree models
Computational & Applied Genomics Program
Joseph Nevins, Mike West, Erich Huang, Ed Iversen, Holly Dressman
Duke University
Koo Foundation-Sun Yat Sen Cancer Center
Andrew Huang, Skye Cheng, Mei-Hua Tsou
http://www.isds.duke.edu/~jennifer/
Department of Obstetrics and Gynecology
John Lancaster, Andrew Berchuck
Growing Binary Trees (2x2)
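              Y = 0     Y = 1
x_k ≤ τ_k     n_00      n_01
x_k > τ_k     n_10      n_11
Total         N_0       N_1       N
$P(x_k \le \tau_k \mid x_1 \le \tau_1, \ldots, x_{k-1} \le \tau_{k-1}, Y = 0)$
$P(x_k \le \tau_k \mid x_1 \le \tau_1, \ldots, x_{k-1} \le \tau_{k-1}, Y = 1)$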
?
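For concreteness, the 2x2 table for one candidate predictor:threshold pair can be tabulated from the samples that reach a node as in the sketch below. The function name `two_by_two`, the dictionary layout, the choice of "at or below the threshold" as the first row, and the toy data are assumptions for illustration.

```python
def two_by_two(x_values, y_labels, threshold):
    """Tabulate counts n[(row, col)] for one predictor and threshold.

    row 0: x <= threshold, row 1: x > threshold (assumed convention);
    col 0: Y = 0, col 1: Y = 1. Also returns the column totals, which
    are fixed by design under retrospective sampling.
    """
    table = {(r, c): 0 for r in (0, 1) for c in (0, 1)}
    for x, y in zip(x_values, y_labels):
        row = 0 if x <= threshold else 1
        table[(row, y)] += 1
    col_totals = (table[(0, 0)] + table[(1, 0)],   # N_0: samples with Y = 0
                  table[(0, 1)] + table[(1, 1)])   # N_1: samples with Y = 1
    return table, col_totals

# Toy metagene values and binary outcomes for five samples at one node.
x = [0.1, 0.2, 0.9, 1.2, 1.5]
y = [0, 0, 1, 1, 0]
table, (N0, N1) = two_by_two(x, y, threshold=0.5)
```

Comparing the row-0 proportion within the Y = 0 column to that within the Y = 1 column is then the question of whether the two conditional probabilities differ.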