
High Dimensional Surrogacy: A Joint Modeling Approach


Overview

1 Introduction

2 Case Studies and Modeling Approaches
    QSTAR Project
    Microbiome Project
    Joint Modeling Approach

3 High Dimensional Surrogacy and Biomarker Detection
    Single Surrogacy for High Dimensional Data
    Different Surrogacy Measures
        Multiple Surrogacy
        Partial Surrogacy
        Orthogonal Surrogacy
    Computational Aspects

4 Conclusion

High Dimensional Surrogacy: A Joint Modeling Approach

Rudradev Sengupta

October 4, 2018


Research Team

Affiliation: Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), Belgium
Collaborators: Ariel Alonso Abad, Geert Molenberghs, Ziv Shkedy.

Affiliation: Janssen Pharmaceutical Companies of Johnson & Johnson, Beerse, Belgium
Collaborators: Luc Bijnens, Nolen Joy Perualila-Tan, Wim Van der Elst.


Clinical Trials

- The drug development process is very slow, costly and inefficient.
- The choice of endpoint(s) used to assess drug efficacy plays an important role.
- Measuring the endpoint(s) can become difficult, time-consuming and expensive.
- Surrogacy in clinical trials.


“Surrogate” and “True” Endpoint in Clinical Trials

- A “true” endpoint can be a response, a clinical outcome, a time to event, etc.
- A “surrogate” endpoint serves as a substitute for the “true” endpoint, as it can usually be measured more cheaply and conveniently.
- Before a surrogate is used as a substitute for the true endpoint, it should be validated.
- Statistical methods for the identification and evaluation of surrogate endpoints in randomized clinical trials have been developed over the last three decades.


Biomarkers in Clinical Trials

- A biomarker is an objectively measured and evaluated indicator of normal biological or pathogenic processes, or of pharmacologic responses to a therapeutic intervention.
- A surrogate marker is a biomarker intended to substitute for a clinical endpoint.
- All surrogate markers are biomarkers, but not all biomarkers can qualify as surrogate markers.


Biomarkers in Drug Discovery Experiments

- Understanding the mechanism of action of a new compound.
- Integrating multiple data sources.
- High dimensional data.
    - High dimensional surrogacy.


The Surrogacy Framework: Graphical Illustration

[Diagram: three nodes X, Y and Z.]

- The surrogacy framework for two endpoints, X and Y.
- The variable Z represents a binary grouping variable.
- The quantity of interest is the association between the biomarker (X) and the clinical endpoint (Y) after adjusting for the grouping variable (Z).


Three Decades of Surrogacy

- 1989: Prentice. Surrogate endpoints in clinical trials.
- 1992: Freedman et al. Statistical Validation of Intermediate Endpoints.
- 1998: Buyse and Molenberghs. Individual Level Surrogacy.
- 2000: Buyse et al. Trial Level Surrogacy.
- 2005: Burzykowski et al. Evaluation of Surrogate Endpoints.
- 2007: Alonso and Molenberghs. Information Theory Approach.
- 2010: Lin et al. Biomarkers in pre-clinical and clinical microarray experiments.
- 2012: Van Sanden et al. Genomic biomarkers in microarray experiments.
- 2015: Verbist et al. Lessons learned from the QSTAR project.
- 2016: Alonso et al. Applied Surrogate Endpoint Evaluation with SAS and R.
- 2016: Perualila et al. Joint Model.
- 2018: Sengupta et al.

The timeline is split into two strands: clinical trials, and high dimensional data in non-clinical trials.

Main focus: different approaches to evaluate individual level surrogacy.


QSTAR Framework

- Data integration in drug discovery.
- Why is QSTAR important?

[Diagram: compound, target, biological processes.]

- Known: the chemical structure of the new compound.
- Unknown: targets and biological processes.
- The main idea: information about gene expression will help to understand the biological processes related to the new compound (i.e. understanding the mechanism of action).


QSTAR Data Structure

An indicator variable for the $k$th fingerprint feature (FF) and the $i$th compound:

$$
Z_{ki} =
\begin{cases}
1, & \text{if the $k$th FF is present in the $i$th compound,} \\
0, & \text{otherwise.}
\end{cases}
$$


transPAT Data

PAT: Pulsed Antibiotic Treatment model of pediatric exposures.

Hypothesis: A series of short, therapeutic-dose pulses of antibiotic administered early in life will perturb the intestinal microbiota and lead to long-lasting alterations in metabolic and immune profiles.

Exactly the same data structure as before.

[Diagram: two “donor” mice, one with normal microbiota development and one given pulsed antibiotic (tylosin) treatment, resulting in PAT-altered microbiota (loss of some early-life protective bacteria); microbiota transfer to germ-free mice (n=7 and n=8).]


transPAT Data Structure

[Figure: data structure, with the intervention variable labeled.]

- Similar setting as before, with three different data sources.
- Main goal:
    - The association between microbiome and immunity, taking the intervention into account.
    - Development of models to identify microbiome biomarkers.


CERTIFI Case Study

- A phase II study.
- Explores the association between the fecal microbiota and its role in the therapeutic response of Crohn's disease.
- Patients are treated with ustekinumab (UST; Stelara).
- Brings us back to the biomarker framework.
- Talk by Dea Putri (Session 6a, 16:35 - 16:55).


Model Formulation


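[Diagram: $Z$ points to $X_j$ (effect $\alpha_j$) and to $Y$ (effect $\beta$); $X_j$ and $Y$ are linked by $\rho_j$.]

- $\alpha_j$: fingerprint effect on the $j$th gene.
- $\rho_j$: fingerprint-adjusted association between the gene expression and bioactivity data.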


Joint Model: Estimation and Inference

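Estimation:

$$
\begin{pmatrix} X_{ji} \\ Y_i \end{pmatrix}
\sim N\left[
\begin{pmatrix} \mu_j + \alpha_j Z_i \\ \mu_Y + \beta Z_i \end{pmatrix},
\Sigma_j
\right],
\qquad
\Sigma_j =
\begin{pmatrix} \sigma_{jj} & \sigma_{jY} \\ \sigma_{jY} & \sigma_{YY} \end{pmatrix}
\quad \text{and} \quad
\rho_{jk} = \frac{\sigma_{jY}}{\sqrt{\sigma_{jj}\sigma_{YY}}}.
$$

The model is fitted separately for each fingerprint feature; the index $k$ in $\alpha_{jk}$ and $\rho_{jk}$ denotes that feature.

Inference:

$$
H_{0j}: \alpha_{jk} = 0 \quad \text{vs.} \quad H_{1j}: \alpha_{jk} \neq 0,
$$

$$
H_{0j}: \rho_{jk} = 0 \quad \text{vs.} \quad H_{1j}: \rho_{jk} \neq 0.
$$

- Gene-specific analysis, per fingerprint feature.
- BH-FDR multiplicity adjustment is applied.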
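The estimation step for a single gene can be sketched as follows. This is a Python/NumPy illustration, not the R code used for the talk. It assumes that, because both responses share the same single covariate Z and the covariance matrix is unstructured, the maximum likelihood estimates of the mean parameters coincide with two ordinary least-squares fits, so that the adjusted association is the correlation between the two sets of residuals. The function name `adjusted_association` and the simulated data are hypothetical.

```python
import numpy as np

def adjusted_association(x, y, z):
    """Return (alpha_hat, beta_hat, rho_hat) for one gene.

    x: gene expression, y: bioactivity (e.g. pIC50), z: 0/1 fingerprint
    indicator; all are 1-D arrays of equal length.
    """
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    design = np.column_stack([np.ones_like(z), z])        # intercept + Z
    coef_x, *_ = np.linalg.lstsq(design, x, rcond=None)   # (mu_j, alpha_j)
    coef_y, *_ = np.linalg.lstsq(design, y, rcond=None)   # (mu_Y, beta)
    res_x = x - design @ coef_x
    res_y = y - design @ coef_y
    # Correlation of the residuals = sigma_jY / sqrt(sigma_jj * sigma_YY).
    rho = (res_x @ res_y) / np.sqrt((res_x @ res_x) * (res_y @ res_y))
    return coef_x[1], coef_y[1], rho

# Hypothetical data: 35 compounds, feature present in the first 15.
rng = np.random.default_rng(0)
z = np.repeat([1.0, 0.0], [15, 20])
shared = rng.normal(size=35)
x = 1.0 + 2.0 * z + shared + 0.5 * rng.normal(size=35)
y = 6.0 - 1.0 * z - shared + 0.5 * rng.normal(size=35)
alpha_hat, beta_hat, rho_hat = adjusted_association(x, y, z)
```

The sketch returns point estimates only; the tests of the hypotheses above and the BH-FDR adjustment are not reproduced here.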


High Dimensional Surrogacy

- Computational solutions for an upscaled analysis.
- Surrogacy setting with multiple candidates that can serve as biomarkers.

Application of the Joint Model within a High Dimensional Setting

Modeling Aspects

- Single Surrogacy
- Multiple Surrogacy
- Partial Surrogacy
- Orthogonal Surrogacy

Computational Aspects

- Optimized implementation with R
- Parallel computing using a computer cluster


Example - EGFR Project

- Biomarkers (X): a 3595 × 35 transcriptomics matrix.
- Primary endpoint (Y): the bioassay measurements (i.e. the pIC50 values), a vector of length 35.
- Z: a 138 × 35 binary matrix of grouping variables.

Per fingerprint feature, there are 3595 models to be fitted.
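To make the data layout concrete, the sketch below computes the fingerprint-adjusted correlation of every gene for one fingerprint feature in a single matrix operation. It is a Python/NumPy illustration on simulated data with the dimensions quoted above, not the EGFR data and not the gls()-based R code of the talk. It assumes the adjusted correlation can be obtained by removing the group means defined by Z from each gene and from Y. The function name is hypothetical.

```python
import numpy as np

def adjusted_associations_all_genes(X, y, z):
    """X: genes x compounds matrix; y, z: vectors over compounds.

    Returns one fingerprint-adjusted correlation per gene (row of X).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    design = np.column_stack([np.ones_like(z), z])
    # Symmetric projection that removes the part explained by (1, Z).
    residual_maker = np.eye(len(z)) - design @ np.linalg.pinv(design)
    RX = X @ residual_maker            # residuals for every gene at once
    ry = residual_maker @ y
    num = RX @ ry
    den = np.sqrt(np.sum(RX**2, axis=1) * (ry @ ry))
    return num / den

# Simulated stand-in with the dimensions quoted for the EGFR data.
rng = np.random.default_rng(1)
n_genes, n_compounds = 3595, 35
X = rng.normal(size=(n_genes, n_compounds))
y = rng.normal(size=n_compounds)
z = (rng.random(n_compounds) < 0.4).astype(float)
rho = adjusted_associations_all_genes(X, y, z)
```

Repeating the call for each of the 138 rows of Z would give the full genes-by-features table of adjusted correlations.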


Example of One Gene (FOSL1)

[Figure: FOSL1 example; adjusted association ρ = −0.76.]

- Computation time to analyze one fingerprint feature and all 3595 genes: about 377 seconds (on a laptop).
- Fingerprint effect on gene expression.
- Negative association.

Top 5 Differentially Expressed Genes with High Adjusted Correlation

Verbist et al. (2015) linked cell growth activity with downregulation of genes FOSL1 and FGFBP1 for a particular chemical feature.


Single Surrogacy

[Diagram: $X_j$, $Y$ and $Z$, with $\rho_j$ linking $X_j$ and $Y$.]

- Models to identify one biomarker at a time.
- Reduction in computation time to find one biomarker.


Multiple Surrogacy: Introduction

- Once a primary biomarker is known, can we add something more in the context of surrogacy?
- A subset of k genes is used as a biomarker: multiple adjusted association replaces single surrogacy.


Multiple Surrogacy: Model Formulation (I)

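- Considers a subset of k genes that can be used as a joint surrogate for pIC50.
- Example: genes in the same biological pathway that was found by the joint model.
- Van der Elst et al. (2018) extended the joint model:

$$
\begin{pmatrix} X_{i1} \\ X_{i2} \\ \vdots \\ X_{ik} \\ Y_i \end{pmatrix}
\sim N\left[
\begin{pmatrix}
\mu_1 + \alpha_1 Z_i \\
\mu_2 + \alpha_2 Z_i \\
\vdots \\
\mu_k + \alpha_k Z_i \\
\mu_Y + \beta Z_i
\end{pmatrix},
\Sigma_k
\right].
$$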


Multiple Surrogacy: Model Formulation (II)

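$$
\Sigma_k =
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1k} & \sigma_{1y} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2k} & \sigma_{2y} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\sigma_{k1} & \sigma_{k2} & \cdots & \sigma_{kk} & \sigma_{ky} \\
\sigma_{y1} & \sigma_{y2} & \cdots & \sigma_{yk} & \sigma_{yy}
\end{pmatrix}
$$

Adjusted correlation between two biomarkers:

$$
\rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\sigma_{jj}}}.
$$

Adjusted correlation between the $j$th biomarker and the response, pIC50:

$$
\rho_{yj} = \frac{\sigma_{yj}}{\sqrt{\sigma_{yy}\sigma_{jj}}}.
$$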


Multiple Surrogacy: Model Formulation (III)

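The covariance matrix:

$$
\Sigma_k =
\begin{pmatrix}
\Sigma_{X,X} & \Sigma'_{X,Y} \\
\Sigma_{X,Y} & \sigma_{Y,Y}
\end{pmatrix}.
$$

Multivariate adjusted association:

$$
\gamma^2 = \rho^2_{Y,X_1,X_2,\ldots,X_k}
= \frac{\Sigma_{X,Y}\,\Sigma^{-1}_{X,X}\,\Sigma'_{X,Y}}{\sigma_{Y,Y}}.
$$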
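For k = 2 the multivariate adjusted association can be written in terms of the three pairwise adjusted correlations alone. The step below is standard matrix algebra rather than something taken from the slides: dividing numerator and denominator by the variances turns the covariances into correlations, and the 2 × 2 inverse is written out explicitly.

```latex
\gamma^2
= \begin{pmatrix} \rho_{y1} & \rho_{y2} \end{pmatrix}
  \begin{pmatrix} 1 & \rho_{12} \\ \rho_{12} & 1 \end{pmatrix}^{-1}
  \begin{pmatrix} \rho_{y1} \\ \rho_{y2} \end{pmatrix}
= \frac{\rho_{y1}^2 + \rho_{y2}^2 - 2\,\rho_{y1}\rho_{y2}\rho_{12}}{1 - \rho_{12}^2}.
```

Since adding a second biomarker cannot reduce the explained variation, $\gamma^2 \geq \max(\rho_{y1}^2, \rho_{y2}^2)$.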


Gene FOSL1

FOSL1 is used as a known primary biomarker.

[Figure: FOSL1 example; adjusted association ρ = −0.76.]


EGFR Project: Illustration when K=2

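Joint model:

$$
\begin{pmatrix} \mathrm{FOSL1}_{i1} \\ X_{i2} \\ Y_i \end{pmatrix}
\sim N\left[
\begin{pmatrix}
\mu_{FOSL1} + \alpha_{FOSL1} Z_i \\
\mu_2 + \alpha_2 Z_i \\
\mu_Y + \beta Z_i
\end{pmatrix},
\Sigma
\right].
$$

- $\gamma^2 = \rho^2_{Y,FOSL1,X_2}$ is the joint surrogacy value of $X_2$ and FOSL1.
- $\rho_{Y,FOSL1}$ and $\rho_{Y,X_2}$ are the marginal surrogacy values for FOSL1 and $X_2$, respectively.
- Gain in surrogacy $= \rho^2_{Y,FOSL1,X_2} - \rho^2_{Y,FOSL1}$.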


EGFR Project: Multiple Surrogacy

Top 5 genes, sorted according to their multiple adjusted association, when used together with FOSL1:

Genes       $\rho_{Y,X_2}$   $\rho^2_{Y,FOSL1,X_2}$   Gain in surrogacy value
MPHOSPH9    -0.26            0.69                     0.11
TOP2A       -0.35            0.69                     0.11
MYO6         0.73            0.68                     0.10
PNISR        0.76            0.68                     0.10
EREG        -0.60            0.67                     0.09

$\rho^2_{Y,FOSL1} = 0.58$.

Gain in surrogacy $= \rho^2_{Y,FOSL1,X_2} - 0.58$.


Top Genes

Density of multiple adjusted association for the remaining genes, given FOSL1:

[Figure: density of $\rho^2_{Y,FOSL1,X_2}$ for the remaining genes given FOSL1 (x-axis from 0.58 to 0.70), with MPHOSPH9, TOP2A, MYO6, PNISR, EREG and KRT10 marked.]

Example of the top gene, MPHOSPH9:

- $\rho_{Y,MPHOSPH9} = -0.26$.
- $\rho^2_{Y,MPHOSPH9,FOSL1} = 0.69$.

[Figure: scatterplot matrix of pIC50, FOSL1 and MPHOSPH9 with pairwise adjusted correlations: pIC50 and FOSL1, −0.76; pIC50 and MPHOSPH9, −0.26; FOSL1 and MPHOSPH9, 0.66.]
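The multiple adjusted association and the gain in surrogacy for this example can be recomputed from the three pairwise adjusted correlations shown for pIC50, FOSL1 and MPHOSPH9. The sketch below is a Python/NumPy illustration; the function name is hypothetical, and the inputs are the two-decimal values from the slides, so the result can differ from the reported figures in the second decimal.

```python
import numpy as np

def multiple_adjusted_association(cov):
    """gamma^2 from a (k+1) x (k+1) covariance or correlation matrix.

    The response Y is assumed to be the LAST row/column, as in Sigma_k.
    """
    cov = np.asarray(cov, dtype=float)
    s_xx = cov[:-1, :-1]           # biomarker block
    s_xy = cov[:-1, -1]            # covariances between biomarkers and Y
    s_yy = cov[-1, -1]
    return float(s_xy @ np.linalg.solve(s_xx, s_xy) / s_yy)

# Rounded adjusted correlations (order: FOSL1, MPHOSPH9, pIC50).
corr = np.array([
    [1.00, 0.66, -0.76],
    [0.66, 1.00, -0.26],
    [-0.76, -0.26, 1.00],
])
gamma2 = multiple_adjusted_association(corr)
gain = gamma2 - corr[0, 2] ** 2    # gamma^2 minus rho^2_{Y,FOSL1}
print(round(gamma2, 2), round(gain, 2))
```

With these rounded inputs the sketch gives a joint surrogacy value of about 0.68 and a gain of about 0.10, close to the reported 0.69 and 0.11.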


Partial Surrogacy (I)

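[Diagram: $X_1$, $X_2$, $Y$ and $Z$, with links labeled $\rho_{Y,X_1}$, $\rho_{Y,X_2}$ and $\rho_{X_1,X_2}$.]

- Adjusted association between $Y$ and $X_1$: $\rho_{Y,X_1|Z}$.
- Adjusted association between $Y$ and $X_2$: $\rho_{Y,X_2|Z}$.
- Adjusted association between $X_1$ and $X_2$: $\rho_{X_1,X_2|Z}$.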


Partial Surrogacy (II)

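- Partial surrogacy effect: the surrogacy value of $X_2$, given $X_1$ and $Z$.
- For $k = 2$, the covariance matrix:

$$
\Sigma =
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{1y} \\
\sigma_{21} & \sigma_{22} & \sigma_{2y} \\
\sigma_{y1} & \sigma_{y2} & \sigma_{yy}
\end{pmatrix}.
$$

- Partial adjusted association:

$$
\rho_{Y,X_2|X_1,Z} = \rho_{Y,X_2|X_1}
= \frac{\rho_{y2} - \rho_{y1}\rho_{12}}{\sqrt{(1-\rho_{y1}^2)(1-\rho_{12}^2)}}.
$$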
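The partial adjusted association can be computed either from the formula above or as the correlation between residuals after regressing both Y and X2 on X1. The sketch below checks that the two routes agree on simulated data. It is a Python/NumPy illustration with hypothetical function names, and the simulated variables stand in for quantities that have already been adjusted for Z.

```python
import numpy as np

def partial_adjusted_association(r_y1, r_y2, r_12):
    """rho_{Y,X2|X1} from the three pairwise adjusted correlations."""
    return (r_y2 - r_y1 * r_12) / np.sqrt((1 - r_y1**2) * (1 - r_12**2))

def residuals(target, predictor):
    """Residuals of a least-squares fit of target on predictor (with intercept)."""
    design = np.column_stack([np.ones_like(predictor), predictor])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

# Simulated stand-ins for Z-adjusted variables.
rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = 0.5 * x1 + rng.normal(size=200)
y = 0.8 * x1 + 0.4 * x2 + rng.normal(size=200)

r = np.corrcoef([y, x1, x2])       # rows/columns ordered as y, x1, x2
from_formula = partial_adjusted_association(r[0, 1], r[0, 2], r[1, 2])
from_residuals = np.corrcoef(residuals(y, x1), residuals(x2, x1))[0, 1]
```

The residual route corresponds to plotting the residuals of Y against the residuals of X2, both taken after removing X1.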


Graphical Illustration: Partial Surrogacy (I)

- Low correlation between all three variables.
- Low partial adjusted correlation between Y and X2, given X1.

[Figure: scatterplot matrix of Y, X1 and X2 with pairwise correlations (Y and X1, 0.19; Y and X2, 0.0023; X1 and X2, 0.19), and a plot of the residuals Y* against the residuals X2* with $\rho_{Y,X_2|X_1} = -0.0396$. Points are marked by fingerprint feature (FP): 0 = absent, 1 = present.]


Graphical Illustration: Partial Surrogacy (II)

- Three correlated variables.
- Low partial adjusted correlation between Y and X2, given X1.

[Figure: scatterplot matrix of Y, X1 and X2 with pairwise correlations (Y and X1, 0.87; Y and X2, 0.85; X1 and X2, 0.98), and a plot of the residuals Y* against the residuals X2* with $\rho_{Y,X_2|X_1} = -0.0028$. Points are marked by fingerprint feature (FP): 0 = absent, 1 = present.]


Graphical Illustration: Partial Surrogacy (III)

- Three correlated variables.
- Relatively high partial adjusted correlation between Y and X2, given X1.

[Figure: scatterplot matrix of Y, X1 and X2 with pairwise correlations (Y and X1, 0.82; Y and X2, 0.67; X1 and X2, 0.54), and a plot of the residuals Y* against the residuals X2* with $\rho_{Y,X_2|X_1} = 0.4587$. Points are marked by fingerprint feature (FP): 0 = absent, 1 = present.]


EGFR Project: Partial Surrogacy (I)

Density of partial correlation for all the genes, excluding FOSL1:

[Figure: density of $\rho_{Y,X_2|FOSL1}$ for the remaining genes given FOSL1 (x-axis from −0.4 to 0.6), with MPHOSPH9, TOP2A, MYO6, PNISR, EREG, TCIRG1 and MYC marked.]

Top 5 genes:

Genes       $\rho_{Y,X_2}$   $\rho_{Y,X_2|FOSL1}$
MPHOSPH9    -0.26            0.51
TOP2A       -0.35            0.51
MYO6         0.73            0.49
PNISR        0.76            0.48
EREG        -0.60            0.47


Gene TCIRG1

[Figure: pIC50 against TCIRG1 gene expression for fingerprint feature −442307337, in two panels. Observed data: unadjusted association −0.3569. Residuals: adjusted association −0.3476. Points are marked by fingerprint feature (FP): absent or present.]

- Negatively associated with pIC50.
- $\rho_{Y,TCIRG1} = -0.35$.


EGFR Project: Partial Surrogacy (II)

- Three correlated variables.
- Zero partial adjusted correlation between pIC50 and TCIRG1, given FOSL1.

[Figure: scatterplot matrix of pIC50, FOSL1 and TCIRG1 with pairwise adjusted correlations (pIC50 and FOSL1, −0.76; pIC50 and TCIRG1, −0.35; FOSL1 and TCIRG1, 0.46), and a plot of the residuals pIC50* against the residuals TCIRG1* with $\rho_{pIC50,TCIRG1|FOSL1} = 0$. Points are marked by fingerprint feature (FP): 0 = absent, 1 = present.]
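As a check, the partial adjusted association formula can be evaluated with the rounded pairwise correlations reported for pIC50, FOSL1 and TCIRG1 (−0.76, −0.35 and 0.46). The inputs are rounded to two decimals, so the result is only approximate:

```latex
\rho_{pIC50,\,TCIRG1 \mid FOSL1}
\approx \frac{-0.35 - (-0.76)(0.46)}{\sqrt{(1 - 0.76^2)(1 - 0.46^2)}}
= \frac{-0.0004}{\sqrt{0.4224 \times 0.7884}}
\approx -0.0007 \approx 0.
```

Although TCIRG1 is negatively associated with pIC50 on its own, this association vanishes once FOSL1 is taken into account.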


Orthogonal Surrogacy: Introduction

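[Diagram: $X_1$, $X_2$, $Y$ and $Z$, with links labeled $\rho_{Y,X_1}$ and $\rho_{Y,X_2}$ and no link between $X_1$ and $X_2$.]

- Adjusted association between $Y$ and $X_1$: $\rho_{YX_1|Z}$.
- Adjusted association between $Y$ and $X_2$: $\rho_{YX_2|Z}$.
- $X_1$ and $X_2$ are conditionally independent: $\rho_{X_1X_2|Z} = 0$.
- High partial surrogacy: $\rho_{YX_2|X_1,Z}$.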


Orthogonal Surrogacy

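$$
\Sigma =
\begin{pmatrix}
\sigma_{11} & 0 & \sigma_{1y} \\
0 & \sigma_{22} & \sigma_{2y} \\
\sigma_{y1} & \sigma_{y2} & \sigma_{yy}
\end{pmatrix}
\quad \text{and} \quad
P =
\begin{pmatrix}
\rho_{11} & 0 & \rho_{1y} \\
0 & \rho_{22} & \rho_{2y} \\
\rho_{y1} & \rho_{y2} & \rho_{yy}
\end{pmatrix}
$$

- It is a special case of partial surrogacy.
- $X_1$ and $X_2$ are uncorrelated, but both are correlated with $Y$.
- High partial adjusted association between $X_2$ and $Y$, since $X_1$ does not explain the variation of $X_2$:

$$
\rho_{Y,X_2|X_1,Z} = \rho_{Y,X_2|X_1} = \frac{\rho_{y2}}{\sqrt{1-\rho_{y1}^2}}.
$$

Inference:

$$
H_0: \sigma_{12} = 0 \quad \text{vs.} \quad H_1: \sigma_{12} \neq 0.
$$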
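The orthogonal case also simplifies the multivariate adjusted association. The short derivation below is standard algebra rather than something stated on the slides: with $\sigma_{12} = 0$ the biomarker block of the covariance matrix is diagonal, so its inverse is diagonal as well.

```latex
\gamma^2
= \frac{1}{\sigma_{yy}}
  \begin{pmatrix} \sigma_{1y} & \sigma_{2y} \end{pmatrix}
  \begin{pmatrix} \sigma_{11} & 0 \\ 0 & \sigma_{22} \end{pmatrix}^{-1}
  \begin{pmatrix} \sigma_{1y} \\ \sigma_{2y} \end{pmatrix}
= \frac{\sigma_{1y}^2}{\sigma_{11}\sigma_{yy}} + \frac{\sigma_{2y}^2}{\sigma_{22}\sigma_{yy}}
= \rho_{y1}^2 + \rho_{y2}^2 .
```

Under orthogonality the gain in surrogacy from adding $X_2$ to $X_1$ is therefore $\rho_{y2}^2$, and a valid covariance matrix requires $\rho_{y1}^2 + \rho_{y2}^2 \leq 1$.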


Graphical Illustration: Orthogonal Surrogacy

- X1 and X2 are independent.
- High adjusted partial correlation between Y and X2, given X1.

[Figure: scatterplot matrix of Y, X1 and X2 with pairwise correlations (Y and X1, 0.71; Y and X2, 0.58; X1 and X2, −0.14), and a plot of the residuals Y* against the residuals X2* with $\rho_{Y,X_2|X_1} = 0.9677$. Points are marked by fingerprint feature (FP): 0 = absent, 1 = present.]


Computational Issues

- Computation time for one fingerprint feature: about 377 seconds.
- R code with a loop over all genes.
- For all fingerprint features and all genes: a loop over all genes, nested within a loop over all fingerprint features, takes around 14.45 hours.

Main question: how can the implementation be made faster when there are more data and further analyses are needed to use all of the data?
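The 14.45 hours follows directly from the reported per-feature time. A small Python check of the arithmetic, using only figures quoted in the slides (the variable names are illustrative):

```python
n_genes = 3595               # genes in the EGFR transcriptomics matrix
n_features = 138             # fingerprint features
seconds_per_feature = 377    # reported time for one feature, all genes

n_models = n_genes * n_features                        # joint models to fit
total_hours = n_features * seconds_per_feature / 3600  # sequential run time
ms_per_model = 1000 * seconds_per_feature / n_genes    # average per model
print(n_models, round(total_hours, 2), round(ms_per_model))
```

This gives 496110 models, about 14.45 hours in total, and roughly 105 milliseconds per model on average.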


Code Structure

- Loop over genes. For each gene:
    - gls(): fit the joint model.
    - Summarize and combine results from the model.
- Summarize and combine results for all the genes.
- Computation time for all genes and all fingerprint features: about 14.45 hours.


Distribution of Computational Time for Joint Model

All the functions used for the analysis fall into three groups:

- gls(),
- anova() and summary() functions, and
- all other functions, e.g., data.frame() and cor().

[Figure: bar chart of the percentage of total computational time (y-axis from 0 to 80) for the three groups of functions: gls, anova + summary, others.]


Parallelization for the Joint Model

Using R packages:

- foreach package: foreach().
- parallel package: clusterApply(), clusterApplyLB().

Using the worker framework:

- It is a “master-slave” framework.
- Master: divides the bigger and more complex main problem into smaller subproblems and supplies them to the slaves.
- Slaves: finish the computations, return the results to the master and check for the next jobs assigned to them, if any.
- A user-specific parallelization framework in a cluster.
- Requires small tweaks, e.g., additional files or restructuring of the code, to run the code.
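The division of work between master and workers can be sketched as follows. This is a Python analogue for illustration only, not the R foreach/parallel code or the cluster worker framework described above. It uses a thread pool from the standard library so that the example is self-contained, whereas the setup in the talk distributes jobs over cluster cores. The function names and the simulated data are hypothetical, and each job computes residual-based adjusted correlations instead of fitting gls() models.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def analyse_feature(task):
    """Worker job: adjusted correlations of all genes for one fingerprint feature."""
    k, X, y, z = task
    design = np.column_stack([np.ones_like(z), z])
    resid = np.eye(len(z)) - design @ np.linalg.pinv(design)
    RX, ry = X @ resid, resid @ y
    rho = (RX @ ry) / np.sqrt(np.sum(RX**2, axis=1) * (ry @ ry))
    return k, rho

def run_all_features(X, y, Z, n_workers=4):
    """Master: one task per row of Z, results gathered into a features x genes array."""
    tasks = [(k, X, y, Z[k]) for k in range(Z.shape[0])]
    out = np.empty((Z.shape[0], X.shape[0]))
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for k, rho in pool.map(analyse_feature, tasks):
            out[k] = rho
    return out

# Small simulated example: 50 genes, 35 compounds, 6 fingerprint features.
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 35))
y = rng.normal(size=35)
Z = (rng.random((6, 35)) < 0.5).astype(float)
result = run_all_features(X, y, Z)
```

Splitting by fingerprint feature mirrors the configuration with one worker per feature; splitting the genes into blocks as well would give finer-grained tasks.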


Upscaling the Analysis for EGFR Project

- Computational time for the complete analysis with for loops over genes and fingerprint features: about 14.45 hours.
- With the worker framework:
    - With one master and 138 workers, each worker running a for loop over 3595 genes for one of the 138 fingerprint features: 259.35 seconds.
    - With 880 cores and 190 genes per core: 97.64 seconds, of which 67 seconds to fit the models and 30.64 seconds to gather the results from the different cores and combine them.
    - 880/139 = 6.33 times more cores.
    - 259.35/97.64 = 2.66 times speedup.
- Sengupta et al. (2018): accepted in Journal of Biopharmaceutical Statistics.
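The reported timings imply the following speedups relative to the sequential run. The Python snippet below only reworks the numbers quoted above; the variable names are illustrative.

```python
sequential_s = 14.45 * 3600      # nested for loops, reported as about 14.45 hours
workers_138_s = 259.35           # one master + 138 workers
cores_880_s = 97.64              # 880 cores

speedup_138 = sequential_s / workers_138_s
speedup_880 = sequential_s / cores_880_s
relative_speedup = workers_138_s / cores_880_s
core_ratio = 880 / 139
print(round(speedup_138), round(speedup_880),
      round(relative_speedup, 2), round(core_ratio, 2))
```

Relative to the sequential run, the two configurations are roughly 201 and 533 times faster. Of the 97.64 seconds in the 880-core run, 30.64 seconds are spent gathering and combining results rather than fitting models, which may partly explain why 6.33 times more cores yield only a 2.66-fold further speedup.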


Density of Adjusted Correlation

All fingerprint features for one gene (FGFBP1):

[Figure: density of the estimated adjusted correlation ($\hat{\rho}$) for gene FGFBP1 across all fingerprint features (x-axis from about −0.85 to −0.75); fingerprint feature −442307337 is marked.]

Gene FGFBP1 for a Particular Fingerprint Feature (-1592278635)

$\hat{\alpha} = 1.24239$.

[Figure: density of the estimated fingerprint effect on gene expression ($\hat{\alpha}$) for gene FGFBP1 (x-axis from −2 to 2); fingerprint feature −1592278635 is marked.]


Summary (I)

[Diagram: biomarker, clinical endpoint, treatment.]

- For drug discovery, often some biomarkers are known.
- Partial and orthogonal surrogacy allow us to evaluate the surrogacy value of adding possible biomarker(s), from different sources, to the primary biomarker.


Summary (II)

A similar approach can be implemented in other experiments as well:

- Joint model to identify microbiome biomarkers (talk by Dea Putri, Session 6a, 16:35 - 16:55).
- Multiple surrogacy in the context of microbiome data has been studied by Van der Elst et al. (2018).
