Talk presented at the Simons Foundation Biotech Symposium "Complex Data Visualization: Approach and Application" (12 September 2014) http://www.simonsfoundation.org/event/complex-data-visualization-approach-and-application/ In this talk I describe how we integrated a sophisticated computational framework directly into the StratomeX visualization technique to enable rapid exploration of tens of thousands of stratifications in cancer genomics data, creating a unique and powerful tool for the identification and characterization of tumor subtypes. The tool can handle a wide range of genomic and clinical data types for cohorts with hundreds of patients. StratomeX also provides direct access to comprehensive data sets generated by The Cancer Genome Atlas Firehose analysis pipeline. http://stratomex.caleydo.org
Visual Exploration of Clinical and Genomic Data for Patient Stratification
NILS GEHLENBORG
@nils_gehlenborg・http://www.gehlenborg.com
Broad Institute of MIT and Harvard, Cancer Program
Harvard Medical School, Center for Biomedical Informatics
Team
Alexander Lex, Harvard School of Engineering and Applied Sciences, Cambridge, MA, USA
Marc Streit, Johannes Kepler University, Linz, Austria
Christian Partl, Graz University of Technology, Graz, Austria
Sam Gratzl, Johannes Kepler University, Linz, Austria
Dieter Schmalstieg, Graz University of Technology, Graz, Austria
Hanspeter Pfister, Harvard School of Engineering and Applied Sciences, Cambridge, MA, USA
Peter J Park, Harvard Medical School, Boston, MA, USA
Nils Gehlenborg, Harvard Medical School, Boston, MA, USA & Broad Institute, Cambridge, MA
Special thanks to
Broad Institute TCGA Genome Data Analysis Center Team, in particular Michael S Noble, Lynda Chin & Gaddy Getz
Funding
Peter J Park: NIH/NCI The Cancer Genome Atlas
Nils Gehlenborg: NIH/NHGRI K99/R00 Pathway to Independence Award
?
TCGA: The Cancer Genome Atlas
20+ cancer types × 500 patients = 10,000+ patients
mRNA expression
microRNA expression
DNA methylation
protein expression
copy number variants
mutation calls
clinical parameters
Stratome
Anthony92931 / Wikimedia Commons
Correlation with clusters based on other data types?
Different outcomes?
Mutations or copy number variants associated with clusters?
Demographic differences?
Challenges
How can we explore overlap of patient sets across stratifications?
How can we compare properties of patient sets within a stratification?
How can we discover “interesting” stratifications and pathways to consider?
How can we handle terabytes of clinical and genomic data in visualization tools?
Problem 1
Comparing Patient Sets across Stratifications
[Figure, built up over four slides. Axes: Patients, Stratifications. Columns: mRNA (#1, #2, #3, #4); Copy Number gene X (del, amp, normal); Mutation gene Y (mut, normal)]
StratomeX (short for Stratome Explorer)
[Figure. Columns: mRNA (#1, #2, #3, #4); Copy Number (del, amp, normal); Mutation (mut, normal)]
Select band
Select block
Compare clusterings: consensus NMF and hierarchical
Park columns
Compare clusterings: left cluster split
Compare clusterings: right cluster split
Compare clusterings: left cluster contained in right cluster
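The cluster comparisons above (split, contained) come down to counting the patients shared by each pair of clusters. A minimal sketch of that counting, assuming each stratification is a plain mapping from patient ID to cluster label; the function names and toy data are hypothetical and not taken from StratomeX:

```python
from collections import Counter


def overlap_table(left, right):
    """Count patients shared by each (left cluster, right cluster) pair.

    `left` and `right` map patient ID -> cluster label; only patients
    present in both stratifications are counted.
    """
    shared = left.keys() & right.keys()
    return Counter((left[p], right[p]) for p in shared)


def contained_in(table, left_label, right_label):
    """True if every shared patient of `left_label` falls into `right_label`."""
    total = sum(n for (l, _), n in table.items() if l == left_label)
    return total > 0 and table[(left_label, right_label)] == total


# Hypothetical toy data: two clusterings of the same six patients.
cnmf = {"p1": "c1", "p2": "c1", "p3": "c2", "p4": "c2", "p5": "c3", "p6": "c3"}
hier = {"p1": "h1", "p2": "h1", "p3": "h1", "p4": "h2", "p5": "h2", "p6": "h2"}

table = overlap_table(cnmf, hier)
for (l, r), n in sorted(table.items()):
    print(f"{l} -> {r}: {n} patient(s)")
print("c1 contained in h1:", contained_in(table, "c1", "h1"))
print("c2 split across right clusters:", not any(
    contained_in(table, "c2", r) for r in set(hier.values())))
```

A left cluster whose count is spread over several right clusters is "split"; one whose whole count sits in a single pair is "contained".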
Problem 2
Comparing Patient Sets within Stratifications
Block Visualizations: Patient Properties
Numerical Data
Matrix
Vector
Matrix + (Pathway) Maps
Categorical Data
Scalar
Add KEGG glioma pathway and map mRNA transcript levels
Modify color mapping on the fly
View pathway detail (cluster 2)
Zoom into pathway detail (cluster 2): EGFR down-regulated
View pathway detail (cluster 3)
Zoom into pathway detail (cluster 3): EGFR up-regulated
Add copy number for EGFR
Add survival stratified by TP53 mutation status
View detail of Kaplan-Meier plot based on TP53
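For readers unfamiliar with Kaplan-Meier plots, the curve in each block is a step function that drops at every observed death. A minimal sketch of the estimator, assuming follow-up times plus an event flag per patient; the cohort below is hypothetical and the code is not StratomeX's implementation:

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time per patient
    events -- 1 if death was observed, 0 if the patient was censored
    """
    # Process patients in order of follow-up time.
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical follow-up data (months), split by TP53 mutation status.
cohort = {
    "TP53 mutated": ([5, 8, 8, 12, 20], [1, 1, 0, 1, 0]),
    "TP53 wild type": ([10, 15, 22, 30, 30], [1, 0, 1, 0, 0]),
}
curves = {label: kaplan_meier(t, e) for label, (t, e) in cohort.items()}
for label, curve in curves.items():
    print(label, curve)
```

One curve per patient set is computed independently, which is what stratifying the survival plot by a mutation status amounts to.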
?
Knowledge-driven Exploration
Data-driven Exploration
Problem 3
Finding “Interesting” Stratifications and Pathways
Is there a mutation that overlaps with this mRNA cluster?
Is there a CNV that affects survival?
Is there a pathway that is enriched in this cluster?
Is there a mutually exclusive mutation?
Query
Retrieve
Visualize
Stratifications
Clinical Params
Pathways
Guided Exploration
LineUp
S Gratzl, A Lex, N Gehlenborg, H Pfister and M Streit, “LineUp: Visual Analysis of Multi-Attribute Rankings”, IEEE Transactions on Visualization and Computer Graphics 19:2277-2286 (2013)
Main TCGA Paper published in Nature in 2013
First goal here: Characterize mRNA clusters
Example: Clear Cell Renal Carcinoma (KIRC)
View TCGA mRNA subtypes
Add MutSig q-values for mutations
Invert q-value mapping
Add filter to inverted q-value as cut-off
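The two steps above (invert the q-value mapping, then filter) can be sketched as follows. The slides do not say which mapping LineUp applies, so the linear inversion `1 - q`, the cut-off of 0.05, and the gene names are all assumptions made for illustration:

```python
def inverted_score(q):
    """Map a q-value in [0, 1] so that more significant genes score higher.

    ASSUMPTION: a simple linear inversion; the actual mapping is not
    given in the slides.
    """
    return 1.0 - q


def select_genes(q_values, q_cutoff=0.05):
    """Keep genes at or below the q-value cut-off, best score first."""
    min_score = inverted_score(q_cutoff)
    kept = [(gene, inverted_score(q)) for gene, q in q_values.items()
            if inverted_score(q) >= min_score]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical q-values; placeholder gene names, not MutSig output.
q_values = {"GENE_A": 0.001, "GENE_B": 0.2, "GENE_C": 0.04}
selected = select_genes(q_values)
print([gene for gene, _ in selected])
```

Filtering on the inverted score rather than on the raw q-value keeps the cut-off consistent with the ranking direction shown in the table.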
Query mutated genes
Queries
Retrieve Stratifications
Sets with large overlap: Jaccard Index
Similar stratifications: Adjusted Rand Index
Survival: Log Rank Score (one vs rest)
Retrieve Pathways
Gene Set Enrichment Score: original or PAGE (one vs rest)
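The two stratification scores named above have standard definitions, sketched here in plain Python. The Jaccard Index compares two patient sets; the Adjusted Rand Index compares two complete stratifications of the same patients. The input layout and toy data are assumptions, and this is not StratomeX's own code:

```python
from collections import Counter
from math import comb


def jaccard_index(a, b):
    """|A intersect B| / |A union B| for two patient sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index of two labelings of the same patients.

    labels_a[i] and labels_b[i] are the cluster labels of patient i.
    """
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("labelings must cover the same patients")
    if n < 2:
        return 1.0
    # Pair counts from the contingency table and its margins.
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster on both sides
    return (index - expected) / (max_index - expected)


# Hypothetical patient sets and labelings.
cluster = {"p1", "p2", "p3"}
mutated = {"p2", "p3", "p4"}
print("Jaccard:", jaccard_index(cluster, mutated))
print("ARI:", adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Jaccard scores a single displayed set against candidate sets, while the Adjusted Rand Index scores a whole displayed stratification against candidate stratifications and corrects for chance agreement.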
Query mutated genes
Result of Jaccard Index query: preview PTEN
Query mutated genes
Query mutated genes with cluster m2
Result of Jaccard Index query: preview MTOR
Re-order columns
Add TCGA microRNA subtypes (direct insert mode)
Observe large overlap between m1 and mi3
Observe large overlap between m3 and mi2
Query for copy number variation matching m3
Query only tumor suppressor genes (Vogelstein et al.)
Score only deletions
View CDKN2A copy number status and m3 and mi2 overlap
Add survival stratified by TCGA microRNA clusters
Find gene mutation that affects survival
Score only mutations
View BAP1 mutation status and survival stratified by BAP1
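A survival query of this kind ranks candidate sets by how strongly they separate survival. A minimal sketch of a two-group (one vs rest) log-rank statistic, assuming follow-up times, event flags, and a membership flag per patient; the slides do not specify how the Log Rank Score is derived from the test, and the data below are hypothetical:

```python
from math import erfc, sqrt


def logrank_statistic(times, events, in_group):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times    -- follow-up time per patient
    events   -- 1 if death was observed, 0 if censored
    in_group -- True for patients in the set, False for the rest
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed = expected = variance = 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)                       # at risk, all
        n1 = sum(1 for x, g in zip(times, in_group) if g and x >= t)
        d = sum(1 for x, e in zip(times, events) if e and x == t)  # deaths, all
        d1 = sum(1 for x, e, g in zip(times, events, in_group)
                 if e and g and x == t)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return (observed - expected) ** 2 / variance


def chi2_p_value_1df(statistic):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return erfc(sqrt(statistic / 2))


# Hypothetical cohort: the first three patients carry the mutation.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
mutated = [True, True, True, False, False, False]
stat = logrank_statistic(times, events, mutated)
print("statistic:", stat, "p-value:", chi2_p_value_1df(stat))
```

Running the same function once per candidate gene and sorting by the statistic gives a ranking of the sort a "find gene mutation that affects survival" query needs.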
Query for enriched pathway in TCGA mRNA cluster m4
Preview KEGG ribosome pathway overexpression in m4
Confirm selection
Change color mapping
View ribosome pathway detail for TCGA mRNA cluster m4
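The pathway query above can be scored with PAGE (Parametric Analysis of Gene set Enrichment), which turns the mean fold change of a gene set into a z-score against all genes. A minimal sketch, assuming per-gene log fold changes of one cluster versus the rest are already available; the gene names and values are hypothetical, and the use of the population standard deviation is an assumption:

```python
from math import sqrt
from statistics import mean, pstdev


def page_z_score(fold_changes, gene_set):
    """PAGE-style z-score of a gene set.

    fold_changes -- gene -> log fold change (cluster vs rest)
    gene_set     -- genes belonging to the pathway
    """
    values = list(fold_changes.values())
    mu = mean(values)
    sigma = pstdev(values)  # ASSUMPTION: population standard deviation
    members = [fold_changes[g] for g in gene_set if g in fold_changes]
    if not members or sigma == 0:
        return 0.0
    return (mean(members) - mu) * sqrt(len(members)) / sigma


# Hypothetical log fold changes for six genes.
fold_changes = {"g1": 2.0, "g2": 1.5, "g3": -0.5,
                "g4": -1.0, "g5": 0.0, "g6": -2.0}
up_set = {"g1", "g2"}
down_set = {"g4", "g6"}
print("up:", page_z_score(fold_changes, up_set))
print("down:", page_z_score(fold_changes, down_set))
```

A large positive score marks a pathway whose genes are overexpressed in the cluster relative to the rest, a large negative score marks underexpression.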
?
Problem 4
Dealing with Terabytes of Cancer Genomics Data
TCGA Data Coordination Center
Broad Institute Genome Data Analysis Center
Standardized Data Sets
Standardized Analyses
Analysis Reports
MSKCC cBio Portal
TCGA Working Groups
StratomeX
...
Standardized Data Sets Standardized Analyses Analysis Reports
Data set versioning
Format normalization
Removal of redacted data
. . .
Mutation Analysis
Copy Number Analysis
Clustering
Correlations
Pathway Analysis
. . .
102
http://gdac.broadinstitute.org: individual downloads and view reports
firehose_get: bulk download
Data Package (one per tumor type) = Data Matrices + Stratifications
Data Matrices: mRNA (array & sequencing), microRNA (array & sequencing), methylation, reverse phase protein array, clinical parameters
Stratifications: clustering (CNMF & hierarchical), gene mutation status (binary), gene copy number status (5 class)
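A categorical status per gene turns directly into a stratification: patients with the same class form one set. A minimal sketch for the 5-class copy number status, assuming GISTIC-style integer codes from -2 to 2; the slides only say "5 class", so the coding, class names, and toy data are assumptions:

```python
from collections import defaultdict

# ASSUMPTION: GISTIC-style thresholded codes; not stated in the slides.
COPY_NUMBER_CLASSES = {
    -2: "homozygous deletion",
    -1: "heterozygous deletion",
    0: "normal",
    1: "low-level amplification",
    2: "high-level amplification",
}


def stratify_by_copy_number(status_by_patient):
    """Group patient IDs by copy number class for one gene."""
    groups = defaultdict(set)
    for patient, code in status_by_patient.items():
        groups[COPY_NUMBER_CLASSES[code]].add(patient)
    return dict(groups)


def deleted_patients(status_by_patient):
    """Patients with any deletion, as used when scoring only deletions."""
    return {p for p, code in status_by_patient.items() if code < 0}


# Hypothetical copy number calls for one gene in five patients.
status = {"p1": -2, "p2": 0, "p3": -1, "p4": 2, "p5": 0}
groups = stratify_by_copy_number(status)
print(groups)
print(sorted(deleted_patients(status)))
```

The binary mutation status works the same way with two classes instead of five.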
Data Packages
up to 24 data and result files from 18 Firehose archives
up to 500 MB (190 MB compressed)
Schroeder et al. Genome Medicine 2013, 5:9
Challenges
How can we explore overlap of patient sets across stratifications?
How can we compare properties of patient sets within a stratification?
How can we discover “interesting” stratifications and pathways to consider?
How can we handle terabytes of clinical and genomic data in visualization tools?
StratomeX is part of the Caleydo Visualization Framework
Implemented in Java, uses OpenGL and Eclipse Rich Client Platform
Binaries available for Linux, Windows, Mac OS X
Requires Java 1.7 JRE or JDK (on Mac OS X)
Open source licensed under BSD license
Source code on GitHub
CALEYDO
StratomeX
http://stratomex.caleydo.org
http://www.github.com/caleydo
A Lex, M Streit, H-J Schulz, C Partl, D Schmalstieg, PJ Park, N Gehlenborg, “StratomeX: Visual Analysis of Large-Scale Heterogeneous Genomics Data for Cancer Subtype Characterization”, Computer Graphics Forum (EuroVis '12), 31:1175-1184 (2012)
M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, PJ Park, N Gehlenborg, “Guided Visual Exploration of Genomic Stratifications in Cancer”, Nature Methods 11:884–885 (2014)
Plans
Where to go from here?
Domino
S Gratzl, N Gehlenborg, A Lex, H Pfister and M Streit, “Domino: Extracting, Comparing, and Manipulating Subsets across Multiple Tabular Datasets”, IEEE Transactions on Visualization and Computer Graphics (2014)
INTEGRATION
Horizontal Integration across Data Types
Biological Insight
Vertical Integration across Data Levels
Confirmation & Troubleshooting
Refinery Platform
Data repository based on ISA-Tab for reproducible research
Workflow execution in Galaxy
Integrated visualization tools with access to provenance
http://www.refinery-platform.org
Execute Logrank Test query
Select displayed set
Execute Jaccard Index query
Select displayed stratification
Execute Adjusted Rand Index query
Select pathway
Select displayed set
Execute GSEA query
Select displayed stratification
Select clinical param. in LineUp view
Select displayed stratification
Execute Logrank Test query
Execute PAGE query
Select stratification in LineUp view
Select pathway in LineUp view
Select clinical param. in LineUp view
Add stratification
Based on Logrank Test score (survival)
Based on similarity to displayed stratification
Based on overlap with displayed set
Add pathway
Stratify with displayed stratification
Find based on differential expression in displayed set
Add other data
Stratify with displayed stratification
Display unstratified
Add pathway
Based on Logrank Test score (survival)
Manually
Add other data
Add independent column
Add dependent column
Add independent column to existing one
Manually
Based on GSEA
Based on PAGE
Open Query Wizard
Select clinical param. in LineUp view