Data Visualization to Enhance our Understanding of the Cancer Genome
HARVARD MEDICAL SCHOOL DEPARTMENT OF BIOMEDICAL INFORMATICS
NILS GEHLENBORG
@nils_gehlenborg http://gehlenborglab.org
SAMUEL GRATZL JOHANNES KEPLER UNIVERSITY LINZ
ALEXANDER LEX UNIVERSITY OF UTAH
MARC STREIT JOHANNES KEPLER UNIVERSITY LINZ
ROLE OF VISUALIZATION
[Figure: discovery cycle. HYPOTHESIS → experiment → DATA → interpretation → INSIGHT → hypothesis generation → HYPOTHESIS]
PUBLICATION
PRESENTATION: “Storytelling”
EXPLORATION: “Pattern Discovery”
HYPOTHESIS-DRIVEN DISCOVERY (starting point: HYPOTHESIS)
DATA-DRIVEN DISCOVERY (starting point: DATA)
The Cancer Genome Atlas
10,000+
genomes + clinical data + molecular data
CANCER SUBTYPES
mRNA expression
microRNA expression
DNA methylation
protein expression
copy number variants
mutation calls
clinical parameters
[Figure: the same patients stratified by three data types]
mRNA expression - clustering: C1, C2, C3, C4
copy number variants - gene X: DEL, NORMAL, AMP
mutation calls - gene Y: WILDTYPE, MUT
ALGORITHMIC APPROACHES
unsupervised clustering of multivariate data
integrated clustering across data types (e.g., Mo et al., PNAS, 2013)
network-based stratification (Hofree et al., Nat Methods, 2014)
correlation testing
VISUALIZATION APPROACHES
integrative heatmaps
Verhaak et al., Cancer Cell, 2010
Matrix Visualization: Publication Figure
VISUALIZATION APPROACHES
genome browsers (UCSC Cancer Genomics Browser, IGV)
Robinson et al., Nat Biotech, 2011
Genome-Centric Visualization: IGV
PROBLEM 1
Visualize overlap of patient sets across two or more stratifications.
PROBLEM 2
Visualize characteristics of patient sets within a stratification of interest.
A Lex, M Streit, H-J Schulz, C Partl, D Schmalstieg, PJ Park, N Gehlenborg, Comput Graph Forum, 2012
M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, PJ Park, N Gehlenborg, Nat Methods, 2014
Divide & Conquer Visualization: StratomeX
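Problem 1 can be made concrete with a small cross-tabulation. The sketch below is not taken from StratomeX; it only illustrates, with hypothetical patient IDs and group labels, the pairwise overlap counts that any visualization of two stratifications would need to encode.

```python
from collections import Counter

# Hypothetical patient-to-group assignments for two stratifications.
mrna_cluster = {"P1": "C1", "P2": "C1", "P3": "C2", "P4": "C2", "P5": "C3"}
gene_x_cnv = {"P1": "AMP", "P2": "NORMAL", "P3": "AMP", "P4": "AMP", "P5": "DEL"}


def overlap_counts(strat_a, strat_b):
    """Count the patients shared by each pair of groups, one group per stratification."""
    shared_patients = strat_a.keys() & strat_b.keys()
    return Counter((strat_a[p], strat_b[p]) for p in shared_patients)


counts = overlap_counts(mrna_cluster, gene_x_cnv)
for (group_a, group_b), n in sorted(counts.items()):
    print(f"{group_a} & {group_b}: {n} patient(s)")
```

Patients missing from either stratification are ignored here; whether that is appropriate depends on the data set.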
PROBLEM 3
Identify relevant stratifications, pathways, and clinical variables.
Is there a mutation that overlaps with this mRNA cluster?
Is there a CNV that affects survival?
Is there a pathway that is enriched in this cluster?
Is there a mutually exclusive mutation?
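The last question can be illustrated with a simple score. The slides do not say how mutual exclusivity is scored, so the following is only one possible sketch: it assumes a hypergeometric null model and hypothetical patient sets, and asks how likely it is that two mutated sets overlap as little as observed if they were placed at random.

```python
from math import comb


def low_overlap_p_value(mutated_a, mutated_b, n_patients):
    """P(overlap <= observed) under a hypergeometric null model.

    Small values suggest the two mutations co-occur less often than chance,
    i.e. they tend toward mutual exclusivity.
    """
    a, b = len(mutated_a), len(mutated_b)
    observed = len(mutated_a & mutated_b)
    # Sum the hypergeometric probabilities for overlaps 0..observed.
    favorable = sum(
        comb(a, k) * comb(n_patients - a, b - k) for k in range(observed + 1)
    )
    return favorable / comb(n_patients, b)


# Hypothetical cohort of 10 patients; the two mutated sets never co-occur.
mutated_x = {"P1", "P2", "P3", "P4"}
mutated_y = {"P5", "P6", "P7", "P8"}
p = low_overlap_p_value(mutated_x, mutated_y, n_patients=10)
print(f"{p:.3f}")  # prints 0.071
```

In a cohort this small even a perfect split is weak evidence, which is one reason such scores are better used for ranking candidates than as final answers.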
GUIDED EXPLORATION
Query, Rank, Visualize
Stratifications, Clinical Params, Pathways
M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, PJ Park, N Gehlenborg, Nat Methods, 2014
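The query and rank steps can be sketched in a few lines. This is an illustration under assumptions, not the published implementation: the Jaccard index is used as the overlap score, and the candidate names and patient sets are hypothetical.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(query, candidates):
    """Return (name, score) pairs, best overlap with the query set first."""
    scored = [(name, jaccard(query, members)) for name, members in candidates.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Query: patients in one hypothetical mRNA cluster.
query = {"P1", "P2", "P3", "P4"}
# Candidates: hypothetical patient sets from other stratifications.
candidates = {
    "gene Y mutated": {"P1", "P2", "P3", "P9"},
    "gene X amplified": {"P4", "P5", "P6"},
    "gene Z deleted": {"P7", "P8"},
}

ranking = rank_candidates(query, candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Only the top-ranked candidates would then be passed on to the visualize step, which keeps the display manageable when there are thousands of candidate stratifications.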
AND NOW WHAT?
DATA-DRIVEN DISCOVERY
EXPLORATION: “Pattern Discovery”
DATA-DRIVEN COMMUNICATION
PRESENTATION: “Storytelling”
PUBLICATION
Current Model
Exploration → finding → Authoring → figure/video → Presentation
What we show.
What we tell.
What we did.
CLUE
Capture: track provenance
Label: annotate observations
Understand: make sense of observations
Explain: tell the story
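The capture and label steps suggest a simple data structure. The sketch below is an assumption about how such a record could look, not the CLUE implementation: each captured state points to the state it was derived from, carries an optional note, and a story is the path from the first state to a chosen one. All names and actions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class State:
    """One captured analysis state, linked to the state it was derived from."""
    action: str
    parent: Optional["State"] = None
    note: str = ""  # Label: free-text annotation of an observation


def capture(action, parent=None):
    """Capture: record a new state that follows `parent`."""
    return State(action=action, parent=parent)


def story(state):
    """Explain: return the steps from the first state to `state`, in order."""
    steps = []
    while state is not None:
        steps.append(state)
        state = state.parent
    return list(reversed(steps))


root = capture("show mRNA clustering")
cnv = capture("add gene X copy number column", root)
side_branch = capture("add clinical parameter column", root)  # explored, not in the story
mutation = capture("add gene Y mutation column", cnv)
mutation.note = "hypothetical observation worth reporting"

for step in story(mutation):
    print(step.action, "|", step.note)
```

Because every state keeps its parent, branches that did not make it into the story remain available, so a reader could step back from the presentation into the exploration.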
CLUE Model
Exploration, Authoring, Presentation
vistories
VISTORY = visual story + history
Do collaborative data analysis.
Use during peer-review.
Publish with a paper.
Embed in a presentation.
http://vistories.org
Demos and prototypes built with
We are hiring postdocs & developers!
HARVARD MEDICAL SCHOOL DEPARTMENT OF BIOMEDICAL INFORMATICS
See http://gehlenborglab.org or http://dbmi.med.harvard.edu for details.
Data visualization, analysis, and management for:
• genomic structural variants
• dynamics of the 3D genome
• cancer subtypes in patient cohorts
• exploration tools for data repositories
• provenance graphs