Provenance Visualization for Biomedical Discovery
HARVARD MEDICAL SCHOOL DEPARTMENT OF BIOMEDICAL INFORMATICS
Nils Gehlenborg・http://gehlenborglab.org・@nils_gehlenborg
Tracing the Origins of Data and Ideas
Nature asked 1,576 researchers if there is a reproducibility crisis in science.
Significant crisis (52%)
Slight crisis (38%)
Don’t know (7%)
No crisis (3%)
M Baker, Nature 533, 452-454, 2016
SOCIAL ISSUE
Intentional?
TECHNICAL ISSUES
Inability to capture everything?
Inability to communicate everything?
The term, provenance, has been used in a variety of ways to describe different types of origins and histories.
ED Ragan et al., IEEE Transactions on Visualization and Computer Graphics 22, 31 – 40, 2015
Types of Provenance Information
Data
Visualization
Interaction
Insight
Rationale
COMPUTATIONAL WORKFLOWS
EXPLORATORY VISUALIZATION
[Figure: discovery cycle. An experiment yields DATA, interpretation yields INSIGHT, and hypothesis generation yields a HYPOTHESIS that leads to the next experiment.]
EXPLORATION: “Pattern Discovery”
PRESENTATION: “Storytelling”
PUBLICATION
HYPOTHESIS-DRIVEN DISCOVERY: the cycle is entered at the HYPOTHESIS.
DATA-DRIVEN DISCOVERY: the cycle is entered at the DATA.
mRNA expression
microRNA expression
DNA methylation
protein expression
copy number variants
mutation calls
clinical parameters
[Figure: three stratifications of the same patients]
mRNA expression - clustering: C1, C2, C3, C4
copy number variants - gene X: DEL, NORMAL, AMP
mutation calls - gene Y: WILDTYPE, MUT
PROBLEM 1
Visualize overlap of patient sets across two or more stratifications.
PROBLEM 2
Visualize characteristics of patient sets within a stratification of interest.
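Problem 1 comes down to a cross-tabulation: for every pair of groups taken from two stratifications, count the patients they share. The following is a minimal sketch of that counting step, not the StratomeX implementation. The function name `overlap_counts`, the patient IDs, and the toy group assignments are all hypothetical.

```python
from collections import Counter


def overlap_counts(strat_a, strat_b):
    """Count patients shared by each pair of groups from two stratifications.

    strat_a, strat_b: dicts mapping patient ID -> group label.
    Only patients present in both stratifications are counted.
    """
    shared = strat_a.keys() & strat_b.keys()
    return Counter((strat_a[p], strat_b[p]) for p in shared)


# Hypothetical toy data: mRNA expression clusters and gene Y mutation status.
mrna_cluster = {"p1": "C1", "p2": "C1", "p3": "C2", "p4": "C2", "p5": "C3"}
gene_y_status = {
    "p1": "MUT",
    "p2": "MUT",
    "p3": "WILDTYPE",
    "p4": "MUT",
    "p5": "WILDTYPE",
}

counts = overlap_counts(mrna_cluster, gene_y_status)
for (cluster, status), n in sorted(counts.items()):
    print(cluster, status, n)
```

Each count corresponds to one connection between a group in the first stratification and a group in the second; how those counts are drawn is a separate design decision.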
A Lex, M Streit, H-J Schulz, C Partl, D Schmalstieg, PJ Park, N Gehlenborg, Comput Graph Forum, 2012
M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, PJ Park, N Gehlenborg, Nat Methods, 2014
Divide & Conquer Visualization: StratomeX
PROBLEM 3
Identify relevant stratifications, pathways, and clinical variables.
Is there a mutation that overlaps with this mRNA cluster?
Is there a CNV that affects survival?
Is there a pathway that is enriched in this cluster?
Is there a mutually exclusive mutation?
GUIDED EXPLORATION
Query
Rank
Visualize
Stratifications
Clinical Params
Pathways
M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, PJ Park, N Gehlenborg, Nat Methods, 2014
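The query-and-rank step can be illustrated for a question such as "Is there a mutation that overlaps with this mRNA cluster?". The sketch below scores each candidate patient set against a query set and sorts by score. Using the Jaccard index as the score is an assumption made for illustration, and the names `jaccard`, `rank_candidates`, and the toy patient sets are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(query_set, candidates):
    """Return (name, score) pairs sorted from best to worst overlap."""
    scored = [
        (name, jaccard(query_set, members))
        for name, members in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical query: the patients in one mRNA cluster.
query = {"p1", "p2", "p3"}

# Hypothetical candidates: patients carrying a mutation in each gene.
candidates = {
    "gene X mutated": {"p1", "p2", "p3", "p4"},
    "gene Y mutated": {"p3", "p5"},
    "gene Z mutated": {"p6"},
}

ranking = rank_candidates(query, candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The top-ranked candidates are the ones worth visualizing first; other questions, such as an effect on survival, would need a different score.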
DATA-DRIVEN DISCOVERY
[Figure: discovery cycle of experiment, DATA, interpretation, INSIGHT, hypothesis generation, and HYPOTHESIS, entered at the DATA.]
EXPLORATION: “Pattern Discovery”
PRESENTATION: “Storytelling”
PUBLICATION
Current Model
[Figure: Exploration -> finding -> Authoring -> figure/video -> Presentation]
DATA-DRIVEN DISCOVERY
DATA-DRIVEN COMMUNICATION
What we show.
What we tell.
What we did.
DATA-DRIVEN DISCOVERY
DATA-DRIVEN COMMUNICATION
Capture: track provenance
Label: annotate observations
Understand: make sense of observations
Explain: tell the story
CLUE
CLUE Model
[Figure: vistories; Exploration, Authoring, Presentation]
DATA-DRIVEN DISCOVERY
DATA-DRIVEN COMMUNICATION
VISTORY = visualization + story + history
Do collaborative data analysis.
Use during peer-review.
Publish with a paper.
Embed in a presentation.
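The Capture and Label steps can be sketched as a small provenance graph: every interaction adds a state, states can be annotated, and the path from the start to an annotated state gives the history behind a finding. This is an illustrative sketch only, not the CLUE implementation. The class names `State` and `ProvenanceGraph`, their methods, and the example actions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class State:
    """One captured visualization state in the exploration history."""
    state_id: int
    action: str  # interaction that produced this state
    parent: Optional["State"] = field(default=None, repr=False)
    annotation: str = ""  # filled in during the Label step
    children: List["State"] = field(default_factory=list, repr=False)


class ProvenanceGraph:
    """Tree of states; going back and branching keeps earlier states."""

    def __init__(self):
        self.root = State(0, "start")
        self.current = self.root
        self._next_id = 1

    def capture(self, action):
        """Capture: record a new state as a child of the current one."""
        state = State(self._next_id, action, parent=self.current)
        self._next_id += 1
        self.current.children.append(state)
        self.current = state
        return state

    def jump_to(self, state):
        """Return to an earlier state; the next capture starts a branch."""
        self.current = state

    def label(self, state, text):
        """Label: attach an observation to a state."""
        state.annotation = text

    def story(self, state):
        """Path of states from the root to the given state."""
        path = []
        while state is not None:
            path.append(state)
            state = state.parent
        return list(reversed(path))


graph = ProvenanceGraph()
s1 = graph.capture("add mRNA clustering")
s2 = graph.capture("add gene X copy number")
graph.jump_to(s1)  # go back and explore a different branch
s3 = graph.capture("add gene Y mutation status")
graph.label(s3, "hypothetical note: cluster C1 overlaps with gene Y mutations")

for state in graph.story(s3):
    print(state.state_id, state.action, state.annotation)
```

Because the abandoned branch (`s2`) stays in the graph, a reader can follow the path to a finding and also see what else was tried.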
SAMUEL GRATZL, JOHANNES KEPLER UNIVERSITY LINZ
ALEXANDER LEX, UNIVERSITY OF UTAH
MARC STREIT, JOHANNES KEPLER UNIVERSITY LINZ
HOLGER STITZ, JOHANNES KEPLER UNIVERSITY LINZ
My lab is hiring postdocs!
HARVARD MEDICAL SCHOOL DEPARTMENT OF BIOMEDICAL INFORMATICS
See http://gehlenborglab.org or http://dbmi.med.harvard.edu for details.
Data visualization, analysis, and management for:
• exploration tools for data repositories
• provenance graphs
• genomic structural variants
• dynamics of the 3D genome
• cancer subtypes in patient cohorts