Texas A&M Health Sciences Center, February 6, 2013. Harry Hochheiser, [email protected]
Adventures in Translational Bioinformatics
Harry Hochheiser
University of Pittsburgh School of Medicine
Department of Biomedical Informatics, [email protected]
© 2010 Harry Hochheiser. Licensed under Creative Commons Attribution-NonCommercial-NoDerivs license
Biomedical Informatics: the use of informatics for the improvement of biomedical research and clinical care
Shortliffe & Blois, “The Computer Meets Medicine and Biology: Emergence of a Discipline,” in Shortliffe & Cimino, eds., Biomedical Informatics: Computer Applications in Health Care and Biomedicine
Friedman, “A ‘fundamental theorem’ of biomedical informatics,” JAMIA 2009
http://www.ncats.nih.gov/research/cts/cts.html
What is “Translational” research?
● My research is translational...
● ...because I want to get funding?
Co-Clinical Trials
Nardella, Lunardi, Patnaik, and Cantley, “The APL Paradigm and the ‘Co-Clinical Trial’ Project,” Cancer Discovery 2011
● Acute Promyelocytic Leukemia (APL)
● 13 years of work – cloning, description, transgenics, trials
● Successful therapy now approved for clinical use
● Co-clinical trials – synchronize human and animal trials.
Measuring Success
● How is success of translational science measured?
● Cures? Therapies?
● Citations mapping model organism studies to T2, T3, T4 success?
● How do we assess impact that doesn't play out in the usual 3-5 year grant cycle?
FaceBase: http://www.facebase.org
National Institute of Dental and Craniofacial Research
Five-year initial phase 2009-2014
“..systematically compile the biological instructions to construct the middle region of the human face and precisely define the genetics underlying its common developmental disorders, such as cleft lip and palate”
10 Projects: U-flavor contracts
Data Management and Coordination Hub
– “One-stop access to craniofacial research data”
– “Allow scientists to more rapidly and effectively generate hypotheses and accelerate the pace of their research.”
FaceBase Projects (Hochheiser et al. 2011)
FaceBase Data
● 10 projects
● 4 organisms: mouse, zebrafish, chick, human
● Varying developmental time points
● Data types
  ● Expression: microarray, ChIP-Seq, miRNA
  ● Images: 3D MRI, OPT, microMRI, microCT, 3D human facial mesh
  ● Sequences: genotypes, GWAS, CNV
  ● Software: 3D facial image analysis
  ● Strains: Cre recombinase drivers
● 1 Data management and coordination hub
What does it mean to effectively share data?
Multiple data types
– Modalities
– Organisms
– Protocols
Multiple Groups
– Some collaborative, others...
Diversity of data
Diversity of projects, goals, etc.
Challenges are both technical and social/organizational
FaceBase Hub Technical Challenges
● Data diversity: sequences, microarrays, miRNA, images...
● Data volume: genotypes, RNA-Seq
● Anatomy, phenotype
● Integration: UCSC Genome Browser, between datasets
● “Controlled access” human data: genotypes, facial images
● Tools: searching, browsing
● Metadata
Metadata: Theory
● Well-defined metadata fields/attributes
● Controlled vocabularies provide consistent terminology for each field
● Link to appropriate resources as needed: NCBI, MGI, etc.
● Additional attributes specific to each data type
● Consistent metadata supports search, navigation...
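The metadata-in-theory bullets above can be sketched in code. This minimal Python example assumes a hypothetical controlled vocabulary for a single `strain` field, in which a curator has already decided which variant spellings (here, the two quoted on the "Metadata in practice" slide) refer to the same strain. The field name, vocabulary contents, and function names are illustrative assumptions, not part of FaceBase.

```python
# Hypothetical controlled vocabulary for a `strain` field:
# canonical term -> variant spellings a curator has judged equivalent.
# Illustrative only; not an official strain nomenclature table.
STRAIN_VOCABULARY = {
    "C57BL/6J": {"C57Bl/6J", "C57B6J"},
}


def build_lookup(vocabulary):
    """Map each canonical term and synonym (case-insensitively) to its canonical term."""
    lookup = {}
    for canonical, synonyms in vocabulary.items():
        for variant in {canonical, *synonyms}:
            lookup[variant.casefold()] = canonical
    return lookup


def normalize(value, lookup):
    """Return the canonical term for value, or None if it is not in the vocabulary."""
    return lookup.get(value.strip().casefold())


def validate_records(records, field, lookup):
    """Rewrite `field` to canonical terms; collect unrecognized values for curator review."""
    cleaned, unknown = [], []
    for record in records:
        canonical = normalize(record[field], lookup)
        if canonical is None:
            unknown.append(record[field])
            cleaned.append(dict(record))
        else:
            cleaned.append({**record, field: canonical})
    return cleaned, unknown


records = [
    {"id": "ds1", "strain": "C57Bl/6J"},
    {"id": "ds2", "strain": "C57B6J"},
    {"id": "ds3", "strain": "mystery mouse"},
]
cleaned, unknown = validate_records(records, "strain", build_lookup(STRAIN_VOCABULARY))
print([r["strain"] for r in cleaned])  # ['C57BL/6J', 'C57BL/6J', 'mystery mouse']
print(unknown)                         # ['mystery mouse']
```

Values the vocabulary does not recognize are passed through unchanged but flagged, so a curator can either add them as synonyms or correct the record.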
Metadata in practice
“What's the difference between mutation and Genotype?”
“Uhh. I'll have to get back to you on that.”
“Strain name? Is ‘C57B6J’ the same as ‘C57Bl/6J’?”
“We're lucky to have any information at all when it comes to these mice...”
“The official name of the strain is ____, but everybody calls it ____”
Biomedical ontologies might solve some terminology and structure problems
Human Anatomy: EHDAA, FMA
Mouse Anatomy: EMAP, MA
Mouse/Human Phenotypes: MP/HPO
Genes: GO
Data Models: MIAME, MIxx, ISA-Tab, OBI
Metadata Possibilities: Ontologies & Data Models
Ontologies, Data models, etc. User Practice
Grand Canyon NPS http://www.fotopedia.com/items/fickr-7553734530
Implications
Good metadata is vital for search, retrieval, data sharing, reuse
Curation effort can compensate for some inadequacies, but
– $E_C$ = curation effort, $M_q$ = metadata quality
– $E_C = f(1/M_q)$
– Possibly non-linear
This doesn't scale.
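A toy model makes the scaling concern concrete. The slide gives only $E_C = f(1/M_q)$; the sketch below assumes, purely for illustration, a power law $f(x) = x^k$ with a default exponent of 2 and one unit of effort per dataset at perfect quality ($M_q = 1$). Neither the functional form nor the exponent comes from the talk.

```python
def curation_effort(metadata_quality, k=2.0, base_effort=1.0):
    """Toy model: E_C = base_effort * (1 / M_q) ** k, with M_q in (0, 1].

    The power-law form and the default exponent k=2 are illustrative
    assumptions; the slide states only E_C = f(1/M_q), possibly non-linear.
    """
    if not 0 < metadata_quality <= 1:
        raise ValueError("metadata_quality must be in (0, 1]")
    return base_effort * (1.0 / metadata_quality) ** k


def total_effort(qualities, k=2.0):
    """Total curation effort summed over a collection of datasets."""
    return sum(curation_effort(q, k) for q in qualities)


print(total_effort([1.0] * 100))  # 100.0
print(total_effort([0.5] * 100))  # 400.0
```

Under these assumptions, 100 datasets at $M_q = 1$ cost 100 units, while the same 100 datasets at $M_q = 0.5$ cost 400: total effort grows with the number of datasets, and faster still as metadata quality drops.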
Search tools
● Search + Link: current practice in bioinformatics repositories
  ● Text search over metadata
  ● Linear results list
  ● Results pages with many links
● Advantages
  ● Easy, fast
● Disadvantages
  ● Relationships between items in result list are not clear
  ● No overviews to help with higher-level meta-understanding
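The search + link pattern can be sketched in a few lines. This minimal Python example, over hypothetical dataset records, does a case-insensitive text match across all metadata values and returns a flat list. Nothing in the output tells the user that the two hits share a gene, which is the disadvantage listed above.

```python
# Hypothetical dataset records; fields and values are illustrative only.
RECORDS = [
    {"id": "D1", "organism": "mouse", "type": "microarray", "gene": "Tgfb3"},
    {"id": "D2", "organism": "mouse", "type": "ChIP-Seq", "gene": "Tgfb3"},
    {"id": "D3", "organism": "zebrafish", "type": "images", "gene": "sox10"},
]


def search(records, query):
    """Return records whose metadata contains every query word (case-insensitive).

    Results are a flat list in input order; nothing indicates how the
    matching records relate to one another.
    """
    words = query.casefold().split()
    results = []
    for record in records:
        text = " ".join(str(value) for value in record.values()).casefold()
        if all(word in text for word in words):
            results.append(record)
    return results


for hit in search(RECORDS, "tgfb3"):
    print(hit["id"], hit["type"])
# D1 microarray
# D2 ChIP-Seq
```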
The Ontology of Craniofacial Development and Malformation (OCDM)
Extend existing ontologies to provide richer models of craniofacial anatomy, development, and malformation
– Foundational Model of Anatomy (FMA), Human Phenotype Ontology (HPO), Mouse Anatomy (MA), ...
– Add human developmental description
– Human-mouse links
https://www.facebase.org/content/ocdm
But... better models don't guarantee usage
The tool gap
“Tools have advanced to the point of being able to support users fairly successfully in finding and reading off data (e.g. to classify and find multidimensional relationships of interest) but not in being able to interactively explore these complex relationships in context to infer causal explanations and build convincing biological stories amid uncertainty.”
B. Mirel Supporting cognition in systems biology analysis: findings on users' processes and design implications (2009) J. Biomedical Discovery & Collaboration
• Need: Tools that effectively leverage combination of human intellect and computing power
Information Visualization
● Interactive displays of high-dimensional data sets
● Coordinated views facilitate comparison across dimensions
● Multi-scale
  ● Overview
  ● Zoom/filter
  ● Full details on requested items
● Rapid, incremental, reversible queries
● Avoid 0-hit or million-hit queries
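One common way to support rapid, reversible queries while steering users away from 0-hit results is faceted search with preview counts. The sketch below is an assumption about how this could work, not a description of the FaceBase interface: filters live in a dict (so removing a key undoes a step), and the hypothetical `facet_counts` helper reports how many records each value of a field would return, so zero-count options can be hidden before the user selects them. The records are illustrative.

```python
from collections import Counter

# Hypothetical dataset records; fields and values are illustrative only.
DATASETS = [
    {"organism": "mouse", "type": "microarray", "stage": "E10.5"},
    {"organism": "mouse", "type": "ChIP-Seq", "stage": "E11.5"},
    {"organism": "mouse", "type": "microarray", "stage": "E11.5"},
    {"organism": "zebrafish", "type": "images", "stage": "48 hpf"},
]


def apply_filters(records, filters):
    """Keep records matching every active (field, value) filter."""
    return [r for r in records if all(r.get(f) == v for f, v in filters.items())]


def facet_counts(records, filters, field):
    """Count, for each value of `field`, the records that choice would return.

    Other active filters are applied; any filter on `field` itself is ignored,
    so the user sees the alternatives for that field. Values absent from the
    Counter would yield 0 hits and can be hidden or disabled.
    """
    others = {f: v for f, v in filters.items() if f != field}
    return Counter(r.get(field) for r in apply_filters(records, others))


filters = {"organism": "mouse"}
print(facet_counts(DATASETS, filters, "type"))  # Counter({'microarray': 2, 'ChIP-Seq': 1})

filters["type"] = "microarray"                   # incremental refinement
print(len(apply_filters(DATASETS, filters)))     # 2
del filters["type"]                              # reversible: undo the last step
print(len(apply_filters(DATASETS, filters)))     # 3
```

Because every count is computed before a choice is made, the interface never has to show an empty results page, and the same counts give a rough overview of how the collection is distributed.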
Toward integrative tools
Organisms: human, mouse, zebrafish
Data: sequence, expression, images, morphometry, genotypes
How do we build connections?
How do we get users to think differently about the data?
Technology Probe: Connecting related data items
• Single search, multiple data types
• Links between items indicate connections
• Images or other details on demand for direct access to data
• Rapid response supports interactivity, exploration, serendipitous discovery
Technology Probe: Gene Atlas
• Genes displayed on timeline, indicating when active
• Images display expression localization
• Large image for detailed views.
• Mouse-over coordinated links
Vision vs. reality
● Data is still coming in
● Gene atlas requires coordinated analysis...
● What can we do with data that is available?
Current Search Interface
Timeline view: approximate alignment of datasets on a common timeline
3D Facial Meshes
Two datasets
– Weinberg & Marazita: Caucasian facial norms
– Spritz: Tanzanian
Identifiable data: access only upon Data Access Committee (DAC) approval
Query tool: identify subsets for further analysis.
UCSC Genome Browser Mirror
● genomebrowser.facebase.org
● > 30 tracks, mouse mm9
  ● RNA-seq
  ● ChIP-seq
  ● Various anatomic locations, time points, etc.
● Human CNV hg19
Imaging
● microMRI
● microCT
● OPT
● Interactive Browsers?
Fishface
Atlas of zebrafish craniofacial development
C. Kimmel, et al.
Current Status
As of 4 February 2013
> 200 datasets
> 100 in queue awaiting curation or investigator approval
Continuing to refine metadata and data organization
Expecting a large influx of data over the next year.
Toward Greater Integration
Currently: links via metadata (comparable organism, stage, gene, etc.)
Ideally: links via data (commonly expressed genes, sequences, pathways)
– “Easy” for similar data: expression, sequences
– Harder for diverse data: images?
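The difference between linking via metadata and linking via data can be illustrated with hypothetical expression datasets. In this sketch, metadata links join datasets that share organism and stage, while data links join datasets whose expressed-gene sets overlap (Jaccard similarity at or above an arbitrary threshold of 0.3). It assumes gene symbols have already been mapped to a common orthologous namespace, which for real cross-species data is itself a nontrivial step; the records, threshold, and function names are all illustrative.

```python
from itertools import combinations

# Hypothetical datasets. Gene symbols are assumed to be already mapped to a
# common orthologous namespace so they can be compared across organisms.
DATASETS = [
    {"id": "M1", "organism": "mouse", "stage": "E11.5", "genes": {"Msx1", "Bmp4", "Shh"}},
    {"id": "M2", "organism": "mouse", "stage": "E11.5", "genes": {"Fgf8", "Pax9"}},
    {"id": "Z1", "organism": "zebrafish", "stage": "48 hpf", "genes": {"Msx1", "Bmp4", "Fgf8"}},
]


def metadata_links(datasets, keys=("organism", "stage")):
    """Link dataset pairs that agree on every listed metadata field."""
    return {
        (a["id"], b["id"])
        for a, b in combinations(datasets, 2)
        if all(a[k] == b[k] for k in keys)
    }


def jaccard(x, y):
    """Set overlap |x & y| / |x | y|; 0.0 when both sets are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def data_links(datasets, threshold=0.3):
    """Link dataset pairs whose expressed-gene sets overlap at least `threshold`."""
    return {
        (a["id"], b["id"])
        for a, b in combinations(datasets, 2)
        if jaccard(a["genes"], b["genes"]) >= threshold
    }


print(metadata_links(DATASETS))  # {('M1', 'M2')}
print(data_links(DATASETS))      # {('M1', 'Z1')}
```

Here the metadata link pairs the two mouse datasets, while the data link connects M1 with the zebrafish dataset Z1 (two of four genes shared), a connection the metadata alone would not reveal.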
FaceBase as a model
● Diverse researchers and projects
● Focus: specific clinical domain
● Goal: collect and coordinate data as a community resource
● Outstanding challenges?
● Is this even the right idea?
Build it and they (may) come?
● Speculative claim: assemble good data and it will be of use
● How to evaluate this?
● How to compare to n R01s that might otherwise have been funded? (question asked by a program officer)
● ENCODE controversy
“..the lesson I learned from ENCODE is that projects like ENCODE are not a good idea.”
M. Eisen http://www.michaeleisen.org/blog/?p=1179
Three key factors
Tools, Incentives, Outreach
Tools
Desperately needed!
● Search tools – finding data
● Annotation tools – describing data
  – Using ontologies
  – Common data models
Minimize the effort required to provide the high-quality metadata needed for translational bioinformatics.
Perhaps the worst bioinformatics data management tool
Perhaps the best bioinformatics data management tool
Pros and Cons of Spreadsheets
Cons:
– ad-hoc data models
– ad-hoc data fields
– No consistency, provenance...
Pros:
– Ubiquitous
– Simple
– Flexible
– Familiar
Alternatives:
– Analytic tools: more rigor, less generality
– Experiment metadata
Usability and Ontologies
Can we at least get the terms right?
Maguire et al. 2012
Beyond Autocomplete
● Substring autocomplete – state of the art for finding terms in biomedical ontologies
● Cons: no context, and no easy way to move to more specialized or more general terms
● Can we build a tool that will easily support both search and navigation?
Foundational Model Explorer: http://fme.biostr.washington.edu/FME/index.html
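One way to go beyond bare substring autocomplete is to return each matching term together with its position in the hierarchy, so the user can step up to a more general term or down to a more specific one. The sketch below uses a small, hypothetical single-parent hierarchy of facial anatomy terms; it is illustrative only and is not drawn from FMA or the Foundational Model Explorer, whose actual behavior may differ.

```python
# Hypothetical single-parent hierarchy: term -> parent (None marks the root).
# Illustrative only; not taken from FMA or any released ontology.
PARENT = {
    "anatomical structure": None,
    "face": "anatomical structure",
    "mouth": "face",
    "lip": "mouth",
    "upper lip": "lip",
    "lower lip": "lip",
    "palate": "mouth",
    "primary palate": "palate",
    "secondary palate": "palate",
}


def ancestors(term):
    """Path from the root down to `term`: the context for a match."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path[::-1]


def children(term):
    """Direct subterms, for moving to a more specific term."""
    return sorted(t for t, parent in PARENT.items() if parent == term)


def autocomplete(text):
    """Substring matches, each with its ancestor path (generalize) and children (specialize)."""
    needle = text.casefold()
    return [
        {"term": t, "path": ancestors(t), "children": children(t)}
        for t in sorted(PARENT)
        if needle in t.casefold()
    ]


for match in autocomplete("lip"):
    print(" > ".join(match["path"]), match["children"])
# anatomical structure > face > mouth > lip ['lower lip', 'upper lip']
# anatomical structure > face > mouth > lip > lower lip []
# anatomical structure > face > mouth > lip > upper lip []
```

Each match carries both directions of navigation, which is the context that a plain list of autocomplete strings lacks.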
Why not let the curators do it?
● Expense
● Accuracy
● Speed
● Scientists know their data best
Claim – To be sustainable, data sharing must be self-curated.
But... why should anyone bother?
Scientific incentives...
Traditionally: Research → Results → Papers → Funding
Where does data sharing fit in?
Data Sharing → ?
How to promote sharing
1. Better tools → reduced effort: $E_{\text{annotation}} < E_{\text{collection}} + \epsilon$
2. Recognize effort: alternative models of academic credit
ImpactStory...
Altmetrics: Value all research products (H. Piwowar, Nature 1/9/13, doi:10.1038/493159a)
“For all new grant applications from 14 January, the US National Science Foundation (NSF) asks a principal investigator to list his or her research “products” rather than “publications” in the biographical sketch section.”
Related Efforts
● Genomics Research in Alpha-1 Antitrypsin Deficiency and Sarcoidosis (GRADS) – microbiome, expression, phenotype sharing
● LAMHDI/MONARCH – User tools for ontologically-driven discovery of model genes for human diseases
● Research Social Network Collaboration Finding tool – using VIVO and related networking tools to find collaborators.
Acknowledgments
FaceBase: Mary Marazita, Jeff Murray, Yang Chai, Bruce Aronow, Kristin Artinger, Teri Beaty, Jim Brinkley, David Clouthier, Michael Cunningham, Michael Dixon, Leah Rae Donahue, Scott E. Fraser, Benedikt Hallgrimsson, Junichi Iwata, Ophir Klein, Stephen Murray, Fernando Pardo-Manuel de Villena, John Postlethwait, Steven Potter, Linda Shapiro, Richard Spritz, Axel Visel, Seth Weinberg, Paul Trainor
U. Pittsburgh: Mike Becich, Becky Boes, Chuck Borromeo, Lance Kennelty, Annette Krag-Jensen, Tom Maher, Johnson Paul, Linda Schmandt, Shiyi Shen, Bill Shirey, Cristy Spino, Mike Stefanko, Justin Stickel
OCDM :James Brinkley; Jose Leonardo Mejino; Landon Detwiler; Ravensara Travillian; Melissa Clarkson; Timothy Cox; Carrie Heike; Michael Cunningham; Linda Shapiro
NIDCR: Steven Scholnick, Emily Harris, Jeannine Helm, Nadya Lumelsky, Lillian Shum
Support: NIH Grants U01 DE020057, 3U01DE020050-03S1