Stephen Friend, Jan 10, 2013. National Academy of Sciences, Washington DC
Scientific Opportunities from Heterogeneous Biological Data Analysis: Overcoming Complexity
Stephen Friend MD PhD President
Sage Bionetworks (Non-Profit)
Integrating Environmental Health Data to Advance Discovery
Session 1: Using Heterogeneous Data to Advance Discovery
Navigating between states of wellness
Rui Chang et al. PLoS Computational Biology
Normal State
Disease State
It is now possible to generate massive amounts of human “omics” data
Network Modeling Approaches for Diseases are emerging
IT Infrastructure and Cloud compute capacity allow a generative open approach to solving problems
A nascent movement for patients to control sensitive information allows sharing
Open Social Media allows citizens and experts to use gaming to solve problems
1- It is now possible to generate massive amounts of human “omics” data
2- Network Modeling Approaches for Diseases are emerging
3- IT Infrastructure and Cloud compute capacity allow a generative open approach to biomedical problem solving
4- A nascent movement for patients to control sensitive information allows sharing
5- Open Social Media allows citizens and experts to use gaming to solve problems
A HUGE OPPORTUNITY -- A HUGE RESPONSIBILITY
[Figure: transcriptional, protein, metabolite, and non-coding RNA networks linking the heart, vasculature, kidney, immune system, GI tract, and brain, all surrounded by the ENVIRONMENT]
TENURE FEUDAL STATES
• alchemist
The value of appropriate representations/ maps
BUILDING PRECISION MEDICINE
Extensions of Current Institutions
Proprietary Short-Term Solutions
Open Systems of Sharing in a Commons
Why Sage Bionetworks? (non-profit)
We believe in a world where biomedical research is about to fundamentally change. We think it will often be conducted in an open, collaborative way where teams of teams can contribute to making better, faster, relevant discoveries.
We enable others
• Leading biomedical modeling research
• Novel training doctoral and internship programs
We activate/We challenge
• Diverse collaborations with individuals/researchers and institutions to collectively encourage sharing
• Use crowdsourcing approaches to engage the communities
• Developing platforms for collaboration and engagement – Synapse, BRIDGE
• Defining governance approaches – Portable Legal Consent
We research
Collaborators (partial)
§ Government: NIH, LSDF, NCI
§ Pharma Partners: Merck, Pfizer, Takeda, AstraZeneca, Amgen, Roche, Johnson & Johnson, H3
§ Foundations: Kauffman, CHDI, Gates Foundation, RWJF, Sloan, OneMind
§ Academic: Levy (Framingham), Rosengren (Lund), Krauss (CHORI), Schadt (MSSM)
§ Federation: Ideker, Califano, Nolan, Schadt, Vidal
Better Models of Disease:
INFORMATION COMMONS
Technology Platform
Challenges
Impactful Models
Governance
Two recurring problems in Alzheimer’s disease research
Stage 3: Identify causal relationships
Ambiguous pathology
Are disease-associated molecular systems & genes destructive, adaptive, or both?
Bottom line: We need to identify causal factors vs correlative or adaptive features of disease.
Diverse mechanisms
How do diverse mutations and environmental factors combine into a core pathology?
Bottom line: There is no rigorous / consistent global framework that integrates diverse disease factors.
Identifying key disease systems and genes - Gaiteri et al.
Example “modules” of co-expressed genes, color-coded
1.) Identify groups of genes that move together: co-expressed “modules”
- correlated expression of multiple genes across many patients
- co-expression calculated separately for disease and healthy groups
- these gene groups are often coherent cellular subsystems, enriched in one or more GO functions
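The module-finding step can be illustrated with a minimal sketch. It is not the method of Gaiteri et al.: it assumes a simple rule (link two genes when the Pearson correlation of their expression across patients is at least a chosen threshold, then report each connected group as a module). The function names, the 0.8 threshold, and the toy gene values are all hypothetical. To mirror the slide, the function would be run once on the disease group and once on the healthy group.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (norm_x * norm_y)


def coexpression_modules(expression, threshold=0.8):
    """Group genes whose profiles correlate at or above `threshold`.

    `expression` maps gene name -> expression values across patients.
    Genes are linked when their correlation passes the threshold, and each
    connected group of two or more genes is reported as one module.
    """
    genes = sorted(expression)
    parent = {g: g for g in genes}  # union-find forest over genes

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for a, b in combinations(genes, 2):
        if pearson(expression[a], expression[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(m for m in groups.values() if len(m) > 1)


# Hypothetical toy data: five genes measured in five patients.
toy_expression = {
    "GENE_A": [1, 2, 3, 4, 5],
    "GENE_B": [2, 4, 6, 8, 10],
    "GENE_C": [5, 1, 4, 2, 3],
    "GENE_D": [10, 2, 8, 4, 6],
    "GENE_E": [3, 3, 1, 4, 2],
}
modules = coexpression_modules(toy_expression)
print(modules)
```

In this toy data GENE_A/GENE_B and GENE_C/GENE_D form two modules, while GENE_E correlates with nothing strongly enough to join one. Real co-expression analyses typically use soft thresholds and hierarchical clustering rather than a hard cutoff.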
1.) Identify groups of genes that move together: co-expressed “modules”
2.) Prioritize the disease-relevance of the modules by clinical and network measures
Prioritize modules through expression synchrony with clinical measures or tendency to reconfigure themselves in disease
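One way to read "expression synchrony with clinical measures" is sketched below. It assumes each module is summarized by the average expression of its genes per patient, and that synchrony is scored as the absolute Pearson correlation between that summary and a clinical measure. Both choices, the names `module_profile` and `rank_modules`, and the toy severity scores are illustrative assumptions rather than the procedure used in the talk. The second criterion on the slide (tendency to reconfigure in disease) is not covered here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def module_profile(expression, genes):
    """Average expression of the module's genes in each patient."""
    n_patients = len(expression[genes[0]])
    return [
        sum(expression[g][i] for g in genes) / len(genes)
        for i in range(n_patients)
    ]


def rank_modules(expression, modules, clinical):
    """Sort modules by |correlation| of their profile with `clinical`."""
    scored = [
        (name, abs(pearson(module_profile(expression, genes), clinical)))
        for name, genes in modules.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: four genes in two modules, five patients.
toy_expression = {
    "GENE_A": [1, 2, 3, 4, 5],
    "GENE_B": [2, 4, 6, 8, 10],
    "GENE_C": [5, 1, 4, 2, 3],
    "GENE_D": [10, 2, 8, 4, 6],
}
toy_modules = {"M1": ["GENE_A", "GENE_B"], "M2": ["GENE_C", "GENE_D"]}
toy_severity = [1, 2, 3, 4, 5]  # hypothetical clinical score per patient

ranking = rank_modules(toy_expression, toy_modules, toy_severity)
for name, score in ranking:
    print(name, round(score, 3))
```

Module M1 tracks the severity score closely and ranks first; M2 is only weakly related to it.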
Infer directed/causal relationships and clear hierarchical structure by incorporating eSNP information (no hair-balls here)
1.) Identify groups of genes that move together: co-expressed “modules”
2.) Prioritize the disease-relevance of the modules by clinical and network measures
3.) Incorporate genetic information to find directed relationships between genes
Figure key:
Five main immunologic families found in the Alzheimer’s-associated module.
Square nodes in the surrounding network denote literature-supported nodes. Node size is proportional to connectivity in the full module.
(Interior circle) The width of connections between the 5 immune families is linearly scaled to the number of inter-family connections.
Labeled nodes are either highly connected in the original network, implicated by at least 2 papers as associated with Alzheimer’s disease, or core members of one of the 5 immune families.
Core family members are shaded.
Transforming networks into biological hypotheses
Testing network-based hypotheses
Design-stage AD projects at Sage
Fusing our expertise in…
Join us in uniting genes, circuits and regions to build multi-scale biophysical disease models. Contact [email protected]
Diffusion Spectrum Imaging
Microcircuits & neuronal diversity
Gene regulatory networks
Feedback
Tool: PORTABLE LEGAL CONSENT
Control of private information by citizens allows sharing
weconsent.us John Wilbanks
• Online educational wizard
• Tutorial video
• Legal Informed Consent Document
• Profile registration
• Data upload
John Wilbanks TED Talk “Let’s pool our medical data” weconsent.us
two approaches to building common scientific knowledge
Text summary of the completed project
Assembled after the fact
Every code change versioned
Every issue tracked
Every project the starting point for new work
All evolving and accessible in real time
Social Coding
Synapse is GitHub for Biomedical Data
• Data and code versioned
• Analysis history captured in real time
• Work anywhere, and share the results with anyone
• Social/Interactive Science
• Every code change versioned
• Every issue tracked
• Every project the starting point for new work
• Social/Interactive Coding
Data Analysis with Synapse
Run Any Tool
On Any Platform
Record in Synapse
Share with Anyone
“Synapse is a compute platform for transparent, reproducible, and modular collaborative research.”
Currently at 16K+ datasets and ~1M models
Download analysis and meta-analysis Download another Cluster Result Download Evaluation and view more stats
• Perform model averaging
• Compare/contrast models
• Find consensus clusters
• Visualize in Cytoscape
Pancancer collaborative subtype discovery
Objective assessment of factors influencing model performance (>1 million predictions evaluated)
Sanger CCLE
Prediction accuracy improved by…
• Not discretizing data
• Including expression data
• Elastic net regression
[Figure: cross-validation prediction accuracy (R²); 130 compounds, 24 compounds]
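Elastic net regression and the R² accuracy measure can be sketched in a few lines. This is an illustration, not the code behind the Sanger and CCLE evaluations. It assumes centered inputs with no intercept and fits by coordinate descent on the objective (1/2n)·‖y − Xb‖² + alpha·(l1_ratio·‖b‖₁ + ½·(1 − l1_ratio)·‖b‖²). The function names and the toy data are hypothetical. R² is computed here on the training data; a cross-validated figure would apply `r_squared` to held-out folds.

```python
def soft_threshold(value, limit):
    """Shrink `value` toward zero by `limit` (the L1 proximal step)."""
    if value > limit:
        return value - limit
    if value < -limit:
        return value + limit
    return 0.0


def elastic_net_fit(X, y, alpha=1.0, l1_ratio=0.5, n_iter=200):
    """Fit elastic net coefficients by cyclic coordinate descent.

    Assumes the columns of X and the response y are already centered,
    so no intercept is estimated.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - X @ coef while coef is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0:
                continue  # an all-zero feature cannot be fitted
            # Correlation of feature j with the residual that excludes it.
            rho = sum(v * (r + v * coef[j]) for v, r in zip(col, residual)) / n
            new = soft_threshold(rho, alpha * l1_ratio) / (
                z + alpha * (1 - l1_ratio)
            )
            delta = new - coef[j]
            if delta:
                residual = [r - v * delta for v, r in zip(col, residual)]
            coef[j] = new
    return coef


def predict(X, coef):
    """Linear prediction without intercept."""
    return [sum(v * c for v, c in zip(row, coef)) for row in X]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical toy data: two centered features; y depends only on the first.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]

unpenalized = elastic_net_fit(X, y, alpha=0.0)
shrunk = elastic_net_fit(X, y, alpha=1.0, l1_ratio=0.5)
print(unpenalized, r_squared(y, predict(X, unpenalized)))
print(shrunk, r_squared(y, predict(X, shrunk)))
```

With `alpha=0` the fit reduces to ordinary least squares and recovers the true coefficient; with a penalty the coefficient is shrunk and the irrelevant feature stays at zero.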
In Sock Jang
Erich Huang, Brian Bot, Dave Burdick
Sage-DREAM Breast Cancer Prognosis Challenge
Building better disease models together
154 participants; 27 countries
334 participants; >35 countries
>500 models posted to Leaderboard
breast cancer data
Challenge Launch: July 17
Sep 26 Status
Caldas/Aparicio
Sage Bionetworks-DREAM Breast Cancer Prognosis Challenge Phase 2
Best Performing Team: Attractor Metagenes
Team Members: Wei-Yi Cheng, Tai-Hsien Ou Yang, and Dimitris Anastassiou
How can we accelerate, and make affordable, the efforts required to build better models of disease?
Build a way for patients to engage actively with existing researchers and share their insights in real time about what is happening to them (their state of wellness or disease), where their narratives, samples, data, insights, and funds are shown to enable decision making about what they should do and what treatments they need.
BRIDGE Seed Projects
Fanconi Anemia Project
Melanoma Hunt
Diabetes Activated Community
Breast Cancer
Chronic Fatigue Syndrome
ABCDE “ugly duckling” Dermoscopy Pathology Molecular
MD
There is no standard screening program for skin lesions; seeing an MD is self-directed
Education is derived from top-down experiential knowledge
?Photos
HPI
Best accuracy of clinical diagnosis = 64% (Grin, 1990)
160k new cases/year 48k deaths in 2012 in US
Both intra- and inter-institutional data are siloed
MELANOMA Screening – Could it be better?
1. Activated citizens take skin pictures
2. Store tons of data!
3. Run algorithmic Challenges in the compute space
4. Give back risk-assessment & education to the citizens
Virtuous cycle: continuous aggregation of data enriching the model
Initial focus on building the data needed
Novel Data collection + Usage
1- It is now possible to generate massive amounts of human “omics” data
2- Network Modeling Approaches for Diseases are emerging
3- IT Infrastructure and Cloud compute capacity allow a generative open approach to biomedical problem solving
4- A nascent movement for patients to control private information allows sharing
5- Open Social Media allows citizens and experts to use gaming to solve problems
THESE FIVE TRENDS CAN ENABLE SUSTAINABLE, AFFORDABLE WAYS TO DEVELOP THE DATA INTEGRATION REQUIRED TO OVERCOME THE PUZZLE OF THE CURRENT COMPLEXITY
Fourth Sage Commons Congress – San Francisco, April 19-20
Ten Young Investigator Awards
Bob Young: Top Hat
Joep Lange: AIDS Organizer
Wadah Khanfar: Ex-Al Jazeera
Patrick Meier: Ex-Ushahidi
Jennifer Pahlka: Code for America