Scientific Opportunities from Heterogeneous Biological Data Analysis: Overcoming Complexity. Stephen Friend MD PhD, President, Sage Bionetworks (Non-Profit). Integrating Environmental Health Data to Advance Discovery. Session 1: Using Heterogeneous Data to Advance Discovery

Friend NAS 2013-01-10


DESCRIPTION

Stephen Friend, Jan 10, 2013. National Academy of Sciences, Washington DC

Citation preview

Page 1: Friend NAS 2013-01-10

Scientific Opportunities from Heterogeneous Biological Data Analysis: Overcoming Complexity

Stephen Friend MD PhD, President

Sage Bionetworks (Non-Profit)

Integrating Environmental Health Data to Advance Discovery

Session 1: Using Heterogeneous Data to Advance Discovery

 

Page 2: Friend NAS 2013-01-10

Navigating between states of wellness

Rui Chang et al. PLoS Computational Biology

Normal State

Disease State

Page 3: Friend NAS 2013-01-10

Now possible to generate massive amounts of human "omics" data

Page 4: Friend NAS 2013-01-10

Network modeling approaches for diseases are emerging

Page 5: Friend NAS 2013-01-10

IT infrastructure and cloud compute capacity allow a generative open approach to solving problems

Page 6: Friend NAS 2013-01-10

Nascent movement for patients to control sensitive information, allowing sharing

Page 7: Friend NAS 2013-01-10

Open social media allows citizens and experts to use gaming to solve problems

Page 8: Friend NAS 2013-01-10

1. Now possible to generate massive amounts of human "omics" data
2. Network modeling approaches for diseases are emerging
3. IT infrastructure and cloud compute capacity allow a generative open approach to biomedical problem solving
4. Nascent movement for patients to control sensitive information, allowing sharing
5. Open social media allows citizens and experts to use gaming to solve problems

A HUGE OPPORTUNITY -- A HUGE RESPONSIBILITY

Page 9: Friend NAS 2013-01-10
Page 10: Friend NAS 2013-01-10
Page 11: Friend NAS 2013-01-10
Page 12: Friend NAS 2013-01-10
Page 13: Friend NAS 2013-01-10
Page 14: Friend NAS 2013-01-10

Figure labels: HEART, VASCULATURE, KIDNEY, IMMUNE SYSTEM, GI TRACT, BRAIN; transcriptional network, protein network, metabolite network, non-coding RNA network; ENVIRONMENT (surrounding the figure)

Page 15: Friend NAS 2013-01-10
Page 16: Friend NAS 2013-01-10
Page 17: Friend NAS 2013-01-10
Page 18: Friend NAS 2013-01-10


Page 19: Friend NAS 2013-01-10

TENURE, FEUDAL STATES

Page 20: Friend NAS 2013-01-10
Page 21: Friend NAS 2013-01-10

•  alchemist  

Page 22: Friend NAS 2013-01-10

The value of appropriate representations/ maps

Page 23: Friend NAS 2013-01-10
Page 24: Friend NAS 2013-01-10

BUILDING PRECISION MEDICINE

Extensions of Current Institutions

Proprietary Short-term Solutions

Open Systems of Sharing in a Commons

Page 25: Friend NAS 2013-01-10

Why Sage Bionetworks? (non-profit)

We believe in a world where biomedical research is about to fundamentally change. We think it will often be conducted in an open, collaborative way where teams of teams can contribute to making better, faster, relevant discoveries

We enable others

•  Leading biomedical modeling research

•  Novel training doctoral and internship programs

We activate/We challenge

•  Diverse collaborations with individuals/researchers and institutions to collectively encourage sharing

•  Use crowdsourcing approaches to engage the communities

•  Developing platforms for collaboration and engagement – Synapse, BRIDGE

•  Defining governance approaches – Portable Legal Consent

We research

Page 26: Friend NAS 2013-01-10

Collaborators (partial)

§  Government: NIH, LSDF, NCI

§  Pharma Partners: Merck, Pfizer, Takeda, AstraZeneca, Amgen, Roche, Johnson & Johnson, H3

§  Foundations: Kauffman, CHDI, Gates Foundation, RWJF, Sloan, OneMind

§  Academic: Levy (Framingham), Rosengren (Lund), Krauss (CHORI), Schadt (MSSM)

§  Federation: Ideker, Califano, Nolan, Schadt, Vidal

Page 27: Friend NAS 2013-01-10

Better Models of Disease:

INFORMATION COMMONS

Technology Platform

Challenges

Impactful Models

Governance

Page 28: Friend NAS 2013-01-10

Two recurring problems in Alzheimer's disease research

Stage 3: Identify causal relationships

Ambiguous pathology: Are disease-associated molecular systems & genes destructive, adaptive, or both? Bottom line: We need to identify causal factors vs correlative or adaptive features of disease.

Diverse mechanisms: How do diverse mutations and environmental factors combine into a core pathology? Bottom line: There is no rigorous / consistent global framework that integrates diverse disease factors.

     

Page 29: Friend NAS 2013-01-10

Identifying key disease systems and genes - Gaiteri et al.

Example "modules" of coexpressed genes, color-coded

1.) Identify groups of genes that move together – co-expressed "modules"
    - correlated expression of multiple genes across many patients
    - co-expression calculated separately for disease/healthy groups
    - these gene groups are often coherent cellular subsystems, enriched in one or more GO functions
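Step 1 can be illustrated with a minimal sketch. This is not the pipeline used by Gaiteri et al.; it assumes a simple stand-in technique (link genes whose Pearson correlation across patients exceeds a threshold, then take connected components as modules). The function names, the threshold of 0.8, and the toy expression values are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_modules(expr, threshold=0.8):
    """Group genes into modules: connected components of the graph that
    links two genes when their correlation across patients >= threshold.

    expr maps gene name -> list of expression values, one per patient.
    """
    genes = list(expr)
    parent = {g: g for g in genes}  # union-find forest

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path compression
            g = parent[g]
        return g

    for a, b in combinations(genes, 2):
        if pearson(expr[a], expr[b]) >= threshold:
            parent[find(a)] = find(b)

    modules = {}
    for g in genes:
        modules.setdefault(find(g), set()).add(g)
    # Largest modules first, ties broken alphabetically.
    return sorted((sorted(m) for m in modules.values()), key=lambda m: (-len(m), m))


# Toy data (hypothetical): five genes measured in five patients per group.
disease = {
    "G1": [1, 2, 3, 4, 5],
    "G2": [2, 4, 6, 8, 10.5],
    "G3": [5, 4, 3, 2, 1],
    "G4": [3, 1, 4, 1, 5],
    "G5": [2, 0, 3, 0.5, 4.5],
}
healthy = dict(disease, G5=[1, 3, 1, 4, 2])

# Co-expression is computed separately for each group, as on the slide.
disease_modules = coexpression_modules(disease)
healthy_modules = coexpression_modules(healthy)
print("disease:", disease_modules)
print("healthy:", healthy_modules)
```

In this toy data G4 and G5 form a module only in the disease group, which is the kind of difference between groups that the later prioritization step looks for.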

Page 30: Friend NAS 2013-01-10

1.) Identify groups of genes that move together – coexpressed "modules"
2.) Prioritize the disease-relevance of the modules by clinical and network measures

Prioritize modules through expression synchrony with clinical measures or tendency to reconfigure themselves in disease

vs

Identifying key disease systems and genes

Page 31: Friend NAS 2013-01-10

Infer directed/causal relationships and clear hierarchical structure by incorporating eSNP information (no hair-balls here)

vs

Prioritize modules through expression synchrony with clinical measures or tendency to reconfigure themselves in disease

Identifying key disease systems and genes

1.) Identify groups of genes that move together – coexpressed "modules"
2.) Prioritize the disease-relevance of the modules by clinical and network measures
3.) Incorporate genetic information to find directed relationships between genes
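Step 2 (prioritizing modules by expression synchrony with clinical measures) could be sketched as follows. This is an illustrative assumption rather than the method on the slides: each module is summarized by its mean expression per patient, and modules are ranked by the absolute correlation of that summary with a clinical score. The names `rank_modules`, `clinical_score`, and all values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_modules(expr, modules, clinical):
    """Rank modules by |correlation| between the module's mean expression
    profile across patients and a per-patient clinical measure."""
    scores = []
    for name, genes in modules.items():
        profile = [
            sum(expr[g][i] for g in genes) / len(genes)
            for i in range(len(clinical))
        ]
        scores.append((name, abs(pearson(profile, clinical))))
    return sorted(scores, key=lambda item: -item[1])


# Toy data (hypothetical): expression per gene across five patients.
expr = {
    "G1": [1, 2, 3, 4, 5],
    "G2": [2, 4, 6, 8, 10.5],
    "G4": [3, 1, 4, 1, 5],
    "G5": [2, 0, 3, 0.5, 4.5],
}
modules = {"M1": ["G1", "G2"], "M2": ["G4", "G5"]}
clinical_score = [3, 1, 4, 1, 5]  # e.g. a severity score per patient

ranked = rank_modules(expr, modules, clinical_score)
for name, score in ranked:
    print(f"{name}: |r| = {score:.3f}")
```

A mean profile is the simplest possible module summary; a first principal component would be a common alternative, at the cost of a linear algebra dependency.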

Page 32: Friend NAS 2013-01-10

Figure key:

Five main immunologic families found in Alzheimer's-associated module. Square nodes in surrounding network denote literature-supported nodes. Node size is proportional to connectivity in the full module.

(Interior circle) Width of connections between the 5 immune families is linearly scaled to the number of inter-family connections.

Labeled nodes are either highly connected in the original network, implicated by at least 2 papers as associated with Alzheimer's disease, or core members of one of the 5 immune families.

Core family members are shaded.

Page 33: Friend NAS 2013-01-10

Transforming networks into biological hypotheses

Page 34: Friend NAS 2013-01-10

Testing network-based hypotheses

Page 35: Friend NAS 2013-01-10

Design-stage AD projects at Sage

Fusing our expertise in…

Join us in uniting genes, circuits and regions to build multi-scale biophysical disease models. Contact [email protected]

Diffusion Spectrum Imaging

Microcircuits & neuronal diversity

Gene regulatory networks

Feedback  

Page 36: Friend NAS 2013-01-10

Tool: PORTABLE LEGAL CONSENT
Control of private information by citizens allows sharing

weconsent.us John Wilbanks

•  Online educational wizard
•  Tutorial video
•  Legal Informed Consent Document
•  Profile registration
•  Data upload

John Wilbanks TED Talk "Let's pool our medical data" weconsent.us

Page 37: Friend NAS 2013-01-10

two approaches to building common scientific knowledge

Text summary of the completed project
Assembled after the fact

Every code change versioned
Every issue tracked
Every project the starting point for new work
All evolving and accessible in real time
Social Coding

Page 38: Friend NAS 2013-01-10

Synapse is GitHub for Biomedical Data

•  Data and code versioned
•  Analysis history captured in real time
•  Work anywhere, and share the results with anyone
•  Social/Interactive Science

•  Every code change versioned
•  Every issue tracked
•  Every project the starting point for new work
•  Social/Interactive Coding

Page 39: Friend NAS 2013-01-10

Data Analysis with Synapse

Run Any Tool

On Any Platform

Record in Synapse

Share with Anyone

Page 40: Friend NAS 2013-01-10

“Synapse is a compute platform for transparent, reproducible, and modular collaborative research.”

Page 41: Friend NAS 2013-01-10

Currently at 16K+ datasets and ~1M models

Page 42: Friend NAS 2013-01-10

Download analysis and meta-analysis
Download another Cluster Result
Download Evaluation and view more stats

•  Perform model averaging
•  Compare/contrast models
•  Find consensus clusters
•  Visualize in Cytoscape

Page 43: Friend NAS 2013-01-10

Pancancer collaborative subtype discovery

Page 44: Friend NAS 2013-01-10

Objective assessment of factors influencing model performance (>1 million predictions evaluated)

Data sets: Sanger (130 compounds), CCLE (24 compounds)

Prediction accuracy improved by…

•  Not discretizing data
•  Including expression data
•  Elastic net regression

Axis: cross-validation prediction accuracy (R²)

In  Sock  Jang  
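For readers unfamiliar with elastic net regression, named above as one of the factors that improved prediction accuracy, here is a minimal sketch. It is not the code behind the evaluation on this slide; it assumes the common parameterization of the objective, (1/2n)·||y − Xw||² + alpha·l1_ratio·||w||₁ + ½·alpha·(1 − l1_ratio)·||w||², fitted by coordinate descent on centered data with no intercept. All names and the toy data are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the L1 proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Fit elastic net weights by cyclic coordinate descent.

    X is a list of rows (samples x features), y a list of responses.
    Assumes centered X and y, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    columns = [[row[j] for row in X] for j in range(p)]
    weights = [0.0] * p
    residual = list(y)  # y - Xw, with w = 0 initially
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            z = sum(v * v for v in col) / n
            if z == 0:
                continue  # constant-zero feature carries no signal
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(v * (r + v * weights[j]) for v, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new_w - weights[j]
            if delta:
                residual = [r - v * delta for v, r in zip(col, residual)]
                weights[j] = new_w
    return weights


# Toy data (hypothetical): three orthogonal, centered features;
# the response depends on the first two only: y = 2*x0 - 1*x1.
X = [
    [1, 1, 1],
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]
y = [2 * a - b for a, b, _ in X]

weights = elastic_net(X, y, alpha=0.1, l1_ratio=0.5)
print("weights:", [round(w, 4) for w in weights])
```

The L1 part of the penalty drives the weight of the irrelevant third feature to exactly zero, while the L2 part shrinks the remaining weights slightly below their true values of 2 and -1. That combination of selection and shrinkage is the usual reason to prefer elastic net when there are many correlated predictors, as with expression data.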

Page 45: Friend NAS 2013-01-10
Page 46: Friend NAS 2013-01-10

Erich  Huang,  Brian  Bot,  Dave  Burdick  

Page 47: Friend NAS 2013-01-10
Page 48: Friend NAS 2013-01-10

Sage-DREAM Breast Cancer Prognosis Challenge: Building better disease models together

154 participants; 27 countries
334 participants; >35 countries

>500 models posted to Leaderboard

breast cancer data

Challenge Launch: July 17

Sep 26 Status

Caldas/Aparicio

Sage Bionetworks-DREAM Breast Cancer Prognosis Challenge
Phase 2 Best Performing Team: Attractor Metagenes
Team Members: Wei-Yi Cheng, Tai-Hsien Ou Yang, and Dimitris Anastassiou

Page 49: Friend NAS 2013-01-10

How to accelerate and make affordable the efforts required to build better models of disease?

Build a way for patients to actively engage with existing researchers and share their insights in real time about what is happening to them (their state of wellness or disease), where their narratives, samples, data, insights, and funds are shown to enable decision making about what they should do and what treatments they need

 

Page 50: Friend NAS 2013-01-10
Page 51: Friend NAS 2013-01-10

BRIDGE Seed Projects

Fanconi Anemia Project

Melanoma Hunt

Diabetes Activated Community

Breast Cancer

Chronic Fatigue Syndrome

Page 52: Friend NAS 2013-01-10

MELANOMA Screening – Could it be better?

ABCDE, "ugly duckling", Dermoscopy, Pathology, Molecular

MD

There is no standard screening program for skin lesions; seeing an MD is self-directed

Education is derived from top-down experiential knowledge

?Photos

HPI

Best accuracy of clinical diagnosis = 64% (Grin, 1990)

160k new cases/year; 48k deaths in 2012 in US

Both intra- and inter-institutional data are siloed

Page 53: Friend NAS 2013-01-10
Page 54: Friend NAS 2013-01-10

1. Activated citizens take skin pictures

2. Store tons of data!

3. Run algorithmic Challenges in the compute space

4. Give back risk-assessment & education to the citizens

Virtuous cycle: continuous aggregation of data enriching the model

Initial focus on building the data needed

Novel Data collection + Usage

Page 55: Friend NAS 2013-01-10

1. Now possible to generate massive amounts of human "omics" data
2. Network modeling for diseases is emerging
3. IT infrastructure and cloud compute capacity allow a generative open approach to biomedical problem solving
4. Nascent movement for patients to control private information, allowing sharing
5. Open social media allowing citizens and experts to use gaming to solve problems

THESE FIVE TRENDS CAN ENABLE SUSTAINABLE, AFFORDABLE WAYS TO DEVELOP THE REQUIRED DATA INTEGRATION TO OVERCOME THE PUZZLE OF THE CURRENT COMPLEXITY

Page 56: Friend NAS 2013-01-10
Page 57: Friend NAS 2013-01-10

Navigating between states of wellness

Rui Chang et al. PLoS Computational Biology

Normal State

Disease State

Page 58: Friend NAS 2013-01-10

Fourth Sage Commons Congress – San Francisco, April 19-20
Ten Young Investigator Awards

Bob Young: Top Hat
Joep Lange: AIDS Organizer
Wadah Khanfar: Ex-Al Jazeera
Patrick Meier: Ex-Ushahidi
Jennifer Pahlka: Code for America