Justin Guinney, PhD Director, Computational Oncology Sage Bionetworks Co-Director DREAM Challenges Open science and data sharing in practice

Page 1: Open science and data sharing in practice · Docker store for methods and pipelines ... •Play an active role in setting data sharing policies. •Set clear guidelines and expectations

Justin Guinney, PhD

Director, Computational Oncology, Sage Bionetworks

Co-Director, DREAM Challenges

Open science and data sharing in practice

Page 2

promote open systems, incentives, and norms to redefine how complex biological data is gathered, shared, and used

Page 3

our research is built on three pillars

Open Science

Team Science

Participant-centered Science

Page 4

we pilot approaches to create open systems, incentives, and norms

Pilot Systems and Approaches

Open Science

Team Science

Participant-centered Science

Page 5

we build infrastructure to provide robust, reusable solutions

Infrastructure

Open Science

Team Science

Participant-centered Science

Pilot Systems and Approaches

Page 6

we support research communities that operate under these principles

Infrastructure

Pilot Systems and Approaches

Open Science

Team Science

Participant-centered Science

Page 7

cancer communities

Cancer Systems Biology

Project GENIE

NTAP

CTF

BD2K

Neo-epitopes

DREAM Challenges

Colorectal cancer

Infrastructure

Pilot Systems and Approaches

Open Science

Team Science

Participant-centered Science

e-consent

Page 8

Data sharing: with whom?

Sharing with the research community.

Sharing with collaborators.

Sharing with oneself.

Page 9

Barriers to sharing

Culture / reluctance to share / weak sharing policies

Disorganization and lack of mechanisms to facilitate sharing

Page 10

Synapse: data management system

http://synapse.org

Page 11
Page 12
Page 13

Synapse

Dashboarding for metadata

Access controls for sharing

Governance facilities and auditing

Docker store for methods and pipelines

Embedding of visualizations and tools

Page 14

CTF & Sage

Building networks among CTF researchers

powered by

Page 15

Building a network for the NF community

Page 16

Data sharing vignettes

1. AACR Project GENIE

2. DREAM Challenges

Page 17
Page 18

GENIE: Motivation

Page 19

GENIE Consortium

Page 20

First Data Release

Released January 5, 2017

~19,000 samples

Includes genomic data plus Tier 1 clinical data: cancer type, primary vs. metastatic sample, gender, race, age at sequencing, etc.

Data is now available at:

Sage Synapse Platform: http://synapse.org/genie

cBioPortal for Cancer Genomics: http://www.cbioportal.org/genie/

Users are required to agree to terms of access at each site.

Page 21

Multiple Gene Panels

Page 22

GENIE Landscape

Three largest sample sets:

Non-Small Cell Lung Cancer

Breast Cancer

Colorectal Cancer

Page 23

Landscape of Clinical Actionability

Long tail of Level 2B mutations, where the mutation is linked to standard therapy in a different cancer type.

Page 24

GENIE’s future

2nd release scheduled for end of 2017

Expect to double current database size: over 40k samples!!

More extensive clinical annotation, including patient outcomes, staging, and treatments

In process of moving GENIE data to GDC!

Page 25
Page 26

A crowdsourcing effort that poses quantitative challenges in biomedicine.

Our mission is

to contribute to the solution of important biomedical problems

to foster collaboration between research groups

to democratize data

to accelerate research

to objectively assess and benchmark algorithms

Page 27

Over the last 10 years, we have run Challenges on:

Breast cancer prognosis

Prostate cancer prognosis

Somatic variant detection

Drug sensitivity prediction

Drug combination prediction

Drug toxicity prediction

ALS

Alzheimer’s

Many others…

Page 28

Models of sharing

Page 29

‘Data to model(ers)’

Page 30

Goal: Predict overall survival in patients with metastatic castration-resistant prostate cancer

Enthuse 33 (N=470)

Training data / Validation data

Enthuse M1 (N=380)

Guinney et al., Lancet Oncology, 2017

Page 31

How can we improve model reproducibility?

How can we improve utilization of restricted data?

Page 32

Cheap and scalable data storage and computing

Page 33

Virtualization and container technologies: platform-agnostic application and model portability

Page 34

Hybrid:

‘Data to model(ers)’

‘Models to data’

Page 35

Goal: Improve identification of “high-risk” patients with newly diagnosed multiple myeloma

Public Private

Page 36

‘Models to data’

Page 37

Goal: Improve accuracy of digital mammography screening by classifying images as low or high risk for breast cancer

About 1 in 10 women screened receives a false-positive result.

Page 38

641k images

146k exams

87k women

But…

Participants are not allowed to access the images directly.
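
The restriction on direct image access is what the ‘models to data’ approach is meant to handle: the data holder runs each submitted model next to the data and hands back only aggregate scores. A minimal sketch of that idea follows, assuming a submission is just a Python callable and the scores are sensitivity and specificity. The names `HIDDEN_EXAMS`, `evaluate_submission`, and `threshold_model`, and the toy data, are hypothetical illustrations and not part of the DM Challenge infrastructure, which the slides describe only in terms of container technologies.

```python
from typing import Callable, Dict, Sequence, Tuple

Features = Tuple[float, float]
Model = Callable[[Features], int]

# Hypothetical stand-in for restricted data that stays on the organizer's side.
# Each item is (features, label); real challenge data would be images.
HIDDEN_EXAMS: Sequence[Tuple[Features, int]] = (
    ((0.9, 0.1), 1),
    ((0.2, 0.7), 0),
    ((0.8, 0.3), 1),
    ((0.1, 0.4), 0),
)


def evaluate_submission(model: Model) -> Dict[str, float]:
    """Run a submitted model on the hidden data; return aggregate scores only."""
    tp = fp = tn = fn = 0
    for features, label in HIDDEN_EXAMS:
        pred = model(features)
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    # Guard against empty classes so the sketch never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "n": len(HIDDEN_EXAMS),
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


def threshold_model(features: Features) -> int:
    """Toy participant model: flag an exam as high risk if the first feature is large."""
    return 1 if features[0] > 0.5 else 0


if __name__ == "__main__":
    # The participant sees only this summary, never the underlying records.
    print(evaluate_submission(threshold_model))
```

In this sketch the participant's code never receives a reference to the data outside the evaluation call; a container-based setup would enforce that boundary at the system level rather than by convention.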

Page 39

641k

10k

600k

Page 40

Key statistics: DM Challenge

~1k participants

~10k model submissions

~1k TB (1 petabyte) data usage

~874k CPU-hours

Challenge summary

• Currently in 3rd round of leaderboard phase

• Validation phase begins in April

• Currently, top models are performing as well as a radiologist (sensitivity + specificity)

Page 41

Challenges

Prostate Cancer, Drug Combination, Toxicogenomics

Multiple Myeloma, RNA fusion detection

Digital Mammography

Page 42

Data sharing: what can you do?

• Play an active role in setting data sharing policies.

• Set clear guidelines and expectations on what is meant by data sharing.

• Put in place mechanisms for oversight and enforcement of data sharing practices.

Page 43

Thank you