ACMI 2017 Winter Symposium 1 Mike Becich, MD PhD Department of Biomedical Informatics Chair and University Distinguished Professor, Associate Vice Chancellor for Informatics Associate Director, U Pit Cancer Inst, Clin Trans Sci Inst University of Pittsburgh School of Medicine NCI Board of Scientific Advisors Towards a Data Commons ACMI 2017 Winter Symposium Duck Key, FL

Towards a Data Commons

Page 1: Towards a Data Commons

ACMI 2017 Winter Symposium 1

Mike Becich, MD PhD
Department of Biomedical Informatics

Chair and University Distinguished Professor, Associate Vice Chancellor for Informatics

Associate Director, U Pit Cancer Inst, Clin Trans Sci Inst
University of Pittsburgh School of Medicine

NCI Board of Scientific Advisors

Towards a Data Commons
ACMI 2017 Winter Symposium

Duck Key, FL

Page 2: Towards a Data Commons

Motivations

• Making Data Sharing Efficient (and Persistent)
• NIH Institutes/Centers (ICs) are funding “Commons”
– Precision Medicine and Data Science programs are drivers
• NLM’s Trans-NIH Biomedical Informatics Coordinating Committee (BMIC):
• Data Sharing Repositories - https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html
• Common Data Elements (CDE) Resource Portal - https://www.nlm.nih.gov/cde/index.html
• NCI, NIAID, NICHD and NLM have been most proactive to date

Page 3: Towards a Data Commons

NIH Initiatives

• NIH Data Commons Pilots – https://datascience.nih.gov/commons
– Model Organism Database
– BD2K Centers Pilots (e.g. Pitt/Harvard)
– Human Microbiome Project
– NCI Data Commons
• Genomic Data Commons (GDC) – U Chicago – TCGA data
• Cloud Pilots - ongoing
• NIAID – BD2K Center for Enhanced Data Annotation and Retrieval (Musen/Stanford)

Page 4: Towards a Data Commons

Big Data To Knowledge (BD2K) bioCADDIE and DataMed

• UCSD (Ohno-Machado) BD2K Data Discovery Index Project – bioCADDIE
• DataMed v1.5 available
• Aims to let users search for and discover data sets in a PubMed-like fashion
• Is this scalable to provide institutional infrastructure?

Page 5: Towards a Data Commons

bioCADDIE and DataMed v1.5

https://datamed.org/

Page 6: Towards a Data Commons

PCORI CDRN and CTSA ACT start to unlock Clinical Data from EHRs – Key Drivers

Page 7: Towards a Data Commons

Further Fuel is Precision Medicine Initiative – Adding Biospecimens, Mobile Sensors

Page 8: Towards a Data Commons

Key Question – How to Pull It All Together

• Operationally, data sharing is an NIH requirement
• Most Institutions (maybe all) don’t really treat data as the valued asset it is – era of Data Science
• Most health science investigators are struggling with access to scalable storage, high performance computing and open source tool maintenance – the day of supercomputing is here
• Hence, institutions (and BMI) need to support a “real” plan for Research Data Management
• At Pitt, Data Commons = Research Data Management

Page 9: Towards a Data Commons

Data Commons Infrastructure @ Pitt

Data Infrastructure Component | Awareness | Implementation & Deployment | Adoption | Comments
CRIS/Center for Research Computing | Yellow | Red | Red | Under discussion
DMPT Tool | Yellow | Yellow | Yellow | In progress
Box storage (small scale) sharing | Green | Green | Green | Not the type of “cloud computing” we need for research – simply storage and no HPC, software tools
Storage (large scale) | Red | Red | Red | Turn to PSC, SaM and commercial cloud provider(s) – need scale and flexibility
Data Catalogue | Red | Red | Red |
Metadata schema / ontologies | Red | Red | Red | No institutional data schema in place; disciplinary standards present in some areas
Analysis tools | Green (everyone knows this is needed) | Yellow | Red | Check licensing arrangements
Visualization tools | Yellow | Red | Red |
DOIs | Yellow | Red | Red |
Deposit | Red | Red | Red |
Repository/ preservation | Red | Red | Red | Noted as a major gap
Tracking tools | Red | Red | Red |
Training | Yellow | Yellow | Yellow | 4 classes offered by HSLS
Advocacy/ guides | Yellow | Yellow | Yellow | In development by ULS, HSLS, CSSD

Page 10: Towards a Data Commons

Who’s at the table?

• School of Computing and Information (SCI):
– Department of Computer Science
– School of Information Science
• Dept of Information Culture & Data Stewardship (Liz Lyons - chair)
• Department of Biomedical Informatics
– CRIO for the Health Sciences – Recruiting Op TBN
• CIO & Computing Services and Systems Development
• Center for Research Computing – New Director TBN
• Pittsburgh Supercomputing Center
• Health Sciences & University (Pitt and CMU) Libraries
• Office of Research

Page 11: Towards a Data Commons

Building Blocks – Pittsburgh Genome Research Repository (Rebecca Jacobson – ACMI)

Page 12: Towards a Data Commons

Building Blocks – BD2K - Center for Causal Discovery (Greg Cooper - ACMI)

Page 13: Towards a Data Commons

Building Blocks – Pittsburgh Health Data Alliance (Becich – ACMI)

• Two Centers created:
• Center for Machine Learning in Healthcare
• Led by Joe Marks in CMU School of Computer Science
• Center for Commercial Applications (CCA)
• Led by Mike Becich and Don Taylor
• $2M/yr in Early Stage
• $22M in follow on funding for successful projects
• Launch in July 2015

Page 14: Towards a Data Commons

Page 15: Towards a Data Commons

New National Funding Ops – NCI

• NCI – Cancer Immunology Data Commons (CIDC) – linked to Cancer Immunologic Monitoring and Analysis Centers (CIMAC)
• PDX Data Commons – Patient Derived Xenografts – linked to PDX Trial Research Network
• NCI Commons Credits for cloud HPC

Page 16: Towards a Data Commons

NHLBI Data Commons

• TOPMed – Trans-Omics for Precision Medicine goals:
– Collect and assemble -omics (RNASeq, methylation, metabolomics, epigenomics, and proteomics) data with WGS and clinical outcomes data across diverse populations including those traditionally underrepresented in research.
– Build a data commons repository that the scientific community can use for future research and to enable precision medicine.
– Stimulate systems medicine approaches that help organize data to ensure they are accessible and interpretable for health and disease research.
– Promote discoveries about the fundamental mechanisms that underlie HLBS disorders.

Page 17: Towards a Data Commons

NIA Data Commons Efforts

• Archiving and Sharing of Longitudinal Data Resources on Aging (U24)
– Foster data sharing and wider use of longitudinal data for research on aging in the behavioral and social sciences
– Share best practices in data and metadata documentation, and disseminate information about useful data sets to the research community

Page 18: Towards a Data Commons

NICHD Data Commons Efforts

• Archiving and Documenting Child Health and Human Development Data Sets
– Support archiving and documenting existing data sets in order to enable secondary analysis of these data by the scientific community
– Types of data include survey data, administrative data, results of assays conducted on biospecimens, data from clinical trials, and patient registries.
– Also included are archiving activities for data that are to be added to existing data sets in order to enhance their potential scientific impact, such as geographic information systems (GIS), community-level, or registry data.

Page 19: Towards a Data Commons

Common Goals of Commons

• “... efficient storage, manipulation, analysis, and sharing of research output, from all parts of the research lifecycle...” PE Bourne
– Funding opportunities are being launched across the NIH
– Time to fit your local, regional and national data sharing and analysis needs
– Need “jumpstart” funding in research computing infrastructure
– Sustainability possible if Offices of Research ensure “data sharing” infrastructure is budgeted on each grant

Page 20: Towards a Data Commons

Conclusions

Please join in this effort by e-mailing me – [email protected] Interest/Skills/Personal Goals – I will send you Pitt’s RoadMap

• Biomedical Informatics and the new home (NLM) of the Data Science Program are Key

• Influencers should assist the new NLM Director in the four working groups of the NLM Strategic Planning Process

• Innovations in development of research objects, integrative metadata development, causal analytics and novel research computing environments (supercomputing/cloud computing/storage) will be key!!!