#CMIMI18

Data Engineering and Imaging Informatics for Precision Oncology

Ashish Sharma PhD
Assistant Professor, Biomedical Informatics
Emory University School of Medicine
@_AshishSharma

Disclosures

None

Cancer has been progressively redefined over the past 20 years

Global Oncology Trends 2017. Report by the QuintilesIMS Institute

Increase In Number And Complexity Of Treatment

Global Oncology Trends 2017. Report by the QuintilesIMS Institute

How do Data Science and Engineering enable Precision Oncology?

Data Science

Algorithms

Data Engineering

Outline

Good Algorithms

Data Engineering
- Data for AI Development
- Processing Pipelines
- Scale (Cloudy Medicine)

Going from Bench to Bedside*

Big Data is not helpful for developing algorithms if data is not FAIR

• Findable
• Accessible
• Interoperable
• Reusable

"The FAIR Guiding Principles for Scientific Data Management and Stewardship." Scientific Data 3 (2016)

FAIR Data: The Cancer Imaging Archive

TCIA encourages and supports the cancer imaging open science community by hosting and managing Findable, Accessible, Interoperable, and Reusable (FAIR) images and associated/derived data

Clark et al. J Digital Imaging 26.6 (2013): 1045-1057

~0.75 PB downloaded over a rolling 12-month window

TCIA is Not Just an Image Repository

• Radiology
• Digital Pathology
• Radiotherapy data
• Imaging features
• Labels, Segmentations, Features…
• Clinical data
• Links to genomic data

Hard to be FAIR

TCIA: This is where the electronic medical record gets a little complicated

Sadly, TCIA has multiple ways to store non-image data

• Often non-image data is difficult to reuse
• In some cases (e.g., NLST) it is used to create data cohorts
• Often it is difficult to conduct studies that make use of non-image data in an integrative manner

How do you build a FAIR repo: Requirements and Challenges

Clinical Data
• One uniform management strategy for all non-image (clinical) data
• Enhance data exploration, cohort identification, visual analytics

Imaging Features
• Featurebase for Radiomics and Pathomics features
• One data representation

Enhanced and automated data curation
• Non-image data, pathology data, feature sets

Enable efficient deployment and support cloud deployments
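The "one data representation" requirement can be illustrated with a small sketch. This is not the PRISM or TCIA schema; the `FeatureRecord` type, its field names, and the sample values are all hypothetical, chosen only to show radiomic and pathomic features sharing a single shape so they can be grouped per subject.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FeatureRecord:
    """One extracted feature set, radiomic or pathomic, in a single shape."""
    subject_id: str           # de-identified subject identifier (hypothetical)
    image_id: str             # series (radiology) or slide (pathology) identifier
    modality: str             # e.g. "CT" or "H&E"
    algorithm: str            # name/version of the extraction algorithm
    region: str               # label of the segmented region or object
    values: Dict[str, float]  # feature name -> numeric value


def by_subject(records: List[FeatureRecord]) -> Dict[str, List[FeatureRecord]]:
    """Group records by subject so radiology and pathology features line up."""
    grouped: Dict[str, List[FeatureRecord]] = {}
    for rec in records:
        grouped.setdefault(rec.subject_id, []).append(rec)
    return grouped


# Made-up sample data for illustration only.
records = [
    FeatureRecord("S001", "series-1", "CT", "radiomics-v1", "tumor",
                  {"volume_mm3": 1250.0, "mean_hu": 38.5}),
    FeatureRecord("S001", "slide-7", "H&E", "nuclei-seg-v2", "nucleus-42",
                  {"area_um2": 61.2, "eccentricity": 0.71}),
]
grouped = by_subject(records)
```

Keeping the algorithm name on every record is one way to let several feature sets for the same image coexist without overwriting each other.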

Platform for Imaging in Precision Medicine (PRISM)

• PRISM will evolve and containerize the TCIA technology stack to streamline its deployment and incorporate new tools for analysis and management of images and imaging features with clinical context to enrich TCIA’s datasets.

• Semantic integration of TCIA non-image data
• Tools for Pathology image data analysis and management
• Some new functionality will go into both TCIA and PRISM
• Freely available as containerized microservices and OSS

PRISM Architecture

Building upon PRISM at Emory

GOAL: Streamline access to imaging for research and quality studies

Joint between DBMI and Radiology

• Near-real time replication of the PACS (ongoing)
• Extract metadata for research and quality studies (ongoing)
• Integrate with orders and reports
• Simplify access to images for research studies
• Secure storage, processing and de-identification (when required)
• Link with Data Warehouse, EMR…
• Co-located computing and storage
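The metadata extraction and de-identification steps listed above might look roughly like the following sketch. It works on a plain dictionary rather than real DICOM objects, and the field list, the `deidentify` helper, and the secret are assumptions made for illustration; it is not the Emory implementation and is not a complete de-identification profile.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical list of identifying fields; a real deployment would follow
# an institutional policy or a standard profile, not this short list.
IDENTIFYING_FIELDS = {"PatientName", "PatientBirthDate", "PatientAddress"}


def pseudonymize(value: str, secret: bytes) -> str:
    """Map an identifier to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def deidentify(metadata: Dict[str, str], secret: bytes) -> Dict[str, str]:
    """Drop identifying fields and replace PatientID with a pseudonym."""
    cleaned = {k: v for k, v in metadata.items() if k not in IDENTIFYING_FIELDS}
    if "PatientID" in cleaned:
        cleaned["PatientID"] = pseudonymize(cleaned["PatientID"], secret)
    return cleaned


# Made-up record standing in for metadata extracted from one study.
record = {
    "PatientID": "MRN-12345",
    "PatientName": "DOE^JANE",
    "PatientBirthDate": "19700101",
    "Modality": "CT",
    "StudyDescription": "CHEST W CONTRAST",
}
research_copy = deidentify(record, secret=b"site-specific-secret")
```

A keyed hash keeps the pseudonym stable across studies for the same patient, which is what makes later linkage with orders, reports, or a data warehouse possible without exposing the original identifier.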

Imaging != Rad + RT

Hello Digital Pathology

Digital Pathology for Precision Oncology

• Image analysis and DL methods to extract features from images
• Link Rad/Path features to "omics", outcome, biological phenomena
• Identify trillions of objects: nuclei, glands, ducts, tumor niches…
• Support queries against ensembles of features (multiple algorithms/datasets)
• Analysis of integrated spatially mapped structural/"omic" information to gain insight into cancer mechanism and to choose best intervention
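A query against an ensemble of features could be sketched as below, using an in-memory SQLite table as a stand-in for a feature store. The table layout, algorithm names, and numbers are invented for illustration; the source does not say which database or schema is used.

```python
import sqlite3

# In-memory table standing in for a feature store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features ("
    " slide_id TEXT, algorithm TEXT, object_type TEXT, area_um2 REAL)"
)
rows = [
    ("slide-1", "seg-A", "nucleus", 50.0),
    ("slide-1", "seg-A", "nucleus", 70.0),
    ("slide-1", "seg-B", "nucleus", 64.0),
    ("slide-1", "seg-B", "gland", 900.0),
    ("slide-2", "seg-A", "nucleus", 55.0),
]
conn.executemany("INSERT INTO features VALUES (?, ?, ?, ?)", rows)

# Compare what each algorithm reports for nuclei on the same slide.
query = (
    "SELECT algorithm, COUNT(*), AVG(area_um2) FROM features"
    " WHERE slide_id = ? AND object_type = ?"
    " GROUP BY algorithm ORDER BY algorithm"
)
per_algorithm = conn.execute(query, ("slide-1", "nucleus")).fetchall()
```

Here `per_algorithm` holds one row per algorithm with its object count and mean area, which is the kind of side-by-side comparison an ensemble query is meant to support.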

● Deep learning based computational stain for staining tumor infiltrating lymphocytes (TILs)

● Computationally stained TILs correlate with pathologist eye and molecular estimates

● TIL patterns linked to tumor and immune molecular features, cancer type, and favorable outcomes

● Potentially guide treatment selection

● 4,759 subjects (TCGA) == 5,202 H&E slides; 13 cancer types

Saltz et al. Cell Reports 2018 doi.org/10.1016/j.celrep.2018.03.086

Quantitative Imaging Pathology - QuIP

Data Processing Pipelines

Challenges

Model Development, Training
• TensorFlow, Keras, PyTorch, MATLAB…
• Notebooks, IDEs…

Deployment (going to bedside)
• Data wrangling (without a human in the loop)
• On-demand deployment of algorithms
• Scalability
• Performance and latency
• Monitoring, testing and reliability
• User interfaces

Containers, Microservices and APIs

• No monoliths: think stages (preprocessing, segmentation, feature selection, classification, CNNs…)
• Stages independent and, if possible, stateless
• Helps in scaling, deployment and redundancy

Containers (an easy way to do it)
+ Encapsulate the code and immediate dependencies
+ Easy to share, adopt, deploy
- Security implications (Docker vs. Singularity)
• Situation gets better if using K8s

Check out Grunt from Panos, Brad Erickson and the Mayo team
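The "stages, not monoliths" idea can be sketched in a few lines. The stage functions below are toy placeholders (a threshold is not a real segmentation method), and all names are hypothetical; the point is only that each stage takes an item and returns a new one without keeping state, so any stage could be moved into its own container or API endpoint.

```python
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]


def preprocess(item: Dict) -> Dict:
    """Normalize pixel values to the 0..1 range."""
    peak = max(item["pixels"]) or 1
    return {**item, "pixels": [p / peak for p in item["pixels"]]}


def segment(item: Dict) -> Dict:
    """Toy segmentation: mark pixels above a fixed threshold."""
    return {**item, "mask": [p > 0.5 for p in item["pixels"]]}


def extract_features(item: Dict) -> Dict:
    """Toy feature: fraction of pixels inside the mask."""
    mask = item["mask"]
    return {**item, "fraction_foreground": sum(mask) / len(mask)}


def run_pipeline(item: Dict, stages: List[Stage]) -> Dict:
    """Apply stages in order; no stage keeps state between calls."""
    for stage in stages:
        item = stage(item)
    return item


result = run_pipeline({"id": "img-1", "pixels": [0, 50, 150, 200]},
                      [preprocess, segment, extract_features])
```

Because every stage is a pure function of its input, swapping in a different segmentation step means replacing one entry in the list rather than rebuilding the application.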

Simple, Effective Data Processing Design

Patient Data

PROs
• Real-time data streams
• Easy to test and maintain
• Easier to upgrade algorithm
• Easy to build dashboards and visual analytic tools
• Secure

CONs
• Data and processing are tightly coupled
• Hard to deploy multiple algorithms
• Reengineer similar systems for each new algorithm
• Deployments are not elastic
• No automatic failovers

Streaming Architectures

Modular design achieved by decoupling data and processing

Data is streamed into a Kafka cluster

Algorithmic pipelines subscribe to topics and process data

Enables rapid prototyping and deployment of algorithms

Preserves the scalability and reliability gains
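The decoupling described above can be illustrated with a toy publish/subscribe sketch. It deliberately does not use a Kafka client: `InMemoryBroker` is a hypothetical stand-in that pushes messages synchronously, whereas a real Kafka cluster persists topic logs and lets consumers pull at their own pace. The topic name, message fields, and the two subscribers are invented for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict], None]


class InMemoryBroker:
    """Toy stand-in for a message broker: each topic fans out to subscribers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Dict) -> None:
        for handler in self._subscribers[topic]:
            handler(message)


broker = InMemoryBroker()
alerts: List[str] = []
audit_log: List[Dict] = []


def tachycardia_alert(message: Dict) -> None:
    """Hypothetical algorithm: flag patients with a high heart rate."""
    if message["heart_rate"] > 120:
        alerts.append(message["patient"])


# Two independent pipelines subscribe to the same topic; the producer
# does not know or care how many consumers exist.
broker.subscribe("vitals", tachycardia_alert)
broker.subscribe("vitals", audit_log.append)

broker.publish("vitals", {"patient": "p1", "heart_rate": 80})
broker.publish("vitals", {"patient": "p2", "heart_rate": 135})
```

Adding a third algorithm is one more `subscribe` call, with no change to the code that publishes patient data, which is the property the streaming design is after.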

Scale and Integrate via Cloudy Pipelines

Why Cloud First

What about Local infrastructure?

Hybrid Infrastructure?

Scalable and Affordable Computing
- On-demand computing (lower capital expenditures)

Managed services that enable new design patterns for computing
- BigQuery/Redshift
- Serverless computing
- Data wrangling tools, e.g. Dataflow

Lower/Different barriers to adoption
- Work with APIs, not servers
- Local IT has to become cloud aware

Cloudy for Scalability & Redundancy

Patient Data

○ Leverage vendor services for scalability and redundancy
○ Deployed AISE on AWS Lambda and ML Engine
○ Deployment time < 1 day
○ Improves model development by allowing one to test, during development, with real-world scale and constraints

Ver 1.0

Ver 2.0
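A serverless deployment of the kind mentioned on this slide typically wraps the model in a small stateless function. The sketch below uses the `handler(event, context)` entry-point convention of AWS Lambda's Python runtime; the event shape (a JSON string under `"body"`), the `score` function, and its weights are assumptions for illustration and say nothing about how AISE itself was deployed.

```python
import json


def score(features):
    """Placeholder model: a hypothetical weighted sum, not a real classifier."""
    weights = {"age": 0.01, "lesion_mm": 0.05}
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def handler(event, context):
    """Lambda-style entry point: stateless, one request in, one response out."""
    features = json.loads(event["body"])
    return {
        "statusCode": 200,
        "body": json.dumps({"score": round(score(features), 4)}),
    }


# Local invocation with a hand-built event; no cloud account is needed.
response = handler({"body": json.dumps({"age": 60, "lesion_mm": 12})}, None)
```

Being able to call the handler locally with a hand-built event is one way to test against realistic request shapes during development, before anything is deployed.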

Processing at Scale

Hint: Docker is not the silver bullet

Lessons from Genomics

Cloudy Pipelines can Work (e.g. Google Genomics, DNAnexus, NCI Cloud Resources, Globus Genomics…)

1. Think multi-stage pipelines, not standalone executables/apps

2. Stages containerized or API endpoints

3. Workflow languages to author pipelines (CWL, WDL…)

4. Rely on orchestrators capable of running pipelines on local/cloud/hybrid
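What an orchestrator does with a workflow description can be sketched with the standard-library `graphlib` module (Python 3.9+). This is not CWL or WDL and not any particular orchestrator; the stage names and the `run_workflow` helper are hypothetical, and each stage is a stub that a real system would replace with a container run or an API call.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set, Tuple

# Each stage lists the stages it depends on, as a workflow file would.
dependencies: Dict[str, Set[str]] = {
    "preprocess": set(),
    "segment": {"preprocess"},
    "features": {"segment"},
    "classify": {"features", "preprocess"},
}

# Stub stages: each receives the outputs of its dependencies.
stages: Dict[str, Callable[[Dict[str, str]], str]] = {
    name: (lambda inputs, name=name: name + "-done") for name in dependencies
}


def run_workflow(
    deps: Dict[str, Set[str]],
    impl: Dict[str, Callable[[Dict[str, str]], str]],
) -> Tuple[List[str], Dict[str, str]]:
    """Run every stage after its dependencies, passing their outputs along."""
    results: Dict[str, str] = {}
    order: List[str] = []
    for name in TopologicalSorter(deps).static_order():
        inputs = {dep: results[dep] for dep in deps[name]}
        results[name] = impl[name](inputs)
        order.append(name)
    return order, results


order, results = run_workflow(dependencies, stages)
```

Because the dependency graph is data rather than code, the same description could in principle be handed to a local runner or a cloud one, which is the portability the four points above argue for.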

Processing at Scale

Reproducibility
• Share tools (via code or containers): GitHub + Docker Hub + Dockstore…
• Share and publish models: TensorFlow Hub; ModelHub.AI; …

Scale
• Early stages: Docker Swarm; Kubernetes etc.
• Technically getting there but hard to adopt
• Serverless: AWS Lambda; Google Cloud Functions; pywren

Integration w/ EMR
• FHIR, DICOMweb

Examples are for illustrative purposes and not endorsements

National Cancer Data Ecosystem Recommendations

Warren Kibbe “Data: Where Precision Oncology and Learning Health Meet”. SAMSI Workshop on Precision Medicine, August 16, 2018

https://gdc.cancer.gov

NCI Cancer Research Data Commons

https://cbiit.cancer.gov/ncip/cancer-data-commons

Final Words

• Need Big, FAIR Data that is representative of the population

• Develop algorithms but think about deployment and scale

• Partnerships and Teams of Techies and MDs; Academic and Industry

TEAM SCIENCE AT ITS BEST

Cloud computing, HPC and AI can have a transformative effect on medicine

Acknowledgements

● Emory DBMI Engineers, PostDocs and Students

● Fred Prior PhD, TCIA Team, Dept. of Biomedical Informatics, Univ. of Arkansas for Medical Sciences

● Joel Saltz, MD PhD, Dept. of Biomedical Informatics, Stony Brook University

U01CA187013-06

UG3CA225021-01

14X138

U24CA215109-02

U24CA180924-05