55
Biomedical Informatics and Integrative Cancer Research Joel Saltz MD, PhD Director Center for Comprehensive Informatics

Wci Pop Sci Feb 2011

Embed Size (px)

DESCRIPTION

Talk given to the Emory Cancer Control and Population Science Program 2/17/2011 Describing Biomedical Informatics, Integrative Cancer Research, caBIG and CTSA

Citation preview

Page 1: Wci Pop Sci Feb 2011

Biomedical Informatics and Integrative Cancer Research

Joel Saltz MD, PhDDirector Center for Comprehensive

Informatics

Page 2: Wci Pop Sci Feb 2011

Objectives

• Brain Tumor in Silico Center• Whole Exome Sequencing and

Hypertension in African American Populations

• Biomedical Informatics: caBIG, CTSA Informatics Tools and Infrastructure

Page 3: Wci Pop Sci Feb 2011

Integrative Analysis: Tumor Microenvironment

• Structural and functional differentiation within tumor

• Molecular pathways are time and space dependent

• “Field effects” – gradient of genetic, epigenetic changes

• Radiology, microscopy, high throughput genetic, genomic, epigenetic studies, flow cytometry, microCT, nanotechologies …

• Create biomarkers to understand disease progression, response to treatment

Tumors are organs consisting of many interdependent cell types

• From John E. Niederhuber, M.D. Director National Cancer Institute, NIH presented at Integrating and Leveraging the Physical Sciences to Open a New Frontier in Oncology, Feb 2008

Page 4: Wci Pop Sci Feb 2011

Informatics Requirements

•Parallel initiatives Pathology, Radiology, “omics”

•Exploit synergies between all initiatives to improve ability to forecast survival & response.

RadiologyImaging

Patient Outco

me

Pathologic Features

“Omic”Data

Page 5: Wci Pop Sci Feb 2011

Structural Complexity

Page 6: Wci Pop Sci Feb 2011

Tumor Microenvironment(roughly 25TB/cm2 tissue)

Page 7: Wci Pop Sci Feb 2011

In Silico Center for Translational NeuroOncology Informatics

Director: Joel Saltz, MD, PhD; PI Dan Brat MD, PhD

AIMS1. Determine genetic and gene expression

correlates of high resolution nuclear morphometry in the diffuse gliomas and their relation to MR features using Rembrandt and TCGA datasets.

2. Determine the influence of tumor micro-environment on gene expression profiling and genetic classification using TCGA data

3. Examine the gene expression profile of low grade gliomas that progress to GBM for predictive clustering, prognostic significance and correlates with pathologic and radiologic features.

4. Identify correlates of MRI enhancement patterns in astrocytic neoplasms with underlying vascular changes and gene expression profiles.

Page 8: Wci Pop Sci Feb 2011

8

In Silico Program Objectives (from NCI)• In silico is an expression used to mean "performed on computer

or via computer simulation.“ (Wikipedia)• In silico science centers: support investigator-initiated,

hypothesis-driven research in the etiology, treatment, and prevention of cancer using in silico methods• Generating and publishing novel cancer research findings leveraging

caBIG tools and infrastructure• Identifying novel bioinformatics processes and tools to exploit

existing data resources• Encouraging the development of additional data resources and

caBIG analytic services• Assessing the capabilities of current caBIG tools• Emory, Columbia, Georgetown, Fred Hutchinson Cancer ,

Translational Genomics Research Institute

Page 9: Wci Pop Sci Feb 2011

TCGA: Large Scale Integrative multi-”omic” Cancer Study

Page 10: Wci Pop Sci Feb 2011

TCGA Research Network

Digital Pathology

Neuroimaging

Page 11: Wci Pop Sci Feb 2011

Distinguish (and maybe redefine) astrocytic, oligodendroglialand oligoastrocytic tumors using TCGA and Rembrandt

Important since treatment and Outcome differ

• Link nuclear shape, texture to biological and clinical behavior

• How is nuclear shape, texture related to gene expression category defined by clustering analysis of Rembrandt data sets?

• Relate nuclear morphometry and gene expression to neuroimaging features (Vasari feature set)

• Genetic and gene expression correlates of high resolution nuclear morphometry and relation to MR features using Rembrandt and TCGA datasets.

Page 12: Wci Pop Sci Feb 2011

TCGA Brain Pathology CriteriaAttributes that Relate to Entire Specimen

Roughly 200 TCGA specimens; Three Reviewers with Dan Brat adjudicating

Not Present: Not detected on any blockPresent: detected on any blockAbundant: present in ≥ 50% of 10x

fields in ≥ 50% blocks

Microvascular hyperplasia elements (1,2) Complex/glomeruloid Circumferential endothelial hyperplasia

Necrosis elements (3,4) Multiple serpentine pseduopalisading

pattern Zonal necrosis

Small cell component Gemistocytes “Oligodendroglioma-like” component

with perinuclear cytoplasmic halos Perineuronal and/or perivascular satellitosis Multi-nucleated/giant cells Epithelial metaplasia Mesenchymal metaplasia Entrapped gray matter Entrapped white matter Micro-mineralization

Inflammation Macrophage/histiocytic infiltrates Lymphocytic infiltrates Polymorphonuclear leukocytic infiltrates

Page 13: Wci Pop Sci Feb 2011

Characterization of specific microanatomicstructures

Characterization of neoplastic nuclei

• Nuclear size (area and perimeter)

• shape (eccentricity, circularity major axis, minor axis, Fourier shape descriptor and extent ratio)

• intensity (average, maximum, minimum, standard error) and texture (entropy, energy, skewness and kurtosis)

characterization of regions of angiogenesis

• endothelial hypertrophy • endothelial hyperplasia• microvascular hyperplasia• glomeruloid proliferation• area of angiogenesis region• shape – (how the region

departs from a fitted tubular structure)

• normalized color

Page 14: Wci Pop Sci Feb 2011

Feature Extraction

TCGA Whole Slide Images

Jun Kong

Page 15: Wci Pop Sci Feb 2011

Oligodendroglioma Astrocytoma

Nuclear Qualities

Class Assignment

1 10

Page 16: Wci Pop Sci Feb 2011

Astrocytoma vs OligodendroglimaOverlap in genetics, gene expression, histology

Astrocytoma vs Oligodendroglima• Assess nuclear size (area and

perimeter), shape (eccentricity, circularity major axis, minor axis, Fourier shape descriptor and extent ratio), intensity (average, maximum, minimum, standard error) and texture (entropy, energy, skewness and kurtosis).

Page 17: Wci Pop Sci Feb 2011

Whole slide scans from 15 TCGA GBMS (69 slides)7 purely astrocytic in morphology; 7 with 2+ oligo components399,233 nuclei analyzed for astro/oligo featuresCases were categorized based on ratio of oligo/astro cells

Machine-based Classification of TCGA GBMs (J Kong)

TCGA Gene Expression Query:c-Met overexpression

Separation:p =1.4 X 10 -22

Page 18: Wci Pop Sci Feb 2011

Imaging Pathology MolecularTime 1 – 8 yrs

Examine gene expression profiles of low grade gliomas that progress to GBM for predictive clustering and correlates with pathologic and radiologic features.

Page 19: Wci Pop Sci Feb 2011

ClassicalProneural Neural Mesenchymal

Hierarchical clustering of 176 Rembrandt samples using TCGA classification genes defines four major subtypes.

(Lee Cooper and Carlos Moreno)

Page 20: Wci Pop Sci Feb 2011

75 lower-grade gliomas in REMBRANDT (p < 0.0003).

Lee CooperCarlos Moreno

Predicting Recurrence/Survival43 oligodendrogliomas in REMBRANDT (p < 0.0002).

Page 21: Wci Pop Sci Feb 2011

Neuroimaging Correlates

Define relationship between contrast-enhancement, perfusion and permeability with vascular changes

Correlate MR characteristics defined by the Vasari Feature Set with pathologic grade, vascular morphology and gene expression profiles

Page 22: Wci Pop Sci Feb 2011

Angiogenesis Segmentation

H&EImage

ColorDeconvolution

HematoxylinImage

EosinImage

EosinImage

SpatialNorm.

DensityImage

DensityCalculation

BoundarySmoothing

DensityImage

ObjectID

SegmentedVessels

Eosin intensity image

Angiogenic Segmentation

Page 23: Wci Pop Sci Feb 2011

States of AngiogenesisEndothelial Hypertrophy

Complex MicrovascularHyperplasia

Endothelial Hyperplasia

Lee CooperSharath Cholleti

Page 24: Wci Pop Sci Feb 2011

Recent Findings from Integrated Analysis of Necrosis, Angiogenesis, Gene Expression in

GBM• Lee A.D. Cooper; Carlos S. Moreno; Candace S. Chisolm; Christina Appin;

David A. Gutman; Jun Kong; Tahsin Kurc; Joel H. Saltz; Daniel J. Brat• Frozen sections from 88 GBM samples were manually marked to identify

regions of necrosis and angiogenic vessels exhibiting endothelial hypertrophy, hyperplasia, or complex microvascular proliferation

• Markups were used to calculate extent of both necrosis and angiogenesis as a percentage of total tissue area

• Gene expression from the HT-HGU133A platform analyzed using Significance Analysis of Microarrays (SAM); Cox Proportional Hazards modeling to identify mRNAs significantly associated with extent of necrosis and/or angiogenesis using a false discovery rate cutoff of < 5%

Page 25: Wci Pop Sci Feb 2011

Recent Findings from Integrated Analysis of Necrosis, Angiogenesis, Gene Expression in

GBM• Associated with necrosis were master regulators of the

mesenchymal tumor subtype, including C/EBP-B, C/EBP-D, STAT3, FOSL2, and RUNX1

• IPA analysis of genes correlated with necrosis identified significantly enriched canonical pathways including :

• HIF-1α (p = 3.0e-7), NFκB (p = 1.4e-3), • IL-6 (p = 6.9e-6), FGF (p = 2.7e-5), • ERK/MAPK (p = 1.2e-4), • Protein Kinase A signaling (p = 1.9e-4), • Thrombin signaling (p = 5.2e-3),• HGF (p = 0.023) signaling.

Page 26: Wci Pop Sci Feb 2011

Vasari Imaging Criteria(Adam Flanders, TJU; Dan Rubin, Stanford, Lori Dodd, NCI)

• Require standardized validated feature sets to describe de novo disease.

• Fundamental obstacle to new imaging criteria as treatment biomarkers is lack of standard terminology:– To define a comprehensive set of imaging

features of cancer– For reporting imaging results– To provide a more quantitative, reproducible

basis for assessing baseline disease and treatment response

Page 27: Wci Pop Sci Feb 2011

Classify Imaging Features of Entire Tumor and Resected Specimen

Record features of the entire tumor at baseline.

Distinguish features that comprise tissue in resected specimen.

Imaging Features of Resected Specimen• Extent of resection of enhancing tumor• Extent resection of nCET• Extent resection of vasogenic edema

Page 28: Wci Pop Sci Feb 2011

Defining Rich Set of Qualitative and Quantitative Image Biomarkers

• Community-driven ontology development project; collaboration with ASNR

• Imaging features (5 categories)– Location of lesion– Morphology of lesion margin (definition, thickness,

enhancement, diffusion)

– Morphology of lesion substance (enhancement, PS characteristics, focality/multicentricity, necrosis, cysts, midline invasion, cortical involvement, T1/FLAIR ratio)

– Alterations in vicinity of lesion (edema, edema crossing midline, hemorrhage, pial invasion, ependymal invasion, satellites, deep WM invasion, calvarial remodeling)

– Resection features (extent of nCE tissue, CE tissue, resectedcomponents)

Page 29: Wci Pop Sci Feb 2011

Results: Reader Agreement• High inter-observer agreement among

the three readers– (kappa = 0.68, p<0.001)

• Percentage agreement was also high for most features individually– 22 of 30 features (73%) had agreement greater than

50%– Twelve features (40%) had >80% agreement– No feature had less than 20% agreement

• Feature agreement rose substantially when used with tolerance (+/- 1).

Page 30: Wci Pop Sci Feb 2011

Preliminary Relationships of Features to Survival

• Cox proportional hazards models were fit to each of the thirty features related to overall survival.

• Features associated with lower survival included (p<.0001):– Proportion of enhancing tissue at baseline.– Thick or nodular enhancement characteristics.– Contralateral hemisphere invasion.

• Proportion of non contrast enhancing tumor (nCET) had positive correlation with survival.

• Tumor size at baseline had no relationship to survival.

Page 31: Wci Pop Sci Feb 2011

Recent Findings Relating Radiology, Pathology “Omics”

• Linear regression models incorporating multiple imaging features or a single VASARI feature (ependymal extension) and tumor gene expression can be used to predict patient survival.

• Multiple statistically significant associations between imaging and genomic features in glioblastomas. EGFR mutant tumors were significantly larger than TP53 mutant tumors, and were more likely to demonstrate pial involvement. CDKN2A homozygous deletion associated with an ill-defined nonenhancing tumor margin and enhancing pial involvement.

• Significant association between minimal enhancing tumor (≤5% proportion of the overall tumor) and Proneural classification (p=0.0006). Significant association between a >5% proportion of necrosis and the presence of microvascular hyperplasia in pathology slides (p=0.008).

Page 32: Wci Pop Sci Feb 2011

Minority GridGrady, Kaiser-Atlanta, MSM-East Point, Jackson-HindsMorehouse, Emory, Jackson Heart Study, University of

Washington, Baylor

• Aim 1: Establish organizational framework as consoritium of academic medical centers and minority-serving “safety net” medical care facilities

• Aim 2: Establish an EHR-linked bioinformatics/bio-repository infrastructure that facilitates in depth genotyping, phenotypic characterization and logitudinal surveillance of minority patients

• Aim 3: Demonstrate utility of MH-GRID with a “use case” project that defines genetic, personal and social-environmental determinants of severe hypertension in African Americans

• This platform could also be leveraged to carry out cancer studies

Page 33: Wci Pop Sci Feb 2011

Overall Goals of Minority Grid

• Breadth and nature of genomic variation associated with clinical phenotypes among patients of various bio-geographical ancestral groups

• Bio-ancestry-specific, low frequency/major effect DNA variants that contribute to racial differences in drug responsiveness, health outcomes and health disparities

• Characterization of admixture• Long term outcome of patients with at-risk variants

revealed by whole exome sequencing

Page 34: Wci Pop Sci Feb 2011

Approach

• Identification of 1200 cases, 1200 controls– Controls have longitudinal followup with BP consistently below

120/80• Whole exon sequencing• Detection of new common variants and rare/low frequency

variants• EHR data, interview data: health literacy, perceived stress,

dietary intake, physical activity, neighborhood characteristics (via geocoding)

• Clinical Laboratory analyses: electrolytes, plasma creatinine, lipid profile, glucose, estimated GFR

• Project funded for roughly 2 months and is getting underway

Page 35: Wci Pop Sci Feb 2011

Transcontinental Railway: The Golden Spike - Triumph of Standards

Page 36: Wci Pop Sci Feb 2011

Semantic Interoperability: Same ideas, different words

Page 37: Wci Pop Sci Feb 2011

Challenges• Unprecedented magnitude of change

throughout the system• Constant flow of information to

manage• Legacy systems• Cultural barriers

The ca“BIG” Picture

(from Ken Beutow)

Page 38: Wci Pop Sci Feb 2011

The ca“BIG” Picture

The cancer Biomedical Informatics Grid (caBIG):

• Standards-based vocabulary, data elements, data models facilitate information exchange

• Common, widely distributed infrastructure permits cancer community to focus on innovation

• Collection of interoperable component-based applications developed to common standards

• Cancer information is widely available to diverse communities (from Ken Beutow)

Page 39: Wci Pop Sci Feb 2011

Biomedical Informatics and Middleware

DisseminatesInformation

GridInformation Integration

Brings in InformationGrid

Information Integration

Translates andIntegrates Information

Natural Language ProcessingOntologies

Page 40: Wci Pop Sci Feb 2011

caGrid -- “Octopus middleware”

caGrid Components– Language (metadata,

ontologies)– Grid Service Graphical

Development Toolkit (Introduce)

– DICOM compatability (IVI middleware)

– Security (GAARDS)– Advertisement and

Discovery– Workflow

Page 41: Wci Pop Sci Feb 2011

Integrated BIP• Architecture working group to design a common

architecture• Collaborative projects

– Security infrastructure– Testing framework– Bioinformatics support– Registry implementation at Grady for quality improvement

and cardiovascular research– LIMS deployment for biospecimen management– i2b2 deployment for clinical data

• Leverage institutional strengths for education and training

• Leverage over $3.8M in grant and internal funding this year

Page 42: Wci Pop Sci Feb 2011

ACTSI-wide Federated Data Warehouse System

Develop integrative, federated ACTSI information warehouse Integrated clinical/imaging/”omic”/biomarker/tissue information

should always be available A virtually centralized, big Atlanta wide information warehouse that

has all relevant data Patients seen and information gathered at any ACTSI site, specimens sent

to any affiliated core, imaging carried out at any affiliated site Give me all gene expression, SNP, virtual slide images, hematology

studies and CMV serologies for kidney transplant candidates accrued into Study X or Study Y between Feb 2011 and Jan 2012 who were on the kidney transplant waiting list as of November 1, 2010.

Development efforts Security, Web Portal, Common Data Elements & Vocabularies,

Identifiers, High-performance Computing middleware, Testing framework.

Page 43: Wci Pop Sci Feb 2011

DATA

INTEGRATION

Information Warehouse User AccessAcquisition Transfer

CPOEOR systemPatient MgmtDictated reportsPathology reports

Daily

ADT LabRespiratory Blood Endoscopy CardiologySiemens Img

Real time

Patient BillingPractice Plans

Pt Satisfaction

Weekly

Monthly

Cancer GeneticsWoundImagesTissuePulmonaryGenomic Data

Web

Multi-Dimensional Analysis & Data Mining Ad-hoc

Query

Error Report Benchmarking

De-IdentificationHonest Broker

Wound Center

Research

Web Scorecards & Dashboards

Image Analysis

Text Mining, NLP

Business Clinical

Research External

Meta Data

Ove

rvie

w

Crucial to Leverage Institutional Data

Ohio State Information Warehouse Infrastructure

Page 44: Wci Pop Sci Feb 2011

ACTSI-wide Federated Data Warehouse

Page 45: Wci Pop Sci Feb 2011

Enhanced Registries• Linked Databases for Research

• Leverages common data elements and models and existing standards. Initially for cardiovascular disease, diabetes and co-morbidities.

• Derived data elements represent categories of data and temporal patterns of interest.

• Linked to source data – initially, the Emory Healthcare Clinical Data Warehouse and the Grady Health System Diabetes Patient Tracking System.

• Supports end-user researcher query and analysis.

• Research PACS• Federated support for management of image data. • DICOM standards and Grid services for federated access.• Management of image analysis results.

Page 46: Wci Pop Sci Feb 2011

Registry Project Status

• Co-morbidity registry prototype completed that exports demographics, encounters, readmissions, discharge diagnoses and diagnosis categories, and medication categories to Excel pivot tables

• Has been used by Emory Healthcare to identify co-morbidities associated with readmissions for patient populations at high risk

• System development is ongoing

Page 47: Wci Pop Sci Feb 2011

1997: Virtual Microscope at Hopkins/Maryland

Page 48: Wci Pop Sci Feb 2011

Distinguishing Characteristic in Gliomas

Use image analysis algorithms to segment and classify microanatomic features (Nuclei, Astrocytoma, Necrosis ...) in whole slide images

Represent the segmentation and classification in a well defined structured format that can be used to correlate the pathology with other data modalities

Oligodendroglioma Astrocytoma

Nuclear QualitiesRound shaped withsmooth regular texture

Elongated with rough, irregular texture

Page 49: Wci Pop Sci Feb 2011

PAIS Database

Implemented with IBM DB2 for large scale pathologyimage metadata (~million markups per slide)

Represented by a complex data model capturing multi-faceted information including markups, annotations,algorithm provenance, specimen, etc.

Support for complex relationships and spatial query: multi-level granularities, relationships between markups andannotations, spatial and nested relationships

Page 50: Wci Pop Sci Feb 2011
Page 51: Wci Pop Sci Feb 2011

PAIS Database and Analysis Pipeline

Suite of analysis algorithms and pipelines that carry outthe following tasks:

1. segmentation of cells and nuclei;2. characterization of shape and texture features of

segmented nuclei;3. storage of nuclei meta-data in relational database;4. mechanism supporting spatial queries for human-

annotated nuclei;5. machine learning methods that integrate information from

features to accomplish classification tasks.

Page 52: Wci Pop Sci Feb 2011

Image Mining for Comparative Analysis of Expression Patterns in Tissue Microarray

(PI’s: Foran and Saltz)

Build reference library ofexpression signatures, integrate state-of-the-art multi-spectral imaging capability and build a deployable clinical decision support system for analyzing imaged specimens.

Technologies and computational tools developed during the course of the project to be tested on a Grid-enabled, virtual laboratory established among strategic sites located at CINJ, Emory, RU, UPenn, OSU, and ASU.Funded by NIH through grant#5R01LM009239-02 David J. Foran, Ph.D.

Page 53: Wci Pop Sci Feb 2011

ACTSI: Example Active Biomedical Informatics Projects

In Silico Study of Brain Tumors Minority Health Genomics and Translational Research

Bio-Repository Database (MH-GRID) ACTSI Cardiovascular, Diabetes, Brain Tumor Registry Early Hospital Readmission CFAR (Center for AIDS Research) HIV/Cancer Project Radiation Therapy and Quantitative Imaging Integrative Analysis of Text and Discrete Data Related

to Smoking Cessation and Asthma Metadata Analysis of Glycan Structures Semantic Query and Analysis of Integrative Datasets in

Renal Transplant Clinical Studies (CTOT-C)

Page 54: Wci Pop Sci Feb 2011

Thanks to:

• In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish Sharma, Tony Pan, David Gutman, Jun Kong, Sharath Cholleti, Carlos Moreno, Chad Holder, Erwin Van Meir, Daniel Rubin, Tom Mikkelsen, Adam Flanders, Joel Saltz (Director)

• caGrid Knowledge Center: Joel Saltz, Mike Caliguiri, Steve Langella co-Directors; Tahsin Kurc, Himanshu Rathod Emory leads

• caBIG In vivo imaging team: Eliot Siegel, Paul Mulhern, Adam Flanders, David Channon, Daniel Rubin, Fred Prior, Larry Tarbox and many others

• In vivo imaging Emory team: Tony Pan, Ashish Sharma, Joel Saltz• Emory ATC Supplement team: Tim Fox, Ashish Sharma, Tony Pan, Edi

Schreibmann, Paul Pantalone• Digital Pathology R01: Foran and Saltz; Jun Kong, Sharath Cholleti,

Fusheng Wang, Tony Pan, Tahsin Kurc, Ashish Sharma, David Gutman (Emory), Wenjin Chen, Vicky Chu, Jun Hu, Lin Yang, David J. Foran (Rutgers)

Page 55: Wci Pop Sci Feb 2011

Thanks!