Integrating Genomic and Clinical Data in Electronic Health Records and Biomedical Repositories: Challenges, Solutions and Opportunities American College of Medical Informatics (ACMI)

James Cimino, MD - ACMI Bridge Day Panel

Page 1: James Cimino, MD - ACMI Bridge Day Panel

Integrating Genomic and Clinical Data in Electronic Health Records and Biomedical Repositories: Challenges, Solutions and Opportunities

American College of Medical Informatics (ACMI)

Page 2: James Cimino, MD - ACMI Bridge Day Panel

What is ACMI?

The American College of Medical Informatics is a college of elected fellows from the United States and abroad who have made significant and sustained contributions to the field of medical informatics. Initially incorporated in 1984, the organization later dissolved its separate corporate status to merge with the American Association for Medical Systems and Informatics (AAMSI) and the Symposium on Computer Applications in Medical Care (SCAMC) when the American Medical Informatics Association was formed in 1989. The College now exists as an elected body of fellows within AMIA, with its own bylaws and regulations that guide the organization, its activities, and its relationship with the parent organization.

Page 3: James Cimino, MD - ACMI Bridge Day Panel

Integration of Genomic and Clinical Data

The integration of genomic data into the more traditional phenomic databases, such as electronic health records and biomedical data warehouses, offers great potential for the advancement of biomedical research and patient care. However, there are a number of challenges to accomplishing this integration in a seamless manner, including the consistent, standardized representation and coding of the data, coping with the sheer volume of information, and proper indexing of important genomic features to facilitate retrieval. The panelists, all Fellows of the American College of Medical Informatics, each a leader in their own institutions and in the field of biomedical informatics, will describe their work on addressing the challenges of genomic-phenomic integration, with working solutions and examples of how such integration can be brought to bear on tasks such as helping clinicians understand their patients’ genetic data, using genetic data to support clinical decision making, and advancing biomedical research. The panel will also discuss implications for national standards on representation and data sharing. The session will include time for audience participants to share the solutions from their own institutions.

Page 4: James Cimino, MD - ACMI Bridge Day Panel

Integration of Genomic and Clinical Data

• Potential for biomedical research and patient care

• Consistent standardized representation

• Consistent standardized coding

• Coping with the volume

• Indexing of important genomic features

• Helping clinicians understand patients’ genetic data

• Using genetic data to support clinical decision making

• Advancing biomedical research

• Implications for national standards for representation

• Implications for national standards for data sharing

Page 5: James Cimino, MD - ACMI Bridge Day Panel

Presenters

• Shawn N. Murphy, FACMI (representation)
– Massachusetts General Hospital
– Harvard Medical School
– Partners HealthCare
• Henry Lowe, FACMI (linking genome & phenome)
– Stanford University
• Elmer V. Bernstam, FACMI (reuse for research)
– University of Texas at Houston
• Riccardo Bellazzi, FACMI (supporting research)
– Università di Pavia
• Lucila Ohno-Machado, FACMI (iDASH Center)
– University of California at San Diego
• Peter Tarczy-Hornoch, FACMI (decision support)
– University of Washington

Page 6: James Cimino, MD - ACMI Bridge Day Panel

Expression of Genomic Variants in a Clinical Research Database

Shawn Murphy MD, Ph.D.

Lori Phillips MS

Brian Wilson

Page 7: James Cimino, MD - ACMI Bridge Day Panel

Research Patient Data Registry exists at Partners Healthcare to find patient cohorts for clinical research

Query construction in web tool

1) Queries for aggregate patient numbers (De-identified Data Warehouse; encrypted identifiers, e.g. 00000042185793......)
- Warehouse of in & outpatient clinical data
- 5.0 million Partners Healthcare patients
- 1.3 billion diagnoses, medications, procedures, laboratories, & physical findings coupled to demographic & visit data
- Authorized use by faculty status
- Clinicians can construct complex queries
- Queries cannot identify individuals, internally can produce identifiers for (2)

2) Returns identified patient data (real identifiers, e.g. Z731984XZ74902XX......)
OR
- Start with list of specific patients, usually from (1)
- Authorized use by IRB Protocol
- Returns contact and PCP information, demographics, providers, visits, diagnoses, medications, procedures, laboratories, microbiology, reports (discharge, LMR, operative, radiology, pathology, cardiology, pulmonary, endoscopy), and images into a Microsoft Access database and text files.
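
The two-step pattern on this slide (aggregate counts over encrypted identifiers, then identified data only under an IRB protocol) can be sketched roughly as follows. This is a minimal illustration, not the RPDR implementation: the key, the `MRN-…` identifiers, the diagnosis rows, and every function name are hypothetical, and a keyed hash merely stands in for whatever identifier encryption the registry actually uses.

```python
import hashlib
import hmac

# Hypothetical key and records, invented for illustration only.
SECRET_KEY = b"demo-key-not-for-production"


def encrypt_id(real_id: str) -> str:
    """Derive a stable encrypted identifier from a real one (keyed hash)."""
    digest = hmac.new(SECRET_KEY, real_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Identified source rows: (real identifier, diagnosis code).
SOURCE_ROWS = [
    ("MRN-001", "250.00"),
    ("MRN-001", "401.9"),
    ("MRN-002", "250.02"),
    ("MRN-003", "401.9"),
]

# The de-identified warehouse only ever holds encrypted identifiers.
WAREHOUSE = [(encrypt_id(mrn), dx) for mrn, dx in SOURCE_ROWS]

# The reverse mapping is kept apart from the query tool.
REVERSE_MAP = {encrypt_id(mrn): mrn for mrn, _ in SOURCE_ROWS}


def matching_encrypted_ids(dx_prefix: str) -> set:
    """Encrypted identifiers of patients with a diagnosis starting with dx_prefix."""
    return {pid for pid, dx in WAREHOUSE if dx.startswith(dx_prefix)}


def aggregate_count(dx_prefix: str) -> int:
    """Step 1: return only the number of distinct matching patients."""
    return len(matching_encrypted_ids(dx_prefix))


def identified_patients(dx_prefix: str, irb_approved: bool) -> list:
    """Step 2: map the cohort back to real identifiers, only under an IRB protocol."""
    if not irb_approved:
        raise PermissionError("identified data requires an approved IRB protocol")
    return sorted(REVERSE_MAP[pid] for pid in matching_encrypted_ids(dx_prefix))
```

Calling `aggregate_count("250")` returns a count without exposing any identifier, while `identified_patients("250", irb_approved=True)` performs the internal lookup that the slide describes as producing identifiers for step (2).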

Page 8: James Cimino, MD - ACMI Bridge Day Panel

Query items

Person who is using tool

Query construction

Results - broken down by number of distinct patients

Page 10: James Cimino, MD - ACMI Bridge Day Panel

HGVS Variant Notation

Variant

Wildtype Sequence

Footprint

Page 13: James Cimino, MD - ACMI Bridge Day Panel

Set of patients is selected through RPDR and data is gathered into a data mart

RPDR

Selected patients

Data directly from RPDR

Data from other hospital sources

Data collected specifically for project

Daily Automated Queries search for Patients and add Data

Project-Specific Phenotypic Data

Page 14: James Cimino, MD - ACMI Bridge Day Panel

Data is available through a specialized Workbench

Page 15: James Cimino, MD - ACMI Bridge Day Panel

Requirements of Genomic Variant Notation

• Ability to organize the variants for ease of navigation
• Ability to query for the variant in the workbench
– Implication is that the identifier (basecode) for the variant does not change over time or is maintainable.
• Ability to explore or annotate the variant within the workbench
– Implication is that we know enough about the variant so that it can be located in existing external genome browsers, analytical tools, etc.

Page 16: James Cimino, MD - ACMI Bridge Day Panel

Challenges of Genomic Variant Notation

• Balancing the capabilities of multiple providers
– Genomic labs may report data differently
• Maintainability
– Define the variant so it may be reliably identified over time
• Balancing the needs of multiple consumers
– Needs may differ for geneticists vs physicians vs research scientists

Page 17: James Cimino, MD - ACMI Bridge Day Panel

Proposed Strategy for Clinical Data feeds

Gather SNP data from reference data

Gather SNP data from genomic lab reporting system

Page 18: James Cimino, MD - ACMI Bridge Day Panel

Weighing the data provided by the lab source

• Gene location MYH7

• Flanking sequences

– 5’ AGGCGCTAGAGAAGTCCGAGGCTC

– 3’ CCGCAAGGAGCTGGAGGAGAAGAT

• Positional information c.2606

• Nucleotide substitution G>A

• Functional information p.Arg869His

Page 19: James Cimino, MD - ACMI Bridge Day Panel

Proposed Strategy for Research Data feeds

1. Store summarized genomic annotation information within the current fact table of the star schema (EAV table)

2. Store detailed genomic annotation information within an object-oriented database

3. Store genomic datasets (BAM, PED, etc.) within a secure file system, indexed within the i2b2 Data Mart
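
Point 1 can be illustrated with a toy entity-attribute-value fact table in which a variant is just another concept code next to diagnoses. This is a simplified sketch under stated assumptions: the three-column `observation_fact` table below is a cut-down stand-in for the real i2b2 fact table, and the `HGVS:` and `ICD9:` code prefixes, the rows, and the helper names are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory EAV-style fact table with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observation_fact ("
        "patient_num INTEGER, concept_cd TEXT, tval_char TEXT)"
    )
    rows = [
        (1, "ICD9:162.9", None),
        (1, "HGVS:NM_005228.3:c.2155G>T", "heterozygous"),
        (2, "ICD9:162.9", None),
        (3, "HGVS:NM_005228.3:c.2155G>T", "heterozygous"),
    ]
    conn.executemany("INSERT INTO observation_fact VALUES (?, ?, ?)", rows)
    return conn


def patients_with_all(conn: sqlite3.Connection, *concepts: str) -> list:
    """Return patients that have a fact for every listed concept code."""
    placeholders = ",".join("?" for _ in concepts)
    sql = (
        "SELECT patient_num FROM observation_fact "
        f"WHERE concept_cd IN ({placeholders}) "
        "GROUP BY patient_num "
        "HAVING COUNT(DISTINCT concept_cd) = ? "
        "ORDER BY patient_num"
    )
    return [row[0] for row in conn.execute(sql, (*concepts, len(concepts)))]
```

Because the variant is stored as an ordinary concept code, a cohort query that combines a diagnosis with a variant needs no special genomic machinery; the detailed annotation and raw files stay in the other two stores.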

Page 20: James Cimino, MD - ACMI Bridge Day Panel

Flow Diagram

Diagram components:
• MongoDB – Data Persistence: Genomic Features; gridFS (BAM files); Data Analysis Meta Data; Experiment Meta Data
• Interface: i2b2 Web Service API; Genomic Report API
• Report Engine; Report Request Broker
• Galaxy (raw data storage; ‘canned’ workflow reports); i2b2-Galaxy Adaptor
• Genomic Data Importer (PED, GFF3 ...); Genomic Data Exporter (PED, GFF3, BED, WIG ...)
• Export Gene-Level Results to CRC
• i2b2 Hive Core: PM-Cell (Authentication); CRC-Cell (Summary Annotations)
• Users and clients: Domain Experts; i2b2 Hive Domain Power Users; Other Resources; R, Perl, cURL

Page 21: James Cimino, MD - ACMI Bridge Day Panel

How do we make Invariant Variants…that are palatable for human use in queries?

• RS number

• Gene name + flanking sequences

• HGVS name

Page 22: James Cimino, MD - ACMI Bridge Day Panel

RS number

• Uniquely identifies a variant over time ….but….

• Novel variants may not have rs number
– User may not want to submit to dbSNP

Page 23: James Cimino, MD - ACMI Bridge Day Panel

Gene name + flanking sequences

• Not guaranteed if gene has several isoforms

– EGFR

Page 24: James Cimino, MD - ACMI Bridge Day Panel

HGVS Name

• Uniquely identifies variant within a referenced and versioned accession and details the nucleotide substitution.

NM_005228.3:c.2155G>T

• NM_005228.3: RefSeq accession (with version)
• c.: Coding DNA
• 2155: Position
• G>T: Nucleotide substitution
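
The parts of the HGVS name labelled on this slide can be pulled apart with a small parser. This is a minimal sketch that handles only simple coding-DNA substitutions of the form shown here; the function and class names are hypothetical, and the full HGVS nomenclature covers many more variant types than this pattern accepts.

```python
import re
from typing import NamedTuple


class HgvsSubstitution(NamedTuple):
    accession: str   # RefSeq accession, e.g. "NM_005228"
    version: int     # accession version, e.g. 3
    position: int    # coding DNA position, e.g. 2155
    ref: str         # reference nucleotide
    alt: str         # variant nucleotide


# Accepts only names such as "NM_005228.3:c.2155G>T".
_PATTERN = re.compile(
    r"^(?P<acc>[A-Z]{2}_\d+)\.(?P<ver>\d+):c\.(?P<pos>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_hgvs_substitution(name: str) -> HgvsSubstitution:
    """Split a simple coding-DNA substitution name into its labelled parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a simple coding substitution: {name!r}")
    return HgvsSubstitution(
        match["acc"], int(match["ver"]), int(match["pos"]),
        match["ref"], match["alt"],
    )
```

Rejecting anything the pattern does not recognise, rather than guessing, keeps malformed accessions from silently becoming query terms.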

Page 25: James Cimino, MD - ACMI Bridge Day Panel

Is there a common denominator in all of this?

• Yes … all ultimately describe variant location on a chromosome.

• Nucleotide substitution defines the physical manifestation of the variant.

WE PROPOSE:
– HGVS name (n/t subst, positional info)
– Flanking sequences (a way to verify positional info)

AS A WAY TO UNEQUIVOCALLY EQUATE TWO VARIANTS
– ACROSS DOMAINS
– ACROSS VERSIONS
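
One way to read this proposal is as an equality test: two reports describe the same variant when the gene, the nucleotide substitution, and both flanking sequences agree, even if the accession version in the HGVS name differs. The sketch below assumes flanks are reported on the same strand and with the same length, which the slides do not state; the class and function names are hypothetical, and the `.4` accession version in the usage note is used only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantReport:
    gene: str
    hgvs_name: str   # e.g. "NM_005228.3:c.2155G>T"
    flank_5: str     # 5' flanking sequence
    flank_3: str     # 3' flanking sequence


def substitution(hgvs_name: str) -> str:
    """Return the nucleotide change (e.g. 'G>T') from a simple substitution name."""
    match = re.search(r"([ACGT]>[ACGT])$", hgvs_name)
    if match is None:
        raise ValueError(f"no simple substitution in {hgvs_name!r}")
    return match.group(1)


def same_variant(a: VariantReport, b: VariantReport) -> bool:
    """Equate two reports when gene, substitution, and both flanks all agree."""
    return (
        a.gene.upper() == b.gene.upper()
        and substitution(a.hgvs_name) == substitution(b.hgvs_name)
        and a.flank_5.upper() == b.flank_5.upper()
        and a.flank_3.upper() == b.flank_3.upper()
    )
```

For example, a report named against `NM_005228.3` and one named against a later version such as `NM_005228.4` would compare equal when their flanks match, whereas a report with the same HGVS name but a different 3' flank would not, which is the verification role the slide assigns to flanking sequences.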

Page 26: James Cimino, MD - ACMI Bridge Day Panel

GenomicMetadata record

GenomicMetadata
  Version: 1.0
  ReferenceGenomeVersion: hg18
  SequenceVariant
    HGVSName: NM_005228.3:c.2155G>T
    SystematicName: c.2155G>T
    SystematicNameProtein: p.Gly719Cys
    AaChange: missense
    DnaChange: substitution
  SequenceVariantLocation
    GeneName: EGFR
    FlankingSeq_5: GAATTCAAAAAGATCAAAGTGCTG
    FlankingSeq_3: GCTCCGGTGCGTTCGGCACGGTGT
    RegionType: exon
    RegionName: Exon 18
    Accessions
      Accession: Name NM_005228, Type mrna (NCBI)
      Accession: Name NP_005219, Type protein (NCBI)
      Accession: Name NT_004487, Type contig (NCBI)
    ChromosomeLocation
      Chromosome: chr7
      Region: 7p12
      Orientation: +

Page 27: James Cimino, MD - ACMI Bridge Day Panel

Combining equivalent terms

Page 28: James Cimino, MD - ACMI Bridge Day Panel

Linking to external services

• Genome Browser

– Requires chromosome location; reference genome

• PolyPhen (predicted functional effects)

– Requires chromosome location; reference genome

– RS number

– Or HGVS name

Page 29: James Cimino, MD - ACMI Bridge Day Panel

VISTA Services

• Flankmap (location service)

Converts several formats to a chromosome location on a reference genome

– Gene/flanking sequence

– Full HGVS notation

– dbSNP rs number

• Conservation plots

– Based on location

Page 30: James Cimino, MD - ACMI Bridge Day Panel

VISTA workbench tools

Page 31: James Cimino, MD - ACMI Bridge Day Panel

Embedded VISTA browser

Page 32: James Cimino, MD - ACMI Bridge Day Panel

References

• Kimball, R. The Data Warehouse Toolkit. New York: John Wiley, 1997.

• Murphy, S.N., Gainer, V.S., Chueh, H. A Visual Interface Designed for Novice Users to find Research Patient Cohorts in a Large Biomedical Database. AMIA, Fall Symp. 2003: 489-493.

• Murphy, S.N., Weber, G., Mendis, M., Gainer, V.S., Churchill, S., Kohane, I.S. Serving the Enterprise and Beyond with Informatics for Integrating Biology and the Bedside (i2b2). Journal of the American Medical Informatics Association, 2010 March 1; 17(2): 124-130.

• den Dunnen, J.T., Antonarakis, S.E. Mutation nomenclature extensions and suggestions to describe complex mutations: A discussion. Hum Mutat 2000, 15: 7-12.

• Dalgleish, R., et al. Locus Reference Genomic sequences: an improved basis for describing human DNA variants. Genome Medicine 2010, 2: 24.

• http://www.hgvs.org/mutnomen/recs.html

Page 33: James Cimino, MD - ACMI Bridge Day Panel

Integrating Clinical and Genomic Data: Opportunities, Challenges and a Proposal

Henry Lowe MD

Stanford Center for Clinical Informatics

And The Division of Systems Medicine

Stanford University School of Medicine

Page 34: James Cimino, MD - ACMI Bridge Day Panel

Opportunities – Clinical Data Warehouses

• Electronic Health Record Deployment Increasing
• Creation of Clinical Data Warehouses Increasing
• Support for Research Access to Clinical Data
• Optimized for use of Aggregate Data
• Cohort Searching, Data Review & Analysis
• Clinical Data (Including Text) Mining Tools
• Aggregation of Clinical Data across sites

Page 35: James Cimino, MD - ACMI Bridge Day Panel

Opportunities – Biospecimen Linkage

• Linkage of Clinical and Biospecimen Data
• Characterizing Biospecimens using Clinical Data
• Identifying Biospecimen Cohorts
• Linkage of Genomic Data to Clinical Data
• Integration of Genomic Data back into the EHR

Page 36: James Cimino, MD - ACMI Bridge Day Panel

Challenges – Clinical Data

• Clinical Data is not the Entire Phenotype
• Missing Data (e.g. Occupational History)
• May be Spread across many eHealth Systems
• Clinical Data is not Perfect
• Diagnoses may be coded only in ICD9
• Important Data may be missing
• Clinical Text may be challenging to parse

Page 37: James Cimino, MD - ACMI Bridge Day Panel

• Creating validated algorithms to define phenotype from EMRs is complex

• Diagnostic Codes Alone may not be sufficient (eMERGE)

• Extracting phenotypic data from clinical text can be difficult

• Phenotype data use goes beyond genomic studies, e.g. Research Cohort Identification

Challenges – Identifying Phenotype

Page 38: James Cimino, MD - ACMI Bridge Day Panel

Proposal – A National Phenotype Catalog

• Create a Web-based, searchable directory of validated high level phenotype algorithms
• Encourage contributions from multiple sites
• Algorithms would be freely available for use
• Would use a standard set of metadata elements
• Would use a standard description formalism
• Provide APIs to support application/system level access to the phenotype algorithm directory
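
To make the catalog idea concrete, the sketch below shows what a directory entry and a programmatic search could look like. Everything in it is an assumption for illustration: the proposal names no specific metadata elements, description formalism, or API, so the field names, the `PhenotypeCatalog` class, and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhenotypeAlgorithm:
    """One catalog entry; the metadata fields here are hypothetical."""
    name: str
    contributing_site: str
    validated: bool
    data_types: tuple       # e.g. ("diagnosis codes", "labs", "clinical text")
    description: str = ""


class PhenotypeCatalog:
    """In-memory stand-in for a searchable directory of phenotype algorithms."""

    def __init__(self):
        self._entries = []

    def register(self, entry: PhenotypeAlgorithm) -> None:
        """Accept a contribution from any site."""
        self._entries.append(entry)

    def search(self, keyword: str, validated_only: bool = True) -> list:
        """Return entries whose name contains the keyword (case-insensitive)."""
        needle = keyword.lower()
        return [
            entry for entry in self._entries
            if needle in entry.name.lower()
            and (entry.validated or not validated_only)
        ]
```

Defaulting `validated_only` to true reflects the emphasis on validated algorithms, while still letting a client browse unvalidated contributions explicitly.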

Page 39: James Cimino, MD - ACMI Bridge Day Panel

Challenges in Leveraging Clinical Data

Elmer Bernstam
Professor, Biomedical Informatics and Internal Medicine
Director, Biomedical Informatics Component
Center for Clinical and Translational Research
The University of Texas Health Science Center at Houston

Page 40: James Cimino, MD - ACMI Bridge Day Panel

Main points

• To leverage genomic data, need (matching) clinical data
– Research data
  • Expensive and scarce
  • Relatively easy to compute
  • May not accurately reflect clinical reality
– Routine clinical data
  • Plentiful and “cheap” (though may not match)
  • Very hard to compute
  • Necessary
• Challenges inherent in routine clinical data

Page 41: James Cimino, MD - ACMI Bridge Day Panel

Traditional view

CPR

Clinical data (CDW)

Genetic data

Page 42: James Cimino, MD - ACMI Bridge Day Panel

Why do we need routine data?

• Clinical research moving abroad
– Glickman SW, McHutchison JG, Peterson ED, et al. Ethical and scientific implications of the globalization of clinical research. N Engl J Med. 2009;360(8):816–823, PMID:19228627.
• If we are to compete…
– Need to make use of routine data
– Only some routine clinical care can be outsourced

Page 43: James Cimino, MD - ACMI Bridge Day Panel

Problems

• Little overlap between clinical and research data
– Genetic data on study subjects
– Clinical data on patients who are not study subjects
• Clinical data not like research data
– Measurement error
– Missing data
– Biased data
– …

Page 44: James Cimino, MD - ACMI Bridge Day Panel

Attempts to leverage clinical data

• [Quality of care]
• [(Non-representative) Cohort selection]
• Reproducing large RCTs
– Extremely large sample sizes
– Example: Tannen RL, Weiner MG, Xie D. Use of primary care electronic medical record database in drug efficacy research on cardiovascular outcomes: comparison of database and randomised controlled trial findings. BMJ. 2009. 338:b81.
– 8M patients (5.7% of the population of the UK)
• Reproducing prediction rules
– Example: Hripcsak G, Knirsch C, Zhou L, Wilcox A, Melton GB. Using discordance to improve classification in narrative clinical databases: An application to community-acquired pneumonia. Comp Biol Med, 37 (2007) 296-304.
– Often doesn’t work
  • Solution 1: eliminate problematic data (10% of sample)
    – Bias
  • Solution 2: account for the confounds via statistical model
    – Requires knowing the answer

Page 45: James Cimino, MD - ACMI Bridge Day Panel

Required enabling technologies

• Infrastructure
– Collect, store, protect, analyze, update
• NLP, NLP, NLP
– Structured (billing) data misleading
  • (UTH data) 20% endometrial cancer, 50% breast cancer
• Statistics
– Requires unusual degree of collaboration with statistical colleagues

Page 46: James Cimino, MD - ACMI Bridge Day Panel

For the present

• Critically important research area
• Careful to maintain enthusiasm without over-promising
– AI Winter(s)

Page 47: James Cimino, MD - ACMI Bridge Day Panel

Thank you!

Elmer Bernstam

Page 48: James Cimino, MD - ACMI Bridge Day Panel

Integrating genomic and clinical data: some challenges from EU and Italian projects

Riccardo Bellazzi
Biomedical Informatics Labs ‘Mario Stefanelli’
University of Pavia, Italy

Page 49: James Cimino, MD - ACMI Bridge Day Panel

Cardiology

Oncology

IRCCS Fondazione S. Maugeri

IRCCS Fondazione C. Mondino

Headache

IRCCS Policlinico S. Matteo

The EU Inheritance project

Collaborations BMI labs and Pavia hospitals

Page 51: James Cimino, MD - ACMI Bridge Day Panel

Clinical Bioinformatics – the Italbionet / i2b2 Pavia project

DW / clinical research chart

Intelligent query / data mining

Knowledge repositories

Reasoning systems

EMR

Research data-bases

Discharge letters

HIV

Biobanks

Page 52: James Cimino, MD - ACMI Bridge Day Panel

Projects

Genetics of arrhythmogenic diseases

Support to oncology research

IRCCS Fondazione S. Maugeri

Page 53: James Cimino, MD - ACMI Bridge Day Panel

TRIAD and I2b2

TRIAD: Transatlantic Registry of Inherited Arrhythmogenic Diseases

i2b2

Page 54: James Cimino, MD - ACMI Bridge Day Panel

ETL - KETTLE

Page 55: James Cimino, MD - ACMI Bridge Day Panel

TRIAD and i2b2

Page 56: James Cimino, MD - ACMI Bridge Day Panel

Adding statistical functionalities

Page 57: James Cimino, MD - ACMI Bridge Day Panel

BIOINFORMATICS METHODOLOGY AND TECHNOLOGY TO INTEGRATE CLINICAL AND BIOLOGICAL KNOWLEDGE SUPPORTING ONCOLOGY TRANSLATIONAL RESEARCH (ONCO-I2B2)

Page 59: James Cimino, MD - ACMI Bridge Day Panel

Inheritance: dilated cardiomyopathies

IRCCS Policlinico S. Matteo

The EU Inheritance project

Projects

Page 60: James Cimino, MD - ACMI Bridge Day Panel

Dilated cardiomyopathy

Centre for Inherited Cardiovascular Diseases - IRCCS Policlinico San Matteo - Pavia

Page 61: James Cimino, MD - ACMI Bridge Day Panel

“DCM”

• Dystrophinopathies
• Laminopathies
• Desminopathies
• Mitochondriopathies
• Epicardinopathies
• Actinopathies
• Zaspopathies
• Desmosomopathies

From DCM to…

Clinically oriented genetic investigation

Centre for Inherited Cardiovascular Diseases - IRCCS Policlinico San Matteo - Pavia

Page 62: James Cimino, MD - ACMI Bridge Day Panel

Diagram labels:
• ECG: rest, effort, Holter
• Pedigree, family screening
• Symptoms, duration
• Physical evaluation
• Non familial; familial: AD, AR, X-LR, MT
• Cardiac, extracardiac, recent onset, long term
• Muscle, skin, eyes, kidney, liver, lung
• LAB
• Imaging: echo, MRI
• RV Cath
• AVB, PR, WPW, etc.
• CPK, leukocytes, enzymes, metab., etc.
• LVNC, DE
• EMB
• Family screening, clinical markers

Diagnostic Hypothesis: Before Genetic Testing

Increasing the number of genotyped CMP

One gene ---> one disease

Page 63: James Cimino, MD - ACMI Bridge Day Panel

Inheritance architecture

I2b2 environment

Web interface

Data analysis plugin

Text mining and literature search engines

Reasoning module

Wiki-based collaborative system

Annotation tools

KB/Red flags

Data warehouse

Cardioregister

Page 65: James Cimino, MD - ACMI Bridge Day Panel

Projects

IRCCS Fondazione C. Mondino

Headache

Page 66: James Cimino, MD - ACMI Bridge Day Panel

Populating the data warehouse

CRC

Research Clinical Data

Ontology Mapped Clinical Data

Domain Ontology

Documents

Legacy Databases

NLP System

ICHD Diagnosis

ICHD Code System

Page 68: James Cimino, MD - ACMI Bridge Day Panel

Task 1. Computational methods and tools to perform data mining and knowledge integration

Web-based data analytics

Web-based annotation

Automated literature search

Mining annotations and literature

Efficient management of MS-data

Page 70: James Cimino, MD - ACMI Bridge Day Panel

In summary

• Several projects where the same architecture can be applied
• Main adaptation needs:
– Specific domain ontologies
– Representation of genetic information
– Representation of phenotypic information
– Importing data from EHR
• Interesting research directions related to building –omics enabled decision support and knowledge management tools

Page 71: James Cimino, MD - ACMI Bridge Day Panel

Integrating Genomic and Clinical Data for EHR and Biomedical Repositories

Lucila Ohno-Machado, MD, PhD
Division of Biomedical Informatics, UCSD

TBI-CRI Bridge Day Panel
03/8/11

Page 72: James Cimino, MD - ACMI Bridge Day Panel

EHR and Genomics at UCSD

Division of Biomedical Informatics overview

Research and Applications
• Clinical Data Warehouse
– NLP, privacy technology, preference management
• integrating Data for Analysis, Anonymization, and Sharing
• Personalized risk assessment
– How ‘personalized’ is it?

Page 74: James Cimino, MD - ACMI Bridge Day Panel

Clinical Data Warehouse

• 550,000 outpatient visits/year
• 180,000 hospital admissions
• 17 million orders
• 2 million patients

Page 75: James Cimino, MD - ACMI Bridge Day Panel

UCLA(Epic)

Data matching function: Map D onto data dictionaries

Clinician/Researcher wants data

Return data D

Request about individual

Request for data D

UC Irvine (Eclipsys)

UC Davis(Epic)

UCSF(GE)

Community Partners

UCSD(Epic)


Page 77: James Cimino, MD - ACMI Bridge Day Panel

integrating Data for Analysis, Anonymization and Sharing

Sharing Data
– Today
  • Public repositories (mostly non-clinical)
  • Limited data use agreements
– Tomorrow
  • Annotated public databases
  • Informed consent management system
  • Certified trust network

Sharing Computational Resources
– Today
  • Computer scientists looking for data, biomedical and behavioral scientists looking for analytics
  • Duplication of pre-processing efforts
  • Massive storage and high performance computing limited to a few institutions
– Tomorrow
  • Processed, de-identified, ‘anonymized’, shared data
  • Secure biomedical/behavioral cloud

Page 78: James Cimino, MD - ACMI Bridge Day Panel

Analysis

• Compression
• Query language
• NLP
• Study design (2nd generation seq)
• Pattern recognition (computing with streams, rare events)
• High performance computing

(Courtesy Bafna and Varghese)

Page 79: James Cimino, MD - ACMI Bridge Day Panel

Anonymization

Page 80: James Cimino, MD - ACMI Bridge Day Panel

Informed Consent Management System

Respecting Privacy and Getting the Job Done

Decision flow shown in the diagram:
• Provider P requests Data D on individual I for Reason R (Information Exchange Registry)
• Does the law or regulation require D to be sent? (Yes / No)
• Do I wish to disclose data D to P for Reason R? (Yes / No; individual preferences, Preference Registry)
• Inspection: I can check who or which entity looked (wanted to look) at the data for what reasons

Other diagram labels: Preferences; Focus Groups, Surveys; Identity Management; Trust Management; Home; Trusted Broker(s); Patient I; Community; Security Entity; Healthcare Entity
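
The questions on this slide (is disclosure required by law, and if not, does the individual wish to disclose D to P for R) can be sketched as a small decision function with an inspection log. This is only an illustration under stated assumptions: the data structures and names are hypothetical, and denying by default when no preference is recorded is a choice made here, not something the slide specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    provider: str     # P
    individual: str   # I
    data: str         # D
    reason: str       # R


def decide_disclosure(request, legally_required, preferences, audit_log):
    """Return True if D may be sent to P for R, recording the request for inspection.

    legally_required: set of (data, reason) pairs that law or regulation requires.
    preferences: dict mapping (individual, provider, data, reason) to True/False.
    audit_log: list that receives (request, decision, basis) for every request.
    """
    if (request.data, request.reason) in legally_required:
        decision, basis = True, "law/regulation"
    else:
        key = (request.individual, request.provider, request.data, request.reason)
        # Assumption: no recorded preference means the data is not disclosed.
        decision, basis = preferences.get(key, False), "individual preference"
    audit_log.append((request, decision, basis))
    return decision
```

Logging every request, including refused ones, is what lets the individual later check who looked, or wanted to look, at the data and for what reason.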

Page 85: James Cimino, MD - ACMI Bridge Day Panel

Personalized Medicine

If the rule of thumb for building predictive models is 10 cases per variable:

How many individual genotypes are needed?
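
The arithmetic behind this question is simple to work through. The sketch below applies the 10-cases-per-variable rule of thumb from the slide; the variable counts and the event rate used in the usage note are illustrative numbers chosen here, not figures from the presentation, and the function names are hypothetical.

```python
def required_cases(n_variables: int, cases_per_variable: int = 10) -> int:
    """Cases needed under the rule of thumb of N cases per model variable."""
    return n_variables * cases_per_variable


def required_cohort(n_variables: int, events: int, per_people: int,
                    cases_per_variable: int = 10) -> int:
    """People needed so the expected number of cases meets the rule of thumb.

    The event rate is given as `events` per `per_people` people, and the
    result is rounded up using integer arithmetic.
    """
    if events <= 0 or per_people <= 0 or events > per_people:
        raise ValueError("event rate must be between 0 and 1")
    cases = required_cases(n_variables, cases_per_variable)
    return (cases * per_people + events - 1) // events
```

With 20 variables and an event rate of 5 per 100 people, the rule asks for 200 cases and therefore about 4,000 individuals; treating 500,000 genotyped markers as candidate variables would call for 5,000,000 cases, which is why the question on the slide is a pointed one.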

Page 87: James Cimino, MD - ACMI Bridge Day Panel

22%

Page 88: James Cimino, MD - ACMI Bridge Day Panel

16%

Page 89: James Cimino, MD - ACMI Bridge Day Panel

“this program shows the estimated health risks of people with your same age, gender, and risk factor levels”

Your Risk

p=1

x

Page 90: James Cimino, MD - ACMI Bridge Day Panel

“this means that 5 of 100 people with this level of risk will have a heart attack or die”

Page 91: James Cimino, MD - ACMI Bridge Day Panel

Input space

“people with your same age, gender, and risk factor levels”

People “like you”

Output space

“people with this level of risk”

me

p=1

x

Page 92: James Cimino, MD - ACMI Bridge Day Panel

People “like me”

height

gender

me

Page 93: James Cimino, MD - ACMI Bridge Day Panel

Patients “like you”

Page 94: James Cimino, MD - ACMI Bridge Day Panel

Patients “like you”

me

height

gender0 1

1

Page 95: James Cimino, MD - ACMI Bridge Day Panel

Patients “like you”

me

height

gender

risk

0

2

1

1

Page 96: James Cimino, MD - ACMI Bridge Day Panel

Assessing Quality of Individual Predictions

• Hybrid model construction
– Non-parametric and parametric regression
– Kernel-based models
• Evaluation of calibration
– Graphical tools based on calibration error
– Input-based assessment
• Calibration methods
– Smooth isotonic regression (1:30 Cyril Magnin II)
– Doubly-penalized SVM
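
As background for the calibration methods listed above, the sketch below implements plain isotonic regression with the pool-adjacent-violators algorithm. It is not the smooth isotonic regression named on the slide, only the basic unsmoothed technique that such methods build on; the function names are hypothetical.

```python
def isotonic_fit(values):
    """Least-squares non-decreasing fit to a sequence (pool adjacent violators)."""
    blocks = []  # each block holds [sum of values, number of values]
    for value in values:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def calibrate(scores, outcomes):
    """Map each model score to a calibrated probability.

    Outcomes (0 or 1) are ordered by increasing score, fitted with
    isotonic_fit, and the fitted values are returned in the input order.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    fitted = isotonic_fit([outcomes[i] for i in order])
    calibrated = [0.0] * len(scores)
    for rank, index in enumerate(order):
        calibrated[index] = fitted[rank]
    return calibrated
```

The fit only assumes that risk does not decrease as the model score increases, so it corrects miscalibrated probabilities without changing how patients are ranked relative to one another (apart from introducing ties).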

Page 97: James Cimino, MD - ACMI Bridge Day Panel

Summary

• We need to aggregate as much information as we can from experiments and clinical data to create reasonable predictive models

• Objective models are being used in a variety of medical domains, but few users know their limitations

• We need better methods to assess the quality of the models

Page 98: James Cimino, MD - ACMI Bridge Day Panel

Genome-Phenome Integration @ UCSD

Funding from NLM, NHLBI, NHGRI, NIBIB, NCRR, NIGMS, AHRQ, Fogarty, VAMRF, Komen Foundation, UCSD Medical Center

Page 99: James Cimino, MD - ACMI Bridge Day Panel

Integrating Genomic and Clinical Data in EHRs and Biomedical Repositories: Challenges, Solutions and Opportunities

Peter Tarczy-Hornoch MD
Director, Biomedical Informatics Core, ITHS
Director, Research and Data Integration, ITS
Head and Professor, Biomedical and Health Informatics
Adjunct Professor, Computer Science and Engineering
Professor, Neonatology

March 9, 2011
AMIA TBI-19/CRI-01 ACMI Panel

Page 100: James Cimino, MD - ACMI Bridge Day Panel

Solutions for generating new genomic knowledge require integrating diverse phenotypic and genomic data

Diagram labels:
• Electronic Medical Record/Clinical Data
• Electronic Case Report Form Data
• Biodata (Instruments)
• Biospecimens
• Honest broker
• Researcher
• IRB approved protocol (shown three times)

Page 101: James Cimino, MD - ACMI Bridge Day Panel

The University of Washington data repository (Amalga) integrates phenotypic data from 30+ interfaces (10/2010)

Scope of Repository

• 3.5M patients, 42M visits, 220M+ lab results, 180M+ diagnoses & procedures over 18 years
• 14 data systems populating Amalga via 30+ real-time or batch interfaces
• 2.7 Terabytes of data
• 4M new messages/day
• Use IRB/HIPAA compliant

Page 102: James Cimino, MD - ACMI Bridge Day Panel

Amalga can identify patients with a given phenotype and help investigators augment phenotypic information

• Eligibility criteria (IRB approved study)

Patients whose age >=18 years and are not deceased

AND

Had ICD-9 codes of 648.* OR 250.* OR 648 OR 250

AND

Had lab test results (Albumin >= 30 and <= 400) OR (Albumin/Creatinine Ratio >= 30 and <= 400) within the last 2 years.

AND

Had ANY encounter in the service centers for Internal Medicine OR Diabetes Care Center OR Family Medical Center in the last 2 years

AND

Have not had a diagnosis of 592.* OR 592 OR 585.6 OR V42.0

AND

Have not had lab test (Calcium > 10.5) OR (GFR < 60) OR (Hemoglobin A1C HPLC > 9.5) OR (Hemoglobin A1C Rapid > 9.5).

• Nightly updates to candidate list, automated notification, & custom study input screen

Link demographic, diagnoses, labs, & visit history data
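
The eligibility criteria on this slide translate fairly directly into a predicate over a patient record. The sketch below is an illustration only: the `Patient` and `Lab` structures, the approximation of "the last 2 years" as 730 days, and all names are assumptions made here, not the Amalga query itself.

```python
from dataclasses import dataclass, field

WINDOW_DAYS = 2 * 365  # "within the last 2 years", approximated in days


@dataclass
class Lab:
    name: str
    value: float
    days_ago: int


@dataclass
class Patient:
    age: int
    deceased: bool
    icd9_codes: set = field(default_factory=set)
    labs: list = field(default_factory=list)
    encounters: list = field(default_factory=list)  # (service center, days ago)


QUALIFYING_LABS = ("Albumin", "Albumin/Creatinine Ratio")
SERVICE_CENTERS = ("Internal Medicine", "Diabetes Care Center", "Family Medical Center")


def has_code(codes, roots=(), exact=()):
    """True if any code is an exact match, equals a root, or falls under 'root.*'."""
    return any(
        code in exact or any(code == r or code.startswith(r + ".") for r in roots)
        for code in codes
    )


def excluded_by_lab(lab):
    """Lab-based exclusion criteria from the slide."""
    return (
        (lab.name == "Calcium" and lab.value > 10.5)
        or (lab.name == "GFR" and lab.value < 60)
        or (lab.name in ("Hemoglobin A1C HPLC", "Hemoglobin A1C Rapid")
            and lab.value > 9.5)
    )


def is_eligible(p):
    """Apply every inclusion and exclusion criterion in turn."""
    recent_lab = any(
        lab.name in QUALIFYING_LABS and 30 <= lab.value <= 400
        and lab.days_ago <= WINDOW_DAYS
        for lab in p.labs
    )
    recent_visit = any(
        center in SERVICE_CENTERS and days_ago <= WINDOW_DAYS
        for center, days_ago in p.encounters
    )
    return (
        p.age >= 18 and not p.deceased
        and has_code(p.icd9_codes, roots=("648", "250"))
        and recent_lab and recent_visit
        and not has_code(p.icd9_codes, roots=("592",), exact=("585.6", "V42.0"))
        and not any(excluded_by_lab(lab) for lab in p.labs)
    )
```

Running such a predicate against each night's data is one way to picture the nightly candidate-list update mentioned on the slide.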

Page 103: James Cimino, MD - ACMI Bridge Day Panel

Some phenotypes more challenging to capture

Capurro, Tarczy-Hornoch

TBI 2011 (TBI-10)

Page 104: James Cimino, MD - ACMI Bridge Day Panel

Semantic alignment in data repositories pulling data from disparate systems is a challenge

- 2 medication lists
- 2 systems
  - Pharmacy
  - Medical record
- Single dictionary (A)

- 1 medication list
- 1 system
  - Medical record
- Single dictionary (B)

- n medication lists
- n systems (members)
  - hospitals
  - clinics
  - pharmacies
  - mail-order
- NO dictionary

Ongoing research and research opportunities: ontologies, semantic alignment
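
The alignment problem can be pictured as mapping each system's local medication entries onto shared concepts and seeing what is left over. In the sketch below, the dictionary contents, the local codes, and the function name are all invented for illustration; real alignment would rely on curated terminologies rather than a hand-written lookup table.

```python
# Hypothetical site dictionary: local entry -> shared concept.
DICTIONARY_A = {
    "PHARM-1001": "metformin",
    "EMR-77": "metformin",
    "EMR-12": "lisinopril",
}


def align(medication_lists, dictionary):
    """Map local medication entries to shared concepts.

    medication_lists: dict of source system -> list of local entries.
    dictionary: dict of local entry -> shared concept (may be empty).
    Returns (sorted shared concepts, list of unmapped (source, entry) pairs).
    """
    aligned = set()
    unmapped = []
    for source, entries in medication_lists.items():
        for entry in entries:
            concept = dictionary.get(entry)
            if concept is None:
                unmapped.append((source, entry))
            else:
                aligned.add(concept)
    return sorted(aligned), unmapped
```

With a single dictionary, two lists from a pharmacy system and a medical record collapse onto the same concepts; with no dictionary at all, as in the n-system case, every entry ends up unmapped, which is where ontologies and semantic alignment research come in.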

Page 105: James Cimino, MD - ACMI Bridge Day Panel

EHR computable phenotypes may not be granular enough thus text mining is a key opportunity

CONFIDENTIAL – UNPUBLISHED DATA (Black, Capurro et al)

Page 106: James Cimino, MD - ACMI Bridge Day Panel

Systems to bring genomic knowledge to the point of care need to integrate with both genomic and phenotypic data

• “Pharmacogenomics (PGx) is the study of the genetic basis of variability among individuals in response to drugs” (Pharmacogenomics & Personalized Medicine, Cohen N. 2008)

Overby, Tarczy-Hornoch et al BMC Bioinformatics 2010

Motivation

Example: Tamoxifen and time to recurrence

Increased monitoring for poor metabolizers recommended

Note: Given limited evidence, as of 2009, ASCO does NOT recommend testing for CYP2D6

Page 107: James Cimino, MD - ACMI Bridge Day Panel

Methods

Pharmacogenomic decision support requires reasoning across assertions with different levels of evidence

Genotype scoring system

Raw from Sheffield et al. Clin Bio Rev. 2009

Overby et al IDAMAP 2010

Page 108: James Cimino, MD - ACMI Bridge Day Panel

Prototype system built on Amalga integrates Illumina SNP data and clinical data and basic genomic knowledge

• Potential applications: discovery of associations, validation of associations, clinical alerts/reminders

• Research opportunities: genomics, data modeling, data mining, text mining/NLP, decision support

Data is from simulated patients

* Overby: decision support for pharmacogenomics, * Yetisgen-Yildiz: phenotype extraction

Page 109: James Cimino, MD - ACMI Bridge Day Panel

Collaboration is key to realizing the opportunity to use biomedical data to advance genomic research & practice

Diagram labels:
• 13 Cores including Biomedical Informatics (and Regulatory/Bioethics)
• Academics & Clinical: Lab Medicine, Pathology, Genome Sciences, Northwest Institute of Genetic Medicine
• Faculty (19+39): Research, Service
• Students (39): MS, PhD, Postdoc
• Medical Records, Billing Systems, Data Repositories
• Biomedical Data & Biospecimens
• Nursing
• Public Health
• Computer Science

Page 110: James Cimino, MD - ACMI Bridge Day Panel

Acknowledgements (incomplete)

• Funding: NCRR UL1 RR 025014, NLM T15 LM07442, NIH, NSF, AHRQ, UW Medicine

• Faculty: Nick Anderson, Jim Brinkley, Alon Halevy, Ira Kalet, Kari Stephens, Dan Suciu, Peter Tarczy-Hornoch

• PhD Students: Eithon Cadag, Daniel Capurro, Paul Fearn, Alicia Guidry, Ping Lin, Brent Louie, Peter Mork, Casey Overby, Rupa Patel, Terry Shen

• ITHS BMI Core Staff: Bill Barker, Tony Black, Joshua Franklin, Gene Hart, Greg Hather, Xenia Hertzenberg, Brent Louie, May Lim, Paul Oldenkamp, Roy Pardee, Jim Piper, Jaime Prosser, Justin Prosser, Ron Shaker, Richard Veino

• ITS Staff: Joe Frost, Jim Hoath, Mike Kuffel, Soohee Lee, Dave Rankin, Dan Sullivan, Paul Tittel, Tanya Tobin, and more