AMIA CRI Summit 2011 CRI-09: Cross-Institutional Systems to Support Phenotyping in Biomedical Research: Experiences from the eMERGE Network Luke Rasmussen Marshfield Clinic David Carrell, PhD Group Health Research Institute William Thompson, PhD Northwestern University Hua Xu, PhD Vanderbilt University Jyoti Pathak, PhD Mayo Clinic

AMIA CRI Summit 2011

CRI-09: Cross-Institutional Systems to Support Phenotyping in Biomedical Research: Experiences from the eMERGE Network

Luke Rasmussen, Marshfield Clinic

David Carrell, PhD, Group Health Research Institute

William Thompson, PhD, Northwestern University

Hua Xu, PhD, Vanderbilt University

Jyoti Pathak, PhD, Mayo Clinic

eMERGE Consortium

• Principal sponsor: NHGRI with additional funding from NIGMS

• NIH-funded consortium (CTSA awardee institutions)

• DNA Biobanks linked to EHR data

• Consortium members
– Group Health of Puget Sound
– Marshfield Clinic
– Mayo Clinic
– Northwestern University
– Vanderbilt University

[Figure: consortium sites labeled with their phenotypes (QRS duration, dementia, peripheral vascular disease, cataracts, Type II diabetes) and the coordinating center]

Marshfield Clinic Biobank

• Population
– Geographically defined cohort
– Stable population
– Minimal selection bias
– Over 95% of medical events captured in EMR

• Data
– All levels of inpatient and outpatient care
– 5 decades of retrospective clinical data
– Prospective & continuous data collection via EHR
– Event, testing, treatment and outcomes represented
– High utilization of primary care to classify controls
– Clinical, financial and environment data

Health Events

eMERGE Contributors

• NHGRI
– Rongling Li
– Heather Junkins
– Teri Manolio
– Jim Ostell

• Group Health
– Eric Larson
– Gail Jarvik
– Chris Carlson
– Wylie Burke
– Gene Jart
– David Carrell
– Malia Fullerton
– Walter Kukull
– Paul Crane
– Noah Weston

• Northwestern
– Rex Chisholm
– Bill Lowe
– Phil Greenland
– Wendy Wolf
– Maureen Smith
– Geoff Hayes
– Pedro Avila
– Joel Humowiecki
– Jen Allen-Pacheco
– Amy Lemke
– Will Thompson

• Marshfield
– Cathy McCarty
– Peggy Peissig
– Luke Rasmussen
– Marilyn Ritchie
– Justin Starren
– Russ Wilke
– Dick Berg
– Jim Linneman

• Mayo Clinic
– Christopher G. Chute
– Iftikhar J. Kullo
– Barbara Koenig
– Suzette Bielinski
– Mariza de Andrade

• Vanderbilt
– Dan Roden
– Dan Masys
– Josh Denny
– Brad Malin
– Ellen Wright Clayton
– Dana Crawford
– Jonathan Haines
– Jonathan Schildcrout
– Jill Pulley
– Melissa Basford
– Marilyn Ritchie

RFA HG-07-005: Genome-Wide Studies in Biorepositories with Electronic Medical Record Data

• 2007 NIH Request for Applications from the National Human Genome Research Institute

“The purpose of this funding opportunity is to provide support for investigative groups affiliated with existing biorepositories to develop necessary methods and procedures for, and then to perform, if feasible, genome-wide studies in participants with phenotypes and environmental exposures derived from electronic medical records, with the aim of widespread sharing of the resulting individual genotype-phenotype data to accelerate the discovery of genes related to complex diseases.” (Emphasis added)

Tools and Methods

Presenter and topic

• Luke Rasmussen, Marshfield Clinic
– Reusable phenotype algorithms: techniques to facilitate future reuse of phenotype algorithms.

• David Carrell, Group Health
– Clinical Text Explorer Search Interface: facilitates exploration of EHR for rapid phenotyping and algorithm refinement.

• William Thompson, Northwestern University
– clinical Text Analysis and Knowledge Extraction System (cTAKES): natural language processing (NLP) system utilized for multiple phenotypes, including PAD.

• Hua Xu, Vanderbilt University
– MedEx: NLP system utilized within eMERGE with additional applications to pharmacogenomic research.

• Jyoti Pathak, Mayo Clinic
– eleMAP: facilitates harmonization and standardization of phenotype variables across sites.

Reusable Phenotype Algorithms

Luke Rasmussen, Senior Programmer/Analyst

Marshfield Clinic Research Foundation, Biomedical Informatics Research Center

Phenotype Development

• Multi-disciplinary teams

• Multiple sites

• Iterative

• Intangible → Tangible

EMR-based Phenotype Algorithms

• Typical components
– Billing and diagnosis codes
– Procedure codes
– Labs
– Medications
– Phenotype-specific co-variates (e.g., Demographics, Vitals, Smoking Status, CASI scores)
– Pathology
– Imaging?

• Organized into inclusion and exclusion criteria
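
One way to picture an algorithm organized into inclusion and exclusion criteria is the minimal Python sketch below. It is an illustration only, not an eMERGE implementation: the record layout (dx_codes, labs, meds), the helper names, and the codes DX-A, LAB-X and DRUG-Y are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# A patient record is modeled as a plain dict; every field name is hypothetical.
Patient = Dict[str, Any]
Criterion = Callable[[Patient], bool]


@dataclass
class PhenotypeAlgorithm:
    """A phenotype definition organized as inclusion and exclusion criteria."""
    name: str
    inclusion: List[Criterion] = field(default_factory=list)
    exclusion: List[Criterion] = field(default_factory=list)

    def is_case(self, patient: Patient) -> bool:
        # A case satisfies every inclusion criterion and no exclusion criterion.
        meets_all = all(rule(patient) for rule in self.inclusion)
        excluded = any(rule(patient) for rule in self.exclusion)
        return meets_all and not excluded


def has_code(code: str, min_count: int = 1) -> Criterion:
    """Criterion: the diagnosis code appears at least min_count times."""
    return lambda p: p.get("dx_codes", []).count(code) >= min_count


def lab_below(lab: str, threshold: float) -> Criterion:
    """Criterion: at least one result for the lab falls below the threshold."""
    return lambda p: any(v < threshold for v in p.get("labs", {}).get(lab, []))


def on_medication(drug: str) -> Criterion:
    """Criterion: the drug appears in the medication list."""
    return lambda p: drug in p.get("meds", [])


# Placeholder codes and thresholds, chosen only to exercise the structure.
example = PhenotypeAlgorithm(
    name="example-phenotype",
    inclusion=[has_code("DX-A", min_count=2), lab_below("LAB-X", 40.0)],
    exclusion=[on_medication("DRUG-Y")],
)

patients = [
    {"id": 1, "dx_codes": ["DX-A", "DX-A"], "labs": {"LAB-X": [35.0]}, "meds": []},
    {"id": 2, "dx_codes": ["DX-A", "DX-A"], "labs": {"LAB-X": [35.0]}, "meds": ["DRUG-Y"]},
    {"id": 3, "dx_codes": ["DX-A"], "labs": {"LAB-X": [35.0]}, "meds": []},
]

case_ids = [p["id"] for p in patients if example.is_case(p)]
print(case_ids)
```

Keeping each criterion as a small named function is a design choice made for this sketch; it lets the same building blocks appear in more than one algorithm.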

EMR-based Phenotype Algorithms

• Iteratively refine case definitions through partial manual review to achieve ~PPV ≥ 95%

• For controls, exclude all potentially overlapping syndromes and possible matches; iteratively refine such that ~NPV ≥ 98%
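
The validation targets above can be checked with a short calculation after each round of manual review. The sketch below assumes the reviewers' findings are tallied as simple counts; the counts shown are invented for illustration and are not eMERGE results.

```python
def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: TP / (TP + FP) among algorithm-labeled cases."""
    reviewed = true_pos + false_pos
    if reviewed == 0:
        raise ValueError("no algorithm-labeled cases were reviewed")
    return true_pos / reviewed


def npv(true_neg: int, false_neg: int) -> float:
    """Negative predictive value: TN / (TN + FN) among algorithm-labeled controls."""
    reviewed = true_neg + false_neg
    if reviewed == 0:
        raise ValueError("no algorithm-labeled controls were reviewed")
    return true_neg / reviewed


# Targets taken from the slide: PPV of about 95% for cases, NPV of about 98% for controls.
PPV_TARGET = 0.95
NPV_TARGET = 0.98

# Hypothetical chart-review tallies for one refinement round.
case_ppv = ppv(true_pos=96, false_pos=4)
control_npv = npv(true_neg=48, false_neg=2)

cases_ok = case_ppv >= PPV_TARGET
controls_ok = control_npv >= NPV_TARGET

print(f"PPV = {case_ppv:.2f}, meets target: {cases_ok}")
print(f"NPV = {control_npv:.2f}, meets target: {controls_ok}")
```

In this invented round the case definition would pass while the control definition would go back for another refinement pass.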

Primary Phenotypes

Site                      Phenotype       Validation (PPV/NPV)

Group Health              Dementia        73% / 92%

Marshfield Clinic         Cataracts       98% / 98%

Marshfield Clinic         Low HDL         82% / 96%

Mayo Clinic               PAD             94% / 99%

Northwestern University   Type 2 DM       98% / 100%

Vanderbilt University     QRS Duration    97% / 100%

Supplemental Phenotypes

Site                      Phenotype              Validation (PPV/NPV)

Group Health              WBC                    *

Marshfield Clinic         Diabetic Retinopathy   80% / 98%

Mayo Clinic               RBC                    98% / 94%

Northwestern University   Lipids                 92% / 100%

Northwestern University   Height                 95% / 100%

Vanderbilt University     PheWAS                 *

* Not available at this time

Phenotype Reuse

• T2DM → Diabetic Retinopathy
– Identification of DM
– T2DM included T1DM for exclusion

• Low HDL → Lipids

Phenotype Reuse

[Figure panels: T2DM; Diabetic Retinopathy]

Iterative Refinement for Reuse

[Figure: two algorithms, "Condition - Subtype A" and "Condition - Subtype B", alongside separate "Condition", "Subtype A" and "Subtype B" components]
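
One reading of this slide is that logic shared by two subtype algorithms can be pulled out into a common "Condition" component, which each subtype then builds on. The sketch below illustrates that idea only; the record layout, the combinator names (all_of, negate), and the codes CONDITION, SUBTYPE-A and SUBTYPE-B are hypothetical and do not come from any eMERGE algorithm.

```python
from typing import Any, Callable, Dict

Patient = Dict[str, Any]
Criterion = Callable[[Patient], bool]


def all_of(*criteria: Criterion) -> Criterion:
    """Combine criteria so that every one of them must hold."""
    return lambda p: all(c(p) for c in criteria)


def negate(criterion: Criterion) -> Criterion:
    """Turn an inclusion test into an exclusion test."""
    return lambda p: not criterion(p)


def has_dx(code: str) -> Criterion:
    """Criterion: the (hypothetical) diagnosis code is present."""
    return lambda p: code in p.get("dx_codes", [])


# Shared component, written once and reused by both subtype algorithms.
has_condition = has_dx("CONDITION")

# Subtype-specific evidence, kept separate from the shared component.
subtype_a_evidence = has_dx("SUBTYPE-A")
subtype_b_evidence = has_dx("SUBTYPE-B")

# Subtype A reuses the Subtype B evidence as an exclusion, much as the
# T2DM algorithm identified T1DM in order to exclude it.
subtype_a = all_of(has_condition, subtype_a_evidence, negate(subtype_b_evidence))
subtype_b = all_of(has_condition, subtype_b_evidence)

patients = [
    {"id": 1, "dx_codes": ["CONDITION", "SUBTYPE-A"]},
    {"id": 2, "dx_codes": ["CONDITION", "SUBTYPE-B"]},
    {"id": 3, "dx_codes": ["CONDITION", "SUBTYPE-A", "SUBTYPE-B"]},
    {"id": 4, "dx_codes": ["SUBTYPE-A"]},
]

subtype_a_ids = [p["id"] for p in patients if subtype_a(p)]
subtype_b_ids = [p["id"] for p in patients if subtype_b(p)]
print(subtype_a_ids, subtype_b_ids)
```

With the shared piece factored out, a later phenotype that depends on the same condition can start from has_condition rather than re-deriving it.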

Formalizing Reuse

• Identified potential for reuse

• Leverage significant work

• Phenotypes available: www.gwas.org

• Limitations
– Site-specific implementations

Impressions

• Easy to do

• Fits with eMERGE goals

• Can fit retrospectively

• Prospective mindset

Thank You

Luke Rasmussen
