Strategic Health IT Advanced Research Projects (SHARP) Area 4: Secondary Use of EHR Data
Project 3: High-Throughput Phenotyping
Jyotishman Pathak, PhD
Assistant Professor of Biomedical Informatics
June 11, 2012
SHARPn High-Throughput Phenotyping
Project 3: Collaborators & Acknowledgments
• CDISC (Clinical Data Interchange Standards Consortium)
  • Rebecca Kush, Landen Bain
• Centerphase Solutions
  • Gary Lubin, Jeff Tarlowe
• Group Health Seattle
  • David Carrell
• Harvard University/MIT
  • Guergana Savova, Peter Szolovits
• Intermountain Healthcare/University of Utah
  • Susan Welch, Herman Post, Darin Wilcox, Peter Haug
• Mayo Clinic
  • Cory Endle, Rick Kiefer, Sahana Murthy, Gopu Shrestha, Dingcheng Li, Gyorgy Simon, Matt Durski, Craig Stancl, Kevin Peterson, Cui Tao, Lacey Hart, Erin Martin, Kent Bailey, Scott Tabor
©2012 MFMER | slide-2
Phenotyping is still a bottleneck…
[Image from Wikipedia]
EHR systems: United States 2002—2011
[Millwood et al. 2012]
Electronic health records (EHRs) driven phenotyping
• EHRs are becoming more and more prevalent within the U.S. healthcare system
  • Meaningful Use is one of the major drivers
• Overarching goal
  • To develop high-throughput automated techniques and algorithms that operate on normalized EHR data to identify cohorts of potentially eligible subjects on the basis of disease, symptoms, or related findings
EHR-driven Phenotyping Algorithms - I
• Typical components
  • Billing and diagnosis codes
  • Procedure codes
  • Labs
  • Medications
  • Phenotype-specific covariates (e.g., demographics, vitals, smoking status, CASI scores)
  • Pathology
  • Imaging?
• Organized into inclusion and exclusion criteria
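The idea of organizing components into inclusion and exclusion criteria can be sketched in a few lines of Python. This is only an illustration, not any of the eMERGE or SHARPn algorithms: the Patient fields, the helper names (has_dx, on_med, in_cohort), and the specific codes and drug names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Patient:
    """Hypothetical, heavily simplified patient record."""
    patient_id: str
    diagnosis_codes: Set[str] = field(default_factory=set)
    medications: Set[str] = field(default_factory=set)


# A criterion is any predicate over a patient record.
Criterion = Callable[[Patient], bool]


def has_dx(codes: Set[str]) -> Criterion:
    """True when the patient carries at least one of the given codes."""
    return lambda p: bool(p.diagnosis_codes & codes)


def on_med(names: Set[str]) -> Criterion:
    """True when the patient is on at least one of the given medications."""
    return lambda p: bool(p.medications & names)


def in_cohort(p: Patient, inclusion: List[Criterion], exclusion: List[Criterion]) -> bool:
    """A patient qualifies when every inclusion criterion holds and no exclusion criterion holds."""
    return all(c(p) for c in inclusion) and not any(c(p) for c in exclusion)


# Illustrative criteria only; codes and drug names are placeholders for the sketch.
inclusion = [has_dx({"244.9"}), on_med({"levothyroxine"})]
exclusion = [on_med({"lithium", "phenytoin"})]

patients = [
    Patient("p1", {"244.9"}, {"levothyroxine"}),
    Patient("p2", {"244.9"}, {"levothyroxine", "lithium"}),
    Patient("p3", {"250.00"}, {"metformin"}),
]

cohort = [p.patient_id for p in patients if in_cohort(p, inclusion, exclusion)]
print(cohort)
```

Keeping each criterion as a separate predicate makes it easy to add lab, procedure, or NLP-derived criteria without changing the cohort logic.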
EHR-driven Phenotyping Algorithms - II
[Figure: workflow diagram with stages Data, Transform, Transform, Phenotype Algorithm, Visualization, Evaluation; labels: NLP, SQL; Rules; Mappings]
[eMERGE Network]
Example: Hypothyroidism Algorithm
[Figure: flowchart leading to Case 1, Case 2, and Control; box labels as extracted]
• No secondary causes (e.g., pregnancy, ablation)
• No ICD-9s for hypothyroidism
• No abnormal TSH/FT4
• No antibodies for TTG/TPO
• ICD-9s for hypothyroidism
• Antibodies for TTG or TPO (anti-thyroglobulin, anti-thyroperoxidase)
• Abnormal TSH/FT4
• No thyroid-altering medications (e.g., phenytoin, lithium)
• Thyroid replacement meds
• No thyroid replacement meds
• 2+ non-acute visits in 3 yrs
• No hx of myasthenia gravis
[Denny et al., 2012]
Hypothyroidism Algorithm: Validation
Positive Predictive Values (PPV) Based on Chart Review – All Sites

Site          EHR-based        Sampled for chart review   Old case   New case
              cases/controls   cases/controls             PPV (%)    PPV (%)
Group Health  430/1,188        50/50                      92         98
Marshfield    509/1,193        50/50                      88         91
Mayo Clinic   250/2,145        100/100                    76         97
Northwestern  103/516          50/50                      88         98
Vanderbilt    184/1,344        50/50                      90         98
All sites     1,421/6,362      -                          87         96

[Denny et al., 2012]
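For reference, PPV from a chart review is the fraction of algorithm-identified cases that the reviewers confirm: PPV = TP / (TP + FP). A minimal Python sketch follows. The function names are assumptions, and the counts in the example are hypothetical (the slide reports only percentages, not the confirmed counts).

```python
def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: confirmed cases / all algorithm-identified cases reviewed."""
    reviewed = true_positives + false_positives
    if reviewed == 0:
        raise ValueError("no reviewed cases")
    return true_positives / reviewed


def npv(true_negatives: int, false_negatives: int) -> float:
    """Negative predictive value: confirmed controls / all algorithm-identified controls reviewed."""
    reviewed = true_negatives + false_negatives
    if reviewed == 0:
        raise ValueError("no reviewed controls")
    return true_negatives / reviewed


# Hypothetical counts: if 49 of 50 sampled cases were confirmed on chart review.
example_ppv = ppv(true_positives=49, false_positives=1)
print(f"PPV = {example_ppv:.0%}")
```

With samples of 50 charts, each misclassified chart moves the PPV by two percentage points, which is worth keeping in mind when comparing sites.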
Data Categories used to define the EHR-driven Phenotyping Algorithms

Phenotype | Clinical gold standard | EHR-derived phenotype | Phenotype definitions | Validation (PPV/NPV)
Alzheimer’s Dementia | Demographics, clinical examination of mental status, histopathologic examination | Diagnoses, medications | Demographics, laboratory tests, radiology reports | 73%
Cataracts | Clinical exam finding (ophthalmologic examination) | Diagnoses, procedure codes | Demographics, medications | 98%/98%
Peripheral Arterial Disease | Radiology test results (ankle-brachial index or arteriography) | Diagnoses, procedure codes, medications, radiology test results | Demographics | 94%/99%
Type 2 Diabetes | Laboratory tests | Diagnoses, laboratory tests, medications | Demographics, height, weight, family history | 98%/100%
Cardiac Conduction | ECG measurements | ECG report results | Demographics, diagnoses, procedure codes, medications, laboratory tests | 97%

[eMERGE Network]
Genotype-Phenotype Association Results
[Figure: forest plot of odds ratios (log scale, 0.5 to 5.0), observed vs. published, by disease, marker, and gene/region]
Diseases: Atrial fibrillation, Crohn's disease, Multiple sclerosis, Rheumatoid arthritis, Type 2 diabetes
Markers (gene/region): rs2200733 (Chr. 4q25), rs10033464 (Chr. 4q25), rs11805303 (IL23R), rs17234657 (Chr. 5), rs1000113 (Chr. 5), rs17221417 (NOD2), rs2542151 (PTPN22), rs3135388 (DRB1*1501), rs2104286 (IL2RA), rs6897932 (IL7RA), rs6457617 (Chr. 6), rs6679677 (RSBN1), rs2476601 (PTPN22), rs4506565 (TCF7L2), rs12255372 (TCF7L2), rs12243326 (TCF7L2), rs10811661 (CDKN2B), rs8050136 (FTO), rs5219 (KCNJ11), rs5215 (KCNJ11), rs4402960 (IGF2BP2)
[Ritchie et al. 2010]
Key lessons learned from eMERGE
• Algorithm design and transportability
  • Non-trivial; requires significant expert involvement
  • Highly iterative process
  • Time-consuming manual chart reviews
  • Representation of “phenotype logic” for transportability is critical
• Standardized data access and representation
  • Importance of unified vocabularies, data elements, and value sets
  • Questionable reliability of ICD & CPT codes (e.g., billing the wrong code since it is easier to find)
  • Natural Language Processing (NLP) is critical
Algorithm Development Process - Modified
[Figure: workflow diagram with stages Data, Transform, Transform, Phenotype Algorithm, Visualization, Evaluation; labels: NLP, SQL; Rules; Mappings; Semi-Automatic Execution]
[eMERGE Network]
• Standardized representation of clinical data
• Create new and re-use existing clinical element models (CEMs)
• Standardized and structured representation of phenotype definition criteria
• Use the NQF Quality Data Model (QDM)
• Conversion of structured phenotype criteria into executable queries
• Use JBoss® Drools (DRLs)
[Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012]
The SHARPn “phenotyping funnel”
[Figure: funnel from source EHRs (Intermountain EHR, Mayo Clinic EHR) through CEMs, QDMs, and DRLs to phenotype-specific patient cohorts]
[Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012]
Clinical Element Models: Higher-Order Structured Representations
[Stan Huff, IHC]
Pre- and Post-Coordination
[Stan Huff, IHC]
[Stan Huff, IHC]
CEMs available for patient demographics, medications, lab measurements, procedures etc.
SHARPn data normalization flow - I
CEM MySQL database with normalized patient information
[Welch et al. 2012]
SHARPn data normalization flow - II
CEM MySQL database with normalized patient information
Our task: from human readable to machine computable
[Thompson et al., submitted 2012]
NQF Quality Data Model (QDM)
• Standard of the National Quality Forum (NQF)
  • A structure and grammar to represent quality measures in a standardized format
• Groups of codes in a code set (ICD-9, etc.)
  • "Diagnosis, Active: steroid induced diabetes" using "steroid induced diabetes Value Set GROUPING (2.16.840.1.113883.3.464.0001.113)"
• Supports temporality & sequences
  • AND: "Procedure, Performed: eye exam" > 1 year(s) starts before or during "Measurement end date"
• Implemented as set of XML schemas
  • Links to standardized terminologies (ICD-9, ICD-10, SNOMED-CT, CPT-4, LOINC, RxNorm etc.)
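A rough Python sketch of the value set idea: a criterion such as "Diagnosis, Active" is satisfied when an event of that data type carries a code belonging to the named value set. This is not the QDM XML schema or its timing semantics. The class names, the select function, and the member codes (EXAMPLE-001 and so on) are placeholders assumed for illustration; only the OID and value set name come from the slide, and the date check is a deliberately simplified stand-in for QDM temporal operators.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ValueSet:
    """A named group of codes, identified by an OID."""
    oid: str
    name: str
    codes: FrozenSet[str]


@dataclass(frozen=True)
class ClinicalEvent:
    """Hypothetical normalized event: a QDM-style data type, a code, and a date."""
    data_type: str
    code: str
    on: date


def select(events: Iterable[ClinicalEvent], data_type: str,
           value_set: ValueSet, measurement_end: date) -> List[ClinicalEvent]:
    """Events of the given data type whose code is in the value set and that
    occurred on or before the measurement end date (simplified timing)."""
    return [
        e for e in events
        if e.data_type == data_type
        and e.code in value_set.codes
        and e.on <= measurement_end
    ]


steroid_dm = ValueSet(
    oid="2.16.840.1.113883.3.464.0001.113",
    name="steroid induced diabetes",
    codes=frozenset({"EXAMPLE-001", "EXAMPLE-002"}),  # placeholder member codes
)

events = [
    ClinicalEvent("Diagnosis, Active", "EXAMPLE-001", date(2011, 3, 1)),
    ClinicalEvent("Diagnosis, Active", "OTHER-999", date(2011, 4, 1)),
    ClinicalEvent("Procedure, Performed", "EXAMPLE-001", date(2011, 5, 1)),
    ClinicalEvent("Diagnosis, Active", "EXAMPLE-002", date(2012, 6, 1)),
]

hits = select(events, "Diagnosis, Active", steroid_dm, date(2011, 12, 31))
print([(e.code, e.on.isoformat()) for e in hits])
```

Separating the value set (which codes count) from the criterion (which data type and timing) is what lets the same phenotype logic be reused across sites with different local code sets.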
116 Meaningful Use Phase I Quality Measures
Example: Diabetes & Lipid Mgmt. - I
Human readable HTML
Example: Diabetes & Lipid Mgmt. - II
Computable XML
NQF Measure Authoring Tool (MAT)
JBoss® open-source Drools rules based management system (RBMS)
• Represents knowledge with declarative production rules
  • Origins in artificial intelligence expert systems
  • Simple when <pattern> then <action> rules specified in text files
• Separation of data and logic into separate components
• Forward chaining inference model (Rete algorithm)
• Domain specific languages (DSL)
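To make the "when <pattern> then <action>" and forward-chaining ideas concrete, here is a toy Python sketch. It is a naive fixed-point loop, not the Rete algorithm and not the Drools API. The Rule class, the forward_chain function, and the fact names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Rule:
    """A production rule: when the pattern matches the facts, assert new facts."""
    name: str
    when: Callable[[Set[str]], bool]
    then: FrozenSet[str]


def forward_chain(initial_facts: Iterable[str], rules: List[Rule]) -> Set[str]:
    """Fire rules repeatedly until no rule adds a new fact (naive forward chaining)."""
    facts = set(initial_facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.when(facts):
                new_facts = rule.then - facts
                if new_facts:
                    facts |= new_facts
                    changed = True
    return facts


# Hypothetical rules; the second fires only because the first asserted a new fact.
rules = [
    Rule(
        name="Glucose <= 40, Insulin On",
        when=lambda f: {"glucose_le_40", "insulin_drip_on"} <= f,
        then=frozenset({"hypoglycemia_alert"}),
    ),
    Rule(
        name="Alert implies notification",
        when=lambda f: "hypoglycemia_alert" in f,
        then=frozenset({"notify_clinician"}),
    ),
]

derived = forward_chain({"glucose_le_40", "insulin_drip_on"}, rules)
print(sorted(derived))
```

The rules live in a list that is separate from the facts they operate on, which mirrors the separation of data and logic noted above. A production engine avoids re-evaluating every rule on every pass, which is the problem Rete addresses.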
Example Drools rule
rule "Glucose <= 40, Insulin On"
when
    $msg : GlucoseMsg(glucoseFinding <= 40, currentInsulinDrip > 0)
then
    glucoseProtocolResult.setInstruction(GlucoseInstructions.GLUCOSE_LESS_THAN_40_INSULIN_ON_MSG);
end

Callouts:
• {Rule Name}: "Glucose <= 40, Insulin On"
• {binding}: $msg
• {Java Class}: GlucoseMsg
• {Class Getter Method}: glucoseFinding, currentInsulinDrip
• {Class Setter Method}: setInstruction
• Parameter {Java Class}: GlucoseInstructions
Automatic translation from NQF QDM criteria to Drools
From non-executable to executable
[Figure: pipeline from the Measure Authoring Toolkit to the Drools Engine]
• Measure Authoring Toolkit outputs
  • Data types: XML-based structured representation
  • Value sets: saved in XLS files
  • Measures: XML-based structured representation
• Mapping data types and value sets → Fact models
• Converting measures to Drools scripts → Drools scripts
• Drools Engine
[Li et al., submitted 2012]
Automatic translation from NQF QDM criteria to Drools
[Li et al., submitted 2012]
The “executable” Drools flow
Phenotype library and workbench - I
1. Convert QDM to Drools
2. Execute rules by querying the CEM database
3. Generate summary reports
http://phenotypeportal.org
Phenotype library and workbench - II
http://phenotypeportal.org
Phenotype library and workbench - III
http://phenotypeportal.org
Phenotype library and workbench - IV
Additional on-going research efforts - I
• Machine learning and association rule mining
  • Manual creation of algorithms takes time
  • Let computers do the “hard work”
  • Validate against expert-developed ones
[Carroll et al. 2011]
Additional on-going research efforts - I
• Origins from sales data
• Items (columns): co-morbid conditions
• Transactions (rows): patients
• Itemsets: sets of co-morbid conditions
• Goal: find all itemsets (sets of conditions) that frequently co-occur in patients.
  • One of those conditions should be DM.
• Support: # of transactions the itemset I appeared in
  • Support({TB, DLM, ND})=3
• Frequent: an itemset I is frequent, if support(I)>minsup

[Table: patient-by-condition matrix; the column position of each Y mark was lost in extraction]
Patient  TB  DLM  ND  …  IEC
001      Y Y Y Y
002      Y Y Y Y
003      Y Y
004      Y
005      Y Y Y
X: infrequent
[Simon et al. 2012]
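The support and frequent-itemset definitions above can be sketched in Python as follows. This is a brute-force enumeration for illustration, not the mining method used by Simon et al. The function names and the must_contain parameter are assumptions, and the patient-to-condition assignments are hypothetical (the slide's table lost its column alignment), chosen only so that Support({TB, DLM, ND}) = 3 as stated.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Set


def support(itemset: Set[str], transactions: Dict[str, Set[str]]) -> int:
    """Number of transactions (patients) that contain every item in the itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)


def frequent_itemsets(transactions: Dict[str, Set[str]], minsup: int,
                      must_contain: Optional[str] = None) -> Dict[FrozenSet[str], int]:
    """All itemsets with support(I) > minsup, optionally restricted to those
    containing a given condition. Brute-force enumeration by itemset size."""
    all_items = sorted(set().union(*transactions.values()))
    result: Dict[FrozenSet[str], int] = {}
    for size in range(1, len(all_items) + 1):
        any_frequent = False
        for combo in combinations(all_items, size):
            itemset = frozenset(combo)
            count = support(itemset, transactions)
            if count > minsup:
                any_frequent = True
                if must_contain is None or must_contain in itemset:
                    result[itemset] = count
        # If no itemset of this size is frequent, no larger one can be.
        if not any_frequent:
            break
    return result


# Hypothetical patient data (rows: patients, items: co-morbid conditions).
transactions = {
    "001": {"TB", "DLM", "ND", "IEC"},
    "002": {"TB", "DLM", "ND", "IEC"},
    "003": {"TB", "IEC"},
    "004": {"DLM"},
    "005": {"TB", "DLM", "ND"},
}

triple_support = support({"TB", "DLM", "ND"}, transactions)
frequent = frequent_itemsets(transactions, minsup=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Passing must_contain restricts the output to itemsets that include one target condition, matching the slide's requirement that one of the conditions be DM. Enumeration is exponential in the number of conditions, so real data needs a proper miner such as Apriori or FP-growth.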
Additional on-going research efforts - II
TRALI/TACO sniffer
Active Surveillance for TRALI and TACO
Of the 88 TRALI cases correctly identified by the CART algorithm, only 11 (12.5%) of these were reported to the blood bank by the clinical service.
Of the 45 TACO cases correctly identified by the CART algorithm, only 5 (11.1%) were reported to the blood bank by the clinical service.
Additional on-going research efforts - III
• Phenome-wide association scan (PheWAS)
  • Do a “reverse GWAS” using EHR data
  • Facilitate hypothesis generation
[Pathak et al. submitted 2012]
Publications to date (conservative)
[Figure: bar chart of papers, abstracts, and manuscripts under review for Year 1 (2011), Year 2 (2012), and Year 3 (2013); y-axis 0 to 14; bar labels as extracted: 8, 66, 2, 12]
Mayo projects and collaborations
• Ongoing
  • Transfusion related acute lung injury (Kor)
  • Drug induced liver injury (Talwalkar)
  • Drug induced thrombocytopenia and neutropenia (Al-Kali)
  • Active surveillance for celiac disease (Murray)
  • Warfarin dose response & heart valve replacements (Pereira)
  • Phenotype definition standardization (HCPR/Quality)
• Getting started/planning
  • Pharmacogenomics of systolic heart failure (Bielinski/Pereira)
  • Pharmacogenomics of SSRI (Mrazek/Weinshilboum)
  • Lumbar image reporting with epidemiology (Kallmes)
  • Active clinical trial alerting (CTMS/Cancer Center)
HTP related presentations
• June 11th, 2012
  • Using EHRs for clinical research (Vitaly Herasevich)
  • Association rule mining and T2D risk prediction (Gyorgy Simon)
  • Scenario-based requirements engineering for developing EHR add-ons to support CER in patient care settings (Junfeng Gao)
• June 12th, 2012
  • Exploring patient data in context clinical research studies: Research Data Explorer (Adam Wilcox et al.)
  • Utilizing previous result sets as criteria for new queries with FURTHeR (Dustin Schultz et al.)
  • Semantic search engine for clinical trials (Yugyung Lee)
  • Knowledge-driven workbench for predictive modeling (Peter Haug et al.)
  • Clinical analytics driven care coordination for 30-day readmission – Demonstration from 360 Fresh.com (Ramesh Sairamesh)
Thank You!