Upload
tomasz-adamusiak
View
705
Download
5
Tags:
Embed Size (px)
DESCRIPTION
Presented at 2014 AMIA Joint Summits, April 9, 2014, San Francisco, CA BACKGROUND. Pancreatic cancer is one of the most common causes of cancer-related deaths in the United States, it is difficult to detect early and typically has a very poor prognosis. We present a novel method of large-scale clinical hypothesis generation based on phenome wide association study performed using Electronic Health Records (EHR) in a pancreatic cancer cohort. METHODS. The study population consisted of 1,154 patients diagnosed with malignant neoplasm of pancreas seen at The Froedtert & The Medical College of Wisconsin academic medical center between the years 2004 and 2013. We evaluated death of a patient as the primary clinical outcome and tested its association with the phenome, which consisted of over 2.5 million structured clinical observations extracted out of the EHR including labs, medications, phenotypes, diseases and procedures. The individual observations were encoded in the EHR using 6,617 unique ICD-9, CPT-4, LOINC, and RxNorm codes. We remapped this initial code set into UMLS concepts and then hierarchically expanded to support generalization into the final set of 10,164 clinical concepts, which formed the final phenome. We then tested all possible pairwise associations between any of the original 10,164 concepts and death as the primary outcome. RESULTS. After correcting for multiple testing and folding back (generalizing) child concepts were appropriate, we found 231 concepts to be significantly associated with death in the study population. CONCLUSIONS. With the abundance of structured EHR data, phenome wide association studies combined with knowledge engineering can be a viable method of rapid hypothesis generation.
Citation preview
EHR-based Phenome Wide Association Study
in Pancreatic Cancer
Tomasz Adamusiak MD PhD
@7omasz
Conflict of interest disclosure
Tomasz Adamusiak has no real or apparent conflicts of interest to report
2
Learning Objectives
• Recognize the value of structured clinical information
• Identify computational and terminology challenges in big data analytics
3
phe·no·type
n. Clinical Informatics
all clinically relevant features contained in patient’s electronic health record
4
Phenome
5
0% 10% 20% 30% 40% 50% 60% 70%
Other
Labs
Medications
Procedures
Problems
Concepts (7k)
Observations (2M)
Re-transform data on demand
6
Focused on discrete data elements
Categorical variables:
• Ethnicity
• Problems
• Procedures
• Medications
• Clinical results
– laboratory tests
– vitals
7
Meaningful Use context
Categorical variables:
• Ethnicity
• Problems
• Procedures
• Medications
• Clinical results
– laboratory tests
– vitals
Clinical terminologies:
• OMBSNOMED CT
• ICD9/10
• HCPCS/CPT-4
• Medi-SpanRxNorm
• CPT-4
• LOINC
8
Pancreatic cancer has an extremely poor prognosis
• Survival: For all stages combined,
• 1-year relative survival rates 25%
• 5-year relative survival rates 6%
Source: http://www.cancer.org/acs/groups/content/@nho/documents/document/acspc-024113.pdf
9
Test all associations with pancreatic cancer and death as primary outcome
10
1298 patients
2 359 265 observations
2004 - 2013
ICD 9/10, SNOMED CT, CPT-4, LOINC,
RxNorm
6 617 codes
10 164 concepts 231 significant
associations
Single terminology
11
Add other terminologies to the mix
12
Use relations other than subsumption (non-isa)
13
Use relations other than subsumption (non-isa) to increase statistical power
14
Histamine H2 Antagonists
Cimetidine
Cimetidine 300 MG
Cimetidine 300 MG Oral Tablet
Cimetidine 400 MG
Cimetidine 400 MG Oral Tablet
constitutes
ingredient_of
isa RxNorm
Use meta-categorization (UMLS Semantic Network)
15
Not all codes are created equal
16
Expansion in UMLS across MU sources
17
Diabetes mellitus without mention of complication,
type II or unspecified type, not stated as
uncontrolled
ICD-9
ICD-10
SNOMED CT
NDF-RT
Situation with explicit
context
Metabolic diseases
roots:
6o of terminological Kevin Bacon
Acute myocardial infarction
Myocardial ischemia
Vascular Diseases
Disorder of soft tissue
Collagen Diseases
Connective Tissue Diseases
Epidermal and dermal conditions
Skin and subcutaneous tissue disorders
Dermatologic disorders
18
UMLS is ideal for integration of heterogeneous clinical data
• Translational potential (OMIM, GO, NCIt)
• Single entry point to MU terminologies
• Cross-walk between MU terms
• Terminology-agnostic
• Text-mining
19
Extracting genetic information out of EHR is a major challenge
Encounter due to genetic counseling
Yes No
Outcome Deceased 2 813
Alive 3 336
20
Background reference
Methods: • Chi-squared test • Bonferroni correction • RR estimate of effect size
Statistically significant highlights
Decreased Risk (RR < 1)
• sevoflurane Inhalant Solution
• Ionic iodinated contrast media
Increased Risk (RR > 1)
• cytopathology
• cimetidine
21
Resource utilization
CORRELATION DOES NOT IMPLY CAUSATION
Private traits and attributes are predictable from digital records of human behavior. Kosinski M1, Stillwell D, Graepel T. PMID: 23479631
22 By Jono Winn (Flickr) [CC-BY-2.0], via Wikimedia Commons
Future work: cohort profiles
1. Malignant neoplasm of pancreas (C0346647) 2. Digestive System Neoplasms (C0012243) 3. Glucose test, blood by glucose monitoring device(s) cleared by the FDA
specifically for home use (C0373627) 4. Hepatic function panel This panel must include the following: Albumin (82040)
Bilirubin, total (82247) Bilirubin, direct (82248) Phosphatase, alkaline (84075) Protein, total (84155) Transferase, alanine amino (ALT) (SGPT) (84460) Transferase, aspartate amino (AST) (SGOT) (C0812554)
5. Basic metabolic panel (Calcium, total) This panel must include the following: Calcium, total (82310) Carbon dioxide (bicarbonate) (82374) Chloride (82435) Creatinine (82565) Glucose (82947) Potassium (84132) Sodium (84295) Urea nitrogen (BUN) (84520) (C0519823)
6. Regular Insulin, Human 100 UNT/ML Injectable Solution (C0977794) 7. heparin sodium, porcine 10 UNT/ML Injectable Solution (C0977415) 8. Pancreatic Diseases (C0030286) 9. Dexamethasone 4 MG/ML Injectable Solution (C0976136) 10. Sodium Chloride 0.154 MEQ/ML Injectable Solution (C0980221)
23
Limitations
• Gaps in data
– Out of network
– Provider-related
– Terminology-related
24
Thank you
Co-authors:
Mary Shimoyama, PhD
@7omasz
25
Results
http://dx.doi.org/10.6084/m9.figshare.816958
For more background information
Next-generation phenotyping using the Unified Medical Language System (UMLS). Adamusiak T, Shimoyama N, Shimoyama M, JMIR Med Inform. doi:10.2196/medinform.3172
Acknowledgements
We thank Stacy Zacher, Glenn Bushee, and Bradley Taylor for their help.
This project was funded in part by the Advancing a Healthier Wisconsin endowment at the Medical College of Wisconsin and the National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health, through grant UL1 RR031973.