19
Rapid Identification of Pneumonias in BioSense Data Using Radiology Text Reports Armen Asatryan, MD, MPH, 1 Haobo Ma, MD, MS, 1 Roseanne English, BS, 2 Jerome Tokars, MD, MPH 2 1 Science Applications International Corporation 2 Centers for Disease Control and Prevention Atlanta, GA The findings and conclusions in this presentation are those of the author(s) and do not necessarily represent the views of the Centers for Disease Control and Prevention and/or Science Applications International Corporation

Rapid Identification of Pneumonias in BioSense Data Using

  • Upload
    jared56

  • View
    794

  • Download
    1

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: Rapid Identification of Pneumonias in BioSense Data Using

Rapid Identification of Pneumonias in BioSense Data Using Radiology Text Reports

Armen Asatryan, MD, MPH,1 Haobo Ma, MD, MS,1 Roseanne English, BS,2 Jerome Tokars, MD, MPH2

1Science Applications International Corporation 2Centers for Disease Control and Prevention

Atlanta, GA

The findings and conclusions in this presentation are those of the author(s) and do not necessarily represent the views of the

Centers for Disease Control and Prevention and/or Science Applications International Corporation

Page 2: Rapid Identification of Pneumonias in BioSense Data Using

Background -- BioSense

• BioSense started in 2003 as CDC’s electronic biosurveillance system

• Purposes: early detection and situational awareness • Data sources: DoD Medical Treatment Facilities, VA

Medical Centers, LabCorp, non-federal acute care hospitals

• Currently, 371 acute care hospitals send ≥1 type of data to BioSense, 42 send radiology text reports

• Available data: facility, patient disposition, visit dates, patient demographics, microbiology lab order/results, radiology orders/results, hospital pharmacy orders, chief complaint, ICD-9 codes, vital signs (temp, BP) in the ER

Page 3: Rapid Identification of Pneumonias in BioSense Data Using

Background -- Pneumonia

• Pneumonia is a major contributor to morbidity and mortality– 1.4 M pneumonia hospital discharges in the US in

2003 (American Lung Association, www.lungusa.org)– Pneumonia and influenza are 7th leading cause of

death in the US– Total cost to the US economy estimated at $37.5

billion in 2004 (ibid)• Early detection of pneumonia is important as

several Category A bioterrorism agent-related diseases as well as influenza can present as pneumonia

Page 4: Rapid Identification of Pneumonias in BioSense Data Using

Previous Research: Administrative Data for Pneumonia Detection

• ICD-9 code- and DRG-based algorithms have Sn 48-66%, Sp >99%, and PPV 73-81% for identifying pneumonia when compared to clinical (physician) reference (Aronsky et al, 2005).

• Bayesian network based algorithm for identifying pneumonia (ICD-9 code as a reference): AUC 0.93 (CI 0.907, 0.948), at fixed Sn 95%, Sp was 69%, PPV 7.3%, NPV 99.8% (Aronsky et al, 2000).

Page 5: Rapid Identification of Pneumonias in BioSense Data Using

Objective

• Objective of study: find keywords in radiology reports that are most highly associated with pneumonia diagnosis

• Radiology text reports may be available with 1-3 days, whereas final diagnosis may take 1-3 weeks– In our data set, radiology text reports were available,

on average, 9 days earlier than ICD-9 discharge codes

Page 6: Rapid Identification of Pneumonias in BioSense Data Using

Methods• Studied radiology text reports from 13 hospitals sending

data to BioSense• Study period: 2/2006 through 1/2007• Included inpatient, outpatient, and ER patients• Keyword-based text parsing SAS® program using

RegEx and accounting for negations and double negations (e.g. pneumonia cannot be excluded)

• Search terms: airspace, consolidation, density, infiltrate, opacity, and pneumonia/pneumonitis

• Logistic regression to evaluate the independent association of the keywords with a final diagnosis of pneumonia (ICD-9 codes 480--486).

Page 7: Rapid Identification of Pneumonias in BioSense Data Using

Example of a CXR showing pneumonia

Page 8: Rapid Identification of Pneumonias in BioSense Data Using

Example of a text radiology report with keywords underlined

• AP and lateral chest film on 01/01/2007 at 12:00 AM. No comparison films available. Clinical history: 58 years old black female with cough and fever. Findings: Right lower lobe infiltrate, consistent with pneumonia. Left lung, heart, mediastinum, and pulmonary vasculature unremarkable. No pneumothorax. No pleural effusion. Impression: Right lower lobe pneumonia.

Page 9: Rapid Identification of Pneumonias in BioSense Data Using

Validation of Text Parsing• Sample dataset: 400 reports (300 with any

of the keywords, including negations, and 100 without)

• Gold-standard for text parsing validation: manual review of the radiology reports

• Manual review for the search keywords: “pneumonia” 77, others 63, none 260

• Text parsing: sensitivity 98.5%, specificity 98.6%

Page 10: Rapid Identification of Pneumonias in BioSense Data Using

Results• 67,714 reports of chest X-rays performed

for 42,510 hospital visits of 34,883 patients

• Male : female ratio = 1 : 1

• Age– Range: <1 day (newborn) to 105 years old– Mean: 55 years old

Page 11: Rapid Identification of Pneumonias in BioSense Data Using

Results• Among all 67,714 radiology reports

– ICD-9 Dx of Pneumonia: 8,631 (12.8%)– Keywords found (some reports had > 1

keyword)• Opacity 22.1%• Infiltrate 15%• Density 8.2%• Pneumonia 6%• Consolidation 5.5%• Airspace disease 1.1%

Page 12: Rapid Identification of Pneumonias in BioSense Data Using

Logistic Regression Model to Evaluate Keywords

• Reports may have >1 keyword in many combinations:– Right lower lobe infiltrate, consistent with pneumonia– Right lower lobe density, suggestive of airspace

disease or mass

• Model shows independent association between each keyword and final diagnosis

• Model contained 6 dichotomous indicator variables (i.e., 0=keyword not present in report, 1=keyword present)

Page 13: Rapid Identification of Pneumonias in BioSense Data Using

Logistic Regression ModelPredictors of Final Diagnosis of Pneumonia

Keyword

Odds Ratio* 95% Confidence Interval

Infiltrate 3.5 3.3-3.6

Pneumonia 3.0 2.7-3.2

Airspace 2.7 2.3-3.3

Consolidation 2.7 2.5-2.9

Opacity 2.0 1.9-2.1

Density 1.2 1.1-1.3

* Reference group is no mention of the keyword, i.e., reports mentioning “infiltrate” are 3.5 times more likely to have a final ICD-9 diagnosis of pneumonia than reports without this keyword

Page 14: Rapid Identification of Pneumonias in BioSense Data Using

Keyword Index

Pneumonia diagnosis

No pneumonia diagnosis

One or more keyword

6,186 18,180

No keywords 2,445 40,903

All 8,631 59,083

Page 15: Rapid Identification of Pneumonias in BioSense Data Using

Results: Keyword Index

• One or more of the 5 keywords was found in– 6,186/8,631 (71.7%) of reports of patients

with a pneumonia diagnosis– 18,180/59,083 (30.8%) of patients without a

pneumonia diagnosis– Sensitivity 72%, specificity 69%, kappa=0.23

Page 16: Rapid Identification of Pneumonias in BioSense Data Using

Limitations / Strengths

• Limitations– Study included a small number of facilities in a limited

geographical area– No formal data verification performed– Based on ICD-9 coded diagnosis, rather than formal

clinical case definition– Variations in readings between radiologists (style,

preferences, quality)• Strengths

– Used empirical process to determine keywords associated with pneumonia

– Used clinically rich data

Page 17: Rapid Identification of Pneumonias in BioSense Data Using

Discussion• Computerized text parsing can successfully identify

keywords in radiology text reports• Five keywords (infiltrate, pneumonia, airspace,

consolidation, and opacity) were independently associated with a final ICD-9 coded diagnosis of pneumonia

• Reasonable sensitivity for the 5 keywords and ICD-9 Dx, but low kappa. Reasons:– Inadequacy of coded diagnoses– Clinical Dx in the presence of a negative CXR– Either ICD-9 code or CXR may indicate pneumonia, but neither

is definitive– Possible missing CXRs

Page 18: Rapid Identification of Pneumonias in BioSense Data Using

Radiology Text Report Processed by Natural Language Processing

• History Section – HPI AP PORTABLE CHEST AT 1100 HOURS HISTORY:

Shortness of breath. • Diagnostic Tests Section

– There are chronic and degenerative changes. There is marked chronic lung disease. Extensive pulmonary opacity is seen in the right mid and right lower lung zone compatible with pneumonia. This is superimposed on diffuse chronic lung disease. A focal area of scarring and/ or discoid atelectasis is seen at the medial left lung base. A central line enters from the right and terminates in the superior vena cava.

• Assessment Section – IMPRESSION: Marked chronic lung disease. Interval

development of a large right mid and right lower lung zone opacity compatible with pneumonia.

Page 19: Rapid Identification of Pneumonias in BioSense Data Using

Future Steps

• Study additional facilities• Evaluate radiology text keywords using a formal

case definition along with chart review as the gold standard

• Evaluate the value of natural language processors in finding additional keywords and implementing more sophisticated concepts such as degrees of certainty, presence of comorbidities, and earliness of detection