Real-World Performance of Deep-Learning-Based System for Intracranial Hemorrhage Detection · 2018. 9. 26. · #CMIMI18

Real-World Performance of Deep-Learning-Based System for Intracranial Hemorrhage Detection

Sehyo Yune, MD MPH MBA

Hyunkwang Lee, Stuart Pomerantz, Javier Romero, Shahmir Kamalian, Ramon Gonzalez, Michael Lev, Synho Do

Department of Radiology, Massachusetts General Hospital

Laboratory of Medical Imaging and Computation


Promise vs. Reality


Intracranial Hemorrhage Detection System

SAH 99.99%

IPH 0.24%

SDH 0.08%

EDH 0.06%

IVH 0.01%

Data Collection

Searched the institutional research database for non-contrast head CT scans acquired February 2005 - August 2017
Only used 5-mm axial images
Exclusion criteria:
  History of brain surgery
  Intracranial tumor
  Intracranial device placement
  Skull fracture
  Cerebral infarct

Data Collection

Data annotation
  Development set: consensus of 5 neuroradiologists at slice level
  Test set: confirmation of clinical report by 1 neuroradiologist at case level

           Development dataset          Test dataset
           # of cases   # of images     # of cases   # of images
ICH (+)    625          5,240           100          3,525
ICH (-)    279          9,518           100          3,411
Total      904          14,758          200          6,936

Performance of ICH Detection System

ROC curve for detection of ICH

AUC 0.993; 98% sensitivity; 95% specificity

Dots denote radiologist performance: 1st-year resident, 2nd-year resident, 3rd-year resident, attending radiologist (9 years of experience), attending radiologist (20 years of experience)

Will This Work in the Real World?

Real-world data collection

All consecutive cases of non-contrast head CT acquired at the emergency department from September – November 2017
2,606 cases collected
Labeled by using natural language processing of clinical reports
163 ICH (+), 2,443 ICH (-)
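The prevalence of ICH in this consecutive cohort (163 of 2,606) is far lower than in the balanced selected test set (100 of 200), and that alone changes the positive predictive value. The sketch below applies Bayes' rule to the 98% sensitivity and 95% specificity reported on the selected test set. The function name `ppv_from_prevalence` is illustrative, and the assumption that sensitivity and specificity carry over unchanged to the real-world cohort is a hypothetical best case, not a result from the talk.

```python
def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule for a given disease prevalence."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Prevalence figures taken from the slides.
selected_prevalence = 100 / 200      # balanced selected test set
real_world_prevalence = 163 / 2606   # consecutive emergency department cases

# Assumption: the selected-test operating point (98% / 95%) holds unchanged.
ppv_selected = ppv_from_prevalence(0.98, 0.95, selected_prevalence)
ppv_real_world_best_case = ppv_from_prevalence(0.98, 0.95, real_world_prevalence)

print(f"PPV at {selected_prevalence:.1%} prevalence: {ppv_selected:.1%}")
print(f"PPV at {real_world_prevalence:.1%} prevalence: {ppv_real_world_best_case:.1%}")
```

Under that assumption the PPV would already fall well below the selected-test value from the change in prevalence alone, so any further drop has to come from a change in sensitivity or specificity.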

Model Performance Comparison

Selected test dataset

                                          Model prediction
                                          ICH (+)    ICH (-)
Clinical report +            ICH (+)      98         2          Sensitivity: 98%
Expert confirmation          ICH (-)      5          95         Specificity: 95%
                                          PPV: 95.1%   NPV: 97.9%

Real-world test dataset

                                          Model prediction
                                          ICH (+)    ICH (-)
Clinical report              ICH (+)      142        21         Sensitivity: 87.1%
                             ICH (-)      1,018      1,425      Specificity: 58.3%
                                          PPV: 12.2%   NPV: 98.5%

NPV, negative predictive value; PPV, positive predictive value
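The four summary metrics follow directly from the confusion-matrix counts. As a minimal sketch, the snippet below recomputes them for both test sets; the `ConfusionMatrix` class and its field names are illustrative and not part of the presented system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # report ICH (+), model ICH (+)
    fn: int  # report ICH (+), model ICH (-)
    fp: int  # report ICH (-), model ICH (+)
    tn: int  # report ICH (-), model ICH (-)

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# Counts copied from the two tables on the slide.
selected = ConfusionMatrix(tp=98, fn=2, fp=5, tn=95)
real_world = ConfusionMatrix(tp=142, fn=21, fp=1018, tn=1425)

for name, cm in (("Selected", selected), ("Real-world", real_world)):
    print(
        f"{name:10s} sensitivity={cm.sensitivity:.1%} specificity={cm.specificity:.1%} "
        f"PPV={cm.ppv:.1%} NPV={cm.npv:.1%}"
    )
```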


Model Performance Comparison

ROC curve on selected data set (n=200) ROC curve on real-world data set (n=2,606)


Why?

Difference in Vendor Distribution

Data distribution

             Development          Selected test        Real-world test
CT vendor    ICH(+)    ICH(-)     ICH(+)    ICH(-)     ICH(+)    ICH(-)
A            725       233        68        87         96        1,445
B            101       1          32        12         67        998
C            55        12
D            23        33                   1

Vendor A performance in real-world dataset

                               Model prediction
                               ICH (+)    ICH (-)
Clinical report   ICH (+)      79         17         Sensitivity: 82.3%
                  ICH (-)      383        1,062      Specificity: 73.5%

Vendor B performance in real-world dataset

                               Model prediction
                               ICH (+)    ICH (-)
Clinical report   ICH (-)      635        363        Specificity: 36.4%
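The per-vendor breakdown can be checked against the overall real-world confusion matrix (142, 21, 1,018 and 1,425). The sketch below is a consistency check only: since every real-world case came from vendor A or vendor B, vendor B's counts are obtained here by subtracting vendor A's from the totals. The ICH (+) row for vendor B is therefore derived arithmetic, not a figure shown on the slide, and the dictionary keys are illustrative.

```python
# Overall real-world counts and vendor A counts, as reported on the slides.
real_world_total = {"tp": 142, "fn": 21, "fp": 1018, "tn": 1425}
vendor_a = {"tp": 79, "fn": 17, "fp": 383, "tn": 1062}

# Derived (not reported): vendor B is whatever remains after removing vendor A.
vendor_b = {key: real_world_total[key] - vendor_a[key] for key in real_world_total}


def sensitivity(cm: dict) -> float:
    return cm["tp"] / (cm["tp"] + cm["fn"])


def specificity(cm: dict) -> float:
    return cm["tn"] / (cm["tn"] + cm["fp"])


for name, cm in (("Vendor A", vendor_a), ("Vendor B", vendor_b)):
    print(f"{name}: sensitivity={sensitivity(cm):.1%} specificity={specificity(cm):.1%}")
```

The derived ICH (-) counts for vendor B match the reported row (635 and 363), which supports the subtraction as a sanity check.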


Review of False Negative Cases

21 FN cases reviewed by a neuroradiologist with > 20 years of experience

8 with no acute bleeding (report hedging)
11 with small bleeding not visualized on axial CT images
2 with small (3 mm, 10 mm) acute subdural hematoma


Review of False Positive Cases

1,018 false-positive cases (5,269 slices) split into 5 sets to be reviewed by 5 neuroradiologists

Hyperdense falx or tentorium (1,580 slices / 420 cases)
CT artifacts (1,545 slices / 463 cases)
Bleeding (875 slices / 92 cases)
Other non-bleed pathology (663 slices / 149 cases)
Calcification (348 slices / 130 cases)
Others (743 slices / 373 cases)


False Positive Case

Falx hyperdensity


Hyperdense Falx/tentorium

False-positive case True-negative case True-positive case


CT Artifact

Motion, streak, beam hardening, head tilt, etc.


Beam Hardening Artifact in Dentate Nucleus

False-positive case True-negative case

Other False Positive Cases

Bleeding (875 slices / 92 cases)
  Chronic ICH, extracranial bleeding, hemorrhagic tumor
Other non-bleed pathology (663 slices / 149 cases)
  Encephalomalacia, meningioma, metastatic mass, vasogenic edema, post-surgical change, old infarct
Others (743 slices / 373 cases)
  Dense blood vessels, deep sulcus, subdural hygroma

Review of False Positive Cases

Number of cases acquired by scanners from the two vendors, by FP category

Vendor   Falx/tentorium   CT artifacts   Bleeding     Other pathology   Calcification   Others        Total
A        113 (26.9%)      121 (26.1%)    43 (46.7%)   76 (51.0%)        72 (55.4%)      128 (34.3%)   383 (37.6%)
B        307 (73.1%)      342 (73.9%)    49 (53.3%)   73 (49.0%)        58 (44.6%)      245 (65.7%)   635 (62.4%)
Total    420              463            92           149               130             373           1,018

Red text indicates significantly larger numbers. Statistical significance was determined by Pearson's χ2 test.
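The slide names Pearson's χ2 test but does not say how the contingency tables were set up. One plausible setup, offered purely as an assumption, is a 2x2 table per category: vendor (A or B) against whether a false-positive case falls in that category, using the 383 and 635 vendor totals as the denominators. The sketch below implements that version with the standard library only; the function names are illustrative, and the resulting statistics may differ from those behind the slide.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# False-positive case counts from the table: (vendor A, vendor B).
FP_TOTAL = {"A": 383, "B": 635}
FP_BY_CATEGORY = {
    "Falx/tentorium": (113, 307),
    "CT artifacts": (121, 342),
    "Bleeding": (43, 49),
    "Other pathology": (76, 73),
    "Calcification": (72, 58),
    "Others": (128, 245),
}


def vendor_by_category_test(category: str) -> tuple[float, float]:
    """Assumed design: vendor (A/B) versus in-category / not-in-category."""
    in_a, in_b = FP_BY_CATEGORY[category]
    return chi_square_2x2(in_a, FP_TOTAL["A"] - in_a, in_b, FP_TOTAL["B"] - in_b)


for category in FP_BY_CATEGORY:
    statistic, p_value = vendor_by_category_test(category)
    print(f"{category:16s} chi2 = {statistic:6.2f}  p = {p_value:.4f}")
```

Because a case can appear in more than one category (the category totals exceed 1,018), the per-category tests in this sketch are not independent of one another.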

Future Work

Validate the labels assigned by NLP of clinical reports
Improve the model by re-training the CNNs
  Distinguish chronic vs. subacute vs. acute bleeding
  Recognize other pathologies
Validate the improved model in a new setting (different CT manufacturers, image acquisition/reconstruction protocols, patient populations)
Optimize parameters for each clinical setting before deployment


Take Home Message

Know the reality, use it accordingly
