Use of Active Learning for Selective Annotation of Training Data in a Supervised Classification System for Digitized Histology

Scott Doyle (1), Michael Feldman (2), John Tomaszewski (2), Anant Madabhushi (1)

(1) Department of Biomedical Engineering, Rutgers, The State University of New Jersey
(2) Department of Surgical Pathology, University of Pennsylvania

http://lcib.rutgers.edu

Outline
- Background
  - Digital Prostate Histopathology
  - Supervised Classification
  - Active Learning
- Methodology
  - Active Learning
  - Data Description
  - Experimental Setup
- Experimental Results
- Concluding Remarks

Prostate Cancer Detection

- ~1 million biopsies per year in USA
- 10-12 tissue samples per biopsy
- 80% benign diagnosis
- Large amount of data to analyze

Computer-Aided Diagnosis

- Identifies regions of interest / suspicion
- Quantitative
- Automated
- Reduces variability
- Supervised classification system

Doyle, S., Feldman, M., Tomaszewski, J., Madabhushi, A. “A Hierarchical Computer-aided Classification Scheme for Automated Detection of Prostatic Adenocarcinoma from Digitized Histology,” APIII 2006

Supervised Classification
- Expert segmentation for training
- Histopathology:
  - Expensive, time-consuming to annotate
  - Cost per training sample is high

Supervised Classification
- Random training inefficient
- Possible redundancy with existing training
- No guarantee of improved accuracy

Solution: Active Learning
- Choose training samples intelligently, not randomly
  - Increased accuracy per training sample
  - Forced choice of training, maximized accuracy
- Useful where:
  - Large amount of unlabeled data
  - Annotations are expensive
- Ideally suited for histopathology data

Active Learning
[Figure: Classifier performance. Accuracy (y-axis) versus number of training samples (x-axis), with one curve for Random Learning and one for Active Learning.]

Previous Work
- Liu [2004], Vogiatzis and Tsapatsoulis [2006]
  - Gene microarray data
- Yao et al. [2008]
  - Content-based image retrieval
- Little work done in histopathology with Active Learning





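Active Learning Methodology
[Flowchart; steps recovered from the diagram labels:]
1. Build a classifier from the labeled training data (labels obtained from a pathologist).
2. Classify the unlabeled training data: each region is cancer, non-cancer, or an uncertain classification.
3. Certain classifications are uninformative. Eliminate them; labeling these adds no information.
4. Uncertain classifications are the informative samples. Identify the informative regions and obtain expert labels for them.
5. Combine the newly labeled samples with the original set to form the new training set.
6. Generate a new classifier from the new training set.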
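The loop in the flowchart above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system described in the talk: the classifier is a toy ensemble of one-dimensional threshold "stumps" standing in for the decision-tree ensemble, uncertainty is taken to be a cancer vote fraction between the hypothetical cutoffs `low` and `high`, and `ask_expert` is a hypothetical stand-in for the pathologist. All data values are made up.

```python
import random


def train_stumps(labeled, n_stumps=25, seed=0):
    """Train bagged 1-D threshold classifiers on (feature, label) pairs.

    Toy stand-in for a tree ensemble; assumes cancer (label 1)
    has the larger feature values.
    """
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_stumps):
        boot = [rng.choice(labeled) for _ in labeled]  # bootstrap resample
        pos = [x for x, y in boot if y == 1]
        neg = [x for x, y in boot if y == 0]
        if pos and neg:  # skip resamples that lost one class
            # Threshold halfway between the two class means.
            thresholds.append((sum(pos) / len(pos) + sum(neg) / len(neg)) / 2)
    if not thresholds:
        raise ValueError("need labeled samples from both classes")
    return thresholds


def cancer_vote_fraction(thresholds, x):
    """Fraction of stumps voting 'cancer' for feature value x."""
    return sum(x > t for t in thresholds) / len(thresholds)


def active_learning_round(labeled, unlabeled, ask_expert, low=0.25, high=0.75):
    """One pass: classify, keep uncertain samples, get labels, retrain."""
    model = train_stumps(labeled)
    # Samples the ensemble disagrees on are treated as informative.
    informative = [x for x in unlabeled
                   if low < cancer_vote_fraction(model, x) < high]
    newly_labeled = [(x, ask_expert(x)) for x in informative]
    new_training = labeled + newly_labeled
    return train_stumps(new_training), new_training, informative


# Toy data: values are made up for illustration only.
labeled = [(0.1, 0), (0.3, 0), (0.7, 1), (0.9, 1)]
unlabeled = [0.0, 0.2, 0.45, 0.5, 0.55, 0.8, 1.0]
model, new_training, informative = active_learning_round(
    labeled, unlabeled, ask_expert=lambda x: int(x > 0.5))
print("queried:", informative)
```

Only samples near the decision boundary can be sent to the expert here; values far from it receive unanimous votes and are never queried, which is the "eliminate" branch of the flowchart.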

Feature Extraction
[Figure: original image with the cancer region marked, alongside the feature images extracted from it.]

Classification
[Figure: feature images are passed to C4.5 decision trees.]

Doyle, S., Madabhushi, A., Feldman, M., Tomaszewski, J.: A Boosting Cascade for Automated Detection of Prostate Cancer from Digitized Histology, MICCAI, Lecture Notes in Computer Science, Vol. 4191, pp. 504-511, 2006.

- “Random Forest” [Breiman, 2001]
- Majority voting determines classification
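Majority voting over an ensemble of trees can be illustrated as follows. The function name `majority_vote` and the string labels are hypothetical. Returning the agreement fraction is an added assumption: it gives one possible measure of how certain the ensemble is, which the slides do not specify.

```python
from collections import Counter


def majority_vote(tree_votes):
    """Return (winning_label, fraction_of_trees_that_agree).

    tree_votes holds one predicted label per tree. Ties go to the
    label that appears first in the list.
    """
    if not tree_votes:
        raise ValueError("need at least one vote")
    label, count = Counter(tree_votes).most_common(1)[0]
    return label, count / len(tree_votes)


# Made-up votes from five trees for a single image region.
label, agreement = majority_vote(
    ["cancer", "cancer", "non-cancer", "cancer", "non-cancer"])
print(label, agreement)
```

An agreement fraction close to 0.5 means the trees are split, so such a region would be a natural candidate for expert annotation.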

Image Data Description
- 27 H&E stained digital biopsy samples
- Data breakdown:
  - Initial Training Set
  - Unlabeled Training Set
  - Testing Set
- Active Learning drawn from Unlabeled Training
- Groups rotated so all images are tested
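The group rotation can be sketched as below. The slides do not give the group sizes, so the equal three-way split of the 27 images is an assumption, and `rotate_groups` and the role names are hypothetical.

```python
from collections import deque

ROLES = ("initial_training", "unlabeled_training", "testing")


def rotate_groups(image_ids):
    """Yield one role assignment per rotation so every group is tested once.

    Assumes three groups of (near) equal size; the real split may differ.
    """
    groups = deque(image_ids[i::3] for i in range(3))
    for _ in range(3):
        yield dict(zip(ROLES, groups))
        groups.rotate(1)  # each group moves on to the next role


images = list(range(27))  # placeholder ids for the 27 biopsy images
for fold in rotate_groups(images):
    print({role: len(ids) for role, ids in fold.items()})
```

Across the three rotations each image appears in the testing group exactly once, which matches the statement that all images are tested.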

Classification
Three training groups evaluated:
- Initial set: Initial Training
- Active Learning set: Initial Training + Active Learning
- Random Learning set: Initial Training + Random Learning
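The three training sets can be assembled along these lines. This sketch assumes that the active and random sets add the same number of samples (`n_new`), which makes the comparison fair but is not stated on the slide. The `uncertainty` scores, sample ids, and function name are hypothetical.

```python
import random


def build_training_sets(initial, pool, uncertainty, n_new, seed=0):
    """Return the initial, active learning, and random learning training sets.

    initial: ids of samples that are already labeled
    pool: ids of the unlabeled training samples
    uncertainty: dict mapping id -> score (higher means more uncertain)
    n_new: number of samples added to both augmented sets (assumption)
    """
    rng = random.Random(seed)
    # Active learning: take the most uncertain samples from the pool.
    active_pick = sorted(pool, key=lambda s: uncertainty[s], reverse=True)[:n_new]
    # Random learning: take the same number of samples at random.
    random_pick = rng.sample(pool, n_new)
    return {
        "initial": list(initial),
        "active": list(initial) + active_pick,
        "random": list(initial) + random_pick,
    }


# Made-up ids and scores for illustration.
sets = build_training_sets(
    initial=["a", "b"],
    pool=["c", "d", "e", "f"],
    uncertainty={"c": 0.1, "d": 0.9, "e": 0.5, "f": 0.8},
    n_new=2,
)
print(sets["active"])
```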





Results: Qualitative

[Figures: two examples, each showing the original image alongside the Random Learning and Active Learning classification results.]

Quantitative Evaluation
[Figure: Area Under the ROC Curve (AUC) for the Initial, Active Learning, and Random Learning training sets; y-axis from 0.93 to 0.96.]


[Figure: Classification Accuracy for the Initial, Active Learning, and Random Learning training sets; y-axis from 0 to 0.8.]
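For reference, the two measures reported in the plots above can be computed as follows. This is a generic sketch, not the evaluation code behind the talk: AUC is computed from its rank interpretation, and accuracy uses a hypothetical decision threshold of 0.5. The scores and labels are made up.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples classified correctly at the given threshold."""
    predictions = [int(s >= threshold) for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Made-up classifier outputs for four regions (1 = cancer).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(roc_auc(scores, labels), accuracy(scores, labels))
```

AUC summarizes ranking quality over all thresholds, while accuracy depends on the single threshold chosen, which is one reason the two plots use different scales.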





Concluding Remarks
- Maximize classification accuracy by choosing training intelligently
- Efficiently obtain annotations
- Make the most use of “training budget”
- Build Active Learning into clinical applications
  - Online training correction / modification
  - User feedback

Acknowledgements
- The Coulter Foundation (WHCF 4-29368)
- New Jersey Commission on Cancer Research
- The National Cancer Institute (R21CA127186-01, R03CA128081-01)
- The US Department of Defense (427327)
- The Society for Medical Imaging and Informatics
