Use of Active Learning for Selective Annotation of Training Data in a
Supervised Classification System for Digitized Histology
Scott Doyle¹, Michael Feldman², John Tomaszewski², Anant Madabhushi¹
¹Department of Biomedical Engineering, Rutgers, The State University of New Jersey
²Department of Surgical Pathology, University of Pennsylvania
http://lcib.rutgers.edu
Outline
Background: Digital Prostate Histopathology, Supervised Classification, Active Learning
Methodology: Active Learning, Data Description, Experimental Setup
Experimental Results
Concluding Remarks
Prostate Cancer Detection
~1 million biopsies per year in the USA
10-12 tissue samples per biopsy
80% benign diagnosis
Large amount of data to analyze
Computer-Aided Diagnosis
Identifies regions of interest / suspicion
Quantitative
Automated
Reduces variability
Supervised classification system
Doyle, S., Feldman, M., Tomaszewski, J., Madabhushi, A. “A Hierarchical Computer-aided Classification Scheme for Automated Detection of Prostatic Adenocarcinoma from Digitized Histology,” APIII 2006
Supervised Classification
Expert segmentation for training
Histopathology:
Expensive, time-consuming to annotate
Cost per training sample is high
Supervised Classification
Random training is inefficient
Possible redundancy with existing training
No guarantee of improved accuracy
Solution: Active Learning
Choose training samples intelligently, not randomly
Increased accuracy per training sample
Forced choice of training, maximized accuracy
Useful where:
Large amount of unlabeled data
Annotations are expensive
Ideally suited for histopathology data
Active Learning
[Figure: classifier performance, plotting accuracy against the number of training samples for Random Learning and Active Learning.]
Previous Work
Liu [2004], Vogiatzis and Tsapatsoulis [2006]: gene microarray data
Yao et al. [2008]: content-based image retrieval
Little work done in histopathology with Active Learning
Active Learning Methodology
Training data: Labeled (obtained from pathologist) and Unlabeled
Build a classifier from the labeled training data
Classify the unlabeled training data as Cancer, Non-cancer, or Uncertain Classification
Uncertain classification: informative samples
Certain classification: uninformative samples; eliminate, since labeling these adds no information
Identify informative regions, obtain expert labels, and combine with the original set
New training set: generate a new classifier
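The cycle on the methodology slides (build a classifier, classify the unlabeled data, send only the uncertain samples to the expert, retrain) could be sketched roughly as below. This is a minimal illustration and not the authors' implementation: the names `vote_fraction`, `select_informative`, and `active_learning_round`, the `train` and `oracle` callables, and the 0.3 to 0.7 uncertainty band are all assumptions made for the example. Uncertainty is measured here as the fraction of ensemble members voting "cancer", which is one common choice.

```python
def vote_fraction(ensemble, x):
    """Fraction of ensemble members that vote 'cancer' (label 1) for sample x."""
    return sum(member(x) for member in ensemble) / len(ensemble)


def select_informative(ensemble, unlabeled, low=0.3, high=0.7):
    """Split unlabeled samples into uncertain (informative) and certain ones.

    The [low, high] band is a hypothetical threshold, not a value from the slides.
    """
    informative, uninformative = [], []
    for x in unlabeled:
        if low <= vote_fraction(ensemble, x) <= high:
            informative.append(x)      # classifier is unsure: worth annotating
        else:
            uninformative.append(x)    # confident prediction: label adds little
    return informative, uninformative


def active_learning_round(train, labeled, unlabeled, oracle, low=0.3, high=0.7):
    """Run one round: train, find uncertain samples, query the expert, retrain.

    train  : callable mapping a list of (sample, label) pairs to an ensemble
    oracle : callable returning the expert label for a sample (the pathologist)
    """
    ensemble = train(labeled)
    informative, _ = select_informative(ensemble, unlabeled, low, high)
    new_training = labeled + [(x, oracle(x)) for x in informative]
    return train(new_training), new_training


# Toy usage with five fixed threshold classifiers on a 1-D feature.
toy_ensemble = [lambda x, t=t: int(x > t) for t in (0.3, 0.4, 0.5, 0.6, 0.7)]
picked, skipped = select_informative(toy_ensemble, [0.1, 0.55, 0.9])
```

In the toy usage, only the sample near the decision boundary falls inside the uncertainty band, so it is the only one that would be sent for expert annotation.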
Feature Extraction
[Figure: original image with the cancer region marked, alongside the extracted feature images.]
Classification
Feature images classified by C4.5 decision tree
Doyle, S., Madabhushi, A., Feldman, M., Tomaszewski, J.: A Boosting Cascade for Automated Detection of Prostate Cancer from Digitized Histology, MICCAI, Lecture Notes in Computer Science, Vol. 4191, pp. 504-511, 2006.
“Random Forest” [Breiman, 2001]
Majority voting determines classification
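As a rough illustration of how an ensemble of trees with majority voting works, the sketch below trains each member on its own bootstrap sample and returns the most common vote. It is a simplified bagging sketch under stated assumptions, not the classifier used in the study: `fit_tree` is a hypothetical placeholder for a C4.5 learner, and `n_trees=11` and the seed are arbitrary.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def train_ensemble(data, fit_tree, n_trees=11, seed=0):
    """Fit n_trees classifiers, each on its own bootstrap sample.

    fit_tree is a hypothetical callable that takes (sample, label) pairs and
    returns a predict function; a C4.5 learner would be plugged in here.
    """
    rng = random.Random(seed)
    return [fit_tree(bootstrap_sample(data, rng)) for _ in range(n_trees)]


def majority_vote(trees, x):
    """Return the label predicted by the largest number of trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Toy usage: three fixed "trees" that disagree on one sample.
trees = [lambda x: "cancer", lambda x: "cancer", lambda x: "non-cancer"]
label = majority_vote(trees, x=None)
```

Using an odd number of trees avoids tied votes in a two-class problem such as cancer versus non-cancer.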
Image Data Description
27 H&E-stained digital biopsy samples
Data breakdown: Initial Training Set, Unlabeled Training Set, Testing Set
Active Learning samples are drawn from the Unlabeled Training Set
Groups are rotated so that all images are tested
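The rotation of groups could look something like the sketch below, where each group takes a turn as the testing set. The slides do not give the group sizes, so the equal split of 27 images into three groups of nine is purely an assumption, as are the function names `split_into_groups` and `rotate_roles`.

```python
def split_into_groups(images, n_groups=3):
    """Deal images round-robin into n_groups roughly equal groups."""
    return [images[i::n_groups] for i in range(n_groups)]


def rotate_roles(groups):
    """Yield (initial, unlabeled, testing) for each cyclic rotation of three groups."""
    if len(groups) != 3:
        raise ValueError("this sketch expects exactly three groups")
    for shift in range(3):
        rotated = groups[shift:] + groups[:shift]
        yield rotated[0], rotated[1], rotated[2]


image_ids = list(range(27))            # stand-ins for the 27 biopsy images
rotations = list(rotate_roles(split_into_groups(image_ids)))

# Collect every image that appears in a testing set across the rotations.
tested = sorted(i for _, _, testing in rotations for i in testing)
```

After the three rotations, every image has served in the testing set exactly once.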
Classification
Three training groups evaluated:
Initial set: Initial Training
Active Learning set: Initial Training + Active Learning
Random Learning set: Initial Training + Random Learning
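The three training groups can be assembled along the lines below. This is only a sketch: `build_training_sets` and its parameters are hypothetical names, and giving the random set the same number of extra samples as the active set is an assumption made so that the two strategies are compared on equal annotation effort.

```python
import random


def build_training_sets(initial, pool, active_picks, seed=0):
    """Return the three training sets compared on the slides.

    initial      : samples in the Initial Training set
    pool         : the Unlabeled Training set that both strategies draw from
    active_picks : samples chosen by Active Learning from the pool
    The random set gets the same number of extra samples as the active set;
    that matching is an assumption of this sketch.
    """
    rng = random.Random(seed)
    random_picks = rng.sample(pool, len(active_picks))
    return {
        "initial": list(initial),
        "active": list(initial) + list(active_picks),
        "random": list(initial) + random_picks,
    }


sets = build_training_sets(initial=["a", "b"], pool=["c", "d", "e", "f"],
                           active_picks=["d", "e"])
```

Each of the three sets would then be used to train a separate classifier that is evaluated on the same testing set.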
Results: Qualitative
[Four figure slides: classification results from Random Learning and Active Learning, shown alongside the original image on two of the slides.]
Quantitative Evaluation
[Figure: Area Under the ROC Curve (AUC) for the Initial, Active Learning, and Random Learning training sets; AUC axis from 0.93 to 0.96.]
[Figure: Classification Accuracy for the Initial, Active Learning, and Random Learning training sets; accuracy axis from 0 to 0.8.]
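For readers unfamiliar with the two metrics in the quantitative evaluation, the sketch below shows one way to compute them from classifier scores and expert labels. The slides do not say how the metrics were computed, so this is an assumption-laden illustration: AUC is computed with the pairwise (Mann-Whitney) formulation, the 0.5 accuracy threshold is arbitrary, and the scores and labels are made-up toy values.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples whose thresholded score matches the true label."""
    predicted = [int(s >= threshold) for s in scores]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


scores = [0.9, 0.4, 0.6, 0.1]   # toy classifier confidence that a sample is cancer
labels = [1, 1, 0, 0]           # toy expert ground truth (1 = cancer)
auc_value = roc_auc(scores, labels)      # 0.75: three of four pos/neg pairs ranked correctly
acc_value = accuracy(scores, labels)     # 0.5: two of four samples classified correctly
```

AUC summarizes ranking quality over all thresholds, while accuracy depends on the single threshold chosen, which is why the two can tell different stories.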
Concluding Remarks
Maximize classification accuracy by choosing training samples intelligently
Efficiently obtain annotations
Make the most of the “training budget”
Build Active Learning into clinical applications
Online training correction / modification
User feedback
Acknowledgements
The Coulter Foundation (WHCF 4-29368)
New Jersey Commission on Cancer Research
The National Cancer Institute (R21CA127186-01, R03CA128081-01)
The US Department of Defense (427327)
The Society for Medical Imaging and Informatics