Automated Diagnosis of Otitis Media:
A Vocabulary and Grammar
bimagicLab Dept. of Biomedical Engineering
Carnegie Mellon University
Anupama Kuruvilla
Advisor: Prof. Jelena Kovacevic
Center for Bioimage Informatics
Department of Biomedical Engineering
Carnegie Mellon University, Pittsburgh, PA 15213
Thesis Manuscript
Submitted in partial fulfillment of the requirements towards the Ph.D. degree
awarded by the Department of Biomedical Engineering, Carnegie Inst. of Tech.,
Carnegie Mellon University.
Thesis Committee Members
Prof. Jelena Kovacevic (Advisor)
Departments of Biomedical Engineering and
Electrical and Computer Engineering
Carnegie Mellon University
Dr. Alejandro Hoberman
Division of General Academic Pediatrics
Children’s Hospital of Pittsburgh
University of Pittsburgh School of Medicine
Prof. Jose M. F. Moura
Departments of Biomedical Engineering and
Electrical and Computer Engineering
Carnegie Mellon University
Prof. George D. Stetten
Department of Biomedical Engineering and
The Robotics Institute
Carnegie Mellon University
To my Amma and Appachen: for always believing in me, for loving me, and for holding my hand.
To Amma and Appachen,
who never doubted, endlessly loved, and constantly supported.
Abstract
This thesis presents an automated algorithm for classifying diagnostic categories of
otitis media (middle ear inflammation): acute otitis media, otitis media with effusion,
and no effusion. Acute otitis media represents a bacterial superinfection of the middle
ear fluid, while otitis media with effusion represents a sterile effusion that tends to
subside spontaneously. Diagnosing acute otitis media in children is difficult because
it is often confused with otitis media with effusion; this confusion leads to
overprescription of antimicrobials, which benefit only children with acute otitis media.
Such misdiagnosis is of increasing concern because it leads to mismanaged episodes of
otitis media and, most importantly, compromises the efficacy of future treatments for
bacterial infections. The current standard of clinical diagnosis of otitis media is visual
examination of the tympanic membrane; this manual and subjective evaluation has
clear limitations, prompting the need for an accurate, automated diagnostic algorithm.
To that end, we design a feature set understood by both otoscopists and engineers
based on the actual visual cues used by otoscopists; we term this the otitis media
vocabulary. We also design a process to combine the vocabulary terms based on the
decision process used by otoscopists; we term this the otitis media grammar. The
algorithm achieves 93.5% classification accuracy, outperforming both clinicians who
did not receive special training and state-of-the-art classifiers.
Acknowledgments
This thesis would be inconceivable were it not for the much appreciated support
of all the people who have taken part in my journey. This is my attempt at a tribute
to a few of those people.
I wish to express my heartfelt gratitude to my advisor Prof. Jelena Kovacevic for
mentoring me during my years of graduate study. Her guidance, cheerful enthusiasm,
and dedication have filled my journey at CMU with adventure and fun from start
to finish. I have benefited greatly from the freedom and independence she has
allowed me, while always knowing that I could count on her support when needed. No
words can do justice to my gratitude and joy in learning under her advice and
guidance, which extend far beyond the realm of research. Thank you for being such an
admirable “guru”.
I would like to thank my thesis committee members for consenting to be part of
this work. My sincere thanks to Dr. Alejandro Hoberman, whose expert opinion was
instrumental in the development of this work. I am grateful to him for promptly
and patiently answering all my questions and for providing helpful insights into the problem.
Many thanks to Prof. Jose M. F. Moura for his valuable feedback on the project and for
introducing us to RICOH Inc., through whose collaboration we have an opportunity to
turn this research into a product. A warm thanks to Prof. George D. Stetten for his
comments, which improved our work and pushed it to a higher level.
My sincere thanks to all the collaborators with whom I have had the opportunity
to work and from whom I have learned. In particular, I thank Dr. Nader Shaikh, who helped
us with the data and enthusiastically offered his help and comments on our paper.
I thank Dr. Pedro Quelhas for his initial work on the project. I would also like to
thank Dr. Pablo Hennings Yeomans, a senior member of our lab, who very patiently
introduced me to his initial work on otitis media classification. Our meetings were fun,
and all his input was greatly helpful and very much appreciated.
I would like to express my gratitude to Prof. Gustavo K. Rohde for being my
co-advisor during my initial years as a graduate student. His passion for research and
attention to detail are something we should all strive for.
During my thesis, I have had the opportunity to supervise two very talented
students: Jian Li and Lakshmi Dhevi Jayagobi. Their significant contributions are
reflected in this thesis. I have greatly enjoyed working with both of them and cherish
good memories of our interactions.
I am grateful to all my academic siblings at bimagicLab; working with them has
been my fortune and privilege. I have learnt so much and enjoyed our time together.
I feel lucky to have had Michael McCann as a very helpful labmate and
cheerful neighbor. Many thanks to him for painstakingly reading every document
I have written, including this thesis, for his quick software fixes, for great conversations,
and above all, for being a good friend. Many thanks to Filipe Condessa for our mutual
enthusiasm for seeing each other’s ‘novel’ MATLAB plots, our fun conversations, and
enjoyable walks home. Thank you, Kuan-Chieh (Jackie) Chen and Siheng Chen, for
being sweet neighbors, for your input, and for your readiness to always offer help. I have
benefited from my interactions with my seniors in the lab and thank Wei Wang, Cheng Chen,
and Ramamurthy Bhagavatula for helping me with their guidance and expertise. I
also thank my other CBI labmates over the years and the CBI staff for making it a happy
working environment.
Many have contributed to this thesis both directly and indirectly, in person, over
email and phone, and across time zones. Many thanks to all of them for their friendship
through all these years, even though most are halfway across the globe. I would like
to mention some of them here, grouped by where I met them.
Pittsburgh: To Dr. Susan Cherian, for all the different hats she wore during
our friendship. Thank you for being my friend, my 24/7 therapist and helpline, yoga
partner, and above all my family. Thank you for introducing me to your wonderful
family and especially Adeline (Coco), who taught me the concept of “fake laugh”. I
have always secretly enjoyed it when people mistook me for her daughter, and I was
never made to feel any less. Cheers to our “cosmic-consciousness” for a lifetime. To
Rev. Fr. John Mathew Elanjileth, for being our partner in crime for numerous drives,
gifts, and wonderful times. Special thanks to Meena Rocky for all our fun shopping
trips, spoiling me with custom-made birthday cakes, and always welcoming me home
with warmth and care. Special thank you to Ozzie Miloykovich for our super-fun
girl nights, discovering art, music, and movies in Pittsburgh, and countless trips to
your kitchen for gourmet food and chocolate bars. I would like to say a big thank you
to all the wonderful people I met in Pittsburgh who have become friends along the
way. In particular, warm thanks to Rathna Veeramachaneni for being a dear friend
and Sumedha Sethi for all the fun times. Many thanks to Deepa Krishnaswamy for
introducing me to plays, music and art, painting, dance lessons, walks in the park,
and the list goes on.
Bangalore: I am extremely grateful to Dr. Gowri Srinivasa, for being part of
my life right from undergraduate work up until now and of course introducing me to
Jelena! I consider myself very fortunate to have been associated with you first as your
student and now as a peer. Thank you for guiding and encouraging me throughout
this journey. It is always a treat to read super-long emails with updates on ‘namma
Bengaluru’. I express my deepest gratitude to Prof. Ajey S.N.R for introducing me
to early lessons of signal processing. I will forever cherish his expert tutelage.
My incomparable gratitude to Vaibhav Upadhyaya, for being my biggest fan and
rock through all times. It is my good fortune that I can always count on him being
my “Atlas”. Thank you for being such a splendid fellow traveler. Thank you Vaidehi
Murthy, for being my feel-good factor, all the support and love along these long years
across continents. Going through everything with her made it so much easier! Many
thanks to Rakshatha Krishnamurthy for constantly checking up on me, heart to heart
conversations and most importantly for her role as “akashwani”.
Thank you to my dearest sister, Poornima Kuruvilla, for, among other things, her
excitement about happenings in my life and the sheer happiness I derive from her just
being part of my life.
Finally, I would like to thank the two people who have been with me from my
beginning and all through: my parents.
Amma: Much of what I have learned over the years came as the result of being your
daughter. You have always inspired me and, consciously and subconsciously, contributed
tremendously to who I have grown up to be. I am eternally grateful for all the
cheers, chiding, laughs, lessons, and my everyday dose of encouragement. I am so
blessed. Thank you!
Appachen: For all the sacrifices you made for me and undying support through
all times. I will always appreciate your understanding and belief in me. Thank you
for letting me choose and be.
I also gratefully acknowledge support from the NSF through award 1017278, the
NIH through award 1DC010283, the NIH-NIAID through award 3U01AI066007-02S1,
and the CMU CIT Infrastructure Award.
Contents
1 Introduction
1.1 Thesis Contributions
1.2 Thesis Outline
2 Diagnosis and Management of Otitis Media
2.1 Diagnostic Categories of Otitis Media
2.1.1 Acute Otitis Media
2.1.2 Otitis Media with Effusion
2.1.3 No Effusion
2.2 Clinical Diagnostics
2.2.1 Simple Hand-held Otoscope
2.2.2 Pneumatic Otoscope
2.2.3 Video Otoscope
2.2.4 Tympanometry
2.3 Diagnostic Uncertainty
2.3.1 Misdiagnosis
2.3.2 Judicious Use of Antimicrobial Agents
3 Background and Related Work
3.1 Preprocessing
3.1.1 Segmentation
3.1.2 Image Correction
3.2 Classification
3.2.1 Overview of Classification Methods
3.2.2 Feature Extraction
3.2.3 Clustering and Classification Problems
3.3 Related Work
3.3.1 Computer-Aided Diagnosis
3.3.2 Vocabulary and Grammar
3.3.3 Automated Classification of Otitis Media
4 Goals of the Thesis
4.1 Gaps to Fill
4.1.1 Challenges
4.1.2 Guiding Principles
4.2 Diagnosis as Classification
5 Preprocessing
5.1 Automated Segmentation of the Tympanic Membrane
5.2 Image Correction: Inpainting Tympanic Membrane Images
5.3 Rejection of Unreliable Data
5.3.1 Rejection due to Specular Highlights
5.3.2 Rejection due to Over/Under Exposure
5.3.3 Rejection due to Presence of Cerumen
6 Otitis Media Vocabulary
6.1 Main Idea
6.2 Methodology
6.3 Vocabulary
6.3.1 Bulging
6.3.2 Central concavity
6.3.3 Light
6.3.4 Malleus presence
6.3.5 Translucency
6.3.6 Amber level
6.3.7 Bubble presence
6.3.8 Grayscale variance
7 Otitis Media Grammar
7.1 Main Idea
7.2 Grammar
7.2.1 Hierarchical-Rule based Grammar
7.2.2 Fuzzy-Logic based Grammar
8 Experimental Results
8.1 Data Set
8.2 Ground Truth
8.2.1 Diagnosis by Expert Otoscopists
8.2.2 Data Set 3
8.2.3 Diagnosis by General Pediatricians
8.3 Automated Classifiers for Comparison
8.3.1 Correlation Filter Classification System
8.3.2 Multiresolution Classifier
8.3.3 SIFT and Shape Descriptors with SVM Classifier
8.3.4 WND-CHARM Classifier
8.3.5 Random Forest Classifier
8.4 Classification of Tympanic Membrane Images
8.4.1 Results: DS1
8.4.2 Results: DS2
8.4.3 Results: DS3
9 Conclusions
List of Acronyms
ANFIS Adaptive neuro fuzzy inference system
AOM Acute otitis media
CAD Computer-aided diagnosis
CFC Correlation filter classification system
CFCR Correlation filter classification system with rejection
DoG Difference of Gaussians
DS1 Data set 1
DS2 Data set 2
DS3 Data set 3
GP General Pediatricians
H&E Hematoxylin and eosin
MRC Multiresolution classifier
MRCR Multiresolution classifier with rejection
NN Neural network
NOE No effusion
OMC Otitis media classifier
OMCR Otitis media classifier with rejection
OME Otitis media with effusion
OMFLC Otitis media fuzzy logic classifier
OMFLCR Otitis media fuzzy logic classifier with rejection
RF Random forests classifier
RFR Random forests classifier with rejection
SIFT Scale invariant feature transform
SSC SIFT and Shape descriptors with SVM classifier
SSCR SIFT and Shape descriptors with SVM classifier with rejection
SVM Support vector machine
WCM WND-CHRM classifier
WCMR WND-CHRM classifier with rejection
List of Notations
$B$ Bright region
$D$ Dark region
$f_a$ Amber feature
$f_b$ Bulging feature
$f_c$ Central concavity feature
$f_l$ Light feature
$f_m$ Malleus presence feature
$f_t$ Translucency feature
$I$ Square neighborhood
$K$ Number of clusters initialized for the K-means algorithm
$(m, n)$ Pixel location in an image
$(m_c, n_c)$ Pixel location of central concavity detection
$N_t$ Total number of pixels used to train the translucency feature
$N_{tl}$ Number of training images for the translucency feature
$R$ Set of radii
$r$ Radius of circular neighborhood
$X$ Original image
$X_a$ Binary image of amber level detection
$X_{bp}$ Binary image of bubble presence detection
$X_c$ Binary mask of cerumen detection
$X_b$ Binary mask of bulging detection
$X_d$ Depth map
$X_m$ Binary mask of segmented region
$X_t$ Binary image of translucency detection
$T_d$ Threshold of the depth map
$T_l$ Threshold of the light feature
$\theta_{\max}$ Direction perpendicular to maximum illumination gradient
List of Figures
2.1 Bulging observed in TM during incidence of AOM.
2.2 Examples of tympanic membrane images of OME.
2.3 Examples of tympanic membrane images of normal ears.
2.4 Images of hand-held otoscopes.
2.5 An example of tympanogram readings [52].
4.1 Illustration of inter-class similarity. Examples of tympanic membrane images of OME (left) and AOM (right) showing strong similarity in appearance.
4.2 Illustration of intra-class variability. Examples of tympanic membrane images of OME; different degrees of severity within OME lead to different presentations.
4.3 Guidelines for grammar design: Decision tree for the diagnosis of otitis media [64].
4.4 Block diagram of our proposed otitis media classifier.
5.1 Comparison of automated segmentation (top) and hand segmentation by expert otoscopists (bottom).
5.2 Correction of specular highlights for AOM (left), OME (middle) and NOE (right). Input images are in the top row, identification of specular highlight regions in the middle row, and correction of the identified regions in the bottom row.
5.3 Examples of rejected images from each class with AOM (left), OME (middle) and NOE (right). Top row corresponds to images rejected due to washed-out appearance and bottom row corresponds to images rejected due to dull appearance.
5.4 Examples of rejected images. Top row corresponds to input images and bottom row corresponds to images showing detected wax regions.
6.1 Computation of the bulging feature.
6.2 Computation of the central concavity feature.
6.3 Computation of the light feature.
6.4 Computation of the malleus presence feature.
7.1 Initial grammar for classifying otitis media.
7.2 Stage 1: Grammar for identifying AOM.
7.3 Stage 2: Grammar for identifying NOE. (Black arrows/boxes denote those paths belonging to this stage; gray ones belong to Stage 1.)
7.4 Stage 3: Grammar for identifying OME. (Black arrows/boxes denote those paths belonging to this stage; gray ones belong to Stages 1 and 2.)
7.5 Example of a binary membership function.
7.6 Example of a continuous membership function.
7.7 Examples of membership functions using sigmoidal functions.
List of Tables
4.1 Guidelines for vocabulary design: Otoscopic findings associated with clinical diagnostic categories of tympanic membrane images [63].
8.1 High variability in the diagnoses among the three expert otoscopists on the tympanic membrane images in data set DS3. The rows correspond to the total number of images assigned by an expert to each diagnostic category.
8.2 Agreement between two expert otoscopists on the diagnosis of tympanic membrane images in data set DS3.
8.3 Diagnoses by three general pediatricians (columns) versus the ground truth of expert otoscopists (rows).
8.4 Classification accuracies (in %) on the ground-truth set of 181 tympanic membrane images. Each row corresponds to the class-wise classification accuracies and columns correspond to the diagnosis by three general pediatricians (GP) as well as the following algorithms: correlation filter classification system (CFC), WND-CHRM (WCM), multiresolution classifier (MRC), SIFT and shape descriptors with SVM classifier (SSC), random forest classifier (RF), our initial classifier [36], otitis media classifier (OMC) [37], and otitis media fuzzy logic classifier (OMFLC).
8.5 Classification accuracies (in %) on the ground-truth set of 170 tympanic membrane images out of 181 images after rejection. Each row corresponds to the class-wise classification accuracies and columns correspond to classification by the following algorithms: correlation filter classification system (CFCR), WND-CHRM (WCMR), multiresolution classifier (MRCR), SIFT and shape descriptors with SVM classifier (SSCR), random forest classifier (RFR), otitis media classifier (OMCR), and otitis media fuzzy logic classifier (OMFLCR).
8.6 Diagnoses by three general pediatricians (columns 2, 3, and 4) and OMFLC (columns 5, 6, and 7) versus the ground truth of expert otoscopists (rows) on images in data set DS1.
8.7 Classification accuracies (in %) on the ground-truth set of 390 tympanic membrane images (267 AOM, 82 OME, and 41 NOE). Each row corresponds to the class-wise classification accuracies and columns correspond to the classification by the following algorithms: WND-CHRM (WCM), correlation filter classification system (CFC), multiresolution classifier (MRC), random forest classifier (RF), SIFT and shape descriptors with SVM classifier (SSC), otitis media classifier (OMC) [37], and otitis media fuzzy logic classifier (OMFLC).
8.8 Classification accuracies (in %) on the ground-truth set of 233 out of 390 tympanic membrane images (144 AOM, 52 OME, and 37 NOE) after rejection. Each row corresponds to the class-wise classification accuracies and columns correspond to the classification by the following algorithms: random forest classifier (RFR), correlation filter classification system (CFCR), WND-CHRM (WCMR), multiresolution classifier (MRCR), SIFT and shape descriptors with SVM classifier (SSCR), otitis media classifier (OMCR) [37], and otitis media fuzzy logic classifier (OMFLCR).
8.9 Classification accuracies (in %) on the ground-truth set of 248 tympanic membrane images (58 AOM, 112 OME, and 78 NOE). Each row corresponds to the class-wise classification accuracies and columns correspond to the classification by the following algorithms: multiresolution classifier (MRC), correlation filter classification system (CFC), WND-CHRM (WCM), SIFT and shape descriptors with SVM classifier (SSC), random forest classifier (RF), otitis media classifier (OMC) [37], and otitis media fuzzy logic classifier (OMFLC).
8.10 Classification accuracies (in %) on the ground-truth set of 162 out of 248 tympanic membrane images (44 AOM, 46 OME, and 72 NOE). Each row corresponds to the class-wise classification accuracies and columns correspond to the classification by the following algorithms: multiresolution classifier (MRCR), correlation filter classification system (CFCR), random forest classifier (RFR), SIFT and shape descriptors with SVM classifier (SSCR), WND-CHRM (WCMR), otitis media classifier (OMCR) [37], and otitis media fuzzy logic classifier (OMFLCR).
Chapter 1
Introduction
Acute otitis media (AOM), an acute form of middle-ear inflammation, is a frequent
condition that affects a majority of the pediatric population and for which antimicrobials
are commonly prescribed in the United States. Children are most often affected in their
first two years of life, particularly between 6 and 12 months of age. The number of otitis
media episodes has increased substantially in the past two decades, with approximately
25 million visits to office-based physicians in the US and a total of 20 million prescriptions
for antimicrobials related to otitis media yearly [69]. This results in a significant social
burden and indirect costs due to time lost from school and work, with an estimated
annual medical expenditure of approximately $2 billion [56].
The correct diagnosis and management of otitis media have a significant impact on
the health of children and the overall use of antimicrobial agents. Even though numerous
clinical studies exist on this prevalent problem, there is considerable variability
in the medical community regarding the optimal diagnostic criteria and management of
otitis media. As a result, AOM is frequently over-diagnosed because it is confused with
otitis media with effusion (OME, a sterile effusion that subsides spontaneously), resulting
in unnecessary antibiotic prescriptions for a substantial proportion of patients, in whom
they lead to adverse effects and increased bacterial resistance without the expected
benefit of an improved clinical outcome. This issue is of increasing concern since it
leads to mismanaged episodes of otitis media and, most importantly, compromises the
efficacy of any future treatments for a bacterial infection.
The current standard of clinical diagnosis of AOM includes visual examination of the
tympanic membrane using an otoscope; this examination is time-consuming, error-prone,
and subjective, with limited intra- and inter-observer reproducibility. These concerns
underscore the critical need for an accurate, efficient, and automated system for
classifying otitis media into AOM, OME, and no effusion (NOE), which leads to the
goal of this thesis:
To develop an accurate, automated algorithm for classifying
otitis media into three distinct diagnostic categories
based on tympanic membrane images.
The material presented in this thesis is developed based on the domain knowledge
of expert otoscopists. In an ideal situation, if each of the diagnostic categories of
otitis media presented a unique set of signs and symptoms, then an engineer’s job
of designing an automated classifier would reduce to building a look-up table that
associates diagnostic categories with a set of signs and symptoms. Unfortunately,
this is not the case. In real-world practice, the experts’ diagnostic process involves
weighing different pieces of information and evidence gathered over many cases
in order to reach the most appropriate diagnosis for the case at hand. Our
goal is to mimic the expert otoscopists’ diagnostic capability by designing a system that
uses expert human knowledge in addition to performing numerical calculations
on the data.
Currently, no automated classification algorithm for otitis media exists, and we are
the first to develop such an algorithm, which can be used as a diagnostic aid to
classify tympanic membrane images into one of the three stringent clinical diagnostic
categories: AOM, OME, and NOE. This contribution is significant because it provides
clinicians with an objective, highly accurate classification system that will be an
easy-to-use clinical aid for discriminating among AOM, OME, and NOE.
An accurate automated classification system will enable more appropriate use of
antibiotics by decreasing the rate of misdiagnosis of OME as AOM (over-prescription)
and of AOM as OME (under-prescription). A decrease in over-prescription will reduce
(1) adverse effects, (2) bacterial resistance, and (3) the financial costs associated
with over-prescription, both direct (medication costs, copays, and emergency department
and primary care provider visits) and indirect (missed work and special day care
arrangements). A decrease in under-prescription will result in appropriate treatment
for a bacterial infection and similarly reduced financial costs. Together, these will lead
to more appropriate, higher-quality care.
1.1 Thesis Contributions
The main contributions of this thesis are:
1. Otitis Media Vocabulary. We develop a vocabulary (feature set) understood
by both otoscopists and engineers based on the actual visual cues used by
otoscopists. Our working hypothesis is that closely mimicking the features used by
trained otoscopists increases the classification accuracy. The results presented
in this thesis demonstrate that using a small set of physiologically meaningful
features increases the classification accuracy.
2. Otitis Media Grammar. We develop a grammar (decision-making process) to
combine the vocabulary terms based on the decision process used by otoscopists.
Our working hypothesis is that a decision-making process based on the
clinical diagnostic process will yield a highly accurate classification system to
distinguish the diagnostic categories of AOM/OME/NOE. As demonstrated in this
thesis, the otitis media classifier built using the otitis media grammar outperforms
the other classifiers.
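One way to see how the two contributions fit together: the vocabulary turns an image into a handful of named scores, and the grammar is a staged procedure that reads those scores. The Python sketch below is a minimal, hypothetical illustration of that structure, not the classifier developed in this thesis. Only the feature names (after the List of Notations) and the stage order (AOM, then NOE, then OME, as listed for Figures 7.2 through 7.4) come from this document; the `VocabularyScores` class, the [0, 1] scaling, the single threshold of 0.5, and every rule inside `classify` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class VocabularyScores:
    """Hypothetical per-image vocabulary scores, each assumed scaled to [0, 1].

    Field names mirror the List of Notations (f_b, f_c, f_l, f_m, f_t, f_a);
    how each score is actually computed is described in Chapter 6, not here.
    """
    bulging: float            # f_b
    central_concavity: float  # f_c
    light: float              # f_l
    malleus_presence: float   # f_m
    translucency: float       # f_t
    amber_level: float        # f_a


# Illustrative cut-off only (an assumption); the thesis thresholds are not used here.
THRESHOLD = 0.5


def classify(v: VocabularyScores, threshold: float = THRESHOLD) -> str:
    """Toy three-stage grammar: test for AOM, then NOE, else fall through to OME."""
    # Stage 1 (hypothetical rule): a strongly bulging membrane is labeled AOM.
    if v.bulging > threshold:
        return "AOM"
    # Stage 2 (hypothetical rule): a translucent membrane with a visible malleus
    # and a central concavity is labeled NOE.
    if (v.translucency > threshold
            and v.malleus_presence > threshold
            and v.central_concavity > threshold):
        return "NOE"
    # Stage 3: whatever survives the first two stages is labeled OME.
    return "OME"


if __name__ == "__main__":
    samples = {
        "bulging": VocabularyScores(0.9, 0.1, 0.2, 0.1, 0.1, 0.3),
        "clear": VocabularyScores(0.1, 0.8, 0.7, 0.9, 0.9, 0.1),
        "opaque, amber": VocabularyScores(0.2, 0.3, 0.4, 0.3, 0.2, 0.8),
    }
    for name, scores in samples.items():
        print(name, "->", classify(scores))
```

Because each stage is a readable rule over named, clinically meaningful features, an otoscopist can trace why an image received a given label, which matches the stated aim of a vocabulary understood by both otoscopists and engineers. A fuzzy-logic variant in the spirit of Section 7.2.2 would replace the hard threshold comparisons with graded membership functions.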
1.2 Thesis Outline
This thesis is organized as follows. In Chapter 2, we introduce the three diagnostic
categories of otitis media and review the current diagnostic tools and procedures
employed in a pediatrician’s office for diagnosing otitis media. We highlight
the importance of accurate diagnosis and judicious administration of antimicrobials.
The focus of these sections is to emphasize the challenges of accurately diagnosing otitis
media and to introduce the need for an accurate automated classification system.
In Chapter 3, we present background on segmentation and image correction
techniques. This is followed by a discussion of the unsupervised and supervised
classification methods used in this thesis. The focus of these sections is to provide the
necessary background and highlight some of the advantages and limitations of these
methods. We also present related previous work, starting with a review
of existing computer-aided diagnostic systems and then introducing the notion of
vocabulary and grammar that is the basis and inspiration of the methods we develop.
Finally, we discuss previous work in the area of automated diagnosis of otitis
media.
In Chapter 4, we highlight the challenges presented by tympanic membrane
images, motivate the need for an automated classification system for otitis media,
and provide the overall framework of the otitis media classification system designed
in this thesis.
In the three subsequent chapters, we present each module of the otitis media
classifier in detail: preprocessing, feature extraction, and classification.
In Chapter 5, we present preprocessing of the tympanic membrane images, the first
module of the otitis media classification system. In Section 5.1, we present the
segmentation of the tympanic membrane, which is previous work of bimagicLab [4]
(Dr. Kovacevic’s group). Beginning with Section 5.2, we present the work of this thesis,
starting with image correction and rejection techniques that make the data reliable
for further processing.
In Chapter 6, we describe the otitis media vocabulary, one of the major contributions
of this thesis. The vocabulary features are designed to mimic the actual visual
cues used by expert otoscopists during clinical diagnosis.
In Chapter 7, we describe the otitis media grammar, another major contribution
of this thesis. Here, we present the decision-making process that combines the
vocabulary terms, mimicking the diagnostic process used by expert otoscopists when
examining for otitis media.
In Chapter 8, we evaluate the otitis media classifier against other automated
classifiers on tympanic membrane images of otitis media. The otitis media classifier
demonstrates superior performance in classifying tympanic membrane images into the
three diagnostic categories of otitis media. Results demonstrate that the performance
of the classifier is comparable to the diagnosis agreed upon by a panel of three expert
otoscopists.
In Chapter 9, we conclude this thesis by summarizing our work and proposing
ideas for the future.
Chapter 2
Diagnosis and Management of
Otitis Media
Otitis media is the childhood illness that most frequently brings children to a
pediatrician’s office. The purpose of this chapter is to provide an understanding
of the diagnostic categories of otitis media that are important for the discussion in
this thesis. We introduce the three diagnostic categories of otitis media, followed by
the diagnostic tools used in current clinical practice. We conclude with the current
standard of evaluation in the clinical setting and the importance of reducing
unnecessary and ineffective antimicrobial therapy prescribed for non-acute cases of
otitis media.
2.1 Diagnostic Categories of Otitis Media
Otitis media is the general term for inflammation of the middle ear. It is closely tied
to the function of the Eustachian tube, which connects the middle ear cavity with the
nasopharynx. The Eustachian tube performs three main functions. First, it allows the passage of air in
the tube, which is important for maintaining equal pressure on both sides of the eardrum.
Second, it drains middle ear secretions into the nose. Third, it prevents fluid from
flowing back up the tube into the middle ear. Because the Eustachian tube is narrow
in infants, even a slight amount of swelling can block it, impeding its function and
causing fluid to build up in the middle ear, which can lead to inflammation. The
condition occurs most frequently during the first two years of life, owing to the
ongoing physiological and immunological development of children. Studies show that
70% of children experience at least one episode of otitis media during their first year
and 93% experience at least one episode during their first seven years [69].
“Otitis” means inflammation of the ear and “media” means middle, hence the name
otitis media: inflammation of the middle ear. The term “otitis media” refers to a
continuum of ear disease: acute middle ear infections (acute otitis media, purulent
otitis media, suppurative otitis media), the accumulation of fluid in the middle ear
(otitis media with effusion, serous otitis media), or both.
2.1.1 Acute Otitis Media
Acute otitis media (AOM) is an infection of the middle ear and Eustachian tube. It
can occur at any age but primarily affects children between six months and two years
of age. AOM is a frequent condition that affects a majority of the pediatric population
and is a common reason for prescribing antibiotics.
Children with AOM present with symptoms such as ear pain (otalgia), ear discharge
(otorrhea), and/or temporary hearing loss. During AOM, the tympanic membrane looks
inflamed, opaque, and bulging, with marked redness. Bulging of the tympanic membrane
is considered the most reliable otoscopic characteristic of AOM [63]. Figure 2.1 shows
examples of mild, moderate, and severe bulging observed in AOM. Additional
non-specific symptoms in young children include irritability, fever, night waking, poor
feeding due to decreased appetite, cold symptoms, conjunctivitis, and occasional
balance problems.
(a) Mild bulging. (b) Moderate bulging. (c) Severe bulging.
Figure 2.1: Bulging observed in the TM during AOM. (Images courtesy: All the tympanic membrane images presented and used in this thesis are provided by Dr. Hoberman and Dr. Shaikh.)
2.1.2 Otitis Media with Effusion
Otitis media with effusion (OME) is the presence of sterile middle ear fluid without
signs and symptoms of acute ear infection; this condition tends to subside sponta-
neously. Many cases result in recurrent OME and 5% to 10% of the cases last more
than a year [38, 55]. This is an important clinical condition due to its prevalence in
children and the costs associated with it. Approximately 2.2 million cases of OME
are reported annually in the United States with associated costs of $4 billion [66].
Despite the high prevalence of OME, common clinical practice frequently fails to
identify these cases correctly. Diagnosing OME accurately is crucial for proper
management, and distinguishing OME from AOM is fundamental to ensuring appropriate
use of antimicrobials.
In OME, the tympanic membrane is often partially cloudy with decreased mobility,
and an air-fluid level or bubble may be visible [63]. Examples of tympanic membrane
images of OME are shown in Figure 2.2. Some cases of OME are asymptomatic; in
others, patients may experience ear discomfort, hearing loss, or a feeling of ear
fullness. Children who are at risk for delayed speech or language development are
more likely to be affected by the hearing problems associated with OME.
Figure 2.2: Examples of tympanic membrane images of OME.
OME may occur due to Eustachian tube dysfunction resulting from an upper respiratory
infection, and it may precede or follow the occurrence of AOM. However, since OME
does not result from bacterial infection, it does not benefit from antimicrobial therapy.
Distinguishing OME from AOM is therefore of utmost importance: it avoids the
unnecessary use of antibiotics, which leads to adverse effects of incorrect medication
and to the development of harmful bacterial resistance.
Figure 2.3: Examples of tympanic membrane images of normal ears.
2.1.3 No Effusion
Figure 2.3 shows examples of normal tympanic membranes. A normal tympanic
membrane is pearly gray, translucent, and in normal position, with clear bony landmarks;
these include the short process and the manubrium of the malleus, which are easily
observable. The tympanic membrane moves inward on the application of positive
pressure and outward on the application of negative pressure. If the tympanic
membrane does not move with gentle application of pressure, a middle ear effusion,
sterile or infectious, is likely present.
2.2 Clinical Diagnostics
When a child is brought to a pediatrician’s office with ear pain and discomfort, the
clinical diagnostic procedure starts with the parent completing a questionnaire that
describes the child’s symptoms. This is usually followed by an examination of the ear
by a clinician. A variety of commercially available diagnostic tools are used for the
visual examination of the tympanic membrane. In this section, we briefly introduce
some of the diagnostic devices and tests that might be encountered during the clinical
examination and evaluation of otitis media.
2.2.1 Simple Hand-held Otoscope
An otoscope (see Figure 2.4(a)) is a medical device that enables the clinical examiner
to look into the ear canal and visualize the tympanic membrane. It consists of a
handle and a head. The head of the otoscope houses the illumination source and a
simple low-power magnifying lens. The front end of the otoscope holds a disposable
plastic ear speculum, which comes in various sizes. To insert the speculum into the
ear canal, the examiner gently pulls the pinna (the outer ear) up or down; this
straightens the naturally curved ear canal and makes it easier to visualize the
tympanic membrane. The external canal and the tympanic membrane are then
visualized by looking through the magnifying lens and the speculum. Hand-held
otoscopes can be wall-mounted or portable. Wall-mounted otoscopes are attached by a
flexible power cord to a base, which plugs into an outlet to supply electric power and
serves as a resting place when the device is not in use. Portable devices are powered
by batteries housed in the handle. Because the ear canal is lined with hair follicles
and glands that produce a waxy oil called cerumen, buildup of cerumen often obstructs
clear visualization of the tympanic membrane. Most models allow instruments to be
inserted through the otoscope into the ear canal to remove wax.
(a) Simple otoscope. (b) Pneumatic otoscope.
Figure 2.4: Images of hand-held otoscopes. (Images courtesy: http://otoscopy.hawkelibrary.com/album08/INst_6.html)
2.2.2 Pneumatic Otoscope
Examination with a pneumatic otoscope makes it possible to determine the mobility of
a patient’s tympanic membrane in response to pressure changes. The examiner uses
the attached bulb (see Figure 2.4(b)) to gently puff air into the ear canal and observes
the movement of the tympanic membrane. A normal tympanic membrane moves in
response to pressure. Immobility may be due to fluid buildup in the middle ear (sterile
or infected), a perforation, or tympanosclerosis, among other causes. Pneumatic
otoscopy helps detect the presence of effusion even when the appearance of the
tympanic membrane gives no clear indication of the condition. Detecting reduced
tympanic membrane mobility helps establish the diagnosis of OME. The pneumatic
otoscope is relatively inexpensive compared with the other devices and can be used
effectively with appropriate training.
2.2.3 Video Otoscope
Many doctors’ offices employ more sophisticated video-otoscopes and otoendoscopes,
which connect to a light source (halogen, xenon, or LED) and a computer and can
record images or video. Simple hand-held otoscopes do not permit acquisition of
images or video and require diagnosis on the spot, whereas video-otoscopes and
otoendoscopes do permit acquisition; however, with these devices the clinician views
the feed on a side screen while holding the device in the ear canal of an often-squirming
young child.
2.2.4 Tympanometry
A tympanometer is a hand-held device that provides quantitative information on the
presence of fluid in the middle ear and on middle ear function. During the examination,
a probe is placed into the ear canal and a sound stimulus is transmitted into the canal
while a vacuum pump adjusts the pressure in the ear, causing the tympanic membrane
to move. A microphone in the instrument detects the returning sound energy. The
mobility of the tympanic membrane is at its maximum when the air pressures on both
sides of it are equal. Tympanometry is often performed in conjunction with pneumatic
otoscopy; pneumatic otoscopy provides a qualitative measure of tympanic membrane
mobility, whereas tympanometry provides quantitative information. The graphic
display of this information, showing the amount of positive and negative pressure
generated, the absorption of acoustic energy by the middle ear, and the ear canal
volume, is called a tympanogram.
Figure 2.5: An example of tympanogram readings [52].
Figure 2.5 shows examples of tympanogram readings; the different curves correspond
to distinct conditions of the middle ear [52]. For example, Figure 2.5(a) shows a flat
curve, which indicates decreased mobility of the tympanic membrane. Figure 2.5(b)
shows a completely flat curve with very low ear canal volume, an indication that the
ear canal may be occluded with cerumen. Figure 2.5(c) shows a flat curve with
increased ear canal volume, which suggests a perforation of the tympanic membrane:
the perforation makes the measured volume larger than the normal ear canal volume.
Figure 2.5(d) shows a wide curve with a height in the normal range; though not a clear
indication of pathology, it could indicate developing or resolving OME. Figure 2.5(e)
shows a curve with normal height and negative pressure. A buildup of negative
pressure is associated with upper respiratory infection and indicates an increased risk
of developing AOM. Figure 2.5(f) shows a high positive peak pressure due to bulging
of the tympanic membrane, a clear indication of AOM.
Additionally, there are a number of other clinical procedures. Tympanocentesis is a
procedure in which a needle is passed through the tympanic membrane to drain the
fluid accumulated in the middle ear. Acoustic reflectometry is a test that measures the
amount of sound reflected back by the tympanic membrane as an indirect measure of
fluid buildup. If the child has persistent ear infections and fluid buildup in the middle
ear, an audiologist performs tests to assess hearing and speech skills and to detect
any impediments to normal development.
2.3 Diagnostic Uncertainty
A major challenge in diagnosing otitis media is distinguishing between OME and
AOM. OME is more prevalent than AOM, since it can be present at the onset of AOM
or after AOM has resolved. A misdiagnosis of AOM leads to unnecessary prescription
of antibiotics, so it is of utmost importance that examiners avoid such false-positive
diagnoses in children. The diagnosis of otitis media is particularly difficult in young,
preverbal children and infants. Other factors, such as the narrowness of the ear canal,
the inability of the child to remain still during examination, or incomplete removal of
cerumen from the ear canal, add to the difficulty of making the diagnosis.
2.3.1 Misdiagnosis
The inherent difficulties in distinguishing among the three diagnostic categories of
otitis media, together with the above issues, make the diagnosis by nonexpert oto-
scopists notoriously unreliable and lead to the following:
1. Overprescription of antibiotics.
AOM is frequently overdiagnosed; this happens when NOE or OME is diagnosed
as AOM, resulting in unnecessary antibiotic prescriptions that lead to adverse
effects and increased bacterial resistance [1]. Overdiagnosis is more common
than underdiagnosis because doctors typically try to avoid the possibility of
leaving an ill patient without treatment, leading to antibiotic prescriptions in
uncertain cases.
2. Underprescription of antibiotics.
Underdiagnosis occurs when AOM is misdiagnosed as either NOE or OME. Most
importantly, the child’s symptoms are left unaddressed. Occasionally, underdiagnosis
can lead to serious complications such as perforation of the tympanic membrane and,
very rarely, mastoiditis [68].
3. Increased financial costs and burden.
Misdiagnosis carries direct and indirect financial costs, such as medication costs,
co-payments, emergency department and primary care provider visits, missed work,
and special day care arrangements.
2.3.2 Judicious Use of Antimicrobial Agents
As argued earlier, otitis media is the most frequently treated condition in pediatrics
and consistently the leading reason for prescribing antimicrobials. The cumulative
evidence from the available literature confirms that antimicrobial agents are
unnecessary and non-beneficial for non-AOM cases; such inappropriate prescriptions
contribute to the spread of antimicrobial resistance.
Even though there is general agreement that only AOM cases benefit from
antimicrobials, stringent criteria for establishing the diagnosis in a clinical setting are
lacking. For example, a survey of about 165 pediatricians identified 147 different
combinations of signs and symptoms used as diagnostic criteria [24]. Such wide
variability in diagnostic criteria leads to non-standard management of otitis media,
of which inappropriate prescriptions are a highly prevalent outcome.
One of the major considerations is accurately distinguishing AOM from OME, which
translates directly into optimal management of otitis media. In a study assessing the
ability of pediatricians and otolaryngologists to differentiate the physical findings of
otitis media through visual evaluation [60], participants were shown video images
and asked to state their diagnosis. The average rate of correct diagnosis was 50%
among pediatricians and 73% among otolaryngologists; OME was often misdiagnosed
as AOM. Another study [59] tested the diagnostic skill of pediatricians from different
countries; the average rate of correct diagnosis was 54% in Italy, 36% in Greece, 53%
in South Africa, and 51% in the USA.
The above concerns underscore the need for an accurate automated classification
system that can be used as a clinical aid during the diagnosis of otitis media to ensure
reliable diagnosis and hence help reduce the development of bacterial resistance.
Chapter 3
Background and Related Work
In this chapter, we present the background material relevant to the discussion in this
thesis. We begin by introducing image preprocessing techniques focusing on segmen-
tation and image correction. This is followed by a discussion of feature extraction
methods and an overview of supervised and unsupervised classification methods. Fi-
nally, we present the previous work relevant to the research presented here.
3.1 Preprocessing
Preprocessing removes undesired noise and enhances the data for further analysis
and processing. This commonly involves normalizing the intensity of individual
pixels in the image, removing reflections, and selecting regions of interest in the
image for further computation. Segmentation is the first preprocessing step in our
system and aims to delineate the tympanic membrane region in the image. In
this section, we briefly discuss the basics of segmentation and introduce active-contour
segmentation [32, 74], which is used to segment the tympanic membrane from an
otoscopic image. This is followed by a review of image correction techniques currently available
to solve a wide range of illumination problems.
Before we define any operations on an image, we introduce the notation used to denote
a digital image in our discussion. We denote an image by $X \in \mathbb{R}^{M\times N}$,
a two-dimensional array that can be represented as a function $X(m,n)$ of two
variables with domain $M \times N$.
3.1.1 Segmentation
When a human observer views a scene, the processing that takes place in the visual
system inherently segments the scene. This is done so effectively that a complex
scene is reduced to a collection of coherent objects. While the task of segmentation
seems trivial in human visual processing, it is not in digital image analysis.
Segmentation is a fundamental task in any image processing pipeline, in which an
image is divided into multiple regions and background [21]. With the increasing size
and number of medical images, segmentation algorithms are necessary for delineating
regions of interest for further analysis. Methods for performing segmentation vary
widely depending on the task at hand and the imaging modality. There is no universal
segmentation method that works well on all types of image data; however, there are
general methods that are widely applicable to a variety of data. Typically,
application-specific methods fare better than general methods because they exploit
prior knowledge of the data. Here, we discuss active-contour segmentation, which has
been used to segment tympanic membrane images. A full description of other
available segmentation methods is beyond the scope of our discussion, and we refer
the reader to additional references for further details [54, 58].
Active-Contour Segmentation
Active-contour segmentation belongs to the class of deformable models, which use
energy-minimizing curves known as “snakes”. An initial contour is placed in the
image and then evolved to best fit the desired object or region in the image. The
deformation of the contour is analogous to an elastic rubber band placed outside the
target shape: the shape is found when the rubber band stops shrinking and fits the
shape.
Active contours can be expressed as an energy minimization problem. An energy
functional is defined over the contour, with terms that control how the contour can
expand, contract, or curve, and the target region corresponds to a minimum of this
functional. The contour evolves according to two types of forces acting on it
[11, 15, 32, 43, 74]: external forces from the image, such as edges, and internal forces
from the contour itself, such as its curvature. A snake connects a set of control points
and evolves iteratively by searching a local neighborhood of each point for new
positions with lower snake energy. The external forces from the image cause the
contour to move and deform from its initial position to best fit the desired region in
the image.
We now formally define the snake energy as the sum of the contour’s internal energy
$E_{\text{int}}$ and the image energy $E_{\text{image}}$. These functions act on the
coordinates of the control points that make up the snake, $v(s)$:
$$v(s) = (m(s), n(s)),$$
where $m(s)$ and $n(s)$ are the Cartesian coordinates of the contour and $s \in [0,1]$
is the normalized index of the control points. The energy functional of the snake,
$E_{\text{snake}}$, is then defined as
$$E_{\text{snake}} = \int_{0}^{1} \big[ E_{\text{int}}(v(s)) + E_{\text{image}}(v(s)) \big] \, ds.$$
The snake evolves so as to minimize this functional. This is achieved by seeking a set
of points $v(s)$ such that
$$\frac{dE_{\text{snake}}}{dv} = 0.$$
Let us now consider the parameters that influence the snake’s behavior. The
internal energy $E_{\text{int}}$ is a combination of a continuity term and a smoothness
term, written as
$$E_{\text{int}} = \alpha(s)\left|\frac{dv(s)}{ds}\right|^2 + \beta(s)\left|\frac{d^2 v(s)}{ds^2}\right|^2,$$
where the term $dv(s)/ds$ measures the energy due to stretching; high values of this
term imply a high rate of change along the contour. It controls the spacing between
the points and attempts to keep the points at equal distances from each other, making
the contour continuous. The second term, $d^2 v(s)/ds^2$, measures the energy due to curving.
This term enforces smoothness by avoiding abrupt changes in the curvature. Choice
of α and β controls the shape evolution of the snake. Low values for α imply the
points can be unevenly spaced, whereas higher values imply that the snake aims to
attain evenly spaced contour points. Low values for β imply that curvature is not
minimized and the contour can form corners in its perimeter whereas high values
force the snake to stick to smooth contours.
The other source of energy is the image energy $E_{\text{image}}$. The purpose of this
term is to attract the contour towards the target contour using image features such
as brightness or edges. This is achieved by computing the gradient of the intensity at
each snake point.
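To make the formulation concrete, the following is a minimal NumPy sketch of a closed parametric snake; it is not the segmentation implementation used for tympanic membrane images in this thesis. It assumes constant $\alpha$ and $\beta$ (the text allows $\alpha(s)$ and $\beta(s)$), an edge-attracting image energy $E_{\text{image}} = -|\nabla X|^2$, and the semi-implicit update $(A + \gamma I)\,v_t = \gamma\, v_{t-1} + F(v_{t-1})$ from the classic snake literature, where $A$ is the finite-difference matrix of $E_{\text{int}}$, $F = -\nabla E_{\text{image}}$, and $\gamma$ is a step-size parameter introduced here. The function names, the synthetic test image, and all parameter values are illustrative assumptions.

```python
import numpy as np


def snake_inverse(n_pts, alpha, beta, gamma):
    """Return (A + gamma*I)^-1 for a closed snake with n_pts control points.

    A is the circulant finite-difference matrix of the internal energy
    alpha*|dv/ds|^2 + beta*|d^2v/ds^2|^2 with constant alpha and beta.
    """
    A = np.zeros((n_pts, n_pts))
    for i in range(n_pts):
        A[i, i] = 2 * alpha + 6 * beta
        A[i, (i - 1) % n_pts] = A[i, (i + 1) % n_pts] = -alpha - 4 * beta
        A[i, (i - 2) % n_pts] = A[i, (i + 2) % n_pts] = beta
    return np.linalg.inv(A + gamma * np.eye(n_pts))


def bilinear(field, m, n):
    """Sample a 2-D array at real-valued (row m, column n) positions."""
    m = np.clip(m, 0, field.shape[0] - 1.001)
    n = np.clip(n, 0, field.shape[1] - 1.001)
    m0, n0 = np.floor(m).astype(int), np.floor(n).astype(int)
    dm, dn = m - m0, n - n0
    return ((1 - dm) * (1 - dn) * field[m0, n0] + dm * (1 - dn) * field[m0 + 1, n0]
            + (1 - dm) * dn * field[m0, n0 + 1] + dm * dn * field[m0 + 1, n0 + 1])


def evolve_snake(X, m, n, alpha=0.01, beta=0.0, gamma=0.01, iters=300):
    """Evolve a closed snake v(s) = (m(s), n(s)) on image X."""
    # Assumed image energy E_image = -|grad X|^2, so F = +grad(|grad X|^2).
    gm, gn = np.gradient(X.astype(float))
    edge = gm ** 2 + gn ** 2
    fm_field, fn_field = np.gradient(edge)
    inv = snake_inverse(len(m), alpha, beta, gamma)
    for _ in range(iters):
        fm = bilinear(fm_field, m, n)
        fn = bilinear(fn_field, m, n)
        # Semi-implicit step: (A + gamma*I) v_t = gamma*v_{t-1} + F(v_{t-1})
        m = inv @ (gamma * m + fm)
        n = inv @ (gamma * n + fn)
    return m, n


# Usage: a bright disk of radius 30 with a soft (sigmoid) edge, and a circular
# initial contour of radius 45 placed outside it.
rr, cc = np.mgrid[0:100, 0:100]
X = 1.0 / (1.0 + np.exp((np.hypot(rr - 50, cc - 50) - 30) / 2.0))
s = np.linspace(0, 2 * np.pi, 100, endpoint=False)
m_final, n_final = evolve_snake(X, 50 + 45 * np.sin(s), 50 + 45 * np.cos(s))
print(np.hypot(m_final - 50, n_final - 50).mean())  # mean distance from the disk center
```

In this sketch the internal energy is handled implicitly (the matrix inverse is computed once, since $A$ does not change), which keeps the iteration stable, while the image force is evaluated explicitly at the previous positions. The tension term alone would keep shrinking the contour; it stops near the disk boundary only because the image force balances it there.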
Active contours have the advantage of being autonomous and self-adapting in their
search for the minimal energy contour. While the local optimization properties of
snakes are sometimes desirable, they can also cause the snake to get stuck in a local
minimum. In [43], the authors comment on the advantages and disadvantages of
energy approaches to deforming contours and provide an extended review of the
literature on snakes.
3.1.2 Image Correction
Several methods [6, 57, 71] have been shown to be robust in correcting local illumination
changes. Most of these methods adjust the pixel intensity values of the image using a
nonlinear mapping function for illumination correction, based on the estimated local
illumination at each pixel location, and combine the adjusted illumination image
with the reflectance image to generate an output image. The extent of possible image
correction and editing ranges from replacing or mixing a region with another source
image region to locally altering some aspects of the original image, such as
illumination or color. Since these methods can be used to locally modify image
characteristics, we aim to use them to correct local specular highlights observed in
tympanic membrane images.
One useful image correction method is Poisson image editing [57], which can be used
to correct regions in an image in a seamless manner. The main idea is to fill the
target region with pixel values obtained by interpolating the pixel values along the
boundary of the target region. We are interested in achieving local changes in
the regions of specular highlights in tympanic membrane images. Here, we briefly
discuss the Poisson image editing technique.
In an image $X$, let $\Omega$ be a closed region with boundary $\partial\Omega$. Let $f$
be an unknown scalar function defined over $\Omega$, let $f^*$ be a known scalar function
defined on $X$ minus the interior of $\Omega$, and let $v$ be a vector field defined over
$\Omega$. For each pixel $p = (m,n)$ in $X$, let $N_p$ be the set of its 4-connected
neighbors in $X$, and let $\langle p, q\rangle$ denote a pixel pair such that $q \in N_p$.
The boundary of $\Omega$ is then given by
$$\partial\Omega = \{\, p \in X \setminus \Omega : N_p \cap \Omega \neq \emptyset \,\}.$$
The value of the function $f$ at a pixel $p$ is denoted by $f_p$. The task is to compute
the set of intensities $f|_\Omega = \{ f_p : p \in \Omega \}$. This is achieved by solving the
minimization problem
$$\min_{f|_\Omega} \sum_{\langle p,q\rangle \cap \Omega \neq \emptyset} \left( f_p - f_q - v_{pq} \right)^2, \quad \text{with } f_p = f^*_p \ \ \forall p \in \partial\Omega,$$
where $v_{pq}$ is the projection of $v\big(\tfrac{p+q}{2}\big)$ on the oriented edge $[p, q]$.
The solution to this minimization problem is obtained by solving the simultaneous
linear equations
$$\forall p \in \Omega, \quad |N_p|\, f_p - \sum_{q \in N_p \cap \Omega} f_q = \sum_{q \in N_p \cap \partial\Omega} f^*_q + \sum_{q \in N_p} v_{pq}.$$
In Chapter 5, we will see examples of tympanic membrane images that are cor-
rected using this technique to mitigate the problem of local specular highlights in the
image.
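The linear system above can be solved in many ways. As an illustration only, the sketch below uses plain Jacobi iteration in NumPy (a simple but slow solver chosen here for brevity; a sparse direct solver would converge much faster). It assumes 4-connectivity, a region $\Omega$ that does not touch the image border, and a guidance field of the form $v_{pq} = g_p - g_q$ derived from a hypothetical guidance image $g$. With $g = 0$ (the default here), the fill reduces to smooth interpolation of the boundary values $f^*$, which is one plausible choice for covering specular highlights; this section does not specify which guidance field is used for the tympanic membrane images.

```python
import numpy as np

# (shift, axis) pairs that bring each 4-connected neighbor q onto pixel p
NEIGHBOR_SHIFTS = [(1, 0), (-1, 0), (1, 1), (-1, 1)]


def neighbor_sum(A):
    """Sum of A_q over q in N_p for every pixel p (4-connectivity)."""
    return sum(np.roll(A, shift, axis=axis) for shift, axis in NEIGHBOR_SHIFTS)


def poisson_fill(X, mask, guide=None, iters=2000):
    """Solve |N_p| f_p - sum_{N_p in Omega} f_q = sum_{N_p on dOmega} f*_q + sum v_pq.

    X     : 2-D image; pixels outside the mask supply the known values f*.
    mask  : boolean array marking Omega; it must not touch the image border.
    guide : optional guidance image g with v_pq = g_p - g_q; None means v = 0.
    """
    f = X.astype(float).copy()
    guide = np.zeros_like(f) if guide is None else np.asarray(guide, dtype=float)
    div_v = 4.0 * guide - neighbor_sum(guide)   # sum over q of v_pq at every pixel
    for _ in range(iters):
        # Jacobi update; neighbors outside Omega keep their fixed values f*.
        f[mask] = (neighbor_sum(f)[mask] + div_v[mask]) / 4.0
    return f


# Usage: a linear ramp with a saturated square "highlight". A linear ramp
# satisfies the discrete Laplace equation, so the fill should recover it.
mm, nn = np.mgrid[0:50, 0:50]
ramp = mm + 2.0 * nn
mask = np.zeros(ramp.shape, dtype=bool)
mask[20:32, 20:32] = True
corrupted = ramp.copy()
corrupted[mask] = 255.0
filled = poisson_fill(corrupted, mask)
print(np.abs(filled - ramp).max())
```

Only the pixels inside the mask change; everything outside acts as the Dirichlet boundary condition $f_p = f^*_p$ from the minimization problem.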
3.2 Classification
Every time we open our eyes and look, we effortlessly perform a visual feat far beyond
the capability of today’s most sophisticated computers, though well within the ca-
pability of a kindergartner. This feat is pattern recognition, a typical human ability
that plays an important role in everyday life in reading texts, identifying people, or
even finding a way home. It is this very ease with which we perform these tasks that
belies our “superior” pattern recognition ability.
As discussed in the previous chapter, there is a crucial need for an automated
classification system. In the particular case of diagnosing otitis media, expert
otoscopists rely on their training and experience to distinguish among the three
diagnostic categories. Years of experience allow them to assign each otitis media case
to a diagnostic category to the best of their knowledge. Such manual processing has
its limitations, however, prompting the need for computer-aided diagnosis.
3.2.1 Overview of Classification Methods
Distinguishing diagnostic categories of otitis media from tympanic membrane images
is an image classification task, which we now formally define. Let us assume that a
digital image $X \in \mathbb{R}^{M\times N}$ of the tympanic membrane is stored as a
two-dimensional array and can be represented as a function of two variables $(m,n)$.
Classification can be defined as a mapping from the space of input images
$\mathbb{R}^{M\times N}$ to the output space $Y = \{1, 2, \ldots, C\}$ of class labels.
To reduce the dimensionality of the problem from $M \times N$ to $k$, where
$k \ll M \times N$, a feature extractor is defined as a map
$f : \mathbb{R}^{M\times N} \to F$ from the input space to the feature space
$F = \mathbb{R}^k$. This is followed by a classifier, defined as a map from the feature
space $F$ to the output space of class labels $Y$, $c : F \to Y$. The entire classification
is then the composition of these two maps, $s = c \circ f$.
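The composition $s = c \circ f$ can be illustrated with a deliberately tiny example. Both maps below are hypothetical stand-ins chosen only to show the structure: $f$ reduces an $M \times N$ image to $k = 2$ features (mean and standard deviation of intensity), and $c$ is a nearest-centroid rule with made-up centroids. Neither is a feature set or classifier used in this thesis.

```python
import numpy as np


def compose(c, f):
    """Return the classification map s = c o f (apply f, then c)."""
    return lambda X: c(f(X))


def f(X):
    """Hypothetical feature extractor R^{M x N} -> R^k with k = 2."""
    return np.array([X.mean(), X.std()])


# Hypothetical class centroids in feature space, one row per label 1..C (C = 2)
CENTROIDS = np.array([[0.2, 0.05],
                      [0.8, 0.05]])


def c(x):
    """Hypothetical nearest-centroid classifier R^k -> Y = {1, ..., C}."""
    return int(np.argmin(np.linalg.norm(CENTROIDS - x, axis=1))) + 1


s = compose(c, f)
dark = np.full((64, 64), 0.15)     # M x N = 64 x 64 input, reduced to k = 2
bright = np.full((64, 64), 0.85)
print(s(dark), s(bright))          # prints: 1 2
```

Keeping $f$ and $c$ as separate maps means either one can be replaced (for example, a different feature set with the same classifier) without touching the other.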
3.2.2 Feature Extraction
Feature extraction is the process of defining a set of measurements on image
characteristics that most efficiently or meaningfully represent the information that
is important for analysis and classification. This is the most critical step in a
classification pipeline, since the features made available directly influence the
efficacy of classification.
Feature extraction techniques try to capture intuitive visual attributes of the image,
such as its composition, the placement of objects, spatial relationships between
objects, color, contrast, and patterns. Most common feature extraction techniques
seek to capture visual properties of an image such as edges [9, 45, 76], color [30, 33],
texture [22, 31, 39, 44], and shape [43, 74]. We focus here specifically on the following
two general feature sets, which were used in building our initial otitis media classifiers:
Haralick Texture Features
These features are designed based on the assumption that the texture information
in an image is contained in the overall or average spatial relationship of adjacent
gray-level pixels. Haralick texture features are calculated from four gray-level
co-occurrence matrices [22, 23], one for each of the four directions of adjacency:
horizontal, vertical, and the left and right diagonals. Each element [i, j] of such a
matrix is obtained by counting the number of times a pixel with value i appears
adjacent to a pixel with value j in the given direction. After normalization by the total
count, each entry can be considered as the probability that a pixel of value i will be
found adjacent to a pixel of value j. To describe the texture of the image, 14 statistical
measures are calculated from these matrices; together they make up the Haralick
texture feature set.
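A minimal NumPy sketch of the co-occurrence computation described above follows. It assumes the image has already been quantized to integer gray levels $0, \ldots, \text{levels}-1$, uses distance-1 offsets for the four directions (0°, 45°, 90°, 135°), counts each adjacent pair in both orders (a symmetric matrix), and computes only four of the 14 Haralick measures, averaged over the four directions. The helper names and the averaging are illustrative choices, not the exact feature code used in this thesis.

```python
import numpy as np


def glcm(X, offset, levels):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset.

    X must hold integer gray levels in {0, ..., levels - 1}.
    offset = (dm, dn) pairs each pixel p with its neighbor q = p + offset.
    """
    dm, dn = offset
    M, N = X.shape
    m0, m1 = max(0, -dm), min(M, M - dm)
    n0, n1 = max(0, -dn), min(N, N - dn)
    p = X[m0:m1, n0:n1]                          # reference pixels
    q = X[m0 + dm:m1 + dm, n0 + dn:n1 + dn]      # their neighbors
    P = np.zeros((levels, levels))
    np.add.at(P, (p.ravel(), q.ravel()), 1)      # count (i, j) pairs
    P = P + P.T                                  # count both orders
    return P / P.sum()                           # counts -> probabilities


# Offsets for 0 (horizontal), 45 (diagonal), 90 (vertical), 135 (diagonal) degrees
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]


def haralick_subset(X, levels=8):
    """Four of Haralick's 14 measures, averaged over the four directions."""
    i, j = np.indices((levels, levels))
    per_direction = []
    for offset in OFFSETS:
        P = glcm(X, offset, levels)
        nz = P[P > 0]
        per_direction.append([
            (P ** 2).sum(),                      # angular second moment
            ((i - j) ** 2 * P).sum(),            # contrast
            (P / (1.0 + (i - j) ** 2)).sum(),    # inverse difference moment
            -(nz * np.log(nz)).sum(),            # entropy
        ])
    return np.mean(per_direction, axis=0)


# Usage on two tiny 16 x 16 images with integer gray levels.
const = np.full((16, 16), 3)                     # flat texture
checker = np.indices((16, 16)).sum(axis=0) % 2   # alternating 0/1 pixels
print(haralick_subset(const))
print(haralick_subset(checker))
```

For the checkerboard, horizontal and vertical neighbors always differ while diagonal neighbors always match, so the contrast averaged over the four directions is 0.5; this illustrates why the direction of adjacency matters.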
Scale Invariant Feature Transform
The scale invariant feature transform (SIFT) [40, 41] of an image is a set of local
feature vectors. A local region in an image is described by its center coordinates, the
radius of the region, its orientation in radians, and a histogram of gradients. Each
of these feature vectors is invariant to scaling, rotation, and translation of the image.
The SIFT features are extracted in a four-step filtering approach:
1. Scale-Space Extrema Detection. In this stage, the image is filtered using a
scale-space function to detect locations (keypoints) and scales that are identifiable
from different views of the same object. The scale space is defined by the function
$$L(m,n,\sigma) = G(m,n,\sigma) * X(m,n),$$
where $*$ is the convolution operator, $G(m,n,\sigma)$ is a Gaussian whose scale is varied
by the parameter $\sigma$, and $X(m,n)$ is the input image. To locate keypoints in the
scale space, the difference of Gaussians (DoG) is used; it is obtained by computing
the difference between two images whose scales differ by a constant factor $a$:
$$D(m,n,\sigma) = L(m,n,a\sigma) - L(m,n,\sigma).$$
To detect the local maxima and minima of $D(m,n,\sigma)$, each sample point is
compared with its 8 neighbors at the same scale and its 9 neighbors at each of the
scales above and below.
2. Keypoint Localization. Keypoints that have low contrast or are poorly localized
along an edge are discarded. Low-contrast keypoints are removed by comparing the
absolute value of the DoG scale space at the peak with a threshold and discarding
the peak if its value falls below the threshold.
3. Orientation Assignment. Once a peak is identified in the DoG scale space, its orientation is computed from a histogram of gradient orientations within a Gaussian-weighted window whose width is 1.5 times the scale σ of the keypoint. This histogram is then smoothed, and its maximum gives the dominant orientation.
4. Keypoint Descriptor. The SIFT descriptor is a spatial histogram of image gradients. It consists of 16 histograms arranged on a 4 × 4 grid around the keypoint, each with 8 orientation bins spaced evenly over [0, 2π), resulting in a feature vector with 16 × 8 = 128 elements.
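Steps 1 and 2 can be sketched in Python with SciPy's `gaussian_filter`. This is a simplified, single-octave illustration under stated assumptions (σ0 = 1.6, a = √2, a contrast threshold of 0.03 for intensities in [0, 1], no subpixel refinement, no edge rejection, and no orientation or descriptor stage); it is not a full SIFT implementation, for which an existing library implementation such as OpenCV's would normally be used.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_keypoints(image, sigma0=1.6, a=2 ** 0.5, num_scales=5, threshold=0.03):
    """Candidate keypoints (m, n, sigma) from one octave of a DoG scale space."""
    image = np.asarray(image, dtype=float)
    sigmas = [sigma0 * a ** s for s in range(num_scales)]
    # L(m, n, sigma) = G(m, n, sigma) * X(m, n)
    blurred = [gaussian_filter(image, sigma) for sigma in sigmas]
    # D(m, n, sigma) = L(m, n, a * sigma) - L(m, n, sigma)
    dog = np.stack([blurred[s + 1] - blurred[s] for s in range(num_scales - 1)])
    keypoints = []
    for s in range(1, dog.shape[0] - 1):
        for m in range(1, image.shape[0] - 1):
            for n in range(1, image.shape[1] - 1):
                value = dog[s, m, n]
                # Step 2 (contrast test only): discard weak peaks.
                if abs(value) < threshold:
                    continue
                # Step 1: 8 neighbors at this scale plus 9 at each adjacent scale.
                cube = dog[s - 1:s + 2, m - 1:m + 2, n - 1:n + 2].ravel()
                neighbors = np.delete(cube, 13)  # index 13 is the sample itself
                if value > neighbors.max() or value < neighbors.min():
                    keypoints.append((m, n, sigmas[s]))
    return keypoints
```

For a bright Gaussian blob, the DoG response at the blob center is a strong minimum at the scale closest to the blob size, so the center is reported as a keypoint.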
Expert Classification Features
Unlike general-purpose features, which apply to a wide variety of problems, features can also be designed for a specific application, such as classification of human faces, fingerprints, documents, natural images, or medical images. Efforts have been made to design physiologically meaningful features that mimic the visual cues experts use in their evaluation process. Examples of such application-specific features include a histopathology vocabulary for delineating tissues in images of H&E-stained teratomas [3] and similar vocabulary features used in [46] for automated detection of colitis.
3.2.3 Clustering and Classification Problems
In this thesis, we focus mostly on supervised learning methods. Here we briefly discuss one unsupervised learning method that we used in early attempts to build a classification system, and then introduce other standard classifiers.
K-means Clustering
Clustering groups a set of data points (feature vectors) into non-overlapping partitions, or clusters, so that members of a cluster are “more similar” to one another than to members of other clusters. “More similar” is defined by some measure of proximity between points. Once the data are clustered, every point is assigned to some cluster, and each cluster can be characterized by a single reference point, usually the average of the points belonging to it.
K-means clustering [42] is one of the simplest and most popular unsupervised learning algorithms. Since no labeled training data are available, the algorithm learns the structure of the data through repeated iterations. Consider the data set $x_1, x_2, \ldots, x_T$. The main idea is to group these $T$ points into a predefined number $K$ of clusters, each represented by its center, obtained by averaging all the points $x_t$ assigned to that cluster. Let $\mu_1, \mu_2, \ldots, \mu_K$ be the cluster centers, initialized randomly. Each data point $x_t$ is assigned to the cluster whose center is nearest. Once all the points in the data set are assigned, the cluster centers are recomputed, and the two steps are repeated until the assignments no longer change. The algorithm thereby aims to minimize the objective function
$$J = \sum_{t=1}^{T} \sum_{k=1}^{K} a_{tk} \, \lVert x_t - \mu_k \rVert^2,$$
where $a_{tk}$ is an indicator function with $a_{tk} = 1$ if $x_t$ is assigned to cluster $k$ and $a_{tk} = 0$ otherwise.
Although clustering does not require labeled training data, K-means does require initial cluster centers, and its result is known to be sensitive to this initialization.
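The alternating assign-and-update procedure can be sketched in Python with NumPy (Lloyd's iterations). The sketch assumes Euclidean distance and initializes the centers with K distinct data points chosen at random, one common choice; the function name and parameters are illustrative.

```python
import numpy as np

def kmeans(x, k, num_iters=100, seed=0):
    """Lloyd's algorithm. x has shape (T, d); returns centers (K, d), labels (T,), and J."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with K distinct data points chosen at random.
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(num_iters):
        # Assignment step: squared distance from every x_t to every mu_k.
        sq_dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)  # a_tk = 1 for the nearest center
        # Update step: each center moves to the mean of its assigned points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    sq_dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = sq_dists.argmin(axis=1)
    objective = float(sq_dists[np.arange(len(x)), labels].sum())  # J
    return centers, labels, objective
```

Because the result depends on the initialization, a common practice is to run several seeds and keep the solution with the smallest J.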
Correlation Filters
Correlation filters have traditionally been designed to distinguish patterns from each other and from the background. In this section, we give a high-level overview of correlation filter theory; for a comprehensive survey of correlation filters, we refer the reader to [35]. A correlation filter can be viewed as an array in the spatial-frequency domain, or equivalently a template in the image domain, learned from training data that are representative of the desired class of pattern or object. The template is compared to a query image by cross-correlation, that is, by evaluating their similarity at every relative shift between the template and the query. This can be computed efficiently in the frequency domain (u, v) as
C(u, v) = X(u, v)H∗(u, v),
where X(u, v) is the 2D Fourier transform (FT) of the query pattern, H(u, v) is the correlation filter (the 2D FT of the template), and C(u, v) is the 2D FT of the correlation output c(m,n). Here ∗ denotes complex conjugation.
Once such a template is learned, it can be used as a simple and effective classifier. The main idea is that a correlation filter should ideally produce a sharp peak at the center of the correlation output c(m,n) (obtained by the 2D inverse Fourier transform of C(u, v)) for the authentic (true) class and no such peak for an impostor (false) class. Attractive properties of correlation filters are shift invariance, robustness to noise, graceful degradation of the response under occlusion, and, in some cases, simple closed-form solutions. Among the many types of correlation filters, the most commonly used are minimum average correlation energy filters, maximum average correlation height filters, and quadratic correlation filters. In Chapter 7, we show how correlation filters are used to learn templates of tympanic membranes and to classify the images into three distinct diagnostic categories.
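The frequency-domain computation of c(m,n) can be sketched with NumPy's FFT routines. The sketch only evaluates the correlation output for a given template (zero-padded to the query size, which makes the correlation circular); learning H(u, v) from training data, as MACE or MACH filters do, is not shown, and the function name is hypothetical. The peak appears at the relative shift of the pattern; applying `np.fft.fftshift` moves zero shift to the center of the array.

```python
import numpy as np

def correlation_output(query, template):
    """c(m, n) = IFFT2{ X(u, v) H*(u, v) }: circular cross-correlation of query and template."""
    X = np.fft.fft2(query)
    H = np.fft.fft2(template, s=query.shape)  # zero-pad the template to the query size
    return np.real(np.fft.ifft2(X * np.conj(H)))

# A random 8 x 8 template hidden in an otherwise empty 32 x 32 query at offset (5, 7).
rng = np.random.default_rng(0)
template = rng.random((8, 8))
query = np.zeros((32, 32))
query[5:13, 7:15] = template
c = correlation_output(query, template)
peak = np.unravel_index(np.argmax(c), c.shape)  # location of the sharp peak
```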
Support Vector Machines
Support Vector Machine (SVM) learning algorithms are among the most popular “off-the-shelf” supervised classification methods. The key idea of the SVM is to learn a decision boundary that maximizes its distance to the closest points of either class. To find this optimal boundary, the optimization problem is usually transformed, using duality, into an equivalent problem that is often easier to solve. The data are also typically projected into a higher-dimensional space by a nonlinear mapping in which a linear boundary may separate the classes; in the dual problem, this mapping enters only through inner products, which can be evaluated with a kernel function.
Let us consider a binary classification problem with training data
(x1, y1), (x2, y2), . . . , (xT , yT ),
where yt ∈ {−1, +1} denotes the class label of the training example xt. The goal of the SVM algorithm is to construct a hyperplane that maximally separates the data. For a binary classification problem, this corresponds to finding a hyperplane such that one side contains all the examples labeled yt = −1 and the other side contains all the examples labeled yt = +1. When the problem contains more than two classes, it is reduced to a number of simpler binary classification problems. Once this hyperplane is constructed from the training data, the algorithm classifies a test point by checking on which side of the hyperplane it lies. There may be infinitely many separating hyperplanes; in that case, the optimal decision boundary is the one that maximizes the classification margin, which gives new, unseen data points the best chance of falling on the correct side of the boundary.
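Since we are seeking a decision boundary that is linear in the transformed space, the classifier has the form
$$h(x) = \operatorname{sign}\big(w^{T}\phi(x) + b\big),$$
where $w$ is a vector of weights, $b$ is a bias (offset) term, and $\phi$ is the transformation of the original data into a new space in which the data are better separated. For the canonical separating hyperplane, the margin equals $2/\lVert w \rVert$, so maximizing the margin amounts to minimizing $\lVert w \rVert$.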
An appealing feature of SVM classification is that the decision boundary is sparsely represented. The separating hyperplane is a weighted combination of the training data: training points far from the decision boundary receive zero weights, while those close to it receive nonzero weights. The training points of opposite classes closest to the decision boundary are called the support vectors. Because training points far from the separating hyperplane do not influence the decision boundary, SVMs are robust to such points. This property helps reduce overfitting and has made SVMs popular and widely used for classification problems.
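A minimal sketch of a linear SVM (φ(x) = x) in Python with NumPy. Instead of the dual problem described above, it uses a swapped-in technique: Pegasos-style stochastic subgradient descent on the primal soft-margin objective. As a simplification, the bias b is folded into w as an extra constant feature and is therefore also regularized; all names are hypothetical.

```python
import numpy as np

def add_bias(x):
    """Append a constant 1 so that b is learned as the last entry of w."""
    return np.hstack([x, np.ones((x.shape[0], 1))])

def train_linear_svm(x, y, lam=0.01, epochs=200, seed=0):
    """Stochastic subgradient steps on lam/2 ||w||^2 + mean_t max(0, 1 - y_t w^T x_t)."""
    rng = np.random.default_rng(seed)
    xb = add_bias(x)
    w = np.zeros(xb.shape[1])
    step = 0
    for _ in range(epochs):
        for t in rng.permutation(len(xb)):
            step += 1
            eta = 1.0 / (lam * step)  # decreasing step size
            active = y[t] * (w @ xb[t]) < 1  # inside the margin or misclassified
            w *= 1.0 - eta * lam  # shrinkage from the regularizer
            if active:
                w += eta * y[t] * xb[t]  # hinge-loss subgradient step
    return w

def predict(x, w):
    """h(x) = sign(w^T x + b), with b stored in w[-1]."""
    return np.sign(add_bias(x) @ w)
```

Only the points with active hinge terms ever contribute to w, which mirrors the sparsity of the support-vector representation discussed above.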
Neural Networks
Neural networks (NN) [61] were built on the principle of modeling the learning and adaptation processes of the human brain and are thus loosely similar to their biological counterparts. One efficient and often employed way of solving a complex problem is to follow the principle of “divide and conquer”: split the complex problem into numerous simple problems. Networks are one approach to reducing a complex problem into simple components, defined by a set of building blocks and the connections between them.
NNs are an example of such networks, in which the building blocks are computational units called “neurons” and the connections between them are characterized by weights. The network is made up of interconnected neurons that work in parallel and learn and adapt from the training data. The neurons are organized in layers: the neurons in one layer are connected to those in the adjacent layer, and each connection is assigned a weight. Each neuron receives multiple inputs and computes a weighted sum of its input signals, called its activation:
$$a_j = \sum_{i=1}^{N} x_i w_{ij} + w_{j0},$$
where $a_j$ is the activation of neuron $j$, $N$ is the number of inputs, $w_{ij}$ is the weight assigned to the connection between neurons $i$ and $j$, and $w_{j0}$ is the bias of neuron $j$. Each activation is then transformed by a nonlinear activation function $h(\cdot)$, such as a hard threshold, a sigmoid, or a tan-sigmoid function, to produce the output
$$y_j = h(a_j),$$
which is passed on as an input to the next layer.
During learning, the weights of the network are adjusted to achieve the best performance in classifying the training data. Depending on the nature of the problem and the training data, NNs can learn the data quite well and are able to model nonlinear relationships in the data. Many methods have been devised to train an NN, the most popular being backpropagation [61], in which the error between the desired and the actual output is propagated backward through the network and the weights in each layer are adjusted to reduce it. One of the main disadvantages of NNs is their “black-box” nature: the internal interactions between the neurons and the layers are difficult to interpret. NNs also require a large amount of training data to be trained properly and to produce reliable predictions on unseen data.
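A minimal sketch of the forward pass and backpropagation for a network with one hidden layer, in Python with NumPy. It assumes sigmoid activations in both layers and the squared error E = ½‖y − target‖² as the training criterion; the function names are hypothetical. The gradients can be checked against finite differences, which is a standard way to validate a backpropagation implementation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, w1, b1, w2, b2):
    """One hidden layer. x: (N,), w1: (N, H), b1: (H,), w2: (H, O), b2: (O,)."""
    a1 = x @ w1 + b1   # a_j = sum_i x_i w_ij + w_j0 for every hidden neuron j
    y1 = sigmoid(a1)   # y_j = h(a_j)
    a2 = y1 @ w2 + b2  # the hidden outputs are the inputs of the next layer
    y2 = sigmoid(a2)
    return y1, y2

def backprop(x, target, w1, b1, w2, b2):
    """Gradients of E = 0.5 * ||y2 - target||^2 with respect to all weights and biases."""
    y1, y2 = forward(x, w1, b1, w2, b2)
    delta2 = (y2 - target) * y2 * (1.0 - y2)  # dE/da at the output layer
    delta1 = (w2 @ delta2) * y1 * (1.0 - y1)  # error propagated back to the hidden layer
    return np.outer(x, delta1), delta1, np.outer(y1, delta2), delta2

def train_step(x, target, params, lr=0.5):
    """One gradient-descent update of (w1, b1, w2, b2)."""
    grads = backprop(x, target, *params)
    return tuple(p - lr * g for p, g in zip(params, grads))
```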
Random Forests
A Random Forest (RF) consists of a collection, or ensemble, of simple tree predictors that collectively assign a single label to a given set of features. The final output label is the class chosen by the most trees in the ensemble, obtained by majority voting. The random forest method [8] combines two main ideas: bagging [7] and random-subspace sampling [25].
A random forest classifier can be defined as a collection of tree-structured classifiers, each built on a bootstrapped version of the training data. Bootstrapping is a fundamental resampling method built on the idea that the true distribution $F$ can be estimated by the so-called empirical distribution $\hat{F}$.
Let the training data be $(x_i, y_i)$, $i = 1, 2, \ldots, T$; then the empirical distribution function can be written as the discrete probability function
$$P_{\hat{F}}(x, y) = \begin{cases} \dfrac{1}{T}, & \text{if } (x, y) = (x_i, y_i) \text{ for some } i, \\ 0, & \text{otherwise.} \end{cases} \tag{3.1}$$
A bootstrap sample of size $T$ built from the training data is
$$(x_i^*, y_i^*), \quad i = 1, 2, \ldots, T,$$
where each $(x_i^*, y_i^*)$ is drawn uniformly at random, with replacement, from the training set. This corresponds to $T$ independent draws from $\hat{F}$, which approximates the true distribution $F$.
Each tree in the forest is built on a bootstrapped sample, and the split at each node is made on the best feature from a random subset of features. The final output label is assigned by a majority vote of all the individual classifiers in the ensemble.
There are two sources of randomness. First, bootstrapping: the T data samples are selected at random with replacement. Second, at each node a subset of the features is randomly sampled from the complete feature set, and the node is split using the best feature in this subset. Introducing randomness into the growth of the trees is expected to produce very dissimilar trees, although this is not guaranteed.
Each tree is grown to its maximum depth without pruning. An estimate of the error can be obtained from the training data themselves by predicting the labels of the samples not included in a tree's bootstrap sample; on average, about a third of the samples (a fraction close to 1/e) are left out of each bootstrap sample. This is called the out-of-bag (OOB) error. The OOB predictions are computed for each tree in the forest, and the error of the entire ensemble is obtained by aggregating them over all the trees.
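The bootstrap draw, its out-of-bag complement, and the final majority vote can be sketched in Python with NumPy. The tree learner itself is not shown; the functions below are hypothetical helpers that operate on sample indices and on an array of per-tree label predictions, assuming labels are nonnegative integers.

```python
import numpy as np

def bootstrap_sample(num_samples, rng):
    """Draw T indices uniformly with replacement (in-bag); the indices never drawn are out-of-bag."""
    in_bag = rng.integers(0, num_samples, size=num_samples)
    out_of_bag = np.setdiff1d(np.arange(num_samples), in_bag)
    return in_bag, out_of_bag

def majority_vote(tree_predictions):
    """tree_predictions: (num_trees, num_samples) integer labels -> most frequent label per sample."""
    return np.array([np.bincount(column).argmax() for column in tree_predictions.T])

rng = np.random.default_rng(0)
in_bag, out_of_bag = bootstrap_sample(10000, rng)
oob_fraction = len(out_of_bag) / 10000  # close to (1 - 1/T)^T, which tends to 1/e
```

To obtain the OOB error, each training sample would be classified by a majority vote over only those trees for which it is out-of-bag, and the misclassification rate computed over all samples.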
Another advantage of this method is that it provides a measure of feature importance. Understanding the importance of the features used to build the trees helps avoid overfitting, improves model performance, and gives deeper insight into the underlying nature of the data. Different criteria have been used to measure feature importance; the most commonly used in RF classifiers is based on the Gini index, which is also used during training to choose the feature on which each node is split. For example, for a binary split, let $p$ be the fraction of positive samples and $1 - p$ the fraction of negative samples at a node. The impurity $G$ measured by the Gini index at that node is
$$G = 2p(1 - p).$$
The purer a node is, the smaller its Gini index. Every time a node is split, the feature from the random subset that yields the purest child nodes is selected; summing the resulting impurity decreases over all nodes and trees in which a feature is used gives that feature's Gini importance.
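The Gini index and the choice of a split can be sketched in Python with NumPy. The sketch uses the general multiclass form $1 - \sum_k p_k^2$, which reduces to $2p(1-p)$ for two classes, and searches thresholds on a single feature, weighting each child's impurity by its size; in a random forest this search would be repeated over the randomly sampled feature subset. The function names are hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2; for two classes this equals 2p(1 - p)."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def best_split(feature, labels):
    """Threshold on one feature that minimizes the size-weighted Gini impurity of the children."""
    best_threshold, best_impurity = None, np.inf
    for threshold in np.unique(feature)[:-1]:  # the largest value would leave the right child empty
        left = labels[feature <= threshold]
        right = labels[feature > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity
```

The decrease from the parent's impurity to `best_impurity`, accumulated over the nodes where a feature is chosen, is the quantity behind the Gini importance.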
Bagging reduces the variance of the ensemble and thereby improves its overall generalization error. Unlike classical decision trees, random forests do not overfit as more trees are added, since their generalization error converges as the number of trees grows [8]; hence there is no need to prune the trees. The complete method is governed by only two parameters: the number of trees in the ensemble and the number of features randomly sampled at each node.
Rule-Based Classifiers
Systems built on a set of rules have many desirable properties. They are easy to understand, and the rules can be based on prior domain knowledge. Rule-based methods provide a comprehensible description of a system instead of a black-box prediction. A rule set is useful if its rules are few, understandable, and predict unseen data with high accuracy. Different types of rules are used to express the nature of the data; the most common are classical propositional rules, association rules, fuzzy logic rules, threshold rules, and similarity-based rules [5, 48]. Here, we briefly discuss classical logic rules and fuzzy rules, which are used later in this thesis.
Classical logic rules are the simplest type of rule, formed using a logical proposition. This type of rule-based classifier uses a collection of “if . . . then . . . ” rules to classify the data. The classifier is governed by a set of rules $R = \{r_1, r_2, \ldots, r_k\}$, where each $r_i$ is a classification rule that can be expressed as
$$r_i : (\text{condition}_i) \mapsto y_i.$$
The left-hand side of the rule is called the rule antecedent or rule condition and is made up of tests on the feature values. The right-hand side, called the consequent, contains the class label $y_i$. A rule condition is generally of the form
$$\text{condition}_i = (F_1 \; \mathrm{op}_1 \; V_1) \otimes (F_2 \; \mathrm{op}_2 \; V_2) \otimes \cdots \otimes (F_k \; \mathrm{op}_k \; V_k),$$
where each $(F_j \; \mathrm{op}_j \; V_j)$ is a feature-value test, with $\mathrm{op}_j$ usually chosen from the set of relational operators (such as $=$, $<$, or $\geq$). The results of the individual tests are then combined using an operator $\otimes$ chosen from the set of logical operators (such as AND or OR).
The advantage of such predefined classical logic rules is that they give a simple description of the data in terms of functions defined on combinations of features. Their drawback is that they partition the feature space into hyperboxes, giving an abrupt, step-wise approximation of the decision boundaries.
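A minimal sketch of a classical rule-based classifier in Python. It assumes the tests within a condition are combined with AND only, and that rules are checked in order with the first satisfied rule deciding the class (an ordered decision list, one common convention); the feature names, thresholds, and function names are hypothetical.

```python
import operator
from typing import Callable, Dict, List, Optional, Tuple

# One feature-value test (F_j op_j V_j), e.g. ("a", operator.gt, 0.5).
FeatureTest = Tuple[str, Callable[[float, float], bool], float]
# One rule r_i: tests combined with AND, and the consequent class label y_i.
Rule = Tuple[List[FeatureTest], str]

def condition_holds(tests: List[FeatureTest], features: Dict[str, float]) -> bool:
    return all(op(features[name], value) for name, op, value in tests)

def classify(rules: List[Rule], features: Dict[str, float],
             default: Optional[str] = None) -> Optional[str]:
    """Return y_i of the first rule whose antecedent is satisfied."""
    for tests, label in rules:
        if condition_holds(tests, features):
            return label
    return default

rules = [
    ([("a", operator.gt, 0.5), ("b", operator.le, 2.0)], "class 1"),  # IF a > 0.5 AND b <= 2 THEN class 1
    ([("a", operator.le, 0.5)], "class 2"),                           # IF a <= 0.5 THEN class 2
]
```

Each rule carves out an axis-aligned box in feature space, which is exactly the step-wise boundary described above.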
Fuzzy Logic
Fuzzy set theory, closely related to possibility theory, allows dealing with vague or inexact facts and is useful for data-mining systems that perform rule-based classification. The concept was introduced in [75]. Fuzzy logic can be seen as a multivalued logic that allows intermediate values between conventional crisp evaluations such as true/false, yes/no, or high/low. The term “fuzzy” suggests an imprecise boundary rather than an abrupt one. Unlike traditional hard computing, fuzzy logic accommodates the imprecision of the real world by allowing for soft computing. It is a fascinating area of research because it reaches a good trade-off between significance and precision, something humans manage well.
We now formally define a fuzzy set and fuzzy membership functions. Consider a classical set $A$ of elements of a universe $X$; the membership or nonmembership of each element $x$ is defined by a membership function $\gamma_A(x)$, which can be seen as a mapping
$$\gamma_A : X \to \{0, 1\}.$$
Here $\gamma_A(x)$ takes either the value 0 or 1, representing the truth value of the statement that $x$ belongs to $A$. Fuzzy set theory instead allows partial membership: with $X$ the universe of discourse, a fuzzy set $A$ is characterized by its membership function $\gamma_A$, defined as a mapping
$$\gamma_A : X \to [0, 1].$$
The most commonly used membership functions are triangular, trapezoidal, Gaussian, sigmoidal, and piecewise linear. The choice of membership function depends mostly on the application and on the available domain knowledge of the problem.
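Three of these membership functions can be sketched in Python with NumPy, each mapping a feature value into [0, 1]. The parameter names (breakpoints a, b, c, d; center and sigma) are illustrative choices.

```python
import numpy as np

def triangular(x, a, b, c):
    """Rises linearly from 0 at a to 1 at b, then falls back to 0 at c (a < b < c)."""
    x = np.asarray(x, dtype=float)
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)

def trapezoidal(x, a, b, c, d):
    """Like triangular, but equal to 1 on the plateau [b, c] (a < b <= c < d)."""
    x = np.asarray(x, dtype=float)
    rising = (x - a) / (b - a)
    falling = (d - x) / (d - c)
    return np.clip(np.minimum(np.minimum(rising, 1.0), falling), 0.0, 1.0)

def gaussian(x, center, sigma):
    """Smooth bell-shaped membership with its maximum of 1 at x = center."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)
```

Unlike the crisp indicator $\gamma_A : X \to \{0, 1\}$, each function returns a graded degree of membership, for example 0.5 halfway up the rising edge of a triangle.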
Fuzzy classification systems are an application of fuzzy set theory. Expert knowledge can be expressed naturally using linguistic variables, described by fuzzy sets as above. For example, expert knowledge can be translated into a fuzzy rule such as
IF feature A is low AND feature B is medium THEN output = class 1.
Each rule has the basic “if...then..” form, with the output class as its consequent. The final output is assigned by combining the decisions of the individual rules; this can be done in a variety of ways, the most common being majority voting and averaging of the individual decisions. Depending on the application, it might not be necessary to define all possible rules over the features, since some combinations may never be observed in real data.
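A minimal sketch of evaluating fuzzy rules such as the one above, in plain Python. The linguistic terms, their triangular membership parameters, and the rule base are hypothetical. AND is taken as the minimum of the membership degrees (a common choice), and rules sharing a class are combined by their maximum firing strength, a swapped-in aggregation standing in for the voting or averaging mentioned above.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a, 1 at b, 0 at c (a < b < c)."""
    return max(0.0, min((x - a) / (b - a), (c - x) / (c - b), 1.0))

# Hypothetical linguistic terms for features scaled to [0, 1].
TERMS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

# Hypothetical rule base: (antecedent, consequent); terms in an antecedent are ANDed.
RULES = [
    ({"A": "low", "B": "medium"}, "class 1"),  # IF A is low AND B is medium THEN class 1
    ({"A": "high"}, "class 2"),                # IF A is high THEN class 2
]

def firing_strength(antecedent, features):
    """Degree to which a rule applies: fuzzy AND as the minimum membership."""
    return min(triangular(features[name], *TERMS[term]) for name, term in antecedent.items())

def fuzzy_classify(features, rules=RULES):
    """Combine rules per class by the maximum firing strength; return the best class and all scores."""
    scores = {}
    for antecedent, label in rules:
        scores[label] = max(scores.get(label, 0.0), firing_strength(antecedent, features))
    return max(scores, key=scores.get), scores
```

For A = 0.1 and B = 0.6, A is “low” to degree 0.8 and B is “medium” to degree 0.8, so the first rule fires with strength 0.8 and class 1 is selected.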
In contrast to powerful neural networks, which learn from the training data a model that is often difficult to interpret, fuzzy logic systems can be built from the knowledge of domain experts, who understand the system through their experience. Fuzzy logic is based on the natural language of human communication because it is built on structures of qualitative description; in the words of the author of fuzzy logic, it is ‘computing with words’ [75].
In Chapter 7, we use fuzzy logic to design the otitis media grammar (rules) of an automated otitis media classifier that mimics the clinical diagnostic decisions made by expert otoscopists when evaluating tympanic membranes.
3.3 Related Work
In this section, we review previous related work in three broad categories. First, computer-aided diagnosis: an overview of existing diagnostic systems. Second, vocabulary and grammar: a brief discussion of previous work on understanding human perception of color patterns in a vocabulary and grammar framework, the basic guiding principle followed in this thesis. Third, classification of otitis media: the related work we are aware of on automated classification of otitis media.
3.3.1 Computer-Aided Diagnosis
Understanding and interpreting medical images is arguably one of the most difficult tasks in pattern recognition. In most medical fields, diagnosis is still inferred manually by medical professionals. The accuracy of such manual evaluation varies widely due to factors such as level of experience, bias, noise, and fatigue. Automated aids are therefore needed to make medical image evaluation objective. Clinical diagnostics are automated or semi-automated using computerized systems that process the medical data and present an output useful to the medical professional.
Using computers to aid in the analysis of medical images is not new. The goal of computer-aided diagnosis (CAD) systems is to produce fast and accurate decisions and to reduce interpretation errors, as well as variation between and within observers, through objective evaluation. CAD systems have been used in many medical fields and with a wide variety of acquisition modalities, and research on CAD has been very active over the last three decades. We briefly review a few CAD systems currently employed in clinical diagnosis; for a more extensive review of the research and development of CAD systems, we refer the reader to [20,67].
Early CAD systems interpreted radiology images, with the main idea of providing computer output as a useful “second opinion” to the radiologist.
Since then, a number of CAD systems have been developed and employed to help in
diagnostic decision making. For example, CAD systems are employed in automated
detection and classification of various abnormalities in mammograms [10], detecting
lung diseases [17], and brain tumor assessment [28], among many others.
3.3.2 Vocabulary and Grammar
To achieve our goal, we adopt the guiding principles below, partly inspired by [18, 19, 49–51]. These works studied how humans perceive and measure the similarity of color patterns. To understand this process, the authors performed a subjective experiment that resulted in five perceptual criteria that subjects used to compare and judge the similarity of color patterns. These perceptual criteria were named the vocabulary: a set of basic categories used by humans in judging the similarity of color patterns. The second aspect of the research was understanding the relative importance of and relationships between these basic categories, as well as the hierarchy of rules used to combine them: the grammar.
In this thesis, we aim to find the corresponding vocabulary and grammar for otitis media that make up the otoscopist's language.
• Vocabulary: A set of visual cues that characterize images of the tympanic membrane according to expert otoscopists.
• Grammar: A set of rules governing the association and hierarchy by which the vocabulary terms are combined, mimicking the clinical decision process of expert otoscopists.
3.3.3 Automated Classification of Otitis Media
To our knowledge, the only works in this area are [47,73], where the authors used color features to classify otitis media. In both works, classification was formulated as a two-class problem, distinguishing normal cases from otitis media. No distinction was made between AOM and OME; both were labeled as otitis media. The ground truth was provided by an otolaryngologist.
In [73], an otolaryngologist selected from each image a rectangular region of the tympanic membrane and an annular region of the auditory canal; the pixels in these regions were transformed into the CIELAB color space, and only the chrominance channels a and b were used.
During the training phase, a color pair representative of the tympanic membrane and the auditory canal is formed from the expected values of the unimodal color distributions of the two regions. The modal color of the tympanic membrane is predicted by linear regression on the modal color of the auditory canal, applied to the principal components of the two color modes obtained from principal component analysis. Two regression models were constructed, one for each class (normal and otitis media), and an image was assigned to the class whose model gave the smaller prediction error. The method performed poorly, correctly detecting only 74% of the normal cases and 62.5% of the otitis media cases.
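The classification-by-prediction-error idea can be sketched in Python with NumPy. This is an illustrative simplification, not the implementation of [73]: it assumes one representative (a, b) color per region and image, fits a least-squares affine map from canal color to membrane color for each class, and omits the mode estimation and principal component steps; all names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_color_regression(canal_ab, membrane_ab):
    """Least-squares affine map from canal (a, b) colors to membrane (a, b) colors."""
    design = np.hstack([canal_ab, np.ones((len(canal_ab), 1))])
    coeffs, *_ = np.linalg.lstsq(design, membrane_ab, rcond=None)
    return coeffs  # shape (3, 2): linear part plus offset

def predict_membrane(coeffs, canal_ab):
    canal_ab = np.atleast_2d(canal_ab)
    return np.hstack([canal_ab, np.ones((len(canal_ab), 1))]) @ coeffs

def classify_by_prediction_error(models, canal_ab, membrane_ab):
    """Return the class whose model predicts the observed membrane color with least error."""
    errors = {label: float(np.linalg.norm(predict_membrane(c, canal_ab) - membrane_ab))
              for label, c in models.items()}
    return min(errors, key=errors.get)

# Synthetic example: each class relates canal and membrane colors by a different offset.
rng = np.random.default_rng(0)
canal = rng.normal(20.0, 2.0, size=(50, 2))
models = {
    "normal": fit_color_regression(canal, canal + [5.0, 0.0]),
    "otitis": fit_color_regression(canal, canal + [20.0, 10.0]),
}
```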
In [47], the authors extract two features from the tympanic membrane images: HSV color histograms and HSV color coherence vectors. These features are classified using standard classification methods such as k-nearest neighbors, decision trees, linear discriminant analysis, naïve Bayes, NN, and SVM. The highest accuracy, 73.11%, was obtained using an NN on color coherence vectors.
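An HSV color histogram feature of the kind used in [47] can be sketched in Python with the standard `colorsys` module and NumPy. The bin counts (8 hue, 4 saturation, 4 value) are an assumption, not the configuration of [47], and color coherence vectors are not shown.

```python
import colorsys
import numpy as np

def hsv_histogram(rgb, bins=(8, 4, 4)):
    """Normalized joint HSV histogram of an (H, W, 3) RGB image with values in [0, 1]."""
    pixels = np.asarray(rgb, dtype=float).reshape(-1, 3)
    hsv = np.array([colorsys.rgb_to_hsv(r, g, b) for r, g, b in pixels])
    hist, _ = np.histogramdd(hsv, bins=bins, range=[(0.0, 1.0)] * 3)
    return hist.ravel() / hist.sum()
```

A uniformly red image, for instance, puts all its mass in the bin with the lowest hue and the highest saturation and value.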
The authors in both [47,73] conclude that color alone is not sufficient to distinguish
otitis media from normal cases.
Chapter 4
Goals of the Thesis
We begin this chapter by outlining the challenges that tympanic membrane images present for processing. We then discuss the guiding principles we follow in building an automated classifier. In the last section, we frame the diagnosis of otitis media as a classification problem and give an overview of the framework of our classification system.
4.1 Gaps to Fill
As discussed in the previous chapter, there is a critical need to develop an automated method that can process tympanic membrane images and classify them into AOM, OME, or NOE. The rationale underlying the research presented in this thesis is that an automated classification algorithm will enable clinicians to properly diagnose and treat AOM, reducing the likelihood of adverse effects such as bacterial resistance. Such an automated classification system does not currently exist.
CHAPTER 4. GOALS OF THE THESIS 43
4.1.1 Challenges
The difficulty of reaching a clear diagnosis of otitis media arises from multiple sources. The very nature of the disease presents difficulty, since otitis media refers to a continuum of middle ear conditions. The absence of a clear decision boundary between OME and AOM makes diagnosis hard for general pediatricians and sometimes even for experienced otoscopists.
The current standard of examination is visual examination of the tympanic membrane by inserting the otoscope along the ear canal while holding an often crying and squirming child. When the ear canal is blocked by cerumen, the examiner's workload increases, since the cerumen must be removed to adequately visualize the tympanic membrane.
Figure 4.1: Illustration of inter-class similarity. Examples of tympanic membrane images of OME (left) and AOM (right) showing strong similarity in appearance.
Beyond the challenges of the clinical examination discussed above, tympanic membrane images also pose challenges for computation. Since there is no standard procedure for image acquisition, the images depend solely on the examiner. This causes many variations in the data, such as nonuniform positioning of the tympanic membrane, in-plane and out-of-plane rotations, inadequate visualization of the tympanic membrane, and illumination problems (nonuniformity, local artifacts). Failure to remove cerumen from the ear canal leads to occlusion of the tympanic membrane, which makes the computation unreliable.
Figure 4.2: Illustration of intra-class variability. Examples of tympanic membrane images of OME; different degrees of severity within OME lead to different presentations.
The variation arising from acquisition, together with the absence of a clear separation between OME and AOM, produces “inter-class similarity”, where images from distinct categories have a similar appearance, as shown in Figure 4.1. Conversely, under “intra-class variability”, images from the same diagnostic category have different appearances, as shown in Figure 4.2.
Finally, the absence of a gold standard for differentiating the diagnostic categories of otitis media leads to disagreement among experts on the ground truth, making the development of an automated method a challenging task.
4.1.2 Guiding Principles
The prevalence of the problem, disagreement on ground truth and the other associated
challenges make our goal hard and ambitious. In our attempt to build an accurate
automated otitis media classifier, we follow these guiding principles.
Vocabulary We aim to design a feature set understood by both otoscopists and
engineers based on the actual visual cues used by otoscopists; we term this the otitis
media vocabulary.
To explore the diagnostic process, Drs. Shaikh and Hoberman conducted a study of the findings expert otoscopists use in their clinical diagnosis [63]. In the study, endoscopic still images of the tympanic membranes of 783 children were obtained and examined by expert otoscopists. The examining otoscopist recorded information regarding a history of otalgia and findings concerning the following tympanic membrane characteristics: color (amber, blue, gray, pink, red, white, yellow), translucency (translucent, semi-opaque, opaque), position (neutral, retracted, bulging), mobility (decreased, not decreased), and areas of marked redness, as distinct from mild or moderate redness (present, absent). A random sample of 135 of these images (in the ratio 2:2:1 of AOM:OME:NOE) was sent for review to another group of 7 independent expert otoscopists, resulting in a data set of 945 image evaluations. To control for differences in color rendition between computers, color-calibrated laptops were mailed to each expert. The experts were asked to independently describe the tympanic membrane findings and assign a diagnosis of AOM, OME, or NOE.
From still images alone, with no information about mobility or ear pain, the diagnosis (AOM vs. no AOM) endorsed by the majority of experts agreed with the live diagnosis 88.9% of the time, underscoring the limited role that symptoms and mobility of the tympanic membrane play in the diagnosis of AOM. Live diagnosis refers to the diagnosis made through physical examination and evaluation of the child at the time of the encounter, not from images. In both groups of otoscopists, bulging of the tympanic membrane was the finding judged best to differentiate AOM from OME: bulging was reported in 96% of the ears assigned a diagnosis of AOM during live diagnosis and in 93% of the image evaluations assigned a diagnosis of AOM. Among the ears assigned a diagnosis of OME, bulging was reported in 0% during live diagnosis and in 3% of the image evaluations. Opacification of the tympanic membrane was the finding that best differentiated OME from NOE.

                AOM                                 OME                           NOE
Color           White, pale yellow, markedly red    White, amber, gray, blue      Gray, pink
Position        Distinctly full, bulging            Neutral, retracted            Neutral, retracted
Translucency    Opacified                           Opacified, semi-opacified     Translucent

Table 4.1: Guidelines for vocabulary design: Otoscopic findings associated with clinical diagnostic categories of tympanic membrane images [63].
To design the otitis media vocabulary, we follow the guidelines in Table 4.1, which summarizes these otoscopic findings.
Grammar We aim to design a rule-based decision process to combine the vocabu-
lary terms based on the decision process used by otoscopists; we term this the otitis
media grammar.
Figure 4.3: Guidelines for grammar design: Decision tree for the diagnosis of otitis media [64].
To design the grammar, we use the results of [64], in which the authors empirically examined the findings used by a group of expert otoscopists to diagnose otitis media. In that study, the relative importance of signs and symptoms in the diagnosis of AOM was described and then used to develop a rule-based decision tree for diagnosing otitis media. At each patient visit, the otoscopist recorded the following tympanic membrane characteristics: color (amber, blue, gray, pink, white, yellow), degree of opacification (translucent, semi-opaque, opaque), position (neutral, retracted, bulging), decreased mobility (yes, no), presence of air-fluid level(s) (yes, no), and presence of areas of marked redness (yes, no). A decision tree was then developed from the recorded tympanic membrane characteristics using recursive partitioning to classify the cases into one of the three diagnostic categories. This manual decision tree uses two decisions to discriminate among the diagnostic categories: first, bulging is used to distinguish AOM from OME and NOE; then, if no bulging is present, opacification or an air-fluid level is used to distinguish OME from NOE (see Figure 4.3). For ease of reference, we refer to the identification of AOM, NOE, and OME as Stages 1, 2, and 3, respectively.
To design the otitis media grammar, we follow the guidelines in Figure 4.3, which summarizes this decision process.
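As an illustration, the two decisions of Figure 4.3 can be written as a small function. This is a minimal sketch that assumes the otoscopic findings are already recorded as categorical values; the function name and string encodings are hypothetical, and semi-opaque is treated as opacified, following Table 4.1.

```python
def expert_decision_tree(position: str, opacification: str, air_fluid_level: bool) -> str:
    """Two-decision tree of Figure 4.3 (hypothetical encoding of the findings).

    position:        "neutral", "retracted", or "bulging"
    opacification:   "translucent", "semi-opaque", or "opaque"
    air_fluid_level: True if one or more air-fluid levels are visible
    """
    # Decision 1: bulging separates AOM from OME and NOE.
    if position == "bulging":
        return "AOM"
    # Decision 2: opacification or an air-fluid level separates OME from NOE.
    if opacification in ("semi-opaque", "opaque") or air_fluid_level:
        return "OME"
    return "NOE"


print(expert_decision_tree("bulging", "opaque", False))         # AOM
print(expert_decision_tree("retracted", "semi-opaque", False))  # OME
print(expert_decision_tree("neutral", "translucent", False))    # NOE
```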
4.2 Diagnosis as Classification
The main assumption in this thesis is that diagnosis can be viewed as classification. Our goal is to classify tympanic membrane images into three clinically distinct diagnostic categories of otitis media. The signs and symptoms, in combination with the examination, lead the expert otoscopist to a diagnosis of the condition being observed. Viewed as a classification problem, diagnosis can be studied with standard classification techniques: a classifier is learned from data whose labels are known and then used to predict the labels of test examples.
As argued earlier, accurate diagnosis of otitis media requires both experience and a sufficiently detailed understanding of the domain. We observed that experts are able to sort through the details, resolve the confusion, and state a diagnosis with reasonable confidence. We believe that through collaboration with and feedback from the experts, we can understand, formulate, and build a system that automates the classification of otitis media, as this thesis demonstrates.
Figure 4.4: Block diagram of our proposed otitis media classifier.
In pursuit of an automated classifier, we adhere to the guiding principles discussed before. Our intuition is that mimicking the visual cues and decision process of otoscopists will lead to classification accuracy on tympanic membrane images comparable to that of expert otoscopists. Figure 4.4 presents the overall structure of the classification system; the following chapters explain each of its constituent blocks in detail.
Chapter 5
Preprocessing
Image preprocessing is crucial for feature computation: some regions in the image, such as the ear canal, are not relevant, so the tympanic membrane must first be delineated. Moreover, we aim to eliminate or minimize the impact of image artifacts arising from illumination problems, since these artifacts affect feature computation and must be corrected. To that end, we start with an automated segmentation step to locate the tympanic membrane and then apply a local illumination correction to mitigate specular highlights. If a captured image cannot be salvaged by local illumination correction, it is deemed unfit for processing and rejected from further computation; in a clinical setting, this rejection could prompt the clinician to retake the image. Unreliable images are also rejected because of global illumination artifacts, such as a very bright appearance due to overexposure or a dull appearance due to underexposure, and because of occlusion of the tympanic membrane by cerumen buildup in the ear canal.
5.1 Automated Segmentation of the Tympanic Membrane
Figure 5.1: Comparison of automated segmentation (top) and hand segmentation by expert otoscopists (bottom).
Segmentation is a crucial step for extracting the relevant regions on which reliable features for classification can be computed. We now briefly summarize an active-contour-based segmentation algorithm [32] that we adapted for our purposes¹. First, a so-called snake potential of the grayscale version of the input image is computed, followed by a set of forces that outline the gradients and edges of the image. The active-contour algorithm [74] is then initialized with a circle at the center of the image. The algorithm iteratively grows this contour until a predefined convergence criterion is met, leaving an outline that covers the relevant region of the image. This outline is used to generate the final mask $X_m$, which is applied to the input image to obtain the result shown in Figure 5.1. We evaluated the performance of the classification algorithm on automatically segmented images against its performance on images hand segmented by expert otoscopists, and found that automatic segmentation prior to classification does not hurt the performance of the classifier. With this segmentation stage, the classification system becomes completely automated, as the clinician no longer needs to specify where the tympanic membrane is positioned.

¹Automated segmentation of the tympanic membrane was implemented by Dr. Pablo Hennings Yeomans during the early phase of this project.
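The active-contour step itself follows [32, 74] and is not reproduced here. The sketch below covers only the two ends of the pipeline under stated assumptions: initializing the contour as a circle at the image center, and rasterizing a closed contour into the mask $X_m$ with the even-odd rule before applying it to the image. The function names, the (row, column) vertex layout, and the initial radius are hypothetical.

```python
import numpy as np


def initial_circle(shape, radius_frac=0.1, n_points=100):
    """Contour vertices (row, col) on a circle centered in the image (hypothetical radius)."""
    rows, cols = shape[:2]
    r = radius_frac * min(rows, cols)
    t = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    return np.column_stack([rows / 2 + r * np.sin(t), cols / 2 + r * np.cos(t)])


def contour_to_mask(contour, shape):
    """Rasterize a closed polygon into a boolean mask X_m using the even-odd rule."""
    rows, cols = shape[:2]
    rr, cc = np.mgrid[0:rows, 0:cols]
    mask = np.zeros((rows, cols), dtype=bool)
    r0, c0 = contour[:, 0], contour[:, 1]
    r1, c1 = np.roll(r0, -1), np.roll(c0, -1)
    for a_r, a_c, b_r, b_c in zip(r0, c0, r1, c1):
        # The edge (a, b) spans the pixel's row when exactly one endpoint lies below it.
        crosses = (a_r > rr) != (b_r > rr)
        # Column where the edge meets that row (denominator guarded for horizontal edges).
        denom = b_r - a_r if b_r != a_r else 1.0
        c_int = a_c + (rr - a_r) * (b_c - a_c) / denom
        # Toggle pixels whose rightward ray crosses this edge.
        mask ^= crosses & (cc < c_int)
    return mask


def apply_mask(image, mask):
    """Zero out everything outside the tympanic membrane mask."""
    return image * mask[..., None] if image.ndim == 3 else image * mask
```

In practice the contour returned by the active-contour iterations would replace the initial circle before rasterization.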
5.2 Image Correction: Inpainting Tympanic Membrane Images
One of the problems encountered is the presence of specular highlight regions caused by residual cerumen (wax) in the ear canal and by wax on the surface of the hairs in the ear canal, which may remain after the examination. Cerumen reflects the light from the otoscope, which results in white regions in the image, as shown in Figure 5.2 (top row). These regions of local specular highlights have to be corrected.
Our aim is to detect the specular highlights in the image and correct them locally. We use a simple thresholding scheme on image intensities to identify the specular highlight areas as regions of white pixels. These detected regions are shown in Figure 5.2 (middle row). Once these regions are detected, we apply the Poisson image editing technique [57] explained in Section 3.2.2 to each color channel separately. The local image correction replaces the white pixels with intensities interpolated from the neighborhood of the specular highlight areas. The corrected images are shown in Figure 5.2 (bottom row).
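A minimal sketch of detection and correction, assuming a float RGB image in [0, 1] and a hypothetical threshold of 0.9 on the mean channel intensity. It also assumes a zero guidance field, in which case Poisson editing [57] reduces to solving Laplace's equation inside the detected region with the surrounding pixels as fixed boundary values; Jacobi iteration is used here for simplicity rather than a fast solver.

```python
import numpy as np


def detect_specular(image, threshold=0.9):
    """Boolean mask of near-white pixels (hypothetical threshold on the mean channel)."""
    return image.mean(axis=2) >= threshold


def inpaint_laplace(channel, mask, n_iter=2000):
    """Fill the masked pixels of one channel by solving Laplace's equation.

    Each masked pixel is repeatedly replaced by the average of its four
    neighbors (Jacobi iteration) while unmasked pixels stay fixed.
    """
    out = channel.astype(float).copy()
    out[mask] = out[~mask].mean()  # neutral initial guess inside the region
    for _ in range(n_iter):
        p = np.pad(out, 1, mode="edge")
        avg = 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])
        out[mask] = avg[mask]
    return out


def correct_specular(image, threshold=0.9, n_iter=2000):
    """Detect specular highlights and correct each color channel separately."""
    mask = detect_specular(image, threshold)
    return np.stack(
        [inpaint_laplace(image[..., c], mask, n_iter) for c in range(3)], axis=2
    )
```

Because a linear intensity ramp is harmonic, a highlight placed on such a ramp is filled back to the ramp exactly, which makes a convenient sanity check.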
Figure 5.2: Correction of specular highlights for AOM (left), OME (middle), and NOE (right). Input images are in the top row, identified specular highlight regions in the middle row, and corrected regions in the bottom row.
5.3 Rejection of Unreliable Data
A frequently encountered problem in tympanic membrane images is variation in illumination. Nonuniform illumination produces both local artifacts, such as specular highlights, and global artifacts, such as dark or very brightly lit images.
5.3.1 Rejection due to Specular Highlights
Some of the segmented images may contain large regions of white pixels due to overexposure. The Poisson image editing method approximates the intensities in the region to be corrected from the neighboring pixels and is thus effective only when that region is small. We empirically found that if the area of contiguous white pixels exceeds 15% of the total pixels in the segmented tympanic membrane image, correcting such regions gives unreliable results, and the image should be rejected. In a clinical application, the rejection stage would prompt the clinician to retake the image until it is deemed suitable for processing.
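The 15% rule can be sketched as follows, assuming boolean masks for the detected white pixels and for the segmented tympanic membrane, and reading "contiguous white pixels" as the largest 4-connected white region. The function names are hypothetical.

```python
from collections import deque

import numpy as np


def largest_component_size(mask):
    """Size of the largest 4-connected component of True pixels (breadth-first search)."""
    seen = np.zeros_like(mask, dtype=bool)
    rows, cols = mask.shape
    best = 0
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue, size = deque([(r0, c0)]), 0
        while queue:
            r, c = queue.popleft()
            size += 1
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and mask[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        best = max(best, size)
    return best


def reject_for_specular(white_mask, tm_mask, max_fraction=0.15):
    """True if the largest white region exceeds 15% of the segmented membrane."""
    white_in_tm = white_mask & tm_mask
    return largest_component_size(white_in_tm) > max_fraction * tm_mask.sum()
```

Measuring the largest connected region, rather than the total white area, reflects the fact that many small highlights remain correctable by inpainting while one large one does not.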
5.3.2 Rejection due to Over/Under Exposure
Depending on the angle and amount of light incident on the membrane and the ear canal, we encounter different illumination problems related to brightness and contrast. Artifacts such as shading, shadows, and global changes in intensity or color due to overexposure or underexposure also affect feature computation. Both overexposure and underexposure result in a loss of detail, producing a very bright or a dark appearance, respectively.
One of the most commonly used ways to characterize the distribution of pixel intensities in an image is a histogram. We calculate the histogram statistics descriptor (HSD) [21] to describe this distribution. Its six statistical measures are the mean, the standard deviation, the relative smoothness R (derived from the second central moment), the third central moment, the uniformity, and the entropy. The mean of an image measures its average intensity, whereas the standard deviation measures its average contrast. The R measure reflects the relative smoothness of the intensity in a region. The third central moment measures the skewness of the histogram. The entropy measures the randomness of the image and is usually calculated from its first-order histogram. The uniformity measure, the sum of the squared histogram probabilities, is maximal for an image of constant intensity and decreases as the intensities spread out. We manually selected a training set containing two classes: images to be rejected due to overexposure or underexposure, and properly illuminated images fit for further computation. A two-class SVM classifier was trained on the six features extracted from the images in the training set. Figure 5.3 shows examples of images rejected using this procedure.
Figure 5.3: Examples of rejected images from each class: AOM (left), OME (middle), and NOE (right). The top row shows images rejected due to a washed-out appearance and the bottom row shows images rejected due to a dull appearance.
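The six HSD measures can be computed directly from the normalized histogram. This sketch assumes an 8-bit grayscale image and the common textbook definitions, including normalizing the variance by $(L-1)^2$ inside $R$; the function name is hypothetical. The resulting six-element vectors would be the inputs to the two-class SVM, whose training is not shown.

```python
import numpy as np


def histogram_statistics(gray, levels=256):
    """Six histogram statistics of an integer grayscale image with values 0..levels-1.

    Returns [mean, std, R, third central moment, uniformity, entropy].
    Normalizing the variance by (levels - 1)**2 inside R is an assumption.
    """
    counts = np.bincount(gray.ravel(), minlength=levels).astype(float)
    p = counts / counts.sum()                    # normalized histogram
    z = np.arange(levels, dtype=float)
    mean = np.sum(z * p)                         # average intensity
    var = np.sum((z - mean) ** 2 * p)            # second central moment
    std = np.sqrt(var)                           # average contrast
    r_measure = 1.0 - 1.0 / (1.0 + var / (levels - 1) ** 2)  # relative smoothness
    third = np.sum((z - mean) ** 3 * p)          # skewness of the histogram
    uniformity = np.sum(p ** 2)                  # 1 for a constant image
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz))          # randomness, in bits
    return np.array([mean, std, r_measure, third, uniformity, entropy])
```

For a constant image the descriptor is [value, 0, 0, 0, 1, 0]; for an image split evenly between 0 and 255 it is [127.5, 127.5, 0.2, 0, 0.5, 1].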
5.3.3 Rejection due to Presence of Cerumen
Figure 5.4: Examples of rejected images. The top row shows input images and the bottom row shows the detected wax regions.
Buildup of cerumen in the ear canal leads to inadequate visualization of the tympanic membrane, and computation on such images leads to unreliable results. We aim to reject these images from further computation; in a clinical setting, this rejection step can serve as a prompt to the clinician to remove the cerumen.
The areas of cerumen in the tympanic membrane image are detected using the color-assignment technique outlined in Section 6.3.5, which is used to measure the translucency of a tympanic membrane. During the training phase, regions of wax are hand segmented, and the pixels from these regions are clustered using the K-means algorithm. To detect cerumen regions in an image X, the Euclidean distance between each pixel (m, n) and each of the K cluster centers is computed. If any of the K distances falls below a threshold ($T_t = 10$, found experimentally), the pixel is labeled as a cerumen pixel. This results in a binary image $X_c$, shown in Figure 5.4 (bottom row), indicating cerumen and noncerumen regions. The degree of cerumen in the image is defined as the mean of $X_c$, and images in which pixels labeled as cerumen occupy more than 10% of the segmented image are rejected.
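A sketch of the cerumen test, assuming the wax cluster centers have already been learned (for example, with k-means on hand-segmented wax pixels) and that the mean of $X_c$ is taken over the segmented membrane. The function names are hypothetical; the thresholds are those stated above.

```python
import numpy as np


def nearest_center_mask(image, centers, threshold=10.0):
    """Label pixels whose RGB value lies within `threshold` of any cluster center."""
    pixels = image.reshape(-1, 1, 3).astype(float)                 # (P, 1, 3)
    dists = np.linalg.norm(pixels - centers[None, :, :], axis=2)   # (P, K)
    return (dists.min(axis=1) < threshold).reshape(image.shape[:2])


def reject_for_cerumen(image, tm_mask, wax_centers, threshold=10.0, max_fraction=0.10):
    """Return (degree of cerumen, reject flag) computed over the segmented membrane."""
    x_c = nearest_center_mask(image, wax_centers, threshold) & tm_mask
    degree = x_c[tm_mask].mean()   # mean of X_c over the membrane pixels
    return degree, degree > max_fraction
```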
Chapter 6
Otitis Media Vocabulary
6.1 Main Idea
The expert otoscopist uses specialized knowledge when discriminating between the different diagnostic categories. The goal of our proposed methodology is to create a feature set, the otitis media vocabulary, that mimics the visual cues used by trained otoscopists to diagnose otitis media.
6.2 Methodology
To design the otitis media vocabulary we will follow the process outlined in [3], where a
histopathology vocabulary was designed for automated identification and delineation
of tissues in images of H&E-stained teratomas. Similar vocabulary features were used
in [46] for automated detection of colitis.
Formulation of initial set of descriptions We obtain initial descriptions of those
characteristics best describing a given diagnostic category from the summary of oto-
scopic findings in Table 4.1.
Computational translation of key terms From this set, the key terms, such as
bulging, are translated into their computational synonyms, creating a computational
vocabulary. In our case, we construct a feature called bulging, which measures the
area of the bulged region in the tympanic membrane.
Computational translation of descriptions Using the computational vocabulary, the otoscopist's complete descriptions, such as "bulging and white," are translated.
Verification of translated descriptions Based on these translated descriptions,
and without access to the image, the otoscopist tries to identify the diagnostic cat-
egory being described, emulating the overall system with translated descriptions as
features and the otoscopist as the classifier.
Refinement of insufficient terms If the otoscopist is unable to identify a diag-
nostic category based on translated descriptions, or if a particular translation is not
understandable, then that translation is refined and presented again to the otoscopist
for verification.
Otitis media vocabulary If the otoscopist is able to identify a diagnostic category
based on translated descriptions, then the discriminative power of the key terms and
their corresponding computational interpretations are validated, and these terms can
be included as otitis media vocabulary terms to create features.
This feedback loop is iterated until a sufficient set of terms has been collected to formulate the otitis media vocabulary: bulging $f_b$, central concavity $f_c$, light $f_\ell$, malleus presence $f_m$, translucency $f_t$, amber level $f_a$, bubble presence $f_{bp}$, and grayscale variance $f_v$.
6.3 Vocabulary
We designed the vocabulary features bulging, central concavity, malleus presence, translucency, amber level, and bubble presence based on the otoscopic findings listed in Table 4.1. To supplement these, we designed two additional features, light and grayscale variance, based on our own observations and to catch classifier errors.
1. The first three vocabulary features, bulging, central concavity, and light, describe
the distinct characteristics associated with AOM.
2. The next two vocabulary features, malleus presence and translucency, are indicative of NOE.
3. The final three vocabulary features, amber level, bubble presence, and grayscale
variance, describe the characteristics of OME.
We now explain each of the vocabulary features in detail.
6.3.1 Bulging
In [64], the authors showed that bulging of the tympanic membrane is crucial for diagnosing AOM. We design a feature that calculates the percentage of bulged region in the tympanic membrane; we call it the bulging feature. The goal is to derive a 3D tympanic membrane shape from a 2D image by expressing it in terms of depth at each pixel. For example, in AOM we should be able to identify high depth variation due to bulging of the tympanic membrane, in contrast to low depth variation in NOE, where the tympanic membrane is neutral or retracted. The shape-from-shading technique [70] can be applied to recover a 3D shape from a single monocular image. The input is a grayscale version of the segmented original RGB image $X \in \mathbb{R}^{M\times N}$, as shown in Figure 6.1(a). The depth at each pixel is calculated iteratively using the image gradient and a linear approximation of the reflectance function of the image. Figure 6.1(b) shows the resulting depth map $X_d$, identifying the bulged regions in the tympanic membrane. The depth map $X_d$ is then thresholded at $T_d$ (here $T_d = 0.6$) to obtain a binary mask $X_b$ of bulging regions in the tympanic membrane.
(a) Original image. (b) Depth recovered showing the bulged area in red.
Figure 6.1: Computation of the bulging feature.
We then define the bulging feature as the mean of $X_b$,
$$f_b = \mathrm{E}[X_b].$$
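The depth recovery of [70] is not reproduced here. Given a depth map, the thresholding and averaging steps reduce to a few lines. This sketch assumes the depth is min-max normalized to [0, 1] over the membrane so that $T_d = 0.6$ is meaningful; that normalization is an assumption, not a detail stated above, and the function name is hypothetical.

```python
import numpy as np


def bulging_feature(depth, tm_mask, t_d=0.6):
    """f_b = E[X_b]: fraction of membrane pixels whose normalized depth exceeds T_d."""
    d = depth[tm_mask].astype(float)
    d = (d - d.min()) / (d.max() - d.min() + 1e-12)  # assumed min-max normalization
    x_b = d > t_d                                      # binary bulging mask X_b
    return x_b.mean()
```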
6.3.2 Central concavity
The tympanic membrane is firmly attached to the malleus, one of the three middle-ear bones called auditory ossicles. In the presence of an infection, the tympanic membrane begins to bulge in the periphery, while the central region remains attached to the malleus, forming a concavity. We design a feature to identify this centrally located concave region; we call it the central concavity feature. The input is a grayscale version (Figure 6.2(a)) of the segmented original RGB image $X \in \mathbb{R}^{M\times N}$ as in Figure 5.1. We extract a circular neighborhood
of radius $R$ around the pixel $(m,n)$. This circular neighborhood is then transformed into polar coordinates to obtain $X_R(r,\theta)$, with $r \in \{1, 2, \ldots, R\}$, $\theta \in [0, 2\pi]$, and
$$r = \sqrt{(m-m_c)^2 + (n-n_c)^2}, \qquad \theta = \arctan\frac{n-n_c}{m-m_c},$$
where $(m_c, n_c)$ are the center coordinates of the neighborhood $X_R$. In Figure 6.2(b), the resulting image has $r$ as the horizontal axis and $\theta$ as the vertical one. The concave region changes from dark to bright from the center toward the periphery of the concavity; in polar coordinates, this change from dark to bright occurs as the radius grows, see Figure 6.2(b). Defining the bright region $B = \{(r,\theta) \mid r > R'\}$ and the dark region $D = \{(r,\theta) \mid r \le R'\}$, with $R' \in [R/4, 3R/4]$, we compute the ratio of the two means,
$$f_{c,R'} = \frac{\mathrm{E}\left[X_R(r,\theta) \mid (r,\theta) \in B\right]}{\mathrm{E}\left[X_R(r,\theta) \mid (r,\theta) \in D\right]}.$$
As the concave region is always centrally located, we experimentally determine a square neighborhood $I$ (here $151 \times 151$) over which to compute the central concavity feature,
$$f_c = \max_{(m,n) \in I,\; R'} f_{c,R'}.$$
(a) Grayscale. (b) Polar. (c) Labeled.
Figure 6.2: Computation of the central concavity feature.
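A direct, unoptimized sketch of $f_c$. Assumptions not taken from the text: nearest-neighbor sampling of the polar image, 64 angular samples, coordinates clipped at the image border, and a `step` parameter that subsamples the $151 \times 151$ search square for speed. The function names are hypothetical.

```python
import numpy as np


def polar_patch(gray, m, n, radius, n_theta=64):
    """Sample X_R(r, theta) around (m, n); rows index r = 1..radius, columns index theta."""
    r = np.arange(1, radius + 1)[:, None]
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)[None, :]
    rows = np.clip(np.rint(m + r * np.cos(theta)).astype(int), 0, gray.shape[0] - 1)
    cols = np.clip(np.rint(n + r * np.sin(theta)).astype(int), 0, gray.shape[1] - 1)
    return gray[rows, cols]                     # shape (radius, n_theta)


def concavity_ratio(polar, r_split):
    """f_{c,R'}: mean over r > R' divided by mean over r <= R'."""
    dark = polar[:r_split].mean()               # rows 0..R'-1 hold r = 1..R'
    bright = polar[r_split:].mean()
    return bright / (dark + 1e-12)


def central_concavity(gray, radius=20, half_width=75, step=5):
    """f_c: maximum ratio over centers in the central square I and R' in [R/4, 3R/4].

    half_width=75 gives a 151 x 151 square around the image center.
    """
    mc, nc = gray.shape[0] // 2, gray.shape[1] // 2
    best = -np.inf
    for m in range(mc - half_width, mc + half_width + 1, step):
        for n in range(nc - half_width, nc + half_width + 1, step):
            polar = polar_patch(gray, m, n, radius)
            for r_split in range(max(1, radius // 4), 3 * radius // 4 + 1):
                best = max(best, concavity_ratio(polar, r_split))
    return best
```

A uniform image gives $f_c = 1$, while an image that brightens away from its center gives a ratio well above 1.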
6.3.3 Light
Examination of the tympanic membrane is performed by an illuminated otoendoscope.
The distinct bulging in AOM results in nonuniform illumination of the tympanic
membrane, in contrast to the uniform illumination in NOE. Our aim is to construct
a feature that will measure this nonuniformity as the ratio of the brightly-lit to the
darkly-lit regions; we call it the light feature.
We start by performing contrast enhancement on the grayscale image in Figure 6.3(a) to make the nonuniform lighting prominent. The resulting image in Figure 6.3(b) is thresholded at $T_\ell$ (found experimentally) to obtain the binary image $X_{bl}$ of brightly lit regions in Figure 6.3(c).
(a) Grayscale. (b) Contrast-enhanced. (c) Dominant orientation.
Figure 6.3: Computation of the light feature.
To find the direction $\theta_{\max}$ perpendicular to the maximum illumination gradient, we look at lines passing through $(m_c, n_c)$ (the pixel coordinates at which $f_c$ is obtained) at angle $\theta$ with the horizontal axis. Defining the bright region $B = \{(m,n) \mid n \ge \tan(\theta)(m - m_c) + n_c\}$ and the dark region $D = \{(m,n) \mid n < \tan(\theta)(m - m_c) + n_c\}$, we compute the ratio of the two means,
$$r(\theta) = \frac{\mathrm{E}\left[X_{bl}(m,n) \mid (m,n) \in B\right]}{\mathrm{E}\left[X_{bl}(m,n) \mid (m,n) \in D\right]}.$$
Then, the direction perpendicular to the maximum illumination gradient is given by
$$\theta_{\max} = \arg\max_{\theta}\, r(\theta),$$
and we define the light feature as
$$f_\ell = r(\theta_{\max}).$$
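A sketch of the light feature, assuming $X_{bl}$ is already available as a binary image. Rather than $\tan\theta$, it writes each half-plane with the line's normal vector, $-(m-m_c)\sin\theta + (n-n_c)\cos\theta \ge 0$, which coincides with the definition of $B$ above whenever $\cos\theta > 0$ and lets $\theta$ sweep $[0, 2\pi)$ without the singularity at $\pm\pi/2$. The angular resolution and function name are hypothetical.

```python
import numpy as np


def light_feature(x_bl, mc, nc, n_angles=360):
    """Return (f_l, theta_max): the largest bright-to-dark mean ratio over half-planes."""
    m, n = np.mgrid[0:x_bl.shape[0], 0:x_bl.shape[1]]
    x = x_bl.astype(float)
    best_ratio, best_theta = -np.inf, 0.0
    for theta in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
        # Signed side of the line through (mc, nc) at angle theta.
        side = -(m - mc) * np.sin(theta) + (n - nc) * np.cos(theta)
        bright, dark = side >= 0, side < 0
        if not dark.any() or not bright.any():
            continue
        ratio = x[bright].mean() / (x[dark].mean() + 1e-12)
        if ratio > best_ratio:
            best_ratio, best_theta = ratio, theta
    return best_ratio, best_theta
```

A uniformly lit membrane gives $f_\ell = 1$; illumination concentrated on one side drives the ratio up sharply.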
6.3.4 Malleus presence
In OME and NOE, the tympanic membrane is neutral or retracted, which makes the short process of the malleus visible. We design a feature to detect the partial or complete appearance of the malleus, which helps distinguish AOM from OME and NOE; we call it the malleus presence feature. To identify the presence of the malleus, we perform ellipse fitting (shown as a red outline in Figure 6.4(a)) to identify the major axis. The image is then rotated to align the major axis with the horizontal axis. Mean-shift clustering [14] is then performed, as shown in Figure 6.4(b), followed by Canny edge detection [9]. The Hough transform [16] is applied to the edges around the major axis (within an empirically chosen 50-pixel neighborhood) to detect a straight line (shown in red in Figure 6.4(c)) extending to the periphery, which indicates that the malleus is visible. If such a line is detected, the malleus presence feature $f_m$ is assigned a value of 1, and 0 otherwise.
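The last step of the malleus detector can be sketched with a plain Hough accumulator. This assumes an edge map that has already been rotated so that the major axis is the horizontal line through the center row; the vote threshold `min_votes` and the function names are hypothetical, and ellipse fitting, mean-shift clustering, and Canny edge detection are not shown.

```python
import numpy as np


def hough_accumulator(edges, n_theta=180):
    """Vote in (rho, theta) space for every edge pixel: rho = col*cos(t) + row*sin(t)."""
    rows, cols = np.nonzero(edges)
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_theta, endpoint=False)
    diag = int(np.ceil(np.hypot(*edges.shape)))
    acc = np.zeros((2 * diag + 1, n_theta), dtype=int)
    for j, t in enumerate(thetas):
        # Offset by diag so that negative rho values map to valid indices.
        rho = np.rint(cols * np.cos(t) + rows * np.sin(t)).astype(int) + diag
        np.add.at(acc, (rho, j), 1)
    return acc, thetas


def malleus_presence(edges, band=50, min_votes=40):
    """f_m = 1 if a straight line is found among edges within `band` rows of the major axis."""
    center = edges.shape[0] // 2
    near_axis = np.zeros_like(edges, dtype=bool)
    near_axis[max(0, center - band): center + band + 1] = True
    acc, _ = hough_accumulator(edges & near_axis)
    return int(acc.max() >= min_votes)
```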
6.3.5 Translucency
Translucency of the tympanic membrane is the main characteristic of NOE, in contrast with opacity in AOM and semi-opacity in OME; it results in clear visibility of the tympanic membrane, which is primarily gray. We design a feature to measure the grayness of the tympanic membrane; we call it the translucency feature. We do this using a simple color-assignment technique. As these images were acquired under different lighting and viewing conditions, and according to [2] at least 3–6 images are needed to characterize a structure or region under all lighting and viewing conditions, we take the number of images to be $N_{tl} = 20$.
(a) Ellipse fitting. (b) Mean-shift clustering. (c) Malleus detection.
Figure 6.4: Computation of the malleus presence feature.
To determine gray-level clusters in translucent regions, we extract $N_t = 100$ pixels from hand-segmented translucent regions of each of the $N_{tl}$ RGB images, obtaining a total of $N_{tl}N_t$ pixels (here 2000). We then cluster these $N_{tl}N_t$ pixels using k-means clustering to obtain $K$ cluster centers $c_k \in \mathbb{R}^3$, $k = 1, 2, \ldots, K$ (here $K = 10$), capturing variations of gray in the translucent regions. To compute the translucency feature for a given image $X$, for each pixel $(m,n)$ we compute the $K$ Euclidean distances of $X(m,n)$ to the cluster centers $c_k$, $k = 1, 2, \ldots, K$,
$$d_k(m,n) = \sqrt{\sum_{i=1}^{3} \left(X_i(m,n) - c_{k,i}\right)^2},$$
with $i = 1, 2, 3$ denoting the color channel. If any of the $K$ computed distances falls below a threshold $T_t = 10$ (found experimentally), the pixel is labeled as translucent and belongs to the region $R_t = \{(m,n) \mid \min_k d_k(m,n) < T_t\}$. The binary image $X_t$ is then simply the characteristic function of the region $R_t$, $X_t = \chi_{R_t}$.
We then define the translucency feature as the mean of $X_t$,
$$f_t = \mathrm{E}[X_t].$$
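A sketch of the translucency feature: Lloyd's k-means on the training pixels, followed by the nearest-center distance test. It assumes RGB values on the 0 to 255 scale and takes the mean of $X_t$ over the segmented membrane; the function names and the random initialization are hypothetical. The same routine, trained on amber or wax pixels instead, would yield $X_a$ or $X_c$.

```python
import numpy as np


def kmeans(points, k=10, n_iter=50, seed=0):
    """Plain Lloyd's k-means on an (P, 3) array; returns k cluster centers in R^3."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)  # (P, k)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):        # keep the old center if a cluster empties
                centers[j] = points[labels == j].mean(axis=0)
    return centers


def translucency_feature(image, tm_mask, centers, t_t=10.0):
    """f_t = E[X_t], with X_t = 1 where min_k ||X(m, n) - c_k|| < T_t."""
    d = np.linalg.norm(image[..., None, :].astype(float) - centers, axis=-1)  # (M, N, K)
    x_t = d.min(axis=-1) < t_t
    return x_t[tm_mask].mean()
```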
6.3.6 Amber level
We use the knowledge that OME is predominantly amber or pale yellow to distinguish
it from AOM and NOE. We design a feature to measure the presence of amber in the
tympanic membrane; we call it the amber feature. We apply a color-assignment technique similar to that used for computing $X_t$ to obtain a binary image $X_a$ indicating amber and nonamber regions. We define the amber feature as the mean of $X_a$,
$$f_a = \mathrm{E}[X_a].$$
6.3.7 Bubble presence
The presence of visible air-fluid levels, or bubbles, behind the tympanic membrane is
an indication of OME. We design a feature to detect the presence of bubbles in the
tympanic membrane; we call it the bubble presence feature. The algorithm takes the red and green channels of the original RGB image and performs Canny edge detection [9] to place parallel boundaries on either side of the real edge, creating a binary image $X_{bp}$ in between. This is followed by filtering and morphological operations to enhance edge detection and obtain smooth boundaries. We then define the bubble feature as the mean of $X_{bp}$,
$$f_{bp} = \mathrm{E}[X_{bp}].$$
6.3.8 Grayscale variance
Another discriminating feature is the variance of the intensities across the grayscale version $X_v$ of the image. We define the grayscale variance feature as the variance of the pixel intensities in $X_v$,
$$f_v = \mathrm{var}(X_v).$$
For example, OME has a more uniform appearance than AOM and NOE and consequently a much lower variance, which can be used to distinguish it from the other two categories.
Chapter 7
Otitis Media Grammar
7.1 Main Idea
The modeling of human perception of otitis media diagnosis is new, starting with the design of the vocabulary features and of the set of rules that serve as the basic grammar of the otoscopist's language. For designing the grammar, it is important to understand the
way these rules are applied. An important aspect of our work is to use feedback from
expert otoscopists to improve classification performance by mimicking their diagnostic
process.
7.2 Grammar
In this section, we present the design process of the grammar. We begin with the initial grammar, which consists of a set of rules used to combine six vocabulary features. This is followed by an improved grammar, which consists of a set of rules that combine eight vocabulary features and exactly mimic the decision process designed by expert otoscopists. Finally, we present the grammar implemented using fuzzy logic.
7.2.1 Hierarchical Rule-Based Grammar
In [36], we designed the initial grammar shown in Figure 7.1, a simple hierarchical classifier with two levels. At the first level, binary decisions were used to split the images into two superclasses: AOM/OME (acute infection/middle ear fluid infection) and NOE/OME (no infection/middle ear fluid infection). At the second level, these superclasses were split into individual diagnostic categories using a weighted combination, with weights $(w_a, w_{bp}, w_t, w_v)$, of four features: amber level $f_a$, bubble presence $f_{bp}$, translucency $f_t$, and grayscale variance $f_v$. The weighted combination $w_a f_a + w_{bp} f_{bp} + w_t f_t + w_v f_v$ was used to split the superclasses into AOM, OME, and NOE.
Figure 7.1: Initial grammar for classifying otitis media.
We then modified the grammar in [37] to exactly mimic the decision process used by expert otoscopists in Figure 4.3. The decision process uses a hierarchical rule-based classification scheme based on the domain knowledge of the expert otoscopists. The classification is done in three stages, each distinguishing one diagnostic category: AOM (Stage 1), NOE (Stage 2), and OME (Stage 3), which we now describe in more detail.
Figure 7.2: Stage 1: Grammar for identifying AOM.
Stage 1: Identification of AOM
In the first stage, we detect instances of AOM based on the bulging, light, central concavity, and malleus presence features, as shown in Figure 7.2. Ideally, if bulging is present, the image should be classified as AOM, as shown in Figure 4.3, but our bulging feature alone cannot accomplish this task. We therefore use the other otitis media vocabulary features that describe AOM characteristics, namely light, central concavity, and malleus presence, to aid the separation of AOM from NOE and OME. In some cases, OME images exhibit partial bulging and are therefore likely to be grouped as AOM; in such cases, we use a low amber level to distinguish AOM from OME.
Stage 2: Identification of NOE
Low values of the bulging, light, central concavity, and malleus presence features eliminate the possibility of AOM being the diagnosis, leaving either NOE or OME (see Figure 7.3). In Stage 2, our goal is to distinguish NOE from OME. The translucency feature, which captures the most distinguishing characteristic of NOE, is used here to identify normal cases. In this stage, NOE is identified from the superclass NOE/OME by a high value of the translucency feature, or by low values of all three features characteristic of OME (amber level, bubble presence, and grayscale variance).
Figure 7.3: Stage 2: Grammar for identifying NOE. (Black arrows/boxes denote those paths belonging to this stage; gray ones belong to Stage 1.)
Stage 3: Identification of OME
Figure 7.4 shows the complete otitis media grammar. Most OME cases are identified within the superclass NOE/OME from Stage 2 by high values of the amber level, bubble presence, and grayscale variance features. Some OME cases exhibit partial bulging, resulting in high values of the bulging feature; in such cases, we can correctly detect OME if the values of the light and central concavity features are low and the value of the amber level feature is high.
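To make the stage structure concrete, here is one simplified reading of the rules described above, not the exact trees of Figures 7.2 to 7.4. Assumptions: a feature counts as high when it exceeds its learned threshold; Stage 1 fires when any of bulging, light, or central concavity is high and the malleus is not detected (following Section 6.3.4); the dictionary keys and function name are hypothetical.

```python
def classify_stages(f, t):
    """Simplified hierarchical rule-based grammar (hypothetical reading of Stages 1 to 3).

    f: dict of vocabulary feature values; t: dict of learned thresholds, same keys.
    """
    high = {k: f[k] > t[k] for k in t}

    # Stage 1: AOM cues present and the malleus not visible.
    if (high["bulging"] or high["light"] or high["concavity"]) and not high["malleus"]:
        # Partial-bulging exception: low light and concavity with high amber is OME.
        if not high["light"] and not high["concavity"] and high["amber"]:
            return "OME"
        return "AOM"

    # Stage 2: NOE if translucent, or if every OME cue is low.
    if high["translucency"] or not (high["amber"] or high["bubble"] or high["variance"]):
        return "NOE"

    # Stage 3: remaining NOE/OME cases showing OME cues.
    return "OME"
```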
Figure 7.4: Stage 3: Grammar for identifying OME. (Black arrows/boxes denote those paths belonging to this stage; gray ones belong to Stages 1 and 2.)
The threshold values for the features were calculated during the training phase of the algorithm. We performed five-fold nested cross-validation. In each fold, the data were split into training and testing sets, and the training set was further split into learning and validation sets. We used the misclassification rate on the validation set as the criterion for learning the threshold for each split: the threshold was fixed at the value giving the lowest misclassification rate during training and was then used on the testing set.
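A sketch of learning one split threshold, assuming the rule "positive if $x >$ threshold" and candidate thresholds placed midway between consecutive sorted feature values from the learning set. Within each outer fold, such a routine would be called once per split; the function name and the candidate scheme are hypothetical.

```python
import numpy as np


def learn_threshold(x_learn, x_val, y_val):
    """Pick the threshold with the lowest misclassification rate on the validation set.

    Candidates come from the learning set; y_val is a boolean array of true labels.
    """
    v = np.unique(x_learn)
    candidates = np.concatenate([[v[0] - 1.0], (v[:-1] + v[1:]) / 2.0, [v[-1] + 1.0]])
    errors = [np.mean((x_val > c) != y_val) for c in candidates]
    return candidates[int(np.argmin(errors))]
```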
The complete otitis media grammar that we designed in Figure 7.4 thus follows
the exact structure of the decision tree designed by expert otoscopists in Figure 4.3.
7.2.2 Fuzzy-Logic based Grammar
The hierarchical rule-based decision process presented above is constructed using binary splits on a feature at each node, evaluated against a threshold learned during the training stage. In our effort to more closely mimic the otoscopists' clinical decision-making process, we present a modified rule-based grammar that employs fuzzy-logic-based decisions. Fuzzy logic is often employed to capture the imprecise modes of reasoning that play an essential role in the human ability to make decisions. Consider an example in the context of otitis media diagnosis. One of the most significant diagnostic rules used by expert otoscopists has the form:
If bulging is high, then it is AOM.
The term high is linguistic, and there is no corresponding precise real value that separates high from not high. A classical set is, by definition, a collection of elements with definite membership: either they belong to the set or they do not. Returning to our example of bulging, in the grammar presented in Section 7.2.1, bulging in an image was described as either high or low, as shown in Figure 7.5. Note how the first two tympanic membrane images were assigned a bulging value of 0 despite the distinct difference in the amount of bulging between them. Such a representation does not work well when describing a real-world problem like clinical diagnosis. A further drawback of this lack of distinction is that, although some bulging is present in the second tympanic membrane image, the binary membership function forces us to assign it no bulging at all.
In such situations, the fuzzy set approach provides a much better representation of the amount of bulging in the image. The set in Figure 7.6 is defined by a continuously increasing function; the membership function of a fuzzy set takes values in the range [0, 1]. The vertical axis shows the membership value of the bulging in the fuzzy set.
Figure 7.5: Example of a binary membership function.
Here, the first image has a membership value of 0 since no bulging is present, the second image has a membership of 0.45, corresponding to moderate (not very high) bulging, and the last image has a membership of 1.0 for a definitely high amount of bulging.
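The contrast between the two kinds of membership can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation. The bulging scores, the crisp threshold of 0.5 and the sigmoid parameters (a = 10, b = 0.5) are hypothetical values. They show only that the crisp function collapses moderate bulging to 0, while the fuzzy function grades it. The numbers do not reproduce Figures 7.5 and 7.6.

```python
import math

def crisp_membership(x: float, threshold: float) -> float:
    """Binary (crisp) membership: 1 if x reaches the threshold, else 0."""
    return 1.0 if x >= threshold else 0.0

def fuzzy_membership(x: float, a: float, b: float) -> float:
    """Sigmoidal (fuzzy) membership in [0, 1]; equals 0.5 at x == b."""
    return 1.0 / (1.0 + math.exp(-a * (x - b)))

# Hypothetical bulging scores: no bulging, moderate bulging, strong bulging.
scores = [0.0, 0.45, 0.9]
for s in scores:
    crisp = crisp_membership(s, threshold=0.5)
    fuzzy = fuzzy_membership(s, a=10.0, b=0.5)
    print(f"score={s:.2f}  crisp={crisp:.0f}  fuzzy={fuzzy:.3f}")
```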
The grammar shown in Figure 7.4, built on the exact structure of the decision tree designed by expert otoscopists, is modified using a fuzzy inference system to incorporate non-abrupt feature memberships. The fuzzy inference system consists of five main functional layers: (1) input, (2) fuzzification, (3) decision rules, (4) decision making, and (5) output defuzzification. We describe each layer for the otitis media classifier below.
Layer 1: Input Layer. At the onset of the fuzzy inference system we define the
input image as a set of otitis media vocabulary features.
Figure 7.6: Example of a continuous membership.
Layer 2: Fuzzification Layer. The membership functions for each feature are defined, and the membership degree is calculated for all the features. In the otitis media classifier, each input vocabulary feature is described by two membership functions, defined as sigmoidal functions given by
$$f(x, a, b) = \frac{1}{1 + e^{-a(x-b)}}.$$
Depending on the sign of the parameter a, the sigmoidal membership function is active for lower (a < 0) or higher (a > 0) values of x. The parameter b controls the position of the activation: at x = b the function equals 0.5, so b is the crossover point. For each of the vocabulary features, the membership degree is computed with two membership functions, low and high, as shown in Figure 7.7. The membership-function parameters are initialized using the threshold values obtained during training while building the grammar shown in Figure 7.4. The output consists of three membership functions, one for each diagnostic category, each defined by a constant function.
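The following minimal Python sketch shows the fuzzification step for one vocabulary feature. It assumes that the crossover b of both functions is set to the threshold learned for that feature in the grammar of Figure 7.4. It also assumes that the low and high functions share a hypothetical steepness |a| = 10 with opposite signs, so the two degrees always sum to 1. The function names are hypothetical.

```python
import math

def sigmf(x: float, a: float, b: float) -> float:
    """Sigmoidal membership f(x, a, b) = 1 / (1 + exp(-a (x - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (x - b)))

def fuzzify(x: float, threshold: float, steepness: float = 10.0) -> tuple[float, float]:
    """Return the (low, high) membership degrees of one feature value.

    A negative slope gives the 'low' function and a positive slope the
    'high' one; both cross 0.5 at the (hypothetical) learned threshold.
    """
    low = sigmf(x, -steepness, threshold)
    high = sigmf(x, steepness, threshold)
    return low, high

# Example: a feature value slightly above its threshold is mostly 'high'.
low, high = fuzzify(0.62, threshold=0.5)
```

Sharing one steepness between the two functions is only a starting point; during training each function's a and b can drift apart.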
Figure 7.7: Examples of membership functions using sigmoidal functions: (a) low; (b) high.
Layer 3: Decision Rules Layer. Here the fuzzy rules are defined using the membership degree of each vocabulary feature. In this layer, each rule is linked to its outcome, represented by a diagnostic category. The fuzzy-logic-based grammar of the otitis media classifier is derived from Figure 7.4 and consists of the following rules:
1. If bulging is high and light is high then AOM.
2. If bulging is high and light is low and central concavity is high then AOM.
3. If bulging is high and light is low and central concavity is low and amber is low
then AOM.
4. If bulging is low and light is high and central concavity is high then AOM.
5. If bulging is low and light is high and central concavity is low and malleus
presence is low then AOM.
6. If bulging is low and light is low and translucency is high then NOE.
7. If bulging is low and light is low and translucency is low and amber is low and
bubble is low and variance is low then NOE.
8. If bulging is low and light is high and central concavity is low and malleus
presence is high and translucency is low and amber is low and bubble is low
and variance is low then NOE.
9. If bulging is low and light is high and central concavity is low and malleus
presence is high and translucency is high then NOE.
10. If bulging is high and light is low and central concavity is low and amber is high
then OME.
11. If bulging is low and light is low and translucency is low and amber is high then
OME.
12. If bulging is low and light is low and translucency is low and amber is low and
bubble is high then OME.
13. If bulging is low and light is low and translucency is low and amber is low and
bubble is low and variance is high then OME.
14. If bulging is low and light is high and central concavity is low and malleus
presence is high and translucency is low and amber is high then OME.
15. If bulging is low and light is high and central concavity is low and malleus
presence is high and translucency is low and amber is low and bubble is high
then OME.
16. If bulging is low and light is high and central concavity is low and malleus
presence is high and translucency is low and amber is low and bubble is low
and variance is high then OME.
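To illustrate how a rule's antecedents combine, the Python sketch below encodes three of the rules above (rules 1, 6 and 11) as data and computes each rule's firing strength. It assumes the product as the fuzzy AND operator, a common choice in ANFIS-style systems (the minimum is another option). The membership degrees in the example are hypothetical.

```python
from math import prod

# Each rule: a list of (feature, term) antecedents joined by AND, and its outcome.
RULES = [
    ([("bulging", "high"), ("light", "high")], "AOM"),                         # rule 1
    ([("bulging", "low"), ("light", "low"), ("translucency", "high")], "NOE"),  # rule 6
    ([("bulging", "low"), ("light", "low"), ("translucency", "low"),
      ("amber", "high")], "OME"),                                               # rule 11
]

def firing_strengths(degrees, rules=RULES):
    """degrees maps feature -> {"low": mu_low, "high": mu_high}.

    Returns one firing strength per rule, using the product as fuzzy AND.
    """
    return [prod(degrees[feature][term] for feature, term in antecedents)
            for antecedents, _ in rules]

# Hypothetical membership degrees for one image.
degrees = {
    "bulging": {"low": 0.1, "high": 0.9},
    "light": {"low": 0.2, "high": 0.8},
    "translucency": {"low": 0.7, "high": 0.3},
    "amber": {"low": 0.4, "high": 0.6},
}
strengths = firing_strengths(degrees)
```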
Layer 4: Decision Making Layer. Each of the rules defined in Layer 3 is evaluated
to obtain individual outputs. These individual outputs are defined by the degree of
the output membership functions that are passed on to the next layer.
Layer 5: Output Defuzzification Layer. In this layer the final output is obtained as
the weighted average of the degree of all the output membership functions in Layer 4.
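Layers 4 and 5 can be sketched as a weighted average of constant rule outputs, as in a zero-order Sugeno-type system. The numeric codes for AOM, OME and NOE and the nearest-code mapping back to a label are assumptions made for illustration. The section does not specify how the crisp output is converted to a diagnostic category.

```python
# Hypothetical numeric codes for the three constant output functions.
CLASS_CODES = {"AOM": 1.0, "OME": 2.0, "NOE": 3.0}

def defuzzify(strengths, rule_classes, codes=CLASS_CODES):
    """Weighted average of the constant outputs of all rules."""
    total = sum(strengths)
    if total == 0:
        raise ValueError("no rule fired")
    return sum(w * codes[c] for w, c in zip(strengths, rule_classes)) / total

def to_label(y, codes=CLASS_CODES):
    """Map the crisp output to the class with the nearest code (an assumption)."""
    return min(codes, key=lambda c: abs(codes[c] - y))

# Firing strengths for three rules, e.g. from the decision making layer.
y = defuzzify([0.72, 0.006, 0.0084], ["AOM", "NOE", "OME"])
label = to_label(y)
```

The weighted average always lies between the codes of the rules that fired. Two equally strong AOM and NOE rules would therefore average to the OME code, so the numeric coding is a design choice rather than a given.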
We used the adaptive neuro-fuzzy inference system (ANFIS) available in the Fuzzy Logic Toolbox of MATLAB Version 7.12.0.635 (R2011a). The hybrid optimization method, which combines backpropagation for the input membership-function parameters with least-squares estimation for the output membership-function parameters, is used to tune the membership-function parameters during training on labeled data. The optimization stops when either a preset error goal or a preset number of iterations is reached, whichever comes first. The error measure is the sum of the squared differences between the actual and predicted outputs.
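The least-squares half of the hybrid method and the stopping rule can be sketched with NumPy. The sketch assumes one constant output per rule, as in a standard zero-order ANFIS. It omits the backpropagation update of the sigmoid parameters. `step` in `train` is a hypothetical callback that runs one training epoch and returns the sum of squared errors.

```python
import numpy as np

def fit_consequents(strengths, targets):
    """Least-squares estimate of the constant output of each rule.

    strengths: (n_samples, n_rules) rule firing strengths.
    targets:   (n_samples,) labeled outputs.
    """
    w = np.asarray(strengths, dtype=float)
    w_bar = w / w.sum(axis=1, keepdims=True)  # normalized firing strengths
    z, *_ = np.linalg.lstsq(w_bar, np.asarray(targets, dtype=float), rcond=None)
    return z, w_bar

def sse(targets, predicted):
    """Sum of squared differences between actual and predicted outputs."""
    diff = np.asarray(targets, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sum(diff ** 2))

def train(step, max_epochs, error_goal):
    """Call step() until the error goal or the epoch limit is reached,
    whichever comes first. Returns (epochs_run, last_error)."""
    err = float("inf")
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        err = step()
        if err <= error_goal:
            break
    return epoch, err
```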
Chapter 8
Experimental Results
We now present the results of applying our otitis media classifier on the tympanic
membrane images and compare it to the performance of other automated classifiers.
In the first section, we discuss the data set used in this work. In the second section, we present the process of obtaining ground truth for the tympanic membrane images from three expert otoscopists, along with the diagnoses provided by three general pediatricians on the same expert-diagnosed images. In the next section, we discuss the classification algorithms used for comparison with our method. The last section presents the experimental results of the classifiers with the corresponding discussion.
8.1 Data Set
As part of a clinical trial evaluating the efficacy of antimicrobials in young children
with acute otitis media, 826 tympanic membrane images at a resolution of 480 × 640
were collected using an otoendoscope from children with AOM, OME and NOE [27].
These images were collected by Dr. Hoberman [26] and Dr. Shaikh [62] at the Children’s Hospital of Pittsburgh of University of Pittsburgh Medical Center.
CHAPTER 8. EXPERIMENTAL RESULTS 79
8.2 Ground Truth
Each tympanic membrane image is assigned one of three diagnostic categories: AOM, OME, or NOE. For our experiments, the ground truth is obtained from a panel of three expert otoscopists. To understand the diagnostic accuracy in a more realistic clinical setting, where otitis media is routinely evaluated by non-expert otoscopists such as general pediatricians or family physicians, we also present the diagnoses provided by three general pediatricians on the images evaluated by the three expert otoscopists.
8.2.1 Diagnosis by Expert Otoscopists
A panel of three expert otoscopists examined these images and assigned a diagnosis to each image. As these images pose challenges even for expert otoscopists, agreement in labeling the images was rather poor. Having accurate ground-truth labels is crucial for algorithm development; we thus asked the panel to re-diagnose the entire data set while also providing a diagnosis confidence level for each image: levels between 80 and 100 indicated high confidence in the diagnosis, while levels below 30 indicated almost no confidence. Based on the confidence and the agreement on diagnosis among the experts, we divided the 826 tympanic membrane images into three non-overlapping data sets.
Data Set 1
We select a subset of the original 826 images for which the three experts gave the same diagnosis and expressed confidence of over 60 in that diagnosis. This ground-truth set contains 181 images: 63 AOM, 70 OME, and 48 NOE. We call this set data set 1 (DS1).
Data Set 2
We select a subset of the original 826 images for which the three experts gave the same diagnosis irrespective of the stated confidence; as the data sets are non-overlapping, this set excludes the images in DS1. This ground-truth set contains 390 images: 267 AOM, 82 OME, and 41 NOE. We call this set data set 2 (DS2).
8.2.2 Data Set 3
We select a subset of the original 826 images for which at least two experts assigned the same diagnosis irrespective of the stated confidence; since unanimous images belong to DS1 or DS2, exactly two experts agree on each image in this set. The label of each image is the majority vote among the three experts. This ground-truth set contains 248 images: 58 AOM, 112 OME, and 78 NOE. Inter-expert diagnostic variability is high in this set. To show how challenging this diagnostic task can be even for experts, Table 8.1 presents the diagnoses provided by each expert for the 248 images in data set 3 (DS3). Table 8.2 shows the number and percentage of images in DS3 on which each pair of experts assigned the same diagnosis.
             AOM    OME    NOE
Expert 1      73     52    123
Expert 2      39    166     43
Expert 3      58    131     59
Table 8.1: High variability in the diagnoses among the three expert otoscopists on the tympanic membrane images in data set DS3. Each row gives the number of images an expert assigned to each diagnostic category.
Such variability in the diagnoses of the three experts underscores that, even for these highly trained expert otoscopists, this is a challenging task.
Experts          (1, 2)   (2, 3)   (1, 3)
No. of images       73       81       94
Agreement (%)     29.4     32.7     37.9
Table 8.2: Agreement between pairs of expert otoscopists on the diagnosis of tympanic membrane images in data set DS3.
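The agreement figures in Table 8.2 are simple proportions: for instance, 73 of the 248 DS3 images gives 73/248 ≈ 29.4%. The Python sketch below computes the same kind of count and percentage for every pair of experts from per-image diagnoses. The toy labels are hypothetical, not the DS3 data.

```python
from itertools import combinations

def pairwise_agreement(labels):
    """labels: one tuple of expert diagnoses per image.

    Returns {(i, j): (count, percent)} for every pair of experts (1-based).
    """
    n_images = len(labels)
    n_experts = len(labels[0])
    result = {}
    for i, j in combinations(range(n_experts), 2):
        count = sum(1 for row in labels if row[i] == row[j])
        result[(i + 1, j + 1)] = (count, 100.0 * count / n_images)
    return result

# Hypothetical diagnoses by three experts for four images.
toy = [("AOM", "AOM", "OME"), ("OME", "NOE", "NOE"),
       ("NOE", "OME", "NOE"), ("AOM", "OME", "AOM")]
agreement = pairwise_agreement(toy)
```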
From the original set of 826 images, we exclude 7 images from our evaluations because each expert assigned them a different diagnostic category.
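The partition of the 826 images can be summarized as a rule applied to each image. The Python sketch below is one reading of the selection criteria. It assumes that "confidence of over 60" means every expert's confidence exceeds 60, and the function name is hypothetical. The counts are consistent with a partition: 181 + 390 + 248 + 7 = 826.

```python
from collections import Counter

def assign_data_set(diagnoses, confidences, min_conf=60):
    """Return (data set, label) for one image, or (None, None) if excluded.

    diagnoses:   the three expert diagnoses, e.g. ("AOM", "AOM", "OME").
    confidences: the three confidence levels (0-100).
    """
    label, votes = Counter(diagnoses).most_common(1)[0]
    if votes == 3:
        # Unanimous: DS1 if every expert's confidence is over 60, else DS2.
        in_ds1 = all(c > min_conf for c in confidences)
        return ("DS1" if in_ds1 else "DS2"), label
    if votes == 2:
        return "DS3", label  # label is the majority vote
    return None, None  # three different diagnoses: excluded
```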
8.2.3 Diagnosis by General Pediatricians
To validate the algorithm against a realistic diagnostic situation, we asked three
general pediatricians to examine our ground-truth set of 181 tympanic membrane
images provided by expert otoscopists. The experiment also required them to state
their level of confidence in diagnosing each of the tympanic membrane images. In
cases of diagnosis with high confidence, the examiner assigned only one diagnostic
category to the image, whereas in cases where the confidence of diagnosis was either
medium or low, the examiner was asked to also provide a second possible choice
of diagnosis, resulting in two diagnoses of an image representing first and second
diagnostic choices, respectively.
To evaluate how the group of three general pediatricians performed on the ground-truth data set DS1, Table 8.3 shows three confusion matrices. The first is the average diagnosis of the three pediatricians, while the other two are the average diagnoses made with high and with medium/low confidence, respectively. The average of the diagnostic accuracies of the three examining pediatricians is 79.6% (91.7%, 75.7%, and 71.3%, respectively), well below the 100% of the expert otoscopists whose diagnoses serve as our ground truth.
              Total              High confidence      Medium/low confidence
         AOM   OME   NOE       AOM   OME   NOE       AOM   OME   NOE
AOM       62     1     0        60     0     0         2     1     0
OME       11    56     3         6    37     1         5    19     2
NOE        4    18    26         1     8    15         3    10    11
Accuracy: 79.6%
Table 8.3: Diagnoses by three general pediatricians (columns) versus the ground truth of expert otoscopists (rows).
In terms of misdiagnoses, NOE and OME are the categories with the highest level of misdiagnosis. The misdiagnosis of OME as AOM (15.7%) is clearly a cause for concern, since it leads to unnecessary prescription of antibiotics. Similarly, NOE is often misdiagnosed as OME (37.5%). Surprisingly, only 50% of NOE cases were diagnosed with high confidence, and 9 of these 24 were misdiagnosed. Of the remaining 24 cases, diagnosed with medium/low confidence, 13 (54.2%) were misdiagnosed, 10 of them as OME; such misdiagnoses may lead to unnecessary treatment procedures.
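The percentages quoted above follow directly from the Total matrix of Table 8.3. As a check, the Python sketch below recomputes the overall accuracy and the row-normalized misdiagnosis rates with NumPy. The matrix is copied from Table 8.3, and the helper names are hypothetical.

```python
import numpy as np

# Rows: expert ground truth (AOM, OME, NOE); columns: pediatricians' diagnosis.
TOTAL = np.array([[62, 1, 0],
                  [11, 56, 3],
                  [4, 18, 26]])

def accuracy(cm):
    """Overall accuracy in percent: correct diagnoses over all images."""
    return 100.0 * np.trace(cm) / cm.sum()

def row_rates(cm):
    """Percentage of each true class assigned to each diagnosed class."""
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)

rates = row_rates(TOTAL)
print(round(float(accuracy(TOTAL)), 1))                          # 79.6
print(round(float(rates[1, 0]), 1), round(float(rates[2, 1]), 1))  # 15.7 37.5
```

The diagonal of `rates` gives the class-wise accuracies 98.4%, 80.0% and 54.2%, the GP column of Table 8.4.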
8.3 Automated Classifiers for Comparison
To validate our algorithm, we also compare it to five automated classifiers. Three of them we designed previously: the correlation filter classification system¹, the multiresolution classifier, and the SIFT and shape descriptors with SVM classifier². The other two, the WND-CHARM classifier and the random forest classifier, are available in the literature. We now briefly describe each of these. Note that for all the experiments, we used a 5-fold cross-validation setup.
¹The correlation filter classification system was implemented by Dr. Pablo Hennings Yeomans during the early phase of this project.
²The SIFT and shape descriptors with SVM classifier was implemented by Dr. Pedro Quelhas during the early phase of this project.
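A 5-fold cross-validation setup can be sketched as follows. The section does not state whether the folds were stratified by class or how images were shuffled. The plain shuffled split, the fixed seed and the `fit_predict` callback below are therefore assumptions for illustration.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n, fit_predict, k=5, seed=0):
    """Train on k-1 folds and predict the held-out fold, k times.

    fit_predict(train_idx, test_idx) is a hypothetical callback that trains a
    classifier on train_idx and returns one prediction per index in test_idx.
    """
    preds = [None] * n
    for fold in k_fold_indices(n, k, seed):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        for i, p in zip(fold, fit_predict(train, fold)):
            preds[i] = p
    return preds
```

With the 181 images of DS1, for example, this yields one fold of 37 images and four of 36.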
8.3.1 Correlation Filter Classification System
In this classifier, the image is first transformed into the polar domain. Overlapping
concentric annular regions of different radii are extracted from the image. The center
of the annular regions is assigned as the centroid of the segmented tympanic membrane
image. During the training phase, templates of annular regions for each class are
obtained. These templates are then used to assign a class label to the test images
based on their similarity using normalized cross correlation measure.
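The matching step can be illustrated with a short NumPy sketch. A boolean mask selects an annular region around the membrane centroid, and a test region is assigned the class whose template has the highest normalized cross-correlation. The polar transform is omitted, and the template dictionary and function names are hypothetical. This is not the original implementation.

```python
import numpy as np

def annulus_mask(shape, center, r_in, r_out):
    """Boolean mask of pixels whose distance from center lies in [r_in, r_out)."""
    yy, xx = np.indices(shape)
    r = np.hypot(yy - center[0], xx - center[1])
    return (r >= r_in) & (r < r_out)

def ncc(a, b):
    """Zero-lag normalized cross-correlation of two equal-size arrays, in [-1, 1]."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def classify_by_templates(region, templates):
    """templates maps class label -> template array; returns the best-matching label."""
    return max(templates, key=lambda label: ncc(region, templates[label]))
```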
8.3.2 Multiresolution Classifier
The multiresolution classifier, which was designed for biomedical applications [13],
decomposes the image into subbands using a multiresolution decomposition (for ex-
ample, wavelets or wavelet packets), followed by feature extraction and classification
in each subband using neural networks (any classifier can be used in each individual
subband) and a global decision based on weighted individual subband decisions. We
ran the multiresolution classifier with 2 levels and 26 Haralick texture features on the
grayscale image and each of the 20 subbands (546 in total).
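The feature count follows from the decomposition. Assuming a full two-level 2-D wavelet packet tree, level 1 has 4 subbands and level 2 has 16, giving 20. Adding the grayscale image gives 21 inputs, and 26 features on each yields 546. The sketch below checks this arithmetic and illustrates a weighted global vote. The per-subband decisions and weights are hypothetical; weights could, for example, reflect each subband's validation accuracy. This is not the implementation of [13].

```python
from collections import defaultdict

def wavelet_packet_subbands(levels, children=4):
    """Subbands in a full 2-D wavelet packet tree, summed over levels 1..levels."""
    return sum(children ** level for level in range(1, levels + 1))

def weighted_vote(decisions, weights):
    """Global label: the class with the largest total weight of subband votes."""
    score = defaultdict(float)
    for label, weight in zip(decisions, weights):
        score[label] += weight
    return max(score, key=score.get)

n_subbands = wavelet_packet_subbands(2)   # 4 + 16 = 20
n_features = 26 * (n_subbands + 1)        # grayscale image + 20 subbands -> 546
```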
8.3.3 SIFT and Shape Descriptors with SVM Classifier
In this classifier, we combined SIFT descriptors and shape features. SIFT descrip-
tors [40, 41] are first extracted from the images using the VLFeat library [72]. The
shape features were intended to detect bulging in the tympanic membrane; the main idea was to extract areas with bright and dark symmetry. On the segmented image, we applied the phase symmetry detection algorithm described in [34]. Bright and dark regions were segmented using Otsu’s thresholding algorithm [53], resulting in two masks: one for the bright bulging regions and the other for the rest. Based on these masks, the following features were computed: total area of bright regions, total area of dark regions, average symmetry measure in bright areas, number of dark regions, number of bright regions, and mean area of bright regions. The SIFT descriptors and shape features are normalized and combined using a bag-of-words model. The classification was performed using a support vector machine [12].
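Otsu's method, used here to split bright and dark regions, picks the threshold that maximizes the between-class variance of the intensity histogram. A minimal NumPy version is sketched below for illustration. In the described pipeline it would be applied to the phase-symmetry output, which is not reproduced here. scikit-image also provides a ready-made `skimage.filters.threshold_otsu`.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Return the histogram bin center that maximizes between-class variance."""
    hist, edges = np.histogram(np.asarray(values, dtype=float).ravel(), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    p = hist / hist.sum()
    w0 = np.cumsum(p)               # weight of the class at or below each bin
    mu = np.cumsum(p * centers)     # cumulative mean up to each bin
    mu_t = mu[-1]                   # global mean
    w1 = 1.0 - w0
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * w0 - mu) ** 2 / (w0 * w1)
    # Empty classes give 0/0; treat them as zero variance.
    sigma_b2 = np.nan_to_num(sigma_b2, nan=0.0, posinf=0.0, neginf=0.0)
    return centers[int(np.argmax(sigma_b2))]

# Bright mask: pixels above the threshold; dark mask: the rest.
```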
8.3.4 WND-CHARM Classifier
This is a universal classifier that extracts a large number (4,008) of generic image-level
features [65]. The computed features include polynomial decompositions, high con-
trast features, pixel statistics, and textures. These features are derived from the raw
image, transforms of the image, and compound transforms of the image (transforms
of transforms). The algorithm performs a feature selection during the training stage
by assigning a weight to each feature depending on its ability to distinguish between
the classes. These weighted features are then used to classify test images based on
their similarity to the training classes using nearest neighbor algorithm.
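The weighting idea can be sketched with NumPy. Each feature is scored with a Fisher-style ratio of between-class to within-class variance. A test sample then takes the label of its nearest training sample under a feature-weighted distance. This is a simplified illustration under those assumptions, not WND-CHARM's actual classifier [65].

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature ratio of between-class to mean within-class variance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    within = np.array([X[y == c].var(axis=0) for c in classes]).mean(axis=0)
    return means.var(axis=0) / (within + 1e-12)

def weighted_nn_predict(X_train, y_train, X_test, weights):
    """1-nearest-neighbour labels under a feature-weighted squared Euclidean distance."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    diff = X_test[:, None, :] - X_train[None, :, :]
    dist = (diff ** 2 * np.asarray(weights, dtype=float)).sum(axis=2)
    return np.asarray(y_train)[dist.argmin(axis=1)]
```

Features that separate the classes receive large weights, while uninformative ones contribute almost nothing to the distance.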
8.3.5 Random Forest Classifier
This is an ensemble classifier [8] that consists of many decision trees, and outputs the
class that is the result of a majority vote of the classes output by individual trees.
The random forest was constructed on the 8 otitis media vocabulary features. At
every node in the tree, a subset of 5 features out of 8 was randomly selected. The
split at each node was performed on the feature from this subset that gave the best
performance. The number of trees in the forest was fixed at 500 because, over multiple runs of the random forest, we observed that the out-of-bag error converged in the range of 475–500 trees. We used the random forest implementation in [29].
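The two settings described above can be illustrated with a few lines of Python: 5 of the 8 features sampled at each node, and a majority vote over 500 trees. This sketch only shows the node-level feature sampling and the vote. Tree growing is omitted, and the constant names are hypothetical. For comparison, scikit-learn's `RandomForestClassifier(n_estimators=500, max_features=5, oob_score=True)` expresses a similar configuration, though it is not the implementation of [29] used here.

```python
import random
from collections import Counter

N_FEATURES = 8   # otitis media vocabulary features
MTRY = 5         # features randomly sampled at every node
N_TREES = 500    # trees in the forest

def split_candidates(rng, n_features=N_FEATURES, mtry=MTRY):
    """Random subset of feature indices considered for one node split."""
    return rng.sample(range(n_features), mtry)

def forest_vote(tree_predictions):
    """Class output by the majority of the individual trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)
candidates = split_candidates(rng)
```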
8.4 Classification of Tympanic Membrane Images
In this section, we discuss the performance of classifiers on the tympanic membrane
images. The experimental results are presented for each of the data sets (DS1, DS2,
and DS3) and the corresponding reduced set of images after applying the rejection
procedure presented in Section 5.3.
8.4.1 Results: DS1
DS1 consists of the 181 images on which all the three experts stated the same diag-
nosis with high confidence. Given the nature of this data set, we have an opportunity
to understand the discriminative power of our algorithm designed using vocabulary
features and grammar governing the decision rules. Our goal is to achieve classifica-
tion accuracy comparable to the diagnostic capability of the experts while classifying
tympanic membrane images. To that end, we present the performance of the auto-
mated classifiers discussed in Section 8.3 in comparison with the three versions of
otitis media classifiers starting with our classifier built during the early phase of the
project [36] using six vocabulary features. This was further improved in [37] using
eight vocabulary features and finally we present the otitis media fuzzy logic classifier.
Table 8.4 compares the diagnostic performance on the data set of 181 images of three general pediatricians (GP) and eight classifiers: correlation filter classification system (CFC), WND-CHARM (WCM), multiresolution classifier (MRC), SIFT and shape descriptors with SVM classifier (SSC), random forest classifier (RF), our initial classifier from [36], otitis media classifier [37] (OMC), and otitis media fuzzy logic classifier (OMFLC). Table 8.5 compares the results of the same classifiers on the data set of 170 images remaining after automatic rejection of unreliable images.
For ease of reference, we append the suffix ‘R’ to the names of all classifiers applied to the data set after rejection. For example, the correlation filter classification system with rejection and the otitis media fuzzy logic classifier with rejection are referred to as CFCR and OMFLCR, respectively.
            CFC    WCM    MRC    SSC    GP     RF     [36]   OMC    OMFLC
AOM        66.7   68.2   53.5   66.7   98.4   84.1   81.3   88.8   92.1
OME        57.1   60.8   66.3   81.0   80.0   81.4   85.7   82.6   90.0
NOE        62.5   63.4   75.1   60.0   54.2   66.6   81.4   85.4   93.8
Accuracy   61.8   64.1   64.1   70.2   79.6   80.1   84.0   85.6   91.7
Table 8.4: Classification accuracies (in %) on the ground-truth set of 181 tympanic membrane images. The first three rows give the class-wise classification accuracies and the last row the overall accuracy; columns correspond to the diagnosis by three general pediatricians (GP) as well as the following algorithms: correlation filter classification system (CFC), WND-CHARM (WCM), multiresolution classifier (MRC), SIFT and shape descriptors with SVM classifier (SSC), random forest classifier (RF), our initial classifier [36], otitis media classifier (OMC) [37], and otitis media fuzzy logic classifier (OMFLC).
            CFCR   WCMR   MRCR   SSCR   RFR    OMCR   OMFLCR
AOM        65.6   65.6   61.0   76.2   80.3   90.0   93.4
OME        56.7   58.2   65.3   72.8   79.1   89.1   91.0
NOE        71.4   69.0   83.9   58.3   69.4   93.2   91.4
Accuracy   63.6   63.6   68.2   70.0   77.1   89.9   93.5
Table 8.5: Classification accuracies (in %) on the ground-truth set of 170 of the 181 tympanic membrane images remaining after rejection. The first three rows give the class-wise classification accuracies and the last row the overall accuracy; columns correspond to classification by the following algorithms: correlation filter classification system (CFCR), WND-CHARM (WCMR), multiresolution classifier (MRCR), SIFT and shape descriptors with SVM classifier (SSCR), random forest classifier (RFR), otitis media classifier (OMCR), and otitis media fuzzy logic classifier (OMFLCR).
OMFLCR outperforms the five compared automated classifiers by a fair margin (16.4 percentage points over RFR, the best of them). The random forest classifier shows the highest performance among the five compared algorithms but fails to outperform the otitis media classifiers. There are two reasons for this poorer performance. First, since each image is assigned an output label based on the majority vote of all the decision trees in the forest, the final label can be influenced by poorly formed decision trees. Second, a random forest classifier is known to perform better when the features used are uncorrelated, which is not the case in this work, since more than one vocabulary feature directly targets a specific diagnostic category.
While the overall performance increase from the otitis media classifier presented in [36] to the otitis media classifier using the new vocabulary and grammar might not seem substantial, the increase in classification accuracy for AOM cases (from 81.3% to 88.8%) is considerable. This increase can be attributed to the new grammar presented in Figure 7.4, which includes the new vocabulary features bulging and malleus presence. In 7.1, identifying AOM was based solely on the central concavity and light features, which only indicate the presence of a bulge, unlike the bulging feature, which measures the total area of bulging in the tympanic membrane. The performance of the otitis media classifier with rejection reflects a trade-off between misclassification and not classifying all the input data. A total of 11 images (2 AOM, 3 OME, and 6 NOE) were rejected due to specular highlights and illumination problems; no images in this set were rejected due to excessive cerumen. We believe that this rejection step during preprocessing will ensure the collection of good-quality images that are suitable for processing and high-quality diagnosis.
                   Pediatricians            OMFLC
              AOM    OME    NOE       AOM    OME    NOE
AOM            62      1      0        58      3      2
OME            11     56      3         6     63      1
NOE             4     18     26         1      2     45
Accuracy (%)        79.6                    91.7
Table 8.6: Diagnoses by three general pediatricians (columns 2, 3, and 4) and OMFLC (columns 5, 6, and 7) versus the ground truth of expert otoscopists (rows) on images in data set DS1.
Overall, the otitis media fuzzy logic classifier outperforms the average of the three general pediatricians by a good margin (91.7% versus 79.6%), as seen in Table 8.6. Note that, for a fair comparison, we did not compare the performance of the pediatricians with the otitis media classifier with rejection, because pediatricians do not have an objective way of rejecting images of poor quality. At the same time, the rejection capability is a clear advantage of an automated algorithm and leads to improved performance (from 91.7% without rejection to 93.5% with rejection). Pediatricians performed well in diagnosing AOM (98.4%), but with a high tendency to overdiagnose it.
When comparing misdiagnoses of OME and NOE as AOM between pediatricians
and the algorithm, 15.7% (11 out of 70) cases of OME and 8.3% (4 out of 48) cases
of NOE were misdiagnosed as AOM by pediatricians compared to 8.6% (6 out of 70)
cases of OME and 2.1% (1 out of 48) of NOE by the classifier, with a p-value of
0.0421 for the two-tailed Fisher exact test. When comparing misdiagnoses of NOE
between pediatricians and the algorithm, 45.8% (22 out of 48) cases of NOE were
misdiagnosed by pediatricians compared to only 6.3% (3 out of 48) by the classifier,
with a p-value of 0.0001 for the two-tailed Fisher exact test. From these observations,
we conclude that, on average, our algorithm outperforms general pediatricians.
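The two-tailed Fisher exact test compares two proportions through the hypergeometric distribution of a 2×2 table with fixed margins. The pure-Python sketch below implements the common definition of the two-sided p-value as the sum of the probabilities of all tables no more likely than the observed one, the same definition used by `scipy.stats.fisher_exact`. The section does not give the exact 2×2 tables behind the reported p-values, since the pediatrician counts are averages over three readers. The example therefore uses textbook tables rather than attempting to reproduce 0.0421 or 0.0001.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))
```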
8.4.2 Results: DS2
Table 8.7 shows the classification accuracies of the classifiers on the data set of 390 images (267 AOM, 82 OME, and 41 NOE). OMFLC achieves the highest overall accuracy (69.3%) and the highest AOM accuracy, although it is not the best in every class: RF is more accurate on OME (75.6%) and OMC on NOE (58.5%). After the rejection procedure, OMFLCR again achieves the highest overall accuracy (69.5%) on the 233 retained images (144 AOM, 52 OME, and 37 NOE), as shown in Table 8.8.
            WCM    CFC    MRC    RF     SSC    OMC    OMFLC
AOM        55.1   57.3   73.4   54.3   70.4   71.5   74.2
OME        48.8   48.8   28.1   75.6   40.2   61.0   61.0
NOE        41.5   29.3    4.9   39.0   39.0   58.5   53.6
Accuracy   52.3   52.6   56.7   57.2   60.8   67.9   69.3
Table 8.7: Classification accuracies (in %) on the ground-truth set of 390 tympanic membrane images (267 AOM, 82 OME, and 41 NOE). The first three rows give the class-wise classification accuracies and the last row the overall accuracy; columns correspond to the classification by the following algorithms: WND-CHARM (WCM), correlation filter classification system (CFC), multiresolution classifier (MRC), random forest classifier (RF), SIFT and shape descriptors with SVM classifier (SSC), otitis media classifier (OMC) [37], and otitis media fuzzy logic classifier (OMFLC).
            RFR    CFCR   WCMR   MRCR   SSCR   OMCR   OMFLCR
AOM        54.1   60.8   59.0   76.4   68.9   72.3   71.6
OME        81.3   58.3   42.1   37.5   50.0   64.6   68.8
NOE        21.6   32.4   64.9    2.7   54.1   59.5   62.2
Accuracy   54.5   55.8   56.2   56.7   62.6   68.7   69.5
Table 8.8: Classification accuracies (in %) on the ground-truth set of 233 of the 390 tympanic membrane images (144 AOM, 52 OME, and 37 NOE) after rejection. The first three rows give the class-wise classification accuracies and the last row the overall accuracy; columns correspond to the classification by the following algorithms: random forest classifier (RFR), correlation filter classification system (CFCR), WND-CHARM (WCMR), multiresolution classifier (MRCR), SIFT and shape descriptors with SVM classifier (SSCR), otitis media classifier (OMCR) [37], and otitis media fuzzy logic classifier (OMFLCR).
8.4.3 Results: DS3
Table 8.9 shows the classification accuracies of the classifiers on the data set of 248 images (58 AOM, 112 OME, and 78 NOE). OMFLC achieves the highest overall accuracy (54.5%), although other classifiers are more accurate on individual classes (MRC on AOM, RF on OME, and WCM on NOE). After the rejection procedure, OMFLCR again achieves the highest overall accuracy (56.2%) on the 162 retained images (44 AOM, 46 OME, and 72 NOE), as shown in Table 8.10.
            MRC    CFC    WCM    SSC    RF     OMC    OMFLC
AOM        65.5   62.1   50.0   56.9   43.1   60.3   63.8
OME        36.6   34.8   34.8   35.7   70.5   52.7   54.5
NOE         3.9   37.2   51.3   50.0   15.4   46.2   48.7
Accuracy   33.1   41.9   43.6   45.2   46.8   52.4   54.5
Table 8.9: Classification accuracies (in %) on the ground-truth set of 248 tympanic membrane images (58 AOM, 112 OME, and 78 NOE). The first three rows give the class-wise classification accuracies and the last row the overall accuracy; columns correspond to the classification by the following algorithms: multiresolution classifier (MRC), correlation filter classification system (CFC), WND-CHARM (WCM), SIFT and shape descriptors with SVM classifier (SSC), random forest classifier (RF), otitis media classifier (OMC) [37], and otitis media fuzzy logic classifier (OMFLC).
            MRCR   CFCR   RFR    SSCR   WCMR   OMCR   OMFLCR
AOM        68.2   63.6   45.5   65.9   59.1   56.8   63.6
OME        47.8   37.0   67.4   32.6   47.8   39.1   39.1
NOE         5.6   37.5   33.3   44.5   50.0   61.1   62.5
Accuracy   34.6   43.2   46.3   46.9   51.9   53.7   56.2
Table 8.10: Classification accuracies (in %) on the ground-truth set of 162 of the 248 tympanic membrane images (44 AOM, 46 OME, and 72 NOE) after rejection. The first three rows give the class-wise classification accuracies and the last row the overall accuracy; columns correspond to the classification by the following algorithms: multiresolution classifier (MRCR), correlation filter classification system (CFCR), random forest classifier (RFR), SIFT and shape descriptors with SVM classifier (SSCR), WND-CHARM (WCMR), otitis media classifier (OMCR) [37], and otitis media fuzzy logic classifier (OMFLCR).
For reliable computation, we objectively reject images based on the presence of specular highlights, poor illumination, and cerumen obstructing adequate visualization of the tympanic membrane. Note that the fraction of rejected images differs across data sets. Only 11 of 181 images are rejected in DS1, whereas 157 of 390 images are rejected in DS2 and 86 of 248 in DS3. One of the reasons the experts stated for lower diagnostic confidence is poor image quality. Our rejection procedure is in line with the experts’ opinion that image quality is critical for a clear diagnosis. We observe that larger fractions of images are rejected in data sets DS2 and DS3, where the experts state lower diagnostic confidence and, in DS3, agree less often.
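The rejected fractions differ sharply across the three data sets. The short Python computation below turns the counts quoted above into percentages and retained-set sizes. The retained sizes match the 170, 233 and 162 images of Tables 8.5, 8.8 and 8.10.

```python
# (rejected, total) image counts per data set, from the text above.
REJECTED = {"DS1": (11, 181), "DS2": (157, 390), "DS3": (86, 248)}

def rejection_summary(counts=REJECTED):
    """Return {data set: (percent rejected, images retained)}."""
    return {name: (round(100 * rej / total, 1), total - rej)
            for name, (rej, total) in counts.items()}

for name, (pct, kept) in rejection_summary().items():
    print(f"{name}: {pct}% rejected, {kept} retained")
# DS1: 6.1% rejected, 170 retained
# DS2: 40.3% rejected, 233 retained
# DS3: 34.7% rejected, 162 retained
```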
The consistently better overall performance of our otitis media classifier, designed based on vocabulary and grammar, validates our methodology. A small number of targeted, physiologically meaningful features (the vocabulary), together with a well-designed grammar that mimics the decision process of expert otoscopists, is what is needed to achieve accurate classification in this problem.
Chapter 9
Conclusions
The main goal of this thesis was to create an accurate automated classification system
for classifying the three diagnostic categories of otitis media based on tympanic mem-
brane images. Our working hypothesis was that mimicking the diagnostic process of
the expert otoscopists will lead to an accurate classification system to distinguish
the diagnostic categories of AOM/OME/NOE. In our efforts to closely mimic the expert otoscopists’ diagnostic abilities, we follow two guiding principles: vocabulary and grammar.
In this thesis, we present:
• Otitis Media Vocabulary: A set of features designed to characterize the
actual visual cues used by expert otoscopists while distinguishing the diagnostic
categories of otitis media.
• Otitis Media Grammar: A set of rules that govern the association and hier-
archy to combine the vocabulary terms in order to mimic the clinical decision
process of the expert otoscopists while distinguishing the diagnostic categories
of otitis media.
CHAPTER 9. CONCLUSIONS 93
The otitis media classifier designed using vocabulary and grammar identifies the diagnostic categories of otitis media with high accuracy, comparable to the diagnoses by expert otoscopists. Compared with other automated classifiers and with diagnoses by general pediatricians, the otitis media classifiers achieved higher classification accuracy by a fair margin. These results demonstrate that our simple and concise 8-feature otitis media vocabulary is effective on this problem, underscoring the importance of using targeted, physiologically meaningful features instead of a large number of general-purpose features. The classification process, the grammar, consists of a set of clear, intuitive rules closely mimicking the diagnostic process used by otoscopists. Increasing the accuracy further becomes harder as we have reached a high accuracy range; we now discuss potential strategies for achieving that as directions for further work.
Images captured using a digital otoscope exhibit large variability arising from the non-standard acquisition procedure. Depending on the angle and amount of light incident on the membrane and the ear canal, we encounter different illumination problems related to brightness and contrast. In our current implementation, we correct only local illumination problems and do not address global illumination problems; images found to be unreliable due to poor illumination were rejected from further computation. Artifacts such as shading, shadows, and global variation in intensity or color due to overexposure or underexposure will affect feature computation. Strategies for minimizing such artifacts, including illumination normalization, which we have not yet explored, are the subject of future work.
In summary, the otitis media classifier introduced in this thesis validates our working hypothesis by demonstrating high classification accuracies on images of the tympanic membrane, with performance comparable to the diagnoses of expert otoscopists examining the same images. The current standard for diagnosing otitis media is visual examination and, as argued earlier, this subjective evaluation has clear limitations. Our contribution is significant and innovative: since no other system exists for the objective evaluation of otitis media, our otitis media classifier will be the first automated system for classifying its diagnostic categories. We believe that, with further improvements, the otitis media classifier can be employed as a clinical diagnostic aid for non-expert examiners to drastically decrease both underdiagnosis and overdiagnosis of AOM. This would ensure adequate antimicrobial use when AOM is present and reduce inappropriate use when AOM is absent, thus avoiding adverse side effects and the risk of contributing to bacterial resistance.
Bibliography
[1] E. Asher, E. Leibovitz, J. Press, D. Greenberg, N. Bilenko, and H. Reuveni.
Accuracy of acute otitis media diagnosis in community and hospital settings.
Acta Paediatr., 94(4):423–428, April 2005.
[2] P. N. Belhumeur and D. Kriegman. What is the set of images of an object under
all possible lighting conditions? In Proc. IEEE Int. Conf. Comput. Vis. Pattern
Recogn., pages 270–277, June 1996.
[3] R. Bhagavatula, M. C. Fickus, J. W. Kelly, C. Guo, J. A. Ozolek, C. A. Castro,
and J. Kovacevic. Automatic identification and delineation of germ layer components
in H&E stained images of teratomas derived from human and nonhuman
primate embryonic stem cells. In Proc. IEEE Int. Symp. Biomed. Imag., pages
1041–1044, Rotterdam, The Netherlands, April 2010.
[4] bimagicLab. http://www.jelena.ece.cmu.edu/bimagic.html.
[5] C. M. Bishop. Pattern Recognition and Machine Learning. Information Science
and Statistics. Springer, 2006.
[6] R. Bornard, E. Lecan, L. Laborelli, and J. Chenot. Missing data correction in
still images and image sequences. In Proc. ACM Int. Conf. Multimedia, pages
355–361, Juan-les-Pins, France, 2002.
[7] L. Breiman. Bagging predictors. Mach. Learn., 24(2):123–140, 1996.
[8] L. Breiman. Random forests. Mach. Learn., 45(1):5–32, October 2001.
[9] J. Canny. A computational approach to edge detection. IEEE Trans. Pattern
Anal. Mach. Intell., 8(6):679–698, November 1986.
[10] H. P. Chan, J. Wei, Y. Zhang, M. A. Helvie, R. H. Moore, B. Sahiner, L. Hadjiiski,
and D. B. Kopans. Computer-aided detection of masses in digital tomosynthesis
mammography: comparison of three approaches. Med. Phys.,
35(9):4087–4095, 2008.
[11] T. F. Chan and L. A. Vese. Active contours without edges. IEEE Trans. Image
Process., 10(2):266–277, February 2001.
[12] C. C. Chang and C. J. Lin. LIBSVM: A library for support vector machines.
ACM Trans. Intell. Syst. Tech., 2(3):27:1–27:27, 2011.
[13] A. Chebira, Y. Barbotin, C. Jackson, T. E. Merryman, G. Srinivasa, R. F. Murphy,
and J. Kovacevic. A multiresolution approach to automated classification
of protein subcellular location images. BMC Bioinform., 8(210), 2007.
[14] Y. Cheng. Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal.
Mach. Intell., 17:790–799, 1995.
[15] L. D. Cohen. On active contour models and balloons. CVGIP: Image Und.,
53(2):211–218, March 1991.
[16] R. O. Duda and P. E. Hart. Use of the Hough transformation to detect lines and
curves in pictures. Commun. ACM, 15(1):11–15, January 1972.
[17] A. El-Baz, G. M. Beache, G. Gimelfarb, K. Suzuki, K. Okada, A. Elnakib,
A. Soliman, and B. Abdollahi. Computer-aided diagnosis systems for lung cancer:
Challenges and methodologies. Int. J. Biomed. Imag., 2013.
[18] K. Ganapathy, J. Hu, J. Kovacevic, A. Mojsilovic, and R. J. Safranek. Retrieval
and matching of color patterns based on a predetermined vocabulary and grammar.
US Patent, Jun. 25, 2002. #6,411,953.
[19] K. Ganapathy, J. Hu, J. Kovacevic, A. Mojsilovic, and R. J. Safranek. Retrieval
and matching of color patterns based on a predetermined vocabulary and grammar:
II. US Patent, Nov. 26, 2002. #6,487,554.
[20] B. van Ginneken, B. M. ter Haar Romeny, and M. A. Viergever. Computer-aided
diagnosis in chest radiography: A survey. IEEE Trans. Med. Imag., 20(12):1228–1241,
December 2001.
[21] R. C. Gonzalez and R. E. Woods. Digital Image Processing. Prentice Hall,
Englewood Cliffs, NJ, 2002.
[22] R. M. Haralick. Statistical and structural approaches to texture. Proc. IEEE,
67:786–804, 1979.
[23] R. M. Haralick, K. Shanmugam, and I. Dinstein. Textural features for
image classification. IEEE Trans. Syst. Man Cybern., 3(6):610–621, November 1973.
[24] G. F. Hayden. Acute suppurative otitis media in children. Diversity of clinical
diagnostic criteria. Clin. Pediatrics, 20(2):99–104, 1981.
[25] T. Ho. The random subspace method for constructing decision forests. IEEE
Trans. Pattern Anal. Mach. Intell., 20(8):832–844, 1998.
[26] A. Hoberman. http://www.chp.edu/CHP/Hoberman,+Alejandro,+MD.
[27] A. Hoberman, J. L. Paradise, H. E. Rockette, N. Shaikh, E. R. Wald, D. H.
Kearney, D. K. Colborn, M. K. Lasky, S. Bhatnagar, M. A. Haralam, L. M.
Zoffel, C. Jenkins, M. A. Pope, T. L. Balentine, and K. A. Barbadora. Treatment
of acute otitis media in children under 2 years of age. The New England J. Med.,
364:105–115, 2011.
[28] E. Ilkko, K. Suomi, and A. Karttunen. Computer-assisted diagnosis by temporal
subtraction in postoperative brain tumor patients: A feasibility study. Acad.
Radiol., 11(8):887–893, 2004.
[29] A. Jaiantilal. Randomforest-matlab. https://code.google.com/p/randomforest-matlab/.
[30] A. K. Jain and A. Vailaya. Image retrieval using color and shape. Pattern
Recogn., 29(8):1233–1244, 1996.
[31] B. Julesz. Textons, the elements of texture perception, and their interactions.
Nature, 290:91–97, March 1981.
[32] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. Int.
J. Comput. Vis., 1(4):321–331, 1988.
[33] G. J. Klinker, S. A. Shafer, and T. Kanade. A physical approach to color image
understanding. Int. J. Comput. Vis., 4(1):7–38, 1990.
[34] P. Kovesi. Matlab and octave functions for computer vision and image processing.
http://www.csse.uwa.edu.au/~pk/research/matlabfns/.
[35] B. V. K. Vijaya Kumar, A. Mahalanobis, and R. D. Juday. Correlation Pattern
Recognition. Cambridge Univ. Press, 2005.
[36] A. Kuruvilla, J. Li, P. Hennings Yeomans, P. Quelhas, N. Shaikh, A. Hoberman,
and J. Kovacevic. Otitis media vocabulary and grammar. In Proc. IEEE Int.
Conf. Image Process., pages 2845–2848, Orlando, FL, September 2012.
[37] A. Kuruvilla, N. Shaikh, A. Hoberman, and J. Kovacevic. Automated diagnosis
of otitis media: A vocabulary and grammar. Int. J. Biomed. Imag., sp. iss.
Computer Vis. Image Process. for Computer-Aided Diagnosis, August 2013.
[38] C. Lannon, L. E. Peterson, and A. Goudie. Quality measure for the care of
children with otitis media with effusion. Pediatrics, 127, May 2011.
[39] T. Leung and J. Malik. Representing and recognizing the visual appearance of
materials using three-dimensional textons. Int. J. Comput. Vis., 43:29–44, 2001.
[40] D. G. Lowe. Object recognition from local scale-invariant features. In Proc.
IEEE Int. Conf. Comput. Vis., volume 2, pages 1150–1157, Washington, DC,
1999.
[41] D. G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J.
Comput. Vis., 60(2):91–110, November 2004.
[42] J. B. MacQueen. Some methods for classification and analysis of multivariate
observations. In Proc. of the fifth Berkeley Symposium on Mathematical Statistics
and Probability, volume 1, pages 281–297. Univ. California Press, 1967.
[43] R. Malladi, J. A. Sethian, and B. Vemuri. Shape modeling with front propagation:
A level set approach. IEEE Trans. Pattern Anal. Mach. Intell., 17(2):158–175,
February 1995.
[44] B. S. Manjunath and W. Y. Ma. Texture features for browsing and retrieval of
image data. IEEE Trans. Pattern Anal. Mach. Intell., 18(8):837–842, August
1996.
[45] D. Marr and E. Hildreth. Theory of edge detection. In Proc. R. Soc. of Lon.,
volume B207, pages 187–217, 1980.
[46] M. T. McCann, R. Bhagavatula, M. C. Fickus, J. A. Ozolek, and J. Kovacevic.
Automated colitis detection from endoscopic biopsies as a tissue screening tool in
diagnostic pathology. In Proc. IEEE Int. Conf. Image Process., pages 2809–2812,
Orlando, FL, September 2012.
[47] I. Mironica, C. Vertan, and D. C. Gheorghe. Automatic pediatric otitis detection
by classification of global image features. In Proc. 3rd Intl. Conf. on
E-Health and Bioengineering, Iasi, Romania, November 2011.
[48] T. Mitchell. Machine Learning. McGraw-Hill, 1997.
[49] A. Mojsilovic, J. Kovacevic, J. Hu, R. J. Safranek, and K. Ganapathy. Matching
and retrieval based on the vocabulary and grammar of color patterns. In Proc.
IEEE Int. Conf. Multim. Comput. Syst., Florence, Italy, June 1999.
[50] A. Mojsilovic, J. Kovacevic, J. Hu, R. J. Safranek, and K. Ganapathy. Matching
and retrieval based on the vocabulary and grammar of color patterns. IEEE
Trans. Image Process., sp. iss. Image Video Process. Digit. Libraries, 9(1):38–54,
January 2000. IEEE Signal Processing Society Young Author Best Paper Award.
[51] A. Mojsilovic, J. Kovacevic, D. A. Kall, R. J. Safranek, and K. Ganapathy.
Vocabulary and grammar of color patterns. IEEE Trans. Image Process.,
9(3):417–431, March 2000.
[52] E. Onusko. Tympanometry. Am. Fam. Physician, 70(9):1713–1720,
November 2004.
[53] N. Otsu. A threshold selection method from gray-level histograms. IEEE Trans.
Syst. Man Cybern., 9(1):62–66, January 1979.
[54] N. R. Pal and S. K. Pal. A review on image segmentation techniques. Pattern
Recogn., 26(9):1277–1294, 1993.
[55] J. L. Paradise, H. E. Rockette, and D. K. Colborn. Otitis media in 2253
Pittsburgh-area infants: prevalence and risk factors during the first two years of
life. Pediatrics, 99(3):318–333, March 1997.
[56] Am. Acad. Pediatr. Diagnosis and management of acute otitis media. Pediatrics,
113(5):1451–1465, 2004.
[57] P. Perez, M. Gangnet, and A. Blake. Poisson image editing. ACM Trans. Graph.
(Proc. SIGGRAPH), 22(3):313–318, 2003.
[58] D. L. Pham, C. Xu, and J. L. Prince. Current methods in medical image
segmentation. Ann. Rev. Biomed. Eng., 2:315–337, 2000.
[59] M. E. Pichichero. Diagnostic accuracy of otitis media and tympanocentesis skills
assessment among pediatricians. Eur. J. Clin. Microbiol. Infect. Dis., 22(9):519–524,
September 2003.
[60] M. E. Pichichero and M. D. Poole. Assessing diagnostic accuracy and
tympanocentesis skills in the management of otitis media. Archives of Pediatrics and
Adolescent Medicine, 155(10):1137–1142, 2001.
[61] R. J. Schalkoff. Artificial Neural Networks. Computer Science. McGraw-Hill,
1997.
[62] N. Shaikh. http://www.chp.edu/CHP/Shaikh,+Nader,+MD,+MPH.
[63] N. Shaikh, A. Hoberman, P. H. Kaleida, H. E. Rockette, M. Kurs-Lasky,
H. Hoover, M. E. Pichichero, O. F. Roddey, C. Harrison, J. A. Hadley, and R. H.
Schwartz. Otoscopic signs of otitis media. Pediatr. Infect. Dis. J., 30(10):822–826,
2011.
[64] N. Shaikh, A. Hoberman, H. E. Rockette, and M. Kurs-Lasky. Development of
an algorithm for the diagnosis of otitis media. Acad. Pediatr., 12(3):214–218, May
2012.
[65] L. Shamir, N. Orlov, D. M. Eckley, T. Macura, J. Johnston, and I. G. Goldberg.
WND-CHARM: Multi-purpose image classification using compound image
transforms. Pattern Recogn. Lett., 29:1684–1693, 2008.
[66] P. Shekelle, G. Takata, and G. Chan. Diagnosis, natural history, and late effects
of otitis media with effusion. Evidence Report/Technical Assessment No. 55. Agency
for Healthcare Research and Quality, pages 3–23, May 2003.
[67] I. Sluimer, A. Schilham, M. Prokop, and B. van Ginneken. Computer analysis of
computed tomography scans of the lung: A survey. IEEE Trans. Med. Imag.,
25(4):385–405, April 2006.
[68] P. A. Tahtinen, M. K. Laine, P. Huovinen, J. Jalava, O. Ruuskanen, and A. Ruohola.
A placebo-controlled trial of antimicrobial treatment for acute otitis media.
The New England J. Med., 364:116–126, 2011.
[69] D. W. Teele, J. O. Klein, and B. Rosner. Epidemiology of otitis media during
the first seven years of life in children in greater Boston: A prospective, cohort
study. J. Infect. Dis., 160(1):83–94, 1989.
[70] P. S. Tsai and M. Shah. Shape from shading using linear approximation. Image
Vis. Comput., 12:487–498, 1994.
[71] M. Varma and A. Zisserman. Classifying images of materials: Achieving
viewpoint and illumination independence. In Proc. Eur. Conf. Comput. Vis.,
volume 3, pages 255–271, May 2002.
[72] A. Vedaldi and B. Fulkerson. VLFeat: An open and portable library of computer
vision algorithms, 2008. http://www.vlfeat.org/.
[73] C. Vertan, D. C. Gheorghe, and B. Ionescu. Eardrum color content analysis in
video-otoscopy images for the diagnosis support of pediatric otitis. In Int. Symp.
on Signals, Circuits Syst., Bucharest, Romania, July 2011.
[74] C. Xu and J. L. Prince. Snakes, shapes, and gradient vector flow. IEEE Trans.
Image Process., 7(3):359–369, March 1998.
[75] L. A. Zadeh. Fuzzy sets. Information and Control, 8(3):338–353, 1965.
[76] D. Ziou and S. Tabbone. Edge detection techniques: An overview. Int. J. Pattern
Recogn. Image Anal., 8(4):537–559, 1998.