Automated Diagnosis of Otitis Media:

A Vocabulary and Grammar

bimagicLab Dept. of Biomedical Engineering

Carnegie Mellon University

Anupama Kuruvilla

Advisor: Prof. Jelena Kovacevic

Center for Bioimage Informatics

Department of Biomedical Engineering

Carnegie Mellon University, Pittsburgh, PA 15213

Thesis Manuscript

Submitted in partial fulfillment of the requirements towards the Ph.D. degree

awarded by the Department of Biomedical Engineering, Carnegie Inst. of Tech.,

Carnegie Mellon University.

Thesis Committee Members

Prof. Jelena Kovacevic (Advisor)

Departments of Biomedical Engineering and

Electrical and Computer Engineering


Dr. Alejandro Hoberman

Division of General Academic Pediatrics

Children’s Hospital of Pittsburgh

University of Pittsburgh School of Medicine

Prof. Jose M. F. Moura

Departments of Biomedical Engineering and

Electrical and Computer Engineering


Prof. George D. Stetten

Department of Biomedical Engineering and

The Robotics Institute


To Amma and Appachen,

who never doubted, endlessly loved, and constantly supported.

Abstract

This thesis presents an automated algorithm for classifying diagnostic categories of

otitis media (middle ear inflammation): acute otitis media, otitis media with effusion,

and no effusion. Acute otitis media represents a bacterial superinfection of the middle

ear fluid, while otitis media with effusion represents a sterile effusion that tends to

subside spontaneously. Diagnosing acute otitis media in children is difficult because

it is often confused with otitis media with effusion, leading to overprescription of

antimicrobials, which are beneficial only for children with acute otitis media. Such

misdiagnosis is of increasing concern as it leads to mismanaged episodes of otitis

media and, most importantly, compromises the efficacy of any future treatments for a

bacterial infection. The current standard of clinical diagnosis of otitis media is visual

examination of the tympanic membrane; this manual and subjective evaluation has

clearly shown its limitations, prompting the need for an accurate and automated

diagnostic algorithm.

To that end, we design a feature set understood by both otoscopists and engineers

based on the actual visual cues used by otoscopists; we term this the otitis media

vocabulary. We also design a process to combine the vocabulary terms based on the

decision process used by otoscopists; we term this the otitis media grammar. The

algorithm achieves 93.5% classification accuracy, outperforming both clinicians who

did not receive special training and state-of-the-art classifiers.

Acknowledgments

This thesis would be inconceivable were it not for the much-appreciated support

of all the people who have taken part in my journey. This is my attempt at a tribute

to a few of those people.

I wish to express my heartfelt gratitude to my advisor Prof. Jelena Kovacevic for

mentoring me during my years of graduate study. Her guidance, cheerful enthusiasm

and dedication have filled my journey at CMU with adventure and fun from start

to finish. I have benefited very much from the freedom and independence she has

allowed me, while always knowing that I could count on her support when needed. No

ornamentation of words can do justice in expressing my gratitude and joy in learning

under her advice and guidance that extends far beyond the realm of research. Thank

you for being such an admirable “guru”.

I would like to thank my thesis committee members for consenting to be part of

this work. My sincere thanks to Dr. Alejandro Hoberman—his expert opinion was

very instrumental in the development of this work. I am grateful to him for promptly

and patiently answering all my questions, and providing good insights into the problem.

Many thanks to Prof. Jose M. F. Moura for his valuable feedback on the project and

introducing us to RICOH Inc., with whose collaboration we have an opportunity to

turn this research into a product. A warm thanks to Prof. George D. Stetten for his

comments that improved our work and pushed it to a higher level.

My sincere thanks to all the collaborators I have had the opportunity

to work with and learn from. In particular, I thank Dr. Nader Shaikh, who helped

us with the data and enthusiastically offered his help and comments on our paper.

I thank Dr. Pedro Quelhas for his initial work on the project. I would also like to

thank Dr. Pablo Hennings Yeomans, a senior member of our lab, who very patiently

introduced me to his initial work on otitis media classification. Our meetings were fun

and all the inputs were greatly helpful and very much appreciated.

I would like to express my gratitude to Prof. Gustavo K. Rohde for being my

co-advisor during my initial years as a graduate student. His passion for research and

attention to detail are something we should all strive for.

During my thesis, I have had the opportunity to supervise two very talented

students: Jian Li and Lakshmi Dhevi Jayagobi. Their significant contributions find a

place in this thesis. I have greatly enjoyed working with both of them and cherish

good memories of our interactions.

I am grateful to all my academic siblings at bimagicLab, working with whom has

been my fortune and privilege. I have learnt so much and enjoyed our time together.

I feel lucky to have known Michael McCann for being a very helpful labmate and

cheerful neighbor. Many thanks to him for painstakingly reading every document

I have written including this thesis, his quick software fix ups, great conversations,

and above all, being a good friend. Many thanks to Filipe Condessa, for our mutual

enthusiasm for seeing each other’s ‘novel’ MATLAB plots, our fun conversations, and

enjoyable walks home. Thank you Kuan-Chieh (Jackie) Chen and Siheng Chen, for

being sweet neighbors, your inputs and readiness to always offer help. I have benefited

from the interactions with my seniors in the lab and thank Wei Wang, Cheng Chen

and Ramamurthy Bhagavatula for helping me with their guidance and expertise. I

also thank my other CBI labmates over the years and CBI staff for making it a happy

working environment.

Many have contributed to this thesis both directly and indirectly, in person, over

emails, phone and across time zones. Many thanks to all of them for their friendship

through all these years even though most are half-way across the globe. I would like

to mention some of them here depending on where I met them.

Pittsburgh: To Dr. Susan Cherian, for all the different hats she wore during

our friendship. Thank you for being my friend, my 24/7 therapist and helpline, yoga

partner, and above all my family. Thank you for introducing me to your wonderful

family and especially Adeline (Coco) who taught me the concept of “fake laugh”. I

have always secretly enjoyed it when people mistook me for her daughter and was

never made to feel any less. Cheers to our “cosmic-consciousness” for a lifetime. To

Rev. Fr. John Mathew Elanjileth, for being our partner in crime for numerous drives,

gifts and wonderful times. Special thanks to Meena Rocky for all our fun shopping

trips, spoiling me with custom-made birthday cakes, and always welcoming me home

with warmth and care. Special thank you to Ozzie Miloykovich for our super-fun

girl nights, discovering art, music and movies in Pittsburgh, and uncountable trips to

your kitchen for gourmet food and chocolate bars. I would like to say a big thank you

to all the wonderful people I met in Pittsburgh who have become friends along the

way. In particular, warm thanks to Rathna Veeramachaneni for being a dear friend

and Sumedha Sethi for all the fun times. Many thanks to Deepa Krishnaswamy, for

introducing me to plays, music and art, painting, dance lessons, walks in the park

and the list goes on.

Bangalore: I am extremely grateful to Dr. Gowri Srinivasa, for being part of

my life right from undergraduate work up until now and of course introducing me to

Jelena! I consider myself very fortunate to have been associated with you first as your

student and now as a peer. Thank you for guiding and encouraging me throughout

this journey. It is always a treat to read super-long emails with updates on ‘namma

Bengaluru’. I express my deepest gratitude to Prof. Ajey S.N.R for introducing me

to early lessons of signal processing. I will forever cherish his expert tutelage.

My incomparable gratitude to Vaibhav Upadhyaya, for being my biggest fan and

rock through all times. It is my good fortune that I can always count on him being

my “Atlas”. Thank you for being such a splendid fellow traveler. Thank you Vaidehi

Murthy, for being my feel-good factor, all the support and love along these long years

across continents. Going through everything with her made it so much easier! Many

thanks to Rakshatha Krishnamurthy for constantly checking up on me, heart to heart

conversations and most importantly for her role as “akashwani”.

Thank you to my dearest sister, Poornima Kuruvilla for among other things, her

excitement on happenings in my life and the sheer happiness I benefit from her just

being part of my life.

Finally, I would like to thank the two people who have been with me from my

beginning and all through: my parents.

Amma: Much of what I have learned over the years came as the result of being your

daughter. You have always inspired me, and consciously and subconsciously contributed

tremendously to who I have grown up to be. I am eternally grateful for all the

cheers, chiding, laughs, lessons and my everyday dose of encouragement. I am so

blessed. Thank you!

Appachen: For all the sacrifices you made for me and undying support through

all times. I will always appreciate your understanding and belief in me. Thank you

for letting me choose and be.

I also gratefully acknowledge support from the NSF through award 1017278, the

NIH through award 1DC010283, the NIH-NIAID through award 3U01AI066007-02S1,

and the CMU CIT Infrastructure Award.

Contents

1 Introduction
    1.1 Thesis Contributions
    1.2 Thesis Outline
2 Diagnosis and Management of Otitis Media
    2.1 Diagnostic Categories of Otitis Media
        2.1.1 Acute Otitis Media
        2.1.2 Otitis Media with Effusion
        2.1.3 No Effusion
    2.2 Clinical Diagnostics
        2.2.1 Simple Hand-held Otoscope
        2.2.2 Pneumatic Otoscope
        2.2.3 Video Otoscope
        2.2.4 Tympanometry
    2.3 Diagnostic Uncertainty
        2.3.1 Misdiagnosis
        2.3.2 Judicious Use of Antimicrobial Agents
3 Background and Related Work
    3.1 Preprocessing
        3.1.1 Segmentation
        3.1.2 Image Correction
    3.2 Classification
        3.2.1 Overview of Classification Methods
        3.2.2 Feature Extraction
        3.2.3 Clustering and Classification Problems
    3.3 Related Work
        3.3.1 Computer-Aided Diagnosis
        3.3.2 Vocabulary and Grammar
        3.3.3 Automated Classification of Otitis Media
4 Goals of the Thesis
    4.1 Gaps to Fill
        4.1.1 Challenges
        4.1.2 Guiding Principles
    4.2 Diagnosis as Classification
5 Preprocessing
    5.1 Automated Segmentation of the Tympanic Membrane
    5.2 Image Correction: Inpainting Tympanic Membrane Images
    5.3 Rejection of Unreliable Data
        5.3.1 Rejection due to Specular Highlights
        5.3.2 Rejection due to Over/Under Exposure
        5.3.3 Rejection due to Presence of Cerumen
6 Otitis Media Vocabulary
    6.1 Main Idea
    6.2 Methodology
    6.3 Vocabulary
        6.3.1 Bulging
        6.3.2 Central concavity
        6.3.3 Light
        6.3.4 Malleus presence
        6.3.5 Translucency
        6.3.6 Amber level
        6.3.7 Bubble presence
        6.3.8 Grayscale variance
7 Otitis Media Grammar
    7.1 Main Idea
    7.2 Grammar
        7.2.1 Hierarchical-Rule based Grammar
        7.2.2 Fuzzy-Logic based Grammar
8 Experimental Results
    8.1 Data Set
    8.2 Ground Truth
        8.2.1 Diagnosis by Expert Otoscopists
        8.2.2 Data Set 3
        8.2.3 Diagnosis by General Pediatricians
    8.3 Automated Classifiers for Comparison
        8.3.1 Correlation Filter Classification System
        8.3.2 Multiresolution Classifier
        8.3.3 SIFT and Shape Descriptors with SVM Classifier
        8.3.4 WND-CHARM Classifier
        8.3.5 Random Forest Classifier
    8.4 Classification of Tympanic Membrane Images
        8.4.1 Results: DS1
        8.4.2 Results: DS2
        8.4.3 Results: DS3
9 Conclusions

List of Acronyms

ANFIS Adaptive neuro fuzzy inference system

AOM Acute otitis media

CAD Computer-aided diagnosis

CFC Correlation filter classification system

CFCR Correlation filter classification system with rejection

DoG Difference of Gaussians

DS1 Data set 1

DS2 Data set 2

DS3 Data set 3

GP General Pediatricians

H&E Hematoxylin and eosin

MRC Multiresolution classifier

MRCR Multiresolution classifier with rejection

NN Neural network

NOE No effusion

OMC Otitis media classifier

OMCR Otitis media classifier with rejection

OME Otitis media with effusion

OMFLC Otitis media fuzzy logic classifier

OMFLCR Otitis media fuzzy logic classifier with rejection

RF Random forests classifier

RFR Random forests classifier with rejection

SIFT Scale invariant feature transform

SSC SIFT and Shape descriptors with SVM classifier

SSCR SIFT and Shape descriptors with SVM classifier with rejection

SVM Support vector machine

WCM WND-CHRM classifier

WCMR WND-CHRM classifier with rejection

List of Notations

B Bright region

D Dark region

fa Amber feature

fb Bulging feature

fc Central concavity feature

fl Light feature

fm Malleus presence feature

ft Translucency feature

I Square neighborhood

K Number of clusters initialized for K-means algorithm

(m,n) Pixel location in an image

(mc, nc) Pixel location of central concavity detection

Nt Total number of pixels to train translucency feature

Ntl Number of training images for translucency feature

R Set of radii

r Radius of circular neighborhood

X Original image

Xa Binary image of amber level detection

Xbp Binary image of bubble presence detection

Xc Binary mask of cerumen detection

Xb Binary mask of bulging detection

Xd Depth map

Xm Binary mask of segmented region

Xt Binary image of translucency detection

Td Threshold of the depth map

Tl Threshold of light feature

θmax Direction perpendicular to maximum illumination gradient

List of Figures

2.1 Bulging observed in TM during incidence of AOM.
2.2 Examples of tympanic membrane images of OME.
2.3 Examples of tympanic membrane images of normal ears.
2.4 Images of hand-held otoscopes.
2.5 An example of tympanogram readings [52].
4.1 Illustration of inter-class similarity. Examples of tympanic membrane
    images of OME (left) and AOM (right) showing strong similarity in
    appearance.
4.2 Illustration of intra-class variability. Examples of tympanic membrane
    images of OME; different severities of the condition lead to different
    presentations.
4.3 Guidelines for grammar design: Decision tree for the diagnosis of otitis
    media [64].
4.4 Block diagram of our proposed otitis media classifier.
5.1 Comparison of automated segmentation (top) and hand segmentation
    by expert otoscopists (bottom).
5.2 Correction of specular highlights for AOM (left), OME (middle) and
    NOE (right). Input images are in the top row, identification of specular
    highlight regions in the middle row, and correction of the identified
    regions in the bottom row.
5.3 Examples of rejected images from each class with AOM (left), OME
    (middle) and NOE (right). Top row corresponds to images rejected
    due to washed out appearance and bottom row corresponds to images
    rejected due to dull appearance.
5.4 Examples of rejected images. Top row corresponds to input images
    and bottom row corresponds to images showing detected wax regions.
6.1 Computation of the bulging feature.
6.2 Computation of the central concavity feature.
6.3 Computation of the light feature.
6.4 Computation of the malleus presence feature.
7.1 Initial grammar for classifying otitis media.
7.2 Stage 1: Grammar for identifying AOM.
7.3 Stage 2: Grammar for identifying NOE. (Black arrows/boxes denote
    those paths belonging to this stage; gray ones belong to Stage 1.)
7.4 Stage 3: Grammar for identifying OME. (Black arrows/boxes denote
    those paths belonging to this stage; gray ones belong to Stages 1 and 2.)
7.5 Example of a binary membership function.
7.6 Example of a continuous membership.
7.7 Examples of membership function using sigmoidal functions.

List of Tables

4.1 Guidelines for vocabulary design: Otoscopic findings associated with
    clinical diagnostic categories of tympanic membrane images [63].
8.1 High variability in the diagnoses among the three expert otoscopists on
    the tympanic membrane images in data set DS3. The rows correspond
    to the total number of images assigned by an expert to each diagnostic
    category.
8.2 Agreement of diagnoses by two expert otoscopists on the diagnosis of
    tympanic membrane images in data set DS3.
8.3 Diagnoses by three general pediatricians (columns) versus the ground
    truth of expert otoscopists (rows).
8.4 Classification accuracies (in %) on the ground-truth set of 181 tympanic
    membrane images. Each row corresponds to the class-wise classification
    accuracies and columns correspond to the diagnosis by three general
    pediatricians (GP) as well as the following algorithms: correlation
    filter classification system (CFC), WND-CHRM (WCM), multiresolution
    classifier (MRC), SIFT and shape descriptors with SVM classifier (SSC),
    random forest classifier (RF), our initial classifier [36], otitis media
    classifier (OMC) [37], and otitis media fuzzy logic classifier (OMFLC).
8.5 Classification accuracies (in %) on the ground-truth set of … tympanic
    membrane images out of 181 images after rejection. Each row
    corresponds to the class-wise classification accuracies and columns
    correspond to classification by the following algorithms: correlation
    filter classification system (CFCR), WND-CHRM (WCMR), multiresolution
    classifier (MRCR), SIFT and shape descriptors with SVM classifier
    (SSCR), random forest classifier (RFR), otitis media classifier (OMCR),
    and otitis media fuzzy logic classifier (OMFLCR).
8.6 Diagnoses by three general pediatricians (columns 2, 3, and 4) and
    OMFLC (columns 5, 6, and 7) versus the ground truth of expert
    otoscopists (rows) on images in data set DS1.
8.7 Classification accuracies (in %) on the ground-truth set of 390 tympanic
    membrane images (267 AOM, 82 OME, and 41 NOE). Each row
    corresponds to the class-wise classification accuracies and columns
    correspond to the classification by the following algorithms: WND-CHRM
    (WCM), correlation filter classification system (CFC), multiresolution
    classifier (MRC), random forest classifier (RF), SIFT and shape
    descriptors with SVM classifier (SSC), otitis media classifier (OMC) [37],
    and otitis media fuzzy logic classifier (OMFLC).
8.8 Classification accuracies (in %) on the ground-truth set of 233 out of
    390 tympanic membrane images (144 AOM, 52 OME, and 37 NOE)
    after rejection. Each row corresponds to the class-wise classification
    accuracies and columns correspond to the classification by the following
    algorithms: random forest classifier (RFR), correlation filter
    classification system (CFCR), WND-CHRM (WCMR), multiresolution
    classifier (MRCR), SIFT and shape descriptors with SVM classifier (SSCR),
    otitis media classifier (OMCR) [37], and otitis media fuzzy logic
    classifier (OMFLCR).
8.9 Classification accuracies (in %) on the ground-truth set of 248 tympanic
    membrane images (58 AOM, 112 OME, and 78 NOE). Each
    row corresponds to the class-wise classification accuracies and columns
    correspond to the classification by the following algorithms:
    multiresolution classifier (MRC), correlation filter classification system
    (CFC), WND-CHRM (WCM), SIFT and shape descriptors with SVM classifier
    (SSC), random forest classifier (RF), otitis media classifier (OMC) [37],
    and otitis media fuzzy logic classifier (OMFLC).
8.10 Classification accuracies (in %) on the ground-truth set of 162 out of
    248 tympanic membrane images (44 AOM, 46 OME, and 72 NOE).
    Each row corresponds to the class-wise classification accuracies and
    columns correspond to the classification by the following algorithms:
    multiresolution classifier (MRCR), correlation filter classification
    system (CFCR), random forest classifier (RFR), SIFT and shape
    descriptors with SVM classifier (SSCR), WND-CHRM (WCMR), otitis media
    classifier (OMCR) [37], and otitis media fuzzy logic classifier (OMFLCR).

Chapter 1

Introduction

Middle-ear inflammation, clinically known as acute otitis media (AOM), is a frequent

condition affecting a majority of the pediatric population for which antimicrobials

are prescribed in the United States. Children are mostly affected in their first

two years of life, particularly between 6 and 12 months. The number of otitis media

episodes has increased substantially in the past two decades, with approximately 25

million visits to office-based physicians in the US and a total of 20 million prescriptions

for antimicrobials related to otitis media yearly [69]. This results in significant social

burden and indirect costs due to time lost from school and work, with an estimated

annual medical expenditure of approximately $2 billion [56].

The correct diagnosis and management of otitis media have a significant impact on

the health of children and overall use of antimicrobial agents. Even though numerous

clinical studies exist on this prevalent problem, there is considerable variability

in the medical community regarding the optimal diagnostic criteria and management of

otitis media. Thus, AOM is frequently over-diagnosed as it gets confused with otitis

media with effusion (OME, a sterile effusion that subsides spontaneously), resulting in

unnecessary antibiotic prescriptions to a substantial proportion of patients in whom

it leads to adverse effects and increased bacterial resistance without the expected

benefit of an improved clinical outcome. This issue is of increasing concern since it

leads to mismanaged episodes of otitis media and most importantly compromises the

efficacy of any future treatments for a bacterial infection.

The current standard of clinical diagnosis of AOM includes the visual examination

of the tympanic membrane using an otoscope; this is time-consuming, error-prone,

subjective, and shows limited intra- and inter-observer reproducibility.

These concerns underscore the critical need for developing an accurate,

efficient, and automated system for the classification of otitis media into AOM, OME,

and no effusion (NOE), leading to the goal of this thesis:

To develop an accurate automated classification algorithm for

the classification of otitis media into three distinct diagnostic

categories, based on tympanic membrane images.

The material presented in this thesis is developed based on the domain-knowledge

of expert otoscopists. In an ideal situation, if each of the diagnostic categories of

otitis media presented a unique set of signs and symptoms, then an engineer’s job

of designing an automated classifier would reduce to building a look-up table that

associates diagnostic categories with a set of signs and symptoms. Unfortunately,

this is not the case. In a real-world situation the experts’ diagnostic process involves

weighing different pieces of information and evidence gathered over multiple situations

in order to reach the most appropriate diagnosis for the presented situation. It is our

goal to mimic the expert otoscopists’ diagnostic capability by designing a system that

utilizes the expert human knowledge in addition to performing numerical calculations

on the data.

Currently, an automated classification algorithm for otitis media does not exist,

and we are the first to develop such an algorithm that can be used as a diagnostic aid to

classify tympanic membrane images into one of the three stringent clinical diagnostic

categories: AOM, OME, and NOE. This contribution is significant because it provides

clinicians with an objective, highly accurate classification system that will be an

easy-to-use clinical aid to discriminate among AOM, OME, and NOE.

An accurate automated classification system will enable a more appropriate use of

antibiotics by decreasing the rate of misdiagnoses of OME as AOM (over-prescription),

and AOM as OME (under-prescription). A decrease in over-prescription will result in

a reduction of (1) adverse effects, (2) bacterial resistance, and (3) financial costs

(direct costs such as medication, copays, and emergency department and primary care

provider visits, and indirect costs such as missed work and special day care

arrangements) associated with over-prescription. A decrease in under-prescription will

result in appropriate treatment for a bacterial infection and a similar reduction in

financial costs. Together, these will lead

to more appropriate and enhanced quality of care.

1.1 Thesis Contributions

We present the main contributions of the thesis:

1. Otitis Media Vocabulary. We develop a vocabulary (feature set) understood

by both otoscopists and engineers based on the actual visual cues used by otoscopists.

Our working hypothesis is that closely mimicking the cues used by trained

otoscopists increases the classification accuracy. The results presented

in this thesis demonstrate that using a small set of physiologically meaningful

features increases the classification accuracy.

2. Otitis Media Grammar. We develop a grammar (decision-making process) to

combine the vocabulary terms based on the decision process used by otoscopists.

Our working hypothesis is that developing a decision-making process based on

the clinical diagnostic process will yield a highly accurate classification system to

distinguish the diagnostic categories of AOM/OME/NOE. The otitis media

classifier built using the otitis media grammar outperforms the other classifiers as

demonstrated in this thesis.
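
As a rough illustration of how a vocabulary and a grammar fit together, the sketch below represents the vocabulary as a small record of named features and the grammar as a staged sequence of rules (AOM first, then NOE, then OME). It is a minimal sketch under stated assumptions, not the classifier developed in this thesis: the feature names echo the vocabulary terms, but the 0-to-1 scaling, the single shared threshold, and the specific rules are hypothetical placeholders chosen only to make the example runnable.

```python
from dataclasses import dataclass


@dataclass
class Vocabulary:
    """Hypothetical feature record; each value is assumed to lie in [0, 1]."""
    bulging: float          # degree of tympanic membrane bulging
    translucency: float     # how translucent the membrane appears
    amber_level: float      # amount of amber coloring
    bubble_presence: float  # evidence of bubbles behind the membrane


def classify(v: Vocabulary, threshold: float = 0.5) -> str:
    """Toy staged grammar: test for AOM first, then NOE, then fall back to OME.

    The rules and the shared threshold are illustrative assumptions,
    not the rules derived in the thesis.
    """
    # Stage 1: pronounced bulging is treated as the cue for AOM.
    if v.bulging >= threshold:
        return "AOM"
    # Stage 2: a translucent membrane without amber tint or bubbles -> NOE.
    if (v.translucency >= threshold
            and v.amber_level < threshold
            and v.bubble_presence < threshold):
        return "NOE"
    # Stage 3: everything that remains is labeled OME.
    return "OME"
```

For example, under these assumed rules, `classify(Vocabulary(0.9, 0.1, 0.2, 0.0))` falls into the first stage, while a membrane with low bulging and a high amber level reaches the final stage.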

1.2 Thesis Outline

This thesis is organized as follows. In Chapter 2, we introduce the three diagnostic

categories of otitis media and present a review of the current diagnostic tools and

procedures employed in a pediatrician’s office for diagnosing otitis media. We highlight

the importance of accurate diagnosis and judicious administration of antimicrobials.

The focus of these sections is to emphasize the challenges of accurately diagnosing otitis

media and to introduce the need for an accurate automated classification system.

In Chapter 3, we present the background on segmentation and image correction

techniques. This is followed by a discussion of unsupervised and supervised

classification methods used in this thesis. The focus of these sections is to provide the

necessary background and highlight some of the advantages and limitations of these

methods. We also present the related previous work in the area, starting with a review

of the existing computer-aided diagnostic systems and then introduce the notion of

vocabulary and grammar that is the basis and inspiration of the methods we develop.

Finally, we also discuss the previous work in the area of automated diagnosis of otitis

media.

In Chapter 4, we highlight the challenges presented by the tympanic membrane

images and the need for an automated classification system for otitis media, and provide

the overall framework of the otitis media classification system designed in this thesis.

In the three subsequent chapters, we present each module of the otitis media


classifier in detail, starting with preprocessing, followed by feature extraction and classification.

In Chapter 5, we present preprocessing of the tympanic membrane images, which

is the first module of the otitis media classification system. In Section 5.1, we

present the segmentation of the tympanic membrane, which is previous work of bimagicLab [4]

(Dr. Kovacevic’s group). From Section 5.2 onward, we present the work of this thesis, starting

with image correction and rejection techniques for making the data reliable for further

processing.

In Chapter 6, we describe the otitis media vocabulary, one of the major contributions

of the thesis. The vocabulary features are designed to mimic the actual visual

cues used by expert otoscopists during the clinical diagnosis.

In Chapter 7, we describe the otitis media grammar, another major contribution

of this thesis. Here, we present the decision making process to combine the vocab-

ulary terms mimicking the diagnostic process used by the expert otoscopists while

examining otitis media.

In Chapter 8, we discuss the evaluation of the otitis media classifier compared to

other automated classifiers on the tympanic membrane images of otitis media. The

otitis media classifier demonstrates superior performance on the classification of the

tympanic membrane images of three diagnostic categories of otitis media. Results

demonstrate that the performance of the classifier is comparable to the diagnosis

agreed upon by a panel of three expert otoscopists.

In Chapter 9, we conclude this thesis by summarizing our work and proposing

ideas for the future.

Chapter 2

Diagnosis and Management of

Otitis Media

Otitis media is the childhood illness that most frequently leads to visits

to a pediatrician’s office. The purpose of this chapter is to provide an understanding

of the diagnostic categories of otitis media that are important for the discussion in

this thesis. We introduce the three diagnostic categories of otitis media, followed by

different diagnostic systems used in the current clinical setting. We conclude with

the current standard of evaluation in a clinical setting and the importance of reducing

unnecessary and ineffective antimicrobial therapy prescribed for non-acute cases of

otitis media.

2.1 Diagnostic Categories of Otitis Media

Otitis media is the general term for the inflammation of the middle ear. It occurs in

the Eustachian tube that connects the middle ear cavity with the nasopharynx. The

Eustachian tube performs three main functions: Firstly, it allows the passage of air in

the tube that is important to maintain equal pressure on both sides of the eardrum.

Secondly, it drains middle ear secretions into the nose and, lastly, it prevents the flow

of fluid back up the tube into the middle ear. Since the Eustachian tube is narrow

in infants, even a slight amount of swelling blocks the tube, impeding its function and

causing fluid buildup in the middle ear that can lead to an inflammation. The most

frequent occurrence of this condition is during the first two years of life owing to the

ongoing physiological and immunological development of children. Studies show that

70% of children experience at least one episode of otitis media during their first year

and 93% experience otitis media once in their first seven years [69].

“Otitis” means inflammation of the ear and “media” means middle, hence the name otitis

media. The term “otitis media” refers to a continuum of ear disease: acute middle

ear infections (acute otitis media, purulent otitis media, suppurative otitis media),

the accumulation of fluid in the middle ear (otitis media with effusion, serous otitis

media), or both.

2.1.1 Acute Otitis Media

Acute otitis media (AOM) is an infection of the middle ear and Eustachian tube. It

can occur at any age, but primarily affects children between the ages of six months

and two years. AOM is a frequent condition that affects a majority of the pediatric

population and for which antibiotics are prescribed.

Children with AOM present different symptoms such as ear pain (otalgia), ear

discharge (otorrhea) and/or temporary hearing loss. During AOM, the tympanic

membrane looks inflamed, opaque and bulged with marked redness. Bulging of the

tympanic membrane is considered the most reliable otoscopic characteristic in cases

of AOM [63]. Figure 2.1 shows examples of mild bulging, moderate bulging and

severe bulging observed in AOM. Additional non-specific symptoms in young children

include irritability, fever, night waking, poor feeding due to decreased appetite, cold

symptoms, conjunctivitis and occasional balance problems.

(a) Mild bulging. (b) Moderate bulging. (c) Severe bulging.

Figure 2.1: Bulging observed in TM during incidence of AOM. (Images courtesy: All the tympanic membrane images presented and used in this thesis are provided by Dr. Hoberman and Dr. Shaikh.)

2.1.2 Otitis Media with Effusion

Otitis media with effusion (OME) is the presence of sterile middle ear fluid without

signs and symptoms of acute ear infection; this condition tends to subside sponta-

neously. Many cases result in recurrent OME and 5% to 10% of the cases last more

than a year [38, 55]. This is an important clinical condition due to its prevalence in

children and the costs associated with it. Approximately 2.2 million cases of OME

are reported annually in the United States with associated costs of $4 billion [66].

Despite the high prevalence of OME, most common clinical practices are unable to

identify these cases correctly. Diagnosing OME accurately is crucial for proper man-

agement and distinguishing OME from AOM is fundamental in ensuring appropriate

use of antimicrobials.

In OME, the tympanic membrane is often partially cloudy with decreased mobility,

and an air-fluid level or bubble may be visible [63]. Examples of the tympanic

membrane images of OME are shown in Figure 2.2. Some cases of OME may be

asymptomatic, or the patients may experience ear discomfort, hearing loss, or a feeling

of ear fullness. Children who are at risk for delayed speech or language are more

likely to be affected due to hearing problems associated with OME.

Figure 2.2: Examples of tympanic membrane images of OME.

OME may occur due to dysfunction of the Eustachian tube resulting from an upper

respiratory infection and may precede or succeed the occurrence of AOM. However, since

OME does not result from bacterial infection, it does not benefit from antimicrobial

therapy. Therefore, the task of distinguishing OME from AOM is of utmost importance.

Doing so avoids unnecessary use of antibiotics, which leads to adverse

effects of incorrect medication and the development of harmful bacterial resistance.

Figure 2.3: Examples of tympanic membrane images of normal ears.


2.1.3 No Effusion

Figure 2.3 shows examples of normal tympanic membranes. They are pearly gray in

color, translucent, in normal position, and with clear bony landmarks. The landmarks

include the short process and the manubrium of the malleus that are easily observable.

The tympanic membrane moves inward on the application of positive pressure and

outward on applying negative pressure. If the tympanic membrane does not move

with gentle application of pressure, a middle ear effusion, sterile or infectious,

is likely present.

2.2 Clinical Diagnostics

When a child is brought to a pediatrician’s office complaining of ear pain and discomfort,

the clinical diagnostic procedure starts with the parent completing a questionnaire

describing the symptoms of the condition. This is usually followed

by an examination of the ear by a clinician. A variety of diagnostic tools are available

in the current market that are utilized for the visual examination of the tympanic

membrane. In this section, we briefly introduce some of the diagnostic devices and

tests that might be encountered during the clinical examination and evaluation of

otitis media.

2.2.1 Simple Hand-held Otoscope

An otoscope (see Figure 2.4(a)) is a medical device that enables the clinical examiner

to look at the middle ear and visualize the tympanic membrane. It consists of a

handle and a head. The head of the otoscope houses the illumination source and a

simple low-power magnifying lens. The front end of the otoscope has a disposable

plastic ear speculum, available in varying sizes. The examiner gently pulls the

pinna (the outer ear) up or down to straighten the natural curve of the ear canal,

which makes it easier to visualize the tympanic membrane, and inserts the speculum

of the otoscope into the ear canal. The external canal and the tympanic membrane can be

visualized by looking into the magnifying lens and through the speculum. These

hand-held otoscopes can be wall-mounted or portable. Wall-mounted otoscopes are

attached by a flexible power cord to a base, which serves as a source of electric

power plugged into an outlet and as a resting base when not in use. Portable devices

are powered by batteries housed in the handle. As the ear canal is lined with hair

follicles and glands that produce a waxy oil called cerumen, buildup of cerumen often

obstructs the clear visualization of the tympanic membrane. Most models facilitate

the insertion of instruments through the otoscope into the ear canal for removing

wax.

(a) Simple otoscope. (b) Pneumatic otoscope.

Figure 2.4: Images of hand-held otoscopes. (Images courtesy: http://otoscopy.hawkelibrary.com/album08/INst_6.html)


2.2.2 Pneumatic Otoscope

Examination with a pneumatic otoscope allows for determining the mobility of a pa-

tient’s tympanic membrane in response to pressure changes. The examiner gently

puffs air using the attached bulb (see Figure 2.4(b)) into the ear canal to observe

the movement of the tympanic membrane. The normal tympanic membrane moves

in response to pressure. Immobility may be due to fluid buildup in the middle ear,

sterile or infected, a perforation, or tympanosclerosis, among other reasons. Pneu-

matic otoscopy helps to detect the presence of effusion even when the appearance

of the tympanic membrane gives no clear indication of the condition. The detection

of tympanic membrane mobility helps in establishing the diagnosis of OME. This

otoscope is relatively inexpensive compared with the other devices and can be effectively used with

appropriate training.

2.2.3 Video Otoscope

Many doctors’ offices employ more sophisticated video-otoscopes and otoendoscopes,

which connect to a light source (halogen, xenon or LED) and a computer, and can

record images or video. Simple hand-held otoscopes do not permit acquisition of

images and/or video and require diagnosis on the spot, while video-otoscopes and

otoendoscopes do; however, the clinician views the feed on a side screen while holding

the device in the ear canal of an often-squirming young child.

2.2.4 Tympanometry

A tympanometer is a hand-held device that provides quantitative information on the

presence of fluid and function in the middle ear. The examination is done by placing

the probe into the ear canal; a sound stimulus is transmitted into the canal while

a vacuum pump adjusts the pressure in the ear, causing the tympanic membrane

to move. A microphone in the instrument detects the returning sound energy. The

mobility of the tympanic membrane is at its maximum when the air pressures are equal

on both sides of it. This is often done in conjunction with pneumatic otoscopy, which

provides a qualitative measure of tympanic membrane mobility, whereas tympanometry

provides quantitative information. The graphic display of this information, showing

the amount of positive and negative pressure generated, the absorption of acoustic

energy by the middle ear, and ear canal volume, is called a tympanogram.

Figure 2.5: An example of tympanogram readings [52].

Figure 2.5 is an example of a tympanogram. The different curves correspond to

distinct conditions of the middle ear [52]. For example, Figure 2.5(a) shows a flat

curve, which is indicative of decreased mobility of the tympanic membrane. Fig-

ure 2.5(b) shows a completely flat curve showing very low ear canal volume; this

is an indication that the ear canal may be occluded with cerumen. Figure 2.5(c)

shows a flat curve but with high volume; this would mean that the ear canal volume is

increased due to perforation of the tympanic membrane. The perforation results in

more volume in the ear canal than the normal volume. Figure 2.5(d) shows a wide

curve with a height in the normal range; though not a clear indication of a pathology,

it could mean the onset or clearing of OME. Figure 2.5(e) shows a curve with

normal height and negative pressure. Building up of negative pressure increases the

risk of upper respiratory infection and hence presents an increased risk to develop

AOM. Figure 2.5(f) indicates high positive peak pressure due to bulging of the tympanic

membrane, which is a clear indication of AOM.

Additionally, there are a number of other clinical procedures. Tympanocentesis

is a procedure where a needle is passed through the tympanic membrane to drain the fluid

accumulated in the middle ear. Acoustic reflectometry is a test that measures the

amount of sound reflected back by the tympanic membrane as an indirect measure of


the fluid buildup. If the child has persistent ear infections and fluid buildup in the

middle ear, tests are performed by an audiologist to assess hearing, speech skills and

to detect any impediments to normal development.

2.3 Diagnostic Uncertainty

A major challenge in diagnosing otitis media is distinguishing between OME and

AOM. OME is more prevalent than AOM since it can be present during the onset

of AOM or when AOM is resolved. A misdiagnosis of AOM leads to unnecessary

prescription of antibiotics. It is of utmost importance that examiners avoid

such false-positive diagnoses in children. The diagnosis of otitis media is particularly

difficult in young children and infants in the preverbal state. Other factors such as the

narrowness of the auditory canal, inability of the child to remain still during examination,

or incomplete removal of cerumen from the ear canal add to the difficulty in making

the diagnosis.

2.3.1 Misdiagnosis

The inherent difficulties in distinguishing among the three diagnostic categories of

otitis media, together with the above issues, make the diagnosis by nonexpert oto-

scopists notoriously unreliable and lead to the following:

1. Overprescription of antibiotics.

AOM is frequently overdiagnosed; this happens when NOE or OME is diagnosed

as AOM, resulting in unnecessary antibiotic prescriptions that lead to adverse

effects and increased bacterial resistance [1]. Overdiagnosis is more common

than underdiagnosis because doctors typically try to avoid the possibility of


leaving an ill patient without treatment, leading to antibiotic prescriptions in

uncertain cases.

2. Underprescription of antibiotics.

Misdiagnosis of AOM as either NOE or OME leads to underdiagnosis. Most

importantly, children’s symptoms are left unaddressed. Occasionally, underdiagnosis

can lead to an increase in serious complications such as perforation of

the tympanic membrane, and very rarely, mastoiditis [68].

3. Increased financial costs and burden.

There are direct and indirect financial costs associated with misdiagnoses such

as medication costs, co-payments, emergency department and primary care

provider visits, missed work, and special day care arrangements.

2.3.2 Judicious Use of Antimicrobial Agents

As argued earlier, otitis media is the most frequently treated condition in pediatrics

and the consistent leading reason for prescription of antimicrobials. The

cumulative evidence from the available literature confirms that antimicrobial agents

are unnecessary and non-beneficial for non-AOM cases; these wrong prescriptions lead

to the spread of antimicrobial resistance.

Even though there exists a general agreement that only AOM cases benefit from

antimicrobials, stringent criteria to establish the diagnosis in a clinical setting are missing.

For example, in a survey of about 165 pediatricians, 147 combinations

of signs and symptoms were identified as criteria for diagnosis [24]. Such wide vari-

ability in diagnostic criteria leads to non-standard management of otitis media of

which wrong prescriptions are a highly prevalent outcome.

One of the major considerations is accurately classifying AOM and OME, which

directly translates to optimal management of otitis media. In a study to assess the

ability of pediatricians and otolaryngologists to differentiate the physical findings of

otitis media through visual evaluation [60], the participants were shown video images

and asked to state their diagnosis. Among the pediatricians, the average rate of

correct diagnosis was 50%. OME was often misdiagnosed as AOM. The average rate

of correct diagnosis for otolaryngologists was 73%. Another study [59] reported the

results of testing the skill level of pediatricians from different countries. The average

percentage of correct diagnosis performed by pediatricians in Italy was 54%, Greece

36%, South Africa 53%, and USA 51%.

The above concerns underscore the need for an accurate automated classification

system that can be used as a clinical aid during the diagnosis of otitis media to ensure

reliable diagnosis and hence help reduce the development of bacterial resistance.

Chapter 3

Background and Related Work

In this chapter, we present the background material relevant to the discussion in this

thesis. We begin by introducing image preprocessing techniques focusing on segmen-

tation and image correction. This is followed by a discussion of feature extraction

methods and an overview of supervised and unsupervised classification methods. Fi-

nally, we present the previous work relevant to the research presented here.

3.1 Preprocessing

Preprocessing removes undesired noise and enhances the data for the further analysis

and processing. This commonly involves normalizing the intensity of the individual

pixels in the image, removing reflections, and selecting regions of interest in the

image for further computation. Segmentation is the first preprocessing step in our

system aimed at delineating the tympanic membrane region from the image. In

this section, we briefly discuss basics of segmentation and introduce active-contour

segmentation [32,74] used for segmenting the tympanic membrane from an otoscopic

image. This is followed by a review of image correction techniques currently available

to solve a wide range of illumination problems.

Before we define any operations on an image, we introduce the notation to denote

a digital image in our discussion. We denote an image by $X \in \mathbb{R}^{M \times N}$, a two-dimensional

array that can be represented as a function of two variables $(m, n)$ whose domain is

given by $M \times N$.

3.1.1 Segmentation

When a human observer views a scene, the processing that takes place in the visual

system inherently segments the scene. This is done so effectively that the complex

scene now reduces to a collection of coherent objects. While the task of segmenta-

tion seems trivial in human visual processing, it is not in the case of digital image

analysis. Segmentation is a fundamental task in any image processing pipeline where

an image is divided into multiple regions and background [21]. With increasing size

and number of medical images, segmentation algorithms are necessary for delineating

regions of interest for further analysis. Methods for performing segmentation vary

widely depending on the task at hand and the modality of imaging. There is no

universal segmentation method that works well on all types of image data; however,

there are general methods that are widely applicable on a variety of data. Typically,

application-specific methods fare better than general methods by using prior knowl-

edge of the data. Here, we discuss active-contour based segmentation that has been

used in segmentation of the tympanic membrane images. A full description of other

available segmentation methods is beyond the scope of our discussion and we refer

the reader to additional references for further details [54, 58].

Active-Contour Segmentation

Active-contour segmentation belongs to the family of deformable

models that use energy-minimizing curves known as “snakes”. An initial contour is

placed in the image, which is then evolved to best fit the desired object/region in

the image. The deformation of the contour is comparable to an elastic rubber band

placed outside the target shape; the shape is found when the rubber band stops

shrinking and fits the shape.

Active contours can be expressed as an energy minimization problem. The target

region in the image is an energy functional having properties that control the way the

contour can expand, contract or curve. The contour evolves according to two types

of forces acting on it [11, 15, 32, 43, 74], the external forces from the image such as

edges and internal forces from the contour itself such as its curvature. The points at

the same energy level are connected by a snake. The snake evolves in an iterative

manner by searching in a local neighborhood to select new points that have lower

snake energy. The external forces from the image force the contour to move and

deform from its initial position to best fit the desired region in the image.

We now formally define the snake formulation as the sum of the contour’s

internal energy and the image energy, denoted by $E_{\text{int}}$ and $E_{\text{image}}$, respectively. These

functions act on the set of coordinates of control points that make up the snake, $v(s)$:

$$v(s) = (m(s), n(s)),$$

where $m(s)$ and $n(s)$ are the Cartesian coordinates of the contour and $s$ is the normalized

index of control points. The energy functional of the snake, $E_{\text{snake}}$, is then

defined as

$$E_{\text{snake}} = \int_{s=0}^{1} \left[ E_{\text{int}}(v(s)) + E_{\text{image}}(v(s)) \right] ds.$$

The goal of the snake is to evolve by minimizing the above equation. This is

achieved by seeking a set of points $v(s)$ such that

$$\frac{dE_{\text{snake}}}{dv} = 0.$$

Let us now consider the parameters that influence the snake’s behavior. The

internal energy $E_{\text{int}}$ is a combination of a continuity term and a smoothness term,

written as

$$E_{\text{int}} = \alpha(s) \left| \frac{dv(s)}{ds} \right|^2 + \beta(s) \left| \frac{d^2 v(s)}{ds^2} \right|^2,$$

where the term $dv(s)/ds$ measures the energy due to stretching; high values of this

term imply a high rate of change along the contour. It controls the spacing between the

points and attempts to keep the points at equal distances from each other, making the

contour continuous. The second term, $d^2 v(s)/ds^2$, measures the energy from curving.

This term enforces smoothness by avoiding abrupt changes in the curvature. The choice

of $\alpha$ and $\beta$ controls the shape evolution of the snake. Low values for $\alpha$ imply the

points can be unevenly spaced, whereas higher values imply that the snake aims to

attain evenly spaced contour points. Low values for $\beta$ imply that curvature is not

minimized and the contour can form corners in its perimeter, whereas high values

force the snake to stick to smooth contours.
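As a worked step that the text leaves implicit, the stationarity condition on the snake energy can be written out with the Euler-Lagrange equation. The expression below is a sketch under two assumptions: the integrand is exactly the sum of the internal energy as written above (with no factor of 1/2) and the image energy, and $\nabla E_{\text{image}}$ denotes the gradient of the image energy with respect to the contour position.

```latex
\nabla E_{\text{image}}(v(s))
  - 2 \frac{d}{ds}\!\left( \alpha(s)\, \frac{dv(s)}{ds} \right)
  + 2 \frac{d^2}{ds^2}\!\left( \beta(s)\, \frac{d^2 v(s)}{ds^2} \right) = 0 .
```

When $\alpha$ and $\beta$ are constant along the contour, this reduces to $\nabla E_{\text{image}} - 2\alpha\, v''(s) + 2\beta\, v''''(s) = 0$, which makes the balance between the image force and the stretching and bending terms explicit.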

The other source of energy is the image energy $E_{\text{image}}$. The purpose of this term

is to attract the contour towards the target contour using image features such as

brightness or edges. This is achieved by computing the gradient of the intensity at

each snake point.
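The energy terms and the local-neighborhood search described above can be illustrated with a small discrete sketch. It is not the segmentation code used in this thesis: it assumes a closed contour stored as a list of integer `(m, n)` points, constant `alpha` and `beta`, finite differences in place of the derivatives, and a hypothetical caller-supplied function `image_energy(point)` standing in for the gradient-based image term.

```python
def internal_energy(points, alpha=1.0, beta=1.0):
    """Discrete E_int summed over a closed contour given as a list of (m, n) points."""
    total = 0.0
    count = len(points)
    for i in range(count):
        prev_m, prev_n = points[i - 1]
        cur_m, cur_n = points[i]
        next_m, next_n = points[(i + 1) % count]
        # First difference approximates dv/ds (stretching term).
        d1 = (next_m - cur_m) ** 2 + (next_n - cur_n) ** 2
        # Second difference approximates d^2v/ds^2 (bending term).
        d2 = (next_m - 2 * cur_m + prev_m) ** 2 + (next_n - 2 * cur_n + prev_n) ** 2
        total += alpha * d1 + beta * d2
    return total


def snake_energy(points, image_energy, alpha=1.0, beta=1.0):
    """Total energy: internal term plus the image term evaluated at every point."""
    return internal_energy(points, alpha, beta) + sum(image_energy(p) for p in points)


def greedy_step(points, image_energy, alpha=1.0, beta=1.0):
    """One greedy pass: move each point to the 3x3 neighbor with the lowest total energy."""
    points = list(points)
    for i in range(len(points)):
        m0, n0 = points[i]
        best_point = points[i]
        best_energy = snake_energy(points, image_energy, alpha, beta)
        for dm in (-1, 0, 1):
            for dn in (-1, 0, 1):
                trial = points[:i] + [(m0 + dm, n0 + dn)] + points[i + 1:]
                energy = snake_energy(trial, image_energy, alpha, beta)
                if energy < best_energy:
                    best_energy, best_point = energy, (m0 + dm, n0 + dn)
        points[i] = best_point
    return points
```

Because the current position is always one of the candidates, a pass never increases the total energy; repeating `greedy_step` until no point moves gives a local minimum, which is also where the sensitivity to initialization comes from.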

Active contours have the advantage of being autonomous and self-adapting in search

of the minimal energy contour. While the local optimization properties of snakes are

sometimes desirable, they can also cause the contour to get stuck in a local minimum. In

[43], the authors comment on the advantages and disadvantages of energy approaches

of deforming contours and provide an extended review of the literature on snakes.

3.1.2 Image Correction

Several methods [6, 57, 71] are shown to be robust in correcting local illumination

changes. Most of these methods adjust the pixel intensity values of the image using a

nonlinear mapping function for illumination correction based on the estimated local

illumination at each pixel location, and then combine the adjusted illumination image

with the reflectance image to generate an output image. The extent of possible image

correction and editing ranges from replacement or mixing with another source image

region, to altering some aspects of the original image locally such as illumination or

color. Since these methods can be used to locally modify image characteristics, we

aim to correct local specular highlights observed in tympanic membrane images.

One useful image correction method is Poisson image editing [57], which can

be used for correcting regions in an image in a seamless manner. The main idea is to

fill the target region with pixel values obtained by interpolation of pixel values along

the boundary of the target region. We are interested in achieving local changes in

the regions of specular highlights in the tympanic membrane images. Here, we briefly

discuss the Poisson image editing technique.

In an image $X$, let $\Omega$ be a closed region with a boundary $\partial\Omega$. Let $f$ be an

unknown scalar function defined over $\Omega$, $f^*$ be a known scalar function defined on $X$

minus the interior of $\Omega$, and $v$ be a vector field defined over $\Omega$. For each pixel $p(m, n)$

in $X$, let $N_p$ be the set of its 4-connected neighbors in $X$. Let $\langle p, q \rangle$ denote each

such pixel pair such that $q \in N_p$. The boundary of $\Omega$ is then given by

$$\partial\Omega = \{ p \in X \setminus \Omega : N_p \cap \Omega \neq \emptyset \}.$$

The value of the function $f$ at a pixel $p$ is denoted by $f_p$. The task is to compute

the set of intensities $f|_\Omega = \{ f_p, \; p \in \Omega \}$. This is achieved by solving the minimization

problem:

$$\min_{f|_\Omega} \sum_{\langle p, q \rangle \cap \Omega \neq \emptyset} (f_p - f_q - v_{pq})^2, \quad \text{with } f_p = f^*_p \;\; \forall p \in \partial\Omega,$$

where $v_{pq}$ is the projection of $v((p + q)/2)$ on the edge $[p, q]$. The solution to the

minimization problem above can be obtained by solving the simultaneous linear

equations:

$$\forall p \in \Omega, \quad |N_p| f_p - \sum_{q \in N_p \cap \Omega} f_q = \sum_{q \in N_p \cap \partial\Omega} f^*_q + \sum_{q \in N_p} v_{pq}.$$

In Chapter 5, we will see examples of tympanic membrane images that are cor-

rected using this technique to mitigate the problem of local specular highlights in the

image.
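To make the linear system above concrete, the following is a minimal sketch for the special case of a zero guidance field ($v = 0$), where each unknown pixel becomes the average of its four neighbors. It uses plain Gauss-Seidel sweeps rather than a sparse solver, and it is an illustration only, not the correction code of Chapter 5. The function name `fill_region`, the list-of-rows image layout, and the requirement that the masked region does not touch the image border are all assumptions made for this example.

```python
def fill_region(image, mask, sweeps=500):
    """Fill masked pixels by solving the discrete Poisson equation with v = 0.

    image: list of rows of floats (values inside the mask are ignored).
    mask:  same shape, True where a pixel belongs to Omega.
    Assumes Omega does not touch the image border, so all 4 neighbors exist.
    """
    result = [list(row) for row in image]
    omega = [(m, n) for m, row in enumerate(mask) for n, inside in enumerate(row) if inside]
    for m, n in omega:
        result[m][n] = 0.0  # initial guess for the unknown intensities
    for _ in range(sweeps):
        for m, n in omega:
            neighbors = [(m - 1, n), (m + 1, n), (m, n - 1), (m, n + 1)]
            # With v = 0 and |N_p| = 4, the equation for p reads
            # 4 f_p = (sum of unknown neighbors) + (sum of known boundary neighbors).
            result[m][n] = sum(result[i][j] for i, j in neighbors) / 4.0
    return result
```

For example, erasing the central 3x3 block of a horizontal intensity ramp and calling `fill_region` recovers the ramp, since a linear ramp already satisfies the averaging condition. A non-zero guidance field would add the term $\sum_{q \in N_p} v_{pq}$ to the numerator before dividing by 4.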

3.2 Classification

Every time we open our eyes and look, we effortlessly perform a visual feat far beyond

the capability of today’s most sophisticated computers, though well within the ca-

pability of a kindergartner. This feat is pattern recognition, a typical human ability

that plays an important role in everyday life in reading texts, identifying people, or

even finding a way home. It is this very ease with which we perform these tasks that

belies our “superior” pattern recognition ability.

As we have seen in the previous chapter, the need for an automated classifica-

tion system is crucial. In the particular case of diagnosing otitis media, the expert

otoscopists rely on their training and experience to distinguish among the three diag-

nostic categories. The advantage of years of experience allows them to assign each of

the otitis media cases to a diagnostic category to the best of their knowledge. Such


manual processing has its limitations prompting the need to switch to computer-aided

diagnosis.

3.2.1 Overview of Classification Methods

Distinguishing diagnostic categories of otitis media from tympanic membrane images

is an image classification task, which we now formally define. Let us assume

that a digital image $X \in \mathbb{R}^{M \times N}$ of the tympanic membrane is stored as a

two-dimensional array and can be represented as a function of two variables $(m, n)$.

Classification can be defined as a mapping from the space of input images $\mathbb{R}^{M \times N}$ to

the output space $Y = \{1, 2, \ldots, C\}$ of class labels. To reduce the dimensionality of

the problem from $M \times N$ to $k$, where $k \ll M \times N$, a feature extractor is defined as

a map $f : \mathbb{R}^{M \times N} \mapsto F$ from the input space to the feature space $F = \mathbb{R}^k$. This is

followed by a classifier defined as a map from the feature space $F$ to the output space

of class labels $Y$, $c : F \mapsto Y$. The entire classification is then seen as the composition

of these two maps, $s = c \circ f$.
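The composition of a feature extractor and a classifier can be illustrated with a toy sketch. Nothing here comes from the thesis pipeline: the two features (mean intensity and intensity range), the threshold rule, and the names `toy_features`, `toy_classifier` and `compose` are hypothetical stand-ins chosen only to show the structure of the maps.

```python
def toy_features(image):
    """Toy feature extractor f: maps an M x N image to k = 2 features."""
    pixels = [value for row in image for value in row]
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return (mean, spread)


def toy_classifier(features, threshold=0.5):
    """Toy classifier c: maps a feature vector to a class label in {1, 2}."""
    mean, _spread = features
    return 1 if mean < threshold else 2


def compose(classifier, extractor):
    """Return s = c o f, the full image-to-label map."""
    return lambda image: classifier(extractor(image))


classify = compose(toy_classifier, toy_features)
dark_label = classify([[0.0, 0.2], [0.1, 0.1]])
bright_label = classify([[0.9, 0.8], [1.0, 0.7]])
```

Keeping the two maps separate means that either one can be swapped, for instance replacing general-purpose features with application-specific ones, without touching the other.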

3.2.2 Feature Extraction

Feature extraction is the process of defining a set of measurements on the image

characteristics that will most efficiently or meaningfully represent the information

that is important for analysis and classification. This is the most critical step in a

classification pipeline since features made available directly influence the efficacy of

classification.

Feature extraction techniques try to capture some intuitive visual attributes from

the image such as composition of the image, placement of objects, spatial relationship

between the objects, color, contrast, pattern etc. Most common feature extraction


techniques seek to capture some of the visual properties from an image such as edges

[9, 45, 76], color [30, 33], texture [22, 31, 39, 44] and shape [43, 74]. We focus here

specifically on the following two general feature sets that were used in building our initial

otitis media classifiers:

Haralick Texture Features

These features are designed based on the assumption that the texture information

in an image is contained in the overall or average spatial relationship of the adjacent

gray-level pixels. Four directions of adjacency are defined for the calculation of the

Haralick texture features: horizontal, vertical, and the left and right diagonals. One

gray-level co-occurrence matrix is computed for each direction [22,23]. Each element $[i, j]$

of such a matrix is obtained by counting the number of

times a pixel with value $i$ appears adjacent to a pixel with value $j$. Once normalized, each such entry

can be considered as the probability that a pixel of value $i$ will be found adjacent to

a pixel of value $j$. To describe the texture of the image, 14 statistical measures

are calculated that make up the Haralick texture feature set.
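The counting step behind the co-occurrence matrix can be sketched in a few lines. This example assumes an image whose pixels are already quantized to integer gray levels `0 .. levels-1`, counts each adjacent pair in both orders so that the matrix is symmetric, and normalizes by the total count; these are common conventions but are assumptions here, as are the names `glcm`, `OFFSETS`, `contrast` and `angular_second_moment`. Only two of the 14 measures are shown.

```python
# (row step, column step) for the four directions of adjacency.
OFFSETS = {
    "horizontal": (0, 1),
    "vertical": (1, 0),
    "diagonal_45": (-1, 1),
    "diagonal_135": (-1, -1),
}


def glcm(image, offset, levels):
    """Symmetric, normalized gray-level co-occurrence matrix for one direction."""
    dm, dn = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for m in range(rows):
        for n in range(cols):
            m2, n2 = m + dm, n + dn
            if 0 <= m2 < rows and 0 <= n2 < cols:
                i, j = image[m][n], image[m2][n2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both orders
    total = sum(sum(row) for row in counts)
    return [[count / total for count in row] for row in counts]


def contrast(p):
    """Haralick contrast: large when adjacent pixels differ strongly."""
    size = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(size) for j in range(size))


def angular_second_moment(p):
    """Haralick angular second moment: large for uniform textures."""
    return sum(value ** 2 for row in p for value in row)
```

A full feature vector would evaluate all 14 measures on each of the four matrices (or on their average), which is where the dependence on the chosen number of gray levels enters.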

Scale Invariant Feature Transform

The scale invariant feature transform (SIFT) [40, 41] for an image is a set of local

feature vectors. A local region in an image is described by its center coordinates, the

radius of the region, its orientation in radians and the histogram of gradients. Each

of these feature vectors is invariant to scaling, rotation or translation of the image.

The SIFT features are extracted in a four-step filtering approach:

1. Scale-Space Extrema Detection. In this stage, the image is filtered using a

scale-space function. This is to detect locations (keypoints) and scales that are

identifiable from different views of the same object. The scale space is defined

by the function

$$L(m, n, \sigma) = G(m, n, \sigma) * X(m, n),$$

where $*$ is the convolution operator, $G(m, n, \sigma)$ is a Gaussian whose scale is varied

by the parameter $\sigma$, and $X(m, n)$ is the input image. To locate a keypoint in the

scale space, the difference of Gaussians (DoG) is used; it is obtained by computing

the difference between two smoothed images, one with scale $a$ times the other:

$$D(m, n, \sigma) = L(m, n, a\sigma) - L(m, n, \sigma).$$

To find the local maxima or minima of $D(m, n, \sigma)$, each sample $(m, n)$ is compared with its 8

neighbors at the same scale and with the 9 neighbors in each of the scales one step up and down.

2. Keypoint Localization. The keypoints that have poor contrast or are poorly localized

on an edge are discarded. This is done by comparing the absolute value of the

DoG scale space at the peak with a threshold and discarding the peak if its

value falls below the threshold.

3. Orientation Assignment. Once a peak is identified in the DoG scale space, its

orientation is computed by a histogram of gradient orientations in a Gaussian

window 1.5 times the scale σ of the keypoint. This histogram is then smoothed

and the maximum value is selected as its dominant orientation.

4. Keypoint Descriptor. The SIFT descriptor is a spatial histogram of image gra-

dients. The keypoint descriptor uses a set of 16 histograms, each having 8 ori-

entation bins spaced evenly between [0, 2π], resulting in a feature vector with

128 elements.
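The descriptor step above can be sketched as follows. This is a simplified illustration under stated assumptions, not the full SIFT descriptor: we assume a 16×16 patch split into a 4×4 grid of cells (giving the 16 histograms), use plain central differences for the gradients, and omit the Gaussian weighting, interpolation between bins, and rotation to the dominant orientation. The function name `sift_like_descriptor` is ours.

```python
import math


def sift_like_descriptor(patch):
    """Build a 128-element vector: 16 cells x 8 orientation bins (simplified)."""
    size, cell, bins = 16, 4, 8
    hist = [0.0] * (16 * bins)
    for r in range(size):
        for c in range(size):
            # Central differences, clamped at the patch border.
            dy = patch[min(r + 1, size - 1)][c] - patch[max(r - 1, 0)][c]
            dx = patch[r][min(c + 1, size - 1)] - patch[r][max(c - 1, 0)]
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)  # orientation in [0, 2*pi)
            b = int(angle / (2 * math.pi) * bins) % bins
            cell_index = (r // cell) * 4 + (c // cell)
            hist[cell_index * bins + b] += magnitude
    # Normalize to unit length so the vector is less sensitive to contrast.
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# Toy patch: a horizontal intensity ramp, so every gradient points along +x.
patch = [[float(c) for c in range(16)] for _ in range(16)]
descriptor = sift_like_descriptor(patch)
print(len(descriptor))
```

For this ramp, all of the gradient energy falls into the first orientation bin of each of the 16 cells.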


Expert Classification Features

Unlike the general features that are applicable to a wide variety of problems, features

can be designed specifically for an application/problem area such as classification of

human faces, fingerprints, documents, natural images, medical images, among others.

Efforts have been made to design physiologically meaningful features trying to mimic

the actual visual cues of the experts in their evaluation process. Examples of such

application-specific features are the histopathology vocabulary for delineation of tissues

in images of H&E-stained teratomas [3] and the similar vocabulary features used

in [46] for automated detection of colitis.

3.2.3 Clustering and Classification Problems

In this thesis, we focus most of our discussions on supervised learning methods. Here

we discuss briefly one of the unsupervised learning methods that has been used in our

early attempts to build a classification system and briefly introduce other available

standard classifiers.

K-means Clustering

Clustering involves grouping a set of data points (feature vectors) into non-overlapping

partitions, or clusters, where members within a cluster are “more similar” to one

another than to the members belonging to other clusters. The term “more similar”,

when applied to grouping points is defined by some measure of proximity. When a set

of data points is clustered, every point is assigned to some cluster, and then each of

these clusters can be characterized by a single reference point, usually by an average

of the points belonging to the cluster.

K-means clustering [42] is one of the simplest and most popular unsupervised

learning algorithms. In order to compensate for the lack of labeled training data, the

algorithm learns the characteristics of the data through multiple iterations. Consider

the data set x1, x2, . . . , xT. The main idea is to group these T points into a predefined

number of clusters, K. Each cluster is then represented by a single

point, the center of the cluster, obtained by averaging all the points xt belonging

to that cluster. Let µ1, µ2, . . . , µK be the cluster centers, initialized randomly.

Each data point xt is assigned to the cluster whose center is closest.

Once all the points in the data set are assigned to cluster centers,

the cluster centers are recomputed and the process is iterated. The

algorithm aims at minimizing the objective function

\[
J = \sum_{t=1}^{T} \sum_{k=1}^{K} a_{tk} \, \| x_t - \mu_k \|^2 ,
\]

where atk is an indicator function: atk = 1 if xt is assigned to cluster k and 0

otherwise.

Although K-means does not require labeled training data, it does require initialization

of the cluster centers and is known to be sensitive to that initialization.
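The assignment and update steps described above can be sketched as follows. This is a minimal illustration, assuming squared Euclidean distance, initial centers drawn at random from the data points, and a toy two-dimensional data set of our own. The function names and the `seed` parameter are ours.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initialization from the data
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(x, centers[j]))
            for x in points
        ]
        if new_assignment == assignment:
            break  # no label changed, so the centers are already consistent
        assignment = new_assignment
        # Update step: each center becomes the mean of its assigned points.
        for j in range(k):
            members = [x for x, a in zip(points, assignment) if a == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    # Objective J: sum of squared distances of points to their assigned centers.
    objective = sum(squared_distance(x, centers[a]) for x, a in zip(points, assignment))
    return centers, assignment, objective


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, assignment, J = kmeans(points, k=2)
print(centers, assignment, J)
```

On data with less clearly separated groups, different seeds can lead to different local minima of J, which is the sensitivity to initialization noted above.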

Correlation Filters

Correlation filters have been traditionally designed for distinguishing patterns from

each other and from the background. In this section, we give a high-level overview of

correlation filter theory. For a more comprehensive survey

of correlation filters, we refer the reader to [35]. A correlation filter can be seen as a

spatial-frequency domain array, or a template in the image domain, that is specifically

learned from a set of training data that is representative of the desired

class of pattern/object. This template is then compared to the query image using

cross-correlation, evaluated over all relative shifts between the

template and the query. This can be efficiently computed in the frequency domain

(u, v) as

C(u, v) = X(u, v)H∗(u, v),

where X(u, v) is the 2D Fourier transform (FT) of the query pattern and H(u, v)

is the correlation filter obtained by 2D FT of the template and C(u, v) is the 2D FT

of the correlation output c(m,n). Here ∗ denotes the complex conjugate.
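To make the frequency-domain relation concrete, the sketch below applies it to one-dimensional signals. It is an illustration only: it assumes circular correlation, uses a naive discrete Fourier transform written out directly, and takes the template itself as the filter rather than a filter learned from training data. All function names are ours.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def correlate_via_dft(query, template):
    """Circular cross-correlation computed as C = X * conj(H), then inverted."""
    X, H = dft(query), dft(template)
    C = [xk * hk.conjugate() for xk, hk in zip(X, H)]
    return [c.real for c in idft(C)]


template = [0, 1, 3, 1, 0, 0, 0, 0]
query = template[-3:] + template[:-3]  # the template circularly shifted by 3
c = correlate_via_dft(query, template)
peak = max(range(len(c)), key=lambda m: c[m])
print(peak, round(c[peak], 6))
```

The location of the peak in the correlation output recovers the shift between the query and the template, which reflects the shift-invariance property of correlation filters.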

Once such a template is learned, it can be used as a simple and effective clas-

sifier. The main idea is that correlation filters must ideally produce a sharp peak

at the center of the correlation output c(m,n) (obtained by performing 2D inverse

Fourier transform on the C(u, v)) for the authentic/true class and no such peak for an

impostor/false class. Attractive properties of correlation filters are shift invariance,

robustness to noise, graceful degradation of the response to occlusions and in some

cases simple closed-form solutions. There are different types of correlation filters;

the minimum average correlation energy filter, the maximum average correlation height

filter, and quadratic correlation filters are the most commonly used. In Chapter 7, we will see

how correlation filters are used to learn a template of the tympanic membrane and

to classify images into three distinct diagnostic categories.

Support Vector Machines

Support Vector Machine (SVM) learning algorithms are one of the most popular “off-

the-shelf” supervised classification methods. SVM is built on the key idea of learning

a decision boundary that maximizes the distance between points of opposite classes

closest to the boundary. In pursuit of the optimum boundary separation, by using

duality, the separation problem is transformed into another problem that might be


easier to solve. This transformation typically involves projection of the data into a

higher-dimensional space using a non-linear mapping.

Let us consider a binary classification problem, where the training data are

(x1, y1), (x2, y2), . . . , (xT , yT ),

where yt ∈ {−1,+1} denotes the class label of the training

example xt. The ultimate goal of the SVM algorithm is to construct a hyperplane that

maximally separates the data. For a binary classification problem, this corresponds

to finding a hyperplane such that one side contains all the examples labeled yt = −1

and the other side contains all the examples labeled yt = +1. When the problem

contains more than two classes, it is solved by reducing it to a number of simpler

binary classification problems. Once this hyperplane is constructed from the training

data, the algorithm makes predictions on the test data by checking which side of

the hyperplane each test point falls on. Note that there could be infinitely many such

separating hyperplanes; in that case, the optimal decision boundary is chosen by

maximizing the classification margin, which gives the best chance that new,

unseen data points will fall on the correct side of the boundary.

Since we are seeking a linear decision boundary, the classifier is of the form

h(x) = sign(w^T φ(x) + b),

where w is a vector of weights, b is the bias (offset) term, and

φ is the transformation of the original data into a new

space in which the data are separable.
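The decision function above can be illustrated with a small sketch. The feature map `phi`, the weights `w`, and the offset `b` below are hand-picked assumptions chosen so that the boundary is the unit circle in the original space; they are not learned by an SVM solver. The point of the example is only that a boundary that is nonlinear in the original space becomes linear after the mapping.

```python
def phi(x):
    """Hypothetical feature map: (x1, x2) -> (x1, x2, x1^2 + x2^2)."""
    x1, x2 = x
    return (x1, x2, x1 * x1 + x2 * x2)


def predict(x, w, b):
    """h(x) = sign(w^T phi(x) + b); a score of exactly 0 is assigned to +1."""
    score = sum(wi * fi for wi, fi in zip(w, phi(x))) + b
    return 1 if score >= 0 else -1


# Hand-picked parameters: the score is 1 - (x1^2 + x2^2), so points inside
# the unit circle are labeled +1 and points outside are labeled -1.
w = (0.0, 0.0, -1.0)
b = 1.0

print(predict((0.2, 0.1), w, b))  # inside the circle
print(predict((2.0, 0.0), w, b))  # outside the circle
```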

An appealing feature of SVM classification is how the decision boundary is sparsely

represented. The hyperplane separating the data points depends on the weights on the


training data. Far away samples receive zero weights while the training points close

to the decision boundary receive non-zero weights. The training points of opposite

classes close to the decision boundary are called the support vectors. The training

points that are far from the separating plane do not influence the decision boundary,

making SVMs robust to outliers. This feature of SVMs reduces overfitting of the data,

making SVMs very popular and widely used for classification problems.

Neural Networks

Neural networks (NNs) [61] were built on the principle of modeling the learning

and adaptation processes in a human brain, and are thus similar to their biological

counterparts. One efficient and often employed way of solving complex problems is

to follow the principle of “divide and conquer”: splitting a complex problem into numerous

simple problems. Networks are one approach for achieving the reduction of

complex problems into simple components defined by a set of building blocks, and

connections between them.

NNs are an example of such networks where the building blocks are computa-

tional units called “neurons” and the connection between them is characterized by

the weights assigned to them. The network is made up of interconnected neurons that

work in parallel on the principles of learning and adaptation from the training data.

These neurons are organized in layers, where the neurons in one layer are connected

to the adjacent layer and each of these connections is assigned a weight. Each neuron

takes in multiple inputs and generates an output that is a weighted sum of its input

signals. This weighted sum is then passed to an activation function, whose output feeds the subsequent

layer. Each activation is evaluated as

\[
a_j = \sum_{i=1}^{N} x_i w_{ij} + w_{j0},
\]


where aj is the activation of neuron j, N is the number of inputs, wij is the weight

assigned to the connection between neurons i and j, and wj0 is the bias of neuron j.

Each of these activations is then transformed using a nonlinear activation function

h(.), which can be a hard threshold function, a sigmoid function, or a tan-sigmoid function,

to produce the output y given by

y = h(aj).
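The two equations above amount to the following forward computation for one layer. This is a minimal sketch assuming a sigmoid for h; the weights, biases, and inputs are arbitrary values chosen for illustration, and the function names are ours.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def layer_forward(x, weights, biases, h=sigmoid):
    """Compute y_j = h(a_j) with a_j = sum_i x_i * w_ij + w_j0 for each neuron j.

    weights[j][i] holds w_ij, the weight from input i to neuron j;
    biases[j] holds w_j0.
    """
    outputs = []
    for w_j, w_j0 in zip(weights, biases):
        a_j = sum(x_i * w_ij for x_i, w_ij in zip(x, w_j)) + w_j0
        outputs.append(h(a_j))
    return outputs


x = [1.0, 2.0]                        # N = 2 inputs
weights = [[0.5, -0.25], [1.0, 1.0]]  # two neurons
biases = [0.0, -3.0]
y = layer_forward(x, weights, biases)
print(y)
```

Both activations are zero for these values, so each neuron outputs 0.5, the midpoint of the sigmoid.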

During the learning process, the weights in the network are adjusted to accomplish

the best performance in classifying the training data. Depending on the nature of

the problem and the training data, one can expect NNs to learn the data quite

well; they are known to be robust to nonlinear relationships in the data. Many methods

have been devised to train an NN, the most popular being the backpropagation

scheme [61], where the weights are adjusted in each layer such that the error between

the desired output and the actual output is reduced. One of the main disadvantages of NNs

is their “black-box” nature, where the internal interactions between the neurons

and the layers themselves become intractable. NNs also require a large amount of

training data to be trained properly and to produce reliable predictions on unseen

data.

Random Forests

A Random Forest (RF) consists of a collection or ensemble of simple tree predictors,

which collectively assign a single label when presented with a set of features. The final

output label is the most popular class among all the trees in the ensemble, obtained

by majority voting. The random forest method [8] combines two main ideas,

bagging [7] and random-subspace sampling [25].

A random forest classifier can be defined as a collection of tree-structured classifiers.

Each tree is built on a bootstrapped version of the training data. Bootstrapping

is a fundamental resampling method built on the basic idea that the true

distribution F can be estimated from the so-called empirical distribution F̂.

Let the training data be (xi, yi), i = 1, 2, . . . , T. Then the empirical distribution

function can be written as a discrete probability function given by

\[
P_{\hat{F}}(x, y) =
\begin{cases}
\dfrac{1}{T}, & \text{if } (x, y) = (x_i, y_i) \text{ for some } i, \\
0, & \text{otherwise.}
\end{cases}
\tag{3.1}
\]

A bootstrap sample of size T built from the training data is

(x∗i , y∗i ), i = 1, 2, . . . , T,

where each (x∗i , y∗i ) is drawn uniformly at random from the training set with replacement.

This corresponds to exactly T independent draws from the empirical distribution, which

approximates the true F.

Each tree in the forest ensemble is then built on a bootstrapped sample and the

splitting at each node is performed on the best feature from a random subset of

features. The final output label is then assigned by taking a majority vote of all the

individual classifiers in the ensemble.

There are two sources of randomness. Firstly, it comes from bootstrapping: the

T data samples are selected at random with replacement. Secondly, at each node

a subset of the features is randomly sampled from the complete feature set. Each

node is then split using the best among this randomly chosen subset of features.

By introducing the randomness into growing the trees, one expects to benefit from

constructing very dissimilar trees, but this is not guaranteed.

Each tree is then grown to its maximum depth with no pruning. An estimate of

the error can be obtained from the training data by predicting the labels of the data

not included in the bootstrapped sample. This is called the out-of-bag (OOB) error for

one tree using the bootstrapped sample. The OOB error is calculated for each tree in

the forest, and the error of the entire ensemble is calculated by aggregating the errors

obtained for all the trees.
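The bootstrap sample and its out-of-bag set can be sketched as follows. This is only an illustration of the resampling step, not of tree growing: it works on sample indices, assumes T = 1000, and uses a fixed random seed for repeatability. The function names are ours.

```python
import random


def bootstrap_indices(T, rng):
    """Draw T indices uniformly at random with replacement."""
    return [rng.randrange(T) for _ in range(T)]


def out_of_bag(T, in_bag):
    """Indices never drawn; these are the samples used for the OOB error of one tree."""
    drawn = set(in_bag)
    return [i for i in range(T) if i not in drawn]


rng = random.Random(0)
T = 1000
in_bag = bootstrap_indices(T, rng)
oob = out_of_bag(T, in_bag)
oob_fraction = len(oob) / T
print(oob_fraction)
```

The probability that a given sample is never drawn in T draws is (1 − 1/T)^T, which approaches 1/e ≈ 0.37 for large T, so roughly a third of the training data is available to estimate the error of each tree.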

Another advantage of this method is the calculation of feature importance. Understanding

the importance of the features used to build a tree helps in avoiding overfitting,

improves the model performance, and gives deeper insight into the underlying nature

of the data. Different criteria have been used to measure feature importance. The

most commonly used importance measure in RF classifiers is the Gini index. This is used

to determine which feature is to be used to split a node of the tree in the training phase. For

example, for a binary split, let p represent the fraction of positive samples

and (1 − p) represent the fraction of negative samples at a node. The impurity G

measured by the Gini index at that node is given by

G = 2p(1 − p).

The purer a node is, the smaller the value of the Gini index. Each node is

split by selecting the feature from the subset that yields the purest child nodes.
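As a worked illustration of the Gini index, the sketch below scores two candidate splits of the same eight samples. Scoring a split by the size-weighted impurity of its two children is a common convention that we assume here; labels are encoded as 1 (positive) and 0 (negative), and the function names are ours.

```python
def gini(p):
    """Gini impurity G = 2p(1 - p) for a fraction p of positive samples."""
    return 2.0 * p * (1.0 - p)


def split_impurity(left_labels, right_labels):
    """Size-weighted Gini impurity of the two children of a binary split."""
    total = len(left_labels) + len(right_labels)
    impurity = 0.0
    for labels in (left_labels, right_labels):
        if labels:
            p = sum(labels) / len(labels)  # fraction of positive (1) labels
            impurity += len(labels) / total * gini(p)
    return impurity


# Two candidate splits of the same data (assumed toy labels).
split_a = ([1, 1, 1, 0], [0, 0, 0, 1])  # each child is 75% pure
split_b = ([1, 1, 1, 1], [0, 0, 0, 0])  # each child is perfectly pure

print(split_impurity(*split_a), split_impurity(*split_b))
```

The second split yields the lower impurity, so the feature producing it would be chosen at that node.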

The bagging scheme reduces the variance of the ensemble, which improves the overall

generalization error. Unlike classical

decision trees, this method has been shown to be robust against overfitting, and

hence there is no need to prune the trees. The complete method is directed by only

two parameters, the number of trees in the ensemble and the number of features to

be randomly sampled at each node.


Rule-Based Classifiers

Systems that are built on a set of rules have many desirable properties. They are easy

to understand and the rules can be based on the prior domain knowledge. Rule-based

methods provide a comprehensible description of a system instead of a black-box

prediction. Such a set of rules is useful if the rules are few, understandable,

and predict unseen data with high accuracy. Different types of rule formations are

used to express the nature of data; the most common ones are classical propositional

rules, association rules, fuzzy logic rules, threshold rules, and similarity-based rules

[5, 48]. Here, we will briefly discuss the classical logic rules and fuzzy rules used in the

discussion of the thesis.

Classical logic rules have the simplest type of rule formation using a logical propo-

sition. This type of rule-based classifier uses a collection of “if . . . then . . . ” rules

for classifying the data. The classifier is governed by a set of rules of the form

R = (r1, r2, . . . , rk), where R is the set of rules and ri’s are the classification rules.

Each classification rule then can be expressed as

ri : (conditioni) ↦ yi.

The left-hand side of the rule is called the rule antecedent or the rule condition,

and is made up of tests on the feature values. The right-hand side of the rule is called

the consequent and contains the class label yi. A rule condition is generally of the

form

conditioni = (F1 ⊙ V1) ⋆ (F2 ⊙ V2) ⋆ · · · ⋆ (Fk ⊙ Vk),

where (Fj , Vj) is a feature-value pair and ⊙ is an operator, mostly chosen from the set

of relational operators. The results of the feature-value tests are then combined using an

operator ⋆ chosen from the set of logical operators.

The advantage of using such predefined classical logic rules is that they are simple

descriptions of the data, made of functions defined on combinations

of features. The drawback of this method is that it partitions the feature space into

hyperboxes and provides an abrupt, step-wise approximation to decision boundaries.
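A classical rule-based classifier of this form can be sketched as follows. The feature names F1 and F2, the thresholds, and the class labels are hypothetical. We assume that the tests within a rule are combined with a logical AND and that the first matching rule determines the label; other logical operators and conflict-resolution schemes are possible.

```python
import operator

# Relational operators available for the feature-value tests.
RELATIONAL = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq,
}


def holds(sample, feature, op, value):
    """Evaluate one feature-value test, e.g. F1 > 0.7."""
    return RELATIONAL[op](sample[feature], value)


def classify(sample, rules, default=None):
    """Return the consequent of the first rule whose tests all hold (logical AND)."""
    for conditions, label in rules:
        if all(holds(sample, f, op, v) for f, op, v in conditions):
            return label
    return default


# Hypothetical rule set R = (r1, r2).
rules = [
    ([("F1", ">", 0.7), ("F2", ">", 0.5)], "class 1"),
    ([("F1", "<=", 0.7)], "class 2"),
]

print(classify({"F1": 0.9, "F2": 0.8}, rules))
print(classify({"F1": 0.3, "F2": 0.8}, rules))
print(classify({"F1": 0.9, "F2": 0.1}, rules))  # no rule fires
```

The thresholds 0.7 and 0.5 cut the feature space into axis-aligned boxes, which is the abrupt partitioning noted above.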

Fuzzy Logic

Fuzzy set theory, also known as possibility theory, allows dealing with vague or

inexact facts, and is useful for data mining systems performing rule-based classification.

This concept was introduced in [75]. It can basically be seen as a multivalued

logic that allows intermediate values to be defined between the conventional crisp

evaluations such as true/false, yes/no, or high/low. The term “fuzzy” suggests an imprecise

boundary rather than an abrupt one. Unlike traditional hard computing, fuzzy

logic accommodates the imprecision of the real world by allowing for soft computing. It is

a fascinating area of research because it reaches a good trade-off between significance

and precision, something that we humans are good at managing.

Now, we formally define a fuzzy set and fuzzy membership functions. Consider

a classical set A defined as a collection of elements, where the membership or

non-membership of each element x is defined by a membership function γA(x), which can

be seen as a mapping:

γA : X ↦ {0, 1}.

Here γA(x) takes the value either 0 or 1, which represents the truth value of element

x in A, whereas fuzzy theory allows for defining partial membership. Let X denote

the universe of discourse of a fuzzy set A that is characterized by its membership function

γA defined as a mapping:

γA : X ↦ [0, 1].

The most commonly used membership functions are triangular, trapezoidal, Gaus-

sian, sigmoidal or piece-wise linear. The type of membership function chosen is

mostly dependent on the application and based on available domain knowledge of the

problem.

Fuzzy classification systems are an application of fuzzy theory. Expert knowledge

can be expressed in a natural way using linguistic variables, described by fuzzy

sets as defined above. For example, the expert knowledge can be translated into a

fuzzy rule such as

IF feature A is low AND feature B is medium THEN output = class 1.

Each of these rules has a basic “if . . . then . . . ” formulation with a consequent that is

the output class. The final output is assigned by combining the output decisions from

each of the rules. This is done in a variety of ways; the most commonly employed are

majority voting and averaging the individual decisions. Depending on the application

requirements, it might not be necessary to define all possible formulations of the rules

using the features, since some possibilities may never be observed in the real data.
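The fuzzy rule above can be evaluated as in the following sketch. It is an illustration under several assumptions of ours: features are scaled to [0, 1], the linguistic terms low, medium, and high are triangular membership functions with hand-picked breakpoints, the fuzzy AND is taken as the minimum of the memberships, and the output class is the one whose rule fires most strongly. The second rule is added only so that there is something to compare against.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Linguistic terms on a feature scaled to [0, 1] (breakpoints are assumptions).
def low(x):
    return triangular(x, -0.5, 0.0, 0.5)


def medium(x):
    return triangular(x, 0.0, 0.5, 1.0)


def high(x):
    return triangular(x, 0.5, 1.0, 1.5)


def classify(feature_a, feature_b):
    """Evaluate two fuzzy rules and return the class with the strongest rule."""
    strengths = {
        # IF feature A is low AND feature B is medium THEN class 1
        "class 1": min(low(feature_a), medium(feature_b)),
        # IF feature A is high AND feature B is medium THEN class 2
        "class 2": min(high(feature_a), medium(feature_b)),
    }
    best = max(strengths, key=strengths.get)
    return best, strengths


label, strengths = classify(0.1, 0.4)
print(label, strengths)
```

Unlike a crisp threshold rule, the rule strength changes gradually as the feature values move away from the peaks of the membership functions.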

In contrast to the powerful neural networks, which learn a model from the training

data that is often difficult to interpret, fuzzy logic can be built on the

knowledge of domain experts, who have a better understanding of and insight into the

system owing to their experience. The basis for fuzzy logic is the natural language of human

communication, because fuzzy logic is built on structures of qualitative description.

It is “computing with words”, to quote the author of fuzzy logic [75].

In Chapter 7, we will use fuzzy logic to design the otitis media grammar (rules)

used in the automated otitis media classifier designed to mimic the clinical diagnostic


decisions made by expert otoscopists while evaluating tympanic membranes.

3.3 Related Work

In this section, we present previous related works divided into three broad categories.

Firstly, computer-aided diagnosis—we give an overview of the existing work on dif-

ferent diagnostic systems. Secondly, vocabulary and grammar—a brief discussion on

the previous works on understanding the human perception of color patterns in a

vocabulary and grammar framework, which is the basic guiding principle followed

in this thesis. We then conclude with classification of otitis media—we present the

related work we are aware of in the literature on automated classification of otitis

media.

3.3.1 Computer-Aided Diagnosis

Understanding and interpreting medical images is arguably one of the most difficult

tasks of pattern recognition. In most medical fields, the diagnosis is most often

inferred manually by medical professionals. The accuracy of such

manual evaluation shows high variability due to multiple factors such as level of

experience, bias, noise, or fatigue. There is a need for automated aids in the medical

image evaluation processes to achieve objective evaluation. Automation or semi-

automation of clinical diagnostics is done by using computerized systems, which will

automatically process the medical data and present an output useful to the medical

professional.

Using computers to aid in the analysis of medical images is not new. The

goal of computer-aided diagnosis (CAD) systems is to produce fast and accurate

decisions and to reduce interpretation errors, as well as variation between and within

observers, through objective evaluation. CAD systems have been used in different

medical fields and on a wide variety of acquisition modalities. Research

in the area of CAD has been very active over the last three decades. We briefly

review a few CAD systems currently employed in clinical diagnosis. For a more extensive

review of the research and development of CAD systems, we refer the reader

to [20, 67].

One of the earliest uses of CAD was the interpretation of radiology images; the main idea

was to provide a computer output as a useful “second opinion” to the radiologist.

Since then, a number of CAD systems have been developed and employed to help in

diagnostic decision making. For example, CAD systems are employed in automated

detection and classification of various abnormalities in mammograms [10], detecting

lung diseases [17], and brain tumor assessment [28], among many others.

3.3.2 Vocabulary and Grammar

To achieve our goal, we adopt the guiding principles below, partly inspired by [18,

19, 49–51]. The authors performed research on understanding how humans perceive

and measure similarity of color patterns. To understand this process, the authors

performed a subjective experiment that resulted in five perceptual criteria that the

subjects used to compare and associate similarity between color patterns. These

perceptual criteria were named as the vocabulary—a set of basic categories used by

humans in judging similarity of color patterns. The second aspect of the research

was understanding the relative importance and relationships between these basic cat-

egories, as well as hierarchy of rules to combine them—grammar.

In this thesis, we aim to find the corresponding vocabulary and grammar for otitis

media that make up the otoscopist’s language.


• Vocabulary: A set of visual cues that characterize the images of tympanic mem-

brane according to expert otoscopists.

• Grammar : A set of rules that govern the association and hierarchy to combine

the vocabulary terms that mimic the clinical decision process of the expert

otoscopists.

3.3.3 Automated Classification of Otitis Media

To our knowledge, the only works in this area are [47,73], where the authors used color

features to classify otitis media. In both these works, the classification was formulated

as a two-class problem, distinguishing between normal cases and otitis media. There

was no distinction made between cases of AOM or OME but they were together

labeled as otitis media. The ground truth was provided by an otolaryngologist.

In [73], from each image a rectangular region from the tympanic membrane and

a circular annular region from the auditory canal were selected by an otolaryngologist;

the pixels from these selected regions were transformed into the CIELAB space.

Only the chrominance channels a and b were used for computation.

During the training phase, a color pair (representative of tympanic membrane

and auditory canal) is formed by the expectation of the mono-modal distribution

of the tympanic membrane and auditory canal. The modal color of the tympanic

membrane is predicted using a linear regression on the modal color of the auditory

canal. The regression was applied on the principal components of the two color modes

obtained from principal component analysis. Two regression models were constructed,

one for each class, namely normal and otitis. The classification was made based on

the model that gave the least prediction error. The model performed poorly, detecting 74%

of the normal cases and 62.5% of the otitis media cases.


In [47], the authors extract two features from the tympanic membrane images:

the HSV color histogram and HSV color coherence vectors. These features are classified

using different standard classification methods such as k-nearest neighbors, decision

trees, linear discriminant analysis, naïve Bayes, NN, and SVM. The highest accuracy

of 73.11% was reported using NN on color coherence vectors.

The authors in both [47,73] conclude that color alone is not sufficient to distinguish

otitis media from normal cases.

Chapter 4

Goals of the Thesis

In this chapter, we begin by outlining the challenges presented by tympanic membrane

images for processing. This is followed by the discussion of the guiding principles we

adhere to in the process of building an automated classifier. In the last section, we

present the diagnosis of otitis media as a classification problem by giving an overview

of the framework of our classification system.

4.1 Gaps to Fill

As discussed in the previous chapter, there is a critical need to develop an automated

method that can process tympanic membrane images and classify them into AOM,

OME or NOE. The rationale underlying the research work presented in this thesis is

that an automated classification algorithm will enable clinicians to properly diagnose

and treat AOM, reducing the likelihood of adverse effects of bacterial resistance. Such

an automated classification system does not currently exist.


4.1.1 Challenges

The difficulty of reaching a clear diagnosis in otitis media arises from multiple

sources. The very nature of the disease presents difficulty since otitis media refers

to a continuum of middle ear infection conditions. The absence of a clear decision

boundary between OME and AOM makes the diagnosis a hard problem for general

pediatricians and sometimes even for experienced otoscopists.

The current standard of examination includes visual examination of the tympanic

membrane by inserting the otoscope along the ear canal while holding an often crying

and squirming child. When the ear canal is blocked by cerumen, the workload of the

examiner increases, since the cerumen must be removed from the ear

canal to have adequate visualization of the tympanic membrane.

Figure 4.1: Illustration of inter-class similarity. Examples of tympanic membrane images of OME (left) and AOM (right) showing strong similarity in appearance.

In addition to the challenges of clinical examination discussed above, the tympanic

membrane images also pose challenges for computation. Since there is no standard

procedure for the acquisition of images, acquisition depends solely on the examiner. This

causes many variations in the data, such as nonuniform positioning of the tympanic

membrane, in-plane and out-of-plane rotations in the image, inadequate visualization

of the tympanic membrane, and illumination problems (nonuniformity, local artifacts).

Failure to remove cerumen from the ear canal leads to occlusion of the tympanic

membrane, which makes the computation unreliable.

Figure 4.2: Illustration of intra-class variability. Examples of tympanic membrane images of OME; different degrees of severity of OME lead to different presentations.

The variation that arises from the acquisition and the absence of a clear separation

between OME and AOM lead to “inter-class similarity”, where images from distinct

categories have similar appearance as shown in Figure 4.1. On the other hand, in

“intra-class variability”, the images from the same diagnostic category have different

appearance as shown in Figure 4.2.

Finally, the absence of a gold standard for differentiating the diagnostic

categories of otitis media leads to disagreement on ground truth among the experts,

making the development of an automated method a challenging task.

4.1.2 Guiding Principles

The prevalence of the problem, disagreement on ground truth and the other associated

challenges make our goal hard and ambitious. In our attempt to build an accurate

automated otitis media classifier, we follow these guiding principles.

Vocabulary We aim to design a feature set understood by both otoscopists and

engineers based on the actual visual cues used by otoscopists; we term this the otitis

media vocabulary.


To explore the diagnostic processes used, Drs. Shaikh and Hoberman conducted

a study to examine findings that the expert otoscopists use during their clinical di-

agnosis [63]. During the study, endoscopic still images of tympanic membranes of

783 children were obtained and examined by expert otoscopists. The examining oto-

scopist recorded information regarding a history of otalgia, and findings concerning

the following tympanic membrane characteristics: color (amber, blue, gray, pink, red,

white, yellow), translucency (translucent, semi-opaque, opaque), position (neutral,

retracted, bulging), mobility (decreased, not decreased), and areas of marked red-

ness, as distinct from mild or moderate redness (present, absent). A random sample

of 135 (in ratio 2:2:1 of AOM:OME:NOE) of these images was sent for review to an-

other group of 7 independent expert otoscopists, resulting in a data set of 945 image

evaluations. To control for differences in color rendition between computers, color-

calibrated laptops were mailed to each expert. They were asked to independently

describe tympanic membrane findings and assign a diagnosis of AOM/OME/NOE.

Just by evaluating still images, with no information about mobility or ear pain, the

diagnosis (AOM vs. no AOM) endorsed by the majority of experts was in agreement

with the live diagnosis 88.9% of the time, underscoring the limited role that symptoms

and mobility of the tympanic membrane have in the diagnosis of AOM. Live diagnosis

refers to the diagnosis based on physical examination and evaluation of the child at the

time of the encounter and is not based on images. Among both groups of otoscopists,

bulging of the tympanic membrane was the finding judged best to differentiate AOM

from OME. 96% of ears during live diagnosis and 93% of ear image evaluations were

assigned a diagnosis of AOM based on the presence of bulging. Members of the

two groups who assigned the diagnosis of OME reported bulging of the tympanic membrane

in 0% and 3% of ears during live diagnosis and ear image evaluations,

respectively. Opacification of the tympanic membrane was the finding that best

differentiated OME from NOE.

|              | AOM                              | OME                       | NOE                |
|--------------|----------------------------------|---------------------------|--------------------|
| Color        | White, pale yellow, markedly red | White, amber, gray, blue  | Gray, pink         |
| Position     | Distinctly full, bulging         | Neutral, retracted        | Neutral, retracted |
| Translucency | Opacified                        | Opacified, semi-opacified | Translucent        |

Table 4.1: Guidelines for vocabulary design: Otoscopic findings associated with clinical diagnostic categories of tympanic membrane images [63].

To design the otitis media vocabulary, we follow the guidelines in Table 4.1, which

summarizes these otoscopic findings.

Grammar We aim to design a rule-based decision process to combine the vocabu-

lary terms based on the decision process used by otoscopists; we term this the otitis

media grammar.

Figure 4.3: Guidelines for grammar design: Decision tree for the diagnosis of otitis media [64].

To design the grammar, we use the findings from [64], where the authors em-

pirically examined the findings used by a group of expert otoscopists for diagnosing

otitis media. In this study, the relative importance of signs and symptoms in diagno-

sis of AOM was described and then used to develop a rule-based decision tree method


to diagnose otitis media. At each visit of the patient, the otoscopist recorded the

following tympanic membrane characteristics: color (amber, blue, gray, pink, white,

yellow), degree of opacification (translucent, semi opaque, opaque), position (neutral,

retracted, bulging), decreased mobility (yes, no), presence of air-fluid level(s) (yes,

no), and presence of areas of marked redness (yes, no). A decision tree was then

developed based on the recorded tympanic membrane characteristics using recursive

partitioning to classify the cases into one of the three diagnostic categories. This man-

ual decision tree uses two decisions to discriminate among the diagnostic categories;

first, bulging is used to distinguish AOM from OME and NOE, and if no bulging was

present, opacification or air-fluid level is used to distinguish between OME and NOE

(see Figure 4.3). For ease of reference, we name the diagnosis of AOM, NOE and

OME as Stage 1, 2, and 3, respectively.
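The two decisions described above can be written as a short function. This is only an illustrative sketch of the tree as summarized in the text; the function name and the boolean inputs are hypothetical, and the published tree in [64] may encode the findings differently.

```python
def diagnose(bulging: bool, opacified: bool, air_fluid_level: bool) -> str:
    """Two-decision tree as summarized in the text (after Figure 4.3)."""
    # Decision 1: bulging separates AOM from OME and NOE.
    if bulging:
        return "AOM"
    # Decision 2: without bulging, opacification or an air-fluid level
    # separates OME from NOE.
    if opacified or air_fluid_level:
        return "OME"
    return "NOE"
```

For example, `diagnose(False, True, False)` follows the second branch and returns `"OME"`.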

To design the otitis media grammar, we follow the guidelines in Figure 4.3, which summarizes this decision process.

4.2 Diagnosis as Classification

The main assumption in this thesis is that diagnosis can be viewed as classification.

Our goal is to classify diagnostic categories of otitis media into three clinically dis-

tinct categories. The signs and symptoms in combination with examination lead the

expert otoscopist to reach a conclusion or diagnosis of the condition being observed.

When viewed as a classification problem, we study the problem using classification

techniques, starting with learning an algorithm with data whose label is known and

using the learned algorithm to predict the labels for test examples.

As argued earlier, accurate diagnosis of otitis media requires both experience and

understanding of the domain in enough detail. We observed that the experts are


able to sort their way through details, clear the confusion and state a diagnosis with

reasonable confidence. We believe that through collaboration and feedback from the

experts, we can understand, formulate, and build a system to automate classification

of otitis media successfully, as this thesis demonstrates.

Figure 4.4: Block diagram of our proposed otitis media classifier.

In pursuit of building an automated classifier, we adhere to the guiding principles

discussed before. Our intuition behind the presented approach is that mimicking

the visual cues and decision process of otoscopists will lead to high classification

accuracy comparable to that of the expert otoscopists on the tympanic membrane

images. In Figure 4.4, we present the overall structure of the classification system.

The subsequent chapters will explain each of the constituent blocks in the otitis media

classifier in detail.

Chapter 5

Preprocessing

To compute features, image preprocessing is crucial because it is expected that some

regions in the image such as the ear canal are not relevant for computation, hence

it is necessary to delineate the tympanic membrane. Moreover, we aim to eliminate

or minimize the impact of image artifacts arising from illumination problems. These

artifacts will affect feature computation and hence must be corrected. To that end,

we start with an automated segmentation step to locate the tympanic membrane and

apply a local illumination correction to mitigate the problem of specular highlights.

If a captured image cannot be salvaged by local illumination correction, then it is

deemed not fit for processing and the image is rejected from further computation. In

a clinical setting, this rejection procedure in the algorithm could prompt the clinician

to retake the image. Unreliable images are also rejected based on global illumination artifacts, such as a very bright appearance due to overexposure or a dull appearance due to underexposure, and on occlusion of the tympanic membrane by a buildup of cerumen in the ear canal.


5.1 Automated Segmentation of the Tympanic Membrane

Figure 5.1: Comparison of automated segmentation (top) and hand segmentation by expert otoscopists (bottom).

Segmentation is a crucial step to extract relevant regions on which reliable features for classification can be computed. We now briefly summarize the active-contour based segmentation algorithm [32] that we adapted for our purposes¹. First, a so-called snake potential of the grayscale version of the input image is computed, followed by a set of forces that outline the gradients and edges of the image. The active-contour algorithm [74] is then initialized with a circle at the center of the image. The algorithm iteratively grows this contour and stops at a predefined convergence criterion, which leaves an outline that covers the relevant region in the image. This outline is used to generate the final mask Xm that is applied to the input image to obtain the final result shown in Figure 5.1. We evaluated the performance of the algorithm on automatically segmented images against images hand segmented by expert otoscopists, and found that we can automatically segment prior to classification without hurting the performance of the classifier. With this segmentation stage added, the classification system becomes completely automated, since the clinician is not required to specify where the tympanic membrane is positioned.

¹ Automated segmentation of the tympanic membrane was implemented by Dr. Pablo Hennings Yeomans during the early phase of this project.

5.2 Image Correction: Inpainting Tympanic Membrane Images

One of the problems encountered is the presence of specular highlight regions caused by residual cerumen (wax) in the ear canal and wax on the surface of the hair in the ear canal, which might remain after the examination. Cerumen reflects the light from the

otoscope, which results in white regions in the image as shown in Figure 5.2 (top).

These regions of local specular highlights have to be corrected.

Our aim is to detect the specular highlights in the image and locally correct them.

We use a simple thresholding scheme on image intensities to identify the specular

highlight areas with white pixels. These detected regions are shown in Figure 5.2

(middle row). Once these regions are detected, we apply the Poisson image editing technique [57] explained in Section 3.2.2 to each color channel separately. The local image

correction is achieved by replacing the white pixels with pixel intensities approximated

by interpolating the pixel intensities from the neighborhood of the specular highlight

areas. The corrected images are shown in Figure 5.2 (bottom row).
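The detect-then-interpolate idea can be sketched in a few lines. This is a simplified illustration, not the implementation used in this thesis: the intensity threshold of 240 is an assumed value, and the fill step solves Laplace's equation by Jacobi relaxation, which corresponds to Poisson image editing with a zero guidance field. Both function names are hypothetical.

```python
def detect_highlights(gray, threshold=240):
    """Mask of pixels at or above an (assumed) intensity threshold."""
    return [[v >= threshold for v in row] for row in gray]


def fill_masked(channel, mask, iterations=500):
    """Fill masked pixels of one color channel from their surroundings.

    Unmasked pixels act as fixed boundary values. Each masked pixel is
    repeatedly replaced by the mean of its in-image 4-neighbors (Jacobi
    relaxation of Laplace's equation). Assumes an image of at least 2x2.
    """
    rows, cols = len(channel), len(channel[0])
    out = [[float(v) for v in row] for row in channel]
    masked = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    for _ in range(iterations):
        updates = {}
        for r, c in masked:
            nbrs = [out[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols]
            updates[(r, c)] = sum(nbrs) / len(nbrs)
        # Apply all updates at once so every pixel sees the previous iterate.
        for (r, c), v in updates.items():
            out[r][c] = v
    return out
```

In use, the mask would be computed once from the image intensities and `fill_masked` would then be called on the red, green, and blue channels separately, mirroring the per-channel correction described above.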


Figure 5.2: Correction of specular highlights for AOM (left), OME (middle) and NOE (right). Input images are in the top row, identification of specular highlight regions in the middle row, and correction of the identified regions in the bottom row.


5.3 Rejection of Unreliable Data

A frequently encountered problem in tympanic membrane images is variation in illumination. Nonuniform illumination produces both local artifacts, such as specular highlights, and global artifacts, such as dark or very brightly lit images.

5.3.1 Rejection due to Specular Highlights

Some of the segmented images may contain large regions of white pixels due to overexposure. The Poisson image editing method relies on neighboring pixels to approximate intensities in the region to be corrected, and is thus effective only when that region is small. We empirically found that if the area of continuous white pixels is more than 15% of the total pixels in the segmented tympanic membrane image, correcting such regions gives unreliable results and hence such an image should be rejected. Our aim is to use the rejection stage in the real application and prompt the clinician to retake the image until it is deemed suitable for processing.
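One way to apply the 15% rule is to measure the largest connected white region inside the segmented membrane. The sketch below makes two assumptions that the text does not spell out: "continuous" is read as 4-connected, and the test is applied to the single largest region. The function names are hypothetical.

```python
from collections import deque


def largest_white_fraction(white, region):
    """Largest 4-connected white area as a fraction of the region's pixels.

    `white` and `region` are equally sized 2D lists of booleans; `region`
    marks the segmented tympanic membrane.
    """
    rows, cols = len(white), len(white[0])
    total = sum(1 for row in region for v in row if v)
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r in range(rows):
        for c in range(cols):
            if not (white[r][c] and region[r][c]) or seen[r][c]:
                continue
            # Breadth-first flood fill of one connected component.
            size, queue = 0, deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for yy, xx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= yy < rows and 0 <= xx < cols
                            and white[yy][xx] and region[yy][xx]
                            and not seen[yy][xx]):
                        seen[yy][xx] = True
                        queue.append((yy, xx))
            best = max(best, size)
    return best / total if total else 0.0


def reject_for_highlights(white, region, max_fraction=0.15):
    """True if the image should be rejected under the 15% rule."""
    return largest_white_fraction(white, region) > max_fraction
```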

5.3.2 Rejection due to Over/Under Exposure

Depending on the angle and amount of light incident on the membrane and the

ear canal, we encounter different illumination problems related to brightness and

contrast. Artifacts such as shading, shadows, and changes due to global variation in

the intensity or color due to overexposure or underexposure will also affect feature

computation. Both overexposure and underexposure result in a loss of detail in images and lead to a very bright appearance and a dark appearance, respectively.

Figure 5.3: Examples of rejected images from each class with AOM (left), OME (middle) and NOE (right). Top row corresponds to images rejected due to washed out appearance and bottom row corresponds to images rejected due to dull appearance.

One of the most commonly used methods to characterize the distribution of pixel intensities in an image is a histogram. We calculate the histogram statistics descriptor

(HSD) [21] to describe the distribution of pixel intensities. The six statistical measures

are mean, standard deviation, second central moment (also known as R measure),

the third central moment, the uniformity measure and the entropy. The mean of an

image gives the measure of average intensity whereas the standard deviation gives

the measure of average contrast. The second central moment reflects the relative

smoothness of the intensity in a region. The third central moment measures the

skewness of a histogram. The entropy of an image measures the degree of randomness

of the image. The entropy is usually calculated from the first-order histogram of an

image. The uniformity measure is a factor inversely proportional to the variance

of an image. It measures the uniformity of intensity in the histogram. A training

set was selected manually containing two classes of data; images to be rejected due

to overexposure/underexposure and images of proper illumination that are fit for

further computation. An SVM classifier was trained for a two-class problem on the


six features extracted from the images in the training set. Figure 5.3 shows examples

of images rejected using this procedure.
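The six histogram statistics can be computed directly from the normalized first-order histogram. The sketch below is illustrative only and assumes the common textbook definitions, in particular a smoothness measure R = 1 - 1/(1 + σ²) with the variance normalized by (L - 1)², and uniformity as the sum of squared histogram probabilities; the exact formulas in [21] may differ. The function name is hypothetical.

```python
import math


def histogram_statistics(pixels, levels=256):
    """Six histogram statistics of a flat list of integer intensities."""
    n = len(pixels)
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    p = [h / n for h in hist]  # normalized first-order histogram

    mean = sum(i * pi for i, pi in enumerate(p))
    variance = sum((i - mean) ** 2 * pi for i, pi in enumerate(p))
    third_moment = sum((i - mean) ** 3 * pi for i, pi in enumerate(p))
    # Assumed definition: variance normalized to [0, 1] before computing R.
    smoothness = 1 - 1 / (1 + variance / (levels - 1) ** 2)
    uniformity = sum(pi * pi for pi in p)
    entropy = -sum(pi * math.log2(pi) for pi in p if pi > 0)

    return {
        "mean": mean,
        "std": math.sqrt(variance),
        "smoothness": smoothness,
        "third_moment": third_moment,
        "uniformity": uniformity,
        "entropy": entropy,
    }
```

The resulting six values per image would form the feature vector passed to the two-class classifier; the SVM training itself is not shown here.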

5.3.3 Rejection due to Presence of Cerumen

Figure 5.4: Examples of rejected images. Top row corresponds to input images and bottom row corresponds to images showing detected wax regions.

Buildup of cerumen in the ear canal leads to inadequate visualization of the tympanic membrane. Computation on such tympanic membrane images leads to unreliable results. We aim to reject these images from further computation; in a clinical setting, this rejection step can be used as a prompt to the clinician for cerumen removal.

The areas of cerumen in the tympanic membrane image are detected using the color-assignment technique outlined in Section 6.3.5, where it is used to measure translucency of a tympanic membrane. During the training phase, regions of wax are hand segmented. The pixels from these hand-segmented regions are clustered using the K-means algorithm. To detect the cerumen region in an image X, for each pixel (m,n), the Euclidean distance between the pixel and each of the cluster centers is computed. If any of the K distances falls below a threshold (Tt = 10, found experimentally), the pixel is labeled as a cerumen pixel. This results in a binary image Xc, shown in Figure 5.4 (bottom row), indicating cerumen and noncerumen regions. The degree of cerumen in the image is defined as the mean of Xc and is used to reject images in which the pixels labeled as cerumen occupy more than 10% of the segmented image.
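The labeling and rejection steps can be sketched as follows. The sketch assumes that the cluster centers have already been learned, that `image` holds only the pixels of the segmented membrane as RGB tuples, and that the function names are hypothetical.

```python
import math


def cerumen_mask(image, centers, threshold=10.0):
    """Label a pixel as cerumen if it lies within `threshold` of any center.

    `image` is a 2D list of (R, G, B) tuples; `centers` is a list of
    (R, G, B) cluster centers learned from hand-segmented wax regions.
    """
    return [[min(math.dist(px, c) for c in centers) < threshold for px in row]
            for row in image]


def cerumen_degree(mask):
    """Mean of the binary mask, that is, the fraction of cerumen pixels."""
    flat = [v for row in mask for v in row]
    return sum(flat) / len(flat)


def reject_for_cerumen(image, centers, threshold=10.0, max_degree=0.10):
    """True if more than 10% of the pixels are labeled as cerumen."""
    return cerumen_degree(cerumen_mask(image, centers, threshold)) > max_degree
```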

Chapter 6

Otitis Media Vocabulary

6.1 Main Idea

The expert otoscopist uses specialized knowledge when discriminating between the

different diagnostic categories. The goal of our proposed methodology is to create a

feature set—otitis media vocabulary, which will mimic the visual cues used by trained

otoscopists to diagnose otitis media.

6.2 Methodology

To design the otitis media vocabulary we will follow the process outlined in [3], where a

histopathology vocabulary was designed for automated identification and delineation

of tissues in images of H&E-stained teratomas. Similar vocabulary features were used

in [46] for automated detection of colitis.

Formulation of initial set of descriptions We obtain initial descriptions of those

characteristics best describing a given diagnostic category from the summary of oto-

scopic findings in Table 4.1.


Computational translation of key terms From this set, the key terms, such as

bulging, are translated into their computational synonyms, creating a computational

vocabulary. In our case, we construct a feature called bulging, which measures the

area of the bulged region in the tympanic membrane.

Computational translation of descriptions Using the computational vocabulary, the otoscopist’s entire descriptions, such as bulging and white, are translated.

Verification of translated descriptions Based on these translated descriptions,

and without access to the image, the otoscopist tries to identify the diagnostic cat-

egory being described, emulating the overall system with translated descriptions as

features and the otoscopist as the classifier.

Refinement of insufficient terms If the otoscopist is unable to identify a diag-

nostic category based on translated descriptions, or if a particular translation is not

understandable, then that translation is refined and presented again to the otoscopist

for verification.

Otitis media vocabulary If the otoscopist is able to identify a diagnostic category

based on translated descriptions, then the discriminative power of the key terms and

their corresponding computational interpretations are validated, and these terms can

be included as otitis media vocabulary terms to create features.

This feedback loop is iterated until a sufficient set of terms has been collected to formulate the otitis media vocabulary:

bulging $f_b$, central concavity $f_c$, light $f_\ell$, malleus presence $f_m$,
translucency $f_t$, amber level $f_a$, bubble presence $f_{bp}$, grayscale variance $f_v$.


6.3 Vocabulary

We designed the vocabulary features, bulging, central concavity, malleus presence,

translucency, amber level, and bubble presence based on otoscopic findings listed

in Table 4.1. Supplementing the features designed based on otoscopic findings, we

designed an additional two, light and grayscale variance, based on our observations

and to catch classifier errors.

1. The first three vocabulary features, bulging, central concavity, and light, describe

the distinct characteristics associated with AOM.

2. The next two vocabulary features, malleus presence and translucency, are in-

dicative of NOE.

3. The final three vocabulary features, amber level, bubble presence, and grayscale

variance, describe the characteristics of OME.

We now explain each of the vocabulary features in detail.

6.3.1 Bulging

In [64], the authors showed that bulging of the tympanic membrane is crucial for

diagnosing AOM. We will design a feature that calculates the percentage of bulged

region in the tympanic membrane; we call it the bulging feature. The goal is to

derive a 3D tympanic membrane shape from a 2D image, by expressing it in terms

of depth at each pixel. For example, in AOM, we should be able to identify high-

depth variation due to bulging of the tympanic membrane in contrast to low-depth

variation in NOE due to tympanic membrane being neutral or retracted. The shape

from shading technique [70] can be applied to recover a 3D shape from a single


monocular image. The input is a grayscale version of the segmented original RGB image $X \in \mathbb{R}^{M \times N}$ as shown in Figure 6.1(a). The depth at each pixel can be

calculated in an iterative manner using the image gradient and a linear approximation

of the reflectance function of the image. Figure 6.1(b) shows the result of depth map

Xd identifying the bulged regions in the tympanic membrane. The depth map Xd is

then thresholded at Td (here Td = 0.6) to obtain a binary mask Xb of bulging regions

in the tympanic membrane.

(a) Original image. (b) Depth recovered showing the bulged area in red.

Figure 6.1: Computation of the bulging feature.

We then define the bulging feature as the mean of $X_b$,

$$ f_b = E[X_b]. $$
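The final two steps, thresholding the depth map and averaging the mask, can be sketched as below. The sketch assumes a depth map that has already been recovered by shape from shading and scaled to [0, 1], and it assumes that "thresholded at Td" means strictly greater than Td; the function name is hypothetical.

```python
def bulging_feature(depth, t_d=0.6):
    """Fraction of pixels whose recovered depth exceeds the threshold T_d."""
    # Binary mask X_b of bulging pixels.
    mask = [[1 if d > t_d else 0 for d in row] for row in depth]
    n_pixels = sum(len(row) for row in mask)
    # f_b is the mean of X_b.
    return sum(map(sum, mask)) / n_pixels
```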

6.3.2 Central concavity

The tympanic membrane is attached firmly to the malleus, which is one of the three middle ear bones called auditory ossicles. In the presence of an infection, the tympanic

membrane begins to bulge in the periphery. The central region, however, remains

attached to the malleus forming a concavity. We design a feature to identify the

concave region located centrally in the tympanic membrane; we call it the central

concavity feature. The input is a grayscale version (Figure 6.2(a)) of the segmented original RGB image $X \in \mathbb{R}^{M \times N}$ as in Figure 5.1. We extract a circular neighborhood of radius $R$ around the pixel $(m, n)$. This circular neighborhood is then transformed into its polar coordinates to obtain $X_R(r, \theta)$, with $r \in \{1, 2, \ldots, R\}$, $\theta \in [0, 2\pi]$, and

$$ r = \sqrt{(m - m_c)^2 + (n - n_c)^2}, \qquad \theta = \arctan\frac{n - n_c}{m - m_c}, $$

where $(m_c, n_c)$ are the center coordinates of the neighborhood $X_R$. In Figure 6.2(b), the resulting image has $r$ as the horizontal axis and $\theta$ as the vertical one. The concave region changes from dark to bright from the center towards the periphery of the concavity; in polar coordinates this change from dark to bright occurs as the radius grows, see Figure 6.2(b). Defining the bright region $B = \{(r, \theta) \mid r > R'\}$ and the dark region $D = \{(r, \theta) \mid r \le R'\}$, with $R' \in [\tfrac{1}{4}R, \tfrac{3}{4}R]$, we compute the ratio of the two means,

$$ f_{c,R'} = \frac{E[X_R(r, \theta) \mid (r, \theta) \in B]}{E[X_R(r, \theta) \mid (r, \theta) \in D]}. $$

As the concave region is always centrally located, we experimentally determine a square neighborhood $I$ (here $151 \times 151$) to compute the central concavity feature,

$$ f_c = \max_{(m,n) \in I,\, R'} f_{c,R'}. $$

(a) Grayscale. (b) Polar. (c) Labeled.

Figure 6.2: Computation of the central concavity feature.
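The ratio-of-means computation behind the central concavity feature can be sketched as follows. For brevity, the sketch works directly on Cartesian pixel coordinates rather than resampling to a polar grid, includes the center pixel in the dark region, and takes the list of candidate centers as an argument; these choices and the function names are assumptions made for illustration.

```python
import math


def ring_ratio(gray, mc, nc, radius, r_split):
    """Mean intensity for r_split < r <= radius over mean for r <= r_split."""
    bright, dark = [], []
    for m in range(max(0, mc - radius), min(len(gray), mc + radius + 1)):
        for n in range(max(0, nc - radius), min(len(gray[0]), nc + radius + 1)):
            r = math.hypot(m - mc, n - nc)
            if r > radius:
                continue  # outside the circular neighborhood
            (bright if r > r_split else dark).append(gray[m][n])
    if not bright or not dark or sum(dark) == 0:
        return 0.0
    return (sum(bright) / len(bright)) / (sum(dark) / len(dark))


def central_concavity(gray, radius, centers):
    """Maximum ring ratio over candidate centers and R' in [R/4, 3R/4].

    Assumes radius >= 2 so that at least one split radius exists.
    """
    splits = range(max(1, radius // 4), 3 * radius // 4 + 1)
    return max(ring_ratio(gray, mc, nc, radius, rs)
               for mc, nc in centers for rs in splits)
```

A dark disk surrounded by a brighter ring yields a ratio well above 1, while a uniformly lit region yields a ratio of 1.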


6.3.3 Light

Examination of the tympanic membrane is performed with an illuminated otoendoscope.

The distinct bulging in AOM results in nonuniform illumination of the tympanic

membrane, in contrast to the uniform illumination in NOE. Our aim is to construct

a feature that will measure this nonuniformity as the ratio of the brightly-lit to the

darkly-lit regions; we call it the light feature.

We start by performing contrast enhancement on the grayscale image in Figure 6.3(a) to make the nonuniform lighting prominent. The resulting image in Figure 6.3(b) is thresholded at $T_\ell$ (found experimentally) to obtain the binary mask $X_{bl}$ of the brightly lit regions in Figure 6.3(c).

(a) Grayscale. (b) Contrast-enhanced. (c) Dominant orientation.

Figure 6.3: Computation of the light feature.

To find the direction ($\theta_{\max}$) perpendicular to the maximum illumination gradient, we look at lines passing through $(m_c, n_c)$ (the pixel coordinates at which $f_c$ is obtained) at the angle $\theta$ with the horizontal axis. Defining the bright region $B = \{(m, n) \mid n \ge \tan(\theta)(m - m_c) + n_c\}$ and the dark region $D = \{(m, n) \mid n < \tan(\theta)(m - m_c) + n_c\}$, we compute the ratio of the two means,

$$ r(\theta) = \frac{E[X_{bl}(m, n) \mid (m, n) \in B]}{E[X_{bl}(m, n) \mid (m, n) \in D]}. $$

Then, the direction perpendicular to the maximum illumination gradient is given by

$$ \theta_{\max} = \arg\max_{\theta} r(\theta), $$

and we define the light feature as

$$ f_\ell = r(\theta_{\max}). $$
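The search over θ can be sketched as a loop over a discrete set of angles. The sketch departs from the formulas above in three labeled ways: it tests the half-plane with (n - n_c) cos θ - (m - m_c) sin θ ≥ 0, which matches the tan(θ) form for |θ| < 90° and stays defined at 90°; it sweeps θ over a full turn so that either side of the line can be the bright one; and it adds a small constant to the denominator to avoid division by zero. The number of angles and the function name are assumptions.

```python
import math


def light_feature(mask, mc, nc, n_angles=36, eps=1e-9):
    """Return (f_light, theta_max) for a binary brightly-lit mask."""
    best_ratio, best_theta = 0.0, 0.0
    for k in range(n_angles):
        theta = 2 * math.pi * k / n_angles
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        sums, counts = [0.0, 0.0], [0, 0]
        for m, row in enumerate(mask):
            for n, v in enumerate(row):
                # Side 0 is region B, side 1 is region D.
                side = 0 if (n - nc) * cos_t - (m - mc) * sin_t >= 0 else 1
                sums[side] += v
                counts[side] += 1
        if counts[0] == 0 or counts[1] == 0:
            continue  # the line does not split the image
        ratio = (sums[0] / counts[0]) / (sums[1] / counts[1] + eps)
        if ratio > best_ratio:
            best_ratio, best_theta = ratio, theta
    return best_ratio, best_theta
```

A mask that is lit on one side only gives a very large ratio at the angle separating the two halves, whereas a uniformly lit mask gives a ratio close to 1 for every angle.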

6.3.4 Malleus presence

In OME or in NOE, the tympanic membrane position is either neutral or retracted

and makes the short process of the malleus visible. We design a feature to detect the

partial or complete appearance of the malleus that would help in distinguishing AOM

from OME and NOE; we call it the malleus presence feature. To identify the presence

of the malleus, we perform an ellipse fitting (shown as a red outline in Figure 6.4(a))

to identify the major axis. The image is then rotated to align the major axis with the

horizontal axis. Mean-shift clustering [14] is then performed as shown in Figure 6.4(b),

followed by Canny edge detection [9]. Hough transform [16] is applied on the obtained

edges around the major axis (within a 50-pixel neighborhood, determined empirically) to detect a straight line (shown in red in Figure 6.4(c)) extending to the periphery that will indicate

the visibility of the malleus. If such a line is detected then the feature malleus presence

fm is assigned a value of 1 and 0 otherwise.

6.3.5 Translucency

(a) Ellipse fitting. (b) Mean-shift clustering. (c) Malleus detection.

Figure 6.4: Computation of the malleus presence feature.

Translucency of the tympanic membrane is the main characteristic of NOE in contrast with opacity in AOM and semi-opacity in OME; it results in the clear visibility of the tympanic membrane, which is primarily gray. We design a feature to measure the grayness of the tympanic membrane; we call it the translucency feature. We do that by using a simple color-assignment technique. As these images were acquired under different lighting and viewing conditions, according to [2], at least 3–6 images are needed to characterize a structure/region under all lighting and viewing conditions. We take the number of images to be Ntl = 20.

To determine gray-level clusters in translucent regions, we extract $N_t$ pixels from translucent regions ($N_t = 100$) of $N_{tl}$ RGB images by hand segmentation, to obtain a total of $N_{tl} N_t$ pixels (here 2000). We then cluster these $N_{tl} N_t$ pixels using k-means clustering to obtain $K$ cluster centers $c_k \in \mathbb{R}^3$, $k = 1, 2, \ldots, K$ ($K = 10$), capturing variations of gray in the translucent regions. To compute the translucency feature for a given image $X$, for each pixel $(m, n)$, we compute the $K$ Euclidean distances of $X(m, n)$ to the cluster centers $c_k$, $k = 1, 2, \ldots, K$,

$$ d_k(m, n) = \sqrt{\sum_{i=1}^{3} \left( X_i(m, n) - c_{k,i} \right)^2}, $$

with $i = 1, 2, 3$ denoting the color channel. If any of the computed $K$ distances falls below a threshold $T_t = 10$ (found experimentally), the pixel is labeled as translucent and belongs to the region $R_t = \{(m, n) \mid \min_k d_k(m, n) < T_t\}$. The binary image $X_t$ is then simply the characteristic function of the region $R_t$, $X_t = \chi_{R_t}$.

We then define the translucency feature as the mean of $X_t$,

$$ f_t = E[X_t]. $$
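The training and feature computation for this color-assignment technique can be sketched as below. The clustering is a plain Lloyd's k-means with a deterministic initialization (the first k training pixels), which is an assumption made to keep the example reproducible; the initialization and stopping rule actually used are not stated in the text, and the function names are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm on RGB tuples, seeded with the first k points."""
    centers = [tuple(float(v) for v in p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(ch) / len(g) for ch in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def translucency_feature(image, centers, threshold=10.0):
    """Fraction of pixels within `threshold` of any gray cluster center."""
    flags = [min(math.dist(px, c) for c in centers) < threshold
             for row in image for px in row]
    return sum(flags) / len(flags)
```

With the values given above, `kmeans` would be called on the 2000 hand-segmented pixels with `k=10`, and `translucency_feature` would then be evaluated on each segmented image with the threshold of 10.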

6.3.6 Amber level

We use the knowledge that OME is predominantly amber or pale yellow to distinguish

it from AOM and NOE. We design a feature to measure the presence of amber in the

tympanic membrane; we call it the amber feature. We apply a color-assignment tech-

nique similar to that used for computing Xt to obtain a binary image Xa, indicating

amber and nonamber regions. We define the amber feature as the mean of Xa,

fa = E[Xa ] .

6.3.7 Bubble presence

The presence of visible air-fluid levels, or bubbles, behind the tympanic membrane is

an indication of OME. We design a feature to detect the presence of bubbles in the

tympanic membrane; we call it the bubble presence feature. The algorithm takes in red

and green channels of the original RGB image and performs Canny edge detection [9],

to place parallel boundaries on either sides of the real edge, creating a binary image

Xbp in between. This is followed by filtering and morphological operations to enhance

edge detection and obtain smooth boundaries. We then define the bubble feature as

the mean of Xbp,

fbp = E[Xbp ] .


6.3.8 Grayscale variance

Another discriminating feature is the variance of the intensities across the grayscale

version of the image Xv. We define the feature grayscale variance as the variance of

the pixel intensities in the image Xv,

fv = var(Xv) ;

for example, OME has a more uniform appearance than AOM and NOE, and has

consequently a much lower variance that can be used to distinguish it from the rest.

Chapter 7

Otitis Media Grammar

7.1 Main Idea

The modeling of human perception of otitis media diagnosis is new—starting with the

vocabulary feature design and the set of rules considered as the basic grammar of the

otoscopist’s language. For designing the grammar, it is important to understand the

way these rules are applied. An important aspect of our work is to use feedback from

expert otoscopists to improve classification performance by mimicking their diagnostic

process.

7.2 Grammar

In this section, we present the design process of grammar. We begin by presenting

the initial grammar that consists of a set of rules used to combine six vocabulary

features. This is followed by an improved grammar that consists of a set of rules to

combine eight vocabulary features mimicking the decision process designed by expert

otoscopists exactly. Finally, we present the grammar implemented using fuzzy logic.


7.2.1 Hierarchical-Rule based Grammar

In [36], we designed an initial grammar shown in Figure 7.1, a simple hierarchi-

cal classifier that uses two levels. At the first level, binary decisions were used

to split the images into two superclasses; AOM/OME (acute infection/middle ear

fluid infection) and NOE/OME (no infection/middle ear fluid infection). At the

second level, these superclasses were split into individual diagnostic categories us-

ing a weighted combination (wa, wbp, wt, wv) of four features, amber level fa, bubble

presence fbp, translucency ft, and grayscale variance fv. A weighted combination,

wafa + wbpfbp + wtft + wvfv was used to split superclasses into AOM/OME/NOE.

Figure 7.1: Initial grammar for classifying otitis media.

We then modified the grammar in [37] to mimic the decision process used by

expert otoscopists in Figure 4.3 exactly. The decision process uses a hierarchical rule-

based classification scheme based on the domain knowledge of the expert otoscopists.

The classification is done in three stages by distinguishing one diagnostic category at

a time: AOM (Stage 1), NOE (Stage 2), and OME (Stage 3) respectively, which we

now describe in more detail.


Figure 7.2: Stage 1: Grammar for identifying AOM.

Stage 1: Identification of AOM

As the first stage, we detect the instances of AOM based on bulging, light, central

concavity, and malleus presence features as shown in Figure 7.2. Ideally, if there

is bulging present, the image should be classified as AOM as shown in Figure 4.3,

but our bulging feature alone cannot accomplish the task. We use the other features

in the otitis media vocabulary that describe the AOM characteristics such as light,

central concavity, and malleus presence in order to aid separation of AOM from NOE

and OME. In some cases, OME images can exhibit partial bulging and therefore have

a high possibility of being grouped as AOM. In such cases, we use low amber level to

distinguish AOM from OME.

Stage 2: Identification of NOE

Low values of the bulging, light, central concavity, and malleus presence features eliminate the possibility of AOM being the diagnosis. In such a situation, the diagnosis is either NOE or OME (see Figure 7.3). In Stage 2, our goal is to distinguish NOE from OME. The translucency feature, which is the most distinguishing characteristic of NOE, can be used here to identify normal cases. In this stage, NOE is identified from the superclass NOE/OME by a high value of the translucency feature, or low values of all the features characteristic of OME: amber level, bubble presence, and grayscale variance.

Figure 7.3: Stage 2: Grammar for identifying NOE. (Black arrows/boxes denote those paths belonging to this stage; gray ones belong to Stage 1.)

Stage 3: Identification of OME

Figure 7.4 shows the complete otitis media grammar. Most OME cases are identified from the superclass NOE/OME from Stage 2 by high values of the amber level, bubble presence, and grayscale variance features. Some cases of OME can exhibit

partial bulging resulting in high values of the bulging feature; in such cases, we can

correctly detect OME if the values of light and central concavity features are low,

and the value of amber level feature is high.

Figure 7.4: Stage 3: Grammar for identifying OME. (Black arrows/boxes denote those paths belonging to this stage; gray ones belong to Stages 1 and 2.)

The threshold values for the features were calculated during the training phase of the algorithm. We performed a five-fold nested cross-validation. During each fold,

the data was split into training and testing, and the training set was further split into

two sets: learning and validation. We used misclassification rate of the validation set

as the criterion to learn the threshold for each split. The threshold was fixed where we

obtained the least misclassification rate during training and was used on the testing

set.
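The threshold search for a single split can be sketched as follows. The sketch assumes that a sample takes the "high" branch when its feature value exceeds the threshold and that candidate thresholds are the observed feature values; the contiguous fold layout and both function names are likewise assumptions made for illustration.

```python
def k_fold_indices(n, k=5):
    """Yield (train_idx, test_idx) for k contiguous folds over n samples."""
    bounds = [round(i * n / k) for i in range(k + 1)]
    for i in range(k):
        test = list(range(bounds[i], bounds[i + 1]))
        train = [j for j in range(n) if j < bounds[i] or j >= bounds[i + 1]]
        yield train, test


def learn_threshold(values, labels, candidates=None):
    """Pick the threshold with the lowest misclassification rate.

    `values` are feature values of the validation samples and `labels`
    are booleans that are True for samples that should exceed the threshold.
    """
    if candidates is None:
        candidates = sorted(set(values))
    best_t, best_err = None, float("inf")
    for t in candidates:
        err = sum((v > t) != y for v, y in zip(values, labels)) / len(values)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err
```

In a nested scheme, `k_fold_indices` would produce the outer training/testing split, the training indices would be split once more into learning and validation parts, and `learn_threshold` would be evaluated on the validation part before the fixed threshold is applied to the testing set.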

The complete otitis media grammar that we designed in Figure 7.4 thus follows

the exact structure of the decision tree designed by expert otoscopists in Figure 4.3.


7.2.2 Fuzzy-Logic based Grammar

The hierarchical rule-based decision process presented above is constructed using

binary splits on a feature at each node evaluated based on a threshold learned during

the training stage. In our effort to closely mimic the otoscopists’ clinical decision

making process, we present a modified rule-based grammar by employing fuzzy-logic

based decisions. Fuzzy logic is often employed to capture the imprecise modes of

reasoning that play an essential role in the human ability to make decisions. Let

us consider an example in the context of diagnosis of otitis media. One of the most significant diagnostic decision-making rules used by an expert otoscopist has the form:

If bulging is high, then it is AOM.

The quantity high is a linguistic variable, and there is no corresponding precise real value that differentiates high from not high. A classical (crisp) set is by definition a collection of elements that have a definite membership, that is, either they belong to the set or

they do not. Referring back to our example of bulging, in the case of the grammar

presented in Section 7.2.1, bulging in an image was described as high or low as shown

in Figure 7.5. Note how the first two tympanic membrane images were assigned a

value of 0 for bulging despite the distinct difference in the amount of bulging between

them. Such representation does not work very well when trying to describe a real-

world problem like clinical diagnosis. Another drawback of this lack of distinction

is that though there is presence of some bulging in the second tympanic membrane

image, the binary membership function forces us to assign no bulging at all.

In such situations, the fuzzy set approach provides a much better representation of

the amount of bulging in the image. The set in Figure 7.6 is defined by a continuously inclining function. The membership function for a fuzzy set allows values in the range [0, 1]. The vertical axis shows the membership value of the bulging in the fuzzy set.


Figure 7.5: Example of a binary membership function.

Here, the first image has a membership value of 0 since there is no bulging present, whereas the second image gets a membership of 0.45, indicating bulging that is not very high, and the last image gets a membership of 1.0 for presenting a definitely high amount of bulging.

The grammar shown in Figure 7.4 built on the exact structure of the decision

tree designed by expert otoscopists is modified using a fuzzy inference system to

incorporate the notion of non-abrupt feature memberships. The fuzzy inference sys-

tem consists of mainly five functional layers: (1) input, (2) fuzzification, (3) decision

rules, (4) decision making, and (5) output defuzzification. Each of the layers must be

defined for the otitis media classifier which we describe below.

Layer 1: Input Layer. At the outset of the fuzzy inference system, we define the input image as a set of otitis media vocabulary features.

Figure 7.6: Example of a continuous membership.

Layer 2: Fuzzification Layer. The membership functions for each feature are defined, and the membership degree is calculated for all the features. In the otitis media classifier, each input vocabulary feature is described by two membership functions, defined as sigmoidal functions given by

$$f(x, a, b) = \frac{1}{1 + e^{-a(x - b)}}.$$

Depending on the sign of the parameter a, the sigmoidal membership function is active for lower or higher values of x. The parameter b controls the position of the activation. For each of the vocabulary features, the membership degree is computed with two membership functions, low and high, as shown in Figure 7.7. All the fuzzy values of the membership functions are initialized using the threshold values obtained from training while building the grammar shown in Figure 7.4. The output consists of three membership functions, one for each of the diagnostic categories, each defined by a constant function.
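The low and high membership functions can be illustrated with a minimal Python sketch. The slope `A` and position `B` below are hypothetical values chosen only for illustration; in the classifier they would be initialized from the grammar thresholds and then tuned during training.

```python
import math

def sigmoid_membership(x, a, b):
    """Sigmoidal membership f(x, a, b) = 1 / (1 + exp(-a * (x - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (x - b)))

# Hypothetical parameters: B plays the role of a grammar threshold and
# |A| sets how sharp the transition between low and high is.
A, B = 10.0, 0.5

def high(x):
    return sigmoid_membership(x, A, B)    # a > 0: active for larger x

def low(x):
    return sigmoid_membership(x, -A, B)   # a < 0: active for smaller x

for x in (0.1, 0.5, 0.9):
    print(f"x={x:.1f}  low={low(x):.3f}  high={high(x):.3f}")
```

With parameters of equal magnitude and opposite sign, the two memberships sum to one at every x, and both equal 0.5 at x = b.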

Figure 7.7: Examples of membership functions using sigmoidal functions: (a) low, (b) high.

Layer 3: Decision Rules Layer. Here the fuzzy rules are defined using the membership degree of each vocabulary feature. In this layer, each rule is linked to its outcome, represented by a diagnostic category. The fuzzy-logic-based grammar of the otitis media classifier, derived from Figure 7.4, consists of the following rules:

1. If bulging is high and light is high then AOM.

2. If bulging is high and light is low and central concavity is high then AOM.

3. If bulging is high and light is low and central concavity is low and amber is low

then AOM.

4. If bulging is low and light is high and central concavity is high then AOM.

5. If bulging is low and light is high and central concavity is low and malleus

presence is low then AOM.

6. If bulging is low and light is low and translucency is high then NOE.

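7. If bulging is low and light is low and translucency is low and amber is low and bubble is low and variance is low then NOE.

8. If bulging is low and light is high and central concavity is low and malleus presence is high and translucency is low and amber is low and bubble is low and variance is low then NOE.

9. If bulging is low and light is high and central concavity is low and malleus presence is high and translucency is high then NOE.

10. If bulging is high and light is low and central concavity is low and amber is high then OME.

11. If bulging is low and light is low and translucency is low and amber is high then OME.

12. If bulging is low and light is low and translucency is low and amber is low and bubble is high then OME.

13. If bulging is low and light is low and translucency is low and amber is low and bubble is low and variance is high then OME.

14. If bulging is low and light is high and central concavity is low and malleus presence is high and translucency is low and amber is high then OME.

15. If bulging is low and light is high and central concavity is low and malleus presence is high and translucency is low and amber is low and bubble is high then OME.

16. If bulging is low and light is high and central concavity is low and malleus presence is high and translucency is low and amber is low and bubble is low and variance is high then OME.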


Layer 4: Decision Making Layer. Each of the rules defined in Layer 3 is evaluated

to obtain individual outputs. These individual outputs are defined by the degree of

the output membership functions that are passed on to the next layer.

Layer 5: Output Defuzzification Layer. In this layer the final output is obtained as

the weighted average of the degree of all the output membership functions in Layer 4.
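The flow through Layers 2 to 5 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the ANFIS implementation used in this work: the fuzzy "and" is taken to be a product, the constant outputs in `OUTPUT` are hypothetical numeric codes, only three of the rules are included, and the final category is taken to be the one whose constant lies closest to the defuzzified value.

```python
import math

def sigmoid(x, a, b):
    return 1.0 / (1.0 + math.exp(-a * (x - b)))

def high(x, a=10.0, b=0.5):   # hypothetical slope and position
    return sigmoid(x, a, b)

def low(x, a=10.0, b=0.5):
    return sigmoid(x, -a, b)

# Hypothetical constant outputs, one per diagnostic category.
OUTPUT = {"AOM": 1.0, "OME": 2.0, "NOE": 3.0}

# Three of the rules listed in Layer 3: (antecedents, consequent).
RULES = [
    ([("bulging", high), ("light", high)], "AOM"),                        # rule 1
    ([("bulging", low), ("light", low), ("translucency", high)], "NOE"),  # rule 6
    ([("bulging", low), ("light", low), ("translucency", low),
      ("amber", high)], "OME"),                                           # rule 11
]

def infer(features):
    """Fuzzify the inputs, fire each rule, and take the weighted average."""
    strengths = []
    for antecedents, label in RULES:
        w = 1.0
        for name, membership in antecedents:
            w *= membership(features[name])   # product as fuzzy AND (assumed)
        strengths.append((w, label))
    total = sum(w for w, _ in strengths)
    crisp = sum(w * OUTPUT[label] for w, label in strengths) / total
    # Assumed decision step: category whose constant is closest to the output.
    decision = min(OUTPUT, key=lambda k: abs(OUTPUT[k] - crisp))
    return crisp, decision

# Hypothetical feature values normalized to [0, 1].
features = {"bulging": 0.9, "light": 0.8, "translucency": 0.2, "amber": 0.3}
print(infer(features))
```

For this input, rule 1 fires far more strongly than the other two, so the weighted average stays close to the AOM constant.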

We used the adaptive neuro-fuzzy inference system (ANFIS) available in the Fuzzy Logic toolbox in MATLAB Version 7.12.0.635 (R2011a). The hybrid optimization method is used to tune the membership function parameters during the training phase, so the parameters associated with the membership functions change during the learning process as labeled data are presented. The optimization process stops when either a preset value of the error measure or a preset number of iterations is reached, whichever comes first. The error measure is defined as the sum of the squared differences between actual and predicted outputs.
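The two stopping conditions can be sketched as a generic training loop in Python. The names `train`, `step`, and `predict`, as well as the default error goal and epoch limit, are hypothetical; the update rule in the toy usage is a stand-in and is not the hybrid optimization method used by ANFIS.

```python
def sum_squared_error(actual, predicted):
    """Error measure: sum of squared differences between actual and predicted."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def train(step, predict, targets, error_goal=1e-3, max_epochs=100):
    """Call `step` until the error goal or the epoch limit is reached."""
    error = sum_squared_error(targets, predict())
    epochs = 0
    while error > error_goal and epochs < max_epochs:
        step()                     # one parameter update (method not specified here)
        epochs += 1
        error = sum_squared_error(targets, predict())
    return epochs, error

# Toy usage: a single parameter that moves halfway to its target each epoch.
state = {"theta": 0.0}
targets = [1.0]

def predict():
    return [state["theta"]]

def step():
    state["theta"] += 0.5 * (targets[0] - state["theta"])

epochs, error = train(step, predict, targets)
print(epochs, error)
```

In the toy run the error goal is met before the epoch limit, so the loop stops on the first condition.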

Chapter 8

Experimental Results

We now present the results of applying our otitis media classifier to the tympanic membrane images and compare its performance to that of other automated classifiers. In the first section, we discuss the data set used in this work. In the second section, we present the process of obtaining ground truth for the tympanic membrane images from three expert otoscopists. We also present the diagnoses provided by three general pediatricians on images diagnosed by the experts. In the next section, we discuss the different classification algorithms used for comparison with our method. The last section focuses on the experimental results of the classifiers with the corresponding discussion.

8.1 Data Set

As part of a clinical trial evaluating the efficacy of antimicrobials in young children with acute otitis media, 826 tympanic membrane images at a resolution of 480 × 640 were collected using an otoendoscope from children with AOM, OME and NOE [27]. These images were collected by Dr. Hoberman [26] and Dr. Shaikh [62] at the Children's Hospital of Pittsburgh of the University of Pittsburgh Medical Center.

8.2 Ground Truth

Each tympanic membrane image is assigned one of three diagnostic categories: AOM, OME or NOE. For our experiments, the ground truth is obtained from a panel of three expert otoscopists. To understand the diagnostic accuracy in a more realistic clinical setting, where otitis media is evaluated on a regular basis by non-expert otoscopists such as general pediatricians or family physicians, we also present the diagnoses provided by three general pediatricians on the images evaluated by the three expert otoscopists.

8.2.1 Diagnosis by Expert Otoscopists

A panel of three expert otoscopists examined these images and assigned a diagnosis for each image. As these images pose challenges even for expert otoscopists, the agreement in labeling the images was rather poor. Having accurate ground-truth labels is crucial for algorithm development, and thus we asked the panel to re-diagnose the entire data set while also providing a diagnosis confidence level for each image; levels between 80 and 100 indicated high confidence in the diagnosis, while levels below 30 indicated almost no confidence. Based on confidence and agreement on diagnosis among the experts, we divided the 826 tympanic membrane images into three non-overlapping data sets.

Data Set 1

We select a subset from the original set of 826 images for which the three experts gave the same diagnosis and expressed confidence of over 60 in that diagnosis. The number of images in this ground-truth set is 181: 63 AOM, 70 OME, and 48 NOE. We call this set of images data set 1 (DS1).

Data Set 2

We select a subset from the original set of 826 images for which the three experts gave the same diagnosis irrespective of the confidence expressed in that diagnosis. The total number of images in this ground-truth set is 390: 267 AOM, 82 OME, and 41 NOE. We call this set of images data set 2 (DS2).

8.2.2 Data Set 3

We select a subset from the original set of 826 images for which at least two experts assigned the same diagnosis irrespective of the confidence expressed in that diagnosis. For this set of images, the labels were assigned by taking the majority vote of the diagnoses among the three experts. The total number of images in this ground-truth set is 248: 58 AOM, 112 OME, and 78 NOE. There is high inter-expert diagnostic variability in this set of images. To better understand how challenging this diagnosis task can be even for the experts, we present the diagnoses provided by each of the experts for the 248 images in data set 3 (DS3) in Table 8.1. The number and percentage of images in DS3 on which each pair of experts assigned the same diagnosis are shown in Table 8.2.

              AOM    OME    NOE
Expert 1       73     52    123
Expert 2       39    166     43
Expert 3       58    131     59

Table 8.1: High variability in the diagnoses among the three expert otoscopists on the tympanic membrane images in data set DS3. The rows correspond to the total number of images assigned by an expert to each diagnostic category.

Such variability presented by three experts in their diagnosis underscores the fact

that even for these highly-trained expert otoscopists, this is a challenging task.


Experts           (1, 2)    (2, 3)    (1, 3)
No. of images        73        81        94
Agreement (%)      29.4      32.7      37.9

Table 8.2: Agreement of diagnoses by two expert otoscopists on the diagnosis of tympanic membrane images in data set DS3.

From the original set of 826 images, we exclude 7 images from our evaluations as they were assigned a different diagnostic category by each expert.
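The partition of the images into DS1, DS2, DS3, and the excluded images can be sketched in Python as follows. The function name and argument layout are hypothetical. Two points are assumptions made so that the sets do not overlap: DS1 is taken to require every expert's confidence to exceed the threshold, and DS2 is taken to hold the remaining unanimously diagnosed images.

```python
from collections import Counter

def assign_data_set(diagnoses, confidences, threshold=60):
    """Return (data set, label) for one image.

    diagnoses:   labels from the three experts, e.g. ("AOM", "AOM", "OME")
    confidences: the experts' confidence levels (0-100), in the same order
    """
    label, votes = Counter(diagnoses).most_common(1)[0]
    if votes == 3:
        # Assumption: DS1 needs all three confidences above the threshold;
        # DS2 holds the other unanimously diagnosed images.
        if all(c > threshold for c in confidences):
            return "DS1", label
        return "DS2", label
    if votes == 2:
        return "DS3", label        # the majority vote supplies the label
    return "excluded", None        # three different diagnoses

print(assign_data_set(("AOM", "AOM", "AOM"), (90, 80, 70)))
print(assign_data_set(("NOE", "OME", "NOE"), (40, 70, 55)))
```

The reported set sizes are consistent with such a partition: 181 + 390 + 248 + 7 = 826.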

8.2.3 Diagnosis by General Pediatricians

To validate the algorithm against a realistic diagnostic situation, we asked three

general pediatricians to examine our ground-truth set of 181 tympanic membrane

images provided by expert otoscopists. The experiment also required them to state

their level of confidence in diagnosing each of the tympanic membrane images. In

cases of diagnosis with high confidence, the examiner assigned only one diagnostic

category to the image, whereas in cases where the confidence of diagnosis was either

medium or low, the examiner was asked to also provide a second possible choice

of diagnosis, resulting in two diagnoses of an image representing first and second

diagnostic choices, respectively.

To evaluate how the group of three general pediatricians performed on the ground-

truth data set DS1, Table 8.3 shows three confusion matrices: the first is the average

diagnosis by the three pediatricians, while the other two are average diagnoses with

high and medium/low confidence, respectively. The diagnostic accuracy, obtained as the average of the accuracies of the three examining pediatricians, was 79.6% (91.7%, 75.7%, and 71.3%, respectively), well below the 100% of the expert otoscopists, whose diagnoses we use as ground truth.

                  Total            High confidence     Medium/low confidence
            AOM   OME   NOE       AOM   OME   NOE       AOM   OME   NOE
AOM          62     1     0        60     0     0         2     1     0
OME          11    56     3         6    37     1         5    19     2
NOE           4    18    26         1     8    15         3    10    11

Accuracy: 79.6%

Table 8.3: Diagnoses by three general pediatricians (columns) versus the ground truth of expert otoscopists (rows).

In terms of misdiagnoses, NOE and OME are the categories with the highest level of misdiagnosis. The misdiagnosis of OME as AOM (15.7%) is clearly a cause of concern since it leads to the unnecessary prescription of antibiotics. Similarly, NOE is often misdiagnosed as OME (37.5%). It is surprising to note that only 50% of the NOE cases were diagnosed with high confidence, and 9 of these 24 were misdiagnosed. In the remaining 50% of cases, 13 out of 24 (54.2%) were misdiagnosed, 10 of them as OME; such instances of misdiagnosis may lead to unnecessary treatment procedures.
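The percentages quoted above follow directly from the "Total" block of Table 8.3, as the following short Python check shows. The helper names `accuracy` and `rate` are introduced here for illustration only.

```python
CLASSES = ("AOM", "OME", "NOE")

# Rows: expert ground truth; columns: pediatricians' diagnoses (Table 8.3, Total).
TOTAL = [
    [62, 1, 0],
    [11, 56, 3],
    [4, 18, 26],
]

def accuracy(matrix):
    """Fraction of images on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))

def rate(matrix, true_label, predicted_label):
    """Fraction of images of one true class assigned to a given class."""
    i, j = CLASSES.index(true_label), CLASSES.index(predicted_label)
    return matrix[i][j] / sum(matrix[i])

print(f"accuracy   = {100 * accuracy(TOTAL):.1f}%")               # 79.6%
print(f"OME as AOM = {100 * rate(TOTAL, 'OME', 'AOM'):.1f}%")     # 15.7%
print(f"NOE as OME = {100 * rate(TOTAL, 'NOE', 'OME'):.1f}%")     # 37.5%
```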

8.3 Automated Classifiers for Comparison

To validate our algorithm, we also compare it to five automated classifiers: three that we designed previously (the correlation filter classification system¹, the multiresolution classifier, and SIFT and shape descriptors with an SVM classifier²) and two that are available in the literature (the WND-CHARM classifier and the random forest classifier). We now briefly describe each of these. Note that for all the experiments, we used a 5-fold cross-validation setup.

¹ The correlation filter classification system was implemented by Dr. Pablo Hennings Yeomans during the early phase of this project.

² SIFT and shape descriptors using SVM classifier was implemented by Dr. Pedro Quelhas during the early phase of this project.


8.3.1 Correlation Filter Classification System

In this classifier, the image is first transformed into the polar domain. Overlapping

concentric annular regions of different radii are extracted from the image. The center

of the annular regions is assigned as the centroid of the segmented tympanic membrane

image. During the training phase, templates of annular regions for each class are

obtained. These templates are then used to assign a class label to the test images

based on their similarity using normalized cross correlation measure.
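The template-matching step can be illustrated with a minimal Python sketch of zero-mean normalized cross correlation. This is not the implementation used in the correlation filter system: the regions are represented as flat lists of intensities, and the templates below are hypothetical one-dimensional profiles standing in for annular regions.

```python
import math

def normalized_cross_correlation(region, template):
    """Zero-mean NCC between two equally sized regions given as flat lists."""
    n = len(region)
    mean_r = sum(region) / n
    mean_t = sum(template) / n
    num = sum((r - mean_r) * (t - mean_t) for r, t in zip(region, template))
    den = math.sqrt(sum((r - mean_r) ** 2 for r in region)
                    * sum((t - mean_t) ** 2 for t in template))
    return num / den if den else 0.0

def classify(region, templates):
    """Pick the class whose template correlates best with the region."""
    return max(templates,
               key=lambda c: normalized_cross_correlation(region, templates[c]))

# Hypothetical intensity profiles, one template per class.
templates = {"AOM": [1, 2, 3, 4], "OME": [4, 3, 2, 1], "NOE": [1, 3, 1, 3]}
print(classify([10, 20, 30, 41], templates))
```

Because the measure is normalized, a region that differs from a template only in brightness and contrast still correlates strongly with it.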

8.3.2 Multiresolution Classifier

The multiresolution classifier, which was designed for biomedical applications [13],

decomposes the image into subbands using a multiresolution decomposition (for ex-

ample, wavelets or wavelet packets), followed by feature extraction and classification

in each subband using neural networks (any classifier can be used in each individual

subband) and a global decision based on weighted individual subband decisions. We

ran the multiresolution classifier with 2 levels and 26 Haralick texture features on the

grayscale image and each of the 20 subbands (546 in total).
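The feature count of 546 can be checked with a few lines of Python, together with a sketch of the weighted global decision. The branching factor of 4 subbands per decomposition step is an assumption (it is consistent with the 20 subbands reported for 2 levels), and the votes and weights in the usage line are hypothetical.

```python
LEVELS = 2
BRANCHING = 4            # assumed: each 2-D decomposition step yields 4 subbands
HARALICK_FEATURES = 26

subbands = sum(BRANCHING ** level for level in range(1, LEVELS + 1))  # 4 + 16
feature_sets = 1 + subbands          # the grayscale image plus every subband
total_features = HARALICK_FEATURES * feature_sets
print(subbands, total_features)      # 20 546

def global_decision(subband_votes, weights):
    """Combine per-subband class decisions using one weight per subband."""
    scores = {}
    for label, w in zip(subband_votes, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

print(global_decision(["AOM", "OME", "OME"], [0.6, 0.2, 0.3]))
```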

8.3.3 SIFT and Shape Descriptors with SVM Classifier

In this classifier, we combined SIFT descriptors and shape features. SIFT descriptors [40, 41] are first extracted from the images using the VLFeat library [72]. The shape features were used in an attempt to detect bulging in the tympanic membrane. The main idea was to extract areas with bright and dark symmetry. On the segmented image, we applied the phase symmetry detection algorithm described in [34]. Bright and dark regions were segmented using the Otsu thresholding algorithm [53], resulting in two masks: one for the bright bulging regions and the other for the rest. Based on these masks, the following features were computed: total area of bright regions, total area of dark regions, average symmetry measure in bright areas, number of dark regions, number of bright regions, and mean area of bright regions. The SIFT descriptors and shape features are normalized and combined using a bag-of-words model. The classification was performed using a support vector machine [12].

8.3.4 WND-CHARM Classifier

This is a universal classifier that extracts a large number (4,008) of generic image-level features [65]. The computed features include polynomial decompositions, high-contrast features, pixel statistics, and textures. These features are derived from the raw image, transforms of the image, and compound transforms of the image (transforms of transforms). The algorithm performs feature selection during the training stage by assigning a weight to each feature depending on its ability to distinguish between the classes. These weighted features are then used to classify test images based on their similarity to the training classes using a nearest-neighbor algorithm.

8.3.5 Random Forest Classifier

This is an ensemble classifier [8] that consists of many decision trees, and outputs the

class that is the result of a majority vote of the classes output by individual trees.

The random forest was constructed on the 8 otitis media vocabulary features. At

every node in the tree, a subset of 5 features out of 8 was randomly selected. The

split at each node was performed on the feature from this subset that gave the best

performance. The number of trees in the forest is fixed as 500 since during multiple

runs of random forest we observed that the out-of-bag error converged in the range

of 475–500 trees. We used the implementation of random forest in [29].
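The two ingredients described above, random feature subsets at each node and a majority vote over trees, can be sketched in Python as follows. This is not the implementation from [29]; the function names are hypothetical, and the vote counts in the usage example are made up for illustration.

```python
import random
from collections import Counter

N_FEATURES = 8      # otitis media vocabulary features
SUBSET_SIZE = 5     # features considered at each node
N_TREES = 500

def candidate_features(rng):
    """Indices of the features examined at one node: 5 drawn at random from 8."""
    return sorted(rng.sample(range(N_FEATURES), SUBSET_SIZE))

def forest_predict(tree_outputs):
    """Majority vote over the labels output by the individual trees."""
    return Counter(tree_outputs).most_common(1)[0][0]

rng = random.Random(0)
print(candidate_features(rng))

# Hypothetical votes from the 500 trees for a single image.
votes = ["AOM"] * 260 + ["OME"] * 150 + ["NOE"] * 90
print(forest_predict(votes))
```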


8.4 Classification of Tympanic Membrane Images

In this section, we discuss the performance of classifiers on the tympanic membrane

images. The experimental results are presented for each of the data sets (DS1, DS2,

and DS3) and the corresponding reduced set of images after applying the rejection

procedure presented in Section 5.3.

8.4.1 Results: DS1

DS1 consists of the 181 images on which all three experts stated the same diagnosis with high confidence. Given the nature of this data set, we have an opportunity to understand the discriminative power of our algorithm, designed using vocabulary features and a grammar governing the decision rules. Our goal is to achieve classification accuracy comparable to the diagnostic capability of the experts when classifying tympanic membrane images. To that end, we present the performance of the automated classifiers discussed in Section 8.3 in comparison with three versions of the otitis media classifier: the classifier built during the early phase of the project [36] using six vocabulary features, its improvement in [37] using eight vocabulary features, and finally the otitis media fuzzy logic classifier.

Table 8.4 compares the performance of the diagnosis on the data set of 181 images by three general pediatricians (GP), as well as eight classifiers: correlation filter classification system (CFC), WND-CHARM (WCM), multiresolution classifier (MRC), SIFT and shape descriptors with SVM classifier (SSC), random forest classifier (RF), our initial classifier from [36], otitis media classifier [37] (OMC), and otitis media fuzzy logic classifier (OMFLC). Table 8.5 compares the results of the above-mentioned classifiers on the data set of 170 images after automatic rejection of unreliable images. For ease of reference, we suffix 'R' to the names of all the classifiers applied to the data set after rejection. For example, the correlation filter classification system with rejection and the otitis media fuzzy logic classifier with rejection are referred to as CFCR and OMFLCR, respectively. The OMFLCR outperforms the five comparison classifiers by a fair margin (16.4 percentage points over RFR, the best of them). The random forest classifier shows the highest performance among the five compared algorithms but fails to outperform the otitis media classifiers. There are a couple of reasons for this poorer performance. First, since each image is assigned an output label based on the majority vote of outputs from all the decision trees in the forest, the final output label can be a result contributed by poorly formed decision trees. Second, a random forest classifier is known to exhibit better performance when the features used are uncorrelated, which is not the case in this work, since more than one vocabulary feature is directly targeted to characterize a specific diagnostic category.

            CFC    WCM    MRC    SSC    GP     RF     [36]   OMC    OMFLC
AOM         66.7   68.2   53.5   66.7   98.4   84.1   81.3   88.8   92.1
OME         57.1   60.8   66.3   81.0   80.0   81.4   85.7   82.6   90.0
NOE         62.5   63.4   75.1   60.0   54.2   66.6   81.4   85.4   93.8

Accuracy    61.8   64.1   64.1   70.2   79.6   80.1   84.0   85.6   91.7

Table 8.4: Classification accuracies (in %) on the ground-truth set of 181 tympanic membrane images. Each row corresponds to the class-wise classification accuracies and columns correspond to the diagnosis by three general pediatricians (GP) as well as the following algorithms: correlation filter classification system (CFC), WND-CHARM (WCM), multiresolution classifier (MRC), SIFT and shape descriptors with SVM classifier (SSC), random forest classifier (RF), our initial classifier [36], otitis media classifier (OMC) [37], and otitis media fuzzy logic classifier (OMFLC).

            CFCR   WCMR   MRCR   SSCR   RFR    OMCR   OMFLCR
AOM         65.6   65.6   61.0   76.2   80.3   90.0   93.4
OME         56.7   58.2   65.3   72.8   79.1   89.1   91.0
NOE         71.4   69.0   83.9   58.3   69.4   93.2   91.4

Accuracy    63.6   63.6   68.2   70.0   77.1   89.9   93.5

Table 8.5: Classification accuracies (in %) on the ground-truth set of 170 tympanic membrane images out of 181 images after rejection. Each row corresponds to the class-wise classification accuracies and columns correspond to classification by the following algorithms: correlation filter classification system (CFCR), WND-CHARM (WCMR), multiresolution classifier (MRCR), SIFT and shape descriptors with SVM classifier (SSCR), random forest classifier (RFR), otitis media classifier (OMCR), and otitis media fuzzy logic classifier (OMFLCR).

While the overall performance increase between the otitis media classifier presented in [36] and the otitis media classifier using the new vocabulary and grammar might not seem substantial, the increase in classification accuracy of AOM cases is significant. This increase can be attributed to the new grammar presented in Figure 7.4, which includes two new vocabulary features: bulging and malleus presence. In 7.1, identifying AOM was based solely on the central concavity and light features, which only indicate the presence of a bulge, unlike the bulging feature, which measures the total area of bulging in the tympanic membrane. The performance of the otitis media classifier with rejection is a trade-off between misclassification and not classifying all the input data. A total of 11 images (2 AOM, 3 OME, and 6 NOE) were rejected due to specular highlights and illumination problems. In this set, no images were rejected due to the presence of excessive cerumen. We believe that this rejection step during preprocessing will ensure the collection of good-quality images that are suitable for processing and high-quality diagnosis.

                 Pediatricians             OMFLC
               AOM   OME   NOE       AOM   OME   NOE
AOM             62     1     0        58     3     2
OME             11    56     3         6    63     1
NOE              4    18    26         1     2    45

Accuracy (%)        79.6                  91.7

Table 8.6: Diagnoses by three general pediatricians (columns 2, 3, and 4) and OMFLC (columns 5, 6, and 7) versus the ground truth of expert otoscopists (rows) on images in data set DS1.


Overall, the otitis media classifier performs better than the average of the three general pediatricians by a good margin (91.7% versus 79.6%). Note that, for the comparison to be fair, Table 8.6 compares the pediatricians to the otitis media classifier without rejection, because pediatricians do not have an objective way of rejecting images of poor quality. At the same time, the rejection capability is a clear advantage of an automated algorithm, and leads to improved performance (from 91.7% without rejection to 93.5% with rejection). Pediatricians performed well on diagnosing AOM but with a high possibility of overdiagnosing AOM.

When comparing misdiagnoses of OME and NOE as AOM between pediatricians

and the algorithm, 15.7% (11 out of 70) cases of OME and 8.3% (4 out of 48) cases

of NOE were misdiagnosed as AOM by pediatricians compared to 8.6% (6 out of 70)

cases of OME and 2.1% (1 out of 48) of NOE by the classifier, with a p-value of

0.0421 for the two-tailed Fisher exact test. When comparing misdiagnoses of NOE

between pediatricians and the algorithm, 45.8% (22 out of 48) cases of NOE were

misdiagnosed by pediatricians compared to only 6.3% (3 out of 48) by the classifier,

with a p-value of 0.0001 for the two-tailed Fisher exact test. From these observations,

we conclude that, on average, our algorithm outperforms general pediatricians.
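A two-tailed Fisher exact test on a 2×2 table can be sketched in pure Python as follows. This is an illustration of the test itself, not the software used to obtain the p-values reported above, and it makes no attempt to reproduce the exact reported values; the two-sided rule used here (summing all tables that are no more probable than the observed one) is one common convention and is an assumption.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# NOE misdiagnoses: pediatricians 22 of 48, classifier 3 of 48.
p = fisher_exact_two_sided(22, 48 - 22, 3, 48 - 3)
print(f"p = {p:.2e}")
```

The rows of the table are the two raters (pediatricians and classifier) and the columns are the counts of misdiagnosed and correctly diagnosed NOE images.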

8.4.2 Results: DS2

Table 8.7 shows the classification accuracies of the classifiers on the data set of 390 images (267 AOM, 82 OME, and 41 NOE). OMFLC achieves the highest overall accuracy and the highest AOM accuracy, although RF is more accurate on OME and OMC on NOE. The same trend in overall accuracy is followed in Table 8.8, which shows the classification accuracies on the set of 233 images (144 AOM, 52 OME, and 37 NOE) retained after the rejection procedure.


            WCM    CFC    MRC    RF     SSC    OMC    OMFLC
AOM         55.1   57.3   73.4   54.3   70.4   71.5   74.2
OME         48.8   48.8   28.1   75.6   40.2   61.0   61.0
NOE         41.5   29.3    4.9   39.0   39.0   58.5   53.6

Accuracy    52.3   52.6   56.7   57.2   60.8   67.9   69.3

Table 8.7: Classification accuracies (in %) on the ground-truth set of 390 tympanic membrane images (267 AOM, 82 OME, and 41 NOE). Each row corresponds to the class-wise classification accuracies and columns correspond to the classification by the following algorithms: WND-CHARM (WCM), correlation filter classification system (CFC), multiresolution classifier (MRC), random forest classifier (RF), SIFT and shape descriptors with SVM classifier (SSC), otitis media classifier (OMC) [37], and otitis media fuzzy logic classifier (OMFLC).

            RFR    CFCR   WCMR   MRCR   SSCR   OMCR   OMFLCR
AOM         54.1   60.8   59.0   76.4   68.9   72.3   71.6
OME         81.3   58.3   42.1   37.5   50.0   64.6   68.8
NOE         21.6   32.4   64.9    2.7   54.1   59.5   62.2

Accuracy    54.5   55.8   56.2   56.7   62.6   68.7   69.5

Table 8.8: Classification accuracies (in %) on the ground-truth set of 233 out of 390 tympanic membrane images (144 AOM, 52 OME, and 37 NOE) after rejection. Each row corresponds to the class-wise classification accuracies and columns correspond to the classification by the following algorithms: random forest classifier (RFR), correlation filter classification system (CFCR), WND-CHARM (WCMR), multiresolution classifier (MRCR), SIFT and shape descriptors with SVM classifier (SSCR), otitis media classifier (OMCR) [37], and otitis media fuzzy logic classifier (OMFLCR).

8.4.3 Results: DS3

Table 8.9 shows the classification accuracies of the classifiers on the data set of 248 images (58 AOM, 112 OME, and 78 NOE). OMFLC achieves the highest overall accuracy, although individual classes are best classified by other methods (MRC on AOM, RF on OME, and WCM on NOE). The same trend in overall accuracy is followed in Table 8.10, which shows the classification on the 162 images (44 AOM, 46 OME, and 72 NOE) retained after the rejection procedure.

            MRC    CFC    WCM    SSC    RF     OMC    OMFLC
AOM         65.5   62.1   50.0   56.9   43.1   60.3   63.8
OME         36.6   34.8   34.8   35.7   70.5   52.7   54.5
NOE          3.9   37.2   51.3   50.0   15.4   46.2   48.7

Accuracy    33.1   41.9   43.6   45.2   46.8   52.4   54.5

Table 8.9: Classification accuracies (in %) on the ground-truth set of 248 tympanic membrane images (58 AOM, 112 OME, and 78 NOE). Each row corresponds to the class-wise classification accuracies and columns correspond to the classification by the following algorithms: multiresolution classifier (MRC), correlation filter classification system (CFC), WND-CHARM (WCM), SIFT and shape descriptors with SVM classifier (SSC), random forest classifier (RF), otitis media classifier (OMC) [37], and otitis media fuzzy logic classifier (OMFLC).

            MRCR   CFCR   RFR    SSCR   WCMR   OMCR   OMFLCR
AOM         68.2   63.6   45.5   65.9   59.1   56.8   63.6
OME         47.8   37.0   67.4   32.6   47.8   39.1   39.1
NOE          5.6   37.5   33.3   44.5   50.0   61.1   62.5

Accuracy    34.6   43.2   46.3   46.9   51.9   53.7   56.2

Table 8.10: Classification accuracies (in %) on the ground-truth set of 162 out of 248 tympanic membrane images (44 AOM, 46 OME, and 72 NOE). Each row corresponds to the class-wise classification accuracies and columns correspond to the classification by the following algorithms: multiresolution classifier (MRCR), correlation filter classification system (CFCR), random forest classifier (RFR), SIFT and shape descriptors with SVM classifier (SSCR), WND-CHARM (WCMR), otitis media classifier (OMCR) [37], and otitis media fuzzy logic classifier (OMFLCR).

For reliable computation, we objectively reject images based on the presence of specular highlights, poor illumination, and cerumen obstructing adequate visualization of the tympanic membrane. It must be noted that the fraction of images rejected differs between data sets: only 11 out of 181 images are rejected in data set DS1, whereas 157 out of 390 are rejected in DS2 and 86 out of 248 in DS3. One of the reasons stated by the experts for lower diagnostic confidence is the poor quality of images. Our rejection procedure is in line with the experts' opinion that image quality is critical for a clear diagnosis. The trend we observe is that larger fractions of images are rejected in data sets DS2 and DS3, where the experts state lower diagnostic confidence and show poorer agreement.
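The rejection fractions behind this trend can be computed directly from the counts given above; the short Python snippet below does so and also recovers the sizes of the retained sets. The dictionary and helper names are introduced here for illustration.

```python
# (rejected, total) images per data set, as stated in the text.
REJECTED = {"DS1": (11, 181), "DS2": (157, 390), "DS3": (86, 248)}

def retained(name):
    """Number of images kept after the rejection procedure."""
    rejected, total = REJECTED[name]
    return total - rejected

def rejected_percent(name):
    rejected, total = REJECTED[name]
    return 100 * rejected / total

for name in REJECTED:
    print(f"{name}: {rejected_percent(name):.1f}% rejected, "
          f"{retained(name)} retained")
```

The retained counts (170, 233, and 162) match the sizes of the reduced sets used in Tables 8.5, 8.8, and 8.10.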

The consistently better performance of our otitis media classifier, designed based on vocabulary and grammar, validates our methodology: a small number of targeted, physiologically meaningful features (the vocabulary), together with a well-designed grammar that mimics the decision process of expert otoscopists, is what is needed to achieve accurate classification in this problem.

Chapter 9

Conclusions

The main goal of this thesis was to create an accurate automated classification system for classifying the three diagnostic categories of otitis media based on tympanic membrane images. Our working hypothesis was that mimicking the diagnostic process of the expert otoscopists would lead to an accurate classification system for distinguishing the diagnostic categories AOM, OME, and NOE. In our efforts to closely mimic the expert otoscopists' diagnostic abilities, we followed two guiding principles: vocabulary and grammar.

In this thesis, we present:

• Otitis Media Vocabulary: A set of features designed to characterize the

actual visual cues used by expert otoscopists while distinguishing the diagnostic

categories of otitis media.

• Otitis Media Grammar: A set of rules that govern the association and hier-

archy to combine the vocabulary terms in order to mimic the clinical decision

process of the expert otoscopists while distinguishing the diagnostic categories

of otitis media.


The otitis media classifier designed using vocabulary and grammar exhibits high levels of accuracy in identifying the diagnostic categories of otitis media and is comparable to the diagnoses by expert otoscopists. In comparison with other automated classifiers and with diagnoses by general pediatricians, the otitis media classifier achieves higher classification accuracy by a fair margin. These results demonstrate that our simple and concise 8-feature otitis media vocabulary is effective on the problem, underscoring the importance of using targeted, physiologically meaningful features instead of a large number of general-purpose features. The classification process, the grammar, has a set of clear, intuitive rules closely mimicking the diagnostic process used by otoscopists. Increasing the accuracy from the current stage becomes harder, as we have reached a high accuracy range; we now discuss potential strategies for achieving that as directions for further work.

Images captured using a digital otoscope exhibit a large variability arising from the non-standard acquisition procedure. Depending on the angle and amount of light incident on the membrane and the ear canal, we encounter different illumination problems related to brightness and contrast. In our current implementation, we only correct local illumination problems and have not addressed global illumination problems. Images found to be unreliable due to poor illumination are rejected from further computation. Artifacts such as shading, shadows, and global variations in intensity or color due to overexposure or underexposure will affect feature computation. We have not explored illumination normalization or other strategies for minimizing such artifacts, and plan to do so in future work.

In summary, the otitis media classifier introduced in this thesis validates our working hypothesis by demonstrating high classification accuracies on images of the tympanic membrane, with performance comparable to the diagnoses of expert otoscopists examining the same images. The current standard for diagnosing otitis media is visual examination, and, as argued earlier, this subjective evaluation has clear limitations. Our contribution is significant and innovative: since no other system exists for objective evaluation of otitis media, our otitis media classifier will be the first automated system for classifying the diagnostic categories of otitis media. We believe that, with further improvements, the otitis media classifier can be employed as a clinical diagnostic aid for non-expert examiners to drastically decrease both underdiagnosis and overdiagnosis of AOM, assuring adequate antimicrobial use when AOM is present and reducing inappropriate use when AOM is absent, thus avoiding adverse side effects and the risk of contributing to bacterial resistance.

Bibliography

[1] E. Asher, E. Leibovitz, J. Press, D. Greenberg, N. Bilenko, and H. Reuveni.

Accuracy of acute otitis media diagnosis in community and hospital settings.

Am. Acad. Pediatr., 94(4):423–428, April 2005.

[2] P. N. Belhumeur and D. Kriegman. What is the set of images of an object under

all possible lighting conditions? In Proc. IEEE Int. Conf. Comput. Vis. Pattern

Recogn., pages 270–277, June 1996.

[3] R. Bhagavatula, M. C. Fickus, J. W. Kelly, C. Guo, J. A. Ozolek, C. A. Castro, and J. Kovacevic. Automatic identification and delineation of germ layer components in H&E stained images of teratomas derived from human and nonhuman primate embryonic stem cells. In Proc. IEEE Int. Symp. Biomed. Imag., pages 1041–1044, Rotterdam, The Netherlands, April 2010.

[4] bimagicLab. http://www.jelena.ece.cmu.edu/bimagic.html.

[5] C. M. Bishop. Pattern Recognition and Machine Learning. Information Science

and Statistics. Springer, 2006.

[6] R. Bornard, E. Lecan, L. Laborelli, and J. Chenot. Missing data correction in

still images and image sequences. In Proc. ACM Int. Conf. Multimedia, pages

355–361, Juan-les-Pins, France, 2002.

[7] L. Breiman. Bagging predictors. Mach. Learn., 24(2):123–140, 1996.

[8] L. Breiman. Random forests. Mach. Learn., 45(1):5–32, October 2001.

[9] J. Canny. A computational approach to edge detection. IEEE Trans. Pattern

Anal. Mach. Intell., 8(6):679–698, 1986.

[10] H. P. Chan, J. Wei, Y. Zhang, M. A. Helvie, R. H. Moore, B. Sahiner, L. Hadjiiski,

and D. B. Kopans. Computer-aided detection of masses in digital tomosynthesis

mammography: Comparison of three approaches. Med. Phys.,

35(9):4087–4095, 2008.

[11] T. F. Chan and L. A. Vese. Active contours without edges. IEEE Trans. Image

Process., 10(2):266–277, February 2001.

[12] C. C. Chang and C. J. Lin. LIBSVM: A library for support vector machines.

ACM Trans. Intell. Syst. Tech., 2:1–27, 2011.

[13] A. Chebira, Y. Barbotin, C. Jackson, T. E. Merryman, G. Srinivasa, R. F. Mur-

phy, and J. Kovacevic. A multiresolution approach to automated classification

of protein subcellular location images. BMC Bioinform., 8(210), 2007.

[14] Y. Cheng. Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal.

Mach. Intell., 17:790–799, 1995.

[15] L. D. Cohen. On active contour models and balloons. CVGIP: Image Und.,

53(2):211–218, March 1991.

[16] R. O. Duda and P. E. Hart. Use of the Hough transform to detect lines and

curves in pictures. Commun. ACM, 15(1):11–15, January 1972.

[17] A. El-Baz, G. M. Beache, G. Gimelfarb, K. Suzuki, K. Okada, A. Elnakib,

A. Soliman, and B. Abdollahi. Computer-aided diagnosis systems for lung cancer:

Challenges and methodologies. Int. J. Biomed. Imag., 2013.

[18] K. Ganapathy, J. Hu, J. Kovacevic, A. Mojsilovic, and R. J. Safranek. Retrieval

and matching of color patterns based on a predetermined vocabulary and gram-

mar. US Patent, Jun. 25, 2002. #6,411,953.

[19] K. Ganapathy, J. Hu, J. Kovacevic, A. Mojsilovic, and R. J. Safranek. Retrieval

and matching of color patterns based on a predetermined vocabulary and gram-

mar: II. US Patent, Nov. 26, 2002. #6,487,554.

[20] B. van Ginneken, B. M. ter Haar Romeny, and M. A. Viergever. Computer-aided

diagnosis in chest radiography: A survey. IEEE Trans. Med. Imag., 20:1228–1241,

December 2001.

[21] R. C. Gonzalez and R. E. Woods. Digital Image Processing. Prentice Hall,

Englewood Cliffs, NJ, 2002.

[22] R. M. Haralick. Statistical and structural approaches to texture. Proc. IEEE,

67:786–804, 1979.

[23] R. M. Haralick, K. Shanmugam, and I. Dinstein. Textural features for

image classification. IEEE Trans. Syst. Man Cybern., 3(6):610–621, 1973.

[24] G. F. Hayden. Acute suppurative otitis media in children: Diversity of clinical

diagnostic criteria. Clin. Pediatrics, 22:99–104, 1981.

[25] T. Ho. The random subspace method for constructing decision forests. IEEE

Trans. Pattern Anal. Mach. Intell., 20(8):832–844, 1998.

[26] A. Hoberman. http://www.chp.edu/CHP/Hoberman,+Alejandro,+MD.

[27] A. Hoberman, J. L. Paradise, H. E. Rockette, N. Shaikh, E. R. Wald, D. H.

Kearney, D. K. Colborn, M. K. Lasky, S. Bhatnagar, M. A. Haralam, L. M.

Zoffel, C. Jenkins, M. A. Pope, T. L. Balentine, and K. A. Barbadora. Treatment

of acute otitis media in children under 2 years of age. The New England J. Med.,

364:105–115, 2011.

[28] E. Ilkko, K. Suomi, and A. Karttunen. Computer-assisted diagnosis by temporal

subtraction in postoperative brain tumor patients: A feasibility study. Acad.

Radiol., 11(8):887–893, 2004.

[29] A. Jaiantilal. Randomforest-matlab. https://code.google.com/p/randomforest-matlab/.

[30] A. K. Jain and A. Vailaya. Image retrieval using color and shape. Pattern

Recogn., 29(8):1233–1244, 1996.

[31] B. Julesz. Textons, the elements of texture perception, and their interactions.

Nature, 290:91–97, March 1981.

[32] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. Int.

J. Comput. Vis., 1(4):321–331, 1988.

[33] G. K. Klinker, S. A. Shafer, and T. Kanade. A physical approach to color image

understanding. Int. J. Comput. Vis., 4(1):7–38, 1990.

[34] P. Kovesi. MATLAB and Octave functions for computer vision and image processing.

http://www.csse.uwa.edu.au/~pk/research/matlabfns/.

[35] B. V. K. Vijaya Kumar, A. Mahalanobis, and R. D. Juday. Correlation Pattern

Recognition. Cambridge Univ. Press, 2005.

[36] A. Kuruvilla, J. Li, P. Hennings Yeomans, P. Quelhas, N. Shaikh, A. Hoberman,

and J. Kovacevic. Otitis media vocabulary and grammar. In Proc. IEEE Int.

Conf. Image Process., pages 2845–2848, Orlando, FL, September 2012.

[37] A. Kuruvilla, N. Shaikh, A. Hoberman, and J. Kovacevic. Automated diagnosis

of otitis media: A vocabulary and grammar. Int. J. Biomed. Imag., sp. iss.

Computer Vis. Image Process. for Computer-Aided Diagnosis, August 2013.

[38] C. Lannon, L. E. Peterson, and A. Goudie. Quality measure for the care of

children with otitis media with effusion. Pediatrics, 127, May 2011.

[39] T. Leung and J. Malik. Representing and recognizing the visual appearance of

materials using three-dimensional textons. Int. J. Comput. Vis., 43:29–44, 2001.

[40] D. G. Lowe. Object recognition from local scale-invariant features. In Proc.

IEEE Int. Conf. Comput. Vis., volume 2, pages 1150–1157, Washington, DC,

1999.

[41] D. G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J.

Comput. Vis., 60(2):91–110, November 2004.

[42] J. B. MacQueen. Some methods for classification and analysis of multivariate

observations. In Proc. of the fifth Berkeley Symposium on Mathematical Statistics

and Probability, volume 1, pages 281–297. Univ. California Press, 1967.

[43] R. Malladi, J. A. Sethian, and B. Vemuri. Shape modeling with front propaga-

tion: A level set approach. IEEE Trans. Pattern Anal. Mach. Intell., 17(2):158–

175, February 1995.

[44] B. S. Manjunath and W. Y. Ma. Texture features for browsing and retrieval of

image data. IEEE Trans. Pattern Anal. Mach. Intell., 18(8):837–842, August

1996.

[45] D. Marr and E. Hildreth. Theory of edge detection. In Proc. R. Soc. of Lon.,

volume B207, pages 187–217, 1980.

[46] M. T. McCann, R. Bhagavatula, M. C. Fickus, J. A. Ozolek, and J. Kovacevic.

Automated colitis detection from endoscopic biopsies as a tissue screening tool in

diagnostic pathology. In Proc. IEEE Int. Conf. Image Process., pages 2809–2812,

Orlando, FL, September 2012.

[47] I. Mironica, C. Vertan, and D. C. Gheorghe. Automatic pediatric otitis detection

by classification of global image features. In Proc. 3rd Int. Conf. on

E-Health and Bioengineering, Iasi, Romania, November 2011.

[48] T. Mitchell. Machine Learning. McGraw-Hill, 1997.

[49] A. Mojsilovic, J. Kovacevic, J. Hu, R. J. Safranek, and K. Ganapathy. Matching

and retrieval based on the vocabulary and grammar of color patterns. In Proc.

IEEE Int. Conf. Multim. Comput. Syst., Florence, Italy, June 1999.

[50] A. Mojsilovic, J. Kovacevic, J. Hu, R. J. Safranek, and K. Ganapathy. Matching

and retrieval based on the vocabulary and grammar of color patterns. IEEE

Trans. Image Process., sp. iss. Image Video Process. Digit. Libraries, 9(1):38–

54, January 2000. IEEE Signal Processing Society Young Author Best Paper

Award.

[51] A. Mojsilovic, J. Kovacevic, D. A. Kall, R. J. Safranek, and K. Ganapathy. Vo-

cabulary and grammar of color patterns. IEEE Trans. Image Process., 9(3):417–

431, March 2000.

[52] E. Onusko. Tympanometry. Am. Fam. Physician, 70(9):1713–1720,

November 2004.

[53] N. Otsu. A threshold selection method from gray-level histograms. IEEE Trans.

Syst. Man Cybern., 9(1):62–66, 1979.

[54] N. R. Pal and S. K. Pal. A review on image segmentation techniques. Pattern

Recogn., 26(9):1277–1294, 1993.

[55] J. L. Paradise, H. E. Rockette, and D. K. Colborn. Otitis media in 2253

Pittsburgh-area infants: Prevalence and risk factors during the first two years of life. Pediatrics, 99:318–333, May

1997.

[56] Am. Acad. Pediatr. Diagnosis and management of acute otitis media. Pediatrics,

113(5):1451–1465, 2004.

[57] P. Perez, M. Gangnet, and A. Blake. Poisson image editing. ACM Siggraph,

22(3):313–318, 2003.

[58] D. L. Pham, C. Xu, and J. L. Prince. Current methods in medical image seg-

mentation. Ann. Rev. Biomed. Eng., 2:315–337, 2001.

[59] M. E. Pichichero. Diagnostic accuracy of otitis media and tympanocentesis skills

assessment among pediatricians. Eur. J. Clin. Microbiol. Infect. Dis., 22(9):519–

524, September 2003.

[60] M. E. Pichichero and M. D. Poole. Assessing diagnostic accuracy and tympa-

nocentesis skills in the management of otitis media. Archives of Pediatrics and

Adolescent Medicine, 155(10):1137–1142, 2001.

[61] R. J. Schalkoff. Artificial Neural Networks. Computer Science. McGraw-Hill,

1997.

[62] N. Shaikh. http://www.chp.edu/CHP/Shaikh,+Nader,+MD,+MPH.

[63] N. Shaikh, A. Hoberman, P. H. Kaleida, H. E. Rockette, M. Kurs-Lasky,

H. Hoover, M. E. Pichichero, O. F. Roddey, C. Harrison, J. A. Hadley, and R. H.

Schwartz. Otoscopic signs of otitis media. Pediatr. Infect. Dis. J., 30(10):822–

826, 2011.

[64] N. Shaikh, A. Hoberman, H. E. Rockette, and M. Kurs-Lasky. Development of

an algorithm for the diagnosis of otitis media. Acad. Pediatr., 12(3):214–218, May

2012.

[65] L. Shamir, N. Orlov, D. M. Eckley, T. Macura, J. Johnston, and I. G. Gold-

berg. WND-CHARM: Multi-purpose image classification using compound image

transforms. Pattern Recogn. Lett., 29:1684–1693, 2008.

[66] P. Shekelle, G. Takata, and G. Chan. Diagnosis, natural history, and late effects

of otitis media with effusion. Evidence report/technical assessment no. 55. Agency

for Healthcare Research and Quality, pages 3–23, May 2003.

[67] I. Sluimer, A. Schilham, M. Prokop, and B. van Ginneken. Computer analysis of

computed tomography scans of the lung: A survey. IEEE Trans. Med. Imag.,

25:385–405, April 2006.

[68] P. A. Tahtinen, M. K. Laine, P. Huovinen, J. Jalava, O. Ruuskanen, and A. Ruo-

hola. A placebo-controlled trial of antimicrobial treatment for acute otitis media.

The New England J. Med., 364:116–126, 2011.

[69] D. W. Teele, J. O. Klein, and B. Rosner. Epidemiology of otitis media during

the first seven years of life in children in greater Boston: A prospective, cohort

study. J. Infect. Dis., 160(1):83–94, 1989.

[70] P. S. Tsai and M. Shah. Shape from shading using linear approximation. Image

Vis. Comput., 12:487–498, 1994.

[71] M. Varma and A. Zisserman. Classifying images of materials: Achieving view-

point and illumination independence. In Proc. Eur. Conf. Comput. Vis., vol-

ume 3, pages 255–271, May 2002.

[72] A. Vedaldi and B. Fulkerson. VLFeat: An open and portable library of computer

vision algorithms, 2008. http://www.vlfeat.org/.

[73] C. Vertan, D. C. Gheorghe, and B. Ionescu. Eardrum color content analysis in

video-otoscopy images for the diagnosis support of pediatric otitis. In Int. Symp.

on Signals, Circuits Syst., Bucharest, Romania, July 2011.

[74] C. Xu and J. L. Prince. Snakes, shapes, and gradient vector flow. IEEE Trans.

Image Process., 7(3):359–369, March 1998.

[75] L. A. Zadeh. Fuzzy sets. Information and Control, 8(3):338–353, 1965.

[76] D. Ziou and S. Tabbone. Edge detection techniques-an overview. Int. J. Pattern

Recogn. Image Anal., 8(4):537–559, 1998.
