Decrypting Cryptogenic Epilepsy: Machine

Learning Methods for Detecting Cortical

Malformations

A dissertation submitted by

Bilal Ahmed

In partial fulfillment of the requirements

for the degree of Doctor of Philosophy in

Computer Science

TUFTS UNIVERSITY

May 2016

ADVISER: Carla E. Brodley

Abstract

Epilepsy is a common neurological disorder, affecting approximately 1% of

the world’s population. Uncontrolled epilepsy can have harmful effects on the

brain and increases the risk of injuries and sudden death. Cortical malformations,

particularly focal cortical dysplasia (FCD), are recognized as one of the

most common sources of treatment resistant epilepsy (TRE). Surgical resection

of the abnormal tissue is the only treatment for TRE patients, and a success-

ful outcome results in complete seizure freedom. Chances of success when the

lesion is visually detected on the MRI (MRI-positive) are 66%, and only 29%

for cases with undetected lesions (MRI-negative). Approximately 45%-60% of

histologically confirmed FCD lesions are missed by expert neuroradiologists.

This dissertation develops automated methods of detecting cortical mal-

formations in MRI-negative patients using surface-based morphometry. Using

data from MRI-negative patients to train machine learning (ML) algorithms

has a number of confounding factors that limit their applicability to the lesion

detection task. These include label noise arising from subjectivity in

determining the cortical region to resect without a visible abnormality. Similarly,

inter-subject and intra-subject variations in brain morphology limit the gener-

alization of ML methods trained on data aggregated from different individuals.

To address these issues we develop two novel ML methods. We propose a mul-

titask learning (MTL) method that models each patient as a separate learning

task, and uses the results of intra-cranial EEG exam as added supervision to

mitigate label noise. Next, we develop hierarchical conditional random fields

(HCRF) for outlier-detection, which is a semi-supervised learning method that

does not require labeled training data. By correcting for all three factors (i.e.,

label noise, intra-subject and inter-subject variation) HCRF outperforms the

baseline methods and the MTL method.

The high detection rate (75% for HCRF) of the proposed methods for

MRI-negative patients shows that some electrophysiologically and histologi-

cally abnormal cortical regions are not visually apparent to the human eye

but can be detected using ML methods. Incorporating such ML methods in

the pre-surgical evaluation protocol has the potential to enhance the chances

of detecting the lesion prior to surgery, leading to an increased number of

patients being referred to resective surgery.

Acknowledgements

Foremost, I would like to express my sincere gratitude to my thesis advisor

Prof. Carla E. Brodley for her supervision of my research. I consider that all

I have achieved during the course of my doctorate, and the fun I have had

would not have been possible without her support and patience. I would like

to thank my thesis committee: Prof. Roni Khardon, Prof. Ben Hescott, Prof.

Shuchin Aeron and Dr. Thomas Thesen, for their insightful comments and

suggestions.

I am also grateful to the following former or current staff at Tufts Univer-

sity, for their support during my graduate study: Jeannine Vangelist, Donna

Cirelli, Gail Fitzgerald, Sarah Richmond, George D. Preble and the excellent

Systems support staff. I have benefited immensely from the advice of the

people at the Tufts research computing group.

I would like to specially thank the Epilepsy Foundation, USA for awarding

me the pre-doctoral training scholarship, and also the FACES (Finding A Cure

for Epilepsy and Seizures) organization for their financial support.

My friends have helped me stay sane through these difficult years. Their

support and care helped me overcome difficult times and stay the course in my

graduate studies. I greatly value their friendship and I deeply appreciate their

belief in me. I would especially like to thank Mashhood Ishaque, Noman

H. Khan, Ehsan Ullah, Nathan Ricci, Saeed Majidi, Alireza Aghassi, Gilad

Barash, Haris Ghafoor, Mamoon Raja, Abdur-Rehman Rashid and Syed Musa

Bukhari.

None of this would have been possible without the never-ending love and

unconditional support of my parents, Baba and Ammi. They inspired me to

reach for the stars and dream big. I would also like to thank my parents-in-law,

Uncle and Aunty. I am also indebted to my loving wife, Sabeeka, for believing

in me even under the most trying circumstances, and for the numerous pep

talks that showed me the light when everything seemed bleak.

Contents

1 Introduction

1.1 Lesion Detection in Epilepsy Patients

1.2 Machine Learning For Lesion Detection

1.3 Intra-cranial EEG as Auxiliary Supervision

1.4 Identifying Lesions As Outliers

1.5 Thesis Contributions

1.6 Roadmap

2 Focal Cortical Dysplasia and surface-based Morphometry

2.1 Focal Cortical Dysplasia

2.2 Surface-Based Morphometry

2.2.1 Surface Reconstruction

2.2.2 Registration

2.2.3 Morphological Features

2.3 Radiological Features of FCD Lesions

2.4 Computational Methods for Detecting FCD using surface-based Morphometry

3 A Vertex-Based Classifier

3.1 Eliminating Label Noise

3.1.1 Removing False Positives

3.1.2 Removing False Negatives

3.2 Reducing Cortical Complexity

3.3 Overcoming Class Imbalance

3.4 Empirical Evaluation

3.4.1 Data Preprocessing

3.4.2 Training

3.4.3 Testing

3.4.4 Results

3.5 Sensitivity Analysis

3.5.1 Data Stratification

3.5.2 Mask Reduction

3.5.3 Bagging

3.6 Conclusion

4 Leveraging iEEG for FCD Lesion Detection

4.1 Multitask Learning

4.2 MTL with Auxiliary Label Information

4.2.1 Regularized Multi-task Learning (MTL)

4.2.2 Incorporating Auxiliary Label Information

4.2.3 Globally-Consistent Label Ranking (GC)

4.2.4 Task-Specific Label Ranking (TS)

4.3 Detecting Cortical Malformations

4.3.1 Data Description

4.3.2 Segmentation

4.3.3 Creating Electrode Maps

4.4 Results

4.4.1 Baseline Selection

4.4.2 Experimental Setup

4.4.3 Performance Analysis

4.5 Conclusion

5 Hierarchical Conditional Random Fields For Detecting FCD Lesions

5.1 Hierarchical Conditional Random Fields

5.2 HCRFs for Lesion Detection

5.2.1 Segmentation

5.2.2 HCRF Construction

5.2.3 Lesion Detection

5.3 Empirical Evaluation

5.3.1 Data Pre-processing and Parameter Selection

5.3.2 Evaluation Methodology

5.3.3 Cluster Ranking

5.4 Results

5.4.1 Individual Features

5.4.2 Combining Features

5.4.3 Ranking Criterion and the Detection Rate

5.5 HCRF versus Human Expert

5.6 Conclusion

6 Conclusion

Appendices

A Patient Information

B HCRF Results for MRI-Positive Patients

Bibliography

List of Figures

2.1 Automatic segmentation of the gray/white matter boundary and surface extraction.

2.2 Registration of different cortical surfaces.

2.3 Summary of the five morphometric features estimated at each cortical vertex.

3.1 Manual mask reduction for an MRI-positive patient.

3.2 Overview of the training and test phase of a vertex-based classifier.

3.3 Detection results for the machine learning based approach on an MRI-positive and an MRI-negative subject.

3.4 Effects of changing the manually determined thresholds for mask reduction.

3.5 Effects of changing the classifier design on the detection rate of MRI-negative patients.

4.1 Mapping iEEG electrodes on the cortical surface.

5.1 Constructing a Hierarchical Conditional Random Field (HCRF) for a flattened cortical parcellation image isolated using a neuro-anatomical atlas.

5.2 Detection results for MRI-negative patient NY67 using HCRF and cortical thickness.

5.3 Comparison of detection rates, precision and recall between the HCRF based approach and the baseline method using individual morphological features.

5.4 Detection results for MRI-negative patient NY294 using HCRF and cortical thickness.


5.5 Comparison of detection rates, precision and recall between the HCRF based approach and the z-score based baseline method when the detection scores are combined across features.

5.6 Comparison of detection rates, precision and recall between the HCRF based approach and the logistic regression based baseline method, when the detection scores are combined across features.

5.7 Comparison of detection rates, precision and recall between the HCRF based approach and the z-score based baseline method when the detection scores are combined across cortical thickness and mean curvature.

5.8 Cluster ranking criterion and its effects on the detection rate.

5.9 An MRI-positive patient with abnormal detections outside the resection zone.

B.1 Detection results for an MRI-positive patient, using HCRF and cortical thickness.

B.2 Comparison of detection rates, precision and recall between the HCRF based approach and the baseline method using individual morphological features for MRI-positive patients.

List of Tables

3.1 The detection performance of the z-score baseline approach and the proposed scheme (ML) on MRI-positive subjects. The true positive rate (TPR) and false positive rate (FPR) are calculated as the percentage of lesional vertices correctly labeled, and the percentage of non-lesional vertices incorrectly labeled, respectively. The Dice coefficient (DC) measuring the degree of spatial overlap (shown here as a percentage) between the detected clusters and the expert-marked lesion on the cortical surface is also listed (‘-’ represents a value of zero, and for both TPR and DC signifies that no abnormal cluster was detected that overlapped with the lesion).

3.2 Results for MRI-negative subjects. For each subject the true positive rate (TPR) and false positive rate (FPR) are calculated as the percentage of lesional vertices correctly labeled, and the percentage of non-lesional vertices incorrectly labeled, respectively. The Dice coefficient (DC) is also shown as a percentage to quantify the overlap between the detected clusters and the resection on the cortical surface (‘-’ represents a value of zero for FPR and no-detection for TPR and DC).

3.3 A comparison of detection results using the z-score based method and the ML methods only for MRI-positive subjects with different variations in the design of the ML approach. (A) no stratification along the sulcal values, (B) stratifies the data based on the sulcal depth values, but does not reduce the lesion mask. (C) uses stratification, lesion reduction by calculating a threshold for each sulcal level using cortical thickness values, but it does not use bagging (The TPR and FPR are measured as a percentage and ‘-’ represents a value of zero for FPR and no-detection for TPR).

3.4 A comparison of detection results using the z-score based method and the ML method only for MRI-negative subjects with different variations in the design of the ML approach. (A) no stratification along the sulcal values, (B) stratifies the data based on the sulcal depth values, but does not reduce the lesion mask. (C) uses stratification, lesion reduction by calculating a threshold for each sulcal level using cortical thickness values, but it does not use bagging (FPR is given as a percentage and ‘-’ represents a value of zero for FPR).

4.1 Range of values for the model hyper-parameters used in the grid search. The grid search optimized the area under the curve (AUC) over the model parameter set (MPS) consisting of three patients whose data is distinct from the fifteen patients used for performance analysis.

4.2 Detailed results for MRI-negative subjects. LDA is the Fisher linear discriminant analysis based method adapted from [43], ML represents the stratified classification scheme described in Chapter 3, MTL represents regularized MTL [31] without auxiliary supervision, GC and TS are the globally-consistent and the task-specific approaches, respectively (‘-’ represents a value of zero for FPR and no-detection for recall and precision, ‘*’ MRI-positive patients).

A.1 Demographic and seizure-related information for the MRI-positive patients.

A.2 Demographic and seizure-related information for the MRI-negative patients.

Chapter 1

Introduction

In this research we address the task of detecting structurally abnormal cortical

regions in patients suffering from treatment resistant epilepsy (TRE) caused

by focal cortical dysplasia (FCD). We take a machine learning approach that

utilizes the magnetic resonance imaging (MRI) data of TRE patients with the

end goal of enhancing the early detection rate in patients whose MRI scans

are deemed normal by neuroradiologists. Visual detection of the dysplastic

cortical region (FCD lesion) is dependent on various factors such as reviewer

training, location of the lesion within the complex convolutional structure of

the brain, etc. Section 1.1 introduces the clinical process of lesion detection for

TRE patients, and its impact on surgical outcomes. Section 1.2 formalizes the

lesion detection problem from a machine learning perspective and explains the

major confounding factors in the data that warrant the development of novel

learning techniques. Sections 1.2, 1.3 and 1.4 give an overview of three new

learning algorithms specifically tailored for the task of FCD lesion detection,

that constitute the main technical contributions of this research. In Section

1.6 we provide a guide to the rest of the thesis.

1.1 Lesion Detection in Epilepsy Patients

Epilepsy is a common neurological disorder, affecting approximately 1% of the

population [39]. It is characterized by profound abnormal neural activity dur-

ing seizures and inter-ictal periods. Uncontrolled epilepsy can have harmful

effects on the brain and increases the risk of injuries and sudden death [11].

About one third of epilepsy patients remain refractory to medical treatment

[55]. Cortical malformations, particularly focal cortical dysplasia (FCD), are

recognized as the most common source of pediatric epilepsy [11, 96] and the third

most common source in adults suffering from TRE [45, 92]. Early detection

and subsequent surgical removal of the FCD lesion area is the most effective

treatment to stop seizures and is often the last hope for these patients.

For patients suffering from FCD based TRE, an initial radiological evalua-

tion of the patient’s MRI is carried out by a panel of experienced radiologists to

locate the lesion. In some cases the lesion is located based on visual inspection

(MRI-positive), while in most cases the patient’s MRI is read as normal (MRI-

negative). A number of factors such as the highly complex folded pattern of

the brain [44, 12], reviewer experience [107] and the specific characteristics of

the FCD lesion [64, 53] limit the chances of visual detection. Visual inspection

is followed by an intracranial EEG (iEEG) exam. This invasive procedure

requires precise implantation of intracranial electrodes, which in the absence of

any target provided by the MRI becomes a challenging task. Once the seizure

onset zone is identified, then resective surgery can be performed. It should be

noted here that resective surgery is dependent on the specific location of the

lesion. In certain cases resection may not be possible, such as a lesion in the

motor or visual cortex, in which case surgical resection will lead to a loss of

basic life functions.

For MRI-positive patients, the chance that the patient will be seizure-free

after surgery is 66%, whereas for MRI-negative patients it is only 29% [57].

It is estimated that 70-80% of cases with FCD escape visual MRI inspection

[11, 96] (i.e., are MRI-negative).

Despite a growing number of studies demonstrating that resective surgery

is effective for TRE patients whose main indication is FCD, it remains under-

utilized [9]. This is especially true for MRI-negative patients. Not only are

these individuals less likely to be referred to a specialized epilepsy center by

neurologists [38], but many epilepsy specialists are reluctant to operate without

a well-defined lesion. In this thesis we describe and evaluate novel machine

learning algorithms that have higher sensitivity for identifying FCD lesions

in MRI data than other reported computational methods for FCD lesion de-

tection [14, 96, 43], resulting in an increased detection rate for MRI-negative

patients during their pre-surgical evaluation. The ultimate impact of this re-

search is enhanced utilization of the resective surgical procedure leading to

better quality of life for FCD patients.

1.2 Machine Learning For Lesion Detection

We use surface-based morphometry (SBM) [24] to extract a surface model from

the structural MRI scans of the patients. SBM represents the cortex as a two

dimensional folded sheet embedded in a three dimensional space [35]. Techni-

cally, the folded sheet is represented as a triangulated surface, and each vertex

on the surface can be characterized by different morphological features such

as cortical thickness [33], curvature, etc. Using SBM, we can accurately align

the extracted surfaces of different individuals such that there is a one-to-one

correspondence among the cortical regions of different individuals. This align-

ment plays a crucial role in comparing the regions among individuals as there

is considerable inter-subject variation in brain morphology based on different

demographic factors such as age, gender, handedness, level of education, etc.

By aligning the brains of different individuals to a common surface, we can

correct for inter-subject variation by matching exact locations among different

brains.
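
As a minimal sketch of why this one-to-one correspondence matters, the snippet below compares a single feature value at each vertex of a patient surface with the same vertex in a set of controls, using a z-score. The function name, the toy thickness values and the threshold of 3 are illustrative assumptions and are not taken from this thesis.

```python
from statistics import mean, stdev

def vertex_z_scores(patient, controls):
    """Return one z-score per vertex for a single morphological feature.

    patient:  list of feature values, one per registered vertex.
    controls: list of such lists, one per healthy control, all registered
              to the same common surface (index i is the same location).
    """
    z = []
    for i, value in enumerate(patient):
        column = [c[i] for c in controls]   # same vertex across controls
        mu, sigma = mean(column), stdev(column)
        # Guard against zero spread in a very small control cohort.
        z.append((value - mu) / sigma if sigma > 0 else 0.0)
    return z

# Toy data: 4 vertices, 3 controls (hypothetical cortical thickness in mm).
controls = [[2.4, 2.6, 2.5, 2.7],
            [2.6, 2.4, 2.5, 2.5],
            [2.5, 2.5, 2.8, 2.6]]
patient = [2.5, 2.5, 4.1, 2.6]

scores = vertex_z_scores(patient, controls)
flagged = [i for i, s in enumerate(scores) if abs(s) > 3.0]  # assumed cutoff
```

Without registration, index i would refer to different anatomical locations in different subjects and the per-vertex comparison would not be meaningful.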

The machine learning task is to train learning algorithms to distinguish

between normal and lesional vertices on the extracted surfaces. To this end we

use training data from healthy controls and FCD patients (both MRI-positive

and MRI-negative) who underwent surgical resection and neuropathological

examination of the resected tissue showed evidence of FCD.

The task, as described above, seems straightforward and fits nicely within

the supervised learning framework for binary classification. We can train a

classifier by collecting the vertices from the patient’s affected area as positive

instances and the corresponding vertices from the healthy controls as negative

instances. However, there are a number of confounding factors

arising from human subjectivity and data complexity which, if not properly

addressed, will result in classifiers that have low sensitivity. The major

confounding factors in the data include:

Label Noise: The diagnostic methodology in the absence of an MRI-visible

lesion rests on the accurate placement of iEEG electrodes and subsequent

analysis by a surgical board. For MRI-negative patients the absence of a visually

detected target negatively impacts the accuracy of electrode placement,

which in turn significantly undermines the surgical outcome [88, 89]. The goal

of resective surgery is to remove the entire lesion. If any part of the lesion is left

behind, the outcome will not be successful. This introduces false positives be-

cause the margins around the lesion/resection area tend to be “generous”. The

problem is more pronounced in MRI-negative patients because in the absence

of an MRI-identified target, abnormal vertices are delineated by the extent of

the tissue removed in surgery. The resected tissue may include a gradation

from abnormal to normal tissue. In addition, there are false negatives outside

the resected regions of patients, which arise due to lifetime seizure burden

leading to cortical abnormalities or the presence of additional developmental

lesions that are not epileptogenic.

Anatomic Complexity: The anatomic complexity and heterogeneity in folded

cortical tissue reduces the ability to discriminate lesional tissue from normal

cortex, which is one of the reasons why a large number of lesions remain elusive

to human perception in routine radiological evaluation [93, 44]. Recent studies

have shown that subtle FCD lesions occur with higher frequency at the bottom

of the folded regions [41]. Similarly, the distribution of different features such

as cortical thickness and gray-white contrast (GWC) exhibits a covariate shift

based on where the region is located within the folded cortex [17].
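
Before any of these confounding factors is addressed, the naive supervised setup described at the start of this section can be sketched as follows. This is only an illustration under stated assumptions: the function name, the tuple-of-features representation and the toy values are hypothetical, and the sketch deliberately ignores label noise and sulcal-depth effects.

```python
def build_training_set(patient_features, lesion_mask, control_features):
    """Assemble a naive binary training set on a common registered surface.

    patient_features / control_features: list of per-vertex feature tuples.
    lesion_mask: set of vertex indices inside the resected or marked region.
    """
    X, y = [], []
    for v in sorted(lesion_mask):
        X.append(patient_features[v])   # vertex from the affected area
        y.append(1)                     # positive instance
        X.append(control_features[v])   # same location in a healthy control
        y.append(0)                     # negative instance
    return X, y

# Toy example with (thickness, curvature) per vertex; values are made up.
patient_features = [(2.5, 0.1), (4.0, 0.3), (3.9, 0.2)]
control_features = [(2.5, 0.1), (2.6, 0.1), (2.4, 0.2)]
X, y = build_training_set(patient_features, {1, 2}, control_features)
```

Every vertex in the mask is labeled positive here, which is exactly where the false positives discussed under Label Noise enter the training data.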

Chapter 2 provides a detailed description of SBM. It also provides an

overview of the different FCD lesion detection schemes that utilize SBM. In

Chapter 3 we develop an FCD lesion detection mechanism that is tailored specifically

to counter the confounding factors outlined above. The empirical results

show that a higher detection rate can be achieved for MRI-negative patients by

appropriately addressing the domain idiosyncrasies as compared to a recently

reported detection approach [96].

1.3 Intra-cranial EEG as Auxiliary Supervision

Our second approach investigates iEEG as an auxiliary source of labels in

the supervised learning framework to augment the noisy vertex labels. Recall

that the main confounding factor in the data for FCD lesion detection is

label noise, which arises when the entire resected tissue is treated as an FCD

lesion. However, for patients who have undergone surgery, and are subsequently

seizure free, the resection zone can be regarded as a source of weak (noisy)

supervision. In addition to the resection zones, we also have the results of the

iEEG analysis for MRI-negative patients. Our second approach investigates

the incorporation of iEEG as an auxiliary source of supervision that can be

used to mitigate the effects of label noise when only resection zones are used

as ground truth.

Before undergoing resective brain surgery, all patients are subjected to an

invasive intracranial EEG (iEEG) exam. In this exam subdural electrodes are

implanted on the cortical surface to record electrical activity [112]. A board of

certified epileptologists reviews this information to determine the region that

is responsible for generating the seizure (i.e., the seizure onset zone). To isolate

the abnormal region, each electrode is labeled as being part of the seizure onset

zone or not. iEEG has been shown to be highly effective for localizing FCD

lesions [65]. However, for MRI-negative patients there is no visible lesion to

guide precise electrode implantation, which results in sampling errors. In such

cases the identified abnormal region fails to capture the lesion in its entirety

in about 40% of the cases [43]. Similarly, the subdural electrodes are unable

to record electrical activity from the bottom of the sulci. In cases where the

lesion is located at the bottom of the sulcus, iEEG analysis will not be effective

in locating the seizure onset zone [13]. Therefore, the outcome of the iEEG

analysis constitutes another source of weak supervision.

The output of an iEEG analysis consists of labeling each electrode as: part

of the seizure onset zone; active during the initial stages of seizure onset; or ac-

tive toward the end of the seizure. Based on these classifications, we interpret

the electrode labels as the output of a pairwise ranking function. This means

that the vertices that fall within the range of an electrode labeled as being part

of the seizure onset zone would be considered more positive (will have a higher

rank) than vertices covered by electrodes that have a different label. However,

the criteria used by the epileptologists for classifying electrodes depend on

a number of patient-specific factors [81], seizure morphology and semiology

[101], etc. Therefore, the underlying semantics of the pairwise ranking function

vary from one patient to another. Along with the inter-patient variability

of iEEG assessment, the morphology of the human brain, such as its thickness,

curvature and overall structure, is affected by different demographic

factors such as age, gender and education [84, 83]. Because the data

of each patient has its own unique morphological characteristics, treating the

data from all the patients in an identical manner will lead to poor classification

accuracy.
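
The pairwise-ranking reading of the electrode labels can be sketched as below. This is a hedged illustration rather than the procedure of Chapter 4: the label strings, the vertex identifiers and the function names are hypothetical, and only the ordering stated above (onset-zone vertices outrank vertices under differently labeled electrodes) is encoded.

```python
from itertools import product

ONSET = "onset"  # hypothetical label string for seizure-onset-zone electrodes

def ranking_pairs(vertex_labels):
    """Build (higher, lower) vertex pairs for ONE patient.

    vertex_labels: dict mapping vertex id -> label of the covering electrode.
    Vertices under onset-zone electrodes outrank all other covered vertices.
    """
    onset = sorted(v for v, lab in vertex_labels.items() if lab == ONSET)
    other = sorted(v for v, lab in vertex_labels.items() if lab != ONSET)
    return list(product(onset, other))

def pairs_per_patient(patients):
    """Keep pairs within each patient, because the meaning of the electrode
    labels varies from one patient to another."""
    return {pid: ranking_pairs(labels) for pid, labels in patients.items()}

patients = {
    "P1": {10: "onset", 11: "early", 12: "late"},
    "P2": {7: "early", 8: "onset"},
}
pairs = pairs_per_patient(patients)
```

No pair ever mixes vertices from two patients, which mirrors the choice of treating each patient as a separate learning task.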

To model inter-patient variability, both in terms of brain morphology and

iEEG-based electrode ranking, we treat each patient as a separate learning

task, and learn a joint classifier using the multitask learning framework [22].

To this end, we use the patient’s MRI to isolate the resected region (posi-

tive instances) and extract the same region from an age and gender matched

healthy control (negative instances). The positive labels provided by the re-

section zones are augmented with the ranking information provided by iEEG.

To utilize ranking information as an auxiliary source of supervision, we extend

the regularized multitask learning framework [30, 31] to learn a common clas-

sifier across the training subjects. This classifier can then be used to detect

FCD lesions in new patients i.e., who have not yet undergone iEEG electrode

placement. We evaluate the proposed technique on a dataset comprised of the

individual resection zones of patients and the corresponding cortical regions

from matched controls. Using this combined supervision, our proposed multi-

task learning approach detects abnormal regions within the resection zones for

all fifteen MRI-negative patients included in the dataset, albeit with a higher

false positive rate, as compared to other supervised learning methods that

included our vertex-based method (cf. Section 1.2), which correctly detected

lesions in 60% of patients, and another recently reported supervised approach

[42] which achieved a detection rate of 73%. Chapter 4 provides the technical

details of this approach along with experimental results.
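
For orientation, one common form of regularized multitask learning from the literature is written out below. It is offered as a sketch under assumptions: the symbols (T tasks, one per patient; a shared weight vector w_0; task-specific offsets v_t; a loss such as the hinge loss; trade-off parameters lambda_1 and lambda_2) are introduced here for illustration, and whether the formulation extended in Chapter 4 uses exactly this parameterization is not confirmed by this chapter.

```latex
\min_{w_0,\,v_1,\dots,v_T}\;
\sum_{t=1}^{T}\sum_{i=1}^{m_t}
  \ell\bigl(y_{it},\,(w_0+v_t)^{\top}x_{it}\bigr)
\;+\;\frac{\lambda_1}{T}\sum_{t=1}^{T}\lVert v_t\rVert^{2}
\;+\;\lambda_2\,\lVert w_0\rVert^{2}
```

Here the classifier for patient t is w_t = w_0 + v_t. A large lambda_1 relative to lambda_2 pulls every patient toward the shared vector w_0, while a small ratio lets the per-patient classifiers differ more.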

1.4 Identifying Lesions As Outliers

Our third method for lesion detection overcomes the effects of label noise by

formulating FCD lesion detection as an outlier detection problem. To this end,

we define a cortical lesion as a region that would be considered an outlier when

compared to the same region across a control cohort. Using this approach we

are able to bypass the use of noisy vertex labels to train a classifier.

Unlike other neurological disorders that affect a particular region of the

cortex such as autism [71], schizophrenia [79], etc., FCD lesions can occur

anywhere in the cortex and have variable size. In order to minimize the chances

of missing subtle lesions on the cortical surface we model lesion detection as

a multi-scale salient object detection problem using hierarchical conditional

random fields (HCRF) [78, 75]. In our case the saliency of the object is defined

by its degree of “outlier-ness”.

We employ image segmentation to isolate sub-regions of the cortex that

have similar morphological properties. Instead of segmenting the image at

a single scale we segment the image at different scales to obtain sub-regions

of varying size. Each sub-region is given an outlier score by comparing it to

the same region extracted from the control population. Finally, these outlier

scores are combined across the different scales using a tree structured condi-

tional random field. The final outlier scores are then thresholded to obtain the

detected lesion(s).
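
The multi-scale scoring idea can be sketched as follows. Note the substitution: the snippet scores each sub-region against the same region in the controls and then simply averages the scores across scales for every vertex. That plain average is a stand-in chosen for brevity and is not the tree-structured conditional random field used in this thesis. The segmentations, feature values and threshold are toy assumptions.

```python
from statistics import mean, stdev

def region_score(patient, controls, region):
    """Absolute z-score of the patient's mean feature over `region`,
    relative to the same region in each control."""
    p = mean(patient[v] for v in region)
    c = [mean(ctrl[v] for v in region) for ctrl in controls]
    sigma = stdev(c)
    return abs(p - mean(c)) / sigma if sigma > 0 else 0.0

def multiscale_scores(patient, controls, scales):
    """scales: list of segmentations; each segmentation is a list of regions
    (lists of vertex indices) that together cover every vertex once.
    Returns a per-vertex score averaged over the scales."""
    totals = [0.0] * len(patient)
    for segmentation in scales:
        for region in segmentation:
            s = region_score(patient, controls, region)
            for v in region:
                totals[v] += s
    return [t / len(scales) for t in totals]

# Toy data: 4 vertices, 3 controls, 3 scales from coarse to fine.
controls = [[2.4, 2.7, 2.5, 2.6],
            [2.6, 2.5, 2.7, 2.4],
            [2.5, 2.9, 2.6, 2.8]]
patient = [2.5, 2.7, 4.0, 2.6]
scales = [[[0, 1, 2, 3]], [[0, 1], [2, 3]], [[0], [1], [2], [3]]]

scores = multiscale_scores(patient, controls, scales)
flagged = [v for v, s in enumerate(scores) if s > 5.0]  # assumed threshold
```

A vertex that is abnormal at several scales accumulates a high combined score, whereas a neighbor that only looks unusual because it shares a coarse region with the abnormal vertex stays below the threshold.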

HCRFs have been used previously for object detection and semantic image

labeling for which they require accurate pixel-level labels. The accuracy of the

HCRFs in these domains is highly sensitive to label noise, and in most cases

the pixel-level labels need to be refined manually to obtain accurate results. In

our proposed formulation, we have extended the HCRF framework for binary

object detection/segmentation for which only image captions are available.

In our case the image captions correspond to whether a brain is healthy or

diseased. A caveat to this contribution is that it must be possible to register

the images accurately, such that a one-to-one correspondence can be made

between sub-regions.

The HCRF-based outlier detection scheme was able to achieve a detection

rate of 75% for twenty MRI-negative patients as compared to our vertex-

based scheme [1] that achieved a detection rate of 55% and another baseline

approach that achieved a detection rate of 60%. For MRI-positive patients

the HCRF-based method achieved a detection rate of 92%, as compared to

the baseline which detected the lesion in 85% of the patients. As compared

to the baselines the HCRF-based method was able to achieve higher recall

and precision for both MRI-positive and MRI-negative patients. Chapter 5

provides the technical details of the HCRF-based outlier detection scheme for

FCD lesion detection and detailed experimental results including a comparison

of its performance with an expert neuroradiologist.

1.5 Thesis Contributions

The main focus of the research presented in this thesis is the development and

evaluation of automated methods for detecting cortical malformations in MRI-

negative epilepsy patients. As a first step we identify the main confounding

factors that result when the training data consists of MRI-negative patients.

Next, we develop two novel machine learning methods: a regularized multi-

task learning (MTL) method with auxiliary supervision from iEEG analysis

and hierarchical conditional random fields (HCRF) for outlier detection. In

separate evaluations, both methods were able to achieve superior performance

as compared to recently reported methods in the lesion detection literature.

Keeping in mind that experienced neuro-radiologists were unable to visu-

ally locate the lesion in MRI-negative patients, the high detection rate of the

proposed methods shows that some electrophysiologically and histopatholog-

ically abnormal cortical regions are not visually apparent to the human eye

but can be detected with the aid of machine learning methods. Furthermore,

incorporating automated lesion detection methods in the pre-surgical evalua-

tion protocol can enhance the chances of detecting the lesion prior to surgery,

leading to a higher number of patients being referred to resective surgery.

1.6 Roadmap

The rest of the thesis is organized as follows. Chapter 2 provides a brief intro-

duction to surface-based morphometry and a review of different approaches to

lesion detection that utilize surface-based morphometry. We develop an initial

supervised vertex-based lesion detection method that addresses the presence

of all the confounding factors that we identify for this data in Chapter 3. In

Chapter 4 we develop methods of regularized multitask learning with auxiliary

supervision, which lead to the incorporation of iEEG data for mitigating the

effects of label noise for supervised learning. Chapter 5 describes and evaluates

hierarchical conditional random fields (HCRF) for outlier detection and their

application to detecting FCD lesions. Chapter 6 discusses future avenues of

research in this domain and the concluding remarks for this research.

Chapter 2

Focal Cortical Dysplasia and

surface-based Morphometry

“If the human brain were so simple that we could understand it, we would be so simple that we couldn't”

Emerson M. Pugh

Epilepsy affects around 50 in 100,000 people every year, and a third of them

have medically intractable seizures i.e., their seizures cannot be controlled

through medication [55]. Treatment resistant epilepsy (TRE)1 carries the risks

of premature death, seizure-related injuries, social isolation and an overall low

quality of life [56]. For TRE patients, surgical resection of the affected cortical

region is the only treatment and usually their last hope for leading a normal,

seizure-free life. Focal cortical dysplasia (FCD), a malformation of cortical

development (MCD), is the most common epileptogenic lesion in children and

the third most common in adults with TRE [45, 92].

1 Also known as drug-resistant epilepsy.

2.1 Focal Cortical Dysplasia

Focal cortical dysplasia (FCD) represents a group of structural disorders re-

sulting from malformations of cortical development (MCD). MCD characterize

structural and metabolic abnormalities of the brain that occur during gesta-

tion. About 25% of all reported cases of epilepsy are caused by MCD [110].

In all such cases, FCD is the most prevalent etiology accounting for 45% of

the cases [110, 76].

FCD is classified into three subtypes [18, 7]:

1. FCD Type I is caused by abnormal neuronal migration.

2. FCD Type II results from abnormal neural proliferation.

3. FCD Type III denotes lesions accompanied by hippocampal sclerosis

and tumors.

Surgical resection of the dysplastic brain tissue is the only treatment for

FCD-based TRE patients, and a successful outcome results in complete seizure

freedom for the patient. The success of the surgical procedure rests on the iden-

tification and delineation of the full FCD lesion during pre-surgical evaluation,

which currently involves an expert visual inspection of the patient’s MRI. The

chances of a successful surgical outcome in the presence of a visually detected

lesion are 66% as compared to only 29% when the lesion is not detected dur-

ing pre-surgical MRI evaluation [57, 99]. Recent advances in neuroimaging

technology, especially MRI, have revolutionized the detection and evaluation of

structural lesions associated with FCD, which in turn has led to higher success

rates for resective surgery [95]. However, approximately 45% of histologically

confirmed FCD lesions go undetected during visual inspection [110].

A successful surgical outcome depends on the complete removal of the

FCD lesion detected on the patient’s pre-surgical MRI [92]. In some cases,

even with a visually identified FCD lesion, surgery is not feasible as the lesion

overlaps with the eloquent cortex, which represents the cortical regions that

are mainly responsible for sensory, linguistic and motor processing. Hence,

before identifying the target for resection, all patients are subjected to an

invasive intracranial EEG (iEEG) exam, to accurately identify the extent of the

lesion and also to map the eloquent cortex. In this exam subdural electrodes

are implanted on the cortical surface to record electrical activity [112]. A

board of certified epileptologists reviews this information to determine the

region that is responsible for generating the seizure i.e., the seizure onset zone.

iEEG has been shown to be effective for localizing FCD lesions [65]. However,

for MRI-negative patients there is no visible lesion to guide precise electrode

implantation, which results in sampling errors. In such cases the identified

abnormal region fails to capture the lesion in its entirety in about 40% of the

cases, leading to poor surgical outcomes [99]. Therefore, patients who lack an

MRI-visible lesion are less likely to be referred to a specialized epilepsy center

by neurologists [38] and many epilepsy specialists are reluctant to operate

without a well-defined lesion. For these reasons, resective surgery remains

underutilized, despite a growing number of studies demonstrating that surgery

is effective for patients with focal TRE [9].

The relative inability to locate subtle FCD lesions on structural MRI scans

has led to the development of mathematical and computational models of

the brain’s morphology such as its shape, folding patterns and tissue character-

istics derived from structural MRI. These models facilitate comparisons of

cortical structures among different brains, and help in quantifying disease and

variability patterns. The models and the resulting algorithms are collectively

known as morphometry. A number of different morphometric algorithms exist

such as voxel-based morphometry [4], sulcal morphometry [80, 44] and surface-

based morphometry [24]. Next, we describe surface-based morphometry and

imaging biomarkers that are used by neuroradiologists to identify FCD lesions.

2.2 Surface-Based Morphometry

The cortical surface represents the outer layer of the brain modeled as a folded

two-dimensional surface in three-dimensional space. Even with optimized im-

age acquisition, identifying and delineating FCD lesions is highly dependent

on reviewer expertise. The rates of FCD lesion detection by non-expert and

expert neuroradiologists range from 39% to 50% [107]. Similarly, certain image

biomarkers of FCD, such as subtle abnormalities in cortical curvature and sul-

cal/gyral patterns may not be easily identifiable on planar MRI slices [93, 8].

In such cases computational models of the cortex derived from structural MRI

have been shown to increase the sensitivity of locating FCD lesions [14, 44, 96].

Surface-based morphometry (SBM) is one such methodology, which pro-

vides the means to characterize and analyze the human brain by explicitly

modeling the cortex using a suitable geometric model [24], using structural

MRI scans. Modeling the brain using explicit surface models has advantages

of reaching sub-millimeter accuracy in measuring morphological features [33],

more precise registration [37, 50] and high sensitivity of identifying differences

in morphological features [59]. SBM has been used successfully for analyzing

and detecting neurological abnormalities in various neurological disorders such

as Schizophrenia [79], Autism [71], and Epilepsy [96, 43].

2.2.1 Surface Reconstruction

Structural T1-weighted MRI scans are used to extract the cortical surface by

delineating the boundary between the gray and white matter [24]. This process

Figure 2.1: (Top): Results of automatic segmentation and classification of white matter voxels on an MRI volume to locate the gray/white matter boundary (yellow) and the pial surface boundary (red). (Bottom): Three different surface models obtained from the surface reconstruction phase.

is referred to as surface reconstruction [24], and involves: (i) segmentation of

the white matter, (ii) tessellation of the gray/white matter (GWM) boundary,

(iii) inflation of the folded surface, and (iv) correction of topological defects.

Once the surface is reconstructed it is further refined by classifying all white

matter voxels in the MRI volume to create the GWM boundary. The GWM

boundary is delineated up to sub-millimeter accuracy by further refining the

white matter surface. After refining the gray/white matter boundary the pial

surface is located by deforming the surface outward [35]. The reconstructed

surface is represented as a triangulated mesh and at each vertex different

morphological features can be estimated to characterize the cortex. It should

be noted that the spatial resolution of the reconstructed surface is different

from that of the original MRI volume. Figure 2.1 shows the results of surface

reconstruction on a subject’s MRI along with the resulting surface models.

Figure 2.2: Inter-subject registration using surface-based morphometry. (a): Mapping the curvature values from the pial surface to a sphere. (b): Aligning the spheres to a group average sphere, by matching the curvature on a vertex-by-vertex basis. (c): Transforming the aligned sphere back to a surface model.

2.2.2 Registration

The reconstructed surface is closed at the brain stem, and can be geometrically

regarded as a sphere [35]. Different morphological transforms can be applied

to register the cortical surface to a standard surface also known as a group-

atlas. Registration is achieved by aligning specific sulcal and gyral patterns

across the reconstructed cortical surfaces while minimizing metric distortion.

Figure 2.2 shows the different steps involved in the registration process. The

use of structural landmarks to guide the registration process results in a more

accurate alignment among different brains [50], which in turn allows more

precise comparisons of individual cortical structures across subjects [36].

2.2.3 Morphological Features

In this work, we use five morphological features to characterize the cortex:

Figure 2.3: Summary of the five morphometric features estimated at each cortical vertex. A. shows the pial surface (blue) and the white matter surface (pink) on the underlying MRI, B. cortical thickness, C. gray-white contrast, D. mean curvature, E. sulcal depth/gyral height, and F. Jacobian distortion.

1. Cortical thickness represents the thickness of the cortex which is de-

fined as the distance between the gray/white matter boundary and the

outermost surface of the gray matter (pial surface). It is calculated at

each vertex using an average of two measurements [33]: (a) the shortest

distance from the white matter surface to the pial surface; and (b) the

shortest distance from the pial surface at each point to the white matter

surface.

2. Gray/white-matter contrast (GWC) represents the degree of blurring at

the gray/white-matter boundary. GWC is estimated by calculating the

non-normalized T1 image intensity contrast at 0.5 mm above and below

the gray/white boundary with trilinear interpolation of the images. The

range of GWC values lies in [−1, 0], with values near zero indicating a

higher degree of blurring of the gray/white boundary.

3. Curvature is measured as 1/r, where r is the radius of an inscribed circle

and mean curvature represents the average of two principal curvatures

with a unit of 1/mm [74]. Mean curvature quantifies the sharpness of

cortical folding at the gyral crown or within the sulcus, and can be used

to assess the folding of small secondary and tertiary folds in the cortical

surface.

4. Sulcal depth characterizes the folded structure of the cortex. It is esti-

mated by calculating the dot product of the movement vectors with the

surface normal [35], and results in the calculation of the depth/height of

each point above the average surface. The values of sulcal depth lie in

the range [−2, 2] with lower values indicating a location in the sulcus

whereas higher values indicate a location on the gyral crown.

5. Jacobian distortion measures the distortion at each vertex during regis-

tration. In the registration process, as defined above, each subject’s gyral

and sulcal features are aligned by warping the entire brain to a spheri-

cal average surface (i.e., the standard brain). During this process, each

vertex is subjected to a nonlinear spherical transform. Jacobian distor-

tion measures the magnitude of the nonlinear transform at each vertex

needed to warp each vertex on the subject’s brain to a target vertex on

the average surface [36]. It is a measure of global brain deformation and

has been used at the vertex level for the detection of abnormal cortical

regions in autism [28].

Figure 2.3 illustrates the estimation of the morphological features using SBM.
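The thickness and curvature definitions in items 1 and 3 above can be sketched in a few lines. This is a minimal illustration only, not the surface-reconstruction software's implementation: it approximates each surface by its vertex set (rather than the triangulated mesh), uses a brute-force distance matrix that is practical only for small meshes, and the function names are hypothetical.

```python
import numpy as np

def cortical_thickness(white_xyz, pial_xyz):
    """Thickness at each white-surface vertex (vertex-set approximation).

    white_xyz: (n_white, 3) vertex coordinates of the gray/white surface.
    pial_xyz:  (n_pial, 3) vertex coordinates of the pial surface.
    """
    # All pairwise white-to-pial distances (assumption: mesh is small enough).
    d = np.linalg.norm(white_xyz[:, None, :] - pial_xyz[None, :, :], axis=2)
    # (a) shortest distance from each white vertex to the pial surface.
    nearest_pial = d.argmin(axis=1)
    d_white_to_pial = d[np.arange(len(white_xyz)), nearest_pial]
    # (b) shortest distance from that pial point back to the white surface.
    d_pial_to_white = d[:, nearest_pial].min(axis=0)
    # Thickness is the average of the two measurements.
    return 0.5 * (d_white_to_pial + d_pial_to_white)

def mean_curvature(k1, k2):
    """Mean curvature as the average of the two principal curvatures (1/mm)."""
    return 0.5 * (np.asarray(k1) + np.asarray(k2))
```

For two parallel sheets 2 mm apart the sketch returns a thickness of 2 mm at every vertex, and for a sphere of radius r (both principal curvatures equal to 1/r) the mean curvature is 1/r.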

2.3 Radiological Features of FCD Lesions

Typical MRI features of FCD include cortical thickening or thinning, blur-

ring of the gray-white matter boundary, increased signal intensities on Fluid-

attenuated inversion recovery (FLAIR) and/or T2-weighted images, a trans-

mantle stripe of T2 hyperintensity, and localized brain atrophy [67]. Below

we describe the efficacy of each of the previously mentioned morphological

features in identifying FCD lesions from a diagnostic imaging perspective.

Cortical Thickness: Thickening of the cortex is reported in 50-92% of FCD

cases [92, 10]. Cortical thickening results from the presence of balloon cells

(FCD type II) and is usually found in conjunction with blurring of the GWM

boundary. It has been reported as the most sensitive feature for automated

methods of detecting FCD lesions, especially in Type-II patients [96, 43, 2].

GW Contrast: Blurring of the GWM boundary is another common finding

in MRI-positive patients, reported in 60-80% of FCD cases [92]. High levels

of blurring are observed mostly in FCD type-II patients due to the presence

of immature balloon cells and neuronal hypertrophy [97]. Cortical thickening

combined with blurring of the GWM boundary was found in approximately

64% of FCD type-II patients [64].

Sulcal depth and curvature: Subtle changes in sulcal depth and curvature are

difficult to observe and assess on planar MRI slices [8]. However, FCD le-

sions have been associated with varying degrees of sulcal and curvature based

anomalies [8]. Hong et al. [43] found sulcal depth to be helpful in identifying

FCD type-II lesions; however, in the same study sulcal depth was also respon-

sible for generating the most extra-lesional clusters (detections deemed as false

positives based on expert-marked lesions).

Overall, 45% of histologically confirmed FCD lesions go undetected dur-

ing visual inspection of the MRI [110], which besides other factors can be

attributed to the anatomical complexity of the folded structure of the cortex.

For example, about 80% of FCD lesions located deep within the sulcus cannot

be detected through visual inspection [12]. Similarly, 87% of FCD type-I cases

[94, 53] and 33% of FCD type-II cases [94, 53] have been reported as having

normal MRI (MRI-negative). This makes FCD the most common histopatho-

logical finding in focal epilepsy patients with no visible lesion.

2.4 Computational Methods for Detecting FCD

using surface-based Morphometry

In this section we first review related work on automated tech-

niques for FCD lesion detection. We then discuss the critical limitations of

existing approaches. We focus on the current computational methods of FCD

lesion detection that specifically use surface-based morphometry. For methods

that do not use SBM please see the recent and comprehensive surveys provided

in Bernasconi et al. [11], Kini et al. [49], and Duncan et al. [27].

Besson et al. [14] use a combination of surface- and texture-based features

to represent each vertex on the surface. They use cortical thickness, curvature

and sulcal depth along with gray-white contrast and T1 signal hyperintensity.

A four-layer neural network was trained to detect abnormal vertices using

leave-one-subject-out cross-validation. The dataset consisted of nineteen MRI-

positive patients who had “small” FCD lesions. The neural network based

classifier was able to detect abnormal regions within the expert-marked lesions

of 95% of patients. A second, fuzzy k-nearest neighbors classifier was used to

further refine the results and reduce the false positive rate. For this purpose,

each detected cluster was represented by the mean and standard deviation of

the individual features. The final detection rate after post-processing by the

second level classifier was found to be 68%.
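Leave-one-subject-out cross-validation, used by the studies reviewed in this section, holds out all vertices of one subject at a time so that no subject contributes to both training and testing. A minimal sketch, assuming vertices from all subjects are stacked into one array with a parallel array of subject identifiers (the function name and data layout are illustrative, not taken from the cited work):

```python
import numpy as np

def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject.

    subject_ids: one entry per vertex, naming the subject it belongs to.
    """
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        is_test = subject_ids == subject
        # Every vertex of the held-out subject goes to the test fold.
        yield subject, np.flatnonzero(~is_test), np.flatnonzero(is_test)
```

Splitting by subject rather than by vertex matters here because neighboring vertices of the same brain are strongly correlated; a vertex-level split would leak information between folds.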

Hong et al. [43] developed a two-stage Fisher linear discriminant analysis

(LDA) [16] classifier to detect FCD type-II lesions in patients who were radio-

logically classified as MRI-negative during their pre-surgical assessment. The

lesions were however identified on the pre-surgical MRI scans after surgery and

were traced manually by an expert using texture-based maps. Therefore, as far

as the learning algorithm is concerned, the patients were MRI-positive. Each

vertex was represented using cortical thickness, sulcal depth, curvature, gray-

white contrast and relative intensity from the T1-weighted MRI volume. A

leave-one-subject-out evaluation strategy was used to assess the performance

of the lesion detection scheme. As a first step, a vertex-level LDA classifier

was used to classify each vertex on the reconstructed cortical surface as being

lesional or non-lesional for both controls and patients. These detections were

then further refined using a second LDA classifier that was trained to discrim-

inate between actual FCD lesions (detections made inside the manually traced

resection zones of patients) and spurious lesional detections made on controls.

For secondary classification, each cluster was represented by the mean and

standard deviation of the original individual features. The proposed scheme

was able to detect abnormal regions that co-localized with the expert-marked

lesions in 14/19 (74%) patients.
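Both second-stage classifiers described above represent a detected cluster by the mean and standard deviation of its vertex-level features. The sketch below shows one way to build such descriptors; it assumes clusters have already been formed and are given as integer labels per vertex (0 meaning "not detected"), which is a hypothetical encoding rather than the one used in the cited studies.

```python
import numpy as np

def cluster_descriptors(features, cluster_ids):
    """Summarize each detected cluster by per-feature mean and std.

    features:    (n_vertices, n_features) morphological features.
    cluster_ids: (n_vertices,) integer cluster label, 0 = no detection.
    Returns {cluster_id: array of length 2 * n_features}.
    """
    descriptors = {}
    for cid in np.unique(cluster_ids):
        if cid == 0:  # skip vertices that belong to no cluster
            continue
        x = features[cluster_ids == cid]
        # Concatenate means first, then (population) standard deviations.
        descriptors[int(cid)] = np.concatenate([x.mean(axis=0), x.std(axis=0)])
    return descriptors
```

The resulting fixed-length vectors can be passed to any cluster-level classifier regardless of how many vertices each cluster contains.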

Thesen et al. [96] used a semi-supervised univariate z-score-based thresh-

olding approach on registered SBM data of MRI-positive patients to classify

each vertex as being lesional or normal, using cortical thickness, GWC, curva-

ture, sulcal depth and Jacobian-distortion, individually. The dataset consisted

of eleven MRI-positive patients with five having FCD as the primary indica-

tion. They nominate cortical thickness along with GWC as being the most

informative features for FCD lesion detection in MRI-positive patients. By

combining results from cortical thickness and GWC the lesion was correctly

detected in ten out of the eleven patients.
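A univariate z-score thresholding scheme of this kind can be sketched as follows. The snippet assumes all subjects are registered to a common surface so that vertex i corresponds across brains; the threshold of 2.0, the small `eps` guard, and the use of a union to combine per-feature results are illustrative assumptions, not values or choices reported in [96].

```python
import numpy as np

def vertex_zscores(patient, controls, eps=1e-8):
    """Standardize a patient's feature map against controls, vertex by vertex.

    patient:  (n_vertices,) feature values, e.g. cortical thickness.
    controls: (n_controls, n_vertices) the same feature for healthy controls.
    """
    mu = controls.mean(axis=0)
    sd = np.maximum(controls.std(axis=0, ddof=1), eps)  # avoid division by zero
    return (patient - mu) / sd

def flag_abnormal(z, threshold=2.0):
    """Mark vertices whose |z| exceeds the (assumed) threshold."""
    return np.abs(z) > threshold

def combine_flags(*flags):
    """Union of per-feature detections (one possible combination rule)."""
    return np.logical_or.reduce(flags)
```

For example, `combine_flags(flag_abnormal(z_thickness), flag_abnormal(z_gwc))` marks a vertex as lesional if either feature is abnormal there.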

Most of the techniques mentioned above deal either with MRI-positive

patients [14, 96] or patients who were initially deemed MRI-negative during

their preliminary radiological screening, but later their lesions were found to be

visible on MRI [43]. In contrast to these studies, our data includes pure MRI-

negative patients whose lesions are not visible on their MRI, but their resected

tissues have been histologically verified to contain FCD.

The goal of resective surgery is to remove the entire lesion. If any part of

the lesion is left behind, the outcome will not be successful. This introduces

label noise, because the expert-marked lesion can contain normal vertices; the

margin around the lesion is marked in a “generous” manner so as to increase

the chances of capturing the entire lesion. Chapter 3 provides empirical evi-

dence that label noise needs to be mitigated particularly when MRI-negative

patients are part of the training data. A possible way to eliminate label noise

would be to train exclusively on MRI-positive patients. However, the features

that characterize the lesion in MRI-negative vs MRI-positive patients may not

be concordant. For example, in FCD type-I (a high proportion of MRI-negative

patients have type-I lesions [49]) the abnormal features such as sulcal-depth

and curvature are hard to interpret on planar MRI slices [93, 8]. Therefore,

training exclusively on MRI-positive patients will limit the classifier’s detec-

tion ability. In Chapter 4 we take a different approach to eliminating label

noise and augment the weak labels provided by the marked resection with the

results of iEEG evaluation. We develop an outlier detection method based on

hierarchical conditional random fields (HCRF) in Chapter 5, that overcomes

label noise by posing FCD lesion detection as an outlier detection problem,

and does not utilize the resected regions as ground truth for training.

Most of the lesion detection methods cited previously employ a post-

processing method to reduce the false positive rate. In this strategy a portion

of the vertices labeled lesional by the classifier are relabeled as normal. This

can be done by training a second-level classifier to classify the detected clusters

as lesional or non-lesional [14, 43]. Similarly, different heuristics can also be

used such as the surface area of the detected clusters [96]. Discarding any

detected region based on its size or surface area can result in discarding the

actual lesion or part of the lesion, because FCD can be located in any part of

the cortex, is highly variable in size, and may occur in multiple lobes [18]. In

Chapter 5 we develop a ranking methodology which ranks the detected clusters

based on a combination of their surface area and degree of abnormality. This

strategy bypasses the need to discard any findings and instead provides the

radiologist with multiple findings that can be assessed visually or using iEEG.

Chapter 3

A Vertex-Based Classifier

“The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data”

John W. Tukey

In this chapter we develop a lesion detection scheme to classify each ver-

tex on the cortical surface as “lesional” or “normal”. To this end, we use

labeled training data comprising healthy controls and histopathologically

verified MRI-positive and MRI-negative patients who have undergone resec-

tive surgery. The classifier developed here highlights the idiosyncrasies of this

data that directly impact the design of a lesion detection scheme. From a

supervised learning perspective there are three main challenges that must be

addressed to develop an effective classifier:

1. Class label noise arises due to the subjectivity involved in identifying

and delineating the lesions in both MRI-positive and the MRI-negative

patients resulting in a significant number of false positives in data (much

more so for MRI-negative patients than MRI-positive patients). Label

noise is further aggravated by the presence of false negatives in the extra-

lesional (outside the resected regions) vertices of patients. This happens

because dysplastic regions can develop due to a number of causes such

as prolonged untreated epilepsy.

2. Anatomical complexity of the folded structure of the cortical surface re-

duces the discernibility between dysplastic and normal tissue, and is

one of the main reasons why a large number of lesions remain elusive in

routine visual MRI evaluation [94].

3. Class imbalance results from the relatively low ratio of lesional vertices

to that of normal vertices for a particular patient, which is further com-

pounded by the higher availability of healthy control data as compared

to patient data.

This chapter explores the development of a vertex-based classifier that is de-

signed to explicitly address these issues.

3.1 Eliminating Label Noise

Label noise arises because the expert-marked lesion for MRI-positive patients,

and the resected tissue for MRI-negative patients can contain normal tissue

along with lesional tissue, causing normal tissue to be labeled as lesional. A

second source of false positives stems from the goal of resective surgery, which is

to remove the lesion in its entirety. Incomplete removal of the lesion can lower

the chances of a patient being seizure-free after resective surgery from 66% to

29% [92, 57]. This introduces false positives because the margins around the

lesion/resection area tend to be “generous”. The problem is more pronounced

in MRI-negative patients because in the absence of an MRI identified target,

abnormal vertices are delineated by the extent of the tissue removed in surgery.

The resected tissue may include a gradation from abnormal to normal tissue.

From a supervised ML perspective, treating all the resected vertices in the

case of MRI-negative patients as being lesional introduces false positives into

the training data, which can adversely affect classifier accuracy.

3.1.1 Removing False Positives

To ameliorate the impact of false positive label noise we pre-process the train-

ing data by manually reducing the lesion for both MRI-negative and MRI-

positive patients. The strategy is to eliminate those vertices from the lesional

regions that are not significantly different from the vertices outside the lesion.

In order to define the notion of “significance” we compare the distribution of

the normalized values of a morphological feature such as cortical thickness,

curvature, etc., for the vertices inside the labeled lesion/resection area to that

of the vertices outside the labeled lesion/resection area. Based on the assump-

tion that the lesion/resection area contains cortical structures characterized

by cortical malformation, we want to identify and select vertices within the

lesion/resection that are significantly different from the average feature values

outside the lesion/resection i.e., normal cortex.

Figure 3.1 shows an example of mask reduction when cortical thickness is

used to characterize the cortex for an MRI-positive patient. It can be seen that

the patient has abnormal thinning in the expert-marked lesion, therefore, we

would like to select the vertices from the lesion that are in the left tail of the

distribution, at the same time ensuring that the sampling region has mini-

mal overlap with the extra-lesional (outside the resection/lesion) distribution

(marked by τthin in Figure 3.1). As a patient can have both abnormally thick

and thin values, we calculate two thresholds for each subject namely τthin and

τthick. These two thresholds can be seen as selecting only those vertices from

Figure 3.1: Manually calculating τthin for an MRI-positive patient who has cortical thinning in his lesion area. τthick is undefined for this patient because no vertices in the lesional area are significantly thicker than the vertices outside the lesion. (Plot: x-axis, cortical thickness in z-scores; y-axis, density P(X); curves for normal and lesional vertices, with τthin marked.)

the marked lesion, that can be regarded as outliers when compared to the

tissue outside the lesion.

We chose to work with cortical thickness, which is one of the most

informative features for characterizing FCD lesions [96]. First the thickness

measurements across all the vertices for a given subject are standardized using

first-order statistics calculated at each vertex from the controls. This vertex-

based normalization is done after registering all the controls and subjects to

the average surface. In cases where the lesional thickness density has heavier

tails than the non-lesional density on both sides we can calculate both τthin

and τthick. In patients where the structural abnormalities are characterized

only by cortical thinning or cortical thickening as is the case in FCD type-I

or type-II respectively [18], only one of the right or left tails of the lesional

density would be heavier in which case only one threshold can be calculated

and the other remains undefined. Similarly, for some patients we cannot find

appropriate thresholds, which occurs as a result of undetected abnormalities

in the non-lesional area due to factors other than epilepsy (e.g., head trauma,

long-term untreated epilepsy, etc.). If we are unable to detect any appropriate

threshold(s) then that particular subject contributes no vertices to the training

data.
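The thresholds in this section are determined manually from the two densities. Purely as an illustration of the idea, the sketch below replaces that manual step with a simple quantile rule: τthin and τthick are placed so that at most a fraction `max_overlap` of extra-lesional vertices falls beyond them, and a threshold is left undefined (`None`) when the lesional density does not have the heavier tail on that side. The rule, the 5% default, and all function names are assumptions made for this example.

```python
import numpy as np

def tail_thresholds(z_lesion, z_outside, max_overlap=0.05):
    """Return (tau_thin, tau_thick); either may be None if undefined.

    z_lesion:  thickness z-scores of vertices inside the lesion/resection.
    z_outside: thickness z-scores of the same patient's other vertices.
    """
    tau_thin = np.quantile(z_outside, max_overlap)
    tau_thick = np.quantile(z_outside, 1.0 - max_overlap)
    # Keep a threshold only if the lesional tail is heavier on that side.
    if np.mean(z_lesion < tau_thin) <= max_overlap:
        tau_thin = None
    if np.mean(z_lesion > tau_thick) <= max_overlap:
        tau_thick = None
    return tau_thin, tau_thick

def reduce_lesion(z_lesion, tau_thin, tau_thick):
    """Boolean mask of lesional vertices retained for training."""
    keep = np.zeros(np.shape(z_lesion), dtype=bool)
    if tau_thin is not None:
        keep |= z_lesion < tau_thin
    if tau_thick is not None:
        keep |= z_lesion > tau_thick
    return keep
```

If both thresholds come back as `None`, the mask is empty and the subject contributes no vertices to the training data, which mirrors the behavior described above.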

The lesion reduction procedure is applied only to training data; for a test

subject the classifier is evaluated on all vertices and none of the vertices are left

out. This selective procedure of isolating vertices in the hope of eliminating

label noise is similar to the global and maximum difference approach taken in

[111] where the normalized gray matter difference image for a patient was used

to minimize the overlap between his/her lesional and non-lesional vertices.

3.1.2 Removing False Negatives

There are also false negatives in the non-lesional areas of patients, which can

arise because a lifetime seizure burden of a given patient can lead to cortical

abnormalities outside the seizure onset zone [63, 66] and/or the possibility of

additional non-epileptogenic dysplastic lesions [32]. In addition, patients who

are suffering from epilepsy due to developmental factors may have additional

lesions that are either not epileptogenic or have latent epileptogenicity [32].

Based on these considerations, we did not include non-lesional vertices from

the subjects as negative instances in our training data. Instead, all negative

instances were taken from a cohort of 62 healthy controls.

3.2 Reducing Cortical Complexity

The folding of the cortex varies across individuals and hinders the visibility of

subtle FCD lesions hidden deep within the folds. Recent studies have shown

that subtle FCD lesions tend to occur more frequently at the bottom of the sul-

cus [41]. Similarly, different sulcal levels have different thickness and GWC dis-

tributions [17], indicating that there are three distinct sub-populations of the

vertices. Given these insights we quantize the data into three non-overlapping

levels, where a sulcal depth in the range [−2,−1) represents the sulcus, (1, 2]

represents the gyrus, and vertices having a sulcal depth in [−1, 1] are labeled as

wall vertices.
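The three-level quantization can be written directly from the intervals given above. In this minimal sketch the integer codes 0, 1 and 2 for sulcus, wall and gyrus are an arbitrary choice made for the example.

```python
import numpy as np

SULCUS, WALL, GYRUS = 0, 1, 2  # illustrative level codes

def sulcal_level(depth):
    """Assign each vertex to a sulcal level from its sulcal depth.

    depth in [-2, -1) -> SULCUS, [-1, 1] -> WALL, (1, 2] -> GYRUS.
    """
    depth = np.asarray(depth, dtype=float)
    level = np.full(depth.shape, WALL, dtype=int)
    level[depth < -1.0] = SULCUS
    level[depth > 1.0] = GYRUS
    return level
```

Because the boundary values −1 and 1 both fall in the wall interval, the three levels are non-overlapping and cover the whole range [−2, 2].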

Using the above-mentioned stratification technique we calculate the two

thresholds τ_thin and τ_thick per sulcal level for eliminating false positives (c.f.

Section 3.1.1). This results in a total of six distinct thresholds which may

or may not exist for a particular patient as explained previously. We train

two separate classifiers for each sulcal level: one to detect cortical thickening

and one to detect cortical thinning. Although we use cortical thickness to

reduce the lesion/resection region, the classifiers utilize four features: cortical

thickness, gray/white contrast, cortical curvature and Jacobian distance to

represent each vertex.

3.3 Overcoming Class Imbalance

There are far fewer lesional vertices than non-lesional vertices which, if not

addressed, can lead to a classifier that labels each vertex as non-lesional as this

maximizes classification accuracy [47]. Recall that we obtain “non-lesional”

vertices for our training data from the set of healthy controls. The number of

available controls is higher than the number of patients who have undergone

resective surgery, because only a few patients proceed to surgery when no

visible lesion is found on their MRI. This results in class imbalance where the

number of normal instances considerably outnumbers the positive (lesional)

instances.

Figure 3.2: Different steps involved in the (A) training and (B) test phase of the vertex-based classifier. Note that the lesion reduction step is applied only to the training patients. For a test subject we calculate two labels per vertex: one from each thick/thin classifier. The final label of the vertex is calculated as the maximum of both predicted labels.

To counter the effects of class imbalance, we use bagging coupled with

under-sampling, which has been shown to work well both empirically and

theoretically for imbalanced datasets [108]. Each one of our six classifiers is

replaced by a bag of ten classifiers. Within a bag, each classifier is trained on

all the lesional vertices and an equal number of randomly sampled negative

instances. To classify a vertex as lesional or non-lesional we first use its sulcal

depth to choose the two correct bags of classifiers: one for detecting cortical

thickening and the other for detecting cortical thinning.
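The construction of balanced training sets for one bag can be sketched as below. This is an illustrative sketch only: the helper name `balanced_bags` is hypothetical, and sampling the negatives without replacement within each bag is an assumption, since the text does not specify the sampling scheme.

```python
import random


def balanced_bags(positives, negatives, n_bags=10, seed=0):
    """Build n_bags balanced training sets by under-sampling the negatives.

    Every set keeps all positive (lesional) instances and draws an equal
    number of negative (control) instances at random.
    Assumption: negatives are drawn without replacement within a set.
    """
    if len(negatives) < len(positives):
        raise ValueError("under-sampling needs at least as many negatives as positives")
    rng = random.Random(seed)
    bags = []
    for _ in range(n_bags):
        sampled_negatives = rng.sample(negatives, len(positives))
        bags.append((list(positives), sampled_negatives))
    return bags
```

One base classifier would be fit on each returned pair, giving the ten in-bag classifiers for a single sulcal level and thickness direction.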

We have chosen to work with logistic regression [16], which is a linear

classification algorithm. We selected logistic regression based on its relatively

fast training time and because it outputs a classification score that can be

interpreted as label probabilities. The final prediction for a bag is obtained

by taking a majority vote of the ten in-bag logistic regression classifiers. The

final label of the vertex is calculated as the maximum of both predicted labels.

Figure 3.2 illustrates the overall classifier design during training and testing.

3.4 Empirical Evaluation

We tested the vertex-based classifier defined here on a sample of 31 patients,

24 of whom were MRI-negative and 7 MRI-positive. All subjects were selected

from a large registry of patients with epilepsy treated at the New York Univer-

sity School of Medicine Comprehensive Epilepsy Center who signed consent

for a research MRI scanning protocol. Criteria for inclusion in this study in-

cluded: (1) completion of a high resolution T1-weighted MRI scan; (2) surgical

resection to treat focal epilepsy; (3) diagnosis of FCD on neuropathological ex-

amination of the resected tissue. Demographic and seizure-related information

for these participants is provided in Appendix A. In addition, MRI scans us-

ing identical imaging parameters from a total of 62 neurotypical controls were

acquired (31 females/31 males; ages 17–65; mean age = 33; SD = 12.5).

Exclusion criteria for the control group included any history of psychiatric or

neurological disorders.

3.4.1 Data Preprocessing

The reconstructed cortical surfaces of all the subjects and controls were reg-

istered to an average surface. Furthermore, the feature values at each vertex

were z-score normalized based on first and second order statistics calculated

across the control population. Normalization of the feature values plays a vital

role in mitigating the effects of inter-subject variation in cortical morphology

resulting from different demographic factors such as age, gender, etc., that can

lead to a high number of false positives [96].
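The normalization step can be illustrated with a short sketch. It assumes that the mean and standard deviation are computed separately at each vertex of the registered surface across the controls, and that the sample standard deviation is used; both are assumptions, and the function names are hypothetical.

```python
from statistics import mean, stdev


def control_stats(control_values):
    """Per-vertex mean and standard deviation across the control population.

    control_values: one list per control, holding one feature value per vertex
    (all subjects are assumed to be registered to the same average surface).
    """
    n_vertices = len(control_values[0])
    stats = []
    for vertex in range(n_vertices):
        column = [subject[vertex] for subject in control_values]
        stats.append((mean(column), stdev(column)))
    return stats


def z_normalize(subject_values, stats):
    """Z-score one subject's feature values using the control statistics."""
    return [(x - mu) / sd for x, (mu, sd) in zip(subject_values, stats)]
```

A vertex whose control standard deviation is zero would need special handling, which this sketch omits.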

3.4.2 Training

All positive (lesional) instances consisted of vertices located in the manually

reduced lesion/resection zone (c.f. Section 3.1) of both MRI-positive and MRI-

negative training subjects. The corresponding vertices from the controls were

included in the training data as negative (non-lesional) instances. This train-

ing data was partitioned into three distinct subsets based on sulcal depth.

Based on the two thresholds calculated for each subject, τ_thin and τ_thick, we

further decompose each of the three initial subsets into two non-overlapping

sets corresponding to thin and thick vertices. Thus, in our data stratification

procedure we end up with six subsets of training instances.

Six bags of ten logistic regression classifiers each were trained to detect

either cortical thickening or cortical thinning at one of the three sulcal levels. It

should be noted that any linear classifier can be used within this framework.

Each base-level logistic regression classifier was trained on a balanced dataset

i.e., with an equal number of positive and negative instances. We randomly

under-sampled [108] the negative instances (culled from the control data) to

balance the training set for each base-level classifier.

3.4.3 Testing

The output of each logistic regression classifier within the bag is the probability

that the input vertex belongs to the positive (lesional) class. To convert this

probability into a class label, we need to define a threshold ρ for the output

probability values such that the vertices having a predicted probability above

ρ are deemed lesional and those that fall below ρ are considered non-lesional.

In the results shown in Tables 3.1 and 3.2 we use ρ = 0.95.

We use a leave-one-patient-out cross-validation (LOOCV) strategy to test

the performance of our proposed classification scheme. For this purpose we left

out a single subject from the training data and trained the stratified classifiers

on vertices belonging to all the remaining subjects and all the controls. To

classify a vertex from the test subject, we first select the two bags of classifiers

corresponding to the sulcal depth of the vertex, and the output of each bag

is calculated based on the values of cortical thickness, GWC, curvature and

Jacobian distance for that vertex. Thus, we predict two labels for each test

vertex, indicating whether it is deemed lesional based on the “thinning” classifier

(y_thin) or the “thickening” classifier (y_thick). These two predictions are

combined into a single label by taking their maximum, i.e.,

y := max{y_thin, y_thick}.
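The labeling rule for a single test vertex can be sketched as follows. The sketch takes the ten predicted probabilities of each selected bag as given; the function names are hypothetical, and treating a tied vote as non-lesional is an assumption not stated in the text.

```python
def bag_vote(probabilities, rho=0.95):
    """Majority vote of the in-bag classifiers after thresholding at rho.

    Each classifier votes lesional (1) when its predicted probability exceeds rho.
    Assumption: a tie counts as non-lesional.
    """
    votes = sum(1 for p in probabilities if p > rho)
    return 1 if votes > len(probabilities) / 2 else 0


def vertex_label(thin_probabilities, thick_probabilities, rho=0.95):
    """Final label y = max(y_thin, y_thick) for one vertex."""
    y_thin = bag_vote(thin_probabilities, rho)
    y_thick = bag_vote(thick_probabilities, rho)
    return max(y_thin, y_thick)
```

Taking the maximum means a vertex is flagged as lesional if either the thinning bag or the thickening bag flags it.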

After each vertex of the test subject had been classified, the results were

post-processed to get rid of insignificant detections. To this end, we define the

notion of a detected cluster as a set of contiguous vertices that are labeled as

being lesional. The number of detected clusters was reduced to eliminate false

positives based on cluster surface area [96]. In our experiments all clusters

having a surface area less than 50 mm² were discarded, following the exact

same post-processing strategy as outlined in [96]. Although discarding detected

clusters increases the possibility of discarding subtle abnormal regions,

we perform this step to have a consistent comparison between the proposed

method and the baseline.
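The cluster-based post-processing can be sketched as a connected-component search on the surface mesh followed by an area filter. The adjacency-list mesh representation, the per-vertex area values and the function name are all assumptions made for illustration; the original pipeline may compute cluster area differently.

```python
from collections import deque


def significant_clusters(labels, neighbors, vertex_area, min_area=50.0):
    """Return contiguous lesional clusters whose surface area is at least min_area.

    labels:      0/1 label per vertex
    neighbors:   neighbors[v] lists the mesh vertices adjacent to v
    vertex_area: surface area (mm^2) attributed to each vertex (assumption)
    """
    seen = set()
    kept = []
    for start, label in enumerate(labels):
        if label != 1 or start in seen:
            continue
        # Breadth-first search over adjacent lesional vertices.
        cluster = []
        queue = deque([start])
        seen.add(start)
        while queue:
            vertex = queue.popleft()
            cluster.append(vertex)
            for other in neighbors[vertex]:
                if labels[other] == 1 and other not in seen:
                    seen.add(other)
                    queue.append(other)
        # Clusters smaller than min_area are discarded.
        if sum(vertex_area[v] for v in cluster) >= min_area:
            kept.append(sorted(cluster))
    return kept
```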

A test subject is considered a true positive after post-processing if any

of the remaining clusters partially or completely overlaps with the original

lesion/resection area [96, 14]. In the case where all the significant clusters fall

outside the lesion/resection region, the test subject is regarded as a false

negative. It should be kept in mind that detections outside the lesion/resection

zone may actually represent abnormal cortical tissue (c.f. Section 3.1). Thus,

the statistics provided here represent a lower bound on actual classifier

performance.
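The subject-level criterion can be expressed compactly. In this sketch clusters and the lesion/resection zone are represented as collections of vertex indices, which is an assumption about the data layout; the function names are hypothetical.

```python
def subject_detected(clusters, lesion):
    """True if any significant cluster overlaps the lesion/resection zone."""
    lesion = set(lesion)
    return any(lesion.intersection(cluster) for cluster in clusters)


def detection_rate(detected_flags):
    """Percentage of subjects with at least one overlapping cluster."""
    return 100.0 * sum(detected_flags) / len(detected_flags)
```

For example, 14 detected subjects out of 24 correspond to a detection rate of roughly 58%.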

3.4.4 Results

We use the z-score based approach proposed in [96] as a baseline, which uses

a single feature to detect abnormal vertices. Specifically, the vertices of the

registered data are z-score normalized using first and second order statistics

from the control population. Then the resulting z-scores are thresholded at

z = 2.1 to identify lesional vertices. Although any one of the five available

features can be used within this approach, we chose to work with cortical

thickness, which was reported by Thesen et al. [96] to be the most effective

feature for detecting FCD lesions. Furthermore, the detected clusters for the

z-score based approach were post-processed using the same method as outlined

in Section 3.4.3.

We used the Dice coefficient (DC) [26] to quantify the performance of both

the proposed approach and the z-score baseline. DC is a set similarity metric

that is a special case of the kappa statistic [113]. It is commonly used to

measure the accuracy of segmentation in medical images [114, 85, 6] when

ground truth is available. We use the DC to measure the overlap between

the final detected clusters (after post-processing) with the available resection

(for MRI-negative patients) and the expert-traced lesions (for MRI-positive

patients). Let M_pred be the binary mask that represents all the final

detected clusters, and let M_label be the binary mask representing the vertices

within the lesion/resection zone for a given subject. The DC is then calculated

as:

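$$\mathrm{DC}(M_{\mathrm{pred}}, M_{\mathrm{label}}) = \frac{2\,|M_{\mathrm{pred}} \cap M_{\mathrm{label}}|}{|M_{\mathrm{pred}}| + |M_{\mathrm{label}}|} \tag{3.1}$$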

Subj.          Z-Score                     ML
Id.        TPR    FPR     DC      TPR    FPR     DC
NY49     11.85   1.00  19.92    24.76   2.27  34.58
NY53     20.28   2.60  29.60    27.72   4.46  35.42
NY123    29.80   3.68  27.61    31.33   4.50  26.36
NY143    16.38   0.60  12.28    20.03   2.00   5.81
NY156    26.12   1.20  38.69    25.65   2.11  36.14
NY187        -   0.50      -        -   0.90      -
NY194     7.79   0.14  14.00    11.48   0.58  18.18
Mean     16.03   1.40  20.30    20.14   2.41  22.36

Table 3.1: The detection performance of the z-score baseline approach and the proposed scheme (ML) on MRI-positive subjects. The true positive rate (TPR) and false positive rate (FPR) are calculated as the percentage of lesional vertices correctly labeled, and the percentage of non-lesional vertices incorrectly labeled, respectively. The Dice coefficient (DC) measuring the degree of spatial overlap (shown here as a percentage) between the detected clusters and the expert-marked lesion on the cortical surface is also listed (‘-’ represents a value of zero, and for both TPR and DC signifies that no abnormal cluster was detected that overlapped with the lesion).

Thus we want to maximize DC. In addition to DC, we also estimate the false

positive rate: the percentage of non-lesional vertices incorrectly predicted as

lesional, and the true positive rate: the percentage of lesional vertices correctly

detected. It should be noted that all the performance metrics are calculated

using the original expert-marked resection/lesion zones as the ground truth,

therefore the estimates of true positive rate should be considered as lower

bounds.
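The three metrics can be computed from vertex index sets as in the following sketch. Representing the masks as sets of vertex indices and passing the total vertex count explicitly are illustrative choices; the function name is hypothetical. Values are returned as fractions, whereas the tables report percentages.

```python
def detection_metrics(predicted, lesion, n_vertices):
    """Dice coefficient, true positive rate and false positive rate for one subject.

    predicted:  vertices in the final detected clusters (M_pred)
    lesion:     vertices in the lesion/resection zone (M_label)
    n_vertices: total number of vertices on the subject's surface
    """
    predicted, lesion = set(predicted), set(lesion)
    overlap = len(predicted & lesion)
    total = len(predicted) + len(lesion)
    dice = 2 * overlap / total if total else 0.0
    tpr = overlap / len(lesion) if lesion else 0.0
    n_non_lesional = n_vertices - len(lesion)
    fpr = len(predicted - lesion) / n_non_lesional if n_non_lesional else 0.0
    return dice, tpr, fpr
```

As a small worked case, with 4 predicted vertices, 4 lesional vertices, 2 vertices in common and 12 vertices in total, the DC is 2·2/(4+4) = 0.5, the TPR is 2/4 = 0.5 and the FPR is 2/8 = 0.25.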

Table 3.1 shows the performance of both approaches on MRI-positive sub-

jects. Both approaches detect significant clusters in the expert-marked lesional

area for the same subjects. It can be seen that the ML based approach de-

tects larger clusters within the lesion area as compared to the baseline but

has a higher false positive rate. However, the overall difference between the

true positive rate and DC of the two approaches was found to be not statistically

significant. Figure 3.3(a) shows the detection results using the proposed

approach on an MRI-positive subject after post-processing.

Figure 3.3: The detection results for the machine learning based approach on (a) an MRI-positive and (b) an MRI-negative subject. The inflated lateral and medial cortical surfaces show the original expert-marked lesion or the resection zone as the regions outlined by the white solid curve and the significant lesional clusters discovered by the machine learning approach as the yellow solid filled regions. The MRI slice on the right shows the abnormal area corresponding to the clusters discovered inside the lesion/resection from the actual brain volume.

Whereas both approaches have the same detection rate for MRI-positive

subjects, the proposed approach outperforms the z-score based method for

MRI-negative subjects. Table 3.2 compares the performance of both approaches

on MRI-negative subjects. The ML approach is able to correctly detect sig-

nificant clusters inside the resection zone for 14 out of 24 subjects whereas the

z-score based method is able to correctly detect lesions in only 9. The proposed

approach has a higher true positive rate, and a higher DC. The difference be-

tween the calculated DC values for the proposed approach and the baseline

was found to be significant using a two-tailed t-test (t(23) = 3.34, p=0.0029).

However, the proposed method has a higher FPR than the z-score method

(1.04% versus 0.58%). Figure 3.3(b) shows the detection results using the

proposed approach on an MRI-negative subject after post-processing.
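The reported test statistic can be checked against the per-subject DC values in Table 3.2. The sketch below assumes that the test is a paired t-test over the 24 MRI-negative subjects (consistent with the 23 degrees of freedom) and that ‘-’ entries count as zero. Under these assumptions the statistic comes out close to the reported t(23) = 3.34. The p-value is not computed, since the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

# DC values (%) from Table 3.2 in subject order NY46 ... NY322; '-' taken as 0.
Z_DC = [0.0, 5.11, 8.13, 0.14, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.93,
        3.59, 0.0, 0.0, 0.0, 0.0, 6.01, 0.0, 0.0, 5.64, 0.0, 10.05, 3.25]
ML_DC = [1.78, 7.24, 14.45, 0.15, 1.07, 0.0, 0.0, 0.0, 0.0, 0.0, 8.97, 2.57,
         5.80, 0.0, 0.0, 1.88, 0.0, 10.30, 0.0, 0.0, 13.30, 4.32, 12.83, 3.66]


def paired_t(baseline, proposed):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [p - b for b, p in zip(baseline, proposed)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


t_stat, dof = paired_t(Z_DC, ML_DC)
```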

Most of the automated FCD lesion detection schemes reported in the lit-

erature have been evaluated on MRI-positive patients [43, 96]. However, our

dataset contains 24 MRI-negative patients whose lesions were not detected

by experienced radiologists. The baseline z-score based method used here is

the actual computational tool being used at NYU Comprehensive Epilepsy

Center for patient evaluation. In this context, the higher detection rate of

the proposed scheme shows its higher sensitivity to detecting subtle cortical

malformations in FCD patients. We conjecture that one of the reasons for

this higher sensitivity is the use of all five features to characterize

abnormal regions that are overlooked by the univariate baseline method.

3.5 Sensitivity Analysis

The vertex-based classifier developed in this chapter is designed specifically to

circumvent the idiosyncrasies of the FCD dataset. In this section we evaluate

the impact of each individual design choice on classifier performance. Specif-

ically, these design choices include data stratification based on sulcal depth

measures, bagging and manual reduction of the expert marked lesion/resection

area.

3.5.1 Data Stratification

In order to determine whether correcting for cortical complexity by stratify-

ing classifiers by sulcal depth results in improved detection rates, we re-ran

the training phase in the leave-one-patient-out cross-validation without this

correction (note that we retain bagging and mask reduction). As depicted

in Tables 3.3 and 3.4 (compare the ML column to column A), the true positive

rate dropped from 20.1% to 12.9% in the MRI-positive group, and lesion

Subj.          Z-Score                     ML
Id.        TPR    FPR     DC      TPR    FPR     DC
NY46         -   0.34      -     0.95   0.74   1.78
NY51      2.86   1.00   5.11     4.15   1.02   7.24
NY67      4.35   0.26   8.13     8.30   0.65  14.45
NY68      0.09   1.33   0.14     0.12   1.69   0.15
NY72         -      -      -     0.55   0.25   1.07
NY98         -   0.33      -        -   0.81      -
NY116        -      -      -        -   0.39      -
NY130        -   0.16      -        -   0.25      -
NY148        -   0.10      -        -   0.12      -
NY149        -   0.84      -        -   1.68      -
NY169        -   1.02      -     9.41   1.98   8.97
NY171     2.45   1.00   2.93     2.94   1.80   2.57
NY177     1.88   0.14   3.59     3.20   0.32   5.80
NY207        -   0.05      -        -   0.60      -
NY212        -   1.01      -        -   1.60      -
NY226        -   0.50      -     1.09   0.60   1.88
NY241        -   0.33      -        -   0.40      -
NY255     3.23   0.42   6.01     6.18   1.30  10.30
NY259        -   0.50      -        -   0.58      -
NY294        -   0.50      -        -   1.40      -
NY297     2.98   0.14   5.64     7.98   0.56  13.30
NY299        -   3.13      -     3.30   4.86   4.32
NY312     6.10   0.50  10.05     9.04   0.97  12.83
NY322     1.74   0.31   3.25     2.02   0.49   3.66
Mean      1.07   0.58   1.87     2.47   1.04   3.68

Table 3.2: Results for MRI-negative subjects. For each subject the true positive rate (TPR) and false positive rate (FPR) are calculated as the percentage of lesional vertices correctly labeled, and the percentage of non-lesional vertices incorrectly labeled, respectively. The Dice coefficient (DC) is also shown as a percentage to quantify the overlap between the detected clusters and the resection on the cortical surface (‘-’ represents a value of zero for FPR and no-detection for TPR and DC).

Subj.      Z-Score          ML           (A)           (B)           (C)
Id.       TPR   FPR     TPR   FPR     TPR   FPR     TPR   FPR     TPR   FPR
NY49     11.8   1.0    24.8   2.3     8.7   0.6       -     -     0.6     -
NY53     20.3   2.6    27.7   4.5    23.5   4.2    10.6   1.2     4.3   0.2
NY123    29.8   3.7    31.3   4.5    31.4   4.6    25.2   1.2     7.4   0.1
NY143    16.4   0.6    20.0   2.0       -   0.4       -     -       -   0.1
NY156    26.1   1.2    25.7   2.1    26.4   1.8    20.1   0.4     2.1     -
NY187       -   0.5       -   0.9       -   0.6       -   0.4       -     -
NY194     7.8   0.1    11.5   0.6       -     -       -     -       -     -
Mean     16.0   1.4    20.1   2.4    12.9   1.7     8.0   0.5     2.1   0.1

Table 3.3: A comparison of detection results using the z-score based method and the ML methods only for MRI-positive subjects with different variations in the design of the ML approach. (A) no stratification along the sulcal values, (B) stratifies the data based on the sulcal depth values, but does not reduce the lesion mask. (C) uses stratification, lesion reduction by calculating a threshold for each sulcal level using cortical thickness values, but it does not use bagging (The TPR and FPR are measured as a percentage and ‘-’ represents a value of zero for FPR and no-detection for TPR).

Subj.       Z-Score           ML            (A)            (B)            (C)
Id.       Det.    FPR    Det.    FPR    Det.    FPR    Det.    FPR    Det.    FPR
NY46         n   0.34       y   0.74       n      -       n      -       n      -
NY51         y   1.00       y   1.02       y   0.95       y   0.30       n      -
NY67         y   0.30       y   0.65       y   0.19       n      -       n      -
NY68         y   1.33       y   1.69       y   1.56       y   0.81       n      -
NY72         n      -       y   0.25       n      -       n      -       n      -
NY98         n   0.33       n   0.81       n   0.56       n      -       n      -
NY116        n      -       n   0.39       n      -       n      -       n      -
NY130        n   0.16       n   0.25       n   0.16       n      -       n      -
NY148        n   0.10       n   0.12       n   0.10       n      -       n      -
NY149        n   0.84       n   1.68       n   0.18       n      -       n   0.06
NY169        n   1.02       y   1.98       n   0.09       n      -       n      -
NY171        y   1.00       y   1.80       y      -       n      -       n      -
NY177        y   0.14       y   0.32       y   0.30       n      -       n      -
NY207        n   0.05       n   0.60       n      -       n      -       n      -
NY212        n   1.01       n   1.60       n   1.13       n   0.53       n   0.06
NY226        n   0.50       y   0.60       n   0.46       n   0.20       n      -
NY241        n   0.33       n   0.40       n   0.35       n   0.08       n      -
NY255        y   0.42       y   1.30       y   0.08       n      -       n      -
NY259        n   0.50       n   0.58       n   0.50       n   0.17       n   0.06
NY294        n   0.50       n   1.40       n   0.20       n      -       n      -
NY297        y   0.14       y   0.56       y      -       n      -       n      -
NY299        n   3.13       y   4.86       n   0.30       n      -       n      -
NY312        y   0.50       y   0.97       y   0.73       y      -       n      -
NY322        y   0.31       y   0.49       y   0.12       n      -       n      -
Mean      9/24   0.58   14/24   1.04    8/24   0.33    3/24   0.09    0/24  0.005

Table 3.4: A comparison of detection results using the z-score based method and the ML method only for MRI-negative subjects with different variations in the design of the ML approach. “Det.” indicates whether a significant cluster was detected (y) or not (n). (A) no stratification along the sulcal values, (B) stratifies the data based on the sulcal depth values, but does not reduce the lesion mask. (C) uses stratification, lesion reduction by calculating a threshold for each sulcal level using cortical thickness values, but it does not use bagging (FPR is given as a percentage and ‘-’ represents a value of zero for FPR).

detection dropped from 58% to 33% in the MRI-negative group. This sug-

gests that different feature combinations might be more prevalent in specific

regions (e.g., sulcus, gyrus, wall), which is consistent with the observation of

region-specific dysplasia subtypes (e.g., bottom-of-the-sulcus dysplasia) [41].

3.5.2 Mask Reduction

The proposed classification approach relies heavily on the availability of clean

training data (i.e., positive and negative instances with minimal label noise).

This observation holds irrespective of the choice of the base linear classifier

(logistic regression in our case). There are two main issues that need to be

evaluated for this strategy: i) is mask reduction necessary? and ii) how resilient

is the classifier performance to the choice of the manually selected thresholds?

Mask reduction aims to reduce all the vertices labeled as lesional to only

those that were significantly thicker or thinner than non-lesional vertices. We

tested the improvement in detection rates when utilizing this strategy by re-

running our analysis without mask reduction (note that we retain stratification

and bagging). The results are depicted in Tables 3.3 and 3.4 (compare the ML

column to column B) and show a drop in detection rates for both the MRI-

positive group (from 6/7 (86%) to 3/7 (43%)) and the MRI-negative group

(from 14/24 (58%) to 3/24 (12.5%) detections). This indicates that class label

noise is a significant issue for both groups that can be corrected by utilizing a

mask reduction strategy with a separate threshold for cortical thickening and

cortical thinning.

Given the high impact of mask reduction on classifier performance, the

thresholds used in manually reducing the resection/lesion area (c.f. Section

3.1.1) play a critical role. These thresholds are selected manually, and thus

involve human subjectivity. In order to test the sensitivity of our results

to these thresholds, we perturbed the thresholds by a relative amount and

recorded the effect of this change on the final detection rates for both MRI-

positive and MRI-negative subjects.

Figure 3.4: The effect of changing the manually determined thresholds for reducing the resected region to eliminate label noise on the detection rate of MRI-negative subjects. The detection rate represents the percentage of MRI-negative subjects for whom significant clusters were detected within the marked resected area. [Plot: detection rate (y-axis) versus false positive rate (x-axis), with one point for the original thresholds τ and one for each relative perturbation of −10%, −7.5%, −5%, +5%, +7.5% and +10%.]

Let Δ represent the relative change in both the thresholds τ_thin and τ_thick

(c.f. Section 3.1). A positive change makes the absolute threshold higher,

which represents a more exclusive selection criterion for keeping lesional ver-

tices in the training data from the marked lesion/resection area. Similarly,

a negative change lowers the threshold selecting a (possibly) higher number

of vertices as lesional. The detection rate for MRI-positive subjects was not

impacted by this perturbation. Figure 3.4 plots the detection rate versus the

FPR for different perturbations in the thresholds for MRI-negative subjects.

The detection rate for MRI-negative subjects changes when the manually identified

thresholds are adjusted by more than 5% of their original values. Adjusting

the thresholds for the MRI-negative subjects increases both the detection rate

and the false positive rate. Note that the thresholds used in our experiments

were set manually once, independently for each subject, and were not adjusted

throughout the course of this study.
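The perturbation experiment can be sketched as a small grid over relative changes. The set of relative changes mirrors the labels in Figure 3.4. Representing a missing threshold as `None`, scaling by (1 + Δ), and the example threshold values are assumptions made for illustration.

```python
# Relative changes examined in the sensitivity analysis (labels of Figure 3.4).
DELTAS = (-0.10, -0.075, -0.05, 0.05, 0.075, 0.10)


def perturb(tau, delta):
    """Scale a threshold by (1 + delta), so a positive delta raises its magnitude.

    None stands for a threshold that does not exist for a subject (assumption).
    """
    return None if tau is None else tau * (1.0 + delta)


def perturbed_settings(tau_thin, tau_thick, deltas=DELTAS):
    """Map each relative change to the perturbed (tau_thin, tau_thick) pair."""
    return {d: (perturb(tau_thin, d), perturb(tau_thick, d)) for d in deltas}
```

Each perturbed pair would be used to redo the mask reduction and rerun the cross-validation, giving one point on the detection rate versus FPR plot.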

The results show that even though the proposed approach is robust to

slight perturbations in the mask reduction thresholds, this step is crucial for

both the resulting detection rate and the false positive rate. This finding

emphasizes the presence of label noise in data, especially when the resected

regions of MRI-negative patients are used as ground truth. Similarly, this step

also highlights the main limitation of the proposed approach: valuable expert

time is needed to establish the mask reduction thresholds for all new training

subjects.

3.5.3 Bagging

We examined the impact of bagging on the classifier by replacing each bag

with a single logistic regression classifier, trained on all the negative instances,

while retaining both stratification and mask reduction. As can be seen from

Tables 3.3 and 3.4, eliminating bagging resulted in the most substantial drop

in performance; the TPR of MRI-positive group dropped from 20.1% to 2.1%

and for the MRI-negative group the detection rate dropped from 14/24 (58%)

to 0/24 (0%). In other words, failing to correct for the class imbalance problem

resulted in zero detection of MRI-negative FCD lesions.

Figure 3.5 summarizes the changes in detection rates for MRI-negative

patients as the design choices are omitted individually (i.e., one at a time)

from the proposed classifier design.

3.6 Conclusion

This chapter provided key insights for developing an effective FCD lesion de-

tection scheme using surface-based morphometry for MRI-negative patients.

Figure 3.5: A comparison of detection rate for MRI-negative subjects with different variations in the design of the proposed approach, including A) without bagging, but with sulcal stratification and mask reduction, B) without mask reduction but with bagging and sulcal stratification, C) without stratification but with bagging and lesion reduction, and D) using all three corrections. [Bar chart titled “Effect of Classifier Design on the Detection Rate for MRI-Negative Subjects”; y-axis: detection rate (%); bars: No Bagging (A), Without Lesion Reduction (B), Without Sulcal Stratification (C), Combined (D).]

We identified three different confounding factors that can undermine classifier

performance if not taken into account. The empirical evaluation shows that

the detection results are greatly enhanced as compared to a baseline scheme

that does not incorporate the design choices tailored specifically to counter

these confounding factors in data.

Our results offer a potential advancement of neurodiagnostic tools for the

more challenging population of MRI-negative patients. However, the case-

control methods we utilize in our approach require a large normal control

cohort with MRI scanning parameters identical to those of the patient and

thus cannot be readily applied in every clinical setting. Furthermore, automated

detection and classification of lesions should not replace careful visual

analysis by a trained expert. Rather, the quantitative approaches can be used

to supplement visual analysis by highlighting areas with a high lesional prob-

ability, similar to a focus of attention mechanism. Such quantitative lesion

detection methods as the one developed in this chapter aim to increase the

chances of accurately detecting the lesion in the pre-surgical phase, and are

primarily designed to highlight all possible problematic cortical regions that

may be overlooked by expert radiologists during conventional visual inspec-

tion. Therefore, the availability of such a classification scheme that can identify

FCD lesions especially in MRI-negative patients with higher accuracy during

the initial diagnostic stages enhances the chance of a successful outcome for

resective surgery.

The training data consisting of the MRI-negative cases were derived from

resection areas that were identified using iEEG. FCD pathology was present in

the resection area in all such patients; however, non-lesional tissue may have

also been part of the resection. We reduced this problem (label noise) by ap-

plying a manual mask reduction step which resulted in enhanced performance.

Mask reduction is the main shortcoming of the proposed approach; it requires

expert time to establish the mask reduction thresholds and is prone to human

errors. In the next chapter we develop a machine learning model that incor-

porates the results of the iEEG exam as an additional source of supervision,

with the aim of eliminating the manual mask reduction step.

In summary, this chapter demonstrated that a quantitative morphometric

method using surface-based brain modeling, combined with machine learn-

ing algorithms and novel strategies to deal with the complexity of cortical

malformations, results in improved detection of FCD. Improved detection of

neocortical structural lesions is likely to increase the number of patient re-

ferrals to specialized tertiary epilepsy centers for surgical consideration, and

in many cases, may decrease the delay between initial diagnosis and surgery.

This has significant implications for improved seizure and cognitive outcomes

in patients with FCD and concomitant epilepsy.

Chapter 4

Leveraging iEEG for FCD

Lesion Detection

“The signal is the truth. The noise is what distracts us from the truth.”

Nate Silver

The classification scheme developed in the last chapter highlighted a num-

ber of confounding factors that inhibit the performance of supervised learning

algorithms for lesion detection in MRI-negative patients. Specifically, the pro-

posed detection scheme involved a manual mask reduction step, requiring both

expert time and knowledge to determine the mask reduction thresholds. Mask

reduction is an ad-hoc procedure aimed at reducing the label noise that arises

when resected regions are treated as ground truth. Similarly, the morphology

of the human brain, such as its thickness, curvature and overall structure,

is affected by different demographic factors such as age, gender

and education [84, 83]. Learning a classifier by aggregating the data from the

patient population (as done in the previous chapter), and not accounting for

these sub-trends in data, has a negative impact on classifier accuracy. In this

chapter we adopt a multitask learning (MTL) approach for detecting FCD

lesions in treatment resistant epilepsy (TRE) patients, which uses the results

of the iEEG exam as a second source of supervision in addition to the vertex

labels provided by the resections for individual patients. Secondly, instead of

pooling the data from the entire patient population, we now treat each patient

as a separate learning task.

As mentioned previously, before undergoing resective brain surgery, all pa-

tients are subjected to an invasive intracranial EEG (iEEG) exam. In this

exam sub-dural electrodes are implanted on the cortical surface to record elec-

trical activity [112]. A board of certified epileptologists reviews this informa-

tion to determine the region that is responsible for generating the seizure i.e.,

the seizure onset zone. To isolate the abnormal region, each electrode is la-

beled as being part of the seizure onset zone or not. iEEG has been shown

to be effective for localizing FCD lesions [65]. However, for MRI-negative pa-

tients there is no visible lesion to guide precise electrode implantation, which

results in sampling errors. In such cases the identified abnormal region fails

to capture the lesion in its entirety in about 40% of the cases [43], resulting

in poor surgical outcomes. In this work, we augment the MRI labels i.e., the

resected regions for MRI-negative patients and the expert marked lesion of

MRI-positive patients with the results of iEEG analysis to mitigate the effects

of label noise.

By using this combined supervision, our goal is to show that the manual

mask-reduction step can be by-passed and the results of the iEEG exam can

be used to focus the attention of the learner on regions that might have a high

chance of being abnormal as compared to other regions within the resection. It

should be noted here, that during classification we only have access to patient’s

MRI data, and the extra supervision from iEEG is used only for training.

Thus, the proposed method serves as a pre-surgical patient evaluation tool that

detects candidate lesional regions on a patient’s MRI data; these candidate

regions are further evaluated using invasive iEEG and video monitoring to

locate the final target for resective surgery.

To address inter-patient variability we treat each patient as a separate classification

task. To this end, we use the patient’s MRI to isolate the resected

region (positive instances) and extract the same region from an age and gender

matched healthy control subject (negative instances). We then use MTL to

learn a common classifier (across all tasks), using the datasets gathered from

all the subjects. The common classifier can then be used to detect FCD lesions

in new patients.
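The construction of one learning task per patient can be sketched as follows. The dictionary layout, the function names and the matching rule (same gender, smallest age difference) are hypothetical; the text does not specify how the matched control is selected.

```python
def matched_control(patient, controls):
    """Pick a same-gender control closest in age (hypothetical matching rule)."""
    candidates = [c for c in controls if c["gender"] == patient["gender"]]
    if not candidates:
        raise ValueError("no control with matching gender")
    return min(candidates, key=lambda c: abs(c["age"] - patient["age"]))


def build_task(patient, controls):
    """Assemble one task: resected vertices as positives, the same vertices
    from the matched control as negatives.

    Assumes all surfaces are registered to a common template, so a vertex
    index refers to the same location in every subject.
    """
    control = matched_control(patient, controls)
    region = patient["resection"]
    positives = [patient["features"][v] for v in region]
    negatives = [control["features"][v] for v in region]
    return {
        "X": positives + negatives,
        "y": [1] * len(positives) + [0] * len(negatives),
    }
```

Because positives and negatives come from the same cortical region, each task built this way is balanced by construction.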

The contributions of this work are:

• We extend the regularized multitask learning framework [31] (described briefly in Section 4.1) to incorporate auxiliary label information as an additional source of supervision when the training data has weak labels.

• We model the case when the auxiliary information has similar semantics across tasks, and the case when its semantics differ among tasks (Section 4.2).

• We cast the task of automated detection of FCD lesions in MRI-negative epilepsy patients as a multitask learning problem and incorporate the results of their iEEG exams to provide additional supervision to ameliorate the problem of label noise that arises when resection zones are used as ground truth (Sections 4.3 and 4.4).

4.1 Multitask Learning

Multi-task learning (MTL) simultaneously learns multiple related prediction

tasks which can be represented using a shared common structure [22]. MTL is

ideally suited for domains in which the data is collected from different sources,

each of which when considered individually does not have enough data to fa-

cilitate learning a reliable prediction model. However, because the data are

collected from disparate sources, the underlying distribution for each source

has its own distinct characteristics. This causes a covariate shift that

negatively impacts the performance of a classifier learned by simply pooling the

data together. MTL exploits the “relatedness” among tasks by sharing the

common information, through joint representation, while regulating the influ-

ence of each data source on the final model. A number of different approaches

have been taken to develop robust MTL frameworks, including hierarchical

Bayes [98], regularization [31], and Gaussian processes [109]. These have been

successfully applied in various application domains, such as object detection

[100], conjoint analysis [58], classification of light curves [109], etc.

Similar to traditional learning methods, MTL requires accurately anno-

tated training data. However, in many domains there is significant label noise

that arises due to human subjectivity, imprecise measurements, etc. Because

MTL allows information to be shared among tasks during the learning phase,

the impact of label noise can be compounded, undermining the performance

of the prediction model. In this work we formulate an MTL model that learns

from imprecise labels, given access to an additional source of information which

provides a score for an instance quantifying the confidence associated with its

label. We address the added complexity of this auxiliary label information

when it exhibits a covariate shift and behaves differently across tasks. To this

end, we extend the regularized MTL framework [31] to incorporate auxiliary

supervision when the training labels are uncertain. We have chosen this frame-

work as it admits a solution similar to support vector learning [104], which

along with learning large-margin classifiers allows the use of kernel functions

[86] to learn both linear and non-linear solutions.

Label noise is prevalent in most real-world datasets [40, 68], especially in

domains where human experts are involved in the labeling process. In general,

there are two main approaches to deal with label noise within the context of

supervised learning. The first is to identify the noisy labels [21], and either

discard them or assign them lower weights [77]. The second approach assumes

that we are provided with scores quantifying the uncertainty of each training

label. For example, in learning from probabilistic labels [90, 61] the class of

each instance is specified by a probability distribution over the possible class

labels. Our work falls within this second approach but instead of assuming

that the training labels are “soft” we assume that we have access to a sec-

ondary source of information, providing the means to infer the probability for

a particular instance as belonging to either the positive or negative class. In

this regard, the work that is closest to ours is Nguyen et al. [69], in which

the authors assume the availability of an additional source of label informa-

tion. They model this side information as inducing a pair-wise ranking in the

context of learning a binary classifier. We adopt a similar ranking approach,

however, the main difference between their approach and our work is that we

consider the inclusion of additional label information in the context of MTL

rather than for single task learning. We go beyond simply incorporating the

additional label information into the regularized MTL framework, by taking

two different modeling approaches. In the first approach we assume that the

additional source behaves uniformly across tasks, while in the second approach

we allow its underlying semantics to be different for different tasks.

4.2 MTL with Auxiliary Label Information

We first provide the details of our notation and review the regularized MTL

framework [31]. We then describe our proposed modifications to this frame-

work to incorporate auxiliary label information when the training labels are

imprecise. For clarity and ease of comparison we mostly use the same notation

as Evgeniou et al. [31].

Notation: We consider that we have data from $T$ related classification tasks given as $(x^t_i, y^t_i)$, where $x^t_i \in \mathbb{R}^d$ and $y^t_i \in \{-1, 1\}$ for all $i \in \{1, 2, \ldots, m\}$ and $t \in \{1, 2, \ldots, T\}$. All tasks share the same feature space and, without loss of generality, we assume that all tasks have an equal number ($m$) of training instances and that the underlying data distributions for all tasks are different but related. The goal then is to simultaneously learn $T$ classifiers, one per task, such that $f_t(x^t_i) = y^t_i$ for all $t \in \{1, 2, \ldots, T\}$.

In addition to labeled data we also have a label score $r^t_i$ for each training instance. This score is considered relative to either the positive or the negative class and represents the degree of “positive-ness” or “negative-ness” of the instance. For the sake of clarity, in the rest of the chapter we assume that this score is generated relative to the positive class. This score induces a pairwise ranking of instances for each task: $(i, j) \in \Pi_t : r^t_i \geq r^t_j$. The ranking function $\Pi_t$ is adapted from rank-SVM [48]. The score assigned by the ranking function, i.e., $r^t_i$, reflects the degree of belief about an instance belonging to the positive class.

4.2.1 Regularized Multi-task Learning (MTL)

The regularized MTL approach [31] learns a separate classification function, $f_t(x) = w_t \cdot x$, for each individual task $t$, defined as:

\[
w_t = w_0 + v_t \tag{4.1}
\]

where $w_0 \in \mathbb{R}^d$ is a vector that represents the parameters common to all tasks and $v_t \in \mathbb{R}^d$ are the task-specific parameters. If the tasks are highly similar, then the $v_t$ are small relative to $w_0$, and vice versa. All the classifiers $f_t$ can be learned simultaneously by solving the following optimization problem:

\[
\begin{aligned}
\min_{w_0, v_t, \xi^t_i} \quad & \sum_{t=1}^{T} \sum_{i=1}^{m} \xi^t_i + \frac{\lambda_1}{T} \sum_{t=1}^{T} \|v_t\|^2 + \lambda_2 \|w_0\|^2 \\
\text{subject to:} \quad & \forall i, \forall t : \; y^t_i (w_0 + v_t) \cdot x^t_i \geq 1 - \xi^t_i \\
& \xi^t_i \geq 0
\end{aligned}
\tag{4.2}
\]

where $\lambda_1$ and $\lambda_2$ are regularization parameters that control the relatedness of the tasks; a large value of the ratio $\lambda_1 / \lambda_2$ will drive the task-specific parameter vectors toward zero with the effect that all tasks will have an identical solution given by $w_0$, whereas a small value of $\lambda_1 / \lambda_2$ has the opposite effect of making all the tasks independent and driving $w_0$ to zero [31]. This model can be viewed as jointly learning a mean support vector machine (SVM) represented by $w_0$ and $T$ task-specific SVMs, each represented by $w_t$, such that each of the $w_t$ has a large margin while being as close as possible to $w_0$ [31]. Equation 4.2 can be solved efficiently by formulating its dual. If we re-parametrize Equation 4.2 by defining two new parameters, $C = \frac{T}{2\lambda_1}$ and $\mu = \frac{T\lambda_2}{\lambda_1}$, then the dual problem takes the form [31]:

\[
\begin{aligned}
\max_{\alpha^t_i} \quad & \sum_{t=1}^{T} \sum_{i=1}^{m} \alpha^t_i - \frac{1}{2} \sum_{t=1}^{T} \sum_{i=1}^{m} \sum_{s=1}^{T} \sum_{j=1}^{m} \alpha^t_i y^t_i \alpha^s_j y^s_j K_{st}(x^t_i, x^s_j) \\
\text{subject to:} \quad & \forall i, \forall t : \; 0 \leq \alpha^t_i \leq C
\end{aligned}
\tag{4.3}
\]

where $\alpha^t_i$ are the Lagrange multipliers. The dual optimization problem is identical to the dual of a binary SVM. Therefore, any standard SVM solver can be used to solve for $\alpha$. The structure of MTL is captured by the kernel function $K_{st}(\cdot)$:

\[
K_{st}(x^t_i, x^s_j) = \left( \frac{1}{\mu} + \delta_{st} \right) x^t_i \cdot x^s_j \tag{4.4}
\]

This multi-task kernel couples the inner product of the instances across tasks based on $\mu = \frac{T\lambda_2}{\lambda_1}$, which represents the degree of relatedness among the tasks. The usual kernel trick can be applied here to learn non-linear classifiers, by replacing the standard inner product in Equation 4.4 with a kernel function [31].
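As an illustration of Equation 4.4, the following minimal NumPy sketch builds the Gram matrix of the linear multi-task kernel, taking $\delta_{st}$ to be 1 when two instances belong to the same task and 0 otherwise. The function name `multitask_gram` and the toy data are hypothetical and are not part of the implementation used in this work.

```python
import numpy as np

def multitask_gram(X, tasks, mu):
    """Gram matrix of the linear multi-task kernel (Equation 4.4).

    X     : (n, d) array, one row per training instance
    tasks : (n,) integer array with the task (patient) index of each row
    mu    : task-relatedness parameter, mu = T * lambda2 / lambda1
    """
    X = np.asarray(X, dtype=float)
    tasks = np.asarray(tasks)
    base = X @ X.T                                 # plain inner products
    same_task = tasks[:, None] == tasks[None, :]   # delta_st as a boolean mask
    return (1.0 / mu + same_task) * base

# Toy data (assumed): two instances from task 0 and one from task 1.
X = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
tasks = np.array([0, 0, 1])
K = multitask_gram(X, tasks, mu=0.5)
```

Instances from different tasks are coupled only through the factor $1/\mu$, so a large $\mu$ weakens the sharing between tasks while a small $\mu$ strengthens it.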

4.2.2 Incorporating Auxiliary Label Information

Most real-world datasets contain varying levels of label noise that result from

human subjectivity, missing information, imprecise measurements or varia-

tions in expert opinions over time. The MTL framework outlined above re-

quires accurate class labels during the learning phase, and the presence of label

noise can seriously undermine its performance. We consider the case when the

labels are noisy but there is another source of obtaining supplementary la-

bel information that can accurately grade the “positive-ness” of an instance.

This additional source may represent the subjective judgment of a domain

expert(s) about a particular instance, based on either the same set of features

that are available to the learner or some other view of the data. For example,

in our case, the data available to the learner consists of image features taken

from the patient’s MRI, while the source of auxiliary label information is pro-

vided by the results of an iEEG examination carried out by a panel of expert

epileptologists.

Modeling Side Information as Ranking

In this work we interpret the auxiliary label scores as the output of a pairwise

ranking function. This means that an instance with a higher score is more

“positive” than another instance that has a lower score. We can model the

behavior of the ranking function as being globally-consistent (i.e., it does

not vary across tasks). In other words, if we consider the ranking function as

representing an expert’s judgment, then this assumption would require that

his/her evaluation criteria for ascertaining the rank of an instance does not

change from one task to another. This is a strong assumption, because most

real-life experts will make varying judgments based on the nature of the task at

hand. For example, determining the range of cortical thickness values that are

abnormal for a patient depends on the patient’s age and gender [84]. Therefore,

an expert’s criteria would be calibrated differently for different patients. We

can model this variation by taking a task-specific approach that allows each

task to have its own ranking function.

From the perspective of MTL, the difference between the two approaches

is whether the rankings are shared across tasks or not. In the task-specific

approach, we cannot compare the ranks of two instances belonging to differ-

ent tasks, because the underlying semantics of ranking are different. In the

globally-consistent case, rank information can be shared among tasks without

introducing any discrepancies.

We incorporate the auxiliary label information by modifying the original

regularized MTL model (Equation 4.2) such that the final model not only max-

imizes classification accuracy but additionally preserves the pairwise ranking.

Similar to rank-SVM [48], we take the rank of each instance as being propor-

tional to its distance from the separating hyperplane:

\[
(i, j) \in \Pi_t : \; w_t \cdot x^t_i \geq w_t \cdot x^t_j
\]

where $\Pi_t$ is the ranking function for task $t$. For each pair of instances belonging to task $t$ we augment the original MTL problem (Equation 4.2) with pairwise rank constraints [69]. A similar approach is taken in [69] for training single-task binary SVMs in the presence of label noise.

4.2.3 Globally-Consistent Label Ranking (GC)

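Here we consider all pairwise ranking functions $\Pi_t$, $\forall t$, to be identical. In this case we modify the original MTL framework (Equation 4.2) by adding rank constraints, which involve both the shared and task-specific parameters. The new optimization problem is:

\[
\begin{aligned}
\min_{w_0, v_t, \xi^t_i, \eta^t_{pq}} \quad & \frac{1}{2} \sum_{t=1}^{T} \|v_t\|^2 + \frac{\mu}{2} \|w_0\|^2 + C \sum_{t=1}^{T} \sum_{i=1}^{m} \xi^t_i + C' \sum_{t=1}^{T} \sum_{(p,q) \in \Pi_t} \eta^t_{pq} \\
\text{subject to:} \quad & \forall i, \forall t : \; y^t_i (w_0 + v_t) \cdot x^t_i \geq 1 - \xi^t_i \\
& \forall t, \forall (p, q) \in \Pi_t : \; (w_0 + v_t) \cdot (x^t_p - x^t_q) \geq 1 - \eta^t_{pq} \\
& \xi^t_i \geq 0, \; \eta^t_{pq} \geq 0
\end{aligned}
\tag{4.5}
\]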

where $\eta^t_{pq}$ are slack variables that allow some of the rank constraints to be violated. $C'$ is a positive scalar and represents the relative cost of violating a rank constraint. It is defined as a multiple of the original MTL cost parameter, $C' = aC$, $a \in \mathbb{R}^+$. Equation 4.5 can be viewed as learning a classifier with a dataset augmented with a fixed number of pseudo-examples corresponding to the difference vectors generated by the rank constraints. This becomes more evident if we re-write the rank constraints as:

\[
z^t_{pq} (w_0 + v_t) \cdot \Delta^t_{pq} \geq 1 - \eta^t_{pq}
\]

where $z^t_{pq} = 1$ are the labels for each pseudo-example $\Delta^t_{pq} = x^t_p - x^t_q$. By augmenting the data for each task with the pseudo-examples $\Delta^t_{pq}$ and their corresponding labels $z^t_{pq} = 1$ we combine the two sets of constraints to solve a single classification problem. However, the number of pseudo-examples is quadratic in the number of instances in the original dataset for each task. This will cause the number of positive instances to be substantially higher than the number of negative instances, which in the worst case can result in a degenerate solution in which the resulting hyperplanes only respect the rank constraints. The trade-off between preserving the ranking and accurate classification is controlled by the cost parameters $C'$ and $C$, respectively. The cost parameters are analogous to the cost parameter of the traditional support vector machine (SVM) [86]. We set these parameters based on a grid search, which is the standard procedure for training SVMs.

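The problem described in Equation 4.5 can be efficiently solved by formulating its dual using the kernel function from Equation 4.4:

\[
\begin{aligned}
\max_{\alpha^t_i, \beta^t_{pq}} \quad & \sum_{t=1}^{T} \sum_{i=1}^{m} \alpha^t_i + \sum_{t=1}^{T} \sum_{(p,q) \in \Pi_t} \beta^t_{pq} - \frac{1}{2} \sum_{t=1}^{T} \sum_{i=1}^{m} \sum_{s=1}^{T} \sum_{j=1}^{m} \alpha^t_i y^t_i \alpha^s_j y^s_j K_{st}(x^t_i, x^s_j) \\
& - \sum_{t=1}^{T} \sum_{i=1}^{m} \sum_{s=1}^{T} \sum_{(p,q) \in \Pi_s} \alpha^t_i y^t_i \beta^s_{pq} z^s_{pq} K_{st}(x^t_i, \Delta^s_{pq}) \\
& - \frac{1}{2} \sum_{t=1}^{T} \sum_{(p,q) \in \Pi_t} \sum_{s=1}^{T} \sum_{(k,l) \in \Pi_s} \beta^t_{pq} z^t_{pq} \beta^s_{kl} z^s_{kl} K_{st}(\Delta^t_{pq}, \Delta^s_{kl}) \\
\text{subject to:} \quad & \forall i, \forall t : \; 0 \leq \alpha^t_i \leq C, \quad 0 \leq \beta^t_{pq} \leq C'
\end{aligned}
\tag{4.6}
\]

where $\alpha^t_i$ and $\beta^t_{pq}$ are Lagrange multipliers corresponding to the classification and rank constraints, respectively.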
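To make the construction of the pseudo-examples concrete, the sketch below builds the difference vectors $\Delta^t_{pq} = x^t_p - x^t_q$ and their labels $z^t_{pq} = 1$ for a single task from its label scores. It is an illustrative sketch only: the function name `rank_pseudo_examples` is hypothetical, and keeping only pairs with a strictly higher score (so that tied instances do not generate two contradictory constraints) is an assumption rather than something stated in the text.

```python
import numpy as np

def rank_pseudo_examples(X_t, r_t):
    """Build rank pseudo-examples for one task.

    X_t : (m, d) array of instances for task t
    r_t : (m,) array of label scores (higher = more "positive")
    Returns the list of pairs (p, q), the difference vectors, and labels z = 1.
    """
    X_t = np.asarray(X_t, dtype=float)
    r_t = np.asarray(r_t, dtype=float)
    n = len(r_t)
    # Assumption: only strictly ordered pairs produce a constraint.
    pairs = [(p, q) for p in range(n) for q in range(n) if r_t[p] > r_t[q]]
    deltas = np.array([X_t[p] - X_t[q] for p, q in pairs], dtype=float)
    deltas = deltas.reshape(len(pairs), X_t.shape[1])  # also handles zero pairs
    z = np.ones(len(pairs))
    return pairs, deltas, z

# Toy data (assumed): instance 0 carries a higher score than instances 1 and 2.
X_t = np.array([[2.0, 1.0], [0.0, 1.0], [1.0, 3.0]])
r_t = np.array([2.0, 1.0, 1.0])
pairs, deltas, z = rank_pseudo_examples(X_t, r_t)
```

Because pairs are enumerated within a single task, the number of pseudo-examples grows quadratically with the number of instances in that task, which is the source of the class imbalance discussed above.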

It is worth mentioning that the pseudo-examples are created on a per-task basis; i.e., there are no pseudo-examples resulting from comparing the ranks of two instances from different tasks. The assumption of global consistency is exploited in the construction of the rank constraints (cf. Equation 4.5), in which both $w_0$ and $v_t$ are required to preserve the ranking. This can be made explicit by inspecting the optimal solutions for the mean weight vector $w_0$ and the task-specific weight vectors $v_t$, obtained by formulating and solving the Lagrangian function for Equation 4.5. These are found to be:

\[
w_0^* = \frac{1}{\mu} \left( \sum_{t=1}^{T} \sum_{i=1}^{m} \alpha^t_i y^t_i x^t_i + \sum_{s=1}^{T} \sum_{(p,q) \in \Pi_s} \beta^s_{pq} z^s_{pq} \Delta^s_{pq} \right) \tag{4.7}
\]

\[
v_t^* = \sum_{i=1}^{m} \alpha^t_i y^t_i x^t_i + \sum_{(p,q) \in \Pi_t} \beta^t_{pq} z^t_{pq} \Delta^t_{pq} \tag{4.8}
\]

where $\alpha$ and $\beta$ are Lagrange multipliers.
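For completeness, the intermediate step behind Equations 4.7 and 4.8 can be sketched as follows. The multipliers $\nu^t_i$ and $\rho^t_{pq}$ for the non-negativity constraints are notation introduced here for illustration only. The Lagrangian of Equation 4.5 is

\[
\begin{aligned}
\mathcal{L} = {} & \frac{1}{2} \sum_{t=1}^{T} \|v_t\|^2 + \frac{\mu}{2} \|w_0\|^2 + C \sum_{t=1}^{T} \sum_{i=1}^{m} \xi^t_i + C' \sum_{t=1}^{T} \sum_{(p,q) \in \Pi_t} \eta^t_{pq} \\
& - \sum_{t=1}^{T} \sum_{i=1}^{m} \alpha^t_i \left[ y^t_i (w_0 + v_t) \cdot x^t_i - 1 + \xi^t_i \right] - \sum_{t=1}^{T} \sum_{(p,q) \in \Pi_t} \beta^t_{pq} \left[ z^t_{pq} (w_0 + v_t) \cdot \Delta^t_{pq} - 1 + \eta^t_{pq} \right] \\
& - \sum_{t=1}^{T} \sum_{i=1}^{m} \nu^t_i \xi^t_i - \sum_{t=1}^{T} \sum_{(p,q) \in \Pi_t} \rho^t_{pq} \eta^t_{pq}
\end{aligned}
\]

Setting the derivatives with respect to the primal variables to zero gives

\[
\begin{aligned}
\frac{\partial \mathcal{L}}{\partial w_0} = 0 \;&\Rightarrow\; \mu w_0 = \sum_{t=1}^{T} \sum_{i=1}^{m} \alpha^t_i y^t_i x^t_i + \sum_{t=1}^{T} \sum_{(p,q) \in \Pi_t} \beta^t_{pq} z^t_{pq} \Delta^t_{pq} \\
\frac{\partial \mathcal{L}}{\partial v_t} = 0 \;&\Rightarrow\; v_t = \sum_{i=1}^{m} \alpha^t_i y^t_i x^t_i + \sum_{(p,q) \in \Pi_t} \beta^t_{pq} z^t_{pq} \Delta^t_{pq} \\
\frac{\partial \mathcal{L}}{\partial \xi^t_i} = 0 \;&\Rightarrow\; \alpha^t_i = C - \nu^t_i, \qquad \frac{\partial \mathcal{L}}{\partial \eta^t_{pq}} = 0 \;\Rightarrow\; \beta^t_{pq} = C' - \rho^t_{pq}
\end{aligned}
\]

The first two conditions are Equations 4.7 and 4.8, and together they imply $w_0^* = \frac{1}{\mu} \sum_{t=1}^{T} v_t^*$. The last two, combined with $\nu^t_i \geq 0$ and $\rho^t_{pq} \geq 0$, yield the box constraints of the dual in Equation 4.6.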

4.2.4 Task-Specific Label Ranking (TS)

To model the peculiarities that may exist in the source of auxiliary information

as we move from one task to another, we can limit the influence that the rank

constraints have on the overall solution, by limiting them to affect only the

task-specific components. This is formulated similar to Equation 4.5, with

modified rank constraints:

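\[
\begin{aligned}
\min_{w_0, v_t, \xi^t_i, \eta^t_{pq}} \quad & \frac{1}{2} \sum_{t=1}^{T} \|v_t\|^2 + \frac{\mu}{2} \|w_0\|^2 + C \sum_{t=1}^{T} \sum_{i=1}^{m} \xi^t_i + C' \sum_{t=1}^{T} \sum_{(p,q) \in \Pi_t} \eta^t_{pq} \\
\text{subject to:} \quad & \forall i, \forall t : \; y^t_i (w_0 + v_t) \cdot x^t_i \geq 1 - \xi^t_i \\
& \forall t, \forall (p, q) \in \Pi_t : \; z^t_{pq} \, v_t \cdot \Delta^t_{pq} \geq 1 - \eta^t_{pq} \\
& \xi^t_i \geq 0, \; \eta^t_{pq} \geq 0
\end{aligned}
\tag{4.9}
\]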

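By not allowing the rank information to directly influence the shared component $w_0$, the ranking function $\Pi_t$ is no longer coupled across tasks, and can behave differently for different tasks. It should be noted that although the shared weight vector $w_0$ is not required to preserve the rankings, it is still indirectly affected by the rank constraints through $v_t$ (cf. Equation 4.1). The dual of Equation 4.9 can be formulated as:

\[
\begin{aligned}
\max_{\alpha^t_i, \beta^t_{pq}} \quad & \sum_{t=1}^{T} \sum_{i=1}^{m} \alpha^t_i + \sum_{t=1}^{T} \sum_{(p,q) \in \Pi_t} \beta^t_{pq} - \frac{1}{2\mu} \sum_{t=1}^{T} \sum_{i=1}^{m} \sum_{s=1}^{T} \sum_{j=1}^{m} \alpha^t_i y^t_i \alpha^s_j y^s_j \langle x^t_i, x^s_j \rangle \\
& - \sum_{t=1}^{T} \sum_{i=1}^{m} \sum_{(p,q) \in \Pi_t} \alpha^t_i y^t_i \beta^t_{pq} z^t_{pq} \langle x^t_i, \Delta^t_{pq} \rangle \\
& - \frac{1}{2} \sum_{t=1}^{T} \sum_{(p,q) \in \Pi_t} \sum_{(k,l) \in \Pi_t} \beta^t_{pq} z^t_{pq} \beta^t_{kl} z^t_{kl} \langle \Delta^t_{pq}, \Delta^t_{kl} \rangle \\
\text{subject to:} \quad & \forall i, \forall t : \; 0 \leq \alpha^t_i \leq C, \quad 0 \leq \beta^t_{pq} \leq C'
\end{aligned}
\tag{4.10}
\]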

where $\alpha^t_i$ and $\beta^t_{pq}$ are Lagrange multipliers corresponding to the classification and rank constraints, respectively, and $\langle \cdot, \cdot \rangle$ is the canonical dot product. The dual optimization problem in this case can be formulated in a form identical to Equation 4.6, by using a new kernel function. Let $X^t \subset \mathbb{R}^d$ be the augmented data for task $t$ obtained by combining all the original data instances ($x^t_i$) and the pseudo-examples ($\Delta^t_{pq}$), and let $u^t_k$ be an indicator variable defined as:

\[
u^t_k =
\begin{cases}
1 & \text{if } x^t_k = x^t_i, \\
0 & \text{if } x^t_k = \Delta^t_{pq},
\end{cases}
\]

where $k \in \{1, \ldots, |X^t|\}$. Using these indicator variables, we obtain a new kernel function from the primal to dual transformation, which is given as:

\[
K_{st}(x^t_k, x^s_l) = \left( \frac{u^t_k u^s_l}{\mu} + \delta_{st} \right) x^t_k \cdot x^s_l \tag{4.11}
\]

This multitask kernel [30] does not allow the ranking function $\Pi_t$ to directly impact $w_0$, restricting the auxiliary label information from being shared among tasks. In this formulation, the optimal solution for $v_t$ does not change and remains identical to Equation 4.8. The difference lies in the optimal solution for the shared parameter vector $w_0$, which in this case is given as:

\[
w_0^* = \frac{1}{\mu} \sum_{t=1}^{T} \sum_{i=1}^{m} \alpha^t_i y^t_i x^t_i \tag{4.12}
\]

As expected, it can be seen that $w_0$ is no longer affected by the rank constraints.
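The effect of the indicator variables in Equation 4.11 can be seen in a small NumPy sketch. The function name `task_specific_gram`, the boolean flag array `is_original` (playing the role of $u^t_k$), and the toy data are all hypothetical and serve only to illustrate the kernel.

```python
import numpy as np

def task_specific_gram(X_aug, tasks, is_original, mu):
    """Gram matrix of the task-specific kernel (Equation 4.11).

    X_aug       : (n, d) array of original instances and pseudo-examples
    tasks       : (n,) integer array with the task index of each row
    is_original : (n,) boolean array, True for original instances (u = 1)
                  and False for rank pseudo-examples (u = 0)
    mu          : task-relatedness parameter
    """
    X_aug = np.asarray(X_aug, dtype=float)
    tasks = np.asarray(tasks)
    u = np.asarray(is_original, dtype=float)
    base = X_aug @ X_aug.T
    shared = np.outer(u, u) / mu                   # vanishes for pseudo-examples
    same_task = tasks[:, None] == tasks[None, :]   # delta_st
    return (shared + same_task) * base

# Toy data (assumed): one instance of task 0, one instance of task 1,
# and one pseudo-example that belongs to task 1.
X_aug = np.array([[1.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
tasks = np.array([0, 1, 1])
is_original = np.array([True, True, False])
K_ts = task_specific_gram(X_aug, tasks, is_original, mu=0.5)
```

The entry coupling the pseudo-example of task 1 with the instance of task 0 is zero, which reflects that rank information is not shared across tasks in this formulation.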

In both the globally-consistent and task-specific cases, the final optimiza-

tion takes the form of a standard quadratic program (QP) with box constraints

[19] which can be easily solved using any off-the-shelf QP-solver.1

1 In our implementation we used the QP solver included in the optimization toolbox for Matlab (http://www.mathworks.com).
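To illustrate what solving such a box-constrained QP involves, the sketch below maximizes $\sum_k a_k - \frac{1}{2} a^\top Q a$ subject to $0 \leq a_k \leq c_k$ by projected gradient ascent, where $a$ stacks the multipliers $\alpha$ and $\beta$ and $Q$ is the label-weighted Gram matrix. This is a deliberately simple stand-in for an off-the-shelf QP solver and is not the solver used in this work; the function name `solve_box_qp`, the fixed iteration count, and the toy matrix are assumptions.

```python
import numpy as np

def solve_box_qp(Q, upper, n_iter=2000):
    """Maximize sum(a) - 0.5 * a^T Q a subject to 0 <= a <= upper.

    Q     : (n, n) symmetric positive semi-definite matrix, Q_kl = z_k z_l K_kl
    upper : scalar or (n,) array of upper bounds (C for classification
            constraints, C' for rank constraints)
    """
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    upper = np.broadcast_to(np.asarray(upper, dtype=float), (n,))
    # Step size 1/L, with L the largest eigenvalue of Q (gradient Lipschitz constant).
    step = 1.0 / max(np.linalg.eigvalsh(Q)[-1], 1e-12)
    a = np.zeros(n)
    for _ in range(n_iter):
        grad = 1.0 - Q @ a                         # gradient of the dual objective
        a = np.clip(a + step * grad, 0.0, upper)   # project back onto the box
    return a

# Toy problem (assumed): the unconstrained optimum is (0.5, 2.0),
# so the second multiplier is clipped at its upper bound of 1.
Q = np.array([[2.0, 0.0], [0.0, 0.5]])
a_opt = solve_box_qp(Q, upper=1.0)
```

Because the formulation has no bias term, the dual has only box constraints and no equality constraint, which is what makes a simple projection step sufficient here.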

4.3 Detecting Cortical Malformations

Instead of classifying individual vertices we classify image patches taken from

the flattened reconstructed surfaces of patients and controls as “lesional” or

“normal”, using a classifier formed from training data that comprises healthy

controls and patients.

4.3.1 Data Description

The dataset consists of 16 MRI-negative and 2 MRI-positive FCD patients collected over a three-year period from the level-4 NYU comprehensive epilepsy treatment center. Out of the sixteen MRI-negative patients, thirteen overlap with the MRI-negatives used in the previous chapter, and three new patients were added. One of the patients, NY68, who was initially classified as MRI-negative, was declared MRI-positive on later examination; we therefore treat him as MRI-positive in our current evaluation. All the patients underwent successful resective surgery, and histopathological examination of the resected tissue showed evidence of FCD. This may seem a small set, but note that only a few MRI-negative patients proceed to surgery, and of those only a third have successful outcomes [57]. The controls were matched from a cohort comprising 115 neurotypical controls. All patients and controls were scanned on the same scanner, with the same specialized T1 MRI sequence.

Our training and test data comprise image patches taken from the

resected regions of the patients, and corresponding regions from matched con-

trols. We focus only on the resected regions because all the patients included

in our experiments were completely (11 Engel Class 1) or partially seizure-

free (4 Engel Class 2) after surgery, which shows that their resected regions

contained the primary FCD lesion(s). Furthermore, for new MRI-negative

patients, seizure-semiology (i.e., signs and symptoms of a seizure) provides

credible but crude estimation about the location of the epileptogenic zone

[81, 70, 101]. In these cases our methods can be applied to the suspected

cortical region to detect possible abnormal regions.

We learn all the patient-specific classifiers simultaneously using our proposed

MTL approaches. In the original regularized MTL framework (cf. Section 4.2.1),

the learned model is tested on previously left-out data from

the same tasks that generated the training data. However, in our case the

test data comes from new patients that were not part of the training data. To

generalize the model, such that it can classify data from out-of-sample tasks,

all the task-specific parameter vectors vt are discarded and only the mean

component w0 is retained for detecting FCD lesions for new patients [29].
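For the linear case, the step of discarding the task-specific vectors and keeping only the mean component can be sketched as follows, using the task-specific solution for $w_0$ in Equation 4.12. The function names and toy numbers are hypothetical; with a non-linear kernel the shared component would instead be evaluated through the corresponding kernel expansion rather than an explicit weight vector.

```python
import numpy as np

def shared_weight_vector(X, y, alpha, mu):
    """Shared component w0 = (1/mu) * sum_k alpha_k * y_k * x_k (Equation 4.12).

    X, y, alpha hold the training instances, labels and multipliers of all
    tasks stacked together; the task-specific vectors v_t are never formed.
    """
    X = np.asarray(X, dtype=float)
    coef = np.asarray(alpha, dtype=float) * np.asarray(y, dtype=float)
    return (X * coef[:, None]).sum(axis=0) / mu

def predict_new_patient(w0, X_new):
    """Label super-pixels of an unseen patient with the mean classifier only."""
    scores = np.asarray(X_new, dtype=float) @ w0
    return np.where(scores >= 0, 1, -1)

# Toy numbers (assumed) for illustration.
X_train = np.array([[1.0, 0.0], [0.0, 1.0]])
y_train = np.array([1, -1])
alpha = np.array([0.5, 0.25])
w0 = shared_weight_vector(X_train, y_train, alpha, mu=0.5)
labels = predict_new_patient(w0, np.array([[2.0, 1.0], [0.0, 3.0]]))
```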

4.3.2 Segmentation

All the patient and control surfaces were registered to an average surface, such

that there was a one-to-one correspondence between them. After registration

the resected region(s) for each patient and the corresponding region(s) from

his/her matched control were isolated and flattened to obtain a standard 2-d

image. We use cortical thickness to represent the intensity values, as cortical

thickness has been established as one of the most informative features for char-

acterizing FCD lesions [14, 96, 43]. It should be noted that even though we

only use cortical thickness to obtain the super-pixels, a larger set of morpholog-

ical features (including cortical thickness) is used to describe each super-pixel

during the learning phase.

We use Quickshift [106] for unsupervised segmentation. The standard quick

shift algorithm is a fast mode seeking algorithm similar to mean shift [23]. It

performs a hierarchical segmentation of the image, where the sub-trees repre-

sent image segments. One of the main advantages of using quick shift is that

the number and size of segments need not be specified. Additionally, quick

shift does not penalize for boundary regions, and produces a diverse set of

segments having different shapes and sizes.

Quickshift requires setting two parameters, namely the size of the Gaussian kernel ($\sigma_{QS}$) used by a Parzen window density estimator, and the maximum distance ($\delta_{QS}$) between two pixels permitted while remaining part of the same segment. The scale parameter $\sigma_{QS}$ is varied to change the average size of segments, and $\delta_{QS}$ is set to be a multiple of $\sigma_{QS}$ [105]. Thus, higher values of $\sigma_{QS}$ produce larger segments. All the patients and controls were segmented using the same set of Quickshift parameters ($\sigma_{QS} = 8$, $\delta_{QS} = 32$), which were optimized using a parameter estimation set of three patients that were kept distinct from the fifteen remaining patients whose results are reported here.

After segmentation, each super-pixel is treated as an independent instance.

We used the mean and standard deviation of cortical thickness, gray-white

contrast (GWC), curvature, sulcal depth, Jacobian distortion and local gyri-

fication index (LGI) to represent each super-pixel [96]. Additionally, we also

included the average surface area measured on both the pial and white matter

surfaces.
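The per-super-pixel description can be sketched as a simple aggregation of vertex-wise measurements. The function name `superpixel_features`, the array layout, and the use of the population standard deviation are assumptions made for illustration; the surface-area features are omitted for brevity.

```python
import numpy as np

def superpixel_features(vertex_features, segment_ids):
    """Summarize each super-pixel by the mean and standard deviation of its vertices.

    vertex_features : (n_vertices, n_features) array, e.g. thickness, GWC,
                      curvature, sulcal depth, Jacobian distortion, LGI
    segment_ids     : (n_vertices,) array with the super-pixel label of each vertex
    Returns the sorted segment labels and a (n_segments, 2 * n_features) matrix
    whose rows are [means..., standard deviations...].
    """
    vertex_features = np.asarray(vertex_features, dtype=float)
    segment_ids = np.asarray(segment_ids)
    labels = np.unique(segment_ids)
    rows = []
    for s in labels:
        block = vertex_features[segment_ids == s]
        # Assumption: population standard deviation (ddof=0).
        rows.append(np.concatenate([block.mean(axis=0), block.std(axis=0)]))
    return labels, np.vstack(rows)

# Toy data (assumed): three vertices, two features, two super-pixels.
feats = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 20.0]])
segs = np.array([0, 0, 1])
seg_labels, F = superpixel_features(feats, segs)
```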

4.3.3 Creating Electrode Maps

In order to include the additional label information from the iEEG exam we

need to map the implanted electrodes to the cortical surface for each patient.

However, the spatial resolution of the iEEG and MRI are not identical: the

MRI voxels (and their corresponding surface vertices) are smaller than the

point spread function that defines the iEEG generators. More importantly,

localizing the source for a given electrode is subject to the ill-posed inverse

Figure 4.1: Mapping iEEG electrodes on the cortical surface. The red spheres represent grid electrodes, and the blue spheres represent strip electrodes. During iEEG monitoring all the electrodes are monitored for abnormal electrical activity arising from clinical and sub-clinical seizures. Each electrode is then labeled as being part of the seizure-onset zone or not by expert epileptologists.

problem [112]. These are well-known limitations of iEEG that preclude accu-

rate and unambiguous assignment of voxels/vertices to an electrode. Epilep-

tologists and neurosurgeons face the same problem when deciding which parts

of the brain to resect based solely on iEEG results. Solving this problem is

beyond the scope of this work. However, a sphere with half the diameter of

the inter-electrode distance (approximately 10 mm), presents a reasonable cri-

terion for matching iEEG and MRI. Therefore, all surface vertices within a

radius of five millimeters (Euclidean distance) from the electrode’s location

were considered within range of the electrode. We only selected the electrodes

that overlapped with the resected region, and were labeled by the experts

as either being part of the seizure-onset zone or recorded abnormal electrical

activity during seizure onset. Super-pixels containing the selected electrodes

were given higher label scores as compared to other super-pixels in the resec-

tion zone. Figure 4.1 shows an example where the iEEG electrodes are mapped

onto the cortical surface.
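The matching criterion described above can be sketched in a few lines of NumPy. The function name `electrode_scores` and the particular score values (1.0 for super-pixels touched by a selected electrode, 0.0 otherwise) are assumptions; the text only requires that covered super-pixels receive a higher score than the rest of the resection zone.

```python
import numpy as np

def electrode_scores(vertex_xyz, segment_ids, electrode_xyz,
                     radius_mm=5.0, high=1.0, low=0.0):
    """Assign label scores to super-pixels from selected iEEG electrodes.

    vertex_xyz    : (n_vertices, 3) surface vertex coordinates in mm
    segment_ids   : (n_vertices,) super-pixel label of each vertex
    electrode_xyz : (n_electrodes, 3) coordinates of the selected electrodes
    A vertex is within range of an electrode if their Euclidean distance is
    at most radius_mm; a super-pixel containing such a vertex gets `high`.
    """
    vertex_xyz = np.asarray(vertex_xyz, dtype=float)
    electrode_xyz = np.asarray(electrode_xyz, dtype=float).reshape(-1, 3)
    segment_ids = np.asarray(segment_ids)
    dist = np.linalg.norm(vertex_xyz[:, None, :] - electrode_xyz[None, :, :], axis=2)
    covered = (dist <= radius_mm).any(axis=1)
    hit = set(np.unique(segment_ids[covered]).tolist())
    return {s: (high if s in hit else low) for s in np.unique(segment_ids).tolist()}

# Toy data (assumed): two vertices near the electrode, one far away.
verts = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [20.0, 0.0, 0.0]])
segs = np.array([0, 0, 1])
scores = electrode_scores(verts, segs, np.array([[1.0, 0.0, 0.0]]))
```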

4.4 Results

The lesions of all the 18 patients used in our experimental evaluation were

located in the temporal region, which is one of the most prevalent localizations

of FCD in adults [52]. We have chosen to work with this limited dataset,

in order to reduce computational complexity resulting from the number of

pseudo-instances created for incorporating ranking constraints (from iEEG) in

the proposed model (c.f. Section 4.2.2). All patients had undergone successful

resective surgery and were histopathologically verified to have FCD. There are two hypotheses that we need to establish based on our experiments: i) that the regularized MTL framework is more effective than traditional learning methods where the data from all patients is treated the same, and ii) that including auxiliary supervision in the MTL framework boosts the lesion detection rate.

We explain the selection of the baseline methods, the experimental setup, and

then provide the details of how the hyper-parameters were set for the proposed

methods and the baselines.

4.4.1 Baseline Selection

To show that the regularized MTL framework [31] can be used effectively to

detect FCD lesions, we adapt the LDA-based classification scheme of Hong

et al. [43] as one of the baselines. To this end we use LDA to classify super-pixels

using the same set of features as used by our proposed MTL methods.

Instead of using a two-stage classifier where the first classifier is used to de-

tect the lesional vertices and the second classifier post-processes the detection

results to reduce the number of false detections, we train a single LDA based

classifier that classifies super-pixels as being lesional or not. We also use our

vertex based approach (ML) from the previous chapter as one of the baselines,

without any post-processing to facilitate a fair comparison.

Furthermore, we compare the performance of our proposed methods with

the single task support vector machine with rank constraints (SVMR) based

on the formulation in Nguyen et al. [69]. As both SVMR and our proposed

methods use iEEG based auxiliary information, this comparison is aimed at es-

tablishing the efficacy of using an MTL formulation, for the task of FCD lesion

detection. Similarly, to highlight the benefit of incorporating auxiliary label

information we contrast the performance of our proposed task-specific (TS)

and globally-consistent (GC) approaches with the regularized MTL framework

[31] which does not incorporate any rank constraints.

4.4.2 Experimental Setup

A leave-one-out cross-validation strategy was used, in which we left out one

patient’s data and trained on the remaining patients. Hyperparameters for

the proposed methods and baselines were set using the data of three MRI-

negative (NY343, NY394, NY299) patients and their matched controls whose

iEEG data was not available. We will refer to this set of three tasks (i.e., three

patients and their matched controls) as the model parameter set (MPS). The

data for these three patients are distinct from the fifteen patients and controls

used in our experiments.
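The leave-one-patient-out protocol can be sketched as a small generator. The function name `leave_one_patient_out` is hypothetical, and the three identifiers are used purely as an example; each task bundles a patient with his/her matched control, so holding out a patient also holds out that control.

```python
def leave_one_patient_out(patient_ids):
    """Yield (training patients, held-out patient) for every patient in turn."""
    for held_out in patient_ids:
        train = [p for p in patient_ids if p != held_out]
        yield train, held_out

# Example with three of the patient identifiers.
splits = list(leave_one_patient_out(["NY67", "NY68", "NY148"]))
```

Hyperparameters are not tuned inside these folds; they are chosen on the separate model parameter set, so the held-out patient never influences model selection.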

Setting the Hyper-Parameters

It should be noted that the hyper-parameters were set individually for each test

subject using the MPS. Below we provide the details about how the parameters

were set for the different baselines and our proposed methods:

• LDA: The detection threshold ($\tau \in [0, 1]$) for LDA was optimized by maximizing the area under the curve (AUC) over the MPS, during each round of leave-one-task-out cross validation.

• ML: The detection threshold ($\tau \in [0, 1]$) for logistic regression was optimized by maximizing the area under the curve (AUC) over the MPS.

• SVMR: We also used a single-task SVM using the RBF kernel that incorporates ranking constraints based on the model in [69] as a baseline. In addition to the cost parameter (identical to the cost parameter of the traditional SVM) and the scale parameter of the RBF kernel, there is a third parameter $a$ that defines the relative cost of violating a ranking constraint ($C' = aC$). All three parameters were set by optimizing the AUC using the MPS.

• MTL: This corresponds to the regularized MTL framework that does not incorporate any auxiliary supervision, and uses the resection zone as class labels. The model parameters were set using a grid search [31] and include the mis-classification cost ($C$), the task-relatedness parameter ($\mu$) and the scale ($\gamma$) of the RBF kernel. To find suitable values for the parameters we used a three-level grid and optimized the area under the curve (AUC) over the MPS.

• GC & TS: These are the proposed methods that incorporate auxiliary supervision derived from iEEG data. In addition to the three model parameters for MTL, there is a fourth parameter $a$ that defines the relative cost $C' = aC$ of violating a rank constraint (cf. Equations 4.5 and 4.9). To find suitable values for the parameters we designed a four-level grid and optimized the AUC over the MPS.

Parameter    Range

$\mu$        $10^{-7}, 5^{-6}, 10^{-6}, 5^{-5}, \ldots, 10^{3}$

$C$          $2^{-10}, 2^{-9}, \ldots, 2^{10}$

$\gamma$     $2^{-10}, 2^{-9}, \ldots, 2^{10}$

$a$          $10^{-6}, 5^{-5}, 10^{-5}, \ldots, 10^{3}$

Table 4.1: Range of values for the model hyper-parameters used in the grid search. The grid search optimized the area under the curve (AUC) over the model parameter set (MPS) consisting of three patients whose data is distinct from the fifteen patients used for performance analysis.

Table 4.1 lists the ranges for the parameters used in the grid search for SVMR,

MTL, GC and TS.
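A grid search of this kind can be sketched as an exhaustive loop over the Cartesian product of the candidate values. The sketch below covers only the $C$ and $\gamma$ axes (powers of two from $2^{-10}$ to $2^{10}$); the $\mu$ and $a$ axes would be added as further keyword arguments in the same way. The helper names and the toy scoring function, which stands in for the AUC measured on the MPS, are hypothetical.

```python
import math
from itertools import product

def power_grid(base, lo, hi):
    """Return [base**lo, base**(lo+1), ..., base**hi]."""
    return [float(base) ** k for k in range(lo, hi + 1)]

def grid_search(score_fn, **grids):
    """Evaluate score_fn on every parameter combination and keep the best one."""
    names = list(grids)
    best_score, best_params = -math.inf, None
    for values in product(*(grids[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

C_grid = power_grid(2, -10, 10)
gamma_grid = power_grid(2, -10, 10)

def toy_auc(C, gamma):
    # Stand-in for the AUC on the MPS; peaks at C = 2^3 and gamma = 2^-2.
    return 1.0 - 0.01 * ((math.log2(C) - 3) ** 2 + (math.log2(gamma) + 2) ** 2)

best_params, best_score = grid_search(toy_auc, C=C_grid, gamma=gamma_grid)
```

The cost of the search grows multiplicatively with each added axis, which is one reason the number of pseudo-examples per task has a direct impact on total training time.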

4.4.3 Performance Analysis

We have developed the proposed methods keeping in view their final use as

focus-of-attention tools for neuroradiologists to help them detect visually elusive

FCD lesions. Therefore, the detection rate (sensitivity), i.e., the number

of patients whose lesions are correctly detected, constitutes the main result.

For a more detailed performance analysis we calculate the recall and the false

positive rate (FPR). Recall corresponds to the percentage of the vertices on the

patient’s resection zone correctly identified as lesional, while FPR corresponds

to the percentage of vertices incorrectly labeled as lesional on the matched con-

trol’s selected region. Note that the estimates of recall should be considered

as lower bounds because they are calculated using the noisy resection zones as

ground truth.
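The three quantities can be written down directly from these definitions. In the sketch below, predictions are lists of 0/1 flags per vertex (1 = labeled lesional); the function names and toy values are hypothetical.

```python
def recall(pred_resection):
    """Fraction of vertices in the patient's resection zone labeled lesional."""
    return sum(pred_resection) / len(pred_resection)

def false_positive_rate(pred_control):
    """Fraction of vertices in the matched control's region labeled lesional."""
    return sum(pred_control) / len(pred_control)

def detection_rate(recalls):
    """Number of patients with a recall greater than zero."""
    return sum(1 for r in recalls if r > 0)

# Toy predictions (assumed) for one patient and the matched control.
patient_recall = recall([1, 0, 0, 1])
control_fpr = false_positive_rate([0, 0, 1, 0])
n_detected = detection_rate([patient_recall, 0.0, 0.09])
```

Because every vertex of the resection zone is counted as a positive, lesional tissue that occupies only part of the resection necessarily yields a recall below one, which is why the reported values are best read as lower bounds.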

Table 4.2 compares the detection rate (number of patients with a recall

higher than zero), recall and false positive rate of the proposed task-specific

(TS) and the globally-consistent (GC) approaches with the selected baselines.

Among the baselines, MTL outperforms both LDA and ML by correctly


detecting the lesions in twelve patients, whereas LDA and ML detect the lesion

in eleven and nine patients, respectively. Not only does MTL achieve a higher

detection rate, it also has a higher average recall than both LDA and ML. On

the other hand, TS detects the lesion in all fifteen patients and achieves higher

average recall as compared to LDA, ML and MTL. Both TS and SVMR have

the same detection rate, but SVMR clearly outperforms TS as far as average

recall is concerned. The high detection rates of TS and SVMR show that

using iEEG based auxiliary supervision enhances the sensitivity of the lesion

detection scheme, but at the cost of an increased false positive rate. As far

as the FPR is concerned both SVMR and TS have higher FPRs than any of

the other methods. The average FPR of TS (11%) is substantially lower than that of

SVMR (37%), which is unacceptably high. The lower FPR of TS shows that an

MTL based formulation coupled with auxiliary supervision leads to superior

detection rate and lower FPR.

Turning now to GC, we see that it has the worst performance in terms of

detection rate among all the methods, and correctly detects lesions in only four

of the fifteen patients. The low detection rate of GC when compared to TS,

substantiates our assumption that the information obtained from the iEEG

analysis has task-specific semantics and the criteria used for determining the

seizure-onset zone differ on a patient by patient basis. When this informa-

tion was shared freely among tasks, the label noise was further compounded

resulting in an overall low detection rate for the GC method.

4.5 Conclusion

In this work we addressed the problem of MTL in the presence of uncertain

labels, assuming an additional source of supervision, that we modeled as a


                       Recall                     |          False Positive Rate
Id.      LDA   ML    SVMR  MTL   GC    TS    |  LDA   ML    SVMR  MTL   GC    TS
NY67     0.07  0.04  0.48  -     -     0.09  |  -     0.05  0.22  -     -     0.21
NY68*    0.11  0.19  0.39  0.28  0.09  0.25  |  0.18  0.14  0.56  0.32  0.15  0.29
NY148    0.17  -     0.52  0.22  0.02  0.11  |  0.07  0.03  0.26  0.08  -     0.03
NY169    -     0.12  0.30  0.20  -     0.20  |  0.1   0.10  0.35  0.17  -     0.13
NY186    0.15  0.03  0.60  0.03  -     0.03  |  -     0.02  0.26  0.03  -     0.03
NY187*   0.12  -     0.62  -     -     0.09  |  -     0.05  0.39  0.04  -     0.09
NY212    -     -     0.30  -     -     0.09  |  0.14  0.03  0.26  0.02  -     -
NY226    0.16  0.13  0.67  0.12  0.06  0.12  |  0.07  0.04  0.30  0.06  0.03  0.03
NY255    0.15  0.06  0.49  0.08  0.46  0.22  |  -     -     0.48  -     0.16  0.05
NY259    0.04  -     0.42  0.04  -     0.04  |  0.05  0.01  0.23  0.09  -     0.13
NY294    0.08  -     0.37  0.07  -     0.05  |  -     0.05  0.25  0.11  -     0.15
NY297    -     0.05  0.57  0.05  -     0.05  |  0.07  0.07  0.23  0.05  -     0.05
NY312    0.13  0.21  0.95  0.16  -     0.44  |  0.13  0.11  0.70  0.13  -     0.14
NY351    -     -     0.28  0.14  -     0.21  |  0.02  0.16  0.65  0.16  -     0.15
NY371    0.09  0.15  0.53  0.04  -     0.1   |  0.15  0.05  0.39  -     -     0.12
Mean     0.09  0.07  0.50  0.10  0.04  0.14  |  0.07  0.06  0.37  0.08  0.02  0.11

Table 4.2: Detailed results for MRI-negative subjects. LDA is the Fisher linear discriminant analysis based method adapted from [43], ML represents the stratified classification scheme described in Chapter 3, MTL represents regularized MTL [31] without auxiliary supervision, GC and TS are the globally-consistent and the task-specific approaches, respectively (‘-’ represents a value of zero for FPR and no-detection for recall and precision, ‘*’ MRI-positive patients).

pairwise ranking function. To this end, we extended the regularized MTL

framework [31] by incorporating additional rank constraints. We modeled

the case when there is a single ranking function for all tasks, and the case

where each task has its own ranking function. In the latter task-specific case,

we developed a new multitask kernel for ensuring that ranks are not directly

shared among tasks. In all cases, the model parameters were found by solving

a quadratic optimization problem (QP) with box-constraints [19], which is

solvable with any standard QP solver. We demonstrated the efficacy of the

proposed method on the challenging problem of detecting FCD lesions in TRE

patients.

By incorporating label scores from iEEG analysis, the task-specific ap-

proach and the SVMR baseline correctly detected lesional regions within the

resections of all patients, as compared to other baseline methods which achieved

lower detection rates. However, the task-specific approach achieved a lower

false-positive rate than SVMR. Even though the proposed task-specific ap-

proach is effective in identifying FCD lesions, it has the following limitations:

• For larger sets of patients, the number of pseudo-examples added to the

dataset for incorporating rank constraints can severely increase the training

time of the proposed algorithm. This is because, in the worst case, the number of

pseudo-examples grows quadratically, O(n^2), with the number of data instances n.

• It has been shown that the presence of outlying tasks can seriously

undermine the performance of multitask learning algorithms, and in some

cases even result in negative transfer (i.e., jointly learning tasks results

in degraded performance) [82, 54]. Therefore, using patients or controls

who are outliers in terms of their demographic characteristics, location

of the lesion, and other pathological findings can limit the sensitivity of

the proposed scheme.


• In certain cases the lesion may not completely co-register with the seizure

onset zone [81]. The cortical region corresponding to electrodes that are

active during a seizure may then actually correspond to normal tissue, and

the pairwise ranking function would falsely attribute higher weights to

normal instances.

In the next chapter, we cast lesion detection as an outlier detection problem

and develop a semi-supervised method which does not require vertex-level

labels for training. By discarding the need for vertex-level labels, we do away

with the need for iEEG as an auxiliary source of supervision to augment the

weak labels provided by the resected regions.

Identifying the abnormal region in cryptogenic epilepsies is based on a

confluence of evidence from multiple sources such as MRI, PET, iEEG, etc.

Even with the above mentioned limitations, the high sensitivity of the proposed

method can have a positive impact on FCD lesion detection by using carefully

selected patients and controls. Furthermore, the proposed method can be

applied to other domains in which decisions are made based on converging

evidence from disparate sources.


Chapter 5

Hierarchical Conditional Random Fields For Detecting FCD Lesions

“Sometimes it's not enough to know what things mean, sometimes you have to know what things don't mean.”

Bob Dylan

Most automated FCD lesion detection methods are vertex-based classifiers

[15, 96, 43], similar to the ones developed in Chapters 3 and 4. These studies

classify individual vertices of the cortical surface as lesional or normal, using

labeled training data from MRI-positive patients and controls. There are four

crucial issues that these methods and their evaluation studies fail to address:

(1) The goal of resective surgery is to remove the entire lesion. If any part

of the lesion is left behind, the outcome will not be successful. This introduces

label noise, because the expert-marked lesion can contain normal vertices; the

margin around the lesion is marked in a “generous” manner to increase the

chances of capturing the entire lesion. In Chapter 3 we used a stratified logistic


regression classifier to detect lesions in MRI-negative patients. By manually

reducing the resection masks for MRI-negative patients to correct for label

noise we were able to achieve a detection rate of 58%, as opposed to 12%

when the original resection masks were used as the ground truth. However, the

manual mask reduction procedure is ad hoc in nature and introduces human

subjectivity via the mask reduction thresholds.

(2) Individual vertices are assumed to be independent and identically dis-

tributed (i.i.d.). This is a strong assumption as it completely ignores the spa-

tial correlation that exists between neighboring vertices. It has been shown in

other domains such as object detection and segmentation in natural images,

that modeling spatial correlations leads to superior performance [78, 75].

(3) Vertex-based classification methods typically employ a post-processing

method to reduce the false positive rate. In this strategy a portion of the

vertices labeled lesional by the classifier are relabeled as normal. This can

be done by training a second-level classifier to classify the detected clusters

as lesional or non-lesional [14, 43]. Similarly, different heuristics can also be

used such as the surface area of the detected clusters [96]. Discarding any

detected region based on its size or surface area can result in discarding the

actual lesion or part of the lesion, because FCD lesions can be located in any

part of the cortex, vary in size, and occur in multiple lobes [18].

(4) Results are evaluated either on MRI-positive patients [14, 96] or on patients

who were initially deemed MRI-negative during their preliminary radiological

screening, but whose lesions were later found to be visible on MRI [43].

However, the real challenge is to find lesions in MRI-negative patients. A vi-

sually detected lesion during pre-surgical evaluation can substantially increase

the chances of a successful surgical outcome [57], and inform iEEG electrode

placement which can result in minimal sampling errors [43] and an accurate


delineation of the resection target.

In this chapter we develop a lesion detection method that is designed to

explicitly address these issues. First, we model lesion detection as an out-

lier detection problem. The assumption is that a lesional cortical region is

an outlier in a suitable feature space when compared to the same cortical re-

gion across a control (normal) population. This view eliminates the use of

noisy class labels, and consequently bypasses the need for any manual mask

reduction procedure.

Second, instead of classifying individual vertices we classify segmented

patches of the cortex. The patches are obtained using unsupervised segmen-

tation of the flattened cortex that isolates regions with homogeneous feature

values. As the size of the FCD lesions varies widely, using a single scale to

isolate the lesion may not be effective. To minimize the chances of missing

the lesion, we employ a multiscale strategy in which the segmentation is carried

out at different scales of varying granularity. The interplay between the

patches obtained in this scale hierarchy is modeled as a tree-structured conditional

random field (CRF) [91], rooted at the coarsest scale and having

leaves at the finest scale. These random fields are also known as hierarchical

conditional random fields (HCRF) [78], because they model the dependencies

between patch labels within the scale hierarchy. HCRFs are able to fully ex-

ploit the spatial dependencies in the data by classifying image patches rather

than vertices, and furthermore larger spatial interactions are explicitly cap-

tured by the HCRF, as detailed in Section 5.2.2.

Third, we define a ranking criterion which takes into account both the size

and probability of a cortical region (cluster) that is labeled as being lesional.

Ranking eliminates the need to post-process the results, and provides a nat-

ural way of presenting the results to a radiologist to function as a “focus of


attention” mechanism.

Finally, we evaluate our approach on MRI-negative patients whose resec-

tions contained the primary FCD lesion, confirmed by a histological exam on

the resected tissue. Approximately 45% of histologically confirmed FCD lesions

go undetected during visual inspection, i.e., the patients are MRI-negative

[110]. The chances of a successful surgical outcome in the presence of

a visually detected lesion are 66% as compared to only 29% when the lesion

is not detected [57, 99]. Therefore, patients who lack an MRI-visible lesion

are less likely to be referred to a specialized epilepsy center by neurologists [38]

and many epilepsy specialists are reluctant to operate without a well-defined

lesion. For these reasons, resective surgery remains underutilized, despite a

growing number of studies demonstrating that surgery is effective for patients

with focal TRE [9]. Development of computational methods of FCD lesion de-

tection that are able to achieve high sensitivity in MRI-negative cases, could

have a high impact on the number of patients who undergo resective surgery

and achieve better quality of life.

5.1 Hierarchical Conditional Random Fields

Hierarchical Conditional Random Fields (HCRFs) provide a suitable frame-

work for supervised image segmentation [78], object detection and semantic im-

age labeling [75]. In the original HCRF framework proposed for figure-ground

segmentation [78], an image is first segmented into a number of patches at dif-

ferent scales. Each patch is then classified as being part of the background or

foreground, using a suitable binary classifier based on image features such as

texture, SIFT, etc. Exploiting the fact that the labels assigned to overlapping

patches between different scales should agree, an HCRF (a tree-structured con-


ditional random field) is constructed to model these inter-scale interactions.

The image is thus modeled as a forest, where the root node for each tree cor-

responds to a patch obtained at the coarsest scale, whereas the leaves reside

at the finest scale. The joint probability of all patch labels is estimated by

running inference on the HCRFs. The image is segmented by thresholding the

final probabilities at the leaves. Plath et al. [75] extend this framework to

work with more than two classes. Multi-class image labeling using HCRFs is

also done in the work by Awasthi et al. [5], where instead of obtaining image

patches using segmentation, the authors impose a grid structure on the image

at different scales and model the HCRF as a quad-tree structure. These mul-

tiscale methods are highly sensitive to the accuracy of pixel-level labels. For

example in Murphy et al. [78], the bounding boxes around the region of in-

terest (ROI) in training images were manually refined to eliminate extraneous

pixels and this resulted in a significant increase in accuracy.

5.2 HCRFs for Lesion Detection

For FCD lesion detection, we have training data from MRI-negative patients

who have undergone surgical resection and are seizure-free. The resected cortical

region can be used to obtain vertex-level labels which can then be used

to train a classifier. However, as explained previously these labels tend to be

highly noisy and using them to train a classifier will result in noisy predictions

[1]. To ameliorate this problem we extend the HCRF framework proposed in

[78] to perform outlier detection on registered image data. In contrast to the

approaches mentioned previously, we cannot utilize vertex-level labels. Our

proposed method works in a semi-supervised manner, where only global labels

are available (i.e., whether the cortical surface belongs to a healthy control or


a patient). Thus, we define an FCD lesion as a region of the brain which is

considered an outlier when compared to the same region across a population

of normal controls.

The construction of the HCRF for FCD lesion detection involves the fol-

lowing steps [2]:

1. Segment the cortex at multiple scales, to obtain image patches of varying

sizes.

2. Assign an outlier score to each image patch by comparing it to the same

cortical region across the control population. This one-to-one comparison

is made possible by registering each of the controls’ and patients’ cortical

surface to the same average surface.

3. Construct multiple HCRFs, one for each image patch obtained at the

coarsest scale.

4. Run inference on the HCRFs to calculate the posterior probability at

each node. The final lesion is detected by thresholding the posterior at

the leaves.

We start by describing our approach to segmentation.

5.2.1 Segmentation

The functional organization of the cortex is two-dimensional, e.g., the func-

tional mapping of the primary visual areas [103]. Therefore, as an initial

simplification we have chosen to work with the flattened cortex because it

will simplify the segmentation procedure and allow us to use already well-

established image segmentation techniques. Using SBM the cortex is modeled

as a two-dimensional surface, which on average contains approximately 0.15


million vertices. Even though it is possible to flatten the entire cortex, its

segmentation and subsequent inference on the resulting HCRFs would require

significant computational resources. Thus, to reduce the computational load

we have chosen to subdivide the lesion detection task into smaller regions of the

cortical surface as defined by a standard neuro-anatomical atlas, which out-

lines cortical regions based on their morpho-functional properties [34]. These

regions are also known as parcellations. One such atlas is shown in Figure

5.1. Instead of segmenting the entire cortical surface at once, we isolate these

parcellations one at a time and flatten them individually to obtain a stan-

dard two-dimensional image, which we then segment at multiple scales. Any

morphological feature (e.g., cortical thickness, curvature, etc.), can be used to

represent the intensity values in the resulting image. Figure 5.1 illustrates the

overall HCRF construction process for a parcellation.

We use quick shift [106] for unsupervised segmentation. One of the main

advantages of using quick shift is that the number and size of segments need

not be specified. Additionally, quick shift does not penalize for boundary

regions, and produces a diverse set of segments having different shapes and

sizes. It should be noted that any segmentation method can be used, as long

as it has the ability to segment the image at different scales.

The standard quick shift algorithm is a fast mode seeking algorithm similar

to mean shift [23]. It performs a hierarchical segmentation of the image, where

the sub-trees represent image segments. It has two parameters namely the size

of the Gaussian kernel (σ) used by a Parzen window density estimator, and the

maximum distance (∆) between two pixels permitted while remaining part of

the same segment. We vary the scale parameter σ to change the average size

of segments, and set ∆ to be a multiple of σ [105]. Thus, higher values of σ

produce larger segments. By using different combinations of these parameters,


Figure 5.1: Constructing an HCRF using a standard neuro-anatomical atlas (left), and a parcellation image (top-right). Any morphological feature can be used to represent the image (this image was created using cortical thickness). At the bottom we have image patches obtained at two different scales using quick shift. Each image patch on the coarser scale (bottom-left) becomes a root having children at the adjacent finer scale (bottom-right).

we construct the scale-hierarchy that is the basic building block of the HCRF,

as explained next.

5.2.2 HCRF Construction

Once the multiscale segmentation is complete for a particular subject, we

obtain a set of patches at different scales. Let $I^k_p$ be the $p$th patch obtained at

the $k$th scale. We can collect the corresponding patches from all controls and

then estimate a label $y \in \{0, 1\}$ for $I^k_p$, where $y = 1$ indicates that $I^k_p$ is an

outlier. This label cannot be considered independent from the labels of other

patches that overlap with $I^k_p$ at other scales.

We model the joint prediction of these mutually dependent labels of all the

patches using a tree-structured HCRF. Let $I^{k+1}_p$ be an image patch at level

$k + 1$; it has a parent $I^k_q$ at the immediately coarser level $k$, such that $I^k_q$ has

maximal overlap with $I^{k+1}_p$ [78]. We find the index $q$ as follows:

\[
q := \arg\max_{q} \frac{|I^{k+1}_p \cap I^k_q|}{|I^k_q|} \tag{5.1}
\]


Each patch at the coarsest scale is the root of a tree having leaves at the finest

scale. Therefore, the parcellation image is represented by a forest, where each

tree is modeled as an HCRF, as shown in Figure 5.1.
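The parent assignment of Equation (5.1) can be sketched in a few lines. This is an illustrative sketch only: patches are represented as Python sets of pixel indices, and the function name and data layout are assumptions made for the example rather than details taken from the implementation. Ties are broken in favor of the first coarse patch, which is also an assumption.

```python
def assign_parents(fine_patches, coarse_patches):
    """For each patch at scale k+1, return the index q of its parent at scale k.

    fine_patches, coarse_patches: lists of non-empty sets of pixel indices.
    The parent maximizes |fine & coarse| / |coarse|, as in Equation (5.1).
    """
    parents = []
    for fine in fine_patches:
        # Overlap with each coarse patch, normalized by the coarse patch size.
        scores = [len(fine & coarse) / len(coarse) for coarse in coarse_patches]
        parents.append(max(range(len(coarse_patches)), key=scores.__getitem__))
    return parents


coarse = [{0, 1, 2, 3}, {4, 5, 6, 7, 8, 9}]
fine = [{0, 1}, {2, 3, 4}, {5, 6, 7}, {8, 9}]
parents = assign_parents(fine, coarse)

# Group children under each coarse patch; each group is one tree of the forest.
forest = {q: [p for p, parent in enumerate(parents) if parent == q]
          for q in range(len(coarse))}
```

Applying the same function between every pair of adjacent scales yields the forest described above, with one tree per patch at the coarsest scale.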

CRFs model the joint conditional probability distribution of all the patch

labels $y = (y_1, \ldots, y_n)$ in the tree based on the values of the input morphological

feature ($x$). Generally, this can be written as:

\[
p(y \mid x, \theta) = \frac{1}{Z(x, \theta)} \prod_{i} \phi(y_i \mid x, \theta) \prod_{i, \pi(i)} \psi(y_i, y_{\pi(i)}) \tag{5.2}
\]

where $\pi(\cdot)$ represents the parent patch, and $Z(x, \theta)$ is the normalization constant,

also called the partition function. $\phi(\cdot)$ is called the node potential and

represents the local evidence for the label $y_i$ based on the observed data $x$.

The edge potentials that model the coupling between adjacent labels are rep-

resented by ψ(.). Because the graph is a tree we can efficiently calculate Z(x, θ)

and the posterior probabilities of the patch labels at all scales using standard

belief propagation [73].
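To make the inference step concrete, the following is a minimal sketch of sum-product belief propagation on a single tree with binary labels, following the factorization in Equation (5.2). It is an illustration under stated assumptions, not the implementation used in this work: the tree is given as a parent array, potentials are plain Python lists, and all potentials are assumed to be non-negative with at least one positive entry per node.

```python
def tree_marginals(parent, phi, psi):
    """Posterior P(y_i = 1 | x) for every node of a rooted tree.

    parent[i]: index of the parent of node i (None for the root).
    phi[i]:    [phi_i(0), phi_i(1)], the node potential.
    psi[i]:    2x2 list, psi[i][a][b] is the edge potential for
               (y_i = a, y_parent = b); unused for the root.
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    root = parent.index(None)
    for i, p in enumerate(parent):
        if p is not None:
            children[p].append(i)

    # Breadth-first order: every parent appears before its children.
    order = [root]
    for node in order:
        order.extend(children[node])

    # Upward pass (leaves to root). inbox[i] is the product of child messages.
    up = [None] * n
    inbox = [[1.0, 1.0] for _ in range(n)]
    for c in reversed(order):
        p = parent[c]
        if p is None:
            continue
        local = [phi[c][a] * inbox[c][a] for a in (0, 1)]
        msg = [sum(local[a] * psi[c][a][b] for a in (0, 1)) for b in (0, 1)]
        total = sum(msg)
        up[c] = [m / total for m in msg]  # normalize to avoid underflow
        for b in (0, 1):
            inbox[p][b] *= up[c][b]

    # Downward pass (root to leaves).
    down = [[1.0, 1.0] for _ in range(n)]
    for c in order:
        p = parent[c]
        if p is None:
            continue
        # Everything the parent knows, excluding the message that came from c.
        rest = [phi[p][b] * down[p][b] * inbox[p][b] / up[c][b] for b in (0, 1)]
        msg = [sum(psi[c][a][b] * rest[b] for b in (0, 1)) for a in (0, 1)]
        total = sum(msg)
        down[c] = [m / total for m in msg]

    marginals = []
    for i in range(n):
        belief = [phi[i][a] * inbox[i][a] * down[i][a] for a in (0, 1)]
        marginals.append(belief[1] / (belief[0] + belief[1]))
    return marginals
```

Both passes visit each edge once, so the cost is linear in the number of patches in the tree. The partition function never has to be computed explicitly, because each belief is normalized locally.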

Traditionally, for conditional random fields the node and edge potentials

are jointly learned from labeled training data (see [91] for details). For our

application, because the labels are noisy and we have chosen to work without

vertex-level labels, we set the node and edge potentials separately, as

we describe next. Similar strategies for parameter estimation in HCRFs have

been used for figure-ground segmentation [78] and for object detection [75] in

natural images.

Node Potentials

The node potential is modeled to reflect our belief about the abnormality of

an individual image patch. Most of the available outlier detection mechanisms


produce outlier scores that are poorly calibrated, i.e., the range of the outlier

score is dependent on the dataset [87]. This makes it difficult to compare the

outlier scores among datasets produced by the same method. Popular outlier

detection methods such as local outlier factor (LOF) [20] and local correlation

integral (LOCI) [72] suffer from the same problem. In our case we would

like to work with an outlier detection method that produces standardized

scores carrying the same semantics at each scale and thus can be compared

between different scales. This is an important design choice because running

inference on non-standardized scores, not comparable among different scales,

will produce meaningless results. To overcome this, we have chosen to work

with local outlier probabilities (LoOP) [51], a standardized version of LOF

that produces standardized scores within the range [0, 1]; these scores can be

interpreted as the probability that a data point is an outlier.

LoOP assumes that each data instance x has a context set S ⊆ D, and the

set of distances between x and s ∈ S has a Gaussian distribution [51]. The

standard deviation of these distances σ(x, S) combined with a significance

factor λ produces the probabilistic set distance of x to S [51] defined as:

\[
\mathrm{pdist}(\lambda, x, S) := \lambda \cdot \sigma(x, S) \tag{5.3}
\]

where S is determined using a k-nearest neighbor query. The parameter λ

defines the sensitivity of the final probability estimates. It denotes that any

instance that deviates more than λ times the standard deviation would be

considered an outlier. Its values are analogous to the empirical confidence

levels defined for the standard normal distribution [51]. The probabilistic


local outlier factor for x can then be calculated in a manner similar to LOF:

\[
\mathrm{PLOF}_{\lambda, S}(x) := \frac{\mathrm{pdist}(\lambda, x, S)}{E_{s \in S}[\mathrm{pdist}(\lambda, s, S(s))]} - 1 \tag{5.4}
\]

PLOF values of greater than zero indicate that the given instance may be

an outlier. In order to convert a PLOF value into a probability estimate, we

assume that they are distributed around 0 with a standard deviation calculated

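as $\sqrt{E[(\mathrm{PLOF})^2]}$. The final probability can then be calculated as:

\[
\mathrm{LoOP}_S(x) := \max\left\{0, \operatorname{erf}\left(\frac{\mathrm{PLOF}_{\lambda, S}(x)}{\lambda \sqrt{2 E[(\mathrm{PLOF})^2]}}\right)\right\} \tag{5.5}
\]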

where, erf(.) is the Gauss error function [3].
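Equations (5.3) to (5.5) can be followed step by step in the sketch below. It is a brute-force illustration under several assumptions that are not stated in the text above: Euclidean distance is used, $\sigma(x, S)$ is computed as the root-mean-square distance from $x$ to its context set, the default $\lambda = 3$ is a placeholder, and the data are assumed to contain no duplicate points (which would make a denominator zero).

```python
import math


def loop_scores(points, k=10, lam=3.0):
    """Local outlier probabilities for a list of equal-length numeric tuples."""
    n = len(points)
    # Context set S(x): the k nearest neighbors of each point (brute force).
    nbrs = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        nbrs.append(others[:k])

    # Probabilistic set distance, Equation (5.3).
    pdist = []
    for i in range(n):
        mean_sq = sum(math.dist(points[i], points[j]) ** 2 for j in nbrs[i]) / len(nbrs[i])
        pdist.append(lam * math.sqrt(mean_sq))

    # Probabilistic local outlier factor, Equation (5.4).
    plof = []
    for i in range(n):
        expected = sum(pdist[j] for j in nbrs[i]) / len(nbrs[i])
        plof.append(pdist[i] / expected - 1.0)

    # Normalization and conversion to a probability, Equation (5.5).
    scale = lam * math.sqrt(2.0 * sum(v * v for v in plof) / n)
    if scale == 0.0:
        return [0.0] * n  # all points look alike; nothing stands out
    return [max(0.0, math.erf(v / scale)) for v in plof]


# Nine points on a regular grid plus one distant point.
data = [(float(x), float(y)) for x in range(3) for y in range(3)] + [(10.0, 10.0)]
scores = loop_scores(data, k=3)
```

In this toy example the distant point receives by far the largest score, while the grid points score at or near zero. In the setting of this chapter, the data set for a given patch would consist of the corresponding patches from the control population together with the patient's patch.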

Edge Potentials

Each edge in the HCRF represents the dependency between the “parent” image

patch at scale t and the “child” patch at scale t + 1. We set the edge potential

to reflect the visual similarity between the two patches, using the chi-squared

distance between the histograms of scale invariant feature transform (SIFT)

features [62] of the parent and child patches. Thus, the labels of image patches

that bear close visual similarity to each other in the scale hierarchy are more

strongly coupled than those with lower similarity. This heuristic is similar to

one chosen by [78].

To estimate the histograms of the SIFT features for each image, we initially

learn a codebook of m codewords using the control data. For each control

image in the subset we flatten and isolate the parcellation, and then calculate

a SIFT feature vector at each pixel. These vectors are then clustered into m

clusters using k-means clustering. Because each feature has its own range of values

and defines separate morphological properties of the cortex (see Section 2.2.3),

we learn a separate codebook for each parcellation/feature combination. The

edge potential between two adjacent nodes in the tree is then calculated as

[78, 75]:

\[
\psi(y_i, y_j) = \begin{pmatrix} e^{\gamma \eta_{ij}} & e^{-\gamma \eta_{ij}} \\ e^{-\gamma \eta_{ij}} & e^{\gamma \eta_{ij}} \end{pmatrix} \tag{5.6}
\]

where $\gamma$ is a free parameter that represents the strength of coupling between

adjacent levels in the CRF and $\eta_{ij} = e^{-\chi^2(x_i, x_j)}$. $x_l$ represents the normalized

histogram of SIFT features for the $l$th patch in the HCRF, and $\chi^2(\cdot, \cdot)$ is the

chi-squared distance between two normalized histograms each having $n$ bins

and defined as:

\[
\chi^2(P, Q) = \frac{1}{2} \sum_{i=1}^{n} \frac{(P_i - Q_i)^2}{P_i + Q_i} \tag{5.7}
\]

where, P and Q are normalized histograms.
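Equations (5.6) and (5.7) translate directly into code. The sketch below is illustrative: the function names are hypothetical, histograms are plain lists of bin frequencies, and bins that are empty in both histograms are skipped to avoid division by zero, which is an assumption not stated in Equation (5.7).

```python
import math


def chi2_distance(p, q):
    """Chi-squared distance between two normalized histograms, Equation (5.7)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def edge_potential(hist_i, hist_j, gamma=50.0):
    """2x2 edge potential of Equation (5.6), indexed as psi[y_i][y_j]."""
    eta = math.exp(-chi2_distance(hist_i, hist_j))
    agree = math.exp(gamma * eta)      # both patches carry the same label
    disagree = math.exp(-gamma * eta)  # the two labels differ
    return [[agree, disagree], [disagree, agree]]


psi_similar = edge_potential([0.5, 0.5], [0.5, 0.5])
psi_different = edge_potential([1.0, 0.0], [0.0, 1.0])
```

Visually similar parent and child patches give $\eta_{ij}$ close to 1, so label agreement is strongly favored. Dissimilar patches give a smaller $\eta_{ij}$ and a weaker coupling between the two labels.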

5.2.3 Lesion Detection

For each subject, we calculate the posterior probabilities at each node of the

HCRF for every parcellation by running belief propagation [73]. The final

detection is obtained by thresholding the posterior beliefs at the leaves of

each HCRF [78, 75]. Different strategies for thresholding can be used, such

as defining a single threshold across all subjects, or calculating a threshold

for each subject individually. In this work we calculate an adaptive threshold

for each patient separately. This decision is based on the observations that

1) FCD lesions can manifest differently for different individuals, and 2) the

morphological features vary with different demographic factors such as gender

and age. For example cortical thickness is correlated with the age of the patient

[84]. To this end, we sort the posterior probabilities and define the threshold

as the lowest probability among the top K probability estimates. In practice


the value of K can be left as a free parameter which the user can vary to see

the different regions deemed lesional with varying levels of confidence. Thus,

the radiologist has a knob to turn which shows more/fewer possible candidate

lesions. This is a desirable feature, because the detection scheme presented

here is designed to be a part of the comprehensive pre-surgical evaluation

protocol that includes MRI, Positron Emission Tomography (PET), scalp EEG

and iEEG. The final resection target is determined by combining evidence

from all evaluations. Therefore, the ability to generate multiple cortical maps

delineating possible lesions at different confidence levels provides a richer set of

evidence which in turn increases the probability of capturing the actual lesion.

5.3 Empirical Evaluation

Our data consists of MRI-negative patients who have undergone resective

surgery and whose resected tissue was histologically verified to contain

abnormal tissue. Each patient who undergoes surgery is assigned an “Engel”

class. An Engel class of 1 represents complete seizure freedom whereas an

Engel class of 4 represents no improvement. We selected only patients with an

Engel class outcome of 1 for our experiments in order to verify that the region

resected was indeed the primary lesion and that no additional epileptogenic

lesions were present in other parts of the brain. This resulted in a dataset with

twenty MRI-negative patients (refer to Appendix A for patient related infor-

mation). This may appear to be a small dataset, but few patients proceed to

surgery when no visible lesion is found on their MRI, and of those who do, fewer

than a third experience complete seizure freedom [57]. These twenty patients

include all MRI-negative patients who underwent surgery at New York Univer-

sity comprehensive epilepsy treatment center during the past three years, and


were classified post-surgically as Engel class 1. Developing automated lesion

detection mechanisms for MRI-negative patients is an active area of research and our

sample size is consistent with the existing work in the domain ([14, 96, 43]).

However, in contrast to our evaluation, other studies evaluate their proposed

detection schemes on MRI-positive patients (i.e., patients whose lesion was vis-

ible on the MRI during the initial evaluation or was found visually at a later

stage). It is important to note that our sample consists of “pure” MRI-negative

patients, and therefore our results target that patient population where there

is an actual need of an automated lesion detection scheme and where such a

scheme can have a positive impact on the outcome of resective surgery.

5.3.1 Data Pre-processing and Parameter Selection

After the surface has been reconstructed using the FreeSurfer software,¹ we used

the Desikan-Killiany atlas [25] to isolate the different parcellations. Note that

any suitable neuro-anatomical atlas can be used to subdivide the cortical sur-

face. Each parcellation is flattened to obtain a standard 2-d image, where the

intensity of each pixel can be represented by any one of the four morphological

features.

The values of the different parameters such as the segmentation scales,

number of nearest neighbors in calculating the outlier probabilities, etc., de-

pend on various factors, such as the size of the control population, the distri-

bution of ages across the control cohort and the gender of the subject. We

therefore present these parameters as actual free parameters that can be varied

over a preset range of values to get different detection results. Whether an im-

age patch is an outlier depends on the set of controls used to learn the “normal”

model. Most morphological features vary with different demographic factors

¹ Available at http://surfer.nmr.mgh.harvard.edu/


such as age, gender, education, etc. Ideally, we would choose a customized set

of controls for each patient. Currently we do not have enough controls to

customize for age and other factors, but we do select controls based on the

patient’s gender.

To select the parameters for the various aspects of our method, we used a

validation set consisting of two MRI-positive and two MRI-negative patients,

who are distinct from the patients used to evaluate our method. We used all

115 controls to learn a separate codebook of SIFT features for every parcella-

tion/feature combination. Dense SIFT features were calculated at each pixel.

We tested vocabulary sizes of 50, 100 and 500 and selected a vocabulary size

of 50 as it resulted in higher recall and precision on the validation set. This

codebook was used subsequently to estimate the histograms of SIFT features

at each pixel location for all patient parcellation images in the test set.

Each parcellation image was segmented at three different scales using quick

shift. We used σ = {2, 3, 4} and ∆ was set to 5σ. These values were chosen

such that the smallest possible lesion in our validation set is over-segmented

(i.e., there are multiple segments that contain the lesional area). This increases

the probability that a patch can be entirely formed from lesion vertices, rather

than having patches that partially overlap with the lesion, which would be

harder to detect as outliers. Based on these settings, the validation set resulted

in an average of 4255 ± 107 HCRF models per patient, with 19292 ± 373 leaves

at the finest scale using cortical thickness. Although the validation set

seems small as far as the number of patients is concerned, we conjecture that

the resulting number of HCRF models and number of instances are adequate

for setting the model parameters.

Finally, before performing outlier detection, we apply a standard dimensionality reduction technique, principal component analysis (PCA) [102], to each patch. Note that the PCA is fit using only the control data. We retain the top m principal components that account for 95% of the variance in the data. Based on results for the validation set (carried out independently for each feature), the parameters for outlier detection were set to k = 10 in LoOP

and γ (c.f. equation (5.6)) was set to 50.
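The rule for choosing m can be illustrated with a short sketch. It assumes the eigenvalues of the control-data covariance matrix are already available; the function name and the example eigenvalues are hypothetical.

```python
def num_components_for_variance(eigenvalues, target=0.95):
    """Smallest m such that the top-m eigenvalues explain at least `target` of the variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for m, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return m
    return len(ordered)


# Hypothetical eigenvalues estimated from control patches only.
control_eigenvalues = [6.0, 3.0, 0.6, 0.3, 0.1]
m = num_components_for_variance(control_eigenvalues)
```

Because the eigenvalues come from the controls alone, patient patches are projected onto a basis that describes normal variation, which is consistent with the note above that PCA uses only the control data.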

5.3.2 Evaluation Methodology

The final detection for each subject is determined by thresholding the pos-

terior probabilities at the leaves of the CRF, which represent the segments

obtained at the finest scale. We determine the detection thresholds by divid-

ing the last percentile of the final outlier probabilities into ten equal parts.

The first threshold corresponds to the lowest probability among the highest 0.1% of scores, and so on. For the results presented in this section we determine five such thresholds to get five different possible detections. Because this is an adaptive mechanism, it has the possible drawback that it always detects something, even when the probabilities are very small. We therefore set $1 \times 10^{-4}$ as the minimum probability, such that no threshold is calculated below this value.

This limiting value was selected based on the observation that any threshold

calculated below this value resulted in more than 80% of the cortex being la-

beled as lesional for the patients in the validation set. This lower bound on

the threshold is a free parameter of the model and can be adjusted according

to the needs of the user.
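The adaptive thresholding described above can be sketched as follows. This is one possible reading, not a definitive implementation: it assumes the i-th threshold is the lowest probability among the top i × 0.1% of leaf probabilities, and that thresholds falling below the floor are simply not produced. The function and parameter names are hypothetical.

```python
def detection_thresholds(probabilities, n_thresholds=5, floor=1e-4):
    """Adaptive thresholds taken from the top percentile of outlier probabilities.

    The i-th threshold is the lowest probability among the highest
    i * 0.1% of scores; thresholds below `floor` are not produced.
    """
    ranked = sorted(probabilities, reverse=True)
    thresholds = []
    for i in range(1, n_thresholds + 1):
        # Number of scores in the top i * 0.1% (integer ceiling, at least 1).
        k = max(1, (i * len(ranked) + 999) // 1000)
        value = ranked[k - 1]
        if value < floor:
            break
        thresholds.append(value)
    return thresholds


# Toy example: 10,000 evenly spaced probabilities in [0, 1).
probabilities = [j / 10000 for j in range(10000)]
thresholds = detection_thresholds(probabilities)
```

With the floor in place, a subject whose probabilities are all very small yields fewer thresholds (possibly none) instead of a detection that covers most of the cortex.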

5.3.3 Cluster Ranking

We have chosen to evaluate and contrast the performance of the detection

techniques in an information retrieval framework. We first calculate the clus-

ters by thresholding the posterior probability at a given threshold. All the

detected clusters are then ranked based on the following score function:

$$\mathrm{score}(c;\alpha) = \alpha\, s(c) + (1-\alpha)\, o(c), \qquad 0 \le \alpha \le 1 \tag{5.8}$$

where c is a cluster detected at a pre-defined threshold, s(.) ∈ [0, 1] is the

relative surface area of the cluster calculated as the ratio between the surface

area of c and the total surface area labeled as lesional. o(.) ∈ [0, 1] is a scoring

function that represents the degree of “outlier-ness” of the cluster. For the

HCRF, we model o(.) as the average of the outlier probabilities calculated at

each vertex that is part of the cluster. α is a tradeoff parameter such that

α = 1 defines a ranking that is based solely on cluster-size. Setting α to 0

results in a cluster ranking based only on their probability of being lesional.

Intermediate values of α define a ranking in which a smaller cluster detected at

a stringent threshold is ranked higher than a larger cluster detected at a more

lenient threshold and vice versa. In the ideal case clusters having a higher

rank should be within the lesion/resection zone of the patient.
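The ranking in Equation 5.8 can be sketched in a few lines. The sketch assumes that the total surface area labeled as lesional is the summed area of all clusters detected at the given threshold, and it represents each cluster by a hypothetical dictionary with `area` and `mean_prob` fields.

```python
def cluster_score(area, total_area, mean_outlier_prob, alpha):
    """score(c; alpha) = alpha * s(c) + (1 - alpha) * o(c), as in Equation 5.8."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    relative_area = area / total_area          # s(c) in [0, 1]
    return alpha * relative_area + (1.0 - alpha) * mean_outlier_prob


def rank_clusters(clusters, alpha):
    """Return the clusters sorted from highest to lowest score."""
    total_area = sum(c["area"] for c in clusters)
    return sorted(
        clusters,
        key=lambda c: cluster_score(c["area"], total_area, c["mean_prob"], alpha),
        reverse=True,
    )


# Hypothetical clusters: one large and weakly abnormal, one small and strongly abnormal.
clusters = [
    {"name": "A", "area": 80.0, "mean_prob": 0.2},
    {"name": "B", "area": 20.0, "mean_prob": 0.9},
]
by_size = [c["name"] for c in rank_clusters(clusters, alpha=1.0)]
by_prob = [c["name"] for c in rank_clusters(clusters, alpha=0.0)]
mixed = [c["name"] for c in rank_clusters(clusters, alpha=0.5)]
```

In this toy case the large cluster wins when α = 1, while the small, highly abnormal cluster wins both at α = 0 and at α = 0.5.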

Baseline Methods

We compare the results of our proposed technique against the univariate z-

score based technique reported in [96], and to our vertex based approach de-

veloped in Chapter 3. Both techniques require registration of the control and

patient surfaces to an average surface.

The z-score based baseline calculates the z-scores at each vertex for the patients, which are then thresholded to obtain the detection results. We calculate the z-score based on gender-matched controls instead of using all the controls. We have chosen this technique as the baseline method because i) it is a semi-supervised approach and does not require accurate vertex-level labels, and ii) it has been part of the pre-surgical evaluation at the NYU comprehensive epilepsy treatment center where the patients included in our evaluation

were treated.
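A minimal sketch of the vertex-wise z-score computation follows. It assumes the control values passed in have already been restricted to gender-matched controls and registered to the same average surface, and it uses the sample standard deviation; the function name and the cut-off of 1.5 are hypothetical and are not the settings used in [96].

```python
from statistics import mean, stdev


def vertex_z_scores(patient_values, control_values):
    """z-score of the patient's feature value at each vertex.

    patient_values[i] is the patient's value at vertex i;
    control_values[i] lists the values at vertex i across the matched controls.
    """
    scores = []
    for value, controls in zip(patient_values, control_values):
        scores.append((value - mean(controls)) / stdev(controls))
    return scores


# Toy example with two vertices and three controls (hypothetical values).
control_values = [[2.0, 3.0, 4.0], [1.0, 2.0, 3.0]]
patient_values = [5.0, 2.0]
z = vertex_z_scores(patient_values, control_values)

# Thresholding the absolute z-score gives the raw detection (cut-off is illustrative).
flagged = [abs(score) > 1.5 for score in z]
```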

We also compare the HCRF results against the vertex-based classification scheme developed in Chapter 3 when the detection results are combined across the four morphological features. Recall that the vertex-based scheme required manual reduction of the resection masks to eliminate label noise. We

perform all the pre-processing steps including mask reduction, and use leave-

one-patient-out cross-validation to obtain the detection for each patient in the

dataset.

We omit the last step of both baseline methods, which post-processes the

detections to eliminate “small” clusters [96], based on the cluster surface area.

To facilitate comparison we calculate multiple thresholds in the exact same

manner as outlined above for HCRF, and rank the clusters at each threshold

based on Equation 5.8. We next describe the measures used to evaluate our

proposed method.

Detection Rate

Detection rate is defined as the number of patients for whom one or more

detected clusters overlap with the resected area. Usually a post-processing step

is applied to the raw detections before estimating the detection rate. Hong et al. [43] train a classifier, using the training data, to distinguish between clusters detected in the resection/lesional area and extra-lesional clusters; this classifier is then applied to the clusters detected on the test subject before estimating the detection rate. Similarly, in Thesen et al. [96], all clusters below a pre-set size threshold are discarded, and a successful detection results if one or more of the remaining clusters overlap with the lesional area. Discarding any detected cluster based on its size increases the risk of discarding subtle lesions. Instead of discarding detected clusters, we use cluster ranking to estimate the detection rate. To this end, we calculate five thresholds based on the outlier probabilities for the HCRF method, and similarly for the z-score method. After ranking the detected clusters based on Equation 5.8, at each threshold we consider a subject to be correctly detected if a cluster amongst the top n (where n is relatively small as compared to the total number of detected clusters) completely or partially overlaps with the lesion/resection.

This produces more conservative estimates of the detection rate as compared

to approaches that do not use cluster ranking.
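The top-n criterion can be sketched as below. Clusters and resection zones are represented as sets of vertex indices, which is an assumption made for illustration; the function names and the toy patients are hypothetical.

```python
def is_detected(ranked_clusters, resection, top_n):
    """True if any of the top-n ranked clusters shares a vertex with the resection."""
    return any(cluster & resection for cluster in ranked_clusters[:top_n])


def detection_rate(patients, top_n):
    """Fraction of patients with at least one overlapping cluster among the top n."""
    hits = sum(
        is_detected(p["ranked_clusters"], p["resection"], top_n) for p in patients
    )
    return hits / len(patients)


# Two hypothetical patients; clusters are listed from highest to lowest rank.
patients = [
    {"ranked_clusters": [{1, 2}, {10, 11}], "resection": {11, 12}},
    {"ranked_clusters": [{5}, {6}, {7, 8}], "resection": {8, 9}},
]
rate_top2 = detection_rate(patients, top_n=2)
rate_top3 = detection_rate(patients, top_n=3)
```

The second patient is only counted once n reaches 3, which illustrates why a small n gives a more conservative estimate than counting any overlapping cluster.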

Precision and Recall

In order to compare the quality of detections, we calculated the precision and

recall for both HCRF and the z-score based method. To this end, we consider

all detected clusters at each threshold. We define recall as the ratio of the

total surface area of all the clusters that overlap with the resection zone to the

surface area of the resection zone. Similarly, we define precision as the ratio

of the surface areas of clusters overlapping with the resection zone to the sum

of the surface area of all the detected clusters.
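A sketch of area-based precision and recall is given below. The definition above can be read in more than one way; this sketch counts only the portion of the detected clusters that lies inside the resection zone, which keeps both measures in [0, 1]. That reading, the per-vertex area map, and the function names are assumptions made for illustration.

```python
def surface_area(vertices, vertex_area):
    """Total surface area of a set of vertices, given a per-vertex area map."""
    return sum(vertex_area[v] for v in vertices)


def precision_recall(clusters, resection, vertex_area):
    """Area-based precision and recall of the detected clusters."""
    detected = set().union(*clusters) if clusters else set()
    overlap_area = surface_area(detected & resection, vertex_area)
    detected_area = surface_area(detected, vertex_area)
    precision = overlap_area / detected_area if detected_area else 0.0
    recall = overlap_area / surface_area(resection, vertex_area)
    return precision, recall


# Toy surface with ten unit-area vertices (hypothetical values).
vertex_area = {v: 1.0 for v in range(10)}
resection = {0, 1, 2, 3}
clusters = [{2, 3, 4}, {7, 8}]
precision, recall = precision_recall(clusters, resection, vertex_area)
```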

Accurately calculating the false positive rate for the proposed detection

scheme is challenging for several reasons. A patient can have abnormalities

outside the lesion/resection zone which may not be epileptogenic. For example,

abnormal cortical thinning remote from the epileptogenic onset region has been

observed in focal epilepsy [63, 60] and attributed to the destructive impact of chronic seizures on brain structure rather than to malformations during cortical development. This might result in elevated extra-lesional false positives when detecting structural malformations characterized by abnormal cortical thickness. We have compared our detections on MRI-positive patients with the findings of an expert neuroradiologist (for details see Section 5.5). In 50% of the cases,

the expert identified abnormal regions that coincided with detections outside

the resection that would be classified as false positives using our evaluation

methodology which uses resection zones as the ground truth. This problem

becomes more challenging for MRI-negative patients whose structural abnor-

malities are not visible on their MRI. In order to circumvent the presence of

false negatives in our labeled data that would result in elevated estimates of

the false positive rate, we use precision to evaluate the efficacy of our proposed

scheme. Furthermore, based on the existence of structural abnormalities out-

side the resection zone (false negatives) the precision estimates provided here

should be treated as lower bounds.

5.4 Results

In this section we provide a comprehensive evaluation of the HCRF lesion de-

tection framework for MRI-negative patients. The first set of results deals with

using the four individual features (i.e., cortical thickness, curvature, GWC, and

sulcal depth). In the second phase we test different strategies of combining

the detection results obtained from individual features.

5.4.1 Individual Features

In our experiments we first evaluate the HCRF framework independently for

each of the four morphological features: cortical thickness, gray/white-matter

contrast, curvature and sulcal depth. We contrast the performance of the

HCRF framework with the z-score based univariate method [96]. In the next set of experiments we analyze different mechanisms of combining the detections from individual features, and use both baseline methods to contrast the performance. Recall that the ranking function (Equation 5.8) has a direct impact on the detection rate, and that by setting the tradeoff parameter we can assign more weight to either cluster size or the average cluster outlier probability. To facilitate comparison between the proposed method and the baselines we initially set the tradeoff parameter α to 1, in order to rank clusters only on their surface area.

Figure 5.2: Detection results for patient NY67 using cortical thickness shown on an inflated model of the lateral cortical surface. The resected region is delineated as the white circled region and the detection results are shown as filled yellow regions. It can be seen that large clusters are detected within the resection zone at the individual scales (i) and (iii) and small clusters are detected at scale (ii), prior to combining the outlier probabilities using HCRF. However, at the second scale (ii) a large cluster is detected outside the resection. When these findings are combined using the HCRF as shown in (iv) the largest detected cluster is within the resection zone while the false detection in (ii) is suppressed. (v) shows the detection made by the z-score vertex-based approach. The results are shown for the most stringent (first) threshold without any post-processing. (vi) shows the lesion highlighted on a T1 MRI slice.

Figure 5.3(a) shows the comparison of the detection rates for MRI-negative

patients when cortical thickness is used to represent the cortex. HCRF per-

forms better than the z-score baseline across all the five thresholds, for the

top five detections. HCRF detects the lesion in 14 (70%) patients, while the

baseline detects only 11 (55%) subjects when considering the top ten largest

clusters. HCRF is also able to achieve higher recall and precision as shown in

Figures 5.3(b)-5.3(c). The difference between the recall values of the proposed

method (1.1140 ± 0.5654) and the baseline (0.8035 ± 0.4745) was significant

at t(9) = 7.9927, p < 0.001. Similarly, the differences in precision for HCRF

(10.4710 ± 1.0248) and the baseline (9.0608 ± 0.5577) were found to be significant at t(9) = 6.1161, p < 0.001 using a paired t-test. Figure 5.2 provides an

example of the detected clusters using HCRF and the baseline for a patient.
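For reference, the paired t statistic reported above can be computed as sketched below. The nine degrees of freedom imply ten paired observations, presumably one pair per detection threshold. The recall values in the example are hypothetical and are not the values from the experiments.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical recall values (%) at ten thresholds for two methods.
hcrf_recall = [0.3, 0.5, 0.8, 1.0, 1.2, 1.3, 1.5, 1.6, 1.7, 1.8]
baseline_recall = [0.2, 0.3, 0.5, 0.6, 0.8, 0.9, 1.0, 1.1, 1.2, 1.2]
t_stat, dof = paired_t_statistic(hcrf_recall, baseline_recall)
```

The p-value follows from the t distribution with `dof` degrees of freedom; it is omitted here because the Python standard library does not provide that distribution.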

Using GWC, HCRF is able to detect abnormal clusters within the resection

zones of ten (50%) patients as opposed to the baseline that detects only nine

(45%), as shown in Figure 5.3(d). Figure 5.3(e) shows the recall for HCRF

method (0.6931±0.3702) that is significantly higher (t(9) = 7.1317, p < 0.001)

than the recall of the baseline method (0.3815±0.2334). Figure 5.3(f) compares

the precision of the HCRF and baseline using GWC. The differences in the

precision values for HCRF (7.5286±0.5769) and the baseline (6.4313±0.2987)

were found to be significant at t(9) = 4.2350, p = 0.0022 using a paired t-test.

Although HCRF outperforms the baseline using GWC, the resulting detection rate is worse than that of HCRF with cortical thickness.

Figure 5.3(g) shows the comparison of the detection rates using curvature

to represent the cortex. HCRF dominates the z-score baseline across all the five

thresholds, for both top five and top ten detections. HCRF detects abnormal

clusters within the resection zones of 13 (65%) patients, while the baseline

detects only 9 (45%) subjects when the top ten largest clusters are considered.

Figures 5.3(h)-5.3(i) show that HCRF is able to achieve higher recall and

precision, respectively. The difference between the recall values of the proposed method and the baseline was significant at t(9) = 8.825, p < 0.001. Similarly, the differences in precision for HCRF (8.5373 ± 1.2063) and the baseline (6.6313 ± 0.6597) were found to be significant at t(9) = 3.9135, p < 0.0035 using a paired t-test. Figure 5.4 shows the resulting detections from both the HCRF and the baseline when curvature is used to characterize the cortex for an MRI-negative patient.

Figure 5.3: Comparison of detection rates, precision and recall between the HCRF based approach and the baseline method using thickness (a)-(c), GWC (d)-(f), curvature (g)-(i) and sulcal depth (j)-(l). Here, α = 1 so that larger clusters are ranked higher (refer to Equation 5.8).

Figure 5.4: Detection results for patient NY294 based on curvature. The white outlined area represents the region that was resected, while the filled yellow patches represent the detected clusters at the first detection threshold for both the HCRF and the z-score based method. The detected clusters at the individual scales are shown in (i)-(iii). It can be seen that very small (almost negligible) clusters are detected that overlap with the resected region. However, after running belief propagation (iv) a large cluster is detected within the resection zone while the outliers are eliminated. (v) shows the results for the vertex-based z-score method while (vi) shows the lesion highlighted on a T1 MRI slice.

When sulcal depth is used to represent the cortex, both the HCRF and

the baseline method achieve the same detection rate. Both approaches are

able to detect abnormal clusters that overlap with the resections of 12 (60%)

patients (Figure 5.3(j)). However, as Figures 5.3(k)-5.3(l) show, HCRF is able

to achieve higher recall and precision values. The difference between the recall

values for HCRF (0.9585 ± 0.5013) and the baseline (0.4891 ± 0.3165) was

significant at t(9) = 7.7730, p < 0.001. Similarly, the differences in precision

for HCRF (8.7008± 0.8541) and the baseline (6.0124± 0.4679) were found to

be significant at t(9) = 8.0983, p < 0.001 using a paired t-test.

Using individual features, HCRF achieves a maximum detection rate of 70%, while the baseline has a maximum detection rate of 60%, when the top ten largest clusters are considered. For the baseline, sulcal depth and cortical thickness achieve higher detection rates than GWC and curvature. Cortical thickness outperforms all other features based on its average precision and recall. For the HCRF method, sulcal depth and curvature achieve identical performance, with GWC ranking the lowest.

An important consideration is the degree of consensus between the individ-

ual features with respect to the detected patients. If there is some degree of

disagreement among the features, then combining their detections can potentially increase the overall detection rate. Considering the top ten largest clusters,

two patients were not detected by any of the four features. Both cortical

thickness and curvature detect a combined total of 16 patients, differing on

one patient each. On the other hand all except a single patient detected by

GWC and sulcal depth were detected by either thickness or curvature. Based

on these results if we combine the output probabilities of all four features, and

then use the same thresholding and ranking technique we should be able to

achieve a detection rate that is higher than the detection rate of the individ-

ual features. We investigate the combination of all four features in the next

section.

5.4.2 Combining Features

In this section we explore the question of whether the HCRF based method

will achieve a higher detection rate if the detections of the individual features

are combined. As a first strategy, we can simply aggregate the posterior prob-

abilities as obtained by the application of HCRF to each individual feature.

Because every feature defines its own segmentation of a given parcellation im-

age, it is not possible to directly aggregate the probabilities obtained at the

leaves of the HCRF. To solve this issue we map the posterior probabilities ob-

tained at the leaves of the HCRF, back to the cortical surface for each feature

and then define a combination rule at every vertex. We use two basic aggregation rules: in the first, we average the probabilities across the four features, and in the second, each vertex is assigned the maximum of the four individual probabilities.

Aggregation based on averaging is similar to a majority vote rule. In this strategy, vertices for which most of the features have a high probability of being abnormal will be considered abnormal in the final detection. This has the effect of lowering estimation errors, leading to a lower false positive rate by smoothing the outlier probabilities at each vertex. The second strategy, which uses the maximum across the probabilities, would label a vertex as lesional even if only one of the features assigns it a high outlier probability. This would

lead to a higher detection rate along with a high number of false positives.
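The two aggregation rules can be sketched as follows, assuming the per-feature posterior probabilities have already been mapped back to a common set of cortical vertices. The function name and the example probabilities are hypothetical.

```python
def aggregate(per_feature_probs, rule="average"):
    """Combine per-vertex probabilities across features.

    per_feature_probs maps a feature name to a list of per-vertex
    probabilities; all lists must refer to the same vertices.
    """
    columns = list(zip(*per_feature_probs.values()))
    if rule == "average":
        return [sum(col) / len(col) for col in columns]
    if rule == "maximum":
        return [max(col) for col in columns]
    raise ValueError("rule must be 'average' or 'maximum'")


# Hypothetical probabilities at two vertices for the four features.
probs = {
    "thickness": [0.75, 0.25],
    "gwc": [0.25, 0.25],
    "curvature": [0.5, 0.25],
    "sulcal_depth": [0.0, 0.25],
}
averaged = aggregate(probs, "average")
maximum = aggregate(probs, "maximum")
```

At the first vertex only one feature is strongly abnormal, so the maximum rule keeps a high probability while averaging pulls it down, which is the tradeoff described above.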

Performance Comparison to Baseline Methods

We compare the performance of HCRF with combined features, to both the

z-score method (ZSC) [96] and the vertex-based classifier (ML) developed in

Chapter 3. For ZSC we calculate a single z-score estimate at each vertex by averaging (or taking the maximum) across the z-scores calculated for each individual feature. Similarly, for the ML technique we combine the probability calculated at each vertex as an average (or maximum) across the bag of logistic regression classifiers. Figures 5.5 and 5.6 contrast the results of HCRF under the two aggregation strategies with the z-score based baseline method and the logistic regression based method from Chapter 3 (ML), respectively.

Figure 5.5: Comparison of detection rates, precision and recall between the HCRF based approach and the z-score based baseline method when the detection scores are averaged across features (a)-(c), and when the final output score is computed as the maximum across features (d)-(f). (g) contrasts the detection rate of both aggregation strategies with that of the individual features when the top ten largest clusters are considered and (h)-(i) provide the same comparison for recall and precision. Note that α = 1 such that larger clusters are ranked higher (refer to Equation 5.8).

Figure 5.5(a) shows the detection rates when the probabilities are averaged across features. It can be seen that the baseline performs better than the HCRF at the early thresholds; however, HCRF produces better results as the threshold becomes more lenient. Considering the top ten largest clusters, HCRF achieves a detection rate of 60%, which is slightly higher than the baseline's detection rate of 55%. The recall and precision for the HCRF method are significantly higher than the baseline, as shown in Figures 5.5(b)-5.5(c). Using a paired t-test, the difference in the recall values of the HCRF (1.0206 ± 0.5797) and the baseline (0.7046 ± 0.4410) was significant at t(9) = 7.1317, p < 0.001, and the difference in the precision values of the HCRF (9.2946 ± 0.9708) and the baseline (8.1422 ± 0.5595) was significant at t(9) = 4.2350, p = 0.0022.

Figure 5.6: Comparison of detection rates, precision and recall between the HCRF based approach and the logistic regression based baseline method, when the detection scores are averaged across features (a)-(c), and when the final output score is computed as the maximum across features (d)-(f). Note that α = 1 such that larger clusters are ranked higher (refer to Equation 5.8).

Similarly, the ML baseline method achieves a maximum detection rate of 55%, which is lower than the detection rate of HCRF, as shown in Figure 5.6(a). As far as precision is concerned, ML achieves higher precision on average than HCRF. However, using a paired t-test, the difference in the precision values of the HCRF (9.2946 ± 0.9708) and the ML baseline (9.3424 ± 0.4004) was not significant at t(9) = −0.1975, p = 0.8479. On the other hand, HCRF has higher average recall (1.0206 ± 0.5797) than the ML method (0.8199 ± 0.4702), and the differences in the average recall values were significant at t(9) = 5.1051, p < 0.001.

When the posterior probability at each vertex is calculated as the maximum across the four features, HCRF achieves a higher detection rate than the z-score method, as shown in Figure 5.5(d). HCRF detects abnormal clusters within the resection zones of 13 (65%) patients, while the z-score method detects only 10 (50%) subjects and the logistic regression based method (ML) detects only 9 (45%) subjects when the top ten largest clusters are considered. HCRF achieves higher

recall (t(9) = 8.825, p < 0.001) and precision (t(9) = 3.9135, p < 0.0035)

than the z-score based baseline, as shown in Figures 5.5(e)-5.5(f), respectively.

As compared to the ML method, HCRF has higher average recall (t(9) =

2.3474, p < 0.05), but lower average precision (t(9) = −1.0694, p = 0.3127).

Performance Comparison to Individual Features

A comparison of the detection rate of both aggregation strategies with the

detection rates of the individual features is shown in Figure 5.5(g). We can

see that both perform worse than cortical thickness, achieving a maximum de-

tection rate of 65% when the top ten largest clusters are considered. The same

detection rate is also achieved by curvature. Similarly, based on recall and pre-

cision values we can see that both combination strategies fail to outperform

any of the individual features with the exception of GWC.

Both the averaging and maximum strategies achieved lower recall and pre-

cision than cortical thickness (Figures 5.5(h) and 5.5(i), respectively). In ad-

dition to cortical thickness, the maximum strategy achieved lower precision

than both curvature and sulcal depth. On the other hand, for the averaging technique the differences in recall and precision were not significant when

compared to curvature and sulcal depth.

One reason for the failure of the combined strategies to perform better is

that each feature has its own idiosyncrasies, which when not accounted for

will introduce noise in the ranking/detection process. As an example, consider sulcal depth and curvature. Both features, when used within the HCRF framework, achieve similar precision and recall but different detection rates. This shows that although sulcal depth detects clusters within the resection zones of patients, it also detects larger clusters outside the resection zone. If the

detections of sulcal depth and curvature are combined then the noisy clusters

detected by sulcal depth will cause a drop in the overall detection rate. There

are two possible solutions: 1) select only informative features and discard the

ones that are noisy, and 2) tune the tradeoff parameter (α) in the ranking

function (Equation 5.8) such that the ranks of smaller clusters that are highly

abnormal remain resilient to the presence of larger noisy clusters. It should

be noted that changing the ranking function will have no effect on the overall

precision and recall, because cluster ranking only influences the detection rate.

To explore option 1, feature selection, we selected cortical thickness and

curvature because of their higher detection rates, precision and recall. We

employ the same aggregation strategies as before, namely averaging and max-

imum. In Figure 5.7 we observe that when using only thickness and curvature,

both the averaging and maximum strategies produce higher detection rates,

precision and recall than the baseline (Figures 5.7(a)-5.7(f)).

More interestingly, when compared to individual features, the combination of curvature and cortical thickness achieves significantly higher precision and recall, with the exception of cortical thickness (the average precision and recall are higher but the differences are not statistically significant), as shown in Figures 5.7(h)-5.7(i), respectively. However, as Figure 5.7(g) shows, the detection rate, although higher than that of any of the feature aggregation strategies, does not exceed that achieved by thickness alone.

Figure 5.7: Comparison of detection rates, precision and recall between the HCRF based approach and the z-score based baseline method when the detection scores are averaged across thickness and curvature (a)-(c), and when the final output score is computed as the maximum between the two features (d)-(f). (g) contrasts the detection rate of both averaging and maximum with that of the individual features when the top ten largest clusters are considered and (h)-(i) compare the recall and precision. Note that α = 1 so that larger clusters are ranked higher (refer to Equation 5.8).

Next, we explore the effects of tuning the size/probability tradeoff param-

eter α, on the detection rate of both individual and combined strategies.

5.4.3 Ranking Criterion and the Detection Rate

Thus far we have fixed the ranking criterion to be the size of the detected clus-

ter. However, as defined in Equation 5.8, we can tune the tradeoff parameter

such that the cluster ranking criterion pays attention to both the size and the

average outlier probability of the cluster. To ascertain how α influences the

performance of HCRF framework, we varied α over its entire range of values

and determined the detection rate for individual features, the combination of

all features, and the combination of the top two features.

For a given input feature (or a combination of features) we divided the range of α ∈ [0, 1] uniformly into 21 points. At each resulting value of

α we re-estimated the ranking of the clusters. The detection rate corresponding

to each value of α was determined by taking the maximum number of patients

detected using the top ten ranked clusters, across the first five thresholds.

Figure 5.8(a) shows the detection rates of each individual feature for different

values of α. Both cortical thickness and sulcal depth achieve their maximum

detection rates when α = 1, whereas curvature does so for intermediate values

of α = 50, 75. On the other hand the detection rate of GWC drops as

α increases. Thus, every feature has its own idiosyncratic dependency on α

which should be taken into account when combining the outputs from multiple

features, especially because the goal is to improve the overall detection rate.
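The sweep over α can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: Equation 5.8 is not reproduced here, so `cluster_score` uses a stand-in convex combination of normalized cluster size and mean outlier probability, chosen only because it reduces to size-based ranking at α = 1. The data layout and all function names are hypothetical.

```python
def cluster_score(size, mean_prob, max_size, alpha):
    # Stand-in for Equation 5.8 (an assumption): alpha = 1 ranks by size only,
    # alpha = 0 ranks by mean outlier probability only.
    return alpha * (size / max_size) + (1.0 - alpha) * mean_prob


def detected(clusters, alpha, top_k=10):
    # clusters: list of (size, mean_prob, overlaps_resection) tuples for one
    # patient at one detection threshold (hypothetical layout).
    if not clusters:
        return False
    max_size = max(size for size, _, _ in clusters)
    ranked = sorted(
        clusters,
        key=lambda c: cluster_score(c[0], c[1], max_size, alpha),
        reverse=True,
    )
    # A patient counts as detected if any top-k cluster overlaps the resection.
    return any(hit for _, _, hit in ranked[:top_k])


def detection_rate(patients, alpha, top_k=10, n_thresholds=5):
    # patients: one entry per patient, each a list of per-threshold cluster lists.
    # The rate reported for a given alpha is the best rate over the first
    # n_thresholds detection thresholds.
    rates = []
    for t in range(n_thresholds):
        n_detected = sum(detected(p[t], alpha, top_k) for p in patients)
        rates.append(n_detected / len(patients))
    return max(rates)


# 21 uniformly spaced values of alpha in [0, 1] (step 0.05).
alphas = [i / 20 for i in range(21)]
```

Calling `detection_rate(patients, a)` for each `a` in `alphas` would yield one curve of the kind plotted in Figure 5.8(a), given per-patient cluster lists in the assumed layout.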

Figures 5.8(b)-5.8(c) compare the influence of α on the detection rates of the two combination strategies (averaging and maximum) when all the features are used and when only cortical thickness and curvature are used, respectively. The highest detection rate, 75%, results from using an averaging technique to combine the posterior probabilities of cortical thickness and curvature (Figure 5.8(c)).
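The two combination strategies amount to an element-wise average or maximum over per-feature probabilities. The sketch below assumes that each feature yields one posterior outlier probability per vertex and that the combination is applied vertex by vertex; the function name and the dictionary layout are illustrative only.

```python
def combine_posteriors(feature_probs, how="average"):
    # feature_probs: dict mapping a feature name to a list of per-vertex
    # posterior outlier probabilities (all lists must have the same length).
    columns = list(zip(*feature_probs.values()))
    if how == "average":
        return [sum(col) / len(col) for col in columns]
    if how == "maximum":
        return [max(col) for col in columns]
    raise ValueError(f"unknown combination strategy: {how}")


# Toy example with two features and two vertices.
probs = {"thickness": [0.25, 1.0], "curvature": [0.75, 0.5]}
averaged = combine_posteriors(probs, "average")
maximum = combine_posteriors(probs, "maximum")
```

Averaging rewards vertices on which the features agree, whereas the maximum lets a single confident feature dominate, which may explain why noisier features affect the two strategies differently.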

To contrast the performance of the HCRF-based method with that of the baseline

[Figure 5.8, panels (a)-(d): plots omitted. Panels (a)-(c) plot detection rate against α (%). Panel (d) plots detection rate for Z-Score and HCRF over the input settings Thickness, GWC, Curvature, Sulc Depth, Avg. All, Max All, Avg. Top2 and Max Top2.]

Figure 5.8: Effect of α on the detection rates of (a) individual features, (b) combination of all four morphological features and (c) combination of the top two ranked features. In (b) and (c) we have omitted the detection rates of sulcal depth and GWC to improve the clarity of the plot. (d) compares the overall median detection rate of HCRF with the baseline method using different input features and their combinations across the entire range of α.

on different input settings across the entire range of α, we used a two-sided Wilcoxon signed-rank test [46]. Figure 5.8(d) compares the median detection rate of HCRF and the z-score based method for different input features (and their combinations). For each individual feature HCRF achieves a higher detection rate than the z-score based baseline, except in the case of sulcal depth, for which the baseline outperforms HCRF. Similarly, when we combine cortical thickness and curvature to define the final detection, HCRF dominates the baseline, achieving the highest detection rate (75%) using an averaging technique to combine the posterior probabilities of the two input features. However, when the same averaging technique is used to combine the results of all four features, HCRF and the baseline perform comparably (albeit worse than when using only two features), and the difference in their performance across the different values of α is not statistically significant.
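For reference, a two-sided Wilcoxon signed-rank test on paired detection rates (one pair per value of α) can be sketched in pure Python as below. This is an illustrative implementation and not necessarily the procedure of [46]: it assumes that zero differences are dropped, that tied absolute differences receive average ranks, and that the p-value is computed from the exact null distribution. The function name is hypothetical.

```python
def wilcoxon_signed_rank(x, y):
    # Paired differences; zero differences carry no sign information and are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank |d| in ascending order, giving tied values their average rank.
    # Ranks are stored doubled so that half-integer average ranks stay integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    doubled_rank = [0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        for k in range(start, end + 1):
            doubled_rank[order[k]] = start + end + 2
        start = end + 1

    w_plus = sum(r for r, d in zip(doubled_rank, diffs) if d > 0)
    total = sum(doubled_rank)

    # Exact null distribution of the (doubled) positive-rank sum: under the null
    # hypothesis each rank is positive or negative with probability 1/2.
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in doubled_rank:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]

    # Two-sided p-value: twice the smaller tail, capped at 1.
    smaller = min(w_plus, total - w_plus)
    p_value = min(1.0, 2.0 * sum(counts[: smaller + 1]) / 2 ** n)
    return w_plus / 2.0, p_value
```

The function returns the sum of positive ranks and the p-value. In practice a library routine such as `scipy.stats.wilcoxon` would normally be preferred; the version above only makes the mechanics of the test explicit.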

5.5 HCRF versus Human Expert

In 2014, we compared the results of HCRF detection to the findings of the senior neuroradiologist Dr. Ruben Kuzniecky² at New York University’s comprehensive epilepsy treatment center, which is one of the world’s leading tertiary epilepsy treatment centers. Dr. Kuzniecky was first presented with the anonymized clinical scans of each subject, which included T1- and T2-weighted MRI along with FLAIR. He was then asked to identify any possible abnormality while blind to all post-surgical data and to the HCRF detection results. We compared Dr. Kuzniecky’s findings to the clusters found by HCRF at the most lenient (fifth) threshold, using cortical thickness as the feature of interest.

For this experiment, we required patients who had a full pre-surgical MRI dataset (e.g., T2-weighted images, FLAIR). Note that pre-surgical clinical MRI sequences were not obtainable for many subjects because they had been referred from external centers. This set of patients included six of the MRI-positive patients and only three of the MRI-negative patients. We also included three MRI-negative patients who did not have an Engel outcome of 1 (i.e., they were not seizure free) because we had their clinical scans.

² Dr. Kuzniecky is Co-Director of the NYU Comprehensive Epilepsy Center and Director of Epilepsy Research. He trained in neurology, epilepsy, and EEG at the Montreal Neurological Institute, McGill University, Canada. He has authored over three books, 36 chapters, and over 250 journal articles on a number of topics related to epilepsy, and has received epilepsy research grants from the NIH and numerous foundations. He is Co-Principal Investigator of the Epilepsy Phenome Genome Project (EPGP), the largest epilepsy genetic study of its type funded by the National Institute of Neurological Disorders and Stroke (NINDS). He is also the Co-Principal Investigator of the Human Epilepsy Project, a new international initiative to investigate biomarkers in epilepsy. His research interest focuses on brain imaging, malformations of brain development and epilepsy. Dr. Kuzniecky has been recognized for his efforts in the “Best Doctors in America” multiple times and with many honorary lectures around the world.

For the MRI-positive patients, the radiologist detected abnormalities in all six cases. These correlated with the HCRF detections for five subjects when considering the top five clusters, and for all six when considering the top ten clusters. For three of the MRI-positive subjects, his findings also identified abnormal regions that overlapped with extra-lesional clusters detected by HCRF that ranked among the top ten. An example is depicted in Figure 5.9. In the case of the MRI-negative subjects, the radiologist was unable to identify a visible abnormality in any of the six subjects. HCRF, on the other hand, identified three of the five subjects that had outcomes of Engel class 1-3. For the one MRI-negative subject with an Engel-4 outcome, HCRF found no cluster among the top ten that overlapped with his resection zone.

The preliminary results on the MRI-negative patients, albeit on a small

sample, are promising because the HCRF method is able to identify high

ranked clusters within the resection zones of MRI-negative patients who have

complete to partial seizure freedom after surgery. Overall, the results indi-

cate that the HCRF approach has a higher sensitivity to histopathologically-

confirmed lesions that are not visible to an expert radiologist, even when a full

set of clinical MRI sequences is available for review.

Figure 5.9: An MRI-positive patient (NY363) for whom the clusters detected outside the resection are also abnormal. Detected clusters using HCRF (a)-(b). The resected area (c) and the area corresponding to the largest cluster outside the resection (d), shown on a T1 MRI slice.

5.6 Conclusion

Any method of automated detection of FCD lesions is meant to augment the standard comprehensive clinical evaluation protocol for epilepsy surgery candidates. This standard protocol typically involves a neurological exam, scalp electroencephalography (EEG), neuropsychological exam, positron emission tomography (PET), and magnetic resonance imaging (MRI). Due to the common occurrence of widespread network abnormalities in focal epilepsy, each of these methods has a high false positive rate. Thus, convergence of evidence from multiple sources is critical to determining the region(s) with the highest likelihood of hosting the seizure onset zone. In this work, we addressed the challenging task of detecting FCD lesions in a semi-supervised image segmentation framework. To this end, we developed a novel semi-supervised image segmentation method based on hierarchical conditional random fields (HCRF). We evaluated the proposed method on four morphological features, and also investigated different mechanisms of combining the outcomes of these input features.

In an empirical evaluation that involved twenty histologically verified MRI-negative patients who had undergone resective surgery and were subsequently seizure-free, our proposed method achieved higher detection rates using four morphological features than a baseline method. When the detections based on these features were combined, HCRF was still able to detect abnormal clusters within the resection zone of a higher number of patients than the selected baseline. Not only did the proposed method have a high detection rate, it also achieved significantly higher precision and recall across all features and their combinations.

In this work we establish that each of the four morphological features, namely cortical thickness, GWC, curvature and sulcal depth, exhibits different behavior for different settings of the cluster ranking criterion, and that some of them produce noisier detections than others. These two observations show that any method that aims to combine detections from different features should consider feature-specific properties such as the false positive rate and adjust the ranking criterion to achieve a higher detection rate.

Because identifying the abnormal region in cryptogenic epilepsy is a multifaceted procedure that is based on a confluence of evidence from multiple sources, the high detection rate of our proposed method will have a deeper impact in the application domain by enhancing the sensitivity of the patient evaluation methodology, as compared to the conventional visual assessment of the patient’s MRI by trained radiologists. Indeed, our 75% detection rate on the MRI-negative patients in our evaluation dataset (compared to a human expert detection rate of 0%) suggests that this method can be used as an effective tool in the pre-surgical evaluation of TRE patients who are likely to undergo surgical resection.


Chapter 6

Conclusion

This thesis demonstrates that machine learning methods can be used to in-

crease the sensitivity of identifying epileptogenic cortical malformations in

treatment-resistant epilepsy patients, whose MRI evaluations are deemed nor-

mal by expert neuro-radiologists.

We investigated the main confounding factors that inhibit performance of supervised learning algorithms for lesion detection in Chapter 3, and designed a customized classification scheme tailored specifically to counter them. Our analysis showed that using training data from both MRI-negative and MRI-positive patients, coupled with data pre-processing, led to a higher detection rate for MRI-negative patients than a current automated lesion detection method that is actively used at New York University’s comprehensive epilepsy treatment center (our data collection source). In the course of this analysis we identified label noise as the leading confounding factor, along with inter-subject and intra-subject variations in brain morphology. To counter label noise we performed ad hoc manual reduction of the resected region in patients, which led to enhanced performance.

Chapter 4 proposed a novel multitask learning algorithm able to incorporate multiple sources of supervision. From the perspective of lesion detection, we reinforced the weak vertex labels provided by the resected regions by using the results of the invasive iEEG exam as an added source of supervision. Similarly, we treated each patient as a separate learning task, with the goal of countering the effects of inter-subject variations in brain morphology. Our evaluation on a dataset consisting of patients with identical regions of resection and their matched controls showed that the proposed algorithm was able to detect abnormal regions within the resected regions of all patients. We further established that using auxiliary supervision increased the detection rate, and that a multitask formulation reduced the false positive rate measured on control data, as compared to recently reported methods of lesion detection.

Identifying label noise as the main confounding factor when using training

data from MRI-negative patients, we cast lesion detection as an outlier detec-

tion problem in Chapter 5. To this end, we developed a lesion detection frame-

work based on semi-supervised hierarchical conditional random fields (HCRF).

This method employed a multiscale strategy to locate the lesion, and explic-

itly modeled spatial dependencies among neighboring vertices. Furthermore,

we proposed a cluster ranking criterion that was able to rank clusters based

on their size and average probability of being abnormal. We evaluated this

method on both MRI-negative and MRI-positive patients, and the proposed

method was able to achieve higher detection rates, with higher precision and

recall as compared to two baseline methods. Furthermore, HCRF correctly de-

tected abnormal regions that overlapped with the resection zones of a higher

number of MRI-negative patients as compared to an expert neuroradiologist

who was unable to detect any abnormalities even with access to more compre-

hensive imaging data.


One of the primary aims of the proposed methods for lesion detection in MRI-negative patients has been to counter the effects of label noise. To this end, we performed manual mask reduction (Chapter 3), augmented the weak labels with the results of iEEG as another source of supervision (Chapter 4), and discarded vertex-level labels altogether, defining cortical lesions as outliers relative to the same regions in normal (control) brains (Chapter 5). Other confounding factors are the inter-regional variation in feature distributions within a brain and the inter-subject variation in brain morphology arising from age, handedness and other demographic factors. The vertex-level classifier developed in Chapter 3 z-score normalized the feature values using control data prior to training, to counter the effects of inter-subject variation. Similarly, the multitask method developed in Chapter 4 modeled each patient (and a matched control) as a separate task to reduce the effects of differences in brain morphology between different individuals. However, neither of these methods took inter-regional variation of feature values into account. On the other hand, the HCRF framework (Chapter 5) models both inter-regional and inter-patient variations by comparing a given cortical region from a patient to the same region across the control population, and assigning outlier scores based on the k most similar controls (in a suitable feature space). By correcting for all three factors (label noise, inter-regional and inter-subject variation), HCRF is able to achieve higher performance than the other two methods.
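As a reminder of what z-score normalization against control data involves, the sketch below standardizes each vertex of a patient's feature map using the mean and standard deviation of the same vertex across controls. It is a minimal illustration: the per-vertex granularity, the handling of zero variance, and the function name are assumptions rather than details taken from Chapter 3.

```python
from statistics import mean, stdev


def zscore_against_controls(patient_values, control_values):
    # patient_values: per-vertex feature values for one patient.
    # control_values: one list of per-vertex values per control subject
    # (at least two controls are needed to estimate a standard deviation).
    z_scores = []
    for vertex, value in enumerate(patient_values):
        control_at_vertex = [control[vertex] for control in control_values]
        mu = mean(control_at_vertex)
        sigma = stdev(control_at_vertex)
        # Assumed convention: a vertex with no variation across controls gets z = 0.
        z_scores.append((value - mu) / sigma if sigma > 0 else 0.0)
    return z_scores


# Toy example: three controls, two vertices.
controls = [[2.0, 3.0], [4.0, 3.0], [3.0, 3.0]]
z = zscore_against_controls([5.0, 3.0], controls)
```

A large absolute z-score marks a vertex whose feature value is unusual relative to controls at that location, which is the quantity the z-score based baseline thresholds.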

The superior performance of the methods proposed in this thesis as compared to both existing methods and human experts shows that machine learning based automated detection of epileptogenic malformations can serve as an additional information source that can add weight to candidate brain regions for further radiologic or electrophysiological probing, as part of the pre-surgical protocol. Ultimately, such methods can serve as an additional check to ensure that subtle, visually elusive regions are not overlooked. This is critical, considering that expert radiologists were unable to identify an abnormality in any of the MRI-negative patients, whereas the detection rates of the proposed methods varied from 58% to 75%. This suggests that some electrophysiologically and histopathologically abnormal regions are not visually apparent to the human eye but can be detected with machine learning methods. However, similar to all other informational sources in the comprehensive epilepsy evaluation, results from automated detection methods will never be considered in isolation to identify a target brain region for surgical removal, but could become a valuable addition to the clinical comprehensive epilepsy evaluation protocol, particularly if they add weight to potentially abnormal regions that may have been overlooked by standard radiological review. Similarly, the results of the proposed methods can inform electrode placement as part of the iEEG evaluation, thus increasing the chances of finding the lesion(s) prior to the surgical procedure. The availability of such abnormal regions prior to surgical resection has been shown to increase the chances of a patient being seizure-free (66%) as compared to the case when no potential targets are detected (29%).

Future avenues of research include the use of features derived from other imaging techniques such as positron emission tomography (PET), fluid-attenuated inversion recovery (FLAIR), etc. Similarly, morphological features can be combined with network-based features derived from different sources such as diffusion tensor imaging (DTI), diffusion weighted imaging (DWI), functional MRI (fMRI) and magnetoencephalography (MEG). There is also a need for the development of automated lesion detection methods that can work with data collected at different imaging centers (i.e., heterogeneous scanners and scanning sequences), as most of the current methods (including the work presented in this thesis) rely on single-source data.


Appendix A

Patient Information

Participants were selected from a large registry of patients with epilepsy treated

at the New York University School of Medicine Comprehensive Epilepsy Center

who signed consent for a research MRI scanning protocol. Criteria for inclusion

in this study included: (1) completion of a high resolution T1-weighted MRI

scan; (2) surgical resection to treat focal epilepsy; (3) diagnosis of FCD on

neuropathological examination of the resected tissue. This appendix provides

the demographic and seizure-related information for these participants.

Imaging for the research protocol was performed at the New York University Center for Brain Imaging on a Siemens Allegra 3T scanner. Image acquisitions included a conventional 3-plane localizer and a T1-weighted volume pulse sequence (TE = 3.25 ms, TR = 2530 ms, TI = 1100 ms, flip angle = 7 deg, field of view (FOV) = 256 mm, matrix = 256x256, voxel size = 1x1x1.3 mm, scan time: 8:07 min). Acquisition parameters were optimized for increased gray/white matter image contrast. The T1-weighted image was reoriented into a common space, roughly similar to alignment based on the AC-PC line. Images were corrected for nonlinear warping caused by non-uniform fields created by the gradient coils.


Clinical imaging sequences for radiological review were acquired at the

NYU Department of Radiology on a 3-Tesla Siemens scanner. Clinical se-

quences were variable across patients but commonly included a high-resolution

T1-weighted MPRAGE (magnetization-prepared rapid gradient echo), T2-

weighted images (axial and coronal, varying slice thickness from 1 to 3 mm),

and fluid-attenuated inversion recovery (FLAIR) images (26 mm slice thick-

ness). The research T1-weighted MPRAGE images used in our analyses were

included in the set of images reviewed by the clinical radiology team. Con-

ventional visual analysis of the clinical scans resulted in an MRI diagnosis of

FCD in 13 patients (MRI-positive) and a normal report in 24 patients (MRI-

negative). The higher number of MRI-negative patients in this sample may

be due to a tendency for patients with more complex, MRI-negative epilepsy

to be referred to the Level 4 epilepsy treatment center.

Table A.1 provides the demographic and seizure-related information for the MRI-positive patients. Note that three MRI-positive patients, namely NY68, NY116, and NY169, were initially classified as MRI-negative; however, on a later evaluation they were re-classified as MRI-positive. These three patients appear as MRI-negatives in Chapter 3.

Table A.2 provides the demographic and seizure-related information for the MRI-negative patients. All except three Engel class I patients were used for evaluating the HCRF framework (Chapter 5). These three patients were NY212, NY297 and NY312. NY212 was discarded because of missing scan sequence data, while the other two patients were discarded because they had incomplete post-surgical histopathological data.


Patient  Location            Age    Sex  Seizure Onset Age  Seizure Frequency  Engel Class
NY68     L Temporal          26     M    15                 12                 2
NY116    R Temporal          30     M    22                 84                 1
NY123    L Parietal          14     M    7                  730                2
NY143    R Frontal           38     F    4                  1248               1
NY156    L Parietal & Frontal  20   M    7                  182                2
NY169    R Temporal          26     M    3                  1277               1
NY174    R Temporal & Frontal  16   M    9                  52                 2
NY187    L Temporal          45     F    5                  14                 1
NY342    R Temporal & Frontal  47   F    32                 12                 3
NY363    L Temporal          24     F    13                 52                 3
NY388    L Frontal           30     F    8                  1640               1
NY453    R Temporal          18     F    8                  12                 1
NY459    L Basal Frontal     17     M    1.5                365                1
Mean                         27.00       10.35              436.92

Table A.1: Demographic and seizure-related information for the MRI-positive patients.


Patient  Location                       Age    Sex  Seizure Onset Age  Seizure Frequency  Engel Class
NY46     R Temporal                     41     M    3                  52                 1
NY67     R Temporal                     27     M    13                 1825               1
NY72     R Temporal                     46     M    2                  74                 2
NY148    L Temporal                     37     M    35                 3                  2
NY149    R Frontal                      32     F    11                 1460               1
NY159    R Parietal                     21     F    8                  2190               1
NY171    R Temporal                     26     F    19                 5                  4
NY177    L Temporal                     38     F    19                 5                  3
NY186    L Temporal                     35     F    6                  1095               2
NY212    L Temporal                     37     M    21                 166                1
NY226    L Temporal                     40     F    5                  8                  1
NY255    R Temporal                     20     F    15                 48                 1
NY259    L Temporal                     26     F    9                  288                2
NY294    R Temporal                     51     F    1                  12                 1
NY297    R Temporal                     51     F    8                  52                 1
NY299    R Temporal                     28     F    13                 37                 2
NY312    L Temporal                     43     F    6                  24                 1
NY315    L Occipital                    47     F    9                  12                 1
NY322    R Frontal, Insular & Temporal  24     F    9                  12                 1
NY338    R Temporal                     30     M    19                 120                1
NY343    R Temporal                     32     M    21                 1825               1
NY351    L Temporal                     30     M    12                 12                 1
NY371    R Temporal                     17     M    17                 365                1
NY375    R Temporal                     16     F    2                  54                 1
NY394    R Temporal                     27     M    19                 72                 1
NY404    R Temporal                     51     F    45                 6                  1
NY441    L Temporal                     41     M    31                 72                 1
NY451    R Inferior Parietal            25     M    9                  912                1
NY455    L Temporal                     61     M    19                 12                 1
NY486    L Temporal                     29     F    27                 96                 1
Mean                                    34.30       14.43              363.80

Table A.2: Demographic and seizure-related information for the MRI-negative patients.


Appendix B

HCRF Results for MRI-Positive Patients

We tested the proposed HCRF framework on a set of thirteen MRI-positive patients (refer to Appendix A for patient information), using four individual morphological features: cortical thickness, gray/white contrast (GWC), curvature and sulcal depth. HCRF was evaluated using parameter settings identical to those described in Section 5.3.1. The detection rate was determined by setting α to 1 (cf. Equation 5.8), so that all clusters are ranked based only on their surface area. We include these results to demonstrate the superior performance of HCRF as compared to the z-score based baseline method.

Figure B.1: Detection results for an MRI-positive patient (NY156) shown on an inflated model of the lateral cortical surface. The actual lesion is delineated as the white circled region and the detection results are shown as filled yellow regions. Detected clusters after thresholding outlier probabilities at each individual scale (a)-(c), after running belief propagation (d), and using the z-score based approach (e). The results are shown for the highest ranking threshold without any post-processing. (f) shows the lesion highlighted on a T1 MRI slice.

Figure B.2(a) shows the comparison of the detection rates for MRI-positive patients when cortical thickness is used to represent the cortex. HCRF performs better than the z-score baseline using both the top five and the top ten detections. HCRF detects the lesion in 12 (92%) patients, while the baseline detects it in 11 (85%) subjects when considering the top ten largest clusters. HCRF is able to achieve higher recall, as shown in Figure B.2(b). The difference between the recall values of the proposed method (14.4321 ± 0.6078) and the baseline (2.0878 ± 1.0392) was significant at t(9) = 87.0053, p < 0.001, using a paired t-test. As far as precision is concerned, HCRF has higher precision values (19.6634 ± 0.6749) than the baseline (19.1915 ± 1.1462); however, the differences in the values were not significant. Figure B.1 provides an example of the detected clusters using HCRF and the baseline for a patient.
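The paired t-tests reported in this appendix have nine degrees of freedom, which corresponds to ten paired observations (presumably the ten detection thresholds on the horizontal axes of the recall and precision panels; this pairing is our reading, not something stated explicitly). A minimal sketch of the test statistic, with a hypothetical function name, is:

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    # a, b: equal-length sequences of paired measurements, for example the
    # recall of two methods at each detection threshold.
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # t = mean difference divided by its standard error; n - 1 degrees of freedom.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Toy example with three pairs (differences 1, 2, 3).
t_value, dof = paired_t_statistic([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

Converting the statistic into a p-value requires the cumulative distribution function of Student's t distribution, which the Python standard library does not provide; `scipy.stats.ttest_rel` computes both the statistic and the p-value.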

Using GWC, HCRF is able to detect abnormal clusters within the resection zones of eight (61.5%) patients, as opposed to the baseline, which detects nine (69%), as shown in Figure B.2(d). Figure B.2(e) shows that the recall of the HCRF method (2.0367 ± 1.0882) is significantly higher (t(9) = 6.7320, p < 0.001) than the recall of the baseline method (0.7814 ± 0.5094). Figure B.2(f) compares the precision of HCRF and the baseline using GWC. The differences in the precision values for HCRF (11.4304 ± 1.0621) and the baseline (8.4030 ± 0.5979) were found to be significant at t(9) = 15.8046, p < 0.001 using a paired t-test. Although HCRF outperforms the baseline in terms of recall and precision when using GWC, the resulting detection rate is lower than that of the baseline.

[Figure B.2, panels (a)-(l): plots omitted. For each feature, the panels plot detection rate (ZSC and HCRF, top-5 and top-10 clusters), recall (%) and precision (%) (HCRF and Z-Score) against the detection threshold.]

Figure B.2: Comparison of detection rates, precision and recall between the HCRF based approach and the baseline method using thickness (a)-(c), GWC (d)-(f), curvature (g)-(i) and sulcal depth (j)-(l). Here, α = 1 so that larger clusters are ranked higher (refer to Equation 5.8).

Figure B.2(g) shows the comparison of the detection rates using curvature to represent the cortex. HCRF detects abnormal clusters within the resection zones of 9 (69%) patients, while the baseline detects only 8 (61.5%) subjects when the top ten largest clusters are considered. Figures B.2(h)-B.2(i) show that HCRF is able to achieve higher recall and precision than the baseline, respectively. The difference between the recall values of the proposed method and the baseline was significant at t(9) = 6.5582, p < 0.001. However, the differences in precision for HCRF (12.3179 ± 1.6842) and the baseline (11.6453 ± 0.8254) were not significant (t(9) = 1.6289, p = 0.1378) using a paired t-test.

When sulcal depth is used to represent the cortex, the baseline dominates HCRF at the first three thresholds; however, HCRF performs better at the more lenient thresholds. Overall, HCRF is able to detect abnormal clusters that overlap with the resections of 6 (46%) patients, while the baseline detects the lesion in 5 (38.5%) patients (Figure B.2(j)). Figure B.2(k) shows that the recall of the HCRF method (2.6564 ± 0.1726) is significantly higher (t(9) = 11.0850, p < 0.001) than the recall of the baseline method (0.9731 ± 0.6418). Figure B.2(l) compares the precision of HCRF and the baseline using sulcal depth. On average, the baseline has a higher precision than HCRF; however, the differences in the precision values for HCRF (12.5145 ± 0.8396) and the baseline (13.5163 ± 1.4811) were not significant (t(9) = −1.8481, p = 0.0976).

Using individual features, HCRF is able to achieve a maximum detection

rate of 92% while the baseline has a maximum detection rate of 85%, when top

ten largest clusters are considered. Cortical thickness outperforms all other

features based on its average precision and recall.
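The percentages quoted in this appendix follow directly from the number of detected patients out of thirteen. The short check below recomputes them from the counts reported above (top ten largest clusters); the variable names are illustrative.

```python
N_PATIENTS = 13  # thirteen MRI-positive patients (Appendix A)

# Patients detected using the top ten clusters, as reported in this appendix.
detected_counts = {
    "thickness": {"HCRF": 12, "baseline": 11},
    "GWC": {"HCRF": 8, "baseline": 9},
    "curvature": {"HCRF": 9, "baseline": 8},
    "sulcal depth": {"HCRF": 6, "baseline": 5},
}

# Detection rate in percent for every feature and method.
rates = {
    feature: {method: 100.0 * count / N_PATIENTS for method, count in counts.items()}
    for feature, counts in detected_counts.items()
}

best_hcrf = max(r["HCRF"] for r in rates.values())
best_baseline = max(r["baseline"] for r in rates.values())
```

Both maxima are attained with cortical thickness, consistent with the summary above.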


Bibliography

[1] B. Ahmed, C. E. Brodley, K. E. Blackmon, R. Kuzniecky, G. Barash, C. Carlson, B. T. Quinn, W. Doyle, J. French, O. Devinsky, and T. Thesen. Cortical feature analysis and machine learning improves detection of MRI-negative focal cortical dysplasia. Epilepsy & Behavior, 48:21–28, 2015.

[2] B. Ahmed, T. Thesen, K. Blackmon, Y. Zhao, O. Devinsky, R. Kuzniecky, and C. Brodley. Hierarchical conditional random fields for outlier detection: An application to detecting epileptogenic cortical malformations. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 1080–1088, 2014.

[3] L. Andrews. Special Functions of Mathematics for Engineers. SPIE Optical Engineering Press, 1992.

[4] J. Ashburner and K. J. Friston. Voxel-Based Morphometry-The Methods. NeuroImage, 11(6):805–821, 2000.

[5] P. Awasthi, A. Gagrani, and B. Ravindran. Image modelling using tree structured conditional random fields. In IJCAI, pages 2060–2065, 2007.

[6] K. Babalola, B. Patenaude, P. Aljabar, J. Schnabel, D. Kennedy, W. Crum, S. Smith, T. Cootes, M. Jenkinson, and D. Rueckert. Comparison and evaluation of segmentation techniques for subcortical structures in brain MRI. In Medical Image Computing and Computer-Assisted Intervention MICCAI 2008, pages 409–416, 2008.

[7] A. J. Barkovich, R. Guerrini, R. I. Kuzniecky, G. D. Jackson, and W. B. Dobyns. A developmental and genetic classification for malformations of cortical development: update 2012. Brain, 135(5):1348–1369, 2012.

[8] A. J. Barkovich and C. A. Raybaud. Neuroimaging in disorders of cortical development. Neuroimaging Clinics of North America, 14(2):231–254, 2004.

[9] S. R. Benbadis, L. Heriaud, W. O. Tatum IV, and F. L. Vale. Epilepsy surgery, delays and referral patterns-are all your epilepsy patients controlled? Seizure, 12(3):167–170, 2003.


[10] A. Bernasconi and N. Bernasconi. Unveiling epileptogenic lesions: Thecontribution of image processing. Epilepsia, 52:20–24, 2011.

[11] A. Bernasconi, N. Bernasconi, B. C. Bernhardt, and D. Schrader. Ad-vances in MRI for ’cryptogenic’ epilepsies. Nature Reviews Neurology,7(2):99–108, 2011.

[12] P. Besson, F. Andermann, F. Dubeau, and A. Bernasconi. Small focalcortical dysplasia lesions are located at the bottom of a deep sulcus.Brain, 131(12):3246–3255, 2008.

[13] P. Besson, F. Andermann, F. Dubeau, and A. Bernasconi. Small focalcortical dysplasia lesions are located at the bottom of a deep sulcus.Brain, 131(12):3246–3255, 2008.

[14] P. Besson, N. Bernasconi, O. Colliot, et al. Surface-based texture andmorphological analysis detects subtle cortical dysplasia. In MICCAI,pages 645–652, 2008.

[15] P. Besson, N. Bernasconi, O. Colliot, A. Evans, and A. Bernasconi.Surface-based texture and morphological analysis detects subtle corti-cal dysplasia. In Proceedings of the 11th International Conference onMedical Image Computing and Computer-Assisted Intervention MICCAI’08-Part I, pages 645–652, 2008.

[16] C. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., 2006.

[17] K. Blackmon, E. Halgren, W. B. Barr, C. Carlson, O. Devinsky, J. DuBois, B. T. Quinn, J. French, R. Kuzniecky, and T. Thesen. Individual differences in verbal abilities associated with regional blurring of the left gray and white matter boundary. J Neurosci., 31(43):15257–15263, 2011.

[18] I. Blumcke, M. Thom, E. Aronica, D. D. Armstrong, H. V. Vinters, et al. The clinicopathologic spectrum of focal cortical dysplasias: A consensus classification proposed by an ad hoc task force of the ILAE diagnostic methods commission. Epilepsia, 52(1):158–174, 2011.

[19] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[20] M. Breunig, H.-P. Kriegel, R. Ng, and J. Sander. LOF: Identifying Density-Based Local Outliers. In ACM SIGMOD ICMD, pages 93–104. ACM, 2000.

[21] C. E. Brodley and M. A. Friedl. Identifying mislabeled training data. J. A.I. Res., 11:131–167, 1999.

[22] R. Caruana. Multitask learning. Mach. Learn., 28(1):41–75, 1997.

[23] D. Comaniciu and P. Meer. Mean shift: a robust approach toward feature space analysis. PAMI, 24(5):603–619, 2002.

[24] A. M. Dale, B. Fischl, and M. I. Sereno. Cortical surface-based analysis: I. Segmentation and surface reconstruction. NeuroImage, 9(2):179–194, 1999.

[25] R. Desikan, F. Ségonne, B. Fischl, et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage, 31(3):968–980, 2006.

[26] L. R. Dice. Measures of the amount of ecologic association between species. Ecology, 26(3):297–302, 1945.

[27] J. S. Duncan, G. P. Winston, M. J. Koepp, and S. Ourselin. Brain imaging in the assessment for epilepsy surgery. The Lancet Neurology, 15(4):420–433, 2016.

[28] C. Ecker, A. Marquand, J. Mourão-Miranda, P. Johnston, E. M. Daly, M. J. Brammer, S. Maltezos, C. M. Murphy, D. Robertson, S. C. Williams, and D. G. M. Murphy. Describing the brain in autism in five dimensions – magnetic resonance imaging-assisted diagnosis of autism spectrum disorder using a multiparameter classification approach. The Journal of Neuroscience, 30(32):10612–10623, 2010.

[29] A. Esbroeck, L. Smith, Z. Syed, S. Singh, and Z. Karam. Multi-task seizure detection: addressing intra-patient variation in seizure morphologies. Machine Learning, 102(3):309–321, 2015.

[30] T. Evgeniou, C. A. Micchelli, M. Pontil, et al. Learning multiple tasks with kernel methods. J. Mach. Learn. Res., 6:615–637, 2005.

[31] T. Evgeniou and M. Pontil. Regularized multi-task learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’04, pages 109–117, 2004.

[32] S. Fauser, S. M. Sisodiya, L. Martinian, M. Thom, C. Gumbinger, H.-J. Huppertz, C. Hader, K. Strobl, B. J. Steinhoff, M. Prinz, J. Zentner, and A. Schulze-Bonhage. Multi-focal occurrence of cortical dysplasia in epilepsy patients. Brain, 132(8):2079–2090, 2009.

[33] B. Fischl and A. M. Dale. Measuring the thickness of the human cerebral cortex from magnetic resonance images. Proceedings of the National Academy of Sciences, 97:11050–11055, 2000.

[34] B. Fischl, D. Salat, E. Busa, et al. Whole brain segmentation: Automated labeling of neuroanatomical structures in the human brain. Neuron, 33(3):341–355, 2002.

[35] B. Fischl, M. I. Sereno, and A. M. Dale. Cortical surface-based analysis: II: Inflation, flattening, and a surface-based coordinate system. NeuroImage, 9(2):195–207, 1999.

[36] B. Fischl, M. I. Sereno, R. B. Tootell, and A. M. Dale. High-resolution intersubject averaging and a coordinate system for the cortical surface. Human Brain Mapping, 8(4):272–284, 1999.

[37] S. S. Ghosh, S. Kakunoori, J. Augustinack, A. Nieto-Castanon, I. Kovelman, N. Gaab, J. A. Christodoulou, C. Triantafyllou, J. D. Gabrieli, and B. Fischl. Evaluating the validity of volume-based and surface-based brain image registration for developmental cognitive neuroscience studies in children 4 to 11 years of age. NeuroImage, 53(1):85–93, 2010.

[38] A. S. Hakimi, M. V. Spanaki, L. A. Schuh, B. J. Smith, and L. Schultz. A survey of neurologists’ views on epilepsy surgery and medically refractory epilepsy. Epilepsy & Behavior, 13(1):96–101, 2008.

[39] W. A. Hauser and D. C. Hesdorffer. Epilepsy: frequency, causes and consequences. Epilepsy Foundation of America, 1990.

[40] R. J. Hickey. Noise modelling and evaluating learning from examples. Artificial Intelligence, 82(1–2):157–179, 1996.

[41] P. A. Hofman, G. J. Fitt, A. S. Harvey, R. Kuzniecky, and G. Jackson. Bottom-of-sulcus dysplasia: imaging features. AJR Am J Roentgenol., 196(4):881–885, 2011.

[42] S.-C. Hong, K.-S. Kang, D. W. Seo, S. B. Hong, M. Lee, D.-H. Nam, J.-I. Lee, J. S. Kim, H.-J. Shin, K. Park, W. Eoh, Y.-L. Suh, and J.-H. Kim. Surgical treatment of intractable epilepsy accompanying cortical dysplasia. Journal of Neurosurgery, 93(5):766–773, 2000.

[43] S. J. Hong, H. Kim, D. Schrader, N. Bernasconi, B. C. Bernhardt, and A. Bernasconi. Automated detection of cortical dysplasia type II in MRI-negative epilepsy. Neurology, 83(1):48–55, 2014.

[44] H.-J. Huppertz, J. Kassubek, D.-M. Altenmüller, T. Breyer, and S. Fauser. Automatic curvilinear reformatting of three-dimensional MRI data of the cerebral cortex. NeuroImage, 39(1):80–86, 2008.

[45] R. I. Kuzniecky and A. J. Barkovich. Malformations of cortical development and epilepsy. Brain Dev., 23(1):2–11, 2001.

[46] N. Japkowicz and M. Shah. Evaluating Learning Algorithms: A Classification Perspective. Cambridge University Press, New York, NY, USA, 2011.

[47] N. Japkowicz and S. Stephen. The class imbalance problem: A systematic study. Intell. Data Anal., 6(5):429–449, 2002.

[48] T. Joachims. Optimizing search engines using clickthrough data. In ACM SIGKDD, KDD ’02, pages 133–142, 2002.

[49] L. G. Kini, J. C. Gee, and B. Litt. Computational analysis in epilepsy neuroimaging: A survey of features and methods. NeuroImage: Clinical, to appear, 2016.

[50] A. Klein, S. S. Ghosh, B. Avants, B. Yeo, B. Fischl, B. Ardekani, J. C. Gee, J. Mann, and R. V. Parsey. Evaluation of volume-based and surface-based brain image registration methods. NeuroImage, 51(1):214–220, 2010.

[51] H.-P. Kriegel, P. Kröger, E. Schubert, and A. Zimek. LoOP: Local Outlier Probabilities. In ACM CIKM, pages 1649–1652, 2009.

[52] P. Krsek, B. Maton, B. Korman, E. Pacheco-Jacome, P. Jayakar, C. Dunoyer, et al. Different features of histopathological subtypes of pediatric focal cortical dysplasia. Annals of Neurology, 63(6):758–769, 2008.

[53] P. Krsek, T. Pieper, A. Karlmeier, M. Hildebrandt, D. Kolodziejczyk, P. Winkler, E. Pauli, I. Blümcke, and H. Holthausen. Different presurgical characteristics and seizure outcomes in children with focal cortical dysplasia type I or II. Epilepsia, 50(1):125–137, 2009.

[54] A. Kumar and H. Daume. Learning task grouping and overlap in multi-task learning. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), pages 1383–1390, 2012.

[55] P. Kwan and M. J. Brodie. Early identification of refractory epilepsy. New England Journal of Medicine, 342(5):314–319, 2000.

[56] P. Kwan, S. C. Schachter, and M. J. Brodie. Drug-resistant epilepsy. New England Journal of Medicine, 365(10):919–926, 2011.

[57] M. L. Bell et al. Epilepsy surgery outcomes in temporal lobe epilepsy with a normal MRI. Epilepsia, 50(9):2053–2060, 2009.

[58] P. J. Lenk, W. S. DeSarbo, P. E. Green, and M. R. Young. Hierarchical Bayes conjoint analysis: Recovery of partworth heterogeneity from reduced experimental designs. Marketing Science, 15(2):173–191, 1996.

[59] J. P. Lerch and A. C. Evans. Cortical thickness analysis examined through power analysis and a population simulation. NeuroImage, 24(1):163–173, 2005.

[60] J. Lin, N. Salamon, A. Lee, et al. Reduced neocortical thickness and complexity mapped in mesial temporal lobe epilepsy with hippocampal sclerosis. Cereb. Cortex, 17(9):2007–2018, 2007.

[61] P. L. Lopez-Cruz, C. Bielza, P. Larranaga, et al. Learning conditional linear Gaussian classifiers with probabilistic class labels. In CAEPIA ’13, pages 139–148, 2013.

[62] D. Lowe. Object recognition from local scale-invariant features. In ICCV, pages 1150–1157, 1999.

[63] C. McDonald, D. J. Hagler Jr., M. E. Ahmadi, et al. Regional neocortical thinning in mesial temporal lobe epilepsy. Epilepsia, 49(5):794–803, 2008.

[64] C. Mellerio, M.-A. Labeyrie, F. Chassoux, C. Daumas-Duport, E. Landre, B. Turak, F.-X. Roux, J.-F. Meder, B. Devaux, and C. Oppenheim. Optimizing MR imaging detection of type 2 focal cortical dysplasia: Best criteria for clinical practice. American Journal of Neuroradiology, 33(10):1932–1938, 2012.

[65] K. J. Miller, M. den Nijs, P. Shenoy, J. W. Miller, R. P. N. Rao, and J. G. Ojemann. Real-time functional brain mapping using electrocorticography. NeuroImage, 37(2):504–507, 2007.

[66] S. Mueller, K. Laxer, J. Barakos, I. Cheong, P. Garcia, and M. Weiner. Widespread neocortical abnormalities in temporal lobe epilepsy with and without mesial sclerosis. NeuroImage, 46(2):353–359, 2009.

[67] A. Muhlebner, R. Coras, K. Kobow, M. Feucht, T. Czech, H. Stefan, D. Weigel, M. Buchfelder, H. Holthausen, T. Pieper, M. Kudernatsch, and I. Blumcke. Neuropathologic measurements in focal cortical dysplasias: validation of the ILAE 2011 classification system and diagnostic implications for MRI. Acta Neuropathologica, 123(2):259–272, 2011.

[68] D. F. Nettleton, A. Orriols-Puig, and A. Fornells. A study of the effect of different types of noise on the precision of supervised learning techniques. Artificial Intelligence Review, 33(4):275–306, 2010.

[69] Q. Nguyen, H. Valizadegan, M. Hauskrecht, et al. Learning classification with auxiliary probabilistic information. In IEEE ICDM ’11, pages 477–486, 2011.

[70] S. Noachtar and A. Peters. Semiology of epileptic seizures: A critical review. Epilepsy Behav., 15(1):2–9, 2009.

[71] C. Nordahl, D. Dierker, I. Mostafavi, et al. Cortical folding abnormalities in autism revealed by surface-based morphometry. J Neurosci., 27(43):11725–11735, 2007.

[72] S. Papadimitriou, H. Kitagawa, P. Gibbons, and C. Faloutsos. LOCI: fast outlier detection using the local correlation integral. In ICDE, pages 315–326, 2003.

[73] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc., 1988.

[74] R. Pienaar, B. Fischl, V. Caviness, N. Makris, and P. E. Grant. A methodology for analyzing curvature in the developing brain from preterm to adult. International Journal of Imaging Systems and Technology, 18(1):42–68, 2008.

[75] N. Plath, M. Toussaint, and S. Nakajima. Multi-class image segmentation using conditional random fields and global classification. In ICML, pages 817–824, 2009.

[76] A. A. Raymond, D. R. Fish, S. M. Sisodiya, N. Alsanjari, J. M. Stevens, and S. D. Shorvon. Abnormalities of gyration, heterotopias, tuberous sclerosis, focal cortical dysplasia, microdysgenesis, dysembryoplastic neuroepithelial tumour and dysgenesis of the archicortex in epilepsy: Clinical, EEG and neuroimaging features in 100 adult patients. Brain, 118(3):629–660, 1995.

[77] U. Rebbapragada and C. E. Brodley. Class noise mitigation through instance weighting. In ECML ’07, pages 708–715, 2007.

[78] J. Reynolds and K. Murphy. Figure-ground segmentation using a hierarchical conditional random field. In CRV, pages 175–182, 2007.

[79] L. Rimol, R. Nesvag, D. Hagler Jr., et al. Cortical volume, surface area, and thickness in schizophrenia and bipolar disorder. Biological Psychiatry, 71(6):552–560, 2012.

[80] D. Rivière, J.-F. Mangin, D. Papadopoulos-Orfanos, J.-M. Martinez, V. Frouin, and J. Régis. Automatic recognition of cortical sulci of the human brain using a congregation of neural networks. Medical Image Analysis, 6(2):77–92, 2002.

[81] F. Rosenow and H. Luders. Presurgical evaluation of epilepsy. Brain, 124(9):1683–1700, 2001.

[82] M. T. Rosenstein, Z. Marx, L. P. Kaelbling, and T. G. Dietterich. To transfer or not to transfer. In NIPS05 Workshop, Inductive Transfer: 10 Years Later, 2005.

[83] P. Rzezak, P. Squarzoni, F. L. Duran, T. de Toledo Ferraz Alves, J. Tamashiro-Duran, C. M. Bottino, S. Ribeiz, P. A. Lotufo, P. R. Menezes, M. Scazufca, and G. F. Busatto. Relationship between brain age-related reduction in gray matter and educational attainment. PLoS ONE, 10(10):1–15, 2015.

[84] D. Salat, R. Buckner, A. Snyder, et al. Thinning of the cerebral cortex in aging. Cerebral Cortex, 14(7):721–730, 2004.

[85] M. Sampat, Z. Wang, M. Markey, G. Whitman, T. Stephens, and A. Bovik. Measuring intra- and inter-observer agreement in identifying and localizing structures in medical images. In Image Processing, 2006 IEEE International Conference on, pages 81–84, 2006.

[86] B. Scholkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2001.

[87] E. Schubert, R. Wojdanowski, A. Zimek, and H.-P. Kriegel. On evaluation of outlier rankings and outlier scores. In SDM, pages 1047–1058, 2012.

[88] S. M. Sisodiya. Surgery for focal cortical dysplasia. Brain, 127(11):2383–2384, 2004.

[89] S. M. Sisodiya, S. Fauser, J. H. Cross, and M. Thom. Focal cortical dysplasia type II: biological features and clinical perspectives. The Lancet Neurology, 8(9):830–843, 2009.

[90] P. Smyth, U. M. Fayyad, M. C. Burl, P. Perona, and P. Baldi. Inferring ground truth from subjective labelling of Venus images. In NIPS ’94, pages 1085–1092, 1994.

[91] C. Sutton and A. McCallum. An Introduction to Conditional Random Fields, 2010. eprint arXiv:1011.4088.

[92] J. T. Lerner et al. Assessment and surgical outcomes for mild type I and severe type II cortical dysplasia: a critical review and the UCLA experience. Epilepsia, 50(6):1310–1335, 2009.

[93] L. Tassi, N. Colombo, R. Garbelli, S. Francione, G. Lo Russo, R. Mai, F. Cardinale, M. Cossu, A. Ferrario, C. Galli, M. Bramerio, A. Citterio, and R. Spreafico. Focal cortical dysplasia: neuropathological subtypes, EEG, neuroimaging and surgical outcome. Brain, 125(8):1719–1732, 2002.

[94] L. Tassi, N. Colombo, R. Garbelli, S. Francione, G. Lo Russo, R. Mai, F. Cardinale, M. Cossu, A. Ferrario, C. Galli, M. Bramerio, A. Citterio, and R. Spreafico. Focal cortical dysplasia: neuropathological subtypes, EEG, neuroimaging and surgical outcome. Brain, 125(8):1719–1732, 2002.

[95] J. F. Tellez-Zenteno, R. Dhar, L. Hernandez-Ronquillo, and S. Wiebe. Long-term outcomes in epilepsy surgery: antiepileptic drugs, mortality, cognitive and psychosocial aspects. Brain, 130(2):334–345, 2007.

[96] T. Thesen et al. Detection of Epileptogenic Cortical Malformations with Surface-Based MRI Morphometry. PLoS ONE, 6(2):1–10, 2011.

[97] M. Thom, L. Martinian, A. Sen, J. H. Cross, B. N. Harding, and S. M. Sisodiya. Cortical neuronal densities and lamination in focal cortical dysplasia. Acta Neuropathologica, 110(4):383–392, 2005.

[98] S. Thrun and L. Pratt. Learning to Learn. Kluwer Academic Publishers, 1996.

[99] J. F. Téllez-Zenteno, L. H. Ronquillo, F. Moien-Afshari, and S. Wiebe. Surgical outcomes in lesional and non-lesional epilepsy: A systematic review and meta-analysis. Epilepsy Research, 89(2–3):310–318, 2010.

[100] A. Torralba, K. Murphy, W. Freeman, et al. Sharing features: efficient boosting procedures for multiclass object detection. In IEEE CVPR ’04, pages 762–769, 2004.

[101] K. Tufenkjian and H. O. Luders. Seizure semiology: Its value and limitations in localizing the epileptogenic zone. J Clin Neurol., 8(4):243–250, 2012.

[102] M. A. Turk and A. P. Pentland. Face recognition using eigenfaces. In Computer Vision and Pattern Recognition, 1991. Proceedings CVPR ’91., IEEE Computer Society Conference on, pages 586–591, 1991.

[103] D. C. Van Essen, H. A. Drury, S. Joshi, and M. I. Miller. Functional and structural mapping of human cerebral cortex: Solutions are in the surfaces. Proceedings of the National Academy of Sciences, 95(3):788–795, 1998.

[104] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag New York, Inc., 1995.

[105] A. Vedaldi and B. Fulkerson. VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/, 2008a.

[106] A. Vedaldi and S. Soatto. Quick shift and kernel methods for mode seeking. In ECCV, pages 705–718, 2008b.

[107] J. von Oertzen, H. Urbach, S. Jungbluth, M. Kurthen, M. Reuber, G. Fernández, and C. E. Elger. Standard magnetic resonance imaging is inadequate for patients with refractory focal epilepsy. Journal of Neurology, Neurosurgery & Psychiatry, 73(6):643–647, 2002.

[108] B. C. Wallace, K. Small, C. E. Brodley, and T. A. Trikalinos. Class imbalance, redux. In Proceedings of the IEEE International Conference on Data Mining (ICDM ’11), pages 754–763, 2011.

[109] Y. Wang and R. Khardon. Sparse Gaussian processes for multi-task learning. In ECML/PKDD ’12, pages 711–727, 2012.

[110] Z. I. Wang, A. V. Alexopoulos, S. E. Jones, Z. Jaisani, I. M. Najm, and R. A. Prayson. The pathology of magnetic-resonance-imaging-negative epilepsy. Mod Pathol, 26(8):1051–1058, 2013.

[111] M. Wilke, J. Kassubek, S. Ziyeh, A. Schulze-Bonhage, and H. Huppertz. Automated detection of gray matter malformations using optimized voxel-based morphometry: a systematic approach. NeuroImage, 20(1):330–343, 2003.

[112] J. Yuan, Y. Chen, and E. Hirsch. Intracranial electrodes in the presurgical evaluation of epilepsy. Neurological Sciences, 33(4):723–729, 2012.

[113] A. Zijdenbos, B. Dawant, R. Margolin, and A. Palmer. Morphometric analysis of white matter lesions in MR images: method and validation. Medical Imaging, IEEE Transactions on, 13(4):716–724, Dec 1994.

[114] K. H. Zou, S. K. Warfield, A. Bharatha, C. M. Tempany, M. R. Kaus, S. J. Haker, W. M. Wells III, F. A. Jolesz, and R. Kikinis. Statistical validation of image segmentation quality based on a spatial overlap index: scientific reports. Academic Radiology, 11(2):178–189, 2004.
