
Master research article


MASTER RESEARCH ARTICLE OF ANNE KASPERS, BIOMEDICAL IMAGE SCIENCES, UNIVERSITY MEDICAL CENTRE UTRECHT


Abstract—Accurate and precise brain segmentations of

Magnetic Resonance (MR) brain images from patients after

aneurysmal subarachnoid hemorrhage (aSAH) are hard to

acquire by an automated routine due to presence of various

cerebral abnormalities, like enlarged ventricles. Available routines either did not deal with these abnormalities, were not suited for MR images acquired at high magnetic field strength, or used techniques with limited accuracy and precision. In order to

perform accurate and precise brain volume measurements for 3

T aSAH MR images, we created a new routine in which we tried

to deal with these cerebral abnormalities. Measurements of

intracranial volume, total brain, lateral ventricles and peripheral

cerebrospinal fluid were performed on T1 and T2 weighted MR

images of 39 patients and 25 control participants using k-Nearest

Neighbor (kNN) classification. Evaluation showed a fractional Similarity Index (fSI) of 0.98, 0.93 and 0.92 for intracranial volume, total brain and lateral ventricles, respectively, which is comparable to the inter-observer results.

Index Terms—Aneurysmal Subarachnoid Hemorrhage; k-

Nearest Neighbor classification; Magnetic Resonance imaging;

Segmentation

I. INTRODUCTION

ANEURYSMAL SUBARACHNOID HEMORRHAGE (aSAH) is a type of stroke caused by a ruptured intracranial aneurysm [1]. The annual incidence of non-traumatic aSAH ranges from 6 to 8 cases per 100,000 person-years [2]. Almost half of the patients died within thirty days [3], while almost half of the survivors suffered from significant neurological or cognitive deficits after a year [4]. It is assumed that the severity of the neuropsychological deficits commonly detected after treatment of ruptured intracranial aneurysms is associated with the loss of cerebral volume [5].

A study by Bendel showed enlargement of cerebrospinal fluid (CSF) and ventricular volume in patients after aSAH, using the technique of voxel-based morphometry (VBM) [6]. However, the accuracy and precision of VBM are limited, since its measurements are based on an average brain, which is not specific for aSAH patients [7]. Existing routines, which are based on training data of Magnetic Resonance (MR) brain images, were not suited to measure significant volume differences in scans of patients after aSAH. This is partly because they were made for MR image data acquired at a lower magnetic field strength, and partly because they did not account for the cerebral abnormalities present in patients after aSAH, like enlarged ventricles. k-Nearest Neighbor-based probabilistic segmentation (kNN) [8] is a supervised pattern recognition method which can perform precise and accurate brain volume measurement [7], and for which training data can be obtained from different high resolution MR brain scans containing a variety of cerebral abnormalities.

In this study we therefore aimed to design a new, automatic

routine for quantification of cerebral structure volumes in

patients after aSAH, based on kNN using manually segmented

MR image training data.

II. MATERIALS AND METHODS

A. Data

Ten scans for training and 12 scans for validation were included, taken from patients after aSAH and from age- and sex-matched control participants. All scans were obtained between 2005 and 2007. Patients who were screened for aneurysms were included as control participants.

Patients were excluded if they had additional aneurysms

treated with neurosurgical clips that either contained

ferromagnetic material or were located less than 20 mm from

the coiled aneurysm, had a cardiac pacemaker, were

claustrophobic or younger than 18 years [9].

MRI scans were acquired on a 3T Philips magnetic

resonance imaging system using a standardized protocol (24

contiguous slices, voxel size: 0.45 × 0.45 × 4.0 mm) and

consisted of an axial T1-weighted (repetition time in ms [TR]:

500, echo time in ms [TE]: 10) and T2-weighted sequence

(TR: 3000, TE: 80).

B. Image processing

Routine steps

In figure 1, all routine steps from provided images to

resulting probability maps are schematically visualized.

Automated Measurement of Brain Volume in Patients after Aneurysmal Subarachnoid Hemorrhage

Anne Kaspers, Biomedical Image Sciences, University Medical Centre Utrecht


Fig. 1. Flow chart of the Volume Measurement Routine


First, the T1-weighted image was rigidly registered to the

T2-weighted image by using Elastix [10].

To exclude hyper-intense non-brain structures like skull and

fatty tissue, a brain mask was created by an automated routine,

based on the k-means algorithm [11], which used both the T1-

and T2-weighted image (figure 2A). The first non-empty slice

was used 5 times to get more hyper-intense background

information for k-means clustering. A foreground mask was created using k-means clustering with a small sample set, prior to full k-means clustering (figure 2B). Scan inhomogeneities were corrected by a shading correction algorithm using a multiplicative 4th-order correction model on all voxels covered by the foreground mask [12]. In full k-means clustering, all shading-corrected T1 and T2 intensities were taken as samples in a 2D feature space, which only contained intensity parameters. The algorithm searched for 10 means that minimized the sum of Euclidean distances of all samples to their nearest mean. Each voxel was assigned the cluster number of its nearest mean, which resulted in 10 brain clusters and 1 background cluster, derived from the foreground mask (figure 2C).
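The clustering step can be illustrated with a minimal sketch of k-means on (T1, T2) intensity pairs. This is not the routine used in the study: it is the standard Lloyd iteration, which minimizes squared Euclidean distances, and the function name `kmeans`, the random initialisation and the iteration limit are assumptions made for illustration only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two (T1, T2) intensity pairs."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(samples, k, iterations=50, seed=0):
    """Cluster 2D intensity samples; returns (means, labels).

    Hypothetical helper: the means are initialised from k random samples.
    """
    rng = random.Random(seed)
    means = rng.sample(samples, k)
    labels = [0] * len(samples)
    for _ in range(iterations):
        # Assignment step: each sample gets the index of its nearest mean.
        labels = [min(range(k), key=lambda c: sq_dist(s, means[c]))
                  for s in samples]
        # Update step: each mean moves to the centroid of its members.
        new_means = []
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                new_means.append((sum(m[0] for m in members) / len(members),
                                  sum(m[1] for m in members) / len(members)))
            else:
                new_means.append(means[c])  # keep an empty cluster in place
        if new_means == means:
            break  # converged
        means = new_means
    return means, labels
```

In the routine described above, `k` would be 10 and the samples would be the shading-corrected intensities of all voxels inside the foreground mask.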

To select clusters suitable for the brain mask, cluster

numbers were counted for a fixed selection of approximately

1/3 of the voxels located in the center of the cluster image.

The 4 largest clusters, plus any additional clusters whose size exceeded a threshold, were combined into a basic mask (figure 2D).
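A minimal sketch of this cluster selection is given below. The function names, the convention that label 0 is the background cluster, and the threshold value used in the usage example are assumptions. The actual threshold is not stated in the text.

```python
from collections import Counter


def select_clusters(center_labels, n_largest=4, size_threshold=None,
                    background=0):
    """Pick brain clusters from the labels of the central voxels.

    center_labels: cluster numbers of the voxels in the central region.
    Returns the n_largest clusters plus any further cluster whose voxel
    count exceeds size_threshold (hypothetical parameter).
    """
    counts = Counter(lab for lab in center_labels if lab != background)
    ranked = [cluster for cluster, _ in counts.most_common()]
    selected = set(ranked[:n_largest])
    if size_threshold is not None:
        selected |= {c for c in ranked[n_largest:]
                     if counts[c] > size_threshold}
    return selected


def basic_mask(labels, selected):
    """Binary mask that is 1 wherever a voxel belongs to a selected cluster."""
    return [1 if lab in selected else 0 for lab in labels]
```

For example, with central-region counts of 50, 40, 30, 20, 15 and 2 voxels for six clusters and a threshold of 10, the first five clusters are selected.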

To exclude remaining non-brain structures and fill holes, a

number of morphological operations were performed. An

erosion with a round, 11 voxels wide kernel separated non-brain structures from the brain. These structures were removed by segmenting groups of connected mask voxels, hereafter referred to as blobs, and keeping only the largest blob. Dilation with the same kernel as used for erosion restored the old borders (figure 2E). A set of 6 dilations with a round, 9 voxels wide kernel filled holes while keeping the shape of the mask edge intact. The mask was brought back within its

original borders by 7 erosions with the same kernel (figure

2F). A maximum of the brain mask with holes and the eroded

mask restored the old borders while holes remained filled

(figure 2G). At the end of the routine 3 dilations with a 7

voxels wide, round kernel increased the margin to include all

CSF below the skull. Since only the cerebral volume was

important for our study, the cerebellum was manually

segmented (figure 2H).
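The hole-filling part of this sequence (n dilations, n + 1 erosions, then a voxelwise maximum with the unclosed mask) can be sketched on a small 2D mask as follows. The sketch is an illustration under assumptions: it works in 2D on nested lists, treats voxels outside the image as empty, and uses a radius-1 kernel in the example instead of the 9 voxels wide kernel from the text. All function names are hypothetical.

```python
def disk(radius):
    """Offsets (dy, dx) of a round kernel with the given radius."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]


def dilate(mask, offsets):
    """Binary dilation: set every voxel reachable from a mask voxel."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for dy, dx in offsets:
                    if 0 <= y + dy < h and 0 <= x + dx < w:
                        out[y + dy][x + dx] = 1
    return out


def erode(mask, offsets):
    """Binary erosion: keep a voxel only if the whole kernel fits the mask."""
    h, w = len(mask), len(mask[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w
                     and mask[y + dy][x + dx] for dy, dx in offsets))
             for x in range(w)] for y in range(h)]


def fill_holes(mask, n_dilations, offsets):
    """n dilations, n + 1 erosions, then the maximum with the input mask."""
    closed = mask
    for _ in range(n_dilations):
        closed = dilate(closed, offsets)
    for _ in range(n_dilations + 1):
        closed = erode(closed, offsets)
    # The maximum restores the original border while holes stay filled.
    return [[max(a, b) for a, b in zip(row_m, row_c)]
            for row_m, row_c in zip(mask, closed)]
```

The extra erosion keeps the closed mask strictly inside the original border, so the final maximum can only add voxels in the interior of the mask.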

The T2 image and the registered T1 image were multiplied

voxelwise by their corresponding mask including cerebellum

and inhomogeneities were corrected [12], resulting in brain

extracted shading corrected images, which were used for kNN

classification (figure 1, processing routine).

Fig. 2. k-Means mask routine

As post-processing, small groups of connected probability voxels, hereafter referred to as blobs, were transferred from the lateral

ventricles to the peripheral CSF probability map; only the

largest blob was not transferred. Afterwards, a visual check

was done to move back wrongly transferred blobs.
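The blob transfer can be sketched as a connected-component search on the lateral ventricle probability map. The sketch below makes several assumptions that the text does not state: it is 2D, uses 4-connectivity, treats every voxel with a probability above zero as part of a blob, and adds the transferred probability to the peripheral CSF value. The function names are hypothetical.

```python
from collections import deque


def label_blobs(mask):
    """Label 4-connected groups of mask voxels; returns (labels, sizes)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    sizes = {}
    current = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                current += 1
                labels[y][x] = current
                queue = deque([(y, x)])
                size = 0
                while queue:  # breadth-first flood fill of one blob
                    cy, cx = queue.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = current
                            queue.append((ny, nx))
                sizes[current] = size
    return labels, sizes


def transfer_small_blobs(ventricle, peripheral):
    """Move all but the largest ventricle blob to the peripheral CSF map."""
    new_v = [row[:] for row in ventricle]
    new_p = [row[:] for row in peripheral]
    labels, sizes = label_blobs([[p > 0 for p in row] for row in ventricle])
    if not sizes:
        return new_v, new_p
    largest = max(sizes, key=sizes.get)
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab and lab != largest:
                new_p[y][x] += new_v[y][x]
                new_v[y][x] = 0.0
    return new_v, new_p
```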

To remove background outside the brain that was misclassified as subcortical structures or cortical grey matter, the mask was eroded 2 times with a round, 7 voxels wide kernel, and voxels of subcortical structures and cortical grey matter outside the eroded mask were excluded. Infarcts, drain trajectories, meningiomas and similar abnormalities significantly degraded the classification outcome and were therefore manually segmented and removed from the

probability maps. In figure 3, an example classification

outcome of one participant is shown.

Routine choices

In this study, volume measurements of subcortical

structures, cortical grey matter, peripheral CSF and lateral

ventricles were performed. Besides these structures, the masked area contained other structures, hereafter referred to as background, which needed to be included in the training data to prevent misclassification. Assigning all unclassified voxels to the background in the training data would incorrectly assign partial volume voxels of brain structures to the background. Assigning only hypo-intense voxels to the background would cause hyper-intense background to be misclassified as closely located brain structures with similar intensity. Therefore, we put a manual selection of non-partial hypo- and hyper-intense background in the training data. Remaining skull and fatty tissue that was misclassified as subcortical structures or cortical grey matter was removed if

it was located within 6 voxels of the edge of the brain mask,

under the assumption that only peripheral CSF could be

located there.

The provided T1 and T2 weighted MR brain images

contained a shading artefact, which diminished intensity

homogeneity for each brain structure. We applied

inhomogeneity correction [12], assuming its effect on the

classification could be large since the orientation of the shaded

area is different for each scan, which makes it hard to handle

by kNN. Preventive removal of shading seemed better than

inclusion of a representative selection of all shading areas in

the training data, which would enlarge the overlap of structure

samples in feature space. In figure 4, T1 and T2 weighted

intensities of samples from a training data patient with

numerous parenchymal high-signal intensity lesions on T2-

weighted MRI are shown before and after inhomogeneity

correction. Both the T1 and the T2 weighted image added information, as shown by the different ranges of the structures along the x- and y-axis. After correction, intensities of all structures

were more concentrated and distinctive. Cortical grey matter,

peripheral CSF and parenchymal lesion intensities were better

separated from each other while there was still overlap

between subcortical structures and cortical grey matter, which

could be explained by the unclear border in both the T1 and

T2 weighted image. The effect of inhomogeneity correction on cortical grey matter classification is shown in figure 5 for a participant scan with little shading and one with significant shading. After correction, cortical grey matter was better classified in the shaded area, which made the segmentation more uniform.

To create a proper brain mask, we designed an automated

routine, based on the k-means algorithm [11]. It was extended

with cluster selection and a set of morphological operations to

fill holes, caused by exclusion of small clusters in the brain,

while original borders were maintained. Parameters for cluster

selection were determined by testing values close to the

settings which were used in a study by Jongen [13] on our

training data. In contrast to the mask routine used by Jongen,

we automated cluster selection by setting a cluster size

threshold, which provided good cluster selection for 9 of the

10 training data images. After cluster selection, a series of small dilations, followed by a series of small erosions with one additional step, was used instead of a single large morphological closing, to fill large holes without loss of border detail. Holes close to the border were filled while the original border was kept intact by taking the voxelwise maximum of the unclosed mask and the closed, eroded mask.

For a selection of participants, results of k-means and the

Brain Extraction Tool (BET) were compared [14]. In normal cases BET performed similarly to k-means, but in cases with large infarcts k-means performed better. In k-means we could

determine the number and selection of clusters to be classified.

Fig. 3. A registered T1 and T2 weighted image and corresponding kNN probability maps of subcortical structures, cortical grey matter, peripheral CSF and lateral ventricles.


This allowed us to include large infarcts and exclude hyper-

intense background. BET often considered infarcts as non-

brain structures, which caused large gaps in the mask. Since a

larger part of the patients after aSAH had infarcts (n = 40), we

chose to use k-means instead of BET.

All blobs in the lateral ventricles probability map, except the

largest were transferred from the lateral ventricles to the

peripheral CSF probability map, under the assumption that all lateral ventricle voxels are connected to each other. However, this assumption was not valid in all cases because of the large slice thickness, and manual adjustment was needed for some posterior and inferior ventricle horns. Nevertheless, this operation was a simple way to improve the result.

Since we were only interested in volume measurements of

brain structures in the cerebrum, we needed to segment the

cerebellum. However, the presence of subcortical structures, cortical grey matter and peripheral CSF in both cerebrum and cerebellum complicated kNN classification, and the search for better methods was beyond the scope of the project, so we segmented the cerebellum manually. Because the border between cerebrum and cerebellum was unclear, specific segmentation rules had to be defined to guarantee consistency.

C. Training data routine

The training data consisted of non-partial volume

segmentations of 10 participant scans (JB). It is a

representative selection of the dataset (Appendix A),

composed of scans of patients after aSAH and control

participants, which varied in modified Rankin Scale [15] and

size of the lateral ventricles. The segmentations contained

background and 4 brain structures: subcortical structures,

cortical grey matter, peripheral CSF and lateral ventricles. For all training data participants pre-processing was performed (section B). A fixed, random selection of 40% of the manually segmented structure and background voxels was stored as samples, each described by its brain-extracted, shading-corrected T1 and T2 weighted intensities and its spatial parameters. With these samples, the kNN algorithm could calculate distances in feature space to obtain structure probabilities of partial volume samples.
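As an illustration of how kNN can turn non-partial training samples into structure probabilities, the sketch below takes the probability of a structure as the fraction of the k nearest training samples that carry its label. This is the usual form of kNN-based probabilistic classification, but the value of k, the feature scaling and the function name are assumptions. The text does not state them.

```python
import math
from collections import Counter


def knn_probabilities(sample, training, k, classes):
    """Estimate class probabilities for one voxel.

    sample:   feature vector, e.g. (T1 intensity, T2 intensity, x, y, z).
    training: list of (feature_vector, label) pairs.
    Returns {label: fraction of the k nearest training samples}.
    """
    nearest = sorted(training, key=lambda t: math.dist(sample, t[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return {c: votes[c] / k for c in classes}
```

A voxel that lies between two structures in feature space receives intermediate probabilities for both, which is how binary training labels can still yield partial volume estimates.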

D. Validation routine

Right or left hemispheres were selected randomly throughout the brain from 12 participant scans, of which 6 were from the training data and 6 from other data. Subcortical structures, cortical grey matter, peripheral CSF and lateral ventricles in these slices were manually segmented by 2 observers. They could indicate multiple structures per voxel. So, in contrast to the training data, validation data also contained partial volume voxels.

Fig. 4. A. Scatter plot of voxel intensities of the original T2W image relative to the registered original T1WFFE image of one patient from the training data. Five structures are indicated: subcortical structures (SCS), cortical grey matter (CGM), peripheral (per.) CSF, lateral (lat.) ventricles and parenchymal (par.) lesions. B. Same for shading corrected intensities.

Since there were multiple structures per voxel, manual fractions could be computed, for single as well as for combined observers. Uniform distribution of structures and observer certainty was assumed for each voxel, since no information about the distribution was provided. For a single observer, the manual fraction for voxel $v$ and structure $s$ is defined as

$$mf_{v,s} = \frac{b_{v,s}}{n_v}$$

where $b_{v,s}$ is the binary value for voxel $v$ and structure $s$ of the observer and $n_v$ the number of structures classified in voxel $v$ by the observer. In order to enlarge the range of manual fractions, the uncertainty of both observers was combined. For combined observers, the manual fraction is equal to the average of both observer manual fractions.
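A small sketch of the manual fractions for one voxel follows, assuming that each indicated structure receives an equal share (one divided by the number of indicated structures) and that an observer who indicated nothing contributes zeros. The structure abbreviations and function names are chosen for illustration.

```python
STRUCTURES = ("SCS", "CGM", "PCSF", "LV")  # hypothetical abbreviations


def manual_fractions(indicated):
    """Single-observer manual fractions for one voxel.

    indicated: set of structures the observer marked in this voxel.
    """
    n = len(indicated)
    return {s: (1.0 / n if s in indicated else 0.0) for s in STRUCTURES}


def combined_fractions(indicated_a, indicated_b):
    """Average of the manual fractions of two observers for one voxel."""
    fa = manual_fractions(indicated_a)
    fb = manual_fractions(indicated_b)
    return {s: (fa[s] + fb[s]) / 2 for s in STRUCTURES}
```

With one observer marking only cortical grey matter and the other marking cortical grey matter and peripheral CSF, the combined fractions are 0.75 and 0.25, values that a single observer could not produce.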

For calculation of the manual fraction of total brain,

subcortical structures and cortical grey matter were merged,

and for the manual fraction of total CSF, peripheral CSF and

lateral ventricles were merged. For a single observer, the manual fractions of total brain and total CSF for a voxel were each defined by an equation. These two equations are not preserved in this copy of the text.

For combined observers, the averages of the total brain and total CSF manual fractions of both observers were taken. The manual value of intracranial volume is binary for a single observer, since it is 1 for all structures and 0 for the background, and fractional for combined observers, for which the average of the binary values of both observers was taken.

Fig. 5. Example of an image with a significant shading artefact (top) and a small shading artefact (bottom) with their cortical grey matter classifications using SC training data on the SC image (middle) and using uncorrected training data on the uncorrected image (right).


E. Evaluation

The agreement between the observer segmentations and the automatic segmentation, acquired by kNN classification, and the inter-observer agreement were measured by a variant of the Dice similarity index (SI) [16, 17]. The SI formula assumes binary values for both the reference and the segmentation. It is defined as

$$SI = \frac{2\,(\mathrm{Ref} \cap \mathrm{Seg})}{\mathrm{Ref} + \mathrm{Seg}} = \frac{2 \sum_{v} r_v s_v}{\sum_{v} r_v + \sum_{v} s_v}$$

where “Ref” denotes the volume of the binary reference, “Seg” is the volume of the binary segmentation, “Ref ∩ Seg” denotes the volume of the intersection of the binary reference and binary segmentation, $\sum_{v} r_v$ is the sum over all voxels $v$ of the binary reference $r_v$, which counts the voxels where the intensity value of the binary reference equals 1, and likewise $\sum_{v} s_v$ for the binary segmentation $s_v$.

Because we calculated manual fractions for the observer segmentations, and kNN classification provided probabilistic segmentations, the fractional Similarity Index (fSI) was measured [18]. It is defined as

$$fSI = \frac{2 \sum_{v} \min(r_v, s_v)}{\sum_{v} r_v + \sum_{v} s_v}$$

where the reference value $r_v$ is the manual fraction, computed for single observers (formula 1) or combined observers (formula 2), and $s_v$ is the probabilistic value of the segmentation in voxel $v$. Notice that when all probabilistic values are binary, the fSI formula is equal to the SI formula. The agreement of the probabilistic manual segmentations with the automatic segmentation and the inter-observer agreement were measured with the fSI.
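The fSI can be sketched in a few lines, assuming it takes the min-based form 2 Σ min(r, s) / (Σ r + Σ s), with r the reference fractions and s the segmentation probabilities. This form is consistent with the statement that the fSI equals the SI for binary input, but the function name and the handling of empty input are assumptions.

```python
def fractional_similarity_index(reference, segmentation):
    """fSI of two equally long sequences of per-voxel fractions in [0, 1]."""
    overlap = sum(min(r, s) for r, s in zip(reference, segmentation))
    total = sum(reference) + sum(segmentation)
    if total == 0:
        raise ValueError("fSI is undefined when both inputs are empty")
    return 2.0 * overlap / total
```

For binary input the minimum acts as an intersection, so the function returns the ordinary Dice similarity index.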

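Besides the fSI, the sensitivity and specificity were also measured. These are more common quality indicators and therefore make the validation outcome comparable to other studies. They are defined as

$$\mathrm{sensitivity} = \frac{\sum_{v} \min(r_v, s_v)}{\sum_{v} r_v}$$

and

$$\mathrm{specificity} = \frac{N - \sum_{v} \max(r_v, s_v)}{N - \sum_{v} r_v}$$

where $r_v$ and $s_v$ are the reference and segmentation probabilities of voxel $v$ and $N$ is the number of voxels. $\sum_{v} \min(r_v, s_v)$ is the sum of minima of the reference and segmentation probabilities, equivalent to the sum of true positives, $\sum_{v} r_v$ is the sum of reference probabilities, equivalent to the sum of true positives and false negatives, $N - \sum_{v} \max(r_v, s_v)$ is the number of voxels minus the sum of maxima of the reference and segmentation probabilities, equivalent to the sum of true negatives, and $N - \sum_{v} r_v$ is the number of voxels minus the sum of reference probabilities, equivalent to the sum of true negatives and false positives.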
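The verbal definitions above translate directly into a short sketch. The function name is hypothetical, and the code assumes that the reference contains both structure and non-structure voxels, so that neither denominator is zero.

```python
def sensitivity_specificity(reference, segmentation):
    """Fractional sensitivity and specificity of per-voxel probabilities."""
    n_voxels = len(reference)
    pairs = list(zip(reference, segmentation))
    true_pos = sum(min(r, s) for r, s in pairs)             # sum of minima
    true_neg = n_voxels - sum(max(r, s) for r, s in pairs)  # N - sum of maxima
    ref_sum = sum(reference)                                # TP + FN
    sensitivity = true_pos / ref_sum
    specificity = true_neg / (n_voxels - ref_sum)           # TN / (TN + FP)
    return sensitivity, specificity
```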

The reference and segmented volume were determined by multiplying the sum of the reference probabilities and the sum of the segmentation probabilities, respectively, by the volume of 1 voxel in milliliters. The difference was examined to detect over- or under-segmentation of the automated structure volumes.
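As a worked example of this volume computation, the sketch below uses the voxel size from the acquisition protocol (0.45 × 0.45 × 4.0 mm, which gives 0.81 mm³ or 0.00081 ml per voxel). That the probability maps share this voxel size is an assumption, as are the function names.

```python
VOXEL_VOLUME_MM3 = 0.45 * 0.45 * 4.0  # voxel size from the scan protocol


def volume_ml(probabilities, voxel_volume_mm3=VOXEL_VOLUME_MM3):
    """Structure volume in milliliters (1 ml equals 1000 cubic mm)."""
    return sum(probabilities) * voxel_volume_mm3 / 1000.0


def volume_difference_ml(reference, segmentation,
                         voxel_volume_mm3=VOXEL_VOLUME_MM3):
    """Positive values indicate over-segmentation by the routine."""
    return (volume_ml(segmentation, voxel_volume_mm3)
            - volume_ml(reference, voxel_volume_mm3))
```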

Inter-observer and routine fSI and sensitivity scores of

subcortical structures, cortical grey matter, peripheral CSF,

lateral ventricles, total brain, total CSF and intracranial

volume were analyzed. To investigate whether inclusion of training data in the validation data improved validation scores, fSI scores were compared between a validation set of only training data and a validation set of non-training data.

III. RESULTS

Table I shows the inter-observer validation results for all

structures. Apart from peripheral CSF, fSI scores of all

structures are good with a score of 0.82 for cortical grey

matter and total CSF, 0.95 for lateral ventricles and total brain

and even 0.98 for intracranial volume. In contrast to its high fSI score, the sensitivity of cortical grey matter is moderate with a score of 0.77.

Table II shows the routine validation results for all

structures. Intracranial volume, total brain and lateral ventricles scored well with fSI scores of 0.98, 0.93 and 0.92, respectively, and similar sensitivity scores. Subcortical structures scored lower with an fSI score of 0.83 and a sensitivity score of 0.88. Total CSF, cortical grey matter and peripheral CSF scored moderately with fSI scores of 0.77, 0.76 and 0.71, respectively.


IV. DISCUSSION

In this paper we proposed a kNN-based routine to segment subcortical structures, cortical grey matter, peripheral CSF

and lateral ventricles on 3T T1 and T2 MR brain images of

patients after aSAH. To measure subtle differences in brain

volumes, high accuracy and precision were required.

Therefore, we based our routine on the kNN algorithm, which

is an accurate and precise method, and used accurate training

data of an expert and automated most routine steps for optimal

precision. The fSI scores of intracranial volume, total brain

and lateral ventricles were good, while subcortical structures,

total CSF, cortical grey matter and peripheral CSF scores were

lower.

A. Classification issues

The low scores of cortical grey matter, peripheral and total CSF are partially explained by the slice thickness (4 mm), which exceeded the thickness of cortical grey matter (2-4 mm) and peripheral CSF (± 2 mm) [19], so that these structures largely consisted of partial volume voxels. Subcortical structures and especially

cortical grey matter both have a lower fSI score than total

brain. This is partly explained by the large overlapping area

between subcortical structures and cortical grey matter, where

partial volume correction caused rounding errors, and partly

by the perivascular spaces, which were misclassified as

cortical grey matter (figure 6).

Several studies showed that fluid-attenuated inversion recovery (FLAIR) images were more suitable for classification of parenchymal lesions with high signal intensity on T2-weighted MRI, since FLAIR shows these lesions as hyper-intense and the ventricles as hypo-intense [20]–[23]. In a study by Anbeek, the optimal SI score decreased from 0.81 to 0.63 when FLAIR images were excluded from training data, which then consisted of inversion recovery (IR), proton-density (PD), T1 and T2 weighted images [24]. Because we did not have FLAIR images, good segmentation of these lesions was not feasible, since parenchymal high-signal intensity lesions and lateral ventricles were both hyper-intense on T2-weighted MRI, were located close to each other, and occurred at different locations and in different amounts. Therefore, the lesions were combined with subcortical structures, to which they belong anatomically.

B. Validation issues

In order to fully exploit the observer segmentations, they were combined into manual fractions, which take partial volume into account. Both observers were given equal weight, even if one observer did not assign any structure. Information about the distribution of multiple structures in a voxel was not indicated by the observers, so we considered all structures equally important. For example, for a single observer, three structures in a voxel each got a probability of 1/3. In reality, one of the three structures could be dominant and should have a higher probability. For all partial volume voxels where structures were not equally distributed, manual fractions deviated, which caused lower classification scores. However, we assumed that in a voxel, dominant structures will always be noticed by both observers and less dominant structures could be missed by one observer, which will compensate for some of the deviation.

TABLE I
INTER-OBSERVER VALIDATION RESULTS

Tissue type              Sensitivity   Specificity   fSI
Subcortical structures   0.89          0.99          0.87
Cortical grey matter     0.77          0.99          0.82
Peripheral CSF           0.87          0.99          0.77
Lateral Ventricles       0.95          1.00          0.95
Total Brain              0.93          1.00          0.95
Total CSF                0.90          0.99          0.82
Intracranial             0.98          1.00          0.98

TABLE II
ROUTINE VALIDATION RESULTS

Tissue type              Sensitivity   Specificity   fSI
Subcortical structures   0.88          0.98          0.83
Cortical grey matter     0.70          0.98          0.76
Peripheral CSF           0.74          0.99          0.71
Lateral Ventricles       0.92          1.00          0.92
Total Brain              0.92          0.99          0.93
Total CSF                0.80          0.99          0.77
Intracranial             0.98          0.99          0.98

Fig. 6. Example of perivascular spaces misclassified as cortical grey matter.

Manual fractions could only take a limited number of values,

while kNN output had a wide range. Hence there was always

an error margin added, which decreased our fSI scores. We chose not to threshold the kNN output to the range of manual fractions, because this would change the results for validation purposes only, while the unadjusted results were used for volume measurement.

Using fSI instead of SI is an improvement because it could

deal better with partial volume. Probabilistic outcome of our

kNN routine did not have to be rounded and information of

multiple structures of both observers could be utilized

effectively. However, fSI scores were not used in other studies

so far and could therefore not be compared. Measurement of

the SI and fSI between observers was possible, since their

segmentations are binary and could be transformed to

fractions. The relation of fSI to SI scores could therefore be

examined. Generally, the fSI scores were lower than SI scores,

especially for structures with lots of partial volume, like

peripheral CSF and total CSF, because the SI formula did not

correct for partial volume. Usually an SI of 0.80 or higher is considered a good segmentation and, given that fSI is probably stricter than SI, we applied the same criterion to fSI. Compared to the optimal SI values of the kNN based routine used by Anbeek, which were based on PD, T1 and T2 weighted scans, the present routine scored similarly, and even higher for lateral ventricles. This holds even though fSI is stricter and PD weighted images were not included [24]. The high fSI score for lateral

ventricles could be explained by the larger ventricle volume of

patients after aSAH. Larger ventricles consist mostly of non-

partial voxels, which could better be classified than partial

volume voxels. An even lower optimal SI for cortical grey

matter, compared to our fSI score, indicated that our routine

did not fail but performed well using the kNN algorithm and

the provided imagery.

Validation scores of the single observers versus the automatic routine were approximately similar to those of the combined observers versus the automatic routine. Adding extra

information of uncertainty did not improve the scores. Leaving

training data out of the validation data did not change the

scores significantly, which indicated good classification

quality for new participant scans.

C. Application

The present routine is based on the kNN algorithm, which can deliver precise and accurate results while being simple and fast. Apart from the quality of the images, its quality depends strongly on the composition of the training data, in which cerebral abnormalities were included. Samples of the training data were consistently used by kNN for precise classification. Because kNN effectively measured spatial and intensity distances in feature space, a small training set of non-partial voxels was enough to deal with partial volume. The k-means algorithm, which was used for brain mask creation, is also simple and provides precise cluster images, under the assumption that sufficient samples are taken. With our defined set of morphological operations, cluster images could be transformed into closed masks that kept the original borders unchanged. Hence, the core of our routine is clear and simple, so we could focus on application-specific processing to improve the kNN results. Apart from cerebellum segmentation, all steps in our routine were automated. Selection of appropriate training data may require many expensive man-hours, although a study by Vrooman showed that automatic training with kNN is possible and routine steps need only little adaptation for general use [25]. Hence, its application is feasible, and additions and changes could be tested without much human intervention.
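How a training set of non-partial voxels can still yield fractional outcomes is sketched below. This is a minimal brute-force illustration, not the routine used in this study: the function name, the value of k, the feature scaling and the toy samples are all assumptions, and only two intensity features are used where the routine also uses spatial features.

```python
import math


def knn_probability(sample, training, k=3):
    """Fraction of the k nearest training samples that carry label 1.

    sample   -- feature vector, e.g. scaled (T1, T2) intensities
    training -- list of (feature_vector, label) pairs, label 0 or 1
    """
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    return sum(label for _, label in nearest) / len(nearest)


# Hypothetical training samples taken from non-partial voxels only.
training = [
    ((0.10, 0.90), 1), ((0.15, 0.85), 1), ((0.20, 0.95), 1),  # CSF
    ((0.70, 0.40), 0), ((0.75, 0.35), 0), ((0.80, 0.45), 0),  # brain tissue
]

p_csf = knn_probability((0.12, 0.90), training)      # voxel deep inside CSF
p_brain = knn_probability((0.75, 0.40), training)    # voxel deep inside brain
p_partial = knn_probability((0.45, 0.65), training)  # voxel on the border
print(p_csf, p_brain, p_partial)
```

A voxel whose features lie between the two clusters has neighbors from both classes, so its probability falls between 0 and 1 even though no partial volume voxel was present in the training data.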

D. Strengths and limitations

The strength of the present study is the use of non-partial volume samples in the training data for kNN classification. Accuracy of the brain volume measurements was evaluated using small, representative manual segmentations that contained partial volume information, whereas other brain volume measurement studies use binary manual segmentations. Precision of the brain volume measurements could be evaluated because data were selected from a considerable number of scans with a variety of cerebral abnormalities. For optimal precision, a standardized scanning protocol was used to acquire the images of the data set. Automated routine steps ensured consistency, and manual steps, like cerebellum segmentation, were performed consistently.

A limitation of the present routine is that many cerebral abnormalities, like infarcts and perivascular spaces, could not be processed automatically. However, we had accurate manual segmentations of those cerebral abnormalities at our disposal, so this limitation did not hinder accurate brain volume measurements. The small number of observers limited the evaluation, because only 6 different values could be assigned to the manual fractions while kNN probabilities could take 100 different values; this is still better than using binary manual values.
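The coarseness of the manual fractions can be made concrete with a short sketch. It assumes that manual fractions are obtained by averaging binary masks voxel by voxel, so that n masks give n + 1 possible levels; the count of five masks below is a hypothetical choice that happens to yield six levels and is not taken from the study design.

```python
def manual_fractions(masks):
    """Average binary masks (flat 0/1 lists of equal length) voxel by voxel."""
    n = len(masks)
    return [sum(voxel) / n for voxel in zip(*masks)]


def possible_levels(n_masks):
    """All fraction values that an average of n_masks binary masks can take."""
    return [i / n_masks for i in range(n_masks + 1)]


# Two hypothetical binary segmentations of three voxels.
fractions = manual_fractions([[1, 1, 0], [1, 0, 0]])
levels = possible_levels(5)
print(fractions, len(levels))
```

With few masks the manual fraction of a partial volume voxel can only be approximated in large steps, whereas the kNN probability has a much finer resolution.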


V. CONCLUSION

In this paper, we proposed an automated routine for brain volume measurements on MR brain images from patients after aSAH. We extended kNN classification with processing steps, which we described and evaluated. Lateral ventricles, total brain and intracranial volume had good validation scores, while structures with more partial volume scored worse. The latter could be explained by limitations of the validation, since visual inspection showed good performance for structures with much partial volume, like peripheral CSF.

VI. FUTURE PROSPECTS

Most cerebral abnormalities present in patients after aSAH were manually segmented, but their segmentation could be automated after more study or under other conditions. For accurate automatic cerebellum segmentation, sagittal images may be needed, since they show the border between cerebrum and cerebellum more clearly. Validation scores of structures with much partial volume should increase with the number of observers, because more observers make the manual fraction more accurate. These assumptions need to be addressed in further studies.

APPENDIX A

A.1 Data

For cross-sectional volume measurements, 39 patients after aSAH and 30 control participants from the COMET study were selected. The inclusion criteria are described in Materials and Methods, section Data. Additionally, control participants with symptomatic ischemia were excluded. One control participant had a large infarct because of a neurotrauma and 3 control participants had clinically manifest infarcts.

A.2 Cross-sectional routine

For all participants in the SAH database, pre-processing was performed as described. In two cases, only 3 clusters were used in k-means, and in 5 cases an extra cluster was added when a good cluster image initially did not result in a good mask. For some masks, eyes were removed, moderate imperfections were adjusted, or k-means was performed with fewer clusters because of movement artifacts, infarcts or bleedings, or without a clear reason.

Post-processing was performed on the kNN probability maps. In 18 cases, one or two ventricle horns whose voxels were not connected to the lateral ventricle voxels had to be manually moved back from peripheral CSF to lateral ventricles.

Automatically segmented volumes of all structures were calculated by multiplying the volume of one voxel in milliliters by the sum of all probabilities for that structure. For the validation data, the difference between the automated and manual volume and the average volume over all validation participants were calculated.
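The volume calculation described above amounts to a single weighted sum. The sketch below assumes the probabilities of one structure are available as a flat list and that the voxel dimensions are given in millimeters; the function name, voxel size and probability values are hypothetical.

```python
def structure_volume_ml(probabilities, voxel_size_mm):
    """Sum of per-voxel probabilities times the volume of one voxel in ml."""
    dx, dy, dz = voxel_size_mm
    voxel_volume_ml = dx * dy * dz / 1000.0  # 1 ml equals 1000 cubic millimeters
    return sum(probabilities) * voxel_volume_ml


# Four voxels of 1 x 1 x 1 mm: two fully inside the structure, two partial.
volume = structure_volume_ml([1.0, 1.0, 0.5, 0.25], (1.0, 1.0, 1.0))
print(volume)  # 2.75 voxel equivalents of 0.001 ml each
```

Because the probabilities are summed without rounding, partial volume voxels contribute proportionally to the volume instead of being counted as fully inside or fully outside the structure.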

The results of the probabilistic classification of all structures were visually checked for all participants, and incorrectly classified images were excluded. Total brain and total CSF volume were also calculated. The mean and standard deviation of the total brain, total CSF, subcortical structure, cortical grey matter, peripheral CSF and lateral ventricular volumes were calculated for patients after aSAH and control participants.

A.3 Cross-sectional volume measurements

Table A.I shows the mean and standard deviation of

automated volume measurements for control participants and

patients after aSAH. As expected, patients after aSAH had

larger lateral ventricles and infarcts than control participants.

TABLE A.I
MEAN VOLUMES AND STANDARD DEVIATION OF VOLUMES IN PATIENTS WITH SAH AND CONTROL PARTICIPANTS

Volume (ml)            Peripheral CSF   Lateral ventricles   Total brain   Total CSF    Intracranial   Infarct¹
Control participants   232 ± 52.5       26.6 ± 10.6          978 ± 80.8    259 ± 57.4   1235 ± 125     1.10 [0.67, 1.53]
Patients with SAH      200 ± 40.4       48.0 ± 25.4          956 ± 112     248 ± 39.4   1194 ± 134     5.92 [1.49, 20.8]

Data are unadjusted mean brain volumes ± SD or ¹ median infarct volumes and interquartile range.


ACKNOWLEDGMENT

My special thanks go to Jeroen de Bresser for his pleasant supervision and his approachability during the project, to Koen Vincken and Hugo Kuijf for their suggestions during the meetings, to Nelly Anbeek for her suggestions between meetings, and to Bart Waalewijn and Ekke Kaspers for reviewing my article.

REFERENCES

1. van Gijn J, Rinkel GJE (2001) Subarachnoid haemorrhage: diagnosis, causes and management. Brain 124:249-278
2. Linn FH, Rinkel GJ, Algra A, van GJ (1996) Incidence of subarachnoid hemorrhage: role of region, year, and rate of computed tomography: a meta-analysis. Stroke
3. Broderick JP, Brott TG, Duldner JE, Tomsick T, Leach A (1994) Initial and recurrent bleeding are the major causes of death following subarachnoid hemorrhage. Stroke
4. Hackett ML, Anderson CS (2000) Health outcomes 1 year after subarachnoid hemorrhage: An international population-based study. The Australian Cooperative Research on Subarachnoid Hemorrhage Study Group. Neurology
5. Bendel P, Koivisto T, Niskanen E, Kononen M, Aikia M, Hanninen T, Koskenkorva P, Vanninen R (2009) Brain atrophy and neuropsychological outcome after treatment of ruptured anterior cerebral artery aneurysms: a voxel-based morphometric study. Neuroradiology 51:711-722
6. Bendel P, Koivisto T, Aikia M, Niskanen E, Kononen M, Hanninen T, Vanninen R (2009) Atrophic enlargement of CSF volume after subarachnoid hemorrhage: correlation with neuropsychological outcome. American Journal of Neurology 31:370-376
7. de Bresser J, Portegies MP, Leemans A, Biessels GJ, Kappelle LJ, Viergever MA (2010) A comparison of MR based segmentation methods for measuring brain atrophy progression. Neuroimage 2:760-768
8. Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13:21-27
9. Schaafsma JD, Velthuis BK, Majoie CB, van den Berg R, Brouwer PA, Barkhof F, Eshghi O, de Kort GA, Lo RT, Witkamp TD, Sprengers ME, van Walderveen MA, Bot JC, Sanchez E, Vandertop WP, van Gijn J, Buskens E, van der Graaf Y, Rinkel GJ (2010) Intracranial aneurysms treated with coil placement: test characteristics of follow-up MR angiography--multicenter study. Radiology 1:209-218
10. Klein S, Staring M, Murphy K, Viergever MA, Pluim JP (2009) elastix: a toolbox for intensity-based medical image registration. IEEE Trans Med Imaging
11. MacQueen J (1965) Some methods for classification and analysis of multivariate observations.
12. Likar B, Viergever MA, Pernus F (2001) Retrospective correction of MR intensity inhomogeneity by information minimization. IEEE Trans Med Imaging 20:1398-1410
13. Jongen C, van der Grond J, Kappelle LJ, Biessels GJ, Viergever MA, Pluim JP (2007) Automated measurement of brain and white matter lesion volume in type 2 diabetes mellitus. Diabetologia 50:1509-1516
14. Smith SM (2002) Fast robust automated brain extraction. Human Brain Mapping 3:
15. van Swieten JC, Koudstaal PJ, Visser MC, Schouten HJ, van GJ (1988) Interobserver agreement for the assessment of handicap in stroke patients. Stroke 19:604-607
16. Zijdenbos AP, Want BM, Margolin RA, Palmer AC (1994) Morphometric analysis of white matter lesions in MR images: method and validation. IEEE Trans Med Imaging 4:716-724
17. Dice LR (1945) Measures of the Amount of Ecologic Association Between Species. Ecology 26:297-302
18. Crum WR, Camara O, Hill DL (2006) Generalized overlap measures for evaluation and validation in medical image analysis. IEEE Trans Med Imaging 25:1451-1461
19. Kandel ER, Schwartz JH, Jessell TM (2000) Principles of Neural Science Fourth Edition. McGraw-Hill Medical
20. Admiraal-Behloul F, van den Heuvel DM, Olofsen H, van Osch MJ, van der GJ, van Buchem MA, Reiber JH (2005) Fully automatic segmentation of white matter hyperintensities in MR images of the elderly. Neuroimage 3:607-617
21. Anbeek P, Vincken KL, van Osch MJ, Bisschops RH, van der GJ (2004) Probabilistic segmentation of white matter lesions in MR imaging. Neuroimage
22. Murray AD, Staff RT, Shenkin SD, Deary IJ, Starr JM, Whalley LJ (2005) Brain white matter hyperintensities: relative importance of vascular risk factors in nondemented elderly people. Radiology 1:251-257
23. Wen W, Sachdev PS, Li JJ, Chen X, Anstey KJ (2009) White matter hyperintensities in the forties: their prevalence and topography in an epidemiological sample aged 44-48. Human Brain Mapping 4:1155-1167
24. Anbeek P, Vincken KL, van Bochove GS, van Osch MJ, van der GJ (2005) Probabilistic segmentation of brain tissue in MR imaging. Neuroimage 4:795-804
25. Vrooman HA, Cocosco CA, van der Lijn F, Stokking R, Ikram MA, Vernooij MW, Breteler MM, Niessen WJ (2007) Multi-spectral brain tissue segmentation using automatically trained k-Nearest-Neighbor classification. Neuroimage 1:71-81