JOURNAL CLUB: Cardoso et al., University College London, UK
“STEPS: Similarity and Truth Estimation for Propagated Segmentations and its application to hippocampal segmentation and brain parcelation.”
Jul 15, 2013, Jason Su
Motivation
• Manual segmentation of structures on MR images is an important but often tedious task, requiring the expertise of a radiologist
  – Automation is highly desirable and can also help remove observer bias
• At 7T, we are working on studies that could benefit from this
  – Manually segmenting thalamic nuclei with the improved contrast of WMn-MPRAGE
  – Measuring atrophy in MS patients
Background
STAPLE: Warfield et al. “Simultaneous Truth and Performance Level Estimation.” 2004.
• The algorithm considers a collection of segmentations and computes a probabilistic estimate of the true segmentation: “label fusion”
• The source of each segmentation in the collection may be a human rater or an automated method
• The true segmentation is formed by estimating an optimal combination of the input segmentations, weighting each one by its estimated performance
Similarity Measures
Intra-modal: same contrast, same imaging parameters
• Mean Square Difference (MSD): (1/N)·Σ(Ai − Bi)²
  – Requires the images to have intensity values in the same range
• Normalized Cross Correlation (NCC): cov(A,B)/(σA·σB)
  – Allows intensity values related by a linear transformation

Inter-modal:
• Mutual Information (MI): H(A) + H(B) − H(A,B)
  – Indicates how much uncertainty about one image is reduced by knowledge of the other
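To make the three intensity-based measures concrete, here is a minimal NumPy sketch (the function names are mine, not from the paper or NiftySeg; MI is estimated from a joint histogram):

```python
import numpy as np

def msd(a, b):
    """Mean square difference: assumes intensities on the same scale."""
    return np.mean((a - b) ** 2)

def ncc(a, b):
    """Normalized cross correlation: invariant to linear intensity maps."""
    a0, b0 = a - a.mean(), b - b.mean()
    return (a0 * b0).mean() / (a.std() * b.std())

def mutual_information(a, b, bins=32):
    """MI(A,B) = H(A) + H(B) - H(A,B), estimated from a joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    h_xy = -np.sum(pxy[pxy > 0] * np.log(pxy[pxy > 0]))
    h_x = -np.sum(px[px > 0] * np.log(px[px > 0]))
    h_y = -np.sum(py[py > 0] * np.log(py[py > 0]))
    return h_x + h_y - h_xy
```

Note that NCC is exactly 1 for images related by a positive linear map, which is why it tolerates a linear intensity transformation where MSD does not.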
Similarity Measures
For binary or multi-class masks
• Dice coefficient: 2·|A ∩ B| / (|A| + |B|)
  – where |X| = the size of region X
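A minimal sketch of the Dice overlap for binary masks (the function name is mine):

```python
import numpy as np

def dice(a, b):
    """Dice coefficient for binary masks: 2|A ∩ B| / (|A| + |B|)."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
```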
STEPS Improvements on STAPLE
• Use local intensity to select the best labels to fuse
• Markov random field (MRF) to ensure spatial consistency, now able to operate on probabilities
• Make the estimate unbiased towards larger structures
• Extend these improvements to the multi-label problem
Theory: Model
For each voxel i:
• yi = image intensity
• ti = the true segmentation (binary or multi-class)
• di = column vector of the R candidate segmentations, i.e. our atlas, found automatically or manually

For each rater j:
• pj and qj represent how good the rater is at estimating 1s and 0s
• pj = P(dij = 1 | ti = 1)
• qj = P(dij = 0 | ti = 0)
Theory: Model
What we want: wi = P(ti = 1 | di, p, q)
• The posterior probability
• Given all the information from our raters and how good they are, what is the likelihood that this voxel is actually in the structure?

Unfortunately p and q are coupled with wi
• We simultaneously refine our estimate of the truth and of the raters’ performance relative to it
• This is solved iteratively with an Expectation-Maximization (EM) algorithm
  – Alternately keep one fixed and solve for the other until convergence
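The EM loop can be sketched for the binary case. This is a simplified illustration of STAPLE-style fusion, assuming a fixed global prior and independent raters; the implementation details are mine, not the paper’s:

```python
import numpy as np

def staple(D, n_iter=30):
    """Binary STAPLE-style EM.
    D: (R, N) int array of R raters' binary labels over N voxels.
    Returns w = P(t_i = 1 | d, p, q) and per-rater (p, q)."""
    R, N = D.shape
    p = np.full(R, 0.9)   # sensitivity P(d = 1 | t = 1), initial guess
    q = np.full(R, 0.9)   # specificity P(d = 0 | t = 0), initial guess
    prior = D.mean()      # fixed global prior P(t = 1)
    for _ in range(n_iter):
        # E-step: posterior probability that the true label is 1
        a = prior * np.prod(np.where(D == 1, p[:, None], 1 - p[:, None]), axis=0)
        b = (1 - prior) * np.prod(np.where(D == 0, q[:, None], 1 - q[:, None]), axis=0)
        w = a / (a + b + 1e-12)
        # M-step: re-estimate rater performance against the soft truth
        p = (D * w).sum(axis=1) / (w.sum() + 1e-12)
        q = ((1 - D) * (1 - w)).sum(axis=1) / ((1 - w).sum() + 1e-12)
    return w, p, q
```

In a toy run with two perfect raters and one random one, the random rater’s estimated sensitivity collapses toward chance and its votes are effectively down-weighted.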
Improvement: Local Ranking
STAPLE did not have a full treatment of automatic segmentation with an atlas
• The accuracy of the registration was not taken into account
• Recently there have been developments using global or ROI-based NCC metrics, but these cannot capture how registration quality varies across the image

STEPS introduces a voxel-wise local NCC (LNCC)
• NCC is computed within a Gaussian kernel around each voxel
• The algorithm is modified to keep only the top X registered raters per voxel
• This decouples the two sources of error: registration accuracy and rater reliability
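A sketch of a voxel-wise LNCC, using Gaussian smoothing to form local means, variances, and covariance, plus a helper that keeps the top X templates per voxel. Both function names are mine, and the exact kernel and ranking details in STEPS may differ:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_ncc(a, b, sigma=2.0):
    """NCC inside a Gaussian window around each voxel: local moments
    are computed by Gaussian smoothing of a, b, a*a, b*b, and a*b."""
    mu_a = gaussian_filter(a, sigma)
    mu_b = gaussian_filter(b, sigma)
    var_a = gaussian_filter(a * a, sigma) - mu_a ** 2
    var_b = gaussian_filter(b * b, sigma) - mu_b ** 2
    cov = gaussian_filter(a * b, sigma) - mu_a * mu_b
    return cov / np.sqrt(np.clip(var_a * var_b, 1e-12, None))

def topx_mask(lncc_stack, X):
    """Boolean mask keeping, at each voxel, the X templates with
    highest LNCC. lncc_stack: (R, ...) stack of LNCC maps."""
    order = np.argsort(-lncc_stack, axis=0)   # template indices, best first
    ranks = np.argsort(order, axis=0)         # each template's rank per voxel
    return ranks < X
```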
Improvement: MRF Regularization
Spatial consistency, promoting a contiguous segmentation, is incorporated via a Markov random field (MRF)
• A voxel is independent of the rest of the image given its neighbors (6-connectivity)
• P(ti = k) now also depends on the voxel’s neighbors
• The authors adapted the MRF to be computed on probabilities instead of hard labels, claiming better accuracy and speed
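A toy mean-field-style sketch of the idea, pulling each voxel’s probability toward its neighborhood (2D with 4-connectivity for brevity; this is not the paper’s exact MRF formulation, and `beta` is a hypothetical smoothing weight):

```python
import numpy as np

def mrf_smooth(w, beta=1.0, n_iter=5):
    """Mean-field-style smoothing of a 2D probability map w = P(t = 1):
    each voxel is pulled toward the average of its 4 neighbours."""
    w = w.astype(float).copy()
    for _ in range(n_iter):
        nb = np.zeros_like(w)
        for ax, sh in [(0, 1), (0, -1), (1, 1), (1, -1)]:
            nb += np.roll(w, sh, axis=ax)  # wrap-around edges; fine for a sketch
        m = nb / 4.0
        # Potts-like weights: favour agreement with the neighbourhood mean
        e1 = np.exp(beta * m)
        e0 = np.exp(beta * (1 - m))
        w = (w * e1) / (w * e1 + (1 - w) * e0 + 1e-12)
    return w
```

An isolated low-probability voxel inside a confident region gets pulled up by its neighbors, which is exactly the contiguity behavior the slide describes.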
Improvement: Unbiasing
In STAPLE, the size of the segmented object heavily biases p and q
• If the object is small, a rater can guess 0s everywhere and still score a high q
• A high confidence threshold (~0.9999) is usually needed to obtain the final segmentation

STEPS now considers only voxels in contention between raters
• Improves computation time and performance
• The choice of threshold is now simply 0.5
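The unbiasing step can be illustrated by splitting voxels into unanimous and contested sets; only the contested ones need to go through the EM fusion (a minimal sketch, with a function name of my own choosing):

```python
import numpy as np

def split_consensus(D):
    """Split voxels into unanimous (take the agreed label) and contested
    (to be resolved by the fusion). D: (R, N) array of rater labels."""
    agree = np.all(D == D[0], axis=0)  # all raters give the same label
    consensus_label = D[0]             # valid wherever agree is True
    return agree, consensus_label
```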
Validation
Simulation
• Show the effect of local ranking using samples with different morphologies
• Compared against other algorithms with leave-one-out cross validation on simulated MR data

ADNI database, measuring atrophy
• Evaluated the improvement from each innovation
• Compared against other algorithms with leave-one-out cross validation

The other algorithms are implemented in NiftySeg
• Where is STEPS?
Results: Optimizing Local Ranking
• STEPS does best for X = 15, but the gain is marginal?
• Notably, other algorithms can become much worse with more templates
Results: Simulation
• Local ranking in STEPS beats global ranking in STAPLE
• Especially when the morphology of the target is very different from atlas raters
• This means we can get away with fewer raters while still covering more cases
Results: Phantom Simulation
• Measures performance as R, the number of raters (the size of the atlas), is reduced
• STEPS beats the others at even the smallest R
  – Performance characteristics can change with R, i.e. the optimal X
Results: Phantom Simulation
• STEPS is significantly better than all of these competitors
• They all seem pretty good though, splitting hairs?
Results: Phantom Simulation
Results: ADNI
Results: MRF Smoothing in Multi-Steps
Discussion
Each innovation in STEPS significantly improves the accuracy
• Final performance is significantly better than state-of-the-art competitors
• The Dice score (0.925 ± 0.021) is close to the inter-rater variability of the manual segmentors (0.93 ± 0.03) on a different database

Local ranking strategies implicitly encode local, rather than global, morphological variability
• Fewer anatomical templates are needed to deal with the population’s variability
• However, LNCC may have limitations in low-contrast regions; consider other metrics