JOURNAL CLUB: Cardoso et al., University College London, UK
“STEPS: Similarity and Truth Estimation for Propagated Segmentations and its application to hippocampal segmentation and brain parcelation.”
Jul 15, 2013, Jason Su
Motivation
• Manual segmentation of structures on MR images is an important but often tedious task, requiring the expertise of a radiologist
  – Automation is highly desirable and can also help remove observer bias
• At 7T, we are working on studies that could benefit from this
  – Manually segmenting thalamic nuclei with the improved contrast of WMn-MPRAGE
  – Measuring atrophy in MS patients
Background
STAPLE: Warfield et al. “Simultaneous Truth and Performance Level Estimation.” 2004.
• The algorithm considers a collection of segmentations and computes a probabilistic estimate of the true segmentation: “label fusion”
• The source of each segmentation in the collection may be a human rater or an automated method
• The true segmentation is formed by estimating an optimal combination of the input segmentations, weighting each one by its estimated performance
Similarity Measures
Intra-modal: same contrast, same imaging parameters
• Mean Square Difference (MSD): (1/N)·Σ(Ai − Bi)²
  – Requires the images to have intensity values in the same range
• Normalized Cross Correlation (NCC): cov(A,B)/(σA·σB)
  – Allows intensity values related by a linear transformation

Inter-modal:
• Mutual Information (MI): H(A) + H(B) − H(A,B)
  – Indicates how much uncertainty about one image is reduced by knowledge of the other
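To make the three intensity-based measures concrete, here is a minimal NumPy sketch (the function names are mine, not from the paper or NiftySeg; MI is estimated from a joint histogram):

```python
import numpy as np

def msd(a, b):
    """Mean square difference: assumes intensities on the same scale."""
    return np.mean((a - b) ** 2)

def ncc(a, b):
    """Normalized cross correlation: invariant to linear intensity maps."""
    a0, b0 = a - a.mean(), b - b.mean()
    return (a0 * b0).mean() / (a.std() * b.std())

def mutual_information(a, b, bins=32):
    """MI(A,B) = H(A) + H(B) - H(A,B), estimated from a joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    h_xy = -np.sum(pxy[pxy > 0] * np.log(pxy[pxy > 0]))
    h_x = -np.sum(px[px > 0] * np.log(px[px > 0]))
    h_y = -np.sum(py[py > 0] * np.log(py[py > 0]))
    return h_x + h_y - h_xy
```

Note that NCC is exactly 1 for images related by a positive linear map, which is why it tolerates a linear intensity transformation where MSD does not.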
Similarity Measures
For binary or multi-class masks
• Dice coefficient: 2·|A ∩ B| / (|A| + |B|)
  – where |X| = the size of region X
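A minimal sketch of the Dice overlap for binary masks (the function name is mine):

```python
import numpy as np

def dice(a, b):
    """Dice coefficient for binary masks: 2|A ∩ B| / (|A| + |B|)."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
```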
STEPS Improvements on STAPLE
• Use local intensity to select the best labels to fuse
• Markov random field (MRF) to ensure spatial consistency, now able to operate on probabilities
• Make the estimate unbiased towards larger structures
• Extend these improvements to the multi-label problem
Theory: Model
For each voxel i:
• yi = image intensity
• ti = the true segmentation (binary or multi-class)
• di = column vector of the R candidate segmentations, i.e. our atlas, found automatically or manually

For each rater j:
• pj and qj represent how good the rater is at estimating 1s and 0s
• pj = P(dij = 1 | ti = 1)
• qj = P(dij = 0 | ti = 0)
Theory: Model
What we want: wi = P(ti = 1 | di, p, q)
• The posterior probability
• Given all the information from our raters and how good they are, what is the likelihood that this voxel is actually in the structure?

Unfortunately p and q are coupled with wi
• We simultaneously refine our estimate of the truth and of the raters’ performance relative to it
• This is solved iteratively with an Expectation-Maximization (EM) algorithm
  – Alternately keep one fixed and solve for the other until convergence
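The EM loop can be sketched for the binary case. This is a simplified illustration of STAPLE-style fusion, assuming a fixed global prior and independent raters; the implementation details are mine, not the paper’s:

```python
import numpy as np

def staple(D, n_iter=30):
    """Binary STAPLE-style EM.
    D: (R, N) int array of R raters' binary labels over N voxels.
    Returns w = P(t_i = 1 | d, p, q) and per-rater (p, q)."""
    R, N = D.shape
    p = np.full(R, 0.9)   # sensitivity P(d = 1 | t = 1), initial guess
    q = np.full(R, 0.9)   # specificity P(d = 0 | t = 0), initial guess
    prior = D.mean()      # fixed global prior P(t = 1)
    for _ in range(n_iter):
        # E-step: posterior probability that the true label is 1
        a = prior * np.prod(np.where(D == 1, p[:, None], 1 - p[:, None]), axis=0)
        b = (1 - prior) * np.prod(np.where(D == 0, q[:, None], 1 - q[:, None]), axis=0)
        w = a / (a + b + 1e-12)
        # M-step: re-estimate rater performance against the soft truth
        p = (D * w).sum(axis=1) / (w.sum() + 1e-12)
        q = ((1 - D) * (1 - w)).sum(axis=1) / ((1 - w).sum() + 1e-12)
    return w, p, q
```

In a toy run with two perfect raters and one random one, the random rater’s estimated sensitivity collapses toward chance and its votes are effectively down-weighted.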
Improvement: Local Ranking
STAPLE did not have a full treatment of automatic segmentation with an atlas
• The accuracy of the registration was not taken into account
• Recently there have been developments using global or ROI-based NCC metrics, but these cannot capture how registration quality varies across the image

STEPS introduces a voxel-wise local NCC (LNCC)
• NCC is computed within a Gaussian kernel around each voxel
• The algorithm is modified to keep only the top X registered raters per voxel
• This decouples the two sources of error: registration accuracy and rater reliability
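A sketch of a voxel-wise LNCC, using Gaussian smoothing to form local means, variances, and covariance, plus a helper that keeps the top X templates per voxel. Both function names are mine, and the exact kernel and ranking details in STEPS may differ:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_ncc(a, b, sigma=2.0):
    """NCC inside a Gaussian window around each voxel: local moments
    are computed by Gaussian smoothing of a, b, a*a, b*b, and a*b."""
    mu_a = gaussian_filter(a, sigma)
    mu_b = gaussian_filter(b, sigma)
    var_a = gaussian_filter(a * a, sigma) - mu_a ** 2
    var_b = gaussian_filter(b * b, sigma) - mu_b ** 2
    cov = gaussian_filter(a * b, sigma) - mu_a * mu_b
    return cov / np.sqrt(np.clip(var_a * var_b, 1e-12, None))

def topx_mask(lncc_stack, X):
    """Boolean mask keeping, at each voxel, the X templates with
    highest LNCC. lncc_stack: (R, ...) stack of LNCC maps."""
    order = np.argsort(-lncc_stack, axis=0)   # template indices, best first
    ranks = np.argsort(order, axis=0)         # each template's rank per voxel
    return ranks < X
```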
Improvement: MRF Regularization
Spatial consistency, promoting a contiguous segmentation, is incorporated via a Markov random field (MRF)
• A voxel is independent of the rest of the image given its neighbors (6-connectivity)
• P(ti = k) now also depends on the voxel’s neighbors
• The authors adapted the MRF to be computed on probabilities instead of hard labels, claiming better accuracy and speed
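A toy mean-field-style sketch of the idea, pulling each voxel’s probability toward its neighborhood (2D with 4-connectivity for brevity; this is not the paper’s exact MRF formulation, and `beta` is a hypothetical smoothing weight):

```python
import numpy as np

def mrf_smooth(w, beta=1.0, n_iter=5):
    """Mean-field-style smoothing of a 2D probability map w = P(t = 1):
    each voxel is pulled toward the average of its 4 neighbours."""
    w = w.astype(float).copy()
    for _ in range(n_iter):
        nb = np.zeros_like(w)
        for ax, sh in [(0, 1), (0, -1), (1, 1), (1, -1)]:
            nb += np.roll(w, sh, axis=ax)  # wrap-around edges; fine for a sketch
        m = nb / 4.0
        # Potts-like weights: favour agreement with the neighbourhood mean
        e1 = np.exp(beta * m)
        e0 = np.exp(beta * (1 - m))
        w = (w * e1) / (w * e1 + (1 - w) * e0 + 1e-12)
    return w
```

An isolated low-probability voxel inside a confident region gets pulled up by its neighbors, which is exactly the contiguity behavior the slide describes.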
Improvement: Unbiasing
In STAPLE, the size of the segmented object heavily biases p and q
• If the object is small, a rater can guess 0s everywhere and still score a high q
• A high confidence threshold (~0.9999) is usually needed to obtain the final segmentation

STEPS now considers only voxels in contention between raters
• Improves computation time and performance
• The choice of threshold is now simply 0.5
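The unbiasing step can be illustrated by splitting voxels into unanimous and contested sets; only the contested ones need to go through the EM fusion (a minimal sketch, with a function name of my own choosing):

```python
import numpy as np

def split_consensus(D):
    """Split voxels into unanimous (take the agreed label) and contested
    (to be resolved by the fusion). D: (R, N) array of rater labels."""
    agree = np.all(D == D[0], axis=0)  # all raters give the same label
    consensus_label = D[0]             # valid wherever agree is True
    return agree, consensus_label
```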
Validation
Simulation
• Show the effect of local ranking using samples with different morphologies
• Compared against other algorithms with leave-one-out cross validation on simulated MR data

ADNI database, measuring atrophy
• Evaluated the improvement from each innovation
• Compared against other algorithms with leave-one-out cross validation

The other algorithms are implemented in NiftySeg
• Where is STEPS?
Results: Optimizing Local Ranking
• STEPS does best for X = 15, but the gain is marginal?
• Notably, other algorithms can become much worse with more templates
Results: Simulation
• Local ranking in STEPS beats global ranking in STAPLE
• Especially when the morphology of the target is very different from atlas raters
• This means we can get away with fewer raters while still covering more cases
Results: Phantom Simulation
• Measures performance as R, the number of raters (the size of the atlas), is reduced
• STEPS beats the others at even the smallest R
  – Performance characteristics can change with R, i.e. the optimal X
Results: Phantom Simulation
• STEPS is significantly better than all of these competitors
• They all seem pretty good though, splitting hairs?
Results: Phantom Simulation
Results: ADNI
Results: MRF Smoothing in Multi-Steps
Discussion
Each innovation in STEPS significantly improves the accuracy
• Final performance is significantly better than state-of-the-art competitors
• The Dice score (0.925 ± 0.021) is close to the inter-rater variability of the manual segmentors (0.93 ± 0.03) on a different database

Local ranking strategies implicitly encode local, rather than global, morphological variability
• Fewer anatomical templates are needed to deal with the population’s variability
• However, LNCC may have limitations in low-contrast regions; consider other metrics