Ferran Prados¹, John Ashburner², Claudia Blaiotta², Tom Brosch³, Julio Carballido-Gamio⁴, Manuel Jorge Cardoso⁵, Benjamin N Conrad⁶, Esha Datta⁴, Gergely Dávid⁷, Benjamin De Leener⁸, Sara M Dupont⁸, Patrick Freund⁷, Claudia A M Gandini Wheeler-Kingshott⁹, Francesco Grussu¹⁰, Roland Henry⁴, Bennett A Landman⁶, Emil Ljungberg¹¹, Bailey Lyttle¹², Sebastien Ourselin⁵, Nico Papinutto⁴, Salvatore Saporito¹³, Regina Schlaeger⁴, Seth A Smith¹⁴, Paul Summers¹⁵, Roger Tam¹⁶, Marios C Yiannakas¹⁰, Alyssa Zhu⁴, Julien Cohen-Adad¹⁷.
Abstract
An important image processing step in spinal cord magnetic resonance imaging is the ability to reliably and accurately segment grey and white matter for tissue-specific analysis. Several semi- or fully automated segmentation methods exist for measuring cervical cord cross-sectional area, with excellent performance close or equal to that of manual segmentation. However, grey matter segmentation is still challenging due to its small cross-sectional size and shape, and active research is being conducted by several groups around the world in this field. Therefore, a spinal cord grey matter segmentation challenge was organised to test the capabilities of various methods on the same multi-centre, multi-vendor dataset acquired with distinct 3D gradient-echo sequences. The challenge aimed to characterize the state of the art in the field and to identify new opportunities for future improvement. Six spinal cord grey matter segmentation methods, developed independently by research groups around the world, were evaluated, and their performance was compared with manual segmentation, the present gold standard. All algorithms provided good overall results for detecting the grey matter butterfly, albeit with variable performance on certain quality-of-segmentation metrics. The data have been made publicly available and the challenge website remains open to new submissions. No modifications were introduced to any of the presented methods as a result of this challenge for the purposes of this publication.
Keywords: Challenge; Evaluation metrics; Grey matter; MRI; Segmentation; Spinal cord
Year: 2017 PMID: 28286318 PMCID: PMC5440179 DOI: 10.1016/j.neuroimage.2017.03.010
Source DB: PubMed Journal: Neuroimage ISSN: 1053-8119 Impact factor: 6.556
Demographic data per site. First row: number of healthy controls; second row: gender, female (F):male (M); third row: mean age in years (std). Std: standard deviation.
| | Site 1 | Site 2 | Site 3 | Site 4 |
| Healthy controls | 20 | 20 | 20 | 20 |
| Gender (F:M) | 14F:6M | 11F:9M | 6F:14M | 7F:13M |
| Mean age (std), years | 44.3 (10.4) | 33.7 (17.4) | 40.6 (10.4) | 28.3 (8.2) |
A summary of acquisition parameters from each site.
| | Site 1 | Site 2 | Site 3 | Site 4 |
| Scanner | 3 T Philips Achieva | 3 T Siemens TIM Trio | 3 T Siemens Skyra | 3 T Philips Achieva |
| Sequence | 3D gradient echo | 2D spoiled gradient multi-echo | 3D multi-echo gradient-echo | 3D multi-echo gradient-echo |
| | 5 | 19 | | |
| | 23 | 539 | 44 | 700 |
| | 7 | 35 | 11 | 28 |
| | 240×180 | 320×320 | 162×192 | 160×160 |
| | 8 | 1 | 5 | 2 |
| | 10 (3 extracted) | 10 | 20 | 14 |
| | 13:34 | 4:38 | 10:40 | 5:46 |
| | 16 | 16 | 16 | |
| Coil | Neurovascular | Head+Neck | Neurovascular | Neurovascular |
| Parallel imaging | – | GRAPPA factor 2 | – | SENSE RL=2 |
A summary of the validation metrics.
| Metric | Description | Interpretation | Type |
| DSC | Similarity between masks | Higher values are better | Overlap |
| JI | Similarity between masks | Higher values are better | Overlap |
| CC | Ratio between mis-segmented and correctly segmented | Higher values are better | Overlap |
| MSD | Mean Euclidean distance between mask contours (mean error) | Smaller values are better | Distance |
| HSD | Longest Euclidean distance between mask contours (absolute error) | Smaller values are better | Distance |
| SHD | Indicator of maximal local error | Smaller values are better | Distance |
| SMD | Indicator of global errors | Smaller values are better | Distance |
| TPR | Low values mean that method tends to under-segment | Higher values are better | Statistical |
| TNR | Quality of segmented background | Higher values are better | Statistical |
| PPV | Low values mean that method tends to over-segment | Higher values are better | Statistical |
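The overlap and statistical metrics above can all be derived from the four voxel counts of a binary confusion matrix (true/false positives and negatives). The sketch below is a minimal illustration, assuming flat 0/1 masks on the same voxel grid; the helper name `overlap_metrics` is hypothetical, and CC uses the common definition CC = (1 − (FP + FN)/TP) × 100, which matches the "ratio between mis-segmented and correctly segmented" description. It is not the challenge's evaluation code.

```python
from typing import Dict, Sequence


def overlap_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Overlap and statistical metrics for two flat binary masks.

    Hypothetical helper: masks are 0/1 sequences on the same voxel grid.
    DSC and JI are fractions; CC, TPR, TNR and PPV are in percent.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    if tp == 0:
        raise ValueError("no true positives: DSC, JI and CC are degenerate")
    background = tn + fp
    return {
        "DSC": 2 * tp / (2 * tp + fp + fn),     # Dice similarity coefficient
        "JI": tp / (tp + fp + fn),               # Jaccard index
        "CC": (1 - (fp + fn) / tp) * 100,        # conformity coefficient, may be negative
        "TPR": 100 * tp / (tp + fn),             # low values: under-segmentation
        # quality of segmented background; NaN if the mask has no background
        "TNR": 100 * tn / background if background else float("nan"),
        "PPV": 100 * tp / (tp + fp),             # low values: over-segmentation
    }


if __name__ == "__main__":
    m = overlap_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
    print(round(m["DSC"], 3), m["JI"], m["CC"])  # 0.667 0.5 0.0
```

For any single mask pair, JI = DSC/(2 − DSC) and CC = (3·DSC − 2)/DSC × 100, so CC falls steeply and turns negative once DSC drops below 2/3.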
Setup parameters and characteristics of each presented method. Note that atlas size is given in number of slices, and that the computational time per slice is approximate: it was measured on different workstations and may vary with image resolution.
| Automatic | No | 820 | 4–5 min |
| Automatic | Yes (4 h) | 160 | |
| Automatic | No | 1 | 1 s |
| Manual | Yes ( | No | 5–80 s |
| Automatic | No | 447 | 8–10 s |
| Automatic | No | No | 5 s |
Comparison of each rater's segmentation with the majority voting mask of all raters for the test dataset, reported as mean (std) Dice similarity coefficient (DSC), mean surface distance (MSD), Hausdorff surface distance (HSD), skeletonized Hausdorff distance (SHD), skeletonized median distance (SMD), true positive rate (TPR), true negative rate (TNR), positive predictive value (PPV), Jaccard index (JI) and conformity coefficient (CC). The best result for each metric is shown in bold face. An asterisk (*) marks a significant difference (paired t-test, p<0.05) between a rater's result and the best result. MSD, HSD, SHD and SMD are in millimetres and lower values are better; for all other scores, higher values are better.
| DSC | 0.91 (0.02)* | 0.89 (0.03)* | 0.90 (0.03)* | |
| MSD | 0.20 (0.21) | 0.30 (0.31)* | 0.21 (0.22) | |
| HSD | 1.80 (0.68)* | 1.75 (0.57)* | 1.53 (0.44) | |
| SHD | 0.71 (0.28) | 1.10 (0.39)* | 0.70 (0.31) | |
| SMD | 0.37 (0.18) | 0.43 (0.21) | 0.36 (0.18) | |
| TPR | 89.27 (3.7) | 81.99 (5.39)* | 84.64 (3.76)* | |
| TNR | 99.990 (0.02) | 99.995 (0.01) | 99.994 (0.01) | |
| PPV | 92.01 (3.48)* | 96.04 (1.92) | 95.08 (2.06)* | |
| JI | 0.83 (0.04)* | 0.80 (0.05)* | 0.82 (0.04)* | |
| CC | 78.95 (5.94)* | 73.80 (8.89)* | 77.45 (6.40)* | |
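The significance markers in these tables come from a paired t-test at p<0.05. A minimal sketch of the test statistic follows, assuming each entry pairs two sets of per-mask scores computed on the same masks (the pairing scheme is an assumption; the caption does not spell it out). The helper name `paired_t_statistic` is hypothetical and only the standard library is used; the two-sided p-value then comes from Student's t distribution with n − 1 degrees of freedom, for example via `scipy.stats.ttest_rel(a, b)`, which returns both the statistic and the p-value.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched per-mask scores."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    rater_a = [0.80, 0.78, 0.82, 0.79]  # made-up per-mask DSC values
    rater_b = [0.75, 0.74, 0.78, 0.76]
    t, df = paired_t_statistic(rater_a, rater_b)
    print(f"t = {t:.2f}, df = {df}")  # t = 9.80, df = 3
```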
Fig. 1. Results of the raters for the testing dataset. Boxplots: the mean value is represented by a rhombus and dots show the individual values obtained per mask. Each rater's results are compared with the majority voting mask. From left to right, first row: Dice similarity coefficient (DSC), mean surface distance (MSD), Hausdorff surface distance (HSD), skeletonized median distance (SMD) and skeletonized Hausdorff distance (SHD). Second row: true positive rate (TPR), true negative rate (TNR), positive predictive value (PPV), Jaccard index (JI) and conformity coefficient (CC).
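The reference mask for the rater comparison is a voxel-wise majority vote over the four raters. The sketch below assumes flat 0/1 masks of equal length; the helper name `majority_vote` is hypothetical. With four raters a voxel marked by exactly two of them is a tie, and the captions do not say how ties were resolved, so this sketch assumes ties go to background (a strict majority is required).

```python
from typing import List, Sequence


def majority_vote(masks: Sequence[Sequence[int]]) -> List[int]:
    """Voxel-wise majority vote over binary rater masks.

    A voxel is labelled grey matter only when strictly more than half of the
    raters marked it; with an even number of raters, ties go to background.
    """
    if not masks:
        raise ValueError("need at least one mask")
    n_voxels = len(masks[0])
    if any(len(m) != n_voxels for m in masks):
        raise ValueError("all masks must have the same number of voxels")
    n_raters = len(masks)
    # 2 * votes > n_raters is an integer-only test for "votes > n_raters / 2"
    return [int(2 * sum(column) > n_raters) for column in zip(*masks)]


if __name__ == "__main__":
    raters = [
        [1, 1, 0, 0, 1],
        [1, 0, 0, 1, 1],
        [1, 1, 0, 0, 0],
        [0, 1, 1, 0, 0],
    ]
    print(majority_vote(raters))  # [1, 1, 0, 0, 0]: the last voxel is a 2-2 tie
```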
Comparison of each method's segmentation with each of the four raters' masks for the test dataset, reported as mean (std) Dice similarity coefficient (DSC), mean surface distance (MSD), Hausdorff surface distance (HSD), skeletonized Hausdorff distance (SHD), skeletonized median distance (SMD), true positive rate (TPR), true negative rate (TNR), positive predictive value (PPV), Jaccard index (JI) and conformity coefficient (CC). The best result for each metric is shown in bold face. An asterisk (*) marks a significant difference (paired t-test, p<0.05) between the obtained result and the best result. A plus sign (+) marks a non-significant difference (paired t-test, p≥0.05) between the obtained result and the consensus of the raters. MSD, HSD, SHD and SMD are in millimetres and lower values are better; for all other scores, higher values are better.
| DSC | 0.79 (0.04) | 0.75 (0.07)* | 0.76 (0.06)* | 0.69 (0.07)* | 0.61 (0.13)* | |
| MSD | 0.46 (0.48) | 0.70 (0.79)* | 0.62 (0.64) | 0.69 (0.76)* | 1.04 (1.14)* | |
| HSD | 4.07 (3.27)* | 3.56 (1.34) | 4.92 (3.30)* | 3.26 (1.35) | 5.34 (15.35)+ | |
| SHD | 1.26 (0.65)* | 1.07 (0.37) | 1.86 (0.85)* | 1.12 (0.41) | 2.77 (8.10)+ | |
| SMD | 0.45 (0.20)*+ | 0.39 (0.17)*+ | 0.61 (0.35)* | 0.39 (0.16)+ | 0.54 (0.25)* | |
| TPR | 77.98 (4.88)* | 78.89 (10.33)* | 75.69 (8.08)* | 70.29 (6.76)* | 65.66 (14.39)* | |
| TNR | 99.97 (0.04) | 99.94 (0.08)* | 99.97 (0.05) | 99.95 (0.06) | 99.93 (0.09)* | |
| PPV | 81.06 (5.97) | 65.60 (9.01)* | 76.26 (7.41)* | 67.87 (8.62)* | 59.07 (13.69)* | |
| JI | 0.66 (0.05) | 0.60 (0.08)* | 0.61 (0.08)* | 0.53 (0.08)* | 0.45 (0.13)* | |
| CC | 47.17 (11.87) | 29.36 (29.53)* | 33.69 (24.23)* | 6.46 (30.59)* | | |
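MSD and HSD are defined in the metrics table as the mean and the longest Euclidean distance between mask contours. A minimal brute-force sketch follows, assuming the contour points have already been extracted and scaled to millimetres (pixel indices multiplied by the in-plane resolution); contour extraction is not shown, the helper names are hypothetical, and the pooled symmetric mean used for MSD is one common definition that may differ in detail from the challenge's implementation. The skeletonized variants (SHD, SMD) are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _closest_distances(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    # For every contour point in src, distance to the nearest point in dst.
    return [min(math.dist(p, q) for q in dst) for p in src]


def surface_distances(a: Sequence[Point], b: Sequence[Point]) -> Tuple[float, float]:
    """Return (mean surface distance, Hausdorff distance) between two contours.

    Points are contour coordinates already expressed in millimetres.
    Brute force, O(len(a) * len(b)).
    """
    if not a or not b:
        raise ValueError("both contours must contain at least one point")
    d_ab = _closest_distances(a, b)
    d_ba = _closest_distances(b, a)
    msd = (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))  # symmetric mean
    hsd = max(max(d_ab), max(d_ba))                           # worst-case distance
    return msd, hsd


if __name__ == "__main__":
    contour_a = [(0.0, 0.0), (1.0, 0.0)]
    contour_b = [(0.0, 0.0), (3.0, 0.0)]
    print(surface_distances(contour_a, contour_b))  # (0.75, 2.0)
```

Because HSD is a maximum, a single stray contour point can dominate it, which is consistent with the large standard deviations some methods show for HSD in the table above.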
Fig. 2. Dice similarity coefficient (DSC), mean surface distance (MSD), Hausdorff surface distance (HSD), skeletonized Hausdorff distance (SHD) and skeletonized median distance (SMD) results of the presented methods per site on the testing dataset. Boxplots: the mean value is represented by a rhombus and dots show the individual values obtained per mask. MSD, HSD, SMD and SHD are in mm and plotted on a logarithmic scale.
Fig. 3. True positive rate (TPR), true negative rate (TNR), positive predictive value (PPV), Jaccard index (JI) and conformity coefficient (CC) results of the presented methods per site on the testing dataset. Boxplots: the mean value is represented by a rhombus and dots show the individual values obtained per mask.
Fig. 4. Binary grey matter segmentation results on the same single slice of subject 11 from each site. From top to bottom row: input image, majority voting segmentation from the 4 raters, and the segmentation methods JCSCS, DEEPSEG, MGAC, GSBME, SCT and VBEM. The obtained 3D DSC is overlaid.
Generalized linear model results for each method's performance on each metric as a function of the scanner sequence, expressed as p-values (F-test across all site coefficients in a regression model). Values in bold face indicate that image quality has a statistically significant influence on that metric for that method. Dice similarity coefficient (DSC), mean surface distance (MSD), Hausdorff surface distance (HSD), skeletonized Hausdorff distance (SHD), skeletonized median distance (SMD), true positive rate (TPR), true negative rate (TNR), positive predictive value (PPV), Jaccard index (JI) and conformity coefficient (CC).
| 0.233 | 0.210 | 0.174 | 0.286 | |||
| 0.270 | 0.295 | |||||
| 0.345 | ||||||
| 0.120 | 0.145 | 0.263 | 0.869 | |||
| 0.155 | 0.140 | |||||
| 0.217 | 0.172 | 0.212 | 0.261 | |||
| 0.256 | 0.289 | 0.161 | 0.346 |
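The p-values above come from an F-test that all site coefficients are jointly zero. If site were the only (categorical) predictor, that test reduces to a one-way ANOVA across sites; whether the authors' model included other covariates is not stated here, so the sketch below is a simplified illustration under that assumption. Note that in this dataset each site used its own scanner and sequence, so site and sequence effects cannot be separated. The helper name `site_f_statistic` and the example scores are hypothetical; the p-value would come from the F distribution with the returned degrees of freedom, for example via `scipy.stats.f_oneway(*groups)`, which returns both the statistic and the p-value.

```python
from typing import Dict, Sequence, Tuple


def site_f_statistic(scores_by_site: Dict[str, Sequence[float]]) -> Tuple[float, int, int]:
    """One-way ANOVA F statistic for one metric grouped by site.

    Equivalent to the F-test that all site coefficients are zero in a linear
    model whose only predictor is site. Returns (F, df_between, df_within).
    """
    groups = [list(v) for v in scores_by_site.values()]
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty sites")
    k = len(groups)
    n = sum(len(g) for g in groups)
    if n <= k:
        raise ValueError("need more scores than sites")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-site and within-site sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-site variance; F is undefined")
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


if __name__ == "__main__":
    # Made-up scores for two sites
    print(site_f_statistic({"site1": [1, 2, 3], "site2": [4, 5, 6]}))  # (13.5, 1, 4)
```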
Generalized linear model results for each method's performance on each metric as a function of subject age, expressed as the regression coefficient, its 95% confidence interval (CI) and p-value. Coefficients for SHD, SMD, MSD and HSD are in mm years−1; coefficients for DSC, TPR, TNR, PPV, JI and CC are in the units of each metric per year. Values marked as significant indicate that age (atrophy) has a statistically significant influence on that metric for that method. Dice similarity coefficient (DSC), mean surface distance (MSD), Hausdorff surface distance (HSD), skeletonized Hausdorff distance (SHD), skeletonized median distance (SMD), true positive rate (TPR), true negative rate (TNR), positive predictive value (PPV), Jaccard index (JI) and conformity coefficient (CC).
| 0.001 | 0.001 | |||||
| CI=[ | CI=[ | CI=[ | CI=[ | CI=[ | CI=[ | |
| CI=[ | CI=[ | CI=[ | CI=[ | CI=[ | CI=[ | |
| 0.070 | ||||||
| CI=[ | CI=[ | CI=[ | CI=[ | CI=[ | CI=[ | |
| 0.043 | ||||||
| CI=[ | CI=[ | CI=[ | CI=[ | CI=[ | CI=[ | |
| 0.001 | 0.003 | |||||
| CI=[ | CI=[ | CI=[ | CI=[ | CI=[ | CI=[ | |
| 0.074 | 0.085 | 0.102 | ||||
| CI=[ | CI=[ | CI=[ | CI=[ | CI=[ | CI=[ | |
| CI=[ | CI=[ | CI=[ | CI=[ | CI=[ | CI=[ | |
| 0.109 | 0.125 | 0.221 | ||||
| CI=[ | CI=[0.039–0.211] | CI=[ | CI=[0.068–0.373] | CI=[ | CI=[ | |
| 0.001 | 0.002 | 0.221 | ||||
| CI=[ | CI=[ | CI=[ | CI=[ | CI=[ | CI=[ | |
| 0.312 | 0.400 | 0.623 | 0.104 | |||
| CI=[0.049–0.574] | CI=[ | CI=[ | CI=[0.068–1.179] | CI=[ | CI=[ | |
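The age coefficients are slopes of a metric regressed on subject age, in metric units per year. A minimal ordinary least-squares sketch with age as the single predictor follows; the authors' generalized linear model may include further covariates, so treat this as a simplified illustration, and note that the helper name `age_slope` and the example data are hypothetical. The 95% CI would be slope ± t(0.975, n − 2) × SE, with the quantile available, for example, from `scipy.stats.t.ppf(0.975, n - 2)`.

```python
import math
from typing import Sequence, Tuple


def age_slope(ages: Sequence[float], scores: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit score = intercept + slope * age.

    Returns (slope, intercept, standard error of the slope). The slope is in
    metric units per year, e.g. mm years^-1 for the distance metrics.
    """
    n = len(ages)
    if n != len(scores) or n < 3:
        raise ValueError("need at least 3 paired (age, score) values")
    mean_x = sum(ages) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    if sxx == 0:
        raise ValueError("ages must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters)
    residual_ss = sum((y - intercept - slope * x) ** 2 for x, y in zip(ages, scores))
    se_slope = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, intercept, se_slope


if __name__ == "__main__":
    ages = [20.0, 30.0, 40.0, 50.0]  # made-up ages in years
    hsd = [2.0, 4.0, 5.0, 8.0]       # made-up HSD values in mm
    slope, intercept, se = age_slope(ages, hsd)
    print(f"{slope:.3f} mm/year, SE {se:.4f}")  # 0.190 mm/year, SE 0.0265
```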