Rutger Heinen, Martijn D Steenwijk, Frederik Barkhof, J Matthijs Biesbroek, Wiesje M van der Flier, Hugo J Kuijf, Niels D Prins, Hugo Vrenken, Geert Jan Biessels, Jeroen de Bresser.
Abstract
White matter hyperintensities (WMHs) are a common manifestation of cerebral small vessel disease and are increasingly studied with large, pooled multicenter datasets. This data pooling increases statistical power but poses challenges for automated WMH segmentation. Although there is extensive literature on the evaluation of automated WMH segmentation methods, such evaluations in a multicenter setting are lacking. We performed WMH segmentations in sixty patients scanned on six different magnetic resonance imaging (MRI) scanners (10 patients per scanner) using five freely available, fully automated WMH segmentation methods (Cascade, kNN-TTP, Lesion-TOADS, LST-LGA and LST-LPA). Different MRI scanner vendors and field strengths were included. We compared these automated WMH segmentations with manual WMH segmentations as a reference. Performance of each method, both within and across scanners, was assessed in terms of spatial and volumetric correspondence with the reference segmentations, using Dice's similarity coefficient (DSC) and the intra-class correlation coefficient (ICC), respectively. We found the best performance, both within and across scanners, for kNN-TTP, followed by LST-LPA and LST-LGA, with worse performance for Lesion-TOADS and Cascade. Our findings can serve as a guide for choosing a method and also highlight the importance of further improving and evaluating the consistency of methods in a multicenter setting.
Year: 2019 PMID: 31727919 PMCID: PMC6856351 DOI: 10.1038/s41598-019-52966-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Mean WMH volume of the reference segmentations and the segmentations of the methods for each scanner (n = 42; n = 7 per scanner).
| WMH volume (mL) | GE Signa HDxt | GE Signa HDxt | GE Discovery | Philips Ingenuity | Philips Ingenia | Philips Achieva | Overall |
|---|---|---|---|---|---|---|---|
| Reference | 22 ± 31 | 16 ± 18 | 9 ± 10 | 14 ± 17 | 41 ± 71 | 24 ± 26 | |
| Cascade | 26 ± 20 | 19 ± 11 | 13 ± 5 | 19 ± 10 | 12 ± 4 | 11 ± 5 | |
| kNN-TTP | 16 ± 19 | 14 ± 13 | 9 ± 10 | 14 ± 17 | 32 ± 49 | 20 ± 22 | |
| Lesion-TOADS | 19 ± 20 | 16 ± 12 | 11 ± 9 | 36 ± 24 | 30 ± 45 | 31 ± 16 | |
| LST-LGA | 20 ± 19 | 19 ± 23 | 12 ± 15 | 15 ± 20 | 22 ± 28 | 14 ± 17 | |
| LST-LPA | 18 ± 22 | 15 ± 18 | 11 ± 13 | 14 ± 18 | 33 ± 51 | 18 ± 22 | |
Note: Values represent mean WMH volumes ± SD in mL. Reference: reference segmentations.
Figure 1. WMH segmentations of the methods for periventricular, confluent and punctate WMHs. Examples of WMH segmentations for a subject with predominantly periventricular WMHs (subject A, panel A), a subject with large confluent WMHs (subject B, panel B) and a subject with predominantly punctate WMHs (subject C, panel C). Top rows of panels A–C: the original FLAIR scan, the WMH reference segmentation (green) and the WMH segmentations of all methods (red). Bottom rows of panels A–C: false-negative voxels (blue) and false-positive voxels (yellow).
Performance of the WMH segmentation methods compared to the reference segmentations (n = 42; n = 7 per scanner).
| Method | Measure | GE Signa HDxt | GE Signa HDxt | GE Discovery | Philips Ingenuity | Philips Ingenia | Philips Achieva | Overall |
|---|---|---|---|---|---|---|---|---|
| Ref | WMH | 22 ± 31 | 16 ± 18 | 9 ± 10 | 14 ± 17 | 41 ± 71 | 24 ± 26 | |
| Cascade | ΔWMH | 4 ± 15 | 4 ± 19 | 4 ± 11 | 6 ± 12 | −29 ± 68 | −13 ± 22 | |
| | \|ΔWMH\| | 12 ± 9 | 14 ± 12 | 10 ± 5 | 11 ± 6 | 32 ± 66 | 15 ± 21 | |
| | DSC | 0.48 ± 0.29 | 0.35 ± 0.20 | 0.34 ± 0.25 | 0.43 ± 0.22 | 0.40 ± 0.21 | 0.41 ± 0.14 | |
| | ICC | 0.45 (−0.19; 0.87) | 0.45 (−0.18; 0.87) | * | 0.44 (−0.16; 0.86) | 0.43 (−0.40; 0.87) | 0.46 (−0.32; 0.88) | |
| kNN-TTP | ΔWMH | −5 ± 13 | −2 ± 7 | 0.8 ± 3 | 0.9 ± 2 | −9 ± 22 | −4 ± 4 | |
| | \|ΔWMH\| | 6 ± 13 | 5 ± 6 | 2 ± 2 | 1 ± 2 | 10 ± 21 | 4 ± 4 | |
| | DSC | 0.74 ± 0.11 | 0.68 ± 0.11 | 0.71 ± 0.12 | 0.74 ± 0.10 | 0.75 ± 0.14 | 0.76 ± 0.07 | |
| | ICC | 0.99 (0.94; 1.00) | 0.95 (0.73; 0.99) | 0.97 (0.76; 0.99) | 0.96 (0.80; 0.99) | 0.99 (0.95; 1.00) | 0.98 (0.88; 1.00) | |
| Lesion-TOADS | ΔWMH | −3 ± 10 | 0.5 ± 9 | 2 ± 3 | 23 ± 31 | −11 ± 26 | 7 ± 24 | |
| | \|ΔWMH\| | 5 ± 9 | 6 ± 6 | 3 ± 2 | 25 ± 29 | 14 ± 24 | 16 ± 18 | |
| | DSC | 0.63 ± 0.21 | 0.56 ± 0.20 | 0.49 ± 0.22 | 0.43 ± 0.34 | 0.61 ± 0.15 | 0.46 ± 0.32 | |
| | ICC | 0.80 (0.28; 0.96) | 0.77 (0.22; 0.96) | 0.69 (−0.01; 0.94) | * | 0.93 (0.65; 0.99) | 0.08 (−0.54; 0.73) | |
| LST-LGA | ΔWMH | −2 ± 13 | 4 ± 7 | 4 ± 6 | 2 ± 4 | −19 ± 44 | −10 ± 10 | |
| | \|ΔWMH\| | 7 ± 11 | 6 ± 6 | 4 ± 5 | 3 ± 2 | 19 ± 44 | 10 ± 10 | |
| | DSC | 0.58 ± 0.16 | 0.53 ± 0.18 | 0.54 ± 0.12 | 0.53 ± 0.17 | 0.63 ± 0.18 | 0.59 ± 0.11 | |
| | ICC | 0.95 (0.70; 0.99) | 0.92 (0.62; 0.99) | 0.97 (0.78; 1.00) | 0.92 (0.61; 0.99) | 0.90 (0.32; 0.98) | 0.89 (−0.03; 0.99) | |
| LST-LPA | ΔWMH | −3 ± 10 | −0.2 ± 7 | 2 ± 5 | 0.6 ± 4 | −8 ± 21 | −6 ± 6 | |
| | \|ΔWMH\| | 5 ± 8 | 4 ± 5 | 3 ± 5 | 3 ± 2 | 10 ± 20 | 7 ± 5 | |
| | DSC | 0.65 ± 0.13 | 0.52 ± 0.20 | 0.53 ± 0.17 | 0.59 ± 0.17 | 0.69 ± 0.15 | 0.63 ± 0.11 | |
| | ICC | 0.97 (0.85; 1.00) | 0.87 (0.47; 0.98) | 0.94 (0.71; 0.99) | 0.88 (0.43; 0.98) | 0.96 (0.80; 0.99) | 0.93 (0.54; 0.99) | |
Note: WMH, ΔWMH, |ΔWMH| and DSC are shown as means ± SD. ICC is shown with its 95% confidence interval.
Ref: reference; WMH: WMH volume (mL); ΔWMH: difference in WMH volume (mL) between the segmentations of the methods and the reference segmentations (method minus reference, so negative values indicate underestimation); |ΔWMH|: absolute difference in WMH volume (mL) between the segmentations of the methods and the reference segmentations; DSC: Dice similarity coefficient; ICC: intra-class correlation coefficient. *Negative ICC (not used for calculating the overall mean ICC).
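The spatial and volumetric measures defined above can be illustrated with a short Python sketch. This is not the authors' code: masks are represented as hypothetical sets of voxel coordinates, ΔWMH is taken as automated minus reference volume (consistent with the table, where Cascade's 12 mL versus the 41 mL reference on the Philips Ingenia gives −29 mL), and SDs use the sample (n − 1) denominator, which the source does not specify. The function names and toy values are illustrative only.

```python
from statistics import mean, stdev


def dice(reference, automated):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for two sets of voxel coordinates."""
    ref, auto = set(reference), set(automated)
    total = len(ref) + len(auto)
    if total == 0:
        return 1.0  # assumption: two empty segmentations count as perfect overlap
    return 2 * len(ref & auto) / total


def volume_agreement(reference_ml, automated_ml):
    """Mean and SD of ΔWMH and |ΔWMH| in mL, with ΔWMH = automated − reference."""
    deltas = [a - r for r, a in zip(reference_ml, automated_ml)]
    abs_deltas = [abs(d) for d in deltas]
    return {
        "delta": (mean(deltas), stdev(deltas)),
        "abs_delta": (mean(abs_deltas), stdev(abs_deltas)),
    }


# Toy masks (hypothetical voxel coordinates): 3 of the 4 voxels overlap.
ref_mask = {(10, 12, 5), (10, 13, 5), (11, 12, 5), (11, 13, 5)}
auto_mask = {(10, 12, 5), (10, 13, 5), (11, 12, 5), (12, 12, 5)}
print(dice(ref_mask, auto_mask))  # 2 * 3 / (4 + 4) = 0.75

# Toy per-subject volumes in mL (not data from the study).
print(volume_agreement([4.0, 16.0, 30.0], [5.0, 15.0, 24.0]))
```

Averaging signed differences lets over- and underestimation cancel, which is why the tables also report |ΔWMH|.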
Variation in performance across scanners by means of multiple linear regression analyses (n = 42; n = 7 per scanner).
| Method | GE Signa HDxt | GE Signa HDxt | GE Discovery | Philips Ingenuity | Philips Ingenia | Philips Achieva |
|---|---|---|---|---|---|---|
| Cascade | 0.09 [−0.09; 0.27] | −0.06 [−0.24; 0.12] | −0.08 [−0.26; 0.10] | 0.03 [−0.15; 0.21] | 0.003 [−0.18; 0.18] | 0.01 [−0.17; 0.19] |
| kNN-TTP | 0.01 [−0.08; 0.10] | −0.06 [−0.15; 0.03] | −0.03 [−0.12; 0.07] | 0.02 [−0.08; 0.11] | 0.03 [−0.06; 0.12] | 0.03 [−0.06; 0.12] |
| Lesion-TOADS | 0.12 [−0.08; 0.33] | 0.04 [−0.17; 0.24] | −0.05 [−0.26; 0.16] | −0.12 [−0.33; 0.08] | 0.10 [−0.11; 0.30] | −0.08 [−0.29; 0.12] |
| LST-LGA | 0.02 [−0.11; 0.14] | −0.04 [−0.17; 0.09] | −0.03 [−0.16; 0.10] | −0.04 [−0.17; 0.09] | 0.07 [−0.05; 0.20] | 0.02 [−0.10; 0.15] |
| LST-LPA | 0.06 [−0.07; 0.20] | −0.10 [−0.24; 0.03] | −0.09 [−0.23; 0.05] | −0.01 [−0.15; 0.13] | 0.11 [−0.03; 0.24] | 0.03 [−0.10; 0.17] |
Data are unstandardized beta coefficients with 95% confidence intervals. We assessed whether the DSC (outcome) depended on the scanner (a categorical variable, with each scanner compared with all other scanners as the reference) using linear regression analysis. A significant relation between a given scanner and the DSC (family-wise error rate-corrected p < 0.05, using a Bonferroni correction) would indicate that the performance of that segmentation method (in terms of spatial correspondence with the reference segmentation) was biased by the use of that scanner compared with the other scanners. No significant relations were found for any of the methods.
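The scanner comparison described in the note can be sketched as follows. This assumes each scanner is coded as a 0/1 indicator against all remaining scanners pooled, which is one reading of "each scanner being compared to all other scanners as the reference"; the helper name and toy DSC values are hypothetical, and the significance test on the slope (t statistic, p-value, confidence interval) is omitted. The Bonferroni step divides the 0.05 family-wise level by the six scanner comparisons.

```python
from statistics import mean

ALPHA = 0.05
N_SCANNERS = 6
BONFERRONI_ALPHA = ALPHA / N_SCANNERS  # per-scanner threshold, about 0.0083


def one_vs_rest_beta(dsc, scanners, target):
    """Unstandardized OLS slope of DSC on a 0/1 indicator (1 = `target` scanner)."""
    x = [1.0 if s == target else 0.0 for s in scanners]
    x_bar, y_bar = mean(x), mean(dsc)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, dsc))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Toy DSC values for two hypothetical scanners "A" and "B" (not study data).
dsc = [0.70, 0.80, 0.50, 0.60]
scanners = ["A", "A", "B", "B"]
beta_a = one_vs_rest_beta(dsc, scanners, "A")
# With a single binary predictor, the slope equals mean DSC(A) minus mean DSC(rest).
print(round(beta_a, 3), round(BONFERRONI_ALPHA, 4))
```

Because the slope reduces to a difference in mean DSC, a beta of 0.09 for Cascade on the first GE Signa HDxt can be read as roughly 0.09 higher mean DSC on that scanner than on the other five pooled, under this coding assumption.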
Performance of WMH segmentation methods for different WMH lesion loads.
| Method | Fazekas scale | WMH volume, reference (mL) | WMH volume, method (mL) | ΔWMH (mL) | \|ΔWMH\| (mL) | DSC | ICC |
|---|---|---|---|---|---|---|---|
| Cascade | 1 | 4 ± 4 | 12 ± 6 | 8 ± 6 | 8 ± 6 | 0.24 ± 0.16 | 0.02 (−0.12; 0.27) |
| | 2 | 16 ± 10 | 18 ± 11 | 2 ± 12 | 10 ± 6 | 0.50 ± 0.15 | 0.31 (−0.16; 0.67) |
| | 3 | 73 ± 61 | 26 ± 18 | −47 ± 62 | 49 ± 60 | 0.54 ± 0.22 | 0.13 (−0.23; 0.67) |
| kNN-TTP | 1 | 4 ± 4 | 5 ± 4 | 0.4 ± 1 | 0.9 ± 0.6 | 0.64 ± 0.10 | 0.91 (0.67; 0.97) |
| | 2 | 16 ± 10 | 15 ± 9 | −1 ± 3 | 3 ± 2 | 0.78 ± 0.06 | 0.96 (0.90; 0.99) |
| | 3 | 73 ± 61 | 56 ± 41 | −17 ± 22 | 18 ± 21 | 0.82 ± 0.06 | 0.92 (0.62; 0.99) |
| Lesion-TOADS | 1 | 4 ± 4 | 18 ± 20 | 13 ± 21 | 13 ± 21 | 0.35 ± 0.21 | 0.11 (−0.13; 0.43) |
| | 2 | 16 ± 10 | 19 ± 11 | 3 ± 13 | 6 ± 12 | 0.61 ± 0.20 | 0.50 (0.08; 0.78) |
| | 3 | 73 ± 61 | 53 ± 37 | −20 ± 24 | 22 ± 22 | 0.77 ± 0.06 | 0.90 (0.49; 0.98) |
| LST-LGA | 1 | 4 ± 4 | 4 ± 5 | −0.3 ± 2 | 2 ± 2 | 0.47 ± 0.12 | 0.76 (0.46; 0.91) |
| | 2 | 16 ± 10 | 15 ± 10 | −0.4 ± 7 | 5 ± 5 | 0.61 ± 0.14 | 0.84 (0.63; 0.94) |
| | 3 | 73 ± 61 | 53 ± 17 | −20 ± 48 | 31 ± 40 | 0.70 ± 0.08 | 0.68 (−0.11; 0.94) |
| LST-LPA | 1 | 4 ± 4 | 5 ± 5 | 0.3 ± 3 | 2 ± 2 | 0.49 ± 0.13 | 0.76 (0.45; 0.91) |
| | 2 | 16 ± 10 | 14 ± 10 | −2 ± 6 | 4 ± 4 | 0.64 ± 0.14 | 0.85 (0.60; 0.94) |
| | 3 | 73 ± 61 | 62 ± 39 | −11 ± 23 | 16 ± 18 | 0.78 ± 0.07 | 0.90 (0.53; 0.98) |
Note: WMH, ΔWMH, |ΔWMH| and DSC are shown as means ± SD. ICC is shown with its 95% confidence interval.
ΔWMH: mean difference in WMH volume (mL) between the reference segmentations and segmentations of the methods.
|ΔWMH|: mean absolute difference in WMH volume (mL) between the reference segmentations and segmentations of the methods.
DSC: Dice similarity coefficient; ICC: intra-class correlation coefficient.
Seventeen subjects had a Fazekas scale of 1, eighteen subjects had a Fazekas scale of 2 and seven subjects had a Fazekas scale of 3.
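The source does not state which ICC form was computed. As a sketch under that uncertainty, the function below implements the two-way, absolute-agreement, single-measure ICC (McGraw and Wong's ICC(A,1), equivalent to Shrout and Fleiss's ICC(2,1)), treating the reference and the automated method as two raters of the same subjects. An absolute-agreement form penalizes the systematic volume offsets captured by ΔWMH, whereas a consistency form would ignore them. The function name and toy values are hypothetical.

```python
from statistics import mean


def icc_a1(reference, automated):
    """Two-way, absolute-agreement, single-measure ICC (ICC(A,1), also ICC(2,1)).

    Each subject is a row; the two "raters" are the reference and the automated method.
    """
    rows = list(zip(reference, automated))
    n, k = len(rows), 2
    grand = mean(v for row in rows for v in row)
    row_means = [mean(row) for row in rows]
    col_means = [mean(col) for col in zip(*rows)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters (bias)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_err = ss_total - ss_rows - ss_cols  # residual
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_err / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)


# A constant +1 mL offset lowers absolute agreement even though the ranking is perfect.
print(icc_a1([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))  # 10/13, about 0.769
```

When the residual mean square exceeds the between-subject mean square, the numerator becomes negative, which is how negative ICCs such as those marked with an asterisk in the per-scanner table can arise.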
Figure 2. Bland-Altman plots comparing the WMH volume of each method with the WMH volume of the reference segmentations. X-axis: mean WMH volume (mL) of the automated and reference segmentations. Y-axis: difference (mL) in WMH volume between the automated and reference segmentations. The lower (−1.96 SD) and upper (+1.96 SD) limits of agreement (dashed lines) and the mean difference (solid line) are shown. Narrow limits of agreement reflect little variation between the reference and automated WMH volume measurements. A positive difference indicates that the automated method measured a larger WMH volume than the reference (overestimation); a negative difference indicates a smaller WMH volume than the reference (underestimation).
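A minimal sketch of how the quantities in these plots are computed, assuming the difference is automated minus reference (matching the sign convention in the caption) and that the SD is the sample SD of the per-subject differences. The function name and toy volumes are hypothetical, and plotting is left out.

```python
from statistics import mean, stdev


def bland_altman(reference_ml, automated_ml, z=1.96):
    """Points, bias and limits of agreement for a Bland-Altman plot (volumes in mL)."""
    x = [(r + a) / 2 for r, a in zip(reference_ml, automated_ml)]  # x-axis: mean volume
    d = [a - r for r, a in zip(reference_ml, automated_ml)]  # y-axis: > 0 means overestimation
    bias = mean(d)
    sd = stdev(d)
    return list(zip(x, d)), bias, (bias - z * sd, bias + z * sd)


# Toy volumes (not study data): differences are 2, 1 and 3 mL.
points, bias, (lower, upper) = bland_altman([10.0, 20.0, 30.0], [12.0, 21.0, 33.0])
print(points, bias, round(lower, 2), round(upper, 2))
```

Here the bias is 2 mL and the limits of agreement are about 0.04 to 3.96 mL; in a plot these become the solid and dashed horizontal lines.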
Considerations when choosing a method.
| Method | Spatial correspondence | Volumetric correspondence | Lesion load | Different field strength | Different scanners | Computational time |
|---|---|---|---|---|---|---|
| Cascade | − | − | − | − | +/− | ++ |
| kNN-TTP | + | ++ | + | + | + | + |
| Lesion-TOADS | − | +/− | − | + | − | +/− |
| LST-LGA | − | +/− | − | + | + | +/− |
| LST-LPA | +/− | ++ | + | +/− | +/− | + |
Note: ++: highly recommended; +: recommended; +/−: neutral; −: not recommended. Spatial correspondence: based on Dice's Similarity Coefficient (DSC). Volumetric correspondence: based on the intra-class correlation coefficient (ICC) and the mean and mean absolute WMH volume differences. Lesion load: based on both spatial and volumetric correspondence across varying lesion loads. Different field strength: based on both spatial and volumetric correspondence on a 1.5 T versus a 3 T MRI scanner from the same vendor. Different scanners: based on the variation in performance across scanners, in terms of both spatial and volumetric correspondence. These qualitative recommendations are based on the results of the present study.
Overview of MRI sequence parameters for each scanner.
| Center | Scanner vendor, type | Field strength (T) | Sequence | Slices | TR (ms) | TE (ms) | TI (ms) | Voxel size (mm) |
|---|---|---|---|---|---|---|---|---|
| A | GE, Signa HDxt | 1.5 | 3D T1 | 172 | 12.3 | 5.2 | — | 0.98 × 0.98 × 1.50 |
| | | | 3D FLAIR | 128 | 6500 | 117 | 1987 | 1.21 × 1.21 × 1.30 |
| A | GE, Signa HDxt | 3 | 3D T1 | 176 | 7.8 | 3.0 | — | 0.94 × 0.94 × 1.00 |
| | | | 3D FLAIR | 132 | 8000 | 126 | 2340 | 0.98 × 0.98 × 1.20 |
| A | GE, Discovery MR750 | 3 | 3D T1 | 176 | 8.2 | 3.2 | — | 0.94 × 0.94 × 1.00 |
| | | | 3D FLAIR | 160 | 8000 | 130 | 2340 | 0.98 × 0.98 × 1.20 |
| A | Philips, Ingenuity | 3 | 3D T1 | 180 | 9.9 | 4.6 | — | 0.87 × 0.87 × 1.00 |
| | | | 3D FLAIR | 321 | 4800 | 279 | 1650 | 1.04 × 1.04 × 0.56 |
| B | Philips, Achieva | 3 | 3D T1 | 192 | 7.9 | 4.5 | — | 1.00 × 1.00 × 1.00 |
| | | | 2D FLAIR | 48 | 11000 | 125 | 2800 | 0.96 × 0.95 × 3.00 |
| B | Philips, Ingenia | 3 | 3D T1 | 192 | 7.9 | 4.5 | — | 1.00 × 1.00 × 1.00 |
| | | | 2D FLAIR | 48 | 11000 | 125 | 2800 | 0.96 × 0.95 × 3.00 |
Note: A = Amsterdam University Medical Center; B = Utrecht University Medical Center; TR = repetition time; TE = echo time; TI = inversion time.
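Voxel volumes differ considerably between these protocols: about 2.7 mm³ per voxel for the 2D FLAIR on the Philips Achieva and Ingenia versus about 0.6 mm³ for the 3D FLAIR on the Philips Ingenuity. A lesion mask's volume in mL is its voxel count times the voxel volume divided by 1000. The sketch below assumes volumes are counted in native FLAIR space, which the source does not state; the dictionary and function names are hypothetical.

```python
# FLAIR voxel sizes (mm) copied from the sequence table above.
FLAIR_VOXEL_MM = {
    "GE Signa HDxt, 1.5 T": (1.21, 1.21, 1.30),
    "GE Signa HDxt, 3 T": (0.98, 0.98, 1.20),
    "GE Discovery MR750": (0.98, 0.98, 1.20),
    "Philips Ingenuity": (1.04, 1.04, 0.56),
    "Philips Achieva": (0.96, 0.95, 3.00),
    "Philips Ingenia": (0.96, 0.95, 3.00),
}


def voxel_volume_mm3(size):
    """Volume of one voxel in mm³ from its (x, y, z) dimensions in mm."""
    x, y, z = size
    return x * y * z


def mask_volume_ml(n_voxels, size):
    """Lesion volume in mL from a voxel count (1 mL = 1000 mm³)."""
    return n_voxels * voxel_volume_mm3(size) / 1000.0


for scanner, size in FLAIR_VOXEL_MM.items():
    print(f"{scanner}: {voxel_volume_mm3(size):.2f} mm³ per voxel")
```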