Isabel Hotz1,2, Pascal Frédéric Deschwanden1, Franziskus Liem2, Susan Mérillat2, Brigitta Malagurski2, Spyros Kollias3, Lutz Jäncke1,2.
Abstract
White matter hyperintensities (WMH) of presumed vascular origin are frequently found in MRIs of healthy older adults. WMH are also associated with aging and cognitive decline. Here, we compared and validated three algorithms for WMH extraction: FreeSurfer (T1w), UBO Detector (T1w + FLAIR), and FSL's Brain Intensity AbNormality Classification Algorithm (BIANCA; T1w + FLAIR) using a longitudinal dataset comprising MRI data of cognitively healthy older adults (baseline N = 231, age range 64–87 years). As reference we manually segmented WMH in T1w, three-dimensional (3D) FLAIR, and two-dimensional (2D) FLAIR images which were used to assess the segmentation accuracy of the different automated algorithms. Further, we assessed the relationships of WMH volumes provided by the algorithms with Fazekas scores and age. FreeSurfer underestimated the WMH volumes and scored worst in Dice Similarity Coefficient (DSC = 0.434) but its WMH volumes strongly correlated with the Fazekas scores (r_s = 0.73). BIANCA accomplished the highest DSC (0.602) in 3D FLAIR images. However, the relations with the Fazekas scores were only moderate, especially in the 2D FLAIR images (r_s = 0.41), and many outlier WMH volumes were detected when exploring within-person trajectories (2D FLAIR: ~30%). UBO Detector performed similarly to BIANCA in DSC with both modalities and reached the best DSC in 2D FLAIR (0.531) without requiring a tailored training dataset. In addition, it achieved very high associations with the Fazekas scores (2D FLAIR: r_s = 0.80). In summary, our results emphasize the importance of carefully contemplating the choice of the WMH segmentation algorithm and MR-modality.
Keywords: MRI; automated segmentation; healthy aging; validation; white matter hyperintensities
Year: 2021 PMID: 34873789 PMCID: PMC8886667 DOI: 10.1002/hbm.25739
Source DB: PubMed Journal: Hum Brain Mapp ISSN: 1065-9471 Impact factor: 5.038
Number of scans (N) broken down per modality (3D T1w, 2D FLAIR, and 3D FLAIR) and time points (baseline, 1‐, 2‐, and 4‐year follow‐up)
| Modality | Baseline | 1‐year follow‐up | 2‐year follow‐up | 4‐year follow‐up | Total |
|---|---|---|---|---|---|
| 3D T1‐weighted | 231 | 207 | 196 | 166 | 800 |
| 2D FLAIR | 228 | 203 | 174 | 157 | 762 |
| 3D FLAIR | 4 | 46 | 53 | 63 | 166 |
Name of the dataset subsets (FreeSurfer T1w, UBO 2D, BIANCA 2D, UBO 3D, and BIANCA 3D), applied algorithms, input modality/modalities for the different algorithms, and the number (N) of scans per subset
| Name of the subsets | Algorithm(s) | Input modality/modalities | Total |
|---|---|---|---|
| FreeSurfer T1w | FreeSurfer | 3D T1w | 800 |
| UBO 2D, BIANCA 2D | UBO Detector and BIANCA | 2D FLAIR + 3D T1w | 762 |
| UBO 3D, BIANCA 3D | UBO Detector and BIANCA | 3D FLAIR + 3D T1w | 166 |
The following metrics were used to determine the agreements between the operators (interoperator) and between the outcomes of the algorithms and the gold standards (validation)
| Metrics | Formulas (interoperator) | Formulas (validation) |
|---|---|---|
| Hausdorff distance for the 95th percentile (H95) | | |
| Dice similarity coefficient (DSC) | | |
| Detection error rate (DER) (all clusters without intersection) | | |
| Outline error rate (OER) (all clusters with intersection) | | |
| Sensitivity = true positive ratio (TPR) | | |
| False positive ratio (FPR) | | |
For H95, the distance reported is the Kth ranked distance, with K chosen to correspond to the 95th percentile (Dubuisson & Jain, 1994).
MTA is the mean total area: the sum of the area of rater A and the area of rater B, divided by 2 (Wack et al., 2012).
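The formula cells of the metrics table are not preserved here. As a minimal sketch, the three overlap-based metrics can be written with their commonly used definitions: DSC = 2|A ∩ B| / (|A| + |B|), sensitivity = TP / (TP + FN), and FPR = FP / (FP + TN). These are assumed to match the definitions used in the study, which this text does not confirm. Masks are represented as sets of voxel coordinates, and the toy masks and function names are hypothetical.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def sensitivity(pred: Set[Voxel], ref: Set[Voxel]) -> float:
    """True positive ratio: TP / (TP + FN), relative to the reference mask."""
    if not ref:
        raise ValueError("reference mask is empty")
    return len(pred & ref) / len(ref)


def false_positive_ratio(pred: Set[Voxel], ref: Set[Voxel], n_voxels: int) -> float:
    """FP / (FP + TN); the negatives are all voxels outside the reference mask."""
    negatives = n_voxels - len(ref)
    if negatives <= 0:
        raise ValueError("no negative voxels in the volume")
    return len(pred - ref) / negatives


# Hypothetical toy masks in a 10 x 10 x 1 volume (100 voxels).
reference = {(x, y, 0) for x in range(4) for y in range(4)}     # 16 voxels
predicted = {(x, y, 0) for x in range(2, 6) for y in range(4)}  # 16 voxels, 8 shared

dsc = dice(reference, predicted)
tpr = sensitivity(predicted, reference)
fpr = false_positive_ratio(predicted, reference, n_voxels=100)
```

For the interoperator case the same `dice` function applies with the masks of two raters in place of the reference and the prediction. DER and OER additionally require grouping voxels into connected clusters and are not sketched here.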
FIGURE 1 Section of an overlay of three masks (one per operator), referred to as the “mean mask”, in a 3D FLAIR image on an axial plane, with the different values displayed in different colors: light green, all three operators classified the voxel as WMH (voxel value 1.0); dark green, two operators classified the voxel as WMH (voxel value 2/3); orange, one operator classified the voxel as WMH (voxel value 1/3)
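A minimal sketch of how such a mean mask can be formed, assuming it is the voxel-wise average of the three operators' binary masks (consistent with the value 1.0 for full agreement). The flattened four-voxel masks and the function name are hypothetical.

```python
from fractions import Fraction
from typing import List


def mean_mask(masks: List[List[int]]) -> List[Fraction]:
    """Voxel-wise mean of equally sized, flattened binary masks."""
    if not masks:
        raise ValueError("at least one mask is required")
    length = len(masks[0])
    if any(len(m) != length for m in masks):
        raise ValueError("all masks must have the same number of voxels")
    n = len(masks)
    # zip(*masks) yields, for each voxel, the votes of all operators.
    return [Fraction(sum(votes), n) for votes in zip(*masks)]


# Hypothetical masks: 1 = operator marked the voxel as WMH, 0 = not marked.
operator_1 = [1, 1, 1, 0]
operator_2 = [1, 1, 0, 0]
operator_3 = [1, 0, 0, 0]

result = mean_mask([operator_1, operator_2, operator_3])
```

With three operators the only possible voxel values are 0, 1/3, 2/3, and 1. Exact fractions are used here to avoid rounding in the illustration.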
FIGURE 2 Mean WMH volume in cm³ with standard error of the mean (SEM) of the manually segmented gold standards on the left side of the figure, and the corresponding mean WMH volumes estimated by the automated algorithms on the right side of the figure. **p < .01; ***p < .001
FIGURE 5 Correlation matrix of WMH volume estimates among the gold standards (GS), namely GS (T1w), GS (2D FLAIR), and GS (3D FLAIR), and between the gold standards and the outputs of the different algorithms (FreeSurfer T1w, UBO 2D, BIANCA 2D, UBO 3D, and BIANCA 3D). Correlations among the gold standards: all combinations r = .97, p < .05
Summary of the accuracy metrics for the WMH using the three different modalities (T1w, 3D FLAIR, and 2D FLAIR)
| Metric | FreeSurfer (T1w) mean | FreeSurfer (T1w) 95% CI | UBO (3D FLAIR) mean | UBO (3D FLAIR) 95% CI | UBO (2D FLAIR) mean | UBO (2D FLAIR) 95% CI | BIANCA (3D FLAIR) mean | BIANCA (3D FLAIR) 95% CI | BIANCA (2D FLAIR) mean | BIANCA (2D FLAIR) 95% CI | p | Post hoc Dunn–Bonferroni (Holm correction) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DSC | 0.434 | 0.375–0.493 | 0.500 | 0.435–0.567 | 0.531 | 0.471–0.591 | 0.602 | 0.531–0.672 | 0.561 | 0.498–0.624 | .001 | BIANCA 3D > FreeSurfer |
| H95 (mm) | 8.660 | 6.804–10.515 | 11.533 | 8.520–14.545 | 13.522 | 10.324–16.720 | 6.200 | 4.093–8.305 | 8.443 | 6.421–10.464 | <.001 | BIANCA 3D > UBO 3D, UBO 2D |
| Sensitivity | 0.315 | 0.260–0.370 | 0.427 | 0.353–0.501 | 0.572 | 0.500–0.643 | 0.611 | 0.538–0.683 | 0.575 | 0.492–0.657 | <.001 | UBO 2D, BIANCA 3D, BIANCA 2D > FreeSurfer; BIANCA 3D > UBO 3D |
| FPR | 4.62 | 0.0000–0.0001 | 0.0002 | 0.0001–0.0002 | 0.0004 | 0.0003–0.0006 | 0.0003 | 0.0001–0.0004 | 0.0004 | 0.0002–0.0005 | <.001 | FreeSurfer < all others; UBO 3D < UBO 2D |
| DER | 0.200 | 0.146–0.253 | 0.313 | 0.228–0.398 | 0.327 | 0.231–0.423 | 0.162 | 0.113–0.210 | 0.215 | 0.143–0.287 | <.001 | BIANCA 3D < UBO 3D, UBO 2D |
| OER | 0.932 | 0.839–1.025 | 0.687 | 0.629–0.746 | 0.611 | 0.540–0.683 | 0.635 | 0.538–0.733 | 0.664 | 0.586–0.742 | <.001 | All others < FreeSurfer |
Note: DER, detection error rate; DSC, Dice similarity coefficient; FPR, false positive ratio; H95, Hausdorff distance for the 95th percentile; ICC, interclass correlation coefficient; OER, outline error rate. Gray-shaded cells indicate significantly worse performance.
Comparison by Friedman.
A direct comparison is only allowed between the same modalities.
Fazekas scores versus WMH volumes (Spearman's rho between the Fazekas scores and the different WMH volume measures)
| Comparison | All (r_s) | Median (r_s; min–max) |
|---|---|---|
| Fazekas vs. FreeSurfer total | 0.72 | 0.73 (0.68–0.73) |
| Fazekas vs. UBO 2D total | 0.80 | 0.80 (0.78–0.82) |
| Fazekas vs. UBO 3D total | 0.80 | 0.81 (0.79–0.95) |
| Fazekas vs. BIANCA 2D total | 0.41 | 0.41 (0.38–0.42) |
| Fazekas vs. BIANCA 3D total | 0.58 | 0.51 (0.32–0.72) |
| Fazekas PVWMH vs. UBO 2D PVWMH | 0.73 | 0.74 (0.68–0.78) |
| Fazekas PVWMH vs. UBO 3D PVWMH | 0.77 | 0.75 (0.45–0.78) |
| Fazekas PVWMH vs. BIANCA 2D PVWMH | 0.36 | 0.37 (0.33–0.40) |
| Fazekas PVWMH vs. BIANCA 3D PVWMH | 0.56 | 0.50 (0.43–0.69) |
| Fazekas DWMH vs. UBO 2D DWMH | 0.61 | 0.60 (0.59–0.65) |
| Fazekas DWMH vs. UBO 3D DWMH | 0.61 | 0.63 (0.51–0.95) |
| Fazekas DWMH vs. BIANCA 2D DWMH | 0.34 | 0.33 (0.30–0.41) |
| Fazekas DWMH vs. BIANCA 3D DWMH | 0.41 | 0.51 (0.23–0.95) |
Note: Shown are the correlations (r_s) across the entire sample (All) and the median correlation (Median) across all four time points (minimum and maximum correlations are shown in parentheses). Correlations r_s > 0.6 are highlighted by gray shading. Spearman's rho (r_s): weak, 0.1–0.3; moderate, >0.3–0.6; strong, >0.6–0.9; perfect, >0.9 (Dancey & Reidy, 2017).
Abbreviations: DWMH, deep WMH; PVWMH, periventricular WMH.
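As an illustration of the statistic reported in this table, the following sketch computes Spearman's rho as the Pearson correlation of average ranks (which handles the ties that ordinal Fazekas scores produce) and labels the result with the strength bands quoted in the note. The sample scores and volumes are hypothetical, and the function names are not taken from the study.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


def strength(rho: float) -> str:
    """Bands as quoted in the table note (Dancey & Reidy, 2017)."""
    r = abs(rho)
    if r > 0.9:
        return "perfect"
    if r > 0.6:
        return "strong"
    if r > 0.3:
        return "moderate"
    if r >= 0.1:
        return "weak"
    return "below the weak band"


# Hypothetical data: Fazekas scores and WMH volumes (cm^3) for six scans.
fazekas = [0, 1, 1, 2, 3, 3]
volumes = [0.4, 1.1, 0.9, 2.5, 6.0, 9.2]
rho = spearman_rho(fazekas, volumes)
```

Applied to values from the table, `strength(0.80)` falls in the strong band and `strength(0.41)` in the moderate band.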
Summary of the main effect of age on WMH volumes
| | | FreeSurfer T1w | UBO 2D | UBO 3D | BIANCA 2D | BIANCA 3D |
|---|---|---|---|---|---|---|
| Main effect WMH ~ age | Age entry | | | | | |
| | CI | 0.347–0.564 | 0.295–0.520 | 0.127–0.485 | 0.183–0.403 | 0.173–0.506 |
| | Cohen's d | 0.594 | 0.482 | 0.294 | 0.326 | 0.368 |
| Residuals | Estimates [CI] | 0.116 [0.109–0.123] | 0.161 [0.152–0.172] | 0.132 [0.109–0.163] | 0.434 [0.409–0.462] | 0.417 [0.344–0.520] |
Note: ***p < .001.
Abbreviation: 95% CI, 95% confidence interval.
β is the standardized beta, that is, the fixed effect (slope). The WMH volumes are log‐transformed and z‐standardized. Chronological age is z‐standardized.
Cohen's d: small effect size: 0.2, medium effect size: 0.5, large effect size: 0.8 (Cohen, 2013).
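The preprocessing described in the footnote can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the natural logarithm and the sample standard deviation (n − 1) are assumed, and the volumes and ages are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def z_standardize(values: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # n - 1 denominator (assumption)
    if sd == 0:
        raise ValueError("cannot standardize constant values")
    return [(v - mean) / sd for v in values]


def log_z(volumes: Sequence[float]) -> List[float]:
    """Log-transform strictly positive volumes, then z-standardize."""
    if any(v <= 0 for v in volumes):
        raise ValueError("log transform requires strictly positive volumes")
    return z_standardize([math.log(v) for v in volumes])


# Hypothetical WMH volumes (cm^3) and chronological ages (years).
volumes_cm3 = [0.5, 1.2, 2.0, 4.8, 9.5]
ages = [65.0, 68.0, 72.0, 79.0, 85.0]

wmh_z = log_z(volumes_cm3)
age_z = z_standardize(ages)
```

Because z-standardization removes any constant scale factor, the choice of logarithm base does not change `wmh_z`. With both variables standardized, the slope of WMH on age is expressed in standard deviation units.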
FIGURE 3 Validation of the three algorithms with the subsets FreeSurfer T1w, UBO 2D, and BIANCA 2D. Scatter plot of total WMH volume (log‐transformed) according to the Fazekas score (a), and spaghetti plot of total WMH volume (cm³) and chronological age (years) (b)
Display per subsets and algorithm (BIANCA 2D, BIANCA 3D, UBO 2D, UBO 3D, and FreeSurfer T1w) with the outputted number (N) of segmented scans, number (n) of subjects with longitudinal data (at least two time points), number and percentage (in brackets) of subjects with outlier in longitudinal data, number of intervals between two measurement points, and number and percentage (in brackets) of outlier intervals between two measurement points
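| Subsets | Segmented scans (N) | Subjects with longitudinal data (n) | Subjects with outliers, n (%) | Intervals between two measurement points | Outlier intervals, n (%) |
|---|---|---|---|---|---|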
| BIANCA 2D | 762 | 209 | 109 (52.15%) | 531 | 161 (30.32%) |
| BIANCA 3D | 166 | 39 | 7 (17.95%) | 48 | 8 (16.67%) |
| UBO 2D | 757 | 209 | 7 (3.35%) | 523 | 7 (1.34%) |
| UBO 3D | 166 | 39 | 2 (5.13%) | 48 | 2 (4.17%) |
| FreeSurfer T1w | 800 | 213 | 0 (0%) | 569 | 0 (0%) |
Explanation “intervals between two measurement points”: If a subject had three time points (Baseline, 1‐year follow‐up, and 4‐year follow‐up) this would result in two existing intervals.
The data point with the segmentation error (segmented eyeballs) is included.
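The interval count explained above, and the percentages in the table, can be reproduced with a short sketch. The subject identifiers and visit lists are hypothetical, and the criterion used to flag an interval as an outlier is not given in this text, so it is not implemented here.

```python
from typing import Dict, List


def count_intervals(visits: Dict[str, List[int]]) -> int:
    """Total intervals between consecutive available time points.

    A subject with k distinct time points contributes k - 1 intervals;
    subjects with fewer than two time points contribute none.
    """
    total = 0
    for time_points in visits.values():
        k = len(set(time_points))
        if k >= 2:
            total += k - 1
    return total


def percentage(part: int, whole: int) -> float:
    """Share of `part` in `whole`, in percent, rounded to two decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 2)


# Hypothetical subjects; values are years since baseline.
visits = {
    "sub-01": [0, 1, 4],     # baseline, 1-year, 4-year: two intervals
    "sub-02": [0, 1, 2, 4],  # all four time points: three intervals
    "sub-03": [0],           # baseline only: no longitudinal data
}

n_intervals = count_intervals(visits)
bianca_2d_share = percentage(161, 531)  # outlier intervals for BIANCA 2D in the table
```

The first hypothetical subject matches the example in the explanation: three time points yield two intervals.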
FIGURE 4 Modified Bland–Altman (Bland & Altman, 1986) plots for WMH volume for the different algorithms (total WMH volume of the gold standard minus total WMH volume of the algorithm). The x‐axes contain total WMH volumes of the gold standards in cm³, the y‐axes contain absolute differences of total WMH volumes in cm³: S(x, y) = (S1, S1 − S2), where S1 is the gold standard WMH volume and S2 is the algorithm WMH volume
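The plotted coordinates S(x, y) = (S1, S1 − S2) can be computed as in the sketch below. The volumes are hypothetical, and "absolute" is read here as a difference in cm³ rather than a percentage, which is an interpretation of the caption.

```python
from typing import List, Sequence, Tuple


def modified_bland_altman(
    gold: Sequence[float], algorithm: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (S1, S1 - S2) per scan: gold standard volume on x, difference on y."""
    if len(gold) != len(algorithm):
        raise ValueError("both sequences must cover the same scans")
    return [(s1, s1 - s2) for s1, s2 in zip(gold, algorithm)]


def mean_difference(points: Sequence[Tuple[float, float]]) -> float:
    """Average of the y-values, i.e. the mean bias of the algorithm."""
    if not points:
        raise ValueError("no points given")
    return sum(diff for _, diff in points) / len(points)


# Hypothetical total WMH volumes in cm^3.
gold_standard = [1.0, 2.5, 6.0]
algorithm_out = [0.5, 2.0, 4.0]

points = modified_bland_altman(gold_standard, algorithm_out)
bias = mean_difference(points)
```

A positive difference means the algorithm returned a smaller volume than the gold standard. Unlike the classic Bland–Altman plot, the x-axis here is the gold standard volume rather than the mean of the two measurements.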