Cassidy M Fiford, Carole H Sudre, Hugh Pemberton, Phoebe Walsh, Emily Manning, Ian B Malone, Jennifer Nicholas, Willem H Bouvy, Owen T Carmichael, Geert Jan Biessels, M Jorge Cardoso, Josephine Barnes.
Abstract
Accurate, automated white matter hyperintensity (WMH) segmentations are needed for large-scale studies to understand contributions of WMH to neurological diseases. We evaluated Bayesian Model Selection (BaMoS), a hierarchical fully-unsupervised model selection framework for WMH segmentation. We compared BaMoS segmentations to semi-automated segmentations, and assessed whether they predicted longitudinal cognitive change in control, early Mild Cognitive Impairment (EMCI), late Mild Cognitive Impairment (LMCI), subjective/significant memory concern (SMC) and Alzheimer's disease (AD) participants. Data were downloaded from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Magnetic resonance images from 30 control and 30 AD participants were selected to incorporate multiple scanners, and were semi-automatically segmented by 4 raters and automatically segmented by BaMoS. Segmentations were assessed using volume correlation, Dice score, and other spatial metrics. Linear mixed-effect models were fitted to 180 control, 107 SMC, 320 EMCI, 171 LMCI and 151 AD participants separately in each group, with the outcomes being cognitive change (e.g. mini-mental state examination; MMSE), and BaMoS WMH, age, sex, race and education used as predictors. There was a high level of agreement between BaMoS WMH segmentation volumes and a consensus of rater segmentations, with a median Dice score of 0.74 and correlation coefficient of 0.96. BaMoS WMH predicted cognitive change in: control, EMCI, and SMC groups using MMSE; LMCI using clinical dementia rating scale; and EMCI using Alzheimer's disease assessment scale-cognitive subscale (p < 0.05, all tests). BaMoS compares well to semi-automated segmentation, is robust to different WMH loads and scanners, and can generate volumes which predict decline. BaMoS can be applied to further large-scale studies.
Keywords: Alzheimer’s disease; Automated segmentation; Magnetic resonance imaging; Neurodegeneration; Vascular pathology; White matter hyperintensities
Year: 2020 PMID: 32062817 PMCID: PMC7338814 DOI: 10.1007/s12021-019-09439-6
Source DB: PubMed Journal: Neuroinformatics ISSN: 1539-2791
Fig. 1 Flowchart of the process from the initial protocol (developed on 20 Diabetes Mellitus and control subjects from Utrecht (DM2)) through to the BaMoS segmentation assessment set. At each stage different subjects' scans were used. ADNI2/GO = Alzheimer's Disease Neuroimaging Initiative Phases 2 and GO.
Subject demographics and basic imaging information for the ADNI cohort. Demographics are shown for controls, Early Mild Cognitive Impairment (EMCI), Late Mild Cognitive Impairment (LMCI), Subjective/Significant Memory Concern (SMC) and Alzheimer's disease (AD). Values are mean (SD) unless stated in the table; white matter hyperintensity (WMH) volume is reported as median (interquartile range). Abbreviations: Mini-mental state examination (MMSE), Clinical Dementia Rating Global score (CDRGlobal), Trails A and Trails B, and Alzheimer's Disease Assessment Scale-cognitive subscale (ADAS-Cog)
| | Controls | SMC | EMCI | LMCI | AD | Group difference (p value) |
|---|---|---|---|---|---|---|
| N | 180 | 107 | 320 | 171 | 151 | |
| Age at baseline, years | 73.4 (6.2) | 72.3 (5.5) | 71.0 (7.5) | 72.4 (7.6) | 74.9 (8.0) | <0.001 |
| Male (%) | 46 | 43 | 54 | 56 | 56 | 0.08 |
| Percentage APOE ε4 carriers | 33 | 36 | 47 | 60 | 71 | <0.001 |
| Years of education | 16.5 (2.5) | 16.8 (2.5) | 16.0 (2.6) | 16.5 (2.5) | 15.7 (2.8) | <0.001 |
| Race (%): Asian | 1.11 | 0.00 | 1.25 | 0.58 | 3.31 | 0.2 |
| Race (%): Native Hawaiian or Pacific | 0.00 | 0.00 | 0.31 | 0.58 | 0.00 | |
| Race (%): Black or African American | 9.44 | 2.80 | 3.44 | 3.51 | 3.97 | |
| Race (%): American Indian or Alaskan | 0.00 | 0.00 | 0.31 | 0.00 | 0.00 | |
| Race (%): White | 87.78 | 94.39 | 91.56 | 94.74 | 91.39 | |
| Race (%): More than one race | 1.11 | 2.80 | 2.19 | 0.58 | 1.32 | |
| Race (%): Race Unknown | 0.56 | 0.00 | 0.94 | 0.00 | 0.00 | |
| Follow up time | 3.3 (1.5) | 2.1 (0.9) | 3.5 (1.8) | 2.9 (1.6) | 1.2 (0.7) | <0.001 |
| Number of visits | 5.3 (1.5) | 4.1 (1.1) | 5.9 (2.2) | 5.5 (2.0) | 3.6 (1.1) | <0.001 |
| Baseline MMSE | 29.0 (1.3) | 29.0 (1.3) | 28.3 (1.6) | 27.6 (1.8) | 23.1 (2.1) | <0.001 |
| Baseline CDRGlobal | 0 (0) | 0 (0) | 0.5 (0.03) | 0.5 (0.03) | 0.8 (0.3) | <0.001 |
| Baseline ADAS-Cog | 9.0 (4.4) | 8.9 (4.3) | 12.7 (5.5) | 18.8 (7.2) | 31.1 (8.5) | <0.001 |
| Baseline Trails A | 33.3 (10.4) | 34.3 (13.0) | 36.9 (14.8) | 42.3 (19.0) | 60.8 (33.4) | <0.001 |
| Baseline Trails B | 81.8 (43.4) | 86.5 (41.0) | 99.0 (50) | 121.6 (70.2) | 195.5 (86.2) | <0.001 |
| Baseline WMH (ml) | 3.4 (4.8) | 3.4 (4.4) | 3.8 (6.1) | 3.7 (8.1) | 5.8 (9.0) | <0.001 |
Table comparing semi-automated segmentations between raters. Values are reported as median (inter-quartile range). Section A shows the median, upper and lower quartiles of WMH volume from each rater, with (p value) showing the statistical difference in each volume compared to rater 1. Inter-rater reliability (intra-class correlation coefficient) is shown between all raters with 95% confidence intervals. Section B of the table shows each rater's performance compared to rater 1: correlation of WMH volumes using the intra-class correlation coefficient (ICC) with 95% confidence intervals; Dice scores of overlap; outline error false positive (OEFP) which, for a given shared WMH lesion, denotes voxels included in the segmentation which are not in the reference; outline error false negative (OEFN) which denotes, for a given shared WMH lesion, voxels which are included in the reference and not the segmentation; detection error false positive (DEFP) which denotes voxels included in the segmentation and not the reference (false positive lesions); and detection error false negative (DEFN) denoting lesions included in the reference and not the segmentation (missed lesions). Section C compares each rater to a consensus of the three remaining raters, using the metrics from section B. Statistical tests are shown for differences between each spatial metric for each rater. There were 10 controls and 10 AD patients from each of the three scanner types (Siemens, Philips and General Electric scanners)
| A. | Rater 1 | Rater 2 | Rater 3 | Rater 4 | Test between raters |
|---|---|---|---|---|---|
| WMH Volume (ml) | 5.70 (3.12–12.60) | 6.07 (3.37–14.19) | 5.96 (3.16–12.11) | 5.62 (3.14–12.33) | |
| p value compared to rater 1 | | 0.63 | 0.93 | 0.91 | |
| Inter-rater reliability | 0.974 (0.96–0.98) | | | | |
| B. Semi-automated comparison to Rater 1 | | | | | |
| ICC | | 0.956 (0.92–0.98) | 0.998 (0.99–0.99) | 0.992 (0.99–0.99) | |
| Dice Score | | 0.88 (0.84–0.92) | 0.94 (0.91–0.97) | 0.89 (0.87–0.93) | <0.001 |
| OEFP | | 122.5 (45–407.5) | 53.5 (8–147) | 81 (36.5–255.5) | 0.01 |
| OEFN | | 49.5 (13.5–145.5) | 38 (4.5–129) | 61 (16.5–205.5) | 0.07 |
| DEFP | | 56 (31.5–89.5) | 21.5 (5.5–50) | 30.5 (14–52) | <0.001 |
| DEFN | | 17 (6–77) | 22.5 (7.5–51.5) | 25 (8.5–106) | 0.3 |
| C. Semi-automated comparison to consensus | | | | | |
| ICC | 0.997 (0.99–0.99) | 0.944 (0.87–0.97) | 0.995 (0.99–0.99) | 0.992 (0.99–0.99) | |
| Dice Score | 0.93 (0.9–0.95) | 0.90 (0.86–0.94) | 0.93 (0.89–0.95) | 0.91 (0.88–0.94) | 0.01 |
| OEFP | 42.5 (10–126.5) | 106 (35–312) | 44.5 (17.5–88.5) | 76.5 (22–173.5) | <0.001 |
| OEFN | 68 (30–233) | 42.5 (14.5–114) | 72 (28.5–226) | 65.5 (21–199) | 0.07 |
| DEFP | 16 (6–73.5) | 52 (30–80.5) | 25.5 (10.5–80) | 27 (14–46) | 0.002 |
| DEFN | 23.5 (12–44.5) | 13.5 (3.5–78.5) | 26 (18.5–56.5) | 22.5 (8–78) | 0.3 |
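The outline and detection error definitions in the caption above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a 2D grid, 4-connected lesions, and masks stored as sets of `(row, col)` voxels, and the helper names `components` and `spatial_metrics` are hypothetical. A lesion counts as "shared" when a connected component of one mask overlaps the other mask by at least one voxel.

```python
from collections import deque


def components(voxels):
    """Split a set of (row, col) voxels into 4-connected components."""
    remaining = set(voxels)
    comps = []
    while remaining:
        seed = remaining.pop()
        comp = {seed}
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    comp.add(nb)
                    queue.append(nb)
        comps.append(comp)
    return comps


def spatial_metrics(seg, ref):
    """Voxel counts for OEFP, OEFN, DEFP, DEFN plus the Dice score (toy version)."""
    seg, ref = set(seg), set(ref)
    m = {"OEFP": 0, "OEFN": 0, "DEFP": 0, "DEFN": 0}
    for comp in components(seg):
        if comp & ref:                      # shared lesion: extra voxels are outline error
            m["OEFP"] += len(comp - ref)
        else:                               # no overlap at all: false positive lesion
            m["DEFP"] += len(comp)
    for comp in components(ref):
        if comp & seg:                      # shared lesion: missed voxels are outline error
            m["OEFN"] += len(comp - seg)
        else:                               # no overlap at all: missed lesion
            m["DEFN"] += len(comp)
    tp = len(seg & ref)
    m["Dice"] = 2 * tp / (len(seg) + len(ref)) if (seg or ref) else 1.0
    return m


# Hypothetical toy masks: one shared lesion, one missed lesion, one false lesion.
ref = {(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)}
seg = {(0, 1), (1, 1), (0, 2), (8, 8), (8, 9)}
metrics = spatial_metrics(seg, ref)
print(metrics)
```

In this toy case the four error counts and the true positives together account for every voxel in either mask, which is the property that lets the errors be reported as proportions of total error.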
Table comparing semi-automated segmentations between rater 1’s first and second segmentation. Values are reported as median (inter-quartile range), unless stated. Section A shows the WMH volume from the first and second segmentation rounds, and (p value) showing statistical differences between these WMH volumes. Intra-rater reliability (intra class correlation coefficient) with 95% confidence intervals is reported. Section B of the table shows Dice scores of overlap, outline error false positive (OEFP) which, for a given shared WMH lesion, denotes voxels included in the segmentation which are not in the reference; outline error false negative (OEFN) which denotes, for a given shared WMH lesion, voxels which are included in the reference and not the segmentation; detection error false positive (DEFP) which denotes voxels included in the segmentation and not the reference (false positive lesions), and detection error false negative (DEFN) denoting lesions included in the reference and not the segmentation (missed lesions). There were 10 controls and 10 AD patients from each of the three scanner types (Siemens, Philips and General Electric scanners)
| A. | Rater 1, first segmentation | Rater 1, second segmentation |
|---|---|---|
| Volume | 5.70 (3.12–12.60) | 5.31 (2.73–11.00) |
| p value (first vs second segmentation) | 0.4 | |
| Intra-rater reliability | 0.976 (0.92–0.99) | |
| B. Comparison to first segmentation | | |
| Dice Score | 0.91 (0.86–0.94) | |
| OEFP | 34.5 (12.5–100.5) | |
| OEFN | 149 (69–373) | |
| DEFP | 7.5 (4–24) | |
| DEFN | 24 (9–55) | |
Table comparing semi-automated segmentations from each rater, and the consensus of the 4 raters, to BaMoS automated values. Values are reported as median (inter-quartile range), unless stated. Volumes from each rater, the consensus, and BaMoS are reported, with (p value) showing the difference compared to BaMoS. Correlation coefficients are given for each method compared to BaMoS using the intra-class correlation coefficient (ICC) with 95% confidence intervals. The following spatial metrics are given to compare BaMoS with each rater/consensus as the reference: Dice scores of overlap; outline error false positive (OEFP) which, for a given shared WMH lesion, denotes voxels included in the segmentation which are not in the reference; outline error false negative (OEFN) which denotes, for a given shared WMH lesion, voxels which are included in the reference and not the segmentation; detection error false positive (DEFP) which denotes voxels included in the segmentation and not the reference (false positive lesions); and detection error false negative (DEFN) denoting lesions included in the reference and not the segmentation (missed lesions). Statistical tests are shown for differences between each spatial metric for each rater. There were 10 controls and 10 AD patients from each of the three scanner types (Siemens, Philips and General Electric scanners)
| | BaMoS | Rater 1 | Rater 2 | Rater 3 | Rater 4 | Semi-automated Consensus | Test (BaMoS vs raters) |
|---|---|---|---|---|---|---|---|
| Volume | 5.56 (3.88–11.18) | 5.70 (3.12–12.60) | 6.07 (3.37–14.19) | 5.96 (3.16–12.11) | 5.62 (3.14–12.33) | 5.61 (2.94–11.94) | |
| p value compared to BaMoS | | 0.94 | 0.58 | 0.87 | 0.97 | 0.83 | |
| Comparison to BaMoS | | | | | | | |
| ICC | | 0.958 (0.93–0.97) | 0.875 (0.78–0.93) | 0.958 (0.93–0.97) | 0.944 (0.91–0.97) | 0.959 (0.93–0.98) | |
| Dice Score | | 0.73 (0.63–0.81) | 0.74 (0.66–0.81) | 0.73 (0.64–0.8) | 0.72 (0.66–0.8) | 0.74 (0.66–0.82) | >0.9 |
| OEFP | | 261.5 (144.5–490) | 219.5 (136.5–420) | 263 (145–502) | 245.5 (152.5–481) | 250 (150–498) | 0.5 |
| OEFN | | 234.5 (114–544.5) | 306.5 (147–738.5) | 255 (124–527.5) | 260.5 (121.5–640) | 226 (120–543.5) | 0.3 |
| DEFP | | 197 (144.5–255.5) | 169 (114–220.5) | 203.5 (151–255) | 196 (147.5–273) | 210 (150–264) | 0.2 |
| DEFN | | 48.5 (11.5–147.5) | 53 (33–120) | 47 (13–131) | 45 (18–108.5) | 26 (8–73.5) | 0.8 |
Confusion matrix showing overall differences between BaMoS and the semi-automated consensus segmentations in the 60 semi-automatically segmented individuals.
| | BaMoS: No Lesion | BaMoS: Lesion |
|---|---|---|
| Semi-automated consensus: No Lesion | NA | 35.0 (OE: 22.1 / DE: 12.9) |
| Semi-automated consensus: Lesion | 27.6 (OE: 23.8 / DE: 3.8) | 114.0 |
Figures represent the sum over 60 subjects in ml. NA = not applicable; OE = outline error; DE = detection error
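As a rough worked example, the pooled volumes in the confusion matrix above can be turned into overlap summaries. This sketch treats the summed volumes as true positives, false positives and false negatives; the resulting pooled Dice is a different quantity from the median per-subject Dice of 0.74 reported in the abstract, so the two should not be expected to match. The variable names are illustrative only.

```python
# Volumes in ml, summed over the 60 subjects (taken from the confusion matrix).
tp = 114.0                            # lesion in both BaMoS and the consensus
fp_outline, fp_detect = 22.1, 12.9    # BaMoS lesion, consensus no lesion
fn_outline, fn_detect = 23.8, 3.8     # consensus lesion, BaMoS no lesion

fp = fp_outline + fp_detect           # total false positive volume
fn = fn_outline + fn_detect           # total false negative volume

pooled_dice = 2 * tp / (2 * tp + fp + fn)
sensitivity = tp / (tp + fn)          # share of consensus lesion volume found by BaMoS
precision = tp / (tp + fp)            # share of BaMoS lesion volume confirmed by consensus

# Share of each error type that comes from whole-lesion (detection) errors.
fp_detect_share = fp_detect / fp
fn_detect_share = fn_detect / fn

print(f"pooled Dice {pooled_dice:.3f}, sensitivity {sensitivity:.3f}, precision {precision:.3f}")
print(f"detection share of FP {fp_detect_share:.2f}, of FN {fn_detect_share:.2f}")
```

Under these assumptions, most of the false negative volume comes from outline error rather than missed lesions, while detection error contributes a larger share of the false positive volume.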
Fig. 2 Bland-Altman plot of BaMoS-generated WMH volumes compared to the consensus WMH volumes of the 4 raters. The difference between the two volumes is plotted on the y axis and the mean of the two volumes is plotted on the x axis. The mean difference between the two volumes is represented by the black line, and the 95% limits of agreement are shown by the dotted lines (mean difference ± 1.96 × standard deviation of the differences)
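A minimal sketch of the quantities plotted in a Bland-Altman analysis follows. It uses the sample standard deviation of the paired differences for the limits of agreement, which is the usual convention; the function name `bland_altman` and the volume pairs are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return per-pair means and differences, the bias, and the 95% limits of agreement."""
    diffs = [x - y for x, y in zip(a, b)]
    means = [(x + y) / 2 for x, y in zip(a, b)]
    bias = mean(diffs)                 # mean difference (solid line in the plot)
    sd = stdev(diffs)                  # sample SD of the paired differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits


# Hypothetical WMH volumes in ml for four subjects.
automated = [5.0, 3.0, 12.0, 8.0]
consensus = [4.0, 3.0, 13.0, 6.0]

means, diffs, bias, (lower, upper) = bland_altman(automated, consensus)
print(f"bias {bias:.2f} ml, limits of agreement {lower:.2f} to {upper:.2f} ml")
```

Plotting `diffs` against `means` with horizontal lines at `bias`, `lower` and `upper` would give the layout described in the caption.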
Fig. 3 Bullseye plots showing ratios of spatial metrics as a proportion of total error (a, c, e and g) and as a proportion of true positive white matter hyperintensity (WMH) volume (b, d, f and h). Each concentric ring of the bullseye represents a cortical WM layer from each lobe, with the innermost ring representing the inner cortical layer (closest to the midline ventricles), and the outer ring representing the cortical layer nearest the grey matter. a and b report outline error false positive (OEFP), denoting voxels included in the segmentation (BaMoS) which are not in the reference (consensus). c and d represent outline error false negative (OEFN), voxels which are included in the reference and not the segmentation. e and f show detection error false positive (DEFP), denoting voxels included in the segmentation and not the reference (false positive lesions). g and h show detection error false negative (DEFN), denoting lesions included in the reference and not the segmentation (missed lesions)
Fig. 4 Images showing differences in spatial metrics between BaMoS automated segmentation and consensus of all 4 raters, in subjects with low, medium and high WMH loads. FLAIR images are shown in the left column, with difference maps overlaid in the right column. Blue voxels signify outline error false positive (OEFP) which, for a given shared WMH lesion, denotes voxels included in BaMoS which are not in the consensus. Yellow represents outline error false negative (OEFN) which denotes, for a given shared WMH lesion, voxels which are included in the consensus and not in BaMoS. Green represents detection error false positive (DEFP) which denotes voxels included in BaMoS and not the reference (false positive lesions). Red represents detection error false negative (DEFN) denoting voxels included as lesion in the consensus and not BaMoS
Results of the models of neuropsychological change predicted by white matter hyperintensity (log2WMH) volume. Values are shown as estimate (p value) [95% confidence intervals]. Models were run separately in each group; controls, early Mild Cognitive Impairment (EMCI), late mild cognitive impairment (LMCI), Subjective/Significant Memory Concern (SMC) and Alzheimer’s disease (AD). Baseline scores and change in each neuropsychology test predicted by the model are reported; Mini-mental state examination (MMSE), Clinical Dementia Rating Global score (CDRGlobal), Trails A and Trails B and Alzheimer’s disease Assessment scale- cognitive subscale (ADAS-Cog). Estimates are shown for a change in neuropsychology (baseline or change in) for a doubling of baseline WMH compared to the average baseline volume. Models are adjusted for age, sex, years of education, APOE genotype (binary covariate indicating presence of an ε4 allele). Models were bootstrapped for all groups apart from AD.
| | | Controls | EMCI | LMCI | SMC | AD |
|---|---|---|---|---|---|---|
| Baseline | MMSE | 28.96 (<0.001) [28.65, 29.28] | 28.12 (0.00) [27.47, 28.76] | 28.30 (<0.01) [27.17, 29.43] | 28.62 (<0.001) [27.83, 29.41] | 21.96 (<0.01) [20.53, 23.39] |
| Baseline | CDRGlobal | 0.02 (0.27) [−0.02, 0.06] | 0.42 (<0.001) [0.38, 0.47] | 0.53 (<0.01) [0.48, 0.57] | 0.02 (0.40) [−0.02, 0.06] | 0.82 (<0.001) [0.65, 0.99] |
| Baseline | ADAS-Cog | 9.00 (<0.001) [7.27, 10.74] | 11.97 (<0.001) [9.79, 14.16] | 11.97 (<0.001) [7.13, 16.80] | 11.27 (<0.001) [8.21, 14.32] | 27.98 (<0.01) [22.45, 33.51] |
| Baseline | Trails A | 35.88 (<0.001) [31.16, 40.59] | 44.29 (<0.01) [38.14, 50.44] | 49.09 (<0.001) [36.56, 61.63] | 41.60 (<0.001) [30.33, 52.86] | 68.00 (<0.001) [46.47, 89.53] |
| Baseline | Trails B | 102.33 (<0.001) [74.16, 130.50] | 124.61 (<0.01) [102.76, 146.46] | 128.56 (<0.001) [73.56, 183.56] | 95.52 (<0.001) [69.22, 121.82] | 208.44 (<0.01) [155.98, 260.90] |
| Change in | MMSE | −0.10 (0.01) [−0.17, −0.03] | −0.23 (<0.001) [−0.33, −0.14] | −1.12 (<0.01) [−1.38, −0.86] | −0.15 (0.01) [−0.26, −0.03] | −2.15 (<0.001) [−2.69, −1.61] |
| Change in | CDRGlobal | 0.02 (<0.001) [0.01, 0.03] | −0.01 (0.25) [−0.01, 0.00] | 0.09 (<0.01) [0.06, 0.12] | 0.05 (<0.001) [0.03, 0.07] | 0.24 (<0.001) [0.17, 0.30] |
| Change in | ADAS-Cog | 0.06 (0.49) [−0.11, 0.22] | 0.61 (<0.001) [0.38, 0.85] | 2.36 (<0.001) [1.79, 2.93] | 0.03 (0.85) [−0.30, 0.37] | 5.44 (<0.001) [4.25, 6.63] |
| Change in | Trails A | 0.07 (0.75) [−0.36, 0.49] | 0.31 (0.35) [−0.33, 0.95] | 4.16 (<0.001) [2.26, 6.07] | 0.22 (0.74) [−1.09, 1.53] | 10.46 (<0.001) [5.79, 15.13] |
| Change in | Trails B | 2.31 (0.02) [0.37, 4.25] | 2.76 (<0.01) [0.94, 4.59] | 11.13 (<0.001) [6.44, 15.83] | 1.37 (0.33) [−1.39, 4.14] | 21.95 (<0.001) [10.63, 33.28] |
| Effect of WMH on Baseline | MMSE | −0.08 (0.21) [−0.19, 0.04] | −0.02 (0.73) [−0.13, 0.09] | −0.20 (0.05) [−0.40, 0.01] | 0.15 (0.05) [0.00, 0.30] | 0.02 (0.87) [−0.28, 0.33] |
| Effect of WMH on Baseline | CDRGlobal | 0.00 (0.45) [−0.00, 0.01] | 0.00 (0.68) [−0.01, 0.01] | −0.00 (0.64) [−0.02, 0.01] | 0.00 (0.75) [−0.01, 0.01] | 0.02 (0.37) [−0.02, 0.05] |
| Effect of WMH on Baseline | ADAS-Cog | 0.03 (0.89) [−0.39, 0.45] | 0.18 (0.44) [−0.27, 0.63] | 1.06 (0.01) [0.27, 1.84] | 0.58 (0.04) [0.02, 1.14] | 0.10 (0.86) [−1.06, 1.26] |
| Effect of WMH on Baseline | Trails A | 1.37 (0.02) [0.18, 2.57] | 0.63 (0.34) [−0.65, 1.92] | 1.26 (0.31) [−1.16, 3.68] | 1.03 (0.19) [−0.51, 2.57] | 0.57 (0.81) [−4.18, 5.32] |
| Effect of WMH on Baseline | Trails B | 3.61 (0.17) [−1.51, 8.72] | 3.99 (0.06) [−0.19, 8.18] | 4.67 (0.37) [−5.65, 14.99] | 1.15 (0.65) [−3.76, 6.07] | 5.21 (0.42) [−7.38, 17.79] |
| Effect of WMH on change in | MMSE | −0.07 (0.04) [−0.13, −0.00] | −0.07 (0.01) [−0.13, −0.02] | −0.08 (0.34) [−0.24, 0.08] | −0.11 (0.03) [−0.21, −0.01] | 0.06 (0.77) [−0.32, 0.44] |
| Effect of WMH on change in | CDRGlobal | 0.00 (0.37) [−0.01, 0.01] | 0.01 (0.08) [−0.00, 0.01] | 0.03 (0.01) [0.01, 0.05] | −0.00 (0.68) [−0.02, 0.01] | 0.04 (0.08) [−0.01, 0.10] |
| Effect of WMH on change in | ADAS-Cog | 0.12 (0.05) [−0.00, 0.24] | 0.18 (0.03) [0.02, 0.34] | 0.22 (0.26) [−0.16, 0.59] | 0.01 (0.47) [−0.16, 0.36] | −0.21 (0.62) [−1.05, 0.63] |
| Effect of WMH on change in | Trails A | 0.19 (0.30) [−0.17, 0.55] | 0.12 (0.56) [−0.29, 0.53] | 0.29 (0.64) [−0.93, 1.52] | 0.61 (0.11) [−0.17, 1.37] | −1.63 (0.34) [−4.99, 1.72] |
| Effect of WMH on change in | Trails B | 1.55 (0.06) [−0.09, 3.18] | 0.94 (0.07) [−0.09, 1.97] | 0.07 (0.96) [−2.68, 2.82] | 0.12 (0.92) [−2.25, 2.54] | −1.11 (0.80) [−9.91, 7.69] |
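To make the log2WMH scaling concrete, the sketch below shows how a coefficient expressed per doubling of baseline WMH might be read. It assumes the predicted change is linear in log2 of the ratio between a participant's baseline WMH volume and the group average, which is an interpretation of the caption rather than the authors' fitted model. The function name `predicted_change` is hypothetical, and the unit of time for "change" is not stated in this excerpt.

```python
import math


def predicted_change(change_at_average, wmh_effect_per_doubling, wmh_ratio):
    """Predicted change in a test score for a participant whose baseline WMH
    volume is `wmh_ratio` times the group average (assumes linearity in log2)."""
    return change_at_average + wmh_effect_per_doubling * math.log2(wmh_ratio)


# Controls, MMSE: change of -0.10 at average WMH, and -0.07 per doubling of WMH
# (point estimates from the table; confidence intervals are ignored here).
at_average = predicted_change(-0.10, -0.07, 1.0)   # average WMH volume
doubled = predicted_change(-0.10, -0.07, 2.0)      # twice the average volume
halved = predicted_change(-0.10, -0.07, 0.5)       # half the average volume

print(f"average: {at_average:.2f}, doubled: {doubled:.2f}, halved: {halved:.2f}")
```

Because the predictor is on a log2 scale, each further doubling adds the same increment, so the model describes proportional rather than absolute differences in WMH volume.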