Parisa Mojiri Forooshani1,2, Mahdi Biparva1,2, Emmanuel E Ntiri1,2, Joel Ramirez1,2, Lyndon Boone1,3, Melissa F Holmes1,2, Sabrina Adamo1,2, Fuqiang Gao1,2, Miracle Ozzoude1,2, Christopher J M Scott1,2, Dar Dowlatshahi4, Jane M Lawrence-Dewar5, Donna Kwan6, Anthony E Lang7, Karine Marcotte8,9, Carol Leonard10,11, Elizabeth Rochon11,12, Chris Heyn1,13, Robert Bartha14, Stephen Strother15, Jean-Claude Tardif16, Sean Symons13, Mario Masellis17, Richard H Swartz17, Alan Moody13, Sandra E Black1,2,17, Maged Goubran1,2,3.
Abstract
White matter hyperintensities (WMHs) are frequently observed on structural neuroimaging of elderly populations and are associated with cognitive decline and increased risk of dementia. Many existing WMH segmentation algorithms produce suboptimal results in populations with vascular lesions or brain atrophy, or require parameter tuning and are computationally expensive. Additionally, most algorithms do not generate a confidence estimate of segmentation quality, limiting their interpretation. MRI-based segmentation methods are often sensitive to acquisition protocols, scanners, noise-level, and image contrast, failing to generalize to other populations and out-of-distribution datasets. Given these concerns, we propose a novel Bayesian 3D convolutional neural network with a U-Net architecture that automatically segments WMH, provides uncertainty estimates of the segmentation output for quality control, and is robust to changes in acquisition protocols. We also provide a second model to differentiate deep and periventricular WMH. Four hundred thirty-two subjects were recruited to train the CNNs from four multisite imaging studies. A separate test set of 158 subjects was used for evaluation, including an unseen multisite study. We compared our model to two established state-of-the-art techniques (BIANCA and DeepMedic), highlighting its accuracy and efficiency. Our Bayesian 3D U-Net achieved the highest Dice similarity coefficient of 0.89 ± 0.08 and the lowest modified Hausdorff distance of 2.98 ± 4.40 mm. We further validated our models highlighting their robustness on "clinical adversarial cases" simulating data with low signal-to-noise ratio, low resolution, and different contrast (stemming from MRI sequences with different parameters). Our pipeline and models are available at: https://hypermapp3r.readthedocs.io.
Keywords: Bayesian neural networks; adversarial attacks; deep learning; image segmentation; uncertainty estimation; vascular lesions; white matter hyperintensity
Year: 2022 PMID: 35088930 PMCID: PMC8996363 DOI: 10.1002/hbm.25784
Source DB: PubMed Journal: Hum Brain Mapp ISSN: 1065-9471 Impact factor: 5.038
Participant demographics, clinical diagnosis, MoCA scores, ICV and ventricle volumes, WMH volumes, and MRI parameters in the training and test datasets
| Training dataset ( | Test dataset ( | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| CAIN ( | ONDRI ( | LIPA ( | VBH ( | CAIN ( | ONDRI ( | MITNEC ( | LIPA ( | VBH ( | ||
| Demographics | ||||||||||
| Diagnostic group | CN | PD, CVD | CN, FTD | AD, CVD, CN | CN | PD, CVD | MCI, AD, CVD | CN, FTD | AD, CVD, CN | |
| Age (years) | 70.4 (7.8) | 68.3 (6.9) | 67.0 (7.3) | 76.4 (9.0) | 69 (7.5) | 67.9 (5.6) | 77 (8.7) | 67.9 (12.1) | 77.5 (11.7) | |
| Sex ( | 79 (39.0) | 40 (25.0) | – | 14 (43.8) | 22 (44.0) | 12 (31.5) | 24 (45.2) | – | 4 (50.0) | |
| MOCA total score | – | 25.9 (2.8) | – | – | – | 25.3 (2.9) | 22.2 (4.8) | – | – | |
| Neuroimaging metrics | ||||||||||
| ICV (cc) | 1241.5 (118.5) | 1286.1 (135.8) | 1255.7 (132.1) | 1253.1 (171.0) | 1234.9 (140.0) | 1264.38 (151.0) | 1225.8 (174.2) | 1254.5 (153.9) | 1291.2 (149.6) | |
| vCSF (cc) | 36.9 (19.7) | 38.8 (19.9) | 44.3 (20.0) | 56.5 (37.2) | 36.8 (19.7) | 33.1 (18.1) | 53.3 (21.6) | 45.1 (18.2) | 48.3 (21.3) | |
| pvWMH (cc) | 7.7 (9.3) | 7.2 (10.1) | 6.7 (9.9) | 20.1 (22.6) | 6.2 (4.9) | 5.5 (7.9) | 32.5 (22.0) | 2.8 (1.6) | 33.6 (35.2) | |
| dWMH (cc) | 1.1 (1.6) | 0.7 (1.0) | 0.6 (0.7) | 2.1 (2.4) | 0.8 (1.1) | 0.6 (0.8) | 2.2 (1.2) | 0.5 (0.5) | 2.4 (2.3) | |
| MRI acquisition parameters | ||||||||||
| T1 (SPGR) | In‐plane resolution (mm) | 0.94 × 1.1 | 1 × 1 | 0.9 × 0.9 | 0.94 × 1.1 | 1 × 1 | 0.9 × 0.9 | |||
| Slice thickness (mm) | 1.4 | 1 | 1 | 1.4 | 1 | 1 | ||||
| FOV (mm) | 240 | 256 | 220 | 240 | 256 | 220 | ||||
| FLAIR | In‐plane resolution (mm) | 1 × 1.1 | 0.94 × 0.94 | 0.9 × 0.9 | 1 × 1.1 | 0.94 × 0.94 | 0.9 × 0.9 | |||
| Slice thickness (mm) | 3 | 3 | 3 | 3 | 3 | 3 | ||||
| FOV (mm) | 230 | 240 | 220 | 240 | 240 | 220 | ||||
| Field strength | 3 T | 3 T | 3 T | 3 T | 3 T | 3 T | ||||
Note: These datasets are used for training and testing SOTA. Data are presented as mean ± SD unless otherwise specified.
Abbreviations: AD, Alzheimer's disease; CN, cognitively normal; CVD, cerebrovascular disease; dWMH, deep white matter hyperintensity; FLAIR, fluid‐attenuated inversion recovery; FTD, frontotemporal dementia; ICV, intracranial volume; MCI, mild cognitive impairment; MoCA, Montreal cognitive assessment; PD, Parkinson's disease; pvWMH, periventricular white matter hyperintensity; SOTA, state‐of‐the‐art; VCI, vascular cognitive impairment; vCSF, ventricular cerebrospinal fluid.
Of the training dataset, 378 subjects were used for training and 54 for validation during training.
FIGURE 1 (a) Proposed architecture for the Bayesian 3D U‐Net convolutional neural network with residual blocks and dilated convolutions. (b) Overall inference pipeline to generate WMH segmentation and uncertainty maps as well as a second network to differentiate dWMH and pvWMH. dWMH, deep white matter hyperintensity; pvWMH, periventricular white matter hyperintensity; WMH, white matter hyperintensity
FIGURE 2 An example of WMH segmentation and uncertainty estimation on a FLAIR scan, showing the Bayesian model's prediction and estimated epistemic uncertainty in (a) axial, (b) sagittal, and (c) coronal views. Blue represents the overlap between ground truth and prediction, red (and red arrowheads) represents ground truth voxels missing in the prediction (undersegmentation), and green (and green arrowheads) represents prediction voxels not in the ground truth (oversegmentation). Red boxes mark "false‐positive" voxels (model predictions) that are in fact positive voxels missed during manual editing of the semiautomated ground truth labels. FLAIR, fluid‐attenuated inversion recovery; WMH, white matter hyperintensity
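The uncertainty map in Figure 2 can be illustrated with a minimal sketch. It assumes the Bayesian network is sampled with several stochastic forward passes and that the per-voxel spread across passes serves as the epistemic uncertainty; the record above does not state the exact estimator, so `stochastic_predict`, `toy_predict`, the sample count, and the use of the standard deviation are all hypothetical choices for illustration.

```python
import numpy as np

def mc_segmentation(stochastic_predict, volume, n_samples=20, seed=0):
    """Average stochastic forward passes; return a mask and an uncertainty map.

    `stochastic_predict(volume, rng)` is a hypothetical callable that returns
    per-voxel foreground probabilities with the same shape as `volume`.
    """
    rng = np.random.default_rng(seed)
    samples = np.stack([stochastic_predict(volume, rng) for _ in range(n_samples)])
    mean_prob = samples.mean(axis=0)
    # Assumption: spread across passes is used as the uncertainty estimate.
    uncertainty = samples.std(axis=0)
    return mean_prob > 0.5, uncertainty

def toy_predict(volume, rng):
    """Stand-in for a stochastic network: a sigmoid of noisy intensities."""
    noisy = volume + rng.normal(0.0, 0.5, size=volume.shape)
    return 1.0 / (1.0 + np.exp(-noisy))

# Toy volume: confident background, a confident 2x2x2 lesion, one ambiguous voxel.
volume = np.full((4, 4, 4), -6.0)
volume[1:3, 1:3, 1:3] = 6.0
volume[0, 0, 0] = 0.0

seg, unc = mc_segmentation(toy_predict, volume)
print(seg[1:3, 1:3, 1:3].all(), unc[0, 0, 0] > unc[1, 1, 1])
```

In this toy setup the ambiguous voxel receives a much larger uncertainty than the confidently segmented lesion, which is the behaviour a quality-control step would rely on.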
Evaluation of WMH segmentation on different methods with the following metrics: Dice similarity coefficient, Hausdorff distance in “mm” unit (modified as 95th percentile) (HD95), absolute volume difference, sensitivity (Recall), and F‐1 score for individual lesions
| Metric | HyperMapper baseline | HyperMapper Bayesian | BIANCA | DeepMedic |
|---|---|---|---|---|
| Dice similarity coefficient | 0.892 (±0.080) | | 0.604 (±0.222) | 0.858 (±0.080) |
| Hausdorff distance (HD95) (mm) ↓ | 3.045 (±4.417) | | 29.684 (±17.105) | 4.477 (±6.322) |
| Absolute volume difference (%) ↓ | | 9.843 (±12.134) | 108.970 (±216.346) | 13.846 (±14.731) |
| Sensitivity (recall) | 0.762 (±0.149) | 0.762 (±0.150) | | 0.764 (±0.153) |
| F‐1 score | 0.752 (±0.119) | | 0.199 (±0.148) | 0.652 (±0.126) |
| Time (s) | | | 24 | 25 |
Note: ↓ indicates that smaller values represent better performance.
Values in bold indicate best performance.
Abbreviation: WMH, white matter hyperintensity.
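As an illustration of the voxel-wise metrics reported in the table, the following sketch computes the Dice similarity coefficient, sensitivity, and absolute volume difference for two binary masks. It uses the standard textbook definitions; whether the paper computes sensitivity per voxel or per lesion is not stated in this record, so the voxel-wise form is an assumption, and the function name `overlap_metrics` is hypothetical.

```python
import numpy as np

def overlap_metrics(truth, pred):
    """Voxel-wise overlap metrics for binary masks (assumes a non-empty truth)."""
    truth = np.asarray(truth, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    tp = np.logical_and(truth, pred).sum()          # true-positive voxels
    n_truth, n_pred = int(truth.sum()), int(pred.sum())
    return {
        "dice": 2.0 * tp / (n_truth + n_pred),
        "sensitivity": tp / n_truth,
        # Absolute volume difference as a percentage of the ground truth volume.
        "avd_percent": 100.0 * abs(n_pred - n_truth) / n_truth,
    }

# Toy example: an 8-voxel lesion that the prediction oversegments by 4 voxels.
truth = np.zeros((4, 4, 4), dtype=bool)
truth[1:3, 1:3, 1:3] = True
pred = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:4] = True

metrics = overlap_metrics(truth, pred)
print(metrics)
```

Here the prediction covers the whole lesion (sensitivity 1.0) but its extra voxels lower the Dice coefficient to 0.8 and give a 50% volume difference, which shows why the table reports the three metrics together.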
FIGURE 3 Evaluation of WMH segmentations across tested methods using the following metrics: Dice similarity coefficient, modified Hausdorff distance (HD95), absolute volume difference (%), and lesion F1. Not significant: ns; p < .05: *; p < .01: **; p < .001: ***; p < .0001: ****. WMH, white matter hyperintensity
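The modified Hausdorff distance (HD95) can be sketched as follows. Several variants of the 95th-percentile Hausdorff distance exist; this version takes the larger of the two directed 95th-percentile surface distances, which is an assumption rather than the paper's confirmed definition. The helper names and the brute-force distance computation are illustrative only and suit small masks, not full-resolution volumes.

```python
import numpy as np

def surface_points(mask, spacing):
    """Physical coordinates of foreground voxels touching background (6-neighbourhood)."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = mask.copy()
    for axis in range(3):
        for shift in (-1, 1):
            # A voxel is interior only if every face neighbour is foreground.
            interior &= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    return np.argwhere(mask & ~interior) * np.asarray(spacing, dtype=float)

def hd95(truth, pred, spacing=(1.0, 1.0, 1.0)):
    """95th-percentile Hausdorff distance in the units of `spacing` (non-empty masks)."""
    a = surface_points(truth, spacing)
    b = surface_points(pred, spacing)
    # Brute-force pairwise distances between the two surfaces.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(max(np.percentile(d.min(axis=1), 95),
                     np.percentile(d.min(axis=0), 95)))

# Toy example: two single-voxel "lesions" three voxels apart along the last axis.
truth = np.zeros((8, 8, 8), dtype=bool)
truth[2, 2, 2] = True
pred = np.zeros((8, 8, 8), dtype=bool)
pred[2, 2, 5] = True

print(hd95(truth, pred), hd95(truth, pred, spacing=(1.0, 1.0, 3.0)))
```

Passing the voxel spacing matters for data like the FLAIR scans described above, where the slice thickness (3 mm) is larger than the in-plane resolution.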
FIGURE 4 Visual comparison of the tested methods in an example subject. Blue represents the overlap between ground truth and prediction (true‐positive voxels), red (and red arrows) represents ground truth voxels missing in prediction (false‐negative voxels), green (and green arrows) represents prediction voxels not in ground truth (false‐positive voxels)
FIGURE 5 WMH segmentation and uncertainty maps of cases with the highest (a) and lowest (b) Dice similarity coefficients from the test set. Red arrowheads and circles highlight areas that were under‐segmented, and green arrowheads and circles highlight areas that were over‐segmented. Red boxes mark an enlarged perivascular space (PVS) in the frontal lobe that was mislabelled as WMH in the ground truth data but was correctly left unlabelled by our model. WMH, white matter hyperintensity
FIGURE 6 An example of WMH segmentation on a FLAIR scan (axial, sagittal, and coronal views), showing the Bayesian model's total WMH prediction, dWMH and pvWMH prediction, as well as ground truth labels. Blue labels represent Bayesian model WMH prediction, red labels represent dWMH, and green labels represent pvWMH. dWMH, deep white matter hyperintensity; FLAIR, fluid‐attenuated inversion recovery; pvWMH, periventricular white matter hyperintensity; WMH, white matter hyperintensity
Evaluation of WMH segmentation on mild WMH cases
| Metric | HyperMapper baseline | HyperMapper Bayesian | BIANCA | DeepMedic |
|---|---|---|---|---|
| Dice similarity coefficient | 0.841 (±0.097) | | 0.340 (±0.067) | 0.779 (±0.092) |
| Hausdorff distance (HD95) (mm) ↓ | 5.365 (±6.450) | | 44.042 (±9.564) | 8.236 (±7.190) |
| Absolute volume difference (%) ↓ | | 11.393 (±12.812) | 275.970 (±324.757) | 22.087 (±20.971) |
| Sensitivity (recall) | 0.712 (±0.161) | 0.712 (±0.160) | | 0.749 (±0.160) |
| F‐1 score | 0.720 (±0.116) | | 0.096 (±0.048) | 0.594 (±0.120) |
Note: ↓ indicates that smaller values represent better performance.
Values in bold indicate best performance.
Abbreviation: WMH, white matter hyperintensity.
Evaluation of WMH segmentation on different adversarial attacks
| Metric | Adversarial attack | HyperMapper Bayesian | BIANCA | DeepMedic |
|---|---|---|---|---|
| Dice similarity coefficient | Noise (sigma = 0.2) | | 0.252 (±0.130) | 0.639 (±0.217) |
| | Downsampled (2 × 2 × 2) | | 0.348 (±0.176) | 0.306 (±0.208) |
| | Contrast (gamma = 0.5) | | 0.389 (±0.184) | 0.677 (±0.198) |
| Hausdorff distance (HD95) (mm) ↓ | Noise (sigma = 0.2) | | 31.521 (±14.999) | 14.092 (±13.626) |
| | Downsampled (2 × 2 × 2) | | 30.469 (±18.451) | 50.049 (±21.236) |
| | Contrast (gamma = 0.5) | | 29.488 (±17.077) | 10.742 (±10.5854) |
| Absolute volume difference (%) ↓ | Noise (sigma = 0.2) | | 159.602 (±278.002) | 42.627 (±24.601) |
| | Downsampled (2 × 2 × 2) | | 156.712 (±325.622) | 77.265 (±18.012) |
| | Contrast (gamma = 0.5) | | 176.860 (±365.183) | 40.007 (±24.096) |
| Sensitivity (recall) | Noise (sigma = 0.2) | | 0.669 (±0.168) | 0.259 (±0.124) |
| | Downsampled (2 × 2 × 2) | | 0.601 (±0.217) | 0.072 (±0.040) |
| | Contrast (gamma = 0.5) | 0.601 (±0.200) | | 0.348 (±0.191) |
| F‐1 score | Noise (sigma = 0.2) | | 0.139 (±0.142) | 0.376 (±0.134) |
| | Downsampled (2 × 2 × 2) | | 0.316 (±0.169) | 0.131 (±0.066) |
| | Contrast (gamma = 0.5) | | 0.213 (±0.175) | 0.469 (±0.166) |
Note: ↓ indicates that smaller values represent better performance.
Values in bold indicate best performance.
Abbreviation: WMH, white matter hyperintensity.
DeepMedic failed on one subject with increased noise.
DeepMedic failed on 62 subjects with lower resolution.
FIGURE 7 WMH segmentation and uncertainty estimates using our Bayesian model under three types of adversarial attacks applied to the same subject (the addition of noise with a sigma of 0.2, downsampling of resolution by a factor of 2 × 2 × 2, and changing contrast with 0.5 gamma). WMH, white matter hyperintensity
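The three perturbations named in the caption can be sketched as simple array operations. This is an illustration under stated assumptions, not the authors' pipeline: intensities are assumed to be rescaled to [0, 1] first, the noise is assumed to be additive Gaussian (the record does not settle the noise distribution), and downsampling is done by plain striding without interpolation. All function names are hypothetical.

```python
import numpy as np

def normalise(volume):
    """Rescale intensities to [0, 1] (assumes a non-constant volume)."""
    v = np.asarray(volume, dtype=float)
    return (v - v.min()) / (v.max() - v.min())

def add_noise(volume, sigma=0.2, seed=0):
    """Additive noise on normalised intensities; Gaussian is an assumption."""
    rng = np.random.default_rng(seed)
    return normalise(volume) + rng.normal(0.0, sigma, size=np.shape(volume))

def downsample(volume, factor=2):
    """Keep every `factor`-th voxel along each axis (no interpolation)."""
    return np.asarray(volume)[::factor, ::factor, ::factor]

def change_contrast(volume, gamma=0.5):
    """Gamma transform on normalised intensities; gamma < 1 brightens mid-tones."""
    return normalise(volume) ** gamma

# Toy volume with intensities spanning [0, 1].
vol = np.linspace(0.0, 1.0, 512).reshape(8, 8, 8)
noisy = add_noise(vol)
small = downsample(vol)
bright = change_contrast(vol)
print(noisy.shape, small.shape, float(bright.min()), float(bright.max()))
```

Applying each transform to the same test scan before inference gives a simple way to probe how a segmentation model degrades under lower signal-to-noise ratio, lower resolution, and altered contrast.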
FIGURE 8 Evaluation of WMH segmentation on cases with increased noise. Not significant: ns; p < .05: *; p < .01: **; p < .001: ***; p < .0001: ****. WMH, white matter hyperintensity
FIGURE 9 Visual comparison of the segmentation methods under three types of adversarial attacks (the addition of gamma noise with a sigma of 0.2, downsampling of resolution by a factor of 2 × 2 × 2, and changing contrast with 0.5 gamma)