Quentin Vanderbecq, Eric Xu, Sebastian Ströer, Baptiste Couvy-Duchesne, Mauricio Diaz Melo, Didier Dormont, Olivier Colliot.
Abstract
BACKGROUND: Manual segmentation is currently the gold standard to assess white matter hyperintensities (WMH), but it is time consuming and subject to intra- and inter-operator variability.
Keywords: Artificial intelligence; Dementia; Microvascular; Segmentation; White matter hyperintensity
Year: 2020 PMID: 32739882 PMCID: PMC7394967 DOI: 10.1016/j.nicl.2020.102357
Source DB: PubMed Journal: Neuroimage Clin ISSN: 2213-1582 Impact factor: 4.881
Demographic information for the research dataset (from ADNI) and the clinical routine dataset. Continuous values are displayed as average with the min–max range within parentheses. For ADNI, we also display the characteristics of the training and testing datasets separately.
| | ADNI: All | ADNI: Training | ADNI: Testing | ROUTINE: All |
|---|---|---|---|---|
| N | 147 | 40 | 107 | 60 |
| Age | 74 | 74.7 | 73.7 | 78.2 |
| Sex | 85 | 19 | 66 | 30 |
Intra- and inter-rater reproducibility assessed on the training dataset from ADNI (comprising 40 patients).
| | DSC | Volume similarity | Intraclass correlation | Volume error rate | False positive rate | False negative rate |
|---|---|---|---|---|---|---|
| Intra-operator reproducibility | 0.744 | 0.899 | 0.987 | 0.185 | 0.196 | 0.292 |
| First segmentation first operator vs second operator | 0.723 | 0.884 | 0.984 | 0.277 | 0.324 | 0.199 |
| Second segmentation first operator vs second operator | 0.701 | 0.844 | 0.974 | 0.310 | 0.262 | 0.290 |
DSC: Dice similarity coefficient. For each metric, the table displays the average and the 95% confidence interval within parentheses.
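The overlap and volume metrics reported in these tables can all be derived from a pair of binary masks (reference vs. automatic segmentation). Below is a minimal NumPy sketch using the definitions common in the WMH-evaluation literature; the exact formulas used in the paper, in particular the denominators of the volume error rate and false positive rate, are assumptions here, not taken from the source.

```python
import numpy as np

def overlap_metrics(ref, seg):
    """Overlap/volume metrics between a reference mask and an automatic mask.

    Uses common WMH-evaluation definitions; the paper's exact formulas
    (notably the FP-rate denominator) may differ — this is a sketch.
    """
    ref = np.asarray(ref).astype(bool)
    seg = np.asarray(seg).astype(bool)
    tp = np.logical_and(ref, seg).sum()    # voxels segmented in both masks
    fp = np.logical_and(~ref, seg).sum()   # segmented but not in reference
    fn = np.logical_and(ref, ~seg).sum()   # in reference but missed
    v_ref, v_seg = ref.sum(), seg.sum()
    dsc = 2 * tp / (v_ref + v_seg)                     # Dice similarity coefficient
    volume_similarity = 1 - abs(v_seg - v_ref) / (v_seg + v_ref)
    volume_error_rate = abs(v_seg - v_ref) / v_ref     # assumed definition
    false_positive_rate = fp / v_seg                   # FP relative to segmented volume (assumption)
    false_negative_rate = fn / v_ref                   # missed fraction of the reference
    return dsc, volume_similarity, volume_error_rate, false_positive_rate, false_negative_rate
```

For example, a reference of 4 voxels and a segmentation of 3 voxels with 2 voxels of overlap give DSC = 4/7 ≈ 0.571.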
Summary of evaluation, and some selected information to choose a method.
| | Ranking on research data | Ranking on routine data | Robustness: artifacts | Robustness: different scanner | Sequences needed | Needs training data | Limitations/Requirements | Proc. time |
|---|---|---|---|---|---|---|---|---|
| LPA | 2 | 1 | – | | FLAIR | No | Matlab | 1 min |
| LGA | 4 | 5 | – | | FLAIR/T1w | No | Matlab | 6 min |
| BIANCA | 4 | 1 | – | | FLAIR/T1w | Yes | Needs WM mask | 17 min |
| SLS | 2 | 1 | – | | FLAIR/T1w | No | Matlab | 8 min |
| W2MHS | 8 | 8 | – | | FLAIR/T1w | No | Matlab | 5 min |
| nicMSlesion (original) | 4 | 7 | – | | FLAIR/T1w | No | GPU | 10 min |
| nicMSlesion (retrained) | 1 | 5 | – | | FLAIR/T1w | Yes | GPU | 10 min |
| UBO | 4 | 4 | – | – | FLAIR/T1w | No | Matlab | 9 min |
Ranking was performed using t-test comparisons on the primary criterion (DSC) (see Supplementary Tables 7 and 9 for details). We started with the method with the best DSC; all methods not significantly different from it were given the same rank, and so on.
Processing times were evaluated on a MacBook Pro laptop (2018) with a 2.2 GHz Intel Core i7 CPU and 16 GB of RAM, without a graphics processing unit (GPU), except for nicMSlesion, for which we used a GPU-equipped computer, namely a Linux workstation with an Intel Xeon E5-2699 @ 2.30 GHz CPU, an NVIDIA Quadro M4000 GPU, and 256 GB of RAM.
– indicates that the DSC is sensitive to artifacts or scanner type at p < 0.05, uncorrected for multiple comparisons, on the routine dataset.
-- indicates that the DSC is sensitive to artifacts or scanner type after correction for multiple testing, on the routine dataset.
Best DSC in our evaluation (though not necessarily significantly better, which explains the equal first ranks).
2 min for segmentation and 15 min for generation of the exclusion mask.
With graphic processing unit (GPU, NVIDIA Quadro M4000).
3.5 min for segmentation and 6.5 min for preprocessing.
Retraining time.
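The tie-aware ranking procedure described in the footnotes (best-DSC method first, with methods not significantly different from it sharing its rank) can be sketched as follows. This is a hedged illustration using paired t-tests on per-subject DSC values; the significance threshold, tie-breaking, and any multiple-comparison correction applied in the paper are assumptions here.

```python
import numpy as np
from scipy.stats import ttest_rel

def rank_methods(dsc_per_method, alpha=0.05):
    """Competition-style ranking of segmentation methods by DSC.

    dsc_per_method: dict mapping method name -> 1D array of per-subject
                    DSC values (same subjects, same order, for every method).
    Methods whose DSCs are not significantly different (paired t-test)
    from the current best share its rank; the next tier's rank is offset
    by the tier size (e.g. 1, 1, 1, 4, ...), matching the table above.
    """
    remaining = dict(dsc_per_method)
    ranks = {}
    rank = 1
    while remaining:
        # Method with the best mean DSC among those not yet ranked.
        best = max(remaining, key=lambda m: np.mean(remaining[m]))
        tier = [best]
        for m in remaining:
            if m == best:
                continue
            _, p = ttest_rel(remaining[best], remaining[m])
            if p >= alpha:          # not significantly different -> same rank
                tier.append(m)
        for m in tier:
            ranks[m] = rank
            del remaining[m]
        rank += len(tier)           # competition ranking (skip ranks after a tie)
    return ranks
```

With three hypothetical methods where A and B perform indistinguishably and C is clearly worse, this yields ranks {A: 1, B: 1, C: 3}.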
Fig. 1. DSC performance of the different automatic segmentation methods. Left: ADNI research dataset. Right: clinical routine dataset. The boxplots show the median and the 25th and 75th percentiles of the metric distributions. Values outside the whiskers indicate outliers. Gray dots show the values for individual participants.
Performance of the different automatic segmentation methods on the ADNI research dataset.
| ADNI | DSC | Volume similarity | Volume error rate | Intraclass correlation | False positive rate | False negative rate |
|---|---|---|---|---|---|---|
| LPA | 0.539 | 0.734 | 0.850 | 0.812 | 0.438 | 0.366 |
| LGA | 0.474 | 0.759 | 0.426 | 0.680 | 0.444 | 0.535 |
| BIANCA | 0.469 | 0.638 | 0.760 | 0.417 | 0.393 | 0.481 |
| SLS | 0.527 | 0.732 | 0.903 | 0.890 | 0.564 | |
| W2MHS | 0.351 | 0.603 | 2.219 | 0.292 | 0.539 | 0.569 |
| nicMSlesion (original) | 0.454 | 0.787 | 0.694 | 0.948 | 0.517 | 0.503 |
| nicMSlesion (retrained) | 0.402 | | | | | |
| UBO | 0.486 | 0.762 | 0.907 | 0.881 | 0.587 | 0.360 |
For each metric, we present the average and the 95% confidence interval within parentheses. DSC: Dice similarity coefficient. Results in bold indicate the best score for each metric.
Fig. 2. Maps of false negative and false positive rates for each method on the ADNI research dataset. Segmentation masks are represented on the MNI template. The first row shows an overlay of the manual segmentations in the ADNI testing set; the greyscale ranges from 0% (white) to 33% (black) of WMH at any given voxel. The left column shows the false negative rate map for each method in the ADNI testing set; the right column shows the false positive rate map. The scale ranges from 0 to 33% of errors at each voxel, which corresponds to the maximal error rates observed.
Performance of the different automatic segmentation methods on the clinical routine dataset.
| Routine | DSC | Volume similarity | Volume error rate | Intraclass correlation | False positive rate | False negative rate |
|---|---|---|---|---|---|---|
| LPA | 0.727 | 0.402 | ||||
| LGA | 0.490 | 0.729 | 2.533 | 0.287 | 0.560 | 0.354 |
| BIANCA | 0.607 (0.556–0.657) | 0.788 | 0.709 | 0.859 | 0.404 | 0.296 |
| SLS | 0.613 | 0.738 | 0.515 | 0.815 | 0.368 | |
| W2MHS | 0.223 | 0.448 | 0.682 | 0.510 | 0.461 | 0.844 |
| nicMSlesion (original) | 0.433 | 0.647 | 4.498 | 0.396 | 0.616 | 0.351 |
| nicMSlesion (retrained) | 0.500 | 0.781 | 1.349 | 0.505 | 0.433 | |
| UBO | 0.560 | 0.836 | 0.569 | 0.734 | 0.471 | 0.353 |
For each metric, the table displays the average and the 95% confidence interval within parentheses. DSC: Dice similarity coefficient. Results in bold indicate the best score for each metric.
Fig. 3. Maps of false negative and false positive rates for each method on the clinical routine dataset. Segmentation masks are represented on the MNI template. The first row shows an overlay of the manual segmentations in the clinical routine dataset; the greyscale ranges from 0% (white) to 33% (black) of WMH at any given voxel. The left column shows the false negative rate map for each method; the right column shows the false positive rate map. The scale ranges from 0 to 33% of errors at each voxel, which corresponds to the maximal error rates observed.
Fig. 4. Boxplots of DSC performance across artifact and scanner subgroups. a. DSC distributions with and without artifacts. The box shows the median and the 25th and 75th percentiles; the whiskers extend as a function of the inter-quartile range. Orange boxplots and dots show data without strong artifacts; blue boxplots and dots show data with artifacts (N = 10 images with artifacts, N = 50 without). b. DSC distributions for the different MRI scanners. The box shows the median and the 25th and 75th percentiles; the whiskers extend as a function of the inter-quartile range. Outliers are shown as black rhombi. Yellow stars indicate a significant effect of scanner type on DSC variance. N = 15 per scanner. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)