Gregory Kiar, Yohan Chatelain, Pablo de Oliveira Castro, Eric Petit, Ariel Rokem, Gaël Varoquaux, Bratislav Misic, Alan C Evans, Tristan Glatard.
Abstract
The analysis of brain-imaging data requires complex processing pipelines to support findings on brain function or pathologies. Recent work has shown that variability in analytical decisions, small amounts of noise, or computational environments can lead to substantial differences in the results, endangering the trust in conclusions. We explored the instability of results by instrumenting a structural connectome estimation pipeline with Monte Carlo Arithmetic to introduce random noise throughout. We evaluated the reliability of the connectomes, the robustness of their features, and the eventual impact on analysis. The stability of results was found to range from perfectly stable (i.e. all digits of data significant) to highly unstable (i.e. 0-1 significant digits). This paper highlights the potential of leveraging induced variance in estimates of brain connectivity to reduce the bias in networks without compromising reliability, alongside increasing the robustness and potential upper-bound of their applications in the classification of individual differences. We demonstrate that stability evaluations are necessary for understanding error inherent to brain imaging experiments, and how numerical analysis can be applied to typical analytical workflows both in brain imaging and other domains of computational sciences, as the techniques used were data and context agnostic and globally relevant. Overall, while the extreme variability in results due to analytical instabilities could severely hamper our understanding of brain organization, it also affords us the opportunity to increase the robustness of findings.
Year: 2021 PMID: 34724000 PMCID: PMC8559953 DOI: 10.1371/journal.pone.0250755
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Exploration of perturbation-induced deviations from reference structural connectomes.
(A) The absolute deviations between connectomes, in the form of normalized percent deviation from reference. The difference in MCA-perturbed connectomes is shown as the "Across MCA" series, and is presented relative to the variability observed across subsamples, sessions, and subjects. (B) The number of significant decimal digits in each set of connectomes, as obtained by evaluating the complete distribution of networks. A value of 16 means every digit can be relied upon, whereas a value of 1 means only the first digit can be trusted. Dense and sparse perturbations are shown on the left and right, respectively.
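The significant-digit counts in panel B can be illustrated with an estimator that is common in the Monte Carlo Arithmetic literature, s = -log10(σ/|μ|), where μ and σ are the mean and standard deviation of a value across perturbed runs. The sketch below is a minimal illustration under that assumption and is not taken from the authors' code; the function name, the clamping to the range [0, max], and the use of 53 bits (double precision, about 15.95 digits, which the caption rounds to 16) are choices made here for the example.

```python
import math
import statistics

# Assumption: results are IEEE-754 doubles with a 53-bit significand,
# which gives roughly 15.95 decimal digits.
MAX_DIGITS = 53 * math.log10(2)


def significant_digits(samples):
    """Estimate the significant decimal digits of one value from repeated perturbed runs.

    Uses s = -log10(sigma / |mu|), clamped to [0, MAX_DIGITS].
    """
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)  # sample standard deviation, needs >= 2 samples
    if sigma == 0:
        return MAX_DIGITS  # every run agreed exactly
    if mu == 0:
        return 0.0  # relative error is undefined; treat as no trusted digits
    s = -math.log10(sigma / abs(mu))
    return min(max(s, 0.0), MAX_DIGITS)


# Hypothetical edge weights from three perturbed runs of the same pipeline.
stable = significant_digits([1.0, 1.0, 1.0])
moderate = significant_digits([1.0, 1.001, 0.999])
unstable = significant_digits([1.0, -2.0, 3.0])
```

Applied to each edge of a connectome across its MCA runs, an estimator of this kind would produce a per-edge map of how many digits survive the perturbation.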
The impact of instabilities as evaluated through the discriminability of the dataset based on individual (or subject) differences, session, and subsample.
| Comparison | Chance | Target | Unscaled Ref. (Det.) | Unscaled Ref. (Prob.) | Scaled Ref. (Det.) | Scaled Ref. (Prob.) | Dense MCA (Det.) | Dense MCA (Prob.) | Sparse MCA (Det.) | Sparse MCA (Prob.) |
|---|---|---|---|---|---|---|---|---|---|---|
| Subject | 0.04 | 1.0 | 0.64 | 0.65 | 0.82 | 0.82 | 0.82 | 0.82 | 0.77 | 0.75 |
| Session | 0.5 | 0.5 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.88 | 0.85 |
| Subsample (H3) | 0.5 | 0.5 | | | | | 0.99 | 1.00 | 0.71 | 0.61 |
The performance is reported as mean discriminability. A perfectly discriminable dataset would score 1.0, while chance performance, indicating minimal discriminability, is 1 divided by the number of classes. H3 could not be tested using the reference executions because too few comparisons were possible. The alternative hypothesis, indicating significant discrimination, was accepted for all experiments, with p < 0.005 after correcting for multiple comparisons.
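To make the discriminability score concrete, the sketch below computes a simplified version of the statistic: the fraction of comparisons in which two observations sharing a label are closer to each other than to an observation with a different label. It is an illustration only and not the implementation used in the paper; the function name, the Euclidean distance, the half-credit for ties, and the toy data are assumptions made for the example.

```python
import math


def discriminability(observations, labels):
    """Fraction of comparisons where a same-label pair is closer than a cross-label pair.

    observations: list of equal-length numeric vectors (e.g. flattened connectomes).
    labels: one class label per observation (e.g. subject ID).
    """
    n = len(observations)
    hits, total = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i == j or labels[i] != labels[j]:
                continue  # only same-label pairs define a within-class distance
            d_within = math.dist(observations[i], observations[j])
            for k in range(n):
                if labels[k] == labels[i]:
                    continue  # compare only against other classes
                d_between = math.dist(observations[i], observations[k])
                if d_within < d_between:
                    hits += 1.0
                elif d_within == d_between:
                    hits += 0.5  # assumption: ties count as half
                total += 1
    if total == 0:
        raise ValueError("need at least two classes and one repeated label")
    return hits / total


# Toy data: two well-separated classes, then two classes that overlap exactly.
separated = discriminability([(0, 0), (0, 1), (10, 10), (10, 11)], ["a", "a", "b", "b"])
overlapping = discriminability([(0.0,), (1.0,), (0.0,), (1.0,)], ["a", "a", "b", "b"])
```

In the table's terms, the labels would be subject, session, or subsample identifiers, and a score near the Target column indicates the dataset behaves as intended for that comparison.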
Fig 2. Distribution and stability assessment of multivariate graph statistics.
(A, B) The cumulative distribution functions of multivariate statistics across all subjects and perturbation settings. There was no significant difference between the distributions in A and B. (C, D) The number of significant digits in the first five moments of each statistic across perturbations. The dashed red line refers to the maximum possible number of significant digits.
Fig 3. Variability in BMI classification across the sampling of an MCA-perturbed dataset.
The dashed red lines indicate random-chance performance, and the orange dots show the performance using the reference executions.