David M Cash, Chris Frost, Leonardo O Iheme, Devrim Ünay, Melek Kandemir, Jurgen Fripp, Olivier Salvado, Pierrick Bourgeat, Martin Reuter, Bruce Fischl, Marco Lorenzi, Giovanni B Frisoni, Xavier Pennec, Ronald K Pierson, Jeffrey L Gunter, Matthew L Senjem, Clifford R Jack, Nicolas Guizard, Vladimir S Fonov, D Louis Collins, Marc Modat, M Jorge Cardoso, Kelvin K Leung, Hongzhi Wang, Sandhitsu R Das, Paul A Yushkevich, Ian B Malone, Nick C Fox, Jonathan M Schott, Sebastien Ourselin.
Abstract
Structural MRI is widely used for investigating brain atrophy in many neurodegenerative disorders, with several research groups developing and publishing techniques to provide quantitative assessments of this longitudinal change. Often techniques are compared through computation of required sample size estimates for future clinical trials. However, interpretation of such comparisons is rendered complex because, despite using the same publicly available cohorts, the various techniques have been assessed with different data exclusions and different statistical analysis models. We created the MIRIAD atrophy challenge in order to test various capabilities of atrophy measurement techniques. The data consisted of 69 subjects (46 Alzheimer's disease, 23 control) who were scanned multiple (up to twelve) times at nine visits over a follow-up period of one to two years, resulting in 708 total image sets. Nine participating groups from six countries completed the challenge by providing volumetric measurements of key structures (whole brain, lateral ventricle, left and right hippocampi) for each dataset and atrophy measurements of these structures for each time point pair (both forward and backward) of a given subject. From these results, we formally compared techniques using exactly the same dataset. First, we assessed the repeatability of each technique using rates obtained from short intervals where no measurable atrophy is expected. For those techniques that provided direct measures of atrophy between pairs of images, we also assessed symmetry and transitivity. Then, we performed a statistical analysis in a consistent manner using linear mixed effect models. The models, one for repeated measures of volume made at multiple time-points and a second for repeated "direct" measures of change in brain volume, appropriately allowed for the correlation between measures made on the same subject and were shown to fit the data well.
From these models, we obtained estimates of the distribution of atrophy rates in the Alzheimer's disease (AD) and control groups and of the sample sizes required to detect a 25% treatment effect, relative to healthy ageing, at a 5% significance level with 80% power over follow-up periods of 6, 12, and 24 months. Uncertainty in these estimates was quantified, and head-to-head comparisons between techniques were carried out, using the bootstrap. The lateral ventricles provided the most stable measurements, followed by the brain. The hippocampi had much more variability across participants, likely because of differences in segmentation protocol and less distinct boundaries. Most methods showed no indication of bias based on the short-term interval results, and direct measures provided good consistency in terms of symmetry and transitivity. The resulting annualized rates of change derived from the model ranged from -1.4% to -2.2% (AD) and -0.35% to -0.67% (control) for whole brain, from 4.6% to 10.2% (AD) and 1.2% to 3.4% (control) for ventricles, and from -1.5% to -7.0% (AD) and -0.4% to -1.4% (control) for hippocampi. There were large and statistically significant differences in the sample size requirements between many of the techniques. The lowest sample sizes for each of these structures, for a trial with a 12-month follow-up period, were 242 (95% CI: 154 to 422) for whole brain, 168 (95% CI: 112 to 282) for ventricles, 190 (95% CI: 146 to 268) for left hippocampi, and 158 (95% CI: 116 to 228) for right hippocampi. This analysis represents one of the most extensive statistical comparisons of a large number of different atrophy measurement techniques from around the globe. The challenge data will remain online and publicly available so that other groups can assess their methods.
Year: 2015 PMID: 26275383 PMCID: PMC4634338 DOI: 10.1016/j.neuroimage.2015.07.087
Source DB: PubMed Journal: Neuroimage ISSN: 1053-8119 Impact factor: 6.556
Demographics of MIRIAD subjects.
| Group | n | Age | Gender | Baseline MMSE |
|---|---|---|---|---|
| Control | 23 | 69.7 ± 7.2 | 52%M/48%F | 29.4 ± 0.8 |
| AD | 46 | 69.4 ± 7.1 | 41%M/59%F | 19.2 ± 4 |
Summary of submissions in the MIRIAD atrophy challenge.
| Research centre | Submissions | Bias correction | Inter-subject registration | Standard/groupwise space | ROI method | Longitudinal registration | Image change measure |
|---|---|---|---|---|---|---|---|
| Bahçeşehir University | BAUMIP | SPM5 | SPM5 | SPM5 | SPM5, ALVIN | N/A | N/A |
| Brain image analysis | IOWA | N4 | BRAINS | In-house template | Tissue seg, ANN | N/A | N/A |
| CSIRO | CSIRO | Tissue seg | NiftyReg | Within-subject | Tissue seg, multi-atlas | N/A | N/A |
| Harvard MGH | FS_ORIG, FS_BETA | N3 | Robust inverse consistent | Within-subject | Atlas | N/A | N/A |
| INRIA[a] | INRIA | N3 | Demons-LCC | ADNI 200 HC | Loose regions | Demons-LCC | Regional flux analysis |
| Mayo Clinic | MAYO, MAYO_BSI[c], MAYO_TBM | N3/SPM5 | NiftyReg | ADNI 200 HC + 200 AD | SPM5, Seg Prop | NiftyReg 9DOF (BSI), SyN (TBM) | BSI |
| Montreal Neurological Institute | MNI | ANIMAL | ICBM152 | Within-subject template | Patch-based | N/A | N/A |
| University College London | UCL, UCL_BSI | N3 | NiftyReg | Challenge data and template library | Multi-atlas Seg Prop | NiftyReg | DBC, Symmetric BSI |
| University of Pennsylvania[b] | UPENN, UPENN_DBM | | FLIRT | Within-subject | Multi-atlas seg prop[d] | SyN | Mesh-based (half-way space) |
[a] INRIA provided submissions only for the lateral ventricles and hippocampi; [b] the University of Pennsylvania provided submissions only for the hippocampi; [c] the MAYO_BSI submission included only whole brain and lateral ventricle atrophy; [d] the template for multi-atlas segmentation propagation in the UPENN technique consisted of 30 randomly selected ADNI.
Fig. 1. Baseline volumes of each method for (a) whole brain, (b) lateral ventricles, (c) left hippocampus and (d) right hippocampus for all groups. Red squares indicate the AD patient group and blue circles indicate the control group for each technique.
Fig. 2. Back-to-back and two-week repeatability measures for all four regions. All measures are provided in terms of % difference from baseline. Red squares = AD, blue circles = controls. Diamond markers represent truncation of the confidence interval if out of range.
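As a rough illustration of the "% difference from baseline" scale used for the repeatability measures, the following minimal sketch computes the percentage change between two volume measurements. The function name and the example volumes are hypothetical and are not taken from the challenge data; each participating technique may have computed its repeatability figures differently.

```python
def percent_difference(baseline_volume: float, repeat_volume: float) -> float:
    """Change of the repeat measurement relative to baseline, in percent."""
    if baseline_volume <= 0:
        raise ValueError("baseline volume must be positive")
    return 100.0 * (repeat_volume - baseline_volume) / baseline_volume


# Hypothetical hippocampal volumes (mm^3) from two scans acquired back to back.
# With no true atrophy expected, a repeatable technique should give a value near 0.
print(percent_difference(3000.0, 2985.0))  # -0.5
```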
Median (95% CI) symmetry differences by structure, group, and technique. These differences are between the forward and backward atrophy, divided by the average measures of atrophy. The UCL and INRIA measures are designed to be symmetric: thus there are no differences and they were excluded.
| Technique | Brain (HC) | Brain (AD) | Ventricle (HC) | Ventricle (AD) | Left hippocampus (HC) | Left hippocampus (AD) | Right hippocampus (HC) | Right hippocampus (AD) |
|---|---|---|---|---|---|---|---|---|
| Mayo_BSI | 0.0% (0.0, 0.0) | 0.0% (−0.2, 0.0) | 0.0% (0.0, 0.0) | 0.0% (−0.1, 0.0) | N/A | N/A | N/A | N/A |
| Mayo_TBM | −0.9% (−3.7, 1.4) | −1.4% (−2.3, −0.5) | 0.0% (−1.1, 2.7) | 0.0% (−0.4, 0.4) | −6.0% (−11.1, 1.7) | −0.3% (−2.4, 1.6) | 2.3% (−4.7, 5.2) | −1.2% (−3.5, 1.8) |
| UPenn_DBM | N/A | N/A | N/A | N/A | 9.6% (−10.4, 54.4) | 3.3% (−1.1, 8.0) | −13.1% (−56.5, 49.2) | −8.5% (−19.4, 9.9) |
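The symmetry difference described in the caption can be sketched as follows. This is only one plausible reading: it assumes the backward measurement has already been re-expressed in the forward direction (so that a perfectly symmetric technique returns identical values), and the sign convention (forward minus backward) is an assumption. The function name and the example values are hypothetical.

```python
from statistics import median


def symmetry_difference(forward: float, backward: float) -> float:
    """Forward/backward disagreement relative to their average, in percent.

    Assumption: `backward` is the follow-up-to-baseline measurement already
    converted to the forward direction, so symmetric methods give
    forward == backward and a difference of zero.
    """
    average = (forward + backward) / 2.0
    if average == 0:
        raise ValueError("average atrophy is zero; relative difference undefined")
    return 100.0 * (forward - backward) / average


# Hypothetical per-subject (forward, backward) atrophy measures, in percent.
pairs = [(-2.0, -2.1), (-1.5, -1.5), (-3.0, -2.9)]
differences = [symmetry_difference(f, b) for f, b in pairs]
print(median(differences))  # summary statistic of the kind reported in the table
```

Because the difference is divided by the average atrophy, small average values (as in controls) inflate the relative difference, which may partly explain the wider intervals in the HC columns.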
Median (95% CI) transitivity differences by structure, group, and technique. The transitivity difference is the difference between two 12-month atrophy measures (one obtained by summing the baseline-to-6-month and 6-month-to-12-month measures, the other measured directly from baseline to 12 months), divided by the average of these two measures.
| Technique | Brain (HC) | Brain (AD) | Ventricle (HC) | Ventricle (AD) | Left hippocampus (HC) | Left hippocampus (AD) | Right hippocampus (HC) | Right hippocampus (AD) |
|---|---|---|---|---|---|---|---|---|
| INRIA | N/A | N/A | −1.7% (−3.4, −0.3) | −0.8% (−1.4, 0.1) | 1.7% (−3.1, 8.8) | 0.0% (−1.0, 2.3) | −0.1% (−4.2, 5.7) | −1.8% (−2.9, −0.6) |
| Mayo_BSI | 0.0% (−0.3, 1.5) | 0.0% (0.0, 0.0) | −0.2% (−0.6, 0.6) | 0.2% (0.1, 0.4) | N/A | N/A | N/A | N/A |
| Mayo_TBM | 1.4% (−2.3, 5.2) | −2.2% (−3.6, −0.8) | 0.9% (−1.7, 2.0) | −3.5% (−3.8, −2.2) | −3.5% (−33.8, 23.6) | −1.5% (−5.4, 10.2) | 19.5% (−3.8, 50.1) | −3.2% (−11.5, 12.5) |
| UCL_BSI | 0.5% (−0.1, 1.4) | 0.2% (−0.2, 0.4) | 0.1% (−0.2, 0.2) | 0.0% (0.0, 0.2) | 14.9% (−8.5, 35.4) | 0.2% (−0.8, 5.0) | 0.8% (−8.3, 17.0) | −1.3% (−3.7, 6.1) |
| UPenn_DBM | N/A | N/A | N/A | N/A | −4.1% (−64.0, 50.6) | −2.8% (−11.2, 2.9) | −1.6% (−42.0, 64.2) | −9.8% (−25.2, 4.9) |
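A minimal sketch of the transitivity difference, following the caption's definition, is given below. The sign convention (summed minus direct) is an assumption, as are the function name and the example values; the two half-interval measures are simply added, as the caption describes, without compounding.

```python
def transitivity_difference(
    baseline_to_6m: float, six_to_12m: float, baseline_to_12m: float
) -> float:
    """Relative gap between summed and direct 12-month atrophy, in percent.

    Assumption: the difference is taken as (summed - direct); the caption
    does not state the sign convention.
    """
    summed = baseline_to_6m + six_to_12m
    average = (summed + baseline_to_12m) / 2.0
    if average == 0:
        raise ValueError("average atrophy is zero; relative difference undefined")
    return 100.0 * (summed - baseline_to_12m) / average


# Hypothetical atrophy measures (percent change) for one subject.
print(transitivity_difference(-1.0, -1.1, -2.0))
# A perfectly transitive measure gives zero:
print(transitivity_difference(-1.0, -1.0, -2.0))
```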
Fig. 3. Mean atrophy rates with 95% confidence intervals estimated (left) from a single pair of scans 12 months apart and (right) from all available data using statistical linear mixed models.
Fig. 4. An illustration of the extent to which the statistical models fit the data, using one technique for each structure and selected techniques as exemplars. For each of the sixty-three scan pairs (sixty-six less the three same-day scan pairs) the empirical mean rates are contrasted with those predicted by the linear mixed model.
Between-subject ($\sigma_b^2$) and within-subject ($\sigma_w^2$) components of variance over one year (95% CI) in the AD subjects by structure and technique. These can be used to compute the total variance of rates of change over any follow-up time of $t$ years ($\sigma_b^2 + \sigma_w^2/t^2$).
| Technique | Brain: between-subject variance (%/year)² | Brain: within-subject variance (%/year)² | Ventricle: between-subject variance (%/year)² | Ventricle: within-subject variance (%/year)² | Left hippocampus: between-subject variance (%/year)² | Left hippocampus: within-subject variance (%/year)² | Right hippocampus: between-subject variance (%/year)² | Right hippocampus: within-subject variance (%/year)² |
|---|---|---|---|---|---|---|---|---|
| BAUMIP[I] | 1.88 (1.34, 2.67) | 1.14 (0.78, 1.70) | 19.21 (11.62, 34.01) | 3.58 | 0.17 | 38.57 | 21.57 | 48.78 |
| CSIRO[I] | 0.21 (0.12, 0.35) | 0.54 (0.41, 0.71) | 14.88 (10.46, 22.45) | 2.85 | 3.27 | 9.59 | 3.03 | 7.77 |
| FS51ORIG[I] | 0.81 (0.46, 1.62) | 2.53 (1.60, 5.29) | 14.85 (10.30, 23.58) | 3.04 | 3.65 | 5.56 | 4.47 | 7.12 |
| FS52BETA[I] | – | – | 14.36 (9.93, 23.26) | 3.00 | 3.42 | 6.15 | 4.90 | 6.55 |
| INRIA[D] | – | – | 3.18 (2.28, 4.92) | 0.82 | 5.84 | 1.92 | 5.07 | 2.30 |
| IOWA[I] | 0.77 (0.46, 1.24) | 1.17 | 10.41 (6.79, 17.46) | 3.84 | 12.26 | 19.62 | 19.24 | 15.90 |
| MAYO[I] | 1.29 (0.90, 1.90) | 0.86 | 14.54 (10.12, 22.28) | 3.57 | 4.90 | 4.37 | 10.4 | 7.78 |
| MAYO_BSI[D] | 0.96 (0.53, 1.63) | 1.46 | 13.86 (9.55, 21.08) | 4.31 | – | – | – | – |
| MAYO_TBM[D] | 1.01 (0.67, 1.58) | 0.58 | 12.36 (8.51, 19.51) | 2.96 | 2.19 | 1.92 | 2.32 | 1.92 |
| MNI[I] | 0.41 (0.21, 0.66) | 0.65 | 14.43 (9.96, 23.54) | 3.12 | 1.94 | 5.40 | 2.67 | 5.54 |
| UCL[I] | 0.40 (0.18, 0.76) | 1.08 | 17.17 (12.03, 26.46) | 3.94 | 4.24 | 12.00 | 4.60 | 10.05 |
| UCL_BSI[D] | 0.47 (0.32, 0.72) | 0.19 | 15.52 (10.91, 23.61) | 2.80 | 10.00 | 6.93 | 8.50 | 7.51 |
| UPENN[I] | – | – | – | – | 1.65 | 2.53 | 1.97 | 1.98 |
| UPENN_DBM[D] | – | – | – | – | 1.64 | 2.61 | 1.83 | 2.10 |
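The variance components above combine into a total variance for any follow-up length using the formula in the table caption (between-subject variance plus within-subject variance divided by the squared follow-up time in years). The sketch below applies it to the UCL_BSI whole-brain point estimates from the table (0.47 and 0.19); the function name is hypothetical, and the confidence intervals are ignored.

```python
def total_rate_variance(between: float, within: float, t_years: float) -> float:
    """Total variance of the rate of change, in (%/year)^2, for follow-up t_years.

    `between` and `within` are the one-year variance components from the table.
    """
    if t_years <= 0:
        raise ValueError("follow-up time must be positive")
    return between + within / t_years**2


# UCL_BSI whole-brain point estimates from the table above.
between, within = 0.47, 0.19
for t in (0.5, 1.0, 2.0):
    print(t, round(total_rate_variance(between, within, t), 4))
```

The within-subject term shrinks with the square of the follow-up time, so longer intervals are dominated by the between-subject variance, which no measurement technique can reduce.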
Required total sample sizes (both groups combined) for clinical trials assuming that a putative treatment can reduce the excess atrophy rate (over and above that seen in healthy controls) by 25% without altering variability. Calculations assume that the trial will have 80% statistical power to detect a treatment effect using a conventional two-sided significance level of 5%. Results shown in bold and underlined purple are the best for each structure and time interval. Those underlined and shown in green are not statistically significantly worse than best.
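To make the assumptions in this caption concrete, the sketch below uses the standard normal-approximation formula for comparing two means, with equal arms and a treatment that removes 25% of the excess atrophy rate without changing variability. It is not necessarily the exact calculation the authors performed from their mixed models. The function name and the mean rates in the example are hypothetical (chosen within the whole-brain ranges quoted in the abstract); the variance components are the UCL_BSI whole-brain point estimates.

```python
import math
from statistics import NormalDist


def total_sample_size(
    mean_ad: float,
    mean_control: float,
    between_var: float,
    within_var: float,
    t_years: float,
    effect: float = 0.25,
    alpha: float = 0.05,
    power: float = 0.80,
) -> int:
    """Total subjects (both arms) under a two-sided normal approximation."""
    # Detectable difference: a fraction `effect` of the excess rate over controls.
    delta = effect * (mean_ad - mean_control)
    if delta == 0:
        raise ValueError("AD and control rates are equal; no effect to detect")
    # Total variance of the rate of change over t_years of follow-up.
    variance = between_var + within_var / t_years**2
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    per_arm = 2.0 * (z_alpha + z_power) ** 2 * variance / delta**2
    return 2 * math.ceil(per_arm)


# Hypothetical mean rates (%/year): AD -2.0, control -0.5.
for t in (0.5, 1.0, 2.0):
    print(t, total_sample_size(-2.0, -0.5, 0.47, 0.19, t))
```

Under these assumptions the required sample size falls as follow-up lengthens, because only the within-subject variance term depends on the follow-up time.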
Fig. 5. Sample region delineation from MIRIAD atrophy challenge subject 220A. Each column represents a submission and the rows show a different region outlined in green (from top to bottom): whole brain, lateral ventricles, and hippocampi. In one case, INRIA, only a probabilistic mask was used, and this is shown with colour overlays.
Fig. 6. Transitivity plots comparing 12-month atrophy measures in the hippocampi. Blue points indicate controls and red points AD patients. Different point glyphs are used to distinguish methods. A dashed line at y = x is present to indicate where a perfectly transitive measure would be located.