Literature DB >> 28643354

Diffusion MRI microstructure models with in vivo human brain Connectome data: results from a multi-group comparison.

Uran Ferizi^1,2,3, Benoit Scherrer⁴, Torben Schneider^3,5, Mohammad Alipoor⁶, Odin Eufracio⁷, Rutger H J Fick⁸, Rachid Deriche⁸, Markus Nilsson⁹, Ana K Loya-Olivas⁷, Mariano Rivera⁷, Dirk H J Poot¹⁰, Alonso Ramirez-Manzanares⁷, Jose L Marroquin⁷, Ariel Rokem^11,12, Christian Pötter¹², Robert F Dougherty¹², Ken Sakaie¹³, Claudia Wheeler-Kingshott³, Simon K Warfield⁴, Thomas Witzel¹⁴, Lawrence L Wald¹⁴, José G Raya², Daniel C Alexander¹.

Abstract

A large number of mathematical models have been proposed to describe the measured signal in diffusion-weighted (DW) magnetic resonance imaging (MRI). However, model comparison to date focuses only on specific subclasses, e.g. compartment models or signal models, and little or no information is available in the literature on how performance varies among the different types of models. To address this deficiency, we organized the 'White Matter Modeling Challenge' during the International Symposium on Biomedical Imaging (ISBI) 2015 conference. This competition aimed to compare a range of different kinds of models in their ability to explain a large range of measurable in vivo DW human brain data. Specifically, we assessed the ability of models to predict the DW signal accurately for new diffusion gradients and b values. We did not evaluate the accuracy of estimated model parameters, as a ground truth is hard to obtain. We used the Connectome scanner at the Massachusetts General Hospital, using gradient strengths of up to 300 mT/m and a broad set of diffusion times. We focused on assessing the DW signal prediction in two regions: the genu in the corpus callosum, where the fibres are relatively straight and parallel, and the fornix, where the configuration of fibres is more complex. The challenge participants had access to three-quarters of the dataset and their models were ranked on their ability to predict the remaining unseen quarter of the data. The challenge provided a unique opportunity for a quantitative comparison of diverse methods from multiple groups worldwide. The comparison of the challenge entries reveals interesting trends that could potentially influence the next generation of diffusion-based quantitative MRI techniques. The first is that signal models do not necessarily outperform tissue models; in fact, of those tested, tissue models rank highest on average. The second is that assuming a non-Gaussian (rather than purely Gaussian) noise model provides little improvement in prediction of unseen data, although it is possible that this may still have a beneficial effect on estimated parameter values. The third is that preprocessing the training data, here by omitting signal outliers, and using signal-predicting strategies, such as bootstrapping or cross-validation, could benefit the model fitting. The analysis in this study provides a benchmark for other models and the data remain available to build up a more complete comparison in the future.

Entities: Chemical

Keywords: Connectome; brain microstructure; diffusion MRI; fornix; genu; model selection

Mesh：

Year: 2017 PMID： 28643354 PMCID： PMC5563694 DOI： 10.1002/nbm.3734

Source DB: PubMed Journal: NMR Biomed ISSN： 0952-3480 Impact factor: 4.044

computerized tomography cross‐validation diffusion basis function diffusion function diffusion tensor diffusion tensor imaging diffusion‐weighted elastic net International Symposium on Biomedical Imaging Linear Acceleration of Sparse and Adaptive Diffusion Dictionary least‐squares magnetic resonance imaging principal diffusion directions restriction spectrum imaging region of interest sparse fascicle model signal‐to‐noise ratio sum of squared errors echo time white matter

INTRODUCTION

Diffusion‐weighted (DW) magnetic resonance imaging (MRI) can provide unique insights into the microstructure of living tissue and is increasingly used to study the microanatomy and development of normal functioning tissue as well as its pathology. Mathematical models for analysis and interpretation have been crucial for the development and translation of DW‐MRI. Even though diffusion tensor imaging (DTI),1 which is based on a simple Gaussian model of the DW‐MRI signal, has shown promise in clinical applications,2 e.g. Alzheimer's disease,3 multiple sclerosis4 or brain tumors,5 a much wider variety of DW‐MRI models has been proposed to extract more information from the DW signal. Models generally fall between two extremes: ‘models of the tissue’ and ‘models of the signal’. Models of the tissue6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 describe the underlying tissue microstructure in each voxel explicitly with a multi‐compartment approach.18, 19, 20 Models of the signal focus on describing the DW signal attenuation without describing the underlying tissue composition that gives rise to the signal explicitly.21, 22, 23, 24, 25, 26, 27, 28, 29 Other approaches fall between these two classes and include some features of the tissue, such as the distribution of fibre orientations, but often describe the signal from individual fibres without modelling the fibre composition explicitly.30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 Despite this explosion of DW‐MRI models, a broad comparison on a common dataset and within a common evaluation framework is lacking, so little is understood about which models are more plausible representations or explanations of the signal. Panagiotaki et al.18 established a taxonomy of diffusion compartment models and compared 47 of them using data from the fixed corpus callosum of a rat acquired on a pre‐clinical system. Later, Ferizi et al.39 performed a similar experiment using data from a live human subject, while Ferizi et al.41, 42 explored a different class of models, which aim to capture fibre dispersion. Rokem et al.43 compared two classes of models using cross‐validation and test–retest accuracy. All these studies18, 43, 44 aim to evaluate variations with specific classes of models with all other variables of the parameter estimation pipeline (i.e. noise model, fitting routine, etc.) fixed. While this provides fundamental insight into which compartments are important in compartment models, questions remain about the broader landscape of models; in particular, which classes of models explain the signal best and how strongly performance depends on the choice of parameter‐estimation procedure. Publicly organized challenges provide a unique opportunity to bring a research community together to gain a quantitative and unbiased comparison of a diverse set of methods applicable to a particular data‐processing task. Such publicly organized challenges have helped to establish a common ground for the evaluation of competing methods in a variety of imaging‐related tasks, e.g. in brain MR image registration45 and segmentation.46 In DW‐MRI, public challenges have focused on recovering synthetic intra‐voxel fibre configurations47 or evaluating tractography techniques48, 50 and have been very successful at driving research and translation forward. Another interesting comparison of reconstruction methods using DW‐MRI data was based on the signal acquired from a physical phantom.49 Here we report on such a community‐wide challenge to model the variation of DW‐MRI signals at the voxel level in the in vivo human brain. Modelling the diffusion signal is a key step in realizing practical and reliable quantitative imaging techniques based on diffusion MRI. The challenge in the area is to extract the salient features from the diffusion signal and relate them to the principal features of the underlying tissue (e.g. in the case of brain white matter (WM) the fibre orientation, axonal packing and axonal size). Three distinct questions arise. Given the richest possible dataset that samples the space of achievable measurements as widely as possible, which mathematical model can capture best the intrinsic variation of the acquired signal? Which tissue features can be derived from the model? What subset of those features can we estimate from limited acquisition time on a standard clinical scanner and what dataset best supports such estimates? The intuition gained from (i) is generalizable over a wide range of applications, while (ii) and (iii) are highly dependent on the MRI study design and the available hardware. Therefore, our challenge focuses on question (i), as an understanding of (i) is necessary to inform (ii) and (iii). To that end, we acquire the richest possible dataset using the most powerful hardware available and the most motivated subject available (UF). Specifically, we use the Connectome scanner,51 which is unique among human scanners in having 300 mT/m gradients, rather than 40 mT/m as is typical of state‐of‐the‐art human scanners. Preclinical work by Dyrby et al.13 highlights the benefits of such strong gradients and the first results from the Connectome scanner42, 52, 53, 54 are now starting to verify those findings. This kind of model comparison, based on prediction error, is a common and crucial part of the development of any statistical model‐based estimation applications. Burnham and Anderson55 explain how and why such comparisons should be performed to reject models that are theoretically plausible but not supported by the data. To that end, we used a uniquely rich dataset acquired on the Connectome system42 composed of around 5000 points in q space with, for each shell, a unique combination of gradient strength, diffusion time, pulse width and echo time. This offers the opportunity for the comparison of the many different types of models within a common framework, over a very wide range of measurement space. Using this rich dataset, we organized the White Matter Modeling challenge, held during the 2015 International Symposium on Biomedical Imaging (ISBI) in New York. The goal of the challenge was to evaluate and compare the models in two different tissue configurations that are common in the brain: (1) a WM region of interest where fibres are relatively straight and parallel, specifically the genu of the corpus callosum; and (2) a region in which the fibre configuration is more complex, specifically the fornix. Challenge participants had access to three‐quarters of each whole dataset; the participating models were evaluated on how well they predicted the remaining ‘unseen’ part of the data. As announced before the challenge, the final ranking was based exclusively on the performance on the genu data. In this article, however, we include results from both the genu and the fornix. The article is organized as follows. We first describe in section 2 the experimental protocol, data post‐processing and preparation of the training and testing data for the challenge. We then present the methods for ranking the models and tabulate the various models involved in the competition succinctly. We report the challenge results in section 3 and discuss these results in section 4; a more detailed description of the models follows in the Appendix.

MATERIAL AND METHODS

The complete experiment protocol

One healthy volunteer was scanned over two non‐stop 4 h sessions. The imaged volume comprised twenty 4 mm thick whole‐brain sagittal slices covering the corpus callosum left–right. The image size was 110×110 and the in‐plane resolution 2×2 mm2. 45 unique and evenly distributed diffusion directions (taken from http://www.camino.org.uk) were acquired for each shell, with both positive and negative polarities; these directions were the same in each shell. We also included 10 interleaved b=0 measurements, leading to a total of 100 measurements per shell. Each shell had a unique combination of Δ={22,40,60,80,100,120} ms, δ={3,8} ms and |G|={60,100,200,300} mT/m (see Table 1). The measurements were randomized within each shell, whereas the gradient strengths were interleaved. We inspected the images visually and did not observe any obvious shifts from gradient heating. The minimum possible echo time (TE) for each gradient duration and diffusion time combination was chosen to enhance signal‐to‐noise ratio (SNR) and potential estimation of compartment‐specific relaxation constants. The SNR of b=0 images was 35 at TE = 49 ms and 6 at TE = 152 ms. The SNR was computed by assessing the signal mean and noise variance across the selected WM voxels on multiple b=0 images. In both cases these estimates matched reasonably well. More details about the acquisition protocol can be found in Ferizi et al.42

Table 1

The scanning protocol used, acquired in ∼8 hours over two non‐stop sessions. The protocol has 48 shells, each with 45 unique gradient directions (‘blip‐up‐blip‐down’)

Acquisition Protocol
	δ=3 m s					δ=8 m s
	Δ	TE	\|G\|	b		Δ	TE	\|G\|	b
Nr	(ms)	(ms)	(mT/m)	(s/mm ² )	Nr	(ms)	(ms)	(mT/m)	(s/mm ² )
1	22	49	61	50	25	22	58	58	300
2	22	49	86	100	26	22	58	95	800
3	22	49	192	500	27	22	58	190	3,200
4	22	49	285	1,100	28	22	58	275	6,700
5	40	67	63	100	29	40	72	59	600
6	40	67	100	250	30	40	72	100	1,700
7	40	67	200	1,000	31	40	72	200	6,850
8	40	67	289	2,100	32	40	72	292	14,550
9	60	87	63	150	33	60	92	34	300
10	60	87	103	400	34	60	92	100	2,650
11	60	87	199	1,500	35	60	92	200	10,500
12	60	87	290	3,200	36	60	92	292	22,350
13	80	107	63	200	37	80	112	61	1,300
14	80	107	99	500	38	80	112	100	3,550
15	80	107	201	2,050	39	80	112	200	14,150
16	80	107	291	4,300	40	80	112	292	30,200
17	100	127	63	250	41	100	132	60	1,600
18	100	127	101	650	42	100	132	100	4,450
19	100	127	200	2,550	43	100	132	200	17,850
20	100	127	291	5,400	44	100	132	292	38,050
21	120	147	63	300	45	120	152	60	1,950
22	120	147	99	750	46	120	152	100	5,350
23	120	147	199	3,050	47	120	152	200	21,500
24	120	147	291	6,500	48	120	152	292	45,900

Note: We provide signal for the parts of protocol marked in black. In red is the protocol for which the signal needs to be predicted.

The scanning protocol used, acquired in ∼8 hours over two non‐stop sessions. The protocol has 48 shells, each with 45 unique gradient directions (‘blip‐up‐blip‐down’) Note: We provide signal for the parts of protocol marked in black. In red is the protocol for which the signal needs to be predicted.

Post‐processing

All post‐processing was performed using Software Library (FSL).56 The DW images were corrected for eddy current distortions separately for each combination of δ and Δ using FSL's E d d y module (www.fmrib.ox.ac.uk/fsl/eddy) with its default settings. The images were then co‐registered using FSL's F n i r t package. As the 48 shells were acquired across a wide range of TEs, over two days, we chose to proceed in two steps. First, within each quarter of the dataset (different day, different δ) we registered all the b=0 images together. We then applied these transformations to their intermediary DW images, using a trilinear resampling interpolation. The second stage involved co‐registering the four different quarters. To help the co‐registration, especially between the two days images that required some through‐plane adjustment as well, we omitted areas of considerable eddy‐current distortions by reducing the number of slices from 20 to 5 (i.e. leaving two images either side of the mid‐sagittal plane) and reducing the in‐plane image size to 75×80.

Training and testing data

The data for this work originated from two regions of interest (ROIs), each containing 6 voxels (see Figure 1). The first ROI was selected in the middle of the genu in the corpus callosum, where the fibres are mostly straight and coherent. The second ROI's fibre configuration is more complex: it lies in the body of fornix, where two bundles of fibres bend and bifurcate.

Figure 1

We only consider two ROIs, each containing six voxels from the genu in the corpus callosum, where the fibres are approximately straight and parallel, and from the fornix, where the configuration of fibres is more complex The dataset was split into two parts: the training dataset and the testing dataset. The training dataset was fully available for the challenge participants. The testing dataset was retained by one of the organizers (UF). The DW signal of the training dataset (36 shells, with acquisition parameters shown in black in Table 1) was provided together with the gradient scheme on the challenge website (http://cmic.cs.ucl.ac.uk/wmmchallenge/). This data was used by the participants to estimate their DW‐MRI model parameters. The signal attenuation in the testing dataset (12 shells, with acquisition parameters shown in red in Table 1) was kept unseen. It contained one shell, chosen at random, from each TE‐specific set of four shells (i.e of the same combination of δ and Δ). The challenge participants were then asked to predict the signal for the corresponding gradient scheme. They were free to use as much or as little of the training data provided as they wished to predict the signal of the test dataset for the six voxels in each ROI. Figure 2 shows the DW signal attenuation for each shell in the genu dataset, with stars in the legend indicating which shells were left out for testing. In this plot, a small number of data appear as ‘outliers’ (two such data are shown with arrows in the bottom‐left subplot of Figure 2). Specifically, we counted about 10 of them among more than 4812 measurements, most of them being in the b=300 s/mm2 shell. Since these outliers appear to be specific to the b=300 s/mm2 shell and are not in other shells with similar b value, we attribute them to a momentary twitching of the subject rather than more systematic effects, such as perfusion.

Figure 2

Diffusion‐weighted signal from the genu ROI, averaged over the six voxels. Across each column and row, the signal pertains to one of the gradient strengths or pulse times δ used; in each subplot, the six shells shown in different colours are Δ‐specific, increasing in value (22, 40, 60, 80, 100, 120 ms) from top to bottom. Inside the legend, the b value is in s/mm2 units; here, the HARDI shells kept for testing are those marked with a star; the remaining shells comprise the training data. On the x‐axis is the cosine of the angle between the applied diffusion gradient vector G and the fibre direction n. Some models in this study omit data outliers; two such data points are shown in the bottom‐left subplot with vertical arrows — obviously each model has its own criteria for determining the outliers

Figure 3

Diffusion‐weighted signal from the fornix ROI, averaged over the six voxels. The legend's b value is in s/mm2 units. Testing shells are marked with a star. On the x‐axis is the cosine of the angle between the applied diffusion gradient vector G and the fibre direction n

Model ranking

Models were evaluated and ranked based on their ability to predict the unseen DW signal accurately. Specifically, the metric used was the sum of square differences between the hidden signal and the predicted signal, corrected for Rician noise:57 where N is the number of measurements, is the ith measured signal, S its prediction from the model and σ the noise standard deviation.

Competing models

Here we give a short summary of the competing models. Additionally, Table 2 provides a summary of their key characteristics. More details are included in the Appendix.

Table 2

Summary of the various diffusion models evaluated. Tissue models are models that include an explicit description of the underlying tissue microstructure with a multi‐compartment approach. In contrast, signal models focus on describing the DW signal attenuation without explicitly describing the underlying tissue and instead correspond to a ‘signal processing’ approach

	Type of model	Nb of free param. (genu/fornix)	Models effect of δ and Δ	Noise assumption	Optimization algorithm	Outliers strategy	Special signal prediction strategy
R–Manzanares	Tissue	N/A	Yes	Gaussian	weighted‐LS	Yes	CV
					bootstrapping
Nilsson	Tissue	< 12/12	Yes	Gaussian	LM	Yes	CV
Scherrer	Tissue	10/16	No	Gaussian	Bobyqa	Yes	No
Ferizi_1	Tissue	< 12/12	Yes	approx.‐Rician	LM	No	No
Ferizi_2	Tissue	< 10/10	Yes	approx.‐Rician	LM	No	No
Alipoor	Signal	17/17	No	Gaussian	weighted‐LS	Yes	No
Sakaie	Signal	N/A	No	Gaussian	nonlinear‐LS	Yes	No
Rokem	Tissue	∼20	No	Gaussian	Elastic net	No	CV
				+ Noise floor
Eufracio	Tissue	7/7	No	Gaussian	bounded‐LS	No	No
					Lasso, Ridge
Loya‐Olivas_1	Tissue	11	No	Gaussian	bounded‐LS	No	No
					& Lasso
Loya‐Olivas_2	Tissue	11	No	Gaussian	bounded‐LS	No	No
Poot	Signal	103	No	Rician	LM‐like	No	No
Fick	Signal	475	Yes	Gaussian	Laplacian‐ reg‐LS	No	partial‐CV
Rivera	Signal	23	Yes	Gaussian	Weighted Lasso	Yes	CV

Abbreviations: LS=least‐squares, LM=Levenberg–Marquardt, CV=cross‐validation, reg=regularized

Ramirez‐Manzanares: a dictionary‐based technique that accounts for multiple fibre bundles and models the distribution of tissue properties (axon radius, parallel diffusivity) and the orientation dispersion of fibres. Nilsson: a multi‐compartment model that models isotropic, hindered and restricted diffusion and accounts for varying (T 1, T 2) relaxation times for each compartment.58 Scherrer a multi‐compartment model in which each compartment is modelled by a statistical distribution of 3‐D tensors.16 Ferizi two three‐compartment models that account for varying T 2 relaxation times for each compartment. As regards the intracellular compartment, Ferizi1 models the orientation dispersion by using dispersed sticks as one compartment; Ferizi2 uses a single radius cylinder instead.42 Poot: a three‐compartment model comprising an isotropic diffusion compartment, a tensor compartment and a model‐free compartment in which an Apparent Diffusion Coefficient (ADC) is estimated for each direction independently. T 2 relaxation times are also estimated for each compartment.59 Rokem: a combination of the sparse fascicle model43 with restriction spectrum imaging60 that describes the signal arising from a multi‐compartment model in a densely sampled spherical grid, using L1 regularization to enforce sparsity. Eufracio: an extension of the Diffusion Basis Function (DBF) model that accounts for multiple b‐value shells. Loya‐Olivas two models based on the Linear Acceleration of Sparse and Adaptive Diffusion Dictionary (LASADD) technique. Loya‐Olivas1 uses the DBF signal model, while Loya‐Olivas2 uses a three‐compartment tissue model. The optimization uses linearized signal models to speed up computation and sparseness constraints to regularize. Alipoor: a model of fourth‐order tensors, corrected for T 2‐relaxation across different shells. A robust LS fitting was applied to mitigate influence of outliers. Sakaie: a two‐compartment model of restricted and hindered diffusion with angular variation. A simple exclusion scheme based on the b=0 signal intensity was applied to remove outliers. Fick: a spatio‐temporal signal model to represent 3‐D diffusion signal simultaneously over varying diffusion time. Laplacian regularization was applied during the fitting.61 Rivera: a regularized linear regression model of diffusion encoding variables. This is intentionally built as a simplistic model to be used as a baseline for model comparison. Summary of the various diffusion models evaluated. Tissue models are models that include an explicit description of the underlying tissue microstructure with a multi‐compartment approach. In contrast, signal models focus on describing the DW signal attenuation without explicitly describing the underlying tissue and instead correspond to a ‘signal processing’ approach Abbreviations: LS=least‐squares, LM=Levenberg–Marquardt, CV=cross‐validation, reg=regularized While the challenge organizers also had competing models (Ferizi1, Ferizi2 and Scherrer), only Ferizi had access to the hidden data. The hidden data were never used to tune the results of his models.

RESULTS

Figure 4 shows the averaged prediction error in each ROI (top subplot is for the genu, bottom subplot is for the fornix) and the corresponding overall ranking of the participating models in the challenge. The first six models in the genu ranking performed similarly, each higher ranked model marginally improving on the prediction error. The prediction error clearly increased at a higher rate for the subsequent models. In the fornix dataset, the prediction error was higher than in the genu. For both datasets, the first six models were the same, albeit permuted. Most of the models performed similarly in terms of ranking in both genu and fornix cases, i.e. Nilsson (second in genu/first in fornix), Scherrer (third/second) and Ferizi_2 (fourth/fourth). Others performed significantly better in one of the cases, with Ramirez‐Manzanares (first/sixth) being the most notable.

Figure 4

Overall ranking of models by sum‐of‐squared‐errors (SSE) metric over all voxels in genu (top) and fornix (bottom) ROIs. The colors represent different ranges of b‐value shells

Overall ranking of models by sum‐of‐squared‐errors (SSE) metric over all voxels in genu (top) and fornix (bottom) ROIs. The colors represent different ranges of b‐value shells Figure 4 also details the prediction error for different ranges of b values in the unseen dataset. Models inevitably vary in their prediction capabilities; some models perform better within a given b‐value range but are penalized more in another. Across the models, as the figure shows, the ranking between models was dominated by the signal prediction accuracy for b values between 750 and 1400s/mm2; specifically, the shell that has the largest weight on this error is the b=1100 s/mm2 one. The top‐ranking models, nevertheless, were better at predicting the signal for higher b‐value images as well. The prediction performance of lower b‐value images (<750s/mm2) in the genu was less consistent across ranks. For example, the models of Rokem and Sakaie outperformed most of the higher ranking models in this low b‐value range. The fornix is a more complex region than the genu, hence the performance across the shells is less consistent. In the fornix, the prediction errors were generally larger than in the genu across all b values for all models, except Rivera's, which showed the opposite effect. The prediction errors of the b=0 images were also larger than in the genu, especially for the highly ranked models of Poot and Ferizi. The prediction errors in other b‐value shells followed the overall ranking of the models more closely. Figure 5 shows the prediction error for each voxel independently. In the genu plot, the best performing models had high consistency of low prediction errors across all individual voxels. These were followed by the models with consistent larger prediction error in all voxels. Most of the lowest ranking models not only had largest prediction errors, they also showed large variations in prediction performance. For example, while the model of Loya‐Olivas2 was competitive in voxel 5, it ranked low due to large prediction errors in voxels 4 and 6. The results in the fornix show a lower consistency of prediction errors between the voxels than in the genu. Specifically, two voxels (3 and 4) showed substantially larger prediction errors and were likely responsible for much of the overall ranking.

Figure 5

Sum‐of‐squared‐errors (SSE) per voxel for each model in genu and fornix. The size of rectangles represent the SSE value per voxel

Sum‐of‐squared‐errors (SSE) per voxel for each model in genu and fornix. The size of rectangles represent the SSE value per voxel Finally, we report in Figures 6, 7, 8 and 9 an illustration of the quality of fit of each model to four representative shells, including the b=1100s/mm2 shell mentioned above; Figures 6 and 7 concern the genu data and Figures 8 and 9 are for fornix data.

Figure 6

Figure 7

Genu signal for the second group of 14 models. Raw testing data are shown by blue stars, while red circles denote the model‐predicted data. The x‐axis is the cosine of the angle between G and n

Figure 8

Fornix signal for the group consisting of the best 7 from 14 models. We show only four (of twelve) representative shells; these are shown by blue stars, while red circles denote model‐predicted data. The best models are listed first. The x‐axis is the cosine of the angle between G and n

Figure 9

Fornix signal for the second group of 14 models. Raw testing data are shown by blue stars, while red circles denote the model‐predicted data. The x‐axis is the cosine of the angle between G and n

Genu signal for the group consisting of the best seven from 14 models. We show only four (of twelve) representative shells; these are shown by blue stars, while red circles denote the model‐predicted data. The best models are listed first. The x‐axis is the cosine of the angle between G and n Genu signal for the second group of 14 models. Raw testing data are shown by blue stars, while red circles denote the model‐predicted data. The x‐axis is the cosine of the angle between G and n Fornix signal for the group consisting of the best 7 from 14 models. We show only four (of twelve) representative shells; these are shown by blue stars, while red circles denote model‐predicted data. The best models are listed first. The x‐axis is the cosine of the angle between G and n Fornix signal for the second group of 14 models. Raw testing data are shown by blue stars, while red circles denote the model‐predicted data. The x‐axis is the cosine of the angle between G and n

DISCUSSION

The challenge set out to compare the ability of various kinds of models to predict the diffusion MR signal from WM over a very wide range of measurement parameters – exploring the boundaries of possible future quantitative diffusion MR techniques. The 14 challenge entries were a good representation of the many available models that are proposed in the literature. The acquired data aimed to cover the broadest spectrum of experimental parameters possible. The participating models use a variety of fitting routines and modelling assumptions, providing additional insight into the effects of algorithmic and modelling choices during parameter estimation. Although the set of methods tested is not sufficient to make a full comparison of each independent feature (diffusion model, noise model, fitting routine, etc.) and the number of combinations prohibits an exhaustive comparison, the results of the challenge do reveal some important trends. In contrast with earlier model comparisons,18, 43, 44 the results provide new insight into which broad classes of model explain the signal best and what features of the estimation procedure are important. This information is very timely, as recent model‐based diffusion MRI techniques, such as NODDI,15 SMT,17, 40 DIAMOND,16 DKI62 and LEMONADE,63 are starting to become widely adopted in clinical studies and trials. Despite their success, intense debate continues in the field about applicability of different models and fitting routines.64, 65 The insights from this challenge provide key pointers to the important features of the next‐generation of front‐line imaging techniques of this type. Moreover, the data and evaluation routines remain available to form the basis of an expanding ranking of models and fitting routines and a benchmark for future model development.

Main conclusions

The first insight is on the type of model used. Signal models do not necessarily outrank tissue models; indeed, using our dataset, models of the signal (Alipoor, Sakaie, Fick, Rivera) ranked on average lower than models of the tissues, despite their theoretical ability to offer more flexibility in describing the raw signal. This is quite surprising, as the current perception within the field is that, generally, we can capture the signal variation much better through a functional description of the signal (signal models) rather than via a biophysical model of the tissue (tissue models). The former generally consist of bases of arbitrary complexity, whereas the latter are generally very parsimonious models that rely on extremely crude descriptions of tissue (e.g. white matter as parallel impermeable cylinders). The results suggest that the flexibility of signal models can rapidly lead to overfitting. However, the tissue models can explain the signal relatively well even with just a few parameters (compare the quality‐of‐fit plots of the Rivera model in Figure 7 with the signal prediction of the top models in Figure 6: the higher the b value, the worse the prediction of the linear signal model). Certain underlying assumptions may cause the signal models to perform less well than expected. For example, they are often designed to work with data with a single diffusion time and do not generalize naturally to incorporate the additional dimension (although see Fick et al.61 for some steps towards generalization). Many of the tissue models, on the other hand, naturally account for finite δ, varying diffusion times and gradient strength (e.g. the Ramirez‐Manzanares, Nilsson and Ferizi models in our collection). We cannot draw any conclusion about the benefits of an adjustable number of parameters in a model, because of the limited number of models in our study that do this and because the models differ in a range of other aspects. The second insight concerns the choice of noise modelling. Despite the fact that SNR at b=0 and T E=152 ms falls to about 6, use of the Rician noise model does not appear to be a significant benefit in predicting unseen signal; here, however, we do not investigate the effect on estimated model parameters, which may still benefit from the more accurate noise model. In this challenge, most participants used non‐linear least‐squares or maximum‐likelihood optimization. Additional regularization of the objective function (Eufracio & Rivera/Lasso, Rokem/Elastic Net, Fick/Laplacian) appeared to have little benefit over non‐regularized optimization. The third observation is about removing signal outliers. Five of the eleven models preprocessed the training data by clearing out outliers, including the top two models. We tried this procedure with two good models that did not use such a procedure, Ferizi1 and Ferizi2, and observed that it did not affect the ranking, though it did improve the prediction error marginally. This is understandable, considering the relatively little weight these apparent outliers have on the total number of measurements (10 points from a 4812‐strong dataset). Additionally, specific strategies for predicting the signal, e.g. bootstrapping or cross‐validation, as used by the top two models of Ramirez‐Manzanares and Nilsson, may also help the model ranking.

Limitations and future directions

Although this challenge provides several new insights into the choice of model and fitting procedure for diffusion‐based quantitative imaging tools, it has a number of limitations that future challenges might be designed to address. One limitation of the study is that we use a very rich acquisition protocol that is not representative of common or clinical acquisition protocols. In particular, we cover a very wide range of b values and the data acquisition (protocol) we use consists of many TEs, unlike many other multi‐shell diffusion datasets that use a fixed TE. As stated in the Introduction, our intention is to sample the measurement space as widely as possible to support the most informative models possible. Varying the TE makes it possible to probe compartment‐specific T 2 (the decay of which Ferizi et al.42 finds to be monoexponential at the voxel level), an investigation that would be impossible with a single TE. However, the good performance of DIAMOND also shows that a model with fixed δ and Δ can still capture the signal variation in multi‐TE datasets and that, while the majority of the full data was ignored in each of the reconstructions, its prediction error compared favourably with other techniques. We use the unique human Connectome scanner51 to acquire a dataset with gradients of up to 300mT/m, which is not readily available in most current MR machines. However, previous preclinical work by Dyrby et al.13 suggests that high diffusion gradients enrich the signal, which helps model fitting and comparison. Future challenges might be designed that focus on explaining the signal and estimating parameters from data more typical of clinical acquisitions. Assessing the prediction performance on unseen data as in this challenge is different from assessing the fitting error: it implicitly penalizes models that overfit the data. However, since most of the missing shells lie in between other shells (in terms of b values, TEs, etc.), the quality of signal extrapolation was not assessed. We get a glimpse of this from Figure 4, where the SSE is unevenly distributed between the b values. Here, the shell that bore the largest error is the b=1100 s/mm2 one; see also Figures 6 and 7. Of all ‘unseen’ shells, this shell combines the lowest Δ and highest |G|, placing it on the edge of the range of the measurement space sampled. Such a b‐value shell combines high signal magnitude with high sensitivity, i.e. the gradient of signal against b‐value is highest in this range, which makes it hard to predict. (We stress that this observation is in the context of the wider multi‐shell acquisition, and is not to be seen in isolation for its potential impact on single‐shell acquisition methods.) On the other hand, the variability of prediction errors in the b<750 s/mm2 range could arise from the varying sensitivity of different models to the free water component, which is challenging to estimate as it can easily be confounded with hindered water, or physiological effects, which are mostly observable in this low b‐value range. Future work can take this further, by selecting unseen shells outside the min–max range of experimental parameters. This is likely to penalize more complex models that overfit the data even more strongly. We did not take into account the computational demand of each model, and this might limit the generalization of the results. Models that use bootstrapping generally have a higher computational burden and may not be feasible for large datasets, e.g. whole brain coverage. The dataset used in this challenge is specific to one subject who underwent a long‐duration acquisition, which adds to the question of generalizability. The subsequent preprocessing of the data is also a factor to bear in mind: the registration of two 4h datasets, across such a broad range of echo times, poses its own challenges for certain non‐homogenous regions in the brain, such as the fornix (compared with, for example, the relatively large genu). Thus the results may be somewhat subject‐specific and may be affected by residual alignment errors. Another limitation is that we only look at isolated voxels inside the corpus callosum and the fornix. Questions still remain about which models are viable even in the most coherent areas of the brain with the simplest geometry, so we believe our focused challenge on well‐defined areas is an informative first step necessary before extending the idea to the whole of the white matter, which would make for an extremely complex challenge. We note, however, recent work by Ghosh et al.66 that illustrates such an approach with Human Connectome Project (HCP) data. We focused here on comparing models based on their ability to predict unseen data. Although models that reflect true underlying tissue structure should explain the data well, we cannot infer in general that models that predict unseen data better are mechanistically closer to the tissue than those that do not. As we discuss in the Introduction, the main power of evaluating models in terms of prediction error is to reject models that cannot explain the data. Thus, while the identification of parsimonious models that explain the data certainly has great benefit, further validation is necessary through comparison of the parameters that they estimate with independent measurements, e.g. obtained through microscopy (our challenge makes no attempt to assess the integrity of parameter estimates themselves, but future challenges might use such performance criteria). Models can be evaluated to some extent by sanity checking the realism of their fitted parameter values, as in for example Jelescu et al.64 or Burcaw et al.67 However, obtaining accurate ground‐truth values for quantitative evaluation remains a hard and yet unsolved problem for diffusion MRI in general. In particular, histology can only roughly approximate the in vivo ground truth and introduces its own set of challenges in sample preparation, acquisition and biophysical interpretation.12, 13, 65, 68, 69, 70, 71 This challenge highlights the need for improved model comparison and validation methods.

CONCLUSION

Challenges such as this have great value in bringing the community together and provide an unbiased comparison of wide‐ranging solutions to key data‐processing problems. They raise new insights and ideas, motivating more directed future studies. The data are publicly available for others to use, with more details of the dataset given on the Challenge website at http://cmic.cs.ucl.ac.uk/wmmchallenge/. On this website, an up‐to‐date ranking of the models will be available, where additional models can be added after the publication of the article and where the community will be able to evaluate further the impact of noise correction, compartment‐specific T 2 estimation, inter‐class model assumptions, e.g. tissue versus signal models, or indeed intra‐class model assumptions, e.g. whether cylinders or sticks are optimal models for the given dataset.42 This will provide an important benchmark for future models and parameter estimation routines.

74 in total

1. The importance of axonal undulation in diffusion MR measurements: a Monte Carlo simulation study.

Authors: Markus Nilsson; Jimmy Lätt; Freddy Ståhlberg; Danielle van Westen; Håkan Hagslätt
Journal: NMR Biomed Date: 2011-10-21 Impact factor: 4.044

2. Direct estimation of the fiber orientation density function from diffusion-weighted MRI data using spherical deconvolution.

Authors: J-Donald Tournier; Fernando Calamante; David G Gadian; Alan Connelly
Journal: Neuroimage Date: 2004-11 Impact factor: 6.556

3. An objective method for regularization of fiber orientation distributions derived from diffusion-weighted MRI.

Authors: Ken E Sakaie; Mark J Lowe
Journal: Neuroimage Date: 2006-10-09 Impact factor: 6.556

4. A novel tensor distribution model for the diffusion-weighted MR signal.

Authors: Bing Jian; Baba C Vemuri; Evren Ozarslan; Paul R Carney; Thomas H Mareci
Journal: Neuroimage Date: 2007-05-03 Impact factor: 6.556

5. Multi-fiber reconstruction from diffusion MRI using mixture of Wisharts and sparse deconvolution.

Authors: Bing Jian; Baba C Vemuri
Journal: Inf Process Med Imaging Date: 2007

6. "Squashing peanuts and smashing pumpkins": how noise distorts diffusion-weighted MR data.

Authors: Derek K Jones; Peter J Basser
Journal: Magn Reson Med Date: 2004-11 Impact factor: 4.668

7. White matter characterization with diffusional kurtosis imaging.

Authors: Els Fieremans; Jens H Jensen; Joseph A Helpern
Journal: Neuroimage Date: 2011-06-13 Impact factor: 6.556

8. Degeneracy in model parameter estimation for multi-compartmental diffusion in neuronal tissue.

Authors: Ileana O Jelescu; Jelle Veraart; Els Fieremans; Dmitry S Novikov
Journal: NMR Biomed Date: 2015-11-29 Impact factor: 4.044

9. Model-based analysis of multishell diffusion MR data for tractography: how to get over fitting problems.

Authors: Saad Jbabdi; Stamatios N Sotiropoulos; Alexander M Savio; Manuel Graña; Timothy E J Behrens
Journal: Magn Reson Med Date: 2012-02-14 Impact factor: 4.668

10. Diffusion MRI microstructure models with in vivo human brain Connectome data: results from a multi-group comparison.

Authors: Uran Ferizi; Benoit Scherrer; Torben Schneider; Mohammad Alipoor; Odin Eufracio; Rutger H J Fick; Rachid Deriche; Markus Nilsson; Ana K Loya-Olivas; Mariano Rivera; Dirk H J Poot; Alonso Ramirez-Manzanares; Jose L Marroquin; Ariel Rokem; Christian Pötter; Robert F Dougherty; Ken Sakaie; Claudia Wheeler-Kingshott; Simon K Warfield; Thomas Witzel; Lawrence L Wald; José G Raya; Daniel C Alexander
Journal: NMR Biomed Date: 2017-06-23 Impact factor: 4.044

14 in total

Review 1. Challenges in diffusion MRI tractography - Lessons learned from international benchmark competitions.

Authors: Kurt G Schilling; Alessandro Daducci; Klaus Maier-Hein; Cyril Poupon; Jean-Christophe Houde; Vishwesh Nath; Adam W Anderson; Bennett A Landman; Maxime Descoteaux
Journal: Magn Reson Imaging Date: 2018-11-29 Impact factor: 2.546

2. Comparison of NODDI and spherical mean signal for measuring intra-neurite volume fraction.

Authors: Hua Li; Rahul Nikam; Vinay Kandula; Ho Ming Chow; Arabinda K Choudhary
Journal: Magn Reson Imaging Date: 2018-11-26 Impact factor: 2.546

3. Motion-robust diffusion compartment imaging using simultaneous multi-slice acquisition.

Authors: Bahram Marami; Benoit Scherrer; Shadab Khan; Onur Afacan; Sanjay P Prabhu; Mustafa Sahin; Simon K Warfield; Ali Gholipour
Journal: Magn Reson Med Date: 2018-11-16 Impact factor: 4.668

4. The Connectivity Fingerprint of the Fusiform Gyrus Captures the Risk of Developing Autism in Infants with Tuberous Sclerosis Complex.

Authors: Benoit Scherrer; Anna K Prohl; Maxime Taquet; Kush Kapur; Jurriaan M Peters; Xavier Tomas-Fernandez; Peter E Davis; Elizabeth M Bebin; Darcy A Krueger; Hope Northrup; Joyce Y Wu; Mustafa Sahin; Simon K Warfield
Journal: Cereb Cortex Date: 2020-04-14 Impact factor: 5.357

Review 5. Mapping the human connectome using diffusion MRI at 300 mT/m gradient strength: Methodological advances and scientific impact.

Authors: Qiuyun Fan; Cornelius Eichner; Maryam Afzali; Lars Mueller; Chantal M W Tax; Mathias Davids; Mirsad Mahmutovic; Boris Keil; Berkin Bilgic; Kawin Setsompop; Hong-Hsi Lee; Qiyuan Tian; Chiara Maffei; Gabriel Ramos-Llordén; Aapo Nummenmaa; Thomas Witzel; Anastasia Yendiki; Yi-Qiao Song; Chu-Chung Huang; Ching-Po Lin; Nikolaus Weiskopf; Alfred Anwander; Derek K Jones; Bruce R Rosen; Lawrence L Wald; Susie Y Huang
Journal: Neuroimage Date: 2022-02-23 Impact factor: 7.400

Review 6. Magnetic Resonance Imaging technology-bridging the gap between noninvasive human imaging and optical microscopy.

Authors: Jonathan R Polimeni; Lawrence L Wald
Journal: Curr Opin Neurobiol Date: 2018-05-11 Impact factor: 6.627

7. Alcohol consumption in the general population is associated with structural changes in multiple organ systems.

Authors: Hideaki Suzuki; Wenjia Bai; Evangelos Evangelou; Raha Pazoki; He Gao; Paul M Matthews; Paul Elliott
Journal: Elife Date: 2021-06-01 Impact factor: 8.713

8. Diffusion MRI microstructure models with in vivo human brain Connectome data: results from a multi-group comparison.

Review 9. The visual white matter: The application of diffusion MRI and fiber tractography to vision science.

Authors: Ariel Rokem; Hiromasa Takemura; Andrew S Bock; K Suzanne Scherf; Marlene Behrmann; Brian A Wandell; Ione Fine; Holly Bridge; Franco Pestilli
Journal: J Vis Date: 2017-02-01 Impact factor: 2.240

10. Leveraging multi-shell diffusion for studies of brain development in youth and young adulthood.

Authors: Adam R Pines; Matthew Cieslak; Bart Larsen; Graham L Baum; Philip A Cook; Azeez Adebimpe; Diego G Dávila; Mark A Elliott; Robert Jirsaraie; Kristin Murtha; Desmond J Oathes; Kayla Piiwaa; Adon F G Rosen; Sage Rush; Russell T Shinohara; Danielle S Bassett; David R Roalf; Theodore D Satterthwaite
Journal: Dev Cogn Neurosci Date: 2020-04-22 Impact factor: 6.464