R L Harms1, F J Fritz2, A Tobisch3, R Goebel2, A Roebroeck2. 1. Dept. of Cognitive Neuroscience, Faculty of Psychology & Neuroscience, Maastricht University, The Netherlands; Brain Innovation B.V., Maastricht, The Netherlands. Electronic address: Robbert.harms@maastrichtuniversity.nl. 2. Dept. of Cognitive Neuroscience, Faculty of Psychology & Neuroscience, Maastricht University, The Netherlands. 3. German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany.
Abstract
Advances in biophysical multi-compartment modeling for diffusion MRI (dMRI) have gained popularity because of greater specificity than DTI in relating the dMRI signal to underlying cellular microstructure. A large range of these diffusion microstructure models has been developed, and each of the popular models comes with its own, often different, optimization algorithm, noise model and initialization strategy to estimate its parameter maps. Since data fit, accuracy and precision are hard to verify, this creates additional challenges to the comparability and generalization of results from diffusion microstructure models. In addition, non-linear optimization is computationally expensive, leading to very long run times, which can be prohibitive in large group or population studies. In this technical note we investigate the performance of several optimization algorithms and initialization strategies over a few of the most popular diffusion microstructure models, including NODDI and CHARMED. We evaluate whether a single well-performing optimization approach exists that could be applied to many models and would perform well in both run time and fit aspects. All models, algorithms and strategies were implemented on the Graphics Processing Unit (GPU) to remove run time constraints, with which we achieve whole-brain dataset fits in seconds to minutes. We then evaluated fit, accuracy, precision and run time for models of differing complexity against three common optimization algorithms and three parameter initialization strategies. Variability of the achieved quality of fit in actual data was evaluated on ten subjects of each of two population studies, each with a different acquisition protocol. We find that optimization algorithms and multi-step optimization approaches have a considerable influence on performance and stability over subjects and over acquisition protocols.
The gradient-free Powell conjugate-direction algorithm was found to outperform other common algorithms in terms of run time, fit, accuracy and precision. Parameter initialization approaches were found to be relevant especially for more complex models, such as those involving several fiber orientations per voxel. For these, a fitting cascade initializing or fixing parameter values in a later optimization step from simpler models in an earlier optimization step further improved run time, fit, accuracy and precision compared to a single step fit. This establishes and makes available standards by which robust fit and accuracy can be achieved in shorter run times. This is especially relevant for the use of diffusion microstructure modeling in large group or population studies and in combining microstructure parameter maps with tractography results.
Diffusion MRI (dMRI) is a tool for investigating the microstructure of biological tissue by probing the self-diffusion of water (Le Bihan et al., 1986). The conventional method for the analysis of white matter in dMRI is the tensor model in Diffusion Tensor Imaging (DTI; Basser et al., 1994). DTI has been shown to be sensitive to microstructural changes due to, for example, development (Pfefferbaum et al., 2000, Neil et al., 2002, Assaf and Pasternak, 2008, Lebel et al., 2010) and pathology (Werring et al., 1999, Horsfield and Jones, 2002, Sotak, 2002, Sundgren et al., 2004). Although sensitive, DTI indices such as Fractional Anisotropy (FA) are also unspecific, since differences in FA can reflect different axonal properties such as axon density, diameter distribution and myelination (Beaulieu, 2002, Assaf et al., 2004, Assaf and Pasternak, 2008, Jones et al., 2013, Santis et al., 2014).

Recently, advances in biophysical multi-compartment modeling have gained popularity because they possess greater specificity than DTI in relating the dMRI signal to the underlying cellular microstructure (Assaf et al., 2004, Assaf and Basser, 2005, Assaf et al., 2008, Alexander et al., 2010, Panagiotaki et al., 2012, Zhang et al., 2012, Assaf et al., 2013, Fieremans et al., 2013, De Santis et al., 2014, Santis et al., 2014, Jelescu et al., 2015). In these diffusion microstructure models the diffusion weighted signal is expressed as a combination of one or more biophysically inspired compartments. Although the compartment models are based on simple geometric shapes, and a strong assumption of no inter-compartment exchange is made (cf. Nilsson et al., 2013; Li et al., 2016), these models can provide specific measures such as fiber density, orientation dispersion and axonal diameter distributions, and have been shown to be sensitive to specific white matter alterations due to development (e.g.
Kunz et al., 2014; Jelescu et al., 2015) and pathology (Bergers et al., 2002, Fieremans et al., 2013, Benitez et al., 2014, Timmers et al., 2015, Wen et al., 2015, Kamagata et al., 2016).

There is a large range of diffusion microstructure models, including popular models such as the Neurite Orientation Dispersion and Density Imaging model (NODDI; Zhang et al., 2012), the Combined Hindered And Restricted Model of Diffusion (CHARMED; Assaf and Basser, 2005), the White Matter Tract Integrity model (WMTI; Fieremans et al., 2013), AxCaliber (Assaf et al., 2008) and the Minimal Model of White Matter Diffusion (MMWMD in ActiveAx; Alexander et al., 2010; Zhang et al., 2011), as well as further developments of these models like NODDIDA (Jelescu et al., 2015), Bingham NODDI (Tariq et al., 2016), diffusion time dependent CHARMED (De Santis et al., 2016) and fiber-specific T1 CHARMED (De Santis et al., 2016).

Each of these models needs to be fitted to the dMRI data to estimate parameter maps, which is commonly accomplished using non-linear optimization. This has two major challenges. First, it is computationally expensive, quickly leading to very long run times. Second, the quality, accuracy and precision of the data fit is often uncertain. To tackle these challenges, each of the popular models comes with its own, mostly different, optimization algorithm, noise model and initialization strategy to estimate its parameter maps. This creates challenges to comparability and generalization of results from diffusion microstructure models, additional to the model formulation itself. In this technical note we investigate the performance of several optimization algorithms and initialization strategies over a few of the more popular diffusion microstructure models. We investigate whether a single well-performing approach exists that could be applied to many models and performs well in both run time and fit aspects.
To this end, we evaluate the fit, accuracy, precision and run time for models of differing complexity against three common optimization algorithms and three parameter initialization strategies. All models, algorithms and strategies were implemented on the GPU to remove run time constraints. Variability of the achieved quality of fit in actual data is evaluated on ten subjects of each of two large group studies, each with a different acquisition protocol: the MGH-USC part of the Human Connectome Project (Fan et al., 2016) and the Rhineland Study (www.rheinland-studie.de).
Methods
All biophysical compartment models, noise models / likelihood functions, as well as optimization algorithms were implemented in a Python-based, GPU-accelerated toolbox (Maastricht Diffusion Toolbox or MDT, freely available under an open source LGPL license at https://github.com/cbclab/MDT). Its object-oriented modular design allows arbitrary combinations of individual single compartment models into composite compartment models, which can then be combined with a chosen likelihood function and optimization algorithm by Python scripting. The complete multi-compartment model, likelihood function and optimization algorithm are automatically compiled into OpenCL code executable on both CPUs and GPUs (or combinations thereof).
Single compartment models
Table 1 defines the individual compartment models which are combined to construct the multi-compartment models (cf. Panagiotaki et al., 2012; Ferizi et al., 2013) valid for (singly refocused) Pulsed Gradient Spin Echo (PGSE; Stepisnik, 1993) acquisitions.
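For orientation, the PGSE diffusion weighting used throughout is the standard Stejskal-Tanner b-value, b = γ²G²δ²(Δ − δ/3). A minimal helper makes the unit conventions explicit; the gradient strength and timings below are illustrative values, not settings from this study:

```python
# Gyromagnetic ratio for 1H in rad/s/T (CODATA value, rounded)
GAMMA_H = 2.6752218744e8


def b_value(G, delta, Delta):
    """b-value of a PGSE experiment: b = gamma^2 * G^2 * delta^2 * (Delta - delta/3).

    G in T/m, delta and Delta in seconds; returns b in s/m^2.
    Divide by 1e6 to obtain the more common s/mm^2.
    """
    return (GAMMA_H * G * delta) ** 2 * (Delta - delta / 3.0)


# Illustrative example: G = 100 mT/m, delta = 12.9 ms, Delta = 21.8 ms
b = b_value(G=0.1, delta=12.9e-3, Delta=21.8e-3)  # s/m^2
b_mm = b / 1e6  # s/mm^2
```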
Table 1
The single compartment models; see Table 2 for an overview of the optimizable parameters. The primary direction of diffusivity n is parameterized using polar coordinates with angles θ and ϕ and radius d. The variables b, g, q, Δ, δ, G and TE are sequence settings. In the Tensor compartment, the rotate function rotates the Tensor around n by the angle ψ. In the NODDI models, the Watson function gives the probability of finding fiber bundles along an orientation using a Watson distribution with parameter κ, integrated over the unit sphere. In the NODDI_ex model, the diffusion tensor is defined as a cylindrically symmetric Tensor (alike the Tensor in this table except for the symmetry). In the CHARMED_in compartment, N is the number of gamma cylinders used, with a weight and a radius per gamma distributed cylinder. In previous work some of these quantities are denoted with different symbols; we inlined those identities here in the CHARMED_in equation.
Table 2
The parameter descriptions corresponding to Table 1.
Parameter | Compartments | Description
d | Ball, Stick, Tensor, NODDI_in, NODDI_ex, CHARMED_in | Parallel diffusivity along the primary direction of diffusion n
d⊥1 | Tensor | Perpendicular diffusivity, perpendicular to both d∥ and d⊥2
d⊥2 | Tensor | Perpendicular diffusivity, perpendicular to both d∥ and d⊥1
θ | Tensor, Stick, NODDI_in, NODDI_ex, CHARMED_in | Polar angle used to parameterize n, the primary direction of diffusion
ϕ | Tensor, Stick, NODDI_in, NODDI_ex, CHARMED_in | Azimuth angle used to parameterize n, the primary direction of diffusion
ψ | Tensor | Used to rotate the Tensor around its primary axis
κ | NODDI_in, NODDI_ex | Concentration parameter of the Watson distribution
The Tensor model was first described in Basser et al. (1994), the Ball and Stick models are defined in Behrens et al. (2003), NODDI_in and NODDI_ex are respectively the intra-cellular and extra-cellular models in Zhang et al. (2012) and the CHARMED_in compartment is the restricted compartment defined in Assaf et al. (2004).

Each compartment has a signal function modelling the signal S depending on the unit-norm gradient direction vector g, the scalar b and the vector q. Here G is the gradient amplitude, Δ the time between the onsets of the two gradient pulses, δ the duration of a gradient pulse and γ the gyromagnetic ratio for ¹H. Additionally, the CHARMED_in compartments depend on the echo time TE.

Disregarding the fixing of parameter values discussed below, the modelled fiber orientations n are optimized as spherical coordinates using the free parameters θ and ϕ, with the scalar d for the diffusivity. Some models have more than one diffusivity; these are all optimized separately unless specified otherwise in the multi-compartment model. In the Tensor compartment the rotate function first rotates the fiber direction n by 90 degrees in the (x, z) plane to make it perpendicular to the principal direction. The result is then rotated around n by the angle ψ, yielding n⊥1 and, after a cross product, n⊥2. In the NODDI_in and NODDI_ex models, the Watson function gives the probability of finding fiber bundles along an orientation using a Watson distribution integrated over the unit sphere. In the NODDI_ex model, the diffusion tensor is defined as a cylindrically symmetric tensor. For more details on NODDI see Zhang et al. (2012). The CHARMED_in model is a Gamma Distributed Radii (GDR) Cylinder model (Panagiotaki et al., 2012) where the cylinder distribution is predefined with N cylinders (we take N=6 throughout).
Our 6 cylinders have radii R of [1.5, 2.5, 3.5, 4.5, 5.5, 6.5] micrometers and corresponding weights [0.0212, 0.1072, 0.1944, 0.2667, 0.2150, 0.1956] taken from a Gamma distribution derived from histological results (Aboitiz et al., 1992).
Composite multi-compartment models
The general multi-compartment diffusion microstructure model has the form of a weighted sum of single compartments:

S(x) = S0 · ∑i=1..n wi · Si(xi)

where S0 is the signal for the non-diffusion weighted (or b0) acquisitions, wi the volume fractions (signal weights, signal fractions or water fractions) and Si the signal function for the i'th of n total compartments. For this work we selected the Tensor, Ball&Sticks, NODDI and CHARMED models. Table 3 shows these multi-compartment models (henceforth simply 'models'), their constituent compartments and total number of parameters including estimation of S0.
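The weighted-sum structure can be sketched for the simplest case, a Ball&Stick model with one fiber orientation, using the standard compartment signal expressions from Behrens et al. (2003). This is a minimal illustration of the composition principle, not the toolbox's actual implementation:

```python
import numpy as np


def ball(bval, d):
    # Isotropic compartment: S = exp(-b * d)
    return np.exp(-bval * d)


def stick(bval, g, n, d):
    # Idealized zero-radius cylinder: S = exp(-b * d * (g . n)^2)
    return np.exp(-bval * d * np.dot(g, n) ** 2)


def ball_and_stick(bval, g, S0, w_stick, n, d):
    # Volume-fraction weighted sum of compartments,
    # scaled by the non-diffusion weighted signal S0
    w_ball = 1.0 - w_stick
    return S0 * (w_stick * stick(bval, g, n, d) + w_ball * ball(bval, d))
```

With b = 0 the compartment signals are 1 and the model returns S0, as expected for a b0 volume (here b is in s/mm² and d in mm²/s).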
Table 3
The used composite multi-compartment models, their compartments (divided into intra-, extra-axonal and isotropic) and total number of parameters.
Model | Restricted (intra-cellular) compartments | Hindered (extra-cellular) compartments | Isotropic compartments | Number of parameters | Acquisition requirements
Tensor | – | Tensor | – | 7 | b < 1500 s/mm²
Ball&Sticks_in[n] | Stick (n times) | – | Ball | 1+3n | –
NODDI | NODDI_in | NODDI_ex | Ball | 6 | ≥2 b-values/shells
CHARMED_in[n] | CHARMED_in (n times) | Tensor | – | 7+4n | ≥2 b-values/shells, bmax ≥ 4000 s/mm²
Here we take the Stick compartment to be an idealized cylinder with zero radius and classify it as a restricted compartment (cf. Ferizi et al., 2014). We have added to some of the models the postfix '_in[n]', which identifies the number of restricted compartments employed in a model, for those models that allow multiple restricted compartments. For example, CHARMED_in2 indicates a CHARMED model with 2 restricted compartments (and the regular single hindered compartment), one for each of two unique fiber orientations in a voxel. The total number of parameters per model is the sum of the parameters per compartment, the volume fractions (equal to the number of compartments minus one) and the scalar S0. The complete signal equation per model is then given as a volume-fraction-weighted sum of compartments, scaled by the non-diffusion weighted signal S0.
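The parameter counts in Table 3 can be reproduced by summing free parameters per compartment, explicit weights and S0. The per-compartment breakdown below is our reading of Tables 2 and 4 (which parameters are free after the global fixes and dependencies), so treat it as a sanity check rather than an authoritative accounting:

```python
def n_params_tensor():
    # d, d_perp1, d_perp2, theta, phi, psi, plus S0
    return 6 + 1


def n_params_ball_and_sticks(n):
    # Per Stick: theta, phi (diffusivities globally fixed, Table 4);
    # n explicit stick weights (the Ball weight is implicit), plus S0
    return 2 * n + n + 1


def n_params_noddi():
    # NODDI_in: theta, phi, kappa (d fixed); NODDI_ex fully tied to NODDI_in;
    # two explicit weights (w_in, w_ex; the Ball weight is implicit), plus S0
    return 3 + 2 + 1


def n_params_charmed(n):
    # Hindered Tensor: 6 free parameters; per restricted compartment:
    # theta, phi, d; n explicit weights, plus S0
    return 6 + 3 * n + n + 1
```

These reproduce the 7, 1+3n, 6 and 7+4n totals of Table 3.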
Evaluation function and likelihood
Models are optimized by finding the set of free parameter values x that minimize the evaluation function or objective function of the modeling errors, with O the observed data and S(x) the model signal estimate. In general, the evaluation function is formulated as the negative log-likelihood function, which embeds a noise model for the data. Whereas the noise in the complex-valued Fourier coefficient images can be described by a Gaussian distribution, the noise in magnitude-reconstructed MR data follows a non-zero-mean (Rician or non-central χ) distribution (Gudbjartsson and Patz, 1995). For SNR > 2 an Offset Gaussian model can be used (Alexander, 2009), with the advantage that it is potentially more stable than the Rician model (Panagiotaki et al., 2012). Therefore, we use the Offset Gaussian likelihood model for all optimizations. Minimizing the negative log-likelihood leads to the following objective function (with constant terms dropped):

f(x) = (1 / (2σ²)) · ∑j=1..m (Oj − √(Sj(x)² + σ²))²

Here σ is the standard deviation of the Gaussian distributed error of the complex-valued Fourier images. We estimated this standard deviation from the reconstructed magnitude images using the method in Dietrich et al. (2007; eq. A6).
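A numpy sketch of the Offset Gaussian objective (one common form, following Alexander, 2009, with constant terms dropped) may help make the noise-floor correction concrete; this is an illustration, not the toolbox's OpenCL implementation:

```python
import numpy as np


def offset_gaussian_objective(observed, predicted, sigma):
    """Negative log-likelihood of the Offset Gaussian noise model,
    constant terms dropped: sum_j (O_j - sqrt(S_j^2 + sigma^2))^2 / (2 sigma^2).

    observed, predicted: arrays of m signal values; sigma: noise std. dev.
    """
    expected = np.sqrt(predicted ** 2 + sigma ** 2)
    return np.sum((observed - expected) ** 2) / (2.0 * sigma ** 2)
```

The objective is zero exactly when the noise-floor-corrected model signal matches the magnitude data, and grows quadratically with the residuals.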
Parameter dependencies and global fixes
Some of the models have specific dependencies between their parameters, as well as parameters fixed to a specific value. Defining the dependencies at the level of the multi-compartment model has two advantages. First, it allows the compartments to be defined as generally as possible and reused in several multi-compartment models. Second, some dependencies relate the parameters of different compartments to each other and therefore must operate at the level of the multi-compartment model. The list of global fixes and dependencies is given in Table 4, where we use the 'dot' notation (e.g. Ball.d for the d parameter of the Ball compartment) to explicitly assign parameters to certain compartments.
Table 4
Parameter dependencies and global fixes.
Model | Parameter global fixes | Parameter dependencies
Tensor | – | –
Ball&Sticks_in[n] | Ball.d = 3×10⁻⁹ m²/s; Stick.d = 1.7×10⁻⁹ m²/s | ∑wi = 1
NODDI | NODDI_in.d = 1.7×10⁻⁹ m²/s; NODDI_ex.d = 3×10⁻⁹ m²/s; Ball.d = 3×10⁻⁹ m²/s | ∑wi = 1; NODDI_ex.d⊥1 = NODDI_ex.d∥ · wex/(win + wex); NODDI_ex.{κ,θ,ϕ} = NODDI_in.{κ,θ,ϕ}
CHARMED_in[n] | – | ∑wi = 1
In the NODDI model in Table 4, the second dependency defines the NODDI tortuosity assumption and the third dependency locks the orientations and dispersion of the NODDI_in and NODDI_ex compartments to each other (the dispersion because the NODDI model adjusts the ratio of parallel and perpendicular extra-axonal diffusivity on the basis of the orientation dispersion of the intra-cellular compartment). Additionally, for all multi-compartment models, the weights (volume fractions) must sum to one. This is accomplished by normalizing the set of n−1 explicitly optimized volume fractions in each optimization iteration by their sum s if that sum is larger than one. The last compartment weight, not explicitly optimized, is then set to 1 − s, or to zero if 1 − s is smaller than zero, for signal evaluation in the optimization process.
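The weight constraint can be sketched as a small projection step applied each iteration; this is a plain-Python illustration of the rule described above, not the toolbox code:

```python
def constrain_weights(weights):
    """Constrain the n-1 explicitly optimized volume fractions:
    normalize by their sum if that sum exceeds one, then derive the
    last (implicit) compartment weight as 1 - sum, clipped at zero."""
    s = sum(weights)
    if s > 1.0:
        weights = [w / s for w in weights]
        s = 1.0
    w_last = max(0.0, 1.0 - s)
    return weights + [w_last]
```

For example, over-unity inputs are rescaled so the full set of weights sums to one with a zero implicit weight, while feasible inputs keep their values and gain the complementary last weight.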
Indices calculated in post-processing
For every model, various additional volumetric maps are calculated from the optimized model parameters. These can be model-specific or model-independent. Table 5 lists the model-specific maps.
Table 5
Additional volumetric maps per model.
Model | Additional maps
Tensor | –
Ball&Sticks_in[n] | FS = 1 − wball
NODDI | NDI = win/(win + wex); ODI = (2/π) · arctan2(1, κ·10)
CHARMED_in[n] | FR = 1 − whin
FS (Fraction of Sticks) is the total fraction (sum of weights) of the n Stick compartments in the Ball&Sticks model. NDI (Neurite Density Index) and ODI (Orientation Dispersion Index) are defined in (Zhang et al., 2011) and FR (Fraction of Restricted) is the total fraction of the n restricted compartments in the CHARMED model (Assaf et al., 2004, De Santis et al., 2012).

In addition to these model-specific maps, we add two model-independent maps to the result set of every model. The first is the Log-Likelihood (LL) map containing the complete (log) likelihood of the model given the data:

LL = −(m/2) · log(2πσ²) − (1 / (2σ²)) · ∑j=1..m (Oj − √(Sj(x)² + σ²))²

with O the observations, S(x) the function values given the optimal set of parameters x, σ the noise standard deviation and m the number of volumes in the dataset. This LL map gives an indication of the goodness of fit; a higher LL is preferred. The second model-independent map we add is the Bayesian Information Criterion (BIC) map. The BIC (Schwarz, 1978) can be used to compare models with different numbers of parameters on the same dataset (Panagiotaki et al., 2012, Ferizi et al., 2014). It is defined as BIC = −2·LL + k·ln(m), where LL is the log-likelihood value, k is the number of free parameters and m is the number of observations. The model with the lower BIC is preferred and features the best trade-off between model fit and complexity (number of parameters).
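A sketch of the two model-independent maps, assuming the Offset Gaussian noise model used throughout (illustrative only; the toolbox computes these per voxel on the GPU):

```python
import numpy as np


def log_likelihood(observed, predicted, sigma):
    """Complete Offset Gaussian log-likelihood over the m volumes."""
    m = observed.size
    resid = observed - np.sqrt(predicted ** 2 + sigma ** 2)
    return (-m / 2.0 * np.log(2.0 * np.pi * sigma ** 2)
            - np.sum(resid ** 2) / (2.0 * sigma ** 2))


def bic(ll, k, m):
    # BIC = -2 * LL + k * ln(m); the model with the lower BIC is preferred
    return -2.0 * ll + k * np.log(m)
```

At equal LL, the model with more free parameters receives a larger (worse) BIC, which is exactly the fit-versus-complexity trade-off described above.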
Optimization algorithms
We explicitly separate the models from the optimization algorithms to enable evaluating model/optimization-algorithm combinations. We chose three commonly used optimization algorithms: two gradient-free algorithms (Nelder-Mead Simplex and Powell's conjugate-direction) and one gradient-based algorithm (Levenberg-Marquardt). All optimization algorithms in this study were originally programmed in C and converted to OpenCL for the purpose of this work. OpenCL is a cross-vendor language standard and framework that enables highly parallelized calculations on the graphics card (graphical processing unit, GPU) as well as the central processing unit (CPU), potentially allowing for large computational acceleration (cf. Hernandez et al., 2013).

The first routine we used is the Nelder-Mead simplex (NMSimplex) method (Nelder and Mead, 1965). This is a general derivative-free nonlinear local optimization algorithm, also used in the MATLAB fminsearch function. The NMSimplex routine starts with an n-dimensional simplex with, at each vertex, initial values for all n model parameters. During optimization, the values of the evaluation function at each vertex of the simplex are used to determine the worst corner point. The algorithm then tries to replace this worst point by transforming the simplex about its centroid using reflection, expansion, contraction and shrinking (Rios and Sahinidis, 2012). This algorithm has five coefficients that determine its behavior: the scale of the initial simplex (s), and the coefficients for the reflection (α), contraction (β), expansion (γ) and shrink (δ) of the simplex during each iteration. Preliminary tests using the default coefficients of Nelder and Mead (1965), the table of coefficients in Wang and Shoup (2011) and the adaptive coefficients in Gao and Han (2012) led us to use the adaptive coefficients in the latter, being: α = 1, β = 0.75 − 1/(2k), γ = 1 + 2/k and δ = 1 − 1/k, with k being the number of free parameters in the model.
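The adaptive coefficients of Gao and Han (2012) can be written as a one-line helper; the tuple ordering (reflection, expansion, contraction, shrink) below is our convention for this sketch:

```python
def adaptive_nm_coefficients(k):
    """Adaptive Nelder-Mead coefficients of Gao & Han (2012) for a model
    with k free parameters: (reflection, expansion, contraction, shrink)."""
    return 1.0, 1.0 + 2.0 / k, 0.75 - 1.0 / (2.0 * k), 1.0 - 1.0 / k
```

For reference, SciPy's Nelder-Mead implementation exposes the same dimension-adaptive scheme via `scipy.optimize.minimize(..., method='Nelder-Mead', options={'adaptive': True})`.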
The original implementation followed for translation into OpenCL was programmed in C by Michael F. Hutt (http://www.mikehutt.com/neldermead.html).

The second routine studied is the Powell conjugate-direction method (Powell, 1964). This is a general derivative-free nonlinear local optimization algorithm. It starts the search with n search vectors, each containing all n model parameters. In every iteration, the algorithm tries to update each search vector in turn using a line search across each dimension. Doing so, it continuously creates new vector combinations (using line searches) and replaces old combinations until a stop criterion is met. The final result is the search vector with the minimum function value. We used an OpenCL implementation translated from the C implementation described in Numerical Recipes, 3rd edition (Press et al., 2007), using the Brent algorithm (Brent, 1973) for the line search. We used the default settings from the literature, with a bracketing golden ratio of 1.618034 (the default ratio by which successive intervals are magnified in bracketing) and a g-limit of 100 (the maximum magnification allowed for a parabolic-fit step in function bracketing).

The third routine we studied is the Levenberg-Marquardt (LM) method (Levenberg, 1944). This is a gradient-based least squares optimization routine for which the gradients can either be given analytically or approximated numerically. We used numerical gradients for all models, as this is most commonly applied (Alexander et al., 2010, Assaf et al., 2008, De Santis et al., 2012, Zhang et al., 2012). Our implementation is an OpenCL version of the lmfit (http://apps.jcns.fz-juelich.de/lmfit) C implementation of this algorithm.

All three algorithms above have two main stop criteria: either the difference between two consecutive iterations falls below a certain threshold, or the predefined maximum number of iterations is reached.
For the first we set the break threshold at thirty times the machine epsilon. For the second we set the so-called patience (i = p·(k + 1), where i is the maximum number of iterations, p is the patience and k is the number of free parameters in the model) to a predefined setting per optimization algorithm. This construction allows fixing the patience while ensuring that models with more parameters are allowed more iterations. We did a preliminary study to determine, per algorithm and model, the optimal patience given goodness of fit and run time. Unless stated otherwise, we used for NMSimplex a patience of 200, for Powell a patience of 2 and for LM a patience of 100, which gave the optimal trade-off between run time and fit (objective function minimum) for each algorithm. All calculations were performed on a single 2015 AMD Fury X graphics card using single float (32-bit) precision, except for parts where value range (summation of log-likelihoods, Euclidean norm in the LM algorithm) or precision (the Watson function in NODDI) are critical, which were evaluated in double precision.
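The two stop criteria can be sketched as follows; the i = p·(k + 1) form of the iteration budget is our reading of the patience construction, so treat the exact formula as an assumption:

```python
import numpy as np


def stop_criteria(patience, k, dtype=np.float32):
    """Iteration budget and convergence threshold for an optimizer run:
    maximum iterations = patience * (k + 1) (assumed form), and a break
    threshold of thirty times the machine epsilon of the working precision."""
    max_iter = patience * (k + 1)
    epsilon = 30 * np.finfo(dtype).eps
    return max_iter, epsilon
```

With this construction a fixed patience automatically grants more iterations to models with more free parameters, e.g. a patience of 2 allows 22 iterations for a 10-parameter model.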
Parameter transformations
During model fitting with the NMSimplex, Powell and LM methods we use parameter transformations to limit the range of each parameter to biophysically meaningful values and to scale the parameters to a range better suited for optimization, following Panagiotaki et al. (2012). These transformations map the parameters between two parameter spaces, the model space and the optimization space, both in ℝⁿ. The model space is the space in which the parameters can be used as input to the model; the optimization space contains the parameters the optimization routine tries to optimize. We define the encoding transformation as transforming the parameters from model to optimization space, and the decoding transformation from optimization to model space. Please see Appendix A for a complete list of parameter transformations used in this work.
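As an illustration of the encode/decode idea, one common choice for a bounded parameter (cf. Panagiotaki et al., 2012) is a sin² mapping, which lets the optimizer work on an unconstrained value while the model always receives a value within [lb, ub]. This is a generic example, not the specific transformation list of Appendix A:

```python
import math


def decode(x_opt, lb, ub):
    """Map an unconstrained optimization-space value to the bounded
    model space using a sin^2 transform."""
    return lb + (ub - lb) * math.sin(x_opt) ** 2


def encode(x_model, lb, ub):
    """Inverse mapping from model space back to optimization space."""
    return math.asin(math.sqrt((x_model - lb) / (ub - lb)))
```

The round trip decode(encode(v)) recovers v, and decode is bounded for any optimizer input, so the optimization routine itself never needs to know about the constraint.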
Cascaded model optimization
A common strategy for optimizing complex (i.e. high number of parameters) models is to initialize subsets of parameters with results from earlier, simpler model optimizations (Panagiotaki et al., 2012, Zhang et al., 2012, De Santis et al., 2016). We formalize this here as cascaded model optimization. We test three cascading strategies: Cascade S0 (CS), Cascade Initialized (CI) and Cascade Fixed (CF), see Fig. 1. In the CS cascade we first fit a one-parameter S0 model to the non-diffusion weighted volumes to determine the scale of the non-diffusion weighted signal, and use this to initialize the S0 parameter of the model we are interested in. In the CI cascades we additionally initialize a few specific parameters, such as the volume fractions and the fiber orientation parameters, from the results of a simpler model. The CF cascades are similar to the CI cascades, except that we now fix some of those parameters instead of initializing them, reducing the dimensionality of the last optimization. For a complete overview of initialized and fixed parameters for all three cascade variants see Appendix B. In this work, we always use at least CS for all model fitting, as an alternative to dividing away the non-diffusion weighted signal or fixing it beforehand. Since the range of the S0 parameter is large and arbitrary, initializing with an S0 estimate improves the performance and run time of the optimization routines.
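The cascade idea can be expressed as a generic runner that threads fitted parameters from one step into the initialization of the next. The structure below (steps as pairs of a fit function and a parameter map) is a hypothetical sketch, not MDT's actual API:

```python
def run_cascade(steps, data):
    """Run a list of (fit_function, parameter_map) cascade steps.

    Each fit_function takes (data, init) and returns a dict of fitted
    parameters; parameter_map maps result names of the previous step to
    initialization names of the next step (hypothetical structure)."""
    params = {}
    for fit, parameter_map in steps:
        init = {new: params[old]
                for old, new in parameter_map.items() if old in params}
        params = fit(data, init)
    return params
```

A CS cascade, for example, would be a two-step list: an S0-only fit followed by the model of interest with `{'S0': 'S0'}` as its parameter map; a CF cascade would instead treat the mapped values as fixed inside the final fit function.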
Fig. 1
Illustration of the three different cascading strategies (for the example of the NODDI model): CS, CI and CF. The blue arrows indicate initialization of a parameter, the orange arrows indicate fixing a parameter.
Datasets
For this study we used two groups of ten subjects coming from two studies. The first ten subjects are from the freely available, fully preprocessed dMRI data of the MGH-USC consortium of the Human Connectome Project. Data used in the preparation of this work were obtained from the MGH-USC Human Connectome Project (HCP) database (https://ida.loni.usc.edu/login.jsp). The HCP project (Principal Investigators: Bruce Rosen, M.D., Ph.D., Martinos Center at Massachusetts General Hospital; Arthur W. Toga, Ph.D., University of California, Los Angeles; Van J. Wedeen, M.D., Martinos Center at Massachusetts General Hospital) is supported by the National Institute of Dental and Craniofacial Research (NIDCR), the National Institute of Mental Health (NIMH) and the National Institute of Neurological Disorders and Stroke (NINDS). Collectively, the HCP is the result of efforts of co-investigators from the University of California, Los Angeles, the Martinos Center for Biomedical Imaging at Massachusetts General Hospital (MGH), Washington University, and the University of Minnesota (http://www.humanconnectome.org/documentation/MGH-diffusion/).

The data were acquired on a specialized Siemens Magnetom Connectom with a 300 mT/m gradient set (Siemens, Erlangen, Germany). These datasets were acquired at a resolution of 1.5 mm isotropic with Δ=21.8 ms, δ=12.9 ms, TE=57 ms, TR=8800 ms, Partial Fourier = 6/8, MB factor 1 (i.e. no simultaneous multi-slice) and in-plane GRAPPA acceleration factor 3, with 4 shells of b=1000, 3000, 5000 and 10,000 s/mm², with respectively 64, 64, 128 and 256 directions, to which are added 40 b0 volumes, leading to 552 volumes in total per subject, with an acquisition time of 89 minutes. We refer to these datasets as HCP MGH – 1.5mm – 552vol – b10k and to the multi-shell direction table as the HCP MGH table. These four-shell, high number of directions, very high maximum b-value datasets allow a wide range of models to be fitted.
The second set of ten subjects comes from the diffusion protocol pilot phase of the Rhineland Study (www.rheinland-studie.de) and was acquired on a Siemens Magnetom Prisma (Siemens, Erlangen, Germany) with the Center for Magnetic Resonance Research (CMRR) multi-band (MB) diffusion sequence (Moeller et al., 2010, Xu et al., 2013). These datasets have a resolution of 2.0mm isotropic with Δ=45.8ms, δ=16.3ms, TE=90ms, TR=4500ms, Partial Fourier = 6/8, MB factor 3 and no in-plane acceleration, with 3 shells of b=1000, 2000 and 3000 s/mm^2 with respectively 30, 40 and 50 directions, to which are added 14 b0 volumes, leading to 134 volumes in total per subject, with an acquisition time of 10 min 21 s. Additional b0 volumes were acquired with a reversed phase encoding direction and were used to correct susceptibility-related distortion (in addition to bulk subject motion) with the topup and eddy tools in FSL version 5.0.9. We refer to these datasets as RLS − 2mm − 134dir − b3k and to the multi-shell direction table as the RLS table. These three-shell datasets represent a relatively short acquisition protocol that still allows many models to be fitted. Since the Tensor model is only valid for b-values up to about 1200 s/mm^2, it is estimated only on the b=1000 s/mm^2 shell and the b0 volumes. All other models are estimated on all data volumes. For all datasets we created a whole brain mask, using BET from FSL (Smith, 2002), and a white matter (WM) mask. The whole brain mask is used during model fitting, whereas averages over the WM mask are used in model or data comparisons. The WM mask was calculated by applying a threshold of 0.3 on the Tensor FA results, followed by a double-pass 3D median filter of radius 2 in all directions.
The Tensor estimate for this mask generation was calculated using a CI Ball&Sticks/Tensor cascade optimized with the Powell method. To compare performance and robustness of the different optimization algorithms and cascading strategies, the per-subject log-likelihood (LL) averages over the WM mask were calculated. The mean differences between algorithms or strategies over the ten subjects and their standard error of the mean (SEM) were then reported.
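The WM mask construction described above (FA threshold followed by a double-pass median filter) can be sketched as follows. This is a minimal illustration using NumPy and SciPy on an FA map already loaded as an array, not the exact implementation used in this work:

```python
import numpy as np
from scipy.ndimage import median_filter

def white_matter_mask(fa, threshold=0.3, radius=2, passes=2):
    """Threshold a Tensor FA map and clean up the result with a
    double-pass 3D median filter of the given radius (radius 2 in
    all directions corresponds to a 5x5x5 window)."""
    mask = fa > threshold
    kernel = 2 * radius + 1
    for _ in range(passes):
        # median filtering a binary mask removes isolated voxels
        # and fills small holes
        mask = median_filter(mask.astype(np.uint8), size=kernel) > 0
    return mask
```

The median filter acts as a majority vote in each window, which removes speckle from the thresholded FA map while preserving the bulk WM region.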
Ground truth simulations
We performed two sets of ground truth simulations (see Fig. 2) to quantify the accuracy and precision of: 1) the optimization (Fig. 2A) and 2) the cascading results (Fig. 2B). First, to assess the accuracy and precision of the different optimization algorithms (Fig. 2A), we created ten thousand random fiber orientations for each of sixty linearly spaced volume fractions in [0.2, 0.8], for the Ball&Sticks_in1, CHARMED_in1 and NODDI models, using both the HCP MGH and the RLS multi-shell direction table. For the generated signals we created three copies with Rician distributed noise realizations at SNRs 10, 20 and 40. We compute the optimization error as the optimized parameter value minus the ground truth parameter value for the intra-axonal volume fraction, i.e. the fraction of stick (FS) for Ball&Sticks_in1 and the fraction of restricted (FR) for CHARMED_in1 and for NODDI. We compute a measure of accuracy as the inverse of the mean optimization error over the ten thousand random fiber orientations per volume fraction (each with a unique noise realization). Similarly, as a measure of precision we compute the inverse of the standard deviation of the optimization error over the ten thousand random fiber orientations per volume fraction. Finally, we aggregated these results over the sixty volume fractions into a mean and standard error for both accuracy and precision.
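The accuracy and precision measures can be written compactly as below. This is a sketch under the assumption that the inverse of the mean error is taken on its absolute value, so that under- and over-estimation are penalized symmetrically:

```python
import numpy as np

def accuracy_precision(estimated, ground_truth):
    """Accuracy as the inverse of the (absolute) mean optimization
    error and precision as the inverse of its standard deviation,
    over all noise realizations of one simulated volume fraction."""
    error = np.asarray(estimated) - np.asarray(ground_truth)
    accuracy = 1.0 / np.abs(error.mean())   # penalizes systematic bias
    precision = 1.0 / error.std()           # penalizes spread
    return accuracy, precision
```

A systematically biased estimator therefore scores low on accuracy even if its spread (and hence its precision score) is high, and vice versa.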
Fig. 2
Illustration of the ground truth simulation workflow to quantify the accuracy and precision of A) the optimization routine comparisons, B) the cascading results.
Second, to assess the accuracy and precision of the different cascading strategies (see Fig. 2B), we created for the CHARMED_in3 model 1e6 instances with randomly distributed fiber orientations and volume fractions, using both the HCP MGH and the RLS multi-shell direction table. Instead of generating consistent instances that meet inter-fiber angle and volume fraction constraints, we chose to generate many random instances and filter for the constraints. Therefore, after generating the 1e6 instances we filtered out: i) instances with any angle between the three orientations smaller than 30 degrees, and ii) instances with individual volume fractions outside the range [0.2, 0.8], leaving 1.5e5 instances. For the generated signals we created three copies with Rician distributed noise realizations at SNRs 10, 20 and 40. After optimization, we calculated for the largest, middle and smallest volume fractions the accuracy and precision as above. We aggregated these results over the three volume fractions within the model into a mean and standard error for both accuracy and precision.
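The generate-and-filter step can be sketched as follows. The sampling scheme here (normalized Gaussian orientations, uniform fractions) is an assumption for illustration, and orientations are treated as antipodally symmetric when computing inter-fiber angles:

```python
import numpy as np

rng = np.random.default_rng(42)

def random_unit_vectors(n):
    # n instances of 3 random fiber orientations in 3D
    v = rng.normal(size=(n, 3, 3))
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def keep_instance(dirs, fracs, min_angle_deg=30.0, lo=0.2, hi=0.8):
    """Reject instances violating the inter-fiber angle or the
    per-fiber volume fraction constraints."""
    for i in range(3):
        for j in range(i + 1, 3):
            # sign-invariant angle between two fiber orientations
            cosang = abs(np.dot(dirs[i], dirs[j]))
            if np.degrees(np.arccos(np.clip(cosang, -1, 1))) < min_angle_deg:
                return False
    return bool(np.all((fracs >= lo) & (fracs <= hi)))

dirs = random_unit_vectors(1000)
fracs = rng.uniform(0, 1, size=(1000, 3))
kept = [k for k in range(1000) if keep_instance(dirs[k], fracs[k])]
```

Rejection sampling of this kind is simpler than constructing constraint-satisfying instances directly, at the cost of discarding most of the generated instances (here roughly 85% in the actual simulations).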
Results
We focus first on the effect of optimization algorithms when applied to the entire diffusion model estimation problem. Subsequently, we focus on initialization strategies by cascaded optimization, dividing the estimation problem into sequential lower dimensional problems.
Effect of optimization algorithm
Fig. 3 shows estimated parameter maps for a single HCP MGH subject (HCP MGH − 1.5mm − 552vol − b10k, subject 1003) for four different models (Tensor, Ball&Sticks_in1, NODDI, CHARMED_in1), each estimated with three optimization algorithms (NMSimplex, Powell, LM). Note that the Tensor model is mainly shown for comparison and is only estimated on the b=1000 s/mm^2 shell. Table 6 shows the corresponding Log Likelihoods (LL), Bayesian Information Criteria (BIC) and run times (RT) for these results, along with the results of a single RLS − 2mm − 134dir − b3k subject. It can be seen that a GPU based implementation gives full model optimization results and estimated whole brain parameter maps in a matter of seconds to minutes, even for complex models and a large dataset. Powell almost always outperforms the other methods in terms of LL and BIC (except for CHARMED_in1 on the RLS dataset). The Powell algorithm shows the most consistently high fitting performance over models and datasets, especially on the complex 552-volume HCP MGH dataset. On the simpler 134-volume RLS dataset the fit is sometimes matched (by NMSimplex in Tensor-RLS and by LM in Ball&Sticks_in1-RLS) or slightly exceeded (by NMSimplex in CHARMED_in1-RLS) by one of the other algorithms. In addition, Powell is often the fastest algorithm, and when it is not, it still fits better than the fastest algorithm. To assess the variability of the achieved fit, Fig. 4 reports the mean differences in fit between algorithms and their standard error of the mean (SEM) over ten HCP MGH and ten RLS subjects. Powell outperforms the other algorithms, except by a very small margin in the case of the CHARMED_in1 model on the RLS data. Error bars show that the variability of the fit differences over subjects is relatively low, with the SEM generally at about ten percent of the fit difference.
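The LL and BIC columns of Table 6 are related through the standard definition BIC = k·ln(n) − 2·LL, with n the number of observations (signal volumes) and k the number of free model parameters. For example, for Ball&Sticks_in1 (k = 4) on the HCP MGH data (n = 552):

```python
import math

def bic(log_likelihood, n_observations, n_free_params):
    """Bayesian Information Criterion: k*ln(n) - 2*LL."""
    return n_free_params * math.log(n_observations) - 2 * log_likelihood

# Ball&Sticks_in1, HCP MGH, Powell row of Table 6:
# mean LL of -2985 over 552 observations with 4 free parameters
print(round(bic(-2985, 552, 4)))  # 5995
```

Since n and k are fixed per model and dataset, the BIC ranking of algorithms within one column of Table 6 follows the LL ranking directly; BIC only becomes informative when comparing across models with different k.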
Fig. 3
Models (in columns) against optimization algorithms (in rows) for a single HCP MGH dataset. FA: Fractional Anisotropy; FS: Fraction of stick compartment; FR: fraction of restricted (fraction of intra-cellular/restricted compartment). Each model parameter map (column) is individually scaled.
Table 6
Models (in columns) against optimization algorithms (in rows) for a single HCP MGH and a single RLS dataset. Reported LL and BIC are the mean over a whole brain white matter mask. The run times are over the entire brain mask and are given in h:mm:ss. For the BIC, the number of observations is 552 (HCP MGH) and 134 (RLS) for Ball&Sticks_in1, NODDI and CHARMED_in1, and 104 (HCP MGH) and 44 (RLS) for Tensor; the number of free parameters is 7 (Tensor), 4 (Ball&Sticks_in1), 6 (NODDI) and 11 (CHARMED_in1).
                 Tensor                    Ball&Sticks_in1           NODDI                     CHARMED_in1
                 LL      BIC    RT         LL      BIC    RT         LL      BIC    RT         LL      BIC    RT
HCP MGH
  NMSimplex      −528    1099   0:00:41    −3000   6025   0:01:33    −2795   5627   1:07:50    −2698   5466   2:30:53
  Powell         −514    1072   0:01:08    −2985   5995   0:00:50    −2714   5467   0:12:08    −2691   5452   0:21:17
  LM             −519    1082   0:01:30    −2992   6009   0:00:49    −2759   5557   0:14:48    −2708   5486   0:28:10
RLS
  NMSimplex      −324    682    0:00:07    −1083   2185   0:00:09    −984    1998   0:05:32    −962    1977   0:06:36
  Powell         −324    682    0:00:03    −1081   2182   0:00:03    −976    1980   0:01:23    −963    1979   0:01:19
  LM             −328    689    0:00:28    −1081   2182   0:00:04    −997    2023   0:01:31    −972    1998   0:02:59
Fig. 4
Algorithm Log Likelihood differences per model over 10 HCP MGH (left) and 10 RLS subjects (right). Bars depict the mean LL difference and error bars give the standard error of the mean for the difference over subjects.
To assess accuracy and precision for the different optimization algorithms, Fig. 5 reports ground truth simulation results over three models (Ball&Sticks_in1, NODDI and CHARMED_in1), the two multi-shell direction tables and three simulated SNRs. A general trend of increasing accuracy and precision with SNR can be observed. In general, Powell has a tendency for higher accuracy and precision than the other algorithms over all models, which is more pronounced at higher SNRs and for more complex models. In terms of accuracy, LM can keep up with Powell in Ball&Sticks_in1 and NMSimplex can in the CHARMED_in1 model, but precision is almost always considerably higher for Powell. When comparing acquisition tables for the Powell method, little difference can be seen for Ball&Sticks_in1, the simplest model. For NODDI there seems to be a tendency for increased precision when using the extensive HCP MGH direction table instead of the RLS table at the same SNR.
For CHARMED_in1, a clear increase in accuracy and precision can be observed when going to the more complex HCP protocol, which may reflect the presence of high b-values (>3000 s/mm^2), as previously reported (De Santis et al., 2014).
Fig. 5
Ground truth simulation results for three models (rows) and two multi-shell direction tables (columns) for three SNRs. The unstriped bars represent the accuracy (acc) per optimization algorithm, the striped bars represent the precision (prc), computed over 10^4 realizations per volume fraction. The bars give the mean and the error bars give the standard error computed over seven different simulated volume fractions between 0.2 and 0.8.
Effects of cascaded optimization
We next focus on the effect of different cascading strategies for a very complex model, CHARMED_in3, with three different restricted compartments per voxel and a total of 19 parameters. This is a very difficult model to fit, even on a very large number of volumes (such as the HCP MGH multi-shell direction table), but nonetheless a very useful model for microstructure quantification as it separates multiple fiber orientations per voxel (De Santis et al., 2012). Fig. 6 shows FR maps and multiple local restricted compartment orientations for CS (Cascade S0), CI (Cascade Initialized) and CF (Cascade Fixed). CS optimizes the full 19 parameters in the final cascade step; CI also optimizes the full 19 parameters in the final cascade step, but the orientations are initialized from an earlier Ball&Sticks_in3 cascade step; CF optimizes 13 parameters in the final cascade step, with the 6 restricted orientation parameters fixed from an earlier cascade step. Transverse maps and orientation plots are shown at the level of the centrum semiovale, where three distinct fiber orientations are expected in white matter. Both FR and orientation estimates appear more structured and smooth in CI and CF, compared to CS. Orientation estimates, in particular, show missing identification of the green A-P orientation (corresponding to the superior longitudinal fasciculus) in some places for CS, and to a lesser degree for CI (see white arrow). Table 7 reports run times and fit parameters for models of increasing complexity for a single HCP MGH and a single RLS dataset. The run times and fit parameters show that quantitatively CI and CF fit better than CS. In terms of run time, CF outperforms both CS and CI, especially for the CHARMED_in3 model, by a factor of 3. These results show that a straightforward optimization of a large 19-parameter model, as in CS, achieves a suboptimal fit even when using a large multi-shell direction table.
However, with a good initialization of a subset of the parameters, the fit can be improved considerably. Finally, although the differences are more subtle, fixing (in CF) rather than initializing (in CI) the orientation parameters provides similarly good results, even though the degrees of freedom in the final optimization step have been reduced. For simpler models with fewer parameters (NODDI, CHARMED_in1 and CHARMED_in2), the fit advantages of the CI and CF strategies are almost absent and the speed advantages less pronounced, although speed-ups of 1.5x to 3x are still achieved.
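The three strategies differ only in what the final CHARMED_in3 step inherits from the earlier steps, which can be summarized schematically as follows; `fit_model` is a hypothetical fitting routine (not the actual toolbox API) that accepts initial values and fixed parameters:

```python
def cascade(fit_model, data):
    """Schematic of the CS / CI / CF strategies for CHARMED_in3.
    `fit_model(name, data, init=..., fixed=...)` is a hypothetical
    routine returning a dict of parameter estimates."""
    s0 = fit_model('S0', data)

    # CS: only S0 is passed on; all 19 parameters free in the final step.
    cs = fit_model('CHARMED_in3', data, init=s0)

    # CI / CF first estimate the orientations with Ball&Sticks_in3.
    bs = fit_model('Ball&Sticks_in3', data, init=s0)
    orientations = {k: v for k, v in bs.items()
                    if k.endswith(('theta', 'phi'))}

    # CI: orientations are a starting point but remain free (19 params).
    ci = fit_model('CHARMED_in3', data, init={**s0, **orientations})

    # CF: the 6 orientation parameters are fixed (13 params remain free).
    cf = fit_model('CHARMED_in3', data, init=s0, fixed=orientations)
    return cs, ci, cf
```

The only structural difference between CI and CF is whether the orientation estimates enter the final step as initial values or as constants, which is what reduces the dimensionality (and run time) of the final optimization in CF.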
Fig. 6
Parameter maps (CHARMED_in3, FR map; top row) and local restricted compartment 3D color-coded orientations (bottom row) for different cascading strategies (in columns) for a single HCP MGH subject (1003). The top row shows a thresholded FR map superimposed on a hindered fraction map in gray-scale.
Table 7
Models (columns) against cascading strategies (rows) for a single HCP MGH and a single RLS dataset. Reported LL and BIC are the mean over a whole brain white matter mask. The run times are over the entire brain mask and are given in h:mm:ss. For the BIC, the number of observations is given by the number of volumes in the datasets (552 for HCP MGH and 134 for RLS), while the number of free parameters is 6 (NODDI), 11 (CHARMED_in1), 15 (CHARMED_in2) and 19 (CHARMED_in3).
         NODDI                     CHARMED_in1               CHARMED_in2               CHARMED_in3
         LL      BIC    RT         LL      BIC    RT         LL      BIC    RT         LL      BIC    RT
HCP MGH
  CS     −2714   5467   0:12:08    −2691   5452   0:21:17    −2680   5455   1:03:38    −2677   5474   1:29:34
  CI     −2716   5470   0:11:31    −2689   5447   0:21:02    −2677   5449   1:00:56    −2675   5469   2:07:14
  CF     −2715   5468   0:07:47    −2690   5449   0:13:17    −2678   5451   0:33:44    −2676   5472   1:03:09
RLS
  CS     −976    1980   0:01:23    −963    1979   0:01:19    −962    1997   0:03:51    −961    2015   0:11:12
  CI     −974    1977   0:01:09    −961    1977   0:01:42    −959    1991   0:04:57    −958    2009   0:12:31
  CF     −974    1978   0:00:48    −962    1978   0:00:51    −960    1993   0:02:13    −959    2011   0:03:38
To assess the robustness of the fit improvements, Fig. 7 reports the mean difference in fit between cascading strategies and its SEM over ten HCP MGH and ten RLS subjects. CI and CF always have a higher LL than CS (with the exception of NODDI for the HCP MGH data). More generally, the fit improvement of CI and CF over CS increases with model complexity (i.e. with CHARMED_in2, CHARMED_in3). In addition, the fit of CI is always slightly higher than that of CF, with this fit difference generally being smaller than that separating CS from both of these. Error bars show that the variability of the fit differences over subjects is relatively low.
Fig. 7
Inter-subject differences of the LL for each model and cascading technique over 10 MGH and 10 RLS subjects. The error bars give the standard error of the mean for the inter-subject differences.
To assess accuracy and precision for the different cascading strategies, Fig. 8 reports ground truth simulation results for the two multi-shell direction tables and three SNRs for the most complex CHARMED_in3 model. The cascading strategies CI and CF outperform CS in terms of accuracy and precision, especially for the higher SNRs of 20 and 40. For the more extensive HCP protocol, accuracy and precision for CI and CF are very similar, irrespective of SNR. For the less extensive RLS protocol, CF has a tendency to outperform CI, especially at lower SNRs. Also, the accuracy and precision are clearly higher for the more complex HCP MGH multi-shell direction table.
Fig. 8
Ground truth simulation results for the CHARMED_in3 model for the HCP MGH and RLS multi-shell direction tables for three SNRs. The unstriped bars represent the accuracy (acc) per cascading strategy, the striped bars represent the precision (prc), computed over a total number of 10^6 realizations. The bars give the mean and the error bars give the standard error computed over the volume fractions of the three simulated orientations.
Discussion
Using an efficient GPU based implementation, we show that run time can be removed as a fundamental constraint for the optimization of multi-compartment models, achieving whole brain dataset fits in seconds to minutes. We find that optimization algorithms and multi-step optimization approaches have a considerable influence on performance and stability over subjects and over acquisition protocols. The gradient-free Powell conjugate-direction algorithm was found to outperform the other common algorithms in terms of run time, fit, accuracy and precision. Parameter initialization approaches were found to be relevant especially for more complex models, such as those involving several fiber orientations per voxel. The Powell conjugate-direction algorithm almost always outperforms the other methods in terms of LL and BIC, over multiple models and two different multi-shell acquisition protocols. In addition, Powell is often the fastest algorithm, and when it is not the fastest it still provides a better fit than the faster algorithms. The variability over ten subjects in the fit differences between algorithms is relatively low, with the standard error generally at a third to half of the fitting difference. This shows that the fitting improvement is significant and robust, likely exceeding the variability of the model parameters in experimental subject groups, and therefore possibly of influence on statistical power. The BIC differences between optimization algorithms are of similar size to the BIC differences between models. This means that the choice of optimization algorithm could influence later model comparisons and model rankings. Going beyond data fit and considering accuracy and precision using ground truth simulations, we find that Powell outperforms the other algorithms, which is even more pronounced at higher SNRs and with more complex models.
When comparing accuracy and precision between acquisition protocols for the Powell method, little effect is observed for simpler models such as Ball&Stick_in1 and NODDI, but an increase in accuracy can be observed for CHARMED_in1 when going to the more complex HCP protocol.
Cascaded optimization
Different forms of cascaded optimization have already been applied to diffusion microstructure modeling. For instance, Zhang et al. (2012) and the accompanying NODDI MATLAB toolbox first perform a grid search over a feasible range of parameter values, after which the parameters are optimized twice, once with the fiber orientations and dispersion locked and a second time with all parameters free. De Santis et al. (2016) use a two-step optimization approach in which the number of intra-axonal compartments and their orientations are fixed to the orientations found by the CSD method (Tournier et al., 2007, Jeurissen et al., 2014). We formalize and generalize multi-step optimization here as cascaded model optimization. Results show that CI and CF always have a better fit (higher Log Likelihood) than CS and that the fit improvement of CI and CF over CS increases with model complexity. This shows that with a good initialization of a subset of the parameters, the fit can be improved considerably over one-step optimization (not considering S0, as it is common practice to either work on S0-normalized signal or to estimate and fix it beforehand). In terms of accuracy and precision, CI and CF also outperform CS, especially for the higher SNRs of 20 and 40. Comparing CI and CF for the complex CHARMED_in3 model (19 parameters) on the more extensive HCP protocol, the accuracy and precision for CI and CF are very similar, irrespective of SNR. For the less extensive RLS protocol, CF has a tendency to outperform CI, especially at lower SNRs. These results are interesting given that CF has fewer degrees of freedom in its final cascade step than CI. More degrees of freedom potentially allow for a better fit since the model can better adapt to the given data, which is shown in the superior fit performance of CI in the context of high SNR and the rich HCP MGH acquisition protocol.
However, in the context of few data points or lower SNR, more degrees of freedom are a disadvantage, as shown by the superior accuracy of CF in the context of lower SNRs and the RLS acquisition protocol. Hence, overall, CF has very appealing properties for general use as a cascaded optimization strategy for complex models. Besides providing a good fit, accuracy and precision in many contexts, it is also invariably the fastest cascading strategy, by a large margin. Since the Powell algorithm provided the best performance compared to NMSimplex and LM, we focused here on comparing cascading results using Powell only. Comparisons of cascading strategies using the other optimization algorithms (see Supplementary Table 1) show that the CI and CF strategies can provide considerable benefits for NMSimplex and LM as well, but also that the Powell algorithm continues to outperform NMSimplex and LM when combined with CF and CI. Other initialization strategies, such as random multi-start and initialization with a preliminary grid search, have also been applied to diffusion microstructure model optimization. Comparisons of these initialization strategies to the cascading strategies investigated here, using all optimization algorithms (see Supplementary Table 2), show that considerable fit improvements can indeed be achieved with multi-start and grid search compared to CS for NMSimplex and LM. However, these results are almost invariably inferior to, or merely as good as, the CF and CI results, especially for the Powell algorithm. In terms of the run time/performance tradeoff, the cascading strategies perform vastly better than grid search and multi-start. An additional argument for the use of CF is that it allows combining microstructure parameter maps with tractography results in a structured way (e.g. Bells et al., 2011).
One can use the fiber orientations from a Ball&Sticks_in[n] fit for tractography and subsequently use CF with the same orientations for a CHARMED_in[n] model, ensuring that microstructure parameters mapped onto these tracks agree with the local track orientation. When using CI, or when fitting the tractography orientations and the microstructure models separately, one would not have this assurance. Future research could focus on more ways of initializing or fixing parameters over cascading steps, such as constraining the extra-axonal Tensor compartment in CHARMED to the intra-axonal orientations. One could also use other methods for orientation initialization, such as CSD (Tournier et al., 2007, Jeurissen et al., 2014). Additionally, one can think of cascades where the final model is initialized with the results of more than one model, for example, initializing the orientations with a Ball&Sticks model and initializing a (possible) T1 or T2 compartment from a different model. These extensions fit within the framework proposed here.
Run times
All run times reported here were achieved with a single 2015 AMD Fury X graphics card with a theoretical peak performance of 8.6 teraflops single precision. Higher performing hardware, as well as the use of multiple devices in parallel, should deliver proportional run time gains. In practice, OpenCL driver performance, the number of parallel streaming processors, the streaming processor architecture, and device memory and memory bandwidth (here 4GB and 512GB/s, respectively) also determine computational performance.
Limitations
In this work we compared three optimization algorithms: Nelder-Mead Simplex, Powell's conjugate-direction method and Levenberg-Marquardt, covering a set of commonly used algorithms. Other optimization algorithms could be considered. Since all the tested optimization routines are local optimizers, it would be interesting to also evaluate global optimization algorithms, such as the Self-Organizing Migrating Algorithm (SOMA; Zelinka, 2004), for their capacity to avoid local minima. Markov Chain Monte Carlo (MCMC) sampling of the full Bayesian posterior would also be an interesting extension, since it can avoid local minima and would provide precision or uncertainty estimates for the obtained parameters. However, MCMC sampling differs from the optimization algorithms covered in this work in two ways. First, it has a different goal: it aims to sample the full posterior distribution of the parameters, rather than provide a (maximum likelihood) point estimate of the optimal parameter values. Generating the full posterior inevitably also takes more run time, which is often only justified if more than just the maximum of the posterior is used (e.g. the standard deviation of the posterior). Second, it requires more information. All algorithms in this work are local optimization routines that only require an objective (likelihood) function to perform the minimization. MCMC, on the other hand, additionally requires the user to define a prior probability function and a proposal function for each of the model parameters. Other optimization approaches avoid non-linear optimization altogether by linearizing the optimization problem (Daducci et al., 2015; Novikov, 2015). For instance, AMICO (Daducci et al., 2015) constructs a convex formulation with a single global minimum by choosing a finite dictionary of atoms, and has been successfully applied to the NODDI, ActiveAx (Auría et al., 2015, Daducci et al., 2015) and VERDICT (Bonet-Carne et al., 2016) models.
This different approach comes with its own trade-offs. For instance, the atom dictionary size can increase exponentially for more complex models when interdependent parameters are sampled sufficiently densely (Yap et al., 2015), and it is limited to a Gaussian noise model, which embeds assumptions almost never met in standard multichannel acquisitions (Gudbjartsson and Patz, 1995, Dietrich et al., 2008, Daducci et al., 2015).
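As noted above, MCMC requires a prior and a proposal function in addition to the likelihood that local optimizers use. A minimal Metropolis-Hastings sketch for a single bounded parameter illustrates these extra ingredients; the Gaussian likelihood, flat prior and Gaussian random-walk proposal are toy choices for illustration only:

```python
import numpy as np

def metropolis(log_likelihood, log_prior, proposal, x0, n_samples, rng):
    """Minimal Metropolis-Hastings: beyond the likelihood (all a
    local optimizer needs), the user must also supply a prior and
    a symmetric proposal distribution."""
    x, samples = x0, []
    lp = log_likelihood(x) + log_prior(x)
    for _ in range(n_samples):
        cand = proposal(x, rng)
        lp_cand = log_likelihood(cand) + log_prior(cand)
        # accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < lp_cand - lp:
            x, lp = cand, lp_cand
        samples.append(x)
    return np.array(samples)

# Toy posterior: Gaussian likelihood around 0.6, flat prior on [0, 1]
rng = np.random.default_rng(0)
samples = metropolis(
    log_likelihood=lambda x: -0.5 * ((x - 0.6) / 0.05) ** 2,
    log_prior=lambda x: 0.0 if 0.0 <= x <= 1.0 else -np.inf,
    proposal=lambda x, rng: x + rng.normal(scale=0.05),
    x0=0.5, n_samples=5000, rng=rng)
```

The returned chain approximates the full posterior, so its spread directly yields the per-parameter uncertainty estimates mentioned above, at the cost of many likelihood evaluations per voxel.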
Conclusions and recommendations
When the aim is optimization of a point estimate, rather than sampling a full parameter posterior (e.g. with MCMC), we would advise the use of the Powell algorithm for microstructure modeling in dMRI. For low complexity models of up to about 8 parameters, including NODDI and Ball&Sticks_in1, a simple CS strategy can be used, although a CF strategy can yield run time gains. For intermediate complexity models, such as Ball&Sticks_in3, we recommend using a successive CI cascade over an increasing number of fiber orientations. Finally, CF is recommended for higher complexity models such as CHARMED_in3, yielding run time, fit, accuracy and precision advantages, as well as advantages for combined tractography and microstructure analysis.
Software implementation note
All biophysical compartment models, noise models/likelihood functions, and optimization algorithms were modularly implemented in a Python based, GPU accelerated toolbox (the Maastricht Diffusion Toolbox, MDT), freely available under an open source LGPL license at https://github.com/cbclab/MDT. It comes with a graphical user interface for easy use as an analysis tool. MDT can be used to perform all model optimizations reported in this work, and the modular design of the toolbox allows easy extension with new single compartment models and composite multi-compartment models.
Table 8
Parameter transformations. The diffusivities are in units of m^2/s; all other parameters are dimensionless.
Parameter   Model (& compartment)     Encoding transformation                                     Decoding transformation
d||         Tensor                    d̂|| = asin(√(d|| / (1*10^−8)))                              d|| = sin^2(d̂||)·1*10^−8
d⊥1         Tensor                    d̂⊥1 = asin(√(d⊥1 / (1*10^−8)))                              d⊥1 = sin^2(d̂⊥1)·1*10^−8
d⊥2         Tensor                    d̂⊥2 = asin(√(d⊥2 / (1*10^−8)))                              d⊥2 = sin^2(d̂⊥2)·1*10^−8
ψ           –                         ψ̂ = |ψ| mod π                                               ψ = |ψ̂| mod π
θ           –                         θ̂ = |θ| mod π                                               θ = |θ̂| mod π
ϕ           –                         ϕ̂ = |ϕ| mod π                                               ϕ = |ϕ̂| mod π
κ           –                         κ̂ = acos(√((κ − 1*10^−5) / (2π − 1*10^−5)))                 κ = cos^2(κ̂)·(2π − 1*10^−5) + 1*10^−5
d||         CHARMED (Tensor)          d̂|| = asin(√((d|| − 1*10^−9) / (5*10^−9 − 1*10^−9)))        d|| = sin^2(d̂||)·(5*10^−9 − 1*10^−9) + 1*10^−9
d⊥1         CHARMED (Tensor)          d̂⊥1 = asin(√((d⊥1 − 3*10^−10) / (5*10^−9 − 3*10^−10)))      d⊥1 = sin^2(d̂⊥1)·(5*10^−9 − 3*10^−10) + 3*10^−10
d⊥2         CHARMED (Tensor)          d̂⊥2 = asin(√((d⊥2 − 3*10^−10) / (3*10^−9 − 3*10^−10)))      d⊥2 = sin^2(d̂⊥2)·(3*10^−9 − 3*10^−10) + 3*10^−10
d           CHARMED (CHARMED_in)      d̂ = asin(√((d − 3*10^−10) / (3*10^−9 − 3*10^−10)))          d = sin^2(d̂)·(3*10^−9 − 3*10^−10) + 3*10^−10
η           CHARMED (Johnson noise)   η̂ = acos(√(η / (1*10^6)))                                   η = cos^2(η̂)·1*10^6
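The sine-squared construction in Table 8 maps a box constraint [lb, ub] to an unconstrained variable, since sin^2 of any real number falls in [0, 1]. A generic sketch (with the CHARMED intra-axonal diffusivity bounds from Table 8 as the example):

```python
import numpy as np

def encode(x, lb, ub):
    """Map a parameter value in [lb, ub] to an unconstrained variable
    (the asin-sqrt form of the transforms in Table 8)."""
    return np.arcsin(np.sqrt((x - lb) / (ub - lb)))

def decode(x_enc, lb, ub):
    """Map any real value back into [lb, ub] via the sin^2 transform."""
    return np.sin(x_enc) ** 2 * (ub - lb) + lb

# CHARMED intra-axonal diffusivity bounds: [3e-10, 3e-9] m^2/s
lb, ub = 3e-10, 3e-9
d = 1.2e-9
d_roundtrip = decode(encode(d, lb, ub), lb, ub)
```

This lets the unconstrained optimization algorithms of this work search over all of the real line while the decoded parameter value always stays within its physically plausible bounds.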
Table 9
The initialization in a Cascade S0 (CS).
Target model         Source model   Cascade parameter inits
Ball&Sticks_in[n]    S0             S0.s0
CHARMED_in[n]        S0             S0.s0
Tensor               S0             S0.s0
NODDI                S0             S0.s0
Table 10
The initializations in a Cascade Initialize (CI) cascade.
Target model         Source model           Cascade parameter inits
Ball&Sticks_in1      S0                     S0.s0
Ball&Sticks_in[n]    Ball&Sticks_in[n-1]    S0.s0, Stick[n].{θ,ϕ}, wstick{0,1,2,…n}
CHARMED_in[n]        Ball&Sticks_in[n]      S0.s0, CHARMED_in[n].{θ,ϕ} = Stick[n].{θ,ϕ}, Tensor.{θ,ϕ} = Stick0.{θ,ϕ}, w_res{0,1,2,…n} = wstick{0,1,2,…n}
Tensor               Ball&Sticks_in1        S0.s0, Tensor.{θ,ϕ} = Stick0.{θ,ϕ}
NODDI                Ball&Sticks_in1        S0.s0, NODDI_in.{θ,ϕ} = Stick0.{θ,ϕ}, wcsf = wball, win = wstick0/2, wex = wstick0/2
Table 11
The fixes overriding the initializations in a Cascade Fixed (CF) cascade.