Literature DB >> 32048764

A novel gamma GLM approach to MRI relaxometry comparisons.

Rohan Kapre^1,2, Junhan Zhou³, Xinzhe Li¹, Laurel Beckett², Angelique Y Louie^1,3.

Abstract

PURPOSE: To demonstrate that constant coefficient of variation (CV), but nonconstant absolute variance in MRI relaxometry (T1 , T2 , R1 , R2 ) data leads to erroneous conclusions based on standard linear models such as ordinary least squares (OLS). We propose a gamma generalized linear model identity link (GGLM-ID) framework that factors the inherent CV into parameter estimates. We first examined the effects on calculations of contrast agent relaxivity before broadening to other applications such as analysis of variance (ANOVA) and liver iron content (LIC).
METHODS: Eight models including OLS and GGLM-ID were initially fit to data obtained on sulfated dextran iron oxide (SDIO) nanoparticles. Both a resampling simulation on the data as well as two separate Monte Carlo simulations (with and without concentration error) were performed to determine mean square error (MSE) and type I error rate. We then evaluated the performance of OLS/GGLM-ID on R1 repeatability and LIC data sets.
RESULTS: OLS had an MSE of 4-5× that of GGLM-ID as well as a type I error rate of 20-30%, whereas GGLM-ID was near the nominal 5% level in the relaxivity study. Only OLS found statistically significant effects of MRI facility on relaxivity in an R1 repeatability study, but no significant differences were found in a resampling, whereas GGLM was more consistent. GGLM-ID was also superior to OLS for modeling LIC.
CONCLUSIONS: OLS leads to erroneous conclusions when analyzing MRI relaxometry data. GGLM-ID factors in the inherent CV of an MRI experiment, leading to more reproducible conclusions.

Entities: Chemical Disease Gene Species

Keywords: MRI relaxometry; coefficient of variation; gamma GLM; relaxivity; repeatability

Mesh：

Substances：
Contrast Media
Iron

Year: 2020 PMID： 32048764 PMCID： PMC7317199 DOI： 10.1002/mrm.28192

Source DB: PubMed Journal: Magn Reson Med ISSN： 0740-3194 Impact factor: 4.668

INTRODUCTION

MRI relaxometry is used increasingly as a method to quantify differences in MRI images.1, 2, 3 Relaxometry involves mapping of relaxation times T 1, T 2, (or alternatively, the relaxation rates R 1, R 2, corresponding to their inverses) spatially across voxels in an MRI image. Compared to traditional MRI imaging based on T 1‐ or T 2‐weighted signal intensity, MRI relaxometry is more robust for quantitative comparison as it is less sensitive to nuisance parameters such as coil positioning or scanner variabilities.1, 2 Among other applications, relaxometry has been used to detect brain abnormalities1, 4 and myocardial fibrosis5 or evaluate metabolic imaging,3 liver iron content (LIC),6, 7, 8, 9 and contrast agent (CA) properties.10, 11, 12, 13, 14 As MRI relaxometry becomes more widespread, there is an increasing need to develop an appropriate statistical framework to accurately quantify differences in relaxometric biomarkers, and correlate these differences to physiological phenomenon. Currently, most comparisons are performed using t‐tests, analysis of variance (ANOVA), or ordinary least squares (OLS) regression15, 16, 17 that all fall under the category of standard linear models.15, 18 In fact, both Student’s t‐test and ANOVA are special cases of OLS regression with categorical variables.18, 19 Beyond the linearity of the relationship, OLS has three key assumptions18, 19: The response variables (relaxation rate or time) must be independent and identically distributed. The model residuals must follow a normal distribution. The model residuals must all have a constant absolute variance (homoscedasticity). Assumption 1 is automatically satisfied for relaxometry data from different scans and subjects. Otherwise, correlation can be accounted for through linear mixed models. Estimates from OLS are also somewhat robust to violations of assumption 2, especially for large sample sizes. However, assumption 3 is often violated in the context of MRI relaxometry data, where the absolute error in the relaxation time (T 1, T 2, or ) or rate (R 1, R 2, ) often tends to increase with the measured value itself (heteroscedasticity).6, 16 Much of the literature on reproducibility in MRI relaxometry is focused on measuring the coefficient of variation (CV),6, 20, 21, 22 which implies a constant relative error rather than absolute error. Various factors can influence the CV of T/R, ranging from the signal intensity fitting error, the exact pulse sequence used, the flip angle, concentration errors, as well as natural biological variability.2, 23 Violations of assumption 3 ultimately end up leading to less precise estimates for coefficients in a linear model and more importantly, biased standard errors (SEs).18 These biased SEs will result in invalid P‐values for comparisons, leading to potentially erroneous conclusions.24 Sometimes the assumptions of standard linear models are bypassed using nonparametric tests such as Wilcoxon rank‐sum‐test3 or the Kruskal‐Wallis test.25 However, these two tests are limited in that they can only be applied to compare two populations or multiple levels within a single factor, respectively. Additionally, nonparametric methods are less powerful at detecting differences than parametric counterparts provided the data can be described adequately by a normal or other parametric distribution.26, 27 Ideally, the best parametric method would assume constant CV rather than constant absolute error. One way to do so is the log transform, which is known to stabilize absolute errors that have a constant CV. This amounts to assuming that relaxometry data are lognormally distributed,22 rather than normally distributed. In fact, the log transform has been used occasionally for both linear mixed models4, 5 and LIC calibration curves.8, 9, 28 Recently, the gamma distribution has also been used to account for the constant CV in neuroimaging relaxometry data.29 In the context of regression, the gamma distribution can be used through a generalized linear model (GLM) rather than OLS.19, 30, 31 Although results will often be similar to a log‐transform,19, 31, 32 the gamma GLM (GGLM) framework is more flexible as it preserves the physical additivity of relaxation rates15 and simple interpretation of the results on the original untransformed scale. The simplest example of the linear model in MRI relaxometry occurs in experiments to determine relaxivity (mM−1 s−1), a measure of the strength of a CA. A higher relaxivity implies that the CA is more effective at increasing the relaxation rate of protons surrounding its metal‐ion sphere.33, 34, 35 Using the Solomon–Bloemborgen theory of relaxation, it can be shown that relaxation rate is a linear function of CA concentration33, 34, 35: To determine relaxivity and associated SE, investigators typically perform an OLS regression of versus and take the slope.13 corresponds to the residual error term in the fit.15, 18, 19 Relaxivity studies are prime examples of heteroscedasticity where the error bars tend to increase in size with the relaxation rate.10, 11, 12, 13 Currently, statistically suspect comparisons of relaxivities under different experimental conditions (e.g., field strength) or between CAs are regularly performed and displayed in bar charts or tables.10, 11, 12, 13 To date, there is no standard protocol for how to best design and analyze a relaxivity experiment.36 Similar problems have been addressed previously in UV‐Vis spectroscopy37 and liquid chromatography‐mass spectrometry (LC‐MS)38 through weighted least squares (WLS), which has also been applied on occasion to relaxivity.12, 39 Although WLS is a potential solution, the GGLM approach proposed in this work is more flexible as it assigns weights empirically by an iterative reweighted least squares algorithm.31 Therefore, it is more easily extended to MRI relaxometry experiments involving more complex situations other than relaxivity, such as developing quantitative imaging biomarkers,22 where exact weights may be unknown. In this work, relaxivity measurements will be used as a starting basis for demonstrating the superiority of weighting, log‐transform, and in particular, the GGLM framework over the current standard of OLS. Afterward we will move toward more general applications of GGLM in MRI relaxometry, such as ANOVA and the measurement of LIC.

METHODS

NMR relaxometry measurements

Sulfated dextran iron oxide (SDIO) nanoparticles were prepared as described in previous protocols published by our group40, 41 (Supporting Information Figure S1). Iron content was determined via inductively coupled plasma mass spectrometry. Ten solutions of SDIO in filtered deionized water (milliQ Advantage A10) with the following [Fe] concentrations were prepared: 0–0.08 mM spaced by 0.02 mM, and 0.1–0.5 mM spaced by 0.1 mM. T 2 measurements were acquired with the Carr‐Purcell‐Meiboom‐Gill pulse sequence on a Bruker Minispec mq60 (1.41 T) at 37°C. Gain was set to 57 dB and the number of scans was set to four for each individual measurement. The entire procedure of preparing the ten solutions and obtaining T 2 measurements was repeated four times, for a total of 40 data points.

Relaxivity model fitting

All statistical analysis was performed in R version 3.6.0.42 Along with OLS, a variety of models were fit to the data. The linear models included WLS, GGLM‐ID, GGLM‐INV, and Thiel‐Sen (TS) fit directly to Equation 1. The weights for WLS were defined to be , corresponding to the appropriate weight for a constant CV in R 2. GGLM‐ID and GGLM‐INV refer to gamma GLMs with the identity and inverse links, respectively. The identity link corresponds to applying the gamma distribution to R 2 = 1/T 2 values directly, whereas the inverse link applies the gamma distribution to T 2 values followed by inversion to obtain R 2.19, 30 TS is a nonparametric linear fitting method using the mblm package (version 0.12.1, for R)43 involving the median of all possible slopes from pairs of points44 at different concentrations and has been recommended once for relaxivity measurements.45 The nonlinear fits included nonlinear least squares (NLS), nonlinear weighted least squares (NWLS), and log‐transformed nonlinear least squares (LNLS). Both NLS and NWLS were fit to Equation 3 corresponding to inverse of Equation 1, whereas LNLS was fit to Equation 4 corresponding to the log transform of Equation 1. The weights for NWLS were set to . Note that NLS just as OLS does not account for heteroscedasticity and has been included only for comparison purposes, and not as a proposed model. The variance function was estimated via a Bayesian log‐log linear regression of SD T 2 versus mean T 2 46 at each concentration. A Bayesian framework was used to obtain a distribution for the slope in the variance function regression with a normal prior of mean 1 and SD 0.3. This accounts for the prior knowledge that the slope will be near 1 because of an approximately constant CV. Furthermore, using a Bayesian framework allows us to obtain a distribution for the slope in the variance function that is not strictly possible in a purely frequentist approach.

Restricted bootstrap resampling

To simulate batch to batch variability, one point at each concentration was randomly chosen and each of the models were fit to the data to obtain a relaxivity and associated SE. The process was repeated 100,000 times to obtain a histogram of possible relaxivities. These individual relaxivities and SE were compared to the full relaxivity obtained from fitting all 40 points to determine the type I error rate via a t‐test. The SE estimate for each individual fit was also compared to the SD of the relaxivity histogram. Furthermore, the mean square error (MSE)47 for each model was computed using the bias of the mean bootstrap relaxivity with respect to the full relaxivity and the SD of the relaxivity histogram:

Relaxivity Monte Carlo simulation

Two Monte Carlo simulations each with 100,000 runs were performed using the above concentrations assuming a true relaxivity of 100 mM−1 s−1. A Gaussian error with a 2.5% CV was added to T 2 in the first simulation while the second simulation additionally included a Gaussian error with 10% CV in concentration. To account for deviations from perfect linear proportionality in the error in T 2, the error was set to be constant CV of , with P set to originate from the posterior normal distribution with slope value and associated uncertainty from the variance function determined earlier. As in the resampling, the MSE and type I error rates were computed.

Data analysis: R 1 repeatability study ANOVA

To demonstrate an application of the GGLM‐ID framework, we analyzed published relaxometry data at 7T from Waterton et al20 on the repeatability/reproducibility of R 1 measurements in NiCl2 phantoms across multiple MRI facilities. The concentrations of Ni2+ tested in the study were 0.50, 1.04, 2.02, 4.08, and 8.05 mM in agarose. The R 1 measurements were performed at five different facilities (A, C, E, F, G1) and three ROIs (O, X, Z) with saturation recovery using a rapid acquisition with relaxation enhancement (RARE) sequence. For this analysis, only the isocenter ROI “O” was considered (to ensure independence) and ANOVA was be performed using OLS, GGLM‐ID, and LNLS frameworks with facility and concentration interaction terms considered. The −1, 0, 1 indicator variable coding with −1 denoting facility G1 was used, shown in Equations 7 and 8. Akaike Information Criterion (AIC) was used to compare the models in addition to an ANOVA F‐test with respect to the null model containing only concentration and solvent (agarose) relaxation rate. Bonferroni’s post‐hoc test was used to evaluate statistically significant differences in relaxivity between the individual facilities using the emmeans package.48

R 2‐LIC Monte Carlo simulations

One medical application of the GGLM‐ID framework is the estimation of LIC via MRI relaxometry.6, 7 The original study done by St. Pierre et al used a spin‐echo sequence and found a complex nonlinear relationship between R 2 and LIC7, 9: The study noted significant heteroscedasticity with CV for R 2 given as 8%. The CV for LIC was piecewise‐dependent: 19% for LIC < 4 mg/g and 9% for LIC > 9 mg/g.7, 8 Equation 10 and the above CVs were used to perform two simulations (10,000 runs) with lognormal error in LIC and with lognormal error in both R 2 and LIC. LIC values in between 4 and 9 mg/g were set to have a random CV uniformly distributed between 9% and 19%. The models considered are shown in Table 1. The true R 2 ranged from 10–275 s−1, translating to a true LIC of 0.05–39.55 mg/g. For all models, the regression was defined with LIC on the y‐axis and R 2 on the x‐axis. The reasoning for this is that the LIC is more variable than the R 2 values from MRI, and the goal is to predict LIC and not R 2. For simplicity, we will only be examining polynomial models as the nonlinear fitting method was not reported in the original paper by St. Pierre et al.7 The models were evaluated by examining the model with the minimum median RMS percent prediction error (MRMSPPE):

Table 1

Models considered for the parametric R 2‐LIC simulation based on the equation derived by St. Pierre et al7 (Equations 10 and 11)

Model name	Functional form	Method	# of Parameters
Quadratic	LIC=β0+β1R2+β2R22	OLS	3
Quadratic	LIC=β0+β1R2+β2R22	GGLM‐ID	3
Double log linear	logLIC=β0+β1logR2	OLS	2
Double log linear	logLIC=β0+β1logR2	GGLM‐LOG	2

The exact nonlinear fitting method performed by St. Pierre et al.7 was is not given, but we used the equation to simulate the data for MRMSPPE calculation.

Models considered for the parametric R 2‐LIC simulation based on the equation derived by St. Pierre et al7 (Equations 10 and 11) The exact nonlinear fitting method performed by St. Pierre et al.7 was is not given, but we used the equation to simulate the data for MRMSPPE calculation. The median helped to ensure that any outliers in the 10,000 runs did not affect the results. To further quantify the uncertainty in the MRMSPPE, we reported 95% percentile intervals. Equation 10 with the squared terms was used rather than absolute values, because these have better properties and more closely match the maximum likelihood estimate corresponding to a heteroscedastic Gaussian distribution for LIC. Finally, the OLS and GGLM‐ID quadratic fits were plotted to an approximation of the R 2‐LIC data obtained using PlotDigitizer.49 To help quantify and visualize variability in the calibration curve, we also added the same lognormal error structure mentioned earlier for both R 2 and LIC directly to the approximated points. Following this, we generated 95% prediction intervals to depict variability in the calibration under different realizations of the same data set.

RESULTS

The results of the SDIO relaxivity experiment are shown in Figure 1. As evidenced by the substantially lower AIC, Figure 1 suggests that the GGLM‐ID framework provided a better fit to the data than OLS. Although adjusted R 2 does not exist for GLMs and is not the ideal metric to assess fit despite widespread use, a pseudo‐adjusted R 2 was calculated for the purposes of data reporting using the residual and null deviance and degrees of freedom.19 Furthermore, the spread in R 2 increased with R 2 itself. To determine the dependence, we performed a log‐log Bayesian linear regression of log(SD(T 2)) versus log(T 2) (Supporting Information Figure S2). The Bayesian linear regression for the variance function revealed that the slope has a 95% probability of being between 0.94 and 1.41. This indicates that SD(T 2) may not be exactly proportional to T 2 but to a power in T 2 between 0.94 and 1.41.

Figure 1

SDIO R 2 relaxivity fits using OLS and GGLM‐ID. The lower AIC suggests that GGLM‐ID provides a better fit. Adjusted R 2 for the OLS fit was 0.995, whereas the pseudo‐adjusted R 2 for the GGLM‐ID fit was 0.996. All eight models were fit to the full data but only OLS and GGLM‐ID are shown in the figure Because many relaxivity studies do only one trial, one point was resampled at each concentration and fit to all models to assess model performance with MSE and type I error rates. Results are shown in Table 2 and a histogram of the resampled relaxivities is shown in Figure 2. Histograms for the estimated SE and variance are shown in Supporting Information Figure S3.

Table 2

Resampling results (100,000 iterations)

Method	Full r2 (mM⁻¹s⁻¹)	Resample r2¯ (± Bias) (mM⁻¹ s⁻¹)	Mean predicted SE (mM⁻¹ s⁻¹)	Resample SD	P(SEpred<SD) (%)	Type I error rate (%)	MSE (mM⁻²s⁻²)
OLS	93.655	93.664 (+0.009)	1.809	2.922	94.25	28.16	8.54
NLS	90.447	90.260 (−0.188)	1.696	4.890	100	49.35	23.95
WLS	91.750	91.556 (−0.194)	1.776	1.856	58.55	5.47	3.48
NWLS	92.172	92.284 (+0.112)	1.768	1.819	55.44	5.34	3.32
LNLS	91.950	91.922 (−0.028)	1.771	1.834	56.66	5.39	3.37
GGLM‐INV	91.881	91.800 (−0.081)	1.776	1.841	56.87	5.39	3.39
GGLM‐ID	92.021	92.043 (+0.022)	1.766	1.829	56.56	5.38	3.34
TS	92.076	92.830 (+0.754)	2.111	2.869	84.36	18.96	8.80

NWLS, LNLS, and GGLM‐ID perform best as assessed by minimum MSE. NWLS, GGLM‐INV, and GGLM‐ID have the lowest type I error rates closest to the ideal 5%. OLS has very high type I error rate as the predicted SE nearly always underestimates the true SD in the column P(SEpred < SD).

Figure 2

Histograms of the relaxivity resampling distributions from each method. OLS relaxivity is bimodal and asymmetric. WLS, NWLS, LNLS, GGLM‐INV, and GGLM‐ID relaxivities are all normally distributed. TS does not appear to be viable in practice

Resampling results (100,000 iterations) Mean predicted SE (mM−1 s−1) (%) NWLS, LNLS, and GGLM‐ID perform best as assessed by minimum MSE. NWLS, GGLM‐INV, and GGLM‐ID have the lowest type I error rates closest to the ideal 5%. OLS has very high type I error rate as the predicted SE nearly always underestimates the true SD in the column P(SEpred < SD). Histograms of the relaxivity resampling distributions from each method. OLS relaxivity is bimodal and asymmetric. WLS, NWLS, LNLS, GGLM‐INV, and GGLM‐ID relaxivities are all normally distributed. TS does not appear to be viable in practice As shown in Table 2, OLS had the third worst MSE and second worst type I error rate of 28.23%. The SE from the OLS fit underestimated the true SD. The OLS histogram (Figure 2) was asymmetric, suggesting that relaxivity from OLS was poorly behaved and not amenable to the usual comparisons that are widely performed. Fits that account for the heteroscedasticity because of constant CV all performed better, with WLS, NWLS, LNLS, GGLM‐INV, and GGLM‐ID being nearly equivalent. However, WLS was slightly biased in the resampling. TS did not appear to be viable despite being a nonparametric method without assumptions. Overall, the resampling simulations suggest that NWLS, LNLS, and both GGLMs perform much better than OLS as assessed by MSE and type I error (bolded). To further investigate the optimal model, we performed Monte Carlo simulations with and . Results are shown in Table 3. The first simulation (Table 3, condition a) was done accounting for only 2.5% error in T 2 whereas the second simulation (Table 3, condition b) additionally accounted for a 10% error in concentration. The inclusion of concentration error allowed us to determine which fitting method is best in a practical sense.

Table 3

Results of relaxivity simulation (100,000 iterations)

Method	SDa (mM⁻¹ s⁻¹)	Biasa (mM⁻¹ s⁻¹)	MSEa (mM⁻² s⁻²)	P (type I error) (%)	SDb (mM⁻¹ s⁻¹)	Biasb (mM⁻¹ s⁻¹)	MSEb (mM⁻² s⁻²)	P (type I error) (%)
OLS	1.115	0.021	1.244	21.68	7.165	0.048	51.346	26.25
NLS	1.642	0.010	2.695	44.18	6.263	−0.495	39.473	34.97
WLS	0.639	−0.031	0.410	4.35	3.605	−1.730	15.988	6.71
NWLS	0.642	0.057	0.416	4.40	3.434	0.884	12.572	5.00
LNLS	0.640	0.013	0.410	4.34	3.450	−0.413	12.076	5.01
GGLM‐INV	0.639	−0.002	0.409	4.34	3.486	−0.852	12.880	5.32
GGLM‐ID	0.640	0.027	0.411	4.36	3.430	0.022	11.768	4.90
TS	0.738	−0.002	0.545	8.85	4.977	−0.010	24.768	10.13

The true relaxivity used to generate the T 2 data was with intercept . A normally distributed T 2 error where was added to the ground truth. The SD, bias, and MSE of the estimated relaxivity is shown along with the type I error probability for a two‐tailed t‐test against the true relaxivity. P also originated from a normal distribution with mean 1.18 and SD 0.12, based on the Bayesian linear fit of log(SD(T 2)) versus log(T 2).

No additional concentration errors, T 2 error only.

With 10% concentration errors in addition to T 2 error. NWLS, LNLS, and both GGLMs have type I error rates robust to random concentration errors. GGLM‐ID is slightly better under concentration errors than NWLS, LNLS, and GGLM‐INV as assessed by MSE. The WLS relaxivity appears to acquire a slight bias under concentration error. OLS has very high type I error rates of 20–30%.

Results of relaxivity simulation (100,000 iterations) The true relaxivity used to generate the T 2 data was with intercept . A normally distributed T 2 error where was added to the ground truth. The SD, bias, and MSE of the estimated relaxivity is shown along with the type I error probability for a two‐tailed t‐test against the true relaxivity. P also originated from a normal distribution with mean 1.18 and SD 0.12, based on the Bayesian linear fit of log(SD(T 2)) versus log(T 2). No additional concentration errors, T 2 error only. With 10% concentration errors in addition to T 2 error. NWLS, LNLS, and both GGLMs have type I error rates robust to random concentration errors. GGLM‐ID is slightly better under concentration errors than NWLS, LNLS, and GGLM‐INV as assessed by MSE. The WLS relaxivity appears to acquire a slight bias under concentration error. OLS has very high type I error rates of 20–30%. The Monte Carlo Simulation results in Table 3 confirmed that OLS is a suboptimal method to determine relaxivity and perform comparisons. The MSE for OLS relative to the bolded methods was roughly three times higher without concentration errors and four times higher with 10% concentration errors. Therefore, the estimate from OLS had poor precision/reproducibility. For making statistical comparisons between CAs, the type I error rate for OLS was roughly 22% and increased to 26% when concentration errors were considered. This indicates that the probability of wrongly concluding two relaxivities are statistically different is 20–30% with OLS at a nominal significance level of 5%. Many of the proposed fits to account for heteroscedasticity were similar when no concentration error was considered. The type I errors were close to the nominal 5% level of significance. However, interestingly, concentration errors appeared to worsen the performance of WLS the most relative to NWLS, LNLS, and both GGLMs. This is in line with the resampling results from earlier. Furthermore, the Monte Carlo simulation with concentration error appeared to indicate that the identity link for GGLM performs better than the inverse link for practical biochemical situations. To showcase an application of the GGLM‐ID framework to more complex linear models in MRI relaxometry, we performed ANOVA with OLS, GGLM‐ID, and LNLS to the R 1 repeatability and reproducibility study data at 7 T from Waterton et al.20 A plot of R 1 versus concentration in each facility can be found in Supporting Information Figure S4 and clearly shows variance increasing with R 1. The −1, 0, 1 coding was used for each facility dummy variable, with facility G1 assigned −1. Results are shown in Table 4.

Table 4

Linear model results for Waterton et al20 data

Coefficient	OLS	GGLM‐ID	LNLS
Intercept R1,0s-1	0.401 ± 0.0159***	0.381 ± 0.004***	0.381 ± 0.005***
Mean r1mM-1s-1	0.840 ± 0.004***	0.849 ± 0.004***	0.849 ± 0.004***
ΔR1,0,As-1	0.056 ± 0.032	0.012 ± 0.010	0.012 ± 0.010
ΔR1,0,Cs-1	−0.019 ± 0.032	−0.010 ± 0.010	−0.010 ± 0.010
ΔR1,0,Es-1	−0.0033 ± 0.032	0.0014 ± 0.010	0.0015 ± 0.010
ΔR1,0,Fs-1	−0.032 ± 0.032	−0.010 ± 0.010	−0.010 ± 0.010
Δr1,AmM-1s-1	−0.031 ± 0.008***	−0.012 ± 0.007	−0.013 ± 0.007
Δr1,CmM-1s-1	0.009 ± 0.008	0.006 ± 0.007	0.006 ± 0.007
Δr1,EmM-1s-1	−0.011 ± 0.008	−0.013 ± 0.007	−0.012 ± 0.007
Δr1,FmM-1s-1	0.027 ± 0.008**	0.017 ± 0.007*	0.017 ± 0.008*
ANOVA P‐value	4.2 × 10⁻⁴***	0.15	0.15
AIC_Full	−107.33	−170.09	−251.41^a
AIC_Null	−90.22	−172.23	−253.42^a

OLS suggests effect of facilities is statistically significant in the model, whereas GGLM/LNLS suggests it is not. AIC values for GGLM indicate better fit relative to OLS. Results of the 10 Bonferroni corrected pairwise comparisons between relaxivities measured at each facility can be found in Supporting Information Table S1

aAIC values for LNLS can only be compared to each other because of a different response variable of log(R 1).

P < 0.05.

P < 0.01.

P < 0.001.

Linear model results for Waterton et al20 data OLS suggests effect of facilities is statistically significant in the model, whereas GGLM/LNLS suggests it is not. AIC values for GGLM indicate better fit relative to OLS. Results of the 10 Bonferroni corrected pairwise comparisons between relaxivities measured at each facility can be found in Supporting Information Table S1 aAIC values for LNLS can only be compared to each other because of a different response variable of log(R 1). P < 0.05. P < 0.01. P < 0.001. For the NiCl2 phantom in the study, we found (for the isocenter alone) the mean relaxivity across all facilities to be 0.840 mM−1 s−1 using OLS, and 0.849 mM−1 s−1 using GGLM‐ID/LNLS. ANOVA and minimum AIC both suggested that for GGLM‐ID/LNLS, the model could be simplified to the standard relaxivity model without facility as a factor. Although this is not definitive proof that the MRI facility has no effect on relaxivity (not rejecting the null hypothesis does not prove the null hypothesis), the data analysis with GGLM‐ID/LNLS did not support the conclusion that relaxivity differed across facilities. OLS, however, did suggest that relaxivity between facilities differed by more than chance alone. Bonferroni’s post‐hoc test revealed statistically significant differences in relaxivity (at familywise α = 0.05) calculated by OLS ANOVA between facility A and each of facility C/F/G1 as well as facility E and F (Supporting Information Table S1). However, considering the increased type I error rate of OLS because of the violation of homoscedasticity, this conclusion is suspect and not necessarily replicable. To investigate this, we performed bootstrap (10,000 runs)‐based ANOVA (with OLS/GGLM‐ID) with resampling to generate a null distribution50 (Supporting Information Table S2). No statistically significant differences between facilities were found in the bootstrap for either model, further suggesting that the OLS conclusions are not replicable. Many of the coefficient values and effect sizes (Table 4 and Supporting Information Table S1) themselves were quite different from those calculated through the more accurate GGLM‐ID/LNLS methods. Facility A and facility F, for example, have a relaxivity difference of 0.058 mM−1 s−1 (7%) with OLS but only a 0.029 mM−1 s−1 (3.4%) difference with GGLM‐ID. Furthermore, the A–F relaxivity difference is marked as statistically significant (P = 0.0002) with OLS but not GGLM‐ID (P = 0.135). Although the coefficients from OLS are theoretically unbiased in the long run, they are less efficient,44 and a far greater sample size would be necessary to obtain representative effect sizes. The OLS coefficients are less precise than the GGLM‐ID counterparts and may fail to be reproducible if the study were to be repeated. Interestingly, a benefit of the GGLM‐ID/LNLS frameworks is the ability to estimate the global CV for relaxation rate across all factors in the experiment directly through the summary output of the model in R. For GGLM‐ID, where is the estimated dispersion parameter of the gamma distribution.19 For LNLS, where RSE is the residual SE of the LNLS fit. In the case of OLS, the RSE would be an estimate of the absolute error only if the assumptions were satisfied. Because LNLS uses a log‐transform, the RSE estimates the absolute error in log(R 1), which back transforms to CV of R 1 on the original scale.19, 22 The estimated CV for this data set was 1.74% using GGLM‐ID and 1.76% using LNLS, which roughly agree with each other as well as the isocenter‐only CV of 1.56% reported by Waterton et al.20 So far, this work has focused on relatively simple scenarios in MRI relaxometry such as relaxivity and inter‐facility variation within phantoms. To demonstrate a potential clinical application of GGLM‐ID, we performed simulations based on the LIC calibration curve and CV values derived by St. Pierre et al.7 Results are shown in Table 5.

Table 5

Median MRMSPPE for Table 1 models of LIC versus R 2

Model (# of parameters)	Method	MRMSPPEa (%)	95% CIa (%)	MRMSPPEb (%)	95% CIb (%)
Quadratic (3)	OLS	24.91	(4.75, 61.57)	165.29	(94.34, 240.23)
Quadratic (3)	GGLM‐ID	3.50	(3.13, 4.45)	16.15	(14.74, 18.21)
Double log linear (2)	OLS	10.57	(10.10, 11.35)	18.26	(16.68, 20.24)
Double log linear (2)	GGLM‐LOG	11.39	(10.62, 12.47)	19.40	(17.52, 21.73)

Simulation parameters were generated from a lognormal distribution: and . CVLIC was set to 0.19 for LIC < 4 mg/g and 0.09 for LIC > 9 mg/g. For 4 ≤ LIC ≤ 9, CV was simulated from CV ~ Unif (0.09, 0.19). The ground truth was set using Equation 10 with spaced by 0.5 s−1. The high MRMSPPE for OLS explains why a quadratic model that looks visually appealing may not have been chosen for R 2‐LIC calibration. However, GGLM‐ID makes a quadratic model more reasonable, and further improvements are possible in the future through an errors‐in‐variables GGLM‐ID model

Error in LIC but no errors in R 2.

Including errors in both LIC and R 2.

Median MRMSPPE for Table 1 models of LIC versus R 2 Simulation parameters were generated from a lognormal distribution: and . CVLIC was set to 0.19 for LIC < 4 mg/g and 0.09 for LIC > 9 mg/g. For 4 ≤ LIC ≤ 9, CV was simulated from CV ~ Unif (0.09, 0.19). The ground truth was set using Equation 10 with spaced by 0.5 s−1. The high MRMSPPE for OLS explains why a quadratic model that looks visually appealing may not have been chosen for R 2‐LIC calibration. However, GGLM‐ID makes a quadratic model more reasonable, and further improvements are possible in the future through an errors‐in‐variables GGLM‐ID model Error in LIC but no errors in R 2. Including errors in both LIC and R 2. According to Table 5, GGLM‐ID quadratic fit performed the best of the models examined. The 95% confidence intervals (CIs) were much narrower, and the MRMSPPE was lower in both situations of no error in R 2 and 8% error in R 2. Although the MRMSPPE values appear high in Table 5, one should note the determination of LIC by biopsy itself is only accurate to 9–19%.7, 8 A simple OLS quadratic fit did not perform well and explains why a quadratic model has not been used. Through GGLM‐ID, however, a quadratic relationship between LIC and R 2 is more reasonable. For the purposes of visualization, the quadratic OLS/GGLM‐ID models were plotted to an approximation of the original data obtained via PlotDigitizer49 in Figure 3.

Figure 3

OLS and GGLM‐ID quadratic fits to an approximation of R 2‐LIC data from St. Pierre et al7 PlotDigitizer were used to obtain approximations to the points. The lower AIC suggests that the GGLM‐ID model fits better, which agrees with the MRMSPPE simulation in Table 5. Ninety‐five percent prediction intervals depict fluctuations of the calibration curve under different realizations of the approximated data. These simulations used the same lognormal errors of and The OLS calibration curve tends to underestimate at high R 2/LIC and has larger variability

DISCUSSION

The OLS approach to the analysis of MRI relaxometry data suffers from high variability and biased SEs that can invalidate statistical inference. Although the coefficients are still unbiased, the higher MSE value with OLS implies that, on average, the OLS estimates will be further away from the true value. At a concentration error of 10%, the MSE for GGLM‐ID (~13) is 4–5× lower than that of OLS (~50). This is a substantial improvement and is in fact equivalent to the accuracy one would obtain with a 4–5× increase in sample size using OLS. Therefore, GGLM‐ID is more economical, because fewer runs are required to obtain representative results. The simulations in Table 2 with 10% concentration error demonstrate that at a relaxivity of 100 mM−1 s−1, this error propagates to a large variability of 7% in relaxivity using OLS but only a 3.5% variability using GGLM‐ID. This is very promising, because it indicates greater reproducibility for relaxivity studies. Although 10% concentration error is relatively high, it ultimately shows that the GGLM‐ID method is quite robust to concentration errors. Concentration errors are anticipated for biological samples or proprietary contrast agents where the metal content is typically determined via atomic absorption spectroscopy or inductively coupled plasma mass spectrometry. Interestingly, the simulations suggested that WLS, which is an often‐recommended method to address heteroscedasticity, is not ideal. The MSE remained low, but the estimate was slightly biased in the presence of concentration errors. In the case of GLMs, weighting is done through an iterative technique that is more flexible and does not require the user to specify weights beforehand. This flexibility is likely what enabled GGLM‐ID to be robust against concentration errors up to 10%. Furthermore, the simulations suggest the GGLM‐ID works even when the CV is not perfectly constant in T 2 but instead in with P ~ N (1.18, 0.122). This suggests that GGLM‐ID works under a variety of error structures. As mentioned earlier, the log‐transform is an alternative to address heteroscedasticity that has an approximately constant CV. Our simulations confirm this but suggest that GGLM‐ID may be slightly superior, likely attributed, again, to the iterative weighting algorithm, which tolerates greater deviations from constant CV.51 Some work also suggests that GGLM has more power than a log‐transform to detect differences.31 Clinically, it should be noted that substantial differences are possible if outliers are present.32 Our simulations, which used a heteroscedastic normal distribution, show that the exact distribution of the relaxation time/rate need not follow a perfect gamma distribution. Furthermore, inverting normal T 2 manually creates non‐normal R 2, and therefore GGLM‐ID also appears to be robust to normality deviations. If the variance structure deviates substantially from constant CV, an option worth exploring is that of semiparametric gamma generalized estimating equations (GEE), which was used very recently to investigate the relationship between C‐reactive protein and LIC by MRI.52 GEE is even more robust to variance structure misspecification.31, 53 Furthermore, an inherent advantage over log‐transform LNLS is that the GGLM‐ID does not distort linearity at the cost of stabilizing the variance (Supporting Information Figure S5). Gamma GLMs can readily be extended to existing linear models for statistical parametric mapping whereas a log‐transform would change coefficient interpretation, without a nonlinear fit.15 This is advantageous when there are continuous predictors.15 When variables are categorical, one additional subtle benefit of using GGLM‐ID over a log transform is interpretation. Log‐transforms compare % change in R, whereas GGLM‐ID compares absolute differences. This is important in MRI relaxometry, where the important metric for assessing phenomenon such as CA uptake or cerebral blood volume16 is and not . is a direct surrogate biomarker for CA concentration, whereas can be small at higher concentrations even if is large and vice versa. Relaxation rates are additive on the original scale,15 and therefore the GGLM approach fits better with physical theory. The ANOVA example using data from Waterton et al20 shows that the choice of OLS versus GGLM‐ID has the potential to influence the conclusion of an MRI relaxometry study. It should be noted that the Waterton et al20 study used a very simple design and the concentration errors for their NiCl2 phantom were very low. Although the statistically significant differences in relaxivity detected by OLS ANOVA with post‐hoc Bonferroni test may not necessarily all be practically significant, they serve to highlight the potential for erroneous conclusions. This is evidenced by the bootstrap resampling (Supporting Information Table S2) that was not able to replicate the OLS conclusions. Considering that the conclusions are affected even in a simple scenario, a greater impact can be expected in experiments that contain concentration errors, biological variation, or mixed effects. The conclusions derived from ANOVA using OLS are suspect because of high type I error rates, which could make studies more difficult to replicate.24 When evaluating imaging biomarkers for suitability, OLS cannot adequately separate true effects from both physiological or instrument noise. Furthermore, the study by Waterton et al20 used the same RARE pulse sequence in all the facilities. As mentioned earlier, MRI parameters such as pulse sequence, flip angle, etc. also influence the measured relaxation time/rate and its CV.2, 20, 23 Intuitively, comparing measurements with different acquisition methods will further increase the overall CV. Although not investigated in this study nor by Waterton et al,20 we speculate that using the GGLM‐ID framework for ANOVA has the potential to lead to greater reproducibility of inferences based on MRI relaxometry images even across different acquisition methods. Future studies on large data sets obtained in vivo will need to be conducted to assess this. The example of LIC‐R 2 calibration serves to highlight yet another example where considering a GGLM‐ID framework can potentially improve prediction. The exact nonlinear fitting method used by St. Pierre et al7 is unknown, but GGLM‐ID is much better than a simple OLS quadratic fit. Simulations in Table 4 indicate that GGLM‐ID performs well as assessed by MRMSPPE even in the presence of errors on both axes. This can also be visualized in Figure 3, where the variability of the OLS calibration curve is greater than that of GGLM‐ID under different realizations of the same data. For further improvements, the best approach would be to extend the GGLM framework to a heteroscedastic errors‐in‐variables method.22, 54, 55 The validation of LIC measurements by MRI relaxometry is an ongoing field of study.9 The GGLM approach can also be integrated with previous guidelines for developing imaging biomarkers by the quantitative imaging biomarker alliance (QIBA).22, 56 QIBA recommends a noninferiority/superiority/equivalence testing framework (over standard hypothesis testing) to assess clinical significance.56 Unlike OLS, GGLM can be used to construct accurate 95% CIs for such tests, while avoiding the potential problems in interpreting log‐transforms that QIBA warns about.22 GGLM can also be used to more precisely derive the constant and proportional bias of a new biomarker measurement technique relative to the accepted standard in QIBA’s linear model.22, 56 For example, the presence of Rician noise is known to bias obtained T 1/T 2 values if the usual Levenberg–Marquardt fitting for signal intensity is used.57 GGLM could be used to precisely assess this bias with respect to methods that directly incorporate the Rician noise in T 1/T 2 estimation. Although not discussed extensively in the current work, the GGLM framework can also be extended to gamma generalized linear mixed models (GGLMM) that are useful for analysis of longitudinal or spatially correlated data that often arises in imaging. These models can be readily integrated with QIBA’s proposed testing framework to accurately quantify changes in R over time and ensure that these changes are repeatable and reproducible.

CONCLUSIONS

Although widely used because of simplicity, OLS is a poor method to determine relaxivity. It fails to consider the nonconstant variance of the relaxation time/rate and provides inaccurate estimates of standard errors that invalidate statistical inference. In this work, we have shown that GGLM‐ID provides one of the best estimates of relaxivity in terms of both precision and accuracy. Furthermore, among other proposed fits, GGLM factors in the inherent variability within an MRI experiment and therefore is more representative of the physical phenomenon. For in vivo studies, GGLM‐ID can be used to make unbiased inferences about relaxometry biomarkers on the original scale of the data, unlike a log transform. The GGLM‐ID method can be implemented easily in R with the glm() command, but to make the method more easily adoptable, we have included code for an interactive RStudio interface that can be used for relaxivity, the simplest example of the linear model in MRI relaxometry. Overall, the GGLM‐ID framework serves to unify the analysis of MRI relaxometry data from simple in vitro experiments in solution to complex biological and clinical MRI data on developing quantitative imaging biomarkers.22 FIGURE S1 Synthesis of SDIO nanoparticles.40, 41 Dextran iron oxide (DIO) and sulfated dextran iron oxide (SDIO) nanoparticles were prepared with slight adjustments to the published protocols by our group. In brief, reduced dextran (1 equiv.) and FeCl3 × 6H2O (27 equiv.) were dissolved in degassed deionized ultra‐filtered (DIUF) water and stirred under argon for 30 min at 0–5°C. FeCl2 × 4H2O (18 equiv.) and NH4OH (432 equiv.) were added dropwise to the mixture. After 3 h at 85°C, DIO was purified by dialysis (50 KDa) against deionized water for 72 h. DIO was obtained as a brown powder after lyophilization. DIO (1 equiv.) was dissolved in dry formamide, to which 2‐methyl‐2‐butene was added dropwise under argon. Sulfur trioxide pyridine complex (1 equiv.) was added in the mixture. Then, the mixture was refluxed for 3.5 h at 30°C with stirring and argon bubbling. Saturated NaHCO3 aqueous solution was used to quench the reaction. The product was purified by dialysis (50 KDa) against deionized water for 72 h. SDIO was obtained as a brown powder after lyophilization. Iron content was determined via inductively coupled plasma mass spectrometry (ICP‐MS) FIGURE S2 T 2 variance function. The variance function was determined by a Bayesian linear regression58 via the MCMC algorithm of log(SD T 2) versus log(mean T 2). The default (uninformative) prior was used for the intercept and an N(1, 0.332) prior was set for the slope. The posterior distribution for slope was found to be N(1.18, 0.122). This was used in the Monte Carlo simulations in Table 2 of the main paper that contained mean 0 normally distributed error of FIGURE S3 (A) Resampling histogram of estimated variance of relaxivity for each fit. Variance follows a χ2 distribution and is therefore asymmetric for all fits. (B) Resampling histogram for estimated standard error (SE) of relaxivity for each fit FIGURE S4 7T R 1 data from Waterton et al.20 R 1 data are shown with all facilities pooled together. The data are clearly heteroscedastic, which impacts statistical inference (because of inefficiency and effects on estimated SEs) even if the fits may visually look similar FIGURE S5 Log transform (LNLS) SDIO relaxivity determination. Whereas LNLS provides a precise determination of relaxivity, it distorts the linear relationship to stabilize the variance. Although adequate, this may complicate analysis in more complex models, such as those with including mixed effects and both categorical/continuous predictors TABLE S1 Bonferroni post‐hoc contrasts (computed using emmeans 48) on relaxivities using OLS and GGLM‐ID. The SE for each difference estimated was roughly 0.0106 mM−1s−1 with OLS and roughly 0.0127 mM−1s−1 with GGLM‐ID. OLS indicates statistically significant differences in relaxivity between A and C/F/G1 as well as E and F. The estimated effect sizes for OLS are also different. Conclusions from OLS are suspect because of heteroscedasticity. The % difference in parenthesis is calculated with respect to the mean relaxivity (0.840 mM−1s−1 for OLS and 0.849 mM−1s−1 for GGLM‐ID). 95% CIs are also shown for % differences with respect to mean relaxivity. *P < 0.05, **P < 0.01, ***P < 0.001 TABLE S2 Bootstrap resampling50 P‐values for R 1 repeatability study. Each facility was assigned a random R 1 value (with replacement) among all observed R 1 values at a given concentration. This ensures that in the bootstrap, no statistically significant differences between facilities will exist. This bootstrap OLS and GGLM‐ID were fitted to the data and a null distribution of t‐statistics (or Wald Z, for GGLM‐ID) was generated for both models for all comparisons above. The null distribution of this bootstrap is more robust to heteroscedasticity in R 1 (s−1). The assumption is that the population relaxivity (r 1 mM−1 s−1) standard errors among Facilities are roughly the same,50 which is valid here as the same CA is being tested. The observed t statistic in the original data was compared to this distribution to generate a bootstrap P‐value. Bonferroni correction for 10 tests was applied in the end and no statistically significant differences are apparent for either model. Therefore, the OLS conclusion is not replicable. 10K iterations were performed. *P < 0.05, **P < 0.01, ***P < 0.001 Click here for additional data file.

34 in total

1. Analysis of cost data in randomized trials: an application of the non-parametric bootstrap.

Authors: J A Barber; S G Thompson
Journal: Stat Med Date: 2000-12-15 Impact factor: 2.373

2. Noninvasive measurement and imaging of liver iron concentrations using proton magnetic resonance.

Authors: Timothy G St Pierre; Paul R Clark; Wanida Chua-anusorn; Adam J Fleming; Gary P Jeffrey; John K Olynyk; Pensri Pootrakul; Erin Robins; Robert Lindeman
Journal: Blood Date: 2004-07-15 Impact factor: 22.113

3. Selecting the correct weighting factors for linear and quadratic calibration curves with least-squares regression algorithm in bioanalytical LC-MS/MS assays and impacts of using incorrect weighting factors on curve stability, data quality, and assay performance.

Authors: Huidong Gu; Guowen Liu; Jian Wang; Anne-Françoise Aubry; Mark E Arnold
Journal: Anal Chem Date: 2014-09-04 Impact factor: 6.986

4. MRI as an alternative to serum ferritin for diagnosis of iron overload in children in the context of immune response after stem cell transplantation.

Authors: Georg W Wurschi; Hans-Joachim Mentzel; Karl-Heinz Herrmann; Ines Krumbein; James F Beck; Juergen R Reichenbach; Karim Kentouche
Journal: Pediatr Transplant Date: 2019-09-18

5. Relaxometry and quantification in simultaneously acquired single and triple quantum filtered sodium MRI.

Authors: Wieland A Worthoff; Aliaksandra Shymanskaya; N Jon Shah
Journal: Magn Reson Med Date: 2018-07-29 Impact factor: 4.668

6. Hippocampal abnormalities after prolonged febrile convulsion: a longitudinal MRI study.

Authors: Rod C Scott; Martin D King; David G Gadian; Brian G R Neville; Alan Connelly
Journal: Brain Date: 2003-08-22 Impact factor: 13.501

Review 7. Quantitative imaging biomarkers: a review of statistical methods for technical performance assessment.

Authors: David L Raunig; Lisa M McShane; Gene Pennello; Constantine Gatsonis; Paul L Carson; James T Voyvodic; Richard L Wahl; Brenda F Kurland; Adam J Schwarz; Mithat Gönen; Gudrun Zahlmann; Marina V Kondratovich; Kevin O'Donnell; Nicholas Petrick; Patricia E Cole; Brian Garra; Daniel C Sullivan
Journal: Stat Methods Med Res Date: 2014-06-11 Impact factor: 3.021

8. T1 mapping of the myocardium: intra-individual assessment of post-contrast T1 time evolution and extracellular volume fraction at 3T for Gd-DTPA and Gd-BOPTA.

Authors: Nadine Kawel; Marcelo Nacif; Anna Zavodni; Jacquin Jones; Songtao Liu; Christopher T Sibley; David A Bluemke
Journal: J Cardiovasc Magn Reson Date: 2012-04-28 Impact factor: 5.364

9. Gray Matter Alterations in Early and Late Relapsing-Remitting Multiple Sclerosis Evaluated with Synthetic Quantitative Magnetic Resonance Imaging.

Authors: Christina Andica; Akifumi Hagiwara; Koji Kamagata; Kazumasa Yokoyama; Keigo Shimoji; Asami Saito; Yuki Takenaka; Misaki Nakazawa; Masaaki Hori; Julien Cohen-Adad; Mariko Yoshida Takemura; Nobutaka Hattori; Shigeki Aoki
Journal: Sci Rep Date: 2019-05-31 Impact factor: 4.379