Literature DB >> 31367671

The relative efficiency of time-to-progression and continuous measures of cognition in presymptomatic Alzheimer's disease.

Dan Li¹, Samuel Iddi^1,2, Paul S Aisen¹, Wesley K Thompson³, Michael C Donohue¹.

Abstract

INTRODUCTION: Clinical trials on preclinical Alzheimer's disease are challenging because of the slow rate of disease progression. We use a simulation study to demonstrate that models of repeated cognitive assessments detect treatment effects more efficiently than models of time to progression.
METHODS: Multivariate continuous data are simulated from a Bayesian joint mixed-effects model fit to data from the Alzheimer's Disease Neuroimaging Initiative. Simulated progression events are algorithmically derived from the continuous assessments using a random forest model fit to the same data.
RESULTS: We find that power is approximately doubled with models of repeated continuous outcomes compared with the time-to-progression analysis. The simulations also demonstrate that a plausible informative missing data pattern can induce a bias that inflates treatment effects, yet 5% type I error is maintained. DISCUSSION: Given the relative inefficiency of time to progression, it should be avoided as a primary analysis approach in clinical trials of preclinical Alzheimer's disease.

Entities: Chemical

Keywords: Alzheimer's disease; Bayesian joint mixed-effect model; Clinical trial simulations; Common close design; Cox proportional hazards model; Longitudinal data; Mixed model of repeated measures (MMRM); Statistical power

Year: 2019 PMID： 31367671 PMCID： PMC6656701 DOI： 10.1016/j.trci.2019.04.004

Source DB: PubMed Journal: Alzheimers Dement (N Y) ISSN： 2352-8737

Introduction

Presymptomatic (or preclinical) Alzheimer's disease (PAD) is defined by evidence of abnormal levels of fibrillar amyloid beta (Aβ) in brain as measured by positron emission tomography brain scan or cerebrospinal fluid (CSF) assay [1]. Clinical trials have been initiated in this early phase of disease with the hope that, as in other diseases, early interventions will be more successful in slowing progression [2], [3], [4]. In PAD, progression is typically measured by continuous assessments such as the Preclinical Alzheimer's Cognitive Composite (PACC), a cognitive performance assessment sensitive to amyloid-related decline [5]. An alternative measure of progression is transition from normal cognition to mild cognitive impairment (MCI). The diagnosis of MCI is not algorithmic. It is based on an expert clinician's subjective impression of clinical tests and interviews with participants or study partners. In contrast to cancer progression or death, the cognitive diagnosis (normal or MCI) can vary from one clinician to the next or from one study visit to the next. In a multicenter study, the diagnosis made by a clinician at a trial performance site may be confirmed by experts centrally based on review of assessments without the benefit of direct in-person assessment. Some researchers prefer the inherent clinical meaningfulness of time-to-MCI analysis. Undoubtedly, for a given subject, a transition from normal cognition to MCI is more clinically meaningful than a point change in a continuous cognitive performance measure. However, in a clinical trial, we are still left to determine how large a randomized group difference in the rate of, or delay in, a clinically meaningful event is itself clinically meaningful. The typical Alzheimer's clinical trial assesses cognition at clinic visits conducted every three or six months. With a continuous outcome, the primary contrast is estimated at the last scheduled visit, at approximately 4.5 years. Proponents of time to progression argue that the endpoint allows for a common close design, similar to oncology studies, in which follow-up can continue until the last subject enrolled reaches the 4.5-year visit. The Cox Proportional Hazards model [6] admits data collected under such a design. Linear mixed-effects models can also admit data from a common close design, but assumptions about the mean trend (e.g., quadratic time trends) are necessary, similar to the proportional hazards assumption. Some related work has demonstrated the advantages of analyzing continuous outcomes, when available, over time-to-event outcomes in other contexts. Donohue et al. [7] reviewed the literature and provided an analytic demonstration that, under general conditions, a mixed-effect model comparison of rate of change on a continuous outcome is effectively always more powerful than an analysis of time to threshold. The authors also conducted simulations based on Alzheimer's Disease Neuroimaging Initiative (ADNI) MCI subjects and demonstrated that the marginal linear model and linear mixed models are more robust and efficient than the Cox model of time from MCI to dementia. Our goal is to extend our earlier work in the MCI population [7] to the earlier biomarker-defined PAD population. Specifically, we aim to compare the performance of models of repeated measures of the PACC versus time to progression when evaluating treatment effects in randomized trials and to assess bias due to informative missingness. We also compare the common close design and the fixed follow-up design. We apply the mixed models of repeated measures (MMRMs) [8] for the analysis of change in the PACC score. Constrained longitudinal data analysis (cLDA) models [9] are also used to model the PACC scores, treating time as a continuous variable. Cox proportional hazards model is applied to the time-to-event endpoint.

Data

ADNI is a prospective observational cohort study, led by principal investigator Michael W. Weiner, MD, which is tracking cognitive, imaging, and biofluid markers of Alzheimer's in volunteers diagnosed as cognitively normal (CN), with subjective memory concern, MCI, and mild-to-moderate dementia. To simulate both longitudinal continuous markers and time to MCI for a PAD clinical trial, we first model the disease markers and clinical diagnosis using data from PAD ADNI participants. The PAD population is defined by a diagnosis of CN or subjective memory concern at baseline and florbetapir positron emission tomography standardized uptake value ratio above 1.11 [10] or CSF Aβ below 950.6 pg/ml. The CSF threshold of 950.6 pg/ml was selected because it yields the same proportion of PAD as the 1.11 standardized uptake value ratio threshold. Follow-up observation reports, including a site clinician's diagnosis of CN, MCI, or dementia, are collected every three, six, or 12 months. For more information on the study design of ADNI, including protocols, see adni.loni.usc.edu. Sensitive tests of cognition may show changes in PAD many years before the onset of functional decline [5], [11]. In this work, we focus on the following seven cognitive outcomes in the PAD population: Alzheimer's Disease Assessment Scale delayed word recall (ADASDWR) [12], Logical memory paragraph recall (LogMem) [13], Trail making test part B (Trails B) [14], Mini-Mental State Examination (MMSE) [15], Category fluency—animals, Clinical Dementia Rating—Sum of Boxes (CDRSB) [16], and Functional assessment questionnaire (FAQ). Baseline covariates considered include age and carriage of an apolipoprotein E4 (APOE ε4) allele. The PAD population includes a total of N = 163 individuals, in which N = 39 (23.9%) were observed to progress to MCI over a median follow-up time of 4.0 years (interquartile range: 2.1 to 5.6 years; maximum: 11.5 years). Baseline characteristics of the modeled PAD cohort are presented in Table 1.

Table 1

Variable	NC (N = 120)	SMC (N = 43)	Total (N = 163)
Age	75.21 (5.83)	72.77 (5.78)	74.57 (5.90)
APOE ε4 alleles
0	52 (43)	23 (53)	75 (46)
≥1	111 (57)	140 (47)	88 (54)
ADAS delayed word recall	2.96 (1.79)	3.00 (2.08)	2.97 (1.86)
Logical memory—delayed recall	13.11 (3.15)	12.63 (3.19)	12.98 (3.16)
Trails B	93.40 (48.90)	89.10 (32.00)	92.30 (45.00)
MMSE	29.11 (1.13)	29.09 (0.89)	29.10 (1.07)
Category fluency (animals)	20.72 (5.32)	19.72 (5.60)	20.45 (5.40)
CDRSB
0	111 (92)	36 (84)	147 (90)
0.5	8 (7)	7 (16)	15 (9)
1	1 (1)	0 (0)	1 (1)
FAQ
0	108 (90)	32 (74)	140 (86)
1	7 (6)	8 (19)	15 (9)
2	2 (2)	0 (0)	2 (1)
3	2 (2)	3 (7)	5 (3)
5	1 (1)	0 (0)	1 (1)

NOTE. Values are given as count (%) or mean (SD).

Abbreviations: ADAS, Alzheimer's Disease Assessment Scale; APOE, apolipoprotein E; MMSE, Mini-Mental State Examination; CDRSB, Clinical Dementia Rating—Sum of Boxes; FAQ, functional assessment questionnaire; SD, standard deviation.

Descriptive statistics by baseline diagnosis, normal cognition (NC), and subjective memory concern (SMC) for the preclinical Alzheimer's disease population in the Alzheimer's Disease Neuroimaging Initiative NOTE. Values are given as count (%) or mean (SD). Abbreviations: ADAS, Alzheimer's Disease Assessment Scale; APOE, apolipoprotein E; MMSE, Mini-Mental State Examination; CDRSB, Clinical Dementia Rating—Sum of Boxes; FAQ, functional assessment questionnaire; SD, standard deviation. Missing data patterns assumed in simulations Participants having intolerability are simulated to drop out at month six, and those perceiving inefficacy drop out at twelve months.

Methods

Joint mixed-effects model for longitudinal data

To derive a model to simulate plausible data, we first fit a model to observed ADNI data. We apply a joint (or multivariate) mixed-effects model (JMM) to simultaneously model continuous longitudinal data for disease markers in the PAD population. The model respects the within-subject correlation over time and among the battery outcomes. Linear mixed-effects models are commonly used to model continuous longitudinal data. The multivariate mixed-effects model is specified as for subject i, time j, and outcome k, where are fixed-effect regression coefficients, and b0 and b1 are the subject- and outcome-specific random intercept and slope. The random effects are assumed to follow a multivariate Gaussian distribution with mean vector 0 and variance-covariance matrix Σ, with dimension 2p, that is, (b0,⋯,b0,b1,⋯,b1)′∼N (0,Σ). The model with multivariate random effects has the advantage of reflecting the dependency within subjects and among outcomes. The is the residual error. Because the outcomes are on different scales, we transform the raw outcome measures into a quantile scale ranging from 0 to 1 (least impaired to most severe dementia). Quantiles are calculated using the empirical cumulative distribution function using weights that are inversely proportional to the number of observations from each diagnostic category for each outcome. The quantiles were then transformed by the inverse Gaussian quantile function, resulting in an approximate Z-score before submitting to the model. When simulating data from these models, the simulated Z-scores can then be transformed back to the original scale, which can be integer valued. Bayesian estimation is performed via Markov Chain Monte Carlo (MCMC) sampling using the stan_mvmer function in R package Rstanarm [17].

Random forest algorithm for diagnosis of MCI

To simulate a clinician's diagnosis of MCI or dementia, we first use ADNI data to learn an algorithm to approximate this decision. The random forest algorithm [18] is an ensemble learning method for classification and regression. In our application, clinician diagnosis of normal cognition versus MCI or dementia is the binary outcome variable, and the seven continuous markers, age, and education are the predictors. The model is fit using the R package randomForest [19]. The fitted model is then applied to simulated continuous outcomes to predict a clinician's diagnosis.

Competing clinical trial models for continuous and time-to-event outcomes in simulation study

The simulated treatment effect on time to progression is modeled by the Cox proportional hazards model. For the PACC, we consider MMRM and the cLDA proposed by Liang and Zeger [9]. Similar to most likelihood-based approaches for longitudinal data, all three models assume any missing data are missing at random (MAR). The version of the PACC used in the study is a composite of four assessments: ADASDWR, LogMem, log transformation of Trails B, and MMSE. Each of the four component scores is first centered by subtracting the baseline sample mean and then divided by the baseline sample standard deviation of that component, to form standardized Z scores. These Z scores are averaged to form the composite. The MMRM models treat change from baseline in the PACC score as the outcome and baseline PACC as a predictor. It treats time as a categorical variable, which allows general mean trends in each group. MMRM has been extensively used for testing treatment effects at specific time points in clinical trials because participants are often evaluated at a fixed and relatively small number of time points [20]. In our simulation study, the within-subject dependence is modeled by a first-order autoregressive covariance structure. We also explore models that treat time as a continuous variable. In cLDA, the baseline outcome is treated as a response variable rather than a covariate, and the two randomized groups are constrained to have the same mean at baseline [21], [22]. We explore models with linear or quadratic time trends for each group.

Simulation setup

We conducted a simulation study to evaluate the performance of the competing models described in Section 3.3. In each of 1000 simulated clinical trials with visits every 6 months from 0 to 8 years, a total of 1000 and 1500 patients are, respectively, randomized to either treatment or placebo in 1:1 ratio. We also assume the proportion of MCI progressors to be 24% (based on ADNI data, as noted previously). For the placebo group, no changes will be made to the JMM fit to ADNI. For the treatment group, we will impose large (40% improvement on rate of change over the control), moderate (30% improvement), small (20% improvement), and no (same as the control) treatment effects on all outcomes. To simulate nonignorable missing data, three dropout categories are considered: intolerability, inefficacy, and missing completed at random (MCAR). Participants having intolerability or inefficacy drop out from the study immediately after six and twelve months, respectively. For MCAR, we assume linear attrition rate of 5% per year for both the treatment and placebo groups. The simulated dropout rates are described in Table 2.

Table 2

Missing data patterns assumed in simulations

Scenario		Missing data rate
Group	Treatment	Perceived inefficacy	Intolerability	Completely at random
Active	Ineffective	15%	10%	5% per year
Active	Effective	8%	10%	5% per year
Placebo	Not applicable	15%	0%	5% per year

Participants having intolerability are simulated to drop out at month six, and those perceiving inefficacy drop out at twelve months.

To assess bias due to missing data, we simulate complete data for every subject. The complete data are appropriately censored for the analysis of “observed” data and left uncensored for analysis of the “complete” data. Completers and MCAR dropouts are assumed to have the same longitudinal mean profile within each treatment arm. Dropouts due to intolerability are simulated to have the expected benefit, on average, until dropout, followed by an “unobserved” benefit that is diminished by a factor of 15%. Dropouts due to inefficacy are simulated to have no benefit. The four competing clinical trial models are MMRM, cLDA1 (linear) and cLDA2 (quadratic) for continuous PACC scores, and Cox for time to progression, with two baseline covariates namely age at baseline and carriage of the APOE ε4 allele. The Cox model will use all data observed up to 8 years until the last subject reaches the final scheduled visit under the common close design. We assume a linear enrollment rate such that enrollment is completed in 4 years and about half the subjects contribute “extra” common close follow-up in the 4.5- to 8-year range to the Cox model. The MMRM, cLDA1, and cLDA2 will only use data up to last scheduled visit, that is, from 0 to 4.5 years. We focus on “treatment policy” estimands of interest. The estimand will be the difference between randomized groups in the intention-to-treat population in terms of (1) rate (hazard ratio) of progression to MCI/dementia (Cox); (2) group difference in PACC at the final study time point (MMRM and cLDA1); or (3) area between mean PACC curves (cLDA2). We show how to carry out the hypothesis test of case (III) in the Supplementary Material. Let Y denote the simulated PACC scores for subject i randomized to group j at time point k, where i = 1,⋯,n , j = D,P, and k = 1,⋯,T. And k = 0 represents the baseline time point, D is the treatment group, and P is the placebo group. Under MMRM and cLDA1, for example, the objective is to estimate the between-treatment difference δ = μ − μ, where μ = E (Y − Y). A two-tailed test H0:δ = 0 versus H1:δ≠0 is carried out to evaluate whether treatment is different from placebo. For each simulated data set, we apply all four competing models to calculate point estimates of δ using the observed data (i.e., δobs) and the complete data (i.e., δcomp). For each model, “bias” is calculated as the median of the 1000 point estimates of δobs minus δcomp; “bias in percent” is computed as the median of the 1000 point estimates of δobs minus δcomp and then divided by δcomp. The interquartiles Q1 and Q3 are also summarized. In a real clinical trial, the endpoint is measured for completers but is missing for those who either drop out from the study either because of inefficacy or intolerability or those who remain in the study after initiating rescue medication. Mehrotra et al. [23] discussed that the commonly used MMRM with the embedded MAR assumption can deliver an exaggerated estimate of the aforementioned estimand of interest, in favor of the drug. This happens, in part, due to implicit imputation of an overly optimistic mean for dropouts in the treatment group. To remedy this, they proposed a formula-based two-step approach by treating the true endpoint distribution for treatment group as a mixture of distributions (one each for the completers and dropouts) rather than a single distribution. Their approach reduces the bias associated with the traditional MMRM while maintaining power. To increase the precision in estimating δ, we apply their method to MMRM, cLDA1, and cLDA2 models in the simulation study.

Results

JMM and random forest fit to ADNI data

We fit a JMM for PAD participants who were observed to progress to MCI and a separate JMM for those who did not progress. Seven outcome measures described in Section 2 are included in the model. Fixed-effect covariates for each outcome include age at baseline and carriage of the APOE ε4 allele. Three parallel Markov chains are run for 4000 iterations, and the first 2000 warm-up iterations are discarded. Every fourth value of the remaining part of each chain is stored to reduce correlation, yielding a total of 1500 samples for posterior analysis. Table 3 shows the posterior means and 95% credible intervals of the covariate-effect parameters. Fig. 1 shows the subject-level observations and predictions according to time in years of the seven markers for all individuals, in which the blue and red curves are estimated using the locally estimated scatter plot smoother. The bottom panel shows that the predictions provide reasonable trends of the observations. The posterior estimates from JMM will be later used as the true parameter values to simulate the panel of continuous markers.

Table 3

Posterior estimates (means and 95% CIs) of the fixed-effect covariates for the joint mixed-effect model fit to seven outcomes for stable and MCI progressor subpopulations

Parameter	Progressor (N = 39)	Stable (N = 124)
Parameter	Mean (95% CI	Mean (95% CI
ADAS delayed word recall
Intercept	−8.244 (−15.39, −1.451)	−4.913 (−7.755, −2.003)
Year	0.330 (0.189, 0.464)	0.064 (0.021, 0.108)
Age	0.110 (0.021, 0.201)	0.062 (0.023, 0.100)
APOE ε4	0.572 (−0.319, 1.437)	0.218 (−0.247, 0.670)
Logical memory paragraph recall
Intercept	−6.897 (−15.425, 0.905)	−1.840 (−4.983, 1.350)
Year	0.261 (0.136, 0.395)	0.033 (−0.084, 0.016)
Age	0.096 (−0.005, 0.206)	0.020 (−0.023, 0.062)
APOE ε4	0.039 (−0.959, 1.099)	0.465 (−0.044, 0.985)
Trails B
Intercept	−9.458 (−14.898, −3.918)	−6.364 (−9.020, −3.792)
Year	0.353 (0.252, 0.445)	0.022 (−0.028, 0.073)
Age	0.124 (0.051, 0.193)	0.084 (0.050, 0.119)
APOE ε4	0.141 (−0.540, 0.858)	0.622 (0.187, 1.087)
MMSE
Intercept	0.852 (−191.780, 185.973)	−1.385 (−75.020, 72.568)
Year	0.009 (−3.918, 4.011)	0.022 (−2.590, 2.698)
Age	0.007 (−2.432, 2.436)	0.020 (−0.903, 0.944)
APOE ε4	0.040 (−1.116, 11.346)	0.115 (−5.683, 5.900)
Category fluency—animals
Intercept	1.430 (−127.590, 130.195)	0.942 (−96.958, 98.426)
Year	0.047 (−2.910, 2.786)	0.025 (−3.399, 3.798)
Age	−0.009 (−1.658, 1.606)	−0.011 (−1.224, 1.211)
APOE ε4	0.036 (−8.234, 8.775)	−0.118 (−7.911, 7.920)
CDRSB
Intercept	−6.537 (−364.967, 344.177)	1.094 (−82.421, 76.732)
Year	0.082 (−7.263, 6.390)	0.006 (−2.853, 2.947)
Age	0.081 (−4.230, 4.517)	−0.011 (−1.006, 1.027)
APOE ε4	−0.224 (−20.697, 19.566)	0.117 (−5.925, 6.358)
FAQ
Intercept	3.458 (−380.068, 367.151)	0.261 (−32.960, 32.991)
Year	0.023 (−7.838, 7.140)	0.0007 (−1.1420, 1.1710)
Age	−0.002 (−4.487, 4.718)	−0.003 (−0.410, 0.449)
APOE ε4	0.343 (−22.127, 22.506)	0.014 (−2.667, 2.525)

Fig. 1

Observed (upper panel) and predicted (lower panel) longitudinal profiles of the seven markers for all individuals. Bold lines are locally estimated scatter plot smoother. Abbreviations: ADASDWR, Alzheimer's Disease Assessment Scale delayed word recall; MMSE, Mini-Mental State Examination; CDRSB, Clinical Dementia Rating—Sum of Boxes; FAQ, functional assessment questionnaire.

Posterior estimates (means and 95% CIs) of the fixed-effect covariates for the joint mixed-effect model fit to seven outcomes for stable and MCI progressor subpopulations Abbreviations: ADAS, Alzheimer's Disease Assessment Scale; APOE, apolipoprotein E; MMSE, Mini-Mental State Examination; CDRSB, Clinical Dementia Rating—Sum of Boxes; FAQ, functional assessment questionnaire; SD, standard deviation; CI, credible interval; MCI, mild cognitive impairment. Observed (upper panel) and predicted (lower panel) longitudinal profiles of the seven markers for all individuals. Bold lines are locally estimated scatter plot smoother. Abbreviations: ADASDWR, Alzheimer's Disease Assessment Scale delayed word recall; MMSE, Mini-Mental State Examination; CDRSB, Clinical Dementia Rating—Sum of Boxes; FAQ, functional assessment questionnaire. For the random forest, 500 trees are fitted, and the number of variables selected at each split is 3. The node impurity of each tree is measured by the Gini index. The results show that CDRSB, LogMem, and FAQ are three most important outcomes for determining the diagnosis of MCI. The model has a 6.19% out-of-bag error rate and 93.81% out-of-bag accuracy rate. Using the fitted random forest, the simulated cognitive status can be obtained from the simulated continuous markers. Fig. 2 shows the Kaplan-Meier estimated progression rate of the ADNI-PAD population (black solid line) along with the progression rate from one large simulated placebo group (red dots). The simulated progression yields closer concordance with the Kaplan-Meier estimates at the earlier stage. Although we observe discrepancies between the two lines in the middle and the right tail, the red line still lies within the 95% confidence intervals. Both the subject-level trajectories and the progression rate illustrate that the simulated data plausibly mimic the observed data.

Fig. 2

Kaplan-Meier estimated rate of progression to MCI or dementia. Abbreviations: MCI, mild cognitive impairment; ADNI-PAD, Alzheimer's Disease Neuroimaging Initiative—presymptomatic (or preclinical) Alzheimer's disease.

Simulation results

Fig. 3 shows the results of one simulated clinical trial with a 20% treatment effect and sample size n = 1000. The figure illustrates the group trends obtained by fitting the four different models.

Fig. 3

Results of one simulated clinical trial with 20% treatment effect from (A) analysis of change from baseline using a categorical time MMRM of the PACC; (B) a cLDA model of PACC with linear time trends; (C) a cLDA model of PACC with quadratic time effects; and (D) Kaplan-Meier curves comparing the time-to-progression to mild cognitive impairment or dementia for the two groups. Abbreviations: MMRM, mixed models of repeated measures; PACC, Preclinical Alzheimer's Cognitive Composite; cLDA, constrained longitudinal data analysis. Simulated power and type I error are summarized in Table 4. Under the null hypothesis (no treatment effect), the MMRM exhibits smaller-than-expected type I error (about 2%), whereas the other models are closer to the expected 5% error rate. The Cox model consistently exhibits the weakest power of the four models. MMRM has the next best performance, followed by the quadratic (cLDA2) and linear (cLDA1) models. For example, with a trial of sample size N = 1000 subjects of drug with a 30% treatment effect, the simulated power is 33% for Cox, 79% for MMRM, 86% for cLDA2, and 96% for cLDA1. In comparing analysis of complete versus observed data, it seems the missing data do not increase type I error, but they do inflate power. This suggests the bias is only an issue with an effective drug, in which case the effectiveness might appear inflated. Fig. 4 shows the powers in all scenarios.

Table 4

Power and type I error from 1000 simulated clinical trials

Sample size	Treatment	Observed data				Completed data
Sample size	Treatment	MMRM	cLDA¹	cLDA²	Cox PH	MMRM	cLDA¹	cLDA²	Cox PH
1000	0%	0.021	0.051	0.053	0.040	0.027	0.049	0.057	0.046
	20%	0.404	0.702	0.502	0.188	0.298	0.564	0.402	0.159
	30%	0.794	0.957	0.856	0.322	0.666	0.897	0.751	0.274
	40%	0.970	0.999	0.981	0.496	0.907	0.990	0.947	0.425
1500	0%	0.024	0.042	0.054	0.058	0.014	0.048	0.051	0.055
	20%	0.560	0.843	0.660	0.261	0.454	0.722	0.550	0.232
	30%	0.927	0.996	0.954	0.452	0.847	0.973	0.907	0.392
	40%	1.000	1.000	1.000	0.653	0.994	1.000	0.996	0.573

NOTE. The rows with 0% treatment effect simulate the type I error, which we expect to be near 5%.

Abbreviations: MMRM, mixed models of repeated measures; cLDA, constrained longitudinal data analysis; PH, proportional hazards.

Fig. 4

Statistical power for the MMRM, cLDA, and Cox proportional hazards model for treatment effects 0% (type I error), 20%, 30%, and 40% for sample sizes of n = 1000 (left panel) and n = 1500 (right panel). Solid lines indicate power estimates for data observed after simulated nonignorable missingness, and dashed lines indicate power that would be achieved with complete data (including observations that would be unobserved in reality). The observed data show greater power with fewer observations because the nonignorable missingness induces a bias in favor of the treatment. Abbreviations: MMRM, mixed models of repeated measures; cLDA, constrained longitudinal data analysis; PH, proportional hazards.

Power and type I error from 1000 simulated clinical trials NOTE. The rows with 0% treatment effect simulate the type I error, which we expect to be near 5%. Abbreviations: MMRM, mixed models of repeated measures; cLDA, constrained longitudinal data analysis; PH, proportional hazards. Statistical power for the MMRM, cLDA, and Cox proportional hazards model for treatment effects 0% (type I error), 20%, 30%, and 40% for sample sizes of n = 1000 (left panel) and n = 1500 (right panel). Solid lines indicate power estimates for data observed after simulated nonignorable missingness, and dashed lines indicate power that would be achieved with complete data (including observations that would be unobserved in reality). The observed data show greater power with fewer observations because the nonignorable missingness induces a bias in favor of the treatment. Abbreviations: MMRM, mixed models of repeated measures; cLDA, constrained longitudinal data analysis; PH, proportional hazards. Tables 5 and 6 further examine the bias induced by the missing data pattern. The tables summarize the median and interquartile ranges (Q1, Q3) of the bias on the PACC scale (Table 5) and as a percent of effect seen in complete data [6]. The Cox model seems to have smaller bias with 20% treatment effect, but as the treatment grows, the bias is comparable for all models. The method proposed by Mehrotra et al. [23] successfully shrinks the magnitude of bias, for example, from 27% in favor of treatment to −4.4% in favor of placebo for MMRM with 20% treatment effect. The method appears to overcorrect the bias in favor of placebo in these simulations.

Table 5

Bias of the treatment effect due to missingness

Sample size	Analysis method	20%	30%	40%
Sample size	Analysis method	Median (Q₁, Q₃)	Median (Q₁, Q₃)	Median (Q₁, Q₃)
1000	MMRM	0.018 (0.006, 0.031)	0.028 (0.015, 0.040)	0.037 (0.024, 0.049)
	cLDA¹	0.019 (0.009, 0.029)	0.028 (0.018, 0.038)	0.038 (0.028, 0.048)
	cLDA²	0.038 (0.011, 0.065)	0.058 (0.030, 0.084)	0.077 (0.050, 0.104)
	Cox PH	−0.033 (−0.074, 0.010)	−0.045 (−0.086, −0.001)	−0.059 (−0.102, −0.017)
	MMRM-Mehrotra	−0.001 (−0.015, 0.012)	−0.002 (−0.016, 0.011)	−0.003 (−0.017, 0.010)
	cLDA¹-Mehrotra	−0.001 (−0.011, 0.008)	−0.001 (−0.012, 0.007)	−0.003 (−0.012, 0.007)
	cLDA²-Mehrotra	−0.006 (−0.034, 0.022)	−0.010 (−0.038, 0.018)	−0.014 (−0.042, 0.014)
1500	MMRM	0.018 (0.006, 0.028)	0.027 (0.016, 0.037)	0.036 (0.025, 0.047)
	cLDA¹	0.018 (0.010, 0.026)	0.027 (0.019, 0.036)	0.037 (0.028, 0.045)
	cLDA²	0.037 (0.013, 0.061)	0.056 (0.032, 0.080)	0.075 (0.052, 0.099)
	Cox PH	−0.028 (−0.064, 0.005)	−0.042 (−0.076, −0.009)	−0.055 (−0.090, −0.021)
	MMRM-Mehrotra	−0.002 (−0.012, 0.009)	−0.003 (−0.013, 0.008)	−0.004 (−0.015, 0.007)
	cLDA¹-Mehrotra	−0.001 (−0.009, 0.006)	−0.002 (−0.010, 0.005)	−0.003 (−0.011, 0.004)
	cLDA²-Mehrotra	−0.008 (−0.028, 0.015)	−0.012 (−0.032, 0.011)	−0.016 (−0.035, 0.007)

Abbreviations: MMRM, mixed models of repeated measures; cLDA, constrained longitudinal data analysis; PH, proportional hazards.

Table 6

Bias in percent (%) of the treatment effect due to missingness based on 1000 simulated trials for the given sample size, treatment effect, and analysis method

Sample size	Analysis method	20%	30%	40%
Sample size	Analysis method	Median (Q₁, Q₃)	Median (Q₁, Q₃)	Median (Q₁, Q₃)
1000	MMRM	27.1 (7.0, 52.3)	29.9 (16.3, 46.8)	29.6 (19.4, 42.3)
	cLDA¹	29.6 (12.4, 51.9)	29.8 (18.9, 43.7)	29.7 (21.4, 39.7)
	cLDA²	24.5 (5.5, 50.2)	26.5 (13.7, 42.6)	26.2 (16.5, 37.9)
	Cox PH	17.4 (−16.1, 55.0)	22.2 (−4.5, 52.7)	25.5 (5.2, 50.4)
	MMRM-Mehrotra	−4.4 (−23.2, 20.6)	−2.9 (−15.9, 13.3)	−2.8 (−12.7, 8.6)
	cLDA¹-Mehrotra	−1.7 (−16.2, 15.4)	−1.7 (−11.3, 9.1)	−2.0 (−9.2, 5.7)
	cLDA²-Mehrotra	−6.0 (−21.2, 15.9)	−4.5 (−15.5, 9.4)	−4.7 (−13.0, 5.2)
1500	MMRM	27.5 (9.7, 52.8)	28.2 (16.6, 43.3)	28.3 (19.6, 39.3)
	cLDA¹	29.1 (15.7, 48.4)	29.2 (19.9, 40.9)	29.3 (22.2, 37.8)
	cLDA²	24.8 (8.8, 45.6)	25.4 (15.2, 38.2)	25.5 (17.8, 34.6)
	Cox PH	18.0 (−8.2, 46.9)	22.7 (3, 46.3)	24.3 (8.6, 44.6)
	MMRM-Mehrotra	−3.0 (−19.4, 17.6)	−3.0 (−13.8, 9.7)	−3.1 (−11.2, 6.2)
	cLDA¹-Mehrotra	−2.1 (−13.5, 11.4)	−2.3 (−9.8, 5.7)	−2.4 (−7.9, 3.5)
	cLDA²-Mehrotra	−6.1 (−18.8, 12.7)	−5.5 (−13.9, 5.7)	−5.5 (−11.5, 2.8)

Abbreviations: MMRM, mixed models of repeated measures; cLDA, constrained longitudinal data analysis; PH, proportional hazards.

Bias of the treatment effect due to missingness Abbreviations: MMRM, mixed models of repeated measures; cLDA, constrained longitudinal data analysis; PH, proportional hazards. Bias in percent (%) of the treatment effect due to missingness based on 1000 simulated trials for the given sample size, treatment effect, and analysis method Abbreviations: MMRM, mixed models of repeated measures; cLDA, constrained longitudinal data analysis; PH, proportional hazards.

Discussion

We use Bayesian JMM fit using ADNI data to simulate correlated longitudinal data that might plausibly arise in a PAD clinical trial. We used a random forest algorithm, also fit using ADNI, to algorithmically diagnose MCI in the simulated data so that we could compare models of the PACC to the Cox model of time to progression. The models of PACC consistently provide at least twice the power of the Cox model even when the Cox model has the benefit of considerably more follow-up visits under a common close design. Given this inefficiency, the time-to-progression analysis should be avoided in PAD. Some might still argue that the clinical meaningfulness of the time to progression is worth the cost of a larger, longer trial. However, given that the random forest provided a purely algorithmic diagnosis with 93.81% out-of-bag accuracy, it is suggested that there is minimal additional value in the diagnosis. And again, while the progression outcome is more qualitative than the PACC on the subject level, the group-level result is still quantitative (e.g., a hazard ratio) and requires additional interpretation to assign clinical meaning. One might also argue that clinical diagnosis cannot be adequately modeled algorithmically using trial data. That is, clinical assessment and diagnosis by a trial-site clinician may consider information not captured by trial measures. However, the cognitive, clinical, and functional assessments are designed to capture the relevant information, and clinicians generally rely on similar information obtained through less structured assessments. It seems questionable that a site clinician will gain much reliable information beyond the assessments; indeed, this is the justification for central expert panel adjudication of site diagnoses. The Bayesian joint models are well suited to simulating plausible panels of correlated longitudinal data necessary to compare clinical trial designs. This approach could be useful in many other contexts where one is interested in a fair comparison of different outcome measures, different combinations of correlated outcomes, or different models of treatment effect. Simulations that ignore the correlations among important outcomes will likely not provide reliable comparisons. All the models considered were susceptible to bias induced by a plausible missing data pattern. However, this bias seemed to only affect scenarios with an effective treatment and did not inflate type I error under the null hypothesis. The Mehrotra method shows promise in correcting this bias, but it might overcorrect in favor of placebo, and it would be impossible to detect this overcorrection in practice. Given that type I error is not inflated, we are inclined to suggest no change to the status quo approach in which the primary analysis is based on likelihood-based methods which are robust to MAR and applying appropriate MNAR sensitivity analyses such as the delta method [24]. Systematic review: Donohue et al. (2011) explored the relative efficiency of time-to-event versus continuous outcomes, reviewed the literature, derived an analytic calculation of the relative efficiency, and simulated trials in mild cognitive impairment (MCI) populations. The current work extends this earlier work to the preclinical Alzheimer's population. We reviewed trials in preclinical Alzheimer's disease on clinicaltrials.gov and found that most use a continuous primary outcome but at least one is using time to MCI. Interpretation: The simulation study confirms that continuous outcomes provide about twice the statistical power to detect treatment effects compared with time to MCI. Plausible scenarios of attrition due to intolerability and perceived lack of efficacy inflate estimates of treatment benefit, although type I error is not inflated. Future directions: The novel simulation methodology using hierarchical Bayesian mixed-effect models of multiple outcomes and random forests can be used to optimize preclinical Alzheimer's clinical trial efficiency and power and to minimize bias.

15 in total

The relative efficiency of time-to-progression and continuous measures of cognition in presymptomatic Alzheimer's disease.

Introduction

Data

Methods

Joint mixed-effects model for longitudinal data

Random forest algorithm for diagnosis of MCI

Competing clinical trial models for continuous and time-to-event outcomes in simulation study

Simulation setup

Results

JMM and random forest fit to ADNI data

Simulation results

Discussion

1. "Mini-mental state". A practical method for grading the cognitive state of patients for the clinician.

2. Trail Making Test A and B: normative data stratified by age and education.

3. Missing data in clinical trials: control-based mean imputation and sensitivity analysis.

4. MMRM vs. LOCF: a comprehensive comparison based on simulation study and 25 NDA datasets.

5. On efficiency of constrained longitudinal data analysis versus longitudinal analysis of covariance.

6. Should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials?

7. Alzheimer's Disease Assessment Scale (ADAS).

8. The Clinical Dementia Rating (CDR): current version and scoring rules.

9. Association Between Elevated Brain Amyloid and Subsequent Cognitive Decline Among Cognitively Normal Persons.

10. The preclinical Alzheimer cognitive composite: measuring amyloid-related decline.

1. Fenchel duality of Cox partial likelihood with an application in survival kernel learning.

2. Modelling prognostic trajectories of cognitive decline due to Alzheimer's disease.

3. Revealing the Timeline of Structural MRI Changes in Premanifest to Manifest Huntington Disease.

4. Data-driven causal model discovery and personalized prediction in Alzheimer's disease.