Literature DB >> 33992081

Dynamic prediction based on variability of a longitudinal biomarker.

Kristen R Campbell¹, Rui Martins², Scott Davis³, Elizabeth Juarez-Colunga⁴.

Abstract

BACKGROUND: Tacrolimus is given post-kidney transplant to suppress the immune system, and the amount of drug in the body is measured frequently. Higher variability over time may be indicative of poor drug adherence, leading to more adverse events. It is important to account for the variation in Tacrolimus, not just the average change over time.
METHODS: Using data from the University of Colorado, we compare methods of assessing how the variability in Tacrolimus influences the hazard of de novo Donor Specific Antibodies (dnDSA), an early warning sign of graft failure. We compare multiple joint models in terms of fit and predictive ability. We explain that the models that account for the individual-specific variability over time have the best predictive performance. These models allowed each patient to have an individual-specific random error term in the longitudinal Tacrolimus model, and linked this to the hazard of dnDSA model.
RESULTS: The hazard for the variance and coefficient of variation (CV) loading parameter were greater than 1, indicating that higher variability of Tacrolimus had a higher hazard of dnDSA. Introducing the individual-specific variability improved the fit, leading to more accurate predictions about the individual-specific time-to-dnDSA.
CONCLUSIONS: We showed that the individual's variability in Tacrolimus is an important metric in predicting long-term adverse events in kidney transplantation. This is an important step in personalizing the dosage of TAC post-transplant to improve outcomes post-transplant.

Entities: Chemical Disease Gene Species

Keywords: Dynamic prediction; Kidney transplant; Survival

Mesh：

Substances：

Year: 2021 PMID： 33992081 PMCID： PMC8122571 DOI： 10.1186/s12874-021-01294-x

Source DB: PubMed Journal: BMC Med Res Methodol ISSN： 1471-2288 Impact factor: 4.615

Background

In many medical settings, the variability in patient drug levels is hypothesized to predict adverse events, as higher variability is often indicative of drug-drug interactions, altered drug metabolism, or poor drug adherence. This is the case in the kidney transplant setting: Tacrolimus (TAC) is prescribed post-transplant to suppress the immune system, and the amount of drug in the body is measured frequently via blood draw. In scenarios such as this, it is important to account for the variation in the longitudinal outcome, not just the average change over time. In this paper, we test whether the variability in longitudinal TAC predicts time-to-de novo Donor Specific Antibodies (dnDSA), an early warning sign of kidney transplant graft failure. Clinicians need accurate predictions of adverse allograft events in order to know when and how to modify the treatment course for high risk patients, such as altering immunosuppression therapy or increasing follow-up visit frequencies. This is critically important today, as there exists a national shortage of available kidney donors [1]. If adverse events can be avoided, the available donated organs can be used more effectively, and in turn the kidney transplant waiting list, currently at nearly 100,000 patients [1], will be shorter. Avoiding antibody-mediated rejection, or organ rejection due to de novo donor-specific antibodies (dnDSA), is one of the most important steps in achieving long-term graft survival [2]. It has been shown that maintaining proper immunosuppression therapy levels post-transplant is a major determinant in preventing dnDSA. Tacrolimus (TAC) is the primary immunosuppressant used by U.S transplant centers (93%) [3, 4], but it’s therapeutic dose window is narrow and differs by patient. TAC trough levels, or the lowest amount of drug in the blood prior to the next dose, are measured frequently post-transplant. In previous studies, one-dimensional functions of observed TAC values were found to be associated with dnDSA; these functions included the percentage of TAC levels below 5 ng/ml [5], average TAC troughs of less than 8 ng/ml, and the percentage of time TAC was in therapeutic range [6]. Importantly, the variability of TAC was also found to be associated with dnDSA, as high variability likely indicates inadequate drug exposure and possible non-adherence. In previous studies, the coefficient of variation (CV) of TAC, summarized over a specific range of time pre-dnDSA, was predictive of dnDSA. [7-9]. The CV is calculated as the ratio of the standard deviation to the mean, and is a way to standardize the variability of a measurement so it is not dependent on the average value [10]. To complicate the problem further, dnDSA is only screened for periodically, and much less frequently than TAC trough levels are drawn. This makes dnDSA an interval censored outcome, as the exact timing of development is unknown. The majority of studies analyzing the relationship between TAC and dnDSA treat dnDSA as right censored due to the analytic complexities associated with interval-censored outcomes. A previous study used a joint model for interval censored data that incorporated all longitudinal values of TAC and found that both average TAC and the slope of TAC over time were associated with time-to-dnDSA [4]. This approach used a linear mixed model for TAC linked by a random intercept and random slope to an interval censored parametric survival model for dnDSA. Joint models such as this have become increasingly popular as a way to simultaneously model longitudinal and time-to-event data. These approaches typically use a standard linear mixed model in combination with a survival model to assess the effect of the longitudinal outcome on the time-to-event outcome. These two model processes can be linked in a number of ways, including shared random effects, the true value, or the slope of the longitudinal trajectory. More recently, research has been done into allowing for a more flexible linking function [11, 12]. In this paper, we build upon the previously mentioned joint model for TAC and dnDSA [4] to test the hypothesis that an individual’s variability of TAC over time is associated with time-to-dnDSA. We allow for each individual’s TAC trajectory to be normally distributed with heterogeneous variances, which avoids the unlikely assumption that all individuals share a common variance. We test whether this individual-specific variability is associated with time-to-dnDSA and whether it improves the predictive ability of the model. We also assess the model’s ability to dynamically predict a patient’s probability of developing dnDSA after surviving dnDSA-free for a specified period. This is motivated by the clinical scenario where patients are seen in clinic at 6 months or 1 year-post transplant dnDSA-free, and the clinician wants to assess the probability that the patient remains dnDSA-free until a future appointment. We demonstrate how these probabilities can be calculated using the joint model, and how accurate these predictions are given the dataset at hand. In the first section, we introduce the kidney transplant study data. Next, the joint models are formulated and applied to the motivating kidney transplantation dataset. Model estimation and prediction assessments follow, and we end with a discussion of study limitations and next steps.

Kidney transplant study

As a motivating example, we introduce a retrospective study of patients who received a kidney transplant at the University of Colorado between 2007 and 2013. Exclusion criteria included being under 18 at the time of transplant, having a simultaneous liver and kidney transplant or islet cell transplant, having DSA pre-transplant or dnDSA within the first week post-transplant, not having any dnDSA screening post-transplant, and not receiving TAC post-transplant. This study was conducted in accordance with the Declaration of Helsinki and was approved by the ethics committee at the Colorado Multiple Institutional Review Board (COMIRB 13-3137). Post-transplant, TAC trough levels and dnDSA screening results for each of the 538 included patients were collected for up to 7 years. Because of the unreliability of TAC and dnDSA data within the first week post-transplant, follow-up data was collected starting at day 8 post-transplant. Individuals had a median of 22 (IQR: 15-32) qualifying TAC levels, taking on values from 0 to 30 and roughly following a normal distribution. Screening for dnDSA was conducted at 1, 6, 12 months, annually, and when clinically indicated. Approximately one-third (n=181) had at least one dnDSA during follow-up, and for the purpose of this study, we focused on time-to-first dnDSA. Additional variables collected included age at transplant, race/ethnicity, and the number of Human Leucocyte Antigen (HLA) mismatches between donor and recipient. A more detailed description of the dataset and measures collected can be found elsewhere [4, 6]. We randomly split the dataset into a training (2/3, N=358) and a testing (1/3, N=180) cohort. The training cohort was used to fit all models outlined in the results, while the testing cohort was used to assess predictive performance.

Methods

General model formulation

Suppose there are N individuals each measured at n time points, i=1,...,N. Let y be the measurement for individual i at time t,j=1,...,n. We assume that given the vector of individual-specific random intercept and slope (a0,a1)′, individual outcome measures y are independent and normally distributed with mean μ and residual variance , where Some examples of f, the function that models the time component, include a linear trajectory, i.e. f(t)=b0+b1t; or b-spline basis functions, i.e. , where ψ(t) is the k of K basis functions with coefficient β0 and the number of basis functions K depends on the degree of the splines (e.g. p=3) and number of inner knots (h=3; K=p+h+1). Baseline covariates are represented by x1, with regression coefficients β1′. The random effects a0,a1 are distributed normally with mean zero and variance/covariance matrix Importantly, the model allows a flexible residual variance component, . In this paper, we either set l to 1, which assumes all individuals have a common residual variance, or we set l to i, which allows all individuals to have different residual variances. This flexible variance component avoids the typical assumption that all subjects share one common residual error term, in favor of allowing the model to estimate higher or lower variance depending on the amount of variability in each subject specific trajectory. The same N individuals are measured for a time to event outcome at varying time points. The outcome is interval censored, as it is only known to happen between two discrete time points and the exact timing is unknown. Let t be the time at which the outcome was detected and t the time of the visit immediately preceding t. The actual unknown time to event, t, occurs in between t and t. The hazard for individual i at any time t can be represented by where h0(t) is the baseline hazard, which can be various flexible distributions, including parametric (typically exponential, Weibull, or gamma), piecewise constant or spline. β0 represents the fixed intercept, x2 is the vector of fixed covariates, which can be different from the fixed covariates in Eq. 1, and β2 is its coefficient. Of note, we restrict the longitudinal measurements (y) such that only those occurring prior to t are used in the joint model. The two sub-models (1) and (3) are connected by the shared random effects (a0,a1)′, and some function of the history of the longitudinal biomarker, g. This function g can be chosen based on a priori hypotheses, and should include an estimated parameter from the longitudinal model (1). In our case, g will be a function of the heterogeneous random error of the biomarker, σ. The association parameters λ0,λ1, and λ2 represent the estimated risk of event dependent on the average biomarker, slope of the biomarker, and the function, g. Conditional on random effects, a0,a1, the log likelihood can be written for each individual in the model with individual-specific random error (): where I is an indicator that the event for individual i is right censored (1=right censored, 0=interval censored), and F(t) is the cumulative density function at time t.

Model specifications

Four models were fit to test the clinical hypothesis that the variability of the biomarker over time would be associated with the time to event outcome. These are all special cases of the joint model outlined in “General model formulation” section. Model 1 (M1): Base model. l=1, so all patients have a common residual error, and g=0, so there are only two shared parameters: the random intercept and slope. Model 2 (M2): Individual-specific variance model. l=i, so each patient is allowed a different residual error term, and g=0, so again, there are only two shared parameters: the random intercept and slope. Model 3 (M3): Individual-specific variance shared model. l=i, so each patient is allowed a different residual error, and g=σ, so λ2 represents the hazard of event associated with the residual standard error. Model 4 (M4): Individual-specific coefficient of variation (CV) shared model. l=i, so each patient is allowed a different residual error, and , so λ2 represents the hazard of event associated with the coefficient of variation. Of note, the CV is calculated here as the estimated standard deviation over the sample mean.

Model selection and predictive ability

The Deviance Information Criterion (DIC) [13] and the Watanabe-Akaike information criterion (WAIC) [14] were used for model selection. The DIC is calculated based on the deviance, D()=−2 logL(|) and penalizes for the number of effective degrees of freedom in the model. WAIC approximates leave-one-out cross-validation [14] and works well for hierarchical models where the number of degrees of freedom is not obvious [15]. The best model should have the lowest values of DIC and WAIC; differences of at least 5 points were considered important. The predictive ability of the four models was assessed using Area Under the Curve (AUC) and Brier Score (BS). The interval censored nature of dnDSA complicates the calculation of AUC and BS. In right-censored survival settings, the AUC is the proportion of concordant pairs among all subject pairs, where the pair achieves concordance if the risk of event is higher for the subject with the earlier event time [16]. Under interval censoring, it may not be clear which subject in the pair has the earlier event time. When a parametric baseline hazard is employed, such as the Weibull, various methods for calculating the probability that the event time of the first subject is less than the event time of the second have been proposed [17, 18]. We used both the nonparametric estimator approach proposed by Wu and Cook and the probability based methods outlined by Tsouprou to calculate AUC, and only reported the Tsouprou results as the two both showed the same conclusions. BS is defined as the mean squared difference between the predicted probability and the observed outcome for each subject at a pre-specified follow-up time. Due to the interval censored nature of the data, is not always known if the outcome occurred before a certain time, so again we calculate the probability that the patient had the event prior to a pre-specified time [17]. Due to the time-varying nature of the models, the AUC and BS can change over time. To assess the predictive ability of the models at differing time points for differing prediction windows, we chose two clinically relevant scenarios: (1) being dnDSA-free at 12 months, given the patient was dnDSA-free at 6 months, and (2) being dnDSA-free at 24 months, given the patient was dnDSA-free at 12 months.

Dynamic prediction of survival probabilities

In order to compare predictive abilities of the proposed models (M1-M4), we calculate the dynamic predictions of surviving dnDSA-free at a pre-specified time point. Several works show the derivation of the conditional predictive probability of survival for a new individual at time t′, given that the individual has survived up to time t, where t′>t [19, 20]. Let denote the longitudinal trajectory up to time t, and T∗>t denote that the new individual has survived up to time t. An important step of this calculation is obtaining samples of the conditional posterior distribution of the random effects for this new individual , where θ denotes the set of all parameter values of the model. Conditional on the m-th MCMC sample θ, we draw the m-th sample of the random effects vector from its posterior distribution where and p(T∗>t∣θ,a) are the conditional probabilities of the longitudinal and survival outcomes, and p(a∣θ) is the probability of the random effects vector. We follow ideas in [20] and employ adaptive rejection Metropolis Hastings sampling to draw samples of the random effect (5) and calculate the predicted survival probabilities from the MCMC samples. Survival probabilities may be obtained using any time points (t′ and t, t′>t), but for simplicity and for clinical applicability, we chose to assess two scenarios: (1) being dnDSA-free at 12 months, given the patient was dnDSA-free at 6 months (t=6,t′=12), and (2) being dnDSA-free at 24 months, given the patient was dnDSA-free at 12 months (t=12,t′=24). Oftentimes, the patient comes to clinic for a milestone appointment (say 1 year-post transplant) appointment, and the clinician wants to assess the probability of the patient continuing to remain dnDSA-free until the next appointment (at 2 years post-transplant, for example). We used this framework to assess and compare the Area Under the Curve (AUC) and Brier Score (BS) for each model under each of the time period scenarios. For particular patients, we also calculated the dynamic predictions at more time points, specifically where t was 12 months and t′ ranged from 18 months to 60 months, to demonstrate how the survival probabilities can be calculated for longer follow-up periods.

Application

Five hundred and thirty eight patients met the final inclusion criteria for this analysis. The dataset was randomly split into a training (2/3, N=358) and a testing (1/3, N=180) cohort. The four models outlined in the methods were fit to the training cohort (N=358). Each model had a longitudinal sub-model (Eq. 1) with TAC as the outcome, a fixed intercept and slope for time (f(t)=b0+b1t)) and a random intercept and slope for time for variations across individuals. Spline terms for time were tested, but DIC and WAIC indicated that a linear trend was sufficient. For the survival sub-models (Eq. 3), various baseline hazard functions were tested, including exponential, Weibull, gamma, and piecewise constant, and the Weibull yielded the best model fit (h0(t)=αt, where α is the Weibull shape parameter). All survival sub-models had time to dnDSA as the interval censored outcome, random effects associated with the longitudinal sub-model and a vector of baseline covariates for age (younger age: <30, middle age: 30 −49, older age: 50 +), race/ethnicity (Caucasian, African American, Hispanic, and Other), and number of HLA mismatches. More detail on each of these characteristics and why they are hypothesized to be associated with dnDSA can be found elsewhere [6]. All models built upon the base model, M1, which had a shared random intercept and slope. As described in “Model specifications” section, M2 allowed each subject to have an individual-specific residual error term, M3 shared this individual-specific error term with the hazard sub-model, and M4 shared the individual-specific coefficient of variation with the hazard sub-model. Thus, the final implemented models followed this general structure: Model estimation and inference was based on the Bayesian framework. Hyper-parameters are assigned weakly informative priors: Denoting u as a generic random variable, we define the following density functions: u∼Normal(μ,σ2) with probability density ; u∼Uniform(a,b) with probability density 1/(b−a) for aTr(RΩ/2})/(2Γ(k/2)) for k≥p, where p=2, Tr is the trace function and Γ is the multivariate gamma function. Notably, we defined the prior for log(σ) as uniform(-A, A), where A is sufficiently large (we used A=100). The posterior distribution of each variable was estimated using Markov Chain Monte Carlo (MCMC) simulations. A Gibbs sampler was used to form two Markov chains using JAGS software [21] and the runjags package in R [22]. Convergence of the samples was assessed by trace plot inspection, and Gelman and Geweke tests which test for equality of the means of the first and last part of a Markov chain [23]. After a burn-in period of 20,000 and thinning of 50 of 100,000 sampling iterations, 2,000 samples per chain were used for inference. On average, each model took approximately 3.5 hours to run on a PC with two Intel Xeon X5650 processors and 128 GB RAM. For dynamic prediction, 500 MCMC samples were used, and within each sample, 500 iterations of the Metropolis Hastings algorithm were employed to estimate the random effects. Calculating a prediction for one subject took approximately 40 minutes on the same PC, using parallel processing on 12 cores. A reproducible example of M4, including code to fit the model and calculate dynamic predictions, can be found in the Supplemental Materials.

Results

A comparison of WAIC and DIC for each model, M1-M4, is reported in Table 1. As shown by lower DIC and WAIC, all three models that allowed for individual-specific variation (M2, M3, M4), performed better than the base model that forced a common residual error term. The model that fit the data best according to DIC allowed for individual-specific residual error and shared the CV term with the hazard of event. M2 had the lowest WAIC. M2-M4 also had superior predictive ability of dnDSA-free survival compared to M1, demonstrated by higher AUC and lower BS (Fig. 1 and Table S1).

Table 1

Model selection criteria: DIC, and WAIC as defined in the Model selection and predictive ability section

Model	DIC	WAIC
M1: Shared random intercept and slope only	46022	46137
M2: Individual variance term (not shared)	44580	44765
M3: Shared individual variance term	44587	44791
M4: Shared individual CV term	44571	44776

Training cohort, N=358

Fig. 1

Comparison of Area under the Curve (AUC) and Brier Score (BS) for each of the four models, under two scnearios: (1) being dnDSA-free at 12 months, given the patient was dnDSA-free at 6 months, and (2) being dnDSA-free at 24 months, given the patient was dnDSA-free at 12 months. M1=base model, shared random intercept and slope only, M2=allow for individual specific random error term, M3=share the individual specific standard deviation with hazard, M4=share the individual specific CV with hazard Model selection criteria: DIC, and WAIC as defined in the Model selection and predictive ability section Training cohort, N=358 The model estimates are found in Table 2. The resulting estimates for the TAC sub-model (fixed intercept, slope, covariance matrix of random intercept and slope) are similar across the M1-M4. M1 has a common random error term for all patients, which was estimated as 7.15 (95% Credible Interval (CrI): 6.93, 7.36). All other models allowed for residual error to vary by patient, and the estimated standard deviations from M4 ranged from 0.46 to 11.16 (Fig. 2). The hazard ratios for all baseline covariates (β0−β6) are similar in magnitude across the models. The loading parameters for the shared random effects (λ0,λ1) are similar across the models, although the λ1 is slightly lower for M1 compared to the other three. λ2 represents the effect the individual-specific residual standard deviation (M3) and coefficient of variation (M4) on the hazard of dnDSA. For every one unit increase in an individual’s estimated residual standard deviation, the risk of dnDSA increases 1.25-fold (95% CrI: 1.00, 1.54), and for every one unit increase in an individual’s estimated CV, risk of dnDSA increases 1.40-fold (95% CrI: 0.12, 5.03).

Table 2

Results from all models

Parameter	Model 1	Model 2	Model 3	Model 4
Linear Sub-Model	Estimate (95% CrI)	Estimate (95% CrI)	Estimate (95% CrI)	Estimate (95% CrI)
b₁	-0.059 (-0.071, -0.047)	-0.056 (-0.068, -0.044)	-0.056 (-0.068, -0.044)	-0.056 (-0.069, -0.045)
b₀	7.254 (7.094, 7.411)	7.177 (7.007, 7.324)	7.182 (7.014, 7.338)	7.178 (7.027, 7.32)
ρ	-0.419 (-0.562, -0.264)	-0.427 (-0.564, -0.275)	-0.422 (-0.555, -0.268)	-0.424 (-0.563, -0.268)
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\sigma ^{2}_{0}$\end{document}σ02	1.769 (1.455, 2.124)	1.716 (1.42, 2.074)	1.711 (1.394, 2.069)	1.713 (1.397, 2.072)
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\sigma ^{2}_{1}$\end{document}σ12	0.005 (0.003, 0.007)	0.005 (0.003, 0.006)	0.005 (0.003, 0.007)	0.005 (0.003, 0.007)
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\sigma ^{2}_{e}$\end{document}σe2	7.146 (6.931, 7.359)	NA (NA, NA)	NA (NA, NA)	NA (NA, NA)
Survival Sub-Model	Hazard Ratio (95% CrI)	Hazard Ratio (95% CrI)	Hazard Ratio (95% CrI)	Hazard Ratio (95% CrI)
α	0.555 (0.439, 0.698)	0.514 (0.413, 0.623)	0.535 (0.426, 0.666)	0.524 (0.421, 0.647)
β₀	0.035 (0.012, 0.075)	0.043 (0.018, 0.085)	0.024 (0.006, 0.057)	0.045 (0.014, 0.107)
β₁ (HLA Mismatch Number)	1.262 (1.115, 1.437)	1.239 (1.098, 1.393)	1.236 (1.086, 1.394)	1.238 (1.100, 1.403)
β₂ (African American)	2.015 (1.003, 3.644)	1.612 (0.857, 2.717)	1.673 (0.872, 2.805)	1.634 (0.857, 2.794)
β₃ (Hispanic)	1.466 (0.842, 2.324)	1.54 (0.919, 2.365)	1.398 (0.836, 2.232)	1.540 (0.899, 2.420)
β₄ (Other Race)	1.092 (0.21, 2.909)	0.949 (0.192, 2.465)	0.944 (0.168, 2.459)	0.951 (0.189, 2.484)
β₅ (Middle Age)	0.721 (0.35, 1.37)	0.803 (0.41, 1.482)	0.784 (0.398, 1.463)	0.802 (0.405, 1.472)
β₆ (Older Age)	0.307 (0.137, 0.616)	0.34 (0.168, 0.627)	0.338 (0.163, 0.655)	0.335 (0.165, 0.623)
λ₀	0.659 (0.515, 0.813)	0.66 (0.539, 0.802)	0.644 (0.517, 0.78)	0.661 (0.491, 0.845)
λ₁ (per 0.05 change)	0.478 (0.315, 0.646)	0.578 (0.441, 0.731)	0.559 (0.41, 0.714)	0.569 (0.410, 0.749)
λ₂	NA (NA, NA)	NA (NA, NA)	1.254 (1.004, 1.543)	1.400 (0.119, 5.026)

The posterior mean and 95% credible intervals are presented for the linear portion of the model, the survival portion of the model, and the association parameters. The longitudinal portion is comprised of a fixed intercept (b0), a fixed slope (b1), and a random error term (). In Models 2-4, the random error term is individual specific () There is also a random intercept () and a random slope for each individual (), which have a correlation parameter ρ. The survival model is comprised of fixed covariates (β), the Weibull association parameter α, and a random intercept for each individual that is related to the random intercept, slope, and some function of the estimated longitudinal trajecotry, through association parameters. The first association parameter (λ0) links the two sub-models through their shared random intercepts. The second association parameter (λ1) links the two sub-models through the longitudinal random slope and the survival random intercept. In Models 2-4, λ2 links the two sub-models through the longitudinal individual-specific SD (Model 3) or CV (Model 4) term. Middle age=30-49 years, Older age=50+ years. Model 1 (M1)=base model, shared random intercept and slope only, Model 2 (M2)=allow for individual specific random error term, Model 3 (M3)=share the individual specific standard deviation with hazard, Model 4 (M4)=share the individual specific CV with hazard. Training cohort, N=358

Fig. 2

Model 4 (M4) allowed each individual to have it’s own random error term, to account for varying amounts of variability in TAC between patients. The distribution of the individual’s standard deviations (SD) is shown here as a histogram. The red vertical line represents the common standard deviation taken from M1, which forced each individual to have a common residual error Results from all models The posterior mean and 95% credible intervals are presented for the linear portion of the model, the survival portion of the model, and the association parameters. The longitudinal portion is comprised of a fixed intercept (b0), a fixed slope (b1), and a random error term (). In Models 2-4, the random error term is individual specific () There is also a random intercept () and a random slope for each individual (), which have a correlation parameter ρ. The survival model is comprised of fixed covariates (β), the Weibull association parameter α, and a random intercept for each individual that is related to the random intercept, slope, and some function of the estimated longitudinal trajecotry, through association parameters. The first association parameter (λ0) links the two sub-models through their shared random intercepts. The second association parameter (λ1) links the two sub-models through the longitudinal random slope and the survival random intercept. In Models 2-4, λ2 links the two sub-models through the longitudinal individual-specific SD (Model 3) or CV (Model 4) term. Middle age=30-49 years, Older age=50+ years. Model 1 (M1)=base model, shared random intercept and slope only, Model 2 (M2)=allow for individual specific random error term, Model 3 (M3)=share the individual specific standard deviation with hazard, Model 4 (M4)=share the individual specific CV with hazard. Training cohort, N=358 Dynamic predictions for experiencing dnDSA-free survival were calculated for two individuals in Fig. 3. Both individuals were dnDSA-free at 12 months, and one went on to experience a dnDSA event, while the other did not. The probability of surviving dnDSA-free from 12 months to 60 months, by an increment of 6 months, was calculated for each individual, separately using the output from M1 and M4. In the left-hand panel is an individual who had a low CV of TAC during the first year post-transplant. This patient did not go on to experience a dnDSA event within 5 years post-transplant, and the estimated probability of dnDSA-free survival is higher using M4 compared to M1. In the right-hand panel, an individual who had a high CV of TAC during the first year post-transplant is presented. This patient went on to develop TAC sometime between 48 and 54 months post-transplant, and the patient’s probability of dnDSA-free survival is lower using M4, compared to M1. For both of these individuals, M4 yielded a better prediction of dnDSA compared to M1.

Fig. 3

Predicted probability of dnDSA-free survival, conditional on survival to 12 months. Left-hand panel: black lines represent observed TAC values for a given individual with good adherence (low coefficient of variation) of TAC during the first year post-transplant. Red curve represents predicted probability from M4 (model with coefficient of variation shared parameter) of remaining dnDSA-free up to 5 years post-transplant. Blue curve represents predicted probability from M1 (base model, does not account for variability) of remaining dnDSA-free up to 5 years post-transplant. The 95% credible intervals for the predictions are indicated by the shaded regions. The right-hand figure is the same framework, but for a patient with bad TAC adherence, and who developed dnDSA sometime between 48-55 months post-transplant

Simulation study

We conducted a simulation study to evaluate the performance of a model that allowed for individual-specific variances on a finite sample. We used M4 to simulate data, as this was the model with the most complexity. The true values of each parameter, listed in Table 3, were set to be close to the resulting estimates from the training dataset. Simulation results were obtained from 200 datasets, each with 200 individuals. Models were run for 1,000 iterations, after a 1,000 iteration burn-in time. As shown in Table 3, all empirical means (means of the 200 estimates) are close to the true values, except there was a small bias in α, the shape of the Weibull baseline hazard function, and in λ2, the association parameter for g, the individual-specific CV.

Table 3

Simulation study results of M4, model with shared individual-specific CV parameter (data simulated using M4, 200 datasets of N=200)

Parameter	True Value	Mean	SD
b₁	-0.03	-0.029	0.008
b₀	7	7.006	0.102
ρ	-0.03	-0.001	0.088
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\sigma ^{2}_{0}$\end{document}σ02	1.75	1.769	0.204
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\sigma ^{2}_{1}$\end{document}σ12	0.004	0.004	0.001
β₀	-2	-2.006	0.344
β₁ (HLA Mismatch Number)	0.25	0.265	0.085
α	0.5	0.516	0.04
λ₀	-0.5	-0.551	0.289
λ₂ (loading parameter for individual-specific CV)	0.2	-0.179	0.480

Simulation study results of M4, model with shared individual-specific CV parameter (data simulated using M4, 200 datasets of N=200)

Discussion

Tacrolimus is the most important immunosuppressant drug in solid organ transplantation despite having a very narrow therapeutic window. Ensuring appropriate drug exposure within this window is critical in order to maintain health of the graft while avoiding drug toxicities. However, understanding intrapatient drug exposure has been challenging, as only using average drug levels, or one-dimensional summary statistics of drug levels, has several disadvantages that don’t represent a complete picture of TAC exposure over time. The variability of TAC, using all longitudinal measures, may be a useful way to help characterize TAC drug exposure, as variability often leads to more time out of therapeutic range and places patient at risk for adverse outcomes [6]. In this paper, we demonstrated that incorporating an individual-specific variance term into the modeling of TAC and dnDSA indeed improves model fit and prediction of dnDSA. The random effects in the joint model induce the dependence between the two sub-models, which allows us to test whether the variance and CV of longitudinal TAC is associated with time-to-dnDSA, while accounting for the unobserved heterogeneity in these functions. Unlike the random effects, the biomarker’s variability (embodied in the CV) has an intuitive clinical interpretation. In our joint model context, its contribution is readily quantified as a hazard of experiencing a dnDSA event. If we observe two patients that only differ in the CV of TAC by one unit, everything else being equal for a specific time, the hazard of dnDSA is 1.4-times higher for the individual with the larger CV. Similarly, if two patients differ only in the estimated random error of TAC by 1 unit, the hazard of dnDSA is 1.3-fold higher for the patient with the higher variability. The number of variance parameters estimated by M2-M4 increases as the sample size increases. These exchangeable random-effects arise naturally in Bayesian hierarchical modeling, enabling a flexible individual-level structure of dependence, and computationally they are very convenient. The estimates at the individual-level shrink towards a common population-level estimate, which intuitively allows for better modeling of any individual’s longitudinal trajectory by borrowing information from similar subjects. In other words, exchangeability allows the use of information from the entire cohort to strengthen the inference for any individual subject. In Bayesian jargon, this is called “borrowing strength” [24]. The DIC and WAIC metrics both account for the effective degrees of freedom, particularly important in models with complex hierarchies of random effects; and despite additional parameters, the models with individual-specific variances proved to be superior to the model with homogeneous variance (Table 1). Additionally, the individual-level parameters allow individual-specific estimation, improving the dynamic predictions (as described in “Dynamic prediction of survival probabilities” section). The models with heterogeneous variance (M2-M4) proved to have superior predictive performance than M1, as shown by AUC and BS. The AUC was generally higher for predicting at t=12 months, compared to t=6 months, perhaps because we have more available longitudinal measurements to use in the prediction at the later time point. The BS was lower for the scenario of t=6 and t′=12, compared to the scenario of t=12 and t′=24, perhaps because the window of prediction was shorter (6 versus 12 months). Dynamic predictions using this model may be employed as a tool of personalized medicine, providing a guide for the clinical decisions, e.g. by modifying interventions or acting as alerts in responding to the longitudinal profile of the patient. As seen in Fig. 3, incorporating the variability in to the model improved the predictions for these two individuals, which highlights the need to account for the overall trajectory and variability of TAC when assessing risk of dnDSA. Models such as these could be applied to any disease that is treated with medications that have a narrow therapeutic window that require drug monitoring, such as glomerularnephritides, autoimmune conditions, and some cancers. We have employed some of the most used parameterizations of joint models (e.g. sharing the random effects) as well as one that is less common, which considers the sharing of other characteristics of the longitudinal trajectory, namely the standard deviation of the residual error. In future work, we plan to explore alternatives to JAGS, such as Nimble [25] or Stan [26], as these approaches may speed up the computational time needed to perform the dynamic predictions. Another natural extension of this work would be to adopt the idea of the latent class joint models, which could be used to identify the patients whose prognostic can be improved by adjusting the dose of TAC and patients for whom the dose adjustment will fail. After model adjustment, subjects of different latent classes can be easily interpreted with survival probabilities plots, thus providing a good prognostic tool.

Conclusion

Using a joint model with a flexible linkage function, we demonstrated that an individual’s variability of TAC over time is important in dynamically predicting dnDSA post kidney transplant. The proposed model improved prediction of dnDSA over a method that did not model the variability in TAC, and has potential of improving clinical outcomes in this area of personalized medicine. Additional file 1 Dynamic prediction based on variability of a longitudinal biomarker Supplemental Materials

14 in total

1. Understanding the causes of kidney transplant failure: the dominant role of antibody-mediated rejection and nonadherence.

Authors: J Sellarés; D G de Freitas; M Mengel; J Reeve; G Einecke; B Sis; L G Hidalgo; K Famulski; A Matas; P F Halloran
Journal: Am J Transplant Date: 2011-11-14 Impact factor: 8.086

2. Time-dependent variability in tacrolimus trough blood levels is a risk factor for late kidney transplant failure.

Authors: Ruth Sapir-Pichhadze; Yao Wang; Olusegun Famure; Yanhong Li; S Joseph Kim
Journal: Kidney Int Date: 2013-12-11 Impact factor: 10.612

3. A Bayesian semiparametric multivariate joint model for multiple longitudinal outcomes and a time-to-event.

Authors: Dimitris Rizopoulos; Pulak Ghosh
Journal: Stat Med Date: 2011-02-21 Impact factor: 2.373

4. Dynamic predictions in Bayesian functional joint models for longitudinal and time-to-event data: An application to Alzheimer's disease.

Authors: Kan Li; Sheng Luo
Journal: Stat Methods Med Res Date: 2017-07-28 Impact factor: 3.021

5. Lower tacrolimus exposure and time in therapeutic range increase the risk of de novo donor-specific antibodies in the first year of kidney transplantation.

Authors: Scott Davis; Jane Gralla; Patrick Klem; Suhong Tong; Gina Wedermyer; Brian Freed; Alexander Wiseman; James E Cooper
Journal: Am J Transplant Date: 2017-10-24 Impact factor: 8.086

6. OPTN/SRTR 2015 Annual Data Report: Kidney.

Authors: A Hart; J M Smith; M A Skeans; S K Gustafson; D E Stewart; W S Cherikh; J L Wainright; A Kucheryavaya; M Woodbury; J J Snyder; B L Kasiske; A K Israni
Journal: Am J Transplant Date: 2017-01 Impact factor: 8.086

7. Higher Variability of Tacrolimus Trough Level Increases Risk of Acute Rejection in Kidney Transplant Recipients.

Authors: C-T Huang; K-H Shu; H-C Ho; M-J Wu
Journal: Transplant Proc Date: 2016 Jul-Aug Impact factor: 1.066

8. OPTN/SRTR 2012 Annual Data Report: kidney.

Authors: A J Matas; J M Smith; M A Skeans; B Thompson; S K Gustafson; M A Schnitzler; D E Stewart; W S Cherikh; J L Wainright; J J Snyder; A K Israni; B L Kasiske
Journal: Am J Transplant Date: 2014-01 Impact factor: 8.086

9. Time-dependent ROC curve analysis in medical research: current methods and applications.

Authors: Adina Najwa Kamarudin; Trevor Cox; Ruwanthi Kolamunnage-Dona
Journal: BMC Med Res Methodol Date: 2017-04-07 Impact factor: 4.615

10. Comparison of a time-varying covariate model and a joint model of time-to-event outcomes in the presence of measurement error and interval censoring: application to kidney transplantation.

Authors: Kristen R Campbell; Elizabeth Juarez-Colunga; Gary K Grunwald; James Cooper; Scott Davis; Jane Gralla
Journal: BMC Med Res Methodol Date: 2019-06-26 Impact factor: 4.615

1 in total

1. Joint Modeling of Repeated Measurements of Different Biomarkers Predicts Mortality in COVID-19 Patients in the Intensive Care Unit.

Authors: Kirby Tong-Minh; Yuri van der Does; Joost van Rosmalen; Christian Ramakers; Diederik Gommers; Eric van Gorp; Dimitris Rizopoulos; Henrik Endeman
Journal: Biomark Insights Date: 2022-07-14

1 in total