Literature DB >> 34547928

Flexible extension of the accelerated failure time model to account for nonlinear and time-dependent effects of covariates on the hazard.

Menglan Pang¹, Robert W Platt^1,2,3, Tibor Schuster⁴, Michal Abrahamowicz^1,3.

Abstract

The accelerated failure time model is an alternative to the Cox proportional hazards model in survival analysis. However, conclusions regarding the associations of prognostic factors with event times are valid only if the underlying modeling assumptions are met. In contrast to several flexible methods for relaxing the proportional hazards and linearity assumptions in the Cox model, formal investigation of the constant-over-time time ratio and linearity assumptions in the accelerated failure time model has been limited. Yet, in practice, prognostic factors may have time-dependent and/or nonlinear effects. Furthermore, parametric accelerated failure time models require correct specification of the baseline hazard function, which is treated as a nuisance parameter in the Cox proportional hazards model, and is rarely known in practice. To address these challenges, we propose a flexible extension of the accelerated failure time model where unpenalized regression B-splines are used to model (i) the baseline hazard function of arbitrary shape, (ii) the time-dependent covariate effects on the hazard, and (iii) nonlinear effects for continuous covariates. Simulations evaluate the accuracy of the time-dependent and/or nonlinear estimates, and of the resulting survival functions, in multivariable settings. The proposed flexible extension of the accelerated failure time model is applied to re-assess the effects of prognostic factors on mortality after septic shock.

Entities: Chemical

Keywords: Accelerated failure time model; nonlinear effect; regression splines; simulations; survival analysis; time-dependent effect

Mesh：

Year: 2021 PMID： 34547928 PMCID： PMC8649433 DOI： 10.1177/09622802211041759

Source DB: PubMed Journal: Stat Methods Med Res ISSN： 0962-2802 Impact factor: 3.021

Introduction

In most cohort studies of disease occurrence, progression, treatment or mortality, times to relevant clinical outcomes depend on multiple risk or prognostic factors, and exposures. Thus, whether the main focus is on etiology or on prediction, multivariable time-to-event analyses are essential. Yet, multivariable survival analyses require specifying a formal statistical model that describes how covariates are associated with the event times. Here, different survival models postulate different modes of covariates’ action, typically assumed to be common to all covariates. To facilitate both estimation and interpretation, most standard regression models restrict the corresponding adjusted regression coefficients (e.g. ratios of hazards, odds or survival times) for all covariates to remain constant over time. However, it is often plausible that true effects of some covariates are not consistent with this underlying modeling assumption, requiring extending the model to incorporate time-varying coefficients. In addition, for each continuous covariate one has to specify a suitable functional form of its relationship with the outcome. Both aforementioned modeling complexities have received considerable attention within the vast literature on hazard-based models that dominate medical applications of survival analysis but they are equally relevant for other multivariable survival models. In this manuscript, we attempt to address these challenges within the framework of multivariable accelerated failure time (AFT) model. The AFT model provides an alternative to the proportional hazards (PH) model to analyze time-to-event data.[3,4] Instead of the log hazard ratios estimated in the PH model, in the AFT model the covariate effects are expressed directly on the event time scale and estimated by the log event time ratios. For example, if the event time ratio (treated vs. control) equals 1.25, then the time corresponding to any given survival probability is 25% longer for the treated than the control subjects, implying accelerated failure times among the controls. Recently, the AFT model has become increasingly popular, partly because it avoids the noncollapsibility[5,6] and resulting built-in selection bias of PH-based hazard ratios.[7,8] However, in contrast to the Cox PH model, the parametric AFT models require specifying the event time distribution,[4,9] which is difficult in many real-life applications. To avoid this restriction, several semiparametric AFT models were proposed,[10-16] including our recent spline-based model. However, almost all the aforementioned semiparametric AFT models implicitly impose the conventional assumptions underlying the classic AFT model that (i) for all covariates, time ratios, i.e. “acceleration factors,” are constant over time and (ii) continuous covariates have linear relationships with the logarithm of event time (see Appendix A.1 in Supplementary material for more details). Yet, in multivariable analyses either assumption may be violated, for some covariates. Indeed, real-life applications of flexible extensions of the Cox PH model,[19-24] reported frequent violations of the corresponding assumptions of (i) constant hazard ratios (PH)[25,26] or (ii) linear covariate effects on log hazard, or even both assumptions, for the same continuous covariate.[28,29] In contrast, relatively little work has focused on flexible modeling of covariate effects in the AFT framework. Specifically, whereas alternative AFT partial linear models permit estimating nonlinear (NL) effects, through spline smoothing or piecewise linear functions,[30-32] they require additional assumptions about the error distribution or seem restricted to the univariate setting.[31,32] Similarly, only a few studies permit modeling nonconstant log time ratios, e.g. as a linear function of the follow-up time t. Cox and Oakes outlined a more general AFT model, which can handle either time-varying covariates or time-dependent effects: where function links Z to survival, and is the associated parameter vector. Setting yields AFT model with time-dependent effects : However, the authors do not discuss either parameter estimation or interpretation of the estimated time-varying effects, and do not apply the proposed model to either simulated or real-life data. Recently, in an arxiv manuscript, Crowther et al. have proposed a flexible AFT model with restricted cubic splines modeling of the event time distribution. They develop an elegant full likelihood estimation framework and provide practical implementation in Stata and R. Importantly, Crowther et al's. model builds on the aforementioned Cox and Oakes’ model, to incorporate modeling of time-varying effects, on the cumulative scale: where Crowther et al. estimate time-varying covariate effect by a spline function of the log time and estimate variance through inversion of the negative Hessian. Whereas their model allows for multivariable analyses, the time-varying event time ratio seems only directly interpretable for univariate modeling of a binary exposure. Indeed, the ratio of the times when the “exposed” ( for ) versus the “unexposed” ( for ) reach any given survival probability (1−q) is estimated by . However, it is less clear how to estimate the ratio of the corresponding event times, for fixed (1−q), for more complex contrasts, involving either continuous covariates or multivariable analyses, where event time ratios may depend on the other covariates. Both comprehensive simulation studies and a real-life application, reported by Crowther et al. illustrate the advantages of flexible modeling of the event time distribution, but involve only constant acceleration factor(s), and one or two covariates. Thus, the accuracy of the estimated time-varying covariate effect(s) and performance of their model in multivariable analyses need to be further assessed. Furthermore, similar to Cox and Oakes, Crowther et al. do not discuss modeling of potential NL effects of continuous covariates. Furthermore, to the best of our knowledge, no published AFT model permits estimating both time-dependent (TD) and NL effects of continuous covariates. Yet, simulations suggest that, under the PH model framework, the relevant TD and NL effects of all continuous covariates should be simultaneously accounted for to avoid biased estimates and inaccurate inference.[22,34] To address these challenges, we propose a flexible extension of the AFT model that permits accounting for potential TD and/or NL covariate effect(s) while allowing for arbitrary distribution of the event times. Section “Methods” describes our model and the estimation algorithm. Simulation studies are reported in the section “Simulation studies.” In the section “Real-life application,” our flexible extension of the AFT model identifies TD and/or NL effects of some prognostic factors in real-life multivariable analyses of mortality after septic shock. We conclude with a discussion of our results and their implications.

Methods

In the conventional AFT model, the natural logarithm of the event time, logT, is modeled as a linear function of the covariate vector [3,4]: where W is a random error and is the vector of regression parameters, i.e. the logarithms of time ratios, which describe how the changes in covariate values are associated with either accelerated or decelerated event time. The equivalent hazard-based specification of the AFT model is : where is the baseline hazard function corresponding to = 0 and is the same log(time ratio) vector as in (1). Both formulations of the AFT model (1) and (2) imply (i) a linear relationship between each continuous covariate and the log event time and (ii) constant-over-time log time ratios , i.e. log acceleration factors. In this article, we propose a flexible extension of the AFT model to simultaneously relax both these conventional assumptions and to account for possible TD covariate effects on the hazard and/or NL effects of continuous variables.

Joint flexible modeling of NL and TD effects in the AFT model

First, to relax the linearity assumption, the AFT model (1) can be generalized to: which is sometimes referred to as the “AFT partial linear model.”[30,32] The function is a possibly NL transformation of continuous covariate that estimates its association with the logarithm of event time. An equivalent NL extension of the hazard-based AFT model (2) is: However, both models (3) and (4), as well as other published flexible NL extensions of the AFT model,[15,30,32] restrict the covariate effects to be constant during the follow-up. To relax this assumption, we propose to further extend the NL hazard-based AFT model (4), by replacing constant log acceleration factor β by a flexible function of follow-up time . The resulting model (5) below allows for both TD effects of all covariates on the hazard and NL effects of continuous covariates: This extension involves modifying the hazard-based AFT models (2) and (4), rather than the mathematically equivalent classic AFT models (1) and (3), to avoid complex constraints required for modeling the density or the survival functions. In flexible model (5), the effect of a continuous variable on the log hazard, at time t, is modeled as a product of two covariate-specific estimable functions: and . First, defines a functional form for , i.e. its possibly NL dose-response function. Secondly, the time-dependent function , reflects the dynamic changes, during the follow-up, in the strength of the covariate effect on the log hazard, expressed by . Notice that the estimated values of do not represent time-varying log event time ratios, i.e. time-varying acceleration factors. However, model (5) implicitly allows for the ratios of event times associated with different values of to vary over time, because the constant coefficient in (2), mathematically equivalent to constant log time ratio in classic AFT model (1), is replaced by the TD function . Section “Reconstructing time-dependent time ratios” explains in detail how time-varying event time ratios for any contrast in can be reconstructed based on the estimate in (5). Then, Appendix A2 in Supplementary material permits assessing the accuracy of the resulting estimates of (i) reconstructed time ratios, and (ii) survival functions conditional on time-varying covariate effect(s), based on our proposed model (5), in simulations where the data-generating model was not specified in terms of in (5). In the proposed model (5), similar to a more constrained hazard-based AFT model (2), the covariate affects the hazard function by not only shifting the baseline hazard in the time scale horizontally, as reflected by , but also shifting it vertically by a multiplicative factor . For a binary covariate, . We propose to model for each covariate , for each continuous covariate , and the baseline log hazard, using low-dimension unpenalized regression B-splines with degree p and m interior knots: where , and are the B-spline basis functions, and , and are the spline coefficients to be estimated for , and , respectively. As a default option, we suggest using quadratic splines (p = 2) with one interior knot (m = 1) for estimating and cubic splines (p = 3) with 2 knots (m = 2) for modeling (Appendix A3.1 in Supplementary material provides the rationale and more details). The full log-likelihood based on model (5) for right-censored data is derived as: where is the event or censoring time for subject i, and is an indicator of the event or censoring . We can estimate the three sets of spline coefficients , and by maximizing the above full log-likelihood in (9). Substituting and into (6) and (7), the NL and the TD estimates can be obtained. Then, the hazard and survival functions, conditional on covariate vector , can be computed as in (10) and (11): where and .

Alternating conditional estimation

Estimating the parameters , and simultaneously, by maximizing the complex likelihood function in (9), would be challenging. First, model (5) involves a product of two estimable functions and for the same continuous covariate, inducing non-identifiability. Furthermore, both and vectors need to be estimated to capture the NL and TD effects but must be considered as fixed and “known” when estimating coefficients in for the log hazard function. To address these challenges, we rely on an iterative alternating conditional estimation (ACE) algorithm. The algorithm iterates across the three consecutive steps, each involving estimating only one of the above coefficient vectors, conditional on the previous estimates of the two other vectors. Appendix A3.2 in Supplementary material describes the ACE algorithm in detail and Appendix A3.3 in Supplementary material discusses bootstrap-based pointwise 95% confidence bands around the estimates.

Reconstructing time-dependent time ratios

The time-dependent estimate in flexible model (5) describes how the effect of on the hazard changes over time. However, in contrast to the constant in the conventional AFT models (1) and (2), in model (5) does not represent time-dependent changes in the logarithm of the time ratio. In fact, given the estimate, the corresponding time-dependent time ratios for a contrast between two subjects with different values of not only vary across time t but also depend on the values of other covariates for the subjects compared. We propose to describe time-dependent covariate effects, within the AFT framework, in terms of the times to reach a specific quantile of the event time distribution. For a binary exposure X, we focus on the ratio of the times when the unexposed (t0 for X = 0) and the exposed groups (t1 for X = 1) reach the qth quantile of the event time distribution (q), corresponding to survival probability (p = 1−q). For example, if 80% survival probability (q = 0.2) for the unexposed and exposed group is reached by 7 and 11 months, respectively, then . If the exposure has a truly time-varying effect, then e.g. differs from In contrast, in the conventional AFT model, the time ratio is constant: for all q. Yet, in multivariable analyses based on our model (5), reconstructing the time-dependent time ratio comparing survival for subjects with different covariate patterns requires complex transformations of and by inverting the survival functions in (11). Because analytic solution is difficult, we rely on a grid search to find the q-quantiles of the respective event time distributions. For example, for a setting with two continuous covariates that both have TD and NL effects, to reconstruct the time-dependent time ratio comparing two subgroups with versus but the same value of in both groups, we need three steps. Step (1) calculates and for discrete times t with extremely small increments. The following two steps are: (2) search, separately, for the times t1 and t0 such that ; (3) calculate . These calculations are repeated across the relevant range of q values and the resulting function is plotted to describe the time-dependent time ratio for this specific contrast. Similar calculations can be performed for any contrast of interest, but the results will vary depending on the values of all other covariates in the model. Figure 4 and Figure A2.1(a) to (c) and Table A5.1.2 in Supplementary material illustrate examples of reconstructed time ratios for, respectively, simulated and real-life data.

Figure 4.

Results of the estimated log time ratios by the flexible accelerated failure time (AFT) model using 100 samples in simulation scenario 1, comparing two covariate patterns for each covariate. The two covariate patterns are shown in the labels on the top of each panel, along with the true survival times in both groups corresponding to specific q-quantile of the survival time. The gray curves are the individual estimates from 100 samples, and the pointwise mean is shown by the white curve. The black dashed curve represents the perspective true time ratios. An R program that implements model (5) and includes functions to estimate the NL and TD effects and reconstruct time-dependent time ratios for an arbitrary contrast in any covariate is available at GitHub (https://github.com/MenglanPang/Flexible-AFT-Model).

Simulation studies

Design of primary simulations

To evaluate the performance of the proposed model in multivariable AFT analyses, we simulated a hypothetical cohort of N = 1000 subjects followed until the event or administrative censoring, at 6 years. Event times were generated from the extended AFT model, conditional on three covariates (binary X1 and continuous X2, X3): with a Weibull baseline hazard . Two simulated scenarios assumed different shape parameters of the increasing baseline Weibull hazard ( vs. ), with a common scale . Both scenarios assumed a TD effect of binary and both TD and NL effects of continuous . However, had a linear TD effect in scenario 1, but a constant-over-time NL effect in scenario 2. Figures 1 and 2 show “true” TD and NL covariate effects, respectively. Appendix A4.1 in Supplementary material provides details of data generation.

Figure 1.

Figure 2.

Results of the estimated TD and NL effects by the flexible AFT model using 100 samples in simulation scenario 2. The gray curves are the individual estimates from 100 samples, and the pointwise mean is shown by the white curve. The black dashed curve represents the true rescaled NL and TD functions.

Results of the estimated TD and NL effects by the flexible AFT model using 100 samples in simulation scenario 1. The gray curves are the individual estimates from 100 samples, and the pointwise mean is shown by the white curve. The black dashed curve represents the true rescaled NL and TD functions. Results of the estimated TD and NL effects by the flexible AFT model using 100 samples in simulation scenario 2. The gray curves are the individual estimates from 100 samples, and the pointwise mean is shown by the white curve. The black dashed curve represents the true rescaled NL and TD functions.

Results of primary simulations

For both scenarios, three multivariable models were fit to each of the 100 simulated samples: (i) the “conventional” parametric Weibull AFT model with linear covariate effects and constant time ratios; (ii) the “nonlinear” Weibull AFT model with NL effects for and but constant time ratios; and (iii) our proposed flexible extension of the AFT model (5) with all possible TD and NL effects, and baseline hazard estimated by splines, without distributional assumptions. NL and TD effects were estimated with quadratic B-splines with one interior knot at the median of the covariate or follow-up times distributions, respectively.

Estimation of the TD and NL functions

To assess the accuracy of the covariate effects estimated with our model (5), Figures 1 and 2 compare the 100 TD and NL estimates (gray curves) against the corresponding true functions (black dashed curves), respectively for scenarios 1 and 2. (The estimates are rescaled, as explained in Appendix A3.4 in Supplementary material). In scenario 1, most of the estimates correctly recover the TD effects of different shapes (Figure 1(a), (b) and (d)), and the NL estimates for X2 and X3 (Figure 1(c) and (e)). The TD estimates show more variability in the tails, where the events are less frequent and regression splines are less stable. Yet, the mean values of all spline-based estimates (white curves) trace fairly close the corresponding true TD/NL functions, indicating lack of under- or over-fit bias, even for the truly linear effect of in Figure 1(e). Figure 2 shows that for scenario 2 almost all spline-based TD and NL estimates are also reasonably unbiased, including TD estimates of the truly constant-over-time effect for (Figure 2(d)). The only exception is the failure of TD estimates to recover a decreasing effect of in the later phase of follow-up, where there are fewer events (Figure 2(a)). Appendix A4.2 in Supplementary material shows that generally similar results for scenario 1 were obtained with different sample sizes (N = 650, 1500) and event rate (40%). Appendix A4.3 summarizes results from the more constrained AFT models (i) and (ii) that illustrate the impact of mis-specifying the covariate effects.

Hazard and survival estimates

Figure 3 compares model-specific hazard and survival estimates (gray curves) in scenario 1, for covariate vector , against the corresponding true functions (black dashed curves). (Appendix A.4.4 in Supplementary material shows similar results for other selected covariate patterns). For the Weibull AFT models (i) and (ii), both the hazard (Figure 3(a) and (b)) and the survival function (Figure 3(d) and (e)) estimates are systematically biased, even though event times were generated assuming Weibull baseline distribution. This illustrates the impact of ignoring TD and—for model (i)—NL covariate effects. In contrast, the hazard and survival estimates based on the proposed flexible TD/NL extension of the AFT model (5) are reasonably unbiased (Figure 3(c) and (f)), suggesting potential advantages of the flexible modeling in multivariable settings with complex covariate effects. Figure A4.4.3 in Supplementary material shows similar results for scenario 2. Appendix A4.5 in Supplementary material describes additional, univariate simulations that provide further evidence of the importance of accounting for NL effects of continuous covariates.

Figure 3.

Estimated baseline hazard functions (the first row) and survival curves (the second row) by the three alternative models using 100 samples in simulation scenario 1. The gray curves are the individual estimates from 100 samples, and the pointwise mean is shown by the white curve. The black dashed curve represents the true baseline hazard and survival functions.

Time-dependent time ratios

Figure 4 shows the estimated adjusted time-dependent time ratios (gray curves), reconstructed using methods of the section “Reconstructing time-dependent time ratios,” in scenario 1 for selected contrasts (see figure legend for details), against the true time ratios (black dashed curves). The shapes of the reconstructed time-dependent time ratios for and (Figure 4(a) and (c)) generally agree with the corresponding TD estimates in model (5) (monotonically decreasing in Figure 1(a) and U-shaped in Figure 1(d)). For , because we assumed , the true time ratio equals 1 across the follow-up, and our spline-based estimates (gray curves in Figure 4(b)) recover well this constant null effect. Thus, consistent with simulations results in Appendix A2 in Supplementary material, even if the TD estimates from our model (5) represent covariate effects on the hazard, combined with the additional computations of the section “Reconstructing time-dependent time ratios,” they recover reasonably well different true patterns of time-varying or constant event time ratios. However, as illustrated in Appendix A4.6 and A6 in Supplementary material, in some situations, the pattern of changes over time in the adjusted time ratios for specific contrasts, may (a) substantially diverge from the estimated shape of for the same covariate; and/or (b) vary considerably depending on the values of other covariates. Therefore, real-life applications should rely on the approach of the section “Reconstructing time-dependent time ratios” to reconstruct the event time ratios for clinically relevant contrasts in prognostic factors.

Comparison of goodness of fit

Table 1 shows that, as expected, when true time ratios were time-varying for some covariates, the proposed flexible model (5) yielded better fit to data than the more constrained AFT models (i) and (ii) (mean Akaike information criterion (AIC) differences of 44–125 points). In contrast, in additional simulations with data generated from the conventional AFT model (1), with constant time ratios and linear effects, our flexible model (5) yielded AIC worse by, on average, 10–13 points than the two simpler AFT models (Appendix A4.7 in Supplementary material), which correctly suggested the lack of systematic TD and NL effects. These results confirm the usefulness of goodness-of-fit comparisons for “model diagnostics.”

Table 1.

Comparison of mean Akaike information criterion (AIC) in simulation studies from three alternative models.

	Flexible AFT model (df = 26)	Conventional Weibull AFT model (df = 5)	Nonlinear Weibull AFT model (df = 9)
Scenario 1	2516.16	2641.49	2600.69
Scenario 2	2255.64	2331.04	2299.25

AFT: accelerated failure time.

Comparison of mean Akaike information criterion (AIC) in simulation studies from three alternative models. AFT: accelerated failure time.

Real-life application

Data source

We applied the model (5) to a real-life study to re-assess the role of important prognostic factors for 3-month all-cause mortality after septic shock.[38,39] Time zero corresponded to initiation of vasopressors in response to septic shock, and patients alive at 90 days after the septic shock were censored. Details on baseline covariates, measured at admission, were reported elsewhere.[38,39] Our analyses included 858 patients who had appropriate antibiotics therapy and complete covariate data. There were 433 (50.5%) deaths during the 1478 patient months of follow-up (median duration: 63.5 days).

Flexible AFT analyses

We assessed five important baseline prognostic factors, selected a priori, based on the literature : age, and Sepsis-related Organ Failure Assessment (SOFA) score, with higher scores indicating a worse organ dysfunction, and three binary variables: immune suppression (yes vs. no), infection site (urinary tract vs. other), and Knaus score of activity limitations (normal or moderate vs. severe or bedridden). Three additional binary covariates were considered for inclusion if they improved the model's fit to data: presence of the germ, infection type (community-acquired vs. nosocomial) and cirrhosis status. Appendix A5.1 in Supplementary material describes the 3-stage procedure used to select specific TD and/or NL effects, and the additional covariates, into the final multivariable model (5). The pointwise 95% confidence bands for the baseline hazard, and the selected TD and NL functions, were estimated based on 300 bootstrap resamples.

Alternative models

We estimated three additional models, with the same covariates, as selected into our final flexible model (5): (i) the “conventional” Weibull AFT model, with constant time ratios and linear effects of continuous covariates, (ii) the “nonlinear” Weibull AFT model that allowed for NL effects (see the section “Results of primary simulations” for details), and (iii) the extension of constant/linear AFT model (i) which modeled baseline hazard in (2) with splines, to avoid distributional assumptions. Notice that models (i)–(iii) could not accommodate potential TD effects and only model (ii) allowed for NL effects.

Results

The spline-based estimate in Figure 5(a) shows that the baseline all-cause mortality hazard decreases monotonically with time since septic shock. For each prognostic factor (row), Table 2 compares the effects estimated with the four AFT models (columns). (Table A5.1.1 in Supplementary material shows details of model selection). For variables for which no TD effects were selected, all four models estimate similar constant-over-time time ratios, with longer survival associated with absence of immunosuppression, urinary infections and less severe Knauss scores.

Figure 5.

Table 2.

The estimated event time ratios and AIC values from alternative models in study on mortality after septic shock.

Covariates	Conventional Weibull AFT model (df = 9)	Nonlinear Weibull AFT model (df = 13)	Spline-based AFT model (df = 13)	Flexible AFT model (df = 33)
Age (1-year decrease)	1.05 (1.03, 1.06)	NL	1.01 (0.96, 1.04)	NL + TD
SOFA score (1-point decrease)	1.51 (1.42, 1.59)	NL	1.32 (1.17, 1.44)	NL + TD
Immunosuppression (absence vs. presence)	2.68 (1.88, 3.81)	2.72 (1.88, 3.81)	2.19 (1.23, 3.11)	2.31 (1.58, 3.25)
Cirrhosis (absence vs. presence)	1.53 (0.90, 2.61)	1.57 (0.98, 2.58)	1.42 (0.84, 2.05)	TD
Knaus score (lower (A/B) vs. higher C/D))	1.79 (1.27, 2.53)	1.70 (1.31, 2.33)	1.75 (1.06, 2.43)	1.51 (1.11, 2.34)
Infection site (non-urinary vs. urinary)	0.40 (0.25, 0.63)	0.38 (0.23, 0.67)	0.46 (0.28, 0.80)	0.49 (0.30, 0.69)
Infection type (community-acquired vs. nosocomial)	1.40 (0.99, 1.98)	1.40 (0.99, 2.00)	1.30 (0.77, 1.61)	TD
AIC	4191.217	4187.541	4195.138	4157.036

AFT: accelerated failure time; AIC: Akaike information criterion; TD: time dependent; NL: nonlinear; SOFA: Sepsis-related Organ Failure Assessment.

Results of the flexible AFT model in study on mortality after septic shock. (a) Baseline hazard function (for individuals with all binary covariates equal 0, age equals to the minimal age of 20 years, and SOFA score equals to the minimal value of 3); (b) TD effects of the cirrhosis status and nosocomial infection; (c) NL effect of age relative to the mean age of 66 years; (d) TD effect of age; (e) NL effect of SOFA score relative to the mean score of 11; and (f) TD effect of SOFA score. Estimates are represented by the black curve, and the shaded gray areas correspond to the 95% pointwise confidence bands, based on 300 bootstrap resamples. The NL effects are constrained to equal 0 at the reference value corresponding to the mean covariate value, and thus the estimates at the reference value show no variation. The empirical distributions of the observed event times (panels a, b, d, f), age (panel c), and SOFA score (panel e) are shown by rug plots at the bottoms of the respective graphs. The estimated event time ratios and AIC values from alternative models in study on mortality after septic shock. AFT: accelerated failure time; AIC: Akaike information criterion; TD: time dependent; NL: nonlinear; SOFA: Sepsis-related Organ Failure Assessment. The last column of Table 2 indicates which TD and/or NL effects were selected into our final flexible model (5). TD estimates in Figure 5(b) show how the strength of the effects of infection type and cirrhosis on the hazard vary over the first 60 days after the septic shock. The association of nosocomial infection with higher mortality becomes stronger with longer follow-up (dashed curve), whereas the impact of cirrhosis increases rapidly during the first 10 days and then stabilizes (solid curve). Table A5.1.2 in Supplementary material shows the resulting time ratio estimates. Both TD effects substantially improve the model's deviance (3-df likelihood ratio test (LRT) statistics of 8.93 and 22.24, respectively), implying also rejection of the H0 of no association with survival. In contrast, all simpler models (i)–(iii), that did not allow for TD effects, suggested constant-over-time effects of both factors were marginally nonsignificant (Table 2) and under-estimated their long-term impact, relative to our TD time ratio estimates in Table A5.1.2. Our final flexible model (5) included also NL and TD effects for age and the SOFA score (last column of Table 2). The nonmonotone J-shaped NL effect of age (Figure 5(c)) suggests mortality is lowest at about 45 years and increases for both younger and older patients. The TD effect for age in Figure 5(d) indicates that this impact becomes stronger over time. In contrast, the NL effect of SOFA is weak (Figure 5(e)) and adding the NL term improves only marginally the deviance (4-df LRT = 9.256, p = 0.055). While this association weakens over time, a higher SOFA at the time of septic shock is associated with a risk increase even two months later when the 95% pointwise confidence interval still excludes 0 (Figure 5(f)). (Figure A5.2.1 in Supplementary material shows NL effects of age and SOFA estimated, based on our model (5), at different follow-up times, and Figure A5.2.2 in Supplementary material shows their NL effects estimated by the “nonlinear” Weibull model (ii).) The proposed flexible extension of the AFT model (5) improved AIC by at least 30 points relative to simpler models (i)–(iii) (bottom of Table 2), highlighting the importance of accounting for the effects NL and/or TD effects. (The Cox–Snell residual plots are shown in Figure A5.2.3 in Supplementary material).

Discussion

To the best of our knowledge, the proposed model (5) is the first flexible extension of the AFT model that simultaneously incorporates both the TD effects and NL effects of continuous variables on the logarithm of hazard function, along with TD effects for categorical variables. The NL and TD effects, and the baseline hazard, are modeled using low-dimension unpenalized regression B-splines. The NL estimate describes how the hazard varies with an increasing value of a continuous covariate, whereas the TD estimate informs how the strength of the covariate effect changes during the follow-up. In multivariable simulations, the proposed spline-based estimates of the NL and TD effects were reasonably unbiased, and the methods of the section “Reconstructing time-dependent time ratios” permit reconstructing time-dependent time ratios for arbitrary contrasts in the covariate values. Furthermore, the survival curves, conditional on multiple covariates with possibly complex effects, are accurately estimated. In contrast, simulations in Appendix A4.5 in Supplementary material demonstrate that ignoring an important NL effect may bias survival curve estimates. This was expected, given similar simulation results obtained within the PH modeling framework confirm that mis-specified models yield biased estimates. Here, we note that in most of our simulations the data-generating mechanisms were structurally compatible with our proposed flexible model (5). This helped assess the accuracy of the NL and TD estimates under a correctly specified model, but future research should investigate more complex situations where the selection of TD/NL effects relies on data-dependent criteria. However, in additional simulations in Appendix A2 in Supplementary material, even if the data-generating mechanism avoided any explicit specification of time-dependent effects on the hazard, i.e. did not favor our model (5), it still allowed an accurate reconstruction of group-specific survival functions, providing some further reassurance regarding its usefulness. The analyses of mortality after septic shock illustrate how our flexible NL/TD estimates may provide new insights into the role of different prognostic factors. For age, the NL estimate suggests a nonmonotone effect on the hazard, whereas the TD estimate indicates that its strength increases over time, possibly because mortality soon after a septic shock depends mostly on the indicators of the severity of the patient's initial condition (especially SOFA and Knaus scores) rather than on age. In contrast, among those who survive this critical early period, older patients are more likely to die. Furthermore, TD estimates for both nosocomial infection and cirrhosis suggest statistically significant increases in their impact with longer time since the septic shock (Figure 5(b)). In contrast, all simpler AFT models, constrained to constant-over-time event time ratios, did not yield evidence of systematic associations of either factor with survival. Additional practical advantages of our flexible multivariable model (5), that extends the AFT model to account for NL and/or TD effects of the covariates on the hazard, may include also improving both (i) the model's fit to data (Section “Results”) and (ii) accuracy of prediction for individual patients. Regarding (ii), in simulations our flexible model (5) yielded practically unbiased estimates of survival functions, conditional on different covariate vectors, whereas simpler AFT models constrained by the conventional assumptions of either constant time ratios or linearity, yielded markedly biased estimates of survival probability, for selected covariate patterns (Figure 3 and Figures A2.1 and A4.5). Thus, our proposed flexible model (5) may be of interest, both for etiology and prediction, especially in multivariable analyses, where some covariates act consistently with the AFT assumption but other may violate the constant time ratio assumption. Finally, because AFT estimates are not affected by noncollapsibility,[5,6] in some applications flexible time-varying AFT models, such as our model (5) or the Crowther et al.'s model, may help explore the reasons for TD effects identified through flexible extensions of the PH model. Indeed, if for covariate X one finds decreasing-over-time effects in terms of both event time ratios and hazard ratios, then it is less likely that the latter finding reflects just an omitted “susceptibility” or frailty.[7,41,42] To facilitate interpretation of the time-dependent covariate effects estimated through our model (5), section “Reconstructing time-dependent time ratios” describes how to reconstruct the corresponding time-dependent event time ratios. The time-dependent time ratio, for a given contrast in the covariate value (e.g. vs. ), conditional on the values of other covariates in the multivariable model, quantifies the ratio of times when subjects with the corresponding covariate values have the same survival probability . The time-dependent effect in our model (5) allows the time ratio, for the same contrast in X, to vary depending on the “reference time” . In this sense, our model is an extension of the conventional AFT model, in which the time ratio for a given contrast is constant, regardless of or , because for any t > 0. Interestingly, the time-dependent time ratio reconstructed based on our model (5) is corresponding to the ratio of the corresponding cumulative measures of time-varying effect in the flexible AFT extension proposed by Cox and Oakes. Several flexible methods were proposed to estimate time-dependent hazard ratios in the extended Cox model.[20,24,28,40,43] In contrast, to date, discussion of TD covariate effects in the context of AFT modeling, i.e. time-dependent time ratios, has received relatively little attention and, to the best of our knowledge, is limited to the general idea expressed by Cox and Oakes, an extended linear hazard model by Elsayed et al., and the recent arxiv manuscript by Crowther et al., discussed later in this section. (Indeed, published flexible AFT partial linear models relax the linearity constraint, but impose the constant time ratio assumption[30-32]). One reason may be related to a complex relationship of time-dependent time ratio with the inverse survival function, as outlined in section “Reconstructing time-dependent time ratios.” In the PH framework, the time-dependent hazard ratio for a given covariate at time t is independent of the baseline hazard, or other covariates. In contrast, in the AFT framework, assessing the time-dependent time ratio for covariate at time t requires inversing the survival probability at time t, , which depends on both (i) the underlying baseline hazard and (ii) specific values of all variables included in the model. This implies not only analytical and computational complexities, but also conceptual challenges. In the PH framework, it is rather straightforward to assume that the impact of covariate on the current hazard, at time t during the follow-up, changes according to some smooth function . Accordingly, any reasonably flexible smoothing technique allows direct estimation of time-varying log hazard ratios. In contrast, in the AFT framework, it is more difficult to conceptualize the mechanism that generates a specific pattern of time-varying changes in the log ratio of the times at which different subjects reach the same survival probability, because depends mostly not on the current hazard at time t but on the cumulative effect of hazards at all times in the past . Therefore, to facilitate separating possibly TD covariate effects from the baseline hazard, and to avoid difficulties in modeling survival and/or density functions, we have implemented our model (5) as a flexible extension of the hazard-based formulation of the AFT model (2), rather than of the “classic” event time-based AFT model (1). Specifically, we replaced the constant in model (2), mathematically equivalent to the constant time ratio in model (1),[4,18] by the TD function of follow-up time . Thus, our flexible model (5) assumes that the instantaneous impact of the covariate on the current hazard, at time t, estimated by is common to all subjects, regardless of their values of other covariates, which is often clinically plausible.[26,40,44] However, the complex relationship between covariate effects on the hazard and survival in the AFT framework, implies that the estimated TD effect of on the hazard scale , cannot be directly interpreted as time-dependent changes in the time ratio for . To address this issue, in the section “Reconstructing time-dependent time ratios,” we show the numerical transformations necessary to convert the estimates into time-dependent time ratios , i.e. ratio of times when subjects with different values reach survival probability (1−q). Simulations in the section “Simulation studies” and Appendix A2 in Supplementary material suggest that reconstructed time-varying event time ratios are reasonably unbiased. Yet, one implication of our model (5) is that, for the same contrast in the patterns of reconstructed time ratios may be quite different for subjects with different vectors of other covariates. Figure 5 and Table A.5.1.2 in Supplementary material illustrate such discrepancies, for the effect of SOFA score on mortality after septic shock. Furthermore, Figures A6.1 to A6.2 in Supplementary material provide two hypothetical analytical examples where the shapes of (i) estimated in equation (5) versus (ii) the corresponding reconstructed time-dependent time ratio , differ substantially. Our proposed methods have some limitations. First, when analyzing the simulated data, we have a priori decided to estimate all potential NL and TD effects, regardless of which effects were present in the “true” data-generating model. We were encouraged to observe that for covariates without true TD or NL effects, most of the TD estimates were approximately constant-in-time and the NL estimates approximated well a straight line (Figures 1(e) and 2(d)). Thus, over-fit bias, a common concern for flexible modeling,[23,36,45] was not a major issue in our simulations, with about 250 uncensored events and three covariates. Additional simulations, with different event frequency and more covariates, may be, however, necessary to further explore this issue. Furthermore, due to the complexity of the likelihood function and the iterative ACE procedure, the computation time can be long, especially for large datasets where several TD and/or NL effects need to be estimated. For example, the average run time across the 100 simulations for scenario 1 in the section “Simulation studies” with N = 1000 and 80% event rate was 1.5 h on computers with Ubuntu operating system with 3.20 GHz Intel Core i7-8700 CPU and 16 GB memory. For the septic shock application, it took 6.7 h to run the final model, with 7 covariates, 33 df's and N = 858, on a Mac computer with 2.7 GHz Intel Core i5 CPU and 8 GB memory. Yet, despite the computational burden, we were able to run multivariable simulations and estimate TD and/or NL effects of 3 covariates. With rapid improvement of the computational power, future real-life analyses of a single, even large, multivariable datasets will become increasingly efficient. Further work is needed to systematically compare the proposed flexible NL/TD extension of the AFT model (5) with the PH Cox model, and its flexible extensions. Our simulations were designed to evaluate the performance of the proposed model under the AFT framework, therefore the data were generated accordingly. However, in many real-life applications, the “true” data-generating model may be more consistent with the PH model, implying violation of the constant time ratio assumption underlying the conventional AFT model, which may lead to biased estimates, unless the baseline hazard is exponential or Weibull. In such situations, by allowing the time ratios to vary during the follow-up, our proposed flexible model (5) with TD effects may still reasonably capture the relationships between the covariates and the hazard, but will require more parameters than the PH model, with a single hazard ratio for each covariate. A reverse situation will occur if the (unknown) data structure is more consistent with the AFT model. Therefore, further simulations comparing the PH and AFT models, and their flexible extensions, under a broader range of assumptions concerning “true” data-generating mechanisms, are necessary. Furthermore, in complex real-life studies with multiple covariates, neither the PH nor the AFT assumption may be fully satisfied for all covariates. Then implementing alternative modeling strategies and using goodness-of-fit criteria, supplemented by residual diagnostics, may help choose the final model, or alternative models, but further simulations are necessary to systematically evaluate such diagnostic tools. Finally, because our main focus was on accurate modeling of covariate effects within the AFT framework, we have not considered complex event time distributions. This limitation should be overcome in future simulations, even if our recent work suggests that flexible spline-based modeling of the baseline hazard ensures accurate estimation of constant-over-time covariate effects, i.e. constant acceleration factors, under the conventional AFT model (1), regardless of the “true” (unknown) shape of the hazard. Our flexible TD/NL model (5) can be considered an alternative for the flexible parametric AFT model proposed by Crowther et al. The two models allow for assumption-free modeling of both the event times distribution and time-varying covariate effects, in multivariable analyses, but use different mathematical formulations and estimation procedures. Crowther et al. rely on an elegant full maximum likelihood estimation approach, with user-friendly implementation in R and STATA, which facilitates variance estimation and, for a binary exposure in univariate setting, estimates directly interpretable time-varying acceleration factor. Our model (5) allows more flexibility in accounting for possibly NL effects of continuous covariates and is more directly adaptable to modeling possibly time-dependent covariate effects on the hazard in multivariable analyses. In the general case, both models require additional calculations to estimate time-varying event time ratios for specific contrasts in the values of a particular covariate, and in the section “Reconstructing time-dependent time ratios,” we describe the procedures applicable to our model (5), in univariate or multivariable analyses. Future simulation studies and real-life applications should help systematically compare these two flexible extensions of the AFT model with respect to accuracy of both estimation and inference about time-varying acceleration factors, and computational efficiency. Other outstanding analytical challenges, that need to be addressed in future research on both models, include developing and validating criteria and methods to assist in multivariable model building. Overall, in simulations involving multivariable analyses, our proposed flexible extension of the AFT model yielded reasonably accurate estimates of complex covariate effects on the hazards and allowed unbiased estimation of individual survival curves, conditional on these effects. The model can be implemented using our R programs, available at GitHub (https://github.com/MenglanPang/Flexible-AFT-Model). Furthermore, our septic shock application suggests that the proposed flexible extension of the AFT model may offer new insights into the role of prognostic factors in clinical studies. Still, further comprehensive simulations and multivariable empirical analyses will be necessary to systematically compare our estimates with those offered by alternative flexible extensions of either the PH model or the AFT model, including the elegant flexible parametric AFT model recently proposed by Crowther et al. We also hope that our work may encourage more widespread use of AFT modeling in time-to-event analyses and stimulate further methodological research in this area.

29 in total

1. Censored partial regression.

Authors: Jesus Orbe; Eva Ferreira; Vicente Núñez-Antón
Journal: Biostatistics Date: 2003-01 Impact factor: 5.899

2. Comparison of alternative models for linking drug exposure with adverse effects.

Authors: Michal Abrahamowicz; Marie-Eve Beauchamp; Marie-Pierre Sylvestre
Journal: Stat Med Date: 2011-11-17 Impact factor: 2.373

3. Semi-parametric accelerated failure time regression analysis with application to interval-censored HIV/AIDS data.

Authors: Hongqi Xue; K F Lam; Benjamin J Cowling; Frank de Wolf
Journal: Stat Med Date: 2006-11-30 Impact factor: 2.373

4. Joint estimation of time-dependent and non-linear effects of continuous covariates on survival.

Authors: Michal Abrahamowicz; Todd A MacKenzie
Journal: Stat Med Date: 2007-01-30 Impact factor: 2.373

5. Properties of 2 counterfactual effect definitions of a point exposure.

Authors: W Dana Flanders; Mitchel Klein
Journal: Epidemiology Date: 2007-07 Impact factor: 4.822

6. The role of frailty models and accelerated failure time models in describing heterogeneity due to omitted covariates.

Authors: N Keiding; P K Andersen; J P Klein
Journal: Stat Med Date: 1997 Jan 15-Feb 15 Impact factor: 2.373

7. The hazards of hazard ratios.

Authors: Miguel A Hernán
Journal: Epidemiology Date: 2010-01 Impact factor: 4.822

8. A new proposal for multivariable modelling of time-varying effects in survival data based on fractional polynomial time-transformation.

Authors: Willi Sauerbrei; Patrick Royston; Maxime Look
Journal: Biom J Date: 2007-06 Impact factor: 2.207

9. The epidemiology of septic shock in French intensive care units: the prospective multicenter cohort EPISS study.

Authors: Jean-Pierre Quenot; Christine Binquet; Fady Kara; Olivier Martinet; Frederique Ganster; Jean-Christophe Navellou; Vincent Castelain; Damien Barraud; Joel Cousson; Guillaume Louis; Pierre Perez; Khaldoun Kuteifan; Alain Noirot; Julio Badie; Chaouki Mezher; Henry Lessire; Arnaud Pavon
Journal: Crit Care Date: 2013-04-25 Impact factor: 9.097

10. STRengthening analytical thinking for observational studies: the STRATOS initiative.

Authors: Willi Sauerbrei; Michal Abrahamowicz; Douglas G Altman; Saskia le Cessie; James Carpenter
Journal: Stat Med Date: 2014-07-30 Impact factor: 2.373