
Use of the concordance index for predictors of censored survival data.

Abstract

The concordance index is often used to measure how well a biomarker predicts the time to an event. Estimators of the concordance index for predictors of right-censored data are reviewed, including those based on censored pairs, inverse probability weighting and a proportional-hazards model. Predictive and prognostic biomarkers often lose strength with time, and in this case the aforementioned statistics depend on the length of follow up. A semi-parametric estimator of the concordance index is developed that accommodates converging hazards through a single parameter in a Pareto model. Concordance index estimators are assessed through simulations, which demonstrate substantial bias of classical censored-pairs and proportional-hazards model estimators. Prognostic biomarkers in a cohort of women diagnosed with breast cancer are evaluated using new and classical estimators of the concordance index.


Keywords: Biomarkers; C-index; Pareto model; discrimination; proportional-hazards model; survival analysis

Substances: Biomarkers

Year: 2016 PMID: 27920368 PMCID: PMC6041741 DOI: 10.1177/0962280216680245

Source DB: PubMed Journal: Stat Methods Med Res ISSN: 0962-2802 Impact factor: 3.021

1 Introduction

After determining whether predictors of censored survival data are significant, a common objective is to measure their predictive strength on a scale that is not sample dependent. A plethora of statistics have been suggested. Some have attempted to transfer the concept of R² from linear regression to censored data.[1,2] In this article we consider use of the concordance index for censored data. The first part of the paper reviews the concordance index for predictors of censored survival data. The second part develops concordance index estimators that are valid when the strength of the predictor becomes diminished with follow up. Our proposals are compared with classical methods using computer simulations and a breast cancer prognostic biomarker example.

2 Concordance index

The concordance index was initially developed to estimate the degree to which a randomly chosen observation from one distribution was larger than one chosen independently from another distribution.[3] When $T_1$ and $T_2$ are continuous independent random variables with cumulative distribution functions $F_1$ and $F_2$ the concordance index is

$$C = P(T_1 > T_2) = \int F_2(t)\, dF_1(t). \quad (1)$$

If $T_1$ and $T_2$ place positive mass at the same point then we count half for ties and define $C$ as $P(T_1 > T_2) + P(T_1 = T_2)/2$, so that $C = 0.5$ when the two distributions are the same, even with ties. The concordance index can be estimated from the normalized Wilcoxon ranksum (Mann–Whitney) statistic, by

$$\hat{C} = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} \left\{ I(T_{1i} > T_{2j}) + \tfrac{1}{2} I(T_{1i} = T_{2j}) \right\},$$

where $T_{1i}$ (i = 1, …, n) and $T_{2j}$ (j = 1, …, m) are independent samples from $F_1$ and $F_2$ respectively, and I(.) denotes the indicator function. If $R_i$ denotes the rank of $T_{1i}$ (i = 1, …, n) in the combined sample $(T_{11}, \ldots, T_{1n}, T_{21}, \ldots, T_{2m})$ with the ranks of tied observations averaged, then the Wilcoxon ranksum test statistic is given by $W = \sum_{i=1}^{n} R_i$, which can be related to $\hat{C}$ through $\hat{C} = \{W - n(n+1)/2\}/(nm)$. When the samples ($T_1$ and $T_2$) come from cases and controls respectively, the concordance index is the area under the receiver operating characteristic curve for ($F_1$, $F_2$).[4] When the samples are from two arms of a randomised control trial, $C$ is a measure of the treatment effect. Some variations of $C$ have also been studied. These include the odds of concordance $C(1-C)^{-1}$,[5-7] and a modification to account for matched case-control designs,[8] but they are not considered further in this article.

For a one-parameter family $\{T_Z\}$ of random variables indexed by real number $Z$ from distribution $\{F_Z\}$, a concordance index $C_Z$ that quantifies the degree of association between $T$ and $Z$ is defined, for two independent draws $(Z_1, T_{Z_1})$ and $(Z_2, T_{Z_2})$, as the probability that the ordering of the survival times is concordant with the ordering of the index values, plus $P(Z_1 = Z_2)/2$ (equation (2)), where the last term essentially derives from allowing ties in $Z$ to be broken at random.[9] The definition has the advantage of being continuous in the distribution of $F$ and is equivalent to Kendall’s τ rank correlation coefficient because $\tau = 2C_Z - 1$.
$C$ and $C_Z$ are not the same when $Z$ is a two-point distribution, but they are linearly related. Consider where Z = 1, 2 (e.g. respectively cases and controls, or treated and untreated) and P(Z = 1) = P(Z = 2) = 0.5. Then $P(Z_1 = Z_2) = 0.5$, and from (2) $C_Z$ is a linear function of $C$ whose slope has magnitude 1/2 and which equals 1/2 when $C = 1/2$. Thus for the balanced two-sample situation the range of $C_Z$ is only (1/4, 3/4) and not (0, 1) as for $C$. This important aspect is due to ties in $Z$, and interpretation of $C_Z$ is affected whenever ties in $Z$ are possible. For example, the upper bound of $C_Z$ may decrease if a continuous $Z$ is rounded. Although obvious from (2), this might seem surprising because in practice it is often implicitly assumed that the range of the concordance index $C_Z$ is always (0, 1). Some bounds on the range of $C_Z$ are as follows. Suppose there are n discrete values of $Z$. Then the smallest possible $P(Z_1 = Z_2)$ occurs when they are distributed uniformly so that $P(Z_1 = Z_2) = n^{-1}$; the smallest minimum value of $C_Z$ with n points is $(2n)^{-1}$ and the maximum is $1 - (2n)^{-1}$. Therefore, with discrete data one might normalize $C_Z$, by subtracting its minimum and dividing by its range, so that it can theoretically attain 0 and 1. For large n the range of $C_Z$ is less of an issue, and for continuous distributions of $Z$ the range of $C_Z$ is (0, 1), as can be seen by letting $T_Z = \{-Z\}$ and $T_Z = \{Z\}$ respectively be a set of degenerate one-point distributions for continuous $Z$. In the rest of the paper we focus on estimators of $C$ and $C_Z$ for right-censored data.
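As a minimal illustration of the two-sample estimate described in this section, the following Python sketch computes the pairwise (Mann–Whitney) form, counting ties as one half, and checks it against the form based on the Wilcoxon rank sum with tied ranks averaged. The function names and the toy data are assumptions introduced for illustration only.

```python
from itertools import product


def concordance_two_sample(t1, t2):
    """Estimate P(T1 > T2) + P(T1 = T2)/2 by averaging over all n*m pairs."""
    total = 0.0
    for a, b in product(t1, t2):
        if a > b:
            total += 1.0
        elif a == b:
            total += 0.5  # ties count one half
    return total / (len(t1) * len(t2))


def midranks(values):
    """Ranks 1..N, with tied observations given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def concordance_from_ranksum(t1, t2):
    """Same estimate via the rank sum W of the first sample: (W - n(n+1)/2) / (nm)."""
    n, m = len(t1), len(t2)
    ranks = midranks(list(t1) + list(t2))
    w = sum(ranks[:n])
    return (w - n * (n + 1) / 2.0) / (n * m)


# Hypothetical toy data (not from the paper)
cases = [5.0, 3.0, 8.0, 3.0]
controls = [2.0, 3.0, 7.0]
c_pairs = concordance_two_sample(cases, controls)
c_ranks = concordance_from_ranksum(cases, controls)
```

For these toy data both forms give 8/12, since the rank sum of the first sample is 18 and n(n+1)/2 = 10.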

3 Estimator review

3.1 Censored-pairs estimators

The concordance indices (1) and (2) have been extended to censored data by ignoring pairs when the smaller survival time is censored and using a normalising constant to account for these uninformative pairs.[10,11] While such statistics can be useful for comparing different models on the same data set, Efron[12] noted that Gehan’s approach[10] was dependent on the censoring distribution, and so was not a universal measure of P(T1 > T2). Others have noted that Harrell’s approach[11] likewise depends on the censoring distribution.[13] If the censoring random variable H is conditionally independent of T given Z, so that the observed survival function is the product of the conditional survival functions of T and H, then from equation (2), the censored-pairs concordance index is given by a ratio (equation (3)). The terms in its numerator and denominator arise because contributions to the statistic only occur for pairs of observations when the smaller survival time is not censored. The following methods were developed to be independent of the censoring distribution.
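The censored-pairs idea can be sketched in a few lines of Python. This is an illustrative implementation in the spirit of Harrell's statistic, not the authors' code. It assumes that a higher score means higher risk (an earlier event), that pairs with tied observed times are skipped, and that tied scores count one half. The function name and the data are hypothetical.

```python
def censored_pairs_c(times, events, scores):
    """Censored-pairs concordance: use a pair only if its smaller time is an event.

    Assumption: a higher score is expected to go with an earlier event.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the smaller time of a usable pair must not be censored
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / usable


# Hypothetical data: the second observation is censored at time 4
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 0, 1, 1]
scores = [3.0, 2.5, 1.0, 2.0]
c_hat = censored_pairs_c(times, events, scores)
```

Here four pairs are usable (three anchored at the event at time 2 and one at the event at time 6) and three of them are concordant. The denominator counts only usable pairs, which is how the censoring distribution enters the statistic.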

3.2 Efron’s estimator of C

For the two-sample situation, Efron[12] suggested a solution using the Kaplan–Meier estimates for the survival distribution given by S1(t) = 1−F1(t) and S2(t) = 1−F2(t), and computing P(T1 > T2) based on these estimates through

$$\hat{C} = -\int_0^{\infty} \hat{S}_1(u)\, d\hat{S}_2(u),$$

where $\hat{S}_1(u)$ and $\hat{S}_2(u)$ are the Kaplan–Meier estimates of the survival functions S1 and S2 respectively.[14] That is

$$\hat{C} = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} \hat{Q}(t_{1i}, t_{2j}, y_{1i}, y_{2j}), \quad (4)$$

where the observed data are in pairs of event times and indicators $(t_{1i}, y_{1i})$ in group 1 and $(t_{2j}, y_{2j})$ in group 2, where $y_{1i} = 0$ if $t_{1i}$ is censored, one otherwise, and similarly for $y_{2j}$, and $\hat{Q}$ is estimated by substituting Kaplan–Meier estimates of survival functions into the relevant terms in Table 1. Examples to show the difference between Efron’s estimator and the censored-pairs approach have been reported.[15]

Table 1.

(y_i, y_j)	t_i ≥ t_j	t_i < t_j
(1, 1)	1	0
(0, 1)	1	S_i(t_j) / S_i(t_i)
(1, 0)	1 − S_j(t_i) / S_j(t_j)	0
(0, 0)	1 − S_j(t_i) / S_j(t_j) + ∫_{t_i}^∞ S_i(u) dF_j(u) / {S_i(t_i) S_j(t_j)}	∫_{t_j}^∞ S_i(u) dF_j(u) / {S_i(t_i) S_j(t_j)}

Values of Efron’s Q(t_i, t_j, y_i, y_j) for the concordance statistic. Note that for the two-sample estimator of C the 1 and 2 subscripts have been dropped, so that for example t_i represents t_1i and t_j is t_2j, similarly S_i is S_1 etc. This notation is used so that the table generalises to estimators of C_Z.

Efron’s estimator overcomes limitations of the censored-pairs approach for the two-group problem but requires that the estimated survival functions decrease to zero, so that one treats the last event time in each group as not censored in the Kaplan–Meier estimator. When there is censoring due to incomplete follow up, with everyone censored by tmax and where S1(tmax) > 0 and S2(tmax) > 0, then Efron’s estimator may be very unstable. An important example of this situation is when individuals are enrolled sequentially in a clinical trial and events are recorded until (say) 10 years after the first entry (tmax = 10). In such situations taking the last time in each group to be an event will substantially bias the concordance index in the direction of the group with the longest surviving member beyond that time. For example, if 90% are at risk in both groups after the last event has occurred, then 81% of the terms in the double summation (4) will favour the group with the longest surviving (censored) member, and the estimate is guaranteed to be greater than 0.81−0.19 = 0.62.
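The Kaplan–Meier plug-in form of the two-sample estimate can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it follows the convention described above of treating the largest observation in each group as an event, and it ignores the half-credit for times tied between the two groups. All function names are hypothetical.

```python
def kaplan_meier(times, events, last_is_event=True):
    """Return the Kaplan-Meier estimate as a sorted list of (time, survival) steps."""
    data = sorted(zip(times, events))
    if last_is_event:
        # Convention from the text: the largest observation is treated as an event
        last_time = data[-1][0]
        data = [(t, 1 if t == last_time else e) for t, e in data]
    steps, survival, idx, n = [], 1.0, 0, len(data)
    while idx < n:
        t, at_risk, deaths = data[idx][0], n - idx, 0
        while idx < n and data[idx][0] == t:
            deaths += data[idx][1]
            idx += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            steps.append((t, survival))
    return steps


def step_value(steps, t):
    """Right-continuous survival estimate S(t) from (time, survival) steps."""
    value = 1.0
    for time, survival in steps:
        if time <= t:
            value = survival
        else:
            break
    return value


def efron_concordance(t1, y1, t2, y2):
    """Plug-in estimate of P(T1 > T2): sum over jumps of S2 of S1(u) times the jump."""
    s1 = kaplan_meier(t1, y1)
    s2 = kaplan_meier(t2, y2)
    total, previous = 0.0, 1.0
    for time, survival in s2:
        total += step_value(s1, time) * (previous - survival)
        previous = survival
    return total


# Hypothetical uncensored data with no ties: 7 of the 9 pairs have T1 > T2
c_uncensored = efron_concordance([5.0, 3.5, 8.0], [1, 1, 1], [2.0, 3.0, 7.0], [1, 1, 1])
# Same times, but some observations censored
c_censored = efron_concordance([5.0, 3.5, 8.0], [1, 0, 0], [2.0, 3.0, 7.0], [1, 1, 0])
```

Without censoring and without ties the plug-in estimate reduces to the proportion of pairs with T1 > T2, which is a useful sanity check on any implementation.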

3.3 Uno’s estimator of C

Uno and colleagues[13] developed a censored-pairs estimator of the concordance index (2) based on inverse probability weighting. Their solution uses a Kaplan–Meier estimate of the survival function of the censoring distribution, treating it as independent of Z and T, and re-weights the censored-pairs contribution when t_i < t_j to be the inverse of the squared estimated probability of remaining uncensored at t_i, rather than one. The approach is justified by inspection of (3); the weighting cancels out the censoring terms, so that it is (asymptotically) independent of the censoring distribution and converges to C_Z. However, the resulting estimator is only completely independent of the censoring distribution if, as above for the Efron estimator, the maximal follow up for all patients is to a time τ such that the marginal survival distribution S(τ) = P(T > τ) = 0. If not, then the censored-pairs approach will converge to a quantity greater than C_Z. Informally, this is because the individuals with high Z have the event first whether or not hazards also converge with time. More formally, this may be seen by re-expressing C_Z as a weighted average over event times t of a time-specific concordance C(t), in which the distribution of Z among those still at risk at t follows from Bayes’ rule. As t increases, the distribution of Z in those still at risk becomes weighted towards those with longer survival, and C(t) decreases. When follow up is until t = τ, the censored-pairs concordance index converges to the corresponding average of C(t) over t ≤ τ, and because C(t) is decreasing this limit is greater than C_Z (anti-conservatively biased) unless S(τ) = 0. One can also see that the limit of Uno’s concordance index for τ close to the longest follow up will be less than Harrell’s version, since it gives relatively more weight to those C(t) that are closer to t = τ.
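The inverse-probability-weighted version can be sketched by adding a weight to the censored-pairs sum. The Python below is a simplified illustration, not Uno and colleagues' implementation: the censoring survival function is estimated by Kaplan–Meier and evaluated just before each event time (an implementation choice assumed here), tied event and censoring times get no special handling, and a higher score is assumed to mean an earlier event. All names are hypothetical.

```python
def censoring_survival_left(times, events):
    """Kaplan-Meier estimate of G(t-) = P(censoring time >= t), returned as a function."""
    data = sorted(zip(times, events))
    n, idx, factors = len(data), 0, []
    while idx < n:
        t, at_risk, censored = data[idx][0], n - idx, 0
        while idx < n and data[idx][0] == t:
            censored += 1 - data[idx][1]  # censorings are the "events" here
            idx += 1
        if censored:
            factors.append((t, 1.0 - censored / at_risk))

    def g_left(t):
        value = 1.0
        for time, factor in factors:
            if time < t:
                value *= factor
        return value

    return g_left


def weighted_pairs_c(times, events, scores, weight=lambda t: 1.0):
    """Censored-pairs concordance with a weight attached to the smaller (event) time."""
    num = den = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue
        w = weight(times[i])
        for j in range(len(times)):
            if times[i] < times[j]:
                den += w
                if scores[i] > scores[j]:
                    num += w
                elif scores[i] == scores[j]:
                    num += 0.5 * w
    return num / den


def ipcw_c(times, events, scores):
    """Weight each usable pair by 1 / G(t_i-)^2, in the spirit of Uno's estimator."""
    g_left = censoring_survival_left(times, events)
    return weighted_pairs_c(times, events, scores, lambda t: 1.0 / g_left(t) ** 2)


# Hypothetical data with one early censoring: the weights change the estimate
t_a, y_a, z_a = [1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1], [4.0, 3.0, 1.0, 2.0]
unweighted_a = weighted_pairs_c(t_a, y_a, z_a)
weighted_a = ipcw_c(t_a, y_a, z_a)

# Hypothetical data where everyone is censored at the same time (type I censoring)
t_b, y_b, z_b = [1.0, 2.0, 3.0, 5.0, 5.0, 5.0], [1, 1, 1, 0, 0, 0], [3.0, 1.0, 2.0, 0.0, 0.0, 0.0]
unweighted_b = weighted_pairs_c(t_b, y_b, z_b)
weighted_b = ipcw_c(t_b, y_b, z_b)
```

In the second data set all weights equal one because no censoring occurs before the common censoring time, so the weighted and unweighted statistics coincide.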

3.4 Proportional-hazards model

A common approach is to estimate linear predictors of outcomes with censored event times using a proportional-hazards model. Here an estimator of the concordance index that does not depend on the censoring distribution or follow up was achieved by Gönen and Heller.[16] If $T_z$ has hazard of form $z\lambda_0(t)$, then, because $P(T_{z_1} < T_{z_2}) = z_1/(z_1 + z_2)$, the concordance index (2) can be written as an expectation (equation (8)) over independent samples $Z_1$ and $Z_2$ from distribution function F. When $z = \exp(\beta^{\top} x)$ for some linear combination of covariates $x = (x_1, \ldots, x_p)$ and coefficients $\beta = (\beta_1, \ldots, \beta_p)$, and both T and Z are continuous, the concordance index depends on the distribution of z and equals an expression (equation (9)) which is linked to T only through the distribution of the coefficients and covariates $\beta^{\top} x$. Equation (9) may be estimated by replacing F with its empirical distribution, giving an estimator (equation (10)) that uses the proportional-hazards estimates $\hat{\beta}$, and similarly for the more general (8). Its variance is estimable from re-sampling methods or from asymptotic formulae[16] which depend on the covariance matrix of $\hat{\beta}$ that is routinely available from the partial-likelihood methods of the proportional-hazards model.
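Under proportional hazards, the member of a pair with the larger linear predictor fails first with probability 1/(1 + exp(−|d|)), where d is the difference in linear predictors. A model-based concordance in the spirit of Gönen and Heller averages this quantity over all pairs. The sketch below is an assumption-laden illustration rather than the paper's estimator (10): it takes fitted linear predictors as given, and tied predictors contribute one half automatically. The function name is hypothetical.

```python
import math
from itertools import combinations


def ph_concordance(linear_predictors):
    """Average over pairs of P(higher-risk member fails first) under proportional hazards."""
    pairs = list(combinations(linear_predictors, 2))
    return sum(1.0 / (1.0 + math.exp(-abs(a - b))) for a, b in pairs) / len(pairs)


# Two individuals whose hazards differ by a factor of three
c_two = ph_concordance([0.0, math.log(3.0)])
# No variation in the predictor gives no discrimination
c_flat = ph_concordance([1.0, 1.0, 1.0])
```

With a hazard ratio of three the pairwise probability is 3/(3 + 1) = 0.75. Because only the fitted linear predictors enter, the result does not depend on the censoring pattern, but it does depend on the proportional-hazards assumption being reasonable.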

4 New estimators

4.1 Motivation

The methods reviewed above are not universal when the predictor loses strength with time, and may depend on the length of follow up. In particular, formulas (8) and (9) depend implicitly on the validity of the proportional-hazard assumption. Further developments would be useful because hazards are often observed to converge, so that the effect of a predictive factor diminishes as follow-up time increases. This issue is pervasive in applications[5]. For example, in breast cancer epidemiology, many prognostic factors are based on characteristics of the tumour that lose relevance once an individual has survived a period of time[17]. We next propose modifications to the Efron and the proportional-hazard estimators, before introducing a more parsimonious approach.

4.2 Modified two-sample estimator

Recall that when there is censoring due to incomplete follow up, Efron’s estimator may be very unstable. The following modification of Table 1 solves this problem by accounting for when the last time in each group is censored. Denote , , and . Let and , being respectively defined to be zero when or . Now when , P(T1 > T2) may be partitioned as since . Then from Table 1 is redefined to be t1 ≥ t2 t12 The terms are estimated by using Kaplan–Meier estimates of S2(t) for w2; for example S1(tmax) is the Kaplan–Meier estimate at the last non-censored time in the first group. Like the original Efron estimator, the modified estimator is not a universal measure when censoring is due to incomplete follow up because it depends on tmax, but it is more stable than the Efron estimator because it does not depend on which group has the longest surviving censored member. It is not consistent for the concordance index if and but, in this case, clearly it is not possible to obtain a consistent estimator of the concordance index without making assumptions. However, one may obtain an estimate of the concordance index for different follow-up periods by varying tmax, where the modified estimator consistently estimates Thus, one approach to facilitate comparisons between studies is to present the estimate of this for different values of tmax. This idea has been used in a similar context elsewhere,[6,13] and is considered further in later simulations (Figure 2) and an example (Figure 5).

Figure 2.

Illustration of the effect of censoring on the two-group concordance statistic estimator. The lines show the concordance index under a Pareto model, with the γ parameter shown in the key.

Figure 5.

Plot of two-sample concordance index against type I censoring time (tmax) for binarized Ki67 and HER2 from the example. Point-wise 95% confidence intervals (empirical bootstrap) are also shown.

Figure 1.

Illustration of the effect of converging hazards and censoring on concordance index estimators. Solid lines (—) use the classical censored-pairs approach, and the proportional-hazards model estimator is dashed (– – –). The true concordance index for this model is when there was no censoring (— black).

4.3 Modified proportional-hazards model estimator

A problem with the estimator of Gönen and Heller[16] is that if there is no censoring but proportional hazards do not hold, then the estimator will not agree with the classical approach. A partial solution to this is to modify the approach of Efron and write where . Under a proportional-hazards model, C may be estimated via the terms in Table 1, but the proportional-hazard assumption is only needed to calculate the non-trivial terms and so the estimator agrees with the classical formula when there is no censoring. A further difference to the above is that it requires an estimate of the baseline survivor function S0(t). This approach will be anti-conservatively biased when the data are censored and proportional hazards hold. It is intended for use when censoring is light and one would like robustness against large departures from proportional hazards. One might consider allowing for time-varying hazards g. In this case A concordance index based on this involves O(N2) evaluations of this double integral, which would need to be evaluated numerically. One also cannot use the model beyond the maximal follow-up time.

4.4 Pareto model

A parsimonious approach is to use a simple one-parameter model to account for varying degrees of convergence by introducing an unobserved additive covariate (frailty) to the proportional-hazards model, independent from other covariates, with a log-gamma distribution; equivalently, the hazard is multiplied by a gamma frailty with mean one and variance γ.[18] This leads to a transformation model based on the Pareto distribution, so that if the baseline hazard and cumulative hazard are given by λ0(t) and Λ0(t) respectively, then an individual with covariate z has survival function

$$S(t \mid z) = \{1 + \gamma z \Lambda_0(t)\}^{-1/\gamma}$$

and hazard function

$$\lambda(t \mid z) = \frac{z \lambda_0(t)}{1 + \gamma z \Lambda_0(t)}.$$

This very flexible model has some attractive features. The hazard ratio for covariate values $z_1$ and $z_2$ is given by

$$\frac{\lambda(t \mid z_1)}{\lambda(t \mid z_2)} = \frac{z_1}{z_2} \cdot \frac{1 + \gamma z_2 \Lambda_0(t)}{1 + \gamma z_1 \Lambda_0(t)},$$

so that a consequence of the frailty (γ > 0) is that the hazard ratio approaches one as t gets large. When γ = 0 there is no frailty and it becomes the proportional-hazards model; when γ = 1 it becomes the proportional-odds model. Technical aspects of estimation and inference are considered in the appendix.
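The behaviour of the model at its special cases can be checked numerically. The sketch below assumes the standard gamma-frailty (Pareto) form in which an individual with covariate z has survival {1 + γzΛ0(t)}^(−1/γ), with the γ = 0 case taken as the proportional-hazards limit exp(−zΛ0(t)). The function names and the example values are illustrative assumptions.

```python
import math


def pareto_survival(z, cum_hazard, gamma):
    """Survival probability at baseline cumulative hazard `cum_hazard` for covariate z."""
    if gamma == 0:
        return math.exp(-z * cum_hazard)  # proportional-hazards limit
    return (1.0 + gamma * z * cum_hazard) ** (-1.0 / gamma)


def pareto_hazard_ratio(z1, z2, cum_hazard, gamma):
    """Hazard ratio for z1 versus z2 at baseline cumulative hazard `cum_hazard`."""
    return (z1 / z2) * (1.0 + gamma * z2 * cum_hazard) / (1.0 + gamma * z1 * cum_hazard)


# gamma = 1: the odds of an event by time t, (1 - S)/S, equal z * cum_hazard
s = pareto_survival(2.0, 1.0, 1.0)
odds = (1.0 - s) / s

# The hazard ratio starts at z1/z2 and converges to one as follow up lengthens
hr_start = pareto_hazard_ratio(2.0, 1.0, 0.0, 6.6)
hr_late = pareto_hazard_ratio(2.0, 1.0, 1e9, 6.6)

# gamma = 0 leaves the hazard ratio constant
hr_ph = pareto_hazard_ratio(2.0, 1.0, 5.0, 0.0)
```

The value γ = 6.6 is borrowed from the Ki67 example later in the paper purely to make the illustration concrete.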

4.4.1 Concordance index

Computation of the Pareto concordance index involves a formula with γ, the {Z} and the baseline cumulative hazard function where , and analysis of concordance index (2) can proceed as the two previous approaches for proportional hazards. That is, the Pareto model can be used with in (9) replaced by (15) with s = 0 or via the hybrid approach replacing the non-trivial terms in Table 1 with the Pareto terms. The integral in (15) is needed for both approaches. Although it does not appear to be analytically tractable it may be estimated numerically, and it requires much less computation than (12).

4.4.2 Goodness-of-fit

We lastly consider model goodness-of-fit, partly because the Pareto concordance index is not needed when a proportional-hazards assumption is appropriate. One method is an asymptotic score test for when a Pareto model is taken as the alternative hypothesis to proportional hazards.[19] Another approach in this line is to apply a likelihood-ratio test for γ = 0,[20] with adjustment for model-boundary testing.[21] Schoenfeld residuals[22] are sometimes used, and in the general setting are defined for all when a non-censored event occurred (y = 1) to be where and are model estimates. These residuals show the difference between the observed and expected covariate at each event time, and have expectation zero if the model is correct. Plots of ŝ against t and fitted trends may help to identify departures from the model, and a chi-squared test based on scaled residuals is commonly used to test a proportional-hazards assumption,[23] without taking a Pareto model as the alternative. Because Schoenfeld residuals were designed to check the proportional-hazard assumption, a direct comparison with the Pareto model will help assess whether it satisfactorily addressed lack of fit. A related goodness-of-fit test is to use partial residuals defined as[22] Under the model these should be distributed uniformly between zero and one, independently of t. Empirical distribution function goodness-of-fit tests[24] could be used to assess the distribution of r in early and late periods.

5 Simulations

5.1 Bias

A simulation was used to demonstrate issues with existing methodology when there are converging hazards. Twenty thousand individuals were simulated with survival times from a Pareto distribution; the rate for an individual was the exponential of a random normal covariate with unit mean and variance multiplied by a frailty sampled from a gamma distribution with mean one and variance γ. Type I censoring was considered, so that events occurred before a maximal follow-up time based on the expected proportion censored. For exposition we show 90%, 50% and 20% censoring. For ∼10-year follow up, heavy censoring might correspond to survival such as for distant recurrence in women diagnosed with estrogen-receptor positive breast cancer;[25] mid-range censoring (∼50%) might be seen for survival following disease such as an acute myocardial infarction event;[7] light censoring occurs when survival rates are low, for example, for survival following complete resection of non-small-cell lung cancer.[5] In all simulation scenarios there is no difference between the censored-pairs estimators of Harrell or Uno because everyone is censored at the same time. Concordance indices using a proportional-hazards model and the censored-pairs statistic were calculated and compared with the true index, obtained using a simulation without censoring. The results in Figure 1 show that for this model the proportional-hazard estimate was conservative when there was no censoring, but had positive bias when censoring was more than about 50%. The classical estimator substantially overestimated the concordance index when censoring was 50% or more; this bias was more pronounced for heavy censoring as the frailty variance γ increased.

Figure 1.

A second simulation was used to demonstrate the dependence of the two-sample estimator on follow up. Ten thousand individuals were simulated in two groups, with survival time from an exponential distribution with rate one or two, compounded with a gamma frailty with variance γ, which was chosen to show the effect of a change from constant hazards (γ = 0) to when they converge very quickly (γ = 20). Censoring was generated by allowing individuals to be enrolled into a study at different times according to a uniform distribution on [0.00, 0.05], and then they were censored at a maximum follow-up time. The results in Figure 2 show that the two-sample statistic was conservatively biased when there was heavy censoring. Considering the chart from right (heavy censoring) to left (no censoring), one can see that the concordance index estimate increased with more follow up (later censoring) until the covariate had ceased to influence survival due to converging hazards. The plot shows that the statistic is actually better when there are converging hazards, since it will converge to the true value with less follow up.

5.2 Comparison of estimators

A final simulation was used to compare estimators of C_Z. Survival times were from a Pareto distribution whose rate was the exponential of a standard random normal covariate (x) multiplied by 0.7 (i.e. z = exp(βx) with β = 0.7), compounded by a frailty sampled from a gamma distribution with mean one and variance γ. Two choices of γ were considered (1.0 and 6.6) and three levels of censoring (follow up to time with expected censoring percentage 87%, 50% and 20%). The sample size was 1125 and 500 replications were used. The Pareto model was fitted by maximizing the profile likelihood (see Appendix). The reason for choosing β = 0.7, γ = 6.6, 87% censoring and n = 1125 is that these correspond to an example in the next section (Table 3(b), Ki67). We also considered γ = 1 in order to assess a scenario where the proportional-hazards assumption is violated more slowly, and partly for theoretical interest because it corresponds to a proportional-odds model. The censoring levels were varied to help assess the estimators as more follow up is accrued.
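The data-generating mechanism of this simulation can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' simulation code: the baseline hazard is taken to be constant and equal to one, the type I censoring time is set from the empirical quantile of the simulated times, a smaller sample than in the paper is used for speed, and only the censored-pairs statistic is computed. All names are hypothetical.

```python
import math
import random


def simulate_pareto(n, beta, gamma, censor_fraction, seed=1):
    """Simulate survival times with rate exp(beta * x) * frailty and type I censoring."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    latent = []
    for xi in x:
        frailty = rng.gammavariate(1.0 / gamma, gamma)  # mean one, variance gamma
        rate = math.exp(beta * xi) * frailty
        latent.append(rng.expovariate(1.0) / rate if rate > 0 else math.inf)
    n_events = int(round((1.0 - censor_fraction) * n))
    tau = sorted(latent)[n_events]  # common censoring time (assumed empirical quantile)
    times = [min(t, tau) for t in latent]
    events = [1 if t < tau else 0 for t in latent]
    return x, latent, times, events, tau


def censored_pairs_c(times, events, scores):
    """Censored-pairs concordance; a higher score is assumed to mean an earlier event."""
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / usable


n = 400
x, latent, times, events, tau = simulate_pareto(n, beta=0.7, gamma=6.6, censor_fraction=0.5)
c_full_follow_up = censored_pairs_c(latent, [1] * n, x)  # no censoring
c_censored = censored_pairs_c(times, events, x)  # type I censoring at tau
```

Comparing `c_censored` with `c_full_follow_up` over many seeds and censoring fractions would reproduce the kind of comparison reported in the simulations, although a single small replicate such as this one is subject to considerable sampling noise.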

Table 3.

Estimated univariate concordance indices and model coefficients from example.

	Grade	HER2	Nodes	Ki67	ER
(a) Binary predictor
2-sample	0.57	0.61	0.59	0.55	0.53
Harrell	0.59	0.57	0.63	0.61	0.56
Uno	0.58	0.57	0.63	0.58	0.56
PH	0.57	0.54	0.60	0.59	0.56
Pareto	0.53	*	*	0.53	*
PH β̂ (LR-χ²)	0.9 (24.9)	1.1 (23.1)	1.2 (47.5)	0.8 (21.6)	−0.5 (7.8)
Pareto β̂ (LR-χ²)	1.3 (27.0)	*	*	1.4 (25.2)	*
γ̂ (LR-χ²)	4.0 (2.1)	0.0 (0.0)	0.0 (0.0)	8.7 (3.6)	0.0 (0.0)
(b) Continuous predictor
Harrell			0.65	0.64	0.57
Uno			0.64	0.62	0.58
PH			0.61	0.63	0.57
Pareto			*	0.55	0.54
PH β̂ (LR-χ²)			1.0 (72.7)	0.4 (31.8)	−0.2 (11.5)
Pareto β̂ (LR-χ²)			*	0.7 (35.2)	−0.2 (12.0)
γ̂ (LR-χ²)			0.0 (0.0)	6.6 (3.5)	2.8 (0.4)

PH: using proportional-hazards assumption and (10); Grade: moderate or worse; HER2: positive; Nodes: lymph node positive or number of nodes (ordinal: 0, 1–3, > 4); Ki67: above median or continuous marker; ER: oestrogen-receptor score above median or continuous; LR-χ²: likelihood-ratio statistic; β̂: estimated regression coefficient for predictor; * indicates when Pareto model fit was proportional hazards.

The distribution of estimated concordance indices is shown in Figure 3. The concordance-index estimates from a Pareto model were substantially less biased than the other methods with heavy censoring (Table 2). The Pareto estimator was biased for heavy censoring at this sample size because it fits a proportional-hazards model where there is insufficient power to detect non-proportional hazards. Harrell’s statistic and the modified proportional-hazards statistic became less biased as the level of censoring decreased. The Pareto estimator had a lower mean squared error than the other estimators (Table 2).

Figure 3.

Concordance index estimates from simulations and true value (– – –). H: censored-pairs estimator; Ga: proportional-hazards estimator (10); Gb: hybrid proportional-hazards estimator based on (11); Pa: Pareto estimator using model fit; Pb: hybrid Pareto estimator using Table 1.

Table 2.

Simulation estimation results for two scenarios of γ.

	Mean bias (×100)			MSE (×100)
Censoring:	87%	50%	20%	87%	50%	20%
γ = 1 (proportional odds)
Censored pairs	4.8	2.1	0.5	28.4	5.7	1.2
PH-fit	3.2	0.4	−1.6	13.7	1.4	3.3
PH-hybrid	3.3	1.1	0.1	14.1	2.4	0.9
Pareto-fit	0.6	0.0	−0.1	10.5	1.3	0.8
Pareto-hybrid	0.6	0.0	−0.1	10.7	1.4	0.9
γ = 6.6
Censored pairs	8.3	1.8	0.2	75.5	4.8	1.0
PH-fit	6.8	0.5	−1.3	50.3	1.7	2.6
PH-hybrid	6.9	1.2	0.1	51.8	3.0	1.0
Pareto-fit	1.7	0.4	0.5	9.4	0.9	0.9
Pareto-hybrid	1.6	0.1	0.0	9.2	0.9	0.9

MSE: mean squared error; PH-fit: proportional-hazards estimator (10); PH-hybrid: proportional-hazards estimator based on (11); Pareto-fit: estimate using model fit only; Pareto-hybrid: Pareto model estimator using Table 1.

Some differences were seen between a proportional-hazards concordance index based solely on model fit and the hybrid approach using Table 1. As expected the hybrid approach worked best for light censoring. It was worse under 50% censoring for the proportional-hazards model because it shifted the estimate towards the Harrell estimate, and the censored-pairs estimators are expected to be anti-conservative unless follow up is to a point where survival is zero (cf. Figure 1). Thus, we do not recommend the hybrid approach unless censoring is light.

6 Example

The example uses a sample of 1125 women with oestrogen-receptor positive breast cancer, of whom 145 had a distant recurrence after a median 8.5 years of follow up in a clinical trial (ATAC trial, ISRCTN registration number ISRCTN18233230). This sample from the transATAC study (approved by the South-East London Research Ethics Committee (REC ref no. 971037)) was previously used to show that some immunohistochemical (IHC) biomarkers added useful information to classical clinical prognostic factors.[25] For demonstration and insight we focus next on some of the individual biomarkers used in the IHC risk score. We do not present results from the hybrid estimators because censoring is heavy, but there was little difference because model assumptions dominate the calculations (87% of women were censored). Table 3 shows some univariate concordance index estimates. The following points are of note. First, the two-sample estimates differed from the other forms of concordance index. Second, Harrell’s and Uno’s statistics were closer to each other than the proportional-hazards and Pareto model statistics. This is likely due to the bias from follow up, as discussed earlier. Third, Pareto estimates were substantially lower than the proportional-hazards model when γ̂ > 0, reflecting an assumption of converging hazards. Finally, the concordance indices of binarised predictors were less than continuous counterparts due to the information loss from dichotomising.
To explore further we focus on Ki67, whose Pareto concordance index estimate was 0.552 (SE (standard error) 0.0156) compared with 0.631 (SE 0.0210) under a proportional-hazards assumption, 0.644 (SE 0.0220) for Harrell’s estimator and 0.624 (SE 0.0213) for Uno’s adjusted version. Ki67 showed evidence of a departure from proportional hazards, seen informally by inspection of Table 4. More formally, a likelihood-ratio test (Table 3) that γ = 0 had P = 0.03 (after correction for model-boundary testing[21]); a different test for non-proportionality[23] yielded $\chi^2_1 = 4.16$, P = 0.04. Schoenfeld partial residuals in Figure 4(a) show that allowing for converging hazards via a Pareto model improved the residuals at the start and end. Figure 4(b) helps to show why; the expected value of Ki67 for events decreased more rapidly than a proportional-hazards assumption. Figure 4(c) shows the fitted hazard ratio from the Pareto model, which approximately halved over the period. Figure 4(d) demonstrates that a Pareto model for a binary Ki67 predictor better matched the Kaplan–Meier estimates than a proportional-hazards model.

Table 4.

Number of events in each year, split by Ki67 median (low/high).

Year	Low Ki67	High Ki67	Ratio
1	4	8	2.0
2	3	14	4.7
3	3	16	5.3
4	5	10	2.0
5	4	13	3.2
6	4	12	3.0
7	9	9	1.0
8	8	10	1.2
9	6	5	0.8
10	2	0	0.0
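The informal evidence in Table 4 can be summarised with a little arithmetic. The sketch below re-enters the yearly event counts from the table, confirms that they sum to the 145 distant recurrences in the sample, and compares the ratio of high to low Ki67 events in early and late follow up. The split after year 6 is an illustrative choice made here, and these are ratios of event counts rather than hazard ratios.

```python
# Yearly event counts from Table 4 (years 1 to 10)
low_ki67 = [4, 3, 3, 5, 4, 4, 9, 8, 6, 2]
high_ki67 = [8, 14, 16, 10, 13, 12, 9, 10, 5, 0]

total_events = sum(low_ki67) + sum(high_ki67)  # 48 + 97 = 145

# Ratio of high to low Ki67 events, years 1-6 versus years 7-10
early_ratio = sum(high_ki67[:6]) / sum(low_ki67[:6])  # 73 / 23, about 3.2
late_ratio = sum(high_ki67[6:]) / sum(low_ki67[6:])  # 24 / 25 = 0.96
```

The early excess of events in the high Ki67 group has disappeared by the later years, which is the pattern expected when hazards converge.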

Figure 4.

Pareto model fit in example. Plot (a) is Schoenfeld partial residuals from a proportional-hazards (o) and Pareto model (end of line linked to o). Least squares trend lines of the residuals are shown for the proportional-hazards (—) and Pareto models (– –); the line at 0.5 indicates good model fit (- - -). Plot (b) compares the expected Ki67 at each event from the two models and least squares trend line. Plot (c) shows the fitted hazard ratios. Plot (d) is the estimated cumulative risk for a binarised Ki67 in the data (KM, Kaplan–Meier) and the models (— above median, – – – less than or equal to median).

A goodness-of-fit test of the Pareto model is suggested by Figure 4(a), where most of the change in partial residuals between the proportional-hazards and Pareto models was in the first and last three years. A two-sample Kolmogorov–Smirnov test of equality in distribution between the residuals in years ≤ 3 vs > 6 rejected the fit of the proportional-hazards model (D = 0.28, two-sided P = 0.03). The trend line shows that the Pareto model fitted somewhat better, and the same test did not reject a fit of the Pareto model (D = 0.22, P = 0.17). Thus the data showed some evidence to support the Pareto model fit, which was certainly better than proportional hazards, and the lower concordance index estimate than from a proportional-hazards model or the other approaches. Figure 5 plots the two-sample concordance index for binarised Ki67 by censoring time. The concordance index increased, and then appeared to plateau after six years. Thus one might surmise that the two-sample estimate from 10-year follow up is unlikely to increase for this variable with further follow up due to converging hazards (cf. Figure 2). HER2 positivity is included for comparison, where the estimated concordance index increased with follow up, in better agreement with a proportional-hazards assumption.
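The D statistic quoted above is the largest absolute difference between two empirical distribution functions. A minimal Python sketch of that calculation is given below, using made-up residual values because the study residuals are not reproduced in the text. The function name and the data are hypothetical, and no P-value is computed.

```python
from bisect import bisect_right


def ks_two_sample_statistic(x, y):
    """Largest absolute difference between the empirical CDFs of x and y."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for point in sorted(set(xs + ys)):
        fx = bisect_right(xs, point) / len(xs)
        fy = bisect_right(ys, point) / len(ys)
        d = max(d, abs(fx - fy))
    return d


# Hypothetical partial residuals in an early and a late follow-up period
early = [0.1, 0.4]
late = [0.2, 0.3]
d_mixed = ks_two_sample_statistic(early, late)

# Completely separated samples give the maximum value of one
d_separated = ks_two_sample_statistic([0.1, 0.2, 0.3], [0.4, 0.5, 0.6])
```

Under a well-fitting model the residuals in both periods should look uniform on (0, 1), so D comparing early and late periods should be small.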

7 Conclusion

The concordance index is routinely used to measure how well a variable predicts the time to a censored event. However, current estimators depend on the extent of follow up, and many predictors of survival data lose their discriminatory power as follow-up time increases. To account for this phenomenon we developed a concordance index based on a Pareto model. This semi-parametric model accounts for converging hazards, but leaves the baseline hazard function unspecified. In simulations under the model the new estimator was substantially less biased than other estimators. In a breast-cancer application the ordering of prognostic biomarker concordance index estimates changed when converging hazards were modelled, reflecting that some predictors are more useful for longer-term predictions than others. Our semi-parametric concordance index estimator is recommended for predictors of censored survival data when there is evidence of converging hazards.
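The dependence of classical estimators on follow up can be illustrated with a small sketch. The code below assumes a Harrell-type censored-pairs estimator for a binary marker, with follow up truncated at a chosen time. It is not necessarily the two-sample estimator plotted in Figure 5, and it is not the Pareto-model estimator. The function name and the toy data are hypothetical.

```python
def two_sample_c_index(times, events, high, follow_up):
    """Censored-pairs concordance for a binary marker with truncated follow up.

    times: observed time per subject; events: 1 if the event was observed,
    0 if censored; high: 1 if the marker is above the cut-off, else 0.
    Subjects still under observation after `follow_up` are censored there.
    """
    # Apply administrative censoring at the chosen follow-up time.
    obs = [
        (min(t, follow_up), e if t <= follow_up else 0, h)
        for t, e, h in zip(times, events, high)
    ]
    concordant = 0.0
    usable = 0
    for t_hi, e_hi, h_hi in obs:
        if not h_hi:
            continue
        for t_lo, e_lo, h_lo in obs:
            if h_lo:
                continue
            if t_hi < t_lo and e_hi:
                # High-marker subject had the event first: concordant pair.
                usable += 1
                concordant += 1.0
            elif t_lo < t_hi and e_lo:
                # Low-marker subject had the event first: discordant pair.
                usable += 1
            elif t_hi == t_lo and e_hi and e_lo:
                # Both events at the same time: count as half concordant.
                usable += 1
                concordant += 0.5
            # All other pairs cannot be ordered and are skipped.
    if usable == 0:
        raise ValueError("no usable pairs at this follow-up time")
    return concordant / usable


# Toy data (illustration only): early events in the high group, later events in both.
times = [1, 8, 3, 9]
events = [1, 1, 1, 1]
high = [1, 1, 0, 0]
for tau in (2, 10):
    print(tau, two_sample_c_index(times, events, high, tau))
```

With these toy data the estimate differs between the short and long follow-up times, because the set of usable pairs changes as later events are observed.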
