Literature DB >> 31386230

Quantifying degrees of necessity and of sufficiency in cause-effect relationships with dichotomous and survival outcomes.

Abstract

We suggest measures to quantify the degrees of necessity and of sufficiency of prognostic factors for dichotomous and for survival outcomes. A cause, represented by certain values of prognostic factors, is considered necessary for an event if, without the cause, the event cannot develop. It is considered sufficient for an event if the event is unavoidable in the presence of the cause. Necessity and sufficiency can be seen as the two faces of causation, and this symmetry and equal relevance are reflected by the suggested measures. The measures provide an approximate, in some cases an exact, multiplicative decomposition of explained variation as defined by Schemper and Henderson for censored survival and for dichotomous outcomes. The measures, ranging from zero to one, are simple, intuitive functions of unconditional and conditional probabilities of an event such as disease or death. These probabilities often will be derived from logistic or Cox regression models; the measures, however, do not require any particular model. The measures of the degree of necessity implicitly generalize the established attributable fraction or risk for dichotomous prognostic factors and dichotomous outcomes to continuous prognostic factors and to survival outcomes. In a setting with multiple prognostic factors, they provide marginal and partial results akin to marginal and partial odds and hazard ratios from multiple logistic and Cox regression. Properties of the measures are explored by an extensive simulation study. Their application is demonstrated by three typical real data examples.

Entities: Chemical Disease Gene Species

Keywords: Cox regression; attributable risk; explained variation; logistic regression; necessary condition; sufficient condition

Mesh：

Year: 2019 PMID： 31386230 PMCID： PMC6771968 DOI： 10.1002/sim.8331

Source DB: PubMed Journal: Stat Med ISSN： 0277-6715 Impact factor: 2.373

INTRODUCTION

The investigation of possible causes for the development of diseases and for outcomes of diseased individuals is a main task of medical research. In clinical medicine, assumed cause‐effect relationships are typically explored by means of the statistical tool box for prognostic factor studies. Naturally, epidemiologists have also been interested in the field of causation for a long time and have introduced useful concepts.1 However, application of these concepts has been hampered by the limited availability of statistical measures (except for attributable fractions) and corresponding estimation of these measures for target populations (cf the work of Vanderweele2). The methodology presented here increases the versatility of the early concepts of necessary and of sufficient conditions to empirical research. While the statistical literature offers numerous approaches to analyze different aspects of the effects of prognostic factors on dichotomous or survival outcomes, here we focus on an aspect that has received little attention: the degree to which levels of a prognostic factor are either necessary or sufficient for a (un)favorable outcome. Let the level of interest of a dichotomous factor alternatively be termed cause or condition. Such a condition can be necessary and/or sufficient for a (un)favorable outcome. A condition is considered necessary for a (un)favorable outcome, eg, disease or complete cure from disease, if without the condition, the outcome cannot develop. A condition is considered sufficient for a (un)favorable outcome if this outcome is unavoidable in the presence of the condition. Necessity and sufficiency can be seen as the two faces of causation,3 and this symmetry and equal relevance are reflected by the suggested measures. If a condition is both necessary and sufficient, it completely determines the outcome. In other words, if and only if this condition is present, the outcome will develop. While there are examples for the latter case in medicine (eg, Tangier disease4), causation in this strict sense rarely is observed in clinical or epidemiological studies. Therefore, it is of medical interest to quantify the degree to which the effect of a condition or, more generally, of a prognostic factor is necessary or sufficient for an outcome. We think that measures for this purpose should have the following properties (cf the works of Kvålseth5 and Schemper and Stare6): Simplicity and intuitively clear interpretation; a range from 0 to 1 being desirable, the endpoints marking minimal and maximal necessity and sufficiency, respectively. The measures of necessity and of sufficiency should be of similar structure but independently sensitive to relevant properties of datasets. The measures should satisfy simple relationships to a corresponding measure of explained variation, given their conceptual proximity to the latter. Availability for any type of prognostic factor, ie, continuous, dichotomous, and polytomous. Availability for any type of regression model or prediction tool. The measures DN and DS for the degrees of necessity and of sufficiency, respectively, introduced in the next Section, obey these criteria. We want to motivate the use of such measures by means of a simple example: Nilsson et al7 studied the relationship between smoking and death due to lung cancer after a 33‐year follow‐up in a cohort of 12 664 males in Sweden. See Table 1.

Table 1

Data of 12 664 males in Sweden7

	Never smoked	Ever smoked	Total
Died from lung cancer	36	177	213
Otherwise	8120	4331	12 451
Total	8156	4508	12 664

Data of 12 664 males in Sweden7 The data confirm that smoking is quite necessary for a lung cancer death (most deaths occurred among smokers); however, smoking is not sufficient for death due to lung cancer (most of the smokers did not die from lung cancer). It would be desirable to quantify the degrees of necessity, DN, and of sufficiency, DS, as additional aspects of the factor smoking, additional, that is, to the relative risk (8.895) or to explained variation8 (EV = 0.017). According to the formulae of Section 2, values of DN and DS are 0.738 and 0.023, respectively. Though smoking increases the risk of death due to lung cancer by almost 9‐fold, it explains only less than 2% of the variability in the outcome. In Section 2, we shall learn how EV is connected to DN and DS. In our example, it is the low DS that is responsible for the low EV. Seen as an aspect or attribute of any prognostic factor, DN and DS will also be defined for continuous factors. Sections 2 and 3 provide definitions for DN and DS for dichotomous and survival outcomes, respectively. In Section 4, the performance of DN and DS under typical and also under extreme situations is explored by artificial and by simulated data examples. We illustrate application to typical prognostic factor studies in Section 5 and close the paper with a discussion in Section 6.

DEGREES OF NECESSITY AND OF SUFFICIENCY FOR A DICHOTOMOUS OUTCOME

For a dichotomous outcome, let D denote the event of interest (eg, disease) with probability P(D) ≠ 0, 1. For a prognostic factor X, we propose to define DN and DS in a population as and where E and E denote expectation conditional on {X: P(D | X) < P(D)} and on {X: P(D | X) > P(D)}, respectively. Note that the noninformative case of P( contributes neither to DN nor to DS. Alternatively, we propose and In the following, we use DN and DS without subscript if referring to both variants of the measures.

Comments on definitions

For an unfavorable outcome event D (like disease), the range of values for X with P(D | X) < P(D) defines its protective levels since at these levels, the conditional disease probability falls below the unconditional one. For a favorable outcome, the definitions above still apply but the same range defines harmful levels of X. It is therefore important to clearly denote the level of the outcome to which DN and DS measures refer. In the remaining text, we assume an unfavorable outcome, if not explicitly stated otherwise, and designate levels of X with P(D | X) < P(D) as protective. By explicitly using 0 in the denominators of (1) and (3), we emphasize that the actual departure of P(D | X) from P(D) is standardized to the maximal (hypothetical) departure when P(D | X) assumes a value of 0. Likewise, the actual departure of P(D | X) from P(D) is standardized by the denominators of (2) and (4), to the maximal (hypothetical) departure when P(D | X) assumes a value of 1. Thus, DN and DS achieve their maximal values of 1 if conditional disease probabilities are at the extreme values of 0 and 1, respectively. Repeating from Section 1, a perfect necessary condition for a disease requires that no diseased individuals are observed for protective levels of the risk factor. Likewise, a perfect sufficient condition for a disease requires that no healthy individuals are observed for harmful levels of the risk factor. If conditional and unconditional disease probabilities do not differ, then there are no informative observations in (1) to (4), resulting in values of 0, both for DN and DS. By squaring the kernels (P( and (P(, within the conditional expectation, DN 1 and DS 1 give larger weight to extreme conditional probabilities (ie, close to 0 and 1, respectively), compared to DN 2 and DS 2. The square root in (1) and (2) permits interpretation on the probability scale with an intention similar to calculating standard deviations from variances. Since squaring is a convex function, Jensen's inequality gives DN 2 ≤ DN 1 and DS 2 ≤ DS 1. On the other hand, DN 2 ≥ DN 1 2 and DS 2 ≥ DS 1 2 since the kernels are constrained in [0, 1]. The definitions presented so far refer to marginal measures. As with EV, it is also possible and useful to define a partial degree of necessity as the difference between two DN measures for nested models (and analogously for DS). This makes it possible to quantify the gain in DN or DS when adding further prognostic factors to a model.

Connection to attributable risk

The reader will notice that the kernel of DN is the attributable risk for a single dichotomous prognostic factor X. The attributable risk, , quantifies the relative extent to which the disease probability can be reduced by shifting all subjects to the unexposed level (X = 0), ie, by eliminating the exposure.9 In this way, high values of AR indicate that the harmful level of a dichotomous risk factor tends to be necessary for the disease. For a general, not necessarily dichotomous, prognostic factor X with a protective value x, (P( gives the relative reduction in disease probability, if all subjects are hypothetically shifted to x. Thus, DN in (1) and (3) can be seen as averages of AR‐type measures across all protective levels of X. Alternative extensions of AR to nondichotomous prognostic factors have been proposed.10, 11 As already noted by Nelson and O'Brien,4 AR only covers the necessity part of causation. Therefore, for a single dichotomous prognostic factor X, we define the reverse attributable risk, , as the relative reduction of the probability of “nondisease” (¬D) by shifting all subjects into the exposed level (X = 1). High values of AR* indicate that moving subjects to the exposed level will substantially increase the disease probability. For a general prognostic factor X with a harmful value x, gives a suitably normalized increase in disease probability, if all subjects are hypothetically shifted to x. Thus, DS in (2) and (4) can be seen as averages of AR ∗‐type measures across all harmful levels of X.

Explained variation as constituted by the degrees of necessity and of sufficiency

In this section, we investigate the contribution of necessity and of sufficiency to explained variation. For dichotomous outcomes, the population definition of the indirect measure of explained variation 8 is EV quantifies the relative reduction in predictive inaccuracy when unconditional prediction is replaced by conditional prediction.8 We assume that P(, which is fulfilled in practice if conditional probabilities are estimated by a calibrated model. Then, the numerator of (5) can be transformed to resulting in This reformulation shows that EV = 0 if and only if P(D | X) = P(D), which is also equivalent to DN = DS = 0. On the other hand, DN = 1 if and only if P(D | X) = 0 whenever P(D | X) < P(D), and DS = 1 if and only if P(D | X) = 1 whenever P(D | X) > P(D). Following from (5), these two conditions are together equivalent to EV = 1. Note that levels x with P(D | X = x) = P(D) are not only noninformative for DN and DS but also for EV using (6). Let denote the probability of a harmful level of X. By splitting the expectation E(.) in (6) into conditional expectations E(.) and E(.), EV can be decomposed into simple functions of DN 1 and DS 1 For P(D) = 0.5, this results in EV as a weighted average of the squared DN 1 and DS 1. In general, the weights (1‐α) and α are multiplied by the odds of disease and nondisease, respectively. Thus, the proportion of variation explained by X is closer to the squared degree of necessity induced by X if the odds for disease are high and there is a low probability of harmful levels for X. For example, in the lung cancer data presented in the introduction, the unconditional disease probability is estimated as 1.7%, which results in a much higher weight for DS in the decomposition of EV (despite an estimated value for α equal to 0.36). In general, Equation (7) shows that rare diseases (P(D) small) need highly sufficient prognostic factors to achieve a reasonably large value of EV. For DN 2 and DS 2, no relation with EV similar to (7) can be deduced, since EV is based on squared deviations between conditional and unconditional disease probabilities, while DN 2 and DS 2 average over absolute differences. However, in Appendix A, we show that while the empirical examples investigated in Sections 4 and 5 demonstrate that usually is close to EV.

Estimation

For estimation of DN and DS, we replace population values in (1) to (4) by sample estimates. For a sample of n individuals, let denote the estimate of the conditional disease probability P(D | X) for the ith individual. For example, the conditional prediction could be based on a (multiple) logistic regression model. Furthermore, let denote an estimate of P(D) and n < and n > the number of observations i with and , respectively. Then, DN 1 and DS 1 can be estimated by and Similarly, and These point estimates can be accompanied by bootstrap confidence intervals. We recommend calculating percentile or BCa bootstrap confidence intervals.12 As usual, bootstrap replicates must be taken on the level of the full vector of observations, and the algorithm or model producing the must be applied to each bootstrap replicate . Within each bootstrap replicate, the estimate of DN and DS must be set to zero whenever the mean of over the harmful X range of the original sample falls below the mean over the protective range. In this way, we prevent switching the meaning of protective/harmful between the original sample and the bootstrap replicates.

Relationships for a single dichotomous prognostic factor

In the case of a single dichotomous prognostic factor, the data can be arranged as shown in Table 2, where a, b, c, and d denote the absolute frequencies in the respective cells with a + b + c + d = n.

Table 2

Notation for a 2 x 2 setting

Outcome
D	c	d
¬D	a	b
	0	1	X

Notation for a 2 x 2 setting The disease probability is estimated as . If , then , and therefore X = 0 is the protective level. Inserting for and for into (8) to (11), we therefore get Note that c = 0 and b = 0 lead to 100% degrees of necessity and of sufficiency, respectively. If , the protective level is X = 1 such that Similarly, for , we get Inserting into (5) or (6) results in which is identical with the multiple R 2 for dichotomous outcomes13, 14 and with explained variation measures and (see Section 2.2 in the work of Schemper8). With analogous derivations on the population level, we therefore have DN 1 = DN 2 = AR, DS 1 = DS 2 = AR, and . Thus, in the 2 x 2 case, explained variation can exactly be decomposed as the product of the degrees of necessity and of sufficiency. The estimates and above rely on data from representative samples, often available from cross‐sectional and cohort studies, usually not from case‐control studies.

DEGREES OF NECESSITY AND OF SUFFICIENCY FOR A SURVIVAL OUTCOME

While, for dichotomous outcomes, we compare single conditional and unconditional probabilities of an event for each individual, with survival outcomes conditional and unconditional cumulative distribution functions, and , are compared. For a single time , the situation is identical to dichotomous outcomes dealt with in Section 2, and the same measures of necessity and sufficiency can be used. However, we are usually interested in survival over the full follow‐up range [0, , and hence, it is natural to average over (0 ≤ where is the density of events in time. The square root of the terms in brackets defines and , the degrees of necessity and of sufficiency at time , respectively. They are analogous to definitions (1) and (2) of DN 1 and DS 1 with dichotomous outcomes. Furthermore, E and E denote expectation conditional on {X: F(t | X) < F(t)} and on {X: F(t | X) > F(t)}, respectively. For a single dichotomous prognostic factor, the kernel ( was termed attributable risk function to quantify the proportion of disease that would be prevented at time t by a fully effective intervention.15, 16 A generalization to “with‐intervention” distributions was proposed by Samuelsen and Eide.17 In definitions (12) and (13), and are appropriately weighted by within the time range of follow‐up or of medical interest. The same approach to averaging an effect‐size measure over time has been used by Schemper and Henderson,18 which also permits comparability of and with their explained variation measures for survival. Similarly, we define and As with dichotomous outcomes, the two variants coincide for dichotomous X, ie, and . In a given sample, let , , and denote observation time, censoring indicator, and vector of prognostic factors, respectively, for individual (1 ≤ . Assume there are distinct survival times in the sample, at times (1 ≤ , with deaths at . Then, at each distinct death time , we estimate the degree of necessity as with denoting the indicator function and the number of subjects with at time . In practice, the required unconditional and conditional estimates, and , are most often obtained by means of the Kaplan‐Meier estimator19 for the unconditional and Cox regression20 for the conditional survival function estimate, and , respectively. However, they might as well be obtained from other estimators and regression models such as parametric survival models. To obtain an overall estimate of , we form weighted averages of the over survival times, with weights designed to compensate the attenuation in observed death due to earlier censorship with , and denoting the Kaplan‐Meier estimator of the censoring or “potential follow‐up” distribution estimated like but with the meaning of the censoring indicator reversed.21 This type of weighting has also been used with explained variation measures for survival outcomes18 and permits consistent estimates of , as given by (12), in the presence of clinical or administrative censoring before (as will be confirmed by simulation results of Section 4.2). Analogously to , we estimate and Note that DN and DS are not affected by monotone transformation of the time scale, a property shared with the commonly employed semiparametric tools of survival analysis. In the above definitions, the protective range of X is the set of values where the conditional distribution function estimate, ie, the conditional estimate of the cumulative incidence, is below the unconditional one. This is useful in the frequent case of survival outcomes defined as the time to an unfavorable event. In cases of a time to a favorable event, such as complete remission, the definitions above still apply but the protective (harmful) range of X would then comprise those values where the cumulative incidence is higher (lower) if estimates are conditional instead of unconditional. As with dichotomous outcomes, it is therefore important to clearly state the outcome to which degrees of necessity and of sufficiency refer.

EMPIRICAL RESULTS

In Sections 4.1 and 4.2, we investigate the ranges of DN and DS and their relationships with odds and hazard ratios and disease probability on the population level. In Section 4.3, we explore the amount of bias of the estimates of DN and DS for sample sizes likely encountered in practice.

Dichotomous outcome

We demonstrate the dependence of DN, DS, and EV on various constellations of a dichotomous outcome and a dichotomous prognostic factor on the population level by means of Table 3. The constellations can either be characterized by the probabilities of the cells, a to d, in a 2 x 2 table (see Table 2) or, equivalently, by the corresponding values of disease probability, P(D), the probability of the harmful level of X, , and the odds ratio for disease, . Depending on whether the OR is greater or smaller than 1, α = P(X = 1) or α = P(X = 0), respectively. Moreover, recall that both variants of DN and DS, DN 1 and DN 2, and DS 1 and DS 2 are identical in case of a dichotomous prognostic factor, and thus, subscripts for DN and DS are omitted.

Table 3

Population values of DN, DS, and EV for dichotomous outcome and dichotomous prognostic factor

Scenario	P(D)	α	OR	a	b	c	d	DN	DS	EV
1	0.1	0.1	10	84.1	5.9	5.9	4.1	0.346	0.346	0.120
2	0.1	0.5	10	48.9	41.1	1.1	8.9	0.787	0.087	0.069
3	0.1	0.9	10	9.9	80.1	0.1	9.9	0.878	0.011	0.010
4	0.5	0.1	10	48.9	1.1	41.1	8.9	0.087	0.787	0.069
5	0.5	0.5	10	38.0	12.0	12.0	38.0	0.519	0.519	0.270
6	0.5	0.9	10	8.9	41.1	1.1	48.9	0.787	0.087	0.069
7	0.9	0.1	10	9.9	0.1	80.1	9.9	0.011	0.878	0.010
8	0.9	0.5	10	8.9	1.1	41.1	48.9	0.087	0.787	0.069
9	0.9	0.9	10	4.1	5.9	5.9	84.1	0.346	0.346	0.120
10	0.1	0.1	100	87.4	2.6	2.6	7.4	0.716	0.716	0.513
11	0.1	0.5	100	49.9	40.1	0.1	9.9	0.975	0.108	0.106
12	0.1	0.9	100	10.0	80.0	0.0	10.0	0.988	0.012	0.012
13	0.5	0.1	100	49.9	0.1	40.1	9.9	0.108	0.975	0.106
14	0.5	0.5	100	45.5	4.5	4.5	45.5	0.818	0.818	0.669
15	0.5	0.9	100	9.9	40.1	0.1	49.9	0.975	0.108	0.106
16	0.9	0.1	100	10.0	0.0	80.0	10.0	0.012	0.988	0.012
17	0.9	0.5	100	9.9	0.1	40.1	49.9	0.108	0.975	0.106
18	0.9	0.9	100	7.4	2.6	2.6	87.4	0.716	0.716	0.513
19	0.1	0.1	0.1	5.9	84.1	4.1	5.9	0.346	0.346	0.120
20	0.1	0.5	0.1	41.1	48.9	8.9	1.1	0.787	0.087	0.069
21	0.1	0.9	0.1	80.1	9.9	9.9	0.1	0.878	0.011	0.010
22	0.5	0.1	0.1	1.1	48.9	8.9	41.1	0.087	0.787	0.069
23	0.5	0.5	0.1	12.0	38.0	38.0	12.0	0.519	0.519	0.270
24	0.5	0.9	0.1	41.1	8.9	48.9	1.1	0.787	0.087	0.069
25	0.9	0.1	0.1	0.1	9.9	9.9	80.1	0.011	0.878	0.010
26	0.9	0.5	0.1	1.1	8.9	48.9	41.1	0.087	0.787	0.069
27	0.9	0.9	0.1	5.9	4.1	84.1	5.9	0.346	0.346	0.120
28	0.1	0.1	1	81.0	9.0	9.0	1.0	0.000	0.000	0.000
29	0.5	0.5	1	25.0	25.0	25.0	25.0	0.000	0.000	0.000
30	0.5	0.5	0	0.0	50.0	50.0	0.0	1.000	1.000	1.000
31	0.1	0.5	∞	50.0	40.0	0.0	10.0	1.000	0.111	0.111
32	0.5	0.5	∞	50.0	0.0	0.0	50.0	1.000	1.000	1.000
33	0.9	0.5	∞	10.0	0.0	40.0	50.0	0.111	1.000	0.111
34	0.01	0.5	100	50.0	49.0	0.01	1.0	0.980	0.010	0.010
35	0.5	0.01	100	50.0	0.01	49.0	1.0	0.010	0.980	0.010

Abbreviations: P(D), unconditional disease probability; OR, odds ratio; α, probability of harmful level of X; a, b, c, d, as by Table 2 but with probabilities (%).

Population values of DN, DS, and EV for dichotomous outcome and dichotomous prognostic factor Abbreviations: P(D), unconditional disease probability; OR, odds ratio; α, probability of harmful level of X; a, b, c, d, as by Table 2 but with probabilities (%). We have included scenarios with , which are rarely observed in practice. However, Pepe et al have nicely shown that odds ratios even larger than 100 are required to reliably discriminate between individuals who will experience an event and those who will not.22 By exploring the effect of extreme odds ratios (and hazard ratios in the following Section 4.2), we demonstrate that DN and DS values close to 1 are indeed achievable. Population values of DN and DS in Table 3 are obtained by inserting population probabilities instead of sample frequencies into the definitions given in Section 2.5. From Table 3, we confirm and learn that , and are in the range from 0 to 1. ; because of (i), this implies . if for any value of . If is replaced by , then , and are unchanged; in this case, however, by definition of α, a, and b are exchanged as well as c and d (compare Scenarios 1‐9 with 19‐27). This is sensible because otherwise DN (or analogously AR) would be obtained for an assumed “harmful” level of X leading to fewer unfavorable outcomes than the “protective” level. If , then , and are 0 (Scenarios 28‐29). If and only if , then ; if and only if , then ; if and only if , then (given ; for , b and c are replaced by a and d in the previous conditions; Scenarios 30‐33). For a rare unfavorable outcome (, Scenario 34), the harmful level of X is highly necessary for this outcome (DN = 98%); however, it is by no means sufficient (DS = 1%), ie, the harmful level of X does not at all imply a high probability of the unfavorable level of the outcome (despite an OR = 100; see also the lung cancer example in the Introduction). For a rare harmful exposure level (, Scenario 35), the harmful level of X is extremely sufficient for this outcome (DS = 98%); however, it is by no means necessary (DN = 1%), ie, the unfavorable outcome can easily occur also with the protective level of X (despite an OR = 100). In both cases, the resulting low value of EV indicates the existence of further important factors responsible for disease, which are more sufficient or necessary, respectively. The case of a dichotomous outcome and a continuous prognostic factor is illustrated by Figure 1, which demonstrates the effect of and on , and . The results are based on a marginal standard normal distribution of X. Regression coefficients were set to β 1 = log (10) and log (100), and β 0 selected such that P(D) = 0.5 and 0.9, respectively. The conditional expectations in Equations (1) to (4) were numerically approximated using the integrate() function in R version 3.5.1.

Figure 1

Population values of DN, DS, and EV for four scenarios with OR = 10, 100, P(D) = 0.5, 0.9, and standard normally distributed prognostic factor X. Below and above each of the corresponding panels, the distributions of X for Y = 0 and Y = 1 are shown, the areas of the smoothed histograms being proportional to 1 ‐ P(D) and P(D), respectively. Black solid line: logistic regression curve P(D | X); dashed horizontal line at P(D); dashed vertical line at X where P(D | X) = P(D), resulting in α = 0.50, 0.50, 0.75, 0.82 for upper left to lower right panel The resulting population distributions of X, separately for Y = 0 and Y = 1, and logistic regression curves for P(D | X) are shown in Figure 1. For example, in the lower right subfigure for OR = 100 and P(D) = 0.9, P(D | X) is very close to 1 throughout the harmful range of X (right of the vertical dashed line), resulting in high contributions to DS in the kernel of definition (2). In contrast, X values in the protective range (left of the vertical dashed line) exhibit a large range for P(D | X), which leads to a medium value of DN. As observed for a dichotomous prognostic factor, all three measures increase if increases from 10 to 100, and and ( increase and decrease, respectively, if increasing . For , the product of DN 1 and DS 1 exactly equals EV (upper panels in Figure 1). For , the departure of from EV is smaller with OR = 100 (right panels in Figure 1) than with OR = 10. Further simulations show that, for extremely high (or low) OR, this discrepancy goes to zero, since an OR approaching infinity (or zero) produces a situation close to the 2 x 2 case (results not shown).

Survival outcome

For survival outcomes, we consider a dichotomous prognostic factor X and exponentially distributed survival times with hazards of exp(− and set to 0, −log(2), −log(10), and −log(100). This results in relative hazards of = 1, 2, 10, and 100. The Equations (12) and (13) on the population level were numerically approximated using the integrate() function in R version 3.5.1. We considered two types of (noninformative) censoring: type I censoring at as a result of constant follow‐up of time units for all individuals; administrative censoring between 0 and , which results from uniformly distributed follow‐up when individuals enter a clinical trial at a constant rate over an interval from 0 to . In either case, data are analyzed at time . Results for DN, DS, and EV, the Schemper‐Henderson measure of explained variation,18 are presented in Table 4 with values of selected to give 0, 50, and 90% censoring under administrative censoring. Corresponding censoring percentages under type I censoring are lower. The purpose of producing results under both types of censoring for DN and DS is to distinguish the effect of changing from that of random censoring between 0 and . From Table 4, we learn that the expectations of DN and DS are virtually unaffected by additional (administrative) random censoring before . However, the measures are affected by the choice of , the maximum follow‐up time or maximum time of medical interest. Both properties are shared with the Schemper‐Henderson measure of explained variation for survival outcomes.18

Table 4

The DN, DS, and EV for survival outcome and balanced dichotomous prognostic factor

		Type 1 censoring				Administrative censoring
HR	τ	% cens.	DN	DS	EV	% cens.	DN^	DS^	EV^
1	‐	0	0.00	0.00	0.00
	1.64	20	0.00	0.00	0.00	50	0.00	0.00	0.00
	0.22	80	0.00	0.00	0.00	90	0.00	0.00	0.00
2	‐	0	0.22	0.31	0.05
	2.26	23	0.25	0.20	0.05	50	0.25	0.20	0.05
	0.30	80	0.32	0.04	0.02	90	0.33	0.04	0.02
10	‐	0	0.55	0.62	0.32
	4.90	31	0.71	0.44	0.34	50	0.71	0.44	0.34
	0.43	81	0.80	0.09	0.09	90	0.80	0.09	0.10
100	‐	0	0.67	0.69	0.48
	15.30	43	0.96	0.45	0.53	50	0.96	0.45	0.53
	0.47	81	0.98	0.11	0.14	90	0.98	0.10	0.13

Abbreviations: HR, hazard ratio; τ, time of maximum follow‐up; % cens., percentage of censored observations. For Type 1 censoring, the entries are population values, while under administrative censoring, they are based on simulated samples of 100 000 observations.

Note that both variants of DN and of DS are identical for dichotomous prognostic factors.

The DN, DS, and EV for survival outcome and balanced dichotomous prognostic factor Abbreviations: HR, hazard ratio; τ, time of maximum follow‐up; % cens., percentage of censored observations. For Type 1 censoring, the entries are population values, while under administrative censoring, they are based on simulated samples of 100 000 observations. Note that both variants of DN and of DS are identical for dichotomous prognostic factors. As DN and DS can be considered as functions that vary with time, their dependence on is natural. Kejžar et al pointed out that explained variation measures for survival data that are unaffected by do this by implicitly extrapolating Cox regression coefficients beyond the time covered by the sample.23 The same would be the case with DN‐type and DS‐type measures that were unaffected by . However, Kejžar et al also demonstrate how the Schemper‐Henderson measure could be made independent of a particular τ, if extrapolation was justified (see also the work of Schemper and Kaider24). The same procedure can be used to make DN and DS unaffected by . Table 5 shows that the distribution of X affects values of DN and DS in a similar way as presented for dichotomous outcomes in Table 3.

Table 5

Population values of DN, DS, and EV for uncensored survival outcome and dichotomous prognostic factor

	α = 0.1			α = 0.5			α = 0.9
HR	DN	DS	EV	DN	DS	EV	DN	DS	EV
1	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
2	0.05	0.47	0.02	0.22	0.31	0.05	0.36	0.08	0.02
10	0.16	0.85	0.12	0.55	0.61	0.32	0.80	0.21	0.12
100	0.24	0.94	0.17	0.68	0.68	0.48	0.93	0.25	0.17

Abbreviations: HR, hazard ratio; α, probability of harmful level of X.

Population values of DN, DS, and EV for uncensored survival outcome and dichotomous prognostic factor Abbreviations: HR, hazard ratio; α, probability of harmful level of X.

Small sample results

While the main empirical properties of DN and DS can best be understood on the population level as dealt within the previous sections, in practice, only estimates of these measures are available. By means of simulation studies, we have explored bias and variability of estimates of DN and DS with sample sizes of n = 200 and n = 500 for several scenarios with dichotomous and survival outcomes. For each scenario investigated in this section, 250 random samples have been drawn using the specifications of Sections 4.1 and 4.2 for data generation. For selected scenarios with dichotomous outcomes and dichotomous prognostic factor in Table 3, the distribution of bias in terms of its median and quartiles has been studied. Bias, calculated as sample minus population value, becomes appreciable if and are unbalanced in opposite directions, eg, and , and the corresponding population value of DN or of DS is high. In addition, with population values of 0 for DN or for DS, appreciable bias (>0.1) is possible as estimates cannot become negative, a property shared by other bounded measures. Otherwise, no bias is observed. Detailed results are available in Table 1 of the Supplementary Material. The performance with continuous prognostic factors appears analogous as indicated in Table 2 of the Supplementary Material. In Section 2.4, we have introduced confidence intervals for DN and DS, which might be useful given possible bias of point estimates. These provide the range of population values of DN and DS compatible with a given sample. We have explored the coverage of the recommended percentile and BCa intervals in a simulation study with 1000 simulated samples per scenario using 1000 bootstrap replicates per simulated sample. Coverage appears sufficiently close to the nominal level: eg, for underlying , P(D) = 0.5, a standard normally distributed prognostic factor and n = 200, the two‐sided 95% confidence intervals covered the value 0 with a frequency of 94.8%. The average width of these confidence intervals is 0.19 for n = 200, and decreases to 0.12 for n = 500, to 0.09 for n = 1000, and to 0.03 for n = 10 000. For underlying , the average width of these confidence intervals is 0.23, 0.14, 0.10, and 0.03, respectively. Simple percentile bootstrap intervals yielded very similar results. Though the use of DN and DS often will be descriptive only, these results can provide rough guidance on sample sizes required for medical studies reporting values of DN and DS. The simulation studies on bias and variability of DN and DS for censored and uncensored survival outcomes (see Tables 3 and 4 of the Supplementary Material) indicate only negligible bias for underlying hazard ratios . The bias under HR = 1, where population DN and DS are 0, appears relatively small without censoring (≤0.04 for n = 200; decreasing for increasing sample size) but can increase substantially for high amount of censoring (eg, under 90% administrative censoring, ie, for expected 20 uncensored observations, it rises to estimated 0.15).

EXAMPLES

We demonstrate the application of our proposed measures to two datasets from prognostic factor studies, the first with a dichotomous, the second with a survival outcome. In Section 1, we already have presented a simple epidemiologic example with a single prognostic factor. Datasets with multiple prognostic factors permit more elaborate usage of the proposed measures. The following two examples should give an impression of the usefulness of quantifying the degrees of necessity and of sufficiency, in addition to explained variation, and, of course, to standard results from logistic and Cox regression analysis. Both examples have already served to illustrate use of explained variation by models with dichotomous and survival outcomes.8

Prognostic factors for tumor penetration of prostatic capsule

A prostate cancer dataset from The Ohio State University Comprehensive Cancer Center has been presented in the work of Hosmer and Lemeshow.25 Statistical analysis of the 376 completely documented patients should determine whether potential prognostic factors measured at a baseline exam could be used to predict whether the tumor has penetrated the prostatic capsule. Penetration was observed in 41% of the patients. The following prognostic factors are considered: age (in years), race (White, Black), results of digital rectal exam (1 = no, 2 = unilobular, and 3 = bilobular nodule), detection of capsular involvement (no, yes), prostatic specific antigen value (PSA, in mg/ml, range 0.3‐139.7), tumor volume from ultrasound (in cm3, range 0‐97.6), and total Gleason score (range 0‐9). Multiple logistic regression of tumor penetration on the prognostic factors considered gives significant estimates of odds ratios for Gleason score ( =2.6; p < 0.00001), digital rectal exam ( =2.1; p = 0.0008), and PSA ( =1.03; p = 0.008). Despite impressive odds ratios and p‐values, the proportion of variation attributable to the set of seven prognostic factors is only moderate, EV = 0.30, accompanied by moderately large and balanced degrees of necessity and sufficiency, DN 1 = 0.56 and DS 1 = 0.52, respectively. Detailed results for the individual prognostic factors are provided by Table 6. For comparison, we also cite DN 2 and DS 2, which only differ for nondichotomous prognostic factors. Personally, we prefer DN 1 and DS 1 because of their closer connection to EV. Besides Equation (7), we recall from Section 2 that for dichotomous prognostic factors. Approximately, this relationship is also observed for the ordinal or continuous prognostic factors age, digital rectal exam, PSA, tumor volume, and Gleason score (all modeled linearly).

Table 6

Marginal degrees of necessity and sufficiency and explained variation of prognostic factors for tumor penetration of prostatic capsule

Prognostic factors	EV	DN ₁	DS ₁	DN ₂	DS ₂
Age	<0.01	0.05	0.04	0.04	0.03
Digital rectal exam	0.09	0.49	0.20	0.49	0.12
PSA	0.12	0.23	0.46	0.21	0.34
Tumor volume	0.01	0.16	0.08	0.12	0.07
Gleason score	0.23	0.51	0.44	0.47	0.39
Race	<0.01	0.03	<0.01	0.03	<0.01
Capsular involvement	0.06	0.10	0.58	0.10	0.58
Full model	0.30	0.56	0.52	0.50	0.43

Marginal degrees of necessity and sufficiency and explained variation of prognostic factors for tumor penetration of prostatic capsule In terms of marginal , the Gleason score is the strongest single prognostic factor, followed by PSA, digital rectal exam, and capsular involvement. Due to the different scales of the prognostic factors, such a ranking would have been difficult to establish from odds ratios. We now go a step further and decompose EV values into corresponding degrees of necessity and sufficiency. The strongest prognostic factor, Gleason score, is reasonably strong in both DN and DS (0.51 and 0.44, respectively). With respect to sufficiency—if you have the harmful level, it is difficult to avoid the unfavorable outcome—the detection of capsular involvement with DS 1 = 0.58 attracts attention; however, its degree of necessity is remarkably low (DN 1 = 0.10). The digital rectal exam stands out with respect to its degree of necessity with DN 1 = 0.49—without a harmful level, the unfavorable outcome becomes not very likely—but its degree of sufficiency being rather low (DS 1 = 0.20). Only the BCa confidence interval for the lowest DN and DS values of age and race includes 0. DN and DS for the full model are 0.56 (95% confidence interval 0.48 to 0.63) and 0.52 (0.43 to 0.59), respectively. Regarding partial results, only the Gleason score shows a considerable partial EV of 0.09. Interestingly, this variable hardly contributes to the DS of the full model (partial DS 1 = 0.03) while it contributes about one third of the DN of the full model (partial DN 1 = 0.16).

Prognostic factors for survival with primary biliary cirrhosis

A randomized trial in primary biliary cirrhosis of the liver was conducted at the Mayo Clinic from 1974 to 1984. For a total of 312 patients, survival times (time until death from any cause) and status (60% censored) as well as several prognostic factors have been obtained and analyzed by Fleming and Harrington.26 In our analysis, we used the following prognostic factors: age (in years), presence of edema (no, yes), albumin (in mg/dl, range 2.0‐4.6), log of serum bilirubin (serum bilirubin in mg/dl, range 1.2‐3.3), and log of prothrombin time (prothrombin time in seconds, range 2.2‐2.8). The time range of medical interest, adequately covered by the sample, is 0 to 12 years. Multiple Cox regression gives significant estimates of hazard ratios for log of bilirubin ( =2.4; p < 0.00001), albumin ( =0.37; p < 0.0001), age ( =1.03; p = 0.0001), and log of prothrombin time ( =25.6; p = 0.001). The proportion of the variation of survival explained by the full model is 0.40, which is relatively high for studies of survival (cf Table 3 in the work of Schemper and Henderson18). The EV of 0.40 is accompanied by relatively large and balanced degrees of necessity and sufficiency (DN 1 = 0.63 and DS 1 = 0.59). Hazard ratios and p‐values of the individual prognostic factors are quite impressive. Since they are measured on different scales, their importance should be compared using the EV values for the individual prognostic factors in Table 7.

Table 7

Marginal degrees of sufficiency and necessity and explained variation of prognostic factors for survival with primary biliary cirrhosis

Prognostic factors	EV	DN ₁	DS ₁	DN ₂	DS ₂
log (bilirubin)	0.31	0.56	0.51	0.52	0.43
log (prothrombin time)	0.09	0.28	0.24	0.25	0.17
Edema	0.10	0.20	0.53	0.20	0.53
Albumin	0.17	0.41	0.36	0.35	0.28
Age	0.06	0.29	0.20	0.25	0.16
Full model	0.40	0.63	0.59	0.59	0.50

Marginal degrees of sufficiency and necessity and explained variation of prognostic factors for survival with primary biliary cirrhosis Edema seems to be quite sufficient for death of these patients (DS 1 = 0.53), but the much lower degree of necessity (DN 1 = 0.20) leads to a small proportion of explained variation (EV = 0.10). The left panel of Figure 2 shows that the survival outlook of the small subgroup of patients with edema (16% of patients) is much worse. It is relatively hard to escape death, once edema is observed. In contrast, survival probabilities for patients without edema are close to the total average at each time point, resulting in relatively low necessity.

Figure 2

The effect of edema (left panel) and of log bilirubin (right panel) on survival with primary biliary cirrhosis. Log bilirubin grouped according to quartiles

The effect of edema (left panel) and of log bilirubin (right panel) on survival with primary biliary cirrhosis. Log bilirubin grouped according to quartiles Log bilirubin is by far the strongest prognostic factor and accounts for about three quarters (EV = 0.31) of the explanatory capacity of the full model with EV = 0.40. The prognostic factor receives similarly strong contributions from necessity (DN 1 = 0.56) and sufficiency (DS 1 = 0.51). This is confirmed by the right panel of Figure 2, which indicates an approximately symmetric distribution of conditional survival probabilities around the unconditional probabilities at most time points. Log bilirubin also is the only prognostic factor with nonnegligible partial DN and DS values (partial DN 1 = 0.14, partial DS 1 = 0.09).

CONCLUDING REMARKS

The suggested measures DN and DS comply with the desirable properties for such measures postulated in Section 1: They conform to property 1 with respect to simplicity, intuitive appeal, and the well interpretable range of values (see Section 2.1). They share this property with AR, which they extend in various ways. With respect to property 2, DN and DS share the same structure. However, they are independently sensitive to relevant characteristics of a dataset, as demonstrated empirically in Section 4. In Appendix B, we show that all pairs of values (DN, DS) are possible within [0, 1]2. The correspondence between DN and DS on one side and EV on the other (property 3) is confirmed for both variants of DN and DS at the extremes of their scales (0 and 1; see Section 2.3). Only for the first variants, DN 1 and DS 1, the intuitive relationship (7) has been deduced for the full scale [0, 1]. Furthermore, has been proven for the case of a single dichotomous prognostic factor, while for general types of prognostic factors, an approximate equality of and EV has been demonstrated empirically. Properties 4 and 5 are fulfilled since only unconditional and conditional predictions are required. They may be obtained from models of any type including prognostic factors of any type. The notions of necessary and sufficient conditions for events to happen have existed for a long time both in formal logic as well as in empirical sciences. Though potentially fruitful for increasing knowledge of cause‐effect relationships in medicine, related concepts have been a neglected field in biostatistics. We anchor our proposed measures of the degrees of necessity and of sufficiency between attributable risk and explained variation. Both of them are well established concepts in biostatistics. The attributable risk for a dichotomous prognostic factor is at the core of our more general measure of the degree of necessity. We have introduced a new measure of the degree of sufficiency that is as symmetric to the measure of the degree of necessity as the original notions are. Together, both measures have been linked to an established measure of the degree of variation in the outcome, explained by a prognostic factor. Explained variation, already part of the toolbox for the analysis of prognostic factor studies, measures the “importance” of prognostic factors,27 which carries independent information, additional to odds or hazard ratios and p‐values from regression. Our suggested measures provide additional information on EV‐values, whether these are primarily due to a factor's contribution to necessity, to sufficiency, or to both in a balanced way. Because of this feature, our measures have been designed to share relevant properties with a measure of explained variation. In particular, while suitable for use in connection with a modeling task by logistic and Cox regression, they do not require such a model. They are only based on unconditional and conditional probability estimates of the outcomes, and these might come from other regression models (eg, probit regression or parametric Weibull survival models), from neural networks, expert judgment, or clinical classification schemes. In any case, the application of our proposed measures is conditional on the appropriateness of the specified relationship between prognostic factors and outcome. The degrees of necessity and sufficiency can be compared across different types of models, different types of prognostic factors (categorical or continuous), and prognostic indices with a different number of prognostic factors. By means of partial DN and DS, the benefit of adding a new prognostic factor to an already existing group of established factors can be judged ‐ similar to a prognostic factor's additional contribution to EV. If, in a medical application, either only the increase in necessity or only the increase in sufficiency is of interest, the selection of prognostic factors could be controlled by the gain in DN or in DS, rather than by the usual more general criteria. Just as with EV, also the estimation of DN and DS relies on data from representative samples with respect to the distributions of the prognostic factors and, even more, of the outcome variable. The outcome distribution might be reliably obtained from cross‐sectional or cohort, but not from case‐control studies. Though the simple measurement scale of the four measures presented makes them equally easy in interpretation, we prefer and because of their closer relationship with explained variation. Application of the suggested measures is facilitated by an R function NecSuff and by a SAS macro NecSuff, both of which are available in the online supplement. SIM_8331‐Supp‐0001‐Gleiss_Schemper_Supplement_revised.pdf Click here for additional data file.

18 in total

1. Predictive accuracy and explained variation in Cox regression.

Authors: M Schemper; R Henderson
Journal: Biometrics Date: 2000-03 Impact factor: 2.571

2. Predictive accuracy and explained variation.

Authors: Michael Schemper
Journal: Stat Med Date: 2003-07-30 Impact factor: 2.373

Review 3. Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker.

Authors: Margaret Sullivan Pepe; Holly Janes; Gary Longton; Wendy Leisenring; Polly Newcomb
Journal: Am J Epidemiol Date: 2004-05-01 Impact factor: 4.897

4. Using mutual information to measure the impact of multiple genetic factors on AIDS.

Authors: George W Nelson; Stephen J O'Brien
Journal: J Acquir Immune Defic Syndr Date: 2006-07 Impact factor: 3.731

5. Attributable fractions with survival data.

Authors: Sven Ove Samuelsen; Geir Egil Eide
Journal: Stat Med Date: 2008-04-30 Impact factor: 2.373

6. Causes.

Authors: K J Rothman
Journal: Am J Epidemiol Date: 1976-12 Impact factor: 4.897

7. Explained variation in survival analysis.

Authors: M Schemper; J Stare
Journal: Stat Med Date: 1996-10-15 Impact factor: 2.373

8. Explained variation for logistic regression.

Authors: M Mittlböck; M Schemper
Journal: Stat Med Date: 1996-10-15 Impact factor: 2.373

9. Statistical controversies in clinical research: the importance of importance.

Authors: A Gleiss; R Zeillinger; E I Braicu; F Trillsch; I Vergote; M Schemper
Journal: Ann Oncol Date: 2016-04-06 Impact factor: 32.976

10. Survival attributable to an exposure.

Authors: Christopher Cox; Haitao Chu; Alvaro Muñoz
Journal: Stat Med Date: 2009-11-20 Impact factor: 2.373

3 in total

1. Contribution of causal factors to disease burden: how to interpret attributable fractions.

Authors: Emilie Counil
Journal: Breathe (Sheff) Date: 2021-12

2. A Model Predicting Mortality of Hospitalized Covid-19 Patients Four Days After Admission: Development, Internal and Temporal-External Validation.

Authors: Stefan Heber; David Pereyra; Waltraud C Schrottmaier; Kerstin Kammerer; Jonas Santol; Benedikt Rumpf; Erich Pawelka; Markus Hanna; Alexander Scholz; Markus Liu; Agnes Hell; Klara Heiplik; Benno Lickefett; Sebastian Havervall; Marianna T Traugott; Matthias J Neuböck; Christian Schörgenhofer; Tamara Seitz; Christa Firbas; Mario Karolyi; Günter Weiss; Bernd Jilma; Charlotte Thålin; Rosa Bellmann-Weiler; Helmut J F Salzer; Gero Szepannek; Michael J M Fischer; Alexander Zoufaly; Andreas Gleiss; Alice Assinger
Journal: Front Cell Infect Microbiol Date: 2022-01-24 Impact factor: 5.293

3. Quantifying degrees of necessity and of sufficiency in cause-effect relationships with dichotomous and survival outcomes.

Authors: Andreas Gleiss; Michael Schemper
Journal: Stat Med Date: 2019-08-06 Impact factor: 2.373

3 in total