Iztok Hozo1, Athanasios Tsalatsanis2,3, Benjamin Djulbegovic2,3,4,5. 1. Department of Mathematics, Indiana University Northwest, Gary, Indiana, USA. 2. USF Health Program for Comparative Effectiveness Research, Tampa, Florida, USA. 3. Division for Evidence Based Medicine, Department of Internal Medicine, University of South Florida, Tampa, Florida, USA. 4. Departments of Hematology and Health Outcomes Behavior, H. Lee Moffitt Cancer and Research Institute, Tampa, Florida, USA. 5. Tampa General Hospital, Tampa, Florida, USA.
Abstract
RATIONALE, AIMS, AND OBJECTIVES: Decision curve analysis (DCA) is a widely used method for evaluating diagnostic tests and predictive models. It was developed based on expected utility theory (EUT) and has been reformulated using expected regret theory (ERG). Under certain circumstances, these 2 formulations yield different results. Here we describe these situations and explain the variation. METHODS: We compare the derivations of the EUT- and ERG-based formulations of DCA for a typical medical decision problem: "treat none," "treat all," or "use model" to guide treatment. We illustrate the differences between the 2 formulations when applied to the following clinical question: at what probability of death should we refer a terminally ill patient to hospice? RESULTS: Both DCA formulations yielded identical but mirrored results when treatment effects were ignored; they generated significantly different results otherwise. Treatment effect has a significant effect on the results derived by EUT DCA and less so on those derived by ERG DCA. The elicitation of specific values for disutilities affected the results even more significantly in the context of EUT DCA, whereas no such elicitation was required within the ERG framework. CONCLUSION: EUT and ERG DCA generate different results when treatment effects are taken into account. The magnitude of the difference depends on the effect of treatment and the disutilities associated with disease and treatment effects. This is important to realize because current practice guidelines are uniformly based on EUT; the same recommendations can differ significantly if they are derived within the ERG framework.
Arguably, the threshold model represents one of the most important advances in clinical decision making.1, 2 According to the threshold model, when faced with uncertainty about whether to treat, order a test or apply a predictive model, or simply observe the patient, there exists some probability of disease or disease outcome (the threshold) at which a physician is indifferent between administering and not administering treatment, or acting according to a test or predictive model.1, 2 The threshold model reflects one of the fundamental principles of rational decision making: it is rational for a doctor to act (ie, order a diagnostic test, prescribe treatment) and for the patient to accept the proposed health intervention when one believes that the benefits (gains) (B) of such action will outweigh its harms (losses) (H), ie, exceed the threshold (T).3, 4
Originally, the threshold model was derived within the precepts of the expected utility theory (EUT).1, 2 During the last 40 years, the threshold model has been reformulated in a number of ways, both within the framework of EUT and non‐EUT theories (for a review, see Djulbegovic et al4). One such extension of the threshold model is decision curve analysis (DCA).
DCA is a widely used technique for evaluating the value of diagnostic tests or predictive models over a range of all possible thresholds.5, 6, 7, 8, 9, 10 Assessing the threshold probability at which a decision maker is indifferent between failing to administer a beneficial health intervention and committing to a potentially harmful one allows capturing patient preferences related to given management choices.1, 2, 4 DCA incorporates the predictive model's accuracy, the consequences of a decision action, and a patient's preferences to assess the best course of action, such as making a decision according to the predictive model, treating all patients, or treating none.5, 6, 7 One of the advantages of DCA is that we do not actually need to elicit the threshold from each patient; instead, we model decisions about treatment over a range of thresholds without knowing details about the specific utilities that determine the threshold.5, 6
DCA was originally formulated using EUT5, 6 and reformulated within the expected regret theory (ERG) framework.7, 11 In the EUT DCA, the best course of action is the one associated with the highest expected value, whereas in the ERG DCA, the best course of action is the decision that will lead to the least amount of regret.7
We previously showed that ERG DCA and EUT DCA lead to the same decisions if treatment effects are not taken into consideration.7 Note, however, that the original DCA did not explicitly model the effect of treatment on patients' outcomes. In this paper, we demonstrate that when treatment effects are included in the modeling, different results are generated by EUT and ERG DCA, which has important implications for medical decision making.
METHODS
Model structure
Figure 1 displays a typical decision tree describing treatment options based on the results of a prediction model. $q_i$ represents the model‐generated probability of the event of interest D (D+ event present, D− event absent), such as disease presence or occurrence of an outcome in a patient $i$; $p_i$ is the actual probability of the event D+ for patient $i$. The individual's risk is $p_i$, $i = 1, \dots, N$, where N is the number of patients. RRR is the relative risk reduction expected from treatment Rx, $U_j$ is the utility associated with each outcome j, and T is a threshold probability at which a decision maker is indifferent between the “do not treat” (NoRx) and “treat” (Rx) strategies.
Figure 1
Decision tree depicting use of a predictive model to guide treatment choices according to expected utility‐based theory (“EUT utilities”) and ERG (displayed as “Regret Utilities,” ie, differences in values or utility of the outcomes of the action taken and the utility of the outcomes of another action, which, in retrospect, we should have taken12, 13, 14). As explained in the text, $q_i$ represents the model‐generated probability of disease outcome for a patient i, whereas $p_i$ is the actual probability of the event D (disease outcome) for the same patient. T, threshold probability for treatment; D+, disease is present; D−, disease is absent; RRR, relative risk reduction of treatment. Regret is computed as the difference in utilities of the action taken and the action that, in retrospect, should have been taken.
We opt to model treatment effect as relative risk reduction (RRR), which is a convenient way to express a risk ratio (RR) as a proportional reduction of risk ($p_i$) according to15, 16
$$RRR = 1 - RR,$$
where RR is defined as the risk of the event in the treatment group over the risk of the event in the control group.16
The main advantage of using RRR as a measure of treatment effect over absolute treatment differences is that the former remains constant over the range of predicted risks ($p_i$).15, 16, 17 RRR is also easy to interpret: RRR = 1 means that the occurrence of the outcome of interest is completely preventable [as $p_i(1 - RRR) = 0$], whereas RRR = 0 means that treatment is useless as it does not affect the underlying risk [$p_i(1 - RRR) = p_i$].
Note that we use the term “predictive model” in a generic sense, to predict or foretell something that is yet unknown (such as outcome occurrence in individual patients). Typically, such models convert available information (predictors) into a statement about the probability of a diagnosis or prognosis.18, 19 The model shown in Figure 1 (and later in Figure 2) applies to both prognostic and diagnostic prediction as long as such a prediction is used to guide selection of treatment.
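The RRR definition above can be illustrated with a few lines of Python. This is a minimal sketch, not part of the original analysis: the function names and the numeric values are hypothetical and chosen only to show that, for a fixed RRR, the absolute risk difference changes with baseline risk.

```python
def relative_risk_reduction(risk_treated: float, risk_control: float) -> float:
    """RRR = 1 - RR, where RR = risk in treated group / risk in control group."""
    return 1.0 - risk_treated / risk_control


def treated_risk(p: float, rrr: float) -> float:
    """Risk of the event under treatment for baseline risk p: p * (1 - RRR)."""
    return p * (1.0 - rrr)


# Hypothetical values, for illustration only.
rrr = 0.25
for p in (0.1, 0.4, 0.8):
    # The absolute risk difference equals p * RRR, so it grows with baseline risk
    # even though RRR itself is held constant.
    ard = p - treated_risk(p, rrr)
    print(f"baseline risk={p:.1f}  treated risk={treated_risk(p, rrr):.3f}  ARD={ard:.3f}")
```

The two boundary cases in the text correspond to `treated_risk(p, 1.0)` (event fully prevented) and `treated_risk(p, 0.0)` (risk unchanged).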
Figure 2
Decision tree depicting a typical 3‐choice dilemma. a, Expected utility‐based tree. b, Expected regret‐based tree. Three alternatives are shown: treat all patients, treat none, and use a predictive model/test to decide whether to treat or not. $q_i$ represents the model‐generated probability of disease outcome for a patient i, whereas $p_i$ is the actual probability of the event D+ (disease outcome) for the same patient. T, threshold probability for treatment; D−, disease is absent; RRR, relative risk reduction of treatment. Regret is computed as the difference in utilities of the action taken and the action that, in retrospect, should have been taken.
Derivation of the EUT DCA
As an illustrative example, Figure 2a shows the decision tree of a 3‐choice dilemma associated with hospice referral. In this case, a patient may decide to receive treatment targeting the underlying disease (“Treat All”), accept referral to hospice (“Treat None”), or act according to the threshold model based on the patient's estimated probability of death (“Model”). In Figure 2a, $q_i$ and $p_i$ represent the model‐estimated probability of death and the actual probability of death D+ for patient i, respectively. RRR represents the relative risk reduction associated with treatment Rx, $U_j$ is the utility of outcome j, and T is the threshold probability at which a decision maker is indifferent between the benefits and harms of treatment.
By solving the tree, we derive the expected value of the model, given as Equation (1). As explained earlier, according to the threshold model, we treat if $q_i \ge T$. Therefore, the expected values of the “Treat none” and “Treat all” strategies can be computed by setting T = 1 and T = 0 in Equation (1), respectively.
The optimal strategy is the one that yields the higher net benefit (NB). Using the derived expected utilities, we calculate the NB of a strategy (eg, “Treat all” or “Model”) as the difference between the expected utility of this strategy and the expected utility of the “Treat none” strategy.5, 6 The NB of the “Model” strategy is given as Equation (2). To simplify Equation (2), we define the true positive rate and the false positive rate for a given threshold. By setting T = 0 in Equation (2), we derive the NB of “Treat all.”
To further simplify our notation, and following Pauker and Kassirer1, 2 and Vickers and Elkin,5 we define the difference between utilities (preferences) related to the consequence of administering treatment when it would have been of benefit as the benefit $B = U_1 - U_3$; similarly, the preferences related to the consequences of being unnecessarily treated are denoted as harms $H = U_4 - U_2$. Finally, even if treatment is appropriately given, there is no guarantee that only D+ patients will receive treatment (see Figure 2); some D− patients may also receive treatment. We define this difference (Δ) between utilities of administering treatment as $\Delta = U_2 - U_1$. Note that all threshold models to date have assumed that the differences in benefit and harm between these utilities are positive ($B > 0$, $H > 0$, $\Delta > 0$), which is a clinically sensible assumption. Although such cases are possible in principle, we do not consider negative utilities or negative B, H, or Δ in our threshold model.
With these substitutions, we can rewrite the formulas above in terms of B, H, and Δ. The threshold probability, or the probability at which one is indifferent between deciding to treat versus not to treat, is computed by equating the expected utilities of treating and not treating and, using the definitions above, is expressed in terms of B, H, Δ, and RRR; we denote it $T_{EUT}$. Note that if RRR = 0, $T_{EUT}$ reduces to the “classic” EUT Pauker and Kassirer threshold1, 2: $T_c = 1/(1 + B/H)$.
B, H, and Δ can be further characterized in terms of disutilities, or using other popular evidence‐based statistical measures.20, 21 When this is done, the threshold model can be further formulated in a number of other ways (for a review, see Djulbegovic et al4).
Although the original and widely used DCA did not take treatment effect into consideration, Vickers et al10 did attempt to integrate treatment into EUT DCA by expressing the threshold as the absolute risk difference (ARD) between 2 treatments, ie, the difference between the probability of the event for patients receiving treatment and the probability of the event in untreated patients ($p_0$). In our view, this creates several problems. First, ARD does vary with baseline risk, and it is preferable to model treatment effects using relative effects such as RRR that remain constant over the range of predicted risks ($p_i$).15, 16, 17 Second, it may be better to express the threshold with respect to individual risk probabilities ($p_i$). In principle, that is possible by expressing $ARD = p_i \cdot RRR$, which would, in turn, allow reformulation of the threshold. Most importantly, however, the method described by Vickers et al10 assumes that $U_1 - U_3 < 0$. Although technically this is correct, clinically such a situation constitutes an extremely rare case, particularly in the area of cancer treatment discussed by the authors. Regardless of whether one wants to apply this EUT DCA model, its use per se is immaterial to the main objective of our paper, which is to contrast findings using EUT DCA with ERG DCA.
In the further exposition, we do not consider all possible ways in which the threshold equation can be expressed but focus on the derivation of DCA, which is mainly done by scaling the original threshold formulas. As explained, our main intent here is to demonstrate differences between EUT‐ and ERG‐derived DCA. Using the relationship expressed in Equation (4), we can derive scaled NBs for each strategy. Note that if RRR = 0, these equations reduce to the Vickers and Elkin DCA equation (which uses the “classic” T).5
DCA curves are generated by plotting the NB of all 3 strategies (eg, “Treat all,” “Treat none,” and “Model”) over all thresholds of interest.
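The steps of the EUT derivation can be sketched as a worked calculation. This is a reconstruction under stated assumptions, not the authors' typeset equations: it assumes that the utilities are indexed as $U_1$ (treated, D+), $U_2$ (treated, D−), $U_3$ (untreated, D+), and $U_4$ (untreated, D−); that the risk under treatment is $p_i(1 - RRR)$; that $B = U_1 - U_3$, $H = U_4 - U_2$, and $\Delta = U_2 - U_1$; and that NB is the expected utility of a strategy minus that of “Treat none.” The symbols $TP(T)$, $FP(T)$, and the indicator $\mathbb{1}[\cdot]$ are notation introduced here for illustration.

```latex
\begin{align*}
E[\text{Model}] &= \frac{1}{N}\sum_{i=1}^{N}\Big\{
   \mathbb{1}[q_i \ge T]\,\big[p_i(1-RRR)\,U_1 + \big(1 - p_i(1-RRR)\big)\,U_2\big]
 + \mathbb{1}[q_i < T]\,\big[p_i U_3 + (1-p_i)\,U_4\big]\Big\} \\[4pt]
NB[\text{Model}] &= E[\text{Model}] - E[\text{Treat none}]
 = \frac{1}{N}\sum_{i=1}^{N}\mathbb{1}[q_i \ge T]\,
   \big\{p_i\,(B + H + RRR\cdot\Delta) - H\big\} \\
 &= (B + RRR\cdot\Delta)\,TP(T) - H\,FP(T), \\[4pt]
TP(T) &= \frac{1}{N}\sum_{i=1}^{N} p_i\,\mathbb{1}[q_i \ge T], \qquad
FP(T) = \frac{1}{N}\sum_{i=1}^{N} (1-p_i)\,\mathbb{1}[q_i \ge T], \\[4pt]
T_{EUT} &= \frac{H}{B + H + RRR\cdot\Delta}, \qquad
T_{EUT}\Big|_{RRR=0} = \frac{H}{B+H} = \frac{1}{1 + B/H} = T_c, \\[4pt]
\frac{NB[\text{Model}]}{B + RRR\cdot\Delta} &= TP(T) - FP(T)\,\frac{T_{EUT}}{1 - T_{EUT}} .
\end{align*}
```

Under these assumptions, the last line has the same form as the Vickers and Elkin net benefit, with $T_{EUT}$ in place of the classic threshold; setting RRR = 0 removes Δ from every expression, which is consistent with the reduction to the classic case noted in the text.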
Derivation of the ERG DCA
Figure 2b depicts the same decision problem from the regret point of view. Utilities of each outcome are now represented in terms of regret, ie, the difference between the utility of the action taken and the utility of the action that, in retrospect, should have been taken.12, 13, 14
Solving the tree in Figure 2b, we derive the expected regret associated with the prediction model. Just as in the EUT case, the expected regrets of the “Treat all” and “Treat none” strategies follow when the threshold is equal to zero or one, respectively.
We derive the net expected regret difference (NERD) as the difference between the expected regret of each choice (eg, “Treat All,” “Model”) and the expected regret of “Treat None.”7, 11, 22 Using the previously defined true positive and false positive rates, as well as B, H, and Δ, we can rewrite this as Equation (7). Setting T = 0 in Equation (7), we derive the NERD of the “Treat all” strategy.
The threshold probability under regret, $T_{ERG}$, is computed analogously. Note that if RRR = 0, $T_{ERG}$ reduces to the “classic” EUT Pauker and Kassirer (P&K) threshold $T_c$1, 2 (see above), and the scaled NERD equations reduce to the Vickers and Elkin EUT DCA equation (see also above).5
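One consistent reading of the regret derivation can be sketched as follows. This is a reconstruction under stated assumptions, not the authors' typeset equations: it assumes $B = U_1 - U_3 > 0$ and $H = U_4 - U_2 > 0$, so that the regret of leaving a D+ patient untreated is B, the regret of treating a D− patient is H, and the remaining two outcomes carry zero regret; it also assumes the risk under treatment is $p_i(1 - RRR)$ and that NERD is the expected regret of a strategy minus that of “Treat none.” $TP(T)$ and $FP(T)$ are notation introduced here.

```latex
\begin{align*}
ER[\text{Model}] &= \frac{1}{N}\sum_{i=1}^{N}\Big\{
   \mathbb{1}[q_i \ge T]\,\big(1 - p_i(1-RRR)\big)\,H
 + \mathbb{1}[q_i < T]\,p_i\,B\Big\} \\[4pt]
NERD[\text{Model}] &= ER[\text{Model}] - ER[\text{Treat none}]
 = \frac{1}{N}\sum_{i=1}^{N}\mathbb{1}[q_i \ge T]\,
   \big\{\big(1 - p_i(1-RRR)\big)\,H - p_i\,B\big\} \\
 &= H\,FP(T) - (B - RRR\cdot H)\,TP(T), \\[4pt]
TP(T) &= \frac{1}{N}\sum_{i=1}^{N} p_i\,\mathbb{1}[q_i \ge T], \qquad
FP(T) = \frac{1}{N}\sum_{i=1}^{N} (1-p_i)\,\mathbb{1}[q_i \ge T], \\[4pt]
T_{ERG} &= \frac{H}{B + (1-RRR)\,H}, \qquad
T_{ERG}\Big|_{RRR=0} = \frac{H}{B+H} = T_c .
\end{align*}
```

In this sketch Δ does not appear at all, which agrees with the statement in the text that the regret formulation does not require Δ to be specified; whether the published equations take exactly this form cannot be confirmed from the surviving text.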
RESULTS
EUT and ERG DCA generate different results
The major difference between EUT and ERG DCA arises from the definition of the threshold probability when using classical EUT utilities versus utilities expressed via regret ($T_{EUT}$ vs $T_{ERG}$). Both of these thresholds can be connected to the “classical” P&K EUT threshold $T_c$. Again, note that if RRR = 0, all these thresholds reduce to the “classic” P&K EUT threshold.1, 2 Extending the threshold model into DCA yields the scaled NB and NERD expressions.
Equations (5) and (7) show that EUT‐based DCA and ERG‐based DCA differ by threshold definitions ($T_{EUT}$ vs $T_{ERG}$) and the requirement for specifying Δ in the EUT model. As a result, noticeable differences in the evaluation of predictive models will be generated (Figure 3). In this illustrative example, according to the EUT DCA, the use of the model to guide a management strategy is almost always the best strategy regardless of RRR. However, according to ERG DCA, “Treat None” becomes the best strategy with increasing thresholds. Only when RRR = 0 does EUT DCA generate the same results as ERG DCA.
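The link between the two thresholds and the classic threshold can be sketched numerically. The closed forms below are assumptions, not equations recovered from the source: they follow from reading the decision tree with treated risk $p_i(1 - RRR)$, which gives $T_{EUT} = H/(B + H + RRR\cdot\Delta)$ and $T_{ERG} = H/(B + (1 - RRR)H)$, then dividing through by H and substituting $1/T_c = 1 + B/H$. The function names are hypothetical.

```python
def classic_threshold(b_over_h: float) -> float:
    """Classic Pauker-Kassirer threshold T_c = 1 / (1 + B/H)."""
    return 1.0 / (1.0 + b_over_h)


def eut_threshold(t_c: float, rrr: float, delta_over_h: float) -> float:
    """Assumed EUT threshold: 1 / (1/T_c + RRR * Delta/H)."""
    return 1.0 / (1.0 / t_c + rrr * delta_over_h)


def erg_threshold(t_c: float, rrr: float) -> float:
    """Assumed regret threshold: 1 / (1/T_c - RRR). Delta is not needed."""
    denom = 1.0 / t_c - rrr
    if denom <= 0.0:
        raise ValueError("threshold undefined for this T_c and RRR")
    return 1.0 / denom


# Hypothetical inputs; Delta/H = 0.05 mirrors the value fixed in Figure 3.
t_c = classic_threshold(b_over_h=4.0)
for rrr in (0.0, 0.25, 0.5):
    print(rrr, eut_threshold(t_c, rrr, 0.05), erg_threshold(t_c, rrr))
```

With RRR = 0 both functions return $T_c$, matching the reduction stated in the text; for RRR > 0 the two thresholds move in opposite directions under these assumed forms.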
Figure 3
Decision curve analysis (DCA) for a model of referral of a patient with terminal illness to hospice as a function of the threshold probability T. Three strategies are considered: “Treat All,” “Treat None: Refer to Hospice” (=0 on x axis), and “Model: Use Model to Guide Management.” Two strategies are equivalent where their curves cross. The higher the value, the better the strategy. NB, net benefit according to expected utility theory (EUT); NERD, net expected regret difference. To enable comparison of EUT DCA and ERG DCA, NERD values are presented inverted (ie, –NERD). The results clearly show that EUT DCA and ERG DCA generate different results. (Note that Δ/H is arbitrarily fixed at 0.05; somewhat different results are obtained when Δ/H varies.) We used the original patient‐level data from the SUPPORT study23 to create a simpler version of the model concerned with the decision whether to refer a patient to hospice/palliative care in the end‐of‐life setting. The curves are generated by calculating NB and NERD over all thresholds (from 0 to 1, in increments of 0.01).
Obviously, different predictive models with different prevalence of disease outcomes, specific values of treatment effects, and disutilities will generate results different from those shown in this example, which is meant for illustration only. We provide an Excel file that readers can use with their own model and values for B, H, and Δ.
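How a decision curve is tabulated over a grid of thresholds can be sketched as below. This is a minimal illustration of the classic Vickers and Elkin net benefit only, ie, the RRR = 0 case to which the text says both formulations reduce; it is not the authors' Excel model. It uses observed binary outcomes `y` in place of the true probabilities $p_i$, and the data are synthetic and hypothetical rather than taken from SUPPORT.

```python
import random


def net_benefit(q, y, t):
    """Classic net benefit of treating patients with q_i >= t (RRR = 0 case)."""
    n = len(q)
    tp = sum(1 for qi, yi in zip(q, y) if qi >= t and yi == 1) / n
    fp = sum(1 for qi, yi in zip(q, y) if qi >= t and yi == 0) / n
    return tp - fp * t / (1.0 - t)


def decision_curve(q, y, n_steps=100):
    """Rows of (T, NB model, NB treat all, NB treat none) for T = 0.00 .. 0.99."""
    treat_all = [1.0] * len(q)  # predicted risk of 1 means everyone is treated
    rows = []
    for k in range(n_steps):  # T = 1 is excluded because T / (1 - T) is undefined
        t = k / n_steps
        rows.append((t, net_benefit(q, y, t), net_benefit(treat_all, y, t), 0.0))
    return rows


# Synthetic, roughly calibrated predictions (hypothetical data).
random.seed(0)
q = [random.random() for _ in range(2000)]
y = [1 if random.random() < qi else 0 for qi in q]

for t, nb_model, nb_all, nb_none in decision_curve(q, y)[::20]:
    print(f"T={t:.2f}  model={nb_model:.3f}  treat_all={nb_all:.3f}  treat_none={nb_none:.1f}")
```

At each threshold the preferred strategy is the one with the highest value; “Treat none” is the zero line, as in Figure 3.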
DISCUSSION
In this paper, we demonstrate that when treatment effects are included in modeling, different results are generated by EUT and ERG DCA. Under these circumstances, EUT DCA cannot be used to model decisions over all preferences without further knowledge of the specific utilities related to the difference between U1 and U2 (Δ). This, however, would defeat the DCA's original purpose of analyzing decision strategies without requiring the elicitation of patients' preferences.
If the DCA method is to be used, ERG DCA seems preferable. It also has the following appeals: (1) it is a mathematically more parsimonious derivation of DCA, derived within a coherent regret theory7; (2) as a cognitive emotion, regret is widely recognized as one of the key decision making mechanisms enabling a decision maker to experience the consequences of decisions at both the emotional (type 1) and cognitive (type 2) level3, 24, 25; and (3) it is easily and reliably elicited using the dual visual analog scale (“regret meter”)7 or similar scales.7, 26 We should also note that the model we used here is for illustration purposes only: our aim is not to advocate the use of this particular model but to illustrate the differences that arise when the model is used within 2 different theoretical frameworks.
Although we advocate using ERG DCA, we are aware of the long tradition of application of EUT in decision sciences, and of the unsettled debate about the superiority of one theory over the other. Our main point is that EUT and ERG versions of DCA do generate different results. Although we cannot possibly settle here the question of the superiority of EUT versus ERG (or other non‐EUT theories), the larger point we are making is that the decision about the threshold at which to act relates closely to the question of rational choice.3, 4 The “great rationality debate” has been prominent in nonmedical fields,27, 28 but has been only sporadic in clinical medicine. By highlighting the differences in the results between ERG DCA and EUT DCA, the latter being a widely used method, we hope to stimulate a “rationality debate” in clinical medicine. The practical importance of advancing this debate can be appreciated if we note, for example, that some practice guidelines, such as guidelines for colorectal screening, are based on EUT‐based modeling29; conceivably, different recommendations could have been made if a non‐EUT framework had been used. Hence, we think that awareness of our findings is of importance to modelers, practicing physicians, and policy makers.
Research Support
This study was supported by the DOD (grant no. W81 XWH 09‐2‐0175, PI: Djulbegovic).