Laure Wynants, Maarten van Smeden, David J McLernon, Dirk Timmerman, Ewout W Steyerberg, Ben Van Calster.
Abstract
BACKGROUND: Clinical prediction models are useful for estimating a patient's risk of having a certain disease or experiencing a future event, based on their current characteristics. Defining an appropriate risk threshold to recommend intervention is a key challenge in bringing a risk prediction model to clinical application; such risk thresholds are often defined in an ad hoc way. This is problematic because the tacitly assumed costs of false positive and false negative classifications may not be clinically sensible. For example, choosing the risk threshold that maximizes the proportion of patients correctly classified assumes that false positives and false negatives are equally costly. Furthermore, small to moderate sample sizes may lead to unstable optimal thresholds, which requires a particularly cautious interpretation of results.
MAIN TEXT: We discuss how three common myths about risk thresholds often lead to inappropriate risk stratification of patients. First, we point out the contexts of counseling and shared decision-making in which a continuous risk estimate is more useful than risk stratification. Second, we argue that threshold selection should reflect the consequences of the decisions made following risk stratification. Third, we emphasize that there is usually no universally optimal threshold but rather that a plausible risk threshold depends on the clinical context. Consequently, we recommend presenting results for multiple risk thresholds when developing or validating a prediction model.
Keywords: Clinical risk prediction model; Data science; Decision support techniques; Diagnosis; Prognosis; Risk; Threshold
Year: 2019 PMID: 31651317 PMCID: PMC6814132 DOI: 10.1186/s12916-019-1425-3
Source DB: PubMed Journal: BMC Med ISSN: 1741-7015 Impact factor: 8.775
Common terms
| Term | Definition |
|---|---|
| AUC | Area under the curve, in this case the receiver operating characteristic curve. A measure of discrimination. For prediction models based on logistic regression, this corresponds to the probability that a randomly selected diseased patient had a higher risk prediction than a randomly selected patient who does not have the disease. |
| Calibration | Correspondence between predicted and observed risks, usually assessed in calibration plots or by calibration intercepts and slopes. |
| Sensitivity | The proportion of true positives in truly diseased patients. |
| Specificity | The proportion of true negatives in truly non-diseased patients. |
| Positive predictive value | The proportion of true positives in patients classified as positive. |
| Negative predictive value | The proportion of true negatives in patients classified as negative. |
| Decision curve analysis | A method to evaluate classifications for a range of possible thresholds, reflecting different costs of false positives and benefits of true positives. |
| Net reclassification improvement | A measure reflecting reclassifications in the right direction when making decisions based on one prediction model compared to another. |
| STRATOS | STRengthening Analytical Thinking for Observational Studies |
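The threshold-dependent terms above can be made concrete with a minimal sketch. The function names and the toy data are hypothetical and only illustrate the definitions; they are not taken from the paper. The AUC is computed directly as the concordance probability described in the table, with ties counted as one half (an assumption of this sketch).

```python
def classification_stats(risks, outcomes, threshold):
    """Sensitivity, specificity, PPV and NPV when patients with
    predicted risk >= threshold are classified as positive."""
    tp = fp = fn = tn = 0
    for risk, diseased in zip(risks, outcomes):
        positive = risk >= threshold
        if positive and diseased:
            tp += 1
        elif positive and not diseased:
            fp += 1
        elif not positive and diseased:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined (NaN) when nobody falls in the denominator group.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def auc(risks, outcomes):
    """Probability that a random diseased patient has a higher predicted
    risk than a random non-diseased patient (ties count as 1/2)."""
    diseased = [r for r, y in zip(risks, outcomes) if y]
    healthy = [r for r, y in zip(risks, outcomes) if not y]
    concordant = sum((d > h) + 0.5 * (d == h) for d in diseased for h in healthy)
    return concordant / (len(diseased) * len(healthy))


# Hypothetical toy data: predicted risks and true disease status (1 = diseased).
toy_risks = [0.05, 0.10, 0.20, 0.35, 0.60, 0.80, 0.90, 0.15]
toy_outcomes = [0, 0, 0, 1, 0, 1, 1, 1]

stats = classification_stats(toy_risks, toy_outcomes, threshold=0.30)
print(stats)
print(auc(toy_risks, toy_outcomes))
```

Note that the AUC does not depend on any threshold, whereas the four classification statistics change whenever the threshold moves.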
Example of a risk model: the ADNEX model
ADNEX is a model to preoperatively characterize ovarian cancer by calculating the risks of benign tumors and four classes of malignant tumors. It was constructed using multinomial logistic regression and validated with more recent data and in other centers [
Fig. 1 Frequencies of predicted risks of malignancy and three possible risk thresholds
Health-economic perspectives and clinical judgment in prediction modeling
Risk thresholds ideally reflect the clinical context by balancing the benefits of correct decisions against the costs of incorrect decisions. Health economists often prefer to value outcomes of decisions in terms of quality-adjusted life-years, which combine mortality and quality of life in a single measure. Utility values (like quality of life) can be elicited using various formal methods [
In addition, health policy frequently involves a trade-off between monetary costs and health outcomes. To reach a societal optimum, monetary costs need to be calculated from the societal perspective (rather than the perspective of the healthcare provider or the individual patient), by including, for example, lost productivity due to time off work [
Besides data on costs and benefits [
While predictive performance is one input determining the optimal threshold, reliable data on costs or utilities are often not available in the process of validating a risk prediction model. Fortunately, the prediction modeler does not have to find the optimal threshold from a health economic perspective to evaluate a model's predictive performance. At the stage of model validation, it is often sufficient to consider a broad range of reasonable risk thresholds. This range can be set by asking for sensible upper and lower bounds on the maximum number of false positives one would tolerate to find one true positive [
It is only after a risk model is validated that a health economic analysis could optimize the risk threshold, based on the model's demonstrated predictive performance, its positioning in the care pathway (e.g., in a sequence of tests [
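As a rough illustration of how bounds on the tolerated number of false positives per true positive translate into a range of risk thresholds, consider the sketch below. It assumes the usual decision-analytic relation that the odds of the threshold, t / (1 − t), equal the harm of one false positive relative to the benefit of one true positive, so tolerating at most N false positives per true positive gives t = 1 / (1 + N). The net benefit function follows the definition commonly used in decision curve analysis; the function names and the counts are hypothetical.

```python
def threshold_from_tolerated_fp(max_fp_per_tp):
    """Risk threshold implied by tolerating at most `max_fp_per_tp`
    false positives to find one true positive: odds(t) = 1 / N."""
    return 1.0 / (1.0 + max_fp_per_tp)


def net_benefit(tp, fp, n, threshold):
    """Net benefit as commonly defined in decision curve analysis:
    true positives per patient minus false positives per patient,
    weighted by the odds of the threshold."""
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical bounds: between 4 and 19 false positives per true positive.
upper_threshold = threshold_from_tolerated_fp(4)   # 1 / 5  = 0.20
lower_threshold = threshold_from_tolerated_fp(19)  # 1 / 20 = 0.05
print(lower_threshold, upper_threshold)

# Hypothetical counts at the 20% threshold in a cohort of 1000 patients.
nb = net_benefit(tp=300, fp=200, n=1000, threshold=upper_threshold)
print(round(nb, 3))
```

Reporting performance across the whole range between the two bounds avoids committing to a single threshold before cost or utility data are available.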
Costs of outcomes when making a decision based on a risk threshold
| | Diseased | Not diseased |
|---|---|---|
| Intervene (predicted risk ≥ t) | CTP = 15. The cost of detected/treated disease, e.g., risk of death or severe morbidity despite detection, plus the cost of intervening | CFP = 5. The cost of an unneeded intervention, e.g., invasiveness of testing, complication risks of treatment |
| Do not intervene (predicted risk < t) | CFN = 95. The cost of an undetected disease, i.e., the risk of death or severe morbidity | CTN = 0. The cost of applying the risk model |
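The costs in this table imply a risk threshold under a standard expected-cost argument, sketched here as an illustration rather than as the authors' exact derivation. For a patient with predicted risk p, intervening has expected cost p·CTP + (1 − p)·CFP, and not intervening has expected cost p·CFN + (1 − p)·CTN. Intervening is preferable when the first is no larger than the second, which rearranges to p ≥ (CFP − CTN) / ((CFP − CTN) + (CFN − CTP)). The function name below is hypothetical.

```python
def utility_based_threshold(c_tp, c_fp, c_fn, c_tn):
    """Risk above which intervening has lower expected cost than not intervening.

    Derived from: p*c_tp + (1-p)*c_fp <= p*c_fn + (1-p)*c_tn
    """
    harm_of_unneeded_intervention = c_fp - c_tn    # extra cost in a non-diseased patient
    benefit_of_needed_intervention = c_fn - c_tp   # cost avoided in a diseased patient
    return harm_of_unneeded_intervention / (
        harm_of_unneeded_intervention + benefit_of_needed_intervention
    )


# Costs from the table above: CTP = 15, CFP = 5, CFN = 95, CTN = 0.
t = utility_based_threshold(c_tp=15, c_fp=5, c_fn=95, c_tn=0)
print(round(t, 3))  # 0.059
```

With these costs the threshold is 5 / 85, which rounds to the 6% utility-based threshold listed among the classification statistics.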
Classification statistics for a selection of thresholds
| Threshold | Sensitivity (95% CI) | Specificity (95% CI) | Positive predictive value (95% CI) | Negative predictive value (95% CI) |
|---|---|---|---|---|
| 0.1% | 1.00 (1.00–1.00) | 0.00 (0.00–0.01) | 0.41 (0.39–0.43) | 1.00 (0.40–1.00) |
| 6% (utility-based, for the costs in the table of costs of outcomes) | 0.98 (0.97–0.99) | 0.61 (0.59–0.64) | 0.64 (0.61–0.66) | 0.98 (0.96–0.98) |
| 10% | 0.97 (0.95–0.98) | 0.70 (0.67–0.72) | 0.69 (0.66–0.71) | 0.97 (0.96–0.98) |
| 20% | 0.93 (0.91–0.94) | 0.80 (0.78–0.82) | 0.76 (0.73–0.78) | 0.94 (0.92–0.95) |
| 31% (minimize misclassification) | 0.88 (0.86–0.90) | 0.85 (0.83–0.87) | 0.80 (0.78–0.83) | 0.91 (0.89–0.93) |
| 41% (prevalence) | 0.83 (0.80–0.85) | 0.88 (0.86–0.90) | 0.83 (0.80–0.85) | 0.88 (0.86–0.90) |
| 50% | 0.76 (0.74–0.79) | 0.90 (0.89–0.92) | 0.85 (0.82–0.87) | 0.85 (0.83–0.86) |
| 99.9% | 0.00 (0.00–0.01) | 1.00 (1.00–1.00) | 1.00 (0.02–1.00) | 0.59 (0.57–0.61) |
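The columns of this table are linked through the prevalence: given sensitivity, specificity, and prevalence, the predictive values follow from Bayes' theorem. The sketch below checks this for the 31% row, assuming the 41% prevalence indicated in the table; the function name is hypothetical, and because the inputs are rounded the results only match the table to about two decimals.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive value via Bayes' theorem."""
    tp = sensitivity * prevalence               # proportion of true positives
    fp = (1 - specificity) * (1 - prevalence)   # proportion of false positives
    fn = (1 - sensitivity) * prevalence         # proportion of false negatives
    tn = specificity * (1 - prevalence)         # proportion of true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Row for the 31% threshold: sensitivity 0.88, specificity 0.85, prevalence 0.41.
ppv, npv = predictive_values(sensitivity=0.88, specificity=0.85, prevalence=0.41)
print(round(ppv, 2), round(npv, 2))  # 0.8 0.91
```

Because predictive values depend on prevalence, they would differ in a setting where the disease is more or less common, even if sensitivity and specificity at a given threshold stayed the same.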