Ben Van Calster (1,2), Andrew J Vickers (3), Michael J Pencina (4,5), Stuart G Baker (6), Dirk Timmerman (1), Ewout W Steyerberg (2).
1. Department of Development and Regeneration, KU Leuven–University of Leuven, Leuven, Belgium (BVC, DT)
2. Department of Public Health, Erasmus MC, Rotterdam, the Netherlands (BVC, EWS)
3. Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, New York (AJV)
4. Department of Biostatistics, Boston University, Boston, Massachusetts (MJP)
5. Harvard Clinical Research Institute, Boston, Massachusetts (MJP)
6. Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, Maryland (SGB)
Abstract
BACKGROUND: For the evaluation and comparison of markers and risk prediction models, various novel measures have recently been introduced as alternatives to the commonly used difference in the area under the receiver operating characteristic (ROC) curve (ΔAUC). The net reclassification improvement (NRI) is increasingly popular for comparing predictions at 1 or more risk thresholds, but decision-analytic approaches have also been proposed.
OBJECTIVE: We aimed to identify the mathematical relationships between novel performance measures for the situation in which a single risk threshold T is used to classify patients as having the outcome or not.
METHODS: We considered the NRI and 3 utility-based measures that take misclassification costs into account: difference in net benefit (ΔNB), difference in relative utility (ΔRU), and weighted NRI (wNRI). We illustrate the behavior of these measures in 1938 women suspected of having ovarian cancer (prevalence 28%).
RESULTS: The 3 utility-based measures appear to be transformations of each other and hence always lead to consistent conclusions. In contrast, conclusions may differ when the standard NRI is used, depending on the adopted risk threshold T, the prevalence P, and the differences in sensitivity and specificity between the 2 models being compared. In the case study, adding the CA-125 tumor marker to a baseline set of covariates yielded a negative NRI yet a positive value for the utility-based measures.
CONCLUSIONS: The decision-analytic measures are each appropriate for indicating the clinical usefulness of an added marker or for comparing prediction models, since each reflects misclassification costs. This is of practical importance because these measures may alter conclusions based on purely statistical measures. A range of risk thresholds should be considered when applying these measures.
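The relationships summarized above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `compare_at_threshold` and the input numbers are hypothetical, and the formulas are the commonly used single-threshold definitions, assumed here rather than taken from the abstract. In particular, the two-category NRI is taken as Δsensitivity + Δspecificity, net benefit weights false positives by the odds T/(1 − T), relative utility rescales net benefit by its maximum gain over the best default strategy, and the wNRI weights of 1/T and 1/(1 − T) are one possible choice.

```python
def compare_at_threshold(prevalence, threshold, d_sens, d_spec):
    """Compare two models at a single risk threshold T.

    d_sens and d_spec are the changes in sensitivity and specificity
    (new model minus old model). All formulas are assumed standard
    definitions, not quoted from the abstract.
    """
    odds = threshold / (1 - threshold)  # T / (1 - T): cost of a false positive

    # Two-category NRI: unweighted sum of the two changes.
    nri = d_sens + d_spec

    # Difference in net benefit: changes weighted by prevalence and by odds.
    d_nb = prevalence * d_sens + (1 - prevalence) * odds * d_spec

    # Difference in relative utility: d_nb divided by the maximum possible
    # gain over the best default (treat none if T >= P, treat all otherwise).
    if threshold >= prevalence:
        d_ru = d_nb / prevalence
    else:
        d_ru = d_nb / ((1 - prevalence) * odds)

    # Weighted NRI with assumed weights 1/T (events) and 1/(1 - T) (nonevents).
    wnri = (prevalence * d_sens / threshold
            + (1 - prevalence) * d_spec / (1 - threshold))

    return {"NRI": nri, "dNB": d_nb, "dRU": d_ru, "wNRI": wnri}


# Hypothetical inputs: prevalence 28% as in the case study, but the threshold
# and the changes in sensitivity/specificity are invented for illustration.
result = compare_at_threshold(prevalence=0.28, threshold=0.10,
                              d_sens=0.02, d_spec=-0.03)
for name, value in result.items():
    print(f"{name}: {value:+.4f}")
```

With these invented inputs, the NRI is negative while ΔNB, ΔRU, and wNRI are all positive. Under the assumed definitions, ΔRU and wNRI are ΔNB divided by a positive constant that depends only on T and P, which is why the three utility-based measures always agree in sign, whereas the NRI ignores T and P and can point the other way.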