Richard J Stevens (1), Katrina K Poppe (2)
(1) Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK. Electronic address: richard.stevens@phc.ox.ac.uk.
(2) Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand.
Abstract
BACKGROUND AND OBJECTIVES: Definitions of calibration, an aspect of model validation, have evolved over time. We examine the use and interpretation of the statistic currently referred to as the calibration slope.
METHODS: We reviewed the history of the term "calibration slope" and its usage in papers published in 2016 and 2017. We demonstrated the behaviour of the slope in illustrative hypothetical examples and in two examples from the clinical literature.
RESULTS: The paper in which the statistic was proposed described it as a measure of "spread" and did not use the term "calibration". In illustrative examples, a slope of 1 can be associated with good or bad calibration, and this holds true across different definitions of calibration. In data extracted from a previous study, the slope was correlated with discrimination, not overall calibration. Many authors of recent papers interpret the slope as a measure of calibration; a minority interpret it as a measure of discrimination or do not explicitly categorise it as either. Seventeen of thirty-three papers used the slope as the sole measure of calibration.
CONCLUSION: Misunderstanding of this statistic has led to many papers in which it is the sole measure of calibration, a practice that should be discouraged.
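The claim that a slope of 1 can accompany either good or bad calibration can be illustrated with a small simulation. The sketch below is not taken from the paper: it uses simulated data, hypothetical names (`fit_logistic`, `lp_good`, `lp_shifted`), and assumes the commonly used definition of the calibration slope as the coefficient obtained when the observed outcome is regressed on the logit of the predicted risk in a logistic model. One hypothetical model predicts the true risks; the other overestimates every log-odds by a constant. Both should give a slope close to 1, while only the second shows a gap between mean predicted and observed risk.

```python
import numpy as np


def expit(z):
    """Inverse logit: map log-odds to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(x, y, n_iter=25):
    """Fit logit P(y=1) = intercept + slope * x by Newton-Raphson.

    Returns the array [intercept, slope]. Hypothetical helper written
    for this illustration, not a library routine.
    """
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = expit(X @ beta)
        w = p * (1.0 - p)                 # variance weights
        grad = X.T @ (y - p)              # score vector
        hess = X.T @ (X * w[:, None])     # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


rng = np.random.default_rng(0)
n = 200_000

# Simulated "true" log-odds and outcomes (assumed data, not from the paper).
true_lp = rng.normal(-1.0, 1.0, size=n)
y = rng.binomial(1, expit(true_lp)).astype(float)

# Model A: predictions equal the true risks.
lp_good = true_lp
# Model B: every log-odds is overestimated by a constant 1.0.
lp_shifted = true_lp + 1.0

results = {}
for name, lp in [("good", lp_good), ("shifted", lp_shifted)]:
    intercept, slope = fit_logistic(lp, y)
    results[name] = {
        "slope": slope,
        "intercept": intercept,
        "mean_predicted": expit(lp).mean(),
        "observed": y.mean(),
    }
    print(name, results[name])
```

In this setup the constant shift is absorbed by the intercept of the recalibration model, which is why the slope alone cannot detect the systematic overestimation in the second model.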