
Testing the calibration of classification models from first principles.

Stephan Dreiseitl, Melanie Osl.

Abstract

The accurate assessment of the calibration of classification models is severely limited by the fact that there is no easily available gold standard against which to compare a model's outputs. The usual procedures group expected and observed probabilities, and then perform a χ² goodness-of-fit test. We propose an entirely new approach to calibration testing that can be derived directly from the first principles of statistical hypothesis testing. The null hypothesis is that the model outputs are correct, i.e., that they are good estimates of the true unknown class membership probabilities. Our test calculates a p-value by checking how (im)probable the observed class labels are under the null hypothesis. We demonstrate by experiments that our proposed test performs comparably to, and sometimes even better than, the Hosmer-Lemeshow goodness-of-fit test, the de facto standard in calibration assessment.
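The idea of testing calibration directly from the null hypothesis can be illustrated with a small simulation. The sketch below is not the authors' procedure: the abstract does not state which test statistic they use, so the log-likelihood of the labels, the Monte Carlo approximation, and the names `log_likelihood` and `calibration_p_value` are all assumptions made for illustration. Under the null hypothesis each label is a Bernoulli draw with the model's predicted probability, so labels can be simulated repeatedly and the p-value estimated as the share of simulated label sets that are at least as improbable as the observed one.

```python
import numpy as np


def log_likelihood(probs, labels):
    """Log-likelihood of binary labels given predicted probabilities.

    `labels` may be 1-D (one label set) or 2-D (one label set per row).
    """
    # Clip to avoid log(0) for predictions of exactly 0 or 1.
    probs = np.clip(probs, 1e-12, 1 - 1e-12)
    return np.sum(labels * np.log(probs) + (1 - labels) * np.log(1 - probs), axis=-1)


def calibration_p_value(probs, labels, n_sim=10000, seed=0):
    """Monte Carlo p-value for H0: the predicted probabilities are correct.

    Illustrative assumption: the test statistic is the log-likelihood of
    the labels, and small values count as evidence against H0.
    """
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)

    observed = log_likelihood(probs, labels)

    # Simulate label sets under H0: label i is Bernoulli(probs[i]).
    simulated_labels = (rng.random((n_sim, probs.size)) < probs).astype(float)
    simulated = log_likelihood(probs, simulated_labels)

    # Share of simulated label sets at least as improbable as the observed
    # one, with an add-one correction so the estimate is never exactly zero.
    return (1 + np.sum(simulated <= observed)) / (n_sim + 1)


if __name__ == "__main__":
    # A model that predicts 0.9 for every case while no case is positive.
    p = calibration_p_value(np.full(200, 0.9), np.zeros(200), n_sim=2000)
    print(f"p-value for a badly miscalibrated model: {p:.4f}")
```

A small p-value indicates that the observed labels would be unusually improbable if the model outputs were the true class membership probabilities. This one-sided version only flags labels that are less likely than expected; it cannot detect miscalibration when all predictions equal 0.5, because every label set then has the same likelihood.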


Year:  2012        PMID: 23304285      PMCID: PMC3540450     

Source DB:  PubMed          Journal:  AMIA Annu Symp Proc        ISSN: 1559-4076


