
Improved small-sample estimation of nonlinear cross-validated prediction metrics.

David Benkeser, Maya Petersen, Mark J van der Laan.

Abstract

When predicting an outcome is the scientific goal, one must decide on a metric by which to evaluate the quality of predictions. We consider the problem of measuring the performance of a prediction algorithm with the same data that were used to train the algorithm. Typical approaches involve bootstrapping or cross-validation. However, we demonstrate that bootstrap-based approaches often fail and standard cross-validation estimators may perform poorly. We provide a general study of cross-validation-based estimators that highlights the source of this poor performance, and propose an alternative framework for estimation using techniques from the efficiency theory literature. We provide a theorem establishing the weak convergence of our estimators. The general theorem is applied in detail to two specific examples and we discuss possible extensions to other parameters of interest. For the two explicit examples that we consider, our estimators demonstrate remarkable finite-sample improvements over standard approaches.
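For orientation, the kind of "standard cross-validation estimator" the abstract refers to can be sketched for the AUC as follows. This is a minimal illustration, not the authors' proposed estimator: the toy learner `fit_mean_difference`, the helper names, and the fold assignment are all assumptions made for the example. Each fold-specific AUC is computed from the validation fold alone, and the K fold-specific values are then averaged.

```python
import random


def empirical_auc(scores, labels):
    """Mann-Whitney form of the AUC: fraction of (case, control) pairs in which
    the case has the higher score, with ties counted as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def fit_mean_difference(xs, ys):
    """Hypothetical toy learner: score is x if cases have the larger training
    mean, otherwise -x. Stands in for any prediction algorithm."""
    case_x = [x for x, y in zip(xs, ys) if y == 1]
    control_x = [x for x, y in zip(xs, ys) if y == 0]
    if not case_x or not control_x:
        raise ValueError("training data needs both classes")
    sign = 1.0 if sum(case_x) / len(case_x) >= sum(control_x) / len(control_x) else -1.0
    return lambda x: sign * x


def cv_auc(xs, ys, k=5, seed=None, fit=fit_mean_difference):
    """Standard K-fold cross-validated AUC: train on K-1 folds, compute the
    empirical AUC on the held-out fold, and average over the K folds."""
    idx = list(range(len(xs)))
    if seed is not None:
        random.Random(seed).shuffle(idx)  # random fold assignment
    folds = [idx[i::k] for i in range(k)]  # fold i takes every k-th index
    fold_aucs = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_aucs.append(
            empirical_auc([predict(xs[i]) for i in fold], [ys[i] for i in fold])
        )
    return sum(fold_aucs) / k
```

Calling `cv_auc(xs, ys, k=5, seed=1)` on a list of scalar features `xs` and binary labels `ys` returns the averaged estimate. With small samples a validation fold may contain very few cases or controls (here that raises an error if a class is missing entirely), which is one practical reason such fold-wise estimates can be unstable.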


Keywords:  AUC; cross-validation; estimating equations; machine learning; prediction; targeted minimum loss-based estimation

Year:  2019        PMID: 33716360      PMCID: PMC7954141          DOI: 10.1080/01621459.2019.1668794

Source DB:  PubMed          Journal:  J Am Stat Assoc        ISSN: 0162-1459            Impact factor:   5.033


  2 in total

1.  Accounting for motion in resting-state fMRI: What part of the spectrum are we characterizing in autism spectrum disorder?

Authors:  Mary Beth Nebel; Daniel E Lidstone; Liwei Wang; David Benkeser; Stewart H Mostofsky; Benjamin B Risk
Journal:  Neuroimage       Date:  2022-05-10       Impact factor: 7.400

2.  Testing a global null hypothesis using ensemble machine learning methods.

Authors:  Sunwoo Han; Youyi Fong; Ying Huang
Journal:  Stat Med       Date:  2022-03-07       Impact factor: 2.497

