
Bootstrapping the out-of-sample predictions for efficient and accurate cross-validation.

Ioannis Tsamardinos, Elissavet Greasidou, Giorgos Borboudakis.

Abstract

Cross-Validation (CV), and out-of-sample performance-estimation protocols in general, are often employed both (a) to select the optimal combination of algorithms and hyper-parameter values (called a configuration) for producing the final predictive model, and (b) to estimate the predictive performance of that final model. However, the cross-validated performance of the best configuration is optimistically biased. We present an efficient bootstrap method that corrects for this bias, called Bootstrap Bias Corrected CV (BBC-CV). The main idea of BBC-CV is to bootstrap the whole process of selecting the best-performing configuration on the out-of-sample predictions of each configuration, without training any additional models. Compared with the alternatives, namely nested cross-validation (Varma and Simon in BMC Bioinform 7(1):91, 2006) and a method by Tibshirani and Tibshirani (Ann Appl Stat 822-829, 2009), BBC-CV is computationally more efficient, has smaller variance and bias, and is applicable to any performance metric (accuracy, AUC, concordance index, mean squared error). We then reuse the idea of bootstrapping the out-of-sample predictions to speed up the CV process itself. Specifically, using a bootstrap-based statistical criterion, we stop training models on new folds for configurations that are inferior with high probability. We call the resulting method Bootstrap Bias Corrected with Dropping CV (BBCD-CV); it is both efficient and provides accurate performance estimates.
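
The bootstrap step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the out-of-sample predictions of all configurations have already been pooled into one matrix (one row per sample, one column per configuration), that a higher metric value is better, and that each bootstrap round selects the best configuration on the in-bag rows and scores it on the out-of-bag rows. The names bbc_cv and accuracy are hypothetical and chosen only for this example.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of matching labels (higher is better)."""
    return float(np.mean(y_true == y_pred))


def bbc_cv(preds, y, metric, n_boot=1000, seed=0):
    """Sketch of a bootstrap bias-corrected performance estimate.

    preds : array (n_samples, n_configs), pooled out-of-sample predictions
    y     : array (n_samples,), true outcomes
    metric: callable(y_true, y_pred) -> float, assumed higher-is-better
    """
    rng = np.random.default_rng(seed)
    n_samples, n_configs = preds.shape
    all_rows = np.arange(n_samples)
    scores = []
    for _ in range(n_boot):
        # Resample rows with replacement; no model is retrained.
        in_bag = rng.integers(0, n_samples, size=n_samples)
        out_bag = np.setdiff1d(all_rows, in_bag)
        if out_bag.size == 0:
            continue  # practically never happens, but keeps the metric defined
        # Repeat the selection step on the bootstrap sample.
        in_scores = [metric(y[in_bag], preds[in_bag, j]) for j in range(n_configs)]
        best = int(np.argmax(in_scores))
        # Score the selected configuration on rows it was not selected on.
        scores.append(metric(y[out_bag], preds[out_bag, best]))
    return float(np.mean(scores))


if __name__ == "__main__":
    # Synthetic check: every configuration guesses at random, so the true
    # accuracy of any selected configuration is 0.5.
    rng = np.random.default_rng(1)
    y = rng.integers(0, 2, size=300)
    preds = rng.integers(0, 2, size=(300, 50))
    naive = max(accuracy(y, preds[:, j]) for j in range(preds.shape[1]))
    print("best-configuration CV estimate:", naive)
    print("bootstrap-corrected estimate:  ", bbc_cv(preds, y, accuracy, n_boot=300))
```

In this synthetic setting the uncorrected estimate is simply the maximum over 50 noisy accuracies, which illustrates the optimistic bias the abstract refers to. For a metric where lower is better, such as mean squared error, the argmax in the selection step would need to become an argmin.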


Keywords:  Bias correction; Cross-validation; Hyper-parameter optimization; Performance estimation

Year:  2018        PMID: 30393425      PMCID: PMC6191021          DOI: 10.1007/s10994-018-5714-4

Source DB:  PubMed          Journal:  Mach Learn        ISSN: 0885-6125            Impact factor:   2.940


  13 in total

1.  A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis.

Authors:  Alexander Statnikov; Constantin F Aliferis; Ioannis Tsamardinos; Douglas Hardin; Shawn Levy
Journal:  Bioinformatics       Date:  2004-09-16       Impact factor: 6.937

2.  The comparison of percentages in matched samples.

Authors:  W G Cochran
Journal:  Biometrika       Date:  1950-12       Impact factor: 2.445

3.  Correcting the optimal resampling-based error rate by estimating the error rate of wrapper algorithms.

Authors:  Christoph Bernau; Thomas Augustin; Anne-Laure Boulesteix
Journal:  Biometrics       Date:  2013-07-11       Impact factor: 2.571

4.  Multiple-rule bias in the comparison of classification rules.

Authors:  Mohammadmahdi R Yousefi; Jianping Hua; Edward R Dougherty
Journal:  Bioinformatics       Date:  2011-05-05       Impact factor: 6.937

5.  Bias correction for selecting the minimal-error classifier from many machine learning models.

Authors:  Ying Ding; Shaowu Tang; Serena G Liao; Jia Jia; Steffi Oesterreich; Yan Lin; George C Tseng
Journal:  Bioinformatics       Date:  2014-08-01       Impact factor: 6.937

6.  Evaluating the yield of medical tests.

Authors:  F E Harrell; R M Califf; D B Pryor; K L Lee; R A Rosati
Journal:  JAMA       Date:  1982-05-14       Impact factor: 56.272

7.  Regularization Paths for Generalized Linear Models via Coordinate Descent.

Authors:  Jerome Friedman; Trevor Hastie; Rob Tibshirani
Journal:  J Stat Softw       Date:  2010       Impact factor: 6.440

8.  Bias in error estimation when using cross-validation for model selection.

Authors:  Sudhir Varma; Richard Simon
Journal:  BMC Bioinformatics       Date:  2006-02-23       Impact factor: 3.169

9.  Cross-validation pitfalls when selecting and assessing regression and classification models.

Authors:  Damjan Krstajic; Ljubomir J Buturovic; David E Leahy; Simon Thomas
Journal:  J Cheminform       Date:  2014-03-29       Impact factor: 5.514

10.  MatureP: prediction of secreted proteins with exclusive information from their mature regions.

Authors:  Georgia Orfanoudaki; Maria Markaki; Katerina Chatzi; Ioannis Tsamardinos; Anastassios Economou
Journal:  Sci Rep       Date:  2017-06-12       Impact factor: 4.379

  19 in total

1.  The leap to ordinal: Detailed functional prognosis after traumatic brain injury with a flexible modelling approach.

Authors:  Shubhayu Bhattacharyay; Ioan Milosevic; Lindsay Wilson; David K Menon; Robert D Stevens; Ewout W Steyerberg; David W Nelson; Ari Ercole
Journal:  PLoS One       Date:  2022-07-05       Impact factor: 3.752

2.  Just Add Data: automated predictive modeling for knowledge discovery and feature selection.

Authors:  Ioannis Tsamardinos; Paulos Charonyktakis; Georgios Papoutsoglou; Giorgos Borboudakis; Kleanthi Lakiotaki; Jean Claude Zenklusen; Hartmut Juhl; Ekaterini Chatzaki; Vincenzo Lagani
Journal:  NPJ Precis Oncol       Date:  2022-06-16

3.  Outcome Prediction in Critically-Ill Patients with Venous Thromboembolism and/or Cancer Using Machine Learning Algorithms: External Validation and Comparison with Scoring Systems.

Authors:  Vasiliki Danilatou; Stylianos Nikolakakis; Despoina Antonakaki; Christos Tzagkarakis; Dimitrios Mavroidis; Theodoros Kostoulas; Sotirios Ioannidis
Journal:  Int J Mol Sci       Date:  2022-06-27       Impact factor: 6.208

4.  Combination of Whole-Body Baseline CT Radiomics and Clinical Parameters to Predict Response and Survival in a Stage-IV Melanoma Cohort Undergoing Immunotherapy.

Authors:  Felix Peisen; Annika Hänsch; Alessa Hering; Andreas S Brendlin; Saif Afat; Konstantin Nikolaou; Sergios Gatidis; Thomas Eigentler; Teresa Amaral; Jan H Moltz; Ahmed E Othman
Journal:  Cancers (Basel)       Date:  2022-06-17       Impact factor: 6.575

5.  Challenging a Fundamental Proposition of Patient-Centeredness.

Authors:  Stephen Aragon; Mak Khojasteh; Montrale Boykin; Breanne Crumpton; Laura McGuinn; Sabina Gesell
Journal:  J Best Pract Health Prof Divers       Date:  2020

6.  A machine learning approach utilizing DNA methylation as an accurate classifier of COVID-19 disease severity.

Authors:  Scott Bowler; Georgios Papoutsoglou; Aristides Karanikas; Ioannis Tsamardinos; Michael J Corley; Lishomwa C Ndhlovu
Journal:  Sci Rep       Date:  2022-10-19       Impact factor: 4.996

7.  Circulating cell-free DNA in breast cancer: size profiling, levels, and methylation patterns lead to prognostic and predictive classifiers.

Authors:  Maria Panagopoulou; Makrina Karaglani; Ioanna Balgkouranidou; Eirini Biziota; Triantafillia Koukaki; Evaggelos Karamitrousis; Evangelia Nena; Ioannis Tsamardinos; George Kolios; Evi Lianidou; Stylianos Kakolyris; Ekaterini Chatzaki
Journal:  Oncogene       Date:  2019-01-14       Impact factor: 9.867

8.  Robust parametric modeling of Alzheimer's disease progression.

Authors:  Mostafa Mehdipour Ghazi; Mads Nielsen; Akshay Pai; Marc Modat; M Jorge Cardoso; Sébastien Ourselin; Lauge Sørensen
Journal:  Neuroimage       Date:  2020-10-16       Impact factor: 7.400

9.  NTAL is associated with treatment outcome, cell proliferation and differentiation in acute promyelocytic leukemia.

Authors:  Carolina Hassibe Thomé; Germano Aguiar Ferreira; Diego Antonio Pereira-Martins; Vitor Marcel Faça; Eduardo M Rego; Guilherme Augusto Dos Santos; César Alexander Ortiz; Lucas Eduardo Botelho de Souza; Lays Martins Sobral; Cleide Lúcia Araújo Silva; Priscila Santos Scheucher; Cristiane Damas Gil; Andréia Machado Leopoldino; Douglas R A Silveira; Juan L Coelho-Silva; Fabíola Traina; Luisa C Koury; Raul A M Melo; Rosane Bittencourt; Katia Pagnano; Ricardo Pasquini; Elenaide C Nunes; Evandro M Fagundes; Ana Beatriz F Gloria; Fábio Rodrigues Kerbauy; Maria de Lourdes Chauffaille; Armand Keating; Martin S Tallman; Raul C Ribeiro; Richard Dillon; Arnold Ganser; Bob Löwenberg; Peter Valk; Francesco Lo-Coco; Miguel A Sanz; Nancy Berliner
Journal:  Sci Rep       Date:  2020-06-25       Impact factor: 4.379

10.  Multi-classifier prediction of knee osteoarthritis progression from incomplete imbalanced longitudinal data.

Authors:  Paweł Widera; Paco M J Welsing; Christoph Ladel; John Loughlin; Floris P F J Lafeber; Florence Petit Dop; Jonathan Larkin; Harrie Weinans; Ali Mobasheri; Jaume Bacardit
Journal:  Sci Rep       Date:  2020-05-21       Impact factor: 4.379

