Correcting the optimal resampling-based error rate by estimating the error rate of wrapper algorithms.

Christoph Bernau, Thomas Augustin, Anne-Laure Boulesteix.

Abstract

High-dimensional binary classification tasks, for example, the classification of microarray samples into normal and cancer tissues, usually involve a tuning parameter. Reporting the performance of only the best tuning parameter value yields over-optimistic prediction error estimates. To correct this tuning bias, we develop a new method based on a decomposition of the unconditional error rate involving the tuning procedure; that is, we estimate the error rate of wrapper algorithms as introduced in the context of internal cross-validation (ICV) by Varma and Simon (2006, BMC Bioinformatics 7, 91). Our subsampling-based estimator can be written as a weighted mean of the errors obtained using the different tuning parameter values, and thus can be interpreted as a smooth version of ICV, which is the standard approach for avoiding tuning bias. In contrast to ICV, our method guarantees intuitive bounds for the corrected error. Additionally, we suggest using bias correction methods to address the conceptually similar method selection bias that results from the optimal choice of the classification method itself when several methods are evaluated successively. We demonstrate the performance of our method on microarray and simulated data and compare it to ICV. This study suggests that our approach yields competitive estimates at a much lower computational price.
© 2013, The International Biometric Society.
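To illustrate the tuning bias discussed in the abstract, the sketch below compares a naive estimate (pick the tuning value with the lowest resampled error) against internal cross-validation in the sense of Varma and Simon (tune inside each training set, then evaluate the tuned wrapper on the held-out part). This is not the authors' implementation or code; it is a minimal toy example on pure-noise data (true error 0.5 for any classifier), using a hand-rolled k-nearest-neighbor classifier with k as the tuning parameter. All names (`knn_error`, `ks`, `B`) and the split sizes are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_error(Xtr, ytr, Xte, yte, k):
    # squared Euclidean distances from each test point to each training point
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d, axis=1)[:, :k]           # indices of the k nearest neighbors
    pred = (ytr[idx].mean(axis=1) > 0.5).astype(int)  # majority vote (k odd, no ties)
    return float((pred != yte).mean())

# pure-noise data: labels are independent of features, so the true error is 0.5
n, p = 60, 100
X = rng.standard_normal((n, p))
y = rng.integers(0, 2, n)

ks = [1, 3, 5, 7, 9, 11]   # candidate tuning parameter values
B = 30                     # subsampling iterations

# naive estimate: evaluate every k on the same subsampling splits,
# then report the minimum -- this is the tuning bias
errs = np.zeros((B, len(ks)))
for b in range(B):
    perm = rng.permutation(n)
    tr, te = perm[:40], perm[40:]
    for j, k in enumerate(ks):
        errs[b, j] = knn_error(X[tr], y[tr], X[te], y[te], k)
naive = float(errs.mean(axis=0).min())   # optimistic: k chosen after seeing test errors

# internal CV: choose k on an inner split of each training set,
# then evaluate the tuned wrapper on the outer held-out set
icv_errs = []
for b in range(B):
    perm = rng.permutation(n)
    tr, te = perm[:40], perm[40:]
    inner = rng.permutation(tr)
    itr, ite = inner[:27], inner[27:]
    inner_errs = [knn_error(X[itr], y[itr], X[ite], y[ite], k) for k in ks]
    k_best = ks[int(np.argmin(inner_errs))]
    icv_errs.append(knn_error(X[tr], y[tr], X[te], y[te], k_best))
icv = float(np.mean(icv_errs))

print(f"naive (best-k) error estimate: {naive:.3f}")  # optimistic
print(f"internal CV error estimate:    {icv:.3f}")    # near the true 0.5
```

The paper's proposed estimator replaces the inner argmin-based selection with a weighted mean over all tuning values, which keeps the corrected error within the range of the individual errors while avoiding ICV's full inner resampling loop.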

Keywords:  Classification; High-dimensional data; Method selection bias; Repeated subsampling; Tuning bias

Year:  2013        PMID: 23845182     DOI: 10.1111/biom.12041

Source DB:  PubMed          Journal:  Biometrics        ISSN: 0006-341X            Impact factor:   2.571


  9 in total

1.  Bias correction for selecting the minimal-error classifier from many machine learning models.

Authors:  Ying Ding; Shaowu Tang; Serena G Liao; Jia Jia; Steffi Oesterreich; Yan Lin; George C Tseng
Journal:  Bioinformatics       Date:  2014-08-01       Impact factor: 6.937

2.  Cross-validation pitfalls when selecting and assessing regression and classification models.

Authors:  Damjan Krstajic; Ljubomir J Buturovic; David E Leahy; Simon Thomas
Journal:  J Cheminform       Date:  2014-03-29       Impact factor: 5.514

3.  Type I error control for tree classification.

Authors:  Sin-Ho Jung; Yong Chen; Hongshik Ahn
Journal:  Cancer Inform       Date:  2014-11-16

4.  [Review] Methodological issues in current practice may lead to bias in the development of biomarker combinations for predicting acute kidney injury.

Authors:  Allison Meisner; Kathleen F Kerr; Heather Thiessen-Philbrook; Steven G Coca; Chirag R Parikh
Journal:  Kidney Int       Date:  2016-02       Impact factor: 10.612

5.  Transcriptome assists prognosis of disease severity in respiratory syncytial virus infected infants.

Authors:  Victor L Jong; Inge M L Ahout; Henk-Jan van den Ham; Jop Jans; Fatiha Zaaraoui-Boutahar; Aldert Zomer; Elles Simonetti; Maarten A Bijl; H Kim Brand; Wilfred F J van IJcken; Marien I de Jonge; Pieter L Fraaij; Ronald de Groot; Albert D M E Osterhaus; Marinus J Eijkemans; Gerben Ferwerda; Arno C Andeweg
Journal:  Sci Rep       Date:  2016-11-11       Impact factor: 4.379

6.  Using ordinal outcomes to construct and select biomarker combinations for single-level prediction.

Authors:  Allison Meisner; Chirag R Parikh; Kathleen F Kerr
Journal:  Diagn Progn Res       Date:  2018-05-21

7.  Bootstrapping the out-of-sample predictions for efficient and accurate cross-validation.

Authors:  Ioannis Tsamardinos; Elissavet Greasidou; Giorgos Borboudakis
Journal:  Mach Learn       Date:  2018-05-09       Impact factor: 2.940

8.  A measure of the impact of CV incompleteness on prediction error estimation with application to PCA and normalization.

Authors:  Roman Hornung; Christoph Bernau; Caroline Truntzer; Rory Wilson; Thomas Stadler; Anne-Laure Boulesteix
Journal:  BMC Med Res Methodol       Date:  2015-11-04       Impact factor: 4.615

9.  On the overestimation of random forest's out-of-bag error.

Authors:  Silke Janitza; Roman Hornung
Journal:  PLoS One       Date:  2018-08-06       Impact factor: 3.240
