Adapting prediction error estimates for biased complexity selection in high-dimensional bootstrap samples.

Harald Binder, Martin Schumacher.

Abstract

The bootstrap allows efficient evaluation of the prediction performance of statistical techniques without having to set aside data for validation. This is especially important for high-dimensional data, e.g., arising from microarrays, because the number of observations is often limited. To avoid overoptimism, the statistical technique to be evaluated has to be applied to every bootstrap sample in the same manner as it would be used on new data. This includes the selection of complexity, e.g., the number of boosting steps for gradient boosting algorithms. Using gradient boosting, we demonstrate in a simulation study that complexity selection in conventional bootstrap samples, drawn with replacement, is severely biased in many scenarios. This translates into a considerable bias of prediction error estimates, often underestimating the amount of information that can be extracted from high-dimensional data. Potential remedies for this complexity selection bias, such as using a fixed level of complexity or sampling without replacement, are investigated, and sampling without replacement is shown to work well in many settings. We focus on high-dimensional binary response data, with bootstrap .632+ estimates of the Brier score for performance evaluation, and on censored time-to-event data, with .632+ prediction error curve estimates. The latter approach, with the modified bootstrap procedure, is then applied to an example with microarray data from patients with diffuse large B-cell lymphoma.
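The two resampling schemes and the .632+ combination mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the subsample fraction of 0.632 for sampling without replacement, and the way the no-information error is computed for the Brier score are assumptions made for illustration. The weighting follows the commonly cited .632+ form of Efron and Tibshirani (1997). Model fitting and complexity selection (e.g., choosing the number of boosting steps inside each resample) are left out.

```python
import random


def brier_score(y_true, p_pred):
    """Mean squared difference between 0/1 outcomes and predicted probabilities."""
    return sum((y - p) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def draw_indices(n, rng, with_replacement=True, fraction=0.632):
    """Return (in_sample, out_of_sample) index lists for one resample.

    with_replacement=True  : conventional bootstrap sample of size n.
    with_replacement=False : subsample of size round(fraction * n), no duplicates
                             (the fraction is an assumed value for this sketch).
    """
    if with_replacement:
        in_sample = [rng.randrange(n) for _ in range(n)]
    else:
        in_sample = rng.sample(range(n), round(fraction * n))
    chosen = set(in_sample)
    out_of_sample = [i for i in range(n) if i not in chosen]
    return in_sample, out_of_sample


def no_information_brier(y_true, p_pred):
    """Brier score with every prediction paired with every outcome (assumed definition)."""
    n = len(y_true)
    return sum((y - p) ** 2 for y in y_true for p in p_pred) / (n * n)


def err_632_plus(apparent, oob, no_info):
    """Combine apparent and out-of-sample error with the .632+ weighting."""
    oob = min(oob, no_info)  # cap the out-of-sample error at the no-information error
    if oob > apparent and no_info > apparent:
        r = (oob - apparent) / (no_info - apparent)  # relative overfitting in [0, 1]
    else:
        r = 0.0
    w = 0.632 / (1.0 - 0.368 * r)  # weight between 0.632 (r = 0) and 1 (r = 1)
    return (1.0 - w) * apparent + w * oob


if __name__ == "__main__":
    rng = random.Random(1)
    n = 100
    boot_in, boot_out = draw_indices(n, rng, with_replacement=True)
    sub_in, sub_out = draw_indices(n, rng, with_replacement=False)
    print("unique observations, with replacement:   ", len(set(boot_in)))
    print("unique observations, without replacement:", len(set(sub_in)))
    print(".632+ for apparent=0.10, oob=0.20, no-info=0.25:",
          round(err_632_plus(0.10, 0.20, 0.25), 4))
```

In a full evaluation, the model would be fitted (including complexity selection) on the `in_sample` indices of each resample and scored on the `out_of_sample` indices. Those out-of-sample Brier scores would be averaged before being passed to `err_632_plus` together with the apparent error from the full data.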

Year:  2008        PMID: 18384265     DOI: 10.2202/1544-6115.1346

Source DB:  PubMed          Journal:  Stat Appl Genet Mol Biol        ISSN: 1544-6115


  16 in total

1.  Evaluating Random Forests for Survival Analysis using Prediction Error Curves.

Authors:  Ulla B Mogensen; Hemant Ishwaran; Thomas A Gerds
Journal:  J Stat Softw       Date:  2012-09       Impact factor: 6.440

2.  Gene promoter methylation signature predicts survival of head and neck squamous cell carcinoma patients.

Authors:  Efterpi Kostareli; Thomas Hielscher; Manuela Zucknick; Lorena Baboci; Gunnar Wichmann; Dana Holzinger; Oliver Mücke; Michael Pawlita; Annarosa Del Mistro; Paolo Boscolo-Rizzo; Maria Cristina Da Mosto; Giancarlo Tirelli; Peter Plinkert; Andreas Dietz; Christoph Plass; Dieter Weichenhan; Jochen Hess
Journal:  Epigenetics       Date:  2016-01-19       Impact factor: 4.528

3.  Prediction of breast cancer prognosis using gene set statistics provides signature stability and biological context.

Authors:  Gad Abraham; Adam Kowalczyk; Sherene Loi; Izhak Haviv; Justin Zobel
Journal:  BMC Bioinformatics       Date:  2010-05-25       Impact factor: 3.169

4.  A Metabolome-Wide Association Study of Kidney Function and Disease in the General Population.

Authors:  Peggy Sekula; Oemer-Necmi Goek; Lydia Quaye; Clara Barrios; Andrew S Levey; Werner Römisch-Margl; Cristina Menni; Idil Yet; Christian Gieger; Lesley A Inker; Jerzy Adamski; Wolfram Gronwald; Thomas Illig; Katja Dettmer; Jan Krumsiek; Peter J Oefner; Ana M Valdes; Christa Meisinger; Josef Coresh; Tim D Spector; Robert P Mohney; Karsten Suhre; Gabi Kastenmüller; Anna Köttgen
Journal:  J Am Soc Nephrol       Date:  2015-10-08       Impact factor: 10.121

5.  Transcriptome analysis in patients with progressive coronary artery disease: identification of differential gene expression in peripheral blood.

Authors:  Thomas G Nührenberg; Nicole Langwieser; Harald Binder; Thorsten Kurz; Christian Stratz; Rolf-Peter Kienzle; Dietmar Trenk; Dietlind Zohlnhöfer-Momm; Franz-Josef Neumann
Journal:  J Cardiovasc Transl Res       Date:  2012-11-29       Impact factor: 4.132

6.  Evaluating microarray-based classifiers: an overview.

Authors:  A-L Boulesteix; C Strobl; T Augustin; M Daumer
Journal:  Cancer Inform       Date:  2008-02-29

7.  Identifying Prognostic SNPs in Clinical Cohorts: Complementing Univariate Analyses by Resampling and Multivariable Modeling.

Authors:  Stefanie Hieke; Axel Benner; Richard F Schlenk; Martin Schumacher; Lars Bullinger; Harald Binder
Journal:  PLoS One       Date:  2016-05-09       Impact factor: 3.240

8.  The validation and assessment of machine learning: a game of prediction from high-dimensional data.

Authors:  Tune H Pers; Anders Albrechtsen; Claus Holst; Thorkild I A Sørensen; Thomas A Gerds
Journal:  PLoS One       Date:  2009-08-04       Impact factor: 3.240

9.  Incorporating pathway information into boosting estimation of high-dimensional risk prediction models.

Authors:  Harald Binder; Martin Schumacher
Journal:  BMC Bioinformatics       Date:  2009-01-13       Impact factor: 3.169

10.  Combining techniques for screening and evaluating interaction terms on high-dimensional time-to-event data.

Authors:  Murat Sariyar; Isabell Hoffmann; Harald Binder
Journal:  BMC Bioinformatics       Date:  2014-02-26       Impact factor: 3.169
