Pitfalls of hypothesis tests and model selection on bootstrap samples: Causes and consequences in biometrical applications.

Silke Janitza, Harald Binder, Anne-Laure Boulesteix.

Abstract

The bootstrap method has become a widely used tool applied in diverse areas where results based on asymptotic theory are scarce. It can be applied, for example, for assessing the variance of a statistic, a quantile of interest or for significance testing by resampling from the null hypothesis. Recently, some approaches have been proposed in the biometrical field where hypothesis testing or model selection is performed on a bootstrap sample as if it were the original sample. P-values computed from bootstrap samples have been used, for example, in the statistics and bioinformatics literature for ranking genes with respect to their differential expression, for estimating the variability of p-values and for model stability investigations. Procedures which make use of bootstrapped information criteria are often applied in model stability investigations and model averaging approaches as well as when estimating the error of model selection procedures which involve tuning parameters. From the literature, however, there is evidence that p-values and model selection criteria evaluated on bootstrap data sets do not represent what would be obtained on the original data or new data drawn from the overall population. We explain the reasons for this and, through the use of a real data set and simulations, we assess the practical impact on procedures relevant to biometrical applications in cases where it has not yet been studied. Moreover, we investigate the behavior of subsampling (i.e., drawing from a data set without replacement) as a potential alternative solution to the bootstrap for these procedures.
© 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
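
The central claim of the abstract, that p-values computed on a bootstrap sample do not behave like p-values computed on the original data, can be illustrated with a small simulation. The sketch below is not taken from the paper: the one-sample z-test with known variance, the sample size, the number of repetitions, and the helper names `z_test_pvalue` and `rejection_rates` are all assumptions chosen for illustration. Data are drawn under a true null hypothesis, so a valid test should reject in roughly 5% of repetitions at the 5% level.

```python
import random
from statistics import NormalDist, fmean


def z_test_pvalue(sample, sigma=1.0):
    """Two-sided p-value for H0: mean = 0, assuming a known standard deviation."""
    z = fmean(sample) * len(sample) ** 0.5 / sigma
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def rejection_rates(n=50, reps=2000, alpha=0.05, seed=1):
    """Share of repetitions in which H0 is rejected, by type of sample tested."""
    rng = random.Random(seed)
    hits = {"original": 0, "bootstrap": 0, "subsample": 0}
    for _ in range(reps):
        # Illustrative setting: standard normal data, so H0 (mean = 0) is true.
        original = [rng.gauss(0.0, 1.0) for _ in range(n)]
        samples = {
            "original": original,
            "bootstrap": rng.choices(original, k=n),    # drawn with replacement
            "subsample": rng.sample(original, n // 2),  # drawn without replacement
        }
        for name, data in samples.items():
            hits[name] += z_test_pvalue(data) < alpha
    return {name: count / reps for name, count in hits.items()}


rates = rejection_rates()
print(rates)
```

In this toy setting the test applied to bootstrap samples rejects far more often than the nominal level, because the bootstrap mean carries both the sampling variability of the original data and the extra variability of the resampling step. The subsample of size n/2 is itself an independent sample from the population, so the test keeps its level there, at the cost of a smaller sample size. Whether the same holds for other tests or for model selection criteria is not something this sketch addresses.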

Keywords:  Bootstrap; Bootstrapped information criteria; Bootstrapped p-values; Bootstrapped test statistic; Tests on bootstrap samples

Year:  2015        PMID: 26372408     DOI: 10.1002/bimj.201400246

Source DB:  PubMed          Journal:  Biom J        ISSN: 0323-3847            Impact factor:   2.207


  5 in total

1.  Analyzing large datasets with bootstrap penalization.

Authors:  Kuangnan Fang; Shuangge Ma
Journal:  Biom J       Date:  2016-11-21       Impact factor: 1.715

2.  Assessment of predictive performance in incomplete data by combining internal validation and multiple imputation.

Authors:  Simone Wahl; Anne-Laure Boulesteix; Astrid Zierer; Barbara Thorand; Mark A van de Wiel
Journal:  BMC Med Res Methodol       Date:  2016-10-26       Impact factor: 4.615

3.  An Update on Statistical Boosting in Biomedicine. (Review)

Authors:  Andreas Mayr; Benjamin Hofner; Elisabeth Waldmann; Tobias Hepp; Sebastian Meyer; Olaf Gefeller
Journal:  Comput Math Methods Med       Date:  2017-08-02       Impact factor: 2.238

4.  State of the art in selection of variables and functional forms in multivariable analysis-outstanding issues.

Authors:  Willi Sauerbrei; Aris Perperoglou; Matthias Schmid; Michal Abrahamowicz; Heiko Becher; Harald Binder; Daniela Dunkler; Frank E Harrell; Patrick Royston; Georg Heinze
Journal:  Diagn Progn Res       Date:  2020-04-02

5.  Selection of variables for multivariable models: Opportunities and limitations in quantifying model stability by resampling.

Authors:  Christine Wallisch; Daniela Dunkler; Geraldine Rauch; Riccardo de Bin; Georg Heinze
Journal:  Stat Med       Date:  2020-10-21       Impact factor: 2.373
