Avoiding model selection bias in small-sample genomic datasets.

Daniel Berrar, Ian Bradbury, Werner Dubitzky.

Abstract

MOTIVATION: Genomic datasets generated by high-throughput technologies are typically characterized by a moderate number of samples and a large number of measurements per sample. As a consequence, classification models are commonly compared based on resampling techniques. This investigation discusses the conceptual difficulties involved in comparative classification studies. Conclusions derived from such studies are often optimistically biased, because the apparent differences in performance are usually not controlled in a statistically stringent framework taking into account the adopted sampling strategy. We investigate this problem by means of a comparison of various classifiers in the context of multiclass microarray data.
RESULTS: Commonly used accuracy-based performance values, with or without confidence intervals, are inadequate for comparing classifiers for small-sample data. We present a statistical methodology that avoids bias in cross-validated model selection in the context of small-sample scenarios. This methodology is valid for both k-fold cross-validation and repeated random sampling.
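
The optimistic bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' methodology (the abstract does not spell it out); it only shows, under stated assumptions, why picking the best of many cross-validated accuracies on small-sample data is misleading. All names (`cv_accuracy`, `simulate`) and settings (30 samples, 200 candidate models, 5 folds) are hypothetical, and every candidate is a one-feature nearest-centroid classifier trained on pure noise, so its true accuracy is 0.5.

```python
import random
import statistics


def cv_accuracy(x, y, k=5):
    """k-fold cross-validated accuracy of a one-feature nearest-centroid classifier."""
    n = len(y)
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        # Class centroids are estimated from the training folds only.
        c0 = statistics.mean(x[i] for i in train_idx if y[i] == 0)
        c1 = statistics.mean(x[i] for i in train_idx if y[i] == 1)
        for i in test_idx:
            pred = 0 if abs(x[i] - c0) <= abs(x[i] - c1) else 1
            correct += pred == y[i]
    return correct / n


def simulate(n_samples=30, n_models=200, seed=0):
    """Return (best, mean) CV accuracy over candidate models built on noise."""
    rng = random.Random(seed)
    # Balanced binary labels in random order.
    y = [i % 2 for i in range(n_samples)]
    rng.shuffle(y)
    accs = []
    for _ in range(n_models):
        # Each candidate model sees one feature that is unrelated to the labels.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        accs.append(cv_accuracy(x, y))
    return max(accs), statistics.mean(accs)


if __name__ == "__main__":
    best, mean = simulate()
    print(f"best CV accuracy over candidates: {best:.2f}")
    print(f"mean CV accuracy over candidates: {mean:.2f}")
```

Because no feature carries signal, any gap between the best and the mean cross-validated accuracy is produced by selection alone; reporting the winner's accuracy as its expected performance would overstate it.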

Year: 2006 PMID: 16500931 DOI: 10.1093/bioinformatics/btl066

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited by (12 in total)

1. Reuse of public genome-wide gene expression data. (Review)

Authors: Johan Rung; Alvis Brazma
Journal: Nat Rev Genet Date: 2012-12-27 Impact factor: 53.242

2. Bias correction for selecting the minimal-error classifier from many machine learning models.

Authors: Ying Ding; Shaowu Tang; Serena G Liao; Jia Jia; Steffi Oesterreich; Yan Lin; George C Tseng
Journal: Bioinformatics Date: 2014-08-01 Impact factor: 6.937

3. De novo deciphering three-dimensional chromatin interaction and topological domains by wavelet transformation of epigenetic profiles.

Authors: Yong Chen; Yunfei Wang; Zhenyu Xuan; Min Chen; Michael Q Zhang
Journal: Nucleic Acids Res Date: 2016-04-07 Impact factor: 16.971

4. Evaluating microarray-based classifiers: an overview.

Authors: A-L Boulesteix; C Strobl; T Augustin; M Daumer
Journal: Cancer Inform Date: 2008-02-29

5. Predicting classifier performance with limited training data: applications to computer-aided diagnosis in breast and prostate cancer.

Authors: Ajay Basavanhally; Satish Viswanath; Anant Madabhushi
Journal: PLoS One Date: 2015-05-18 Impact factor: 3.240

6. A Novel Audiovisual Brain-Computer Interface and Its Application in Awareness Detection.

Authors: Fei Wang; Yanbin He; Jiahui Pan; Qiuyou Xie; Ronghao Yu; Rui Zhang; Yuanqing Li
Journal: Sci Rep Date: 2015-06-30 Impact factor: 4.379

7. Stratification bias in low signal microarray studies.

Authors: Brian J Parker; Simon Günter; Justin Bedo
Journal: BMC Bioinformatics Date: 2007-09-02 Impact factor: 3.169

8. CMA: a comprehensive Bioconductor package for supervised classification with high dimensional data.

Authors: M Slawski; M Daumer; A-L Boulesteix
Journal: BMC Bioinformatics Date: 2008-10-16 Impact factor: 3.169

9. Biased binomial assessment of cross-validated estimation of classification accuracies illustrated in diagnosis predictions.

Authors: Quentin Noirhomme; Damien Lesenfants; Francisco Gomez; Andrea Soddu; Jessica Schrouff; Gaëtan Garraux; André Luxen; Christophe Phillips; Steven Laureys
Journal: Neuroimage Clin Date: 2014-04-13 Impact factor: 4.881

10. RiGoR: reporting guidelines to address common sources of bias in risk model development.

Authors: Kathleen F Kerr; Allison Meisner; Heather Thiessen-Philbrook; Steven G Coca; Chirag R Parikh
Journal: Biomark Res Date: 2015-01-24