Avoiding model selection bias in small-sample genomic datasets.

Daniel Berrar, Ian Bradbury, Werner Dubitzky.

Abstract

MOTIVATION: Genomic datasets generated by high-throughput technologies are typically characterized by a moderate number of samples and a large number of measurements per sample. As a consequence, classification models are commonly compared based on resampling techniques. This investigation discusses the conceptual difficulties involved in comparative classification studies. Conclusions derived from such studies are often optimistically biased, because the apparent differences in performance are usually not assessed within a statistically stringent framework that takes the adopted sampling strategy into account. We investigate this problem by comparing various classifiers on multiclass microarray data.
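A small simulation can reproduce the optimistic bias described in the motivation. This sketch is an illustration only and is not the authors' procedure. It rests on several assumptions. The data are pure noise, with labels drawn independently of the features. The candidate models are 1-nearest-neighbour classifiers that differ only in a randomly chosen feature subset. Accuracy is estimated by 5-fold cross-validation. All function names are hypothetical. Because no model can beat chance on such data, any gap between the cross-validated accuracy of the selected "best" model and the average model reflects selection bias, not real performance.

```python
import random
import statistics


def cv_accuracy_1nn(X, y, features, folds):
    """k-fold CV accuracy of a 1-nearest-neighbour classifier restricted to `features`."""
    correct = 0
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        for i in test_idx:
            # Squared Euclidean distance computed on the chosen feature subset only.
            nearest = min(
                train_idx,
                key=lambda j: sum((X[i][f] - X[j][f]) ** 2 for f in features),
            )
            correct += y[nearest] == y[i]
    return correct / len(y)


def selection_bias_demo(n=30, p=200, n_classes=3, n_models=20, subset_size=10, k=5, seed=0):
    """Hypothetical demo on pure-noise data (labels independent of X), so every
    model's true accuracy is about 1/n_classes.
    Returns (CV accuracy of the selected 'best' model, mean CV accuracy of all models)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [rng.randrange(n_classes) for _ in range(n)]  # labels carry no signal
    order = list(range(n))
    rng.shuffle(order)
    folds = [order[f::k] for f in range(k)]  # the same k-fold partition for every model
    accs = [
        cv_accuracy_1nn(X, y, rng.sample(range(p), subset_size), folds)
        for _ in range(n_models)
    ]
    return max(accs), statistics.mean(accs)


if __name__ == "__main__":
    best, mean = selection_bias_demo()
    print(f"selected model: {best:.2f}, average model: {mean:.2f}, chance: {1/3:.2f}")
```

Reporting the maximum over many candidates, as in `best`, without accounting for the search yields the inflated estimate the abstract warns about.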
RESULTS: Commonly used accuracy-based performance values, with or without confidence intervals, are inadequate for comparing classifiers for small-sample data. We present a statistical methodology that avoids bias in cross-validated model selection in the context of small-sample scenarios. This methodology is valid for both k-fold cross-validation and repeated random sampling.
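The abstract does not spell out the authors' methodology. As a stand-in for what "taking the sampling strategy into account" can mean, the sketch below implements the corrected resampled t-statistic of Nadeau and Bengio for J repeated random train/test splits. This is a different, well-known technique and is not necessarily the one proposed in this paper. The function names are hypothetical. The correction term n_test/n_train reflects the fact that training sets from different splits overlap, so the per-split performance differences are not independent.

```python
import math
import statistics


def corrected_resampled_t(diffs, n_train, n_test):
    """Corrected resampled t-statistic for paired per-split differences.

    diffs: accuracy(A) - accuracy(B) on each of J random train/test splits.
    The variance factor (1/J + n_test/n_train) replaces the naive 1/J to
    account for overlapping training sets across splits.
    Returns (t statistic, degrees of freedom)."""
    J = len(diffs)
    if J < 2:
        raise ValueError("need at least two resampling runs")
    mean_d = statistics.mean(diffs)
    var_d = statistics.variance(diffs)  # sample variance, denominator J - 1
    if var_d == 0:
        raise ValueError("differences have zero variance")
    t = mean_d / math.sqrt((1.0 / J + n_test / n_train) * var_d)
    return t, J - 1


def naive_paired_t(diffs):
    """Standard paired t-statistic, which treats the J splits as independent."""
    J = len(diffs)
    return statistics.mean(diffs) / math.sqrt(statistics.variance(diffs) / J)
```

Take J = 5 and n_test/n_train = 0.5 as an example. The corrected variance factor is then 0.7 instead of 0.2, so the corrected statistic is smaller than the naive paired one by a factor of sqrt(0.2/0.7), about 0.53. It is compared against a t distribution with J − 1 degrees of freedom.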

Year:  2006        PMID: 16500931     DOI: 10.1093/bioinformatics/btl066

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

