Over-optimism in bioinformatics: an illustration.

Monika Jelizarow, Vincent Guillemot, Arthur Tenenhaus, Korbinian Strimmer, Anne-Laure Boulesteix.

Abstract

MOTIVATION: In statistical bioinformatics research, different optimization mechanisms potentially lead to 'over-optimism' in published papers. So far, however, a systematic critical study concerning the various sources underlying this over-optimism is lacking.
RESULTS: We present an empirical study on over-optimism using high-dimensional classification as an example. Specifically, we consider a 'promising' new classification algorithm, namely linear discriminant analysis incorporating prior knowledge on gene functional groups through an appropriate shrinkage of the within-group covariance matrix. While this approach yields poor results in terms of error rate, we quantitatively demonstrate that it can artificially seem superior to existing approaches if we 'fish for significance'. The investigated sources of over-optimism include the optimization of datasets, of settings, of competing methods and, most importantly, of the method's characteristics. We conclude that, if the improvement of a quantitative criterion such as the error rate is the main contribution of a paper, the superiority of new algorithms should always be demonstrated on independent validation data.
AVAILABILITY: The R code and relevant data can be downloaded from http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/overoptimism/, such that the study is completely reproducible.
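
The 'fishing for significance' effect described in the abstract can be illustrated with a small simulation. The following Python sketch is not the authors' R code and does not implement their classifier. It assumes, purely for illustration, that several variants of a method all have the same true error rate and that their error estimates are independent of each other (in practice, variants evaluated on the same dataset are correlated, so the effect is usually smaller). The function names and default parameters are hypothetical. Reporting the variant with the lowest estimated error gives an optimistic figure, whereas a fresh evaluation on independent validation data returns to the true error rate on average.

```python
import random


def estimated_error(n_samples, true_error, rng):
    """Observed error rate when each of n_samples is misclassified
    independently with probability true_error."""
    mistakes = sum(rng.random() < true_error for _ in range(n_samples))
    return mistakes / n_samples


def simulate(n_variants=20, n_samples=50, true_error=0.5, n_runs=2000, seed=0):
    """Return (mean reported error of the best-looking variant,
    mean error of one independent validation evaluation).

    Assumption (illustrative only): all variants share the same true
    error and their estimates are independent of each other.
    """
    rng = random.Random(seed)
    selected, validated = [], []
    for _ in range(n_runs):
        # The variants differ only by chance fluctuation of the estimate.
        estimates = [estimated_error(n_samples, true_error, rng)
                     for _ in range(n_variants)]
        # "Fishing": report only the lowest estimated error.
        selected.append(min(estimates))
        # All variants are statistically identical here, so one fresh
        # evaluation stands in for validating the chosen variant.
        validated.append(estimated_error(n_samples, true_error, rng))
    return sum(selected) / n_runs, sum(validated) / n_runs


if __name__ == "__main__":
    sel, val = simulate()
    print(f"mean error of best-looking variant: {sel:.3f}")
    print(f"mean error on independent validation data: {val:.3f}")
```

In this toy setting the gap between the two averages is pure selection bias, since no variant is actually better than any other.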

Year:  2010        PMID: 20581402     DOI: 10.1093/bioinformatics/btq323

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  31 in total

1.  Improving validation practices in "omics" research.

Authors:  John P A Ioannidis; Muin J Khoury
Journal:  Science       Date:  2011-12-02       Impact factor: 47.728

2.  Biological impact of missing-value imputation on downstream analyses of gene expression profiles.

Authors:  Sunghee Oh; Dongwan D Kang; Guy N Brock; George C Tseng
Journal:  Bioinformatics       Date:  2010-11-02       Impact factor: 6.937

3.  Multiple-rule bias in the comparison of classification rules.

Authors:  Mohammadmahdi R Yousefi; Jianping Hua; Edward R Dougherty
Journal:  Bioinformatics       Date:  2011-05-05       Impact factor: 6.937

4.  An empirical assessment of validation practices for molecular classifiers.

Authors:  Peter J Castaldi; Issa J Dahabreh; John P A Ioannidis
Journal:  Brief Bioinform       Date:  2011-02-07       Impact factor: 11.622

5.  survcomp: an R/Bioconductor package for performance assessment and comparison of survival models.

Authors:  Markus S Schröder; Aedín C Culhane; John Quackenbush; Benjamin Haibe-Kains
Journal:  Bioinformatics       Date:  2011-09-07       Impact factor: 6.937

6.  High-dimensional bolstered error estimation.

Authors:  Chao Sima; Ulisses M Braga-Neto; Edward R Dougherty
Journal:  Bioinformatics       Date:  2011-09-13       Impact factor: 6.937

7.  Performance reproducibility index for classification.

Authors:  Mohammadmahdi R Yousefi; Edward R Dougherty
Journal:  Bioinformatics       Date:  2012-09-06       Impact factor: 6.937

8.  Prediction of ischemic events on the basis of transcriptomic and genomic profiling in patients undergoing carotid endarterectomy.

Authors:  Lasse Folkersen; Jonas Persson; Johan Ekstrand; Hanna E Agardh; Göran K Hansson; Anders Gabrielsen; Ulf Hedin; Gabrielle Paulsson-Berne
Journal:  Mol Med       Date:  2012-05-09       Impact factor: 6.354

9.  On the optimistic performance evaluation of newly introduced bioinformatic methods.

Authors:  Rory Wilson; Anne-Laure Boulesteix; Stefan Buchka; Alexander Hapfelmeier; Paul P Gardner
Journal:  Genome Biol       Date:  2021-05-11       Impact factor: 13.583

10.  Computational prediction of polycomb-associated long non-coding RNAs.

Authors:  Galina V Glazko; Boris L Zybailov; Igor B Rogozin
Journal:  PLoS One       Date:  2012-09-13       Impact factor: 3.240
