
Multiple-rule bias in the comparison of classification rules.

Mohammadmahdi R Yousefi, Jianping Hua, Edward R Dougherty.

Abstract

MOTIVATION: There is growing discussion in the bioinformatics community concerning overoptimism of reported results. Two approaches contributing to overoptimism in classification are (i) the reporting of results on datasets for which a proposed classification rule performs well and (ii) the comparison of multiple classification rules on a single dataset that purports to show the advantage of a certain rule.
RESULTS: This article provides a careful probabilistic analysis of the second issue and the 'multiple-rule bias', resulting from choosing a classification rule having minimum estimated error on the dataset. It quantifies this bias corresponding to estimating the expected true error of the classification rule possessing minimum estimated error and it characterizes the bias from estimating the true comparative advantage of the chosen classification rule relative to the others by the estimated comparative advantage on the dataset. The analysis is applied to both synthetic and real data using a number of classification rules and error estimators.
AVAILABILITY: We have implemented in C code the synthetic data distribution model, classification rules, feature selection routines and error estimation methods. The code for multiple-rule analysis is implemented in MATLAB. The source code is available at http://gsp.tamu.edu/Publications/supplementary/yousefi11a/. Supplementary simulation results are also included.
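The multiple-rule bias described in RESULTS can be illustrated with a small Monte Carlo sketch. This is not the authors' C/MATLAB code. It assumes a deliberately simplified model in which every rule has a known, fixed true error and each estimated error is an independent hold-out fraction over `n_test` points. The function name `simulate_multiple_rule_bias` and all of its parameters are hypothetical, chosen only for illustration.

```python
import random


def simulate_multiple_rule_bias(true_errors, n_test=50, n_trials=2000, seed=0):
    """Monte Carlo estimate of E[estimated error - true error] for the
    rule selected as having minimum estimated error.

    Assumption (not from the article): each estimate is an independent
    binomial hold-out fraction with the rule's true error as its mean.
    """
    rng = random.Random(seed)
    total_gap = 0.0
    for _ in range(n_trials):
        # Simulated error estimate for each rule on this "dataset".
        estimates = [
            sum(rng.random() < err for _ in range(n_test)) / n_test
            for err in true_errors
        ]
        # Choose the rule that looks best on this dataset.
        best = min(range(len(true_errors)), key=estimates.__getitem__)
        # Gap between what is reported and what is true for that rule.
        total_gap += estimates[best] - true_errors[best]
    return total_gap / n_trials


if __name__ == "__main__":
    single = simulate_multiple_rule_bias([0.30])
    many = simulate_multiple_rule_bias([0.30] * 10)
    print(f"bias with 1 rule:   {single:+.4f}")
    print(f"bias with 10 rules: {many:+.4f}")
```

Under these assumptions a single rule's estimate is roughly unbiased, while reporting the minimum over ten equally good rules gives a clearly negative (optimistic) gap, even though no rule is actually better than the others.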

Year:  2011        PMID: 21546390      PMCID: PMC3106200          DOI: 10.1093/bioinformatics/btr262

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  15 in total

1.  Classification of a large microarray data set: algorithm comparison and analysis of drug signatures.

Authors:  Georges Natsoulis; Laurent El Ghaoui; Gert R G Lanckriet; Alexander M Tolley; Fabrice Leroy; Shane Dunlea; Barrett P Eynon; Cecelia I Pearson; Stuart Tugendreich; Kurt Jarnagin
Journal:  Genome Res       Date:  2005-05       Impact factor: 9.043

2.  Confidence intervals for the true classification error conditioned on the estimated error.

Authors:  Qian Xu; Jianping Hua; Ulisses Braga-Neto; Zixiang Xiong; Edward Suh; Edward R Dougherty
Journal:  Technol Cancer Res Treat       Date:  2006-12

3.  The molecular classification of multiple myeloma.

Authors:  Fenghuang Zhan; Yongsheng Huang; Simona Colla; James P Stewart; Ichiro Hanamura; Sushil Gupta; Joshua Epstein; Shmuel Yaccoby; Jeffrey Sawyer; Bart Burington; Elias Anaissie; Klaus Hollmig; Mauricio Pineda-Roman; Guido Tricot; Frits van Rhee; Ronald Walker; Maurizio Zangari; John Crowley; Bart Barlogie; John D Shaughnessy
Journal:  Blood       Date:  2006-05-25       Impact factor: 22.113

4.  Decorrelation of the true and estimated classifier errors in high-dimensional settings.

Authors:  Blaise Hanczar; Jianping Hua; Edward R Dougherty
Journal:  EURASIP J Bioinform Syst Biol       Date:  2007

5.  Validation of computational methods in genomics.

Authors:  Edward R Dougherty; Jianping Hua; Michael L Bittner
Journal:  Curr Genomics       Date:  2007-03       Impact factor: 2.236

6.  Over-optimism in bioinformatics research.

Authors:  Anne-Laure Boulesteix
Journal:  Bioinformatics       Date:  2009-11-26       Impact factor: 6.937

7.  Small-sample precision of ROC-related estimates.

Authors:  Blaise Hanczar; Jianping Hua; Chao Sima; John Weinstein; Michael Bittner; Edward R Dougherty
Journal:  Bioinformatics       Date:  2010-02-03       Impact factor: 6.937

8.  Optimal classifier selection and negative bias in error rate estimation: an empirical study on high-dimensional prediction.

Authors:  Anne-Laure Boulesteix; Carolin Strobl
Journal:  BMC Med Res Methodol       Date:  2009-12-21       Impact factor: 4.615

9.  Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling.

Authors:  Eng-Juh Yeoh; Mary E Ross; Sheila A Shurtleff; W Kent Williams; Divyen Patel; Rami Mahfouz; Fred G Behm; Susana C Raimondi; Mary V Relling; Anami Patel; Cheng Cheng; Dario Campana; Dawn Wilkins; Xiaodong Zhou; Jinyan Li; Huiqing Liu; Ching-Hon Pui; William E Evans; Clayton Naeve; Limsoon Wong; James R Downing
Journal:  Cancer Cell       Date:  2002-03       Impact factor: 31.743

10.  Novel endothelial cell markers in hepatocellular carcinoma.

Authors:  Xin Chen; John Higgins; Siu-Tim Cheung; Rui Li; Veronica Mason; Kelli Montgomery; Sheung-Tat Fan; Matt van de Rijn; Samuel So
Journal:  Mod Pathol       Date:  2004-10       Impact factor: 7.842

  7 in total

1.  Performance reproducibility index for classification.

Authors:  Mohammadmahdi R Yousefi; Edward R Dougherty
Journal:  Bioinformatics       Date:  2012-09-06       Impact factor: 6.937

2.  The illusion of distribution-free small-sample classification in genomics.

Authors:  Edward R Dougherty; Amin Zollanvari; Ulisses M Braga-Neto
Journal:  Curr Genomics       Date:  2011-08       Impact factor: 2.236

3.  Modeling the next generation sequencing sample processing pipeline for the purposes of classification.

Authors:  Noushin Ghaffari; Mohammadmahdi R Yousefi; Charles D Johnson; Ivan Ivanov; Edward R Dougherty
Journal:  BMC Bioinformatics       Date:  2013-10-11       Impact factor: 3.169

4.  Bootstrapping the out-of-sample predictions for efficient and accurate cross-validation.

Authors:  Ioannis Tsamardinos; Elissavet Greasidou; Giorgos Borboudakis
Journal:  Mach Learn       Date:  2018-05-09       Impact factor: 2.940

5.  On optimal Bayesian classification and risk estimation under multiple classes.

Authors:  Lori A Dalton; Mohammadmahdi R Yousefi
Journal:  EURASIP J Bioinform Syst Biol       Date:  2015-10-24

6.  On the impoverishment of scientific education.

Authors:  Edward R Dougherty
Journal:  EURASIP J Bioinform Syst Biol       Date:  2013-11-11

7.  A comparison of machine learning methods for classification using simulation with multiple real data examples from mental health studies.

Authors:  Mizanur Khondoker; Richard Dobson; Caroline Skirrow; Andrew Simmons; Daniel Stahl
Journal:  Stat Methods Med Res       Date:  2013-09-18       Impact factor: 3.021

