Multiple-rule bias in the comparison of classification rules.

Mohammadmahdi R Yousefi, Jianping Hua, Edward R Dougherty.

Abstract

MOTIVATION: There is growing discussion in the bioinformatics community concerning overoptimism of reported results. Two approaches contributing to overoptimism in classification are (i) the reporting of results on datasets for which a proposed classification rule performs well and (ii) the comparison of multiple classification rules on a single dataset that purports to show the advantage of a certain rule.
RESULTS: This article provides a careful probabilistic analysis of the second issue and the 'multiple-rule bias', resulting from choosing a classification rule having minimum estimated error on the dataset. It quantifies this bias corresponding to estimating the expected true error of the classification rule possessing minimum estimated error and it characterizes the bias from estimating the true comparative advantage of the chosen classification rule relative to the others by the estimated comparative advantage on the dataset. The analysis is applied to both synthetic and real data using a number of classification rules and error estimators.
AVAILABILITY: We have implemented in C code the synthetic data distribution model, classification rules, feature selection routines and error estimation methods. The code for multiple-rule analysis is implemented in MATLAB. The source code is available at http://gsp.tamu.edu/Publications/supplementary/yousefi11a/. Supplementary simulation results are also included.
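
The selection effect described under RESULTS can be illustrated with a small simulation. The sketch below is not the authors' C or MATLAB code. It is a toy model under loudly simplified assumptions: every rule has the same true error, each estimated error is an independent binomial proportion on a test sample, and the function name and all parameter values are hypothetical. Under these assumptions the chosen rule's true error is known, so the gap between the minimum estimated error and the true error isolates the bias introduced by picking the best-looking rule.

```python
import random


def simulate_multiple_rule_bias(num_rules=10, sample_size=50, true_error=0.3,
                                num_trials=2000, seed=0):
    """Average of (minimum estimated error - true error) over many trials.

    Toy assumptions (not from the paper): all rules share one true error,
    and their error estimates are independent of each other.
    """
    rng = random.Random(seed)
    total_gap = 0.0
    for _ in range(num_trials):
        estimates = []
        for _ in range(num_rules):
            # Count misclassified test points for this rule.
            mistakes = sum(rng.random() < true_error for _ in range(sample_size))
            estimates.append(mistakes / sample_size)
        # Report the rule that looks best on this dataset.
        total_gap += min(estimates) - true_error
    return total_gap / num_trials


single_rule_gap = simulate_multiple_rule_bias(num_rules=1)
ten_rule_gap = simulate_multiple_rule_bias(num_rules=10)
print(f"one rule:  mean(estimated - true) = {single_rule_gap:+.4f}")
print(f"ten rules: mean(min estimated - true) = {ten_rule_gap:+.4f}")
```

With a single rule the average gap should sit near zero, because each estimate is unbiased on its own. With ten rules the gap becomes clearly negative: the reported minimum understates the true error even though no individual estimator is biased. In real comparisons the rules are evaluated on the same dataset, so their estimates are correlated and the size of the effect differs from this toy setting.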

Entities: Gene

Year: 2011 PMID: 21546390 PMCID: PMC3106200 DOI: 10.1093/bioinformatics/btr262

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

15 in total

1. Classification of a large microarray data set: algorithm comparison and analysis of drug signatures.

Authors: Georges Natsoulis; Laurent El Ghaoui; Gert R G Lanckriet; Alexander M Tolley; Fabrice Leroy; Shane Dunlea; Barrett P Eynon; Cecelia I Pearson; Stuart Tugendreich; Kurt Jarnagin
Journal: Genome Res Date: 2005-05 Impact factor: 9.043

2. Confidence intervals for the true classification error conditioned on the estimated error.

Authors: Qian Xu; Jianping Hua; Ulisses Braga-Neto; Zixiang Xiong; Edward Suh; Edward R Dougherty
Journal: Technol Cancer Res Treat Date: 2006-12

3. The molecular classification of multiple myeloma.

Authors: Fenghuang Zhan; Yongsheng Huang; Simona Colla; James P Stewart; Ichiro Hanamura; Sushil Gupta; Joshua Epstein; Shmuel Yaccoby; Jeffrey Sawyer; Bart Burington; Elias Anaissie; Klaus Hollmig; Mauricio Pineda-Roman; Guido Tricot; Frits van Rhee; Ronald Walker; Maurizio Zangari; John Crowley; Bart Barlogie; John D Shaughnessy
Journal: Blood Date: 2006-05-25 Impact factor: 22.113

4. Decorrelation of the true and estimated classifier errors in high-dimensional settings.

Authors: Blaise Hanczar; Jianping Hua; Edward R Dougherty
Journal: EURASIP J Bioinform Syst Biol Date: 2007

5. Validation of computational methods in genomics.

Authors: Edward R Dougherty; Jianping Hua; Michael L Bittner
Journal: Curr Genomics Date: 2007-03 Impact factor: 2.236

6. Over-optimism in bioinformatics research.

Authors: Anne-Laure Boulesteix
Journal: Bioinformatics Date: 2009-11-26 Impact factor: 6.937

7. Small-sample precision of ROC-related estimates.

Authors: Blaise Hanczar; Jianping Hua; Chao Sima; John Weinstein; Michael Bittner; Edward R Dougherty
Journal: Bioinformatics Date: 2010-02-03 Impact factor: 6.937

8. Optimal classifier selection and negative bias in error rate estimation: an empirical study on high-dimensional prediction.

Authors: Anne-Laure Boulesteix; Carolin Strobl
Journal: BMC Med Res Methodol Date: 2009-12-21 Impact factor: 4.615

9. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling.

Authors: Eng-Juh Yeoh; Mary E Ross; Sheila A Shurtleff; W Kent Williams; Divyen Patel; Rami Mahfouz; Fred G Behm; Susana C Raimondi; Mary V Relling; Anami Patel; Cheng Cheng; Dario Campana; Dawn Wilkins; Xiaodong Zhou; Jinyan Li; Huiqing Liu; Ching-Hon Pui; William E Evans; Clayton Naeve; Limsoon Wong; James R Downing
Journal: Cancer Cell Date: 2002-03 Impact factor: 31.743

10. Novel endothelial cell markers in hepatocellular carcinoma.

7 in total

1. Performance reproducibility index for classification.

2. The illusion of distribution-free small-sample classification in genomics.

3. Modeling the next generation sequencing sample processing pipeline for the purposes of classification.

4. Bootstrapping the out-of-sample predictions for efficient and accurate cross-validation.

5. On optimal Bayesian classification and risk estimation under multiple classes.

6. On the impoverishment of scientific education.

7. A comparison of machine learning methods for classification using simulation with multiple real data examples from mental health studies.

Authors: Mizanur Khondoker; Richard Dobson; Caroline Skirrow; Andrew Simmons; Daniel Stahl
Journal: Stat Methods Med Res Date: 2013-09-18 Impact factor: 3.021