
Performance reproducibility index for classification.

Mohammadmahdi R Yousefi, Edward R Dougherty.

Abstract

MOTIVATION: A common practice in biomarker discovery is to decide whether a large laboratory experiment should be carried out based on the results of a preliminary study on a small set of specimens. Consideration of the efficacy of this approach motivates the introduction of a probabilistic measure of whether a classifier showing promising results in a small-sample preliminary study will perform similarly on a large independent sample. Given the error estimate from the preliminary study, if the probability of reproducible error is low, then there is little purpose in allocating substantially more resources to a large follow-on study. Indeed, if the probability that the preliminary study will provide reproducible results is small, then why even perform the preliminary study?
RESULTS: This article introduces a reproducibility index for classification, measuring the probability that a sufficiently small error estimate on a small sample will motivate a large follow-on study. We provide a simulation study based on synthetic distribution models that possess known intrinsic classification difficulties and emulate real-world scenarios. We also set up similar simulations on four real datasets to show the consistency of results. The reproducibility indices for different distributional models, real datasets and classification schemes are empirically calculated. The effects of reporting and multiple-rule biases on the reproducibility index are also analyzed.
AVAILABILITY: We have implemented the synthetic data distribution model, classification rules, feature selection routine and error estimation methods in C. The source code is available at http://gsp.tamu.edu/Publications/supplementary/yousefi12a/.
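
The abstract does not state the formula for the index, so the following is only a minimal Monte Carlo sketch of how such an index might be estimated empirically, not the authors' C implementation. It assumes the index is the conditional probability that the true error is at most the estimated error plus a tolerance `rho`, given that the estimate is at most a threshold `tau`. The one-dimensional Gaussian model, the nearest-mean classifier, the use of resubstitution as the error estimator, and all function and parameter names are illustrative assumptions.

```python
import math
import random


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_trial(rng, n_per_class, delta):
    """Run one small-sample 'preliminary study' (hypothetical setup).

    Class 0 ~ N(0, 1), class 1 ~ N(delta, 1), equal priors (assumed model).
    Returns (resubstitution error estimate, exact true error).
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
    x1 = [rng.gauss(delta, 1.0) for _ in range(n_per_class)]
    m0 = sum(x0) / n_per_class
    m1 = sum(x1) / n_per_class
    t = 0.5 * (m0 + m1)                # nearest-mean decision threshold
    sign = 1.0 if m1 >= m0 else -1.0   # side of t that is labeled class 1

    # Resubstitution: error counted on the training sample itself.
    wrong = sum(sign * (x - t) > 0 for x in x0)
    wrong += sum(sign * (x - t) <= 0 for x in x1)
    estimate = wrong / (2 * n_per_class)

    # Exact error of this classifier under the known model.
    true_error = 0.5 * (phi(-sign * t) + phi(sign * (t - delta)))
    return estimate, true_error


def reproducibility_index(n_per_class, delta, rho, tau, trials=2000, seed=0):
    """Estimate P(true error <= estimate + rho | estimate <= tau)."""
    rng = random.Random(seed)
    accepted = 0
    reproduced = 0
    for _ in range(trials):
        estimate, true_error = one_trial(rng, n_per_class, delta)
        if estimate <= tau:            # study looks promising enough
            accepted += 1
            if true_error <= estimate + rho:
                reproduced += 1
    # Undefined when no preliminary study passes the threshold.
    return reproduced / accepted if accepted else float("nan")


if __name__ == "__main__":
    for tau in (0.1, 0.2, 0.3):
        r = reproducibility_index(n_per_class=10, delta=1.0, rho=0.05, tau=tau)
        print(f"tau={tau:.1f}  index={r:.3f}")
```

Because the true error is computed in closed form here, no large independent test sample is needed; with real data or a more complex model, it would have to be approximated on a large held-out set.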

Year:  2012        PMID: 22954625      PMCID: PMC3476329          DOI: 10.1093/bioinformatics/bts509

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


