
How many samples are needed to build a classifier: a general sequential approach.

Wenjiang J Fu, Edward R Dougherty, Bani Mallick, Raymond J Carroll.

Abstract

MOTIVATION: The standard paradigm for classifier design is to obtain a sample of feature-label pairs and then to apply a classification rule to derive a classifier from the sample data. Typically in laboratory situations the sample size is limited by cost, time or availability of sample material. Thus, an investigator may wish to consider a sequential approach that enrolls enough patients to train a classifier capable of sound diagnostic decisions, while keeping the number of patients as small as possible so that the study remains affordable.
RESULTS: A sequential classification procedure is studied via the martingale central limit theorem. It updates the classification rule at each step and provides stopping criteria to ensure with a certain confidence that at stopping a future subject will have misclassification probability smaller than a predetermined threshold. Simulation studies and applications to microarray data analysis are provided. The procedure possesses several attractive properties: (1) it updates the classification rule sequentially and thus does not rely on distributions of primary measurements from other studies; (2) it assesses the stopping criteria at each sequential step and thus can substantially reduce cost via early stopping; and (3) it is not restricted to any particular classification rule and therefore applies to any parametric or non-parametric method, including feature selection or extraction.
AVAILABILITY: R-code for the sequential stopping rule is available at http://stat.tamu.edu/~wfu/microarray/sequential/R-code.html
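
To make the idea of "update the rule at each step, then check a stopping criterion" concrete, here is a minimal Python sketch. It is not the authors' procedure or their R code: the abstract does not give the stopping rule, so the sketch substitutes a simple stand-in. Each new subject is first classified by the current rule and only then used for training, and enrolment stops once a one-sided Wilson score upper bound on the accumulated error rate falls below the threshold. The nearest-centroid rule, the simulated two-class Gaussian data, the names `sequential_design`, `wilson_upper` and `simulated_subjects`, and all default parameter values are assumptions made for illustration. The sketch also ignores the effect of checking the bound repeatedly, which the martingale argument in the paper is presumably meant to handle.

```python
import math
import random
from statistics import NormalDist


def wilson_upper(errors, trials, confidence):
    """One-sided Wilson score upper bound for an error proportion."""
    z = NormalDist().inv_cdf(confidence)
    p = errors / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre + margin) / (1 + z * z / trials)


class NearestCentroid:
    """Toy classification rule that can be updated one subject at a time."""

    def __init__(self):
        self.sums = {}    # label -> per-feature running sums
        self.counts = {}  # label -> number of subjects seen

    def update(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
            self.counts[y] = 0
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict(self, x):
        def sq_dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))
        return min(self.sums, key=sq_dist)


def sequential_design(stream, threshold=0.1, confidence=0.95,
                      burn_in=10, max_n=2000):
    """Enrol subjects one by one until the error bound drops below threshold.

    burn_in must be at least 1 so the rule has seen data before predicting.
    """
    clf = NearestCentroid()
    errors = scored = n = 0
    upper = 1.0
    for n, (x, y) in enumerate(stream, start=1):
        if n > burn_in:
            # Score the new subject BEFORE its label is used for training.
            errors += int(clf.predict(x) != y)
            scored += 1
            upper = wilson_upper(errors, scored, confidence)
        clf.update(x, y)  # sequential update of the classification rule
        if scored and upper < threshold:
            return {"n": n, "stopped": True, "upper_bound": upper}
        if n >= max_n:
            break
    return {"n": n, "stopped": False, "upper_bound": upper}


def simulated_subjects(seed=0):
    """Hypothetical data: two unit-variance Gaussian classes in 2 dimensions."""
    rng = random.Random(seed)
    while True:
        y = rng.randint(0, 1)
        yield [rng.gauss(3.0 * y, 1.0), rng.gauss(3.0 * y, 1.0)], y


if __name__ == "__main__":
    print(sequential_design(simulated_subjects(seed=0)))
```

Because the stopping check only looks at predictions made before each label was revealed, any classification rule with `update` and `predict` methods could replace the toy one, which mirrors property (3) in the abstract.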

Year:  2004        PMID: 15297303     DOI: 10.1093/bioinformatics/bth461

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  8 in total

Review 1.  Statistics and bioinformatics in nutritional sciences: analysis of complex data in the era of systems biology.

Authors:  Wenjiang J Fu; Arnold J Stromberg; Kert Viele; Raymond J Carroll; Guoyao Wu
Journal:  J Nutr Biochem       Date:  2010-03-16       Impact factor: 6.048

2.  A method for constructing a confidence bound for the actual error rate of a prediction rule in high dimensions.

Authors:  Kevin K Dobbin
Journal:  Biostatistics       Date:  2008-11-27       Impact factor: 5.899

3.  Development and Validation of Biomarker Classifiers for Treatment Selection.

Authors:  Richard Simon
Journal:  J Stat Plan Inference       Date:  2008-02-01       Impact factor: 1.111

Review 4.  Gut-host Crosstalk: Methodological and Computational Challenges.

Authors:  Ivan Ivanov
Journal:  Dig Dis Sci       Date:  2020-03       Impact factor: 3.199

5.  Bias-corrected diagonal discriminant rules for high-dimensional classification.

Authors:  Song Huang; Tiejun Tong; Hongyu Zhao
Journal:  Biometrics       Date:  2010-12       Impact factor: 2.571

6.  Optimally splitting cases for training and testing high dimensional classifiers.

Authors:  Kevin K Dobbin; Richard M Simon
Journal:  BMC Med Genomics       Date:  2011-04-08       Impact factor: 3.063

7.  A simulation-approximation approach to sample size planning for high-dimensional classification studies.

Authors:  Perry de Valpine; Hans-Marcus Bitter; Michael P S Brown; Jonathan Heller
Journal:  Biostatistics       Date:  2009-02-21       Impact factor: 5.899

8.  Determination of minimum training sample size for microarray-based cancer outcome prediction-an empirical assessment.

Authors:  Li Shao; Xiaohui Fan; Ningtao Cheng; Leihong Wu; Yiyu Cheng
Journal:  PLoS One       Date:  2013-07-05       Impact factor: 3.240
