Confidence intervals for the true classification error conditioned on the estimated error.

Qian Xu, Jianping Hua, Ulisses Braga-Neto, Zixiang Xiong, Edward Suh, Edward R Dougherty.

Abstract

Bias and variance for small-sample error estimation are typically posed in terms of statistics for the distributions of the true and estimated errors. On the other hand, a salient practical issue asks, given an error estimate, what can be said about the true error? This question relates to the joint distribution of the true and estimated errors, specifically, the conditional expectation of the true error given the error estimate. A critical issue is that of confidence bounds for the true error given the estimate. We consider the joint distribution of the true error and the estimated error, assuming a random feature-label distribution. From it, we derive the marginal distributions, the conditional expectation of the estimated error given the true error, the conditional expectation of the true error given the estimated error, the conditional variance of the true error given the estimated error, and the 95% upper confidence bound for the true error given the estimated error. Numerous classification and estimation rules are considered across a number of models. Massive simulation is used for continuous models and analytic results are derived for discrete classification. We also consider a breast-cancer study to illustrate how the theory might be applied in practice. Although specific results depend on the classification rule, error-estimation rule, and model, some general trends are seen: (I) if the true error is small (large), then the conditional estimated error is generally high (low)-biased; (II) the conditional expected true error tends to be larger (smaller) than the estimated error for small (large) estimated errors; and (III) the confidence bounds tend to be well above the estimated error for low error estimates, becoming much less so for large estimates.
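The conditional quantities named in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' simulation: it assumes a hypothetical one-dimensional two-class Gaussian model whose class separation `delta` is drawn at random, a midpoint-threshold classifier, and resubstitution as the error estimator. The helper names (`one_trial`, `conditional_summary`) and all parameter values are illustrative assumptions. Trials are grouped by estimated error, and the conditional mean and an empirical 95% upper bound of the true error are reported for each group.

```python
import math
import random
from collections import defaultdict


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_trial(rng, n_per_class):
    """Draw a model, train a threshold classifier, return (true, estimated) error."""
    delta = rng.uniform(0.5, 2.5)  # assumed prior on class separation (illustrative)
    x0 = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]    # class 0 sample
    x1 = [rng.gauss(delta, 1.0) for _ in range(n_per_class)]  # class 1 sample
    m0, m1 = sum(x0) / n_per_class, sum(x1) / n_per_class
    t = 0.5 * (m0 + m1)      # threshold at the midpoint of the sample means
    upper_is_1 = m1 >= m0    # label 1 goes above t when its sample mean is larger

    # Exact true error of this classifier under the drawn model (equal priors).
    err = 0.5 * (1.0 - normal_cdf(t)) + 0.5 * normal_cdf(t - delta)
    true_err = err if upper_is_1 else 1.0 - err

    # Resubstitution: misclassification rate on the training points themselves.
    wrong = sum((x > t) == upper_is_1 for x in x0)
    wrong += sum((x > t) != upper_is_1 for x in x1)
    return true_err, wrong / (2 * n_per_class)


def conditional_summary(trials=20000, n_per_class=10, seed=0):
    """Map each estimated error to (mean true error, 95% upper bound, count)."""
    rng = random.Random(seed)
    by_estimate = defaultdict(list)
    for _ in range(trials):
        true_err, est_err = one_trial(rng, n_per_class)
        by_estimate[est_err].append(true_err)

    summary = {}
    for est_err, values in sorted(by_estimate.items()):
        values.sort()
        mean = sum(values) / len(values)
        # Empirical 95th percentile, used as a one-sided upper bound.
        idx = min(len(values) - 1, math.ceil(0.95 * len(values)) - 1)
        summary[est_err] = (mean, values[idx], len(values))
    return summary


if __name__ == "__main__":
    for est, (mean, upper, count) in conditional_summary().items():
        if count >= 50:  # skip sparsely populated estimate values
            print(f"est={est:.2f}  E[true|est]={mean:.3f}  "
                  f"95% upper={upper:.3f}  n={count}")
```

Because resubstitution on 2 x `n_per_class` points takes only finitely many values, grouping by the exact estimate is enough here; a continuous estimator would need binning instead.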

Year:  2006        PMID: 17121434     DOI: 10.1177/153303460600500605

Source DB:  PubMed          Journal:  Technol Cancer Res Treat        ISSN: 1533-0338


  9 in total

1.  Principles for the ethical analysis of clinical and translational research.

Authors:  Jonathan A L Gelfond; Elizabeth Heitman; Brad H Pollock; Craig M Klugman
Journal:  Stat Med       Date:  2011-07-12       Impact factor: 2.373

2.  A method for constructing a confidence bound for the actual error rate of a prediction rule in high dimensions.

Authors:  Kevin K Dobbin
Journal:  Biostatistics       Date:  2008-11-27       Impact factor: 5.899

3.  Validation of computational methods in genomics.

Authors:  Edward R Dougherty; Jianping Hua; Michael L Bittner
Journal:  Curr Genomics       Date:  2007-03       Impact factor: 2.236

4.  Which is better: holdout or full-sample classifier design?

Authors:  Marcel Brun; Qian Xu; Edward R Dougherty
Journal:  EURASIP J Bioinform Syst Biol       Date:  2008

5.  Multiple-rule bias in the comparison of classification rules.

Authors:  Mohammadmahdi R Yousefi; Jianping Hua; Edward R Dougherty
Journal:  Bioinformatics       Date:  2011-05-05       Impact factor: 6.937

6.  Classification and error estimation for discrete data.

Authors:  Ulisses M Braga-Neto
Journal:  Curr Genomics       Date:  2009-11       Impact factor: 2.236

7.  The illusion of distribution-free small-sample classification in genomics.

Authors:  Edward R Dougherty; Amin Zollanvari; Ulisses M Braga-Neto
Journal:  Curr Genomics       Date:  2011-08       Impact factor: 2.236

8.  Scientific knowledge is possible with small-sample classification.

Authors:  Edward R Dougherty; Lori A Dalton
Journal:  EURASIP J Bioinform Syst Biol       Date:  2013-08-20

9.  Robust model selection for classification of microarrays.

Authors:  Ikumi Suzuki; Takashi Takenouchi; Miki Ohira; Shigeyuki Oba; Shin Ishii
Journal:  Cancer Inform       Date:  2009-06-25