A method for constructing a confidence bound for the actual error rate of a prediction rule in high dimensions.

Kevin K Dobbin

Abstract

Constructing a confidence interval for the actual, conditional error rate of a prediction rule built from multivariate data is problematic because this error rate is not a population parameter in the traditional sense: it is a functional of the training set. When the training set changes, so does this "parameter." A valid method for constructing confidence intervals for the actual error rate was previously developed by McLachlan. However, McLachlan's method cannot be applied in many cancer research settings because it requires the number of samples to be much larger than the number of dimensions (n >> p) and assumes that no dimension-reducing feature selection step is performed. Here, an alternative to McLachlan's method is presented that can be applied when p >> n, with an additional adjustment for the presence of feature selection. Coverage probabilities of the new method are shown to be nominal or conservative over a wide range of scenarios. The new method is relatively simple to implement and not computationally burdensome.
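The abstract's central point, that the actual (conditional) error rate is a functional of the training set rather than a fixed population parameter, can be illustrated with a small simulation. The sketch below is not the paper's confidence-bound method. It assumes two Gaussian classes N(+mu, I) and N(-mu, I) with equal priors and a nearest-centroid rule, for which the conditional error rate of each trained rule has a closed form. All function names and parameter values (p = 1000, 20 samples per class, 10 informative features) are hypothetical choices made for illustration.

```python
import math

import numpy as np

# Hypothetical illustration only: NOT the paper's confidence-bound method.
# Assumes classes N(+mu, I) and N(-mu, I), equal priors, nearest-centroid rule.


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def conditional_error(m1: np.ndarray, m2: np.ndarray, mu: np.ndarray) -> float:
    """Exact error rate of the rule trained with centroids m1, m2.

    The rule assigns x to class 1 when w.(x - c) > 0, with w = m1 - m2 and
    c = (m1 + m2) / 2. Because the class distributions are known Gaussians,
    w.(x - c) is normal with variance ||w||^2, so each error term is a CDF.
    """
    w = m1 - m2
    c = (m1 + m2) / 2.0
    norm_w = np.linalg.norm(w)
    # Class 1: x ~ N(mu, I); misclassified when w.(x - c) <= 0.
    err1 = std_normal_cdf(-float(w @ (mu - c)) / norm_w)
    # Class 2: x ~ N(-mu, I); misclassified when w.(x - c) > 0.
    err2 = std_normal_cdf(float(w @ (-mu - c)) / norm_w)
    return 0.5 * (err1 + err2)


def simulate(n_per_class=20, p=1000, n_informative=10, effect=0.5,
             n_training_sets=5, seed=0):
    """Actual error rate of the rule trained on each of several training sets."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(p)
    mu[:n_informative] = effect  # only a few features separate the classes
    errors = []
    for _ in range(n_training_sets):
        # Draw a fresh training set (p >> n), then train and evaluate exactly.
        x1 = rng.normal(loc=mu, scale=1.0, size=(n_per_class, p))
        x2 = rng.normal(loc=-mu, scale=1.0, size=(n_per_class, p))
        errors.append(conditional_error(x1.mean(axis=0), x2.mean(axis=0), mu))
    return errors
```

Each entry returned by `simulate()` is the actual error rate of one trained rule, and the entries differ from one simulated training set to the next. This is the sense in which the "parameter" changes with the training set. Every entry is also bounded below by the Bayes error, Phi(-||mu||), which no rule can beat.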

Year:  2008        PMID: 19039030      PMCID: PMC2733174          DOI: 10.1093/biostatistics/kxn035

Source DB:  PubMed          Journal:  Biostatistics        ISSN: 1465-4644            Impact factor:   5.899


References (13 in total; 10 shown)

1.  Confidence intervals for the conditional probability of misallocation in discriminant analysis.

Authors:  G J McLachlan
Journal:  Biometrics       Date:  1975-03       Impact factor: 2.571

2.  A paradigm for class prediction using gene expression profiles.

Authors:  Michael D Radmacher; Lisa M McShane; Richard Simon
Journal:  J Comput Biol       Date:  2002       Impact factor: 1.479

3.  Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. (Review)

Authors:  Richard Simon; Michael D Radmacher; Kevin Dobbin; Lisa M McShane
Journal:  J Natl Cancer Inst       Date:  2003-01-01       Impact factor: 13.506

4.  Estimating dataset size requirements for classifying DNA microarray data.

Authors:  Sayan Mukherjee; Pablo Tamayo; Simon Rogers; Ryan Rifkin; Anna Engle; Colin Campbell; Todd R Golub; Jill P Mesirov
Journal:  J Comput Biol       Date:  2003       Impact factor: 1.479

5.  How many samples are needed to build a classifier: a general sequential approach.

Authors:  Wenjiang J Fu; Edward R Dougherty; Bani Mallick; Raymond J Carroll
Journal:  Bioinformatics       Date:  2004-08-05       Impact factor: 6.937

6.  Prediction of cancer outcome with microarrays: a multiple random validation strategy.

Authors:  Stefan Michiels; Serge Koscielny; Catherine Hill
Journal:  Lancet       Date:  2005 Feb 5-11       Impact factor: 79.321

7.  Calculating confidence intervals for prediction error in microarray classification using resampling.

Authors:  Wenyu Jiang; Sudhir Varma; Richard Simon
Journal:  Stat Appl Genet Mol Biol       Date:  2008-03-01

8.  Gene expression profiling predicts clinical outcome of breast cancer.

Authors:  Laura J van 't Veer; Hongyue Dai; Marc J van de Vijver; Yudong D He; Augustinus A M Hart; Mao Mao; Hans L Peterse; Karin van der Kooy; Matthew J Marton; Anke T Witteveen; George J Schreiber; Ron M Kerkhoven; Chris Roberts; Peter S Linsley; René Bernards; Stephen H Friend
Journal:  Nature       Date:  2002-01-31       Impact factor: 49.962

9.  Selection bias in gene extraction on the basis of microarray gene-expression data.

Authors:  Christophe Ambroise; Geoffrey J McLachlan
Journal:  Proc Natl Acad Sci U S A       Date:  2002-04-30       Impact factor: 11.205

10.  Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses.

Authors:  A Bhattacharjee; W G Richards; J Staunton; C Li; S Monti; P Vasa; C Ladd; J Beheshti; R Bueno; M Gillette; M Loda; G Weber; E J Mark; E S Lander; W Wong; B E Johnson; T R Golub; D J Sugarbaker; M Meyerson
Journal:  Proc Natl Acad Sci U S A       Date:  2001-11-13       Impact factor: 11.205

Cited by (3 in total)

1.  Sample size requirements for training high-dimensional risk predictors.

Authors:  Kevin K Dobbin; Xiao Song
Journal:  Biostatistics       Date:  2013-07-19       Impact factor: 5.899

2.  An empirical assessment of validation practices for molecular classifiers.

Authors:  Peter J Castaldi; Issa J Dahabreh; John P A Ioannidis
Journal:  Brief Bioinform       Date:  2011-02-07       Impact factor: 11.622

3.  Optimally splitting cases for training and testing high dimensional classifiers.

Authors:  Kevin K Dobbin; Richard M Simon
Journal:  BMC Med Genomics       Date:  2011-04-08       Impact factor: 3.063
