Is cross-validation better than resubstitution for ranking genes?

Ulisses Braga-Neto, Ronaldo Hashimoto, Edward R Dougherty, Danh V Nguyen, Raymond J Carroll.

Abstract

MOTIVATION: Ranking gene feature sets is a key issue for both phenotype classification, for instance, tumor classification in a DNA microarray experiment, and prediction in the context of genetic regulatory networks. Two broad methods are available to estimate the error (misclassification rate) of a classifier. Resubstitution fits a single classifier to the data, and applies this classifier in turn to each data observation. Cross-validation (in leave-one-out form) removes each observation in turn, constructs the classifier, and then computes whether this leave-one-out classifier correctly classifies the deleted observation. Resubstitution typically underestimates classifier error, severely so in many cases. Cross-validation has the advantage of producing an effectively unbiased error estimate, but the estimate is highly variable. In many applications it is not the misclassification rate per se that is of interest, but rather the construction of gene sets that have the potential to classify or predict. Hence, one needs to rank feature sets based on their performance.
RESULTS: A model-based approach is used to compare the ranking performances of resubstitution and cross-validation for classification based on real-valued feature sets and for prediction in the context of probabilistic Boolean networks (PBNs). For classification, a Gaussian model is considered, along with classification via linear discriminant analysis and the 3-nearest-neighbor classification rule. Prediction is examined in the steady-state distribution of a PBN. Three metrics are proposed to compare feature-set ranking based on error estimation with ranking based on the true error, which is known owing to the model-based approach. In all cases, resubstitution is competitive with cross-validation relative to ranking accuracy. This is in addition to the enormous savings in computation time afforded by resubstitution.
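
The two error estimators described in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation code: the sample size, class separation, dimension, and all function names (`knn_predict`, `resubstitution_error`, `leave_one_out_error`, `gaussian_sample`) are assumptions chosen for illustration. It uses the 3-nearest-neighbor rule on two unit-variance Gaussian classes and relies only on the Python standard library.

```python
import random


def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(
        train,
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query)),
    )[:k]
    votes = sum(label for _, label in nearest)  # labels are 0 or 1
    return 1 if 2 * votes > k else 0


def resubstitution_error(data, k=3):
    """Build one classifier from all the data, then apply it to each observation."""
    wrong = sum(knn_predict(data, x, k) != y for x, y in data)
    return wrong / len(data)


def leave_one_out_error(data, k=3):
    """Remove each observation in turn, rebuild the classifier, test the removed point."""
    wrong = 0
    for i, (x, y) in enumerate(data):
        held_out = data[:i] + data[i + 1:]
        wrong += knn_predict(held_out, x, k) != y
    return wrong / len(data)


def gaussian_sample(n_per_class, shift, dim, rng):
    """Two classes with unit-variance Gaussian features; class 1 has mean `shift`."""
    data = []
    for label in (0, 1):
        mean = shift * label
        for _ in range(n_per_class):
            data.append(([rng.gauss(mean, 1.0) for _ in range(dim)], label))
    return data


# Illustrative settings only (assumed, not taken from the paper).
rng = random.Random(0)
data = gaussian_sample(n_per_class=15, shift=1.0, dim=2, rng=rng)
resub = resubstitution_error(data)
loo = leave_one_out_error(data)
print(f"resubstitution error: {resub:.3f}")
print(f"leave-one-out error:  {loo:.3f}")
```

In this sketch the resubstitution estimate can never exceed the leave-one-out estimate, because each point counts itself among its own three nearest neighbors. That is one concrete way to see the optimistic bias the abstract mentions. Leave-one-out also refits the classifier once per observation, which is where the extra computation comes from.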

Year: 2004 PMID: 14734317 DOI: 10.1093/bioinformatics/btg399

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

14 in total

1. Decorrelation of the true and estimated classifier errors in high-dimensional settings.

Authors: Blaise Hanczar; Jianping Hua; Edward R Dougherty
Journal: EURASIP J Bioinform Syst Biol Date: 2007

2. Validation of computational methods in genomics.

Authors: Edward R Dougherty; Jianping Hua; Michael L Bittner
Journal: Curr Genomics Date: 2007-03 Impact factor: 2.236

3. Objective detection of chronic stress using physiological parameters.

Authors: Rabah M Al Abdi; Ahmad E Alhitary; Enas W Abdul Hay; Areen K Al-Bashir
Journal: Med Biol Eng Comput Date: 2018-06-18 Impact factor: 2.602

4. Radiomics: the process and the challenges. (Review)

Authors: Virendra Kumar; Yuhua Gu; Satrajit Basu; Anders Berglund; Steven A Eschrich; Matthew B Schabath; Kenneth Forster; Hugo J W L Aerts; Andre Dekker; David Fenstermacher; Dmitry B Goldgof; Lawrence O Hall; Philippe Lambin; Yoganand Balagurunathan; Robert A Gatenby; Robert J Gillies
Journal: Magn Reson Imaging Date: 2012-08-13 Impact factor: 2.546

5. Gene selection using iterative feature elimination random forests for survival outcomes.

Authors: Herbert Pang; Stephen L George; Ken Hui; Tiejun Tong
Journal: IEEE/ACM Trans Comput Biol Bioinform Date: 2012 Sep-Oct Impact factor: 3.710

6. Use of wrapper algorithms coupled with a random forests classifier for variable selection in large-scale genomic association studies.

Authors: Andrei S Rodin; Anatoliy Litvinenko; Kathy Klos; Alanna C Morrison; Trevor Woodage; Josef Coresh; Eric Boerwinkle
Journal: J Comput Biol Date: 2009-12 Impact factor: 1.479

7. Wrapper-based selection of genetic features in genome-wide association studies through fast matrix operations.

8. Gene selection and classification of microarray data using random forest.

9. Unbiased bootstrap error estimation for linear discriminant analysis.

10. RiGoR: reporting guidelines to address common sources of bias in risk model development.