Molecular diagnosis. Classification, model selection and performance evaluation.

Abstract

OBJECTIVES: We discuss supervised classification techniques applied to medical diagnosis based on gene expression profiles. Our focus lies on strategies of adaptive model selection to avoid overfitting in high-dimensional spaces.
METHODS: We introduce likelihood-based methods, classification trees, support vector machines and regularized binary regression. For regularization by dimension reduction, we describe feature selection methods: feature filtering, feature shrinkage and wrapper approaches. In small sample-size situations efficient methods of data re-use are needed to assess the predictive power of a model. We discuss two issues in using cross-validation: the difference between in-loop and out-of-loop feature selection, and estimating model parameters in nested-loop cross-validation.
RESULTS: Gene selection does not reduce the dimensionality of the model. Tuning parameters enable adaptive model selection. The feature selection bias is a common pitfall in performance evaluation. Model selection and performance evaluation can be combined by nested-loop cross-validation.
CONCLUSIONS: Classification of microarrays is prone to overfitting. A rigorous and unbiased assessment of the predictive power of the model is a must.
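
The feature selection bias and the nested-loop scheme named in the abstract can be illustrated with a small sketch. This is not the authors' code: the pure-noise data, the class-mean-gap filter, the nearest-centroid classifier, and all function names (`make_noise_data`, `select_features`, `cv_accuracy`, `nested_cv_accuracy`) are assumptions chosen only to keep the example self-contained. Because the labels carry no signal, any accuracy clearly above chance can only come from information leaking out of the test folds.

```python
import random
import statistics


def make_noise_data(n_samples=40, n_features=500, seed=0):
    """Pure-noise 'expression' matrix with alternating, balanced class labels."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def select_features(X, y, k):
    """Feature filter: indices of the k features with the largest class-mean gap."""
    scores = []
    for j in range(len(X[0])):
        m0 = statistics.fmean(row[j] for row, lab in zip(X, y) if lab == 0)
        m1 = statistics.fmean(row[j] for row, lab in zip(X, y) if lab == 1)
        scores.append((abs(m0 - m1), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def fit_centroids(X, y, features):
    """Nearest-centroid model: one mean vector per class on the chosen features."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.fmean(r[j] for r in rows) for j in features]
    return centroids


def predict(centroids, row, features):
    def sq_dist(center):
        return sum((row[j] - c) ** 2 for j, c in zip(features, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def split(n, n_folds, fold):
    """Interleaved folds: sample i belongs to fold i % n_folds."""
    test = [i for i in range(n) if i % n_folds == fold]
    train = [i for i in range(n) if i % n_folds != fold]
    return train, test


def cv_accuracy(X, y, k, n_folds=5, in_loop=True):
    """Cross-validated accuracy with in-loop or out-of-loop feature selection."""
    n = len(X)
    # Out-of-loop: features are chosen once on ALL samples, test folds included.
    global_features = None if in_loop else select_features(X, y, k)
    correct = 0
    for fold in range(n_folds):
        train, test = split(n, n_folds, fold)
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        # In-loop: features are re-selected from the training part of each fold.
        feats = select_features(X_tr, y_tr, k) if in_loop else global_features
        model = fit_centroids(X_tr, y_tr, feats)
        correct += sum(predict(model, X[i], feats) == y[i] for i in test)
    return correct / n


def nested_cv_accuracy(X, y, candidate_ks, n_folds=5):
    """Outer loop estimates performance; inner loop tunes k on training data only."""
    n = len(X)
    correct = 0
    for fold in range(n_folds):
        train, test = split(n, n_folds, fold)
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        best_k = max(candidate_ks, key=lambda k: cv_accuracy(X_tr, y_tr, k, n_folds=4))
        feats = select_features(X_tr, y_tr, best_k)
        model = fit_centroids(X_tr, y_tr, feats)
        correct += sum(predict(model, X[i], feats) == y[i] for i in test)
    return correct / n


X, y = make_noise_data()
biased = cv_accuracy(X, y, k=10, in_loop=False)    # out-of-loop selection
unbiased = cv_accuracy(X, y, k=10, in_loop=True)   # in-loop selection
nested = nested_cv_accuracy(X, y, candidate_ks=(5, 10, 20))
print(f"out-of-loop: {biased:.2f}  in-loop: {unbiased:.2f}  nested: {nested:.2f}")
```

On data like this, the out-of-loop estimate tends to look far better than chance even though there is nothing to learn, while the in-loop and nested estimates should stay near 50%. The nested variant additionally tunes the number of selected features inside each outer training set, so the tuning step never sees the samples used for the final performance estimate.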

Year: 2005 PMID: 16113770

Source DB: PubMed Journal: Methods Inf Med ISSN: 0026-1270 Impact factor: 2.176

Cited by

7 in total

1. imDEV: a graphical user interface to R multivariate analysis tools in Microsoft Excel.

Authors: Dmitry Grapov; John W Newman
Journal: Bioinformatics Date: 2012-07-18 Impact factor: 6.937

2. Understanding health and disease with multidimensional single-cell methods. (Review)

Authors: Julián Candia; Jayanth R Banavar; Wolfgang Losert
Journal: J Phys Condens Matter Date: 2014-01-22 Impact factor: 2.333

3. Inferring cellular networks--a review. (Review)

Authors: Florian Markowetz; Rainer Spang
Journal: BMC Bioinformatics Date: 2007-09-27 Impact factor: 3.169

4. NClassG+: A classifier for non-classically secreted Gram-positive bacterial proteins.

Authors: Daniel Restrepo-Montoya; Camilo Pino; Luis F Nino; Manuel E Patarroyo; Manuel A Patarroyo
Journal: BMC Bioinformatics Date: 2011-01-14 Impact factor: 3.169

5. Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data.

Authors: Natalia Becker; Grischa Toedt; Peter Lichter; Axel Benner
Journal: BMC Bioinformatics Date: 2011-05-09 Impact factor: 3.169

6. Specificity prediction of adenylation domains in nonribosomal peptide synthetases (NRPS) using transductive support vector machines (TSVMs).

Authors: Christian Rausch; Tilmann Weber; Oliver Kohlbacher; Wolfgang Wohlleben; Daniel H Huson
Journal: Nucleic Acids Res Date: 2005-10-12 Impact factor: 16.971

7. Prediction of hot spot residues at protein-protein interfaces by combining machine learning and energy-based methods.

Authors: Stefano Lise; Cedric Archambeau; Massimiliano Pontil; David T Jones
Journal: BMC Bioinformatics Date: 2009-10-30 Impact factor: 3.169
