Molecular diagnosis. Classification, model selection and performance evaluation.

F Markowetz, R Spang.

Abstract

OBJECTIVES: We discuss supervised classification techniques applied to medical diagnosis based on gene expression profiles. Our focus lies on strategies of adaptive model selection to avoid overfitting in high-dimensional spaces.
METHODS: We introduce likelihood-based methods, classification trees, support vector machines and regularized binary regression. For regularization by dimension reduction, we describe feature selection methods: feature filtering, feature shrinkage and wrapper approaches. In small sample-size situations efficient methods of data re-use are needed to assess the predictive power of a model. We discuss two issues in using cross-validation: the difference between in-loop and out-of-loop feature selection, and estimating model parameters in nested-loop cross-validation.
RESULTS: Gene selection does not reduce the dimensionality of the model. Tuning parameters enable adaptive model selection. The feature selection bias is a common pitfall in performance evaluation. Model selection and performance evaluation can be combined by nested-loop cross-validation.
CONCLUSIONS: Classification of microarrays is prone to overfitting. A rigorous and unbiased assessment of the predictive power of the model is a must.
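
The difference between in-loop and out-of-loop feature selection can be illustrated with a minimal sketch. This is not the authors' code: the nearest-centroid classifier, the difference-of-means gene filter, the fold assignment, and all function names are assumptions chosen for illustration. The data are pure noise, so any accuracy well above 0.5 reflects selection bias rather than predictive power.

```python
import random
import statistics


def make_noise_data(n_samples, n_genes, rng):
    """Expression matrix of pure noise; labels alternate 0/1 and carry no signal."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def top_genes(X, y, idx, k):
    """Filter step: the k genes with the largest class-mean difference on samples idx."""
    def score(g):
        a = statistics.fmean(X[i][g] for i in idx if y[i] == 0)
        b = statistics.fmean(X[i][g] for i in idx if y[i] == 1)
        return abs(a - b)
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


def centroid_predict(X, y, train, genes, x):
    """Nearest-centroid prediction for sample x using only the selected genes."""
    dist = []
    for label in (0, 1):
        rows = [X[i] for i in train if y[i] == label]
        dist.append(sum((x[g] - statistics.fmean(r[g] for r in rows)) ** 2
                        for g in genes))
    return 0 if dist[0] <= dist[1] else 1


def cv_accuracy(X, y, k, in_loop, n_folds=5):
    """Cross-validated accuracy with gene selection inside or outside the loop."""
    idx = list(range(len(X)))
    # Out-of-loop: genes are chosen once, using the test samples' labels too.
    fixed = None if in_loop else top_genes(X, y, idx, k)
    correct = 0
    for f in range(n_folds):
        test = [i for i in idx if i % n_folds == f]
        train = [i for i in idx if i % n_folds != f]
        # In-loop: genes are re-selected from the training fold only.
        genes = top_genes(X, y, train, k) if in_loop else fixed
        correct += sum(centroid_predict(X, y, train, genes, X[i]) == y[i]
                       for i in test)
    return correct / len(X)


if __name__ == "__main__":
    X, y = make_noise_data(40, 1000, random.Random(0))
    print("out-of-loop:", cv_accuracy(X, y, 10, in_loop=False))
    print("in-loop:    ", cv_accuracy(X, y, 10, in_loop=True))
```

Because the labels are unrelated to the data, an unbiased estimate should stay near chance level. The out-of-loop variant typically reports a much higher accuracy, since the genes were picked with knowledge of the held-out samples' labels. Nested-loop cross-validation extends the same idea by also choosing tuning parameters (here, for example, `k`) inside an inner loop run on each training fold.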

Year:  2005        PMID: 16113770

Source DB:  PubMed          Journal:  Methods Inf Med        ISSN: 0026-1270            Impact factor:   2.176


