Evaluation of the effect of chance correlations on variable selection using Partial Least Squares-Discriminant Analysis.

Julia Kuligowski, David Pérez-Guaita, Javier Escobar, Miguel de la Guardia, Máximo Vento, Alberto Ferrer, Guillermo Quintás.

Abstract

Variable subset selection is often mandatory in high-throughput metabolomics and proteomics. However, depending on the variable-to-sample ratio, variable selection is highly susceptible to chance correlations. Estimating the predictive capability of PLSDA models by cross-validation after feature selection gives overly optimistic results if the selection is performed on the entire data set and no external validation set is available. In this work, a simulation of the statistical null hypothesis is proposed to test whether the discrimination capability of a PLSDA model after variable selection, estimated by cross-validation, is statistically higher than that attributable to chance correlations in the original data set. The statistical significance of the PLSDA CV figures of merit obtained after variable selection is expressed by means of p-values calculated using a permutation test that includes the variable selection step. The reliability of the approach is evaluated using two variable selection methods on experimental and simulated data sets with and without induced class differences. The proposed approach is a useful tool when no external validation set is available, and it provides a straightforward way to evaluate differences between variable selection methods.
© 2013 Elsevier B.V. All rights reserved.
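
The permutation scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the t-like ranking used for variable selection, the one-component PLS1 classifier, the 5-fold cross-validation, and every function and parameter name (`select_top_k`, `pls1_da_predict`, `permutation_p_value`, `k`, `n_perm`) are assumptions made for the example. The point it shows is that the variable selection step is repeated inside each permutation, so the null distribution carries the same selection bias as the observed figure of merit.

```python
import numpy as np

def select_top_k(X, y, k):
    """Rank variables by a t-like score and return the indices of the k best.
    (Assumed selection method; the abstract does not name the two it used.)"""
    a, b = X[y == 1], X[y == 0]
    score = np.abs(a.mean(0) - b.mean(0)) / (np.sqrt(a.var(0) + b.var(0)) + 1e-12)
    return np.argsort(score)[-k:]

def pls1_da_predict(X_train, y_train, X_test):
    """One-component PLS1 regression on 0/1 labels, thresholded at 0.5."""
    x_mean, y_mean = X_train.mean(0), y_train.mean()
    Xc, yc = X_train - x_mean, y_train - y_mean
    w = Xc.T @ yc                      # weight vector: covariance with y
    w = w / (np.linalg.norm(w) + 1e-12)
    t = Xc @ w                         # scores of the training samples
    q = (yc @ t) / (t @ t + 1e-12)     # regression of y on the scores
    y_hat = y_mean + ((X_test - x_mean) @ w) * q
    return (y_hat > 0.5).astype(int)

def cv_accuracy(X, y, n_folds, rng):
    """k-fold cross-validated classification accuracy."""
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    correct = 0
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        pred = pls1_da_predict(X[train], y[train], X[test_idx])
        correct += np.sum(pred == y[test_idx])
    return correct / len(y)

def selection_then_cv(X, y, k, n_folds, rng):
    """Select variables on the ENTIRE set, then cross-validate (the biased
    workflow whose significance the permutation test assesses)."""
    cols = select_top_k(X, y, k)
    return cv_accuracy(X[:, cols], y, n_folds, rng)

def permutation_p_value(X, y, k=10, n_folds=5, n_perm=200, seed=0):
    """p-value of the CV accuracy; selection is redone for every permutation."""
    rng = np.random.default_rng(seed)
    observed = selection_then_cv(X, y, k, n_folds, rng)
    null = np.array([
        selection_then_cv(X, rng.permutation(y), k, n_folds, rng)
        for _ in range(n_perm)
    ])
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p

if __name__ == "__main__":
    # Pure-noise data with many more variables than samples (no class difference).
    rng = np.random.default_rng(42)
    X = rng.normal(size=(30, 500))
    y = np.array([0] * 15 + [1] * 15)
    observed, null, p = permutation_p_value(X, y)
    print(f"CV accuracy after selection: {observed:.2f}")
    print(f"mean accuracy under permuted labels: {null.mean():.2f}")
    print(f"permutation p-value: {p:.3f}")
```

On data like the noise set in the demo, comparing the observed accuracy with the mean of the permuted-label accuracies gives a sense of how much of the apparent discrimination could come from chance correlations alone; the p-value formalizes that comparison.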

Keywords:  Chance correlations; Metabolomics; Partial Least Squares-Discriminant Analysis (PLSDA); Variable selection

Year:  2013        PMID: 24148482     DOI: 10.1016/j.talanta.2013.07.048

Source DB:  PubMed          Journal:  Talanta        ISSN: 0039-9140            Impact factor:   6.057


  3 in total

1.  Transcriptome profiles discriminate between Gram-positive and Gram-negative sepsis in preterm neonates.

Authors:  María Cernada; Alejandro Pinilla-González; Julia Kuligowski; José Manuel Morales; Sheila Lorente-Pozo; José David Piñeiro-Ramos; Anna Parra-Llorca; Inmaculada Lara-Cantón; Máximo Vento; Eva Serna
Journal:  Pediatr Res       Date:  2021-03-25       Impact factor: 3.756

2.  An ensemble variable selection method for vibrational spectroscopic data analysis.

Authors:  Jixiong Zhang; Hong Yan; Yanmei Xiong; Qianqian Li; Shungeng Min
Journal:  RSC Adv       Date:  2019-02-26       Impact factor: 3.361

3.  Serum Metabolomics Analysis of Asthma in Different Inflammatory Phenotypes: A Cross-Sectional Study in Northeast China.

Authors:  Zhiqiang Pang; Guoqiang Wang; Cuizhu Wang; Weijie Zhang; Jinping Liu; Fang Wang
Journal:  Biomed Res Int       Date:  2018-09-23       Impact factor: 3.411
