
A comparative investigation of modern feature selection and classification approaches for the analysis of mass spectrometry data.

Piotr S Gromski, Yun Xu, Elon Correa, David I Ellis, Michael L Turner, Royston Goodacre.

Abstract

Many analytical approaches, such as mass spectrometry, generate large amounts of data (input variables) per sample analysed, and not all of these variables are important or related to the target output of interest. Selecting a smaller number of variables prior to sample classification is a widespread task in many research studies, which attempt to find the smallest possible set of variables that still achieves a high level of prediction accuracy; in other words, there is a need to generate the most parsimonious solution when the number of input variables is huge but the number of samples/objects is smaller. Here, we compare several variable selection approaches in order to ascertain which of them are best suited to achieving this goal. All variable selection approaches were applied to a common set of metabolomics data generated by Curie-point pyrolysis mass spectrometry (Py-MS), where the goal of the study was to classify Gram-positive bacteria of the genus Bacillus. The approaches include stepwise forward variable selection, used with linear discriminant analysis (LDA); the variable importance for projection (VIP) coefficient, employed in partial least squares-discriminant analysis (PLS-DA); support vector machines-recursive feature elimination (SVM-RFE); and the mean decrease in accuracy and mean decrease in Gini, provided by random forests (RF). Finally, a double cross-validation procedure was applied to minimize the consequences of overfitting. The results revealed that RF with its variable selection techniques, and SVM combined with SVM-RFE as the variable selection method, gave the best results in comparison with the other approaches.
Copyright © 2014. Published by Elsevier B.V.
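
The double cross-validation idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the k-fold splits (the keywords suggest the study used bootstrapping), the mean-difference filter standing in for the selection methods named above, the nearest-centroid classifier standing in for LDA, PLS-DA, SVM and RF, and the hypothetical helper names `make_data`, `rank_variables`, `centroid_accuracy` and `double_cv`. The point it shows is structural: the number of variables is tuned only on inner folds, and each outer test fold is used once, after selection and tuning are finished.

```python
import random
from statistics import mean


def make_data(n=40, n_vars=20, seed=1):
    """Synthetic two-class data (assumption): only the first two variables are informative."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(3.0 * y[i] if j < 2 else 0.0, 1.0) for j in range(n_vars)]
         for i in range(n)]
    return X, y


def k_folds(indices, k, rng):
    """Shuffle the indices and deal them into k folds of nearly equal size."""
    idx = list(indices)
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def rank_variables(X, y, rows):
    """Rank variables by absolute difference of class means (a simple stand-in filter)."""
    scores = []
    for j in range(len(X[0])):
        a = [X[i][j] for i in rows if y[i] == 0]
        b = [X[i][j] for i in rows if y[i] == 1]
        scores.append(abs(mean(a) - mean(b)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])


def centroid_accuracy(X, y, train, test, variables):
    """Accuracy of a nearest-centroid classifier restricted to the given variables."""
    cents = {}
    for c in (0, 1):
        rows = [i for i in train if y[i] == c]
        cents[c] = [mean(X[i][j] for i in rows) for j in variables]
    hits = 0
    for i in test:
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(variables, cents[c]))
                for c in cents}
        hits += min(dist, key=dist.get) == y[i]
    return hits / len(test)


def double_cv(X, y, sizes=(1, 2, 5), outer_k=5, inner_k=4, seed=0):
    """Return one (chosen number of variables, outer test accuracy) pair per outer fold."""
    rng = random.Random(seed)
    outer = k_folds(range(len(X)), outer_k, rng)
    results = []
    for o, test in enumerate(outer):
        train = [i for f in outer[:o] + outer[o + 1:] for i in f]
        inner = k_folds(train, inner_k, rng)

        def inner_score(m):
            # Selection is repeated inside every inner fold, so the
            # validation samples never influence which variables are kept.
            accs = []
            for v, val in enumerate(inner):
                fit = [i for f in inner[:v] + inner[v + 1:] for i in f]
                top = rank_variables(X, y, fit)[:m]
                accs.append(centroid_accuracy(X, y, fit, val, top))
            return mean(accs)

        best_m = max(sizes, key=inner_score)
        # Refit on the whole outer training set, then score once on the outer test fold.
        selected = rank_variables(X, y, train)[:best_m]
        results.append((best_m, centroid_accuracy(X, y, train, test, selected)))
    return results


if __name__ == "__main__":
    X, y = make_data()
    for fold, (m, acc) in enumerate(double_cv(X, y), start=1):
        print(f"outer fold {fold}: {m} variables, accuracy {acc:.2f}")
```

Ranking the variables on the full data set before splitting would let the test samples influence the selection and inflate the reported accuracy, which is the overfitting effect a double cross-validation is meant to limit.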

Keywords:  Bacillus; Bootstrapping; Double cross-validation; Pyrolysis mass spectrometry; Supervised learning; Variable selection

Year:  2014        PMID: 24856395     DOI: 10.1016/j.aca.2014.03.039

Source DB:  PubMed          Journal:  Anal Chim Acta        ISSN: 0003-2670            Impact factor:   6.558


  18 in total

1.  Changes of the plasma metabolome of newly born piglets subjected to postnatal hypoxia and resuscitation with air.

Authors:  Rønnaug Solberg; Julia Kuligowski; Leonid Pankratov; Javier Escobar; Guillermo Quintás; Isabel Lliso; Ángel Sánchez-Illana; Ola Didrik Saugstad; Máximo Vento
Journal:  Pediatr Res       Date:  2016-04-07       Impact factor: 3.756

2.  Metabolic fingerprinting analysis of oil palm reveals a set of differentially expressed metabolites in fatal yellowing symptomatic and non-symptomatic plants.

Authors:  Jorge Candido Rodrigues-Neto; Mauro Vicentini Correia; Augusto Lopes Souto; José Antônio de Aquino Ribeiro; Letícia Rios Vieira; Manoel Teixeira Souza; Clenilson Martins Rodrigues; Patrícia Verardi Abdelnur
Journal:  Metabolomics       Date:  2018-10-11       Impact factor: 4.290

3.  Personalized prediction model for seizure-free epilepsy with levetiracetam therapy: a retrospective data analysis using support vector machine.

Authors:  Jia-Hui Zhang; Xiong Han; Hong-Wei Zhao; Di Zhao; Na Wang; Ting Zhao; Gui-Nv He; Xue-Rui Zhu; Ying Zhang; Jiu-Yan Han; Dian-Ling Huang
Journal:  Br J Clin Pharmacol       Date:  2018-09-03       Impact factor: 4.335

4.  An integrative prediction algorithm of drug-refractory epilepsy based on combined clinical-EEG functional connectivity features.

Authors:  Xiong Han; Bin Wang; Shijun Yang; Pan Zhao; Mingmin Li; Zongya Zhao; Na Wang; Huan Ma; Yue Zhang; Ting Zhao; Yanan Chen; Zhe Ren; Yang Hong; Qi Wang
Journal:  J Neurol       Date:  2021-07-25       Impact factor: 4.849

5.  Evaluation of disease staging and chemotherapeutic response in non-small cell lung cancer from patient tumor-derived metabolomic data.

Authors:  Hunter A Miller; Xinmin Yin; Susan A Smith; Xiaoling Hu; Xiang Zhang; Jun Yan; Donald M Miller; Victor H van Berkel; Hermann B Frieboes
Journal:  Lung Cancer       Date:  2021-04-15       Impact factor: 6.081

6.  Influence of missing values substitutes on multivariate analysis of metabolomics data.

Authors:  Piotr S Gromski; Yun Xu; Helen L Kotze; Elon Correa; David I Ellis; Emily Grace Armitage; Michael L Turner; Royston Goodacre
Journal:  Metabolites       Date:  2014-06-16

7.  Using Resistin, glucose, age and BMI to predict the presence of breast cancer.

Authors:  Miguel Patrício; José Pereira; Joana Crisóstomo; Paulo Matafome; Manuel Gomes; Raquel Seiça; Francisco Caramelo
Journal:  BMC Cancer       Date:  2018-01-04       Impact factor: 4.430

8.  Evaluation of Classifier Performance for Multiclass Phenotype Discrimination in Untargeted Metabolomics.

Authors:  Patrick J Trainor; Andrew P DeFilippis; Shesh N Rai
Journal:  Metabolites       Date:  2017-06-21

9.  Feature Selection Methods for Early Predictive Biomarker Discovery Using Untargeted Metabolomic Data.

Authors:  Dhouha Grissa; Mélanie Pétéra; Marion Brandolini; Amedeo Napoli; Blandine Comte; Estelle Pujos-Guillot
Journal:  Front Mol Biosci       Date:  2016-07-08

10.  Variable selection for binary classification using error rate p-values applied to metabolomics data.

Authors:  Mari van Reenen; Carolus J Reinecke; Johan A Westerhuis; J Hendrik Venter
Journal:  BMC Bioinformatics       Date:  2016-01-14       Impact factor: 3.169

