
Variable selection using iterative reformulation of training set models for discrimination of samples: application to gas chromatography/mass spectrometry of mouse urinary metabolites.

Kanet Wongravee, Nina Heinrich, Maria Holmboe, Michele L Schaefer, Randall R Reed, Jose Trevejo, Richard G Brereton.

Abstract

The paper discusses variable selection as used in large metabolomic studies, exemplified by mouse urinary gas chromatography of 441 mice in three experiments to detect the influence of age, diet, and stress on their chemosignal. Partial least squares discriminant analysis (PLS-DA) was applied to obtain class models, using a procedure of 20,000 iterations that includes the bootstrap for model optimization and random splits into test and training sets for validation. Variables are selected from PLS regression coefficients computed on the training set, with an optimized number of components obtained from the bootstrap. The variables are ranked in order of significance, and the overall optimal variables are those that appear as highly significant over 100 different test and training set splits. Cost/benefit analysis of performing the model on a reduced number of variables is also illustrated. This paper provides a strategy for properly validated determination of which variables are most significant for discriminating between two groups in large metabolomic data sets. It avoids the common pitfall of overfitting that arises when variables are selected on a combined training and test set, and it takes into account that different variables may be selected each time the samples are split into training and test sets in an iterative procedure.
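
The selection idea in the abstract (rank variables by PLS regression coefficients on each training set, then keep those that rank highly across many splits) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a one-component PLS1 model in place of the bootstrap-optimized number of components, variables on comparable scales, and synthetic data; the function names, `top_k`, and `min_fraction` are hypothetical choices made for illustration.

```python
import random


def pls1_coefficients(X, y):
    """Regression coefficients of a one-component PLS1 model on mean-centred data."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector w is X'y scaled to unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0.0:
        return [0.0] * p
    w = [v / norm for v in w]
    # Scores t = Xw, followed by the inner regression of y on t.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t)
    return [wj * q for wj in w]


def select_stable_variables(X, y, n_splits=100, top_k=2, min_fraction=0.8,
                            train_fraction=2 / 3, seed=0):
    """Count how often each variable ranks in the top_k over random training sets."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    n_train = round(train_fraction * n)
    counts = [0] * p
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        train = idx[:n_train]  # the held-out samples are never used for selection
        coefs = pls1_coefficients([X[i] for i in train], [y[i] for i in train])
        ranked = sorted(range(p), key=lambda j: abs(coefs[j]), reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1
    selected = [j for j in range(p) if counts[j] >= min_fraction * n_splits]
    return selected, counts


def make_demo_data(n_per_class=30, n_vars=10, shift=2.0, seed=1):
    """Synthetic two-class data: only variables 0 and 1 differ between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (1.0, -1.0):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]
            row[0] += shift * label
            row[1] += shift * label
            X.append(row)
            y.append(label)
    return X, y


X, y = make_demo_data()
selected, counts = select_stable_variables(X, y)
print("selected variables:", selected)
print("selection counts:", counts)
```

Because ranking uses only the training samples of each split, the held-out samples remain available for an unbiased check of a model built on the reduced variable set, which is the overfitting safeguard the abstract emphasizes.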

Year:  2009        PMID: 19507882      PMCID: PMC2910586          DOI: 10.1021/ac900251c

Source DB:  PubMed          Journal:  Anal Chem        ISSN: 0003-2700            Impact factor:   6.986


