Evaluation of multiple variate selection methods from a biological perspective: a nutrigenomics case study.

Henri S Tapp, Marijana Radonjic, E Kate Kemsley, Uwe Thissen.

Abstract

Genomics-based technologies produce large amounts of data. To interpret the results and identify the most important variates related to phenotypes of interest, various multivariate regression and variate selection methods are used. Although inspected for statistical performance, the relevance of multivariate models in interpreting biological data sets often remains elusive. We compare various multivariate regression and variate selection methods applied to a nutrigenomics data set in terms of performance, utility and biological interpretability. The studied data set comprised hepatic transcriptome (10,072 predictor variates) and plasma protein concentrations [2 dependent variates: Leptin (LEP) and Tissue inhibitor of metalloproteinase 1 (TIMP-1)] collected during a high-fat diet study in ApoE3Leiden mice. The multivariate regression methods used were: partial least squares, "PLS"; a genetic algorithm-based multiple linear regression, "GA-MLR"; two least-angle shrinkage methods, "LASSO" and "ELASTIC NET"; and a variant of PLS that uses covariance-based variate selection, "CovProc". Two methods of ranking the genes for Gene Set Enrichment Analysis (GSEA) were also investigated: either by their correlation with the protein data or by the stability of the PLS regression coefficients. The regression methods performed similarly, with CovProc and GA-MLR performing the best and worst, respectively (R-squared values based on "double cross-validation" predictions of 0.762 and 0.451 for LEP; and 0.701 and 0.482 for TIMP-1). CovProc, LASSO and ELASTIC NET all produced parsimonious regression models and consistently identified small subsets of variates, with high commonality between the methods. Comparison of the gene ranking approaches found a high degree of agreement, with PLS-based ranking finding fewer significant gene sets. We recommend the use of CovProc for variate selection, in tandem with univariate methods, and the use of correlation-based ranking for GSEA-like pathway analysis methods.
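The correlation-based ranking mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes Pearson correlation as the ranking statistic (the abstract says only "correlation"), and the gene names, sample values, and the helper names `pearson` and `rank_genes_by_correlation` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variate carries no usable correlation
    return sxy / math.sqrt(sxx * syy)


def rank_genes_by_correlation(expression, protein):
    """Return (gene, r) pairs sorted from most positive to most negative r."""
    scores = [(gene, pearson(values, protein)) for gene, values in expression.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 5 samples, 3 made-up genes, one protein readout.
protein = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = {
    "gene_a": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with the protein
    "gene_b": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as the protein rises
    "gene_c": [1.0, 3.0, 2.0, 3.0, 1.0],   # no linear relationship
}
ranked = rank_genes_by_correlation(expression, protein)
for gene, r in ranked:
    print(f"{gene}\t{r:+.3f}")
```

A ranked list of this kind, ordered from strongly positive to strongly negative, is the sort of input a GSEA-like method consumes. With roughly 10,000 variates, a vectorised implementation would be the more practical choice.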

Year:  2012        PMID: 22382778      PMCID: PMC3380194          DOI: 10.1007/s12263-012-0288-4

Source DB:  PubMed          Journal:  Genes Nutr        ISSN: 1555-8932            Impact factor:   5.523

