Marion Brandolini-Bunlon (1), Mélanie Pétéra (2), Pierrette Gaudreau (3, 4), Blandine Comte (5), Stéphanie Bougeard (6), Estelle Pujos-Guillot (2, 5)
1. Université Clermont Auvergne, INRA, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, 63000, Clermont-Ferrand, France. marion.brandolini-bunlon@inra.fr
2. Université Clermont Auvergne, INRA, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, 63000, Clermont-Ferrand, France.
3. Centre de Recherche du Centre hospitalier de l'Université de Montréal, Montréal, Canada.
4. Département de médecine, Université de Montréal, Montréal, Canada.
5. Université Clermont Auvergne, INRA, UNH, 63000, Clermont-Ferrand, France.
6. Anses, BP53, Technopole Saint Brieuc Armor, 22440, Ploufragan, France.
Abstract
INTRODUCTION: Metabolomics is a powerful phenotyping tool in nutrition and health research, generating complex data that require dedicated processing to enrich knowledge of biological systems. In particular, to investigate relations between environmental factors, phenotypes and metabolism, discriminant statistical analyses are generally performed separately on metabolomic datasets and then complemented by associations with metadata. Another relevant strategy is to analyse thematic data blocks simultaneously by multi-block partial least squares discriminant analysis (MBPLSDA), which determines the importance of variables and blocks in discriminating groups of subjects while taking the data structure into account.
OBJECTIVE: The objective was to develop a full, open-source, standalone tool covering all steps of MBPLSDA for the joint analysis of metabolomic and epidemiological data.
METHODS: The tool is based on the mbpls function of the ade4 R package, enriched with additional functionalities, including some dedicated to discriminant analysis. The indicators provided help to determine the optimal number of components, to check the validity of the MBPLSDA model, and to evaluate the variability of its parameters and predictions.
RESULTS: To illustrate the potential of this tool, MBPLSDA was applied to a real case study involving metabolomic, nutritional and clinical data from a human cohort. Having the different functionalities available in a single R package made it possible to optimize parameters for an efficient joint analysis of metabolomic and epidemiological data and to obtain new insights into multidimensional phenotypes.
CONCLUSION: In particular, we highlighted the impact of filtering the metabolomic variables beforehand, and the relevance of an MBPLSDA approach compared with a standard PLS discriminant analysis method.
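The multi-block idea summarised in the abstract can be illustrated with a small sketch. It is not the ade4 mbpls function, nor the tool described here: it is a simplified, pure-Python, one-component PLS-DA on concatenated blocks, written under several assumptions. Each block is scaled to a total variance of 1 so that large blocks do not dominate, the class labels are dummy-coded, and block importance is taken as the share of squared weights falling in each block. All function names (scale_block, first_component, classify_by_nearest_mean, demo) and the synthetic data are hypothetical, and the nearest-mean classification rule is an illustrative choice.

```python
import math
import random


def scale_block(block):
    """Standardize each column, then divide by sqrt(n_columns) so that the
    block's total variance is 1 (constant columns simply become zeros)."""
    n, p = len(block), len(block[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in block]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        for i in range(n):
            out[i][j] = (col[i] - mean) / (sd * math.sqrt(p))
    return out


def first_component(blocks, labels):
    """Return (weights, scores, block_importance) for one PLS-DA component."""
    n, classes = len(labels), sorted(set(labels))
    scaled = [scale_block(b) for b in blocks]
    # Concatenate the scaled blocks row by row into one predictor matrix X.
    X = [[v for b in scaled for v in b[i]] for i in range(n)]
    # Centered dummy (one-hot) coding of class membership, Y.
    freq = [labels.count(c) / n for c in classes]
    Y = [[(lab == c) - f for c, f in zip(classes, freq)] for lab in labels]
    p, g = len(X[0]), len(classes)
    # M = X'Y; its dominant left singular vector is the PLS weight vector.
    M = [[sum(X[i][j] * Y[i][k] for i in range(n)) for k in range(g)]
         for j in range(p)]
    w = [row[0] for row in M]  # starting vector for the power iteration
    for _ in range(100):
        u = [sum(M[j][k] * w[j] for j in range(p)) for k in range(g)]
        w = [sum(M[j][k] * u[k] for k in range(g)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            raise ValueError("predictors carry no covariance with the classes")
        w = [v / norm for v in w]
    scores = [sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Block importance: share of the squared weights that falls in each block.
    importance, start = [], 0
    for b in scaled:
        stop = start + len(b[0])
        importance.append(sum(v * v for v in w[start:stop]))
        start = stop
    return w, scores, importance


def classify_by_nearest_mean(scores, labels):
    """Assign each sample to the class whose mean score is closest."""
    means = {c: sum(s for s, lab in zip(scores, labels) if lab == c)
             / labels.count(c) for c in set(labels)}
    return [min(means, key=lambda c: abs(s - means[c])) for s in scores]


def demo(seed=0):
    """Synthetic example: block A separates the groups, block B is noise."""
    rng = random.Random(seed)
    labels = ["case"] * 20 + ["control"] * 20
    shift = {"case": 3.0, "control": 0.0}
    block_a = [[shift[lab] + rng.gauss(0, 1) for _ in range(3)] for lab in labels]
    block_b = [[rng.gauss(0, 1) for _ in range(5)] for _ in labels]
    _, scores, importance = first_component([block_a, block_b], labels)
    predicted = classify_by_nearest_mean(scores, labels)
    accuracy = sum(p == lab for p, lab in zip(predicted, labels)) / len(labels)
    return importance, accuracy


if __name__ == "__main__":
    importance, accuracy = demo()
    print("block importance:", importance)
    print("training accuracy:", accuracy)
```

In this toy setting the discriminating block should receive most of the importance. The accuracy is computed on the training samples and is therefore optimistic; the tool described in the abstract provides indicators for choosing the number of components and checking model validity, none of which are reproduced in this sketch.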