Lluís Revilla1,2, Aida Mayorgas2, Ana M Corraliza2, Maria C Masamunt2, Amira Metwaly3, Dirk Haller3,4, Eva Tristán1,5, Anna Carrasco1,5, Maria Esteve1,5, Julian Panés1,2, Elena Ricart1,2, Juan J Lozano1, Azucena Salas2. 1. Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Barcelona, Spain. 2. Department of Gastroenterology, IDIBAPS, Hospital Clínic, Barcelona, Spain. 3. Chair of Nutrition and Immunology, Technical University of Munich, Freising-Weihenstephan, Germany. 4. ZIEL Institute for Food and Health, Technical University of Munich, Freising-Weihenstephan, Germany. 5. Department of Gastroenterology, Hospital Universitari Mútua Terrassa, Barcelona, Spain.
Abstract
BACKGROUND: Personalized medicine requires finding relationships between variables that influence a patient's phenotype and predicting an outcome. Sparse generalized canonical correlation analysis identifies relationships between different groups of variables. This method requires establishing a model of the expected interaction between those variables. Describing these interactions is challenging when the relationship is unknown or when there is no pre-established hypothesis. Thus, our aim was to develop a method to find the relationships between microbiome and host transcriptome data and the relevant clinical variables in a complex disease, such as Crohn's disease.

RESULTS: We present here a method to identify interactions based on canonical correlation analysis. Using a dataset of Crohn's disease patients with longitudinal sampling, we show that the model is the most important factor for identifying relationships between blocks. First, the analysis was tested on two previously published datasets, a glioma dataset and a Crohn's disease and ulcerative colitis dataset, for which we describe how to select the optimum parameters. Using these parameters, we analyzed our Crohn's disease dataset. We selected the model with the highest inner average variance explained to identify relationships between transcriptome, gut microbiome and clinically relevant variables. Adding the clinically relevant variables improved the average variance explained by the model compared to multiple co-inertia analysis.

CONCLUSIONS: The methodology described herein provides a general framework for identifying interactions between sets of omic data and clinically relevant variables. Following this method, we found genes and microorganisms that were related to each other independently of the model, while others were specific to the model used. Thus, model selection proved crucial to finding the existing relationships in multi-omics datasets.
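The selection criterion mentioned above (choosing the model with the highest inner average variance explained) can be illustrated with a minimal sketch. It assumes the commonly used definition of the inner AVE as the design-weighted mean of squared correlations between block components; the abstract does not state the formula, so this is an assumption. The function names `inner_ave` and `best_model` are hypothetical, and the block components are taken as given rather than obtained from an actual sparse generalized canonical correlation fit.

```python
import numpy as np


def inner_ave(components, design):
    """Design-weighted mean of squared correlations between block components.

    components: (n_samples, n_blocks) array, one component per block
                (assumed to come from a model fitted under `design`).
    design:     symmetric (n_blocks, n_blocks) matrix of connection
                weights in [0, 1]; 0 means the two blocks are not linked.
    """
    components = np.asarray(components, dtype=float)
    design = np.asarray(design, dtype=float)
    corr = np.corrcoef(components, rowvar=False)
    # Each pair of blocks is counted once (upper triangle, no diagonal).
    upper = np.triu_indices(design.shape[0], k=1)
    weights = design[upper]
    if weights.sum() == 0:
        raise ValueError("design has no connections between blocks")
    return float(np.sum(weights * corr[upper] ** 2) / weights.sum())


def best_model(candidates):
    """Return the name of the candidate with the highest inner AVE.

    candidates: dict mapping a model name to a (components, design) pair.
    """
    scores = {name: inner_ave(y, c) for name, (y, c) in candidates.items()}
    return max(scores, key=scores.get), scores


# Toy components for three blocks (illustrative values, not study data).
Y = np.array([
    [1.0, 2.0, 1.0],
    [2.0, 4.0, -1.0],
    [3.0, 6.0, -1.0],
    [4.0, 8.0, 1.0],
])
pair_only = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
fully_linked = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])

name, scores = best_model({
    "pair_only": (Y, pair_only),
    "fully_linked": (Y, fully_linked),
})
```

In a real analysis each candidate design would be fitted separately, so the components would differ between models; the toy example reuses one set of components only to keep the sketch short.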