
Link-HD: a versatile framework to explore and integrate heterogeneous microbial communities.

Laura M Zingaretti1,2, Gilles Renand3, Diego P Morgavi4, Yuliaxis Ramayo-Caldas3,5.   

Abstract

MOTIVATION: We present Link-HD, an approach to integrate multiple datasets. Link-HD is a generalization of 'Structuration des Tableaux A Trois Indices de la Statistique-Analyse Conjointe de Tableaux', a family of methods designed to integrate information from heterogeneous data. Here, we extend the classical approach to deal with broader datasets (e.g. compositional data), methods for variable selection and taxon-set enrichment analysis.
RESULTS: The methodology is demonstrated by integrating rumen microbial communities from cows for which methane yield (CH4y) was individually measured. Our approach reproduces the significant link between rumen microbiota structure and CH4 emission. When analyzing the TARA's ocean data, Link-HD replicates published results, highlighting the relevance of temperature with members of phyla Proteobacteria on the structure and functionality of this ecosystem.
AVAILABILITY AND IMPLEMENTATION: The source code, examples and a complete manual are freely available in GitHub https://github.com/lauzingaretti/LinkHD and in Bioconductor https://bioconductor.org/packages/release/bioc/html/LinkHD.html.
© The Author(s) 2019. Published by Oxford University Press.


Year:  2020        PMID: 31738392      PMCID: PMC7141858          DOI: 10.1093/bioinformatics/btz862

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

The falling cost of ‘omics’ technologies now makes it feasible to collect data from multiple sources, so researchers can study several datasets simultaneously and investigate their relationship with complex traits. Integrating these heterogeneous datasets is not trivial, and several statistical methods have been developed to address this challenge (Argelaguet et al., 2018; Mariette and Villa-Vialaneix, 2018; Meng et al., 2014). In particular, combining multiple microbial ecosystems poses unique challenges because these data are compositional and sparse. MixKernel (Mariette and Villa-Vialaneix, 2018) is a well-known tool designed to integrate heterogeneous datasets, including microbial communities, but it offers no method for taxonomic enrichment analysis. Another popular integrative approach is MOFA (Argelaguet et al., 2018); however, it is unable to deal with compositional data. Here, we present Link-HD, a tool to integrate and explore multiple microbial communities based on STATIS (Des Plantes, 1976), a family of multivariate methods to integrate multiple datasets. Link-HD extends STATIS with Regression Biplot (Ter Braak, 1997), clustering, differential abundance, taxonomic enrichment analysis and visualization tools. Link-HD analyzes distance tables computed from numerical, categorical or compositional data as a generalization of multidimensional scaling (Abdi et al.). Furthermore, Link-HD performs variable selection and can link the resulting common sub-space with phenotype information.

2 Materials and methods

Like STATIS, Link-HD aims to compare and analyze the relationships between datasets with a shared set of observations or variables. However, our package was specifically designed to integrate microbial communities and incorporate distances and transformations to deal with compositional data (Aitchison, 1982). The method is implemented in three main phases (Fig. 1).
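As a minimal illustration of the kind of transformation used for compositional data, the sketch below applies a centered log ratio (CLR) to one sample's counts and computes the Euclidean distance between CLR coordinates (the Aitchison distance). The function names and the pseudocount value are assumptions made for this example; they are not taken from the Link-HD package.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumption of this sketch) is added to every count
    so that zero counts do not break the logarithm.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(x, y, pseudocount=0.5):
    """Euclidean distance between the CLR coordinates of two samples."""
    cx, cy = clr(x, pseudocount), clr(y, pseudocount)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))

# Hypothetical counts for four taxa in two samples.
sample_a = [120, 30, 0, 50]
sample_b = [90, 45, 5, 60]
print(clr(sample_a))
print(aitchison_distance(sample_a, sample_b))
```

CLR coordinates sum to zero within a sample, and without a pseudocount the distance is unchanged when a sample is multiplied by a constant, which is why it suits data that carry only relative information.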
Fig. 1. Link-HD workflow. In the inter-structure step, raw data are transformed using cumulative sum scaling or centered log ratio, and the correlation coefficient (Rv) is computed. The second step is the compromise (W) and, finally, the intra-structure step involves the eigen-decomposition of W. Observations can be clustered, and methods for selecting variables and for association with phenotypes are available.
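The three steps of the workflow can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the package's implementation: cross-product matrices are built directly from column-centred tables (Link-HD itself works from distance tables), table weights are taken from the leading eigenvector of the Rv matrix, only the first axis of the compromise is extracted, and power iteration stands in for a full eigen-decomposition. All function names and the toy tables are hypothetical.

```python
import math

def cross_product(table):
    """Column-centre an observations-by-variables table and return X X^T."""
    n, p = len(table), len(table[0])
    means = [sum(row[j] for row in table) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in table]
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in centred] for ri in centred]

def inner(a, b):
    """Frobenius inner product; equals trace(A B) for symmetric matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def rv(a, b):
    """Rv coefficient between two cross-product matrices."""
    return inner(a, b) / math.sqrt(inner(a, a) * inner(b, b))

def leading_eigen(m, iters=500):
    """Leading eigenvalue and eigenvector of a symmetric PSD matrix (power iteration)."""
    # A non-constant start vector is needed: centred cross-products map the
    # all-ones vector to zero.
    v = [float(i + 1) for i in range(len(m))]
    for _ in range(iters):
        w = [sum(x * y for x, y in zip(row, v)) for row in m]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    value = sum(vi * sum(x * y for x, y in zip(row, v)) for vi, row in zip(v, m))
    return value, v

def statis(tables):
    """Return (Rv matrix, table weights, compromise, first-axis coordinates)."""
    # Inter-structure: scale each cross-product to unit norm and compare them.
    s = []
    for t in tables:
        c = cross_product(t)
        norm = math.sqrt(inner(c, c))
        s.append([[x / norm for x in row] for row in c])
    rv_matrix = [[rv(a, b) for b in s] for a in s]
    # Compromise: weights from the leading eigenvector of the Rv matrix,
    # rescaled to sum to one.
    _, vec = leading_eigen(rv_matrix)
    total = sum(vec)
    weights = [x / total for x in vec]
    n = len(s[0])
    compromise = [[sum(w * m[i][j] for w, m in zip(weights, s)) for j in range(n)]
                  for i in range(n)]
    # Intra-structure: eigen-decomposition of the compromise (first axis only).
    value, axis = leading_eigen(compromise)
    coords = [math.sqrt(value) * x for x in axis]
    return rv_matrix, weights, compromise, coords

# Hypothetical, already transformed tables sharing the same four observations.
bacteria = [[1.0, 2.0, 0.5], [2.0, 1.0, 0.7], [3.0, 4.0, 0.1], [4.0, 3.0, 0.9]]
archaea = [[0.9, 2.1], [2.2, 1.1], [2.8, 4.2], [4.1, 2.9]]
protozoa = [[5.0, 1.0], [1.0, 5.0], [4.0, 2.0], [2.0, 4.0]]

rv_matrix, weights, compromise, coords = statis([bacteria, archaea, protozoa])
print("Rv matrix:", [[round(x, 3) for x in row] for row in rv_matrix])
print("weights:", [round(w, 3) for w in weights])
print("first-axis coordinates:", [round(c, 3) for c in coords])
```

In this toy example the first two tables share most of their structure, so they receive larger weights than the third, which therefore contributes less to the compromise.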

Inter-structure step: The algorithm first assesses the similarity between transformed distance tables using the vector correlation coefficient (Rv) (Escoufier, 1973), which can be interpreted as a general ‘vector covariance’ between matrices. In other words, this step evaluates the similarity between the disparate datasets.
Compromise step: Next, the ‘compromise’ matrix is calculated as a weighted sum of the individual cross-product matrices. This step involves an optimization problem, since the weights are chosen to maximize the correlation between the compromise matrix and each individual component.
Intra-structure step: Finally, the compromise matrix is analyzed through a Principal Component Analysis. The coordinates of the common elements are projected into a low-rank space, where the relationships between them can be easily interpreted.
Variable selection is tackled by two alternative approaches: (i) by projecting all the input variables into the compromise through a general Biplot formulation (Ter Braak, 1997); and (ii) by computing the differential abundance of features between clusters of samples. A novelty of Link-HD is its ability to aggregate the selected variables at several taxonomic levels and to establish whether a given level is enriched using a cumulative hypergeometric distribution. This function also allows users to supply a custom list of OTUs. Finally, the SPIEC-EASI tool (Kurtz et al., 2015) can be used to visualize variable interactions.
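The enrichment test based on a cumulative hypergeometric distribution can be illustrated with a short sketch. The upper-tail convention P(X >= k), the function name and the example numbers are assumptions made here for illustration; the exact convention used inside Link-HD is not specified in this text.

```python
from math import comb

def enrichment_p_value(population, in_taxon, selected, selected_in_taxon):
    """Upper-tail hypergeometric probability P(X >= selected_in_taxon).

    population:        total number of OTUs considered
    in_taxon:          OTUs in the population that belong to the taxon
    selected:          number of OTUs retained by variable selection
    selected_in_taxon: retained OTUs that belong to the taxon
    """
    upper = min(in_taxon, selected)
    total = comb(population, selected)
    tail = sum(
        comb(in_taxon, k) * comb(population - in_taxon, selected - k)
        for k in range(selected_in_taxon, upper + 1)
    )
    return tail / total

# Hypothetical example: 200 OTUs, 20 in one family, 30 selected, 8 of them in the family.
p = enrichment_p_value(200, 20, 30, 8)
print(p)
```

A small probability indicates that the taxon is represented among the selected OTUs more often than expected by chance; when many taxa are tested, a multiple-testing correction would normally follow.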

3 Case studies

We illustrate our approach with rumen microbial (Ramayo-Caldas et al., 2019), TARA’s Ocean expedition (Sunagawa et al., 2015) and transcriptome NCI-60 cell line datasets (Reinhold et al., 2012). In the rumen study, we integrated Bacteria, Archaea and Protozoa from 65 Holstein cows. Link-HD was able to reproduce previous results (Danielsson et al., 2017; Kittelmann et al., 2014; Ramayo-Caldas et al., 2019), showing a link between the structure of the rumen microbiota and CH4 emission. We also identified microbial markers associated with CH4. In the TARA’s example, Link-HD replicates the relevant role of temperature and of the phylum Proteobacteria in the structure of this ecosystem, as described in Mariette and Villa-Vialaneix (2018). Finally, we show the potential of Link-HD to integrate other omics layers by using transcriptome data from the NCI-60 cell lines. Link-HD recapitulates the reported data structure (Meng et al., 2014), and ontology analysis reveals several cancer-related pathways. Overall, our results demonstrate that Link-HD is robust in combining several heterogeneous data types. A detailed description of these case studies and the theory behind Link-HD is available at https://lauzingaretti.github.io/LinkHD/ and in Bioconductor (https://bioconductor.org/packages/release/bioc/html/LinkHD.html).

4 Conclusions

We have developed an R package to integrate multiple microbial communities and other ‘omics’ layers combining a plethora of statistical methods in a fast, simple and flexible way.
References

1.  Ocean plankton. Structure and function of the global ocean microbiome.

Authors:  Shinichi Sunagawa; Luis Pedro Coelho; Samuel Chaffron; Jens Roat Kultima; Karine Labadie; Guillem Salazar; Bardya Djahanschiri; Georg Zeller; Daniel R Mende; Adriana Alberti; Francisco M Cornejo-Castillo; Paul I Costea; Corinne Cruaud; Francesco d'Ovidio; Stefan Engelen; Isabel Ferrera; Josep M Gasol; Lionel Guidi; Falk Hildebrand; Florian Kokoszka; Cyrille Lepoivre; Gipsi Lima-Mendez; Julie Poulain; Bonnie T Poulos; Marta Royo-Llonch; Hugo Sarmento; Sara Vieira-Silva; Céline Dimier; Marc Picheral; Sarah Searson; Stefanie Kandels-Lewis; Chris Bowler; Colomban de Vargas; Gabriel Gorsky; Nigel Grimsley; Pascal Hingamp; Daniele Iudicone; Olivier Jaillon; Fabrice Not; Hiroyuki Ogata; Stephane Pesant; Sabrina Speich; Lars Stemmann; Matthew B Sullivan; Jean Weissenbach; Patrick Wincker; Eric Karsenti; Jeroen Raes; Silvia G Acinas; Peer Bork
Journal:  Science       Date:  2015-05-22       Impact factor: 47.728

2.  CellMiner: a web-based suite of genomic and pharmacologic tools to explore transcript and drug patterns in the NCI-60 cell line set.

Authors:  William C Reinhold; Margot Sunshine; Hongfang Liu; Sudhir Varma; Kurt W Kohn; Joel Morris; James Doroshow; Yves Pommier
Journal:  Cancer Res       Date:  2012-07-15       Impact factor: 12.701

3.  Unsupervised multiple kernel learning for heterogeneous data integration.

Authors:  Jérôme Mariette; Nathalie Villa-Vialaneix
Journal:  Bioinformatics       Date:  2018-03-15       Impact factor: 6.937

4.  Sparse and compositionally robust inference of microbial ecological networks.

Authors:  Zachary D Kurtz; Christian L Müller; Emily R Miraldi; Dan R Littman; Martin J Blaser; Richard A Bonneau
Journal:  PLoS Comput Biol       Date:  2015-05-07       Impact factor: 4.475

5.  Methane Production in Dairy Cows Correlates with Rumen Methanogenic and Bacterial Community Structure.

Authors:  Rebecca Danielsson; Johan Dicksved; Li Sun; Horacio Gonda; Bettina Müller; Anna Schnürer; Jan Bertilsson
Journal:  Front Microbiol       Date:  2017-02-17       Impact factor: 5.640

6.  Multi-Omics Factor Analysis-a framework for unsupervised integration of multi-omics data sets.

Authors:  Ricard Argelaguet; Britta Velten; Damien Arnol; Sascha Dietrich; Thorsten Zenz; John C Marioni; Florian Buettner; Wolfgang Huber; Oliver Stegle
Journal:  Mol Syst Biol       Date:  2018-06-20       Impact factor: 11.429

7.  Identification of rumen microbial biomarkers linked to methane emission in Holstein dairy cows.

Authors:  Yuliaxis Ramayo-Caldas; Laura Zingaretti; Milka Popova; Jordi Estellé; Aurelien Bernard; Nicolas Pons; Pau Bellot; Núria Mach; Andrea Rau; Hugo Roume; Miguel Perez-Enciso; Philippe Faverdin; Nadège Edouard; Dusko Ehrlich; Diego P Morgavi; Gilles Renand
Journal:  J Anim Breed Genet       Date:  2019-08-16       Impact factor: 2.380

8.  A multivariate approach to the integration of multi-omics datasets.

Authors:  Chen Meng; Bernhard Kuster; Aedín C Culhane; Amin Moghaddas Gholami
Journal:  BMC Bioinformatics       Date:  2014-05-29       Impact factor: 3.169

9.  Two different bacterial community types are linked with the low-methane emission trait in sheep.

Authors:  Sandra Kittelmann; Cesar S Pinares-Patiño; Henning Seedorf; Michelle R Kirk; Siva Ganesh; John C McEwan; Peter H Janssen
Journal:  PLoS One       Date:  2014-07-31       Impact factor: 3.240

