
Tax4Fun: predicting functional profiles from metagenomic 16S rRNA data.

Kathrin P Aßhauer, Bernd Wemheuer, Rolf Daniel, Peter Meinicke.

Abstract

MOTIVATION: The characterization of phylogenetic and functional diversity is a key element in the analysis of microbial communities. Amplicon-based sequencing of marker genes, such as 16S rRNA, is a powerful tool for assessing and comparing the structure of microbial communities at a high phylogenetic resolution. Because 16S rRNA sequencing is more cost-effective than whole metagenome shotgun sequencing, marker gene analysis is frequently used for broad studies that involve a large number of different samples. However, in comparison to shotgun sequencing approaches, insights into the functional capabilities of the community get lost when restricting the analysis to taxonomic assignment of 16S rRNA data.
RESULTS: Tax4Fun is a software package that predicts the functional capabilities of microbial communities based on 16S rRNA datasets. We evaluated Tax4Fun on a range of paired metagenome/16S rRNA datasets to assess its performance. Our results indicate that Tax4Fun provides a good approximation to functional profiles obtained from metagenomic shotgun sequencing approaches.
AVAILABILITY AND IMPLEMENTATION: Tax4Fun is an open-source R package that can be applied to output from the SILVAngs web server or from QIIME with a SILVA database extension. Tax4Fun is freely available for download at http://tax4fun.gobics.de/.
CONTACT: kasshau@gwdg.de
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2015. Published by Oxford University Press.

Year:  2015        PMID: 25957349      PMCID: PMC4547618          DOI: 10.1093/bioinformatics/btv287

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Amplicon-based sequencing of marker genes is widely used for large-scale studies that involve many different sampling sites or time series. The common 16S rRNA gene-based analysis is a powerful tool for assessing the phylogenetic distribution of a metagenome but does not provide insights into the community's metabolic potential. Therefore, predicting the functional capabilities of a microbial community from marker gene data would be highly beneficial. A particular difficulty of such a predictive approach is that, for most organisms in marker gene databases, the genome and therefore the functional repertoire is not known. For instance, the SILVA SSU rRNA database (Quast ) (SILVA 115 full release) contains 3 808 884 rRNA sequences, whereas KEGG (Release 71.1) (Kanehisa ) comprises only 2982 complete prokaryotic genomes.

In a previous study (Aßhauer and Meinicke, 2013), we introduced a statistical method for predicting the metabolic profiles of a metagenome from its taxonomic composition using a linear combination of precomputed genomic reference profiles. A similar method has been proposed for inference of the community structure from remote sensing satellite image models (Larsen ). In this approach, the average EC number counts of all annotated genomes from a given taxonomic group, e.g. at order level, are used as reference for the linear combination of the community structure at this level.

Recently, the PICRUSt approach was proposed to predict KEGG Ortholog (KO) functional profiles of microbial communities using 16S rRNA gene sequences (Langille ). PICRUSt infers unknown gene content by an extended ancestral state reconstruction algorithm. The algorithm uses a phylogenetic tree of 16S rRNA gene sequences to link operational taxonomic units (OTUs) with gene content. Thus, PICRUSt predictions depend on the topology of the tree and the distance to the next sequenced organism. Because a nearest neighbor within the tree topology always exists, PICRUSt links all OTUs, even if distances are large. This procedure can be problematic when analyzing microbial communities that contain a large proportion of phyla that are not yet well characterized.

Here, we present Tax4Fun, a novel tool for functional community profiling based on 16S rRNA data. In Tax4Fun, 16S rRNA gene sequences are linked to the functional annotation of sequenced prokaryotic genomes by a nearest neighbor identification that requires a minimum 16S rRNA sequence similarity. Tax4Fun can be applied to the output of any 16S rRNA analysis pipeline that maps 16S rRNA gene reads to SILVA. Our results indicate that the functional predictions of Tax4Fun correlate more strongly with the metagenome profile than those of the PICRUSt tool.
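The difference between always linking to the nearest neighbor and linking only above a minimum similarity can be illustrated with a minimal Python sketch. It is not the Tax4Fun implementation: the function name `link_to_reference`, the organism identifiers, the similarity values and the threshold of 0.95 are hypothetical and chosen only for illustration.

```python
def link_to_reference(similarities, min_similarity):
    """Return the most similar reference organism, or None if even the
    best match falls below the minimum similarity.

    similarities: dict mapping a reference organism id to the 16S rRNA
    sequence similarity (0..1) with the query sequence (hypothetical data).
    """
    if not similarities:
        return None
    best = max(similarities, key=similarities.get)
    if similarities[best] >= min_similarity:
        return best
    return None  # the query stays unlinked instead of using a distant neighbor


# Hypothetical similarity values; 0.95 is an arbitrary illustrative threshold.
well_characterized = {"org1": 0.99, "org2": 0.91}
poorly_characterized = {"org1": 0.80, "org2": 0.85}

linked = link_to_reference(well_characterized, min_similarity=0.95)
unlinked = link_to_reference(poorly_characterized, min_similarity=0.95)
print(linked, unlinked)
```

With a threshold of 0, the same function would link every query to its nearest neighbor regardless of distance, which is the behavior the text describes as problematic for poorly characterized phyla.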

2 Implementation

Our method predicts functional profiles from SILVA-labeled OTU abundances. After preprocessing and clustering of the 16S rRNA sequencing reads, the resulting OTUs have to be assigned to reference sequences in the SILVA database. The SILVA assignment counts are then transformed to functional profiles using Tax4Fun, which proceeds in three steps. First, the SILVA-based 16S rRNA profile is transformed to a taxonomic profile of the prokaryotic KEGG organisms. This linear transformation is realized by a precomputed association matrix (see Supplementary Material section 2.1.1). Second, the estimated abundances of KEGG organisms are normalized by the 16S rRNA copy number obtained from the NCBI genome annotations. Finally, the normalized taxonomic abundances are used to linearly combine the precomputed functional profiles of the KEGG organisms to predict the functional profile of the microbial community. The organism-specific reference profiles are estimated with the same method as used for the Taxy-Pro reference profiles (Klingenberg ). For a fast computation of the organism-specific and metagenomic functional KEGG Ortholog (KO) profiles, we utilized UProC (Meinicke, 2015) and PAUDA (Huson and Xie, 2014), respectively (see Supplementary Material section 2.1.2).
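The three steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code of the R package: the function name, the dictionary-based data layout and all toy values (association weights, copy numbers, reference profiles) are hypothetical, and the rescaling of organism abundances to sum to one is an assumption made so that the output is a relative profile.

```python
def predict_functional_profile(silva_counts, association, copy_number, reference_profiles):
    """Toy version of the three steps described above (all inputs hypothetical).

    silva_counts:       SILVA label -> read count
    association:        SILVA label -> {KEGG organism -> weight}
    copy_number:        KEGG organism -> 16S rRNA copy number
    reference_profiles: KEGG organism -> {KO -> relative abundance}
    """
    # Step 1: linear transformation of SILVA counts to KEGG organism abundances.
    organism_abundance = {}
    for label, count in silva_counts.items():
        # Labels without an association entry are ignored (unmapped OTUs).
        for organism, weight in association.get(label, {}).items():
            organism_abundance[organism] = organism_abundance.get(organism, 0.0) + count * weight

    # Step 2: normalize by 16S rRNA copy number, then rescale to sum to one
    # (the rescaling is an assumption of this sketch).
    normalized = {org: value / copy_number[org] for org, value in organism_abundance.items()}
    total = sum(normalized.values())
    if total == 0:
        return {}
    normalized = {org: value / total for org, value in normalized.items()}

    # Step 3: linear combination of the organism-specific reference profiles.
    profile = {}
    for organism, weight in normalized.items():
        for ko, abundance in reference_profiles[organism].items():
            profile[ko] = profile.get(ko, 0.0) + weight * abundance
    return profile


# Hypothetical toy inputs.
silva_counts = {"silva_A": 60, "silva_B": 40, "silva_unmapped": 10}
association = {"silva_A": {"org1": 1.0}, "silva_B": {"org1": 0.5, "org2": 0.5}}
copy_number = {"org1": 4, "org2": 1}
reference_profiles = {
    "org1": {"K00001": 0.75, "K00002": 0.25},
    "org2": {"K00002": 1.0},
}

profile = predict_functional_profile(silva_counts, association, copy_number, reference_profiles)
print(profile)
```

In this toy example, org1 receives 80 of the mapped counts and org2 receives 20, but after dividing by the copy numbers (4 and 1) both organisms contribute equally to the predicted KO profile.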

3 Results

We applied Tax4Fun and PICRUSt to a collection of paired metagenome/16S rRNA datasets that have also been used in the original PICRUSt study (Fierer ; Harris ; Human Microbiome Project Consortium, 2012; Kunin ; Muegge ) (see Supplementary Material section 1.1). Before applying Tax4Fun, the SILVA-based 16S rRNA profiles were computed using either the QIIME tool (Caporaso ) or the SILVAngs web server (Quast ) (see Supplementary Material sections 2.2 and 2.3). For each paired dataset, the Spearman correlation between the whole-metagenome and the 16S rRNA-predicted relative KO abundance profiles was calculated. For the computation of the correlation, we excluded for each dataset all KO profile dimensions that did not contribute any non-zero count in the functional profiles. The resulting correlation coefficients are shown in Figure 1 for the UProC-based functional profiles. Using Tax4Fun, the median correlation coefficient ranges from 0.6427 (Guerrero Negro hypersaline microbial mat) to 0.8706 (soils).
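The evaluation measure can be sketched as follows in plain Python. The sketch assumes that a KO dimension is dropped when it is zero in both profiles of a pair, which is one reading of the exclusion rule above; the function names and the toy profiles are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(metagenome, predicted, drop_shared_zeros=True):
    """Spearman correlation of two KO profiles given as dicts KO -> abundance."""
    kos = sorted(set(metagenome) | set(predicted))
    if drop_shared_zeros:
        # Keep only KOs with a non-zero value in at least one profile.
        kos = [ko for ko in kos if metagenome.get(ko, 0) != 0 or predicted.get(ko, 0) != 0]
    x = [metagenome.get(ko, 0.0) for ko in kos]
    y = [predicted.get(ko, 0.0) for ko in kos]
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical relative KO abundances for one paired sample.
metagenome = {"K1": 0.5, "K2": 0.3, "K3": 0.2, "K4": 0.0}
predicted = {"K1": 0.6, "K2": 0.1, "K3": 0.3, "K4": 0.0}

rho = spearman(metagenome, predicted)
rho_with_zeros = spearman(metagenome, predicted, drop_shared_zeros=False)
print(rho, rho_with_zeros)
```

In this toy pair the coefficient is 0.5 after dropping the shared zero and 0.8 when K4 is kept, which shows how dimensions that are zero in both profiles can raise the correlation without reflecting any predictive agreement.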
Fig. 1. Spearman correlations between metagenomic and 16S-predicted functional profiles for comparison of Tax4Fun and PICRUSt on paired datasets from the human microbiome (HMP), mammalian guts, Guerrero Negro hypersaline microbial mat and soils.

In comparison with PICRUSt, the correlation of Tax4Fun is significantly higher for all four datasets according to a nonparametric sign test (P-value < 0.001). Similar results are obtained using the PAUDA tool for estimation of the functional profiles (see Supplementary Material section 4.2). Further, we compared the coverage of the analysis pipelines in terms of the fraction of reads that were classified by QIIME/SILVAngs and the percentage of OTUs that were mapped to KEGG organisms using Tax4Fun. Especially for the soil samples, we observed rather low fractions of 16S rRNA sequences that were finally used to predict the functional profiles (SILVAngs: 0.02%, Tax4Fun: 4.78%; QIIME: 95.21%, Tax4Fun: 55.36%). In contrast, the coverage for the human microbiome and mammalian guts datasets is rather high for both QIIME/SILVAngs and Tax4Fun (SILVAngs: 95%, Tax4Fun 95%). In our study, the soil samples are probably the most complex communities under investigation. Further, our results revealed that members of the soil communities are poorly represented in the KEGG database. However, even when applying SILVAngs + Tax4Fun, the median of the correlation coefficients was rather high although only a fraction of the reads was used to predict the functional profiles. Thus, a high correlation coefficient does not necessarily indicate the completeness of the estimated functional repertoire but rather provides a measure of correspondence between the whole metagenome and the 16S rRNA-predicted KO abundances. Therefore, the prediction accuracy is not a function of sample diversity but rather depends on a good correspondence between organisms in genome and 16S rRNA databases. Even if a sample is very diverse, good predictions can be obtained when many of the detected organisms or their close relatives are available in both databases. In contrast, the predicted functional profile of samples with large fractions of unknown organisms can be expected to be incomplete due to the low coverage of database reference profiles. Thus, the coverage of the taxonomic assignments should always be inspected to check the reliability of the predictions, in particular when using SILVAngs. For all datasets, the coverage values are provided in the Supplementary Excel File.
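The two coverage values discussed above can be computed with a small helper. The following Python sketch assumes that the number of classified reads and the set of SILVA labels with a KEGG match are available from the upstream pipeline; the function name, argument names and toy values are hypothetical and do not correspond to the Tax4Fun R interface.

```python
def coverage(total_reads, classified_reads, otu_labels, labels_with_kegg_match):
    """Return (fraction of reads classified, fraction of OTUs mapped to KEGG).

    otu_labels:             SILVA labels of the OTUs in the sample
    labels_with_kegg_match: set of SILVA labels linked to a KEGG organism
    """
    if total_reads <= 0 or not otu_labels:
        raise ValueError("need at least one read and one OTU")
    read_fraction = classified_reads / total_reads
    mapped = sum(1 for label in otu_labels if label in labels_with_kegg_match)
    otu_fraction = mapped / len(otu_labels)
    return read_fraction, otu_fraction


# Hypothetical sample: 950 of 1000 reads classified, 2 of 4 OTUs mapped.
read_fraction, otu_fraction = coverage(
    total_reads=1000,
    classified_reads=950,
    otu_labels=["silva_A", "silva_B", "silva_C", "silva_D"],
    labels_with_kegg_match={"silva_A", "silva_B"},
)
print(f"reads classified: {read_fraction:.1%}, OTUs mapped: {otu_fraction:.1%}")
```

Reporting both values next to each predicted profile makes it easier to see when a prediction rests on only a small part of the community.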

4 Conclusion

Tax4Fun predicts the functional profile of a microbial community from 16S rRNA sequence data alone. Our approach cannot replace whole metagenome profiling but is useful to supplement 16S rRNA analyses in metagenome pre-studies or in situations where shotgun sequencing is prohibitively expensive, e.g. for broad surveys in microbial ecology applications. We evaluated our method on four paired data collections from different habitats and compared it to the PICRUSt tool. The results indicate a high correlation of the predicted Tax4Fun profiles with the corresponding functional profiles obtained from whole metagenome sequence data. Moreover, the results show that Tax4Fun outperforms PICRUSt on all test datasets. Additionally, for all datasets the correlation between the metagenomic and 16S-predicted functional profiles was higher when using UProC than when using the PAUDA tool (see Supplementary Material section 3 Figure S1–S3). Although we provide functional reference profiles from both tools for Tax4Fun, we recommend using Tax4Fun with the UProC-based reference profiles for prediction because of the higher sensitivity of UProC. Tax4Fun allows easy processing of the output from SILVAngs, QIIME or any other analysis pipeline that uses the SILVA database as reference. The implementation in R facilitates further statistical analyses of the Tax4Fun predictions, which can be processed within the same R environment.

Funding

Grants from the Deutsche Forschungsgemeinschaft (ME 3138, to P.M. in part, TRR51, to R.D. in part).

Conflict of Interest: none declared.
References (10 of 13 listed)

1. Diet drives convergence in gut microbiome functions across mammalian phylogeny and within humans.
Authors: Brian D Muegge; Justin Kuczynski; Dan Knights; Jose C Clemente; Antonio González; Luigi Fontana; Bernard Henrissat; Rob Knight; Jeffrey I Gordon
Journal: Science. Date: 2011-05-20

2. QIIME allows analysis of high-throughput community sequencing data.
Authors: J Gregory Caporaso; Justin Kuczynski; Jesse Stombaugh; Kyle Bittinger; Frederic D Bushman; Elizabeth K Costello; Noah Fierer; Antonio Gonzalez Peña; Julia K Goodrich; Jeffrey I Gordon; Gavin A Huttley; Scott T Kelley; Dan Knights; Jeremy E Koenig; Ruth E Ley; Catherine A Lozupone; Daniel McDonald; Brian D Muegge; Meg Pirrung; Jens Reeder; Joel R Sevinsky; Peter J Turnbaugh; William A Walters; Jeremy Widmann; Tanya Yatsunenko; Jesse Zaneveld; Rob Knight
Journal: Nat Methods. Date: 2010-04-11

3. Protein signature-based estimation of metagenomic abundances including all domains of life and viruses.
Authors: Heiner Klingenberg; Kathrin Petra Aßhauer; Thomas Lingner; Peter Meinicke
Journal: Bioinformatics. Date: 2013-02-15

4. Structure, function and diversity of the healthy human microbiome.
Authors: Human Microbiome Project Consortium
Journal: Nature. Date: 2012-06-13

5. UProC: tools for ultra-fast protein domain classification.
Authors: Peter Meinicke
Journal: Bioinformatics. Date: 2014-12-23

6. A poor man's BLASTX--high-throughput metagenomic protein database search using PAUDA.
Authors: Daniel H Huson; Chao Xie
Journal: Bioinformatics. Date: 2013-05-07

7. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools.
Authors: Christian Quast; Elmar Pruesse; Pelin Yilmaz; Jan Gerken; Timmy Schweer; Pablo Yarza; Jörg Peplies; Frank Oliver Glöckner
Journal: Nucleic Acids Res. Date: 2012-11-28

8. Phylogenetic stratigraphy in the Guerrero Negro hypersaline microbial mat.
Authors: J Kirk Harris; J Gregory Caporaso; Jeffrey J Walker; John R Spear; Nicholas J Gold; Charles E Robertson; Philip Hugenholtz; Julia Goodrich; Daniel McDonald; Dan Knights; Paul Marshall; Henry Tufo; Rob Knight; Norman R Pace
Journal: ISME J. Date: 2012-07-26

9. Millimeter-scale genetic gradients and community-level molecular convergence in a hypersaline microbial mat.
Authors: Victor Kunin; Jeroen Raes; J Kirk Harris; John R Spear; Jeffrey J Walker; Natalia Ivanova; Christian von Mering; Brad M Bebout; Norman R Pace; Peer Bork; Philip Hugenholtz
Journal: Mol Syst Biol. Date: 2008-06-03

10. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences.
Authors: Morgan G I Langille; Jesse Zaneveld; J Gregory Caporaso; Daniel McDonald; Dan Knights; Joshua A Reyes; Jose C Clemente; Deron E Burkepile; Rebecca L Vega Thurber; Rob Knight; Robert G Beiko; Curtis Huttenhower
Journal: Nat Biotechnol. Date: 2013-08-25