MetaQuery: a web server for rapid annotation and quantitative analysis of specific genes in the human gut microbiome.

Stephen Nayfach¹, Michael A Fischbach², Katherine S Pollard¹.

Abstract

Microbiome researchers frequently want to know how abundant a particular microbial gene or pathway is across different human hosts, including its association with disease and its co-occurrence with other genes or microbial taxa. With thousands of publicly available metagenomes, these questions should be easy to answer. However, computational barriers prevent most researchers from conducting such analyses. We address this problem with MetaQuery, a web application for rapid and quantitative analysis of specific genes in the human gut microbiome. The user inputs one or more query genes, and our software returns the estimated abundance of these genes across 1267 publicly available fecal metagenomes from American, European and Chinese individuals. In addition, our application performs downstream statistical analyses to identify features that are associated with gene variation, including other query genes (i.e. gene co-variation), taxa, clinical variables (e.g. inflammatory bowel disease and diabetes) and average genome size. The speed and accessibility of MetaQuery are a step toward democratizing metagenomics research, which should allow many researchers to query the abundance and variation of specific genes in the human gut microbiome.
AVAILABILITY AND IMPLEMENTATION: http://metaquery.docpollard.org.
CONTACT: snayfach@gmail.com
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year: 2015 PMID: 26104745 PMCID: PMC4595903 DOI: 10.1093/bioinformatics/btv382

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

A number of large-scale shotgun metagenomics projects have been made publicly available, enabling researchers to investigate the functional composition of microbial communities from the human body and how microbial functions correlate with disease or other traits (Aagaard et al.; Li et al.). A common goal of many microbiome studies is to quantify the abundance of specific genes across these publicly available datasets. In most cases, this task involves (1) downloading metagenomes from public repositories, (2) mapping reads to a reference database and (3) estimating gene abundances. For example, this was the approach used by Donia et al. to estimate the abundance of 14 000 biosynthetic gene clusters in the human microbiome. However, this approach is time-consuming and computationally demanding, requiring large amounts of storage space and processing power, and is therefore not practical for many research groups. In an attempt to address this issue, several microbiome studies have made their functional annotations publicly available. For example, the Human Microbiome Project (HMP) Data Analysis and Coordination Center provides the abundance of KEGG Orthology Groups across 649 metagenomes from the human microbiome. Other studies have provided similar resources for other samples and databases. While useful, these resources represent only a small proportion of available samples, and their annotations typically cover only a small fraction of the genes in most metagenomes. For example, only 36% of reads from the HMP were mapped to a KEGG Orthology Group (Abubucker et al.). Furthermore, databases such as KEGG use unsupervised methods to cluster genes into ortholog groups that may not track with protein function (Schnoes et al.). More recently, there have been several efforts to create comprehensive gene catalogs that cover a much higher proportion of genes in the gut microbiome.
Most notably, Li et al. used 1267 samples from six different studies together with 511 genomes from gut microbiota to assemble a gene catalog of 9.9 million non-redundant genes. MetaQuery leverages this existing resource to let users rapidly estimate the abundance of one or more query sequences across 1267 fecal metagenomes. Instead of re-mapping metagenomic reads for each query, reads were mapped once to the gene catalog, which can then be queried many times. Our framework allows the specification of sequence homology thresholds, which enable the user to define the relationship between sequence similarity and function. Finally, we use a set of 30 universal single-copy genes to normalize gene abundances, which eliminates biases due to average genome size and database coverage (Manor and Borenstein, 2015; Nayfach and Pollard, 2015). This simple yet efficient framework has the potential to make large-scale metagenomics research accessible to a greater number of microbiome researchers.

2 Implementation

2.1 Gene abundance estimation

MetaQuery leverages the gene catalog and gene abundances published by Li et al. to rapidly estimate the abundance of one or more query genes in the human gut microbiome (Supplementary Fig. S1). First, the user submits one or more protein sequences in FASTA format, which are aligned against the gene catalog using either BLAST (Altschul et al.) or RAPSearch2 (Zhao et al.). Next, homologs of the query sequence(s) are identified in the gene catalog based on the resulting alignments and user-specified thresholds, which give the user flexibility to target either close or remote homologs of the query sequence. For each query, the abundances of the identified homologs are obtained from a precomputed matrix and summed per query and per sample. Next, gene abundances are optionally normalized using the relative abundance of 30 universal single-copy genes. Finally, gene abundances are compared against a background set of queries to give the user a context in which to interpret their results.
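The steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not MetaQuery's actual code: the hit fields (`identity`, `coverage`), the threshold values, the toy abundance matrix and the per-sample single-copy gene abundances are all assumptions introduced here.

```python
# Hypothetical alignment hits for one query: catalog gene ID plus
# percent identity and query coverage (field names are assumptions).
hits = [
    {"gene": "g1", "identity": 92.0, "coverage": 95.0},
    {"gene": "g2", "identity": 45.0, "coverage": 88.0},
    {"gene": "g3", "identity": 30.0, "coverage": 40.0},
]

# Toy precomputed matrix: catalog gene -> abundance in each of 3 samples.
abundance = {
    "g1": [4.0, 0.0, 2.0],
    "g2": [1.0, 3.0, 0.0],
    "g3": [5.0, 5.0, 5.0],
}

# Toy abundance of the universal single-copy genes in each sample.
marker_abundance = [10.0, 6.0, 4.0]


def select_homologs(hits, min_identity, min_coverage):
    """Keep catalog genes whose alignment passes both thresholds."""
    return [h["gene"] for h in hits
            if h["identity"] >= min_identity and h["coverage"] >= min_coverage]


def query_abundance(genes, abundance, n_samples):
    """Sum the abundances of all selected homologs within each sample."""
    totals = [0.0] * n_samples
    for gene in genes:
        for i, value in enumerate(abundance[gene]):
            totals[i] += value
    return totals


def normalize(totals, marker_abundance):
    """Divide by single-copy gene abundance to estimate copies per cell."""
    return [t / m for t, m in zip(totals, marker_abundance)]


# Strict thresholds target close homologs; relaxed ones admit remote homologs.
strict = select_homologs(hits, min_identity=90.0, min_coverage=90.0)
relaxed = select_homologs(hits, min_identity=40.0, min_coverage=80.0)
copies_per_cell = normalize(query_abundance(relaxed, abundance, 3),
                            marker_abundance)
```

Because the read mapping is done once, each new query only costs an alignment against the catalog plus a lookup and a sum over matrix rows, which is what makes repeated querying fast.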

2.2 Statistical analysis

After obtaining gene abundances, MetaQuery performs a number of statistical analyses. In the case of multiple query sequences, MetaQuery builds a Spearman correlation matrix of query genes across microbiome samples. Gene co-variation can identify genes that are physically linked on a genome, or genes that functionally interact in a metabolic pathway or protein complex. Next, Kruskal–Wallis tests are performed to identify genes that are differentially abundant between sample groups, including host continent (i.e. North America, Europe and Asia) and host health status (e.g. inflammatory bowel disease and diabetes). Finally, MetaQuery computes Spearman correlations of gene abundance versus average genome size (Nayfach and Pollard, 2015) and versus MetaPhlAn (Segata et al.) taxonomic abundances.
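For readers unfamiliar with these two rank-based statistics, the following standard-library Python sketch shows how they are computed. It is not MetaQuery's implementation: the gene names and abundance values are toy data invented for illustration, the Kruskal–Wallis statistic omits the tie correction, and no P-values are computed.

```python
from itertools import chain


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for lists of values."""
    pooled = list(chain.from_iterable(groups))
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


# Toy normalized abundances of three hypothetical query genes in 5 samples.
abundances = {
    "geneA": [0.1, 0.4, 0.2, 0.8, 0.5],
    "geneB": [0.2, 0.5, 0.3, 0.9, 0.7],
    "geneC": [0.9, 0.3, 0.6, 0.1, 0.2],
}
corr = {(a, b): spearman(abundances[a], abundances[b])
        for a in abundances for b in abundances}

# Toy abundances of one gene, grouped by host continent.
by_continent = {
    "North America": [0.030, 0.026, 0.028],
    "Europe": [0.004, 0.005, 0.006],
    "Asia": [0.010, 0.012, 0.011],
}
h_stat = kruskal_wallis_h(list(by_continent.values()))
```

In practice one would likely use an established statistics library instead, for example `scipy.stats.spearmanr` and `scipy.stats.kruskal`, which also return P-values and handle ties.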

3 Case study

We used MetaQuery to explore metagenomic variation of the fructan utilization locus found in Bacteroides thetaiotaomicron. This locus consists of a cluster of co-regulated genes that degrade non-digestible fructose-based polysaccharides from the human diet (Sonnenburg et al.). We found that members of the locus tended to be quite abundant and varied extensively across gut microbiome samples, with an average estimated copy number of 1 per 50 cells, which ranked in the top 2% relative to other genes in the gene catalog. The locus was most abundant in American subjects (mean = 1 copy per 38 cells) and least abundant in European individuals (mean = 1 copy per 220 cells) (Supplementary Fig. S2A). We found that the locus was marginally associated with both Crohn’s disease (P = 0.048) and diabetes (P = 0.045), indicating a potential role of microbes capable of fructan utilization in human disease (Supplementary Fig. S2B–D). Interestingly, variation of the fructan locus was strongly correlated with both average genome size (AGS; ρ = 0.62) and the relative abundance of Bacteroides (ρ = 0.68), although even in communities with large AGS or high Bacteroides abundance, there was still a large variation in the abundance of the locus (Supplementary Fig. S3). Finally, we observed that the abundances of genes BT1757-58 and BT1760-63 were strongly correlated across hosts (all ρ > 0.97), which is consistent with the fact that these genes are physically and functionally linked (Supplementary Fig. S4).
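The "1 copy per N cells" figures and the "top 2%" ranking can be read as simple transformations of a normalized abundance. The sketch below assumes that the normalized abundance is expressed in copies per cell and that the ranking is the fraction of background genes at least as abundant as the query; the function names and the background values are hypothetical and chosen only to illustrate the arithmetic.

```python
def cells_per_copy(copies_per_cell):
    """Convert copies per cell into 'one copy per N cells'."""
    return 1.0 / copies_per_cell


def top_fraction(query_value, background):
    """Fraction of background genes at least as abundant as the query."""
    at_least = sum(1 for value in background if value >= query_value)
    return at_least / len(background)


# 0.02 copies per cell corresponds to 1 copy per 50 cells.
n_cells = cells_per_copy(0.02)

# Hypothetical background of 100 gene abundances: 0.001, 0.002, ..., 0.100.
background = [i / 1000 for i in range(1, 101)]

# Only 2 of the 100 background values reach 0.0985, so the query is in the top 2%.
rank = top_fraction(0.0985, background)
```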

4 Conclusions

MetaQuery is a web application that allows rapid and quantitative analysis of genes in the human gut microbiome. Our simple framework should enable researchers to easily investigate metagenomic variation of specific genes of interest across a large cohort of samples from the gut microbiome. Our current reference database contains genes and abundances for 1267 samples. In the future, this database could be updated as additional fecal metagenomes become publicly available. Finally, this framework is not restricted to the human gut microbiome and could be applied to metagenomes from other environments, including soil and marine environments.

References

1. Basic local alignment search tool.

Authors: S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal: J Mol Biol Date: 1990-10-05 Impact factor: 5.469

2. Specificity of polysaccharide use in intestinal bacteroides species determines diet-induced microbiota alterations.

Authors: Erica D Sonnenburg; Hongjun Zheng; Payal Joglekar; Steven K Higginbottom; Susan J Firbank; David N Bolam; Justin L Sonnenburg
Journal: Cell Date: 2010-06-24 Impact factor: 41.582

3. An integrated catalog of reference genes in the human gut microbiome.

Authors: Junhua Li; Huijue Jia; Xianghang Cai; Huanzi Zhong; Qiang Feng; Shinichi Sunagawa; Manimozhiyan Arumugam; Jens Roat Kultima; Edi Prifti; Trine Nielsen; Agnieszka Sierakowska Juncker; Chaysavanh Manichanh; Bing Chen; Wenwei Zhang; Florence Levenez; Juan Wang; Xun Xu; Liang Xiao; Suisha Liang; Dongya Zhang; Zhaoxi Zhang; Weineng Chen; Hailong Zhao; Jumana Yousuf Al-Aama; Sherif Edris; Huanming Yang; Jian Wang; Torben Hansen; Henrik Bjørn Nielsen; Søren Brunak; Karsten Kristiansen; Francisco Guarner; Oluf Pedersen; Joel Doré; S Dusko Ehrlich; Peer Bork; Jun Wang
Journal: Nat Biotechnol Date: 2014-07-06 Impact factor: 54.908

4. A systematic analysis of biosynthetic gene clusters in the human microbiome reveals a common family of antibiotics.

Authors: Mohamed S Donia; Peter Cimermancic; Christopher J Schulze; Laura C Wieland Brown; John Martin; Makedonka Mitreva; Jon Clardy; Roger G Linington; Michael A Fischbach
Journal: Cell Date: 2014-09-11 Impact factor: 41.582

5. Metagenomic microbial community profiling using unique clade-specific marker genes.

Authors: Nicola Segata; Levi Waldron; Annalisa Ballarini; Vagheesh Narasimhan; Olivier Jousson; Curtis Huttenhower
Journal: Nat Methods Date: 2012-06-10 Impact factor: 28.547

6. RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data.

Authors: Yongan Zhao; Haixu Tang; Yuzhen Ye
Journal: Bioinformatics Date: 2011-10-28 Impact factor: 6.937

7. Metabolic reconstruction for metagenomic data and its application to the human microbiome.

Authors: Sahar Abubucker; Nicola Segata; Johannes Goll; Alyxandria M Schubert; Jacques Izard; Brandi L Cantarel; Beltran Rodriguez-Mueller; Jeremy Zucker; Mathangi Thiagarajan; Bernard Henrissat; Owen White; Scott T Kelley; Barbara Methé; Patrick D Schloss; Dirk Gevers; Makedonka Mitreva; Curtis Huttenhower
Journal: PLoS Comput Biol Date: 2012-06-13 Impact factor: 4.475

8. Average genome size estimation improves comparative metagenomics and sheds light on the functional ecology of the human microbiome.

Authors: Stephen Nayfach; Katherine S Pollard
Journal: Genome Biol Date: 2015-03-25 Impact factor: 13.583

9. MUSiCC: a marker genes based framework for metagenomic normalization and accurate profiling of gene abundances in the microbiome.

Authors: Ohad Manor; Elhanan Borenstein
Journal: Genome Biol Date: 2015-03-25 Impact factor: 13.583

10. Annotation error in public databases: misannotation of molecular function in enzyme superfamilies.

Authors: Alexandra M Schnoes; Shoshana D Brown; Igor Dodevski; Patricia C Babbitt
Journal: PLoS Comput Biol Date: 2009-12-11 Impact factor: 4.475
