R Berlemont, N Winans, D Talamantes, H Dang, H-W Tsai.
Abstract
The annotation of short-read metagenomes is an essential process to understand the functional potential of sequenced microbial communities. Annotation techniques based solely on the identification of local matches tend to confound local sequence similarity and overall protein homology, and thus do not mirror the complex multidomain architecture and the shuffling of functional domains in many protein families. Here, we present MetaGeneHunt to identify specific protein domains and to normalize the hit counts based on the domain length. We used MetaGeneHunt to investigate the potential for carbohydrate processing in the mouse gastrointestinal tract. We sampled, sequenced, and analyzed the microbial communities associated with the bolus in the stomach, intestine, cecum, and colon of five captive mice. Focusing on Glycoside Hydrolases (GHs), we found that, across samples, 58.3% of the 4,726,023 short-read sequences matching a GH domain-containing protein were located outside the domain of interest. Next, before comparing the samples, the counts of localized hits matching the domains of interest were normalized to account for the corresponding domain length. Microbial communities in the intestine and cecum displayed characteristic GH profiles matching distinct microbial assemblages. Conversely, the stomach and colon were associated with structurally and functionally more diverse and variable microbial communities. Across samples, despite fluctuations, changes in the functional potential for carbohydrate processing correlated with changes in community composition. Overall, MetaGeneHunt is a new way to quickly and precisely identify discrete protein domains in sequenced metagenomes processed with MG-RAST. In addition, using the sister program "GeneHunt" to create a custom Reference Annotation Table, MetaGeneHunt provides an unprecedented way to (re)investigate the precise distribution of any protein domain in short-read metagenomes.
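The two steps described in the abstract (keeping only hits that fall within the domain of interest, then normalizing the counts by domain length) can be illustrated with a minimal sketch. This is not the MetaGeneHunt implementation: the `Domain` and `Hit` records, the `count_domain_hits` function, the overlap rule (`min_overlap`), and the per-residue normalization (each hit weighted by 1/domain length) are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """Hypothetical record: one domain located on a reference protein."""
    family: str   # e.g. "GH13"
    protein: str  # reference protein identifier
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


@dataclass(frozen=True)
class Hit:
    """Hypothetical record: where a short read aligns on a reference protein."""
    protein: str
    start: int
    end: int


def overlap(hit: Hit, domain: Domain) -> int:
    """Number of residues shared by the alignment and the domain."""
    if hit.protein != domain.protein:
        return 0
    return max(0, min(hit.end, domain.end) - max(hit.start, domain.start) + 1)


def count_domain_hits(hits, domains, min_overlap=1):
    """Return (raw counts, length-normalized counts, hits outside any domain).

    Assumption: a hit is domain-specific if it overlaps the domain by at
    least `min_overlap` residues, and each such hit contributes
    1 / domain_length to the normalized count of the domain family.
    """
    by_protein = defaultdict(list)
    for d in domains:
        by_protein[d.protein].append(d)

    raw = defaultdict(int)
    normalized = defaultdict(float)
    outside = 0
    for hit in hits:
        matched = [d for d in by_protein.get(hit.protein, [])
                   if overlap(hit, d) >= min_overlap]
        if not matched:
            outside += 1  # the read matches the protein but not the domain
            continue
        for d in matched:
            raw[d.family] += 1
            normalized[d.family] += 1.0 / d.length
    return dict(raw), dict(normalized), outside
```

With this weighting, a short domain and a long domain that attract the same number of reads no longer look equally abundant: the shorter one receives the higher normalized count, since it offers fewer positions for a read to land on.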
Year: 2020 PMID: 32382098 PMCID: PMC7205989 DOI: 10.1038/s41598-020-63775-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. (a) Fold increase (log10) in the total normalized domain-specific hit count, accounting for the domain length, relative to the total raw domain-specific hit count (log10) in the mouse GIT. (b) Rarefied-normalized domain-specific hit count across the mouse GIT (only domains with >100 hits are shown). (c) Heatmap showing the distribution of the rarefied-normalized GH domains most affected by the sample origin in the mouse GIT (see text; Mx:F/M:S/I/C/L = Mouse#: Female/Male: Stomach/Intestine/Cecum/Colon).
Figure 2. (a) Sample clustering based on the complete microbial community composition identified at the genus level, after rarefaction, using the Bray-Curtis dissimilarity index and complete linkage (Mx:F/M:S/I/C/L = Mouse#: Female/Male: Stomach/Intestine/Cecum/Colon). (b) Bar plot highlighting the microbial community composition across samples; for clarity, only the genera accounting for at least 1% of the annotated reads, after rarefaction, are displayed (V = phylum Verrucomicrobia, B = Bacteroidetes, A = Actinobacteria, F = Firmicutes). (c) NMDS analysis (2D stress = 0.020) revealing the sample clustering overlaid with all the identified bacterial genera. The genera are color-coded by phylum, and the major groups highlighted in (b) are labelled individually. Symbol size mirrors the maximum frequency of the genus across samples.
Figure 3. Pairwise comparison of the structural and functional dissimilarities (Bray-Curtis) across samples from the same location. The lines depict the linear regressions.
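For readers unfamiliar with the index used in Figures 2 and 3, the Bray-Curtis dissimilarity between two abundance profiles a and b is the sum of |a_i - b_i| divided by the sum of (a_i + b_i); it ranges from 0 (identical profiles) to 1 (no shared features). The sketch below is a plain-Python illustration, not the authors' analysis code; the function names and the sample labels in the usage note are hypothetical.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of features")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # convention chosen here for two empty profiles
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def pairwise_dissimilarities(profiles):
    """Map each unordered pair of sample names to its Bray-Curtis value.

    `profiles` is a dict {sample_name: abundance_vector}; all vectors are
    assumed to list the same features (genera or GH domains) in the same order.
    """
    return {
        (s1, s2): bray_curtis(profiles[s1], profiles[s2])
        for s1, s2 in combinations(sorted(profiles), 2)
    }
```

Applying `pairwise_dissimilarities` once to genus-level profiles (structural) and once to GH-domain profiles (functional) for the same samples, for example hypothetical cecum samples labelled `"M1:F:C"` and `"M2:M:C"`, yields the paired values that a plot such as Figure 3 sets against each other.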