Chengfeng Yang1,2, Qinzhi Su1,2, Min Tang2, Shiqi Luo2, Hao Zheng1, Xue Zhang2, Xin Zhou2.
Abstract
An in-depth understanding of microbial function and the division of ecological niches requires accurate delineation and identification of microbes at a fine taxonomic resolution. Microbial phylotypes are typically defined using a 97% small subunit (16S) rRNA sequence identity threshold. However, increasing evidence has demonstrated the ubiquitous presence of functionally distinct taxonomic units within phylotypes. These so-called sequence-discrete populations (SDPs) have mainly been delineated by disjunct sequence similarity at the whole-genome level. However, marker genes that can accurately identify and quantify SDPs are lacking in microbial community studies. Here, we developed a pipeline to screen single-copy protein-coding genes that can accurately characterize SDP diversity via amplicon sequencing of microbial communities. Fifteen candidate marker genes were evaluated using three criteria (extent of sequence divergence, phylogenetic accuracy, and conservation of primer regions), and the selected genes were then tested for their efficiency in differentiating SDPs within Gilliamella, a core honeybee gut microbial phylotype, as a proof of concept. The results showed that the 16S V4 region failed to report accurate SDP diversities due to low taxonomic resolution and variable copy numbers. In contrast, the single-copy genes recommended by our pipeline successfully quantified Gilliamella SDPs in both mock samples and honeybee guts, with results highly consistent with those of metagenomics. The pipeline developed in this study is expected to identify single-copy protein-coding genes capable of accurately quantifying diverse bacterial communities at the SDP level.
IMPORTANCE Microbial communities can be distinguished by discrete genetic and ecological characteristics. These sequence-discrete populations are foundational for investigating the composition and functional structures of microbial communities at high resolution. In this study, we used our pipeline to screen for reliable single-copy protein-coding marker genes that identify sequence-discrete populations. Using marker gene amplicon sequencing, we could accurately and efficiently delineate the population diversity in microbial communities. These results suggest that single-copy protein-coding genes can be an accurate, quantitative, and economical alternative for characterizing population diversity. Moreover, the suitability of a gene as a marker for identifying any bacterial population can be quickly evaluated with the pipeline proposed here.
Keywords: 16S; 16S V4 region; Gilliamella; SDP; metagenomics; microbiota; quantification
Year: 2022 PMID: 35416715 PMCID: PMC9045262 DOI: 10.1128/spectrum.02105-21
Source DB: PubMed Journal: Microbiol Spectr ISSN: 2165-0497
FIG 1 Screening marker genes suitable for SDP discrimination and quantification. (A) SDPs are identified for gut bacterial phylotypes based on phylogenetic relationships and genome-wide pairwise average nucleotide identities (gANI). (B) A candidate marker gene for SDP discrimination is selected from a set of universal single-copy genes based on sequence variation, phylogenetic relationship, and well-conserved regions for primer design. (C) The performance of marker gene amplicon sequencing (MGAS) in SDP identification and quantification is validated and compared using mock samples and gut communities.
FIG 2 Marker genes are highly variable among SDPs. (A) Average Shannon entropy of the 15 marker genes and the 16S gene at both the phylotype and SDP levels of honeybee gut bacteria. Numbers in brackets for each of the SDP groups indicate the number of strains examined for that specific group. (B) Shannon entropy across the 16S and candidate marker genes of all A. cerana Gilliamella. The Shannon entropy value is averaged over a 20-bp sliding window with a 5-bp step. Gray shading marks conserved regions optimal for primer-binding sites, and blue shading marks regions considered hypervariable in this study. Dashed lines represent the mean Shannon entropy values across all sequences. Gray lines depict the classic variable regions of the 16S gene. Apib: Apibacter; Bifido: Bifidobacterium; Firm5: Lactobacillus Firm5; Gillia: Gilliamella; Snod: Snodgrassella alvi.
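The windowed entropy described in panel (B) can be illustrated with a minimal sketch. It assumes pre-aligned sequences of equal length, entropy measured in bits (log base 2), and gaps or ambiguous bases ignored per column. None of these choices is stated in the caption, and the function names `column_entropy` and `windowed_entropy` are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy of one alignment column.

    Assumptions (not from the source): log base 2, and any character
    other than A/C/G/T (gaps, N) is ignored.
    """
    bases = [b for b in column.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    total = len(bases)
    counts = Counter(bases)
    # sum of p * log2(1/p) over the observed bases
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def windowed_entropy(aligned_seqs, window=20, step=5):
    """Average per-site entropy in sliding windows.

    Returns a list of (window_start, mean_entropy) pairs, using a
    20-bp window and 5-bp step by default as in the caption.
    """
    lengths = {len(s) for s in aligned_seqs}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    length = lengths.pop()
    per_site = [
        column_entropy("".join(s[i] for s in aligned_seqs))
        for i in range(length)
    ]
    return [
        (start, sum(per_site[start:start + window]) / window)
        for start in range(0, length - window + 1, step)
    ]
```

Under these assumptions, windows with low mean entropy would correspond to candidate primer-binding regions and windows with high mean entropy to hypervariable regions.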
FIG 3 MGAS accurately identifies A. cerana Gilliamella SDPs. (A) Intraclass correlation coefficient (ICC) of relative abundance among the three replicates of MGAS samples. The ICC is calculated using the two-way mixed effects model with consistency (C) as the relationship among replicates and a single (1) result as the unit of measurement, i.e., ICC(C, 1). (B) Relative SDP abundances in mock samples revealed by marker gene sequencing. The results shown in the heatmap are the logarithms of the relative abundances of the five representative strains of the five SDPs of A. cerana Gilliamella. Gray boxes indicate a relative abundance of zero. False-positive results are framed in red. (C) Spearman correlation of SDP abundances in A. cerana Gilliamella communities revealed by sequencing against mock samples. P < 2.2e-16. The black line represents the linear regression of the MGAS results against SDP abundances in mock samples. The blue solid and gray dashed lines represent a 1:1 line and the fitted exponential regression (with 95% confidence interval shown in gray shade), respectively. (D) Minimum read numbers required for detecting members at low abundances.
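For readers unfamiliar with ICC(C, 1) in panel (A), the following is a minimal sketch of the standard two-way consistency, single-measurement form: (MS_rows - MS_error) / (MS_rows + (k - 1) * MS_error), where rows are the measured items and k is the number of replicates. The function name `icc_c1` and the table layout are assumptions for illustration, not the authors' code.

```python
def icc_c1(table):
    """ICC(C, 1): two-way model, consistency, single measurement.

    `table` is a list of rows, one per measured item (e.g. an SDP in a
    sample), each holding k replicate values. This layout is an
    assumption for illustration.
    """
    n = len(table)
    k = len(table[0])
    if n < 2 or k < 2 or any(len(row) != k for row in table):
        raise ValueError("need at least 2 rows and 2 replicates, all rows equal length")

    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
```

Because the consistency form ignores systematic offsets between replicates, replicates that differ only by a constant shift still give a value of 1.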
FIG 4 MGAS shows high congruence with metagenomics sequencing in SDP-level analysis. (A) Relative abundances of Gilliamella SDPs revealed by MGAS (frr) and metagenomics sequencing of A. cerana gut communities. (B) Spearman correlation coefficient between MGAS and metagenomics results, with R² = 0.99, P < 2.2e-16. The black line represents the linear regression of the MGAS results in SDP abundances against those of metagenomics. The blue solid and gray dashed lines represent a 1:1 line and the fitted exponential regression (with 95% confidence interval shown in gray shade), respectively. (C) Shannon diversity index of SDP frequencies for bee guts from two locations calculated by MGAS (left panel) and metagenomics sequencing (right panel). The two methods showed no significant difference, with P-values of 0.70 and 0.82 for Sichuan (SC) and Taiwan (TW), respectively, by Wilcoxon rank-sum test. (D) Principal coordinate analysis (PCoA) based on Bray-Curtis dissimilarity of SDP compositions of honey bee workers from Sichuan and Taiwan using MGAS (left panel, Adonis PERMANOVA, R² = 0.056, P = 0.204) and metagenomics sequencing (right panel, Adonis PERMANOVA, R² = 0.096, P = 0.134). Each point represents the value for an individual bee, with the color showing its collection location (Sichuan or Taiwan). Note that samples B0108, B0120, B14756, B14757, and B14758 had similar Gilliamella SDP compositions and therefore overlap in the figure. The shaded ellipses represent 95% confidence intervals on the ordination. (E) Relative abundances of Gilliamella OTUs in the gut microbiota of A. cerana assigned by clustering at 97% or 99% thresholds for 16S V4 and frr. The results shown in the heatmap are the logarithms of the relative abundances of the OTUs or five SDPs. Individual bees are marked to the right of each row. Gray boxes indicate a relative abundance of zero.
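The two community statistics named in panels (C) and (D) can be sketched as follows. This is a minimal illustration assuming the natural logarithm for the Shannon index (the caption does not state the base) and abundance vectors listed in the same SDP order for both samples. The function names are hypothetical.

```python
import math


def shannon_diversity(abundances):
    """Shannon diversity index H = -sum(p_i * ln p_i).

    Natural log is an assumption; zero-abundance entries contribute nothing.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    props = [a / total for a in abundances if a > 0]
    return sum(p * math.log(1 / p) for p in props)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors.

    Both vectors must list the same SDPs in the same order.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    denominator = sum(u) + sum(v)
    if denominator <= 0:
        raise ValueError("at least one vector must have positive abundance")
    return sum(abs(a - b) for a, b in zip(u, v)) / denominator
```

A pairwise matrix of `bray_curtis` values across bees is the kind of input a PCoA ordination would take; the ordination and PERMANOVA steps themselves are not reproduced here.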