Jean-Sebastien Gounot, Minghao Chia, Denis Bertrand, Woei-Yuh Saw, Aarthi Ravikrishnan, Adrian Low, Yichen Ding, Amanda Hui Qi Ng, Linda Wei Lin Tan, Yik-Ying Teo, Henning Seedorf, Niranjan Nagarajan.
Abstract
Despite extensive efforts to address it, the vastness of uncharacterized 'dark matter' microbial genetic diversity can impact short-read sequencing based metagenomic studies. Population-specific biases in genomic reference databases can further compound this problem. Leveraging advances in hybrid assembly (using short and long reads) and Hi-C technologies in a cross-sectional survey, we deeply characterized 109 gut microbiomes from three ethnicities in Singapore to comprehensively reconstruct 4497 medium and high-quality metagenome assembled genomes, 1708 of which were missing in short-read only analysis and with >28× N50 improvement. Species-level clustering identified 70 (>10% of total) novel gut species out of 685, improved reference genomes for 363 species (53% of total), and discovered 3413 strains unique to these populations. Among the top 10 most abundant gut bacteria in our study, one of the species and >80% of strains were unrepresented in existing databases. Annotation of biosynthetic gene clusters (BGCs) uncovered more than 27,000 BGCs with a large fraction (36-88%) unrepresented in current databases, and with several unique clusters predicted to produce bacteriocins that could significantly alter microbiome community structure. These results reveal significant uncharacterized gut microbial diversity in Southeast Asian populations and highlight the utility of hybrid metagenomic references for bioprospecting and disease-focused studies.
Year: 2022 PMID: 36229545 PMCID: PMC9561172 DOI: 10.1038/s41467-022-33782-z
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 17.694
Fig. 1: Assembly strategy for high-quality microbiome references.
a Boxplots showing the number of MAGs obtained across metagenomic datasets using short-read and hybrid assemblies (n = 109). b Stacked barchart showing genus-specific breakdown of the number of MAGs obtained using short-read and hybrid assemblies (left) and boxplots for corresponding relative abundances of the genera (right). N represents the number of hybrid only MAGs for each genus. c Scatter-plot showing the relative abundance of Bifidobacterium genomes estimated using short-read or hybrid assemblies for a sample (y-axis) versus corresponding relative abundances obtained using the standard Kraken2 database (x-axis). Points found along the x-axis represent Bifidobacterium species found using the Kraken standard database but not found using either short-read or hybrid MAGs. d Violin plots showing the distribution of a contiguity metric (N50 – largest contig size where >50% of the genome is in larger contigs) for short-read and hybrid assembly based MAGs. e Stacked barcharts showing the relative proportion of MAGs satisfying different MIMAG quality standards with short-read and hybrid assemblies of SPMP datasets. f Violin plots showing the relative improvement in contiguity (N50) obtained using hybrid assembly MAGs from SPMP relative to matched genomes in the GTDB database. g Barcharts showing the number of GTDB reference genomes which were improved from medium to high MIMAG quality using SPMP MAGs. Center lines in the boxplots represent median values, box limits represent upper and lower quartile values, whiskers represent 1.5 times the interquartile range above the upper quartile and below the lower quartile, and all data points are represented as dots in the figures. Source data are provided as a source data file.
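The N50 contiguity metric used in panels d and f can be illustrated with a minimal sketch. It assumes the common convention that N50 is the contig length at which the running total of contigs, sorted from longest to shortest, first reaches half of the assembled bases. The function name and the contig lengths below are hypothetical and are not taken from the SPMP data or the authors' pipeline.

```python
def n50(contig_lengths):
    """Return N50: the length L such that contigs of length >= L
    together cover at least half of the total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (bp) for the same genome assembled two ways.
short_read_mag = [50_000, 40_000, 30_000, 20_000, 10_000]
hybrid_mag = [120_000, 30_000]

# Relative contiguity improvement, analogous to the fold change in panel f.
fold_improvement = n50(hybrid_mag) / n50(short_read_mag)
```

Both toy assemblies contain the same number of bases, so the difference in N50 reflects fragmentation only, which is the property the violin plots compare.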
Fig. 2: Characterization of novel species, strains and gene families in SPMP genomes.
a Rarefaction analysis showing that the SPMP database covers a substantial fraction of the species level diversity in its MAGs. Error bands represent 95% confidence intervals. b Pie-chart showing the breakdown of species-level clusters in SPMP that have an isolate genome, only have MAGs (uncultivated) and are novel compared to genomes in public databases (UHGG, GTDB, SGB). c Stacked barcharts showing the number of SPMP strains that have an isolate genome, only have MAGs (uncultivated), and are novel compared to all UHGG genomes (>200,000, <99% ANI). The species shown are the top 20 in terms of median relative abundance in SPMP (most abundant on the left). d Stacked barcharts showing the number of BGCs (top) and GCFs (bottom) in different product classes that are present or absent in existing annotations comprising the antiSMASH and MiBIG databases as well as antiSMASH annotations from HRGM. Inset piecharts show the overall breakdown. e Synteny plots showing the conservation of gene order and orientation (colored arrows, relatedness shown by vertical lines) for a novel GCF (GCF382) and related families. f Network diagrams depicting correlations between gut microbial species (nodes – species, edges – significant correlations) and overall microbiome structure in SPMP metagenomes when stratified based on presence or absence of GCF 382/271/37 (or missing the corresponding transporter gene) in a Blautia species (enlarged teal node, solid edges to correlated species, dashed edges between other nodes). Source data are provided as a source data file.