Žana Kapustina, Justina Medžiūnė, Gediminas Alzbutas, Irmantas Rokaitis, Karolis Matjošaitis, Gytis Mackevičius, Simona Žeimytė, Laurynas Karpus, Arvydas Lubys.
Abstract
Sequence-based characterization of bacterial communities has long been hostage to the limitations of both 16S rRNA gene sequencing and whole metagenome sequencing. Neither approach is universally applicable, and the main efforts to resolve these constraints have been devoted to improving computational prediction tools. Here, we present semi-targeted 16S rRNA sequencing (st16S-seq), a method designed for sequencing the V1–V2 regions of the 16S rRNA gene along with the genomic locus upstream of the gene. By in silico analysis of 13 570 bacterial genome assemblies, we show that genome-linked 16S rRNA sequencing is superior to individual hypervariable regions or full-length gene sequences in terms of classification accuracy and identification of gene copy numbers. Using mock communities and soil samples, we experimentally validate st16S-seq and benchmark it against established microbial classification techniques. We show that st16S-seq delivers accurate estimation of 16S rRNA gene copy numbers, enables taxonomic resolution at the species level and closely approximates community structures obtainable by whole metagenome sequencing.
Keywords: 16S rRNA; high-throughput microbiome profiling; semi-targeted sequencing; targeted DNA sequencing
Year: 2021 PMID: 34473015 PMCID: PMC8715429 DOI: 10.1099/mgen.0.000624
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Fig. 1. Species-level discriminatory power of st16S-seq on complex samples and comparison with conventional techniques. (a) Shannon diversity indices obtained for six soil samples sequenced using various library preparation approaches. (b) The fractions of taxa detected in each dataset as compared to either the total number of taxa identified in all soil samples by all sequencing methods (reference value ‘All’) or to taxa identified only by whole metagenome sequencing (reference value ‘WGS’). The analysis was conducted considering taxa for which abundance was above the defined minimum thresholds. (c) Principal components analysis considering the relative abundance of reads assigned per bacterial species across different soil samples processed by various library preparation techniques. The graph on the right depicts data that cluster near WGS. The dataset label ‘b’ stands for the downsampling level equivalent to the amount of on-target reads in st16S-seq datasets. The same downsampling strategy applies to WGS samples in all cases. The dataset label ‘s’ denotes the downsampling level equivalent to the number of unique reads in st16S-seq datasets retained after deduplication. The data for st16S-seq correspond to unique on-target reads in all cases.
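Panel (a) of this figure reports Shannon diversity indices. As a minimal sketch of how such an index is computed from per-species read counts, the function below uses the standard definition H = -Σ p_i ln p_i. The natural logarithm, the function name and the example counts are assumptions made for illustration; the record does not state which log base or software the authors used.

```python
import math


def shannon_diversity(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with non-zero counts.

    `counts` is a sequence of read counts, one per taxon. The natural log is
    an assumption; the source does not specify the base.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    h = 0.0
    for c in counts:
        if c > 0:  # taxa with zero reads contribute nothing
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical read counts per species (illustrative, not data from the study)
even_community = [25, 25, 25, 25]
skewed_community = [97, 1, 1, 1]

h_even = shannon_diversity(even_community)
h_skewed = shannon_diversity(skewed_community)
```

For a perfectly even community of four species the index equals ln 4, and it falls as one species comes to dominate, which is why the index is sensitive to how evenly a library preparation method recovers taxa.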
Fig. 2. Validation of st16S-seq on mock community DNA standards and comparison with conventional techniques. (a) Read distribution across bacterial genera in libraries prepared from ZymoBIOMICS Microbial Community DNA standards with various commercially available kits and the st16S-seq approach. Numbers above bars indicate Pearson’s correlation coefficients between the expected and obtained read distributions. Two replicates are shown for each sample. (b) Read distribution across bacterial genera in libraries prepared from ATCC microbiome standard (ATCC MSA-1002) DNA with various commercially available kits and the st16S-seq approach. Numbers above bars indicate Pearson’s correlation coefficients between the expected and obtained read distributions. Two replicates are shown for each sample. (c) The number of 16S rRNA gene copies detected by st16S-seq within genomes of the members of the ZymoBIOMICS Microbial Community DNA standard that equates to the number of 16S rRNA contigs after removal of artefactual sequences. (d) Species-level characterization of mock microbial communities using the NCBI database as a reference and either unmerged reads or only merged reads as an input for Kraken. Bars represent fractions of identified species as compared to the expected compositions.
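Panels (a) and (b) annotate each library with Pearson’s correlation coefficient between the expected and obtained read distributions. The sketch below shows one way to compute that coefficient from two vectors of per-genus read fractions. The function name and the example fractions are hypothetical and are not taken from the mock community specifications or from the study.

```python
import math


def pearson_r(expected, observed):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(expected) != len(observed) or len(expected) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    n = len(expected)
    mean_x = sum(expected) / n
    mean_y = sum(observed) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    var_x = sum((x - mean_x) ** 2 for x in expected)
    var_y = sum((y - mean_y) ** 2 for y in observed)
    if var_x == 0 or var_y == 0:
        # A perfectly even expected composition has zero variance,
        # so the coefficient is undefined in that case.
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-genus read fractions (illustrative only)
expected_fractions = [0.30, 0.25, 0.20, 0.15, 0.10]
observed_fractions = [0.28, 0.27, 0.18, 0.16, 0.11]

r = pearson_r(expected_fractions, observed_fractions)
```

A value close to 1 indicates that the library reproduces the expected proportions up to a linear rescaling. The coefficient cannot be computed when the expected composition is perfectly even, because the expected vector then has no variance.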
Fig. 3. Discriminatory power of genomic regions upstream of the 16S rRNA gene. (a) Shannon entropy values of sequence regions upstream and within the 16S rRNA gene. Multiple sequence alignments were built on the basis of the database created in this study. (b) The percentage of identifiable 16S rRNA gene copy numbers as assessed by various regions of the 16S rRNA gene. (c) Krona charts depicting in silico estimated classification accuracy at the species level. The outer ring corresponds to the genus/family level. The size of circular fragments is proportional to the number of sequences belonging to the rank. For near-16S sequences, the length of included genomic fragments is indicated (100, 400, 1000 bp). In all cases, near-16S regions were linked to V1–V2 16S rRNA sequences. (d) Distribution of mean Shannon entropy values at different taxonomic ranks as assessed for sequences upstream of the 16S rRNA gene. Each boxplot represents 20 members of the taxonomic rank, with a maximum of five sub-members for each member of the rank and up to two strains per species. Centre line – median, box limits – upper and lower quartiles, whiskers – 1.5× interquartile range, points – outliers. (e) Distribution of fractions of identifiable strains within species. Each boxplot represents 100 species with the highest number of strains. For near-16S regions, the upstream fragment length (1000, 800, 600, 400 and 200 bp) is indicated. Near-16S regions in all cases were linked to the V1–V2 16S rRNA sequences. Centre line – median, box limits – upper and lower quartiles, whiskers – 1.5× interquartile range, points – outliers.
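Panels (a) and (d) rely on Shannon entropy computed over multiple sequence alignments. A minimal sketch of a per-column entropy calculation is given below. The choice of log base 2, the decision to ignore gap characters, the function names and the toy alignment are all assumptions for illustration; the record does not describe how the authors handled gaps or which base they used.

```python
import math
from collections import Counter


def column_entropy(column, gap_chars="-"):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    symbols = [s for s in column if s not in gap_chars]
    if not symbols:
        return 0.0  # a gap-only column carries no information here
    n = len(symbols)
    h = 0.0
    for count in Counter(symbols).values():
        p = count / n
        h -= p * math.log2(p)
    return h


def alignment_entropy(aligned_seqs):
    """Per-column entropy for a list of equal-length aligned sequences."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*seqs) transposes rows (sequences) into columns (positions)
    return [column_entropy(col) for col in zip(*aligned_seqs)]


# Toy alignment (hypothetical sequences, not from the study's database)
toy_alignment = ["ACGT", "ACGA", "ACCA", "ATCG"]
entropies = alignment_entropy(toy_alignment)
mean_entropy = sum(entropies) / len(entropies)
```

Fully conserved columns score 0 bits, while columns with many different residues score higher. Averaging the per-column values over a region gives a single variability figure that can be compared between regions or taxonomic ranks.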
Fig. 4. Semi-targeted sequencing approach. (a) Outline of st16S-seq library preparation. (b) Read coverage of each of the six 16S rRNA gene copies and upstream regions within the genome. (c) The structure of oligonucleotide-tethered dideoxynucleotides (OTDDNs) as exemplified by oligo-modified ddUTP.