| Literature DB >> 33208514 |
Luisa W Hugerth1, Marcela Pereira1, Yinghua Zha1, Maike Seifert1, Vilde Kaldhusdal2, Fredrik Boulund1, Maria C Krog3,4, Zahra Bashir3,5, Marica Hamsten1, Emma Fransson1,6, Henriette Svarre-Nielsen3,7, Ina Schuppe-Koistinen1, Lars Engstrand8.
Abstract
The vaginal microbiome has been connected to a wide range of health outcomes. This has led to a thriving research environment but also to the use of conflicting methodologies to study its microbial composition. Here, we systematically assessed best practices for the sequencing-based characterization of the human vaginal microbiome. As far as 16S rRNA gene sequencing is concerned, the V1-V3 region performed best in silico, but limitations of current sequencing technologies meant that the V3-V4 region performed equally well. Both approaches presented very good agreement with qPCR quantification of key taxa, provided that an appropriate bioinformatic pipeline was used. Shotgun metagenomic sequencing presents an interesting alternative to 16S rRNA gene amplification and sequencing but requires deeper sequencing and more bioinformatic expertise and infrastructure. We assessed different tools for the removal of host reads and the taxonomic annotation of metagenomic reads, including a new, easy-to-build and -use reference database of vaginal taxa. This curated database performed as well as the best-performing previously published strategies. Despite the many advantages of shotgun sequencing, none of the shotgun approaches assessed here agreed with the qPCR data as well as the 16S rRNA gene sequencing.IMPORTANCE The vaginal microbiome has been connected to various aspects of host health, including susceptibility to sexually transmitted infections as well as gynecological cancers and pregnancy outcomes. This has led to a thriving research environment but also to conflicting available methodologies, including many studies that do not report their molecular biological and bioinformatic methods in sufficient detail to be considered reproducible. This can lead to conflicting messages and delay progress from descriptive to intervention studies. By systematically assessing best practices for the characterization of the human vaginal microbiome, this study will enable past studies to be assessed more critically and assist future studies in the selection of appropriate methods for their specific research questions.Entities:
Keywords: 16S rRNA; PCR; amplicon; human microbiome; metagenomics; molecular methods; quantitative methods; vaginal microbiome
Year: 2020 PMID: 33208514 PMCID: PMC7677004 DOI: 10.1128/mSphere.00448-20
Source DB: PubMed Journal: mSphere ISSN: 2379-5042 Impact factor: 4.389
List of primer pairs considered for in silico analysis, including region, sequence, and citation
| Variable region | Forward primer position | Forward primer sequence | Reverse primer position | Reverse primer sequence | Reference |
|---|---|---|---|---|---|
| V1-V2 | 27f | AGAGTTTGATCCTGGCTCAG | 338r | GCTGCCTCCCGTAGGAGT | |
| V1-V3 | 27f | AGAGTTTGATCCTGGCTCAG | 534r | ATTACCGCGGCTGCTGG | |
| V1-V3 | 27f-pool | 4× AGAGTTTGATYMTGGCTCAG; 1× AGGGTTCGATTCTGGCTCAG; 1× AGAATTTGATCTTGGTTCAG | 515r | TTACCGCGGCKGCTGVCAC | |
| V1-V3 | 27f-pool | 4× AGAGTTTGATYMTGGCTCAG; 1× AGGGTTCGATTCTGGCTCAG; 1× AGAATTTGATCTTGGTTCAG | 534r | ATTACCGCGGCTGCTGG | |
| V3-V4 | 319f | ACTCCTRCGGGAGGCAGCAG | 806r | GGACTACHVGGGTWTCTAAT | |
| V3-V4 | 341f | CCTACGGGNGGCWGCAG | 805r | GACTACHVGGGTATCTAATCC | |
| V3-V5 | 357f | CCTACGGGAGGCAGCAG | 926r | CCGTCAATTCMTTTRAGT | |
| V4 | 515f | GTGCCAGCMGCCGCGGTAA | 806r | GGACTACHVGGGTWTCTAAT | |
| V4-V5 | 515f | GTGCCAGCMGCCGCGGTAA | 907r | CCGTCAATTCMTTTRAGT | |
| V6 | 967f | CAACGCGARGAACCTTACC | 1061r | ACAACACGAGCTGACGAC |
Summary of the analyses presented in this work, including parameters varied and where to find the relevant results
| Data type | Goal of analysis | Parameters assessed | Parameters kept constant | Ideal scenario | Figure(s) or table(s) presenting results |
|---|---|---|---|---|---|
| Assess the percentage of vaginotropic species captured by different common primer pairs | Primer pairs 27f-338r, 27f-515r, 27f-534r, 319f-806r, 341f-805r, 357f-926r, 515f-806r, 515f-907r, 967f-1061r | No errors were simulated | The ideal primer set would give 100% coverage of all genera in the database | ||
| Assess how often amplicons can be annotated to species level | Primer pairs 27f-338r, 27f-515r, 27f-534r, 319f-806r, 341f-805r, 357f-926r, 515f-806r, 515f-907r, 967f-1061r; directly mapping amplicons or using the DADA2 classifier | No errors were simulated; Only the SILVA128 database was assessed | The ideal primer set would give 100% correct species-level annotation regardless of the method used | ||
| Amplicons | Assess the reproducibility of PCR triplicates | Three primer pairs (27f-515r, 27f-534r, 341f-805r) were each used in triplicate to amplify 8 different pools of samples | PCR parameters were not varied | Each triplicate would perfectly align, as well as triplicates from different primer pairs | |
| Amplicons | Evaluate the possibility of reducing PCR biases by using a single-step PCR amplification | 2-step (20 + 10 cycles) vs. 1-step (25 cycles) of amplification | PCR parameters for the first PCR are identical to the ones for the 1-step process; cleaning procedures are identical | Reducing PCR cycles would be cost effective and reduce PCR artifacts | |
| Amplicons | Assess whether reads covering the V1-V3 regions can be merged | Total no. of reads merged and taxonomic bias in merging | Merging and taxonomic annotation procedures | Reads would be merged at a high rate and with no taxonomic bias | |
| Amplicons | Assess the effects of the read processing strategy used on the estimated alpha-diversity | Clustering at 97%, DADA2 error correction and Unoise error correction | Read trimming and merging/concatenation were not varied | Each procedure would generate a comparable alpha-diversity and cluster size profile regardless of the parameters used | |
| Amplicons | Assess the effects of different annotation strategies on the perceived taxonomic profile | Mapping vs DADA2 classifier; SILVA, RDP, and GTDB databases | Mapping vs. DADA2 classifier was assessed against the SILVA database | Each procedure would generate a comparable taxonomic profile, which would also agree with the qPCR results | |
| Shotgun sequencing | Assess the effect of different human DNA removal strategies on the rate of microbial and human reads retained | BMTagger, BBMap, Kraken in quick and sensitive modes, Bowtie2 in quick mode | Bowtie2 in sensitive mode was used as the gold standard of comparison | All microbial reads would be kept, while all human reads would be discarded | |
| Shotgun sequencing | Assess the taxonomic bias of human read removal | BMTagger, BBMap, Kraken in quick and sensitive modes, Bowtie2 in quick mode | Bowtie2 in sensitive mode was used as the gold standard of comparison | The microbial reads wrongly discarded would be evenly distributed across the microbial tree of life | No differences found |
| Shotgun sequencing | Assess the effect of different taxonomic annotation strategies on the perceived microbial profile | Metaphlan, Metalign, Kraken2 against their standard microbial database, Kraken2 against the OptiVag database and VIRGO | Parameters were not varied within each classifier | Each procedure would generate a comparable taxonomic profile, which would also agree with the qPCR results |
Coverage of each primer assessed individually
| Primer | No. of counts | % coverage |
|---|---|---|
| 27f, simple | 249 | 26.8 |
| 27f, pool | 297 | 31.9 |
| 319f | 896 | 96.3 |
| 338r | 909 | 97.7 |
| 357f | 892 | 95.9 |
| 515f | 898 | 96.6 |
| 534f | 896 | 96.3 |
| 806r | 848 | 91.2 |
| 907/926r | 842 | 90.5 |
| 967f | 670 | 72.0 |
| 1061r | 55 | 5.9 |
Coverage of primers in relevant pairs
| Primer pair | Approach | Proportion (%) of database matched |
|---|---|---|
| 27f-338r | Pessimistic | 26.1 |
| 27f-338r | Optimistic | 95.7 |
| 27pool-515r | Pessimistic | 30.9 |
| 27fpool-515r | Optimistic | 96.6 |
| 27f-534r | Pessimistic | 25.6 |
| 27f-534r | Optimistic | 96.3 |
| 27pool-534r | Pessimistic | 25.6 |
| 27pool-534r | Optimistic | 96.3 |
| 319f-806r | NA | 87.8 |
| 341f-805r | NA | 88.5 |
| 357f-926r | NA | 87.6 |
| 515f-806r | NA | 88.9 |
| 515f-907r | NA | 88.1 |
| 967f-1061r | NA | 67.3 |
NA, not applicable.
FIG 1Heat map of taxon coverage at the genus level for commonly used primer pairs. Each column represents a primer pair, and each row depicts a vaginotropic genus. The percentage of sequences in each genus covered by each primer is indicated through a color scale. Most previously published work uses 16S primers with good coverage, but a few genera remain a problem, such as Chlamydia and Sneathia.
FIG 2Bar plots showing the taxonomic classification accuracy of each primer pair under two classification schemes. DADA2 taxonomic annotation gives higher taxonomic resolution than mapping to a comparable database, both in general and for Lactobacillus in particular. The entire OptiVag database was extracted in silico with each of the candidate primer sets, without errors. (Top left) The complete database, annotated by mapping; (top right) the complete database, annotated with DADA2’s algorithm; (bottom left) same as panel a but focusing only on Lacobacillus; (bottom right) same as panel b but focusing only on Lactobacillus.
FIG 3Effect of various analysis parameters on alpha- and beta-diversity of real amplicons. Orange, V3-V4, 2-step PCR; blue, V3-V4, 1-step PCR; green, V1-V3–515r; gray, V1-V3–534r. (a) Nonmetric multidimensional scaling of the 8 pools, processed in triplicate with V3-V4 primers, shows good replicability within triplicates and regardless of PCR set-up (single-step versus nested reactions). (b) Chao1 richness estimate for each of the samples in panel a. The 1-step PCR approach generally yields a lower richness estimate but has slightly higher variability within triplicates. (c) Box plots depicting the estimated relative abundance of Lactobacillus spp. in each sample when the reads were merged or simply concatenated. There is a disproportional loss of Lactobacillus spp. upon attempting to merge long amplicons. (d) Bar plots showing the number of ASV or operational taxonomic units (OTU) of different cluster size classes obtained with each primer pair and error correction or clustering method. The effect of these choices on alpha-diversity estimates can be seen in Fig. S1.
FIG 4Effects of various parameters on the taxonomic annotation of real amplicons. The taxonomic accuracy of each of the 16S primer sets is good, but V1-V3–534r yields more shallow annotations. It can, however, reliably detect Chlamydia trachomatis spike-in DNA. (a) Box plots showing the depth of classification for each sequence with different classification strategies. The DADA2 classifier yielded higher taxonomic resolution thatn simply mapping, regardless of the database used. (b) Taxonomy bar plots for each of the pools, processed with 2-step PCR with each of the primer sets and with each of the databases SILVA, RDP, and GTDB. An average for each triplicate is shown. Each technical replicate can be seen in Fig. S2. Only ASV with >10 counts are included in this figure. (c) Same samples as in panel a, compared to qPCR results for Lactobacillus iners, Lactobacillus crispatus, and Gardnerella vaginalis. For each sample, the sum of these three taxa was normalized to 1, to make them comparable to the qPCR results in the triaxial plot.
FIG 5Effect of human-DNA removal strategies on the amount of bacterial and human DNA retained. The human-DNA content in the 8 pools varied from 86 to 98% of the total DNA content. Different DNA removal methods have various amounts of human DNA left in the filtered sample but also retain various amounts of the original microbial pool. Kraken and BMTagger remove most human DNA but also the most microbial reads. (a) Percentage of human reads in each sample before and after each human removal strategy. (b) Percentage of the original pool of microbial reads kept in each sample after each human removal strategy. (c) The two measurements in a and b are combined into a scatterplot to give an overview of the performance of each tool.
FIG 6Effect of taxonomy assignment strategy on the perceived taxonomic profile of each sample. Assigning taxonomy to shotgun metagenomic reads with various tools yields somewhat different community profiles. (a) Taxonomy for each pool assigned with Metaphlan, Metalign, or Kraken2 to its complete microbial database, Kraken2 to the OptiVag database, or VIRGO. (b) Same samples as in panel a, compared to qPCR results for Lactobacillus iners, Lactobacillus crispatus, and Gardnerella vaginalis. For each sample, the sum of these three taxa was normalized to 1, to make them comparable to the qPCR results in the triaxial plot. (c) Manhattan distance between each sample and method and its corresponding qPCR profile. In this three-dimensional structure, the Manhattan distance is strictly limited between 0 (identical profiles) and 3 (maximum distance for each of the three species considered).