The quantification of the total microbial content in metagenomic samples is critical for investigating the interplay between the microbiome and its host, as well as for assessing the accuracy and precision of the relative microbial composition which can be strongly biased in low microbial biomass samples. In the present study, we demonstrate that digital droplet PCR (ddPCR) can provide accurate quantification of the total copy number of the 16S rRNA gene, the gene usually exploited for assessing total bacterial abundance in metagenomic DNA samples. Notably, using DNA templates with different integrity levels, as measured by the DNA integrity number (DIN), we demonstrated that 16S rRNA copy number quantification is strongly affected by DNA quality and determined a precise correlation between quantification underestimation and DNA degradation levels. Therefore, we propose an input DNA mass correction, according to the observed DIN value, which could prevent inaccurate quantification of 16S copy number in degraded metagenomic DNAs. Our results highlight that a preliminary evaluation of the metagenomic DNA integrity should be considered before performing metagenomic analyses of different samples, both for the assessment of the reliability of observed differential abundances in different conditions and to obtain significant functional insights.
The quantification of the total microbial content in metagenomic samples is critical for investigating the interplay between the microbiome and its host, as well as for assessing the accuracy and precision of the relative microbial composition which can be strongly biased in low microbial biomass samples. In the present study, we demonstrate that digital droplet PCR (ddPCR) can provide accurate quantification of the total copy number of the 16S rRNA gene, the gene usually exploited for assessing total bacterial abundance in metagenomic DNA samples. Notably, using DNA templates with different integrity levels, as measured by the DNA integrity number (DIN), we demonstrated that 16S rRNA copy number quantification is strongly affected by DNA quality and determined a precise correlation between quantification underestimation and DNA degradation levels. Therefore, we propose an input DNA mass correction, according to the observed DIN value, which could prevent inaccurate quantification of 16S copy number in degraded metagenomic DNAs. Our results highlight that a preliminary evaluation of the metagenomic DNA integrity should be considered before performing metagenomic analyses of different samples, both for the assessment of the reliability of observed differential abundances in different conditions and to obtain significant functional insights.
Entities:
Keywords:
DNA integrity; digital PCR; metabarcoding; microbiome content
All supporting data and protocols have been provided within the article or through supplementary data files. Six supplementary figures and six supplementary tables are available with the online version of this article. The sequencing data generated in this study are available from the National Center for Biotechnology Information (NCBI) database under the accession number PRJNA622512 (www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA622512).The identification and functional characterization of microbial communities present in a variety of environments are generally carried out through sequencing-based approaches, which provide relative quantifications of taxon occurrence or metabolic pathways, although they may not allow reliable biological interpretation of the functional interplay of microbial species among themselves or with their eventual specific host. This limitation may be overcome through quantification of the microbial content, which could help to provide a more accurate overview of the microbial dynamics in the investigated environments. Here we demonstrate that accurate quantification of the total 16S rDNA copy number in metagenomic DNAs can be performed by digital droplet PCR (ddPCR) and that it is strongly affected by DNA quality, as measured by the DNA integrity number (DIN). Therefore, our study suggests that preliminary accurate quantitative evaluation of the total microbial content by ddPCR is highly recommended before performing metagenomic analyses of different samples; both for the assessment of the reliability of observed differential abundances in different conditions and to obtain significant functional insights.
Introduction
The recent impressive blooming of metagenomics through the concurrent development of high-throughput sequencing platforms has opened up unprecedented avenues for studying the microbiome in a variety of physio-pathological contexts. The identification and functional characterization of microbial communities are generally performed using genome shotgun or amplicon-based sequencing approaches. Independently of the adopted approach, metagenomic projects generally provide relative quantifications of taxon or metabolic pathway abundance [1], limiting reliable biological interpretation of the functional interplay between present microbial species or with their eventual host organism [2]. Indeed, as relative abundance data are mutually dependent, they can lead to misinterpretations and false discoveries and may generate artefactual statistical inferences [3]. Therefore, microbial relative abundance could be reliably interpreted if related to the total bacterial content. In this context, the absolute quantification of the microbial species can assist in the reliable assessment of the functional impact of the microbiome as well as in evaluating pathogenicity [4]. For example, the absolute concentration of a pathogen is a specific marker of disease severity and could suggest the most adequate therapeutic strategy [5]. Total bacterial abundance is generally based on the quantification of the total number of copies of the 16S rRNA gene – a conserved prokaryotic gene with hypervariable sequences that differ between species – present in the metagenomic sample [6, 7]. Several aspects need to be carefully considered in microbial quantification studies in order to avoid the misinterpretation of experimental results [8, 9]. First, there is the sensitivity and specificity of the quantitative approach used. Quantitative PCR (qPCR) of a target gene has been widely used for the absolute quantification of bacterial content in recent years [10, 11]. While accurate, qPCR is limited by the need for a standard curve or a reference gene and many technical replicates, and has the problem that the detectable copy number is affected by the presence of inhibitors that are commonly present in metagenomic samples [12, 13]. Recently, these limitations have been overcome through the development of droplet digital PCR (ddPCR), which allows absolute DNA quantification without a standard curve and functions over a wide dynamic range with high sensitivity in low copy number detection [14-19]. Indeed, several examples of the application of ddPCR have recently been reported in the literature for the absolute quantification of microbial species [16, 20, 21].Then, for PCR-based methods, accurate selection of the primers targeting the 16S rRNA gene is required in order to broadly and appropriately sample the prokaryotic diversity characterizing the community under investigation, and to reduce biases and preserve the accuracy of abundance estimates [22-24].Furthermore, from complex matrices, such as those used for most microbial studies, for example soil, stool or swabs, the accuracy of bacterial quantification is affected by the efficiency of the DNA extraction method, which can lead to underestimation of genome abundances if DNA loss occurs [25-27].Finally, the overall quality of the metagenomic DNA, such as its integrity as measured by the DNA integrity number (DIN) and the absence of contaminants, is another crucial – but still largely unattended – aspect to be considered in total bacterial abundance studies, as low quality can lead to significantly inaccurate bacterial quantification, especially when comparisons between different communities are performed [28].In the present study, we applied ddPCR to DNAs from single bacterial strains, bacterial mocks and metagenomic samples, demonstrating that it allows accurate absolute quantification of 16S rRNA gene copy numbers. Moreover, using DNA templates with different integrity levels, we demonstrated that 16S copy number quantification is strongly affected by DNA quality, a relevant issue that has not been addressed in previous studies [16, 20, 21, 29, 30]. Remarkably, we determined a precise correlation between quantification underestimation and DNA degradation levels, measured as DIN values, and demonstrated that a correction of the DNA mass, based on the underestimation value, allows one to estimate the 16S copy number accurately even in the most degraded metagenomic DNAs.Our results provide a novel insight into the effect of metagenomic DNA quality on the quantification of microbial abundances and highlight that a preliminary evaluation of the metagenomic DNA integrity is required to assess the reliability of observed differential abundances in different conditions and obtain significant functional insights.
Methods
DNA samples
A plasmid (PGEM3.1) containing a single copy of a portion of the 16S rRNA gene (600 bp long), including the V5–V6 hypervariable regions, was synthesized by a gene synthesis service (GenScript Biotech, NJ, USA). Four different genomic DNA from bacterial strains (, , , ) were purchased from Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, (Braunschweig, Germany). ZymoBIOMICS Microbial Community DNA Standard, a mixture of genomic DNA isolated from pure cultures of eight bacterial and two fungal strains, was purchased from Zymo Research (Irvine, CA, USA) (Table 1). A laboratory mock was prepared by mixing together 2.1 pg of , 1 pg of s, 2.2 pg of and 1.2 pg of . A mixture of ZymoBIOMICS mock and human genomic DNA (ZymoBIOMICS/humDNA mixture) was prepared by combining 16 pg of ZymoBIOMICS DNA Standard with 200 ng of human genomic DNA.
Table 1.
Bacterial strains and the microbial community used in 16S absolute quantification by ddPCR. For each species, the genome size, the 16S rRNA gene copy number per genome and 16S rRNA gene copies ng−1 of DNA are reported
Bacterial species
Genome size (Mb)
16S rRNA copies/genome
16S rRNA copies ng−1
Bacteriovorax stolpii (DSM12778)
3.81
2
477 893
Deinococcus radiodurans (DSM20539)
3.28
3
834 550
Lactobacillus plantarum (DSM2601)
3.26
5
1 398 888
Bacteroides vulgatus (DSM1447)
5.16
7
1 239 104
ZymoBIOMICS mock
Pseudomonas aeruginosa
6.79
4
538 000
Escherichia coli
4.87
7
1 309 000
Salmonella enterica
4.76
7
1 344 000
Lactobacillus fermentum
1.90
5
2 395 000
Enterococcus faecalis
2.84
4
1 284 000
Staphylococcus aureus
2.73
6
2 004 000
Listeria monocytogenes
2.99
6
1 830 000
Bacillus subtilis
4.04
10
2 260 000
Saccharomyces cerevisiae
12.1
/
/
Cryptococcus neoformans
18.9
/
/
Bacterial strains and the microbial community used in 16S absolute quantification by ddPCR. For each species, the genome size, the 16S rRNA gene copy number per genome and 16S rRNA gene copies ng−1 of DNA are reportedBacterial speciesGenome size (Mb)16S rRNA copies/genome16S rRNA copies ng−1(DSM12778)3.812477 893(DSM20539)3.283834 550(DSM2601)3.2651 398 888(DSM1447)5.1671 239 104ZymoBIOMICS mock6.794538 0004.8771 309 0004.7671 344 0001.9052 395 0002.8441 284 0002.7362 004 0002.9961 830 0004.04102 260 000Saccharomyces cerevisiae12.1//Cryptococcus neoformans18.9//Stool samples from 13 volunteers from the laboratory group were collected and stored at −80 °C until use. Metagenomic DNA was extracted using the Fast DNA Spin Kit for Soil (MP Biomedicals, Santa Ana, CA, USA), according to the manufacturer’s instructions. A 40 s bead-beating step at speed 6 was applied using the FastPrep Instrument (BIO 101, Carlsbad, Canada). DNA was eluted in 100 µl and stored at −80 °C.
Assessment of DNA integrity and concentration
DNA integrity was evaluated using the Agilent TapeStation 2200 System (Agilent, Santa Clara, CA, USA) and the Genomic DNA ScreenTape assay (Agilent, Santa Clara, CA, USA). The DIN was determined for each sample using Agilent 2200 TapeStation software (controller version A.01.05). DNA concentration was assessed by fluorimetry using the Quant-iT PicoGreen dsDNA Assay Kit (Invitrogen, Carlsbad, CA, USA) on a NanoDrop 3300 Fluorospectrometer (Thermo Fisher Scientific, Waltham, MA, USA).
Generation of DNA samples with decreasing DIN
DNA samples with increasing fragmentation rates were obtained using the Covaris M220 focused-ultrasonicator (Thermo Fisher Scientific, Waltham, MA, USA), setting variable time (seconds) and duty factor (DF) in combination with fixed peak incident power (PIP) of 50 W and 200 cycles per burst.For the ZymoBIOMICS/humDNA mixture, starting from the untreated sample (DIN9, DIN value of 9.1±0.1), six samples (each with five independent replicates) with increasing levels of degradation (7.5±0.3, 6.2±0.4, 4.5±0.3, 3.5±0.3, 2.2±0.3, 1.5±0.3) were prepared and named DIN7, DIN6, DIN4, DIN3, DIN2 and DIN1, respectively. The time and DF settings were: DIN7 : 5 s and 1 % DF; DIN6 : 5 s and 2 % DF; DIN4 : 3–5 s and 10 % DF; DIN3 : 20 s and 10 % DF; DIN2 : 20 s and 12 % DF; DIN1 : 35 s and 20 % DF (Fig. S1, available in the online version of this article).Metagenomic DNAs, with an average DIN of 6.5±0.6 (DIN6), were diluted at a concentration of 4 ng µl−1 and concentration was confirmed by fluorimetric assay. Approximately two hundred nanograms of DNA were transferred in a Covaris microTube-50 (Thermo Fisher Scientific, Waltham, MA, USA) and sonicated to prepare three progressive degradation points, using the time and DF settings as reported: DIN 4 : 5–10 s and 3 % DF; DIN 2 : 10–20 s and 12 % DF; DIN 1 : 15–30 s and 20 % DF. The final samples had the following average DIN values: 4.5±1 (DIN4), 2.05±0.35 (DIN2) and 1.35±0.35 (DIN1) (Fig. S2).
ddPCR experiments
ddPCR (Bio-Rad, Hercules, CA, USA) was performed to determine the total number of 16S copies by using universal primers targeting the V5–V6 regions of 16S rDNA [31], as they have been used successfully in other DNA metabarcoding analyses (primer sequences: forward, B-V5 : 5′-ATTAGATACCCYGGTAGTCC-3′; reverse, A-V6, 5′-ACGAGCTGACGACARCCATG-3′) [32, 33]. For the quantification of the gene copy numbers for 16S rRNA, specific primers targeting the V7–V8 regions were used (primer sequence: forward, AM1 : 5′-CAGCACGTGAAGGTGGGGAC-3′; reverse, AM2 : 5′-CCTTGCGGTTGGCTTCAGAT-3′) [34].Each DNA sample was diluted to a concentration of 1 ng µl−1 and the concentration was confirmed by fluorimetric assay. Starting from this concentration, serial dilutions were made for each DNA. The final dilution factor used in the ddPCR reaction was chosen on the basis of preliminary tests in order to balance positive events versus negative events and optimize quantification reliability.A reaction volume of 22 µl was prepared by combining the diluted DNA with 11 µl of 2× Evagreen Supermix (Bio-Rad, Hercules, CA, USA), 0.39 µl of 10 µM forward and reverse universal 16S primers or 0.44 µl of 10 µM forward and reverse specific 16S A. primers, and water. Emulsion was produced in the QX200 Droplet Generator (Bio-Rad, Hercules, CA, USA) according to the manufacturer’s instructions. The thermal cycling conditions were as follows: for total 16S copies quantification: 1 cycle at 95 °C for 5 min, 40 cycles at 95 °C for 30 s and 58 °C for 1 min, 1 cycle at 4 °C for 5 min, 1 cycle at 90 °C for 5 min, final hold at 4 °C; for A. 16S copy quantification: 1 cycle at 95 °C for 5 min, 40 cycles at 95 °C for 30 s and 60 °C for 1 min, 1 cycle at 4 °C for 5 min, 1 cycle at 90 °C for 5 min, final hold at 4 °C. Each DNA sample was analysed at least in duplicate, preparing independent dilutions. For each experiment, a negative control (no template control) was used. Absolute quantification was performed using QuantaSoft version 7.4.1 software (Bio-Rad, Hercules, CA, USA,) and the negative/positive thresholds were set manually, excluding samples with a number of droplets <10 000. Output results were expressed in 16S copies µl−1.
Estimation of 16S rRNA gene copy number
The expected copy number ng−1 DNA of 16S rDNA for the one-copy 16S plasmid, the commercial bacterial species and the ZymoBIOMICS mock was calculated on the basis of the known genome sizes and the 16S rRNA copy number/genome, as reported in Table 1.The ddPCR measured copy number ng−1 DNA was calculated considering the DNA dilution factor used in ddPCR reaction, the fluorimetric DNA concentration verified after dilution to 1 ng µl−1 and the final ddPCR reaction volume (22 µl) as reported in the formula below:A polynomial regression model to predict the percentage underestimation of 16S rDNA copies as a function of the measured DIN was inferred based on the experimental data obtained by using the ZymoBIOMICS/hum DNA mixture (Fig. S3). The model training was carried out in R by using the lm and poly functions belonging to the stat package and plotted with ggplot2. Finally, by using the regression estimation we corrected the input DNA mass in ddPCR experiments on metagenomic DNAs.
Metabarcoding sequencing and data analysis
The protocol used for amplicon library preparation was described previously [32, 33]. The V5–V6 region was amplified using the same primers pair as were used in the ddPCR experiments in five metagenomic DNAs from stool samples (F9–F13), at their high-quality (DIN6), low-quality (DIN1) and mass corrected low-quality (DIN1 input corrected, DIN1-IC) levels. Each DNA was analysed in triplicate and the prepared libraries were sequenced on Illumina MiSeq platform (Illumina, San Diego, CA, USA) to generate 2×250 bp paired end (PE) reads. 30% of the PhiX genome library was loaded in the run to compensate for low base diversity in the amplicon libraries.Overall, 5.6 M PE reads were generated, with an average of 125 000 PE reads per sample (sd 18 000, min 58 783, max 156 957). Raw sequencing data are available in SRA (PRJNA622512). The raw sequencing data were denoised using DADA2, a tool that applies a statistical approach to discriminate between the biological diversity and the noise introduced by both PCR and sequencing, in order to remove the latter [35]. Moreover, it removed reads derived from the PhiX genome and PCR chimeras. About 72 % of the raw reads (sd 10 %, min 49.35 %, max 86.30 %) passed the denoising step and were used to infer 1298 amplicon sequences variants (ASVs). The BioMaS pipeline was applied to annotate the inferred ASVs taxonomically using release 11.5 of the RDP database and the National Center for Biotechnology Information (NCBI) taxonomy for the 16S rRNA reference collection and taxonomy, respectively [36-38]. In particular, comparison of the ASV sequences with the reference collection was performed by means of Bowtie 2 and the resulting alignments were filtered according to query coverage (≥70 %) and identity percentage (≥90 %) [39]. The classification was performed using TANGO: for ASVs obtaining matches with an identity percentage equal to or higher than 97 %, the taxonomic classification at species level was assigned; otherwise, they were classified to higher taxonomic ranks [40, 41].
Statistics
Statistical analyses of ddPCR data were performed using GraphPad Prism 5.0 software (GraphPad Software, San Diego, CA, USA). Student’s t-test was used for statistical comparisons and data were presented as mean±standard deviation (sd). P-values <0.05 were considered to be indicative of statistically significant differences.The qualitative comparison of taxa observed in next-generation sequencing (NGS) data was performed by using an ad hoc-developed Python script. In particular, to infer the number of common and uncommon taxa among the DIN6, DIN1 and DIN-IC DNA samples for each subject (F9-F13), all the taxa observed in only one replicate or with an average relative abundance lower than 0.5 % were filtered out. The retained taxa were collected in three sets corresponding to the tested DIN values. The qualitative comparison between sets was summarized at the six main taxonomic ranks level (i.e. phylum, class, order, family, genus and species).Moreover, a quantitative comparison was performed by measuring for each sample the Pearson correlation between the relative frequencies of observed taxa in DIN6, DIN1 and DIN-IC samples.Finally, by using the Wilcoxon rank test, the relative abundances observed by using ddPCR and NGS data were compared.
Results
ddPCR allows accurate quantification of 16s rRNA gene copy number
The number of copies of the 16S rRNA gene (16S rDNA) were quantified by ddPCR technology, using the universal primers pair targeting the V5–V6 hypervariable regions [31], previously used for DNA metabarcoding investigations [32, 33]. First, we evaluated the specificity of the selected primers pair, performing the analysis on DNA samples with a known number of 16S rDNA copies. In particular, we used a plasmidic DNA, containing one copy of a synthetic 16S rRNA sequence, four individual bacterial species and a mock community (ZymoBIOMICS) with eight bacterial strains (Table 1). As reported in Fig. 1, perfect concordance was observed between the ddPCR estimated and the expected number of 16S copies in all samples, including the 16S single-copy DNA, the four bacterial DNAs analysed both individually and mixed together (lab mock) and the ZymoBIOMICS mock. The expected number of copies of 16S rDNA was calculated for 1 ng of DNA, taking into account both genome size and genomic 16S copy number. Hence, these results demonstrate the reliability of the V5–V6 primers pair for ddPCR-based absolute quantification of 16S rDNA copies.
Fig. 1.
Estimated vs ddPCR-measured total 16S copy number. Comparison between estimated and ddPCR-measured total 16S copies in the plasmid containing a single copy of the V5–V6 region, in the four bacterial DNAs, analysed individually and mixed together (labmock), and in the ZymoBIOMICS mock. 16S copy number is reported per ng of DNA. Each sample was analysed at least in triplicate and results are expressed as mean±sd.
Estimated vs ddPCR-measured total 16S copy number. Comparison between estimated and ddPCR-measured total 16S copies in the plasmid containing a single copy of the V5–V6 region, in the four bacterial DNAs, analysed individually and mixed together (labmock), and in the ZymoBIOMICS mock. 16S copy number is reported per ng of DNA. Each sample was analysed at least in triplicate and results are expressed as mean±sd.
DNA integrity significantly affects 16S rRNA copy number quantification
The extraction of high-quality DNA from complex matrices is often a challenging task, and this, in turn, may affect its accurate quantification [25]. Apart the extraction efficiency, another critical issue is the integrity of the extracted DNA [42], which is usually measured by the DIN using the Agilent 2200 TapeStation system [43].In order to evaluate how DNA integrity affects 16S rDNA copy number quantification in metagenomic DNAs by ddPCR, we first prepared a DNA sample resembling a metagenomic DNA of human origin, consisting of the ZymoBIOMICS DNA mock and human DNA (ZymoBIOMICS/humDNA mock; see the Methods section). This sample was subjected to gradients of DNA shearing by sonication to obtain DNA with increasing levels of degradation, evaluated by measuring the DIN value. We obtained samples with DIN values of 9.1±0.1, 7.5±0.3, 6.2±0.4, 4.5±0.3, 3.5±0.3, 2.2±0.3 and 1.5±0.3 that we named DIN9, DIN7, DIN6, DIN4, DIN3, DIN2 and DIN1, respectively (Fig. S1). Copy number quantification of 16S rDNA was performed for each of these ZymoBIOMICS/humDNA mocks. As shown in Fig. 2, the higher the DNA degradation rate, the greater the underestimation of the measured 16S copy number., The results were statistically significant for all samples, with the exception of the DIN7 sample, whose integrity level was not significantly different from that of the high-quality DIN9 sample (Fig. 2). The rate of 16S underestimation with respect to the DIN9 sample increased progressively up to 57 % in the DIN1 sample (Fig. 2, Table S1). Thus, these results confirm that DNA integrity affects target quantification accuracy to a remarkable degree and suggest that a preliminary evaluation of DNA integrity, by estimating the DIN value, can provide a reliable assessment of the quantification underestimation, which can then be properly corrected.
Fig. 2.
16S rRNA gene copy number quantification is influenced by the quality of the DNA mock. ZymoBIOMICS/humDNA mock at original DIN9 value and its six degradation levels corresponding to DIN7, DIN6, DIN4, DIN3, DIN2 and DIN1 were analysed for 16S copy quantification by ddPCR. The green line represents the total 16S copies ng−1 DNA, with this decreasing with the increase of the degradation level. The red line represents the 16S copy underestimation percentage, with this increasing with the DNA degradation level. 16S copies are reported as mean of at least three replicates for each sample±sd. 16S copy underestimation was calculated as a percentage compared to the highest quality (DIN 9) sample. **, P<0.01; ***, P<0,001.
16S rRNA gene copy number quantification is influenced by the quality of the DNA mock. ZymoBIOMICS/humDNA mock at original DIN9 value and its six degradation levels corresponding to DIN7, DIN6, DIN4, DIN3, DIN2 and DIN1 were analysed for 16S copy quantification by ddPCR. The green line represents the total 16S copies ng−1 DNA, with this decreasing with the increase of the degradation level. The red line represents the 16S copy underestimation percentage, with this increasing with the DNA degradation level. 16S copies are reported as mean of at least three replicates for each sample±sd. 16S copy underestimation was calculated as a percentage compared to the highest quality (DIN 9) sample. **, P<0.01; ***, P<0,001.
Assessment of the underestimation rate for 16s rRNA gene copy number at variable integrity levels of metagenomic DNAs
To evaluate the possibility of quantifying the underestimation rate of ddPCR determination as a function of the degradation state of metagenomic DNA samples, we analysed 13 metagenomic DNAs extracted from stool samples, with an average DIN of 6.5±06 (DIN6). From each of these DNA samples, three additional levels of degradation, corresponding to DIN values of 4.5±1 (DIN4), 2.05±0.35 (DIN2) and 1.35±0.35 (DIN1), were generated by sonication (Fig. S2). For each DNA sample, the number of 16S copies was quantified by ddPCR. We observed a progressive decrease in the 16S copy number correlated with degradation levels (Table S2). Interestingly, we observed a similar extent of copy number decrease for both the ZymoBIOMICS/humDNA mock and the faecal metagenomic samples (Fig. 3a, b). In order to investigate the species specificity of this pattern, we further investigated the 16S copy number in DIN6 and DIN1 faecal DNAs for a specific bacterium, , one of the most promising probiotics for gut microbiota-related diseases [44]. As shown in Fig. 3c, in the DIN1 sample, we observed a reduction of the 16S copy number with respect to the DIN6 sample and, interestingly, this reduction (~55 %) was comparable to that of the total 16S copy number in the same sample (Fig. 3d).
Fig. 3.
Total and species-specific 16S rRNA gene copy number quantification is affected by metagenomic DNA quality. (a) 16S copy quantification in 13 metagenomic DNAs at their original DIN6 value and at three degradation levels, corresponding to DIN4, DIN2 and DIN1 values, compared to the ZymoBIOMICS/humDNA mock at the same integrity levels. (b) 16S copies underestimation percentage for the ZymoBIOMICS/humDNA mock and for the 13 metagenomic DNAs at the same integrity levels. (c) 16S copies measured in five metagenomic DNAs (f9-f13) at their original DIN6 value and at the DIN1 degradation level. (d) 16S copy underestimation percentage (56%) is in accordance with that measured for total 16S copies in the same metagenomic DNAs (f9–f13) at the DIN1 degradation level. 16S copies are reported as the mean of at least two replicates for each sample±sd. 16S copy underestimation was calculated as a percentage compared to the highest quality (DIN 6) samples (considered as 100%). *, P<0.05, **,P<0.01, ***, P<0,001 vs DIN6.
Total and species-specific 16S rRNA gene copy number quantification is affected by metagenomic DNA quality. (a) 16S copy quantification in 13 metagenomic DNAs at their original DIN6 value and at three degradation levels, corresponding to DIN4, DIN2 and DIN1 values, compared to the ZymoBIOMICS/humDNA mock at the same integrity levels. (b) 16S copies underestimation percentage for the ZymoBIOMICS/humDNA mock and for the 13 metagenomic DNAs at the same integrity levels. (c) 16S copies measured in five metagenomic DNAs (f9-f13) at their original DIN6 value and at the DIN1 degradation level. (d) 16S copy underestimation percentage (56%) is in accordance with that measured for total 16S copies in the same metagenomic DNAs (f9–f13) at the DIN1 degradation level. 16S copies are reported as the mean of at least two replicates for each sample±sd. 16S copy underestimation was calculated as a percentage compared to the highest quality (DIN 6) samples (considered as 100%). *, P<0.05, **,P<0.01, ***, P<0,001 vs DIN6.Taken together, these data robustly confirm that 16S copy number estimates are remarkably biased by the DNA degradation level as measured by the DIN and suggest a simple correction based on visual or mathematical interpolation (Fig. S3) of the standard curve in Fig. 2.
DNA input correction in ddPCR for an accurate quantification of 16S copy number in degraded metagenomic DNA samples
To account for the levels of DNA integrity in the quantification of the 16S copy number, for five DIN1 faecal DNAs (samples F9–F13), we adopted a mass correction of the DNA template in the ddPCR reaction, based on the 55 % underestimation percentage calculated (DIN1 input corrected, DIN1-IC). As shown in Fig. 4a, with this correction, we obtained a number of total 16S copies roughly corresponding to that measured in the original DIN6 DNA samples. Remarkably, the same DNA mass correction also allowed us to recover 16S copies in these degraded DNAs (Fig. 4b, Table S3).
Fig. 4.
Input DNA mass correction allows accurate quantification of the 16S copy number in degraded metagenomic DNAs. For five DIN1 faecal DNAs (samples F9–F13), a mass correction of the DNA template in the ddPCR reaction, based on the 55 % underestimation percentage (DIN1 input corrected, DIN1-IC), allowed us to obtain the total number of 16S copies (a) and the number of 16S copies (b) corresponding to those measured in the original DIN6 DNA samples. *, P<0.05, **, P<0.01, ***, P<0,001 vs DIN1.
Input DNA mass correction allows accurate quantification of the 16S copy number in degraded metagenomic DNAs. For five DIN1 faecal DNAs (samples F9–F13), a mass correction of the DNA template in the ddPCR reaction, based on the 55 % underestimation percentage (DIN1 input corrected, DIN1-IC), allowed us to obtain the total number of 16S copies (a) and the number of 16S copies (b) corresponding to those measured in the original DIN6 DNA samples. *, P<0.05, **, P<0.01, ***, P<0,001 vs DIN1.Overall, these results demonstrate that the input DNA mass correction, calculated on the ZymoBIOMICS/humDNA mock standard curve (Figs 2 and S3), can mitigate inaccurate quantification of 16S copy number in degraded metagenomic DNAs.
Relative bacterial abundance is not affected by the integrity level in metagenomic DNA samples
In order to investigate whether relative bacterial abundance may also be affected by the degradation of metagenomic DNA, we calculated the relative abundance of 16S copies measured by ddPCR in a subset of five faecal DNAs (samples F9–F13), at DIN6 and DIN1 integrity levels and at DIN1 with mass correction (DIN1-IC). As shown in Fig. 5a and in Table S3, for all samples, the relative abundance of 16S copies was not affected by the level of DNA integrity.
Fig. 5.
relative abundance is not affected by the integrity level of metagenomic DNAs. (a) 16S copies/total 16S copies measured by ddPCR in DIN6, DIN1 and DIN1 with mass correction (DIN1-IC) for faecal DNAs (samples F9–F13). (b) relative abundance derived by ddPCR compared to DNA metabarcoding data. 16S copies measured by ddPCR and in NGS data analysis in DIN6, DIN1 and DIN1 with mass correction (DIN1-IC) for faecal DNAs (samples F9–F13) were box-plotted together. p-values of pairwise comparison are indicated.
relative abundance is not affected by the integrity level of metagenomic DNAs. (a) 16S copies/total 16S copies measured by ddPCR in DIN6, DIN1 and DIN1 with mass correction (DIN1-IC) for faecal DNAs (samples F9–F13). (b) relative abundance derived by ddPCR compared to DNA metabarcoding data. 16S copies measured by ddPCR and in NGS data analysis in DIN6, DIN1 and DIN1 with mass correction (DIN1-IC) for faecal DNAs (samples F9–F13) were box-plotted together. p-values of pairwise comparison are indicated.Next, we sequenced the V5–V6 amplicon in the same subset of metagenomic DNA samples at DIN6 and DIN1 integrity levels and at DIN1 with mass correction. We evaluated whether DNA integrity may influence the number of observed taxa in metabarcoding analysis. After the removal of marginally abundant taxa (average relative abundance among replicates ≤0.5 %) as well as those observed in only one out of three replicates, we enumerated common and uncommon taxa at different DNA integrity levels and collected the results for the F9–F13 samples at the six main taxonomic ranks level (i.e. phylum, class, order, family, genus and species). As expected, the higher the taxonomic rank, the smaller the differences in the observed taxa. In particular, a core set of shared taxa among DIN6, DIN1 and DIN1-IC samples was observed. The percentage of shared taxa was 100 % at phylum level and decreased up to 74 % at the order rank (Tables S4 and S5). The observation that the DNA integrity level does not significantly affect the relative abundance of detected taxa was confirmed by the high Pearson correlation obtained by comparing taxon relative abundance at phylum, class, order, family, genus and species level, for DIN6, DIN1 and DIN1-IC samples (Table S6, Figs S4–S6).Finally, in order to assess the species-specificity of DNA metabarcoding data, we compared relative abundances measured by both ddPCR and NGS analysis, by using the pairwise Wilcoxon rank test. Interestingly, we found a perfect concordance between ddPCR and DNA metabarcoding data (Fig. 5b), suggesting that the relative proportions of are preserved in NGS data analysis, regardless of the initial DNA quality.
Discussion
The large majority of microbiome investigations adopt the DNA metabarcoding approach, which provides relative estimates of the composition of microbial taxa, which are intrinsically mutually dependent. This introduces a systematic bias as the increase of one taxon in a specific condition inevitably leads to the decrease of all others. This bias is particularly relevant in the case of low microbial contents with a consequent high risk of false discoveries or artefactual results [45]. Indeed, measuring the absolute microbial content in a metagenomic sample is critical for reliable investigation of the interplay between the microbiome and its host, which is affected by quantitative parameters (e.g. metabolite concentration) [46], as well as for assessing the functional impact of microbial dynamics in a variety of environments and conditions [4].In the present study, we present data showing that ddPCR provides accurate quantification of 16S rDNA copy number in metagenomic DNAs. ddPCR is characterized by precise quantification, higher reproducibility compared to qPCR and higher sensitivity in low-copy-number detection [14, 20, 47]. Furthermore, ddPCR mitigates the effects of the presence of PCR inhibitors that may affect PCR sensitivity in 16S metabarcoding sequencing analysis [27, 48].For the quantification of 16S copy number in metagenomic DNAs by PCR, the choice of the primers pair is crucial, as it should be able to cover the entire microbial representation of a large variety of complex matrices [22-24]. For our study, we selected the universal primers pair targeting the V5–V6 hypervariable regions of the 16S rDNA [31], successfully used in DNA metabarcoding investigations [32, 33, 49–53]. We demonstrated the reliability of these primers pair as, in different DNA samples with a known number of 16S rDNA copies, a perfect concordance was observed between the expected 16S copy number and that estimated by ddPCR, both in DNAs from single bacterial strains and from bacterial communities.Furthermore, we showed that 16S copy number quantification is strongly affected by DNA quality, as measured by its DIN. By analysing a DNA mock, with a known 16S rDNA copy number, at different DIN values, we calculated a percentage of underestimation of 16S rDNA copy number correlated to the degradation state. Interestingly, in faecal metagenomic DNAs, we observed a similar degree of decrease both in total and species-specific 16S copy number, at the same level of DNA integrity. In degraded DNAs, we demonstrated that reliable quantification of 16S copy number can be obtained through a correction of the input DNA mass to use as template, based on the underestimation percentage calculated on the mock standard curve. Our results robustly confirmed that 16S copy number estimates are consistently biased by the DNA degradation level and indicate the necessity for a preliminary evaluation of metagenomic DNA integrity in quantitative analyses. For degraded DNAs, we recommend a correction of the DNA mass to use as template, calculated on the mock standard curve or on a mathematical interpolation.On the other hand, our results demonstrated that relative bacterial quantification is not affected by the integrity level in metagenomic DNAs, as we showed that the relative abundance of 16S rDNAs measured by ddPCR did not vary in different faecal metagenomic DNAs, at DIN6 and DIN1 integrity levels. These data were confirmed by NGS analysis of the same samples, which showed perfect concordance between ddPCR results and DNA metabarcoding data, suggesting that the relative proportions of specific taxa are also preserved in NGS data analysis, regardless of the initial DNA quality. Moreover, consistent with the proposal that DNA quality does not affect relative bacterial quantification, we found a high Pearson correlation by comparing taxon relative abundance, at phylum, class, order, family, genus and species level, for DIN6, DIN1 and DIN1-IC samples.While metabarcoding analyses allow estimation of the relative abundances of the different taxa, regardless of the quality of the input DNA, they do not provide any indication regarding the overall microbial content, which can be a very relevant factor. Indeed, even relative estimates of different taxa are unreliable when the overall microbial content is too low [54]. On the other hand, significant variations of the microbial content in different samples (e.g. 2–3-fold or more) may have remarkable effects on functional interpretations. Indeed, it is well known that an altered gut permeability of the microbiome is correlated with several diseases [55], and in this respect a quantitative assessment of systemic microbial leaks is mandatory for functional studies.Our study highlights that preliminary quantitative evaluation of the total microbial content by ddPCR is strongly indicated before performing metagenomic analyses of samples, for both assessment of the reliability of observed differential abundances in different conditions and to obtain significant functional insights. The method we propose to accurately quantify the total microbial content also makes suitable adjustments for DNA quality. Indeed, although ddPCR has already been applied for absolute quantification of bacteria [16, 20, 21] and viruses [29, 30], a suitable correction of input mass for DNA quality, which may vary substantially in different samples and conditions, has never been addressed.Click here for additional data file.
Authors: Jonathan Thorsen; Asker Brejnrod; Martin Mortensen; Morten A Rasmussen; Jakob Stokholm; Waleed Abu Al-Soud; Søren Sørensen; Hans Bisgaard; Johannes Waage Journal: Microbiome Date: 2016-11-25 Impact factor: 14.650
Authors: Henrik Krehenwinkel; Marisa Fong; Susan Kennedy; Edward Greg Huang; Suzuki Noriyuki; Luis Cayetano; Rosemary Gillespie Journal: PLoS One Date: 2018-01-05 Impact factor: 3.240
Authors: Ingrid Ziegler; Sofia Lindström; Magdalena Källgren; Kristoffer Strålin; Paula Mölling Journal: PLoS One Date: 2019-11-13 Impact factor: 3.240
Authors: Kristýna Fiedorová; Matěj Radvanský; Eva Němcová; Hana Grombiříková; Juraj Bosák; Michaela Černochová; Matej Lexa; David Šmajs; Tomáš Freiberger Journal: Front Microbiol Date: 2019-04-17 Impact factor: 5.640