Samuel E Barnett, Daniel H Buckley.
Abstract
BACKGROUND: DNA-stable isotope probing (DNA-SIP) links microorganisms to their in situ function in diverse environmental samples. Combining DNA-SIP with metagenomics (metagenomic-SIP) allows us to link genomes from complex communities to their specific functions and improves the assembly and binning of these targeted genomes. However, empirical development of metagenomic-SIP methods is hindered by the complexity and cost of these studies. We developed a toolkit, 'MetaSIPSim', to simulate sequencing read libraries for metagenomic-SIP experiments. MetaSIPSim is intended to generate datasets for method development and testing. To this end, we used MetaSIPSim-generated data to demonstrate the advantages of metagenomic-SIP over a conventional shotgun metagenomic sequencing experiment.
Keywords: Metagenomics; SIP; Simulation; Stable isotope probing
Year: 2020 PMID: 32000676 PMCID: PMC6993524 DOI: 10.1186/s12859-020-3372-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Diagram of the simulation procedure
Descriptions of variables used in simulation equations
| Variable | Description |
|---|---|
| | Proportion of DNA that is guanine or cytosine (G + C) |
| ρ | Theoretical BD of a fragment based on fragment G + C |
| ρ | BD of fragment after accounting for isotopic labeling |
| σ | Standard deviation of fragment BD (ρ) |
| ρ | Minimum BD of window to be sequenced |
| ρ | Maximum BD of window to be sequenced |
| ρ | Minimum BD of the DBL for a fragment |
| ρ | Maximum BD of the DBL for a fragment |
| | Atom % excess of the fragment |
| δ | Increase in DNA BD (g/ml) if atom % excess is 100% (13C: 0.036, 15N: 0.016) |
| | Proportion of DNA found in the diffusive boundary layer (DBL); 1 minus this value is the proportion in the tube lumen |
| | Proportion of the fragment lumen population recovered in the BD window |
| | Radius of the ultracentrifuge tube |
| | Minimum distance from the axis of rotation to the tube |
| | Maximum distance from the axis of rotation to the tube |
| | Fragment distance from the axis of rotation at equilibrium |
| | Minimum position on tube of the DBL range for a fragment |
| | Maximum position on tube of the DBL range for a fragment |
| α | Abundance of the genome/fragment in the sample |
| α | Abundance of the fragment in the lumen of the tube within the BD window |
| α | Abundance of a fragment in the DBL within the BD window |
| α | Abundance of the fragment in the BD window |
| α | Abundance of reads from the fragment |
| | Length of the fragment in base pairs |
| | Read length in base pairs |
| | Universal gas constant (8.314 J mol⁻¹ K⁻¹) |
| | Temperature in kelvin |
| β | Proportionality constant of aqueous cesium chloride (1.14 × 10⁹) |
| | Buoyancy factor (7.87 × 10⁻¹⁰) |
| | Mean molar weight of a standard nucleotide base pair in cesium chloride solution (882) |
| | Average density of the gradient solution |
| ω | Angular velocity of centrifugation |
| | Isoconcentration point |
| θ | Angle of the tube relative to the axis of rotation in radians |
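The first rows of the variable table link G + C content, isotopic labeling, and BD. The Python sketch below illustrates that relationship under stated assumptions: unlabeled BD follows the widely used Schildkraut et al. (1962) relation, ρ = 0.098 × (G + C) + 1.66 g/ml, and labeling adds δ scaled linearly by atom % excess. The function names are hypothetical; this is not MetaSIPSim's source code.

```python
# Hypothetical sketch; not MetaSIPSim's implementation.
# Assumes the Schildkraut et al. (1962) relation for unlabeled DNA:
#   BD (g/ml) = 0.098 * GC + 1.66, with GC as a fraction (0-1).
DELTA_MAX = {"13C": 0.036, "15N": 0.016}  # BD increase at 100 atom % excess (table above)


def unlabeled_bd(gc_fraction: float) -> float:
    """Theoretical BD of an unlabeled fragment from its G + C fraction."""
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must be between 0 and 1")
    return 0.098 * gc_fraction + 1.66


def labeled_bd(gc_fraction: float, atom_pct_excess: float, isotope: str = "13C") -> float:
    """BD after labeling, assuming the shift scales linearly with atom % excess (0-100)."""
    if not 0.0 <= atom_pct_excess <= 100.0:
        raise ValueError("atom_pct_excess must be between 0 and 100")
    return unlabeled_bd(gc_fraction) + DELTA_MAX[isotope] * atom_pct_excess / 100.0


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        print(f"G+C {gc:.1f}: unlabeled {unlabeled_bd(gc):.4f} g/ml, "
              f"fully 13C-labeled {labeled_bd(gc, 100.0):.4f} g/ml")
```

Under these assumptions, a fully 13C-labeled fragment of 30% G + C (about 1.725 g/ml) has nearly the same BD as an unlabeled fragment of 70% G + C (about 1.729 g/ml), which is why G + C content has to be considered when choosing which BD window to sequence.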
Studies used for validation of fragment abundance distributions, including the isolates reported in each study, the NCBI accessions used to download their genomes, and the stable isotopes simulated
| Study | Isolates (NCBI accession) | Isotope(s) | Original publication figure |
|---|---|---|---|
| Lueders et al. 2004 | | 12C | 1A |
| | | 13C | |
| Buckley et al. 2007 | | 14N, 15N | 2 |
| | | 14N | |
| Wawrik et al. 2009 | | 14N, 15N | 3B |
Fig. 2 Comparison of genomic DNA distributions across CsCl gradients simulated with MetaSIPSim and SIPSim against the empirical results from Lueders et al. 2004 [34] (a), Buckley et al. 2007 [25] (b), and Wawrik et al. 2009 [35] (c)
Fig. 3 Improvement of metagenomic-SIP relative to conventional metagenomics with respect to coverage and recovery of 13C-labeled genomes with raw reads. Values greater than zero indicate improvement of metagenomic-SIP relative to conventional metagenomics (Bonferroni correction, n = 6; ***, p < 0.001). a and c The difference in coverage of labeled genomes (n = 100) and the difference in the proportion of each genome recovered (i.e., covered by mapped reads), respectively, across communities that differ in G + C content and were sequenced at different depths. b and d The difference in labeled genome coverage and the difference in the proportion of each genome recovered (i.e., covered by mapped reads; n = 50, 100, 200), respectively, for low G + C communities as a function of the number of labeled genomes per sample and for high G + C communities as a function of BD window position
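The two read-level measures in this figure can be illustrated with a small Python sketch, assuming reads have already been mapped to each reference genome and are supplied as (start, length) pairs; the function name and input format are hypothetical. Mean coverage counts every aligned base, whereas recovery counts each genome position only once, so extra reads over already-covered regions raise coverage without raising recovery.

```python
# Hypothetical helper; MetaSIPSim's own evaluation scripts may differ.
from typing import Iterable, Tuple


def coverage_and_recovery(genome_length: int,
                          mapped_reads: Iterable[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (mean coverage, proportion of genome covered by >= 1 read).

    mapped_reads holds (0-based start, read length) pairs on a linear genome;
    reads running past the genome end are clipped.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = bytearray(genome_length)  # 1 where at least one read aligns
    total_bases = 0
    for start, length in mapped_reads:
        end = min(start + length, genome_length)
        if start < 0 or end <= start:
            continue  # skip reads with invalid or out-of-range start positions
        total_bases += end - start
        covered[start:end] = b"\x01" * (end - start)
    return total_bases / genome_length, sum(covered) / genome_length


if __name__ == "__main__":
    sip = coverage_and_recovery(100, [(0, 50), (25, 50)])
    conv = coverage_and_recovery(100, [(0, 50)])
    # Differences > 0 favour metagenomic-SIP, as in the figure.
    print(sip[0] - conv[0], sip[1] - conv[1])  # prints: 0.5 0.25
```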
Median difference in metagenome quality for labeled genomes between simulations of metagenomic-SIP and conventional shotgun metagenomic data. For all measures except bin contamination, a difference greater than zero indicates that metagenomic-SIP improved metagenome quality relative to the conventional metagenomics approach; for bin contamination, a difference less than zero indicates improvement with metagenomic-SIP. All statistical analyses were one-sided Wilcoxon signed-rank tests with the alternative hypothesis that the difference is greater than zero, except for bin contamination, which used the alternative hypothesis that it is less than zero. P-values (in parentheses) are adjusted for multiple comparisons (Bonferroni, n = 6); NS, not significant
| Community G + C | Seq. depth (reads) | Coverage | Genome recovered (reads) | Genome recovered (contigs) | NGA50 (bp) | Genome recovered (bins) | Bin contamination |
|---|---|---|---|---|---|---|---|
| Low | 5 M | 1.769 (< 0.001) | 0.391 (< 0.001) | 0.052 (< 0.001) | 19,672 (< 0.001) | 0.567 (< 0.001) | −0.038 (< 0.001) |
| Medium | 5 M | 1.002 (< 0.001) | 0.315 (< 0.001) | 0.053 (< 0.001) | 9251 (< 0.001) | 0.353 (< 0.001) | −0.022 (< 0.001) |
| High | 5 M | 0.596 (< 0.001) | 0.240 (< 0.001) | 0.026 (NS) | 3511 (< 0.001) | 0.366 (< 0.001) | −0.010 (0.001) |
| Low | 10 M | 3.561 (< 0.001) | 0.237 (< 0.001) | 0.004 (0.007) | 11,636 (< 0.001) | 0.144 (< 0.001) | −0.017 (< 0.001) |
| Medium | 10 M | 1.992 (< 0.001) | 0.306 (< 0.001) | 0.007 (< 0.001) | 7554 (< 0.001) | 0.092 (< 0.001) | −0.011 (< 0.001) |
| High | 10 M | 1.174 (< 0.001) | 0.222 (< 0.001) | 0.001 (NS) | 7718 (< 0.001) | 0.134 (< 0.001) | −0.009 (< 0.001) |
Median difference in metagenome quality measures for labeled genomes between simulations of metagenomic-SIP and conventional metagenomics. For all measures except bin contamination, a difference greater than zero indicates that metagenomic-SIP improved metagenome quality relative to conventional metagenomics; for bin contamination, a difference less than zero indicates improvement with metagenomic-SIP. All statistical analyses were one-sided Wilcoxon signed-rank tests with the alternative hypothesis that the difference is greater than zero, except for bin contamination, which used the alternative hypothesis that it is less than zero. P-values (in parentheses) are adjusted for multiple comparisons (Bonferroni, n = 6); NS, not significant
| Comm. G + C | Labeled genomes per sample | BD window (g/ml) | Coverage | Genome recovered (reads) | Genome recovered (contigs) | NGA50 (bp) | Genome recovered (bins) | Bin contamination |
|---|---|---|---|---|---|---|---|---|
| Low | 25 | 1.72–1.77 | 2.202 (< 0.001) | 0.445 (< 0.001) | 0.058 (0.027) | 18,409 (< 0.001) | 0.675 (< 0.001) | −0.007 (NS) |
| Low | 50 | 1.72–1.77 | 1.769 (< 0.001) | 0.391 (< 0.001) | 0.052 (< 0.001) | 19,672 (< 0.001) | 0.567 (< 0.001) | −0.038 (< 0.001) |
| Low | 100 | 1.72–1.77 | 1.188 (< 0.001) | 0.324 (< 0.001) | 0.006 (NS) | 7779 (< 0.001) | 0.337 (< 0.001) | −0.010 (< 0.001) |
| High | 50 | 1.70–1.75 | 0.018 (NS) | 0.013 (NS) | −0.032 (NS) | −212 (NS) | 0.019 (NS) | −0.001 (NS) |
| High | 50 | 1.72–1.77 | 0.589 (< 0.001) | 0.241 (< 0.001) | 0.026 (NS) | 3824 (< 0.001) | 0.330 (< 0.001) | −0.017 (0.006) |
| High | 50 | 1.75–1.79 | 3.935 (< 0.001) | 0.388 (< 0.001) | 0.102 (0.013) | 47,027 (< 0.001) | 0.825 (< 0.001) | −0.012 (NS) |
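Why BD window position matters for high G + C communities can be sketched by treating a fragment's band in the tube lumen as a normal distribution around its BD (the σ in the variable table) and integrating it over each window. This Python sketch assumes the Schildkraut G + C relation, a 70% G + C genome, 100 atom % 13C labeling, and an illustrative σ of 0.005 g/ml; it ignores the diffusive boundary layer that MetaSIPSim also models, and all names and values are hypothetical.

```python
# Hypothetical sketch: models only the DNA band in the tube lumen as a normal
# distribution (ignores the diffusive boundary layer that MetaSIPSim also models).
from math import erf, sqrt


def fraction_in_window(mean_bd: float, sigma: float,
                       window_min: float, window_max: float) -> float:
    """Proportion of a normally distributed BD band falling inside a window."""
    def cdf(x: float) -> float:
        return 0.5 * (1.0 + erf((x - mean_bd) / (sigma * sqrt(2.0))))
    return cdf(window_max) - cdf(window_min)


GC = 0.70                        # high G + C genome (assumed)
SIGMA = 0.005                    # band standard deviation, g/ml (illustrative only)
unlabeled = 0.098 * GC + 1.66    # Schildkraut relation (assumed)
labeled = unlabeled + 0.036      # 100 atom % 13C, delta from the variable table

for lo, hi in [(1.70, 1.75), (1.72, 1.77), (1.75, 1.79)]:
    print(f"{lo:.2f}-{hi:.2f} g/ml: "
          f"unlabeled {fraction_in_window(unlabeled, SIGMA, lo, hi):.3f}, "
          f"labeled {fraction_in_window(labeled, SIGMA, lo, hi):.3f}")
```

With these assumed values, the 1.70–1.75 g/ml window captures mostly unlabeled DNA, the 1.72–1.77 g/ml window captures substantial amounts of both, and the 1.75–1.79 g/ml window captures nearly all of the labeled band and almost none of the unlabeled one, which is consistent with the pattern of improvements in the table.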
Fig. 4 Improvement of metagenomic-SIP relative to conventional metagenomics with respect to co-assembly quality of 13C-labeled genomes. Values greater than zero indicate improvement of metagenomic-SIP relative to conventional metagenomics (Bonferroni correction, n = 6; *, p < 0.05; **, p < 0.01; ***, p < 0.001). a and c The difference in the proportion of each labeled genome (n = 100) recovered (i.e., aligned to contigs) and the difference in NGA50 (in base pairs; computed only where > 50% of the genome is aligned to contigs in both simulation types), respectively, across communities that differ in G + C content and were sequenced at different depths. b and d The difference in the proportion of each labeled genome recovered (i.e., aligned to contigs; n = 50, 100, 200) and the difference in NGA50 (in base pairs; computed only where > 50% of the genome is aligned to contigs in both simulation types), respectively, for low G + C communities as a function of the number of labeled genomes per sample and for high G + C communities as a function of BD window position
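NGA50 summarizes contiguity against a known reference: aligned blocks are sorted from longest to shortest, and NGA50 is the length of the block at which their cumulative length first reaches half of the reference genome. A minimal Python sketch follows, assuming contigs have already been split into aligned blocks against each reference genome (as assembly evaluators such as QUAST do); the function name is hypothetical and QUAST's own block-splitting rules are not reproduced.

```python
# Hypothetical helper; assemblies are usually evaluated with tools such as QUAST,
# whose exact alignment and block-splitting rules are not reproduced here.
from typing import Optional, Sequence


def nga50(aligned_block_lengths: Sequence[int], genome_length: int) -> Optional[int]:
    """Length L of the aligned block at which blocks of length >= L first cover
    at least 50% of the reference genome. Returns None when all aligned blocks
    together cover less than half of the genome (NGA50 undefined)."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    target = genome_length / 2
    cumulative = 0
    for length in sorted(aligned_block_lengths, reverse=True):
        cumulative += length
        if cumulative >= target:
            return length
    return None


if __name__ == "__main__":
    print(nga50([30, 10, 20, 5], 100))  # prints: 20
    print(nga50([20, 10], 100))         # prints: None
```

As in the caption, an NGA50 difference between the two simulation types is only meaningful when neither call returns None.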
Fig. 5 Improvement of metagenomic-SIP relative to conventional metagenomics with respect to MAG bin quality of 13C-labeled genomes. For genome recovery in bins, values greater than zero indicate improvement of metagenomic-SIP relative to conventional metagenomics, while for bin contamination, values less than zero indicate improvement (Bonferroni correction, n = 6; **, p < 0.01; ***, p < 0.001). a and c The difference in the proportion of each labeled genome (n = 100) recovered in a bin and the difference in the proportion of bin contamination, respectively, across communities that differ in G + C content and were sequenced at different depths. b and d The difference in the proportion of each labeled genome (n = 50, 100, 200) recovered in a bin and the difference in the proportion of bin contamination, respectively, for low G + C communities as a function of the number of target genomes per sample and for high G + C communities as a function of BD window position
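Because the simulated genomes are known, bin quality can be scored directly against the references. The Python sketch below shows one plausible reference-based definition, assuming each binned contig has already been assigned to a single source genome with a count of aligned bases and that contigs do not overlap on the reference; these definitions and the function name are assumptions, not necessarily those used in the study.

```python
# Hypothetical reference-based bin scoring; the definitions are assumptions.
from typing import Iterable, Tuple


def bin_quality(bin_contigs: Iterable[Tuple[str, int]], target: str,
                target_genome_length: int) -> Tuple[float, float]:
    """Return (proportion of target genome recovered in the bin,
    proportion of bin bases contributed by other genomes).

    bin_contigs holds (source genome ID, aligned bases) for each contig,
    assuming one source genome per contig and no overlap on the reference.
    """
    if target_genome_length <= 0:
        raise ValueError("target_genome_length must be positive")
    target_bases = other_bases = 0
    for source, bases in bin_contigs:
        if source == target:
            target_bases += bases
        else:
            other_bases += bases
    total = target_bases + other_bases
    recovered = min(1.0, target_bases / target_genome_length)
    contamination = other_bases / total if total else 0.0
    return recovered, contamination


if __name__ == "__main__":
    contigs = [("A", 400), ("A", 300), ("B", 100)]
    print(bin_quality(contigs, "A", 1000))  # prints: (0.7, 0.125)
```

Since lower contamination is better, a metagenomic-SIP minus conventional difference in contamination is expected to be negative when SIP helps, matching the sign convention of the figure.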
Fig. 6 Examples of MAG quality improvements achieved with metagenomic-SIP relative to conventional shotgun metagenomics for three target genomes of low abundance in the community. These examples are taken from the 50% G + C skewed (medGC) reference set sequenced to a depth of 5,000,000 reads. The genomes presented are Clostridium stercorarium, Prevotella denticola, and Altererythrobacter ishigakiensis. a Percentage of each genome recovered in reads across 6 simulation trials in which community composition and 13C labeling were varied randomly (T1–T5); 'con' indicates the 12C control, and 'L' indicates a trial in which the organism was 13C-labeled. b Percentage of each genome recovered in contigs from the co-assembly of all 6 trials. c NGA50 of the co-assembly contigs mapped to each genome. d Percentage of each genome recovered in a MAG bin. e Percentage of each MAG bin that is contamination from other genomes. Note that no NGA50 is available for the shotgun metagenomic assembly of P. denticola because less than 50% of this genome was recovered in the co-assembly. Similarly, no bin mapping to P. denticola was recovered from the shotgun metagenome