Anastasia S Khodakova¹, Renee J Smith¹, Leigh Burgoyne¹, Damien Abarno², Adrian Linacre¹
Abstract
Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples were collected from two parklands in residential areas approximately 3 km apart, and DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against the M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Comparative analyses at all major levels of taxonomic and metabolic classification used several statistical tools: hierarchical agglomerative clustering (CLUSTER), similarity profile analysis (SIMPROF), non-metric multidimensional scaling (NMDS) and canonical analysis of principal coordinates (CAP). Shotgun and WGA-based approaches generated metagenomic profiles so similar that the soil samples from the two sites could not be distinguished accurately. An AP-PCR based approach produced reproducible, site-specific metagenomic DNA profiles, which were then used to discriminate visually similar soil samples collected from the two locations.
Year: 2014 PMID: 25111003 PMCID: PMC4128759 DOI: 10.1371/journal.pone.0104996
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
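The abstract mentions subsampled datasets without saying how reads were drawn. The sketch below assumes uniform random sampling of reads without replacement down to a common depth (here, the size of the smallest sample), which is one common way to equalize sequencing effort. The function name, read identifiers and depth are hypothetical and not taken from the study.

```python
import random


def subsample_reads(reads, depth, seed=0):
    """Draw `depth` reads uniformly at random, without replacement.

    Assumption: subsampling is used to equalize depth across samples;
    the depth used in the study is not stated, so `depth` is a placeholder.
    """
    if depth > len(reads):
        raise ValueError("depth exceeds the number of available reads")
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    return rng.sample(reads, depth)


# Hypothetical read identifiers for three samples of unequal size.
samples = {
    "S1": [f"S1_read{i}" for i in range(1000)],
    "S2": [f"S2_read{i}" for i in range(2500)],
    "S3": [f"S3_read{i}" for i in range(1800)],
}

# Subsample every sample to the depth of the smallest one.
depth = min(len(reads) for reads in samples.values())
subsampled = {name: subsample_reads(reads, depth) for name, reads in samples.items()}
```

Fixing the seed keeps the subsample reproducible, which matters if profiles from the full and subsampled data are later compared.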
General characteristics of full sequencing data.
| Sequencing approach | Average number of reads (range) | Number of Mbp | Average read length, bp ± SD | Failed QC (%) | Number of reads with predicted protein-coding regions (range) | Number of reads with predicted rRNA genes (range) | Number of features assigned to M5NR database (%) | Number of features assigned to SEED Subsystems (%) | Number of features assigned to M5RNA database (%) |
|---|---|---|---|---|---|---|---|---|---|
|  | 672 542 (531 108–806 483) | 133.6 | 197 ± 73 | 20 | 464 929 (325 410–582 708) | 82 151 (62 899–96 886) | 35 | 43 | 1.3 |
|  | 468 187 (74 370–1 074 266) | 70.7 | 142 ± 69 | 24 | 287 840 (49 902–617 609) | 44 896 (5 868–104 247) | 26 | 35 | 0.0 |
|  | 911 553 (506 028–2 012 359) | 178.5 | 198 ± 75 | 20 | 549 355 (354 930–1 032 625) | 96 117 (61 694–187 539) | 26 | 30 | 0.8 |
Statistical data are represented as mean ± standard deviation (SD). The percentage of sequences matching the M5NR, M5RNA and SEED Subsystems databases was determined with an E-value cut-off of E < 1×10⁻⁵. QC = quality control.
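The table note states that assignment percentages use an E-value cut-off of 1×10⁻⁵. A minimal sketch of that filter follows, assuming each annotation hit is a simple (read, database, E-value) record; the `Hit` type and its field names are hypothetical and do not reflect any real annotation tool's output format. A read with several passing hits is counted once.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    read_id: str
    database: str  # hypothetical labels, e.g. "M5NR", "M5RNA", "SEED"
    evalue: float


E_CUTOFF = 1e-5  # cut-off stated in the table note


def percent_assigned(hits, total_features, database, cutoff=E_CUTOFF):
    """Percentage of features with at least one hit in `database` below `cutoff`."""
    assigned = {h.read_id for h in hits if h.database == database and h.evalue < cutoff}
    return 100.0 * len(assigned) / total_features


hits = [
    Hit("r1", "M5NR", 1e-20),
    Hit("r1", "M5NR", 3e-8),  # a second passing hit for r1 is not counted twice
    Hit("r2", "M5NR", 2e-3),  # fails the cut-off
    Hit("r3", "SEED", 1e-9),  # different database
    Hit("r4", "M5NR", 5e-6),
]
print(percent_assigned(hits, total_features=4, database="M5NR"))  # 50.0
```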
Figure 1. Comparison of the taxonomic soil profiles generated from full datasets at the phylum (A, B, C) and species (D, E, F) resolution levels.
A Bray-Curtis similarity matrix was calculated from the square-root-transformed abundances of DNA fragments matching taxa in the M5NR database (E-value < 1×10⁻⁵) and used to generate the CLUSTER dendrogram and the NMDS and CAP ordination plots. CLUSTER analysis (A and D): red dotted branches on the dendrogram indicate no significant difference between metagenomic profiles (SIMPROF test, p < 0.05). NMDS unconstrained ordination (B and E): the plot displays distances between samples; points that lie closer together represent samples with more similar metagenomic profiles. CAP constrained ordination (C and F): CAP tests for differences among groups in multivariate space; the significance of group separation along the canonical axis is indicated by the squared canonical correlation (δ₁²) and its P-value. Contour lines on the NMDS and CAP ordinations, drawn around each cluster, superimpose the clusters from the CLUSTER dendrogram at the selected similarity level.
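The caption's first step, a Bray-Curtis similarity computed on square-root-transformed abundances, can be sketched as below. It uses the 0–100 similarity scale, S = 100 × (1 − Σ|xᵢ − yᵢ| / Σ(xᵢ + yᵢ)). The sample names and counts are hypothetical toy data, not values from the study.

```python
import math


def sqrt_transform(counts):
    """Square-root transform, which down-weights highly abundant taxa."""
    return [math.sqrt(c) for c in counts]


def bray_curtis_similarity(x, y):
    """Bray-Curtis similarity (0-100 scale) between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    if den == 0:
        raise ValueError("both samples are empty")
    return 100.0 * (1.0 - num / den)


def similarity_matrix(profiles):
    """Full pairwise matrix keyed by (sample, sample), after the sqrt transform."""
    names = list(profiles)
    t = {n: sqrt_transform(profiles[n]) for n in names}
    return {(a, b): bray_curtis_similarity(t[a], t[b]) for a in names for b in names}


# Hypothetical fragment counts per taxon for three samples.
profiles = {
    "A1": [16, 9, 0, 4],
    "A2": [9, 16, 1, 4],
    "B1": [0, 1, 25, 9],
}
S = similarity_matrix(profiles)
# S[("A1", "A2")] is about 84.2; S[("A1", "B1")] is about 33.3
```

The resulting matrix is the common input to the clustering and ordination steps described in the caption.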
Results of CAP model cross-validation for discrimination of soil taxonomic profiles generated from full sequencing datasets.
| Original group | AP_A | AP_B | WGA_A | WGA_B | SH_A | SH_B |
|---|---|---|---|---|---|---|
| Taxonomy level | | | | | | |
| % correct | 67 | 100 | 100 | 0 | 67 | 67 |
| correct/total | 2/3 | 3/3 | 3/3 | 0/3 | 2/3 | 2/3 |
| Misclassified to group | AP_B | n/a | n/a | SH_A, SH_B, WGA_A | SH_B | SH_A |
| Taxonomy level | | | | | | |
| % correct | 100 | 100 | 100 | 0 | 67 | 33 |
| correct/total | 3/3 | 3/3 | 3/3 | 0/3 | 2/3 | 1/3 |
| Misclassified to group | n/a | n/a | n/a | SH_B, WGA_A, SH_A | WGA_B | SH_A, WGA_B |
| Taxonomy level | | | | | | |
| % correct | 100 | 100 | 100 | 0 | 67 | 33 |
| correct/total | 3/3 | 3/3 | 3/3 | 0/3 | 2/3 | 1/3 |
| Misclassified to group | n/a | n/a | n/a | SH_B, WGA_A, SH_A | SH_B | SH_A, WGA_B |
| Taxonomy level | | | | | | |
| % correct | 100 | 100 | 100 | 0 | 67 | 67 |
| correct/total | 3/3 | 3/3 | 3/3 | 0/3 | 2/3 | 2/3 |
| Misclassified to group | n/a | n/a | n/a | SH_B, WGA_A, SH_A | SH_B | WGA_B |
| Taxonomy level | | | | | | |
| % correct | 100 | 100 | 100 | 0 | 100 | 67 |
| correct/total | 3/3 | 3/3 | 3/3 | 0/3 | 3/3 | 2/3 |
| Misclassified to group | n/a | n/a | n/a | SH_B, WGA_A, SH_A | n/a | WGA_B |
| Taxonomy level | | | | | | |
| % correct | 100 | 100 | 100 | 0 | 67 | 67 |
| correct/total | 3/3 | 3/3 | 3/3 | 0/3 | 2/3 | 2/3 |
| Misclassified to group | n/a | n/a | n/a | SH_B, WGA_A, SH_A | SH_B | WGA_B |
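The cross-validation table reports, for each original group, how many held-out samples were assigned back to their own group. The sketch below illustrates the leave-one-out idea with a deliberately simplified stand-in: a nearest-centroid rule using Euclidean distance in the original feature space. This is not CAP, which first projects samples onto canonical axes derived from principal coordinates; the data, group labels and function names are hypothetical.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def loo_nearest_centroid(samples):
    """Leave-one-out cross-validation with a nearest-centroid rule.

    `samples` is a list of (group, vector). Each sample is held out in turn,
    group centroids are recomputed from the remaining samples, and the
    held-out sample is assigned to the closest centroid.
    Returns a list of (true_group, predicted_group) pairs.
    """
    predictions = []
    for i, (true_group, vec) in enumerate(samples):
        training = defaultdict(list)
        for j, (group, v) in enumerate(samples):
            if j != i:  # exclude the held-out sample
                training[group].append(v)
        centroids = {g: centroid(vs) for g, vs in training.items()}
        predicted = min(centroids, key=lambda g: euclidean(vec, centroids[g]))
        predictions.append((true_group, predicted))
    return predictions


# Hypothetical two-dimensional profiles for two well-separated groups.
samples = [
    ("A", [1.0, 0.0]), ("A", [1.2, 0.1]), ("A", [0.9, -0.1]),
    ("B", [5.0, 5.0]), ("B", [5.2, 4.9]), ("B", [4.8, 5.1]),
]
preds = loo_nearest_centroid(samples)
```

Refitting the model without the held-out sample is what makes the "% correct" values an estimate of out-of-sample discrimination rather than a measure of fit.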
Figure 2. Comparison of the metabolic soil profiles generated from full datasets at the Subsystems level 1 (A, B, C) and Subsystems function (D, E, F) resolution levels.
A Bray-Curtis similarity matrix was calculated from the square-root-transformed abundances of DNA fragments matching SEED Subsystems categories (E-value < 1×10⁻⁵) and used to generate the CLUSTER dendrogram and the NMDS and CAP ordination plots. CLUSTER analysis (A and D): red dotted branches on the dendrogram indicate no significant difference between metagenomic profiles (SIMPROF test, p < 0.05). NMDS unconstrained ordination (B and E): the plot displays distances between samples; points that lie closer together represent samples with more similar metagenomic profiles. CAP constrained ordination (C and F): CAP tests for differences among groups in multivariate space; the significance of group separation along the canonical axis is indicated by the squared canonical correlation (δ₁²) and its P-value. Contour lines on the NMDS and CAP ordinations, drawn around each cluster, superimpose the clusters from the CLUSTER dendrogram at the selected similarity level.
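The CLUSTER dendrogram is built by hierarchical agglomerative clustering of the similarity matrix. The caption does not state the linkage method, so the sketch below assumes group-average (UPGMA) linkage. It converts similarities to distances (100 − S) and repeatedly merges the two clusters with the smallest mean pairwise distance. The sample names and similarity values are hypothetical, and the SIMPROF significance test is not implemented.

```python
def group_average_clustering(names, dist):
    """Agglomerative clustering with group-average (UPGMA) linkage.

    `dist` maps frozenset({a, b}) -> distance between samples a and b.
    Returns the merges in order as (members1, members2, merge_distance).
    """
    clusters = [frozenset([n]) for n in names]

    def avg_dist(c1, c2):
        # Mean of all pairwise distances between members of the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = avg_dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        c1, c2 = clusters[i], clusters[j]
        merges.append((sorted(c1), sorted(c2), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(c1 | c2)
    return merges


# Hypothetical Bray-Curtis similarities (0-100) between four samples.
similarity = {
    ("A1", "A2"): 90, ("B1", "B2"): 85,
    ("A1", "B1"): 30, ("A1", "B2"): 35,
    ("A2", "B1"): 40, ("A2", "B2"): 25,
}
dist = {frozenset(pair): 100 - s for pair, s in similarity.items()}
merges = group_average_clustering(["A1", "A2", "B1", "B2"], dist)
# merge distances: 10, 15, then 67.5 when the two site groups join
```

The merge distances correspond to the branch heights of the dendrogram, read on the similarity scale as 100 minus the distance.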
Results of CAP model cross-validation for discrimination of soil metabolic profiles generated from full sequencing datasets.
| Original group | AP_A | AP_B | WGA_A | WGA_B | SH_A | SH_B |
|---|---|---|---|---|---|---|
| Metabolic level | | | | | | |
| % correct | 100 | 100 | 100 | 33 | 67 | 33 |
| correct/total | 3/3 | 3/3 | 3/3 | 1/3 | 2/3 | 1/3 |
| Misclassified to group | n/a | n/a | n/a | SH_A, SH_B | SH_B | SH_A, WGA_B |
| Metabolic level | | | | | | |
| % correct | 100 | 100 | 100 | 33 | 67 | 100 |
| correct/total | 3/3 | 3/3 | 3/3 | 1/3 | 2/3 | 3/3 |
| Misclassified to group | n/a | n/a | n/a | SH_B, WGA_A | SH_B | n/a |
| Metabolic level | | | | | | |
| % correct | 100 | 100 | 100 | 33 | 67 | 67 |
| correct/total | 3/3 | 3/3 | 3/3 | 1/3 | 2/3 | 2/3 |
| Misclassified to group | n/a | n/a | n/a | SH_B, WGA_A | SH_B | SH_A |
| Metabolic level | | | | | | |
| % correct | 100 | 100 | 67 | 0 | 67 | 67 |
| correct/total | 3/3 | 3/3 | 2/3 | 0/3 | 2/3 | 2/3 |
| Misclassified to group | n/a | n/a | WGA_B | SH_B, WGA_A, SH_B | SH_B | SH_A |
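The "% correct", "correct/total" and "Misclassified to group" rows are simple tallies of (original group, predicted group) pairs. The sketch below reproduces the first "Metabolic level" block from pairs reconstructed out of its counts. The order of samples within each group is not given in the table and is arbitrary here; the function name is hypothetical.

```python
from collections import Counter


def cross_validation_summary(pairs):
    """Per-group correct/total, % correct and misclassification targets."""
    groups = []
    for true_group, _ in pairs:
        if true_group not in groups:  # keep first-seen group order
            groups.append(true_group)
    summary = {}
    for g in groups:
        preds = [p for t, p in pairs if t == g]
        correct = sum(1 for p in preds if p == g)
        wrong = Counter(p for p in preds if p != g)
        summary[g] = {
            "correct/total": f"{correct}/{len(preds)}",
            "% correct": round(100 * correct / len(preds)),
            "misclassified to": dict(wrong),
        }
    return summary


# (original group, predicted group) pairs rebuilt from the counts in the
# first metabolic-level block; ordering within each group is arbitrary.
pairs = (
    [("AP_A", "AP_A")] * 3
    + [("AP_B", "AP_B")] * 3
    + [("WGA_A", "WGA_A")] * 3
    + [("WGA_B", "WGA_B"), ("WGA_B", "SH_A"), ("WGA_B", "SH_B")]
    + [("SH_A", "SH_A"), ("SH_A", "SH_A"), ("SH_A", "SH_B")]
    + [("SH_B", "SH_B"), ("SH_B", "SH_A"), ("SH_B", "WGA_B")]
)
summary = cross_validation_summary(pairs)
overall = sum(t == p for t, p in pairs) / len(pairs)  # 13 of 18 samples
```

With three samples per group, each misclassified sample lowers a group's score by one third, which is why the table only contains the values 0, 33, 67 and 100.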
RELATE comparison of Bray-Curtis similarity matrices.
| Taxonomic level | Spearman rank coefficient | Metabolic level | Spearman rank coefficient |
|---|---|---|---|
| phylum | 0.887 | level 1 | 0.652 |
| class | 0.944 | level 2 | 0.958 |
| order | 0.959 | level 3 | 0.967 |
| family | 0.940 | functions | 0.969 |
| genus | 0.965 | | |
| species | 0.966 | | |
Spearman rank coefficients compare the Bray-Curtis similarity matrices calculated from square-root-transformed abundances of DNA fragments in the full datasets with those calculated from the subsampled datasets.
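RELATE summarizes how well two similarity matrices agree using a Spearman rank coefficient. The sketch below assumes that ρ is computed over the matching off-diagonal entries of the two matrices (each sample pair counted once), with tied values given average ranks. The permutation test normally used to judge significance is omitted, and the sample names and similarity values are hypothetical.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def relate_rho(names, sim1, sim2):
    """Spearman rho between matching off-diagonal entries of two matrices.

    `sim1` and `sim2` are dicts keyed by (a, b) in combinations() order.
    """
    pairs = list(combinations(names, 2))
    x = [sim1[p] for p in pairs]
    y = [sim2[p] for p in pairs]
    return pearson(average_ranks(x), average_ranks(y))


names = ["A1", "A2", "B1", "B2"]
full = {("A1", "A2"): 90, ("A1", "B1"): 30, ("A1", "B2"): 35,
        ("A2", "B1"): 40, ("A2", "B2"): 25, ("B1", "B2"): 85}
# Hypothetical subsampled matrix with the same rank order: rho = 1.
sub = {("A1", "A2"): 88, ("A1", "B1"): 28, ("A1", "B2"): 36,
       ("A2", "B1"): 41, ("A2", "B2"): 20, ("B1", "B2"): 80}
rho = relate_rho(names, full, sub)
# Swapping the two highest similarities lowers rho to 1 - 12/210.
sub2 = dict(sub)
sub2[("B1", "B2")] = 95
rho2 = relate_rho(names, full, sub2)
```

Because only ranks enter the statistic, a coefficient near 1 means subsampling preserved the ordering of sample similarities, even if the absolute similarity values changed.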