Andrew J McArdle, Myrsini Kaforou.
Abstract
A recent study reported that increasing host DNA abundance and reducing read depth impair the sensitivity of detection of low-abundance micro-organisms by shotgun metagenomics. The authors used DNA from a synthetic bacterial community with abundances varying across several orders of magnitude and added varying proportions of host DNA. However, the use of a marker-gene-based abundance estimation tool (MetaPhlAn2) requires considerable depth to detect marker genes from low-abundance organisms. Here, we reanalyse the deposited data, and place the study in the broader context of low microbial biomass metagenomics. We opted for a fast and sensitive read binning tool (Kraken 2) with abundance estimates from Bracken. With this approach, all organisms are detected even when the sample comprises 99 % host DNA, and similarly accurate abundance estimates are provided (mean squared error 0.45 vs. 0.3 in the original study). We show that off-target genera, whether contaminants or misidentified reads, come to represent over 10 % of reads when the sample is 99 % host DNA and exceed the counts of many target genera. Therefore, we applied Decontam, a contaminant detection tool, which was able to remove 61 % of off-target species and 79 % of off-target reads. We conclude that read binning tools can remain sensitive to low-abundance organisms even with high host DNA content, but even low levels of contamination pose a significant problem due to low microbial biomass. Analytical mitigations are available, such as Decontam, although steps to reduce contamination are critical.
Keywords: deep sequencing; metagenomics; taxonomy
Year: 2020 PMID: 33005868 PMCID: PMC7523627 DOI: 10.1099/acmi.0.000104
Source DB: PubMed Journal: Access Microbiol ISSN: 2516-8290
Fig. 1. Taxonomic profile of the synthetic metagenome samples determined with Kraken 2, and expressed as the relative abundance of species in a heat map. Actual abundances are presented as per the original publication based on the theoretical number of genome copies present. Species are listed from highest to lowest expected relative abundances. MS=microbial sample; SS10=10 % host DNA; SS90=90 % host DNA; SS99=99 % host DNA.
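The effect of host DNA on sensitivity can be illustrated with simple dilution arithmetic. The sketch below is not taken from the study; it assumes that reads are drawn in proportion to DNA content, that the MS sample contains no host DNA, and that an organism's share of microbial reads equals its relative abundance (ignoring genome size differences). The function name `expected_reads`, the read depth and the abundance value are hypothetical and chosen only for illustration.

```python
def expected_reads(total_reads, host_fraction, rel_abundance):
    """Expected read count for one organism under a simple dilution model.

    total_reads   -- sequencing depth of the sample (hypothetical value)
    host_fraction -- proportion of DNA that is host, between 0 and 1
    rel_abundance -- organism's share of the microbial fraction, between 0 and 1
    """
    if not 0.0 <= host_fraction <= 1.0:
        raise ValueError("host_fraction must be between 0 and 1")
    if not 0.0 <= rel_abundance <= 1.0:
        raise ValueError("rel_abundance must be between 0 and 1")
    # Only the non-host share of reads is microbial; the organism
    # receives its relative-abundance share of that remainder.
    return total_reads * (1.0 - host_fraction) * rel_abundance


# Illustrative inputs only (assumptions, not values from the study).
total_reads = 1_000_000
rel_abundance = 0.001  # an organism at 0.1 % of the microbial community

# Host fractions follow the sample labels in the figure legend;
# MS is assumed here to contain 0 % host DNA.
samples = [("MS", 0.0), ("SS10", 0.10), ("SS90", 0.90), ("SS99", 0.99)]
for label, host_fraction in samples:
    reads = expected_reads(total_reads, host_fraction, rel_abundance)
    print(f"{label}: about {reads:.0f} expected reads")
```

Under these assumptions, the same organism falls from roughly 1000 expected reads with no host DNA to roughly 10 at 99 % host DNA. This is why a method that needs reads to hit specific marker genes can lose low-abundance organisms sooner than one that can classify reads from anywhere in the genome.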