Guillem Salazar, Hans-Joachim Ruscheweyh, Falk Hildebrand, Silvia G. Acinas, Shinichi Sunagawa
Abstract
SUMMARY: Profiling the taxonomic composition of microbial communities commonly involves the classification of ribosomal RNA gene fragments. As a trade-off to maintain high classification accuracy, existing tools are typically limited to the genus level. Here, we present mTAGs, a taxonomic profiling tool that aligns metagenomic sequencing reads to degenerate consensus reference sequences of small subunit ribosomal RNA genes. It uses DNA fragments (that is, paired-end sequencing reads) as count units and provides relative abundance profiles at multiple taxonomic ranks, including operational taxonomic units (OTUs) defined at a 97% sequence identity cutoff. At the genus rank, mTAGs outperformed other tools on several metrics, for example improving the F1 score by more than 11% across data from different environments, and it achieved competitive (F1 score) or better (Bray–Curtis dissimilarity) results at the sub-genus level.
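The idea of aligning reads to a degenerate consensus of each OTU's member sequences can be illustrated with a short Python sketch. It assumes the member sequences of one OTU are already aligned to equal length, encodes each alignment column with the standard IUPAC ambiguity code covering every base observed there, drops all-gap columns, and counts mismatches of an ungapped read placement against the result. The function names (`degenerate_consensus`, `mismatches`), the gap handling and the mismatch rule are hypothetical choices made for illustration; this is not the mTAGs implementation.

```python
from typing import Dict, FrozenSet, List

# Standard IUPAC nucleotide codes, keyed by the set of bases each code covers.
IUPAC: Dict[FrozenSet[str], str] = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

# Inverse mapping: the set of bases each code stands for (RNA U treated as T).
EXPAND: Dict[str, FrozenSet[str]] = {code: bases for bases, code in IUPAC.items()}
EXPAND["U"] = frozenset("T")

GAPS = "-."  # gap characters as used in common alignment formats (assumption)


def degenerate_consensus(aligned: List[str]) -> str:
    """Collapse equal-length aligned member sequences into one IUPAC consensus.

    Gaps are ignored within a column; columns that are gaps in every member
    are dropped, so the result is an ungapped reference sequence.
    """
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    consensus = []
    for column in zip(*aligned):
        # Union of all bases (expanding any ambiguity codes) seen in this column.
        bases = frozenset().union(
            *(EXPAND[b.upper()] for b in column if b not in GAPS)
        )
        if bases:
            consensus.append(IUPAC[bases])
    return "".join(consensus)


def mismatches(read: str, reference: str, offset: int = 0) -> int:
    """Count read bases not covered by the reference code at each position
    of an ungapped placement starting at `offset`."""
    if offset < 0 or offset + len(read) > len(reference):
        raise ValueError("read does not fit inside the reference at this offset")
    return sum(
        read_base.upper() not in EXPAND[ref_code.upper()]
        for read_base, ref_code in zip(read, reference[offset:])
    )


if __name__ == "__main__":
    members = ["ACGT-A", "ACGTTA", "GCGCTA"]  # toy aligned OTU members
    ref = degenerate_consensus(members)
    print(ref)                        # RCGYTA
    print(mismatches("GCGCTA", ref))  # 0
    print(mismatches("TCGA", ref))    # 2
```

Because a code such as `Y` accepts both `C` and `T`, a read drawn from any member of this toy OTU aligns to the consensus without mismatches at the variable columns, whereas using a single member sequence as the reference would penalize reads from the other members.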
Year: 2021 PMID: 34260698 PMCID: PMC8696115 DOI: 10.1093/bioinformatics/btab465
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Fig. 1. Benchmarking results on taxonomic profiling of microbial communities. (A) Internal benchmarking: benchmarking of the mTAGs reference database construction for a read length of 150 bp. Values correspond to the performance in classification (F1 score) and profiling (Bray–Curtis similarity to the expected composition) at seven taxonomic ranks when the OTU representative sequence is defined as (i) the degenerate consensus sequence of all respective members (blue) or (ii) the longest member sequence (green). The values of 10 independent evaluations are plotted. See Supplementary Figure S1 for precision and recall values and for results based on alternative read lengths. (B) External benchmarking: benchmarking of mTAGs against QIIME 1, QIIME 2, mothur and MAPseq using simulated datasets comprising the most abundant genera found in the human gut, ocean and soil environments (Almeida et al., 2018). Bray–Curtis similarity to the expected composition and F1 score values correspond to classifications at the genus level (the lowest taxonomic rank common to all tools). To ensure comparability between the tools, the results are based on SILVA SSU database version 128. See the Supplementary Information for more details and Supplementary Figure S2 for precision and recall values and for results based on alternative reference databases. (C) Metagenome-based benchmarking: benchmarking of mTAGs and MAPseq using metagenomic data from the second CAMI challenge (Meyer et al.). Values correspond to the performance in classification (F1 score) and profiling (Bray–Curtis dissimilarity to the expected composition) at seven taxonomic ranks.
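The figure reports two kinds of score: an F1 score for classification and a Bray–Curtis similarity or dissimilarity between the predicted and expected profiles. A minimal Python sketch of both follows. It assumes profiles are dictionaries mapping taxon names at one rank to abundances, converts fragment counts to relative abundances, and computes F1 on taxon presence/absence; that F1 convention is an assumption made for illustration, since the caption defers methodological details to the Supplementary Information. The function names and the toy genus profiles are hypothetical.

```python
from typing import Dict, Mapping


def to_relative(counts: Mapping[str, float]) -> Dict[str, float]:
    """Convert per-taxon counts (e.g. numbers of fragments) to relative abundances."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("profile has no counts")
    return {taxon: c / total for taxon, c in counts.items()}


def f1_presence(expected: Mapping[str, float], predicted: Mapping[str, float]) -> float:
    """F1 score of predicted vs expected taxa, using presence/absence only."""
    exp_taxa = {t for t, a in expected.items() if a > 0}
    pred_taxa = {t for t, a in predicted.items() if a > 0}
    true_pos = len(exp_taxa & pred_taxa)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_taxa)  # share of predicted taxa that are real
    recall = true_pos / len(exp_taxa)      # share of real taxa that were found
    return 2 * precision * recall / (precision + recall)


def bray_curtis_dissimilarity(x: Mapping[str, float], y: Mapping[str, float]) -> float:
    """Sum of absolute abundance differences divided by the total abundance;
    0 means identical profiles, 1 means no shared taxa."""
    taxa = set(x) | set(y)
    diff = sum(abs(x.get(t, 0.0) - y.get(t, 0.0)) for t in taxa)
    total = sum(x.get(t, 0.0) + y.get(t, 0.0) for t in taxa)
    if total == 0:
        raise ValueError("both profiles are empty")
    return diff / total


def bray_curtis_similarity(x: Mapping[str, float], y: Mapping[str, float]) -> float:
    """Similarity as plotted in panels (A) and (B): 1 minus the dissimilarity."""
    return 1.0 - bray_curtis_dissimilarity(x, y)


if __name__ == "__main__":
    expected = {"Bacteroides": 0.5, "Prevotella": 0.3, "Faecalibacterium": 0.2}
    predicted = to_relative({"Bacteroides": 60, "Prevotella": 20, "Roseburia": 20})
    print(round(f1_presence(expected, predicted), 3))                # 0.667
    print(round(bray_curtis_dissimilarity(expected, predicted), 3))  # 0.3
```

In this toy case one expected genus is missed and one spurious genus is reported, so precision and recall are both 2/3; the abundance errors on the shared genera and the two unmatched genera together give a Bray–Curtis dissimilarity of 0.3 (similarity 0.7).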