Christian Milani, Gabriele Andrea Lugli, Federico Fontana, Leonardo Mancabelli, Giulia Alessandri, Giulia Longhi, Rosaria Anzalone, Alice Viappiani, Francesca Turroni, Douwe van Sinderen, Marco Ventura.
Abstract
The use of bioinformatic tools for read-based taxonomic and functional analyses of metagenomic data sets, including their assembly and management, is rather fragmentary due to the absence of an accepted gold standard. Moreover, most currently available software tools require millions of input reads and rely on approximations in data analysis to reduce computing times. These issues lead to suboptimal accuracy, sensitivity, and specificity when such tools are used either to reconstruct taxonomic or functional profiles through read analysis or to analyze genomes reconstructed by metagenomic assembly. Moreover, recently introduced DNA sequencing technologies that generate long reads, such as Nanopore and PacBio, represent a valuable data resource that still suffers from a lack of dedicated tools to perform integrated hybrid analysis alongside short read data. To overcome these limitations, we describe here a comprehensive bioinformatic platform, METAnnotatorX2, which aims to provide an optimized, user-friendly resource that maximizes output quality while also allowing user-specific adaptation of the pipeline and straightforward integrated analysis of both short and long read data. To further improve performance quality and accuracy of taxonomic assignment of reads and contigs, custom preprocessed and taxonomically revised genomic databases for viruses, prokaryotes, and various eukaryotes were developed. The performance of METAnnotatorX2 was tested by analysis of artificial data sets encompassing viral, archaeal, bacterial, and eukaryotic (fungal) sequence reads that simulate different biological matrices. Moreover, real biological samples were employed to validate in silico results.
IMPORTANCE We developed a novel tool, METAnnotatorX2, that includes a number of new advanced features for analysis of deep and shallow metagenomic data sets and is accompanied by regularly updated, customized databases for archaea, bacteria, fungi, protists, and viruses. Both software and databases were developed to maximize sensitivity and specificity while including support for shallow metagenomic data sets. Through extensive tests performed on Illumina and Nanopore artificial data sets, we demonstrated the high performance of the software not only in extracting taxonomic and functional information from sequence reads but also in assembling and processing genomes from metagenomic data. The robustness of these functionalities was validated using "real-life" data sets obtained from Illumina and Nanopore sequencing of biological samples. Furthermore, the performance of METAnnotatorX2 was compared to that of other available software tools for analysis of shotgun metagenomics data.
Keywords: deep; functional profiling; metagenomics; shallow; taxonomy
Year: 2021 PMID: 34184911 PMCID: PMC8269244 DOI: 10.1128/mSystems.00583-21
Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 6.496
FIG 1 Schematic representation of read-based and assembly-based pipelines offered by METAnnotatorX2 for analysis of shotgun metagenomics data.
FIG 2 Evaluation of species-level classification accuracy based on analysis of the nine artificial data sets. The four heatmaps show the performance of the METAnnotatorX2, MetaPhlAn 3, Kraken 2, and METAnnotatorX tools in profiling the species used to generate the nine artificial data sets. Black indicates that the species was undetected or misclassified. Green indicates that the species was profiled with a deviation of <5% compared to its expected relative abundance. Red indicates that the species was profiled with a deviation of >5% compared to its expected relative abundance.
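The color coding in the FIG 2 caption can be read as a simple three-way rule per species. The following is a minimal sketch of that rule, not the authors' code. It assumes that "deviation" means the absolute difference in percentage points between observed and expected relative abundance, and that a deviation of exactly 5% counts as red, since the caption does not specify either point. The function name `heatmap_color` and its parameters are hypothetical.

```python
from typing import Optional


def heatmap_color(expected_pct: float,
                  observed_pct: Optional[float],
                  threshold_pct: float = 5.0) -> str:
    """Return the FIG 2 color for one species in one data set.

    Assumptions (not stated in the source):
    - deviation is |observed - expected| in percentage points;
    - a species with no observed abundance counts as undetected or misclassified;
    - a deviation exactly equal to the threshold is treated as red.
    """
    # Black: the species was not recovered at all.
    if observed_pct is None or observed_pct <= 0.0:
        return "black"

    deviation = abs(observed_pct - expected_pct)

    # Green: recovered close to its expected relative abundance.
    if deviation < threshold_pct:
        return "green"

    # Red: recovered, but with a larger deviation from the expected abundance.
    return "red"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(heatmap_color(10.0, 12.0))
    print(heatmap_color(10.0, 20.0))
    print(heatmap_color(10.0, None))
```

If the deviation in the original analysis was instead computed relative to the expected abundance, only the `deviation` line would change.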
FIG 3 Evaluation of the performance of the METAnnotatorX2, MetaPhlAn 3, Kraken 2, and METAnnotatorX software tools in retrieving the taxonomic profiles expected for the nine artificial data sets. Performance was evaluated using the DExA index, a ratio derived from the sum of the absolute variances between the observed and the expected profiles. A DExA index score of 0% indicates that the tool retrieved all species with the expected abundance, whereas a DExA index score of 100% means that the profiling software failed to classify or misclassified all species in the artificial data set. Panels a, b, c, and d show the performances of METAnnotatorX2, MetaPhlAn 3, Kraken 2, and METAnnotatorX, respectively.
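The FIG 3 caption gives the two endpoints of the DExA index (0% and 100%) but not its exact formula. The sketch below shows one plausible reading that reproduces both endpoints; it is an assumption, not the published definition. It sums the absolute differences between observed and expected abundances over all species in either profile and divides by the combined total abundance of both profiles, which is the same normalization as the Bray-Curtis dissimilarity. The function name `dexa_like_index` and the species labels are hypothetical.

```python
from typing import Dict


def dexa_like_index(expected: Dict[str, float],
                    observed: Dict[str, float]) -> float:
    """Return a DExA-style score in percent (0 = perfect, 100 = nothing correct).

    Assumption: the score is the sum of absolute abundance differences over the
    union of species, divided by the combined total abundance of both profiles.
    The source does not give the exact denominator.
    """
    species = set(expected) | set(observed)

    # Species missing from a profile contribute an abundance of 0 there.
    total_abs_diff = sum(
        abs(observed.get(name, 0.0) - expected.get(name, 0.0))
        for name in species
    )

    # Largest possible sum of differences: the two profiles share no species.
    max_diff = sum(expected.values()) + sum(observed.values())
    if max_diff == 0:
        raise ValueError("both profiles are empty")

    return 100.0 * total_abs_diff / max_diff


if __name__ == "__main__":
    # Hypothetical two-species data set, abundances in percent.
    expected = {"species_a": 50.0, "species_b": 50.0}

    print(dexa_like_index(expected, {"species_a": 50.0, "species_b": 50.0}))
    print(dexa_like_index(expected, {"species_c": 100.0}))
    print(dexa_like_index(expected, {"species_a": 100.0}))
```

Under this reading, a profile identical to the expected one scores 0%, a profile made up entirely of misclassified species scores 100%, and partially correct profiles fall in between.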