Alejandra Escobar-Zepeda, Elizabeth Ernestina Godoy-Lozano, Luciana Raggi, Lorenzo Segovia, Enrique Merino, Rosa María Gutiérrez-Rios, Katy Juarez, Alexei F Licea-Navarro, Liliana Pardo-Lopez, Alejandro Sanchez-Flores.
Abstract
Metagenomics research has recently thrived due to improvements in DNA sequencing technologies, driving the emergence of new analysis tools and the growth of taxonomic databases. However, there is no all-purpose strategy that can guarantee the best result for a given project, and there are several combinations of software, parameters and databases that can be tested. Therefore, we performed an impartial comparison, using statistical measures of classification for eight bioinformatic tools and four taxonomic databases, defining a benchmark framework to evaluate each tool in a standardized context. Using in silico simulated data for 16S rRNA amplicons and whole metagenome shotgun data, we compared the results from different software and database combinations to detect biases related to algorithms or database annotation. Using our benchmark framework, researchers can define cut-off values to evaluate the expected error rate and coverage for their results, regardless of the score used by each software. A quick guide to select the best tool, all datasets and scripts to reproduce our results and benchmark any new method are available at https://github.com/Ales-ibt/Metagenomic-benchmark. Finally, we stress the importance of gold standards, database curation and manual inspection of taxonomic profiling results for a better and more accurate description of microbial diversity.
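The idea of choosing a score cut-off from an accepted error rate can be illustrated with a minimal Python sketch. It is not the authors' implementation (their scripts are in the repository linked above). It assumes that each simulated read yields one classification with a tool-specific score and a known correct/incorrect status, that coverage is the fraction of reads retained at a cut-off, and that the error rate is the fraction of retained reads that are wrong. The function name `max_coverage_at_error` and the input layout are hypothetical.

```python
def max_coverage_at_error(scored_calls, max_error):
    """Find the score cut-off that keeps the most reads within an error budget.

    scored_calls: iterable of (score, is_correct) pairs, one per classified read
                  (hypothetical layout; higher score = more confident call).
    max_error:    accepted fraction of wrong calls among retained reads, e.g. 0.05.
    Returns (coverage, cutoff); cutoff is None if no cut-off meets the budget.
    """
    calls = sorted(scored_calls, key=lambda c: c[0], reverse=True)
    total = len(calls)
    best_coverage, best_cutoff = 0.0, None
    kept = wrong = 0
    i = 0
    while i < total:
        score = calls[i][0]
        # Reads tied at the same score are kept or dropped together.
        while i < total and calls[i][0] == score:
            kept += 1
            wrong += 0 if calls[i][1] else 1
            i += 1
        # Keep the lowest cut-off (largest coverage) that respects the budget.
        if wrong / kept <= max_error:
            best_coverage, best_cutoff = kept / total, score
    return best_coverage, best_cutoff


# Toy data: five reads, one of them misclassified.
toy = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.5, True)]
print(max_coverage_at_error(toy, 0.0))
print(max_coverage_at_error(toy, 0.2))
```

Because the cut-off is derived from the observed error rate rather than from the score's absolute value, the same procedure can be applied to tools whose scores are on different scales.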
Year: 2018 PMID: 30104688 PMCID: PMC6089906 DOI: 10.1038/s41598-018-30515-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Maximum coverage reached at each taxonomic level for methods tested on 16S rRNA amplicon datasets. Panels (A–C) correspond to BLAST-alignment-based methods and represent coverage at (A) 1%, (B) 5% and (C) 10% error cut-offs. Panels (D–F) correspond to BLAST-independent methods and represent coverage at (D) 1%, (E) 5% and (F) 10% error cut-offs. The main differences are observed at the class and order taxonomic levels for the Metaxa2, SPINGO-GG and QIIME-GG methods.
Figure 2. Performance descriptor plots calculated for methods tested on the annotation of 16S rRNA amplicon datasets. Panels A and B correspond to accuracy and specificity for BLAST-alignment-based methods. Panels C and D correspond to accuracy and specificity for BLAST-independent methods. The score scale from 0 to 1 runs from the inner to the outer circles.
Figure 3. Maximum coverage reached at each taxonomic level for methods tested on whole metagenome shotgun datasets. Panels (A–C) correspond to BLAST-alignment-based methods and represent coverage at (A) 1%, (B) 5% and (C) 10% error cut-offs. Panels (D–F) correspond to BLAST-independent methods and represent coverage at (D) 1%, (E) 5% and (F) 10% error cut-offs. A decrease in coverage is clear from class to family level in all the BLAST-alignment-based methods (A–C); Clark showed the lowest coverage at the 1% and 5% error thresholds (D, E).
Figure 4. Performance descriptor plots calculated for methods tested on the annotation of whole metagenome shotgun datasets. Panels A and B correspond to accuracy and specificity for BLAST-alignment-based methods. Panels C and D correspond to accuracy and specificity for BLAST-independent methods. The score scale from 0 to 1 runs from the inner to the outer circles.
Figure 5. Taxonomic abundance of annotation at phylum level. (A) BLAST-alignment-based methods on 16S rRNA amplicon data, (B) BLAST-independent methods on 16S rRNA amplicon data, (C) BLAST-alignment-based methods on whole metagenome shotgun data, (D) BLAST-independent methods on whole metagenome shotgun data. The black line represents the average of the expected abundance for each plot.
Figure 6. Matthews correlation coefficient (MCC) for (A) 16S rRNA amplicon data confusion matrices and (B) whole metagenome shotgun data confusion matrices.
Performance descriptors for the best methods ranked according to MCC at every taxonomic level.
| Taxonomic level | 16S rRNA amplicon: Method | MCC | ACC | Spec (a) | Sens (b) | Whole metagenome shotgun: Method | MCC | ACC | Spec (a) | Sens (b) |
|---|---|---|---|---|---|---|---|---|---|---|
| Phylum | QIIME-RDP | 1.000 | 1.000 | 1.000 | 1.000 | Parallel-meta-GG | 0.997 | 1.000 | 1.000 | 0.999 |
| Class | Metaxa2-MTX | 0.961 | 0.996 | 0.944 | 0.999 | MOCAT | 0.991 | 0.997 | 0.996 | 1.000 |
| Order | QIIME-MTX | 0.824 | 0.98 | 0.753 | 0.996 | MOCAT | 0.990 | 0.997 | 0.996 | 1.000 |
| Family | QIIME-MTX | 0.553 | 0.916 | 0.374 | 0.995 | MOCAT | 0.974 | 0.992 | 0.99 | 1.000 |
| Genus | Parallel-meta-MTX | 0.607 | 0.928 | 0.399 | 1.000 | MOCAT | 0.885 | 0.966 | 0.959 | 1.000 |
| Species | Parallel-meta-MTX | 0.083 | 0.661 | 0.147 | 0.908 | MOCAT | 0.824 | 0.948 | 0.940 | 1.000 |
| Subspecies | SPINGO-MTX | 0.061 | 0.249 | 0.898 | 0.213 | MetaPhlAn2 | 0.546 | 0.736 | 0.947 | 0.64 |
(a) Specificity; (b) Sensitivity. MCC: Matthews correlation coefficient; ACC: accuracy.
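For readers who want to recompute the descriptors in the table from their own confusion matrices, the following Python sketch uses the standard textbook definitions of MCC, accuracy, specificity and sensitivity for a binary confusion matrix. How the authors count true and false positives and negatives at each taxonomic level is not stated in this record, so the counts in the example are made up, and the function name `performance_descriptors` is hypothetical.

```python
import math


def performance_descriptors(tp, fp, tn, fn):
    """Compute MCC, ACC, specificity and sensitivity from confusion-matrix counts.

    tp, fp, tn, fn: true positives, false positives, true negatives,
                    false negatives (non-negative integers).
    Undefined ratios are reported as NaN; MCC falls back to 0.0 when any
    marginal total is zero, a common convention.
    """
    nan = float("nan")
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"MCC": mcc, "ACC": accuracy, "Spec": specificity, "Sens": sensitivity}


# Made-up counts, for illustration only.
for name, value in performance_descriptors(tp=90, fp=10, tn=80, fn=20).items():
    print(f"{name}: {value:.3f}")
```

MCC uses all four cells of the confusion matrix, so a method can combine high sensitivity with a low MCC when its specificity is poor, as in several of the lower taxonomic levels for the 16S rRNA amplicon columns of the table.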