Stinus Lindgreen, Karen L Adair, Paul P Gardner.
Abstract
Metagenome studies are becoming increasingly widespread, yielding important insights into microbial communities covering diverse environments from terrestrial and aquatic ecosystems to human skin and gut. With the advent of high-throughput sequencing platforms, the use of large scale shotgun sequencing approaches is now commonplace. However, a thorough independent benchmark comparing state-of-the-art metagenome analysis tools is lacking. Here, we present a benchmark where the most widely used tools are tested on complex, realistic data sets. Our results clearly show that the most widely used tools are not necessarily the most accurate, that the most accurate tool is not necessarily the most time consuming, and that there is a high degree of variability between available tools. These findings are important as the conclusions of any metagenomics study are affected by errors in the predicted community composition and functional capacity. Data sets and results are freely available from http://www.ucbioinformatics.org/metabenchmark.html.
Year: 2016 PMID: 26778510 PMCID: PMC4726098 DOI: 10.1038/srep19233
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Performance of the selected tools. Where applicable, the best performance in each category is highlighted in bold.
| Analysis tool | Fraction | Shuffled | False positives | Run time (CPU min) | Correlation |
|---|---|---|---|---|---|
| CLARK | 73.32% | 340,607 | 0.02% | 211.50 | |
| EBI | 0.08% | | 41.74% | ~12 days | 0.7427 |
| Genometa | 39.91% | | 0.83% | 401 | 0.9136 |
| GOTTCHA | 43.10% | NA | | 229.49 | 0.1777 |
| Kraken | 71.98% | 19 | | 60.95 | 0.9915 |
| LMAT | 56.61% | 1,486,699 | 0.63% | 981.21 | 0.9395 |
| MEGAN | 42.21% | NA | 0.49% | 2489.65 | 0.7728 |
| MetaPhlAn | 5.09% | | 0.75% | 108.51 | 0.9552 |
| MetaPhyler | 0.45% | 649 | 0.05% | 26586.15 | 0.7989 |
| MG-RAST | 56.17% | 3 | 0.27% | 16881.8 | 0.9209 |
| mOTU | 0.16% | NA | 0.10% | 45.8 | 0.9334 |
| One Codex | 73.68% | 23 | | 27.77 | 0.9787 |
| QIIME | 58.23% | | 0.28% | | 0.7772 |
| Taxator-tk | 45.67% | 2 | 14.07% | 9147.92 | 0.8561 |
Fraction: average fraction of all reads that the tool mapped.
Shuffled: average number of shuffled reads mapped.
False positives: fraction of mapped reads assigned to non-existing phyla.
Run time: CPU time in minutes per metagenome (where applicable).
Correlation: the average Pearson correlation coefficient between predicted and known relative abundances of phyla in the data sets. GOTTCHA failed on set B3, so the average of sets B1 and B2 is used instead.
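The Correlation column can be illustrated with a minimal sketch of the Pearson correlation coefficient between known and predicted relative abundances. The function name `pearson` and the abundance values are hypothetical and are not taken from the benchmark data; the benchmark reports the average of this quantity over the data sets.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (unnormalized; the 1/n factors cancel in r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical relative abundances for four phyla (illustrative values only).
known = [0.40, 0.30, 0.20, 0.10]
predicted = [0.42, 0.27, 0.22, 0.09]
r = pearson(known, predicted)
print(f"Pearson r = {r:.4f}")
```

A value close to 1 means the predicted phylum proportions track the known ones closely, independent of how many reads the tool mapped in total.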
Phylum level performance metrics for the individual methods.
| Method | TP | FP | TN | FN | SEN | SPEC | PPV | NPV | MCC |
|---|---|---|---|---|---|---|---|---|---|
| CLARK | 23571770 | 1170750 | 4718015 | 0 | 1.0000 | 0.8012 | 0.9527 | 1.0000 | 0.8736 |
| EBI | 13879 | 9939 | 5782564 | 23654153 | 0.0006 | 0.9983 | 0.5826 | 0.1964 | –0.0157 |
| Genometa | 11732372 | 99524 | 5782564 | 11846075 | 0.4968 | 0.9831 | 0.9917 | 0.3280 | 0.3926 |
| GOTTCHA | 12756512 | 0 | 5782564 | 10921460 | 0.5388 | 1.0000 | 1.0000 | 0.3462 | 0.4327 |
| Kraken | 21305328 | 86 | 5782545 | 2372576 | 0.8998 | 1.0000 | 1.0000 | 0.7091 | 0.7991 |
| LMAT | 15166868 | 1592274 | 4295866 | 8405528 | 0.6433 | 0.7296 | 0.9050 | 0.3382 | 0.3023 |
| MEGAN | 12868515 | 63500 | 5782564 | 10745957 | 0.5452 | 0.9891 | 0.9951 | 0.3499 | 0.4305 |
| MetaPhlAn | 1507348 | 0 | 5782564 | 22170624 | 0.0636 | 1.0000 | 1.0000 | 0.2069 | 0.1150 |
| MetaPhyler | 133836 | 713 | 5781915 | 23544072 | 0.0057 | 0.9999 | 0.9947 | 0.1972 | 0.0327 |
| MG-RAST | 16554882 | 44309 | 5782562 | 7078782 | 0.7015 | 0.9924 | 0.9973 | 0.4496 | 0.5605 |
| mOTU | 47846 | 0 | 5782564 | 23630126 | 0.0020 | 1.0000 | 1.0000 | 0.1966 | 0.0200 |
| One Codex | 21808925 | 320 | 5782541 | 1868749 | 0.9210 | 0.9999 | 1.0000 | 0.7558 | 0.8345 |
| QIIME | 12914 | 37 | 5782564 | 23665021 | 0.0005 | 1.0000 | 0.9972 | 0.1964 | 0.0102 |
| Taxator-tk | 11610500 | 1898276 | 5782562 | 10169197 | 0.5335 | 0.7537 | 0.8593 | 0.3625 | 0.2537 |
Average numbers for the simulated data sets are given.
The metrics are true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN), as well as sensitivity (SEN), specificity (SPEC), positive predictive value (PPV), negative predictive value (NPV) and the Matthews correlation coefficient (MCC).
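As a rough illustration of how these metrics relate to the four counts, the sketch below applies the standard textbook definitions to the CLARK row of the table. The function name `confusion_metrics` is hypothetical. Because the table reports averages over the simulated data sets, values computed from the averaged counts may differ from the tabulated ones in the last digit.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(tp, fp, tn, fn):
    """Standard classification metrics from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "SEN": _ratio(tp, tp + fn),    # sensitivity (recall)
        "SPEC": _ratio(tn, tn + fp),   # specificity
        "PPV": _ratio(tp, tp + fp),    # positive predictive value (precision)
        "NPV": _ratio(tn, tn + fn),    # negative predictive value
        "MCC": _ratio(tp * tn - fp * fn, denom),  # Matthews correlation coefficient
    }


# Counts from the CLARK row of the phylum-level table.
clark = confusion_metrics(tp=23571770, fp=1170750, tn=4718015, fn=0)
for name, value in clark.items():
    print(f"{name}: {value:.4f}")
```

MCC uses all four counts at once, which is why a tool with very few false positives but many unmapped reads (a large FN) can have a PPV near 1 and still an MCC near 0.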
Figure 1. Plot of performance metrics for all methods included in this benchmark.
The closer each metric is to 1, the more accurate the method. The values are based on classifying reads mapped at the level of phylum (A) or genus (B).
Pearson correlations between different quality metrics at the phylum and genus level.
The metrics are the Matthews correlation coefficient (MCC), the sum of log-odds scores between predicted and known proportions, and the Pearson correlation between all the predicted and known relative abundances. The run times are the same for both genus and phylum level. Significant correlations (P < 0.05) are highlighted in bold. Marginally significant correlations (P < 0.1) are indicated with asterisks (*).
Comparison between different quality metrics at the level of either phylum or genus.
| | MCC vs sum of log-odds | MCC vs correlation | Sum of log-odds vs correlation |
|---|---|---|---|
| Phylum | | | |
| Genus | | | |
Same metrics and notation as in Table 3.
Figure 2. Analysis of performance at the level of phylum (left) and genus (right).
(A,B) Sum of absolute log-odds scores at the phylum (A) or genus (B) level for each tool (bars) and log2 of run time in minutes (asterisks, *). The sum of log-odds scores indicates the overall performance in terms of deviation from the known proportions: a low sum indicates high accuracy. (C,D) NMDS plot of relative abundances at the level of phylum (C) and genus (D) for the known and predicted communities in replicates. Eukaryotes are not included. Metagenomes in set A are gray, and metagenomes in set B are black. The known communities are shown with a star.
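The caption does not spell out how the per-taxon log-odds score is computed. The sketch below is one plausible reading, assuming the score is log2 of the predicted proportion over the known proportion, with a small pseudocount so that taxa a tool missed entirely do not produce an infinite score. The function name `sum_abs_log_odds`, the pseudocount and the example proportions are all assumptions made for illustration, not the authors' stated method.

```python
import math


def sum_abs_log_odds(predicted, known, pseudocount=1e-6):
    """Sum over known taxa of |log2(predicted / known)|.

    `predicted` and `known` map taxon name -> relative abundance.
    Taxa that appear only in `predicted` are ignored in this sketch.
    """
    total = 0.0
    for taxon, known_p in known.items():
        pred_p = predicted.get(taxon, 0.0)
        total += abs(math.log2((pred_p + pseudocount) / (known_p + pseudocount)))
    return total


# Hypothetical phylum-level proportions (illustrative values only).
known = {"Proteobacteria": 0.50, "Firmicutes": 0.30, "Actinobacteria": 0.20}
predicted = {"Proteobacteria": 0.55, "Firmicutes": 0.25, "Actinobacteria": 0.20}
score = sum_abs_log_odds(predicted, known)
print(f"sum of absolute log-odds = {score:.4f}")
```

Under this reading, a perfect prediction gives a sum of 0, and over- and under-estimates of the same fold change contribute equally.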
Figure 3. Shifts in relative abundance of the three functional categories (or set of categories) that vary between set A and set B for the tools that analyze the functional capacity of metagenomes.
A positive log-odds score means an increase in set A relative to set B, and a negative log-odds score means a decrease in set A relative to set B.
The metagenome analysis tools included in this benchmark.
| Tool | Version | Taxonomy | Function | Fastq | Zipped | Paired |
|---|---|---|---|---|---|---|
| CLARK | 1.1.3 | Yes | No | Yes | Yes | Yes |
| EBI | NA | Yes | Yes | Yes | Yes | Yes |
| Genometa | 0.51 | Yes | No | Yes | No | Yes |
| GOTTCHA | 1.0a | Yes (E) | No | Yes | No | Yes |
| Kraken | 0.10.4 beta | Yes | No | Yes | Yes | Yes |
| LMAT | 1.2.4 | Yes | Yes | No | No | Yes |
| MEGAN | 5.7.0 | Yes | Yes | Yes | No | (No) |
| MetaPhlAn | 2 | Yes (E) | No | Yes | Yes | Yes |
| MetaPhyler | 1.25 | Yes | No | No | No | Yes |
| MG-RAST | 3.3.6 | Yes (E) | Yes | Yes | Yes | (Yes) |
| mOTU | NA | Yes | No | Yes | Yes | Yes |
| One Codex | NA | Yes (E) | No | Yes | Yes | Yes |
| QIIME | 1.8.0 | Yes | No | Yes | Yes | Yes |
| Taxator-tk | 1.2.1 | Yes | No | No | No | Yes |
For each tool, the table shows whether it performs taxonomic analysis (tools that can also infer eukaryotic taxa are marked with an “(E)”) and/or functional analysis, whether it can analyze Fastq files directly, whether it accepts zipped input files, and whether it uses paired-end information.
1. You need to concatenate the input files with an N between paired reads.
2. Input to MEGAN was generated using the aligner Diamond (v0.6.3) from the same group.
3. MEGAN supports paired-end data. However, Diamond does not explicitly support this.
4. Each file is treated separately, but the final results can be combined by the tool.
5. The server recognizes paired-end data but seems to treat reads separately.
6. QIIME is highly flexible and can handle both zipped and unzipped fastq files, and both single- and paired-end reads. However, in this analysis we adapted QIIME to work with fasta output from HMMER and could not use these features.
7. Although there is no direct support, the authors provided a way to use the paired information (see Supplementary Material section 2.10 for details).