Sebastian Jaenicke, Stefan P Albaum, Patrick Blumenkamp, Burkhard Linke, Jens Stoye, Alexander Goesmann.
Abstract
BACKGROUND: The characterization of microbial communities based on sequencing and analysis of their genetic information has become a popular approach also referred to as metagenomics; in particular, the recent advances in sequencing technologies have enabled researchers to study even the most complex communities. Metagenome analysis, the assignment of sequences to taxonomic and functional entities, however, remains a tedious task: large amounts of data need to be processed. There are a number of approaches addressing particular aspects, but scientific questions are often too specific to be answered by a general-purpose method.
Keywords: Metagenomics; Microbial community analysis; Next-generation sequencing
Year: 2018 PMID: 29690922 PMCID: PMC5937802 DOI: 10.1186/s40168-018-0460-1
Source DB: PubMed Journal: Microbiome ISSN: 2049-2618 Impact factor: 14.650
Fig. 1 MGX system overview. MGX is a client/server framework, where each client connects to one (or several) MGX server instances; sequence data and corresponding metadata are stored on the server. A job dispatcher prioritizes and schedules analysis workflows to compute resources such as high-performance computing (HPC) clusters or a compute cloud.
Fig. 2 The MGX application client. Shown are the project explorer window (top left), quality control reports for the currently selected sequencing run (bottom left), and a hierarchical tree chart (center) displaying three groups, which are defined at the bottom. Different customization and filtering options are available for each chart (right).
Fig. 3 Single sequence resolution. With MGX, analysis results can be inspected down to individual sequence level, allowing the comparison between different annotation strategies as well as providing additional contextual information. Here, a sequence assigned to Psychrobacter cryohalolentis carries a trehalose-phosphatase fragment, which is independently supported by three different analysis methods (TIGRFAMS, COG, and an EC number).
Taxonomic classification performance on genus level for benchmark datasets
| | Kraken | Kaiju | Centrifuge | MetaPhlAn 2 | MGX |
|---|---|---|---|---|---|
| RefSeq | | | | | |
| True positive | 12,059,412 | 9,329,288 | 12,611,380 | 414,943 | 12,566,362 |
| False positive | 18,748 | 185,899 | 53,092 | 7,171 | 20,698 |
| False negative | 1,281,840 | 3,844,813 | 695,528 | 12,937,886 | 772,940 |
| Sensitivity | 0.9039 | 0.7082 | *0.9477* | 0.0311 | 0.9421 |
| Precision | *0.9984* | 0.9805 | 0.9958 | 0.9830 | *0.9984* |
| Accuracy | 0.9027 | 0.6983 | *0.9440* | 0.0311 | 0.9406 |
| F1 score | 0.9488 | 0.8224 | *0.9712* | 0.0602 | 0.9694 |
| GenBank | | | | | |
| True positive | 1,851,436 | 2,592,655 | 2,175,122 | 92,383 | 3,976,270 |
| False positive | 398,899 | 1,230,445 | 864,989 | 10,378 | 734,389 |
| False negative | 9,629,665 | 8,056,900 | 8,839,889 | 11,777,239 | 7,169,341 |
| Sensitivity | 0.1613 | 0.2435 | 0.1975 | 0.0078 | *0.3568* |
| Precision | 0.8227 | 0.6782 | 0.7155 | *0.8990* | 0.8441 |
| Accuracy | 0.1558 | 0.2182 | 0.1831 | 0.0078 | *0.3347* |
| F1 score | 0.2697 | 0.3583 | 0.3095 | 0.0154 | *0.5015* |
All tools achieve high precision on the RefSeq-derived metagenome, as the source organisms are already included in the relevant classification databases. For the GenBank-based metagenome, which contains only species not present in the tools’ databases, MetaPhlAn 2 offers high precision but very low sensitivity (0.78%). The MGX-provided default pipeline follows in precision and ranks highest in sensitivity, accuracy, and F1 score. Numbers in italics denote best results.
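The derived rows of the benchmark table follow from the raw true positive, false positive, and false negative counts. The sketch below recomputes them in Python. It rests on one assumption that is not stated in the text: since no true negatives are reported, accuracy is taken as TP / (TP + FP + FN), which appears to match the tabulated values. The function name `classification_metrics` is hypothetical and chosen only for illustration.

```python
def classification_metrics(tp: int, fp: int, fn: int) -> dict:
    """Derive summary metrics from raw classification counts.

    Assumption: accuracy is TP / (TP + FP + FN), because the benchmark
    reports no true negatives.
    """
    sensitivity = tp / (tp + fn)   # share of expected reads recovered
    precision = tp / (tp + fp)     # share of assignments that are correct
    accuracy = tp / (tp + fp + fn)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": round(sensitivity, 4),
        "precision": round(precision, 4),
        "accuracy": round(accuracy, 4),
        "f1": round(f1, 4),
    }


# Counts copied from the RefSeq block of the table (Kraken and Kaiju columns).
kraken_refseq = classification_metrics(tp=12_059_412, fp=18_748, fn=1_281_840)
kaiju_refseq = classification_metrics(tp=9_329_288, fp=185_899, fn=3_844_813)

print("Kraken / RefSeq:", kraken_refseq)
print("Kaiju / RefSeq:", kaiju_refseq)
```

Under that assumption, the Kraken counts reproduce the tabulated sensitivity, accuracy, and F1 score (0.9039, 0.9027, 0.9488), and the Kaiju counts reproduce 0.7082, 0.9805, 0.6983, and 0.8224. The same function can be applied to any other column to check a value.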
Comparison of applications for metagenome analysis
| | MGX | MG-RAST | IMG/M | EBI Metagenomics | CloVR | CyVerse |
|---|---|---|---|---|---|---|
| Quality control | x | x | x | x | – | x |
| Taxonomic/functional profiling | x | x | x | x | x | x |
| Assembly support | – | (x)a | (x)a, b | (x)a | – | x |
| Charts/visualizations | x | x | x | x | x | – |
| Custom pipelines | x | – | – | – | – | x |
| User-provided databases | x | – | – | – | – | x |
| Fragment recruitment | x | – | – | – | – | – |
| Free of charge | x | x | x | x | (−)c | x |
a Supports analysis of preassembled data
b Submission restricted to assembled data
c No cost to run standalone virtual machine, but typical metagenome sizes will require additional compute resources on Amazon EC2