Felix M Kibegwa, Rawlynce C Bett, Charles K Gachuiri, Francesca Stomeo, Fidalis D Mujibi.
Abstract
Shotgun metagenomic data generated on next-generation sequencing platforms can be analyzed with a variety of bioinformatic pipelines. These pipelines employ different sets of algorithms, which may affect the results of the analysis. In this study, we compared two commonly used pipelines for shotgun metagenomic analysis, MG-RAST and Kraken 2, in terms of taxonomic classification, diversity analysis, and usability, using primarily their default parameters. Overall, the two pipelines detected similar abundance distributions for the three most abundant taxa: Proteobacteria, Firmicutes, and Bacteroidetes. Within the bacterial domain, 497 genera were identified by both pipelines, while an additional 694 and 98 genera were identified solely by Kraken 2 and MG-RAST, respectively. A total of 933 species were detected by both pipelines; Kraken 2 alone detected 3550 species, while MG-RAST uniquely identified 557 species. For archaea, Kraken 2 detected 105 genera and 236 species, while MG-RAST detected 60 genera and 88 species; 54 genera and 72 species were detected by both methods. Kraken 2 had a shorter analysis time (~4 hours), while MG-RAST took approximately 2 days per sample. This study revealed that Kraken 2 and MG-RAST generate comparable results and that a reliable high-level overview of a sample is generated irrespective of the pipeline selected. However, Kraken 2 generated a more accurate taxonomic identification, given the higher number of "Unclassified" reads in MG-RAST. The observed variations at the genus level show that a main limitation is the use of different databases for classifying the metagenomic data. The results of this research indicate that a more inclusive and representative classification of microbiomes may be achieved through the creation of combined pipelines.
Year: 2020 PMID: 32382536 PMCID: PMC7195676 DOI: 10.1155/2020/2348560
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. Overview of the workflow used (Kraken 2 and MG-RAST), presenting the software parameters used to analyze the data. MG-RAST has two additional steps for data transformation and reduction.
Comparison of the functionality and features of MG-RAST and Kraken 2.
| | Kraken 2 | MG-RAST |
|---|---|---|
| License | Open-source | Open-source |
| Implemented in | C++ and Perl | Perl |
| Current version (at 23.05.19) | v2.0.8-beta | 4.0.3 |
| Website | | |
| Web-based interface | No | Yes (at website above) |
| Primary usage | Command line | GUI (at website above) |
| Sequencing technology compatibility | Illumina, 454, Sanger, Ion Torrent, PacBio | Illumina, 454, Sanger, Ion Torrent, PacBio |
| Quality control | No | Yes |
| Taxonomic analysis/assignment | | BLAT |
| Taxonomy | Yes | Yes (E) |
| Function | No | Yes |
| Fastq | Yes | Yes |
| Zipped | Yes | Yes |
| Paired | Yes | Yes (R) |
| Diversity analysis | No | Alpha |
| Phylogenetic tree | No | Yes |
| Visualization | No | PCA plots, heat maps, pie charts, bar plots, Krona, and Circos |
“(E)” indicates that the tool infers eukaryotic taxa and/or functional analysis. GUI means graphical user interface. “(R)” means that the server recognizes paired-end data but seems to treat the reads separately. Part of this figure was adapted from the pipeline published by [18].
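Because Kraken 2 is a command-line tool without built-in diversity analysis or visualization, its output usually has to be parsed before it can be compared with MG-RAST results. The sketch below is one possible way to do that, assuming the standard six-column, tab-separated Kraken 2 report (percentage, clade reads, direct reads, rank code, NCBI taxonomy ID, indented name). The function name and the three-line example report are hypothetical and are not taken from the study.

```python
import io


def genera_from_kraken2_report(handle):
    """Collect names of taxa with rank code 'G' (genus) from a Kraken 2 report.

    Assumes the standard six-column, tab-separated report layout; lines with
    fewer than six fields are skipped.
    """
    genera = set()
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        rank_code = fields[3].strip()
        name = fields[5].strip()  # names are indented to show tree depth
        if rank_code == "G":
            genera.add(name)
    return genera


# Made-up report lines, used only to exercise the parser.
example_report = io.StringIO(
    " 60.00\t600\t0\tD\t2\t  Bacteria\n"
    " 40.00\t400\t10\tG\t561\t          Escherichia\n"
    " 20.00\t200\t200\tS\t562\t            Escherichia coli\n"
)
print(genera_from_kraken2_report(example_report))
```

Returning a set makes it straightforward to intersect the result with a genus list exported from another pipeline.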
Evaluation of taxonomic phylotypes by each technique.
| Phylotypes | Kraken 2, Lushoto (no.) | Kraken 2, Rungwe (no.) | MG-RAST, Lushoto (no.) | MG-RAST, Rungwe (no.) | Commonly detected phylotypes |
|---|---|---|---|---|---|
| Bacteria | |||||
| Phyla | 38 | 38 | 28 | 28 | 26 |
| Genera | 1191 | 1191 | 595 | 596 | 497 |
| Species | 4462 | 4465 | 1479 | 1481 | 933 |
| Archaea | |||||
| Phyla | 5 | 5 | 5 | 5 | 4 |
| Genera | 105 | 105 | 60 | 60 | 54 |
| Species | 235 | 236 | 88 | 88 | 72 |
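The pipeline-specific counts quoted in the abstract follow from this table by simple set arithmetic: the taxa unique to one pipeline are its total minus the shared taxa. The sketch below illustrates this with the bacterial genus counts for Lushoto; the function name is hypothetical, and the calculation assumes that every shared taxon is included in both pipeline totals.

```python
def overlap_counts(total_a, total_b, shared):
    """Return (unique to A, unique to B, size of the union) from totals and overlap."""
    if shared < 0 or shared > min(total_a, total_b):
        raise ValueError("shared count must lie between 0 and the smaller total")
    unique_a = total_a - shared
    unique_b = total_b - shared
    union = total_a + total_b - shared
    return unique_a, unique_b, union


# Bacterial genera (Lushoto): Kraken 2 = 1191, MG-RAST = 595, shared = 497.
kraken_only, mgrast_only, combined = overlap_counts(1191, 595, 497)
print(kraken_only, mgrast_only, combined)  # prints: 694 98 1289
```

The union (1289 genera here) is the number of genera a combined pipeline would report if both result sets were merged.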
Most abundant bacteria according to the two classification approaches.
| Taxa | Kraken 2, mean ± SE (%) | MG-RAST, mean ± SE (%) | P value |
|---|---|---|---|
| Phyla | |||
| Proteobacteria | 75.92 ± 4.1 | 75.12 ± 3.06 | 0.88 |
| Firmicutes | 9.69 ± 2.2 | 9.29 ± 1.63 | 0.88 |
| Bacteroidetes | 9.22 ± 1.41 | 12.7 ± 1.54 | 0.1 |
| Actinobacteria | 2.9 ± 0.41 | 1.25 ± 0.13 | <0.001 |
| Tenericutes | 0.72 ± 0.11 | 0.19 ± 0.05 | <0.001 |
| Genus | |||
| | 32.64 ± 4.1 | 32.42 ± 3.29 | 0.97 |
| | 8.38 ± 1.34 | 3.57 ± 0.54 | <0.001 |
| | 6.72 ± 1.82 | 1.87 ± 0.56 | 0.01 |
| | 2.2 ± 1.36 | 1.08 ± 0.22 | 0.42 |
| | 1.83 ± 0.4 | 3.72 ± 0.75 | 0.03 |
| | 1.61 ± 0.23 | 6.18 ± 0.83 | <0.001 |
| | 1.85 ± 0.35 | 1.75 ± 0.3 | 0.83 |
| | 1.19 ± 0.26 | 2.97 ± 0.54 | <0.001 |
| Species | |||
| | 6.77 ± 1.69 | 21.67 ± 2.81 | <0.001 |
| | 1.03 ± 0.35 | 3.57 ± 0.54 | <0.001 |
| | 1.34 ± 0.38 | 2.34 ± 0.18 | 0.02 |
| | 1.46 ± 0.49 | 1.66 ± 0.37 | 0.74 |
| | 1.22 ± 0.21 | 0.45 ± 0.13 | <0.001 |
| | 0.83 ± 0.13 | 0.21 ± 0.05 | <0.001 |
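Each entry in the table above summarizes a taxon's relative abundance across samples as mean ± SE (%). A minimal sketch of that summary follows, assuming the standard error is the sample standard deviation (n − 1 denominator) divided by the square root of the number of samples; the source does not state the formula, and the abundance values below are made up for illustration.

```python
import math
import statistics


def mean_se(values):
    """Return (mean, standard error of the mean) for a list of numbers."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed to estimate the SE")
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # sample SD, n - 1 denominator
    return mean, se


# Hypothetical relative abundances (%) of one phylum in four samples.
abundances = [70.1, 78.4, 81.0, 74.3]
mean, se = mean_se(abundances)
print(f"{mean:.2f} ± {se:.2f}")
```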
Most abundant archaea according to the two classification approaches.
| Taxa | Kraken 2, mean ± SE (%) | MG-RAST, mean ± SE (%) | P value |
|---|---|---|---|
| Phyla | |||
| Euryarchaeota | 90.96 ± 0.41 | 94.44 ± 0.44 | <0.001 |
| Crenarchaeota | 6.59 ± 0.27 | 4.45 ± 0.39 | <0.001 |
| Thaumarchaeota | 2.3 ± 0.25 | 0.54 ± 0.06 | <0.001 |
| Korarchaeota | 0.07 ± 0.03 | 0.49 ± 0.08 | <0.001 |
| Genus | |||
| | 8.17 ± 1.47 | 2.4 ± 0.16 | <0.001 |
| | 7.62 ± 0.51 | 12.83 ± 0.82 | <0.001 |
| | 6.79 ± 0.51 | 2.51 ± 0.19 | <0.001 |
| | 3.53 ± 0.59 | 6.39 ± 0.98 | 0.01 |
| Species | |||
| | 3.62 ± 0.7 | 9.51 ± 1.28 | <0.001 |
| | 2.82 ± 0.27 | 3.42 ± 0.23 | 0.1 |
| | 1.84 ± 0.27 | 15.87 ± 1.85 | <0.001 |
| | 1.97 ± 0.54 | 6.39 ± 0.98 | <0.001 |
| | 1.6 ± 0.19 | 3.96 ± 0.23 | <0.001 |
Figure 2. Alpha diversity matrices of bacteria (a) and archaea (b) between Lushoto and Rungwe samples.
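The caption does not say which alpha diversity measures were plotted. As an illustration only, the sketch below computes two widely used ones, observed richness and the Shannon index $H = -\sum_i p_i \ln p_i$, from per-taxon read counts of a single sample. The choice of measures and the example counts are assumptions, not details taken from the study.

```python
import math


def observed_richness(counts):
    """Number of taxa with at least one assigned read."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical read counts per genus in one sample.
sample_counts = [120, 30, 30, 20, 0]
print(observed_richness(sample_counts), round(shannon_index(sample_counts), 3))
```

For a perfectly even community of k taxa the Shannon index equals ln(k), which gives a quick sanity check on any implementation.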