Andres Benavides, Friman Sanchez, Juan F Alzate, Felipe Cabarcas.
Abstract
BACKGROUND: A prime objective in metagenomics is to classify DNA sequence fragments into taxonomic units. This usually requires several stages: read quality control, de novo assembly, contig annotation, gene prediction, etc. These stages need very efficient programs because of the large number of reads that metagenomic projects produce. Furthermore, the complexity of metagenomes requires efficient and automatic tools that orchestrate the different stages.
Keywords: Algorithm; Bioinformatics; Distributed computing; Grid computing; Metagenomics; Workflow
Year: 2020 PMID: 32953263 PMCID: PMC7474881 DOI: 10.7717/peerj.9762
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. DATMA structure.
DATMA automatically executes: (i) sequencing quality control (red blocks), (ii) 16S identification (blue blocks), (iii) CLAME binning (yellow blocks), (iv) de novo assembly, ORF detection, and taxonomic analysis (violet blocks), and (v) data management report (green blocks).
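The staged structure in the Figure 1 caption can be pictured as a list of steps run in a fixed order over a shared state. The following is only a minimal sketch under that assumption: the stage names are taken from the caption, but `make_stage`, `run_workflow`, and the `completed` key are hypothetical names introduced for illustration and do not reflect DATMA's actual implementation or interface.

```python
from typing import Callable, Dict, List

# Hypothetical types: each stage takes the shared state and returns it updated.
State = Dict[str, List[str]]
Stage = Callable[[State], State]

# Stage names follow the Figure 1 caption; the stages themselves are placeholders.
STAGE_NAMES = [
    "quality_control",
    "16s_identification",
    "clame_binning",
    "assembly_orf_taxonomy",
    "report",
]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def stage(state: State) -> State:
        state["completed"].append(name)
        return state
    return stage


def run_workflow(stage_names: List[str]) -> State:
    """Run the placeholder stages strictly in the listed order."""
    state: State = {"completed": []}
    for name in stage_names:
        state = make_stage(name)(state)
    return state


print(run_workflow(STAGE_NAMES)["completed"])
```

In a real workflow each placeholder would wrap an external tool, and the shared state would carry file paths rather than stage names.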
Figure 2. DATMA results for CAMI Low and Medium Complexity datasets.
(A) CAMI high. (B) CAMI medium. (C) CAMI low.
Analysis report for the Brocadia experiment.
| Framework | Bins | Contigs | Largest (bp) | N50 (bp) | Genome (Mbp) | ORFs | Completeness (%) | Contamination (%) | Lineage | Time |
|---|---|---|---|---|---|---|---|---|---|---|
| DATMA | 2 | 677 | 88819 | 18421 | 3.96 | 4330 | 93.96 | 10.05 | Brocadiaceae | 60 |
| | | 1382 | 13527 | 2456 | 2.37 | 3656 | 47.95 | 1.78 | Brocadiaceae | |
| MetaWRAP | 2 | 607 | 58497 | 9402 | 3.67 | 4273 | 96.08 | 5.00 | Brocadiaceae | 135 |
| | | 374 | 29910 | 10268 | 2.81 | 4015 | 77.30 | 1.75 | Brocadiaceae | |
| SqueezeMeta | NA (*) | 10345 | 3264 | 519 | 4.13 | 10283 | 89.47 | 111.28 | Brocadiaceae | 85 |
| | | 12753 | 3420 | 360 | 4.21 | 12607 | 74.76 | 100.00 | Bacteroidetes | |
| | | 12698 | 4314 | 342 | 4.14 | 11916 | 65.33 | 84.78 | Proteobacteria | |

Notes.
(*) We manually selected the contigs from the annotation report.
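The N50 column is easier to interpret with its usual definition in mind: the contig length at which, going from the longest contig downwards, the running total first reaches half of the total assembled length. The sketch below assumes that standard definition; the source does not say which tool computed the reported values. The function name `n50` and the toy lengths are illustrative and are not taken from the tables.

```python
from typing import List


def n50(contig_lengths: List[int]) -> int:
    """Return the N50 (in bp) of a list of contig lengths."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    # Walk from the longest contig downwards, accumulating length.
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy example: total length is 1500 bp, so half is 750 bp.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))  # 500 + 400 = 900 >= 750, so this prints 400
```

A higher N50 at a similar genome size generally indicates a less fragmented bin, which is how the N50 and Contigs columns can be read side by side.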
Coverage report on the Brocadia genome using the contigs from each framework.
| Framework | Contigs | | | | | |
|---|---|---|---|---|---|---|
| DATMA | 677 | 19785 | 53 | 6 | 97.2 | 1.0 |
| MetaWRAP | 607 | 9191 | 73 | 2 | 96.4 | 1.0 |
| SqueezeMeta | 10345 | 579 | 5 | 1251 | 30.4 | 1.0 |
Figure 3. Taxonomic report for the 16S rRNA sequences from the Biosolid metagenome.
Analysis report for the Biosolid metagenome.
| Bins | Contigs | Largest (bp) | N50 (bp) | Genome (Mbp) | ORFs | Completeness (%) | Contamination (%) | Lineage | Time |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 1292 | 12707 | 2912 | 2.85 | 3786 | 54.81 | 101.18 (72.60%) (a) | Chloroflexi-Anaerolineaceae | 125 |
| | 647 | 37610 | 5380 | 2.10 | 2529 | 70.69 | 49.34 (97.75%) (a) | Chloroflexi-Anaerolineaceae | |
| 8 | 495 | 87496 | 17399 | 4.92 | 4266 | 94.59 | 4.73 | Bacteria | 485 |
| | 157 | 82800 | 18931 | 2.10 | 2110 | 89.03 | 1.69 | Bacteria | |
| | 218 | 60788 | 20930 | 2.72 | 2954 | 88.70 | 3.22 | Proteobacteria | |
| | 463 | 34164 | 7341 | 2.57 | 3031 | 87.16 | 1.32 | Bacteria | |
| | 731 | 22922 | 4571 | 2.78 | 3293 | 85.49 | 3.01 | Actinobacteria | |
| | 994 | 23123 | 4103 | 3.58 | 4481 | 84.98 | 6.70 | Proteobacteria-Pseudomonas | |
| | 420 | 26523 | 6363 | 2.17 | 2969 | 83.09 | 1.11 | Gammaproteobacteria | |
| | 754 | 19037 | 5384 | 3.33 | 3944 | 82.64 | 3.61 | Proteobacteria-Pseudomonadaceae | |
| NA (b) | 204323 | 11961 | 528 | 93.69 | 46730 | 100 | 2844 | Proteobacteria | 626 |
| | 49288 | 5376 | 579 | 24.30 | 49227 | 95.83 | 1258 | Firmicutes | |
| | 47728 | 5055 | 519 | 21.66 | 46730 | 100 | 675 | Actinobacteria | |
| | 41526 | 9084 | 585 | 20.69 | 41342 | 100 | 659 | Bacteroidetes | |
| NA (b) | 114806 (c) | NA | NA | NA | NA | NA | NA | Proteobacteria-Pseudomonadaceae | 1 week |
| | 95148 (c) | NA | NA | NA | NA | NA | NA | Chloroflexi-Anaerolineaceae | |

Notes.
(a) Strain-heterogeneity index.
(b) We manually selected the contigs from the annotation report.
(c) The values correspond to the number of reads.
Figure 4. Taxonomic report for Bin 1 from the Biosolid metagenome using DATMA.
Figure 5. Phylogenetic tree for the 16S rRNA gene (16S_Chloroflexi_UdeA).
The values in the branches indicate the percentage of replicate trees in which the associated taxa clustered together in the bootstrap test.
Figure 6. Computational performance of DATMA.
(A) Computational time of DATMA for all datasets using several workers. (B) Memory performance of DATMA.