Josh L Espinoza, Chris L Dupont.
Abstract
BACKGROUND: With the advent of metagenomics, the importance of microorganisms, and of how their interactions shape ecosystem resilience, sustainability, and human health, has become evident. Cataloging and preserving biodiversity is paramount not only for the Earth's natural systems but also for discovering solutions to the challenges we face as a growing civilization. Metagenomics is the in silico study of all microorganisms within an ecological community in situ; however, many software suites recover only prokaryotes and offer limited or no support for viruses and eukaryotes.
Keywords: Binning; Metagenome-assembled genome; Metagenomics; Pipeline
Year: 2022; PMID: 36224545; PMCID: PMC9554839; DOI: 10.1186/s12859-022-04973-8
Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.307
Fig. 1. Schematic of VEBA workflow. VEBA modules and workflow I/O connectivity
Fig. 2. Schematic of the iterative binning algorithm. VEBA’s iterative binning algorithm and the flow of contigs through the procedure
Microeukaryotic protein database taxonomy synopsis
| Class | Number of representatives: Order | Number of representatives: Family | Number of representatives: Genus | Number of representatives: Species | Number of sequences |
|---|---|---|---|---|---|
| Aconoidasida | | | | 3366 | 420945 |
| Agaricomycetes | 20 | 122 | 730 | 7633 | 3598622 |
| Arthoniomycetes | | | | 277 | 557 |
| Bacillariophyceae | 25 | 49 | 139 | 1139 | 3695969 |
| Bangiophyceae | 3 | 4 | 27 | 298 | 91032 |
| Conoidasida | 3 | 12 | 26 | 548 | 283655 |
| Coscinodiscophyceae | 11 | 24 | 49 | 369 | 761079 |
| Cryptophyceae | 5 | 9 | 18 | 126 | 1281699 |
| Dinophyceae | 13 | 37 | 80 | 404 | 9452835 |
| Dothideomycetes | 33 | 120 | 796 | 4173 | 2193726 |
| Eumycetozoa | 7 | 17 | 48 | 212 | 110038 |
| Eurotiomycetes | 10 | 29 | 137 | 2028 | 2406417 |
| Florideophyceae | 28 | 95 | 650 | 4014 | 140811 |
| Fragilariophyceae | 9 | 12 | 62 | 216 | 226623 |
| Glomeromycetes | 4 | 10 | 30 | 126 | 456928 |
| Haptophyta | 8 | 15 | 31 | 96 | 847085 |
| Kinetoplastea | 4 | 4 | 28 | 355 | 511789 |
| Lecanoromycetes | 15 | 66 | 435 | 2593 | 103042 |
| Leotiomycetes | 9 | 32 | 215 | 795 | 737176 |
| Mediophyceae | 8 | 10 | 49 | 155 | 190677 |
| Microbotryomycetes | 5 | 7 | 15 | 107 | 121936 |
| Mucoromycetes | 1 | 14 | 52 | 184 | 583544 |
| Oligohymenophorea | 10 | 37 | 70 | 406 | 266349 |
| Pezizomycetes | 1 | 15 | 143 | 709 | 224226 |
| Phaeophyceae | 12 | 43 | 236 | 1244 | 58542 |
| Pucciniomycetes | 5 | 19 | 62 | 379 | 228062 |
| Saccharomycetes | 1 | 15 | 83 | 844 | 1157942 |
| Sordariomycetes | 31 | 99 | 705 | 7228 | 3772436 |
| Spirotrichea | 8 | 34 | 84 | 199 | 429742 |
| Tremellomycetes | 4 | 17 | 50 | 316 | 377309 |
| Ustilaginomycetes | 4 | 10 | 25 | 169 | 137101 |
| Xanthophyceae | 4 | 11 | 21 | 149 | 49722 |
| Other (N = 147 classes) | 242 | 346 | 663 | 2065 | 13089302 |
| Total classes = 179 | 546 | 1345 | 5842 | 42922 | 48006918 |
Fig. 3. Phylogenetic inference of diatoms recovered in Plastisphere. A Phylogenetic tree built with FastTree2 from the concatenated alignment of the BUSCO eukaryota_odb10 marker set and visualized with ETE 3. B VEBA eukaryotic classifications for diatom MAGs
Genome binning, clustering results, and complexity analysis for case studies
| BioProject | PRJNA777294 | PRJEB20421 | PRJNA551026 |
|---|---|---|---|
| Original Study | | | |
| Number of samples | 44 | 64 | 17 |
| Gigabases | 237 | 90 | 9 |
| Prokaryotic | | | |
| MAGs (Original Study) | 37 | 8 | 0 |
| MAGs (Sample-specific) | 194 (91) [c] | 214 | 15 |
| MAGs (Multi-sample) [b] | 25 (1) [c] | 3 | 5 |
| MAGs (Total) | 219 | 217 | 20 |
| SLCs | 154 | 48 | 12 |
| ORFs | 735406 | 652008 | 50711 |
| ORFs [a] | 706092 | 615479 | 47954 |
| SSOs | 483864 | 140638 | 25848 |
| Genomic FCR | 0.296803653 | 0.778801843 | 0.4 |
| Functional FCR [a] | 0.314729525 | 0.771498296 | 0.460983442 |
| Eukaryotic | | | |
| MAGs (Original Study) | 0 | 17 [d] | 0 |
| MAGs (Sample-specific) | 5 (4) [c] | 3 | 0 |
| MAGs (Multi-sample) [b] | 0 | 0 | 0 |
| MAGs (Total) | 5 | 3 | 0 |
| SLCs | 4 | 1 | Not applicable |
| ORFs | 78750 | 49958 | Not applicable |
| ORFs (Orthogroups) [a] | 78171 | 46709 | Not applicable |
| SSOs | 63661 | 15335 | Not applicable |
| Genomic FCR | 0.2 | 0.666666667 | Not applicable |
| Functional FCR [a] | 0.185618708 | 0.671690681 | Not applicable |
| Viral | | | |
| MAGs (Original Study) | 0 | 6 [d] | 0 |
| MAGs (Sample-specific) | 119 | 345 | 18 |
| MAGs (Multi-sample) [b] | Not applicable | Not applicable | Not applicable |
| MAGs (Total) | 119 | 345 | 18 |
| SLCs | 81 | 69 | 12 |
| ORFs | 1317 | 20519 | 602 |
| ORFs (Orthogroups) [a] | 1279 | 20397 | 598 |
| SSOs | 686 | 3436 | 393 |
| Genomic FCR | 0.319327731 | 0.8 | 0.333333333 |
| Functional FCR [a] | 0.463643471 | 0.831543854 | 0.342809365 |
[a] Only includes ORFs that are in SSOs
[b] Multi-sample binning uses unbinned contigs from all of the samples in a pseudo-coassembly
[c] Values in parentheses indicate completeness ≥ 70 and contamination < 2, as used in the original study. Values outside parentheses indicate completeness ≥ 50 and contamination < 10
[d] Quality was not assessed in the original study
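The FCR rows in the table above are numerically consistent with a simple ratio, FCR = 1 − (number of clusters / number of features), where the genomic FCR uses SLCs over total MAGs and the functional FCR uses SSOs over the ORFs that fall in SSOs. This excerpt does not state the formula, so the relationship is inferred from the tabulated values, and the function name `feature_compression_ratio` is a hypothetical helper. A minimal Python sketch that recomputes the prokaryotic PRJNA777294 entries under that assumption:

```python
def feature_compression_ratio(n_features: int, n_clusters: int) -> float:
    """Return 1 - clusters/features.

    Assumed definition, inferred from the table: 0 means every feature is
    its own cluster (no redundancy); values near 1 mean heavy redundancy.
    """
    if n_features <= 0:
        raise ValueError("n_features must be positive")
    if not 0 < n_clusters <= n_features:
        raise ValueError("n_clusters must be in the range 1..n_features")
    return 1.0 - n_clusters / n_features


# Prokaryotic values for PRJNA777294, copied from the table above.
mags_total = 219      # MAGs (Total)
slcs = 154            # SLCs
orfs_in_ssos = 706092 # ORFs restricted to those in SSOs (footnote a)
ssos = 483864         # SSOs

genomic_fcr = feature_compression_ratio(mags_total, slcs)
functional_fcr = feature_compression_ratio(orfs_in_ssos, ssos)

print(f"Genomic FCR:    {genomic_fcr:.9f}")
print(f"Functional FCR: {functional_fcr:.9f}")
```

The same calculation can be repeated for the other columns and domains; rows marked "Not applicable" have no features to cluster, which the helper rejects rather than returning a misleading value.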
Taxonomy of recovered genomes
| Domain | Taxonomy | PRJNA777294 | PRJEB20421 | PRJNA551026 |
|---|---|---|---|---|
| Eukaryotic | c_Bacillariophyceae | 4 | 0 | 0 |
| | c_Coscinodiscophyceae | 0 | 3 | 0 |
| | c_Pelagophyceae | 1 | 0 | 0 |
| Prokaryotic | c_Acidimicrobiia | 4 | 0 | 0 |
| | c_Actinomycetia | 1 | 0 | 8 |
| | c_Alphaproteobacteria | 97 | 95 | 0 |
| | c_Anaerolineae | 1 | 0 | 0 |
| | c_Babeliae | 0 | 7 | 0 |
| | c_Bacilli | 0 | 0 | 13 |
| | c_Bacteriovoracia | 2 | 0 | 0 |
| | c_Bacteroidia | 26 | 38 | 1 |
| | c_Chlamydiia | 0 | 2 | 0 |
| | c_Cyanobacteriia | 15 | 0 | 0 |
| | c_Gammaproteobacteria | 64 | 70 | 0 |
| | c_Gracilibacteria | 1 | 0 | 0 |
| | c_Planctomycetes | 4 | 0 | 0 |
| | c_Thermoanaerobaculia | 1 | 0 | 0 |
| | c_UBA1135 | 0 | 5 | 0 |
| | c_Vampirovibrionia | 1 | 0 | 0 |
| | c_Verrucomicrobiae | 2 | 0 | 0 |
| Viral | Caudovirales | 6 | 298 | 8 |
| | CressDNAParvo | 1 | 0 | 1 |
| | Inoviridae | 3 | 0 | 0 |
| | PolyoPapillo | 0 | 0 | 1 |
| | Retrovirales | 71 | 13 | 0 |
| | Uncharacterized | 35 | 28 | 8 |
Per iteration genome binning yields
| Origin type | Iteration | PRJNA777294 | PRJEB20421 | PRJNA551026 |
|---|---|---|---|---|
| Sample-specific | 1 | 175 | 202 | 15 |
| | 2 | 14 | 7 | 0 |
| | 3 | 1 | 4 | 0 |
| | 4 | 1 | 1 | 0 |
| | 5 | 1 | 0 | 0 |
| | 6 | 0 | 0 | 0 |
| | 7 | 2 | 0 | 0 |
| | 8 | 0 | 0 | 0 |
| | 9 | 0 | 0 | 0 |
| | 10 | 0 | 0 | 0 |
| Multi-sample | 1 | 14 | 3 | 5 |
| | 2 | 1 | 0 | 2 |
| | 3 | 3 | 0 | 0 |
| | 4 | 5 | 0 | 0 |
| | 5 | 1 | 0 | 0 |
| | 6 | 1 | 0 | 0 |
| | 7 | 0 | 0 | 0 |
| | 8 | 0 | 0 | 0 |
| | 9 | 0 | 0 | 0 |
| | 10 | 0 | 0 | 0 |
| Total | - | 219 | 217 | 22 |
Fig. 4. Genome statistics of prokaryotic, eukaryotic, and viral genomes. A GC content and B coding density for prokaryotic, eukaryotic, and viral MAGs from the Plastisphere (blue), MarineAerosol (black), and Netherton (red) datasets. C Relationship between genome size and the number of genes for each MAG
Fig. 5. Compositional data analysis of Plastisphere. A Clustered abundance heatmap of CLR values using Aitchison distance and 1 − ρ as the sample and taxon distance metrics, respectively, followed by average-linkage hierarchical clustering. B Differential co-occurrence hive network between mature and early plastic biofilms using ρ proportionality as the association matrix, with positive and negative differential connectivity colored red and blue, respectively. C Heatmap of differential connectivity values in the hive network
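The quantities named in the Fig. 5 caption can be illustrated with a small Python sketch. It uses the standard definitions of the centered log-ratio (CLR) transform, the Aitchison distance (Euclidean distance between CLR vectors), and the ρ proportionality coefficient, ρ(x, y) = 1 − var(x − y) / (var(x) + var(y)) on CLR-transformed taxa. The count table, the pseudocount of 1, and all function names are illustrative assumptions and are not taken from VEBA or from the study's data.

```python
import math
from statistics import variance


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample (a list of counts)."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(sample_a, sample_b, pseudocount=1.0):
    """Euclidean distance between the CLR vectors of two samples."""
    a = clr(sample_a, pseudocount)
    b = clr(sample_b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rho(clr_x, clr_y):
    """Proportionality between two taxa given their CLR values across samples."""
    diff = [x - y for x, y in zip(clr_x, clr_y)]
    return 1.0 - variance(diff) / (variance(clr_x) + variance(clr_y))


# Made-up counts: rows are samples, columns are taxa.
counts = [
    [120, 30, 5, 0],
    [80, 45, 10, 2],
    [200, 20, 1, 7],
]
clr_table = [clr(row) for row in counts]


def taxon_column(j):
    """CLR values of taxon j across all samples."""
    return [row[j] for row in clr_table]


sample_distance = aitchison_distance(counts[0], counts[1])
taxon_distance = 1.0 - rho(taxon_column(0), taxon_column(1))
print(f"Aitchison distance, samples 0 and 1: {sample_distance:.4f}")
print(f"1 - rho, taxa 0 and 1: {taxon_distance:.4f}")
```

Because CLR values are log-ratios to the sample's geometric mean, the Aitchison distance is unaffected by sequencing depth when no pseudocount is added, which is the usual motivation for using it on compositional abundance tables.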