Ivan Tolstoganov, Yuri Kamenev, Roman Kruglikov, Sofia Ochkalova, Anton Korobeynikov.
Abstract
Despite recent advances in high-throughput sequencing, metagenome analysis of microbial populations remains a challenge. In particular, metagenome-assembled genomes (MAGs) are often fragmented due to interspecies repeats, uneven coverage, and varying strain abundance. MAGs are constructed via a binning process that uses features of the input data to cluster long contigs presumably belonging to the same species. In this work, we present BinSPreader, a binning refiner tool that exploits the assembly graph topology and other connectivity information to refine binning, correct binning errors, and propagate binning to shorter contigs. We show that BinSPreader can increase the completeness of bins without sacrificing purity and can predict contigs belonging to several MAGs. BinSPreader is effective in binning shorter contigs, which often contain important conserved sequences that may be of great interest to researchers.
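The idea of propagating bin labels along assembly graph links can be illustrated with a generic label-propagation sketch. This is not BinSPreader's actual algorithm or interface: the function names `propagate_bins` and `select_bins`, the parameters `alpha`, `n_iter`, and `threshold`, and the toy graph are all assumptions made for illustration only.

```python
from collections import defaultdict


def propagate_bins(edges, initial_bins, n_iter=100, alpha=0.6):
    """Spread soft bin labels over an undirected contig graph (illustrative only).

    edges: iterable of (contig_a, contig_b) links, e.g. taken from an assembly graph
    initial_bins: dict contig -> bin label, as produced by some initial binner
    Returns dict contig -> dict bin -> weight (normalized when any label reached it).
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    contigs = set(neighbors) | set(initial_bins)
    bins = sorted(set(initial_bins.values()))

    # One-hot seed for binned contigs, all zeros for unbinned (often short) contigs.
    seed = {c: {b: 1.0 if initial_bins.get(c) == b else 0.0 for b in bins}
            for c in contigs}
    current = {c: dict(seed[c]) for c in contigs}

    for _ in range(n_iter):
        updated = {}
        for c in contigs:
            nbrs = neighbors[c]
            mixed = {}
            for b in bins:
                avg = sum(current[n][b] for n in nbrs) / len(nbrs) if nbrs else 0.0
                # Binned contigs keep a pull toward their original label;
                # unbinned contigs simply take the average of their neighbors.
                mixed[b] = alpha * seed[c][b] + (1 - alpha) * avg if c in initial_bins else avg
            updated[c] = mixed
        current = updated

    result = {}
    for c, dist in current.items():
        total = sum(dist.values())
        result[c] = {b: w / total for b, w in dist.items()} if total > 0 else dist
    return result


def select_bins(dist, threshold=0.25):
    """Return every bin whose weight reaches the threshold (allows several bins)."""
    return [b for b, w in dist.items() if w >= threshold]


# Toy graph: short contig s1 hangs off c1 (bin A); c2 and c3 lie between bins A and B.
edges = [("s1", "c1"), ("c1", "c2"), ("c2", "c3"), ("c3", "c4")]
soft = propagate_bins(edges, {"c1": "A", "c4": "B"})
for contig in sorted(soft):
    print(contig, select_bins(soft[contig]))
```

In this toy case the unbinned contig `s1` inherits bin A from its only neighbor, while the contigs sitting between the two seeds receive weight from both bins and can be reported as belonging to more than one bin.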
Keywords: Algorithms; Bioinformatics; Genomics; Microbial genomics
Year: 2022 PMID: 35992057 PMCID: PMC9386100 DOI: 10.1016/j.isci.2022.104770
Source DB: PubMed Journal: iScience ISSN: 2589-0042
Comparison of running times for BinSPreader and other graph-aware binning refiners in the standard and paired-end utilizing modes on Zymo, BMock12, IC9, and Sharon datasets
| Method | Zymo | BMock12 | IC9 | Sharon |
|---|---|---|---|---|
| BinSPreader | ||||
| MetaCoAG | 19m 15s | 4m 22s | 14m 29s | 3m 3s |
| BinSPreader-PE | 8h 24m 29s | |||
| METAMVGL | 3h 24m 40s | 6h 14m 16s | 1h 16m 10s | |
| Binnacle (+MetaCarvel) | 3h 19m 29s | 4h 44m 10s | 1h 49m 23s | 12h 40m 21s |
The execution times for Binnacle and the MetaCarvel scaffolder are summed because the two tools are only intended to be used together. In addition to the times listed, Binnacle and METAMVGL require a separate read alignment step, unlike BinSPreader, which maps reads on the fly. For the evaluation, we used bins generated with MetaBAT2 on a machine with an Intel(R) Xeon(R) CPU E7-4880 v2 @ 2.50GHz with five cores.
Figure 1. Mean F1 scores across all methods and samples
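As a reminder of what an F1 score summarizes, here is a minimal sketch. It assumes F1 is the harmonic mean of bin purity and completeness, as commonly reported by binning evaluation tools; the record itself does not spell out the definition, and the function name `f1_score` is hypothetical.

```python
def f1_score(purity, completeness):
    """Harmonic mean of purity and completeness, both given as fractions in [0, 1]."""
    if purity + completeness == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * purity * completeness / (purity + completeness)


# A bin that is 90% pure but only 60% complete is penalized for the lower value.
print(round(f1_score(0.9, 0.6), 2))
```

Because the harmonic mean is dominated by the smaller of the two values, a refiner only improves F1 if it raises completeness without a comparable loss of purity.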
Figure 2. Number of recovered high-quality genomes across all methods and samples
Figure 3. Hierarchical clustering of Zymo MetaBAT2 refined bins using the prob Jaccard distance between bin distributions on the assembly graph
The leaves are colored by reference, and leaf numbers are bin labels. The E. coli and S. enterica bins overlap on the assembly graph and are therefore cross-contaminated.
Figure 4. Hierarchical clustering of BMock12 MetaBAT2 refined bins using the prob Jaccard distance between bin distributions on the assembly graph
The leaves are colored by reference, and leaf numbers are bin labels. Two Micromonospora strains overlap significantly on the assembly graph, and one of the Marinobacter bins is clearly contaminated.
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| MBARC26 dataset | NCBI SRA, accession number SRX1836716 | |
| BMock12 dataset | NCBI SRA, accession number SRX4901583 | |
| ZymoBIOMICS Microbial Community Standard | ||
| magsim-MGE dataset | ||
| simHC+ dataset | ||
| IC9 dataset | NCBI SRA, accession number SRX10650162 for Illumina data, accession number SRX10650163 for Hi-C data | |
| Sharon dataset | NCBI SRA, accession number SRX144807 | |
| Assembly graphs, scaffolds, abundance profiles, binning results for the datasets used in the study | This Study | |
| CARD database | ||
| AMBER v.2.0.3 | ||
| Barrnap v.0.9 | Torsten Seemann | |
| bin3C v.0.1.1 | ||
| Binnacle (January 16th version) | ||
| BinSPreader v.0.1 | This Study | |
| CheckM v.1.0.13 | ||
| DAS_Tool v.1.1.3 | ||
| MaxBin2 v.2.2.7 | ||
| MetaBAT2 v.2.12.1 | ||
| MetaCoAG v.1.0 | ||
| METAMVGL v.1.0 | ||
| metaQUAST v.5.0.2 | ||
| metaWRAP v.1.3 | ||
| MinCED v.0.4.2 | ||
| RGI v.5.2.1 | ||
| SPAdes v.3.15.3 | ||
| VAMB v.3.0.3 | ||