Hsin-Hung Lin, Yu-Chieh Liao.
Abstract
Metagenomics, the application of shotgun sequencing, facilitates the reconstruction of the genomes of individual species from natural environments. A major challenge in the genome recovery domain is to agglomerate or 'bin' sequences assembled from metagenomic reads into individual groups. Metagenomic binning without consideration of reference sequences enables the comprehensive discovery of new microbial organisms and aids in the microbial genome reconstruction process. Here we present MyCC, an automated binning tool that combines genomic signatures, marker genes and optional contig coverages within one or multiple samples, in order to visualize the metagenomes and to identify the reconstructed genomic fragments. We demonstrate the superior performance of MyCC compared to other binning tools including CONCOCT, GroopM, MaxBin and MetaBAT on both synthetic and real human gut communities with a small sample size (one to 11 samples), as well as on a large metagenome dataset (over 250 samples). Moreover, we demonstrate the visualization of metagenomes in MyCC to aid in the reconstruction of genomes from distinct bins. MyCC is freely available at http://sourceforge.net/projects/sb2nhri/files/MyCC/.
Year: 2016 PMID: 27067514 PMCID: PMC4828714 DOI: 10.1038/srep24175
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. An overview of the MyCC workflow and visualization.
(a) A schematic workflow for MyCC. (b) A plot of Barnes-Hut-SNE-based dimensionality reduction. (c) Automated clustering by affinity propagation. (d) Corrected clusters based on marker genes. These plots were output by MyCC in binning Sharon’s dataset (“MyCC.py carrol.fasta -a My.depth.txt -keep”).
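The abstract lists genomic signatures as one of the inputs that MyCC combines. As a rough illustration of what such a signature can look like, the sketch below computes a normalized k-mer frequency vector for a single contig in plain Python. The function name `kmer_signature`, the default k = 4, and the choice to skip windows containing ambiguous bases are assumptions made for illustration; they are not taken from MyCC's source code.

```python
from itertools import product


def kmer_signature(sequence, k=4):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    Hypothetical helper for illustration; not MyCC's implementation.
    """
    # Fixed ordering of all 4**k possible k-mers so vectors are comparable.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)

    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Windows containing ambiguous bases (e.g. N) are skipped.
        if window in counts:
            counts[window] += 1

    total = sum(counts.values())
    if total == 0:
        # No valid k-mer found (sequence too short or fully ambiguous).
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]
```

One such vector per contig is the kind of input that a dimensionality-reduction step (panel b) and a clustering step (panel c) could then operate on.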
Figure 2. Explanations for outputs of MyCC.
(a) Visualization of metagenomic binning. (b) A summary file produced by MyCC, reporting genome size (WholeGenome), N50, number of contigs (NoOfCtg) and number of marker genes (Cogs) for each bin. (c) The sequences binned into each cluster are output in FASTA format. (d) Gold-standard binning assignments available at MetaBAT’s website. (e) Binning performance evaluation based on the gold-standard assignments. MyCC was applied to bin a mock dataset of 25 genomes (“MyCC.py assembly.fa -a My.depth.txt”).
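The summary file in panel (b) reports an N50 value per bin. For readers unfamiliar with the statistic, here is a minimal sketch of the standard N50 definition: the length of the contig at which the running total, taken over contigs sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` is hypothetical, and the code is not claimed to match how MyCC computes its own column.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2

    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to at least
        # half of the total length defines N50.
        if running >= half_total:
            return length
    return 0
```

For example, contigs of 400, 300, 200 and 100 bp sum to 1,000 bp; the running total reaches 500 bp at the 300 bp contig, so `n50([100, 200, 300, 400])` is 300.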
Binning performance on various datasets (simulated reads, mock libraries and real samples).
| Binning tool | No. of bins | No. of binned contigs | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|---|---|
| Simulated dataset | | | | | |
| CONCOCT | 19 | 2,185 | 98.78 | 97.67 | 98.2 |
| MaxBin | 10 | 2,125 | 93.16 | 97.17 | 95.1 |
| MetaBAT | 9 | 1,653 | 90.26 | 95.13 | 92.6 |
| MyCC (default) | 10 | 2,185 | 97.79 | 97.79 | 97.8 |
| CONCOCT | 79 | 8,977 | 59.67 | 97.40 | 74.0 |
| MaxBin | 84 | 7,308 | 89.64 | 84.52 | 87.0 |
| MetaBAT | 105 | 5,430 | 92.72 | 89.59 | 91.1 |
| MyCC (default) | 93 | 8,978 | 87.45 | 90.54 | 89.0 |
| Mock datasets | | | | | |
| CONCOCT | 29 | 1,892 | 72.67 | 97.15 | 83.1 |
| MaxBin2 | 26 | 1,892 | 90.00 | 90.38 | 90.2 |
| MetaBAT | 31 | 1,742 | 93.78 | 93.57 | 93.7 |
| MyCC (default) | 23 | 1,893 | 88.97 | 97.35 | 93.0 |
| CONCOCT | 84 | 23,585 | 70.63 | 93.90 | 80.6 |
| MaxBin | 56 | 20,639 | 84.96 | 81.83 | 83.4 |
| MetaBAT | 70 | 8,722 | 86.78 | 77.40 | 81.8 |
| MyCC (default) | 61 | 23,602 | 83.19 | 88.76 | 85.9 |
| Real dataset | | | | | |
| CONCOCT | 32 | 2,291 | 79.92 | 97.58 | 87.9 |
| GroopM | 13 | 1,687 | 88.39 | 86.29 | 87.3 |
| MaxBin2 | 10 | 2,294 | 82.94 | 93.75 | 88.0 |
| MetaBAT | 10 | 1,573 | 85.46 | 93.66 | 89.4 |
a Only contigs with a length greater than or equal to 1,000 bp.
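The F1 values in the table are consistent with the usual harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes F1 for a few rows as a sanity check; the function name `f1_score` and the `rows` dictionary are hypothetical, and how precision and recall themselves were derived from the gold-standard assignments is not covered here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the first simulated-dataset
# group of the table above.
rows = {
    "CONCOCT": (98.78, 97.67),
    "MaxBin": (93.16, 97.17),
    "MetaBAT": (90.26, 95.13),
    "MyCC (default)": (97.79, 97.79),
}

for tool, (precision, recall) in rows.items():
    # Rounded to one decimal place, as reported in the F1 column.
    print(f"{tool}: F1 = {f1_score(precision, recall):.1f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, which is why a tool with high recall but low precision (for example 59.67% and 97.40%) ends up with an F1 of only 74.0%.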