Cedric C Laczny, Tomasz Sternal, Valentin Plugaru, Piotr Gawron, Arash Atashpendar, Houry Hera Margossian, Sergio Coronado, Laurens van der Maaten, Nikos Vlassis, Paul Wilmes.
Abstract
BACKGROUND: Metagenomics is limited in its ability to link distinct microbial populations to genetic potential due to a current lack of representative isolate genome sequences. Reference-independent approaches, which exploit, for example, inherent genomic signatures for the clustering of metagenomic fragments (binning), offer the prospect to resolve and reconstruct population-level genomic complements without the need for prior knowledge.
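To make the notion of a genomic signature concrete, the following is a minimal sketch that computes relative oligonucleotide (k-mer) frequencies for one sequence fragment. The function name `kmer_signature` and the default `k = 4` are illustrative assumptions, not details taken from this record, and the sketch does not merge reverse-complementary k-mers or apply any normalisation beyond relative frequencies.

```python
from collections import Counter
from itertools import product


def kmer_signature(seq: str, k: int = 4) -> list[float]:
    """Return relative k-mer frequencies of seq, in lexicographic k-mer order.

    Hypothetical helper for illustration; k = 4 is an assumed default.
    """
    alphabet = "ACGT"
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows that contain ambiguous bases such as N.
        if all(base in alphabet for base in kmer):
            counts[kmer] += 1
    total = sum(counts.values())
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    if total == 0:
        # No valid window (sequence too short or fully ambiguous).
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]
```

Fragments from the same population tend to have similar frequency vectors, so vectors like these can serve as input to a clustering or dimension-reduction step.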
Keywords: Binning; Machine learning; Metagenomics; Visualization
Year: 2015 PMID: 25621171 PMCID: PMC4305225 DOI: 10.1186/s40168-014-0066-1
Source DB: PubMed Journal: Microbiome ISSN: 2049-2618 Impact factor: 14.650
Figure 1. Visualization and polygonal selection in VizBin. Scatter plot visualization in VizBin of a groundwater-derived metagenomic dataset [20]. The manually placed red polygon highlights a selected cluster of interest. The corresponding sequences can be exported for further analysis. Minimal fragment length: 1,000 nt. Point size is proportional to the natural logarithm of sequence fragment length. Opaqueness is proportional to the natural logarithm of coverage (coverage values according to alignment of reads from [20] to the contigs). A star-like shape highlights contigs annotated to contain the GrpE gene.
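The point styling and polygonal selection described in this caption can be sketched as follows. This is an illustration under stated assumptions, not VizBin's implementation: the names `point_style`, `point_in_polygon`, and `select_contigs`, the scale factors, and the clamping of opacity to the range 0 to 1 are all hypothetical, and the selection uses a standard ray-casting test.

```python
import math


def point_style(length_nt: int, coverage: float,
                size_scale: float = 1.0, alpha_scale: float = 0.1) -> tuple[float, float]:
    """Return (size, opacity) proportional to ln(length) and ln(coverage).

    The scale factors and the clamping to [0, 1] are assumptions.
    """
    size = size_scale * math.log(length_nt)
    opacity = alpha_scale * math.log(coverage)
    return size, min(1.0, max(0.0, opacity))


def point_in_polygon(x: float, y: float, polygon: list[tuple[float, float]]) -> bool:
    """Ray-casting test: count polygon edges crossed by a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_contigs(points: dict[str, tuple[float, float]],
                   polygon: list[tuple[float, float]]) -> list[str]:
    """Return the names of contigs whose 2D coordinates fall inside the polygon."""
    return [name for name, (x, y) in points.items()
            if point_in_polygon(x, y, polygon)]
```

For example, `select_contigs({"c1": (2, 2), "c2": (5, 2)}, [(0, 0), (4, 0), (4, 4), (0, 4)])` keeps only `"c1"`, whose sequence could then be exported for further analysis.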
Figure 2. Comparison of sequence clusters apparent in VizBin to bins previously defined using MaxBin [16]. Scatter plot visualization of a cellulolytic microbial community metagenomic dataset (37A) [16]. Minimal fragment length: 1,000 nt. Points coloured according to original MaxBin-based bins.
Figure 3. Visualization and polygonal selection of clusters from a cellulolytic microbial consortium metagenomic dataset 37B [16]. Points highlighted in red according to contig assignment in MaxBin: (A) bin 37B.out.024 and (B) bin 37B.out.026, respectively. Individual subclusters (37B.out.024.001, 37B.out.024.002, 37B.out.026.001, and 37B.out.026.002) are highlighted with inserts showing closeups. Minimal fragment length: 1,000 nt.
Statistics of subclusters identified using VizBin for MaxBin-based bins 37B.out.024, 37B.out.026, and SRS013705.out.029
| Subcluster | | | | |
|---|---|---|---|---|
| 37B.out.024.001 | 518 | 0.75 | 37 | 0 |
| 37B.out.024.002 | 1116 | 1.96 | 41 | 0 |
| 37B.out.026.001 | 569 | 0.79 | 12 | 3 |
| 37B.out.026.002 | 419 | 0.58 | 22 | 2 |
| SRS013705.out.029.001 | 675 | 1.52 | 41 | 2 |
| SRS013705.out.029.002 | 292 | 0.47 | 31 | 2 |
| SRS013705.out.029.003 | 485 | 0.81 | 22 | 0 |
| SRS013705.out.029.004 | 483 | 0.80 | 9 | 0 |
| SRS013705.out.029.005 | 370 | 0.58 | 33 | 0 |
Copy numbers according to annotation of 107 single-copy marker genes.
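As a rough illustration of how copy numbers of single-copy marker genes might be tallied for one subcluster, the sketch below counts how many distinct marker genes are present and how many occur more than once. The function name `marker_copy_summary` and the example gene list are hypothetical, and the mapping of these two counts onto specific table columns is not confirmed by this record.

```python
from collections import Counter


def marker_copy_summary(marker_hits: list[str]) -> dict[str, int]:
    """Summarise marker gene annotations for one subcluster.

    marker_hits holds one gene name per annotated hit (illustrative input format).
    """
    copies = Counter(marker_hits)
    return {
        # Distinct marker genes found at least once.
        "present": len(copies),
        # Marker genes found more than once, which may indicate a mixed cluster.
        "multi_copy": sum(1 for n in copies.values() if n > 1),
    }
```

A call such as `marker_copy_summary(["GrpE", "RpoB", "GrpE", "RecA"])` reports three distinct genes, one of which appears in multiple copies. A cluster with many distinct marker genes and no multi-copy genes is more likely to represent a single population.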
Figure 4. VizBin runtimes. Average (n=3) serial (one thread; blue colour) and parallel (four threads; red colour) runtimes of VizBin on datasets of different sizes.