Bertjan Broeksema, Magdalena Calusinska, Fintan McGee, Klaas Winter, Francesco Bongiovanni, Xavier Goux, Paul Wilmes, Philippe Delfosse, Mohammad Ghoniem.
Abstract
BACKGROUND: Recent advances in high-throughput sequencing allow much deeper exploitation of natural and engineered microbial communities and help to unravel so-called "microbial dark matter" (microbes that have until now evaded cultivation). Metagenomic analyses produce a large number of genomic fragments (contigs) that need to be grouped (binned) in order to reconstruct draft microbial genomes. While several contig binning algorithms have been developed in the past 2 years, they often lack consensus. Furthermore, these software tools typically lack a provision for the visualization of data and bin characteristics.
Keywords: Contig bin visualization; Genome reconstruction; Metagenomics; Software
Year: 2017 PMID: 28464793 PMCID: PMC5414344 DOI: 10.1186/s12859-017-1653-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Static image from the ICoVeR interactive display of the AD microbiome dataset. ICoVeR-refined genome bins 17 and 30 are shown as examples
Summary of genome bins for AD microbiome dataset reconstructed with different binning algorithms
| Binning algorithm | No. of genome bins | Completeness: near (≥90%) | Completeness: substantial (≥70 to 90%) | Completeness: moderate (≥50 to 70%) | Completeness: partial (<50%) | Contamination: low (≤5%) | Contamination: medium (5 to ≤10%) | Contamination: high (10 to ≤15%) | Contamination: very high (>15%) |
|---|---|---|---|---|---|---|---|---|---|
| MetaBAT_1 | 34 | 8 | 7 | 5 | 14 | 31 | 3 | 0 | 0 |
| MetaBAT_2 | 33 | 8 | 7 | 5 | 13 | 29 | 3 | 0 | 1 |
| MyCC | 49 | 12 | 8 | 10 | 19 | 29 | 14 | 2 | 4 |
| CONCOCT | 34 | 14 | 4 | 5 | 11 | 17 | 3 | 1 | 13 |
| ICoVeR (a) | | | | | | | | | |
MetaBAT_1 - ‘sensitive/specific’ mode
MetaBAT_2 - ‘superspecific’ mode
(a) For ICoVeR (bold font), the 31 most complete pre-selected genome bins (≥50% completeness based on MyCC genome bins) were refined
Completeness and contamination were calculated with CheckM. For CONCOCT, the maximum number of clusters was set to 34. The draft genome quality classification scheme is as proposed by [8]
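The quality classes used in the table header can be written as a small lookup. The following is a minimal sketch, not code from ICoVeR or CheckM: the function names are hypothetical, and it assumes that a range such as "≥70 to 90%" means 70 ≤ x < 90 and that "5 to ≤10%" means 5 < x ≤ 10.

```python
def completeness_class(completeness: float) -> str:
    """Map a completeness percentage to the class names in the table header.

    Assumption: the upper bound of each range is exclusive (e.g. 70 <= x < 90).
    """
    if completeness >= 90:
        return "near"
    if completeness >= 70:
        return "substantial"
    if completeness >= 50:
        return "moderate"
    return "partial"


def contamination_class(contamination: float) -> str:
    """Map a contamination percentage to the class names in the table header.

    Assumption: the lower bound of each range is exclusive (e.g. 5 < x <= 10).
    """
    if contamination <= 5:
        return "low"
    if contamination <= 10:
        return "medium"
    if contamination <= 15:
        return "high"
    return "very high"


def classify_bin(completeness: float, contamination: float) -> tuple[str, str]:
    """Return the (completeness class, contamination class) pair for one bin."""
    return completeness_class(completeness), contamination_class(contamination)
```

For example, a bin with 100.0% completeness and 18.8% contamination would fall into the "near" completeness class and the "very high" contamination class under these assumptions.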
Completeness and contamination for 31 ICoVeR-refined genome bins for AD microbiome dataset
| Bin (a) | Marker lineage (b) | Metagen. abund. % (c) | GC % | ICoVeR Compl. % | ICoVeR Cont. % | MyCC Compl. % | MyCC Cont. % | MetaBAT_1 Compl. % | MetaBAT_1 Cont. % | MetaBAT_2 Compl. % | MetaBAT_2 Cont. % | CONCOCT Compl. % | CONCOCT Cont. % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | 22.3 | 36.1 | | | 98.9 | 0 | 98.9 | 0 | 98.9 | 0 | 100.0 | 18.8 |
| 2 | | 6.3 | 41.7 | | | 98.9 | 39.2 | 96.6 | 0.4 | 96.6 | 2.2 | 100.0 | 82.3 |
| 3 | | 1.6 | 41.9 | | | 96.6 | 1.8 | 95.0 | 0.7 | 74.8 | 0.7 | 98.0 | 1.7 |
| 4 | | 2.6 | 58.8 | | | 94.9 | 2.2 | 90.3 | 2.0 | 90.3 | 2.0 | 98.3 | 6.0 |
| 5 | | 1.5 | 42.0 | | | 94.8 | 3.7 | 90.8 | 0 | 90.8 | 0 | 94.8 | 2.0 |
| 6 | | 1.5 | 59.9 | | | 94.7 | 4.7 | 85.8 | 0.1 | 85.8 | 0.1 | 94.6 | 42.8 |
| 7 | | 1.1 | 51.9 | | | 94.7 | 51.0 | 66.7 | 0 | 66.7 | 0 | 100.0 | 101.7 |
| 8 | | 8.8 | 46.7 | | | 93.4 | 23.8 | 75.6 | 7.7 | 75.7 | 9.8 | 100.0 | 99.4 |
| 9 | | 0.7 | 51.9 | | | 92.8 | 1.5 | 80.0 | 0.6 | 80.0 | 0.6 | 92.7 | 3.2 |
| 10 | | 1.0 | 51.1 | | | 91.4 | 1.7 | 93.6 | 6.8 | 93.6 | 6.8 | 98.2 | 19.1 |
| 11 | | 1.2 | 46.6 | | | 90.4 | 22.0 | 58.2 | 2.2 | 67.1 | 6.1 | 88.0 | 3.8 |
| 12 | | 2.4 | 55.3 | | | 90.1 | 6.8 | 85.2 | 3.2 | 85.2 | 3.2 | 100 | 72.2 |
| 13 | | 1.1 | 51.1 | | | 87.3 | 3.8 | 76.3 | 0 | 76.8 | 0 | 96.5 | 92.0 |
| 14 | | 0.8 | 53.3 | | | 84.9 | 7.6 | 66.4 | 0.4 | 66.3 | 0.4 | 94.6 | 42.8 |
| 15 | | 12.0 | 41.6 | | | 84.3 | 5.3 | 73.7 | 0.5 | 74.8 | 1.1 | 100.0 | 99.4 |
| 16 | | 0.7 | 54.2 | | | 83.4 | 17.2 | 48.0 | 0.2 | 43.8 | 0.2 | 100.0 | 72.2 |
| 17 | | 1.2 | 45.3 | | | 82.0 | 14.1 | 91.0 | 1.1 | 91.2 | 1.6 | 96.5 | 54.0 |
| 18 | | 0.6 | 46.1 | | | 78.5 | 7.2 | 48.6 | 0 | 48.6 | 0 | 100 | 82.3 |
| 19 | | 1.4 | 58.4 | | | 77.7 | 5.3 | 56.2 | 1.4 | 56.2 | 1.4 | 77.7 | 2.6 |
| 20 | | 0.5 | 50.7 | | | 77.6 | 8.7 | 62.9 | 9.8 | 63.4 | 18.5 | 99.8 | 110.8 |
| 21 | | 0.6 | 50.9 | | | 66.6 | 1.7 | 25.5 | 1.1 | 25.5 | 1.1 | 67.6 | 12.2 |
| 22 | | 0.3 | 45.4 | | | 63.7 | 4.0 | NB | NB | NB | NB | 63.0 | 5.1 |
| 23 | | 0.6 | 35.8 | | | 63.7 | 7.3 | 28.9 | 0.1 | 29.0 | 0 | 51.8 | 3.2 |
| 24 | | 0.4 | 32.4 | | | 63.3 | 3.6 | 14.9 | 0 | 16.7 | 0 | 72.7 | 24.2 |
| 25 | | 0.5 | 49.9 | | | 60.0 | 5.0 | 46.2 | 0.7 | 46.2 | 0.7 | 75.5 | 3.01 |
| 26 | | 0.3 | 49.6 | | | 56.8 | 10.2 | NB | NB | NB | NB | 96.6 | 54.0 |
| 27 | | 0.3 | 52.3 | | | 56.8 | 6.0 | 12.8 | 0 | NB | NB | 47.0 | 1.1 |
| 28 | | 0.6 | 49.7 | | | 54.5 | 2.1 | 11.2 | 0 | NB | NB | 62.4 | 15.3 |
| 29 (d) | | 0.1 | 47.4 | | | 53.7 | 8.1 | NB | NB | NB | NB | 45.2 | 3.8 |
| 30 | | 3.4 | 52.3 | | | 52.4 | 0.7 | 98.6 | 0.5 | 93.3 | 0.5 | 100.0 | 101.7 |
| 31 | | 1.7 | 52.7 | | | 49.4 | 0 | 78.0 | 0.1 | 93.4 | 0.1 | 96.6 | 91.9 |
MetaBAT_1 - ‘sensitive/specific’ mode
MetaBAT_2 - ‘superspecific’ mode
(a) Number corresponds to the ICoVeR-refined bin (Table S2)
(b) Marker lineage was defined by CheckM
(c) Metagenomic abundance corresponds to the % of reads mapping to the contigs binned inside each bin
(d) Based on the wide range in GC content, the pattern of contig abundances across the multiple samples, and the level of contamination, we judged this bin to be a mixture of several low-abundance organisms. Therefore, the ICoVeR-refined genome bin was less complete than the corresponding MyCC and CONCOCT genome bins
Completeness and contamination were calculated with CheckM (highlighted in bold font for ICoVeR-refined bins). NB – no genome bin assigned
Summary of binning performance for 31 ICoVeR-refined genome bins initially assigned with different binning algorithms for AD microbiome dataset
| Binning algorithm | Average completeness (%) | Average contamination (%) | F1 (%) |
|---|---|---|---|
| MetaBAT_1 | 66.3 | 1.4 | 75.4 |
| MetaBAT_2 | 70.4 | 2.2 | 79.3 |
| MyCC | 78.3 | 8.9 | 82.5 |
| CONCOCT (a) | 87.3 | 42.7 | 56.9 |
| ICoVeR | | | |
(a) The relatively lower performance of CONCOCT on the AD microbiome dataset may be attributed to a loss of precision due to an insufficient number of samples analysed (accuracy starts to decrease below 50 samples). Results for ICoVeR are highlighted in bold font
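The average completeness and contamination in this summary appear to be plain means over the per-bin values. As a hedged check, the sketch below recomputes the MyCC averages from the 31 per-bin MyCC values transcribed from the completeness and contamination table above; the variable names are illustrative only. The F1 column is not reproduced, because its exact definition is not given in this record.

```python
from statistics import mean

# Per-bin MyCC values for bins 1-31, transcribed from the per-bin table above.
mycc_completeness = [
    98.9, 98.9, 96.6, 94.9, 94.8, 94.7, 94.7, 93.4, 92.8, 91.4,
    90.4, 90.1, 87.3, 84.9, 84.3, 83.4, 82.0, 78.5, 77.7, 77.6,
    66.6, 63.7, 63.7, 63.3, 60.0, 56.8, 56.8, 54.5, 53.7, 52.4,
    49.4,
]
mycc_contamination = [
    0, 39.2, 1.8, 2.2, 3.7, 4.7, 51.0, 23.8, 1.5, 1.7,
    22.0, 6.8, 3.8, 7.6, 5.3, 17.2, 14.1, 7.2, 5.3, 8.7,
    1.7, 4.0, 7.3, 3.6, 5.0, 10.2, 6.0, 2.1, 8.1, 0.7,
    0,
]

# Unweighted means, rounded to one decimal place as in the summary table.
avg_completeness = round(mean(mycc_completeness), 1)
avg_contamination = round(mean(mycc_contamination), 1)

print(avg_completeness, avg_contamination)
```

Under this assumption the two means agree with the MyCC row of the summary (78.3 and 8.9). For algorithms with "NB" entries, the mean would presumably be taken over the assigned bins only.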
Fig. 2 Visual representation of the paired-end connections for contigs grouping into the resulting genome bins for the AD microbiome dataset. Intra-bin (between contigs inside the same bin) and inter-bin (between contigs assigned to two different bins) paired-end contig connections are displayed as grey and red lines, respectively
Completeness and contamination for nine ICoVeR-refined genome bins for Sharon’s dataset
| Bin (a) | Marker lineage (b) | ICoVeR Compl. % | ICoVeR Cont. % | MaxBin2 (c) Compl. % | MaxBin2 (c) Cont. % | MyCC (c) Compl. % | MyCC (c) Cont. % | CONCOCT (c) Compl. % | CONCOCT (c) Cont. % | MetaBAT (c) Compl. % | MetaBAT (c) Cont. % | GroopM (c) Compl. % | GroopM (c) Cont. % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | | | 99.2 | 0 | 99.2 | 0 | 100 | 204.2 | 99.2 | 0 | 99.2 | 0 |
| 2 | | | | 98.9 | 0 | 98.9 | 0 | 98.9 | 0 | 98.9 | 0 | NB | NB |
| 3 | | | | 22.4 | 0 | 97.9 | 0 | 97.9 | 0 | 100 | 37.9 | 97.9 | 0 |
| | | | | 66.9 | 0 | | | | | | | | |
| 4 | | | | 99.5 | 0.1 | 99.5 | 0.1 | 99.5 | 2.9 | 99.5 | 2.9 | 99.5 | 0.1 |
| 5 | | | | 97 | 1.1 | 100 | 104.2 | 100 | 204.2 | 100 | 108.3 | 100 | 104.2 |
| 6 | | | | 97.9 | 3.4 | | | | | | | | |
| 7 | | | | 100 | 24.4 | 95.4 | 0.6 | 95.4 | 0.6 | 95.4 | 0.6 | 95.8 | 1.7 |
| 8 | | | | 84.9 | 2.5 | 78.7 | 0 | 84.1 | 0 | 84.1 | 0 | 82.5 | 3.0 |
| 9 | | | | 72.8 | 37.9 | 45.1 | 0.2 | 45.1 | 0.2 | 37.7 | 0 | 24.7 | 0 |
(a) Number corresponds to the ICoVeR-refined bin (Table S3)
(b) Marker lineage was defined by CheckM. Completeness and contamination were calculated with CheckM (highlighted in bold font for ICoVeR-refined bins)
(c) Bin assignments for MaxBin2, MyCC, CONCOCT, MetaBAT and GroopM were downloaded from https://sourceforge.net/projects/sb2nhri/files/MyCC/Data/benchmark/Sharon.zip/download
Summary of binning performance for nine ICoVeR-refined genome bins initially assigned with different binning algorithms for Sharon’s dataset
| Binning algorithm | Average completeness (%) | Average contamination (%) | F1 (%) |
|---|---|---|---|
| MaxBin2 | 83.9 | 6.9 | 90.9 |
| MyCC | 89.3 | 13.1 | 71.7 |
| CONCOCT | 90.1 | 51.5 | 60.8 |
| MetaBAT | 89.4 | 18.7 | 68.6 |
| GroopM | 85.6 | 15.6 | 65.5 |
| ICoVeR | | | |
Completeness and contamination were calculated with CheckM. Bin assignments for MaxBin2, MyCC, CONCOCT, MetaBAT and GroopM were downloaded from https://sourceforge.net/projects/sb2nhri/files/MyCC/Data/benchmark/Sharon.zip/download. Results for ICoVeR are highlighted in bold font