Yi Wang, Henry C. M. Leung, S. M. Yiu, Francis Y. L. Chin.
Abstract
MOTIVATION: Metagenomic binning remains an important topic in metagenomic analysis. Existing unsupervised binning methods for next-generation sequencing (NGS) reads perform poorly on (i) samples containing low-abundance species and (ii) samples in which many extremely low-abundance species are present, even when the remaining species are highly abundant. Both situations are common in real metagenomic datasets, so binning methods that handle them are desirable.
Year: 2012 PMID: 22962452 PMCID: PMC3436824 DOI: 10.1093/bioinformatics/bts397
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Performance on Dataset A with 100 species (15 high abundance and 85 extremely low abundance)
| Species discovered | Sensitivity | Overall performance | ||||||
|---|---|---|---|---|---|---|---|---|
| ≥ 10× | ≥ 10× | Precision | Sensitivity | Memory | Time (min) | |||
| MetaCluster 4.0 | 4 | 0 | 0.79 | – | 0.67 | 0.79 | 29 GB | 70 |
| MetaCluster 5.0 | 14 | 0 | 0.90 | – | 0.92 | 0.90 | 20 GB | 38 |
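The Precision and Sensitivity columns score how well the bins match the true species of each read. The exact formulas are not given in this excerpt; the sketch below assumes a common convention in read-binning evaluation, where precision rewards bins dominated by one species and sensitivity rewards species kept together in one bin. The function name `binning_scores` and its input format are hypothetical.

```python
from collections import Counter, defaultdict


def binning_scores(assignments):
    """Return (precision, sensitivity) for a list of (bin_id, species) pairs.

    Assumed definitions (a common convention, not confirmed by the source):
      precision   = sum over bins of the largest single-species read count in
                    that bin, divided by the total number of binned reads
      sensitivity = sum over species of the largest read count of that species
                    in any single bin, divided by the total number of binned reads
    """
    per_bin = defaultdict(Counter)      # bin_id  -> Counter(species -> reads)
    per_species = defaultdict(Counter)  # species -> Counter(bin_id -> reads)
    for bin_id, species in assignments:
        per_bin[bin_id][species] += 1
        per_species[species][bin_id] += 1

    total = len(assignments)
    if total == 0:
        raise ValueError("no binned reads")

    precision = sum(max(c.values()) for c in per_bin.values()) / total
    sensitivity = sum(max(c.values()) for c in per_species.values()) / total
    return precision, sensitivity
```

For example, with bin 0 holding 8 reads of species A and 2 of B, and species B's remaining reads split 6/4 across bins 1 and 2, the sketch gives precision 18/20 = 0.9 but sensitivity 14/20 = 0.7: splitting one species across bins lowers sensitivity without hurting precision.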
Performance on Dataset B with 20 species (11 high abundance, four low abundance and five extremely low abundance)
| Species discovered | Sensitivity | Overall performance | |||||||
|---|---|---|---|---|---|---|---|---|---|
| ≥ 10× | [6×, 10×) | ≥ 10× | [6×, 10×) | Precision | Sensitivity | Memory | Time (min) | ||
| MetaCluster 4.0 | 9 | 0 | 0 | 0.79 | – | 0.82 | 0.82 | 12.5 GB | 16 |
| MetaCluster 5.0 | 11 | 3 | 0 | 0.87 | 0.80 | 0.92 | 0.87 | 7.7 GB | 14 |
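The datasets are described by abundance range: high (≥ 10×), low (6× up to 10×) and extremely low. A minimal sketch of how a simulated species could be assigned to one of these ranges from its average sequencing depth. It assumes depth is computed as total sequenced bases divided by genome length, and that everything below 6× counts as extremely low; neither definition is stated in this excerpt. The `Species` record, `coverage` and `abundance_class` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Species:
    name: str
    num_reads: int
    read_length: int    # bp per read (hypothetical value in the example below)
    genome_length: int  # bp


def coverage(sp: Species) -> float:
    """Average sequencing depth: total sequenced bases / genome size."""
    return sp.num_reads * sp.read_length / sp.genome_length


def abundance_class(cov: float) -> str:
    # Thresholds follow the table headers; treating everything below 6x as
    # "extremely low" is an assumption of this sketch.
    if cov >= 10:
        return "high"
    if cov >= 6:
        return "low"
    return "extremely low"
```

With a hypothetical 75 bp read length and a 3 Mbp genome, 500,000 reads give 12.5× (high), 300,000 reads give 7.5× (low) and 100,000 reads give 2.5× (extremely low).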
Fig. 1. Workflow of MetaCluster 5.0
Percentage of filtered reads by MetaCluster 5.0 (Dataset A)
| First round | Second round | |||
|---|---|---|---|---|
| ≥ 10× (%) | ≥ 6× (%) | |||
| Filter step 1 | 3.6 | 90.4 | 12 | 9 |
| Filter step 2 | 6.2 | 95.4 | – | |
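The table above appears to break the percentage of filtered reads down by the abundance range of the species each read came from. A minimal sketch of that bookkeeping, assuming every simulated read carries its true abundance class and that a filter step is summarized by the set of read ids it keeps; `filtered_percentages` and its inputs are hypothetical and not part of MetaCluster.

```python
from collections import Counter


def filtered_percentages(reads, kept_ids):
    """Percentage of reads removed by one filter step, per abundance class.

    reads    : iterable of (read_id, abundance_class) pairs (hypothetical input)
    kept_ids : set of read ids that survive the filter step
    """
    total = Counter()    # class -> number of reads before filtering
    removed = Counter()  # class -> number of reads the filter discarded
    for read_id, cls in reads:
        total[cls] += 1
        if read_id not in kept_ids:
            removed[cls] += 1
    return {cls: 100.0 * removed[cls] / total[cls] for cls in total}
```

For instance, if a filter keeps 96 of 100 reads from high-abundance species but only 2 of 20 reads from low-abundance species, the sketch reports 4.0% and 90.0% removed, respectively.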
Percentage of filtered reads by MetaCluster 5.0 (Dataset B)
| First round | Second round | ||||
|---|---|---|---|---|---|
| ≥ 10× (%) | [6×, 10×) (%) | ≥ 6× (%) | |||
| Filter step 1 | 3.4 | 37.1 | 86.2 | 20 | 22 |
| Filter step 2 | 6.2 | 61.4 | 97.5 | – | |
Performance on Dataset C with 100 species (16 high abundance, four low abundance and 80 extremely low abundance)
| Species discovered | Sensitivity | Overall performance | |||||||
|---|---|---|---|---|---|---|---|---|---|
| ≥ 10× | [6×, 10×) | ≥ 10× | [6×, 10×) | Precision | Sensitivity | Memory | Time (min) | ||
| MetaCluster 4.0 | 9 | 1 | 1 | 0.81 | 0.60 | 0.62 | 0.80 | 31 GB | 87 |
| MetaCluster 5.0 | 16 | 3 | 3 | 0.91 | 0.72 | 0.87 | 0.88 | 21 GB | 45 |
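The memory and running-time columns of the three simulated-dataset tables can be compared directly. The short script below copies those values and computes the relative reduction of MetaCluster 5.0 against MetaCluster 4.0; the `RESULTS` dictionary and `reduction` helper are illustrative names, and the percentages are simple arithmetic on the published numbers, not figures reported by the authors.

```python
# Memory (GB) and running time (min) copied from the three tables above:
# dataset -> ((MetaCluster 4.0 memory, time), (MetaCluster 5.0 memory, time))
RESULTS = {
    "A": ((29.0, 70), (20.0, 38)),
    "B": ((12.5, 16), (7.7, 14)),
    "C": ((31.0, 87), (21.0, 45)),
}


def reduction(old: float, new: float) -> float:
    """Relative reduction of `new` compared with `old`, in percent."""
    return 100.0 * (old - new) / old


for name, ((mem4, t4), (mem5, t5)) in RESULTS.items():
    print(f"Dataset {name}: memory -{reduction(mem4, mem5):.1f}%, "
          f"time -{reduction(t4, t5):.1f}%")
```

The loop prints memory reductions of 31.0%, 38.4% and 32.3% and time reductions of 45.7%, 12.5% and 48.3% for Datasets A, B and C.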
Percentage of filtered reads by MetaCluster 5.0 (Dataset C)
| First round | Second round | ||||
|---|---|---|---|---|---|
| ≥ 10× (%) | [6×, 10×) (%) | ≥ 6× (%) | |||
| Filter step 1 | 3.5 | 39.9 | 93 | 13 | 11 |
| Filter step 2 | 4.6 | 60.1 | 97 | – | |
Performance of MetaCluster 5.0 on the real dataset
| Groups | Major species | Precision | Sensitivity |
|---|---|---|---|
| Group 1 |  | 0.82 | 0.84 |
| Group 2 |  | 0.79 | 0.54 |
| Group 3 |  | 0.65 | 0.65 |
| Group 4 |  | 0.98 | 0.70 |
| Group 5 |  | 0.59 | 0.55 |
| Group 6 |  | 0.76 | 0.69 |
| Group 7 |  | 0.59 | 0.78 |
| Group 8 |  | 0.71 | 0.62 |
Performance of MetaCluster 4.0 on the real dataset
| Groups | Major species | Precision | Sensitivity |
|---|---|---|---|
| Group 1 |  | 0.79 | 0.53 |
| Group 2 |  | 0.77 | 0.56 |
| Group 3 |  | 0.51 | 0.89 |