Ying Wang, Haiyan Hu, Xiaoman Li.
Abstract
BACKGROUND: Binning environmental shotgun reads is one of the most fundamental tasks in metagenomic studies, in which mixed reads from different species or operational taxonomic units (OTUs) are separated into different groups. While dozens of binning methods are available, there is still room for improvement.
Year: 2015 PMID: 25652152 PMCID: PMC4339733 DOI: 10.1186/s12859-015-0473-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. An example of MBBC binning reads from four species of the same genus. α and λ represent the estimated relative species abundance and k-mer coverage, respectively. The real genome sizes, α, and λ are listed in parentheses in the last table in the figure. After updating k-mer occurrences for k-mers occurring fewer than 4 times, the estimated α becomes more accurate. After removing small groups, the estimated species number and α become more accurate.
Prediction by MBBC on datasets with different genome coverage ratios or species composition
| Dataset | Estimated genome size (bp) | Real genome size (bp) | Estimated relative abundance (α) | Real relative abundance (α) | Estimated k-mer coverage (λ) | Real k-mer coverage (λ) |
|---|---|---|---|---|---|---|
| spa4spd8sps18spt32 | 1498994 | 1160554 | 9.42% | 6.98% | 3.34 | 3.49 |
| | 825923 | 945296 | 10.35% | 11.36% | 6.67 | 5.83 |
| | 1138156 | 1107344 | 27.91% | 29.95% | 13.05 | 12.48 |
| | 1212248 | 1075140 | 52.33% | 51.70% | 22.98 | 20.52 |
| spa4spd8sps18 | 1281577 | 1160554 | 16.16% | 14.45% | 3.24 | 3.49 |
| | 921307 | 945296 | 22.61% | 23.53% | 6.31 | 5.83 |
| | 1226752 | 1107344 | 61.23% | 62.02% | 12.83 | 12.48 |
| spa5spd8sps15 | 1607360 | 1160554 | 27.03% | 19.36% | 4.03 | 4.01 |
| | 682864 | 945296 | 20.95% | 25.23% | 7.36 | 5.83 |
| | 1139322 | 1107344 | 52.02% | 55.41% | 10.95 | 10.53 |
| spa5baa8sps15 | 1463372 | 1160554 | 21.50% | 16.49% | 4.13 | 4.01 |
| | 1318685 | 1596490 | 30.49% | 36.30% | 6.51 | 5.87 |
| | 1250815 | 1107344 | 48.01% | 47.21% | 10.80 | 10.53 |
Each species in each dataset is named by the first two letters of its genus name, followed by the first letter of its species name and then its genome coverage. The first dataset is the one used in Figure 1.
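The naming convention described above can be decoded mechanically. The following is a minimal sketch, assuming every species entry is exactly three lowercase letters followed by an integer coverage, with no suffix such as "(no errors)". The names `SpeciesEntry` and `parse_dataset_label` are hypothetical and are not part of MBBC.

```python
import re
from typing import List, NamedTuple


class SpeciesEntry(NamedTuple):
    genus_prefix: str     # first two letters of the genus name
    species_initial: str  # first letter of the species name
    coverage: int         # genome coverage given in the label


# Assumed label grammar: two genus letters, one species letter, integer coverage.
_ENTRY = re.compile(r"([a-z]{2})([a-z])(\d+)")


def parse_dataset_label(label: str) -> List[SpeciesEntry]:
    """Split a label such as 'spa4spd8sps18spt32' into per-species entries."""
    entries = []
    pos = 0
    while pos < len(label):
        match = _ENTRY.match(label, pos)
        if match is None:
            # The label does not follow the assumed grammar at this position.
            raise ValueError(f"unexpected text at position {pos} in {label!r}")
        entries.append(
            SpeciesEntry(match.group(1), match.group(2), int(match.group(3)))
        )
        pos = match.end()
    return entries


if __name__ == "__main__":
    for entry in parse_dataset_label("spa4spd8sps18spt32"):
        print(entry.genus_prefix, entry.species_initial, entry.coverage)
```

Labels that do not follow the convention, such as "human gut dataset", raise a `ValueError` in this sketch rather than being guessed at.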
Prediction on the human gut dataset by MBBC
| | | | | | | | |
|---|---|---|---|---|---|---|---|
| genome size | 3524796 | 2315047 | 1745685 | 2274392 | NA | 2249085 | NA |
| relative abundance | 11.25% | 16.87% | 23.33% | 48.55% | 14.12% | 16.67% | 69.21% |
| k-mer coverage | 4.48 | 10.24 | 18.78 | 30 | 8.28 | 10.49 | 18.49 |
Binning accuracy of MBBC, AbundanceBin and MetaCluster
| Dataset | MBBC | AbundanceBin | MetaCluster |
|---|---|---|---|
| lag5lar11las24 | 91.34% | 82.93% | 64.60% |
| lag4lar7las12 | 78.97% | 77.66% | 39.09% |
| laa4lag8lar15las30 | 86.43% | 83.49% | 50.98% |
| laa4lag8lar15las30 (no errors) | 87.13% | 85.64% | 86.41% |
| spa4spd9sps18 | 89.58% | 78.68% | 63.73% |
| spa5spd8sps15 | 82.01% | 73.71% | 52.44% |
| spa4spd8sps18spt32 | 87.35% | 72.64% | 54.60% |
| spa4spd8sps18spt32 (no errors) | 89.09% | 74.43% | 90.44% |
| baa3bab7bac15 | 79.55% | 64.83% | 61.11% |
| baa6bab10bac18 | 75.80% | 45.12% | 51.13% |
| baa5bab10bac18bah30 | 75.71% | 34.48% | 39.25% |
| baa5bab10bac18bah30 (no errors) | 79.90% | 45.82% | 66.25% |
| human gut dataset | 74.94% | 71.65% | 52.63% |
| AMD dataset | 94.14% | NA | 73.42% |
Figure 2. The procedure of read clustering in MBBC. The output on the right from each of the main steps on the left is connected with the corresponding steps.