Fabio Gori, Gianluigi Folino, Mike S M Jetten, Elena Marchiori.
Abstract
MOTIVATION: Metagenomics is a recent field of biology that studies microbial communities by analyzing their genomic content directly sequenced from the environment. A metagenomic dataset consists of many short DNA or RNA fragments called reads. One interesting problem in metagenomic data analysis is the discovery of the taxonomic composition of a given dataset. A simple method for this task, called the Lowest Common Ancestor (LCA), is employed in state-of-the-art computational tools for metagenomic data analysis of very short reads (about 100 bp). However, LCA has two main drawbacks: it may assign many reads to high taxonomic ranks, and it discards a large number of reads.
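To make the LCA rule concrete, here is a minimal sketch of its common textbook form, assuming each read's database hits have already been mapped to taxonomic lineages ordered from Kingdom downward. The function name and the tuple-of-names lineage format are hypothetical, and real tools usually filter hits by alignment score first. The sketch shows both drawbacks named above: hits that disagree early push the read up to a high rank, and a read without usable hits gets no assignment.

```python
from typing import Sequence

# Hypothetical lineage format: taxon names ordered from Kingdom downward.
Lineage = tuple[str, ...]

def lowest_common_ancestor(hits: Sequence[Lineage]) -> Lineage:
    """Return the longest lineage prefix shared by all hits of one read.

    An empty tuple means the read stays unassigned (no hits, or the hits
    already disagree at the top rank).
    """
    if not hits:
        return ()  # no hits: the read is discarded
    lca: list[str] = []
    # Walk the ranks top-down; zip stops at the shortest lineage.
    for taxa_at_rank in zip(*hits):
        if len(set(taxa_at_rank)) != 1:
            break  # hits disagree at this rank, so the read stops one rank above
        lca.append(taxa_at_rank[0])
    return tuple(lca)

hits = [
    ("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales"),
    ("Bacteria", "Proteobacteria", "Betaproteobacteria", "Burkholderiales"),
]
print(lowest_common_ancestor(hits))  # ('Bacteria', 'Proteobacteria')
```

Two hits that agree only down to Phylum leave the read at Phylum, even though each hit alone is resolved to Order.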
Year: 2010 PMID: 21127032 PMCID: PMC3018814 DOI: 10.1093/bioinformatics/btq649
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Input covering matrix (left) and a solution of the set covering problem (SCP) (right)
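The caption pairs a 0/1 covering matrix with an SCP solution. As a generic illustration only (this excerpt does not say how MTR builds the matrix or which solver it uses), the classic greedy heuristic for unweighted set covering is sketched below: rows are items that must be covered, columns are candidate sets, and an entry of 1 means the column covers the row. The function name and the example matrix are hypothetical.

```python
def greedy_set_cover(matrix: list[list[int]]) -> list[int]:
    """Pick columns greedily until every row has a 1 in some chosen column.

    matrix[i][j] == 1 means column j covers row i. Returns the chosen column
    indices in selection order; raises ValueError if a row cannot be covered.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    uncovered = set(range(n_rows))
    chosen: list[int] = []
    while uncovered:
        # Column covering the most still-uncovered rows (ties go to the lowest index).
        best = max(range(n_cols),
                   key=lambda j: sum(matrix[i][j] for i in uncovered),
                   default=None)
        gain = sum(matrix[i][best] for i in uncovered) if best is not None else 0
        if gain == 0:
            raise ValueError("some rows cannot be covered by any column")
        chosen.append(best)
        uncovered -= {i for i in uncovered if matrix[i][best]}
    return chosen

covering = [
    # c0 c1 c2 c3
    [1, 0, 1, 0],  # row 0
    [1, 1, 0, 0],  # row 1
    [0, 1, 0, 1],  # row 2
    [0, 0, 1, 1],  # row 3
    [0, 1, 0, 0],  # row 4
]
print(greedy_set_cover(covering))  # [1, 2]
```

The greedy rule is a standard approximation and is not guaranteed to return a minimum cover; exact SCP solutions need integer programming or branch-and-bound.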
Accuracy (%) and number of assigned reads (in parentheses) on the M1 datasets at coverages 0.1×, 1× and 4×
| M1 | 0.1× | 1× | 4× |
|---|---|---|---|
| MTR | |||
| Kingdom | 100.00 (5669) | 99.93 (56 348) | 99.93 (173 541) |
| Phylum | 92.50 (5669) | 92.59 (56 325) | 93.39 (173 521) |
| Class | 84.04 (5556) | 85.44 (54 341) | 87.15 (167 546) |
| Order | 64.93 (5366) | 66.23 (53 395) | 66.69 (163 840) |
| Family | 64.87 (4904) | 63.67 (50 587) | 63.22 (154 134) |
| Genus | 63.66 (4628) | 62.58 (48 244) | 60.50 (144 475) |
| LCA | |||
| Kingdom | 100.00 (4145) | 99.92 (42 620) | 99.91 (132 130) |
| Phylum | 95.08 (4145) | 94.81 (42 593) | 95.02 (132 099) |
| Class | 94.46 (3739) | 93.24 (38 970) | 93.60 (121 980) |
| Order | 75.29 (3497) | 74.18 (36 857) | 72.43 (116 632) |
| Family | 71.94 (2961) | 69.94 (31 913) | 69.07 (102 239) |
| Genus | 71.03 (2686) | 68.39 (29 360) | 66.63 (94 346) |
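Because MTR and LCA assign different numbers of reads, the accuracy figure alone does not show how many reads each method gets right. Assuming accuracy is the percentage of assigned reads whose label is correct at that rank (the caption does not define it here), the number of correct assignments is accuracy × assigned reads / 100. The sketch applies this to the Genus cells of the M1 table; the names `m1_genus` and `correct_reads` are illustrative.

```python
# Genus-rank cells copied from the M1 table: (accuracy %, assigned reads).
m1_genus = {
    "MTR": {"0.1x": (63.66, 4628), "1x": (62.58, 48244), "4x": (60.50, 144475)},
    "LCA": {"0.1x": (71.03, 2686), "1x": (68.39, 29360), "4x": (66.63, 94346)},
}

def correct_reads(accuracy_pct: float, assigned: int) -> int:
    """Estimated correctly assigned reads, assuming accuracy = correct / assigned."""
    return round(accuracy_pct / 100 * assigned)

for coverage in ("0.1x", "1x", "4x"):
    mtr = correct_reads(*m1_genus["MTR"][coverage])
    lca = correct_reads(*m1_genus["LCA"][coverage])
    print(f"{coverage}: MTR {mtr} vs LCA {lca} correct genus assignments")
```

Under that assumption, MTR's lower Genus accuracy at 0.1× (63.66% against 71.03%) still corresponds to more correctly assigned reads, roughly 2946 against 1908.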
Accuracy (%) and number of assigned reads (in parentheses) on the M3 datasets at coverages 0.1×, 1× and 4×
| M3 | 0.1× | 1× | 4× |
|---|---|---|---|
| MTR | |||
| Kingdom | 100.00 (11 792) | 99.97 (116 869) | 100.00 (166 948) |
| Phylum | 99.58 (11 792) | 99.47 (116 869) | 99.86 (166 948) |
| Class | 96.97 (11 763) | 97.07 (116 134) | 99.73 (166 936) |
| Order | 91.79 (11 606) | 91.70 (115 034) | 97.67 (166 148) |
| Family | 92.27 (11 117) | 91.25 (111 560) | 97.62 (165 231) |
| Genus | 94.06 (10 419) | 92.19 (101 533) | 97.42 (140 476) |
| LCA | |||
| Kingdom | 100.00 (10 333) | 99.96 (102 824) | 99.99 (155 263) |
| Phylum | 99.72 (10 333) | 99.69 (102 813) | 99.93 (155 258) |
| Class | 98.86 (9162) | 98.82 (91 445) | 99.81 (141 829) |
| Order | 96.74 (7788) | 96.62 (77 822) | 98.14 (115 732) |
| Family | 96.87 (7545) | 96.42 (75 616) | 98.04 (110 488) |
| Genus | 97.61 (6748) | 96.01 (68 573) | 98.35 (110 139) |
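The first LCA drawback named in the abstract, leaving many reads at high ranks, shows up in how quickly the assigned-read counts shrink from Kingdom to Genus. A small sketch, using the 0.1× counts from the M3 table, computes the share of Kingdom-level assignments still assigned at each deeper rank; `RANKS`, `m3_assigned` and `retention` are illustrative names.

```python
RANKS = ("Kingdom", "Phylum", "Class", "Order", "Family", "Genus")

# Assigned-read counts from the M3 table, coverage 0.1x.
m3_assigned = {
    "MTR": (11792, 11792, 11763, 11606, 11117, 10419),
    "LCA": (10333, 10333, 9162, 7788, 7545, 6748),
}

def retention(counts: tuple[int, ...]) -> dict[str, float]:
    """Fraction of Kingdom-level assignments that are still assigned at each rank."""
    top = counts[0]
    return {rank: n / top for rank, n in zip(RANKS, counts)}

for method, counts in m3_assigned.items():
    print(method, {rank: f"{share:.1%}" for rank, share in retention(counts).items()})
```

With these counts, about 88% of the reads MTR assigns reach Genus, compared with about 65% for LCA.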
Fig. 1. Population distributions (rank Genus) of M2 at coverage 0.1× obtained by MTR and LCA, together with the true population distribution. The label ‘Others’ denotes taxa that receive less than 5% of the reads and do not occur in the true distribution.
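The grouping rule in the caption can be sketched as follows, assuming each share is computed over the reads assigned at Genus (the caption does not state the denominator). A taxon is folded into ‘Others’ only when both conditions hold: its share is below 5% and it is absent from the true distribution. The function name and genus labels are hypothetical.

```python
from collections import Counter

def population_distribution(assignments: list[str], true_taxa: set[str],
                            threshold: float = 0.05) -> dict[str, float]:
    """Relative abundance per genus, lumping minor taxa absent from the truth into 'Others'.

    assignments: genus label of each assigned read (unassigned reads removed).
    true_taxa: genera present in the true population distribution.
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    dist: dict[str, float] = {}
    for taxon, n in counts.items():
        share = n / total
        if share < threshold and taxon not in true_taxa:
            dist["Others"] = dist.get("Others", 0.0) + share  # minor and spurious
        else:
            dist[taxon] = share
    return dist

reads = ["Escherichia"] * 60 + ["Bacillus"] * 37 + ["Vibrio"] * 2 + ["Listeria"] * 1
print(population_distribution(reads, {"Escherichia", "Bacillus", "Listeria"}))
```

Here Vibrio (2%, not in the truth) goes to ‘Others’, while Listeria (1%) keeps its own slice because it occurs in the true distribution.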
Divergence between the true population distribution and the population distributions obtained by MTR and LCA at ranks Family and Genus
| Dataset | Family (MTR) | Family (LCA) | Genus (MTR) | Genus (LCA) |
|---|---|---|---|---|
| M1 0.1× | 0.539 | 0.608 | 0.544 | 0.601 |
| M1 1× | 0.565 | 0.604 | 0.570 | 0.607 |
| M1 4× | 0.628 | 0.642 | 0.643 | 0.654 |
| M2 0.1× | 0.172 | 0.232 | 0.696 | 0.611 |
| M2 1× | 0.191 | 0.256 | 0.690 | 0.623 |
| M2 4× | 0.261 | 0.334 | 0.825 | 0.747 |
| M3 0.1× | 0.099 | 0.091 | 0.103 | 0.095 |
| M3 1× | 0.102 | 0.091 | 0.115 | 0.104 |
| M3 4× | 0.024 | 0.020 | 0.026 | 0.017 |
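This excerpt does not name the divergence measure used in the table. Purely as an example of comparing an estimated distribution with the true one, the sketch below computes the Jensen-Shannon divergence with base-2 logarithms (values in [0, 1], 0 for identical distributions); it is a swapped-in measure, so its values are not expected to reproduce the table. The function name is hypothetical.

```python
from math import log2

def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence (base 2) between two distributions over taxa.

    Taxa missing from one distribution are treated as having probability 0.
    """
    taxa = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in taxa}

    def kl(a: dict[str, float]) -> float:
        # KL(a || m); taxa with zero probability in a contribute nothing.
        return sum(a[t] * log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

true_dist = {"A": 0.5, "B": 0.5}
estimate = {"A": 0.7, "B": 0.3}
print(js_divergence(true_dist, estimate))
```

Unlike plain Kullback-Leibler divergence, this measure stays finite when the estimate contains taxa that are absent from the true distribution, which is common with spurious assignments.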
Real-life datasets: number of reads assigned up to a rank
| Rank | Saltern | Coral | Chicken |
|---|---|---|---|
| MTR | |||
| Kingdom | 1581 | 24 522 | 111 655 |
| Phylum | 1576 | 23 027 | 111 650 |
| Class | 1530 | 21 920 | 109 986 |
| Order | 1317 | 21 019 | 108 100 |
| Family | 1035 | 15 583 | 100 676 |
| Genus | 979 | 11 422 | 94 507 |
| Species | 937 | 9560 | 89 818 |
| LCA | |||
| Kingdom | 1217 | 21 287 | 93 416 |
| Phylum | 1208 | 16 526 | 93 399 |
| Class | 1051 | 12 301 | 87 917 |
| Order | 807 | 6841 | 87 146 |
| Family | 691 | 5045 | 70 376 |
| Genus | 635 | 4685 | 69 636 |
| Species | 311 | 4340 | 29 160 |
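In every column above, the counts never increase from Kingdom to Species. That is what one would expect if "assigned up to a rank" counts each read at its deepest assigned rank and at every rank above it; this reading is an assumption, not stated in the caption. A minimal sketch of that cumulative count, with hypothetical names:

```python
from collections import Counter

RANKS = ("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

def assigned_up_to_rank(deepest_rank_per_read: list[str]) -> dict[str, int]:
    """Number of reads whose assignment reaches at least each rank.

    A read assigned down to Genus is also counted at Kingdom..Family,
    so counts can only stay equal or shrink as the rank becomes more specific.
    """
    depth = Counter(RANKS.index(r) for r in deepest_rank_per_read)
    counts: dict[str, int] = {}
    running = 0
    # Accumulate from the most specific rank upward.
    for i in range(len(RANKS) - 1, -1, -1):
        running += depth[i]
        counts[RANKS[i]] = running
    return {rank: counts[rank] for rank in RANKS}

print(assigned_up_to_rank(["Species", "Genus", "Genus", "Phylum", "Kingdom"]))
```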
Fig. 2. Population distributions (rank Genus) of the Coral dataset obtained by MTR (top) and LCA (bottom).
Accuracy (%) and number of assigned reads (in parentheses) on the M2 datasets at coverages 0.1×, 1× and 4×
| M2 | 0.1× | 1× | 4× |
|---|---|---|---|
| MTR | |||
| Kingdom | 95.27 (9030) | 95.07 (88 537) | 91.41 (174 583) |
| Phylum | 93.83 (9030) | 93.21 (88 537) | 88.75 (174 583) |
| Class | 89.98 (9012) | 89.25 (87 635) | 86.32 (168 854) |
| Order | 90.44 (8822) | 89.24 (85 657) | 86.14 (167 222) |
| Family | 80.56 (7264) | 77.35 (81 366) | 73.01 (159 591) |
| Genus | 64.41 (6480) | 61.36 (77 307) | 55.91 (147 139) |
| LCA | |||
| Kingdom | 94.82 (7205) | 94.66 (73 176) | 90.76 (143 226) |
| Phylum | 93.21 (7205) | 92.57 (73 169) | 87.80 (143 206) |
| Class | 89.82 (5941) | 88.98 (60 294) | 83.59 (117 881) |
| Order | 89.90 (5615) | 88.44 (57 373) | 83.01 (113 168) |
| Family | 83.77 (4757) | 81.84 (48 760) | 77.61 (100 925) |
| Genus | 76.91 (3907) | 74.60 (40 823) | 69.68 (82 805) |