Wolfgang Gerlach, Sebastian Jünemann, Felix Tille, Alexander Goesmann, Jens Stoye.
Abstract
BACKGROUND: Metagenomics is a new field of research on natural microbial communities. High-throughput sequencing techniques like 454 or Solexa-Illumina promise new possibilities, as they are able to produce huge amounts of data in much shorter time and with less effort and lower cost than the traditional Sanger technique. However, the data produced come in even shorter reads (35-100 base pairs with Illumina, 100-500 base pairs with 454 sequencing). CARMA is a new software pipeline for the characterisation of species composition and the genetic potential of microbial samples using short, unassembled reads.
Year: 2009 PMID: 20021646 PMCID: PMC2801688 DOI: 10.1186/1471-2105-10-430
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Overview of WebCARMA. The web application WebCARMA.
Figure 2. Functional profile. Example of a functional profile: the 40 most abundant GO terms in the metagenome of an agricultural biogas reactor.
Number of reads in each data set.
| Length | 35 bp | 40 bp | 50 bp | 60 bp | 70 bp | 100 bp | 150 bp | 200 bp | 250 bp | original |
|---|---|---|---|---|---|---|---|---|---|---|
| Reads | 616 069 | 616 031 | 613 943 | 606 760 | 598 811 | 584 168 | 550 945 | 492 305 | 297 852 | 616 072 |
| EGTs | 886 | 7 836 | 29 999 | 48 472 | 62 112 | 92 000 | 119 674 | 130 544 | 89 979 | 172 461 |
| Unique | 886 | 7 827 | 29 923 | 48 218 | 61 687 | 90 854 | 116 743 | 125 624 | 85 565 | 164 444 |
| Yield | 0.14% | 1.27% | 4.87% | 7.95% | 10.30% | 15.55% | 21.19% | 25.52% | 28.73% | 26.69% |
Number of reads and EGT yield for each data set. Some metagenomic reads match more than one Pfam family and are therefore translated into more than one EGT. The row "Unique" gives the number of EGTs when EGTs from the same read are counted only once. The row "Yield" gives the fraction of reads in the corresponding data set from which at least one EGT was obtained (Unique divided by Reads).
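The Yield values can be reproduced from the table as the number of unique EGTs divided by the number of reads. The following is a minimal sketch using the values from the table above; the variable and function names are illustrative and are not part of CARMA.

```python
# Values copied from the table "Number of reads in each data set".
reads = {
    "35 bp": 616069, "40 bp": 616031, "50 bp": 613943, "60 bp": 606760,
    "70 bp": 598811, "100 bp": 584168, "150 bp": 550945, "200 bp": 492305,
    "250 bp": 297852, "original": 616072,
}
unique_egts = {
    "35 bp": 886, "40 bp": 7827, "50 bp": 29923, "60 bp": 48218,
    "70 bp": 61687, "100 bp": 90854, "150 bp": 116743, "200 bp": 125624,
    "250 bp": 85565, "original": 164444,
}


def yield_percent(unique, total_reads):
    """Percentage of reads that produced at least one EGT (illustrative helper)."""
    return 100.0 * unique / total_reads


# Round to two decimals, matching the precision used in the table.
yields = {
    length: round(yield_percent(unique_egts[length], reads[length]), 2)
    for length in reads
}

for length, value in yields.items():
    print(f"{length:>8}: {value:.2f}%")
```

Computed this way, the values agree with the Yield row, which supports reading the yield as unique EGTs per read.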
Figure 3. EGT length distribution. EGT length distribution in each data set as a function of read length. Shown are the minimum, 25% quantile, median, 75% quantile and maximum.
Rate of "Unknown" EGTs. Fraction of the complete set of EGTs that could not be classified further ("Unknown") at each taxonomic rank.
| Read Length | Superkingdom | Phylum | Class | Order | Family | Genus | Species |
|---|---|---|---|---|---|---|---|
| 35 | 0.09 | 0.31 | 0.38 | 0.45 | 0.52 | 0.53 | 0.59 |
| 40 | 0.09 | 0.26 | 0.37 | 0.43 | 0.51 | 0.52 | 0.57 |
| 50 | 0.09 | 0.27 | 0.38 | 0.43 | 0.51 | 0.52 | 0.58 |
| 60 | 0.09 | 0.28 | 0.39 | 0.45 | 0.53 | 0.54 | 0.61 |
| 70 | 0.09 | 0.29 | 0.40 | 0.46 | 0.54 | 0.56 | 0.63 |
| 100 | 0.10 | 0.32 | 0.43 | 0.49 | 0.58 | 0.60 | 0.68 |
| 150 | 0.11 | 0.33 | 0.44 | 0.52 | 0.60 | 0.62 | 0.71 |
| 200 | 0.11 | 0.34 | 0.45 | 0.52 | 0.61 | 0.63 | 0.73 |
| 250 | 0.11 | 0.32 | 0.44 | 0.51 | 0.60 | 0.63 | 0.73 |
Rate of "Other" EGTs.
| Read Length | Superkingdom | Phylum | Class | Order | Family | Genus | Species |
|---|---|---|---|---|---|---|---|
| 35 | 0.0011 | 0.0671 | 0.1651 | 0.2816 | 0.4057 | 0.4554 | 0.6776 |
| 40 | 0.0011 | 0.0678 | 0.1691 | 0.2901 | 0.4388 | 0.5056 | 0.7340 |
| 50 | 0.0019 | 0.0667 | 0.1609 | 0.2954 | 0.4591 | 0.5292 | 0.7606 |
| 60 | 0.0024 | 0.0619 | 0.1552 | 0.2864 | 0.4637 | 0.5302 | 0.7663 |
| 70 | 0.0023 | 0.0617 | 0.1554 | 0.2954 | 0.4535 | 0.5221 | 0.7505 |
| 100 | 0.0035 | 0.0655 | 0.1539 | 0.2891 | 0.4456 | 0.4978 | 0.7172 |
| 150 | 0.0071 | 0.0692 | 0.1565 | 0.2964 | 0.4555 | 0.5006 | 0.6756 |
| 200 | 0.0100 | 0.0651 | 0.1467 | 0.2938 | 0.4500 | 0.4954 | 0.6658 |
| 250 | 0.0137 | 0.0542 | 0.1377 | 0.2849 | 0.4317 | 0.4693 | 0.6364 |
"Other" comprises EGTs of taxa whose relative abundance is below the threshold of 0.015; these taxa are not shown in the histograms. The table gives the rate of "Other" EGTs relative to the total number of classified EGTs for each taxonomic rank and data set.
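The pooling of low-abundance taxa into "Other" can be sketched as follows. This is only an illustration of the thresholding described above, not CARMA's actual code: the function name and the taxon counts are hypothetical, and only the 0.015 cutoff is taken from the text.

```python
THRESHOLD = 0.015  # relative abundance cutoff stated in the text


def pool_rare_taxa(counts, threshold=THRESHOLD):
    """Split classified EGT counts into displayed taxa and a pooled "Other" bin.

    counts maps a taxon name to its number of classified EGTs at one rank.
    Returns (shown, other_count, other_rate), where other_rate is relative
    to the total number of classified EGTs.
    """
    total = sum(counts.values())
    if total == 0:
        return {}, 0, 0.0
    shown = {}
    other_count = 0
    for taxon, n in counts.items():
        if n / total >= threshold:
            shown[taxon] = n        # abundant enough to appear in the histogram
        else:
            other_count += n        # pooled into "Other"
    return shown, other_count, other_count / total


# Hypothetical counts, invented purely for illustration.
example_counts = {"taxon_A": 700, "taxon_B": 285, "taxon_C": 10, "taxon_D": 5}
shown, other_count, other_rate = pool_rare_taxa(example_counts)
print(shown, other_count, other_rate)
```

Because finer ranks split the same EGTs across more taxa, more of them fall below the cutoff, which is consistent with the "Other" rate rising from superkingdom to species in the table.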
Figure 4. Superkingdom. Taxonomic results on the level of superkingdom.
Figure 5. Order. Taxonomic results on the level of order. Only taxa with an abundance of 0.015 or higher are shown.
Figure 6. Species. Taxonomic results on the level of species. Only taxa with an abundance of 0.015 or higher are shown.