Abstract
BACKGROUND: Gene prediction is an essential step in the annotation of metagenomic sequencing reads. Since most metagenomic reads cannot be assembled into long contigs, specialized statistical gene prediction tools have been developed for short and anonymous DNA fragments, e.g. MetaGeneAnnotator and Orphelia. While conventional gene prediction methods have been subject to a benchmark study on real sequencing reads with typical errors, such a comparison has not yet been conducted for specialized tools. Their gene prediction accuracy was mostly measured on error-free DNA fragments.
Year: 2009 PMID: 19909532 PMCID: PMC2781827 DOI: 10.1186/1471-2164-10-520
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Test species - species whose genomes were used to simulate sequencing reads.
| Species | Phylum | GC-content |
|---|---|---|
| | Termicutes | 31% |
| | Proteobacteria ( | 30% |
| | Proteobacteria ( | 68% |
| | Bacteroidetes/Chlorobi group | 56% |
| | Actinobacteria | 61% |
| | Crenarchaeota | 45% |
| | Dictyoglomi | 33% |
| | Firmicutes | 47% |
| | Chloroflexi | 50% |
| | Aquificae | 34% |
| | Euryarchaeota | 63% |
| | Crenarchaeota | 34% |
| | Cyanobacteria | 31% |
| | Proteobacteria ( | 34% |
Accuracy on simulated Sanger reads.
| Error rate¹ | GeneMark Sens.² | GeneMark Spec.³ | MetaGene Sens.² | MetaGene Spec.³ | MGA Sens.² | MGA Spec.³ | Orphelia Sens.² | Orphelia Spec.³ | ESTScan Sens.² | ESTScan Spec.³ |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 91.9 ± 3.2 | 93.8 ± 4.9 | 94.4 ± 3.0 | 93.0 ± 2.9 | 94.7 ± 2.9 | 94.1 ± 2.9 | 89.7 ± 3.5 | 96.5 ± 1.7 | 78.9 ± 7.2 | 98.5 ± 1.2 |
| 1 to 2 × 10⁻⁵ | 91.9 ± 3.3 | 93.7 ± 5.2 | 94.8 ± 2.8 | 93.0 ± 3.0 | 94.8 ± 2.9 | 94.0 ± 3.1 | 90.1 ± 3.3 | 96.7 ± 1.6 | 79.2 ± 6.5 | 98.6 ± 1.1 |
| 1 to 2 × 10⁻⁴ | 91.8 ± 3.3 | 93.5 ± 5.2 | 94.5 ± 2.9 | 92.6 ± 3.2 | 94.5 ± 3.0 | 93.7 ± 3.1 | 89.6 ± 3.5 | 96.5 ± 1.7 | 79.0 ± 7.0 | 98.4 ± 1.3 |
| 1 to 2 × 10⁻³ | 90.5 ± 3.2 | 92.6 ± 4.8 | 93.3 ± 2.8 | 92.1 ± 3.0 | 93.3 ± 3.0 | 93.3 ± 2.8 | 87.2 ± 3.6 | 96.0 ± 1.7 | 78.0 ± 7.2 | 98.2 ± 1.2 |
| 1 to 2 × 10⁻² | 77.7 ± 4.4 | 86.6 ± 6.9 | 79.8 ± 3.6 | 85.6 ± 3.9 | 81.2 ± 4.3 | 87.6 ± 3.1 | 65.7 ± 6.4 | 91.9 ± 1.8 | 66.2 ± 11.1 | 96.2 ± 1.8 |
The gene prediction accuracy (mean and standard deviation over all species in the simulated metagenome) of four metagenomic gene prediction tools GeneMark, MetaGene, MetaGeneAnnotator (MGA) and Orphelia, and of the EST processing tool ESTScan on simulated Sanger reads is shown.
Figure 1. Average gene prediction accuracy on simulated Sanger reads. The harmonic mean is a measure that combines sensitivity and specificity (mean and standard deviation over all species in the simulated metagenome are shown).
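The caption only states that the harmonic mean combines sensitivity and specificity. As a minimal sketch, assuming the standard two-value harmonic mean 2·Sens·Spec / (Sens + Spec) is meant, the combined score for one table cell could be computed as follows. The function name `harmonic_mean` is illustrative and not taken from the source.

```python
def harmonic_mean(sensitivity: float, specificity: float) -> float:
    """Harmonic mean of two non-negative percentages.

    Assumption: the standard two-value harmonic mean is used;
    the source does not spell out the formula.
    """
    total = sensitivity + specificity
    if total == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2.0 * sensitivity * specificity / total


# GeneMark on error-free simulated Sanger reads (values from the table above).
score = harmonic_mean(91.9, 93.8)
print(round(score, 2))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a tool with high specificity but low sensitivity scores lower than its arithmetic mean would suggest.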
Accuracy on simulated 454 reads.
| Error rate | GeneMark Sens.² | GeneMark Spec.³ | MetaGene Sens.² | MetaGene Spec.³ | MGA Sens.² | MGA Spec.³ | Orphelia Sens.² | Orphelia Spec.³ | ESTScan Sens.² | ESTScan Spec.³ |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 91.0 ± 3.6 | 93.8 ± 4.8 | 95.4 ± 2.8 | 92.8 ± 2.4 | 94.6 ± 2.7 | 94.1 ± 2.5 | 88.4 ± 3.5 | 96.7 ± 1.7 | 81.3 ± 7.8 | 97.9 ± 1.4 |
| 0.0022 | 85.3 ± 4.2 | 90.4 ± 5.6 | 89.3 ± 3.1 | 89.2 ± 3.5 | 89.6 ± 3.3 | 90.8 ± 2.6 | 80.0 ± 4.2 | 94.7 ± 2.1 | 77.2 ± 9.0 | 97.2 ± 1.5 |
| 0.0049 | 79.5 ± 4.9 | 87.6 ± 6.4 | 83.7 ± 3.5 | 85.9 ± 3.9 | 84.7 ± 4.0 | 87.7 ± 2.8 | 70.9 ± 5.9 | 92.5 ± 2.1 | 71.7 ± 11.5 | 96.2 ± 1.7 |
| 0.028 | 36.8 ± 4.9 | 68.3 ± 8.0 | 39.6 ± 3.9 | 60.6 ± 8.8 | 43.3 ± 5.5 | 61.9 ± 3.6 | 26.3 ± 9.1 | 68.3 ± 5.0 | 26.4 ± 11.2 | 86.2 ± 4.7 |
The gene prediction accuracy (mean and standard deviation over all species in the simulated metagenome) of four metagenomic gene prediction tools GeneMark, MetaGene, MetaGeneAnnotator (MGA) and Orphelia, and of the EST processing tool ESTScan on simulated 454 reads is shown.
Figure 2. Average gene prediction accuracy on simulated 454 reads. The harmonic mean is a measure that combines sensitivity and specificity (mean and standard deviation over all species in the simulated metagenome are shown).
Accuracy by Species.
| Species | GeneMark Sens.² | GeneMark Spec.³ | MetaGene Sens.² | MetaGene Spec.³ | MGA Sens.² | MGA Spec.³ | Orphelia Sens.² | Orphelia Spec.³ | ESTScan Sens.² | ESTScan Spec.³ |
|---|---|---|---|---|---|---|---|---|---|---|
| GC-content 30-39% | | | | | | | | | | |
| | 81.7 | 92.6 | 88.5 | 88.1 | 90.1 | 88.6 | 66.0 | 93.5 | 71.4 | 98.3 |
| | 86.2 | 94.6 | 90.4 | 89.9 | 91.9 | 91.3 | 75.3 | 96.2 | 81.4 | 97.3 |
| | 81.8 | 91.3 | 84.0 | 87.1 | 84.2 | 87.9 | 61.9 | 92.5 | 69.3 | 97.3 |
| | 85.1 | 94.2 | 84.7 | 90.5 | 86.6 | 91.2 | 71.8 | 94.6 | 70.5 | 98.5 |
| | 85.7 | 90.4 | 86.2 | 86.7 | 87.7 | 87.7 | 68.1 | 92.4 | 76.9 | 96.1 |
| | 80.1 | 81.0 | 83.6 | 84.0 | 84.7 | 85.6 | 65.2 | 90.9 | 67.6 | 94.5 |
| | 83.9 | 93.9 | 84.3 | 89.9 | 85.5 | 90.8 | 67.2 | 93.1 | 73.0 | 98.1 |
| GC-content 40-49% | | | | | | | | | | |
| | 73.3 | 94.3 | 77.1 | 89.5 | 79.5 | 90.4 | 61.4 | 93.1 | 35.0 | 94.4 |
| | 78.9 | 90.0 | 81.7 | 88.2 | 81.8 | 89.4 | 78.3 | 93.9 | 74.4 | 95.1 |
| GC-content 50-59% | | | | | | | | | | |
| | 70.2 | 80.1 | 80.2 | 84.9 | 78.6 | 86.1 | 74.0 | 92.7 | 76.4 | 95.5 |
| | 74.0 | 76.6 | 78.6 | 81.0 | 79.0 | 83.0 | 72.7 | 89.9 | 72.7 | 93.7 |
| GC-content 60-69% | | | | | | | | | | |
| | 77.2 | 83.7 | 84.2 | 83.3 | 86.6 | 88.2 | 79.7 | 93.2 | 76.1 | 96.3 |
| | 77.0 | 83.5 | 83.3 | 81.6 | 83.9 | 84.8 | 72.8 | 91.3 | 75.3 | 94.4 |
| | 77.7 | 80.5 | 85.2 | 77.6 | 85.9 | 83.3 | 77.0 | 87.8 | 84.5 | 97.7 |
The gene prediction accuracy (mean and standard deviation over all species in the simulated metagenome) of four metagenomic gene prediction tools GeneMark, MetaGene, MetaGeneAnnotator (MGA) and Orphelia, and of the EST processing tool ESTScan on pyrosequencing reads (450 nt) with 0.49% errors is shown.
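The per-species values can be related to the aggregated figures in the table of accuracy on simulated 454 reads. The following sketch averages the 14 GeneMark sensitivity values listed by species and computes their spread. It assumes the sample standard deviation (n - 1 denominator), which the source does not state, and the variable names are illustrative.

```python
from statistics import mean, stdev

# GeneMark sensitivity per species at 0.49% errors, copied from the
# "Accuracy by Species" table in row order (species names are not needed here).
genemark_sensitivity = [
    81.7, 86.2, 81.8, 85.1, 85.7, 80.1, 83.9,  # GC-content 30-39%
    73.3, 78.9,                                # GC-content 40-49%
    70.2, 74.0,                                # GC-content 50-59%
    77.2, 77.0, 77.7,                          # GC-content 60-69%
]

avg = mean(genemark_sensitivity)
# Assumption: sample standard deviation (n - 1 in the denominator).
spread = stdev(genemark_sensitivity)

print(f"{avg:.1f} ± {spread:.1f}")
```

Under this assumption the result is consistent with the 79.5 ± 4.9 reported for GeneMark sensitivity at an error rate of 0.0049 on simulated 454 reads.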
Accuracy on FAMeS reads.
| Method | simLC⁴ Sens.² | simLC⁴ Spec.³ | simMC⁵ Sens.² | simMC⁵ Spec.³ | simHC⁶ Sens.² | simHC⁶ Spec.³ |
|---|---|---|---|---|---|---|
| GeneMark | 78.8 | 85.9 | 77.3 | 85.1 | 77.1 | 83.0 |
| MetaGene | 80.0 | 78.4 | 78.8 | 77.5 | 78.0 | 74.9 |
| MetaGeneAnnotator | 79.6 | 80.2 | 78.4 | 79.4 | 77.3 | 75.6 |
| Orphelia | 76.7 | 85.0 | 74.9 | 82.5 | 74.8 | 82.0 |
| ESTScan | 70.2 | 96.0 | 69.3 | 96.1 | 69.0 | 95.0 |
The gene prediction accuracy of four metagenomic gene prediction tools GeneMark, MetaGene, MetaGeneAnnotator (MGA) and Orphelia, and of the EST processing tool ESTScan on FAMeS reads [4] is shown.