Katharina J Hoff, Thomas Lingner, Peter Meinicke, Maike Tech.
Abstract
Metagenomic sequencing projects yield numerous sequencing reads of a diverse range of uncultivated and mostly yet unknown microorganisms. In many cases, these sequencing reads cannot be assembled into longer contigs. Thus, gene prediction tools that were originally developed for whole-genome analysis are not suitable for processing metagenomes. Orphelia is a program for predicting genes in short DNA sequences that is available through a web server application (http://orphelia.gobics.de). Orphelia utilizes prediction models that were created with machine learning techniques on the basis of a wide range of annotated genomes. In contrast to other methods for metagenomic gene prediction, Orphelia has fragment length-specific prediction models for the two most popular sequencing techniques in metagenomics, chain termination sequencing and pyrosequencing. These models ensure highly specific gene predictions.
Year: 2009 PMID: 19429689 PMCID: PMC2703946 DOI: 10.1093/nar/gkp327
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Orphelia's ORF scoring model. In Step 1, 7 ORF/fragment features are computed. Step 2 calculates a final gene probability, combining the features by means of a neural network.
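The two-step idea in the Figure 1 caption can be illustrated with a minimal sketch. It assumes a single hidden layer with `tanh` units and a logistic output; the architecture, the activation functions, the parameter names and all weights are hypothetical placeholders, since the caption only states that seven features are combined by a neural network into a gene probability.

```python
import math
from typing import Sequence

N_FEATURES = 7  # the caption states that seven ORF/fragment features are computed


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gene_probability(
    features: Sequence[float],
    hidden_weights: Sequence[Sequence[float]],
    hidden_biases: Sequence[float],
    output_weights: Sequence[float],
    output_bias: float,
) -> float:
    """Combine the Step 1 features into one probability (Step 2).

    Assumption: one hidden layer with tanh units and a logistic output.
    This is an illustrative network, not Orphelia's trained model.
    """
    if len(features) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features, got {len(features)}")
    # Hidden layer: one weighted sum of all features per hidden unit.
    hidden = [
        math.tanh(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    # Output layer: weighted sum of hidden activations squashed into (0, 1).
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)
```

With all weights and biases set to zero, the sketch returns 0.5 for any feature vector, which is a quick sanity check that the plumbing works before real (trained) parameters are plugged in.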
Figure 2. Screenshot of the Orphelia web server application submission page.
Mean and standard deviation of sensitivity, specificity and harmonic mean on 300 and 700 bp DNA fragments that were randomly excised from 12 test species
| Method | Sensitivity (300 bp) | Specificity (300 bp) | Harmonic mean (300 bp) | Sensitivity (700 bp) | Specificity (700 bp) | Harmonic mean (700 bp) |
|---|---|---|---|---|---|---|
| Orphelia Net300 | 82.1 ± 3.6 | 91.7 ± 3.8 | 86.6 ± 2.7 | 49.5 ± 13.8 | 79.3 ± 6.9 | 59.4 ± 10.2 |
| Orphelia Net700 | 83.8 ± 3.4 | 88.1 ± 4.9 | 85.8 ± 3.9 | 88.4 ± 3.1 | 92.9 ± 3.2 | 90.6 ± 2.9 |
| MetaGene | 89.3 ± 3.3 | 84.2 ± 6.0 | 86.6 ± 4.3 | 92.6 ± 3.1 | 88.6 ± 5.9 | 90.4 ± 4.0 |
| MetaGeneAnnotator | 90.1 ± 2.8 | 86.2 ± 5.7 | 89.1 ± 3.1 | 92.9 ± 3.0 | 90.0 ± 6.0 | 91.5 ± 3.3 |
| GeneMark | 87.4 ± 2.8 | 91.0 ± 4.2 | 89.1 ± 3.1 | 90.9 ± 2.7 | 92.2 ± 5.1 | 91.5 ± 3.1 |
Orphelia Net300 denotes Orphelia with the 300 bp prediction model, and Orphelia Net700 denotes Orphelia with the 700 bp prediction model. The performance of MetaGene, MetaGeneAnnotator and GeneMark is shown for comparison.
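The harmonic mean column combines sensitivity and specificity into a single score, 2·Sn·Sp / (Sn + Sp). The short sketch below shows the calculation; the function name is illustrative. The table reports means over 12 test species, so recomputing the harmonic mean from the tabulated mean sensitivity and specificity only approximates the tabulated harmonic mean.

```python
def harmonic_mean(sensitivity: float, specificity: float) -> float:
    """Harmonic mean of two scores given on the same scale (here: percent)."""
    total = sensitivity + specificity
    if total == 0:
        return 0.0  # both scores are zero, so the combined score is zero
    return 2.0 * sensitivity * specificity / total


# Orphelia Net300 on 300 bp fragments, using the tabulated means.
print(round(harmonic_mean(82.1, 91.7), 1))  # prints 86.6
```

Because the harmonic mean is dominated by the smaller of the two values, a method cannot reach a high score by trading away most of its sensitivity for specificity, or vice versa.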
Figure 3. Venn diagram of the number of million nucleotides predicted as protein encoding by FGENESB, Orphelia (Net700) and MetaGene in the hypersaline microbial mat metagenome samples.