Gabriele Schweikert, Jonas Behr, Alexander Zien, Georg Zeller, Cheng Soon Ong, Sören Sonnenburg, Gunnar Rätsch.
Abstract
We describe mGene.web, a web service for the genome-wide prediction of protein coding genes from eukaryotic DNA sequences. It offers pre-trained models for the recognition of gene structures, including untranslated regions, in an increasing number of organisms. With mGene.web, users can also train the system on their own data for other organisms at the push of a button, a functionality that will greatly accelerate the annotation of newly sequenced genomes. The system is built in a highly modular way, such that individual components of the framework, like the promoter prediction tool or the splice site predictor, can be used autonomously. The underlying gene finding system mGene is based on discriminative machine learning techniques, and its high accuracy has been demonstrated in an international competition on nematode genomes. mGene.web is available at http://www.mgene.org/web; it is free of charge and can be used for eukaryotic genomes of small to moderate size (several hundred Mbp).
Year: 2009 PMID: 19494180 PMCID: PMC2703990 DOI: 10.1093/nar/gkp479
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Comparison of the top-performing gene finding systems in the ab initio setting of the nGASP challenge (10)
| Method | Nucleotide Sn | Nucleotide Sp | Nucleotide Avg | Exon Sn | Exon Sp | Exon Avg | Transcript Sn | Transcript Sp | Transcript Avg | Gene Sn | Gene Sp | Gene Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mGene.init | 96.8 | 90.9 | 93.8 | 85.1 | 80.2 | | 49.6 | 42.3 | | 60.7 | 42.3 | |
| mGene.init (dev) | 96.9 | 91.6 | | 84.2 | 78.6 | 81.4 | 44.3 | 38.7 | 41.5 | 54.3 | 40.1 | 47.2 |
| Craig | 95.5 | 90.9 | 93.2 | 80.3 | 78.2 | 79.2 | 35.7 | 35.4 | 35.6 | 43.7 | 35.4 | 39.6 |
| Fgenesh | 98.2 | 87.1 | 92.7 | 86.4 | 73.6 | 80.0 | 47.1 | 34.1 | 40.6 | 57.7 | 34.1 | 45.9 |
| Augustus | 97.0 | 89.0 | 93.0 | 86.1 | 72.6 | 79.3 | 52.9 | 28.6 | 40.8 | 64.4 | 34.5 | 49.4 |
Shown are sensitivity (Sn), specificity (Sp) and their average (each in percent) on nucleotide, exon, transcript and gene levels (if several submissions were made for one method, we chose the version with the best gene level average of sensitivity and specificity). The predictions of mGene.init were prepared after the deadline but strictly adhering to the rules and conditions of the nGASP challenge. The result of the best-performing method according to each of the evaluation levels is set in bold face. The evaluation is based on the submitted sets of the participants and performed with our own routine. The numbers slightly deviate from the official nGASP evaluation on the transcript and gene level due to minor differences in the evaluation criteria. These differences, however, do not change the ranking.
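The three measures reported per level can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation routine. It assumes the convention common in gene-finding evaluations, where "specificity" is the fraction of predicted features that are correct (TP / (TP + FP), called precision elsewhere); the counts below are hypothetical.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Percentage of annotated features that were predicted: TP / (TP + FN)."""
    return 100.0 * tp / (tp + fn)


def specificity(tp: int, fp: int) -> float:
    """Percentage of predicted features that are correct: TP / (TP + FP).

    Assumption: gene-finding convention (precision), not TN / (TN + FP).
    """
    return 100.0 * tp / (tp + fp)


def average(sn: float, sp: float) -> float:
    """Arithmetic mean of sensitivity and specificity."""
    return (sn + sp) / 2.0


# Hypothetical counts, for illustration only (not taken from nGASP).
tp, fp, fn = 850, 150, 50
sn = sensitivity(tp, fn)
sp = specificity(tp, fp)
avg = average(sn, sp)
print(f"Sn={sn:.1f}  Sp={sp:.1f}  Avg={avg:.1f}")

# Cross-check against the Craig row of the table (nucleotide level).
craig_nucleotide_avg = average(95.5, 90.9)
print(f"Craig nucleotide average: {craig_nucleotide_avg:.1f}")
```

Because the table lists rounded percentages, an average recomputed from the rounded Sn and Sp can differ from the published average by 0.1.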
Figure 1. Simplified gene model underlying mGene: the vertices correspond to recognizable signals on the DNA, i.e. transcription start sites (TSSs), translation initiation sites (TISs), acceptor splice sites (Acc5′, Acc and Acc3′), donor splice sites (Don5′, Don and Don3′), translation termination sites (Stop) and cleavage sites (cleave). The edges correspond to segments associated with the content types for 5′-UTR, coding (CDS) exon, intron, 3′-UTR and intergenic sequences.
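A graph of this kind can be written down as a small transition table. The sketch below is one plausible reading of the caption, not a transcription of the figure: the vertex and content-type names come from the caption, but the specific set of allowed transitions in `GENE_MODEL` and the helper `segments` are assumptions made for illustration.

```python
# Assumed transitions: signal -> {next signal: content type of the segment
# between them}. ASCII apostrophes stand in for the prime signs in the caption.
GENE_MODEL = {
    "TSS":    {"TIS": "5'-UTR", "Don5'": "5'-UTR"},
    "Don5'":  {"Acc5'": "intron"},
    "Acc5'":  {"TIS": "5'-UTR", "Don5'": "5'-UTR"},
    "TIS":    {"Don": "CDS", "Stop": "CDS"},
    "Don":    {"Acc": "intron"},
    "Acc":    {"Don": "CDS", "Stop": "CDS"},
    "Stop":   {"cleave": "3'-UTR", "Don3'": "3'-UTR"},
    "Don3'":  {"Acc3'": "intron"},
    "Acc3'":  {"cleave": "3'-UTR", "Don3'": "3'-UTR"},
    "cleave": {"TSS": "intergenic"},
}


def segments(path):
    """Return the content type of each segment along a path of signals.

    Raises ValueError if two consecutive signals are not connected
    in the assumed model.
    """
    labels = []
    for src, dst in zip(path, path[1:]):
        try:
            labels.append(GENE_MODEL[src][dst])
        except KeyError:
            raise ValueError(f"transition {src} -> {dst} is not allowed") from None
    return labels


# A gene with two coding exons and unspliced UTRs.
two_exon_gene = ["TSS", "TIS", "Don", "Acc", "Stop", "cleave"]
print(segments(two_exon_gene))
```

Encoding the model as a mapping makes the constraint explicit: a signal sequence is a valid gene structure only if every consecutive pair of signals is joined by an edge.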