Mario Stanke, Burkhard Morgenstern.
Abstract
We present a WWW server for AUGUSTUS, a program for gene prediction in eukaryotic genomic sequences that is based on a generalized hidden Markov model, a probabilistic model of a sequence and its gene structure. The web server allows the user to impose constraints on the predicted gene structure. A constraint can specify the position of a splice site, a translation initiation site or a stop codon. Furthermore, it is possible to specify the positions of known exons and of intervals that are known to be exonic or intronic sequence. The number of constraints is arbitrary, and constraints can be combined to pin down larger parts of the predicted gene structure. The result is then the most likely gene structure that complies with all given user constraints, if such a gene structure exists. Specifying constraints is useful when part of the gene structure is already known, e.g. from expressed sequence tag (EST) or protein sequence alignments, or when the user wants to change the default prediction. The web interface and the downloadable stand-alone program are available free of charge at http://augustus.gobics.de/submission.
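The central idea of the abstract, returning the most likely gene structure that complies with all constraints or nothing if none exists, can be illustrated with constrained Viterbi decoding. This is a minimal sketch under stated assumptions, not the AUGUSTUS algorithm: it uses an ordinary three-state HMM with hypothetical probabilities (`N` intergenic, `E` exon, `I` intron) instead of a generalized HMM, and it represents each constraint as one position forced into one state. The names `constrained_viterbi`, `START`, `TRANS` and `EMIT` are illustrative only.

```python
import math

# Hypothetical toy model (NOT the AUGUSTUS GHMM): one state per base.
# "N" = intergenic, "E" = coding exon, "I" = intron.
STATES = ("N", "E", "I")
START = {"N": 0.8, "E": 0.2, "I": 0.0}
TRANS = {
    "N": {"N": 0.99, "E": 0.01, "I": 0.0},
    "E": {"N": 0.05, "E": 0.90, "I": 0.05},
    "I": {"N": 0.0, "E": 0.10, "I": 0.90},  # an intron must return to an exon
}
EMIT = {
    "N": {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25},
    "E": {"a": 0.20, "c": 0.30, "g": 0.30, "t": 0.20},
    "I": {"a": 0.35, "c": 0.15, "g": 0.15, "t": 0.35},
}


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def constrained_viterbi(seq, constraints):
    """Most likely state path for seq that obeys every constraint.

    constraints maps a 0-based position to the state it must have,
    e.g. {10: "E"} for an exonpart or {40: "I"} for an intronpart.
    Returns (log_prob, path), or None if no path satisfies them all.
    """
    seq = seq.lower()
    neg_inf = float("-inf")

    def allowed(i, s):
        # An unconstrained position admits every state.
        return constraints.get(i, s) == s

    # score[s] = best log-probability of seq[:i+1] ending in state s.
    score = {
        s: (log(START[s]) + log(EMIT[s][seq[0]])) if allowed(0, s) else neg_inf
        for s in STATES
    }
    backptrs = []
    for i in range(1, len(seq)):
        new_score, ptr = {}, {}
        for s in STATES:
            if not allowed(i, s):
                # Constraint forbids state s here: mask it out.
                new_score[s], ptr[s] = neg_inf, None
                continue
            # Best predecessor state for s at position i.
            prev = max(STATES, key=lambda p: score[p] + log(TRANS[p][s]))
            new_score[s] = score[prev] + log(TRANS[prev][s]) + log(EMIT[s][seq[i]])
            ptr[s] = prev
        backptrs.append(ptr)
        score = new_score

    last = max(STATES, key=lambda s: score[s])
    if score[last] == neg_inf:
        return None  # the constraints cannot all be satisfied
    # Trace back from the best final state.
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return score[last], "".join(reversed(path))
```

For example, `constrained_viterbi("acgtac", {2: "I", 3: "N"})` returns `None`, because the toy model assigns probability zero to an intron-to-intergenic transition, so no structure honours both constraints. Masking disallowed states during the dynamic program, rather than filtering finished predictions afterwards, keeps the search a single pass in this sketch.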
Year: 2005 PMID: 15980513 PMCID: PMC1160219 DOI: 10.1093/nar/gki458
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
The types of constraints that can be imposed by the user on the predicted gene structure
| Constraint type | Meaning |
|---|---|
| Start | The translation initiation site (requires an atg in the sequence) |
| Stop | The translation end point (requires a stop codon) |
| Ass | Acceptor (3′) splice site (requires ag consensus) |
| Dss | Donor (5′) splice site (requires gt consensus) |
| Exonpart | An interval or a single position that is coding, i.e. contained in an exon |
| Exon | An interval that is a complete exon |
| Intronpart | An interval or a single position that is contained in an intron |
The constraints can refer to either strand. Exon and exonpart constraints may optionally specify a reading frame.
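The consensus requirements listed in the table can be checked against the sequence before a point constraint is accepted. The following minimal sketch assumes a hypothetical coordinate convention: the motif occupies `seq[pos:pos+k]` in 0-based plus-strand coordinates and, for a `-` strand constraint, is read as its reverse complement. It uses the three standard-code stop codons and ignores reading frames. The function names are illustrative and are not taken from the AUGUSTUS server.

```python
# Consensus each point-constraint type requires, per the table above.
# Stop codons assume the standard genetic code.
REQUIRED = {
    "start": {"atg"},
    "stop": {"taa", "tag", "tga"},
    "ass": {"ag"},
    "dss": {"gt"},
}
_COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(s):
    """Reverse complement of a lowercase DNA string."""
    return s.translate(_COMPLEMENT)[::-1]


def consensus_ok(seq, ctype, pos, strand="+"):
    """Check whether a point constraint is compatible with the sequence.

    Hypothetical convention: seq[pos:pos+k] (0-based, plus-strand
    coordinates) must spell the required motif when read on `strand`.
    Interval types (exonpart, exon, intronpart) have no consensus
    requirement and always pass this check.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    ctype = ctype.lower()
    if ctype not in REQUIRED:
        return True
    seq = seq.lower()
    k = len(next(iter(REQUIRED[ctype])))  # motif length for this type
    if pos < 0 or pos + k > len(seq):
        return False  # motif would run off the sequence
    window = seq[pos:pos + k]
    if strand == "-":
        window = reverse_complement(window)
    return window in REQUIRED[ctype]
```

For instance, in `"ccatggtaagcttag"` a plus-strand start constraint is valid at position 2 (`atg`), and a minus-strand start constraint is valid at position 1, because `cat` read on the reverse strand is `atg`.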
Figure 1. A contrived example of user constraints on the predicted gene structure. The top line shows the prediction of AUGUSTUS on a 5000 bp sequence when no constraints are given: an incomplete gene with seven exons. The middle line shows six constraints: three that enforce coding regions, two that enforce intronic regions and one that enforces the translation stop of a gene. The bottom line shows the prediction of AUGUSTUS under these constraints. Because this set of constraints is satisfiable, the prediction is consistent with all of them.
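What it means for a prediction to be "consistent with all constraints" can be made concrete by testing one candidate gene structure against interval and stop constraints like those in Figure 1. This sketch rests on simplifying assumptions that are not from the paper: a single gene on the plus strand, coding exons given as sorted 1-based inclusive intervals, introns taken as the gaps between consecutive exons, and a stop constraint counted as satisfied when its codon forms the last three bases of the final exon. The helpers `satisfies` and `consistent` are hypothetical, and the coordinates below are invented rather than read from Figure 1.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive, plus strand


def satisfies(exons: List[Interval], constraint: Tuple[str, int, int]) -> bool:
    """Check one constraint against a single-gene structure (plus strand).

    exons: sorted, non-overlapping coding exons of one gene.
    constraint: (type, begin, end). For "stop", begin is the first base
    of the stop codon, assumed to be the last three bases of the final exon.
    """
    ctype, begin, end = constraint
    if ctype == "exonpart":
        # The whole interval must lie inside one exon.
        return any(s <= begin and end <= e for s, e in exons)
    if ctype == "exon":
        # Must match an exon exactly.
        return (begin, end) in exons
    if ctype == "intronpart":
        # Introns are the gaps between consecutive exons of the gene.
        introns = [(e1 + 1, s2 - 1) for (_, e1), (s2, _) in zip(exons, exons[1:])]
        return any(s <= begin and end <= e for s, e in introns)
    if ctype == "stop":
        return bool(exons) and exons[-1][1] == begin + 2
    raise ValueError(f"unsupported constraint type: {ctype}")


def consistent(exons: List[Interval], constraints) -> bool:
    """True if the gene structure complies with every constraint."""
    return all(satisfies(exons, c) for c in constraints)
```

With invented exons `[(100, 250), (400, 520), (700, 902)]`, the constraints `("exonpart", 120, 200)`, `("intronpart", 300, 350)` and `("stop", 900, 902)` are all met, whereas `("intronpart", 240, 260)` fails because it straddles an exon boundary. This only checks a given candidate. Finding the most likely structure that passes every check, as the abstract describes, is a search carried out inside the gene finder.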