John Besemer, Mark Borodovsky.
Abstract
The task of gene identification frequently confronting researchers working with both novel and well-studied genomes can be conveniently and reliably solved with the help of the GeneMark web software (http://opal.biology.gatech.edu/GeneMark/). The website provides interfaces to the GeneMark family of programs designed and tuned for gene prediction in prokaryotic, eukaryotic and viral genomic sequences. Currently, the server allows the analysis of nearly 200 prokaryotic and >10 eukaryotic genomes using species-specific versions of the software and pre-computed gene models. In addition, genes in prokaryotic sequences from novel genomes can be identified using models derived on the spot upon sequence submission, either by a relatively simple heuristic approach or by the full-fledged self-training program GeneMarkS. A database of reannotations of >1000 viral genomes by the GeneMarkS program is also available from the website. The GeneMark website is frequently updated to provide the latest versions of the software and gene models.
Year: 2005 PMID: 15980510 PMCID: PMC1160247 DOI: 10.1093/nar/gki487
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Gene predictions made by the prokaryotic version of GeneMark.hmm for a fragment of the Escherichia coli K12 genome
| Gene | Strand | Left end | Right end | Gene length | Class |
|---|---|---|---|---|---|
| 1 | + | 61 | 825 | 765 | 1 |
| 2 | + | 846 | 1112 | 267 | 2 |
| 3 | + | 1145 | 2092 | 948 | 1 |
| 4 | – | 2254 | 4386 | 2133 | 1 |
| 5 | – | 4388 | 4753 | 366 | 1 |
In the ‘Class’ column, 1 and 2 indicate Typical and Atypical, respectively. Direct and reverse complement strands are indicated by ‘+’ and ‘−’, respectively. The graphical output for the first three predictions is shown in Figure 1.
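The coordinates in the table appear to be inclusive at both ends, so each gene length equals the right end minus the left end plus one, and a protein-coding length should be a multiple of three. The following is a minimal Python sketch of that consistency check, using the first three rows (the ones shown in Figure 1). The `Prediction` type and the `CLASS_NAMES` mapping are illustrative names chosen here, not part of the GeneMark.hmm output format.

```python
from dataclasses import dataclass

# Class codes as described in the table note (1 = Typical, 2 = Atypical).
CLASS_NAMES = {1: "Typical", 2: "Atypical"}


@dataclass(frozen=True)
class Prediction:
    """One row of the prediction table (hypothetical container type)."""
    gene: int
    strand: str
    left: int
    right: int
    length: int
    gene_class: int

    def span(self) -> int:
        # Assumes inclusive coordinates: both end positions belong to the gene.
        return self.right - self.left + 1


# First three rows of the table above (the ones plotted in Figure 1).
predictions = [
    Prediction(1, "+", 61, 825, 765, 1),
    Prediction(2, "+", 846, 1112, 267, 2),
    Prediction(3, "+", 1145, 2092, 948, 1),
]

for p in predictions:
    # A row is consistent if the reported length matches the coordinates
    # and covers a whole number of codons.
    consistent = p.span() == p.length and p.length % 3 == 0
    print(p.gene, p.strand, CLASS_NAMES[p.gene_class], p.span(), consistent)
```

The same check can be applied to any row of the table to catch transcription errors in the coordinates.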
Figure 1. Graphical output from the combination of GeneMark and GeneMark.hmm for a fragment of the Escherichia coli K12 genome. The solid black and dashed traces indicate the coding potential calculated by the GeneMark program using the Typical and Atypical Markov chain models of coding DNA, respectively. Only the three reading frames in the direct strand are shown as there are no genes (either predicted or annotated) on the reverse strand in this section of the genome. The thick black horizontal bars indicate the locations of the predictions made by GeneMark.hmm. The thick grey horizontal bars indicate ‘regions of interest’ provided by the GeneMark program. The thin black horizontal lines indicate (longest) ORFs observed in each reading frame; ticks extending above and below this line indicate potential start and stop codons, respectively.
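The ORF lines and start/stop ticks described in the caption can be illustrated with a small frame scan. This sketch is not GeneMark's coding-potential calculation and is not taken from the GeneMark software; it only locates candidate start and stop codons and the longest ORF ending at each stop in one direct-strand frame. The start codon set (ATG, GTG, TTG) is an assumption based on common prokaryotic usage, and the function name `scan_frame` is hypothetical.

```python
# Assumed codon sets: ATG/GTG/TTG as prokaryotic starts, TAA/TAG/TGA as stops.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def scan_frame(seq: str, frame: int):
    """Scan one direct-strand reading frame (0, 1 or 2).

    Returns (starts, stops, orfs). Positions are 0-based offsets of the
    first base of each codon. Each ORF is (start, end) with end exclusive,
    running from the leftmost start after the previous stop through the
    stop codon itself, i.e. the longest ORF for that stop.
    """
    seq = seq.upper()
    starts, stops, orfs = [], [], []
    first_start = None  # leftmost start seen since the previous stop
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in START_CODONS:
            starts.append(i)
            if first_start is None:
                first_start = i
        elif codon in STOP_CODONS:
            stops.append(i)
            if first_start is not None:
                orfs.append((first_start, i + 3))
            first_start = None
    return starts, stops, orfs


# Toy sequence made up for illustration only.
toy = "CCATGAAAATGCCCTAAGG"
for frame in range(3):
    print(frame, scan_frame(toy, frame))
```

In a plot like the one described, the `starts` and `stops` lists would correspond to the ticks above and below each frame line, and `orfs` to the thin horizontal lines.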
Prediction of a single gene (with seven exons) made by the eukaryotic version of GeneMark.hmm for a fragment of the Arabidopsis thaliana genome
| Gene no. | Exon no. | Strand | Exon type | Exon range | Exon length | Start frame | End frame |
|---|---|---|---|---|---|---|---|
| 1 | 1 | + | Initial | 23 525–24 451 | 927 | 1 | 3 |
| 1 | 2 | + | Internal | 24 752–24 962 | 211 | 1 | 1 |
| 1 | 3 | + | Internal | 25 041–25 435 | 395 | 2 | 3 |
| 1 | 4 | + | Internal | 25 524–25 743 | 220 | 1 | 1 |
| 1 | 5 | + | Internal | 25 939–25 997 | 59 | 2 | 3 |
| 1 | 6 | + | Internal | 26 292–26 776 | 485 | 1 | 2 |
| 1 | 7 | + | Terminal | 26 862–27 012 | 151 | 3 | 3 |
The gene number and strand are necessarily the same for all exons of a gene. Each exon is described by a type: ‘initial’ (begins with ATG), ‘internal’ or ‘terminal’ (ends with TAA, TAG or TGA) or ‘single’ for exons which are both initial and terminal (intronless gene). The start frame and end frame indicate the position within a codon (first, second or third) of the first and last nucleotide of the exon, respectively. Notably, all complete gene structures begin in codon position 1 and end in codon position 3.
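The frame columns follow from simple modular arithmetic: an exon's end frame is determined by its start frame and length, and the next exon must start at the following codon position. The Python sketch below checks these relations for the seven exons in the table. The function names and the tuple layout are illustrative choices, and the exon ranges are assumed to be inclusive at both ends.

```python
# (exon_no, exon_type, start, end, start_frame, end_frame) from the table above.
EXONS = [
    (1, "Initial", 23525, 24451, 1, 3),
    (2, "Internal", 24752, 24962, 1, 1),
    (3, "Internal", 25041, 25435, 2, 3),
    (4, "Internal", 25524, 25743, 1, 1),
    (5, "Internal", 25939, 25997, 2, 3),
    (6, "Internal", 26292, 26776, 1, 2),
    (7, "Terminal", 26862, 27012, 3, 3),
]


def exon_length(start: int, end: int) -> int:
    # Assumes inclusive coordinates.
    return end - start + 1


def end_frame(start_frame: int, length: int) -> int:
    """Codon position (1-3) of the last base, given that of the first base."""
    return (start_frame - 1 + length - 1) % 3 + 1


def next_start_frame(prev_end_frame: int) -> int:
    """Codon position at which the following exon must begin."""
    return prev_end_frame % 3 + 1


def check_gene(exons) -> bool:
    """True if the frames are internally consistent for a complete gene."""
    total = 0
    expected_start = 1  # a complete gene begins in codon position 1
    for _, _, start, end, sf, ef in exons:
        length = exon_length(start, end)
        if sf != expected_start or end_frame(sf, length) != ef:
            return False
        expected_start = next_start_frame(ef)
        total += length
    # A complete gene ends in codon position 3 and spans whole codons.
    return exons[-1][5] == 3 and total % 3 == 0


coding_length = sum(exon_length(s, e) for _, _, s, e, _, _ in EXONS)
print(check_gene(EXONS), coding_length, coding_length // 3)
```

For the table above, the exon lengths sum to 2448 nucleotides, that is 816 codons including the stop codon.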