Gautam Aggarwal, E A Worthey, Paul D McDonagh, Peter J Myler.
Abstract
BACKGROUND: Seattle Biomedical Research Institute (SBRI), as part of the Leishmania Genome Network (LGN), is sequencing chromosomes of the trypanosomatid protozoan species Leishmania major. At SBRI, chromosomal sequence is annotated using a combination of trained and untrained non-consensus gene-prediction algorithms with ARTEMIS, an annotation platform with rich and user-friendly interfaces.
Year: 2003 PMID: 12793912 PMCID: PMC165441 DOI: 10.1186/1471-2105-4-23
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. This panel of ARTEMIS shows the comparison of four different methods used at SBRI for sequence annotation: a) CODONUSAGE, b) GENESCAN, c) TESTCODE and d) GLIMMER. The CODONUSAGE panel shows results for the three reading frames (shown by different colors) of the top strand; those from the bottom strand are not shown. The panel immediately following the TESTCODE panel displays the position of all stop codons (with vertical lines) in all six reading frames. The vertical scales in the top three panels refer to the value of the statistic calculated by the corresponding algorithm. The predictions of GLIMMER appear as blue boxes in this panel. The horizontal scale in the center of this panel indicates the nucleotide coordinates of the sequence for this and the three upper panels (and is adjustable with the right-hand scroll bar). The bottom panel displays the translated amino acids in six different reading frames. The horizontal scale refers to the nucleotide coordinates for the sequence within this panel.
Automated gene prediction (a) in Leishmania major
| Chr | Annotated CDS (b) | G | | G | | T | | C | |
| | | FP (c) | FN (d) | FP | FN | FP | FN | FP | FN |
| Chr1 | 79 | 131 | 0 | 61 | 1 | 68 | 33 | 75 | 4 |
| Chr3 | 94(1) | 116 | 1 | 57 | 5 | 119 | 51 | 108 | 8 |
| Chr4 | 123 | 328 | 1 | 97 | 6 | 130 | 56 | 139 | 9 |
| EDR (e) | | 1.96 | | 0.77 | | 1.68 | | 1.16 | |
a All possible ORFs (i.e. starting with an ATG and ending with TAA, TAG or TGA) of >300 bp in the three chromosome sequences were scored by each of the programs. GLIMMER predictions (for ORFs > 100 amino acids, with default settings) were taken straight from the trained software. For GENESCAN and TESTCODE, ORFs were considered to be positive if the average score for the ORF exceeded a threshold of 4.0 and 9.7, respectively. For overlapping ORFs on the same strand, the one with the highest score was chosen. In the case of CODONUSAGE, ORFs were predicted as coding when the average in-frame score was higher than the two out-of-frame scores.
b The number of CDS of more than 300 bp in GenBank Accession numbers AE001274 (chr1), AC125735 (chr3), AL389894 and AL139794 (chr4). The number of annotated CDS of <300 bp is shown in parentheses.
c False positives
d False negatives
e Error Discovery Rate (EDR) = (FN+FP)/(CDS)
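The ORF selection rules in footnote a can be sketched in Python as follows. This is a minimal illustration, not the pipeline used at SBRI: the names `Orf`, `find_orfs`, `call_positive`, `keep_best_overlapping` and `codonusage_is_coding` are hypothetical, the scoring function is supplied by the caller (the scoring programs themselves are not reimplemented), the 300 bp length is assumed to include the stop codon, and the greedy resolution of overlaps is one possible reading of "that with the highest score was chosen".

```python
from typing import Callable, List, NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_ORF_BP = 300  # footnote a: only ORFs of >300 bp are scored


class Orf(NamedTuple):
    strand: str         # "+" or "-"
    start: int          # 0-based start on the scanned strand
    end: int            # exclusive end; assumed to include the stop codon
    score: float = 0.0  # average score assigned by a prediction method


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_bp: int = MIN_ORF_BP) -> List[Orf]:
    """Enumerate every ATG...stop ORF longer than min_bp in all six frames.

    Minus-strand coordinates refer to the reverse-complemented sequence.
    """
    orfs = []
    top = seq.upper()
    for strand, s in (("+", top), ("-", reverse_complement(top))):
        for frame in range(3):
            open_starts = []  # in-frame ATGs seen since the last stop codon
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if codon == "ATG":
                    open_starts.append(i)
                elif codon in STOP_CODONS:
                    for st in open_starts:
                        if i + 3 - st > min_bp:
                            orfs.append(Orf(strand, st, i + 3))
                    open_starts = []
    return orfs


def call_positive(orfs: List[Orf], score_fn: Callable[[Orf], float],
                  threshold: float) -> List[Orf]:
    """Keep ORFs whose average score exceeds the threshold
    (4.0 for GENESCAN, 9.7 for TESTCODE in footnote a)."""
    scored = [o._replace(score=score_fn(o)) for o in orfs]
    return [o for o in scored if o.score > threshold]


def keep_best_overlapping(orfs: List[Orf]) -> List[Orf]:
    """Among overlapping ORFs on the same strand, keep the highest scoring
    (greedy: visit ORFs by descending score, drop any that overlap a kept one)."""
    kept: List[Orf] = []
    for o in sorted(orfs, key=lambda x: x.score, reverse=True):
        if all(o.strand != k.strand or o.end <= k.start or o.start >= k.end
               for k in kept):
            kept.append(o)
    return sorted(kept, key=lambda x: (x.strand, x.start))


def codonusage_is_coding(in_frame: float, out1: float, out2: float) -> bool:
    """CODONUSAGE rule: the in-frame average must beat both out-of-frame averages."""
    return in_frame > out1 and in_frame > out2
```

A typical use would be `keep_best_overlapping(call_positive(find_orfs(seq), score_fn, 4.0))`, where `score_fn` wraps whichever scoring statistic is being evaluated.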
Automated gene prediction by combination of different methods.
| Chr | Annotated CDS | 4 methods | | 3 methods | | 2 methods | |
| | | FP | FN | FP | FN | FP | FN |
| Chr1 | 79 | 0 | 34 | 13 | 5 | 65 | 1 |
| Chr3 | 90 | 1 | 50 | 7 | 10 | 50 | 5 |
| Chr4 | 123 | 1 | 58 | 14 | 13 | 109 | 6 |
| EDR | | 0.49 | | 0.21 | | 0.80 | |
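The last row of the table can be checked against the definition EDR = (FN+FP)/(CDS). The sketch below assumes that the rate is pooled over the three chromosomes, which the source does not state explicitly; `TABLE` and `edr` are illustrative names, and the counts are copied from the rows above.

```python
from typing import Dict, Tuple

# chromosome -> (annotated CDS, {column label: (FP, FN)}), copied from the table
TABLE: Dict[str, Tuple[int, Dict[str, Tuple[int, int]]]] = {
    "Chr1": (79, {"4 methods": (0, 34), "3 methods": (13, 5), "2 methods": (65, 1)}),
    "Chr3": (90, {"4 methods": (1, 50), "3 methods": (7, 10), "2 methods": (50, 5)}),
    "Chr4": (123, {"4 methods": (1, 58), "3 methods": (14, 13), "2 methods": (109, 6)}),
}


def edr(table: Dict[str, Tuple[int, Dict[str, Tuple[int, int]]]], label: str) -> float:
    """Pooled EDR: total (FP + FN) divided by total annotated CDS."""
    total_cds = sum(cds for cds, _ in table.values())
    total_errors = sum(sum(counts[label]) for _, counts in table.values())
    return total_errors / total_cds


for label in ("4 methods", "3 methods", "2 methods"):
    print(label, round(edr(TABLE, label), 2))
```

With the counts as printed, the first two columns round to the tabulated 0.49 and 0.21, while the third comes out near 0.81 rather than 0.80. The difference would disappear if the Chr3 CDS count were 94, as in the single-method table, rather than 90.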