Mario Stanke, Oliver Schöffmann, Burkhard Morgenstern, Stephan Waack.
Abstract
BACKGROUND: To improve gene prediction, extrinsic evidence on the gene structure can be collected from various sources of information, such as genome-genome comparisons and EST and protein alignments. However, such evidence is often incomplete and usually uncertain: it is usually not sufficient to recover the complete structure of every gene, and the available evidence is often unreliable. Therefore, extrinsic evidence is most valuable when it is balanced with sequence-intrinsic evidence.
Year: 2006 PMID: 16469098 PMCID: PMC1409804 DOI: 10.1186/1471-2105-7-62
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 2. Combined hints. Information retrieved from a combination of EST and protein database searches. The input DNA sequence contains one gene, whose coding parts are shown as dark boxes. First, ESTs matching the DNA sequence are found and clustered. The segments of the input DNA sequence that are aligned to the clustered ESTs are then concatenated, and this concatenation is searched against a protein database. The protein match indicates which part of the EST consensus sequence is coding. In this example, the protein alignment starts at the first position of its amino acid sequence, so a likely translation start site (start hint) can be inferred.
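The start-hint inference in Figure 2 needs one bookkeeping step: a position in the concatenated EST-aligned sequence has to be mapped back to the genome. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: segments are hypothetical `(start, end)` pairs of 1-based, inclusive, non-overlapping forward-strand coordinates, and the names `concat_to_genomic` and `infer_start_hint` are made up for this example.

```python
from typing import List, Optional, Tuple

# Assumption: an EST-aligned segment is a (start, end) pair of 1-based,
# inclusive genomic coordinates on the forward strand; segments do not overlap.
Segment = Tuple[int, int]


def concat_to_genomic(segments: List[Segment], pos: int) -> int:
    """Map a 1-based position in the concatenated segments to a genomic position."""
    offset = 0  # number of concatenated bases preceding the current segment
    for start, end in sorted(segments):
        length = end - start + 1
        if pos <= offset + length:
            return start + (pos - offset - 1)
        offset += length
    raise ValueError("position lies beyond the concatenated sequence")


def infer_start_hint(segments: List[Segment], query_start: int,
                     subject_start: int) -> Optional[int]:
    """Return the genomic position of a hypothetical start hint, or None.

    query_start: 1-based position in the concatenated DNA where the protein
    alignment begins. subject_start: first aligned residue of the database
    protein. Only an alignment beginning at residue 1 suggests that the
    aligned codon is the translation start.
    """
    if subject_start != 1:
        return None
    return concat_to_genomic(segments, query_start)
```

For segments `[(100, 150), (300, 400)]`, concatenated position 60 falls 9 bases into the second segment and maps to genomic position 308. A protein alignment that does not start at residue 1 yields no start hint here, mirroring the caption's reasoning.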
Accuracy results on sag178
| Program | base sn | base sp | exon sn | exon sp | gene sn | gene sp |
|---|---|---|---|---|---|---|
| AUGUSTUS | 0.93 | 0.83 | 0.79 | 0.73 | 0.42 | 0.38 |
| GENSCAN | 0.94 | 0.64 | 0.68 | 0.45 | 0.18 | 0.14 |
| GENEID | 0.89 | 0.78 | 0.67 | 0.60 | 0.17 | 0.17 |
| HMMGene | 0.87 | 0.49 | 0.71 | 0.30 | 0.20 | 0.07 |
| AUGUSTUS+ | 0.95 | 0.85 | 0.85 | 0.76 | 0.49 | 0.46 |
| | 0.98 | 0.90 | 0.91 | 0.87 | 0.71 | 0.68 |
| | 0.95 | 0.89 | 0.89 | 0.85 | 0.68 | 0.65 |
| all hints | 0.98 | 0.93 | 0.94 | 0.90 | 0.82 | 0.79 |
| GenomeScan* | 0.83 | 0.77 | 0.69 | 0.64 | 0.37 | 0.38 |
| TWINSCAN | 0.88 | 0.82 | 0.71 | 0.68 | 0.20 | 0.25 |
Accuracy results on the human data set sag178 (43 sequences, 178 genes). sn = sensitivity = TP/AP and sp = specificity = TP/PP, where TP, AP and PP are the numbers of true positives, actual positives and predicted positives. 'Exon' and 'gene' here refer to the protein-coding parts of exons and genes, respectively. *For the GenomeScan accuracy values, the 7 sequences longer than 230 kb were removed from the test set because they were too long for the GenomeScan web server.
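To make the sn/sp definitions concrete, the sketch below computes them at base and exon level for a toy annotation. It is an illustration under assumptions, not the Eval program: coding exons are hypothetical `(start, end)` pairs, an exon counts as a true positive only when both ends match exactly (the usual exon-level convention), and every correctly predicted coding base counts as a true positive at base level.

```python
from typing import Set, Tuple

# Assumption: a coding exon is a (start, end) pair of 1-based, inclusive coordinates.
Exon = Tuple[int, int]


def sn_sp(actual: Set, predicted: Set) -> Tuple[float, float]:
    """Sensitivity TP/AP and specificity TP/PP for two sets of features."""
    tp = len(actual & predicted)
    sn = tp / len(actual) if actual else 0.0
    sp = tp / len(predicted) if predicted else 0.0
    return sn, sp


def coding_bases(exons: Set[Exon]) -> Set[int]:
    """Expand exons into the set of coding base positions they cover."""
    return {pos for start, end in exons for pos in range(start, end + 1)}


# Toy data: one exon predicted exactly, one with a wrong start, one spurious.
annotated = {(1, 100), (201, 300)}
predicted = {(1, 100), (211, 300), (401, 450)}

# Exon level: only exact coordinate matches are true positives.
exon_sn, exon_sp = sn_sp(annotated, predicted)
# Base level: overlapping coding bases are true positives.
base_sn, base_sp = sn_sp(coding_bases(annotated), coding_bases(predicted))
```

Here exon-level sn is 1/2 and sp is 1/3, while base-level sn is 190/200 = 0.95 and sp is 190/240. The shifted exon is a complete miss at exon level but still contributes 90 correct bases, which is why base-level values in the tables are consistently higher than exon- and gene-level values.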
Accuracy results on chromosome 22
| Program | base sn | base sp | exon sn | exon sp | gene sn | gene sp |
|---|---|---|---|---|---|---|
| AUGUSTUS | 0.85 | 0.59 | 0.71 | 0.52 | 0.19 | 0.09 |
| GENSCAN | 0.89 | 0.49 | 0.73 | 0.39 | 0.10 | 0.05 |
| AUGUSTUS+ | 0.90 | 0.63 | 0.80 | 0.58 | 0.25 | 0.12 |
| | 0.94 | 0.65 | 0.87 | 0.63 | 0.37 | 0.19 |
| | 0.91 | 0.65 | 0.81 | 0.61 | 0.30 | 0.15 |
| all hints | 0.94 | 0.68 | 0.89 | 0.66 | 0.41 | 0.22 |
| TWINSCAN | 0.85 | 0.65 | 0.76 | 0.58 | 0.14 | 0.10 |
| SGP2 | 0.87 | 0.66 | 0.74 | 0.56 | 0.19 | 0.10 |
Accuracy results on human chromosome 22, using the Sanger Center annotation as the standard of truth. The results were computed with the program Eval by Evan Keibler and Michael R. Brent. The TWINSCAN and SGP2 predictions were taken from [28] and [29]. The AUGUSTUS predictions can be downloaded from the AUGUSTUS web server [27].
Figure 1. Venn diagram of exons and genes. Area-proportional Venn diagrams of three sets of exons (top) and three sets of genes (bottom) for chromosome 22. 'Annotation' refers to the set of 387 genes compiled by the Sanger Institute. Examples: 2271 exons were in the Sanger Center annotation and were predicted exactly both by AUGUSTUS+ using the Combined hints and by SGP2. Likewise, 71 genes were identical in the annotation and in the AUGUSTUS+ predictions but were absent from the SGP2 predictions.
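Each region of a three-set Venn diagram such as Figure 1 is a set difference or intersection. The sketch below computes all seven region sizes from hashable items; it assumes, as the caption's "exactly predicted" suggests, that an exon or gene is identified by its exact coordinates, and the function name `venn3_regions` is hypothetical. With `a` as the annotation, `b` as AUGUSTUS+ and `c` as SGP2, the 2271 exons correspond to the `"a&b&c"` region of the exon sets, and the 71 genes to the `"a&b only"` region of the gene sets.

```python
from typing import Dict, Hashable, Set


def venn3_regions(a: Set[Hashable], b: Set[Hashable],
                  c: Set[Hashable]) -> Dict[str, int]:
    """Sizes of the seven disjoint regions of a three-set Venn diagram."""
    return {
        "a only": len(a - b - c),
        "b only": len(b - a - c),
        "c only": len(c - a - b),
        "a&b only": len((a & b) - c),
        "a&c only": len((a & c) - b),
        "b&c only": len((b & c) - a),
        "a&b&c": len(a & b & c),
    }
```

Because the seven regions are disjoint, their sizes always sum to the size of the union of the three sets, which is a quick sanity check on any such diagram.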