Christine G Elsik, Aaron J Mackey, Justin T Reese, Natalia V Milshina, David S Roos, George M Weinstock.
Abstract
BACKGROUND: We wished to produce a single reference gene set for honey bee (Apis mellifera). Our motivation was twofold. First, we wished to obtain an improved set of gene models with increased coverage of known genes, while maintaining gene model quality. Second, we wished to provide a single official gene list that the research community could further utilize for consistent and comparable analyses and functional annotation.
Year: 2007 PMID: 17241472 PMCID: PMC1839126 DOI: 10.1186/gb-2007-8-1-r13
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Initial evaluation
| Predicted gene set | Number of gene models | Number of perfect alignments/weighted by number of gene models | Number present/weighted by number of gene models |
| GLEAN | 10,157 | 111/0.011 | 356/0.035 |
| Fgenesh | 32,664 | 100/0.003 | 385/0.012 |
| NCBI | 9,759 | 88/0.009 | 340/0.035 |
| Evolutionary Conserved Core | 10,966 | 39/0.004 | 284/0.026 |
| Ensembl | 27,755 | 32/0.0012 | 217/0.008 |
| Drosophila Ortholog | 8,878 | 4/0.0005 | 116/0.013 |
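The weighted values in this table appear to be each raw count divided by the number of gene models in the corresponding set. The following is a minimal sketch that reproduces the weighted perfect-alignment values under that assumption; the names `GENE_SETS` and `weighted` are illustrative and do not come from the paper.

```python
# Gene-model counts and perfect-alignment counts copied from the table above.
GENE_SETS = {
    "GLEAN": (10157, 111),
    "Fgenesh": (32664, 100),
    "NCBI": (9759, 88),
    "Evolutionary Conserved Core": (10966, 39),
    "Ensembl": (27755, 32),
}


def weighted(count, n_models):
    """Assumed weighting: raw count divided by the number of gene models."""
    return count / n_models


for name, (n_models, perfect) in GENE_SETS.items():
    print(f"{name}: {weighted(perfect, n_models):.4f}")
```

Weighting by set size keeps a very large set such as Fgenesh from looking strong simply because it predicts more models.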
General Statistics for GLEAN and input gene prediction sets.
| | | GLEAN | Drosophila Ortholog | Ensembl | Evolutionary Conserved Core | Fgenesh | NCBI |
| Genes | Count | 10,157 | 5,842 | 13,397 | 10,960 | 32,576 | 9,414 |
| All transcripts | Count | 10,157 | 8,875 | 27,663 | 10,960 | 32,576 | 9,744 |
| | Average length | 8,288 | 4,053 | 5,633 | 6,573 | 2,054 | 9,909 |
| | Average coding length | 1,620 | 1,136 | 1,085 | 1,430 | 635 | 1,728 |
| | Average exons per transcript | 6.4 | 4.8 | 6.2 | 5.9 | 3.5 | 7.4 |
| Complete transcripts | Count | 9,722 | 460 | 2,923 | 3,918 | 31,003 | 7,966 |
| | Average length | 8,415 | 3,486 | 2,180 | 6,563 | 2,096 | 10,388 |
| | Average coding length | 1,644 | 1,112 | 631 | 1,545 | 631 | 1,808 |
| | Average exons per transcript | 6.5 | 5.2 | 3.7 | 6.3 | 3.5 | 7.8 |
| Single exon transcripts | Count | 705 | 34 | 421 | 275 | 882 | 194 |
| | Average length | 925 | 904 | 186 | 739 | 615 | 1,325 |
| All exons | Count | 64,975 | 27,672 | 132,964 | 60,601 | 113,465 | 70,627 |
| | Average length | 253 | 239 | 163 | 243 | 182 | 234 |
| Introns | Count | 54,818 | 21,254 | 101,056 | 49,587 | 80,889 | 61,107 |
| | Average length | 1,235 | 700 | 1,016 | 1,089 | 571 | 1,287 |
| Splice acceptors | Count | 55,249 | 26,532 | 125,739 | 55,192 | 82,024 | 61,903 |
| Splice donors | Count | 54,831 | 26,444 | 127,760 | 53,653 | 81,469 | 62,762 |
| Start codons | Count | 9,726 | 1,639 | 8,110 | 5,501 | 31,441 | 8,949 |
| Stop codons | Count | 10,144 | 1,857 | 6,153 | 7,133 | 31,996 | 8,123 |
Number (%) of GLEAN transcripts and exons with overlap to gene prediction sets
| | Drosophila Ortholog | Ensembl | Evolutionary Conserved Core | Fgenesh | NCBI |
| Transcript 80% overlap | 5,532 (55) | 8,806 (84) | 7,789 (81) | 9,873 (98) | 8,770 (93) |
| Transcript 80% both overlap | 2,559 (25) | 4,032 (40) | 4,776 (47) | 6,323 (62) | 7,117 (70) |
| Transcript exact overlap | 232 (2) | 706 (7) | 1,451 (14) | 3,595 (35) | 3,757 (37) |
| Exon 80% overlap | 26,290 (41) | 46,424 (72) | 48,902 (75) | 61,053 (94) | 61,890 (95) |
| Exon 80% both overlap | 22,566 (35) | 37,805 (58) | 43,023 (66) | 56,442 (87) | 57,128 (88) |
| Exon exact overlap | 16,621 (26) | 26,440 (41) | 38,040 (59) | 51,618 (79) | 53,435 (82) |
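The table does not define its three overlap criteria. One plausible reading, offered here only as an assumption, is that "80% overlap" means at least 80% of the GLEAN feature is covered by the other prediction, "80% both overlap" requires that coverage reciprocally, and "exact overlap" requires identical coordinates. A minimal sketch of that reading, using hypothetical half-open `(start, end)` intervals:

```python
def overlap_fraction(a, b):
    """Fraction of interval a (start, end; end exclusive) covered by interval b."""
    a_start, a_end = a
    b_start, b_end = b
    shared = max(0, min(a_end, b_end) - max(a_start, b_start))
    return shared / (a_end - a_start)


def classify(glean, other, threshold=0.8):
    """Report which of the three assumed criteria a pair of features satisfies."""
    forward = overlap_fraction(glean, other) >= threshold
    reverse = overlap_fraction(other, glean) >= threshold
    return {
        "80% overlap": forward,
        "80% both overlap": forward and reverse,
        "exact overlap": glean == other,
    }


# A short GLEAN feature nested inside a much longer prediction.
print(classify((100, 200), (90, 400)))
```

Under this reading, a GLEAN model nested inside a much longer prediction passes the one-way test but fails the reciprocal one, which would explain why the "both" counts are lower in every column.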
Number (%) of GLEAN transcripts and exons with overlap to only one gene prediction set
| | Drosophila Ortholog | Ensembl | Evolutionary Conserved Core | Fgenesh | NCBI |
| Transcript 80% overlap | 1 (0.01) | 14 (0.14) | 1 (0.01) | 27 (0.27) | 3 (0.03) |
| Transcript 80% both overlap | 67 (0.66) | 160 (1.58) | 173 (1.70) | 647 (6.37) | 992 (9.77) |
| Transcript exact overlap | 35 (0.34) | 92 (0.91) | 289 (2.85) | 1431 (14.09) | 1569 (15.45) |
| Exon 80% overlap | 7 (0.01) | 46 (0.07) | 30 (0.05) | 346 (0.53) | 535 (0.82) |
| Exon 80% both overlap | 59 (0.09) | 221 (0.34) | 182 (0.28) | 1776 (2.73) | 2224 (3.42) |
| Exon exact overlap | 159 (0.24) | 305 (0.47) | 486 (0.75) | 3039 (4.68) | 4156 (6.40) |
Sensitivity and specificity using 684 manual gene models from chromosomes 15 and 16
| | GLEAN | Drosophila Ortholog | Ensembl | Evolutionary Conserved Core | Fgenesh | NCBI |
| Gene sensitivity | 60 | 1 | 6 | 13 | 39 | 34 |
| Gene specificity | 65 | 2 | 5 | 12 | 15 | 40 |
| Transcript sensitivity | 53 | 1 | 6 | 12 | 34 | 30 |
| Transcript specificity | 65 | 1 | 2 | 12 | 15 | 41 |
| Exon sensitivity | 82 | 23 | 41 | 55 | 74 | 74 |
| Exon specificity | 90 | 56 | 20 | 61 | 52 | 77 |
| Nucleotide sensitivity | 91 | 37 | 63 | 72 | 91 | 87 |
| Nucleotide specificity | 97 | 96 | 79 | 91 | 82 | 95 |
Sensitivity and specificity using 33 manual gene models from scaffold 1.16
| | GLEAN | Drosophila Ortholog | Ensembl | Evolutionary Conserved Core | Fgenesh | NCBI |
| Gene sensitivity | 39 | 0 | 0 | 3 | 36 | 33 |
| Gene specificity | 46 | 0 | 0 | 4 | 17 | 48 |
| Transcript sensitivity | 37 | 0 | 0 | 3 | 34 | 31 |
| Transcript specificity | 46 | 0 | 0 | 4 | 17 | 48 |
| Exon sensitivity | 70 | 26 | 39 | 54 | 74 | 72 |
| Exon specificity | 81 | 64 | 23 | 63 | 53 | 77 |
| Nucleotide sensitivity | 89 | 34 | 66 | 67 | 96 | 92 |
| Nucleotide specificity | 98 | 99 | 88 | 95 | 89 | 98 |
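The two tables above do not state how sensitivity and specificity were computed. In gene-prediction evaluation, sensitivity is conventionally TP/(TP+FN) and specificity is TP/(TP+FP), which other fields call precision. A minimal exon-level sketch under that assumption, with exons as hypothetical `(start, end)` tuples and only exact coordinate matches counted as true positives:

```python
def sensitivity_specificity(predicted, annotated):
    """Percent sensitivity and specificity, counting exact matches as true positives."""
    predicted, annotated = set(predicted), set(annotated)
    tp = len(predicted & annotated)
    # Sensitivity: share of annotated features that were predicted.
    sensitivity = 100 * tp / len(annotated) if annotated else 0.0
    # Specificity (gene-prediction sense): share of predictions that are correct.
    specificity = 100 * tp / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Made-up coordinates for illustration only.
annotated = [(100, 200), (300, 450), (600, 700), (900, 1000)]
predicted = [(100, 200), (300, 450), (610, 700), (900, 1000), (1200, 1300)]
sn, sp = sensitivity_specificity(predicted, annotated)
print(f"exon sensitivity {sn:.0f}%, specificity {sp:.0f}%")
```

Because a gene or transcript counts as correct only when its whole structure matches, gene-level and transcript-level scores are typically much lower than exon-level or nucleotide-level scores, as the tables show.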
Comparison of gene prediction sets with spliced EST alignments
| | GLEAN | Drosophila Ortholog | Ensembl | Evolutionary Conserved Core | Fgenesh | NCBI |
| Unique predicted donor/acceptor sites | 54,818 | 21,254 | 101,054 | 49,587 | 80,889 | 61,107 |
| Internal EST donor/acceptor sites | 3,255 | 1,467 | 3,157 | 2,504 | 3,227 | 3,233 |
| Perfect matches to EST donor/acceptor site | 2,985 | 1,354 | 2,861 | 2,094 | 2,812 | 2,857 |
| Perfect matches per internal EST donor/acceptor site | 0.92 | 0.92 | 0.91 | 0.84 | 0.87 | 0.88 |
| Perfect matches per predicted donor/acceptor site | 0.059 | 0.069 | 0.031 | 0.042 | 0.035 | 0.047 |
| Donor match | 3,083 | 1,369 | 2,909 | 2,230 | 3,071 | 3,008 |
| Acceptor match | 3,063 | 1,392 | 2,932 | 2,219 | 3,030 | 2,940 |
'Predicted donor/acceptor sites' are splice sites within predicted gene models. 'Internal EST donor/acceptor sites' are EST splice sites located between start and termination codons of predicted genes. EST, expressed sequence tag.
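The comparison above can be sketched as a set lookup. The sketch assumes that a "perfect match" means both the donor and the acceptor of an EST-supported intron coincide with those of a single predicted intron, while the donor and acceptor rows count each site separately. The function name and the `(donor, acceptor)` tuples are hypothetical.

```python
def splice_site_matches(est_introns, predicted_introns):
    """Count EST introns whose donor, acceptor, or both match a predicted intron."""
    donors = {donor for donor, _ in predicted_introns}
    acceptors = {acceptor for _, acceptor in predicted_introns}
    pairs = set(predicted_introns)
    return {
        "perfect": sum(1 for intron in est_introns if intron in pairs),
        "donor": sum(1 for donor, _ in est_introns if donor in donors),
        "acceptor": sum(1 for _, acceptor in est_introns if acceptor in acceptors),
    }


# Made-up (donor, acceptor) coordinates for illustration only.
est = [(200, 300), (450, 600), (700, 900)]
pred = [(200, 300), (450, 610), (710, 900)]
counts = splice_site_matches(est, pred)
print(counts)
print(counts["perfect"] / len(est))  # perfect matches per internal EST site
```

Dividing the perfect-match count by the number of internal EST sites gives the per-EST-site ratio; for GLEAN, 2,985 / 3,255 rounds to the 0.92 reported in the table.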