Rama Balakrishnan, Karen R Christie, Maria C Costanzo, Kara Dolinski, Selina S Dwight, Stacia R Engel, Dianna G Fisk, Jodi E Hirschman, Eurie L Hong, Robert Nash, Rose Oughtred, Marek Skrzypek, Chandra L Theesfeld, Gail Binkley, Qing Dong, Christopher Lane, Anand Sethuraman, Shuai Weng, David Botstein, J Michael Cherry.
Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is a scientific database of gene, protein and genomic information for the yeast Saccharomyces cerevisiae. SGD has recently developed two new resources that facilitate nucleotide and protein sequence comparisons between S.cerevisiae and other organisms. The Fungal BLAST tool provides directed searches against all fungal nucleotide and protein sequences available from GenBank, divided into categories according to organism, status of completeness and annotation, and source. The Model Organism BLASTP Best Hits resource displays, for each S.cerevisiae protein, the single most similar protein from several model organisms and presents links to the database pages of those proteins, facilitating access to curated information about potential orthologs of yeast proteins.
Year: 2005 PMID: 15608219 PMCID: PMC539977 DOI: 10.1093/nar/gki023
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Summary table of the Model Organism BLASTP Best Hits page. A summary table similar to this representative table is generated for each locus having a ‘hit’ in one or more model organism databases. In this figure, results for the Saccharomyces protein Yer179wp are shown as an example. Columns of the table are as follows: species of the hit protein; name of the database for the hit protein, hyperlinked to the home page of that database; name of the hit protein from its database, hyperlinked to the database page of that protein or its gene; description of the hit protein, as found in its database; E-value (expectation value), reflecting the number of hits expected to be found by chance; percent aligned, showing the percentage of the length of the query protein over which it aligns with the hit protein; source range, showing the amino acid coordinates of the region of the S.cerevisiae query protein that was aligned; and target range, showing the amino acid coordinates of the region of the ‘hit’ protein that was aligned with the S.cerevisiae query protein.
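The "percent aligned" column can be illustrated with a small calculation. The following is a minimal sketch, assuming that the source range is given as 1-based inclusive amino acid coordinates and that the percentage is simply the span of that range divided by the query length. The caption does not say how gaps inside the alignment are treated, so that detail is an assumption, and the function name and example numbers are hypothetical.

```python
def percent_aligned(query_length: int, source_start: int, source_end: int) -> float:
    """Percentage of the query protein covered by the aligned source range.

    Assumes 1-based, inclusive coordinates and ignores gaps within the
    alignment (an assumption; the caption does not specify this).
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    if not 1 <= source_start <= source_end <= query_length:
        raise ValueError("source range must lie within the query protein")
    aligned_residues = source_end - source_start + 1
    return 100.0 * aligned_residues / query_length


# Hypothetical example: a 400-residue query aligned over residues 51-350.
example = percent_aligned(400, 51, 350)
print(f"{example:.2f}%")
```

A hit that aligns over only a small fraction of the query may reflect a shared domain rather than similarity along the whole protein, which is why this column is worth reading alongside the E-value.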
Summary of best BLASTP hits for S.cerevisiae proteins in selected model organism databases
| Organism (database name) | Total predicted proteins | Predicted proteins similar to S.cerevisiae proteins | S.cerevisiae proteins with a hit in that organism |
|---|---|---|---|
| | 18 746 | 2949 (15.73%) | 2929 (44.44%) |
| | 4726 | 4231 (89.53%) | 4980 (75.56%) |
| | 22 254 | 2176 (9.77%) | 2834 (43.00%) |
| | 29 161 | 2718 (9.32%) | 3109 (47.17%) |
| | 29 802 | 2730 (9.16%) | 3137 (47.60%) |
| | 6703 | 2246 (33.51%) | 2984 (45.27%) |
For each model organism, the table displays: total number of predicted proteins, as of September 2004; predicted proteins that are ‘hit’ (E-value ≤ 0.01) by an S.cerevisiae query protein, expressed as the number of proteins and as the percentage of total proteins for that organism; and S.cerevisiae proteins that find a hit in the predicted proteins of that model organism, expressed as the number of proteins and as the percentage of total S.cerevisiae open reading frames. The S.cerevisiae set of predicted protein sequences comprised 6591 open reading frames predicted as of September 2004. For comparisons to S.cerevisiae, the best hit is defined as the most similar protein not identical to the query sequence.
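The best-hit rule described in this legend can be sketched in a few lines. This is an illustrative sketch, not SGD's implementation: the `Hit` record, the `best_hit` function and the protein identifiers are hypothetical, "not identical to the query" is approximated by comparing identifiers rather than sequences, and breaking E-value ties by bit score is an assumption. Only the E-value cutoff of 0.01 comes from the legend.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

E_VALUE_CUTOFF = 0.01  # threshold stated in the table legend


@dataclass(frozen=True)
class Hit:
    """One BLASTP hit (hypothetical record layout)."""
    query_id: str
    subject_id: str
    evalue: float
    bit_score: float


def best_hit(hits: Iterable[Hit], exclude_self: bool = False) -> Optional[Hit]:
    """Return the most similar hit at or below the E-value cutoff, or None.

    With exclude_self=True, hits whose subject identifier equals the query
    identifier are skipped, approximating the rule used for comparisons
    against S.cerevisiae itself.
    """
    candidates = [
        h for h in hits
        if h.evalue <= E_VALUE_CUTOFF
        and not (exclude_self and h.subject_id == h.query_id)
    ]
    if not candidates:
        return None
    # Lowest E-value wins; a higher bit score breaks ties (assumption).
    return min(candidates, key=lambda h: (h.evalue, -h.bit_score))


# Hypothetical hits for one query protein.
hits = [
    Hit("Q1", "Q1", 0.0, 900.0),    # the query matching itself
    Hit("Q1", "P2", 1e-50, 200.0),  # a significant hit to another protein
    Hit("Q1", "P3", 0.5, 30.0),     # above the cutoff, never reported
]
print(best_hit(hits, exclude_self=True))
```

Counting the queries for which `best_hit` returns a result, and the distinct subjects it returns, would give the two kinds of totals reported in the table.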
BLASTP best hits for S.cerevisiae proteins, sorted by ORF classification
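| ORF classification | Total number of ORFs in the set | ORFs with a hit in one or more datasets, including S.cerevisiae | ORFs with a hit in all datasets, excluding S.cerevisiae |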
|---|---|---|---|
| Verified | 4231 | 4069 (96.17%) | 2105 (49.75%) |
| Uncharacterized | 1546 | 1221 (78.98%) | 282 (18.24%) |
| Dubious | 814 | 78 (9.58%) | 0 |
| Total | 6591 | 5368 (81.44%) | 2387 (36.22%) |
ORFs are classified at SGD as verified, uncharacterized or dubious, depending on the likelihood that they are expressed as protein products. For each ORF class and for the whole ORF set, the table displays: the total number of ORFs in that set; the ORFs that find a hit in one or more of the model organism protein sequence datasets including S.cerevisiae proteins, expressed as the number of ORFs and as the percentage of total ORFs in that set; and the ORFs that find a hit in all of the model organism protein datasets excluding S.cerevisiae proteins, expressed as the number of ORFs and as the percentage of total ORFs in that set.
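The totals and percentages in this table can be recomputed from the per-class counts. The short check below uses only the numbers in the table; the variable and function names are hypothetical.

```python
# Counts from the table: (total ORFs, hit in one or more datasets,
# hit in all datasets excluding S.cerevisiae).
rows = {
    "Verified": (4231, 4069, 2105),
    "Uncharacterized": (1546, 1221, 282),
    "Dubious": (814, 78, 0),
}


def pct(part: int, total: int) -> float:
    """Percentage of total, rounded to two decimals as in the table."""
    return round(100.0 * part / total, 2)


# Column-wise sums should reproduce the "Total" row.
totals = tuple(sum(column) for column in zip(*rows.values()))

for name, (total, any_hit, all_hit) in {**rows, "Total": totals}.items():
    print(f"{name}: {total} ORFs, "
          f"{any_hit} ({pct(any_hit, total)}%) with any hit, "
          f"{all_hit} ({pct(all_hit, total)}%) with hits in all datasets")
```

The sums of the three classes match the Total row (6591, 5368 and 2387), and the recomputed percentages agree with the published values.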