Paul Lu, Duane Szafron, Russell Greiner, David S Wishart, Alona Fyshe, Brandon Pearcy, Brett Poulin, Roman Eisner, Danny Ngo, Nicholas Lamb.
Abstract
PA-GOSUB (Proteome Analyst: Gene Ontology Molecular Function and Subcellular Localization) is a publicly available, web-based, searchable and downloadable database that contains the sequences, predicted GO molecular functions and predicted subcellular localizations of more than 107,000 proteins from 10 model organisms (and growing), covering the major kingdoms and phyla for which annotated proteomes exist (http://www.cs.ualberta.ca/~bioinfo/PA/GOSUB). The PA-GOSUB database effectively expands the coverage of subcellular localization and GO function annotations by a substantial factor (already more than fivefold for subcellular localization, compared with Swiss-Prot v42.7), and more model organisms are being added to PA-GOSUB as their sequenced proteomes become available. PA-GOSUB can be used in three main ways. First, a researcher can browse the pre-computed PA-GOSUB annotations on a per-organism and per-protein basis using annotation-based and text-based filters. Second, a user can perform BLAST searches against the PA-GOSUB database and use the annotations of the homologs as simple predictors for new sequences. Third, the whole of PA-GOSUB can be downloaded in either FASTA or comma-separated values (CSV) format.
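The browsing and BLAST-based usage modes can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not PA-GOSUB's own code: it assumes a downloaded CSV with columns named `protein_id`, `go_function` and `localization` (the real export's columns may differ), assumes BLAST results in the standard 12-column tabular format (`-outfmt 6`), and uses the simplest possible homology rule, copying the annotations of the highest-scoring hit that passes an E-value cutoff. The protein IDs, annotations, scores and the helper names `load_annotations`, `filter_by` and `transfer_from_best_hit` are all made up for illustration.

```python
import csv
import io

# Hypothetical PA-GOSUB-style CSV export; real column names may differ.
PA_GOSUB_CSV = """protein_id,go_function,localization
PROT_A,signal transducer activity,plasma membrane
PROT_B,catalytic activity,cytoplasm
PROT_C,transporter activity,plasma membrane
"""

# Hypothetical BLAST hits in tabular format (-outfmt 6). Columns: qseqid, sseqid,
# pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
BLAST_HITS = (
    "query1\tPROT_B\t45.2\t300\t150\t5\t1\t300\t1\t298\t1e-40\t160.0\n"
    "query1\tPROT_A\t82.0\t310\t50\t2\t1\t310\t5\t314\t1e-120\t420.0\n"
    "query2\tPROT_C\t30.1\t120\t80\t4\t10\t130\t20\t140\t0.5\t30.0\n"
)


def load_annotations(text):
    """Map protein ID -> row dict from a PA-GOSUB-style CSV (hypothetical layout)."""
    return {row["protein_id"]: row for row in csv.DictReader(io.StringIO(text))}


def filter_by(annotations, field, value):
    """Browse-style filter: sorted IDs whose annotation `field` equals `value`."""
    return sorted(pid for pid, row in annotations.items() if row[field] == value)


def transfer_from_best_hit(blast_tsv, annotations, max_evalue=1e-10):
    """For each query, copy the annotations of its highest-bitscore hit whose
    E-value is <= max_evalue; queries without such a hit are left out."""
    best = {}  # query -> (bitscore, subject)
    for line in blast_tsv.strip().splitlines():
        cols = line.split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > max_evalue or subject not in annotations:
            continue
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {
        query: {
            "homolog": subject,
            "go_function": annotations[subject]["go_function"],
            "localization": annotations[subject]["localization"],
        }
        for query, (_, subject) in best.items()
    }


if __name__ == "__main__":
    annotations = load_annotations(PA_GOSUB_CSV)
    # Browse-style filter: proteins predicted to be in the plasma membrane.
    print(filter_by(annotations, "localization", "plasma membrane"))  # ['PROT_A', 'PROT_C']
    # Homology-based prediction for new query sequences.
    print(transfer_from_best_hit(BLAST_HITS, annotations))
```

Best-hit transfer is the crudest choice; a more cautious variant would require agreement among several top hits, or a minimum percent identity, before trusting a transferred label.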
Year: 2005 PMID: 15608166 PMCID: PMC540074 DOI: 10.1093/nar/gki120
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Model organisms and annotation coverage in PA-GOSUB

GO MF, GO molecular function; GOA, Gene Ontology Annotation database; PA-G, PA-GOSUB predictions; SP 42.7, Swiss-Prot release 42.7.

| Model organism | Number of proteins | GO MF (GOA) | GO MF (PA-G) | Subcellular localization (SP 42.7) | Subcellular localization (PA-G) |
|---|---|---|---|---|---|
| | 1868 | 497 | 1250 | 157 | 1100 |
| | 4105 | 1534 | 3187 | 862 | 2999 |
| | 4353 | 3524 | 3772 | 2167 | 3627 |
| | 5257 | 78 | 4309 | 85 | 4275 |
| | 6195 | 3017 | 5049 | 2024 | 4978 |
| | 16 371 | 1535 | 12 924 | 1246 | 12 869 |
| | 21 821 | 1459 | 14 379 | 933 | 14 297 |
| | 26 173 | 1891 | 18 338 | 1528 | 18 130 |
| | 26 556 | 5520 | 22 512 | 4912 | 22 431 |
| | 27 954 | 8230 | 23 064 | 7136 | 22 978 |
| Total | 140 653 | 27 285 | 108 784 | 21 050 | 107 684 |
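The abstract's "more than fivefold" figure for subcellular localization can be checked against this table. The sketch below re-enters the per-organism rows as printed (organism names are not part of the rows), confirms that the Total row equals the column sums, and computes the expansion factors, assuming each cell counts proteins that carry at least one annotation of that kind. The helper names `column_sums` and `expansion_factor` are illustrative only.

```python
# Per-organism rows of the coverage table, as printed:
# (number of proteins, GO MF GOA, GO MF PA-G, subcellular SP 42.7, subcellular PA-G)
ROWS = [
    (1868, 497, 1250, 157, 1100),
    (4105, 1534, 3187, 862, 2999),
    (4353, 3524, 3772, 2167, 3627),
    (5257, 78, 4309, 85, 4275),
    (6195, 3017, 5049, 2024, 4978),
    (16371, 1535, 12924, 1246, 12869),
    (21821, 1459, 14379, 933, 14297),
    (26173, 1891, 18338, 1528, 18130),
    (26556, 5520, 22512, 4912, 22431),
    (27954, 8230, 23064, 7136, 22978),
]
TOTAL_ROW = (140653, 27285, 108784, 21050, 107684)


def column_sums(rows):
    """Sum each column across all organism rows."""
    return tuple(sum(column) for column in zip(*rows))


def expansion_factor(reference_count, predicted_count):
    """How many times more proteins are annotated with predictions than in the reference set."""
    return predicted_count / reference_count


totals = column_sums(ROWS)
assert totals == TOTAL_ROW, "Total row does not match the column sums"

n_proteins, goa, pa_go, swissprot, pa_loc = totals
print(f"Subcellular localization: {expansion_factor(swissprot, pa_loc):.2f}x")  # 5.12x
print(f"GO molecular function:    {expansion_factor(goa, pa_go):.2f}x")  # 3.99x
print(f"Proteins with a predicted localization: {pa_loc / n_proteins:.1%}")
```

The same arithmetic gives roughly a fourfold expansion for GO molecular function relative to GOA.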
Figure 1. Sample PACard: protein T4S4_HUMAN from H. sapiens.
Figure 2. BLAST searches against the model organisms.
Figure 3. Searching and filtering: part of a PACard Set matching the criteria.
Figure 4. Searching and filtering: selecting criteria.
Figure 5. The training and predicting phases of classification.
Figure 6. The feature extraction algorithm for a protein sequence in PA.
Figure 7. Part of the explain page for T4S4_HUMAN, signal transducer activity annotation.