Gabriele Mayr, Günter Lepperdinger, Peter Lackner.
Abstract
Primary protein sequence data are archived in databases together with information on the corresponding biological functions. In this respect, UniProt/Swiss-Prot is currently the most comprehensive collection, and it is routinely consulted when trying to unravel the biological role of hypothetical proteins. Bioscientists frequently extract single entries and evaluate them on a subjective basis. In the absence of a standardized procedure for scoring the existing knowledge about individual proteins, we report a computer-assisted method that scores the present knowledge recorded in any given Swiss-Prot entry. This quantitative protein function score (pfs) allows proteins to be compared with respect to their sequence while highlighting how well their function is understood. pfs analysis may also be applied for quality control of individual entries or, in database management, to rank entry listings.
Year: 2008 PMID: 19920991 PMCID: PMC2774577 DOI: 10.1155/2008/897019
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
Figure 1. Computation of pfs. The pfs was calculated with regard to the validity of bibliographic citations as well as the description and comments sections. In line 9, the FUNCTION and CATALYTIC ACTIVITY records were evaluated. In step 2, automated entries resulting from large-scale experimental approaches were excluded from further analysis. (B) Section evaluation. Within each section, every sentence was evaluated independently. Lines 12–14 are required for cases in which literature citations contain down-weighting phrases, yet conclusions regarding the functional properties of a protein were ultimately drawn from mere resemblance at the primary sequence level (Swiss-Prot term: “BY SIMILARITY”).
Expressions and corresponding weights used in the pfs calculation: down-weighting terms frequently used in Swiss-Prot entries, as applied in (1).
| Expression | Weight |
|---|---|
| Unknown | 0.0 |
| Not known | 0.0 |
| Not yet known | 0.0 |
| unnamed | 0.0 |
| Uncharacterized | 0.0 |
| Potential | 0.5 |
| Not clear | 0.5 |
| Not yet clear | 0.5 |
| By similarity | 0.5 |
| Putative | 0.5 |
| Similar to | 0.5 |
| Possible | 0.5 |
| Seems to | 0.5 |
| Thought to | 0.5 |
| Could be | 0.5 |
| Uncertain | 0.5 |
| Potentially | 0.5 |
| Might | 0.5 |
| May | 0.5 |
| Presumably | 0.75 |
| Probably | 0.75 |
| Probable | 0.75 |
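
The weight table lends itself to a simple lookup. The following is a minimal sketch, not the authors' implementation: it splits a section into sentences and assigns each sentence the weight of the down-weighting expression it contains. The function names, the case-insensitive whole-word matching, the default weight of 1.0 for sentences without any listed expression, and the choice of the lowest weight when several expressions occur are all assumptions made for illustration.

```python
import re

# Expression weights copied from the table above (keys lower-cased).
WEIGHTS = {
    "unknown": 0.0, "not known": 0.0, "not yet known": 0.0,
    "unnamed": 0.0, "uncharacterized": 0.0,
    "potential": 0.5, "not clear": 0.5, "not yet clear": 0.5,
    "by similarity": 0.5, "putative": 0.5, "similar to": 0.5,
    "possible": 0.5, "seems to": 0.5, "thought to": 0.5,
    "could be": 0.5, "uncertain": 0.5, "potentially": 0.5,
    "might": 0.5, "may": 0.5,
    "presumably": 0.75, "probably": 0.75, "probable": 0.75,
}

# Longest expressions first, so the alternation prefers the longer phrase.
_ALTERNATIVES = "|".join(
    re.escape(expr) for expr in sorted(WEIGHTS, key=len, reverse=True)
)
PATTERN = re.compile(rf"\b(?:{_ALTERNATIVES})\b", re.IGNORECASE)


def sentence_weight(sentence: str) -> float:
    """Return the lowest weight among matched expressions (assumption),
    or 1.0 if the sentence contains none of them (assumption)."""
    hits = [WEIGHTS[m.group(0).lower()] for m in PATTERN.finditer(sentence)]
    return min(hits, default=1.0)


def section_weights(section: str) -> list[float]:
    """Evaluate every sentence of a section independently."""
    sentences = re.split(r"(?<=[.!?])\s+", section.strip())
    return [sentence_weight(s) for s in sentences if s]


if __name__ == "__main__":
    text = "Binds DNA. Probably involved in repair. Potentially secreted."
    print(section_weights(text))
```

How the per-sentence weights are combined into the final pfs is not covered by this sketch; it only illustrates the lookup step.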
Peer survey. Thirty randomly picked Swiss-Prot entries (accession number: AccNum) were evaluated by trained biologists (undergraduates: un; PhD students: Ph; postdoctoral fellows: po; principal investigators: PI). The information provided in an entry, in particular concerning functional properties, was graded “low” (L), “medium” (M), or “high” (H). The concordance of the grades assigned by the peers is shown (maj %). pfs: protein function score.
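
One plausible reading of the concordance figure (maj %) is the share of peers who gave the most frequent grade. The sketch below works under that assumption; the function name and the example grades are hypothetical and are not taken from the survey.

```python
from collections import Counter


def majority_concordance(grades: list[str]) -> tuple[str, float]:
    """Return the most frequent grade and the percentage of peers who chose it.

    Ties are resolved in favour of the grade that appears first in the input,
    which is how Counter.most_common orders equal counts.
    """
    if not grades:
        raise ValueError("at least one grade is required")
    grade, votes = Counter(grades).most_common(1)[0]
    return grade, 100.0 * votes / len(grades)


if __name__ == "__main__":
    # Hypothetical grades for one entry: L = low, M = medium, H = high.
    print(majority_concordance(["H", "H", "M", "H"]))
```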
pfs analysis of Swiss-Prot entries. Absolute numbers of individual entries that were grouped according to computed pfs value are shown for the most prominent species in Swiss-Prot.
| Total | 0 | <1 | 1-2 | >2 | 3 |
|---|---|---|---|---|---|
| 17169 | 880 | 956 | 4973 | 11240 | 5238 |
| 13826 | 441 | 480 | 5326 | 8020 | 3545 |
| 6493 | 1609 | 1650 | 941 | 3902 | 2470 |
| 6312 | 123 | 138 | 1710 | 4464 | 2278 |
| 6065 | 202 | 594 | 2233 | 3238 | 1613 |
| 4402 | 955 | 1024 | 1117 | 2261 | 1661 |
| 4272 | 130 | 145 | 2454 | 1673 | 995 |
| 3072 | 657 | 797 | 1310 | 965 | 609 |
| 2860 | 753 | 760 | 641 | 1459 | 940 |
| 2612 | 17 | 19 | 580 | 2013 | 1031 |
| 2199 | 49 | 56 | 1118 | 1025 | 515 |
| 1935 | 58 | 63 | 1656 | 216 | 107 |
| 1837 | 21 | 25 | 597 | 1215 | 652 |
| 1774 | 421 | 460 | 1090 | 224 | 136 |
| 1652 | 47 | 54 | 900 | 698 | 481 |
| 1536 | 36 | 43 | 982 | 511 | 211 |
| 1420 | 410 | 495 | 770 | 155 | 107 |
| 1401 | 6 | 20 | 489 | 892 | 412 |
| 1234 | 1 | 2 | 256 | 976 | 586 |
| 1226 | 26 | 36 | 744 | 446 | 324 |
| 919 | 3 | 11 | 829 | 79 | 51 |
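
The counts in the table can be cross-checked with a few lines of code. This sketch assumes that the first figure in each row is the total number of entries and that the next five are the counts in the pfs bins 0, <1, 1-2, >2 and 3, with the 0 bin contained in <1 and the 3 bin contained in >2. That reading is an interpretation of the numbers, not something the caption states, and the helper names are hypothetical. Only the first three rows are included.

```python
# (total, pfs 0, pfs <1, pfs 1-2, pfs >2, pfs 3), copied from the first three rows.
ROWS = [
    (17169, 880, 956, 4973, 11240, 5238),
    (13826, 441, 480, 5326, 8020, 3545),
    (6493, 1609, 1650, 941, 3902, 2470),
]


def bins_add_up(row: tuple[int, ...]) -> bool:
    """Check the assumed reading: <1, 1-2 and >2 partition the total,
    while 0 is a subset of <1 and 3 is a subset of >2."""
    total, zero, lt1, one_to_two, gt2, three = row
    return total == lt1 + one_to_two + gt2 and zero <= lt1 and three <= gt2


def share_with_zero_pfs(row: tuple[int, ...]) -> float:
    """Fraction of entries in a row whose pfs is exactly 0."""
    return row[1] / row[0]


if __name__ == "__main__":
    for row in ROWS:
        print(row[0], bins_add_up(row), round(share_with_zero_pfs(row), 3))
```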