Abstract
With the dramatic increase in the volume of experimental results in every domain of the life sciences, assembling pertinent data and combining information from different fields has become a challenge. Information is dispersed over numerous specialized databases and is presented in many different formats. Rapid access to experiment-based information about well-characterized proteins helps predict the function of uncharacterized proteins identified by large-scale sequencing. In this context, universal knowledgebases play essential roles in providing access to data from complementary types of experiments and in serving as hubs with cross-references to many specialized databases. This review outlines how the value of experimental data is optimized by combining high-quality protein sequences with complementary experimental results, including information derived from protein 3D-structures, using as an example the UniProt knowledgebase (UniProtKB) and the tools and links provided on its website (http://www.uniprot.org/). It also discusses the precautions that are necessary for successful predictions and extrapolations.
Year: 2009 PMID: 20043185 PMCID: PMC2835715 DOI: 10.1007/s00018-009-0229-6
Source DB: PubMed Journal: Cell Mol Life Sci ISSN: 1420-682X Impact factor: 9.261
Fig. 1 UniProtKB serves as a knowledge repository and as a central hub that provides links to numerous other databases. New protein sequences are integrated in UniProtKB/TrEMBL and annotated by an automated procedure. UniProtKB/Swiss-Prot entries are manually annotated, combining carefully checked protein sequences with information from the scientific literature, protein 3D-structures, and specialised databases, together with feedback from the scientific community
Fig. 2 Extracts from the UniProtKB/Swiss-Prot entry for arylsulfatase A (P15289), showing selected parts of the General annotation, Sequence annotation and Ontologies sections, and of one of the summary pages that are linked to individual “variant” lines. The General annotation section indicates the catalytic activity of a protein, its subunit structure, subcellular location, sequence similarities, etc., and explains post-translational modifications and involvement in human disease. The Sequence annotation section indicates the roles of individual residues with specific “feature keys” that display the extents of the signal peptide and mature chain, active site and metal-binding residues, amino acid modifications and natural variants. For each variant, clicking on the amino acid substitution leads to a specific summary page including, when available, data from 3D-structure models. Keywords and GO terms complement the annotation