Cathy H Wu, Rolf Apweiler, Amos Bairoch, Darren A Natale, Winona C Barker, Brigitte Boeckmann, Serenella Ferro, Elisabeth Gasteiger, Hongzhan Huang, Rodrigo Lopez, Michele Magrane, Maria J Martin, Raja Mazumder, Claire O'Donovan, Nicole Redaschi, Baris Suzek.
Abstract
The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online at http://www.uniprot.org or downloaded at ftp://ftp.uniprot.org/pub/databases/.
Year: 2006 PMID: 16381842 PMCID: PMC1347523 DOI: 10.1093/nar/gkj161
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Names and sizes of the UniProt databases
| Abbreviation | Full name/meaning | Database size |
|---|---|---|
| UniProt | Universal Protein Resource | |
| UniProtKB | UniProt Knowledgebase | 2 299 834 |
| UniProtKB/Swiss-Prot | Swiss-Prot section of the UniProt Knowledgebase | 194 317 |
| UniProtKB/TrEMBL | TrEMBL section of the UniProt Knowledgebase | 2 105 517 |
| UniParc | UniProt Archive | 5 025 587 |
| UniRef | UniProt Reference Clusters | |
| UniRef100 | UniProt Reference Clusters: 100% identity | 2 939 066 |
| UniRef90 | UniProt Reference Clusters: 90% identity | 1 730 689 |
| UniRef50 | UniProt Reference Clusters: 50% identity | 907 983 |
Database sizes are based on Release 6.0 (September 13, 2005).
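The figures in this table can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming the sizes are entry (or cluster) counts; the dictionary and the helper names `knowledgebase_is_consistent` and `size_relative_to` are illustrative and not part of any UniProt tool.

```python
# Sizes copied from the table above (Release 6.0).
# Assumption: each number is a count of entries or clusters.
SIZES = {
    "UniProtKB": 2_299_834,
    "UniProtKB/Swiss-Prot": 194_317,
    "UniProtKB/TrEMBL": 2_105_517,
    "UniParc": 5_025_587,
    "UniRef100": 2_939_066,
    "UniRef90": 1_730_689,
    "UniRef50": 907_983,
}


def knowledgebase_is_consistent(sizes):
    """Check that the two UniProtKB sections add up to the UniProtKB total."""
    sections = sizes["UniProtKB/Swiss-Prot"] + sizes["UniProtKB/TrEMBL"]
    return sections == sizes["UniProtKB"]


def size_relative_to(sizes, name, baseline="UniRef100"):
    """Return the size of `name` as a fraction of the size of `baseline`."""
    return sizes[name] / sizes[baseline]


if __name__ == "__main__":
    print("UniProtKB sections sum to total:", knowledgebase_is_consistent(SIZES))
    for name in ("UniRef90", "UniRef50"):
        print(f"{name} / UniRef100 = {size_relative_to(SIZES, name):.1%}")
```

With these numbers the two UniProtKB sections sum exactly to the UniProtKB total, and UniRef90 and UniRef50 hold roughly 59% and 31% as many records as UniRef100, which gives a rough sense of the sequence space compression described in the abstract.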
Figure 1. Overview of the major data sources of the UniProt databases.
Addition and redefinition of UniProt feature keys
| Feature key | Definition |
|---|---|
| COILED | A coiled-coil region |
| COMPBIAS | A compositionally biased region |
| MOTIF | A short (≤20 amino acids) sequence of biological interest |
| REGION | A region of interest in the sequence |
| TOPO_DOM | A topological domain |
| DOMAIN | A specific combination of secondary structures organized into a characteristic three-dimensional structure or fold |
| SITE | A single amino acid residue; can also apply to an amino acid bond represented by the positions of the two flanking amino acids |
a Revised.
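The definitions in this table imply simple length constraints that a script could check. The sketch below is one possible reading, not UniProt's own validation logic: the function `check_feature` is hypothetical, coordinates are assumed to be 1-based and inclusive, and a SITE describing a bond is assumed to span exactly the two flanking residues.

```python
# Feature keys and definitions copied from the table above.
FEATURE_KEYS = {
    "COILED": "A coiled-coil region",
    "COMPBIAS": "A compositionally biased region",
    "MOTIF": "A short (<=20 amino acids) sequence of biological interest",
    "REGION": "A region of interest in the sequence",
    "TOPO_DOM": "A topological domain",
    "DOMAIN": "A specific combination of secondary structures organized "
              "into a characteristic three-dimensional structure or fold",
    "SITE": "A single amino acid residue, or an amino acid bond given by "
            "the positions of the two flanking amino acids",
}

MOTIF_MAX_LENGTH = 20  # from the MOTIF definition
SITE_MAX_LENGTH = 2    # assumption: one residue, or two residues flanking a bond


def check_feature(key, start, end):
    """Return a list of problems for a feature spanning start..end.

    Hypothetical helper: positions are assumed 1-based and inclusive.
    """
    problems = []
    if key not in FEATURE_KEYS:
        problems.append(f"unknown feature key: {key}")
    if start < 1 or end < start:
        problems.append(f"invalid range: {start}..{end}")
        return problems
    length = end - start + 1
    if key == "MOTIF" and length > MOTIF_MAX_LENGTH:
        problems.append(f"MOTIF spans {length} residues (max {MOTIF_MAX_LENGTH})")
    if key == "SITE" and length > SITE_MAX_LENGTH:
        problems.append(f"SITE spans {length} residues (max {SITE_MAX_LENGTH})")
    return problems


if __name__ == "__main__":
    for feature in [("MOTIF", 10, 29), ("MOTIF", 10, 30), ("SITE", 5, 6)]:
        print(feature, check_feature(*feature))
```

Returning a list of problems rather than raising an exception lets a caller report every issue found in an entry at once.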