Abstract
The UniProt Knowledgebase (UniProtKB) acts as a central hub of protein knowledge by providing a unified view of protein sequence and functional information. Manual and automatic annotation procedures are used to add data directly to the database while extensive cross-referencing to more than 120 external databases provides access to additional relevant information in more specialized data collections. UniProtKB also integrates a range of data from other resources. All information is attributed to its original source, allowing users to trace the provenance of all data. The UniProt Consortium is committed to using and promoting common data exchange formats and technologies, and UniProtKB data is made freely available in a range of formats to facilitate integration with other databases.
Database URL: http://www.uniprot.org/
Year: 2011 PMID: 21447597 PMCID: PMC3070428 DOI: 10.1093/database/bar009
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Sequence sources for UniProtKB
| Sequence sources | Data integrated into UniProtKB |
|---|---|
| DDBJ, ENA, GenBank | All protein sequences resulting from translations of annotated coding regions in the DDBJ, ENA and GenBank databases except for non-germline immunoglobulins and T-cell receptors, synthetic sequences, patent application sequences, small fragments of less than eight amino acids, and pseudogenes |
| Submissions | Directly sequenced protein sequences which have been submitted to UniProtKB |
| Literature | Directly sequenced protein sequences which have been published but which have not been submitted to a publicly available database |
| Protein Data Bank | Protein sequences for which a structure is available but for which there is no corresponding UniProtKB entry |
| Ensembl | Protein sequences resulting from gene predictions by the Ensembl group or manual curation from the Vega database for which there is no corresponding UniProtKB entry |
| RefSeq | Protein sequences resulting from gene predictions or manual curation by RefSeq for which there is no corresponding UniProtKB entry |
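The exclusion rules in the DDBJ, ENA, GenBank row can be read as a simple filter. The following is a minimal sketch under stated assumptions: the `TranslatedCds` record, its `category` labels, the `is_integrated` helper and the example accessions are hypothetical names introduced for illustration and do not describe any actual UniProt pipeline. Only the eight-residue threshold and the list of excluded sequence classes come from the table.

```python
from dataclasses import dataclass

# From the table: fragments of less than eight amino acids are excluded.
MIN_LENGTH = 8

# Hypothetical labels for the excluded classes named in the table.
EXCLUDED_CATEGORIES = frozenset({
    "non_germline_immunoglobulin",
    "non_germline_t_cell_receptor",
    "synthetic",
    "patent",
    "pseudogene",
})


@dataclass(frozen=True)
class TranslatedCds:
    """Illustrative record for one translated coding region (assumed shape)."""
    accession: str
    sequence: str
    category: str = "standard"


def is_integrated(record: TranslatedCds) -> bool:
    """Return True if the record passes the exclusion rules listed in the table."""
    if record.category in EXCLUDED_CATEGORIES:
        return False
    return len(record.sequence) >= MIN_LENGTH


# Made-up example records; the accessions are placeholders, not real entries.
examples = [
    TranslatedCds("EX0001", "MKTAYIAKQR"),                        # kept
    TranslatedCds("EX0002", "MKTAYIA"),                           # 7 residues
    TranslatedCds("EX0003", "MKTAYIAKQR", category="synthetic"),  # excluded class
]
kept = [r.accession for r in examples if is_integrated(r)]
print(kept)  # ['EX0001']
```

Checking the category before the length keeps the two kinds of exclusion separate, so either rule can be changed without touching the other.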
Figure 1. Flow diagram showing an outline of the UniProtKB manual curation process.
Sequence analysis tools used during the UniProtKB manual curation process
| Sequence feature | Prediction method | URL |
|---|---|---|
| Topology | | |
| Signal peptides | SignalP | |
| Transit peptides | TargetP | |
| Mitochondrial, plastid or ER targeting sequences | Predotar | |
| Transmembrane domains | TMHMM | |
| Discrimination between signal and transmembrane domains | Phobius | |
| Domains | | |
| Protein diagnostic signatures | InterPro | |
| | Gene3D | |
| | HAMAP | |
| | PANTHER | |
| | Pfam | |
| | PIRSF | |
| | PRINTS | |
| | ProDom | |
| | PROSITE | |
| | SMART | |
| | Superfamily | |
| | TIGRFAMs | |
| Coiled coils | COILS | |
| Repeats | REP | |
| Post-translational modifications | | |
| GPI lipid anchor sites | bigPI | |
| N-glycosylation sites | NetNGlyc | |
| O-glycosylation sites | NetOGlyc | |
| N-terminal myristoylation | NMT | |
| | Myristoylator | |
| Tyrosine sulfation sites | Sulfinator | |
Numbers of predicted annotations from the UniProt automatic annotation systems for release 2011_01 of 11 January 2011
| Predicted annotations | Number of entries for which annotation is predicted (SAAS) | Number of entries for which annotation is predicted (UniRule) |
|---|---|---|
| Protein names | N/A | 1 488 518 |
| Gene names | N/A | 583 214 |
| Comments | 1 455 030 | 2 929 410 |
| Keywords | 2 083 619 | 3 043 730 |
| Sequence features | N/A | 343 288 |
Figure 2. Cross-references in a UniProtKB entry. This figure shows a subset of the cross-references provided in UniProtKB entry O54952.
Databases from which UniProtKB imports citations
| Database sources | Number of imported citations | Number of UniProtKB entries touched |
|---|---|---|
| BioCyc | 1780 | 1403 |
| dictyBase | 2530 | 2749 |
| Entrez Gene GeneRIF | 251 080 | 82 795 |
| FlyBase | 25 916 | 25 233 |
| GAD | 13 698 | 24 042 |
| GeneDB_Spombe | 382 | 775 |
| MINT | 2521 | 26 181 |
| MGI | 110 796 | 54 016 |
| PDB | 16 455 | 15 575 |
| Reactome | 2574 | 3280 |
| RGD | 44 295 | 15 971 |
| SGD | 47 583 | 6316 |
| TAIR | 12 017 | 21 409 |
| WormBase | 6747 | 8575 |
| ZFIN | 2987 | 6919 |
| Total | 475 490 | 230 991 |
Figure 3. Binary protein–protein interactions in UniProtKB entry Q13541 which have been imported from IntAct. Each interaction is displayed on a separate line. The ‘With’ column contains the gene names of the interacting proteins. Accession numbers of interacting proteins are listed in the ‘Entry’ column. The ‘#Exp’ column provides the number of experiments in which an interaction has been observed. The ‘IntAct’ column contains the IntAct database accession numbers of the two interacting proteins. These are hyperlinked to provide users with access to the underlying data in the IntAct database. Specific information regarding the interaction may be present in the ‘Notes’ column.
Coverage of UniProtKB-GOA annotation
| Annotation source | Number of associations | Number of distinct UniProtKB proteins |
|---|---|---|
| Electronic annotations | 74 764 592 | 9 001 654 |
| Manual annotations by UniProt | 129 305 | 27 554 |
| Total manual annotations | 736 895 | 113 675 |
| Total GOA annotations | 75 501 487 | 9 015 498 |
The data are based on UniProtKB-GOA release 91 which was released on 12 January 2011 and was assembled using the publicly released data available in the source databases on 10 January 2011. A more detailed breakdown which is updated with each release is available at http://www.ebi.ac.uk/GOA/uniprot_release.html.
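The figures in the coverage table can be cross-checked with a little arithmetic. The sketch below uses only the numbers printed in the table. The variable names are illustrative, and the overlap estimate assumes that each protein is counted once per annotation source and once in the total, which the table does not state explicitly.

```python
# Association counts taken from the coverage table (UniProtKB-GOA release 91).
electronic_assoc = 74_764_592
manual_uniprot_assoc = 129_305
manual_total_assoc = 736_895
total_assoc = 75_501_487

# Electronic and manual associations should add up to the reported total.
assert electronic_assoc + manual_total_assoc == total_assoc

# Share of all associations that are manual, and UniProt's share of those.
manual_share = manual_total_assoc / total_assoc
uniprot_share_of_manual = manual_uniprot_assoc / manual_total_assoc

# Distinct protein counts from the same table.
electronic_proteins = 9_001_654
manual_proteins = 113_675
total_proteins = 9_015_498

# Inclusion-exclusion: proteins carrying both electronic and manual annotation,
# under the assumption described above.
both = electronic_proteins + manual_proteins - total_proteins

print(f"manual share of associations: {manual_share:.2%}")
print(f"UniProt share of manual associations: {uniprot_share_of_manual:.1%}")
print(f"proteins with both annotation types: {both}")
```

The association counts sum exactly, whereas the protein counts do not, because a single protein can carry both electronic and manual annotations.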
Figure 4. Information in a UniProtKB entry is linked to underlying data sources. The source of each data item is indicated and the source information is hyperlinked to allow users to access the original data source directly.
Figure 5. Using the query builder on the UniProt website to refine a search. An initial query for insulin is further refined using the query builder to include a taxonomic restriction.
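A refined query of this kind can also be written as a URL. The sketch below rests on assumptions that are not taken from the source: it targets the present-day UniProt REST search endpoint rather than the 2011 website shown in the figure, it uses the `organism_id` query field, and it picks taxon 9606 (human) as an arbitrary illustration because the caption does not say which taxon was chosen. `build_search_url` is a hypothetical helper. The code only assembles the URL and makes no network request.

```python
from urllib.parse import urlencode

# Assumption: current UniProt REST search endpoint, not described in the source.
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(term: str, taxon_id: int, fmt: str = "tsv", size: int = 5) -> str:
    """Combine a free-text term with a taxonomic restriction into one query URL."""
    # Assumption: 'organism_id' is the field used for the taxonomic restriction.
    query = f"{term} AND organism_id:{taxon_id}"
    params = {"query": query, "format": fmt, "size": size}
    return f"{BASE_URL}?{urlencode(params)}"


# 9606 (human) is an illustrative choice only.
url = build_search_url("insulin", 9606)
print(url)
```

Keeping the free-text term and the taxon as separate arguments mirrors the two-step refinement in the figure: the initial query stays unchanged while the restriction is added to it.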
Figure 6. Mapping database identifiers using the identifier mapping tool on the UniProt website. The identifier mapping tool allows mapping of UniProt identifiers to identifiers in a database referenced from UniProt or vice versa. Here, a set of RefSeq identifiers is mapped to the corresponding UniProtKB entries.