Sameer Velankar, José M Dana, Julius Jacobsen, Glen van Ginkel, Paul J Gane, Jie Luo, Thomas J Oldfield, Claire O'Donovan, Maria-Jesus Martin, Gerard J Kleywegt.
Abstract
The Structure Integration with Function, Taxonomy and Sequences resource (SIFTS; http://pdbe.org/sifts) is a close collaboration between the Protein Data Bank in Europe (PDBe) and UniProt. The two teams have developed a semi-automated process for maintaining up-to-date cross-reference information to UniProt entries, for all protein chains in the PDB entries present in the UniProt database. This process is carried out for every weekly PDB release and the information is stored in the SIFTS database. The SIFTS process includes cross-references to other biological resources such as Pfam, SCOP, CATH, GO, InterPro and the NCBI taxonomy database. The information is exported in XML format, one file for each PDB entry, and is made available by FTP. Many bioinformatics resources use SIFTS data to obtain cross-references between the PDB and other biological databases so as to provide their users with up-to-date information.
Year: 2012 PMID: 23203869 PMCID: PMC3531078 DOI: 10.1093/nar/gks1258
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The SIFTS pipeline combines manual and automated processes to produce up-to-date residue-level mappings between proteins in the PDB and their corresponding UniProtKB entry. The pipeline also enriches the annotations of proteins in the PDB by adding data from other biological resources. The SIFTS data are distributed in XML format.
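As a rough illustration of how a per-entry XML file carrying residue-level cross-references might be consumed, the sketch below parses a small inline document with Python's standard library. The element and attribute names (`entry`, `entity`, `residue`, `crossRefDb`, `dbSource`, `dbAccessionId`, `dbResNum`) are assumptions chosen for this example and are not taken from the text above; the real files may be namespaced and structured differently. The identifiers 1kjs, P01031 and PF01821 are the ones mentioned in this record, but the residue numbers are invented placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for one per-entry mapping file.
# Element/attribute names are assumptions; residue numbers are invented.
SAMPLE_XML = """
<entry dbAccessionId="1kjs">
  <entity entityId="A">
    <residue dbSource="PDBe" dbResNum="1">
      <crossRefDb dbSource="UniProt" dbAccessionId="P01031" dbResNum="101"/>
      <crossRefDb dbSource="Pfam" dbAccessionId="PF01821"/>
    </residue>
    <residue dbSource="PDBe" dbResNum="2">
      <crossRefDb dbSource="UniProt" dbAccessionId="P01031" dbResNum="102"/>
    </residue>
    <residue dbSource="PDBe" dbResNum="3"/>
  </entity>
</entry>
"""


def residue_mappings(xml_text, target_db="UniProt"):
    """Return (pdb_res_num, accession, target_res_num) for each mapped residue.

    Residues without a cross-reference to target_db are skipped.
    """
    root = ET.fromstring(xml_text.strip())
    mappings = []
    for residue in root.iter("residue"):
        for ref in residue.findall("crossRefDb"):
            if ref.get("dbSource") == target_db:
                mappings.append(
                    (residue.get("dbResNum"),
                     ref.get("dbAccessionId"),
                     ref.get("dbResNum"))
                )
    return mappings


if __name__ == "__main__":
    for pdb_num, accession, uniprot_num in residue_mappings(SAMPLE_XML):
        print(f"PDB residue {pdb_num} -> {accession} residue {uniprot_num}")
```

Passing a different `target_db` (for example `"Pfam"`) selects cross-references to another resource from the same file; in that case the third tuple element is `None` wherever the sample carries no residue number.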
Number of PDB entries with cross-reference information in SIFTS to other data resources (as of 24 October 2012)
| Total PDB entries processed | 85 582 |
| Entries with UniProtKB cross-reference | 81 029 |
| Entries with residue-level mapping | 83 143 |
| Entries with no possible UniProtKB cross-reference | 4336 |
| Entries awaiting mapping | 217 |
| Entries with NCBI taxonomy identifier | 80 608 |
| Entries with cross-reference to InterPro | 79 886 |
| Entries with Pfam family annotation | 78 401 |
| Entries with cross-reference to Gene Ontology terms | 71 227 |
| Entries with primary citation PubMed identifier | 69 417 |
| Entries with assigned CATH identifier | 50 110 |
| Entries with SCOP cross-reference | 38 054 |
| Entries with assigned EC classification | 43 730 |
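The counts in the table can be cross-checked and turned into coverage percentages with a few lines of Python. This is only a sketch of the arithmetic: the numbers are copied from the table above, while the dictionary keys and the function name `coverage_percent` are labels chosen for this example.

```python
# Counts copied from the table above (as of 24 October 2012).
TOTAL_ENTRIES = 85582

# The three categories that partition the processed entries.
STATUS_COUNTS = {
    "uniprotkb_cross_reference": 81029,
    "no_possible_cross_reference": 4336,
    "awaiting_mapping": 217,
}

# A selection of the other cross-reference counts from the table.
RESOURCE_COUNTS = {
    "NCBI taxonomy": 80608,
    "InterPro": 79886,
    "Pfam": 78401,
    "Gene Ontology": 71227,
    "CATH": 50110,
    "SCOP": 38054,
}


def coverage_percent(count, total=TOTAL_ENTRIES):
    """Share of processed PDB entries, as a percentage rounded to one decimal."""
    return round(100.0 * count / total, 1)


if __name__ == "__main__":
    # Mapped, unmappable and pending entries should add up to the total.
    assert sum(STATUS_COUNTS.values()) == TOTAL_ENTRIES
    for name, count in RESOURCE_COUNTS.items():
        print(f"{name}: {coverage_percent(count)}% of processed entries")
```

The assertion holds because the entries with a UniProtKB cross-reference, those with no possible cross-reference and those awaiting mapping together account for every processed entry.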
Figure 2. The PDBeXplore [6] and UniPDB [6] tools were made possible by the availability of SIFTS data. (a) PDBeXplore (http://pdbe.org/browse) is a browser that enables analysis of the PDB archive based on chemical and biological ontology and classification systems. The figure shows a pie chart of the distribution of ‘CATH architecture’ data for entries that have been annotated with the selected GO term (‘apoptotic process’; GO:0006915). (b) UniPDB (http://pdbe.org/unipdb) provides a graphical display of the availability and extent of 3D structural coverage for a given UniProtKB entry in the PDB. The figure shows the number of PDB entries and the extent of coverage for the human complement C5 protein (UniProt accession P01031), making it easy to identify PDB entries containing the structure of the complete protein or a part of it (e.g. PDB entry 1kjs contains the structure of a small part of the sequence that includes the anaphylotoxin-like Pfam domain, PF01821).