Jose M Dana, Aleksandras Gutmanas, Nidhi Tyagi, Guoying Qi, Claire O'Donovan, Maria Martin, Sameer Velankar.
Abstract
The Structure Integration with Function, Taxonomy and Sequences resource (SIFTS; http://pdbe.org/sifts/) was established in 2002 and continues to operate as a collaboration between the Protein Data Bank in Europe (PDBe; http://pdbe.org) and the UniProt Knowledgebase (UniProtKB; http://uniprot.org). The resource is instrumental in the transfer of annotations between protein structure and protein sequence resources through provision of up-to-date residue-level mappings between entries from the PDB and from UniProtKB. SIFTS also incorporates residue-level annotations from other biological resources, currently comprising the NCBI taxonomy database, IntEnz, GO, Pfam, InterPro, SCOP, CATH, PubMed, Ensembl, Homologene and automatic Pfam domain assignments based on HMM profiles. The recently released implementation of SIFTS includes support for multiple cross-references for proteins in the PDB, allowing mappings to UniProtKB isoforms and UniRef90 cluster members. This development makes structure data in the PDB readily available to over 1.8 million UniProtKB accessions.
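To illustrate what a residue-level mapping makes possible, the sketch below converts a UniProtKB residue position into PDB residue numbering so that an annotation can be carried across. It is a minimal illustration and not the SIFTS implementation: the `Segment` layout, the function name `unp_to_pdb` and all numeric values are assumptions made for this example, and the files distributed by SIFTS carry more detail than is modelled here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Segment:
    """Hypothetical contiguous stretch where PDB and UniProtKB residues align 1:1."""
    pdb_start: int
    pdb_end: int
    unp_start: int
    unp_end: int


def unp_to_pdb(segments: Iterable[Segment], unp_pos: int) -> Optional[int]:
    """Return the PDB residue index for a UniProtKB position, or None if unmapped."""
    for seg in segments:
        if seg.unp_start <= unp_pos <= seg.unp_end:
            # Same offset from the start of the segment in both numbering schemes.
            return seg.pdb_start + (unp_pos - seg.unp_start)
    return None


# Illustrative values only: two mapped segments with an unobserved gap between them.
segments = [
    Segment(pdb_start=1, pdb_end=100, unp_start=25, unp_end=124),
    Segment(pdb_start=101, pdb_end=150, unp_start=130, unp_end=179),
]

for position in (25, 127, 130):
    print(position, unp_to_pdb(segments, position))
```

A position that falls in the gap between segments returns `None`, which is the case an annotation-transfer step would need to handle explicitly.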
Year: 2019 PMID: 30445541 PMCID: PMC6324003 DOI: 10.1093/nar/gky1114
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schematic diagrams of the SIFTS process. (A) Overall view of the data flow from PDB, UniProtKB and other resources to data distribution. (B) Calculation of direct mappings between protein structures in PDB and UniProtKB sequences, including isoforms. The process in panel B is invoked weekly and the data are released concurrently with the release of new PDB structures (see text). (C) Calculation of mappings for UniRef90 dataset. The process in panel C is invoked after the weekly release of new PDB structures.
Structure coverage of proteomes of selected model organisms via the UniRef90 clusters
| Organism; cells give the number of UniProtKB accessions (unique protein names in parentheses) | (1) Direct mappings to PDB entries with at least 70% sequence coverage | (2) In SIFTS UniRef90 datasets, excluding accessions in (1) | (3) In SIFTS UniRef90 datasets, and mapping to a PDB sequence from another organism | (4) In SIFTS UniRef90 datasets, and mapping to a PDB sequence from the same organism | (5) In SIFTS UniRef90 datasets, and mapping to PDB sequences from both the same and a different organism | (6) In SIFTS UniRef90 datasets, and mapping to a PDB sequence from another organism only, i.e., inaccessible from the same species |
|---|---|---|---|---|---|---|
| | 3010 (2959) | 26 673 (4918) | 1799 (1377) | 26 907 (5287) | 689 (531) | 1318 (970) |
| | 203 (202) | 262 (205) | 22 (22) | 263 (206) | - | 21 (21) |
| | 764 (752) | 4289 (2621) | 3264 (2144) | 1614 (911) | 270 (159) | 3045 (1954) |
| | 2042 (1658) | 272 533 (14 080) | 27 801 (2307) | 258 324 (12 836) | 12 925 (1013) | 27 663 (2288) |
| | 1187 (1168) | 12 070 (3841) | 789 (258) | 12 121 (3894) | 700 (214) | 725 (207) |
| | 156 (156) | 5 (5) | 6 (5) | 1 (1) | - | 4 (4) |
| | 106 (97) | 30 (27) | 10 (9) | 35 (32) | 2 (2) | 8 (8) |
| | 71 (68) | 493 (341) | 408 (283) | 105 (72) | 7 (6) | 406 (282) |
| | 344 (342) | 674 (472) | 73 (51) | 652 (465) | 1 (1) | 63 (47) |
| | 48 (48) | 396 (118) | 279 (81) | 134 (49) | 12 (8) | 276 (79) |
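Column (1) of the table above depends on a 70% sequence-coverage threshold. The sketch below shows one way such a filter could be applied. The record does not state how coverage is computed, so the definition used here (the fraction of UniProtKB sequence positions that fall inside at least one mapped range) is an assumption, as are the function names and the example ranges.

```python
from typing import Iterable, Tuple


def sequence_coverage(mapped_ranges: Iterable[Tuple[int, int]], seq_length: int) -> float:
    """Fraction of positions 1..seq_length covered by at least one inclusive range."""
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")
    covered = set()
    for start, end in mapped_ranges:
        # Clip each range to the sequence so stray coordinates cannot inflate coverage.
        covered.update(range(max(start, 1), min(end, seq_length) + 1))
    return len(covered) / seq_length


def passes_coverage(mapped_ranges: Iterable[Tuple[int, int]], seq_length: int,
                    threshold: float = 0.70) -> bool:
    """True when coverage meets the threshold (0.70 mirrors the 70% used in the table)."""
    return sequence_coverage(mapped_ranges, seq_length) >= threshold


# Illustrative values only: two overlapping mapped ranges on a 100-residue sequence.
print(sequence_coverage([(1, 50), (40, 80)], 100))
print(passes_coverage([(1, 60)], 100))
```

Overlapping ranges are counted once because positions are collected in a set; a production pipeline would more likely merge intervals than enumerate residues.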
Structure coverage of the UniProt human proteome
| | | Manually curated human proteins (Swiss-Prot) | Automatically curated human proteins (TrEMBL) and part of the UniProt Reference Proteome | Manually or automatically curated human proteins which are not included in the UniProt Reference Proteome |
|---|---|---|---|---|
| Number of UniProtKB accessions with a direct SIFTS mapping to proteins in the PDB and with 70% or more sequence coverage | Canonical | 2920 | 1 | 107 |
| | Other isoforms | 2618 [a] | 8 [b] | |
| Number of UniProtKB accessions in UniRef90 clusters with at least one SIFTS mapping to a PDB structure (excluding direct mappings) | Canonical | 240 | 21 | 24056 |
| | Other isoforms | 169 [a] | 2279 [b] | |
[a] The number of isoforms of manually curated proteins (Swiss-Prot) includes an expansion into all isoforms of the canonical sequences from the corresponding row above.
[b] The number of isoforms for mappings (direct or via UniRef90 clusters) to automatically curated proteins (TrEMBL) does not include the expansion of the canonical sequences.
Enzymes (Enzyme Commission numbers) in the IntEnz resource that are not annotated in the PDB but that belong to UniRef90 clusters with a mapping to PDB structure
Mappings to PDB structures annotated with a different EC number from IntEnz
| EC number in UniRef90 | Enzyme name in UniRef90 | UniProtKB accession in UniRef90 | Sequence identity to PDB entries | PDB entries (possible templates) | EC number associated with PDB entry | Enzyme name in PDB entry | UniProtKB accession mapped to PDB structure |
|---|---|---|---|---|---|---|---|
| 1.1.1.96 | Diiodophenylpyruvate reductase | P40925 | 95% | 4mdh 5mdh | 1.1.1.37 | Malate dehydrogenase | P11708 |
| 1.6.2.6 | Leghemoglobin reductase | Q41219 | 96% | 1dxl | 1.8.1.4 | Dihydrolipoyl dehydrogenase | P31023 |
| 3.4.24.73 | Jararhagin | P30431 | 95% | 3dsl | 3.4.24.49 | Bothropasin | O93523 |
| 3.5.4.45 | Melamine deaminase | Q9EYU0 | 98% | 4v1x 4v1y | 3.8.1.8 | Atrazine chlorohydrolase | P72156 |
| 3.7.1.13 | 2-hydroxy-6-oxo-6-(2-aminophenyl)hexa-2,4-dienoate hydrolase | Q9AQM4 | 98% | 1j1i | 3.7.1.8 | 2,6-dioxo-6-phenylhexa-3-enoate hydrolase | Q84II3 |
| 4.1.2.9 | Phosphoketolase | Q9AEM9 | 95% | 3ahc 3ahd 3ahe 3ahf 3ahg 3ahh 3ahi 3ahj | 4.1.2.22 | Fructose-6-phosphate phosphoketolase | D6PAH1 |
| 4.2.3.32 | Levopimaradiene synthase | H8ZM70 | 99% | 3s9v | 4.2.3.18 4.2.3.132 | Abieta-7,13-diene synthase; Neoabietadiene synthase | Q38710 |
| 4.2.3.44 | Isopimara-7,15-diene synthase | H8ZM71 | 92% | | 5.5.1.12 | Copalyl diphosphate synthase | |
| 4.5.1.5 | S-carboxymethylcysteine synthase | P0ABK5 | 100% | 5j43 5j5v | 2.5.1.47 | Cysteine synthase | P0ABK6 |
| 5.3.1.34 | D-erythrulose 4-phosphate isomerase | Q9ZB26 | 99% | 5ifz | 5.3.1.6 | Ribose-5-phosphate isomerase | Q8YCV4 |
| 6.5.1.6 | DNA ligase (ATP or NAD(+)) | Q9HHC4 | 91% | 3rr5 | 6.5.1.1 | DNA ligase (ATP) | C0LJI8 |
Mappings to PDB structures lacking annotation with an EC number from IntEnz
| EC number in UniRef90 | Enzyme name in UniRef90 | UniProtKB accession in UniRef90 | Sequence identity to PDB entries | PDB entries (possible templates) | UniProtKB accession mapped to PDB structure | Unreviewed protein name from mapped UniProtKB accession |
|---|---|---|---|---|---|---|
| 1.14.14.11 | Styrene monooxygenase | O50214 | 100% | 3ihm | O33471 | Styrene monooxygenase component A |
| 1.3.1.29 | | P0A170 | 98% | 5xtf 5xtg | G9G7I7 | 2,3-dihydroxy-2,3-dihydrophenylpropionate dehydrogenase |
| 1.3.1.60 | Dibenzothiophene dihydrodiol dehydrogenase | | | | | |
| 2.3.1.228 | Isovaleryl-homoserine lactone synthase | Q89VI2 | 100% | 5w8a 5w8c 5w8d 5w8e 5w8g | A0A0N0C224 | Autoinducer synthase |
| 2.3.1.60 | Gentamicin 3- | P23181 | 99% | 6bvc | Q53396 | Aminoglycoside-(3)- |
| 2.4.1.292 | GalNAc-alpha-(1→4)-GalNAc-alpha-(1→3)-diNAcBac-PP-undecaprenol alpha-1,4- | Q0P9C5 | 97% | 6eji 6ejj 6ejk | O86151 | WlaC protein |
| 2.8.2.37 | Trehalose 2-sulfotransferase | A0QQ53 | 100% | 1tex | P84151 | Putative sulfotransferase |
| 2.8.3.10 | Citrate CoA-transferase | P45413 | 92% | 1xr4 | Q8ZRY1 | Citrate lyase alpha chain |
| 3.1.1.59 | Juvenile-hormone esterase | P19985 | 100% | 2fj0 | Q9GPG0 | Carboxylic ester hydrolase |
| 3.2.1.94 | Glucan 1,6-alpha-isomaltosidase | Q44052 | 97% | 5awo 5awp 5awq | Q7WSN5 | Isomaltodextranase |
| 3.5.1.105 | Chitin disaccharide deacetylase | Q99PX1 | 99% | 3wx7 | A6P4T5 | Chitin oligosaccharide deacetylase COD1 |
| 4.2.1.163 | 2-Oxo-hept-4-ene-1,7-dioate hydratase | P42270 | 100% | 2eb4 2eb5 2eb6 | Q46982 | 2-hydroxyhexa-2,4-dienoate hydratase |
| 4.2.1.168 | GDP-4-dehydro-6-deoxy-alpha-D-mannose 3-dehydratase | D3QY10 | 100% | 2gms 2gmu | Q9F118 | Putative pyridoxamine 5-phosphate-dependent dehydrase |
| 4.2.3.108 | 1,8-Cineole synthase | O81191 | 92% | 2j5c | A6XH05 | Cineole synthase |
| 6.2.1.13 | Acetate–CoA ligase (ADP-forming) | Q8U3D6 | 92% | 2csu | O58493 | Uncharacterized protein |
| 6.3.2.39 | Aerobactin synthase | Q47318 | 92% | 6cn7 | Q6U605 | IucA/IucC family siderophore biosynthesis protein |
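The selection behind this table can be pictured as a set difference: keep the EC numbers carried by UniRef90 cluster members that do not appear among the EC numbers already associated with PDB entries. The sketch below is an illustration under stated assumptions and not the procedure used by the authors. The function name `ecs_missing_from_pdb` and the dictionary layout are invented for the example, two accessions and their EC numbers are taken from the table rows above, and the third accession is a hypothetical placeholder added to show the filter discarding a case.

```python
from typing import Dict, List, Set


def ecs_missing_from_pdb(member_ecs: Dict[str, Set[str]],
                         pdb_ecs: Set[str]) -> Dict[str, List[str]]:
    """Map each cluster-member accession to its EC numbers absent from pdb_ecs."""
    missing = {}
    for accession, ecs in member_ecs.items():
        novel = set(ecs) - set(pdb_ecs)
        if novel:
            # Sort for stable, readable output.
            missing[accession] = sorted(novel)
    return missing


# EC numbers of UniRef90 cluster members (first two taken from the table above).
member_ecs = {
    "P40925": {"1.1.1.96"},
    "Q9ZB26": {"5.3.1.34"},
    "HYPOTHETICAL1": {"1.1.1.37"},  # hypothetical accession, for illustration only
}

# EC numbers associated with the PDB entries in the same two table rows.
pdb_ecs = {"1.1.1.37", "5.3.1.6"}

print(ecs_missing_from_pdb(member_ecs, pdb_ecs))
```

The hypothetical accession is dropped because its EC number is already associated with a PDB entry, while the two accessions from the table are retained.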