Yaping Guo, Di Peng, Jiaqi Zhou, Shaofeng Lin, Chenwei Wang, Wanshan Ning, Haodong Xu, Wankun Deng, Yu Xue.
Abstract
Here, we described the updated database iEKPD 2.0 (http://iekpd.biocuckoo.org) for eukaryotic protein kinases (PKs), protein phosphatases (PPs) and proteins containing phosphoprotein-binding domains (PPBDs), which are key molecules responsible for phosphorylation-dependent signalling networks and participate in the regulation of almost all biological processes and pathways. In total, iEKPD 2.0 contained 197 348 phosphorylation regulators, including 109 912 PKs, 23 294 PPs and 68 748 PPBD-containing proteins in 164 eukaryotic species. In particular, we provided rich annotations for the regulators of eight model organisms, especially humans, by compiling and integrating the knowledge from 100 widely used public databases that cover 13 aspects, including cancer mutations, genetic variations, disease-associated information, mRNA expression, DNA & RNA elements, DNA methylation, molecular interactions, drug-target relations, protein 3D structures, post-translational modifications, protein expressions/proteomics, subcellular localizations and protein functional annotations. Compared with our previously developed EKPD 1.0 (∼0.5 GB), iEKPD 2.0 contains ∼99.8 GB of data with an ∼200-fold increase in data volume. We anticipate that iEKPD 2.0 represents a more useful resource for further study of phosphorylation regulators.
Year: 2019 PMID: 30380109 PMCID: PMC6324023 DOI: 10.1093/nar/gky1063
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The procedure for the construction of iEKPD 2.0. First, we searched PubMed for experimentally verified PKs, PPs and PPBD-containing proteins. We hierarchically classified all known PKs, PPs and PPBD-containing proteins into distinct groups and families and built HMM profiles for all available families. Then, we conducted HMM identification in 164 eukaryotes. For families without an HMM profile, we further performed orthologue detection using the reciprocal best-hit approach. In addition to basic annotation, we further integrated 100 public databases based on 13 aspects: (i) cancer mutations, (ii) genetic variations, (iii) disease-associated information, (iv) mRNA expression, (v) DNA and RNA elements, (vi) DNA methylation, (vii) molecular interactions, (viii) drug–target relations, (ix) protein 3D structures, (x) post-translational modifications, (xi) protein expressions/proteomics, (xii) subcellular localizations and (xiii) protein functional annotations.
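The reciprocal best-hit step named in the Figure 1 caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the caption does not say which alignment tool or score was used, so the sketch assumes similarity scores have already been computed in both directions, and the function names and protein identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

# (query, subject) -> similarity score. Assumption: higher means more similar,
# e.g. an alignment bit score. The source does not specify the score type.
Hits = Dict[Tuple[str, str], float]


def best_hits(hits: Hits) -> Dict[str, str]:
    """Return the highest-scoring subject for each query."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in hits.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> List[Tuple[str, str]]:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical identifiers and scores, for illustration only.
a_vs_b = {
    ("speciesA_p1", "speciesB_x"): 250.0,
    ("speciesA_p1", "speciesB_y"): 90.0,
    ("speciesA_p2", "speciesB_y"): 120.0,
}
b_vs_a = {
    ("speciesB_x", "speciesA_p1"): 245.0,
    ("speciesB_y", "speciesA_p1"): 95.0,
    ("speciesB_y", "speciesA_p2"): 80.0,
}

orthologue_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy input only `speciesA_p1` and `speciesB_x` select each other, so `speciesA_p2` is not reported as an orthologue even though it has a best hit in the other species.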
Figure 2. The classification of PKs, PPs and PPBDs together with cut-off values for all 176 HMM profiles. Log-odds likelihood scores are used as cut-offs for each family to avoid inconsistent results when the database is updated.
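A per-family score cut-off of the kind described in the Figure 2 caption might be applied as in the sketch below. The family names, cut-off values and hit records are hypothetical, and treating a score equal to the cut-off as a pass is an assumption; the caption gives neither the actual values nor the comparison rule.

```python
from typing import Dict, List, Tuple

# (protein id, family name, log-odds score) for one HMM search hit.
Hit = Tuple[str, str, float]


def filter_by_family_cutoff(hits: List[Hit], cutoffs: Dict[str, float]) -> List[Hit]:
    """Keep hits whose score reaches the cut-off defined for their own family.

    Hits for families without a defined cut-off are dropped rather than
    accepted by default (an assumption made for this sketch).
    """
    kept: List[Hit] = []
    for protein, family, score in hits:
        cutoff = cutoffs.get(family)
        if cutoff is not None and score >= cutoff:
            kept.append((protein, family, score))
    return kept


# Hypothetical cut-offs and hits, for illustration only.
family_cutoffs = {"family_A": 50.0, "family_B": 120.0}
raw_hits = [
    ("protein_1", "family_A", 75.5),   # passes the family_A cut-off
    ("protein_2", "family_A", 30.0),   # below the family_A cut-off
    ("protein_3", "family_B", 120.0),  # equal to the cut-off, kept
    ("protein_4", "family_C", 999.0),  # no cut-off defined, dropped
]

accepted = filter_by_family_cutoff(raw_hits, family_cutoffs)
```

Because the threshold is a fixed score for each family rather than a value that depends on the size of the searched sequence set, the same protein gets the same decision after the database grows, which matches the consistency rationale given in the caption.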
Figure 3. Usage of iEKPD 2.0. (A) Browse by species. (B) Browse by classification. (C) Basic annotation page of human PGAM5. (D) Additional annotation covering 13 aspects of human PGAM5.
Figure 4. Overview of comprehensive annotations of human MTOR. Record numbers integrated from 78 additional databases are presented. A more detailed summary of 100 databases is provided in Supplementary Table S7.