Weizhi Zhang, Xiaodan Tan, Shaofeng Lin, Yujie Gou, Cheng Han, Chi Zhang, Wanshan Ning, Chenwei Wang, Yu Xue.
Abstract
Here, we report the compendium of protein lysine modifications (CPLM 4.0, http://cplm.biocuckoo.cn/), a data resource for the various post-translational modifications (PTMs) that occur specifically at the side-chain amino group of lysine residues in proteins. From the literature and public databases, we collected 450 378 protein lysine modification (PLM) events and combined them with the existing data of our previously developed protein lysine modification database (PLMD 3.0). In total, CPLM 4.0 contains 592 606 experimentally identified modification events on 463 156 unique lysine residues of 105 673 proteins, covering 29 types of PLMs across 219 species. We further annotated the data using knowledge from 102 additional resources covering 13 aspects: variation and mutation, disease-associated information, protein–protein interaction, protein functional annotation, DNA & RNA elements, protein structure, chemical–target relations, mRNA expression, protein expression/proteomics, subcellular localization, biological pathway annotation, functional domain annotation and physicochemical properties. Compared with PLMD 3.0 and other existing resources, CPLM 4.0 contains more than twice as many PLM events, with a data volume of ∼45 GB. We anticipate that CPLM 4.0 will serve as a useful database for further study of PLMs.
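The abstract distinguishes modification events (592 606) from unique lysine residues (463 156), because a single lysine can carry several modification types. The minimal sketch below illustrates that distinction under stated assumptions: each event is modeled as a hypothetical (protein identifier, lysine position, PLM type) record, and merging newly collected records with existing ones is treated as a set union that drops duplicates. The record layout, function name and toy identifiers are illustrative only and do not describe CPLM's actual schema or pipeline.

```python
from typing import Iterable, NamedTuple


class PLMEvent(NamedTuple):
    """Hypothetical record: one modification type observed at one lysine."""
    accession: str  # protein identifier (hypothetical toy values below)
    position: int   # 1-based lysine position in the protein sequence
    ptm_type: str   # PLM abbreviation, e.g. "Kac", "Kub", "Kcr"


def merge_events(*sources: Iterable[PLMEvent]) -> dict[str, int]:
    """Merge event lists from several sources, dropping exact duplicates,
    and count distinct events, lysine residues and proteins."""
    events: set[PLMEvent] = set()
    for source in sources:
        events.update(source)
    # A residue is a (protein, position) pair, regardless of PLM type.
    residues = {(e.accession, e.position) for e in events}
    proteins = {e.accession for e in events}
    return {"events": len(events), "residues": len(residues), "proteins": len(proteins)}


# Toy data, not real CPLM records.
existing = [PLMEvent("PROT_A", 10, "Kac"), PLMEvent("PROT_A", 10, "Kcr")]
new = [
    PLMEvent("PROT_A", 10, "Kac"),  # duplicate of an existing event
    PLMEvent("PROT_A", 15, "Kub"),
    PLMEvent("PROT_B", 5, "Kac"),
]
print(merge_events(existing, new))  # {'events': 4, 'residues': 3, 'proteins': 2}
```

In this toy case two events share one lysine (PROT_A K10 with Kac and Kcr), which is why the event count exceeds the residue count, mirroring the gap between events and unique residues reported above.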
Year: 2022; PMID: 34581824; PMCID: PMC8728254; DOI: 10.1093/nar/gkab849
Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971
Figure 1. The procedure for developing CPLM 4.0. (A) First, we manually collected experimentally identified PLM substrates and sites from PubMed. We also integrated the existing data of 10 public databases: PLMD 3.0 (28), dbPTM (30), ProteomeScout (31), iPTMnet (32), BioGRID (33), PhosphoSitePlus (34), mUbiSiDa (35), HPRD (36), ActiveDriverDB (37) and UniProt (29) (Supplementary Table S1). We then annotated the PLM proteins and sites using knowledge from 102 additional databases covering 13 aspects: (i) variation and mutation; (ii) disease-associated information; (iii) protein–protein interaction; (iv) protein function; (v) DNA & RNA element; (vi) chemical–target relation; (vii) protein structure; (viii) mRNA expression; (ix) physicochemical property; (x) protein expression/proteomics; (xi) subcellular localization; (xii) biological pathway; (xiii) domain annotation (Supplementary Table S2). Kla, lysine lactylation; Kcr, lysine crotonylation; Kmal, lysine malonylation; Kbhb, lysine β-hydroxybutyrylation; Kub, lysine ubiquitination; Kac, lysine acetylation; Ksucc, lysine succinylation. (B) Comparison of the numbers of PLM events and proteins in CPLM 4.0 and other existing resources.
Figure 2. Statistics and analysis of the data in CPLM 4.0. (A) The 29 PLM types were classified into three categories: acylation, Ub/Ubl conjugation and others. The numbers of PLM sites and proteins are shown for each PLM type. (B) Distribution of the different types of PLM protein substrates in the top 10 most abundant species. Further statistics are given in Supplementary Table S4. (C) Proteins with at least one PLM site multiply regulated by acetylation, succinylation, malonylation, 2-hydroxyisobutyrylation and crotonylation. Details on the potential crosstalk among different types of PLMs are given in Supplementary Table S5. (D) GO-based enrichment analysis of the 687 multiply regulated PLM proteins (P-value < 1E-18).
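Panel (C) selects proteins whose lysine sites are regulated by several acylation types. A minimal sketch of such a selection follows, assuming that "multiply regulated" means a single lysine site carrying every one of the five named modifications; the caption does not spell out the exact criterion, so this interpretation is an assumption. The abbreviation "Khib" for 2-hydroxyisobutyrylation is not defined in Figure 1 and is used here as an assumed label, and the function name and toy identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Iterable

# The five acylation types named in Figure 2C. "Khib" (2-hydroxyisobutyrylation)
# is an assumed abbreviation; the others follow the Figure 1 legend.
ACYLATIONS = frozenset({"Kac", "Ksucc", "Kmal", "Khib", "Kcr"})


def multiply_regulated_proteins(
    events: Iterable[tuple[str, int, str]],
    required: frozenset[str] = ACYLATIONS,
) -> set[str]:
    """Return identifiers of proteins with at least one lysine site that
    carries every modification type in `required` (assumed criterion)."""
    types_at_site: dict[tuple[str, int], set[str]] = defaultdict(set)
    for accession, position, ptm_type in events:
        types_at_site[(accession, position)].add(ptm_type)
    # `required <= types` is True when every required type occurs at the site.
    return {acc for (acc, _pos), types in types_at_site.items() if required <= types}


# Toy data, not real CPLM records: PROT_A K10 carries all five acylations,
# PROT_B K4 carries only two of them.
events = [("PROT_A", 10, t) for t in sorted(ACYLATIONS)]
events += [("PROT_B", 4, "Kac"), ("PROT_B", 4, "Kcr")]
print(multiply_regulated_proteins(events))  # {'PROT_A'}
```

Passing a smaller `required` set, for example only Kac and Kcr, relaxes the criterion and would also return PROT_B; a looser definition such as "any two types at one site" would need a different test than the subset check used here.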
Figure 3. Browse options in CPLM 4.0. (A) Browse by modification type. (B) Browse by species. (C) The tabular list and the protein page of human H3C1. In addition to basic information and details of the PLM sites, further annotations can be accessed. Clicking the ‘Enlarging’ button below the structure window displays the 3D structure of human H3C1 in a larger window. (D) The annotation page of H3C1, showing as an example the ICGC cancer missense mutations that change PLM sites of H3C1 (46).
Figure 4. Overview of the integrated annotations for human H3C1. A brief summary of all 111 data resources used in this study is given in Supplementary Table S2. Details on processing each resource are provided in the Supplementary Methods.