Payman Nickchi, Mohieddin Jafari, Shiva Kalantari.
Abstract
Conventional proteomics has revealed a wide gap between protein sequences and biological functions. The third generation of proteomics emerged to bridge this gap. Targeted and untargeted post-translational modification (PTM) studies are among the most important parts of today's proteomics. Because experimental methods are expensive and time-consuming, computational methods have been developed to study, analyze, predict, count and compute PTM annotations on proteins. Enrichment analysis tools are among the most common software packages in computational biology and bioinformatics. Such tools estimate the probability of occurrence of the desired biological features in an arbitrary list of genes/proteins. We introduce the Post-translational modification Enrichment Integration and Matching Analysis (PEIMAN) software to explore the more probable and enriched PTMs on proteins. We also present statistics on the detected PTM terms used for enrichment analysis in PEIMAN, based on the latest released version of UniProtKB/Swiss-Prot. These results, in addition to giving insight into any given list of proteins, could be useful for designing targeted PTM studies for the identification and characterization of special chemical groups. Database URL: http://bs.ipm.ir/softwares/PEIMAN/
Year: 2015 PMID: 25911152 PMCID: PMC4408379 DOI: 10.1093/database/bav037
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Schematic of the procedure used to extract PTM information. The figure shows the steps taken to build the ‘PEIMAN Database’, which is required for PTM enrichment and visualization. The complete, manually reviewed UniProtKB/Swiss-Prot database (546 439 proteins, October 2014) was downloaded. A filtering step retains only the ID, AC, OS, CC, KW, FT and DR fields of proteins with PTM annotations, which are the fields needed for enrichment, and stores them in the ‘216,397 proteins with PTMs annotation’ table. The ‘PEIMAN Database’ holds all the information about a protein that is required for enrichment and visualization. In total, 216 397 proteins have a PTM annotation.
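The filtering step in Figure 1 can be illustrated with a minimal sketch. It assumes the UniProtKB flat-file layout (a two-letter line code, three spaces, then content, with `//` closing each entry). The function names, the toy entries, and the small `PTM_KEYWORDS` set are hypothetical, and the rule used here to decide that an entry "has a PTM annotation" is an assumption, not necessarily the criterion applied in PEIMAN.

```python
from typing import Dict, Iterable, Iterator, List

# Line codes retained by the filtering step described in Figure 1.
KEPT_CODES = ("ID", "AC", "OS", "CC", "KW", "FT", "DR")
# Feature keys named in Figure 2C as PTM-related FT annotations.
PTM_FEATURE_KEYS = {"MOD_RES", "LIPID", "CROSSLNK"}
# Hypothetical sample of PTM keywords; the real list would be much longer.
PTM_KEYWORDS = {"Phosphoprotein", "Acetylation", "Disulfide bond", "Glycoprotein"}


def iter_entries(lines: Iterable[str]) -> Iterator[Dict[str, List[str]]]:
    """Yield one dict per flat-file entry, keeping only the selected line codes."""
    entry: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):  # end-of-entry marker
            if entry:
                yield entry
            entry = {}
            continue
        code, content = line[:2], line[5:]
        if code in KEPT_CODES:
            entry.setdefault(code, []).append(content)


def has_ptm_annotation(entry: Dict[str, List[str]]) -> bool:
    """Assumed rule: a PTM feature key in FT, or a known PTM keyword in KW."""
    for ft in entry.get("FT", []):
        parts = ft.split()
        if parts and parts[0] in PTM_FEATURE_KEYS:
            return True
    keywords = {
        kw.strip().rstrip(".")
        for kw_line in entry.get("KW", [])
        for kw in kw_line.split(";")
    }
    return bool(keywords & PTM_KEYWORDS)


# Two invented entries: only the first carries PTM annotations.
SAMPLE = """\
ID   TOY1_HUMAN              Reviewed;         100 AA.
AC   TOY001;
OS   Homo sapiens (Human).
KW   Acetylation; Phosphoprotein.
FT   MOD_RES       1      1       N-acetylmethionine.
SQ   SEQUENCE   100 AA;
//
ID   TOY2_HUMAN              Reviewed;          50 AA.
AC   TOY002;
OS   Homo sapiens (Human).
KW   Cytoplasm.
//
"""

kept = [e for e in iter_entries(SAMPLE.splitlines()) if has_ptm_annotation(e)]
print([e["AC"] for e in kept])
```

Lines with other codes (such as `SQ`) are dropped while reading, so each retained entry holds only the seven fields listed in the caption.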
Figure 2. (A) The constructed directed acyclic graph (DAG). The DAG consists of three levels: PTM, KW and FT, respectively. Some nodes are shown as examples to illustrate the DAG structure. For example, d-4-hydroxyvaline and d-valine both belong to the FT list and sit at the third level of the DAG. Both terms are children of d-amino acid, which belongs to the KW list and sits at the second level of the DAG. (B) Distribution of the 216 397 proteins that carry PTM vocabulary. The PTM type is stored in one of two fields: KW or FT. 129 553 (60%) proteins have PTM annotations in the KW field only and 4740 (2%) proteins have the vocabulary in the FT field only. The number of proteins with vocabulary in both KW and FT is 82 104 (38%). (C) We consider proteins with three types of PTM annotation in the UniProt database, namely LIPID, CROSSLNK and MOD_RES, which indicate changes in the structure of proteins. The frequencies of proteins with MOD_RES, LIPID and CROSSLNK annotations are 62 675 (83.07%), 8681 (11.53%) and 4091 (5.4%), respectively.
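The three-level structure in panel A can be sketched as a small parent/child graph. The class name `PtmDag` and its methods are hypothetical and written only for illustration; the three edges are the ones named in the caption, and nothing here is taken from the PEIMAN implementation.

```python
from collections import defaultdict
from typing import Dict, Set


class PtmDag:
    """Minimal DAG with a single root; level 1 is the root ('PTM')."""

    def __init__(self, root: str = "PTM") -> None:
        self.root = root
        self.parents: Dict[str, Set[str]] = defaultdict(set)
        self.children: Dict[str, Set[str]] = defaultdict(set)

    def add_edge(self, parent: str, child: str) -> None:
        self.parents[child].add(parent)
        self.children[parent].add(child)

    def ancestors(self, term: str) -> Set[str]:
        """All terms reachable by following parent links upward."""
        seen: Set[str] = set()
        stack = [term]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def level(self, term: str) -> int:
        """Depth of a term: 1 for the root, 2 for KW terms, 3 for FT terms."""
        if term == self.root:
            return 1
        if term not in self.parents:
            raise KeyError(term)
        return 1 + min(self.level(p) for p in self.parents[term])


dag = PtmDag()
dag.add_edge("PTM", "d-amino acid")                # KW term, level 2
dag.add_edge("d-amino acid", "d-valine")           # FT term, level 3
dag.add_edge("d-amino acid", "d-4-hydroxyvaline")  # FT term, level 3

print(dag.level("d-valine"), sorted(dag.ancestors("d-valine")))
```

One possible use of `ancestors` is to credit a protein annotated with an FT term to the parent KW term as well when counting frequencies; whether PEIMAN propagates counts this way is not stated in the source.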
Figure 3. Frequency of PTM terms. The pie chart shows that Nucleotide binding, Phosphoprotein and Disulfide bond are the most frequent PTM vocabularies in UniProtKB/Swiss-Prot, with frequencies of 97 643, 36 917 and 32 930, respectively. The figure also provides word clouds of PTM terms in 10 well-known model organisms, namely H. sapiens, M. musculus, R. norvegicus, D. melanogaster, D. rerio (Zebrafish), C. elegans, S. cerevisiae, A. thaliana, O. sativa (Rice) and E. coli. More details are presented in Supplementary File 2.
Figure 4. The PEIMAN environment. (A) Input parameters in the PEIMAN software. Fields marked with a star are mandatory. (B) PEIMAN output. The figure shows the output of the PTM enrichment analysis. A table and a bar chart are produced once the analysis is complete. The table provides an ID column for each PTM vocabulary, the frequency of each PTM vocabulary in UniProtKB and the corresponding frequency in the protein list. The percentage of each PTM vocabulary in UniProtKB and in the given protein list is provided as well. The table also shows which UniProtKB accession numbers in the protein list carry the corresponding PTM vocabulary. A P-value and a corrected P-value (if a multiple-testing correction method is chosen) are provided for further analysis. A bar chart gives a visual summary of the data. A comment on where more information about each protein can be found (Database cross-reference, DR) is provided as well. The results of the integration and matching analysis are presented for two separate protein lists. The resulting table and bar chart give a better understanding of the PTM vocabularies found in the two lists. The table provides the P-value, corrected P-value, frequency and percentage of each PTM vocabulary in both lists.
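The P-value and corrected P-value columns can be illustrated with a short sketch. The source does not name the statistical test or the correction method, so both choices here are assumptions: a one-sided hypergeometric test, which is common in enrichment analysis, and the Benjamini-Hochberg procedure as one possible correction. The function names, the query list size (100) and the number of hits in it (40) are hypothetical; the background size (216 397) and the Phosphoprotein frequency (36 917) are the figures reported above, and using the PTM-annotated proteins as the background is itself an assumption.

```python
from math import comb
from typing import List


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when drawing n proteins from N, of which K carry the term."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical query: 40 of 100 listed proteins are annotated 'Phosphoprotein'.
p_phospho = hypergeom_sf(k=40, N=216_397, K=36_917, n=100)

# Correcting a few illustrative raw P-values (invented numbers).
corrected = benjamini_hochberg([0.01, 0.04, 0.03])
print(p_phospho, corrected)
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow for large backgrounds, at the cost of speed; a library routine such as a hypergeometric survival function would be the usual choice for many terms.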
The top 10 ranked PTM terms based on Khoury et al. and PEIMAN software
| Rank | Putative: Khoury | Putative: PEIMAN | Experimental: Khoury | Experimental: PEIMAN | Total: Khoury | Total: PEIMAN |
|---|---|---|---|---|---|---|
| 1 | N-linked glycosylation (98 732) | Phosphoserine (16 067) | Phosphoserine (30 795) | Phosphoserine (8337) | Phosphoprotein (108 222–39 733*) | Nucleotide-binding (97 643–117 591*) |
| 2 | Phosphoserine (39 478) | Phosphothreonine (7220) | Phosphothreonine (6031) | N6-(pyridoxal phosphate)lysine (4871) | N-linked glycosylation (104 966–437*) | Phosphoprotein (36 917–39 733*) |
| 3 | N6-acetyllysine (16 852) | N6-acetyllysine (6923) | N-linked glycosylation (5996) | Phosphothreonine (3165) | Acetylation (33 291–18 702*) | Disulfide bond (32 930–33 277*) |
| 4 | Phosphothreonine (10 291) | Phosphotyrosine (3684) | N6-acetyllysine (4929) | N-palmitoyl cysteine (1909) | Methylation (10 295–11 066*) | Glycoprotein (28 874–29 827*) |
| 5 | N6-(pyridoxal phosphate)lysine (6311) | N-acetylalanine (2986) | Glycyl lysine isopeptide (4919) | S-diacylglycerol cysteine (1909) | Palmitoylation (6069–1137*) | Phosphoserine (24 395–24 565*) |
| 6 | Phosphotyrosine (5808) | N-acetylmethionine (2456) | Phosphotyrosine (2176) | Phosphohistidine (1846) | Amidation (5548–3639*) | Acetylation (17 407–18 702*) |
| 7 | N6-succinyllysine (5397) | Glycyl lysine isopeptide (2121) | N-acetylalanine (1452) | Cysteine persulfide (1802) | Citrullination (4808–289*) | Phosphothreonine (10 385–10 513*) |
| 8 | Citrulline (4670) | Glycyl lysine isopeptide (Lys-Gly) (2032) | N6-succinyllysine (1380) | N6-acetyllysine (1773) | O-linked glycosylation (4104–343*) | Ubl conjugation (9182–9408*) |
| 9 | S-palmitoyl cysteine (3578) | N6-succinyllysine (2021) | O-linked glycosylation (1343) | N6-carboxylysine (1436) | Sulfation (3842–697*) | N6-acetyllysine (8693–8704*) |
| 10 | O-linked glycosylation (2684) | S-palmitoyl cysteine (1943) | Interchain with G-Cter in ubiquitin (1136) | N5-methylglutamine (1224) | Hydroxylation (3259–1669*) | Lipoprotein (8409–10 968*) |
The starred numbers are the numbers of hits returned by the UniProt search engine. The small differences between the PEIMAN database and UniProtKB arise because we search for PTM terms in the FT and KW fields only.