Xiang Chen, Shao-Ping Shi, Hao-Dong Xu, Sheng-Bao Suo, Jian-Ding Qiu
Abstract
The pathways of protein post-translational modifications (PTMs) have been shown to play particularly important roles in almost every biological process. Identifying PTM substrates, together with the exact modification sites, is fundamental to fully understanding or controlling biological processes. Alternative computational strategies would help annotate PTMs in a high-throughput manner. Traditional algorithms are suited to common organisms and tissues that have a complete PTM atlas or extensive experimental data, whereas annotating rare PTMs in most organisms remains a clear challenge. To this end, we have developed a novel homology-based pipeline, named PTMProber, that identifies potential modification sites in most proteomes lacking PTM data. In our pipeline, the cross-promotion E-value (CPE) serves as a stringent benchmark for evaluating homology to known modification sites. Independent validation tests show that PTMProber achieves at least 58.8% recall with high precision under the CPE benchmark. Comparisons with machine-learning tools show that the PTMProber pipeline performs better on general predictions. We have also developed a web-based tool that integrates this pipeline, available at http://bioinfo.ncu.edu.cn/PTMProber/index.aspx. In addition to pre-constructed PTM prediction models, the website provides extensible functionality that allows users to customize their own models.
Year: 2016 PMID: 27174170 PMCID: PMC4865729 DOI: 10.1038/srep25801
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Test results for the PTMProber pipeline.
| PTM | Rattus norvegicus M1 (%) | Rattus norvegicus M2 (%) | Gallus gallus M1 (%) | Gallus gallus M2 (%) | Bos taurus M1 (%) | Bos taurus M2 (%) |
|---|---|---|---|---|---|---|
| Phosphorylation | 84.6/61.8 | 81.8/100.0 | 90.1/74.7 | 85.4/100.0 | 82.5/72.9 | 79.6/100.0 |
| Acetylation | 84.6/76.4 | 83.8/100.0 | 91.7/84.6 | 86.7/100.0 | 89.5/80.6 | 80.4/100.0 |
| Ubiquitination | 71.5/70.3 | 68.7/100.0 | 100.0/100.0 | 100.0/100.0 | 100.0/100.0 | 100.0/100.0 |
| Sumoylation | 95.4/72.4 | 93.5/100.0 | 75.0/75.0 | 70.9/100.0 | 83.3/80.0 | 71.4/100.0 |
| Methylation | 85.1/58.8 | 82.2/100.0 | 81.8/75.0 | 78.1/100.0 | 92.4/74.7 | 84.2/100.0 |
In each cell, the two values separated by a slash are, in percent, the precision (correctly predicted sites divided by all sites predicted by the model) and the recall (correctly predicted sites divided by all known PTM sites in the data).
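As a reading aid for the table, both metrics reduce to simple ratios: precision = correct / predicted and recall = correct / known. The minimal Python sketch below computes them from sets of sites, assuming each site is keyed by a protein identifier and a 1-based residue position; the `Site` representation and the `precision_recall` helper are hypothetical illustrations, not part of PTMProber.

```python
from typing import Iterable, Set, Tuple

# Hypothetical site representation: (protein accession, 1-based residue position).
Site = Tuple[str, int]


def precision_recall(predicted: Iterable[Site], known: Iterable[Site]) -> Tuple[float, float]:
    """Return (precision, recall) in percent for predicted vs. known PTM sites."""
    pred: Set[Site] = set(predicted)
    true: Set[Site] = set(known)
    correct = len(pred & true)  # correctly predicted sites (true positives)
    precision = 100.0 * correct / len(pred) if pred else 0.0
    recall = 100.0 * correct / len(true) if true else 0.0
    return precision, recall


if __name__ == "__main__":
    predicted = [("P1", 15), ("P1", 42), ("P2", 7), ("P3", 88)]
    known = [("P1", 15), ("P2", 7), ("P2", 30), ("P3", 88), ("P4", 5)]
    p, r = precision_recall(predicted, known)
    print(f"{p:.1f}/{r:.1f}")  # prints "75.0/60.0", in the same precision/recall layout as the table
```

Using sets means a site predicted twice (for example, via two homologous templates) is counted once, which matches counting distinct sites.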
Figure 1. Method comparison.
(A) Number of predicted PTM sites (y-axis) in p53 family proteins for five PTM types, comparing MLA (machine-learning approach, yellow bars) with SSA (the sequence similarity-based approach of PTMProber, peacock blue bars). Our five MLA tools are SubPhosPred for phosphorylation, PSKAcePred for acetylation, UbiProber for ubiquitination, PMeS for methylation and SUMOAMVR for sumoylation. (B) Distribution of the number of phosphorylation sites (x-axis) predicted by PTMProber on the 21 proteins of the p53 family.
Figure 2. Extensible contents of PTMProber.
The extensible source (yellow box) includes known-proteome data, from which the known proteome BLAST database and the known peptide BLAST database are constructed, and query-proteome data, from which the query proteome BLAST database is built. The PTMProber pipeline runs PTM prediction by invoking these databases (green box), which are updated from various data sources (red box). The databases are generated with the makeblastdb program from the BLAST 2.2.28+ package.
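To make the database-building step concrete, here is a minimal Python sketch. It assumes that each known peptide is a fixed-width window centred on an experimentally known site. The window half-width `flank=7`, the file names, and the helpers `extract_peptides`, `write_fasta` and `makeblastdb_cmd` are all hypothetical, because the exact peptide construction is not given here. Only the `makeblastdb` options `-in`, `-dbtype` and `-out` are standard BLAST+ flags.

```python
import os
import shutil
import subprocess
import tempfile
from typing import Dict, List


def extract_peptides(proteins: Dict[str, str], sites: Dict[str, List[int]], flank: int = 7) -> Dict[str, str]:
    """Cut a window of up to 2*flank+1 residues centred on each known 1-based site.

    Hypothetical construction of "known peptides"; windows are truncated at sequence ends.
    """
    peptides = {}
    for acc, positions in sites.items():
        seq = proteins[acc]
        for pos in positions:
            start = max(0, pos - 1 - flank)   # 0-based inclusive start
            end = min(len(seq), pos + flank)  # 0-based exclusive end
            peptides[f"{acc}_{pos}"] = seq[start:end]
    return peptides


def write_fasta(records: Dict[str, str], path: str) -> None:
    """Write name -> sequence records as FASTA."""
    with open(path, "w") as fh:
        for name, seq in records.items():
            fh.write(f">{name}\n{seq}\n")


def makeblastdb_cmd(fasta: str, out: str) -> List[str]:
    """Build a BLAST+ makeblastdb command for a protein database."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "prot", "-out", out]


if __name__ == "__main__":
    proteins = {"P1": "MKTAYIAKQRQISFVKSHFSRQ"}
    peptides = extract_peptides(proteins, {"P1": [2, 8]}, flank=3)
    print(peptides)  # {'P1_2': 'MKTAY', 'P1_8': 'YIAKQRQ'}
    with tempfile.TemporaryDirectory() as tmp:
        fasta = os.path.join(tmp, "known_peptides.fasta")
        write_fasta(peptides, fasta)
        cmd = makeblastdb_cmd(fasta, os.path.join(tmp, "known_peptides"))
        if shutil.which("makeblastdb"):  # only run if BLAST+ is installed
            subprocess.run(cmd, check=True)
        else:
            print("makeblastdb not found; would run:", " ".join(cmd))
```

The same `makeblastdb_cmd` call would apply unchanged to the known and query proteome FASTA files, so all three databases in the figure can be rebuilt whenever a data source is updated.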
Figure 3. System flow of the PTMProber pipeline.
Gray boxes represent computing nodes, and red characters mark the numbered steps of the PTMProber pipeline. Step 0a requires that each query protein correspond to the query proteome database; that is, the query proteome database must contain the query protein. Step 0b derives the known peptides from the known proteome, and vice versa. Steps 1–6 search for or obtain the corresponding items shown in the gray boxes. Steps 5–6 (BLAST-2 and BLAST-3) determine whether the user's query proteins contain PTM sites.
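The core idea behind steps 5–6 is homology-based site transfer: a known site on a homologous sequence is carried over to the aligned position in the query protein. The minimal Python sketch below implements that idea for one BLAST hit. It assumes BLAST+ tabular output requested with `-outfmt "6 qseqid sseqid evalue qstart sstart qseq sseq"` and a plain E-value cutoff. The cutoff `max_evalue=1e-5` and the helpers `parse_hit` and `transfer_sites` are hypothetical, and this filter is not the paper's CPE criterion.

```python
from typing import Dict, List, Set, Tuple

# Assumed column order, matching -outfmt "6 qseqid sseqid evalue qstart sstart qseq sseq".
FIELDS = ["qseqid", "sseqid", "evalue", "qstart", "sstart", "qseq", "sseq"]


def parse_hit(line: str) -> Dict[str, str]:
    """Split one tab-separated BLAST+ hit line into named fields."""
    return dict(zip(FIELDS, line.rstrip("\n").split("\t")))


def transfer_sites(hit: Dict[str, str], known_sites: Dict[str, Set[int]],
                   max_evalue: float = 1e-5) -> List[Tuple[str, int, str]]:
    """Map known 1-based subject sites onto the query through one gapped alignment.

    Returns (query id, query position, residue) for each transferred site.
    """
    if float(hit["evalue"]) > max_evalue:  # hypothetical homology cutoff
        return []
    sites = known_sites.get(hit["sseqid"], set())
    q_pos = int(hit["qstart"]) - 1  # becomes the 1-based position once a query residue is consumed
    s_pos = int(hit["sstart"]) - 1
    transferred = []
    for q_res, s_res in zip(hit["qseq"], hit["sseq"]):
        if q_res != "-":
            q_pos += 1
        if s_res != "-":
            s_pos += 1
            # transfer only when the modified residue is conserved in the query
            if s_pos in sites and q_res == s_res:
                transferred.append((hit["qseqid"], q_pos, q_res))
    return transferred


if __name__ == "__main__":
    line = "Q1\tS1\t2e-30\t10\t1\tMKT-AYK\tMKSRAYK"
    known = {"S1": {2, 3, 7}}  # subject site 3 (S) aligns to T in the query, so it is dropped
    print(transfer_sites(parse_hit(line), known))
    # prints [('Q1', 11, 'K'), ('Q1', 15, 'K')]
```

Requiring an identical residue at the aligned position is one simple way to keep transfers conservative. A stricter homology measure such as the CPE would replace the single E-value check.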