Matteo Lo Monte, Candida Manelfi, Marica Gemei, Daniela Corda, Andrea Rosario Beccari.
Abstract
Motivation: ADP-ribosylation is a post-translational modification (PTM) implicated in several crucial cellular processes, ranging from regulation of DNA repair and chromatin structure to cell metabolism and stress responses. To date, a complete understanding of ADP-ribosylation targets and their modification sites in different tissues and disease states is still lacking. Identification of ADP-ribosylation sites is required to discern the molecular mechanisms regulated by this modification. This motivated us to develop a computational tool for the prediction of ADP-ribosylated sites.
Year: 2018 PMID: 29554239 PMCID: PMC6061869 DOI: 10.1093/bioinformatics/bty159
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Schematic framework of ADPredict development. On the left, vertical labels list the main stages of the study, each followed by its itemized details. On the right, a diagram outlines the workflow.
Fig. 2. Distribution of ADP-ribosylated sites among the training set proteins. Yellow dots mark the percentage of proteins reporting one or up to three modifications. Blue and orange lines show the counts of modified aspartic acid and glutamic acid residues, respectively; gray bars display the percentage of proteins (left y-axis) with a given number of modifications; and the green line represents the cumulative curve (right y-axis).
Secondary structure hashing strategy
| VLS | Primary structure | Secondary structure | Metrics | Hashed motif |
|---|---|---|---|---|
| 9 | MATTEWLMN | CHH–HHT–TTC | 3-3-3 | H–H–T |
| 11 | WMATTEWLMNT | CCH–HHHTT–TCE | 3-5-3 | C–H–0 |
| 13 | YWMATTEWLMNTY | ECCH–HHHTT–TCEE | 4-5-4 | C–H–E |
| 15 | IYWMATTEWLMNTYA | EECCH–HHHTT–TCEEE | 5-5-5 | 0–H–E |
Note: Example of the hashing strategy used to annotate the secondary structure of the considered sub-sequences. The metrics describe how the annotated string is split into fragments. For each fragment, the most represented fold is taken; when no single fold prevails, uncertainty is recorded as 0.
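The table does not state the exact rule used to pick the representative fold. The following is a minimal sketch, assuming that each fragment takes its most frequent label and that a tie for first place yields `0`. The function name `hash_motif` is hypothetical, the secondary-structure string is passed without the fragment separators shown in the table, and the output is joined with a plain ASCII hyphen.

```python
from collections import Counter
from typing import Sequence


def hash_motif(secondary: str, metrics: Sequence[int]) -> str:
    """Reduce a secondary-structure string to one label per fragment.

    `metrics` gives the fragment lengths (for example 3-5-3). Assumed rule:
    the most frequent label wins; a tie for first place is reported as "0".
    """
    if sum(metrics) != len(secondary):
        raise ValueError("fragment lengths must add up to the string length")

    labels = []
    start = 0
    for size in metrics:
        fragment = secondary[start:start + size]
        start += size
        counts = Counter(fragment).most_common()
        # A tie between the two most frequent labels means no fold prevails.
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            labels.append("0")
        else:
            labels.append(counts[0][0])
    return "-".join(labels)


# The four rows of the table, with separators removed from the input strings.
examples = [
    ("CHHHHTTTC", (3, 3, 3)),
    ("CCHHHHTTTCE", (3, 5, 3)),
    ("ECCHHHHTTTCEE", (4, 5, 4)),
    ("EECCHHHHTTTCEEE", (5, 5, 5)),
]
for secondary, metrics in examples:
    print(len(secondary), hash_motif(secondary, metrics))
```

Under this assumed rule the sketch reproduces the four hashed motifs listed in the table, including the two cases where a tie produces `0`.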
Fig. 3. Cross-validation (a)–(c) and benchmark (d)–(f) ROC curves. L1O results for (a) primary sequence-based models, (b) secondary structure-based models and (c) the 3-D-based model. Comparative analysis of ADPredict and ModPred performance in predicting the (d) Yu, (e) Kraus and (f) Nielsen datasets. ModPred PSSM performance is evaluated for the Yu dataset only.
Cross-validation results
| Model | EF (TOP3), L1O, Mean | EF (TOP3), L1O, SDEV | EF (TOP3), L10%O, Mean | EF (TOP3), L10%O, SDEV(Mean) | EF (TOP3), L10%O, SDEV(All) | Proteins with EF > 2 (%) | ROC, L1O, Mean | ROC, L1O, SDEV | ROC, L10%O, Mean | ROC, L10%O, SDEV(Mean) | ROC, L10%O, SDEV(All) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AAD-RP | 2.406 | 2.601 | 2.453 | 0.536 | 3.041 | 50.6 | 0.684 | 0.230 | 0.666 | 0.042 | 0.236 |
| AAD-RF | 2.778 | 7.046 | 2.499 | 1.307 | 7.481 | 31.0 | 0.684 | 0.224 | 0.671 | 0.039 | 0.238 |
| AAD-SVM | 3.104 | 8.157 | 2.884 | 1.432 | 8.733 | 26.8 | 0.671 | 0.247 | 0.666 | 0.041 | 0.250 |
| HM-ratio | 2.392 | 4.245 | 2.393 | 1.846 | 3.949 | 32.4 | 0.603 | 0.281 | 0.613 | 0.126 | 0.283 |
| HM-RP | 1.707 | 1.324 | 1.643 | 0.590 | 1.343 | 37.8 | 0.622 | 0.214 | 0.607 | 0.093 | 0.214 |
| 3-D-RP | 1.745 | 1.498 | 1.918 | 0.919 | 2.068 | 48.1 | 0.650 | 0.233 | 0.654 | 0.105 | 0.231 |
| ADPredict | 3.427 | 7.331 | 3.395 | 1.445 | 8.206 | 33.5 | 0.707 | 0.234 | 0.7 | 0.04 | 0.235 |
Note: Summary of model performance in predicting the ADP-ribosylated sites of the training set, in both leave-one-out (L1O) and leave-10%-out (L10%O) cross-validation sessions. The selected model for each class of properties is reported, together with the overall model, ADPredict. EF and ROC values, along with their SDEV values, serve as evaluation metrics. Proteins with an EF higher than two are considered correctly predicted and are reported here as a percentage of the training set.
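The record does not define EF (TOP3). A minimal sketch follows, assuming the conventional enrichment-factor definition: the hit rate among the three top-scored candidate sites of a protein divided by the hit rate over all its candidate sites. The function names, the per-protein aggregation and the toy scores are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean, stdev


def enrichment_factor(scores, labels, top_n=3):
    """EF = (hit rate in the top_n ranked sites) / (hit rate over all sites).

    `scores` are predicted scores for the candidate sites of one protein;
    `labels` are 1 for annotated modified sites and 0 otherwise.
    """
    total_hits = sum(labels)
    if not scores or total_hits == 0:
        raise ValueError("need at least one candidate site and one true site")
    n = min(top_n, len(scores))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_hits = sum(label for _, label in ranked[:n])
    return (top_hits / n) / (total_hits / len(scores))


def summarize(efs, threshold=2.0):
    """Mean EF, sample SDEV, and percentage of proteins with EF > threshold."""
    pct = 100.0 * sum(ef > threshold for ef in efs) / len(efs)
    return mean(efs), stdev(efs), pct


# Toy protein A: 12 candidate sites, 2 modified, both ranked in the top 3.
scores_a = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.04, 0.03]
labels_a = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Toy protein B: 6 candidate sites, 1 modified, ranked last.
scores_b = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels_b = [0, 0, 0, 0, 0, 1]

efs = [enrichment_factor(scores_a, labels_a),
       enrichment_factor(scores_b, labels_b)]
print(summarize(efs))
```

Under this definition an EF of 1 corresponds to random ranking, which is consistent with the table treating EF > 2 as a correct prediction.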
External validation results
| Model | Nielsen: EF (TOP3) Mean | Nielsen: EF (TOP3) SDEV | Nielsen: Proteins with EF > 2 (%) | Nielsen: ROC Mean | Nielsen: ROC SDEV | Kraus: EF (TOP3) Mean | Kraus: EF (TOP3) SDEV | Kraus: Proteins with EF > 2 (%) | Kraus: ROC Mean | Kraus: ROC SDEV |
|---|---|---|---|---|---|---|---|---|---|---|
| AAD-RP | 1.190 | 2.018 | 25.1 | 0.538 | 0.243 | 2.569 | 2.617 | 51.8 | 0.696 | 0.244 |
| AAD-RF | 1.289 | 6.998 | 10.7 | 0.532 | 0.249 | 3.176 | 6.823 | 30.0 | 0.689 | 0.248 |
| AAD-SVM | 1.387 | 7.013 | 11.5 | 0.534 | 0.262 | 2.585 | 5.991 | 26.4 | 0.661 | 0.249 |
| HM-ratio | 1.532 | 2.897 | 26.3 | 0.545 | 0.316 | 1.037 | 2.293 | 16.8 | 0.510 | 0.269 |
| HM-RP | 1.413 | 2.604 | 23.7 | 0.549 | 0.304 | 1.033 | 1.698 | 19.3 | 0.500 | 0.250 |
| 3-D-RP | 0.994 | 1.572 | 19.5 | 0.529 | 0.261 | 1.987 | 2.150 | 46.2 | 0.656 | 0.255 |
| ADPredict | 1.399 | 6.987 | 12.1 | 0.547 | 0.247 | 2.624 | 5.628 | 28.3 | 0.706 | 0.228 |
Note: Summary of model performance in predicting the ADP-ribosylated sites of the two external datasets. EF and ROC values, along with their SDEV values, refer to an L1O session.
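The ROC columns carry both a mean and an SDEV, which suggests that an area under the ROC curve is computed per protein and then aggregated. That reading is an assumption, as is everything in the sketch below: the function name `roc_auc`, the pairwise (Mann-Whitney) formulation and the toy data are illustrative only.

```python
from statistics import mean, stdev


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a modified site (label 1) scores higher
    than an unmodified site (label 0); ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative site")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy proteins: (predicted scores, true labels) for their candidate sites.
proteins = [
    ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]),  # one positive ranked below a negative
    ([0.7, 0.6, 0.2], [1, 1, 0]),          # perfect ranking
]
aucs = [roc_auc(scores, labels) for scores, labels in proteins]
print(aucs, mean(aucs), stdev(aucs))
```

An AUC of 0.5 corresponds to random ranking, so the mean values near 0.5 reported for several models on the Nielsen dataset indicate little discrimination on that set.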
Fig. 4. Partial screenshot of the ADPredict web application homepage