Mathieu Emily, Anthony Talvas, Christian Delamarche.
Abstract
The aggregation of proteins or peptides in amyloid fibrils is associated with a number of clinical disorders, including Alzheimer's, Huntington's and prion diseases, medullary thyroid cancer, and renal and cardiac amyloidosis. Despite extensive studies, the molecular mechanisms underlying the initiation of fibril formation remain largely unknown. Several lines of evidence indicate that short amino-acid segments (hot spots), located in amyloid precursor proteins, act as seeds for fibril elongation. Hot spots are therefore potential targets for diagnostic and therapeutic applications, and a current challenge in bioinformatics is the development of methods that accurately predict hot spots from protein sequences. In this paper, we combined existing methods into a meta-predictor for hot-spot prediction, called MetAmyl for METapredictor for AMYLoid proteins. MetAmyl is based on a logistic regression model that weights predictions from a set of popular algorithms, statistically selected as being the most informative and complementary predictors. We evaluated the performance of MetAmyl through a large-scale comparative study based on three independent datasets and thus demonstrated its ability to differentiate between amyloidogenic and non-amyloidogenic polypeptides. Compared to 9 other methods, MetAmyl provides a significant improvement in prediction on the studied datasets. We further show that MetAmyl is effective at highlighting the effect of point mutations involved in human amyloidosis, so we suggest this program should be a useful complementary tool for the diagnosis of these diseases.
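The idea of weighting several base predictors with a logistic regression can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the four columns are labeled with the predictors named in Figure 1, but the scores are invented toy values, and the plain gradient-descent fit is a stand-in rather than the authors' fitting or predictor-selection procedure.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, x):
    """Meta-score: logistic combination of the base predictor scores."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Toy, invented scores (already scaled to [0, 1]); columns are assumed to be
# PAFIG, SALSA, FoldAmyloid and Waltz outputs for six hypothetical segments.
X = [
    [0.9, 0.8, 0.7, 0.9],
    [0.8, 0.6, 0.9, 0.7],
    [0.7, 0.9, 0.6, 0.8],
    [0.2, 0.3, 0.1, 0.2],
    [0.1, 0.2, 0.3, 0.1],
    [0.3, 0.1, 0.2, 0.3],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = amyloidogenic, 0 = non-amyloidogenic (toy labels)

w = fit_logistic(X, y)
print("weights (bias first):", [round(v, 2) for v in w])
print("meta-score, first segment:", round(predict_proba(w, X[0]), 3))
```

The fitted weights play the role of the per-predictor weighting described above; a real application would need properly normalized scores and a much larger training set.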
Year: 2013 PMID: 24260292 PMCID: PMC3834037 DOI: 10.1371/journal.pone.0079722
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Receiver operating characteristic (ROC) curves obtained for the 4 selected predictors (PAFIG, SALSA, FoldAmyloid and Waltz) and for leave-one-out cross-validated MetAmyl on the training dataset.
Area Under the Curve (AUC) based on the training dataset.
| Predictor | AUC [95% CI] | p.value (AUC vs MetAmyl AUC) | AUC [95% CI] (FPR: 0–20%) | AUC [95% CI] (FPR: 0–5%) |
| MetAmyl | 0.89 [0.87–0.92] | 1 | 0.13 [0.12–0.15] | 0.018 [0.014–0.024] |
| Waltz | 0.85 [0.82–0.88] | 0.029 | 0.10 [0.07–0.11] | 0.005 [0.002–0.011] |
| PAFIG | 0.82 [0.79–0.86] | 0.016 | 0.10 [0.08–0.11] | 0.011 [0.008–0.016] |
| PASTA | 0.80 [0.77–0.84] | 6.7 | 0.08 [0.07–0.10] | 0.005 [0.002–0.010] |
| SALSA | 0.79 [0.76–0.83] | 8.8 | 0.08 [0.06–0.09] | 0.007 [0.005–0.010] |
| AGGRESCAN | 0.76 [0.72–0.80] | 2.1 | 0.07 [0.05–0.08] | 0.003 [0.001–0.006] |
| 3D profile | 0.75 [0.72–0.79] | 1.9 | 0.07 [0.06–0.09] | 0.008 [0.005–0.011] |
| FoldAmyloid | 0.69 [0.65–0.73] | 1.7 | 0.04 [0.03–0.05] | 0.001 [0.000–0.003] |
| TANGO | 0.67 [0.64–0.71] | 2.1 | 0.05 [0.03–0.06] | 0.003 [0.001–0.006] |
Area Under the Curve (AUC) was obtained from the ROC curves of 9 predictors; AUC cannot be computed for AMYLPRED2 because it provides only a binary prediction. For each method, the global AUC, the AUC for the False Positive Rate range of 0–20% and the AUC for the False Positive Rate range of 0–5% are reported. Numbers in brackets correspond to 95% confidence intervals (95% CI) that were obtained using bootstrap replicates [43]. The comparison between the MetAmyl AUC and that of each other method is summarized by the p.value obtained with DeLong's method [49]. For the MetAmyl classifier, results were obtained using a Leave-One-Out Cross Validation.
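As an illustration of the global and partial AUC values reported above, the following sketch builds a ROC curve from scores and integrates it with the trapezoidal rule, optionally stopping at a maximum false positive rate. The scores and labels are invented toy values. The partial AUC is left unnormalized, which is an assumption that seems consistent with the table (values in the 0–20% column stay below 0.20).

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, sweeping the threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def partial_auc(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR at the FPR cut-off.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Toy data: 4 positives and 4 negatives with invented scores.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

pts = roc_points(scores, labels)
print("global AUC:", partial_auc(pts))
print("AUC for FPR 0-20%:", partial_auc(pts, max_fpr=0.20))
```

A method that returns only a binary call, as noted for AMYLPRED2, yields a single operating point rather than a curve, which is why no AUC is listed for it.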
Prediction performance on the training dataset for the 10 compared predictors.
| Predictor | ACC [95% CI] | Sensitivity [95% CI] | Specificity [95% CI] | MCC [95% CI] |
| MetAmyl | 0.84 [0.81–0.87] | 0.78 [0.73–0.82] | 0.88 [0.85–0.92] | 0.67 [0.60–0.72] |
| Waltz | 0.79 [0.76–0.82] | 0.73 [0.68–0.77] | 0.83 [0.79–0.88] | 0.57 [0.50–0.63] |
| PAFIG | 0.69 [0.65–0.72] | 0.84 [0.80–0.89] | 0.57 [0.53–0.63] | 0.42 [0.36–0.49] |
| PASTA | 0.71 [0.67–0.74] | 0.38 [0.32–0.44] | 0.94 [0.92–0.97] | 0.41 [0.34–0.47] |
| SALSA | 0.69 [0.66–0.73] | 0.84 [0.80–0.89] | 0.59 [0.54–0.64] | 0.43 [0.37–0.50] |
| AGGRESCAN | 0.55 [0.51–0.59] | 0.92 [0.89–0.95] | 0.29 [0.24–0.34] | 0.26 [0.20–0.32] |
| 3D profile | 0.66 [0.63–0.70] | 0.59 [0.53–0.65] | 0.71 [0.67–0.75] | 0.31 [0.23–0.37] |
| FoldAmyloid | 0.61 [0.58–0.65] | 0.87 [0.83–0.91] | 0.43 [0.38–0.48] | 0.32 [0.26–0.39] |
| TANGO | 0.69 [0.66–0.73] | 0.52 [0.46–0.58] | 0.82 [0.78–0.86] | 0.36 [0.29–0.43] |
| AMYLPRED2 | 0.79 [0.76–0.82] | 0.65 [0.60–0.71] | 0.88 [0.85–0.92] | 0.57 [0.50–0.63] |
For each method, the accuracy (ACC), the sensitivity, the specificity and the Matthews correlation coefficient (MCC) are reported. Numbers in brackets correspond to 95% confidence intervals (95% CI) that were obtained using bootstrap replicates (Robin et al., 2011). For the MetAmyl classifier, results were obtained using a Leave-One-Out Cross Validation.
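The bracketed intervals can be illustrated with a plain percentile bootstrap, sketched below for accuracy. This is a generic version under stated assumptions: the labels and predictions are invented, the number of replicates and the seed are arbitrary, and the resampling scheme used in the cited work may differ (for example, it may be stratified by class).

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a classification metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample cases with replacement, keeping label/prediction pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Toy data: 20 invented cases, 16 of which are classified correctly.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 8 + [0] * 2 + [0] * 8 + [1] * 2

point = accuracy(y_true, y_pred)
ci_low, ci_high = bootstrap_ci(y_true, y_pred, accuracy)
print(f"ACC = {point:.2f} [{ci_low:.2f}-{ci_high:.2f}]")
```

Any other metric with the same `(y_true, y_pred)` signature, such as sensitivity or MCC, could be passed in place of `accuracy`.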
Evaluation of the performance of the tool MetAmyl on a subset of 33 proteins of the amylome.
| Predictor | TP | TN | FP | FN | Sensitivity (%) | Specificity (%) | Q (%) | MCC | F1 |
| MetAmyl | 508 | 5519 | 1064 | 740 | 40.71 | 83.84 | 62.27 | 0.23 | 0.36 |
| Waltz | 710 | 4300 | 2273 | 548 | 56.43 | 65.42 | 60.93 | 0.16 | 0.33 |
| PAFIG | 651 | 4695 | 1878 | 607 | 51.75 | 71.43 | 61.59 | 0.18 | 0.34 |
| PASTA | 230 | 6099 | 484 | 1018 | 18.43 | 92.65 | 55.54 | 0.14 | 0.23 |
| SALSA | 869 | 3123 | 3460 | 379 | 69.63 | 47.44 | 58.54 | 0.13 | 0.31 |
| AGGRESCAN | 445 | 5210 | 1363 | 813 | 35.37 | 79.26 | 57.32 | 0.13 | 0.29 |
| 3D profile | 224 | 5762 | 821 | 1024 | 17.95 | 87.53 | 52.74 | 0.06 | 0.20 |
| FoldAmyloid | 340 | 5659 | 924 | 908 | 27.24 | 85.96 | 56.60 | 0.13 | 0.27 |
| TANGO | 172 | 6282 | 291 | 1086 | 13.67 | 95.57 | 54.62 | 0.14 | 0.20 |
| AMYLPRED2 | 478 | 5512 | 1071 | 770 | 38.30 | 83.73 | 61.02 | 0.20 | 0.34 |
MetAmyl is compared to 9 other methods on a subset of 33 proteins (Tsolis et al., 2013).
Evaluation of the performance of the tool MetAmyl on scrambled sequences from the 17-amino-acid N-terminal segment of the Huntingtin protein.
| Predictor | TP | TN | FP | FN | Sensitivity (%) | Specificity (%) | Q (%) | MCC | F1 |
| MetAmyl | 4 | 10 | 0 | 2 | 66.67 | 100 | 83.33 | 0.75 | 0.8 |
| Waltz | 2 | 10 | 0 | 4 | 33.33 | 100 | 66.67 | 0.49 | 0.5 |
| PAFIG | 5 | 3 | 7 | 1 | 83.33 | 30 | 56.67 | 0.15 | 0.56 |
| PASTA | 5 | 2 | 8 | 1 | 83.33 | 20 | 51.67 | 0.04 | 0.53 |
| SALSA | 5 | 6 | 4 | 1 | 83.33 | 60 | 71.67 | 0.42 | 0.67 |
| AGGRESCAN | 6 | 1 | 9 | 0 | 100 | 10 | 55 | 0.2 | 0.57 |
| 3D profile | 4 | 0 | 10 | 2 | 66.67 | 0 | 33.33 | −0.49 | 0.4 |
| FoldAmyloid | 6 | 1 | 9 | 0 | 100 | 10 | 55 | 0.2 | 0.57 |
| TANGO | 2 | 7 | 3 | 4 | 33.33 | 70 | 51.67 | 0.03 | 0.36 |
| AMYLPRED2 | 6 | 5 | 5 | 0 | 100 | 50 | 75 | 0.52 | 0.71 |
MetAmyl is compared to 9 other methods on a set of 16 amino acid segments obtained from the Huntingtin protein [48].
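The derived columns of the two tables above can be recomputed from the TP, TN, FP and FN counts, as in the sketch below. Sensitivity, specificity, MCC and F1 follow their standard definitions. Treating Q as the mean of sensitivity and specificity is an inference from the tabulated numbers, not a definition given in this record.

```python
import math

def confusion_metrics(tp, tn, fp, fn):
    """Derive the table columns from raw confusion-matrix counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    # Assumption: Q is the mean of sensitivity and specificity.
    q = (sens + spec) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    return {
        "sensitivity_pct": 100 * sens,
        "specificity_pct": 100 * spec,
        "q_pct": 100 * q,
        "mcc": mcc,
        "f1": f1,
    }

# MetAmyl row of the Huntingtin table: TP=4, TN=10, FP=0, FN=2.
m = confusion_metrics(tp=4, tn=10, fp=0, fn=2)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

With these counts the function reproduces the tabulated MetAmyl row (66.67, 100, 83.33, 0.75 and 0.8 after rounding).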
Figure 2. MetAmyl predictions applied to human fibrinogen-.
The effect of mutations is reported on a diagram where each column represents the difference of scores between the mutant and the corresponding wild-type sequence. The analysis is limited to mutations affecting the fragment of 80 amino acids found in amyloid fibrils, which is the region 500–580 of the mature protein. In red are variants involved in renal amyloidosis. In blue are non-pathological variants.
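The quantity plotted in each column, the mutant score minus the wild-type score, can be sketched as follows. The scorer here is a deliberately crude stand-in (the fraction of hydrophobic residues) and is not MetAmyl; the sequence and the mutation `E4V` are hypothetical and use 1-based positions within the toy sequence, not the numbering of the mature protein.

```python
def apply_mutation(seq, mutation):
    """Apply a point mutation written as <wild-type><1-based position><mutant>."""
    wt, pos, mut = mutation[0], int(mutation[1:-1]), mutation[-1]
    if seq[pos - 1] != wt:
        raise ValueError(f"expected {wt} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + mut + seq[pos:]

def toy_score(seq):
    # Stand-in scorer (NOT MetAmyl): fraction of hydrophobic residues.
    return sum(r in "VILFYWM" for r in seq) / len(seq)

def score_difference(seq, mutation, scorer):
    """Mutant score minus wild-type score, as plotted per column."""
    return scorer(apply_mutation(seq, mutation)) - scorer(seq)

wild_type = "ACDEFGHIK"   # hypothetical sequence
mutation = "E4V"          # hypothetical point mutation

mutant = apply_mutation(wild_type, mutation)
delta = score_difference(wild_type, mutation, toy_score)
print(mutant, round(delta, 3))
```

A positive difference means the mutant scores higher than the wild type under the chosen scorer; with a real predictor, the sign and size of this difference are what the diagram displays for each variant.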