Carlos Família, Sarah R Dennison, Alexandre Quintas, David A Phoenix.
Abstract
Understanding of which peptides and proteins have the potential to undergo amyloid formation, and of the driving forces responsible for amyloid-like fiber formation and stabilization, remains limited. This is mainly because the proteins that can undergo the structural changes leading to amyloid formation are quite diverse and share no obvious sequence or structural homology, despite the structural similarity found in the fibrils. To address these issues, a novel approach based on recursive feature selection and feed-forward neural networks was undertaken to identify key features highly correlated with the self-assembly problem. This approach allowed the identification of seven physicochemical and biochemical properties of the amino acids highly associated with the self-assembly of peptides and proteins into amyloid-like fibrils (normalized frequency of β-sheet, normalized frequency of β-sheet from LG, weights for β-sheet at the window position of 1, isoelectric point, atom-based hydrophobic moment, helix termination parameter at position j+1 and ΔG° values for peptides extrapolated in 0 M urea). Moreover, these features enabled the development of a new predictor (available at http://cran.r-project.org/web/packages/appnn/index.html) capable of accurately and reliably predicting the amyloidogenic propensity from the polypeptide sequence alone, with a prediction accuracy of 84.9% against an external validation dataset of sequences with experimental in vitro evidence of amyloid formation.
Year: 2015; PMID: 26241652; PMCID: PMC4524629; DOI: 10.1371/journal.pone.0134679
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240
Training sequences dataset classification results (%) for the selected neural networks obtained through APDBase or AAindex encoding, after feature selection with one of the internal classifiers, rf, nb, svm, sda and spls.
Where SI is the sensitivity, SP the specificity, PPV the positive predictive value, NPV the negative predictive value and AC the overall accuracy, averaged after 10-fold stratified resampling.
| | SI | SP | PPV | NPV | AC |
|---|---|---|---|---|---|
| | 82.6 | 85.0 | 82.3 | 85.1 | 83.8 |
| | 77.1 | 77.7 | 74.1 | 80.3 | 77.4 |
| | 87.9 | 86.5 | 84.3 | 89.0 | 86.8 |
| | 73.5 | 80.0 | 75.4 | 78.9 | 76.7 |
| | 80.4 | 79.1 | 76.5 | 82.4 | 79.7 |
| | | | | | |
| | 84.0 | 86.4 | 83.9 | 86.4 | 85.1 |
| | 87.4 | 70.2 | 71.2 | 86.9 | 78.0 |
| | 80.1 | 76.9 | 75.1 | 82.8 | 78.7 |
| | 86.8 | 95.6 | 94.9 | 90.3 | 91.9 |
| | 81.0 | 83.3 | 80.1 | 84.2 | 81.7 |
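The 10-fold stratified resampling named in the caption can be illustrated with a minimal sketch. The round-robin fold assignment, the function name `stratified_folds`, and the seed are assumptions made for illustration and are not the authors' procedure; the class sizes (135 positives, 161 negatives) are derived from the confusion counts reported for the training set (118 + 17 and 113 + 48).

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions.

    Illustrative only: indices of each class are shuffled and then dealt
    round-robin across the folds.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Continue dealing where the previous class stopped so that
        # fold sizes stay balanced overall.
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)
    return folds


# Class sizes derived from the training-set confusion counts.
labels = [1] * 135 + [0] * 161
folds = stratified_folds(labels, k=10)

for n, fold in enumerate(folds):
    positives = sum(labels[i] for i in fold)
    print(f"fold {n}: {len(fold)} sequences, {positives} positives")
```

In such a scheme each fold serves once as the held-out set, and a metric such as accuracy is averaged over the ten held-out evaluations.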
External validation sequences dataset classification results (%) for the selected neural networks obtained through APDBase and AAindex encoding, after feature selection with one of the internal classifiers, rf, nb, svm, sda and spls.
Where SI is the sensitivity, SP the specificity, PPV the positive predictive value, NPV the negative predictive value and AC the overall accuracy, averaged after 10-fold stratified resampling.
| | SI | SP | PPV | NPV | AC |
|---|---|---|---|---|---|
| | 89.4 | 67.8 | 86.8 | 73.5 | 83.0 |
| | 78.7 | 82.5 | 91.3 | 62.4 | 79.9 |
| | 91.8 | 40.1 | 78.7 | 68.2 | 76.6 |
| | 91.9 | 16.3 | 72.7 | 46.2 | 69.8 |
| | 94.1 | 17.9 | 73.1 | 57.6 | 71.4 |
| | | | | | |
| | 90.8 | 62.6 | 85.3 | 73.4 | 82.2 |
| | 87.4 | 78.9 | 90.8 | 72.3 | 84.9 |
| | 95.6 | 12.6 | 72.4 | 51.0 | 71.2 |
| | 93.5 | 15.7 | 72.6 | 52.1 | 70.4 |
| | 93.9 | 12.5 | 72.1 | 48.9 | 70.0 |
Results obtained for the classification of the training sequences dataset for each predictor, where TP corresponds to the number of true positives, TN to the number of true negatives, FP to the number of false positives and FN to the number of false negatives.
The values of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and accuracy, with corresponding 95% confidence intervals, were obtained using bootstrap replicates. The p-value is for the comparison of accuracy between APPNN and each of the other predictors, using the Wilcoxon-Nemenyi-McDonald-Thompson post-hoc test performed after 10-fold stratified resampling.
| Predictor | TP | TN | FP | FN | Sensitivity [95% CI] | Specificity [95% CI] | PPV [95% CI] | NPV [95% CI] | Accuracy [95% CI] | p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| APPNN | 118 | 113 | 48 | 17 | 87.4 [80.6, 92.0] | 70.2 [62.7, 77.0] | 71.1 [63.7, 77.7] | 86.9 [79.9, 91.9] | 78.0 [72.6, 82.4] | - |
| | 77 | 111 | 50 | 58 | 57.0 [48.5, 65.0] | 68.9 [60.8, 75.6] | 60.6 [51.6, 68.4] | 65.7 [58.1, 73.0] | 63.5 [57.1, 68.2] | 0.03 |
| | 68 | 141 | 20 | 67 | 50.4 [42.0, 58.7] | 87.6 [81.4, 91.8] | 77.3 [67.9, 84.7] | 67.8 [61.4, 74.6] | 70.6 [64.5, 75.3] | 0.52 |
| | 77 | 141 | 20 | 58 | 57.0 [48.3, 64.7] | 87.6 [81.2, 91.9] | 79.4 [70.2, 86.3] | 70.9 [63.9, 77.0] | 73.6 [67.6, 78.0] | 1.00 |
| | 114 | 55 | 106 | 21 | 84.4 [77.7, 89.8] | 34.2 [26.9, 42.1] | 51.8 [44.6, 58.3] | 72.4 [61.3, 81.8] | 57.1 [50.7, 62.4] | 1.48E-04 |
| | 104 | 130 | 31 | 31 | 77.0 [69.7, 83.6] | 80.7 [74.0, 86.3] | 77.0 [69.4, 83.6] | 80.7 [74.5, 86.4] | 79.1 [74.0, 83.1] | 1.00 |
| | 115 | 94 | 67 | 20 | 85.2 [78.5, 90.4] | 58.4 [50.4, 65.7] | 63.2 [55.8, 69.6] | 82.5 [74.8, 88.4] | 70.6 [64.7, 75.0] | 0.49 |
| | 84 | 130 | 31 | 51 | 62.2 [53.8, 70.2] | 80.7 [73.8, 86.6] | 73.0 [63.4, 80.2] | 71.8 [64.9, 78.0] | 72.3 [66.2, 76.7] | 0.94 |
| | 82 | 127 | 34 | 53 | 60.7 [52.1, 68.6] | 78.9 [71.9, 84.6] | 70.7 [61.0, 78.0] | 70.6 [63.5, 76.8] | 70.6 [64.3, 75.0] | 0.54 |
| | 6 | 158 | 3 | 129 | 4.4 [1.5, 9.1] | 98.1 [94.8, 99.4] | 66.7 [20.0, 92.9] | 55.1 [49.3, 60.9] | 55.4 [49.3, 60.8] | 3.08E-05 |
| | 91 | 129 | 32 | 44 | 67.4 [59.6, 75.0] | 80.1 [73.4, 85.5] | 74.0 [65.9, 81.1] | 74.6 [67.8, 81.3] | 74.3 [68.9, 78.7] | 0.96 |
| | 100 | 91 | 70 | 35 | 74.1 [65.6, 79.8] | 56.5 [48.4, 63.6] | 58.8 [51.3, 65.8] | 72.2 [63.6, 79.4] | 64.5 [58.4, 69.3] | 0.01 |
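The five percentage metrics in the table follow directly from the four confusion counts. The sketch below uses the standard definitions; the function name `confusion_metrics` is hypothetical, and the example counts are those of the first row of the table (118, 113, 48, 17).

```python
def confusion_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity, PPV, NPV and accuracy as percentages."""
    return {
        "sensitivity": 100 * tp / (tp + fn),   # true positives among all actual positives
        "specificity": 100 * tn / (tn + fp),   # true negatives among all actual negatives
        "ppv": 100 * tp / (tp + fp),           # true positives among predicted positives
        "npv": 100 * tn / (tn + fn),           # true negatives among predicted negatives
        "accuracy": 100 * (tp + tn) / (tp + tn + fp + fn),
    }


# Counts from the first row of the training-set table.
metrics = confusion_metrics(tp=118, tn=113, fp=48, fn=17)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}")
# Rounded to one decimal: 87.4, 70.2, 71.1, 86.9, 78.0
```

The rounded values reproduce the point estimates reported in that row.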
Results obtained for the classification of the external validation sequence dataset for each predictor, where TP corresponds to the number of true positives, TN to the number of true negatives, FP to the number of false positives and FN to the number of false negatives.
The values of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and accuracy, with corresponding 95% confidence intervals, were obtained using bootstrap replicates. The p-value is for the comparison of accuracy between APPNN and each of the other predictors, using the Wilcoxon-Nemenyi-McDonald-Thompson post-hoc test performed after 10-fold stratified resampling.
| Predictor | TP | TN | FP | FN | Sensitivity [95% CI] | Specificity [95% CI] | PPV [95% CI] | NPV [95% CI] | Accuracy [95% CI] | p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| APPNN | 298 | 112 | 30 | 43 | 87.4 [83.4, 90.6] | 78.9 [70.9, 85.0] | 90.9 [87.2, 93.6] | 72.3 [64.9, 78.8] | 84.9 [81.2, 87.6] | - |
| | 284 | 97 | 45 | 57 | 83.3 [78.9, 87.0] | 68.3 [59.7, 75.7] | 86.3 [82.1, 89.7] | 63.0 [55.1, 70.6] | 78.9 [74.9, 82.4] | 0.90 |
| | 248 | 120 | 22 | 93 | 72.7 [68.1, 77.3] | 84.5 [77.9, 89.9] | 91.9 [88.2, 94.7] | 56.3 [49.3, 63.0] | 76.2 [72.3, 79.7] | 0.51 |
| | 271 | 122 | 20 | 70 | 79.5 [75.3, 83.7] | 85.9 [79.6, 90.9] | 93.1 [89.9, 95.7] | 63.5 [56.4, 70.3] | 81.4 [77.6, 84.7] | 1.00 |
| | 306 | 74 | 68 | 35 | 89.7 [86.3, 92.5] | 52.1 [43.9, 60.0] | 81.8 [77.6, 85.5] | 67.9 [58.7, 76.2] | 78.7 [74.9, 82.0] | 0.73 |
| | 296 | 107 | 35 | 45 | 86.8 [82.9, 90.1] | 75.4 [67.9, 81.7] | 89.4 [85.5, 92.4] | 70.4 [62.5, 77.2] | 83.4 [79.5, 86.3] | 1.00 |
| | 331 | 68 | 74 | 10 | 97.1 [94.8, 98.6] | 47.9 [40.2, 56.3] | 81.7 [77.3, 85.3] | 87.2 [78.4, 93.8] | 82.6 [78.7, 85.7] | 1.00 |
| | 224 | 124 | 18 | 117 | 65.7 [60.8, 70.6] | 87.3 [81.1, 91.9] | 92.6 [88.6, 95.3] | 51.5 [45.2, 57.9] | 72.0 [67.7, 75.6] | 0.03 |
| | 208 | 129 | 13 | 133 | 61.0 [55.7, 65.7] | 90.8 [85.1, 94.7] | 94.1 [90.5, 96.8] | 49.2 [43.2, 55.6] | 69.8 [65.4, 73.5] | 2.32E-03 |
| | 191 | 132 | 10 | 150 | 56.0 [50.4, 60.9] | 93.0 [87.7, 96.5] | 95.0 [91.1, 97.5] | 46.8 [41.1, 53.0] | 66.9 [62.3, 70.8] | 4.07E-05 |
| | 168 | 132 | 10 | 173 | 49.3 [43.8, 54.5] | 93.0 [87.7, 96.5] | 94.4 [90.2, 97.1] | 43.3 [37.7, 49.2] | 62.1 [57.3, 66.0] | 2.42E-07 |
| | 228 | 116 | 26 | 113 | 66.9 [62.0, 72.0] | 81.7 [74.8, 87.5] | 89.8 [85.4, 93.0] | 50.7 [44.4, 57.5] | 71.2 [67.2, 75.4] | 0.01 |
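The caption states that the confidence intervals were obtained from bootstrap replicates but does not say which bootstrap variant was used. As a rough illustration only, the sketch below applies a simple percentile bootstrap to the accuracy of the first row of the table (298, 112, 30, 43). The percentile method, the number of replicates, the seed, and the function name `bootstrap_accuracy_ci` are all assumptions, so the interval it produces need not match the reported one exactly.

```python
import random


def bootstrap_accuracy_ci(tp, tn, fp, fn, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for accuracy (in %), an assumed variant.

    Each sequence is reduced to a 1 (classified correctly) or a
    0 (classified incorrectly) and resampled with replacement.
    """
    rng = random.Random(seed)
    outcomes = [1] * (tp + tn) + [0] * (fp + fn)
    n = len(outcomes)

    replicates = []
    for _ in range(n_boot):
        sample = rng.choices(outcomes, k=n)  # resample with replacement
        replicates.append(100 * sum(sample) / n)
    replicates.sort()

    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    point = 100 * (tp + tn) / n
    return point, lower, upper


# Counts from the first row of the external validation table.
point, lower, upper = bootstrap_accuracy_ci(tp=298, tn=112, fp=30, fn=43)
print(f"accuracy {point:.1f}% (95% CI {lower:.1f} to {upper:.1f})")
```

The point estimate is the plain ratio of correct classifications to all 483 sequences; only the interval bounds depend on the resampling choices.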