Jerzy Stanislawski, Malgorzata Kotulska, Olgierd Unold.
Abstract
BACKGROUND: Amyloids are proteins capable of forming fibrils. Many of them underlie serious diseases, such as Alzheimer's disease. The number of amyloid-associated diseases is constantly increasing. Recent studies indicate that amyloidogenic properties can be associated with short segments of amino acids, which transform the structure when exposed. A few hundred such peptides have been found experimentally. Experimental testing of all possible amino acid combinations is currently not feasible. Instead, such peptides can be predicted by computational methods. The 3D profile is a physicochemical method that has generated the largest dataset, ZipperDB. However, it is computationally very demanding. Here, we show that dataset generation can be accelerated. Two methods to increase the classification efficiency of amyloidogenic candidates are presented and tested: simplified 3D profile generation and machine learning methods.
Year: 2013 PMID: 23327628 PMCID: PMC3566972 DOI: 10.1186/1471-2105-14-21
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Machine learning performance
| Method | | | | AUC |
| MLP | 0.78 | 0.95 | 0.91 | 0.96 |
| ADTree 250 | 0.78 | 0.95 | 0.91 | 0.96 |
| Naive Bayes | 0.53 | 0.98 | 0.88 | 0.95 |
| ADTree 50 | 0.64 | 0.96 | 0.89 | 0.94 |
| RF | 0.26 | 0.98 | 0.82 | 0.89 |
| FT | 0.73 | 0.94 | 0.90 | 0.85 |
| SVM | 0.76 | 0.95 | 0.91 | 0.86 |
| PART | 0.56 | 0.94 | 0.86 | 0.85 |
| BFTree | 0.67 | 0.91 | 0.86 | 0.82 |
| Ridor | 0.56 | 0.90 | 0.83 | 0.73 |
| JRip | 0.29 | 0.93 | 0.79 | 0.61 |
The performance evaluation of the machine learning methods. The results are ordered by decreasing AUC.
Figure 1. A plot of ROC curves of all methods. Among all methods, MultiLayer Perceptron and Alternating Decision Tree with 250 boosting iterations cover the maximum area under the curve (AUC of 0.96), closely followed by Naive Bayes (AUC of 0.95) and Alternating Decision Tree with 50 boosting iterations (AUC of 0.94). All the corresponding AUC values are reported in Table 1.
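As a reading aid for the AUC values quoted in the caption, the following minimal sketch computes the area under a ROC curve through its rank interpretation: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. It is not the evaluation code used in the study; the function name `roc_auc` and the toy labels and scores are assumptions made up for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    AUC is the fraction of positive/negative pairs in which the positive
    example gets the higher score; tied pairs count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    favourable = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                favourable += 1.0
            elif p == n:
                favourable += 0.5
    return favourable / (len(positives) * len(negatives))


# Toy data, invented for illustration only (not taken from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

In the toy data, 11 of the 12 positive/negative pairs are ranked correctly, so the AUC is 11/12. The quadratic pair loop is chosen for clarity; a sort-based version would be preferable for large datasets.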
Figure 2. ADTree with 50 rules. In the notation of the rules, n: AA j, n indicates the rule number (rules are ordered by their significance) and j denotes the amino acid position. Below the rule label, the amino acid occurrence (or absence, marked by “!”) is valued by numbers. Negative numbers denote low amino acid occurrence. A detailed explanation of how to read the Alternating Decision Tree is given in the main text.
Statistical evaluation
| Method | Wins | Draws | Losses |
| MLP | 10 | 0 | 0 |
| ADTree 250 | 8 | 1 | 1 |
| Naive Bayes | 6 | 3 | 1 |
| ADTree 50 | 6 | 2 | 2 |
| FT | 3 | 5 | 2 |
| SVM | 3 | 3 | 4 |
| PART | 3 | 3 | 4 |
| RF | 3 | 3 | 4 |
| Ridor | 1 | 1 | 8 |
| BFTree | 1 | 1 | 8 |
| JRip | 0 | 0 | 10 |
Summary of wins, draws and losses of each method compared with the others (with regard to their AUCs).
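The bookkeeping behind such a summary can be sketched as a pairwise tally. The record does not state how a draw was decided, so the `tolerance` parameter below is a hypothetical stand-in for whatever statistical test was applied, and the method names and AUC values are invented. The sketch is not expected to reproduce the counts in the table.

```python
from itertools import combinations


def tally(auc_by_method, tolerance):
    """Count wins, draws and losses for every method over all pairs.

    Two methods draw when their AUCs differ by no more than `tolerance`
    (an assumed placeholder for a statistical significance test).
    """
    record = {name: {"wins": 0, "draws": 0, "losses": 0} for name in auc_by_method}
    for a, b in combinations(auc_by_method, 2):
        diff = auc_by_method[a] - auc_by_method[b]
        if abs(diff) <= tolerance:
            record[a]["draws"] += 1
            record[b]["draws"] += 1
        elif diff > 0:
            record[a]["wins"] += 1
            record[b]["losses"] += 1
        else:
            record[b]["wins"] += 1
            record[a]["losses"] += 1
    return record


# Invented AUC values for three hypothetical methods.
toy_aucs = {"A": 0.96, "B": 0.95, "C": 0.80}
toy_record = tally(toy_aucs, tolerance=0.02)
for name, counts in toy_record.items():
    print(name, counts["wins"], counts["draws"], counts["losses"])
```

With n methods, each one takes part in n - 1 comparisons, which is why every row of the table sums to 10 for the 11 methods listed.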
ZipperDB versus other methods
| 1. FoldAmyloid | 0.77 | 0.36 | 0.86 |
| 2. FoldAmyloid | 0.79 | 0.39 | 0.88 |
| 3. FoldAmyloid | 0.78 | 0.39 | 0.86 |
| 4. FoldAmyloid | 0.78 | 0.38 | 0.87 |
| 6. Waltz | 0.81 | 0.11 | 0.97 |
| 7. Waltz | 0.80 | 0.30 | 0.91 |
| 0.69 | 0.21 | 0.65 |
Compatibility of the 3D profile classification with FoldAmyloid and Waltz.