Gilad Wainreb, Haim Ashkenazy, Yana Bromberg, Alina Starovolsky-Shitrit, Turkan Haliloglu, Eytan Ruppin, Karen B Avraham, Burkhard Rost, Nir Ben-Tal.
Abstract
Discriminating between functionally neutral amino acid substitutions and non-neutral mutations that affect protein function is very important for our understanding of diseases. The rapidly growing amount of experimental data enables the development of computational tools that facilitate the annotation of these substitutions. Here, we describe a Random Forests-based classifier, named Mutation Detector (MuD), that uses structural and sequence-derived features to assess the impact of a given substitution on protein function. In its automatic mode, MuD is comparable in performance to alternative tools. What is unique to MuD, however, is that user-reported, protein-specific structural and functional information can be added at run-time, thereby further enhancing the prediction accuracy. The MuD server, available at http://mud.tau.ac.il, assigns a reliability score to every prediction, thus offering a useful tool for prioritizing substitutions in proteins with an available 3D structure.
Year: 2010 PMID: 20542913 PMCID: PMC2896130 DOI: 10.1093/nar/gkq528
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
The performance of MuD, SIFT, PolyPhen and SNAP on the Sub-BR, and the Sub-PMD data sets
| Measure | Sub-BR: MuD | Sub-BR: SNAP | Sub-BR: SIFT | Sub-BR: PolyPhen | Sub-PMD: MuD | Sub-PMD: SNAP | Sub-PMD: SIFT | Sub-PMD: PolyPhen |
|---|---|---|---|---|---|---|---|---|
| Precision | 72.8 ± 0.5 | 68.7 ± 0.4 | 69.6 ± 0.5 | 66.6 ± 0.5 | 73.3 ± 0.4 | 69.5 ± 0.4 | 70.6 ± 0.5 | 67.8 ± 0.5 |
| True positive rate | 76.6 ± 0.4 | 79.9 ± 0.4 | 73.5 ± 0.5 | 73.5 ± 0.5 | 76.6 ± 0.4 | 79.9 ± 0.4 | 73.6 ± 0.5 | 73.5 ± 0.5 |
| Specificity | 71.8 ± 0.5 | 64.1 ± 0.5 | 68.4 ± 0.5 | 63.9 ± 0.5 | 58.8 ± 0.6 | 48.3 ± 0.6 | 55.9 ± 0.6 | 48.9 ± 0.7 |
| MCC | 49.5 ± 0.4 | 48.3 ± 0.8 | 42.9 ± 0.1 | 39.2 ± 0.8 | 45.2 ± 0.1 | 41.0 ± 1.1 | 36.0 ± 1.0 | 29.7 ± 1.1 |
| ROC AUC | 81.9 ± 0.3 | 78.8 ± 0.3 | NR | NR | 74.8 ± 0.4 | 70.9 ± 0.4 | NR | NR |
The average and SD of the performance measures were obtained by a bootstrap procedure run for 1000 iterations performed on the cross-validation predictions. The results on the Sub-PMD data set are a subset of the results obtained during the cross-validation on the entire data set. According to the MCC, MuD and SNAP perform better than SIFT and PolyPhen both on the entire data set and on the PMD subset. According to the ROC AUC, MuD performed better than SNAP. However, according to the MCC both methods exhibited similar performance. Although all methods exhibited a decline in the performance on the Sub-PMD data set relative to the performance on the Sub-BR data set, MuD surpassed all methods on the Sub-PMD data set. Values in the table have been multiplied by 100.
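The measures in the table and the bootstrap estimate of their average and SD can be illustrated with a minimal sketch. This is not the authors' code: the function names (`confusion_counts`, `metrics`, `bootstrap`), the label encoding (1 = non-neutral, 0 = neutral), the use of the sample standard deviation, and the convention of returning 0 when a denominator is zero are all assumptions made for illustration. ROC AUC is omitted because it requires continuous prediction scores rather than binary calls.

```python
import math
import random
import statistics


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN. Assumed encoding: 1 = non-neutral, 0 = neutral."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def _ratio(num, den):
    # Assumed convention: an undefined ratio (zero denominator) is reported as 0.
    return num / den if den else 0.0


def metrics(y_true, y_pred):
    """Return the table's measures, multiplied by 100 as in the source."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "precision": 100 * _ratio(tp, tp + fp),
        "tpr": 100 * _ratio(tp, tp + fn),
        "specificity": 100 * _ratio(tn, tn + fp),
        "mcc": 100 * _ratio(tp * tn - fp * fn, mcc_den),
    }


def bootstrap(y_true, y_pred, iterations=1000, seed=0):
    """Resample predictions with replacement; return (mean, SD) per measure."""
    rng = random.Random(seed)
    n = len(y_true)
    samples = {}
    for _ in range(iterations):
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = metrics([y_true[i] for i in idx], [y_pred[i] for i in idx])
        for name, value in resampled.items():
            samples.setdefault(name, []).append(value)
    # Sample standard deviation (an assumption; the source only says "SD").
    return {
        name: (statistics.mean(values), statistics.stdev(values))
        for name, values in samples.items()
    }


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    calls = [1, 1, 0, 0, 0, 1, 1, 0]
    print(metrics(truth, calls))
    for name, (mean, sd) in bootstrap(truth, calls).items():
        print(f"{name}: {mean:.1f} ± {sd:.1f}")
```

Fixing the random seed makes the resampling reproducible, so repeated runs yield the same mean and SD for a given set of predictions.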
Introduction of reliable structural data improves the prediction performance
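| Measure | HIV protease: SA MuD | HIV protease: MuD | HIV protease: SNAP | HIV protease: SIFT | T4 lysozyme: SA MuD | T4 lysozyme: MuD | T4 lysozyme: SNAP | T4 lysozyme: SIFT | T4 lysozyme: PP | Lac repressor: SA MuD | Lac repressor: MuD | Lac repressor: SNAP | Lac repressor: SIFT | Lac repressor: PP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|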
| Precision | 67.6 | 61.6 | 54.3 | 66.0 | 37.0 | 24.4 | 34.9 | 29.8 | 31.1 | 79.2 | 81.83 | 64.1 | 65.2 | 83.9 |
| TP rate | 93.2 | 95.3 | 98.0 | 93.2 | 92.1 | 97.1 | 81.0 | 91.4 | 89.9 | 77.9 | 77.83 | 74.6 | 70.1 | 67.9 |
| Specificity | 58.0 | 44.3 | 22.3 | 54.8 | 66.5 | 35.9 | 67.8 | 54.1 | 58.0 | 83.8 | 83.00 | 66.9 | 70.3 | 85.7 |
| MCC | ||||||||||||||
| ROC AUC | ||||||||||||||
The performance of PolyPhen (PP), SNAP, SIFT, the semi-automatic MuD (SA MuD) and the fully-automatic MuD (MuD) on the 3-PRO data set. The performance of the automatic predictors appears to depend on the evaluation criterion and the data set. Nevertheless, as indicated by all measures, applying the semi-automatic prediction scheme improved the performance, surpassing all the fully-automatic methods. For the semi-automatic prediction, we removed the non-naturally present ligands from the solved crystal structures of the query proteins and selected the appropriate dimerization state for each protein. Values in the table have been multiplied by 100.
PolyPhen (PP) did not produce predictions for the HIV-1 protease.