Víctor López-Ferrando1,2, Andrea Gazzo2,3, Xavier de la Cruz4,5, Modesto Orozco2,3,6, Josep Ll Gelpí1,2,6.
Abstract
We present here a full update of the PMut predictor, which has been active since 2005 and has gained wide acceptance in the field of predicting pathological Mendelian mutations. The PMut internal engine has been renewed and converted into a fully featured standalone training and prediction engine that not only powers the PMut web portal but can also generate custom predictors with alternative training sets or validation schemas. The PMut web portal allows the user to perform pathology predictions, to access a complete repository of precalculated predictions, and to generate and validate new predictors. The default predictor achieves good quality scores (MCC values of 0.61 on 10-fold cross-validation and 0.42 on a blind test with SwissVar 2016 mutations). The PMut portal is freely accessible at http://mmb.irbbarcelona.org/PMut. A complete help and tutorial is available at http://mmb.irbbarcelona.org/PMut/help.
Year: 2017 PMID: 28453649 PMCID: PMC5793831 DOI: 10.1093/nar/gkx313
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
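For reference, the accuracy, sensitivity, specificity and MCC values reported in the tables of this record follow the standard confusion-matrix definitions below. Treating pathological variants as the positive class is an assumption about the labelling convention; TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives. MCC ranges from -1 (complete disagreement) through 0 (no better than chance) to +1 (perfect prediction), so it can be negative when a predictor anticorrelates with the true labels.

```latex
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}, \qquad
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \\[4pt]
\mathrm{MCC} &= \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)\,(TP + FN)\,(TN + FP)\,(TN + FN)}}
\end{aligned}
```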
Figure 1. ROC curves corresponding to PMut2017 10-fold cross-validation based on protein families. No sequence in the validation set has more than 50% identity with any sequence in the training set. Additional curves correspond to the subsets of predictions with 85% and 90% confidence. AUC: area under the curve; MCC: Matthews correlation coefficient.
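The family-based cross-validation described in the Figure 1 caption can be sketched as a grouped k-fold split in which all variants from one protein family land in the same fold, so no family is seen in both training and validation. This is a minimal illustration, not PMut's code: it assumes each variant already carries a hypothetical family label (for example, obtained by clustering sequences at 50% identity), and `family_kfold` is an illustrative name.

```python
from collections import defaultdict


def family_kfold(variants, k=10):
    """Yield (train, test) index lists where each protein family sits in exactly one fold.

    Hypothetical helper: `variants` is a list of (variant_id, family_id) pairs,
    with family_id assumed to come from a prior sequence-identity clustering step.
    """
    # Group variant indices by their protein family.
    by_family = defaultdict(list)
    for idx, (_, family) in enumerate(variants):
        by_family[family].append(idx)

    # Place the largest families first, each into the currently smallest fold,
    # so that fold sizes stay roughly balanced.
    folds = [[] for _ in range(k)]
    for family in sorted(by_family, key=lambda f: len(by_family[f]), reverse=True):
        smallest = min(range(k), key=lambda n: len(folds[n]))
        folds[smallest].extend(by_family[family])

    # Each fold serves once as the validation set; the rest form the training set.
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j in range(k) if j != i for idx in folds[j])
        yield train, test
```

Splitting by family rather than by individual variant is what prevents near-identical sequences from leaking between training and validation, which would otherwise inflate the cross-validation scores.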
Summary of performance metrics for the PMut2017 predictor
| Confidence | Coverage (%) | Accuracy | Sensitivity | Specificity | AUC | MCC |
|---|---|---|---|---|---|---|
| All | 100 | 0.82 | 0.76 | 0.86 | 0.81 | 0.62 |
| >85%ᵃ | 85.9 | 0.85 | 0.75 | 0.92 | 0.83 | 0.69 |
| >90%ᵃ | 64.9 | 0.90 | 0.80 | 0.95 | 0.87 | 0.77 |
ᵃ Only predictions with scores corresponding to the higher confidence levels are considered (see Supplementary Figure S4).
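The coverage column in the summary table reports the share of variants retained when only higher-confidence predictions are kept, and the remaining metrics are recomputed on that subset. A minimal sketch, assuming each prediction already carries a confidence value; PMut's actual score-to-confidence mapping (Supplementary Figure S4) is not reproduced here, and `metrics` and `at_confidence` are hypothetical helpers.

```python
import math


def metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = pathological, 0 = neutral)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / n if n else nan,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        # Assumed convention: MCC = 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def at_confidence(y_true, y_pred, confidence, min_conf):
    """Keep only predictions with confidence > min_conf.

    Returns (coverage in %, metrics computed on the retained subset).
    """
    keep = [i for i, c in enumerate(confidence) if c > min_conf]
    coverage = 100.0 * len(keep) / len(y_true) if y_true else 0.0
    subset_true = [y_true[i] for i in keep]
    subset_pred = [y_pred[i] for i in keep]
    return coverage, metrics(subset_true, subset_pred)
```

Raising `min_conf` trades coverage for accuracy, which is the pattern visible in the >85% and >90% rows of the summary table.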
Comparative performance of the PMut2017 predictor
| Method | Coverage (%) | Accuracy | Specificity | Sensitivity | AUC | MCC |
|---|---|---|---|---|---|---|
| SIFT | 89.6 | 0.61 | 0.33 | 0.88 | 0.60 | 0.25 |
| Polyphen2 | 92.1 | 0.64 | 0.35 | 0.91 | 0.63 | 0.32 |
| PROVEAN | 91.5 | 0.64 | 0.41 | 0.87 | 0.64 | 0.31 |
| FATHMM | 90.5 | 0.55 | 0.45 | 0.64 | 0.55 | 0.09 |
| PON-P2 | 42.4 | 0.72 | 0.52 | 0.90 | 0.71 | 0.45 |
| CADD | 95.0 | 0.65 | 0.33 | 0.94 | 0.64 | 0.35 |
| M-CAP | 91.5 | 0.60 | 0.19 | 0.95 | 0.57 | 0.22 |
| Condel | 91.0 | 0.63 | 0.40 | 0.84 | 0.62 | 0.26 |
| LRT | 95.1 | 0.73 | 0.58 | 0.87 | 0.73 | 0.47 |
| MutationAssessor | 95.1 | 0.63 | 0.46 | 0.78 | 0.62 | 0.26 |
| MetaSVM | 95.1 | 0.63 | 0.51 | 0.74 | 0.62 | 0.26 |
| MetaLR | 95.1 | 0.60 | 0.46 | 0.73 | 0.60 | 0.20 |
| MutationTaster | 95.1 | 0.65 | 0.31 | 0.96 | 0.64 | 0.36 |
Blind validation based on new variants added to SwissVar during 2016 (3166 variants). The CADD predictor was evaluated using a threshold of 20. AUC: area under the ROC curve; MCC: Matthews correlation coefficient.
ᵃ Analysis performed from ANNOVAR data (42).
ᵇ Analysis restricted to the most reliable PMut predictions (reliability level in parentheses).
ᶜ Blind validation based on variants reported in ClinVar (43) that are not present in the SwissVar dataset (20,308 variants). The indicated coverage is calculated on the ClinVar dataset.
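Two practical details of the blind comparison are the fixed CADD cut-off and the coverage column, which reflects variants for which a method returned no score. A minimal sketch, assuming CADD PHRED-scaled scores in which values at or above 20 are called deleterious (the exact inequality is an assumption) and missing scores are represented as `None`; all function names are hypothetical.

```python
CADD_THRESHOLD = 20.0  # cut-off quoted in the table note; ">=" is an assumption


def cadd_calls(phred_scores, threshold=CADD_THRESHOLD):
    """Binarize CADD PHRED-scaled scores: 1 = deleterious, 0 = tolerated, None = no score."""
    return [None if s is None else int(s >= threshold) for s in phred_scores]


def coverage_percent(calls):
    """Percentage of variants for which the method produced a call."""
    return 100.0 * sum(c is not None for c in calls) / len(calls) if calls else 0.0


def covered(y_true, calls):
    """Drop variants without a call, so metrics are computed on covered variants only."""
    kept = [(t, c) for t, c in zip(y_true, calls) if c is not None]
    return [t for t, _ in kept], [c for _, c in kept]
```

Under this reading, each method's accuracy, sensitivity, specificity and MCC apply only to the variants it covers, which is why coverage is reported alongside them.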
Comparative performance of the PMut2017 predictor on selected genes
| Gene | Disease | #D | #N | PMut | SIFT | Polyphen | LRT | Mut. Taster | Mut. Assessor | PROVEAN |
|---|---|---|---|---|---|---|---|---|---|---|
|
| Rett syndrome | 46 | 22 |
| 0.66 | 0.85 | 0.69 | 0.64 | 0.41 | 0.53 |
|
| Osteogenesis Imperfecta | 78 | 20 |
| 0.74 | 0.62 | 0.55 | 0.55 | 0.74 | 0.74 |
|
| Distal Renal Tubular Acidosis | 38 | 36 |
| 0.65 | 0.65 | 0.54 | 0.55 | 0.68 | 0.60 |
|
| Upshaw-Schulman syndrome | 43 | 17 |
| 0.76 | 0.46 | 0.00 | 0.71 | 0.54 | 0.62 |
|
| Hereditary cancer-predisposing syndrome | 46 | 54 |
| 0.53 | 0.57 | 0.32 | 0.42 | 0.48 | 0.55 |
|
| Wilson disease | 195 | 25 |
| 0.34 | 0.49 | 0.37 | 0.43 | 0.29 | 0.52 |
|
| Lynch syndrome | 159 | 78 |
| 0.32 | 0.31 | 0.23 | 0.16 | 0.43 | 0.32 |
|
| Primary open angle glaucoma | 57 | 24 |
| 0.37 | 0.45 | 0.38 | 0.50 | 0.47 | 0.49 |
|
| Jeune thoracic dystrophy | 16 | 28 |
| 0.20 | 0.22 | 0.18 | 0.16 | 0.28 | 0.26 |
|
| Brugada syndrome | 154 | 46 |
| 0.32 | 0.26 | 0.43 | 0.31 | 0.34 | 0.34 |
|
| Congenital long QT syndrome | 270 | 54 |
| 0.32 | 0.28 | 0.36 | 0.32 | 0.30 | 0.38 |
|
| Tangier disease | 32 | 31 |
| 0.43 | 0.31 | 0.32 | 0.47 | 0.43 | 0.47 |
|
| Polycystic kidney disease | 197 | 96 |
| 0.43 | 0.37 | 0.30 | 0.41 | 0.36 | 0.45 |
|
| Marfan syndrome | 385 | 20 |
| 0.31 | 0.25 | 0.21 | 0.33 | 0.32 | 0.30 |
|
| Central core disease | 147 | 25 |
| 0.27 | 0.31 | 0.00 | 0.36 | 0.28 | 0.34 |
|
| Familial hypercholesterolemia | 103 | 23 |
| 0.29 | 0.08 | 0.17 | 0.09 | 0.26 | 0.25 |
|
| Limb-Girdle Muscular Dystrophy | 48 | 16 |
| 0.35 | 0.27 | 0.15 | 0.21 | 0.41 | 0.39 |
|
| Breast-ovarian cancer, familial 2 | 43 | 61 |
| 0.10 | 0.18 | 0.18 | 0.14 | 0.19 | 0.01 |
|
| Breast-ovarian cancer, familial 1 | 27 | 36 |
| 0.24 | 0.20 | 0.38 | 0.29 | 0.30 | 0.17 |
|
| WFS1-Related Spectrum Disorders | 40 | 17 |
| 0.25 | 0.35 | 0.20 | 0.18 | 0.16 | 0.26 |
|
| Parkinson Disease | 23 | 39 |
| 0.33 | 0.48 | 0.40 | 0.41 | 0.44 | 0.30 |
|
| Parkinson Disease | 21 | 24 |
| 0.06 | 0.14 | 0.01 | 0.13 | 0.09 | 0.14 |
|
| Cystic fibrosis | 146 | 32 |
| 0.06 | 0.20 | 0.21 | 0.12 | 0.20 | 0.27 |
|
| Thrombophilia | 36 | 28 |
| -0.15 | -0.08 | 0.07 | 0.14 | 0.08 | -0.01 |
MCC values obtained by restricting the analysis to variants in the indicated genes. The analysis for non-PMut methods was performed from ANNOVAR data (42). #D: disease-causing mutations; #N: neutral mutations.
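Per-gene MCC values come from restricting the confusion-matrix counts to each gene's variants. A minimal sketch with hypothetical helper names, treating disease-causing variants as label 1 and returning 0 when a marginal count is zero (a common convention, assumed here). A negative result, as in the Thrombophilia row, means the predictions anticorrelate with the true labels for that gene.

```python
import math
from collections import defaultdict


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when a marginal count is zero (assumed convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def per_gene_mcc(records):
    """records: iterable of (gene, y_true, y_pred), with 1 = disease-causing, 0 = neutral."""
    counts = defaultdict(lambda: {"tp": 0, "tn": 0, "fp": 0, "fn": 0})
    for gene, truth, pred in records:
        # Assign each variant to exactly one confusion-matrix cell for its gene.
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "fp" if pred == 1 else "tn"
        counts[gene][key] += 1
    return {gene: mcc(**c) for gene, c in counts.items()}
```

With small per-gene samples (some rows have fewer than 20 neutral variants), a handful of misclassifications can move MCC substantially, so per-gene values are noisier than the global scores.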
Statistics of the PMut repository (January 2017)
| Statistic | Value |
|---|---|
| Proteins available (from human UniRef) | 106 407 |
| Analysed variants | 725 596 928 |
| Analysed variants (>85% prediction reliability) | 586 383 428 (80%) |
| Analysed variants (>90% prediction reliability) | 370 444 279 (51%) |
Figure 2. Partial screenshots of the output of the predictor training section. (A) Comparative plot of the selected protein features. (B) ROC curves from the performance evaluation.