Juan C Aledo, Francisco R Cantón, Francisco J Veredas.
Abstract
BACKGROUND: The oxidation of protein-bound methionine to methionine sulfoxide has traditionally been regarded as oxidative damage. However, recent evidence supports the view that this reversible reaction is a regulatory post-translational modification. The perception that methionine sulfoxidation may provide a mechanism for the redox regulation of a wide range of cellular processes has stimulated some proteomic studies. However, these experimental approaches are expensive and time-consuming, so computational methods designed to predict methionine oxidation sites are an attractive alternative. As a first approach to this problem, we have developed models based on random forests, support vector machines and neural networks, aimed at accurate prediction of methionine oxidation sites.
Keywords: Machine learning; Methionine sulfoxide; Oxidation prediction; Post-translational modification
Year: 2017 PMID: 28962549 PMCID: PMC5622526 DOI: 10.1186/s12859-017-1848-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Performance rates with three different ML models
| Feature set | AUC | Accuracy | Sensitivity | Specificity | F-measure | MCC |
|---|---|---|---|---|---|---|
| TRAINING SET | ||||||
| RF | ||||||
| Primary (52) | 1.0000 | 0.8233 | 1.0000 | 0.7980 | 0.5868 | 0.5756 |
| Tertiary (24) | 0.9958 | 0.7222 | 1.0000 | 0.6823 | 0.4746 | 0.4607 |
| Whole (76) | 1.0000 | 0.8476 | 1.0000 | 0.8258 | 0.6222 | 0.6107 |
| mRMR (54) | 1.0000 | 0.8348 | 1.0000 | 0.8111 | 0.6031 | 0.5918 |
| SVM | ||||||
| Primary (52) | 1.0000 | 0.4955 | 1.0000 | 0.4231 | 0.3322 | 0.2903 |
| Tertiary (24) | 0.9403 | 0.9232 | 0.8571 | 0.9327 | 0.7368 | 0.7024 |
| Whole (76) | 0.9927 | 0.9910 | 0.9592 | 0.9956 | 0.9641 | 0.9590 |
| mRMR (54) | 0.9952 | 0.9821 | 0.9490 | 0.9868 | 0.9300 | 0.9200 |
| NN | ||||||
| Primary (52) | 0.7148 | 0.6492 | 0.6020 | 0.6559 | 0.3010 | 0.1764 |
| Tertiary (24) | 0.7981 | 0.7273 | 0.7143 | 0.7291 | 0.3966 | 0.3132 |
| Whole (76) | 0.7827 | 0.6402 | 0.8061 | 0.6164 | 0.3599 | 0.2822 |
| mRMR (54) | 0.7933 | 0.6786 | 0.8061 | 0.6603 | 0.3863 | 0.3156 |
| TESTING SET | ||||||
| RF | ||||||
| Primary (52) | 0.7002 | 0.5969 | 0.8125 | 0.5664 | 0.3333 | 0.2500 |
| Tertiary (24) | 0.8014 | 0.6357 | 0.8750 | 0.6018 | 0.3733 | 0.3155 |
| Whole (76) | 0.8413 | 0.7597 | 0.8125 | 0.7522 | 0.4561 | 0.3998 |
| mRMR (54) | 0.8462 | 0.7597 | 0.7500 | 0.7611 | 0.4364 | 0.3668 |
| SVM | ||||||
| Primary (52) | 0.5603 | 0.4264 | 0.7500 | 0.3805 | 0.2449 | 0.0894 |
| Tertiary (24) | 0.4701 | 0.2791 | 0.6250 | 0.2301 | 0.1770 | -0.1106 |
| Whole (76) | 0.6831 | 0.7984 | 0.4375 | 0.8496 | 0.3500 | 0.2431 |
| mRMR (54) | 0.7406 | 0.7907 | 0.4375 | 0.8407 | 0.3415 | 0.2320 |
| NN | ||||||
| Primary (52) | 0.5669 | 0.5504 | 0.4375 | 0.5664 | 0.1944 | 0.0026 |
| Tertiary (24) | 0.8291 | 0.7364 | 0.8125 | 0.7257 | 0.4333 | 0.3742 |
| Whole (76) | 0.7959 | 0.6589 | 0.7500 | 0.6460 | 0.3529 | 0.2661 |
| mRMR (54) | 0.8208 | 0.7132 | 0.8750 | 0.6903 | 0.4308 | 0.3839 |
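
The threshold-dependent columns of the table above (accuracy, sensitivity, specificity, F-measure, MCC) all derive from one confusion matrix. The sketch below shows the standard formulas. The counts used (13 true positives, 3 false negatives, 64 true negatives, 49 false positives) are not given in the source; they are an inference that happens to reproduce the RF Primary (52) row on the testing set, and the function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute the threshold-dependent rates from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)              # true positive rate
    specificity = tn / (tn + fp)              # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f_measure = 2 * tp / (2 * tp + fp + fn)   # harmonic mean of precision and recall
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# ASSUMPTION: these counts are inferred from the reported rates,
# not stated in the source.
rf_primary_test = binary_metrics(tp=13, fn=3, tn=64, fp=49)
for name, value in rf_primary_test.items():
    print(f"{name}: {value:.4f}")
```

With 16 positives against 113 negatives in this inferred matrix, the classes are strongly imbalanced, which is why F-measure and MCC stay low even when accuracy looks moderate.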
Performance rates for three different ML approaches: mean (sd)
| Feature set | AUC | Accuracy | Sensitivity | Specificity | F-measure | MCC |
|---|---|---|---|---|---|---|
| TRAINING SET | ||||||
| RF | ||||||
| Primary | 1.0000 (0) | 0.8957 (0.0480) | 1 (0) | 0.8807 (0.0546) | 0.7176 (0.0938) | 0.7054 (0.0920) |
| Tertiary | 0.9996 (0.0003) | 0.8316 (0.0591) | 1 (0) | 0.8074 (0.0674) | 0.6096 (0.0898) | 0.5977 (0.0882) |
| Whole | 1.0000 (0) | 0.8948 (0.0533) | 1 (0) | 0.8797 (0.0609) | 0.7192 (0.1053) | 0.7071 (0.1046) |
| mRMR | 1.0000 (0) | 0.8932 (0.0480) | 1 (0) | 0.8777 (0.0550) | 0.7138 (0.0966) | 0.7015 (0.0960) |
| SVM | ||||||
| Primary | 0.9997 (0.0011) | 0.9069 (0.1990) | 0.9990 (0.0034) | 0.8939 (0.2270) | 0.8751 (0.2584) | 0.8670 (0.2747) |
| Tertiary | 0.9924 (0.0090) | 0.7425 (0.1501) | 0.9865 (0.0217) | 0.7077 (0.1729) | 0.5562 (0.2335) | 0.5390 (0.2407) |
| Whole | 0.9992 (0.0025) | 0.9310 (0.1542) | 0.9980 (0.0058) | 0.9210 (0.1772) | 0.8936 (0.2254) | 0.8874 (0.2370) |
| mRMR | 0.9995 (0.0018) | 0.9044 (0.1766) | 0.9982 (0.0043) | 0.8907 (0.2026) | 0.8545 (0.2561) | 0.8463 (0.2690) |
| NN | ||||||
| Primary | 0.9482 (0.0339) | 0.7248 (0.1607) | 0.9377 (0.0416) | 0.6938 (0.1841) | 0.5195 (0.1975) | 0.4835 (0.2133) |
| Tertiary | 0.9336 (0.0227) | 0.7552 (0.1040) | 0.9079 (0.0322) | 0.7334 (0.1195) | 0.5082 (0.1322) | 0.4706 (0.1378) |
| Whole | 0.9616 (0.0247) | 0.8273 (0.1170) | 0.9491 (0.0327) | 0.8098 (0.1333) | 0.6292 (0.1883) | 0.6063 (0.1958) |
| mRMR | 0.9533 (0.0232) | 0.7897 (0.1160) | 0.9373 (0.0314) | 0.7684 (0.1325) | 0.5696 (0.1738) | 0.5413 (0.1822) |
| TESTING SET | ||||||
| RF | ||||||
| Primary | 0.6947 (0.0416) | 0.6207 (0.0666) | 0.6737 (0.1296) | 0.6139 (0.0883) | 0.3026 (0.0439) | 0.1936 (0.0573) |
| Tertiary | 0.7614 (0.0375) | 0.6975 (0.0485) | 0.7064 (0.1029) | 0.6959 (0.0633) | 0.3638 (0.0463) | 0.2781 (0.0547) |
| Whole | 0.7957 (0.0355) | 0.7458 (0.0622) | 0.6849 (0.1195) | 0.7540 (0.0813) | 0.4003 (0.0563) | 0.3205 (0.0625) |
| mRMR | 0.7998 (0.0334) | 0.7468 (0.0567) | 0.6817 (0.0982) | 0.7557 (0.0721) | 0.4003 (0.0562) | 0.3190 (0.0622) |
| SVM | ||||||
| Primary | 0.5660 (0.0431) | 0.5604 (0.0847) | 0.5383 (0.1381) | 0.5641 (0.1112) | 0.2286 (0.0414) | 0.0688 (0.0573) |
| Tertiary | 0.6480 (0.0534) | 0.6434 (0.0825) | 0.5500 (0.1329) | 0.6561 (0.1070) | 0.2741 (0.0459) | 0.1437 (0.0605) |
| Whole | 0.6753 (0.0424) | 0.6441 (0.0704) | 0.6037 (0.1301) | 0.6501 (0.0954) | 0.2924 (0.0417) | 0.1744 (0.0498) |
| mRMR | 0.6700 (0.0450) | 0.6348 (0.0802) | 0.5986 (0.1309) | 0.6398 (0.1047) | 0.2865 (0.0461) | 0.1641 (0.0585) |
| NN | ||||||
| Primary | 0.5601 (0.0479) | 0.5477 (0.0907) | 0.5465 (0.1349) | 0.5474 (0.1178) | 0.2274 (0.0411) | 0.0637 (0.0567) |
| Tertiary | 0.6887 (0.0470) | 0.6662 (0.0687) | 0.5998 (0.1412) | 0.6745 (0.0907) | 0.3047 (0.0523) | 0.1907 (0.0658) |
| Whole | 0.6846 (0.0469) | 0.6650 (0.0680) | 0.5793 (0.1194) | 0.6765 (0.0886) | 0.2981 (0.0453) | 0.1791 (0.0581) |
| mRMR | 0.6903 (0.0486) | 0.6573 (0.0696) | 0.6101 (0.1224) | 0.6640 (0.0903) | 0.3044 (0.0474) | 0.1900 (0.0627) |
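
The mean (sd) entries above summarise a rate over bootstrap resamples. As a rough illustration of how such a summary is obtained, the sketch below resamples a fixed set of predictions with replacement and reports the mean and standard deviation of the accuracy. This is a simplification: the source does not describe its resampling procedure in this excerpt, and it may well have refitted the models on each resample. The labels are toy data, and `bootstrap_metric` is a hypothetical helper.

```python
import random
import statistics


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_metric(y_true, y_pred, metric, n_resamples=100, seed=0):
    """Mean and sd of `metric` over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample indices with replacement
        values.append(metric([y_true[i] for i in idx],
                             [y_pred[i] for i in idx]))
    return statistics.mean(values), statistics.stdev(values)


# Toy labels (ASSUMPTION, not data from the source): 16 positives, 113 negatives.
y_true = [1] * 16 + [0] * 113
y_pred = [1] * 13 + [0] * 3 + [0] * 64 + [1] * 49

mean_acc, sd_acc = bootstrap_metric(y_true, y_pred, accuracy)
print(f"accuracy: {mean_acc:.4f} ({sd_acc:.4f})")
```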
Fig. 1. Performance rates distributions for bootstrapping resamples. Box-plots of the performance rates on the testing sets (after ROC’s cut-off probability adjustment on the evaluation sets) for bootstrapping resamples. Data set mRMR 54 features. Number of resamples = 100
Models comparison. T-test p-value from bootstrap results on the testing sets
| Feature set | RF-SVM | RF-NN | SVM-NN |
|---|---|---|---|
| AUC | |||
| Primary | 1.337807e-53 | 1.656090e-52 | 3.629288e-01 |
| Tertiary | 7.466593e-08 | 7.749183e-10 | 3.076722e-01 |
| Whole | 1.620777e-11 | 1.207725e-10 | 6.687422e-01 |
| mRMR | 5.736385e-04 | 1.122952e-05 | 3.027066e-01 |
| Accuracy | |||
| Primary | 7.466593e-08 | 7.749183e-10 | 3.076722e-01 |
| Tertiary | 1.620777e-11 | 1.207725e-10 | 6.687422e-01 |
| Whole | 5.736385e-04 | 1.122952e-05 | 3.027066e-01 |
| mRMR | 1.110837e-35 | 7.810002e-38 | 5.302538e-01 |
| Sensitivity | |||
| Primary | 4.838807e-26 | 9.923600e-27 | 8.419212e-01 |
| Tertiary | 7.067182e-08 | 2.630463e-04 | 3.507737e-02 |
| Whole | 3.771161e-17 | 6.241079e-09 | 1.096249e-02 |
| mRMR | 1.650447e-03 | 5.410619e-02 | 1.924156e-01 |
| Specificity | |||
| Primary | 7.035627e-39 | 8.036713e-20 | 3.721847e-07 |
| Tertiary | 1.059365e-30 | 6.066923e-15 | 1.807435e-05 |
| Whole | 1.072350e-21 | 9.069624e-16 | 3.319756e-02 |
| mRMR | 7.569818e-06 | 2.475176e-09 | 1.699726e-01 |
| F-measure | |||
| Primary | 1.900064e-14 | 8.911586e-10 | 4.341598e-02 |
| Tertiary | 1.385330e-43 | 3.632520e-39 | 5.440616e-01 |
| Whole | 8.253875e-35 | 1.802488e-31 | 3.612268e-01 |
| mRMR | 5.984561e-23 | 4.366711e-19 | 3.520361e-02 |
| MCC | |||
| Primary | 9.137039e-07 | 8.985178e-06 | 5.212807e-01 |
| Tertiary | 1.701737e-16 | 1.821765e-13 | 8.146449e-02 |
| Whole | 6.201029e-44 | 2.659090e-33 | 2.914253e-03 |
| mRMR | 4.996287e-36 | 3.033082e-28 | 7.392408e-03 |
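
A two-sample t statistic can be computed directly from the means and standard deviations of two sets of bootstrap results. The sketch below uses Welch's unequal-variance form, which is an assumption: this excerpt does not say whether the reported p-values come from a paired, pooled or Welch test. The inputs are the testing-set AUC values for RF and SVM on the Primary feature set from the mean (sd) table, with 100 resamples per model as stated in the figure captions. `welch_t` is a hypothetical helper.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# RF vs SVM, Primary feature set, testing-set AUC: mean (sd), n = 100 each.
t_stat, df = welch_t(0.6947, 0.0416, 100, 0.5660, 0.0431, 100)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

Turning the statistic into a p-value requires the cumulative distribution of Student's t, which the Python standard library does not provide; `scipy.stats.ttest_ind_from_stats` is one option if SciPy is available.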
Fig. 2. Variable Importance. Box-plots of the GI of the 20 most relevant predictors for the RF classifier. From top to bottom: variables in decreasing order of average GI (100 bootstrapping resamples). Dataset: mRMR 54 features
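
Assuming GI in the caption above denotes Gini importance (the mean decrease in Gini impurity, which the excerpt does not spell out), the quantity is built from per-split impurity decreases like the one computed below. In a random forest these decreases are accumulated for every split made on a variable and averaged over the trees. The labels are toy data and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Toy node (ASSUMPTION, not data from the source): 4 oxidised (1), 4 non-oxidised (0).
parent = [1, 1, 1, 1, 0, 0, 0, 0]
left = [1, 1, 1, 0]    # samples sent to the left child by a candidate split
right = [1, 0, 0, 0]   # samples sent to the right child

decrease = gini_decrease(parent, left, right)
print(f"Gini decrease for this split: {decrease:.3f}")
```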
Model tuning. Best hyper-parameters
| Feature set | RF: mtry | RF: Number of trees | SVM: Sigma | SVM: C | NN: Size | NN: Decay |
|---|---|---|---|---|---|---|
| Primary | 7 | 1000 | 0.01124415 | 8 | 15 | 0.003162278 |
| Tertiary | 4 | 1000 | 0.04226239 | 8 | 3 | 0.0001995262 |
| Whole | 8 | 1000 | 0.007670497 | 4 | 1 | 0.001584893 |
| mRMR | 7 | 1000 | 0.01050984 | 4 | 19 | 0.001584893 |
Fig. 3. ROC curves. From top to bottom: ROC curves and AUC values computed on the evaluation patterns for the RF, SVM and NN models, respectively. The point in each curve that gives the best balance between sensitivity and specificity rates has been marked and annotated with the corresponding “alternative” threshold and efficacy values. Solid black box: AUC = 1 reference area. Dashed gray line: smoothed ROC curve. Solid gray line: random guess
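
One common way to pick the ROC point with the "best balance" between sensitivity and specificity is to maximise Youden's J (sensitivity + specificity − 1) over candidate thresholds. Whether the source used this criterion is not stated in this excerpt, so the sketch below is an assumption about the selection rule. The scores are toy values, and `sens_spec` and `best_cutoff` are hypothetical helpers.

```python
def sens_spec(y_true, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(y_true, scores):
    """Return (threshold, J, sensitivity, specificity) maximising Youden's J."""
    best = None
    for thr in sorted(set(scores)):  # each observed score is a candidate cutoff
        sens, spec = sens_spec(y_true, scores, thr)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (thr, j, sens, spec)
    return best


# Toy predicted probabilities (ASSUMPTION, not data from the source).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.81, 0.46, 0.41, 0.30, 0.52, 0.38, 0.35, 0.22, 0.15, 0.08]

alt_thr, alt_j, alt_sens, alt_spec = best_cutoff(y_true, scores)
std_sens, std_spec = sens_spec(y_true, scores, 0.5)
print(f"alternative cutoff {alt_thr}: sens={alt_sens:.3f}, spec={alt_spec:.3f}")
print(f"standard cutoff 0.5: sens={std_sens:.3f}, spec={std_spec:.3f}")
```

On these toy scores the selected cutoff falls below 0.5 and trades some specificity for a large gain in sensitivity, the same direction of change seen when the RF results at the alternative and standard cutoffs are compared.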
Performance rates for RF with two alternative ROC cutoffs
| Feature set | Accuracy | Sensitivity | Specificity | F-measure | MCC |
|---|---|---|---|---|---|
| Alternative cutoff: 0.392 | |||||
| Primary (52) | 0.5969 | 0.8125 | 0.5664 | 0.3333 | 0.2500 |
| Tertiary (24) | 0.6357 | 0.8750 | 0.6018 | 0.3733 | 0.3155 |
| Whole (76) | 0.7597 | 0.8125 | 0.7522 | 0.4561 | 0.3998 |
| mRMR (54) | 0.7597 | 0.7500 | 0.7611 | 0.4364 | 0.3668 |
| Standard cutoff: 0.5 | |||||
| Primary (52) | 0.8062 | 0.1875 | 0.8938 | 0.1935 | 0.0836 |
| Tertiary (24) | 0.7907 | 0.5625 | 0.8230 | 0.4000 | 0.3044 |
| Whole (76) | 0.8372 | 0.5625 | 0.8761 | 0.4615 | 0.3777 |
| mRMR (54) | 0.8372 | 0.6250 | 0.8673 | 0.4878 | 0.4105 |