Yixiao Zhai1, Jingyu Zhang2, Tianjiao Zhang1, Yue Gong1, Zixiao Zhang1, Dandan Zhang3, Yuming Zhao1.
Abstract
Antioxidant proteins not only balance oxidative stress in the body but are also an important component of antioxidant drugs. Accurate identification of antioxidant proteins is essential to help humans fight diseases and develop new drugs. In this paper, we developed a user-friendly method, AOPM, to identify antioxidant proteins. 188D and the Composition of k-spaced Amino Acid Pairs (CKSAAP) were adopted as the feature extraction methods. In addition, the Max-Relevance-Max-Distance algorithm (MRMD) and random forest were used as the feature selection method and the classifier, respectively. We used 5-fold cross-validation and an independent test dataset to evaluate our model. On the test dataset, AOPM outperformed the state-of-the-art methods. The sensitivity, specificity, accuracy, Matthews correlation coefficient, and area under the curve reached 87.3%, 94.2%, 92.0%, 0.815, and 0.972, respectively. In addition, AOPM also performs well in predicting the catalytic enzymes of antioxidant drugs. This work demonstrated the feasibility of virtual drug screening based on sequence information and provided new ideas and solutions for drug development.
Keywords: MRMD; antioxidant drugs; antioxidant proteins; drug screening and discovery; random forest
Year: 2022 PMID: 35115948 PMCID: PMC8803896 DOI: 10.3389/fphar.2021.818115
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.810
FIGURE 1. The functional structure of AOPM.
Antioxidant protein dataset information.
| Dataset | Sample | Class | Positive num | Negative num |
|---|---|---|---|---|
| Train dataset | 1810 | 2 | 568 | 1242 |
| Test dataset | 452 | 2 | 142 | 310 |
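The test-set metrics quoted in the abstract can be checked against the class counts in the table above. The following is a minimal sketch, not the authors' evaluation code: the confusion-matrix counts (124 true positives, 292 true negatives) are not reported in the source and are inferred here as the integers closest to 87.3% of 142 positives and 94.2% of 310 negatives. The function name `binary_metrics` is hypothetical.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Return SN, SP, ACC and MCC for a binary confusion matrix."""
    sn = tp / (tp + fn)                      # sensitivity (recall on positives)
    sp = tn / (tn + fp)                      # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc

# Assumed counts, inferred (not reported) from 142 positives and 310 negatives.
sn, sp, acc, mcc = binary_metrics(tp=124, fn=18, tn=292, fp=18)
print(f"SN={sn:.3f} SP={sp:.3f} ACC={acc:.3f} MCC={mcc:.3f}")
# prints: SN=0.873 SP=0.942 ACC=0.920 MCC=0.815
```

Under these assumed counts, all four values agree with the reported 87.3%, 94.2%, 92.0%, and 0.815 after rounding. The AUC cannot be recovered this way because it depends on the ranking of predicted scores.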
The UniProt IDs of 36 protein sequences.
| UniProt ID | Drug | Type |
|---|---|---|
| P47989 | Carvedilol, Allopurinol | enzyme |
| P16662 | Carvedilol | enzyme |
| P06133 | Carvedilol | enzyme |
| P22309 | Carvedilol, Silibinin | enzyme |
| Q16881 | Ascorbic acid, Selenium | enzyme |
| P00441 | Vitamin E, alpha-Tocopherol succinate | enzyme |
| Q96I15 | Selenium | enzyme |
| P16435 | Lipoic acid | enzyme |
| P15559 | Vitamin E, alpha-Tocopherol succinate | enzyme |
| P05164 | Melatonin | enzyme |
| P78329 | Tocopherol, alpha-Tocopherol acetate | enzyme |
| P14902 | Melatonin | enzyme |
| P09601 | Vitamin E, alpha-Tocopherol succinate | enzyme |
| P46597 | Melatonin | enzyme |
| P05091 | Nitric Oxide | enzyme |
| Q06278 | Allopurinol | enzyme |
| Q03154 | Acetylcysteine | enzyme |
| P11511 | Melatonin | enzyme |
| P04798 | Melatonin, Resveratrol, Carvedilol | enzyme |
| P05177 | Nitric Oxide, Pentoxifylline, Melatonin, Resveratrol, Carvedilol | enzyme |
| Q16678 | Melatonin, Resveratrol | enzyme |
| O43174 | Vitamin A | enzyme |
| P20813 | Nitric Oxide | enzyme |
| P33261 | Melatonin, Dimethyl sulfoxide | enzyme |
| P10632 | Quercetin | enzyme |
| P11712 | Melatonin, Carvedilol | enzyme |
| P10635 | Dimethyl sulfoxide, Anisodamine, Carvedilol | enzyme |
| P05181 | Carvedilol | enzyme |
| P08684 | Vitamin E, Nitric Oxide, Dimethyl sulfoxide, Resveratrol, Tocopherol, alpha-Tocopherol acetate, Carvedilol | enzyme |
| P48506 | Vitamin E, alpha-Tocopherol succinate | enzyme |
| P00390 | Selenium | enzyme |
| P09210 | Vitamin E, alpha-Tocopherol succinate | enzyme |
| P21266 | Vitamin E, alpha-Tocopherol succinate | enzyme |
| P78417 | Vitamin E, alpha-Tocopherol succinate | enzyme |
| P09211 | Vitamin E, alpha-Tocopherol succinate | enzyme |
FIGURE 2. The number of catalytic enzymes associated with different antioxidants. The catalytic enzymes of antioxidant drugs are diverse. For example, antioxidant drugs such as Vitamin E and Carvedilol can each be catalyzed by as many as 9 types of enzymes. Other antioxidant drugs, such as Anisodamine, Silibinin, and Lipoic acid, are catalyzed by only one specific enzyme.
Components of the 188-dimensional feature vector of a protein.
| Physicochemical property | Dimensions |
|---|---|
| Amino acid composition | 20 |
| Hydrophobicity | 21 |
| Van der Waals volume | 21 |
| Polarity | 21 |
| Polarizability | 21 |
| Charge | 21 |
| Surface tension | 21 |
| Secondary structure | 21 |
| Solvent accessibility | 21 |
| Total | 188 |
The three categories of amino acids, divided according to the physicochemical properties of proteins.
| Physicochemical property | Ⅰ | Ⅱ | Ⅲ |
|---|---|---|---|
| Hydrophobicity | RKEDQN | GASTPHY | CVLIMFW |
| Van der Waals volume | GASCTPD | NVEQIL | MHKFRYW |
| Polarity | LIFWCMVY | PATGS | HQRKNED |
| Polarizability | GASDT | CPNVEQIL | KMHFRYW |
| Charge | KR | ANCQGHILMFPSTWYV | DE |
| Surface tension | GQDNAHR | KTSEC | ILMFPWYV |
| Secondary structure | EALMQKRH | VIYCWFT | GNPSD |
| Solvent accessibility | ALFCGIVW | RKQEND | MPSTHY |
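The two tables above can be combined into a feature extractor: 20 amino acid composition values plus 21 values for each of the 8 properties gives 20 + 8 × 21 = 188. The sketch below is one possible reading, not the authors' implementation. It assumes that the 21 values per property follow the usual composition/transition/distribution layout (3 + 3 + 15), that values are expressed as fractions rather than percentages, and that the input contains only the 20 standard residues and has length of at least 2. The names `ctd_features` and `features_188d` are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Three-class groupings copied from the table above (classes I, II, III).
GROUPS = {
    "hydrophobicity": ("RKEDQN", "GASTPHY", "CVLIMFW"),
    "van_der_waals_volume": ("GASCTPD", "NVEQIL", "MHKFRYW"),
    "polarity": ("LIFWCMVY", "PATGS", "HQRKNED"),
    "polarizability": ("GASDT", "CPNVEQIL", "KMHFRYW"),
    "charge": ("KR", "ANCQGHILMFPSTWYV", "DE"),
    "surface_tension": ("GQDNAHR", "KTSEC", "ILMFPWYV"),
    "secondary_structure": ("EALMQKRH", "VIYCWFT", "GNPSD"),
    "solvent_accessibility": ("ALFCGIVW", "RKQEND", "MPSTHY"),
}

def ctd_features(seq, classes):
    """21 values for one property: 3 composition, 3 transition, 15 distribution."""
    label = {aa: i for i, group in enumerate(classes) for aa in group}
    coded = [label[aa] for aa in seq]  # KeyError on non-standard residues
    n = len(coded)
    # Composition: fraction of residues in each class.
    comp = [coded.count(c) / n for c in range(3)]
    # Transition: fraction of adjacent positions that switch between two classes.
    trans = []
    for a, b in ((0, 1), (0, 2), (1, 2)):
        changes = sum(1 for x, y in zip(coded, coded[1:]) if {x, y} == {a, b})
        trans.append(changes / (n - 1))
    # Distribution: relative position of the first, 25%, 50%, 75% and last
    # occurrence of each class along the sequence.
    dist = []
    for c in range(3):
        positions = [i + 1 for i, x in enumerate(coded) if x == c]
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            if not positions:
                dist.append(0.0)
                continue
            k = max(1, math.ceil(frac * len(positions)))  # k-th occurrence
            dist.append(positions[k - 1] / n)
    return comp + trans + dist

def features_188d(seq):
    """20 composition values followed by 8 x 21 property values."""
    feats = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    for classes in GROUPS.values():
        feats = feats + ctd_features(seq, classes)
    return feats

vec = features_188d("MKTAYIAKQRQISFVKSHFSRQ")  # arbitrary example sequence
print(len(vec))  # prints: 188
```

Each grouping in the table partitions all 20 standard amino acids, so every residue maps to exactly one class per property.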
Classification results of different feature extraction methods on the train dataset.
| Feature extraction methods | SN | SP | ACC | MCC | AUC |
|---|---|---|---|---|---|
| 188D | | 0.917 | 0.897 | 0.795 | 0.964 |
| 188D + CKSAAP (g = 0) | 0.840 | 0.945 | 0.893 | 0.79 | 0.961 |
| 188D + CKSAAP (g = 1) | 0.849 | 0.942 | 0.895 | 0.794 | 0.960 |
| 188D + CKSAAP (g = 2) | 0.833 | 0.947 | 0.890 | 0.785 | 0.961 |
| 188D + CKSAAP (g = 3) | 0.836 | | | | |
| 188D + CKSAAP (g = 4) | 0.833 | 0.945 | 0.889 | 0.783 | 0.961 |
| 188D + CKSAAP (g = 5) | 0.827 | 0.942 | 0.885 | 0.774 | 0.959 |
Bold values indicate the highest value of each indicator.
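The CKSAAP descriptor compared in the table above counts ordered residue pairs separated by a fixed number of intervening residues. The sketch below is a hedged illustration rather than the authors' code: it assumes a single gap value g per feature vector (the source does not say whether gaps 0 through g are concatenated), normalizes counts by the number of pair positions, and ignores pairs containing non-standard residues. The name `cksaap` is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def cksaap(seq, gap):
    """Frequencies of residue pairs separated by `gap` residues (400 values)."""
    counts = dict.fromkeys(PAIRS, 0)
    offset = gap + 1                 # distance between the two paired residues
    total = len(seq) - offset        # number of pair positions in the sequence
    if total <= 0:
        return [0.0] * len(PAIRS)    # sequence too short for this gap
    for i in range(total):
        pair = seq[i] + seq[i + offset]
        if pair in counts:           # skip pairs with non-standard residues
            counts[pair] += 1
    return [counts[p] / total for p in PAIRS]

vec = cksaap("ACAC", gap=1)
print(len(vec), vec[PAIRS.index("AA")], vec[PAIRS.index("CC")])
# prints: 400 0.5 0.5
```

With g = 0 the descriptor reduces to the ordinary dipeptide composition. Concatenating one such 400-dimensional vector with the 188D features would give the combined representations listed in the table, before feature selection.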
Classification results of different under-sampling methods on the train dataset.
| Under-sampling methods | SN | SP | ACC | MCC | AUC |
|---|---|---|---|---|---|
| OSS | 0.919 | 0.802 | 0.872 | 0.732 | 0.951 |
| CNNTomekLink | | 0.631 | 0.868 | 0.641 | 0.892 |
| CPM | 0.944 | 0.576 | 0.836 | 0.584 | 0.892 |
| RUS | 0.836 | 0.960 | 0.898 | | |
| Without under-sampling | 0.79 | | | 0.79 | 0.96 |
Bold values indicate the highest value of each indicator.
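Random under-sampling (RUS in the table above) balances the classes by discarding randomly chosen samples from the majority class. The following is a minimal sketch of that idea applied to the train-set class counts (568 positive, 1242 negative). It is not the authors' implementation; the function name `random_under_sample` and the fixed seed are assumptions made for illustration.

```python
import random

def random_under_sample(samples, labels, seed=0):
    """Randomly drop samples until every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    target = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, target))  # sample without replacement
    kept.sort()                                   # preserve the original order
    return [samples[i] for i in kept], [labels[i] for i in kept]

# Class counts taken from the train dataset: 568 positive, 1242 negative.
labels = [1] * 568 + [0] * 1242
samples = list(range(len(labels)))  # placeholder feature vectors
x_bal, y_bal = random_under_sample(samples, labels)
print(y_bal.count(1), y_bal.count(0))  # prints: 568 568
```

In a cross-validation setting, under-sampling would normally be applied to the training folds only, so that each held-out fold keeps the original class ratio.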
FIGURE 3. The classification results of different classifiers on the train dataset. The SP, ACC, MCC, and AUC of random forest were 0.960, 0.898, 0.802, and 0.964, respectively, much higher than those of the other traditional classifiers. The SN of random forest reached 0.836, only 0.027 lower than the highest SN, which was obtained by the Bagging classifier.
FIGURE 4. The classification results of the state-of-the-art methods on the test dataset. The SN, ACC, and AUC of AOPM were 0.873, 0.920, and 0.972, respectively. These values were much higher than those of AodPred and Zhai, whose SN was higher than 0.99 and 0.204, respectively.
FIGURE 5. The predicted results for the enzymes of antioxidant drugs. The predicted value of protein P00441 reached 0.99, and that of protein P48506 reached 0.79. According to related literature, proteins P00441 and P48506 are the catalytic subunits of superoxide dismutase [Cu-Zn] and glutamate-cysteine ligase, respectively. The predicted values of the remaining proteins are around 0.6.