Mohammad Farhan Khan, Gazal Kalyan, Sohom Chakrabarty, M Mursaleen.
Abstract
The recent rise in COVID-19 cases has put human lives at risk, especially those of people with comorbidities. Most studies from the last few months have raised concerns that hypertensive patients face a greater risk of fatality from COVID-19. Furthermore, a recent WHO report estimated that 1.13 billion people are at risk of hypertension, two-thirds of whom live in low and middle income countries. The gradual escalation of the hypertension problem and the sudden rise of COVID-19 cases have placed an increasing number of lives at risk in those countries. To lower the risk of hypertension, most physicians recommend drugs containing angiotensin-converting enzyme (ACE) inhibitors. However, prolonged use of such drugs is not recommended because of metabolic risks and the increase in the expression of ACE-II, which could facilitate COVID-19 infection. In contrast, the intake of optimal macronutrients is one possible alternative for controlling hypertension naturally. In the present study, a nontrivial feature selection and machine learning algorithm is adopted to predict food-derived antihypertensive peptides. The proposed idea is to reduce the computational cost while retaining the performance of the support vector machine (SVM) by estimating the dominant pattern in the feature space through feature filtering. The proposed feature filtering algorithm achieves a trade-off in performance by reducing the chance of Type I error, which is desirable when recommending dietary food to patients suffering from hypertension. The maximum achievable accuracies of the best-performing SVM models obtained through feature selection are 86.17% and 85.61%.
Keywords: COVID-19; SVM; feature filtration; hypertension; macronutrients
Year: 2022 PMID: 35889751 PMCID: PMC9318145 DOI: 10.3390/nu14142794
Source DB: PubMed Journal: Nutrients ISSN: 2072-6643 Impact factor: 6.706
Comparison of accuracy of machine learning models for antihypertensive peptides database using Bayesian optimisation routine.
| Machine Learning Algorithms | Variants | Accuracy (%) | AUC |
|---|---|---|---|
| Decision trees | Fine | 76.9 | 0.66 |
| | Coarse | 80.6 | 0.65 |
| Logistic regression | - | 80.1 | 0.66 |
| Support vector machine | Linear kernel | 80.1 | 0.63 |
| | Quadratic kernel | 80.4 | 0.66 |
| | Cubic kernel | 77.8 | 0.64 |
| | RBF kernel | 81.0 | 0.68 |
| | Fine | 78.2 | 0.63 |
| | Cosine | 80.7 | 0.66 |
Figure 1. Percent variability explained, or information preserved, by each feature in the feature space X. Variability in the data by considering: (a) PseAAC features; (b) structural features.
Figure 2. Confidence score of features represented as a bar graph: (a) feature importance of PseAAC; (b) feature importance of structural properties. Peaks in the graph represent higher confidence in predicting the most important feature for the classification process.
The p-values of all the features, indicating whether the difference between hypertensive and anti-hypertensive peptides is statistically significant.
| Features | | p-value | Significant |
|---|---|---|---|
| PseAAC | A (alanine) | 0.6881 | No |
| | C (cysteine) | 0.0023 | Yes † |
| | D (aspartic acid) | 0.8265 | No |
| | E (glutamic acid) | 9.2421 × 10 | Yes |
| | F (phenylalanine) | 0.4242 | No |
| | G (glycine) | 4.3718 × 10 | Yes |
| | H (histidine) | 0.4542 | No |
| | I (isoleucine) | 0.8942 | No |
| | K (lysine) | 0.1785 | No |
| | L (leucine) | 0.8502 | No |
| | M (methionine) | 0.9626 | No |
| | N (asparagine) | 0.3234 | No |
| | P (proline) | 0.0873 | No |
| | Q (glutamine) | 0.6676 | No |
| | R (arginine) | 0.1939 | No |
| | S (serine) | 0.3363 | No |
| | T (threonine) | 0.8461 | No |
| | V (valine) | 0.5726 | No |
| | W (tryptophan) | 0.0066 | Yes |
| | Y (tyrosine) | 1.0596 × 10 | Yes |
| | Sequence order effect | 0.0142 | Yes * |
| Structural | Molecular weight | 0.0210 | Yes * |
| | R | 0.0301 | Yes * |
| | | 0.0723 | No |
| | | 0.8902 | No |
| | | 0.0016 | Yes |
| | | 0.3122 | No |
| | Volume | 0.0138 | Yes * |
† Significant for both p = 0.01 and p = 0.05; * significant for p = 0.05 only.
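The significance labels in the table follow from comparing each p-value against the two thresholds named in the footnote. A minimal sketch of that labelling, assuming 0.01 and 0.05 act as significance levels; the helper name `significance_label` is hypothetical, and the statistical test that produced the p-values is not reproduced here:

```python
def significance_label(p_value, strict=0.01, loose=0.05):
    """Label a p-value with the footnote's markers (thresholds are assumed)."""
    if p_value < strict:
        return "Yes †"  # below both 0.01 and 0.05
    if p_value < loose:
        return "Yes *"  # below 0.05 only
    return "No"


# p-values copied from the table above
examples = {
    "A (alanine)": 0.6881,
    "C (cysteine)": 0.0023,
    "Sequence order effect": 0.0142,
    "Molecular weight": 0.0210,
}
for feature, p in examples.items():
    print(f"{feature}: {significance_label(p)}")
```

With these inputs the labels for alanine, cysteine, sequence order effect and molecular weight match the corresponding rows of the table.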
Figure 3. Deviation in the accuracy of the SVM model due to a variation in the feature space for systematic combinations of box constraint and kernel scale. Using feature selection methods, the following features have been extracted for performance comparison: (a) all features (or reference feature space); (b) PseAAC features; (c) structural features; (d) features extracted from MRMR; (e) features extracted from SIDR (); (f) features extracted from SIDR (); (g) features extracted from MRMR ∩ SIDR; and (h) features extracted from MRMR ∪ SIDR.
Estimation of highest accuracy using Bayesian optimisation routine.
| Features | Best Accuracy (%) |
|---|---|
| Reference value (Entire space) | 81.0 |
| PseAAC | 82.6 |
| Structural | 84.5 |
| MRMR | 82.2 |
| SIDR () | 83.5 |
| SIDR () | 85.0 |
| MRMR ∩ SIDR | 83.2 |
| MRMR ∪ SIDR | 84.9 |
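The last two rows combine the feature subsets chosen by the two filters. A minimal sketch of how such combined subsets could be formed, assuming each filter returns a list of selected feature names; the helper `combine_subsets` and the selections below are hypothetical placeholders, not the subsets reported in the study:

```python
def combine_subsets(mrmr_selected, sidr_selected):
    """Return (intersection, union) of two selected-feature lists, sorted."""
    mrmr, sidr = set(mrmr_selected), set(sidr_selected)
    return sorted(mrmr & sidr), sorted(mrmr | sidr)


# Hypothetical selections, for illustration only
mrmr_selected = ["C", "E", "G", "Volume"]
sidr_selected = ["E", "G", "W", "Y"]

both, either = combine_subsets(mrmr_selected, sidr_selected)
print(both)    # ['E', 'G']
print(either)  # ['C', 'E', 'G', 'Volume', 'W', 'Y']
```

The intersection keeps only features both filters agree on, giving a smaller feature space, while the union keeps every feature either filter selected.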
Comparison of highest attainable performance of SVM models using a systematic combination search algorithm.
| Performance | Reference | PseAAC | Structural | MRMR | SIDR () | SIDR () | MRMR ∩ SIDR | MRMR ∪ SIDR |
|---|---|---|---|---|---|---|---|---|
| Accuracy (%) | 84.91 | 85.47 | 85.33 | 84.49 | 84.07 | | 84.07 | |
| AUC | | 0.9769 | 0.9531 | 0.9093 | 0.7118 | 0.8718 | 0.7621 | |
| Sensitivity (%) | 63.15 | 55.17 | 87.50 | 68.18 | | | 73.91 | 80.76 |
| Specificity (%) | 84.02 | 84.19 | 83.38 | 82.56 | 82.45 | | 82.82 | |
| MCC | 0.2880 | 0.2738 | 0.3252 | 0.2233 | 0.2524 | | 0.2551 | |
In each row, the top two performing metrics have been represented in boldface.
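Apart from AUC, the metrics in the table above can all be derived from a binary confusion matrix. A minimal sketch using the standard definitions, assuming the antihypertensive class is treated as positive and both classes are present; the function name `binary_metrics` and the counts are hypothetical and are not taken from the study:

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # 1 - false positive (Type I error) rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only
metrics = binary_metrics(tp=8, fn=2, fp=3, tn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because specificity is one minus the false positive rate, a model with fewer Type I errors scores higher on specificity. AUC is omitted here because it needs the classifier's continuous scores rather than the four counts.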