Bin Yang, Wenzheng Bao, Shichai Hong.
Abstract
Rapid screening and identification of candidate compounds is important for understanding the mechanisms of drugs used to treat Alzheimer's disease (AD) and can greatly promote the development of new drugs. To improve the screening success rate and reduce the cost and workload of research and development, this study proposes a novel Alzheimer-related compound identification algorithm, forgeNet_SVM. First, Alzheimer-related and unrelated compounds are collected from literature databases by data mining. Three molecular descriptors (ECFP6, MACCS, and RDKit) are used to obtain feature sets for the compounds, which are fused into the all_feature set. The all_feature set is input to forgeNet_SVM, in which forgeNet estimates the importance of each feature and selects the important features. The selected features are then input to a support vector machine (SVM) to identify new compounds in Traditional Chinese Medicine (TCM) prescriptions. The experimental results show that the selected feature set performs better than both the all_feature set and the three single feature sets (ECFP6, MACCS, and RDKit). The TPR, FPR, Precision, Specificity, F1, and AUC results show that forgeNet_SVM identifies Alzheimer-related compounds more accurately than other classical classifiers.
Keywords: Alzheimer; data fusion; feature selection; machine learning; network pharmacology; virtual screening
Year: 2022; PMID: 35959292; PMCID: PMC9357977; DOI: 10.3389/fnagi.2022.931729
Source DB: PubMed; Journal: Front Aging Neurosci; ISSN: 1663-4365; Impact factor: 5.702
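The fusion and selection steps described in the abstract can be illustrated with a minimal sketch. It assumes each compound already has three descriptor vectors and that a per-feature importance score is available; in the paper those scores come from forgeNet, which is not implemented here. The function names, the toy vectors, and the importance values below are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence

# Assumed fixed order in which descriptor blocks are concatenated.
DESCRIPTOR_ORDER = ("ECFP6", "MACCS", "RDKit")


def fuse_features(blocks: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate the descriptor blocks into one all_feature vector."""
    fused: List[float] = []
    for name in DESCRIPTOR_ORDER:
        fused.extend(blocks[name])
    return fused


def select_top_k(importance: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k most important features, in ascending order."""
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:k])


def project(vector: Sequence[float], keep: Sequence[int]) -> List[float]:
    """Keep only the selected positions of a fused feature vector."""
    return [vector[i] for i in keep]


# Toy compound with made-up, very short descriptor vectors.
compound = {"ECFP6": [1, 0, 1, 0], "MACCS": [0, 1, 1], "RDKit": [0.5, 2.0]}
all_feature = fuse_features(compound)

# Hypothetical importance scores, one per fused feature (stand-in for forgeNet output).
importance = [0.30, 0.01, 0.05, 0.00, 0.02, 0.20, 0.10, 0.25, 0.07]
keep = select_top_k(importance, k=4)
selected = project(all_feature, keep)
```

The `selected` vectors would then be passed to a classifier such as an SVM; that step is left out so the sketch stays dependency-free.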
Figure 1. The flowchart of the forgeNet algorithm.
Figure 2. Flowchart of Alzheimer-related active compound identification by forgeNet_SVM.
Figure 3. Performances of forgeNet_SVM with different numbers of features.
Figure 4. ROC curves and AUC performances of our method with different feature sets for Alzheimer-related compound identification with Dat1.
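As background for the AUC values reported with the ROC curves, AUC can be read as the probability that a randomly chosen positive compound receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition; the function name and the toy scores are hypothetical and are not taken from the study.

```python
from typing import Sequence


def auc_from_scores(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5. This pairwise form is quadratic in the number of
    samples, which is fine for an illustration.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores: 8 of the 9 positive/negative pairs are ordered correctly.
positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.3, 0.2]
auc = auc_from_scores(positives, negatives)
```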
Performances of our method with different feature sets for Alzheimer-related compound identification with Dat1.
| Feature set | TPR | FPR | Precision | Specificity | F1 |
|---|---|---|---|---|---|
| Selected features | | | | | |
| ECFP6 | 0.829787 | 0.060284 | 0.821053 | 0.939716 | 0.825397 |
| MACCS | 0.882979 | 0.124113 | 0.70339 | 0.875887 | 0.783019 |
| RDKit | 0.882979 | 0.106383 | 0.734513 | 0.893617 | 0.801932 |
| All features | 0.93617 | 0.056738 | 0.846154 | 0.943262 | 0.888889 |
The bold values denote the best performances.
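The metrics in the table are related through the confusion matrix, and a short sketch makes the relationships explicit. The counts used below (78 true positives, 17 false positives, 265 true negatives, 16 false negatives) are an assumption: they are not reported in the source and were chosen only because they reproduce the ECFP6 row to six decimals. The function name is likewise hypothetical.

```python
from typing import Dict


def _ratio(num: float, den: float) -> float:
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else float("nan")


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute TPR, FPR, Precision, Specificity, and F1 from confusion counts."""
    tpr = _ratio(tp, tp + fn)            # true positive rate (recall)
    fpr = _ratio(fp, fp + tn)            # false positive rate
    precision = _ratio(tp, tp + fp)
    specificity = _ratio(tn, tn + fp)    # equals 1 - FPR
    f1 = _ratio(2 * precision * tpr, precision + tpr)
    return {
        "TPR": tpr,
        "FPR": fpr,
        "Precision": precision,
        "Specificity": specificity,
        "F1": f1,
    }


# Assumed counts, consistent with the ECFP6 row for Dat1.
m = classification_metrics(tp=78, fp=17, tn=265, fn=16)
rounded = {name: round(value, 6) for name, value in m.items()}
```

Note that Specificity is always 1 minus FPR, which is why those two columns mirror each other in every row.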
Figure 5. ROC curves and AUC performances of our method with different feature sets for Alzheimer-related compound identification with Dat2.
Performances of our method with different feature sets for Alzheimer-related compound identification with Dat2.
| Feature set | TPR | FPR | Precision | Specificity | F1 |
|---|---|---|---|---|---|
| Selected features | | | | | |
| ECFP6 | 0.678571 | 0.059524 | 0.791667 | 0.940476 | 0.730769 |
| MACCS | 0.821429 | 0.142857 | 0.657143 | 0.857143 | 0.730159 |
| RDKit | 0.857143 | 0.214286 | 0.571429 | 0.785714 | 0.685714 |
| All features | 0.678571 | 0.059524 | 0.791667 | 0.940476 | 0.730769 |
The bold values denote the best performances.
Performances of 15 methods for Alzheimer-related compound identification with Dat1.
| Method | TPR | FPR | Precision | Specificity | F1 | AUC |
|---|---|---|---|---|---|---|
| forgeNet_SVM | 0.946809 | 0.031915 | 0.908163 | 0.968085 | | |
| AdaBoost | 0.914894 | 0.035461 | 0.895833 | 0.964539 | 0.905263 | 0.974083 |
| forgeNet_AdaBoost | 0.914894 | 0.035461 | 0.895833 | 0.964539 | 0.905263 | 0.974083 |
| GBDT | 0.904255 | 0.039007 | 0.885417 | 0.960993 | 0.894737 | 0.981326 |
| forgeNet_GBDT | 0.914894 | 0.028369 | 0.914894 | 0.971631 | 0.914894 | 0.982383 |
| KNN | | 0.77305 | 0.299035 | 0.22695 | 0.459259 | 0.798759 |
| forgeNet_KNN | 0.893617 | 0.028369 | 0.913043 | 0.971631 | 0.903226 | 0.978101 |
| LR | | 0.56383 | 0.369048 | 0.43617 | 0.537572 | 0.942282 |
| forgeNet_LR | 0.93617 | 0.042553 | 0.88 | 0.957447 | 0.907216 | 0.942282 |
| NB | 0.287234 | | | | 0.446281 | 0.643617 |
| forgeNet_NB | 0.946809 | 0.039007 | 0.89 | 0.960993 | 0.917526 | 0.962464 |
| RF | 0.882979 | 0.031915 | 0.902174 | 0.968085 | 0.892473 | 0.98823 |
| forgeNet_RF | 0.904255 | 0.031915 | 0.904255 | 0.968085 | 0.904255 | 0.986457 |
| DT | 0.87234 | 0.109929 | 0.725664 | 0.890071 | 0.792271 | 0.881206 |
| forgeNet_DT | 0.946809 | 0.060284 | 0.839623 | 0.939716 | 0.89 | 0.943262 |
The bold values denote the best performances.
Performances of 15 methods for Alzheimer-related compound identification with Dat2.
| Method | TPR | FPR | Precision | Specificity | F1 | AUC |
|---|---|---|---|---|---|---|
| forgeNet_SVM | 0.964286 | | | | | |
| AdaBoost | 0.357143 | 0.309524 | 0.277778 | 0.690476 | 0.3125 | 0.991071 |
| forgeNet_AdaBoost | 0.892857 | | | | 0.943396 | 0.995748 |
| GBDT | 0.821429 | 0.607143 | 0.310811 | 0.392857 | 0.45098 | 0.997449 |
| forgeNet_GBDT | 0.928571 | | | | 0.962963 | 0.993197 |
| KNN | | | 0.25 | 0 | 0.4 | 0.742347 |
| forgeNet_KNN | 0.892857 | 0.035714 | 0.892857 | 0.964286 | 0.892857 | 0.94494 |
| LR | | 0.678571 | 0.329412 | 0.321429 | 0.495575 | 0.964711 |
| forgeNet_LR | 0.928571 | 0.071429 | 0.8125 | 0.928571 | 0.866667 | 0.985544 |
| NB | 0 | | | 0.5 | | |
| forgeNet_NB | 0.964286 | 0.059524 | 0.84375 | 0.940476 | 0.9 | 0.951743 |
| RF | 0.535714 | 0.130952 | 0.576923 | 0.869048 | 0.555556 | 0.987724 |
| forgeNet_RF | 0.928571 | | | | 0.962963 | 0.996173 |
| DT | 0.857143 | 0.630952 | 0.311688 | 0.369048 | 0.457143 | 0.839286 |
| forgeNet_DT | 0.964286 | 0.011905 | 0.964286 | 0.988095 | 0.964286 | 0.97619 |
The bold values denote the best performances.