Wen-Xing Li, Xin Tong, Peng-Peng Yang, Yang Zheng, Ji-Hao Liang, Gong-Hua Li, Dahai Liu, Dao-Gang Guan, Shao-Xing Dai.
Abstract
Bacterial infection is one of the most important factors affecting human life span. Elderly people are more vulnerable to bacterial infections because of their weakened immunity. With few new antibiotics developed in recent years, bacterial resistance has become an increasingly serious global problem. In this study, an antibacterial compound predictor was constructed using support vector machine and random forest methods, trained on active and inactive antibacterial compounds from the ChEMBL database. Both models showed excellent prediction performance (mean accuracy >0.9 and mean AUC >0.9 for the two models). We used the predictor to screen for potential antibacterial compounds among FDA-approved drugs in the DrugBank database. The screening showed that 1087 small-molecule drugs have potential antibacterial activity; 154 of them are FDA-approved antibacterial drugs, accounting for 76.2% of the approved antibacterial drugs collected in this study. Through molecular fingerprint similarity analysis and common substructure analysis, we identified 8 predicted antibacterial small-molecule compounds whose structures are novel compared with known antibacterial drugs, 5 of which are widely used in the treatment of various tumors. This study provides new insight into predicting antibacterial compounds among approved drugs; the predicted compounds might be used to treat bacterial infections and extend lifespan.
Keywords: antibacterial compound; drug repositioning; machine learning; structural similarity; virtual screening
Year: 2022 PMID: 35150482 PMCID: PMC8876917 DOI: 10.18632/aging.203887
Source DB: PubMed Journal: Aging (Albany NY) ISSN: 1945-4589 Impact factor: 5.682
Figure 1. The prediction accuracy of different machine learning methods on the benchmark datasets. The filtered datasets include one positive dataset and 10 negative datasets; each value in the figure is therefore the average of 10 prediction accuracies. Compared with the other machine learning methods, random forest (RF), support vector machine (SVM), and multi-layer perceptron (MLP) all show higher prediction accuracy. Among all molecular fingerprints, the benchmark dataset based on FP2 fingerprints shows the highest prediction accuracy with the RF and MLP methods and also high prediction accuracy with the SVM method. For the benchmark dataset based on vector features, accuracy fluctuates greatly among the machine learning methods.
Figure 2. Flow chart of the construction of the antibacterial compound prediction model. The benchmark dataset was built from the active and inactive antibacterial compounds downloaded from the ChEMBL and PubChem databases. A combination of the SVM, RF, and MLP methods was used to construct the antibacterial compound predictor, which was then used to predict the antibacterial activity of approved small-molecule drugs from the DrugBank database.
Optimal parameters and prediction performance of different machine learning methods.
| | SVM | RF | MLP |
|---|---|---|---|
| Optimal parameters | gamma: 0.01 | n_estimators: 750 | hidden_layer_sizes: 512 |
| Accuracy | 0.852 ± 0.002 | 0.849 ± 0.004 | 0.847 ± 0.004 |
| Precision | 0.854 ± 0.004 | 0.868 ± 0.004 | 0.850 ± 0.007 |
| Sensitivity | 0.850 ± 0.004 | 0.822 ± 0.007 | 0.845 ± 0.003 |
| Specificity | 0.854 ± 0.005 | 0.875 ± 0.004 | 0.850 ± 0.009 |
| F1 score | 0.852 ± 0.002 | 0.844 ± 0.005 | 0.847 ± 0.003 |
| AUC | 0.926 ± 0.002 | 0.932 ± 0.002 | 0.920 ± 0.002 |
| MSE | 0.148 ± 0.002 | 0.151 ± 0.004 | 0.153 ± 0.004 |
Abbreviations: AUC: area under the curve; MSE: mean squared error. Parameters of predictive performance were displayed as mean ± standard deviation.
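Apart from AUC, which needs the predicted scores, the metrics in the table can be derived from a binary confusion matrix. The following is a minimal sketch rather than the authors' code; the function name `classification_metrics` and the example counts are hypothetical. It also illustrates that, assuming MSE is computed on hard 0/1 predictions, MSE equals 1 − accuracy, which is consistent with the reported values (for example, 0.852 + 0.148 = 1).

```python
def classification_metrics(tp, fp, tn, fn):
    """Return common binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # recall, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # With 0/1 labels and 0/1 predictions, each squared error is 1 for a
    # misclassified compound and 0 otherwise, so MSE is the error rate.
    mse = (fp + fn) / total
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mse": mse,
    }


# Hypothetical counts, chosen only for illustration (not from the study).
metrics = classification_metrics(tp=85, fp=15, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```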
Figure 3. Antibacterial prediction results of approved small-molecule drugs. (A) The probability of predicted antibacterial activity for all small-molecule drugs in the SVM, RF, and MLP models. A drug with a probability value greater than 0.5 is considered an active antibacterial compound. (B) Venn diagram of the predicted antibacterial drugs in three machine learning models. (C) Venn diagram of the predicted antibacterial drugs and FDA-approved antibacterial drugs. (D) The top 20 categories of the 957 predicted novel antibacterial drugs.
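The thresholding and Venn-diagram comparisons described in this caption can be sketched with plain set operations. This is an illustration only: the drug names and probabilities are hypothetical, and taking the intersection of the three models as the consensus is an assumption, since the caption does not state how the models' calls are combined.

```python
THRESHOLD = 0.5  # cutoff stated in the Figure 3 caption


def active_set(probabilities, threshold=THRESHOLD):
    """Return the drugs whose predicted probability exceeds the threshold."""
    return {drug for drug, p in probabilities.items() if p > threshold}


# Hypothetical per-model probabilities (not values from the study).
svm = {"drug_a": 0.91, "drug_b": 0.48, "drug_c": 0.62}
rf = {"drug_a": 0.55, "drug_b": 0.52, "drug_c": 0.51}
mlp = {"drug_a": 0.97, "drug_b": 0.30, "drug_c": 0.58}

active = [active_set(model) for model in (svm, rf, mlp)]

# Centre of a three-way Venn diagram: drugs called active by every model.
# Using the intersection as the final prediction is an assumption.
consensus = set.intersection(*active)

# Drugs called active by at least one model.
any_model = set.union(*active)

# Hypothetical list of already approved antibacterial drugs.
approved = {"drug_c"}
novel = consensus - approved   # predicted active but not yet approved

print(sorted(consensus), sorted(any_model), sorted(novel))
```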
Figure 4. The similarity of the predicted antibacterial drugs and FDA-approved antibacterial drugs. (A) The molecular fingerprint similarity of 957 predicted novel antibacterial drugs and 206 FDA-approved antibacterial drugs. The average similarities between most of the predicted drugs and approved drugs were less than 0.2. (B) Substructure similarity between novel predicted antibacterial drugs and core scaffolds of approved antibacterial drugs. Compounds with an overlap coefficient higher than 0.9 are considered to have high substructure similarity.
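The two similarity measures in this caption can be illustrated on sets of fingerprint bits. The sketch below assumes the fingerprint similarity is a Tanimoto coefficient, which the caption does not state, and applies the overlap coefficient to the same bit sets purely for illustration; in the study it is applied to substructures. All fingerprints shown are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def overlap_coefficient(a, b):
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical on-bit sets for one candidate and three approved drugs.
candidate = {1, 4, 9, 15, 22}
approved = [{1, 4, 9, 15, 22, 30, 41}, {2, 4, 50}, {7, 8}]

sims = [tanimoto(candidate, fp) for fp in approved]
mean_sim = sum(sims) / len(sims)          # average similarity, as in panel A

max_overlap = max(overlap_coefficient(candidate, fp) for fp in approved)
is_similar = max_overlap > 0.9            # cutoff from the panel B caption

print(f"mean Tanimoto = {mean_sim:.3f}, max overlap = {max_overlap:.3f}")
```

The overlap coefficient reaches 1 whenever the smaller set is fully contained in the larger one, which is why it suits containment-style checks against core scaffolds, whereas the Tanimoto coefficient penalizes any difference in size.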
The prediction results of 9 antibacterial drugs with low structural similarities.
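| DrugBank ID | Drug name | SVM probability | RF probability | MLP probability | Structural similarity¹ |
|---|---|---|---|---|---|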
| DB00228 | Enflurane | 0.741 | 0.544 | 0.916 | 0.055 (0.000–0.119) |
| DB00531 | Cyclophosphamide | 0.571 | 0.518 | 0.902 | 0.086 (0.010–0.150) |
| DB00753 | Isoflurane | 0.698 | 0.536 | 0.980 | 0.055 (0.000–0.120) |
| DB00964 | Apraclonidine | 0.514 | 0.514 | 0.501 | 0.093 (0.013–0.198) |
| DB01028 | Methoxyflurane | 0.770 | 0.504 | 0.913 | 0.048 (0.000–0.143) |
| DB01057 | Echothiophate | 0.703 | 0.518 | 0.864 | 0.072 (0.017–0.143) |
| DB01181 | Ifosfamide | 0.589 | 0.515 | 0.888 | 0.095 (0.010–0.172) |
| DB01189 | Desflurane | 0.732 | 0.546 | 0.975 | 0.055 (0.000–0.150) |
| DB01236 | Sevoflurane | 0.538 | 0.517 | 0.934 | 0.060 (0.000–0.162) |
Abbreviations: SVM: support vector machine; RF: random forest; MLP: multi-layer perceptron. ¹ The structural similarities were calculated between the novel predicted antibacterial drugs and FDA-approved antibacterial drugs.