Akanksha Rajput, Manoj Kumar.
Abstract
Ebola virus is a deadly pathogen responsible for a recurring series of outbreaks since 1976. Despite efforts by researchers worldwide, the mortality associated with it remains high. Computational approaches are considered highly useful for antiviral drug discovery. We have therefore developed the 'anti-Ebola' web server, built on quantitative structure-activity relationship (QSAR) information from available molecules with experimentally determined anti-Ebola activity. Three hundred and five unique anti-Ebola compounds with their respective IC50 values were extracted from the 'DrugRepV' database. Molecular descriptors were then computed for these compounds and used for regression-based model development. Three machine learning techniques, namely support vector machine, random forest and artificial neural network, were employed with tenfold cross-validation. After a randomization approach, the best predictive models showed Pearson's correlation coefficients ranging from 0.83 to 0.98 on the training/testing (T274) dataset. The robustness of the developed models was cross-evaluated using Williams plots. The robust computational models are integrated into the web server, which is freely available at https://bioinfo.imtech.res.in/manojk/antiebola . We anticipate that it will help the scientific community develop effective inhibitors against the Ebola virus.
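The tenfold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a plain shuffled split of the 274 training/testing compounds into ten folds; the function name `kfold_indices`, the seed, and the shuffling strategy are illustrative assumptions and are not taken from the authors' pipeline.

```python
import random

def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper: the original study does not describe how
    its folds were drawn, so a simple shuffled split is assumed here.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        start += size
        yield train_idx, test_idx

# T274: the 274 compounds used for training/testing.
folds = list(kfold_indices(274, k=10))
fold_sizes = [len(test_idx) for _, test_idx in folds]
print(fold_sizes)  # [28, 28, 28, 28, 27, 27, 27, 27, 27, 27]
```

Each compound lands in exactly one test fold, so every prediction used to score a model comes from a fold that model was not trained on.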
Keywords: Ebola virus; Machine learning; Prediction algorithm; QSAR; Random forest; Web server
Year: 2021 PMID: 34357513 PMCID: PMC8343361 DOI: 10.1007/s11030-021-10291-7
Source DB: PubMed Journal: Mol Divers ISSN: 1381-1991 Impact factor: 2.943
Fig. 1 Overall methodology used to develop the anti-Ebola predictor
Table showing the performance of the support vector machine, random forest and artificial neural network models for Ebola on the training/testing dataset (T274) and the independent validation dataset (V31)
| Algorithm | Dataset | T274 MAE | T274 RMSE | T274 PCC | V31 MAE | V31 RMSE | V31 PCC |
|---|---|---|---|---|---|---|---|
| SVM | T274 + V31 | 0.33 | 0.47 | 0.83 | 0.48 | 0.66 | 0.65 |
| RF | T274 + V31 | 0.19 | 0.28 | 0.98 | 0.52 | 0.63 | 0.62 |
| ANN | T274 + V31 | 0.23 | 0.29 | 0.95 | 0.76 | 0.97 | 0.64 |
*MAE, mean absolute error; RMSE, root mean square error; PCC, Pearson’s correlation coefficient; SVM, support vector machine; RF, random forest; ANN, artificial neural network
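For readers who want to reproduce the three statistics reported in the table, the following is a minimal sketch of their standard definitions. The activity values in `y_true` and `y_pred` are made-up numbers for illustration only and are not data from the study.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def pcc(y_true, y_pred):
    """Pearson's correlation coefficient."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Hypothetical observed and predicted activities (illustration only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 1.5, 3.5, 3.5]

print(mae(y_true, y_pred))            # 0.5
print(rmse(y_true, y_pred))           # 0.5
print(round(pcc(y_true, y_pred), 3))  # 0.894
```

MAE and RMSE are in the units of the predicted activity and are better when lower, whereas PCC is unitless and better when closer to 1, which is why a model can rank differently on the three columns.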
Fig. 2 Applicability domain of the anti-Ebola compounds presented as Williams plots. a random forest, b support vector machine, c artificial neural network
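A Williams plot places each compound by its leverage (horizontal axis) and its standardized residual (vertical axis). The sketch below shows how those two quantities are commonly computed, simplified to a single descriptor. The warning leverage h* = 3(p + 1)/n and the residual limit of ±3 are widely used conventions assumed here; the record does not state which thresholds the authors applied, and all values and function names are hypothetical.

```python
import math

def leverages_one_descriptor(x):
    """Leverage of each sample for a one-descriptor model with intercept.

    h_i = 1/n + (x_i - mean)^2 / sum_j (x_j - mean)^2
    """
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]

def standardized_residuals(y_true, y_pred):
    """Residuals divided by their sample standard deviation (one convention)."""
    res = [t - p for t, p in zip(y_true, y_pred)]
    n = len(res)
    mean_r = sum(res) / n
    sd = math.sqrt(sum((r - mean_r) ** 2 for r in res) / (n - 1))
    return [r / sd for r in res]

def inside_domain(h, z, n, p=1, z_limit=3.0):
    """True if a point lies within the assumed applicability domain."""
    h_star = 3.0 * (p + 1) / n  # assumed warning leverage
    return h <= h_star and abs(z) <= z_limit

# Hypothetical descriptor values; the last compound is structurally extreme.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
h = leverages_one_descriptor(x)
h_star = 3.0 * (1 + 1) / len(x)  # 0.6 for n = 10, p = 1

# Hypothetical observed and predicted activities for a residual example.
z = standardized_residuals([1.0, 2.0, 3.0, 4.0], [0.0, 3.0, 2.0, 5.0])

print([round(hi, 3) for hi in h], h_star)
print(inside_domain(h[-1], 0.0, n=len(x)))  # False: leverage exceeds h*
```

Compounds to the right of h* are structurally far from the training set, so predictions for them are extrapolations even when their residuals are small.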
Fig. 3 Chemical analysis of anti-Ebola compounds. a Scatter plot showing the diversity of the 305 anti-Ebola compounds, b chemical dendrogram of the anti-Ebola compounds showing the chemical side chain similarity among them