Cristian R Munteanu1,2,3,4, Pablo Gutiérrez-Asorey1, Manuel Blanes-Rodríguez1, Ismael Hidalgo-Delgado1, María de Jesús Blanco Liverio5, Brais Castiñeiras Galdo1,2, Ana B Porto-Pazos1,2, Marcos Gestal1,2,3,4, Sonia Arrasate4,5, Humbert González-Díaz4,5,6,7.
Abstract
The theoretical prediction of drug-decorated nanoparticles (DDNPs) has become a very important task in medical applications. In this paper, Perturbation Theory Machine Learning (PTML) models were built to predict the probability that different pairs of drugs and nanoparticles form DDNP complexes with anti-glioblastoma activity. As inputs, PTML models use the perturbations of the molecular descriptors of drugs and nanoparticles under the experimental conditions. The raw dataset was obtained by mixing the nanoparticle experimental data with drug assays from the ChEMBL database. Ten types of machine learning methods were tested. Only 41 features were selected for 855,129 drug-nanoparticle complexes. The best model was obtained with the Bagging classifier, an ensemble meta-estimator based on 20 decision trees, with an area under the receiver operating characteristic curve (AUROC) of 0.96 and an accuracy of 87% (test subset). This model could be useful for the virtual screening of nanoparticle-drug complexes in glioblastoma. All the calculations can be reproduced with the datasets and Python scripts, which are freely available in a GitHub repository provided by the authors.
Keywords: ChEMBL database; anti-glioblastoma; big data; decorated nanoparticles; drug delivery; machine learning; perturbation theory
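The abstract states that PTML models take perturbations of molecular descriptors under experimental conditions as inputs, but it does not give the formula. A minimal sketch follows, assuming the common PTML convention in which the perturbation is the deviation of a descriptor from its mean over all cases measured under the same condition. The column names, the toy values, and the helper `add_perturbation` are hypothetical and are not taken from the authors' scripts.

```python
import pandas as pd

# Hypothetical toy data: one descriptor value per drug-nanoparticle case and
# the experimental condition (assay label) under which it was measured.
cases = pd.DataFrame({
    "condition": ["assay_A", "assay_A", "assay_B", "assay_B"],
    "descriptor": [1.0, 3.0, 10.0, 14.0],
})

def add_perturbation(df, descriptor_col, condition_col):
    """Append the deviation of a descriptor from its per-condition mean.

    Assumption: perturbation = value - mean(value | same condition).
    """
    out = df.copy()
    condition_mean = out.groupby(condition_col)[descriptor_col].transform("mean")
    out["d_" + descriptor_col] = out[descriptor_col] - condition_mean
    return out

perturbed = add_perturbation(cases, "descriptor", "condition")
print(perturbed)
```

Under this assumption, the same raw descriptor value yields a different input when it is measured under a different condition, which is how the condition enters the model.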
Year: 2021 PMID: 34768951 PMCID: PMC8584266 DOI: 10.3390/ijms222111519
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Baseline classification models for drug-decorated nanoparticle delivery systems against glioblastoma. (We included bold letters to help readers to locate the best values).
| Method | ACC | AUROC | Precision | Recall | f1-Score |
|---|---|---|---|---|---|
| KNeighborsClassifier | 0.7093 | 0.7882 | 0.7121 | 0.7093 | 0.7105 |
| GaussianNB | 0.6553 | 0.6752 | 0.6203 | 0.6553 | 0.5968 |
| LinearDiscriminantAnalysis | 0.7266 | 0.7988 | 0.7220 | 0.7266 | 0.7236 |
| LogisticRegression | 0.7206 | 0.8002 | 0.7150 | 0.7206 | 0.7169 |
| DecisionTreeClassifier | 0.8586 | 0.8544 | 0.8576 | 0.8586 | 0.8580 |
| RandomForestClassifier | 0.7923 | 0.8714 | 0.7943 | 0.7923 | 0.7931 |
| XGBClassifier | 0.7574 | 0.8502 | 0.7566 | 0.7574 | 0.7570 |
| GradientBoostingClassifier | 0.7599 | 0.8526 | 0.7603 | 0.7599 | 0.7601 |
| BaggingClassifier | | | | | |
| AdaBoostClassifier | 0.7175 | 0.8100 | 0.7100 | 0.7175 | 0.7119 |
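
The metrics in the table can be computed with scikit-learn along the following lines. This is a sketch on synthetic data, not the authors' pipeline: the dataset, the split, and the three models chosen are assumptions for illustration. In the table, Recall equals ACC for every listed method, which is what support-weighted averaging produces, so the sketch assumes `average="weighted"`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_recall_fscore_support,
                             roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset (assumption): 41 features, 2 classes.
X, y = make_classification(n_samples=600, n_features=41, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# A subset of the methods in the table; BaggingClassifier uses decision
# trees as its default base estimator.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "BaggingClassifier": BaggingClassifier(n_estimators=20, random_state=0),
}

def evaluate(model):
    """Fit on the training subset and score on the test subset."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of class 1
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="weighted", zero_division=0)
    return {
        "ACC": accuracy_score(y_test, y_pred),
        "AUROC": roc_auc_score(y_test, y_score),
        "Precision": precision,
        "Recall": recall,
        "f1-Score": f1,
    }

results = {name: evaluate(model) for name, model in models.items()}
for name, scores in results.items():
    print(name, {metric: round(value, 4) for metric, value in scores.items()})
```

Weighted recall sums the per-class true positives and divides by the total number of samples, so it always coincides with accuracy.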
Figure 1. Variation of max_sample parameter for the best classifier (Bagging classifier).
Figure 2. Variation of n_estimators parameter for the best classifier (Bagging classifier).
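
A parameter scan of the kind summarized in Figures 1 and 2 could be sketched as below. The data are synthetic and the grids are arbitrary; both are assumptions, as is the helper `sweep`. In scikit-learn the sampling parameter of `BaggingClassifier` is spelled `max_samples`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption).
X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

def sweep(param_name, values, **fixed):
    """Vary one BaggingClassifier parameter and record the test AUROC."""
    scores = {}
    for value in values:
        model = BaggingClassifier(random_state=0, **fixed,
                                  **{param_name: value})
        model.fit(X_train, y_train)
        y_score = model.predict_proba(X_test)[:, 1]
        scores[value] = roc_auc_score(y_test, y_score)
    return scores

# Fraction of training samples drawn for each tree, with 20 trees fixed.
max_samples_scores = sweep("max_samples", [0.1, 0.25, 0.5, 0.75, 1.0],
                           n_estimators=20)
# Number of trees in the ensemble, with each tree drawing a full-size sample.
n_estimators_scores = sweep("n_estimators", [5, 10, 20, 50], max_samples=1.0)

print(max_samples_scores)
print(n_estimators_scores)
```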
Figure 3. The most important features for the best classifier (normalized values).
Figure 4. Accuracy progression with the removal of features with low importance in the best classifier.
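
The analyses behind Figures 3 and 4 can be approximated as follows. `BaggingClassifier` does not expose `feature_importances_` itself, so this sketch averages the importances of its fitted trees; whether the authors computed importance this way is not stated in this record. The synthetic data, the normalization to a sum of 1, and the feature counts tried are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption).
X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = BaggingClassifier(n_estimators=20, random_state=0)
model.fit(X_train, y_train)

# Accumulate each tree's importances onto the original feature indices.
importances = np.zeros(X_train.shape[1])
for tree, features in zip(model.estimators_, model.estimators_features_):
    np.add.at(importances, features, tree.feature_importances_)
importances /= importances.sum()  # normalize so the values sum to 1

# Rank features from most to least important.
order = np.argsort(importances)[::-1]

# Refit with only the top-k features and track test accuracy.
progression = {}
for k in (20, 15, 10, 5):
    keep = order[:k]
    reduced = BaggingClassifier(n_estimators=20, random_state=0)
    reduced.fit(X_train[:, keep], y_train)
    progression[k] = accuracy_score(y_test, reduced.predict(X_test[:, keep]))

print("Top 5 features:", order[:5], importances[order[:5]].round(3))
print("Accuracy by number of features kept:", progression)
```

Plotting `progression` against `k` gives a curve of the same kind as an accuracy-versus-removed-features figure.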
Figure 5. Methodology workflow for building classification models for DDNPs against glioblastoma.