Suryanaman Chaube¹, Sriram Goverapet Srinivasan², Beena Rai¹.
Abstract
Binding affinities of metal-ligand complexes are central to many applications, such as drug design, chelation therapy and the design of reagents for solvent extraction. State-of-the-art molecular modelling approaches are usually employed to gain structural and chemical insight into metal complexation with ligands, but their computational cost and their limited ability to predict metal-ligand stability constants with reasonable accuracy render them impractical for screening large chemical spaces. In this context, learning the metal-binding affinities of ligands from the vast amount of available experimental data is a promising alternative. Here, we develop a machine learning framework for predicting the binding affinities (logK1) of lanthanide cations with several structurally diverse molecular ligands. Six supervised machine learning algorithms, namely Random Forest (RF), k-Nearest Neighbours (KNN), Support Vector Machines (SVM), Kernel Ridge Regression (KRR), Multi-Layer Perceptrons (MLP) and Adaptive Boosting (AdaBoost), were trained on a dataset comprising thousands of experimental logK1 values and validated with an external 10-fold cross-validation procedure. This was followed by thorough feature engineering and feature importance analysis to identify the molecular, metallic and solvent features most relevant to binding affinity prediction, along with an evaluation of the performance metrics as a function of the dimensionality of the feature space. Having demonstrated the excellent predictive ability of our framework, we used the best-performing AdaBoost model to predict the logK1 values of lanthanide cations with nearly 71 million compounds in the PubChem database. Our methodology opens up the opportunity to significantly accelerate the screening and design of ligands from vast chemical spaces for various targeted applications.
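The 10-fold cross-validation protocol mentioned above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code: the helper names (`kfold_indices`, `knn_predict`, `cross_validate`), the use of a plain k-nearest-neighbours regressor as a stand-in for any of the six models, and the toy data are all hypothetical. Each fold is held out once while the model is fitted on the other nine, and the MAE on each held-out fold is collected.

```python
import math
import random


def kfold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices and split them into n_folds near-equal folds.

    Assumes n_samples >= n_folds so that no fold is empty.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        # The first `extra` folds each receive one additional sample.
        size = base + (1 if k < extra else 0)
        folds.append(idx[start:start + size])
        start += size
    return folds


def knn_predict(train_X, train_y, x, k=3):
    """Predict y for x as the mean target of its k nearest training rows."""
    ranked = sorted(
        (math.dist(row, x), target) for row, target in zip(train_X, train_y)
    )
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


def cross_validate(X, y, n_folds=10, k=3, seed=0):
    """Hold out each fold once; return the MAE on every held-out fold."""
    fold_maes = []
    for test_idx in kfold_indices(len(X), n_folds, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        errors = [abs(knn_predict(train_X, train_y, X[i], k) - y[i])
                  for i in test_idx]
        fold_maes.append(sum(errors) / len(errors))
    return fold_maes


# Toy data (hypothetical): 30 samples, two descriptors, a linear target.
X = [[i / 10, (i % 7) / 7] for i in range(30)]
y = [2.0 * a + b for a, b in X]
maes = cross_validate(X, y)
print(f"mean 10-fold MAE: {sum(maes) / len(maes):.3f}")
```

In practice, a library such as scikit-learn (`KFold`, `cross_val_score`) would normally replace these hand-written helpers.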
Year: 2020 PMID: 32868845 PMCID: PMC7459320 DOI: 10.1038/s41598-020-71255-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1: Pearson correlation map depicting the correlations between the features.
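A Pearson correlation map such as Figure 1 is built from pairwise coefficients r = cov(x, y) / (σx σy) between feature columns. The sketch below computes r in plain Python and flags strongly correlated, potentially redundant, feature pairs. The 0.95 threshold, the helper names and the feature columns are illustrative assumptions, not values taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant column has zero variance, which makes r undefined.
    return cov / (norm_x * norm_y)


def highly_correlated(columns, threshold=0.95):
    """Return (name_a, name_b, r) for each feature pair with |r| > threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) > threshold:
                pairs.append((a, b, r))
    return pairs


# Hypothetical descriptor columns, for illustration only.
features = {
    "ionic_radius": [1.03, 1.01, 0.99, 0.98, 0.96],
    "charge_density": [2.91, 2.97, 3.03, 3.06, 3.13],
    "n_donor_atoms": [3, 4, 3, 5, 4],
}
print(highly_correlated(features))
```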
Evaluation of the ten ML models employed in this work.
| Model | Test R² | Test RMSE | Test MAE | Normalization |
|---|---|---|---|---|
| Random forest | 0.97 | 0.94 | 0.44 | Normal quantile |
| KNN | 0.95 | 1.31 | 0.62 | Robust |
| SVM (linear) | 0.80 | 2.64 | 1.80 | Minmax |
| SVR (RBF) | 0.95 | 1.25 | 0.57 | Uniform quantile |
| KRR (linear) | 0.82 | 2.51 | 1.82 | Robust |
| KRR (polynomial) | 0.96 | 1.17 | 0.60 | Uniform quantile |
| KRR (RBF) | 0.96 | 1.17 | 0.53 | Robust |
| KRR (Laplacian) | 0.98 | 0.86 | 0.43 | Uniform quantile |
| MLP | 0.96 | 1.15 | 0.62 | Normal quantile |
| AdaBoost | 0.98 | 0.91 | 0.39 | Normal quantile |
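The three test metrics in the table follow their standard definitions: R² = 1 - SS_res / SS_tot, RMSE = sqrt(mean of squared residuals) and MAE = mean of absolute residuals, with RMSE and MAE expressed in logK1 units. A minimal sketch, using the hypothetical helper name `regression_metrics` and illustrative inputs:

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired experimental and predicted logK1 values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Hypothetical values, for illustration only.
print(regression_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

Because RMSE is never smaller than MAE, the gap between the two is informative: for AdaBoost (0.91 versus 0.39) the RMSE is more than twice the MAE, which suggests that a relatively small number of large errors dominates the RMSE.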
Figure 2: Top 10 highest-ranked descriptors according to three feature importance methods: (a) random forest; (b) permutation importance; and (c) AdaBoost.
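Permutation importance, used in panel (b), is model-agnostic: each feature column is shuffled in turn, and the resulting increase in prediction error is taken as that feature's importance. Panels (a) and (c) presumably use model-internal importances instead. The sketch below is a generic illustration, not the authors' implementation; the helper names, the toy model and the data are hypothetical.

```python
import random


def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean increase in error when each feature column is shuffled.

    predict: callable mapping a list of feature rows to predictions.
    metric:  callable(y_true, y_pred) -> error, where lower is better.
    """
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with only feature j permuted; the rest stay intact.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(metric(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def toy_model(rows):
    """Hypothetical model that uses only feature 0."""
    return [3.0 * row[0] for row in rows]


X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [3.0 * row[0] for row in X]
print(permutation_importance(toy_model, X, y, mae))
```

Because the toy model ignores feature 1, shuffling that column leaves the error unchanged and its importance is exactly zero.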
Figure 3: Distribution of data points in the initial lanthanide dataset by (a) metal cation and (b) range of logK1 values.
Figure 4: Model predictions on the training and test datasets: (a,b) parity plots of predicted versus experimental logK1 values, (c) regression error curve and (d) MAE and RMSE values for individual cations.
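Assuming the regression error curve in panel (c) is a regression error characteristic (REC) curve, it plots, for each error tolerance, the fraction of samples whose absolute prediction error falls within that tolerance. The curve rises faster for better models. A minimal sketch with a hypothetical helper name and illustrative values:

```python
def rec_curve(y_true, y_pred, tolerances):
    """Fraction of samples whose absolute error is within each tolerance."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(errors)
    return [sum(e <= tol for e in errors) / n for tol in tolerances]


# Hypothetical values, for illustration only.
tolerances = [0.0, 0.5, 1.0]
print(rec_curve([1.0, 2.0, 3.0, 4.0], [1.0, 2.5, 3.0, 5.0], tolerances))
# → [0.5, 0.75, 1.0]
```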
Figure 5: Computed error metrics for the training and test datasets as a function of the dimensionality of the descriptor space.
Figure 6: Computed MAE and RMSE in the selectivities of several adjacent lanthanide metal ion pairs.
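The caption does not define selectivity. Assuming the selectivity of an adjacent pair (Ma, Mb) is the difference ΔlogK1 = logK1(Mb) - logK1(Ma) for the same ligand, the error for each pair is |ΔlogK1(predicted) - ΔlogK1(experimental)|, and MAE and RMSE are then taken over pairs. The sketch below follows that assumption, compares only ions that are truly adjacent in the series (so Nd/Sm, which spans Pm, is skipped), and uses hypothetical helper names and illustrative values.

```python
import math

# Lanthanide series in order of atomic number (La to Lu).
LANTHANIDES = ["La", "Ce", "Pr", "Nd", "Pm", "Sm", "Eu", "Gd",
               "Tb", "Dy", "Ho", "Er", "Tm", "Yb", "Lu"]


def selectivity_errors(exp, pred):
    """Absolute error in Delta logK1 for each adjacent pair present in both dicts."""
    errors = {}
    for a, b in zip(LANTHANIDES, LANTHANIDES[1:]):
        if all(m in exp and m in pred for m in (a, b)):
            delta_exp = exp[b] - exp[a]
            delta_pred = pred[b] - pred[a]
            errors[(a, b)] = abs(delta_pred - delta_exp)
    return errors


def mae_rmse(errors):
    """MAE and RMSE over a non-empty collection of selectivity errors."""
    values = list(errors.values())
    mae = sum(values) / len(values)
    rmse = math.sqrt(sum(v * v for v in values) / len(values))
    return mae, rmse


# Hypothetical logK1 values for one ligand, for illustration only.
exp = {"Ce": 4.0, "Pr": 4.2, "Nd": 4.5}
pred = {"Ce": 4.1, "Pr": 4.2, "Nd": 4.4}
errs = selectivity_errors(exp, pred)
print(errs, mae_rmse(errs))
```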
Comparison of experimental versus predicted logK1 values for nitrogen donor ligands.
| Ligand | Cation | Experimental logK1 | Predicted logK1 |
|---|---|---|---|
| ADPTZ | Ce³⁺ | 4.82 | 4.28 |
| | Pr³⁺ | 4.76 | 4.43 |
| | Nd³⁺ | 4.82 | 4.62 |
| | Sm³⁺ | 4.82 | 4.62 |
| | Eu³⁺ | 4.69 | 4.51 |
| | Gd³⁺ | 4.69 | 4.29 |
| | Tb³⁺ | 4.76 | 4.15 |
| | Dy³⁺ | 4.69 | 4.07 |
| | Ho³⁺ | 4.69 | 4.05 |
| | Er³⁺ | 4.69 | 4.1 |
| | Tm³⁺ | 4.62 | 4.23 |
| | Yb³⁺ | 4.69 | 4.3 |
| | Lu³⁺ | 4.74 | 4.4 |
| MePhPTA | Eu³⁺ | 6.7 | 6.95 |
| Phen | Eu³⁺ | 4.84 | 4.23 |
| TERPY | Gd³⁺ | 3.85 | 2.6 |
| | Lu³⁺ | 3.5 | 2.8 |
| | Eu³⁺ | 4.15 | 2.4 |
| Me-BTP | Nd³⁺ | 3.46 | 2.9 |
| | Eu³⁺ | 3.81 | 2.9 |
| PDAM | Ce³⁺ | 5.94 | 4.06 |
| | Pr³⁺ | 5.93 | 4.09 |
| | Nd³⁺ | 6.3 | 4.09 |
| | Sm³⁺ | 6.32 | 4.27 |
| | Eu³⁺ | 6.32 | 4.17 |
| | Gd³⁺ | 6.28 | 4.3 |
| | Tb³⁺ | 6.26 | 3.93 |
| | Dy³⁺ | 6.15 | 4.05 |
| | Ho³⁺ | 4.69 | 3.89 |
| | Er³⁺ | 4.65 | 3.84 |
| | Tm³⁺ | 3.76 | 3.88 |
| | Yb³⁺ | 4.66 | 4.08 |
| | Lu³⁺ | 4.74 | 3.8 |
Carbon, nitrogen, hydrogen and oxygen atoms are shown in cyan, blue, white and red, respectively. The molecular images were generated using the VMD 1.9.3 software (https://www.ks.uiuc.edu/Research/vmd/vmd-1.9.3) [44].