Alexey V Zakharov, Megan L Peach, Markus Sitzmann, Marc C Nicklaus.
Abstract
We describe a novel approach to RBF approximation, which combines two new elements: (1) linear radial basis functions and (2) weighting the model by each descriptor's contribution. Linear radial basis functions allow one to achieve more accurate predictions for diverse data sets. Taking into account the contribution of each descriptor produces more accurate similarity values used for model development. The method was validated on 14 public data sets comprising nine physicochemical properties and five toxicity endpoints. We also compared the new method with five different QSAR methods implemented in the EPA T.E.S.T. program. Our approach, implemented in the program GUSAR, showed a reasonable accuracy of prediction and high coverage for all external test sets, providing more accurate prediction results than the comparison methods and even the consensus of these methods. Using our new method, we have created models for physicochemical and toxicity endpoints, which we have made freely available in the form of an online service at http://cactus.nci.nih.gov/chemical/apps/cap.
Year: 2014 PMID: 24451033 PMCID: PMC3985791 DOI: 10.1021/ci400704f
Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956
Figure 1. Schematic representation of the RBF-SCR radial basis function approach.
Figure 2. Representation of linear and Gaussian functions in the descriptor space.
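The abstract contrasts linear and Gaussian radial basis functions and mentions weighting by descriptor contribution. The following is a minimal, generic sketch of RBF interpolation, not the RBF-SCR algorithm or GUSAR's implementation. The function names, the toy data, and in particular the per-descriptor `weights` applied inside the distance are assumptions made for illustration; the abstract does not specify how descriptor contributions enter the model.

```python
import math


def weighted_distance(x, y, weights):
    """Euclidean distance with one weight per descriptor (hypothetical scheme)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def linear_kernel(r):
    """Linear radial basis function: phi(r) = r."""
    return r


def gaussian_kernel(r, sigma=1.0):
    """Gaussian radial basis function: phi(r) = exp(-(r / sigma)^2)."""
    return math.exp(-((r / sigma) ** 2))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_rbf(points, values, weights, kernel):
    """Find coefficients c such that sum_j c_j * phi(d(x_i, x_j)) = y_i for all i."""
    phi = [[kernel(weighted_distance(p, q, weights)) for q in points] for p in points]
    return solve(phi, values)


def predict_rbf(x, points, coeffs, weights, kernel):
    """Evaluate the fitted interpolant at a new descriptor vector x."""
    return sum(
        c * kernel(weighted_distance(x, p, weights)) for c, p in zip(coeffs, points)
    )


# Toy data: four "compounds" with two descriptors each (made-up numbers).
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [0.0, 1.0, 1.0, 2.0]
weights = [1.0, 1.0]  # equal descriptor weights; purely illustrative

linear_coeffs = fit_rbf(points, values, weights, linear_kernel)
gaussian_coeffs = fit_rbf(points, values, weights, gaussian_kernel)
query = [0.5, 0.5]
print(predict_rbf(query, points, linear_coeffs, weights, linear_kernel))
print(predict_rbf(query, points, gaussian_coeffs, weights, gaussian_kernel))
```

Both kernels reproduce the training values exactly at the training points; they differ in how the prediction behaves between and away from those points, which is the comparison Figure 2 illustrates.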
Comparison of QSAR Models Generated by GUSAR with Different RBF Methods
| | RBF NN (Gaussian functions) | RBF interpolation (linear functions) | RBF-SCR (linear functions) | RBF NN (Gaussian functions) | RBF interpolation (linear functions) | RBF-SCR (linear functions) |
|---|---|---|---|---|---|---|
| activity name | | | | RMSE | RMSE | RMSE |
| boiling point (°C) | 0.84 | 0.95 | 0.95 | 34.63 | 20.11 | |
| density (g/cm³) | 0.93 | 0.97 | 0.97 | 0.08 | 0.06 | |
| flash point (°C) | 0.78 | 0.88 | 39.13 | 28.51 | ||
| thermal conductivity (mW/(m·K)) | 0.85 | 0.90 | 15.39 | 12.20 | ||
| viscosity (log₁₀(cP)) | 0.65 | 0.87 | 0.34 | 0.22 | ||
| surface tension (dyn/cm) | 0.83 | 0.88 | 2.86 | 2.43 | ||
| water solubility (log₁₀(mol/L)) | 0.83 | 0.87 | 0.87 | 0.90 | 0.80 | |
| vapor pressure (log₁₀(mmHg)) | 0.86 | 0.95 | 0.95 | 1.34 | 0.82 | |
| melting point (°C) | 0.77 | 0.86 | 0.86 | 49.22 | 37.97 | |
| Fathead minnow (−log₁₀(LC₅₀)) | 0.67 | 0.73 | 0.84 | 0.76 | 0.76 | |
| 0.57 | 0.60 | 1.16 | 1.10 | |||
| 0.70 | 0.81 | 0.55 | 0.43 | |||
| oral rat acute toxicity (−log₁₀(LD₅₀)) | 0.56 | 0.66 | 0.66 | 0.64 | 0.56 | 0.56 |
| bioconcentration factor (log₁₀(BCF)) | 0.73 | 0.77 | 0.71 | 0.66 | 0.65 |
RBF-SCR is the novel method proposed in this article. Best model parameter for each endpoint is shown in bold.
Comparison of the Results of GUSAR Using RBF-SCR with Those of the T.E.S.T. Program
| | hierarchical, T.E.S.T. | single model, T.E.S.T. | FDA, T.E.S.T. | group contribution, T.E.S.T. | nearest neighbor, T.E.S.T. | T.E.S.T. consensus | RBF-SCR, GUSAR |
|---|---|---|---|---|---|---|---|
| | RMSE | RMSE | RMSE | RMSE | RMSE | RMSE | RMSE |
| endpoint | coverage | coverage | coverage | coverage | coverage | coverage | coverage |
| boiling point (°C) | 18.70 | N/A | 21.43 | 27.55 | 29.97 | 19.40 | |
| 0.935 | N/A | 0.988 | 0.977 | 0.988 | 0.981 | ||
| density (g/cm³) | 0.05 | N/A | 0.06 | 0.12 | 0.12 | 0.07 | 0.05 |
| 0.942 | N/A | 0.992 | 0.992 | 0.997 | 0.996 | ||
| flash point (°C) | 28.90 | N/A | 31.48 | 33.63 | 36.83 | 28.50 | |
| 0.924 | N/A | 0.989 | 0.987 | 0.992 | 0.953 | ||
| thermal conductivity (mW/(m·K)) | 11.02 | 11.86 | 16.41 | 15.90 | 12.83 | 12.41 | |
| 0.956 | 0.956 | 0.967 | 0.911 | 0.978 | 0.967 | ||
| viscosity (log₁₀(cP)) | 0.21 | 0.35 | 0.21 | 0.20 | 0.29 | 0.22 | 0.20 |
| 0.929 | 0.929 | 0.929 | 0.814 | 0.920 | 0.929 | ||
| surface tension (dyn/cm) | N/A | 2.22 | 2.93 | 3.32 | 2.11 | 1.85 | |
| 0.919 | N/A | 0.979 | 0.926 | 0.936 | 0.968 | ||
| water solubility (log₁₀(mol/L)) | 0.90 | N/A | 0.95 | 1.07 | 1.02 | 0.84 | |
| 0.935 | N/A | 0.984 | 0.982 | 0.985 | 0.950 | ||
| vapor pressure (log₁₀(mmHg)) | 0.75 | N/A | 0.83 | 1.00 | 1.25 | 0.77 | |
| 0.940 | N/A | 0.968 | 0.980 | 0.980 | 0.935 | ||
| melting point (°C) | 44.36 | N/A | 45.10 | 54.95 | 52.10 | 41.46 | |
| 0.932 | N/A | 0.997 | 0.998 | 0.998 | 0.979 | ||
| Fathead minnow (−log₁₀(LC₅₀)) | 0.80 | 0.80 | 0.92 | 0.81 | 0.88 | 0.77 | |
| 0.951 | 0.945 | 0.945 | 0.872 | 0.939 | 0.951 | ||
| 0.98 | 0.99 | 1.19 | 0.98 | 0.91 | 1.07 | ||
| 0.886 | 0.871 | 0.900 | 0.657 | 0.871 | 0.900 | ||
| 0.54 | N/A | 0.49 | 0.58 | 0.64 | 0.48 | ||
| 0.933 | N/A | 0.978 | 0.955 | 0.986 | 0.983 | ||
| oral rat acute toxicity (−log₁₀(LD₅₀)) | 0.65 | N/A | 0.66 | N/A | 0.66 | 0.59 | |
| 0.876 | N/A | 0.984 | N/A | 0.984 | 0.960 | ||
| bioconcentration factor (log₁₀(BCF)) | 0.71 | 0.68 | 0.75 | 0.76 | 0.88 | 0.66 | |
| 0.926 | 0.926 | 0.911 | 0.874 | 0.948 | 0.926 |
RMSE: root-mean-square error. The highest coverage and accuracy values are highlighted in bold.
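The two statistics reported per endpoint can be sketched as follows. This is an illustrative computation, not code from GUSAR or T.E.S.T.; it assumes that coverage means the fraction of test-set compounds for which a model returns a prediction, and that RMSE is taken only over those predicted compounds. The function names and the use of `None` for "no prediction" are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error over compounds that received a prediction.

    A predicted value of None marks a compound the model declined to predict
    (assumed convention for this sketch).
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if p is not None]
    if not pairs:
        raise ValueError("no predictions to score")
    return math.sqrt(sum((o - p) ** 2 for o, p in pairs) / len(pairs))


def coverage(predicted):
    """Fraction of compounds for which a prediction was returned."""
    if not predicted:
        raise ValueError("empty prediction list")
    return sum(p is not None for p in predicted) / len(predicted)


# Made-up example: four test compounds, one without a prediction.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.5, None, 3.5]
print(rmse(observed, predicted), coverage(predicted))
```

Reporting the two numbers together matters because a model can lower its RMSE simply by declining to predict difficult compounds, which shows up as reduced coverage.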