Aixia Yan, Zhi Wang, Zongyuan Cai.
Abstract
Quantitative Structure Activity Relationship (QSAR) models for predicting human intestinal absorption (HIA) were built with molecular descriptors calculated by ADRIANA.Code, by Cerius2, and by a combination of the two. A dataset of 552 compounds with experimental HIA values, covering a wide range of current drugs, was investigated. A Genetic Algorithm feature selection method was applied to select suitable descriptors. A Kohonen's self-organizing Neural Network (KohNN) map was used to split the whole dataset into a training set of 380 compounds and a test set of 172 compounds. First, the six selected ADRIANA.Code descriptors and the six selected Cerius2 descriptors were each used as inputs for building quantitative models by Partial Least Squares (PLS) analysis and Support Vector Machine (SVM) regression. Then two further models were built, using PLS and SVM respectively, on nine descriptors selected from the combined ADRIANA.Code and Cerius2 descriptors. For the three SVM models, correlation coefficients (r) of 0.87, 0.89 and 0.88 and standard deviations (s) of 10.98, 9.72 and 9.14 were obtained on the test set.
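The abstract does not say how the KohNN map was turned into a training/test split. One common approach, sketched here purely as an assumption, is to take a share of the compounds from every occupied neuron so that both sets cover the same regions of the map. The function name `split_by_neuron`, the `test_fraction` parameter, and the seeded shuffle are all hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict

def split_by_neuron(neuron_of, test_fraction=0.3, seed=0):
    """Split compounds into (train, test) lists, neuron by neuron.

    neuron_of maps a compound id to the map neuron it landed in,
    e.g. a (row, col) tuple. Assumption: the real procedure may differ.
    """
    rng = random.Random(seed)
    by_neuron = defaultdict(list)
    for compound_id, neuron in neuron_of.items():
        by_neuron[neuron].append(compound_id)

    train, test = [], []
    for neuron in sorted(by_neuron):
        members = sorted(by_neuron[neuron])
        rng.shuffle(members)
        # Keep at least one compound per occupied neuron in the training set.
        n_test = min(len(members) - 1, round(len(members) * test_fraction))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Toy assignment: 20 hypothetical compounds spread over 4 neurons.
toy_map = {i: (i % 4, 0) for i in range(20)}
train_ids, test_ids = split_by_neuron(toy_map)
```

Drawing from each neuron rather than from the pooled dataset keeps structurally similar compounds represented on both sides of the split, which is the usual motivation for a map-based split.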
Keywords: Genetic Algorithm Feature Selection; Human intestinal absorption (HIA); Kohonen’s self-organizing Neural Network (KohNN); Quantitative Structure Activity Relationships (QSAR); Support Vector Machine (SVM)
Year: 2008 PMID: 19325729 PMCID: PMC2635609 DOI: 10.3390/ijms9101961
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. An example of autocorrelation coefficient calculation.
Figure 2. GAPLSOPT(1) test.
Figure 3. GAPLSOPT(2) differences curve.
Figure 4. Selection frequency plot produced by the GAPLS function. Five repetitions were executed to obtain an average result.
Selected descriptors and corresponding coefficients in the Partial Least Squares models. Model 1A was based on six selected ADRIANA.Code descriptors, Model 2A on six selected Cerius2 descriptors, and Model 3A on nine combined descriptors.
| Model 1A descriptor | Model 1A coefficient | Model 2A descriptor | Model 2A coefficient | Model 3A descriptor | Model 3A coefficient |
|---|---|---|---|---|---|
| Nrule5 | 10.3161 | Nrule5 | −10.0335 | Nrule5 | 8.4014 |
| Hdon | 2.8231 | Nrot | 1.4978 | Nrot | 1.2908 |
| LogS | 2.9385 | LogP | 1.4458 | LogP | 1.4358 |
| MW | −0.0194 | Hdon | 2.7628 | Hdon | 2.5400 |
| TPSA | 0.1446 | Jurs-FNSA3 | 85.0957 | Jurs-FNSA3 | 97.3355 |
| Acorr_Sigchg_3 | 14.5617 | Jurs-RPCG | 38.3653 | Jurs-RPCG | 28.8753 |
| | | | | LogS | 1.7446 |
| | | | | MW | −0.0236 |
| | | | | Acorr_Sigchg_3 | 10.2598 |
| | 96.5824 | | 102.393 | | 105.466 |
Jurs-FNSA3 represents fractional charged partial surface areas [37].
Jurs-RPCG represents relative positive charge [37].
Acorr_Sigchg_3 is the third component of the 2D autocorrelation coefficients for σ charge (where d = 2).
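To make the autocorrelation footnote concrete, the sketch below computes a topological (2D) autocorrelation component as the sum of products of an atomic property over all atom pairs separated by exactly d bonds. This is the textbook form of the descriptor; whether ADRIANA.Code counts pairs or scales the result in exactly this way is an assumption. The four-atom chain and its property values are made up for illustration and are not real σ charges.

```python
from collections import deque

def topological_distances(n_atoms, bonds):
    """All-pairs shortest path lengths (in bonds), by BFS from every atom."""
    adjacency = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    dist = [[None] * n_atoms for _ in range(n_atoms)]
    for start in range(n_atoms):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if dist[start][neighbour] is None:
                    dist[start][neighbour] = dist[start][current] + 1
                    queue.append(neighbour)
    return dist

def autocorrelation(props, bonds, d):
    """Sum of p_i * p_j over unordered atom pairs at topological distance d.

    For d == 0 each atom is paired with itself (sum of squares).
    """
    n = len(props)
    if d == 0:
        return sum(p * p for p in props)
    dist = topological_distances(n, bonds)
    return sum(props[i] * props[j]
               for i in range(n) for j in range(i + 1, n)
               if dist[i][j] == d)

# Hypothetical linear chain 0-1-2-3 with made-up property values.
chain_bonds = [(0, 1), (1, 2), (2, 3)]
chain_props = [1, 2, 3, 4]
a2 = autocorrelation(chain_props, chain_bonds, 2)  # pairs (0,2) and (1,3)
```

With components indexed from d = 0, the d = 2 value is the third component, which matches the numbering in the footnote.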
Figure 5. A rectangular KohNN map for 552 compounds obtained with 10 descriptors. ‘low’ denotes compounds with low human intestinal absorption (HIA) in the range [0 ∼ 29%], ‘middle’ denotes compounds with intermediate HIA in the range [30 ∼ 79%], and ‘high’ denotes compounds with high HIA in the range [80 ∼ 100%].
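The three HIA classes used to label the map can be written as a small binning function. The caption gives the ranges in whole percentages, so the handling of values between 29% and 30% (assigned to 'low' here) and between 79% and 80% (assigned to 'middle') is an assumption, as is the function name `hia_class`.

```python
def hia_class(hia_percent):
    """Map an HIA value in percent to 'low', 'middle' or 'high'.

    Boundaries follow the caption: low 0-29, middle 30-79, high 80-100.
    Fractional values between the stated ranges fall into the lower class
    (an assumption; the caption only lists whole percentages).
    """
    if not 0 <= hia_percent <= 100:
        raise ValueError("HIA must be between 0 and 100 percent")
    if hia_percent < 30:
        return "low"
    if hia_percent < 80:
        return "middle"
    return "high"

labels = [hia_class(v) for v in (0, 29, 30, 79, 80, 100)]
```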
Prediction performance of the six models: Partial Least Squares (PLS) models and Support Vector Machine (SVM) models. Model 1A and Model 1B are based on six selected ADRIANA.Code descriptors; Model 2A and Model 2B are based on six selected Cerius2 descriptors; Model 3A and Model 3B are based on nine combined descriptors.
| Model | Method | Training n | Training r | Training s | Test n | Test r | Test s | RMS |
|---|---|---|---|---|---|---|---|---|
| Model 1A | PLS | 380 | 0.72 | 15.10 | 172 | 0.83 | 13.06 | 18.79 |
| Model 1B | SVM | 380 | 0.79 | 13.25 | 172 | 0.87 | 10.98 | 16.68 |
| Model 2A | PLS | 380 | 0.73 | 14.67 | 172 | 0.83 | 13.12 | 18.67 |
| Model 2B | SVM | 380 | 0.80 | 13.40 | 172 | 0.89 | 9.72 | 16.35 |
| Model 3A | PLS | 380 | 0.74 | 14.97 | 172 | 0.83 | 13.36 | 18.18 |
| Model 3B | SVM | 380 | 0.81 | 12.50 | 172 | 0.88 | 9.14 | 16.00 |
| Hou’s model [17] | | 455 | 0.84 | 15.50 | 98 | 0.90 | - | - |
n: number of compounds; r: correlation coefficient; s: standard deviation.
RMS: root-mean-square deviation for the whole model.
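For readers who want to reproduce this kind of summary from their own predictions, the sketch below computes the Pearson correlation coefficient, a residual standard deviation, and a root-mean-square deviation. The record does not give the exact formulas behind s and RMS, so the definitions here (s divides the squared residuals by n − 1, RMS by n) are assumptions, and the three data points are made up.

```python
import math

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

def residual_std(observed, predicted, ddof=1):
    """Standard deviation of residuals; ddof=1 is an assumed convention."""
    sq = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sq / (len(observed) - ddof))

def rms_deviation(observed, predicted):
    """Root-mean-square deviation (squared residuals divided by n)."""
    return residual_std(observed, predicted, ddof=0)

# Made-up HIA values (percent), for illustration only.
obs = [10.0, 50.0, 90.0]
pred = [20.0, 50.0, 80.0]
r_value = pearson_r(obs, pred)
s_value = residual_std(obs, pred)
rms_value = rms_deviation(obs, pred)
```

Note that r measures only linear association: the toy predictions above are a perfect linear function of the observations, so r is 1 even though the residuals are not zero, which is why r and s are reported together.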
Figure 6. Calculated vs. experimental values of human intestinal absorption (HIA) for the corresponding training and test sets of 552 compounds, from the Support Vector Machine (SVM) regression models. Model 1B is based on six selected ADRIANA.Code descriptors, Model 2B on six selected Cerius2 descriptors, and Model 3B on nine combined descriptors.