Xue Xu, Wuxia Zhang, Chao Huang, Yan Li, Hua Yu, Yonghua Wang, Jinyou Duan, Yang Ling.
Abstract
Orally administered drugs must overcome several barriers before reaching their target site. Such barriers depend largely upon specific membrane transport systems and intracellular drug-metabolizing enzymes. For the first time, P-glycoprotein (P-gp) and cytochrome P450s, the main line of defense limiting the oral bioavailability (OB) of drugs, were incorporated into QSAR modeling of human OB based on 805 structurally diverse drug and drug-like molecules. Linear (multiple linear regression, MLR, and partial least squares regression, PLS) and nonlinear (support-vector machine regression, SVR) methods were used to construct the models, and their predictivity was verified with five-fold cross-validation and independent external tests. SVR performs slightly better than MLR and PLS, as indicated by its determination coefficient (R²) of 0.80 and standard error of estimate (SEE) of 0.31 for test sets. MLR and PLS are relatively weak, with prediction abilities of 0.60 and 0.64 for the training set and SEE of 0.40 and 0.31, respectively. Our study indicates that the MLR, PLS and SVR-based in silico models have good potential in facilitating the prediction of oral bioavailability and can be applied in future drug design.
Keywords: P-glycoprotein; cytochrome P4503A4 and P4502D6; oral bioavailability; quantitative structure activity relationship
Year: 2012 PMID: 22837674 PMCID: PMC3397506 DOI: 10.3390/ijms13066964
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
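The abstract states that model predictivity was checked with five-fold cross-validation. The following is a minimal sketch of that procedure only, not the authors' pipeline: it assumes a single synthetic descriptor and a plain least-squares line in place of the MLR, PLS and SVR models, and every function name (`k_fold_indices`, `fit_line`, `cross_validated_rmse`) and data value is hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one descriptor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_rmse(xs, ys, k=5, seed=0):
    """Fit on k-1 folds, score on the held-out fold, and average the RMSE."""
    folds = k_fold_indices(len(xs), k, seed)
    fold_rmse = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        sq_err = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in held_out]
        fold_rmse.append((sum(sq_err) / len(sq_err)) ** 0.5)
    return sum(fold_rmse) / k


# Synthetic data (an assumption, not taken from the study):
# a linear trend with a small alternating perturbation.
xs = [i / 10 for i in range(50)]
ys = [0.5 * x - 1.0 + (0.05 if i % 2 else -0.05) for i, x in enumerate(xs)]

folds = k_fold_indices(len(xs))
cv_rmse = cross_validated_rmse(xs, ys)
```

Each compound is held out exactly once, so the averaged error reflects predictions on data the model did not see during fitting.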
Figure 1. Clustering of the 224 compounds in Set 3 on an 8 × 8 self-organizing map (SOM). The numbers correspond to the series numbers of the compounds. Framed numbers are compounds of the test set, and the others are compounds of the training set.
Figure 2. Experimental and predicted LogB values for Set 1, Set 2, Set 3 and Set 4 using the multiple linear regression (MLR), partial least squares (PLS) and support-vector machine regression (SVR) models. For MLR, the training and test sets are represented by black empty squares and black solid squares, respectively. For PLS, they are represented by red empty circles and red solid circles, and for SVR by blue empty triangles and blue solid triangles.
Statistical results of MLR, PLS and SVR for oral bioavailability (OB) prediction of compounds.
| Set | Training size | Test size | Model | R² | SEE | Q²ex | SEP |
|---|---|---|---|---|---|---|---|
| Set 1 | 156 | 36 | MLR | 0.621 | 0.411 | 0.612 | 0.311 |
| Set 1 | 156 | 36 | PLS | 0.631 | 0.390 | 0.651 | 0.311 |
| Set 1 | 156 | 36 | SVR | 0.800 | 0.311 | 0.720 | 0.220 |
| Set 1 | 156 | 36 | SVMT | 0.840 | - | 0.731 | - |
| Set 2 | 122 | 27 | MLR | 0.521 | 0.400 | 0.541 | 0.482 |
| Set 2 | 122 | 27 | PLS | 0.643 | 0.331 | 0.511 | 0.470 |
| Set 2 | 122 | 27 | SVR | 0.750 | 0.280 | 0.630 | 0.772 |
| Set 2 | 122 | 27 | SVMT | 0.731 | - | 0.310 | - |
| Set 3 | 180 | 44 | MLR | 0.610 | 0.492 | 0.612 | 0.48 |
| Set 3 | 180 | 44 | PLS | 0.561 | 0.500 | 0.561 | 0.521 |
| Set 3 | 180 | 44 | SVR | 0.780 | 0.361 | 0.800 | 0.361 |
| Set 3 | 180 | 44 | SVMT | 0.970 | - | 0.590 | - |
| Set 4 | 197 | 43 | MLR | 0.61 | 0.482 | 0.622 | 0.480 |
| Set 4 | 197 | 43 | PLS | 0.831 | 0.312 | 0.600 | 0.490 |
| Set 4 | 197 | 43 | SVR | 0.690 | 0.421 | 0.682 | 0.461 |
| Set 4 | 197 | 43 | SVMT | 0.990 | - | 0.561 | - |

R², the regression coefficient of the training set; Q²ex, the regression coefficient of the test set; SEE, standard error of estimate; SEP, standard error of prediction; SVMT, the models using all 1536 molecular descriptors as the input variables of SVR; -, not available.
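The statistics named in the table footnote can be illustrated with a short calculation. The record does not give the formulas, so the definitions below are assumptions based on common QSAR conventions: R² as one minus the ratio of residual to total sum of squares, SEE as the residual standard deviation with n - p - 1 degrees of freedom (p descriptors), and SEP as the root-mean-square error on the test set. The function names and the four data points are hypothetical.

```python
import math


def sum_squared_residuals(observed, predicted):
    """Sum of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def r_squared(observed, predicted):
    """Assumed definition: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sum_squared_residuals(observed, predicted) / ss_tot


def standard_error_of_estimate(observed, predicted, n_descriptors):
    """Assumed definition: sqrt(SS_res / (n - p - 1)) on the training set."""
    dof = len(observed) - n_descriptors - 1
    return math.sqrt(sum_squared_residuals(observed, predicted) / dof)


def standard_error_of_prediction(observed, predicted):
    """Assumed definition: sqrt(SS_res / n) on the test set."""
    return math.sqrt(sum_squared_residuals(observed, predicted) / len(observed))


# Hypothetical values, not taken from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(observed, predicted)                          # about 0.98
see = standard_error_of_estimate(observed, predicted, 1)     # about 0.224
sep = standard_error_of_prediction(observed, predicted)      # about 0.158
```

Under these definitions SEE penalizes the number of fitted descriptors while SEP does not, which is one reason the two are reported separately for training and test sets.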
Figure 3. The prediction accuracies of 5-fold cross-validation for the 805 compounds derived from partial least squares analysis, with latent variables varying from 3 to 20 in Set 1, Set 2, Set 3 and Set 4, respectively.
Figure 4. Contour plots of the optimization error for SVR when optimizing the parameters γ and C for the prediction of bioavailability for the training (a) and test (b) sets in Set 1 and Set 2.
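The error surface behind a contour plot like Figure 4 is typically produced by evaluating the model on a grid of (γ, C) pairs. The sketch below shows only that grid-search step under stated assumptions: the grid is log-spaced in powers of two, and `toy_error` is a hypothetical stand-in for the cross-validated SVR error, which in practice would require training a model at each grid point. The grid ranges and all names are illustrative and not taken from the study.

```python
import math


def log_grid(low_exp, high_exp, base=2.0):
    """Values base**low_exp ... base**high_exp, one per integer exponent."""
    return [base ** e for e in range(low_exp, high_exp + 1)]


def grid_search(error_fn, gammas, costs):
    """Evaluate error_fn on every (gamma, C) pair.

    Returns the pair with the lowest error and the full error surface,
    which is the quantity a contour plot would display.
    """
    surface = {}
    for gamma in gammas:
        for cost in costs:
            surface[(gamma, cost)] = error_fn(gamma, cost)
    best = min(surface, key=surface.get)
    return best, surface


def toy_error(gamma, cost):
    """Hypothetical error bowl with its minimum at gamma = 2**-3, C = 2**4."""
    return (math.log2(gamma) + 3) ** 2 + (math.log2(cost) - 4) ** 2


gammas = log_grid(-8, 2)    # assumed search range for gamma
costs = log_grid(-2, 10)    # assumed search range for C
best, surface = grid_search(toy_error, gammas, costs)
```

Searching on a logarithmic grid is a common choice because both parameters usually matter over several orders of magnitude; a finer grid around the best coarse point can follow.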