Maolin Wang, Kai Wang, Aixia Yan, Changyuan Yu.
Abstract
Using a support vector machine (SVM), three classification models were built to predict whether a compound is an active or weakly active inhibitor, based on a dataset of 386 hepatitis C virus (HCV) NS5B polymerase NNIs (non-nucleoside analogue inhibitors) that fit into the pocket of the NNI III binding site. For each molecule, global descriptors and 2D and 3D property autocorrelation descriptors were calculated with the program ADRIANA.Code. Three models were developed from combinations of different descriptor types. Model 2, based on 16 global and 2D autocorrelation descriptors, gave the highest prediction accuracy (88.24%) and MCC (Matthews correlation coefficient, 0.789) on the test set. Model 1, based on 13 global descriptors, showed the highest prediction accuracy (86.25%) and MCC (0.732) on the external test set of 80 compounds. Some molecular properties, such as molecular shape descriptors (InertiaZ, InertiaX and Span), number of rotatable bonds (NRotBond), water solubility (LogS), and hydrogen bonding related descriptors, played important roles in the interactions between the ligand and NS5B polymerase.
Keywords: NS5B polymerase inhibitor; classification models; hepatitis C virus (HCV); support vector machine (SVM)
Year: 2012 PMID: 22605964 PMCID: PMC3344200 DOI: 10.3390/ijms13044033
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
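The abstract names an SVM as the classifier, but this excerpt does not give the kernel, software, or hyperparameters used. Purely as a minimal sketch of the idea, the following trains a linear soft-margin SVM by full-batch subgradient descent on the hinge loss. The two-descriptor toy data, the learning rate `lr`, the regularisation strength `lam`, and the function names are all illustrative assumptions, not values or code from the paper.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Minimise lam/2 * |w|^2 + mean(hinge loss) by subgradient descent.

    Labels must be +1 (active) or -1 (weakly active).
    """
    n, dim = len(X), len(X[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # this sample violates the margin
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 (active) or -1 (weakly active) from the sign of the score."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical, already-scaled descriptor vectors (not data from the paper).
X_toy = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0),
         (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
y_toy = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X_toy, y_toy)
predictions = [predict(w, b, x) for x in X_toy]
```

A kernel SVM from an established library would be the more usual choice in practice; the linear version is shown only because it fits in a few self-contained lines.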
Figure 1. Some representative molecular structures of HCV NS5B NNI III binding site inhibitors.
Table 1. Intercorrelations among the 13 selected global descriptors, and their correlations with activity.
| | Activity | InertiaZ | HAcc | NAtoms | NViolationsRo5 | LogS | InertiaX | Span | HDon | HDon_N | NRotBond | RComplexity | Dipole |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| InertiaZ | 0.540 | 1 | |||||||||||
| HAcc | 0.492 | 0.794 | 1 | ||||||||||
| NAtoms | 0.457 | 0.812 | 0.841 | 1 | |||||||||
| NViolationsRo5 | 0.452 | 0.770 | 0.734 | 0.607 | 1 | ||||||||
| LogS | −0.439 | −0.735 | −0.404 | −0.607 | −0.475 | 1 | |||||||
| InertiaX | 0.426 | 0.830 | 0.720 | 0.822 | 0.673 | −0.721 | 1 | ||||||
| Span | 0.403 | 0.824 | 0.677 | 0.760 | 0.501 | −0.625 | 0.667 | 1 | |||||
| HDon | 0.397 | 0.629 | 0.724 | 0.655 | 0.515 | −0.241 | 0.569 | 0.523 | 1 | ||||
| HDon_N | 0.364 | 0.653 | 0.604 | 0.503 | 0.553 | −0.314 | 0.460 | 0.576 | 0.792 | 1 | |||
| NRotBond | 0.323 | 0.686 | 0.681 | 0.759 | 0.615 | −0.431 | 0.702 | 0.658 | 0.461 | 0.348 | 1 | ||
| RComplexity | 0.298 | 0.280 | 0.426 | 0.497 | −0.007 | −0.246 | 0.266 | 0.483 | 0.366 | 0.259 | 0.077 | 1 | |
| Dipole | 0.296 | 0.303 | 0.416 | 0.464 | 0.067 | −0.306 | 0.322 | 0.534 | 0.288 | 0.237 | 0.179 | 0.794 | 1 |
| Eccentric | 0.168 | 0.275 | 0.139 | −0.039 | 0.189 | −0.018 | −0.241 | 0.267 | 0.089 | 0.374 | −0.045 | 0.009 | −0.008 |
InertiaZ represents the principal component of the inertia tensor in the z-direction; HAcc represents the number of hydrogen bonding acceptors, derived from the sum of nitrogen and oxygen atoms in the molecule; NAtoms represents the number of all atoms in the molecule; NViolationsRo5 represents the number of violations of Lipinski's rule of 5 (Weight > 500, XlogP > 5, HDon > 5, HAcc > 10); LogS represents the solubility of the molecule in water in [log units]; InertiaX represents the principal component of the inertia tensor in the x-direction; Span represents the radius of the smallest sphere centered at the center of mass which completely encloses all atoms in the molecule; HDon represents the number of hydrogen bonding donors, derived from the sum of N-H and O-H groups in the molecule; HDon_N represents the number of N-H hydrogen bonding donor groups in the molecule; NRotBond represents the number of open-chain, single rotatable bonds; RComplexity represents ring complexity according to the approach by J. Gasteiger and C. Jochum [18]; Dipole represents the dipole moment of the molecule in [Debye]; Eccentric represents molecular eccentricity [19].
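The NViolationsRo5 definition above is a simple threshold count, which can be sketched as follows. The function name and the example property values are hypothetical; only the four thresholds come from the footnote.

```python
def n_violations_ro5(weight: float, xlogp: float, hdon: int, hacc: int) -> int:
    """Count how many of the four rule-of-5 thresholds a molecule exceeds.

    Thresholds follow the footnote: Weight > 500, XlogP > 5, HDon > 5, HAcc > 10.
    """
    checks = (weight > 500, xlogp > 5, hdon > 5, hacc > 10)
    return sum(checks)  # True counts as 1, False as 0


# Hypothetical property values for illustration (not a compound from the dataset).
example = n_violations_ro5(weight=532.6, xlogp=5.4, hdon=2, hacc=7)
```

Because the comparisons are strict, a molecule sitting exactly on a threshold (for example, a weight of exactly 500) does not count as a violation.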
Table 2. The correlation coefficients between the 16 selected global and 2D autocorrelation descriptors and the activity.
| Descriptor | Correlation with Activity | Description of Selected Descriptors |
|---|---|---|
| Span | 0.403 | Radius of the smallest sphere centered at the center of mass which completely encloses all atoms in the molecule |
| NRotBond | 0.323 | Number of open-chain, single rotatable bonds |
| LogS | −0.439 | Solubility of the molecule in water in [log units] |
| InertiaZ | 0.540 | Principal component of the inertia tensor in the z-direction |
| InertiaX | 0.426 | Principal component of the inertia tensor in the x-direction |
| 2DACorr_TotChg_11 | −0.277 | The eleventh component of 2D autocorrelation coefficients for σ and π charges, where the distance |
| 2DACorr_TotChg_1 | 0.523 | The first component of 2D autocorrelation coefficients for σ and π charges, where the distance |
| 2DACorr_SigChg_4 | −0.452 | The fourth component of 2D autocorrelation coefficients for σ charge, where the distance |
| 2DACorr_SigChg_3 | 0.272 | The third component of 2D autocorrelation coefficients for σ charge, where the distance |
| 2DACorr_SigChg_2 | −0.249 | The second component of 2D autocorrelation coefficients for σ charge, where the distance |
| 2DACorr_PiChg_10 | 0.326 | The tenth component of 2D autocorrelation coefficients for π charges, where the distance |
| 2DACorr_LpEN_8 | 0.305 | The eighth component of 2D autocorrelation coefficient for lone pair electronegativities, where the distance |
| 2DACorr_LpEN_6 | 0.582 | The sixth component of 2D autocorrelation coefficient for lone pair electronegativities, where the distance |
| 2DACorr_LpEN_4 | 0.198 | The fourth component of 2D autocorrelation coefficient for lone pair electronegativities, where the distance |
| 2DACorr_LpEN_10 | 0.166 | The tenth component of 2D autocorrelation coefficient for lone pair electronegativities, where the distance |
| 2DACorr_Ident_11 | 0.421 | The eleventh component of 2D autocorrelation coefficient for identity, where the distance |
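The tables report a correlation coefficient between each descriptor and the activity, but this excerpt does not say which coefficient was used or how activity was encoded. A minimal sketch, assuming the Pearson coefficient and a 1/0 coding for active versus weakly active compounds (both assumptions), is shown below; the descriptor values are made up for illustration.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical LogS values and activity classes (not data from the paper).
logs_values = [-6.1, -5.8, -5.2, -4.9, -4.0, -3.7]
activity = [1, 1, 1, 0, 0, 0]  # assumed coding: 1 = active, 0 = weakly active
r_logs = pearson_r(logs_values, activity)
```

In this toy data the active compounds have the lower LogS values, so the coefficient is negative, the same sign as the LogS entry in the table.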
Table 3. The correlation coefficients between the 19 selected global and 3D autocorrelation descriptors and the activity.
| Descriptor | Correlation with Activity | Description of Selected Descriptors |
|---|---|---|
| HDon | 0.397 | Number of hydrogen bonding donors derived from the sum of N-H and O-H groups in the molecule |
| HAcc_N | 0.431 | Number of hydrogen bonding acceptors derived from the nitrogen atoms in the molecule |
| HAcc_O | 0.417 | Number of hydrogen bonding acceptors derived from the oxygen atoms in the molecule |
| LogS | −0.439 | Solubility of the molecule in water in [log units] |
| NRotBond | 0.323 | Number of open-chain, single rotatable bonds |
| InertiaX | 0.426 | Principal component of the inertia tensor in the x-direction |
| InertiaZ | 0.540 | Principal component of the inertia tensor in the z-direction |
| Span | 0.403 | Radius of the smallest sphere centered at the center of mass which completely encloses all atoms in the molecule |
| Eccentric | 0.168 | Molecular eccentricity [19] |
| 3DACorr_SigChg_2 | −0.210 | 3D autocorrelation weighted by σ atom charges, where d is in the range of 2–3 Å |
| 3DACorr_SigChg_6 | −0.364 | 3D autocorrelation weighted by σ atom charges, where d is in the range of 6–7 Å |
| 3DACorr_SigChg_7 | 0.345 | 3D autocorrelation weighted by σ atom charges, where d is in the range of 7–8 Å |
| 3DACorr_PiChg_4 | −0.165 | 3D autocorrelation weighted by π atom charges, where d is in the range of 4–5 Å |
| 3DACorr_PiChg_10 | 0.166 | 3D autocorrelation weighted by π atom charges, where d is in the range of 10–11 Å |
| 3DACorr_TotChg_1 | −0.514 | 3D autocorrelation weighted by total atom charges (sum of σ, π charges), where d is in the range of 1–2 Å |
| 3DACorr_TotChg_7 | 0.348 | 3D autocorrelation weighted by total atom charges (sum of σ, π charges), where d is in the range of 7–8 Å |
| 3DACorr_PiEN_7 | 0.436 | 3D autocorrelation weighted by π atom electronegativities, where d is in the range of 7–8 Å |
| 3DACorr_LpEN_5 | 0.413 | 3D autocorrelation weighted by lone pair electronegativities, where d is in the range of 5–6 Å |
| 3DACorr_LpEN_12 | 0.350 | 3D autocorrelation weighted by lone pair electronegativities, where d is in the range of 12–13 Å |
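Each 3D autocorrelation descriptor above sums products of an atomic property over atom pairs whose distance d falls in a given interval. The sketch below shows that general form, assuming an unnormalised sum over distinct atom pairs with a half-open interval; the exact convention used by ADRIANA.Code (normalisation, handling of interval edges) is not stated in this excerpt, and the coordinates and charges are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def autocorr_3d(
    coords: Sequence[Point],
    props: Sequence[float],
    d_lower: float,
    d_upper: float,
) -> float:
    """Sum p_i * p_j over atom pairs with d_lower <= distance < d_upper."""
    total = 0.0
    for i, j in combinations(range(len(coords)), 2):
        if d_lower <= math.dist(coords[i], coords[j]) < d_upper:
            total += props[i] * props[j]
    return total


# Hypothetical 4-atom geometry (Å) and partial charges, for illustration only.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.5, 0.0, 0.0), (0.0, 7.5, 0.0)]
charges = [-0.4, 0.1, 0.2, 0.3]

a_2_3 = autocorr_3d(coords, charges, 2.0, 3.0)  # pairs separated by 2-3 Å
a_7_8 = autocorr_3d(coords, charges, 7.0, 8.0)  # pairs separated by 7-8 Å
```

Swapping `charges` for another atomic property (for example, lone pair electronegativities) gives the other descriptor families in the table under the same assumed form.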
Table 4. Prediction performance of the three SVM models.
| Model | Number of Descriptors | Number of Compounds (Training Set/Test Set) | Training Set Accuracy | Test Set SE | Test Set SP | Test Set Accuracy | Test Set MCC |
|---|---|---|---|---|---|---|---|
| Model 1 | 13 | 266/102 | 87.97% | 97.92% | 61.11% | 78.43% | 0.625 |
| Model 2 | 16 | 266/102 | 95.49% | 100% | 77.78% | 88.24% | 0.789 |
| Model 3 | 19 | 266/102 | 95.11% | 100% | 64.81% | 81.37% | 0.681 |
Model 1 represents the model built with 13 selected global descriptors as shown in Table 1. Model 2 represents the model built with 16 selected global and 2D autocorrelation descriptors as shown in Table 2. Model 3 represents the model built with 19 selected global and 3D autocorrelation descriptors as shown in Table 3;
SE (sensitivity) represents the prediction accuracy of the active inhibitors;
SP (specificity) represents the prediction accuracy of the weakly active inhibitors;
MCC represents Matthews Correlation Coefficient.
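For reference, the standard definitions of these measures can be written out as below, taking active inhibitors as the positive class, as the footnotes imply. The symbols TP, FN, TN and FP (true positives, false negatives, true negatives, false positives) are introduced here for illustration and do not appear in the source.

```latex
\begin{align}
\mathrm{SE} &= \frac{TP}{TP + FN}, \qquad
\mathrm{SP} = \frac{TN}{TN + FP}, \qquad
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \\
\mathrm{MCC} &= \frac{TP \cdot TN - FP \cdot FN}
{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}.
\end{align}
```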
Table 5. Prediction accuracy of the three models on the external test set, which contains 38 active inhibitors and 42 weakly active inhibitors of NS5B polymerase.
| Model | Number of Descriptors | Number of Compounds | SE | SP | Accuracy | MCC |
|---|---|---|---|---|---|---|
| Model 1 | 13 | 80 | 92.11% | 80.95% | 86.25% | 0.732 |
| Model 2 | 16 | 80 | 92.11% | 69.05% | 80.00% | 0.623 |
| Model 3 | 19 | 80 | 65.79% | 54.76% | 60.00% | 0.206 |
Model 1 represents the model built with 13 selected global descriptors as shown in Table 1. Model 2 represents the model built with 16 selected global and 2D autocorrelation descriptors as shown in Table 2. Model 3 represents the model built with 19 selected global and 3D autocorrelation descriptors as shown in Table 3;
SE (sensitivity) represents the prediction accuracy of the active inhibitors;
SP (specificity) represents the prediction accuracy of the weakly active inhibitors;
MCC represents Matthews Correlation Coefficient.
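As a worked check of how the four measures relate, the sketch below recomputes them from confusion-matrix counts. The external test set sizes (38 active, 42 weakly active) are given in the caption; the individual counts used here (35 of 38 actives and 34 of 42 weakly active compounds correctly predicted) are back-calculated from the reported SE and SP of Model 1 and are an inference, not figures stated in the source. The function name is hypothetical.

```python
import math
from typing import Dict


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts.

    Active inhibitors are treated as the positive class.
    """
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "Accuracy": accuracy, "MCC": mcc}


# Counts inferred from Model 1's reported SE and SP on the external test set.
model1_external = classification_metrics(tp=35, fn=3, tn=34, fp=8)
```

Under these inferred counts, the recomputed values round to the SE, SP, accuracy and MCC reported for Model 1 in the table.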