Vinicius Rosa Seus¹, Giovanni Xavier Perazzo¹, Ana T Winck², Adriano V Werhli¹, Karina S Machado¹.
Abstract
Evaluating receptor-ligand interactions is an important step in rational drug design. The databases that provide ligand structures grow daily, which makes it impossible to test every ligand against a target receptor; ligands therefore need to be selected before testing. One possible approach is to evaluate a set of molecular descriptors. To describe the characteristics of promising compounds for a specific receptor, we introduce a data warehouse-based infrastructure to mine molecular descriptors for virtual screening (VS). Our experiments take the HIV-1 protease receptor as the target, together with different compounds for this protein. Nine molecular descriptors are used as predictive attributes and the free energy of binding (FEB) as the target attribute. Applying the J48 algorithm to these data yields decision tree models with up to 84% accuracy. The models indicate which molecular descriptors, and which of their values, are associated with good FEB results. Using the rules of these models, we performed ligand selection on the ZINC database. Our results show a substantial reduction in the number of ligands to be used in VS experiments; for instance, the best selection model picked only 0.21% of all drug-like ligands.
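J48 is Weka's implementation of the C4.5 decision tree learner, which chooses numeric split thresholds such as `MwT > 232.239` by gain ratio. The minimal sketch below shows that core step for a single descriptor, under simplifying assumptions: candidate thresholds are midpoints between consecutive sorted values (J48's exact threshold placement and its extra corrections for numeric attributes may differ), and the helper names `entropy` and `best_threshold` are hypothetical, not part of Weka.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain_ratio) of the best binary split `value <= t` vs `value > t`.

    Simplified C4.5-style search: thresholds are midpoints between
    consecutive distinct sorted values (an assumption of this sketch).
    """
    base = entropy(labels)
    n = len(labels)
    pairs = sorted(zip(values, labels))
    best = (None, 0.0)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        p = i / n
        # Information gain of the split.
        gain = base - p * entropy(left) - (1 - p) * entropy(right)
        # Split information penalises very unbalanced partitions (0 < p < 1 here).
        split_info = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        ratio = gain / split_info
        if ratio > best[1]:
            best = ((lo + hi) / 2, ratio)
    return best


# Hypothetical molecular weights and FEB classes, for illustration only.
print(best_threshold([200, 250, 300, 350], ["low", "low", "good", "good"]))
```

On this toy input the classes separate perfectly between 250 and 300, so the sketch returns the midpoint 275.0 with a gain ratio of 1.0; a full tree learner would apply this search to every descriptor and recurse on each side of the chosen split.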
Year: 2014 PMID: 24812613 PMCID: PMC4000951 DOI: 10.1155/2014/325959
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. The three-dimensional structure of the HIV-1 protease target receptor (PDB code: 1HPV).
Figure 2. Infrastructure to mine molecular descriptors for virtual screening. The structure is composed of five major interacting modules: virtual screening, ligand databases, data warehouse, mining, and ligand selection.
Example of the data mining input file format. Column 1 holds the ligand identifier (not used in the data mining experiments). Columns MwT, logP, HBD, HBA, and so forth correspond to the molecular descriptors of each ligand, which are our predictive attributes. The last column is the target attribute, FEB.
| Ligand | MwT | logP | HBD | HBA | ⋯ | FEB |
|---|---|---|---|---|---|---|
| 1 | 297.44 | 4.61 | 1 | 2 | ⋯ | −8.50 |
| 2 | 348.47 | 3.82 | 2 | 4 | ⋯ | −7.96 |
| ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ |
| | 200.19 | 0.54 | 2 | 5 | ⋯ | −6.89 |
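The input layout above (identifier first, descriptors in the middle, FEB last) maps naturally onto a feature/target split. The sketch below assumes a comma-separated file with a header row; the delimiter and the `load_rows` helper are hypothetical, and only the four descriptors visible in the table are included, so the other five of the nine descriptors are left out.

```python
import csv
import io

# Hypothetical CSV excerpt built from the first two rows of the table;
# the five descriptors hidden behind "⋯" are omitted.
SAMPLE = """Ligand,MwT,logP,HBD,HBA,FEB
1,297.44,4.61,1,2,-8.50
2,348.47,3.82,2,4,-7.96
"""


def load_rows(text):
    """Split each row into a descriptor dict (predictive attributes) and its FEB (target)."""
    features, targets = [], []
    for row in csv.DictReader(io.StringIO(text)):
        row.pop("Ligand")  # identifier only; not used for mining
        targets.append(float(row.pop("FEB")))  # target attribute
        features.append({name: float(value) for name, value in row.items()})
    return features, targets


X, y = load_rows(SAMPLE)
print(X[0], y)
```

Dropping the identifier before training mirrors the note in the caption that Column 1 is not used in the mining experiments.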
Evaluation metric results of the first set of data mining experiments, used to validate the proposed architecture. Columns 1 and 2 describe the decision tree experiment (number of classes and method). Column 3 is the accuracy of the respective decision tree, Column 4 is the size of the tree, Columns 5 and 6 are the RMSE and MAE, and Column 7 is the F-measure of each induced decision tree.
| Classes | Method | Accuracy (%) | Size | RMSE | MAE | F-measure |
|---|---|---|---|---|---|---|
| 2 | 1 | 75 | 11 | 0.45 | 0.27 | 0.75 |
| | 2 | 75 | 11 | 0.45 | 0.27 | 0.75 |
| 3 | 1 | 61.84 | 19 | 0.47 | 0.27 | 0.62 |
| | 2 | 73.32 | 18 | 0.44 | 0.21 | 0.73 |
| 4 | 1 | 58.78 | 17 | 0.36 | 0.25 | 0.59 |
| | 2 | 64.47 | 19 | 0.40 | 0.19 | 0.65 |
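For a classifier, RMSE and MAE are not computed on class labels but, as we understand Weka's convention, on predicted class-probability vectors compared with a one-hot encoding of the true class, averaged over classes and instances; the F-measure is typically reported as a class-frequency-weighted average of per-class F1. The sketch below implements those definitions under that assumption; the `evaluate` helper and the toy labels are hypothetical.

```python
import math


def evaluate(actual, probs, classes):
    """Return (accuracy %, MAE, RMSE, weighted F-measure).

    actual: true labels; probs: one dict per instance mapping label -> probability.
    MAE/RMSE compare probability vectors with one-hot truth (assumed Weka-style).
    """
    n, k = len(actual), len(classes)
    # Predicted label = class with the highest probability.
    predicted = [max(classes, key=lambda c: p[c]) for p in probs]
    accuracy = 100.0 * sum(a == p for a, p in zip(actual, predicted)) / n

    abs_err = sq_err = 0.0
    for a, p in zip(actual, probs):
        for c in classes:
            d = p[c] - (1.0 if c == a else 0.0)
            abs_err += abs(d) / k  # averaged over classes
            sq_err += d * d / k
    mae = abs_err / n
    rmse = math.sqrt(sq_err / n)

    # Per-class F1, weighted by how often each class occurs in `actual`.
    f_weighted = 0.0
    for c in classes:
        tp = sum(1 for a, p in zip(actual, predicted) if a == c and p == c)
        fp = sum(1 for a, p in zip(actual, predicted) if a != c and p == c)
        fn = sum(1 for a, p in zip(actual, predicted) if a == c and p != c)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        f_weighted += f1 * actual.count(c) / n
    return accuracy, mae, rmse, f_weighted


actual = ["good", "good", "poor", "poor"]
probs = [
    {"good": 1.0, "poor": 0.0},
    {"good": 0.75, "poor": 0.25},
    {"good": 0.0, "poor": 1.0},
    {"good": 0.75, "poor": 0.25},
]
print(evaluate(actual, probs, ["good", "poor"]))
```

Because the error terms use probabilities, two trees with identical accuracy (such as the two 2-class rows above) could still differ in RMSE and MAE if their leaf class distributions differ.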
Figure 3. Decision tree induced for the HIV-1 protease with 2 classes, considering 9 molecular descriptors.
Evaluation metric results of the second set of data mining experiments, used to generate rules about the molecular descriptors. Columns 1 and 2 define the characteristics of the decision tree experiment. Columns 3–7 give the resulting metrics for each experiment: accuracy, size, RMSE, MAE, and F-measure, respectively.
| Classes | Method | Accuracy (%) | Size | RMSE | MAE | F-measure |
|---|---|---|---|---|---|---|
| 2 | 1 | 84.15 | 9 | 0.35 | 0.22 | 0.84 |
| | 2 | 84.39 | 7 | 0.34 | 0.20 | 0.84 |
| 3 | 1 | 64.88 | 9 | 0.39 | 0.28 | 0.65 |
| | 2 | 77.81 | 13 | 0.33 | 0.20 | 0.78 |
| 4 | 1 | 58.78 | 17 | 0.36 | 0.25 | 0.59 |
| | 2 | 68.29 | 13 | 0.34 | 0.22 | 0.67 |
Evaluation of the obtained decision trees using the Ordinal Classification Index (OC). Columns 1 and 2 describe the decision tree experiment, and Column 3 gives the OC value for each experiment; lower values indicate a better confusion matrix.
| Classes | Method | OC |
|---|---|---|
| 2 | 1 | 0.26375 |
| | 2 | 0.26051 |
| 3 | 1 | 0.47436 |
| | 2 | 0.31321 |
| 4 | 1 | 0.53126 |
| | 2 | 0.37965 |
Figure 4. Experiment 2: decision tree induced for the HIV-1 protease with 2 classes, using equal-width discretization.
Figure 5. Experiment 2: decision tree induced for the HIV-1 protease with 4 classes, using equal-width discretization.
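Because J48 needs a nominal target, the continuous FEB values must first be discretized into classes; the captions above name equal-width discretization. A minimal sketch, assuming the FEB range observed in the data is split into `k` intervals of identical width (the helper name `equal_width_bins` is hypothetical):

```python
def equal_width_bins(values, k):
    """Assign each value a bin index 0..k-1 using k intervals of equal width.

    Bin 0 holds the lowest values; for FEB, more negative means stronger
    binding, so bin 0 would be the most favourable class (an assumption
    about how bins map to class names such as "Good" or "Excellent").
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = []
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        bins.append(min(idx, k - 1))  # the maximum value belongs to the last bin
    return bins


# FEB values from the example input table, split into 2 classes.
print(equal_width_bins([-8.50, -7.96, -6.89], 2))
```

Equal-width bins are simple to interpret as FEB cut-offs, but with skewed FEB distributions some classes can end up sparsely populated, which is one reason to compare it against an alternative discretization method.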
Molecular descriptor rules of the drug-like subset of the ZINC database. Column 1 lists the molecular descriptors; Columns 2 and 3 give the minimum and maximum values of each descriptor.
| Descriptor | Minimum | Maximum |
|---|---|---|
| Molecular weight (MwT) | 150 | 500 |
| logP | −4 | 5 |
| Number of hydrogen bond donors (HBD) | 0 | 10 |
| Number of hydrogen bond acceptors (HBA) | 0 | 10 |
| Number of rotatable bonds (NRB) | 0 | 8 |
| Apolar desolvation energy (ADE) | −100 | 40 |
| Polar desolvation energy (PDE) | −400 | 1 |
| Total polar surface area (TPSA) | 0 | 150 |
| Charge (Ch) | −5 | 5 |
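The drug-like ranges above act as a simple box filter over the nine descriptors. The sketch below transcribes the table into a dictionary and treats both bounds as inclusive, which is an assumption; ZINC's own subset definition may differ in such details. The descriptor keys and the `is_drug_like` helper are hypothetical.

```python
# Minimum/maximum values transcribed from the drug-like subset table
# (bounds treated as inclusive: an assumption of this sketch).
DRUG_LIKE_RANGES = {
    "MwT": (150, 500),
    "logP": (-4, 5),
    "HBD": (0, 10),
    "HBA": (0, 10),
    "NRB": (0, 8),
    "ADE": (-100, 40),
    "PDE": (-400, 1),
    "TPSA": (0, 150),
    "Ch": (-5, 5),
}


def is_drug_like(ligand):
    """True if every descriptor of `ligand` lies within its [min, max] range."""
    return all(lo <= ligand[name] <= hi for name, (lo, hi) in DRUG_LIKE_RANGES.items())


# MwT, logP, HBD, HBA from ligand 1 of the input table; other values are hypothetical.
ligand = {"MwT": 297.44, "logP": 4.61, "HBD": 1, "HBA": 2,
          "NRB": 4, "ADE": 5.0, "PDE": -20.0, "TPSA": 40.0, "Ch": 0}
print(is_drug_like(ligand))
```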
Ligand selection results using the rules induced by the decision trees. Column 1 identifies the experiment that induced the decision tree, Column 2 gives the extracted rules, Column 3 shows the class of each selected rule, and Column 4 gives the total number of ligands selected by each rule.
| Tree | Rules | Class | Selected ligands |
|---|---|---|---|
| 2 classes | MwT > 232.239 | Good | 1,945,022 |
| | MwT > 277.369 | Good | 4,003,380 |
| 4 classes | MwT > 232.239, | Good | 33,211 |
| | MwT > 277.369 | Good | 947,028 |
| | MwT > 277.369, | Good | 634,513 |
| | MwT > 319.316 | Excellent | 1,043,884 |
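Each rule in the table corresponds to one root-to-leaf path of a decision tree, i.e., a conjunction of descriptor tests, and ligand selection keeps only the ligands that satisfy it. The sketch below encodes a rule as a list of `(descriptor, operator, threshold)` conditions and reports the selected fraction; the encoding and helper names are hypothetical, and the example rule uses only the `MwT > 277.369` condition visible in the table (the full rules may contain further conditions that are not shown).

```python
import operator

# Comparison operators that can appear on decision tree branches.
OPS = {">": operator.gt, "<=": operator.le}


def matches(ligand, conditions):
    """A tree rule is a conjunction of (descriptor, op, threshold) tests."""
    return all(OPS[op](ligand[name], t) for name, op, t in conditions)


def select(ligands, conditions):
    """Return the ligands satisfying the rule and the selected percentage."""
    chosen = [lig for lig in ligands if matches(lig, conditions)]
    return chosen, 100.0 * len(chosen) / len(ligands)


# Molecular weights from the example input table; the rule keeps only MwT > 277.369.
ligands = [{"MwT": 297.44}, {"MwT": 348.47}, {"MwT": 200.19}]
chosen, pct = select(ligands, [("MwT", ">", 277.369)])
print(len(chosen), round(pct, 2))
```

Applied to a full ZINC drug-like download, the same percentage is what the abstract reports as the selection ratio (0.21% for the best model).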