Ana T Winck, Karina S Machado, Osmar Norberto de Souza, Duncan D Ruiz.
Abstract
BACKGROUND: Data preprocessing is a major step in data mining. During preprocessing, several known techniques can be applied, or new ones developed, to improve data quality so that the mining results become more accurate and intelligible. Bioinformatics is one area with a high demand for the generation of comprehensive models from large datasets. In this article, we propose a context-based data preprocessing approach to mine data from molecular docking simulation results. The test cases used a fully-flexible receptor (FFR) model of the Mycobacterium tuberculosis InhA enzyme (FFR_InhA) and four different ligands.
Year: 2013 PMID: 24564276 PMCID: PMC3909228 DOI: 10.1186/1471-2164-14-S6-S6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. 3D grid considering the InhA receptor and the PIF ligand. This 3D grid measures 60.0 Å along each of the x, y and z axes. The distance between adjacent points is 0.375 Å.
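The grid dimensions in the Figure 1 caption imply a point count that can be worked out directly. The following is a minimal sketch, assuming the grid includes points on both faces of the box along each axis (the caption does not say so explicitly); the variable names are illustrative only.

```python
# Grid dimensions taken from the Figure 1 caption.
box_size = 60.0   # Å, edge length along x, y and z
spacing = 0.375   # Å, distance between neighbouring points

# Assumption: points sit on both faces of the box, so each axis holds
# one more point than it has intervals.
intervals_per_axis = round(box_size / spacing)
points_per_axis = intervals_per_axis + 1
total_points = points_per_axis ** 3

print(intervals_per_axis, points_per_axis, total_points)
```

Under that assumption each axis has 160 intervals and 161 points. If the boundary points were excluded, the per-axis count would differ accordingly.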
Range of FEB (kcal/mol) values for each ligand considered.
| Ligand | Min FEB | Max FEB | Avg FEB |
|---|---|---|---|
| NADH | -20.61 | -0.02 | -9.23 |
Number of attributes selected after applying feature selection approaches.
| Ligand | Context-FS Algorithm | CFS | Context-FS Algorithm ∪ CFS |
|---|---|---|---|
| NADH | 84 | 17 | 93 |
| TCL | 106 | 14 | 114 |
| PIF | 104 | 16 | 108 |
| ETH | 105 | 6 | 111 |
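The union column makes it possible to infer how many attributes both approaches selected. The sketch below applies inclusion-exclusion to the counts in the table, assuming the last column is a plain set union of the two selections; the function name `shared_attributes` is hypothetical.

```python
# Attribute counts copied from the table above:
# ligand -> (Context-FS Algorithm, CFS, union of both)
counts = {
    "NADH": (84, 17, 93),
    "TCL": (106, 14, 114),
    "PIF": (104, 16, 108),
    "ETH": (105, 6, 111),
}

def shared_attributes(n_context, n_cfs, n_union):
    """Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|."""
    return n_context + n_cfs - n_union

overlap = {ligand: shared_attributes(*c) for ligand, c in counts.items()}
print(overlap)  # {'NADH': 8, 'TCL': 6, 'PIF': 12, 'ETH': 0}
```

Read this way, the two approaches agree on only a handful of attributes per ligand, and on none for ETH.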
Figure 2. Model tree generated by the M5P algorithm.
Model evaluation predictive measures.
| Ligand | Preprocessing Strategy | Nodes | Correlation | MAE | RMSE |
|---|---|---|---|---|---|
| NADH | 1 | 15 | 0.9536 | 1.0030 | 1.3660 |
| NADH | 2 | 5 | 0.9512 | 1.0189 | 1.4000 |
| NADH | 3 | 6 | 0.9483 | 1.0578 | 1.4396 |
| NADH | 4 | 9 | 0.9513 | 1.0211 | 1.3992 |
| PIF | 1 | 22 | 0.9685 | 0.3077 | 0.4071 |
| PIF | 2 | 19 | 0.9692 | 0.3053 | 0.4022 |
| PIF | 3 | 22 | 0.9653 | 0.3237 | 0.4264 |
| PIF | 4 | 19 | 0.9686 | 0.3067 | 0.4060 |
| TCL | 1 | 12 | 0.9700 | 0.2396 | 0.3108 |
| TCL | 2 | 19 | 0.9708 | 0.2364 | 0.3068 |
| TCL | 3 | 15 | 0.9667 | 0.2508 | 0.3273 |
| TCL | 4 | 24 | 0.9708 | 0.2369 | 0.3069 |
| ETH | 1 | 18 | 0.6086 | 0.2106 | 0.2665 |
| ETH | 2 | 15 | 0.5999 | 0.2123 | 0.2687 |
| ETH | 3 | 16 | 0.5566 | 0.2212 | 0.2790 |
| ETH | 4 | 17 | 0.6047 | 0.2118 | 0.2675 |
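For readers unfamiliar with the three predictive measures, the sketch below computes them from paired actual and predicted values. It assumes the usual definitions (Pearson correlation, mean absolute error, root mean squared error); the record does not spell out the formulas, and the FEB values in the example are hypothetical, not data from the study.

```python
import math

def regression_metrics(actual, predicted):
    """Return (Pearson correlation, MAE, RMSE) for two equal-length lists."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    correlation = cov / math.sqrt(var_a * var_p)
    return correlation, mae, rmse

# Hypothetical FEB values (kcal/mol), for illustration only.
actual = [-9.0, -8.0, -10.0, -7.0]
predicted = [-8.5, -8.5, -9.0, -7.0]

correlation, mae, rmse = regression_metrics(actual, predicted)
print(f"correlation={correlation:.4f} MAE={mae:.4f} RMSE={rmse:.4f}")
```

Because RMSE squares each error before averaging, it is never smaller than MAE, which matches every row of the table above.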
Model evaluation context measures.
| Ligand | Preprocessing Strategy | Precision | Recall | F-score |
|---|---|---|---|---|
| NADH | 1 | 0.1176 | 0.0385 | 0.0580 |
| NADH | 2 | 0.4375 | 0.1346 | 0.2059 |
| NADH | 3 | 0.3636 | 0.0769 | 0.1270 |
| NADH | 4 | 0.1875 | 0.0576 | 0.0882 |
| PIF | 1 | 0.2143 | 0.1731 | 0.1915 |
| PIF | 2 | 0.5294 | 0.3462 | 0.4186 |
| PIF | 3 | 0.4667 | 0.1346 | 0.2090 |
| PIF | 4 | 0.4571 | 0.3076 | 0.3678 |
| TCL | 1 | 0.1282 | 0.0962 | 0.1099 |
| TCL | 2 | 0.4412 | 0.2885 | 0.3488 |
| TCL | 3 | 0.4286 | 0.1154 | 0.1818 |
| TCL | 4 | 0.3928 | 0.2115 | 0.2750 |
| ETH | 1 | 0.3939 | 0.2500 | 0.3059 |
| ETH | 2 | 0.4375 | 0.2692 | 0.3333 |
| ETH | 3 | 0.1250 | 0.0192 | 0.0333 |
| ETH | 4 | 0.4516 | 0.2692 | 0.3373 |
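The F-score column can be cross-checked against the precision and recall columns. The sketch below assumes the F-score is the balanced harmonic mean (F1), which the record does not state explicitly, and recomputes it for every row of the table above.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (ligand, strategy, precision, recall, reported F-score), copied from the table.
rows = [
    ("NADH", 1, 0.1176, 0.0385, 0.0580), ("NADH", 2, 0.4375, 0.1346, 0.2059),
    ("NADH", 3, 0.3636, 0.0769, 0.1270), ("NADH", 4, 0.1875, 0.0576, 0.0882),
    ("PIF", 1, 0.2143, 0.1731, 0.1915), ("PIF", 2, 0.5294, 0.3462, 0.4186),
    ("PIF", 3, 0.4667, 0.1346, 0.2090), ("PIF", 4, 0.4571, 0.3076, 0.3678),
    ("TCL", 1, 0.1282, 0.0962, 0.1099), ("TCL", 2, 0.4412, 0.2885, 0.3488),
    ("TCL", 3, 0.4286, 0.1154, 0.1818), ("TCL", 4, 0.3928, 0.2115, 0.2750),
    ("ETH", 1, 0.3939, 0.2500, 0.3059), ("ETH", 2, 0.4375, 0.2692, 0.3333),
    ("ETH", 3, 0.1250, 0.0192, 0.0333), ("ETH", 4, 0.4516, 0.2692, 0.3373),
]

# Largest difference between the recomputed and the reported F-score.
max_gap = max(abs(f_score(p, r) - f) for _, _, p, r, f in rows)
print(f"largest gap to the reported F-score: {max_gap:.5f}")
```

Every recomputed value agrees with the reported one to within rounding of the four-decimal inputs, which supports the F1 reading of the column.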