Varun Khanna, Shoba Ranganathan.
Abstract
BACKGROUND: Infections due to parasitic nematodes are common causes of morbidity and mortality around the world, especially in developing nations. At present, however, there are only three major classes of drugs for treating human nematode infections. Additionally, the mechanisms of action of these drugs and the reasons for resistance to them are poorly understood. Commercial incentives to design drugs for diseases that are endemic to developing countries are limited; therefore, virtual screening in academic settings can play a vital role in discovering novel drugs useful against neglected diseases. In this study we propose to build a robust machine learning model to classify and screen compounds active against parasitic nematodes.
Year: 2011 PMID: 22373185 PMCID: PMC3278842 DOI: 10.1186/1471-2105-12-S13-S25
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
List of successful targets in helminths and the corresponding drug classes known to be active against those targets.
| S.No | Target | Biochemical class | BLAST score | Drug family |
|---|---|---|---|---|
| 1. | Nicotinic acetylcholine receptor beta 1 | Ion transport | E: 7e-27 | Cholinergic Agents |
| 2. | Glutamate-gated chloride channel | Ion transport | E: e-137 | Macrolides |
| 3. | Glutathione S-transferase | Transferases transferring alkyl or aryl groups | E: 6e-47 | Isoquinolines |
| 4. | Tubulin beta | | E: 0 | Benzimidazoles |
| 5. | Gamma-aminobutyric acid receptor | Chloride channel | | Piperazines |
Figure 1. Examples of active and inactive compounds used in this analysis. The active compounds were collected from various literature sources and the PubChem database, while the inactive compounds were adapted from DrugBank.
Composition of the datasets used in this study.
| Dataset | Training set | Testing set | Total |
|---|---|---|---|
| Active | 126 | 22 | 148 |
| Inactive | 114 | 33 | 147 |
| Prediction set (from ChEMBL) | − | − | 10,000 |
Figure 2. Definition of the scaffold used in this study. The scaffold is obtained by iteratively removing side chains and converting all bonds to single bonds.
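The scaffold definition above can be illustrated on a plain molecular graph. The following is a minimal sketch, not the authors' implementation: it assumes heavy atoms are given as integer ids and bonds as `(atom_a, atom_b, bond_order)` tuples, and it treats "removing side chains" as repeatedly pruning terminal atoms. The function name `scaffold` and the toluene-like example are hypothetical. Under this simplification, exocyclic atoms such as a carbonyl oxygen are also pruned, which may differ from the treatment used in the study.

```python
from collections import defaultdict


def scaffold(atoms, bonds):
    """Return (atoms, bonds) of a scaffold-like graph.

    atoms: iterable of integer atom ids (hydrogens assumed implicit)
    bonds: iterable of (atom_a, atom_b, bond_order) tuples
    """
    neighbours = defaultdict(set)
    for a, b, _order in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    kept = set(atoms)

    # Iteratively strip terminal atoms (degree <= 1). Repeating until no
    # terminal atom remains removes whole side chains and leaves only ring
    # systems and the linkers between them.
    changed = True
    while changed:
        changed = False
        for atom in list(kept):
            if len(neighbours[atom] & kept) <= 1:
                kept.discard(atom)
                changed = True

    # Convert every surviving bond to a single bond (order 1).
    kept_bonds = sorted(
        (min(a, b), max(a, b), 1)
        for a, b, _order in bonds
        if a in kept and b in kept
    )
    return sorted(kept), kept_bonds


# Hypothetical example: a six-membered ring (atoms 0-5) with alternating
# double bonds and one substituent (atom 6) attached to atom 0.
ring_with_substituent = [
    (0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1), (0, 6, 1),
]
scaffold_atoms, scaffold_bonds = scaffold(range(7), ring_with_substituent)
```

In this example the substituent is pruned and the ring survives with all bond orders set to 1. A purely acyclic molecule reduces to an empty graph under this sketch.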
Figure 3. Overall methodology adopted for descriptor selection. Of the 333 MOE descriptors in total, only 14 are used in this analysis.
List of the final 14 descriptors used in this analysis.
| S.No. | Descriptor | Description |
|---|---|---|
| 1. | AM1_HF | The heat of formation (kcal/mol) |
| 2. | AM1_HOMO | The energy (eV) of the Highest Occupied Molecular Orbital |
| 3. | ASA+ | Water accessible surface area of all atoms with positive partial charge |
| 4. | ASA- | Water accessible surface area of all atoms with negative partial charge |
| 5. | ASA_P | Water accessible surface area of all polar atoms |
| 6. | E_ele | Electrostatic component of the potential energy. |
| 7. | KierFlex | Kier molecular flexibility index |
| 8. | LogS | Log of the aqueous solubility (mol/L) |
| 9. | Std_dim3 | The square root of the third largest eigenvalue of the covariance matrix of the atomic coordinates. |
| 10. | Vsurf_CP | Critical packing parameter |
| 11. | Vsurf_CW2 | Capacity factor |
| 12. | Vsurf_D8 | Hydrophobic volume |
| 13. | Vsurf_EWmin | Lowest hydrophilic energy |
| 14. | Vsurf_HB1 | H-bond donor capacity |
All the descriptors are derived from MOE software.
Performance measures of the SVM classifier on the training and test datasets.
| Dataset | SN (%) | SP (%) | BA (%) | F-measure (%) | MCC |
|---|---|---|---|---|---|
| Training set | 87.56 | 85.38 | 86.43 | 86.52 | 0.75 |
| Test set | 83.82 | 79.76 | 81.79 | 79.17 | 0.63 |
SN: sensitivity, SP: specificity, BA: balanced accuracy, MCC: Matthews correlation coefficient
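The measures in the table can be computed from a binary confusion matrix. The sketch below uses the standard textbook definitions, which are assumed (not confirmed by the source) to match those used in the study. The function name `classification_metrics` and the counts in the usage line are hypothetical, not the study's actual confusion matrix. Values are returned as fractions, whereas the table reports the first four as percentages.

```python
import math


def _ratio(numerator, denominator):
    """Safe division: return 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fn, tn, fp):
    """Compute SN, SP, BA, F-measure and MCC from confusion-matrix counts.

    tp / fn: actives predicted active / inactive
    tn / fp: inactives predicted inactive / active
    """
    sn = _ratio(tp, tp + fn)             # sensitivity (recall)
    sp = _ratio(tn, tn + fp)             # specificity
    ba = (sn + sp) / 2                   # balanced accuracy
    precision = _ratio(tp, tp + fp)
    f_measure = _ratio(2 * precision * sn, precision + sn)
    mcc = _ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {"SN": sn, "SP": sp, "BA": ba, "F": f_measure, "MCC": mcc}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=8, fn=2, tn=9, fp=1)
```

Balanced accuracy and MCC are useful here because the active and inactive classes are not the same size, so plain accuracy could be dominated by the larger class.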
The number of unique scaffolds found in active and inactive sets along with the percentage relative to the dataset size.
| Datasets | Size of the dataset | Non-redundant scaffolds | Percentage (relative to dataset size) |
|---|---|---|---|
| Actives | 148 | 48 | 32.43% |
| Inactives | 147 | 80 | 54.42% |
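The percentage column is the number of non-redundant scaffolds divided by the dataset size. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the per-compound scaffold identifiers passed to `count_unique` are assumed to be any hashable representation, such as canonical strings.

```python
def count_unique(scaffold_ids):
    """Number of distinct scaffolds among per-compound scaffold identifiers."""
    return len(set(scaffold_ids))


def scaffold_percentage(n_unique_scaffolds, dataset_size):
    """Unique scaffolds as a percentage of the dataset size, to 2 decimals."""
    return round(100.0 * n_unique_scaffolds / dataset_size, 2)


print(scaffold_percentage(48, 148))  # actives   -> 32.43
print(scaffold_percentage(80, 147))  # inactives -> 54.42
```

A higher percentage means fewer compounds share each scaffold, which is the sense in which the inactive set is the more diverse of the two.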
Figure 4. Top ten scaffolds present in the active and inactive datasets. The inactive dataset is more diverse than the active dataset. Five of the top ten scaffolds are shared by both datasets.
Figure 5. Examples of the actives predicted in the prediction set derived from the ChEMBL database. All the molecules shown in the figure pass the “rule of five” (Ro5) test and are medicinal chemist friendly (MCF). Further, a few of them also pass the lead-likeness “rule of three” (Ro3) test.
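The Ro5 and Ro3 filters mentioned in the caption can be sketched as simple threshold checks on precomputed properties. This is an illustration under stated assumptions, not the filtering code used in the study. The thresholds are the commonly cited ones (Ro5: molecular weight ≤ 500, logP ≤ 5, H-bond donors ≤ 5, H-bond acceptors ≤ 10; Ro3: molecular weight < 300, logP ≤ 3, donors ≤ 3, acceptors ≤ 3). The `Properties` class, the function names, and the example values are hypothetical. Whether any Ro5 violation was tolerated in the study is not stated, so the sketch defaults to none. The MCF filter is not defined in this text and is not implemented.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Precomputed molecular properties (how they are computed is out of scope)."""
    mol_weight: float
    logp: float
    h_donors: int
    h_acceptors: int


def ro5_violations(p):
    """Count violations of the commonly cited rule-of-five thresholds."""
    return sum([
        p.mol_weight > 500,
        p.logp > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ])


def passes_ro5(p, max_violations=0):
    """True if the violation count is within the allowed maximum (assumed 0)."""
    return ro5_violations(p) <= max_violations


def passes_ro3(p):
    """True if all commonly cited rule-of-three thresholds are met."""
    return (
        p.mol_weight < 300
        and p.logp <= 3
        and p.h_donors <= 3
        and p.h_acceptors <= 3
    )


# Hypothetical property values for illustration only.
small = Properties(mol_weight=250.0, logp=2.1, h_donors=1, h_acceptors=3)
medium = Properties(mol_weight=450.0, logp=4.2, h_donors=2, h_acceptors=7)
large = Properties(mol_weight=620.0, logp=6.1, h_donors=6, h_acceptors=12)
```

With these values, `small` passes both filters, `medium` passes Ro5 but not Ro3, and `large` fails both, mirroring the caption's point that only some Ro5-compliant molecules are also lead-like.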