Sonam Gaba, Salma Jamal, Vinod Scaria.
Abstract
Schistosomiasis is a neglected tropical disease caused by the parasite Schistosoma mansoni and affects over 200 million people annually. With the recent emergence of drug resistance, there is an urgent need to discover novel therapeutic options to control the disease. The multifunctional protein thioredoxin glutathione reductase (TGR), an enzyme essential for the survival of the pathogen in the redox environment, has been actively explored as a potential drug target. The recent availability of small-molecule screening datasets against this target provides a unique opportunity to learn molecular properties and apply computational models to discover active molecules in large molecular libraries. Such a prioritisation approach could reduce the cost of failures in lead discovery. A supervised learning approach was employed to develop a cost-sensitive classification model to predict the biological activity of the molecules. Random forest was the best of the classifiers evaluated, with an accuracy of around 80 percent. An independent maximally occurring substructure analysis revealed 10 highly enriched scaffolds in the actives dataset, and docking of the corresponding actives against TGR was also performed. We show that combining machine learning with other cheminformatics approaches, such as substructure comparison and molecular docking, is an efficient way to prioritise molecules from large molecular datasets.
Year: 2014 PMID: 25629082 PMCID: PMC4275605 DOI: 10.1155/2014/957107
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
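The abstract describes a cost-sensitive classification model but does not say how the misclassification costs enter the decision. One common way to make a probabilistic classifier cost-sensitive is to pick the label with the lower expected cost, which lowers the decision threshold when missing an active is expensive. The sketch below illustrates that general idea only; the function names and the default false-positive cost of 1 are assumptions for illustration, not the authors' implementation.

```python
def decision_threshold(cost_fn: float, cost_fp: float = 1.0) -> float:
    """Probability above which calling a molecule 'active' has lower expected cost.

    cost_fn: assumed penalty for missing an active (false negative).
    cost_fp: assumed penalty for a false alarm (false positive).
    """
    return cost_fp / (cost_fp + cost_fn)


def predict_active(p_active: float, cost_fn: float, cost_fp: float = 1.0) -> bool:
    """Minimum-expected-cost decision for one molecule.

    Calling it inactive risks p_active * cost_fn;
    calling it active risks (1 - p_active) * cost_fp.
    """
    return p_active * cost_fn > (1.0 - p_active) * cost_fp


# With equal costs the threshold is the familiar 0.5.
equal_cost_threshold = decision_threshold(1.0)

# A false-negative cost of 10 makes a 10% predicted probability enough to flag a molecule.
flagged = predict_active(0.10, cost_fn=10.0)
```

Raising the false-negative cost trades a higher false-positive rate for better recall of the rare active class, which is the usual motivation for cost-sensitive learning on imbalanced screening data.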
Enriched scaffolds with P < 0.01 and enrichment factor > 10; together they match 184 actives.
| Scaffolds | Matches in actives | Matches in inactives | Actives without match | Inactives without match | Chi-square | P value (mantissa only; exponent not recovered) | Enrichment factor |
|---|---|---|---|---|---|---|---|
| | 13 | 1 | 10722 | 331527 | 370.9611 | 1.16 | 401.48 |
| | 16 | 4 | 10719 | 331524 | 388.9498 | 1.40 | 123.53 |
| | 11 | 4 | 10724 | 331524 | 243.3006 | 7.50 | 84.93 |
| | 20 | 11 | 10715 | 331517 | 384.4568 | 1.33 | 56.15 |
| | 21 | 20 | 10714 | 331508 | 312.045 | 7.83 | 32.43 |
| | 28 | 32 | 10707 | 331496 | 374.2882 | 2.18 | 27.02 |
| | 21 | 27 | 10714 | 331501 | 260.64 | 1.24 | 24.02 |
| | 12 | 21 | 10723 | 331507 | 119.9333 | 6.54 | 17.65 |
| | 12 | 30 | 10723 | 331498 | 89.44623 | 3.15 | 12.35 |
| | 30 | 92 | 10705 | 331436 | 184.8912 | 4.15 | 10.0 |
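The enrichment factor and chi-square columns can be recomputed from the four counts in each row. The sketch below assumes the enrichment factor is the fraction of actives containing the scaffold divided by the fraction of inactives containing it, and that the chi-square is the Pearson statistic for the 2×2 table without continuity correction. The record does not state these formulas; they are inferred because they reproduce the first row (401.48 and 370.96). The function names are illustrative.

```python
def enrichment_factor(match_act: int, match_inact: int,
                      other_act: int, other_inact: int) -> float:
    """Ratio of scaffold frequency in actives to scaffold frequency in inactives."""
    frac_actives = match_act / (match_act + other_act)
    frac_inactives = match_inact / (match_inact + other_inact)
    return frac_actives / frac_inactives


def chi_square_2x2(match_act: int, match_inact: int,
                   other_act: int, other_inact: int) -> float:
    """Pearson chi-square for a 2x2 contingency table (no continuity correction)."""
    a, b, c, d = match_act, match_inact, other_act, other_inact
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# First row of the table: 13 matching actives, 1 matching inactive,
# 10722 actives and 331527 inactives without the scaffold.
first_row = (13, 1, 10722, 331527)
ef = enrichment_factor(*first_row)
chi2 = chi_square_2x2(*first_row)
print(f"enrichment factor = {ef:.2f}, chi-square = {chi2:.2f}")
```

Every row sums to the same class totals (10735 actives and 331528 inactives), so the same two functions apply to each scaffold.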
Comparison of sensitivity, specificity, accuracy, balanced classification rate (BCR), and Matthews correlation coefficient (MCC) for each classifier used in the present study.
| Classifier | Cost | TP rate (%) | FP rate (%) | BCR (%) | MCC |
|---|---|---|---|---|---|
| Naïve Bayes | 10 | 50.3 | 19.1 | 65 | 0.13 |
| Random forest | 860 | 79.4 | 19.1 | 80.1 | 0.25 |
| J48 | 150 | 73.2 | 18.5 | 77.3 | 0.23 |
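The BCR and MCC columns follow from standard definitions, sketched below. The sketch assumes BCR is the mean of sensitivity (TP rate) and specificity (100 minus the FP rate), its usual definition, and that MCC is computed from the confusion-matrix counts. The record does not give the confusion matrices, so the MCC function is shown without tabulated inputs. Function names are illustrative.

```python
import math


def bcr(tp_rate_pct: float, fp_rate_pct: float) -> float:
    """Balanced classification rate in percent: mean of sensitivity and specificity."""
    specificity_pct = 100.0 - fp_rate_pct
    return (tp_rate_pct + specificity_pct) / 2.0


def mcc(tp: float, tn: float, fp: float, fn: float) -> float:
    """Matthews correlation coefficient; returns 0.0 when a marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Random forest row: TP rate 79.4%, FP rate 19.1%.
rf_bcr = bcr(79.4, 19.1)  # about 80.15, in line with the tabulated 80.1 up to rounding
```

BCR and MCC are more informative than plain accuracy on this dataset because inactives outnumber actives by roughly 30 to 1, so a classifier that labels everything inactive would still score a high accuracy.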
Figure 1: Comparison of the performance of the models of naïve Bayes, random forest, and J48 based on (a) sensitivity and specificity and (b) accuracy and BCR.
Figure 2: Comparison of the performance of three classifiers based on ROC (receiver operating characteristics) curve.
Figure 3: Docked molecules. (a) FAD and all the actives (b)–(k) corresponding to 10 enriched scaffolds in the enzyme TGR.