Zhonghua Wang, Lu Liang, Zheng Yin, Jianping Lin.
Abstract
BACKGROUND: In silico target prediction of compounds plays an important role in drug discovery. The chemical similarity ensemble approach (SEA) is a promising method that has been successfully applied in many drug-related studies. Because the approach can be built on different types of molecular fingerprints, a variety of SEA-like models are available. To investigate the influence of training data selection and the complementarity of different models, several SEA models were constructed and tested.
Keywords: Fingerprint; Off-target effect; Similarity; Target identification
Year: 2016 PMID: 27110288 PMCID: PMC4842302 DOI: 10.1186/s13321-016-0130-x
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Fig. 1 Workflow of SEA. Data workflow and simple procedure for building an SEA model
Statistics of the training and test sets
| Data set | Target | Molecule | Ligand-target pair | Active |
|---|---|---|---|---|
| Training set (5) | 2,809 | 393,090 | 666,313 | All |
| Training set (6) | 2,297 | 294,877 | 407,296 | All |
| Training set (7) | 1,711 | 179,710 | 246,651 | All |
| Kinase training set (5) | 429 | 42,164 | 101,502 | All |
| Test set | 1,190 | 26,498 | 80,066 | 37,138 |
| Kinase test | 259 | 2,225 | 3,010 | 2,192 |
The size of 4 training data sets and 2 test sets. Numbers in brackets denote activity thresholds
Predictive results of SEA models with different activity thresholds (P value ≤ 0.05)
| Threshold (μM) | TS | Accuracy | Precision | Sensitivity | Specificity | | |
|---|---|---|---|---|---|---|---|
| 0.1 | 0.69 | 0.568 | 0.958 | 0.072 | 0.997 | 0.278 | 0.557 |
| 1 | 0.69 | 0.592 | 0.94 | 0.129 | 0.993 | 0.417 | 0.687 |
| 10 | 0.62 | 0.676 | 0.826 | 0.382 | 0.93 | 0.67 | 0.772 |
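The accuracy, precision, sensitivity and specificity columns reported in these tables follow the standard confusion-matrix definitions. As a minimal sketch of that arithmetic, the snippet below computes the four values from raw counts. The `Confusion` class, the `metrics` function and the example counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # active ligand-target pairs predicted as active
    fp: int  # inactive pairs predicted as active
    tn: int  # inactive pairs predicted as inactive
    fn: int  # active pairs predicted as inactive


def metrics(c: Confusion) -> dict:
    """Return the four standard classification metrics as fractions."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": c.tp / (c.tp + c.fp),
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
    }


# Hypothetical counts, not data from the paper.
example = metrics(Confusion(tp=40, fp=10, tn=90, fn=60))
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

The sketch assumes every denominator is non-zero; a model that makes no positive predictions would need a guard around the precision term.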
Predictive results of SEA models with different pharmacophore representations of compounds in fingerprints
| Points of pharmacophore | Bin shape | Accuracy | Precision | Sensitivity | Specificity | | |
|---|---|---|---|---|---|---|---|
| 2 | (0,2), (2,5), (5,8) | 0.513 | 0.479 | 0.567 | 0.466 | 0.494 | 0.483 |
| 2, 3 | (0,2), (2,5), (5,8) | 0.642 | 0.678 | 0.436 | 0.821 | 0.61 | 0.657 |
| 2, 3 | (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 20) | 0.667 | 0.752 | 0.42 | 0.88 | 0.65 | 0.719 |
Test results of different SEA models at a significance level of 0.05. The number after “Multi-voting” denotes the voting scheme, e.g. Multi-voting (3) is a 3-vote scheme
| Model | Accuracy | Precision | Sensitivity | Specificity | | |
|---|---|---|---|---|---|---|
| Atom pair | 0.692 | 0.817 | 0.432 | 0.916 | 0.694 | 0.777 |
| MACCS | 0.682 | 0.802 | 0.417 | 0.911 | 0.677 | 0.76 |
| Morgan | 0.676 | 0.826 | 0.382 | 0.93 | 0.67 | 0.773 |
| Topological | 0.682 | 0.837 | 0.39 | 0.934 | 0.681 | 0.784 |
| Pharmacophore | 0.667 | 0.752 | 0.42 | 0.88 | 0.65 | 0.719 |
| Multi-voting (1) | 0.681 | 0.71 | 0.529 | 0.813 | 0.664 | 0.696 |
| Multi-voting (2) | 0.688 | 0.797 | 0.44 | 0.903 | 0.686 | 0.761 |
| Multi-voting (3) | 0.684 | 0.837 | 0.396 | 0.933 | 0.684 | 0.786 |
| Multi-voting (4) | 0.675 | 0.864 | 0.356 | 0.952 | 0.672 | 0.797 |
| Multi-voting (5) | 0.669 | 0.906 | 0.32 | 0.971 | 0.663 | 0.817 |
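One plausible reading of a k-vote scheme is that a ligand-target pair is called positive when at least k of the individual models predict it. That reading is an assumption, although it is consistent with the table, where precision rises and sensitivity falls as the vote number grows. The sketch below illustrates it; `multi_vote`, the model keys and the ligand/target identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (ligand id, target id); identifiers are hypothetical


def multi_vote(predictions: Dict[str, Set[Pair]], min_votes: int) -> Set[Pair]:
    """Keep pairs predicted positive by at least `min_votes` models (assumed rule)."""
    votes = Counter(
        pair for positives in predictions.values() for pair in positives
    )
    return {pair for pair, n in votes.items() if n >= min_votes}


# Toy positive predictions from three hypothetical models.
predictions = {
    "atom_pair": {("L1", "T1"), ("L2", "T1")},
    "maccs": {("L1", "T1")},
    "morgan": {("L1", "T1"), ("L2", "T1"), ("L3", "T2")},
}

one_vote = multi_vote(predictions, 1)    # union of all models
two_vote = multi_vote(predictions, 2)    # agreed on by at least two models
three_vote = multi_vote(predictions, 3)  # agreed on by all three models
```

Under this rule the 1-vote scheme is the union of all positive predictions and the highest vote number is their intersection, which is why raising k trades sensitivity for precision.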
The number of overlaps of true positive predictions of each SEA model
| | Atom pair | MACCS | Morgan | Topological | Pharmacophore |
|---|---|---|---|---|---|
| Atom pair | 16,044 | 13,853 | 13,335 | 13,805 | 13,600 |
| MACCS | 13,853 | 15,478 | 13,010 | 13,191 | 13,084 |
| Morgan | 13,335 | 13,010 | 14,176 | 13,282 | 12,902 |
| Topological | 13,805 | 13,191 | 13,282 | 14,467 | 12,814 |
| Pharmacophore | 13,600 | 13,084 | 12,902 | 12,814 | 15,594 |
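An overlap table of this kind can be produced by intersecting the sets of true positive predictions of each pair of models; the diagonal is then each model's own true positive count and the matrix is symmetric. The sketch below shows that computation on toy data. The `overlap_matrix` function and the integer identifiers standing in for ligand-target pairs are hypothetical.

```python
from itertools import product
from typing import Dict, Hashable, Set, Tuple


def overlap_matrix(
    true_positives: Dict[str, Set[Hashable]]
) -> Dict[Tuple[str, str], int]:
    """Count shared true positive predictions for every ordered pair of models."""
    names = list(true_positives)
    return {
        (a, b): len(true_positives[a] & true_positives[b])
        for a, b in product(names, repeat=2)
    }


# Toy true positive sets; integers stand in for ligand-target pairs.
tp = {
    "atom_pair": {1, 2, 3, 4},
    "maccs": {2, 3, 5},
    "morgan": {3, 4, 5, 6},
}
matrix = overlap_matrix(tp)

# Print the matrix row by row, one row per model.
for a in tp:
    print(a, [matrix[(a, b)] for b in tp])
```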
Fig. 2 The upper plot shows the total number of positive predictions (in red) and true positive predictions (in light blue) for different vote numbers; the lower plot shows the corresponding precision
Fig. 3 Target relation network for kinases built with a kinase-specific SEA model. The nodes represent targets, and the links indicate significant (P value ) relationships predicted by SEA. The nodes are colored according to 9 kinase subfamily types