Jinjian Jiang¹,², Nian Wang¹, Peng Chen³, Jun Zhang⁴, Bing Wang⁵
Abstract
BACKGROUND: Drug-target interaction is key to drug discovery, especially in the design of new lead compounds. However, finding a new lead compound for a specific target is complicated and difficult, and the process is error-prone. Computational techniques are therefore commonly adopted in drug design, as they can save a significant amount of time and cost.
Year: 2017 PMID: 28744468 PMCID: PMC5514335 DOI: 10.1155/2017/6340316
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Details of the drug-target datasets.
| Dataset | Drugs | Targets | Positive pairs | Negative pairs | Total pairs |
|---|---|---|---|---|---|
| Enzymes | 419 | 643 | 2719 | 5438 | 8157 |
| Ion channels | 203 | 198 | 1372 | 2744 | 4116 |
| GPCRs | 217 | 92 | 620 | 1240 | 1860 |
| Nuclear receptors | 53 | 25 | 86 | 172 | 258 |
| In total | 892 | 958 | 4797 | 9594 | 14391§ |
§The total number of drug-target pairs in the four datasets.
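In every dataset the number of negative pairs is exactly twice the number of positive pairs (for enzymes, 2 × 2719 = 5438), and each total is their sum (2719 + 5438 = 8157). This section does not say how the negative pairs were chosen. The sketch below assumes, purely for illustration, that negatives are drawn uniformly at random from drug-target pairs with no known interaction; the function `sample_negatives` and its `ratio` and `seed` parameters are hypothetical.

```python
import random


def sample_negatives(positives, drugs, targets, ratio=2, seed=0):
    """Return ratio * len(positives) distinct drug-target pairs that are not
    in the positive set (hypothetical random negative sampling)."""
    pos = set(positives)
    n_needed = ratio * len(pos)
    # Unobserved pairs available as candidates (assumes the positives use
    # only the listed drugs and targets).
    n_candidates = len(drugs) * len(targets) - len(pos)
    if n_needed > n_candidates:
        raise ValueError("not enough unobserved pairs to sample from")
    rng = random.Random(seed)
    negatives = set()
    # Rejection sampling: redraw whenever the pair is a known interaction.
    while len(negatives) < n_needed:
        pair = (rng.choice(drugs), rng.choice(targets))
        if pair not in pos:
            negatives.add(pair)
    return sorted(negatives)


drugs = [f"D{i}" for i in range(5)]
targets = [f"T{j}" for j in range(4)]
positives = [("D0", "T0"), ("D1", "T2"), ("D3", "T1")]
negatives = sample_negatives(positives, drugs, targets)
print(len(negatives))  # 6
```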
Figure 1. Feature encoding for target proteins. Each target protein sequence is divided into segments, and the amino acid composition of each segment is obtained; for example, residue “L” appears four times in the first segment of the sequence and residue “V” three times in the 10th segment.
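The per-segment composition described in the caption can be sketched as follows. The caption does not say how a sequence is cut into segments; this sketch assumes contiguous segments of near-equal length with boundaries at `i * len(sequence) // n_segments`, and the name `segment_composition` is hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def segment_composition(sequence, n_segments):
    """Split a sequence into n_segments contiguous, near-equal pieces and
    count each standard residue in every piece (other symbols are ignored)."""
    if not 1 <= n_segments <= len(sequence):
        raise ValueError("n_segments must be between 1 and len(sequence)")
    length = len(sequence)
    bounds = [i * length // n_segments for i in range(n_segments + 1)]
    features = []
    for start, end in zip(bounds, bounds[1:]):
        counts = Counter(sequence[start:end])
        # One count per standard residue, always in the same order.
        features.append([counts.get(aa, 0) for aa in AMINO_ACIDS])
    return features


comp = segment_composition("LLLLAAVVVG", 2)
# Segments are "LLLLA" and "AVVVG".
print(comp[0][AMINO_ACIDS.index("L")], comp[1][AMINO_ACIDS.index("V")])  # 4 3
```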
Figure 2. Flowchart of the ensemble system for drug-target prediction, shown for the 7 top principal components. “PCA 1” denotes the protein target features created from the first principal component, and “D1 1 ~ 206” denotes the first drug feature group, descriptors 1 to 206, created by the “PaDEL” software. All feature groups contain roughly the same number of features. Each instance combines one “PCA” feature group with one “PaDEL” feature group, so the 7 target groups and 7 drug groups give 7 × 7 = 49 combinations.
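The feature-subspace layout in Figure 2 can be sketched as follows: split the drug descriptors into near-equal contiguous groups, then pair every target group with every drug group, giving one subspace per base classifier. The total of 1442 drug descriptors used here is a hypothetical value, chosen only so that the first group spans descriptors 1 ~ 206 as in the caption; the function names are also illustrative.

```python
from itertools import product


def split_into_groups(n_features, n_groups):
    """Partition feature indices 0..n_features-1 into n_groups contiguous
    ranges whose sizes differ by at most one."""
    base, extra = divmod(n_features, n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)
        groups.append(range(start, start + size))
        start += size
    return groups


def feature_subspaces(n_target_groups, n_drug_features, n_drug_groups):
    """Pair every target feature group (one per principal component) with
    every drug descriptor group."""
    drug_groups = split_into_groups(n_drug_features, n_drug_groups)
    return list(product(range(n_target_groups), drug_groups))


subspaces = feature_subspaces(7, 1442, 7)  # 1442 is a hypothetical total
print(len(subspaces))  # 49
print(subspaces[0])    # (0, range(0, 206))
```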
Prediction performance on the GPCR dataset for different numbers of PCAs and feature groups using the kNN classifier.
| PCAs | Number of fragments | Number of features‡ | Rec | Acc | Prec | F1 |
|---|---|---|---|---|---|---|
| 3 | 24 | 1440 | 0.670 | 0.873 | 0.930 | 0.793 |
| 5 | 14 | 1400 | 0.914 | 0.850 | 0.715 | 0.801 |
| 7 | 10 | 1400 | 0.985 | 0.863 | 0.712 | 0.827 |
| 15 | 5 | 1500 | 0.825 | 0.792 | 0.648 | 0.726 |
| 19 | 4 | 1520 | 0.631 | 0.846 | 0.871 | 0.732 |
‡The number of target features obtained from the top PCAs and the fragments of the protein sequences.
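In every row of the table, the number of target features equals PCAs × fragments × 20 (for example, 7 × 10 × 20 = 1400), which the check below confirms. The same code also sketches one encoding that would yield exactly that many features; it is assumed here for illustration only and is not stated in this section. Each residue count in each fragment is multiplied by that residue's score on a principal component, giving one group of fragments × 20 features per component. The function names are hypothetical.

```python
def target_feature_count(n_pcs, n_fragments, n_residue_types=20):
    """Number of target features: PCs x fragments x residue types."""
    return n_pcs * n_fragments * n_residue_types


# (PCAs, fragments, reported number of features) from the table above.
ROWS = [(3, 24, 1440), (5, 14, 1400), (7, 10, 1400), (15, 5, 1500), (19, 4, 1520)]
for n_pcs, n_fragments, reported in ROWS:
    assert target_feature_count(n_pcs, n_fragments) == reported


def pc_weighted_groups(segment_counts, pc_scores):
    """Hypothetical encoding: one feature group per principal component,
    holding every fragment's residue counts scaled by the residues' scores
    on that component.

    segment_counts: per-fragment lists of 20 residue counts.
    pc_scores: per-component lists of 20 residue scores (same residue order).
    """
    return [
        [
            count * score[i]
            for counts in segment_counts
            for i, count in enumerate(counts)
        ]
        for score in pc_scores
    ]


groups = pc_weighted_groups([[1] * 20] * 10, [[0.5] * 20] * 7)
print(len(groups), sum(len(g) for g in groups))  # 7 1400
```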
PCA of amino acid physicochemical properties: scores on the top seven principal components.
| PCA 1 | PCA 2 | PCA 3 | PCA 4 | PCA 5 | PCA 6 | PCA 7 |
|---|---|---|---|---|---|---|
| −82.02 | 357.18 | −55.81 | 26.80 | −18.23 | 26.74 | 19.30 |
| −269.25 | −276.13 | −18.42 | 137.60 | 87.32 | 99.24 | 8.59 |
| −280.14 | −66.50 | −85.96 | −55.27 | −74.70 | 20.86 | −10.55 |
| −134.57 | −20.18 | 243.00 | −95.26 | −12.48 | −25.00 | 91.83 |
| 460.62 | 102.96 | 214.88 | −209.61 | 43.97 | 25.68 | −69.26 |
| −277.72 | −209.13 | −16.25 | −51.29 | −29.74 | 2.98 | −27.24 |
| −257.00 | −101.24 | 150.56 | 3.46 | −21.27 | −7.72 | 49.92 |
| −260.61 | 376.97 | −105.75 | −48.65 | 90.13 | −27.93 | 32.63 |
| −19.38 | −257.49 | −154.94 | −98.92 | 26.61 | 59.15 | −1.32 |
| 271.52 | 74.55 | −81.98 | 58.23 | −58.45 | 13.58 | −28.15 |
| 220.12 | 203.29 | −67.17 | 179.39 | 2.03 | −26.37 | 0.92 |
| −350.16 | −77.85 | 212.57 | 185.66 | 3.86 | −52.35 | −99.21 |
| 316.33 | −140.77 | −70.95 | −100.15 | −67.87 | 1.04 | −5.41 |
| 408.95 | −30.29 | −35.82 | 67.80 | 25.37 | −5.40 | 2.90 |
| −262.26 | −30.49 | −187.34 | −151.98 | 44.89 | −97.33 | −38.50 |
| −262.34 | 189.47 | −44.11 | −28.52 | −38.51 | 40.75 | −11.42 |
| −44.89 | 112.74 | 155.45 | −31.04 | −11.86 | 13.72 | 12.65 |
| 467.95 | −270.00 | 10.58 | 28.37 | 52.56 | −36.18 | 31.41 |
| 125.11 | −178.27 | −76.35 | 104.42 | −37.11 | −62.19 | 42.52 |
| 229.75 | 241.19 | 13.81 | 78.94 | −6.55 | 36.72 | −1.62 |
| 51.01% | 25.45% | 10.09% | 7.23% | 1.40% | 1.26% | 1.10% |
The last row gives the percentage of the total variance that each component accounts for.
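The scores above come from a principal component analysis in which each row is one amino acid and each input column is a physicochemical property. The NumPy sketch below computes such scores through a singular value decomposition. It centres each property but does not standardise it, since this section does not state whether the properties were scaled. The random input stands in for real AAindex1 values, and `pca_scores` is a hypothetical name. Centred scores sum to zero on each component, which matches the table: the PCA 1 column sums to about 0.01 after rounding.

```python
import numpy as np


def pca_scores(properties, n_components):
    """PCA of a (residues x properties) matrix via SVD of the centred data.

    Returns the residue scores on the top n_components and the fraction of
    the total variance that each of those components explains.
    """
    X = np.asarray(properties, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each property (column); no scaling
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # equals Xc @ V_k
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
properties = rng.normal(size=(20, 50))  # stand-in: 20 residues x 50 properties
scores, explained = pca_scores(properties, 7)
print(scores.shape)                        # (20, 7)
print(np.allclose(scores.sum(axis=0), 0))  # True
```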
Prediction performance on the GPCR dataset for each pair of drug and target feature groups using the kNN classifier.
| PCA | PaDEL group | Rec | Acc | Prec | F1 |
|---|---|---|---|---|---|
| 1 | 1 | 0.988 | 0.349 | 0.337 | 0.503 |
| 1 | 2 | 0.895 | 0.419 | 0.353 | 0.507 |
| 1 | 3 | 0.988 | 0.349 | 0.337 | 0.503 |
| 1 | 4 | 0.988 | 0.357 | 0.340 | 0.506 |
| 1 | 5 | 0.581 | 0.628 | 0.455 | 0.510 |
| 1 | 6 | 0.930 | 0.411 | 0.354 | 0.513 |
| 1 | 7 | 0.314 | | 0.628 | 0.419 |
| 1 | Average | 0.812 | 0.460 | 0.401 | 0.494 |
| 2 | 1 | 1.000 | 0.353 | 0.340 | 0.507 |
| 2 | 2 | 0.895 | 0.422 | 0.355 | 0.508 |
| 2 | 3 | 0.988 | 0.345 | 0.336 | 0.501 |
| 2 | 4 | 0.988 | 0.349 | 0.337 | 0.503 |
| 2 | 5 | 0.570 | 0.636 | 0.462 | 0.510 |
| 2 | 6 | 0.907 | 0.426 | 0.358 | 0.513 |
| 2 | 7 | 0.279 | | 0.686 | 0.397 |
| 2 | Average | 0.804 | 0.464 | 0.411 | 0.491 |
| 3 | 1 | 0.988 | 0.349 | 0.337 | 0.503 |
| 3 | 2 | 0.942 | 0.391 | 0.348 | 0.508 |
| 3 | 3 | 0.988 | 0.349 | 0.337 | 0.503 |
| 3 | 4 | 0.988 | 0.349 | 0.337 | 0.503 |
| 3 | 5 | 0.500 | 0.632 | 0.453 | 0.475 |
| 3 | 6 | 0.930 | 0.419 | 0.357 | 0.516 |
| 3 | 7 | 0.419 | | 0.667 | 0.514 |
| 3 | Average | 0.822 | 0.461 | 0.405 | 0.503 |
| 4 | 1 | 0.988 | 0.349 | 0.337 | 0.503 |
| 4 | 2 | 0.895 | 0.415 | 0.352 | 0.505 |
| 4 | 3 | 1.000 | 0.345 | 0.337 | 0.504 |
| 4 | 4 | 1.000 | 0.349 | 0.339 | 0.506 |
| 4 | 5 | 0.558 | 0.632 | 0.457 | 0.503 |
| 4 | 6 | 0.942 | 0.419 | 0.358 | 0.519 |
| 4 | 7 | 0.314 | | 0.643 | 0.422 |
| 4 | Average | 0.814 | 0.460 | 0.403 | 0.495 |
| 5 | 1 | 0.988 | 0.357 | 0.340 | 0.506 |
| 5 | 2 | 0.895 | 0.415 | 0.352 | 0.505 |
| 5 | 3 | 0.988 | 0.349 | 0.337 | 0.503 |
| 5 | 4 | 0.988 | 0.349 | 0.337 | 0.503 |
| 5 | 5 | 0.570 | 0.609 | 0.434 | 0.492 |
| 5 | 6 | 0.930 | 0.407 | 0.352 | 0.511 |
| 5 | 7 | 0.419 | | 0.537 | 0.471 |
| 5 | Average | 0.825 | 0.453 | 0.384 | 0.499 |
| 6 | 1 | 0.988 | 0.349 | 0.337 | 0.503 |
| 6 | 2 | 0.895 | 0.426 | 0.356 | 0.510 |
| 6 | 3 | 0.988 | 0.349 | 0.337 | 0.503 |
| 6 | 4 | 1.000 | 0.345 | 0.337 | 0.504 |
| 6 | 5 | 0.535 | 0.640 | 0.465 | 0.497 |
| 6 | 6 | 0.942 | 0.419 | 0.358 | 0.519 |
| 6 | 7 | 0.337 | | 0.707 | 0.457 |
| 6 | Average | 0.812 | 0.466 | 0.414 | 0.499 |
| 7 | 1 | 0.988 | 0.349 | 0.337 | 0.503 |
| 7 | 2 | 0.930 | 0.403 | 0.351 | 0.510 |
| 7 | 3 | 1.000 | 0.357 | 0.341 | 0.509 |
| 7 | 4 | 0.988 | 0.349 | 0.337 | 0.503 |
| 7 | 5 | 0.570 | 0.628 | 0.454 | 0.505 |
| 7 | 6 | 0.965 | 0.403 | 0.355 | 0.519 |
| 7 | 7 | 0.302 | | 0.650 | 0.413 |
| 7 | Average | 0.820 | 0.457 | 0.283 | 0.495 |
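Each row of the table scores one kNN classifier trained on a single pair of feature groups. The minimal pure-Python kNN below illustrates such a base classifier restricted to one feature subspace. Euclidean distance, `k=3`, and the helper names are assumptions, since this section gives neither the value of k nor the distance measure.

```python
import math
from collections import Counter


def subspace(vector, indices):
    """Keep only the features that belong to one feature-group pair."""
    return [vector[i] for i in indices]


def knn_predict(train_X, train_y, query, k=3):
    """Label a query by majority vote among its k nearest training vectors
    (Euclidean distance; math.dist needs Python 3.8+)."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]
    votes = Counter(y for _, y in nearest)
    return votes.most_common(1)[0][0]


train_X = [[0, 0, 9], [0, 1, 9], [1, 0, 9], [5, 5, 0], [5, 6, 0], [6, 5, 0]]
train_y = [0, 0, 0, 1, 1, 1]
idx = [0, 1]  # hypothetical subspace: keep only the first two features
Xs = [subspace(x, idx) for x in train_X]
print(knn_predict(Xs, train_y, [0.2, 0.1]))  # 0
print(knn_predict(Xs, train_y, [5.5, 5.5]))  # 1
```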
Prediction performance of the kNN ensemble classifier with the majority vote technique. The ensemble system predicts a drug-target pair to be interacting if all of the kNN classifiers in the ensemble predict it to be interacting.
| Dataset | Target type | Rec | Acc | Prec | F1 |
|---|---|---|---|---|---|
| Test‡ | Enzymes | 0.779 | 0.918 | 0.972 | 0.864 |
| | Ion channels | 0.906 | 0.882 | 0.778 | 0.837 |
| | GPCRs | 0.985 | 0.863 | 0.712 | 0.827 |
| | Nuclear receptors | 0.916 | 0.921 | 0.856 | 0.885 |
‡Prediction on the test dataset ℵts.
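The caption describes the ensemble decision both as a majority vote and as requiring every kNN classifier in the ensemble to predict an interaction. The sketch below combines the base classifiers' binary outputs under either rule; the function name and the `rule` parameter are hypothetical.

```python
def ensemble_vote(predictions, rule="majority"):
    """Combine binary base-classifier outputs (1 = interacting).

    rule="majority": interacting if more than half of the classifiers say so.
    rule="unanimous": interacting only if every classifier says so.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    n_positive = sum(predictions)
    if rule == "majority":
        return int(n_positive > len(predictions) / 2)
    if rule == "unanimous":
        return int(n_positive == len(predictions))
    raise ValueError(f"unknown rule: {rule!r}")


votes = [1] * 30 + [0] * 19  # 49 base classifiers, as for 7 PCA groups
print(ensemble_vote(votes), ensemble_vote(votes, rule="unanimous"))  # 1 0
```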
Accuracy (Acc) comparison of our method with two other works on the same datasets.
| Method | Type | Enzymes | Ion channels | GPCRs | Nuclear receptors |
|---|---|---|---|---|---|
| DrugECs | kNN | 0.918 | 0.882 | 0.863 | 0.921 |
| Reference [ | kNN | 0.855 | 0.808 | 0.785 | 0.857 |
| Web-servers | | 0.910ᵃ | 0.873ᵇ | 0.855ᶜ | 0.892ᵈ |
| Random predictor | | 0.489 | 0.489 | 0.488 | 0.488 |
ᵃSee [34] for the iEzy-Drug predictor and its reported success rates; ᵇsee [35] for the iCDI-Drug predictor and its reported success rates; ᶜsee [36] for the iGPCR-Drug predictor and its reported success rates; ᵈsee [37] for the iNR-Drug predictor and its reported success rates.
Figure 3. Performance comparison of the two methods in terms of MCC.
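All the measures used in this section can be computed from confusion-matrix counts: recall (Rec), accuracy (Acc), precision (Prec), and the Matthews correlation coefficient (MCC) compared in Figure 3. The fourth score reported alongside Rec, Acc and Prec is consistent with F1 = 2 · Prec · Rec / (Prec + Rec); for GPCRs, 2 × 0.985 × 0.712 / (0.985 + 0.712) ≈ 0.827. The plain-Python function below is a sketch with a hypothetical name, and the counts in the example are invented.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Rec, Acc, Prec, F1 and MCC from confusion-matrix counts.

    Assumes at least one actual positive and one predicted positive.
    """
    rec = tp / (tp + fn)
    acc = (tp + tn) / (tp + fp + tn + fn)
    prec = tp / (tp + fp)
    f1 = 2 * prec * rec / (prec + rec)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Rec": rec, "Acc": acc, "Prec": prec, "F1": f1, "MCC": mcc}


# Hypothetical counts with a 1:2 positive-to-negative ratio, as in the datasets.
m = classification_metrics(tp=8, fp=2, tn=18, fn=2)
print({k: round(v, 3) for k, v in m.items()})
# {'Rec': 0.8, 'Acc': 0.867, 'Prec': 0.8, 'F1': 0.8, 'MCC': 0.7}
```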
The AAindex1 properties most correlated with each of the top PCA components.
| PCA | AAindex1 | Data description |
|---|---|---|
| 1 | NADH010104 | Hydropathy scale based on self-information values in the two-state model (20% accessibility) (Naderi-Manesh et al., 2001) |
| 2 | RADA880103 | Transfer free energy from vap to chx (Radzicka-Wolfenden, 1988) |
| 3 | NADH010107 | Hydropathy scale based on self-information values in the two-state model (50% accessibility) (Naderi-Manesh et al., 2001) |
| 4 | RICJ880112 | Relative preference value at C3 (Richardson-Richardson, 1988) |
| 5 | KHAG800101 | The Kerr-constant increments (Khanarian-Moore, 1980) |
| 6 | RICJ880113 | Relative preference value at C2 (Richardson-Richardson, 1988) |
| 7 | PRAM820103 | Correlation coefficient in regression analysis (Prabhakaran-Ponnuswamy, 1982) |
The first column lists the top components of the PCA; the second column gives the property accession in the AAindex1 database.
Figure 4. The largest correlation coefficient between each PCA component and the AAindex1 properties, and the variance accounted for by each component. On the x-axis, 1 denotes the first principal component, and so on. The height of the green bar is the largest correlation coefficient between that component and the properties in the AAindex1 dataset, and the yellow bar is the variance the component accounts for.
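The property table and Figure 4 can be produced by correlating each component's scores with every AAindex1 property across the 20 amino acids and keeping the strongest match. The NumPy sketch below uses Pearson correlation and ranks by absolute value. Both choices are assumptions, as are the function name and the toy property labels.

```python
import numpy as np


def most_correlated_property(pc_scores, properties, names):
    """For each principal component, return the property whose values over
    the residues have the largest absolute Pearson correlation with the
    component's scores, together with that (signed) correlation.

    pc_scores: (n_residues, k) array; properties: (n_residues, p) array;
    names: p property labels (for example AAindex1 accessions).
    """
    scores = np.asarray(pc_scores, dtype=float)
    props = np.asarray(properties, dtype=float)
    k = scores.shape[1]
    # rowvar=False treats columns as variables; keep the PC-vs-property block.
    corr = np.corrcoef(scores, props, rowvar=False)[:k, k:]
    best = np.argmax(np.abs(corr), axis=1)
    return [(names[j], float(corr[i, j])) for i, j in enumerate(best)]


rng = np.random.default_rng(1)
s = rng.normal(size=(20, 2))  # stand-in scores for two components
props = np.column_stack([2 * s[:, 0] + 1, -s[:, 1], rng.normal(size=20)])
result = most_correlated_property(s, props, ["PROP_A", "PROP_B", "PROP_C"])
print(result)
```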