Wei Zhang1, Lijuan Ji2, Yanan Chen1, Kailin Tang1, Haiping Wang3, Ruixin Zhu1, Wei Jia4, Zhiwei Cao1, Qi Liu1.
Abstract
BACKGROUND: The rapid increase in the emergence of novel chemical substances creates a substantial demand for more sophisticated computational methodologies for drug discovery. In this study, the idea of Learning to Rank from web search was applied to drug virtual screening. It has two unique capabilities: 1) it can identify compounds for novel targets when not enough training data are available for these targets, and 2) it can integrate heterogeneous data when compound affinities are measured on different platforms.
Keywords: Data integration; Drug discovery; Learning to Rank; Virtual screening
Year: 2015 PMID: 25705262 PMCID: PMC4333300 DOI: 10.1186/s13321-015-0052-z
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Figure 1 Amount of CAS registry records of chemical substances.
Figure 2 Different computational schemas in virtual screening.
Curated Binding Database dataset
Curated CSAR dataset
| | | | | | | |
|---|---|---|---|---|---|---|
| | 25 | 25 | - | 110 | 52 | 35 |
| | Kd | Kd | - | pIC50 | pKi | pKi |
| | No | No | No | Yes | Yes | Yes |
In the original CSAR dataset, LPXC has no compound affinity information, and the compound affinities associated with CDK2 and CDK2-CyclinA were measured as Kd values, which give only a rough measure of binding affinity rather than the exact activity. These three targets were therefore not selected for the final curated dataset.
Figure 3 NDCG@10 in Strategy I.
NDCG@10 of strategy I

| 0.4463 | 0.5885 | 0.5119 | 0.4032 | 0.6543 | 0.6446 |

| 0.5549 |
| 0.4208 |
| 0.5564 | 0.5917 |

| 0.5913 | 0.4983 | 0.4586 | 0.6052 |

| 0.4225 | 0.3850 | 0.4741 | 0.4673 |

| 0.4122 | 0.5110 | 0.5704 |

| 0.1254 | 0.2978 | 0.1825 | 0.5366 | 0.6341 |

| 0.3295 | 0.5366 | 0.4880 |

| 0.2076 | 0.3441 | 0.5005 | 0.6284 |

| 0.4749 |
| 0.5481 | 0.5506 |

| 0.5476 | 0.5420 | 0.5328 | 0.6281 |

| 0.4078 | 0.5584 | 0.5475 | 0.6169 |

| 0.4168 | 0.3436 | 0.3555 | 0.4605 |

| 0.4208 | 0.3270 | 0.4184 | 0.5256 |

| 0.4682 | 0.4684 | 0.5724 | 0.5172 | 0.6912 |

| 0.5828 | 0.5293 | 0.5009 | 0.6288 |

| 0.5204 | 0.5169 | 0.4038 | 0.6657 | 0.8334 | 0.8357 |

| 0.5860 | 0.4398 | 0.4510 | 0.5909 |
| 0.6945 |

| 0.5792 | 0.4819 | 0.4843 | 0.5758 |

| 0.6082 | 0.3600 | 0.6024 | 0.6530 | 0.7270 |

| 0.4877 | 0.6042 | 0.4628 | 0.5718 |

| 0.4489 | 0.4484 | 0.5028 | 0.4054 |
| 0.6292 |

| 0.4619 | 0.3547 | 0.4285 | 0.5053 |
| 0.6826 |

| 0.4829 | 0.4057 | 0.4001 | 0.4823 |
| 0.4887 |

| 0.4251 | 0.3584 | 0.4199 | 0.5630 |
| 0.5629 |

The bold number in each row indicates the best performance among all the methods in that row.
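The results here are reported as NDCG@10. As a rough illustration of how such a score can be computed for one target's ranked compound list, the sketch below uses the common exponential-gain form of DCG. The gain function, the function names `dcg_at_k` and `ndcg_at_k`, and the example relevance grades are assumptions for illustration; the exact variant used in the study is not stated in this section.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k items.

    Assumes the exponential gain (2^rel - 1) and a log2 position discount,
    which is a common choice in Learning to Rank; other gains exist.
    """
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(true_relevance, predicted_scores, k=10):
    """NDCG@k: DCG of the predicted ordering divided by the ideal DCG."""
    # Rank compounds by predicted score, highest first.
    order = sorted(
        range(len(predicted_scores)),
        key=lambda i: predicted_scores[i],
        reverse=True,
    )
    ranked_relevance = [true_relevance[i] for i in order]
    # The ideal ordering sorts compounds by their true relevance.
    ideal_dcg = dcg_at_k(sorted(true_relevance, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant compound: define the score as 0
    return dcg_at_k(ranked_relevance, k) / ideal_dcg


# Hypothetical relevance grades for four compounds of one target.
relevance = [3, 2, 1, 0]
perfect = ndcg_at_k(relevance, [0.9, 0.8, 0.7, 0.1])
reversed_order = ndcg_at_k(relevance, [0.1, 0.2, 0.3, 0.4])
```

A perfectly ordered list scores 1, and any misordering among the top 10 lowers the score, with mistakes near the top of the list penalized most.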
Figure 4 NDCG@10 in Strategy II.
PDE family
Cathepsin family
Figure 5 NDCG@10 of CTSK and PDE5 in Strategy II and Strategy III.
NDCG@10 in strategy IV
| | | | |
|---|---|---|---|
| | 0.6562 | 0.7726 | 0.4876 |
Figure 6 Proteochemometric Modeling.
Figure 7 Research workflow for The datasets used in this study were curated from the Binding Database and CSAR using well-designed filtering rules. Compounds and targets are each represented by a specific feature vector. With a feature mapping function, each compound-target pair as a whole is transformed into a new feature vector. The results of the different VS algorithms under four testing strategies are presented and evaluated quantitatively with NDCG@10. The color bars in the test frame indicate the algorithms investigated in each test strategy.
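The caption above describes mapping a compound feature vector and a target feature vector to one feature vector for the pair, without naming the mapping function. The sketch below shows two mappings that are often used for this purpose, plain concatenation and a tensor (outer) product of the two vectors. Both functions and the toy vectors are hypothetical, and neither is confirmed here as the mapping used in the study.

```python
def concat_features(compound, target):
    """Pair vector as the compound features followed by the target features."""
    return list(compound) + list(target)


def tensor_features(compound, target):
    """Pair vector as the flattened outer product of the two vectors.

    Each entry is the product of one compound feature and one target
    feature, so the result has len(compound) * len(target) entries.
    """
    return [c * t for c in compound for t in target]


# Toy inputs (hypothetical): a 2-bit compound fingerprint and a
# 3-value target descriptor.
compound_vector = [1, 0]
target_vector = [0.5, 2.0, 3.0]

pair_concat = concat_features(compound_vector, target_vector)
pair_tensor = tensor_features(compound_vector, target_vector)
```

Concatenation keeps the pair vector short, while the tensor product exposes explicit compound-target interaction terms at the cost of a much larger vector.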
Figure 8 Illustration of training and testing in
6 algorithms
| | | |
|---|---|---|
| | PRank | [ |
| | RankNet | [ |
| | RankBoost | [ |
| | SVMRank | [ |
| | AdaRank | [ |
| | ListNet | [ |
Figure 9 Three different approaches of
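The Figure 9 caption is truncated in this copy. Assuming it refers to the three standard families of Learning to Rank (pointwise, pairwise and listwise), the sketch below contrasts one representative loss for each family on a single list of compounds. The specific losses (squared error, a RankNet-style logistic pair loss, and a ListNet-style top-one cross entropy) and all function names are illustrative choices, not taken from the source.

```python
import math


def pointwise_loss(scores, labels):
    """Pointwise: treat each compound independently (mean squared error)."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)


def pairwise_loss(scores, labels):
    """Pairwise: logistic loss on every pair whose true order is known."""
    losses = [
        math.log1p(math.exp(-(scores[i] - scores[j])))
        for i in range(len(labels))
        for j in range(len(labels))
        if labels[i] > labels[j]  # compound i should rank above compound j
    ]
    return sum(losses) / len(losses) if losses else 0.0


def softmax(values):
    """Numerically stable softmax over a list of numbers."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(scores, labels):
    """Listwise: cross entropy between top-one probabilities of the lists."""
    target = softmax(labels)
    predicted = softmax(scores)
    return -sum(p * math.log(q) for p, q in zip(target, predicted))


# Hypothetical affinity grades for three compounds of one target.
labels = [3.0, 1.0, 0.0]
good_scores = [3.0, 1.0, 0.0]  # same ordering as the labels
bad_scores = [0.0, 1.0, 3.0]   # reversed ordering
```

All three losses are lower for `good_scores` than for `bad_scores`, but only the pairwise and listwise forms depend purely on how compounds compare within the list rather than on matching absolute affinity values.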