Heng Luo, Ping Zhang, Xi Hang Cao, Dizheng Du, Hao Ye, Hui Huang, Can Li, Shengying Qin, Chunling Wan, Leming Shi, Lin He, Lun Yang.
Abstract
The cost of developing a new drug has increased sharply over the past years. To ensure a reasonable return-on-investment, it is useful for drug discovery researchers in both industry and academia to identify all the possible indications for early pipeline molecules. For the first time, we propose the term computational "drug candidate positioning" or "drug positioning", to describe the above process. It is distinct from drug repositioning, which identifies new uses for existing drugs and maximizes their value. Since many therapeutic effects are mediated by unexpected drug-protein interactions, it is reasonable to analyze the chemical-protein interactome (CPI) profiles to predict indications. Here we introduce the server DPDR-CPI, which can make real-time predictions based only on the structure of the small molecule. When a user submits a molecule, the server will dock it across 611 human proteins, generating a CPI profile of features that can be used for predictions. It can suggest the likelihood of relevance of the input molecule towards ~1,000 human diseases with top predictions listed. DPDR-CPI achieved an overall AUROC of 0.78 during 10-fold cross-validations and AUROC of 0.76 for the independent validation. The server is freely accessible via http://cpi.bio-x.cn/dpdr/.
Year: 2016 PMID: 27805045 PMCID: PMC5090963 DOI: 10.1038/srep35996
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Performance evaluation of DPDR-CPI using the entire dataset versus the training set during 10-fold cross-validations.
| Endpoints | Dataset | Accuracy | Precision | Sensitivity | Specificity | AUROC | AUPR |
|---|---|---|---|---|---|---|---|
| 638 ICD-9 diseases | Entire dataset | 0.956 ± 0.000 | 0.176 ± 0.001 | 0.274 ± 0.002 | 0.972 ± 0.000 | 0.782 ± 0.001 | 0.151 ± 0.001 |
| 638 ICD-9 diseases | Training set | 0.953 ± 0.000 | 0.152 ± 0.001 | 0.241 ± 0.001 | 0.969 ± 0.000 | 0.752 ± 0.001 | 0.123 ± 0.001 |
| 328 ICD-9 disease families | Entire dataset | 0.925 ± 0.000 | 0.167 ± 0.000 | 0.363 ± 0.001 | 0.942 ± 0.000 | 0.783 ± 0.000 | 0.169 ± 0.001 |
| 328 ICD-9 disease families | Training set | 0.919 ± 0.000 | 0.152 ± 0.001 | 0.341 ± 0.003 | 0.938 ± 0.000 | 0.760 ± 0.001 | 0.149 ± 0.001 |
The entire dataset was used to build the server-side prediction models, while the training set was used to construct models for independent validation. The training set is half of the entire dataset.
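The accuracy, precision, sensitivity and specificity columns all derive from the same four confusion-matrix counts. The sketch below uses hypothetical counts (not taken from the study) to illustrate how accuracy and specificity can sit above 0.95 while precision stays low when positive drug-disease pairs are rare. The function name `confusion_metrics` is an illustrative assumption.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute the four threshold-based metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # also called recall
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }

# Hypothetical counts: 1,000 drug-disease pairs, only 20 of them true indications.
metrics = confusion_metrics(tp=5, fp=20, tn=960, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With so few positives, the large pool of true negatives dominates accuracy and specificity, which is why AUROC and AUPR are reported alongside them.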
Figure 1. Under the global metric, (A) the ROC curve comparison and (B) the precision-recall curve comparison of different prediction methods for 638 ICD-9 disease indications on the independent validation data.
Performance comparisons of the different structural descriptor-based methods and DPDR-CPI using 638 endpoints of ICD-9 disease indications on the independent validation data.
| Metric | Method | Accuracy | Precision | Sensitivity | Specificity | AUROC | AUPR |
|---|---|---|---|---|---|---|---|
| global | LR-ECFP6 | 0.801 | 0.040 | 0.355 | 0.811 | 0.651 | 0.040 |
| global | LR-E-State | 0.792 | 0.033 | 0.305 | 0.802 | 0.602 | 0.030 |
| global | LR-FCFP6 | 0.904 | 0.050 | 0.193 | 0.919 | 0.670 | 0.051 |
| global | LR-FP4 | 0.870 | 0.045 | 0.247 | 0.883 | 0.633 | 0.038 |
| global | LR-KR | 0.897 | 0.045 | 0.190 | 0.912 | 0.660 | 0.048 |
| global | LR-MACCS | 0.881 | 0.047 | 0.235 | 0.896 | 0.649 | 0.039 |
| global | LR-PubChem | 0.866 | 0.054 | 0.318 | 0.878 | 0.704 | 0.050 |
| global | DPDR-CPI | 0.964 | 0.192 | 0.203 | 0.981 | 0.764 | 0.118 |
| drug-centric (628) | LR-ECFP6 | 0.906 ± 0.140 | 0.316 ± 0.303 | 0.490 ± 0.267 | 0.916 ± 0.144 | 0.783 ± 0.154 | 0.135 ± 0.161 |
| drug-centric (628) | LR-E-State | 0.888 ± 0.153 | 0.235 ± 0.245 | 0.465 ± 0.273 | 0.898 ± 0.158 | 0.744 ± 0.150 | 0.085 ± 0.087 |
| drug-centric (628) | LR-FCFP6 | 0.902 ± 0.147 | 0.298 ± 0.292 | 0.497 ± 0.267 | 0.911 ± 0.150 | 0.781 ± 0.153 | 0.132 ± 0.156 |
| drug-centric (628) | LR-FP4 | 0.897 ± 0.145 | 0.258 ± 0.261 | 0.472 ± 0.266 | 0.907 ± 0.149 | 0.761 ± 0.146 | 0.103 ± 0.116 |
| drug-centric (628) | LR-KR | 0.903 ± 0.139 | 0.289 ± 0.290 | 0.498 ± 0.265 | 0.912 ± 0.142 | 0.778 ± 0.152 | 0.129 ± 0.156 |
| drug-centric (628) | LR-MACCS | 0.895 ± 0.148 | 0.271 ± 0.276 | 0.478 ± 0.269 | 0.905 ± 0.152 | 0.762 ± 0.152 | 0.108 ± 0.124 |
| drug-centric (628) | LR-PubChem | 0.897 ± 0.156 | 0.293 ± 0.294 | 0.486 ± 0.265 | 0.907 ± 0.160 | 0.766 ± 0.160 | 0.125 ± 0.146 |
| drug-centric (628) | DPDR-CPI | 0.893 ± 0.150 | 0.273 ± 0.282 | 0.511 ± 0.271 | 0.902 ± 0.154 | 0.775 ± 0.156 | 0.128 ± 0.163 |
| disease-centric (638) | LR-ECFP6 | 0.668 ± 0.268 | 0.061 ± 0.075 | 0.559 ± 0.305 | 0.671 ± 0.278 | 0.563 ± 0.138 | 0.032 ± 0.036 |
| disease-centric (638) | LR-E-State | 0.596 ± 0.323 | 0.059 ± 0.097 | 0.593 ± 0.338 | 0.596 ± 0.336 | 0.504 ± 0.150 | 0.026 ± 0.026 |
| disease-centric (638) | LR-FCFP6 | 0.711 ± 0.242 | 0.077 ± 0.105 | 0.529 ± 0.293 | 0.716 ± 0.250 | 0.583 ± 0.136 | 0.036 ± 0.044 |
| disease-centric (638) | LR-FP4 | 0.629 ± 0.307 | 0.064 ± 0.093 | 0.575 ± 0.326 | 0.632 ± 0.318 | 0.524 ± 0.148 | 0.030 ± 0.035 |
| disease-centric (638) | LR-KR | 0.690 ± 0.259 | 0.109 ± 0.212 | 0.553 ± 0.308 | 0.695 ± 0.268 | 0.571 ± 0.133 | 0.035 ± 0.048 |
| disease-centric (638) | LR-MACCS | 0.659 ± 0.286 | 0.061 ± 0.083 | 0.549 ± 0.310 | 0.663 ± 0.296 | 0.536 ± 0.145 | 0.030 ± 0.034 |
| disease-centric (638) | LR-PubChem | 0.746 ± 0.231 | 0.075 ± 0.090 | 0.523 ± 0.283 | 0.752 ± 0.239 | 0.609 ± 0.144 | 0.039 ± 0.046 |
| disease-centric (638) | DPDR-CPI | 0.888 ± 0.173 | 0.258 ± 0.261 | 0.388 ± 0.234 | 0.899 ± 0.179 | 0.682 ± 0.148 | 0.088 ± 0.091 |
Three types of metrics, including global, drug-centric and disease-centric metrics, were used.
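This page does not spell out how the three metric types are computed. A common reading, assumed in the sketch below, is that the global metric pools every drug-disease pair, the drug-centric metric scores each drug's row of predictions and averages over drugs, and the disease-centric metric does the same per disease column. The toy matrices and the names `auroc` and `three_views` are illustrative assumptions, not the authors' code.

```python
from statistics import mean

def auroc(scores, labels):
    """Rank-based AUROC: fraction of (positive, negative) pairs ordered correctly.

    Ties count as half. Returns None when one class is absent.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def three_views(score_matrix, label_matrix):
    """Rows are drugs, columns are diseases (assumed layout)."""
    flat_scores = [s for row in score_matrix for s in row]
    flat_labels = [y for row in label_matrix for y in row]
    global_auc = auroc(flat_scores, flat_labels)
    per_drug = [auroc(s, y) for s, y in zip(score_matrix, label_matrix)]
    per_disease = [auroc(s, y)
                   for s, y in zip(zip(*score_matrix), zip(*label_matrix))]
    # Rows or columns with only one class have no defined AUROC and are skipped.
    drug_auc = mean(a for a in per_drug if a is not None)
    disease_auc = mean(a for a in per_disease if a is not None)
    return global_auc, drug_auc, disease_auc

# Toy example: 3 drugs x 3 diseases, hypothetical scores and known indications.
scores = [[0.9, 0.2, 0.4],
          [0.3, 0.8, 0.1],
          [0.6, 0.5, 0.7]]
labels = [[1, 0, 0],
          [0, 1, 0],
          [0, 1, 1]]
global_auc, drug_auc, disease_auc = three_views(scores, labels)
print(global_auc, drug_auc, disease_auc)
```

The three numbers differ even on the same predictions, which is consistent with the table reporting the per-drug and per-disease results as mean ± standard deviation while the global rows are single values.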
Drug candidate positioning prediction for NM-702 using the DPDR-CPI server.
| Rank | Disease | Confidence |
|---|---|---|
| 1 | 458: Hypotension | 0.80 |
| | 458: Hypotension | 0.80 |
| | 458.9: Hypotension, unspecified | 0.80 |
| 2 | 434: Occlusion of cerebral arteries | 0.70 |
| | 434.91: Cerebral artery occlusion, unspecified with cerebral infarction | 0.70 |
| 3 | 443: Other peripheral vascular disease | 0.69 |
| | 443.9: Peripheral vascular disease, unspecified | 0.69 |
| 4 | 427: Cardiac dysrhythmias | 0.67 |
| | 427: Cardiac dysrhythmias | 0.60 |
| | 427.1: Paroxysmal ventricular tachycardia | 0.59 |
| | 427.9: Cardiac dysrhythmia, unspecified | 0.58 |
The diseases are grouped into ICD-9 families and ranked by their confidence values.
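The grouping can be sketched as follows, using the NM-702 values from the table. It assumes that a family is the three-digit category before the decimal point of a numeric ICD-9 code, and that each family's confidence comes from a separate family-level model rather than from its members. The page does not state this, although the separate "328 ICD-9 disease families" endpoint set and the 427 family (0.67 against a highest member value of 0.60) suggest it. The function names are hypothetical.

```python
from collections import defaultdict

def icd9_family(code):
    """Three-digit category of a numeric ICD-9 code, e.g. '427.1' -> '427'."""
    return code.split(".")[0]

def group_by_family(predictions, family_confidence):
    """Attach individual endpoints to their family and rank the families.

    predictions: list of (icd9_code, confidence) for individual endpoints.
    family_confidence: dict mapping family code -> family-level confidence
    (assumed to be supplied by a separate family-level model).
    """
    members = defaultdict(list)
    for code, conf in predictions:
        members[icd9_family(code)].append((code, conf))
    ranked = sorted(family_confidence.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (rank, family, conf,
         sorted(members[family], key=lambda m: m[1], reverse=True))
        for rank, (family, conf) in enumerate(ranked, start=1)
    ]

# Values copied from the NM-702 table.
predictions = [("458", 0.80), ("458.9", 0.80), ("434.91", 0.70), ("443.9", 0.69),
               ("427", 0.60), ("427.1", 0.59), ("427.9", 0.58)]
family_confidence = {"458": 0.80, "434": 0.70, "443": 0.69, "427": 0.67}

grouped = group_by_family(predictions, family_confidence)
for rank, family, conf, endpoints in grouped:
    print(rank, family, conf, endpoints)
```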
Top disease predictions for rosiglitazone from the server.
| Rank | Disease | Confidence |
|---|---|---|
| 1 | 251: Other disorders of pancreatic internal secretion | 0.95 |
| | 251.2: Hypoglycemia, unspecified | 0.83 |
| 2 | 250: Diabetes mellitus | 0.93 |
| | 250.1: Diabetes with ketoacidosis | 0.93 |
| | 250.10: Diabetes with ketoacidosis, type ii or unspecified type, not stated as uncontrolled | 0.93 |
| | 250.01: Diabetes mellitus without mention of complication, type i [juvenile type], not stated as uncontrolled | 0.89 |
| | 250.00: Diabetes mellitus without mention of complication, type ii or unspecified type, not stated as uncontrolled | 0.81 |
| | 250: Diabetes mellitus | 0.80 |
| 3 | 362: Other retinal disorders | 0.91 |
| | 362.83: Retinal edema | 0.73 |
| | 362.10: Background retinopathy, unspecified | 0.62 |
| | 362.9: Unspecified retinal disorder | 0.62 |
| 4 | 277: Other and unspecified disorders of metabolism | 0.87 |
| | 277.85: Disorders of fatty acid oxidation | 0.81 |
| 5 | 276: Disorders of fluid electrolyte and acid-base balance | 0.87 |
| | 276.2: Acidosis | 0.87 |
| | 276.69: Other fluid overload | 0.64 |
| 6 | 365: Glaucoma | 0.85 |
| | 365: Glaucoma | 0.85 |
| | 365.9: Unspecified glaucoma | 0.85 |
| | 365.1: Open-angle glaucoma | 0.84 |
| | 365.10: Open-angle glaucoma, unspecified | 0.84 |
| | 365.13: Pigmentary open-angle glaucoma | 0.84 |
| | 365.04: Ocular hypertension | 0.84 |
| | 365.00: Preglaucoma, unspecified | 0.82 |
| 7 | 331: Other cerebral degenerations | 0.83 |
| | 331.0: Alzheimer’s disease | 0.82 |
The diseases are grouped into ICD-9 families and ranked by their confidence values.
Figure 2. Flow chart of the model training and prediction process.
We collected 1,256 drug molecules and 611 ligand-bindable targets (a) to construct an in silico chemical-protein interactome (CPI) using docking (b). Based on existing drug-indication knowledge, machine learning models (c) were trained to predict drug indications (d) from the CPI. When a user submits a molecule to our server (e), it is docked against our library targets to generate docking scores. These scores are fed to the machine learning models (f) to predict the indications (g) for this molecule.
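Steps (e) to (g) amount to turning one vector of docking scores into a ranked list of disease probabilities. The sketch below illustrates that data flow only. It assumes one logistic model per disease as a stand-in, since this page says only "machine learning models", and it uses 3 targets instead of 611 with made-up docking scores and weights. None of the names or numbers come from the DPDR-CPI implementation.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_indications(cpi_profile, models, top_k=3):
    """Score one molecule's CPI profile against per-disease models.

    cpi_profile: list of docking scores, one per library target.
    models: dict mapping disease name -> (weights, bias); hypothetical
            stand-ins for whatever models the server actually uses.
    """
    scored = []
    for disease, (weights, bias) in models.items():
        z = bias + sum(w * x for w, x in zip(weights, cpi_profile))
        scored.append((disease, sigmoid(z)))
    # Highest probability first, truncated to the top_k predictions.
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]

# Hypothetical docking scores for a submitted molecule against 3 targets.
profile = [-7.2, -5.1, -9.0]
# Hypothetical per-disease models (weights over the 3 targets, bias).
models = {
    "disease A": ([-0.5, 0.0, 0.0], -3.0),
    "disease B": ([0.0, 0.0, 0.1], 0.0),
    "disease C": ([0.0, 0.2, 0.0], 0.0),
}

top = predict_indications(profile, models, top_k=2)
for disease, probability in top:
    print(f"{disease}: {probability:.2f}")
```

Because the only input is the docking-score vector, the same prediction step applies to any submitted structure once it has been docked against the library targets.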
Figure 3. Workflow of the server.
The user can submit a molecule in MOL, MOL2, PDB, SDF or SMILES format to the DPDR-CPI server. After the calculation is finished, the server provides the indication predictions with probability values grouped by ICD-9 disease family. The user can then check the target binding scores of the molecule across the 611 library targets. By clicking on the “Visualization” button, the user can view the interactive 3D binding conformation between the molecule and any specific target.