Yingnan Han, Katherine Klinger, Deepak K Rajpal, Cheng Zhu, Erin Teeple.
Abstract
BACKGROUND: The Open Targets (OT) Platform integrates a wide range of data sources on target-disease associations to facilitate identification of potential therapeutic drug targets for human diseases. However, because targets are often functionally pleiotropic and may be efficacious for multiple indications, identifying novel target-indication associations remains challenging. In particular, there is a persistent need for advanced computational methods that integrate new target-disease association evidence with biological knowledge bases. Such methods promise to increase the power to identify the most promising target-disease pairs for therapeutic development. Here we introduce an approach that integrates additional target-disease features with machine learning models to further uncover druggable target-disease indications.
Keywords: Data Integration; Drug discovery; Drug repurposing; Feature engineering; Machine learning; Open targets; Target indication expansion; XGBoost
Year: 2022 PMID: 35710324 PMCID: PMC9202116 DOI: 10.1186/s12859-022-04753-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 Overview of Open Targets data and generation of newly computed features. Open Targets association evidence network edge weights are annotated for evidence from multiple sources (a). Novel target-disease association features generated from target-target similarity and target-disease matrices compared with factors used in calculation of a user-item matrix (b). Target-disease arrays are generated for each information source and association evidence for known drug status (c)
Fig. 2 Workflow schematic for feature generation and therapeutic status prediction evaluation
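The Fig. 1 caption likens the computed features to a user-item matrix calculation, built from a target-target similarity matrix and a target-disease matrix. A minimal sketch of that idea follows, assuming cosine similarity between target profiles and a similarity-weighted average of the other targets' evidence. The function names, the choice of cosine similarity, and the weighting scheme are illustrative assumptions, not the formula used in the article.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_weighted_features(target_profiles, evidence):
    """Hypothetical collaborative-filtering-style feature.

    target_profiles[t] : vector describing target t (assumed input)
    evidence[t][d]     : association evidence score for target t, disease d

    Returns features[t][d]: the similarity-weighted average of the evidence
    that the *other* targets have for disease d.
    """
    n_targets = len(target_profiles)
    n_diseases = len(evidence[0])
    sim = [
        [cosine_similarity(target_profiles[i], target_profiles[j])
         for j in range(n_targets)]
        for i in range(n_targets)
    ]
    features = []
    for i in range(n_targets):
        # Total similarity weight over all other targets (self is excluded).
        weight = sum(abs(sim[i][j]) for j in range(n_targets) if j != i)
        row = []
        for d in range(n_diseases):
            score = sum(sim[i][j] * evidence[j][d]
                        for j in range(n_targets) if j != i)
            row.append(score / weight if weight else 0.0)
        features.append(row)
    return features


# Toy data: targets 0 and 1 share a profile, target 2 is unrelated.
profiles = [[1, 0], [1, 0], [0, 1]]
evidence = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
features = similarity_weighted_features(profiles, evidence)
```

In this toy case target 0 has no direct evidence for disease 0, but it inherits a score from target 1 because the two targets have identical profiles. That is the kind of signal a similarity-derived target-disease array is meant to surface.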
Number of data instances used for training and validation after removal of all-zero value rows
| Set | Fold1 | Fold2 | Fold3 | Fold4 | Fold5 |
|---|---|---|---|---|---|
| Training | | | | | |
| Positive | 15,137 | 14,382 | 15,120 | 14,435 | 14,918 |
| Negative | 70,945 | 67,020 | 70,210 | 73,575 | 71,941 |
| Total | 86,082 | 81,402 | 85,330 | 88,010 | 86,859 |
| Validation | | | | | |
| Positive | 3,369 | 4,085 | 3,404 | 4,098 | 3,561 |
| Negative | 18,132 | 20,313 | 18,424 | 15,344 | 16,194 |
| Total | 21,501 | 24,398 | 21,828 | 19,442 | 19,755 |
Held-out testing data comprised 46,290 instances (7,382 positive; 38,907 negative)
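The fold counts above imply a marked class imbalance, which matters when reading precision-recall results. The short calculation below works through it for the first block of counts in the table (the larger split of each fold, assumed here to be the training portion). The helper name `class_balance` is hypothetical.

```python
# Counts copied from the first Positive/Negative rows of the table above.
train_pos = [15137, 14382, 15120, 14435, 14918]
train_neg = [70945, 67020, 70210, 73575, 71941]


def class_balance(pos, neg):
    """Return the total count, positive fraction, and negative:positive ratio."""
    total = pos + neg
    return {
        "total": total,
        "positive_rate": pos / total,
        "neg_to_pos": neg / pos,
    }


balance = [class_balance(p, n) for p, n in zip(train_pos, train_neg)]

for fold, b in enumerate(balance, start=1):
    print(f"Fold{fold}: total={b['total']}, "
          f"positive rate={b['positive_rate']:.3f}, "
          f"negative:positive={b['neg_to_pos']:.2f}")
```

Positives make up roughly 16 to 18 percent of each fold. A negative-to-positive ratio of this kind is sometimes passed to XGBoost as `scale_pos_weight`; the source does not say whether that was done here.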
Known drug status prediction (± standard deviation across 5 folds)
| Method | LogReg | RF | XGB |
|---|---|---|---|
| OT association evidence | | | |
| AUROC | 0.7603 (± 0.0088) | 0.8685 (± 0.0075) | 0.8784 (± 0.0051) |
| AUPR | 0.0685 (± 0.0025) | 0.2074 (± 0.0093) | 0.2072 (± 0.0103) |
| Computed features + OT association evidence | | | |
| AUROC | 0.8867 (± 0.0027) | 0.9262 (± 0.0019) | 0.9406 (± 0.0018) |
| AUPR | 0.6442 (± 0.0069) | 0.7500 (± 0.0070) | 0.7969 (± 0.0065) |
| OT association evidence | | | |
| AUROC | 0.7625 (± 0.0357) | 0.8143 (± 0.0314) | 0.8076 (± 0.0335) |
| AUPR | 0.0707 (± 0.0102) | 0.0872 (± 0.0118) | 0.0888 (± 0.0177) |
| Computed features + OT association evidence | | | |
| AUROC | 0.8864 (± 0.0076) | 0.9103 (± 0.0140) | 0.9137 (± 0.0142) |
| AUPR | 0.6452 (± 0.0226) | 0.7092 (± 0.0459) | 0.7264 (± 0.0457) |
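The table reports AUROC and AUPR as a mean with a standard deviation across 5 folds. The sketch below shows one way such figures can be computed with the Python standard library alone. It makes three assumptions: AUPR is estimated as average precision, scores contain no ties for the precision calculation, and the sample standard deviation is used. The source does not state which estimators were used, and all function names are illustrative.

```python
from statistics import mean, stdev


def auroc(y_true, scores):
    """AUROC as the probability that a positive outscores a negative.

    Ties count as half a win. This pairwise form is quadratic in the
    number of instances, so it is suitable for illustration only.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of precision@k taken at the rank k of each positive instance."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / k
    if hits == 0:
        raise ValueError("need at least one positive")
    return total / hits


def summarize(fold_values):
    """Mean and sample standard deviation of a per-fold metric."""
    return mean(fold_values), stdev(fold_values)


# Toy example with four instances.
labels = [1, 0, 1, 0]
preds = [0.9, 0.8, 0.7, 0.1]
toy_auroc = auroc(labels, preds)
toy_ap = average_precision(labels, preds)
```

Unlike AUROC, AUPR depends on class prevalence, so it is best read alongside the positive rate of the data on which it was computed.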
Fig. 3 Known drug prediction performance in Testing set by XGBoost, Random Forest and Logistic Regression. Precision-Recall curve (a), Receiver operating characteristic curve (b), F1 score (c), Sensitivity (d), Precision (e) and Specificity (f)
Fig. 4 Newly computed features improve prediction accuracy. Prediction scores correspond to Testing set target-disease clinical trial stage (a). Feature importance scores indicate the feature types we generated strongly predict known drug therapeutic status (b). Target-disease arrays computed using target-target similarity reveal druggable target-disease pairs (c). Significant overlap between predicted indications and literature findings by text mining (d)
Top predicted indications for Ustekinumab
| Ranking | Disease | Disease_ID | Prediction_score |
|---|---|---|---|
| 1 | Immune system disease | EFO_0000540 | 0.95580226 |
| 2 | Asthma | EFO_0000270 | 0.94961476 |
| 3 | Rheumatoid arthritis | EFO_0000685 | 0.9285401 |
| 4 | Uveitis | EFO_1001231 | 0.8917595 |
| 5 | Alopecia areata | EFO_0004192 | 0.8835659 |
| 6 | Alzheimer's disease | EFO_0000249 | 0.8625836 |
| 7 | Schizophrenia | EFO_0000692 | 0.862216 |
| 8 | Atopic eczema | EFO_0000274 | 0.84960294 |
| 9 | Takayasu arteritis | EFO_1001857 | 0.8476705 |
| 10 | Temporal arteritis | EFO_1001209 | 0.8364161 |
| 11 | Obesity | EFO_0001073 | 0.8306895 |
| 12 | Multiple sclerosis | EFO_0003885 | 0.82465583 |
| 13 | Osteoarthritis | EFO_0002506 | 0.8140218 |
| 14 | Ulcerative colitis | EFO_0000729 | 0.8110609 |
| 15 | Relapsing–remitting multiple sclerosis | EFO_0003929 | 0.7750288 |
| 16 | Behcet's syndrome | EFO_0003780 | 0.75113726 |
| 17 | Diabetes mellitus | EFO_0000400 | 0.74368954 |
| 18 | Juvenile idiopathic arthritis | EFO_0002609 | 0.7401242 |
| 19 | Non-alcoholic steatohepatitis | EFO_1001249 | 0.7374891 |
| 20 | Systemic lupus erythematosus | EFO_0002690 | 0.73320544 |
| 21 | Ankylosing spondylitis | EFO_0003898 | 0.7330974 |
| 22 | Graft versus host disease | MONDO_0013730 | 0.72582567 |
| 23 | Psoriatic arthritis | EFO_0003778 | 0.71739846 |
| 24 | Alcohol dependence | EFO_0003829 | 0.71080697 |
| 25 | Dermatitis | MONDO_0002406 | 0.7098975 |
| 26 | Crohn's disease | EFO_0000384 | 0.6959177 |
| 27 | HIV-1 infection | EFO_0000180 | 0.6868525 |
| 28 | Periodontitis | EFO_0000649 | 0.6834563 |
| 29 | Tuberculosis | Orphanet_3389 | 0.67681295 |
| 30 | Post-traumatic stress disorder | EFO_0001358 | 0.6721301 |
| 31 | Psoriasis | EFO_0000676 | 0.67159593 |
| 32 | Viral disease | EFO_0000763 | 0.67042077 |
| 33 | Cystic fibrosis | Orphanet_586 | 0.6682126 |
| 34 | Abdominal aortic aneurysm | EFO_0004214 | 0.653386 |
| 35 | Psoriasis vulgaris | EFO_1001494 | 0.63942015 |