Shengtian Sang, Zhihao Yang, Lei Wang, Xiaoxia Liu, Hongfei Lin, Jian Wang.
Abstract
BACKGROUND: Drug discovery is the process through which potential new medicines are identified. High-throughput screening and computer-aided drug discovery/design are currently the two main drug discovery methods, and both have successfully discovered a series of drugs. However, developing new drugs is still an extremely time-consuming and expensive process. Biomedical literature contains important clues for identifying potential treatments and could support biomedical experts on their way towards new discoveries.
Keywords: Drug discovery; Knowledge graph; Literature mining; Literature-based discovery
Year: 2018 PMID: 29843590 PMCID: PMC5975655 DOI: 10.1186/s12859-018-2167-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The prototype example of SemKG. The symbols e, r and t represent an entity, a relation and the type of an entity, respectively; no is the number of occurrences and pmid is the PubMed ID.
Fig. 2 An illustration of one edge in SemKG
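Fig. 1 describes each SemKG element as an entity e with a type t, a relation r, an occurrence count no and the supporting PubMed IDs. A minimal Python sketch of such an edge record might look like the following. It assumes edges are built by merging repeated (subject, relation, object) predications. The class names, field names, row layout and sample rows (including placeholder IDs such as `pmid-1`) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical input row: (head, head_type, relation, tail, tail_type, pmid)
Predication = Tuple[str, str, str, str, str, str]


@dataclass
class Entity:
    name: str      # e in Fig. 1
    sem_type: str  # t in Fig. 1 (semantic type of the entity)


@dataclass
class Edge:
    head: Entity
    relation: str                                 # r in Fig. 1
    tail: Entity
    occurrences: int = 0                          # "no" in Fig. 1
    pmids: Set[str] = field(default_factory=set)  # "pmid" in Fig. 1


def build_edges(predications: Iterable[Predication]) -> List[Edge]:
    """Merge repeated (head, relation, tail) predications into one edge each."""
    edges: Dict[Tuple[str, str, str], Edge] = {}
    for head, head_type, relation, tail, tail_type, pmid in predications:
        key = (head, relation, tail)
        if key not in edges:
            edges[key] = Edge(Entity(head, head_type), relation, Entity(tail, tail_type))
        edge = edges[key]
        edge.occurrences += 1  # every supporting predication counts once
        edge.pmids.add(pmid)   # distinct source abstracts
    return list(edges.values())


# Placeholder rows for illustration only.
rows = [
    ("aspirin", "Pharmacologic Substance", "INHIBITS", "cyclooxygenase", "Enzyme", "pmid-1"),
    ("aspirin", "Pharmacologic Substance", "INHIBITS", "cyclooxygenase", "Enzyme", "pmid-2"),
    ("cyclooxygenase", "Enzyme", "ASSOCIATED_WITH", "inflammation", "Pathologic Function", "pmid-2"),
]
for edge in build_edges(rows):
    print(edge.head.name, edge.relation, edge.tail.name, edge.occurrences, sorted(edge.pmids))
```

Keying on (head, relation, tail) means that an edge's occurrence count and its PubMed ID set grow as more abstracts repeat the same claim.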
Fig. 3 Feature selection of the SemaTyP method
Fig. 4 Random Walk Algorithm for drug discovery
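Fig. 4 names a random walk algorithm for drug discovery, but its steps are not reproduced in this record. The sketch below is a generic illustration only and is not the authors' algorithm. It runs a plain k-step uniform random walk from a disease node over an undirected graph, then ranks candidate drug nodes by the probability of the walk ending on them. The function names, the `steps` parameter and the toy graph are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def walk_distribution(edges: Iterable[Tuple[str, str]], start: str, steps: int) -> Dict[str, float]:
    """Probability of a uniform random walker being at each node after `steps` moves."""
    adj: Dict[str, List[str]] = defaultdict(list)
    for u, v in edges:  # treat relations as undirected for the walk
        adj[u].append(v)
        adj[v].append(u)
    prob: Dict[str, float] = {start: 1.0}
    for _ in range(steps):
        nxt: Dict[str, float] = defaultdict(float)
        for node, p in prob.items():
            neighbours = adj[node]
            if not neighbours:  # dangling node: the walker stays put
                nxt[node] += p
                continue
            share = p / len(neighbours)  # move to each neighbour with equal probability
            for n in neighbours:
                nxt[n] += share
        prob = dict(nxt)
    return prob


def rank_drugs(edges: Iterable[Tuple[str, str]], disease: str,
               candidate_drugs: Sequence[str], steps: int) -> List[str]:
    """Order candidate drugs by walk probability (ties broken alphabetically)."""
    prob = walk_distribution(edges, disease, steps)
    return sorted(candidate_drugs, key=lambda d: (-prob.get(d, 0.0), d))


graph = [("disease", "target1"), ("disease", "target2"),
         ("target1", "drugA"), ("target1", "drugB"), ("target2", "drugA")]
print(rank_drugs(graph, "disease", ["drugB", "drugA", "drugC"], steps=2))
```

On the toy graph, this ranking returns `['drugA', 'drugB', 'drugC']`. drugA is reachable through two targets (probability 5/12), drugB through one (1/6), and drugC not at all.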
Detailed statistics of SemKG
| Materials | Number |
|---|---|
| PubMed abstracts | 22,769,789 |
| Predications | 39,133,975 |
| Selected predications | 17,651,279 |
| Entities of SemKG | 1,067,092 |
| Relations of SemKG | 14,419,744 |
| Entity types | 133 |
| Relation types | 52 |
Fig. 5 The distribution of semantic types in SemKG
Fig. 6 The performance of SemaTyP
Results of the logistic regression model with different regularizations
| Regularization parameter | Precision (L1) | Precision (L2) | Recall (L1) | Recall (L2) | F-score (L1) | F-score (L2) |
|---|---|---|---|---|---|---|
| 0.0001 | 0.908 | 0.923 | 0.889 | 0.899 | 0.903 | 0.911 |
| 0.001 | 0.907 | 0.914 | 0.878 | 0.88 | 0.892 | 0.900 |
| 0.01 | 0.899 | 0.903 | 0.869 | 0.876 | 0.884 | 0.889 |
| 0.1 | 0.905 | 0.903 | 0.887 | 0.877 | 0.896 | 0.89 |
| 1 | 0.866 | 0.907 | 0.849 | 0.879 | 0.857 | 0.892 |
| 10 | 0.847 | 0.902 | 0.837 | 0.877 | 0.842 | 0.889 |
| 100 | 0.823 | 0.893 | 0.811 | 0.876 | 0.817 | 0.884 |
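The table contrasts L1 and L2 penalties. It does not state the solver, nor whether the first column is a penalty weight or its inverse (as with `C` in some libraries). Given those unknowns, the minimal sketch below fits logistic regression in one of two ways. With an L2 penalty it uses plain gradient descent. With an L1 penalty it uses proximal gradient descent (soft-thresholding). The names `train_logreg` and `lam`, the learning rate and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(x: float, t: float) -> float:
    # Proximal operator of t*|x|: shrinks x towards 0 and returns exactly 0 when |x| <= t.
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def train_logreg(X: Sequence[Sequence[float]], y: Sequence[int], penalty: str = "l2",
                 lam: float = 0.2, lr: float = 0.1, epochs: int = 500) -> Tuple[List[float], float]:
    """Minimise mean log-loss + lam*||w||_1 (l1) or (lam/2)*||w||_2^2 (l2); bias not penalised."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        b -= lr * grad_b
        if penalty == "l2":
            # Plain gradient step on loss + (lam/2)*||w||^2.
            w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        elif penalty == "l1":
            # Proximal gradient (ISTA) step: gradient on the loss, then soft-thresholding.
            w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        else:
            raise ValueError("penalty must be 'l1' or 'l2'")
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Toy data: feature 0 is strongly predictive; feature 1 is small-scale and only weakly informative.
X = [[2.0, 0.3], [1.5, 0.2], [1.0, 0.1], [-1.0, -0.1], [-1.5, -0.2], [-2.0, -0.3]]
y = [1, 1, 1, 0, 0, 0]
w_l1, b_l1 = train_logreg(X, y, penalty="l1")
w_l2, b_l2 = train_logreg(X, y, penalty="l2")
print(w_l1, w_l2)
```

On this toy set both fits classify every point correctly. The L1 fit keeps the weight of feature 1 at exactly 0.0, while the L2 fit gives it a small positive weight. This exact zeroing is why L1 penalties are commonly associated with sparse models.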
Fig. 7 Performance of SemaTyP with different sizes of training data
Performance of our model with different training data
| | Positive cases | Precision | Recall | F-score |
|---|---|---|---|---|
| 2 | 32 | - | - | - |
| 3 | 1742 | 0.791 | 0.787 | 0.789 |
| 4 | 19,230 | 0.907 | 0.879 | 0.892 |
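If the F-score column is the standard balanced F1 score (the harmonic mean of precision and recall), it can be recomputed from the other two columns. The row with 1742 positive cases, for example, gives 0.789. The helper names below are hypothetical, and the `beta` parameter is a generalisation that the source does not use.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean 2PR / (P + R)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Recompute the F-score of the row with 1742 positive cases from its precision and recall.
print(round(f_score(0.791, 0.787), 3))  # 0.789
```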
Performance in discovering drugs for diseases
| Method | Not found | Mean ranking | Hits@10 (%) |
|---|---|---|---|
| RWA_1 | 262 | 72.28 | 28.8 |
| RWA_2 | 57 | 26.59 | 24.46 |
| RWA_3 | 2 | 32.45 | 23.37 |
| RWA_4 | 0 | 34.26 | 19.57 |
| RWA_5 | 0 | 35.81 | 18.75 |
| RWA_6 | 0 | 39.14 | 16.03 |
| RWA_7 | 0 | 42.13 | 14.95 |
| RWA_8 | 0 | 44.15 | 13.59 |
| RWA_9 | 0 | 45.69 | 11.96 |
| RWA_10 | 0 | 46.19 | 11.69 |
| NRWRH | 19 | 31.05 | 29.72 |
| TP-NRWRH | 17 | 29.87 | 30.83 |
| Our method | 0 | | |
Bold values denote the best score for each metric
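The three columns of the drug-discovery table can be computed from the rank each method assigns to a known drug for its disease. The sketch below is a minimal version built on several assumptions that the source does not confirm. A rank of `None` marks a known drug that was not found. The mean rank averages only the found drugs. Hits@10 is taken over all pairs, so a drug that was not found counts as a miss. The function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def ranking_metrics(ranks: Sequence[Optional[int]], k: int = 10) -> Tuple[int, float, float]:
    """Return (not found, mean rank, Hits@k in %) for one rank per known drug-disease pair."""
    found = [r for r in ranks if r is not None]
    not_found = len(ranks) - len(found)
    # Mean rank over retrieved pairs only (assumption: unranked pairs are excluded).
    mean_rank = sum(found) / len(found) if found else float("nan")
    # Hits@k over all pairs, so an unretrieved drug counts as a miss (assumption).
    hits_at_k = 100.0 * sum(1 for r in found if r <= k) / len(ranks) if ranks else 0.0
    return not_found, mean_rank, hits_at_k


print(ranking_metrics([1, 3, 12, None]))  # (1, 5.333333333333333, 50.0)
```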
Case study: rediscovering known drugs for diseases and proposing new mechanisms of action for the drugs
| Disease | Target | Drug | Rank |
|---|---|---|---|
| Osteoporosis | col18a1 | Testosterone | 1 |
| Osteoporosis | Bone metabolism | ap22408 | 3 |
| Cardiac arrhythmia | Actin | Terikalant | 8 |
| Cardiovascular disease | Lymphoid cell | Aspirin | 1 |
| Cardiovascular disease | slc5a1 | l-nmma | 2 |
| Skin allergy | Calprotectin | Mometasone | 1 |
| Osteoporosis | Kinase | Calcium-sensing receptor antagonist | 3 |
| Anxiety disorder | netrin-1 | Benzodiazepine | 1 |
| Anxiety disorder | Urotensin ii | Anxiolytic | 2 |
| Anxiety disorder | Platelet activating factor | Buspirone | 4 |
| Convulsion | epr | Anidulafungin | 7 |
| Graft-versus-host disease | fgf21 | Flavopiridol | 12 |