Go Eun Heo¹, Qing Xie¹, Min Song², Jeong-Hoon Lee³.
Abstract
BACKGROUND: Extracting useful information from the biomedical literature plays an important role in the development of modern medicine. In natural language processing, there have been rigorous attempts to find meaningful relationships between entities automatically using co-occurrence-based methods. It has become increasingly important to understand whether a relationship exists between any two entities extracted from a large number of texts and, if so, how strong it is. One of the defining methods is to measure the semantic similarity and relatedness between two entities.
Keywords: Alzheimer’s disease; Information extraction; Knowledge discovery; Ranking algorithm; Semantic relatedness
Year: 2019 PMID: 31801521 PMCID: PMC6894106 DOI: 10.1186/s12911-019-0934-5
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Overview of the proposed approach
Fig. 2 Number of papers by publication year
Pseudocode for our algorithm.
Alzheimer’s disease–APP direct entity pairs
| Entity(A) | Entity(C) | Direct Frequency | pmid_same | pmid_different | relatedness | Direct score (Ydirect) |
|---|---|---|---|---|---|---|
| Alzheimer’s disease | APP | 5126 | 4123 | 1003 | 0.427029 | 3949.592 |
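The Ydirect value in this row is reproduced, to rounding, by weighting the relatedness by the direct frequency plus the same-PMID count: (5126 + 4123) × 0.427029 ≈ 3949.59. This relationship is inferred from the single row shown here and is not stated in this record, so the sketch below (the function name `direct_score` is hypothetical) should be read as an assumption rather than the authors' formula:

```python
def direct_score(direct_frequency: int, pmid_same: int, relatedness: float) -> float:
    """Inferred direct score: (direct frequency + same-PMID count) * relatedness.

    Assumption: reverse-engineered from one table row, not confirmed by the source.
    Since direct_frequency = pmid_same + pmid_different in that row, this equals
    (2 * pmid_same + pmid_different) * relatedness, i.e. same-PMID pairs count twice.
    """
    return (direct_frequency + pmid_same) * relatedness


# Values from the Alzheimer's disease-APP row of the table.
score = direct_score(direct_frequency=5126, pmid_same=4123, relatedness=0.427029)
print(round(score, 3))
```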
Indirect entity pair scores
| Entity A | Co-occurrence (A, B) | Relatedness (A, B) | Middle word B | Co-occurrence (B, C) | Relatedness (B, C) | Entity C | Score X |
|---|---|---|---|---|---|---|---|
| Alzheimer’s disease | 1750 | 0.434575 | PSEN1 | 1562 | 0.712967 | APP | 1874.1596 |
| Alzheimer’s disease | 692 | 0.398862 | BACE1 | 1294 | 0.774334 | APP | 1278.0003 |
| Alzheimer’s disease | 3470 | 0.546675 | amyloid beta | 652 | 0.706621 | APP | 2357.6781 |
| Alzheimer’s disease | 471 | 0.449012 | PSEN2 | 648 | 0.703564 | APP | 667.3944 |
| Alzheimer’s disease | 5107 | 0.464037 | tau | 526 | 0.628522 | APP | 2700.4406 |
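Every Score X in this table equals, to within about 0.002, the sum of the two co-occurrence-weighted relatedness values: X ≈ co-occurrence(A, B) × relatedness(A, B) + co-occurrence(B, C) × relatedness(B, C). For the PSEN1 row, 1750 × 0.434575 + 1562 × 0.712967 ≈ 1874.16. The formula is inferred from the numbers rather than quoted from the paper; the sketch below (names such as `IndirectPath` are hypothetical) recomputes the scores and ranks the middle words by them:

```python
from typing import NamedTuple


class IndirectPath(NamedTuple):
    """One A -> B -> C path from the table."""
    middle: str
    cooc_ab: int
    rel_ab: float
    cooc_bc: int
    rel_bc: float


def indirect_score(p: IndirectPath) -> float:
    """Inferred Score X: co-occurrence-weighted relatedness summed over both hops."""
    return p.cooc_ab * p.rel_ab + p.cooc_bc * p.rel_bc


# Rows of the table (Entity A = Alzheimer's disease, Entity C = APP).
paths = [
    IndirectPath("PSEN1", 1750, 0.434575, 1562, 0.712967),
    IndirectPath("BACE1", 692, 0.398862, 1294, 0.774334),
    IndirectPath("amyloid beta", 3470, 0.546675, 652, 0.706621),
    IndirectPath("PSEN2", 471, 0.449012, 648, 0.703564),
    IndirectPath("tau", 5107, 0.464037, 526, 0.628522),
]

# Rank middle words from strongest to weakest indirect connection.
ranked = sorted(paths, key=indirect_score, reverse=True)
for p in ranked:
    print(f"{p.middle:>12}  {indirect_score(p):.4f}")
```

Under this reading, tau and amyloid beta are the strongest bridges between Alzheimer’s disease and APP, because both have very high co-occurrence with Alzheimer’s disease.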
Alzheimer’s disease top 20 related entity scores (PKDE4J)
| No | Entity A | Entity C | Proposed | Co-occurrence | Word2Vec | COALS | Random indexing |
|---|---|---|---|---|---|---|---|
| 1 | Alzheimer’s disease | TAU | 1 | 0.6181 | 0.607 | 0.6424 | 0.6188 |
| 2 | Alzheimer’s disease | MCI | 0.99 | 1 | 0.7571 | 0.1408 | 0.084 |
| 3 | Alzheimer’s disease | Memory | 0.9873 | 0.5843 | 0.6139 | 0.6618 | 0.6395 |
| 4 | Alzheimer’s disease | Parkinson’s disease | 0.935 | 0.5004 | 0.8704 | 1 | 1 |
| 5 | Alzheimer’s disease | CSF | 0.9072 | 0.4738 | 0.5717 | 0.1133 | 0.0547 |
| 6 | Alzheimer’s disease | APP | 0.9062 | 0.6204 | 0.5685 | 0.3317 | 0.2876 |
| 7 | Alzheimer’s disease | APOE | 0.8879 | 0.4328 | 0.606 | 0.1214 | 0.0633 |
| 8 | Alzheimer’s disease | Neurodegenerative diseases | 0.8689 | 0.4348 | 0.7678 | 0.11 | 0.0512 |
| 9 | Alzheimer’s disease | Impairment | 0.8035 | 0.1258 | 0.7224 | 0.9951 | 0.9948 |
| 10 | Alzheimer’s disease | Amyloid beta | 0.8024 | 0.4199 | 0.615 | 0.0777 | 0.0661 |
| 11 | Alzheimer’s disease | Cognitive impairment | 0.8002 | 0.1237 | 0.7617 | 0.1019 | 0.0426 |
| 12 | Alzheimer’s disease | Neurodegeneration | 0.7984 | 0.1464 | 0.7375 | 0.233 | 0.1823 |
| 13 | Alzheimer’s disease | Neurodegenerative disorders | 0.7863 | 0.2935 | 0.767 | 0.1521 | 0.0961 |
| 14 | Alzheimer’s disease | Depression | 0.7827 | 0.241 | 0.6844 | 0.4013 | 0.3617 |
| 15 | Alzheimer’s disease | Oxidative stress | 0.782 | 0.2512 | 0.6038 | 0.1084 | 0.0495 |
| 16 | Alzheimer’s disease | Hippocampus | 0.7794 | 0.0856 | 0.6091 | 0.1553 | 0.0995 |
| 17 | Alzheimer’s disease | Vascular dementia | 0.7683 | 0.3273 | 0.7796 | 0.6845 | 0.6636 |
| 18 | Alzheimer’s disease | Patients | 0.7589 | 0.016 | 0.9726 | 0.1235 | 0.0661 |
| 19 | Alzheimer’s disease | Neurofibrillary tangles | 0.7448 | 0.3975 | 0.6191 | 0.1553 | 0.0995 |
| 20 | Alzheimer’s disease | MRI | 0.7405 | 0.1315 | 0.5843 | 0.1472 | 0.0909 |
Top 20 Alzheimer’s disease–related entities by each method (PKDE4J): Proposed, Co-occurrence, Word2Vec, COALS, Random indexing
Alzheimer’s disease top 20 related entity scores (SemRep)
| No | Entity A | Entity C | Ranking score | Co-occurrence | Word2Vec | COALS | Random indexing |
|---|---|---|---|---|---|---|---|
| 1 | Alzheimer’s disease | Patients | 1 | 1 | 0.5444 | 0.0413 | 0.0206 |
| 2 | Alzheimer’s disease | Disease | 0.9065 | 0.1167 | 0.98 | 0.8615 | 0.8585 |
| 3 | Alzheimer’s disease | Brain | 0.5989 | 0.023 | 0.6085 | 0.1052 | 0.0859 |
| 4 | Alzheimer’s disease | Dementia | 0.5902 | 0.1187 | 0.733 | 0.6591 | 0.6518 |
| 5 | Alzheimer’s disease | Impaired cognition | 0.5013 | 0.0637 | 0.6737 | 0.0413 | 0.0206 |
| 6 | Alzheimer’s disease | Therapeutic procedure | 0.4828 | 0.0512 | 0.6606 | 0.0413 | 0.0206 |
| 7 | Alzheimer’s disease | Neurodegenerative disorders | 0.4371 | 0.0444 | 0.7707 | 0.0413 | 0.0206 |
| 8 | Alzheimer’s disease | Persons | 0.408 | 0.0879 | 0.6314 | 0.0413 | 0.0206 |
| 9 | Alzheimer’s disease | Amyloid | 0.4018 | 0.0357 | 0.6118 | 0.1159 | 0.0968 |
| 10 | Alzheimer’s disease | Pharmaceutical preparations | 0.394 | 0.03 | 0.7011 | 0.0413 | 0.0206 |
| 11 | Alzheimer’s disease | APP gene | 0.3845 | 0.0098 | 0.5346 | 0.0413 | 0.0206 |
| 12 | Alzheimer’s disease | Amyloid beta-protein precursor | 0.3751 | 0.0211 | 0.3492 | 0.0413 | 0.0206 |
| 13 | Alzheimer’s disease | Functional disorder | 0.3746 | 0.0354 | 0.7702 | 0.0413 | 0.0206 |
| 14 | Alzheimer’s disease | Apolipoprotein E | 0.3707 | 0.0292 | 0.5656 | 0.052 | 0.0315 |
| 15 | Alzheimer’s disease | Parkinson’s disease | 0.3702 | 0.0052 | 0.9802 | 0.0413 | 0.0206 |
| 16 | Alzheimer’s disease | Population group | 0.3692 | 0.0464 | 0.629 | 0.0413 | 0.0206 |
| 17 | Alzheimer’s disease | Pathogenesis | 0.3685 | 0.0405 | 0.7519 | 0.0413 | 0.0206 |
| 18 | Alzheimer’s disease | Dementia, vascular | 0.3552 | 0.0095 | 0.7878 | 0.6494 | 0.5067 |
| 19 | Alzheimer’s disease | Nerve Degeneration | 0.3546 | 0.0282 | 0.712 | 0.0413 | 0.0206 |
| 20 | Alzheimer’s disease | Entire hippocampus | 0.3541 | 0.0038 | 0.5782 | 0.0413 | 0.0206 |
Top 20 Alzheimer’s disease–related entities by each method (SemRep): Proposed, Co-occurrence, Word2Vec, COALS, Random indexing
Fig. 3 Alzheimer’s disease–related gene ranking (PKDE4J)
Comparison of Alzheimer’s disease–related gene ranking (PKDE4J)
| Pair rank | Co-occurrence | COALS | Random indexing | Word2Vec | Proposed |
|---|---|---|---|---|---|
| 1–10 | 1 | 0 | 0 | 0 | 1 |
| 11–100 | 2 | 2 | 2 | 0 | 1 |
| 101–500 | 4 | 1 | 1 | 0 | 5 |
| 501–2000 | 6 | 6 | 6 | 1 | 5 |
| 2000–3999 | 2 | 5 | 5 | 6 | 4 |
| 4000–5999 | 3 | 2 | 2 | 7 | 2 |
| 6000–9000 | 1 | 1 | 1 | 5 | 1 |
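Summing the columns of this table gives 19 genes each for co-occurrence, Word2Vec and the proposed method, and 17 for COALS and random indexing. A sketch of how the ranks a method assigns to a reference gene list can be tallied into these rank bins; a rank of exactly 2000 is placed in the 501–2000 bin, ranks above 9000 are ignored, and the example ranks are hypothetical:

```python
from bisect import bisect_left

# Inclusive upper bound of each rank bin, following the table's ranges.
BIN_UPPER = [10, 100, 500, 2000, 3999, 5999, 9000]
BIN_LABELS = ["1-10", "11-100", "101-500", "501-2000", "2001-3999", "4000-5999", "6000-9000"]


def tally_ranks(ranks: list[int]) -> dict[str, int]:
    """Count how many gene ranks fall into each bin; ranks above 9000 are skipped."""
    counts = {label: 0 for label in BIN_LABELS}
    for rank in ranks:
        idx = bisect_left(BIN_UPPER, rank)  # first bin whose upper bound >= rank
        if idx < len(BIN_UPPER):
            counts[BIN_LABELS[idx]] += 1
    return counts


# Hypothetical ranks of reference genes under one method.
print(tally_ranks([7, 150, 420, 1200, 2000, 2500, 7000]))
```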
Quantitative evaluation in PKDE4J
| Method name | Precision*10 | Recall | F-Measure |
|---|---|---|---|
| Proposed | 68.97% | 63.16% | 65.94% |
| Word2Vec | 5.74% | 5.26% | 5.30% |
| Co-occurrence | 68.97% | 63.16% | 65.94% |
| COALS | 51.72% | 47.37% | 49.45% |
| RI | 45.98% | 42.10% | 43.95% |
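The F-measure column is the harmonic mean of precision and recall, F = 2PR / (P + R). The short check below recomputes it from the listed percentages; it reproduces the Proposed, Co-occurrence, COALS and RI rows to two decimals, while the Word2Vec row gives about 5.49% rather than the listed 5.30%:

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); returns 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as listed in the table.
rows = {
    "Proposed": (68.97, 63.16),
    "Word2Vec": (5.74, 5.26),
    "Co-occurrence": (68.97, 63.16),
    "COALS": (51.72, 47.37),
    "RI": (45.98, 42.10),
}
for name, (p, r) in rows.items():
    print(f"{name:>13}: F = {f_measure(p, r):.2f}%")
```

Because the formula is homogeneous, it gives the same result whether precision and recall are passed as percentages or as fractions.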
Fig. 4 Genome-wide overview
Top five important pathways sorted by p-value for each gene list
| Pathway name | Entities found | Entities ratio | Entities p-value | Entities FDR* | Reactions found | Reactions ratio |
|---|---|---|---|---|---|---|
| Proposed | | | | | | |
| Nuclear signaling by ERBB4 | 3 / 35 | 0.002 | 5.37e-05 | 0.012 | 3 / 22 | 0.002 |
| Signaling by interleukins | 7 / 640 | 0.046 | 2.60e-04 | 0.019 | 6 / 491 | 0.041 |
| MECP2 regulates transcription of neuronal ligands | 3 / 61 | 0.004 | 2.74e-04 | 0.019 | 3 / 37 | 0.003 |
| Signaling by receptor tyrosine kinases | 2 / 13 | 9.25e-04 | 3.41e-04 | 0.019 | 2 / 8 | 6.68e-04 |
| NRIF signals cell death from the nucleus | 6 / 521 | 0.037 | 5.89e-04 | 0.023 | 71 / 633 | 0.053 |
| Co-occurrence | | | | | | |
| MECP2 regulates transcription of neuronal ligands | 2 / 13 | 9.25e-04 | 3.18e-04 | 0.036 | 2 / 8 | 6.68e-04 |
| RUNX1 and FOXP3 control the development of regulatory T lymphocytes (Tregs) | 2 / 17 | 0.001 | 5.41e-04 | 0.036 | 2 / 20 | 0.002 |
| NRIF signals cell death from the nucleus | 2 / 18 | 0.001 | 6.06e-04 | 0.036 | 4 / 7 | 5.84e-04 |
| Amyloid fiber formation | 3 / 88 | 0.006 | 7.15e-04 | 0.036 | 16 / 33 | 0.003 |
| Neurodegenerative diseases | 2 / 30 | 0.002 | 0.002 | 0.056 | 2 / 22 | 0.002 |
| COALS | | | | | | |
| Plasma lipoprotein assembly | 3 / 30 | 0.002 | 4.18e-05 | 0.01 | 8 / 19 | 0.002 |
| MECP2 regulates transcription of neuronal ligands | 2 / 13 | 9.25e-04 | 3.91e-04 | 0.042 | 2 / 8 | 6.68e-04 |
| HDL assembly | 2 / 18 | 0.001 | 7.44e-04 | 0.042 | 7 / 9 | 7.51e-04 |
| NRIF signals cell death from the nucleus | 2 / 18 | 0.001 | 7.44e-04 | 0.042 | 4 / 7 | 5.84e-04 |
| Amyloid fiber formation | 3 / 88 | 0.006 | 9.67e-04 | 0.044 | 16 / 33 | 0.003 |
| Word2Vec | | | | | | |
| Transfer of LPS from LBP carrier to CD14 | 1 / 3 | 2.13e-04 | 0.005 | 0.075 | 2 / 2 | 1.67e-04 |
| NTF3 activates NTRK2 (TRKB) signaling | 1 / 4 | 2.85e-04 | 0.006 | 0.075 | 3 / 3 | 2.50e-04 |
| NTF4 activates NTRK2 (TRKB) signaling | 1 / 4 | 2.85e-04 | 0.006 | 0.075 | 3 / 3 | 2.50e-04 |
| BDNF activates NTRK2 (TRKB) signaling | 1 / 4 | 2.85e-04 | 0.006 | 0.075 | 3 / 3 | 2.50e-04 |
| Defective GSS causes glutathione synthetase deficiency (GSS deficiency) | 1 / 4 | 2.85e-04 | 0.006 | 0.075 | 1 / 1 | 8.34e-05 |
* False Discovery Rate
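The column layout (entities found, ratio, p-value, FDR, reactions found, ratio) resembles the output of Reactome's pathway over-representation analysis; that this tool was used is an assumption, since the record does not name it. Over-representation p-values of this kind are typically computed from the hypergeometric upper tail, and FDR values from a Benjamini–Hochberg adjustment. A minimal standard-library sketch of both, with small hypothetical inputs rather than the study's gene lists:

```python
from math import comb


def hypergeom_upper_tail(found: int, list_size: int, pathway_size: int, background: int) -> float:
    """P(X >= found) when list_size items are drawn without replacement from
    `background` items, of which `pathway_size` belong to the pathway."""
    upper = min(pathway_size, list_size)
    total = comb(background, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(background - pathway_size, list_size - k)
        for k in range(found, upper + 1)
    )
    return tail / total


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 10 genes submitted against a background of 10,000 genes.
print(hypergeom_upper_tail(found=3, list_size=10, pathway_size=35, background=10_000))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```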
Gene rankings for each method, ordered by pathway number
| Extraction | Gene | Proposed | Co-occurrence | Word2Vec | COALS | Random indexing | Pathways |
|---|---|---|---|---|---|---|---|
| PKDE4J | *TNF | 361 | 742 | 4876 | 548 | 851 | 57 |
| | *EGFR | 1468 | 2063 | 5550 | 2391 | 3381 | 43 |
| | *GSK3B | 990 | 1374 | 2341 | 2119 | 2349 | 39 |
| | *CREB1 | 6298 | 8240 | 6792 | 6468 | 7460 | 39 |
| | IL1B | 2401 | 1292 | 3451 | 3043 | 3733 | 37 |
| | *SRC | 3854 | 8230 | 3984 | 4257 | 5257 | 36 |
| | *ATF4 | 2890 | 6965 | 4204 | 3430 | 4538 | 32 |
| | MYC | 6531 | 4909 | 4957 | 6688 | 6578 | 32 |
| | *IGF1 | 446 | 463 | 3410 | 1940 | 2040 | 30 |
| | IGF1R | 1868 | 1397 | 3697 | 519 | 1119 | 29 |
| SemRep | MAP2K1 | 3466 | 3462 | 3433 | 1.5 | 1.2 | 80 |
| | PRKACB | 3952 | 2131 | 3444 | 1.5 | 1.2 | 67 |
| | *MAPK8 | 253 | 1004 | 3410 | 1.5 | 1.2 | 59 |
| | *TNF | 731 | 3564 | 2460 | 1.5 | 1.2 | 57 |
| | JUN | 2428 | 1804 | 2799 | 1.5 | 1.2 | 49 |
| | *TP53 | 399 | 724 | 2492 | 1.5 | 1.2 | 48 |
| | *IL6 | 925 | 1012 | 2816 | 1.5 | 1.2 | 43 |
| | EGFR | 3201 | 3032 | 2497 | 1.5 | 1.2 | 43 |
| | *BAX | 2099 | 3643 | 2780 | 1.5 | 1.2 | 41 |
| | IL1B | 2264 | 1910 | 2246 | 1.5 | 1.2 | 37 |
Genes marked with an asterisk (*) are those for which the proposed method produces a better ranking than all of the other methods.
Number of top 10 genes (ordered by pathway number) for which each method gives the best ranking
| System | Co-occurrence | COALS | Random indexing | Word2Vec | Proposed |
|---|---|---|---|---|---|
| PKDE4J | 3 | 0 | 0 | 0 | 7 |
| SemRep | 3 | – | – | 2 | 5 |
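The SemRep counts in this table can be recomputed from the gene-ranking table by checking, for each gene, which method assigns the lowest (best) rank. A sketch using the SemRep rows, with COALS and random indexing left out as in the summary; it yields 5 for the proposed method (MAPK8, TNF, TP53, IL6 and BAX, the starred genes), 3 for co-occurrence and 2 for Word2Vec. The names `SEMREP_RANKS` and `best_method_counts` are hypothetical:

```python
from collections import Counter

METHODS = ("Proposed", "Co-occurrence", "Word2Vec")

# SemRep ranks (lower is better) for the top 10 genes by pathway number.
SEMREP_RANKS = {
    "MAP2K1": (3466, 3462, 3433),
    "PRKACB": (3952, 2131, 3444),
    "MAPK8": (253, 1004, 3410),
    "TNF": (731, 3564, 2460),
    "JUN": (2428, 1804, 2799),
    "TP53": (399, 724, 2492),
    "IL6": (925, 1012, 2816),
    "EGFR": (3201, 3032, 2497),
    "BAX": (2099, 3643, 2780),
    "IL1B": (2264, 1910, 2246),
}


def best_method_counts(ranks: dict[str, tuple[int, ...]]) -> Counter:
    """Count, per method, the genes for which that method gives the lowest rank."""
    counts: Counter = Counter()
    for gene_ranks in ranks.values():
        # Ties would go to the earlier method in METHODS; none occur in this data.
        best = min(range(len(METHODS)), key=lambda i: gene_ranks[i])
        counts[METHODS[best]] += 1
    return counts


print(best_method_counts(SEMREP_RANKS))
```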
Indirect relations for Alzheimer’s disease and coronary artery disease
| Entity A | Co-occurrences (A, B) | Relatedness (A, B) | Intermediate entity B | Co-occurrences (B, C) | Relatedness (B, C) | Entity C |
|---|---|---|---|---|---|---|
| Alzheimer’s disease | 1128 | 0.65167 | Diabetes | 14 | 0.75549 | Coronary artery disease |
| Alzheimer’s disease | 409 | 0.61262 | Hypertension | 13 | 0.78521 | Coronary artery disease |
| Alzheimer’s disease | 687 | 0.46660 | Cholesterol | 10 | 0.54389 | Coronary artery disease |
| Alzheimer’s disease | 1789 | 0.49688 | APOE | 9 | 0.55493 | Coronary artery disease |
| Alzheimer’s disease | 452 | 0.68793 | Atherosclerosis | 7 | 0.81259 | Coronary artery disease |
| Alzheimer’s disease | 437 | 0.65669 | Type 2 diabetes | 7 | 0.74681 | Coronary artery disease |
| Alzheimer’s disease | 1166 | 0.73817 | Schizophrenia | 6 | 0.66643 | Coronary artery disease |
| Alzheimer’s disease | 408 | 0.66002 | Diabetes mellitus | 6 | 0.75417 | Coronary artery disease |
| Alzheimer’s disease | 28 | 0.64372 | Atrial fibrillation | 6 | 0.70951 | Coronary artery disease |
| Alzheimer’s disease | 147 | 0.69426 | Bipolar disorder | 4 | 0.66286 | Coronary artery disease |
| Alzheimer’s disease | 49 | 0.66373 | Heart failure | 4 | 0.78867 | Coronary artery disease |
| Alzheimer’s disease | 27 | 0.52386 | PON1 | 4 | 0.59528 | Coronary artery disease |
| Alzheimer’s disease | 4135 | 0.97112 | Parkinson’s disease | 3 | 0.84996 | Coronary artery disease |
| Alzheimer’s disease | 1992 | 0.58868 | Depression | 3 | 0.55049 | Coronary artery disease |
| Alzheimer’s disease | 357 | 0.63473 | Obesity | 3 | 0.72262 | Coronary artery disease |
| Alzheimer’s disease | 221 | 0.50534 | APOE4 | 3 | 0.54360 | Coronary artery disease |
| Alzheimer’s disease | 165 | 0.73497 | Osteoporosis | 3 | 0.75669 | Coronary artery disease |
| Alzheimer’s disease | 58 | 0.64033 | Genome-wide association study | 3 | 0.65748 | Coronary artery disease |
| Alzheimer’s disease | 4414 | 0.66884 | Mild cognitive impairment | 2 | 0.61373 | Coronary artery disease |
Gene–phenotype ranking by number of common-phenotype genes (PKDE4J)
| Entity A | Entity C (gene only) | Phenotype | Phenotype-related genes | Common-phenotype genes |
|---|---|---|---|---|
| Alzheimer’s disease | CD36 | Platelet glycoprotein IV deficiency | CD36 | 17 |
| | | Macrothrombocytopenia | – | |
| | | Coronary heart disease | CD36 | |
| | | Malaria, cerebral | ACKR1, FCGR2A, FCGR2B, FCGR2A, FCGR2B, CR1, GYPC, CISH, GYPB, GYPA, TNF, HBB, TIRAP, NOS2A, SLC4A1, ICAM1, G6PD, CD36 | |
| Alzheimer’s disease | IL10 | Graft-versus-host disease | IL10 | 13 |
| | | HIV-1 | CXCR1, CX3CR1, TLR3, HLAC, CXCL12, IFNG, IL4R, CCL3L1, CCL2, CCL11, CCL3, CD209, KIR3DL1, IL10 | |
| | | Rheumatoid arthritis | – | |
| Alzheimer’s disease | ABCA1 | HDL deficiency | APOA1, ABCA1 | 2 |
| | | Tangier disease | ABCA1 | |
| | | Coronary artery disease, familial | LDLR, ABCA1 | |
Top ten gene rankings, ranked by number of common-phenotype genes
| System | Co-occurrence | COALS | Random indexing | Word2Vec | Proposed |
|---|---|---|---|---|---|
| PKDE4J | 2 | 0 | 0 | 2 | 6 |
| SemRep | 0 | – | – | 2 | 8 |