Yinyu Lan, Shizhu He, Kang Liu, Xiangrong Zeng, Shengping Liu, Jun Zhao.
Abstract
BACKGROUND: Knowledge graphs (KGs), especially medical knowledge graphs, are often significantly incomplete, which creates a demand for medical knowledge graph completion (MedKGC). MedKGC finds new facts based on the existing knowledge in the KGs. Path-based knowledge reasoning is one of the most important approaches to this task and has received great attention in recent years because of its high performance and interpretability. Traditional methods such as the path ranking algorithm (PRA) take the paths between an entity pair as atomic features. However, medical KGs are very sparse, which makes it difficult to learn effective semantic representations for extremely sparse path features. This sparsity is mainly reflected in the long-tailed distribution of entities and paths. Previous methods consider only the context structure of the paths in the knowledge graph and ignore the textual semantics of the symbols in the path. As a result, entity sparseness and path sparseness prevent their performance from improving further.
Keywords: Medical knowledge graph completion; Path-based knowledge reasoning; Pre-trained language model; Textual semantic representation
Year: 2021 PMID: 34844576 PMCID: PMC8628388 DOI: 10.1186/s12911-021-01622-7
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
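The abstract's remark that PRA-style methods treat the paths between an entity pair as atomic features can be illustrated with a small sketch. This is not the authors' implementation: the toy triples, the function names (`build_adjacency`, `relation_paths`), and the `max_length` cutoff are all assumptions made for illustration.

```python
from collections import defaultdict


def build_adjacency(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    adjacency = defaultdict(list)
    for head, relation, tail in triples:
        adjacency[head].append((relation, tail))
    return adjacency


def relation_paths(adjacency, source, target, max_length=3):
    """Enumerate relation sequences of simple paths from source to target."""
    found = []

    def walk(node, relations, visited):
        if node == target and relations:
            found.append(tuple(relations))
            return
        if len(relations) == max_length:
            return
        for relation, neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                walk(neighbor, relations + [relation], visited | {neighbor})

    walk(source, [], {source})
    return found


# Hypothetical toy triples, loosely modeled on the relation types in the paper.
TRIPLES = [
    ("upward gaze impairment", "symptom-related symptom", "hearing loss"),
    ("hearing loss", "symptom-related symptom", "deafness"),
    ("upward gaze impairment", "symptom-related disease", "migraine"),
    ("migraine", "disease-related symptom", "deafness"),
]

paths = relation_paths(build_adjacency(TRIPLES), "upward gaze impairment", "deafness")
# Each distinct relation sequence acts as one atomic (binary) feature for the pair.
features = {path: 1 for path in paths}
```

Because every distinct relation sequence becomes its own feature, rare sequences are seen only a handful of times in training, which is the path sparseness the abstract describes.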
Fig. 1 A subgraph in the Chinese symptom knowledge graph. The rectangles represent entities, the solid edges between entities represent the relationship between the entities connected in the path, and the dotted edges represent the relationship that combines the information on multiple paths to determine whether there is a relationship
Fig. 2 Long-tailed distribution of entities and paths in Chinese symptom knowledge graph
Fig. 3 The architecture of BERT enhanced entity representation used to extract path vector representation. is a dummy relation
Fig. 4 The architecture of BERT enhanced path representation. The operation denotes element-wise summation, and the operation denotes weighted summation. The dotted line represents the attention mechanism
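The Fig. 4 caption mentions an attention mechanism that combines paths by weighted summation. A minimal sketch of that idea follows, assuming dot-product scores against a query-relation vector and a softmax over paths. The scoring function, the vector sizes, and the names `softmax` and `attend` are illustrative assumptions; the record does not specify how the model computes its weights.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(path_vectors, query_vector):
    """Weight each path vector by its (assumed) dot-product score, then sum."""
    weights = softmax([dot(p, query_vector) for p in path_vectors])
    dim = len(query_vector)
    pooled = [
        sum(w * p[i] for w, p in zip(weights, path_vectors))
        for i in range(dim)
    ]
    return weights, pooled


# Two hypothetical 2-dimensional path vectors and a query-relation vector.
weights, pooled = attend([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
```

Here the first path aligns with the query vector and therefore receives the larger weight, so it dominates the pooled representation.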
Statistics of CSKG dataset
| Stats | Number |
|---|---|
| # CSKG triples | 629,538 |
| # Relation types | 17 |
| # Entities | 59,881 |
| # Paths | 28M |
| Avg. paths/query relation | 1.68M |
| Avg. path length | 3.88 |
| Max path length | 7 |
| Avg. training positive instances/query relation | 19,799 |
| Avg. training negative instances/query relation | 14,929 |
| Avg. positive test instances/query relation | 4,242 |
| Avg. negative test instances/query relation | 34,211 |
Experimental results on the CSKG dataset
| Model | %MAP |
|---|---|
| PRA | 43.78 |
| Path-RNN | 43.83 |
| Single-Model | 45.93 |
| Att-Model | 46.37 |
| Single-Model + Types | 48.24 |
| Att-Model + Types | 48.85 |
| BERT enhanced entity representation | |
| BERT enhanced path representation | |
Bold font shows best performance achieved in the experimental models
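For readers unfamiliar with the metric, the sketch below shows one common way to compute mean average precision as a percentage. It assumes each query yields a list of binary relevance labels already sorted by descending model score. How the paper groups queries is not stated in this record, so the grouping and the function names are assumptions.

```python
def average_precision(ranked_labels):
    """AP for one query; ranked_labels are 1/0 in descending score order."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """Mean of per-query AP values, reported as a percentage (%MAP)."""
    return 100.0 * sum(average_precision(q) for q in queries) / len(queries)


# Two hypothetical ranked result lists (1 = correct tail entity).
score = mean_average_precision([[1, 0, 1], [0, 1]])
```

The first list has positives at ranks 1 and 3, giving an AP of (1/1 + 2/3) / 2, and the second has its only positive at rank 2, giving 0.5.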
%MAP performance on each relation
| Relations | Single-model + types | Att-model + types | BERT enhanced entity representation | BERT enhanced path representation |
|---|---|---|---|---|
| 检查相关症状 (Examination-related symptoms) | 38.58 | 50.79 | 38.84(−11.95) | |
| 检查相关部位 (Examination-related body parts) | 52.90 | 51.64 | 52.75(−0.15) | |
| 疾病相关症状 (Disease-related symptoms) | 34.35 | 45.51 | 38.56(−6.95) | |
| 疾病相关科室 (Disease-related departments) | 43.56 | 42.27 | 47.23(+3.67) | |
| 检查相关检查 (Examination-related examinations) | 57.35 | 49.85 | 53.88(−3.47) | |
| 症状相关疾病 (Symptom-related diseases) | 39.54 | 46.81 | 47.12(+0.31) | |
| 疾病相关检查 (Disease-related examinations) | 43.64 | 38.28 | 38.23(−5.41) | |
| 症状相关科室 (Symptom-related departments) | 50.55 | 51.49 | 53.67(+2.18) | |
| 症状相关症状 (Symptom-related symptoms) | 57.59 | 71.44 | 48.75(−22.69) | |
| 疾病相关疾病 (Disease-related diseases) | 37.00 | 44.33 | 34.07(−10.26) | |
| 疾病相关药品 (Disease-related drugs) | 51.29 | 47.07 | 56.34(+5.05) | |
| 症状相关部位 (Symptom-related body parts) | 42.55 | 39.86 | 42.34(−0.21) | |
| 检查相关科室 (Examination-related departments) | 44.56 | 40.16 | 41.43(−3.13) | |
| 症状相关检查 (Symptom-related examinations) | 53.50 | 47.53 | 56.73(+3.23) | |
| 检查相关疾病 (Examination-related diseases) | 58.17 | 56.05 | 43.80(−14.37) | |
| 疾病相关部位 (Disease-related body parts) | 58.58 | 56.95 | 57.66(−0.92) | |
| 症状相关药品 (Symptom-related drugs) | 56.28 | 50.39 | 48.81(−7.39) | |
Bold font shows the best performance achieved among the experimental models. The value in parentheses is the difference in %MAP relative to the better of the Single-model + types and Att-model + types scores.
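The parenthesized values can be reproduced with simple arithmetic: subtract the better of the two baseline scores from the candidate model's score. The helper name `delta_vs_best_baseline` is hypothetical; the numbers in the example are taken from two rows of the table above.

```python
def delta_vs_best_baseline(single_types, att_types, candidate):
    """Difference in %MAP between a model and the better of two baselines."""
    return candidate - max(single_types, att_types)


# Disease-related departments: baselines 43.56 and 42.27, candidate 47.23.
departments = f"{delta_vs_best_baseline(43.56, 42.27, 47.23):+.2f}"  # "+3.67"

# Examination-related symptoms: baselines 38.58 and 50.79, candidate 38.84.
symptoms = f"{delta_vs_best_baseline(38.58, 50.79, 38.84):+.2f}"  # "-11.95"
```

A positive value means the model beats both baselines on that relation, and a negative value means at least one baseline is still ahead.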
Examples of attention mechanism in CSKG dataset
| Query | 症状相关症状(两眼上视障碍, 耳聋)? Symptom-related symptoms(upward gaze impairment in both eyes, deafness)? |
| High weight | 两眼上视障碍症状的相关症状是听觉下降, 听觉下降症状的相关症状是耳聋。 The related symptom of upward gaze impairment in both eyes is hearing loss, and the related symptom of hearing loss is deafness. |
| Low weight | 两眼上视障碍症状的相关疾病是偏头风, 偏头风疾病的相关疾病是小儿偏头痛, 小儿偏头痛疾病的相关症状是复视, 复视症状的相关症状是耳聋。 The related disease of upward gaze impairment in both eyes is migraine, the related disease of migraine is migraine in children, the related symptom of migraine in children is diplopia, and the related symptom of diplopia is deafness. |
| Query | 疾病相关药品(糖尿病所致骨髓疾病, 甲酚皂溶液)? Disease-related drugs(bone marrow disease caused by diabetes, cresol soap solution)? |
| High weight | 糖尿病所致骨髓疾病的相关症状是脊髓病变, 脊髓病变的相关药品是甲酚皂溶液。 The related symptom of bone marrow disease caused by diabetes is spinal cord lesions, and the related drug for spinal cord lesions is cresol soap solution. |
| Low weight | 糖尿病所致骨髓疾病的相关疾病是周围神经病损, 周围神经病损疾病的相关症状是感觉过敏, 感觉过敏症状的相关疾病是神劳, 神劳疾病的相关症状是无力, 无力症状的相关疾病是重症肌无力危象, 重症肌无力疾病相关药品是甲酚皂溶液。 The related disease of bone marrow disease caused by diabetes is peripheral neuropathy, the related symptom of peripheral neuropathy is hyperesthesia, the related disease of hyperesthesia is mental fatigue, the related symptom of mental fatigue is weakness, the related disease of weakness is myasthenic crisis, and the related drug for myasthenia gravis is cresol soap solution. |
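The high-weight and low-weight rows above render each path as a sentence, one clause per hop. A rough sketch of that verbalization step is given below, using the English clause pattern visible in the examples. The hop representation `(head, tail_type, tail)` and the function name `verbalize_path` are assumptions, and the step of feeding the resulting sentence to a pre-trained language model is not shown.

```python
def verbalize_path(hops):
    """Render a KG path as one sentence, with one clause per hop.

    Each hop is (head, tail_type, tail), for example
    ("hearing loss", "symptom", "deafness").
    """
    clauses = [
        f"the related {tail_type} of {head} is {tail}"
        for head, tail_type, tail in hops
    ]
    text = ", ".join(clauses) + "."
    return text[0].upper() + text[1:]


# The two-hop path from the first high-weight example.
sentence = verbalize_path([
    ("upward gaze impairment in both eyes", "symptom", "hearing loss"),
    ("hearing loss", "symptom", "deafness"),
])
```

Turning symbolic paths into text in this way lets rare entities and rare paths share vocabulary with frequent ones, which is the motivation the abstract gives for using textual semantics.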