Zhiwen Xie, Runjie Zhu, Jin Liu, Guangyou Zhou, Jimmy Xiangji Huang, Xiaohui Cui.
Abstract
In response to the COVID-19 pandemic, researchers in machine learning and artificial intelligence have constructed medical knowledge graphs (KGs) from existing COVID-19 datasets. However, a considerable number of the semantic relations in these KGs are incomplete or missing. In this paper, we focus on the task of knowledge graph embedding (KGE), which serves as an important approach to inferring the missing relations. Many KGE models with different scoring functions have been published for learning entity and relation embeddings. However, when dealing with heterogeneous, complex and incomplete COVID-19 medical data, these models share a common problem: they rarely take into account important KG features beyond relation triples, such as attribute features. To address this issue, we propose a graph feature collection network (GFCNet) for the COVID-19 KGE task, which considers both neighbor and attribute features in KGs. Extensive experiments on the COVID-19 drug KG dataset show promising results and demonstrate the effectiveness and efficiency of the proposed model. We also discuss future directions for deepening the study of the COVID-19 KGE task.
Keywords: COVID-19; Knowledge Graph; Natural Language Processing; Text Mining
Year: 2022 PMID: 35855405 PMCID: PMC9279179 DOI: 10.1016/j.ins.2022.07.031
Source DB: PubMed Journal: Inf Sci (N Y) ISSN: 0020-0255 Impact factor: 8.233
Fig. 1 An example of a sub-graph in DrugKG. Different types of entities are shown in different colors: purple nodes represent drugs, pink nodes represent viruses, green nodes represent host proteins, and blue nodes represent virus proteins.
Notation and definitions for KG and KGE.
| Name | Notation | Definition |
|---|---|---|
| Knowledge Graph | | A set of triplets in the form |
| Entity Collection | | Vocabulary collection of the entity |
| Entity Relations | | The set of pre-defined entity relations |
| Entity Attributes | | A set of attributes for entity |
| Entity Embedding | | The entity embedding in the |
| Vector Dimension | | The dimension of entity, attribute and relation vectors |
| Activation Function | | The activation function |
| Entity Vectors | | The entity vectors for entity |
| Relation Vector | | The vector for relation |
| Attribute Vector | | The vector for the attribute |
| Hidden Neighbor Vector | | The hidden feature vector for entity |
| Hidden Attribute Vector | | The hidden feature vector for entity |
| Scoring Function | | The scoring function for the triple |
| Loss Function | | The loss function of the model |
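To make the roles of a scoring function and a loss function concrete, here is a minimal sketch. It assumes the TransE score (TransE is one of the baselines in the link prediction results) paired with a margin ranking loss; it is not GFCNet's own scoring or loss function, which this record does not specify. The function names, toy vectors and margin value are hypothetical.

```python
import math

def transe_score(h, r, t):
    """TransE-style plausibility score: the negative L2 distance between h + r and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    """Hinge loss: zero once the true triple outscores the corrupted one by `margin`."""
    return max(0.0, margin - pos_score + neg_score)

# Toy 2-dimensional embeddings (hypothetical values, for illustration only).
head = [1.0, 0.0]
relation = [0.0, 1.0]
true_tail = [1.0, 1.0]     # equals head + relation, so the distance is zero
corrupt_tail = [4.0, 5.0]  # far from head + relation

pos = transe_score(head, relation, true_tail)
neg = transe_score(head, relation, corrupt_tail)
loss = margin_ranking_loss(pos, neg)
print(pos, neg, loss)
```

A higher score means a more plausible triple, so training drives true triples above corrupted ones until the margin is satisfied.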
Fig. 2 The structure of the proposed graph feature collection network (GFCNet). The model consists of three components: a neighbor collector, an attribute collector and a feature ensemble. The embedding layers convert relations, entities and attributes into low-dimensional vectors. FFN denotes a feed-forward network.
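The three components named in the caption can be sketched as follows. This is only an illustrative reading of the caption, not the authors' implementation: the mean aggregation, the single-layer tanh FFN, the concatenation in the ensemble step, and every name and value (`DIM`, `drug_a`, `protein_b`, `small_molecule`) are assumptions.

```python
import math
import random

random.seed(0)
DIM = 4  # embedding dimension (hypothetical)

def rand_vec(n):
    return [random.uniform(-0.5, 0.5) for _ in range(n)]

def rand_matrix(rows, cols):
    return [rand_vec(cols) for _ in range(rows)]

def ffn(weights, x):
    """Single-layer feed-forward network with tanh activation: tanh(W x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def mean(vectors, dim):
    """Element-wise mean; a zero vector when there is nothing to collect."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Embedding layers: lookup tables for entities, relations and attributes.
entity_emb = {"drug_a": rand_vec(DIM), "protein_b": rand_vec(DIM)}
relation_emb = {"binding": rand_vec(DIM)}
attribute_emb = {"small_molecule": rand_vec(DIM)}

W_neighbor = rand_matrix(DIM, 2 * DIM)  # FFN over [relation; neighbor]
W_ensemble = rand_matrix(DIM, 3 * DIM)  # FFN over [entity; neighbor feature; attribute feature]

def neighbor_collector(neighbors):
    """Average an FFN message over each (relation, neighbor entity) pair."""
    # List `+` concatenates the relation and neighbor vectors.
    msgs = [ffn(W_neighbor, relation_emb[r] + entity_emb[e]) for r, e in neighbors]
    return mean(msgs, DIM)

def attribute_collector(attributes):
    """Average the embeddings of the entity's attributes."""
    return mean([attribute_emb[a] for a in attributes], DIM)

def feature_ensemble(entity, neighbors, attributes):
    """Fuse the entity embedding with both collected features through an FFN."""
    x = entity_emb[entity] + neighbor_collector(neighbors) + attribute_collector(attributes)
    return ffn(W_ensemble, x)

vec = feature_ensemble("drug_a", [("binding", "protein_b")], ["small_molecule"])
print(vec)
```

In this sketch an entity with no neighbors or attributes falls back to zero vectors for the missing features, so the ensemble still produces an embedding from the entity vector alone.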
Parameter efficiency of different models.
| Model | Parameter efficiency |
|---|---|
| TransE | |
| DistMult | |
| Rescal | |
| RotatE | |
| QuatE | |
| TuckER | |
| R-GCN | |
| GFCNet | |
Statistics of the DrugKG dataset.
| Relations | Training | Validation | Testing |
|---|---|---|---|
| effect | 47 | 0 | 6 |
| produce | 709 | 28 | 84 |
| binding | 8790 | 458 | 1028 |
| interaction | 13999 | 723 | 1636 |
| Total | 23545 | 1209 | 2754 |
Link prediction results on DrugKG. The best results are in bold and the second-best results are underlined.
| Category | Model | MRR | MR | Hits@10 | Hits@3 | Hits@1 |
|---|---|---|---|---|---|---|
| Translation-based models | TransE | 0.196 | | 0.367 | 0.226 | 0.108 |
| | TransD | 0.147 | 750.71 | 0.332 | 0.181 | 0.052 |
| | TransH | 0.153 | 765.69 | 0.330 | 0.188 | 0.061 |
| | TransR | 0.130 | 792.03 | 0.282 | 0.147 | 0.056 |
| | KR-EAR | 0.184 | 768 | 0.346 | 0.205 | 0.102 |
| Bilinear models | Rescal | 0.104 | 880.45 | 0.202 | 0.103 | 0.055 |
| | DistMult | 0.169 | 796.35 | 0.302 | 0.180 | 0.104 |
| | ComplEx | 0.171 | 1004.97 | 0.313 | 0.184 | 0.104 |
| | SimplE | 0.172 | 788.11 | 0.308 | 0.179 | 0.106 |
| | TuckER | 0.224 | 1242.00 | 0.368 | 0.249 | 0.150 |
| | MARINE | 0.177 | 1126 | 0.338 | 0.186 | 0.115 |
| Rotate-based models | RotatE | | 820.40 | | | |
| | QuatE | 0.198 | 777.81 | 0.351 | 0.220 | 0.123 |
| CNN-based models | ConvE | 0.193 | 970.09 | 0.331 | 0.214 | 0.123 |
| | ConvKB | 0.069 | 816.49 | 0.205 | 0.090 | 0.000 |
| GNN-based models | R-GCN | 0.181 | 1341.61 | 0.294 | 0.196 | 0.124 |
| | KBGAT | 0.092 | 761.00 | 0.198 | 0.090 | 0.040 |
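The metrics in this table are standard ranking measures for link prediction. As a minimal sketch of how they are usually computed, the function below takes the 1-based rank of the true entity for each test triple; the function name and the example ranks are hypothetical and are not taken from the paper's evaluation code.

```python
def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute MR, MRR and Hits@k from 1-based ranks of the true entities."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                    # mean rank: lower is better
        "MRR": sum(1.0 / r for r in ranks) / n,  # mean reciprocal rank: higher is better
    }
    for k in ks:
        # Fraction of test triples whose true entity is ranked in the top k.
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics

# Hypothetical ranks of the true tail for four test triples.
example_ranks = [1, 2, 4, 20]
result = ranking_metrics(example_ranks)
print(result)
```

MR is sensitive to a few badly ranked triples, whereas MRR and Hits@k are dominated by the top of the ranking, which is why a model can have a strong MRR and still a large MR.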
Fig. 3 MRR results on different relations.
Ablation Study.
| Model | MRR | MR | Hits@10 | Hits@3 | Hits@1 |
|---|---|---|---|---|---|
| GFCNet w/o A | 0.265 | 674.51 | 0.426 | 0.291 | 0.185 |
| GFCNet w/o N | 0.264 | | 0.424 | 0.292 | 0.184 |
| GFCNet w/o A&N | 0.253 | 1052.89 | 0.402 | 0.283 | 0.175 |
| GFCNet-RGCN | 0.267 | 654.34 | 0.429 | 0.294 | 0.187 |
| GFCNet | | 630.12 | | | |
Fig. 4 Two-dimensional PCA projection of the entity embeddings for different models.
Fig. 5 MRR results for entities with different degrees.
Example predictions on the test set using our model. Bold indicates the true tails in DrugKG.
| Input: | Predicted Tails |
|---|---|