Heng Weng, Jielong Chen, Aihua Ou, Yingrong Lao.
Abstract
BACKGROUND: Knowledge discovery from treatment data records from Chinese physicians is a dramatic challenge in the application of artificial intelligence (AI) models to the research of traditional Chinese medicine (TCM).Entities:
Keywords: clinical; framework; knowledge discovery; knowledge embedding; knowledge graph; medicine; traditional Chinese medicine
Year: 2022 PMID: 36053574 PMCID: PMC9482071 DOI: 10.2196/38414
Source DB: PubMed Journal: JMIR Med Inform
Comparison of baseline KGEa models.
| Model | Scoring function fr(h,t) | Entity and relation embedding |
| Translational models | | |
| TransE | −‖h + r − t‖ | h, r, t ∈ ℝ^d |
| TransH | −‖(h − wr⊤h wr) + r − (t − wr⊤t wr)‖² | h, t, r, wr ∈ ℝ^d |
| TransR | −‖Mr h + r − Mr t‖² | h, t ∈ ℝ^d; r ∈ ℝ^k; Mr ∈ ℝ^(k×d) |
| TransD | −‖(wr wh⊤ + I)h + r − (wr wt⊤ + I)t‖² | h, t, wh, wt ∈ ℝ^d; r, wr ∈ ℝ^k |
| Linear/bilinear models | | |
| SimplE | (1/2)(⟨h, r, t⟩ + ⟨t, r′, h⟩) | h, t, r, r′ ∈ ℝ^d |
| HolE | r⊤(h ⋆ t), where ⋆ denotes circular correlation | h, r, t ∈ ℝ^d |
| Rotational models | | |
| QuatE | Qh ⊗ Wr · Qt (Hamilton product with the normalized relation quaternion) | Qh, Wr, Qt ∈ ℍ^d |
| RotatE | −‖h ∘ r − t‖ | h, r, t ∈ ℂ^d, each component of r of unit modulus |
| Convolutional models | | |
| ConvE | f(vec(f([h̄; r̄] ∗ ω))W)t | h, r, t ∈ ℝ^d |
| ConvKB | concat(g([h, r, t] ∗ Ω))·w | h, r, t ∈ ℝ^d |
| GNNb models | | |
| KBATc | graph attention aggregation over neighboring triples (no closed-form score) | learned by attention layers |
| Transformer models | | |
| CoKEd | Transformer encoder over triple and path sequences | contextualized embeddings |
aKGE: knowledge graph embedding.
bGNN: graph neural network.
cKBAT: knowledge base attention.
dCoKE: Contextualized Knowledge Graph Embedding.
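The translational and rotational scoring functions in the table above can be sketched directly in code. This is an illustrative example, not the paper's implementation; the embedding dimension and helper names are assumptions.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE: -||h + r - t||, with h, r, t in R^d.
    A higher (less negative) score means the triple is more plausible."""
    return -np.linalg.norm(h + r - t)

def rotate_score(h, r_phase, t):
    """RotatE: -||h o r - t||, with h, t in C^d and the relation r a
    unit-modulus complex rotation, given here by its phase angles."""
    r = np.exp(1j * r_phase)          # |r_i| = 1 by construction
    return -np.linalg.norm(h * r - t)

d = 4
h = np.zeros(d)
r = np.ones(d)
t = np.ones(d)
print(transe_score(h, r, t))          # exact translation h + r = t, so the score is 0.0
```

Both models score a triple (h, r, t) by how closely the relation maps the head onto the tail; TransE does so by vector addition in real space, RotatE by element-wise rotation in complex space.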
Overview of the TCMa KGb.
| Relation name | Heads, n | Tails, n | Triples, n |
| symptom=>symptom | 8101 | 8544 | 51,345 |
| disease=>symptom | 12,225 | 15,071 | 133,648 |
| disease=>drug | 12,650 | 11,526 | 84,524 |
| mechanism=>mechanism | 527 | 51 | 590 |
| symptom=>drug | 3941 | 6145 | 24,724 |
| symptom=>mechanism | 6544 | 1096 | 10,906 |
| symptom=>disease | 8101 | 10,391 | 87,651 |
| mechanism=>department | 1908 | 65 | 4408 |
| symptom=>body parts | 318 | 85 | 548 |
| mechanism=>body parts | 2217 | 72 | 3221 |
| mechanism=>symptom | 2147 | 4191 | 16,377 |
| symptom=>department | 10,157 | 178 | 24,870 |
| disease=>mechanism | 7774 | 5304 | 46,425 |
| disease=>body parts | 7607 | 110 | 13,505 |
| disease=>department | 14,484 | 284 | 40,762 |
| disease=>disease | 9728 | 10,545 | 40,575 |
| mechanism=>disease | 2228 | 5443 | 20,621 |
aTCM: traditional Chinese medicine.
bKG: knowledge graph.
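The per-relation statistics in the table above (distinct heads, distinct tails, and triple counts) can be derived from a list of (head, relation, tail) triples. A minimal sketch, with invented toy triples for illustration:

```python
from collections import defaultdict

def relation_stats(triples):
    """Return {relation: (distinct heads, distinct tails, triples)}."""
    heads, tails, count = defaultdict(set), defaultdict(set), defaultdict(int)
    for h, r, t in triples:
        heads[r].add(h)
        tails[r].add(t)
        count[r] += 1
    return {r: (len(heads[r]), len(tails[r]), count[r]) for r in count}

# Toy triples (invented), mirroring the symptom=>disease relation type.
toy = [
    ("night sweats", "symptom=>disease", "yin deficiency syndrome"),
    ("night sweats", "symptom=>disease", "tuberculosis"),
    ("dizziness", "symptom=>disease", "hypertension"),
]
print(relation_stats(toy))  # {'symptom=>disease': (2, 3, 3)}
```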
Figure 1. Positive and negative examples of multihop relation filtering and generation. CKD: chronic kidney disease; T2DM: type 2 diabetes mellitus.
Figure 2. Proposed framework of TCM KG representation learning. CoKE: Contextualized Knowledge Graph Embedding; KG: knowledge graph; TCM: traditional Chinese medicine.
Figure 3. Architecture of CoKE-distillation. CoKE: Contextualized Knowledge Graph Embedding.
Statistics of the FB15k-237 data set and the constructed TCMdt data set.
| Data set | Entities, n | Relations, n | Triples in the training set, n | Triples in the validation set, n | Triples in the test set, n |
| FB15k-237 | 14,541 | 237 | 272,115 | 17,535 | 20,446 |
| TCMdt | 59,882 | 17 | 544,230 | 30,235 | 30,235 |
Statistics of the hypertension data set in TCMa.
| Features, n | Classes, n | Total cases, N | Validation |
| 121 | 8 | 886 | 10-fold cross-validation |
aTCM: traditional Chinese medicine.
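The table above reports 10-fold cross-validation on 886 cases. A sketch of how such fold boundaries can be computed; only the case count comes from the table, the splitting code itself is an assumption.

```python
def k_fold_indices(n, k=10):
    """Partition indices 0..n-1 into k contiguous folds whose sizes
    differ by at most one case."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(range(start, start + size))
        start += size
    return folds

folds = k_fold_indices(886, 10)
print([len(f) for f in folds])  # six folds of 89 cases and four of 88
```

In each round, one fold serves as the held-out test set and the remaining nine are used for training; the reported metrics are averaged over the ten rounds.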
Baseline methods for KGa representation learning.
| Type of model | Models |
| Translational model | TransE |
| Linear/bilinear model | ComplEx |
| Rotational model | RotatE |
| GNNb | KBATc |
| Transformer-based model | CoKEd |
aKG: knowledge graph.
bGNN: graph neural network.
cKBAT: knowledge base attention.
dCoKE: Contextualized Knowledge Graph Embedding.
Performance comparison of link prediction on the FB15k-237 data set.
| Models | MRRa | Hits@10 | Hits@3 | Hits@1 |
| TransE | 0.296 | 0.499 | 0.330 | 0.196 |
| SimplE | 0.306 | 0.496 | 0.341 | 0.212 |
| RotatE | 0.314 | 0.505 | 0.347 | 0.221 |
| ComplEx | 0.296 | 0.489 | 0.333 | 0.200 |
| DistMult | 0.309 | 0.506 | 0.346 | 0.211 |
| KBATb | 0.103 | 0.337 | 0.248 | 0.103 |
| ConvKB | 0.407 | 0.527 | 0.333 | 0.200 |
| CoKEc | 0.362 | 0.550 | 0.400 | 0.269 |
aMRR: mean reciprocal rank.
bKBAT: knowledge base attention.
cCoKE: Contextualized Knowledge Graph Embedding.
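The MRR and Hits@N columns in the results tables are standard link-prediction metrics computed from the rank of each true entity among all candidates. A minimal sketch with invented example ranks:

```python
def mrr(ranks):
    """Mean reciprocal rank: average of 1/rank over all test triples."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at(ranks, n):
    """Hits@N: fraction of test triples whose true entity ranks in the top N."""
    return sum(1 for r in ranks if r <= n) / len(ranks)

ranks = [1, 2, 4, 10, 50]            # invented ranks for five test triples
print(round(mrr(ranks), 3))          # (1 + 0.5 + 0.25 + 0.1 + 0.02) / 5 = 0.374
print(hits_at(ranks, 10))            # 4 of 5 ranks are <= 10, so 0.8
```

Both metrics lie in [0, 1], and higher is better; MRR rewards placing the true entity near the very top, while Hits@N only checks membership in the top N.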
Performance comparison of link prediction on the TCMdt data set.
| Models | MRRa | Hits@10 | Hits@3 | Hits@1 |
| TransE | 0.243 | 0.428 | 0.279 | 0.150 |
| SimplE | 0.162 | 0.436 | 0.222 | 0.113 |
| RotatE | 0.146 | 0.424 | 0.193 | 0.090 |
| ComplEx | 0.137 | 0.411 | 0.177 | 0.080 |
| DistMult | 0.164 | 0.438 | 0.223 | 0.117 |
| ConvKB | 0.271 | 0.464 | 0.302 | 0.192 |
| CoKEb | 0.332 | 0.491 | 0.365 | 0.250 |
| KBATc | 0.129 | 0.369 | 0.178 | 0.088 |
| CoKE-multihop | 0.251 | 0.515 | 0.278 | 0.261 |
| CoKE-multihop-distillation | 0.320 | 0.483 | 0.374 | 0.260 |
aMRR: mean reciprocal rank.
bCoKE: Contextualized Knowledge Graph Embedding.
cKBAT: knowledge base attention.
Results of 10-fold cross-validation of deep learning multilabel models.
| Model | Index | Precision | Recall | F1 score |
| MLKNNa | Micro-avg | 0.810 | 0.710 | 0.760 |
|  | Macro-avg | 0.800 | 0.610 | 0.660 |
| RAkELb | Micro-avg | 0.790 | 0.740 | 0.760 |
|  | Macro-avg | 0.760 | 0.640 | 0.670 |
| DNNc | Micro-avg | 0.810 | 0.750 | 0.780 |
|  | Macro-avg | 0.760 | 0.660 | 0.700 |
| LSTMd | Micro-avg | 0.790 | 0.740 | 0.760 |
|  | Macro-avg | 0.750 | 0.670 | 0.700 |
| DNN+KGEe | Micro-avg | 0.800 | 0.790 | 0.790 |
|  | Macro-avg | 0.740 | 0.740 | 0.740 |
| DNN+BILSTMf-KGE | Micro-avg | 0.860 | 0.820 | 0.840 |
|  | Macro-avg | 0.810 | 0.770 | 0.790 |
aMLKNN: multilabel k nearest neighbors.
bRAkEL: random k-labelsets.
cDNN: deep neural network.
dLSTM: long short-term memory.
eKGE: knowledge graph embedding.
fBILSTM: bidirectional long short-term memory.
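The micro- and macro-averaged precision, recall, and F1 reported above differ in how they weight labels: micro-averaging pools true/false positive counts across all labels, while macro-averaging averages per-label scores equally. A sketch with invented per-label counts:

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def micro_macro(per_label):
    """per_label: list of (tp, fp, fn) tuples, one per label."""
    micro = prf(*[sum(x) for x in zip(*per_label)])   # pool counts, then score
    macros = [prf(*c) for c in per_label]             # score each label, then average
    macro = tuple(sum(m[i] for m in macros) / len(macros) for i in range(3))
    return micro, macro

counts = [(8, 2, 2), (1, 1, 3)]       # a frequent label and a rare label (invented)
micro, macro = micro_macro(counts)
print(micro)   # dominated by the frequent label
print(macro)   # weights both labels equally, so rare-label errors cost more
```

This is why the macro-averaged scores in the table are consistently lower than the micro-averaged ones: the multilabel classes are imbalanced, and macro-averaging exposes weaker performance on the rarer labels.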
Figure 4. Performance of DNN and DNN+BILSTM-KGE. BILSTM: bidirectional long short-term memory; DNN: deep neural network; KGE: knowledge graph embedding.
Figure 5. Visualization of learned entity representations.
Figure 6. Visualization of a personalized KG consisting of the theories, treatment methods, prescriptions, and medicines of EM in TCM. EM: endometriosis; KG: knowledge graph; TCM: traditional Chinese medicine.
Figure 7. Application of the framework to knowledge discovery and decision-making in TCM. CKG: collaborative knowledge graph; QA: question and answer; TCM: traditional Chinese medicine.