Linfeng Li, Peng Wang, Yao Wang, Shenghui Wang, Jun Yan, Jinpeng Jiang, Buzhou Tang, Chengliang Wang, Yuting Liu.
Abstract
BACKGROUND: Knowledge graph embedding is an effective semantic representation method for entities and relations in knowledge graphs. Several translation-based algorithms, including TransE, TransH, TransR, TransD, and TranSparse, have been proposed to learn effective embedding vectors from typical knowledge graphs in which the relations between head and tail entities are deterministic. However, in medical knowledge graphs, the relations between head and tail entities are inherently probabilistic. This difference introduces a challenge in embedding medical knowledge graphs.
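As a minimal sketch of what "translation-based" means here, the snippet below scores a triplet the way TransE does: the relation vector is treated as a translation from the head to the tail, so a plausible triplet has h + r close to t. The embedding values and the entity names are made up for illustration and do not come from the study. The score is only a distance, with no notion of how probable the triplet is, which is the gap the abstract points to.

```python
import math


def transe_score(h, r, t):
    """TransE-style score: L2 distance between (h + r) and t.

    A lower score means the triplet is considered more plausible.
    """
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


# Toy 2-dimensional embeddings (hypothetical values, illustration only).
disease = [1.0, 0.0]
treated_by = [0.0, 1.0]
medicine_a = [1.0, 1.0]  # equals disease + treated_by
medicine_b = [4.0, 5.0]  # far from disease + treated_by

score_a = transe_score(disease, treated_by, medicine_a)
score_b = transe_score(disease, treated_by, medicine_b)
```

In this toy setup `medicine_a` gets a smaller score than `medicine_b`, so it would be ranked first as the predicted tail entity.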
Keywords: PrTransX; decision support systems, clinical; electronic health records; graph embedding; knowledge graph; medical informatics; natural language processing; probabilistic medical knowledge graph; representation learning
Year: 2020 PMID: 32436854 PMCID: PMC7273238 DOI: 10.2196/17645
Source DB: PubMed Journal: JMIR Med Inform
Notations used in the study.
| Symbols | Meaning |
| | Head entities, relation, tail entities from positive triplet and negative triplet corresponding to positive triplet (marked as ′) |
| Δ, Δ′ | Set of positive/negative triplets |
| h, r, t | Embedding vectors of head, relation, and tail entities |
| h_p, t_p | Projection vectors of head and tail entities |
| | Score value of given triplet |
| | Probability of given triplet |
| | Mapping function between the score value and probability of triplets |
| | Probability-based loss of given triplet |
| | Probability value of negative triplet |
| | Minimum probability value of positive triplet |
| | Scaling factors, margin parameters for loss function |
| | Parameters for given relation |
| [x]+ | The positive part of x |
| | Margin-based loss function |
| | Loss function |
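The notation lists a positive-part operator, a margin parameter, and a margin-based loss. The sketch below shows the conventional margin-based ranking loss used by TransE-style models, in which each positive triplet should score lower than its corrupted negative by at least the margin. This is the standard textbook form, not necessarily the exact loss defined in the study's equations; the function names and the default margin are assumptions made for illustration.

```python
def positive_part(x):
    """Positive part of x, i.e. max(0, x)."""
    return max(0.0, x)


def margin_loss(pos_scores, neg_scores, margin=1.0):
    """Sum of [margin + f(positive) - f(negative)]+ over paired triplets.

    pos_scores[i] and neg_scores[i] are the scores of a positive triplet
    and of the negative triplet derived from it (lower score = more plausible).
    """
    return sum(
        positive_part(margin + p - n) for p, n in zip(pos_scores, neg_scores)
    )


# Hypothetical scores: the first pair already satisfies the margin,
# the second pair violates it by 0.5.
example_loss = margin_loss([0.5, 2.0], [3.0, 2.5], margin=1.0)
```

Pairs that already satisfy the margin contribute nothing, so training effort concentrates on triplets the model still ranks incorrectly.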
Figure 1. Workflow for extracting probabilistic knowledge triplets from real-world electronic medical record data. ICD10: International Classification of Diseases, Tenth Revision.
Figure 2. Equations.
Description and distribution of relationships in the medical knowledge graph.
| Relation name | Source | Head entity type | Tail entity type | Triplet count |
| | EMR^a data set | Disease | Medicine | 74,835 |
| | EMR data set | Disease | Symptom | 53,885 |
| | EMR data set | Disease | Operation | 13,292 |
| | EMR data set | Disease | Laboratory | 71,805 |
| | EMR data set | Disease | Examination | 38,061 |
| | Domain knowledge: ICD-10^b | Disease | Disease | 6,455 |
| Total | —^c | — | — | 258,333 |
^a EMR: electronic medical record.
^b ICD-10: International Classification of Diseases, Tenth Revision.
^c Not applicable.
Figure 3. Proportion of correct entities ranked in the top 10 by the tested algorithms.
Figure 4. Mean rank of the tested algorithms.
Figure 5. Normalized discounted cumulative gain of the top 10 predicted tail entities of the tested algorithms.
Figure 6. Distribution of the number of head entities by relation.
Figure 7. Comparison of normalized discounted cumulative gain of the top 10 predicted tail entities of TransX and PrTransX by relation.
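The figures report three ranking metrics: the proportion of correct entities in the top 10, the mean rank, and the normalized discounted cumulative gain (NDCG) of the top 10 predictions. A minimal sketch of how such metrics are commonly computed is given below. The function names, the linear gain in the NDCG formula, and the use of a per-entity relevance value (for example, a triplet probability) are assumptions for illustration; the study may define them differently.

```python
import math


def hits_at_k(ranks, k=10):
    """Fraction of test triplets whose correct entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_rank(ranks):
    """Average 1-based rank of the correct entity."""
    return sum(ranks) / len(ranks)


def ndcg_at_k(relevances, k=10):
    """NDCG@k with linear gain.

    relevances lists the relevance of each predicted entity in predicted
    order (best prediction first).
    """
    def dcg(rels):
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical ranks of the correct tail entity for four test triplets.
ranks = [1, 3, 12, 50]
top10 = hits_at_k(ranks)       # two of four ranks are <= 10
avg_rank = mean_rank(ranks)

# Hypothetical relevance values, in predicted order.
perfect = ndcg_at_k([1.0, 0.5, 0.0])  # already in ideal order
swapped = ndcg_at_k([0.0, 1.0])       # the relevant entity is ranked second
```

Unlike the top-10 proportion and the mean rank, NDCG rewards placing higher-relevance entities nearer the top, which is why it suits graded (probabilistic) relations.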