Real-world data medical knowledge graph: construction and applications.

Linfeng Li, Peng Wang, Jun Yan, Yao Wang, Simin Li, Jinpeng Jiang, Zhe Sun, Buzhou Tang, Tsung-Hui Chang, Shenghui Wang, Yuting Liu.

Abstract

OBJECTIVE: Medical knowledge graphs (KGs) are attracting attention from both academia and the healthcare industry because of their power in intelligent healthcare applications. In this paper, we introduce a systematic approach to building a medical KG from electronic medical records (EMRs), evaluated through both technical experiments and end-to-end application examples.
MATERIALS AND METHODS: The original data set contains de-identified data from 16,217,270 clinical visits of 3,767,198 patients. The KG construction procedure comprises 8 steps: data preparation, entity recognition, entity normalization, relation extraction, property calculation, graph cleaning, related-entity ranking, and graph embedding. We propose a novel quadruplet structure to represent medical knowledge in place of the classical KG triplet. We also propose a novel related-entity ranking function that considers probability, specificity, and reliability (PSR). In addition, the probabilistic translation on hyperplanes (PrTransH) algorithm is used to learn graph embeddings for the generated KG.
RESULTS: A medical KG with 9 entity types, including disease and symptom, was established; it contains 22,508 entities and 579,094 quadruplets. Compared with the term frequency-inverse document frequency (TF/IDF) method, the proposed ranking function increased the normalized discounted cumulative gain (NDCG@10) from 0.799 to 0.906. Embedding representations were learned for all entities and relations, and disease clustering showed them to be effective.
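For readers unfamiliar with the metric, NDCG@10 can be computed roughly as sketched below. This is a minimal illustration, not the authors' evaluation code: it assumes the common linear-gain form with a log2 rank discount, and the function names `dcg_at_k` and `ndcg_at_k` are hypothetical. The abstract does not state which NDCG variant or which relevance grades were used.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k graded relevance labels.

    Assumes linear gain (rel) and a log2(rank + 1) discount with 1-based ranks.
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels of related entities in ranked order (made-up numbers).
ranked_labels = [3, 2, 3, 0, 1, 2]
score = ndcg_at_k(ranked_labels, k=10)
```

A ranking that already places the most relevant entities first scores 1.0; any misordering lowers the score.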
CONCLUSION: The established systematic procedure can efficiently construct a high-quality medical KG from large-scale EMRs. The proposed ranking function PSR achieves the best performance under all relations, and the disease clustering result validates the efficacy of the learned embedding vectors as semantic representations of entities. Moreover, the resulting KG has found many successful applications owing to its statistics-based quadruplets.

In the reliability definition, Ncomin is the minimum co-occurrence number and R is the basic reliability value. The reliability value measures how reliable the relationship between Si and Oij is. The rationale is that the higher the value of Nco(Si, Oij), the more reliable the relationship. However, the reliability values of two relationships should not differ greatly when both of their co-occurrence numbers are very large. In our study, we set Ncomin = 10 and R = 1 after some experiments. For instance, if the co-occurrence numbers of three relationships are 1, 100, and 10,000, their reliability values are 1, 2.96, and 5, respectively.
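The reliability equation itself is not reproduced in this record. The sketch below uses one logarithmic form that happens to reproduce the three worked values (1, 2.96, and 5) with Ncomin = 10 and R = 1. The functional form is an assumption inferred from those values, not a quotation of the paper's definition, and the function name `reliability` is hypothetical.

```python
import math

N_CO_MIN = 10  # minimum co-occurrence number, as stated in the text
R = 1.0        # basic reliability value, as stated in the text


def reliability(n_co, n_co_min=N_CO_MIN, base=R):
    """Reliability of a relationship from its co-occurrence count.

    ASSUMED form: log10(max(n_co - n_co_min, 0) + 1) + base.
    It is chosen only because it matches the worked example in the text;
    the paper's actual equation may differ.
    """
    return math.log10(max(n_co - n_co_min, 0) + 1) + base


for n_co in (1, 100, 10000):
    print(n_co, round(reliability(n_co), 2))
```

Under this assumed form, counts at or below Ncomin all receive the basic value R, while very large counts grow only logarithmically, consistent with the stated goal that two heavily co-occurring relationships should not differ much in reliability.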
Copyright © 2020 Elsevier B.V. All rights reserved.

Keywords:  CDSS; PSR; medical knowledge graph; quadruplet; real-world data

Year:  2020        PMID: 32143785     DOI: 10.1016/j.artmed.2020.101817

Source DB:  PubMed          Journal:  Artif Intell Med        ISSN: 0933-3657            Impact factor:   5.326


  6 in total

1.  Accelerating Epidemiological Investigation Analysis by Using NLP and Knowledge Reasoning: A Case Study on COVID-19.

Authors:  Jian Wang; Ke Wang; Jing Li; Jianmin Jiang; Yanfei Wang; Jing Mei; Shaochun Li
Journal:  AMIA Annu Symp Proc       Date:  2021-01-25

2.  Multimodal reasoning based on knowledge graph embedding for specific diseases.

Authors:  Chaoyu Zhu; Zhihao Yang; Xiaoqiong Xia; Nan Li; Fan Zhong; Lei Liu
Journal:  Bioinformatics       Date:  2022-02-12       Impact factor: 6.937

3.  Venous thromboembolism risk assessment of surgical patients in Southwest China using real-world data: establishment and evaluation of an improved venous thromboembolism risk model.

Authors:  Peng Wang; Yao Wang; Zhaoying Yuan; Fei Wang; Hongqian Wang; Ying Li; Chengliang Wang; Linfeng Li
Journal:  BMC Med Inform Decis Mak       Date:  2022-03-04       Impact factor: 2.796

4.  Decision-Making System for the Diagnosis of Syndrome Based on Traditional Chinese Medicine Knowledge Graph.

Authors:  Rui Yang; Qing Ye; Chunlei Cheng; Suhua Zhang; Yong Lan; Jing Zou
Journal:  Evid Based Complement Alternat Med       Date:  2022-02-10       Impact factor: 2.629

5.  Characteristics of High Suicide Risk Messages From Users of a Social Network-Sina Weibo "Tree Hole".

Authors:  Bing Xiang Yang; Pan Chen; Xin Yi Li; Fang Yang; Zhisheng Huang; Guanghui Fu; Dan Luo; Xiao Qin Wang; Wentian Li; Li Wen; Junyong Zhu; Qian Liu
Journal:  Front Psychiatry       Date:  2022-02-18       Impact factor: 4.157

6.  MLEE: A method for extracting object-level medical knowledge graph entities from Chinese clinical records.

Authors:  Genghong Zhao; Wenjian Gu; Wei Cai; Zhiying Zhao; Xia Zhang; Jiren Liu
Journal:  Front Genet       Date:  2022-07-22       Impact factor: 4.772
