Genghong Zhao1,2, Wenjian Gu3, Wei Cai2, Zhiying Zhao4, Xia Zhang1,2, Jiren Liu1,5.
Abstract
As a typical knowledge-intensive industry, the medical field uses knowledge graph technology to support causal inference over relations such as "symptom-disease", "laboratory examination/imaging examination-disease", and "disease-treatment method". The continuing growth of large collections of electronic clinical records provides an opportunity to learn medical knowledge through machine learning. In this process, two issues have become key to building a high-quality medical knowledge graph: how to extract entities that carry a medical logic structure, and how to make entity extraction more consistent with the logic of the text in electronic clinical records. In this work, we describe a method for extracting medical entities from real Chinese electronic clinical records. We define a computational architecture named MLEE to extract object-level entities with "object-attribute" dependencies. To verify the effectiveness of the method, we conducted experiments on the electronic clinical records of 1,000 randomly selected patients from Shengjing Hospital of China Medical University.
Keywords: Chinese clinical records; EMR data mining; knowledge graph (KG); medical entity extraction; natural language processing (computer science)
Year: 2022 PMID: 35938002 PMCID: PMC9354090 DOI: 10.3389/fgene.2022.900242
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
FIGURE 1. Extract fever-related attributes from the fever description segment.
FIGURE 2. From a sentence describing symptoms, separately extract segments describing fever and cough.
FIGURE 3. Classify sentences in clinical record text into correct categories.
FIGURE 4. Correct misuse of punctuation in clinical record text.
FIGURE 5. Punctuation correction computing architecture.
FIGURE 6. Sentence classification computing architecture.
FIGURE 7. Entity extraction computing architecture.
FIGURE 8. MLEE computing architecture.
Medical knowledge graph schema label for information extraction.
| Entity Type | Entity | Attributes |
|---|---|---|
| Symptom | Fever | Body Temperature |
| | | Occurrence |
| | | Duration |
| | Cough | Occurrence |
| | | Duration |
| | | Aggravating Factor |
| | | Relieving Factor |
| | | Cough Frequency |
| | | Situation |
| Treatment | Medication Treatment | Drug name |
| | | Drug dose |
| | | Duration of course of treatment |
| | Operation | Type of operation |
| | | Date of operation |
| | | Adverse reactions |
| Laboratory Test | Laboratory Test Entity | Test item |
| | | Value |
| Imaging | Computed Tomography | Body part |
| | | Abnormal seen |
| | Magnetic Resonance Imaging | Body part |
| | | Abnormal seen |
| | | T1WI |
| | | T2WI |
| | | Other |
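The "object-attribute" dependency in the schema above can be pictured as a nested mapping from entity type to entity to attribute list. The following is a minimal sketch under that assumption; `SCHEMA`, `iter_triples`, and `attributes_of` are illustrative names, not identifiers from the paper, and the trailing "Other" row is left out because the flat label table does not list it.

```python
# Hypothetical in-memory form of the schema table.
# Structure: entity type -> entity (object) -> list of attributes.
SCHEMA = {
    "Symptom": {
        "Fever": ["Body Temperature", "Occurrence", "Duration"],
        "Cough": [
            "Occurrence", "Duration", "Aggravating Factor",
            "Relieving Factor", "Cough Frequency", "Situation",
        ],
    },
    "Treatment": {
        "Medication Treatment": [
            "Drug name", "Drug dose", "Duration of course of treatment",
        ],
        "Operation": [
            "Type of operation", "Date of operation", "Adverse reactions",
        ],
    },
    "Laboratory Test": {
        "Laboratory Test Entity": ["Test item", "Value"],
    },
    "Imaging": {
        "Computed Tomography": ["Body part", "Abnormal seen"],
        "Magnetic Resonance Imaging": [
            "Body part", "Abnormal seen", "T1WI", "T2WI",
        ],
    },
}


def iter_triples(schema):
    """Yield one (entity type, entity, attribute) triple per schema row."""
    for entity_type, entities in schema.items():
        for entity, attributes in entities.items():
            for attribute in attributes:
                yield entity_type, entity, attribute


def attributes_of(entity, schema=SCHEMA):
    """Return the attributes attached to an entity, or [] if it is unknown."""
    for entities in schema.values():
        if entity in entities:
            return list(entities[entity])
    return []


if __name__ == "__main__":
    print(attributes_of("Fever"))
    print(sum(1 for _ in iter_triples(SCHEMA)), "object-attribute pairs")
```

Keeping the attributes under their owning entity is what lets "Occurrence" for Fever and "Occurrence" for Cough remain distinct.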
Effect of each calculation step of MLEE.
| Computational Procedure | Precision | Recall | F1 value |
|---|---|---|---|
| Punctuation correction | 0.9874 | 0.9529 | 0.9698 |
| Sentence classification | 0.9812 | | |
| Medical entity extraction | 0.9611 | 0.9438 | 0.9524 |
| Entity object attribute extraction | 0.9638 | 0.9611 | 0.9624 |
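As a quick consistency check, the F1 values in the table can be recomputed from the reported precision and recall, assuming the standard definition of F1 as the harmonic mean of the two (the page does not state the formula). The function name `f1_score` and the `REPORTED` dictionary are illustrative; the sentence classification row is omitted because it reports a single value.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
REPORTED = {
    "Punctuation correction": (0.9874, 0.9529, 0.9698),
    "Medical entity extraction": (0.9611, 0.9438, 0.9524),
    "Entity object attribute extraction": (0.9638, 0.9611, 0.9624),
}

if __name__ == "__main__":
    for step, (p, r, reported) in REPORTED.items():
        computed = f1_score(p, r)
        print(f"{step}: computed F1 = {computed:.4f}, reported = {reported:.4f}")
```

Under this definition the recomputed values agree with the reported ones to four decimal places.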
Labels for flat transformation using the schema.
| Entity type | Entity | Attributes | NER Label |
|---|---|---|---|
| Symptom | Fever | Body Temperature | |
| | | Occurrence | |
| | | Duration | |
| | Cough | Occurrence | |
| | | Duration | |
| | | Aggravating Factor | |
| | | Relieving Factor | |
| | | Cough Frequency | |
| | | Situation | |
| Treatment | Medication Treatment | Drug name | |
| | | Drug dose | |
| | | Duration of course of treatment | |
| | Operation | Type of operation | |
| | | Date of operation | |
| | | Adverse reactions | |
| Laboratory Test | Laboratory Test Entity | Test item | |
| | | Value | |
| Imaging | Computed Tomography | Body part | |
| | | Abnormal seen | |
| | Magnetic Resonance Imaging | Body part | |
| | | Abnormal seen | |
| | | T1WI | |
| | | T2WI | |
The bold values indicate the NER labels, that is, the labels used to annotate the real data.
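The idea of a flat transformation can be sketched as follows: each (entity type, entity, attribute) row of the schema becomes one label for a conventional sequence tagger, and tagged spans are regrouped into objects afterwards. This is only an illustration under stated assumptions. The dotted label strings, the function names `flatten` and `regroup`, the two-entity schema subset, and the example spans are all hypothetical, and the label names actually used for annotation are not reproduced here.

```python
# Hypothetical subset of the schema: entity type -> entity -> attributes.
SCHEMA_SUBSET = {
    "Symptom": {
        "Fever": ["Body Temperature", "Occurrence", "Duration"],
        "Cough": ["Occurrence", "Duration"],
    },
}


def flatten(schema):
    """Map one flat label string to each (type, entity, attribute) row.

    The "Type.Entity.Attribute" naming scheme is an assumption made for
    this sketch, not the labelling scheme of the paper.
    """
    labels = {}
    for entity_type, entities in schema.items():
        for entity, attributes in entities.items():
            for attribute in attributes:
                parts = (entity_type, entity, attribute)
                label = ".".join(p.replace(" ", "") for p in parts)
                labels[label] = parts
    return labels


def regroup(spans, labels):
    """Rebuild {entity: {attribute: text}} from flat (label, text) spans."""
    objects = {}
    for label, text in spans:
        _, entity, attribute = labels[label]
        objects.setdefault(entity, {})[attribute] = text
    return objects


LABELS = flatten(SCHEMA_SUBSET)

# Illustrative tagger output; the values are invented for the example.
EXAMPLE_SPANS = [
    ("Symptom.Fever.BodyTemperature", "38.5 C"),
    ("Symptom.Fever.Duration", "3 days"),
    ("Symptom.Cough.Duration", "1 week"),
]

if __name__ == "__main__":
    for label in LABELS:
        print(label)
    print(regroup(EXAMPLE_SPANS, LABELS))
```

Because the entity name is part of each flat label, an attribute such as "Duration" that appears under both Fever and Cough still maps back to the correct object.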
Comparison of MLEE information extraction and traditional sequence labeling.
| Method | F1 value |
|---|---|
| BERT + BiLSTM + CRF | 0.9367 |
The bold values indicate the experimental results of the method proposed in this paper.