Shicheng Li, Lizong Deng, Xu Zhang, Luming Chen, Tao Yang, Yifan Qi, Taijiao Jiang.
Abstract
BACKGROUND: Phenotype information in electronic health records (EHRs) is mainly recorded in unstructured free text, which cannot be directly used for clinical research. EHR-based deep-phenotyping methods can structure phenotype information in EHRs with high fidelity, which has made them a focus of medical informatics. However, developing a deep-phenotyping method for non-English EHRs (ie, Chinese EHRs) is challenging. Although numerous EHR resources exist in China, fine-grained annotation data suitable for developing deep-phenotyping methods are limited, so any such method must be developed in a low-resource scenario.
Keywords: Chinese EHRs; deep phenotyping; linguistic pattern; motif discovery; pattern recognition
Year: 2022 PMID: 35657661 PMCID: PMC9206202 DOI: 10.2196/37213
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 7.076
Figure 1. The pipeline for the linguistic pattern–learning method. A: attribute; C: punctuation; EHR: electronic health record; MEME: Multiple Expectation Maximums for Motif Elicitation; O: other information; P: phenotype; PhenoSSU: Semantic Structured Unit of Phenotypes; re.compile: a Python method used to compile a regular expression pattern; SNOMED CT: Systematized Nomenclature of Medicine–Clinical Terms.
Figure 2. Free-text phenotype descriptions and linguistic patterns. A. Examples of structuring free text by the PhenoSSU model. B. Examples of linguistic patterns in free text. A: attribute; L: analyte; N: number; P: pain; PhenoSSU: Semantic Structured Unit of Phenotypes; U: unit; WBC: white blood cell.
Figure 3. The workflow of learning linguistic patterns of the PhenoSSU model from the corpus of Chinese electronic health records (EHRs). PhenoSSU: Semantic Structured Unit of Phenotypes; re.compile: a Python method used to compile a regular expression pattern.
Figure 4. The workflow of recognizing PhenoSSU instances from free text via linguistic pattern recognition. The numbers within the square brackets represent the position indexes of single letters in the original text. A: attribute; P: phenotype; PhenoSSU: Semantic Structured Unit of Phenotypes; re.compile: a Python method used to compile a regular expression pattern.
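The recognition step in the Figure 4 caption can be illustrated with a minimal sketch. It assumes that each token has already been assigned a single-letter tag (A, P, C, and so on), that a pattern is matched against the string of tags, and that match positions are mapped back to tokens and their character offsets. The function name `find_instances` and the offset convention are illustrative assumptions, not taken from the source.

```python
import re

def find_instances(tagged_tokens, pattern):
    """Match `pattern` against the tag string and return, for each match,
    the matched tokens as (token, tag, character offset in the joined text)."""
    tags = "".join(tag for _, tag in tagged_tokens)

    # Character offset of each token in the untagged text.
    offsets, pos = [], 0
    for token, _ in tagged_tokens:
        offsets.append(pos)
        pos += len(token)

    instances = []
    for m in pattern.finditer(tags):
        # One tag letter per token, so the match span indexes tokens directly.
        instances.append([
            (tagged_tokens[i][0], tagged_tokens[i][1], offsets[i])
            for i in range(m.start(), m.end())
        ])
    return instances

# Hand-tagged example: "no cough or fever".
tokens = [("无", "A"), ("咳嗽", "P"), ("、", "C"), ("发热", "P")]
instances = find_instances(tokens, re.compile("A+P(CP)+"))
```

Here the tag string is `APCP`, the pattern matches the whole string, and the single instance carries the attribute 无 together with both phenotypes and their offsets in 无咳嗽、发热.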
Six regular expressions based on linguistic patterns of the Chinese electronic health record corpus in this study.
| Phenotype category and regular expressions | Example in Chinese (English translation) |
| re.compile("A+P(CP)+") | “无/A咳嗽/P、/C发热/P” (no cough or fever) |
| re.compile("AP+") | “严重/A腹痛/P腹泻/P” (severe abdominal pain and diarrhea) |
| re.compile("A+P") | “右下腹/A严重/A疼痛/P” (severe right-lower abdominal pain) |
| re.compile("A*PC*A+") | “咳嗽/P,/C呈持续性/A” (cough, consistently) |
| re.compile("S*LNU") | “白细胞/L 12×10^9/N /L/U” (WBC 12 × 10^9/L) |
| re.compile("S*LR") | “血/S糖/L升高/R” (high blood glucose) |
re.compile: a Python method used to compile a regular expression pattern; A: attribute; P: phenotype; C: punctuation; S: specimen; L: analyte; N: number; U: unit; WBC: white blood cell; R: results of laboratory examination.
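The six patterns in the table operate on sequences of single-letter tags rather than on the raw Chinese text. The following sketch checks each pattern against the tag sequence of its own table example. It assumes the patterns read as `A+P(CP)+`, `AP+`, `A+P`, `A*PC*A+`, `S*LNU`, and `S*LR` (footnote letters stripped, quantifier written as `*`); the names `TABLE` and `matching_patterns` are hypothetical.

```python
import re

# (pattern, tag sequence of the table's example) pairs.
TABLE = [
    ("A+P(CP)+", "APCP"),  # 无/A 咳嗽/P 、/C 发热/P
    ("AP+",      "APP"),   # 严重/A 腹痛/P 腹泻/P
    ("A+P",      "AAP"),   # 右下腹/A 严重/A 疼痛/P
    ("A*PC*A+",  "PCA"),   # 咳嗽/P ,/C 呈持续性/A
    ("S*LNU",    "LNU"),   # 白细胞/L 12×10^9/N /L/U
    ("S*LR",     "SLR"),   # 血/S 糖/L 升高/R
]

def matching_patterns(tags):
    """Return every table pattern that matches the whole tag sequence."""
    return [p for p, _ in TABLE if re.compile(p).fullmatch(tags)]

# Each pattern should fully match the tag sequence of its own example.
results = {p: bool(re.compile(p).fullmatch(tags)) for p, tags in TABLE}
```

Several patterns can match the same sequence: a bare `AP`, for instance, satisfies both `AP+` and `A+P`. A working pipeline would need some rule for choosing among them, which this excerpt does not describe.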
Figure 5. Determining the best strategy for recognizing PhenoSSU instances. A. The workflow of recognizing PhenoSSU instances from free text. B. The performance comparison between the dictionary-based method and the deep learning–based method in identifying phenotype concepts. C. The performance comparison between the SVM-based method and the pattern recognition–based method in recognizing a phenotype’s attributes. PhenoSSU: Semantic Structured Unit of Phenotypes; SNOMED CT: Systematized Nomenclature of Medicine–Clinical Terms; SVM: support vector machine.
Figure 6. The comparison of PhenoSSU instances extracted from the clinical guidelines and electronic health records (EHRs) of chronic bronchitis. PhenoSSU: Semantic Structured Unit of Phenotypes.