Ming Liang1, ZhiXing Zhang1, JiaYing Zhang1, Tong Ruan2, Qi Ye1, Ping He3.
Abstract
BACKGROUND: Laboratory indicator test results in electronic health records have been applied to many clinical big data analyses. However, the same laboratory examination item (i.e., lab indicator) is often recorded under different names in Chinese because of translation differences and the naming habits of individual hospitals, which distorts analysis results.
Keywords: Active learning; Electronic health record; Entity alignment; Heart failure; Lab indicator standardization; Machine learning
Year: 2020 PMID: 33323114 PMCID: PMC7739485 DOI: 10.1186/s12911-020-01324-6
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 The overall process of the indicator standardization algorithm
Comparison of different recall models that use only the name
| Model | Recall (%) | MRR |
|---|---|---|
| Edit distance | 86.67 | 0.74 |
| BoW (bag of words) | 84.32 | 0.49 |
| BM25 | 90.10 | 0.35 |
| TF-IDF | 92.50 | 0.82 |
MRR stands for mean reciprocal rank, the average of the reciprocal ranks of the correct results over a sample of queries
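The two metrics reported for the recall models can be made concrete with a small sketch. The Python below is a minimal illustration, not the authors' evaluation code: the function names, the toy rankings, and the cutoff `k` are assumptions introduced here, and the record does not state which cutoff was used for recall.

```python
from typing import Hashable, Sequence


def reciprocal_rank(ranked: Sequence[Hashable], gold: Hashable) -> float:
    """Return 1/rank of the gold item in a ranked list, or 0.0 if it is absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == gold:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings, golds) -> float:
    """Average the reciprocal ranks over all queries."""
    scores = [reciprocal_rank(r, g) for r, g in zip(rankings, golds)]
    return sum(scores) / len(scores)


def recall_at_k(rankings, golds, k: int) -> float:
    """Fraction of queries whose gold item appears among the top k candidates."""
    hits = sum(1 for r, g in zip(rankings, golds) if g in r[:k])
    return hits / len(golds)


# Hypothetical toy data: three queries, each with a ranked candidate list.
rankings = [
    ["serum sodium", "serum potassium", "serum chloride"],
    ["serum potassium", "serum sodium", "serum chloride"],
    ["serum chloride", "serum potassium", "serum calcium"],
]
golds = ["serum sodium", "serum sodium", "serum sodium"]

mrr = mean_reciprocal_rank(rankings, golds)   # (1 + 1/2 + 0) / 3
recall_2 = recall_at_k(rankings, golds, k=2)  # gold found in top 2 for 2 of 3 queries
```

A model can have high recall but low MRR when the correct standard indicator is usually retrieved but rarely ranked first, which is the pattern the BM25 row shows.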
Comparison of different recall models that use both the name and the abbreviation
| Model | Recall (%) | MRR |
|---|---|---|
| Edit distance | 91.83 | 0.79 |
| BoW (bag of words) | 89.78 | 0.53 |
| BM25 | 95.76 | 0.38 |
| TF-IDF | 97.38 | 0.87 |
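As an illustration of the simplest recall model in the table, the sketch below ranks standard indicators by edit distance. It is an assumption-laden example rather than the method from the paper: the record does not say how the name and abbreviation are combined, so taking the smaller of the two distances is a choice made here, and the abbreviations and the `rank_candidates` helper are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def rank_candidates(query_name, query_abbr, standards, k=3):
    """Return the k standard names closest to the query.

    Assumption: a candidate's score is the smaller of its name distance
    and its abbreviation distance to the query.
    """
    def score(standard):
        name, abbr = standard
        return min(
            edit_distance(query_name.lower(), name.lower()),
            edit_distance(query_abbr.lower(), abbr.lower()),
        )

    return [name for name, _ in sorted(standards, key=score)[:k]]


# Hypothetical standard indicators as (name, abbreviation) pairs.
standards = [
    ("serum potassium", "K"),
    ("glutamate oxaloacetate transaminase", "GOT"),
    ("serum sodium", "Na"),
]

top = rank_candidates("sodium", "Na", standards, k=2)
```

Here the abbreviation match puts "serum sodium" first even though the query name alone is six edits away, which is one plausible reason the abbreviation improves recall.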
Performance comparison of binary classification
| Method | Precision | Recall | F1-score |
|---|---|---|---|
| Zhang | 88.38 | 79.89 | 83.53 |
| BiMPM | 83.07 | 90.13 | 86.19 |
| ESIM | 92.39 | 91.78 | 92.08 |
Zhang is the method proposed in [24], an n-gram and stacking enhanced method
Fig. 2 F1-score on the test dataset with different amounts of training data. RAND: random selection; LC: least confidence; GINI: Gini index; ENTROPY: Shannon entropy; TFD: trained with full data
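The three uncertainty-based selection strategies named in the caption have standard textbook definitions, sketched below. This is a minimal illustration under assumptions, not the authors' implementation: the pool probabilities, the batch size, and the `select_batch` helper are hypothetical, and the record does not describe how the batches were actually drawn.

```python
import math


def least_confidence(probs):
    """1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def gini_index(probs):
    """1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in probs)


def shannon_entropy(probs):
    """Negative sum of p * ln(p), skipping zero-probability classes."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_batch(pool_probs, score, batch_size):
    """Return indices of the batch_size most uncertain unlabeled samples."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: score(pool_probs[i]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical classifier outputs [P(match), P(no match)] for four unlabeled pairs.
pool_probs = [[0.95, 0.05], [0.55, 0.45], [0.70, 0.30], [0.50, 0.50]]

picked = select_batch(pool_probs, shannon_entropy, batch_size=2)
```

The selected pairs would then be labeled and added to the training set before the classifier is retrained; RAND corresponds to replacing the score-based ranking with a random draw.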
Recall results of “sodium ion” (table columns: non-standard indicator; standard indicators)
Classification results of “sodium ion”
| Standard indicator | Non-standard indicator | Label | Predict |
|---|---|---|---|
| | | 1 | 1 |
| | | 0 | 0 |
| | | 0 | 0 |
| | | 0 | 0 |
| | | 0 | 0 |
| | | 0 | 0 |
| | | 0 | 0 |
| | | 0 | 0 |
| | | 0 | 0 |
Fig. 3 The process of the case study on heart failure
Fig. 4 Screenshot of the heart failure indicator KB. a Lab indicators related to heart failure: related indicators are picked out and listed under their standard names. Here, “serum sodium” is clicked. b A standard indicator and its synonyms: after the click, “serum sodium” and its synonymous names, which all come from the SHDC dataset and were standardized by our proposed method, are displayed
Fig. 5 Different names of “serum sodium”
Fig. 6 A demo of named entity recognition and mapping. All indicators relevant to heart failure are recognized. Because some of the recognized indicators are synonymous names, they are mapped to their standard names: “sodium” is mapped to “serum sodium”, and “serum glutamic-oxaloacetic transaminase” is mapped to its standard indicator, “glutamate oxaloacetate transaminase”