| Literature DB >> 32716307 |
Xiaoyi Pan, Boyu Chen, Heng Weng, Yongyi Gong, Yingying Qu.
Abstract
BACKGROUND: Temporal information frequently appears in descriptions of disease progression, prescriptions, medications, surgical procedures, and discharge summaries in narrative clinical text. Accurate extraction and normalization of temporal expressions can substantially improve the analysis and understanding of narrative clinical texts, promoting clinical research and practice.
Keywords: Clinical text; Heuristic rule; Machine learning; Pattern learning; Temporal expression extraction; Temporal expression normalization
Year: 2020 PMID: 32716307 PMCID: PMC7418025 DOI: 10.2196/17652
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Framework of TNorm.
ISO 8601 (International Organization for Standardization) formats defined in TNorm.
| Format | Temporal expression in Chinese | Value |
| YYYY-MM-DD | 2014年6月3日 | 2014-06-03 |
| YYYY-MM | 2014年6月 | 2014-06 |
| YYYY | 2014年 | 2014 |
| YYYY-MM-DDThh:mm:ss | 2014年6月3日上午7点20分4秒 | 2014-06-03T07:20:04 |
| PnYnMnDTnHnMnS | 两年五个月 | P2Y5M |
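To make the mapping in the table concrete, here is a minimal illustrative sketch (not TNorm itself; the function names and the digit map are hypothetical) that normalizes the simple Chinese date and duration patterns shown above into their ISO 8601 values:

```python
import re

# Hypothetical helper table: Chinese numerals used in durations.
CN_DIGITS = {"一": 1, "两": 2, "二": 2, "三": 3, "四": 4,
             "五": 5, "六": 6, "七": 7, "八": 8, "九": 9, "十": 10}

def normalize_date(expr):
    """Map e.g. '2014年6月3日' -> '2014-06-03' (also YYYY-MM and YYYY)."""
    m = re.fullmatch(r"(\d{4})年(?:(\d{1,2})月)?(?:(\d{1,2})日)?", expr)
    if not m:
        return None
    year, month, day = m.groups()
    parts = [year]
    if month:
        parts.append(f"{int(month):02d}")  # zero-pad to two digits
    if day:
        parts.append(f"{int(day):02d}")
    return "-".join(parts)

def normalize_duration(expr):
    """Map e.g. '两年五个月' -> 'P2Y5M' (single-character numerals only)."""
    m = re.fullmatch(r"(.)年(.)个月", expr)
    if not m:
        return None
    years, months = CN_DIGITS.get(m.group(1)), CN_DIGITS.get(m.group(2))
    if years is None or months is None:
        return None
    return f"P{years}Y{months}M"

print(normalize_date("2014年6月3日"))   # → 2014-06-03
print(normalize_date("2014年6月"))      # → 2014-06
print(normalize_duration("两年五个月"))  # → P2Y5M
```

A full normalizer would also cover times (YYYY-MM-DDThh:mm:ss), multi-character numerals, and relative expressions, which the paper handles with rules and learned patterns.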
Figure 2. Flow of temporal feature extraction.
Examples of temporal expressions and their corresponding reference times in texts.
| Example in text | Temporal expression | Critical event | Special phrase | Reference time |
|  |  |  |  | date of chemotherapy |
|  |  |  | none | date of discharge |
|  |  |  | none | nearest direct time |
Figure 3. Reference time identification.
Figure 4. Automatic temporal normalization pattern learning.
Statistics of the dataset containing Chinese discharge summary texts.
| Datasets | Texts, n | Temporal expressions, n | Temporal expressions per text, mean |
| Training | 450 | 5966 | 13.26 |
| Testing | 450 | 6130 | 13.62 |
| Total | 900 | 12,096 | 13.44 |
Figure 5. Calculation equations of the evaluation metrics macro-average precision (MP), macro-average recall (MR), and macro-average F1-measure (MF).
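Since the figure itself is not reproduced here, one common formulation of these macro-averaged metrics (assuming $n$ classes with per-class precision $P_i$ and recall $R_i$) is:

```latex
MP = \frac{1}{n}\sum_{i=1}^{n} P_i, \qquad
MR = \frac{1}{n}\sum_{i=1}^{n} R_i, \qquad
MF = \frac{2 \cdot MP \cdot MR}{MP + MR}
```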
Figure 6. Calculation equations of the evaluation metrics precision, recall, and F1-measure.
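As a reference for these standard definitions (this is an illustrative sketch, not the paper's evaluation code; the label set and function names are hypothetical), per-class precision, recall, and F1 and their macro averages can be computed as:

```python
def prf(gold, pred, label):
    """Per-class precision, recall, and F1 from aligned label lists."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if p == label and g != label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_prf(gold, pred):
    """Macro averages: mean per-class precision/recall, F1 of the means."""
    labels = sorted(set(gold))
    scores = [prf(gold, pred, lab) for lab in labels]
    mp = sum(s[0] for s in scores) / len(labels)
    mr = sum(s[1] for s in scores) / len(labels)
    mf = 2 * mp * mr / (mp + mr) if mp + mr else 0.0
    return mp, mr, mf

# Tiny worked example with hypothetical temporal-expression classes.
gold = ["DATE", "DATE", "DURATION", "TIME"]
pred = ["DATE", "DURATION", "DURATION", "TIME"]
print(macro_prf(gold, pred))
```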
Detailed experiment results of the top 10 classification algorithms.
| Classification algorithm | Macro-average precision | Macro-average recall | Macro-average F1 |
| Multiclass classifier | 0.9553 | 0.9420 | 0.9485 |
| Logistic | 0.9558 | 0.9425 | 0.9488 |
| Simple logistic | 0.9560 | 0.9423 | 0.9490 |
| Iterative classifier optimizer | 0.9493 | 0.9525 | 0.9510 |
| Logit boost | 0.9493 | 0.9525 | 0.9510 |
| Decision table | 0.9493 | 0.9538 | 0.9513 |
| JRip | 0.9523 | 0.9518 | 0.9523 |
| K-nearest neighbor (k=1) | 0.9518 | 0.9613 | 0.9563 |
| Logistic model trees | 0.9545 | 0.9598 | 0.9570 |
| Randomizable filtered classifier | 0.9535 | 0.9613 | 0.9573 |
Evaluation results for the efficiency of the learned patterns in TNorm.
| Strategy | Accuracy |
| rule | 0.8587 |
| rule+pattern | 0.8654 |
| rule | 0.8587 |
| rule+pattern | 0.8654 |
| rule | 0.8587 |
| rule+pattern | 0.8654 |
| rule | 0.8586 |
| rule+pattern | 0.8653 |
| rule | 0.8586 |
| rule+pattern | 0.8653 |
Figure 7. Performance changes of the method with different training dataset sizes.
Figure 8. Experiment results of the stability test.