Yue-Shu Zhao, Kun-Li Zhang, Hong-Chao Ma, Kun Li.
Abstract
BACKGROUND: De-identification is the first step in using electronic medical records for data processing or further medical investigations. Consequently, a reliable automated de-identification system would be of high value.
Keywords: De-identification; PHI; Text skeleton
Year: 2018 PMID: 29589571 PMCID: PMC5872383 DOI: 10.1186/s12911-018-0598-6
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 A snippet of an EMR
Overview of the datasets
| | i2b2–2006 | i2b2–2014 | Chinese |
|---|---|---|---|
| Number of records | 669 | 1304 | 9700 |
| Number of tokens | 560,852 | 1,005,582 | 3,026,944 |
| Number of PHIs | 19,498 | 28,862 | 48,072 |
| Number of PHI tokens | 29,917 | 38,435 | 137,496 |
| Vocabulary Size | 20,254 | 41,879 | 32,265 |
| Percentage of ID | 24.6% | 3.6% | 8.8% |
| Percentage of DATE | 36.4% | 43.2% | 38.9% |
| Percentage of HOSPITAL | 12.3% | 8.0% | 2.2% |
| Percentage of DOCTOR | 19.2% | 16.6% | 14.7% |
| Percentage of PATIENT | 4.7% | 7.6% | 17.3% |
| Percentage of AGE | 0.1% | 6.9% | 16.1% |
Fig. 2 The RNN model for de-identification
Fig. 3 A sample of the text skeleton
Fig. 4 The structure of TS-RNN model
Comparison with the i2b2 shared task submissions
| | 2006 i2b2, entity-level | 2006 i2b2, token-level | 2014 i2b2, entity-level | 2014 i2b2, token-level |
|---|---|---|---|---|
| Submissions | | | 0.44–0.93 | 0.58–0.96 |
| TS-GRU | 0.9452 | 0.9540 | 0.9344 | 0.9401 |
Comparison between the state-of-the-art methods and our framework
| Model | 2006 i2b2 Precision | 2006 i2b2 Recall | 2006 i2b2 F1-score | 2014 i2b2 Precision | 2014 i2b2 Recall | 2014 i2b2 F1-score | Chinese Precision | Chinese Recall | Chinese F1-score |
|---|---|---|---|---|---|---|---|---|---|
| Wellner | 0.9870 | 0.9750 | 0.9810 | – | – | – | – | – | – |
| Nottingham | – | – | – | 0.9900 | 0.9640 | 0.9768 | – | – | – |
| MIST | – | – | – | 0.9529 | 0.7569 | 0.8437 | – | – | – |
| CRF | 0.9640 | 0.9371 | 0.9504 | 0.9842 | 0.9663 | 0.9752 | 0.9863 | 0.9705 | 0.9783 |
| CRF + ANN | – | – | – | 0.9792 | 0.9784 | 0.9788 | – | – | – |
| Bi-LSTM | 0.9723 | 0.9656 | 0.9689 | 0.9878 | 0.9389 | 0.9627 | 0.9908 | 0.9584 | 0.9743 |
| Bi-GRU | 0.9871 | 0.9664 | 0.9766 | 0.9750 | 0.9704 | 0.9727 | 0.9898 | 0.9624 | 0.9759 |
| TS-GRU | 0.9903 | 0.9855 | 0.9879 | 0.9889 | 0.9723 | 0.9805 | 0.9875 | 0.9719 | 0.9797 |
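The F1-scores in the comparison above can be related to the precision and recall columns through the standard harmonic-mean definition. The following is a minimal sketch of that arithmetic, not the authors' evaluation code; the function name `f1_score` and the `rows` dictionary are illustrative assumptions, and the numbers are the TS-GRU entries copied from the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for the TS-GRU row of the table.
rows = {
    "2006 i2b2": (0.9903, 0.9855, 0.9879),
    "2014 i2b2": (0.9889, 0.9723, 0.9805),
    "Chinese": (0.9875, 0.9719, 0.9797),
}

for dataset, (p, r, reported) in rows.items():
    computed = f1_score(p, r)
    # Small differences are expected because the table values are rounded.
    print(f"{dataset}: computed {computed:.4f}, reported {reported:.4f}")
```

Because the tabulated precision and recall are themselves rounded to four decimals, a recomputed F1 may differ from the reported one in the last digit.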
Fig. 5 Impact of the value of r
Fig. 6 The performance under different window sizes
Fig. 7 Token-level F1-scores for each PHI category on 2006 i2b2 dataset
Performance at entity-level and token-level
| Model | Entity-level Precision | Entity-level Recall | Entity-level F1-score | Token-level Precision | Token-level Recall | Token-level F1-score |
|---|---|---|---|---|---|---|
| Rule-based | 0.8747 | 0.9276 | 0.9003 | 0.8802 | | 0.9128 |
| CRF | | 0.8972 | 0.9375 | 0.9669 | 0.9236 | 0.9448 |
| Bi-LSTM | 0.9701 | 0.9235 | 0.9462 | 0.9545 | 0.9027 | 0.9279 |
| Bi-GRU | 0.9665 | 0.9470 | 0.9567 | 0.9592 | 0.9270 | 0.9428 |
| TS-GRU | 0.9778 | | | | 0.9447 | |
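The difference between entity-level and token-level scoring can be illustrated with a small example. This is a sketch under stated assumptions, not the evaluation procedure used in the paper: it assumes BIO-style tags (`B-TYPE`, `I-TYPE`, `O`), counts an entity as correct only when its span and type match exactly, and counts a token as correct when its PHI type matches. The helper names `spans_from_bio`, `tokens_from_bio`, and `prf` are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, PHI type)


def spans_from_bio(tags: List[str]) -> Set[Span]:
    """Collect (start, end, type) spans from BIO tags such as B-DATE / I-DATE."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes an open span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def tokens_from_bio(tags: List[str]) -> Set[Tuple[int, str]]:
    """Collect (token index, PHI type) pairs for every non-O token."""
    return {(i, tag[2:]) for i, tag in enumerate(tags) if tag != "O"}


def prf(gold: set, pred: set) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over two sets of items."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy sentence: the prediction misses the second token of a two-token name.
gold = ["B-DOCTOR", "I-DOCTOR", "O", "B-DATE"]
pred = ["B-DOCTOR", "O", "O", "B-DATE"]

entity_scores = prf(spans_from_bio(gold), spans_from_bio(pred))
token_scores = prf(tokens_from_bio(gold), tokens_from_bio(pred))
print("entity-level P/R/F1:", entity_scores)
print("token-level  P/R/F1:", token_scores)
```

In this toy case the truncated name counts as a complete miss at entity level but as a partial hit at token level, which is one reason the two granularities can rank the same system differently.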
Fig. 8 Token-level F1-scores for each PHI category on 2014 i2b2 dataset
Fig. 9 Token-level F1-scores for each PHI category on Chinese dataset