Tzvika Hartman¹, Michael D Howell¹, Jeff Dean¹, Shlomo Hoory², Ronit Slyper¹, Itay Laish¹, Oren Gilon¹, Danny Vainstein¹, Greg Corrado¹, Katherine Chou¹, Ming Jack Po¹, Jutta Williams³, Scott Ellis¹, Gavin Bee¹, Avinatan Hassidim¹, Rony Amira¹, Genady Beryozkin¹, Idan Szpektor¹, Yossi Matias¹.
Abstract
BACKGROUND: Automated machine-learning systems are able to de-identify electronic medical records, including free-text clinical notes. Use of such systems would greatly boost the amount of data available to researchers, yet their deployment has been limited due to uncertainty about their performance when applied to new datasets.
Keywords: Clinical notes; De-identification; Electronic health records; Free text; Natural language processing; Recurrent neural networks
Year: 2020 PMID: 32000770 PMCID: PMC6993314 DOI: 10.1186/s12911-020-1026-2
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Descriptive statistics for the datasets. The mimic3-echo dataset does not contain enough PHI to train a model and is therefore used for testing only. Name, Date, and Location are shown to illustrate how the frequency of PHI types varies across the datasets.
| Dataset | Note source | # of patients | # of notes | Train/Test partition by note | Total tokens | Total PHIs | % NAME | % DATE | % LOCATION |
|---|---|---|---|---|---|---|---|---|---|
| i2b2-2014 | diabetic longitudinal records | 296 | 1304 | 61% / 39% | 758 k | 28.8 k | 24.2% | 43.3% | 15.2% |
| i2b2-2006 | discharge notes | 889 | 889 | 75% / 25% | 487 k | 19.5 k | 24.0% | 36.4% | 13.7% |
| physionet | nursing notes | 163 | 2434 | 59% / 41% | 345 k | 1.9 k | 32.5% | 29.7% | 25.9% |
| mimic3-radiology | radiology notes | 1000 | 1000 | 50% / 50% | 205 k | 4.1 k | 10.2% | 44.8% | 1.8% |
| mimic3-echo | echocardiogram notes | 1000 | 1000 | Test only | 276 k | 2.5 k | 9.7% | 88.7% | 1.1% |
| mimic3-discharge | discharge notes | 1000 | 1000 | 81% / 19% | 128 k | 40.8 k | 21.2% | 61.1% | 9.9% |
Fig. 1. Our de-identification system architecture. Clinical notes are broken into tokens, which are run through the network and tagged either as Not-PHI or as a PHI type such as Name or Date.
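The tokenize, tag, and replace flow in Fig. 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: the regex tokenizer, the `stub_tagger` (a hard-coded name list plus a date regex standing in for the trained recurrent network), the `NOT_PHI` tag spelling, and the bracketed replacement format are all hypothetical.

```python
import re
from typing import Callable, List

# Hypothetical patterns: slash-separated dates, then word runs, then single punctuation marks.
DATE_RE = re.compile(r"\d+/\d+(?:/\d+)?")
TOKEN_RE = re.compile(r"\d+/\d+(?:/\d+)?|\w+|[^\w\s]")

Tagger = Callable[[List[str]], List[str]]


def tokenize(note: str) -> List[str]:
    """Split a clinical note into tokens (illustrative regex tokenizer)."""
    return TOKEN_RE.findall(note)


def stub_tagger(tokens: List[str]) -> List[str]:
    """Hypothetical stand-in for the trained network: returns one tag per token."""
    known_names = {"smith", "jones"}  # illustrative only; a real model learns this
    tags = []
    for tok in tokens:
        if tok.lower() in known_names:
            tags.append("NAME")
        elif DATE_RE.fullmatch(tok):
            tags.append("DATE")
        else:
            tags.append("NOT_PHI")
    return tags


def deidentify(note: str, tagger: Tagger = stub_tagger) -> str:
    """Tokenize, tag every token, and replace each PHI token with its tag."""
    tokens = tokenize(note)
    tags = tagger(tokens)
    return " ".join(tok if tag == "NOT_PHI" else f"[{tag}]" for tok, tag in zip(tokens, tags))


print(deidentify("Seen by Dr. Smith on 3/14/2019."))
# prints: Seen by Dr . [NAME] on [DATE] .
```

Because the tagger is passed in as a function, a trained sequence model could replace the stub without changing the tokenize and replace steps.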
Clinical note de-identification using fully customized systems, showing >97% recall of protected health information (PHI).
| Dataset | Recall (%) | Precision (%) | F1 (%) |
|---|---|---|---|
| i2b2-2014 | 99.1 | 85.7 | 91.7 |
| i2b2-2006 | 99.6 | 90.7 | 94.9 |
| mimic-discharge | 97.1 | 96.3 | 96.7 |
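F1 is the harmonic mean of recall and precision. A minimal sketch (the function name `f1_score` is ours) that recomputes it from the rounded percentages in the table:

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision; inputs and output in percent."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


print(round(f1_score(99.6, 90.7), 1))  # i2b2-2006 row: 94.9
print(round(f1_score(97.1, 96.3), 1))  # mimic-discharge row: 96.7
```

Recomputing from the rounded i2b2-2006 and mimic-discharge values reproduces the reported F1 to one decimal place. Because the harmonic mean is pulled toward the smaller input, high recall alone cannot produce a high F1 when precision lags.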
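Off-the-shelf systems recall >90% of Names, except in experiments involving the i2b2-2006 dataset. Each cell gives Name recall/precision/F1 (%) for a model trained on the column dataset and tested on the row dataset.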
| Test on (row) / Train on (column) | i2b2-2014 | mimic-discharge | i2b2-2006 |
|---|---|---|---|
| i2b2-2014 | 98.8/94.6/96.7 | 95.7/85.6/90.3 | 86.2/85.2/85.7 |
| physionet | 92.9/73.1/81.8 | 94.3/70.6/80.7 | 69.0/78.6/73.4 |
| mimic-radiology | 92.9/85.7/89.1 | 97.0/87.0/91.7 | 78.2/75.8/76.8 |
| mimic-discharge | 92.5/85.4/88.8 | 97.9/85.2/91.0 | 79.1/85.4/82.1 |
| mimic-echo | 95.5/61.4/74.7 | 99.6/86.6/92.6 | 54.2/20.3/29.0 |
| i2b2-2006 | 87.5/86.7/87.0 | 76.9/85.1/80.8 | 97.0/97.2/97.1 |
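Each cell in the grid comes from scoring one trained model on one test set. The exact matching rules are not given in this excerpt; the sketch below assumes token-level matching of aligned gold and predicted tag sequences for a single PHI type, and `type_scores` and `format_cell` are hypothetical helpers.

```python
from typing import Sequence, Tuple


def type_scores(
    gold: Sequence[str], pred: Sequence[str], phi_type: str = "NAME"
) -> Tuple[float, float, float]:
    """Token-level recall, precision and F1 (in percent) for one PHI type."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be aligned token by token")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == phi_type and p == phi_type)
    fn = sum(1 for g, p in pairs if g == phi_type and p != phi_type)  # missed PHI
    fp = sum(1 for g, p in pairs if g != phi_type and p == phi_type)  # over-redaction
    recall = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    precision = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
    return recall, precision, f1


def format_cell(scores: Tuple[float, float, float]) -> str:
    """Render scores in the table's recall/precision/F1 style."""
    return "/".join(f"{s:.1f}" for s in scores)


gold = ["NAME", "NAME", "NAME", "NAME", "O", "O", "O"]
pred = ["NAME", "NAME", "NAME", "O", "NAME", "NAME", "O"]
print(format_cell(type_scores(gold, pred)))  # 75.0/60.0/66.7
```

Recall is the figure that matters most for privacy: a false negative leaves PHI in the released text, while a false positive only removes non-sensitive words.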
Fig. 2. System performance as a function of the number of labeled names. In each subfigure, an off-the-shelf system trained on dataset “A” (i2b2-2014) is partially customized using labeled examples from the target dataset “B” and then evaluated on “B”. A system trained from scratch on “only B” is shown for comparison.
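Partial customization can be read as warm-starting: training on the target dataset begins from the off-the-shelf weights instead of from zero. The toy sketch below illustrates only that idea. It swaps in a plain perceptron with hypothetical hand-crafted features in place of the paper's recurrent network and embeddings; `features`, `predict`, `train`, and the four-token datasets are all made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

LABELS = ("O", "NAME")  # "O" (Not-PHI) listed first, so score ties resolve to Not-PHI
Weights = Dict[str, Dict[str, float]]
Example = Tuple[str, str]  # (token, gold label)


def features(token: str) -> List[str]:
    # Hypothetical hand-crafted features; not the paper's learned representation.
    return ["bias", "lower=" + token.lower(), f"is_title={token.istitle()}", "suffix3=" + token[-3:].lower()]


def predict(w: Weights, token: str) -> str:
    feats = features(token)
    scores = {label: sum(w[label].get(f, 0.0) for f in feats) for label in LABELS}
    return max(LABELS, key=lambda label: scores[label])


def train(data: List[Example], epochs: int = 5, init: Optional[Weights] = None) -> Weights:
    # Warm start: copy the off-the-shelf weights (if any) so the source model is left unchanged.
    w: Weights = {label: dict(init[label]) if init else {} for label in LABELS}
    for _ in range(epochs):
        for token, gold in data:
            guess = predict(w, token)
            if guess != gold:
                # Standard multiclass perceptron update: reward gold, penalize the wrong guess.
                for f in features(token):
                    w[gold][f] = w[gold].get(f, 0.0) + 1.0
                    w[guess][f] = w[guess].get(f, 0.0) - 1.0
    return w


SOURCE_A = [("Smith", "NAME"), ("Jones", "NAME"), ("the", "O"), ("was", "O")]
TARGET_B = [("lee", "NAME"), ("park", "NAME"), ("chest", "O"), ("lung", "O")]

off_the_shelf = train(SOURCE_A)
customized = train(TARGET_B, init=off_the_shelf)  # "A" partially customized with "B"
only_b = train(TARGET_B)                           # trained from scratch on "only B"
print(predict(off_the_shelf, "lee"), predict(customized, "lee"))  # O NAME
print(predict(customized, "Smith"), predict(only_b, "Smith"))     # NAME O
```

In this toy run the customized model labels the lowercase target-domain names correctly while still recognizing "Smith" from dataset A, whereas the model trained on only B has no signal for "Smith". Real behavior depends on the model, data size, and training schedule; Fig. 2 reports the paper's measurements.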
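Performance of an i2b2-2014 model with custom embeddings, tested on four datasets. Each cell gives recall/precision/F1 (%). "Matching embedding" refers to an embedding matched to the note type of the test set; the embedding used is named in each cell.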
| Test on | PHI type | GloVe | embed-mimic | Matching embedding |
|---|---|---|---|---|
| physionet | All | 76.2/61.3/67.9 | 81.8/64.1/71.8 | 76.9/62.4/68.9 (embed-mimic-nursing) |
| physionet | Name | 92.9/73.1/81.8 | 95.5/81.8/88.1 | 91.0/73.4/81.3 (embed-mimic-nursing) |
| mimic-radiology | Name | 92.9/85.7/89.1 | 97.2/85.9/91 | 92.0/87.2/89.4 (embed-mimic-radiology) |
| mimic-discharge | Name | 92.5/85.4/88.8 | 93.4/89.8/91.6 | 92.1/86.3/89.1 (embed-mimic-discharge) |
| mimic-echo | Name | 95.5/61.4/74.7 | 98.7/64.4/77.3 | Too small to build an embedding |