Kahyun Lee, Michele Filannino, Özlem Uzuner.
Abstract
De-identification aims to remove 18 categories of protected health information from electronic health records. Ideally, de-identification systems should be reliable and generalizable. Previous research has focused on improving performance but has not examined generalizability. This paper investigates both performance and generalizability. To improve on the current state of the art, which is based on long short-term memory (LSTM) units, we introduce a system that uses gated recurrent units (GRUs) and deep contextualized word representations, neither of which has previously been applied to de-identification. We measure the performance and generalizability of each system using the 2014 i2b2/UTHealth and 2016 CEGS N-GRID de-identification datasets. We show that deep contextualized word representations improve state-of-the-art performance, while the benefit of replacing LSTM units with GRUs is not significant. The generalizability of de-identification systems improved significantly with deep contextualized word representations; in addition, the LSTM-based system is more generalizable than the GRU-based system.
Keywords: Data Anonymization; Machine Learning; Natural Language Processing
Year: 2019 PMID: 31437917 DOI: 10.3233/SHTI190215
Source DB: PubMed Journal: Stud Health Technol Inform ISSN: 0926-9630