Amber Stubbs, Christopher Kotfila, Özlem Uzuner.
Abstract
The 2014 i2b2/UTHealth Natural Language Processing (NLP) shared task featured four tracks. The first of these was the de-identification track, which focused on identifying protected health information (PHI) in longitudinal clinical narratives. The longitudinal nature of clinical narratives calls particular attention to details that are benign on their own in separate records but can, in combination across longitudinal records, lead to the identification of patients. Accordingly, the 2014 de-identification track addressed a broader set of entities and PHI than is covered by the Health Insurance Portability and Accountability Act (HIPAA), which was the focus of the de-identification shared task organized in 2006. Ten teams tackled the 2014 de-identification task and submitted 22 system outputs for evaluation. Each team was evaluated on its best-performing system output. Three of the 10 systems achieved F1 scores over 0.90, and seven of the 10 scored over 0.75. The most successful systems combined conditional random fields and hand-written rules. Our findings indicate that automated systems can be very effective for this task, but that de-identification is not yet a solved problem.
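As a reading aid for the F1 figures quoted in the abstract, the following is a minimal sketch of how precision, recall, and F1 can be computed over PHI annotations. It assumes exact matching on `(start, end, type)` tuples; the abstract does not state which matching criterion the organizers used, and the function name `prf1` and the sample spans are hypothetical, not taken from the shared task data.

```python
def prf1(gold, predicted):
    """Return (precision, recall, F1) for two collections of PHI spans.

    Assumption: a predicted span counts as correct only if its
    (start, end, type) tuple exactly matches a gold span.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations, for illustration only.
gold = {(0, 10, "NAME"), (25, 35, "DATE"), (50, 58, "HOSPITAL"), (70, 74, "AGE")}
pred = {(0, 10, "NAME"), (25, 35, "DATE"), (50, 58, "DOCTOR")}

precision, recall, f1 = prf1(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here the third predicted span has the right offsets but the wrong type, so under this strict assumption it counts as both a false positive and a missed gold span. A more lenient criterion (for example, token-level or type-agnostic matching) would score the same output higher.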
Keywords: Machine learning; Medical records; Natural language processing; Shared task
Year: 2015 PMID: 26225918 PMCID: PMC4989908 DOI: 10.1016/j.jbi.2015.06.007
Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317