S M Meystre (1), G K Savova, K C Kipper-Schuler, J F Hurdle. (1) University of Utah, Department of Biomedical Informatics, 26 South 2000 East, HSEB Suite 5700, Salt Lake City, UT 84112-5750, USA. stephane.meystre@hsc.utah.edu
Abstract
OBJECTIVES: We examine recently published research on the extraction of information from textual documents in the Electronic Health Record (EHR).
METHODS: We conducted a literature review of research published after 1995, based on PubMed, conference proceedings, and the ACM Digital Library, as well as on relevant publications referenced in papers already included.
RESULTS: 174 publications were selected and are discussed in this review in terms of: methods used; pre-processing of textual documents; detection and analysis of contextual features; extraction of information in general; extraction of codes and of information for decision support and enrichment of the EHR; information extraction for surveillance, research, automated terminology management, and data mining; and de-identification of clinical text.
CONCLUSIONS: The performance of information extraction systems on clinical text has improved since the last systematic review in 1995, but these systems are still rarely applied outside the laboratory in which they were developed. Competitive challenges for information extraction from clinical text, the availability of annotated clinical text corpora, and further improvements in system performance are important factors for stimulating advances in this field and for increasing the acceptance and use of these systems in concrete clinical and biomedical research contexts.
Authors: Li Zhou; Joseph M Plasek; Lisa M Mahoney; Neelima Karipineni; Frank Chang; Xuemin Yan; Fenny Chang; Dana Dimaggio; Debora S Goldman; Roberto A Rocha Journal: AMIA Annu Symp Proc Date: 2011-10-22
Authors: Jennifer H Garvin; Scott L DuVall; Brett R South; Bruce E Bray; Daniel Bolton; Julia Heavirland; Steve Pickard; Paul Heidenreich; Shuying Shen; Charlene Weir; Matthew Samore; Mary K Goldstein Journal: J Am Med Inform Assoc Date: 2012-03-21 Impact factor: 4.497
Authors: Stéphane M Meystre; Julien Thibault; Shuying Shen; John F Hurdle; Brett R South Journal: J Am Med Inform Assoc Date: 2010 Sep-Oct Impact factor: 4.497
Authors: Guergana K Savova; James J Masanz; Philip V Ogren; Jiaping Zheng; Sunghwan Sohn; Karin C Kipper-Schuler; Christopher G Chute Journal: J Am Med Inform Assoc Date: 2010 Sep-Oct Impact factor: 4.497