| Literature DB >> 22195219 |
Yonghui Wu1, S Trent Rosenbloom, Joshua C Denny, Randolph A Miller, Subramani Mani, Dario A Giuse, Hua Xu.
Abstract
Recognition and identification of abbreviations is an important, challenging task in clinical natural language processing (NLP). A comprehensive lexical resource comprised of all common, useful clinical abbreviations would have great applicability. The authors present a corpus-based method to create a lexical resource of clinical abbreviations using machine-learning (ML) methods, and tested its ability to automatically detect abbreviations from hospital discharge summaries. Domain experts manually annotated abbreviations in seventy discharge summaries, which were randomly broken into a training set (40 documents) and a test set (30 documents). We implemented and evaluated several ML algorithms using the training set and a list of pre-defined features. The subsequent evaluation using the test set showed that the Random Forest classifier had the highest F-measure of 94.8% (precision 98.8% and recall of 91.2%). When a voting scheme was used to combine output from various ML classifiers, the system achieved the highest F-measure of 95.7%.Mesh:
Year: 2011 PMID: 22195219 PMCID: PMC3243185
Source DB: PubMed Journal: AMIA Annu Symp Proc ISSN: 1559-4076