James Cormack, Chinmoy Nath, David Milward, Kalpana Raja, Siddhartha R Jonnalagadda.
Abstract
This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system.
Keywords: Clinical natural language processing; Information extraction; Text mining
Year: 2015 PMID: 26209007 PMCID: PMC4737484 DOI: 10.1016/j.jbi.2015.06.030
Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317