Vishesh Kumar, Amber Stubbs, Stanley Shaw, Özlem Uzuner.
Abstract
The 2014 i2b2/UTHealth Natural Language Processing (NLP) shared task featured a new longitudinal corpus of 1304 records representing 296 diabetic patients. The corpus contains three cohorts: patients who have a diagnosis of coronary artery disease (CAD) in their first record and continue to have it in subsequent records; patients who do not have a diagnosis of CAD in their first record but develop it by their last record; and patients who do not have a diagnosis of CAD in any record. This paper details the process used to select records for the corpus and provides an overview of novel research uses for it. It is the only annotated corpus of longitudinal clinical narratives currently available to the general research community.
Keywords: Corpus; Machine learning; Medical records; NLP
Year: 2015; PMID: 26433122; PMCID: PMC4978168; DOI: 10.1016/j.jbi.2015.09.018
Source DB: PubMed; Journal: J Biomed Inform; ISSN: 1532-0464; Impact factor: 6.317