Danielle L Mowery, Kensaku Kawamoto, Rick Bradshaw, Wendy Kohlmann, Joshua D Schiffman, Charlene Weir, Damian Borbolla, Wendy W Chapman, Guilherme Del Fiol.
Abstract
Background. Family health history (FHH) can be used to identify individuals at elevated risk for familial cancers. Risk criteria for common cancers rely on age of onset, which is documented inconsistently as structured and unstructured data in electronic health records (EHRs). Objective. To investigate a natural language processing (NLP) approach to extract age of onset and age of death from free-text EHR fields. Methods. Using 474,651 FHH entries from 89,814 patients, we investigated two methods: a frequent-pattern baseline and an NLP classifier. Results. For age of onset, the NLP classifier outperformed the baseline in precision (96% vs. 83%; 95% CI [94, 97] and [80, 86], respectively) with equivalent recall (both 93%; 95% CI [91, 95]). When applied to the full dataset, the NLP approach increased the percentage of FHH entries for which cancer risk criteria could be applied from 10% to 15%. Conclusion. NLP combined with structured data may improve the computation of familial cancer risk criteria for various use cases.
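To make the idea of a pattern-based baseline concrete, the following is a minimal sketch of how age of onset and age of death might be pulled from a free-text FHH comment with regular expressions. The patterns, the function names (`first_age`, `extract_ages`), and the example comments are hypothetical illustrations, not the patterns, classifier, or data used in the study.

```python
import re

# Hypothetical trigger patterns (assumptions for illustration only).
# The gap between trigger word and age may not cross '.', ';' or ',',
# so an age attached to a different clause is not picked up.
ONSET_PATTERNS = [
    re.compile(r"\b(?:dx|diagnosed|onset)\b[^.;,]*?\b(?:at|age|aged)\s+(\d{1,3})\b", re.I),
]
DEATH_PATTERNS = [
    re.compile(r"\b(?:died|deceased|death)\b[^.;,]*?\b(?:at|age|aged)\s+(\d{1,3})\b", re.I),
]


def first_age(patterns, text):
    """Return the first plausible age (0-120) matched by any pattern, else None."""
    for pattern in patterns:
        match = pattern.search(text)
        if match:
            age = int(match.group(1))
            if 0 <= age <= 120:
                return age
    return None


def extract_ages(comment):
    """Extract age of onset and age of death from one free-text FHH comment."""
    return {
        "age_of_onset": first_age(ONSET_PATTERNS, comment),
        "age_of_death": first_age(DEATH_PATTERNS, comment),
    }


if __name__ == "__main__":
    for comment in (
        "Breast cancer diagnosed at age 45, died at 52",
        "Colon cancer dx, died at 60",
        "Mother, breast cancer",
    ):
        print(comment, "->", extract_ages(comment))
```

Fixed patterns like these miss any phrasing they do not anticipate and can attach an age to the wrong event, which is one plausible reason a trained classifier could achieve higher precision than a pattern baseline.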
Year: 2019 PMID: 31258969 PMCID: PMC6568127
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc