Surabhi Datta (1), Elmer V Bernstam (2), Kirk Roberts (3). (1) School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA. (2) School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA; Department of Internal Medicine, McGovern Medical School, The University of Texas Health Science Center at Houston, TX, USA. (3) School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA. Electronic address: kirk.roberts@uth.tmc.edu.
Abstract
OBJECTIVE: Electronic health record (EHR) notes contain a large amount of cancer-related information that can be useful for biomedical research, provided that natural language processing (NLP) methods are available to extract and structure it. In this paper, we present a scoping review of the existing clinical NLP literature for cancer. METHODS: We searched PubMed, Google Scholar, ACL Anthology, and existing reviews for studies describing an NLP method to extract specific cancer-related information from EHR sources. We applied two exclusion criteria: articles whose extraction techniques were too broad to be represented as frames (e.g., document classification), and articles that used only very low-level extraction methods (e.g., simply identifying clinical concepts). The final review included 78 articles. We organized the extracted information according to frame semantic principles to help identify common areas of overlap and potential gaps. RESULTS: Frames were created from the reviewed articles for cancer information such as cancer diagnosis, tumor description, cancer procedure, breast cancer diagnosis, prostate cancer diagnosis, and pain in prostate cancer patients. Each frame included both a definition and specific frame elements (i.e., extractable attributes). Cancer diagnosis was the most common frame among the reviewed papers (36 out of 78), with recent work focusing on extracting information related to treatment and breast cancer diagnosis. CONCLUSION: The list of common frames described in this paper identifies important cancer-related information extracted by existing NLP techniques and serves as a useful resource for future researchers who require cancer information extracted from EHR notes. We also argue that, given the heavy duplication among cancer NLP systems, a general-purpose resource of annotated cancer frames and corresponding NLP tools would be valuable.