Development and Use of Natural Language Processing for Identification of Distant Cancer Recurrence and Sites of Distant Recurrence Using Unstructured Electronic Health Record Data.

Yasmin H Karimi, Douglas W Blayney, Allison W Kurian, Jeanne Shen, Rikiya Yamashita, Daniel Rubin, Imon Banerjee.

Abstract

PURPOSE: Large-scale analysis of real-world evidence is often limited to structured data fields that do not contain reliable information on recurrence status and disease sites. In this report, we describe a natural language processing (NLP) framework that uses data from free-text, unstructured reports to classify recurrence status and sites of recurrence for patients with breast and hepatocellular carcinomas (HCC).
METHODS: Using two cohorts of breast cancer and HCC cases, we validated the ability of a previously developed NLP model to distinguish between no recurrence, local recurrence, and distant recurrence from clinician notes, radiology reports, and pathology reports, with manual curation as the reference. A second NLP model was trained and validated to identify sites of recurrence. We assessed the ability of each NLP model to identify the presence, timing, and site of recurrence in comparison with manual chart review and International Classification of Diseases coding.
RESULTS: A total of 1,273 patients were included in the development and validation of the two models. The NLP model for recurrence detected distant recurrence with an area under the curve of 0.98 (95% CI, 0.96 to 0.99) in the breast cancer cohort and 0.95 (95% CI, 0.88 to 0.98) in the HCC cohort. The mean accuracy of the NLP model for detecting any site of distant recurrence was 0.9 for breast cancer and 0.83 for HCC. In a breast cancer database, the NLP model for recurrence identified a larger proportion of patients with distant recurrence (11.1%) than International Classification of Diseases coding did (2.31%).
CONCLUSION: We developed two NLP models to identify distant cancer recurrence, timing of recurrence, and sites of recurrence based on unstructured electronic health record data. These models can be used to perform large-scale retrospective studies in oncology.
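
The evaluation measures reported above (area under the curve with a 95% CI for distant versus non-distant recurrence, and the proportion of patients flagged by each method) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy labels and scores, and the percentile bootstrap for the interval are all assumptions made for illustration, and the abstract does not state how the reported intervals were computed.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = distant recurrence per manual chart review, 0 = otherwise.
    scores: model score for distant recurrence. Tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels: Sequence[int], scores: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


def flagged_proportion(flags: Sequence[bool]) -> float:
    """Share of patients a method (NLP or ICD coding) flags as recurrent."""
    return sum(flags) / len(flags)


# Hypothetical toy data, not taken from the study.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.05]

auc = roc_auc(toy_labels, toy_scores)
ci_low, ci_high = bootstrap_ci(toy_labels, toy_scores)
print(f"AUC {auc:.2f} (95% CI, {ci_low:.2f} to {ci_high:.2f})")
```

Applying `flagged_proportion` to the per-patient flags from the NLP model and from diagnosis codes over the same database would give the kind of side-by-side proportions reported in the results.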

Year: 2021; PMID: 33929889; PMCID: PMC8462655; DOI: 10.1200/CCI.20.00165

Source DB: PubMed; Journal: JCO Clin Cancer Inform; ISSN: 2473-4276


